A Practical Implementation of the
Heath–Jarrow–Morton Framework
Final Degree Project

Escuela Técnica Superior de Ingeniería (ICAI)
Universidad Pontificia Comillas
Madrid
Author: Juan Monge Liaño
Supervisors: François Friggit, Maria Teresa Martínez

Collaborators: Luis Martí, Anh Tuan NGO
Madrid, June 2007

1. Introduction 5
1.1 Exotic Options 6
1.2 History 7
1.3 Models 8
1.4 HJM 9
1.5 Document Structure. 10
1.6 Special Acknowledgements 14
1.7 Project Aims 14
2. Stochastic Calculus 16
2.1 Introduction 16
2.2 Markov Process 17
2.3 Martingale 18
2.4 Brownian Motion 21
2.5 Stochastic Differential Equation 22
2.6 Risk Neutral Probability 23
2.7 Solving Stochastic Differential Equations 24
2.8 Ito’s Lemma 24
2.9 Stochastic Integral 27
2.10 Girsanov’s Theorem 27
2.11 Martingale Representation Theorem 30
2.12 Major Stochastic Differential Equations 32
3. Historical Models 36
3.1 The Black Scholes Model 36
3.2 Beyond Black 42
3.3 Lognormal Classic Black 45
3.4 Normal Black 46
3.5 Black Shifted 47
3.6 Local Volatility - Dupire’s Model 48
3.7 Stochastic Volatility 59
3.8 SABR 60
4. Interest Rate Models 62
4.1 Rendleman and Bartter model 63
4.2 Ho-Lee model 63
4.3 Black Derman Toy model 64
4.4 Vasicek Model 64
4.5 Cox Ingersoll and Ross model 64
4.6 Black Karasinski model 65
4.7 Hull White Model 65
4.8 Conclusions 67
5. Interest Rate Products 68
5.1 Discount Factors 68
5.2 Zero-coupon bond 70
5.3 Interest Rate Compounding 70
5.4 Present Value PV 71
5.5 Internal Rate of Return IRR 72
5.6 Bond Yield (to Maturity) YTM 72
5.7 Coupon Rate 73
5.8 Interest Rates 75
5.9 Forward Rates 78
5.10 Instantaneous forward rate 79
6. More Complex Derivative Products 81
6.1 Calls and Puts 81
6.2 Forward 86
6.3 Future 87
6.4 FRA 89
6.5 FRA Forward 90
6.6 Caplet 92
6.7 Cap 94
6.8 Swap 97
6.9 Swaption 100
7. HJM 104
7.1 Introduction 104
7.2 Model Origins 105
7.3 The HJM Development 106
7.4 The rt in the HJM Approach 109
8. Santander HJM 112
8.1 How to choose the γ? 113
8.2 One Factor 114
8.3 Model Implementation 117
8.4 Controlled correlation 126
8.5 Tangible Parameter Explanation 128
9. Numerical Methods 134
9.1 Discretisation 135
9.2 MonteCarlo 136
9.3 Tree Diagrams 140
9.4 PDE Solvers 145
10. Calibration 149
10.1 Algorithm 150
10.2 Calibration in Detail 153
10.3 Best Fit or not Best Fit? 157
10.4 Newton Raphson 164
10.5 Iteration Algorithm 169
11. Graphical Understanding 170
11.1 Dynamics of the curve 174
11.2 HJM Problematics 175
11.3 Attempted Solutions 179
11.4 3D Surface Algorithm 181
12. HJM 3 Strikes 183
12.1 Exponential 183
12.2 Mean Reversion 185
12.3 Square Root Volatility 187
12.4 Pilipovic 188
12.5 Logarithmic 189
12.6 Taylor expansion 190
12.7 Graphical Note 191
12.8 Results 191
13. Analytic Approximation 195
13.1 Formula Development 196
13.2 Step 1 198
13.3 Second Method 203
13.4 Step 2 205
13.5 Swaption Valuation 206
13.6 Approximation Conclusion 207
13.7 Alternative point of Calculation 208
13.8 Two Factors 209
13.9 Use of ‘No Split’ 217
14. Analytic Approximation Results 219
14.1 1 Factor Model 219
14.2 Analytic Approximation Jacobian 227
14.3 2 Factor Analytic Approximation 230
14.4 Final Considerations on the Analytic approximation 232
14.5 Conclusions and Further Developments 233
14.6 Analytic approximation Peculiarities 233
15. Calibration Set Interpolation Matrix 240
15.1 Initial Data 240
15.2 Former approach analysis 240
15.3 2 Strikes 241
15.4 Graphical representation 244
16. Interest Rate Volatilities: Stripping Caplet Volatilities from cap quotes 249
16.1 Introduction 249
16.2 Stripping Caplet Volatility Methods 251
16.3 Previous Santander Approach for 6 month caplets 252
16.4 Linear Cap Interpolation 254
16.5 Quadratic Cap Interpolation 257
16.6 Cubic Spline Interpolation 257
16.7 Natural splines 261
16.8 Parabolic Run out Spline 262
16.9 Cubic Run out Spline 263
16.10 Constrained Cubic Splines 263
16.11 Functional Interpolation 265
16.12 Constant Caplet Volatilities. 266
16.13 Piecewise Linear Caplet Volatility Method 267
16.14 Piecewise Quadratic 269
16.15 The Algorithm 271
16.16 About the problem of extracting 6M Caplets from Market data. 272
17. SABR 277
17.1 Detailed SABR 279
17.2 Dynamics of the SABR: understanding the parameters 281
18. Result Analysis 288
18.2 SABR Results 292
18.3 3D Analysis 296
18.4 Algorithm 307
18.5 Future Developments 309
19. Summary and Conclusions 314
20. References 320
Figure index
Fig. 2.1. Stochastic Volatility Dynamics 17
Fig. 2.2. Linear Stochastic Differential Equation dynamics 33
Fig. 2.3. Geometric Stochastic Differential Equation dynamics 33
Fig. 2.4. Square Root Stochastic Differential Equation dynamics 34
Fig. 3.1. Only the present call value is relevant to compute its future price. Any
intermediate time-periods are irrelevant 42
Fig. 3.2. Term Structure of Vanilla Options 43
Fig. 3.3. Flat lognormal Black Volatilities 45
Fig. 3.4. Normal and lognormal Black Scholes model comparison a) price vs strike b)
black volatility vs strike 46
Fig. 3.5. Alpha skew modelling 47
Fig. 3.6. Market data smile 48
Fig. 3.7. Smiles at different maturities 53
Fig. 3.8. Implied volatility σB(K,f) if forward price decreases from f0 to f (solid line) 55
Fig. 3.9. Implied volatility σB(K,f) if forward price increases from f0 to f (solid line). 55
Fig. 3.10. Future volatility scenarios for an asset 56
Fig. 3.11. Future volatility scenarios for different strikes of the same underlying asset 56
Fig. 5.1. Future value of money 69
Fig. 5.2. Discount factor 69
Fig. 5.3. Bond curve dynamics 73
Fig. 5.4. Bond curve for different maturities 74
Fig. 5.5. Relating discount rates 76
Fig. 6.1. Investor’s profit from buying a European call option: Option price= 5$; Strike
K= 60$ 82
Fig. 6.2. Vendor’s profit from selling a European call option: Option price= 5$; Strike =
60$ 83
Fig. 6.3. Investor’s profit from buying a European put option: Option price= 7$; Strike
= 90$ 84
Fig. 6.4. Profit from writing a European put option: Option price= 7$; Strike = 90$ 85
Fig. 6.5. Forward contract: future expected value versus real future value 86
Fig. 6.6. Future contract: future expected value versus real future value 88
Fig. 6.7. FRA payoffs 90
Fig. 6.8. FRA future’s payoffs 90
Fig. 6.9. Caplet payoffs 93
Fig. 6.10. Cap payoffs 95
Fig. 6.11. Swap payoffs 98
Fig. 8.1. Example of lack of correlation between variables belonging to a unique
Brownian motion 120
Fig. 8.2. HJM dynamics for a lognormal model: flat 122
Fig. 8.3. HJM dynamics for a normal model: skew 122
Fig. 8.4. HJM dynamics for alpha parameters greater than 1 124
Fig. 8.5. for a correlation=1 amongst interest rates. 127
Fig. 8.6. Allowing for de-correlation among different interest rates 127
Fig. 8.7. Typical vanilla dynamics for different maturities 129
Fig. 8.8. Smile to skew deformation with maturity 130
Fig. 8.9. Sigma parameter global volatility level 131
Fig. 8.10. Alpha parameter skew 132
Fig. 8.11. Stochastic Volatility: smile creation 133
Fig. 9.1. Call future scenarios generation 136
Fig. 9.2. Normally distributed variable generation from random numbers in the (0,1)
interval 137
Fig. 9.3. Binomial tree 141
Fig. 9.4. Trinomial tree 141
Fig. 9.5. Non recombining binomial tree 142
Fig. 9.6. Binomial tree probabilities 143
Fig. 9.7. Recombining binomial tree 144
Fig. 9.8. Recombining trinomial tree 144
Fig. 9.9. PDE mesh and boundary conditions 147
Fig. 9.10. First PDE algorithm steps 148
Fig. 10.1. Calibration Process: Vanilla Products 150
Fig. 10.2. Calibration Process: Exotic Pricing 152
Fig. 10.3. Analogy cancellable swap and inverse swap 152
Fig. 10.4. Decomposition of an exotic into time periods 154
Fig. 10.5. Decomposition of an exotic into vanillas fixing at T and with different
maturities 155
Fig. 10.6. Schematic calibration matrix representation 156
Fig. 10.7. Initial Variation before first Fixing 156
Fig. 10.8. First Row Interpolated Data 157
Fig. 10.9. Inexact fit: minimum square method 158
Fig. 10.10. Exact fit 162
Fig. 10.11. Anomaly in exact fit 163
Fig. 10.12. Anomaly in minimum square method 163
Fig. 10.13. Newton Raphson Iterations 164
Fig. 10.14. Newton Raphson Iterations with a constant Jacobian 166
Fig. 10.15. Calibration Jacobian 167
Fig. 10.16. Jacobian calculation Iterations 168
Fig. 10.17. Detailed Calibration Algorithm: Jacobian computation 169
Fig. 11.1. HJM model: sigma vs price dynamics with different alpha parameters 171
Fig. 11.2. HJM model: alpha vs price dynamics with different sigma parameters 172
Fig. 11.3. HJM MonteCarlo model price surface 172
Fig. 11.4. HJM MonteCarlo two dimensional solution 173
Fig. 11.5. HJM MonteCarlo two dimensional solution intersection for two vanilla
products 173
Fig. 11.6. Model implications on taking a) very close strikes b) distant strikes 174
Fig. 11.7. Solution dynamics with a) variation in market price b) variations in strike 175
Fig. 11.8. Convergence of the algorithm 176
Fig. 11.9. Solution Duplicity 177
Fig. 11.10. No HJM MonteCarlo solution intersection 178
Fig. 11.11. HJM MonteCarlo surface does not descend sufficiently so as to create a
solution curve 179
Fig. 11.12. Graphic surface generation algorithm 182
Fig. 14.1. Analytic approximation at high strikes 220
Fig. 14.2. Analytic approximation at distant strikes 221
Fig. 14.3. Analytic approximation acting as a tangent ‘at the money’ 222
Fig. 14.4. Analytic approximation presents difficulties in adjusting to the curve at
distant strikes 223
Fig. 14.5. Analytic approximation corrected in sigma at high strikes 225
Fig. 14.6. Analytic approximation corrected in sigma for low strikes 225
Fig. 14.7. Analytic approximation with a varying sigma correction 226
Fig. 14.8. HJM MonteCarlo slopes and solution curve 228
Fig. 14.9. Analytic approximation slopes and solution curve 229
Fig. 14.10. Close-up on HJM MonteCarlo’s slopes and solution curve 229
Fig. 14.11. Close-up on analytic approximation’s slopes and solution curve 230
Fig. 14.12. HJM MonteCarlo versus analytic approximation solving solution
duplicities 234
Fig. 14.13. HJM MonteCarlo presents no solution curve intersection 235
Fig. 14.14. Analytic approximation solving a case with no HJM MonteCarlo solution
intersection 235
Fig. 14.15. HJM MonteCarlo first vanilla presenting a solution curve 236
Fig. 14.16. HJM MonteCarlo second vanilla does not descend sufficiently 237
Fig. 14.17. Analytic approximation presents a solution for the first vanilla 238
Fig. 14.18. Analytic approximation also presents a solution for the second troublesome
vanilla 238
Fig. 14.19. HJM MonteCarlo versus analytic approximation for a two dimensional
view of the previous cases 239
Fig. 15.1. Strike Interpolation 241
Fig. 15.2. Strike Interpolation 242
Fig. 15.3. Vertical extrapolation no longer flat 244
Fig. 15.4. Surface Deformation in Horizontal Extrapolation 245
Fig. 15.5. Swaption Vertical Extrapolation stays the same 245
Fig. 15.6. New Circular Solution Intersection 246
Fig. 16.1. Market Cap Quotes 249
Fig. 16.2. Cap decomposition into other caps and capforwards 252
Fig. 16.3. Capforward decomposition into two unknown caplets 253
Fig. 16.4. Each cap is made up of a number of caplets of unknown volatility 254
Fig. 16.5. 2 Cap Interpolation 254
Fig. 16.6. Forward caps related to the caplets 267
Fig. 16.7. Optimisation algorithm for interpolation in maturities 271
Fig. 16.8. Cap market quotes: flat cap difference under 2 year barrier 272
Fig. 16.9. Creation of the 6 month caplets from 3 month Cap market quotes: flat cap
difference under 2 year barrier 273
Fig. 16.10. Decomposition of a six month caplet into two 3 month caplets 273
Fig. 17.1. Caplet current market behaviour 278
Fig. 17.2. beta = 0 skew imposition, rho smile imposition 281
Fig. 17.3. beta flat imposition =1, rho smile imposition 282
Fig. 17.4. Indistinguishable smile difference on calibrating with different beta
parameters β = 0 and β = 1 284
Fig. 17.5. Constructing the caplet ‘at the money’ volatilities 286
Fig. 18.1. Cap flat market quotes versus interpolated flat market quotes 288
Fig. 18.2. Caplet interpolated volatilities using linear and quadratic approaches 289
Fig. 18.3. Cap interpolated volatilities using linear and cubic spline approaches 290
Fig. 18.4. Cap interpolation between natural and constrained cubic splines 290
Fig. 18.5. Caplet interpolated volatilities using linear and linear cap approaches 291
Fig. 18.6. Caplet interpolated volatilities using linear approaches 292
Fig. 18.7. Caplet interpolated volatilities using cubic spline, and an optimisation
algorithm using quadratic approaches 292
Fig. 18.8. - SABR short maturity caplet smile 293
Fig. 18.9. - SABR long maturity caplet smile 293
Fig. 18.10. - SABR very short 6 month maturity - sharp smile 294
Fig. 18.11. SABR short maturity caplet smile inexact correction: very irregular smile 294
Fig. 18.12. Difference in general smile and ‘at the money’ level 295
Fig. 18.13. Smile correction towards ‘at the money level’ 296
Fig. 18.14. Initial linear interpolated caplet volatility surface 297
Fig. 18.15. Cubic Spline caplet volatility surface 297
Fig. 18.16. SABR smooth interpolated smile surface with cubic spline 297
Fig. 18.17. Irregular Smile for both linear interpolation and cubic spline, whereas
SABR presents a much smoother outline 298
Fig. 18.18. Smile bump ‘at the money level’ in linear cap interpolation; maturity of 1.5
years 299
Fig. 18.19. Caplet 3M to 6M smile for a maturity of 1.5 years 300
Fig. 18.20. Comparisons in cubic and 3M to 6M adjustments 301
Fig. 18.21. SABR strike adjustment 302
Fig. 18.22. SABR at the money strike adjustment, β = 1 302
Fig. 18.23. SABR β = 0, normal skew; maturity 1 year 304
Fig. 18.24. SABR comparisons between long and short maturities, varying the β and
the weights 305
Fig. 18.25. Weighted SABR, β = 1, Maturity 8Y 306
Fig. 18.26. Caplet volatility surface construction algorithm 308
Fig. 18.27. Cap market quotes 309
Table index
Table 4.1 Normal or lognormal models with a mean reversion 66
Table 5.1 Construction of forward rates from bonds 78
Table 6.1 Swap payoff term structure 97
Table 10.1. Exotic product risk decomposition 153
Table 10.2 Ideal Vanilla calibration matrix: all data available 155
Table 10.3 Vanilla calibration matrix: market quoted data 156
Table 12.1. Mean Reversion Stochastic Volatility Summary 192
Table 12.2. Mean Reversion Stochastic Volatility Summary 193
Table 12.3. Square Root Stochastic Volatility Summary – 10 and 20 packs 193
Table 12.4. Logarithmic Stochastic Volatility Summary 193
Table 14.1. Approximation increases calibration speed by a factor of 5 232
Table 14.2. Approximation increases calibration speed by a factor of 8 233
Table 15.1 Horizontal interpolation, vertical extrapolation 242
Table 15.2 Summary table of the differences between vertical, horizontal
extrapolation, split and no split 243
Table 15.3 Results obtained through vertical extrapolation 247
Table 15.4 Results obtained through horizontal extrapolation 248
Table 16.1. Cap market quotes: flat cap difference under 2 year barrier 272
Table 17.1. Shows the different dates and strikes with which to model products with
similar needs to the HJM 277
1. Introduction
During the past two decades, derivatives have become increasingly important in
the world of finance. A derivative can be defined as a financial instrument whose
value depends on (derives from) the values of other, more basic underlying variables.
Very often the variables underlying derivatives are the prices of traded assets.
Some major developments have occurred in the theoretical understanding of how
derivative asset prices are determined, and how these prices change over time. Three
major steps in the theoretical revolution led to the use of advanced mathematical
methods.
• The arbitrage theorem, sometimes called the ‘Fundamental Theorem of Finance’,
gives the formal conditions under which ‘arbitrage’ profits can or cannot exist.
This major development permitted the calculation of the arbitrage-free price of
any ‘new’ derivative product.
• The Black-Scholes model (1973) used the method of arbitrage-free pricing. Its
ideas are the basis of most discussions in mathematical finance. The paper that
Black and Scholes published was also influential because of the technical steps
introduced in obtaining a closed-form formula for option prices.
• Equivalent martingale measures dramatically simplified and generalised the
original approach of Black and Scholes.
With the above tools, a general method could be used to price any derivative
product. Hence arbitrage-free prices under more realistic conditions could be
obtained.
But how close are the market prices of options really to those predicted by Black-
Scholes? Do traders genuinely use Black Scholes when determining a price for an
option? Are the probability distributions of asset prices really lognormal? What
further developments have been carried out since 1973?
In fact, traders do still use the Black Scholes Model, but not exactly in the way
that Black and Scholes originally intended. This is because they allow the volatility
used for the pricing of an option to depend on its strike price and its time to maturity.
As a result, a plot of the implied volatility of an option as a function of its strike
price gives rise to what is known as a volatility smile. This suggests that traders do
not make the same assumption as Black and Scholes. Instead, they assume that the
probability distribution of the asset price has a heavier left tail and a less heavy
right tail than the lognormal distribution.
Often, traders also use a volatility term structure. The implied volatility of an
option then depends on its life. When volatility smiles and volatility term structures
are combined, they produce a volatility surface. This defines implied volatility as a
function of both strike price and time to maturity.
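To make the notion of implied volatility concrete, the following is a minimal Python sketch rather than the method used later in this text. It prices a European call with the Black-Scholes formula and recovers the volatility from the price by bisection. The function names, the flat rate and the three strike and volatility pairs are hypothetical, chosen only for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, vol: float, maturity: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, maturity, lo=1e-6, hi=5.0, tol=1e-10):
    """Volatility that reproduces `price`, found by bisection.

    Relies on the call price being increasing in volatility.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, maturity) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical inputs: a different volatility per strike (illustrative values only).
spot, rate, maturity = 100.0, 0.05, 1.0
quoted_vols = {80.0: 0.25, 100.0: 0.20, 120.0: 0.18}

for strike, vol in quoted_vols.items():
    price = bs_call(spot, strike, rate, vol, maturity)
    recovered = implied_vol(price, spot, strike, rate, maturity)
    print(f"K={strike:6.1f}  price={price:8.4f}  implied vol={recovered:.4f}")
```

Plotting the recovered volatilities against strike gives one slice of the smile; repeating the exercise for several maturities would give the volatility surface described above.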
1.1 Exotic Options
The derivatives traded at the beginning of the 1980s were what we refer to as plain
vanilla products. These have standard, well-defined properties and trade actively.
Their prices or implied volatilities are quoted by exchanges or by brokers on a regular
basis. One of the exciting aspects of the over-the-counter derivatives market is the
number of non-standard (or exotic) products that have been created by financial
engineers. Although they are usually a relatively small part of a bank’s portfolio,
these exotic products are important to the bank because they are generally much more
profitable than plain vanilla products.
These exotic products are options whose payoff rules are more complicated than those
of standard options. Typical exotic options include packages, non-standard American
options, forward start options, chooser options, barrier options, binary options,
lookback options and Asian options.
1.2 History
Models using stochastic differential equations for the modelling of stock markets
arise from the search for a function capable of reproducing the historical behaviour of
the prices. That is, a function that presents the same spiky, irregular form as market
stock quotes, and thus reflects the randomness of the dynamics. The behaviour itself is
said to be fractal because any rescaling of the temporal axis always yields the same
irregular price form.
The origins of much of financial mathematics trace back to a dissertation
published in 1900 by Louis Bachelier. In it he proposed to model the movement of
stock prices with a diffusion process, or what was later to be called a Wiener, or
Brownian motion, process. However, he assumed that the stock prices were Gaussian
rather than lognormal as in the Black Scholes Model. It was not until five years later,
in Einstein’s seminal paper outlining the theory of Brownian motion, that the concepts
of Brownian motion and differentials were finally formalised.
For most of the century, the mathematical and financial branches of Bachelier’s
work evolved independently of each other. On the mathematical side, influenced by
Bachelier’s work, Kiyoshi Ito went on to develop stochastic calculus, which would later
become an essential tool of modern finance.
On the financial side, Bachelier’s work was largely lost for more than half a
century. It wasn’t until 1965 that an American called Paul Samuelson resurrected
Bachelier’s work and extended his ideas to include an exponential (or geometric) form
rather than the arithmetic Brownian Motion.
After this it wasn’t long until the two separate branches of work (both stemming
from Bachelier) were reunited when Black, Scholes and Merton wrote down their famous
equation for the price of a European call and put option in 1969 (published in 1973),
work for which the surviving members received the Nobel Prize in Economics in 1997.
1.3 Models
Models based on the original Black-Scholes assumptions and the numerical
procedures they require are relatively straightforward. However, when tackling exotic
options, there is no simple way of calculating the volatility that should be input from
the volatility smile applicable to plain vanilla options.
Further, a simplistic approach such as that of the Black Scholes model assumes that
the individual spot rates move independently of one another in a random fashion.
This is perhaps acceptable abstractly, but is not in accord with the observation that
rates for adjacent maturities tend to move together.
A number of alternative models have since been introduced in an attempt to
solve this problem. The parameters of these models can be chosen so that they are
approximately consistent with the volatility smile observed in the market.
These have come to be known as the ‘traditional models’ for pricing interest rate
options, and include the Hull White model, the Vasicek model, the Cox Ingersoll and
Ross model and the Black Karasinski model, to name but a few. Their
main difference with respect to previous models is the incorporation of a description
of how interest rates change through time. For this reason, they involve the building
of a term structure, typically based on the short term interest rate r_t. These models are
very robust and can be used in conjunction with any set of initial zero rates. The main
advantage of these methods lies in the possibility of specifying r_t as a solution to a
Stochastic Differential Equation (SDE). This allows us, through Markov theory, to work
with the associated Partial Differential Equation (PDE) and subsequently to derive a
rather simple formula for bond prices. This makes them well suited for valuing
instruments such as caps, European bond options and European swap options.
However, they have some limitations. They need a complicated diffusion model to
realistically reconstruct the volatility observed in forward rates. In addition, despite
being able to provide a perfect fit to volatilities observed in the market today, there is
no way of controlling their future volatilities.
Furthermore, they all lead to the same drawback when pricing interest rate
products: the fact that they use only one explanatory variable (the short interest rate
r_t) to construct a model for the entire market. The use of a single r_t proves
insufficient to realistically model the market curve, which appears to be dependent on
all the rates and their different time intervals. It constrains the model to move
randomly up and down, with all the rates moving together by the same amount
according to the random motions in the unique short rate r_t. Consequently, these
models cannot be used for valuing interest rate derivatives such as American-style
swap options, callable bonds, and structured notes, as they introduce arbitrage
possibilities.
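The single-factor limitation can be illustrated with a short Python sketch, assuming the Vasicek model as a representative of the traditional models. In its closed-form bond price, every zero rate is an affine function of the short rate, so rates of all maturities are perfectly correlated. The parameter values a, b and sigma, the list of short-rate scenarios and the function names are hypothetical.

```python
import math


def vasicek_zero_rate(short_rate, tau, a=0.1, b=0.05, sigma=0.01):
    """Continuously compounded zero rate for maturity `tau` in the Vasicek model
    dr = a (b - r) dt + sigma dW (a, b, sigma are assumed risk-neutral parameters)."""
    big_b = (1.0 - math.exp(-a * tau)) / a
    log_a = ((big_b - tau) * (a * a * b - 0.5 * sigma * sigma) / (a * a)
             - sigma * sigma * big_b * big_b / (4.0 * a))
    # The bond price is P = A * exp(-B * r), so the zero rate is -(ln P) / tau.
    return (big_b * short_rate - log_a) / tau


def correlation(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical short-rate scenarios; any non-constant list gives the same conclusion.
short_rate_scenarios = [0.010, 0.025, 0.030, 0.055, 0.041, 0.072]
one_year = [vasicek_zero_rate(r, 1.0) for r in short_rate_scenarios]
ten_year = [vasicek_zero_rate(r, 10.0) for r in short_rate_scenarios]

print("correlation between 1Y and 10Y zero rates:", correlation(one_year, ten_year))
```

The correlation equals one up to rounding whichever scenarios are used, because the short rate is the only source of randomness. In this particular model the size of the move differs by maturity, but its direction never does.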
1.4 HJM
The most straightforward solution to the above problem should include the use of
more explanatory variables: long and medium term rates. That is, we could perhaps
consider a model in which we would use one representative short term rate, a middle
term rate, and finally a long term interest rate. The Heath Jarrow Morton framework
arises as the most complete application of the suggested approach. It chooses to
include the entire forward rate curve as a theoretically infinite dimensional state
variable. Unlike other models, this model can match the volatility structure observed
today in the market, as well as at all future times.
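For orientation, the forward-rate dynamics that the framework takes as its starting point can be written in a generic one-factor form. The notation below (f, α, σ, W) is an assumption made for this sketch and may differ from the notation used in later chapters.

```latex
% f(t,T): instantaneous forward rate seen at time t for maturity T
% W_t:    Brownian motion under the risk-neutral measure
\begin{align}
  df(t,T)     &= \alpha(t,T)\,dt + \sigma(t,T)\,dW_t, \\
  \alpha(t,T) &= \sigma(t,T)\int_t^T \sigma(t,u)\,du, \\
  r_t         &= f(t,t).
\end{align}
```

The second line is the no-arbitrage drift condition: once the volatility structure σ(t,T) has been chosen, the drift is fully determined. The third line recovers the short rate of the traditional models as one point on the forward curve.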
The LIBOR market model also provides an approach that gives the user complete
freedom in choosing the volatility term structure. We will not cover the study of this
model in the present text, but only state here the advantages that it presents over the
HJM model.
Firstly, it is developed in terms of the forward rates that determine the pricing of
caps rather than in terms of instantaneous forward rates. Secondly, it is relatively easy
to calibrate to the price of caps or European swap options.
Both HJM and LIBOR market models have the serious disadvantage that they
cannot be represented as recombining trees. In practice, this means that they must be
implemented using MonteCarlo Simulations.
1.5 Document Structure.
The project is divided into two parts. The first is essentially an introduction to
mathematical engineering methods in finance. The second focuses on a much more
practical implementation of a concrete model, the HJM, and discusses in depth certain
problems that can surface within this framework.
The text approaches the mathematics behind continuous-time finance informally.
Such an approach may be found imprecise by a technical reader. We simply hope that
the informal treatment provides enough intuition about some of these difficult
concepts to compensate for this shortcoming.
The text is directed towards a reader with minimal background in finance, but
who is willing to embark on the adventure of getting acquainted with this unknown
domain. A strong background in calculus or stochastic processes is not needed, as the
concepts are built from scratch within the text. However, previous knowledge in these
fields will certainly be helpful, and it is necessary that the reader be comfortable with
the use of mathematics as a method of deduction and problem solving. Hence, the text
is designed for individuals who have a technical background roughly equivalent to a
bachelor’s degree in engineering, mathematics or science. The language of financial
markets is largely mathematical, and some aspects of the subject can only be
expressed in mathematical terms. It is hoped that practitioners in financial markets, as
well as beginning graduate students will find the text useful.
Chapter 2 introduces the mathematics of derivative assets assuming that time
passes continuously. As a consequence of this assumption, information is also revealed
continuously, and decision makers may face instantaneous changes in random news.
Hence technical tools for pricing derivative products require ways of handling
random variables over infinitesimal time intervals. The mathematics necessary for
dealing with such random variables is known as Stochastic calculus.
Chapters 3 and 4 discuss classical approaches in the modelling of the term
structure of interest rates. Learning the differences between the assumptions, the basic
philosophies, and the practical implementations that one can adopt in each case is an
important step for understanding the valuation of interest-sensitive instruments.
Further, it enables us to understand the limitations behind these models, and why the
HJM framework could be used as a good alternative.
Chapters 5 and 6 deal with the basic building blocks of financial derivatives:
bonds, interest rates, options and futures. These are truly the foundations necessary
for modelling the term structure of interest rates. With these, we proceed to introduce
the more complicated classes of derivatives such as swaps or swaptions. We conclude
by showing how these complicated products can be decomposed into a number of
simpler derivatives. This decomposition proves extremely practical.
The purpose of the second part of the text is principally to provide an
introduction to one way in which the HJM framework can be implemented. It sets out
in Chapter 7 by providing a broad vision of the framework itself before discussing the
more technical aspects. The text focuses on the practical implementation of the
framework developed at Banco Santander. We analyse the assumptions,
hypotheses, and development of the framework through Chapter 8, becoming more
and more involved in practical discussions, until diverting to solve the specific
problems that the model faced. In this part of the text, we have attempted to
introduce each problematic situation firstly under a broad theoretical scope.
We then proceed to explain the practical implementation which we decided on,
and the results which it yielded. We will often suggest alternatives for the different
methods that we implement, or offer ideas to face further developments that could
stem from our studies.
We continue in Chapters 9 and 10 with the practical implementation of the HJM.
Chapter 9 is a very brief introduction to the various numerical techniques that are
available for solving stochastic differential equations: specifically, MonteCarlo
integrals, Tree Diagrams, and Partial Differential Equation Solvers. We deem it
necessary to acquire an understanding of the calibration procedure itself, and how we
obtain the precise parameters that the model requires for the pricing of more complex
exotic products. For this reason, Chapter 10 attempts to provide a broad discussion of
how the parameters are extracted from simple vanilla products. We discuss in detail
the advantages of the Newton Raphson root solver in our approach. A very
interesting discussion is also presented facing two contrary lines of thought: the use of
exact calibration methods versus the use of overdetermined systems that centre on an
error minimisation.
Chapter 11 discusses a mathematical tool that proves extremely useful for
visually analysing the HJM iterative process before it converges towards its final
parameter values. It is also in this section where we begin to encounter certain
calibration flaws in the model that we will need to tackle in future chapters.
Chapter 12 is the first genuinely experimental chapter. It discusses the need for a
three parameter HJM model, and continues to analyse a set of possible formulations
that the third parameter, i.e. the volatility of volatilities, could present. The chapter
concludes with an analysis of specific calibrations performed with the three strike
model, and examines the possible causes of the failure to calibrate for long maturities.
In particular it points directly at a possible flaw in the caplet – swaption calibrations.
The analytic approximation of Caplet and Swaption prices is a huge achievement
that is presented in Chapter 13. As its name indicates, it is an approximate
formulation of the HJM model that enables us to reduce calibration computation times
drastically. The exhaustive mathematics of the approximation are studied in that
chapter. We then proceed in Chapter 14 to analyse the specific results attained
through the various alternatives. We conclude the chapter by selecting the most
advantageous alternatives, and implementing them at the trader’s desk.
Chapter 15 is a first attempt to solve the calibration problems that the HJM was
facing. It analyses the process by which the vanilla products for any specific
calibration are selected, and more specifically, it studies how the ‘missing’ data is
completed in the corresponding calibration matrices. We reach back briefly to Chapter
11 to analyse this problem by using the mathematical tool developed there. We end up
realising that the selection of our interpolation and extrapolation methods can have a
drastic impact on the model solution surface itself.
Chapter 16 picks up the loose ends left behind in Chapter 12 and tackles the
caplet problem as a cause for the failure in the 3 strikes HJM. The caplet
quotes obtained from market cap data prove incomplete, and the consistent
interpolation techniques needed to extract the missing values can be far from simple. We
provide a broad description of the main interpolation and optimisation algorithms
available, but show how even so, we are incapable of doing away with certain
irregularities in the caplet volatility surface. As a direct result of this, we continue to
the introduction of the SABR in Chapter 17. This is a stochastic volatility model that
centres on smoothing out the smile obtained by the previous interpolation
techniques. The model is discussed in depth, presenting its parameters and dynamic
characteristics.
Chapter 18 is the down to earth study of the results obtained in the previous two
chapters. It compares the exact interpolation methods with the inexact optimisations
and SABR algorithm, showing how the latter succeed in achieving a smoother
characteristic at the expense of losing precision.
1.6 Special Acknowledgements
Several people provided comments and helped during the process of revising the
document. I thank François Friggit for his invaluable discussions on mathematical
finance and models. I thank him also for being such a great director, for his endless
fantastic humour and support. I thank Maria Teresa Martínez for her advice, her
mathematical introductory text, and her help working out many technicalities in this
project. I would also like to thank Luis Martí in particular for having a word of help in
practically everything I did. I cannot imagine what I would have done without him. I
congratulate him for his work and thank him immensely for his friendship. Having
had the possibility to work together with Anh Tuan NGO in the creation of the
analytic approximation has proved a marvellous experience. I thank him for his help,
patience and friendship. I would finally like to thank all the other quant team
members of the Banco Santander that have made my internship during the current year so
formidable. The team is truly marvellous. I thank in particular Álvaro Serrano, Miguel
Pardo, Souahil, Cristina, Lamine, Álvaro Baillo, Pedro Franco, Jorge, Alberto, Carlos,
Manuel, and the rest of the team.
1.7 Project Aims
We briefly outline the aims with which we initiated the current project.
Setting out with the realisation that we cannot successfully approach the HJM
model without a consistent background in mathematical finance, the first part of the
project was directed towards building up this knowledge. Immediately after this, we
directly proceeded to a practical analysis of the implemented model with a direct
exposure to the relevant problematics. The project was thus, aimed at:
• An introduction to pricing derivative models
• The study of existing theoretical models: Black Scholes, Hull White, Heath Jarrow
Morton amongst others.
• The analysis of the internal Banco Santander HJM approach.
• Development of an understanding of the various programming techniques in use:
· Microsoft Visual Studio.net 2003: programming framework in C++; compiler,
linker and assembler
· Python IL interface
· Murex: valuation platform for credit, equity and interest rate products
· Internal tools created and programmed by the BSCH quant team
· Visual Basic
• Acquaintance with the Hull White framework already implemented and running
in the Banco Santander, as a simplified model serving as prior knowledge to the
Heath Jarrow Morton framework.
• Detailed comprehension of the Heath Jarrow Morton framework: identification and
examination of specific market examples where the HJM calibration does not compute.
Analysis of the market data related to these cases, and
verification of whether the failure to compute is due to anomalies in the market.
• Graphical representation and analysis of the correlation between the price and
each of the volatility parameters. Comprehension of the minima obtained in the
Price versus Alpha graphical representations. Isolation and examination of
specific examples.
• Resolution of the impossibility to correctly model real market situations in which

the true price lies below the minimum value of our model. The following two
preliminary approaches will be considered:
· Adjustment of the dependence between HJM volatility and the parameters.
· Modification of the parameters themselves: selection of an appropriate

statistical distribution other than the lognormal or normal distributions.
• Development and implementation of an analytic approximate formula so as to

reduce the time involved in the calibration process.
• Analysis and resolution of the existing HJM problems: possible caplet matrix
stripping problem. Possible failure in interpolation techniques.
Chapter 2 Stochastic Calculus
2. Stochastic Calculus
2.1 Introduction
Stochastic calculus handles random variables over infinitesimal time intervals,
in continuous time. In it, we can no longer apply Riemann integrals, and these are
instead replaced by Ito Integrals.
A stochastic process is a random function, i.e. a function of the form f(t, W) that is
therefore time dependent and that also depends on the variable W, which represents a
source of randomness.
In deterministic or classical calculus, the future value of any function f(t) is
always known by simply calculating
$$ f_t = f_0 + \int_0^t df $$
The above requires $f_t$ to be differentiable. If f is stochastic, as in the case of
Brownian motion, it is almost nowhere differentiable. Moreover, we do not know
the value of $f_t$ for any future instant of time. We only know the probability density
that our function will follow. We can simply write:
$$ f_t = f_0 + \int_0^t \alpha(s)\,ds + \int_0^t \beta(s)\,dW_s $$
where $f_0$ is known today, $\alpha(s)$ is the slope of the mean diffusion, and $W_s$ is a
standard Brownian motion that drives the random motion, i.e. the (stochastic) dispersion.
The above function is composed of two main terms: a term representing the
random motion or dispersion, which is stochastic, and a mean diffusion term that may
be constant over time, increasing or decreasing. Moreover, the latter can in turn be either
deterministic or stochastic, depending on whether the slope term α is also stochastic.
Fig. 2.1. Stochastic Volatility Dynamics (figure labels: mean, noise)
We are in this way left with what is known as an Ornstein Uhlenbeck Process:
df = α (t )dt + β (t )dWt (2.1)
where the first term is known as the drift, and the second as the diffusion.
Before proceeding any further with the Ornstein Uhlenbeck Process or any other
more elaborate setting, we must first grasp a set of important concepts that we will
now proceed to introduce.
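As an illustration of the dynamics (2.1), the following is a minimal sketch of an Euler discretisation, in which each time step adds a drift increment α(t)dt and a Gaussian diffusion increment β(t)dW. The function names, the step count and the constant coefficients used in the usage note are our own assumptions, chosen only for the example; they are not taken from the implementation discussed in this project.

```python
import math
import random


def simulate_path(alpha, beta, f0, t_end, n_steps, rng):
    """Euler discretisation of df = alpha(t) dt + beta(t) dW_t on [0, t_end].

    alpha and beta are deterministic functions of time (an assumption of
    this sketch). Returns the list of simulated values f_0, ..., f_n.
    """
    dt = t_end / n_steps
    f = f0
    path = [f0]
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        f += alpha(t) * dt + beta(t) * dw    # drift step + diffusion step
        path.append(f)
    return path


def terminal_mean(alpha, beta, f0, t_end, n_steps, n_paths, seed=0):
    """Average of the terminal value f_T over n_paths simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_path(alpha, beta, f0, t_end, n_steps, rng)[-1]
    return total / n_paths
```

With a constant slope of 0.5 and a constant dispersion of 0.2, `terminal_mean(lambda t: 0.5, lambda t: 0.2, 1.0, 1.0, 50, 2000)` should lie close to 1.5, since the diffusion term averages out and only the drift contributes to the mean.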
2.2 Markov Process
In a non mathematical approach, a Markov Process is one in which the future
probability that a variable X attains a specific value depends only on the latest
observation $\tilde{F}_t$ available at present. All other previous (or historical) information sets
$\tilde{F}_{t-1}, \ldots, \tilde{F}_1$ are considered irrelevant for the given probability calculation.
Mathematically, a Markov Process is therefore a discrete time process
$\{X_1, X_2, \ldots, X_t, \ldots\}$ with a joint probability distribution function $F(x_1, x_2, \ldots, x_t)$ that verifies
$$ P\left( X_{t+s} \le a \mid \tilde{F}_S \right) = P\left( X_{t+s} \le a \mid X_S \right) \tag{2.2} $$
where $\tilde{F}_S$ is the information set on which we construct the probability.
2.3 Martingale
We will say that an asset $S_t$ is a martingale if its future expected value (calculated
at time t over an information set $\tilde{F}_t$) is equal to its current value. A martingale is
always defined with respect to a probability P. This is:
$$ E_t^P\left[ S_{t+U} \right] = S_t \tag{2.3} $$
2.3.1 Martingale Properties
Martingales possess a number of very interesting properties that we will come
across time and time again. The main properties that martingales verify are:
1. The random variable $S_t$ is completely defined given the information at time t, $\tilde{F}_t$
2. Unconditional finite forecasts: $E_t^P[S_{t+U}] < \infty$ almost everywhere
3. Future variations are completely unpredictable, meaning there are no trends
$$ E_t^P\left[ S_{t+U} - S_t \right] = E_t^P\left[ S_{t+U} \right] - E_t^P\left[ S_t \right] = S_t - S_t = 0 \tag{2.4} $$
4. A continuous square integrable martingale is a martingale with a finite second
order moment
$$ E_t^P\left[ S_t^2 \right] < \infty \quad \text{almost everywhere} \tag{2.5} $$
Stochastic processes can behave as:
• Martingales: if the trajectory has no trends or periodicities
• Submartingale: for processes that increase on average. They verify:
EtP [ St +U ] ≥ St almost everywhere (2.6)
• Supermartingale: for processes that decrease on average. They verify:
EtP [ St +U ] ≤ St almost everywhere (2.7)
In general, assets are therefore submartingales because they are not completely
unpredictable but instead, are expected to increase over time. It is important to notice
that they can be converted to martingales using risk neutral probability if there is
absence of arbitrage.
There are two main methods of converting non martingales to martingales:
2.3.2 Doob Meyer Decomposition:
This involves adding or subtracting the expected trend from the process, thus
leaving only the random (martingale) component.
For example, in the case of right continuous time submartingales $S_t$, we can split
$S_t$ into the sum of a unique right continuous martingale $M_t$ and an increasing process
$A_t$ that is measurable with respect to $\tilde{F}_t$:
$$ S_t = M_t + A_t \tag{2.8} $$
In general we will normalize our martingales so that their first order moment is
equal to 0. In most cases, our martingales will also have continuous trajectories. For
such martingales that in addition are square integrable (see (2.5)), we can define their
quadratic variation process as the unique increasing process given by the Doob-Meyer
decomposition of the positive submartingale $\{M_t^2\}$. In other words, it is the unique
increasing process $\langle M \rangle_t$ such that $\{M_t^2 - \langle M \rangle_t\}$ is a martingale, normalized by
$\langle M \rangle_0 = 0$. In particular, $E(M_t^2) = E(\langle M \rangle_t)$.
Quadratic Variation Process:
$$ \langle M \rangle_t = \lim_{\|\Pi\| \to 0} \sum_{i=1}^{n} \left| M_{t_i} - M_{t_{i-1}} \right|^2 \tag{2.9} $$
where $\|\Pi\| = \sup_i |t_i - t_{i-1}|$ and the limit is taken in probability.
2.3.3 First Approach to Girsanov’s Theorem
The transformation of a probability distribution using Girsanov’s Theorem
involves simply changing the probability space in which we operate, $P \to \tilde{P}$, so as to
convert non-martingales to martingales.
$$ E_t^{P}\left[ S_{t+U} \right] > S_t \;\longrightarrow\; E_t^{\tilde{P}}\left[ S_{t+U} \right] = S_t \tag{2.10} $$
Continuous martingales have a further set of specific properties:
1. As we increase the partitioning of the interval [0,T] the $S_t$’s get closer.
Therefore
$$ P\left( \left| S_{t_i} - S_{t_{i-1}} \right| > \varepsilon \right) \to 0 $$
2. They must have nonzero variance
$$ P\left( \sum_{i=1}^{n} \left| S_{t_i} - S_{t_{i-1}} \right|^2 > 0 \right) = 1 $$
The quadratic variation therefore converges to a positive well defined random variable.
3. The first order variation diverges
$$ V^1 = \sum_{i=1}^{n} \left| S_{t_i} - S_{t_{i-1}} \right| \to \infty \tag{2.11} $$
4. All variations of greater order than $V^2$ tend to 0, and so contain no useful
information. Thus the martingale is completely defined by its first two moments.
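Properties 2 and 3 can be observed numerically. The sketch below, which assumes a standard Brownian motion as the continuous martingale and a uniform partition (both our own choices for the example), accumulates the first order and quadratic variations of one simulated path.

```python
import math
import random


def variations(n_steps, t_end=1.0, seed=0):
    """Return (V1, V2) for one Brownian path on a uniform grid of [0, t_end].

    V1 is the sum of absolute increments (first order variation) and
    V2 the sum of squared increments (quadratic variation).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    v1 = 0.0
    v2 = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment over one grid interval
        v1 += abs(dw)
        v2 += dw * dw
    return v1, v2
```

Calling `variations(100)` and then `variations(10000)` draws a new path on each grid rather than refining the same one, but the pattern is the same: V2 stays near the length of the interval, while V1 keeps growing as the partition becomes finer.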
2.4 Brownian Motion
A Brownian motion, also known as a Wiener Process, is a continuous process $S_t$ whose increments $\Delta S_t$
are normally distributed, i.e. they are Gaussian.
2.4.1 Characteristics:
1. W0 = 0
2. Continuous square integrable function in time
3. Independent increments dW: for any three instants of time t < U < V,
$(W_U - W_t)$ is independent of $(W_V - W_U)$ and independent of $(W_t - W_0) = W_t$.
This implies that any model based on a Brownian motion parameter will be
independent of all that has occurred in the past. Only future events are of any
concern.
4. dWt2 = dt
5. Follows a Normal probability distribution, under the probability space Ω
$$ (W_U - W_T) \sim N(0,\, U - T) \tag{2.12} $$
$$ P\left( W_T - W_U \in A \right) = \int_A \frac{1}{\sqrt{2\pi |T - U|}}\; e^{-\frac{1}{2}\frac{x^2}{|T - U|}}\,dx \tag{2.13} $$
The mean or expectancy of any given increment in the Brownian variable is
$$ \mu = E(W_U - W_T) = 0 $$
The variance is the time interval itself
$$ \text{Variance} = U - T = \int_T^U dt $$
This is easily deduced following the subsequent logic.
$$ W_U = W_T + (W_U - W_T) $$
$$ E_T(W_U) = E(W_T) + E(W_U - W_T), \quad U > T \tag{2.14} $$
Where we have initially defined $E(W_U - W_T) = 0$ and where we know
$E(W_T) = W_T$ as it is calculated for the present instant of time at which all data is
known. Therefore:
$$ E_T(W_U) = W_T, \quad U > T \tag{2.15} $$
This is precisely the martingale property introduced in (2.3): a variable whose
future expectation is exactly equal to its current value. A Brownian motion is therefore
always a martingale.
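The statements (2.12) and (2.15) can be checked with a small Monte Carlo sketch. The helper names below are hypothetical, and the increments are drawn directly from the N(0, U − T) law, so the experiment illustrates the result rather than proving it.

```python
import math
import random
import statistics


def brownian_increments(T, U, n_paths, seed=0):
    """Draw n_paths samples of the increment W_U - W_T ~ N(0, U - T)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(U - T)) for _ in range(n_paths)]


def conditional_expectation(w_T, T, U, n_paths, seed=0):
    """Monte Carlo estimate of E_T[W_U] given the observed value W_T = w_T."""
    increments = brownian_increments(T, U, n_paths, seed)
    # W_U = W_T + (W_U - W_T), and the increment is independent of the past
    return statistics.fmean([w_T + dw for dw in increments])
```

For example, with T = 0.5 and U = 2.0 the sample mean of the increments should be near 0 and their sample variance near 1.5, while `conditional_expectation(0.7, 0.5, 2.0, 5000)` should stay close to the observed value 0.7.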
2.5 Stochastic Differential Equation
We will start by analysing the simplest stochastic differential equation possible.

Recall that we have already encountered it under the name of an Ornstein Uhlenbeck
process:
df = α (t ) dt + β (t ) dWt (2.16)
Where α(t) is known as the drift term, and β(t) as the diffusion. It is the diffusion
term, through the Brownian motion, that introduces the randomness in this dynamics.
We will start by taking both α(t) and β(t) as being deterministic, as this is the
simplest scenario that we can possibly imagine. In such a case, the solution to the
integral form of the stochastic differential equation is relatively straightforward:
· $\int_0^t \alpha(s)\,ds$ has a classic, deterministic solution, or at its worst, can be solved
numerically through a Riemann sum, as
$$ \lim_{n \to \infty} \sum_{k=1}^{n} \alpha(s_{k-1,k})\,(t_k - t_{k-1}) \tag{2.17} $$
with $s_{k-1,k}$ a point in the interval $[t_{k-1}, t_k]$.
· $\int_0^t \beta(s)\,dW_s$ is probabilistic. It follows a Normal (Gaussian) distribution with
zero mean and a variance that is easily calculated, thus leaving
$$ \int_0^t \beta(s)\,dW_s \sim N\left( 0, \int_0^t \beta^2(s)\,ds \right) $$
where we have used the fact that $dW^2 = dt$.
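The Gaussian law of the stochastic term can be illustrated numerically. The following sketch approximates the integral of β(s) against dW by a left-point sum for a deterministic β; the function name, grid size and choice β(s) = s in the usage note are assumptions made for the example only.

```python
import math
import random
import statistics


def ito_integral_samples(beta, t_end, n_steps, n_paths, seed=0):
    """Sample the sum of beta(t_{k-1}) * (W_{t_k} - W_{t_{k-1}}) over [0, t_end].

    beta is a deterministic function of time, evaluated at the left end
    point of each interval, as required by the Ito integral.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    samples = []
    for _ in range(n_paths):
        total = 0.0
        for k in range(n_steps):
            total += beta(k * dt) * rng.gauss(0.0, math.sqrt(dt))
        samples.append(total)
    return samples
```

With β(s) = s on [0, 1], the samples returned by `ito_integral_samples(lambda u: u, 1.0, 100, 3000)` should have a mean near 0 and a variance near 1/3, which is the integral of s² over [0, 1].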
2.6 Risk Neutral Probability
The difficulty when dealing with real probability is the fact that each investor
has a different inherent risk profile. This means that each will have a different
willingness to invest.
Imagine that we have a number of investors and each one of them faces the
following situation:
Each investor starts off with 100,000$. He stands a 10% chance that if he
invests, his returns will be of 1M $, whereas he faces a 90% chance that his return will
be 0 $.
As each investor has a different risk profile, they will each price the product
differently- or in other words, they will each be willing to pay a different price with
respect to the 100,000 $ stated to enter this ‘roulette’ game. The reader himself may be
willing to enter the game at a price of only 10,000 $ whereas another more risk averse
may find 90,000 $ a more suitable price for the game. Thus we realize that it is too
arbitrary to associate a price to a product if we rely on real probability.
The risk neutral probability arises as one in which the risk profile is
homogeneous. This is, that all assets present the same instantaneous yield throughout
time, and are all infinitely liquid. They must therefore all present the same rt.
Under the risk neutral probability, any tradable asset follows the dynamics
dS = rt Sdt + σ (t )dWt P (2.18)
Where σ is known as the volatility process associated to the asset S.
Here rt is the instantaneous rate defined in the discount factor section. It is time
dependent, but independent of the underlying asset S. We can see clearly in the above
formula that dS is a stochastic process, as it depends on dWtP.
In this risk neutral probability space, if we were to select another asset S2, then it
too would present the same instantaneous yield rt, despite the fact that it would
probably follow a different random process, dW2P
2.7 Solving Stochastic Differential Equations
On multiple occasions, we must resort to a change of variable so as to simplify the

more complex stochastic differential equations, attempting to transform them back
into the ideal Ornstein Uhlenbeck formulation seen before (2.1). For this reason we use
two principal methods: Ito’s Lemma and the Girsanov Theorem.
2.8 Ito’s Lemma
Ito’s Lemma is the stochastic version of the chain rule in classical calculus. We must firstly
realize that whereas partial derivatives are valid for stochastic calculus (just as they
are for classical calculus), total derivatives and the chain rule itself are no longer
applicable. That is, considering $f(t, S_t)$:
Partial derivatives (valid):
$$ f_t = \frac{\partial f}{\partial t}, \qquad f_S = \frac{\partial f}{\partial S_t} $$
Total derivatives (not valid):
$$ df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial S_t}\,dS_t $$
Chain rule (not valid):
$$ \frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\frac{dS_t}{dt} $$
Let $S = S(t, dW_t)$ where $dW_t$ is a Brownian process. Let us recall the stochastic
differential equation:
$$ dS_t = \alpha_t\,dt + \sigma_t\,dW_t \tag{2.19} $$
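If we perform a Taylor expansion on the function f(t, S) with respect to the two
variables S and t, then
$$ f_t = f_{t-1} + \frac{\partial f_t}{\partial t}\,dt + \frac{\partial f_t}{\partial S_t}\,dS_t + \frac{1}{2}\frac{\partial^2 f_t}{\partial t^2}(dt)^2 + \frac{1}{2}\frac{\partial^2 f_t}{\partial S_t^2}(dS_t)^2 + \frac{\partial^2 f_t}{\partial S_t \partial t}\,dt\,dS_t + R \tag{2.20} $$
Replacing with our diffusion equation, we obtain
$$ df_t = \frac{\partial f_t}{\partial t}\,dt + \frac{\partial f_t}{\partial S_t}(\alpha_t\,dt + \sigma_t\,dW) + \frac{1}{2}\frac{\partial^2 f_t}{\partial t^2}\,dt^2 + \frac{1}{2}\frac{\partial^2 f_t}{\partial S_t^2}(\alpha_t\,dt + \sigma_t\,dW)^2 + \frac{\partial^2 f_t}{\partial S_t \partial t}\,dt\,(\alpha_t\,dt + \sigma_t\,dW) + R \tag{2.21} $$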
In general, with Taylor expansions we decide to at least maintain the first order
terms, both in S and in t. However, we must be particularly careful with this
simplification:
• Our variable t is deterministic, meaning that as with classic Taylor expansions, we
can ignore powers of second order or higher. $S_t$ however is a
random process as it depends on dW, where we must recall that $dW^2 = dt$. The term
$(dS_t)^2 = \sigma^2\,dt$ is therefore equivalent to a first order term in t and thus cannot be ignored. Therefore, from
(2.20) and (2.21) we retain all the first order elements, that are
$$ df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial S}\,dS + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial S^2}\,dt \tag{2.22} $$
The first two terms correspond to a classical differentiation. The last term is
known as Ito’s term, where we have already replaced $dW^2$ by dt. We can now replace
in (2.22) the diffusion dS of the initial function (2.19), obtaining:
$$ df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial S}(\alpha\,dt + \sigma\,dW_t) + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial S^2}\,dt = \left( \frac{\partial f}{\partial t} + \alpha\frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial S^2} \right)dt + \sigma\frac{\partial f}{\partial S}\,dW_t \tag{2.23} $$
Ito’s Lemma is mainly used when applying a change in variable to differential

equations. It takes a function f(t, S), that depends on both time and the stochastic
function S and then writes the diffusion of the new variable f(t, S) in terms of S and the
old function’s diffusion dynamics.
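As a short worked illustration of such a change of variable (the function below is our own choice and does not appear in the original derivation), take $f(t, S) = S^2$ with the diffusion (2.19). Keeping the terms of the Taylor expansion (2.20) up to order dt and using $dW^2 = dt$, so that $(dS)^2 = \sigma^2\,dt$:

```latex
\begin{aligned}
f(t,S) &= S^2, \qquad
\frac{\partial f}{\partial t} = 0, \quad
\frac{\partial f}{\partial S} = 2S, \quad
\frac{\partial^2 f}{\partial S^2} = 2 \\
df &= 2S\,dS + \tfrac{1}{2}\cdot 2\,(dS)^2 \\
   &= 2S\left(\alpha\,dt + \sigma\,dW_t\right) + \sigma^2\,dt \\
   &= \left(2\alpha S + \sigma^2\right)dt + 2\sigma S\,dW_t
\end{aligned}
```

The extra drift $\sigma^2\,dt$ is Ito’s term: it would be absent if the classical chain rule were applied.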
Ito’s formula can be used to evaluate Integrals. The general procedure is the
following:
· To guess a form for f(Wt, t)
· Then apply Ito’s Lemma to obtain the standard differential equation
· Integrate both sides of the equation.
· Rearrange the resulting equation.
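We proceed to provide a simple practical example. Suppose that we would like
to solve $\int_0^t W_s\,dW_s$. Classical calculus would suggest
$$ d\left( \frac{W^2}{2} \right) = W\,dW $$
so we guess
$$ f(t, W_t) = \frac{W_t^2}{2} $$
Applying Ito’s Lemma,
$$ df = 0 + W\,dW + \frac{1}{2}\,dt $$
integrating and then rearranging
$$ \int W\,dW = \int df - \frac{1}{2}\int dt $$
$$ \int W\,dW = f - \frac{1}{2}t $$
Substituting f again we obtain
$$ \int W\,dW = \frac{W^2}{2} - \frac{1}{2}t $$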
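This result can be verified numerically on a single simulated path. The sketch below, whose function name and grid size are assumptions for the example, builds the left-point sum that defines the Ito integral and returns it together with the terminal value of the Brownian path.

```python
import math
import random


def ito_sum_w_dw(n_steps, t_end=1.0, seed=0):
    """Approximate the Ito integral of W dW over [0, t_end] on one path.

    Returns (ito_sum, w_end), where ito_sum accumulates W_{t_{k-1}} times
    the next increment and w_end is the terminal value W_{t_end}.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    w = 0.0
    ito_sum = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += w * dw   # integrand taken at the left end point
        w += dw
    return ito_sum, w
```

For a fine grid, `ito_sum` should agree closely with `0.5 * w_end**2 - 0.5 * t_end`, and not with the classical guess `0.5 * w_end**2`.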
2.9 Stochastic Integral
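The stochastic integral is nothing more than the integrated form of Ito’s equation (2.23).
$$ \int_0^t df = \int_0^t \left( \frac{\partial f}{\partial t} + \alpha\frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial S^2} \right)dt + \int_0^t \frac{\partial f}{\partial S}\,\sigma\,dW_t \tag{2.24} $$
Which rearranged solves the stochastic integral in terms of deterministic,
temporal terms.
$$ \int_0^t \frac{\partial f}{\partial S}\,\sigma\,dW_t = f(S_t, t) - f(S_0, 0) - \int_0^t \left( \frac{\partial f}{\partial t} + \alpha\frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial S^2} \right)dt \tag{2.25} $$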
2.10 Girsanov’s Theorem
Girsanov’s Theorem is generally used when we seek to change the probability

space in which we work. Thus, if we have a Brownian variable under the probability
measure P and want to transform it to the probability measure Q, we then simply
perform:
dW P = dW Q + λ ( P, Q )dt (2.26)
The result is particularly useful when seeking to change the drift term in an
equation. However, it cannot simplify the diffusion term.
$$ df = \alpha\,dt + \beta\,dW^P = \alpha\,dt + \beta\left( dW^Q + \lambda(P, Q)\,dt \right) = (\alpha + \beta\lambda)\,dt + \beta\,dW^Q \tag{2.27} $$
2.10.1 Radon Nikodym Derivative:
States that $d\tilde{P} = \xi(Z_t)\,dP$. This implies that the two are equivalent
probability measures if and only if it is possible to go back and forth between the two
measures, thus
$$ d\tilde{P} = \xi(Z_t)\,dP \;\leftrightarrow\; dP = \xi^{-1}(Z_t)\,d\tilde{P} \tag{2.28} $$
This also implies that
$$ \tilde{P}(dZ) > 0 \;\leftrightarrow\; P(dZ) > 0 \tag{2.29} $$
The two can only exist if $\tilde{P}$ assigns a 0 probability at the same time as P.
Thus with this new knowledge about the Radon Nikodym derivative we can
revise what we had said about Girsanov’s Theorem. We can now say:
Girsanov’s Theorem states that given a Wiener process $W_t$, we can always
multiply this process by a probability distribution $\xi(Z_t)$ to convert it to a different
Wiener process $\tilde{W}_t$. The two Brownians will be related by
$$ d\tilde{W}_t = dW_t - S_t\,dt \tag{2.30} $$
and their probability measures by
$$ d\tilde{P}(\tilde{W}_t) = \xi(W_t)\,dP(W_t) \tag{2.31} $$
Notice that $\xi(Z_t)$ is a martingale with $E[\xi(Z_t)] = 1$, and the product
$S_t \cdot \xi(Z_t)$ is now also a martingale.
Notice also that both $W_t$ and $\tilde{W}_t$ are Brownian motions, meaning that they have 0
mean. However, they are related by the expression $d\tilde{W}_t = dW_t - S_t\,dt$. How can this
be possible? The answer lies in the fact that $\tilde{W}_t$ has 0 mean under its probability
measure $\tilde{P}$, and $W_t$ under its respective measure P. Therefore, they do not
simultaneously present 0 means with respect to one common probability measure.
We define the random process $\xi(Z_t)$ as the stochastic exponential of $S_t$ with
respect to W:
$$ \xi(Z_t) = \exp\left( \int_0^t S_u\,dW_u - \frac{1}{2}\int_0^t S_u^2\,du \right), \quad t \in [0, T] \tag{2.32} $$
(Notice that $\xi_0(Z_0) = 1$.)
There are several conditions that must always hold true:
• $S_t$ must be known exactly, given the information set $\tilde{F}_t$.
• $S_t$ must not vary much: $E\left[ e^{\int_0^t S_u^2\,du} \right] < \infty$
But these conditions do not yet assure us that $S_t$ is a martingale. A condition that
is sufficient for $\xi_t(Z_t)$ to satisfy the hypothesis of Girsanov’s theorem is to
assume
$$ E\left[ e^{\frac{1}{2}\int_0^t S_u^2\,du} \right] < \infty \tag{2.33} $$
With this, $\xi_t(Z_t)$ is a martingale if S is deterministic.
2.10.2 Novikov condition:
The above conditions imply that ξ ( Z t ) is a square integrable martingale with
respect to the probability P.
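Proof
We proceed now to provide a demonstration for the deterministic scenario. For a
more generic approach, see [Karatzas 1988].
Given an earlier time s < t, and writing $\xi_t$ for $\xi_t(Z_t)$,
$$ E\left( \xi_t \mid \tilde{F}_s \right) = e^{-\frac{1}{2}\int_0^t S_u^2\,du}\; E\left( e^{\int_0^t S_u\,dW_u} \,\Big|\, \tilde{F}_s \right) = e^{-\frac{1}{2}\int_0^t S_u^2\,du}\; e^{\int_0^s S_u\,dW_u}\; e^{\frac{1}{2}\int_s^t S_u^2\,du} \tag{2.34} $$
If the last term is finite, then
$$ E\left( \xi_t \mid \tilde{F}_s \right) = e^{-\frac{1}{2}\int_0^s S_u^2\,du}\; e^{\int_0^s S_u\,dW_u} = \xi_s \tag{2.35} $$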
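The two facts used above, namely that the stochastic exponential has unit mean and that the shifted Brownian motion has zero mean under the new measure, can be checked by Monte Carlo in the simplest deterministic case. The sketch below assumes a constant $S_u = s$, which is our own simplification; the function name is hypothetical.

```python
import math
import random


def girsanov_check(s, t_end, n_paths, seed=0):
    """Monte Carlo check of the change of measure for a constant S_u = s.

    Returns (mean of xi, mean of xi * W~), both computed under P, where
    xi = exp(s * W_t - 0.5 * s**2 * t) as in (2.32) and W~_t = W_t - s * t
    as in (2.30). The second value estimates the mean of W~_t under P~.
    """
    rng = random.Random(seed)
    sum_xi = 0.0
    sum_xi_w_tilde = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(t_end))          # W_t sampled under P
        xi = math.exp(s * w - 0.5 * s * s * t_end)    # density d(P~)/dP
        sum_xi += xi
        sum_xi_w_tilde += xi * (w - s * t_end)        # reweighted W~_t
    return sum_xi / n_paths, sum_xi_w_tilde / n_paths
```

For instance, `girsanov_check(0.5, 1.0, 20000)` should return a first value near 1 and a second value near 0, even though the plain average of $\tilde{W}_t$ under P would be −0.5.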
2.11 Martingale Representation Theorem
Recall that from the Doob Meyer Decomposition, we were able to convert any
asset price $S_t$ into a martingale by simply separating it into a right continuous
martingale component $M_t$ and an increasing process $A_t$ that was measurable under the
information set $\tilde{F}_t$. This was
$$ S_t = M_t + A_t $$
Let us consider now that we have
$$ M_{t_k} = M_{t_0} + \sum_{i=1}^{k} H_{t_{i-1}} \left[ Z_{t_i} - Z_{t_{i-1}} \right] \tag{2.36} $$
or equivalently, its integral form
$$ M_t = M_0 + \int_0^t H_u\,dZ_u \tag{2.37} $$
• $H_{t_{i-1}}$ is any random variable adapted to $\tilde{F}_{t_{i-1}}$. Each $H_{t_{i-1}}$ is constant because it is
entirely known at time $t_{i-1}$.
• $Z_{t_i} - Z_{t_{i-1}}$ is the increment of any martingale with respect to $\tilde{F}_t$ and P. These increments are unknown and
unpredictable.
Given the above conditions, $M_{t_k}$ is also a martingale with respect to $\tilde{F}_t$.
Proof:
If we calculate the expectation of the above (2.36), then
$$ E_{t_0}\left[ M_{t_k} \right] = E_{t_0}\left[ M_{t_0} \right] + E_{t_0}\left[ \sum_{i=1}^{k} E_{t_{i-1}}\left[ H_{t_{i-1}} \left( Z_{t_i} - Z_{t_{i-1}} \right) \right] \right] \tag{2.38} $$
$M_{t_0}$ is known at time $t_0$, meaning that its expectation is its own value. The same is
true for $H_{t_{i-1}}$ when we apply the expectation operator on it for time $t_{i-1}$. It too can
therefore exit the operator.
$$ E_{t_0}\left[ M_{t_k} \right] = M_{t_0} + E_{t_0}\left[ \sum_{i=1}^{k} H_{t_{i-1}} \cdot E_{t_{i-1}}\left[ Z_{t_i} - Z_{t_{i-1}} \right] \right] \tag{2.39} $$
Now from the martingale properties (2.4) we know that $E_{t_{i-1}}\left[ Z_{t_i} - Z_{t_{i-1}} \right] = 0$. We
are therefore left with
$$ E_{t_0}\left[ M_{t_k} \right] = M_{t_0} \tag{2.40} $$
which according to the definition of a martingale (2.3), implies that $M_{t_k}$ is itself a
martingale.
We can therefore re-write the Doob-Meyer martingale decomposition as:
$$ S_t = M_t + A_t = \left( M_0 + \int_t^T H(M_s)\,dZ_s \right) + \int_t^T \alpha_s\,ds \tag{2.41} $$
This formulation is what is known as the martingale representation theorem.
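The discrete construction (2.36) lends itself to a direct numerical check of (2.40). In the sketch below, Z is taken to be a Brownian motion and the adapted weight is chosen as $H_{t_{i-1}} = Z_{t_{i-1}}$; both are assumptions made purely for illustration, as are the function names.

```python
import math
import random


def martingale_transform(n_steps, t_end, rng, m0=0.0):
    """Build M_{t_k} = M_{t_0} + sum of H_{t_{i-1}} * (Z_{t_i} - Z_{t_{i-1}}).

    Z is a Brownian motion and H_{t_{i-1}} = Z_{t_{i-1}}, so each weight is
    fully known before the corresponding increment is drawn.
    """
    dt = t_end / n_steps
    z = 0.0
    m = m0
    for _ in range(n_steps):
        h = z                                # known at the start of the step
        dz = rng.gauss(0.0, math.sqrt(dt))   # unpredictable increment of Z
        m += h * dz
        z += dz
    return m


def mean_terminal_value(n_paths, n_steps=50, t_end=1.0, m0=0.0, seed=0):
    """Average of M_{t_k} over n_paths independent paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += martingale_transform(n_steps, t_end, rng, m0)
    return total / n_paths
```

Starting from `m0=1.0`, the average returned by `mean_terminal_value(4000, m0=1.0)` should remain close to 1.0, in line with (2.40).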
The martingale component $M_t$ represents the degree of deviation about a given
trend, and is composed of H, a function adapted to $M_s$ (that is in turn adapted to $\tilde{F}_s$),
$Z_s$, which is already a martingale given $\tilde{F}_s$ and P, and a constant component $M_0$.
The trend $A_t$ is obtained by the integration of $\alpha_t$, which is a known, measurable
process given $\tilde{F}_s$.
2.12 Major Stochastic Differential Equations
We shall now proceed to give a brief outline of the major existing stochastic
differential equations. Many of the more complex developments can be grouped or
transformed into these more simple formulations.
2.12.1 Linear constant coefficients
Parameters: µ drift, σ diffusion
This model has the following well known stochastic differential equation
$$ dS_t = \mu\,dt + \sigma\,dW_t^P \tag{2.42} $$
Whose solution, for an initial value $S_0$, is
$$ S_t = S_0 + \mu t + \sigma W_t \tag{2.43} $$
It is entirely defined by the measures
mean: $E[dS_t] = \mu\,dt$
variance: $\text{Variance}(dS_t) = \sigma^2\,dt$
σ is called the normal volatility of the process
Fig. 2.2. Linear Stochastic Differential Equation dynamics (figure labels: S0, µdt)
The variable $S_t$ fluctuates around a straight line, whose slope is µ.
The fluctuations do not increase over time, and have no systematic jumps. The
variations are therefore entirely random.
2.12.2 Geometric Brownian Motion
It is the basis of the Black Scholes model, which as we shall see, has the following
stochastic differential equation:
$$ dS_t = \mu S_t\,dt + \sigma S_t\,dW_t^P \tag{2.44} $$
$$ S_t = S_0\, e^{\left( \mu - \frac{1}{2}\sigma^2 \right)t + \sigma W_t} \tag{2.45} $$
σ is called the volatility of the process
$$ \text{Variance}(dS_t) = \sigma^2 S_{t-1}^2 $$
$S_t$ has an exponential growth rate µ
The variance of its random fluctuations increases with time
The ratio of the change has constant parameters:
$$ \frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t^P $$
Fig. 2.3. Geometric Stochastic Differential Equation dynamics (figure label: S0)
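The closed form (2.45) can be compared against a step-by-step simulation of (2.44) driven by the same Brownian path. The sketch below uses an Euler scheme; the function names and parameter values in the usage note are assumptions chosen for the example.

```python
import math
import random


def gbm_exact(s0, mu, sigma, t, w_t):
    """Closed-form solution (2.45) evaluated for a given Brownian value W_t."""
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w_t)


def gbm_euler(s0, mu, sigma, t_end, n_steps, seed=0):
    """Euler discretisation of dS = mu S dt + sigma S dW.

    Returns (S at t_end, W at t_end) so that the result can be compared
    with the closed form on the same path.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    s = s0
    w = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s += mu * s * dt + sigma * s * dw   # drift and diffusion scale with S
        w += dw
    return s, w
```

For example, after `s_euler, w = gbm_euler(100.0, 0.05, 0.2, 1.0, 5000)`, the value `gbm_exact(100.0, 0.05, 0.2, 1.0, w)` should differ from `s_euler` only by a small discretisation error. Note that the exponent carries the correction −σ²/2, which is Ito’s term.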
2.12.3 Square Root
This is the same as the Black Scholes model only changing the variance to a
square root form as its name implies
$$ dS_t = \mu S_t\,dt + \sigma \sqrt{S_t}\,dW_t^P \tag{2.46} $$
$$ \text{Variance}(dS_t) = \sigma^2 S_{t-1} $$
$S_t$ has an exponential growth rate µ
Its random fluctuations are much smaller than in the Black Scholes model.
Fig. 2.4. Square Root Stochastic Differential Equation dynamics (figure label: S0)
2.12.4 Mean Reverting Process:
dSt = λ ( µ − St ) dt + σ St dWt P
(2.47)
dSt = λ ( µ − St ) dt + σ St dWt P
As St falls below the mean µ, the term (µ- St) becomes positive and so tends to
make dSt more positive still, attempting to ‘revert’ the dynamics towards its mean
trend. σ is still called the volatility of the process.
The speed of this reversion is defined by the parameter λ
The Ornstein Uhlenbeck as seen in (2.1) is therefore a particular case of this mean
reverting process
2.12.5 Stochastic Volatility
Consists in using a volatility parameter that is in itself time dependent and
random.
$$ dS_t = \mu S_t\,dt + \sigma_t S_t\,dW_t^P $$
$$ d\sigma_t = \lambda(\sigma_0 - \sigma_t)\,dt + \beta \sigma_t\,dZ_t^P \tag{2.48} $$
If $\sigma_t = \sigma(t, S_t)$ is deterministic then we have a local volatility model.
The particularity of the above model lies in the fact that the volatility depends on
a different Brownian motion Z. Thus a new source of stochasticity is introduced into
the model that is different from that which we previously had: W, which before was
used to exclusively determine the underlying.
In the above, we have created a dynamics where the volatility has a mean
reverting process. Any other diffusion equation could also be considered, with the
simple addition of a stochastic volatility in the last term.
Chapter 3 Historical Models
3. Historical Models
We will now proceed to present the historical development of the major models
that have been used in the world of finance. We will follow a simplistic approach,
stating the principal characteristics of each and discussing their flaws. Our aim here is
to show the logical development of ideas from one model to the next. This is to say, to
show how each model builds on the previous and attempts to solve its intrinsic
problems. We hope that with this ‘time-travel’ the reader will be able to finally arrive
at the current date in which this project was written, and to recognise our project’s
developments as an intuitive ‘next step’ to what had previously been done.
3.1 The Black Scholes Model
The Black Scholes model was developed in the early 1970’s by Fisher Black, Myron Scholes and Robert
Merton. It soon became a major breakthrough in the pricing of stock options. The
model has since had a huge influence on the way that traders price and hedge options.
It has also been pivotal to the growth and success of financial engineering in the 1980’s
and 1990’s. In 1997 the importance of the model was recognised when Robert Merton
and Myron Scholes were awarded the Nobel Prize for economics. Fisher Black had
sadly died in 1995. Undoubtedly, he would have otherwise also been a recipient of the
prize.
The equation was developed under the assumption that the price fluctuations of
the underlying security of an option can be described by an Ito process. Let the price S
of an underlying security be governed by a geometric Brownian motion process over a
time interval [0, T]. Then the price may be described as
dSt = rSt dt + β (t ) dWt P (3.1)
where W is a standard Brownian motion (or a Wiener Process). r is the risk

neutral rate of a risk free asset (a bond) over [0,T]. The value of the bond satisfies the
well known dynamics dB = rBdt. It is the theory behind the rate r that truly won
Black, Scholes and Merton the Nobel Prize.
We also impose that β(t) = St σ. Thus, we have
dSt = rSt dt + σ St dWt P (3.2)
We cannot solve the above dynamics directly because the diffusion term is not
deterministic - St is stochastic, despite the fact that σ is constant and deterministic in a
first simplistic approach.
As previously shown, in stochastic calculus we seek to transform our expression
into an Ornstein Uhlenbeck expression. Thus we seek
$$ \frac{dS_t}{S_t} = r\,dt + \sigma\,dW_t^P \tag{3.3} $$
3.1.1 Solving the PDE
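The easiest setting to tackle is therefore that in which r is deterministic and σ is
constant. Then applying Ito’s Lemma, we have
$$ dg = \frac{\partial g}{\partial t}\,dt + \frac{\partial g}{\partial S_t}\,dS_t + \frac{1}{2}\frac{\partial^2 g}{\partial S_t^2}\,\sigma^2 S_t^2\,dt \tag{3.4} $$
where
$$ g(S_t, t) = \mathrm{Ln}\,S_t \tag{3.5} $$
$$ \frac{\partial g}{\partial t} = 0, \qquad \frac{\partial g}{\partial S_t} = \frac{1}{S_t}, \qquad \frac{\partial^2 g}{\partial S_t^2} = -\frac{1}{S_t^2} $$
$$ d\,\mathrm{Ln}\,S_t = 0\,dt + \frac{1}{S_t}\,dS_t + \frac{1}{2}\left( \frac{-1}{S_t^2} \right)\sigma^2 S_t^2\,dt \tag{3.6} $$
Substituting dS we now have
$$ d\,\mathrm{Ln}\,S_t = \frac{1}{S_t}\left( r S_t\,dt + \sigma S_t\,dW_t^P \right) - \frac{1}{2}\sigma^2\,dt \tag{3.7} $$
Integrating between 0 and T, we obtain
$$ S_T = S_0\, e^{\int_0^T r_t\,dt - \frac{\sigma^2 T}{2} + \sigma W_T^P} \tag{3.8} $$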
Having obtained a solution to the diffusion of the asset $dS_t$, we can now price a
Call of strike K and yield r. But before, we must realize the following:
As a direct consequence of the expression
$$ dS_t = r S_t\,dt + \beta(t)\,dW_t^P \tag{3.9} $$
we know that the expectation of any such tradable asset is, for U > T,
$$ S_T = E_T^P\left[ S_U\, e^{-\int_T^U r_s\,ds} \right] \tag{3.10} $$
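Proof
$$ d\,\mathrm{Ln}\,S_t = r\,dt + \frac{\beta(t)}{S_t}\,dW_t^P - \frac{\beta^2(t)}{2 S_t^2}\,dt \tag{3.11} $$
$$ S_U = S_T \exp\left( \int_T^U r\,dt + \int_T^U \frac{\beta(t)}{S_t}\,dW_t^P - \int_T^U \frac{\beta^2(t)}{2 S_t^2}\,dt \right) \tag{3.12} $$
$$ S_U\, e^{-\int_T^U r\,dt} = S_T \exp\left( \int_T^U \frac{\beta(t)}{S_t}\,dW_t^P - \int_T^U \frac{\beta^2(t)}{2 S_t^2}\,dt \right) \tag{3.13} $$
Taking expectations
$$ E_T^P\left[ S_U\, e^{-\int_T^U r\,dt} \right] = S_T\, E_T^P\left[ \exp\left( \int_T^U \frac{\beta(t)}{S_t}\,dW_t^P - \int_T^U \frac{\beta^2(t)}{2 S_t^2}\,dt \right) \right] = S_T \tag{3.14} $$
In the exponent, the first integral is the Gaussian term and the second is the corresponding variance term.
As the exponential is a martingale, its mean is 1.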
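The martingale property of the discounted asset can be illustrated with a short Monte Carlo sketch. It assumes the Black Scholes case β(t) = σS_t with a constant rate r, so that the terminal value can be sampled directly from (3.8); the function name and the parameter values in the usage note are our own.

```python
import math
import random


def discounted_mean(s0, r, sigma, t_end, n_paths, seed=0):
    """Monte Carlo estimate of E_0[S_T * exp(-r * T)] under the dynamics (3.2).

    Assumes constant r and sigma. By (3.10) the result should be close
    to the initial value s0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(t_end))   # W_T under P
        s_t = s0 * math.exp(r * t_end - 0.5 * sigma ** 2 * t_end + sigma * w)
        total += math.exp(-r * t_end) * s_t    # discount back to time 0
    return total / n_paths
```

For example, `discounted_mean(100.0, 0.05, 0.2, 1.0, 20000)` should return a value close to 100, whatever the rate r.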
3.1.2 Pricing a Call by Black Scholes1
Let us consider now the pricing of a Call Option. A Call gives the buyer of the
option the right, but not the obligation, to buy an agreed quantity of a particular
commodity or financial instrument (the underlying instrument) from the seller of the
option at a certain time (the expiration date) and for a certain price (the strike price).
The seller (or "writer") is obligated to sell the commodity or financial instrument
should the buyer so decide. The buyer pays a fee (called a premium) for this right. It
may be useful for the reader to jump momentarily to Chapter 6.1 at this point for a
detailed description of a Call option.
Suppose that we are holding a call option and that at time T the price of the
underlying asset is ST < K. In this case, we would not be interested in buying the asset
at price K. Thus we would not exercise the option, and our profit from this contract
would be 0. On the other hand, if ST > K, we would be ready to buy the asset at the
lower price K from the unfortunate underwriter of our call option, and then go on to
the market to sell the share of our underlying, so as to make a profit of ST - K.
Thus the value today of the call option is the discounted expectation of the benefit obtained at time T:
$$C_0 = E_0^P\left[e^{-\int_0^T r_t\,dt}\,(S_T - K)^+\right] \tag{3.15}$$
where $(S_T - K)^+ = \max[S_T - K, 0]$.
Note that exponentials are always greater than zero, so the discount factor can be brought inside the positive part without changing its sign:
$$C_0 = E_0^P\left[\left(e^{-\int_0^T r_t\,dt}\,S_T - K e^{-\int_0^T r_t\,dt}\right)^+\right] \tag{3.16}$$
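Let us define the indicator function
$$\mathbf{1}_{\{S_T > K\}} = \begin{cases} 1 & \text{if } S_T > K \\ 0 & \text{if } S_T < K \end{cases}$$
Then
$$C_0 = E_0^P\left[e^{-\int_0^T r_t\,dt}\,S_T \cdot \mathbf{1}_{\{S_T > K\}} - K e^{-\int_0^T r_t\,dt} \cdot \mathbf{1}_{\{S_T > K\}}\right] \tag{3.17}$$
1 Note that we could have also derived Black’s formula by using the equivalent martingale measure $\tilde{P}$.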
Now considering the second term in the above, the exponential (deterministic) is independent of $S_T$, so it can be extracted from the expectation since its value is known.
$$E_0^P\left[K e^{-\int_0^T r_t\,dt} \cdot \mathbf{1}_{\{S_T > K\}}\right] = K e^{-\int_0^T r_t\,dt}\, E_0^P\left[\mathbf{1}_{\{S_T > K\}}\right] = K e^{-\int_0^T r_t\,dt} \int_{-\infty}^{\infty} \mathbf{1}_{\{S_T > K\}}\,dP(S_T)$$
$$= K e^{-\int_0^T r_t\,dt} \int_K^{\infty} dP(S_T) = K e^{-\int_0^T r_t\,dt}\, \mathrm{prob}(S_T > K) \tag{3.18}$$
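Analysing the first term, we see that we are left with:
$$E_0^P\left[S_T\, e^{-\int_0^T r_t\,dt} \cdot \mathbf{1}_{\{S_T > K\}}\right] = e^{-\int_0^T r_t\,dt}\, E_0^P\left[S_T \cdot \mathbf{1}_{\{S_T > K\}}\right] = e^{-\int_0^T r_t\,dt} \int_K^{\infty} S_T\,dP(S_T) \tag{3.19}$$
Let us substitute $S_T$ with the formula that we derived previously in (3.8):
$$S_T = S_0 \exp\left(\int_0^T r_t\,dt - \frac{\sigma^2 T}{2} + \sigma W_T^P\right)$$
We then have
$$e^{-\int_0^T r_t\,dt} \int_K^{\infty} S_T\,dP(S_T) = e^{-\int_0^T r_t\,dt} \int_K^{\infty} S_0\, e^{\int_0^T r_t\,dt - \frac{\sigma^2 T}{2} + \sigma W_T^P}\,dP(S_T) \tag{3.20}$$
Now notice that
$$S_T > K \iff S_T\, e^{-\int_0^T r_t\,dt} > K e^{-\int_0^T r_t\,dt} \tag{3.21}$$
and, once again substituting $S_T$,
$$S_0\, e^{-\frac{\sigma^2 T}{2} + \sigma W_T^P} > K e^{-\int_0^T r_t\,dt} \iff \ln S_0 - \frac{\sigma^2 T}{2} + \sigma W_T^P > \ln K - \int_0^T r_t\,dt \tag{3.22}$$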
The property of Brownian motions seen in Chapter 2.4 is that they can be split into a normal and a temporal component. Thus if $W_T^P = U\sqrt{T}$ with $U \sim N(0,1)$, then
$$S_T > K \iff U > \frac{\ln\frac{K}{S_0} - \int_0^T r_t\,dt}{\sigma\sqrt{T}} + \frac{\sigma\sqrt{T}}{2} \tag{3.23}$$
We shall call
$$d_0 = \frac{\ln\frac{K}{S_0} - \int_0^T r_t\,dt}{\sigma\sqrt{T}} + \frac{\sigma\sqrt{T}}{2} \tag{3.24}$$
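Evidently, therefore, the option ends in the money when U > d0. We now have both the first and second terms in the expression (3.17), and can substitute them into the equation to find the call’s price discounted to present, assuming a deterministic rate r:
$$\begin{aligned}
C_0 &= \int_{d_0}^{\infty} S_0\, e^{-\frac{\sigma^2 T}{2} + \sigma U\sqrt{T}}\, \frac{e^{-\frac{U^2}{2}}}{\sqrt{2\pi}}\,dU - K e^{-\int_0^T r_t\,dt} \cdot \mathrm{prob}(U > d_0) \\
&= S_0 \int_{d_0}^{\infty} \frac{e^{-\frac{(U - \sigma\sqrt{T})^2}{2}}}{\sqrt{2\pi}}\,dU - K e^{-\int_0^T r_t\,dt} \cdot N(-d_0) \\
&= S_0 \int_{d_0 - \sigma\sqrt{T}}^{\infty} \frac{e^{-\frac{X^2}{2}}}{\sqrt{2\pi}}\,dX - K e^{-\int_0^T r_t\,dt} \cdot N(-d_0) \\
&= S_0\, N(\sigma\sqrt{T} - d_0) - K e^{-\int_0^T r_t\,dt} \cdot N(-d_0) \\
&= S_0\, N(d_1) - K e^{-\int_0^T r_t\,dt} \cdot N(d_2)
\end{aligned} \tag{3.25}$$
where N is the standard normal cumulative distribution function, $d_1 = \sigma\sqrt{T} - d_0$ and $d_2 = -d_0$.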
It is important to notice that the price of the Call depends only on
• the price of the underlying today, S0 (at t0)
• the distribution of probability of ST at its maturity T
• the discount factors
Therefore, all that occurs in between these two dates is irrelevant to us.
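As an illustration of formula (3.25), the following sketch evaluates the call price for a constant rate r, so that the integral of r over [0, T] is simply rT. The function names and the numerical inputs are assumptions made for the example; the quantities d0, d1 and d2 follow the definitions in (3.24) and (3.25).

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Call price from (3.25), assuming a constant rate and constant volatility."""
    sqrt_t = math.sqrt(maturity)
    # d0 as in (3.24), with the integral of r replaced by rate * maturity
    d0 = (math.log(strike / s0) - rate * maturity) / (sigma * sqrt_t) + 0.5 * sigma * sqrt_t
    d1 = sigma * sqrt_t - d0
    d2 = -d0
    discount = math.exp(-rate * maturity)
    return s0 * norm_cdf(d1) - strike * discount * norm_cdf(d2)

price = black_scholes_call(s0=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)
print(round(price, 4))  # approximately 10.4506
```

Note that only S0, the discount factor and the distribution of the underlying at T enter the computation; no intermediate dates are needed.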
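[Figure: path of the underlying from S0 at t0 to ST at maturity T, with the strike K and the payoff (ST − K) at T; horizontal axis: Time. The period between t0 and T is marked as irrelevant.]
Fig. 3.1. Only the present call value is relevant to compute its future price. Any intermediate time-periods are irrelevant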
3.2 Beyond Black
The Black Scholes model presents several difficulties that it cannot surmount. As
we have seen, in theory it has a single volatility σ for every K and maturity T.
This however, is not what is perceived by the traders in the markets. If we were to
set up a matrix of prices for K vs T for a given underlying asset, and perform the
inverse transformation of the Black Scholes formula so as to find their corresponding
σ, we would discover that their σ is not unique, and instead varies with K and T.
Therefore, the two main problems presented by the Black Scholes’ model are:
• Smile: for any given maturity T, there are different Black implied volatilities σBlack
for different strikes K.
• Term Structure: for any given strike K, there are different Black implied
volatilities σBlack for different maturities T.
[Figure: Black implied volatility σ against Strike, with one curve for a short maturity (smile) and one for a long maturity (skew).]
Fig. 3.2. Term Structure of Vanilla Options
The lack of a unique volatility as assumed by Black does not mean that the Black
Scholes model is useless to our case. As stated initially, the model has triggered an
enormous amount of research and revolutionised the practice of finance. Further, we
have learnt in the above development that we can successfully use the Black Scholes
model as a ‘translator’ between option prices and their σBlack. Both have parallel
dynamics, thus, any movement in prices will result in a similar movement in the
asset’s σBlack. We will explore the advantage of this property in the following
discussion.
3.2.1 Term Structure
For a given strike K, we may have the following structure:
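a product of maturity T1, with volatility σ1 on the interval [0, T1], of variance $\sigma_1^2 T_1$,
and for a different product:
a product of maturity T2, with volatility σ2 on the interval [0, T2], of variance $\sigma_2^2 T_2$.
As stated previously, Black’s implied volatility is not unique, therefore σ ≠ σ1 ≠ σ2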
3.2.2 Time Dependent Black
We can solve this inconvenience by modelling σ² as the mean value of the variance that is accumulated between 0 and T. This is simply obtained by replacing in the original equation
$$\sigma^2 T \to \int_0^T \sigma^2(t)\,dt \tag{3.26}$$
Thus we have a deterministic σ(t) yielding:
$$dS_t = rS_t\,dt + \sigma(t)\,S_t\,dW_t^P \tag{3.27}$$
which is known as ‘Time Dependent Black’.
We now have a unique σ(t) so can write
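for the product of maturity T1 (volatility σ1 on [0, T1]):
$$\mathrm{Var}_1 = \int_0^{T_1} \sigma^2(t)\,dt$$
and for the product of maturity T2 (volatility σ2 on [0, T2]):
$$\mathrm{Var}_2 = \int_0^{T_2} \sigma^2(t)\,dt = \int_0^{T_1} \sigma^2(t)\,dt + \int_{T_1}^{T_2} \sigma^2(t)\,dt = \mathrm{Var}_1 + \mathrm{Var}_{12}$$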
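The additivity of the variance can be used to back out the volatility that applies between T1 and T2 from the two quoted Black volatilities. The sketch below assumes that σ(t) is constant on the interval [T1, T2]; the function name and the sample quotes are hypothetical.

```python
import math

def forward_volatility(sigma1, t1, sigma2, t2):
    """Constant volatility on [t1, t2] implied by Var2 = Var1 + Var12.

    sigma1 and sigma2 are the Black volatilities quoted for maturities
    t1 < t2; Var_i = sigma_i^2 * t_i is the variance accumulated up to t_i.
    """
    if not 0.0 < t1 < t2:
        raise ValueError("maturities must satisfy 0 < t1 < t2")
    var1 = sigma1 ** 2 * t1
    var2 = sigma2 ** 2 * t2
    var12 = var2 - var1
    if var12 < 0.0:
        # Accumulated variance cannot decrease with maturity
        raise ValueError("quotes imply a negative forward variance")
    return math.sqrt(var12 / (t2 - t1))

print(forward_volatility(sigma1=0.20, t1=1.0, sigma2=0.25, t2=2.0))
```

A negative Var12 signals that the two quotes are mutually inconsistent under a deterministic σ(t), which is why the sketch raises an error in that case.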
3.2.3 Smile
We cannot use a similar change as in the previous case and convert our equation to
$$dS_t = rS_t\,dt + \sigma(K)\,S_t\,dW_t^P$$
This is because we do not know the strike K at which the asset will be sold in the market.
3.3 Lognormal Classic Black
We have so far seen that we can write the diffusion of the Black Scholes model as:
$$dS_t = rS_t\,dt + \sigma S_t\,dW_t^P \tag{3.28}$$
The second term is not normal as it includes a Brownian (which is normal), but also the term St which itself follows a stochastic diffusion. We can rewrite the above as
$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dW_t^P \tag{3.29}$$
We call it a lognormal model because subsequently, the term $dS_t / S_t$ is transformed through Ito into $d\ln S_t$. Therefore it is the log of St that is now normal, as its diffusion term depends only on the Brownian parameter and a constant sigma.
As the diffusion is constant in the lognormal model, when it is represented in a strike vs σBlack graph, we obtain a flat curve:
[Figure: σBlack against Strike K; the lognormal model gives a flat line.]
Fig. 3.3. Flat lognormal Black Volatilities
3.4 Normal Black
We could attempt to transform the probability distribution that our equation follows to a different form. Thus by taking:
$$dS_t = rS_t\,dt + \sigma\,dW_t^P \tag{3.30}$$
the second term now consists of a constant volatility attached to a Gaussian Brownian motion. This diffusion term is already normal without having to transform it to a lognormal version. The above is commonly rewritten as:
$$dS_t = rS_t\,dt + \sigma S_0\,dW_t^P \tag{3.31}$$
where we have included a constant term S0 so that the magnitude of the volatility is comparable to that in the classic Black Scholes model.
The previous formulation can be transformed to its lognormal version so as to compare its dynamics with the classic lognormal Black Scholes model:
$$\frac{dS_t}{S_t} = r_t\,dt + \sigma\,\frac{S_0}{S_t}\,dW_t^P \tag{3.32}$$
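[Figure, two panels: a) Call Price against Strike K for the lognormal and normal models; b) σBlack against Strike K, flat for the lognormal model and skewed for the normal model, with the ATM strike marked.]
Fig. 3.4. Normal and lognormal Black Scholes model comparison a) price vs strike b) black volatility vs strike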
The comparison between the two models must be made under the classic σBlack. As seen above, if we were to compare the two normal and lognormal models in terms of their Call prices, we would find it difficult to perceive any difference. Thus we realise the utility of using the σBlack: the σBlack allows us to compare and clearly distinguish models that are otherwise indistinguishable under their price measures. This matches up with our previous section’s discussion in which we questioned the utility of calculating σBlack if we knew that the Black Scholes model could not correctly model varying local volatilities.
The main problem with the normal Black model is the fact that it imposes a slope
that doesn’t always coincide with that which is observed in real markets. Thus a
logical continuation to the model is proposed.
3.5 Black Shifted
$$dS_t = rS_t\,dt + \sigma\left(\alpha S_t + (1 - \alpha)S_0\right)dW_t^P \tag{3.33}$$
This model allows for a variety of slopes ranging between the skewed normal and
the flat lognormal version. The parameter α acts as the weight associated to each of
the models. Market data shows that the general level of volatility is imposed at the
money ATM, and is basically independent of the α parameter.
[Figure: σBlack against Strike K, with the ATM strike marked; flat line for the lognormal model (α = 1) and skewed line for the normal model (α = 0).]
Fig. 3.5. Alpha skew modelling
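The interpretation of this new model is best made when analysed from the classic lognormal σBlack perspective:
$$\frac{dS_t}{S_t} = r\,dt + \sigma\left(\alpha + (1 - \alpha)\frac{S_0}{S_t}\right)dW_t^P \tag{3.34}$$
where the factor $\sigma\left(\alpha + (1 - \alpha)S_0/S_t\right)$ acts as an effective volatility $\sigma^*(S_t)$ multiplying $dW_t^P$.
Just as we had expected, we see that by varying the alpha parameter we are capable of adjusting our model to market slopes lying between the normal and the lognormal cases.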
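The role of α can be made concrete by evaluating the effective lognormal volatility of (3.34) at a few levels of the underlying. This is only a sketch of the diffusion coefficient, not of the resulting implied volatility curve; the function name and the sample values are illustrative.

```python
def shifted_black_lognormal_vol(spot, s0, sigma, alpha):
    """Effective lognormal volatility sigma * (alpha + (1 - alpha) * S0 / S_t) from (3.34)."""
    return sigma * (alpha + (1.0 - alpha) * s0 / spot)

s0, sigma = 100.0, 0.2
for alpha in (0.0, 0.5, 1.0):
    vols = [shifted_black_lognormal_vol(s, s0, sigma, alpha) for s in (80.0, 100.0, 120.0)]
    print(alpha, vols)
```

For α = 1 the effective volatility is flat in St (lognormal case), for α = 0 it decays as S0/St (normal case), and at St = S0 it equals σ whatever the value of α, consistent with the level being fixed at the money.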
3.6 Local Volatility - Dupire’s Model
$$\frac{dS_t}{S_t} = r\,dt + \sigma(t, S_t)\,dW_t^P \tag{3.35}$$
Recall that in Black’s model there is a one-to-one relationship between the price of
a European option and the volatility parameter σBlack. This was seen clearly in Fig. 3.3
where the lognormal Black model was a flat straight line, constant at all strikes K.
Consequently, option prices are often quoted by stating the implied volatility σBlack: the
unique value of the volatility which yields the option’s money price when used in
Black’s model. In theory, the volatility σBlack in Black’s model is a constant but as we
have already stated, in practice, options with different strikes K require different
volatilities σBlack to match their market prices. For example for a unique maturity,
market data may present this form:
Fig. 3.6. Market data smile
Handling these market skews and smiles correctly is critical to fixed income and
foreign exchange desks, since these desks usually have large exposures across a wide
range of strikes. Yet the inherent contradiction of using different volatilities for
different options makes it difficult to successfully manage these risks using Black’s
model.
The development of local volatility models by Dupire and Derman-Kani was a
major advance in handling smiles and skews. Local volatility models are self-
consistent, arbitrage-free, and can be calibrated to precisely match observed market
smiles and skews. Currently these models are the most popular way of managing
smile and skew risk. However, as we shall discover the dynamic behaviour of smiles
and skews predicted by local volatility models is exactly opposite to the behaviour
observed in the marketplace: when the price of the underlying asset decreases, local
volatility models predict that the smile shifts to higher prices; when the price
increases, these models predict that the smile shifts to lower prices. In reality, asset
prices and market smiles move in the same direction.
This contradiction between the model and the marketplace tends to de-stabilize
the delta and vega hedges derived from local volatility models, and often these
hedges perform worse than the naive Black-Scholes’ hedges.
3.6.1 Detailed Comparison
We will now advance to derive a more detailed comparison between the

traditional Black model and the Dupire model.
Consider a European call option on an asset A with exercise date $t_{ex}$, settlement date $t_{set}$, and strike K. If the holder exercises the option on $t_{ex}$, then on the settlement date $t_{set}$ he receives the underlying asset A and pays the strike K. To derive the value of the option, define $\hat{F}(t)$ to be the forward price of the asset for a forward contract that matures on the settlement date $t_{set}$, and define $f = \hat{F}(0)$ to be today’s forward price. Also let B(t) be the discount factor for date t; that is, let B(t) be the value today of 1 u.c. to be delivered on date t. Martingale pricing theory asserts that under the “usual conditions,” there is a measure, known as the forward measure, under which the value of a European option can be written as the expected value of the payoff. The value of a call option is
$$V_{call} = B(t_{set})\, E^P\left[\left(\hat{F}(t_{ex}) - K\right)^+ \,\middle|\, \mathcal{F}_0\right] \tag{3.36}$$
and the value of the corresponding European put is
$$V_{put} = B(t_{set})\, E^P\left[\left(K - \hat{F}(t_{ex})\right)^+ \,\middle|\, \mathcal{F}_0\right] \tag{3.37}$$
that is,
$$V_{put} = V_{call} + B(t_{set})\,(K - f) \tag{3.38}$$
(Refer to the Discount Factor section in Chapter 5.1 to learn that a future payoff
can be discounted to its present value by continuous compounding, which can be
equivalently expressed as a bond maturing at the future date).
Here the expectation E is over the forward measure, and $\mathcal{F}_0$ can be interpreted as “given all information available at t = 0.” Martingale pricing theory also shows that the forward price $\hat{F}(t)$ is a Martingale under this measure, so the Martingale representation theorem shows that $\hat{F}(t)$ obeys
$$d\hat{F}(t) = C(t, *)\,dW, \qquad \hat{F}(0) = f \tag{3.39}$$
for some coefficient C(t,*), where dW is a Brownian motion in this measure.

The coefficient C (t,* ) may be deterministic or random, and may depend on any
information that can be resolved by time t.
This is as far as the fundamental theory of arbitrage free pricing goes. In

particular, one cannot determine the coefficient C(t,*) on purely theoretical grounds.
Instead one must postulate a mathematical model for C (t, * ).
European swaptions fit within an identical framework. (Refer to the swaption description in Chapter 6.9). Consider a European swaption with exercise date $t_{ex}$ and fixed rate (strike) K. Let $\hat{R}_s(t)$ be the swaption’s forward swap rate as seen at date t, and let $R_0 = \hat{R}_s(0)$ be the forward swap rate as seen today. The value of a payer swaption is
$$V_{pay} = \sum_{i=0}^{n-1} mB(t; T_i) \cdot E\left[\left(\hat{R}_s(t_{ex}) - K\right)^+ \,\middle|\, \mathcal{F}_0\right] \tag{3.40}$$
and the value of a receiver swaption is
$$V_{rec} = \sum_{i=0}^{n-1} mB(t; T_i) \cdot E\left[\left(K - \hat{R}_s(t_{ex})\right)^+ \,\middle|\, \mathcal{F}_0\right] \tag{3.41}$$
that is,
$$V_{rec} = V_{pay} + \sum_{i=0}^{n-1} mB(t; T_i) \cdot \left[K - R_0\right] \tag{3.42}$$
Here the level $\sum_{i=0}^{n-1} mB(t; T_i)$ is today’s value of the annuity, which is a known quantity, and E is the expectation. The PV01 of the forward swap plays the role that the discount factor played above, and the forward swap rate $\hat{R}_s(t)$ is a Martingale in this measure, so once again
$$d\hat{R}_s(t) = C(t, *)\,dW, \qquad \hat{R}_s(0) = R_0 \tag{3.43}$$
where dW is Brownian motion. As before, the coefficient C(t,*) may be

deterministic or random, and cannot be determined from fundamental theory. Apart
from notation, this is identical to the framework provided by the previous equations
for European calls and puts. Caplets and floorlets can also be included in this picture,
since they are just one period payer and receiver swaptions.
3.6.2 Black’s model and implied volatilities.
To go any further requires postulating a model for the coefficient C(t,*). We saw in previous sections that Black postulated that the coefficient C(t,*) is $\sigma_B \hat{F}(t)$, where the volatility $\sigma_B$ is a constant. The forward price $\hat{F}(t)$ is then a geometric Brownian motion:
$$d\hat{F}(t) = \sigma_B\,\hat{F}(t)\,dW, \qquad \hat{F}(0) = f \tag{3.44}$$
Evaluating the expected values in equations (3.36) and (3.37) under this model then yields Black’s formula,
$$\begin{aligned}
V_{call} &= B(t_{set})\left(f \cdot N(d_1) - K \cdot N(d_2)\right) \\
V_{put} &= V_{call} + B(t_{set})\,(K - f)
\end{aligned} \tag{3.45}$$
where
$$d_{1,2} = \frac{\log\left(\frac{f}{K}\right) \pm \frac{\sigma_B^2\, t_{ex}}{2}}{\sigma_B \sqrt{t_{ex}}} \tag{3.46}$$
for the price of European calls and puts, as is well-known. All parameters in Black’s formula are easily observed, except for the volatility σBlack.
An option’s implied volatility is the value of σBlack that needs to be used in Black’s
formula so that this formula matches the market price of the option. Since the call (and
put) prices in equations (3.45) are increasing functions of σBlack, the volatility σBlack
implied by the market price of an option is unique. Indeed, in many markets it is standard practice to quote prices in terms of the implied volatility σBlack; the option’s dollar price is then recovered by substituting the agreed upon σBlack into Black’s formula.
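Because the call price in (3.45) is increasing in the volatility, the implied volatility can be recovered from a price by a simple bracketing search. The sketch below uses bisection, chosen here for robustness rather than speed; the function names, the search bounds and the sample inputs are assumptions made for the illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(forward, strike, sigma, t_ex, discount):
    """Black's formula (3.45)-(3.46) for a call on the forward price."""
    total_vol = sigma * math.sqrt(t_ex)
    d1 = (math.log(forward / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    return discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))

def implied_vol(price, forward, strike, t_ex, discount, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; relies on the call price being increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_call(forward, strike, mid, t_ex, discount) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an option at a known volatility, then recover that volatility
quoted = black_call(forward=100.0, strike=110.0, sigma=0.3, t_ex=2.0, discount=0.95)
print(implied_vol(quoted, forward=100.0, strike=110.0, t_ex=2.0, discount=0.95))
```

The search assumes the quoted price lies between the model prices at the two bounds; a price outside the arbitrage bounds would simply drive the result to one end of the bracket.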
The derivation of Black’s formula presumes that the volatility σBlack is a constant
for each underlying asset A. However, the implied volatility needed to match market
prices nearly always varies with both the strike K and the time-to-exercise tex .
Fig. 3.7. Smiles at different maturities
Changing the volatility σBlack means that a different model is being used for the
underlying asset for each K and tex.
3.6.3 Local volatility models.
In an insightful work, Dupire essentially argued that Black was too bold in setting the coefficient C(t,*) to $\sigma_B \hat{F}(t)$. Instead one should only assume that C is Markovian: $C = C(t, \hat{F})$. Re-writing $C(t, \hat{F})$ as $\sigma_{loc}(t, \hat{F})\,\hat{F}$ then yields the “local volatility model,” where the forward price of the asset is
$$d\hat{F} = \sigma_{loc}(t, \hat{F})\,\hat{F}\,dW, \qquad \hat{F}(0) = f \tag{3.47}$$
in the forward measure. Dupire argued that instead of theorizing about the unknown local volatility function $\sigma_{loc}(t, \hat{F})$, one should obtain it directly from the marketplace by “calibrating” the local volatility model to market prices of liquid European options.
3.6.4 Calibration
In calibration, one starts with a given local volatility function $\sigma_{loc}(t, \hat{F})$, and evaluates
$$V_{call} = B(t_{set})\, E^P\left[\left(\hat{F}(t_{ex}) - K\right)^+ \,\middle|\, \mathcal{F}_0\right] \tag{3.48}$$
$$\phantom{V_{call}} = V_{put} + B(t_{set})\,(f - K) \tag{3.49}$$
to obtain the theoretical prices of the options; one then varies the local volatility function $\sigma_{loc}(t, \hat{F})$ until these theoretical prices match the actual market prices of the option for each strike K and exercise date $t_{ex}$. In practice liquid markets usually exist only for options with specific exercise dates $t_{ex}^1, t_{ex}^2, t_{ex}^3, \ldots$; for example, for 1m, 2m, 3m, 6m, and 12m from today. Commonly the local volatilities $\sigma_{loc}(t, \hat{F})$ are taken to be piecewise constant in time:
$$\sigma_{loc}(t, \hat{F}) = \sigma_{loc}^{1}(\hat{F}) \quad \text{for } t < t_{ex}^{1}$$
$$\sigma_{loc}(t, \hat{F}) = \sigma_{loc}^{j}(\hat{F}) \quad \text{for } t_{ex}^{j-1} < t < t_{ex}^{j}, \quad j = 2, 3, \ldots, J$$
$$\sigma_{loc}(t, \hat{F}) = \sigma_{loc}^{J}(\hat{F}) \quad \text{for } t > t_{ex}^{J}$$
One first calibrates $\sigma_{loc}^{1}(\hat{F})$ to reproduce the option prices at $t_{ex}^{1}$ for all strikes K, then calibrates $\sigma_{loc}^{2}(\hat{F})$ to reproduce the option prices at $t_{ex}^{2}$, for all K, and so forth.
(This calibration process can be greatly simplified by solving to obtain the prices
of European options under the local volatility model (3.47) to (3.49), and from these
prices we obtain explicit algebraic formulas for the implied volatility of the local
volatility models.)
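The piecewise-constant-in-time structure described above can be represented by a simple lookup over the exercise dates. The sketch below only shows how a calibrated set of slices would be evaluated, not the calibration itself; the function names, the exercise dates and the slice functions are hypothetical placeholders.

```python
from bisect import bisect_left

def make_piecewise_local_vol(exercise_dates, slice_functions):
    """Build sigma_loc(t, F) that is piecewise constant in time.

    exercise_dates: sorted list [t_ex^1, ..., t_ex^J].
    slice_functions: list of J callables, slice_functions[j-1](F) = sigma_loc^j(F).
    """
    if len(exercise_dates) != len(slice_functions):
        raise ValueError("one slice function is needed per exercise date")
    last = len(exercise_dates) - 1

    def sigma_loc(t, forward):
        # Index of the first exercise date >= t; times beyond the last date reuse slice J
        j = min(bisect_left(exercise_dates, t), last)
        return slice_functions[j](forward)

    return sigma_loc

# Hypothetical slices for 1m, 3m and 6m exercise dates
dates = [1.0 / 12.0, 0.25, 0.5]
slices = [
    lambda f: 0.20 + 0.001 * (100.0 - f),
    lambda f: 0.25 + 0.001 * (100.0 - f),
    lambda f: 0.30 + 0.001 * (100.0 - f),
]
sigma_loc = make_piecewise_local_vol(dates, slices)
print(sigma_loc(0.05, 100.0), sigma_loc(0.2, 100.0), sigma_loc(2.0, 100.0))
```

Each slice would be calibrated in turn, holding the earlier slices fixed, which is what makes the sequential procedure in the text possible.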
Once $\sigma_{loc}(t, \hat{F})$ has been obtained by calibration, the local volatility model is a single, self-consistent model which correctly reproduces the market prices of calls (and puts) for all strikes K and exercise dates $t_{ex}$ without “adjustment.” Prices of exotic options can now be calculated from this model without ambiguity. This model yields consistent delta and vega risks for all options, so these risks can be consolidated across
strikes. Finally, perturbing f and re-calculating the option prices enables one to
determine how the implied volatilities change with changes in the underlying asset
price. Thus, the local volatility model provides a method of pricing and hedging
options in the presence of market smiles and skews. It is perhaps the most popular
method of managing exotic equity and foreign exchange options.
Unfortunately, the local volatility model predicts the wrong dynamics of the
implied volatility curve, which leads to inaccurate and often unstable hedges. Local
volatility models predict that the market smile/skew moves in the opposite direction
with respect to the price of the underlying asset. This is opposite to typical market
behaviour, in which smiles and skews move in the same direction as the underlying.
Fig. 3.8. Implied volatility σB(K,f) if forward price decreases from f0 to f (solid line)
Fig. 3.9. Implied volatility σB(K,f) if forward price increases from f0 to f (solid line).
Let us explain in a little more detail this calculation for the future projection of
local volatilities. Given that we know the σ(t, St) up until the present time instant t,
how do we construct the model up to time t + dt?
At t we know the price of the asset St. We can also calculate for it a set of possible
future scenarios, each with their respective probabilities of occurring. Thus we have
the corresponding probability distribution at time t + dt for all the values that the local
volatility could possibly take.
[Figure: scenarios fanning out between time t and time t + dt.]
Fig. 3.10. Future volatility scenarios for an asset
By analysing for a given asset the continuum of market prices for every strike at
time t, and projecting their scenarios into the future, it is possible to discover for each
price the corresponding probability distribution at time t + dt. Thus, on obtaining the
probability distribution, we can then simply invert the Black Scholes formula so as to
obtain each σBlack for each of the product’s strikes.
[Figure: scenarios for several strikes between time t and time t + dt; horizontal axis: Time.]
Fig. 3.11. Future volatility scenarios for different strikes of a same underlying asset
Each strike has a particular σBlack at a given date T, just as is shown by the market
quoted data. The same procedure can be repeated at every desired maturity T,
therefore also obtaining a σBlack for a continuum of times. Any other desired σBlack can
simply be calculated by interpolating between two known σBlack derived from the
market data. Note that the method used here for interpolation will have a great
impact on the final value and smile of the various σBlack.
Proof:
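Let us set out by writing the expression for a Call through a typical stochastic diffusion equation, i.e. composed of a risk neutral drift and a stochastic component. Applying Ito yields:
$$dC_t = rC\,dt + \sigma(t, S_t)\,C\,dW_t^P \;\xrightarrow{\text{Ito}}\; \frac{\partial C}{\partial t}\,dt + \frac{\partial C}{\partial S}\,dS + \frac{1}{2}\frac{\partial^2 C}{\partial S^2}\,\sigma^2(t, S_t)\,dt \tag{3.50}$$
From before, we also know that $dS_t = rS_t\,dt + \sigma(t, S_t)\,dW_t^P$. Thus replacing we obtain
$$\left[\frac{\partial C}{\partial t} + rS\frac{\partial C}{\partial S} + \frac{1}{2}\frac{\partial^2 C}{\partial S^2}\,\sigma^2(t, S_t)\right]dt + \frac{\partial C}{\partial S}\,\sigma(t, S_t)\,dW_t^P \tag{3.51}$$
We can apply here the fact that the Call Price = $(S - K)_t$. Therefore, for a known fixed price as is our case, if the underlying asset S increases, the strike must consequently decrease. This implies the following relationship between the given variables
$$\frac{\partial C}{\partial S} = -\frac{\partial C}{\partial K}, \qquad \frac{\partial^2 C}{\partial S^2} = -\frac{\partial^2 C}{\partial K^2} \tag{3.52}$$
Similarly
$$\frac{\partial C}{\partial t} = -\frac{\partial C}{\partial T} \tag{3.53}$$
Ignoring the Brownian term in dW we obtain
$$dC = \left[-\frac{\partial C}{\partial T} - rS\frac{\partial C}{\partial K} - \frac{1}{2}\frac{\partial^2 C}{\partial K^2}\,\sigma^2(t, S_t)\right]dt \tag{3.54}$$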
All of which is known from the market data with respect to every strike and
maturity, as stated before. The integral form of the above can therefore easily be
calculated obtaining a value for σ 2 (t , S t ) at each (K,T).
Thus, given a continuum of market prices with respect to the strike K and the
maturity T, it is possible to construct a unique solution for σ(t,St). The model is known as
Local Volatility because it associates a local σ to each pair (K,T).
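To illustrate how a local volatility can be read off a grid of call prices in K and T, the sketch below uses the standard textbook form of Dupire's formula for a constant rate r and no dividends, σ²(K,T) = (∂C/∂T + rK ∂C/∂K) / (½ K² ∂²C/∂K²), with the derivatives approximated by central finite differences. This standard form is an assumption of the example and is written differently from the heuristic expression (3.54). As a sanity check, the price surface is generated with a flat Black Scholes volatility, which the formula should then recover. All function names and step sizes are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, strike, rate, sigma, maturity):
    """Black Scholes call price with constant rate and volatility."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def dupire_local_vol(call_price, strike, maturity, rate, dk=0.5, dt=1e-3):
    """Local volatility at (K, T) from a price surface call_price(K, T).

    Uses the standard Dupire formula (constant rate, no dividends) with
    central finite differences in strike and maturity.
    """
    c_t = (call_price(strike, maturity + dt) - call_price(strike, maturity - dt)) / (2.0 * dt)
    c_up = call_price(strike + dk, maturity)
    c_mid = call_price(strike, maturity)
    c_dn = call_price(strike - dk, maturity)
    c_k = (c_up - c_dn) / (2.0 * dk)
    c_kk = (c_up - 2.0 * c_mid + c_dn) / dk ** 2
    variance = (c_t + rate * strike * c_k) / (0.5 * strike ** 2 * c_kk)
    return math.sqrt(variance)

# Flat-volatility surface: the recovered local volatility should be close to 0.2
surface = lambda k, t: bs_call(100.0, k, 0.03, 0.2, t)
print(dupire_local_vol(surface, strike=100.0, maturity=1.0, rate=0.03))
print(dupire_local_vol(surface, strike=110.0, maturity=1.0, rate=0.03))
```

With real market quotes the surface is only known on a sparse grid, so the interpolation scheme used to fill it in strongly affects the second derivative in strike and hence the resulting local volatility.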
3.6.5 Problems
The model correctly simulates the market smile at any strike and for any maturity
T starting from the present date t0. However, it also presents a set of intrinsic
problems:
1. The computation is much slower, as the algorithm advances by small time intervals dt, which may be periods of days or weeks. We saw that the Black Scholes model could jump directly from the data at time t0 to the probability distribution at maturity T (say 1, 10 or 30 years in the future) without the need for any intermediate steps.
2. Experience has demonstrated that although the model is extremely accurate for
evaluating products that start today and end at time T, it is not accurate for forward
products that start at a future time U and end at T. The dynamics of the products
obtained through the Dupire model results in an almost flat σBlack vs K curve, whereas
market data show a smile.
3. The Dupire model implies that the volatility at a given time depends only on the underlying ST, whereas experience has demonstrated that this dependence is not really constant. This implies the need for a further parameter to introduce another stochastic source into our expression.
The above concerns have led the quant world to step back from the Dupire model when analysing future products, and this has meant the need to return to the models that we were previously discussing, i.e. the normal and lognormal Black Scholes models, or the Black shifted, although with very subtle modifications.
3.7 Stochastic Volatility
$$\frac{dS_t}{S_t} = r\,dt + V\,dW_t^P \tag{3.55}$$
$$dV = f_t\,dt + g_t\,dZ_t^P \tag{3.56}$$
In this model we assume that the volatility itself follows a stochastic process.
Notice that the Brownian motion driving the volatility process is different from the
asset’s Wiener process. This is the approach followed to introduce a new variable of
stochasticity different from dW, as discussed previously. Both Brownian motions may
be correlated
$$\rho_t\,dt = \left\langle dW_t^P, dZ_t^P \right\rangle \tag{3.57}$$
Note also that ft represents the drift term of the volatility V, whereas gt represents
the volatility of the volatility, otherwise referred to as Vol of Vol.
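A common way to generate the two correlated Brownian increments of (3.57) in a simulation is to build dZ from dW and an independent normal draw. The sketch below assumes a constant correlation ρ and uses Python's standard random generator with a fixed seed; the function names and parameter values are illustrative.

```python
import math
import random

def correlated_increments(rho, dt, rng):
    """Return (dW, dZ) with variance dt each and correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)  # independent of z1
    dw = math.sqrt(dt) * z1
    dz = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return dw, dz

def sample_correlation(rho, dt=1.0 / 252.0, n=20000, seed=0):
    """Monte Carlo estimate of E[dW dZ] / dt, which should be close to rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        dw, dz = correlated_increments(rho, dt, rng)
        total += dw * dz
    return total / (n * dt)

print(sample_correlation(-0.7))
```

The estimate carries sampling noise of order 1/√n, so it matches ρ only approximately for a finite number of draws.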
The advantage of this new model is that, before, all the stochasticity derived from S and, in turn, from W. Now we have two sources of stochasticity, W and Z, which, if correlated with ρ = 1, give way to the previously discussed normal and lognormal models.
This approach has given way to the models known as:
• Heston 1994
• SABR 1999
Current lines of thought suggest that the path to follow include a combination of
local volatility versus stochastic volatility.
3.7.1 Black shifted with Stochastic Volatility
2002 – 2003
$$dS_t = rS_t\,dt + V\left(\alpha S_t + (1 - \alpha)S_0\right)dW_t^P \tag{3.58}$$
$$dV = f\,dt + g\,dZ_t^P$$
3.7.2 Local volatility versus simple Stochastic Volatility
$$dS_t = rS_t\,dt + V S_t\,\sigma(t, S_t)\,dW_t^P \tag{3.59}$$
$$dV = (V - \lambda V_0)\,dt + g_0\,dZ_t^P$$
where the stochastic volatility includes a mean reversion term, and where the Volatility of Volatilities is a constant.
3.8 SABR
Most markets experience both relatively quiescent and relatively chaotic periods.
This suggests that volatility is not constant, but is itself a random function of time.
Respecting the preceding discussion, the unknown coefficient C(t,*) is chosen to be $\alpha \hat{F}^{\beta}$, where the “volatility” α is itself a stochastic process. Choosing the simplest reasonable process for α now yields the “stochastic αβρ model,” which has become known as the SABR model. In this model, the forward price and volatility are
$$\begin{aligned}
d\hat{F} &= \alpha\,\hat{F}^{\beta}\,dW_1, & \hat{F}(0) &= f \\
d\alpha &= \nu\,\alpha\,dW_2, & \alpha(0) &= \alpha
\end{aligned} \tag{3.60}$$
under the forward measure, where the two processes are correlated by
$$dW_1\,dW_2 = \rho\,dt \tag{3.61}$$
The SABR model has the virtue of being the simplest stochastic volatility model
which is homogenous in F̂ and α. We find that the SABR model can be used to
accurately fit the implied volatility curves observed in the marketplace for any single
exercise date tex. More importantly, it predicts the correct dynamics of the implied
volatility curves. This makes the SABR model an effective means to manage the smile
risk in markets where each asset only has a single exercise date; these markets include
the swaption and caplet/floorlet markets.
The SABR model may or may not fit the observed volatility surface of an asset
which has European options at several different exercise dates; such markets include
foreign exchange options and most equity options. Fitting volatility surfaces requires
the dynamic SABR model or other stochastic volatility models.
In the SABR there are two special cases: If we analyse (3.60) in a little more depth,
we notice that for β = 1, we obtain our previous stochastic log normal Black model.
For β = 0 in turn we obtain the stochastic Black normal model. For a deeper
understanding, refer to the SABR study in the Caplet Stripping Section 17.
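The two special cases can be seen directly in a single Euler discretisation step of (3.60) and (3.61). The sketch below takes the two standard normal draws as inputs so that the step is deterministic; it is a plain Euler scheme chosen for illustration (it does not prevent the forward from becoming negative), and the function name and parameter values are hypothetical.

```python
import math

def sabr_euler_step(forward, alpha, beta, nu, rho, dt, z1, z2):
    """One Euler step of dF = alpha F^beta dW1, dalpha = nu alpha dW2, dW1 dW2 = rho dt.

    z1 and z2 are independent standard normal draws supplied by the caller.
    """
    dw1 = math.sqrt(dt) * z1
    dw2 = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    new_forward = forward + alpha * forward ** beta * dw1
    new_alpha = alpha + nu * alpha * dw2
    return new_forward, new_alpha

# beta = 1: the increment of F is proportional to F (lognormal case)
print(sabr_euler_step(0.05, 0.2, 1.0, 0.4, -0.3, 0.01, 1.0, 0.0))
# beta = 0: the increment of F does not depend on F (normal case)
print(sabr_euler_step(0.05, 0.2, 0.0, 0.4, -0.3, 0.01, 1.0, 0.0))
```

In the normal case the parameter α is an absolute volatility, so in practice its magnitude would be rescaled relative to the lognormal case; the same value is used in both calls here only to isolate the effect of β.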
Finally, it is worthwhile noting that the SABR model predicts that whenever the
forward price f changes, the implied volatility curve shifts in the same direction and
by the same amount as the price f. This predicted dynamics of the smile matches
market experience, and so is a great advance over the Dupire model which was
inconsistent at this point.
Chapter 4 Interest Rate Models
4. Interest Rate Models
As seen, Black can be used for simple vanilla options such as Caps, European
Swaptions,... The Black-Scholes model is commonly used for the pricing of equity
assets where the model takes on a deterministic rate r. The model replicates the
evolution of its underlying asset through the use of a drift and a diffusion parameter.
However a difficulty arises when interest rate models are constructed using this
technique as the interest rate curve r is non deterministic and so can lead to arbitrage
solutions.
If we were to apply the Black Scholes formulation to forward interest rates, we

would obtain a lognormal model with a constant volatility σ. Under the risk neutral
forward probability, we would have
$$dF(t, T, U) = 0\,dt + \sigma F(t, T, U)\,dW_t^P$$
which is extremely simple. Applying Girsanov’s Theorem and inverting the
forward rates in a local volatility approach, we would need a different underlying
each time, because each forward rate has a different black volatility. This is a
characteristic that Black’s model does not support. Black requires a unique volatility.
A solution to the problem that we haven’t discussed could be to consider the

different forward rates as a basket of different products in equity- each being
lognormal with their associated volatility. Thus we would obtain a matrix of
correlations between them.
This approach gives way to arbitrage opportunities, creating a forward curve that
always increases with time. According to market data, the former does not always
occur- thus the need for specific interest rate models.
Hence, in previous approaches, interest rate models avoid assigning a short rate
by specifying it at every time and state. Although this is a good and practical method,
an alternative is to specify the short rate as a process defined by an Ito equation. This
allows us to work in continuous time.
In this approach we specify that the instantaneous short rate r satisfies an

equation of the type
dr = µ (r , t )dt + σ (r , t )dW (4.1)
where W(t) is a standardised Wiener process in the risk-neutral world. Given an

initial condition r(0), the equation defines a stochastic process r(t)
Many such models have been proposed as being good approximations to actual
interest rate processes. We list a few of the best known short rate models:
4.1 Rendleman and Bartter model
dr = mrdt + σ rdW (4.2)
This model copies the standard geometric Brownian motion model used for stock
dynamics. It leads to lognormal distributions of future short rates. It is now, however,
rarely advocated as a realistic model of the short rate process
4.2 Ho-Lee model
dr = θ (t ) dt + σ dW (4.3)
This is the continuous-time limit of the Ho-Lee model. The function θ (t ) is

chosen so that the resulting forward rate curve matches the current term structure. A
potential difficulty with the model is that r(t) may be negative for some t.
4.3 Black Derman Toy model
dLnr = θ (t )dt + σ dW (4.4)
This is virtually identical to the Ho-Lee model, except that the underlying
variable is Ln r rather than r. Using Ito’s Lemma, it can be transformed to the
equivalent form
$$dr = \left[\theta(t) + \frac{1}{2}\sigma^2\right] r\,dt + \sigma r\,dW \tag{4.5}$$
4.4 Vasicek Model
dr = a ( b − r ) dt + σ dW (4.6)
The model has the feature of mean reversion in that it tends to be pulled to the
value b. Again, it is possible for r(t) to be negative, but this is less likely than in other
models because of the mean reversion effect. Indeed, if there were no stochastic term
(that is if σ = 0 ), then r would decrease if it were above b and it would increase if it
were below b. This feature of mean reversion is considered to be quite important by
many researchers and practitioners since it is felt that interest rates have a natural
‘home’ of about 5% and that if rates differ widely from this home value there is a
strong tendency to move back to it.
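The pull towards b described above can be seen in a small simulation. The following is a minimal sketch, assuming a simple Euler discretisation of (4.6); the function name `simulate_vasicek` and all parameter values are illustrative choices, not taken from the text.

```python
import math
import random

def simulate_vasicek(r0, a, b, sigma, horizon, n_steps, seed=0):
    """Euler path of dr = a(b - r)dt + sigma dW, equation (4.6)."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [r0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        r = path[-1]
        path.append(r + a * (b - r) * dt + sigma * dw)
    return path

# Hypothetical parameters: start at 8%, long-run level b = 5%.
# With sigma = 0 the rate moves deterministically towards b.
calm = simulate_vasicek(r0=0.08, a=0.5, b=0.05, sigma=0.0, horizon=10.0, n_steps=1000)
noisy = simulate_vasicek(r0=0.08, a=0.5, b=0.05, sigma=0.01, horizon=10.0, n_steps=1000)
print(calm[-1], noisy[-1])
```

With σ = 0 the path decreases monotonically from 8% towards 5%; with σ > 0 it fluctuates around that level and, being Gaussian, can in principle become negative.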
4.5 Cox Ingersoll and Ross model
$$ dr = a(b - r)\,dt + \sigma\sqrt{r}\,dW \qquad (4.7) $$
In this model not only does the drift have mean reversion, but the stochastic
term is multiplied by √r, implying that the variance of the process increases as the
rate r itself increases.
4.6 Black Karasinski model
dLnr = (θ − aLnr ) dt + σ dW (4.8)
This is the Black Derman Toy model with mean reversion.
4.7 Hull White Model
$$ dr = (\theta_t - a_t r)\,dt + \sigma_t\,dW_t^P \qquad (4.9) $$
where θ, σ and a are deterministic.
r is a normal (Gaussian) variable with a mean reversion term. This mean reversion
within the drift allows for a more stable evolution of the interest rates, a
property that is historically consistent.
Notice that the above equation can be solved analytically.
However, there are two main problems connected to this model:
• The first is the fact that the model is normal by definition, thus always yields a
positive probability for values of r < 0. This is not such a great concern if the
product we try to model depends on high interest rates.
• Secondly, the model gives a correlation of practically 1 between the different long
and short term interest rates. This means that any movement we try to reproduce
in our interest rate curve will result in an equal translation across the entire curve.
Thus the flexibility required to bend the curve at different points, as occurs in
reality is unavailable here. A possible solution would be to take a short term
interest rate into consideration in our model, and at the same time a long term
interest rate. This however requires taking two Brownian motions that are not
correlated.
4.7.1 BK
To solve the problem of normality, a lognormal Hull White model can be
proposed, which follows a diffusion equation of the form:
$$ d\,\mathrm{Ln}\,r = (\theta_t - a_t\,\mathrm{Ln}\,r)\,dt + \sigma_t\,dW_t^P \qquad (4.10) $$
where θ, σ and a are deterministic. The above is known as the BK model. This
equation however has no analytical solution, meaning it must be solved through
mathematical approximations such as tree diagrams or PDE solvers (MonteCarlo is
not applicable). This results in more difficult and time consuming calibrations.
4.7.2 BDT
BDT is a solution that greatly simplifies the calculations. In it, we take the previous
equation for BK, (4.10), imposing $a_t = \pm \dfrac{\sigma'(t)}{\sigma(t)}$:
$$ d\,\mathrm{Ln}\,r = \left( \theta_t \pm \frac{\sigma'(t)}{\sigma(t)}\,\mathrm{Ln}\,r \right) dt + \sigma_t\,dW_t^P \qquad (4.11) $$
However, the previously stated problem that was characterised by uniform
translations in the interest rate curve is still present with this form, i.e. there is still a
unique interest rate that must be selected to define an entire curve.
The models following similar approaches are summarised in:
Table 4.1 Normal or lognormal models with a mean reversion
4.8 Conclusions
All of these models are referred to as ‘single factor models’ because they each
depend on a single Wiener process W. There are other models that are ‘multifactor’
which depend on two or more underlying Wiener processes.
All the single factor models lead to the same drawback: they use only
one explanatory variable (the short interest rate r_t) to construct a model for the entire
market. The main advantage of these methods lies in the possibility of specifying r_t as
a solution to a Stochastic Differential Equation (SDE). This allows us, through Markov
theory, to work with the associated Partial Differential Equation (PDE) and
subsequently to derive a rather simple formula for bond prices. The disadvantage is the
need for a complicated diffusion model to realistically reconstruct the volatility
observed in forward rates. In addition, the use of a single r_t proves insufficient to
model the market curve, which appears to depend on all the rates and their
different time intervals.
The most straightforward solution is to include more explanatory
variables: long and medium term rates. That is, we could consider a model in
which we use one representative short term rate, a middle term rate, and
a long term interest rate. The Heath Jarrow Morton framework arises as the
most complete application of the suggested approach. It includes the entire
forward rate curve as a theoretically infinite dimensional state variable. This model
will be described in Chapter 7.
Chapter 5 Interest Rate Products
5. Interest Rate Products
Interest rate derivatives are instruments whose payoffs depend in some
way on the level of interest rates. In 1973, the market for calls on stocks began. By
the 1980’s, public debt exploded, causing a huge increase in the number of swaps being
traded. By the end of the 1980’s a market for options on swaps had already appeared,
and a whole range of new products (swaptions, FRAs, caps, caplets, ...)
was developed to meet the particular needs of end users. These began trading based
particularly on the LIBOR and EURIBOR rates: the London Interbank Offered Rate and
its European equivalent.
Interest rate derivatives are more difficult to value than equity and foreign
exchange derivatives for four main reasons:
1. The behaviour of an individual interest rate is more complicated than that of a
stock price or exchange rate.
2. The product valuation generally requires the development of a model to
describe the behaviour of the entire zero-coupon yield curve.
3. The volatilities of different points on the yield curve are different.
4. Interest rates are used for discounting as well as for defining the payoff
obtained from the derivative.
5.1 Discount Factors
The money account, bank account or money-market account process is the
process that describes the value of a (local) riskless investment, where the profit is
continuously compounded at the risk free rate prevailing in the market at every
moment.
Let B*(t) be the value of a bank account at time t ≥ 0. Let us assume that the bank
account evolves according to the following differential equation:
$$ dB^*(t) = r_t\,B^*(t)\,dt \qquad (5.1) $$
with B*(t, t) = 1, because we assume that today we invest one unit of currency (u.c.).
This value is not yet stochastic because today we know the exact value of the bank
account.
Thus, integrating, we obtain $B^*(t,T) = e^{\int_t^T r_s\,ds}$, where r_t is a positive (usually
stochastic) function of time. The above gives us the return at a future time T obtained
from the bank when we invest 1 unit of currency (u.c.) on day t.
Fig. 5.1. Future value of money (one unit invested at t grows to $e^{\int_t^T r_s\,ds}$ at T)
As the opposite backward operation is more common, we define
$$ B^*(t,T) = e^{\int_t^T r_s\,ds} = \frac{1}{B(t,T)} \qquad (5.2) $$
The inverse operation is the discount of money from a future date T to present t,
at the bank’s riskless rate. This is defined therefore as $B'(t,T) = e^{-\int_t^T r_s\,ds}$ and is known
as the stochastic discount factor. It is used to bring prices from the future to the present.
Fig. 5.2. Discount factor (one unit at T is worth $e^{-\int_t^T r_s\,ds}$ at t)
5.2 Zero-coupon bond
The zero coupon bond at time t with maturity T > t, denoted by B(t,T), is the value
of a contract that starts at time t and guarantees the payment of 1 u.c. at maturity T.
Its price is the future unit of currency, discounted to present. As we do not know
what the future discount rate will be, we can only calculate the bond’s expected price.
$$ B(t,T) = E_t^P\left[ e^{-\int_t^T r_s\,ds} \right] \qquad (5.3) $$
This is simply an application of the martingale pricing relation
$S_t = E_t^P\left[ e^{-\int_t^T r_s\,ds}\, S_T \right]$ where,
according to our definition of a zero coupon bond, we set S_T = 1 u.c. Remember that
every tradable asset has the above property: under the risk neutral probability its
discounted price is a martingale, and the asset follows a diffusion of the form
$$ dS_t = r_t S_t\,dt + \sigma(t)\,S_t\,dW_t^P \qquad (5.4) $$
where W_t^P is a Brownian motion, r_t the instantaneous risk free rate and S_0 a
constant.
5.3 Interest Rate Compounding
Interest rate compounding refers to the profit that one makes by lending money to a
bank at a given time t and receiving it back at time T, increased at a rate r_t. The rate
r_t is the interest rate, and represents how much you earn to compensate for the fact
that you have been unable to access or invest your money during the period [t, T].
We proceed to describe the different forms of interest rates through a simple
numerical example.
Imagine that we start off with a sum of S = 100$, and decide to invest it at an
annual interest rate of r = 10% = 0.1.
5.3.1 Annual Compounding:
100 · 1.1 = 110$, thus mathematically S(1 + r_t)
5.3.2 Semi annual compounding:
r = 5% every 6 months. The difference with respect to the previous case is that you can
reinvest the profits every 6 months.
6M 100 · 1.05 = 105$
12M 105 · 1.05 = 110.25$, equivalent to 100 · 1.05^2 annual.
2Y 100 · 1.05^4 = 121.55$
$$ S\left(1 + \frac{r_t}{m}\right)^{m} \;\xrightarrow{\ \text{over } n \text{ years}\ }\; S\left(1 + \frac{r_t}{m}\right)^{mn} \qquad (5.5) $$
5.3.3 Continuous Compounding:
is the limit when the rate r is fixed at every instant of time
$$ \lim_{m \to \infty} S\left(1 + \frac{r_t}{m}\right)^{mn} = S e^{r_t \cdot n} \qquad (5.6) $$
5.4 Present Value PV
The present value is obtained by bringing all future cash flows to present, using
predicted or constant interest rates
$$ \sum S\left(1 + \frac{r_t}{m}\right)^{-m \cdot n} \;\xrightarrow{\ \text{continuous compounding}\ }\; \sum S e^{-r_t \cdot n} \qquad (5.7) $$
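The compounding rules of Section 5.3 and the discounting of (5.7) can be checked numerically. The sketch below reproduces the 100$ at 10% example; the function names are illustrative, and the cash flows passed to `present_value` are a hypothetical example rather than data from the text.

```python
import math

def compound(principal, rate, years, periods_per_year):
    """Future value with m compounding periods per year, equation (5.5)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)

def compound_continuous(principal, rate, years):
    """Continuous-compounding limit, equation (5.6)."""
    return principal * math.exp(rate * years)

def present_value(cash_flows, rate):
    """Continuously discounted sum of (time_in_years, amount) pairs, equation (5.7)."""
    return sum(amount * math.exp(-rate * t) for t, amount in cash_flows)

print(compound(100, 0.10, 1, 1))          # annual
print(compound(100, 0.10, 1, 2))          # semi-annual, one year
print(compound(100, 0.10, 2, 2))          # semi-annual, two years
print(compound_continuous(100, 0.10, 1))  # continuous
print(present_value([(1.0, 10.0), (2.0, 110.0)], 0.10))  # hypothetical cash flows
```

Increasing `periods_per_year` makes `compound` approach `compound_continuous`, which is the limit stated in (5.6).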
5.5 Internal Rate of Return IRR
The IRR is the rate that makes the net present value (NPV) of all cash flows,
including the initial outlay, equal to zero.
$$ \sum S\left(1 + \frac{IRR}{m}\right)^{-m \cdot n} = 0 \;\xrightarrow{\ \text{continuous compounding}\ }\; \sum S e^{-IRR \cdot n} = 0 \qquad (5.8) $$
The higher the IRR, the more desirable the investment will be (provided that it is
larger than the bank’s interest rate r).
The IRR is the main method to evaluate an investment together with its NPV.
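Equation (5.8) generally has no closed-form solution, so the IRR is found numerically. The following is a minimal sketch using bisection, assuming the NPV changes sign exactly once between the two bounds; the function names and the cash flows are hypothetical.

```python
def npv(rate, cash_flows, periods_per_year=1):
    """NPV of (period_index, amount) cash flows under periodic compounding."""
    return sum(cf / (1 + rate / periods_per_year) ** k for k, cf in cash_flows)

def irr(cash_flows, periods_per_year=1, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection search for the rate at which NPV = 0.
    Assumes NPV(lo) and NPV(hi) have opposite signs."""
    f_lo = npv(lo, cash_flows, periods_per_year)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = npv(mid, cash_flows, periods_per_year)
        if abs(f_mid) < tol:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# Hypothetical investment: pay 96 today, receive 4 after each of the
# first two half-years and 104 after the third.
flows = [(0, -96.0), (1, 4.0), (2, 4.0), (3, 104.0)]
rate = irr(flows, periods_per_year=2)
print(rate, npv(rate, flows, periods_per_year=2))
```

Bisection is chosen here only for its simplicity; any one-dimensional root finder would serve.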
5.6 Bond Yield (to Maturity) YTM
The yield to maturity is the internal rate of return IRR at the current price. It is
always quoted on an annual basis. The bond pays its face value F at maturity, with
coupon payments of C/m each time period in between. Thus its PV is the face value
brought to present, summed to each coupon also brought to present.
$$ PV = \frac{F}{\left(1 + \frac{IRR}{m}\right)^{n}} + \sum_{k=1}^{n} \frac{C/m}{\left(1 + \frac{IRR}{m}\right)^{k}} \qquad (5.9) $$
If there are n periods of coupons left, then
F C m
PV = ⋅n
+C m+ ⋅n
(5.10)
 IRR   IRR 
1 +  1 + 
 m   m 
There exists an inverse relationship between interest rates and bond prices. If the
IRR increases, then the PV of the bond price decreases.
5.7 Coupon Rate
It is possible to calculate the most representative points on the present value
curve for any given bond. Imagine that F = 100$, C = 10$:
· if IRR = 0 then $PV = F + \sum_{k=1}^{n} C/m$
· if Coupon = YTM then the price is constant at F
· if IRR → ∞ then PV → 0
Fig. 5.3. Bond curve dynamics (axes: Price against Yield; curves labelled 15% and 10%; F and 10% marked on the axes)
5.7.1 Time to Maturity
For each curve we can calculate three characteristic points, as before, analysing
how they are influenced by the maturity:
· if IRR = 0 then $PV = F + \sum_{k=1}^{n} C/m$; thus if you increase the maturity T, more
coupons are paid and the present value is greater
· if YTM = Coupon then PV = F; thus all curves pass through the same point
· if IRR → ∞ then PV → 0
Fig. 5.4. Bond curve for different maturities (axes: Price against Yield; curves for 30Y, 10Y and 3Y; 10% marked on the yield axis)
As maturity increases, the curve becomes steeper, and so more sensitive
to variations in interest rates.
5.7.2 Using Bonds to Determine Zero Rates:
Paying no coupons: using the interest rate compounding formula
Principal = 100$   Time to Maturity: 3M   Price: 97.5$
You earn (100 − 97.5) / 97.5 = 2.56% every 3M.
In 1Y you would earn (note that you do not reinvest, so you cannot compound as 1.0256^4)
2.56% · 4 = 10.24% per year
An easier way to see this is with continuous compounding: $97.5 = 100\,e^{-r_{3M} \cdot \frac{1}{4}}$,
giving r_3M = 10.127%
Paying coupons: once the short-term zero rates have been calculated from bonds
paying no coupons, we can use them to extract longer zero rates from coupon-bearing bonds.
Having calculated the zero interest rate curve paying no coupons, we have
3M 10.127%; 6M 10.469%; 1Y 10.536%
• Let us imagine our bond is defined by having
Principal = 100   Time to Maturity: 1.5Y   Price: 96$
Annual coupons of 8$ paid every 6M (thus every 6M we receive 4$)
The price will be all the cash flows discounted to present
$$ price = C_1 e^{-r_{6M} \cdot \frac{1}{2}} + C_2 e^{-r_{1Y} \cdot 1} + (principal + C_3)\, e^{-r_{1.5Y} \cdot 1.5} \qquad (5.11) $$
$$ 96 = 4e^{-0.10469 \cdot \frac{1}{2}} + 4e^{-0.10536 \cdot 1} + (100 + 4)\, e^{-r_{1.5Y} \cdot 1.5} \;\rightarrow\; r_{1.5Y} = 10.681\% $$
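Both calculations above can be reproduced in a few lines. This is a minimal sketch with illustrative function names; it uses the 6M and 1Y zero rates as they appear in the numerical line of (5.11).

```python
import math

def zero_rate(price, face, years):
    """Continuously compounded zero rate implied by a zero-coupon bond price."""
    return math.log(face / price) / years

def bootstrap_last_rate(price, coupon, face, times, known_rates):
    """Solve (5.11) for the zero rate at the final payment date.
    times: all payment dates in years; known_rates: zero rates for all but the last date."""
    pv_coupons = sum(coupon * math.exp(-r * t) for r, t in zip(known_rates, times))
    last_t = times[-1]
    return -math.log((price - pv_coupons) / (face + coupon)) / last_t

r_3m = zero_rate(97.5, 100.0, 0.25)
r_18m = bootstrap_last_rate(96.0, 4.0, 100.0, [0.5, 1.0, 1.5], [0.10469, 0.10536])
print(round(100 * r_3m, 3), round(100 * r_18m, 3))
```

Repeating the second step with progressively longer bonds extends the zero curve one maturity at a time.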
5.8 Interest Rates
Most of the interest rates we will deal with are yearly-based rates. In general, the
times that appear are dates, and we denote the period between two of them as (T; U).
We convert the interval into a yearly basis as
m = m(T, U) = (U − T) / 1 year
The day-count fraction assumed for the product can vary depending on the
convention used. The most common are act/act, act/365, act/360, etc.
Further, each currency has a particular calendar to be considered when
calculating the difference U − T. The usual convention is to take into account
national holidays, bank holidays, etc.
The simple forward rate contracted at t for the period [T; U] is
$$ L(t,T,U) = \left( \frac{B(t,T)}{B(t,U)} - 1 \right) \frac{1}{m(T,U)} \qquad (5.12) $$
This can easily be obtained following the previous discussions we have
considered. That is:
Fig. 5.5. Relating discount rates (time axis t, T, U; the factor $e^{\int_t^U r_s\,ds}$ over [t, U] is split into $e^{\int_t^T r_s\,ds}$ over [t, T] and $e^{\int_T^U L_s\,ds}$ over [T, U])
Therefore $e^{\int_t^U r_s\,ds} = e^{\int_t^T r_s\,ds} \cdot e^{\int_T^U L_s\,ds}$. L is clearly the forward rate and therefore the rate
between two future dates, whereas r is the known rate between today t and a
future date.
If we do not use continuous compounding but instead use annual compounding
(without reinvestment), the former becomes
$$ e^{\int_t^U r_s\,ds} = e^{\int_t^T r_s\,ds} \cdot (1 + L \cdot m) \qquad (5.13) $$
This is not to be mistaken for $e^{\int_t^U r_s\,ds} = e^{\int_t^T r_s\,ds} \cdot \left(1 + \frac{L}{m}\right)^{m}$, which would involve the
reinvestment of the benefits obtained. Recall that the above can be rewritten as
$$ \frac{1}{e^{-\int_t^U r_s\,ds}} = \frac{1}{e^{-\int_t^T r_s\,ds}} \cdot (1 + L \cdot m) $$
which, in terms of the discount factors, is
$$ \frac{1}{B(t,U)} = \frac{1}{B(t,T)} \cdot (1 + Lm) $$
Solving for L we obtain the simple forward rate:
$$ L(t,T,U) = \left( \frac{B(t,T)}{B(t,U)} - 1 \right) \frac{1}{m(T,U)} \qquad (5.14) $$
The EURIBOR (Euro Interbank Offered Rate, for the European zone and fixed in
Frankfurt) and the LIBOR (London Interbank Offered Rate, fixed in London) are simple
forward rates. They are an average of the rates at which certain banks (risk rated
AAA) are willing to borrow money with specified maturities (usually 3M, 6M, 1Y).
The rates L(t,T,U) are unknown (modelled as stochastic) as they fluctuate up until the
true fixing date T*. The Libor rate is usually fixed sometime before the payment
period starts. In the Euro zone, the fixing T* is made two business days (2bd) before T,
and in London, it is made the same day, thus T = T*.
5.8.1 Overnight market rates:
Are forward rates fixed each day for a one day period starting that day, meaning
they have the form O(t; T; U) with t = T, U = t + 1bd. Therefore O(t; t; t + 1bd)
5.8.2 The tomorrow-next market rates:
Are forward rates for a one day period, i.e. they are fixed on one day and applied
on the following. They are of the form TN(t; T; U) with T = t + 1bd, U = t + 2bd.
Therefore TN(t, t+1bd; t+2bd)
We will now proceed to demonstrate that the market interest rate curve can only
exist if there exists a discount factor curve.
Observe that, in the first of the following expressions, B(t,t) = 1.
$$ O(t,T,U) = \left( \frac{1}{B(t,t+1bd)} - 1 \right) \frac{1}{m(t,t+1bd)} $$
$$ TN(t,T,U) = \left( \frac{B(t,t+1bd)}{B(t,t+2bd)} - 1 \right) \frac{1}{m(t+1bd,t+2bd)} \qquad (5.15) $$
From the two equations in turn, we can recover at time t the values
of the zero-coupon curve B(t; T) for T = t + 1bd and T = t + 2bd, given the market rates
for O(t; T; U) and TN(t; T; U). For longer T's, we use Libor rates to obtain the values of
B(t,T), since their usual fixing date is simpler: of the form t = T. Thus, from the
different Libor rates we obtain B(t,U) with U = T + 3M, T + 6M, T + 1Y, typically. For
longer values of U we need to use swap forward rates.
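The recovery of the first two points of the discount curve from (5.15) can be sketched as follows. The quotes (4.00% overnight, 4.05% tomorrow-next) and the act/360 one-day accrual are hypothetical values chosen only for illustration, as are the function names.

```python
def discount_from_overnight(on_rate, m_on):
    """Invert the first line of (5.15) to obtain B(t, t+1bd)."""
    return 1.0 / (1.0 + on_rate * m_on)

def discount_from_tomnext(b_1bd, tn_rate, m_tn):
    """Invert the second line of (5.15) to obtain B(t, t+2bd) from B(t, t+1bd)."""
    return b_1bd / (1.0 + tn_rate * m_tn)

def simple_forward(b_start, b_end, m):
    """Simple forward rate of equation (5.14)."""
    return (b_start / b_end - 1.0) / m

m = 1.0 / 360.0                      # one day under act/360 (assumed convention)
b1 = discount_from_overnight(0.0400, m)
b2 = discount_from_tomnext(b1, 0.0405, m)
print(b1, b2)
print(simple_forward(1.0, b1, m), simple_forward(b1, b2, m))  # recovers the quotes
```

The tomorrow-next step needs B(t, t+1bd) as an input, which is why the two rates are used in sequence.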
5.9 Forward Rates
Forward interest rates are those implied by current (continuously compounded)
zero coupon rates.
Year    Zero rate per year (%)    Forward rate (%)
1       10
2       10.5                      11
3       10.8                      11.4
4       11                        11.6
Table 5.1 Construction of forward rates from bonds
$$ S_0 e^{r_{T_2} \cdot T_2} = S_0 e^{r_{T_1} \cdot T_1} \cdot e^{r_f \cdot (T_2 - T_1)} \quad \text{and solving} \quad r_f = \frac{r_{T_2} \cdot T_2 - r_{T_1} \cdot T_1}{T_2 - T_1} $$
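We can therefore calculate the forward rate for the 2Y:
$$ 100\,e^{0.105 \cdot 2} = 100\,e^{0.10 \cdot 1} \cdot e^{r_{f\,1Y} \cdot 1} \;\rightarrow\; r_{f\,1Y} = 11\% $$
And analogously, we can calculate the forward rate for 4Y as:
$$ 100\,e^{0.11 \cdot 4} = 100\,e^{0.10} \cdot e^{0.11} \cdot e^{0.114} \cdot e^{r_{f\,3Y \to 4Y} \cdot 1} \;\rightarrow\; r_{f\,3Y \to 4Y} = 11.6\% $$
Or
$$ 100\,e^{0.11 \cdot 4} = 100\,e^{0.108 \cdot 3} \cdot e^{r_{f\,3Y \to 4Y} \cdot 1} \;\rightarrow\; r_{f\,3Y \to 4Y} = 11.6\% $$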
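The forward column of Table 5.1 follows directly from the formula for r_f. A minimal sketch, with an illustrative function name and the zero rates of the table written as decimals:

```python
def forward_rate(r1, t1, r2, t2):
    """Continuously compounded forward rate between t1 and t2 (years)."""
    return (r2 * t2 - r1 * t1) / (t2 - t1)

zero_rates = {1: 0.10, 2: 0.105, 3: 0.108, 4: 0.11}  # Table 5.1, per year
years = sorted(zero_rates)
for t1, t2 in zip(years, years[1:]):
    f = forward_rate(zero_rates[t1], t1, zero_rates[t2], t2)
    print(f"{t1}Y -> {t2}Y: {100 * f:.1f}%")
```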
5.10 Instantaneous forward rate
The instantaneous forward rate is the risk free rate of return which we may have
on an investment over the infinitesimal interval [T; T+dT] if the contract is made at t.
Thus it is natural to view F(t; T) as an estimate of the future short rate r(T).
Recall, assuming m(T, U) = U − T, that we had
$$ L(t,T,U) = \left( \frac{B(t,T)}{B(t,U)} - 1 \right) \frac{1}{m(T,U)} = \frac{-1}{B(t,U)} \left( \frac{B(t,U) - B(t,T)}{U - T} \right) \qquad (5.16) $$
If we assume U and T to be very close, then
$$ \lim_{U \to T} L(t,T,U) = \lim_{U \to T} \frac{-1}{B(t,U)} \left( \frac{B(t,U) - B(t,T)}{U - T} \right) = -\frac{1}{B(t,T)} \frac{\partial B(t,T)}{\partial T} = -\frac{\partial \log B(t,T)}{\partial T} \qquad (5.17) $$
because $\dfrac{\partial \log B(t,T)}{\partial T} = \dfrac{1}{B(t,T)} \dfrac{\partial B(t,T)}{\partial T}$.
We therefore define the instantaneous forward rate with maturity T
$$ F(t,T) = -\frac{\partial \log B(t,T)}{\partial T} \qquad (5.18) $$
We have
$$ \frac{B(t,U)}{B(t,T)} = e^{-\int_T^U F(t,u)\,du} \qquad (5.19) $$
If instead of T we were to take t, where B(t, t) = 1, we would have
$$ B(t,U) = e^{-\int_t^U F(t,u)\,du} \qquad (5.20) $$
But remember now from (5.3) that we also had the expression from the zero
coupon bond
$$ B(t,U) = E^P\left[ e^{-\int_t^U r_s\,ds} \right] \qquad (5.21) $$
Therefore, by making U tend to t in (5.21), the expectation becomes known at t.
Having done this, we can now directly equate the expressions (5.20) and (5.21),
obtaining
r_t = F(t, t) (5.22)
6. More Complex Derivative Products
The LIBOR is the London Interbank Offered Rate. We will let L(t,T,U) denote
the forward LIBOR, seen from time t. That is, the LIBOR rate seen at t that will apply
over a future period ranging from T to U. Thus the spot LIBOR rate is given as L(T,T,U).
This rate fixes at time T, such that 1$ invested at this rate pays 1 + mL(T,T,U) at
maturity U. The accrual period m is generally expressed as a fraction of a year, such
that a 3 month LIBOR will have m = 0.25.
6.1 Calls and Puts
6.1.1 Call
A European call option (respectively a put option) with strike or exercise price K
and maturity or exercise date T on the underlying asset S_t is a contract that gives its
holder the right, but not the obligation, to buy (respectively to sell) one share of the
asset at the price K at the fixed time T. The underwriter of the option has the obligation,
if the holder decides to exercise the option, to sell (buy) the share of the asset. The
option can be exercised exclusively at time T.
The buyer of a call option wants the price of the underlying instrument to rise in
the future so that he can ‘call it’ from the vendor at the cheaper pre-established price
and then go on to sell it in the market at a profit. The seller either expects that the
underlying’s price will not rise, or is willing to give up some of the upside (profit) from a
price rise in return for (a) the premium (paid immediately) plus (b) retaining the
opportunity to make a gain up to the strike price.
A European call option allows the holder to exercise the option (i.e., to buy) only
on the option expiration date. An American call option allows exercise at any time
during the life of the option.
The pre-established price is known as the exercise price or strike price. The final date
is the expiration or maturity.
Fig. 6.1. Investor’s profit on buying a European call option: Option price= 5$;
Strike K= 60$
In the above, consider an investor who buys a European call option with strike
60$ to buy a given stock whose current price is 58$. The price to purchase this option
is 5$. The option expires in 5 months.
If in 5 months the stock price is above 60$, say 75$, the option will be exercised
and the investor will buy the stock for 60$. He can immediately sell the share for 75$
in the market, thus gaining 15$ (ignoring the initial 5$ cost to buy the call). If the price
of the stock falls below 60$, the investor will clearly not exercise the call option to buy
at 60$.
The call’s payoff is therefore
Payoff = max( ST − K ;0) (6.1)
The above figure shows the investor’s net profit or loss on a call option. It is
important to notice that the investor can lose money even if he does not exercise the
purchase, because he has paid the initial price or premium of 5$.
Notice the subtlety that even if the stock price rises to 62$, which is above the
strike price of 60$, the investor is still incurring a loss. This is because he gains 2$
from selling the stock, but initially lost 5$ by entering the transaction.
Nevertheless, although in the graph the investor is under the horizontal axis and
therefore in a situation of loss, he should still exercise the option. In this way his loss is
only 3$. If he does not exercise, his loss is the entire 5$ that he spent to enter the
contract.
The above calculation (6.1) still holds valid as the call’s terminal value. Although
we are not taking the initial price into consideration, we have seen that irrespective of
the initial price, we will always exercise if the final price is greater than the strike.
Fig. 6.2. Vendor’s profit on selling a European call option: Option price = 5$; Strike
= 60$
From the vendor’s point of view, he will only keep his full profit (the price at which the
call was sold) if the option is not exercised. We see his profits in the above graph. As
soon as the stock’s price rises above 60$, he is losing money because he is selling at
60$ an asset that is worth more than 60$ in the market. His payoff is the exact opposite
of the investor’s:
Payoff = − max( ST − K ; 0) = min( K − ST ;0) (6.2)
Note: an option is said to be “at the money” if the strike price of the option equals
the market price of the underlying security.
6.1.2 Put
A call option gives the holder the right to buy the underlying asset. A put option
gives the holder the right to sell the underlying asset by a certain date at a certain
price. The buyer of a put option therefore hopes that the asset’s price will decrease so
that he will be able to ‘put it’ to the vendor at the higher pre-established price.
Imagine an investor who buys a European put option with a strike price of 90$.
Imagine the current stock price is of 85$, and that the option price is 7$.
Fig. 6.3. Investor’s profit on buying a European put option: Option price= 7$; Strike =
90$
If the share price drops to 75$, the buyer earns 15$ (ignoring the initial cost)
because he can exercise the option to sell at 90$. If the price instead rises above 90$,
the investor will clearly not exercise the option to sell at 90$, and will instead go
directly to the market where he can sell the stock at its true, higher price. The put’s
payoff is
Payoff = max( K − ST ;0) (6.3)
Conversely, the person who sells the put can only win the premium paid when
the investor entered the contract, and does so if the stock’s price goes up. If it declines,
he will be forced to buy the stock at a price higher than its market quote, so will be
losing money. His payoff is
Payoff = − max( K − ST ; 0) = min( ST − K ;0) (6.4)
Fig. 6.4. Profit from writing a European put option: Option price= 7$; Strike = 90$
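The terminal payoffs and the net profits shown in the figures of this section can be tabulated with a few lines of code. A minimal sketch using the strikes and premiums of the two examples (call: K = 60$, premium 5$; put: K = 90$, premium 7$); the function names are illustrative.

```python
def call_payoff(s_t, strike):
    """Terminal payoff of a long call, max(S_T - K, 0)."""
    return max(s_t - strike, 0.0)

def put_payoff(s_t, strike):
    """Terminal payoff of a long put, max(K - S_T, 0)."""
    return max(strike - s_t, 0.0)

def buyer_profit(payoff, premium):
    """Net profit of the option buyer; the writer's profit is the negative of this."""
    return payoff - premium

for s in (50.0, 62.0, 75.0):
    print("call", s, buyer_profit(call_payoff(s, 60.0), 5.0))
for s in (75.0, 95.0):
    print("put", s, buyer_profit(put_payoff(s, 90.0), 7.0))
```

At a final stock price of 62$ the call buyer's net profit is negative even though the option is exercised, which is the subtlety discussed for the call.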
6.1.3 Present value of an option
We already derived in the Black Scholes section (Chapter 3.1.2) a formula to
calculate the present value of a call which at a future date gives the final payoffs that
we have seen above. We simply recall the results that were derived.
$$ Call = S_0 e^{-qT} N(d_1) - K e^{-rT} N(d_2) \qquad (6.5) $$
$$ Put = K e^{-rT} N(-d_2) - S_0 e^{-qT} N(-d_1) \qquad (6.6) $$
$$ \mathrm{Ln}\left( \frac{S_0 e^{-qT}}{K} \right) = \ln\frac{S_0}{K} - qT \qquad (6.7) $$
$$ d_1 = \frac{\mathrm{Ln}\left( \frac{S_0}{K} \right) + \left( r - q + \frac{\sigma^2}{2} \right) T}{\sigma\sqrt{T}} \qquad (6.8) $$
$$ d_2 = \frac{\mathrm{Ln}\left( \frac{S_0}{K} \right) + \left( r - q - \frac{\sigma^2}{2} \right) T}{\sigma\sqrt{T}} = d_1 - \sigma\sqrt{T} \qquad (6.9) $$
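Formulas (6.5) to (6.9) translate directly into code. The following is a minimal sketch: the normal distribution N is computed from the error function, the function names are illustrative, and the inputs (volatility, rate, dividend yield) are hypothetical values attached to the 58$ / 60$ / 5 month example.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_scholes(s0, strike, r, q, sigma, maturity):
    """Return (call, put) from equations (6.5)-(6.9)."""
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r - q + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    disc_s = s0 * math.exp(-q * maturity)      # S0 e^{-qT}
    disc_k = strike * math.exp(-r * maturity)  # K e^{-rT}
    call = disc_s * norm_cdf(d1) - disc_k * norm_cdf(d2)
    put = disc_k * norm_cdf(-d2) - disc_s * norm_cdf(-d1)
    return call, put

# Hypothetical market inputs: r = 5%, q = 0, sigma = 30%.
call, put = black_scholes(s0=58.0, strike=60.0, r=0.05, q=0.0, sigma=0.30, maturity=5.0 / 12.0)
print(call, put)
```

A useful sanity check is put-call parity: Call − Put should equal S0 e^{−qT} − K e^{−rT}.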
6.2 Forward
A forward contract on a commodity is an agreement between two parties to
purchase or sell a specific amount of an underlying commodity at a defined price, at a
determined future date. It is therefore a security that fixes a future price,
irrespective of any possible future variations in that asset’s price. In a forward
contract, all the payments take place at time T. The buyer is said to be long whereas
the seller is said to be short.
This is best understood with a graphical representation:
Fig. 6.5. Forward contract: future expected value versus real future value (time axis from t to T; current price St, agreed price K = F(t, ST), realised price ST, and the difference between K and ST)
St is the asset’s current price. F(t, ST) is today’s expected future price of the asset at
time T, and so is the price K at which the vendor sells his asset today to be delivered
in the future.
ST is the real future price of the asset, and is only known at time T.
In the above example, the vendor sold above the real future price so earns the
difference (K–ST). Obviously the buyer only enters such a contract if he thinks the
asset is going to be more expensive in the future, so he secures a fixed future price K.
6.2.1 Mathematically
A forward contract on an asset ST made at t has the following cash flows:
• The holder of the contract pays at time T a deterministic amount K that was
settled before at t.
• He receives a stochastic (floating) amount F(T,ST) = ST, which is the
future value of the underlying asset he has bought. Nothing is paid or
received at t.
The forward price K is determined at time t so that it equals the future expected
price of the asset ST, which makes the contract worth zero at t:
$$ Forward_t = e^{-r(T-t)} E_t^P\left[ (S_T - K) \right] = 0 $$
Since under the risk neutral probability $E_t^P[S_T] = S_t e^{r(T-t)}$, as in (6.12), this gives
$$ K = S_t e^{r(T-t)} \qquad (6.10) $$
6.2.2 Call option on a forward
A call option on a forward contract is the possibility of entering the
forward agreement at a time 0 < t < maturity. One would enter only if the floating
amount received were greater than the fixed amount K paid.
$$ CallForward_t = e^{-r(T-t)} E_t^P\left[ (S_T - K)^+ \,\middle|\, \mathcal{F}_t \right] \qquad (6.11) $$
6.3 Future
Futures contracts are quoted in organised markets, giving them great liquidity and
low transaction costs.
A futures contract on an asset ST made at t with delivery date T is very much like
a forward contract, with some differences:
Fig. 6.6. Future contract: future expected value versus real future value (same layout as Fig. 6.5: time axis from t to T; St, K = F(t, ST), ST and the difference)
• St is the asset’s current price.
• F(t, ST) is today’s expected future price of the asset, and so is the price K at which
the vendor sells his asset to be delivered in the future.
• ST is the real future price of the asset, and is only known at time T.
• F(ti, ST) is the evolution of that expected future value as we advance through time.
The expected future value right at the end must equal the price of the underlying
at that point, thus F(T, T, ST) = ST.
The buyer, instead of paying the difference at the end T, now pays it continually
at every ti as the difference in F with respect to the previous date. If ever this
difference increases (moves upwards in Fig. 6.6), he receives this amount from the
vendor instead of having to pay for it.
In the previous example, the seller sold above the real price, so earns the
difference. In practice, the exchange of the asset at T for its real price at T needn’t be
done. Instead only the cash is exchanged.
6.3.1 Mathematically
Mathematically we can give the following explanation:
In forwards, the price K at time t was the expected future cost of the asset F(t,
ST). Now with futures, this is substituted by a stochastic payment process
F(t, T, ST) made continuously over the time interval (t,T] in such a way that the
contract value is continually zero. There is no obligation to interchange the claim ST
and the payment F(T,T, ST). The contract is achieved by the following agreement:
During each time interval (t_{i−1}; t_i), t ≤ t_1 ≤ t_2 ≤ … ≤ T, the holder of the contract pays
F(t_i, T, ST) − F(t_{i−1}, T, ST). If the payment is negative, he will receive the money
from the counterparty. Therefore
$$ F(t, T_1, S_{T_1}) = E_t^P\left[ S_{T_1} \,\middle|\, \mathcal{F}_t \right] = S_t e^{r(T_1 - t)} \qquad (6.12) $$
6.3.2 Call option on a future
One can also formulate call options on futures contracts. For instance, consider a
futures contract on ST1, with delivery date T1, and a call option on F(T; T1; ST1) with
exercise price K and exercise date T < T1. The payment at T of this option is thus
(F(T; T1; ST1) − K)+. From this, we obtain that the price of the call option on the future is
$$ CallFuture_t = e^{-r(T_1 - t)} E_t^P\left[ \left( F(t, T_1, S_{T_1}) - K \right)^+ \,\middle|\, \mathcal{F}_t \right] = e^{-r(T_1 - t)} E_t^P\left[ \left( S_t e^{r(T_1 - t)} - K \right)^+ \,\middle|\, \mathcal{F}_t \right] \qquad (6.13) $$
Thus, proceeding as above in the call option case, and by using Black Scholes, we
get
$$ CallFuture_t = e^{-r(T_1 - t)} \left( F(t, T_1, S_{T_1})\, N(d_1) - K N(d_2) \right) \qquad (6.14) $$
with
$$ d_{1,2}^{T} = \frac{\log\left( \frac{F(t, T_1, S_{T_1})}{K} \right) \pm \frac{\sigma_T^2 \cdot (T - t)}{2}}{\sigma_T \cdot \sqrt{T - t}} \qquad (6.15) $$
6.4 FRA
A FRA, or forward rate agreement, is an agreement between two
counterparties, where one of the two parties pays a fixed flow of income (or fixed leg)
against a variable flow of income (or variable leg) which he receives in return.
Let us set the case where we pay a fixed leg in 3 months at a 3% annual interest
rate, fixed today.
In return we receive a 3 month EURIBOR rate fixed at maturity (in T = U = 3M).
Fig. 6.7. FRA payoffs (variable leg received against fixed leg paid)
We will therefore be paying 0.03 · 0.25 = 0.75%.
By definition this contract has a NPV = 0.
6.5 FRA Forward
A FRA forward is an agreement between two counterparties, where one of the two parties
pays a fixed flow of income (or fixed leg) against a variable flow of income (or variable
leg) which he receives in return. The fixed rate is settled at the present date t = 0,
whereas the variable leg is set by the LIBOR value at the fixing date T, over a time
period up until U. The difference between the two rates at T is exchanged at U.
Fig. 6.8. FRA future’s payoffs (time axis 0, T, U; variable leg against fixed leg)
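Mathematically, the interchange of flows at time U is then
Nm ( K − L (T , U ) ) (6.16)
where N is the nominal and L(T,U) the floating rate. The value of such a contract
discounted to time t, assuming N = 1 for simplicity, is:
$$ FRA_t = E^P\left[ e^{-\int_t^U r_s\,ds}\, m\left( K - L(T,U) \right) \,\middle|\, \mathcal{F}_t \right] = E^P\left[ B(T,U)\, e^{-\int_t^T r_s\,ds}\, m\left( K - L(T,U) \right) \,\middle|\, \mathcal{F}_t \right] \qquad (6.17) $$
remembering that the LIBOR forward rate L(t,T,U) is defined as
$$ L(t,T,U) = \left( \frac{B(t,T)}{B(t,U)} - 1 \right) \frac{1}{m(T,U)} \qquad (6.18) $$
then
$$ FRA_t = E^P\left[ e^{-\int_t^T r_s\,ds} \left( K m B(T,U) - 1 + B(T,U) \right) \,\middle|\, \mathcal{F}_t \right] \qquad (6.19) $$
And because bonds can be expressed, as in (5.3), as
$$ B(t,T) = E^P\left[ e^{-\int_t^T r_s\,ds} \,\middle|\, \mathcal{F}_t \right] \qquad (6.20) $$
we can replace B(T,U) by $e^{-\int_T^U r_s\,ds}$ inside the time-t expectation and rewrite the above as
$$ FRA_t = K m\, E^P\left[ e^{-\int_t^T r_s\,ds} B(T,U) \,\middle|\, \mathcal{F}_t \right] - E^P\left[ e^{-\int_t^T r_s\,ds} \,\middle|\, \mathcal{F}_t \right] + E^P\left[ e^{-\int_t^T r_s\,ds} B(T,U) \,\middle|\, \mathcal{F}_t \right] $$
$$ = K m\, E^P\left[ e^{-\int_t^T r_s\,ds}\, e^{-\int_T^U r_s\,ds} \,\middle|\, \mathcal{F}_t \right] - E^P\left[ e^{-\int_t^T r_s\,ds} \,\middle|\, \mathcal{F}_t \right] + E^P\left[ e^{-\int_t^T r_s\,ds}\, e^{-\int_T^U r_s\,ds} \,\middle|\, \mathcal{F}_t \right] $$
$$ = K m B(t,U) - B(t,T) + B(t,U) \qquad (6.21) $$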
Let us set as an example a FRA Forward contract 3M x 15M.
We are obliged to pay, at time U = 15M, the rate fixed today.
In return we receive at time U the LIBOR rate L(T,U). This quantity is variable as
we cannot know the value of the LIBOR lasting 1Y and starting in 3M until T = 3M
itself.
In reality, what is exchanged is the difference between the two rates, multiplied
by the notional over which the exchange was agreed.
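The closed form K m B(t,U) − B(t,T) + B(t,U) obtained for the FRA depends only on two discount factors, so it is easy to evaluate. A minimal sketch for a 3M x 15M contract on unit notional; the discount factors 0.99 and 0.95 are hypothetical and the function names are illustrative.

```python
def fra_value(strike, m, b_t_T, b_t_U):
    """Value at t of receiving the fixed rate K and paying LIBOR on unit notional."""
    return strike * m * b_t_U - b_t_T + b_t_U

def forward_libor(m, b_t_T, b_t_U):
    """Simple forward rate L(t, T, U) from two discount factors."""
    return (b_t_T / b_t_U - 1.0) / m

# Hypothetical curve: B(t, 3M) = 0.99, B(t, 15M) = 0.95, accrual m = 1 year.
b_T, b_U, m = 0.99, 0.95, 1.0
par_rate = forward_libor(m, b_T, b_U)
print(par_rate, fra_value(par_rate, m, b_T, b_U))
```

Setting the fixed rate equal to the forward LIBOR makes the contract worth zero at t, which is the rate at which such a contract would be struck.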
6.6 Caplet
A caplet is the option of entering a FRA contract. That is, the option of entering the
FRA at time T (at a premium) if the variable leg is greater than the fixed leg (and thus
compensates for the premium paid).
Therefore, a caplet with maturity T and strike K will have the following
payoff: at time U the holder of the caplet receives:
$$ Caplet_T = m \left( L(T,T,U) - K \right)^+ \qquad (6.22) $$
Note that the caplet expires at time T, but the payoff is received at the end of the
accrual period, i.e. at time U. The payoff is day-count adjusted. The liabilities of the
holder of this caplet are always bounded above by the strike rate K, and clearly if
interest rates increase, the value of the caplet increases, so that the holder benefits
from rising interest rates.
By the usual arguments, the price of this caplet is given by the discounted risk-
adjusted expected payoff. If {B(t, T) : T ≥ t} represents the observed term structure of
zero-coupon bond prices at time t, then the price of the caplet is given by
$$ Caplet_t = m B(t;U)\, E_t\left[ \left( L(T,T,U) - K \right)^+ \right] \qquad (6.23) $$
In this equation, the only random term is the future spot LIBOR, L(T,T,U). The
price of the caplet therefore depends on the distributional assumptions made on
L(T,T,U). One of the standard models for this is the Black model. According to this
model, for each maturity T, the risk-adjusted relative changes in the forward LIBOR
L(t,T,U) are normally distributed with a specified constant volatility σT, i.e.
$$ \frac{dL(t,T,U)}{L(t,T,U)} = \sigma_T\, dW_t \qquad (6.24) $$
This implies a lognormal distribution for L(T,T,U), and under this modelling
assumption the price of the T-maturity caplet is given by
$$ \zeta_t = m B(t;U)\, E^P\left[ \left( L(T,T,U) - K \right)^+ \right] \qquad (6.25) $$
$$ \zeta_t = m B(t;U) \left[ L(t,T,U)\, N(d_1^T) - K\, N(d_2^T) \right] \qquad (6.26) $$
$$ d_{1,2}^T = \frac{ \log\left( \frac{L(t,T,U)}{K} \right) \pm \frac{\sigma_T^2 (T-t)}{2} }{ \sigma_T \sqrt{T-t} } \qquad (6.27) $$
$$ N(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{u^2}{2}}\, du \qquad (6.28) $$
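Formulas (6.26) to (6.28) translate directly into a few lines of code. The following is a minimal sketch using only the standard library; the function names and the numerical inputs are assumptions made for illustration.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution N(z), eq. (6.28)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def black_caplet(L, K, sigma, tau, m, B_tU):
    """Black caplet price, eq. (6.26).

    L     forward LIBOR L(t,T,U)
    K     strike
    sigma Black volatility sigma_T
    tau   time to expiry T - t (must be positive)
    m     accrual factor of the period [T, U]
    B_tU  discount factor to the payment date U
    """
    d1 = (log(L / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return m * B_tU * (L * norm_cdf(d1) - K * norm_cdf(d2))


# Illustrative at-the-money caplet: 4% forward, 20% volatility, 1Y expiry
print(black_caplet(L=0.04, K=0.04, sigma=0.20, tau=1.0, m=0.5, B_tU=0.95))
```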
[Fig. 6.9. Caplet payoffs: time axis 0, T, U, showing the option exercise, the variable leg and the fixed leg]
6.6.1 Caplet as a Put Option on a Zero-Coupon Bond
A caplet is a call option on an interest rate, and since bond prices are inversely
related to interest rates, it is natural to be able to view a caplet as a put option on a
zero coupon bond. Specifically, the payoff of a caplet at time U is
$$ Caplet_U = m\,\left( L(T,T,U) - K \right)^+ \qquad (6.29) $$
This payoff is received at time U = T + 1Y/m, where m is the number of payments in a
year. The LIBOR rate prevalent over the accrual period [T, U] is L(T,T,U). It follows
that, at time T, the price of the caplet is the payoff discounted from U to T, as was shown
in (5.13):
$$ Caplet_T = \frac{1}{1 + m L(T,T,U)}\; m\,\left( L(T,T,U) - K \right)^+ \qquad (6.30) $$
The price at time T of the zero-coupon bond maturing at U, on the other hand, is expressed as
$$ B(T,U) = \frac{1}{1 + m L(T,T,U)} \qquad (6.31) $$
from which it follows that
$$ Caplet_T = m B(T,U) \left[ \frac{1}{m} \left( \frac{1}{B(T,U)} - 1 \right) - K \right]^+ = (1 + mK) \left[ \frac{1}{1 + mK} - B(T,U) \right]^+ \qquad (6.32) $$
This is just 1 + mK units of a put option on the zero-coupon bond maturing at
T + 1Y/m, with strike (1 + mK)^{-1}. Thus a caplet is a put option on a zero-coupon bond. A
cap, therefore, is a basket of put options on zero-coupon bonds of various maturities.
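The equivalence in (6.32) is purely algebraic, so it can be verified for any realised LIBOR level. The sketch below compares both sides of the identity; the function names and the sample rates are assumptions chosen for illustration.

```python
def caplet_value_at_T(L, K, m):
    """Caplet payoff discounted from U to T, eq. (6.30)."""
    return m * max(L - K, 0.0) / (1.0 + m * L)


def bond_put_value_at_T(L, K, m):
    """(1 + mK) puts on the zero-coupon bond B(T,U), struck at 1/(1 + mK)."""
    B = 1.0 / (1.0 + m * L)          # eq. (6.31)
    strike = 1.0 / (1.0 + m * K)
    return (1.0 + m * K) * max(strike - B, 0.0)


# Both expressions agree for LIBOR fixings below, at and above the strike
for L in (0.01, 0.03, 0.05, 0.08):
    print(L, caplet_value_at_T(L, 0.03, 0.5), bond_put_value_at_T(L, 0.03, 0.5))
```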
6.7 Cap
A cap is a sum of caplets. That is, at every fixing Ti the holder has the option of entering
that specific caplet from Ti to Ui = Ti+1. Each caplet has an equal duration of 1year/m up until
its corresponding Ui. More specifically, it is a collection (or strip) of caplets, each of
which is a call option on the LIBOR level at a specified date in the future.
To construct a standard cap on the 1year/m – maturity LIBOR with strike K and
maturity U, we proceed as follows. Suppose we are currently at time t. Starting with
U, we proceed backwards in steps of length 1year/m. Let n be the number of
complete periods of length 1year/m between t and U. Thus, we get a set of times
T0 = t + δ
T1 = T0 + 1Y/m
T2 = T1 + 1Y/m = T0 + 2·1Y/m
...
Tn = U = T0 + n·1Y/m
We now construct the portfolio of n caplets, struck at K, with maturities T0, T1,…,
Tn-1, called the fixing dates or caplet maturity dates. The payment dates are thus T1,
T2,…, Tn. The cap is then just equal to the sum of the prices of the strip of caplets. We
will now calculate this strip:
If ζi(t) denotes the price at time t of a caplet with maturity date Ti (and payment
date Ui = Ti+1), then the price of the cap is
$$ \zeta(t,T) = \sum_{i=0}^{n-1} \zeta_i(t) = \sum_{i=0}^{n-1} m B(t;U_i)\, E_t\left[ \left( L(T_i,T_i,U_i) - K \right)^+ \right] \qquad (6.33) $$
Applying Black Scholes
$$ \zeta(t,T) = \sum_{i=0}^{n-1} \zeta_i(t) = \sum_{i=0}^{n-1} m B(t;U_i) \left[ L(t,T_i,U_i)\, N(d_1^{T_i}) - K\, N(d_2^{T_i}) \right] \qquad (6.34) $$
The only quantity that cannot be directly observed in this pricing formula is the
set of forward rate volatilities, σTi, one for each caplet. Thus
$$ \zeta(t,T) = \zeta(t,T,\sigma_{T_0},\sigma_{T_1},\ldots,\sigma_{T_{n-1}}) $$
As a given set of forward rate volatilities produces a unique price, if we can find a
single number σ such that
$$ \zeta(t,T) = \zeta(t,T,\sigma_{T_0},\sigma_{T_1},\ldots,\sigma_{T_{n-1}}) = \zeta(t,T,\sigma,\sigma,\ldots,\sigma) $$
then this σ is called the implied or Black volatility for the U – maturity cap.
The market’s observed prices of caps of various maturities are inverted
numerically to obtain a term structure of Black volatilities, and these implied
volatilities are then quoted on the market itself.
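The numerical inversion mentioned above can be sketched as follows. We price the cap as a strip of Black caplets with one flat volatility, as in (6.34), and recover that volatility from a price by bisection. Bisection is our choice for the illustration (any one-dimensional root finder would do), and the forwards, discount factors and function names are assumptions, not market data.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def black_caplet(L, K, sigma, tau, m, B_pay):
    """Black caplet, eq. (6.26); tau is the time to the fixing date."""
    d1 = (log(L / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return m * B_pay * (L * norm_cdf(d1) - K * norm_cdf(d2))


def cap_price(forwards, discounts, expiries, K, m, sigma):
    """Strip of caplets priced with a single flat volatility, eq. (6.34)."""
    return sum(black_caplet(L, K, sigma, tau, m, B)
               for L, B, tau in zip(forwards, discounts, expiries))


def implied_flat_vol(target, forwards, discounts, expiries, K, m,
                     lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; valid because the cap price increases with sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cap_price(forwards, discounts, expiries, K, m, mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical semi-annual cap: forwards L(t,Ti,Ui), discounts B(t;Ui), fixings Ti
forwards = [0.030, 0.032, 0.034, 0.036]
discounts = [0.970, 0.955, 0.940, 0.925]
expiries = [0.5, 1.0, 1.5, 2.0]
price = cap_price(forwards, discounts, expiries, K=0.033, m=0.5, sigma=0.20)
print(price, implied_flat_vol(price, forwards, discounts, expiries, 0.033, 0.5))
```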
[Fig. 6.10. Cap payoffs: time axis 0, T1, T2, T3, T4, T5, U, showing the option exercise, the variable leg and the fixed leg]
6.7.1 Floor
A floor is a strip of floorlets, each of which is a put option on the LIBOR level at a given
future date. The pricing and hedging of floors is exactly complementary to the
treatment of caps. The price of a floor with a similar structure to the plain vanilla cap
discussed before is given by:
$$ \phi(t,T) = \sum_{i=0}^{n-1} \phi_i(t) = \sum_{i=0}^{n-1} m B(t;U_i) \left[ K\, N(-d_2^{T_i}) - L(t,T_i,U_i)\, N(-d_1^{T_i}) \right] \qquad (6.35) $$
Just as a caplet is a put option on a pure discount bond, similarly a floorlet is a

call option on such a bond. The hedging instruments for floors are the same as for
caps, except that positions are reversed since the holder of a floor benefits from falling
interest rates. The owner of a floor is always long in the market and long in vega, i.e.
benefits from rising volatility. The value of a floor also increases with maturity, as the
number of put options increases.
6.7.2 Put-Call Parity
Consider a caplet ζ and a floorlet Ф, each maturing at time T, with the same strike
rate K. Let us construct a portfolio
π = ζ − Ф
The payoff from this portfolio, received at the end of the accrual period, is
$$ \pi_T = m \left[ \left( L(T_i,T_i,U_i) - K \right)^+ - \left( K - L(T_i,T_i,U_i) \right)^+ \right] = m \left( L(T_i,T_i,U_i) - K \right) \qquad (6.36) $$
This is just a cash flow from a payer swap. Thus we have the following version
of the put-call parity appropriate for caps and floors:
Cap – Floor = Payer Swap
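Under the Black model the parity also holds at the level of prices: a caplet minus a floorlet equals the discounted forward payoff mB(t;U)(L − K). The following minimal sketch checks this for one period; the function names and inputs are illustrative assumptions.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def d1_d2(L, K, sigma, tau):
    d1 = (log(L / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)


def black_caplet(L, K, sigma, tau, m, B_pay):
    d1, d2 = d1_d2(L, K, sigma, tau)
    return m * B_pay * (L * norm_cdf(d1) - K * norm_cdf(d2))


def black_floorlet(L, K, sigma, tau, m, B_pay):
    """One term of the floor formula, eq. (6.35)."""
    d1, d2 = d1_d2(L, K, sigma, tau)
    return m * B_pay * (K * norm_cdf(-d2) - L * norm_cdf(-d1))


# Caplet minus floorlet against the value of one payer swap cash flow
args = dict(L=0.035, K=0.030, sigma=0.25, tau=2.0, m=0.5, B_pay=0.93)
lhs = black_caplet(**args) - black_floorlet(**args)
rhs = args["m"] * args["B_pay"] * (args["L"] - args["K"])
print(lhs, rhs)
```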
6.8 Swap
A swap is an agreement between two counterparties made at t = 0, in which one of
the two parties pays a fixed coupon rate K on every fixing (the fixed leg) and in return
receives a variable floating rate (the variable leg) defined over LIBOR rates. It is thus an
exchange of a series of payments at regular intervals for a specified period of time.
The payments are based on a notional underlying principal. The decision to enter the
agreement or not is taken only once, at inception.
The fixed rate K is settled at the present date t = 0, whereas the variable leg is set
by the LIBOR value at each fixing date Ti, lasting a time period until Ui, thus L(Ti, Ui).
The difference between the two rates is exchanged at Ui, after being multiplied by a
notional. Note that every period has Ui = Ti+1.
The lifetime of the swap is called its tenor. An investor is said to hold a payer
swap if he pays the fixed leg and receives the floating; an investor is said to hold a
receiver swap if the reverse is true. The time-table for payments can be represented
schematically as follows (assuming a unit notional).
Time    Fixed Coupon    Floating Coupon         Cashflow
T0      0               0                       0
T1      mK              mL(T0,T0,U0)            m(K-L(T0,T0,U0))
:       :               :                       :
Tn      mK              mL(Tn-1,Tn-1,Un-1)      m(K-L(Tn-1,Tn-1,Un-1))
Table 6.1 Swap payoff term structure
Consider the position of the holder of a payer swap. The value of the payer swap
at time t < T0 (the first LIBOR fixing date) is given by
$$ V(t,T_n) = \sum_{i=0}^{n-1} m B(t;T_{i+1}) \left( L(t,T_i,U_i) - K \right) \qquad (6.37) $$
where K is the fixed coupon rate. To give this swap zero initial value, we can set
$$ K = R(t,T_0,T_n) = \frac{ \sum_{i=0}^{n-1} B(t;T_{i+1})\, L(t,T_i,U_i) }{ \sum_{i=0}^{n-1} B(t;T_{i+1}) } \qquad (6.38) $$
This rate R(t, T0, Tn) which gives a zero value to a swap starting at time T0 and
expiring after n payments at Tn, is called the forward par swap rate for a swap with
tenor Tn − T0. The spot starting par swap rate, or just the swap rate, is the fixed coupon
rate that gives zero initial value to a swap starting at the present time. This is denoted
by R(t, t, T) for a swap with tenor T − t.
[Fig. 6.11. Swap payoffs: time axis 0, T1, T2, T3, T4, T5, U, showing the variable leg and the fixed leg]
6.8.1 Swap –another approach
A swap with notional N and time schedule τ = (T0; T1; T2,…,Tn) is a contract that
interchanges payments of two legs:
The fixed leg pays at times T1; T2,…,Tn a fixed annual rate
N mi K, with i = 1,…,n and mi = m(Ti-1; Ti)
The floating leg pays at times U1,…,Um that are most probably different from Ti,
although U0 = T0, Um = Tn. The payments are based on the Libor forward rates,
resetting at the start of each period. That is, seen from time t ≤ T0,
N mj L(t; Uj-1; Uj), with j = 1,…,m and mj = m(Uj-1; Uj)
The basis or day count convention for the floating leg is usually act/360, and for
the fixed leg there are many usual choices. There are different classes of swaps,
according to what is done with the fixed leg. If it is paid, we call it a payer swap, and a
receiver swap if the fixed leg is received. Let us think, for example, of a payer swap.
According to
$$ S_t = E^P\left[ e^{-\int_t^T r_s\,ds}\, S_T \Big|\, \mathcal{F}_t \right] $$
the value of the payer swap will be the conditional
expectation of the discounted payoff of the product. Thus, the value (at t) of our payer
swap is then the value of the fixed leg minus the value of the floating leg, that is
$$ V_t^{s} = V_t^{fi} - V_t^{fl} \qquad (6.39) $$
For simplicity, let us assume N = 1. For the fixed leg, by the linearity of the
conditional expectation and the definition of B(t; T), its value is
$$ V_t^{fi} = E^P\left[ \sum_{i=1}^{n} e^{-\int_t^{T_i} r_s\,ds}\, K m_i \Big|\, \mathcal{F}_t \right] = \sum_{i=1}^{n} B(t,T_i)\, K m_i \qquad (6.40) $$
To value the floating leg, let us assume that there are no differences in the time
schedules Ti and Ui as was mentioned previously. Therefore, its value is
$$ V_t^{fl} = E^P\left[ \sum_{i=1}^{n} e^{-\int_t^{T_i} r_s\,ds}\, m_i\, L(t,T_{i-1},T_i) \Big|\, \mathcal{F}_t \right] $$
$$ = \sum_{i=1}^{n} m_i\, E^P\left[ e^{-\int_t^{T_i} r_s\,ds} \Big|\, \mathcal{F}_t \right] \left( \frac{B(t,T_{i-1})}{B(t,T_i)} - 1 \right) \frac{1}{m_i} $$
$$ = \sum_{i=1}^{n} B(t,T_i) \left( \frac{B(t,T_{i-1})}{B(t,T_i)} - 1 \right) $$
$$ = \sum_{i=1}^{n} \left( B(t,T_{i-1}) - B(t,T_i) \right) = B(t,T_0) - B(t,T_n) \qquad (6.41) $$
With different time schedules, the formula is less concise. Nevertheless, a
floating leg that is valued as was done previously is often said to be valued “at par”.
Thus, the value of the payer swap is
$$ V_t^{s} = K \sum_{i=1}^{n} B(t,T_i)\, m_i - B(t,T_0) + B(t,T_n) \qquad (6.42) $$
The market swap rate associated to a swap with a time schedule of τ = (T0; T1;
T2,…,Tn) and a day-count convention m, is the rate of the fixed leg that makes the
contract fair (price 0) at time t: thus, such that the fixed leg and the variable leg have
the same price. Solving for K = Sτ,m with Vts = 0 we obtain
$$ S_{\tau,m}(t) = \frac{ B(t,T_0) - B(t,T_n) }{ \sum_{i=1}^{n} B(t,T_i)\, m_i } \qquad (6.43) $$
Of course, we can therefore rewrite the initial formula in terms of Sτ,m as
$$ V_t^{s}(K,\tau) = \left( K - S_{\tau,m}(t) \right) \sum_{i=1}^{n} B(t,T_i)\, m_i \qquad (6.44) $$
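Equations (6.42) to (6.44) only need a vector of discount factors and accrual factors. A minimal sketch follows; the function names and the sample curve are assumptions for illustration, and the sign convention is the one written in (6.42).

```python
def annuity(discounts, accruals):
    """Sum of B(t,Ti) * mi over the fixed-leg payment dates T1..Tn."""
    return sum(B * m for B, m in zip(discounts, accruals))


def par_swap_rate(B_T0, discounts, accruals):
    """Market swap rate, eq. (6.43). discounts = [B(t,T1), ..., B(t,Tn)]."""
    return (B_T0 - discounts[-1]) / annuity(discounts, accruals)


def swap_value(K, B_T0, discounts, accruals):
    """Swap value with unit notional, as written in eq. (6.42)."""
    return K * annuity(discounts, accruals) - B_T0 + discounts[-1]


# Hypothetical annual schedule T0 = 0 (so B(t,T0) = 1), T1..T3 = 1Y, 2Y, 3Y
discounts = [0.97, 0.94, 0.905]
accruals = [1.0, 1.0, 1.0]
S = par_swap_rate(1.0, discounts, accruals)
# At K = S the contract is fair; for any other K eq. (6.44) applies
print(S, swap_value(S, 1.0, discounts, accruals),
      swap_value(0.04, 1.0, discounts, accruals))
```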
Observe that by using the values of the curve already obtained from the forward
cash rates together with these market swap rates (which are market data), we can obtain
the values of the zero-coupon bond B(t,T) for values of T = 3M, 6M, 1y, etc. For values
of T that do not coincide with these dates, we use interpolation (usually
log-linear interpolation, since heuristically B(0; T) ~ e^{-rT}, where the unknown is r).
Such a recursive method to obtain information is often called a bootstrapping method.
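The recursion can be sketched from (6.43). For a spot-starting swap (we assume B(t,T0) = 1), solving (6.43) for the last discount factor gives B(t,Tn) = (1 − Sn Σ_{i<n} B(t,Ti) mi) / (1 + Sn mn). The function names and the quoted rates below are illustrative assumptions; a production curve would also use the cash rates for the short end.

```python
from math import exp, log


def bootstrap_discounts(swap_rates, accruals):
    """Discount factors B(t,T1..Tn) from par swap rates, assuming B(t,T0) = 1.

    swap_rates[k] is the par rate of the swap whose last payment is T_{k+1}.
    """
    discounts = []
    annuity = 0.0  # running sum of B(t,Ti) * mi over the dates already solved
    for S, m in zip(swap_rates, accruals):
        B = (1.0 - S * annuity) / (1.0 + S * m)
        discounts.append(B)
        annuity += B * m
    return discounts


def loglinear_interp(T, T_a, B_a, T_b, B_b):
    """Log-linear interpolation of the discount factor between two nodes."""
    w = (T - T_a) / (T_b - T_a)
    return exp((1.0 - w) * log(B_a) + w * log(B_b))


# Hypothetical par swap rates for 1Y, 2Y and 3Y annual swaps
B = bootstrap_discounts([0.030, 0.032, 0.035], [1.0, 1.0, 1.0])
print(B, loglinear_interp(1.5, 1.0, B[0], 2.0, B[1]))
```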
6.9 Swaption
A swaption is the option of entering a swap contract at a future time T. There are
two types:
A payer swaption allows its holder to exercise into a payer swap at the option maturity T,
thus agreeing to pay a fixed quantity and receive floating cash flows for a specified
period of time, called the swap tenor.
A receiver swaption allows its holder to exercise into a receiver swap, paying floating and
receiving fixed rate payments for a specified time.
At maturity, the holder of a swaption can exercise into swaps having several
possible tenors. For this reason, swaptions must specify not just the expiry time T of
the option, but also the tenor T* of the swap resulting from exercise. Thus swaption
prices, volatilities etc. are quoted on a matrix.
6.9.1 Payoff Structure
Consider a T × T* payer swaption. Let T = T0, T1, ..., Tn−1 be the fixing dates and
T1, T2, ..., Tn = T* the cash flow dates for the swap. If K is the fixed swap coupon and m
the constant accrual factor between fixing and payment dates, then the value at time t
of the payer swap is
$$ V(t,T^*) = \sum_{i=0}^{n-1} m B(t;T_{i+1}) \left( L(t,T_i,U_i) - K \right) \qquad (6.45) $$
The payoff from the payer swaption is therefore given by
$$ Payoff(t, T \times T^*) = V(T,T^*)^+ \qquad (6.46) $$
If we consider a forward-starting par swap, then the coupon rate K is given by
R(t, T, T*). Substituting this rate into the payer swap underlying the swaption, in
(6.45) we get
$$ V(t,T^*) = \left( R(t,T,T^*) - K \right) \sum_{i=0}^{n-1} m B(t;T_{i+1}) \qquad (6.47) $$
The payoff from the swaption, evaluated at the exercise date T, becomes
$$ Payoff = \left[ \left( R(T,T,T^*) - K \right) \sum_{i=0}^{n-1} m B(T;T_{i+1}) \right]^+ $$
$$ = \left( R(T,T,T^*) - K \right)^+ \times \sum_{i=0}^{n-1} m B(T;T_{i+1}) \qquad (6.48) $$
$$ = \left( R(T,T,T^*) - K \right)^+ \times PV01 $$
The summation factor $\sum_{i=0}^{n-1} m B(t;T_{i+1})$ is called the PV01, and represents the present
value of a basis point paid on the swap cash flow dates. Thus a payer swaption is just
a call option on the forward swap rate, with strike K. Similarly, a receiver swaption is
a put option on the forward swap rate.
6.9.2 Pricing a Swaption
The Black model is the market standard for pricing swaptions. In fact it is curious
to note here that the Black model started being used by the market itself, and it was
not until later that a theory was elaborated to justify its application. The two crucial
assumptions in the model are firstly, that the forward swap rate R(t, T, T*) is driven
by a zero drift geometric Brownian motion
$$ \frac{dR(t,T,T^*)}{R(t,T,T^*)} = \sigma_{T \times T^*}\, dW_t \qquad (6.49) $$
and secondly, that the discounting is constant. This implies that the PV01 does
not change through time.
Now, by using exactly the same calculations as for vanilla caplets, it is easy to see
that the price of a payer swaption is given by
$$ Payoff(t; T \times T^*)_{payer} = B(t;T_i) \times PV01 \times E\left[ \left( R(t,T,T^*) - K \right)^+ \right] $$
$$ = B(t;T_i) \times PV01 \times \left[ R(t,T,T^*)\, N(d_1^T) - K\, N(d_2^T) \right] \qquad (6.50) $$
$$ d_{1,2}^T = \frac{ \log\left( \frac{R(t,T,T^*)}{K} \right) \pm \frac{\sigma_{T \times T^*}^2 (T-t)}{2} }{ \sigma_{T \times T^*} \sqrt{T-t} } \qquad (6.51) $$
The only unobservable is the forward rate volatility σT×T* . However, there is a 1 to
1 correspondence between this volatility and the resultant price of a swaption, and
this fact is used to invert observed swaption market prices and obtain a matrix of flat
implied volatilities, also known as Black or lognormal volatilities.
A receiver swaption is similar to a payer swaption, except that it can be expressed
as a put option on the forward starting swap rate. The price of an otherwise identical
receiver swaption is given by
$$ Payoff(t; T \times T^*)_{receiver} = B(t;T_i) \times PV01 \times E\left[ \left( K - R(t,T,T^*) \right)^+ \right] $$
$$ = B(t;T_i) \times PV01 \times \left[ K\, N(-d_2^T) - R(t,T,T^*)\, N(-d_1^T) \right] \qquad (6.52) $$
6.9.3 Put-Call Parity for Swaptions
Let the PayoffReceiver (t, T × T*) be a receiver swaption with strike K and the
PayoffPayer (t, T × T*) be an identical payer swaption. Consider the portfolio
Payoff (t, T × T*) = PayoffPayer (t, T × T*) - PayoffReceiver (t, T × T*)
It is simple to verify that
Payoff (t, T × T*) = PV01 x (R(T,T,T*)-K)
Or written in words
Payer Swaption – Receiver Swaption = Payer Swap
By no-arbitrage, the value of the portfolio (the Payoff) must be equal to the value
of a payer swap. This relationship can be used as an alternative to direct integration
when finding the price of a receiver swaption. The ATM strike is defined to be the
value of K which makes the values of the payer and receiver swaptions equal. Put-call
parity now implies that this must be the same rate that gives a forward-starting swap,
zero value. In other words, the ATM strike is simply equal to the forward starting par
swap rate R(t, T, T*).
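The Black swaption formulas and the parity relation can be illustrated together with a short sketch. Here we assume that the argument `pv01` is already the present value at t of the annuity, so no further discount factor is applied; the function names and numbers are illustrative assumptions.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def black_swaption(R, K, sigma, tau, pv01, payer=True):
    """Black price of a payer or receiver swaption, following (6.50)-(6.52).

    R     forward par swap rate R(t,T,T*)
    K     strike
    sigma lognormal volatility of the forward swap rate
    tau   time to expiry T - t (must be positive)
    pv01  annuity already discounted to t (assumption of this sketch)
    """
    d1 = (log(R / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    if payer:
        return pv01 * (R * norm_cdf(d1) - K * norm_cdf(d2))
    return pv01 * (K * norm_cdf(-d2) - R * norm_cdf(-d1))


# Payer minus receiver equals the forward-starting payer swap, pv01 * (R - K)
R, K, sigma, tau, pv01 = 0.040, 0.035, 0.18, 2.0, 4.3
payer = black_swaption(R, K, sigma, tau, pv01, payer=True)
receiver = black_swaption(R, K, sigma, tau, pv01, payer=False)
print(payer - receiver, pv01 * (R - K))
```

At the ATM strike K = R the two prices coincide, in line with the definition of the ATM strike given above.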
Chapter 7 HJM
7. HJM
7.1 Introduction
The Heath-Jarrow-Morton framework is a general framework to model the

evolution of interest rates (forward rates in particular). It describes the behaviour of
the future price (in t) of a zero coupon bond B(t,T) paying 1 unit of currency at time T.
The framework originates from the studies of D. Heath, Robert A. Jarrow and A.
Morton in the late 1980s- refer to: “Bond pricing and the term structure of interest
rates- a new methodology” (1987) - working paper, Cornell University, and “Bond
pricing and the term structure of interest rates: a new methodology” (1989) - working
paper, Cornell University.
The Heath, Jarrow and Morton term structure model provides a consistent
framework for the pricing of interest rate derivatives. The model is directly calibrated
to the currently observed yield curve, and is complete in the sense that it does not
involve the market price of interest rate risk, something which was a feature of the
early generation of interest rate models, such as Vasicek (1977) and Cox, Ingersoll and
Ross (1985).
The key aspect of HJM techniques lies in the recognition that the drifts of the no-
arbitrage evolution of certain variables can be expressed as functions of their
volatilities and the correlations among themselves. In other words, no drift estimation
is needed. Models developed according to the HJM framework are different from the
so called short-rate models in the sense that HJM-type models capture the full
dynamics of the entire forward rate curve, while the short-rate models only capture
the dynamics of a point on the curve (the short rate). In practice however, we will not
work with a complete, absolutely continuous discount curve B(t,T), but will instead
construct our curve based on discrete market quotes, and will then extrapolate the
data to make it continuous.
Given the zero-coupon curve B(t,T), there exists a forward rate F(t,T) such that
$$ dF(t,T) = \mu(t,T)\,dt + \sigma(t,T)\,dW_t^P \qquad (7.1) $$
These dynamics are the foundation on which the HJM model is constructed.
7.2 Model Origins
There are two basic arbitrage relationships that derive from the bond pricing
equation:
1. $B(t,T) = E_t^P\left[ e^{-\int_t^T r_s\,ds} \right]$, associated with the spot rates (Classical)
2. $B(t,T) = e^{-\int_t^T F(t,s)\,ds}$, associated with the instantaneous forward rates (HJM)
All existing models start from one or another and follow the same general
procedure:
They start with a set of bond prices {B(t,T)} that are reasonably arbitrage free, and
use either of the previous two arbitrage relationships to go backwards in time so as to
determine a model for either the spot rate rt or for the set of forward rates F(t,s)
depending on the arbitrage relationship selected.
As both relations hold under the no arbitrage conditions, the models obtained are
risk adjusted i.e. they are valid under the risk neutral measure P.
The aim behind the creation of these rt or F(t,s) models is to then perform the
inverse path. That is, to use the models developed to price interest rate derivatives
other than bonds.
1. Classical methods use the first relationship. They try to extract from the set of
bonds {B(t,T)} a risk adjusted model for the spot rate rt, using an assumption on the
Markovness of rt.
2. The HJM approach uses the second relationship. It obtains as a result the
arbitrage free dynamics of ‘d’ dimensional instantaneous forward rates F(t,s). It
requires no spot rate modelling, and what is more, it demonstrates that the spot rate rt
is in general not Markov.
7.3 The HJM Development
The HJM model starts off by exploiting the risk neutral relationship. Imagine
that we have a pair of arbitrage free zero coupon bonds B(t,T), and B(t,U), and let
F(t,T,U) be the default-free forward interest rate contracted at time t, starting at T and
ending at maturity U. For simplicity, we will assume that the accrual factor
‘m’ is equal to one. As seen in the Libor section (Chapter 5.1), we can write the arbitrage free
relationship:
$$ \frac{B(t,T)}{B(t,U)} = 1 + F(t,T,U) \qquad (7.2) $$
We are thus relating two different bonds (and therefore their two different
dynamics) through a unique forward rate F. This means that the bond’s arbitrage
relations will be directly built into the forward rate dynamics.
The question that logically follows is: which forward rate to use? We have already
seen that there exist both a continuously compounded model for instantaneous rates,
F(t,T), and a discrete expression of the form F(t,T,U). The above clearly makes use of
the discrete approach, and leads to the BGM models created by the work of Brace,
Gatarek and Musiela.
In contrast, the original approach used by HJM was to model the continuously
compounded instantaneous forward rates, F(t,T), where, as we saw previously in 5.3.3,
$$ B(t,T) = e^{-\int_t^T F(t,s)\,ds} \qquad (7.3) $$
With the above, the arbitrage relationship between interest rate bonds now
becomes:
$$ \frac{B(t,T)}{B(t,U)} = e^{\int_T^U F(t,s)\,ds} \qquad (7.4) $$
We will continue our introduction to the HJM model along the lines of the
original HJM approach.
Notice that there is no expectation operator in the above, since the F(t,s) are all
forward rates observed at the current time t, beginning at a future date s, and lasting
an infinitesimal time ds. For simplicity we will now adopt Bt = B(t,T).
The HJM model, following Black, assumes that a typical bond in a risk neutral
environment follows the stochastic differential equation:
dBt = rt Bt dt + σ (t , T , Bt ) Bt dWt (7.5)
Where rt is the risk-free instantaneous spot rate and is therefore equal for all
bonds or assets. The noteworthy part of this model is the fact that it uses a unique
Brownian Motion. Indeed, it is quite tedious to demonstrate how the Brownians do
not depend on the end WT. We will not enter the details of this development here, but
will simply underline once more the importance of the fact that we are able to take a
unique Brownian parameter.
From the equation above (7.4), we could rearrange the expression so as to have
$$ \log \frac{B(t,T)}{B(t,U)} = \int_T^U F(t,s)\,ds \qquad (7.6) $$
This can be re written in terms of non instantaneous forward rates, for an

infinitesimal interval ∆ as:
log B(t , T ) − log B(t , T + ∆) = F (t , T , T + ∆)((T + ∆) − T ) (7.7)
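Relation (7.7) can be illustrated numerically: given an instantaneous forward curve, the discrete forward rate over [T, T+∆] is the average of the instantaneous forwards on that interval, and it approaches F(t,T) as ∆ shrinks. The linear curve and the function names below are assumptions made only for the example.

```python
from math import exp, log


def bond_price(t, T, inst_fwd, n=1000):
    """B(t,T) = exp(-integral of F(t,s) ds from t to T), midpoint rule, eq. (7.3)."""
    h = (T - t) / n
    integral = sum(inst_fwd(t + (i + 0.5) * h) for i in range(n)) * h
    return exp(-integral)


def discrete_forward(t, T, delta, inst_fwd):
    """F(t,T,T+delta) from eq. (7.7)."""
    log_B_T = log(bond_price(t, T, inst_fwd))
    log_B_Td = log(bond_price(t, T + delta, inst_fwd))
    return (log_B_T - log_B_Td) / delta


def curve(s):
    """Hypothetical instantaneous forward curve F(0,s), linear in s."""
    return 0.02 + 0.005 * s


# The discrete forward converges to the instantaneous forward F(0,2) = 0.03
for delta in (0.5, 0.1, 0.001):
    print(delta, discrete_forward(0.0, 2.0, delta, curve))
```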
by applying Ito’s Lemma:
$$ d[\log B(t,T)] = \frac{1}{B(t,T)}\, dB(t,T) + 0 + \frac{1}{2} \left( \frac{-1}{B(t,T)^2} \right) \left( \sigma(t,T,B_t)\, B(t,T) \right)^2 dt \qquad (7.8) $$
we can replace our diffusion expression for dB in the above
$$ d[\log B(t,T)] = \left( r_t - \frac{1}{2} \sigma(t,T,B_t)^2 \right) dt + \sigma(t,T,B_t)\, dW_t^P \qquad (7.9) $$
Similarly for log B(t,T+∆) we can write
$$ d[\log B(t,T+\Delta)] = \left( r_t - \frac{1}{2} \sigma(t,T+\Delta,B_t)^2 \right) dt + \sigma(t,T+\Delta,B_t)\, dW_t^P \qquad (7.10) $$
It is important to realize that the drift terms rt are the same in both cases, because
we are considering a risk neutral scenario. This is the same argument that is applied in
the Black Scholes derivation.
The drift term is unknown, but we can use a trick to eliminate it: subtracting the
two equations,
$$ d[\log B(t,T)] - d[\log B(t,T+\Delta)] = \frac{1}{2} \left[ \sigma(t,T+\Delta,B_t)^2 - \sigma(t,T,B_t)^2 \right] dt - \left[ \sigma(t,T+\Delta,B_t) - \sigma(t,T,B_t) \right] dW_t^P \qquad (7.11) $$
From (7.7) we had
$$ F(t;T,T+\Delta) = \frac{ \log B(t,T) - \log B(t,T+\Delta) }{ \Delta } $$
so that
$$ dF(t;T,T+\Delta) = \left[ \frac{ \sigma(t,T+\Delta,B_t)^2 - \sigma(t,T,B_t)^2 }{ 2\Delta } \right] dt - \left[ \frac{ \sigma(t,T+\Delta,B_t) - \sigma(t,T,B_t) }{ \Delta } \right] dW_t^P \qquad (7.12) $$
Now the above can be considered a derivative if ∆ → 0. Recall that
$$ \frac{\partial f}{\partial x} = \lim_{\Delta \to 0} \frac{ f(x+\Delta) - f(x) }{ \Delta } $$
We can therefore rewrite the first term in (7.12) as:
$$ \lim_{\Delta \to 0} \frac{ \sigma(t,T+\Delta,B_t)^2 - \sigma(t,T,B_t)^2 }{ 2\Delta } = \sigma(t,T,B_t) \left( \frac{ \partial \sigma(t,T,B_t) }{ \partial T } \right) \qquad (7.13) $$
for the second term we have
$$ \lim_{\Delta \to 0} \frac{ \sigma(t,T+\Delta,B_t) - \sigma(t,T,B_t) }{ \Delta } = \frac{ \partial \sigma(t,T,B_t) }{ \partial T } \qquad (7.14) $$
and, since the discrete forward rate F(t,T,T+∆) tends to F(t,T) as ∆ → 0, the left-hand side becomes dF(t,T).
We therefore end up with
$$ dF(t,T) = \sigma(t,T,B_t) \left( \frac{ \partial \sigma(t,T,B_t) }{ \partial T } \right) dt - \left( \frac{ \partial \sigma(t,T,B_t) }{ \partial T } \right) dW_t^P \qquad (7.15) $$
where σ are the bond price volatilities (which are generally quoted by the market
itself). We therefore need only solve the above to attain the HJM forward rate model.
Note lastly that the above corresponds to a diffusion model for F(t,T) of the form
$$ dF(t,T) = \sigma(t,T,B_t)\, b(t,T)\, dt - b(t,T)\, dW_t^P \qquad (7.16) $$
where the partial derivative ∂σ(t,T,B_t)/∂T is collected under the term b(t,T).
7.4 The rt in the HJM Approach
We will now demonstrate that through the HJM approach, there is no need to
model a diffusion process for rt, which may be inaccurate. Instead, we can directly derive
the spot rates from our instantaneous forward rates F(t,T), by simply realising that:
$$ r_t = F(t,t) \qquad (7.17) $$
That is, the spot rate corresponds to the nearest infinitesimal forward loan
starting at time t (recall (5.22)).
Now by integrating our forward model derived in (7.16), that is
$$ dF(t,T) = \sigma(t,T,B_t)\, b(t,T)\, dt - b(t,T)\, dW_t^P \qquad (7.18) $$
we obtain
$$ F(t,T) = F(0,T) + \int_0^t \sigma(s,T,B_s)\, b(s,T)\, ds - \int_0^t b(s,T)\, dW_s^P \qquad (7.19) $$
Remember that we had $b(s,T) = \frac{\partial \sigma(s,T,B_s)}{\partial T}$, meaning, since a bond has zero volatility at its own maturity, that
$$ \sigma(s,T,B_s) = \int_s^T b(s,u)\, du \qquad (7.20) $$
$$ F(t,T) = F(0,T) + \int_0^t b(s,T) \left[ \int_s^T b(s,u)\, du \right] ds - \int_0^t b(s,T)\, dW_s^P \qquad (7.21) $$
So if we now select T = t, our expression for the forward rate becomes an
expression for the spot rate:
$$ r_t = F(0,t) + \int_0^t b(s,t) \left[ \int_s^t b(s,u)\, du \right] ds - \int_0^t b(s,t)\, dW_s^P \qquad (7.22) $$
The forward rates are biased estimators of future spot rates under the risk neutral
measure.
Proof
Let us demonstrate this by taking the conditional expectation of a future spot rate
rτ with τ > t. Then
$$ E_t^P[r_\tau] = E_t^P[F(t,\tau)] + E_t^P\left[ \int_t^\tau b(s,\tau) \left[ \int_s^\tau b(s,u)\, du \right] ds \right] - E_t^P\left[ \int_t^\tau b(s,\tau)\, dW_s^P \right] \qquad (7.23) $$
The last term is 0, as a stochastic integral against future Brownian increments has zero
conditional expectation. F(t,τ) is known at time t, so it comes out of the expectation. We are thus left with
$$ E_t^P[r_\tau] = F(t,\tau) + E_t^P\left[ \int_t^\tau b(s,\tau) \left[ \int_s^\tau b(s,u)\, du \right] ds \right] \qquad (7.24) $$
meaning $F(t,\tau) \neq E_t^P(r_\tau)$
The HJM exploits the arbitrage relationship between forward rates and bond
prices, eliminating the need to model the expected rate of change of the spot rate.
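The size of the bias between the forward rate and the expected spot rate is the double integral of the forward volatility b(s,u). As a minimal sketch, we evaluate it with a midpoint rule for a deterministic b and compare it with the closed form for a constant b, for which the integral equals b²τ²/2. The constant volatility and the function names are assumptions made for the illustration.

```python
def hjm_drift_integral(b, tau, t0=0.0, n=200):
    """Midpoint-rule value of
        integral_{t0}^{tau} b(s,tau) * ( integral_s^tau b(s,u) du ) ds
    for a deterministic forward-rate volatility function b(s,u)."""
    h = (tau - t0) / n
    total = 0.0
    for i in range(n):
        s = t0 + (i + 0.5) * h
        k = (tau - s) / n
        # inner integral over u in [s, tau]
        inner = sum(b(s, s + (j + 0.5) * k) for j in range(n)) * k
        total += b(s, tau) * inner * h
    return total


def b_const(s, u):
    """Hypothetical constant forward-rate volatility of 1%."""
    return 0.01


tau = 5.0
bias = hjm_drift_integral(b_const, tau)
# For constant b the bias is b^2 * tau^2 / 2
print(bias, 0.01 ** 2 * tau ** 2 / 2.0)
```

With a 1% volatility and a 5 year horizon the gap between the expected spot rate and the forward rate is about 12.5 basis points, and it grows quadratically with the horizon.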
Chapter 8 Santander HJM
8. Santander HJM
In the Heath Jarrow Morton framework, we assume the bond prices follow the
subsequent dynamics:
$$ \frac{dB(t,T)}{B(t,T)} = r_t\, dt + \Gamma(t,T)\, dW_t^P \qquad (8.1) $$
The first term rt dt is constructed based on the risk neutral probability P which
was already explained in detail in the mathematical section 2.6. Remember that it
implies that all assets, bonds and securities should present an equal internal rate of
return. It is thus independent of all bond prices B(t,T), avoiding any arbitrage
possibilities.
The second term $\Gamma(t,T)\, dW_t^P$ follows the Heath Jarrow Morton model. The
specific formulation chosen for the diffusion term Γ(t,T) has been invented and
developed by the Banco Santander quant team itself.
The equation we set out with would require an infinite set of
Brownian motions for every t, especially if the product we model were constructed with
numerous fixings. However, market observation of real data suggests that the curve
obtained through the above equation only experiences translations and rotations
around the 2 to 3 year mark, and presents an important smile which we must be
capable of modelling. These are the only transformations which we need to be capable
of representing, and therefore we will not need hundreds of Brownian sources
of randomness. Instead, we create our HJM model based on a finite number of Brownian
sources.
$$ \Gamma(t,T)\, dW_t = \sum_{j=1}^{N} \gamma_j(t,T)\, dW_t^P \qquad (8.2) $$
where γj is the volatility of the discount factor B(t,T) for a particular instant.
As can be seen by the N index, we introduce a calibration set of N products that

each introduce a particular source of risk or stochasticity. It is up to the trader to
decide which set of N vanilla products will best reproduce the risk that he is trying to
model for his exotic product. In the above notation, each ‘j’ will correspond to a
particular maturity ‘T’.
However, the above time partitioning of the volatilities is fixed for our HJM
model, and independent of any product that we decide to calibrate. This implies that
it is we who define the series of time intervals 0= t0 < t1 < t2 < … < tN <… <∞ that we
will be considering, not the product.
From observation of historical data, we decide also that the Brownian motion
term $dW_t^P$ is independent of the bond maturities T. Bond price dynamics B(t,T) seem
to all, historically, behave in the same way for a common T.
At Banco Santander we set out to model Γ using the following criteria. We
have already stated that the model must be capable of reproducing the smile of exotic
products. We require it to provide de-correlation between forward Libor
rates so as to maintain a degree of flexibility within movements in different parts of
the forward rate curve. But most important of all, we require the model to be as
simple as possible, i.e., it should have the minimal number of parameters capable
of reproducing the above characteristics. This will enable calibrations to be as rapid as
possible.
8.1 How to choose the γ?
Our choice for the different γj(T) will determine the bond price distribution that
we obtain. We start by seeing how this parameter has been chosen in other models.
The main historical developments in this field can be grouped under the BGM
methodology. Developed in 1995 – 1996, it builds on the construction of a form for the
γ that is consistent with the lognormality of a quasi Black-Scholes model, and that
takes into account a set of particular forward Libor rates, selected depending on their
maturities.
We have also already seen that another main stream of thought was to take the
zero coupon rates R(t,T), where $B(t,T) = e^{-R(t,T)(T-t)}$
8.2 One Factor
Being one factor refers to the fact that the dynamics we attempt to reproduce are
modelled through a unique parameter, which will be the global volatility σ in our
case.
8.2.1 One-factor quasi log-normal:
$$ \gamma_j(T) = \sigma(t_{j+1},T)\, \log B(t_j,T) \qquad (8.3) $$
Proof
The above is derived from the following: recall that
$$ B(t,T) = e^{-R(t,T)(T-t)} \;\rightarrow\; R(t,T) = \frac{\log B(t,T)}{t-T} \qquad (8.4) $$
Applying Ito, we obtain
$$ dR(t,T) = \frac{1}{t-T} \left[ \frac{1}{B(t,T)}\, dB(t,T) + \frac{1}{2} \left( \frac{-1}{B(t,T)^2} \right) \left( \Gamma(t,T)\, B(t,T) \right)^2 dt - \frac{\log B(t,T)}{t-T}\, dt \right] \qquad (8.5) $$
Replacing our diffusion equation $\frac{dB(t,T)}{B(t,T)} = r_t\, dt + \Gamma(t,T)\, dW_t^P$ in the previous
and also replacing log B(t,T) with the expression in (8.4), we obtain
$$ dR(t,T) = \frac{1}{t-T} \left[ \left( r_t\, dt + \Gamma(t,T)\, dW_t^P \right) - \frac{1}{2} \Gamma(t,T)^2\, dt - R(t,T)\, dt \right] \qquad (8.6) $$
and so regrouping terms in dt and dW
$$ dR(t,T) = -\frac{1}{t-T} \left[ R(t,T) + \frac{\Gamma^2(t,T)}{2} - r_t \right] dt + \frac{\Gamma(t,T)}{t-T}\, dW_t^P \qquad (8.7) $$
From market data analysis, we realize that the dynamics of bonds in general
follow a lognormal term structure. As we have seen in (8.4), B(t,T) is directly related
to R(t,T). Thus if our bonds follow a lognormal behaviour, so must our rate dynamics.
Therefore, if we impose that our model be log-normal, then we must impose that
the volatility term be linear with R(t,T). That is
$$ \frac{\Gamma(t,T)}{t-T} = \sigma(t,T)\, R(t,T) \qquad (8.8) $$
in such a way that the log of our dynamics be normal
$$ \log R(t,T) \;\rightarrow\; \frac{dR(t,T)}{R(t,T)} = (\ldots)\,dt + \sigma(t,T)\, dW_t^P \qquad (8.9) $$
As can be seen, the Brownian term associated with log R(t,T) is now normal. Since
we have log R(t,T), we say that our version is lognormal.
With this imposition, we obtain from (8.8) that
$$ \Gamma(t,T) = (t-T)\, \sigma(t,T)\, R(t,T) \qquad (8.10) $$
And as we had from (8.4)
$$ R(t,T) = \frac{\log B(t,T)}{t-T} \qquad (8.11) $$
Then
$$ \Gamma(t,T) = \sigma(t,T)\, \log B(t,T) \qquad (8.12) $$
Note that the volatility of the rate R is a deterministic function σ(t,T) to be calibrated to market data. More concretely, we will use swaption and caplet prices to obtain information about σ, and thus about Γ. In practice we propose a version of Γ that is piecewise constant in t, so that R is only log-normal piecewise, meaning that it is quasi log-normal.
As we have seen in section 3.3, log-normality of R means that we have a relatively flat Black's implied volatility smile. We have already seen that other models consider normality instead, and we have also seen in Fig. 3.4 that this implies a negative skew for their associated Black's implied volatility smile.
The term log B(t,T) is Markovian, meaning that to continue the process, the term depends on previous data.
8.2.2 One-factor normal
The development of a normal model is completely analogous to the above. The only difference lies at the moment of imposing the model we want to follow. Instead of $\frac{\Gamma(t,T)}{t-T} = \sigma(t,T)\,R(t,T)$ we now simply impose $\frac{\Gamma(t,T)}{t-T} = \sigma(t,T)$. This is a normal model, i.e. the Brownian term is independent of R(t,T). Since (t-T) is a deterministic term, we can include it within the volatility term, which is itself deterministic and also time dependent. Therefore:
$$\Gamma(t,T) = (t-T)\cdot\sigma(t,T) = \tilde{\sigma}(t,T) \qquad (8.13)$$
In fact, in our Santander model, we will extract from $\tilde{\sigma}(t,T)$ a deterministic, time dependent part, leaving
$$\Gamma(t,T) = \sigma(t,T)\,\log\frac{B(0,T)}{B(0,t)} \qquad (8.14)$$
8.3 Model Implementation
1st Approach:
Suppose a generic HJM setting of the form,
$$\frac{dB(t,T)}{B(t,T)} = r_t\,dt + \Gamma(t,T)\,dW_t^P \qquad (8.15)$$
where
$$\Gamma(t,T) = \sigma(t,T)\,\log B(t,T) \qquad (8.16)$$
For simplicity we have not included the time dependency, thus
$$R(t,T) = \log B(t,T) \qquad (8.17)$$
We can then write
$$dR(t,T) = d\log B(t,T) = \frac{1}{B(t,T)}\,dB(t,T) + 0 + \frac{1}{2}\left(\frac{-1}{B^2(t,T)}\right)\bigl(\Gamma(t,T)\,B(t,T)\bigr)^2 dt \qquad (8.18)$$
recalling $\frac{dB(t,T)}{B(t,T)} = r_t\,dt + \Gamma(t,T)\,dW_t^P$, then:
$$d\log B(t,T) = \left(r_t\,dt + \Gamma(t,T)\,dW_t^P\right) - \frac{1}{2}\Gamma(t,T)^2\,dt \qquad (8.19)$$
$$d\log B(t,T) = \left(r_t - \frac{\Gamma^2(t,T)}{2}\right)dt + \Gamma(t,T)\,dW_t^P \qquad (8.20)$$
Our main difficulty here is the term r_t, which is risk neutral. This means that it must show the same internal rate of return for any two assets. This must also apply for any two assets that are separated in time. We can therefore examine two bonds of different maturities, and subtract them to eliminate the term r_t.
$$d\bigl(\log B(t,T) - \log B(t,U)\bigr) = -\frac{\Gamma^2(t,T) - \Gamma^2(t,U)}{2}\,dt + \bigl(\Gamma(t,T) - \Gamma(t,U)\bigr)\,dW_t^P \qquad (8.21)$$
The easiest way to implement this model is through a MonteCarlo approach (refer to section 9.2). For this, we will need to generate a number of paths between every interval t_i and t_{i+1}. As mentioned previously, in the HJM model we only examine the particular bonds and maturities T_i that are of interest to us.
Integrating the previous equation, we obtain:
$$\frac{B(t,T_{i+1})}{B(t,T_i)} = \exp\left(-\int_{T_i}^{T_{i+1}}\frac{\Gamma^2(t,T_{i+1}) - \Gamma^2(t,T_i)}{2}\,dt + \int_{T_i}^{T_{i+1}}\bigl(\Gamma(t,T_{i+1}) - \Gamma(t,T_i)\bigr)\,dW_t^P\right) \qquad (8.22)$$
Not all of the above components are completely determined since we do not possess information on the parameter values at future times:
• $\frac{\Delta\Gamma^2}{2}\,dt$ should be integrated, using $\gamma_j(T) = \sigma(t_{j+1},T)\,\log B(t_j,T)$
• $\Delta\Gamma\,dW_t^P$ is a stochastic Brownian term, and so relatively easy to calculate, i.e. it integrates to give a 0 mean, and so we must only calculate its variance. Recall a rapid example:
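The zero-mean property of the Brownian term can be checked numerically. The following is a minimal sketch, not part of the original model: it assumes a hypothetical deterministic volatility `hypothetical_sigma`, approximates the stochastic integral by a left-point sum, and compares the sample variance with the integral of σ² over the interval (the Ito isometry). All names and parameter values are illustrative.

```python
import math
import random


def simulate_stochastic_integral(sigma, t0, t1, n_steps, rng):
    """Left-point (Ito) sum approximating the integral of sigma(t) dW_t on [t0, t1]."""
    dt = (t1 - t0) / n_steps
    total = 0.0
    for k in range(n_steps):
        t = t0 + k * dt
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment over dt
        total += sigma(t) * dw
    return total


def hypothetical_sigma(t):
    """Illustrative deterministic volatility, not taken from the text."""
    return 0.10 + 0.05 * t


rng = random.Random(0)
samples = [simulate_stochastic_integral(hypothetical_sigma, 0.0, 1.0, 50, rng)
           for _ in range(5000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)

# Closed form of the integral of (0.10 + 0.05 t)^2 over [0, 1].
theoretical_var = 0.10 ** 2 + 0.10 * 0.05 + 0.05 ** 2 / 3.0
print(sample_mean, sample_var, theoretical_var)
```

The sample mean should be close to zero and the sample variance close to `theoretical_var`, up to Monte Carlo noise and the discretisation of the sum.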
2nd Approach:
Develops on the previous idea, but is now non Markovian. We therefore change our initial approach
$$\Gamma(t,T) = \sigma(t,T)\,\log B(t,T) \qquad (8.23)$$
to
$$\Gamma(t,U) = \sigma(t,U)\,\log B(T_i,U) \quad \forall t \in [T_i, T_{i+1}] \qquad (8.24)$$
In the previous approach, we did not know the future values at time t. Now, instead, we evaluate our data at the beginning of our time step, where the value is already known, and where only T_{i+1} is left to determine.
The T_i are therefore model time steps, strictly associated with the model itself and not with the product.
B(Ti,U) is now no longer Markovian. That is, future steps only depend on the previous point.
Integrating now from Ti to Ti+1 we obtain:
$$\log\frac{B(T_{i+1},V)}{B(T_{i+1},U)} - \log\frac{B(T_i,V)}{B(T_i,U)} = \int_{T_i}^{T_{i+1}}\frac{\Gamma^2(t,U) - \Gamma^2(t,V)}{2}\,dt + \int_{T_i}^{T_{i+1}}\bigl(\Gamma(t,V) - \Gamma(t,U)\bigr)\,dW_t^P \qquad (8.25)$$
where Γ(t,U) = σ(t,U) log B(Ti,U), and where log B(Ti,U) is no longer time dependent, so can be extracted from the integral:
$$\begin{aligned}
\log\frac{B(T_{i+1},V)}{B(T_{i+1},U)} - \log\frac{B(T_i,V)}{B(T_i,U)} ={}& \bigl(\log B(T_i,U)\bigr)^2\int_{T_i}^{T_{i+1}}\frac{\sigma^2(t,U)}{2}\,dt - \bigl(\log B(T_i,V)\bigr)^2\int_{T_i}^{T_{i+1}}\frac{\sigma^2(t,V)}{2}\,dt \\
&+ \log B(T_i,V)\int_{T_i}^{T_{i+1}}\sigma(t,V)\,dW_t^P - \log B(T_i,U)\int_{T_i}^{T_{i+1}}\sigma(t,U)\,dW_t^P
\end{aligned} \qquad (8.26)$$
At Ti we already know all the values for B(Ti, ·), and all the σ are also known and deterministic. We are only left with the need to generate the stochastic integrals
$$\int_{T_i}^{T_{i+1}}\sigma(t,V)\,dW_t^P \qquad (8.27)$$
for the maturities V that are of interest to us, and which correspond to the fixing dates at which the cash flows are exchanged.
Notice that in this approach we only have one Brownian motion. This does not necessarily imply that all our elements $\int_{T_i}^{T_{i+1}}\sigma(t,V)\,dW_t^P$ are perfectly correlated. Imagine for example that we have
$$\int\sigma_1(t,V)\,dW \quad\text{and}\quad \int\sigma_2(t,V)\,dW$$
both with the same dW. Then if the individual σ_i follow
[Figure: σ1(s) and σ2(s) plotted against s, each with the time t marked.]
Fig. 8.1. Example of lack of correlation between variables belonging to a unique Brownian motion
then
$\int\sigma_1(s)\,dW$ only depends on W before t
$\int\sigma_2(s)\,dW$ only depends on W after t
Therefore
$$\text{Covariance} = \int\sigma_1\sigma_2\,dt = 0$$
We notice however that to simulate the integrals $\int_{T_i}^{T_{i+1}}\sigma(t,V)\,dW_t^P$, we require the same number of Brownians W_t as the number of maturities V_i that we want for each step. Thus we are dependent upon the form of σ(t,V) between any two dates Ti and Ti+1. We therefore decide to look for the simplest possible expression for σ.
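The lack of correlation described above can be reproduced with a short simulation. This is a sketch under stated assumptions: the step functions `early` and `late` are hypothetical volatilities that are non-zero only before and only after the mid-point of the interval, and both integrals are driven by the same simulated Brownian increments.

```python
import math
import random


def paired_integrals(sigma_1, sigma_2, t0, t1, n_steps, rng):
    """Integrate two volatilities against the SAME Brownian increments."""
    dt = (t1 - t0) / n_steps
    integral_1 = 0.0
    integral_2 = 0.0
    for k in range(n_steps):
        t = t0 + k * dt
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # one shared increment
        integral_1 += sigma_1(t) * dw
        integral_2 += sigma_2(t) * dw
    return integral_1, integral_2


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def early(t):
    """Hypothetical volatility active only on the first half of [0, 1]."""
    return 0.2 if t < 0.5 else 0.0


def late(t):
    """Hypothetical volatility active only on the second half of [0, 1]."""
    return 0.0 if t < 0.5 else 0.2


rng = random.Random(1)
disjoint = [paired_integrals(early, late, 0.0, 1.0, 20, rng) for _ in range(4000)]
identical = [paired_integrals(early, early, 0.0, 1.0, 20, rng) for _ in range(4000)]
corr_disjoint = sample_correlation(disjoint)
corr_identical = sample_correlation(identical)
print(corr_disjoint, corr_identical)
```

With disjoint supports the sample correlation is close to zero, whereas identical volatilities give a correlation of one, even though a single Brownian motion drives both cases.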
3rd Approach:
We make the hypothesis that σ(t,U) = σ(Ti,U), which appears to be the simplest form for numeric generation. We therefore have:
$$\Gamma(t,U) = \sigma(T_i,U)\,\log B(T_i,U) \quad \forall t \in [T_i, T_{i+1}] \qquad (8.28)$$
Notice that we select, as we did for the bonds, a known Ti for our σ(Ti,U), that is, at the beginning of the interval. We do this because there is no strong reason that would suggest we should take any other t, and because by selecting a known Ti, numerically, everything becomes much easier.
Notice also that Γ is still stochastic for every Ti, as it changes value between each Ti stochastically. However, Γ is piecewise constant on every interval [Ti, Ti+1].
Building on the formulation that we had developed in our previous approach, we can now also extract the constant and known σ(Ti,U) from the integrals:
$$\begin{aligned}
\log\frac{B(T_{i+1},V)}{B(T_{i+1},U)} ={}& \log\frac{B(T_i,V)}{B(T_i,U)} + \bigl(\log B(T_i,U)\bigr)^2\int_{T_i}^{T_{i+1}}\frac{\sigma^2(t,U)}{2}\,dt - \bigl(\log B(T_i,V)\bigr)^2\int_{T_i}^{T_{i+1}}\frac{\sigma^2(t,V)}{2}\,dt \\
&+ \log B(T_i,V)\int_{T_i}^{T_{i+1}}\sigma(t,V)\,dW_t^P - \log B(T_i,U)\int_{T_i}^{T_{i+1}}\sigma(t,U)\,dW_t^P
\end{aligned} \qquad (8.29)$$
$$\begin{aligned}
\log\frac{B(T_{i+1},V)}{B(T_{i+1},U)} ={}& \log\frac{B(T_i,V)}{B(T_i,U)} + \bigl(\log B(T_i,U)\bigr)^2\sigma^2(T_i,U)\int_{T_i}^{T_{i+1}}\frac{1}{2}\,dt - \bigl(\log B(T_i,V)\bigr)^2\sigma^2(T_i,V)\int_{T_i}^{T_{i+1}}\frac{1}{2}\,dt \\
&+ \log B(T_i,V)\,\sigma(T_i,V)\int_{T_i}^{T_{i+1}}dW_t^P - \log B(T_i,U)\,\sigma(T_i,U)\int_{T_i}^{T_{i+1}}dW_t^P
\end{aligned} \qquad (8.30)$$
$$\begin{aligned}
\log\frac{B(T_{i+1},V)}{B(T_{i+1},U)} ={}& \log\frac{B(T_i,V)}{B(T_i,U)} + \frac{\bigl(\log B(T_i,U)\bigr)^2\sigma^2(T_i,U) - \bigl(\log B(T_i,V)\bigr)^2\sigma^2(T_i,V)}{2}\,(T_{i+1} - T_i) \\
&+ \bigl[\log B(T_i,V)\,\sigma(T_i,V) - \log B(T_i,U)\,\sigma(T_i,U)\bigr]\left(W^P_{T_{i+1}} - W^P_{T_i}\right)
\end{aligned} \qquad (8.31)$$
At this point therefore, we can summarize that:
· We have a set of log B which are constant in the interval ∆t
· We have obtained a set of σ which are also constant in ∆t
Now from market data, as previously mentioned, we know that zero coupon bonds must be globally log-normal. In our model, we have $\bigl(\Gamma(t,V) - \Gamma(t,U)\bigr)\,dW_t^P$ with ∆Γ independent of ∆log B.
Instantaneously therefore, we have constructed a model that is constant during ∆t, and so is lognormal. Globally however, the model still presents stochasticity for ∆Γ.
Notice finally that for any form of constant Γ (where we take the left Ti value), if the particular case occurs in which Γ(t,U) = Γ(Ti,U), our approximation then becomes exact.
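One time step of this third approach can be sketched in code. The function below is an illustration rather than the production implementation: it freezes Γ(Ti,X) = σ(Ti,X) log B(Ti,X) at the start of the step and applies the drift and Brownian terms obtained by differencing d log B = (r − Γ²/2) dt + Γ dW for the two maturities. The function name, argument names and numerical inputs are all hypothetical.

```python
import math
import random


def step_log_bond_ratio(log_ratio, log_b_u, log_b_v, sigma_u, sigma_v, dt, dw):
    """Advance log(B(., V) / B(., U)) over one model step with frozen coefficients.

    log_b_u, log_b_v : log B(T_i, U) and log B(T_i, V), known at the step start
    sigma_u, sigma_v : sigma(T_i, U) and sigma(T_i, V), held constant over the step
    dt               : T_{i+1} - T_i
    dw               : Brownian increment W(T_{i+1}) - W(T_i)
    """
    gamma_u = sigma_u * log_b_u
    gamma_v = sigma_v * log_b_v
    drift = 0.5 * (gamma_u ** 2 - gamma_v ** 2) * dt
    diffusion = (gamma_v - gamma_u) * dw
    return log_ratio + drift + diffusion


# Hypothetical inputs: two discount factors observed at T_i.
b_u, b_v = 0.95, 0.90
log_ratio_start = math.log(b_v / b_u)
dt = 0.5
rng = random.Random(2)
dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
log_ratio_end = step_log_bond_ratio(log_ratio_start, math.log(b_u), math.log(b_v),
                                    0.2, 0.2, dt, dw)
print(log_ratio_start, log_ratio_end)
```

Because every input is known at Ti, the only random quantity needed per step is the single Gaussian draw behind `dw`.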
4th Approach: Shifted Black
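To this point, we have developed a log-normal model:
[Figure: Black volatility σBlack against strike K; the lognormal curve is flat.]
Fig. 8.2. HJM dynamics for a lognormal model: flat
following
$$\frac{dS}{S} = \sigma\,dW$$
and a normal model:
[Figure: Black volatility σBlack against strike K; the normal curve is skewed.]
Fig. 8.3. HJM dynamics for a normal model: skew
following
$$\frac{dS}{S} = \frac{\lambda}{S}\,dW$$
where λ is a constant.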
Typically the implied volatilities quoted by market data form a smile that lies somewhere in between the normal and lognormal models. As shown by the following equation, we choose a parameter α to interpolate between the more or less flat curve presented by the quasi log-normal version and the negative slope curve produced by the normal (Gaussian) version.
$$dS = (\ldots)\,dt + (\sigma S + \lambda)\,dW \qquad (8.32)$$
where the term σS is log-normal, and the term λ is normal.
We can rewrite the above slightly differently so as to better understand its dynamics. We therefore replace the constant λ by a known S0 and insert an interpolation factor α that allows us to modify the slope between a lognormal model for α = 1 and a normal model for α = 0.
$$dS = (\ldots)\,dt + \sigma\bigl(\alpha S + (1-\alpha)S_0\bigr)\,dW_t \qquad (8.33)$$
Our Black shifted model therefore now becomes
$$\Gamma(t,U) = \sigma(T_i,U)\left[\alpha(T_i,U)\,\log B(T_i,U) + \bigl(1-\alpha(T_i,U)\bigr)\log\frac{B(0,U)}{B(0,T_i)}\right] \quad \forall t \in [T_i, T_{i+1}] \qquad (8.34)$$
We will refer to σ as the general volatility level, and to α as the skew. Both are now entirely deterministic functions.
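The shifted Black volatility is a plain linear interpolation, which the following sketch makes explicit. The function and argument names are hypothetical, and the discount factors used in the example are invented for illustration only.

```python
import math


def shifted_black_gamma(sigma, alpha, log_b_now, log_b_forward_0):
    """Shifted Black bond volatility on one model step.

    sigma           : volatility level sigma(T_i, U)
    alpha           : skew parameter alpha(T_i, U); 1 = lognormal, 0 = normal
    log_b_now       : log B(T_i, U), the stochastic (lognormal) component
    log_b_forward_0 : log(B(0, U) / B(0, T_i)), the deterministic (normal) component
    """
    return sigma * (alpha * log_b_now + (1.0 - alpha) * log_b_forward_0)


# Hypothetical market inputs.
b_0_u, b_0_ti = 0.80, 0.95      # B(0, U) and B(0, T_i)
b_ti_u = 0.86                   # simulated B(T_i, U)
log_fwd = math.log(b_0_u / b_0_ti)

gamma_lognormal = shifted_black_gamma(0.15, 1.0, math.log(b_ti_u), log_fwd)
gamma_normal = shifted_black_gamma(0.15, 0.0, math.log(b_ti_u), log_fwd)
gamma_mixed = shifted_black_gamma(0.15, 0.6, math.log(b_ti_u), log_fwd)
print(gamma_lognormal, gamma_normal, gamma_mixed)
```

For α = 1 only the simulated bond enters, for α = 0 only the initial curve does, and intermediate values lie between the two.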
5th Approach:
Notice firstly that if we take α > 1 we obtain
[Figure: Black volatility σBlack against strike K for α > 1.]
Fig. 8.4. HJM dynamics for alpha parameters greater than 1
We realize that our model is constrained by two limiting values
• if α < 0 there is a limiting maximum that we can never touch
• if α > 1 there is a limiting minimum that we can never attain
We decide that we would like to be able to access the entire range of prices with our model, and still maintain 0 < α < 1. So as not to restrict ourselves, we decide to include a new parameter that will enable us to attain very steep slopes, both negative and positive.
$$\Gamma(t,U) = \sigma(T_i,U)\cdot V(T_i,U)\cdot\left[\alpha(T_i,U)\,\log B(T_i,U) + \bigl(1-\alpha(T_i,U)\bigr)\log\frac{B(0,U)}{B(0,T_i)}\right] \qquad (8.35)$$
We are now multiplying our previous volatility by a new term V(Ti,U). The question is what expression this V(Ti,U) must take. We have already seen a number of ideas in the model section on Stochastic Volatility. There, the simplest possible expression was suggested as the SABR formulation. As we will see further on, the analysis of stochastic ‘volatilities of volatilities’ is an important part of the development undertaken in this project.
Other Approaches
Another alternative would be to consider the mathematical formulation that is currently being used as a first order Taylor expansion, and to extend it for instance to a second order expansion. This would imply that instead of
$$dS = (\ldots)\,dt + \sigma\bigl(\alpha S + (1-\alpha)S_0\bigr)\,dW_t = (\ldots)\,dt + \sigma\bigl[S_0 + \alpha(S - S_0)\bigr]\,dW \qquad (8.36)$$
we would now include a new calibration parameter λ, obtaining an expression of the form:
$$dS = (\ldots)\,dt + \sigma\bigl[S_0 + \alpha(S - S_0) + \lambda(S - S_0)^2\bigr]\,dW \qquad (8.37)$$
Other alternatives include a different form of interpolation between normal and lognormal forms. Instead of performing a linear interpolation as mentioned earlier, we could perform a geometric interpolation. For instance, instead of
$$\alpha(T_i,U)\,\log B(T_i,U) + \bigl(1-\alpha(T_i,U)\bigr)\log\frac{B(0,U)}{B(0,T_i)} \qquad (8.38)$$
we could consider using
$$\bigl[\log B(T_i,U)\bigr]^{\alpha(T_i,U)}\left[\log\frac{B(0,U)}{B(0,T_i)}\right]^{1-\alpha(T_i,U)} \qquad (8.39)$$
8.4 Controlled correlation
In the models we have seen up until now, the correlation structure among the bond prices is always implicit. A way to control this correlation structure is by changing $W_t^P$ in the model to a two dimensional Brownian motion $Z_t = (Z_t^1, Z_t^2)$, and therefore considering a vector-valued Γ(t,T) given by
$$\Gamma(t,T) = \begin{pmatrix}\Gamma_1(t,T)\\ \Gamma_2(t,T)\end{pmatrix} = \sum_{j\ge 0}\gamma_j(T)\begin{pmatrix}\sin\theta(t_{j+1},T)\\ \cos\theta(t_{j+1},T)\end{pmatrix}\chi_{(t_j,t_{j+1})}(t) \qquad (8.40)$$
We can think of this as
$$\Gamma(t,T)\,dW_t^P \;\to\; \sum_{j\ge 0}\gamma_j(T)\Bigl(\sin\theta(t_{j+1},T)\,dZ_t^1 + \cos\theta(t_{j+1},T)\,dZ_t^2\Bigr)\chi_{(t_j,t_{j+1})}(t) \qquad (8.41)$$
This modification could be included in any of the versions above, and would give us one more model parameter θ(t,T).
The insertion of the two factors therefore provides an element of de-correlation between the different interest rate terms r_t, defined by their T_i. Without this modification, an increase in the short term interest rates would necessarily result in a similar increase in the long term rates, implying a correlation very close to 1, and thus would always lead to vertical displacements of the entire curve.
Instead, with the inclusion of a de-correlation term, we can allow each interest rate to vary differently in time, allowing for evolutions from flat curves, to positive gradients, to other more complex interest rate functions.
[Figure: rate curve against time.]
Fig. 8.5. for a correlation=1 amongst interest rates.
[Figure: rate curve against time.]
Fig. 8.6. Allowing for de-correlation among different interest rates
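The effect of the angle θ on correlation can be illustrated numerically. In this sketch, which assumes two independent standard Gaussian drivers and hypothetical angles for two maturities, each maturity's driver is sin θ · dZ¹ + cos θ · dZ². Since sin a sin b + cos a cos b = cos(a − b), the instantaneous correlation between two maturities is the cosine of the difference of their angles.

```python
import math
import random


def two_factor_increment(theta, dz1, dz2):
    """Combine two independent Gaussian increments using the angle theta."""
    return math.sin(theta) * dz1 + math.cos(theta) * dz2


def implied_correlation(theta_a, theta_b):
    """Correlation between the drivers of two maturities with angles theta_a, theta_b."""
    return math.cos(theta_a - theta_b)


# Hypothetical angles for a short and a long maturity.
theta_short, theta_long = 0.2, 1.0

rng = random.Random(3)
n = 20000
xs, ys = [], []
for _ in range(n):
    dz1, dz2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(two_factor_increment(theta_short, dz1, dz2))
    ys.append(two_factor_increment(theta_long, dz1, dz2))

mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
var_y = sum((y - mean_y) ** 2 for y in ys)
sample_corr = cov / math.sqrt(var_x * var_y)
print(sample_corr, implied_correlation(theta_short, theta_long))
```

Equal angles recover the one-factor case with correlation 1, while angles that differ by π/2 give fully de-correlated drivers.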
8.5 Tangible Parameter Explanation
To this point, we have seen that the diffusion of the stochastic volatility Γ has been modelled through 3 parameters and an element of de-correlation. We now seek a deeper understanding of how they truly behave within our model. Let us recall that we had:
$$\Gamma(t,U) = \sigma(T_i,U)\cdot V(T_i,U)\cdot\left[\alpha(T_i,U)\,\log B(T_i,U) + \bigl(1-\alpha(T_i,U)\bigr)\log\frac{B(0,U)}{B(0,T_i)}\right] \qquad (8.42)$$
where we could for example take the simplest possible form of stochastic volatility V as:
$$V(t,T) = e^{\gamma(t,T)\,Z_t} \qquad (8.43)$$
Our parameters are therefore:
σ: global volatility level
α: skew or slope
γ: smile, the volatility of volatilities (Vol of Vol)
We shall refer to our HJM framework as being 2-factor whenever we pursue a two dimensional approach in the modelling of the Brownian motion.
$$dW(t) \;\to\; \sin\theta(T_i,U)\,dW^1(t) + \cos\theta(T_i,U)\,dW^2(t) \qquad (8.44)$$
We seek to be able to model products that present the following behaviour:
[Figure: Black volatility σBlack against strike; short maturities show a smile, long maturities a skew.]
Fig. 8.7. Typical vanilla dynamics for different maturities
In the 3 dimensional view below, we have attempted to go further in our understanding of the 2 dimensional representation. Note that the time scale tends to more recent dates as we look into the 3D view, meaning that the product shows the greatest smiles for more recent dates, and greater skew for very long maturities.
The representations that we have brought here are entirely schematic and extremely smooth. We refer the reader to later chapters such as the SABR Caplet surface in section 18.2 to see how real market quotes produce much more irregular surfaces, and how it is only after an adequate ‘massaging’ process that the data is finally smoothed out into the form below.
[Figure: 3D surface titled "Smile", showing Black volatility against strike and time.]
Fig. 8.8. Smile to skew deformation with maturity
8.5.1 Impact of Sigma
The sigma parameter represents the global volatility level of the vanilla product that we analyse. That is, it represents a sort of measure of the stochastic deviation that the product can suffer with respect to its average drift trend. Sigma is generally very closely related to the ‘at the money’ Black’s volatility, i.e. to the product’s volatility at its forward value. This is because it is this point, amongst all other possible vanilla strikes, which defines the overall level of the smile presented.
[Figure: "Impact of Sigma", showing Black volatility (12% to 22%) against strikes (1% to 7%) for sigma = 16%, sigma = 18% and sigma = 20%.]
Fig. 8.9. Sigma parameter global volatility level
Indeed, we see that the principal effect of the sigma is to set a global volatility level. The higher we set this sigma, the higher the curve rises on a Black volatility scale, and therefore also in price since the two measures are directly correlated. Note that typical values for sigma are within the range of 8% to 20%. Note also that we have constructed the above for a lognormal case, α = 1, which is why the slope is flat. If we were to have constructed the same with α = 0, we would have obtained a set of skewed graphs that would once again be vertically displaced in Black volatility with respect to each other.
8.5.2 Impact of Alpha:
[Figure: "Impact of Alpha", showing Black volatility (11% to 23%) against strikes (2% to 7%) for Alpha = 40%, Alpha = 80% and Alpha = 130%.]
Fig. 8.10. Alpha parameter skew
We see from the above behaviour that the alpha clearly acts as a slope or skew parameter. That is, as we make alpha tend towards 1, it increases the weighting on the lognormal component of the volatility, thus tending towards a flat slope.
If instead we make the alpha tend towards 0, we revert towards a normal model presenting a clearly defined skew, as is shown by the negative slope above. Other values are allowed, but their interpretation is not so clear.
8.5.3 Impact of the Stochastic Volatility
[Figure: "Impact of Vol of Vol", showing Black volatility (10% to 28%) against strikes (2% to 7%) for VolOfVol = 20%, VolOfVol = 40% and VolOfVol = 60%.]
Fig. 8.11. Stochastic Volatility: smile creation
As can be seen above, the stochastic volatility successfully produces the sought-after smile effect, increasing Black’s volatility for very high and very low strikes. This is because the stochastic volatility attributes greater weights to the more extreme values in strikes, thus having a much more pronounced effect at either end.
9. Numerical Methods
In this section our intention is to introduce the various numerical engines that are available when attempting to solve a stochastic differential equation. We will see that there are three principal alternatives: MonteCarlo simulations, tree diagrams, and partial differential equation solvers.
We set out with a model of the form:
$$dS_t = rS_t\,dt + \beta_t\,dW_t^P$$
We have already seen that both r_t and β_t can be stochastic.
There are also three main methods or approaches that can be considered when analysing a product and its associated model.
• An analytical formula, such as the case of Black Scholes, that can be solved directly
• Semi analytic formulas, which can be solved almost completely using a direct approach, apart from an occasional integral which must be performed numerically, although without any further complications
• All other combinations of models and products, which require a numerical computation in order to achieve a solution. Such numerical approaches always require some means of discretisation.
9.1 Discretisation
Theoretical models consider:
• an infinite number of time intervals, i.e. continuous time
• an infinite number of possible paths that the underlying asset can take between two time instants t and t + dt.
Any model in practice can only deal with
• a finite number of future time steps
• a finite number of possible paths between any two time instants
Anything different from this would require an infinite calculation time.
As stated initially, there are three main approaches by which a discretised model
can be tackled. These are:
• MonteCarlo simulations
• Tree Diagrams
• PDE (Partial Differential Equations) Solvers
We set out in this introduction using the simplest option we can conceive: a call
as our asset of study.
[Figure: future paths P1 to P4 of the underlying from time 0 to maturity T, with terminal value ST, strike K and call payoff (ST-K)+.]
Fig. 9.1. Call future scenarios generation
We will ignore all that has occurred in the past, starting from the present time instant t = 0, and estimating the future values that our call can take. It is in this future projection that the mathematical engine comes into play. It acts as a numerical resource generating future scenarios and identifying all the different paths that an asset can possibly follow before arriving at the future date.
After this generation, the next step is to equip each of the paths with its own outcome probability. In each trajectory, we will analyse the gains or losses that we have incurred as investors.
The engine then proceeds to calculate the sum over all the possible outcomes, weighting them with respect to their probabilities.
9.2 MonteCarlo
In the field of financial mathematics, Monte Carlo methods often give better
results for large dimensional integrals, converging to the solution more quickly than
numerical integration methods, requiring less memory and being easier to program.
The advantage Monte Carlo methods offer increases as the dimensions of the problem
increase.
In our model’s scope, we must firstly determine the number of points where the
engine is going to have to stop and construct a probability distribution for the
product. A more complex product with fixings at numerous intervals will logically
require the engine to perform the previously explained weighting of probabilities at a
large number of points. These calculation dates that are required are almost always
defined by the product itself. They tend to be imposed by the fixings or maturities at
which the product must be evaluated. Note that a specific model can also impose the
need for certain time steps, such as in the case of our HJM.
Through the MonteCarlo approach, a finite number of paths is chosen, say n = 10,000. For each, we must construct a possible evolution of the underlying asset S. But recall that the asset follows $dS_t = rS_t\,dt + \beta_t\,dW_t^P$, which depends on a Brownian motion. We must therefore firstly construct our model’s Brownian variables.
For this reason, a random number generator is used to create n variables x_i within 0 < x_i < 1 (excluding both the 0 and the 1). Each of these random numbers x_i is easily transformed into a Gaussian variable φ_i by using an inverse Normal transformation.
That is, if we consider the cumulative frequency diagram for a Normal distribution, we have that for any number 0 < x_i < 1 we can write x_i = N(φ_i) = Prob(φ < φ_i). We can therefore perform an inverse transformation to find φ_i = N^{-1}(x_i).
[Figure: cumulative Normal distribution mapping a number x_i in (0,1) to a Gaussian value φ_i.]
Fig. 9.2. Normally distributed variable generation from random numbers in the (0,1) interval
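The inverse transformation can be sketched with the Python standard library, whose `statistics.NormalDist.inv_cdf` plays the role of N⁻¹. The helper name `gaussian_draws` and the sample size are illustrative choices, not part of the original text.

```python
import random
from statistics import NormalDist


def gaussian_draws(n, rng):
    """Turn n uniform numbers in the open interval (0, 1) into standard Gaussians."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    draws = []
    while len(draws) < n:
        x = rng.random()               # uniform in [0, 1)
        if x == 0.0:
            continue                   # exclude the endpoint, as N^{-1}(0) is undefined
        draws.append(std_normal.inv_cdf(x))   # phi_i = N^{-1}(x_i)
    return draws


rng = random.Random(4)
phis = gaussian_draws(10000, rng)
phi_mean = sum(phis) / len(phis)
phi_std = (sum((p - phi_mean) ** 2 for p in phis) / (len(phis) - 1)) ** 0.5
print(phi_mean, phi_std)
```

The sample mean and standard deviation should be close to 0 and 1 respectively.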
At this point we have already established the foundations for the construction of our Brownian variables. Remember that we had already seen that
$$dW_i = [W_k - W_{k-1}]_i = \sqrt{T_k - T_{k-1}}\cdot\varphi_i$$
meaning that any Brownian increment can be decomposed into its temporal component and its Normal (Gaussian) component. However, we also know that Brownians have the property W(0) = 0, where T0 = 0.
This allows us to calculate every individual Brownian value:
$$\begin{aligned}
[W_{T_1} - W_{T_0}]_i &= [W_{T_1}]_i - 0 = [W_{T_1}]_i \\
[W_{T_1}]_i &= \sqrt{T_1 - T_0}\cdot\varphi_i = \sqrt{T_1}\cdot\varphi_i
\end{aligned} \qquad (9.1)$$
This process must be repeated for each of the i = 1,…, n paths.
Subsequently we can calculate the following path step at T2 by simply generating another set of random variables y_i from T1 to T2. All the y_i must be independent from the x_i. Thus V_i = N^{-1}(y_i) with
$$[W_{T_2} - W_{T_1}]_i = \sqrt{T_2 - T_1}\cdot V_i \qquad (9.2)$$
In the above equality, all the terms are known except for W_{T2}, which we must solve for.
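The construction of a Brownian path on a set of dates can be written as a short loop. This is a minimal sketch: the helper `brownian_path` and the date grid are hypothetical, and `random.gauss` is used directly for the independent Gaussian draws.

```python
import math
import random


def brownian_path(times, rng):
    """Return [W(T_0), W(T_1), ..., W(T_k)] with T_0 = 0 and W(T_0) = 0.

    times : increasing list of dates T_1 < T_2 < ... < T_k
    """
    values = [0.0]
    previous_time = 0.0
    for t in times:
        phi = rng.gauss(0.0, 1.0)                       # independent draw per step
        increment = math.sqrt(t - previous_time) * phi  # sqrt(T_k - T_{k-1}) * phi
        values.append(values[-1] + increment)
        previous_time = t
    return values


grid = [0.5, 1.0, 2.0]                                  # hypothetical model dates
rng = random.Random(5)
paths = [brownian_path(grid, rng) for _ in range(5000)]
finals = [p[-1] for p in paths]
final_mean = sum(finals) / len(finals)
final_var = sum((w - final_mean) ** 2 for w in finals) / (len(finals) - 1)
print(final_mean, final_var)
```

The variance of the terminal value should be close to the last date of the grid, here 2.0.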
Returning to our diffusion equation:
$$dS_t = rS_t\,dt + \beta_t\,dW_t^P \qquad (9.3)$$
we see that we can apply a similar approach to calculate the value of our asset at every time interval.
S(0) is known at present for T0 = 0. Then
$$[S_{T_1} - S_{T_0}]_i = dS_{T_1 - T_0} \qquad (9.4)$$
but we know $dS_{T_1 - T_0}$ from our diffusion equation, since it can be rewritten as
$$dS_{T_1 - T_0} = rS_{T_0}(T_1 - T_0) + \beta_t\left(W^P_{T_1} - W^P_{T_0}\right) \qquad (9.5)$$
where all the Brownians have been previously calculated. We can therefore directly solve for the asset’s future price:
$$S_{T_1} = S(0) + rS_{T_0}(T_1 - T_0) + \beta_t\left(W^P_{T_1} - W^P_{T_0}\right) \qquad (9.6)$$
Note that we had said that we would generate n paths. We will consider each
path as having an equal probability of occurring, therefore a probability of 1/n. This is
equivalent to taking a simple random sample of all the possible trajectories.
Finally, for each path i of the product, we will have to evaluate the final product’s payoff, F(Tk)i; that is, how much we will receive or pay for the product in that given scenario. What in fact must be analysed is the value today of that product in its future scenario. We must therefore discount to today all of its possible cash flows at each of their different payoff fixing dates:
$$A_i = \sum_k \bigl[F(T_k)\bigr]_i\left[e^{-\int_0^{T_k} r(s)\,ds}\right]_i \qquad (9.7)$$
We repeat this valuation for each path scenario i = 1,…, n, analysing each of the payoffs at every time interval k, that is, at T0 = 0,…, Tn.
The final price value that we select for our product is the weighted average of all the product values in each path scenario i, with respect to the probability of each path occurring (as stated, this was 1/n for each). Therefore we end up with
$$\text{product value} = \sum_i \frac{1}{n}A_i \qquad (9.8)$$
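The whole procedure can be put together for a European call. The sketch below is illustrative only: it assumes a constant rate r and takes β_t = σ·S_t (a lognormal choice that is not imposed by the text), steps the asset with the discretised diffusion equation, discounts each path's payoff and averages with weight 1/n. Function names and parameter values are hypothetical.

```python
import math
import random


def monte_carlo_call(s0, strike, r, sigma, maturity, n_steps, n_paths, seed=0):
    """Price a European call by simulating n_paths equally weighted scenarios."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    discount = math.exp(-r * maturity)   # constant rate: exp(-integral of r) = exp(-r T)
    total = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            dw = sqrt_dt * rng.gauss(0.0, 1.0)
            # Discretised diffusion step with beta_t = sigma * S_t (assumption).
            s += r * s * dt + sigma * s * dw
        payoff = max(s - strike, 0.0)    # F(T) for a call
        total += discount * payoff       # A_i for this path
    return total / n_paths               # each path has probability 1/n


mc_price = monte_carlo_call(s0=100.0, strike=100.0, r=0.03, sigma=0.2,
                            maturity=1.0, n_steps=20, n_paths=20000)
print(mc_price)
```

For these inputs the Black Scholes formula gives approximately 9.41, so the simulated price should land near that value up to sampling noise and discretisation error.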
9.2.1 Limitations
The main limitation of a MonteCarlo approach is the fact that it does not work well for forward products, whereas it works extremely well for backward products. That is, if we settle at a future date (Tk)i within our path scenario horizon, we have only one unique path i leading to that point. This means that we can easily evaluate the product at that node’s previous conditions, since it is completely determined by that one path. However, if we had to generate a forward evaluation of the same product, that is, if we had to find the value of the product as it evolves from that point onwards, we would need to perform a calculation of the sort:
$$V_{T_k} = E^P_{T_k}\left[V_{T_q}\,e^{-\int_{T_k}^{T_q} r(s)\,ds}\right] \qquad (9.9)$$
The value at a future time Tk depends on an expected value of these future developments. But in the MonteCarlo analysis we only have one path leaving the point being studied, so we cannot really perform an average over a unique path.
A solution could be envisioned as an algorithm that would create a further 10,000 paths leaving from each evaluation node. We cannot realistically consider a MonteCarlo operating in this way. It would certainly enable us to calculate the future expected value that we came across previously, as we would now have a set of paths over which to integrate. However, the proposed method would cause our calculation time to grow exponentially.
MonteCarlo is therefore only useful for past dependent products.
9.3 Tree Diagrams
The basic idea behind any tree diagram used in financial analysis is very similar to that used for the MonteCarlo. Its aim is to generate a number of possible paths along which the asset of study can evolve towards future scenarios. Each path is then weighted with a specific probability depending on the possibility that the asset follows that route. The algorithm ends at a predetermined date at which the payoff of the product is evaluated. At this point, just as in the MonteCarlo case, the probability distribution for each possible outcome scenario is computed. The mean expected payoff is then taken as the final value for our product. This must subsequently be discounted back to present so as to know its trading value today. The number of branches that sprout from each node can be chosen at will. In general the simplest forms are the binomial and trinomial variations, although other multinomial extensions can easily be constructed.
Binomial: 2 branches
[Figure: a node with two branches, Prob up and Prob down.]
Fig. 9.3. Binomial tree
Trinomial: 3 branches
[Figure: a node with three branches, including Prob up and Prob down.]
Fig. 9.4. Trinomial tree
Each of these may or may not have a symmetric setting for their branches, and
may or may not use a symmetric distribution for the probabilities to be assigned to
each of the possible routes.
9.3.1 Non Recombining Tree Diagram:
Analogous to the MonteCarlo simulation, each node here has a unique path leading to it, and a number n leading away from it. As with the MonteCarlo discussion, this alternative is impossible to implement as it rapidly diverges towards infinite simulations. See the binomial case for instance, which generates 2^k new paths at time step k, and 3^k paths in the trinomial case.
[Figure: non recombining binomial tree over the dates Tk, Tk+1, Tk+2, with branches Prob up and Prob down.]
Fig. 9.5. Non recombining binomial tree
9.3.2 Recombining Tree Diagrams:
Compared with the previous forms of tree diagram, the recombining tree is the only viable alternative for a practical implementation. It allows the algorithm to reach the same node via several different paths. The binomial case is no longer exponential in terms of the number of nodes created at each step. Instead, it only adds k + 1 nodes at each successive time interval. In the trinomial case, the number is slightly greater, adding 2k + 1 new nodes at each time step.
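The difference in growth between the two kinds of tree is easy to tabulate. The helper below is a small illustrative sketch (its name and signature are hypothetical) that returns the number of nodes at step k for a tree with a given number of branches per node.

```python
def nodes_at_step(k, branches=2, recombining=True):
    """Number of nodes at time step k (step 0 is the single root node)."""
    if recombining:
        # Binomial: k + 1 nodes; trinomial: 2k + 1 nodes.
        return (branches - 1) * k + 1
    # Non recombining: every node spawns `branches` new ones.
    return branches ** k


for k in (1, 5, 10, 20):
    print(k,
          nodes_at_step(k, 2, recombining=False),
          nodes_at_step(k, 2, recombining=True),
          nodes_at_step(k, 3, recombining=False),
          nodes_at_step(k, 3, recombining=True))
```

After only 20 steps the non recombining binomial tree already holds over a million nodes at the last date, against 21 for its recombining counterpart.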
It is important to note that in a binomial tree, we reduce the infinite paths that can stem from any single node to only two possibilities. We must assign to each of these branches a particular probability, which we will denote Prob up and Prob down. It is evident that the sum of these two probabilities must equal one. In fact, we have three equations that enable us to evaluate each of the two probabilities. These are:
• The zero central moment: ∑ prob = 1
• The first central moment: $\text{Node}_{T_k} = E_k\left(\text{Node}_{T_{k+1}}\right)$
• The second central moment: discrete variance = theoretical variance
The three equations above define a unique Gaussian variable. Note that a normal distribution is defined entirely by its moments M0, M1 and M2.
In fact, what we have here is a set of three equations and four unknowns: Prob up, Prob down, S1,up, S1,down
[Figure: one binomial step, branching with Prob up to S1 up and with Prob down to S1 down.]
Fig. 9.6. Binomial tree probabilities
It is therefore necessary to fix one of the four parameters in order to solve for the other three. Typically, we seek a symmetry of the form Prob up = 0.5, Prob down = 0.5
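[Figure: recombining binomial tree over the dates Tk, Tk+1, Tk+2, Tk+3, with branches Prob up and Prob down.]
Fig. 9.7. Recombining binomial tree
In the trinomial recombining tree we would obtain the following graphical representation:
[Figure: recombining trinomial tree over the dates Tk, Tk+1, Tk+2, Tk+3, with branches Prob up and Prob down.]
Fig. 9.8. Recombining trinomial tree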
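A recombining binomial tree with Prob up = Prob down = 0.5 can be sketched as follows. This is one possible parametrisation, chosen here for illustration and not stated in the text: the up and down moves are set on the log price so that its mean and variance over a step match those of a lognormal asset with constant rate r and volatility σ (an equal-probability, Jarrow-Rudd style choice). The value at each node is the discounted expectation of its two successors.

```python
import math


def binomial_call_equal_prob(s0, strike, r, sigma, maturity, n_steps):
    """European call on a recombining binomial tree with probabilities 0.5 / 0.5."""
    dt = maturity / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt
    up = math.exp(drift + sigma * math.sqrt(dt))     # S1,up / S0
    down = math.exp(drift - sigma * math.sqrt(dt))   # S1,down / S0
    discount = math.exp(-r * dt)

    # Terminal payoffs; node j has had j up moves and n_steps - j down moves.
    values = [max(s0 * up ** j * down ** (n_steps - j) - strike, 0.0)
              for j in range(n_steps + 1)]

    # Backward induction: each node is the discounted average of its two children.
    for step in range(n_steps, 0, -1):
        values = [discount * 0.5 * (values[j] + values[j + 1])
                  for j in range(step)]
    return values[0]


tree_price = binomial_call_equal_prob(s0=100.0, strike=100.0, r=0.03,
                                      sigma=0.2, maturity=1.0, n_steps=500)
print(tree_price)
```

With these hypothetical inputs the tree price should be close to the Black Scholes value of roughly 9.41. Because every node stores an expectation over the paths leaving it, the same backward loop extends naturally to products with early exercise.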
The tree diagram algorithm has properties that are the complete opposite of those presented by the MonteCarlo algorithm. A tree diagram turns out to be very good for future dependent products, since there is no longer the problem of a unique path leaving a particular node. Thus, a probability can be computed when calculating a future expectation by taking into account the numerous paths that emerge from any given node. In contrast, the tree method is not good with back dependent products, as any node (Tk, Sk)i cannot be traced back through a particular path, since it possesses a number of possible routes.
The main problem that arises at this point is when we are faced with products that are both forward and backward dependent. Neither of the two previous methods can realistically be applied to this kind of situation and still yield suitable results.
In the 1990s the Hull White Model developed a series of tricks so as to avoid such difficulties in some of its products. These however remain limited, and inefficient in the case where the products have many payoff dates.
Between 1995 and 2000, a series of techniques were created enabling the MonteCarlo method to tackle future dependent products such as American contracts. From 2002 onwards, the development has been directed towards a Longstaff Schwartz approach. A further idea that seems promising but that has never reached a concrete form was the Willow tree development.
9.4 PDE Solvers
This involves an entirely different approach to that of the two previous methods.
It no longer deals with the probabilities and Brownian motions necessary to solve the
diffusion equations encountered. Instead, it solves deterministic partial differential
equations, eliminating all the stochasticity of the problem. The method is in one of its
variations, a more generic approach that encompasses the recombining tree diagram.
Indeed, it is capable of incorporating tree diagrams as a specific sub-case in which the
PDE mesh is generated in a triangular form.
The basic development of the equations used can be considered as follows. Let us
consider the basic stochastic differential equation of any tradable asset. This is:
145

P
(9.10)
By applying Ito to the product’s payoff, which we shall note as V(t,St), then
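$$
\begin{aligned}
dV_t &= \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,\beta_t^2\,dt \\
&= \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\left(r S_t\,dt + \beta_t\,dW_t^{P}\right) + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,\beta_t^2\,dt \\
&= \left(\frac{\partial V}{\partial t} + r S_t\frac{\partial V}{\partial S} + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,\beta_t^2\right)dt + \frac{\partial V}{\partial S}\,\beta_t\,dW_t^{P}
\end{aligned}
\tag{9.11}
$$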
Under the risk neutral probability, every asset must have the same average yield.
This means that the payoff can also be written generically as
$$dV_t = r V_t\,dt + \gamma_t\,dW_t^{P} \tag{9.12}$$
Since both the product and its payoff must have the same yield, we can therefore
equate the two drift terms, which gives the Black Scholes equation:
$$r V_t = \frac{\partial V}{\partial t} + r S_t\frac{\partial V}{\partial S} + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,\beta_t^2 \tag{9.13}$$
With this procedure we have now eliminated the Brownian component of the
equation, and with it, all the probability distributions that it implies. They are still
present in the equation, but implicitly, behind the terms V, S, and β.
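As an illustration of (9.13), and assuming for this example only the lognormal choice β_t = σ S_t with constant σ and r (the text above leaves β_t general), the equation reduces to the familiar Black Scholes partial differential equation:

$$\frac{\partial V}{\partial t} + r S\,\frac{\partial V}{\partial S} + \frac{1}{2}\,\sigma^2 S^2\,\frac{\partial^2 V}{\partial S^2} = r V$$

A quick consistency check is that the asset itself, V = S, satisfies it, since the left hand side becomes 0 + rS + 0 = rV. The same holds for a fixed cash amount K paid at T, V = K e^{-r(T-t)}, for which only the time derivative survives and equals rV.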
Having done this, we proceed to construct a mesh for the PDE solver. This can be
of varying forms, and we shall outline briefly here just the simplest method so as to
achieve a general understanding of the procedure to be followed.
Fig. 9.9. PDE mesh and boundary conditions (vertical axis: St, with terminal nodes ST,i and values VT,i at maturity; horizontal axis: time 0, t1, t2, ...; lower boundary V = 0)
Note that we have written the underlying on the vertical axis as reference, but we
will in fact be dealing with the underlying’s payoff V, represented on the right hand
side of our graph as another vertical axis.
We can impose certain boundary conditions to restrict our mesh. For example, if
the value of St drops too far, it is clear that our product, priced V (and that is
proportional to (S - K)+), will not be profitable, meaning that the option will not be
exercised, and its value will drop to 0. In this way we have already imposed the lower
horizontal boundary for our mesh. In addition, we can also exclude any value of S
which is excessively high.
Moreover, at maturity, we have a fixed date which constrains us by imposing a
vertical limit. It is itself divided into all the possible future values that ST,i can take,
imposing our mesh divisions. We can therefore construct all the corresponding
payoffs for these different asset prices ST,i as:
$$V_{T,i} = E\left[e^{-rT}\left(S_{T,i} - K\right)\right] \tag{9.14}$$
Where for t=0, we know that V = S0, thus defining our final vertical boundary at
the left of the above mesh.
The resemblance with the tree diagram is now very clear. The above method
allows us to start at a future node T, already knowing the value of the product there, and
work backwards in time using the discrete version of the equation
$$r V_t = \frac{\partial V}{\partial t} + r S_t\frac{\partial V}{\partial S} + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\,\beta_t^2 \tag{9.15}$$
Let us start for instance at the top left hand corner:
Fig. 9.10. First PDE algorithm steps (mesh nodes i-1, i, i+1 at times k-1, k, k+1, bounded above and below by the boundary conditions)
The procedure chosen to discretise the different terms in the equation from this
point onwards can be extremely diverse. For instance, we can take, according to
Taylor expansions:
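$$\frac{\partial V}{\partial t} \rightarrow \frac{V_{t_{k+1}} - V_{t_k}}{t_{k+1} - t_k} \tag{9.16}$$
$$\frac{\partial V}{\partial S} \rightarrow \frac{\left(V_{t_k}\right)_{i+1} - \left(V_{t_k}\right)_{i-1}}{\left(S_{t_k}\right)_{i+1} - \left(S_{t_k}\right)_{i-1}} \tag{9.17}$$
$$\frac{\partial^2 V}{\partial S^2} \rightarrow \frac{\left(V_{t_k}\right)_{i+1} - 2\left(V_{t_k}\right)_{i} + \left(V_{t_k}\right)_{i-1}}{\left(\left(S_{t_k}\right)_{i+1} - \left(S_{t_k}\right)_{i}\right)\cdot\left(\left(S_{t_k}\right)_{i} - \left(S_{t_k}\right)_{i-1}\right)} \tag{9.18}$$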
The success of the PDE solver will depend mainly on the mesh used and the way
in which we choose to discretise. We will not discuss any further the advantages and
disadvantages of explicit versus implicit discretisations, or of other methods that
could have been stated; we simply name the Crank Nicolson technique as a very
widely used method in this domain.
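The backward stepping described above can be sketched in a few lines. The following is a minimal illustration only, not the solver used in this project: it assumes a lognormal diffusion β = σS, constant r, a uniform mesh in S, a European call payoff and the simplest explicit discretisation. The function names, mesh sizes and the upper boundary value are choices made for the example.

```python
import math


def black_scholes_call(s0, strike, r, sigma, maturity):
    """Closed-form Black Scholes call, used here only as a reference value."""
    sd = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity) / sd
    d2 = d1 - sd
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-r * maturity) * cdf(d2)


def explicit_fd_call(s0, strike, r, sigma, maturity,
                     n_s=150, n_t=2000, s_max=300.0):
    """Explicit finite-difference solution of rV = V_t + rS V_S + 0.5 beta^2 V_SS.

    Assumption of this sketch: beta = sigma * S and a uniform mesh in S.
    The explicit scheme is only stable when dt is small relative to ds^2.
    """
    ds = s_max / n_s
    dt = maturity / n_t
    s = [i * ds for i in range(n_s + 1)]
    # Terminal (vertical) boundary: the payoff at maturity.
    v = [max(x - strike, 0.0) for x in s]
    for k in range(1, n_t + 1):
        tau = k * dt  # time to maturity once this step is done
        new_v = [0.0] * (n_s + 1)
        for i in range(1, n_s):
            dv_ds = (v[i + 1] - v[i - 1]) / (2.0 * ds)            # as in (9.17)
            d2v_ds2 = (v[i + 1] - 2.0 * v[i] + v[i - 1]) / ds ** 2  # as in (9.18)
            beta = sigma * s[i]
            # Step backwards in time: V(t - dt) = V(t) - dt * V_t.
            new_v[i] = v[i] + dt * (r * s[i] * dv_ds
                                    + 0.5 * beta ** 2 * d2v_ds2
                                    - r * v[i])
        new_v[0] = 0.0                                     # lower boundary V = 0
        new_v[n_s] = s_max - strike * math.exp(-r * tau)   # deep in the money
        v = new_v
    # Linear interpolation of the t = 0 column at s0.
    i = min(int(s0 / ds), n_s - 1)
    w = (s0 - s[i]) / ds
    return (1.0 - w) * v[i] + w * v[i + 1]


if __name__ == "__main__":
    print(explicit_fd_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

With these mesh sizes the explicit scheme stays within its stability limit; a coarser time step would require an implicit or Crank Nicolson discretisation instead.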
10. Calibration
Calibration is the process by which the output of a measurement instrument is
adjusted to agree with the value of the applied standard, within a specified accuracy.
The final objective behind our calibration procedure is to be capable of pricing an
exotic interest rate product whose market value is unknown. To do so, we decompose
the complex product into a set of simple, market traded vanilla products. Thus, we
expect the combination of these simple products to be capable of replicating the
complex product’s behaviour. These simple liquid products have market quoted
prices.
The following step is to create an HJM model capable of correctly pricing these
simple products. For this, we adjust our HJM model’s parameters until its output
price for each of these plain vanilla products coincides with its true market value.
Having reached this point, we have therefore obtained a set of parameters which
correctly model the entire group of vanilla products. Further, we expect that with
these parameters, our HJM model will also be capable of modelling a more complex
exotic product which is in some manner a combination of these simple products.
Thus, inputting into our HJM model the calibrated parameters, the exotic
product’s characteristics and the market rates, we obtain a final price for our exotic
product.
10.1 Algorithm
We will now continue to explain in further detail how the overall algorithm
works. We start by presenting a simple flowchart with the main processes involved.
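Fig. 10.1. Calibration Process: Vanilla Products (flowchart: the model parameters α, σ, γ, θ, the market rates and the N liquid vanilla products enter the PRICER; if the N market prices do not equal the N model prices, the parameters are modified by Newton Raphson and the pricing is repeated; once they agree, the model parameters α, σ, γ, θ are exported)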
As we have already stated, the aim of the calibration is to be able to price a new
exotic financial product. For this we require:
1. The current interest rates taken from the market, including spot rates,
dividends, …
2. The characteristics of a set of n products, also taken from the market. These
must include a complete description of the various payoffs and cash flows they
generate at each of their possible fixings. The products should be similar to the new
product we want to price. In this vanilla product selection, we stress
• That the products should be very liquid, i.e. that their market price is exact and
reliable (plain vanillas)
• That the products must have a risk profile very similar to that of the exotic
product we attempt to evaluate.
The calibration group is configured so that it can be selected automatically or
manually by the trader himself.
In general we choose to use vanilla products, which are simple to model, and
whose role during the calibration is to give us the market’s point of view of the risks
in our exotic product.
3. The last elements are the input parameters needed for our model to generate
the pricing of a product. In the case of Black Scholes, we have seen that there is only
one input parameter, which is the product’s σBlack. In the HJM model, we have seen
that the input data can be σ and α if it is our two strike model, and an additional λ in
the case of the three strike model. The model parameters are responsible for
simulating the different sources of risk in a product. These risks can include variations
in the underlying price, movements in the interest rate curves, …
(There are situations in which the parameter of a particular model can become so
widely used that traders begin to talk of the products themselves in terms of that
parameter, and consequently the market itself creates a value at which
the parameter of each product trades. Such is the case of the σBlack for instance, and is
now also occurring with the SABR model.)
With the above input data, we are now ready to proceed with our calibration
process. We must first select a set of initial guesses for our model parameters,
typically between 0 and 1.
We then proceed to test each of the n vanilla products by comparing their market
price with the model price that our HJM pricer algorithm generates for them. If they
are different, it means that our HJM model, with its set of initial parameters, does not
correctly reproduce the market. We must modify our initial parameter values and
repeat the procedure, once again comparing the real market prices with those
generated by our model. We continue our iterations until we find a set of parameters
for which our model correctly prices the n vanilla products.
With these parameters, we are ready to tackle the exotic product. We use these
final parameters, the market interest rates and the characteristics of our new exotic
product to enter our HJM algorithm and price the exotic product.
Fig. 10.2. Calibration Process: Exotic Pricing (flowchart: the calibrated model parameters α, σ, γ, θ, the market rates and the exotic product enter the PRICER, which returns the exotic product price)
10.1.1 An example of a vanilla product set selection
Let us consider a 10Y callable swap which is callable every year. This means that
each year, we have the option of cancelling the swap. We shall consider this our
complex exotic product.
The product, from a quant’s point of view, is equivalent to entering an inverse
swap so as to cancel the incoming cash flows. Thus, to cancel the callable swap in the
ninth year would be equivalent to having an option in nine years’ time of entering a 1Y
swap.
Fig. 10.3. Analogy cancellable swap and inverse swap (time line 0Y, 9Y, 10Y: calling the swap in the 9th year is equivalent to entering a 1Y swap in the 9th year)
The same can be said for all the other fixings, meaning that we can model the
option to cancel in 8 years as the option to enter into a 2 year swap starting in 8 years.
Thus the risk of our exotic product is modelled here by decomposition into a list
of nine simple vanilla products considered very liquid and that can be taken directly
from the market:
Starting Date (Exercise Date)    Length (Tenor)
9Y                               1Y
8Y                               2Y
7Y                               3Y
:                                :
:                                :
1Y                               9Y
Table 10.1. Exotic product risk decomposition
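The decomposition in Table 10.1 follows a simple rule: every vanilla starts on a call date and ends at the final maturity of the exotic. A minimal sketch of that rule, with a hypothetical function name and with all dates expressed in whole years, could read:

```python
def coterminal_swaptions(final_maturity_years, call_frequency_years=1):
    """Return (exercise, tenor) pairs, in years, with exercise + tenor = final maturity.

    Illustrative helper only: it assumes call dates on a regular yearly grid.
    """
    pairs = []
    exercise = final_maturity_years - call_frequency_years
    while exercise >= call_frequency_years:
        pairs.append((exercise, final_maturity_years - exercise))
        exercise -= call_frequency_years
    return pairs


if __name__ == "__main__":
    for exercise, tenor in coterminal_swaptions(10):
        print(f"{exercise}Y into {tenor}Y")
```

For the 10Y swap callable every year this returns the nine products of the table, from 9Y into 1Y down to 1Y into 9Y.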
10.2 Calibration in Detail
For any product calibration, the trader must select a range of simple vanilla
products which he believes will adequately model his exotic product’s risk profile.
The vanilla products that we will use will commonly consist of caplets or swaptions
constructed over their corresponding forward rates.
Say for example that an exotic product has a determined fixing at time T that
depends on the 3 month EURIBOR. The trader may decide therefore to incorporate
this particular risk into his analysis by using a caplet based on the 3 month EURIBOR
whose maturity coincides with T. If instead, the trader needed to model a 1 year
EURIBOR risk at this date T, he would probably decide to use a swaption instead of a
caplet, since the swaptions are constructed over the 1 year EURIBOR. Whether he should
choose to use a swaption or a caplet on the 1 year EURIBOR over the same time
period depends on a more subtle analysis that we will not enter into here.
The HJM Santander model is constructed following an entirely generic approach.
We realize that an exotic product with a global maturity of UN years may be replicated
using a wide range of intermediate vanilla products with maturities Ui ≤ UN and of
varying life spans. Therefore, the approach that gives the trader the greatest freedom
is to allow him to select any of these for his calibration.
The first thing that the trader must do is to decide on the minimum time intervals
that his product will depend on. The minimum possibility is a 3 month scenario
between each date, meaning we would proceed to divide the global maturity UN into
3 month intervals. Other options would be to have 6 month or yearly intervals.
Fig. 10.4. Decomposition of an exotic into time periods (time line t, T, UN: product settlement, first fixing, global maturity)
The next step would be to select the specific vanilla products for the calibration.
These could have a starting or fixing time T at any of the dates Ui into which we have
divided our time space, and could have a maturity Uj > T, with an upper limit to this
maturity equal to the product’s global maturity, thus Uj ≤ UN. The range of all the
possible vanilla products that we can incorporate into our calibration set therefore
takes on the form of a matrix. Below we present all the possible combinations for a
given exercise date Ti.
Fig. 10.5. Decomposition of an exotic into vanillas fixing at T and with different
maturities (time line t, T, Ti, UN: product settlement, first fixing, vanilla possibilities starting at Ti, global maturity)
Yet the above decomposition can equally be performed for any Ti satisfying T < Ti
< UN. The global exotic product with maturity UN can therefore be composed of any of
the product combinations presented in the matrix. Note that each cell within the
matrix represents a vanilla product that fixes at its corresponding row date T, and
ends at its corresponding column maturity Ui.
      U0             U1             U2             U3
T0    V(t, T0, U0)   V(t, T0, U1)   V(t, T0, U2)   V(t, T0, U3)
T3    V(t, T3, U0)   V(t, T3, U1)   V(t, T3, U2)   V(t, T3, U3)
Table 10.2 Ideal Vanilla calibration matrix: all data available
In practice however, taking into account the entire matrix to calibrate our model
parameters would be excessively time-consuming. It would however be the ideal
goal to attain in the future.
In general, we decide instead to calibrate the most representative areas of the
above matrix. These are, firstly, the diagonal, representing vanilla products that start
at Ti and end immediately 3, 6 or 12 months after, depending on the vanilla product
we are using.
A further addition that we can allow ourselves is to calibrate with the
end column, that is, to take into account vanilla products that start at each of the
possible time intervals Ti and whose maturity is the product’s global maturity UN.
Thus we have the subsequent matrix appearance:
      U0             U1             U2             U3
T3    V(t, T3, U0)   V(t, T3, U1)   V(t, T3, U2)   V(t, T3, U3)
Table 10.3 Vanilla calibration matrix: market quoted data
Or schematically:
Fig. 10.6. Schematic calibration matrix representation (rows T0, …, Ti, …, TN; columns U0, …, Ui, …, UN)
Evidently, the above is not sufficient for our calibration. We complete the rest of
our matrix through interpolation and extrapolation between the known values. Thus,
we will refer to the end column and the diagonal as the Target Parameters in our
calibration set, whereas we will consider the rest of the matrix as being composed of
Dependent Parameters.
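The split between Target and Dependent Parameters can be made concrete with a small sketch. The indexing convention (row i fixes at Ti, column j matures at Uj, and only cells with j ≥ i exist) and the linear interpolation along each row are assumptions made for this illustration; the text does not specify which interpolation rule is used in practice.

```python
def classify_cells(n):
    """Label each valid cell (i, j) of an n x n calibration matrix.

    Targets are the diagonal (j == i) and the end column (j == n - 1);
    every other valid cell is a dependent parameter.
    """
    labels = {}
    for i in range(n):
        for j in range(i, n):  # the maturity index cannot precede the fixing index
            is_target = (j == i) or (j == n - 1)
            labels[(i, j)] = "target" if is_target else "dependent"
    return labels


def fill_row(diagonal_value, end_value, i, n):
    """Fill row i by linear interpolation between its two target cells.

    Hypothetical rule chosen for the example only.
    """
    if i == n - 1:  # diagonal and end column coincide on the last row
        return {i: diagonal_value}
    span = n - 1 - i
    return {j: diagonal_value + (end_value - diagonal_value) * (j - i) / span
            for j in range(i, n)}


if __name__ == "__main__":
    labels = classify_cells(4)
    print(sorted(cell for cell, kind in labels.items() if kind == "dependent"))
    print(fill_row(0.10, 0.16, 0, 4))
```

For a 4 x 4 matrix this leaves seven target cells and three dependent cells, which are then filled from their row's diagonal and end column values.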
Note that there is a further extrapolation that we had not mentioned earlier. This
is the initialisation of a set of values that we will also be using. For example, if our first
possible fixing is at date T, we must be aware nevertheless that as traders, we agreed
to enter this product at a previous value date t. There is definitely a time interval
between t and the first fixing T, in which the market is not static, and whose
fluctuations can affect our product before we even start the first fixing.
Fig. 10.7. Initial Variation before first Fixing (time line t, T, U: product settlement at t, first fixing at T)
These fluctuations correspond to time intervals which would be included as the
first rows in our matrix. They too must be extrapolated. We really have:
Fig. 10.8. First Row Interpolated Data (rows T0, …, Ti, …, TN; columns U0, …, Ui, …, UN)
For simplicity, however, we will ignore this representation and use the former.
We must distinguish a further characteristic before we can proceed to the analysis
of results. When we are calibrating in a 1 strike model, we need to calibrate with only
one product for each matrix position, whereas in a two strike model, in each cell we
must consider two products with different strikes. Further, a 1 strike model only
calibrates with the end column, whereas a 2 strike model calibrates with both the
column and the diagonal.
10.3 Best Fit or not Best Fit?
The following aims at discussing the criteria required to determine whether we
are satisfied with the approximation between model and market prices, and thus are
ready to end the iterative process.
10.3.1 Minimum Square Error. 1st Method:
Given a model of M parameters and a calibration set of N liquid products:
· If N < M, then the problem has an infinite set of solutions, and lacks sufficient
information to be solved.
· If N ≥ M, the problem can be solved by applying a square mean calculation,
where we solve
$$F = \min\left[\sum_i \left(\text{Market price}_i - \text{Model price}_i\right)^2\right] \tag{10.1}$$
The main advantage of this method is the fact that the calibration is performed
over an excess of information. This means that there always exists a solution.
The main problem in contrast, is that it is time consuming when seeking the
solution due to the minimisation algorithm which is necessary. Further, minimisation
algorithms can prove imprecise, finding local minima rather than the absolute
minimum to the problem. We must also realise that we seek
Market prices = Model prices
This means that we seek a minimum of exactly 0 for the difference
Market prices - Model prices
This cannot be guaranteed by our algorithm, which simply provides a minimal
difference between the two sets of prices, and this difference can be arbitrarily large.
In addition, when we arrive at our final solution curve, we cannot know which
points are exact, i.e. which points are supplied by the market data, and which are
approximations; thus we have truly lost information. Projecting N dimensional data
onto an M dimensional problem space results in a loss of information:
Fig. 10.9. Inexact fit: minimum square method (σBlack against strike K for a fixed maturity (t0,T))
Above, our approximation curve only coincides with one of the market data
points, so realistically speaking, can only be considered exact at that precise point, and
is therefore an approximation for all other regions.
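The loss of information can be seen on a deliberately small case. The sketch below fits a model with M = 2 parameters (a straight line) to N = 3 quotes by minimising the sum of squared differences, in the spirit of (10.1). The three volatility quotes are invented for the illustration, and volatilities are used in place of prices purely to keep the arithmetic readable.

```python
def fit_line_least_squares(xs, ys):
    """Least-squares straight line y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical market quotes: strikes in percent, Black volatilities.
strikes = [3.0, 4.0, 5.0]
market_vols = [0.22, 0.18, 0.20]

intercept, slope = fit_line_least_squares(strikes, market_vols)
residuals = [v - (intercept + slope * k) for k, v in zip(strikes, market_vols)]

if __name__ == "__main__":
    print(intercept, slope)
    print(residuals)
```

Here the fitted line leaves a residual of about 0.01, -0.02 and 0.01 at the three strikes, so none of the market quotes is reproduced exactly even though the squared error is minimal.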
One of the greatest problems encountered using this procedure is the
introduction of noise in the calculation of the Greeks.
The Greeks:
The Greeks are the variations of a product’s price (its derivatives) with respect to
its market parameters. The different types of risk in the pricing of a product can each
be measured by one of these instruments. With them, it is possible to hedge
against the risks in a very easy manner.
The Greeks are vital tools in risk management. Each Greek (with the exception of
theta - see below) represents a specific measure of risk in owning an option, and
option portfolios can be adjusted accordingly ("hedged") to achieve a desired
exposure.
As a result, a desirable property of a model of a financial market is that it allows
for easy computation of the Greeks. The Greeks in the Black-Scholes model are very
easy to calculate and this is one reason for the model's continued popularity in the
market.
The delta measures sensitivity to price. The ∆ of an instrument is the
mathematical derivative of the value function with respect to the underlying price,
$$\Delta = \frac{\partial V}{\partial S} \tag{10.2}$$
The gamma measures second order sensitivity to price. The Γ is the second
derivative of the value function with respect to the underlying price,
$$\Gamma = \frac{\partial^2 V}{\partial S^2} \tag{10.3}$$
The speed measures third order sensitivity to price. The speed is the third
derivative of the value function with respect to the underlying price,
$$\text{speed} = \frac{\partial^3 V}{\partial S^3} \tag{10.4}$$
The vega, which is not a Greek letter (ν, nu, is used instead), measures sensitivity
to volatility. The vega is the derivative of the option value with respect to the volatility
of the underlying,
$$\nu = \frac{\partial V}{\partial \sigma} \tag{10.5}$$
The term kappa, κ, is sometimes used instead of vega, and some trading firms
also use the term tau, τ.
The theta measures sensitivity to the passage of time. Θ is the negative of the
derivative of the option value with respect to the amount of time to expiry of the
option,
$$\Theta = -\frac{\partial V}{\partial T} \tag{10.6}$$
The rho measures sensitivity to the applicable interest rate. The ρ is the derivative
of the option value with respect to the risk free rate,
$$\rho = \frac{\partial V}{\partial r} \tag{10.7}$$
Less commonly used, the lambda λ is the percentage change in option value per
change in the underlying price, or
$$\lambda = \frac{\partial V}{\partial S}\,\frac{1}{V} \tag{10.8}$$
It is the logarithmic derivative.
The vega gamma or volga measures second order sensitivity to implied volatility.
This is the second derivative of the option value with respect to the volatility of the
underlying,
$$\text{volga} = \frac{\partial^2 V}{\partial \sigma^2} \tag{10.9}$$
The vanna measures cross-sensitivity of the option value with respect to change
in the underlying price and the volatility,
$$\text{vanna} = \frac{\partial^2 V}{\partial S\,\partial \sigma} \tag{10.10}$$
This can also be interpreted as the sensitivity of delta to a unit change in
volatility.
The delta decay, or charm, measures the time decay of delta,
$$\text{charm} = \frac{\partial \Delta}{\partial T} = \frac{\partial^2 V}{\partial S\,\partial T} \tag{10.11}$$
This can be important when hedging a position over a weekend.
The colour measures the sensitivity of the charm, or delta decay, to the underlying
asset price,
$$\text{colour} = \frac{\partial^3 V}{\partial S^2\,\partial T} \tag{10.12}$$
It is the third derivative of the option value, twice to underlying asset price and
once to time.
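When no closed form is available, the Greeks defined above are usually estimated by bumping an input and re-pricing. The sketch below does this for delta, gamma and vega and compares the result with the analytic Black Scholes values. It assumes a Black Scholes call as the pricer, not the HJM pricer of this project, and the bump sizes are arbitrary choices for the example.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_d1(s, k, r, sigma, t):
    return (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))


def bs_call(s, k, r, sigma, t):
    d1 = bs_d1(s, k, r, sigma, t)
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def bumped_greeks(s, k, r, sigma, t, ds=0.01, dsigma=1e-4):
    """Delta, gamma and vega by central finite differences (bump and re-price)."""
    mid = bs_call(s, k, r, sigma, t)
    up = bs_call(s + ds, k, r, sigma, t)
    down = bs_call(s - ds, k, r, sigma, t)
    delta = (up - down) / (2.0 * ds)
    gamma = (up - 2.0 * mid + down) / ds ** 2
    vega = (bs_call(s, k, r, sigma + dsigma, t)
            - bs_call(s, k, r, sigma - dsigma, t)) / (2.0 * dsigma)
    return delta, gamma, vega


def analytic_greeks(s, k, r, sigma, t):
    """Closed-form Black Scholes delta, gamma and vega of a call."""
    d1 = bs_d1(s, k, r, sigma, t)
    delta = norm_cdf(d1)
    gamma = norm_pdf(d1) / (s * sigma * math.sqrt(t))
    vega = s * norm_pdf(d1) * math.sqrt(t)
    return delta, gamma, vega


if __name__ == "__main__":
    print(bumped_greeks(100.0, 100.0, 0.05, 0.2, 1.0))
    print(analytic_greeks(100.0, 100.0, 0.05, 0.2, 1.0))
```

The two sets of values agree closely because the pricer here is smooth and noise free; any pricing error that changes from one bump to the next is divided by the small bump size and so appears magnified in the estimated Greek.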
For our particular model, if we want to calculate:
$$\frac{\partial\,\text{price}}{\partial \sigma_{Black}} \approx \frac{\text{price}_1 - \text{price}_2}{\sigma_{Black,1} - \sigma_{Black,2}} \tag{10.13}$$
we simply evaluate our product’s price once to obtain price1. This will have an
inherent error E1 due to the square mean minimisation used to obtain the solution
curve. To calculate the price2 we slightly shift the market σBlack which we input into
our calibration set, and then re-price our product. We must notice that with such a
variation, we generate a new minimisation curve that will most probably have a
different minimisation error, E2 at the point we are studying. Such variations are
responsible for the noise generated in the Greeks we obtain.
10.3.2 Exact Fit. 2nd Method:
Despite the fact that the market contains an excess of products, we seek to have N
= M. That is, for every maturity T, we shall select the same number of market products
as unknown parameters in our model. The difficulty in this procedure is to correctly
select the pertinent calibration set, yet this is where the skill of the trader lies. Thus,
with this model, we can be sure that the value at any of the selected strikes is precise.
For example for a model of five parameters, we would select five products:
Fig. 10.10. Exact fit (σBlack against strike K for a fixed maturity (t0,T))
Between points, the model may or may not be exact, but at the specific strikes K
set by the trader, we can be sure of the results obtained.
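A small illustration of the N = M case: with three parameters and three quotes, the parameters are fully determined and every quote is reproduced exactly. The quadratic form of the curve and the three quotes are assumptions made for the example, and the coefficients are obtained by divided differences rather than by any method described in this document.

```python
def fit_quadratic_exact(xs, ys):
    """Coefficients (a, b, c) of a + b*x + c*x^2 through exactly three points."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    d01 = (y1 - y0) / (x1 - x0)   # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    c = (d12 - d01) / (x2 - x0)   # second divided difference
    b = d01 - c * (x0 + x1)
    a = y0 - b * x0 - c * x0 * x0
    return a, b, c


# Hypothetical market quotes: strikes in percent, Black volatilities.
strikes = [3.0, 4.0, 5.0]
market_vols = [0.22, 0.18, 0.20]

a, b, c = fit_quadratic_exact(strikes, market_vols)
residuals = [v - (a + b * k + c * k * k) for k, v in zip(strikes, market_vols)]

if __name__ == "__main__":
    print(a, b, c)
    print(residuals)
```

The residuals vanish at the three chosen strikes up to rounding, which is the property the exact fit relies on; nothing is guaranteed between those strikes.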
1st Effect:
A further advantage of the best fit method lies in the fact that a real anomaly
within market data can be truthfully taken account of in our model, whereas the same
anomaly would be evened out by the square mean method:
Fig. 10.11. Anomaly in exact fit (σBlack against strike K for a fixed maturity (t0,T))
The square mean method in contrast, would produce:
Fig. 10.12. Anomaly in minimum square method (σBlack against strike K for a fixed maturity (t0,T))
2nd Effect:
The use of the minimum squares method introduces noise in the calculation of
the Greeks, not allowing for any form of control over the residual error created.
The best fit method however allows us to calculate sensitivities through differentials
or finite difference methods. It will only present sensitivities towards the factors
included in our calibration set.
Thus, we must include in our calibration set all the sensitivities that we
specifically want to take into consideration. For example, if we want our three
parameter calibration to present several sensitivities, we will not construct it with
three identical vanilla products, but instead for example, with a vanilla at the money,
a risk product and a butterfly. Each of these will be aimed at bringing into our model
a characteristic sensitivity.
3rd Effect:
We economise a lot of time by analysing only the pertinent products, and not an
indefinite range of them. Time of calculation is of utmost importance in our working
environment.
10.4 Newton Raphson
We seek: Model Prices = Market Prices.
We use the Newton Raphson algorithm to obtain the solution to the above
equation. Near the solution the algorithm converges quadratically, which makes it one
of the fastest root finding algorithms available. Its main problems arise when the
surface is not smooth, and when the
initial point of calculation is far from the final solution. The first is not our case, and
we will show that our first guess is generally a good starting approximation.
Fig. 10.13. Newton Raphson Iterations (Market Price - Model Price against the parameter: starting from σ1 with price difference P1, the iterates σ2, σ3, σ4 approach the solution σ*)
The Newton Raphson procedure is simple and well known. It seeks
Model Prices - Market Prices = 0
As seen in the above figure, a first guess σ1 is used to construct the model price, and
with it the price difference P1 plotted on the vertical axis. The slope is calculated at
this point as $m_1 = \partial P / \partial \sigma$. This is used to construct a
straight line of equal slope at the point (σ1, P1). Its point of intersection with the
horizontal axis provides the next point of calculation, $\sigma_2 = \sigma_1 - P_1 / m_1$.
The method is easily generalised to M parameters by substituting the slope by the
Jacobian: $J = \partial P / \partial \sigma_i$. Thus $\sigma_2 = \sigma_1 - P_1 \cdot J^{-1}$.
The difficulty in this procedure lies in the need to calculate the inverse of the
Jacobian, something that can prove not only difficult numerically, but also time
consuming.
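The one parameter iteration can be sketched as follows. The example solves for the volatility at which a Black Scholes call matches a given market price, using the analytic vega as the slope m. The Black Scholes pricer, the starting guess and the tolerance are assumptions of the sketch, and the HJM pricer of this project would take the pricer's place.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / sd
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - sd)


def bs_vega(s, k, r, sigma, t):
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / sd
    return s * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * math.sqrt(t)


def implied_vol_newton(market_price, s, k, r, t,
                       sigma0=0.5, tol=1e-10, max_iter=50):
    """Newton Raphson on P(sigma) = model price - market price."""
    sigma = sigma0
    for iteration in range(1, max_iter + 1):
        diff = bs_call(s, k, r, sigma, t) - market_price   # P at the current guess
        if abs(diff) < tol:
            return sigma, iteration
        sigma -= diff / bs_vega(s, k, r, sigma, t)         # sigma_next = sigma - P / m
    raise RuntimeError("Newton Raphson did not converge")


if __name__ == "__main__":
    target = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(implied_vol_newton(target, 100.0, 100.0, 0.05, 1.0))
```

Starting from a guess of 0.5, the iteration recovers the volatility of 0.2 used to generate the target price within a handful of steps.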
10.4.1 Simplifications of the algorithm.
1st Simplification:
The Newton Raphson method can only be used where the slope is smooth. If ever
$\partial P / \partial \sigma$ is not smooth, we can create what is called a buffer parameter, λ. Typical forms
are: $\lambda = \int \sigma^2\,dt$ or $\lambda = \int e^{\sigma^2}\,dt$
The transformation allows for a smooth $\partial P / \partial \lambda$ where the Newton Raphson
algorithm can now be applied successfully.
2nd Simplification:
We do not really need an extremely accurate Jacobian. That is, for a smooth curve,
we can use a constant initial Jacobian in all iterations to reach the final solution σ*.
Although more steps are needed for the algorithm, we avoid the time consuming
calculation of a new Jacobian at each step, and thus comparatively, greatly reduce the
time of computation.
Fig. 10.14. Newton Raphson Iterations with a constant Jacobian (Market Price - Model Price against the parameter: the iterates σ1, σ2, σ3, σ4, σ5, σ6, … approach the solution σ* along lines of equal slope)
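The trade-off between the two variants can be observed on a one parameter example. The sketch below solves the same problem twice, once recomputing the slope at every step and once keeping the slope of the first guess throughout. It assumes a Black Scholes call with strike 120 as the pricer and a starting guess of 0.4; these values are chosen only so that the slope changes visibly between the first guess and the solution.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / sd
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - sd)


def bs_vega(s, k, r, sigma, t):
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / sd
    return s * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * math.sqrt(t)


def solve_vol(market_price, s, k, r, t, sigma0, freeze_slope,
              tol=1e-10, max_iter=200):
    """Newton Raphson, optionally with the slope frozen at the first guess."""
    sigma = sigma0
    slope = bs_vega(s, k, r, sigma0, t)
    for iteration in range(1, max_iter + 1):
        diff = bs_call(s, k, r, sigma, t) - market_price
        if abs(diff) < tol:
            return sigma, iteration
        if not freeze_slope:
            slope = bs_vega(s, k, r, sigma, t)  # fresh slope at each step
        sigma -= diff / slope
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    target = bs_call(100.0, 120.0, 0.05, 0.2, 1.0)
    print(solve_vol(target, 100.0, 120.0, 0.05, 1.0, 0.4, freeze_slope=False))
    print(solve_vol(target, 100.0, 120.0, 0.05, 1.0, 0.4, freeze_slope=True))
```

Both runs reach the same solution; the frozen slope version needs more iterations, but each of them costs a single pricing instead of a pricing plus a slope calculation.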
A further solution would be to use an analytic, approximate Jacobian at each
point. Making it analytical would render it very rapid, and would also avoid the need
to calculate the numeric Jacobian at each step. This is what we have developed in
section 12 and called an analytic approximation. In fact, the analytic approximation
calculates an approximate solution using its own approximate Jacobians, and once it
arrives at its solution, the algorithm switches to a MonteCarlo approach that
nevertheless still uses the last Jacobian that the analytic approximation calculated.
The argument that the use of an analytic approximation is less time
consuming than the calculation of a new Jacobian each time relies on the following
logic:
Without an analytic approximation, each time we go through one whole iteration
within the calibration process we must perform the pricing n+1 times: We call on the
pricer a first time during the calibration so as to calculate the model price for the n
liquid vanilla products based on the first guess model parameters. If we then have to
iterate because these model prices do not coincide with the market prices, we must
call on the pricer a further n times. This occurs because we must generate a Jacobian to
proceed with the slope method of the Newton Raphson algorithm.
Fig. 10.15. Calibration Jacobian (Jacobian matrix of entries dPricei / dParamj, with the N prices P1, …, Pn along one axis and the N parameters λ1, …, λn along the other)
We calculate the Jacobian by varying or ‘bumping’ the first of the n parameters λ1
by a differential amount dλ1. We then go through the pricer again, calculating all the
new prices Pi + dP|λ1. Thus we obtain the first row in our Jacobian matrix:
$$\frac{\partial P_i}{\partial \lambda_1} = \frac{P_i - (P_i + dP)}{\lambda_1 - (\lambda_1 + d\lambda)} \tag{10.14}$$
Fig. 10.16. Jacobian calculation Iterations (flowchart of Fig. 10.1 with an added inner loop: bump λ1, call the PRICER again and recalculate Pi, before the parameters are modified by Newton Raphson)
We repeat the process n times, once for each of the parameters λi. With the old prices Pi
and parameters λi, and with the new parameters λi + dλ and prices Pi + dP, the
Jacobian is then fully computed.
With an analytic approximation therefore, we would avoid calling on the pricer
n+1 times. Instead we would use a single Jacobian and simply require a few more
iterations to arrive at the final result.
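The cost argument can be checked on a small two parameter calibration. In the sketch below the pricer is a toy stand-in, not the HJM pricer: a Black call on a forward whose volatility is σ + α (K − F)/F, so that σ sets the level and α the slope of the smile. The forward, the maturity, the strikes, the bump size and the class and function names are all hypothetical. A counter records how many times the pricer is called.

```python
import math

FORWARD = 4.0            # forward rate in percent (hypothetical)
MATURITY = 1.0           # years (hypothetical)
STRIKES = (3.35, 4.45)   # two strikes, in percent


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def toy_price(strike, sigma, alpha):
    """Toy pricer: Black call with vol = sigma + alpha * (K - F) / F."""
    vol = max(sigma + alpha * (strike - FORWARD) / FORWARD, 1e-8)
    sd = vol * math.sqrt(MATURITY)
    d1 = math.log(FORWARD / strike) / sd + 0.5 * sd
    return FORWARD * norm_cdf(d1) - strike * norm_cdf(d1 - sd)


class CountingPricer:
    """Prices all calibration products and counts the number of calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        return [toy_price(k, params[0], params[1]) for k in STRIKES]


def calibrate(market, guess, pricer, bump=1e-6, tol=1e-10, max_iter=50):
    """Newton Raphson with a bumped (finite-difference) 2 x 2 Jacobian."""
    params = list(guess)
    for _ in range(max_iter):
        base = pricer(params)                        # 1 call
        errors = [b - m for b, m in zip(base, market)]
        if max(abs(e) for e in errors) < tol:
            return params
        jac = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):                           # n further calls
            bumped = list(params)
            bumped[j] += bump
            prices = pricer(bumped)
            for i in range(2):
                jac[i][j] = (prices[i] - base[i]) / bump
        # Solve jac * step = errors by Cramer's rule, then update.
        det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
        step0 = (errors[0] * jac[1][1] - jac[0][1] * errors[1]) / det
        step1 = (jac[0][0] * errors[1] - jac[1][0] * errors[0]) / det
        params = [params[0] - step0, params[1] - step1]
    raise RuntimeError("calibration did not converge")


if __name__ == "__main__":
    market = [toy_price(k, 0.20, -0.30) for k in STRIKES]
    pricer = CountingPricer()
    print(calibrate(market, (0.15, 0.0), pricer), pricer.calls)
```

Each Newton Raphson update costs n + 1 = 3 calls to the pricer here, plus one final call for the convergence check. When the pricer is a MonteCarlo simulation rather than a closed formula, those calls dominate the calibration time.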
10.5 Iteration Algorithm
Recalling the initial Calibration flow chart, we now present in more
detail the iteration on its right hand side:
Fig. 10.17. Detailed Calibration Algorithm: Jacobian computation (flowchart: starting at iteration j with the n parameters λi^j, the market rates and the N vanilla products, the PRICER is called with i = 0 and the test Market Price = Model Price? is made; if it holds, the n parameters λi^j are exported; otherwise each parameter in turn is bumped, λi^j = λi^j + Δi, the model prices are recomputed and stored, and the slope (Price_i^j - Price_0^j) / Δλi^j is formed, with i = i+1 until i = n; Newton Raphson then produces the new parameters λi^(j+1) and j = j+1)
11. Graphical Understanding
We now seek to truly understand how well our HJM model operates, and a direct
method to achieve this is by reproducing it graphically. By doing so, we will be able to
see where calibration solutions lie, where the first guesses are taken, and how directly
our algorithm converges towards its solution. We also hope that any specific cases
where our algorithm finds difficulties in converging, or does not converge at all, will
become apparent during this phase.
We set out to analyse the two strike HJM model. Recall that the calibration
parameters for such a formulation were simply σ and α. The initial idea is to generate
the space Ω consisting of all the prices that the HJM model generates for each pair (σ,
α), for a determined vanilla product. We then compare this three dimensional surface
with the true market price of the vanilla product. Wherever the HJM model price
coincides with the market price, we would have a valid solution for the parameters
(σ,α).
We develop this idea a step further, and set our z axis directly as:
HJM Model Price – Market Price
This is simply a vertical displacement of the curve along the z axis, since the
market price for a given vanilla product is a constant. What we achieve through this
transformation, is to be able to directly observe our (σ, α) solutions as the intersection
of our surface with the horizontal plane. That is, we have a valid solution whenever
HJM Model Price – Market Price = 0
Note that as discussed in the Black Scholes Chapter, the use of Market prices or of
Black volatilities on our vertical axis makes no difference, as the two are directly related. We
must also be aware of the fact that each valuation must be made for a product with a
defined maturity and a definite strike.
We will not enter into the details of the programming algorithm involved in this
procedure yet. The interested reader can refer to the algorithm at the end of this
section, Fig. 11.12.
Our space Ω will consist in the parameters to calibrate along the horizontal plane.
Notice that we can intuitively guess the form that this surface will have, by analysing
the 2 dimensional behaviour of the individual parameters.
We must not confuse the behaviour of the model parameters in this section with
those in the HJM section. Here, we are representing price versus model parameters. In
previous graphs, we were analysing prices versus strikes, and seeing how our
parameters transformed those curves.
The sigma parameter in the two strike model presents a characteristic curve of the
form:
Fig. 11.1. HJM model: sigma vs price dynamics with different alpha parameters (price against Sigma [%], with one curve for each of Alpha -60, -20, 0, 20 and 40)
It is a monotonically increasing curve. We see that its slope varies depending on
the value of alpha.
The alpha parameter presents a characteristic convexity. Its global level varies
vertically in price depending on the value of sigma.
Fig. 11.2. HJM model: alpha vs price dynamics with different sigma parameters (price against Alpha [%], with one curve for each of Sigma 0.04, 0.08, 0.12, 0.16 and 0.2)
By combining the two, we obtain a 3D surface Ω of the form:
Fig. 11.3. HJM MonteCarlo model price surface (Model Price - Market Price against Sigma and Alpha)
The possible (σ, α) solutions are those in which Model Price = Market Price,
therefore where the surface passes through 0. This is depicted above as the limiting
curve between the blue and purple regions.
Fig.11.4. HJM MonteCarlo two dimensional solution (solution curve in the Sigma against Alpha [%] plane for K = 3.35%)
Notice however that we obtain an entire curve of possible solutions, meaning that
we have an infinite set of possibilities to select our (σ, α) pair from. As in any
mathematical problem with two variables to solve, it is clear that we need two
equations to fully determine the system. We must therefore perform each calibration
in pairs of vanilla products (that have the same time schedules, and different strikes),
so that the final solution set (σ, α) is that which satisfies both curves, thus their
intersection.
[Figure: solution curves for K = 3.35% and K = 4.45%, sigma versus alpha [%]]
Fig. 11.5. HJM MonteCarlo two dimensional solution intersection for two vanilla products
Notice also that the pair of vanilla products must be selected with the same
maturity, but different strike. It is left to the trader to decide which specific pair of
vanilla products to select at each maturity and tenor, so as to model the risk that he
seeks.
Depending on his selection, the trader will be taking into account a tangent to the
volatility curve at a specific point, or will be reproducing a broad slope of the curve.
[Figure: σBlack versus strike K, two panels: a) very close strikes, b) distant strikes]
Fig. 11.6. Model implications on taking a) very close strikes b) distant strikes
We immediately see here that a trader trying to capture a curvature, i.e. a smile,
will be unable to do so with the selection of only two strike positions. A third strike
would be required, and consequently a third parameter γ – the ‘Volatility of
Volatilities’- would have to be inserted into our HJM formulation to completely
determine the system of 3 equations and 3 unknowns.
11.1 Dynamics of the curve
Increasing the market prices, or equivalently the σBlack, results in a downward
vertical displacement of the entire surface Ω. According to our z axis definition we are
now subtracting a greater amount in the second term of the expression
HJM Model Price – Market Price
On our 2D graph this displaces the solution curves upwards along the sigma axis. (The
HJM model’s sigma must not be confused with the σBlack that is related to the product
price through the Black Scholes formula.)
Variation of the strike K in turn appears to be a re-scaling of the solution curve.
Its effect is to increase the overall size of the curve and shift it towards smaller values
of alpha, maintaining a more or less constant right end value.
[Figure: two panels of solution curves: a) for two market prices σBlack1 and σBlack2, b) for two strikes K1 and K2]
Fig. 11.7. Solution dynamics with a) variation in market price b) variations in strike
11.2 HJM Problematics
Of specific interest to us is to analyse the cases in which our HJM algorithm does
not find a concrete solution. We have encountered four main cases of concern.
11.2.1 Lack of Convergence
There exists a solution but the algorithm does not converge properly towards it.
[Figure: solution curves for K = 3.35% and K = 3.45%, sigma versus alpha [%], with a first guess from which the algorithm does not converge]
Fig. 11.8. Convergence of the algorithm
We realise that this problem can be solved easily and directly through two main
alternatives, each being equally valid.
a. Selecting a first guess which is closer to the true solution. This will be one of
the driving forces to develop an analytic approximation.
b. Increasing the number of MonteCarlo simulations so as to have a more robust
convergence towards the true solution. The only drawback with this alternative is the
increased computation time required.
11.2.2 Duplicity
There exists a duplicity of solutions.
The observant reader will have already noticed this in the previous graph, or
could have foreseen this difficulty from the fact that the alpha curve has a
concave form. We present here a more evident case of solution duplicity.
[Figure: solution curves for K = 4.35% and K = 4.45%, sigma versus alpha [%], showing two intersections]
Fig. 11.9. Solution Duplicity
In general we intend our model to have a logical set of parameters. Remember
that the alpha represents the weight that we attribute to the normality or
log-normality of our model. Thus we expect our alpha values to lie in the range [0,1].
We notice that one of our solutions always lies within this range, whereas a second
solution sometimes appears that can greatly exceed it. To prevent our model from
converging towards solutions of the form α = 5, we could either code a restrictive
algorithm, or simply impose a first guess with an alpha within this range.
Better still, we could bring our first guess very close to the valid solution to
ensure that the algorithm converges towards it. (See the analytic approximation in
Section 13).
11.2.3 No solution
No curve intersection, and thus no valid pair of parameters.
[Figure: two solution curves, sigma versus alpha [%], that do not intersect]
Fig. 11.10. No HJM MonteCarlo solution intersection
11.2.4 No curve whatsoever
This occurs whenever our HJM model is not flexible enough to descend below the
horizontal axis. In other words, the model is never capable of equating its model
prices to the real market prices with any possible combination of parameters (σ, α).
The surface remains asymptotic to the horizontal axis.
[Figure: 3D surface of 'Model - Market Price' over the (alpha [%], sigma) plane for Vanilla 2]
Fig. 11.11. HJM MonteCarlo surface does not descend sufficiently so as to create a
solution curve
11.3 Attempted Solutions
A first idea that we tested to probe the cause of this inability to calibrate was to
analyse the gradients. We define the gradient in a σBlack versus strike K graph as
$$ \frac{\partial \sigma_{Black}}{\partial K} $$
We realize that the model has particular difficulties when trying to calibrate
products that show a pronounced smile, i.e. that are very convex, or that have strong
changes in their slope. We analysed two principal cases:
· Whether the algorithm was incapable of calibrating due to the magnitude
(steepness) of the gradient.
· Whether the algorithm was incapable of calibrating depending on the
particular strike K location where the gradient was calculated, i.e. whether the
algorithm found it particularly difficult to calibrate when it was far from
the ‘at the money’ strike position.
No conclusive evidence was found supporting either of the two suggestions.
A second idea tested was to analyse the actual plain vanilla products that we
were calibrating. We found that calibrations of similar products, for example, a
swaption pair, could be solved for by the HJM model. If instead we calibrated
different products together, such as a swaption and a capfloor pair, the HJM would be
unable to converge to a suitable solution. We noticed also that there is a clear
difference in slopes between these two types of products. On average, a capfloor’s
slope has a value of around -1, whereas a Swaption has a slope of -2.
The above sparks off two possibilities:
The first is an examination of the actual cap market data so as to examine if the
caplets are being correctly derived from their cap market quotes. This will be
discussed in detail in a later chapter- Section 16.
A second option would be to try and normalize the caplet volatility measure. We
have noticed that the joint calibration of swaptions (based on the 1Y LIBOR) and 1Y
caplets (based on the 1Y LIBOR), do yield results. However, when we try to calibrate
the same swaptions with 3 month or 6 month caplets (based on the 6M LIBOR),
immediate problems arise. A possible alternative would be to transform these caplets
into an equivalent annual measure, and to calibrate this fictitious annual measure
with the swaption measure. See [BM].
11.4 3D Surface Algorithm
We follow 4 main steps in the algorithm that constructs the 3 dimensional surface
and that then evaluates the 2 dimensional solutions for α and σ.
1. Selection of the maturity i at which we will evaluate a particular pair of
products.
2. Mesh generation: that is, we define the horizontal plane of α’s and σ’s over
which to construct our surface.
3. For each of the two products q, we generate for every mesh node (α, σ) the 3D
surface as
Model Price(α, σ) – Market Price
4. We search for the (α, σ) solutions. This is where the surface passes through the
plane z = 0. For this, we check every node on the horizontal mesh and see if its
corresponding price difference (z value) experiences a change in sign with respect to
any adjacent node. We check both the horizontally adjacent and the vertically adjacent
nodes. Any solution found is stored.
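The sign-change search of step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_zero_crossings` and the toy surface are hypothetical, the price differences are taken as already computed on the mesh, and each solution is reported as the midpoint between the two nodes where the sign flips.

```python
def find_zero_crossings(alphas, sigmas, diff):
    """Return (alpha, sigma) points where diff changes sign between adjacent nodes.

    diff[l][ll] holds Model Price - Market Price at (alphas[l], sigmas[ll]).
    """
    solutions = []
    for l in range(len(alphas)):
        for ll in range(len(sigmas)):
            # Neighbour along the alpha direction, sigma held constant.
            if l > 0 and diff[l][ll] * diff[l - 1][ll] < 0:
                solutions.append(((alphas[l] + alphas[l - 1]) / 2, sigmas[ll]))
            # Neighbour along the sigma direction, alpha held constant.
            if ll > 0 and diff[l][ll] * diff[l][ll - 1] < 0:
                solutions.append((alphas[l], (sigmas[ll] + sigmas[ll - 1]) / 2))
    return solutions


# Toy surface (hypothetical, stands in for the MonteCarlo pricer): zero at sigma = 0.05.
alphas = [-1.0, 0.0, 1.0]
sigmas = [0.0, 0.1, 0.2]
diff = [[s - 0.05 for s in sigmas] for _ in alphas]
crossings = find_zero_crossings(alphas, sigmas, diff)
```

Running the search once per product and intersecting the two sets of crossings gives the candidate (α, σ) pairs; the resolution of the answer is limited by the mesh spacing.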
[Flowchart: START with N products and 2 parameters. Select the product pair ‘i’ to calibrate and set q = 2. Set the parameter mesh (LL = m σ points, L = n α points). While q > 0, build the 3D matrix M(L,LL) = MarketPrice - ModelPrice at each mesh point and set q = q - 1; then export the 3D matrices. Vertical check: for each constant σLL, run down the α column and, wherever M(L,LL) · M(L-1,LL) < 0, export the solution ((αL + αL-1)/2, σLL). Horizontal check: for each constant αL, run along the σ row and, wherever M(L,LL) · M(L,LL-1) < 0, export the solution (αL, (σLL + σLL-1)/2).]
Fig. 11.12. Graphic surface generation algorithm
12. HJM 3 Strikes
The HJM three strike model, as stated initially, aims at being able to capture the
smile perceived by the market through the addition of a new parameter. The
peculiarity of this new component lies in the fact that it is itself stochastic. We will see
that the new parameter, which we will call the volatility of volatilities, has its own
diffusion dynamics that is governed by a different Brownian motion, Z.
We will now proceed to analyse the main alternatives which we experimentally
invented and developed here. Notice as we advance how they all present a direct
relationship with the typical normal and lognormal stochastic differential diffusion
equations. This is why, upon integrating, they all evolve towards an exponential form.
We have provided their mathematical development below:
$$ \Gamma(t,T) = \sigma(t,T)\, V(t,T) \left[ \alpha(t,T) \log B(t,T) + (1-\alpha(t,T)) \log\frac{B(0,T)}{B(0,t)} \right] \tag{12.1} $$
where V(t,T) introduces the new term of stochasticity, Z. We will describe several
choices of V tried in HJM Santander.
12.1 Exponential
This is one of the simplest stochasticity models that we shall tackle. It involves a
single volatility of volatilities parameter, combined with the new Brownian motion
itself.
$$ V(t,T) = e^{\gamma(t,T) Z_t} \tag{12.2} $$
The above is selected in particular so that we ensure
$$ E^P[V(t,T)] = 1 \tag{12.3} $$
A logical extension of the presented model would be
$$ V(t,T) = e^{-\frac{1}{2}\gamma^2(t,T)\,t + \gamma(t,T) Z_t} \tag{12.4} $$
Clearly, recalling chapter 3.1, we can obtain the above from a Black formulation of
the diffusion for the new Brownian motion. Imagine that we have a dynamics of the
form
$$ dV(t,T) = \beta V(t,T)\,dt + \gamma(t,T) V(t,T)\,dZ_t \tag{12.5} $$
$$ \frac{dV_t}{V_t} = \beta\,dt + \gamma_t\,dZ_t \tag{12.6} $$
Let us impose a change in variable
$$ X = \ln(V_t), \qquad dX = d\ln(V_t) \tag{12.7} $$
Applying Ito
$$ dX = \frac{1}{V_t}\,dV_t + 0 + \frac{1}{2}\left(\frac{-1}{V_t^2}\right) V_t^2 \gamma_t^2\,dt \tag{12.8} $$
replacing the term in dV with our initial diffusion equation
$$ d\ln(V_t) = dX = \beta\,dt + \gamma_t\,dZ_t - \frac{1}{2}\gamma_t^2\,dt \tag{12.9} $$
and now integrating
$$ V_t = V_0\, e^{\int \left(\beta - \frac{1}{2}\gamma_t^2\right) dt + \int \gamma_t\,dZ_t} \tag{12.10} $$
Assuming γ(t,T) to be piecewise constant, we could extract it from the integral,
obtaining
$$ V_t = V_0\, e^{\beta t - \frac{1}{2}\gamma_t^2 t + \gamma_t Z_t} \tag{12.11} $$
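If we do not make the above assumption, we must solve the second integral using
Brownian motion properties, as
$$ \int \gamma_t\,dZ_t = \sqrt{\mathrm{Variance}\left(\int \gamma_t\,dZ_t\right)}\;\frac{Z_t}{\sqrt{t}} = \sqrt{\int \gamma_t^2\,dt}\;\frac{Z_t}{\sqrt{t}} \tag{12.12} $$
Which leaves us with
$$ V_t = V_0\, e^{\int \left(\beta - \frac{1}{2}\gamma_t^2\right) dt + \sqrt{\int \gamma_t^2\,dt}\;\frac{Z_t}{\sqrt{t}}} \tag{12.13} $$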
We will see that this procedure is common throughout all the possibilities
attempted. This is to say, that we could always use a more simplistic approach rather
than a complete integration. In general we had particular difficulties in implementing
the complete integrals. For this reason we commonly resorted to piecewise constant
assumptions. Another successful solution to avoid the integral of a squared parameter
was to approximate:
$$ \int \gamma_t^2\,dt \approx \left(\int \gamma_t\,dt\right)^2 $$
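As a quick numerical illustration of the drift correction in (12.4), the sketch below samples V for a constant γ and checks that its MonteCarlo mean stays close to one. The function names, the seed and the parameter values are assumptions chosen only for this example; it uses Python's standard `random` module rather than any production path generator.

```python
import math
import random


def sample_v(gamma, t, rng):
    """One draw of V = exp(-0.5*gamma^2*t + gamma*Z_t), with Z_t ~ N(0, t)."""
    z_t = math.sqrt(t) * rng.gauss(0.0, 1.0)
    return math.exp(-0.5 * gamma ** 2 * t + gamma * z_t)


def mean_v(gamma, t, n_paths=100_000, seed=0):
    """MonteCarlo estimate of E[V] for a constant gamma (illustrative only)."""
    rng = random.Random(seed)
    return sum(sample_v(gamma, t, rng) for _ in range(n_paths)) / n_paths


estimate = mean_v(gamma=0.3, t=2.0)  # expected to lie close to 1
```

The estimate carries the usual MonteCarlo noise, which shrinks with the square root of the number of paths.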
12.2 Mean Reversion
We simply add here a term of mean reversion to the previously developed
dynamics in (12.5).
$$ dV(t,T) = \beta(\theta - V(t,T))\,dt + \gamma(t,T)\,dZ_t \tag{12.14} $$
We now must apply a change in variable to attain the solution. Let
$$ x_t = V_t\, e^{\int_0^t \beta_s ds} \tag{12.15} $$
Then by Ito
$$ dx_t = e^{\int_0^t \beta_s ds}\,dV_t + \beta_t V_t\, e^{\int_0^t \beta_s ds}\,dt + 0 \tag{12.16} $$
$$ = e^{\int_0^t \beta_s ds}\left(dV_t + \beta_t V_t\,dt\right) \tag{12.17} $$
And substituting our diffusion equation in dV we obtain
$$ dx_t = e^{\int_0^t \beta_s ds}\left(\beta_t(\theta - V(t,T))\,dt + \gamma(t,T)\,dZ_t + \beta_t V_t\,dt\right) \tag{12.18} $$
Leaving
$$ dx_t = e^{\int_0^t \beta_s ds}\left(\beta_t\theta\,dt + \gamma(t,T)\,dZ_t\right) \tag{12.19} $$
Integrating between 0 and t we have
$$ x_t - x_0 = \int_0^t e^{\int_0^u \beta_s ds}\left(\beta_u\theta\,du + \gamma(u,T)\,dZ_u\right) \tag{12.20} $$
We can substitute now our change in variable for x to return to a formulation in V:
$$ V_t\, e^{\int_0^t \beta_a da} - V_0\, e^{\int_0^0 \beta_s ds} = \int_0^t e^{\int_0^u \beta_s ds}\left(\beta_u\theta\,du + \gamma(u,T)\,dZ_u\right) \tag{12.21} $$
$$ V_t = V_0\, e^{-\int_0^t \beta_s ds} + e^{-\int_0^t \beta_a da} \int_0^t e^{\int_0^u \beta_s ds}\left(\beta_u\theta\,du + \gamma(u,T)\,dZ_u\right) \tag{12.22--12.23} $$
Let us take βt = β, then
$$ V_t = V_0\, e^{-\beta t} + e^{-\beta t}\int_0^t e^{\beta u}\beta\theta\,du + e^{-\beta t}\int_0^t e^{\beta u}\gamma(u,T)\,dZ_u \tag{12.24} $$
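If we assume γ(t,T) to be piecewise constant, we can extract it from the
integration, thus
$$ V_t = V_0\, e^{-\beta t} + e^{-\beta t}\beta\theta\left(\frac{e^{\beta t}-1}{\beta}\right) + \gamma(t,T)\, e^{-\beta t}\int_0^t e^{\beta u}\,dZ_u \tag{12.25} $$
Where using the Brownian motion properties we can calculate $\int_0^t e^{\beta u}\,dZ_u$ as
$$ \sqrt{\mathrm{Variance}\left(\int_0^t e^{\beta u}\,dZ_u\right)} \cdot \frac{Z_t}{\sqrt{t}} = \sqrt{\frac{1}{2\beta}\left(e^{2\beta t}-1\right)} \cdot \frac{Z_t}{\sqrt{t}} \tag{12.26} $$
Leaving
$$ V_t = V_0\, e^{-\beta t} + \theta\left(1-e^{-\beta t}\right) + \gamma(t,T)\, e^{-\beta t}\sqrt{\frac{1}{2\beta}\left(e^{2\beta t}-1\right)} \cdot \frac{Z_t}{\sqrt{t}} \tag{12.27} $$
Notice that if we do not consider γ(t,T) piecewise constant, we would have
$$ \sqrt{\mathrm{Variance}\left(\int_0^t e^{\beta u}\gamma(u,T)\,dZ_u\right)} \cdot \frac{Z_t}{\sqrt{t}} = \sqrt{\int_0^t e^{2\beta u}\gamma^2(u,T)\,du} \cdot \frac{Z_t}{\sqrt{t}} \tag{12.28} $$
Leaving
$$ V_t = V_0\, e^{-\beta t} + e^{-\beta t}\beta\theta\left(\frac{e^{\beta t}-1}{\beta}\right) + e^{-\beta t}\sqrt{\int_0^t e^{2\beta u}\gamma^2(u,T)\,du} \cdot \frac{Z_t}{\sqrt{t}} \tag{12.29} $$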
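For constant β and γ, the closed form (12.27) can be evaluated directly from a single standard normal draw. The sketch below is only an illustration: the function name `mean_reverting_v` and the argument `z_std`, which plays the role of Z_t/√t, are assumptions made for this example.

```python
import math


def mean_reverting_v(v0, beta, theta, gamma, t, z_std):
    """Evaluate the mean-reverting V_t for constant beta and gamma.

    z_std is a standard normal draw standing in for Z_t / sqrt(t).
    """
    decay = math.exp(-beta * t)
    # Standard deviation of gamma * exp(-beta*t) * integral of exp(beta*u) dZ_u.
    stdev = gamma * decay * math.sqrt((math.exp(2.0 * beta * t) - 1.0) / (2.0 * beta))
    return v0 * decay + theta * (1.0 - decay) + stdev * z_std


# With no shock the path relaxes deterministically from v0 towards theta.
v_mid = mean_reverting_v(v0=0.5, beta=1.0, theta=1.0, gamma=0.2, t=1.0, z_std=0.0)
```

For long horizons the deterministic part tends to θ and the standard deviation tends to γ/√(2β), which gives a simple sanity check on any implementation.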
12.3 Square Root Volatility
We simply state the idea of the diffusion for this form. We did not finally
implement the expression below
$$ dv(t,T) = \beta(\theta - v(t,T))\,dt + \gamma(t,T)\sqrt{v(t,T)}\,dZ_t \tag{12.30} $$
12.4 Pilipovic
Notice that the expression below is very similar to the one that we stated in the
mean reversion section (12.14). The main difference is simply the inclusion of the
stochastic volatility of volatilities term V in the diffusion (second term). Previously, it
had only been included in the drift term.
$$ dV(t,T) = \beta(\theta - V(t,T))\,dt + \gamma(t,T) V(t,T)\,dZ_t \tag{12.31} $$
A solution to the above equation was discovered by Dragana Pilipovic as
$$ V(t) = e^{-\int_0^t \left(\beta + \frac{1}{2}\gamma_s^2\right) ds + \int_0^t \gamma_s\,dZ_s} \left[ V(0) + \int_0^t e^{\int_0^u \left(\beta + \frac{1}{2}\gamma_s^2\right) ds - \int_0^u \gamma_s\,dZ_s}\,\beta\theta\,du \right] \tag{12.32} $$
To demonstrate that this is a solution, we will apply Ito as proof. For this we
must calculate each term in:
$$ dV(t) = \frac{\partial V(t)}{\partial t}\,dt + \frac{\partial V(t)}{\partial Z}\,dZ + \frac{1}{2}\frac{\partial^2 V(t)}{\partial Z^2}\,dt \tag{12.33} $$
Writing $A_t = -\int_0^t \left(\beta + \frac{1}{2}\gamma_s^2\right) ds + \int_0^t \gamma_s\,dZ_s$ for the exponent of the
solution, so that $V(t) = e^{A_t}\left[V(0) + \int_0^t e^{-A_u}\beta\theta\,du\right]$, we calculate each of the above
partial derivatives as:
$$ \frac{\partial V(t)}{\partial t} = -\beta\, e^{A_t}\left[V(0) + \int_0^t e^{-A_u}\beta\theta\,du\right] - \frac{1}{2}\gamma_t^2\, e^{A_t}\left[V(0) + \int_0^t e^{-A_u}\beta\theta\,du\right] + e^{A_t}\,\beta\theta\, e^{-A_t} \tag{12.34} $$
$$ \frac{\partial V(t)}{\partial t} = -\beta V(t) - \frac{1}{2}\gamma_t^2 V(t) + \beta\theta \tag{12.35} $$
$$ \frac{\partial V(t)}{\partial Z_t} = \gamma_t\, e^{A_t}\left[V(0) + \int_0^t e^{-A_u}\beta\theta\,du\right] = \gamma_t V(t), \qquad \frac{\partial^2 V(t)}{\partial Z_t^2} = \gamma_t^2 V(t) \tag{12.36} $$
By Ito, applying (12.33) and now substituting the above, we see that the complex
terms cancel out, bringing us back to the original diffusion equation.
$$ dV(t) = \left(-\beta V(t) - \frac{1}{2}\gamma_t^2 V(t) + \beta\theta\right) dt + V(t)\gamma_t\,dZ + \frac{1}{2}V(t)\gamma_t^2\,dt = \beta(\theta - V(t))\,dt + V(t)\gamma_t\,dZ \tag{12.37} $$
For a constant β and piecewise constant γt, our solution would be
$$ V(t) = e^{-\beta t - \frac{1}{2}\gamma_t^2 t + \gamma_t Z_t}\left[ V(0) + \beta\theta \int_0^t e^{\beta u + \frac{1}{2}\gamma_u^2 u - \gamma_u Z_u}\,du \right] \tag{12.38} $$
where the time integral is of particular difficulty, since its integrand contains the
Brownian path, and could be approached numerically as the sum
$$ \sum_i e^{\beta t_i + \frac{1}{2}\gamma_i^2 t_i - \gamma_i Z_i}\,(t_i - t_{i-1}) $$
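Because the closed form contains a time integral over the Brownian path, a simple way to cross-check any implementation is to discretise the diffusion (12.31) directly. The sketch below uses a plain Euler scheme, which is a swapped-in technique and not the integration used in the text; the function name and parameter values are assumptions for illustration.

```python
import math
import random


def pilipovic_euler(v0, beta, theta, gamma, t, n_steps, rng):
    """Euler discretisation of dV = beta*(theta - V)*dt + gamma*V*dZ (constant parameters)."""
    dt = t / n_steps
    v = v0
    for _ in range(n_steps):
        dz = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment over dt
        v += beta * (theta - v) * dt + gamma * v * dz
    return v


# One simulated terminal value (illustrative parameters only).
v_end = pilipovic_euler(0.5, 1.0, 1.0, 0.3, 1.0, 1_000, random.Random(0))
```

Setting γ = 0 removes the noise, in which case the scheme should reproduce the deterministic relaxation θ + (V0 − θ)e^(−βt) up to discretisation error.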
12.5 Logarithmic
Note that this is still a very similar format to that which we had in our mean
reversion model. We have simply replaced the variable V by log V.
$$ d\log V(t) = \beta(\theta - \log V(t))\,dt + \gamma(t,T)\,dZ_t \tag{12.39} $$
Let us convert the above to a simpler form through
$$ X(t) = \log V(t), \qquad dX(t) = d\log(V(t)) \tag{12.40} $$
Then the initial equation is now written as
$$ dX(t) = \beta(\theta - X(t))\,dt + \gamma(t,T)\,dZ_t \tag{12.41} $$
For which the solution, as we saw previously, for a constant β, was
$$ X_t = X_0\, e^{-\beta t} + \theta\left(1-e^{-\beta t}\right) + \gamma(t,T)\, e^{-\beta t}\int_0^t e^{\beta u}\,dZ_u \tag{12.42} $$
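Now undoing our initial change in variable:
$$ X(t) = \log V(t), \qquad dX(t) = d\log(V(t)) \tag{12.43} $$
so
$$ \log V_t = (\log V_0)\, e^{-\beta t} + \theta\left(1-e^{-\beta t}\right) + \gamma(t,T)\, e^{-\beta t}\int_0^t e^{\beta u}\,dZ_u \tag{12.44} $$
Where the integral can be treated as a Brownian term
$$ \int_0^t e^{\beta u}\,dZ_u = \sqrt{\mathrm{Variance}\left(\int_0^t e^{\beta u}\,dZ_u\right)} \cdot \frac{Z_t}{\sqrt{t}} = \sqrt{\int_0^t e^{2\beta u}\,du} \cdot \frac{Z_t}{\sqrt{t}} = \sqrt{\frac{1}{2\beta}\left(e^{2\beta t}-1\right)} \cdot \frac{Z_t}{\sqrt{t}} \tag{12.45} $$
$$ V_t = V_0^{\,e^{-\beta t}} \cdot e^{\theta\left(1-e^{-\beta t}\right) + \gamma_t\, e^{-\beta t}\sqrt{\frac{1}{2\beta}\left(e^{2\beta t}-1\right)} \cdot \frac{Z_t}{\sqrt{t}}} \tag{12.46} $$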
12.6 Taylor expansion
Another alternative that has been examined is the transformation of the curvature
into a Taylor expansion. Recall that we had
$$ \Gamma(t,T) = \sigma(t,T)\left[ \alpha(t,T) \log B(t,T) + (1-\alpha(t,T)) \log\frac{B(0,T)}{B(0,t)} \right] \tag{12.47} $$
let us note
$$ x_0 = \log\frac{B(0,T)}{B(0,t)}, \qquad x = \log B(t,T) \tag{12.48} $$
we can then re-write the above as
$$ \Gamma(t,T) = \sigma_t\alpha_t(x - x_0) + \sigma_t x_0 \tag{12.49} $$
We could imagine a more accurate extension of the above as a second order
Taylor expansion
$$ \Gamma(t,T) = \sigma_t\left[ x_0 + \alpha_t(x - x_0) - \gamma(x - x_0)^2 \right] \tag{12.50} $$
The above works relatively well, yet only until maturities of around 12 years. We
do not have a strong reason to abandon this formulation, other than the fact that we
need to narrow down our range of alternatives, and thus have decided to use the
volatility of volatility dynamics method instead, as it gives good results for products
with longer maturities. The behaviour itself depends on the market data. For example,
for the USD, the curvature approach has been implemented already and appears to
work better than the volatility of volatilities approach.
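The two forms of the bond volatility in this section, (12.49) and (12.50), differ only by the curvature term. A minimal sketch, with hypothetical function names and arbitrary input values, makes the relationship explicit:

```python
def bond_vol_linear(sigma, alpha, x, x0):
    """First-order form: sigma*alpha*(x - x0) + sigma*x0."""
    return sigma * alpha * (x - x0) + sigma * x0


def bond_vol_curved(sigma, alpha, curvature, x, x0):
    """Second-order form: sigma*[x0 + alpha*(x - x0) - curvature*(x - x0)^2]."""
    return sigma * (x0 + alpha * (x - x0) - curvature * (x - x0) ** 2)


# x = log B(t,T) and x0 = log(B(0,T)/B(0,t)); the numbers are illustrative only.
linear = bond_vol_linear(0.1, 0.4, -0.21, -0.20)
curved = bond_vol_curved(0.1, 0.4, 2.0, -0.21, -0.20)
```

With zero curvature the second function collapses to the first, and along the forward path (x = x0) both return σ·x0 regardless of α and γ.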
12.7 Graphical Note
Following our discussion on the two strike model, it seems evident that in the
three strike model we will now need three vanilla products, calibrated jointly, to
attain a unique set of parameters for a given maturity. Remember how before, we
obtained the solution in the intersection of two curves. Now, the solution will be
obtained in the unique intersection of three surfaces.
Visually this is much more complex than before. The only way in which it could
be represented would be to fix one of the model parameters, and plot the remaining
two against a vertical axis consisting of ‘Model Price – Market Price’. We have not
pursued this possibility any further.
12.8 Results
We will now proceed to summarize the results obtained by comparing how each
of the former expressions for the volatility of volatilities performs.
It is worthwhile noting that in many of the previous formulations, we saw several
new parameters apart from the volatility of volatilities. These include for instance
β and θ, whose values we have decided to hardcode into our algorithms. An
alternative or possible future approach could consist in also calibrating these
parameters. For the time being, we have manually adjusted them, searching for those
for which our calibration works best.
Notice in the following set of results that we have tended to perform a first
simplistic approach setting these additional parameters to default values of 0 or 1. We
have then proceeded to perform a more detailed analysis, searching for their optimal
values and exploring to what extent they improved the calibration. We have noted
these as ‘extended’ results.
We state that all the formulae were exhaustively tested over 20 year products so
as to evaluate at what point the calibration failed. None of the former expressions
were capable of successfully completing the calibration process. The results presented
below were obtained for 10 pack simulations: this is, 10,000 MonteCarlo paths. The
more packs we added, the more difficult the HJM 3 factor algorithm found it to
advance.
12.8.1 Case 1, Exponential (12.2):
1st Case                           Limit              β
Normal                             13Y, 14Y Caplet
Extended, piecewise integration    14Y, 15Y Caplet    -0.05
Extended, exact integration        2Y, 3Y             0.1 to -0.1
Extended, squared integral         14Y, 15Y Caplet    1 to -0.1
Table 12.1. Exponential Stochastic Volatility Summary
This first formulation turned out to be one of the most successful. Notice that the
column headed limit represents the time span up until which the algorithm
successfully calibrated. We have also included a column β which shows the range of
values for this parameter in which the above formula successfully calibrated to the
specified date.
A more careful analysis of the results obtained confirms the following:
· The algorithm seems to fail because the change in the gamma parameter for
long products is too drastic.
· The extended version with beta appears much more flexible.
· The squared integral method allows for the greatest range of parameter
alternatives whilst still reaching the same final length (in years) of calibration.
12.8.2 Case 2, Mean Reversion (12.14):
2nd Case              Limit
integral squared      2Y, 3Y
(integral) squared    5Y, 6Y
Table 12.2. Mean Reversion Stochastic Volatility Summary
12.8.3 Case 3, Square Root (12.30) :
3rd Case    Limit
            2Y, 3Y
            5Y, 6Y
Table 12.3. Square Root Stochastic Volatility Summary – 10 and 20 packs
12.8.4 Case 5, Logarithmic (12.39):
5th Case    Limit                β           θ
            15Y, 20Y Swaption    1           -0.7 to -0.001
            15Y, 16Y Caplet      1.5         -0.03 to -0.035
            16Y, 17Y Caplet      2.8 to 3    -0.02 to -0.25
Table 12.4. Logarithmic Stochastic Volatility Summary
Clearly, this fifth formulation is the one that has been most successful, reaching
one or two years further in the calibration than the 1st case formulation. Nevertheless,
we were still incapable of calibrating the vanilla products to the final maturity of 20
years.
Having reached this point, we began to consider the possibility that perhaps, it
was not the ‘Volatility of Volatilities’ expression that was causing the failure to
calibrate. We postulated the hypothesis that perhaps the key to the problem could be
located elsewhere.
Indeed, we found a surprising feature when performing these tests. All the above
expressions were obtained from joint calibrations of both Swaptions and Caplets,
where the caplets were based on the 1 year EURIBOR. Since we had calibrated up
until 16 years, we expected to find that our MonteCarlo would at least be able to
calibrate any product under the 15 year mark. Contrary to our expectations, we found
that there was a certain range of products that our model was incapable of calibrating,
even in the case of shorter maturities. These were, specifically, joint calibrations of
Swaptions and Caplets in which the Caplets were not based on the 1 year EURIBOR,
but instead, on the 3 month or 6 month EURIBOR.
An important problem is the fact that in the cap market, forward rates are mostly
semi – annual whereas those entering the forward-swap-rate expressions are typically
annual rates. Therefore, when considering both markets at the same time, we may
have to reconcile volatilities of semi-annual forward rates and volatilities of annual
forward rates. Otherwise, when we treat them as exact equals, the above calibration
problems occur. And perhaps, it is incorrect even to treat the swaptions and caplets
both based on the 1 year EURIBOR as equals. This may be the underlying reason why
we do not calibrate successfully.
This problem is what gives rise to a deeper study of the joint calibration. We
will first analyse the possibility that the caplet market data themselves may be
erroneous. This will involve an analysis of how the caplets are stripped from their
corresponding cap values. This has been developed in detail in the Caplet Stripping
Section 16. Once this has been implemented, we should ideally return to the 3 Strike
analysis to see whether any improvements are obtained.
If this turns out not to be the true problem, we will perhaps need to somehow
normalise the data of both caplets and swaptions. For this we refer the reader to [BM].
13. Analytic approximation
What we refer to as an analytic approximation is simply an approximate
formulation of the HJM framework. In other words, it is an approximation of the
model’s dynamics yielding a price for Swaptions which can be described through an
analytical formula that is very simple to implement numerically.
The need for an approximate formula arises from the fact that the HJM can be
very costly time-wise. It generally involves a huge number of path simulations which
can weigh heavily on any computer and even on a grid network. As seen in the
calibration section, every additional iteration requires n + 1 additional pricings so as
to calculate all the elements in the Jacobian.
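The cost of a bumped Jacobian can be made concrete with a small sketch: one base pricing plus one re-pricing per parameter, n + 1 in total. The function `bumped_jacobian` and the toy pricer below are hypothetical and only illustrate the counting argument; in the calibration each call to the pricer would be a full MonteCarlo run.

```python
def bumped_jacobian(price_fn, params, bump=1e-4):
    """Forward-difference Jacobian: jac[j][k] = d price_k / d param_j."""
    base = price_fn(params)                  # 1 pricing at the current point
    jac = []
    for j in range(len(params)):             # n further pricings, one per parameter
        bumped = list(params)
        bumped[j] += bump
        shifted = price_fn(bumped)
        jac.append([(s - b) / bump for s, b in zip(shifted, base)])
    return base, jac


# Toy pricer (hypothetical) that counts how many times it is called.
calls = []


def toy_prices(p):
    calls.append(1)
    return [p[0] + 2.0 * p[1], p[0] * p[1]]


base, jac = bumped_jacobian(toy_prices, [1.0, 3.0])
n_pricings = len(calls)  # 3 pricings for 2 parameters
```

An analytic Jacobian would replace the loop over parameters with closed-form derivatives, which is where the saving in computation time would come from.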
If instead we were to work with an analytical approximation, we would stand in
a far stronger position for the following two main reasons:
The HJM model starts off its calibration with an arbitrary first guess for its
parameters. It then iterates through a Newton Raphson algorithm, which is
characterised by converging badly and very slowly if the first guess is
very far from the true solution. A huge leap forward in computation time would be
achieved if the analytical approximation were to provide us with a good starting
point.
Calculating a numerical Jacobian as is done in the calibration process is extremely
costly. We have already seen that we can freeze the model’s Jacobian so as to iterate in
more steps but without having to recalculate the Jacobian at each step. We could
further reduce calibration time if we were to use an analytical approximation Jacobian
that could be calculated mathematically and not through a finite difference ‘bumping’
procedure.
We therefore set out with the aim of deriving an approximate formula of a
swaption for the Santander HJM model.
As shall be demonstrated, we will start off by making an assumption for the
dynamics of the forward swap rate
$$ dS(t) = \sigma(t)\left[\alpha(t) S(t) + (1-\alpha(t)) S(0)\right] dW_t^P \tag{13.1} $$
We will use this formulation as an analytic approximation to the exact HJM
expression. From this approximation we will derive its time-dependent parameters
α(t), σ(t). The exact expressions for these will cover the central part of our research. We
will subsequently need a formula to relate our new approximate expression’s σ(t) and
α(t) to the HJM model’s σi(t) and αi(t) (with i = 0, ..., n).
Secondly, we will use the technique of “averaging” developed by Piterbarg to
convert our time dependent parameters α(t), σ(t) into a diffusion with
time-independent parameters α, σ. This will convert our approximate formula into the form
$$ dS(t) = \sigma\left[\alpha S(t) + (1-\alpha) S(0)\right] dW_t^P \tag{13.2} $$
We will simply state the formulation to be used in this section, but will not
analyse it in any further depth.
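To see why a diffusion of the form (13.2) is convenient, note that X = αS + (1 − α)S(0) is lognormal with volatility ασ and starts at X(0) = S(0), so an option on S reduces to a Black formula on a shifted strike. The sketch below prices a payer payoff (S − K)+ per unit of annuity under that standard change of variable. It is an illustration under stated assumptions (constant α > 0, a positive shifted strike, no discounting), not the formula implemented in the text, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(f, k, vol, t):
    """Undiscounted Black price of (F - K)+ for a lognormal forward F."""
    std = vol * math.sqrt(t)
    d1 = (math.log(f / k) + 0.5 * std * std) / std
    d2 = d1 - std
    return f * norm_cdf(d1) - k * norm_cdf(d2)


def displaced_call(s0, k, sigma, alpha, t):
    """Price of (S - K)+ under dS = sigma*[alpha*S + (1 - alpha)*S0] dW, alpha > 0."""
    k_shift = alpha * k + (1.0 - alpha) * s0   # strike seen by X = alpha*S + (1-alpha)*S0
    return black_call(s0, k_shift, alpha * sigma, t) / alpha


price = displaced_call(s0=0.03, k=0.035, sigma=0.2, alpha=0.5, t=2.0)
```

With α = 1 the function returns the lognormal Black price, and as α approaches 0 it approaches the normal (Bachelier) price with absolute volatility σ·S(0), which matches the interpretation of α as the weight between the two behaviours.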
13.1 Formula Development
Let us recall two pivotal expressions in our HJM model:
$$ \Gamma(t,T) = \sigma(t,T)\left[ \alpha(t,T) \log B(t,T) + (1-\alpha(t,T)) \log\frac{B(0,T)}{B(0,t)} \right] \tag{13.3} $$
$$ dR(t,T) = (\ldots)\,dt + \sigma(t,T)\left[ \alpha(t,T) R(t,T) + (1-\alpha(t,T)) R^f(t,T) \right] dW_t^P \tag{13.4} $$
The first expression implies the second. Indeed, write
$$ \frac{dB(t,T)}{B(t,T)} = r_t\,dt + \Gamma(t,T)\,dW_t^P \tag{13.5} $$
$$ B(t,T) = e^{-R(t,T)(T-t)} \tag{13.6} $$
$$ R(t,T) = \frac{\log B(t,T)}{t-T} \tag{13.7} $$
Applying Ito to R(t,T) we obtain:
$$ dR(t,T) = \frac{1}{t-T}\left[\frac{1}{B(t,T)}\,dB(t,T)\right] + \frac{1}{2}B^2\Gamma^2\left[\frac{-1}{(t-T)B^2}\right] dt + \frac{-1}{(t-T)^2}(\log B)\,dt \tag{13.8} $$
$$ dR(t,T) = \frac{1}{t-T}\left[r\,dt + \Gamma\,dW\right] - \frac{1}{2(t-T)}\Gamma^2\,dt - \frac{1}{(t-T)^2}R(t,T)(t-T)\,dt \tag{13.9} $$
Separating into temporal and Brownian components
$$ dR(t,T) = \left[\frac{r}{t-T} - \frac{\Gamma^2}{2(t-T)} - \frac{R(t,T)}{t-T}\right] dt + \frac{\Gamma}{t-T}\,dW \tag{13.10} $$
Replacing the Γ in the Brownian part with (13.3)
$$ dR(t,T) = (\ldots)\,dt + \frac{1}{t-T}\,\sigma(t,T)\left[ \alpha(t,T) \log B(t,T) + (1-\alpha(t,T)) \log\frac{B(0,T)}{B(0,t)} \right] dW \tag{13.11} $$
And applying the relationship between bonds and rates in (13.7)
$$ dR(t,T) = (\ldots)\,dt + \frac{1}{t-T}\,\sigma(t,T)\left[ \alpha(t,T) R(t,T)(t-T) + (1-\alpha(t,T)) R^f(t,T)(t-T) \right] dW \tag{13.12} $$
where we have decided to call Rf the forward zero coupon rate
$$ \frac{B(0,T)}{B(0,t)} = e^{-R^f(t,T)(T-t)} \tag{13.13} $$
13.1.1 Swaption Measure
Let us consider our receiver swaption with strike K and time schedule τ = U0, U1,
..., Un. We are thus left with:
$$ dR_i(t) = (\ldots)\,dt + \sigma_i(t)\left[ \alpha_i(t) R_i(t) + (1-\alpha_i(t)) R^f_i(t) \right] dW_t^P \tag{13.14} $$
Where $R_i(t) = R(t,U_i)$
The forward swap rate of such a swap at time t is:
$$ S(t) = \frac{B(t,U_0) - B(t,U_n)}{\sum_{i=1}^{n} m_i B(t,U_i)} \tag{13.15} $$
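The forward swap rate (13.15) is a simple ratio of discount factors. The helper below is a minimal sketch; the name `forward_swap_rate`, the argument layout and the flat-curve example are assumptions made for illustration.

```python
def forward_swap_rate(discounts, accruals):
    """Forward swap rate (B(t,U_0) - B(t,U_n)) / sum_i m_i * B(t,U_i).

    discounts[i] = B(t, U_i) for i = 0..n; accruals[i-1] = m_i for i = 1..n.
    """
    annuity = sum(m * b for m, b in zip(accruals, discounts[1:]))
    return (discounts[0] - discounts[-1]) / annuity


# Flat 5% annually compounded curve, swap starting in 1Y with annual payments to 4Y.
discounts = [1.05 ** -k for k in range(1, 5)]
swap_rate = forward_swap_rate(discounts, [1.0, 1.0, 1.0])
```

On a flat annually compounded curve with unit accruals the forward swap rate equals the curve rate, which makes this a convenient check.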
13.2 Step 1
Under its annuity measure S(t) is a martingale. From the previous equation and
under the HJM model, S(t) presents the following dynamics:
$$ dS(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[ \alpha_i(t) R_i(t) + (1-\alpha_i(t)) R^f_i(t) \right] dW_t^P \tag{13.16} $$
Proof:
By applying a multidimensional form of Ito to (13.15), which in turn can be
rewritten as
$$ S(t) = \frac{B(0,U_0)\, e^{-R(t,U_0)(U_0-t)} - B(0,U_n)\, e^{-R(t,U_n)(U_n-t)}}{\sum_{i=1}^{n} m_i B(0,U_i)\, e^{-R(t,U_i)(U_i-t)}} \tag{13.17} $$
through the use of
$$ B(t,T) = e^{-(T-t)R(t,T)} \tag{13.18} $$
We have thus obtained a multidimensional form of the more simplistic Ito
equation
$$ dS(t) = \frac{\partial S}{\partial R}\,dR + \frac{\partial S}{\partial t}\,dt + \frac{1}{2}\frac{\partial^2 S}{\partial R^2}\sigma^2\,dt \tag{13.19} $$
Imposing that St is a martingale, we can say that it follows a driftless process,
meaning that all the terms in dt should cancel out. In this way we reach that
$$ dS(t) = \frac{\partial S}{\partial R}\,dR \tag{13.20} $$
Where we have dR from before. Thus, by simply replacing dR and realizing that
we have differentiated with respect to each of the Ri that we take from our continuous
rate curve, we have
$$ dS(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[ \alpha_i(t) R_i(t) + (1-\alpha_i(t)) R^f_i(t) \right] dW_t^P $$
Because our HJM model produces a skew, we can make an assumption for our
approximation, on the dynamics of the forward swap rate such that:
$$ dS(t) = \sigma(t)\left[\alpha(t) S(t) + (1-\alpha(t)) S(0)\right] dW_t^P \tag{13.21} $$
This is the approximation of the dynamics of St that we choose to use.
We will now need a formula to relate our new approximate expression’s σ(t) and
α(t) to the true HJM model’s σi(t) and αi(t) (with i = 0, ..., n).
Equations (13.16) and (13.21) give us the dynamics for S(t). The first is the HJM
formulation and the second is our newly developed approximation. We must
therefore find a suitable relationship between them.
13.2.1 First Method
To get from (13.16) to (13.21), we impose two conditions. The first is that the two
equations should be equivalent along the forward path. Suppose Ri(t) = Rfi(t)
∀i ∈ 0,..., n; then S(t) = S(0). In fact, when Ri(t) = Rfi(t) we have
$$ B(t,T) = \frac{B(0,T)}{B(0,t)} $$
since
$$ B(t,T) = e^{-(T-t)R(t,T)} \tag{13.22} $$
$$ \frac{B(0,T)}{B(0,t)} = e^{-R^f(t,T)(T-t)} \tag{13.23} $$
Hence
$$ S(t) = \frac{B(t,U_0) - B(t,U_n)}{\sum_{i=1}^{n} m_i B(t,U_i)} = \frac{\dfrac{B(0,U_0)}{B(0,t)} - \dfrac{B(0,U_n)}{B(0,t)}}{\sum_{i=1}^{n} m_i \dfrac{B(0,U_i)}{B(0,t)}} \tag{13.24} $$
By definition
$$ S(t) = \frac{B(0,U_0) - B(0,U_n)}{\sum_{i=1}^{n} m_i B(0,U_i)} = S(0) \tag{13.25} $$
So S(t) = S(0).
From this first condition we obtain, rewriting our two expressions, i.e. the
forward swap rate dynamics from the approximation and the HJM standpoints respectively:
$$ dS(t) = \sigma(t)\left[\alpha(t) S(t) + (1-\alpha(t)) S(0)\right] dW_t^P = \sigma(t) S(0)\,dW_t^P \tag{13.26} $$
$$ dS(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[ \alpha_i(t) R_i(t) + (1-\alpha_i(t)) R^f_i(t) \right] dW_t^P = \sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\,\sigma_i(t)\, R^f_i(t)\,dW_t^P \tag{13.27} $$
Both the HJM and the approximate formulation must be equivalent. We can
equate them, and solve for our approximation's σ(t):
$$\sigma(t)S(0) = \sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t) \tag{13.28}$$
which we write in a simplified manner as
$$\sigma(t) = \sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t) = \sum_{i=0}^{n} q_i(t)\sigma_i(t) \tag{13.29}$$
where, in order to be able to solve, we must freeze the parameter $R_i$, that is
$$R_i = R^f_i, \qquad i = 1,\dots,n$$
and where
$$q_i(t) = \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)} \tag{13.30}$$
Thus, at this point, we have obtained an expression for our approximate σ(t) that
is a function of parameters that are all known at present. It is important to notice that
we will use the technique of freezing each Ri at its Rfi value in this and all subsequent
approximation alternatives.
We now proceed to calculate an expression for α(t), having decided that the
slope should agree along the forward path as well. We intuitively identify α(t) with
the slope or skew of our HJM model, as has been seen in the HJM Section 8.5.2. Thus,
∀j, we analyse the slope by differentiating:
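$$\begin{aligned}
\frac{\partial}{\partial R_j(t)}\big(dS(t)\big) &= \frac{\partial}{\partial R_j(t)}\left[\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right]\right] \\
&= \sum_{i=0}^{n} \frac{\partial^2 S(t)}{\partial R_i(t)\,\partial R_j(t)}\,\sigma_i(t)\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right] \\
&\quad + \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,\frac{\partial}{\partial R_j(t)}\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right]
\end{aligned} \tag{13.31}$$
where the derivative of $\alpha_i(t)R_i(t)$ with respect to $R_j(t)$ is non-zero only when i = j
(the common factor $dW_t^P$ is omitted). Thus in the last term we eliminate the "i" index
and replace it by a "j":
$$= \sum_{i=0}^{n} \frac{\partial^2 S(t)}{\partial R_i(t)\,\partial R_j(t)}\,\sigma_i(t)\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right] + \frac{\partial S(t)}{\partial R_j(t)}\,\sigma_j(t)\alpha_j(t) \tag{13.32}$$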
Separately, we analyse the same slope in our approximate formulation. We have
$$\frac{\partial}{\partial R_j(t)}\,\sigma(t)\left[\alpha(t)S(t) + (1-\alpha(t))S(0)\right] = \sigma(t)\alpha(t)\frac{\partial S(t)}{\partial R_j(t)} \tag{13.33}$$
Here σ is not differentiated with respect to R, as it is made up of terms at time 0 and
terms in $R^f$:
$$\sigma(t) = \sum_{i=0}^{n} \sigma_i(t)q_i(t) = \sum_{i=0}^{n} \sigma_i(t)\frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)} \tag{13.34}$$
Equating the HJM model's slopes and the approximation formula's slopes along
the forward path, we obtain ∀j
$$\sum_{i=0}^{n} \frac{\partial^2 S(0)}{\partial R_i(t)\,\partial R_j(t)}\,\sigma_i(t)R^f_i(t) + \frac{\partial S(0)}{\partial R_j(t)}\,\sigma_j(t)\alpha_j(t) = \sigma(t)\alpha(t)\frac{\partial S(0)}{\partial R_j(t)} \tag{13.35}$$
Version 1
Ignoring the second order derivatives and thus just taking a first order approach
$$\frac{\partial S(0)}{\partial R_j(t)}\,\sigma_j(t)\alpha_j(t) = \sigma(t)\alpha(t)\frac{\partial S(0)}{\partial R_j(t)} \tag{13.36}$$
Thus
$$\sigma_j(t)\alpha_j(t) = \sigma(t)\alpha(t) \tag{13.37}$$
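Version 2
$$\sum_{i=0}^{n} \frac{\partial^2 S(0)}{\partial R_i(t)\,\partial R_j(t)}\,\sigma_i(t)R^f_i(t) + \frac{\partial S(0)}{\partial R_j(t)}\,\sigma_j(t)\alpha_j(t) = \sigma(t)\alpha(t)\frac{\partial S(0)}{\partial R_j(t)} \tag{13.38}$$
With the second order terms retained, this problem does not normally have a
solution, since there is one equation for each j but a single unknown α(t). We
reformulate it in the least-squares sense, finding α(t) such that:
$$\min_{\alpha(t)} \sum_{j=0}^{n} \left(\sigma_j(t)\alpha_j(t) - \sigma(t)\alpha(t)\right)^2 \tag{13.39}$$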
13.3 Second Method
To equate the approximation and the HJM model, we impose two conditions: that
the lognormal and normal terms should both independently be equal. This means that
as we have
$$dS(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right]dW_t^P \tag{13.40}$$
and
$$dS(t) = \sigma(t)\left[\alpha(t)S(t) + (1-\alpha(t))S(0)\right]dW_t^P \tag{13.41}$$
then equating the lognormal and normal components
$$\sigma(t)\alpha(t)S(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t) \tag{13.42}$$
$$\sigma(t)(1-\alpha(t))S(0) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t) \tag{13.43}$$
These equations should also agree along the forward path at Ri(t) = Rfi(t) and S(t) =
S(0). Solving the above equations, we obtain:
$$\sigma(t) = \sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t) = \sum_{i=0}^{n} q_i(t)\sigma_i(t) \tag{13.44}$$
$$\alpha(t) = \frac{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R_i(t)\,\sigma_i(t)\alpha_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\,\sigma_i(t)} \tag{13.45}$$
Note that the formula for σ(t) is the same as the one derived from the first
method. α(t) is now a weighted composition of all the αi(t).
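The aggregation in (13.44) and (13.45) can be sketched in a few lines. The function name and argument layout below are hypothetical, and every R_i is assumed frozen at its forward value R^f_i, so that numerator and denominator of (13.45) share the same weights.

```python
import numpy as np

def second_method_sigma_alpha(dS_dR, R_fwd, S0, sigma_i, alpha_i):
    """One-factor sigma(t) and alpha(t) from (13.44)-(13.45), with R_i frozen at R^f_i.

    dS_dR   : sensitivities dS(0)/dR_i, i = 0..n
    R_fwd   : forward rates R^f_i(t)
    S0      : forward swap rate S(0)
    sigma_i : HJM volatilities sigma_i(t)
    alpha_i : HJM skews alpha_i(t)
    """
    dS_dR, R_fwd = np.asarray(dS_dR, float), np.asarray(R_fwd, float)
    sigma_i, alpha_i = np.asarray(sigma_i, float), np.asarray(alpha_i, float)
    q = dS_dR * R_fwd / S0                          # weights q_i(t) of (13.30)
    sigma = float(np.sum(q * sigma_i))              # (13.44)
    alpha = float(np.sum(q * sigma_i * alpha_i) / np.sum(q * sigma_i))  # (13.45)
    return sigma, alpha

# Toy inputs, chosen only so that q = [0.5, 0.5]
sigma, alpha = second_method_sigma_alpha(
    dS_dR=[1.0, 1.0], R_fwd=[0.02, 0.02], S0=0.04,
    sigma_i=[0.1, 0.2], alpha_i=[0.2, 0.8],
)
```

With these toy numbers σ(t) is the q-weighted sum of the σ_i, and α(t) is the average of the α_i weighted by q_i σ_i, which is the sense in which α(t) is a weighted composition of the α_i(t).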
Note that another possible approach would have been to equate the terms in α(t)
and those independent of α(t). We immediately encounter a problem if we pursue this
approach, for we obtain
$$\sigma(t)S(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R_i(t) \tag{13.46}$$
$$\sigma(t)\alpha(t)\big(S(t) - S(0)\big) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)\big(R_i(t) - R^f_i(t)\big) \tag{13.47}$$
The first equation yields the same solution for σ(t) as in all the previous cases, but
the second leads to a division by 0 when S(t) = S(0):
$$\alpha(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\left(\frac{\sigma_i(t)\alpha_i(t)}{\sigma(t)}\right)\left(\frac{R_i(t) - R^f_i(t)}{S(t) - S(0)}\right) \tag{13.48}$$
We will see in the results section that the final formulation we retain for our
algorithm is that which is provided by the second method.
13.4 Step 2
Following Piterbarg (and we will give no further details), we have:
$$\alpha = \int_0^T w(t)\alpha(t)\,dt \tag{13.49}$$
where
$$\int_0^T w(t)\,dt = 1 \tag{13.50}$$
The choice of w(t) is crucial. Piterbarg suggests that
$$w(t) = \frac{v^2(t)\sigma^2(t)}{\int_0^T v^2(t)\sigma^2(t)\,dt} \tag{13.51}$$
with
$$v^2(t) = \int_0^t \sigma^2(s)\,ds \tag{13.52}$$
Another choice to test may be, for example:
$$\alpha = \frac{1}{T^2/2}\int_0^T t\,\alpha(t)\,dt \tag{13.53}$$
For σ, we always choose the following:
$$\sigma^2 = \frac{1}{T}\int_0^T \sigma^2(t)\,dt \tag{13.54}$$
13.5 Swaption Valuation
We will now analyse how to adapt our approximate formulation to the valuation
of a simple receiver swaption. We will make our derivation as generic as possible so that
either of the two specific methods derived for the σ(t) and α(t) parameters can equally
be applied. We shall see how we simply convert our approximation dynamics into a
geometric Black Scholes form, which we will then be able to solve in a
straightforward manner.
A receiver swaption can be expressed as:
$$V_t(K) = K\sum_{i=1}^{n} B(t;U_i)m_i - B(t;U_0) + B(t;U_n) = \left(\sum_{i=1}^{n} B(t;U_i)m_i\right)\big(K - S(t)\big) \tag{13.55}$$
where
$$S(t) = \frac{B(t,U_0) - B(t,U_n)}{\sum_{i=1}^{n} m_i B(t,U_i)} \tag{13.56}$$
If we take $N(t) = \sum_{i=1}^{n} m_i B(t,U_i)$ as the numeraire, S(t) will be a martingale
under the associated probability measure. So we have that the price of a receiver swaption is:
$$\text{Swaption}_0 = \sum_{i=1}^{n} m_i B(0;U_i)\,E\left[(K - S(t))^+\right] \tag{13.57}$$
Under its annuity measure, our approximate formula yields
$$dS(t) = \sigma\left[\alpha S(t) + (1-\alpha)S(0)\right]dW_t^P \tag{13.58}$$
Performing the change of variable $X(t) = S(t) + \frac{(1-\alpha)S(0)}{\alpha}$, we have dX(t) = dS(t),
which we can replace to obtain
$$dX(t) = dS(t) = \sigma\left[\alpha S(t) + (1-\alpha)S(0)\right]dW_t^P \tag{13.59}$$
and again replacing S(t) we obtain
$$dX(t) = \sigma\left[\alpha\left(X(t) - \frac{(1-\alpha)S(0)}{\alpha}\right) + (1-\alpha)S(0)\right]dW_t^P \tag{13.60}$$
leaving
$$dX(t) = \sigma\alpha X(t)\,dW_t^P \tag{13.61}$$
We have thus arrived at a simple geometric stochastic differential equation to
which we can directly apply the Black Scholes formula
$$\text{Swaption}_0 = \sum_{i=1}^{n} m_i B(0;U_i)\,E^P\left[(K' - X(t))^+\right] \tag{13.62}$$
with $K' = K + \frac{(1-\alpha)S(0)}{\alpha}$.
Applying the Black Scholes formula, we have:
$$\text{Swaption}_0 = \sum_{i=1}^{n} m_i B(0;U_i)\left[K' N(-d_2) - X(0)N(-d_1)\right] \tag{13.63}$$
where
$$d_1 = \frac{\ln\left(\dfrac{X(0)}{K'}\right) + \dfrac{\sigma^2\alpha^2 T}{2}}{\sigma\alpha\sqrt{T}} \tag{13.64}$$
and
$$d_2 = d_1 - \sigma\alpha\sqrt{T} \tag{13.65}$$
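A minimal sketch of the pricing formula, taking the square-root-of-time scaling in the standard Black Scholes form of d1 and d2. It assumes α > 0 and K' > 0 so that the logarithm is defined (negative skews would need separate treatment), and it takes the annuity at time 0 as a single input. The function names are hypothetical.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def receiver_swaption(annuity0, S0, K, sigma, alpha, T):
    """Receiver swaption under dS = sigma [alpha S + (1 - alpha) S(0)] dW.

    annuity0 : sum_i m_i B(0; U_i)
    Assumes alpha > 0 and K' > 0.
    """
    shift = (1.0 - alpha) * S0 / alpha
    X0 = S0 + shift                  # X(0) = S(0) / alpha
    K_shifted = K + shift            # K'
    vol = sigma * alpha * sqrt(T)    # total lognormal volatility of X
    d1 = (log(X0 / K_shifted) + 0.5 * vol ** 2) / vol
    d2 = d1 - vol
    return annuity0 * (K_shifted * norm_cdf(-d2) - X0 * norm_cdf(-d1))

price_lognormal = receiver_swaption(1.0, 0.04, 0.04, 0.2, 1.0, 1.0)   # alpha = 1
price_skewed = receiver_swaption(1.0, 0.04, 0.04, 0.2, 0.5, 1.0)      # alpha = 0.5
```

For α = 1 the shift vanishes and the function reduces to the plain Black put on S. As α decreases towards 0 the dynamics approach the normal model, but the shift (1−α)S(0)/α grows without bound, so very small α values are numerically delicate in this form.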
13.6 Approximation Conclusion
We have implemented and tested the two methods extensively for calibrations of up
to 25 years with a maximum of 100,000 simulations. One strike calibrations build on
the same form for σ(t), which always works well, provided that the HJM model also
finds a solution. The difference between the two methods therefore lies in the α(t)
implementation. The first approach presents difficulties with certain calibrations.
Despite giving good Jacobians, it requires many MonteCarlo iterations. The second
method proves much more robust. Furthermore, its formulas for σ(t) and α(t) are very
simple. We favour them in particular because both appear to be weighted averages
of the σi(t) and αi(t).
The line of research followed to this point in the development of an approximate
formulation seems to be completely compatible with an extension to the 3 Strikes
model.
13.7 Alternative point of Calculation
We now attempt to calculate our approximate formula at a different point from
the original idea of taking Ri(t) = Rfi(t).
Previously, we had arranged for S(t) = S(0), always 'at the money'. We will now
examine the possibility of imposing this equality at a different point. We had:
$$S(t) = \frac{B(t,U_0) - B(t,U_n)}{\sum_{i=1}^{n} m_i B(t,U_i)} = \frac{e^{-(U_0-t)R_0(t)} - e^{-(U_n-t)R_n(t)}}{\sum_{i=1}^{n} m_i e^{-(U_i-t)R_i(t)}} \tag{13.66}$$
$$S(0) = \frac{B(0,U_0) - B(0,U_n)}{\sum_{i=1}^{n} m_i B(0,U_i)} = \frac{e^{-(U_0-t)R^f_0(t)} - e^{-(U_n-t)R^f_n(t)}}{\sum_{i=1}^{n} m_i e^{-(U_i-t)R^f_i(t)}} \tag{13.67}$$
Imagine that we search for
$$R_i(t)^* = R^f_i(t)$$
$$R_0(t)^* = R^f_0(t) + \varepsilon$$
then
$$S(t) = \frac{\dfrac{B(0,U_0)}{B(0,t)} - \dfrac{B(0,U_n)}{B(0,t)}}{\sum_{i=1}^{n} m_i \dfrac{B(0,U_i)}{B(0,t)}} \tag{13.68}$$
Imagine that we now want S(t) = S*(t).
Then by dividing the previous equations we obtain
$$\frac{S(0)}{S^*(t)} = \frac{e^{-(U_0-t)R^f_0(t)} - e^{-(U_n-t)R^f_n(t)}}{e^{-(U_0-t)(R^f_0(t)+\varepsilon)} - e^{-(U_n-t)R^f_n(t)}} = \frac{B(0,U_0) - B(0,U_n)}{B(0,U_0)e^{-(U_0-t)\varepsilon} - B(0,U_n)} \tag{13.69}$$
$$B(0,U_0)e^{-(U_0-t)\varepsilon} = \frac{S^*(t)}{S(0)}B(0,U_0) + \left(1 - \frac{S^*(t)}{S(0)}\right)B(0,U_n) \tag{13.70}$$
$$\varepsilon = \frac{1}{t-U_0}\ln\left[\frac{S^*(t)}{S(0)} + \left(1 - \frac{S^*(t)}{S(0)}\right)\frac{B(0,U_n)}{B(0,U_0)}\right] \tag{13.71}$$
Note that if we take S(t)* = S(0), this model yields ε = 0, which brings us back to
the model we had initially.
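The shift ε of (13.71) is a one-line computation. The sketch below is illustrative only: the function name and the sample discount factors are assumptions, and it requires t ≠ U_0 and a positive argument of the logarithm.

```python
from math import exp, log

def epsilon_shift(S_target, S0, B0_U0, B0_Un, t, U0):
    """Shift of R_0 that moves the swap rate from S(0) to S_target, as in (13.71).

    Requires t != U0 and a positive logarithm argument.
    """
    ratio = S_target / S0
    return log(ratio + (1.0 - ratio) * B0_Un / B0_U0) / (t - U0)

# Hypothetical inputs: S(0) = 4%, target S*(t) = 4.5%
eps = epsilon_shift(0.045, 0.04, B0_U0=0.97, B0_Un=0.80, t=0.5, U0=1.0)

# Consistency check against (13.70)
lhs = 0.97 * exp(-(1.0 - 0.5) * eps)
rhs = (0.045 / 0.04) * 0.97 + (1.0 - 0.045 / 0.04) * 0.80
```

In this example ε is negative: raising the swap rate above S(0) requires a lower R_0, that is, a higher B(t,U_0).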
We will show in the results section that this approximation method in ε yields
optimal results for calibrations that are performed at the money. This appears to be
quite logical, as the general level of volatility σ is best defined at the money, and so
calibrating it at any other point S*(t) does not seem as appropriate.
13.8 Two Factors
The development of an analytic approximation for the two factor HJM model is
completely analogous to the one factor case. Its HJM formulation is expressed as
$$dS(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\left[\alpha_i(t)R_i(t) + (1-\alpha_i(t))R^f_i(t)\right]\left(\sin\theta_i(t)\,dW_1^P(t) + \cos\theta_i(t)\,dW_2^P(t)\right) \tag{13.72}$$
Notice that the only real difference with respect to the one factor model is the
sine and cosine coefficients, which have been included at the end of the expression
with respect to the two different Brownian motions.
Because our model produces a skew, we can make an assumption for the
dynamics of the forward swap rate (once again, analogous to what was
previously developed), and simply add sine and cosine coefficients with respect to the
two different Brownian motions.
$$dS(t) = \sigma(t)\left[\alpha(t)S(t) + (1-\alpha(t))S(0)\right]\left(\sin\theta(t)\,dW_1^P(t) + \cos\theta(t)\,dW_2^P(t)\right) \tag{13.73}$$
Solving the above equations by freezing Ri(t) = Rfi(t) and S(t) = S(0), we obtain:
$$\sigma(t) = \sqrt{\left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t)\sin\theta_i(t)\right)^2} \tag{13.74}$$
The above can also be attained by following a parallel approach:
Separation of α-dependent and α-independent terms gives
$$\sigma(t)S(t)\sin\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R_i(t)\sin\theta_i(t) \tag{13.75}$$
$$\sigma(t)S(t)\cos\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R_i(t)\cos\theta_i(t) \tag{13.76}$$
$$\sigma(t)\alpha(t)\big(S(t) - S(0)\big)\sin\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)\big(R_i(t) - R^f_i(t)\big)\sin\theta_i(t) \tag{13.77}$$
$$\sigma(t)\alpha(t)\big(S(t) - S(0)\big)\cos\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)\big(R_i(t) - R^f_i(t)\big)\cos\theta_i(t) \tag{13.78}$$
and by applying trigonometry to the first two equations (13.75) and (13.76)
$$\sigma(t) = \sqrt{\left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\frac{R^f_i(t)}{S(0)}\,\sigma_i(t)\sin\theta_i(t)\right)^2} \tag{13.79}$$
Dividing the first two equations we also obtain
$$\tan\theta(t) = \frac{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R_i(t)\sin\theta_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R_i(t)\cos\theta_i(t)} \tag{13.80}$$
Recall however that this methodology presented a difficulty when solving for
alpha in the one factor case. Here we will be faced with the same problem. Squaring
the last two equations (13.77) and (13.78) to eliminate sines and cosines, the expression
obtained can be solved for α(t). However, it involves a division by (S(t) − S(0)),
which is 0 for S(t) = S(0) and thus makes the ratio explode towards
infinity. Further, it cannot be solved in general because S(t) is stochastic:
$$\alpha^2(t) = \frac{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)\big(R_i(t) - R^f_i(t)\big)\sin\theta_i(t)\right)^2}{\sigma^2(t)\big(S(t) - S(0)\big)^2} + \frac{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)\big(R_i(t) - R^f_i(t)\big)\cos\theta_i(t)\right)^2}{\sigma^2(t)\big(S(t) - S(0)\big)^2} \tag{13.81}$$
We seek now to find alpha through a different approach:
If we proceed as in the one factor case, we can firstly equate our two expressions
through their Brownian motions, and secondly, we can equate them further, as we
already did before, via their normality and lognormality:
$$\sigma(t)\alpha(t)S(t)\sin\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t) \tag{13.82}$$
$$\sigma(t)\alpha(t)S(t)\cos\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t) \tag{13.83}$$
$$\sigma(t)(1-\alpha(t))S(0)\sin\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\sin\theta_i(t) \tag{13.84}$$
$$\sigma(t)(1-\alpha(t))S(0)\cos\theta(t) = \sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\cos\theta_i(t) \tag{13.85}$$
The main problem that we encounter at this stage is the fact that we have three
variables, σ(t), α(t) and θ(t), to solve for with four equations.
The addition of a fifth, trigonometric relationship must be approached carefully, as
it involves squares and roots that impose the sign of some of our parameters. We add
$$\sin^2\theta(t) + \cos^2\theta(t) = 1 \tag{13.86}$$
The system is clearly over-determined, and there is no evident reason to prefer one
combination of solutions over another. We proceed to derive a range of
alternatives, which we have subsequently tested.
We could solve for alpha in the first two equations (13.82) and (13.83), obtaining
$$\alpha(t) = \frac{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)}
\quad\text{or}\quad
\alpha(t) = \frac{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)} \tag{13.87}$$
The problem is that we do not know which one of the two to use.
We could attempt some sort of mean:
$$\alpha(t) = \frac{1}{2}\left[\frac{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)} + \frac{\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)}\right] \tag{13.88}$$
But this proves not to work too well.
If instead we relate (13.82) and (13.83) through trigonometry
$$\alpha(t) = \frac{1}{S(t)\sigma(t)}\sqrt{\left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)^2} \tag{13.89}$$
$$\alpha(t) = \frac{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)^2}}{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right)^2}} \tag{13.90}$$
We find that the main problem with this approach is the fact that the root
constrains our solutions of α to be positive, whereas we have seen in the one
factor model that α can be both positive and negative.
We also attempted to use the expression for alpha that was developed in the one
factor case (13.45), that is, an expression that would be theta independent.
$$\alpha(t) = \frac{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R_i(t)\,\sigma_i(t)\alpha_i(t)}{\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\,\sigma_i(t)} \tag{13.91}$$
Surprisingly enough, we found that, although inconsistent with the two factor
formulas developed, this expression for alpha proved extremely effective.
We do realize however, that as in the one factor case, α turns out to be a mean of
all the αi. We could therefore attempt to use other averages which cannot be derived
mathematically from the above equations, such as:
$$\alpha(t) = \frac{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right) + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)}{\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right) + \left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right)} \tag{13.92}$$
We have found that we obtain even better results with an expression of the form
$$\alpha(t) = \frac{1}{\sqrt{2}}\,\frac{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right) + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)}{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right)^2}} \tag{13.93}$$
Indeed, this last expression proves to be one of our favourite candidates for the analytic
approximation. Its main drawback is clearly the fact that it cannot be derived from
the initial equations. This means that an extension of the analytic approximation to the
three strike scenario would rely more on the insight of the quant to come up with an
appropriate mean for alpha than on a logical follow-through of mathematical formulas.
We therefore decide to persist with our search for a more logical expression.
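If we decide instead to take the last two equations (13.84) and (13.85), we would
find:
$$\sigma^2(t)(1-\alpha(t))^2S^2(0) = \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\sin\theta_i(t)\right)^2 \tag{13.94}$$
$$\alpha(t) = 1 - \frac{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\sin\theta_i(t)\right)^2}}{\sigma(t)S(0)} \tag{13.95}$$
$$\begin{aligned}
\alpha(t) &= \frac{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\sin\theta_i(t)\right)^2}}{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\sin\theta_i(t)\right)^2}} \\
&\quad - \frac{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\sin\theta_i(t)\right)^2}}{\sqrt{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)R^f_i(t)\sin\theta_i(t)\right)^2}}
\end{aligned} \tag{13.96}$$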
However, this method restricts our values of α, forcing them to be smaller than
one, which, as we have seen in the one factor case, does not always hold
true.
We finally come across the best solution: one that attempts to encompass all
four of the initial equations, and that is not 'intuitively' guessed and constructed by
the quant as a mean.
Let us start by developing the above expression (13.94) concerning the last two
equations (13.84) and (13.85)
$$\begin{aligned}
\sigma^2(t)(1-\alpha(t))^2S^2(0) &= \sigma^2(t)\big(\alpha(t)^2 - 2\alpha(t) + 1\big)S^2(0) \\
&= \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)(1-\alpha_i(t))R^f_i(t)\sin\theta_i(t)\right)^2 \\
&= E^2 + F^2
\end{aligned} \tag{13.97}$$
Let us recall that a similar approach with the first two equations gave
$$\sigma^2(t)\alpha^2(t)S^2(t) = \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)^2 = C^2 + D^2 \tag{13.98}$$
Therefore
$$-2\alpha(t)\sigma^2(t) = \frac{E^2+F^2}{S^2(0)} - \alpha^2(t)\sigma^2(t) - \sigma^2(t) = \frac{E^2+F^2}{S^2(0)} - \frac{C^2+D^2}{S^2(0)} - \sigma^2(t) \tag{13.99}$$
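Remember that we had taken a different expression for σ
$$\sigma(t) = \frac{1}{S(0)}\sqrt{\left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \frac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right)^2} = \frac{1}{S(0)}\sqrt{A^2 + B^2} \tag{13.100}$$
where A denotes the sine sum and B the cosine sum, leaving
$$\alpha(t) = -\frac{1}{2\sigma^2(t)}\left[\frac{E^2+F^2}{S^2(0)} - \frac{C^2+D^2}{S^2(0)} - \frac{A^2+B^2}{S^2(0)}\right] \tag{13.101}$$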
The above simplifies to
$$\alpha(t) = \frac{CA + DB}{A^2 + B^2} \tag{13.102}$$
Or in its extended full version, where as always, in order to solve we must freeze
Ri to Rfi.
$$\alpha(t) = \frac{\left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\sin\theta_i(t)\right)\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right) + \left(\sum_{i=0}^{n} \dfrac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\alpha_i(t)R_i(t)\cos\theta_i(t)\right)\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right)}{\left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\cos\theta_i(t)\right)^2 + \left(\sum_{i=0}^{n} \dfrac{\partial S(0)}{\partial R_i(t)}\,R^f_i(t)\sigma_i(t)\sin\theta_i(t)\right)^2} \tag{13.103}$$
Of all the alternatives, this last expression proved to be the fastest in
calibrations, was consistent with the mathematical developments, and
always calibrated whenever a solution also existed through MonteCarlo
simulations for the HJM.
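The retained two factor expressions, (13.100) for σ(t) and (13.103) for α(t), can be sketched as follows. The function name and the toy inputs are hypothetical, and every R_i is assumed frozen at R^f_i so that all four sums share the same weights.

```python
import numpy as np

def two_factor_sigma_alpha(dS_dR, R_fwd, S0, sigma_i, alpha_i, theta_i):
    """Two-factor sigma(t) and alpha(t) from (13.100) and (13.103), with R_i = R^f_i."""
    dS_dR, R_fwd = np.asarray(dS_dR, float), np.asarray(R_fwd, float)
    sigma_i, alpha_i = np.asarray(sigma_i, float), np.asarray(alpha_i, float)
    theta_i = np.asarray(theta_i, float)
    w = dS_dR * R_fwd * sigma_i                  # dS(0)/dR_i * R^f_i * sigma_i
    A = np.sum(w * np.sin(theta_i))              # sine sum
    B = np.sum(w * np.cos(theta_i))              # cosine sum
    C = np.sum(w * alpha_i * np.sin(theta_i))    # alpha-weighted sine sum
    D = np.sum(w * alpha_i * np.cos(theta_i))    # alpha-weighted cosine sum
    sigma = float(np.sqrt(A * A + B * B) / S0)   # (13.100)
    alpha = float((C * A + D * B) / (A * A + B * B))   # (13.103)
    return sigma, alpha

# With identical angles the result collapses to the one-factor formulas
sigma2f, alpha2f = two_factor_sigma_alpha(
    dS_dR=[1.0, 1.0], R_fwd=[0.02, 0.02], S0=0.04,
    sigma_i=[0.1, 0.2], alpha_i=[0.2, 0.8], theta_i=[0.7, 0.7],
)
```

Unlike the root-based alternatives, this expression keeps the sign of the α_i, so negative skews are preserved.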
13.9 Use of ‘No Split’
To this point we have always been considering split processes. That is to say, we
have always been considering joint calibrations of two vanilla products at a time. As
examined previously, this was necessary so as to obtain an intersection of the two
solution curves.
A ‘no split’ process is a bulk calibration procedure. We no longer calibrate the

vanilla products by pairs of equal maturity, but instead take the entire range of
maturities and calibrate them together.
This procedure is much more time-consuming computationally, as it is
much more difficult for the algorithm to converge with so many parameters
considered at once.
The analytic approximation, however, is capable of arriving at a rapid solution
even when it deals with so many products at once. More surprisingly, if
we use the analytic approximation's solution as a first guess for a no split
calibration through the HJM MonteCarlo process, we find that the calibration
becomes much faster. In other words, a MonteCarlo no split calibration on its own was
previously extremely slow. With this new procedure, in which the no split MonteCarlo starts
from a no split approximation solution, we achieve results more rapidly than with the
equivalent split approximation followed by a split MonteCarlo.
We have further noticed that there are specific cases in which with the no split
and analytic approximation, we are capable of solving calibrations performed
exclusively on caplets. Identical calibrations using the split method find no solution:
refer to the Analytic Approximation Results Section 14 in the Calibration Set
interpolation matrix section.
14. Analytic Approximation Results
14.1 1 Factor Model
We now proceed to analyse the results obtained from the tests performed on the
analytic approximation. Both of the proposed methods in the previous section work
well on a wide range of tests. However, the first approach for alpha rapidly ceases to
calibrate past the 5 to 10 year maturity mark.
Our great achievement lies in the fact that the second approximation proves to
always be capable of calibrating, whenever the HJM MonteCarlo is also capable of
converging for a given set of data. If the exact solution by MonteCarlo does not exist,
we will find that even so, many times the analytic approximation still provides us
with a result.
Therefore, having decided on the second method as our final expression for alpha
and sigma for our analytic approximation, we continue to use the graphical analysis
tool developed earlier in the project. This will give us good visual confirmation of how
the analytic approximation is performing, and will strengthen our understanding on
how it works.
We proceed to compare the HJM MonteCarlo solutions with the analytic

approximation solutions to confirm their similarity. Thus we will verify that the
analytic approximation solution is a very good first guess for the HJM.
14.1.1 Testing
The analytic approximation was submitted to an exhaustive series of tests. In

these, we attempted to make sure that the analytic approximation responded correctly
to any possible scenario. We therefore tested how it reacted at different maturities and
at different strikes.
We note firstly that the analytic approximation acts more or less as a tangent to
the real HJM MonteCarlo solution curve. Further, the analytic approximation always
shows monotonic behaviour, whereas the MonteCarlo solution clearly does not.
[Figure: "Approximation at High Strikes". Sigma against Alpha [%]. Series: MC K = 0.052; MC K = 0.05; Approximation K = 0.052.]
Fig. 14.1. Analytic approximation at high strikes
The first important thing to notice therefore is that the analytic approximation
acts as a tangent in the region where the final specific solution is achieved, making it
therefore useful for our study as it adapts well to the MonteCarlo simulation.
[Figure: "Analytic Approximation Dynamics". Sigma against Alpha [%]. Series: MC K = 0.03; MC K = 0.06; Analytic Approximation K = 0.03; Analytic Approximation K = 0.06.]
Fig. 14.2. Analytic approximation at distant strikes
The next thing we must state is that the analytic approximation is not always
a perfect tangent. Notice in the above graph how it adapts well to either side of the
HJM's MonteCarlo hump. However, for 'at the money' values, that is, at the
maximum point of the MonteCarlo solution curve in the graph below, the tangent
should be flat. Instead, the analytic approximation acts only roughly as a tangent, and
not as a curve with a unique point of contact. Further, its gradient is not unique, as
would be the case for a real tangent.
[Figure: "Approximation Tangent 'at the money'". Sigma against Alpha [%]. Series: MC ATM K = 4.2%; MC ATM K = 4.3%; Approximation ATM K = 4.2%; Approximation ATM K = 4.3%.]
Fig. 14.3. Analytic approximation acting as a tangent 'at the money'
The need to perform tests for further approximations arises precisely because the
visual similarity between the analytic approximation and the HJM MonteCarlo
curves is not exact (as can be seen in the figure below). For certain strikes that are
very far from 'at the money', the analytic approximation can visually be quite
different.
Note that we proceed to investigate other possibilities simply to see if any further
optimisation can be achieved. But we must state confidently that the approximation at
this level already calibrates extremely rapidly. Despite the difference in slopes,
because the analytic approximation’s gradient is much more pronounced, it actually
converges more rapidly towards the final solution than other better fitted alternatives.
Further, despite the slope difference, the two solutions, (MonteCarlo and analytic
approximation), continue to be extremely close together.
[Figure: "Approximation at Distant Strikes". Sigma against Alpha [%]. Series: MC K = 0.03; MC K = 0.04.]
Fig. 14.4. Analytic approximation presents difficulties in adjusting to the curve at
distant strikes
14.1.2 Use of the Epsilon Approximation.
As stated in the previous chapter, the epsilon approximation allows us to alter the
point of study. However, from the tests performed on both epsilon-adjusted and
non-adjusted formulations, we can conclude the following:
Calibrating at the money is a consistent, robust approach, as it settles a very good
average value for the volatility level.
Calibrating at the precise strike under consideration is very unstable. This is
because, as we have to calibrate two products at different strikes, on calibrating we
must continually jump from one strike to the other. This is a source of
instability.
The logical next approach would be to calibrate at a fixed intermediate point
between the two products' strikes. Having a fixed position reduces instability, and
proves to generate very good results. However, the improvement over calibrating 'at
the money' is not substantial. Furthermore, instability can still arise because the
calibration set can still have different pairs of strikes for different pairs of products.
That is, at different maturities the average strike also fluctuates, introducing instability
and making the calibration process more difficult.
We therefore decide to maintain the calculations performed 'at the money' and so
do not pursue the epsilon approach any further.
14.1.3 Sigma and Alpha adjusting
Further corrections were performed on the alpha and the sigma parameters.
These were hard-coded and simply tested manually, with no logic behind
them other than trial and error and visually checking whether the graphical output
resembled the MonteCarlo.
We found that inserting factors in front of the alpha expression added no further
improvements.
Adjustments in the sigma, on the other hand, could improve the fit.
Indeed, this adjustment factor only needed to be extremely small to produce a
noticeable visual difference in the analytic approximation results. See below how a
constant factor could greatly improve fits for large strikes, but at the same time
worsen the fit at low strikes.
[Figure: "Epsilon Correction at High Strikes". Sigma against Alpha [%]. Series: MC K = 0.06; MC K = 0.03; Approximation K = 0.06; Approximation K = 0.03.]
Fig. 14.5. Analytic approximation corrected in sigma at high strikes
Similarly, modifying this factor we could obtain a good adjustment at low strikes,
but would then lose accuracy at high strikes:
[Figure: "Epsilon Correction at Low Strikes". Sigma against Alpha [%]. Series: MC K = 0.06; MC K = 0.03; Approximation K = 0.06.]
Fig. 14.6. Analytic approximation corrected in sigma for low strikes
A constant factor could not be used, and thus we created a factor that varied with
strikes.
14.1.4 Global adjustment factor
$$\sigma^* = \sigma \cdot \left(0.99 + 0.04 \cdot \frac{FWD - K}{FWD}\right) \tag{14.1}$$
The above is an example of a good adjustment factor for the calibration set that
we were considering. See below how visually, the adjustment is slightly enhanced.
The main thing to notice here is the fact that we now obtain a very good fit at both
high and low strike values.
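As a small illustration of (14.1), the sketch below applies the strike-dependent factor to a given σ. The function name is hypothetical, and the constants 0.99 and 0.04 are the ones quoted above for this particular calibration set, so they should not be read as general values.

```python
def adjusted_sigma(sigma, forward, strike, base=0.99, slope=0.04):
    """Strike-dependent sigma correction of (14.1): sigma * (base + slope * (FWD - K) / FWD)."""
    return sigma * (base + slope * (forward - strike) / forward)

sigma_atm = adjusted_sigma(0.2, 0.04, 0.04)    # at the money: factor 0.99
sigma_high = adjusted_sigma(0.2, 0.04, 0.06)   # high strike: factor 0.97
sigma_low = adjusted_sigma(0.2, 0.04, 0.02)    # low strike: factor 1.01
```

The factor decreases linearly as the strike rises above the forward, which is what allows a single expression to fit both the high and the low strike curves.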
[Figure: "Sigma Factor Varying with Strike". Sigma against Alpha [%]. Series: MC K = 0.06; MC K = 0.03.]
Fig. 14.7. Analytic approximation with a varying sigma correction
Despite the evident visual improvement, we must state the following. The
computational time difference between using the plain analytic approximation and
this improved method is very slight, amounting to a difference of only one or two
analytic approximation iterations. The subsequent HJM iteration counts are generally
equal between the two methods (recall that it is these HJM iterations that account for
the principal time consumption in the process).
There is a further statement that we must clearly point out in case any future
developments are pursued along this line. The improvement factor is specific for a
given maturity. This is, the close to perfect fit that we have achieved with the above
formulation is specific to a caplet fixing 6 years after settlement and expiring three
months after that. Tests performed on products with different maturities turned out to
require slightly different sigma adjustment factors.
If this were to be pursued further, the final factor would therefore necessarily
have to be a function of
F(fixing, maturity, strike, forward)
14.1.5 Use of the qi normalisation approach.
We attempted a final approximation that initially appeared reasonable. Notice
how we have always been altering the sigma factor by minimal amounts: according
to the final formula (14.1), the adjustment would typically
range between 0.97 and 1.01. Furthermore, we realised that the sigma factor was not
an exact weighting. That is, recalling its expression from the previous section:
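$$\sigma(t) = \sum_{i=0}^{n} q_i(t)\,\sigma_i(t), \qquad q_i(t) = \frac{\partial S(0)}{\partial R_i^f(t)}\,\frac{R_i^f(t)}{S(0)} \tag{14.2}$$
$$\alpha(t) = \sum_{i=0}^{n} p_i(t)\,\alpha_i(t), \qquad p_i(t) = \frac{q_i(t)\,\sigma_i(t)}{\sum_{j=0}^{n} q_j(t)\,\sigma_j(t)} \tag{14.3}$$
$$\sum_{i=0}^{n} q_i(t) \approx 1 \tag{14.4}$$
$$\sum_{i=0}^{n} p_i(t) = 1 \tag{14.5}$$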
Now the alpha term is an exact average but we see that the sigma is not. We
therefore decided to normalise all sigma terms using the following expression:
$$\sigma(t) = \frac{\sum_{i=0}^{n} q_i(t)\,\sigma_i(t)}{\sum_{i=0}^{n} q_i(t)} \tag{14.6}$$
Results conclusively signalled that this approach worsened calibrations.
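The difference between the two weightings can be illustrated numerically. The sketch below assumes that the normalisation amounts to dividing the q-weighted sigma by the sum of the q weights; the function name and the sample numbers are hypothetical and are not taken from any calibration in this document.

```python
import numpy as np


def aggregate(q, sigma_i, alpha_i, normalise=False):
    """Aggregate per-rate sigmas and alphas with the q and p weights.

    q        : weights q_i(t), which only sum approximately to 1
    sigma_i  : per-rate volatilities
    alpha_i  : per-rate alphas
    normalise: if True, divide the weighted sigma by sum(q)
               (assumed reading of the normalisation step)
    """
    q = np.asarray(q, dtype=float)
    sigma_i = np.asarray(sigma_i, dtype=float)
    alpha_i = np.asarray(alpha_i, dtype=float)

    weighted = q * sigma_i
    sigma = weighted.sum()
    if normalise:
        sigma = sigma / q.sum()

    p = weighted / weighted.sum()      # p_i sum to 1 by construction
    alpha = float((p * alpha_i).sum())
    return float(sigma), alpha, p


# Hypothetical weights that sum to 1.05 rather than exactly 1.
q = [0.30, 0.40, 0.35]
sig = [0.10, 0.12, 0.14]
alp = [0.5, 0.6, 0.7]
sigma_raw, alpha_raw, p = aggregate(q, sig, alp)
sigma_norm, _, _ = aggregate(q, sig, alp, normalise=True)
```

The alpha aggregation is unaffected by the switch, since the p weights are already an exact average; only the sigma level changes.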
14.2 Analytic Approximation Jacobian
Another important factor to take into account at this stage is the Jacobian, i.e. the
slopes that the analytic approximation generates. Recall that one of the aims of our
analytic approximation was to be able to use its analytical Jacobian instead of having
to recalculate it numerically through the HJM calibration process. This would greatly
reduce computation times. A first, straightforward method to confirm the similarity of
the Jacobians is to compare the slopes graphically.
From a distant perspective we can see below that the behaviour of the two
surfaces can appear to be quite different. Note that the analytic approximation’s
solution is monotonic whereas the HJM has a more peculiar form. However, the
slopes in the region of interest, that is, around the solution
curve, are actually very similar.
[Figure: "HJM MonteCarlo Slopes". Surface of Model - Market Price over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.8. HJM MonteCarlo slopes and solution curve
[Figure: "Analytic Approximation Slopes". Surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.9. Analytic approximation slopes and solution curve
If we were to make a closer inspection of the region of interest where the final
solution occurs, we would find something of the following form:
[Figure: "HJM MonteCarlo Slopes" (close-up). Surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.10. Close-up on HJM MonteCarlo’s slopes and solution curve
[Figure: "Analytic Approximation Slopes" (close-up). Surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.11. Close-up on analytic approximation’s slopes and solution curve
We see that our well known solution curves are still the intersection of the Ω
surface of prices with the horizontal axis. Once again, we distinguish the analytic
approximation as the tangent of the HJM MonteCarlo curve. Yet now, with the three
dimensional view, we can appreciate the differences between the slopes:
there is a slight difference in concavity, and the HJM solution flattens out below the
horizontal axis much sooner than the analytic approximation.
Nevertheless, the similarity is sufficient to enable us to use the analytic
approximation’s Jacobian as the HJM model’s Jacobian in any iteration process within
the specified region.
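The idea of driving an expensive model with the Jacobian of a cheap approximation can be sketched on a toy problem. Everything below is illustrative: `expensive_residual` merely stands in for "HJM MonteCarlo price minus market price", and `surrogate_jacobian` stands in for the analytic approximation's Jacobian. Neither is the implementation used in this project.

```python
import numpy as np

# Toy setting: the cheap model is linear, the expensive one adds a small
# nonlinear term, so the two Jacobians are similar but not identical.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])


def expensive_residual(x):
    """Stand-in for the costly model: 'model price minus market price'."""
    return M @ x + 0.1 * np.tanh(x) - b


def surrogate_jacobian(x):
    """Jacobian of the cheap approximation M @ x - b (constant here)."""
    return M


def solve_with_surrogate_jacobian(x0, tol=1e-10, max_iter=50):
    """Newton-type iteration: residual from the expensive model,
    slopes from the cheap approximation."""
    x = np.asarray(x0, dtype=float)
    for iteration in range(max_iter):
        r = expensive_residual(x)
        if np.linalg.norm(r) < tol:
            return x, iteration
        x = x - np.linalg.solve(surrogate_jacobian(x), r)
    raise RuntimeError("iteration did not converge")


solution, n_iter = solve_with_surrogate_jacobian([0.0, 0.0])
```

Because the slopes are only approximate, convergence is linear rather than quadratic, but each step avoids a numerical Jacobian of the expensive model, which is the saving sought in this section.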
14.3 2 Factor Analytic Approximation
We set out in our analysis to test all of the possible alternatives for alpha. Recall
that the expression for sigma was the same in all cases. Many of the candidates
dropped out straight away because they were unable to advance at all in any given
calibration. The remaining candidates were thus:
$$\alpha(t) = \frac{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,\alpha_i(t)\,R_i(t)\sin\theta_i(t)\right) + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,\alpha_i(t)\,R_i(t)\cos\theta_i(t)\right)}{\left(\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i^f(t)\,\sigma_i(t)\cos\theta_i(t)\right) + \left(\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i^f(t)\,\sigma_i(t)\sin\theta_i(t)\right)} \tag{14.7}$$
$$\alpha(t) = \frac{1}{2}\,\frac{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,\alpha_i(t)\,R_i(t)\sin\theta_i(t)\right) + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,\alpha_i(t)\,R_i(t)\cos\theta_i(t)\right)}{\left(\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i^f(t)\,\sigma_i(t)\cos\theta_i(t)\right)^{2} + \left(\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i^f(t)\,\sigma_i(t)\sin\theta_i(t)\right)^{2}} \tag{14.8}$$
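$$
\begin{aligned}
\alpha(t) ={}& \frac{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,R_i^f(t)\cos\theta_i(t)\right)^{2} + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,R_i^f(t)\sin\theta_i(t)\right)^{2}}{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,[\ldots]\right)^{2} + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,[\ldots]\right)^{2}} \\
&- \frac{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,(1-\alpha_i(t))\,R_i^f(t)\cos\theta_i(t)\right)^{2} + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,\sigma_i(t)\,(1-\alpha_i(t))\,R_i^f(t)\sin\theta_i(t)\right)^{2}}{\left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,[\ldots]\right)^{2} + \left(\sum_{i=0}^{n}\frac{\partial S(t)}{\partial R_i(t)}\,[\ldots]\right)^{2}}
\end{aligned}
\tag{14.9}
$$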
$$\alpha(t) = \frac{\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i(t)\,\sigma_i(t)\,\alpha_i(t)}{\sum_{i=0}^{n}\frac{\partial S(0)}{\partial R_i(t)}\,R_i^f(t)\,\sigma_i(t)} \tag{14.10}$$
These four alternatives were all capable of calibrating, with relative ease, any small
set with maturities of up to 15 years. To decide which of the candidates we
would finally select, we were thus forced to submit them to more extreme calibrations.
When testing on 18 year calibrations with 34 Swaptions and 15 correlations, the
third and fourth of the above formulas, (14.9) and (14.10), already proved to be much
faster than the rest. The third was capable of calibrating without splitting in 4
iterations. The first two alternatives needed up to 17 iterations,
and were therefore much more time consuming. The fourth expression was
also capable of calibrating, although in 5 iterations.
We made a critical final test that proved extremely difficult to calibrate. It
involved 66 Swaptions and 15 correlations. In this final test, only the third
and fourth expressions were capable of performing the calibration.
As we found no further difference between the two, we decided to select the third
expression for two principal reasons:
• It performed faster by two to three iterations in all tests performed.
• It was mathematically consistent with the formulae derived for the analytic
approximation, and so did not rely on a quant’s intuitive mean weighting
approach.
14.4 Final Considerations on the Analytic approximation
So far we have seen graphical and numerical comparisons between the HJM and
our analytic approximation and confirmed their similarities. The critical analysis that
remains is therefore to determine whether we really do achieve the principal goal of
our project, that is, whether we really do significantly reduce calibration times.
1 Factor:
Calibration Set: 56 Swaptions
Method                                                   Calibration time
HJM MonteCarlo                                           94s
Analytic approximation + HJM MonteCarlo without Split    56s
Analytic approximation + HJM MonteCarlo with Split       17s
Table 14.1. Approximation increases calibration speed by a factor of 5
2 Factors:
Calibration Set: 66 Swaptions, 15 Correlations
Method                                                   Calibration time
HJM MonteCarlo                                           9min 53s
Analytic approximation + HJM MonteCarlo                  1min 12s
Table 14.2. Approximation increases calibration speed by a factor of 8
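The speed-up factors quoted in the two table captions follow directly from the timings. The short sketch below only redoes that arithmetic; the helper name is hypothetical.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of the plain calibration time to the accelerated one."""
    return baseline_seconds / accelerated_seconds


# One factor, 56 Swaptions: 94s against 56s (no split) and 17s (split).
one_factor_no_split = speedup(94, 56)
one_factor_split = speedup(94, 17)

# Two factors, 66 Swaptions and 15 correlations: 9min 53s against 1min 12s.
two_factor = speedup(9 * 60 + 53, 1 * 60 + 12)
```

The split one factor case and the two factor case give ratios a little above 5 and 8 respectively, in line with the captions.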
14.5 Conclusions and Further Developments
The analytic approximation appears to work extremely well. It reduces
calibration times by a factor of 3 to 10. These results are truly extraordinary. We
would like to note that the analytic approximation has already been successfully
implemented within Banco Santander, and is being used on a daily basis by the
interest rate traders. Future developments must centre on an analytic approximation
for the 3 strike model. Initially, this would appear to involve a relatively intuitive
extrapolation of the two strike analytic approximation methods presented above.
14.6 Analytic approximation Peculiarities
Recall that there existed three cases in which the HJM MonteCarlo failed to
produce results. We find in contrast that the analytic approximation, because of its
characteristics and in particular because it is monotonic, manages to surmount
these difficulties.
The first problem stated in this document was the duplicity of solutions
encountered. We found that when these situations arose during calibrations, our
analytic approximation always selected the correct solution. That is, it always selected
the solution with an alpha closest to the [0,1] interval. Recall that alpha was a
weighting parameter that allowed us to choose between a lognormal (α = 1) and a
normal model (α = 0). It therefore seems unreasonable that we should select a value of
6 for alpha. This would imply something of the form: “we are six times a lognormal
model”.
[Figure: "Solution Duplicity". Sigma plotted against Alpha [%]; series MC K = 2.5%, MC K = 6.5%, Approximation K = 2.5%, Approximation K = 6.5%; annotations: "MC and analytic approximation solution" and "second MC solution".]
Fig. 14.12. HJM MonteCarlo versus analytic approximation solving solution duplicities
Therefore, by using the analytic approximation as a first guess, we condition the
HJM MonteCarlo to start very close to the desired solution. In this way we avoid the
possibility that it erroneously converges to the alternative solution.
Another of the problems which we had encountered was the case in which the
HJM MonteCarlo solution curves were encompassed one inside the other. This
inevitably led to a non-existent intersection, meaning that there was no valid
solution, i.e. no valid pair of model parameters that could simultaneously satisfy both
conditions imposed by the two vanilla products.
[Figure: "HJM MonteCarlo Solution Curves". Sigma plotted against Alpha [%]; series MC ATM K = 4.5% and MC K = 3%.]
Fig. 14.13. HJM MonteCarlo presents no solution curve intersection
Because the analytic approximation is monotonic, we do find an intersection of
its solutions:
[Figure: "Solution Curves". Sigma plotted against Alpha [%]; series MC ATM K = 4.5%, MC K = 3%, Approximation ATM K = 4.5%, Approximation K = 3%; annotation: "analytic approximation solution".]
Fig. 14.14. Analytic approximation solving a case with no HJM MonteCarlo solution intersection
We are as yet unclear about whether the solutions provided by the approximate
model should be considered valid, in which case we should revise our HJM model, or whether they
too are incorrect. Whichever of the two we finally decide upon, the HJM model clearly
still needs further corrections, as it should be able to calibrate given a reasonable set of
input data. The fact that it is incapable of doing so is a problem we must examine further.
Recall now the final problem encountered with HJM calibrations. There were
situations in which the HJM MonteCarlo’s price surface Ω was incapable of descending
below the horizontal axis. This lack of flexibility implied that we were never capable
of equating the model and market prices. We can analyse this situation in more depth
at this point in our study:
[Figure: price surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.15. HJM MonteCarlo first vanilla presenting a solution curve
[Figure: price surface over Sigma and Alpha [%].]
Fig. 14.16. HJM MonteCarlo second vanilla does not descend sufficiently
Note that we continue calibrating in pairs. Although the price surface of one of the
products does descend sufficiently, the surface of the other product
remains asymptotic to the horizontal axis. No solution curve is achieved.
When we carry this scenario over to an analytic approximation calibration,
we find the following:
[Figure: "Analytic Approximation - Vanilla 1". Surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.17. Analytic approximation presents a solution for the first vanilla
[Figure: "Analytic Approximation - Vanilla 2". Surface over Sigma and Alpha [%], with the solution curve marked.]
Fig. 14.18. Analytic approximation also presents a solution for the second troublesome vanilla
Surprisingly we find a solution with both vanilla products. On a two dimensional
plane, this is equivalent to:
[Figure: "Solution Curves". Sigma plotted against Alpha [%]; series MC ATM K = 4%, Approximation ATM K = 4%, Approximation K = 1%; annotation: "analytic approximation solution".]
Fig. 14.19. HJM MonteCarlo versus analytic approximation for a two dimensional view of the previous cases
Here we have included the single HJM MonteCarlo vanilla product that we
managed to calibrate.
15. Calibration Set Interpolation Matrix
In this section we search for a possible solution to two main problems
encountered during our calibration tests:
• The failure to calibrate when our calibration set was solely composed of caplets.
• The failure to calibrate when using a joint set of caplets and swaptions.
15.1 Initial Data
We have discovered (as presented in the results of this section, 15.4) that the
specific interpolation process chosen for our matrix can have a powerful
impact on the convergence of the calibration itself. Further, the extrapolation of our
calibration set can even transform our solution surface Ω, which we had by now grown
so accustomed to. This peculiarity has pushed us to seek the best possible solution.
15.2 Former approach analysis
Initially, we were performing a horizontal linear interpolation within the triangle
defined by the Target Parameters, and a linear extrapolation outside it. Recall what was
discussed in Section 10.2.
15.2.1 1 Factor: (only Caplets)
In the 1 factor scenario the data is not interpolated but instead extrapolated
constantly from the Target Parameters.
[Figure: calibration matrix with columns U0 ... Ui ... UN and rows T0 ... Ti ... TN, with regions A and B marked.]
Fig. 15.1. Strike Interpolation
From the tests performed, we have concluded that the data above the diagonal is
the most influential to our calibration process. This calibration works best when the
interpolation is performed vertically in this region A, irrespective of whether B is
vertical or horizontal. We proceed to the two factor scenario taking both A and B
vertically.
15.2.2 Tests performed
The one factor calibration was tested successfully up to maturities of 20 years,
with three month caplets and with a frequency of 3 months. Other less drastic
scenarios with fewer fixings and 6 month or 1 year caplets were also calibrated
successfully. Thus, the one factor model with vertical extrapolation is now
capable of successfully calibrating caplets.
15.3 2 Strikes
In the 2 strikes scenario, the data was formerly being interpolated horizontally
within the Target Parameter triangle and extrapolated horizontally outside it.
Following on with the improvement in caplet calibration achieved through the
vertical extrapolation, and due also to the fact that the current approach was proving
incapable of calibrating caplets, we proceeded to implement: a horizontal
interpolation within the Target Parameter triangle, and a vertical extrapolation
outside this.
[Figure: calibration matrix with columns U0 ... Ui ... UN and rows T0 ... Ti ... TN, with the regions labelled "interpolate" and "extrapolate".]
Fig. 15.2. Strike Interpolation
T \ U    0        1.36438  2.37808  3.37808  4.37808  5.36986  6.36712  7.36712  8.38082  9.38082  10.1315  10.1753
1.11233  3.82583  3.82583  3.66615  3.50862  3.3511   3.19487  3.03778  2.88026  2.72058  2.56306  2.44481  2.44481
2.11507  3.82583  3.82583  2.34725  2.46755  2.58785  2.70715  2.82712  2.94742  3.06937  3.18966  3.27997  3.27997
3.11507  3.82583  3.82583  2.34725  2.19095  2.21627  2.24137  2.26662  2.29194  2.3176   2.34291  2.36192  2.36192
4.12055  3.82583  3.82583  2.34725  2.19095  1.00457  1.0092   1.01385  1.01851  1.02324  1.02791  1.03141  1.03141
5.11781  3.82583  3.82583  2.34725  2.19095  1.00457  1.08194  1.14651  1.21126  1.2769   1.34165  1.39025  1.39025
6.11781  3.82583  3.82583  2.34725  2.19095  1.00457  0.982652 0.982652 0.934131 0.884945 0.836424 0.81252  0.80022
Table 15.1 Horizontal interpolation, vertical extrapolation
We obtain slightly different results compared to the formerly implemented
interpolation method.
This occurs because the extrapolated data has a slight influence on the
interpolated data, as each entire row of the matrix is used to compute the subsequent
row. A variation in any of the components in the row therefore has an effect on the
following one.
The table below summarises whether or not we were capable of calibrating in each
case. The improvement obtained through vertical extrapolation is extremely clear.
Vanilla Product     Strike  Split     Proxy / MC   Calibrates? (Vertical)  Calibrates? (Horizontal)
Caplet              1K      Split     Proxy + MC   no                      no
                                      MC           yes                     no
                            No Split  Proxy + MC   yes                     no
                                      MC           yes, very slow          no
                    2K      No Split  Proxy + MC   yes                     no
                                      MC           yes, very slow          no
                            Split     Proxy + MC   no                      no
                                      MC           no                      no
Caplet + Swaption   1K                             yes                     yes
                    2K                MC           fails                   fails
Table 15.2 Summary table of the differences between vertical, horizontal extrapolation, split and no split
However, we clearly see two main difficulties in the above. Firstly, the joint
calibration of caplets and swaptions remains an unresolved problem.
Secondly, the new interpolation method solves many of the problems
that were previously encountered in any caplet calibration. However, we realise that
it is only effective when it operates as a ‘no split’ process, that is, taking the entire set
of vanilla products and calibrating them together at once. This requires an extremely
long and tedious computation process, which is greatly accelerated through the use of
the analytic approximation developed. However, it poses important difficulties if we
are to extend the approach to the three strike model, as there is as yet no analytic
approximation there to speed up the calculations. Calibrations with three strikes and ‘no
split’ will be extremely slow.
15.4 Graphical representation
We observe further consequences when changing the form of interpolation used.
Recall the graphical surface Ω created. We now obtain a different form for that same
surface, one that tips upwards again at the left end of the graph below, where before it
was almost horizontally asymptotic.
[Figure: "Vertical Extrapolation". Price surface over Alpha [%] and Sigma.]
Fig. 15.3. Vertical extrapolation no longer flat
This deformation becomes increasingly drastic. Note that now, the intersection
solution with the horizontal axis has actually evolved towards a circular form.
[Figure: "Horizontal Extrapolation". Surface of Model-Market price over Sigma and Alpha [%].]
Fig. 15.4. Surface Deformation in Horizontal Extrapolation
We note that this is exclusively a caplet characteristic. When we use the same
vertical interpolation approach to calibrate swaptions, the well known surface Ω
appears once again.
[Figure: "Swaptions Vertical Extrapolation". Price surface over Sigma and Alpha [%].]
Fig. 15.5. Swaption Vertical Extrapolation stays the same
The transformation of the model price space brings with it a first important
consequence: the solution curve is no longer a curve with a ‘hump’ but
has now evolved towards a circular form. This implies that on calibrating two
caplets, the intersection of these circles will always generate a duplicity of solutions,
except for the particular case in which one circle is perfectly tangent to the other.
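The geometric argument can be checked with a small sketch. It idealises the two solution curves as exact circles in the (alpha, sigma) plane, which is an assumption made purely for illustration; the function name and the sample centres and radii are hypothetical.

```python
import math


def circle_intersections(c0, r0, c1, r1, tol=1e-12):
    """Return the intersection points of two circles (0, 1 or 2 points)."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    # No intersection: concentric, too far apart, or one inside the other.
    if d == 0 or d > r0 + r1 + tol or d < abs(r0 - r1) - tol:
        return []
    # Distance from the first centre to the chord joining the intersections.
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))
    xm = x0 + a * (x1 - x0) / d
    ym = y0 + a * (y1 - y0) / d
    if h < tol:                      # tangent circles: a single solution
        return [(xm, ym)]
    rx = -(y1 - y0) * h / d
    ry = (x1 - x0) * h / d
    return [(xm + rx, ym + ry), (xm - rx, ym - ry)]


two_solutions = circle_intersections((0.0, 0.0), 1.0, (1.0, 0.0), 1.0)
tangent_case = circle_intersections((0.0, 0.0), 1.0, (2.0, 0.0), 1.0)
no_solution = circle_intersections((0.0, 0.0), 1.0, (3.0, 0.0), 1.0)
```

Overlapping circles give two candidate parameter pairs and tangent circles give one. The sketch also returns an empty list when the circles do not meet at all, a case in which no pair of parameters would satisfy both products.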
[Figure: "Vertical Extrapolation". Sigma plotted against Alpha [%]; series K = 4.15% and K = 3%.]
Fig. 15.6. New Circular Solution Intersection
Thus the method introduces a new duplicity of solutions, this time in sigma,
whereas previously it existed only in alpha. The same question however still
remains: which is the correct solution?
The change in the model price surface has two further implications.
Firstly, the convergence of a Newton Raphson algorithm on this type of surface is
always much more direct, as all lines of greatest slope head directly towards the
minimum value, i.e. the same bottom pit of the convexity.
Secondly, we would like to state something important: the HJM ends its
calibration process when the error between its model value and the market value is
below a certain level. We realise that perhaps this level should be decreased
somewhat further. The reason is that within the range of currently admitted
model errors, there is sufficient margin for a noticeable variation in the alpha
parameter.
See below a comparison between the two valid solutions, one of which has been
obtained through the new caplet surface and the other through the
traditional horizontal extrapolation. The alpha parameter in particular is substantially
different (30% difference), despite the fact that the HJM accepts both solutions (see the
relative error).
Theoretically, the parameter generating a smaller error is more accurate. This
does not mean that the horizontal extrapolation is better. It simply implies that
perhaps one further iteration in the vertical extrapolation method should be
considered before submitting a final parameter value.
15.4.1 Vertical Extrapolation
ITERATION 0:
     Type    T        U        Value
1    SIGMA   145.479  171.233  0.116056
2    ALPHA   145.479  171.233  -0.27304

     MarketPrice  ModelPrice  Relative error
1    579.465      579.511     0.045526   OK
2    0.205302     0.206877    0.157482   OK
Table 15.3 Results obtained through vertical extrapolation
15.4.2 Horizontal Extrapolation
ITERATION 0:
     Type    T        U        Value
1    SIGMA   119.726  145.753  0.115939
2    ALPHA   119.726  145.753  -0.34993

     MarketPrice  ModelPrice  Relative error
1    495.114      495.118     0.004044   OK
2    0.10758      0.107367    0.021235   OK
Table 15.4 Results obtained through horizontal extrapolation
16. Interest Rate Volatilities: Stripping Caplet Volatilities from cap quotes
16.1 Introduction.
This section describes how to create a set of caplet volatilities derived from
the market quoted cap volatilities. We will analyse two principal
approaches. The first involves an interpolation between cap prices to then extract the
caplet volatilities. The second method involves a direct interpolation amongst the
caplets themselves.
The input data which we require are the interest rate curve, the cap volatility
matrix, and future option prices. With these, we must be able to compute the caplet
forward volatilities for any given tenor (according to the market conventions, 3M for
the US dollar, 6M in the Eurozone, etc.) and any given strike. The study is of
particular interest because the caplet volatilities are critical in the calibration of exotic
products. We present below the market quoted caps that are typically used as inputs
when deriving the corresponding caplets.
Fig. 16.1. Market Cap Quotes
We generally proceed as follows: taking the market cap volatilities,
we first interpolate along the strike direction so as to create all the strike values that
are necessary for our calibration. After this, for each strike we interpolate along
the maturity direction (interpolating either in cap volatilities or caplet volatilities
depending on the approach), thus creating values for the already mentioned 6 month
regular intervals.
In general, we can use linear cap or constant caplet volatility interpolation as a
first approach before continuing on to the more complex functional form
optimisations.
A further step is to fit the sets of volatilities that have been calculated to a given
type of smile interpolator. This will require that we carry out the strike interpolation
at the end of the process rather than at the beginning so as to achieve an optimal smile
behaviour.
There are two principal methods of interpolation which we must first
distinguish:
1. Functional Form:
We use a simple deterministic function to interpolate between points. We can
further distinguish two variations here:
Global interpolation:
These methods rely on constructing a single equation that fits all the data points.
The equation is usually a high degree polynomial that results in a smooth
curve. However, such methods are usually not well suited for engineering applications, as they
are prone to severe oscillations and overshoots, which we specifically attempt to avoid
here.
Piecewise interpolation:
These methods rely on constructing a polynomial of low degree between each
pair of known data points. If a first degree polynomial is used, it is called linear
interpolation. Second and third degree polynomials are called quadratic and cubic
splines respectively. The higher the degree of the spline, the smoother the resulting
curve. Splines of degree m will have continuous derivatives up to a degree of m-1 at
the data points.
2. Stochastic Form:
This is the second possible form of interpolating. It implies using a stochastic
model such as CEV or SABR. We will enter into the details of the SABR approach later.
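The contrast between global and piecewise functional forms can be seen on a standard textbook test function. The sketch below is not based on market volatilities: the function `runge`, the choice of 11 equally spaced nodes and the evaluation grid are all assumptions made for illustration.

```python
import numpy as np


def runge(x):
    """Classic smooth test function that exposes polynomial oscillations."""
    return 1.0 / (1.0 + 25.0 * x ** 2)


nodes = np.linspace(-1.0, 1.0, 11)      # 11 equally spaced data points
values = runge(nodes)
grid = np.linspace(-1.0, 1.0, 2001)     # fine grid used to measure errors

# Global interpolation: one degree-10 polynomial through all 11 points.
coeffs = np.polynomial.polynomial.polyfit(nodes, values, 10)
global_fit = np.polynomial.polynomial.polyval(grid, coeffs)

# Piecewise interpolation: straight lines between neighbouring points.
piecewise_fit = np.interp(grid, nodes, values)

global_error = np.max(np.abs(global_fit - runge(grid)))
piecewise_error = np.max(np.abs(piecewise_fit - runge(grid)))
```

Both interpolants pass through the same data, yet the single high degree polynomial swings far away from the underlying function near the ends of the interval, whereas the piecewise linear fit never leaves the range spanned by neighbouring data points.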
16.1.1 Strike Interpolation:
The first, ‘non smile’ interpolation between strikes is performed as simply as
possible. It is a direct linear interpolation, and is the first step in the previously
described process:
$$V = \frac{V_i\,(K_{i+1} - K) + V_{i+1}\,(K - K_i)}{K_{i+1} - K_i}$$
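The strike interpolation formula above translates directly into code. The following is a minimal sketch; the function name, the argument layout (sorted strikes with one volatility each) and the sample quotes are hypothetical.

```python
from bisect import bisect_right


def interpolate_vol(strikes, vols, k):
    """Linearly interpolate a volatility at strike k.

    strikes must be sorted in increasing order and k must lie inside
    the quoted range; no extrapolation is attempted in this sketch.
    """
    if not strikes[0] <= k <= strikes[-1]:
        raise ValueError("strike outside the quoted range")
    # Index i such that strikes[i] <= k <= strikes[i + 1].
    i = min(bisect_right(strikes, k) - 1, len(strikes) - 2)
    k_lo, k_hi = strikes[i], strikes[i + 1]
    return (vols[i] * (k_hi - k) + vols[i + 1] * (k - k_lo)) / (k_hi - k_lo)


# Hypothetical quotes: three strikes and their cap volatilities.
quoted_strikes = [0.02, 0.03, 0.04]
quoted_vols = [0.20, 0.18, 0.17]
mid_vol = interpolate_vol(quoted_strikes, quoted_vols, 0.025)
```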
16.2 Stripping Caplet Volatility Methods
It may be useful at this point to refer back to the financial product description of a
cap (Section 6.7). Recall that a cap is constructed as a sum of caplets.
The price of a cap on the 6 month EURIBOR starting at T and maturing at U is
given by the market by means of a unique flat caplet volatility. This means that the
price of the cap must be computed as the sum of the prices of all 6 month caplets
between T and U, whose volatility must be set as the unique flat caplet volatility
specified by the market.
However, these flat caplets are simply a construction mechanism to obtain the
market’s cap value easily. This does not mean that the true caplets in the
specified time period should all have flat volatilities. Instead, it simply imposes that
the sum of all the caplets in the interval should yield the same price as the sum of all
the flat caplets in that interval.
Thus there is a great degree of freedom when we try to analyse the true market of
caplets. We have the liberty of imposing any price we wish on the given group, so
long as their sum equals the market cap. For calibration purposes, we will seek to
construct these caplets so that their volatilities combine to produce a monotonic,
smooth curve.
16.3 Previous Santander Approach for 6 month caplets
The main problem in any Caplet Stripping method lies in the fact that there is not
enough information to be able to extract a unique caplet volatility. Ideally, if we had
two adjacent market cap quotes that were 6 months apart, then we could construct
$$\mathrm{cap}(t,T,U_2) - \mathrm{cap}(t,T,U_1) = \mathrm{forwardcap}(t,U_1,U_2) = \mathrm{caplet}(t,U_1,U_1+\delta) \tag{16.1}$$
where δ = 6 months.
16.3.1 TYPE I
[Figure: timeline t, T, U1, U2 showing Cap(t,T,U1), Cap(t,T,U2) and CapForward(t,U1,U2) = Caplet(t,U1,U2).]
Fig. 16.2. Cap decomposition into other caps and capforwards
In this way, the resulting forward cap would be exactly equal to the caplet, and so
would also have the same price. Applying the Black Scholes formula, we would be
able to extract the caplet’s volatility from its known price.
However, this is not generally the case. More commonly, two adjacent cap quotes
are separated in maturities by more than six months. In addition, the separation
between cap quotes for LIBOR markets ranges between 12 months and 5 years. In
these cases, the difference between the cap prices (i.e. the forward cap) is equal to the
sum of two or more different caplets.
$$\mathrm{cap}(t,T,U_2) - \mathrm{cap}(t,T,U_1) = \mathrm{forwardcap}(t,U_1,U_2) = \sum_{i=1}^{n} \mathrm{caplet}_i\bigl(t,\,U_1+(i-1)\cdot\delta,\,U_1+i\cdot\delta\bigr) \tag{16.2}$$
There is no additional equation to determine the price to be attributed to any
individual caplet, only to their sum. Therefore, a hypothesis must be made at this
stage so as to decide on how these prices should be distributed among the caplets.
16.3.2 TYPE II
[Figure: timeline t, T, U1, U1+δ, U2 showing Cap(t,T,U1), Cap(t,T,U2) and CapForward(t,U1,U2) split into Caplet(t,U1,U1+δ) and Caplet(t,U1+δ,U2).]
Fig. 16.3. Capforward decomposition into two unknown caplets
We see clearly in the above that there is no additional information available to
choose a specific price for any of the caplets that combine to form the capforward.
16.4 Linear Cap Interpolation
In the Banco Santander Model, to ease calculations, a first approach was simply to
linearly interpolate the market cap volatilities so as to always fall within the first case
situation. That is, given the example below, where there would be an excess of caplets
to determine, we linearly create the necessary caps that will enable us to return to a
simple situation as in TYPE I.
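[Figure: timeline T, U1, U1+δ, U1+2δ, U1+3δ, U2 showing the caps with volatilities Capσ1 and Capσ2 and the caplets with volatilities Capletσ1 ... Capletσi between U1 and U2.]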
Fig. 16.4. Each cap is made up of a number of caplets of unknown volatility
We construct each cap by linearly interpolating the volatility
$$\sigma_{cap}(t,T,U_1+i\cdot\delta) = \sigma_{cap}(t,T,U_1) + \left(\frac{\sigma_{cap}(t,T,U_2) - \sigma_{cap}(t,T,U_1)}{U_2 - U_1}\right)(U_1 + i\cdot\delta - U_1) \tag{16.3}$$
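The interpolation in (16.3) can be sketched as a helper that builds the intermediate flat cap volatilities on a regular grid between two quoted maturities. The function name, the use of year fractions for maturities and the sample quotes are hypothetical.

```python
def interpolated_cap_vols(u1, u2, vol1, vol2, delta=0.5):
    """Flat cap volatilities at u1, u1 + delta, ..., u2 following (16.3).

    u1, u2     : quoted cap maturities in years, u2 > u1
    vol1, vol2 : flat cap volatilities quoted at u1 and u2
    delta      : caplet tenor in years (0.5 for 6 month caplets)
    """
    n = round((u2 - u1) / delta)
    slope = (vol2 - vol1) / (u2 - u1)
    return [(u1 + i * delta, vol1 + slope * (i * delta)) for i in range(n + 1)]


# Hypothetical quotes: 20% at 2 years and 18% at 3 years.
grid = interpolated_cap_vols(2.0, 3.0, 0.20, 0.18)
```

Each consecutive pair of maturities in the returned list is then one caplet apart, which is the situation in which a single caplet can be backed out from a difference of two caps.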
Thus we only have
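[Figure: timeline T, U1, U1+δ showing the cap with volatility Capσ1 to U1, the interpolated cap with volatility Capσ2 to U1+δ and the single caplet Capletσ1 between them.]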
Fig. 16.5. 2 Cap Interpolation
Then we can easily solve each caplet as was stated in the TYPE I approach:
$$\mathrm{cap}(t,T,U_1+i\cdot\delta) - \mathrm{cap}(t,T,U_1+(i-1)\cdot\delta) = \mathrm{caplet}(t,\,U_1+(i-1)\cdot\delta,\,U_1+i\cdot\delta) \tag{16.4}$$
Let us specify the above calculations for the case in which i = 1, and $\delta = m_{6M}$.
We would then have
$$\mathrm{cap}(t,T,U_1+\delta) - \mathrm{cap}(t,T,U_1) = \mathrm{caplet}(t,U_1,U_1+\delta) \tag{16.5}$$
where the only unknown is the caplet volatility $\sigma_{caplet}(t,U_1,U_1+\delta)$ that we seek
to calculate. We will now specify how each of the above terms is obtained.
The second term in (16.5)
$$\mathrm{cap}(t,T,U_1) = m_{t,T}\,B(t;U_1)\left[L(t,T,U_1)\,N(d_1^{T_i}) - K\,N(d_2^{T_i})\right] \tag{16.6}$$
is the market cap, and is a sum of caplets with the cap’s flat market quoted
volatility. As we only have i = 1, the cap is directly equal to the caplet, with
$$d_{1,2}^{T} = \frac{\ln\left(\dfrac{L(t,T,U_1)}{K}\right) \pm \dfrac{\sigma^2_{cap}(t,T,U_1)\cdot\delta_{t,T}}{2}}{\sigma_{cap}(t,T,U_1)\cdot\sqrt{\delta_{t,T}}} \tag{16.7}$$
The first term in the equation (16.5) has had its volatility interpolated so that the
time-space between the newly interpolated cap and the previous market cap is exactly
equal to the single caplet we seek. This interpolation has been done as:
$$\sigma_{cap}(t,T,U_1+\delta) = \sigma_{cap}(t,T,U_1) + \left(\frac{\sigma_{cap}(t,T,U_2) - \sigma_{cap}(t,T,U_1)}{U_2 - U_1}\right)\cdot\delta \tag{16.8}$$
Thus we have the expression for the cap as
$$\mathrm{cap}(t,T,U_1+\delta) = \sum_{i=0}^{n-1} m\,B(t;U_1+\delta\cdot i)\left[L(t,T,U_1+\delta\cdot i)\,N(d_1^{T_i}) - K\,N(d_2^{T_i})\right] \tag{16.9}$$
$$d_{1,2}^{U_1} = \frac{\ln\left(\dfrac{L(t,U_{i'},U_{i'}+\delta)}{K}\right) \pm \dfrac{\sigma^2_{cap}(t,T,U_1+\delta)\cdot\delta_{6M}}{2}}{\sigma_{cap}(t,T,U_1+\delta)\cdot\sqrt{\delta_{6M}}} \tag{16.10}$$
The interpolated cap is therefore the sum of the previous cap and an additional
caplet with the interpolated flat volatility $\sigma_{cap}(t,T,U_1+\delta)$.
Finally, the last term in the equation (16.5) verifies
$$\mathrm{caplet}(t,U_1,U_1+\delta) = m\,B(t;U_1+\delta)\left[L(t,U_1,U_1+\delta)\,N(d_1^{T_i}) - K\,N(d_2^{T_i})\right] \tag{16.11}$$
$$d_{1,2}^{T} = \frac{\ln\left(\dfrac{L(t,U_1,U_1+\delta)}{K}\right) \pm \dfrac{\sigma^2_{caplet}(t,U_1,U_1+\delta)\cdot\delta_{6M}}{2}}{\sigma_{caplet}(t,U_1,U_1+\delta)\cdot\sqrt{\delta_{6M}}} \tag{16.12}$$
The only element we do not know from all the above equations is
$\sigma^2_{caplet}(t,U_1,U_1+\delta)$. We will need to solve for the Black volatility of this caplet, typically via
Newton Raphson iterations.
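The last step, backing a Black volatility out of a caplet price by Newton Raphson, can be sketched as follows. This is a generic illustration under stated assumptions and not the Banco Santander implementation: the function names are hypothetical, the variance is taken as vol squared times a single `expiry` argument, and the cap prices in the example are made up so that the answer is known in advance.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_caplet(forward, strike, vol, expiry, accrual, discount):
    """Black price of a caplet: accrual * discount * [F N(d1) - K N(d2)]."""
    std = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * std * std) / std
    d2 = d1 - std
    return accrual * discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


def black_vega(forward, strike, vol, expiry, accrual, discount):
    """Derivative of the Black caplet price with respect to vol."""
    std = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * std * std) / std
    return accrual * discount * forward * norm_pdf(d1) * math.sqrt(expiry)


def implied_caplet_vol(price, forward, strike, expiry, accrual, discount,
                       guess=0.2, tol=1e-12, max_iter=50):
    """Newton Raphson iterations on the volatility until the price matches."""
    vol = guess
    for _ in range(max_iter):
        diff = black_caplet(forward, strike, vol, expiry, accrual, discount) - price
        if abs(diff) < tol:
            return vol
        vol -= diff / black_vega(forward, strike, vol, expiry, accrual, discount)
    raise RuntimeError("Newton Raphson did not converge")


# Made-up example: two cap prices one caplet apart, built from a 15% caplet.
inputs = dict(forward=0.045, strike=0.04, expiry=2.0, accrual=0.5, discount=0.9)
cap_short = 0.0100
cap_long = cap_short + black_caplet(vol=0.15, **inputs)
caplet_price = cap_long - cap_short          # difference of the two caps
stripped_vol = implied_caplet_vol(caplet_price, **inputs)
```

Vega is used as the analytic slope in each step, so only a handful of iterations are typically needed from a reasonable first guess.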
For the second caplet we would construct
$$\mathrm{cap}(t,T,U_1+2\cdot\delta) - \mathrm{cap}(t,T,U_1+\delta) = \mathrm{caplet}(t,\,U_1+\delta,\,U_1+2\cdot\delta) \tag{16.13}$$
and analogously for the rest.
The problem with this form of stripping is that the resulting values for σcaplet i
produce a curve with respect to their maturities that is not smooth at all (see Fig.
18.2). The ‘bumps’ present an important difficulty for calibration algorithms that
operate on these caplet volatilities. A smoother fit is consequently required.
16.5 Quadratic Cap Interpolation
The procedure that we will follow is completely analogous to the former one.
That is, we have a set of market quoted caps constructed on flat cap volatilities. Once
again we are going to interpolate between two known market quotes so as to obtain
an intermediate cap volatility from which we can easily extract the caplet we are
searching for.
A quadratic fit requires three parameters to completely define the parabola. We
use up two degrees of freedom by setting the parabola to pass through the two known
points (σi and σi-1) defined by the market quoted flat cap volatilities. We have absolute
freedom to impose the third point. As we have a greater density of information at the
beginning of the curve, we decide that it is more useful to take our third
point as the volatility σi-2. Another approach is to use this degree of freedom to impose
continuity in the function’s slopes, through a condition of the form: fj’(Ti-1) = fj-1’(Ti).
For the first method, we use:
$$f(T) = A\,T^2 + B\,T + C \tag{16.14}$$
with coefficients
$$A = \frac{f_{i-1} - f_i}{(T_{i-1} - T_{i-2})(T_{i-1} - T_i)} - \frac{f_i - f_{i-2}}{(T_i - T_{i-2})(T_{i-1} - T_{i-2})} \tag{16.15}$$
$$B = \frac{f_i - f_{i-1}}{T_i - T_{i-1}} - A\,(T_i + T_{i-1}) \tag{16.16}$$
$$C = f_{i-1} - A\,T_{i-1}^2 - B\,T_{i-1} \tag{16.17}$$
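As a quick check of the coefficients, the sketch below fits the parabola through three flat cap volatilities. The function name is an assumption made for illustration. The second term of A is written with the denominator (Ti − Ti-2)(Ti-1 − Ti-2), which is what the three interpolation conditions require.

```python
def quadratic_coefficients(t_im2, f_im2, t_im1, f_im1, t_i, f_i):
    """Coefficients (A, B, C) of f(T) = A T^2 + B T + C through three points.

    The points are (T_{i-2}, f_{i-2}), (T_{i-1}, f_{i-1}) and (T_i, f_i).
    """
    a = ((f_im1 - f_i) / ((t_im1 - t_im2) * (t_im1 - t_i))
         - (f_i - f_im2) / ((t_i - t_im2) * (t_im1 - t_im2)))
    b = (f_i - f_im1) / (t_i - t_im1) - a * (t_i + t_im1)
    c = f_im1 - a * t_im1 ** 2 - b * t_im1
    return a, b, c


def quadratic_value(coeffs, t):
    """Evaluate the fitted parabola at maturity t."""
    a, b, c = coeffs
    return (a * t + b) * t + c


# Flat cap volatilities for strike 1.5 at maturities 1, 1.5 and 2 years (Table 16.1).
coeffs = quadratic_coefficients(1.0, 30.6, 1.5, 27.0, 2.0, 26.3)
vol_at_1y9m = quadratic_value(coeffs, 1.75)
```

The parabola reproduces the three quotes exactly, so any intermediate maturity between 1.5 and 2 years can be read off it before the caplet is extracted.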
16.6 Cubic Spline Interpolation
The idea behind this method remains the same as before. The only difference is that we now interpolate between caps using cubic functions. The main motivation for this choice is that quadratic functions introduce concavities and convexities to which our caplet transformation is very sensitive. Indeed, although either of the two previous methods apparently produces very smooth flat cap volatility curves (see Fig. 18.1), the caplet curves derived from them show enhanced irregularities wherever two parabolas or straight lines join (see Fig. 18.2). The effect grows with the difference between the slopes at that point. We thus attempt to solve this problem by means of a cubic fit.
As its name indicates, we fit a series of unique cubic polynomials between each of
the data points, with the stipulation that the curve obtained be continuous and appear
smooth. The fundamental idea behind cubic spline interpolation is based on the
engineer’s tool used to draw smooth curves through a number of points, which is
where the method derives its name from. This spline consists of weights attached to a
flat surface at the points to be connected. A flexible strip is then bent across each of
these weights, resulting in a pleasingly smooth curve.
The mathematical spline is similar in principle. The points, in this case, are
numerical data. The weights are the coefficients on the cubic polynomials used to
interpolate the data. These coefficients ‘bend’ the line so that it passes through each of
the data points without any erratic behaviour or breaks in continuity.
We fit a piecewise function of the form
$$S(x) = \begin{cases} s_1(x) & \text{if } x_1 \le x < x_2 \\ s_2(x) & \text{if } x_2 \le x < x_3 \\ \quad\vdots & \\ s_{n-1}(x) & \text{if } x_{n-1} \le x < x_n \end{cases}$$
where $s_i(x)$ is a third degree polynomial defined by
$$s_i(x) = a_i (x - x_i)^3 + b_i (x - x_i)^2 + c_i (x - x_i) + d_i \quad \text{for } i = 1, 2, \dots, n-1.$$
The first and second derivatives of these n-1 equations are
$$s'_i(x) = 3a_i (x - x_i)^2 + 2b_i (x - x_i) + c_i$$
$$s''_i(x) = 6a_i (x - x_i) + 2b_i \quad \text{for } i = 1, 2, \dots, n-1.$$
The curve must verify the following four conditions:
1. The piecewise function S(x) will interpolate all data points (xi, yi).
2. S(x) will be continuous on the interval [x1, xn].
3. S’(x) will be continuous on the interval [x1, xn].
4. S’’(x) will be continuous on the interval [x1, xn].
Since the piecewise function S(x) will interpolate all of the data points, we can
conclude that
S ( xi ) = y i
y i = ai ( xi − xi ) 3 + bi ( xi − xi ) 2 + ci ( xi − xi ) + d i for each i= 1, 2, ..., n-1.
yi = d i
Because property 2 imposes that the function be continuous, then at the junction
of two piecewise cubic curves we have
si −1 ( xi ) = si ( xi ) = d i
we also know that
si −1 ( xi ) = ai −1 ( xi − xi −1 )3 + bi −1 ( xi − xi −1 )2 + ci −1 ( xi − xi −1 ) + di −1 (16.18)
so we have
d i = ai −1 ( xi − xi −1 ) 3 + bi −1 ( xi − xi −1 ) 2 + ci −1 ( xi − xi −1 ) + d i −1
for i= 1, 2, ..., n-1.
16.6.1 Analysing the slopes
$$s'_i(x) = 3a_i (x - x_i)^2 + 2b_i (x - x_i) + c_i$$
$$s'_i(x_i) = 3a_i (x_i - x_i)^2 + 2b_i (x_i - x_i) + c_i \tag{16.19}$$
$$s'_i(x_i) = c_i$$
Applying the third condition of continuous slopes, we also have
$$s'_{i-1}(x_i) = s'_i(x_i)$$
so
$$s'_{i-1}(x_i) = 3a_{i-1} (x_i - x_{i-1})^2 + 2b_{i-1} (x_i - x_{i-1}) + c_{i-1} \tag{16.20}$$
$$c_i = 3a_{i-1} (x_i - x_{i-1})^2 + 2b_{i-1} (x_i - x_{i-1}) + c_{i-1} \quad \text{for } i = 2, 3, \dots, n-1.$$
Now considering the second derivatives:
$$s''_i(x) = 6a_i (x - x_i) + 2b_i$$
$$s''_i(x_i) = 6a_i (x_i - x_i) + 2b_i \tag{16.21}$$
$$s''_i(x_i) = 2b_i$$
And since the second derivatives must also be continuous, we impose
$$s''_{i-1}(x_i) = s''_i(x_i) = 2b_i \tag{16.22}$$
$$s''_{i-1}(x_i) = 6a_{i-1} (x_i - x_{i-1}) + 2b_{i-1}$$
$$2b_i = 6a_{i-1} (x_i - x_{i-1}) + 2b_{i-1} \quad \text{for } i = 2, 3, \dots, n-1.$$
For simplification, let us assume constant intervals $\Delta x = x_i - x_{i-1}$ and write $s''_i(x_i) = s''_i = 2b_i$. Then we can rewrite all the coefficients in terms of these parameters as:
$$a_i = \frac{s''_{i+1} - s''_i}{6\,\Delta x}, \qquad b_i = \frac{s''_i}{2}, \qquad c_i = \frac{y_{i+1} - y_i}{\Delta x} - \frac{s''_{i+1} + 2 s''_i}{6}\,\Delta x, \qquad d_i = y_i \tag{16.23}$$
Therefore, taking the equation $c_{i+1} = 3a_i (x_{i+1} - x_i)^2 + 2b_i (x_{i+1} - x_i) + c_i$, we obtain
$$\frac{y_{i+2} - y_{i+1}}{\Delta x} - \frac{s''_{i+2} + 2 s''_{i+1}}{6}\,\Delta x = 3\,\frac{s''_{i+1} - s''_i}{6\,\Delta x}\,\Delta x^2 + s''_i\,\Delta x + \frac{y_{i+1} - y_i}{\Delta x} - \frac{s''_{i+1} + 2 s''_i}{6}\,\Delta x$$
which we can simplify to
$$s''_i + 4 s''_{i+1} + s''_{i+2} = 6 \cdot \frac{y_i - 2y_{i+1} + y_{i+2}}{\Delta x^2} \quad \text{for } i = 1, 2, \dots, n-2.$$
Which leads to a matrix formulation:
$$
\begin{pmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{pmatrix}
\begin{pmatrix} s''_1 \\ s''_2 \\ s''_3 \\ s''_4 \\ \vdots \\ s''_{n-3} \\ s''_{n-2} \\ s''_{n-1} \\ s''_n \end{pmatrix}
= \frac{6}{\Delta x^2}
\begin{pmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-4} - 2y_{n-3} + y_{n-2} \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{pmatrix}
$$
Note that this system has n -2 rows and n columns, and is therefore under-
determined. In order to generate a unique cubic spline, two other conditions must be
imposed upon the system.
Historically, the most commonly used boundary conditions have been:
16.7 Natural splines
This first spline type stipulates that the second derivative be equal to zero at the endpoints: s’’1 = s’’n = 0. This results in the spline extending as a line outside the endpoints. The first and last columns of the matrix can therefore be eliminated, as they correspond to s’’1 = s’’n = 0. This leaves an (n-2) by (n-2) matrix, which determines the remaining solutions s’’2 through s’’n-1. The spline is now unique.
$$
\begin{pmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 4 & 1 & 0 \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} s''_2 \\ s''_3 \\ s''_4 \\ \vdots \\ s''_{n-3} \\ s''_{n-2} \\ s''_{n-1} \end{pmatrix}
= \frac{6}{\Delta x^2}
\begin{pmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-4} - 2y_{n-3} + y_{n-2} \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{pmatrix}
$$
16.8 Parabolic Run out Spline
The parabolic spline imposes the following condition on the second derivative at the endpoints:
s’’1 = s’’2
s’’n = s’’n-1
The result of this condition is a curve that becomes parabolic at the endpoints. This type of cubic spline is useful for periodic and exponential data.
$$
\begin{pmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 4 & 1 & 0 \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{pmatrix}
\begin{pmatrix} s''_2 \\ s''_3 \\ s''_4 \\ \vdots \\ s''_{n-3} \\ s''_{n-2} \\ s''_{n-1} \end{pmatrix}
= \frac{6}{\Delta x^2}
\begin{pmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-4} - 2y_{n-3} + y_{n-2} \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{pmatrix}
$$
16.9 Cubic Run out Spline
This last type of spline has the most extreme endpoint behaviour. It assigns
s’’1 = 2s’’2 - s’’3
s’’n = 2s’’n-1 - s’’n-2.
This causes the curve to degrade to a single cubic curve over the last two intervals, rather than two separate functions. Substituting these conditions into the first and last rows cancels the neighbouring off-diagonal entries:
$$
\begin{pmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 4 & 1 & 0 \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{pmatrix}
\begin{pmatrix} s''_2 \\ s''_3 \\ s''_4 \\ \vdots \\ s''_{n-3} \\ s''_{n-2} \\ s''_{n-1} \end{pmatrix}
= \frac{6}{\Delta x^2}
\begin{pmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-4} - 2y_{n-3} + y_{n-2} \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{pmatrix}
$$
For our purposes, we have decided to implement a modified spline method.
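Before turning to the modified method, the three classical end conditions can be compared with a short sketch. It assumes equally spaced points, solves the tridiagonal system for the interior second derivatives with the Thomas algorithm (a choice made here for illustration, not taken from the text), and then fills in the two end values from the chosen condition. For the cubic run out case the first and last rows reduce to 6·s’’2 and 6·s’’n-1, because substituting s’’1 = 2s’’2 - s’’3 cancels the s’’3 term. The function name and the `end` argument are hypothetical.

```python
def spline_second_derivatives(y, dx, end="natural"):
    """Second derivatives s''_1..s''_n of a cubic spline on equally spaced points.

    end is one of "natural", "parabolic" (parabolic run out) or "cubic"
    (cubic run out).
    """
    n = len(y)
    if n < 4:
        raise ValueError("need at least four equally spaced points")
    m = n - 2  # number of interior unknowns s''_2..s''_{n-1}
    lower = [1.0] * m
    diag = [4.0] * m
    upper = [1.0] * m
    if end == "parabolic":
        diag[0] = diag[-1] = 5.0
    elif end == "cubic":
        diag[0] = diag[-1] = 6.0
        upper[0] = 0.0
        lower[-1] = 0.0
    elif end != "natural":
        raise ValueError("unknown end condition: " + end)
    rhs = [6.0 * (y[i] - 2.0 * y[i + 1] + y[i + 2]) / dx ** 2 for i in range(m)]
    # Thomas algorithm: forward elimination ...
    for k in range(1, m):
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]
    # ... and back substitution.
    inner = [0.0] * m
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        inner[k] = (rhs[k] - upper[k] * inner[k + 1]) / diag[k]
    # Recover the end values from the chosen boundary condition.
    if end == "natural":
        first, last = 0.0, 0.0
    elif end == "parabolic":
        first, last = inner[0], inner[-1]
    else:
        first = 2.0 * inner[0] - inner[1]
        last = 2.0 * inner[-1] - inner[-2]
    return [first] + inner + [last]
```

Given the second derivatives, the coefficients ai, bi, ci and di of each piece follow directly from (16.23).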
16.10 Constrained Cubic Splines
The principle behind the proposed constrained cubic spline is to prevent overshooting by sacrificing smoothness. This is achieved by eliminating the requirement for equal second order derivatives at every point (condition 4) and replacing it with specified first order derivatives.
Thus, as with traditional cubic splines, the proposed constrained cubic splines are constructed according to the previous equations, but substituting the second order derivative condition with a specified fixed slope at every point.
$$s'_{i-1}(x_i) = s'_i(x_i) = s'(x_i) \tag{16.24}$$
The calculation of the slope becomes the key step at each point. Intuitively we know the slope will lie between the slopes of the adjacent straight lines, and should approach zero if the slope of either line approaches zero or changes sign.
$$s'(x_i) = \begin{cases} \dfrac{2}{\dfrac{x_{i+1} - x_i}{y_{i+1} - y_i} + \dfrac{x_i - x_{i-1}}{y_i - y_{i-1}}} & \text{if the slope does not change sign at } x_i \\[2ex] 0 & \text{if the slope changes sign at } x_i \end{cases} \tag{16.25}$$
for i = 1, 2, ..., n-1.
For the boundary conditions, we must impose two further conditions. We take a generic approach here, in which the intervals ∆x need no longer be constant. From here on the data points are indexed x0, ..., xn and si denotes the cubic on [xi-1, xi]. Thus we will use:
$$s_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i \tag{16.26}$$
Applying a natural spline constraint of the form s’’1(x0) = s’’n(xn) = 0, we obtain
$$s'_1(x_0) = \frac{3(y_1 - y_0)}{2(x_1 - x_0)} - \frac{s'(x_1)}{2}$$
$$s'_n(x_n) = \frac{3(y_n - y_{n-1})}{2(x_n - x_{n-1})} - \frac{s'(x_{n-1})}{2} \tag{16.27}$$
As the slope at each point is known, it is no longer necessary to solve a system of equations. Each spline can be calculated based on the two adjacent points on each side.
$$s''_i(x_{i-1}) = -\frac{2\left(s'_i(x_i) + 2 s'_i(x_{i-1})\right)}{x_i - x_{i-1}} + \frac{6(y_i - y_{i-1})}{(x_i - x_{i-1})^2}$$
$$s''_i(x_i) = \frac{2\left(2 s'_i(x_i) + s'_i(x_{i-1})\right)}{x_i - x_{i-1}} - \frac{6(y_i - y_{i-1})}{(x_i - x_{i-1})^2} \tag{16.28}$$
$$a_i = \frac{s''_i(x_i) - s''_i(x_{i-1})}{6(x_i - x_{i-1})}$$
$$b_i = \frac{x_i\,s''_i(x_{i-1}) - x_{i-1}\,s''_i(x_i)}{2(x_i - x_{i-1})}$$
$$c_i = \frac{(y_i - y_{i-1}) - b_i (x_i^2 - x_{i-1}^2) - a_i (x_i^3 - x_{i-1}^3)}{x_i - x_{i-1}} \tag{16.29}$$
$$d_i = y_{i-1} - c_i x_{i-1} - b_i x_{i-1}^2 - a_i x_{i-1}^3$$
This modified cubic spline interpolation method has been implemented in our flat cap volatility interpolation. The main benefits of the proposed constrained cubic spline are:
• It is a relatively smooth curve;
• It never overshoots intermediate values;
• Interpolated values can be calculated directly without solving a system of equations;
• The actual parameters (ai, bi, ci and di) for each of the cubic spline equations can still be calculated. This permits an analytical integration of the data.
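The construction above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the project code: function names are hypothetical, the points are indexed x0..xn, the interior slopes follow (16.25), the end slopes follow from setting the second derivative to zero at x0 and xn, and the second derivatives are written in the sign convention of a cubic that matches the node values and slopes at both ends of each interval.

```python
import bisect


def constrained_slopes(x, y):
    """Node slopes s'(x_i): harmonic-type mean inside, natural condition at the ends."""
    n = len(x) - 1
    if n < 2:
        raise ValueError("need at least three points")
    s = [0.0] * (n + 1)
    for i in range(1, n):
        dy_left = y[i] - y[i - 1]
        dy_right = y[i + 1] - y[i]
        if dy_left * dy_right > 0.0:  # slope keeps its sign at x_i
            s[i] = 2.0 / ((x[i + 1] - x[i]) / dy_right + (x[i] - x[i - 1]) / dy_left)
    s[0] = 1.5 * (y[1] - y[0]) / (x[1] - x[0]) - 0.5 * s[1]
    s[n] = 1.5 * (y[n] - y[n - 1]) / (x[n] - x[n - 1]) - 0.5 * s[n - 1]
    return s


def constrained_spline_coefficients(x, y):
    """Coefficients (a, b, c, d) of s_i(x) = a x^3 + b x^2 + c x + d on [x_{i-1}, x_i]."""
    s = constrained_slopes(x, y)
    coeffs = []
    for i in range(1, len(x)):
        h = x[i] - x[i - 1]
        dy = y[i] - y[i - 1]
        s2_left = -2.0 * (s[i] + 2.0 * s[i - 1]) / h + 6.0 * dy / h ** 2
        s2_right = 2.0 * (2.0 * s[i] + s[i - 1]) / h - 6.0 * dy / h ** 2
        a = (s2_right - s2_left) / (6.0 * h)
        b = (x[i] * s2_left - x[i - 1] * s2_right) / (2.0 * h)
        c = (dy - b * (x[i] ** 2 - x[i - 1] ** 2) - a * (x[i] ** 3 - x[i - 1] ** 3)) / h
        d = y[i - 1] - c * x[i - 1] - b * x[i - 1] ** 2 - a * x[i - 1] ** 3
        coeffs.append((a, b, c, d))
    return coeffs


def evaluate_spline(x, coeffs, t):
    """Evaluate the piecewise cubic at t, clamping to the first or last piece."""
    i = min(max(bisect.bisect_right(x, t) - 1, 0), len(coeffs) - 1)
    a, b, c, d = coeffs[i]
    return ((a * t + b) * t + c) * t + d


# Flat cap volatilities for strike 1.5 (Table 16.1).
maturities = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
flat_vols = [30.6, 27.0, 26.3, 25.6, 25.4, 24.9, 24.5]
pieces = constrained_spline_coefficients(maturities, flat_vols)
vol_2y6m = evaluate_spline(maturities, pieces, 2.5)
```

Because every slope is fixed locally, changing one market quote only affects the neighbouring intervals, which is convenient when the curve is rebuilt repeatedly inside a calibration loop.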
16.11 Functional Interpolation
We now present the most direct approaches to caplet volatility stripping. Their principal difference with respect to the previous methods is that the interpolation is performed between caplet volatilities, whereas the previous approaches interpolated between cap volatilities.
16.12 Constant Caplet Volatilities.
This is the most basic approach for stripping cap volatilities. We will always use either this method or the previous linear cap interpolation as the first guess in our optimisation algorithm when proceeding to more complex interpolation methods. Note that the algorithm we construct here requires only a simple one-dimensional root solver, for example a one-dimensional Newton-Raphson.
• Let $\{\Sigma_i^{cap}\}_{i=1}^{n}$ be the set of market quoted caps (flat volatilities) for a given strike K.
• Let $\{\sigma_i^{caplet}\}_{i=1}^{m}$ be the constant caplet volatilities that we are trying to calculate.
We perform the stripping by what is commonly known as a bootstrapping mechanism. We have the same formula as before, where each cap price “i” is constructed as a sum of caplet prices with the flat cap volatility:
$$\mathrm{cap}_{(0,T_i)}(\Sigma) = \sum_{j=1}^{n} \mathrm{caplet}^{\,i}_{(t_{j-1},t_j)}(\Sigma) \tag{16.30}$$
and where we construct each caplet as
$$\mathrm{caplet}(t_{j-1}, t_j) = m_{t_{j-1},t_j}\,B(0, t_j)\left[F\,N(d_1) - K\,N(d_2)\right] \tag{16.31}$$
$$d_{1,2} = \frac{\ln\!\left(\dfrac{F}{K}\right) \pm \dfrac{\Sigma^2\,t_{j-1}}{2}}{\Sigma\,\sqrt{t_{j-1}}} \tag{16.32}$$
Here we denote the forward LIBOR rate by F for simplicity.
We can compute the forward caps as:
$$\mathrm{forwardcap}_{(T_{i-1},T_i)}(\Sigma_{i,i-1}) = \mathrm{cap}_{(0,T_i)}(\Sigma_i) - \mathrm{cap}_{(0,T_{i-1})}(\Sigma_{i-1}) \tag{16.33}$$
Remember that the problem with these forward caps was that they could encompass several caplets of unknown volatility.
[Figure: timeline t, T, U1, U1+δ, U2 showing Cap(0,Ti)(Σi) as the sum of Cap(0,Ti-1)(Σi-1) and CapForward(Ti-1,Ti)(Σi,i-1), the latter made up of Caplet 1 and Caplet 2]
Fig. 16.6. Forward caps related to the caplets
To find the piecewise constant volatilities, we need to solve the following equation for a unique, constant $\sigma_i^{caplet}$:
$$\mathrm{forwardcap}_{(T_{i-1},T_i)}(\Sigma_{i,i-1}) = \sum_{j=1}^{n} \mathrm{caplet}^{\,i}_{(t_{j-1},t_j)}(\sigma_i) \tag{16.34}$$
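One bootstrapping step can be sketched as below: given the market forward cap price, find the single volatility that reprices the caplets it contains. The sketch uses plain bisection instead of Newton-Raphson, purely to keep the example short and robust; the function names, the bracketing interval and the sample inputs are assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_caplet(forward, strike, vol, expiry, accrual, discount):
    """Black caplet price: accrual * discount * [F N(d1) - K N(d2)]."""
    std = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * std * std) / std
    d2 = d1 - std
    return accrual * discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


def forward_cap_price(caplets, strike, vol):
    """Sum of caplet prices, all valued with the same volatility.

    caplets is a list of (forward, expiry, accrual, discount) tuples.
    """
    return sum(black_caplet(f, strike, vol, t, tau, df) for f, t, tau, df in caplets)


def solve_constant_caplet_vol(target_price, caplets, strike,
                              lo=1e-4, hi=5.0, tol=1e-12):
    """Bisection on the volatility; the price is increasing in the volatility."""
    if (forward_cap_price(caplets, strike, lo) > target_price
            or forward_cap_price(caplets, strike, hi) < target_price):
        raise ValueError("target price is outside the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if forward_cap_price(caplets, strike, mid) < target_price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical forward cap made of two six month caplets.
caplets = [(0.031, 1.0, 0.5, 0.96), (0.033, 1.5, 0.5, 0.945)]
strike = 0.03
market_price = forward_cap_price(caplets, strike, 0.22)
sigma_i = solve_constant_caplet_vol(market_price, caplets, strike)
```

In practice `market_price` would come from (16.33), i.e. the difference between two caps priced with their own flat volatilities.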
16.13 Piecewise Linear Caplet Volatility Method
We no longer use a constant $\sigma_i^{caplet}$ for the group of caplets that form the capforward. Instead, we impose a linear relationship between them. This only gives us one more degree of freedom, as a line is entirely defined by two of its points.
Our constraint is still that the caplets in a given interval sum to give the capforward price derived from the market caps. However, the caplets that we construct are now related linearly.
Because we seek a smooth curve, we start by imposing that the end node of the caplets forming one capforward(i-1) (value $\sigma^{caplet}_{i-1,j}$) coincides with the starting node ($\sigma^{caplet}_{i,0}$) of the first caplet forming the following capforward(i). As we have N cap volatilities and only N+1 degrees of freedom, we are left with just one free parameter to fit.
We can impose an exact fit, i.e., choose the first volatility, $\sigma^{caplet}_{1,0}$, such that at each interval we satisfy the condition
$$\mathrm{forwardcap}_{(T_{i-1},T_i)}(\Sigma_{i,i-1}) = \mathrm{forwardcap}_{(T_{i-1},T_i)}(\sigma_{i,j}) = \sum_{j=1}^{n} \mathrm{caplet}^{\,i}_{(t_{j-1},t_j)}(\sigma_{i,j}) \tag{16.35}$$
As a result we obtain a very unstable functional form that is not smooth at all (see Fig. 18.2).
We could instead consider a non-exact fit, in which we seek to smooth the curve at the expense of allowing small differences in the previous equation. That is, we seek to minimise the difference between successive slopes and, at the same time, the difference between each capforward price and the sum of its caplet prices.
We denote the slopes by
$$\beta_i = \frac{\sigma_i - \sigma_{i-1}}{T_i - T_{i-1}} \tag{16.36}$$
and minimise
$$F(\sigma_0, \dots, \sigma_N) = \sum_{i=1}^{N} w_i \left(\mathrm{forwardcap}_{(T_{i-1},T_i)}(\Sigma_{i,i-1}) - \mathrm{forwardcap}_{(T_{i-1},T_i)}(\sigma_{i,j})\right)^2 + \lambda \sum_{i=2}^{N-1} (\beta_{i-1} - \beta_i)^2 \tag{16.37}$$
We can take for example $\lambda = 10^{-4}$ and $w_i = 1/T_i$. We would like to point out at this stage that an inexact fit implies that the resulting caplet prices will not sum to give the market quoted cap values. We are therefore giving up precision in exchange for greater smoothness in the curve. For this reason we do not believe that an inexact fit should be pursued if we are attempting to portray the market accurately.
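The structure of the penalised objective can be illustrated with a small sketch. The pricing of a forward cap from its two node volatilities is passed in as a callback, since it depends on the caplet schedule; the callback, the function name and the sample numbers are assumptions, and for simplicity the smoothness term here runs over every pair of consecutive slopes.

```python
def inexact_fit_objective(sigmas, times, market_forward_caps, model_forward_cap,
                          weights, lam=1e-4):
    """Weighted squared pricing error plus lam times squared slope changes.

    sigmas[i] is the node volatility at times[i] (i = 0..N);
    market_forward_caps[i-1] is the market price on (T_{i-1}, T_i);
    model_forward_cap(i, sigma_left, sigma_right) returns the model price
    of forward cap i from its two node volatilities (hypothetical callback).
    """
    n = len(sigmas) - 1
    # Slopes beta_1..beta_N of the piecewise linear caplet volatility curve.
    slopes = [(sigmas[i] - sigmas[i - 1]) / (times[i] - times[i - 1])
              for i in range(1, n + 1)]
    price_error = sum(
        weights[i - 1]
        * (market_forward_caps[i - 1] - model_forward_cap(i, sigmas[i - 1], sigmas[i])) ** 2
        for i in range(1, len(market_forward_caps) + 1)
    )
    slope_error = sum((slopes[k - 1] - slopes[k]) ** 2 for k in range(1, n))
    return price_error + lam * slope_error


# Toy check with a stand-in pricer (the average of the two node volatilities).
value = inexact_fit_objective(
    sigmas=[0.2, 0.3, 0.25],
    times=[0.0, 1.0, 2.0],
    market_forward_caps=[0.25, 0.30],
    model_forward_cap=lambda i, left, right: 0.5 * (left + right),
    weights=[1.0, 0.5],
)
```

An optimiser would then vary `sigmas` to minimise this value; a larger `lam` favours smoothness over repricing accuracy.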
16.14 Piecewise Quadratic
The approach is very similar to the previous one, with the only difference that at
each interval where a capforward is calculated, we now impose a quadratic
relationship between all the caplets instead of a linear relationship. We will
characterise each of these functional forms with the values at the end points, and with
the mid point of each interval. This is useful as the mid point normally coincides with
the value of a specific caplet that we have to calculate. For example, given a
capforward lasting one year, we will have to divide it into two equal 6 month caplets,
meaning that one of them will coincide with the mid point 6 month caplet that we
must calculate anyway.
Another reason is the fact that we will need the mid point anyway for the
computation of the slopes in each interval.
Evidently, for continuity, we impose that the last caplet volatility in each
quadratic function coincides with the first caplet volatility of the following quadratic
function. In addition, we have an extra midpoint to calculate at each interval. We
therefore have N cap volatilities, and 2N+1 degrees of freedom among the midpoints
‘m’ and end points ‘i’ in the quadratic curve.
As we said, for continuity the two points that we must define at each interval are
related to the previous point via a quadratic function
$$f_i = f_{i-1} + C_1 (T_i - T_{i-1}) + C_2 (T_i - T_{i-1})^2$$
$$f_m = f_{i-1} + 0.5 \cdot C_1 (T_i - T_{i-1}) + 0.25 \cdot C_2 (T_i - T_{i-1})^2 \tag{16.38}$$
where we have
$$C_2 = \frac{2(f_i - f_{i-1}) - 4(f_m - f_{i-1})}{(T_i - T_{i-1})^2}$$
$$C_1 = \frac{f_i - f_{i-1}}{T_i - T_{i-1}} - C_2 (T_i - T_{i-1}) \tag{16.39}$$
and where the slopes are now
$$\beta_i^m = \frac{\sigma_i^m - \sigma_{i-1}}{T_m - T_{i-1}}, \qquad \beta_i = \frac{\sigma_i - \sigma_i^m}{T_i - T_m} \tag{16.40}$$
As with the linear fit, we can also perform an inexact fit, that is
$$F(\sigma_0, \dots, \sigma_N) = \sum_{i=1}^{N} w_i \left(\mathrm{forwardcap}_{(T_{i-1},T_i)}(\Sigma_{i,i-1}) - \mathrm{forwardcap}_{(T_{i-1},T_i)}(\sigma_{i,j})\right)^2 + \lambda \sum_{i=2}^{2N+1} (\beta_{i-1} - \beta_i)^2 \tag{16.41}$$
where we allow for differences between the forward cap and its corresponding sum of caplets. Otherwise, we can impose an exact fit in which we only minimise the difference between slopes. This would yield:
$$F(\sigma_0, \dots, \sigma_N) = \sum_{i=2}^{2N+1} (\beta_{i-1} - \beta_i)^2 \tag{16.42}$$
16.15 The Algorithm
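[Flowchart]
1. Market inputs: strikes K, maturities T and flat cap volatilities σCaps. After strike interpolation, the following steps are carried out for every strike.
2. Build a first guess for σCaplets.
3. Price the ForwardCaps from σCaplets and from the market flat σCaps.
4. Compute the quadratic pricing error Σ(FWDCapi,flat - FWDCapi,caplets)², weighted by w.
5. Compute the slopes βi and the quadratic smoothness error Σ(βi - βi-1)², weighted by λ.
6. If the combined error is acceptable, export σCaplets; otherwise the optimisation modifies the first guess and the loop returns to step 3.
Fig. 16.7. Optimisation algorithm for interpolation in maturities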
16.16 About the problem of extracting 6M Caplets from Market data.
                      Strike [%]
Maturity [years]      1.5      1.75     2
1                     30.6     26.2     21.4
1.5                   27       23.8     20.4
2                     26.3     23.3     20.2
3                     25.6     23.4     21.1
4                     25.4     23.4     21.3
5                     24.9     23.1     21.1
6                     24.5     22.8     21.1
Table 16.1. Cap market quotes: flat cap difference under 2 year barrier
The caps available to us directly from the market are not constructed over the same underlying forward LIBOR rates. More specifically, quoted caps with a maturity of up to two years are constructed over the three month LIBOR, whereas all caps quoted with longer maturities are constructed over the six month LIBOR forward rate. Note moreover that the starting date of our data is no longer always six months after the valuation date. This is still the case for caps with maturities of more than 2 years. However, for the data quoted over the 3 month LIBOR (i.e. with maturities of less than 2 years) the starting date is the third month after the value date. This means, for example, that the first market quote, with a maturity of one year, is really constructed from three caplets, with start and end dates (3M,6M), (6M,9M), (9M,1Y).
[Figure: flat caplets on the 3M EURIBOR over the timeline 0, 3M, 6M, 9M, 12M, 1Y3M, 1Y6M]
Fig. 16.8. Cap market quotes: flat cap difference under 2 year barrier
Further, from the 3 month quotes we have to obtain the equivalent six month caplet starting on the sixth month, so as to be consistent with the rest of the data.
[Figure: sought-for caplets on the 6M EURIBOR over the timeline 0, 3M, 6M, 12M, 1Y6M, 2Y]
Fig. 16.9. Creation of the 6 month caplets from 3 month cap market quotes: flat cap difference under 2 year barrier
We shall try to develop a method to extract a measure of the six month caplets from the 3 month data. The most direct approach would be to assume that the volatility of the six month cap is equal to σflat(L3M). Mathematically this would mean:
$$\mathrm{cap}^{3M}_{(0,T_i)}(\Sigma_i) = \sum_{j=1}^{n} \mathrm{caplet}^{3M}_{i,(t_{j-1},t_j)}(\Sigma_i) \tag{16.43}$$
But we would then be using
$$\mathrm{cap}^{3M}_{(0,T_i)}(\Sigma^{3M}_i) = \mathrm{cap}^{6M}_{(0,T_i)}(\Sigma^{6M}_i) \tag{16.44}$$
which is clearly wrong.
Instead, we decide to construct a cap6M using the following procedure:
Consider three instants of time, 0 < S < T < U, all six-months spaced. Assume also that we are dealing with an ‘S x 1’ swaption and with S and T expiry six month caplets.
[Figure: a caplet on the six month rate F over (S, U), decomposed into two caplets on the three month rates F1 over (S, T) and F2 over (T, U), on the timeline t, S, T, U]
Fig. 16.10. Decomposition of a six month caplet into two 3 month caplets
· Here F denotes the forward rate applicable to each caplet, and (S, T, U) are each separated by three month intervals. Note that both F1 and F2 are related to the three month LIBOR, whereas we are trying to construct an F related to the six month LIBOR rate.
The algebraic relationship between F, F1 and F2 is easily derived by expressing all forward rates in terms of zero-coupon-bond prices.
$$F_1 = \left[\frac{B(t,S)}{B(t,T)} - 1\right]\frac{1}{1/2} \tag{16.45}$$
$$F_2 = \left[\frac{B(t,T)}{B(t,U)} - 1\right]\frac{1}{1/2} \tag{16.46}$$
$$F = \left[\frac{B(t,S)}{B(t,U)} - 1\right]\frac{1}{1} \tag{16.47}$$
Notice that we can rewrite the latter in terms of the previous two:
$$F = \left[\frac{B(t,S)}{B(t,U)} - 1\right]\frac{1}{1} = \left[\frac{B(t,S)}{B(t,T)} \cdot \frac{B(t,T)}{B(t,U)} - 1\right] = \frac{F_1 + F_2}{2} + \frac{F_1 F_2}{4} \tag{16.48}$$
Let us now apply Ito to the formulae for F1 and F2, where we are only really concerned with the Brownian terms:
$$dF_1(t) = (\dots)\,dt + \sigma_1(t) F_1(t)\,dZ_1(t)$$
$$dF_2(t) = (\dots)\,dt + \sigma_2(t) F_2(t)\,dZ_2(t) \tag{16.49}$$
$$dZ_1(t)\,dZ_2(t) = \rho\,dt$$
The quantity ρ is the ‘infra correlation’ between the ‘inner rates’ F1 and F2. By differentiation
$$dF(t) = \frac{\partial F}{\partial t}\,dt + \sum_{i=1}^{2} \frac{\partial F}{\partial F_i}\,dF_i + \frac{1}{2}\sum_{i=1}^{2} \frac{\partial^2 F}{\partial F_i^2}\,\sigma_i^2\,dt \tag{16.50}$$
$$dF(t) = (\dots)\,dt + \sigma_1(t) F_1(t)\,dZ_1(t)\left(\frac{1}{2} + \frac{F_2}{4}\right) + \sigma_2(t) F_2(t)\,dZ_2(t)\left(\frac{1}{2} + \frac{F_1}{4}\right) \tag{16.51}$$
$$dF(t) = (\dots)\,dt + \sigma_1(t)\left(\frac{F_1}{2} + \frac{F_1 F_2}{4}\right)dZ_1(t) + \sigma_2(t)\left(\frac{F_2}{2} + \frac{F_1 F_2}{4}\right)dZ_2(t) \tag{16.52}$$
Taking variances on both sides of the above, conditional on the information available at time t, we have
$$\sigma^2(t)\,F^2(t) = \sigma_1^2(t)\left(\frac{F_1(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right)^2 + \sigma_2^2(t)\left(\frac{F_2(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right)^2 + 2\rho\,\sigma_1(t)\sigma_2(t)\left(\frac{F_1(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right)\left(\frac{F_2(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right) \tag{16.53}$$
Let us name
$$u_1(t) = \frac{1}{F(t)}\left(\frac{F_1(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right), \qquad u_2(t) = \frac{1}{F(t)}\left(\frac{F_2(t)}{2} + \frac{F_1(t)F_2(t)}{4}\right) \tag{16.54}$$
We can then rewrite the former as
$$\sigma^2(t) = u_1^2(t)\,\sigma_1^2(t) + u_2^2(t)\,\sigma_2^2(t) + 2\rho\,\sigma_1(t)\sigma_2(t)\,u_1(t)u_2(t) \tag{16.55}$$
We decide to introduce a deterministic approximation by freezing all F’s (and therefore u’s) at their time-zero value:
$$\sigma^2_{approx}(t) = u_1^2(0)\,\sigma_1^2(t) + u_2^2(0)\,\sigma_2^2(t) + 2\rho\,\sigma_1(t)\sigma_2(t)\,u_1(0)u_2(0) \tag{16.56}$$
Now recall that F is the particular (one-period) swap rate underlying the ‘S x 1’ swaption, whose (squared) Black’s swaption volatility is therefore
$$v^2_{Black} = \frac{1}{S}\int_0^S \sigma^2_{approx}(t)\,dt = \frac{1}{S}\left[u_1^2(0)\int_0^S \sigma_1^2(t)\,dt + u_2^2(0)\int_0^S \sigma_2^2(t)\,dt + 2\rho\,u_1(0)u_2(0)\int_0^S \sigma_1(t)\sigma_2(t)\,dt\right] \tag{16.57}$$
The first integral can be input directly as a market caplet volatility:
$$v^2_{S\_Caplet} = \frac{1}{S}\int_0^S \sigma_1^2(t)\,dt \tag{16.58}$$
The second and third integrals, in contrast, require some form of parametric assumption on the instantaneous volatility structure of rates in order to be computed. The simplest solution is to assume that forward rates have constant volatilities. In such a case the second integral becomes
$$\frac{1}{S}\int_0^S \sigma_2^2(t)\,dt \approx \frac{1}{S}\int_0^S v^2_{T\_Caplet}\,dt = v^2_{T\_Caplet} \tag{16.59}$$
and the third integral becomes
$$\frac{1}{S}\int_0^S \sigma_1(t)\sigma_2(t)\,dt \approx \frac{1}{S}\int_0^S v_{T\_Caplet}\,v_{S\_Caplet}\,dt = v_{T\_Caplet}\,v_{S\_Caplet} \tag{16.60}$$
Under this assumption we finally get:
$$v^2_{Black} \approx u_1^2(0)\,v^2_{S\_Caplet} + u_2^2(0)\,v^2_{T\_Caplet} + 2\rho\,v_{S\_Caplet}\,v_{T\_Caplet}\,u_1(0)u_2(0) \tag{16.61}$$
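The final approximation is simple enough to evaluate directly. The sketch below follows (16.48), (16.54) and (16.61) exactly as written, i.e. with the weights 1/2 and 1/4 that appear in those formulae; the function names and the sample inputs are assumptions for illustration, and the correlation ρ has to be supplied by the user.

```python
import math


def combined_forward(f1, f2):
    """Forward rate F built from the two inner rates, as in (16.48)."""
    return 0.5 * (f1 + f2) + 0.25 * f1 * f2


def frozen_weights(f1, f2):
    """Weights u1(0), u2(0) of (16.54), evaluated with today's forwards."""
    f = combined_forward(f1, f2)
    u1 = (0.5 * f1 + 0.25 * f1 * f2) / f
    u2 = (0.5 * f2 + 0.25 * f1 * f2) / f
    return u1, u2


def approx_black_vol(f1, f2, v_s, v_t, rho):
    """Approximate Black volatility of F from the two caplet volatilities (16.61)."""
    u1, u2 = frozen_weights(f1, f2)
    variance = (u1 * v_s) ** 2 + (u2 * v_t) ** 2 + 2.0 * rho * u1 * u2 * v_s * v_t
    return math.sqrt(variance)


# Hypothetical inputs: two inner forwards of 4%, caplet volatilities of 20% and 18%.
vol_perfect_corr = approx_black_vol(0.04, 0.04, 0.20, 0.18, rho=1.0)
vol_partial_corr = approx_black_vol(0.04, 0.04, 0.20, 0.18, rho=0.9)
```

Lowering ρ lowers the resulting volatility, which is the usual diversification effect between the two inner rates.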
17. SABR
At this point in our study, we have effectively smoothed our curve along the maturity direction; that is, we have resolved the irregularities in the term structure of our caplet volatility surface. More specifically, and perhaps more recognisably, we have interpolated along the vertical direction of the market quoted matrix:
Maturity                                        Strikes [%]
[years]    1.50   1.75   2.00   2.25   2.50   3.00   3.50   4.00   5.00   6.00   7.00   8.00   10.00
 1.00     30.60  26.20  21.40  15.70  12.00  10.70  12.20  14.00  17.50  21.00  23.50  25.40  27.10
 1.50     27.00  23.80  20.40  17.70  15.70  14.00  15.80  17.00  18.70  20.10  21.20  22.20  26.10
 2.00     26.30  23.30  20.20  18.00  16.70  15.10  16.40  17.30  18.80  20.00  21.20  22.20  26.10
 3.00     25.60  23.40  21.10  19.50  18.60  16.90  17.50  18.10  19.30  20.60  21.70  22.60  24.40
 4.00     25.40  23.40  21.30  19.90  19.20  17.60  17.90  18.30  19.30  20.40  21.40  22.30  23.90
 5.00     24.90  23.10  21.10  19.90  19.30  17.90  17.90  18.10  18.90  19.80  20.70  21.50  23.10
 6.00     24.50  22.80  21.10  19.90  19.40  18.00  17.90  18.00  18.60  19.30  20.00  20.70  22.40
 7.00     24.20  22.60  21.00  19.90  19.40  18.10  17.80  17.80  18.20  18.80  19.50  20.10  21.80
 8.00     24.00  22.40  20.90  19.90  19.40  18.10  17.70  17.60  17.90  18.40  18.90  19.50  21.20
 9.00     23.80  22.20  20.80  19.80  19.30  18.00  17.60  17.40  17.50  17.90  18.40  18.90  20.50
10.00     23.60  22.10  20.70  19.70  19.20  17.90  17.40  17.20  17.20  17.50  17.90  18.30  19.70
12.00     23.00  21.60  20.30  19.30  18.80  17.50  17.00  16.60  16.40  16.60  16.90  17.30  18.40
15.00     22.50  21.10  19.90  18.90  18.30  17.10  16.50  16.10  15.70  15.70  16.00  16.30  17.20
20.00     21.60  20.30  19.20  18.10  17.60  16.40  15.70  15.20  14.80  14.80  15.00  15.30  16.20
25.00     20.80  19.60  18.50  17.50  17.00  15.90  15.20  14.70  14.20  14.30  14.50  14.90  15.70
30.00     20.10  18.90  17.80  17.00  16.60  15.40  14.70  14.20  13.80  13.90  14.30  14.60  15.50
Table 17.1. Dates and strikes with which to model products with similar needs to the HJM
This does not mean however that our curve will be smooth along a horizontal
strike direction. In fact, we have not modified these values at all with respect to the
original market values, and therefore obtain market derived smile volatilities that can
be extremely irregular:
Chapter 17 SABR
[Figure: Caplet Smiles with Maturity. Sigma Black (5% to 30%) against Strike K [%] (1.50 to 9.00) for maturities of 0.5, 1.5, 8 and 28 years]
Fig. 17.1. Caplet current market behaviour
A smooth smile is essential for calibration. Of particular importance is the small peak observed for very short maturities around the ‘at the money’ value (strike = 3%).
As is done with the market swaption matrix, we decide to perform an inexact SABR interpolation along strikes to smooth the curve. It is inexact because we slightly modify the market values in exchange for an increase in smoothness.
17.1 Detailed SABR
The SABR method is a stochastic interpolation method to model LIBOR forward rates. The model takes a given vector of strikes $\{K_i\}_{i=1}^{n}$ and a vector of Black volatilities $\{\sigma_i\}_{i=1}^{n}$, i.e., a horizontal row from the above matrix. The fit is done in the sense of nonlinear least squares, using:
$$dF = \alpha F^{\beta}\,dW_1, \qquad F(0) = f \tag{17.1}$$
$$d\alpha = \nu\alpha\,dW_2, \qquad \alpha(0) = \alpha$$
$$dW_1\,dW_2 = \rho\,dt \tag{17.2}$$
$$\sigma_0 = \alpha$$
There are four parameters in the model, (α, β, ρ, ν), although we will soon see
that two of these can be set beforehand.
The SABR model uses the price of a European option given by Black’s formula:
$$V_{call} = B(t_{set})\left(f \cdot N(d_1) - K \cdot N(d_2)\right)$$
$$V_{put} = V_{call} + B(t_{set})\,(K - f) \tag{17.3}$$
$$d_{1,2} = \frac{\log\!\left(\dfrac{f}{K}\right) \pm \dfrac{\sigma_B^2\,t_{ex}}{2}}{\sigma_B\,\sqrt{t_{ex}}} \tag{17.4}$$
It then derives the expression below, where the implied volatility σB(f,K) is given by singular perturbation techniques. We will not go into the specifics here, but simply state the expressions:
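$$\sigma_B(K, f) = \frac{\alpha}{(f K)^{(1-\beta)/2}\left[1 + \dfrac{(1-\beta)^2}{24}\log^2\!\left(\dfrac{f}{K}\right) + \dfrac{(1-\beta)^4}{1920}\log^4\!\left(\dfrac{f}{K}\right) + \dots\right]} \cdot \frac{z}{x(z)}$$
$$\qquad \cdot \left\{1 + \left[\frac{(1-\beta)^2}{24}\,\frac{\alpha^2}{(f K)^{1-\beta}} + \frac{1}{4}\,\frac{\rho\beta\nu\alpha}{(f K)^{(1-\beta)/2}} + \frac{2 - 3\rho^2}{24}\,\nu^2\right] t_{ex} + \dots\right\} \tag{17.5}$$
Where
$$z = \frac{\nu}{\alpha}\,(f K)^{(1-\beta)/2}\,\log\!\left(\frac{f}{K}\right) \tag{17.6}$$
and
$$x(z) = \log\!\left\{\frac{\sqrt{1 - 2\rho z + z^2} + z - \rho}{1 - \rho}\right\} \tag{17.7}$$
For the special case of ‘at the money’ options, where f = K, the formula simplifies to
$$\sigma_{ATM} = \sigma_B(f, f) = \frac{\alpha}{f^{1-\beta}}\left\{1 + \left[\frac{(1-\beta)^2}{24}\,\frac{\alpha^2}{f^{2-2\beta}} + \frac{1}{4}\,\frac{\rho\beta\nu\alpha}{f^{1-\beta}} + \frac{2 - 3\rho^2}{24}\,\nu^2\right] t_{ex} + \dots\right\} \tag{17.8}$$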
Implementing the SABR model for vanilla options is very easy: once this formula is programmed, we just need to send the options to a Black pricer.
The complexity of the formula is needed for accurate pricing. Omitting the last line of (17.5), for example, can result in a relative error that exceeds three per cent in extreme cases. Although this error seems small, it is large enough to matter for accurate pricing. The omitted terms “+ ...” are much, much smaller.
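A direct transcription of (17.5) to (17.7) might look as follows. It is a sketch under stated assumptions rather than the pricer used in this project: the function name is hypothetical, the omitted higher order terms are dropped, and the ratio z/x(z) is replaced by its limit 1 when z is numerically zero, which also covers the at-the-money case (17.8).

```python
import math


def sabr_black_vol(f, k, t_ex, alpha, beta, rho, nu):
    """Black implied volatility from the SABR expansion (17.5)-(17.7)."""
    one_minus_beta = 1.0 - beta
    log_fk = math.log(f / k)
    fk_pow = (f * k) ** (one_minus_beta / 2.0)

    # Last line of (17.5): correction proportional to the time to exercise.
    correction = 1.0 + (
        one_minus_beta ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
        + 0.25 * rho * beta * nu * alpha / fk_pow
        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2
    ) * t_ex

    # z / x(z), with the limit 1 as z -> 0 (at the money, or nu = 0).
    z = nu / alpha * fk_pow * log_fk
    if abs(z) < 1e-8:
        z_over_x = 1.0
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x

    denominator = fk_pow * (
        1.0
        + one_minus_beta ** 2 / 24.0 * log_fk ** 2
        + one_minus_beta ** 4 / 1920.0 * log_fk ** 4
    )
    return alpha / denominator * z_over_x * correction


# Hypothetical parameters: a short smile around a 3% forward.
params = dict(t_ex=2.0, alpha=0.03, beta=0.5, rho=-0.3, nu=0.4)
smile = [sabr_black_vol(0.03, k, **params) for k in (0.02, 0.03, 0.04)]
```

Each volatility in `smile` would then be passed to a Black pricer such as (17.3) to obtain the option price.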
The function being minimised in this procedure is
$$\sum_{i=1}^{n} w_i \left(\mathrm{blackToNormal}(\sigma_i, K_i) - \mathrm{sabrToNormal}(\alpha, \beta, \rho, \nu, K_i)\right)^2 \tag{17.9}$$
That is, we minimise the difference between the market quotes at specific strikes and the quotes that the SABR model provides at those same strikes. The price we are willing to pay for this adjustment is set by the weights in the minimisation algorithm. We are currently taking weights of the form:
$$w_i = \frac{1}{1 + (K_i - F)^2} \tag{17.10}$$
17.2 Dynamics of the SABR: understanding the parameters

$$dF = \alpha F^{\beta}\,dW_1$$
$$d\alpha = \nu\alpha\,dW_2, \qquad \alpha(0) = \alpha \tag{17.11}$$
There are two special cases that deserve particular attention: β=1, representing a stochastic lognormal model (flat), $dF = \alpha F\,dW_1$, and β=0, representing a stochastic normal model (skew), $dF = \alpha\,dW_1$. On top of these curves we have the superimposed smile, as can be seen in the graphs below.
Notice that the at-the-money volatility σB(f,f) traces the dotted line known as the backbone.
Fig. 17.2. beta = 0 skew imposition, rho smile imposition
Fig. 17.3. beta flat imposition =1, rho smile imposition
Let us consider a simplified version of the SABR valid when K is not too far from the current forward f.
$$\sigma_B(K, f) = \frac{\alpha}{f^{1-\beta}}\left\{1 - \frac{1}{2}(1 - \beta - \rho\lambda)\log\!\left(\frac{K}{f}\right) + \frac{1}{12}\left[(1-\beta)^2 + (2 - 3\rho^2)\lambda^2\right]\log^2\!\left(\frac{K}{f}\right) + \dots\right\} \tag{17.12}$$
where
$$\lambda = \frac{\nu}{\alpha}\,f^{1-\beta} \tag{17.13}$$
The main term in the above is $\sigma_B(f, f) = \alpha / f^{1-\beta}$, which represents what we have called the backbone. This is almost entirely determined by the exponent β, with the exponent β = 0 (a stochastic Gaussian model) giving a steeply downward sloping backbone, and the exponent β = 1 giving a nearly flat backbone.
The second term $-\frac{1}{2}(1 - \beta - \rho\lambda)\log(K/f)$ represents the overall skew: the slope of the implied volatility with respect to the strike K. We can decompose it into two principal components:
1. The beta skew:
$$-\frac{1}{2}(1 - \beta)\log\!\left(\frac{K}{f}\right) \tag{17.14}$$
is downward sloping since 0 ≤ β ≤ 1. It arises because the “local volatility” $\alpha f^{\beta} / f = \alpha / f^{1-\beta}$ is a decreasing function of the forward price.
2. The vanna skew:
$$\frac{1}{2}\,\rho\lambda\,\log\!\left(\frac{K}{f}\right) \tag{17.15}$$
is the skew caused by the correlation between the volatility and the asset price. Typically the volatility and asset price are negatively correlated, so on average, the volatility α would decrease (increase) when the forward f increases (decreases). It thus seems unsurprising that a negative correlation ρ causes a downward sloping vanna skew.
The last term $\frac{1}{12}\left[(1-\beta)^2 + (2 - 3\rho^2)\lambda^2\right]\log^2(K/f) + \dots$ also contains two parts.
The first part:
$$\frac{1}{12}(1-\beta)^2\,\log^2\!\left(\frac{K}{f}\right) \tag{17.16}$$
appears to be a smile (quadratic) term, but it is dominated by the downward sloping beta skew, and, at reasonable strikes K, it just modifies this skew somewhat.
The second part:
$$\frac{1}{12}(2 - 3\rho^2)\,\lambda^2\,\log^2\!\left(\frac{K}{f}\right) \tag{17.17}$$
is the smile induced by the volga (vol-gamma) effect. Physically this smile arises because of “adverse selection”: unusually large movements of the forward F happen more often when the volatility α increases, and less often when α decreases, so strikes K far from the money represent, on average, high volatility environments.
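The decomposition above is easy to inspect numerically. The sketch below splits the simplified expansion (17.12) into the backbone and the four terms just discussed; the function names and the dictionary keys are assumptions made for illustration, and the vanna term carries the sign it has when the skew term of (17.12) is expanded.

```python
import math


def sabr_simplified_terms(f, k, alpha, beta, rho, nu):
    """Backbone and relative skew/smile terms of the simplified SABR formula (17.12)."""
    lam = nu / alpha * f ** (1.0 - beta)  # lambda of (17.13)
    x = math.log(k / f)
    return {
        "backbone": alpha / f ** (1.0 - beta),
        "beta_skew": -0.5 * (1.0 - beta) * x,
        "vanna_skew": 0.5 * rho * lam * x,
        "beta_smile": (1.0 - beta) ** 2 / 12.0 * x * x,
        "volga_smile": (2.0 - 3.0 * rho ** 2) / 12.0 * lam * lam * x * x,
    }


def sabr_simplified_vol(f, k, alpha, beta, rho, nu):
    """Implied volatility from (17.12), as backbone times one plus the four terms."""
    terms = sabr_simplified_terms(f, k, alpha, beta, rho, nu)
    relative = (terms["beta_skew"] + terms["vanna_skew"]
                + terms["beta_smile"] + terms["volga_smile"])
    return terms["backbone"] * (1.0 + relative)


# Hypothetical parameters: strike slightly above a 3% forward, negative correlation.
terms = sabr_simplified_terms(0.03, 0.035, alpha=0.03, beta=0.5, rho=-0.3, nu=0.4)
vol = sabr_simplified_vol(0.03, 0.035, alpha=0.03, beta=0.5, rho=-0.3, nu=0.4)
```

With a negative ρ and a strike above the forward, both skew terms come out negative while both smile terms are positive, matching the qualitative description above.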
17.2.1 Fitting market data.
The exponent β and correlation ρ affect the volatility smile in similar ways. They
both cause a downward sloping skew in σB (K, f ) as the strike K varies. From a single
market snapshot of σB (K, f ) as a function of K at a given f, it is difficult to distinguish
between the two parameters.
Fig. 17.4. Indistinguishable smiles when calibrating with different beta parameters, β = 0 and β = 1
Note that there is no substantial difference in the quality of the fits, despite the
presence of market noise. This matches our general experience: market smiles can be
fit equally well with any specific value of β. In particular, β cannot be determined by
fitting a market smile.
Suppose for the moment that the exponent β is known or has been selected. The
exponent β can be determined from historical observations of the “backbone” or
selected from “aesthetic considerations”. Selecting β from “aesthetic” or other ‘a
priori’ considerations usually results in β = 1 (stochastic lognormal), β = 0 (stochastic
normal), or β = 1/2 (stochastic CIR) models. We will see however that in our
particular SABR construction, the beta parameter has a much greater impact than
what we initially expected.
With β given, fitting the SABR model is a straightforward procedure. For every
particular date (i.e. row) in our caplet matrix, we seek a unique pair ρ, ν that
satisfies the SABR equations and minimizes the objective set out initially in (17.9):
$$\sum_{i=1} w_i \left( \mathrm{blackToNormal}(\sigma_i, K_i) - \mathrm{sabrToNormal}(\alpha, \beta, \rho, \nu, K_i) \right)^2$$
In general, α can be fitted analytically.
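As an illustration of this fit, the following is a minimal sketch and not the implementation used in this project. It assumes the standard Hagan et al. lognormal approximation for the SABR implied volatility, compares volatilities directly in Black space rather than through the blackToNormal and sabrToNormal conversions, and replaces the optimiser with a brute-force grid search. All function names are hypothetical.

```python
import math

def hagan_black_vol(f, k, t, alpha, beta, rho, nu):
    """Hagan et al. lognormal implied-vol approximation for SABR (first order in t)."""
    one_b = 1.0 - beta
    fk = (f * k) ** (one_b / 2.0)          # (f K)^((1 - beta) / 2)
    log_fk = math.log(f / k)
    denom = fk * (1.0 + one_b ** 2 / 24.0 * log_fk ** 2
                  + one_b ** 4 / 1920.0 * log_fk ** 4)
    z = nu / alpha * fk * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0                      # at-the-money limit of z / x(z)
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    bracket = (one_b ** 2 / 24.0 * alpha ** 2 / fk ** 2
               + 0.25 * rho * beta * nu * alpha / fk
               + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2)
    return alpha / denom * z_over_x * (1.0 + bracket * t)


def weighted_sse(rho, nu, f, t, alpha, beta, strikes, market_vols, weights):
    """Weighted sum of squared volatility differences over the quoted strikes."""
    return sum(w * (v - hagan_black_vol(f, k, t, alpha, beta, rho, nu)) ** 2
               for k, v, w in zip(strikes, market_vols, weights))


def fit_rho_nu(f, t, alpha, beta, strikes, market_vols, weights, rho_grid, nu_grid):
    """Brute-force search for the (rho, nu) pair with the smallest objective."""
    return min(((rho, nu) for rho in rho_grid for nu in nu_grid),
               key=lambda p: weighted_sse(p[0], p[1], f, t, alpha, beta,
                                          strikes, market_vols, weights))


# Synthetic check: build a smile from known parameters, then recover them.
f, t, alpha, beta = 0.045, 1.5, 0.15, 1.0
strikes = [0.02, 0.03, 0.04, 0.045, 0.05, 0.06, 0.07]
market_vols = [hagan_black_vol(f, k, t, alpha, beta, -0.3, 0.4) for k in strikes]
weights = [1.0] * len(strikes)
rho_grid = [i / 10 for i in range(-9, 10)]
nu_grid = [i / 10 for i in range(1, 11)]
rho_fit, nu_fit = fit_rho_nu(f, t, alpha, beta, strikes, market_vols, weights,
                             rho_grid, nu_grid)
```

A grid search is used only to keep the sketch dependency-free; a production fit would normally use a proper least-squares optimiser.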
Now, the three parameters α, ρ, and ν have clearly different effects on the curve:
· The parameter α mainly controls the overall height of the curve and is almost
equal to σATM,
· The correlation ρ controls the curve’s skew,
· The vol of vol ν controls how much smile the curve exhibits.
Because of the widely separated roles these parameters play, the fitted parameter
values tend to be very stable.
It is usually more convenient to use the at-the-money volatility σATM, β, ρ, and ν
as the SABR parameters instead of the original parameters α, β, ρ, ν. The parameter α
is then found whenever needed by inverting
$$\sigma_{ATM} = \sigma_B(f, f) = \frac{\alpha}{f^{1-\beta}} \left\{ 1 + \left[ \frac{(1-\beta)^2}{24} \frac{\alpha^2}{f^{2-2\beta}} + \frac{1}{4} \frac{\rho\beta\nu\alpha}{f^{1-\beta}} + \frac{2-3\rho^2}{24} \nu^2 \right] t_{ex} + \dots \right\} \qquad (17.18)$$
This inversion is numerically easy since the $[\dots]\,t_{ex}$ term is small. With this
parameterization, fitting the SABR model requires fitting ρ and ν to the implied
volatility curve, with σATM given by the market and β = 1 selected.
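One simple way to carry out this inversion is a fixed-point iteration, which converges quickly precisely because the bracketed correction in (17.18) is small. The sketch below is an illustration under that assumption, with (17.18) truncated after the term linear in t; the function names are hypothetical and it is not necessarily the method used in this project.

```python
def _bracket(alpha, f, beta, rho, nu):
    """Square-bracket term of (17.18)."""
    one_b = 1.0 - beta
    return (one_b ** 2 / 24.0 * alpha ** 2 / f ** (2.0 * one_b)
            + 0.25 * rho * beta * nu * alpha / f ** one_b
            + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2)


def atm_vol(alpha, f, t, beta, rho, nu):
    """Right-hand side of (17.18), truncated after the term linear in t."""
    return alpha / f ** (1.0 - beta) * (1.0 + _bracket(alpha, f, beta, rho, nu) * t)


def alpha_from_atm(sigma_atm, f, t, beta, rho, nu, tol=1e-14, max_iter=100):
    """Invert (17.18) for alpha by fixed-point iteration."""
    scale = sigma_atm * f ** (1.0 - beta)   # leading-order guess for alpha
    alpha = scale
    for _ in range(max_iter):
        new_alpha = scale / (1.0 + _bracket(alpha, f, beta, rho, nu) * t)
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


# Round trip with illustrative parameter values.
sigma_atm = atm_vol(0.15, f=0.045, t=1.5, beta=1.0, rho=-0.3, nu=0.4)
alpha = alpha_from_atm(sigma_atm, f=0.045, t=1.5, beta=1.0, rho=-0.3, nu=0.4)
```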
In many markets, the ATM volatilities need to be updated frequently, say once or
twice a day, while the smiles and skews need to be updated infrequently, say once or
twice a month. With the new parameterization, σATM can be updated as often as
needed, with ρ, ν (and β) updated only as needed. In general the parameters ρ and ν
are very stable (β is initially assumed to be a given constant), and need to be re-fit only
every few weeks. This stability may be because the SABR model reproduces the usual
dynamics of smiles and skews. In contrast, the at-the-money volatility σATM , or,
equivalently the α, may need to be updated every few hours in fast-paced markets.
17.2.2 σATM Selection
For our specific caplet case, the ‘at the money’ volatility with which we set the
global volatility level is obtained as the volatility corresponding to the caplet’s
forward. For a caplet with a value date t, fixing date T and maturity U, the forward
strike is simply:
$$K_{FWD} = \left( \frac{B(t,T)}{B(t,U)} - 1 \right) \frac{1}{m} \qquad (17.19)$$
Thus, since at this stage we will typically have a caplet matrix (either linear or
cubic spline interpolated), we can extract the corresponding σATM by interpolating
along a row at the above strike. Graphically, this is equivalent to:
[Figure: cubic spline σATM extraction for SABR; 6 month caplet, maturity 1 year; Black vol against strike [%] with a constrained cubic spline; forward strike = 4,55%, σATM = 0,0899]
Fig. 17.5. Constructing the caplet ‘at the money’ volatilities
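The forward strike of (17.19), at which the row is interpolated, can be computed directly from two discount factors. In this minimal sketch, m is assumed to be the accrual year fraction between T and U (the symbol is not defined in this passage), and the function name is hypothetical.

```python
def forward_strike(b_fixing, b_maturity, m):
    """Forward strike (17.19) from the discount factors B(t,T) and B(t,U).

    m is assumed here to be the year fraction between T and U.
    """
    return (b_fixing / b_maturity - 1.0) / m


# Illustrative discount factors for a 6 month caplet (m = 0.5).
k_fwd = forward_strike(0.95, 0.93, 0.5)
```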
The above must be performed for each of the maturities. Subsequently, the entire
caplet matrix, the forwards FWDi, the σATM i, the set of dates (value date t, fixing dates
Ti and maturities Ui) and the tenors (6 months in our case) are used as inputs for the
SABR surface construction.
Some final considerations:
In most markets there is a strong smile for short-dated options which relaxes as
the time-to-expiry increases; this is exactly what we observe in Fig. 18.14 with the
caplet market. Consequently the volatility of volatilities ν is large for short dated
options and smaller for long-dated options, regardless of the particular underlying.
The SABR model predicts that whenever the forward price f changes, the implied
volatility curve shifts in the same direction and by the same amount as the price f.
These predicted dynamics of the smile match market experience.
If β < 1, the “backbone” is downward sloping, so the shift in the implied volatility
curve is not purely horizontal. Instead, this curve shifts up and down as the at-the-
money point traverses the backbone.
18. Result Analysis
18.1.1 Flat Cap Volatilities:
These are the market data which we possess to begin with. We seek to interpolate
from the caps the necessary intermediate maturity values to be able to perform the
subsequent caplet interpolation procedure. The first thing that we must notice is
that all interpolation methods produce intermediate points that are basically
indistinguishable within the Cap curve. This apparently suggests that they will also
yield very similar Caplet curves. We will see that this is not generally the case.
[Figure: flat cap volatility against maturity [years]; series: market, natural spline, constrained spline]
Fig. 18.1. Cap flat market quotes versus interpolated flat market quotes
18.1.2 Linear Cap Interpolation versus Quadratic Cap Interpolation
The Linear Cap interpolation method is the one already implemented at
Banco Santander (the green line below). We will use it as the standard of
comparison for all other methods tested. Notice how it presents very
noticeable ‘bumps’ throughout the entire curve, suggesting that we will encounter
difficulties when using this data.
The first thing we must realize is the actual location of the irregularities. A careful
examination with respect to the previous cap graph from which the caplets are
derived shows that the jumps produced in the below graph coincide exactly with
sharp changes in the gradient of Fig. 18.1. This apparently suggests that for
smoothness in the caplet values, we require minimal slope variations in our cap
graph.
[Figure: cap linear and quadratic interpolation; caplet volatilities against maturity [years]; series: cap quadratic, cap linear]
Fig. 18.2. Caplet interpolated volatilities using linear and quadratic approaches
Quadratic formulae tend to show a high degree of convexity or concavity in their
interpolation points. No matter how small they may be in the Cap graph, it is evident
from the above that the caplet transformation is sensitive to them.
18.1.3 Cubic Spline interpolation
This is the next logical step to take. The increase in the order of the polynomial used
clearly translates into a better interpolation of the caps. As a result, caplets are
smoother. Below we show a particularly unfavourable case that occurs at a strike
value of K = 4%. Notice that we still have undulations, but no longer the abrupt spikes
characteristic of the linear method.
[Figure: cap cubic spline interpolation; caplet volatility against maturity [years]; series: cap linear, cap cubic spline]
Fig. 18.3. Cap interpolated volatilities using linear and cubic spline approaches
[Figure: cap cubic spline interpolation; caplet volatility against maturity [years]; series: cap cubic spline natural, cap cubic spline constrained]
Fig. 18.4. Cap interpolation between natural and constrained cubic splines
The second graph is an analysis of the two different spline methods we
implemented. Remember that we seek the smoothest changes in slope possible in our
cap interpolation, with as little overshooting as possible. This is specifically what is
achieved through the constrained spline method. Although the differences are small, a
more careful examination of the above curves shows that the amplitude of the
oscillations in the natural spline is always greater than in the constrained spline.
18.1.4 Piecewise Constant Caplet Interpolation
[Figure: caplet constant interpolation; caplet volatility against maturity [years]; series: caplet constant with 1st point free, caplet constant with 1st point fixed, cap linear interpolation]
Fig. 18.5. Caplet interpolated volatilities using piecewise constant caplet and linear cap approaches
We now revert to our second method: interpolation between caplets. Once again
we decide to compare each model with the linear cap interpolation that is already
implemented within the bank. The piecewise constant interpolation for each set of
caps is clearly not a smooth solution and has been presented mostly for a visual
confirmation of what the theory predicted.
It does help however in understanding the range of caplets we must interpolate
at each time interval. It is clearly visible that we interpolate in groups of two up until
10 years, then in two groups of around 5 caplets each until 15 years, and then in vast
groups of 10 caplets apiece until 30 years.
18.1.5 Piecewise Linear Caplet Interpolation
[Figure: caplet linear interpolation; caplet volatility against maturity [years]; series: cap linear, caplet linear]
Fig. 18.6. Caplet interpolated volatilities using linear approaches
The improvement here is clear with respect to the piecewise constant. However, we
still attain no improvement with respect to the cap linear interpolation.
18.1.6 Piecewise Quadratic Caplet Interpolation: first guess cubic spline
[Figure: caplet quadratic interpolation; caplet volatility against maturity [years]; series: cap cubic spline, caplet quadratic]
Fig. 18.7. Caplet interpolated volatilities using cubic spline, and an optimisation
algorithm using quadratic approaches
The optimization algorithm here can prove extremely time consuming, and the
solution it converges to is still extremely similar to the results obtained through
the cap spline interpolation.
We decide therefore to implement the spline interpolator. It is simpler, less time
consuming, and equally effective in smoothing out the caplet function.
18.2 SABR Results
Our curve is now smooth along maturities. We return now to the analysis of the
interpolation across different strikes. Recall that we had particular difficulties,
especially at very short or extremely long maturities. Through the SABR interpolation
method, we managed to greatly reduce the fluctuations present within the caplet
Black volatilities.
[Figure: SABR; Black vol against strike K [%]; series: market caplets, SABR]
Fig. 18.8. SABR short maturity caplet smile
Above, we have taken a set of different strikes for a six month caplet whose
exercise date is fairly near, starting in five years’ time. We see that without the
SABR, the spline interpolates the data not too badly, but still has a few irregularities. The
SABR proves to be a lot smoother, and the adjustment is relatively close.
[Figure: SABR; Black vol against strike K [%]; series: market caplets, SABR]
Fig. 18.9. SABR long maturity caplet smile
Here with long maturities (28 years) for a six month caplet, we see that the
extrapolated data is extremely irregular. This is mainly because the data available
from the market is for caps that have 25 year or 30 year maturities. Interpolating
caplets at six month intervals across a five year gap can degenerate rapidly. The
SABR proves to be an extremely good approximation.
[Figure: SABR smile; Black vol against strikes; series: linear interpolated caps, SABR interpolation]
Fig. 18.10. SABR very short 6 month maturity: sharp smile
For very low maturities the smile can be quite sharp. SABR tends to widen it
slightly, and smoothen the curvature on the slopes. Notice that there is a displacement
between ‘at the money’ levels. We shall discuss this characteristic later.
[Figure: SABR smile; Black vol against strike [%]; series: linear cap interpolation, SABR interpolation]
Fig. 18.11. SABR short maturity caplet smile inexact correction: very irregular smile
SABR also prevents overshoots (we typically see these occurring at low strikes),
maintaining the same global volatility level and capturing the general smile convexity.
The SABR that we have implemented has a particular characteristic: it has
the flexibility to perform an independent smile evaluation, and then a global volatility
level evaluation.
We usually have a set of data points that reproduce the general form of the
smile. However, it is also common to have an independent market quoted value for
the ‘at the money’ volatility. The two are not necessarily obtained from the same set of
data, meaning that they may sometimes not be coherent. See below how the data
points are inconsistent: we obtain a general smile that does not pass through the ‘at
the money’ point.
[Figure: smile fitting; Black vol against strike [%]; series: ‘at the money’, general smile]
Fig. 18.12. Difference in general smile and ‘at the money’ level
For this reason, we use one set of data (without the ‘at the money’) to generate the
smile, and we then displace the curve vertically, thus forcing it to pass through the
desired ‘at the money’ value.
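The vertical displacement described above amounts to adding a constant offset to the fitted smile. A minimal sketch, assuming the smile is available as a function of the strike; the names and the quadratic toy smile are hypothetical:

```python
def shift_to_atm(smile, atm_strike, atm_vol):
    """Return the smile displaced vertically so that it passes through (atm_strike, atm_vol)."""
    offset = atm_vol - smile(atm_strike)
    return lambda k: smile(k) + offset


# Toy smile fitted without the ATM quote, then corrected to an ATM vol of 0.16.
general_smile = lambda k: 0.15 + 2.0 * (k - 0.045) ** 2
corrected = shift_to_atm(general_smile, atm_strike=0.045, atm_vol=0.16)
```

The shape of the smile is untouched; only its level changes, which is why the corrected curve is no longer an exact least-squares fit to the remaining points.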
[Figure: smile fitting; Black vol against strike [%]; series: ‘at the money’, general smile, corrected]
Fig. 18.13. Smile correction towards ‘at the money level’
18.3 3D Analysis
If we were to analyse the form of the entire caplet volatility surface, we would see
in a summarized way, the improvements achieved at each phase of the interpolation
process.
18.3.1 Interpolation in Maturities
[Figure: caplet volatility surface, linear cap interpolation; Black vol against strike and maturity [years]]
Fig. 18.14. Initial linear interpolated caplet volatility surface
[Figure: caplet volatility surface, cubic smile interpolation; Black vol against strike and maturity [years]]
Fig. 18.15. Cubic Spline caplet volatility surface
[Figure: caplet volatility surface, SABR; Black vol against strike and maturity [years]]
Fig. 18.16. SABR smooth interpolated smile surface with cubic spline
We see from a comparison of the above three caplet volatility surfaces that the
cubic spline successfully smooths the surface along the maturity axis. Notice how
the surface is particularly irregular with linear interpolation. Notice also that the
SABR may have slightly different (and, may we say, better) values along the maturity
axis, despite having used the same cubic spline algorithm as Fig. 18.15, because an
SABR fitting has been performed afterwards.
18.3.2 Smile Analysis
[Figure: two panels, ‘Linear or Cubic Smile Interpolation’ and ‘SABR Smile’; Black vol against strike and maturity [years]]
Fig. 18.17. Irregular smile for both linear interpolation and cubic spline, whereas
SABR presents a much smoother outline
We clearly see from the above two graphs that we have successfully achieved the
smile interpolation that we set out to obtain. The visual comparison is self explanatory.
18.3.3 Detailed Analysis
While working with our HJM model, we came across a problem of particular
interest. The caplet volatility surface that the bank was generating presented a
particularly annoying bump ‘at the money’. This was giving rise to immense
difficulties in the product calibration process. The anomaly tended to occur especially
amongst smiles with low maturities, often between one and two years. We discovered
that it occurred at a strike around 4,5%, which coincided exactly with a column of
strikes that was being artificially created through linear interpolation. Avoiding this
linear interpolation between strikes was already capable of eliminating a great portion
of the bump.
The implementation of a constrained cubic spline cap interpolation yielded the
pink curve below. As can be seen, the cubic spline improves the smile tremendously.
However, it does not really solve the problem, for it seems to simply displace the
bump further towards lower strikes.
[Figure: current ‘bump’; Black vol against strike [%]; series: linear cap interpolation, constrained cubic interpolation]
Fig. 18.18. Smile bump ‘at the money level’ in linear cap interpolation; maturity of 1,5
years
That the anomaly should be located below 2,5% is extremely suitable for our
purposes. It means that for strikes larger than this, the entire curve appears to be very
smooth and thus valid. Furthermore, discussions with traders have confirmed that
strikes lower than 2,5% are not liquid enough, and so are not even traded. This implies
that traders already tend to reject the problematic region below 2,5%, meaning that
our anomaly would not come into play.
A last factor to support the view that the strange ‘elbow’ below K = 2,25% is not of
great importance is that these values are no longer going to be quoted by REUTERS
(from which we currently obtain our data), meaning that the entire problem would
disappear.
A further discussion on the above subject could extend into analysing what the
SABR smoothing produces in the above situation, and what the ‘conversion from 3
months to six months’ algorithm yields. Recall what was discussed in section 16.16
where the 3 month quotes were converted to artificial 6 month quotes. Remember also
how this only affected the data between the 1 year and 2 year time-periods. All other
dates coincided exactly with the cubic spline general method.
The main difference we observe among the below graphs is that the 3 month
version also smoothens out the bump problem, but does so at a higher volatility level,
i.e., it provides a smooth smile tangent almost to the top of the bump. Notice that its
‘elbow’ is also displaced.
[Figure: smile comparison; Black vol against strike [%]; series: linear cap interpolation, constrained cubic spline, 3M to 6M cubic spline]
Fig. 18.19. Caplet 3M to 6M smile for a maturity of 1,5 years
From a quant’s perspective, we cannot really choose which of the spline curves is
more correct visually. It is necessary for the curves to be implemented and used in
calibrations in order to see whether the prices they yield for certain products coincide
with the market prices.
We present below the variations amongst the three curves for the 1,5 and 2 year
maturities. Beyond this, cubic and ‘3M to 6M’ are identical. Notice that in these cases
the 3 month correction lies below the other two. Thus, the correction seems to present
greater fluctuations in volatility level than the other two curves.
[Figure: two panels, ‘Maturity 1Y’ and ‘Maturity 2Y’; Black vol against strike [%]; series: linear cap interpolation, constrained cubic interpolation, 3M to 6M]
Fig. 18.20. Comparisons in cubic and 3M to 6M adjustments
18.3.4 The SABR Strike Adjustment
A detailed analysis of the SABR curve’s dynamics proved that the adjustment
was not perfect. There are several important features to notice when applying an
SABR. Recall that we had implemented an algorithm that fitted the SABR curve to a
specific volatility level. This level must be taken at a point of interest and relevance.
Below, we present a case in which it was taken for a strike of 3,08%. It is clear that
the adjustment of the SABR to the cubic spline curve is exact at the corresponding
volatility level of 0,158. However, the adjustment in other areas of the graph –
especially around the ‘at the money’ region and beyond- is a lot worse.
[Figure: SABR adjusted at K = 3,08; Black volatility against strike [%]; series: linear cap interpolation, cubic spline, SABR adjusted at K = 3,08]
Fig. 18.21. SABR strike adjustment
[Figure: SABR adjusted at K = 4,5; Black volatility against strike [%]; series: linear cap interpolation, cubic spline, SABR adjusted at the money]
Fig. 18.22. SABR at the money strike adjustment, β = 1
Above in Fig. 18.22, for a strike at the money, the general level is much better
achieved. However, we notice a possibly problematic ‘undershoot’ in the ‘at the
money’ region.
It would be necessary, as we said previously, to analyse through calibrations
whether the above is in fact a correct, viable option, or whether the SABR is producing
a form that is too pronounced.
Aiming now to analyse the flexibility of the SABR model, we proceeded to vary
some of its parameters to see whether the form of the curve can be modified at will.
Our initial intention is to avoid the undershoot and to resemble the cubic spline
interpolation curve as closely as possible, while avoiding its peculiar elbow
for low strikes.
Before proceeding any further, it is necessary to state clearly that the SABR is not
an exact method, whereas the cubic spline is. Thus, the SABR does not satisfy the
condition that
$$\sum \text{caplets} = \text{forward cap}$$
but instead constructs its caplets through a best fit minimisation procedure that gives
more importance to the fitting of a smooth curve.
18.3.5 The Beta Parameter
A first idea was to refer back to the SABR model and to realise that the β
parameter, which we initially thought of as playing a minor role, could have a much
greater impact on our graphs than we initially expected. Recall that the SABR
model was:
$$dF = \alpha F^{\beta}\, dW_1, \qquad F(0) = f$$
$$d\alpha = \nu \alpha\, dW_2, \qquad \alpha(0) = \alpha \qquad (18.1)$$
This means that β = 1 yields a flat lognormal model, whereas β = 0 yields a
normal skew model.
Notice how in Fig. 18.22 we use a flat lognormal model. This means the SABR
can only adapt to the caplet points with a very pronounced curvature that
undershoots. A more skewed model is capable of solving the problem. See below
how, with β = 0, the problem is already greatly reduced.
[Figure: SABR, β = 0; Black vol against strike [%]; series: linear cap interpolation, constrained cubic spline, 3M to 6M cubic spline]
Fig. 18.23. SABR β = 0, normal skew; maturity 1 year
However, for long maturities that present a very flat smile, the lognormal model
turns out to be much more adequate than the skew. See below for a detailed analysis.
18.3.6 SABR Weights
Recall that we had initially implemented the weights as
$$w_i = \frac{1}{1 + (K_i - F)^2} \qquad (18.2)$$
This implies that we give greater weights to the values that are closest to the ‘at
the money’ strike, whereas more distant values do not have such a great impact in
determining the form of the SABR. We will see now that maintaining the β parameter
and simply modifying the weights enables us to decide in which area we want to
stress our SABR. We can also choose to consider all areas as being equally important.
We shall therefore compare the above weighting scheme with a homogeneous form,
wi = 1.
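The two weighting schemes can be sketched as follows. The function name is hypothetical, and the strikes and forward are assumed to be expressed in percent: with (18.2), the units matter, because strikes expressed as decimals would make every weight almost equal to 1.

```python
def sabr_weights(strikes, forward, uniform=False):
    """Weights of (18.2), or the homogeneous scheme w_i = 1 when uniform is True."""
    if uniform:
        return [1.0 for _ in strikes]
    return [1.0 / (1.0 + (k - forward) ** 2) for k in strikes]


# Strikes and forward in percent (an assumption about units).
weighted = sabr_weights([1.5, 4.5, 9.5], forward=4.5)
homogeneous = sabr_weights([1.5, 4.5, 9.5], forward=4.5, uniform=True)
```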
18.3.7 Global Analysis
We present below the cases that we believe best summarize the range of tests
performed. They analyse how well the different parameters within the SABR
model suit the fitting of the caplet volatility smile.
[Figure: four panels of Black vol against strike [%]: ‘SABR β=1, 1,5Y, weighted’; ‘SABR β=1, 1,5Y, w=1’; ‘SABR β=0, 1,5Y, weighted’; ‘SABR β=0,5, 1,5Y, w=1’]
Fig. 18.24. SABR comparisons between long and short maturities, varying the β and
the weights
We see clearly that for short maturities, the caplet curve has a pronounced
skew, meaning that the normal skew model β = 0 adapts better, with less undershoot.
As for the weighting, it seems as if the weighted parameters act better for short
maturities.
For flat long maturities, both β curves are capable of adapting relatively well,
although the β = 1 flat lognormal tends to be slightly better. The greatest impact is
achieved through the variation of the weights attributed to each region. Below, we see
that the uniformly weighted fits encompass the curve from below.
[Figure: two panels, ‘SABR β=1, 8Y, w=1’ and ‘SABR β=1, 8Y, weighted’; Black vol against strike [%]; series include linear cap interpolation and constrained cubic]
Fig. 18.25. Weighted SABR, β = 1, Maturity 8Y
We conclude from the above study that an even weighting of wi = 1 seems
more adequate for our caplet volatility surface. Furthermore, it seems that the ideal
adjustment does not consist in a unique β parameter. Instead, we see how skewed
models with β = 0 are more adequate for short maturities, whereas flat lognormal
models tend to be better for long maturities.
We could consider implementing this variation of β in our final model, or
perhaps even attempt to calibrate our model for the best possible β.
18.4 Algorithm
We proceed now to outline the algorithm that was finally implemented. From all
the above alternatives, we finally decided to perform a cubic spline cap interpolation
on the initial cap data along the maturity direction; other interpolation methods or
even the optimisation algorithms can alternatively be selected for use at this stage.
From these interpolated exact caps we easily construct the interpolated
capforwards, and from these extract the corresponding (and therefore also exact)
caplets. This can be done with or without a 3 month to 6 month adjustment.
If the user wants to further smooth the curve, he has the option of
selecting an SABR fit, where he also has the freedom to choose the beta factor and
weights that he prefers. From this, the final caplet matrix is recalculated.
[Flow chart]
1. Inputs: strikes K, maturities T, market flat σCaps.
2. Maturity interpolation. TYPE I: cubic spline, quadratic or linear. TYPE II: quadratic optimisation, linear optimisation or piecewise constant.
3. Adjustment option for the 0,5 to 2Y caplets: either 3M = 6M caplets, or 3M to 6M caplets adjusted.
4. Capforward creation.
5. Caplets extracted (strikes K, maturities T).
6. SABR? If no: final cubic spline σCaplets. If yes: SABR fit for each maturity (caplet matrix row), giving the final SABR σCaplets (strikes K, maturities T).
Fig. 18.26. Caplet volatility surface construction algorithm
18.5 Future Developments
Along this line, we have still not closed off any of the alternatives, but have left all the
SABR parameters and interpolation or optimisation algorithms available to the end
trader. Depending on his calibrations and results, an optimum set of parameters could
be selected for the final implementation. From our initial tests however, it seems that
the best combinations always include cubic splines. These could be sufficient on their
own if an exact calculation is desired. For further smoothness, we have found an SABR
to be necessary, ideally with constant weights, w = 1, and with a beta parameter
varying from β = 0 for low maturities to β = 1 for high maturities.
There is a further possibility to include extra data in the caplet volatility matrix.
We did not have time to implement the algorithm, but it is the logical next step if such
information is to be added to the adjustment.
REUTERS or other market quotes often add an independent column to the cap
matrix that we have used so far, which includes ‘at the money’ caplet quotes for each
maturity. The difficulty in incorporating them into the matrices already used is
that they do not correspond to a single strike; instead, each maturity
has its own strike.
We present here the procedure that we recommend following.
CAPS           STRIKES                                                                               ATM
Fixing  End    1,5            2            3            4            5            6            Vol        Strike
0,5     1      σU=1,K=1.5     σU=1,K=2     σU=1,K=3     σU=1,K=4     σU=1,K=5     σU=1,K=6     σ1 ATM     K1
0,5     1,5    σU=1.5,K=1.5   σU=1.5,K=2   σU=1.5,K=3   σU=1.5,K=4   σU=1.5,K=5   σU=1.5,K=6   σ1,5 ATM   K1,5
0,5     2      σU=2,K=1.5     σU=2,K=2     σU=2,K=3     σU=2,K=4     σU=2,K=5     σU=2,K=6     σ2 ATM     K2
0,5     3      σU=3,K=1.5     σU=3,K=2     σU=3,K=3     σU=3,K=4     σU=3,K=5     σU=3,K=6     σ3 ATM     K3
0,5     4      σU=4,K=1.5     σU=4,K=2     σU=4,K=3     σU=4,K=4     σU=4,K=5     σU=4,K=6     σ4 ATM     K4
Fig. 18.27. Cap market quotes
1. We are going to perform a step construction. The first idea consists in taking
the first row from the cap market quotes (0,5 to 1Y). We know that for these, we can
either perform a three month to six month adjustment, or directly take them as being
equal to the six month caplets. With our row of caplets, we also have our ATM cap
volatility with its particular strike. For this first step, we can also consider the ATM
cap as being equal to the caplet volatility.
Caps = Caplets = Capforwards
0,5     1      σU=1,K=1.5     σU=1,K=2     σU=1,K=3     σU=1,K=4     σU=1,K=5     σU=1,K=6     σ1 ATM     K1
With all this data, and having constructed the forwards and σATM necessary for the
SABR, we can proceed to create our SABR smile for this first row. (Notice that the
σATM for the cap and the σATM for the caplet are two separate entities. The first is
quoted by the market and is used as an additional point in our curve. The caplet’s ATM
is calculated directly from its corresponding forward, as stated in 17.2.2.)
[Figure: Black vol against strike [%]]
2. For the second row in the cap matrix (0,5Y to 1,5Y), we can no longer consider
the caps and caplets as having equal volatilities. Now, using our current row and the
previous cap row, we can construct the corresponding cap forward as the difference of
their prices.
CapForward = Cap0,5to1,5Y − Cap0,5to1Y
Cap:
0,5     1      σU=1,K=1.5     σU=1,K=2     σU=1,K=3     σU=1,K=4     σU=1,K=5     σU=1,K=6
0,5     1,5    σU=1.5,K=1.5   σU=1.5,K=2   σU=1.5,K=3   σU=1.5,K=4   σU=1.5,K=5   σU=1.5,K=6   σ1,5 ATM   K1,5
CapForward = Caplet (from subtracting prices and inverting Black Scholes):
1       1,5    σU=1.5,K=1.5   σU=1.5,K=2   σU=1.5,K=3   σU=1.5,K=4   σU=1.5,K=5   σU=1.5,K=6
This must be directly equal to the corresponding caplets, as the capforward is
constructed over a unique 6 month interval.
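The "subtract prices, then invert" step can be sketched as follows. This is an illustration only: it assumes the standard Black formula for a caplet, flat cap volatilities applied to every caplet in the cap, and a bisection for the inversion. The function names and the numerical inputs are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_caplet(fwd, strike, vol, expiry, accrual, discount):
    """Black price of a caplet fixing at `expiry`, discounted from its payment date."""
    sd = vol * math.sqrt(expiry)
    d1 = (math.log(fwd / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    return discount * accrual * (fwd * norm_cdf(d1) - strike * norm_cdf(d2))


def cap_price(caplets, strike, flat_vol):
    """Cap price with one flat vol; caplets are (fwd, expiry, accrual, discount) tuples."""
    return sum(black_caplet(f, strike, flat_vol, t, a, d) for f, t, a, d in caplets)


def implied_caplet_vol(price, fwd, strike, expiry, accrual, discount,
                       lo=1e-6, hi=5.0, tol=1e-12):
    """Invert the Black caplet price by bisection (the price increases with vol)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_caplet(fwd, strike, mid, expiry, accrual, discount) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


strike = 0.04
caplet_1 = (0.040, 0.5, 0.5, 0.96)   # fixes at 0,5Y, pays at 1Y
caplet_2 = (0.042, 1.0, 0.5, 0.94)   # fixes at 1Y, pays at 1,5Y
cap_1y = cap_price([caplet_1], strike, 0.16)               # flat vol of the 1Y cap
cap_15y = cap_price([caplet_1, caplet_2], strike, 0.17)    # flat vol of the 1,5Y cap
forward_cap = cap_15y - cap_1y
caplet_vol = implied_caplet_vol(forward_cap, 0.042, strike, 1.0, 0.5, 0.94)
```

Because the flat cap volatility rises from the first row to the second in this example, the extracted caplet volatility lies above the 1,5Y flat volatility.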
We cannot however construct the K1,5 ATM capforward by subtracting the ATM
cap from the previous row’s ATM 1Y cap as they correspond to different strikes (K1
310
and K1,5 respectively). To construct the K1,5 ATM capforward we need to extract the
corresponding previous row’s K1,5Y cap by interpolation. (where we have included
the previous row’s ATM in the data)
K1,5 ATM
Fixing End 1,5 2 3 K1 ATM 4 5 6

0,5 1 σU=1,K=1.5 σU =1, K=2 σU=1,K=3 σ1 ATM K1 σU=1,K= 4 σU=1,K= 5 σU=1,K= 6
σ1Y ATM K1,5Y
Now with this interpolated (red arrow) σU=1 K(K1,5Y), the capforward ATM is
constructed by subtraction of the two caps with equal K1,5ATM strike.
σ1,5 ATM K1,5 σ1Y ATM K1,5Y σ1,5 ATM K1,5
The caplets are then created and exported to be smoothed through the SABR fit.
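The strike interpolation used above to read the previous cap row at the new ATM strike can be sketched as follows. The text does not state which interpolation scheme is applied across strikes at this step, so linear interpolation is used here purely for brevity; the function name and all quotes are hypothetical.

```python
def interpolate_vol_in_strike(strikes, vols, target_strike):
    """Linearly interpolate a volatility at target_strike (flat beyond the ends)."""
    points = sorted(zip(strikes, vols))
    if target_strike <= points[0][0]:
        return points[0][1]
    if target_strike >= points[-1][0]:
        return points[-1][1]
    for (k0, v0), (k1, v1) in zip(points, points[1:]):
        if k0 <= target_strike <= k1:
            weight = (target_strike - k0) / (k1 - k0)
            return v0 + weight * (v1 - v0)


# Hypothetical 0,5Y-1Y cap row: fixed strikes in %, plus the row's own ATM point.
fixed_strikes = [1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
fixed_vols = [0.165, 0.155, 0.138, 0.128, 0.131, 0.137]
k1_atm, vol_1y_atm = 3.4, 0.132      # previous row's ATM strike and quote
k15_atm = 3.6                        # ATM strike of the 0,5Y-1,5Y cap

# The previous row's ATM quote is added as an extra node before interpolating.
row_strikes = fixed_strikes + [k1_atm]
row_vols = fixed_vols + [vol_1y_atm]
vol_1y_at_k15 = interpolate_vol_in_strike(row_strikes, row_vols, k15_atm)
```

With both caps now quoted at the same strike K1,5 ATM, their prices can be subtracted to obtain the ATM capforward. A smoother scheme could replace the linear rule without changing the surrounding logic.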
[Figure: SABR smile for the second caplet row. Vertical axis: Black Vol (0,10 to 0,17); horizontal axis: Strike [%] (1,5 to 9,5).]
3. The 0,5Y to 2Y cap row is analogous to the former.
4. The following row proves to be different. The 0,5Y to 3Y cap row would produce capforwards composed of two 6 month caplets. Instead, we must take all the previous cap rows and use a cubic spline interpolator to generate an intermediate fictitious 2,5Y cap row.
Caps:
Fixing  End   1,5           2             3             4             5             6             ATM
0,5     2     σU=2,K=1.5    σU=2,K=2      σU=2,K=3      σU=2,K=4      σU=2,K=5      σU=2,K=6      σ2 ATM (K2)
0,5     3     σU=3,K=1.5    σU=3,K=2      σU=3,K=3      σU=3,K=4      σU=3,K=5      σU=3,K=6      σ3 ATM (K3)

Cubic spline cap interpolation:
Fixing  End   1,5           2             3             4             5             6
0,5     2     σU=2,K=1.5    σU=2,K=2      σU=2,K=3      σU=2,K=4      σU=2,K=5      σU=2,K=6
0,5     2,5   σU=2.5,K=1.5  σU=2.5,K=2    σU=2.5,K=3    σU=2.5,K=4    σU=2.5,K=5    σU=2.5,K=6
0,5     3     σU=3,K=1.5    σU=3,K=2      σU=3,K=3      σU=3,K=4      σU=3,K=5      σU=3,K=6
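The maturity interpolation that produces the fictitious 2,5Y cap row can be illustrated for one strike column as follows. This is a sketch under assumptions: it interpolates quoted volatilities rather than prices, it uses a constrained (overshoot-limiting) cubic in which interior slopes are the harmonic mean of adjacent secant slopes and are set to zero at local extrema, and the quotes are hypothetical. The spline actually implemented in the project may differ in detail.

```python
def constrained_slopes(xs, ys):
    """Node slopes: harmonic mean of neighbouring secants, zero at turning points."""
    if len(xs) < 3:
        raise ValueError("need at least three nodes")
    secants = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    slopes = [0.0] * len(xs)
    for i in range(1, len(xs) - 1):
        left, right = secants[i - 1], secants[i]
        if left * right > 0.0:
            slopes[i] = 2.0 / (1.0 / left + 1.0 / right)
    # End slopes are tied to the first and last secants and the adjacent node slope.
    slopes[0] = 1.5 * secants[0] - 0.5 * slopes[1]
    slopes[-1] = 1.5 * secants[-1] - 0.5 * slopes[-2]
    return slopes


def spline_value(xs, ys, x):
    """Evaluate the piecewise cubic (Hermite form) at x, for x inside the node range."""
    slopes = constrained_slopes(xs, ys)
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * slopes[i] + h01 * ys[i + 1] + h11 * h * slopes[i + 1]


# Hypothetical cap volatilities for one strike column, by cap maturity in years.
maturities = [1.0, 1.5, 2.0, 3.0]
cap_vols = [0.150, 0.158, 0.162, 0.160]
vol_25y = spline_value(maturities, cap_vols, 2.5)   # fictitious 2,5Y cap quote
```

Repeating the call for each strike column yields the whole intermediate row. Because the slope at the 2Y node is set to zero where the quotes turn, the interpolated 2,5Y value stays between its two neighbours instead of overshooting.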
With it we can extract a first intermediate forward cap volatility (2 to 2,5Y), from which the 2,5Y maturity caplets can easily be extracted. The 3Y maturity caplets are now constructed as
CapForward 2 to 3Y = Caplet2 to 2,5Y + Caplet2,5 to 3Y
We obtain the caplets:
Fixing  End   1,5           2             3             4             5             6
2       2,5   σU=2.5,K=1.5  σU=2.5,K=2    σU=2.5,K=3    σU=2.5,K=4    σU=2.5,K=5    σU=2.5,K=6
2       3     σU=3,K=1.5    σU=3,K=2      σU=3,K=3      σU=3,K=4      σU=3,K=5      σU=3,K=6
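Written out in prices for a given strike K, the decomposition above reads as follows. The subscript notation is introduced here only for illustration; all quantities are prices, and each caplet volatility is then recovered by inverting the Black formula on the corresponding price.

```latex
\begin{aligned}
\text{CapForward}_{2\to3Y}(K) &= \text{Cap}_{0.5\to3Y}(K) - \text{Cap}_{0.5\to2Y}(K) \\
\text{Caplet}_{2\to2.5Y}(K) &= \text{Cap}_{0.5\to2.5Y}(K) - \text{Cap}_{0.5\to2Y}(K) \\
\text{Caplet}_{2.5\to3Y}(K) &= \text{CapForward}_{2\to3Y}(K) - \text{Caplet}_{2\to2.5Y}(K)
\end{aligned}
```

The last line is equivalent to subtracting the interpolated 2,5Y cap price from the quoted 3Y cap price.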
The difficulty arises now with the extra ATM cap volatility to be included. Notice that it has a different strike, K3Y, from the previous row's K2Y ATM. We interpolate an intermediate strike for the artificially created 2,5Y caps.
$$K_{2,5Y\,ATM} = \frac{K_{3Y\,ATM} + K_{2Y\,ATM}}{2}$$
The corresponding σU=2,5 K2,5Y is obtained by interpolating in the new cap row:
K2,5 ATM
With this artificial strike, we must also now interpolate in the 0,5 to 2Y cap row to
extract the corresponding cap quote.
K2,5 ATM
Fixing End 1,5 2 3 K2 ATM 4 5 6

0,5 2 σK=2, U=1.5 σK=2,U= 2 σK=2,U= 3 σ2 ATM K2 σK=2,U= 4 σK=2, U=5 σK=2,U=6
With the newly obtained σU=2 K(K2,5Y), and the previous σU=2,5 K2,5Y, the capforward
for the 2,5Y strike can be created, and from their subtracted prices, the caplet can be
constructed.
σ2,5 ATM K2,5Y
Analogously, the 3Y maturity ATM cap requires that we interpolate in the 2,5Y
cap row the corresponding cap for the 3Y ATM strike.
K3 ATM
σ2,5 ATM K3Y
With it, we can then construct the capforward at this 3Y ATM strike and from it,
extract the caplets.
σ3 ATM K3 σ2,5 ATM K3Y σ2,5 ATM K3Y
The two rows of caplets can now be sent to the SABR algorithm to construct their
smoothened smiles.
2 2,5 σU=1,5,K= 2,5 σK=2.5, U=2 σK=2.5, U=3 σK=2.5,U=4 σK=2.5,U=5 σK=2.5,U=6 σ2,5 ATM K2,5Y
2 3 σK=3,U=1.5 σK=3,U= 2 σK=3, U=3 σK=3,U= 4 σK=3,U= 5 σK=3,U=6 σ3 ATM K3Y
[Figure: SABR smiles for the 2,5Y and 3Y caplet rows. Vertical axis: Black Vol (0,04 to 0,16); horizontal axis: Strike [%] (1,5 to 9,5).]
In fact, the above could have been performed in bulk rather than through a stepwise process. Notice that the SABR creation is independent of the rest of the process, as we never use the data which it provides. Therefore, we could simply have constructed the entire cubic spline caplet matrix, incorporating a final ATM entry with a variable strike. This would be an exact matrix on which a further SABR could be constructed if necessary.
CAPLETS       STRIKES                                                                             ATM
Fixing  End   1,5           2             3             4             5             6             K          Vol
0,5     1     σU=1,K=1.5    σU=1,K=2      σU=1,K=3      σU=1,K=4      σU=1,K=5      σU=1,K=6      K1 ATM     σ1 ATM (K1)
1       1,5   σU=1.5,K=1.5  σU=1.5,K=2    σU=1.5,K=3    σU=1.5,K=4    σU=1.5,K=5    σU=1.5,K=6    K1,5 ATM   σ1,5 ATM (K1,5)
1,5     2     σU=2,K=1.5    σU=2,K=2      σU=2,K=3      σU=2,K=4      σU=2,K=5      σU=2,K=6      K2 ATM     σ2 ATM (K2)
2       2,5   σU=2.5,K=1.5  σU=2.5,K=2    σU=2.5,K=3    σU=2.5,K=4    σU=2.5,K=5    σU=2.5,K=6    K2,5 ATM   σ2,5 ATM (K2,5Y)
2       3     σU=3,K=1.5    σU=3,K=2      σU=3,K=3      σU=3,K=4      σU=3,K=5      σU=3,K=6      K3 ATM     σ3 ATM (K3Y)
3       4     σU=4,K=1.5    σU=4,K=2      σU=4,K=3      σU=4,K=4      σU=4,K=5      σU=4,K=6      K4 ATM     σ4 ATM (K4)
19. Summary and Conclusions
In this project, we studied the Heath-Jarrow-Morton framework, seeking to optimize its speed of calibration and its robustness through the use of approximate formulas and other possible alternative methods. We also set out to examine the degree of control that Banco Santander's implemented model had over its solutions, and whether or not these were unique.
The first main problem encountered was the framework’s failure to calibrate
when attempting to model rates whose time to maturity exceeded five years. A crucial
objective was to identify and solve the cause of these errors.
There also appeared to be specific cases in which the HJM program ceased to
calibrate. These cases were isolated so as to examine whether the problem was due to
an anomaly in the market data, or whether it was due to an internal error of the
program.
Lastly, the dependence between the price and our model's sigma parameter proved to be the cause of a limiting value in the model's implied volatility surface. Whenever the true market quoted prices lay below this boundary, we found it impossible for our model to replicate the market prices. The two solutions initially suggested were:
· A modification of the volatility term of the Γ diffusion function for each bond price and, more specifically, a variation of its relationship with the α parameter controlling the log-normality of the distributions.
· A new choice of the underlying statistical distribution to use for the volatility of volatilities parameter λ, in place of the currently used lognormal distribution.
It was the second alternative that was finally selected. After several rounds of analysis, we found that there seemed to be nothing drastically wrong with the theoretical approach followed in the implementation of the HJM framework. As a result, we concluded that the problems being encountered were simply the result of an incomplete model. Evidently, a two parameter framework such as the one proposed could never capture a volatility smile by taking only two strike positions; this proved sufficient only for the creation of skew characteristics. We therefore set out immediately to construct a three strike model.
The three strike enhanced framework is entirely analogous to the former, but it incorporates a stochastic volatility parameter that introduces a new source of randomness into the former dynamics. Indeed, the modification allows us to capture the implied smile, but at a price. Calibrations become ever more tedious, and fail when pricing exotic products whose maturities extend beyond 16 years. Hence it turned out that the two factor model was not itself incomplete, but was simply anticipating the same problems that would later be encountered in the three strike model.
At this point, the entire model and calibration procedure was called into question in order to isolate the particular flaw from which the process was suffering. A first attempt was to modify the plain vanilla extrapolation techniques. The chapter that deals with this aspect proves extremely interesting from the point of view of the optimisation algorithm. It shows a complete deformation in the model volatility surface. That a simple extrapolation technique should behave so differently depending solely on the derivative products taken as inputs raised the hypothesis that perhaps the caplet and swaption input data were inherently inconsistent by construction.
Before resorting to forbidding their joint calibration as a final measure, a careful analysis showed that even the calibrations composed solely of caplets were running into difficulties. The problem was successfully resolved through the selection of an adequate interpolation and extrapolation technique, as mentioned earlier. However, the combined calibration with swaptions still proved unsuccessful.
We decided to tackle the problem from its foundations, that is, to examine in depth the entire caplet process, from the very input of the data into our model straight through to the final results. We found that a possible cause of the problem could be located at the very beginning of the entire process.
Caplet stripping procedures from their corresponding cap quotes are anything but trivial. A very simplistic linear interpolation approach was being used at Banco Santander. However, this was proving largely insufficient for the subsequent calibration procedures. Indeed, it was introducing immense noise at all points at which data was being interpolated, both in the maturity term structure and in the implied volatility smiles across strikes.
We analysed two principal techniques in the extraction of caplet quotes. The first
approach is based on the direct interpolation of cap quotes, thus constructing
capforwards whose duration is exactly the desired 6 or 3 month caplets which we
want to construct. Hence, the caplets are exactly equivalent to the capforward itself,
and so can be obtained directly. This first approach is exact and depends solely on the
interpolation technique implemented. We found that among these, the most
successful was a constrained cubic spline which was capable of producing term
structures that were just as smooth as quadratic optimisation algorithms.
Our second alternative was to obtain the individual caplets by creating a smooth
evolution of caplet values with which to construct the corresponding capforwards.
These capforwards were no longer constructed with 3 or 6 month durations, but were
directly derived from the existent market cap quotes. Thus, the interpolation
algorithm was no longer being performed amongst caps but now within the caplets
themselves that constructed each capforward.
This text presents a wide range of optimisation algorithms to tackle the caplet
fitting to their corresponding capforward. We provide both exact fit analyses, as well
as best fit minimisation approaches. Notably, quadratic optimisation techniques
proved just as efficient as cubic spline methods, yielding almost identical results. We
state here once again, linking with a notable discussion of ‘best fit versus exact fit’
presented in the text, that we have always tended to follow the exact fit procedures.
The final cubic spline that was implemented did away with the sharp features
within the term structure dynamics. Further, it eliminated a particular ‘bump’
anomaly that was present in the short maturity volatility smiles. However, a deeper
analysis of these proved that the smile generated was not as smooth as could be
desired for a successful exotic calibration procedure. As a result, the SABR was
implemented.
In this project we have exhaustively analysed the alternatives that the SABR stochastic interpolator provides. Despite the fact that it is an inexact least-squares approach, the deviations that it introduces for any particular caplet market quote are minimal. We examined the effects of varying each of the parameters within the model, finally arriving at an optimal combination for our caplet volatility surface. This involved a dynamic beta parameter that shifted from a skewed normal model at short maturities to a flat lognormal model at longer maturities. We also examined the effect of varying the weighting scheme attributed to the different regions within the volatility smile. Traditional approaches had commonly used a stronger weighting for the central ‘at the money’ values. We found, however, that this was not necessarily the optimal strategy, and that to capture the pronounced curvature present in short maturity smiles, it was often necessary to attribute an equally strong weighting to the most extreme strike values. Without this modification, the smile was often incapable of curving up sufficiently at its end points.
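The weighted least-squares fit discussed above can be sketched with the standard lognormal volatility expansion of Hagan et al. for SABR. This is an illustration under assumptions: beta is held fixed, the objective is a plain weighted sum of squared volatility errors, and the forward, strikes, parameters and both weight vectors are hypothetical rather than the values used in the project. The formula as written requires |rho| < 1 and alpha > 0.

```python
import math


def sabr_black_vol(forward, strike, expiry, alpha, beta, rho, nu):
    """Hagan et al. lognormal (Black) implied volatility approximation for SABR."""
    one_b = 1.0 - beta
    log_fk = math.log(forward / strike)
    fk_mid = (forward * strike) ** (0.5 * one_b)
    denom = fk_mid * (1.0 + one_b**2 / 24.0 * log_fk**2 + one_b**4 / 1920.0 * log_fk**4)
    z = (nu / alpha) * fk_mid * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0          # at-the-money limit of z / x(z)
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    correction = 1.0 + (one_b**2 / 24.0 * alpha**2 / fk_mid**2
                        + 0.25 * rho * beta * nu * alpha / fk_mid
                        + (2.0 - 3.0 * rho**2) / 24.0 * nu**2) * expiry
    return alpha / denom * z_over_x * correction


def weighted_sse(params, forward, expiry, beta, strikes, market_vols, weights):
    """Weighted sum of squared differences between SABR and market volatilities."""
    alpha, rho, nu = params
    return sum(w * (sabr_black_vol(forward, k, expiry, alpha, beta, rho, nu) - v) ** 2
               for k, v, w in zip(strikes, market_vols, weights))


# Hypothetical caplet smile, generated from known parameters for the illustration.
forward, expiry, beta = 0.04, 1.0, 0.5
strikes = [0.015, 0.02, 0.03, 0.04, 0.05, 0.06]
true_params = (0.03, -0.2, 0.4)          # (alpha, rho, nu)
market_vols = [sabr_black_vol(forward, k, expiry, true_params[0], beta,
                              true_params[1], true_params[2]) for k in strikes]

atm_weights = [1.0, 1.0, 2.0, 4.0, 2.0, 1.0]    # emphasis on the central strikes
wing_weights = [4.0, 1.0, 1.0, 4.0, 1.0, 4.0]   # extremes weighted as strongly as ATM

trial_params = (0.03, -0.2, 0.6)
error_atm = weighted_sse(trial_params, forward, expiry, beta, strikes, market_vols, atm_weights)
error_wing = weighted_sse(trial_params, forward, expiry, beta, strikes, market_vols, wing_weights)
```

Any numerical minimiser can then be applied to `weighted_sse` over (alpha, rho, nu). Changing the weight vector changes which region of the smile the fit reproduces most closely, which is the effect described in the paragraph above.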
During the development of the project, this precise calculation became critical for the bank's interest rate analysts. The ‘bump’ in the most relevant region of the smile was a problem that required an immediate solution. The program developed in this project successfully dealt with it, and has been passed on directly to traders. It will be ported immediately from its Visual Basic environment to C++ so as to implement it at Banco Santander in Madrid, as well as in its headquarters in New York and Latin America.
Another major objective of this project was to arrive at an analytical, approximate formula that would make use of, and correctly forecast, the three volatility parameters within the global volatility expression Γ: α, σ, λ. The task was entirely experimental and mathematical, and encompassed a great part of our work. An initial proposition was found under the assumption that the dynamics of the forward swap rate followed
However, the above was far from sufficient to create a consistent approximation. Moreover, specific analytic expressions had to be found for the two principal volatility parameters, alpha and sigma. As is clear from the above formulation, our initial approach only attempted to reproduce the two strike, skewed HJM dynamics, leaving the development of a three strike model for a later stage.
As is demonstrated in the text, we were faced with a very wide range of alternatives for the parameters to select. In general, all formulations converged to a unique expression for sigma. However, we were not as fortunate with the alpha parameter. For this, we ended up selecting a weighted version which not only proved to be the simplest alternative mathematically, but also turned out to be the most effective in successfully performing calibrations.
The next development consisted in extending the analytic approximation's formulation to a two factor scenario. That is, we incorporated a multifactor approach into the previous model through the use of two Brownian variables that were correlated in terms of a common theta parameter through sines and cosines. This extension appeared relatively simple at first, but the introduction of the new variables gave rise to a set of equations that was over-determined. Strangely, we once again had a common sigma parameter that was obtained through practically any procedure we decided to undertake. Its implementation in the one strike two factor models proved to be extremely powerful and accurate. The alpha, on the other hand, was an entirely different matter. Every possible approach to the extraction of an analytic formula for the alpha from the original equations yielded an entirely different expression. Moreover, most of these either left out the vast majority of the equations, and so were only consistent with one or two of them, or else were directly incapable of producing adequate calibration results.
We finally narrowed down our analysis to two main expressions, one of which
was consistent with all the analytic expressions, and the other simply being a mean
weighting of the various alpha expressions, consequence of an intuitive insight and
with no mathematical background. Further, we had the lingering alternative of simply
using the 1 factor expression for alpha, inputting it directly into the two factor model
as if it were completely independent of the two different Brownian terms.
Of course, it was the mathematically consistent formulation which finally yielded the best results, and that proved capable of solving the longest and most troublesome two factor calibrations.
The discovery of such an analytic approximation formula was immediately implemented in the trader systems. It enabled these to bypass the time consuming MonteCarlo or Tree simulations by providing instead a first guess to the final HJM solution parameters. This first guess is so close to the final exact solution that the subsequent MonteCarlo algorithm needs only two or three further iterations before arriving at the final solution. What is more, the analytic approach also generates an approximate Jacobian expression that we can substitute for the MonteCarlo's. This proves to be a great advantage in computation time, as the MonteCarlo Jacobian is calculated through finite difference bumping techniques that are extremely long and tiresome. With the analytic Jacobian, we need not recompute these Jacobians, as it directly provides the necessary approximate slopes.
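The idea of seeding the iterative calibration with an analytic first guess, and of replacing the bumped Jacobian by an approximate analytic one, can be illustrated on a toy problem. Everything below is hypothetical: `slow_model` merely stands in for the expensive MonteCarlo pricer, the "analytic approximation" is taken to be the identity map (so its Jacobian is the identity matrix and its inverse gives the first guess), and the problem is two-dimensional so that a 2x2 solve suffices.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def calibrate(price_fn, target, first_guess, jacobian_fn, tol=1e-10, max_iter=50):
    """Newton-type iteration; returns the parameters and the number of steps taken."""
    params = list(first_guess)
    for steps in range(max_iter):
        residual = [p - t for p, t in zip(price_fn(params), target)]
        if max(abs(r) for r in residual) < tol:
            return params, steps
        update = solve_2x2(jacobian_fn(params), residual)
        params = [p - u for p, u in zip(params, update)]
    return params, max_iter


def slow_model(params):
    """Toy stand-in for the expensive pricer (hypothetical)."""
    p0, p1 = params
    return [p0 + 0.1 * p1 * p1, p1 + 0.1 * p0 * p0]


def bumped_jacobian(params, bump=1e-6):
    """Finite-difference Jacobian: one extra model evaluation per parameter."""
    base = slow_model(params)
    columns = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += bump
        bumped = slow_model(shifted)
        columns.append([(b - a) / bump for a, b in zip(base, bumped)])
    return [[columns[j][i] for j in range(len(params))] for i in range(len(base))]


def analytic_jacobian(params):
    """Jacobian of the assumed analytic approximation (identity map), no model calls."""
    return [[1.0, 0.0], [0.0, 1.0]]


true_params = [0.2, 0.3]
target = slow_model(true_params)
first_guess = list(target)      # inverse of the assumed analytic approximation

fit_analytic, steps_analytic = calibrate(slow_model, target, first_guess, analytic_jacobian)
fit_bumped, steps_bumped = calibrate(slow_model, target, first_guess, bumped_jacobian)
```

Both variants recover the same parameters. The version with the approximate Jacobian may take a few more steps, but each step costs a single model evaluation, whereas every bumped Jacobian requires additional evaluations of the expensive model.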
As a result, the implementation of the analytical approximation at Banco Santander has successfully reduced by a factor of ten the time that traders need to spend on each exotic product's calibration. This reduction in calculation time results in a much friendlier tool for these traders, who ideally need immediate predictions for the exotic products they operate with. Reducing the calibration's duration thus makes them more efficient, allowing them to analyse many more exotic products in the same time.
