
CALCUL STOCHASTIQUE EN FINANCE

Peter Tankov
peter.tankov@polytechnique.edu
Nizar Touzi
nizar.touzi@polytechnique.edu
Ecole Polytechnique Paris
Département de Mathématiques Appliquées
September 2010
Contents
1 Introduction
1.1 European and American options
1.2 No dominance principle and first properties
1.3 Put-Call Parity
1.4 Bounds on call prices and early exercise of American calls
1.5 Risk effect on options prices
1.6 Some popular examples of contingent claims
2 A first approach to the Black-Scholes formula
2.1 The single period binomial model
2.2 The Cox-Ross-Rubinstein model
2.3 Valuation and hedging in the Cox-Ross-Rubinstein model
2.4 Continuous-time limit
3 Some preliminaries on continuous-time processes
3.1 Filtration and stopping times
3.1.1 Filtration
3.1.2 Stopping times
3.2 Martingales and optional sampling
3.3 Maximal inequalities for submartingales
3.4 Complement: Doob's optional sampling for discrete martingales
4 The Brownian Motion
4.1 Definition of the Brownian motion
4.2 The Brownian motion as a limit of a random walk
4.3 Distribution of the Brownian motion
4.4 Scaling, symmetry, and time reversal
4.5 Brownian filtration and the Zero-One law
4.6 Small/large time behavior of the Brownian sample paths
4.7 Quadratic variation
4.8 Complement
5 Stochastic integration with respect to the Brownian motion
5.1 Stochastic integrals of simple processes
5.2 Stochastic integrals of processes in $\mathbb{H}^2$
5.2.1 Construction
5.2.2 The stochastic integral as a continuous process
5.2.3 Martingale property and the Itô isometry
5.2.4 Deterministic integrands
5.3 Stochastic integration beyond $\mathbb{H}^2$ and Itô processes
5.4 Complement: density of simple processes in $\mathbb{H}^2$
6 Itô Differential Calculus
6.1 Itô's formula for the Brownian motion
6.2 Extension to Itô processes
6.3 Lévy's characterization of Brownian motion
6.4 A verification approach to the Black-Scholes model
6.5 The Ornstein-Uhlenbeck process
6.5.1 Distribution
6.5.2 Differential representation
6.6 Application to the Merton optimal portfolio allocation problem
6.6.1 Problem formulation
6.6.2 The dynamic programming equation
6.6.3 Solving the Merton problem
7 Martingale representation and change of measure
7.1 Martingale representation
7.2 The Cameron-Martin change of measure
7.3 Girsanov's theorem
7.4 Novikov's criterion
7.5 Application: the martingale approach to the Black-Scholes model
7.5.1 The continuous-time financial market
7.5.2 Portfolio and wealth process
7.5.3 Admissible portfolios and no-arbitrage
7.5.4 Super-hedging and no-arbitrage bounds
7.5.5 The no-arbitrage valuation formula
8 Stochastic differential equations
8.1 First examples
8.2 Strong solution of a stochastic differential equation
8.2.1 Existence and uniqueness
8.2.2 The Markov property
8.3 More results for scalar stochastic differential equations
8.4 Linear stochastic differential equations
8.4.1 An explicit representation
8.4.2 The Brownian bridge
8.5 Connection with linear partial differential equations
8.5.1 Generator
8.5.2 Cauchy problem and the Feynman-Kac representation
8.5.3 Representation of the Dirichlet problem
8.6 The hedging portfolio in a Markov financial market
8.7 Application to importance sampling
8.7.1 Importance sampling for random variables
8.7.2 Importance sampling for stochastic differential equations
9 The Black-Scholes model and its extensions
9.1 The Black-Scholes approach for the Black-Scholes formula
9.2 The Black and Scholes model for European call options
9.2.1 The Black-Scholes formula
9.2.2 Black's formula
9.2.3 Option on a dividend paying stock
9.2.4 The Garman-Kohlhagen model for exchange rate options
9.2.5 The practice of the Black-Scholes model
9.2.6 Hedging with constant volatility: robustness of the Black-Scholes model
9.3 Complement: barrier options in the Black-Scholes model
9.3.1 Barrier options prices
9.3.2 Dynamic hedging of barrier options
9.3.3 Static hedging of barrier options
10 Local volatility models and Dupire's formula
10.1 Implied volatility
10.2 Local volatility models
10.2.1 CEV model
10.3 Dupire's formula
10.3.1 Dupire's formula in practice
10.3.2 Link between local and implied volatility
11 Gaussian interest rates models
11.1 Fixed income terminology
11.1.1 Zero-coupon bonds
11.1.2 Interest rates swaps
11.1.3 Yields from zero-coupon bonds
11.1.4 Forward Interest Rates
11.1.5 Instantaneous interest rates
11.2 The Vasicek model
11.3 Zero-coupon bonds prices
11.4 Calibration to the spot yield curve and the generalized Vasicek model
11.5 Multiple Gaussian factors models
11.6 Introduction to the Heath-Jarrow-Morton model
11.6.1 Dynamics of the forward rates curve
11.6.2 The Heath-Jarrow-Morton drift condition
11.6.3 The Ho-Lee model
11.6.4 The Hull-White model
11.7 The forward neutral measure
11.8 Derivatives pricing under stochastic interest rates and volatility calibration
11.8.1 European options on zero-coupon bonds
11.8.2 The Black-Scholes formula under stochastic interest rates
12 Introduction to financial risk management
12.1 Classification of risk exposures
12.1.1 Market risk
12.1.2 Credit risk
12.1.3 Liquidity risk
12.1.4 Operational risk
12.1.5 Model risk
12.2 Risk exposures and risk limits: sensitivity approach to risk management
12.3 Value at Risk and the global approach
12.4 Convex and coherent risk measures
12.5 Regulatory capital and the Basel framework
A Preliminaries of measure theory
A.1 Measurable spaces and measures
A.1.1 Algebras, $\sigma$-algebras
A.1.2 Measures
A.1.3 Elementary properties of measures
A.2 The Lebesgue integral
A.2.1 Measurable function
A.2.2 Integration of nonnegative functions
A.2.3 Integration of real functions
A.2.4 From a.e. convergence to $L^1$ convergence
A.2.5 Lebesgue integral and Riemann integral
A.3 Transforms of measures
A.3.1 Image measure
A.3.2 Measures defined by densities
A.4 Notable inequalities
A.5 Product spaces
A.5.1 Construction and integration
A.5.2 Image measure and change of variable
A.6 Appendix to Chapter A
A.6.1 $\pi$-system, $d$-system and uniqueness of measures
A.6.2 Outer measure and extension of measures
A.6.3 Proof of the monotone class theorem
B Preliminaries of probability theory
B.1 Random variables
B.1.1 $\sigma$-algebra generated by a random variable
B.1.2 Distribution of a random variable
B.2 Expectation of random variables
B.2.1 Random variables with a density
B.2.2 Jensen's inequalities
B.2.3 Characteristic function
B.3 $L^p$ spaces and functional convergences of random variables
B.3.1 Geometry of the space $L^2$
B.3.2 Spaces $\mathcal{L}^p$ and $L^p$
B.3.3 Spaces $\mathcal{L}^0$ and $L^0$
B.3.4 Link between $L^p$ convergence, convergence in probability and a.s. convergence
B.4 Convergence in distribution
B.4.1 Definitions
B.4.2 Characterization of convergence in distribution by distribution functions
B.4.3 Convergence of distribution functions
B.4.4 Convergence in distribution and characteristic functions
B.5 Independence
B.5.1 Independent $\sigma$-algebras
B.5.2 Independent random variables
B.5.3 Asymptotics of sequences of independent events
B.5.4 Asymptotics of averages of independent random variables
Chapter 1
Introduction
Financial mathematics is a young field of applied mathematics which has
experienced huge growth during the last thirty years. It is by now considered
one of the most challenging fields of applied mathematics, owing to the
diversity of the questions it raises and the high technical skills it requires.
These lecture notes provide an introduction to stochastic finance for
third-year students of Ecole Polytechnique. Our objective is to cover the basic
Black-Scholes theory from the modern martingale approach. This requires the
development of the necessary tools from stochastic calculus and their connection
with partial differential equations.
Modelling financial markets by continuous-time stochastic processes was
initiated by Louis Bachelier (1900) in his thesis dissertation under the
supervision of Henri Poincaré. Bachelier's work was not recognized until
recently. Sixty years later, Samuelson (Nobel Prize in economics 1970) came back
to this idea, suggesting a Brownian motion with constant drift as a model for
stock prices. However, the real success of Brownian motion in financial
applications was achieved by Fisher Black, Myron Scholes, and Robert Merton
(Nobel Prize in economics 1997), who founded the modern theory of financial
mathematics between 1969 and 1973 by introducing the portfolio theory and
the no-arbitrage pricing arguments. Since then, this theory has gained
considerably in rigour and precision, essentially thanks to the martingale
theory developed in the eighties.
Although continuous-time models are more demanding from the technical
viewpoint, they are widely used in the financial industry because of the
simplicity of the resulting formulae for pricing and hedging. This is related to
the powerful tools of differential calculus which are available only in
continuous time. We shall first provide a self-contained introduction to the
main concepts of stochastic analysis: Brownian motion, stochastic integration
with respect to the Brownian motion, Itô's formula, the Girsanov change of
measure theorem, the connection with the heat equation, and stochastic
differential equations. We then consider the Black-Scholes continuous-time
financial market where the no-arbitrage concept is sufficient for the
determination of market prices of derivative securities. Prices are expressed in
terms of the unique risk-neutral measure, and can be computed in closed form for
a large set of relevant derivative securities. Chapter 11 presents the main
concepts of interest rate models in the Gaussian case.
In order to motivate the remaining content of these lecture notes, we would
like to draw the reader's attention to the following major difference between
financial engineering and more familiar applied sciences. Mechanical engineering
is based on Newton's fundamental laws. Electrical engineering is based on the
Maxwell equations. Fluid mechanics is governed by the Navier-Stokes and the
Bernoulli equations. Thermodynamics rests on the fundamental laws of
conservation of energy and entropy increase. These principles and laws are
derived from empirical observation, and are sufficiently robust for the future
development of the corresponding theory.
In contrast, financial markets do not obey any fundamental law except the
simple no-dominance principle, which states that valuation obeys a trivial
monotonicity rule; see Section 1.2 below.
Consequently, there is no universally accurate model in finance. Financial
modelling is instead based upon comparison between assets. The Black-Scholes
model derives the price of an option by comparison to the underlying asset price.
But in practice, more information is available and one has to incorporate the
relevant information in the model. For this purpose, the Black-Scholes model
is combined with convenient techniques for calibration to the available relevant
information. Notice that the information differs from one market to another,
and the relevance criterion depends on the objective for which the model is built
(prediction, hedging, risk management...). Therefore, the nature of the model
depends on the corresponding market and its final objective.
So again, there is no universal model, and any proposed model is wrong.
Financial engineering is about building convenient tools in order to make these
wrong models less wrong. This is achieved by accounting for all relevant
information, and using only the no-dominance law, or its stronger version of
no-arbitrage. An introduction to this important aspect is contained in Chapter
10.
Given this major limitation of financial modelling, a most important issue
is to develop tools which measure the risk borne by any financial position
and any model used in its management. The importance of this activity was
highlighted by past financial crises, and emphasized even more during the
recent subprime financial crisis. Chapter 12 provides the main tools and ideas
in this area.
In the remainder of this introduction, we introduce the reader to the main
notions in derivative securities markets. We shall focus on some popular
examples of derivative assets, and we provide some properties that their prices
must satisfy independently of the distribution of the primitive assets prices.
The only ingredient which will be used in order to derive these properties is
the no dominance principle introduced in Section 1.2 below.
1.1 European and American options
The most popular examples of derivative securities are European and American
call and put options. More examples of contingent claims are listed in Section
1.6 below.
A European call option on the asset $S^i$ is a contract where the seller
promises to deliver the risky asset $S^i$ at the maturity $T$ for some given
exercise price, or strike, $K > 0$. At time $T$, the buyer has the possibility
(and not the obligation) to exercise the option, i.e. to buy the risky asset
from the seller at strike $K$. Of course, the buyer would exercise the option
only if the price which prevails at time $T$ is larger than $K$. Therefore, the
gain of the buyer out of this contract is
$$B = (S^i_T - K)^+ = \max\{S^i_T - K, 0\},$$
i.e. if the time $T$ price of the asset $S^i$ is larger than the strike $K$,
then the buyer receives the payoff $S^i_T - K$, which corresponds to the benefit
from buying the asset from the seller of the contract rather than on the
financial market. If the time $T$ price of the asset $S^i$ is smaller than the
strike $K$, the contract is worthless for the buyer.
A European put option on the asset $S^i$ is a contract where the seller
promises to purchase the risky asset $S^i$ at the maturity $T$ for some given
exercise price, or strike, $K > 0$. At time $T$, the buyer has the possibility,
and not the obligation, to exercise the option, i.e. to sell the risky asset to
the seller at strike $K$. Of course, the buyer would exercise the option only if
the price which prevails at time $T$ is smaller than $K$. Therefore, the gain of
the buyer out of this contract is
$$B = (K - S^i_T)^+ = \max\{K - S^i_T, 0\},$$
i.e. if the time $T$ price of the asset $S^i$ is smaller than the strike $K$,
then the buyer receives the payoff $K - S^i_T$, which corresponds to the benefit
from selling the asset to the seller of the contract rather than on the
financial market. If the time $T$ price of the asset $S^i$ is larger than the
strike $K$, the contract is worthless for the buyer, as he can sell the risky
asset for a larger price on the financial market.
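The two payoff formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `call_payoff` and `put_payoff` and the numerical values are assumptions chosen for the example, not part of the notes.

```python
def call_payoff(s_T: float, strike: float) -> float:
    """Gain of the buyer of a European call at maturity: (S_T - K)^+."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T: float, strike: float) -> float:
    """Gain of the buyer of a European put at maturity: (K - S_T)^+."""
    return max(strike - s_T, 0.0)


if __name__ == "__main__":
    strike = 100.0  # illustrative strike, not taken from the text
    for s_T in (80.0, 100.0, 120.0):
        # Print terminal price, call payoff and put payoff side by side.
        print(s_T, call_payoff(s_T, strike), put_payoff(s_T, strike))
```

For any terminal price at most one of the two payoffs is positive, which matches the discussion of when each option is exercised.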
An American call (resp. put) option with maturity $T$ and strike $K > 0$
differs from the corresponding European contract in that it offers the
possibility to be exercised at any time before maturity (and not only at the
maturity).
The seller of a derivative security requires compensation for the risk that
he is bearing. In other words, the buyer must pay the price, or premium,
for the benefit of the contract. The main interest of this course is to
determine this price. In the subsequent sections of this introduction, we
introduce the no dominance principle, which already allows us to obtain some
model-free properties of options that hold both in discrete and continuous-time
models.
In the subsequent sections, we shall consider call and put options with
exercise price (or strike) $K$, maturity $T$, and written on a single risky
asset with price $S$. At every time $t \le T$, the American and the European
call option prices are respectively denoted by
$$C(t, S_t, T, K) \quad\text{and}\quad c(t, S_t, T, K).$$
Similarly, the prices of the American and the European put options are
respectively denoted by
$$P(t, S_t, T, K) \quad\text{and}\quad p(t, S_t, T, K).$$
The intrinsic values of the call and the put options are respectively:
$$C(t, S_t, t, K) = c(t, S_t, t, K) = (S_t - K)^+,$$
$$P(t, S_t, t, K) = p(t, S_t, t, K) = (K - S_t)^+,$$
i.e. the value received upon immediate exercise of the option. An option is
said to be in-the-money if its intrinsic value is positive, at-the-money if
$K = S_t$, and out-of-the-money otherwise. Thus a call option is in-the-money
if $S_t > K$, while a put option is in-the-money if $S_t < K$.
Finally, a zero-coupon bond is the discount bond defined by the fixed income
1 at the maturity $T$. We shall denote by $B_t(T)$ its price at time $t$. Given
the prices of zero-coupon bonds of all maturities, the price at time $t$ of any
stream of deterministic payments $F_1, \ldots, F_n$ at the maturities
$t < T_1 < \ldots < T_n$ is given by
$$F_1 B_t(T_1) + \ldots + F_n B_t(T_n).$$
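As a minimal sketch of the formula above, the price of a deterministic payment stream is a weighted sum of zero-coupon bond prices. The function name `stream_price` and the sample bond prices are hypothetical and serve only to illustrate the computation.

```python
from typing import Sequence


def stream_price(payments: Sequence[float], bond_prices: Sequence[float]) -> float:
    """Price at time t of payments F_1..F_n, given bond prices B_t(T_1)..B_t(T_n)."""
    if len(payments) != len(bond_prices):
        raise ValueError("one zero-coupon bond price is needed per payment")
    # F_1 * B_t(T_1) + ... + F_n * B_t(T_n)
    return sum(f * b for f, b in zip(payments, bond_prices))


if __name__ == "__main__":
    # Illustrative coupon bond: two coupons of 5, then 105 at the last maturity.
    payments = [5.0, 5.0, 105.0]
    bond_prices = [0.97, 0.94, 0.91]  # assumed values of B_t(T_1), B_t(T_2), B_t(T_3)
    print(stream_price(payments, bond_prices))
```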
1.2 No dominance principle and first properties
We shall assume that there are no market imperfections such as transaction
costs, taxes, or portfolio constraints, and we will make use of the following
concept.
No dominance principle. Let $X$ be the gain from a portfolio strategy with
initial cost $x$. If $X \ge 0$ in every state of the world, then $x \ge 0$.
1. Notice that choosing to exercise the American option at the maturity $T$
provides the same payoff as the European counterpart. Then the portfolio
consisting of a long position in the American option and a short position in the
European counterpart has at least a zero payoff at the maturity $T$. It then
follows from the no dominance principle that American calls and puts are at
least as valuable as their European counterparts:
$$C(t, S_t, T, K) \ge c(t, S_t, T, K) \quad\text{and}\quad P(t, S_t, T, K) \ge p(t, S_t, T, K).$$
2. By a similar easy argument, we now show that American and European call
(resp. put) options prices are decreasing (resp. increasing) in the exercise
price, i.e. for $K_1 \ge K_2$:
$$C(t, S_t, T, K_1) \le C(t, S_t, T, K_2) \quad\text{and}\quad c(t, S_t, T, K_1) \le c(t, S_t, T, K_2),$$
$$P(t, S_t, T, K_1) \ge P(t, S_t, T, K_2) \quad\text{and}\quad p(t, S_t, T, K_1) \ge p(t, S_t, T, K_2).$$
Let us justify this for the case of American call options. If the holder of the
low exercise price call adopts the optimal exercise strategy of the high exercise
price call, the payoff of the low exercise price call will be at least as high
in all states of the world. Hence, the value of the low exercise price call must
be no less than the price of the high exercise price call.
3. American/European call and put prices are convex in $K$. Let us justify this
property for the case of American call options. For an arbitrary time instant
$u \in [t, T]$ and $\lambda \in [0, 1]$, it follows from the convexity of the
intrinsic value that
$$\lambda (S_u - K_1)^+ + (1 - \lambda)(S_u - K_2)^+ - \big(S_u - [\lambda K_1 + (1 - \lambda) K_2]\big)^+ \ge 0.$$
We then consider a portfolio $X$ consisting of a long position of $\lambda$
calls with strike $K_1$, a long position of $(1 - \lambda)$ calls with strike
$K_2$, and a short position of a call with strike
$\lambda K_1 + (1 - \lambda) K_2$. If the first two options are exercised on the
optimal exercise date of the third option, the resulting payoff is non-negative
by the above convexity inequality. Hence, the value at time $t$ of the portfolio
is non-negative.
4. We next show the following result for the sensitivity of European call
options with respect to the exercise price: for $K_1 < K_2$,
$$-B_t(T) \le \frac{c(t, S_t, T, K_2) - c(t, S_t, T, K_1)}{K_2 - K_1} \le 0.$$
The right-hand side inequality follows from the decrease of the European call
option price $c$ in $K$. To see that the left-hand side inequality holds,
consider the portfolio $X$ consisting of a short position of the European call
with exercise price $K_1$, a long position of the European call with exercise
price $K_2$, and a long position of $K_2 - K_1$ zero-coupon bonds. The value of
this portfolio at the maturity $T$ is
$$X_T = -(S_T - K_1)^+ + (S_T - K_2)^+ + (K_2 - K_1) \ge 0.$$
By the no dominance principle, this implies that
$$-c(t, S_t, T, K_1) + c(t, S_t, T, K_2) + B_t(T)(K_2 - K_1) \ge 0,$$
which is the required inequality.
5. American call and put prices are increasing in maturity, i.e. for
$T_1 \le T_2$:
$$C(t, S_t, T_1, K) \le C(t, S_t, T_2, K) \quad\text{and}\quad P(t, S_t, T_1, K) \le P(t, S_t, T_2, K).$$
This is a direct consequence of the fact that all stopping strategies of the
shorter maturity option are allowed for the longer maturity one. Notice that
this argument is specific to the American case.
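The payoff inequalities behind properties 3 and 4 above can be checked numerically on a grid of prices. The sketch below is an illustration under assumed values: the helper names (`call_payoff`, `convexity_gap`, `spread_portfolio_value`) and the strikes are hypothetical, and a numerical check on a few points is of course not a proof.

```python
def call_payoff(s: float, strike: float) -> float:
    """Intrinsic value of a call: (s - strike)^+."""
    return max(s - strike, 0.0)


def convexity_gap(s: float, k1: float, k2: float, lam: float) -> float:
    """lam*(s-K1)^+ + (1-lam)*(s-K2)^+ - (s - [lam*K1 + (1-lam)*K2])^+."""
    k_mix = lam * k1 + (1.0 - lam) * k2
    return (lam * call_payoff(s, k1)
            + (1.0 - lam) * call_payoff(s, k2)
            - call_payoff(s, k_mix))


def spread_portfolio_value(s_T: float, k1: float, k2: float) -> float:
    """X_T = -(S_T-K1)^+ + (S_T-K2)^+ + (K2-K1), assuming K1 < K2."""
    return -call_payoff(s_T, k1) + call_payoff(s_T, k2) + (k2 - k1)


if __name__ == "__main__":
    k1, k2 = 90.0, 110.0  # illustrative strikes with K1 < K2
    for s in (70.0, 95.0, 100.0, 105.0, 130.0):
        # Both quantities should be non-negative for every price s.
        print(s, convexity_gap(s, k1, k2, 0.5), spread_portfolio_value(s, k1, k2))
```

Both printed quantities are non-negative at every grid point, which is the state-by-state inequality that the no dominance principle turns into a statement about prices.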
1.3 Put-Call Parity
When the underlying security pays no income before the maturity of the options,
the prices of calls and puts are related by
$$p(t, S_t, T, K) = c(t, S_t, T, K) - S_t + K B_t(T).$$
Indeed, let $X$ be the portfolio consisting of a long position of a European
put option and one unit of the underlying security, and a short position of a
European call option and $K$ zero-coupon bonds. The value of this portfolio at
the maturity $T$ is
$$X_T = (K - S_T)^+ + S_T - (S_T - K)^+ - K = 0.$$
By the no dominance principle, the value of this portfolio at time $t$ is
non-negative. Applying the same argument to the opposite portfolio $-X$ shows
that this value is also non-positive, which provides the required equality.
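The parity relation and the vanishing terminal value of the portfolio $X$ can be illustrated as follows. This is a sketch under assumptions: the function names and the numerical inputs (call price, spot, bond price) are hypothetical and chosen only for illustration.

```python
def put_from_call(call_price: float, spot: float, strike: float,
                  bond_price: float) -> float:
    """European put price implied by parity: p = c - S_t + K * B_t(T)."""
    return call_price - spot + strike * bond_price


def parity_portfolio_payoff(s_T: float, strike: float) -> float:
    """X_T = (K - S_T)^+ + S_T - (S_T - K)^+ - K, which is identically zero."""
    return max(strike - s_T, 0.0) + s_T - max(s_T - strike, 0.0) - strike


if __name__ == "__main__":
    # Assumed market data: call price 10, spot 100, strike 100, B_t(T) = 0.95.
    print(put_from_call(10.0, 100.0, 100.0, 0.95))
    for s_T in (60.0, 100.0, 140.0):
        # The terminal value of the parity portfolio vanishes in every state.
        print(s_T, parity_portfolio_payoff(s_T, 100.0))
```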
Notice that this argument is specific to European options. We shall see in
fact that the corresponding result does not hold for American options.
Finally, if the underlying asset pays out some dividends, then the above
argument breaks down because one should account for the dividends received
by holding the underlying asset $S$. If we assume that the dividends are known
in advance, i.e. non-random, then it is an easy exercise to adapt the put-call
parity to this context. However, if the dividends are subject to uncertainty as
in real life, there is no direct way to adapt the put-call parity.
1.4 Bounds on call prices and early exercise of
American calls
1. From the monotonicity of American calls in terms of the exercise price, we
see that
$$c(t, S_t, T, K) \le C(t, S_t, T, K) \le S_t.$$
When the underlying security pays no dividends before maturity, we have the
following lower bound on call options prices:
$$C(t, S_t, T, K) \ge c(t, S_t, T, K) \ge (S_t - K B_t(T))^+.$$
Indeed, consider the portfolio $X$ consisting of a long position of a European
call, a long position of $K$ $T$-maturity zero-coupon bonds, and a short
position of one share of the underlying security. The required result follows
from the observation that the final value at the maturity of the portfolio is
non-negative, and the application of the no dominance principle.
2. Assume that interest rates are positive. Then, an American call on a security
that pays no dividend before the maturity of the call will never be exercised
early. Indeed, let $u$ be an arbitrary instant in $[t, T)$,
- the American call pays $S_u - K$ if exercised at time $u$,
- but $S_u - K < S_u - K B_u(T)$ because interest rates are positive.
- Since $C(u, S_u, T, K) \ge S_u - K B_u(T)$, by the lower bound, the American
option is always worth more than its exercise value, so early exercise is never
optimal.
3. Assume that the security price can take values arbitrarily close to zero.
Then, early exercise of American put options may be optimal before maturity.
Suppose the security price at some time $u$ falls so deeply that
$S_u < K - K B_u(T)$.
- Observe that the maximum value that the American put can deliver if
exercised at maturity is $K$.
- The immediate exercise value at time $u$ is
$K - S_u > K - [K - K B_u(T)] = K B_u(T)$, the discounted value of the maximum
amount that the put could pay if held to maturity.
Hence, in this case waiting until maturity to exercise is never optimal.
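The lower bound of property 1 and the early exercise argument of property 2 can be illustrated numerically. The sketch below uses hypothetical helper names and assumed inputs; `bond_price` stands for $B_u(T)$, taken strictly below 1 to reflect positive interest rates.

```python
def call_lower_bound(spot: float, strike: float, bond_price: float) -> float:
    """Lower bound (S_t - K * B_t(T))^+ on call prices (no dividends)."""
    return max(spot - strike * bond_price, 0.0)


def bound_portfolio_payoff(s_T: float, strike: float) -> float:
    """Terminal value of: long call, long K zero-coupon bonds, short one share."""
    return max(s_T - strike, 0.0) + strike - s_T


def early_exercise_gap(spot: float, strike: float, bond_price: float) -> float:
    """Lower bound on the American call value minus its exercise value S_u - K."""
    return call_lower_bound(spot, strike, bond_price) - (spot - strike)


if __name__ == "__main__":
    # Assumed data: spot 120, strike 100, bond price 0.95 (positive rates).
    print(call_lower_bound(120.0, 100.0, 0.95))
    print(early_exercise_gap(120.0, 100.0, 0.95))  # positive: do not exercise early
    for s_T in (80.0, 100.0, 130.0):
        print(s_T, bound_portfolio_payoff(s_T, 100.0))  # never negative
```

The gap is positive whenever the bond price is below 1, which is the numerical counterpart of the statement that the American call is worth more than its exercise value.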
1.5 Risk effect on options prices
1. The value of a portfolio of European/American call/put options, with common
strike and maturity, always exceeds the value of the corresponding basket
option.
Indeed, let $S^1, \ldots, S^n$ be the prices of $n$ securities, and consider the
portfolio composition $\lambda_1, \ldots, \lambda_n \ge 0$ (weights summing to
one). By sublinearity of the maximum,
$$\sum_{i=1}^n \lambda_i \max\{S^i_u - K, 0\} \ge \max\Big\{\sum_{i=1}^n \lambda_i S^i_u - K, 0\Big\},$$
i.e. if the portfolio of options is exercised on the optimal exercise date of
the option on the portfolio, the payoff on the former is never less than that on
the latter. By the no dominance principle, this shows that the portfolio of
options is more valuable than the corresponding basket option.
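2. For a security with spot price $S_t$ and price at maturity $S_T$, we denote
its return by
$$R_t(T) := \frac{S_T}{S_t}.$$
Definition. Let $R^i_t(T)$, $i = 1, 2$, be the returns of two securities. We say
that security 2 is more risky than security 1 if
$$R^2_t(T) = R^1_t(T) + \varepsilon \quad\text{where}\quad E\big[\varepsilon \,\big|\, R^1_t(T)\big] = 0.$$
As a consequence, if security 2 is more risky than security 1, the above
definition implies that $\mathrm{Cov}[R^1_t(T), \varepsilon] = 0$ (by
conditioning on $R^1_t(T)$), and therefore
$$\mathrm{Var}\big[R^2_t(T)\big] = \mathrm{Var}\big[R^1_t(T)\big] + \mathrm{Var}[\varepsilon] + 2\,\mathrm{Cov}[R^1_t(T), \varepsilon] = \mathrm{Var}\big[R^1_t(T)\big] + \mathrm{Var}[\varepsilon] \ge \mathrm{Var}\big[R^1_t(T)\big].$$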
3. We now assume that the pricing functional is continuous in some sense to be
made precise below, and we show that the value of a European/American call/put
is increasing in its riskiness.
To see this, let $R := R_t(T)$ be the return of the security, and consider the
set of riskier securities with returns $R^i := R^i_t(T)$ defined by
$$R^i = R + \varepsilon_i \quad\text{where the } \varepsilon_i \text{ are iid and } E[\varepsilon_i \,|\, R] = 0.$$
Let $C^i(t, S_t, T, K)$ be the price of the American call option with payoff
$(S_t R^i - K)^+$, and $\overline{C}^n(t, S_t, T, K)$ be the price of the basket
option defined by the payoff
$$\Big(\frac{1}{n} \sum_{i=1}^n S_t R^i - K\Big)^+ = \Big(S_T + \frac{1}{n} \sum_{i=1}^n S_t \varepsilon_i - K\Big)^+.$$
We have previously seen that the portfolio of options with common maturity
and strike is worth more than the corresponding basket option:
$$C^1(t, S_t, T, K) = \frac{1}{n} \sum_{i=1}^n C^i(t, S_t, T, K) \ge \overline{C}^n(t, S_t, T, K).$$
Observe that the final payoff of the basket option satisfies
$\overline{C}^n(T, S_T, T, K) \to (S_T - K)^+$ a.s. as $n \to \infty$ by the law
of large numbers. Then, assuming that the pricing functional is continuous, it
follows that $\overline{C}^n(t, S_t, T, K) \to C(t, S_t, T, K)$, and therefore
$$C^1(t, S_t, T, K) \ge C(t, S_t, T, K).$$
Notice that the result holds due to the convexity of European/American call/put
options payoffs.
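The sublinearity inequality used in property 1 of this section (a portfolio of options pays at least as much as the option on the portfolio) can be sketched as follows. The function names, prices and weights are hypothetical; the weights are assumed non-negative and summing to one.

```python
from typing import Sequence


def portfolio_of_calls_payoff(prices: Sequence[float], weights: Sequence[float],
                              strike: float) -> float:
    """sum_i lambda_i * (S^i - K)^+ : weighted portfolio of individual calls."""
    return sum(w * max(s - strike, 0.0) for w, s in zip(weights, prices))


def basket_call_payoff(prices: Sequence[float], weights: Sequence[float],
                       strike: float) -> float:
    """(sum_i lambda_i * S^i - K)^+ : call on the weighted basket."""
    basket = sum(w * s for w, s in zip(weights, prices))
    return max(basket - strike, 0.0)


if __name__ == "__main__":
    prices = [80.0, 130.0]   # illustrative security prices
    weights = [0.5, 0.5]     # assumed weights, non-negative and summing to one
    print(portfolio_of_calls_payoff(prices, weights, 100.0))
    print(basket_call_payoff(prices, weights, 100.0))
```

With these inputs the portfolio of calls pays more than the basket call, because the loss on the first security cannot offset the gain on the second when the options are held separately.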
1.6 Some popular examples of contingent claims
Example 1.1. (Basket call and put options) Given a subset $I$ of indices in
$\{1, \ldots, n\}$ and a family of positive coefficients $(a_i)_{i \in I}$, the
payoff of a Basket call (resp. put) option is defined by
$$B = \Big(\sum_{i \in I} a_i S^i_T - K\Big)^+ \quad\text{resp.}\quad \Big(K - \sum_{i \in I} a_i S^i_T\Big)^+.$$
Example 1.2. (Option on a non-tradable underlying variable) Let $U_t(\omega)$ be
the time $t$ realization of some observable state variable. Then the payoff of a
call (resp. put) option on $U$ is defined by
$$B = (U_T - K)^+ \quad\text{resp.}\quad (K - U_T)^+.$$
For instance, a Temperature call option corresponds to the case where $U_t$ is
the temperature at time $t$ observed at some location (defined in the contract).
Example 1.3. (Asian option) An Asian call option on the asset $S^i$ with
maturity $T > 0$ and strike $K > 0$ is defined by the payoff at maturity:
$$\big(\overline{S}^i_T - K\big)^+,$$
where $\overline{S}^i_T$ is the average price process on the time period
$[0, T]$. With this definition, there is still a choice for the type of Asian
option at hand. One can define $\overline{S}^i_T$ to be the arithmetic mean over
a given finite set of dates (outlined in the contract), or the continuous
arithmetic mean...
Example 1.4. (Barrier call options) Let $B, K > 0$ be two given parameters, and $T > 0$ a given maturity. There are four types of barrier call options on the asset $S^i$ with strike $K$, barrier $B$ and maturity $T$:
When $B > S_0$:
- an Up and Out Call option is defined by the payoff at the maturity $T$:
\[ UOC_T = (S_T - K)^+ \, 1_{\{\max_{[0,T]} S_t \le B\}} . \]
The payoff is that of a European call option if the price process of the underlying asset never crosses the barrier $B$ before maturity. Otherwise it is zero (the contract knocks out).
- an Up and In Call option is defined by the payoff at the maturity $T$:
\[ UIC_T = (S_T - K)^+ \, 1_{\{\max_{[0,T]} S_t > B\}} . \]
The payoff is that of a European call option if the price process of the underlying asset crosses the barrier $B$ before maturity. Otherwise it is zero (the contract never knocks in). Clearly,
\[ UOC_T + UIC_T = C_T \]
is the payoff of the corresponding European call option.
When $B < S_0$:
- a Down and Out Call option is defined by the payoff at the maturity $T$:
\[ DOC_T = (S_T - K)^+ \, 1_{\{\min_{[0,T]} S_t \ge B\}} . \]
The payoff is that of a European call option if the price process of the underlying asset never falls below the barrier $B$ before maturity. Otherwise it is zero (the contract knocks out).
- a Down and In Call option is defined by the payoff at the maturity $T$:
\[ DIC_T = (S_T - K)^+ \, 1_{\{\min_{[0,T]} S_t < B\}} . \]
The payoff is that of a European call option if the price process of the underlying asset crosses the barrier $B$ before maturity. Otherwise it is zero (the contract never knocks in). Clearly,
\[ DOC_T + DIC_T = C_T \]
is the payoff of the corresponding European call option.
Example 1.5. (Barrier put options) Replace calls by puts in the previous example.
Chapter 2
A first approach to the
Black-Scholes formula
2.1 The single period binomial model
We first study the simplest one-period financial market $T = 1$. Let $\Omega = \{\omega_u, \omega_d\}$, $\mathcal{F}$ the $\sigma$-algebra consisting of all subsets of $\Omega$, and $\mathbb{P}$ a probability measure on $(\Omega, \mathcal{F})$ such that $0 < \mathbb{P}(\omega_u) < 1$.
The financial market contains a non-risky asset with price process
\[ S^0_0 = 1 , \quad S^0_1(\omega_u) = S^0_1(\omega_d) = e^r , \]
and one risky asset ($d = 1$) with price process
\[ S_0 = s , \quad S_1(\omega_u) = su , \quad S_1(\omega_d) = sd , \]
where $s$, $r$, $u$ and $d$ are given strictly positive parameters with $u > d$. Such a financial market can be represented by the binomial tree:
time 0 -> time 1
Risky asset: $S_0 = s$ -> $su$ (up branch) or $sd$ (down branch)
Non-risky asset: $S^0_0 = 1$ -> $R = e^r$
In the terminology of the Introduction Section 1, the above model is the simplest wrong model which illustrates the main features of the valuation theory in financial mathematics.
The discounted prices are defined as the value of the prices relative to the non-risky asset price, and are given by
\[ \tilde{S}_0 := S_0 , \quad \tilde{S}^0_0 := 1 , \quad \text{and} \quad \tilde{S}_1 := \frac{S_1}{R} , \quad \tilde{S}^0_1 := 1 . \]
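A self-financing trading strategy is a pair $(x, \theta) \in \mathbb{R}^2$. The corresponding wealth process at time 1 is given by:
\[ X^{x,\theta}_1 := (x - \theta S_0) R + \theta S_1 , \]
or, in terms of discounted value,
\[ \tilde{X}^{x,\theta}_1 := x + \theta (\tilde{S}_1 - \tilde{S}_0) . \]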
(i) The No-Arbitrage condition: An arbitrage opportunity is a portfolio strategy $\theta \in \mathbb{R}$ such that
\[ X^{0,\theta}_1(\omega_i) \ge 0 , \ i \in \{u, d\} , \quad \text{and} \quad \mathbb{P}[X^{0,\theta}_1 > 0] > 0 . \]
It can be shown that excluding all arbitrage opportunities is equivalent to the condition
\[ d < R < u . \quad (2.1) \]
Exercise 2.1. In the context of the present one-period binomial model, prove that the no-arbitrage condition is equivalent to (2.1).
Then, introducing the equivalent probability measure $\mathbb{Q}$ defined by
\[ \mathbb{Q}[S_1 = u S_0] = 1 - \mathbb{Q}[S_1 = d S_0] = q := \frac{R - d}{u - d} , \quad (2.2) \]
we see that the discounted price process satisfies
\[ \tilde{S} \text{ is a martingale under } \mathbb{Q} , \text{ i.e. } \mathbb{E}^{\mathbb{Q}}[\tilde{S}_1] = S_0 . \quad (2.3) \]
The probability measure $\mathbb{Q}$ is called the risk-neutral measure, or the equivalent martingale measure.
(ii) Hedging contingent claims: A contingent claim is defined by its payoff $B_u := B(\omega_u)$ and $B_d := B(\omega_d)$ at time 1.
In the context of the binomial model, it turns out that there exists a pair $(x^0, \theta^0) \in \mathbb{R}^2$ such that $X^{x^0,\theta^0}_1 = B$. Indeed, the equality $X^{x^0,\theta^0}_1 = B$ is a system of two (linear) equations with two unknowns which can be solved straightforwardly:
\[ x^0(B) = q \frac{B_u}{R} + (1 - q) \frac{B_d}{R} = \mathbb{E}^{\mathbb{Q}}[\tilde{B}] \quad \text{and} \quad \theta^0(B) = \frac{B_u - B_d}{su - sd} , \]
where $\tilde{B} := B/R$ denotes the discounted payoff. The portfolio $(x^0(B), \theta^0(B))$ satisfies $X^{x^0,\theta^0}_1 = B$, and is therefore called a perfect replication strategy for $B$.
(iii) No arbitrage valuation: Suppose that the contingent claim $B$ is available for trading at time 0 with market price $p(B)$, and let us show that, under the no-arbitrage condition, the price $p(B)$ of the contingent claim contract is necessarily given by
\[ p(B) = x^0(B) = \mathbb{E}^{\mathbb{Q}}[\tilde{B}] . \]
(iii-a) Indeed, suppose that $p(B) < x^0(B)$, and consider the following portfolio strategy:
- at time 0, pay $p(B)$ to buy the contingent claim, so as to receive the payoff $B$ at time 1,
- perform the self-financing strategy $(-x^0, -\theta^0)$; this leads to paying $-x^0$ at time 0, and receiving $-B$ at time 1.
The initial capital needed to perform this portfolio strategy is $p(B) - x^0 < 0$. At time 1, the terminal wealth induced by the self-financing strategy exactly compensates the payoff of the contingent claim. We have then built an arbitrage opportunity in the financial market augmented with the contingent claim contract, thus violating the no-arbitrage condition on this enlarged financial market.
(iii-b) If $p(B) > x^0(B)$, we consider the following portfolio strategy:
- at time 0, receive $p(B)$ by selling the contingent claim, so as to pay the payoff $B$ at time 1,
- perform the self-financing strategy $(x^0, \theta^0)$; this leads to paying $x^0$ at time 0, and receiving $B$ at time 1.
The initial capital needed to perform this portfolio strategy is $-p(B) + x^0 < 0$. At time 1, the terminal wealth induced by the self-financing strategy exactly compensates the payoff of the contingent claim. This again defines an arbitrage opportunity in the financial market augmented with the contingent claim contract, thus violating the no-arbitrage condition on this enlarged financial market.
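The replication argument of the one-period model can be checked numerically. The following Python sketch computes the risk-neutral probability $q$ of (2.2), the replication cost $x^0(B)$ and the hedge ratio $\theta^0(B)$, and then verifies that the wealth at time 1 matches the payoff in both states. The function names and the numerical values (a call with strike 100, $u = 1.2$, $d = 0.9$, $r = 0.05$) are illustrative assumptions, not taken from the text.

```python
import math

def one_period_replication(s, u, d, r, payoff_up, payoff_down):
    """Return (q, x0, theta0) in the one-period binomial model."""
    R = math.exp(r)  # gross return of the non-risky asset
    if not d < R < u:
        raise ValueError("no-arbitrage condition d < R < u is violated")
    q = (R - d) / (u - d)                                  # risk-neutral probability (2.2)
    x0 = (q * payoff_up + (1 - q) * payoff_down) / R       # replication cost E^Q[B/R]
    theta0 = (payoff_up - payoff_down) / (s * u - s * d)   # number of risky shares held
    return q, x0, theta0

def terminal_wealth(x, theta, s0, s1, r):
    """Wealth at time 1 of the strategy (x, theta): (x - theta*S_0) R + theta*S_1."""
    return (x - theta * s0) * math.exp(r) + theta * s1

# Illustrative (assumed) parameters: a call option with strike K = 100.
s, u, d, r, K = 100.0, 1.2, 0.9, 0.05, 100.0
B_u, B_d = max(s * u - K, 0.0), max(s * d - K, 0.0)
q, x0, theta0 = one_period_replication(s, u, d, r, B_u, B_d)
wealth_up = terminal_wealth(x0, theta0, s, s * u, r)
wealth_down = terminal_wealth(x0, theta0, s, s * d, r)
print(f"q = {q:.4f}, x0 = {x0:.4f}, theta0 = {theta0:.4f}")
```

Up to rounding, `wealth_up` and `wealth_down` coincide with `B_u` and `B_d`, which is the perfect replication property used in step (iii).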
2.2 The Cox-Ross-Rubinstein model
In this section, we present a dynamic version of the previous binomial model.
Let $\Omega = \{-1, 1\}^{\mathbb{N}}$, and let $\mathcal{F}$ be the Borel $\sigma$-algebra on $\Omega$. Let $(Z_k)_{k \ge 0}$ be a sequence of independent random variables with distribution $\mathbb{P}[Z_k = 1] = \mathbb{P}[Z_k = -1] = 1/2$. (We shall see later that we may replace the value $1/2$ by any parameter in $(0, 1)$, see Remark 2.4.) We consider the trivial $\sigma$-algebra $\mathcal{F}_0 = \{\emptyset, \Omega\}$, $\mathcal{F}_k = \sigma(Z_0, \ldots, Z_k)$ and the filtration $\mathbb{F}^n = \{\mathcal{F}_0, \ldots, \mathcal{F}_n\}$.
Let $T > 0$ be some fixed finite horizon, and $(b_n, \sigma_n)_{n \ge 1}$ the sequence defined by:
\[ b_n = b \frac{T}{n} \quad \text{and} \quad \sigma_n = \sigma \Big( \frac{T}{n} \Big)^{1/2} , \]
where $b$ and $\sigma$ are two given strictly positive parameters.
Remark 2.2. All the results of this section hold true with a sequence $(b_n, \sigma_n)$ satisfying:
\[ n b_n \to bT \quad \text{and} \quad \sqrt{n} \, \sigma_n \to \sigma \sqrt{T} \quad \text{whenever } n \to \infty . \]
For $n \ge 1$, we consider the price process of a single risky asset $S^n = \{S^n_k , k = 0, \ldots, n\}$ defined by:
\[ S^n_0 = s \quad \text{and} \quad S^n_k = s \exp\Big( k b_n + \sigma_n \sum_{i=1}^k Z_i \Big) , \quad k = 1, \ldots, n . \]
The non-risky asset is defined by a constant interest rate parameter $r$, so that the return from a unit investment in the bank during a period of length $T/n$ is
\[ R_n := e^{r(T/n)} . \]
For each $n \ge 1$, we have then defined a financial market with time step $T/n$.
In order to ensure that these financial markets satisfy the no-arbitrage condition, we assume that:
\[ d_n < R_n < u_n \quad \text{where} \quad u_n = e^{b_n + \sigma_n} , \ d_n = e^{b_n - \sigma_n} , \ n \ge 1 . \quad (2.4) \]
Under this condition, the risk-neutral measure $\mathbb{Q}_n$ is defined by:
\[ \mathbb{Q}_n[Z_i = 1] = q_n := \frac{R_n - d_n}{u_n - d_n} . \]
2.3 Valuation and hedging
in the Cox-Ross-Rubinstein model
Consider the contingent claims
\[ B^n := g(S^n_n) \quad \text{where} \quad g(s) = (s - K)^+ \text{ and } K > 0 . \]
At time $n - 1$, we are facing a binomial model, and we can therefore conclude from our previous discussions that the no-arbitrage market price of this contingent claim at time $n - 1$ and the corresponding perfect hedging strategy are given by
\[ B^n_{n-1} := \mathbb{E}^{\mathbb{Q}_n}_{n-1}[\tilde{B}^n] \quad \text{and} \quad \theta^n_{n-1}(\omega_{n-1}) = \frac{B^n(\omega_{n-1}, u_n) - B^n(\omega_{n-1}, d_n)}{u_n S^n_{n-1}(\omega_{n-1}) - d_n S^n_{n-1}(\omega_{n-1})} , \]
where $\omega_{n-1} \in \{d_n, u_n\}^{n-1}$, and $\mathbb{E}^{\mathbb{Q}_n}_{n-1}$ denotes the expectation operator under $\mathbb{Q}_n$ conditional on the information at time $n - 1$. Arguing similarly step by step, backward in time, we may define the contingent claim $B^n_k$ at each time step $k$ as the no-arbitrage market price of the contingent claim $B^n_{k+1}$, together with the corresponding perfect hedging strategy:
\[ B^n_k := \mathbb{E}^{\mathbb{Q}_n}_k[\tilde{B}^n_{k+1}] \quad \text{and} \quad \theta^n_k(\omega_k) = \frac{B^n_{k+1}(\omega_k, u_n) - B^n_{k+1}(\omega_k, d_n)}{u_n S^n_k(\omega_k) - d_n S^n_k(\omega_k)} , \quad k = 0, \ldots, n - 1 . \]
Remark 2.3. The hedging strategy is the finite differences approximation (on the binomial tree) of the partial derivative of the price of the contingent claim with respect to the spot price of the underlying asset. This observation will be confirmed in the continuous-time framework.
By the law of iterated expectations, we conclude that the no-arbitrage price of the European call option is:
\[ p^n(B^n) := e^{-rT} \, \mathbb{E}^{\mathbb{Q}_n}\big[ (S^n_n - K)^+ \big] . \]
Under the probability measure $\mathbb{Q}_n$, the random variables $(1 + Z_i)/2$ are independent and identically distributed as a Bernoulli with parameter $q_n$. Then:
\[ \mathbb{Q}_n\Big[ \sum_{i=1}^n \frac{1 + Z_i}{2} = j \Big] = C^j_n \, q_n^j (1 - q_n)^{n-j} \quad \text{for } j = 0, \ldots, n . \]
This provides
\[ p^n(B^n) = e^{-rT} \sum_{j=0}^n g\big( s u_n^j d_n^{n-j} \big) \, C^j_n \, q_n^j (1 - q_n)^{n-j} . \]
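The closed-form sum above translates directly into code. The following Python sketch evaluates $p^n(B^n)$ for the call payoff $g(s) = (s - K)^+$; the function name `crr_call_price` and the parameter values are illustrative assumptions. The direct use of binomial coefficients is only meant for moderate $n$ (a few hundred steps), since the weights eventually underflow in floating point.

```python
import math

def crr_call_price(s, K, r, b, sigma, T, n):
    """Price e^{-rT} * sum_j g(s u^j d^{n-j}) C(n,j) q^j (1-q)^{n-j} of a European call."""
    h = T / n
    b_n, sigma_n = b * h, sigma * math.sqrt(h)
    u_n, d_n = math.exp(b_n + sigma_n), math.exp(b_n - sigma_n)
    R_n = math.exp(r * h)
    if not d_n < R_n < u_n:
        raise ValueError("no-arbitrage condition (2.4) is violated")
    q_n = (R_n - d_n) / (u_n - d_n)  # risk-neutral probability of an up move
    total = 0.0
    for j in range(n + 1):
        terminal_price = s * u_n**j * d_n**(n - j)
        weight = math.comb(n, j) * q_n**j * (1.0 - q_n)**(n - j)
        total += max(terminal_price - K, 0.0) * weight
    return math.exp(-r * T) * total

# Illustrative (assumed) parameters.
price_200 = crr_call_price(s=100.0, K=100.0, r=0.05, b=0.1, sigma=0.2, T=1.0, n=200)
print(f"CRR call price with 200 steps: {price_200:.4f}")
```

A quick sanity check on such an implementation is that a strike of zero returns the spot price $s$, since $q_n u_n + (1 - q_n) d_n = R_n$ makes the discounted price a martingale under $\mathbb{Q}_n$.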
Remark 2.4. The reference measure $\mathbb{P}$ is involved neither in the valuation formula nor in the hedging formula. This is due to the fact that the no-arbitrage price in the present framework coincides with the perfect replication cost, which in turn depends on the reference measure only through the corresponding zero-measure sets.
2.4 Continuous-time limit
In this paragraph, we examine the asymptotic behavior of the Cox-Ross-Rubinstein model when the time step $T/n$ tends to zero, i.e. when $n \to \infty$. Our final goal is to show that the limit of the discrete-time valuation formulae coincides with the Black-Scholes formula which was originally derived in [6] in the continuous-time setting.
Although the following computations are performed in the case of European call options, the convergence argument holds for a large class of contingent claims.
Introduce the sequence:
\[ \eta_n := \inf\{ j = 0, \ldots, n : s u_n^j d_n^{n-j} \ge K \} , \]
and let
\[ B(n, p, \eta) := \mathrm{Prob}\,[\mathrm{Bin}(n, p) \ge \eta] , \]
where $\mathrm{Bin}(n, p)$ is a Binomial random variable with parameters $(n, p)$.
The following Lemma provides an interesting reduction of our problem.
Lemma 2.5. For $n \ge 1$, we have:
\[ p^n(B^n) = s \, B\Big( n, \frac{q_n u_n}{R_n}, \eta_n \Big) - K e^{-rT} B(n, q_n, \eta_n) . \]
Proof. Using the expression of $p^n(B^n)$ obtained in the previous paragraph, we see that
\[ p^n(B^n) = R_n^{-n} \sum_{j=\eta_n}^n \big( s u_n^j d_n^{n-j} - K \big) C^j_n \, q_n^j (1 - q_n)^{n-j} \]
\[ = s \sum_{j=\eta_n}^n C^j_n \Big( \frac{q_n u_n}{R_n} \Big)^j \Big( \frac{(1 - q_n) d_n}{R_n} \Big)^{n-j} - \frac{K}{R_n^n} \sum_{j=\eta_n}^n C^j_n \, q_n^j (1 - q_n)^{n-j} . \]
The required result follows by noting that $q_n u_n + (1 - q_n) d_n = R_n$.
Hence, in order to derive the limit of $p^n(B^n)$ when $n \to \infty$, we have to determine the limit of the terms $B(n, q_n u_n / R_n, \eta_n)$ and $B(n, q_n, \eta_n)$. We only provide a detailed exposition for the second term; the first one is treated similarly.
The main technical tool in order to obtain these limits is the following.
Lemma 2.6. Let $(X_{k,n})_{1 \le k \le n}$ be a triangular sequence of iid Bernoulli random variables with parameter $\pi_n$:
\[ \mathbb{P}[X_{k,n} = 1] = 1 - \mathbb{P}[X_{k,n} = 0] = \pi_n . \]
Then:
\[ \frac{\sum_{k=1}^n X_{k,n} - n \pi_n}{\sqrt{n \pi_n (1 - \pi_n)}} \longrightarrow \mathcal{N}(0, 1) \quad \text{in distribution} . \]
The proof of this lemma is reported at the end of this paragraph.
Exercise 2.7. Use Lemma 2.6 to show that
\[ \ln\Big( \frac{S^n_n}{s} \Big) \longrightarrow \mathcal{N}(bT, \sigma^2 T) \quad \text{in distribution under } \mathbb{P} . \]
This shows that the Cox-Ross-Rubinstein model is a discrete-time approximation of a continuous-time model where the risky asset price has a log-normal distribution.
Theorem 2.8. In the context of the Cox-Ross-Rubinstein model, the no-arbitrage price $p^n(B^n)$ of a European call option converges, as $n \to \infty$, to the Black-Scholes price:
\[ p(B) = s \, \mathbf{N}\big( d_+(s, \tilde{K}, \sigma^2 T) \big) - \tilde{K} \, \mathbf{N}\big( d_-(s, \tilde{K}, \sigma^2 T) \big) \]
where
\[ \tilde{K} := K e^{-rT} , \quad d_\pm(s, k, v) := \frac{\ln(s/k)}{\sqrt{v}} \pm \frac{\sqrt{v}}{2} , \]
and $\mathbf{N}(x) = \int_{-\infty}^x e^{-v^2/2} \, dv / \sqrt{2\pi}$ is the cumulative distribution function of the standard Gaussian $\mathcal{N}(0, 1)$.
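The convergence stated in Theorem 2.8 can be observed numerically. The Python sketch below implements the Black-Scholes formula with the functions $d_\pm$ and $\mathbf{N}$ defined above, and compares it with the Cox-Ross-Rubinstein price for increasing $n$. The function names and the parameter values are illustrative assumptions; the binomial sum is evaluated naively and is only suitable for moderate $n$.

```python
import math

def std_normal_cdf(x):
    """Cumulative distribution function N(x) of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_pm(s, k, v, sign):
    """d_+(s, k, v) for sign = +1 and d_-(s, k, v) for sign = -1."""
    return math.log(s / k) / math.sqrt(v) + sign * math.sqrt(v) / 2.0

def black_scholes_call(s, K, r, sigma, T):
    K_tilde = K * math.exp(-r * T)  # discounted strike
    v = sigma**2 * T                # total variance
    return (s * std_normal_cdf(d_pm(s, K_tilde, v, +1))
            - K_tilde * std_normal_cdf(d_pm(s, K_tilde, v, -1)))

def crr_call_price(s, K, r, b, sigma, T, n):
    """n-step Cox-Ross-Rubinstein price of the same call."""
    h = T / n
    u_n = math.exp(b * h + sigma * math.sqrt(h))
    d_n = math.exp(b * h - sigma * math.sqrt(h))
    R_n = math.exp(r * h)
    q_n = (R_n - d_n) / (u_n - d_n)
    total = sum(
        max(s * u_n**j * d_n**(n - j) - K, 0.0)
        * math.comb(n, j) * q_n**j * (1.0 - q_n)**(n - j)
        for j in range(n + 1)
    )
    return math.exp(-r * T) * total

# Illustrative (assumed) parameters; note that b does not enter the limit.
s, K, r, b, sigma, T = 100.0, 100.0, 0.05, 0.1, 0.2, 1.0
bs_price = black_scholes_call(s, K, r, sigma, T)
crr_prices = {n: crr_call_price(s, K, r, b, sigma, T, n) for n in (10, 100, 400)}
for n, price in crr_prices.items():
    print(f"n = {n:4d}: CRR = {price:.4f}, Black-Scholes = {bs_price:.4f}")
```

The drift parameter $b$ appears in the binomial prices for finite $n$ but not in the limit, in line with Remark 2.4.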
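Proof. Under the probability measure $\mathbb{Q}_n$, notice that $B_i := (Z_i + 1)/2$, $i \ge 0$, defines a sequence of iid Bernoulli random variables with parameter $q_n$. Then
\[ B(n, q_n, \eta_n) = \mathbb{Q}_n\Big[ \sum_{i=1}^n B_i \ge \eta_n \Big] . \]
We shall only develop the calculations for this term.
1. By definition of $\eta_n$, we have
\[ s u_n^{\eta_n - 1} d_n^{n - \eta_n + 1} < K \le s u_n^{\eta_n} d_n^{n - \eta_n} . \]
Then,
\[ 2 \eta_n \sigma \sqrt{\frac{T}{n}} + n \Big( b \frac{T}{n} - \sigma \sqrt{\frac{T}{n}} \Big) = \ln\Big( \frac{K}{s} \Big) + O\big( n^{-1/2} \big) , \]
which provides, by direct calculation, that
\[ \eta_n = \frac{n}{2} + \sqrt{n} \, \frac{\ln(K/s) - bT}{2 \sigma \sqrt{T}} + o(\sqrt{n}) . \quad (2.5) \]
We also compute that
\[ n q_n = \frac{n}{2} + \Big( r - b - \frac{\sigma^2}{2} \Big) \frac{\sqrt{T}}{2 \sigma} \sqrt{n} + o(\sqrt{n}) . \quad (2.6) \]
By (2.5) and (2.6), it follows that
\[ \lim_{n \to \infty} \frac{\eta_n - n q_n}{\sqrt{n q_n (1 - q_n)}} = - d_-(s, \tilde{K}, \sigma^2 T) . \]
2. Applying Lemma 2.6 to the sequence $(Z_1, \ldots, Z_n)$, we see that:
\[ \mathcal{L}^{\mathbb{Q}_n}\Big( \frac{\frac{1}{2} \sum_{k=1}^n (1 + Z_k) - n q_n}{\sqrt{n q_n (1 - q_n)}} \Big) \longrightarrow \mathcal{N}(0, 1) , \]
where $\mathcal{L}^{\mathbb{Q}_n}(X)$ denotes the distribution under $\mathbb{Q}_n$ of the random variable $X$. Then:
\[ \lim_{n \to \infty} B(n, q_n, \eta_n) = \lim_{n \to \infty} \mathbb{Q}_n\Big[ \frac{\frac{1}{2} \sum_{k=1}^n (1 + Z_k) - n q_n}{\sqrt{n q_n (1 - q_n)}} \ge \frac{\eta_n - n q_n}{\sqrt{n q_n (1 - q_n)}} \Big] \]
\[ = 1 - \mathbf{N}\big( - d_-(s, \tilde{K}, \sigma^2 T) \big) = \mathbf{N}\big( d_-(s, \tilde{K}, \sigma^2 T) \big) . \]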
Proof of Lemma 2.6. (i) We start by recalling a well-known result on characteristic functions (see Exercise B.9). Let $X$ be a random variable with $\mathbb{E}[|X|^n] < \infty$. Then:
\[ \Phi_X(t) := \mathbb{E}\big[ e^{itX} \big] = \sum_{k=0}^n \frac{(it)^k}{k!} \mathbb{E}[X^k] + o(t^n) . \quad (2.7) \]
To prove this result, we denote $F(t, x) := e^{itx}$ and $f(t) = \mathbb{E}[F(t, X)]$. The function $t \mapsto F(t, x)$ is differentiable with respect to the $t$ variable. Since $|F_t(t, X)| = |iX F(t, X)| \le |X| \in L^1$, it follows from the dominated convergence theorem that the function $f$ is differentiable with $f'(t) = \mathbb{E}[iX e^{itX}]$. In particular, $f'(0) = i \mathbb{E}[X]$. Iterating this argument, we see that the function $f$ is $n$ times differentiable with $n$-th order derivative at zero given by:
\[ f^{(n)}(0) = i^n \mathbb{E}[X^n] . \]
The expansion (2.7) is an immediate consequence of the Taylor-Young formula.
(ii) We now proceed to the proof of Lemma 2.6. Let
\[ Y_j := \frac{X_{j,n} - \pi_n}{\sqrt{n \pi_n (1 - \pi_n)}} \quad \text{and} \quad \Sigma Y_n := \sum_{j=1}^n Y_j . \]
Since the random variables $Y_j$ are independent and identically distributed, the characteristic function $\Phi_{\Sigma Y_n}$ of $\Sigma Y_n$ factors in terms of the common characteristic function $\Phi_{Y_1}$ of the $Y_j$'s as:
\[ \Phi_{\Sigma Y_n}(t) = \big( \Phi_{Y_1}(t) \big)^n . \]
Moreover, we compute directly that $\mathbb{E}[Y_j] = 0$ and $\mathbb{E}[Y_j^2] = 1/n$. Then, it follows from (2.7) that:
\[ \Phi_{Y_1}(t) = 1 - \frac{t^2}{2n} + o\Big( \frac{1}{n} \Big) . \]
Sending $n$ to $\infty$, this provides:
\[ \lim_{n \to \infty} \Phi_{\Sigma Y_n}(t) = e^{-t^2/2} = \Phi_{\mathcal{N}(0,1)}(t) . \]
This shows the convergence in distribution of $\Sigma Y_n$ towards the standard normal distribution.
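The convergence of characteristic functions used in this proof can be checked without simulation, because the characteristic function of a centered and scaled Bernoulli variable is explicit. The Python sketch below evaluates the $n$-th power of the characteristic function of $Y_1$ and compares it with $e^{-t^2/2}$. The function names, the fixed parameter $\pi_n = 0.3$ and the evaluation point $t = 1.5$ are illustrative assumptions.

```python
import cmath
import math

def char_fn_normalized_sum(t, n, pi_n):
    """Characteristic function at t of sum_{j<=n} (X_j - pi_n) / sqrt(n pi_n (1 - pi_n))."""
    scale = math.sqrt(n * pi_n * (1.0 - pi_n))
    y_if_one = (1.0 - pi_n) / scale   # value of Y_1 on {X = 1}
    y_if_zero = -pi_n / scale         # value of Y_1 on {X = 0}
    phi_y1 = (pi_n * cmath.exp(1j * t * y_if_one)
              + (1.0 - pi_n) * cmath.exp(1j * t * y_if_zero))
    return phi_y1 ** n                # independence: the characteristic function factors

def char_fn_std_normal(t):
    return math.exp(-t * t / 2.0)

t = 1.5
errors = [
    abs(char_fn_normalized_sum(t, n, 0.3) - char_fn_std_normal(t))
    for n in (10, 100, 1000, 10000)
]
for n, err in zip((10, 100, 1000, 10000), errors):
    print(f"n = {n:5d}: |Phi_n(t) - exp(-t^2/2)| = {err:.5f}")
```

The printed distances shrink as $n$ grows, which is the pointwise convergence that yields convergence in distribution.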
Chapter 3
Some preliminaries on
continuous-time processes
3.1 Filtration and stopping times
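Throughout this chapter, $(\Omega, \mathcal{F}, \mathbb{P})$ is a given probability space.
A stochastic process with values in a set $E$ is a map
\[ V : \mathbb{R}_+ \times \Omega \longrightarrow E , \quad (t, \omega) \longmapsto V_t(\omega) . \]
The index $t$ is conveniently interpreted as the time variable. In the context of these lectures, the state space $E$ will be a subset of a finite dimensional space, and we shall denote by $\mathcal{B}(E)$ the corresponding Borel $\sigma$-field. The process $V$ is said to be measurable if the mapping
\[ V : (\mathbb{R}_+ \times \Omega, \mathcal{B}(\mathbb{R}_+) \otimes \mathcal{F}) \longrightarrow (E, \mathcal{B}(E)) , \quad (t, \omega) \longmapsto V_t(\omega) \]
is measurable. For a fixed $\omega \in \Omega$, the function $t \in \mathbb{R}_+ \longmapsto V_t(\omega)$ is the sample path (or trajectory) of $V$ corresponding to $\omega$.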
3.1.1 Filtration
A filtration $\mathbb{F} = \{\mathcal{F}_t , t \ge 0\}$ is an increasing family of sub-$\sigma$-algebras of $\mathcal{F}$. Similar to the discrete-time context, $\mathcal{F}_t$ is intuitively understood as the information available up to time $t$. The increasing feature of the filtration, $\mathcal{F}_s \subset \mathcal{F}_t$ for $0 \le s \le t$, means that information can only increase as time goes on.
Definition 3.1. A stochastic process $V$ is said to be
(i) adapted to the filtration $\mathbb{F}$ if the random variable $V_t$ is $\mathcal{F}_t$-measurable for every $t \in \mathbb{R}_+$,
(ii) progressively measurable with respect to the filtration $\mathbb{F}$ if the mapping
\[ V : ([0, t] \times \Omega, \mathcal{B}([0, t]) \otimes \mathcal{F}_t) \longrightarrow (E, \mathcal{B}(E)) , \quad (s, \omega) \longmapsto V_s(\omega) \]
is measurable for every $t \in \mathbb{R}_+$.
Given a stochastic process $V$, we define its canonical filtration by $\mathcal{F}^V_t := \sigma(V_s , s \le t)$, $t \in \mathbb{R}_+$. This is the smallest filtration to which the process $V$ is adapted.
Obviously, any progressively measurable stochastic process is measurable and adapted. The following result states that these two notions are equivalent for processes which are either right-continuous or left-continuous.
Proposition 3.2. Let $V$ be a stochastic process with right-continuous sample paths or else left-continuous sample paths. Then, if $V$ is adapted to a filtration $\mathbb{F}$, it is also progressively measurable with respect to $\mathbb{F}$.
Proof. Assume that every sample path of $V$ is right-continuous (the case of left-continuous sample paths is treated similarly), and fix an arbitrary $t \ge 0$. Observe that $V_s(\omega) = \lim_{n \to \infty} V^n_s(\omega)$ for every $s \in [0, t]$, where $V^n$ is defined by
\[ V^n_s(\omega) = V_{kt/n}(\omega) \quad \text{for } (k - 1) t < s n \le k t \text{ and } k = 1, \ldots, n . \]
Since the restriction of the map $V^n$ to $[0, t] \times \Omega$ is obviously $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable, we deduce the measurability of the limit map $V$ defined on $[0, t] \times \Omega$.
3.1.2 Stopping times
A random time is a random variable $\tau$ with values in $[0, \infty]$. It is called
- a stopping time if the event set $\{\tau \le t\}$ is in $\mathcal{F}_t$ for every $t \in \mathbb{R}_+$,
- an optional time if the event set $\{\tau < t\}$ is in $\mathcal{F}_t$ for every $t \in \mathbb{R}_+$.
Obviously, any stopping time is an optional time. It is an easy exercise to show that these two notions are in fact identical whenever the filtration $\mathbb{F}$ is right-continuous, i.e.
\[ \mathcal{F}_{t+} := \bigcap_{s > t} \mathcal{F}_s = \mathcal{F}_t \quad \text{for every } t \ge 0 . \]
This will be the case in all of the financial applications of this course. An important example of a stopping time is:
Exercise 3.3. (first exit time) Let $V$ be a stochastic process with continuous paths adapted to $\mathbb{F}$, and consider a subset $\Gamma \in \mathcal{B}(E)$ together with the random time
\[ T_\Gamma := \inf\{ t \ge 0 : V_t \in \Gamma \} , \]
with the convention $\inf \emptyset = \infty$. Show that if $\Gamma$ is closed (resp. open), then $T_\Gamma$ is a stopping time (resp. optional time).
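Proposition 3.4. Let $\tau_1$ and $\tau_2$ be two stopping times. Then so are $\tau_1 \wedge \tau_2$, $\tau_1 \vee \tau_2$, and $\tau_1 + \tau_2$.
Proof. For all $t \ge 0$, we have $\{\tau_1 \wedge \tau_2 \le t\} = \{\tau_1 \le t\} \cup \{\tau_2 \le t\} \in \mathcal{F}_t$, and $\{\tau_1 \vee \tau_2 \le t\} = \{\tau_1 \le t\} \cap \{\tau_2 \le t\} \in \mathcal{F}_t$. This proves that $\tau_1 \wedge \tau_2$ and $\tau_1 \vee \tau_2$ are stopping times. Finally:
\[ \{\tau_1 + \tau_2 > t\} = \big( \{\tau_1 = 0\} \cap \{\tau_2 > t\} \big) \cup \big( \{\tau_1 > t\} \cap \{\tau_2 = 0\} \big) \cup \big( \{\tau_1 \ge t\} \cap \{\tau_2 > 0\} \big) \cup \big( \{0 < \tau_1 < t\} \cap \{\tau_1 + \tau_2 > t\} \big) . \]
Notice that the first three event sets are obviously in $\mathcal{F}_t$. As for the fourth one, we rewrite it as
\[ \{0 < \tau_1 < t\} \cap \{\tau_1 + \tau_2 > t\} = \bigcup_{r \in (0, t) \cap \mathbb{Q}} \big( \{r < \tau_1 < t\} \cap \{\tau_2 > t - r\} \big) \in \mathcal{F}_t . \]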
Exercise 3.5. (i) If $(\tau_n)_{n \ge 1}$ is a sequence of optional times, then so are $\sup_{n \ge 1} \tau_n$, $\inf_{n \ge 1} \tau_n$, $\limsup_{n \to \infty} \tau_n$, $\liminf_{n \to \infty} \tau_n$.
(ii) If $(\tau_n)_{n \ge 1}$ is a sequence of stopping times, then so is $\sup_{n \ge 1} \tau_n$.
Given a stopping time $\tau$ with values in $[0, \infty]$, we shall frequently use the approximating sequence
\[ \tau_n := \frac{\lfloor n \tau \rfloor + 1}{n} \, 1_{\{\tau < \infty\}} + \infty \, 1_{\{\tau = \infty\}} , \quad n \ge 1 , \quad (3.1) \]
which defines a decreasing sequence of stopping times converging a.s. to $\tau$. Here $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$. Notice that the random time $\lfloor n \tau \rfloor / n$ is not a stopping time in general.
The following exercise is a complement to Exercise 3.5.
Exercise 3.6. Let $\tau$ be a finite optional time, and consider the sequence $(\tau_n)_{n \ge 1}$ defined by (3.1). Show that $\tau_n$ is a stopping time for all $n \ge 1$.
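As in the discrete-time framework, we provide a precise definition of the information available up to some stopping time $\tau$ of a filtration $\mathbb{F}$:
\[ \mathcal{F}_\tau := \{ A \in \mathcal{F} : A \cap \{\tau \le t\} \in \mathcal{F}_t \text{ for every } t \in \mathbb{R}_+ \} . \]
Definition 3.7. Let $\tau$ be a stopping time, and $V$ a stochastic process. We denote
\[ V_\tau(\omega) := V_{\tau(\omega)}(\omega) , \quad V^\tau_t := V_{t \wedge \tau} \text{ for all } t \ge 0 , \]
and we call $V^\tau$ the process $V$ stopped at $\tau$.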
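Proposition 3.8. Let $\mathbb{F} = \{\mathcal{F}_t , t \ge 0\}$ be a filtration, $\tau$ an $\mathbb{F}$-stopping time, and $V$ a progressively measurable stochastic process. Then
(i) $\mathcal{F}_\tau$ is a $\sigma$-algebra,
(ii) $V_\tau$ is $\mathcal{F}_\tau$-measurable, and the stopped process $\{V^\tau_t , t \ge 0\}$ is progressively measurable.
Proof. (i) First, for all $t \ge 0$, $\Omega \cap \{\tau \le t\} = \{\tau \le t\} \in \mathcal{F}_t$, proving that $\Omega \in \mathcal{F}_\tau$.
Next, for any $A \in \mathcal{F}_\tau$, we have $A^c \cap \{\tau \le t\} = \{\tau \le t\} \cap (A \cap \{\tau \le t\})^c$. Since $\tau$ is a stopping time, $\{\tau \le t\} \in \mathcal{F}_t$. Since $A \cap \{\tau \le t\} \in \mathcal{F}_t$, its complement is in $\mathcal{F}_t$, and we deduce that the intersection $\{\tau \le t\} \cap (A \cap \{\tau \le t\})^c$ is in $\mathcal{F}_t$. Since this holds true for any $t \ge 0$, this shows that $A^c \in \mathcal{F}_\tau$.
Finally, for any countable family $(A_i)_{i \ge 1} \subset \mathcal{F}_\tau$, we have $(\cup_{i \ge 1} A_i) \cap \{\tau \le t\} = \cup_{i \ge 1} (A_i \cap \{\tau \le t\}) \in \mathcal{F}_t$, proving that $\cup_{i \ge 1} A_i \in \mathcal{F}_\tau$.
(ii) We first prove that the stopped process $V^\tau$ is progressively measurable. To see this, observe that, for fixed $t \ge 0$, the map $(s, \omega) \longmapsto V^\tau_s(\omega)$ on $[0, t] \times \Omega$ is the composition of the $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable maps
\[ f : (s, \omega) \longmapsto (\tau(\omega) \wedge s, \omega) \quad \text{and} \quad V : (s, \omega) \longmapsto V_s(\omega) , \]
where the measurability of the second map is exactly the progressive measurability assumption on $V$, and that of the first one follows from the fact that for all $s \le t$ and $A \in \mathcal{F}_t$:
\[ f^{-1}\big( [0, s) \times A \big) = \{ (u, \omega) \in [0, t] \times \Omega : u \wedge \tau(\omega) < s , \ \omega \in A \} \in \mathcal{B}([0, t]) \otimes \mathcal{F}_t , \]
as a consequence of $\{\tau < s\} \in \mathcal{F}_s \subset \mathcal{F}_t$.
Finally, for every $t \ge 0$ and $B \in \mathcal{B}(E)$, we write $\{V_\tau \in B\} \cap \{\tau \le t\} = \{V^\tau_t \in B\} \cap \{\tau \le t\} \in \mathcal{F}_t$ by the progressive measurability of the stopped process $V^\tau$.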
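Proposition 3.9. Let $\tau_1$ and $\tau_2$ be two $\mathbb{F}$-stopping times. Then the event sets $\{\tau_1 < \tau_2\}$ and $\{\tau_1 = \tau_2\}$ are in $\mathcal{F}_{\tau_1} \cap \mathcal{F}_{\tau_2}$.
Proof. (i) We first prove that $\{\tau_1 > \tau_2\} \in \mathcal{F}_{\tau_2}$. For an arbitrary $t \ge 0$, we have $\{\tau_1 \le \tau_2\} \cap \{\tau_2 \le t\} = \{\tau_1 \le t\} \cap \{\tau_2 \le t\} \cap \{\tau_1 \wedge t \le \tau_2 \wedge t\}$. Notice that $\{\tau_i \le t\} \in \mathcal{F}_t$, by the definition of $\tau_i$ as stopping times, and $\{\tau_1 \wedge t \le \tau_2 \wedge t\} \in \mathcal{F}_t$ because both $\tau_1 \wedge t$ and $\tau_2 \wedge t$ are $\mathcal{F}_t$-measurable. Consequently, $\{\tau_1 \le \tau_2\} \cap \{\tau_2 \le t\} \in \mathcal{F}_t$, and therefore $\{\tau_1 \le \tau_2\} \in \mathcal{F}_{\tau_2}$ by the arbitrariness of $t \ge 0$ and the definition of $\mathcal{F}_{\tau_2}$. Since $\mathcal{F}_{\tau_2}$ is a $\sigma$-algebra, this shows that $\{\tau_1 > \tau_2\} \in \mathcal{F}_{\tau_2}$.
(ii) On the other hand, $\{\tau_1 > \tau_2\} = \{\tau_1 \wedge \tau_2 < \tau_1\} \in \mathcal{F}_{\tau_1}$, since $\tau_1 \wedge \tau_2$ and $\tau_1$ are $\mathcal{F}_{\tau_1}$-measurable.
(iii) Finally $\{\tau_1 = \tau_2\} = \big( \{\tau_1 > \tau_2\} \cup \{\tau_2 > \tau_1\} \big)^c \in \mathcal{F}_{\tau_1} \cap \mathcal{F}_{\tau_2}$ by the first part of this proof and the fact that $\mathcal{F}_{\tau_1} \cap \mathcal{F}_{\tau_2}$ is a $\sigma$-algebra.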
3.2 Martingales and optional sampling
In this section, we shall consider real-valued adapted stochastic processes $V$ on the filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$. The notion of martingales is defined similarly as in the discrete-time case.
Definition 3.10. Let $V$ be an $\mathbb{F}$-adapted stochastic process with $\mathbb{E}|V_t| < \infty$ for every $t \in \mathbb{R}_+$.
(i) $V$ is a submartingale if $\mathbb{E}[V_t | \mathcal{F}_s] \ge V_s$ for $0 \le s \le t$,
(ii) $V$ is a supermartingale if $\mathbb{E}[V_t | \mathcal{F}_s] \le V_s$ for $0 \le s \le t$,
(iii) $V$ is a martingale if $\mathbb{E}[V_t | \mathcal{F}_s] = V_s$ for $0 \le s \le t$.
The following Doob's optional sampling theorem states that submartingales and supermartingales satisfy the same inequalities when sampled along random times, under convenient conditions.
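Theorem 3.11. (Optional sampling) Let $V = \{V_t , 0 \le t \le \infty\}$ be a right-continuous submartingale where the last element $V_\infty := \lim_{t \to \infty} V_t$ exists for almost every $\omega \in \Omega$ (see Remark 3.12). If $\tau_1 \le \tau_2$ are two stopping times, then
\[ \mathbb{E}[V_{\tau_2} | \mathcal{F}_{\tau_1}] \ge V_{\tau_1} \quad \mathbb{P}\text{-a.s.} \quad (3.2) \]
Proof. For stopping times $\tau_1$ and $\tau_2$ taking values in a finite set, the proof is identical to that of Theorem 3.17 proved for discrete-time martingales in MAP 432. For completeness, we report the corresponding statement and proof in Section 3.4 below. In order to extend the result to general stopping times, we approximate the stopping times $\tau_i$ by the decreasing sequences $(\tau^n_i)_{n \ge 1}$ of (3.1). Then, for $m \ge n$, so that $\tau^m_1 \le \tau^n_2$, the discrete-time optional sampling theorem gives
\[ \mathbb{E}\big[ V_{\tau^n_2} | \mathcal{F}_{\tau^m_1} \big] \ge V_{\tau^m_1} \quad \mathbb{P}\text{-a.s.} \]
- By definition of the conditional expectation, this means that
\[ \mathbb{E}\big[ V_{\tau^n_2} 1_A \big] \ge \mathbb{E}\big[ V_{\tau^m_1} 1_A \big] \]
for all $A \in \mathcal{F}_{\tau^m_1}$. Since $\tau_1 \le \tau^m_1$, we have $\mathcal{F}_{\tau_1} \subset \mathcal{F}_{\tau^m_1}$, and therefore
\[ \mathbb{E}\big[ V_{\tau^n_2} 1_A \big] \ge \mathbb{E}\big[ V_{\tau^m_1} 1_A \big] \quad \text{for all } A \in \mathcal{F}_{\tau_1} . \quad (3.3) \]
- For all $i = 1, 2$, the sequence $\{V_{\tau^n_i} , n \ge 1\}$ is an $\{\mathcal{F}_{\tau^n_i} , n \ge 1\}$-backward submartingale in the sense that $\mathbb{E}|V_{\tau^n_i}| < \infty$ and $\mathbb{E}\big[ V_{\tau^n_i} | \mathcal{F}_{\tau^{n+1}_i} \big] \ge V_{\tau^{n+1}_i}$. Moreover, $\mathbb{E}\big[ V_{\tau^n_i} \big] \ge \mathbb{E}[V_0]$. Then it follows from Lemma 3.14 below that it is uniformly integrable. Therefore, taking limits in (3.3) and using the right-continuity of $V$, we obtain that
\[ \mathbb{E}[V_{\tau_2} 1_A] \ge \mathbb{E}[V_{\tau_1} 1_A] \quad \text{for all } A \in \mathcal{F}_{\tau_1} , \]
completing the proof.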
Remark 3.12. The previous optional sampling theorem requires the existence of a last element $V_\infty \in L^1$, as defined in the statement of the theorem, with $\mathbb{E}[V_\infty | \mathcal{F}_t] \ge V_t$. For completeness, we observe that the existence of a last element $V_\infty \in L^1$ is verified for right-continuous submartingales $V$ with $\sup_{t \ge 0} \mathbb{E}\big[ V_t^+ \big] < \infty$. This is the so-called submartingale convergence theorem, see e.g. Karatzas and Shreve [29] Theorem 1.3.15. Notice however that this does not guarantee that $\mathbb{E}[V_\infty | \mathcal{F}_t] \ge V_t$.
In the context of these lectures, we shall simply apply the following consequence of the optional sampling theorem.
Exercise 3.13. For a right-continuous submartingale $V$ and two stopping times $\tau_1 \le \tau_2$, the optional sampling theorem holds under either of the following conditions:
(i) $\tau_2 \le a$ for some constant $a > 0$,
(ii) there exists an integrable r.v. $Y$ such that $V_t \le \mathbb{E}[Y | \mathcal{F}_t]$ $\mathbb{P}$-a.s. for every $t \ge 0$. Hint: under this condition, the existence of $V_\infty$ is guaranteed by the submartingale convergence theorem (see Remark 3.12), and the submartingale property at infinity is a consequence of Fatou's lemma.
We conclude this section by proving a uniform integrability result for backward submartingales which was used in the above proof of Theorem 3.11.
Lemma 3.14. Let $\{\mathcal{F}_n , n \ge 1\}$ be a sequence of sub-$\sigma$-algebras of $\mathcal{F}$ with $\mathcal{F}_n \supset \mathcal{F}_{n+1}$ for all $n \ge 1$. Let $\{X_n , n \ge 1\}$ be an integrable stochastic process with
\[ X_n \ \mathcal{F}_n\text{-measurable and } \mathbb{E}[X_n | \mathcal{F}_{n+1}] \ge X_{n+1} \text{ for all } n \ge 1 . \quad (3.4) \]
Suppose that the sequence $(\mathbb{E}[X_n])_{n \ge 1}$ is bounded from below. Then $\{X_n , n \ge 1\}$ is uniformly integrable.
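Proof. We organize the proof in three steps.
Step 1. By the Jensen inequality, it follows from (3.4) that $\mathbb{E}[X_n^+ | \mathcal{F}_{n+1}] \ge X_{n+1}^+$ for all $n \ge 1$. Then $\mathbb{E}[X_{n+1}^+] \le \mathbb{E}\big[ \mathbb{E}\{X_n^+ | \mathcal{F}_{n+1}\} \big] = \mathbb{E}[X_n^+]$, and therefore, for $\lambda > 0$,
\[ \mathbb{P}[|X_n| > \lambda] \le \frac{1}{\lambda} \mathbb{E}|X_n| = \frac{1}{\lambda} \big( 2 \mathbb{E}[X_n^+] - \mathbb{E}[X_n] \big) \le \frac{1}{\lambda} \big( 2 \mathbb{E}[X_1^+] - \mathbb{E}[X_n] \big) . \]
Since the sequence $(\mathbb{E}[X_n])_{n \ge 1}$ is bounded from below, this shows that $\sup_{n \ge 1} \mathbb{P}[|X_n| > \lambda] \to 0$ as $\lambda \to \infty$. In other words:
\[ \text{for all } \delta > 0 , \text{ there exists } \lambda^* > 0 \text{ such that } \mathbb{P}[|X_n| > \lambda^*] \le \delta \text{ for all } n \ge 1 . \quad (3.5) \]
Step 2. We next prove that $\{X_n^+ , n \ge 1\}$ is uniformly integrable. By (3.4) and the Jensen inequality, we directly estimate for $\lambda > 0$ that
\[ \mathbb{E}\big[ X_n^+ 1_{\{X_n^+ > \lambda\}} \big] \le \mathbb{E}\big[ X_1^+ 1_{\{X_n > \lambda\}} \big] . \]
Let $\mathcal{A}_\delta$ be the set of all events $A \in \mathcal{F}$ such that $\mathbb{P}[A] \le \delta$. Let $\varepsilon > 0$ be given. Since $X_1^+$ is integrable, there exists $\delta > 0$ such that $\mathbb{E}[X_1^+ 1_A] \le \varepsilon$ for all $A \in \mathcal{A}_\delta$. By (3.5), there exists $\lambda^*$ such that $\{X_n > \lambda^*\} \in \mathcal{A}_\delta$ for all $n \ge 1$, and therefore
\[ \mathbb{E}\big[ X_n^+ 1_{\{X_n^+ > \lambda\}} \big] \le \varepsilon \quad \text{for all } \lambda \ge \lambda^* \text{ and } n \ge 1 . \]
Step 3. We finally prove that $\{X_n^- , n \ge 1\}$ is uniformly integrable. Similarly, for $\lambda > 0$, it follows from (3.4) that $\mathbb{E}\big[ X_n 1_{\{X_n \ge -\lambda\}} \big] \le \mathbb{E}\big[ X_k 1_{\{X_n \ge -\lambda\}} \big]$ for all $k \le n$. Then
\[ 0 \le \mathbb{E}\big[ X_n^- 1_{\{X_n < -\lambda\}} \big] \le u_k - u_n + \mathbb{E}\big[ |X_k| 1_{\{X_n < -\lambda\}} \big] , \quad (3.6) \]
where $u_n := \mathbb{E}[X_n]$, $n \ge 1$, is bounded from below and decreasing by (3.4). Then for all $\varepsilon > 0$, there exists $k^* > 0$ such that $|u_n - u_{k^*}| \le \varepsilon$ for all $n \ge k^*$. Arguing as in Step 2, it follows from the integrability of $X_{k^*}$ that, for $\lambda$ sufficiently large,
\[ \sup_{n \ge k^*} \mathbb{E}\big[ |X_{k^*}| 1_{\{X_n < -\lambda\}} \big] \le \varepsilon , \]
which concludes the proof by (3.6).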
3.3 Maximal inequalities for submartingales
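In this section, we recall Doob's maximal inequality for discrete-time martingales (from MAP 432), and extend it to continuous-time martingales.
Theorem 3.15. (Doob's maximal inequality)
(i) Let $\{M_n , n \in \mathbb{N}\}$ be a nonnegative submartingale, and set $M^*_n := \sup_{k \le n} M_k$. Then for all $n \ge 0$ and $c > 0$:
\[ c \, \mathbb{P}\big[ M^*_n \ge c \big] \le \mathbb{E}\big[ M_n 1_{\{M^*_n \ge c\}} \big] \quad \text{and} \quad \|M^*_n\|_p \le \frac{p}{p - 1} \|M_n\|_p \text{ for all } p > 1 . \]
(ii) Let $\{M_t , t \ge 0\}$ be a nonnegative continuous submartingale, and set $M^*_t := \sup_{s \in [0, t]} M_s$. Then for all $t \ge 0$ and $c > 0$:
\[ c \, \mathbb{P}\big[ M^*_t \ge c \big] \le \mathbb{E}\big[ M_t 1_{\{M^*_t \ge c\}} \big] \quad \text{and} \quad \|M^*_t\|_p \le \frac{p}{p - 1} \|M_t\|_p \text{ for all } p > 1 . \]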
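Proof. 1- We first prove that
\[ c \, \mathbb{P}[M^*_n \ge c] \le \mathbb{E}\big[ M_n 1_{\{M^*_n \ge c\}} \big] \le \mathbb{E}[M_n] \quad \text{for all } c > 0 \text{ and } n \in \mathbb{N} . \quad (3.7) \]
To see this, observe that
\[ \{M^*_n \ge c\} = \bigcup_{k \le n} F_k , \quad F_0 := \{M_0 \ge c\} , \quad F_k := \Big( \bigcap_{i \le k - 1} \{M_i < c\} \Big) \cap \{M_k \ge c\} . \quad (3.8) \]
Since $F_k \in \mathcal{F}_k$ and $M_k \ge c$ on $F_k$, it follows from the submartingale property of $M$ that
\[ \mathbb{E}[M_n 1_{F_k}] \ge \mathbb{E}[M_k 1_{F_k}] \ge c \, \mathbb{P}[F_k] , \quad 0 \le k \le n . \]
Since the events $F_k$ are disjoint, summing over $k$ provides
\[ \mathbb{E}[M_n 1_{\cup_k F_k}] \ge c \sum_{k \le n} \mathbb{P}[F_k] = c \, \mathbb{P}[\cup_k F_k] , \]
and (3.7) follows from (3.8).
2- Let $p > 1$ and $q := p/(p - 1)$. It follows from (3.7) that
\[ L := \int_0^\infty p c^{p-1} \mathbb{P}[M^*_n \ge c] \, dc \ \le \ R := \int_0^\infty p c^{p-2} \mathbb{E}\big[ M_n 1_{\{M^*_n \ge c\}} \big] \, dc . \]
Since $M \ge 0$, it follows from Fubini's theorem that
\[ L = \mathbb{E}\Big[ \int_0^{M^*_n} p c^{p-1} \, dc \Big] = \mathbb{E}[(M^*_n)^p] \]
and
\[ R = \mathbb{E}\Big[ M_n \int_0^{M^*_n} p c^{p-2} \, dc \Big] = q \, \mathbb{E}\big[ M_n (M^*_n)^{p-1} \big] \le q \, \|M_n\|_p \, \|(M^*_n)^{p-1}\|_q , \]
by the Hölder inequality. Hence $\|M^*_n\|_p^p \le q \, \|M_n\|_p \, \|(M^*_n)^{p-1}\|_q$, and the required inequality follows since $\|(M^*_n)^{p-1}\|_q = \|M^*_n\|_p^{p-1}$.
3- The extension of the inequality to continuous-time martingales which are pathwise continuous follows from an obvious discretization and the monotone convergence theorem.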
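The first inequality of Theorem 3.15 (i) can be verified exactly on a small example. The Python sketch below takes $M_k = |S_k|$, where $S$ is a simple symmetric random walk started at 0 (a nonnegative submartingale), enumerates all $2^n$ paths, and computes both sides of $c \, \mathbb{P}[M^*_n \ge c] \le \mathbb{E}[M_n 1_{\{M^*_n \ge c\}}]$ with exact rational arithmetic. The choice of this submartingale, the function name and the horizon $n = 10$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def doob_check(n, c):
    """Return (c * P[M*_n >= c], E[M_n 1{M*_n >= c}]) for M_k = |S_k|."""
    path_probability = Fraction(1, 2**n)   # every path of n fair steps is equally likely
    prob_max_exceeds = Fraction(0)
    expectation_on_event = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        position, running_max = 0, 0
        for step in steps:
            position += step
            running_max = max(running_max, abs(position))
        if running_max >= c:
            prob_max_exceeds += path_probability
            expectation_on_event += path_probability * abs(position)
    return c * prob_max_exceeds, expectation_on_event

results = {c: doob_check(10, c) for c in range(1, 6)}
for c, (lhs, rhs) in results.items():
    print(f"c = {c}: c*P[M* >= c] = {float(lhs):.4f} <= E[M_n; M* >= c] = {float(rhs):.4f}")
```

Exhaustive enumeration is only feasible for small $n$; for longer horizons one would switch to a recursion over the pair (position, running maximum) or to simulation.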
3.4 Complement: Doob's optional sampling for
discrete martingales
This section reports the proof of the optional sampling theorem for discrete-time
martingales which was the starting point for the proof of the continuous-time
extension of Theorem 3.11.
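Lemma 3.16. Let $\{X_n , n \ge 0\}$ be a supermartingale (resp. submartingale, martingale) and $\nu$ a stopping time on $(\Omega, \mathcal{A}, \mathbb{F}, \mathbb{P})$. Then, the stopped process $X^\nu$ is a supermartingale (resp. submartingale, martingale).
Proof. We only prove the result in the martingale case. We first observe that $X^\nu$ is $\mathbb{F}$-adapted since, for all $n \ge 0$ and $B \in \mathcal{B}(\mathbb{R})$,
\[ (X^\nu_n)^{-1}(B) = \Big[ \bigcup_{k \le n - 1} \{\nu = k\} \cap (X_k)^{-1}(B) \Big] \cup \Big[ \{\nu \le n - 1\}^c \cap (X_n)^{-1}(B) \Big] . \]
For $n \ge 1$, $|X^\nu_n| \le \sum_{k \le n} |X_k| \in L^1(\Omega, \mathcal{F}_n, \mathbb{P})$, and we directly compute that
\[ \mathbb{E}[X^\nu_n | \mathcal{F}_{n-1}] = \mathbb{E}\big[ X_\nu 1_{\{\nu \le n - 1\}} + X_n 1_{\{\nu > n - 1\}} | \mathcal{F}_{n-1} \big] = X_\nu 1_{\{\nu \le n - 1\}} + \mathbb{E}\big[ X_n 1_{\{\nu > n - 1\}} | \mathcal{F}_{n-1} \big] . \]
Since $\nu$ is a stopping time, the event $\{\nu > n - 1\} = \{\nu \le n - 1\}^c \in \mathcal{F}_{n-1}$. Since $X_n$ and $X_n 1_{\{\nu > n - 1\}}$ are integrable, we deduce that
\[ \mathbb{E}[X^\nu_n | \mathcal{F}_{n-1}] = X_\nu 1_{\{\nu \le n - 1\}} + 1_{\{\nu > n - 1\}} \mathbb{E}[X_n | \mathcal{F}_{n-1}] = X_\nu 1_{\{\nu \le n - 1\}} + 1_{\{\nu > n - 1\}} X_{n-1} = X^\nu_{n-1} . \]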
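Theorem 3.17. (Optional sampling, Doob). Let $\{X_n , n \ge 0\}$ be a martingale (resp. supermartingale) and $\underline{\nu}$, $\bar{\nu}$ two bounded stopping times satisfying $\underline{\nu} \le \bar{\nu}$ a.s. Then:
\[ \mathbb{E}\big[ X_{\bar{\nu}} | \mathcal{F}_{\underline{\nu}} \big] = X_{\underline{\nu}} \quad (\text{resp. } \le X_{\underline{\nu}}) . \]
Proof. We only prove the result for the martingale case; the corresponding result for supermartingales is proved by the same argument. Let $N \in \mathbb{N}$ be a bound on $\bar{\nu}$.
(i) We first show that $\mathbb{E}[X_N | \mathcal{F}_\nu] = X_\nu$ for every stopping time $\nu \le N$. For an arbitrary event $A \in \mathcal{F}_\nu$, we have $A \cap \{\nu = n\} \in \mathcal{F}_n$ and therefore:
\[ \mathbb{E}\big[ (X_N - X_\nu) 1_{A \cap \{\nu = n\}} \big] = \mathbb{E}\big[ (X_N - X_n) 1_{A \cap \{\nu = n\}} \big] = 0 \]
since $X$ is a martingale. Summing up over $n$, we get:
\[ 0 = \sum_{n=0}^N \mathbb{E}\big[ (X_N - X_\nu) 1_{A \cap \{\nu = n\}} \big] = \mathbb{E}[(X_N - X_\nu) 1_A] . \]
By the arbitrariness of $A$ in $\mathcal{F}_\nu$, this proves that $\mathbb{E}[X_N - X_\nu | \mathcal{F}_\nu] = 0$.
(ii) It follows from Lemma 3.16 that the stopped process $X^{\bar{\nu}}$ is a martingale. Applying the result established in (i), we see that:
\[ \mathbb{E}\big[ X_{\bar{\nu}} | \mathcal{F}_{\underline{\nu}} \big] = \mathbb{E}\big[ X^{\bar{\nu}}_N | \mathcal{F}_{\underline{\nu}} \big] = X^{\bar{\nu}}_{\underline{\nu}} = X_{\underline{\nu}} \]
since $\underline{\nu} \le \bar{\nu}$.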
Chapter 4
The Brownian Motion
The Brownian motion was introduced by the Scottish botanist Robert Brown in 1828 to describe the movement of pollen suspended in water. Since then it has been widely used to model various irregular movements in physics, economics, finance and biology. In 1905, Albert Einstein (1879-1955) introduced a model for the trajectory of atoms subject to shocks, and obtained a Gaussian density. Louis Bachelier (1870-1946) was the very first to use the Brownian motion as a model for stock prices in his thesis in 1900, but his work was not recognized until recently. It is only sixty years later that Samuelson (1915-2009, Nobel Prize in economics 1970) suggested the Brownian motion as a model for stock prices. The real success of Brownian motion in financial applications was however realized by Fisher Black (1938-1995), Myron Scholes (1941-), and Robert Merton (1944-), whose seminal work between 1969 and 1973 founded the modern theory of financial mathematics by introducing the portfolio theory and the no-arbitrage pricing argument; Scholes and Merton received the Nobel Prize in economics in 1997 for this work, Black having died in 1995.
The first rigorous construction of the Brownian motion was achieved by Norbert Wiener (1894-1964) in 1923, who provided many applications in signal theory and telecommunications. Paul Lévy (1886-1971, X1904, Professor at Ecole Polytechnique from 1920 to 1959) contributed to the mathematical study of the Brownian motion and proved many surprising properties. Kiyoshi Itô (1915-2008) developed the stochastic differential calculus. The theory benefitted from the considerable activity on martingale theory, in particular in France around P.A. Meyer (1934-2003).
The purpose of this chapter is to introduce the Brownian motion and to derive its main properties.
4.1 Definition of the Brownian motion
Definition 4.1. Let $W = \{W_t, t \in \mathbb{R}_+\}$ be a stochastic process on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $\mathbb{F}$ a filtration. $W$ is an $\mathbb{F}$-standard Brownian motion if
(i) $W$ is $\mathbb{F}$-adapted,
(ii) $W_0 = 0$ and the sample paths $W_\cdot(\omega)$ are continuous for a.e. $\omega \in \Omega$,
(iii) independent increments: $W_t - W_s$ is independent of $\mathcal{F}_s$ for all $s \le t$,
(iv) the distribution of $W_t - W_s$ is $\mathcal{N}(0, t-s)$ for all $t > s \ge 0$.
An interesting consequence of (iii) is that:
(iii') the increments $W_{t_4} - W_{t_3}$ and $W_{t_2} - W_{t_1}$ are independent for all $0 \le t_1 \le t_2 \le t_3 \le t_4$.
Let us observe that, for any given filtration $\mathbb{F}$, an $\mathbb{F}$-standard Brownian motion is also an $\mathbb{F}^W$-standard Brownian motion, where $\mathbb{F}^W$ is the canonical filtration of $W$. This justifies the consistency of the above definition with the following one, which does not refer to any filtration:
Definition 4.2. Let $W = \{W_t, t \in \mathbb{R}_+\}$ be a stochastic process on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. $W$ is a standard Brownian motion if it satisfies conditions (ii), (iii') and (iv) of Definition 4.1.
We observe that the pathwise continuity condition in Property (ii) above can be seen to be redundant. This is a consequence of the Kolmogorov-Čentsov Theorem, which we recall (without proof) for completeness, and which shows that pathwise continuity follows from Property (iv).
Theorem 4.3. Let $\{X_t, t \in [0,T]\}$ be a process satisfying
\[
\mathbb{E}|X_t - X_s|^r \le C|t-s|^{1+\gamma r}, \quad 0 \le s, t \le T, \quad \text{for some } r, \gamma, C \ge 0.
\]
Then there exists a modification $\{\tilde X_t, t \in [0,T]\}$ of $X$ ($\mathbb{P}[X_t = \tilde X_t] = 1$ for all $t \in [0,T]$) which is a.s. $\alpha$-Hölder continuous for every $\alpha \in (0, \gamma)$.
Exercise 4.4. Use Theorem 4.3 to prove that, $\mathbb{P}$-a.s., the Brownian motion is $(\frac{1}{2} - \varepsilon)$-Hölder continuous for any $\varepsilon > 0$ (Theorem 4.21 below shows that this result does not hold for $\varepsilon = 0$).
We conclude this section by extending the definition of the Brownian motion to the vector case.
Definition 4.5. Let $W = \{W_t, t \in \mathbb{R}_+\}$ be an $\mathbb{R}^n$-valued stochastic process on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $\mathbb{F}$ a filtration. $W$ is an $\mathbb{F}$-standard Brownian motion if the components $W^i$, $i = 1, \ldots, n$, are independent $\mathbb{F}$-standard Brownian motions, i.e. (i)-(ii)-(iii) of Definition 4.1 hold, and
(iv)$_n$ the distribution of $W_t - W_s$ is $\mathcal{N}(0, (t-s)I_n)$ for all $t > s \ge 0$, where $I_n$ is the identity matrix of $\mathbb{R}^n$.

Figure 4.1: Approximation of a sample path of a Brownian motion

Figure 4.2: A sample path of the two-dimensional Brownian motion
4.2 The Brownian motion as a limit of a random walk
Before discussing the properties of the Brownian motion, let us comment on its existence as a continuous-time limit of a random walk. Given a family $\{Y_i, i = 1, \ldots, n\}$ of $n$ independent random variables defined by the distribution
\[
\mathbb{P}[Y_i = 1] = 1 - \mathbb{P}[Y_i = -1] = \frac{1}{2}, \tag{4.1}
\]
we define the symmetric random walk
\[
M_0 = 0 \quad\text{and}\quad M_k = \sum_{j=1}^{k} Y_j \quad\text{for } k = 0, \ldots, n.
\]
A continuous-time process can be obtained from the sequence $\{M_k, k = 0, \ldots, n\}$ by linear interpolation:
\[
M_t := M_{\lfloor t \rfloor} + (t - \lfloor t \rfloor)\, Y_{\lfloor t \rfloor + 1} \quad\text{for } t \ge 0,
\]
where $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$. The following figure shows a typical sample path of the process $M$.

Figure 4.3: Sample path of a random walk
We next define a stochastic process $W^n$ from the previous process by speeding up time and conveniently scaling:
\[
W^n_t := \frac{1}{\sqrt{n}}\, M_{nt}, \quad t \ge 0.
\]
In the above definition, the normalization by $\sqrt{n}$ is suggested by the Central Limit Theorem. We next set
\[
t_k := \frac{k}{n} \quad\text{for } k \in \mathbb{N},
\]
and we list some obvious properties of the process $W^n$:
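• for $0 \le i \le j \le k \le \ell \le n$, the increments $W^n_{t_\ell} - W^n_{t_k}$ and $W^n_{t_j} - W^n_{t_i}$ are independent,
• for $0 \le i \le k$, the first two moments of the increment $W^n_{t_k} - W^n_{t_i}$ are given by
\[
\mathbb{E}[W^n_{t_k} - W^n_{t_i}] = 0 \quad\text{and}\quad \mathrm{Var}[W^n_{t_k} - W^n_{t_i}] = t_k - t_i,
\]
which shows in particular that the normalization by $n^{-1/2}$ in the definition of $W^n$ prevents the variance of the increments from blowing up,
• with $\mathcal{F}^n_t := \sigma(Y_j, j \le \lfloor nt \rfloor)$, $t \ge 0$, the sequence $\{W^n_{t_k}, k \in \mathbb{N}\}$ is a discrete $\{\mathcal{F}^n_{t_k}, k \in \mathbb{N}\}$-martingale:
\[
\mathbb{E}\big[W^n_{t_k} \,\big|\, \mathcal{F}^n_{t_i}\big] = W^n_{t_i} \quad\text{for } 0 \le i \le k.
\]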
Hence, except for the Gaussian feature of the increments, the discrete-time process $\{W^n_{t_k}, k \in \mathbb{N}\}$ is approximately a Brownian motion. One could even obtain Gaussian increments with the required mean and variance by replacing the distribution (4.1) by a convenient normal distribution. However, since our objective is to imitate the Brownian motion in the asymptotics $n \to \infty$, the Gaussian distribution of the increments is expected to hold in the limit by a central limit type of argument.
Figure 4.4 represents a typical sample path of the process $W^n$. Another interesting property of the rescaled random walk, which will be inherited by the Brownian motion, is the following quadratic variation result:
\[
[W^n, W^n]_{t_k} := \sum_{j=1}^{k} \big(W^n_{t_j} - W^n_{t_{j-1}}\big)^2 = t_k \quad\text{for } k \in \mathbb{N}.
\]
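The construction above is easy to try out numerically. The following is a minimal sketch, not part of the original notes: the function names `rescaled_walk` and `quadratic_variation` and the choice n = 400 are illustrative assumptions. It builds the rescaled walk on the grid t_k = k/n from fair ±1 steps and checks that the sum of squared increments equals t_k, which holds path by path because every squared increment is exactly 1/n.

```python
import math
import random

def rescaled_walk(n, rng):
    """Return [W^n_{t_0}, ..., W^n_{t_n}] with t_k = k/n, built from n fair +/-1 steps."""
    path = [0.0]
    for _ in range(n):
        step = 1 if rng.random() < 0.5 else -1   # P[Y=1] = P[Y=-1] = 1/2, as in (4.1)
        path.append(path[-1] + step / math.sqrt(n))
    return path

def quadratic_variation(path):
    """Sum of squared increments of the path over its grid."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

rng = random.Random(0)          # fixed seed, only for reproducibility
n = 400                         # hypothetical number of steps
w = rescaled_walk(n, rng)
qv = quadratic_variation(w)     # equals t_n = 1 up to floating-point rounding
```

Truncating the path, `quadratic_variation(w[:k + 1])` gives k/n for every k, whatever the seed; only the path itself is random.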
A possible proof of the existence of the Brownian motion consists in proving the convergence in distribution of the sequence $W^n$ toward a Brownian motion, i.e. a process with the properties listed in Definition 4.2. This is the so-called Donsker's invariance principle. The interested reader may consult a rigorous treatment of this limiting argument in Karatzas and Shreve [29], Theorem 2.4.20.
4.3 Distribution of the Brownian motion
Let W be a standard real Brownian motion. In this section, we list some
properties of W which are directly implied by its distribution.

Figure 4.4: The rescaled random walk
The Brownian motion is a martingale:
\[
\mathbb{E}[W_t \,|\, \mathcal{F}_s] = W_s \quad\text{for } 0 \le s \le t,
\]
where $\mathbb{F}$ is any filtration containing the canonical filtration $\mathbb{F}^W$ of the Brownian motion. From the Jensen inequality, it follows that the squared Brownian motion $W^2$ is a submartingale:
\[
\mathbb{E}\big[W_t^2 \,\big|\, \mathcal{F}_s\big] \ge W_s^2, \quad\text{for } 0 \le s < t.
\]
The precise departure from a martingale can be explicitly calculated:
\[
\mathbb{E}\big[W_t^2 \,\big|\, \mathcal{F}_s\big] = W_s^2 + (t - s), \quad\text{for } 0 \le s < t,
\]
which means that the process $\{W_t^2 - t, t \ge 0\}$ is a martingale. This is an example of the very general Doob-Meyer decomposition of submartingales, which extends to the continuous-time setting under some regularity conditions, see e.g. Karatzas and Shreve [29].
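The explicit calculation mentioned above takes one line. As a sketch, using only the independence of $W_t - W_s$ from $\mathcal{F}_s$ and its $\mathcal{N}(0, t-s)$ distribution (properties (iii) and (iv) of Definition 4.1):

```latex
\begin{aligned}
\mathbb{E}\big[W_t^2 \,\big|\, \mathcal{F}_s\big]
  &= \mathbb{E}\big[(W_s + (W_t - W_s))^2 \,\big|\, \mathcal{F}_s\big] \\
  &= W_s^2 + 2 W_s\, \mathbb{E}[W_t - W_s] + \mathbb{E}\big[(W_t - W_s)^2\big] \\
  &= W_s^2 + (t - s).
\end{aligned}
```

The cross term vanishes because the increment is centered, and the last term is its variance.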
The Brownian motion is a Markov process, i.e.
\[
\mathbb{E}[\phi(W_s, s \ge t) \,|\, \mathcal{F}_t] = \mathbb{E}[\phi(W_s, s \ge t) \,|\, W_t]
\]
for every $t \ge 0$ and every bounded continuous function $\phi : C^0(\mathbb{R}_+) \to \mathbb{R}$, where $C^0(\mathbb{R}_+)$ is the set of continuous functions from $\mathbb{R}_+$ to $\mathbb{R}$. This follows immediately from the fact that $W_s - W_t$ is independent of $\mathcal{F}_t$ for every $s \ge t$. We shall see later in Corollary 4.11 that the Markov property holds in a stronger sense by replacing the deterministic time $t$ by an arbitrary stopping time $\tau$.
The Brownian motion is a centered Gaussian process, as it follows from its definition that the vector random variable $(W_{t_1}, \ldots, W_{t_n})$ is Gaussian for every $0 \le t_1 < \ldots < t_n$. Centered Gaussian processes can be characterized in terms of their covariance function. A direct calculation provides the covariance function of the Brownian motion:
\[
\mathrm{Cov}(W_t, W_s) = \mathbb{E}[W_t W_s] = t \wedge s = \min\{t, s\}.
\]
The Kolmogorov theorem provides an alternative construction of the Brownian motion as a centered Gaussian process with the above covariance function. We will not elaborate on this, and we refer the interested reader to Karatzas and Shreve [29], Section 2.2.
We conclude this section by the following property which is very useful for the purpose of simulating the Brownian motion.
Exercise 4.6. For $0 \le t_1 < \hat t < t_2$, show that the conditional distribution of $W_{\hat t}$ given $(W_{t_1}, W_{t_2}) = (x_1, x_2)$ is Gaussian, and provide its mean and variance in closed form.
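To illustrate why this property matters for simulation, here is a small sketch, not taken from the notes, that uses the standard Brownian-bridge formulas (linear interpolation for the mean, $(t_2 - t)(t - t_1)/(t_2 - t_1)$ for the variance); these are the closed forms the exercise asks to derive. The function names `bridge_mean_var` and `sample_midpoint` are illustrative assumptions. Sampling a point between two already simulated values lets one refine a path without redrawing it.

```python
import math
import random

def bridge_mean_var(t1, x1, t2, x2, t):
    """Mean and variance of W_t given W_{t1} = x1 and W_{t2} = x2, for t1 < t < t2."""
    if not t1 < t < t2:
        raise ValueError("need t1 < t < t2")
    weight = (t - t1) / (t2 - t1)
    mean = x1 + weight * (x2 - x1)            # linear interpolation of the endpoints
    var = (t2 - t) * (t - t1) / (t2 - t1)     # vanishes at both endpoints
    return mean, var

def sample_midpoint(t1, x1, t2, x2, rng):
    """Draw W at the midpoint of [t1, t2] conditionally on both endpoint values."""
    mean, var = bridge_mean_var(t1, x1, t2, x2, 0.5 * (t1 + t2))
    return rng.gauss(mean, math.sqrt(var))

mean, var = bridge_mean_var(0.0, 0.0, 1.0, 2.0, 0.25)
mid = sample_midpoint(0.0, 0.0, 1.0, 2.0, random.Random(0))
```

Applying `sample_midpoint` recursively to each sub-interval refines a coarse path to any dyadic resolution.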
Distribution: By definition of the Brownian motion, for $0 \le t < T$, the conditional distribution of the random variable $W_T$ given $W_t = x$ is a $\mathcal{N}(x, T-t)$:
\[
p(t, x, T, y)\,dy := \mathbb{P}\big[W_T \in [y, y+dy] \,\big|\, W_t = x\big] = \frac{1}{\sqrt{2\pi(T-t)}}\, e^{-\frac{(y-x)^2}{2(T-t)}}\,dy.
\]
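An important observation is that this density function satisfies the heat equation for every fixed $(t, x)$:
\[
\frac{\partial p}{\partial T} = \frac{1}{2}\frac{\partial^2 p}{\partial y^2},
\]
as can be checked by direct calculation. One can also fix $(T, y)$ and express the heat equation in terms of the variables $(t, x)$:
\[
\frac{\partial p}{\partial t} + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} = 0. \tag{4.2}
\]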
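The backward heat equation (4.2) can also be checked numerically. The sketch below is an illustration rather than part of the notes: the step size `h`, the evaluation point, and the function names are arbitrary assumptions. It approximates both derivatives of the Gaussian transition density by central finite differences and evaluates the left-hand side of (4.2).

```python
import math

def p(t, x, T, y):
    """Transition density of W_T at y, given W_t = x (requires t < T)."""
    tau = T - t
    return math.exp(-(y - x) ** 2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)

def backward_residual(t, x, T, y, h=1e-3):
    """Central finite-difference estimate of dp/dt + (1/2) d2p/dx2 at (t, x)."""
    dp_dt = (p(t + h, x, T, y) - p(t - h, x, T, y)) / (2.0 * h)
    d2p_dx2 = (p(t, x + h, T, y) - 2.0 * p(t, x, T, y) + p(t, x - h, T, y)) / h ** 2
    return dp_dt + 0.5 * d2p_dx2

residual = backward_residual(t=0.2, x=0.1, T=1.0, y=0.5)
```

The residual is not exactly zero: it carries the O(h²) truncation error of the difference quotients, so it should be small relative to the individual derivative terms rather than vanish.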
We next consider a function $g$ with polynomial growth, say, and we define the conditional expectation:
\[
V(t, x) = \mathbb{E}[g(W_T) \,|\, W_t = x] = \int g(y)\, p(t, x, T, y)\,dy. \tag{4.3}
\]
Remark 4.7. Since $p$ is $C^\infty$, it follows from the dominated convergence theorem that $V$ is also $C^\infty$.
By direct differentiation inside the integral sign, it follows that the function $V$ is a solution of
\[
\frac{\partial V}{\partial t} + \frac{1}{2}\frac{\partial^2 V}{\partial x^2} = 0 \quad\text{and}\quad V(T, \cdot) = g. \tag{4.4}
\]
We shall see later in Section 8.5 that the function $V$ defined in (4.3) is the unique solution of the above linear partial differential equation in the class of polynomially growing functions.
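As a quick sanity check of (4.3)-(4.4), consider the illustrative choice $g(y) = y^2$ (an example chosen here, not one from the notes). Writing $W_T = x + (W_T - W_t)$ with $W_T - W_t \sim \mathcal{N}(0, T-t)$ gives $V$ in closed form, and the equation can be verified by hand:

```latex
\begin{aligned}
V(t, x) &= \mathbb{E}\big[(x + (W_T - W_t))^2\big] = x^2 + (T - t), \\
\frac{\partial V}{\partial t} + \frac{1}{2}\frac{\partial^2 V}{\partial x^2}
  &= -1 + \frac{1}{2}\cdot 2 = 0,
\qquad V(T, x) = x^2 = g(x).
\end{aligned}
```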
4.4 Scaling, symmetry, and time reversal
The following easy properties follow from the properties of the centered Gaussian distribution.
Proposition 4.8. Let $W$ be a standard Brownian motion, $t_0 > 0$, and $c > 0$. Then, so are the processes
• $\{-W_t, t \ge 0\}$ (symmetry),
• $\{c^{-1/2} W_{ct}, t \ge 0\}$ (scaling),
• $\{W_{t_0 + t} - W_{t_0}, t \ge 0\}$ (time translation),
• $\{W_{T-t} - W_T, 0 \le t \le T\}$ (time reversal).
Proof. Properties (ii), (iii') and (iv) of Definition 4.2 are immediately checked.
Remark 4.9. For a Brownian motion $W$ in $\mathbb{R}^n$, the symmetry property of the Brownian motion extends as follows: for any $(n \times n)$ matrix $A$ with $AA^{\mathrm{T}} = I_n$, the process $\{AW_t, t \ge 0\}$ is a Brownian motion.
Another invariance property for the Brownian motion will be obtained by time inversion in subsection 4.6 below. Indeed, the process $B$ defined by $B_0 := 0$ and $B_t := tW_{1/t}$, $t > 0$, obviously satisfies properties (iii) and (iv); property (ii) will be obtained as a consequence of the law of large numbers.
We next investigate whether the translation property of the Brownian motion can be extended to the case where the deterministic time $t_0$ is replaced by some random time. The following result states that this is indeed the case when the random time is a stopping time.
Proposition 4.10. Let $W$ be a Brownian motion, and consider some finite stopping time $\tau$. Then, the process $B$ defined by
\[
B_t := W_{t+\tau} - W_\tau, \quad t \ge 0,
\]
is a Brownian motion independent of $\mathcal{F}_\tau$.
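Proof. Clearly $B_0 = 0$ and $B$ has a.s. continuous sample paths. In the rest of this proof, we show that, for $0 \le t_1 < t_2 < t_3 < t_4$, $s > 0$, and bounded continuous functions $\phi$, $\psi$ and $f$:
\[
\begin{aligned}
&\mathbb{E}\big[\phi(B_{t_4} - B_{t_3})\,\psi(B_{t_2} - B_{t_1})\,f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\big] \\
&\qquad = \mathbb{E}[\phi(W_{t_4} - W_{t_3})]\;\mathbb{E}[\psi(W_{t_2} - W_{t_1})]\;\mathbb{E}\big[f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\big].
\end{aligned} \tag{4.5}
\]
This would imply that $B$ has independent increments with the required Gaussian distribution.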
Observe that we may restrict our attention to the case where $\tau$ has a finite support $\{s_1, \ldots, s_n\}$. Indeed, given that (4.5) holds for such stopping times, one may approximate any stopping time $\tau$ by a sequence of bounded stopping times $(\tau^N := \tau \wedge N)_{N \ge 1}$, and then approximate each $\tau^N$ by the decreasing sequence of stopping times $\tau^{N,n} := (\lfloor n\tau^N \rfloor + 1)/n$ of (3.1), apply (4.5) for each $n \ge 1$, and pass to the limit by the dominated convergence theorem, thus proving that (4.5) holds for $\tau$.
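For a stopping time $\tau$ with finite support $\{s_1, \ldots, s_n\}$, we have:
\[
\begin{aligned}
&\mathbb{E}\big[\phi(B_{t_4} - B_{t_3})\,\psi(B_{t_2} - B_{t_1})\,f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\big] \\
&\qquad = \sum_{i=1}^{n} \mathbb{E}\big[\phi(B_{t_4} - B_{t_3})\,\psi(B_{t_2} - B_{t_1})\,f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\mathbf{1}_{\{\tau = s_i\}}\big] \\
&\qquad = \sum_{i=1}^{n} \mathbb{E}\Big[\phi\big(W_{t_4^i} - W_{t_3^i}\big)\,\psi\big(W_{t_2^i} - W_{t_1^i}\big)\,f(W_s)\,\mathbf{1}_{\{\tau = s_i \ge s\}}\Big],
\end{aligned}
\]
where we denoted $t_k^i := s_i + t_k$ for $i = 1, \ldots, n$ and $k = 1, \ldots, 4$. We next condition upon $\mathcal{F}_{s_i}$ for each term inside the sum, and recall that $\mathbf{1}_{\{\tau = s_i\}}$ is $\mathcal{F}_{s_i}$-measurable as $\tau$ is a stopping time. This provides
\[
\begin{aligned}
&\mathbb{E}\big[\phi(B_{t_4} - B_{t_3})\,\psi(B_{t_2} - B_{t_1})\,f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\big] \\
&\qquad = \sum_{i=1}^{n} \mathbb{E}\Big\{\mathbb{E}\Big[\phi\big(W_{t_4^i} - W_{t_3^i}\big)\,\psi\big(W_{t_2^i} - W_{t_1^i}\big) \,\Big|\, \mathcal{F}_{s_i}\Big]\,f(W_s)\,\mathbf{1}_{\{\tau = s_i \ge s\}}\Big\} \\
&\qquad = \sum_{i=1}^{n} \mathbb{E}\Big\{\mathbb{E}[\phi(W_{t_4} - W_{t_3})]\;\mathbb{E}[\psi(W_{t_2} - W_{t_1})]\,f(W_s)\,\mathbf{1}_{\{\tau = s_i \ge s\}}\Big\},
\end{aligned}
\]
where the last equality follows from the independence of the increments of the Brownian motion and the symmetry of the Gaussian distribution. Hence
\[
\begin{aligned}
&\mathbb{E}\big[\phi(B_{t_4} - B_{t_3})\,\psi(B_{t_2} - B_{t_1})\,f(W_s)\,\mathbf{1}_{\{s \le \tau\}}\big] \\
&\qquad = \mathbb{E}[\phi(W_{t_4} - W_{t_3})]\;\mathbb{E}[\psi(W_{t_2} - W_{t_1})] \sum_{i=1}^{n} \mathbb{E}\big[f(W_s)\,\mathbf{1}_{\{\tau = s_i \ge s\}}\big],
\end{aligned}
\]
which is exactly (4.5).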


An immediate consequence of Proposition 4.10 is the strong Markov property
of the Brownian motion.
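Corollary 4.11. The Brownian motion satisfies the strong Markov property:
\[
\mathbb{E}[\phi(W_{s+\tau}, s \ge 0) \,|\, \mathcal{F}_\tau] = \mathbb{E}[\phi(W_s, s \ge \tau) \,|\, W_\tau]
\]
for every stopping time $\tau$, and every bounded function $\phi : C^0(\mathbb{R}_+) \to \mathbb{R}$.
Proof. Since $B_s := W_{s+\tau} - W_\tau$ is independent of $\mathcal{F}_\tau$ for every $s \ge 0$, we have
\[
\begin{aligned}
\mathbb{E}[\phi(W_s, s \ge \tau) \,|\, \mathcal{F}_\tau] &= \mathbb{E}[\phi(B_s + W_\tau, s \ge 0) \,|\, \mathcal{F}_\tau] \\
&= \mathbb{E}[\phi(B_s + W_\tau, s \ge 0) \,|\, W_\tau].
\end{aligned}
\]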
We next use the symmetry property of Proposition 4.10 in order to provide explicitly the joint distribution of the Brownian motion $W$ and the corresponding running maximum process:
\[
W^*_t := \sup_{0 \le s \le t} W_s, \quad t \ge 0.
\]
The key idea for this result is to make use of the Brownian motion started at the first hitting time of some level $y$:
\[
T_y := \inf\{t > 0 : W_t > y\}.
\]
Observe that
\[
\{W^*_t \ge y\} = \{T_y \le t\},
\]
which implies in particular a connection between the distributions of the running maximum $W^*$ and the first hitting time $T_y$.
Proposition 4.12. Let $W$ be a Brownian motion and $W^*$ the corresponding running maximum process. Then, for $t > 0$, the random variables $W^*_t$ and $|W_t|$ have the same distribution, i.e.
\[
\mathbb{P}[W^*_t \ge y] = \mathbb{P}[|W_t| \ge y].
\]
Furthermore, the joint distribution of the Brownian motion and the corresponding running maximum is characterized by
\[
\mathbb{P}[W_t \le x, W^*_t \ge y] = \mathbb{P}[W_t \ge 2y - x] \quad\text{for } y > 0 \text{ and } x \le y.
\]
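Proof. From Exercise 3.3 and Proposition 4.10, the first hitting time $T_y$ of the level $y$ is a stopping time, and the process
\[
B_t := W_{t + T_y} - W_{T_y}, \quad t \ge 0,
\]
is a Brownian motion independent of $\mathcal{F}_{T_y}$. Since $B_t$ and $-B_t$ have the same distribution and $W_{T_y} = y$, we compute that
\[
\begin{aligned}
\mathbb{P}[W_t \le x, W^*_t \ge y] &= \mathbb{P}\big[y + B_{t - T_y} \le x, T_y \le t\big] \\
&= \mathbb{E}\Big[\mathbf{1}_{\{T_y \le t\}}\,\mathbb{P}\big\{B_{t - T_y} \le x - y \,\big|\, \mathcal{F}_{T_y}\big\}\Big] \\
&= \mathbb{E}\Big[\mathbf{1}_{\{T_y \le t\}}\,\mathbb{P}\big\{-B_{t - T_y} \le x - y \,\big|\, \mathcal{F}_{T_y}\big\}\Big] \\
&= \mathbb{P}[W_t \ge 2y - x, W^*_t \ge y] = \mathbb{P}[W_t \ge 2y - x],
\end{aligned}
\]
where the last equality follows from the fact that $\{W^*_t \ge y\} \supset \{W_t \ge 2y - x\}$ as $y \ge x$. As for the marginal distribution of the running maximum, we decompose:
\[
\begin{aligned}
\mathbb{P}[W^*_t \ge y] &= \mathbb{P}[W_t < y, W^*_t \ge y] + \mathbb{P}[W_t \ge y, W^*_t \ge y] \\
&= \mathbb{P}[W_t < y, W^*_t \ge y] + \mathbb{P}[W_t \ge y] \\
&= 2\,\mathbb{P}[W_t \ge y] = \mathbb{P}[|W_t| \ge y],
\end{aligned}
\]
where the last two equalities follow from the first part of this proof together with the symmetry of the Gaussian distribution.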
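The two formulas of Proposition 4.12 are straightforward to evaluate. The sketch below is illustrative only: the function names are assumptions, and the normal distribution function is computed from the standard library's `math.erf`. Taking x = y in the joint formula recovers half of the marginal probability, as in the decomposition used in the proof.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_max_at_least(y, t):
    """P[W*_t >= y] = P[|W_t| >= y] = 2 P[W_t >= y], for y >= 0 and t > 0."""
    return 2.0 * (1.0 - norm_cdf(y / math.sqrt(t)))

def prob_joint(x, y, t):
    """P[W_t <= x, W*_t >= y] = P[W_t >= 2y - x], for y > 0 and x <= y."""
    if not (y > 0 and x <= y):
        raise ValueError("need y > 0 and x <= y")
    return 1.0 - norm_cdf((2.0 * y - x) / math.sqrt(t))

p_max = prob_max_at_least(1.0, 1.0)     # probability that the maximum over [0, 1] reaches 1
p_joint = prob_joint(1.0, 1.0, 1.0)     # x = y: equals P[W_1 >= 1], i.e. half of p_max
```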
The following property is useful for the approximation of Lookback options prices by Monte Carlo simulations.
Exercise 4.13. For a Brownian motion $W$ and $t > 0$, show that
\[
\mathbb{P}[W^*_t \ge y \,|\, W_t = x] = e^{-\frac{2}{t} y (y - x)} \quad\text{for } y \ge x^+.
\]
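One way this formula is used in Monte Carlo simulation is to draw the running maximum conditionally on the simulated endpoint by inverting the conditional survival function. The following is a sketch under that reading; the function names are hypothetical, and the inversion solves $2y^2 - 2xy + t\ln u = 0$ for its larger root, which always satisfies $y \ge x^+$.

```python
import math
import random

def cond_max_survival(y, x, t):
    """P[W*_t >= y | W_t = x] for y >= max(x, 0), as stated in Exercise 4.13."""
    return math.exp(-2.0 * y * (y - x) / t)

def sample_cond_max(x, t, u):
    """Return the y >= max(x, 0) at which cond_max_survival(y, x, t) equals u.

    u must lie in (0, 1]; feeding a uniform draw gives a sample of W*_t given W_t = x.
    """
    return 0.5 * (x + math.sqrt(x * x - 2.0 * t * math.log(u)))

rng = random.Random(1)
x, t = -0.3, 1.0                  # hypothetical endpoint value and horizon
u = 1.0 - rng.random()            # uniform on (0, 1], avoids log(0)
y = sample_cond_max(x, t, u)
```

In a lookback pricing loop one would first simulate the endpoint (or each increment of a discretized path) and then draw the maximum of each piece this way, which removes the bias of monitoring the maximum only on the grid.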
4.5 Brownian filtration and the Zero-One law
Because the Brownian motion has a.s. continuous sample paths, the corresponding canonical filtration $\mathbb{F}^W := \{\mathcal{F}^W_t, t \ge 0\}$ is left-continuous, i.e. $\bigvee_{s < t} \mathcal{F}^W_s = \mathcal{F}^W_t$. However, $\mathbb{F}^W$ is not right-continuous. To see this, observe that the event set $\{W \text{ has a local maximum at } t\}$ is in $\mathcal{F}^W_{t+} := \bigcap_{s > t} \mathcal{F}^W_s$, but is not in $\mathcal{F}^W_t$.
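This difficulty can be overcome by slightly enlarging the canonical filtration by the collection of zero-measure sets:
\[
\overline{\mathcal{F}}^W_t := \mathcal{F}^W_t \vee \mathcal{N}(\mathcal{F}) := \sigma\big(\mathcal{F}^W_t \cup \mathcal{N}(\mathcal{F})\big), \quad t \ge 0,
\]
where
\[
\mathcal{N}(\mathcal{F}) := \big\{A \subset \Omega : \text{there exists } \tilde A \in \mathcal{F} \text{ s.t. } A \subset \tilde A \text{ and } \mathbb{P}[\tilde A] = 0\big\}.
\]
The resulting filtration $\overline{\mathbb{F}}^W := \{\overline{\mathcal{F}}^W_t, t \ge 0\}$ is called the augmented canonical filtration, which will now be shown to be continuous.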
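We first start by the Blumenthal Zero-One Law.
Theorem 4.14. For any $A \in \mathcal{F}^W_{0+}$, we have $\mathbb{P}[A] \in \{0, 1\}$.
Proof. Since the increments of the Brownian motion are independent, it follows that $\mathcal{F}^W_{0+} \subset \mathcal{F}^W_\varepsilon$ is independent of $\mathcal{G}_\varepsilon := \sigma\big(W_s - W_\varepsilon, \varepsilon \le s \le 1\big)$, for all $\varepsilon > 0$. Then, for $A \in \mathcal{F}^W_{0+}$, we have $\mathbb{P}[A \,|\, \mathcal{G}_\varepsilon] = \mathbb{P}[A]$, a.s.
On the other hand, since $W_\varepsilon \to W_0$, $\mathbb{P}$-a.s., we see that, for all $t > 0$, $W_t = \lim_{\varepsilon \to 0}(W_t - W_\varepsilon)$, a.s., so that $W_t$ is measurable with respect to the $\sigma$-algebra $\mathcal{G} := \big(\bigvee_n \mathcal{G}_{1/n}\big) \vee \mathcal{N}(\mathcal{F})$. Then $\mathcal{F}^W_{0+} \subset \mathcal{G}$, and therefore by the monotonicity of $\mathcal{G}_\varepsilon$:
\[
\mathbf{1}_A = \mathbb{P}[A \,|\, \mathcal{G}] = \lim_{\varepsilon \to 0} \mathbb{P}[A \,|\, \mathcal{G}_\varepsilon] = \mathbb{P}[A].
\]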
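Theorem 4.15. Let $W$ be a Brownian motion. Then the augmented filtration $\overline{\mathbb{F}}^W$ is continuous and $W$ is an $\overline{\mathbb{F}}^W$-Brownian motion.
Proof. The left-continuity of $\overline{\mathbb{F}}^W$ is a direct consequence of the path continuity of $W$. The inclusion $\overline{\mathcal{F}}^W_0 \subset \overline{\mathcal{F}}^W_{0+}$ is trivial and, by Theorem 4.14, we have $\overline{\mathcal{F}}^W_{0+} \subset \sigma(\mathcal{N}(\mathcal{F})) \subset \overline{\mathcal{F}}^W_0$. Similarly, by the independent increments property, $\overline{\mathcal{F}}^W_{t+} = \overline{\mathcal{F}}^W_t$ for all $t \ge 0$. Finally, $W$ is an $\overline{\mathbb{F}}^W$-Brownian motion as it satisfies all the required properties of Definition 4.1.
In the rest of these notes, we will always work with the augmented filtration, and we still denote it as $\mathbb{F}^W := \overline{\mathbb{F}}^W$.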
4.6 Small/large time behavior of the Brownian sample paths
The discrete-time approximation of the Brownian motion suggests that $\frac{W_t}{t}$ tends to zero at least along natural numbers, by the law of large numbers. With a little effort, we obtain the following strong law of large numbers for the Brownian motion.
Theorem 4.16. For a Brownian motion $W$, we have
\[
\frac{W_t}{t} \longrightarrow 0 \quad \mathbb{P}\text{-a.s. as } t \to \infty.
\]
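Proof. We first decompose
\[
\frac{W_t}{t} = \frac{W_t - W_{\lfloor t \rfloor}}{t} + \frac{\lfloor t \rfloor}{t}\,\frac{W_{\lfloor t \rfloor}}{\lfloor t \rfloor}.
\]
By the law of large numbers, we have
\[
\frac{W_{\lfloor t \rfloor}}{\lfloor t \rfloor} = \frac{1}{\lfloor t \rfloor}\sum_{i=1}^{\lfloor t \rfloor}(W_i - W_{i-1}) \longrightarrow 0 \quad \mathbb{P}\text{-a.s.}
\]
We next estimate that
\[
\frac{|W_t - W_{\lfloor t \rfloor}|}{\lfloor t \rfloor} \le \frac{\Delta_{\lfloor t \rfloor + 1}}{\lfloor t \rfloor}, \quad\text{where}\quad \Delta_n := \sup_{n-1 < s \le n} |W_s - W_{n-1}|, \quad n \ge 1.
\]
Clearly, $\{\Delta_n, n \ge 1\}$ is a sequence of independent identically distributed random variables. By the symmetry of the Brownian motion, the distribution of $\Delta_n$ is controlled by Proposition 4.12: $\mathbb{P}[\Delta_1 \ge x] \le 2\,\mathbb{P}[W^*_1 \ge x] = 2\,\mathbb{P}[|W_1| \ge x]$. In particular, by a direct application of the Chebychev inequality, it is easily seen that $\sum_{n \ge 1} \mathbb{P}[\Delta_n \ge n\varepsilon] = \sum_{n \ge 1} \mathbb{P}[\Delta_1 \ge n\varepsilon] < \infty$ for every $\varepsilon > 0$. By the Borel-Cantelli Theorem, this implies that $\Delta_n / n \to 0$ $\mathbb{P}$-a.s. Hence
\[
\frac{W_t - W_{\lfloor t \rfloor}}{\lfloor t \rfloor} \longrightarrow 0 \quad \mathbb{P}\text{-a.s. as } t \to \infty,
\]
and the required result follows from the fact that $t / \lfloor t \rfloor \to 1$ as $t \to \infty$.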
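For completeness, here is a sketch of the summability step invoked in the proof, written for the running maximum $W^*_1$ of Proposition 4.12 (the supremum of $|W|$ over a unit interval satisfies the same bound up to a factor 2, by symmetry). For fixed $\varepsilon > 0$, the Chebychev inequality with the second moment $\mathbb{E}[W_1^2] = 1$ gives:

```latex
\sum_{n \ge 1} \mathbb{P}\big[W^*_1 \ge n\varepsilon\big]
  = \sum_{n \ge 1} \mathbb{P}\big[|W_1| \ge n\varepsilon\big]
  \le \sum_{n \ge 1} \frac{\mathbb{E}[W_1^2]}{n^2 \varepsilon^2}
  = \frac{1}{\varepsilon^2} \sum_{n \ge 1} \frac{1}{n^2} < \infty .
```

The Borel-Cantelli lemma then shows that the event in question occurs only finitely often, for every rational $\varepsilon > 0$.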
As an immediate consequence of the law of large numbers for the Brownian motion, we obtain the invariance property of the Brownian motion by time inversion:
Proposition 4.17. Let $W$ be a standard Brownian motion. Then the process
\[
B_0 = 0 \quad\text{and}\quad B_t := tW_{1/t} \quad\text{for } t > 0
\]
is a Brownian motion.
Proof. All of the properties (ii)-(iii)-(iv) of the definition are obvious, except for the sample path continuity at zero. But this is equivalent to the law of large numbers stated in Theorem 4.16.
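The "obvious" part can be made explicit through the covariance function. Since $B$ is a centered Gaussian process, it suffices to check that it has the covariance $\min\{s, t\}$ of the Brownian motion; as a sketch, for $0 < s \le t$:

```latex
\mathrm{Cov}(B_s, B_t)
  = s\,t\,\mathrm{Cov}\big(W_{1/s}, W_{1/t}\big)
  = s\,t\,\min\Big\{\frac{1}{s}, \frac{1}{t}\Big\}
  = s\,t\cdot\frac{1}{t}
  = s = \min\{s, t\}.
```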
The following result shows the path irregularity of the Brownian motion.
Proposition 4.18. Let $W$ be a Brownian motion in $\mathbb{R}$. Then, $\mathbb{P}$-a.s., $W$ changes sign infinitely many times in any time interval $[0, t]$, $t > 0$.
Proof. Observe that the random times
\[
\tau^+ := \inf\{t > 0 : W_t > 0\} \quad\text{and}\quad \tau^- := \inf\{t > 0 : W_t < 0\}
\]
are stopping times with respect to the augmented filtration $\overline{\mathbb{F}}^W$. Since this filtration is continuous, it follows that the event sets $\{\tau^+ = 0\}$ and $\{\tau^- = 0\}$ are in $\overline{\mathcal{F}}^W_0$. By the symmetry of the Brownian motion, its non-degeneracy on any interval $[0, t]$, $t > 0$, and the fact that $\overline{\mathcal{F}}^W_0$ is trivial, it follows that $\mathbb{P}[\tau^+ = 0] = \mathbb{P}[\tau^- = 0] = 1$. Hence for a.e. $\omega \in \Omega$, there are sequences of random times $\tau^+_n \searrow 0$ and $\tau^-_n \searrow 0$ with $W_{\tau^+_n} > 0$ and $W_{\tau^-_n} < 0$ for $n \ge 1$.
We next state that the sample path of the Brownian motion is not bounded $\mathbb{P}$-a.s.
Proposition 4.19. For a standard Brownian motion $W$, we have
\[
\limsup_{t \to \infty} W_t = \infty \quad\text{and}\quad \liminf_{t \to \infty} W_t = -\infty, \quad \mathbb{P}\text{-a.s.}
\]
Proof. By symmetry of the Brownian motion, we only have to prove the limsup result.
Step 1 The invariance of the Brownian motion by time inversion of Proposition 4.17 implies that
\[
\limsup_{t \to \infty} W_t = \limsup_{u \to 0} \frac{1}{u} B_u \quad\text{where}\quad B_u := uW_{1/u}\mathbf{1}_{\{u \neq 0\}}
\]
defines a Brownian motion. Then, it follows from the Zero-One law of Theorem 4.14 that $C_0 := \limsup_{t \to \infty} W_t$ is deterministic. By the symmetry of the Brownian motion, we see that $C_0 \in \mathbb{R}_+ \cup \{\infty\}$.
By the translation invariance of the Brownian motion, we see that
\[
C_0 = \limsup_{t \to \infty}(W_t - W_s) \quad\text{in distribution for every } s \ge 0.
\]
Then, if $C_0 < \infty$, it follows that, for $\lambda \neq 0$ and $s > 0$,
\[
e^{\lambda C_0} = \mathbb{E}\big[e^{\lambda(C_0 + W_s)}\big] = e^{\lambda C_0 + \lambda^2 s/2},
\]
which can not happen. Hence $C_0 = \infty$.
Another consequence is the following result which shows the complexity of the sample paths of the Brownian motion.
Proposition 4.20. For any $t_0 \ge 0$, we have
\[
\liminf_{t \searrow t_0} \frac{W_t - W_{t_0}}{t - t_0} = -\infty \quad\text{and}\quad \limsup_{t \searrow t_0} \frac{W_t - W_{t_0}}{t - t_0} = \infty.
\]
Proof. From the invariance of the Brownian motion by time translation, it is sufficient to consider $t_0 = 0$. From Proposition 4.17, $B_t := tW_{1/t}$ defines a Brownian motion. Since $W_t / t = B_{1/t}$, it follows that the behavior of $W_t / t$ for $t \searrow 0$ corresponds to the behavior of $B_u$ for $u \nearrow \infty$. The required limit result is then a restatement of Proposition 4.19.
We conclude this section by stating, without proof, the law of the iterated logarithm for the Brownian motion. The interested reader may consult [29] for the proof.
Theorem 4.21. For a Brownian motion $W$, we have
\[
\limsup_{t \searrow 0} \frac{W_t}{\sqrt{2t\ln(\ln\frac{1}{t})}} = 1 \quad\text{and}\quad \liminf_{t \searrow 0} \frac{W_t}{\sqrt{2t\ln(\ln\frac{1}{t})}} = -1.
\]
In particular, this result shows that the Brownian motion is nowhere $\frac{1}{2}$-Hölder continuous, see Exercise 4.4.
4.7 Quadratic variation
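In this section, we consider a sequence of partitions $\pi^n = (t^n_i)_{i \ge 1} \subset \mathbb{R}_+$, $n \ge 1$, such that
\[
\Delta t^n_i := t^n_i - t^n_{i-1} \ge 0 \quad\text{and}\quad |\pi^n| := \sup_{i \ge 1} |\Delta t^n_i| \longrightarrow 0 \quad\text{as } n \to \infty, \tag{4.6}
\]
where we set $t^n_0 := 0$, and we define the discrete quadratic variation:
\[
QV^{\pi^n}_t(W) := \sum_{i \ge 1} \big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big|^2 \quad\text{for all } n \ge 1. \tag{4.7}
\]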
As we shall see shortly, the Brownian motion has infinite total variation, see (4.8) below. In particular, this implies that classical integration theories are not suitable for the case of the Brownian motion. The key idea in order to define an integration theory with respect to the Brownian motion is the following result, which states that the quadratic variation defined as the $L^2$-limit of (4.7) is finite.
Before stating the main result of this section, we observe that the quadratic variation (along any subdivision) of a continuously differentiable function $f$ converges to zero. Indeed, $\sum_{t_i \le t} |f(t_{i+1}) - f(t_i)|^2 \le \|f'\|^2_{L^\infty([0,t])} \sum_{t_i \le t} |t_{i+1} - t_i|^2 \longrightarrow 0$. Because of the non-differentiability property stated in Proposition 4.20, this result does not hold for the Brownian motion.
Proposition 4.22. Let $W$ be a standard Brownian motion in $\mathbb{R}$, and $(\pi^n)_{n \ge 1}$ a partition as in (4.6). Then the quadratic variation of the Brownian motion is finite and given by:
\[
\langle W \rangle_t := L^2\text{-}\lim_{n \to \infty} QV^{\pi^n}_t(W) = t \quad\text{for all } t \ge 0.
\]
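Proof. We directly compute that:
\[
\begin{aligned}
\mathbb{E}\Big[\big(QV^{\pi^n}_t(W) - t\big)^2\Big]
&= \mathbb{E}\Big[\Big(\sum_{i \ge 1} \Big(\big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big|^2 - (t^n_i \wedge t - t^n_{i-1} \wedge t)\Big)\Big)^2\Big] \\
&= \sum_{i \ge 1} \mathbb{E}\Big[\Big(\big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big|^2 - (t^n_i \wedge t - t^n_{i-1} \wedge t)\Big)^2\Big] \\
&= 2\sum_{i \ge 1} (t^n_i \wedge t - t^n_{i-1} \wedge t)^2 \le 2t|\pi^n|
\end{aligned}
\]
by the independence of the increments of the Brownian motion and the fact that $W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t} \sim \mathcal{N}(0, t^n_i \wedge t - t^n_{i-1} \wedge t)$.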
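The factor 2 in this computation comes from the Gaussian identity $\mathbb{E}[(Z^2 - \sigma^2)^2] = 2\sigma^4$ for $Z \sim \mathcal{N}(0, \sigma^2)$. The contrast between finite quadratic variation and exploding total variation can also be observed on simulated increments. The following is a rough illustration, assuming a uniform partition of $[0, t]$ and a fixed seed; none of the names below come from the notes.

```python
import math
import random

def brownian_increments(t, n, rng):
    """n independent N(0, t/n) increments of W over a uniform partition of [0, t]."""
    sd = math.sqrt(t / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]

def discrete_qv(increments):
    """Discrete quadratic variation (4.7): sum of squared increments."""
    return sum(d * d for d in increments)

def discrete_tv(increments):
    """Discrete total variation: sum of absolute increments."""
    return sum(abs(d) for d in increments)

rng = random.Random(42)
t = 1.0
results = {}
for n in (100, 10_000):
    dw = brownian_increments(t, n, rng)
    results[n] = (discrete_qv(dw), discrete_tv(dw))
```

On the finer partition the quadratic variation should sit close to t, with standard deviation $\sqrt{2t^2/n}$ according to the computation above, while the total variation grows like $\sqrt{n}$.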
Remark 4.23. Proposition 4.22 has a natural direct extension to the multi-dimensional setting. Let $W$ be a standard Brownian motion in $\mathbb{R}^d$, then, $\mathbb{P}$-a.s.
\[
\sum_{t^n_i \le t} \big(W_{t^n_{i+1}} - W_{t^n_i}\big)\big(W_{t^n_{i+1}} - W_{t^n_i}\big)^{\mathrm{T}} \longrightarrow t\,I_d, \quad t \ge 0,
\]
where $I_d$ is the identity matrix of $\mathbb{R}^d$. We leave the verification of this result as an exercise.
The convergence result of Proposition 4.22 can be improved for partitions $\pi^n$ whose mesh $|\pi^n|$ satisfies a fast convergence to zero. As a complement, the following result considers the dyadic partition $\delta^n = (\delta^n_i)_{i \ge 1}$ defined by:
\[
\delta^n_i := i2^{-n}, \quad\text{for integers } i \ge 0 \text{ and } n \ge 1.
\]
Proposition 4.24. Let $W$ be a standard Brownian motion in $\mathbb{R}$. Then:
\[
\mathbb{P}\Big[\lim_{n \to \infty} V^{\delta^n}_t(W) = t, \text{ for every } t \ge 0\Big] = 1.
\]
Proof. See section 4.8.
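Remark 4.25. Inspecting the proof of Proposition 4.24, we see that the quadratic variation along any subdivision $0 = s^n_0 < \ldots < s^n_n = t$ satisfies:
\[
\sum_{i=1}^{n} \big|W_{s^n_{i+1}} - W_{s^n_i}\big|^2 \longrightarrow t \quad \mathbb{P}\text{-a.s. whenever}\quad \sum_{n \ge 1} \sup_{1 \le i \le n} |s^n_{i+1} - s^n_i| < \infty.
\]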
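We finally observe that Proposition 4.22 implies that the total variation of the Brownian motion is infinite:
\[
L^2\text{-}\lim_{n \to \infty} \sum_{i \ge 1} \big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big| = \infty. \tag{4.8}
\]
This follows from the inequality
\[
QV^{\pi^n}_t(W) \le \max_{i \ge 1} \big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big| \sum_{i \ge 1} \big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big|,
\]
together with the fact that $\max_{i \ge 1} \big|W_{t^n_i \wedge t} - W_{t^n_{i-1} \wedge t}\big| \longrightarrow 0$, due to the continuity of the Brownian motion. For this reason, the Stieltjes theory of integration does not apply to the Brownian motion.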
4.8 Complement
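Proof of Proposition 4.24 We shall simply denote $V^n_t := V^{\delta^n}_t(W)$.
(i) We first fix $t > 0$ and show that $V^n_t \longrightarrow t$ $\mathbb{P}$-a.s. as $n \to \infty$, or equivalently:
\[
\sum_{t^n_i \le t} Z_i \xrightarrow[n \to \infty]{} 0 \quad \mathbb{P}\text{-a.s. where}\quad Z_i := \big(W_{t^n_{i+1}} - W_{t^n_i}\big)^2 - 2^{-n}.
\]
Observe that $\mathbb{E}[Z_i Z_j] = 0$ for $i \neq j$, and $\mathbb{E}[Z_j^2] = C2^{-2n}$ for some constant $C > 0$. Then
\[
\sum_{n \le N} \mathbb{E}\Big[\Big(\sum_{t^n_i \le t} Z_i\Big)^2\Big] = \sum_{n \le N} \sum_{t^n_i \le t} \mathbb{E}\big[Z_i^2\big] = C\sum_{n \le N} 2^{-n} \sum_{t^n_i \le t} 2^{-n} \le C(t+1)\sum_{n \le N} 2^{-n}.
\]
Then, it follows from the monotone convergence theorem that
\[
\mathbb{E}\Big[\sum_{n \ge 1} \Big(\sum_{t^n_i \le t} Z_i\Big)^2\Big] \le \liminf_{N \to \infty} \sum_{n \le N} \mathbb{E}\Big[\Big(\sum_{t^n_i \le t} Z_i\Big)^2\Big] < \infty.
\]
In particular, this shows that the series $\sum_{n \ge 1} \big(\sum_{t^n_i \le t} Z_i\big)^2$ is a.s. finite, and therefore $\sum_{t^n_i \le t} Z_i \longrightarrow 0$ $\mathbb{P}$-a.s. as $n \to \infty$.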
(ii) From the first step of this proof, we can find a zero measure set $N_s$ for each rational number $s \in \mathbb{Q}$. For an arbitrary $t \ge 0$, let $(s_p)$ and $(s'_p)$ be two monotonic sequences of rational numbers with $s_p \nearrow t$ and $s'_p \searrow t$. Then, except on the zero-measure set $N := \cup_{s \in \mathbb{Q}} N_s$, it follows from the monotonicity of the quadratic variation that
\[
s_p = \lim_{n \to \infty} V^n_{s_p} \le \liminf_{n \to \infty} V^n_t \le \limsup_{n \to \infty} V^n_t \le \lim_{n \to \infty} V^n_{s'_p} = s'_p.
\]
Sending $p \to \infty$ shows that $V^n_t \longrightarrow t$ as $n \to \infty$ for every $\omega$ outside the zero-measure set $N$.
Chapter 5
Stochastic integration with respect to the Brownian motion
Recall from (4.8) that the total variation of the Brownian motion is infinite:
\[
\lim_{n \to \infty} \sum_{t^n_i \le t} \big|W_{t^n_{i+1}} - W_{t^n_i}\big| = \infty \quad \mathbb{P}\text{-a.s.}
\]
Because of this property, one can not hope to define the stochastic integral with respect to the Brownian motion pathwise. To understand this, let us forget for a moment about stochastic processes. Let $\varphi, f : [0, 1] \longrightarrow \mathbb{R}$ be continuous functions, and consider the Riemann sum:
\[
S_n := \sum_{t^n_i \le 1} \varphi\big(t^n_{i-1}\big)\big[f(t^n_i) - f\big(t^n_{i-1}\big)\big].
\]
Then, if the total variation of $f$ is infinite, one can not guarantee that the above sum converges for every continuous function $\varphi$.
In order to circumvent this limitation, we shall make use of the finiteness of the quadratic variation of the Brownian motion, which allows us to obtain an $L^2$-definition of stochastic integration.
5.1 Stochastic integrals of simple processes
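Throughout this section, we fix a final time $T > 0$. A process $\phi$ is called simple if there exists a strictly increasing sequence $(t_n)_{n \ge 0}$ in $\mathbb{R}$ and a sequence of random variables $(\varphi_n)_{n \ge 0}$ such that
\[
\phi_t = \varphi_0 \mathbf{1}_{\{0\}}(t) + \sum_{n=0}^{\infty} \varphi_n \mathbf{1}_{(t_n, t_{n+1}]}(t), \quad t \ge 0,
\]
and
\[
\varphi_n \text{ is } \mathcal{F}_{t_n}\text{-measurable for every } n \ge 0 \quad\text{and}\quad \sup_{n \ge 0} \|\varphi_n\|_\infty < \infty.
\]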
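We shall denote by $\mathcal{S}$ the collection of all simple processes. For $\phi \in \mathcal{S}$, we define its stochastic integral with respect to the Brownian motion by:
\[
I^0_t(\phi) := \sum_{n \ge 0} \varphi_n \big(W_{t \wedge t_{n+1}} - W_{t \wedge t_n}\big), \quad 0 \le t \le T. \tag{5.1}
\]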
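By this definition, we immediately see that:
\[
\mathbb{E}\big[I^0_t(\phi) \,\big|\, \mathcal{F}_s\big] = I^0_s(\phi) \quad\text{for } 0 \le s \le t, \tag{5.2}
\]
i.e. $\{I^0_t(\phi), t \ge 0\}$ is a martingale. We also calculate that
\[
\mathbb{E}\big[I^0_t(\phi)^2\big] = \mathbb{E}\Big[\int_0^t |\phi_s|^2\,ds\Big] \quad\text{for } t \ge 0. \tag{5.3}
\]
Exercise 5.1. Prove properties (5.2) and (5.3).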
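Properties (5.2) and (5.3) can be observed on a toy example before proving them. The sketch below is an illustration under assumptions chosen here, not taken from the notes: four steps on [0, 1], the integrand equal to the Brownian motion at the left endpoint clipped to [-1, 1] (bounded, and known at the start of each interval), and a fixed seed. It estimates the mean and second moment of the integral (5.1) by Monte Carlo and compares the latter with the expected integral of the squared integrand.

```python
import math
import random

def simulate_simple_integral(n_steps, T, rng):
    """One path of I^0_T(phi) for phi = clip(W_{t_k}, -1, 1) on (t_k, t_{k+1}].

    Returns (integral, sum of phi_k^2 * dt), with t_k = k * T / n_steps.
    """
    dt = T / n_steps
    w = 0.0
    integral = 0.0
    energy = 0.0
    for _ in range(n_steps):
        phi_k = max(-1.0, min(1.0, w))         # bounded and known at time t_k (adapted)
        dw = rng.gauss(0.0, math.sqrt(dt))     # increment of W over (t_k, t_{k+1}]
        integral += phi_k * dw
        energy += phi_k * phi_k * dt
        w += dw
    return integral, energy

rng = random.Random(7)
n_paths = 20_000
samples = [simulate_simple_integral(4, 1.0, rng) for _ in range(n_paths)]
mean_integral = sum(i for i, _ in samples) / n_paths          # estimates E[I] = 0
second_moment = sum(i * i for i, _ in samples) / n_paths      # estimates E[I^2]
mean_energy = sum(e for _, e in samples) / n_paths            # estimates E[int phi^2 ds]
```

Up to Monte Carlo error, `mean_integral` should be near zero (martingale property with s = 0) and `second_moment` should match `mean_energy`, which is the content of (5.3). The key feature making this work is that each coefficient is fixed before the increment it multiplies is drawn.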
Our objective is to extend $I^0$ to a stochastic integral operator $I$ acting on the larger set
\[
\mathbb{H}^2 := \Big\{\phi : \text{measurable, } \mathbb{F}\text{-adapted processes with } \mathbb{E}\Big[\int_0^T |\phi_t|^2\,dt\Big] < \infty\Big\},
\]
which is a Hilbert space when equipped with the norm
\[
\|\phi\|_{\mathbb{H}^2} := \Big(\mathbb{E}\Big[\int_0^T |\phi_t|^2\,dt\Big]\Big)^{1/2}.
\]
The extension of $I^0$ to $\mathbb{H}^2$ is crucially based on the following density result.
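Proposition 5.2. The set of simple processes $\mathcal{S}$ is dense in $\mathbb{H}^2$, i.e. for every $\phi \in \mathbb{H}^2$, there is a sequence $\big(\phi^{(n)}\big)_{n \ge 0}$ of processes in $\mathcal{S}$ such that $\|\phi - \phi^{(n)}\|_{\mathbb{H}^2} \longrightarrow 0$ as $n \to \infty$.
The proof of this result is reported in the Complements section 5.4.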
5.2 Stochastic integrals of processes in $\mathbb{H}^2$
5.2.1 Construction
We now consider a process $\phi \in \mathbb{H}^2$, and we intend to define the stochastic integral $I_T(\phi)$ for every $T \ge 0$ by using the density of simple processes.
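a. From Proposition 5.2, there is a sequence $\big(\phi^{(n)}\big)_{n \ge 0}$ which approximates $\phi$ in the sense that $\|\phi - \phi^{(n)}\|_{\mathbb{H}^2} \longrightarrow 0$ as $n \to \infty$. We next observe from (5.3) that, for every $0 \le t \le T$:
\[
\begin{aligned}
\Big\|I^0_t\big(\phi^{(n)}\big) - I^0_t\big(\phi^{(m)}\big)\Big\|^2_{L^2}
&= \mathbb{E}\Big[\big|I^0_t\big(\phi^{(n)} - \phi^{(m)}\big)\big|^2\Big] \\
&= \mathbb{E}\Big[\int_0^t |\phi^{(n)}_s - \phi^{(m)}_s|^2\,ds\Big] \le \big\|\phi^{(n)} - \phi^{(m)}\big\|^2_{\mathbb{H}^2}
\end{aligned}
\]
converges to zero as $n, m \to \infty$. This shows that the sequence $\big(I^0_t\big(\phi^{(n)}\big)\big)_{n \ge 0}$ is a Cauchy sequence in $L^2$, and therefore
\[
I^0_t\big(\phi^{(n)}\big) \longrightarrow I_t(\phi) \text{ in } L^2 \quad\text{for some random variable } I_t(\phi).
\]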
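b. We next show that the limit $I_t(\phi)$ does not depend on the choice of the approximating sequence $\big(\phi^{(n)}\big)_n$. Indeed, for another approximating sequence $\big(\psi^{(n)}\big)_n$ of $\phi$, we have
\[
\begin{aligned}
\Big\|I^0_t\big(\psi^{(n)}\big) - I_t(\phi)\Big\|_{L^2}
&\le \Big\|I^0_t\big(\psi^{(n)}\big) - I^0_t\big(\phi^{(n)}\big)\Big\|_{L^2} + \Big\|I^0_t\big(\phi^{(n)}\big) - I_t(\phi)\Big\|_{L^2} \\
&\le \big\|\psi^{(n)} - \phi^{(n)}\big\|_{\mathbb{H}^2} + \Big\|I^0_t\big(\phi^{(n)}\big) - I_t(\phi)\Big\|_{L^2} \\
&\longrightarrow 0 \quad\text{as } n \to \infty.
\end{aligned}
\]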
c. Observe that the above construction applies without any difficulty if the time index $t \le T$ is replaced by a stopping time $\tau$ with values in $[0, T]$. The only ingredient needed for this is Doob's optional sampling theorem 3.11. The notation of the stochastic integral is naturally extended to $I_\tau(\phi)$ for all such stopping times.
The following result summarizes the above construction.
Theorem 5.3. For $\phi \in \mathbb{H}^2$ and a stopping time $\tau$ with values in $[0, T]$, the stochastic integral denoted
\[
I_\tau(\phi) := \int_0^\tau \phi_s\,dW_s
\]
is the unique limit in $L^2$ of the sequence $\big(I^0_\tau(\phi^{(n)})\big)_n$ for every choice of an approximating sequence $\big(\phi^{(n)}\big)_n$ of $\phi$ in $\mathbb{H}^2$.
5.2.2 The stochastic integral as a continuous process
For every $\phi \in \mathbb{H}^2$, the previous theorem defined a family $\{I_\tau(\phi), \tau\}$, where $\tau$ ranges in the set of all stopping times with values in $[0, T]$. We now aim at aggregating this family into a process $\{I_t(\phi), t \in [0, T]\}$ so that
\[
I_\tau(\phi)(\omega) = I_{\tau(\omega)}(\phi)(\omega) = I_T\big(\phi\mathbf{1}_{[0,\tau]}\big)(\omega). \tag{5.4}
\]
The meaning of (5.4) is the following. For a stopping time $\tau$ with values in $[0, T]$, and a process $\phi \in \mathbb{H}^2$, we may compute the stochastic integral of $\phi$ with respect to the Brownian motion on $[0, \tau]$ either by $I_\tau(\phi)$ or by $I_T(\phi\mathbf{1}_{[0,\tau]})$. Therefore, for the consistency of the stochastic integral operator, we have to verify that $I_\tau(\phi) = I_T(\phi\mathbf{1}_{[0,\tau]})$.
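Proposition 5.4. For a process $\phi \in \mathbb{H}^2$ and a stopping time $\tau$ with values in $[0, T]$, we have $I_\tau(\phi) = I_T(\phi\mathbf{1}_{[0,\tau]})$.
Proof. Consider the approximation of $\tau$ by the decreasing sequence of stopping times $\tau_n := (\lfloor n\tau \rfloor + 1)/n$. Let $t_i := i/n$, and observe that $\mathbf{1}_{[0,\tau_n]} = \sum_i \mathbf{1}_{[t_i, t_{i+1})}(\tau)\mathbf{1}_{(0, t_{i+1}]}$ is a simple process. Then, the equality
\[
I_{\tau_n}(\phi) = I_T\big(\phi\mathbf{1}_{[0,\tau_n]}\big) \quad\text{for simple processes } \phi \in \mathcal{S} \tag{5.5}
\]
is trivial. Since $\phi\mathbf{1}_{[0,\tau_n]} \longrightarrow \phi\mathbf{1}_{[0,\tau]}$ in $\mathbb{H}^2$, it follows that $I_T(\phi\mathbf{1}_{[0,\tau_n]}) \longrightarrow I_T(\phi\mathbf{1}_{[0,\tau]})$ in $L^2$. Then, by the pathwise continuity of $\{I_t(\phi), t \in [0, T]\}$, we deduce that the proposition holds true for simple processes.
Now for $\phi \in \mathbb{H}^2$ with an approximating sequence $\phi^n \longrightarrow \phi$ in $\mathbb{H}^2$, we have $I_\tau(\phi^n) = I_T(\phi^n\mathbf{1}_{[0,\tau]})$ for all $n$, and we obtain the required result by sending $n$ to infinity.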
In the previous proof we used the fact that, for a simple process $\phi \in \mathcal{S}$, the process $\{I_t(\phi), t \in [0, T]\}$ is pathwise continuous. The next result extends this property to any $\phi \in \mathbb{H}^2$.
Proposition 5.5. Let $\phi$ be a process in $\mathbb{H}^2$. Then the process $\{I_t(\phi), t \in [0, T]\}$ has continuous sample paths a.s.
Proof. Denote $M_t := I_t(\phi)$. By definition of the stochastic integral, $M_t$ is the $L^2$-limit of $M^n_t := I^0_t(\phi^n)$ for some sequence $(\phi^n)_n$ of simple processes converging to $\phi$ in $\mathbb{H}^2$. By definition of the stochastic integral of simple integrands in (5.1), notice that the process $\{M^n_t - M^m_t = I^0_t(\phi^n) - I^0_t(\phi^m), t \ge 0\}$ is a.s. continuous and the process $(|M^n_t - M^m_t|)_{t \ge 0}$ is a non-negative submartingale. We then deduce from Doob's maximal inequality of Theorem 3.15 that:
\[
\mathbb{E}\Big[\big(|M^n - M^m|^*_t\big)^2\Big] \le 4\,\mathbb{E}\Big[\int_0^t |\phi^n_s - \phi^m_s|^2\,ds\Big].
\]
This shows that the sequence $(M^n)_n$ is a Cauchy sequence in the Banach space of continuous processes endowed with the norm $\big(\mathbb{E}\big[\sup_{[0,T]} |X_s|^2\big]\big)^{1/2}$. Then $M^n$ converges towards a continuous process $\bar M$ in the sense of this norm. We know however that $M^n_t \longrightarrow M_t := I_t(\phi)$ in $L^2$ for all $t \in [0, T]$. By passing to subsequences we may deduce that $\bar M = M$ is continuous.
5.2.3 Martingale property and the Itô isometry
We finally show that the uniquely defined limit $I_t(\phi)$ satisfies the analogues of properties (5.2)-(5.3).
Proposition 5.6. For $\phi \in \mathbb{H}^2$ and $t \le T$, we have:
• Martingale property: $\mathbb{E}[I_T(\phi) \,|\, \mathcal{F}_t] = I_t(\phi)$,
• Itô isometry: $\mathbb{E}\big[I_T(\phi)^2\big] = \|\phi\|^2_{\mathbb{H}^2}$.
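Proof. To see that the martingale property holds, we directly compute with $\big(\phi^{(n)}\big)_n$ an approximating sequence of $\phi$ in $\mathbb{H}^2$ that
$$\begin{aligned}
\big\|\mathbb{E}[I_T(\phi)|\mathcal{F}_t] - I_t(\phi)\big\|_{L^2}
&\le \big\|\mathbb{E}[I_T(\phi)|\mathcal{F}_t] - \mathbb{E}\big[I^0_T(\phi^{(n)})|\mathcal{F}_t\big]\big\|_{L^2} + \big\|\mathbb{E}\big[I^0_T(\phi^{(n)})|\mathcal{F}_t\big] - I_t(\phi)\big\|_{L^2} \\
&= \big\|\mathbb{E}[I_T(\phi)|\mathcal{F}_t] - \mathbb{E}\big[I^0_T(\phi^{(n)})|\mathcal{F}_t\big]\big\|_{L^2} + \big\|I^0_t(\phi^{(n)}) - I_t(\phi)\big\|_{L^2}
\end{aligned}$$
by (5.2). By the Jensen inequality and the law of iterated expectations, this provides
$$\big\|\mathbb{E}[I_T(\phi)|\mathcal{F}_t] - I_t(\phi)\big\|_{L^2} \le \big\|I_T(\phi) - I^0_T(\phi^{(n)})\big\|_{L^2} + \big\|I^0_t(\phi^{(n)}) - I_t(\phi)\big\|_{L^2},$$
which implies the required result by sending $n$ to infinity.
As for the Itô isometry, it follows from the $\mathbb{H}^2$ convergence of $\phi^{(n)}$ towards $\phi$ and the $L^2$ convergence of $I^0\big(\phi^{(n)}\big)$ towards $I(\phi)$, together with (5.3), that:
$$\mathbb{E}\Big[\int_0^T |\phi_t|^2\,dt\Big] = \lim_{n\to\infty}\mathbb{E}\Big[\int_0^T |\phi^{(n)}_t|^2\,dt\Big] = \lim_{n\to\infty}\mathbb{E}\Big[I^0_T\big(\phi^{(n)}\big)^2\Big] = \mathbb{E}\big[I_T(\phi)^2\big].$$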
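The two properties of Proposition 5.6 can be checked numerically on a simple example. The sketch below is only an illustration under stated assumptions: it takes the integrand $\phi_t = W_t$ in dimension one, replaces it by its left-point discretization on a uniform grid (a simple process), and estimates by Monte Carlo the mean and the second moment of the resulting integral. The function name `simulate_ito_integral`, the grid size, the seed and the number of paths are hypothetical choices, not part of the notes.

```python
import math
import random


def simulate_ito_integral(n_steps, horizon, rng):
    """Left-point sum approximating int_0^T W_s dW_s along one simulated path."""
    dt = horizon / n_steps
    w = 0.0          # current value of the Brownian path
    integral = 0.0   # running value of the simple-process integral
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        integral += w * dw   # integrand frozen at the left endpoint
        w += dw
    return integral


rng = random.Random(0)
n_steps, horizon, n_paths = 50, 1.0, 20000
samples = [simulate_ito_integral(n_steps, horizon, rng) for _ in range(n_paths)]

# Martingale property started from 0: the integral should have mean 0.
mean_estimate = sum(samples) / n_paths
# Ito isometry: the second moment should match the squared H^2 norm.
second_moment = sum(x * x for x in samples) / n_paths
# Squared H^2 norm of the discretized integrand: sum_i t_i * dt.
dt = horizon / n_steps
h2_norm_sq = sum((i * dt) * dt for i in range(n_steps))

print(mean_estimate, second_moment, h2_norm_sq)
```

The last quantity is the discrete counterpart of $\int_0^T t\,dt = T^2/2$; the Monte Carlo second moment should agree with it up to sampling error.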

5.2.4 Deterministic integrands
We report the main message of this subsection in the following exercise.
Exercise 5.7. Let $f:[0,T]\to\mathbb{R}^d$ be a deterministic function with $\int_0^T |f(t)|^2\,dt < \infty$.
1. Prove that $\int_0^T f(t)\cdot dW_t$ has distribution $\mathcal{N}\Big(0, \int_0^T |f(t)|^2\,dt\Big)$.
Hint: Use the closedness of the Gaussian space.
2. Prove that the process
$$\exp\Big(\int_0^t f(s)\cdot dW_s - \frac{1}{2}\int_0^t |f(s)|^2\,ds\Big),\quad t\ge 0,$$
is a martingale.
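A quick simulation can make the two claims of Exercise 5.7 plausible before proving them. The sketch below assumes, for illustration only, the scalar function $f(t) = t$ on $[0,1]$ and a left-point discretization of the Wiener integral; the helper `wiener_integral`, the seed and the sample sizes are hypothetical. It compares the sample variance with $\sum_i f(t_i)^2\,\Delta t$ and checks that the exponential of part 2 has mean close to 1 at the terminal time.

```python
import math
import random


def wiener_integral(f, n_steps, horizon, rng):
    """Left-point sum approximating int_0^T f(t) dW_t for a deterministic f."""
    dt = horizon / n_steps
    return sum(f(i * dt) * rng.gauss(0.0, math.sqrt(dt)) for i in range(n_steps))


def f(t):
    # Illustrative deterministic integrand (an assumption of this sketch).
    return t


rng = random.Random(1)
n_steps, horizon, n_paths = 100, 1.0, 5000
dt = horizon / n_steps

# Variance predicted by the exercise for the discretized integrand.
variance_target = sum(f(i * dt) ** 2 * dt for i in range(n_steps))

draws = [wiener_integral(f, n_steps, horizon, rng) for _ in range(n_paths)]
sample_mean = sum(draws) / n_paths
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (n_paths - 1)

# Terminal value of the exponential martingale: its mean should be close to 1.
exp_mart_mean = sum(math.exp(x - 0.5 * variance_target) for x in draws) / n_paths

print(sample_mean, sample_var, variance_target, exp_mart_mean)
```

Since a finite sum of independent Gaussian increments is exactly Gaussian, the only approximation here is the replacement of $\int_0^1 t^2\,dt = 1/3$ by its Riemann sum.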
5.3 Stochastic integration beyond $\mathbb{H}^2$ and Itô processes
Our next task is to extend the stochastic integration to integrands in the set
$$\mathbb{H}^2_{\rm loc} := \Big\{\phi \text{ measurable, } \mathbb{F}\text{-adapted}:\ \int_0^T |\phi_s|^2\,ds < \infty \text{ a.s.}\Big\}. \quad (5.6)$$
To do this, we consider for every $\phi\in\mathbb{H}^2_{\rm loc}$ the sequence of stopping times
$$\tau_n := \inf\Big\{t>0:\ \int_0^t |\phi_u|^2\,du \ge n\Big\}.$$
Clearly, $(\tau_n)_n$ is a non-decreasing sequence of stopping times and $\tau_n\to\infty$, $\mathbb{P}$-a.s. when $n\to\infty$.
For fixed $n>0$, the process $\phi^n := \phi\,\mathbf{1}_{[0,\tau_n]}$ is in $\mathbb{H}^2$. Then the stochastic integral $I_t(\phi^n)$ is well-defined by Theorem 5.3. Since $(\tau_n)_n$ is non-decreasing and $\mathbb{P}[\tau_n \ge t \text{ for some } n\ge 1] = 1$, it follows that the limit
$$I_t(\phi) := \text{(a.s.)}\lim_{n\to\infty} I_t(\phi^n) \quad (5.7)$$
exists (in fact $I_t(\phi) = I_t(\phi^n)$ for $n$ sufficiently large, a.s.).
Remark 5.8. The above extension of the stochastic integral to integrands in $\mathbb{H}^2_{\rm loc}$ does not imply that $I_t(\phi)$ satisfies the martingale property and the Itô isometry of Proposition 5.6. This issue will be further developed in the next subsection. However the continuity property of the stochastic integral is conserved because it is a pathwise property which is consistent with the pathwise definition of (5.7).
As a consequence of this remark, when the integrand $\phi$ is in $\mathbb{H}^2_{\rm loc}$ but is not in $\mathbb{H}^2$, the stochastic integral fails to be a martingale, in general. This leads us to the notion of local martingale.
Definition 5.9. An $\mathbb{F}$-adapted process $M = \{M_t, t\ge 0\}$ is a local martingale if there exists a sequence of stopping times $(\tau_n)_{n\ge 0}$ (called a localizing sequence) such that $\tau_n\to\infty$ $\mathbb{P}$-a.s. as $n\to\infty$, and the stopped process $M^{\tau_n} = \{M_{t\wedge\tau_n}, t\ge 0\}$ is a martingale for every $n\ge 0$.
Proposition 5.10. Let $\phi$ be a process in $\mathbb{H}^2_{\rm loc}$. Then, for every $T>0$, the process $\{I_t(\phi), 0\le t\le T\}$ is a local martingale.
Proof. The above defined sequence $(\tau_n)_n$ is easily shown to be a localizing sequence. The result is then a direct consequence of the martingale property of the stochastic integral of a process in $\mathbb{H}^2$.
An example of local martingale which fails to be a martingale will be given in the next chapter.
Definition 5.11. An Itô process $X$ is a continuous-time process defined by:
$$X_t = X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\cdot dW_s,\quad t\ge 0,$$
where $\mu$ and $\sigma$ are measurable, $\mathbb{F}$-adapted processes with $\int_0^t \big(|\mu_s| + |\sigma_s|^2\big)\,ds < \infty$ a.s.
Remark 5.12. Observe that the process $\mu$ above is only assumed to be measurable and $\mathbb{F}$-adapted, so we may ask whether the process $\big\{\int_0^t \mu_s\,ds, t\le T\big\}$ is adapted. This is indeed true as a consequence of the density result of Proposition 5.2: Let $(\mu^n)_{n\ge 1}$ be an approximation of $\mu$ in $\mathbb{H}^2$. This implies that $\int_0^t \mu^n_s\,ds \to \int_0^t \mu_s\,ds$ a.s. along some subsequence. Since $\mu^n$ is a simple process, $\int_0^t \mu^n_s\,ds$ is $\mathcal{F}_t$-measurable, and so is the limit $\int_0^t \mu_s\,ds$.
We conclude this section by the following easy result:
Lemma 5.13. Let $M = \{M_t, 0\le t\le T\}$ be a local martingale bounded from below by some constant $m$, i.e. $M_t\ge m$ for all $t\in[0,T]$ a.s. Then $M$ is a supermartingale.
Proof. Let $(T_n)_n$ be a localizing sequence of stopping times for the local martingale $M$, i.e. $T_n\to\infty$ a.s. and $\{M_{t\wedge T_n}, 0\le t\le T\}$ is a martingale for every $n$. Then:
$$\mathbb{E}\big[M_{t\wedge T_n}\,|\,\mathcal{F}_u\big] = M_{u\wedge T_n},\quad 0\le u\le t\le T,$$
for every fixed $n$. We next send $n$ to infinity. By the lower bound on $M$, we can use Fatou's lemma, and we deduce that:
$$\mathbb{E}[M_t\,|\,\mathcal{F}_u] \le M_u,\quad 0\le u\le t\le T,$$
which is the required inequality.
Exercise 5.14. Show that the conclusion of Lemma 5.13 holds true under the weaker condition that the local martingale $M$ is bounded from below by a martingale.
5.4 Complement: density of simple processes in $\mathbb{H}^2$
The proof of Proposition 5.2 is a consequence of the following Lemmas. Throughout this section, $t^n_i := i2^{-n}$, $i\ge 0$, is the sequence of dyadic numbers.
Lemma 5.15. Let $\phi$ be a bounded $\mathbb{F}$-adapted process with continuous sample paths. Then $\phi$ can be approximated by a sequence of simple processes in $\mathbb{H}^2$.
Proof. Define the sequence
$$\phi^{(n)}_t := \phi_0\mathbf{1}_{\{0\}}(t) + \sum_{t^n_i\le T}\phi_{t^n_i}\mathbf{1}_{(t^n_i,t^n_{i+1}]}(t),\quad t\le T.$$
Then, $\phi^{(n)}$ is a simple process for each $n\ge 1$. By the dominated convergence theorem, $\mathbb{E}\Big[\int_0^T |\phi^{(n)}_t - \phi_t|^2\,dt\Big]\to 0$ as $n\to\infty$.
Lemma 5.16. Let $\phi$ be a bounded $\mathbb{F}$-progressively measurable process. Then $\phi$ can be approximated by a sequence of simple processes in $\mathbb{H}^2$.
Proof. Notice that the process
$$\psi^{(k)}_t := k\int_{0\vee(t-\frac{1}{k})}^{t}\phi_s\,ds,\quad 0\le t\le T,$$
is progressively measurable as the difference of two adapted continuous processes, see Proposition 3.2, and satisfies
$$\big\|\psi^{(k)} - \phi\big\|_{\mathbb{H}^2}\to 0 \text{ as } k\to\infty, \quad (5.8)$$
by the dominated convergence theorem. For each $k\ge 1$, we can find by Lemma 5.15 a sequence $\big(\psi^{(k,n)}\big)_{n\ge 0}$ of simple processes such that $\|\psi^{(k,n)} - \psi^{(k)}\|_{\mathbb{H}^2}\to 0$ as $n\to\infty$. Then, for each $k\ge 1$, we can find $n_k$ such that the process $\phi^{(k)} := \psi^{(k,n_k)}$ satisfies
$$\big\|\phi^{(k)} - \phi\big\|_{\mathbb{H}^2}\to 0 \text{ as } k\to\infty.$$
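Lemma 5.17. Let $\phi$ be a bounded measurable and $\mathbb{F}$-adapted process. Then $\phi$ can be approximated by a sequence of simple processes in $\mathbb{H}^2$.
Proof. In the present setting, the process $\psi^{(k)}$, defined in the proof of the previous Lemma 5.16, is measurable but is not known to be adapted. For each $\varepsilon>0$, there is an integer $k\ge 1$ such that $\psi^\varepsilon := \psi^{(k)}$ satisfies $\|\psi^\varepsilon - \phi\|_{\mathbb{H}^2}\le\varepsilon$. Then, with $\phi_t = \phi_0$ for $t\le 0$:
$$\begin{aligned}
\|\phi - \phi_{\cdot-h}\|_{\mathbb{H}^2}
&\le \|\phi - \psi^\varepsilon\|_{\mathbb{H}^2} + \big\|\psi^\varepsilon - \psi^\varepsilon_{\cdot-h}\big\|_{\mathbb{H}^2} + \big\|\psi^\varepsilon_{\cdot-h} - \phi_{\cdot-h}\big\|_{\mathbb{H}^2} \\
&\le 2\varepsilon + \big\|\psi^\varepsilon - \psi^\varepsilon_{\cdot-h}\big\|_{\mathbb{H}^2}.
\end{aligned}$$
By the continuity of $\psi^\varepsilon$, this implies that
$$\limsup_{h\searrow 0}\|\phi - \phi_{\cdot-h}\|^2_{\mathbb{H}^2}\le 4\varepsilon^2. \quad (5.9)$$
We now introduce
$$\varphi_n(t) := \mathbf{1}_{\{0\}}(t) + \sum_{i\ge 1} t^n_{i-1}\mathbf{1}_{(t^n_{i-1},t^n_i]}(t),$$
and
$$\phi^{(n,s)}_t := \phi_{\varphi_n(t-s)+s},\quad t\ge 0,\ s\in(0,1].$$
Clearly $\phi^{(n,s)}$ is a simple adapted process, and
$$\begin{aligned}
\mathbb{E}\Big[\int_0^T\!\!\int_0^1 |\phi^{(n,s)}_t - \phi_t|^2\,ds\,dt\Big]
&= 2^n\,\mathbb{E}\Big[\int_0^T\!\!\int_0^{2^{-n}} |\phi_t - \phi_{t-h}|^2\,dh\,dt\Big] \\
&= 2^n\int_0^{2^{-n}}\mathbb{E}\Big[\int_0^T |\phi_t - \phi_{t-h}|^2\,dt\Big]dh \\
&\le \max_{0\le h\le 2^{-n}}\mathbb{E}\Big[\int_0^T |\phi_t - \phi_{t-h}|^2\,dt\Big],
\end{aligned}$$
which converges to zero as $n\to\infty$ by (5.9), since $\varepsilon>0$ is arbitrary. Hence, along some subsequence,
$$\phi^{(n,s)}_t(\omega)\to\phi_t(\omega) \text{ for almost every } (s,t,\omega)\in[0,1]\times[0,T]\times\Omega,$$
and the required result follows from the dominated convergence theorem.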
Lemma 5.18. The set of simple processes $\mathcal{S}$ is dense in $\mathbb{H}^2$.
Proof. We only have to extend Lemma 5.17 to the case where $\phi$ is not necessarily bounded. This is easily achieved by applying Lemma 5.17 to the bounded process $(\phi\wedge n)\vee(-n)$, for each $n\ge 1$, and passing to the limit as $n\to\infty$.
Chapter 6
Itô Differential Calculus
In this chapter, we focus on the differential properties of the Brownian motion. To introduce the discussion, recall that $\mathbb{E}\big[W_t^2\big] = t$ for all $t\ge 0$. If standard differential calculus were valid in the present context, then one would expect that $W_t^2$ be equal to $M_t = 2\int_0^t W_s\,dW_s$. But the process $M$ is a square integrable martingale on every finite interval $[0,T]$, and therefore $\mathbb{E}[M_t] = M_0 = 0 \neq \mathbb{E}[W_t^2]$!
So, the standard differential calculus is not valid in our context. We should not be puzzled by this small calculation, as we already observed that the Brownian motion sample path has very poor regularity properties, has infinite total variation, and finite quadratic variation.
We can elaborate more on the above example by considering a discrete-time approximation of the stochastic integral
$$\begin{aligned}
\sum_{i=1}^n 2W_{t_{i-1}}\big(W_{t_i} - W_{t_{i-1}}\big)
&= -\sum_{i=1}^n \big(W_{t_i} - W_{t_{i-1}}\big)^2 + \sum_{i=1}^n \big(W^2_{t_i} - W^2_{t_{i-1}}\big) \\
&= W_t^2 - \sum_{i=1}^n \big(W_{t_i} - W_{t_{i-1}}\big)^2,
\end{aligned}$$
where $0 = t_0 < t_1 < \ldots < t_n = t$. We know that the latter sum converges in $L^2$ towards $t$, the quadratic variation of the Brownian motion at time $t$ (the convergence holds even $\mathbb{P}$-a.s. if one takes the dyadics as $(t_i)_i$). Then, by sending $n$ to infinity, this shows that
$$\int_0^t 2W_s\,dW_s = W_t^2 - t,\quad t\ge 0. \quad (6.1)$$
In particular, there is no contradiction anymore by taking expectations on both sides.
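The discrete identity behind (6.1) is purely algebraic and can be checked on a single simulated path. The following sketch is an illustration only; the uniform grid, the seed and the variable names are assumptions of the example. It verifies that $\sum_i 2W_{t_{i-1}}(W_{t_i}-W_{t_{i-1}})$ coincides with $W_t^2 - \sum_i (W_{t_i}-W_{t_{i-1}})^2$ up to rounding, and that the sum of squared increments is close to $t$.

```python
import math
import random

rng = random.Random(2)
t, n = 1.0, 2 ** 14          # horizon and number of grid intervals
dt = t / n

# Simulate one Brownian path on the uniform grid 0 = t_0 < ... < t_n = t.
w = [0.0]
for _ in range(n):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

# Left-point approximation of int_0^t 2 W_s dW_s.
ito_sum = sum(2.0 * w[i - 1] * (w[i] - w[i - 1]) for i in range(1, n + 1))
# Discrete quadratic variation.
quad_var = sum((w[i] - w[i - 1]) ** 2 for i in range(1, n + 1))

# The algebraic identity holds exactly, path by path (up to rounding).
identity_gap = abs(ito_sum - (w[-1] ** 2 - quad_var))

print(ito_sum, w[-1] ** 2 - t, quad_var, identity_gap)
```

The difference between `ito_sum` and $W_t^2 - t$ is exactly `t - quad_var`, which is the quantity that vanishes as the mesh of the grid goes to zero.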
6.1 Itô's formula for the Brownian motion
The purpose of this section is to prove the Itô formula for the change of variable. Given a smooth function $f(t,x)$, we will denote by $f_t$, $Df$ and $D^2f$, the partial gradients with respect to $t$, to $x$, and the partial Hessian matrix with respect to $x$.
Theorem 6.1. Let $f:\mathbb{R}_+\times\mathbb{R}^d\to\mathbb{R}$ be $C^{1,2}\big([0,T],\mathbb{R}^d\big)$. Then, with probability 1, we have:
$$f(T,W_T) = f(0,0) + \int_0^T Df(t,W_t)\cdot dW_t + \int_0^T \Big(f_t + \frac{1}{2}{\rm Tr}[D^2f]\Big)(t,W_t)\,dt$$
for every $T\ge 0$.
Proof. 1. We first fix $T>0$, and we show that the above Itô's formula holds with probability 1. By possibly adding a constant to $f$ we may assume that $f(0,0) = 0$. Let $\pi^n = (t^n_i)_{i\ge 0}$ be a partition of $\mathbb{R}_+$ with $t^n_0 = 0$ and mesh $|\pi^n| := \sup_{i\ge 0}(t^n_{i+1} - t^n_i)\to 0$ as $n\to\infty$, and denote $n(T) := \sup\{i\ge 0: t^n_i\le T\}$, so that $t^n_{n(T)}\le T < t^n_{n(T)+1}$. We also denote
$$\Delta^n_i W := W_{t^n_{i+1}} - W_{t^n_i} \quad\text{and}\quad \Delta^n_i t := t^n_{i+1} - t^n_i.$$
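1.a We first decompose
$$\begin{aligned}
f\big(t^n_{n(T)+1},W_{t^n_{n(T)+1}}\big)
&= \sum_{t^n_i\le T}\Big[f\big(t^n_{i+1},W_{t^n_{i+1}}\big) - f\big(t^n_i,W_{t^n_{i+1}}\big)\Big] \\
&\quad + \sum_{t^n_i\le T}\Big[f\big(t^n_i,W_{t^n_{i+1}}\big) - f\big(t^n_i,W_{t^n_i}\big)\Big].
\end{aligned}$$
By a Taylor expansion, this provides:
$$\begin{aligned}
I^n_T(Df) := \sum_{t^n_i\le T} Df\big(t^n_i,W_{t^n_i}\big)\cdot\Delta^n_i W
&= f\big(t^n_{n(T)+1},W_{t^n_{n(T)+1}}\big) - \sum_{t^n_i\le T} f_t\big(\tau^n_i,W_{t^n_{i+1}}\big)\Delta^n_i t \\
&\quad - \frac{1}{2}\sum_{t^n_i\le T}{\rm Tr}\Big[\Big(D^2f(t^n_i,\xi^n_i) - D^2f\big(t^n_i,W_{t^n_i}\big)\Big)\Delta^n_i W\,\Delta^n_i W^{\rm T}\Big] \\
&\quad - \frac{1}{2}\sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Delta^n_i W\,\Delta^n_i W^{\rm T}\Big], \quad (6.2)
\end{aligned}$$
where $\tau^n_i$ is a random variable with values in $\big[t^n_i,t^n_{i+1}\big]$, and $\xi^n_i = \lambda^n_i W_{t^n_i} + (1-\lambda^n_i)W_{t^n_{i+1}}$ for some random variable $\lambda^n_i$ with values in $[0,1]$.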
1.b Since a.e. sample path of the Brownian motion is continuous, and therefore uniformly continuous on the compact interval $[0,T+1]$, it follows that
$$f\big(t^n_{n(T)+1},W_{t^n_{n(T)+1}}\big)\to f(T,W_T),\quad \mathbb{P}\text{-a.s.}$$
$$\sum_{t^n_i\le T} f_t\big(\tau^n_i,W_{t^n_{i+1}}\big)\Delta^n_i t\to\int_0^T f_t(t,W_t)\,dt,\quad \mathbb{P}\text{-a.s.}$$
Next, using again the above uniform continuity together with Proposition 4.22 and the fact that the $L^2$ convergence implies the a.s. convergence along some subsequence, we see that:
$$\sum_{t^n_i\le T}{\rm Tr}\Big[\Big(D^2f(t^n_i,\xi^n_i) - D^2f\big(t^n_i,W_{t^n_i}\big)\Big)\Delta^n_i W\,\Delta^n_i W^{\rm T}\Big]\to 0,\quad \mathbb{P}\text{-a.s.}$$
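1.c. For the last term in the decomposition (6.2), we write:
$$\begin{aligned}
\sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Delta^n_i W\,\Delta^n_i W^{\rm T}\Big]
&= \sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Big]\Delta^n_i t \\
&\quad + \sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Big(\Delta^n_i W\,\Delta^n_i W^{\rm T} - \Delta^n_i t\,I_d\Big)\Big]. \quad (6.3)
\end{aligned}$$
By the continuity of $D^2f$ and $W$, notice that
$$\begin{aligned}
\sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Big(\Delta^n_i W\,\Delta^n_i W^{\rm T} - \Delta^n_i t\,I_d\Big)\Big]
&\le C(\omega)\sum_{t^n_i\le T}{\rm Tr}\Big[\Delta^n_i W\,\Delta^n_i W^{\rm T} - \Delta^n_i t\,I_d\Big] \\
&= 2C(\omega)\sum_{t^n_i\le T}\int_{t^n_i}^{t^n_{i+1}}\big(W_t - W_{t^n_i}\big)\cdot dW_t \quad (6.4)
\end{aligned}$$
by an immediate extension of (6.1) to the multidimensional setting. Since
$$\begin{aligned}
\mathbb{E}\Big[\Big(\sum_{t^n_i\le T}\int_{t^n_i}^{t^n_{i+1}}\big(W_t - W_{t^n_i}\big)\cdot dW_t\Big)^2\Big]
&= \sum_{t^n_i\le T}\mathbb{E}\Big[\Big(\int_{t^n_i}^{t^n_{i+1}}\big(W_t - W_{t^n_i}\big)\cdot dW_t\Big)^2\Big] \\
&= \sum_{t^n_i\le T}\int_{t^n_i}^{t^n_{i+1}}(t - t^n_i)\,dt \\
&= \frac{1}{2}\sum_{t^n_i\le T}\big(\Delta^n_i t\big)^2 \le |\pi^n|\,T\to 0,
\end{aligned}$$
it follows that (6.4) converges to zero $\mathbb{P}$-a.s. along some subsequence, and it follows from (6.3) that along some subsequence:
$$\sum_{t^n_i\le T}{\rm Tr}\Big[D^2f\big(t^n_i,W_{t^n_i}\big)\Delta^n_i W\,\Delta^n_i W^{\rm T}\Big]\to\int_0^T{\rm Tr}\big[D^2f(t,W_t)\big]\,dt,\quad \mathbb{P}\text{-a.s.}$$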
1.d In order to complete the proof of Itô's formula for fixed $T>0$, it remains to prove that
$$I^n_T(Df)\to\int_0^T Df(t,W_t)\cdot dW_t\quad \mathbb{P}\text{-a.s. along some subsequence.} \quad (6.5)$$
Notice that $I^n_T(Df) = I^0_T\big(\phi^{(n)}\big)$ where $\phi^{(n)}$ is the simple process defined by
$$\phi^{(n)}_t = \sum_{t^n_i\le T} Df\big(t^n_i,W_{t^n_i}\big)\mathbf{1}_{[t^n_i,t^n_{i+1})}(t),\quad t\ge 0.$$
Since $Df$ is continuous, it follows from the proof of Proposition 5.2 that $\phi^{(n)}\to\phi$ in $\mathbb{H}^2$ with $\phi_t := Df(t,W_t)$. Then $I^n_T(Df)\to I_T(\phi)$ in $L^2$, by the definition of the stochastic integral in Theorem 5.3, and (6.5) follows from the fact that the $L^2$ convergence implies the a.s. convergence along some subsequence.
2. From the first step, we have the existence of subsets $N_t\in\mathcal{F}$ for every $t\ge 0$ such that $\mathbb{P}[N_t] = 0$ and the Itô's formula holds on $N_t^c$, the complement of $N_t$. Of course, this implies that the Itô's formula holds on the complement of the set $N := \cup_{t\ge 0}N_t$. But this does not complete the proof of the theorem as this set is a non-countable union of zero measure sets, and is therefore not known to have zero measure. We therefore appeal to the continuity of the Brownian motion and the stochastic integral, see Proposition 3.15. By usual approximation along rational numbers, it is easy to see that, with probability 1, the Itô formula holds for every $T\ge 0$.
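Theorem 6.1 can be tested pathwise on a function for which the drift term vanishes. The sketch below is an illustration under stated assumptions: it takes $d=1$ and $f(t,x) = e^{x - t/2}$, so that $f_t + \frac{1}{2}f_{xx} = 0$ and Itô's formula reduces to $f(T,W_T) = 1 + \int_0^T f(t,W_t)\,dW_t$. The stochastic integral is approximated by a left-point sum on a fine uniform grid; the seed, the grid size and the variable names are hypothetical choices.

```python
import math
import random

rng = random.Random(3)
horizon, n = 1.0, 20000
dt = horizon / n

w, t = 0.0, 0.0       # Brownian path and running time
stoch_int = 0.0       # left-point approximation of int_0^T Df(t, W_t) dW_t
max_m = 1.0           # running maximum of f(t, W_t), used to scale the error

for _ in range(n):
    f_value = math.exp(w - 0.5 * t)      # f(t, W_t) = Df(t, W_t) for this f
    max_m = max(max_m, f_value)
    dw = rng.gauss(0.0, math.sqrt(dt))
    stoch_int += f_value * dw            # integrand frozen at the left endpoint
    w += dw
    t += dt

lhs = math.exp(w - 0.5 * horizon)        # f(T, W_T)
rhs = 1.0 + stoch_int                    # f(0, 0) + stochastic integral
ito_gap = abs(lhs - rhs)

print(lhs, rhs, ito_gap)
```

The remaining gap is a discretization error of order $\sqrt{\Delta t}$; it is not a failure of the formula, and it shrinks when the grid is refined.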
Remark 6.2. Since, with probability 1, the Itô formula holds for every $T\ge 0$, it follows that the Itô's formula holds when the deterministic time $T$ is replaced by a random time $\tau$.
Remark 6.3. (Itô's formula with generalized derivatives) Let $f:\mathbb{R}^d\to\mathbb{R}$ be $C^{1,2}(\mathbb{R}^d\setminus K)$ for some compact subset $K$ of $\mathbb{R}^d$. Assume that $f\in W^2(K)$, i.e. there is a sequence of functions $(f^n)_{n\ge 1}$ such that
$$f^n = f \text{ on } \mathbb{R}^d\setminus K,\quad f^n\in C^2(K) \quad\text{and}\quad \|f^n_x - f^m_x\|_{L^2(K)} + \|f^n_{xx} - f^m_{xx}\|_{L^2(K)}\to 0.$$
Then, Itô's formula holds true:
$$f(W_t) = f(0) + \int_0^t Df(W_s)\cdot dW_s + \frac{1}{2}\int_0^t {\rm Tr}\big[D^2f(W_s)\big]\,ds,$$
where $Df$ and $D^2f$ are the generalized derivatives of $f$. Indeed, Itô's formula holds for $f^n$, $n\ge 1$, and we obtain the required result by sending $n\to\infty$.
A similar statement holds for a function $f(t,x)$.
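Exercise 6.4. Let $W$ be a Brownian motion in $\mathbb{R}^d$ and consider the process
$$X_t := X_0 + bt + \sigma W_t,\quad t\ge 0,$$
where $b$ is a vector in $\mathbb{R}^d$ and $\sigma$ is a $(d\times d)$-matrix. Let $f$ be a $C^{1,2}(\mathbb{R}_+,\mathbb{R}^d)$ function. Show that
$$df(t,X_t) = \frac{\partial f}{\partial t}(t,X_t)\,dt + \frac{\partial f}{\partial x}(t,X_t)\cdot dX_t + \frac{1}{2}{\rm Tr}\Big[\frac{\partial^2 f}{\partial x\,\partial x^{\rm T}}(t,X_t)\,\sigma\sigma^{\rm T}\Big]dt.$$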
Exercise 6.5. Let $W$ be a Brownian motion in $\mathbb{R}^d$, and consider the process
$$S_t := S_0\exp\big(bt + \sigma\cdot W_t\big),\quad t\ge 0,$$
where $b\in\mathbb{R}$ and $\sigma\in\mathbb{R}^d$ are given.
1. For a $C^{1,2}$ function $f:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}$, show that $\{f(t,S_t), t\ge 0\}$ is an Itô process, and provide its dynamics.
2. Find a function $f$ so that the process $\{f(t,S_t), t\ge 0\}$ is a local martingale.
Exercise 6.6. Let $f:\mathbb{R}_+\times\mathbb{R}\to\mathbb{R}$ be a $C^{1,2}$ function such that, for some constant $C>0$,
$$|f(t,x)| + |f'_t(t,x)| + |f'_x(t,x)| + |f''_{x,x}(t,x)|\le C\exp(C|x|) \quad\text{for all } (t,x)\in\mathbb{R}_+\times\mathbb{R}.$$
1. If $T$ is a bounded stopping time, show that
$$\mathbb{E}\big(f(T,W_T)\big) = f(0,0) + \mathbb{E}\Big[\int_0^T\Big[f'_t(s,W_s) + \frac{1}{2}f''_{x,x}(s,W_s)\Big]ds\Big].$$
2. When $T$ is a bounded stopping time, compute $\mathbb{E}(W_T)$ and $\mathbb{E}(W_T^2)$.
3. Show that if $T$ is a stopping time such that $\mathbb{E}(T)<+\infty$, then $\mathbb{E}(W_T) = 0$.
4. For every real number $a\neq 0$, we define
$$\tau_a = \inf\{t\ge 0: W_t = a\}.$$
Is it a finite stopping time? Is it bounded? Deduce from question 3) that $\mathbb{E}(\tau_a) = +\infty$.
5. Show that the law of $\tau_a$ is characterized by its Laplace transform:
$$\mathbb{E}\big(\exp(-\lambda\tau_a)\big) = \exp\big(-\sqrt{2\lambda}\,|a|\big),\quad \lambda\ge 0.$$
6. From question 2), deduce the value of $\mathbb{P}(\tau_a<\tau_b)$, for $a>0$ and $b<0$.
6.2 Extension to Itô processes
We next provide Itô's formula for a general Itô process with values in $\mathbb{R}^n$:
$$X_t := X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s,\quad t\ge 0,$$
where $\mu$ and $\sigma$ are adapted processes with values in $\mathbb{R}^n$ and $\mathcal{M}_{\mathbb{R}}(n,d)$, respectively and satisfying
$$\int_0^T |\mu_s|\,ds + \int_0^T |\sigma_s|^2\,ds < \infty \quad\text{a.s.}$$
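Observe that stochastic integration with respect to the Itô process $X$ reduces to the stochastic integration with respect to $W$: for any $\mathbb{F}$-adapted $\mathbb{R}^n$-valued process $\phi$ with $\int_0^T |\sigma_t^{\rm T}\phi_t|^2\,dt + \int_0^T |\phi_t\cdot\mu_t|\,dt < \infty$, a.s.
$$\begin{aligned}
\int_0^T \phi_t\cdot dX_t
&= \int_0^T \phi_t\cdot\mu_t\,dt + \int_0^T \phi_t\cdot\sigma_t\,dW_t \\
&= \int_0^T \phi_t\cdot\mu_t\,dt + \int_0^T \sigma_t^{\rm T}\phi_t\cdot dW_t.
\end{aligned}$$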
Theorem 6.7. Let $f:\mathbb{R}_+\times\mathbb{R}^n\to\mathbb{R}$ be $C^{1,2}([0,T],\mathbb{R}^n)$. Then, with probability 1, we have:
$$f(T,X_T) = f(0,X_0) + \int_0^T Df(t,X_t)\cdot dX_t + \int_0^T\Big(f_t(t,X_t) + \frac{1}{2}{\rm Tr}\big[D^2f(t,X_t)\,\sigma_t\sigma_t^{\rm T}\big]\Big)dt$$
for every $T\ge 0$.
Proof. Let $\tau_N := \inf\Big\{t:\ \max\Big(|X_t - X_0|,\ \int_0^t \sigma_s^2\,ds,\ \int_0^t |\mu_s|\,ds\Big)\ge N\Big\}$. Obviously, $\tau_N\to\infty$ a.s. when $N\to\infty$, and it is sufficient to prove Itô's formula on $[0,\tau_N]$, since any $t\ge 0$ can be reached by sending $N$ to infinity. In view of this, we may assume without loss of generality that $X$, $\int_0^t \mu_s\,ds$, $\int_0^t \sigma_s^2\,ds$ are bounded and that $f$ has compact support. We next consider an approximation of the integrals defining $X_t$ by step functions which are constant on intervals of time $(t^n_{i-1},t^n_i]$ for $i = 1,\ldots,n$, and we denote by $X^n$ the resulting simple approximating process. Notice that Itô's formula holds true for $X^n$ on each interval $(t^n_{i-1},t^n_i]$ as a direct consequence of Theorem 6.1. The proof of the theorem is then concluded by sending $n$ to infinity, and using as much as needed the dominated convergence theorem.
Exercise 6.8. Let $W$ be a Brownian motion in $\mathbb{R}^d$, and consider the process
$$S_t := S_0\exp\Big(\int_0^t b_u\,du + \int_0^t \sigma_u\cdot dW_u\Big),\quad t\ge 0,$$
where $b$ and $\sigma$ are measurable and $\mathbb{F}$-adapted processes with values in $\mathbb{R}$ and $\mathbb{R}^d$, respectively, with $\int_0^T |b_u|\,du + \int_0^T |\sigma_u|^2\,du < \infty$, a.s. for all $T>0$.
1. For a function $f:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}$, show that $\{f(t,S_t), t\ge 0\}$ is an Itô process, and provide its dynamics.
2. Let $\gamma$ be a measurable $\mathbb{F}$-adapted process with values in $\mathbb{R}$ with $\int_0^T |\gamma_u|\,du < \infty$, a.s. Show that the process $\big\{X_t := e^{-\int_0^t \gamma_u\,du}S_t, t\ge 0\big\}$ is an Itô process, and provide its dynamics.
3. Find a process $\gamma$ such that $\{X_t, t\ge 0\}$ is a local martingale.
Exercise 6.9. Let $W$ be a Brownian motion in $\mathbb{R}^d$, and $X$ and $Y$ be the Itô processes defined for all $t\ge 0$ by:
$$X_t = X_0 + \int_0^t \mu^X_u\,du + \int_0^t \sigma^X_u\cdot dW_u,$$
$$Y_t = Y_0 + \int_0^t \mu^Y_u\,du + \int_0^t \sigma^Y_u\cdot dW_u,$$
where $\mu^X$, $\mu^Y$, $\sigma^X$, $\sigma^Y$ are measurable and $\mathbb{F}$-adapted processes with appropriate dimension, satisfying $\int_0^T\big(|\mu^X_u| + |\mu^Y_u| + |\sigma^X_u|^2 + |\sigma^Y_u|^2\big)du < \infty$, a.s.
Provide the dynamics of the process $Z := f(X,Y)$ for the following functions $f$:
1. $f(x,y) = xy$, $x,y\in\mathbb{R}$,
2. $f(x,y) = \frac{x}{y}$, $x,y\in\mathbb{R}$ (assuming that $Y_0$, $\mu^Y$ and $\sigma^Y$ are such that $Y$ is a positive process).
Exercise 6.10. Let $a$ and $\sigma$ be two measurable and $\mathbb{F}$-adapted processes such that $\int_0^t |a_s|\,ds + \int_0^t |\sigma_s|^2\,ds < \infty$.
1. Prove the integration by parts formula:
$$\int_0^t\!\!\int_0^s \sigma_u a_s\,dW_u\,ds = \Big(\int_0^t \sigma_u\,dW_u\Big)\Big(\int_0^t a_s\,ds\Big) - \int_0^t\!\!\int_0^u a_s\sigma_u\,ds\,dW_u.$$
2. We now take the processes $a_s = a(s)$ and $\sigma_s = \sigma(s)$ to be deterministic functions. Show that the random variable $\int_0^t\!\int_0^s \sigma_u a_s\,dW_u\,ds$ has a Gaussian distribution, and compute the corresponding mean and variance.
We conclude this section by providing an example of local martingale which fails to be a martingale, although positive and uniformly integrable.
Example 6.11. (A strict local martingale) Let $W$ be a Brownian motion in $\mathbb{R}^d$.
• In the one-dimensional case $d = 1$, we have $\mathbb{P}\big[|W_t|>0, \forall\, t>0\big] = 0$, see also Proposition 4.18.
• When $d\ge 2$, the situation is drastically different as it can be proved that $\mathbb{P}\big[|W_t|>0, \text{ for every } t>0\big] = 1$, see e.g. Karatzas and Shreve [29] Proposition 3.22 p161. In words, this means that the Brownian motion never returns to the origin $\mathbb{P}$-a.s. This is a well-known result for random walks on $\mathbb{Z}^d$ (as studied in MAP 432). Then, for a fixed $t_0>0$, the process
$$X_t := |W_{t_0+t}|^{-1},\quad t\ge 0,$$
is well-defined, and it follows from Itô's formula that
$$dX_t = X_t^3\Big(\frac{1}{2}(3-d)\,dt - W_{t_0+t}\cdot dW_{t_0+t}\Big).$$
• We now consider the special case $d = 3$. By the previous Proposition 5.10, it follows from Itô's formula that $X$ is a local martingale. However, by the scaling property of the Brownian motion, we have
$$\mathbb{E}[X_t] = \sqrt{\frac{t_0}{t+t_0}}\,\mathbb{E}[X_0],\quad t\ge 0,$$
so that $X$ has a non-constant expectation and can not be a martingale.
• Passing to the polar coordinates, we calculate directly that
$$\begin{aligned}
\mathbb{E}\big[X_t^2\big]
&= \big(2\pi(t_0+t)\big)^{-3/2}\int |x|^{-2}e^{-|x|^2/2(t_0+t)}\,dx \\
&= \big(2\pi(t_0+t)\big)^{-3/2}\int r^{-2}e^{-r^2/2(t_0+t)}\,4\pi r^2\,dr \\
&= \frac{1}{t_0+t}\le\frac{1}{t_0} \quad\text{for every } t\ge 0.
\end{aligned}$$
This shows that $\sup_{t\ge 0}\mathbb{E}\big[X_t^2\big]<\infty$. In particular, $X$ is uniformly integrable.


6.3 Lévy's characterization of Brownian motion
The next result is valid in a larger generality than the present framework.
Theorem 6.12. Let $W$ be a Brownian motion in $\mathbb{R}^d$, and $\phi$ an $\mathcal{M}_{\mathbb{R}}(n,d)$-valued process with components in $\mathbb{H}^2$, and such that $\int_0^t \phi_s\phi_s^{\rm T}\,ds = t\,I_n$ for all $t\ge 0$. Then, the process $X$ defined by:
$$X^j_t := X^j_0 + \sum_{k=1}^d\int_0^t \phi^{jk}_s\,dW^k_s,\quad j = 1,\ldots,n,$$
is a Brownian motion on $\mathbb{R}^n$.
Proof. Clearly $X$ has continuous sample paths, a.s. and is $\mathbb{F}$-adapted; without loss of generality we take $X_0 = 0$. To complete the proof, we show that $X_t - X_s$ is independent of $\mathcal{F}_s$, and is distributed as a $\mathcal{N}(0,(t-s)I_n)$. By using the characteristic function, this is equivalent to show
$$\mathbb{E}\Big[e^{iu\cdot(X_t - X_s)}\,\Big|\,\mathcal{F}_s\Big] = e^{-|u|^2(t-s)/2} \quad\text{for all } u\in\mathbb{R}^n,\ 0\le s\le t. \quad (6.6)$$
For fixed $s$, we apply Itô's formula to the function $f(x) := e^{iu\cdot x}$:
$$e^{iu\cdot(X_t - X_s)} = 1 + i\sum_{j=1}^n u_j\int_s^t e^{iu\cdot(X_r - X_s)}\,dX^j_r - \frac{1}{2}|u|^2\int_s^t e^{iu\cdot(X_r - X_s)}\,dr.$$
Since $|f|\le 1$ and $\phi_s\phi_s^{\rm T} = I_n$, $dt\otimes d\mathbb{P}$-a.s., we have $\mathbb{E}\Big[\int_s^t e^{iu\cdot(X_r - X_s)}\,dX^j_r\,\Big|\,\mathcal{F}_s\Big] = 0$. Then, the function $h(t) := \mathbb{E}\big[e^{iu\cdot(X_t - X_s)}\,\big|\,\mathcal{F}_s\big]$ satisfies the ordinary differential equation:
$$h(t) = 1 - \frac{1}{2}|u|^2\int_s^t h(r)\,dr,$$
and therefore $h(t) = e^{-|u|^2(t-s)/2}$, which is the required result (6.6).
6.4 A verification approach to the Black-Scholes model
Our first contact with the Black-Scholes model was by means of the continuous-time limit of the binomial model. We shall have a cleaner presentation in the larger class of models later on when we will have access to the change of measure tool. In this paragraph, we provide a continuous-time presentation of the Black-Scholes model which only appeals to Itô's formula.
Consider a financial market consisting of a nonrisky asset with constant interest rate $r\ge 0$, and a risky asset defined by
$$S_t = S_0\exp\Big(\big(\mu - \tfrac{1}{2}\sigma^2\big)t + \sigma W_t\Big),\quad t\ge 0. \quad (6.7)$$
An immediate application of Itô's formula shows that the dynamics of this process are given by:
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,\quad t\ge 0.$$
A portfolio strategy is a measurable and $\mathbb{F}$-adapted process $\{\theta_t, t\in[0,T]\}$ with $\int_0^T |\theta_t|^2\,dt < \infty$, a.s. where $\theta_t$ represents the amount invested in the risky asset at time $t$, corresponding to a number of shares $\theta_t/S_t$. Denoting by $X_t$ the value of the portfolio at time $t$, we see that the investment in the nonrisky asset is given by $X_t - \theta_t$, and the variation of the portfolio value under the self-financing condition is given by
$$dX^\theta_t = \theta_t\frac{dS_t}{S_t} + (X_t - \theta_t)\,r\,dt,\quad t\ge 0.$$
We say that the portfolio $\theta$ is admissible if in addition the corresponding portfolio value $X^\theta$ is bounded from below. The admissibility condition means that the investor is limited by a credit line below which he is considered bankrupt. We denote by $\mathcal{A}$ the collection of all admissible portfolios.
Using again Itô's formula, we see that the discounted portfolio value process $\tilde{X}_t := X_t e^{-rt}$ satisfies the dynamics:
$$d\tilde{X}^\theta_t = e^{-rt}\theta_t\frac{d\tilde{S}_t}{\tilde{S}_t} \quad\text{where}\quad \tilde{S}_t := S_t e^{-rt}, \quad (6.8)$$
and then
$$d\tilde{S}_t = e^{-rt}\big(-rS_t\,dt + dS_t\big) = \tilde{S}_t\big((\mu - r)\,dt + \sigma\,dW_t\big). \quad (6.9)$$
We recall the no-arbitrage principle which says that if a portfolio strategy on the financial market produces an a.s. nonnegative final portfolio value, starting from a zero initial capital, then the portfolio value is zero a.s.
A contingent claim is an $\mathcal{F}_T$-measurable random variable which describes the random payoff of the contract at time $T$. The following result is specific to the case where the contingent claim is $g(S_T)$ for some deterministic function $g:\mathbb{R}_+\to\mathbb{R}$. Such contingent claims are called Vanilla options.
In preparation of the main result of this section, we start with the following result.
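Proposition 6.13. Suppose that the function $g:\mathbb{R}_+\to\mathbb{R}$ has polynomial growth, i.e. $|g(s)|\le\alpha(1+s^\beta)$ for some $\alpha,\beta\ge 0$. Then, the linear partial differential equation on $[0,T)\times(0,\infty)$:
$$\mathcal{L}v := \frac{\partial v}{\partial t} + rs\frac{\partial v}{\partial s} + \frac{1}{2}\sigma^2s^2\frac{\partial^2 v}{\partial s^2} - rv = 0 \quad\text{and}\quad v(T,\cdot) = g, \quad (6.10)$$
has a unique solution $v\in C^0\big([0,T]\times(0,\infty)\big)\cap C^{1,2}\big([0,T),(0,\infty)\big)$ in the class of polynomially growing functions, and given by
$$v(t,s) = \mathbb{E}\Big[e^{-r(T-t)}g\big(\hat{S}_T\big)\,\Big|\,S_t = s\Big],\quad (t,s)\in[0,T]\times(0,\infty),$$
where $\hat{S}_T := e^{(r-\mu)(T-t)}S_T$.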
Proof. We denote $V(t,s) := \mathbb{E}\Big[e^{-r(T-t)}g\big(\hat{S}_T\big)\,\Big|\,S_t = s\Big]$.
1- We first observe that $V\in C^0\big([0,T]\times(0,\infty)\big)\cap C^{1,2}\big([0,T),(0,\infty)\big)$. To see this, we simply write
$$V(t,s) = e^{-r(T-t)}\int_{\mathbb{R}} g(e^x)\,\frac{1}{\sqrt{2\pi\sigma^2(T-t)}}\,\exp\Big(-\frac{1}{2}\Big(\frac{x - \ln(s) - (r - \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}\Big)^2\Big)\,dx,$$
and see that the claimed regularity holds true by the dominated convergence theorem.
Let

S
t
:= e
(r)t
S
t
, t 0, so that
d

S
t

S
t
= rdt +dW
t
,
an consider the stopping time := inf
_
u > t : [ ln (

S
u
/s)[ > 1
_
. Then, it fol-
lows from the law of iterated expectations that
V (t, s) = E
t,s
_
e
r(ht)
V
_
h,

S
h
__
6.4. Black-Scholes by verification 75
for every h > t, where we denoted by E
t,s
the expectation conditional on

S
t
=
s. It then follows from It os formula that
0 = E
t,s
_
_
h
t
e
ru
LV (u,

S
u
)du +
_
h
t
e
ru
V
s
(u,

S
u
)

S
u
dW
u
_
= E
t,s
_
_
h
t
e
ru
LV (u,

S
u
)du
_
.
Normalizing by (h t) and sending h t, il follows from the dominated con-
vergence theorem that V is a solution of (6.10).
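3- We next prove the uniqueness among functions of polynomial growth. Let $\tau_n := \inf\big\{u>t:\ |\ln(\hat{S}_u/s)|>n\big\}$, and consider an arbitrary solution $v$ of (6.10) with $|v(t,s)|\le\alpha(1+s^\beta)$ for some $\alpha,\beta\ge 0$. Then, it follows from Itô's formula together with the fact that $v$ solves (6.10) that:
$$e^{-r(T\wedge\tau_n)}v\big(T\wedge\tau_n,\hat{S}_{T\wedge\tau_n}\big) = e^{-rt}v(t,s) + \int_t^{T\wedge\tau_n}e^{-ru}\frac{\partial v}{\partial s}(u,\hat{S}_u)\,\sigma\hat{S}_u\,dW_u.$$
Since the integrand in the latter stochastic integral is bounded on $[t,T\wedge\tau_n]$, this provides
$$e^{-rt}v(t,s) = \mathbb{E}_{t,s}\Big[e^{-r(T\wedge\tau_n)}v\big(T\wedge\tau_n,\hat{S}_{T\wedge\tau_n}\big)\Big].$$
By the continuity of $v$, we see that $e^{-r(T\wedge\tau_n)}v\big(T\wedge\tau_n,\hat{S}_{T\wedge\tau_n}\big)\to e^{-rT}v(T,\hat{S}_T) = e^{-rT}g(\hat{S}_T)$, a.s. We next observe from the polynomial growth of $v$ that
$$\Big|e^{-r(T\wedge\tau_n)}v\big(T\wedge\tau_n,\hat{S}_{T\wedge\tau_n}\big)\Big|\le\alpha\Big(1 + s^\beta e^{\beta\left(rT + \sigma\max_{t\le u\le T}(W_u - W_t)\right)}\Big)\in L^1.$$
We then deduce from the dominated convergence theorem that $v(t,s) = V(t,s)$.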
We now have the tools to obtain the Black-Scholes formula in the continuous-time framework of this section.
Proposition 6.14. Let $g:\mathbb{R}_+\to\mathbb{R}$ be a polynomially growing function bounded from below, i.e. $-c\le g(s)\le\alpha(1+s^\beta)$ for some $\alpha,\beta\ge 0$. Assume that the contingent claim $g(S_T)$ is available for trading, and that the financial market satisfies the no-arbitrage condition. Then:
(i) the market price at time 0 of the contingent claim $g(S_T)$ is given by $V(0,S_0) = \mathbb{E}\big[e^{-rT}g(\hat{S}_T)\big]$,
(ii) there exists a replicating portfolio $\theta^*\in\mathcal{A}$ for $g(S_T)$, i.e. $X^{\theta^*}_0 = V(0,S_0)$ and $X^{\theta^*}_T = g(S_T)$, a.s.
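Proof. By Itô's formula, we compute that:
$$\begin{aligned}
e^{-rT}V(T,S_T) &= V(0,S_0) + \int_0^T e^{-rt}\Big(-rV + \frac{\partial V}{\partial t} + \mu s\frac{\partial V}{\partial s} + \frac{1}{2}\sigma^2s^2\frac{\partial^2 V}{\partial s^2}\Big)(t,S_t)\,dt \\
&\quad + \int_0^T e^{-rt}\frac{\partial V}{\partial s}(t,S_t)\,\sigma S_t\,dW_t.
\end{aligned}$$
Since $V$ solves the PDE (6.10), this provides:
$$\begin{aligned}
e^{-rT}g(S_T) &= V(0,S_0) + \int_0^T e^{-rt}\frac{\partial V}{\partial s}(t,S_t)\big(-rS_t\,dt + dS_t\big) \\
&= V(0,S_0) + \int_0^T \tilde{S}_t\frac{\partial V}{\partial s}(t,S_t)\frac{d\tilde{S}_t}{\tilde{S}_t},
\end{aligned}$$
which, in view of (6.9), can be written as:
$$e^{-rT}g(S_T) = V(0,S_0) + \int_0^T e^{-rt}\theta^*_t\frac{d\tilde{S}_t}{\tilde{S}_t} \quad\text{where}\quad \theta^*_t := S_t\frac{\partial V}{\partial s}(t,S_t).$$
By (6.8), we see that $e^{-rT}g(S_T) = \tilde{X}^{\theta^*}_T$ with $\tilde{X}^{\theta^*}_0 = V(0,S_0)$, and therefore $X^{\theta^*}_T = g(S_T)$, a.s.
Notice that the same computation on $[0,t]$ gives $\tilde{X}^{\theta^*}_t = e^{-rt}V(t,S_t)$, $t\in[0,T]$. Since $V$ inherits the lower bound of $g$, the process $X^{\theta^*}$ is bounded from below, implying that $\theta^*\in\mathcal{A}$.
We finally conclude the proof by using the no-arbitrage property of the market consisting of the nonrisky asset, the risky one, and the contingent claim, arguing exactly as in Section 2.1 (iii).
Exercise 6.15. Let $g(s) := (s-K)^+$ for some $K>0$. Show that the no-arbitrage price of the last proposition coincides with the Black-Scholes formula (2.8).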
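The representation $V(0,S_0) = \mathbb{E}\big[e^{-rT}g(\hat{S}_T)\big]$ can be compared numerically with a closed-form call price. The sketch below is an illustration under stated assumptions: formula (2.8) is not reproduced in this section, so the code uses the standard Black-Scholes call formula $S_0N(d_1) - Ke^{-rT}N(d_2)$, assumed here to be the content of (2.8). The function names `bs_call` and `mc_call`, the parameter values and the seed are hypothetical. Note that the simulation uses the drift $r$, not $\mu$, exactly as in the definition of $\hat{S}$.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, strike, r, sigma, maturity):
    """Standard Black-Scholes price of a European call."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity) / vol
    d2 = d1 - vol
    return s0 * norm_cdf(d1) - strike * math.exp(-r * maturity) * norm_cdf(d2)


def mc_call(s0, strike, r, sigma, maturity, n_samples, rng):
    """Monte Carlo estimate of E[exp(-rT) (S_T - K)^+] with drift r."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        s_hat = s0 * math.exp((r - 0.5 * sigma ** 2) * maturity
                              + sigma * math.sqrt(maturity) * z)
        total += max(s_hat - strike, 0.0)
    return math.exp(-r * maturity) * total / n_samples


rng = random.Random(5)
closed_form = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
monte_carlo = mc_call(100.0, 100.0, 0.05, 0.2, 1.0, 200000, rng)
print(closed_form, monte_carlo)
```

The drift $\mu$ of the risky asset does not enter either computation, which is the practical content of Proposition 6.14 (i).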
6.5 The Ornstein-Uhlenbeck process
We consider the Itô process defined on $(\Omega,\mathcal{F},\mathbb{F},\mathbb{P})$ by
$$X_t := b + (X_0 - b)e^{-at} + \sigma\int_0^t e^{-a(t-s)}\,dW_s,\quad t\ge 0,$$
where $X_0$ is an $\mathcal{F}_0$-measurable square integrable r.v.
6.5.1 Distribution
Conditional on $X_0$, $X$ is a continuous Gaussian process whose covariance function can be explicitly computed:
$${\rm Cov}(X_s,X_t\,|\,X_0) = e^{-a(t-s)}\,{\rm Var}[X_s\,|\,X_0] \quad\text{for } 0\le s\le t.$$
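The conditional mean and variance are given by:
$$\mathbb{E}[X_t\,|\,X_0] = b + (X_0 - b)e^{-at},\qquad {\rm Var}[X_t\,|\,X_0] = \frac{\sigma^2}{2a}\big(1 - e^{-2at}\big).$$
We then deduce the unconditional mean:
$$\mathbb{E}[X_t] = b + (\mathbb{E}[X_0] - b)e^{-at}$$
and the variance
$$\begin{aligned}
{\rm Var}[X_t] &= \mathbb{E}\{{\rm Var}[X_t\,|\,X_0]\} + {\rm Var}\{\mathbb{E}[X_t\,|\,X_0]\} \\
&= e^{-2at}\,{\rm Var}[X_0] + \frac{\sigma^2}{2a}\big(1 - e^{-2at}\big).
\end{aligned}$$
Since $X\,|\,X_0$ is Gaussian, we deduce that, for $a>0$:
• if $X_0$ is distributed as $\mathcal{N}\big(b,\frac{\sigma^2}{2a}\big)$, then $X_t =_d X_0$ for any $t\ge 0$, where $=_d$ denotes equality in distribution,
• if $X_0$ is an $\mathcal{F}_0$-measurable Gaussian r.v. then $X_0$ is independent of $W$, and $X_t\to\mathcal{N}\big(b,\frac{\sigma^2}{2a}\big)$ in law as $t\to\infty$.
We say that $\mathcal{N}\big(b,\frac{\sigma^2}{2a}\big)$ is the invariant (or stationary) distribution of the process $X$.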
Finally, when $X_t$ models the instantaneous interest rate, the discount factor for a payoff at time $T$, defined by $\exp\big(-\int_0^T X_t\,dt\big)$, plays an important role in the theory of interest rates. The distribution of $\int_0^T X_t\,dt$ can be characterized as follows:
$$\int_0^T X_t\,dt = bT + (X_0 - b)A(T) + \sigma\int_0^T M_t\,dA(t),$$
where $M_t := \int_0^t e^{as}\,dW_s$ and $A(t) := \int_0^t e^{-as}\,ds = \frac{1 - e^{-at}}{a}$. Recall that the integration by parts formula is valid, as a consequence of Itô's formula. Then,
$$\begin{aligned}
\int_0^T X_t\,dt &= bT + (X_0 - b)A(T) + \sigma\int_0^T\Big(\int_t^T e^{-as}\,ds\Big)e^{at}\,dW_t \\
&= bT + (X_0 - b)A(T) + \sigma\int_0^T A(T-t)\,dW_t.
\end{aligned}$$
Hence, conditional on $X_0$, the r.v. $\int_0^T X_t\,dt$ is distributed as Gaussian with mean $bT + (X_0 - b)A(T)$ and variance $\sigma^2\int_0^T A(t)^2\,dt$. We can even conclude that, conditional on $X_0$, the joint distribution of the pair $Z_T := \big(X_T,\int_0^T X_t\,dt\big)$ is Gaussian with mean
$$\mathbb{E}[Z_T] = \begin{pmatrix} b + (X_0 - b)e^{-aT} \\ bT + (X_0 - b)A(T) \end{pmatrix}$$
and variance
$$\mathbb{V}[Z_T] = \sigma^2\begin{pmatrix} \int_0^T e^{-2at}\,dt & \int_0^T e^{-at}A(t)\,dt \\ \int_0^T e^{-at}A(t)\,dt & \int_0^T A(t)^2\,dt \end{pmatrix}.$$
6.5.2 Differential representation
Let $Y_t := e^{at}X_t$, $t\ge 0$. Then, by direct calculation, we get:
$$Y_t = X_0 + b\big(e^{at} - 1\big) + \sigma\int_0^t e^{as}\,dW_s,$$
or, in differential form,
$$dY_t = ab\,e^{at}\,dt + \sigma e^{at}\,dW_t.$$
By direct application of Itô's formula, we then obtain the process $X_t = e^{-at}Y_t$ in differential form:
$$dX_t = a(b - X_t)\,dt + \sigma\,dW_t.$$
This is an example of stochastic differential equation, see Chapter 8 for a systematic treatment. The latter differential form shows that, whenever $a>0$, the dynamics of the process $X$ exhibit a mean reversion effect in the sense that
• if $X_t>b$, then the drift is pushing the process down towards $b$,
• similarly, if $X_t<b$, then the drift is pushing the process up towards $b$.
The mean-reversion of this Gaussian process is responsible for its popularity in many applications. In particular, in finance this process is commonly used for the modelling of interest rates.
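The Gaussian transition of the Ornstein-Uhlenbeck process lends itself to exact simulation, with no time-discretization error. The sketch below is an illustration under stated assumptions: it applies the explicit formula of this section on each interval $[t,t+h]$, which gives $X_{t+h} = b + (X_t - b)e^{-ah} + \sigma\sqrt{(1 - e^{-2ah})/(2a)}\,Z$ with $Z\sim\mathcal{N}(0,1)$, and compares the sample mean and variance at time $t$ with the conditional mean and variance stated above. The function name `ou_step`, the parameter values and the seed are hypothetical.

```python
import math
import random


def ou_step(x, a, b, sigma, h, rng):
    """Exact transition of the Ornstein-Uhlenbeck process over a step h."""
    decay = math.exp(-a * h)
    std = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * a))
    return b + (x - b) * decay + std * rng.gauss(0.0, 1.0)


a, b, sigma, x0 = 2.0, 1.0, 0.5, 3.0     # illustrative parameters, a > 0
t, n_steps, n_paths = 1.0, 10, 20000
rng = random.Random(6)

finals = []
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        x = ou_step(x, a, b, sigma, t / n_steps, rng)
    finals.append(x)

mean_mc = sum(finals) / n_paths
var_mc = sum((x - mean_mc) ** 2 for x in finals) / (n_paths - 1)

# Conditional mean and variance given X_0 = x0, from Section 6.5.1.
mean_exact = b + (x0 - b) * math.exp(-a * t)
var_exact = sigma ** 2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * t))

print(mean_mc, mean_exact, var_mc, var_exact)
```

Starting above $b$, the sample mean is pulled back towards $b$ at rate $a$, which is the mean reversion effect described above.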
Exercise 6.16. Let $B$ be a Brownian motion, and consider the processes
$$X_t := e^{-t}B_{e^{2t}},\qquad Y_t := X_t - X_0 + \int_0^t X_s\,ds,\quad t\ge 0.$$
The goal is to show that $X$ is an Ornstein-Uhlenbeck process, i.e. that it has dynamics of the form $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ for some Brownian motion $W$.
1. Prove that $Y_t = \int_1^{e^{2t}}u^{-1/2}\,dB_u$, $t\ge 0$.
2. Deduce that $\{X_t, t\ge 0\}$ is an Ornstein-Uhlenbeck process.
6.6 Application to the Merton optimal portfolio allocation problem
6.6.1 Problem formulation
Consider a financial market consisting of a nonrisky asset, with constant interest rate $r$, and a risky asset with price process defined by (6.7), so that
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t.$$
A portfolio strategy is an $\mathbb{F}$-adapted process $\pi$ such that $\int_0^T |\pi_t|^2\,dt < \infty$ $\mathbb{P}$-a.s. representing the proportion of wealth invested in the risky asset at time $t$. Let $\mathcal{A}$ denote the set of all portfolio strategies.
Under the self-financing condition, the portfolio value at time $t$ is defined by:
$$X_t = e^{rt}\Big(x + \int_0^t \pi_uX_ue^{-ru}\,\frac{d(S_ue^{-ru})}{S_ue^{-ru}}\Big).$$
Then,
$$dX_t = rX_t\,dt + \pi_tX_t\big((\mu - r)\,dt + \sigma\,dW_t\big),$$
and it follows from a direct application of Itô's formula to the function $\ln X_t$ that
$$d\ln X_t = \Big(r + (\mu - r)\pi_t - \frac{1}{2}\sigma^2\pi_t^2\Big)dt + \sigma\pi_t\,dW_t.$$
This provides the expression of $X_t$ in terms of the portfolio strategy $\pi$ and the initial capital $X_0$:
$$X_t = X_0\exp\Big(\int_0^t\Big(r + (\mu - r)\pi_u - \frac{1}{2}\sigma^2\pi_u^2\Big)du + \int_0^t \sigma\pi_u\,dW_u\Big). \quad (6.11)$$
The continuous-time optimal portfolio allocation problem, as formulated by Merton (1969), is defined by:
$$V_0(x) := \sup_{\pi\in\mathcal{A}}\mathbb{E}\big[U\big(X^{x,\pi}_T\big)\big], \quad (6.12)$$
where $U$ is an increasing strictly concave function representing the investor utility, i.e. describing his preferences and attitude towards risk. We assume that $U$ is bounded from below, which guarantees that the expectation in (6.12) is well-defined.
6.6.2 The dynamic programming equation
The dynamic programming technique is a powerful approach to stochastic control problems of the type (6.12). This method was developed by Richard Bellman in the fifties, while at the same time the Russian school was exploring the stochastic extension of the Pontryagin maximum principle. Our objective is to provide an intuitive introduction to the dynamic programming approach in order to motivate the solution approach used in the subsequent section.
The main idea is to define a dynamic version $V_t(x)$ of the problem (6.12) by moving the time origin from $0$ to $t$. For simplicity, let us restrict the set of portfolio strategies to those so-called Markov ones, i.e. $\pi_t = \pi(t, X_t)$. Then, it follows from the law of iterated expectations that
$$V_t(x) = \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ U\left(X_T^{\pi}\right) \right] = \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ E_{t+h,\,X_{t+h}^{\pi}}\, U\left(X_T^{\pi}\right) \right],$$
where we denoted by $E_{t,x}$ the expectation operator conditional on $X_t^{\pi} = x$. Then, we formally expect that the following dynamic programming principle
$$V_t(x) = \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ V\left(t+h, X_{t+h}^{\pi}\right) \right]$$
holds true. A rigorous proof of the claim is far from obvious, and is not needed in these notes, as this paragraph only aims at developing a good intuition for the subsequent solution approach of the Merton problem.
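We next assume that $V$ is known to be sufficiently smooth so as to allow for the use of Itô's formula. Then:
$$0 = \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ V\left(t+h, X_{t+h}^{\pi}\right) - V(t,x) \right]
= \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ \int_t^{t+h} \mathcal{L}^{\pi_u} V(u, X_u^{\pi})\,du + \int_t^{t+h} V_x(u, X_u^{\pi})\,\sigma\pi_u X_u^{\pi}\,dW_u \right],$$
where, for a function $v \in C^{1,2}([0,T), \mathbb{R})$, we denote
$$\mathcal{L}^{\pi} v := \frac{\partial v}{\partial t} + x\left(r + \pi(\mu - r)\right)\frac{\partial v}{\partial x} + \frac{1}{2}x^2\pi^2\sigma^2\,\frac{\partial^2 v}{\partial x^2}. \tag{6.13}$$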
We continue our intuitive presentation by forgetting about any difficulties related to some strict local martingale feature of the stochastic integral inside the expectation. Then
$$0 = \sup_{\pi \in \mathcal{A}} E_{t,x}\left[ \frac{1}{h}\int_t^{t+h} \mathcal{L}^{\pi_u} V(u, X_u^{\pi})\,du \right],$$
and by sending $h \to 0$, we expect from the mean value theorem that $V$ solves the nonlinear partial differential equation
$$\sup_{\pi \in \mathbb{R}} \mathcal{L}^{\pi} V(t,x) = 0 \ \text{ on } [0,T) \times \mathbb{R} \quad\text{and}\quad V(T, \cdot) = U.$$
The latter partial differential equation is the so-called dynamic programming equation, also referred to as the Hamilton-Jacobi-Bellman equation.
Our solution approach for the Merton problem will be the following: suppose that one is able to derive a solution $v$ of the dynamic programming equation; then use a verification argument to prove that the candidate $v$ is indeed the value function of the optimal portfolio allocation problem.
6.6.3 Solving the Merton problem
We recall that for a $C^{1,2}([0,T], \mathbb{R})$ function $v$, it follows from Itô's formula that
$$v(t, X_t^{\pi}) = v(0, X_0) + \int_0^t \mathcal{L}^{\pi_s} v(s, X_s^{\pi})\,ds + \int_0^t v_x(s, X_s^{\pi})\,\sigma\pi_s X_s^{\pi}\,dW_s,$$
where the operator $\mathcal{L}^{\pi}$ is defined in (6.13).
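Proposition 6.17. Let $v \in C^0([0,T] \times \mathbb{R}) \cap C^{1,2}([0,T), \mathbb{R})$ be a nonnegative function satisfying
$$v(T, \cdot) \ge U \quad\text{and}\quad -\mathcal{L}^{\pi} v(t,x) \ge 0 \ \text{ for all } (t,x) \in [0,T) \times \mathbb{R},\ \pi \in \mathbb{R}.$$
Then $v(0,x) \ge V_0(x)$.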
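Proof. For every portfolio strategy $\pi \in \mathcal{A}$ and $t \le T$, it follows from Itô's formula that:
$$M_t := \int_0^t v_x\left(u, X_u^{x,\pi}\right) X_u^{x,\pi}\,\sigma\pi_u\,dW_u
= v\left(t, X_t^{x,\pi}\right) - v(0,x) - \int_0^t \mathcal{L}^{\pi_u} v\left(u, X_u^{x,\pi}\right)du
\ \ge\ v\left(t, X_t^{x,\pi}\right) - v(0,x).$$
Since $v \ge 0$, the process $M$ is a supermartingale, as a local martingale bounded from below by a constant. Then:
$$0 \ge E[M_T] \ge E\left[v\left(T, X_T^{x,\pi}\right)\right] - v(0,x) \ge E\left[U\left(X_T^{x,\pi}\right)\right] - v(0,x),$$
where the last inequality uses $v(T, \cdot) \ge U$, and the required inequality follows from the arbitrariness of $\pi \in \mathcal{A}$.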
We continue the discussion of the optimal portfolio allocation problem in the context of the power utility function:
$$U(x) = x^p, \qquad 0 < p < 1. \tag{6.14}$$
This induces an important simplification as we immediately verify from (6.11) that $V_0(x) = x^p V_0(1)$. We then search for a solution of the partial differential equation
$$\sup_{\pi \in \mathbb{R}} \mathcal{L}^{\pi} v = 0 \quad\text{and}\quad v(T,x) = x^p,$$
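of the form $v(t,x) = x^p h(t)$ for some function $h$. Plugging this form in the above nonlinear partial differential equation leads to an ordinary differential equation for the function $h$, and provides the candidate solution:
$$v(t,x) := x^p \exp\left\{ p(T-t)\left[ r + \frac{(\mu - r)^2}{2(1-p)\sigma^2} \right] \right\}, \qquad t \in [0,T],\ x \ge 0.$$
By candidate solution, we mean that $v$ satisfies:
$$\sup_{\pi \in \mathbb{R}} \mathcal{L}^{\pi} v = \mathcal{L}^{\hat\pi} v = 0 \quad\text{and}\quad v(T,x) = x^p, \quad\text{where}\quad \hat\pi := \frac{\mu - r}{(1-p)\sigma^2}.$$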
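The reduction to an ordinary differential equation mentioned above can be sketched as follows. This is a routine computation under the ansatz $v(t,x) = x^p h(t)$ with $h > 0$, using the operator (6.13); it is meant as a reading aid rather than as the derivation given in the original notes.

```latex
% Ansatz v(t,x) = x^p h(t), with h > 0 and 0 < p < 1, plugged into (6.13)
\begin{align*}
\mathcal{L}^{\pi} v(t,x)
  &= x^p \Big[ h'(t) + p\,h(t)\Big( r + (\mu-r)\pi - \tfrac12 (1-p)\sigma^2\pi^2 \Big) \Big], \\
\hat\pi
  &= \arg\max_{\pi\in\mathbb{R}} \Big\{ (\mu-r)\pi - \tfrac12(1-p)\sigma^2\pi^2 \Big\}
   = \frac{\mu-r}{(1-p)\sigma^2}, \\
0 &= h'(t) + p\Big( r + \frac{(\mu-r)^2}{2(1-p)\sigma^2} \Big) h(t), \qquad h(T) = 1, \\
h(t) &= \exp\Big\{ p\,(T-t)\Big( r + \frac{(\mu-r)^2}{2(1-p)\sigma^2} \Big) \Big\}.
\end{align*}
```

Since $0 < p < 1$, the bracket is strictly concave in $\pi$, so the first-order condition indeed identifies the maximizer.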
Proposition 6.18. In the context of the power utility function (6.14), the value function of the optimal portfolio allocation problem is given by:
$$V_0(x) = v(0,x) = x^p \exp\left\{ pT\left[ r + \frac{(\mu - r)^2}{2(1-p)\sigma^2} \right] \right\},$$
and the constant portfolio strategy $\hat\pi_u := \hat\pi$ is an optimal portfolio allocation.
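Proof. Let $\hat X := X^{x,\hat\pi}$, $\tau_n := T \wedge \inf\{t > 0 : \hat X_t \ge n\}$, and
$$\hat M_t := \int_0^t v_x\left(u, \hat X_u\right) \hat X_u\,\sigma\hat\pi\,dW_u, \qquad t \in [0,T).$$
Since the integrand in the expression of $\hat M$ is bounded on $[0, \tau_n]$, we see that the stopped process $\{\hat M_{t \wedge \tau_n}, t \in [0,T]\}$ is a martingale. We then deduce from Itô's formula together with the fact that $\mathcal{L}^{\hat\pi} v = 0$ that:
$$0 = E\left[\hat M_{\tau_n}\right] = E\left[v\left(\tau_n, \hat X_{\tau_n}\right)\right] - v(0,x). \tag{6.15}$$
We next observe that, for some constant $C > 0$,
$$0 \le v\left(\tau_n, \hat X_{\tau_n}\right) \le C e^{C \max_{t \le T} W_t} \in L^1,$$
recall that $\max_{t \le T} W_t \overset{d}{=} |W_T|$. We can then use the dominated convergence theorem to pass to the limit $n \to \infty$ in (6.15):
$$\lim_{n \to \infty} E\left[v\left(\tau_n, \hat X_{\tau_n}\right)\right] = E\left[v\left(T, \hat X_T\right)\right] = E\left[U\left(\hat X_T\right)\right],$$
where the last equality is due to $v(T, \cdot) = U$. Hence $v(0,x) = E[U(\hat X_T)] \le V_0(x)$; combined with the inequality $v(0,x) \ge V_0(x)$ of Proposition 6.17, this gives $V_0(x) = v(0,x)$, and $\hat\pi$ is an optimal portfolio strategy.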
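The optimality of the constant proportion can be checked numerically. The sketch below is not part of the original notes: the function names and the parameter values are illustrative assumptions. It uses (6.11) with a constant proportion, for which $E[(X_T)^p]$ is available in closed form through the Laplace transform of $W_T$, and compares it with the candidate value $v(0,x)$ and with a plain Monte Carlo estimate.

```python
import math
import random

def merton_fraction(mu, r, sigma, p):
    """Constant proportion pi_hat = (mu - r) / ((1 - p) sigma^2)."""
    return (mu - r) / ((1.0 - p) * sigma ** 2)

def value_function(x, T, mu, r, sigma, p):
    """Candidate value v(0, x) = x^p exp(p T (r + (mu - r)^2 / (2 (1 - p) sigma^2)))."""
    return x ** p * math.exp(p * T * (r + (mu - r) ** 2 / (2.0 * (1.0 - p) * sigma ** 2)))

def expected_utility_constant(pi, x, T, mu, r, sigma, p):
    """E[(X_T)^p] for a constant proportion pi, from (6.11) and the
    Laplace transform of the Gaussian variable W_T."""
    rate = r + (mu - r) * pi - 0.5 * (1.0 - p) * sigma ** 2 * pi ** 2
    return x ** p * math.exp(p * T * rate)

def monte_carlo_utility(pi, x, T, mu, r, sigma, p, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E[(X_T)^p], sampling W_T ~ N(0, T) in (6.11)."""
    rng = random.Random(seed)
    drift = (r + (mu - r) * pi - 0.5 * sigma ** 2 * pi ** 2) * T
    vol = sigma * pi * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        total += (x * math.exp(drift + vol * rng.gauss(0.0, 1.0))) ** p
    return total / n_paths

# Illustrative parameters (assumptions, not taken from the notes)
MU, R, SIGMA, P, X0, T = 0.08, 0.02, 0.2, 0.5, 1.0, 1.0
PI_HAT = merton_fraction(MU, R, SIGMA, P)
V0 = value_function(X0, T, MU, R, SIGMA, P)
```

Scanning `expected_utility_constant` over a grid of constant proportions shows the maximum at `PI_HAT`, where it coincides with `V0`. This only compares constant strategies; the verification argument above is what rules out non-constant ones.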
Chapter 7
Martingale representation
and change of measure
In this chapter, we develop two essential tools in financial mathematics. The martingale representation is the mathematical counterpart of the hedging portfolio. Change of measure is a crucial tool for the representation and the calculation of valuation formulae in the everyday life of the financial industry oriented towards derivative securities. The intuition behind these two tools can be easily understood in the context of the one-period binomial model of Section 2.1 of Chapter 2:
- Perfect hedging of derivative securities in the simple one-period model is always possible, and reduces to a linear system of two equations with two unknowns. It turns out that this property is valid in the framework of the Brownian filtration, and this is exactly what the martingale representation is about. But we should be aware that this result is specific to the Brownian filtration, and fails in more general models; this topic is outside the scope of the present lecture notes.
- The hedging cost in the simple one-period model, which is equal to the no-arbitrage price of the derivative security, can be expressed as an expected value of the discounted payoff under the risk-neutral measure. This representation is very convenient for the calculations, and builds a strong intuition for the next developments of the theory. The Girsanov theorem provides the rigorous way to express expectations under measures other than the initially given one.
7.1 Martingale representation
Let $\mathbb{F}$ be the canonical filtration of Brownian motion completed with the null sets, and consider a random variable $F \in L^1(\mathcal{F}_T, \mathbb{P})$. The goal of this section is to show that any such random variable or, in other words, any path-dependent functional of Brownian motion, can be represented as a stochastic integral of some process with respect to Brownian motion (and hence, a martingale). This has a natural application in finance where one is interested in replicating contingent claims with hedging portfolios. We start with square integrable random variables.
Theorem 7.1. For any $F \in L^2(\mathcal{F}_T, \mathbb{P})$ there exists a unique adapted process $H \in \mathbb{H}^2$ such that
$$F = E[F] + \int_0^T H_s \cdot dW_s. \tag{7.1}$$
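Proof. The uniqueness is an immediate consequence of the Itô isometry. We prove the existence in the two following steps.
Step 1 We first prove that for every integer $n \ge 1$, $0 \le t_1 < \ldots < t_n$, and every bounded function $f : (\mathbb{R}^d)^n \to \mathbb{R}$, the representation (7.1) holds with $\xi := f(W_{t_1}, \ldots, W_{t_n})$ for some $H \in \mathbb{H}^2$. To see this, denote $x^i := (x_1, \ldots, x_i)$ and set
$$u^n(t, x^n) := E\left[ f\left(x^{n-1}, W_{t_n}\right) \,\middle|\, W_t = x_n \right], \qquad t_{n-1} \le t \le t_n.$$
Then, for all $(x^{n-1})$ fixed, the mapping $(t, x_n) \mapsto u^n(t, x^{n-1}, x_n)$ is $C^\infty$ on $[t_{n-1}, t_n) \times \mathbb{R}^d$, see Remark 4.7. Then it follows from Itô's formula that
$$f\left(x^{n-1}, W_{t_n}\right) = f_{n-1}\left(x^{n-1}\right) + \int_{t_{n-1}}^{t_n} \frac{\partial u^n}{\partial y_n}\left(s, x^{n-1}, W_s\right) \cdot dW_s, \tag{7.2}$$
where
$$f_{n-1}\left(x^{n-1}\right) := u^n\left(t_{n-1}, x^{n-1}, x_{n-1}\right).$$
We next fix $(x^{n-2})$, and define the $C^\infty$ function of $(t, x_{n-1}) \in [t_{n-2}, t_{n-1}) \times \mathbb{R}^d$:
$$u^{n-1}\left(t, x^{n-2}, x_{n-1}\right) := E\left[ f_{n-1}\left(x^{n-2}, W_{t_{n-1}}\right) \,\middle|\, W_t = x_{n-1} \right],$$
so that
$$f_{n-1}\left(x^{n-2}, W_{t_{n-1}}\right) = f_{n-2}\left(x^{n-2}\right) + \int_{t_{n-2}}^{t_{n-1}} \frac{\partial u^{n-1}}{\partial y_{n-1}}\left(s, x^{n-2}, W_s\right) \cdot dW_s, \tag{7.3}$$
where
$$f_{n-2}\left(x^{n-2}\right) := u^{n-1}\left(t_{n-2}, x^{n-2}, x_{n-2}\right).$$
Combining (7.2) and (7.3), we obtain:
$$f\left(x^{n-2}, W_{t_{n-1}}, W_{t_n}\right) = f_{n-2}\left(x^{n-2}\right) + \int_{t_{n-2}}^{t_{n-1}} \frac{\partial u^{n-1}}{\partial y_{n-1}}\left(s, x^{n-2}, W_s\right) \cdot dW_s + \int_{t_{n-1}}^{t_n} \frac{\partial u^n}{\partial y_n}\left(s, x^{n-2}, W_{t_{n-1}}, W_s\right) \cdot dW_s.$$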
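Repeating this argument, it follows that
$$f\left(W_{t_1}, \ldots, W_{t_n}\right) = f_0(0) + \int_0^{t_n} H_s \cdot dW_s,$$
where
$$H_s := \frac{\partial u^i}{\partial y_i}\left(s, W_{t_1}, \ldots, W_{t_{i-1}}, W_s\right), \qquad s \in [t_{i-1}, t_i),$$
and for $i = 1, \ldots, n$ (with the conventions $t_0 := 0$ and $f_n := f$):
$$u^i(t, x^i) := E\left[ f_i\left(x^{i-1}, W_{t_i}\right) \,\middle|\, W_t = x_i \right], \qquad t_{i-1} \le t \le t_i,$$
$$f_{i-1}\left(x^{i-1}\right) := u^i\left(t_{i-1}, x^{i-1}, x_{i-1}\right).$$
Since $f$ is bounded, it follows that $H \in \mathbb{H}^2$.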
Step 2 The subset $\mathcal{I}$ of $L^2$ for which (7.1) holds is a closed linear subspace of the Hilbert space $L^2$. To complete the proof, we now prove that its $L^2$-orthogonal $\mathcal{I}^\perp$ is reduced to $\{0\}$.
Let $\mathcal{J}$ denote the set of all events $E = \{(W_{t_1}, \ldots, W_{t_n}) \in A\}$ for some $n \ge 1$, $0 \le t_1 < \ldots < t_n$, and $A \in \mathcal{B}\left((\mathbb{R}^d)^n\right)$. Then $\mathcal{J}$ is a $\pi$-system (i.e. stable by intersection), and by the previous step $\mathbf{1}_A(W_{t_1}, \ldots, W_{t_n}) \in \mathcal{I}$ for all $0 \le t_1 < \ldots < t_n$. Then, for all $\xi \in \mathcal{I}^\perp$, we have $E[\xi\,\mathbf{1}_A(W_{t_1}, \ldots, W_{t_n})] = 0$ or, equivalently,
$$E\left[\xi^+ \mathbf{1}_A(W_{t_1}, \ldots, W_{t_n})\right] = E\left[\xi^- \mathbf{1}_A(W_{t_1}, \ldots, W_{t_n})\right].$$
In other words, the measures defined by the densities $\xi^+$ and $\xi^-$ agree on the $\pi$-system $\mathcal{J}$. Since $\sigma(\mathcal{J}) = \mathcal{F}_T$, it follows from Proposition A.5 that $\xi^+ = \xi^-$ a.s., i.e. $\xi = 0$.
The last theorem can be stated equivalently in terms of square integrable martingales.
Theorem 7.2. Let $\{M_t, 0 \le t \le T\}$ be a square integrable martingale. Then, there exists a unique process $H \in \mathbb{H}^2$ such that
$$M_t = M_0 + \int_0^t H_s \cdot dW_s, \qquad 0 \le t \le T.$$
In particular, $M$ has continuous sample paths, a.s.
Proof. Apply Theorem 7.1 to the square integrable r.v. $M_T$, and take conditional expectations.
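As an illustration of the representation (7.1), consider the example $F = W_T^2$ in dimension one (an illustrative choice, not taken from the notes): Itô's formula gives $W_T^2 = T + \int_0^T 2W_s\,dW_s$, so that $E[F] = T$ and $H_s = 2W_s$. The sketch below checks this identity on a discretized path, with the stochastic integral approximated by a left-point (Itô) sum; the step count and seed are arbitrary assumptions.

```python
import math
import random

def representation_error(T=1.0, n_steps=2000, seed=0):
    """Simulate one Brownian path on [0, T] and return
    W_T^2 - (T + sum_i 2 W_{t_i} (W_{t_{i+1}} - W_{t_i})),
    i.e. the discretization error of the representation of F = W_T^2."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = 0.0
    integral = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        integral += 2.0 * w * dw  # integrand evaluated at the left endpoint
        w += dw
    return w * w - (T + integral)

# The error equals sum_i (dW_i)^2 - T, which is small for a fine time grid.
errors = [abs(representation_error(seed=s)) for s in range(20)]
```

The residual is exactly the gap between the discrete quadratic variation and $T$, which shrinks like `n_steps ** -0.5` as the grid is refined.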
We next extend the representation result to $L^1$.
Theorem 7.3. For any $F \in L^1(\mathcal{F}_T, \mathbb{P})$ there exists a process $H \in \mathbb{H}^2_{loc}$ such that
$$F = E[F] + \int_0^T H_s \cdot dW_s. \tag{7.4}$$
Proof. Let $M_t = E[F | \mathcal{F}_t]$. The first step is to show that $M$ is continuous. Since bounded functions are dense in $L^1$, there exists a sequence of bounded random variables $(F_n)$ such that $\|F_n - F\|_{L^1} \le 3^{-n}$. We define $M^n_t = E[F_n | \mathcal{F}_t]$. From Theorem 3.11 (ii),
$$P\left[ \sup_{0 \le t \le T} |M^n_t - M_t| > 2^{-n} \right] \le 2^n E\left[|F - F_n|\right] \le \left(\frac{2}{3}\right)^n.$$
Therefore, by the Borel-Cantelli lemma, the martingales $M^n$ converge uniformly to $M$, a.s. Since $F_n \in L^2(\mathcal{F}_T, \mathbb{P})$, $M^n$ is continuous for each $n$ (by Theorem 7.2), so the uniform limit $M$ is also continuous.
The second step is to show that $M$ can be represented as a stochastic integral. From the continuity of $M$ it follows that $|M_{t \wedge \tau_n}| \le n$ for all $t$, with $\tau_n = \inf\{t : |M_t| \ge n\}$. Therefore, by Theorem 7.1, for each $n$, there exists a process $H^n \in \mathbb{H}^2$ with
$$M_{t \wedge \tau_n} = E\left[M_{t \wedge \tau_n}\right] + \int_0^{t \wedge \tau_n} H^n_s \cdot dW_s = E[F] + \int_0^{t \wedge \tau_n} H^n_s \cdot dW_s.$$
Moreover, the Itô isometry shows that the processes $H^n$ and $H^m$ must coincide for $t \le \tau_n \wedge \tau_m$. Define the process
$$H_t := \sum_{n \ge 1} H^n_t\, \mathbf{1}_{]\tau_{n-1}, \tau_n]}(t).$$
Since $M$ is continuous, it is uniformly continuous on $[0,T]$, and therefore bounded a.s., which means that almost surely, starting from some $n$, $H_t = H^n_t$ for all $t \in [0,T]$, and so
$$\int_0^T |H_s|^2\,ds < \infty \quad \text{a.s.}$$
On the other hand, for every $n$,
$$M_{t \wedge \tau_n} = E[F] + \int_0^{t \wedge \tau_n} H_s \cdot dW_s.$$
By passing to the almost-sure limit on each side of this equation, the proof is complete.
With the latter result, we can now extend the representation Theorem 7.2 to local martingales. Notice that uniqueness of the integrand process is lost.
Theorem 7.4. Let $\{M_t, 0 \le t \le T\}$ be a local martingale. Then, there exists a process $H \in \mathbb{H}^2_{loc}$ such that
$$M_t = M_0 + \int_0^t H_s \cdot dW_s, \qquad 0 \le t \le T.$$
In particular, $M$ has continuous sample paths, a.s.
Proof. There exists an increasing sequence of stopping times $(\tau_n)_{n \ge 1}$ with $\tau_n \to \infty$, a.s., such that the stopped process $\{M^n_t := M_{t \wedge \tau_n}, t \in [0,T]\}$ is a bounded martingale for all $n$. By Theorem 7.3, the bounded r.v. $M^n_T$ can be represented as $M^n_T = M_0 + \int_0^T H^n_s \cdot dW_s$, and it follows from direct conditioning that $M^n_t = M_0 + \int_0^t H^n_s \cdot dW_s$ for all $t \in [0,T]$. Similar to the proof of the previous Theorem 7.3, it follows from the Itô isometry that $H^n$ and $H^m$ coincide $dt \otimes d\mathbb{P}$-a.s. on $[0, \tau_n \wedge \tau_m]$, and we may define $H_t := \sum_{n \ge 1} H^n_t\, \mathbf{1}_{]\tau_{n-1}, \tau_n]}(t)$ so that $M^n_t = M_0 + \int_0^{t \wedge \tau_n} H_s \cdot dW_s$, and we conclude by taking the almost sure limit in $n$.
7.2 The Cameron-Martin change of measure
Let $N$ be a Gaussian random variable with mean zero and unit variance. The corresponding probability density function is
$$\frac{\partial}{\partial x} P[N \le x] = f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x \in \mathbb{R}.$$
For any constant $a \in \mathbb{R}$, the random variable $N + a$ is Gaussian with mean $a$ and unit variance, with probability density function
$$\frac{\partial}{\partial x} P[N + a \le x] = f(x - a) = f(x)\, e^{ax - \frac{a^2}{2}}, \qquad x \in \mathbb{R}.$$
Then, for every (at least bounded) function $\psi$, we have
$$E[\psi(N + a)] = \int \psi(x) f(x)\, e^{ax - \frac{a^2}{2}}\,dx = E\left[ e^{aN - \frac{a^2}{2}}\, \psi(N) \right].$$
This easy result can be translated in terms of a change of measure. Indeed, since the random variable $e^{aN - \frac{a^2}{2}}$ is positive and integrates to 1, we may introduce the equivalent measure $\mathbb{Q} := e^{aN - \frac{a^2}{2}} \cdot \mathbb{P}$. Then, the above equality says that the $\mathbb{Q}$-distribution of $N$ coincides with the $\mathbb{P}$-distribution of $N + a$, i.e.
$$\text{under } \mathbb{Q},\ N - a \text{ is distributed as } \mathcal{N}(0,1).$$
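The identity $E[\psi(N+a)] = E[e^{aN - a^2/2}\psi(N)]$ is easy to test by simulation. The following sketch is an illustration under assumed choices (the bounded test function $\psi = \cos$, the shift $a = 0.5$, sample sizes and seeds are all arbitrary); for this $\psi$ both sides equal $e^{-1/2}\cos a$.

```python
import math
import random

def shifted_expectation(psi, a, n_samples=200_000, seed=1):
    """Estimate E[psi(N + a)] by sampling N ~ N(0, 1) and shifting it."""
    rng = random.Random(seed)
    return sum(psi(rng.gauss(0.0, 1.0) + a) for _ in range(n_samples)) / n_samples

def weighted_expectation(psi, a, n_samples=200_000, seed=2):
    """Estimate E[exp(a N - a^2 / 2) psi(N)]: same target, but the shift
    is replaced by the density exp(a N - a^2 / 2) of Q with respect to P."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        n = rng.gauss(0.0, 1.0)
        total += math.exp(a * n - 0.5 * a * a) * psi(n)
    return total / n_samples

A = 0.5
EXACT = math.exp(-0.5) * math.cos(A)  # E[cos(N + a)] = exp(-1/2) cos(a)
LEFT = shifted_expectation(math.cos, A)
RIGHT = weighted_expectation(math.cos, A)
```

Both estimators agree with `EXACT` up to Monte Carlo error; the second one never samples from the shifted law, which is the point of the change of measure.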
The purpose of this section is to extend this result to a Brownian motion $W$ in $\mathbb{R}^d$. Let $h : [0,T] \to \mathbb{R}^d$ be a deterministic function in $L^2$, i.e. $\int_0^T |h(t)|^2\,dt < \infty$. From Theorem 5.3, the stochastic integral
$$N := \int_0^T h(t) \cdot dW_t$$
is well-defined as the $L^2$-limit of the stochastic integral of some $\mathbb{H}^2$-approximating simple function. In particular, since the space of Gaussian random variables is closed, it follows that
$$N \text{ is distributed as } \mathcal{N}\left( 0, \int_0^T |h(t)|^2\,dt \right),$$
and we may define an equivalent probability measure $\mathbb{Q}$ by:
$$\frac{d\mathbb{Q}}{d\mathbb{P}} := \exp\left( \int_0^T h(t) \cdot dW_t - \frac{1}{2}\int_0^T |h(t)|^2\,dt \right). \tag{7.5}$$
Theorem 7.5 (Cameron-Martin formula). For a Brownian motion $W$ in $\mathbb{R}^d$, let $\mathbb{Q}$ be the probability measure equivalent to $\mathbb{P}$ defined by (7.5). Then, the process
$$B_t := W_t - \int_0^t h(u)\,du, \qquad t \in [0,T],$$
is a Brownian motion under $\mathbb{Q}$.
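Proof. We first observe that $B_0 = 0$ and $B$ has a.s. continuous sample paths. It remains to prove that, for $0 \le s < t$, $B_t - B_s$ is independent of $\mathcal{F}_s$ and distributed as a centered Gaussian with variance $t - s$. To do this, we compute the $\mathbb{Q}$-Laplace transform, for $\lambda \in \mathbb{R}^d$:
$$E^{\mathbb{Q}}\left[ e^{\lambda \cdot (W_t - W_s)} \,\middle|\, \mathcal{F}_s \right]
= E\left[ \frac{E\left[\frac{d\mathbb{Q}}{d\mathbb{P}} \middle| \mathcal{F}_t\right]}{E\left[\frac{d\mathbb{Q}}{d\mathbb{P}} \middle| \mathcal{F}_s\right]}\, e^{\lambda \cdot (W_t - W_s)} \,\middle|\, \mathcal{F}_s \right]
= E\left[ e^{\int_s^t h(u) \cdot dW_u - \frac{1}{2}\int_s^t |h(u)|^2 du}\, e^{\lambda \cdot (W_t - W_s)} \,\middle|\, \mathcal{F}_s \right]$$
$$= e^{-\frac{1}{2}\int_s^t |h(u)|^2 du}\, E\left[ e^{\int_s^t (h(u) + \lambda) \cdot dW_u} \,\middle|\, \mathcal{F}_s \right].$$
Since the random variable $\int_s^t (h(u) + \lambda) \cdot dW_u$ is a centered Gaussian with variance $\int_s^t |h(u) + \lambda|^2\,du$, independent of $\mathcal{F}_s$, this provides:
$$E^{\mathbb{Q}}\left[ e^{\lambda \cdot (W_t - W_s)} \,\middle|\, \mathcal{F}_s \right]
= e^{-\frac{1}{2}\int_s^t |h(u)|^2 du}\, e^{\frac{1}{2}\int_s^t |h(u) + \lambda|^2 du}
= e^{\frac{1}{2}|\lambda|^2 (t-s) + \lambda \cdot \int_s^t h(u)\,du}.$$
This shows that, under $\mathbb{Q}$, $W_t - W_s$ is independent of $\mathcal{F}_s$ and is distributed as a Gaussian with mean $\int_s^t h(u)\,du$ and variance $t - s$, i.e. $B_t - B_s$ is independent of $\mathcal{F}_s$ and is distributed as a centered Gaussian with variance $t - s$.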
7.3 Girsanov's theorem
The Cameron-Martin change of measure formula of the preceding section can be extended to adapted stochastic processes satisfying suitable integrability conditions. Let $W$ be a $d$-dimensional Brownian motion on the probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$. Given an $\mathcal{F}_T$-measurable positive random variable $Z$ such that $E^{\mathbb{P}}[Z] = 1$, we define a new probability $\mathbb{Q}$ via $\mathbb{Q} := Z \cdot \mathbb{P}$:
$$\mathbb{Q}(A) = E^{\mathbb{P}}[Z \mathbf{1}_A], \qquad A \in \mathcal{F}_T.$$
$Z$ is called the density of $\mathbb{Q}$ with respect to $\mathbb{P}$. For every $A \in \mathcal{F}_t$,
$$\mathbb{Q}(A) = E^{\mathbb{P}}[Z \mathbf{1}_A] = E^{\mathbb{P}}\left[ \mathbf{1}_A\, E[Z | \mathcal{F}_t] \right].$$
Therefore, the martingale $Z_t := E[Z | \mathcal{F}_t]$ plays the role of the density of $\mathbb{Q}$ with respect to $\mathbb{P}$ on $\mathcal{F}_t$. The following lemma shows how to compute conditional expectations under $\mathbb{Q}$.
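Lemma 7.6 (Bayes rule). Let $Y$ be $\mathcal{F}_T$-measurable with $E^{\mathbb{Q}}[|Y|] < \infty$. Then
$$E^{\mathbb{Q}}[Y | \mathcal{F}_t] = \frac{E[ZY | \mathcal{F}_t]}{E[Z | \mathcal{F}_t]} = \frac{1}{Z_t} E[ZY | \mathcal{F}_t].$$
Proof. Let $A \in \mathcal{F}_t$. Then
$$E^{\mathbb{Q}}[Y \mathbf{1}_A] = E^{\mathbb{P}}[ZY \mathbf{1}_A] = E^{\mathbb{P}}\left[ E^{\mathbb{P}}[ZY | \mathcal{F}_t]\, \mathbf{1}_A \right]
= E^{\mathbb{P}}\left[ \frac{Z}{E^{\mathbb{P}}[Z | \mathcal{F}_t]}\, E^{\mathbb{P}}[ZY | \mathcal{F}_t]\, \mathbf{1}_A \right]
= E^{\mathbb{Q}}\left[ \frac{E^{\mathbb{P}}[ZY | \mathcal{F}_t]}{E^{\mathbb{P}}[Z | \mathcal{F}_t]}\, \mathbf{1}_A \right].$$
Since the above is true for any $A \in \mathcal{F}_t$, this finishes the proof.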
Let $\phi$ be a process in $\mathbb{H}^2_{loc}$. Inspired by equation (7.5), we define a candidate for the martingale density:
$$Z_t = \exp\left( \int_0^t \phi_s \cdot dW_s - \frac{1}{2}\int_0^t |\phi_s|^2\,ds \right), \qquad 0 \le t \le T. \tag{7.6}$$
An application of Itô's formula gives the dynamics of $Z$:
$$dZ_t = Z_t\, \phi_t \cdot dW_t,$$
which shows that $Z$ is a local martingale (take $\tau_n = \inf\{t : \int_0^t Z_s^2 |\phi_s|^2\,ds \ge n\}$ as localizing sequence). As shown by the following lemma, $Z$ is also a supermartingale and therefore satisfies $E[Z_t] \le 1$ for all $t$.
Lemma 7.7. For any stopping times $s \le t \le T$, $E[Z_t | \mathcal{F}_s] \le Z_s$.
Proof. Let $(\tau_n)$ be a localizing sequence for $Z$. Then $Z_{t \wedge \tau_n}$ is a true martingale and by Fatou's lemma, which can be applied to conditional expectations,
$$E[Z_t | \mathcal{F}_s] = E\left[ \lim_n Z_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right] \le \liminf_n E\left[ Z_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right] = \liminf_n Z_{s \wedge \tau_n} = Z_s.$$
However, $Z$ may sometimes fail to be a true martingale. A sufficient condition for $Z$ to be a true martingale is given in the next section; for now let us assume that this is the case.
Theorem 7.8 (Girsanov). Let $Z$ be given by (7.6) and suppose that $E[Z_T] = 1$. Then the process
$$\tilde W_t := W_t - \int_0^t \phi_s\,ds, \qquad t \le T,$$
is a Brownian motion under the probability $\mathbb{Q} := Z_T \cdot \mathbb{P}$ on $\mathcal{F}_T$.
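Proof. Step 1. Let $Y_t := \int_0^t b_s \cdot d\tilde W_s$, where $b$ is an adapted process with $\int_0^T |b_s|^2\,ds < \infty$, and let $X_t := Z_t Y_t$. Applying Itô's formula to $X$, we get
$$dX_t = Z_t\left(b_t \cdot d\tilde W_t\right) + Y_t\left(Z_t \phi_t \cdot dW_t\right) + b_t \cdot Z_t \phi_t\,dt = Z_t\left(b_t + Y_t \phi_t\right) \cdot dW_t,$$
which shows that $X$ is a local martingale under $\mathbb{P}$. Let $(\tau_n)$ be a localizing sequence for $X$ under $\mathbb{P}$. By Lemma 7.6, for $t \ge s$,
$$E^{\mathbb{Q}}\left[ Y_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right] = \frac{1}{Z_s} E^{\mathbb{P}}\left[ Z_T Y_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right]$$
$$= \frac{1}{Z_s} E^{\mathbb{P}}\left[ E^{\mathbb{P}}\left[ Z_T \,\middle|\, \mathcal{F}_{s \vee (\tau_n \wedge t)} \right] Y_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right] = \frac{1}{Z_s} E^{\mathbb{P}}\left[ Z_{s \vee (\tau_n \wedge t)}\, Y_{t \wedge \tau_n} \,\middle|\, \mathcal{F}_s \right]$$
$$= \frac{1}{Z_s} E^{\mathbb{P}}\left[ X_{\tau_n \wedge t}\, \mathbf{1}_{\{\tau_n \ge s\}} + Z_s Y_{\tau_n}\, \mathbf{1}_{\{\tau_n < s\}} \,\middle|\, \mathcal{F}_s \right]$$
$$= \frac{1}{Z_s} \left\{ X_s\, \mathbf{1}_{\{\tau_n \ge s\}} + Z_s Y_{\tau_n}\, \mathbf{1}_{\{\tau_n < s\}} \right\} = Y_{s \wedge \tau_n},$$
which shows that $Y$ is a local martingale under $\mathbb{Q}$ with $(\tau_n)$ as localizing sequence.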
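Step 2. For a fixed $u \in \mathbb{R}^d$, applying Itô's formula to $e^{iu \cdot \tilde W}$ between $s$ and $t$ and multiplying the result by $e^{-iu \cdot \tilde W_s}$, we get:
$$e^{iu \cdot (\tilde W_t - \tilde W_s)} = 1 + i \int_s^t e^{iu \cdot (\tilde W_r - \tilde W_s)}\, u \cdot d\tilde W_r - \frac{1}{2}|u|^2 \int_s^t e^{iu \cdot (\tilde W_r - \tilde W_s)}\,dr. \tag{7.7}$$
By Step 1, the stochastic integral
$$\int_s^t e^{iu \cdot (\tilde W_r - \tilde W_s)}\, u \cdot d\tilde W_r$$
is a local martingale (as a function of the parameter $t$), and since it is also bounded (because all the other terms in the equation (7.7) are bounded), the dominated convergence theorem implies that it is a true martingale and satisfies
$$E^{\mathbb{Q}}\left[ \int_s^t e^{iu \cdot (\tilde W_r - \tilde W_s)}\, u \cdot d\tilde W_r \,\middle|\, \mathcal{F}_s \right] = 0.$$
Let $h^{\mathbb{Q}}(t) := E^{\mathbb{Q}}\left[ e^{iu \cdot (\tilde W_t - \tilde W_s)} \,\middle|\, \mathcal{F}_s \right]$. Taking $\mathbb{Q}$-expectations on both sides of (7.7) leads to an integral equation for $h^{\mathbb{Q}}$:
$$h^{\mathbb{Q}}(t) = 1 - \frac{1}{2}|u|^2 \int_s^t h^{\mathbb{Q}}(r)\,dr \quad\Longrightarrow\quad h^{\mathbb{Q}}(t) = e^{-|u|^2 (t-s)/2}.$$
This proves that $\tilde W$ has independent, stationary and normally distributed increments under $\mathbb{Q}$. Since it is also continuous and $\tilde W_0 = 0$, $\tilde W$ is a Brownian motion under $\mathbb{Q}$.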
7.4 Novikov's criterion
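Theorem 7.9 (Novikov). Suppose that
$$E\left[ e^{\frac{1}{2}\int_0^T |\phi_s|^2\,ds} \right] < \infty.$$
Then $E[Z_T] = 1$ and the process $\{Z_t, 0 \le t \le T\}$ is a martingale.
In the following lemma and below, we use the notation
$$Z^{(a)}_t = \exp\left( \int_0^t a\phi_s \cdot dW_s - \frac{a^2}{2}\int_0^t |\phi_s|^2\,ds \right), \qquad 0 \le t \le T.$$
Lemma 7.10. Assume that
$$\sup_{\tau} E\left[ e^{\frac{1}{2}\int_0^\tau \phi_s \cdot dW_s} \right] < \infty,$$
where the sup is taken over all stopping times $\tau$ with $\tau \le T$. Then for all $a \in (0,1)$ and all $t \le T$, $E[Z^{(a)}_t] = 1$.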
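Proof. Let $q = \frac{1}{a(2-a)} > 1$ and $r = \frac{2-a}{a} > 1$, and let $\tau$ be a stopping time with $\tau \le T$. Then, applying the Hölder inequality with $\frac{1}{s} + \frac{1}{r} = 1$ and observing that $s\left(aq - a\sqrt{q/r}\right) = \frac{1}{2}$,
$$E\left[(Z^{(a)}_\tau)^q\right] = E\left[ e^{aq\int_0^\tau \phi_s \cdot dW_s - \frac{a^2 q}{2}\int_0^\tau |\phi_s|^2 ds} \right]
= E\left[ e^{a\sqrt{q/r}\int_0^\tau \phi_s \cdot dW_s - \frac{a^2 q}{2}\int_0^\tau |\phi_s|^2 ds}\; e^{\left(aq - a\sqrt{q/r}\right)\int_0^\tau \phi_s \cdot dW_s} \right]$$
$$\le E\left[ Z^{(a\sqrt{qr})}_\tau \right]^{\frac{1}{r}} E\left[ e^{\frac{1}{2}\int_0^\tau \phi_s \cdot dW_s} \right]^{\frac{1}{s}} \le E\left[ e^{\frac{1}{2}\int_0^\tau \phi_s \cdot dW_s} \right]^{\frac{1}{s}}$$
by Lemma 7.7. Therefore, $\sup_\tau E[(Z^{(a)}_\tau)^q]$ is bounded. With the sequence $\tau_n$ defined by $\tau_n = \inf\{t : \int_0^t a^2 (Z^{(a)}_s)^2 |\phi_s|^2\,ds \ge n\}$, we now have by Doob's maximal inequality of Theorem 3.15,
$$E\left[ \sup_{t \le T} Z^{(a)}_{t \wedge \tau_n} \right] \le E\left[ \left( \sup_{t \le T} Z^{(a)}_{t \wedge \tau_n} \right)^q \right]^{\frac{1}{q}} \le \frac{q}{q-1}\, \sup_{t \le T} E\left[ (Z^{(a)}_{t \wedge \tau_n})^q \right]^{\frac{1}{q}} < \infty,$$
with a bound which does not depend on $n$. By monotone convergence, this implies
$$E\left[ \sup_{t \le T} Z^{(a)}_t \right] < \infty,$$
and by dominated convergence we then conclude that $E[Z^{(a)}_t] = 1$ for all $t \le T$.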
Proof of Theorem 7.9. For any stopping time $\tau \le T$,
$$e^{\frac{1}{2}\int_0^\tau \phi_s \cdot dW_s} = Z_\tau^{\frac{1}{2}} \left( e^{\frac{1}{2}\int_0^\tau |\phi_s|^2 ds} \right)^{\frac{1}{2}},$$
and by an application of the Cauchy-Schwarz inequality, using Lemma 7.7 and the assumption of the theorem, it follows that
$$\sup_\tau E\left[ e^{\frac{1}{2}\int_0^\tau \phi_s \cdot dW_s} \right] < \infty. \tag{7.8}$$
Fixing $a < 1$ and $t \le T$ and observing that
$$Z^{(a)}_t = e^{a^2 \int_0^t \phi_s \cdot dW_s - \frac{a^2}{2}\int_0^t |\phi_s|^2 ds}\; e^{a(1-a)\int_0^t \phi_s \cdot dW_s} = Z_t^{a^2}\, e^{a(1-a)\int_0^t \phi_s \cdot dW_s},$$
we get, by Hölder's inequality and Lemma 7.10,
$$1 = E\left[Z^{(a)}_t\right] \le E[Z_t]^{a^2}\, E\left[ e^{\frac{a}{1+a}\int_0^t \phi_s \cdot dW_s} \right]^{1 - a^2} \le E[Z_t]^{a^2}\, E\left[ e^{\frac{1}{2}\int_0^t \phi_s \cdot dW_s} \right]^{2a(1-a)}.$$
Making $a$ tend to 1 and using (7.8) yields $E[Z_t] \ge 1$, and by Lemma 7.7, $E[Z_t] = 1$.
Now, let $s \le t \le T$. In the same way as in Lemma 7.7, one can show that $E[Z_t | \mathcal{F}_s] \le Z_s$, and taking the expectation of both sides shows that $E[Z_t] = 1$ can hold only if $E[Z_t | \mathcal{F}_s] = Z_s$ $\mathbb{P}$-almost surely.
7.5 Application: the martingale approach to the
Black-Scholes model
This section contains the modern approach to the Black-Scholes valuation and hedging theory. The prices will be modelled by Itô processes, and the results obtained by the previous approaches will be derived by the elegant martingale approach.
7.5.1 The continuous-time financial market
Let $T$ be a finite horizon, and $(\Omega, \mathcal{F}, \mathbb{P})$ be a complete probability space supporting a Brownian motion $W = \{(W^1_t, \ldots, W^d_t), 0 \le t \le T\}$ with values in $\mathbb{R}^d$. We denote by $\mathbb{F} = \mathbb{F}^W = \{\mathcal{F}_t, 0 \le t \le T\}$ the canonical augmented filtration of $W$, i.e. the canonical filtration augmented by zero measure sets of $\mathcal{F}_T$.
The financial market consists of $d + 1$ assets:
(i) The first asset $S^0$ is non-risky, and is defined by
$$S^0_t = \exp\left( \int_0^t r_u\,du \right), \qquad 0 \le t \le T,$$
where $\{r_t, t \in [0,T]\}$ is a non-negative measurable and adapted process with $\int_0^T r_t\,dt < \infty$ a.s., and represents the instantaneous interest rate.
(ii) The $d$ remaining assets $S^i$, $i = 1, \ldots, d$, are risky assets with price processes defined by the equations
$$S^i_t = S^i_0 \exp\left( \int_0^t \Big( \mu^i_u - \frac{1}{2}\sum_{j=1}^d \left|\sigma^{i,j}_u\right|^2 \Big) du + \sum_{j=1}^d \int_0^t \sigma^{i,j}_u\,dW^j_u \right), \qquad t \ge 0,$$
where $\mu$, $\sigma$ are measurable and $\mathbb{F}$-adapted processes with $\int_0^T |\mu^i_t|\,dt + \int_0^T |\sigma^{i,j}_t|^2\,dt < \infty$ for all $i, j = 1, \ldots, d$. Applying Itô's formula, we see that
$$\frac{dS^i_t}{S^i_t} = \mu^i_t\,dt + \sum_{j=1}^d \sigma^{i,j}_t\,dW^j_t, \qquad t \in [0,T],$$
for $1 \le i \le d$. It is convenient to use the matrix notations to represent the dynamics of the price vector $S = (S^1, \ldots, S^d)$:
$$dS_t = \mathrm{diag}[S_t] \left( \mu_t\,dt + \sigma_t\,dW_t \right), \qquad t \in [0,T],$$
where $\mathrm{diag}[S_t]$ is the $d \times d$ diagonal matrix with diagonal $i$-th component given by $S^i_t$, and $\mu$, $\sigma$ are the $\mathbb{R}^d$-vector with components $\mu^i$, and the $\mathcal{M}_{\mathbb{R}}(d,d)$-matrix with entries $\sigma^{i,j}$.
We assume that the $\mathcal{M}_{\mathbb{R}}(d,d)$-matrix $\sigma_t$ is invertible for every $t \in [0,T]$ a.s., and we introduce the process
$$\lambda_t := \sigma_t^{-1} \left( \mu_t - r_t \mathbf{1} \right), \qquad 0 \le t \le T,$$
called the risk premium process. Here $\mathbf{1}$ is the vector of ones in $\mathbb{R}^d$. We shall frequently make use of the discounted processes
$$\tilde S_t := \frac{S_t}{S^0_t} = S_t \exp\left( -\int_0^t r_u\,du \right).$$
Using the above matrix notations, the dynamics of the process $\tilde S$ are given by
$$d\tilde S_t = \mathrm{diag}[\tilde S_t] \left\{ (\mu_t - r_t \mathbf{1})\,dt + \sigma_t\,dW_t \right\} = \mathrm{diag}[\tilde S_t]\, \sigma_t \left( \lambda_t\,dt + dW_t \right).$$
7.5.2 Portfolio and wealth process
A portfolio strategy is an $\mathbb{F}$-adapted process $\pi = \{\pi_t, 0 \le t \le T\}$ with values in $\mathbb{R}^d$. For $1 \le i \le d$ and $0 \le t \le T$, $\pi^i_t$ is the amount (in Euros) invested in the risky asset $S^i$.
We next recall the self-financing condition in the present framework. Let $X^\pi_t$ denote the portfolio value, or wealth, process at time $t$ induced by the portfolio strategy $\pi$. Then, the amount invested in the non-risky asset is $X^\pi_t - \sum_{i=1}^d \pi^i_t = X^\pi_t - \pi_t \cdot \mathbf{1}$.
Under the self-financing condition, the dynamics of the wealth process are given by
$$dX^\pi_t = \sum_{i=1}^d \frac{\pi^i_t}{S^i_t}\,dS^i_t + \frac{X^\pi_t - \pi_t \cdot \mathbf{1}}{S^0_t}\,dS^0_t.$$
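Let $\tilde X$ be the discounted wealth process
$$\tilde X_t := X_t \exp\left( -\int_0^t r_u\,du \right), \qquad 0 \le t \le T.$$
Then, by an immediate application of Itô's formula, we see that
$$d\tilde X_t = \tilde\pi_t \cdot \mathrm{diag}[\tilde S_t]^{-1}\,d\tilde S_t \tag{7.9}$$
$$\phantom{d\tilde X_t} = \tilde\pi_t \cdot \sigma_t \left( \lambda_t\,dt + dW_t \right), \qquad 0 \le t \le T, \tag{7.10}$$
where $\tilde\pi_t := \pi_t \exp\left(-\int_0^t r_u\,du\right)$ denotes the discounted portfolio. We still need to place further technical conditions on $\pi$, at least in order for the above wealth process to be well-defined as a stochastic integral.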
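Before this, let us observe that, assuming that the risk premium process satisfies the Novikov condition:
$$E\left[ e^{\frac{1}{2}\int_0^T |\lambda_t|^2\,dt} \right] < \infty,$$
it follows from the Girsanov theorem that the process
$$B_t := W_t + \int_0^t \lambda_u\,du, \qquad 0 \le t \le T, \tag{7.11}$$
is a Brownian motion under the equivalent probability measure
$$\mathbb{Q} := Z_T \cdot \mathbb{P} \ \text{ on } \mathcal{F}_T \quad\text{where}\quad Z_T := \exp\left( -\int_0^T \lambda_u \cdot dW_u - \frac{1}{2}\int_0^T |\lambda_u|^2\,du \right).$$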
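In terms of the $\mathbb{Q}$-Brownian motion $B$, the discounted price process satisfies
$$d\tilde S_t = \mathrm{diag}[\tilde S_t]\, \sigma_t\,dB_t, \qquad t \in [0,T],$$
and the discounted wealth process induced by an initial capital $X_0$ and a portfolio strategy $\pi$ can be written as
$$\tilde X^\pi_t = \tilde X_0 + \int_0^t \tilde\pi_u \cdot \sigma_u\,dB_u, \qquad \text{for } 0 \le t \le T. \tag{7.12}$$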
Definition 7.11. An admissible portfolio process $\pi = \{\pi_t, t \in [0,T]\}$ is a measurable and $\mathbb{F}$-adapted process such that $\int_0^T |\sigma_t^{\mathrm{T}} \tilde\pi_t|^2\,dt < \infty$, a.s., and the corresponding discounted wealth process is bounded from below by a $\mathbb{Q}$-martingale:
$$\tilde X^\pi_t \ge -M^\pi_t, \quad 0 \le t \le T, \quad \text{for some } \mathbb{Q}\text{-martingale } M^\pi > 0.$$
The collection of all admissible portfolio processes will be denoted by $\mathcal{A}$.
The lower bound $-M^\pi$, which may depend on the portfolio $\pi$, has the interpretation of a finite credit line imposed on the investor. This natural generalization of the more usual constant credit line corresponds to the situation where the total credit available to an investor is indexed by some financial holding, such as the physical assets of the company or the personal home of the investor, used as collateral. From the mathematical viewpoint, this condition is needed in order to exclude any arbitrage opportunity, and will be justified in the subsequent subsection.
7.5.3 Admissible portfolios and no-arbitrage
We first define precisely the notion of no-arbitrage.
Definition 7.12. We say that the financial market contains no arbitrage opportunities if for any admissible portfolio process $\pi \in \mathcal{A}$,
$$X_0 = 0 \ \text{ and } \ X^\pi_T \ge 0 \ \ \mathbb{P}\text{-a.s.} \quad\text{implies}\quad X^\pi_T = 0 \ \ \mathbb{P}\text{-a.s.}$$
The purpose of this section is to show that the financial market described above contains no arbitrage opportunities. Our first observation is that, by the very definition of the probability measure $\mathbb{Q}$, the discounted price process $\tilde S$ satisfies:
$$\text{the process } \left\{ \tilde S_t, 0 \le t \le T \right\} \text{ is a } \mathbb{Q}\text{-local martingale.} \tag{7.13}$$
For this reason, $\mathbb{Q}$ is called a risk neutral measure, or an equivalent local martingale measure, for the price process $S$.
We also observe that the discounted wealth process satisfies:
$$\tilde X^\pi \text{ is a } \mathbb{Q}\text{-local martingale for every } \pi \in \mathcal{A}, \tag{7.14}$$
as a stochastic integral with respect to the $\mathbb{Q}$-Brownian motion $B$.
Theorem 7.13. The continuous-time financial market described above contains no arbitrage opportunities.
Proof. For $\pi \in \mathcal{A}$, the discounted wealth process $\tilde X^\pi$ is a $\mathbb{Q}$-local martingale bounded from below by a $\mathbb{Q}$-martingale. From Lemma 5.13 and Exercise 5.14, we deduce that $\tilde X^\pi$ is a $\mathbb{Q}$-super-martingale. Then $E^{\mathbb{Q}}\left[\tilde X^\pi_T\right] \le \tilde X_0 = X_0$. Recall that $\mathbb{Q}$ is equivalent to $\mathbb{P}$ and $S^0$ is strictly positive. Then, this inequality shows that, whenever $X^\pi_0 = 0$ and $X^\pi_T \ge 0$ $\mathbb{P}$-a.s. (or equivalently $\mathbb{Q}$-a.s.), we have $\tilde X^\pi_T = 0$ $\mathbb{Q}$-a.s. and therefore $X^\pi_T = 0$ $\mathbb{P}$-a.s.
7.5.4 Super-hedging and no-arbitrage bounds
Let $G$ be an $\mathcal{F}_T$-measurable random variable representing the payoff of a derivative security with given maturity $T > 0$. The super-hedging problem consists in finding the minimal initial cost so as to be able to face the payment $G$ without risk at the maturity of the contract $T$:
$$V(G) := \inf\left\{ X_0 \in \mathbb{R} : X^\pi_T \ge G \ \ \mathbb{P}\text{-a.s. for some } \pi \in \mathcal{A} \right\}.$$
Remark 7.14. Notice that $V(G)$ depends on the reference measure $\mathbb{P}$ only by means of the corresponding null sets. Therefore, the super-hedging problem is not changed if $\mathbb{P}$ is replaced by any equivalent probability measure.
The following properties of the super-hedging problem are easy to prove.
Proposition 7.15. The function $G \mapsto V(G)$ is
1. monotonically increasing, i.e. $V(G_1) \ge V(G_2)$ for every $G_1, G_2 \in L^0(T)$ with $G_1 \ge G_2$ $\mathbb{P}$-a.s.,
2. sublinear, i.e. $V(G_1 + G_2) \le V(G_1) + V(G_2)$ for $G_1, G_2 \in L^0(T)$,
3. positively homogeneous, i.e. $V(\lambda G) = \lambda V(G)$ for $\lambda > 0$ and $G \in L^0(T)$,
4. $V(0) = 0$ and $V(G) \ge -V(-G)$ for every contingent claim $G$.
5. Let $G$ be a contingent claim, and suppose that $X^{\pi^0}_T = G$ $\mathbb{P}$-a.s. for some $X_0$ and $\pi^0 \in \mathcal{A}$. Then
$$V(G) = \inf\left\{ X_0 \in \mathbb{R} : X^\pi_T = G \ \ \mathbb{P}\text{-a.s. for some } \pi \in \mathcal{A} \right\}.$$
Exercise 7.16. Prove Proposition 7.15.
We now show that, under the no-arbitrage condition, the super-hedging problem provides no-arbitrage bounds on the market price of the derivative security.
Assume that the buyer of the contingent claim $G$ has the same access to the financial market as the seller. Then $V(G)$ is the maximal amount that the buyer of the contingent claim contract is willing to pay. Indeed, if the seller requires a premium of $V(G) + 2\varepsilon$, for some $\varepsilon > 0$, then the buyer would not accept to pay this amount as he can obtain at least $G$ by trading on the financial market with initial capital $V(G) + \varepsilon$.
Now, since selling the contingent claim $G$ is the same as buying the contingent claim $-G$, we deduce from the previous argument that
$$-V(-G) \le \text{market price of } G \le V(G). \tag{7.15}$$
Observe that this defines a non-empty interval for the market price of $G$ under the no-arbitrage condition, by Proposition 7.15.
7.5.5 The no-arbitrage valuation formula
We denote by $p(G)$ the market price of a derivative security $G$.
Theorem 7.17. Let $G$ be an $\mathcal{F}_T$-measurable random variable representing the payoff of a derivative security at the maturity $T > 0$, and recall the notation $\tilde G := G \exp\left( -\int_0^T r_t\,dt \right)$. Assume that $E^{\mathbb{Q}}[|\tilde G|] < \infty$. Then
$$p(G) = V(G) = E^{\mathbb{Q}}[\tilde G].$$
Moreover, there exists a portfolio $\pi^* \in \mathcal{A}$ such that $X^{\pi^*}_0 = p(G)$ and $X^{\pi^*}_T = G$, a.s., that is $\pi^*$ is a perfect replication strategy.
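Proof. 1- We first prove that $V(G) \ge E^{\mathbb{Q}}[\tilde G]$. Let $X_0$ and $\pi \in \mathcal{A}$ be such that $X^\pi_T \ge G$, a.s. or, equivalently, $\tilde X^\pi_T \ge \tilde G$ a.s. Since $\tilde X^\pi$ is a $\mathbb{Q}$-local martingale bounded from below by a $\mathbb{Q}$-martingale, we deduce from Lemma 5.13 and Exercise 5.14 that $\tilde X^\pi$ is a $\mathbb{Q}$-super-martingale. Then $X_0 = \tilde X_0 \ge E^{\mathbb{Q}}[\tilde X^\pi_T] \ge E^{\mathbb{Q}}[\tilde G]$.
2- We next prove that $V(G) \le E^{\mathbb{Q}}[\tilde G]$. Define the $\mathbb{Q}$-martingale $Y_t := E^{\mathbb{Q}}[\tilde G | \mathcal{F}_t]$ and observe that $\mathbb{F}^W = \mathbb{F}^B$. Then, it follows from the martingale representation Theorem 7.3 that $Y_t = Y_0 + \int_0^t \phi_u \cdot dB_u$ for some $\phi \in \mathbb{H}^2_{loc}$. Setting $\tilde\pi^* := (\sigma^{\mathrm{T}})^{-1}\phi$, we see that
$$\pi^* \in \mathcal{A} \quad\text{and}\quad Y_0 + \int_0^T \tilde\pi^*_t \cdot \sigma_t\,dB_t = \tilde G \ \ \mathbb{P}\text{-a.s.},$$
which implies that $Y_0 \ge V(G)$ and $\pi^*$ is a perfect hedging strategy for $G$, starting from the initial capital $Y_0$.
3- From the previous steps, we have $V(G) = E^{\mathbb{Q}}[\tilde G]$. Applying this result to $-G$, we see that $V(-G) = -V(G)$, so that the no-arbitrage bounds (7.15) imply that the no-arbitrage market price of $G$ is given by $V(G)$.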
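The valuation formula $p(G) = E^{\mathbb{Q}}[\tilde G]$ can be illustrated in the simplest setting: one risky asset ($d = 1$) with constant coefficients $r$ and $\sigma$, and a European call payoff $G = (S_T - K)^+$. These are assumptions made for the example only; the function names and the numerical values are likewise illustrative. Under $\mathbb{Q}$ the discounted price is a martingale, so $S_T = S_0 \exp((r - \sigma^2/2)T + \sigma B_T)$, and the Monte Carlo estimate of the discounted expected payoff should agree with the classical Black-Scholes formula.

```python
import math
import random

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, sigma, T):
    """Black-Scholes price of a European call (constant r and sigma)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)

def risk_neutral_mc_call(s0, k, r, sigma, T, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E^Q[exp(-r T) (S_T - K)^+], sampling the
    Q-Brownian motion B_T ~ N(0, T); note that the drift mu plays no role."""
    rng = random.Random(seed)
    discount = math.exp(-r * T)
    total = 0.0
    for _ in range(n_paths):
        b_T = math.sqrt(T) * rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * b_T)
        total += discount * max(s_T - k, 0.0)
    return total / n_paths

# Illustrative parameters (assumptions, not taken from the notes)
CLOSED_FORM = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
MONTE_CARLO = risk_neutral_mc_call(100.0, 100.0, 0.05, 0.2, 1.0)
```

The two numbers differ only by the Monte Carlo error, which decreases like `n_paths ** -0.5`.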
Chapter 8
Stochastic differential
equations
8.1 First examples
In the previous sections, we have handled the geometric Brownian motion defined by
$$S_t := S_0 \exp\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right], \quad t \ge 0, \tag{8.1}$$
where $W$ is a scalar Brownian motion, and $S_0 > 0$, $\mu, \sigma \in \mathbb{R}$ are given constants. An immediate application of Itô's formula shows that $S$ satisfies the dynamics
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \quad t \ge 0. \tag{8.2}$$
This is our first example of stochastic differential equations since $S$ appears on both sides of the equation. Of course, the geometric Brownian motion (8.1) is a solution of the stochastic differential equation. A natural question is whether this solution is unique. In this simple model, the answer to this question is easy:
• Since $S_0 > 0$, and any solution $S$ of (8.2) has a.s. continuous sample paths, as a consequence of the continuity of the stochastic integral $t \mapsto \int_0^t \sigma S_s\,dW_s$, we see that $S$ hits $0$ before any negative real number.
• Let $\tau := \inf\{t : S_t = 0\}$ be the first hitting time of $0$, and set $L_t := \ln S_t$ for $t < \tau$; then, it follows from Itô's formula that
$$dL_t = \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dW_t, \quad t < \tau,$$
which leads uniquely to $L_t = L_0 + \left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t$, $t < \tau$, which corresponds exactly to the solution (8.1). In particular $\tau = +\infty$ a.s. and the above argument holds for any $t \ge 0$.
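As a numerical illustration of this uniqueness statement, the following sketch drives an Euler-Maruyama discretization of (8.2) and the closed form (8.1) with the same Brownian increments. The function name, parameter values and step count are illustrative assumptions, not part of the notes.

```python
import math
import random

def gbm_exact_and_euler(s0, mu, sigma, T, n_steps, seed=0):
    """Return (exact, euler) values of S_T built from the same Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = 0.0        # running value of the Brownian motion W_t
    s_euler = s0   # Euler-Maruyama approximation of the SDE (8.2)
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_euler += mu * s_euler * dt + sigma * s_euler * dw
        w += dw
    # Closed-form solution (8.1) evaluated on the same path of W
    s_exact = s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    return s_exact, s_euler

exact, euler = gbm_exact_and_euler(s0=100.0, mu=0.05, sigma=0.2, T=1.0, n_steps=10_000)
print(exact, euler)
```

On a fine grid the two values should agree closely, which is consistent with (8.1) being the only solution of (8.2).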
Our next example of stochastic differential equations is the so-called Ornstein-Uhlenbeck process, which is widely used for the modelling of the term structure of interest rates:
$$dX_t = k(m - X_t)\,dt + \sigma\,dW_t, \quad t \ge 0, \tag{8.3}$$
where $k, m, \sigma \in \mathbb{R}$ are given constants. From the modelling viewpoint, the motivation is essentially in the case $k > 0$, which induces the so-called mean reversion: when $X_t > m$, the drift points downwards, while for $X_t < m$ the drift coefficient pushes the solution upwards, hence the solution (if it exists!) is attracted to the mean level $m$. Again, the issue in (8.3) is that $X$ appears on both sides of the equation.
This example can also be handled explicitly by using the analogy with the deterministic case (corresponding to $\sigma = 0$) which suggests the change of variable $Y_t := e^{kt}X_t$. Then, it follows from Itô's formula that
$$dY_t = mk\,e^{kt}\,dt + \sigma e^{kt}\,dW_t, \quad t \ge 0,$$
and we obtain as a unique solution
$$Y_t = Y_0 + m\left(e^{kt} - 1\right) + \sigma\int_0^t e^{ks}\,dW_s, \quad t \ge 0,$$
or, back to $X$:
$$X_t = X_0 e^{-kt} + m\left(1 - e^{-kt}\right) + \sigma\int_0^t e^{-k(t-s)}\,dW_s, \quad t \ge 0.$$
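The explicit Ornstein-Uhlenbeck solution shows that, for a deterministic starting point, $X_t$ is Gaussian with mean $X_0e^{-kt} + m(1 - e^{-kt})$ and, by the Itô isometry, variance $\sigma^2(1 - e^{-2kt})/(2k)$ when $k \neq 0$. The sketch below samples $X_t$ exactly from this law; the function names and parameter values are illustrative assumptions.

```python
import math
import random

def ou_moments(x0, t, k, m, sigma):
    """Mean and variance of X_t given X_0 = x0 (requires k != 0)."""
    mean = x0 * math.exp(-k * t) + m * (1.0 - math.exp(-k * t))
    var = sigma**2 * (1.0 - math.exp(-2.0 * k * t)) / (2.0 * k)
    return mean, var

def ou_sample(x0, t, k, m, sigma, rng):
    """Draw one exact sample of X_t using the Gaussian law of the explicit solution."""
    mean, var = ou_moments(x0, t, k, m, sigma)
    return rng.gauss(mean, math.sqrt(var))

rng = random.Random(1)
samples = [ou_sample(2.0, 1.0, 1.5, 0.5, 0.3, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
theo_mean, theo_var = ou_moments(2.0, 1.0, 1.5, 0.5, 0.3)
long_run_mean, long_run_var = ou_moments(2.0, 50.0, 1.5, 0.5, 0.3)
print(sample_mean, theo_mean, long_run_mean, long_run_var)
```

For $k > 0$ and large $t$ the moments approach $m$ and $\sigma^2/(2k)$, which is the mean reversion described above.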
The above two examples are solved by a (lucky) specific change of variable. For more general stochastic differential equations, it is clear that, as in the deterministic framework, a systematic analysis of the existence and uniqueness issues is needed, without any access to a specific change of variable. In this section, we show that existence and uniqueness hold true under general Lipschitz conditions, which of course is reminiscent of the situation in the deterministic case. More will be obtained outside the Lipschitz world in the one-dimensional case.
Finally, let us observe that the above solutions $S$ and $X$ can be expressed starting from an initial condition at time $t$ as:
$$S_u = S_t \exp\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)(u - t) + \sigma(W_u - W_t)\right],$$
$$X_u = X_t e^{-k(u-t)} + m\left(1 - e^{-k(u-t)}\right) + \sigma\int_t^u e^{-k(u-s)}\,dW_s,$$
for all $u \ge t \ge 0$. In particular, we see that:
• the distribution of $S_u$ conditional on the past values $\{S_s, s \le t\}$ of $S$ up to time $t$ equals the distribution of $S_u$ conditional on the current value $S_t$ at time $t$; this is the so-called Markov property,
• let $S_u^{t,s}$ and $X_u^{t,x}$ be given by the above expressions with initial condition at time $t$ frozen to $S_t = s$ and $X_t = x$, respectively; then the random functions $s \mapsto S_u^{t,s}$ and $x \mapsto X_u^{t,x}$ are strictly increasing for all $u$; this is the so-called increase of the flow property.
These results, which are well known for deterministic differential equations, will be shown to hold true in the more general stochastic framework.
8.2 Strong solution of a stochastic differential equation
8.2.1 Existence and uniqueness
Given a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := \{\mathcal{F}_t\}_t, P)$ supporting a $d$-dimensional Brownian motion $W$, we consider the stochastic differential equation
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \quad t \in [0, T], \tag{8.4}$$
for some $T \in \mathbb{R}$. Here, $b$ and $\sigma$ are functions defined on $[0, T] \times \mathbb{R}^n$ taking values respectively in $\mathbb{R}^n$ and $\mathcal{M}_{\mathbb{R}}(n, d)$.
Definition 8.1. A strong solution of (8.4) is an $\mathbb{F}$-adapted process $X$ such that $\int_0^T \left(|b(t, X_t)| + |\sigma(t, X_t)|^2\right)dt < \infty$, a.s. and
$$X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad t \in [0, T].$$
Let us mention that there is a notion of weak solutions which relaxes some conditions from the above definition in order to allow for more general stochastic differential equations. Weak solutions, as opposed to strong solutions, are defined on some probabilistic structure (which becomes part of the solution), and not necessarily on $(\Omega, \mathcal{F}, \mathbb{F}, P, W)$. Thus, for a weak solution we search for a probability structure $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathbb{F}}, \tilde P, \tilde W)$ and a process $\tilde X$ such that the requirement of the above definition holds true. Obviously, any strong solution is a weak solution, but the opposite claim is false.
Clearly, one should not expect that the stochastic differential equation (8.4) has a unique solution without any condition on the coefficients $b$ and $\sigma$. In the deterministic case $\sigma \equiv 0$, (8.4) reduces to an ordinary differential equation for which existence and uniqueness requires Lipschitz conditions on $b$. The following is an example of non-uniqueness.
Exercise 8.2. Consider the stochastic differential equation:
$$dX_t = 3X_t^{1/3}\,dt + 3X_t^{2/3}\,dW_t$$
with initial condition $X_0 = 0$. Show that $X_t = W_t^3$ is a solution (in addition to the solution $X = 0$).
Our main existence and uniqueness result is the following.
Theorem 8.3. Let $X_0 \in L^2$ be a r.v. independent of $W$, and assume that the functions $|b(t, 0)|, |\sigma(t, 0)| \in L^2(\mathbb{R}_+)$, and that for some $K > 0$:
$$|b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| \le K|x - y| \quad \text{for all } t \in [0, T],\ x, y \in \mathbb{R}^n.$$
Then, for all $T > 0$, there exists a unique strong solution of (8.4) in $H^2$. Moreover,
$$E\left[\sup_{t \le T}|X_t|^2\right] \le C\left(1 + E|X_0|^2\right)e^{CT}, \tag{8.5}$$
for some constant $C = C(T, K)$ depending on $T$ and $K$.
Proof. We first establish the existence and uniqueness result, then we prove the estimate (8.5).
Step 1 For a constant $c > 0$, to be fixed later, we introduce the norm
$$\|\phi\|_{H^2_c} := E\left[\int_0^T e^{-ct}|\phi_t|^2\,dt\right]^{1/2} \quad \text{for every } \phi \in H^2.$$
Clearly $e^{-cT}\|\phi\|_{H^2} \le \|\phi\|_{H^2_c} \le \|\phi\|_{H^2}$. So the norm $\|.\|_{H^2_c}$ is equivalent to the standard norm $\|.\|_{H^2}$ on the Hilbert space $H^2$.
We define a map $U$ on $H^2([0, T] \times \Omega)$ by:
$$U(X)_t := X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad 0 \le t \le T.$$
This map is well defined, as the processes $\{b(t, X_t), \sigma(t, X_t), t \in [0, T]\}$ are immediately checked to be in $H^2$. In order to prove existence and uniqueness of a solution for (8.4), we shall prove that $U(X) \in H^2$ for all $X \in H^2$ and that $U$ is a contracting mapping with respect to the norm $\|.\|_{H^2_c}$ for a convenient choice of the constant $c > 0$.
1- We first prove that $U(X) \in H^2$ for all $X \in H^2$. To see this, we decompose:
$$\|U(X)\|^2_{H^2} \le 3T\|X_0\|^2_{L^2} + 3E\left[\int_0^T\left|\int_0^t b(s, X_s)\,ds\right|^2 dt\right] + 3E\left[\int_0^T\left|\int_0^t \sigma(s, X_s)\,dW_s\right|^2 dt\right].$$
By the Lipschitz-continuity of $b$ and $\sigma$ in $x$, uniformly in $t$, we have $|b(t, x)|^2 + |\sigma(t, x)|^2 \le K\left(1 + |b(t, 0)|^2 + |\sigma(t, 0)|^2 + |x|^2\right)$ for some constant $K$. We then estimate the second term by:
$$E\left[\int_0^T\left|\int_0^t b(s, X_s)\,ds\right|^2 dt\right] \le KT\,E\left[\int_0^T\left(1 + |b(s, 0)|^2 + |X_s|^2\right)ds\right] < \infty,$$
since $X \in H^2$, and $b(., 0) \in L^2([0, T])$.
As for the third term, we first use the Itô isometry:
$$E\left[\int_0^T\left|\int_0^t \sigma(s, X_s)\,dW_s\right|^2 dt\right] \le T\,E\left[\int_0^T|\sigma(s, X_s)|^2\,ds\right] \le TK\,E\left[\int_0^T\left(1 + |\sigma(s, 0)|^2 + |X_s|^2\right)ds\right] < \infty.$$
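2- We next show that $U$ is a contracting mapping for the norm $\|.\|_{H^2_c}$ for some convenient choice of $c > 0$. For $X, Y \in H^2$ with $X_0 = Y_0 = 0$, we have
$$E|U(X)_t - U(Y)_t|^2 \le 2E\left|\int_0^t \left(b(s, X_s) - b(s, Y_s)\right)ds\right|^2 + 2E\left|\int_0^t \left(\sigma(s, X_s) - \sigma(s, Y_s)\right)dW_s\right|^2$$
$$= 2E\left|\int_0^t \left(b(s, X_s) - b(s, Y_s)\right)ds\right|^2 + 2E\int_0^t |\sigma(s, X_s) - \sigma(s, Y_s)|^2\,ds$$
$$\le 2t\,E\int_0^t |b(s, X_s) - b(s, Y_s)|^2\,ds + 2E\int_0^t |\sigma(s, X_s) - \sigma(s, Y_s)|^2\,ds$$
$$\le 2(T + 1)K^2\int_0^t E|X_s - Y_s|^2\,ds,$$
where the second step is the Itô isometry and the third is the Cauchy-Schwarz inequality. Then,
$$\|U(X) - U(Y)\|^2_{H^2_c} \le 2K^2(T + 1)\int_0^T e^{-ct}\int_0^t E|X_s - Y_s|^2\,ds\,dt$$
$$= \frac{2K^2(T + 1)}{c}\int_0^T e^{-cs}E|X_s - Y_s|^2\left(1 - e^{-c(T - s)}\right)ds$$
$$\le \frac{2K^2(T + 1)}{c}\|X - Y\|^2_{H^2_c}.$$
Hence, $U$ is a contracting mapping for sufficiently large $c > 1$.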
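The fixed-point argument can be watched at work on a time grid. The sketch below is only a discretized analogue of the map $U$, under assumptions that are not in the notes: left-point Riemann and Itô sums on a uniform grid, one fixed simulated Brownian path, and hypothetical Lipschitz coefficients `b` and `sigma`. On such a grid the fixed point of the discretized map is the Euler-Maruyama path, so the iterates are compared with it.

```python
import math
import random

def picard_step(x, x0, b, sigma, dw, dt):
    """One application of the discretized map U to the grid path x."""
    new_path = [x0]
    drift_sum, noise_sum = 0.0, 0.0
    for i, inc in enumerate(dw):
        t = i * dt
        drift_sum += b(t, x[i]) * dt        # left-point sum for int b(s, X_s) ds
        noise_sum += sigma(t, x[i]) * inc   # left-point (Ito) sum for int sigma(s, X_s) dW_s
        new_path.append(x0 + drift_sum + noise_sum)
    return new_path

def euler_path(x0, b, sigma, dw, dt):
    """Euler-Maruyama path, i.e. the fixed point of the discretized map."""
    path = [x0]
    for i, inc in enumerate(dw):
        t = i * dt
        path.append(path[-1] + b(t, path[-1]) * dt + sigma(t, path[-1]) * inc)
    return path

# Hypothetical Lipschitz coefficients, chosen only for illustration
b = lambda t, x: 1.0 * (0.5 - x)
sigma = lambda t, x: 0.3 + 0.1 * math.sin(x)

rng = random.Random(3)
n_steps, T = 200, 1.0
dt = T / n_steps
dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

euler = euler_path(1.0, b, sigma, dw, dt)
x = [1.0] * (n_steps + 1)   # initial guess: the constant path
gaps = []
for _ in range(50):
    x = picard_step(x, 1.0, b, sigma, dw, dt)
    gaps.append(max(abs(u - v) for u, v in zip(x, euler)))
print(gaps[0], gaps[5], gaps[-1])
```

The sup-distance to the fixed point should shrink rapidly with the number of iterations, mirroring the contraction property established in the proof.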
Step 2 We next prove the estimate (8.5). We shall alleviate the notation writing $b_s := b(s, X_s)$ and $\sigma_s := \sigma(s, X_s)$. We directly estimate:
$$E\left[\sup_{u \le t}|X_u|^2\right] = E\left[\sup_{u \le t}\left|X_0 + \int_0^u b_s\,ds + \int_0^u \sigma_s\,dW_s\right|^2\right]$$
$$\le 3\left(E|X_0|^2 + t\,E\left[\int_0^t |b_s|^2\,ds\right] + E\left[\sup_{u \le t}\left|\int_0^u \sigma_s\,dW_s\right|^2\right]\right)$$
$$\le 3\left(E|X_0|^2 + t\,E\left[\int_0^t |b_s|^2\,ds\right] + 4E\left[\int_0^t |\sigma_s|^2\,ds\right]\right)$$
where we used Doob's maximal inequality of Proposition 3.15. Since $b$ and $\sigma$ are Lipschitz-continuous in $x$, uniformly in $t$, this provides:
$$E\left[\sup_{u \le t}|X_u|^2\right] \le C(K, T)\left(1 + E|X_0|^2 + \int_0^t E\left[\sup_{u \le s}|X_u|^2\right]ds\right)$$
and we conclude by using the Gronwall lemma.
The following exercise shows that the Lipschitz-continuity condition on the coefficients $b$ and $\sigma$ can be relaxed. Further relaxation of this assumption is possible in the one-dimensional case, see Section 8.3.
Exercise 8.4. In the context of this section, assume that the coefficients $b$ and $\sigma$ are locally Lipschitz with linear growth. By a localization argument, prove that strong existence and uniqueness holds for the stochastic differential equation (8.4).
8.2.2 The Markov property
Let $X^{t,x}_.$ denote the solution of the stochastic differential equation
$$X_s = x + \int_t^s b(u, X_u)\,du + \int_t^s \sigma(u, X_u)\,dW_u, \quad s \ge t.$$
The two following properties are obvious:
• Clearly, $X_s^{t,x} = F\left(t, x, s, (W_u - W_t)_{t \le u \le s}\right)$ for some deterministic function $F$.
• For $t \le u \le s$: $X_s^{t,x} = X_s^{u, X_u^{t,x}}$. This follows from the pathwise uniqueness, and holds also when $u$ is a stopping time.
With these observations, we have the following Markov property for the solutions of stochastic differential equations.
Proposition 8.5. (Markov property) For all $0 \le t \le s$:
$$E\left[\Phi(X_u, t \le u \le s) \mid \mathcal{F}_t\right] = E\left[\Phi(X_u, t \le u \le s) \mid X_t\right]$$
for all bounded functions $\Phi : C[t, s] \to \mathbb{R}$.
8.3 More results for scalar stochastic differential equations
We first start by proving a uniqueness result for scalar stochastic differential equations under weaker conditions than the general $n$-dimensional result of Theorem 8.3. For example, the following extension allows us to consider the so-called Cox-Ingersoll-Ross square root model for interest rates:
$$dr_t = k(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t,$$
or the Stochastic Volatility Constant Elasticity of Variance (SV-CEV) models which are widely used to account for the dynamics of implied volatility, see Chapter 10,
$$\frac{dS_t}{S_t} = \mu\,dt + S_t^\beta\sigma_t\,dW_t,$$
where the process $(\sigma_t)_{t \ge 0}$ is generated by another autonomous stochastic differential equation.
The question of existence will be skipped in these notes as it requires developing the theory of weak solutions of stochastic differential equations, which we would like to avoid. Let us just mention that the main result, from Yamada and Watanabe (1971), is that the existence of weak solutions for a stochastic differential equation for which strong uniqueness holds implies existence and uniqueness of a strong solution.
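Theorem 8.6. Let $\mu, \sigma : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ be two functions satisfying for all $t \in \mathbb{R}_+$ and $x, y \in \mathbb{R}$:
$$|\mu(t, x) - \mu(t, y)| \le K|x - y| \quad \text{and} \quad |\sigma(t, x) - \sigma(t, y)| \le h(|x - y|), \tag{8.6}$$
where $h : \mathbb{R}_+ \to \mathbb{R}_+$ is strictly increasing, $h(0) = 0$, and $\int_{(0, \varepsilon)} h(u)^{-2}\,du = \infty$ for all $\varepsilon > 0$. Then, there exists at most one strong solution for the stochastic differential equation
$$X_t = X_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad t \ge 0.$$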
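Proof. The conditions imposed on the function $h$ imply the existence of a strictly decreasing sequence $(a_n)_{n \ge 0} \subset (0, 1]$, with $a_0 = 1$, $a_n \to 0$, and $\int_{a_n}^{a_{n-1}} h(u)^{-2}\,du = n$ for all $n \ge 1$. Then, for all $n \ge 1$, there exists a continuous function $\rho_n : \mathbb{R} \to \mathbb{R}$ such that
$$\rho_n = 0 \text{ outside of } [a_n, a_{n-1}], \quad 0 \le \rho_n \le \frac{2}{n h^2} \quad \text{and} \quad \int_{a_n}^{a_{n-1}} \rho_n(x)\,dx = 1.$$
We now introduce the $C^2$ function $\psi_n(x) := \int_0^{|x|}\int_0^y \rho_n(u)\,du\,dy$, $x \in \mathbb{R}$, and we observe that
$$|\psi_n'| \le 1, \quad |\psi_n''| \le \frac{2}{n h^2}\mathbf{1}_{[a_n, a_{n-1}]}, \quad \text{and} \quad \psi_n(x) \to |x|,\ n \to \infty. \tag{8.7}$$
Let $X$ and $Y$ be two solutions of the above stochastic differential equation with $X_0 = Y_0$, define the stopping time
$$\tau_n := \inf\left\{t > 0 : \left(\int_0^t \sigma(s, X_s)^2\,ds\right) \vee \left(\int_0^t \sigma(s, Y_s)^2\,ds\right) \ge n\right\},$$
and set $\delta_t := X_t - Y_t$. Then, it follows from Itô's formula that:
$$\psi_n(\delta_{t \wedge \tau_n}) = \int_0^{t \wedge \tau_n} \psi_n'(\delta_s)\left(\mu(s, X_s) - \mu(s, Y_s)\right)ds + \frac{1}{2}\int_0^{t \wedge \tau_n} \psi_n''(\delta_s)\left(\sigma(s, X_s) - \sigma(s, Y_s)\right)^2 ds + \int_0^{t \wedge \tau_n} \psi_n'(\delta_s)\left(\sigma(s, X_s) - \sigma(s, Y_s)\right)dW_s. \tag{8.8}$$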
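Then, by the definition of $\tau_n$ and the boundedness of $\psi_n'$, the stochastic integral term has zero expectation. Then, using the Lipschitz-continuity of $\mu$ and the definition of the function $h$:
$$E\left[\psi_n(\delta_{t \wedge \tau_n})\right] \le K\,E\left[\int_0^{t \wedge \tau_n}|\delta_s|\,ds\right] + \frac{1}{2}E\left[\int_0^{t \wedge \tau_n}\psi_n''(\delta_s)h^2(|\delta_s|)\,ds\right]$$
$$\le K\,E\left[\int_0^t|\delta_s|\,ds\right] + \frac{1}{2}E\left[\int_0^t\psi_n''(\delta_s)h^2(|\delta_s|)\,ds\right]$$
$$\le K\int_0^t E[|\delta_s|]\,ds + \frac{t}{n}.$$
By sending $n \to \infty$, we see that $E[|\delta_t|] \le K\int_0^t E[|\delta_s|]\,ds$, $t \ge 0$, and we deduce from the Gronwall inequality that $\delta_t = 0$ for all $t \ge 0$.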
Remark 8.7. It is well-known that the deterministic differential equation $X_t = X_0 + \int_0^t b(s, X_s)\,ds$ has a unique solution for sufficiently small $t > 0$ when $b$ is locally Lipschitz in $x$ uniformly in $t$, and bounded on compact subsets of $\mathbb{R}_+ \times \mathbb{R}$. In the absence of these conditions, we may run into problems of existence and uniqueness. For example, for $\alpha \in (0, 1)$, the equation $X_t = \int_0^t |X_s|^\alpha\,ds$ has a continuum of solutions $X^\theta$, $\theta \ge 0$, defined by:
$$X_t^\theta := \left[(1 - \alpha)(t - \theta)\right]^{1/(1 - \alpha)}\mathbf{1}_{[\theta, \infty)}(t), \quad t \ge 0.$$
The situation in the case of stochastic differential equations is different, as the previous theorem 8.6 shows that strong uniqueness holds for the stochastic differential equation $X_t = \int_0^t |X_s|^\alpha\,dW_s$ when $\alpha \ge 1/2$, and therefore $X = 0$ is the unique strong solution.
We next use the methodology of proof of the previous theorem 8.6 to prove a monotonicity of the solution of a scalar stochastic differential equation with respect to the drift coefficient and the initial condition.
Proposition 8.8. Let $X$ and $Y$ be two $\mathbb{F}$-adapted processes with continuous sample paths, satisfying for $t \in \mathbb{R}_+$:
$$X_t = X_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s,$$
$$Y_t = Y_0 + \int_0^t \eta(s, Y_s)\,ds + \int_0^t \sigma(s, Y_s)\,dW_s,$$
for some continuous functions $\mu$, $\eta$, and $\sigma$. Assume further that either $\mu$ or $\eta$ is Lipschitz-continuous, and that $\sigma$ satisfies Condition (8.6) from theorem 8.6. Then
$$\mu(.) \le \eta(.) \text{ and } X_0 \le Y_0 \text{ a.s. implies that } X_. \le Y_. \text{ a.s.}$$
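Proof. Define $\varphi_n(x) := \psi_n(x)\mathbf{1}_{(0, \infty)}(x)$, where $\psi_n$ is as defined in the proof of Theorem 8.6. With $\delta_t := X_t - Y_t$, and $\tau_n := \inf\left\{t > 0 : \left(\int_0^t \sigma(s, X_s)^2\,ds\right) \vee \left(\int_0^t \sigma(s, Y_s)^2\,ds\right) \ge n\right\}$, we apply Itô's formula, and we deduce from the analogue of (8.8), with $\varphi_n$ replacing $\psi_n$, that
$$E\left[\varphi_n(\delta_{t \wedge \tau_n})\right] - \frac{t}{n} \le E\left[\int_0^{t \wedge \tau_n}\varphi_n'(\delta_s)\left(\mu(s, X_s) - \eta(s, Y_s)\right)ds\right].$$
Since $\varphi_n' \ge 0$ and $\mu \le \eta$, this provides
$$E\left[\varphi_n(\delta_{t \wedge \tau_n})\right] - \frac{t}{n} \le E\left[\int_0^{t \wedge \tau_n}\varphi_n'(\delta_s)\left(\mu(s, X_s) - \mu(s, Y_s)\right)ds\right], \tag{8.9}$$
$$E\left[\varphi_n(\delta_{t \wedge \tau_n})\right] - \frac{t}{n} \le E\left[\int_0^{t \wedge \tau_n}\varphi_n'(\delta_s)\left(\eta(s, X_s) - \eta(s, Y_s)\right)ds\right]. \tag{8.10}$$
The following inequality then follows from (8.9) if $\mu$ is Lipschitz-continuous, or (8.10) if $\eta$ is Lipschitz-continuous:
$$E\left[\varphi_n(\delta_{t \wedge \tau_n})\right] - \frac{t}{n} \le K\int_0^t E[\delta_s^+]\,ds.$$
By sending $n \to \infty$, this provides $E[\delta_t^+] \le K\int_0^t E[\delta_s^+]\,ds$, and we conclude by the Gronwall lemma that $E[\delta_t^+] = 0$ for all $t$. Hence, $\delta_t \le 0$, a.s. for all $t \ge 0$, and by the pathwise continuity of the process $\delta$, we deduce that $\delta_. \le 0$, a.s.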
We conclude this section by the following exercise which proves the existence of a solution for the so-called Cox-Ingersoll-Ross process (or, more exactly, the Feller process) under the condition of non-attainability of the origin.
Exercise 8.9. (Exam, December 2002) Given a scalar Brownian motion $W$, we consider the stochastic differential equation:
X
t
= x +
_
t
0
( 2X
s
)ds + 2
_
t
0
_
X
s
dW
s
, (8.11)
where , and the initial condition x are given positive parameters. Denote
T
a
:= inft 0 : X
t
= a.
1. Let u(x) :=
_
x
1
y
/2
e
y
dy. Show that u(X
t
) is an It o process and provide
its dynamics.
2. For 0 < x a, prove that E[u(X
tT

T
a
)] = u(x) for all t 0.
3. For 0 < x a, show that there exists a scalar > 0 such that the
function v(x) := e
x
2
satises
( 2x)v

(x) + 2xv

(x) 1 for x [, a].


Deduce that E[v(X
tT

T
a
)] v(x) +E[t T

T
a
] for all t 0, and then
E[T

T
a
] < .
4. Prove that E[u(X
T

T
a
)] = u(x), and deduce that
P[T
a
T

] =
u(x) u()
u(a) u()
.
5. Assume 0. Prove that P[T
a
T
0
] = 1 for all a x. Deduce that
P[T
0
= ] = 1 (you may use without proof that T
a
, a.s. when
a .
8.4 Linear stochastic differential equations
8.4.1 An explicit representation
In this paragraph, we focus on linear stochastic differential equations:
$$X_t = \xi + \int_0^t \left[A(s)X_s + a(s)\right]ds + \int_0^t \sigma(s)\,dW_s, \quad t \ge 0, \tag{8.12}$$
where $W$ is a $d$-dimensional Brownian motion, $\xi$ is a r.v. in $\mathbb{R}^n$ independent of $W$, and $A : \mathbb{R}_+ \to \mathcal{M}_{\mathbb{R}}(n, n)$, $a : \mathbb{R}_+ \to \mathbb{R}^n$, and $\sigma : \mathbb{R}_+ \to \mathcal{M}_{\mathbb{R}}(n, d)$ are deterministic Borel measurable functions with $A$ bounded, and $\int_0^T |a(s)|\,ds + \int_0^T |\sigma(s)|^2\,ds < \infty$ for all $T > 0$.
The existence and uniqueness of a strong solution for the above linear stochastic differential equation is a consequence of Theorem 8.3.
We next use the analogy with deterministic linear differential systems to provide a general representation of the solution of (8.12). Let $H : \mathbb{R}_+ \to \mathcal{M}_{\mathbb{R}}(n, n)$ be the unique solution of the linear ordinary differential equation:
$$H(t) = I_n + \int_0^t A(s)H(s)\,ds \quad \text{for } t \ge 0. \tag{8.13}$$
Observe that $H(t)$ is invertible for every $t \ge 0$, for otherwise there would exist a vector $\lambda \neq 0$ such that $H(t_0)\lambda = 0$ for some $t_0 > 0$; then since $x := H\lambda$ solves the ordinary differential equation $\dot x = A(t)x$ on $\mathbb{R}^n$, it follows that $x = 0$, contradicting the fact that $H(0) = I_n$.
Given the invertible matrix solution $H$ of the fundamental equation (8.13), it follows that the unique solution of the deterministic ordinary differential equation $x(t) = x(0) + \int_0^t [A(s)x(s) + a(s)]\,ds$ is given by
$$x(t) = H(t)\left(x(0) + \int_0^t H(s)^{-1}a(s)\,ds\right) \quad \text{for } t \ge 0.$$
Proposition 8.10. The unique solution of the linear stochastic differential equation (8.12) is given by
$$X_t := H(t)\left(\xi + \int_0^t H(s)^{-1}a(s)\,ds + \int_0^t H(s)^{-1}\sigma(s)\,dW_s\right) \quad \text{for } t \ge 0.$$
Proof. This is an immediate application of Itô's formula, which is left as an exercise.
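For readers who want to check their work, one possible route through this exercise is sketched below. The auxiliary process $Z$ is notation introduced here for convenience, not taken from the notes.

```latex
% Sketch: write X_t = H(t) Z_t with
%   Z_t := \xi + \int_0^t H(s)^{-1} a(s)\,ds + \int_0^t H(s)^{-1}\sigma(s)\,dW_s .
% H is deterministic with dH(t) = A(t)H(t)\,dt by (8.13), hence of finite variation,
% so the product rule carries no covariation term:
\begin{align*}
dX_t &= dH(t)\,Z_t + H(t)\,dZ_t \\
     &= A(t)H(t)Z_t\,dt + H(t)\left[H(t)^{-1}a(t)\,dt + H(t)^{-1}\sigma(t)\,dW_t\right] \\
     &= \left[A(t)X_t + a(t)\right]dt + \sigma(t)\,dW_t ,
\end{align*}
% with X_0 = H(0)\xi = \xi, which is exactly (8.12).
```

Uniqueness then follows from Theorem 8.3, so this $X$ is the solution.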
Exercise 8.11. In the above context, show that for all $s, t \in \mathbb{R}_+$:
$$E[X_t] = H(t)\left(E[X_0] + \int_0^t H(u)^{-1}a(u)\,du\right),$$
$$\mathrm{Cov}[X_t, X_s] = H(t)\left(\mathrm{Var}[X_0] + \int_0^{s \wedge t} H(u)^{-1}\sigma(u)\left[H(u)^{-1}\sigma(u)\right]^T du\right)H(s)^T.$$
8.4.2 The Brownian bridge
We now consider the one-dimensional linear stochastic differential equation:
$$X_t = a + \int_0^t \frac{b - X_s}{T - s}\,ds + W_t, \quad \text{for } t \in [0, T), \tag{8.14}$$
where $T > 0$ and $a, b \in \mathbb{R}$ are given. This equation can be solved by the method of the previous paragraph on each interval $[0, T - \varepsilon]$ for every $\varepsilon > 0$, with fundamental solution of the corresponding linear equation
$$H(t) = 1 - \frac{t}{T} \quad \text{for all } t \in [0, T). \tag{8.15}$$
This provides the natural unique solution on $[0, T)$:
$$X_t = a\left(1 - \frac{t}{T}\right) + b\,\frac{t}{T} + (T - t)\int_0^t \frac{dW_s}{T - s}, \quad \text{for } 0 \le t < T. \tag{8.16}$$
Proposition 8.12. Let $\{X_t, t \in [0, T]\}$ be the process defined by the unique solution (8.16) of (8.14) on $[0, T)$, and $X_T = b$. Then $X$ has a.s. continuous sample paths, and is a gaussian process with
$$E[X_t] = a\left(1 - \frac{t}{T}\right) + b\,\frac{t}{T}, \quad t \in [0, T],$$
$$\mathrm{Cov}(X_t, X_s) = (s \wedge t) - \frac{st}{T}, \quad s, t \in [0, T].$$
Proof. Consider the processes
$$M_t := \int_0^t \frac{dW_s}{T - s},\ t \in [0, T), \qquad B_u := M_{h^{-1}(u)},\ u \ge 0, \qquad \text{where } h(t) := \frac{1}{T - t} - \frac{1}{T}.$$
Then, $B_0 = 0$, $B$ has a.s. continuous sample paths, and has independent increments, and we directly see that for $0 \le s < t$, the distribution of the increment $B_t - B_s$ is gaussian with zero mean and variance
$$\mathrm{Var}[B_t - B_s] = \int_{h^{-1}(s)}^{h^{-1}(t)} \frac{du}{(T - u)^2} = \frac{1}{T - h^{-1}(t)} - \frac{1}{T - h^{-1}(s)} = t - s.$$
This shows that $B$ is a Brownian motion, and therefore
$$\lim_{t \nearrow T}(T - t)M_t = \lim_{u \to \infty}\frac{B_u}{u + T^{-1}} = 0, \quad \text{a.s.}$$
by the law of large numbers for the Brownian motion. Hence $X_t \to b$ a.s. when $t \nearrow T$. The expressions of the mean and the variance are obtained by direct calculation.
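The pinning at $b$ and the variance formula $t(T - t)/T$ can be illustrated by simulation. The sketch below applies a plain Euler scheme to (8.14); the scheme, the grid size and the sample size are assumptions made for illustration, and the numbers it produces are approximations only.

```python
import math
import random

def bridge_path(a, b, T, n_steps, rng):
    """Euler scheme for dX_t = (b - X_t)/(T - t) dt + dW_t started at X_0 = a."""
    dt = T / n_steps
    x = [a]
    for i in range(n_steps):
        t = i * dt   # left endpoint, so T - t >= dt > 0 on the grid
        dw = rng.gauss(0.0, math.sqrt(dt))
        x.append(x[-1] + (b - x[-1]) / (T - t) * dt + dw)
    return x

rng = random.Random(7)
a, b, T, n = 0.0, 1.0, 1.0, 400
paths = [bridge_path(a, b, T, n, rng) for _ in range(500)]

mid = [p[n // 2] for p in paths]   # values at t = T/2
mid_mean = sum(mid) / len(mid)
mid_var = sum((v - mid_mean) ** 2 for v in mid) / (len(mid) - 1)
end_gap = max(abs(p[-1] - b) for p in paths)
print(mid_mean, mid_var, end_gap)
```

At $t = T/2$ the sample mean and variance should be close to $(a + b)/2$ and $T/4$, and every simulated path should end near $b$, up to one Brownian increment of size $\sqrt{T/n}$.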
8.5 Connection with linear partial differential equations
8.5.1 Generator
Let $\{X_s^{t,x}, s \ge t\}$ be the unique strong solution of
$$X_s^{t,x} = x + \int_t^s \mu(u, X_u^{t,x})\,du + \int_t^s \sigma(u, X_u^{t,x})\,dW_u, \quad s \ge t,$$
where $\mu$ and $\sigma$ satisfy the required condition for existence and uniqueness of a strong solution.
For a function $f : \mathbb{R}^n \to \mathbb{R}$, we define the function $\mathcal{A}f$ by
$$\mathcal{A}f(t, x) = \lim_{h \to 0}\frac{E[f(X_{t+h}^{t,x})] - f(x)}{h} \quad \text{if the limit exists.}$$
Clearly, $\mathcal{A}f$ is well-defined for all bounded $C^2$ functions with bounded derivatives and
$$\mathcal{A}f = \mu \cdot \frac{\partial f}{\partial x} + \frac{1}{2}\mathrm{Tr}\left[\sigma\sigma^T\frac{\partial^2 f}{\partial x\,\partial x^T}\right], \tag{8.17}$$
(Exercise!). The linear differential operator $\mathcal{A}$ is called the generator of $X$. It turns out that the process $X$ can be completely characterized by its generator or, more precisely, by the generator and the corresponding domain of definition.
As the following result shows, the generator provides an intimate connection between conditional expectations and linear partial differential equations.
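Proposition 8.13. Assume that the function $(t, x) \mapsto v(t, x) := E\left[g(X_T^{t,x})\right]$ is $C^{1,2}([0, T) \times \mathbb{R}^n)$. Then $v$ solves the partial differential equation:
$$\frac{\partial v}{\partial t} + \mathcal{A}v = 0 \quad \text{and} \quad v(T, .) = g.$$
Proof. Given $(t, x)$, let $\tau_1 := T \wedge \inf\{s > t : |X_s^{t,x} - x| \ge 1\}$. By the law of iterated expectation, it follows that
$$v(t, x) = E\left[v\left(s \wedge \tau_1, X_{s \wedge \tau_1}^{t,x}\right)\right].$$
Since $v \in C^{1,2}([0, T), \mathbb{R}^n)$, we may apply Itô's formula, and we obtain by taking expectations:
$$0 = E\left[\int_t^{s \wedge \tau_1}\left(\frac{\partial v}{\partial t} + \mathcal{A}v\right)(u, X_u^{t,x})\,du\right] + E\left[\int_t^{s \wedge \tau_1}\frac{\partial v}{\partial x}(u, X_u^{t,x}) \cdot \sigma(u, X_u^{t,x})\,dW_u\right]$$
$$= E\left[\int_t^{s \wedge \tau_1}\left(\frac{\partial v}{\partial t} + \mathcal{A}v\right)(u, X_u^{t,x})\,du\right],$$
where the last equality follows from the boundedness of $(u, X_u^{t,x})$ on $[t, s \wedge \tau_1]$. We now divide by $s - t$ and send $s \searrow t$, and the required result follows from the dominated convergence theorem.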
8.5.2 Cauchy problem and the Feynman-Kac representation
In this section, we consider the following linear partial differential equation
$$\frac{\partial v}{\partial t} + \mathcal{A}v - k(t, x)v + f(t, x) = 0,\ (t, x) \in [0, T) \times \mathbb{R}^d, \qquad v(T, .) = g, \tag{8.18}$$
where $\mathcal{A}$ is the generator (8.17), $g$ is a given function from $\mathbb{R}^d$ to $\mathbb{R}$, $k$ and $f$ are functions from $[0, T] \times \mathbb{R}^d$ to $\mathbb{R}$, and $\mu$ and $\sigma$ are functions from $[0, T] \times \mathbb{R}^d$ to $\mathbb{R}^d$ and $\mathcal{M}_{\mathbb{R}}(d, d)$, respectively. This is the so-called Cauchy problem.
For example, when $k = f \equiv 0$, $\mu \equiv 0$, and $\sigma$ is the identity matrix, the above partial differential equation reduces to the heat equation.
Our objective is to provide a representation of this purely deterministic problem by means of stochastic differential equations. We then assume that $\mu$ and $\sigma$ satisfy the conditions of Theorem 8.3, namely that
$$\mu, \sigma \text{ Lipschitz in } x \text{ uniformly in } t, \qquad \int_0^T\left(|\mu(t, 0)|^2 + |\sigma(t, 0)|^2\right)dt < \infty. \tag{8.19}$$
Theorem 8.14. Let the coefficients $\mu, \sigma$ be continuous and satisfy (8.19). Assume further that the function $k$ is uniformly bounded from below, and $f$ has quadratic growth in $x$ uniformly in $t$. Let $v$ be a $C^{1,2}\left([0, T), \mathbb{R}^d\right)$ solution of (8.18) with quadratic growth in $x$ uniformly in $t$. Then
$$v(t, x) = E\left[\int_t^T \beta_s^{t,x}f(s, X_s^{t,x})\,ds + \beta_T^{t,x}\,g\left(X_T^{t,x}\right)\right], \quad t \le T,\ x \in \mathbb{R}^d,$$
where $X_s^{t,x} := x + \int_t^s \mu(u, X_u^{t,x})\,du + \int_t^s \sigma(u, X_u^{t,x})\,dW_u$ and $\beta_s^{t,x} := e^{-\int_t^s k(u, X_u^{t,x})\,du}$ for $t \le s \le T$.
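Proof. We first introduce the sequence of stopping times
$$\tau_n := T \wedge \inf\left\{s > t : \left|X_s^{t,x} - x\right| \ge n\right\},$$
and we observe that $\tau_n \to T$ $P$-a.s. Since $v$ is smooth, it follows from Itô's formula that for $t \le s < T$:
$$d\left(\beta_s^{t,x}v\left(s, X_s^{t,x}\right)\right) = \beta_s^{t,x}\left(-kv + \frac{\partial v}{\partial t} + \mathcal{A}v\right)\left(s, X_s^{t,x}\right)ds + \beta_s^{t,x}\frac{\partial v}{\partial x}\left(s, X_s^{t,x}\right) \cdot \sigma\left(s, X_s^{t,x}\right)dW_s$$
$$= \beta_s^{t,x}\left(-f(s, X_s^{t,x})\,ds + \frac{\partial v}{\partial x}\left(s, X_s^{t,x}\right) \cdot \sigma\left(s, X_s^{t,x}\right)dW_s\right),$$
by the PDE satisfied by $v$ in (8.18). Then:
$$E\left[\beta_{\tau_n}^{t,x}v\left(\tau_n, X_{\tau_n}^{t,x}\right)\right] - v(t, x) = E\left[\int_t^{\tau_n}\beta_s^{t,x}\left(-f(s, X_s^{t,x})\,ds + \frac{\partial v}{\partial x}\left(s, X_s^{t,x}\right) \cdot \sigma\left(s, X_s^{t,x}\right)dW_s\right)\right].$$
Now observe that the integrand in the stochastic integral is bounded by definition of the stopping time $\tau_n$, the smoothness of $v$, and the continuity of $\sigma$. Then the stochastic integral has zero mean, and we deduce that
$$v(t, x) = E\left[\int_t^{\tau_n}\beta_s^{t,x}f\left(s, X_s^{t,x}\right)ds + \beta_{\tau_n}^{t,x}v\left(\tau_n, X_{\tau_n}^{t,x}\right)\right]. \tag{8.20}$$
Since $\tau_n \to T$ and the Brownian motion has continuous sample paths $P$-a.s., it follows from the continuity of $v$ that, $P$-a.s.
$$\int_t^{\tau_n}\beta_s^{t,x}f\left(s, X_s^{t,x}\right)ds + \beta_{\tau_n}^{t,x}v\left(\tau_n, X_{\tau_n}^{t,x}\right) \xrightarrow[n \to \infty]{} \int_t^T\beta_s^{t,x}f\left(s, X_s^{t,x}\right)ds + \beta_T^{t,x}v\left(T, X_T^{t,x}\right) = \int_t^T\beta_s^{t,x}f\left(s, X_s^{t,x}\right)ds + \beta_T^{t,x}g\left(X_T^{t,x}\right) \tag{8.21}$$
by the terminal condition satisfied by $v$ in (8.18). Moreover, since $k$ is bounded from below and the functions $f$ and $v$ have quadratic growth in $x$ uniformly in $t$, we have
$$\left|\int_t^{\tau_n}\beta_s^{t,x}f\left(s, X_s^{t,x}\right)ds + \beta_{\tau_n}^{t,x}v\left(\tau_n, X_{\tau_n}^{t,x}\right)\right| \le C\left(1 + \max_{t \le s \le T}\left|X_s^{t,x}\right|^2\right).$$
By the estimate stated in the existence and uniqueness theorem 8.3, the latter bound is integrable, and we deduce from the dominated convergence theorem that the convergence in (8.21) holds in $L^1(P)$, proving the required result by taking limits in (8.20).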
The above Feynman-Kac representation formula has an important numerical implication. Indeed it opens the door to the use of Monte Carlo methods in order to obtain a numerical approximation of the solution of the partial differential equation (8.18). For the sake of simplicity, we provide the main idea in the case $f = k = 0$. Let $\left(X^{(1)}, \ldots, X^{(k)}\right)$ be an iid sample drawn in the distribution of $X_T^{t,x}$, and compute the mean:
$$\hat v_k(t, x) := \frac{1}{k}\sum_{i=1}^k g\left(X^{(i)}\right).$$
By the Law of Large Numbers, it follows that $\hat v_k(t, x) \to v(t, x)$ $P$-a.s. Moreover the error estimate is provided by the Central Limit Theorem:
$$\sqrt{k}\left(\hat v_k(t, x) - v(t, x)\right) \xrightarrow[k \to \infty]{} \mathcal{N}\left(0, \mathrm{Var}\left[g\left(X_T^{t,x}\right)\right]\right) \quad \text{in distribution},$$
and is remarkably independent of the dimension $d$ of the variable $X$!
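A minimal sketch of this Monte Carlo recipe, in the heat-equation case mentioned earlier ($d = 1$, zero drift, unit diffusion, $f = k = 0$), where $X_T^{t,x} = x + W_T - W_t$ can be sampled exactly. The terminal function $g(y) = y^2$, for which $v(t, x) = x^2 + (T - t)$, and all names below are illustrative assumptions.

```python
import math
import random

def mc_heat(g, t, x, T, n_samples, rng):
    """Estimate v(t, x) = E[g(x + W_T - W_t)] with a CLT-based 95% half-width."""
    tau = T - t
    vals = [g(x + math.sqrt(tau) * rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    mean = sum(vals) / n_samples
    var = sum((v - mean) ** 2 for v in vals) / (n_samples - 1)
    half_width = 1.96 * math.sqrt(var / n_samples)   # asymptotic, from the CLT
    return mean, half_width

rng = random.Random(42)
estimate, half_width = mc_heat(lambda y: y * y, t=0.0, x=1.0, T=1.0,
                               n_samples=50_000, rng=rng)
exact = 1.0**2 + (1.0 - 0.0)   # closed form x^2 + (T - t) for g(y) = y^2
print(estimate, half_width, exact)
```

The half-width shrinks like $1/\sqrt{k}$ whatever the dimension, which is the point made above; in higher dimensions only the cost of drawing each sample changes.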
8.5.3 Representation of the Dirichlet problem
Let $D$ be an open subset of $\mathbb{R}^d$. The Dirichlet problem is to find a function $u$ solving:
$$\mathcal{A}u - ku + f = 0 \text{ on } D \quad \text{and} \quad u = g \text{ on } \partial D, \tag{8.22}$$
where $\partial D$ denotes the boundary of $D$, and $\mathcal{A}$ is the generator of the process $X^{0,X_0}$ defined as the unique strong solution of the stochastic differential equation
$$X_t^{0,X_0} = X_0 + \int_0^t \mu(s, X_s^{0,X_0})\,ds + \int_0^t \sigma(s, X_s^{0,X_0})\,dW_s, \quad t \ge 0.$$
Similarly to the representation result of the Cauchy problem obtained in Theorem 8.14, we have the following representation result for the Dirichlet problem.
Theorem 8.15. Let $u$ be a $C^2$ solution of the Dirichlet problem (8.22). Assume that $k$ is bounded from below, and
$$E[\tau_D^x] < \infty,\ x \in \mathbb{R}^d, \quad \text{where } \tau_D^x := \inf\left\{t \ge 0 : X_t^{0,x} \notin D\right\}.$$
Then, we have the representation:
$$u(x) = E\left[g\left(X_{\tau_D^x}^{0,x}\right)e^{-\int_0^{\tau_D^x}k(X_s^{0,x})\,ds} + \int_0^{\tau_D^x}f\left(X_t^{0,x}\right)e^{-\int_0^t k(X_s^{0,x})\,ds}\,dt\right].$$
Exercise 8.16. Provide a proof of Theorem 8.15 by imitating the arguments in the proof of Theorem 8.14.
8.6 The hedging portfolio in a Markov financial market
In this paragraph, we return to the context of Section 7.5, and we assume further that the Itô process $S$ is defined by a stochastic differential equation, i.e.
$$\mu_t = \mu(t, S_t), \quad \sigma_t = \sigma(t, S_t),$$
and the interest rate process $r_t = r(t, S_t)$. In particular, the risk premium process is also a deterministic function of the form $\lambda_t = \lambda(t, S_t)$, and the dynamics of the process $S$ under the risk-neutral measure $Q$ is given by:
$$dS_t = \mathrm{diag}[S_t]\left(r(t, S_t)\,dt + \sigma(t, S_t)\,dB_t\right),$$
where we recall that $B$ is a $Q$-Brownian motion. We assume that these coefficients are subject to all required conditions so that existence and uniqueness of the Itô processes, together with the conditions of Section 7.5, are satisfied.
Finally, we assume that the derivative security is defined by a Vanilla contract, i.e. $G = g(S_T)$ for some function $g$ with quadratic growth.
Then, from Theorem 7.17, the no-arbitrage market price of the derivative security $g(S_T)$ is given by
$$p(G) = V(0, S_0) := E^Q\left[e^{-\int_0^T r_u\,du}g(S_T)\right].$$
Moreover, a careful inspection of the proof shows that the perfect replicating strategy $\theta^*$ is obtained by means of the martingale representation of the martingale $Y_t := E^Q[\tilde G \mid \mathcal{F}_t]$. In order to identify the optimal portfolio, we introduce the derivative security's price at each time $t \in [0, T]$:
$$V(t, S_t) = E^Q\left[e^{-\int_t^T r_u\,du}g(S_T)\,\Big|\,\mathcal{F}_t\right] = E^Q\left[e^{-\int_t^T r_u\,du}g(S_T)\,\Big|\,S_t\right], \quad t \in [0, T],$$
and we observe that $Y_t = e^{-\int_0^t r_u\,du}V(t, S_t)$, $t \in [0, T]$.
Proposition 8.17. In the above context, assume that the function $(t, s) \mapsto V(t, s)$ is $C^{1,2}\left([0, T), (0, \infty)^d\right)$. Then the perfect replicating strategy of the derivative security $G = g(S_T)$ is given by
$$\theta_t^* = \mathrm{diag}[S_t]\frac{\partial V}{\partial s}(t, S_t), \quad t \in [0, T).$$
In other words the perfect replicating strategy requires that the investor holds a hedging portfolio consisting of
$$\Delta_t^i := \frac{\partial V}{\partial s^i}(t, S_t) \text{ shares of } S^i \text{ at each time } t \in [0, T).$$
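To make the statement concrete, the sketch below takes the one-dimensional Black-Scholes case as an assumption (constant $r$ and $\sigma$, European call payoff), where the price function $V$ has the classical closed form, and compares the number of shares $\partial V/\partial s$ obtained by a central finite difference with the known value $N(d_1)$. Function names and parameter values are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, strike, r, sigma, tau):
    """Black-Scholes price of a European call with time to maturity tau > 0."""
    d1 = (math.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)

def bs_delta(s, strike, r, sigma, tau):
    """Closed-form dV/ds = N(d1) for the call."""
    d1 = (math.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)

def fd_delta(s, strike, r, sigma, tau, eps=1e-4):
    """Central finite-difference approximation of dV/ds."""
    up = bs_call(s + eps, strike, r, sigma, tau)
    down = bs_call(s - eps, strike, r, sigma, tau)
    return (up - down) / (2.0 * eps)

delta_closed = bs_delta(100.0, 100.0, 0.02, 0.2, 1.0)
delta_fd = fd_delta(100.0, 100.0, 0.02, 0.2, 1.0)
print(delta_closed, delta_fd)
```

The finite-difference route is what one would fall back on when $V$ is only available numerically, for instance from a Monte Carlo or PDE solver.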
Proof. From the discussion preceding the statement of the proposition, the perfect hedging strategy is obtained from the martingale representation of the process $Y_t = e^{-\int_0^t r_u\,du}V(t, S_t)$, $t \in [0, T]$. Since $V$ has the required regularity for the application of Itô's formula, we obtain:
$$dY_t = e^{-\int_0^t r_u\,du}\left(\ldots dt + \frac{\partial V}{\partial s}(t, S_t) \cdot dS_t\right) = \frac{\partial V}{\partial s}(t, S_t) \cdot d\tilde S_t,$$
where, in the last equality, the "$dt$" coefficient is determined from the fact that $\tilde S$ and $Y$ are $Q$-martingales. The expression of $\theta^*$ is then easily obtained by identifying the latter expression with that of a portfolio value process.
8.7 Application to importance sampling
Importance sampling is a popular variance reduction technique in Monte Carlo
simulation. In this section, we recall the basic features of this technique in the
simple context of simulating a random variable. Then, we show how it can be
extended to stochastic differential equations.
8.7.1 Importance sampling for random variables
Let $X$ be a square integrable r.v. on $\mathbb{R}^n$. We assume that its distribution is absolutely continuous with respect to the Lebesgue measure in $\mathbb{R}^n$ with probability density function $f_X$. Our task is to provide an approximation of
$$\theta := E[X]$$
by Monte Carlo simulation. To do this, we assume that independent copies $(X_i)_{i \ge 1}$ of the r.v. $X$ are available. Then, from the law of large numbers, we have:
$$\hat\theta_N := \frac{1}{N}\sum_{i=1}^N X_i \to \theta \quad P\text{-a.s.}$$
Moreover, the approximation error is given by the central limit theorem:
$$\sqrt{N}\left(\hat\theta_N - \theta\right) \to \mathcal{N}(0, \mathrm{Var}[X_1]) \quad \text{in distribution.}$$
We call the approximation $\hat\theta_N$ the naive Monte Carlo estimator of $\theta$, in the sense that it is the most natural. Indeed, one can devise many other Monte Carlo estimators as follows. Let $Y$ be any other r.v. absolutely continuous with respect to the Lebesgue measure with density $f_Y$ satisfying the support restriction
$$f_Y > 0 \text{ on } \{f_X > 0\}.$$
Then, one can re-write $\theta$ as:
$$\theta = E\left[\frac{f_X(Y)}{f_Y(Y)}Y\right].$$
Then, assuming that independent copies $(Y_i)_{i \ge 1}$ of the r.v. $Y$ are available, this suggests an alternative Monte Carlo estimator:
$$\hat\theta_N(Y) := \frac{1}{N}\sum_{i=1}^N \frac{f_X(Y_i)}{f_Y(Y_i)}Y_i.$$
By the law of large numbers and the central limit theorem, we have:
$$\hat\theta_N(Y) \to \theta, \quad \text{a.s.}$$
and
$$\sqrt{N}\left(\hat\theta_N(Y) - \theta\right) \to \mathcal{N}\left(0, \mathrm{Var}\left[\frac{f_X(Y)}{f_Y(Y)}Y\right]\right) \quad \text{in distribution.}$$
Hence, for every choice of a probability density function $f_Y$ satisfying the above support restriction, one may build a corresponding Monte Carlo estimator $\hat\theta_N(Y)$ which is consistent, but differs from the naive Monte Carlo estimator by the asymptotic variance of the error. It is then natural to wonder whether one can find an optimal density in the sense of minimization of the asymptotic variance of the error:
$$\min_{f_Y}\mathrm{Var}\left[\frac{f_X(Y)}{f_Y(Y)}Y\right].$$
This minimization problem turns out to be very easy to solve. Indeed, since $E\left[\frac{f_X(Y)}{f_Y(Y)}Y\right] = E[X]$ and $E\left[\frac{f_X(Y)}{f_Y(Y)}|Y|\right] = E[|X|]$ do not depend on $f_Y$, we have the equivalence between the following minimization problems:
$$\min_{f_Y}\mathrm{Var}\left[\frac{f_X(Y)}{f_Y(Y)}Y\right] \equiv \min_{f_Y}E\left[\frac{f_X(Y)^2}{f_Y(Y)^2}Y^2\right] \equiv \min_{f_Y}\mathrm{Var}\left[\frac{f_X(Y)}{f_Y(Y)}|Y|\right],$$
and the solution of the latter problem is given by
$$f_Y^*(y) := \frac{|y|f_X(y)}{E[|X|]}.$$
Moreover, when $X \ge 0$ a.s. the minimum variance is zero! This means that, by simulating one single copy $Y_1$ according to the optimal density $f_Y^*$, one can calculate the required expected value $\theta$ without error!
Of course, this cannot be feasible in practice, and the problem here is that the calculation of the optimal probability density function $f_Y^*$ involves the computation of the unknown expectation $\theta$.
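The zero-variance phenomenon can be seen on a toy case where the optimal density happens to be known. Assume, purely for illustration, that $X$ is exponential with rate 1, so that $\theta = 1$, $f_X(y) = e^{-y}$ and $f_Y^*(y) = y e^{-y}$, which is the Gamma(2, 1) density. The sketch compares the naive estimator with the importance sampling terms drawn under $f_Y^*$.

```python
import math
import random

def naive_estimate(n, rng):
    """Naive Monte Carlo estimator of E[X] for X ~ Exp(1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

def importance_terms(n, rng):
    """Terms f_X(Y) / f_Y(Y) * Y with Y drawn from the optimal density y * exp(-y)."""
    terms = []
    for _ in range(n):
        y = rng.gammavariate(2.0, 1.0)   # shape 2, scale 1: density y * exp(-y)
        f_x = math.exp(-y)
        f_y = y * math.exp(-y)
        terms.append(f_x / f_y * y)
    return terms

rng = random.Random(11)
naive = naive_estimate(20_000, rng)
terms = importance_terms(1_000, rng)
spread = max(terms) - min(terms)
print(naive, terms[0], spread)
```

Every importance sampling term equals $\theta$ up to rounding, so a single draw already gives the answer, but only because $f_Y^*$ was normalized using the value of $\theta$ itself, which is exactly the circularity pointed out above.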
However, this minimization is useful, and can be used as follows:
- Start from an initial (poor) estimation of $\theta$, from the naive Monte Carlo estimator for instance. Deduce an estimator $\hat f_Y^*$ of the optimal probability density $f_Y^*$. Simulate independent copies $(Y_i)_{i \ge 1}$ according to $\hat f_Y^*$, and compute a second stage Monte Carlo estimator.
- Another application is to perform a Hastings-Metropolis algorithm and take advantage of the property that the normalizing factor $E[|X|]$ is not needed... (MAP 432).
8.7.2 Importance sampling for stochastic differential equations
We aim at approximating
u(0, x) := E
_
g
_
X
0,x
T
__
where X
0,x
.
is the unique strong solution of
X
0,x
t
= x +
_
t
0

_
t, X
0,x
t
_
dt +
_
t
0

_
t, X
0,x
t
_
dW
t
, t 0.
Let $\mathbb{Q} := Z_T \cdot \mathbb{P}$ be a probability measure equivalent to $\mathbb{P}$ on $\mathcal{F}_T$, and assume that one can produce independent copies $\left( \hat X^i_T, \hat Z^i_T \right)$ of the r.v. $\left( X^{0,x}_T, Z_T \right)$ (in practice, one can only generate a discrete-time approximation...).
Each choice of density $Z$ suggests a Monte Carlo approximation:
\[ u^Z_N(0,x) := \frac{1}{N} \sum_{i=1}^N \frac{1}{\hat Z^i_T}\, g\left( \hat X^i_T \right) \longrightarrow u(0,x) \quad \mathbb{P}\text{-a.s.} \]
and the central limit theorem says that an optimal choice of $Z$ consists in minimizing the asymptotic variance
\[ \min_{Z_T} \mathrm{Var}\left[ \frac{1}{Z_T}\, g\left( X^{0,x}_T \right) \right]. \]
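To do this, we restrict our attention to those densities defined by
\[ Z^h_0 = 1 \quad\text{and}\quad dZ^h_t = Z^h_t\, h_t \cdot dW_t, \quad t \in [0,T], \]
for some $h_t = h(t, X_t)$ satisfying
\[ \int_0^T |h_t|^2\, dt < \infty \quad \mathbb{P}\text{-a.s.} \quad\text{and}\quad \mathbb{E}\left[ Z^h_T \right] = Z^h_0 = 1. \]
We denote by $H$ the collection of all such processes.
Under the probability measure $\mathbb{Q}^h := Z^h_T \cdot \mathbb{P}$ on $\mathcal{F}_T$, it follows from the Girsanov theorem that
\[ \left( W^h_t = W_t - \int_0^t h_u\, du, \quad 0 \le t \le T \right) \]
is a $\mathbb{Q}^h$-Brownian motion.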
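The dynamics of $X$ and $M^h := \left( Z^h \right)^{-1}$ under $\mathbb{Q}^h$ are given by
\[ dX_t = \left[ \mu(X_t) + \sigma(X_t) h_t \right] dt + \sigma(X_t)\, dW^h_t, \]
\[ dM^h_t = -M^h_t\, h_t \cdot dW^h_t. \]
We now solve the minimization problem
\[ V_0 := \min_{h \in H} \mathrm{Var}\left[ M^h_T\, g\left( X^{0,x}_T \right) \right]. \]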
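The subsequent calculation uses the fact that the function $u(t,x) := \mathbb{E}^{\mathbb{Q}^h}\left[ g\left( X^{t,x}_T \right) \right]$ solves the partial differential equation
\[ \frac{\partial u}{\partial t}(t,x) + \mathcal{L}^h_t u(t,x) = 0 \]
by Proposition 8.13, where $\mathcal{L}^h = \mathcal{L}^0 + (\sigma h) \cdot \partial_x$ is the generator of $X$ under $\mathbb{Q}^h$, and $\mathcal{L}^0$ is the generator of $X$ under the original probability measure $\mathbb{P}$.
Applying Itô's formula, we see that
\[ M^h_T\, u(T, X_T) = u(0,x) + \int_0^T M^h_t \left( \sigma^{\mathrm{T}} \frac{\partial u}{\partial x} - u h \right)(t, X_t) \cdot dW_t \]
and we immediately see that if $h^*_t := \sigma^{\mathrm{T}} \left( \partial \ln u / \partial x \right)(t, X_t)$ is in $H$, then $V_0 = 0$.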
Similarly to the case of random variables, this result can be used either
to devise a two-stage Monte Carlo method, or to combine with a Hastings-
Metropolis algorithm.
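The change of drift can be illustrated on a deliberately simple case. The following sketch assumes $\mu = 0$, a constant $\sigma$, the payoff $g(x) = (x - K)^+$ with a strike far from the starting point, and a constant (hence suboptimal) shift $h$; none of these choices comes from the text, and the function names are hypothetical. Sampling is done under $\mathbb{Q}^h$, where $W^h$ is a Brownian motion, and each draw is weighted by $M^h_T = 1/Z^h_T$.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shifted_drift_estimate(x0, sigma, strike, maturity, h, n_samples, seed=0):
    """Estimate E[(X_T - strike)^+] for X_T = x0 + sigma W_T by sampling under Q^h.

    Returns the estimate and its estimated standard error. h = 0 is naive Monte Carlo.
    """
    rng = random.Random(seed)
    total, total_sq = 0.0, 0.0
    for _ in range(n_samples):
        # Under Q^h, W^h_T is N(0, T) and W_T = W^h_T + h T.
        w = math.sqrt(maturity) * rng.gauss(0.0, 1.0) + h * maturity
        # M^h_T = 1 / Z^h_T with Z^h_T = exp(h W_T - h^2 T / 2).
        weight = math.exp(-h * w + 0.5 * h * h * maturity)
        term = weight * max(x0 + sigma * w - strike, 0.0)
        total += term
        total_sq += term * term
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)

def exact_value(x0, sigma, strike, maturity):
    """Closed form of E[(x0 + sigma W_T - strike)^+] for a Gaussian X_T."""
    s = sigma * math.sqrt(maturity)
    d = (x0 - strike) / s
    return (x0 - strike) * normal_cdf(d) + s * math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

print(exact_value(0.0, 1.0, 3.0, 1.0))
print(shifted_drift_estimate(0.0, 1.0, 3.0, 1.0, h=0.0, n_samples=20_000))  # naive
print(shifted_drift_estimate(0.0, 1.0, 3.0, 1.0, h=3.0, n_samples=20_000))  # shifted drift
```

Shifting the Brownian motion towards the region where the payoff is non-zero reduces the standard error by more than an order of magnitude in this example, even though a constant $h$ is not the optimal $h^*$.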
Chapter 9
The Black-Scholes model
and its extensions
In the previous chapters, we have seen the Black-Scholes formula proved from three different approaches: continuous-time limit of the Cox-Ross-Rubinstein binomial model, verification from the solution of a partial differential equation, and the elegant martingale approach. However, none of these approaches was originally used by Black and Scholes. The first section of this chapter presents the original intuitive argument contained in the seminal paper by Black and Scholes. The next section reviews the Black-Scholes formula and shows various extensions which are needed in the everyday practice of the model within the financial industry. The final section provides some calculations for barrier options.
9.1 The Black-Scholes approach for the Black-
Scholes formula
In this section, we derive a formal argument in order to obtain the valuation
PDE (6.10) from Chapter 6. The following steps have been employed by Black
and Scholes in their pioneering work [6].
1. Let $p(t, S_t)$ denote the time-$t$ market price of a contingent claim defined by the payoff $B = g(S_T)$ for some function $g : \mathbb{R}_+ \longrightarrow \mathbb{R}$. Notice that we are accepting without proof that $p$ is a deterministic function of time and the spot price; this has in fact been proved in the previous section.
2. The holder of the contingent claim completes his portfolio by some investment in the risky assets. At time $t$, he decides to hold $-\Delta^i$ shares of the risky asset $S^i$. Therefore, the total value of the portfolio at time $t$ is
\[ P_t := p(t, S_t) - \Delta \cdot S_t, \quad 0 \le t < T. \]
120 CHAPTER 9. MARTINGALE APPROACH TO BLACK-SCHOLES
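3. Considering delta as a constant vector in the time interval $[t, t+dt)$, and assuming that the function $p$ is of class $C^{1,2}$, the variation of the portfolio value is given by:
\[ dP_t = \mathcal{L}p(t, S_t)\, dt + \frac{\partial p}{\partial s}(t, S_t) \cdot dS_t - \Delta \cdot dS_t, \]
where $\mathcal{L}p = \frac{\partial p}{\partial t} + \frac{1}{2} \mathrm{Tr}\left[ \mathrm{diag}[s]\, \sigma\sigma^{\mathrm{T}}\, \mathrm{diag}[s]\, \frac{\partial^2 p}{\partial s\, \partial s^{\mathrm{T}}} \right]$. In particular, by setting
\[ \Delta = \frac{\partial p}{\partial s}, \]
we obtain a portfolio value with finite quadratic variation
\[ dP_t = \mathcal{L}p(t, S_t)\, dt. \tag{9.1} \]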
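4. The portfolio $P_t$ is non-risky since the variation of its value in the time interval $[t, t+dt)$ is known in advance at time $t$. Then, by the no-arbitrage argument, we must have
\[ dP_t = r(t, S_t) P_t\, dt = r(t, S_t) \left[ p(t, S_t) - \Delta \cdot S_t \right] dt = r(t, S_t) \left[ p(t, S_t) - \frac{\partial p}{\partial s}(t, S_t) \cdot S_t \right] dt. \tag{9.2} \]
By equating (9.1) and (9.2), we see that the function $p$ satisfies the PDE
\[ \frac{\partial p}{\partial t} + r s \cdot \frac{\partial p}{\partial s} + \frac{1}{2} \mathrm{Tr}\left[ \mathrm{diag}[s]\, \sigma\sigma^{\mathrm{T}}\, \mathrm{diag}[s]\, \frac{\partial^2 p}{\partial s\, \partial s^{\mathrm{T}}} \right] - r p = 0, \]
which is exactly the PDE obtained in the previous section.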
9.2 The Black and Scholes model for European
call options
9.2.1 The Black-Scholes formula
In this section, we consider the one-dimensional Black-Scholes model $d = 1$ so that the price process $S$ of the single risky asset is given in terms of the $\mathbb{Q}$-Brownian motion $B$:
\[ S_t = S_0 \exp\left( \left( r - \frac{\sigma^2}{2} \right) t + \sigma B_t \right), \quad 0 \le t \le T. \tag{9.3} \]
Observe that the random variable $S_t$ is log-normal for every fixed $t$. This is the key ingredient for the next explicit result.
Proposition 9.1. Let $G = (S_T - K)^+$ for some $K > 0$. Then the no-arbitrage price of the contingent claim $G$ is given by the so-called Black-Scholes formula:
\[ p_0(G) = S_0\, N\left( d_+(S_0, \tilde K, \sigma^2 T) \right) - \tilde K\, N\left( d_-(S_0, \tilde K, \sigma^2 T) \right), \tag{9.4} \]
where
\[ \tilde K := K e^{-rT}, \quad d_\pm(s, k, v) := \frac{\ln(s/k)}{\sqrt{v}} \pm \frac{1}{2} \sqrt{v}, \tag{9.5} \]
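and the optimal hedging strategy consists in holding, at each time $t$, the following amount in the risky asset:
\[ S_t\, N\left( d_+(S_t, \tilde K, \sigma^2 (T - t)) \right), \quad 0 \le t \le T. \tag{9.6} \]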
Proof. This formula can be derived by various methods. One can just calculate directly the expected value by exploiting the explicit probability density function of the random variable $S_T$. One can also guess a solution for the valuation PDE corresponding to the call option. We shall present another method which relies on the technique of change of measure and reduces considerably the computational effort. We first decompose
\[ p_0(G) = \mathbb{E}^{\mathbb{Q}}\left[ \tilde S_T\, 1_{\{\tilde S_T \ge \tilde K\}} \right] - \tilde K\, \mathbb{Q}\left[ \tilde S_T \ge \tilde K \right] \tag{9.7} \]
where, as usual, the tilde notation corresponds to discounting, i.e. multiplication by $e^{-rT}$ in the present context.
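1. The second term is directly computed by exploiting the knowledge of the distribution of $\tilde S_T$:
\[ \begin{aligned} \mathbb{Q}\left[ \tilde S_T \ge \tilde K \right] &= \mathbb{Q}\left[ \frac{\ln(\tilde S_T / S_0) + (\sigma^2/2) T}{\sigma \sqrt{T}} \ge \frac{\ln(\tilde K / S_0) + (\sigma^2/2) T}{\sigma \sqrt{T}} \right] \\ &= 1 - N\left( \frac{\ln(\tilde K / S_0) + (\sigma^2/2) T}{\sigma \sqrt{T}} \right) = N\left( d_-(S_0, \tilde K, \sigma^2 T) \right). \end{aligned} \]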
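2. As for the first expected value, we define the new measure $\mathbb{P}^1 := Z^1_T \cdot \mathbb{Q}$ on $\mathcal{F}_T$, where
\[ Z^1_T := \exp\left( \sigma B_T - \frac{\sigma^2}{2} T \right) = \frac{\tilde S_T}{S_0}. \]
By the Girsanov theorem, the process $W^1_t := B_t - \sigma t$, $0 \le t \le T$, defines a Brownian motion under $\mathbb{P}^1$, and the random variable
\[ \frac{\ln(\tilde S_T / S_0) - (\sigma^2/2) T}{\sigma \sqrt{T}} \]
is distributed as $\mathcal{N}(0, 1)$ under $\mathbb{P}^1$.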
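We now re-write the first term in (9.7) as
\[ \begin{aligned} \mathbb{E}^{\mathbb{Q}}\left[ \tilde S_T\, 1_{\{\tilde S_T \ge \tilde K\}} \right] &= S_0\, \mathbb{P}^1\left[ \tilde S_T \ge \tilde K \right] \\ &= S_0\, \mathrm{Prob}\left[ \mathcal{N}(0,1) \ge \frac{\ln(\tilde K / S_0) - (\sigma^2/2) T}{\sigma \sqrt{T}} \right] \\ &= S_0\, N\left( d_+(S_0, \tilde K, \sigma^2 T) \right). \end{aligned} \]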
3. The optimal hedging strategy is obtained by directly differentiating the price formula with respect to the underlying risky asset price, see Proposition 8.17.


Figure 9.1: The Black-Scholes formula as a function of S and t
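As a numerical illustration of Proposition 9.1, the sketch below evaluates formula (9.4) with the functions $d_\pm$ of (9.5) and compares the result with a plain Monte Carlo average of the discounted payoff simulated from (9.3). The function names and the parameter values are illustrative assumptions.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function N."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus(s, k, v):
    return math.log(s / k) / math.sqrt(v) + 0.5 * math.sqrt(v)

def d_minus(s, k, v):
    return math.log(s / k) / math.sqrt(v) - 0.5 * math.sqrt(v)

def black_scholes_call(s0, strike, r, sigma, maturity):
    """Formula (9.4) with the discounted strike K~ = K exp(-rT) and v = sigma^2 T."""
    k_tilde = strike * math.exp(-r * maturity)
    v = sigma ** 2 * maturity
    return s0 * normal_cdf(d_plus(s0, k_tilde, v)) - k_tilde * normal_cdf(d_minus(s0, k_tilde, v))

def monte_carlo_call(s0, strike, r, sigma, maturity, n_samples=100_000, seed=0):
    """Average of exp(-rT) (S_T - K)^+ with S_T simulated from (9.3)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        b_T = math.sqrt(maturity) * rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * maturity + sigma * b_T)
        total += max(s_T - strike, 0.0)
    return math.exp(-r * maturity) * total / n_samples

print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))  # about 10.4506
print(monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0))
```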
Exercise 9.2 (Black-Scholes model with time-dependent coefficients). Consider the case where the interest rate is a deterministic function $r(t)$, and the risky asset price process is defined by the time dependent coefficients $b(t)$ and $\sigma(t)$. Show that the European call option price is given by the extended Black-Scholes formula:
\[ p_0(G) = S_0\, N\left( d_+(S_0, \tilde K, v(T)) \right) - \tilde K\, N\left( d_-(S_0, \tilde K, v(T)) \right) \tag{9.8} \]
where
\[ \tilde K := K e^{-\int_0^T r(t)\, dt}, \quad v(T) := \int_0^T \sigma^2(t)\, dt. \tag{9.9} \]
What is the optimal hedging strategy?
9.2.2 The Black's formula
We again assume that the financial market contains one single risky asset with price process defined by the constant coefficients Black-Scholes model. Let $\{F_t, t \ge 0\}$ be the price process of the forward contract on the risky asset with maturity $T' > 0$. Since the interest rates are deterministic, we have
\[ F_t = S_t\, e^{r(T' - t)} = F_0\, e^{-\frac{1}{2} \sigma^2 t + \sigma B_t}, \quad 0 \le t \le T. \]
In particular, we observe that the process $\{F_t, t \in [0, T']\}$ is a martingale under the risk neutral measure $\mathbb{Q}$. As we shall see in the next chapter, this property is specific to the case of deterministic interest rates, and the corresponding result in a stochastic interest rates framework requires the introduction of the so-called forward neutral measure.
We now consider the European call option on the forward contract $F$ with maturity $T \in (0, T']$ and strike price $K > 0$. The corresponding payoff at the maturity $T$ is $G := (F_T - K)^+$. By the previous theory, its price at time zero is given by
\[ p_0(G) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} (F_T - K)^+ \right]. \]
In order to compute explicitly the above expectation, we shall take advantage of the previous computations, and we observe that $e^{rT} p_0(G)$ corresponds to the Black-Scholes formula for a zero interest rate. Hence:
\[ p_0(G) = e^{-rT} \left[ F_0\, N\left( d_+(F_0, K, \sigma^2 T) \right) - K\, N\left( d_-(F_0, K, \sigma^2 T) \right) \right]. \tag{9.10} \]
This is the so-called Black's formula.
9.2.3 Option on a dividend paying stock
When the risky asset $S$ pays out some dividend, the previous theory requires some modifications. We shall first consider the case where the risky asset pays a lump sum of dividend at some pre-specified dates, assuming that the process $S$ is defined by the Black-Scholes dynamics between two successive dates of dividend payment. This implies a downward jump of the price process upon the payment of the dividend. We next consider the case where the risky asset pays a continuous dividend defined by some constant rate. The latter case can be viewed as a model simplification for a risky asset composed of a basket of a large number of dividend paying assets.
Lump payment of dividends Consider a European call option with maturity $T > 0$, and suppose that the underlying security pays out a lump of dividend at the pre-specified dates $t_1, \ldots, t_n \in (0, T)$. At each time $t_j$, $j = 1, \ldots, n$, the amount of dividend payment is
\[ \delta_j\, S_{t_j-} \]
where $\delta_1, \ldots, \delta_n \in (0, 1)$ are some given constants. In other words, the dividends are defined as known fractions of the security price at the pre-specified dividend payment dates. After the dividend payment, the security price jumps down immediately by the amount of the dividend:
\[ S_{t_j} = (1 - \delta_j)\, S_{t_j-}, \quad j = 1, \ldots, n. \]
Between two successive dates of dividend payment, we are reduced to the previous situation where the asset pays no dividend. Therefore, the discounted security price process must be a martingale under the risk neutral measure $\mathbb{Q}$, i.e. in terms of the Brownian motion $B$, we have
\[ S_t = S_{t_{j-1}}\, e^{\left( r - \frac{\sigma^2}{2} \right)(t - t_{j-1}) + \sigma (B_t - B_{t_{j-1}})}, \quad t \in [t_{j-1}, t_j), \]
for $j = 1, \ldots, n$ with $t_0 := 0$. Hence
\[ S_T = \hat S_0\, e^{\left( r - \frac{\sigma^2}{2} \right) T + \sigma B_T} \quad\text{where}\quad \hat S_0 := S_0 \prod_{j=1}^n (1 - \delta_j), \]
and the no-arbitrage European call option price is given by
\[ \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} (S_T - K)^+ \right] = \hat S_0\, N\left( d_+(\hat S_0, \tilde K, \sigma^2 T) \right) - \tilde K\, N\left( d_-(\hat S_0, \tilde K, \sigma^2 T) \right), \]
with $\tilde K = K e^{-rT}$, i.e. the Black-Scholes formula with modified spot price from $S_0$ to $\hat S_0$.
Continuous dividend payment We now suppose that the underlying security pays a continuous stream of dividend $\{\delta S_t, t \ge 0\}$ for some given constant rate $\delta > 0$. This requires adapting the no-arbitrage condition so as to account for the dividend payment. From the financial viewpoint, the holder of the security can immediately re-invest the dividend paid in cash into the asset at any time $t \ge 0$. By doing so, the position of the security holder at time $t$ is
\[ S^{(\delta)}_t := S_t\, e^{\delta t}, \quad t \ge 0. \]
In other words, we can reduce the problem to the non-dividend paying security case by increasing the value of the security. By the no-arbitrage theory, the discounted process $\left\{ e^{-rt} S^{(\delta)}_t, t \ge 0 \right\}$ must be a martingale under the risk neutral measure $\mathbb{Q}$:
\[ S^{(\delta)}_t = S^{(\delta)}_0\, e^{\left( r - \frac{\sigma^2}{2} \right) t + \sigma B_t} = S_0\, e^{\left( r - \frac{\sigma^2}{2} \right) t + \sigma B_t}, \quad t \ge 0, \]
where $B$ is a Brownian motion under $\mathbb{Q}$. By a direct application of Itô's formula, this provides the expression of the security price process in terms of the Brownian motion $B$:
\[ S_t = S_0\, e^{\left( r - \delta - \frac{\sigma^2}{2} \right) t + \sigma B_t}, \quad t \ge 0. \tag{9.11} \]
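We are now in a position to provide the call option price in closed form:
\[ \begin{aligned} \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} (S_T - K)^+ \right] &= e^{-\delta T}\, \mathbb{E}^{\mathbb{Q}}\left[ e^{-(r - \delta) T} (S_T - K)^+ \right] \\ &= e^{-\delta T} \left[ S_0\, N\left( d_+(S_0, \tilde K^{(\delta)}, \sigma^2 T) \right) - \tilde K^{(\delta)}\, N\left( d_-(S_0, \tilde K^{(\delta)}, \sigma^2 T) \right) \right], \end{aligned} \tag{9.12} \]
where
\[ \tilde K^{(\delta)} := K e^{-(r - \delta) T}. \]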
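The two dividend adjustments can be sketched side by side. The helper names below are hypothetical; the first function applies the Black-Scholes formula to the reduced spot $S_0 \prod_j (1 - \delta_j)$, and the second implements (9.12). As a consistency check, a continuous rate $\delta$ over $[0, T]$ gives the same price as one lump dividend of fraction $1 - e^{-\delta T}$, since both reduce the effective spot to $S_0 e^{-\delta T}$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function N."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_from_spot(s, k_tilde, v):
    """s N(d_+(s, k, v)) - k N(d_-(s, k, v)) with d_+/- as in (9.5)."""
    root = math.sqrt(v)
    d_plus = math.log(s / k_tilde) / root + 0.5 * root
    return s * normal_cdf(d_plus) - k_tilde * normal_cdf(d_plus - root)

def call_lump_dividends(s0, strike, r, sigma, maturity, fractions):
    """Call price when fractions delta_j of the price are paid at dates before T."""
    s_hat = s0
    for delta_j in fractions:
        s_hat *= 1.0 - delta_j  # modified spot price
    return call_from_spot(s_hat, strike * math.exp(-r * maturity), sigma ** 2 * maturity)

def call_continuous_dividend(s0, strike, r, sigma, maturity, delta):
    """Formula (9.12) with K~^(delta) = K exp(-(r - delta) T)."""
    k_delta = strike * math.exp(-(r - delta) * maturity)
    return math.exp(-delta * maturity) * call_from_spot(s0, k_delta, sigma ** 2 * maturity)

continuous = call_continuous_dividend(100.0, 100.0, 0.05, 0.2, 1.0, delta=0.03)
lump = call_lump_dividends(100.0, 100.0, 0.05, 0.2, 1.0, [1.0 - math.exp(-0.03)])
print(continuous, lump)  # the two prices coincide
```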
9.2.4 The Garman-Kohlhagen model for exchange rate
options
We now consider a domestic country and a foreign country with different currencies. The instantaneous interest rates prevailing in the domestic country and the foreign one are assumed to be constant, and will be denoted respectively by $r^d$ and $r^f$.
The exchange rate from the domestic currency to the foreign one is denoted by $\mathcal{E}^d_t$ at every time $t \ge 0$. This is the price at time $t$, expressed in the domestic currency, of one unit of the foreign currency. For instance, if the domestic currency is the Euro, and the foreign currency is the Dollar, then $\mathcal{E}^d_t$ is the time-$t$ value in Euros of one Dollar; this is the Euro/Dollar exchange rate.
Similarly, one can introduce the exchange rate from the foreign currency to the domestic one $\mathcal{E}^f_t$. Assuming that all exchange rates are positive, and that the international financial market has no frictions, it follows from a simple no-arbitrage argument that
\[ \mathcal{E}^f_t = \frac{1}{\mathcal{E}^d_t} \quad\text{for every } t \ge 0. \tag{9.13} \]
We postulate that the exchange rate process $\mathcal{E}^d$ is defined by the Black-Scholes model
\[ \mathcal{E}^d_t = \mathcal{E}^d_0\, e^{\left( \mu^d - \frac{|\sigma^d|^2}{2} \right) t + \sigma^d W_t}, \quad t \ge 0. \tag{9.14} \]
Our objective is to derive the no-arbitrage price of the exchange rate call option
\[ G^d := \left( \mathcal{E}^d_T - K \right)^+, \]
for some $K > 0$, where the payoff $G^d$ is expressed in domestic currency.
To do this, we will apply our results from the no-arbitrage valuation theory. We will first identify a risky asset in the domestic country which will isolate a unique risk neutral measure $\mathbb{P}^d$, so that the no-arbitrage price of the contingent claim $G^d$ will be easily obtained once the distribution of $\mathcal{E}^d$ under $\mathbb{P}^d$ is determined.
1. In order to relate the exchange rate to a financial asset of the domestic country, consider the following strategy, for an investor of the domestic country, consisting of investing in the non-risky asset of the foreign country. Let the initial capital at time $t$ be $P_t := 1$ Euro or, after immediate conversion, $\left( \mathcal{E}^d_t \right)^{-1}$ Dollars. Investing this amount in the foreign country non-risky asset, the investor collects the amount $\left( \mathcal{E}^d_t \right)^{-1} (1 + r^f dt)$ Dollars after a small time period $dt$. Finally, converting back this amount to the domestic currency provides the amount in Euros
\[ P_{t+dt} = \frac{\mathcal{E}^d_{t+dt}\, (1 + r^f dt)}{\mathcal{E}^d_t} \quad\text{and therefore}\quad dP_t = \frac{d\mathcal{E}^d_t}{\mathcal{E}^d_t} + r^f dt. \]
2. Given the expression (9.14) of the exchange rate, it follows from a direct application of Itô's formula that
\[ dP_t = \left( \mu^d + r^f \right) dt + \sigma^d dW_t = r^d dt + \sigma^d dW^d_t \]
where
\[ W^d_t := W_t + \lambda^d t, \quad t \ge 0, \quad\text{where}\quad \lambda^d := \frac{\mu^d + r^f - r^d}{\sigma^d}. \]
Since $P_t$ is the value of a portfolio of the domestic country, the unique risk neutral measure $\mathbb{P}^d$ is identified by the property that the process $W^d$ is a Brownian motion under $\mathbb{P}^d$, which provides by the Girsanov theorem:
\[ \frac{d\mathbb{P}^d}{d\mathbb{P}} = e^{-\lambda^d W_T - \frac{1}{2} |\lambda^d|^2 T} \quad\text{on } \mathcal{F}_T. \]
3. We can now rewrite the expression of the exchange rate (9.14) in terms of the Brownian motion $W^d$ of the risk neutral measure $\mathbb{P}^d$:
\[ \mathcal{E}^d_t = \mathcal{E}^d_0\, e^{\left( r^d - r^f - \frac{|\sigma^d|^2}{2} \right) t + \sigma^d W^d_t}, \quad t \ge 0. \]
Comparing with (9.11), we obtain the following
Interpretation The exchange rate $\mathcal{E}^d$ is equivalent to an asset of the domestic currency with continuous dividend payment at the rate $r^f$.
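4. We can now compute the call option price by directly applying (9.12):
\[ \mathbb{E}^{\mathbb{P}^d}\left[ e^{-r^d T} G^d \right] = e^{-r^f T} \left[ \mathcal{E}^d_0\, N\left( d_+(\mathcal{E}^d_0, \tilde K^{(r^f)}, |\sigma^d|^2 T) \right) - \tilde K^{(r^f)}\, N\left( d_-(\mathcal{E}^d_0, \tilde K^{(r^f)}, |\sigma^d|^2 T) \right) \right], \]
where
\[ \tilde K^{(r^f)} := K e^{-(r^d - r^f) T}. \]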
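5. Of course the previous analysis may be performed symmetrically from the point of view of the foreign country. By (9.13) and (9.14), we have:
\[ \mathcal{E}^f_t = \mathcal{E}^f_0\, e^{\left( \mu^f - \frac{|\sigma^f|^2}{2} \right) t + \sigma^f W_t}, \quad t \ge 0, \]
where
\[ \mu^f := -\mu^d + |\sigma^d|^2 \quad\text{and}\quad \sigma^f := -\sigma^d. \tag{9.15} \]
The foreign financial market risk neutral measure together with the corresponding Brownian motion are defined by
\[ \frac{d\mathbb{P}^f}{d\mathbb{P}} = e^{-\lambda^f W_T - \frac{1}{2} |\lambda^f|^2 T} \quad\text{on } \mathcal{F}_T \quad\text{and}\quad W^f_t := W_t + \lambda^f t, \]
where
\[ \lambda^f := \frac{\mu^f + r^d - r^f}{\sigma^f} = \lambda^d - \sigma^d \]
by (9.15).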
9.2.5 The practice of the Black-Scholes model
The Black-Scholes model is used all over the industry of derivative securities. Its practical implementation requires the determination of the coefficients involved in the Black-Scholes formula. As we already observed, the drift parameter $\mu$ is not needed for pricing and hedging purposes. This surprising feature is easily understood by the fact that the perfect replication procedure involves the underlying probability measure only through its zero-measure sets, and therefore the problem is not changed by passage to any equivalent probability measure; by the Girsanov theorem this means that the problem is not modified by changing the drift in the dynamics of the risky asset.
Since the interest rate is observed, only the volatility parameter $\sigma$ needs to be determined in order to implement the Black-Scholes formula. After discussing this important issue, we will focus on the different control variables which are carefully scrutinized by derivatives traders in order to account for the departure of real life financial markets from the simple Black-Scholes model.
Volatility: statistical estimation versus calibration
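1. According to the Black-Scholes model, given the observation of the risky asset prices at times $t_i := ih$, $i = 1, \ldots, n$ for some time step $h > 0$, the returns
\[ R_{t_i} := \ln\left( \frac{S_{t_i}}{S_{t_{i-1}}} \right) \quad\text{are iid distributed as}\quad \mathcal{N}\left( \left( \mu - \frac{\sigma^2}{2} \right) h,\; \sigma^2 h \right). \]
Then the sample variance
\[ \hat\sigma^2_n := \frac{1}{n} \sum_{i=1}^n \left( R_{t_i} - \bar R_n \right)^2, \quad\text{where}\quad \bar R_n := \frac{1}{n} \sum_{i=1}^n R_{t_i}, \]
is the maximum likelihood estimator of the variance $\sigma^2 h$ of the returns (that is, of $\sigma^2$ when time is measured in units of the step $h$). The estimator $\hat\sigma_n$ is called the historical volatility parameter.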
The natural way to implement the Black-Scholes model is to plug the historical volatility into the Black-Scholes formula
\[ BS(S_t, \sigma, K, T) := S_t\, N\left( d_+\left( S_t, \tilde K, \sigma^2 T \right) \right) - \tilde K\, N\left( d_-\left( S_t, \tilde K, \sigma^2 T \right) \right) \tag{9.16} \]
to compute an estimate of the option price, and into the optimal hedge ratio
\[ \Delta(S_t, \sigma, K, T) := N\left( d_+\left( S_t, \tilde K, \sigma^2 T \right) \right) \tag{9.17} \]
in order to implement the optimal hedging strategy.
Unfortunately, the options prices estimates provided by this method perform very poorly in terms of fitting the observed data on options prices. Also, the use of the historical volatility for the hedging purpose leads to a very poor hedging strategy, as can be verified by a back-testing procedure on observed data.
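The estimation step can be sketched on simulated data. The example below assumes prices generated by the Black-Scholes model itself (so the estimator is tested under its own hypotheses, which real data do not satisfy); the function names and parameter values are illustrative. Since the sample variance of the returns estimates $\sigma^2 h$, it is divided by the step $h$ before taking the square root.

```python
import math
import random

def simulate_prices(s0, mu, sigma, step, n_steps, seed=0):
    """Black-Scholes prices observed on the grid t_i = i * step (simulated data)."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n_steps):
        log_return = (mu - 0.5 * sigma ** 2) * step + sigma * math.sqrt(step) * rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(log_return))
    return prices

def historical_volatility(prices, step):
    """Square root of (sample variance of the log-returns) / step."""
    returns = [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]
    n = len(returns)
    mean_return = sum(returns) / n
    sample_variance = sum((x - mean_return) ** 2 for x in returns) / n
    return math.sqrt(sample_variance / step)

prices = simulate_prices(100.0, mu=0.08, sigma=0.2, step=1.0 / 252, n_steps=5000)
print(historical_volatility(prices, step=1.0 / 252))  # close to the true value 0.2
```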
2. This anomaly is of course due to the simplicity of the Black-Scholes model which assumes that the log-returns are Gaussian independent random variables. The empirical analysis of financial data reveals that securities prices exhibit fat tails which are by far under-estimated by the Gaussian distribution. This is the so-called leptokurtic effect. It is also documented that financial data exhibit an important skewness, i.e. asymmetry of the distribution, which is not allowed by the Gaussian distribution.
Many alternative statistical models have been suggested in the literature in order to account for the empirical evidence (see e.g. the extensive literature on ARCH models). But none of them is used by the practitioners on (liquid) options markets. The simple and by far imperfect Black-Scholes model is still used all over the financial industry. It is however the statistical estimation procedure that practitioners gave up very early...
3. On liquid options markets, prices are given to the practitioners and are determined by the confrontation of demand and supply on the market. Therefore, their main concern is to implement the corresponding hedging strategy. To do this, they use the so-called calibration technique, which in the present context reduces to the calculation of the implied volatility parameter.
It is very easily checked that the Black-Scholes formula (9.16) is a one-to-one function of the volatility parameter, see (9.22) below. Then, given the observation of the call option price $C^*_t(K, T)$ on the financial market, there exists a unique parameter $\sigma^{\mathrm{imp}}$ which equates the observed option price to the corresponding Black-Scholes formula:
\[ BS\left( S_t, \sigma^{\mathrm{imp}}_t(K, T), K, T \right) = C^*_t(K, T), \tag{9.18} \]
provided that $C^*_t$ satisfies the no-arbitrage bounds of Subsection 1.4. This defines, for each time $t \ge 0$, a map $(K, T) \longmapsto \sigma^{\mathrm{imp}}_t(K, T)$ called the implied volatility surface. For their hedging purpose, the option trader then computes the hedge ratio
\[ \Delta^{\mathrm{imp}}_t(T, K) := \Delta\left( S_t, \sigma^{\mathrm{imp}}_t(K, T), K, T \right). \]
If the constant volatility condition were satisfied on the financial data, then the implied volatility surface would be expected to be flat. But this is not the case on real life financial markets. For instance, for a fixed maturity $T$, it is usually observed that the implied volatility is U-shaped as a function of the strike price. Because of this empirical observation, this curve is called the volatility smile. It is also frequently argued that the smile is not symmetric but skewed in the direction of large strikes.
From the conceptual point of view, this practice of options traders is in contradiction with the basics of the Black-Scholes model: while the Black-Scholes formula is established under the condition that the volatility parameter is constant, the practical use via the implied volatility allows for a stochastic variation of the volatility. In fact, by doing this, the practitioners are determining a wrong volatility parameter out of a wrong formula!
Despite all the criticism against this practice, it is the standard on the derivatives markets, and it does perform by far better than the statistical method. It has been widely extended to more complex derivatives markets such as fixed income derivatives, defaultable securities and related derivatives...
More details can be found in Chapter 10, which is dedicated to the topic of implied volatility.
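Since the Black-Scholes price is increasing in the volatility, equation (9.18) can be solved numerically by bisection. The sketch below is one possible way to do it, not a description of any market-standard routine; the function names, the search interval $[10^{-6}, 5]$ and the tolerance are assumptions.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function N."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, strike, r, sigma, maturity):
    """Black-Scholes call price (9.16) with time to maturity `maturity`."""
    k_tilde = strike * math.exp(-r * maturity)
    root = sigma * math.sqrt(maturity)
    d_plus = math.log(s / k_tilde) / root + 0.5 * root
    return s * normal_cdf(d_plus) - k_tilde * normal_cdf(d_plus - root)

def implied_volatility(price, s, strike, r, maturity, low=1e-6, high=5.0, tol=1e-10):
    """Solve bs_call(sigma) = price by bisection, using monotonicity in sigma."""
    lower_bound = max(s - strike * math.exp(-r * maturity), 0.0)
    if not lower_bound < price < s:
        raise ValueError("price violates the no-arbitrage bounds")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if bs_call(s, strike, r, mid, maturity) < price:
            low = mid   # the price is too small: the volatility must be larger
        else:
            high = mid
    return 0.5 * (low + high)

observed = bs_call(100.0, 110.0, 0.05, 0.25, 0.5)
print(implied_volatility(observed, 100.0, 110.0, 0.05, 0.5))  # recovers 0.25
```

Repeating the computation over a grid of strikes and maturities of observed prices produces the implied volatility surface.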
Risk control variables: the Greeks
With the above definition of the implied volatility, all the parameters needed for the implementation of the Black-Scholes model are available. For the purpose of controlling the risk of their position, the practitioners of the options markets compute various sensitivities, commonly called Greeks, of the Black-Scholes formula to the different variables and parameters of the model. The following picture shows a typical software of an option trader, and the objective of the following discussion is to understand its content.
1. Delta: This control variable is the most important one as it represents the number of shares to be held at each time in order to perform a perfect (dynamic) hedge of the option. The expression of the Delta is given in (9.17). An interesting observation for the calculation of this control variable and the subsequent ones is that
\[ s\, N'\left( d_+(s, k, v) \right) = k\, N'\left( d_-(s, k, v) \right), \]
where $N'(x) = (2\pi)^{-1/2} e^{-x^2/2}$.
Figure 9.2: An example of implied volatility surface
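2. Gamma: is defined by
\[ \Gamma(S_t, \sigma, K, T) := \frac{\partial^2 BS}{\partial s^2}(S_t, \sigma, K, T) = \frac{1}{S_t\, \sigma \sqrt{T - t}}\, N'\left( d_+\left( S_t, \tilde K, \sigma^2 T \right) \right). \tag{9.19} \]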
The interpretation of this risk control coefficient is the following. While the simple Black-Scholes model assumes that the underlying asset price process is continuous, practitioners believe that large movements of the prices, or jumps, are possible. A stress scenario consists in a sudden jump of the underlying asset price. Then the Gamma coefficient represents the change in the hedging strategy induced by such a stress scenario. In other words, if the underlying asset jumps immediately from $S_t$ to $S_t + \xi$, then the option hedger must immediately modify his position in the risky asset by buying $\Gamma_t\, \xi$ shares (or selling if $\Gamma_t\, \xi < 0$). Given this interpretation, a position with a large Gamma is very risky, as it would require a large adjustment in case of a stress scenario.
Figure 9.3: A typical option trader software
3. Rho: is defined by
\[ \rho(S_t, \sigma, K, T) := \frac{\partial BS}{\partial r}(S_t, \sigma, K, T) = \tilde K\, (T - t)\, N\left( d_-\left( S_t, \tilde K, \sigma^2 T \right) \right), \tag{9.20} \]
and represents the sensitivity of the Black-Scholes formula to a change of the instantaneous interest rate.
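4. Theta: is defined by
\[ \theta(S_t, \sigma, K, T) := \frac{\partial BS}{\partial T}(S_t, \sigma, K, T) = \frac{S_t\, \sigma}{2 \sqrt{T - t}}\, N'\left( d_+\left( S_t, \tilde K, \sigma^2 T \right) \right) + r \tilde K\, N\left( d_-\left( S_t, \tilde K, \sigma^2 T \right) \right), \tag{9.21} \]
and is also called the time value of the call option. This coefficient isolates the depreciation of the option as time goes on due to the shortening of the maturity.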
5. Vega: is one of the most important Greeks (although it is not a Greek letter!), and is defined by
\[ \mathcal{V}(S_t, \sigma, K, T) := \frac{\partial BS}{\partial \sigma}(S_t, \sigma, K, T) = S_t \sqrt{T - t}\, N'\left( d_+\left( S_t, \tilde K, \sigma^2 T \right) \right). \tag{9.22} \]
This control variable provides the exposure of the call option price to the volatility risk. Practitioners are of course aware of the stochastic nature of the volatility process (recall the smile surface above), and are therefore seeking a position with the smallest possible Vega in absolute value.

Figure 9.4: Representation of the Greeks
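The closed-form Greeks can be cross-checked against finite differences of the price. The sketch below is written in terms of the time to maturity `tau` (an assumption made to avoid mixing $T$ and $T - t$), takes Theta as the derivative with respect to the maturity, and uses the standard closed forms, which are stated here as assumptions of the sketch; the function names are hypothetical.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(s, strike, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau."""
    k_tilde = strike * math.exp(-r * tau)
    root = sigma * math.sqrt(tau)
    d1 = math.log(s / k_tilde) / root + 0.5 * root
    return s * normal_cdf(d1) - k_tilde * normal_cdf(d1 - root)

def greeks(s, strike, r, sigma, tau):
    """Closed-form sensitivities of bs_call (theta is the derivative in tau)."""
    k_tilde = strike * math.exp(-r * tau)
    root = sigma * math.sqrt(tau)
    d1 = math.log(s / k_tilde) / root + 0.5 * root
    d2 = d1 - root
    return {
        "delta": normal_cdf(d1),
        "gamma": normal_pdf(d1) / (s * root),
        "rho": k_tilde * tau * normal_cdf(d2),
        "theta": 0.5 * s * sigma * normal_pdf(d1) / math.sqrt(tau) + r * k_tilde * normal_cdf(d2),
        "vega": s * math.sqrt(tau) * normal_pdf(d1),
    }

def central_difference(f, x, step):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

s, k, r, sigma, tau = 100.0, 105.0, 0.03, 0.25, 0.75
g = greeks(s, k, r, sigma, tau)
print(g["delta"], central_difference(lambda x: bs_call(x, k, r, sigma, tau), s, 1e-3))
print(g["vega"], central_difference(lambda x: bs_call(s, k, r, x, tau), sigma, 1e-5))
```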
9.2.6 Hedging with constant volatility: robustness of the
Black-Scholes model
In this subsection, we analyze the impact of a hedging strategy based on a constant volatility parameter $\Sigma$ in a model where the volatility is stochastic:
\[ \frac{dS_t}{S_t} = \mu_t\, dt + \sigma_t\, dW_t. \tag{9.23} \]
Here, the volatility $\sigma$ is a process in $H^2$, the drift process $\mu$ is measurable adapted with $\int_0^T |\mu_u|\, du < \infty$, and $W$ is the Brownian motion under the risk-neutral measure. We denote by $r$ the instantaneous interest rate assumed to be constant.
Consider the position of a seller of the option who hedges the promised payoff $(S_T - K)^+$ by means of a self-financing portfolio based on the Black-Scholes hedging strategy with constant volatility $\Sigma$. Then, the final value of the portfolio is:
\[ X^{\Delta^{BS}}_T := BS(t, S_t, \Sigma) + \int_t^T \Delta^{BS}(u, S_u, \Sigma)\, dS_u + \int_t^T \left( BS - s \Delta^{BS} \right)(u, S_u, \Sigma)\, r\, du, \]
where $BS(t, S_t, \Sigma)$ is the Black-Scholes formula parameterized by the relevant parameters for the present analysis, and $\Delta^{BS} := \frac{\partial BS}{\partial s}$. We recall that:
\[ \left( \frac{\partial}{\partial t} + r s \frac{\partial}{\partial s} + \frac{1}{2} \Sigma^2 s^2 \frac{\partial^2}{\partial s^2} - r \right) BS(t, s, \Sigma) = 0. \tag{9.24} \]
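The Profit and Loss is defined by
\[ \mathrm{P\&L}_T(\Sigma) := X^{\Delta^{BS}}_T - (S_T - K)^+. \]
Since $BS(T, s, \Sigma) = (s - K)^+$ independently of $\Sigma$:
\[ \mathrm{P\&L}_T = \int_t^T \Delta^{BS}(u, S_u, \Sigma)\, dS_u + \int_t^T \left( BS - s \Delta^{BS} \right)(u, S_u, \Sigma)\, r\, du - \int_t^T d\, BS(u, S_u, \Sigma). \tag{9.25} \]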
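By the smoothness of the Black-Scholes formula, it follows from Itô's formula and the (true) dynamics of the underlying security price process (9.23) that:
\[ \begin{aligned} d\, BS(u, S_u, \Sigma) &= \Delta^{BS}(u, S_u, \Sigma)\, dS_u + \left( \frac{\partial}{\partial t} + \frac{1}{2} s^2 \sigma_u^2 \frac{\partial^2}{\partial s^2} \right) BS(u, S_u, \Sigma)\, du \\ &= \Delta^{BS}(u, S_u, \Sigma)\, dS_u + \frac{1}{2} \left( \sigma_u^2 - \Sigma^2 \right) S_u^2\, \Gamma^{BS}(u, S_u, \Sigma)\, du + \left( r\, BS - r s\, \Delta^{BS} \right)(u, S_u, \Sigma)\, du, \end{aligned} \]
where $\Gamma^{BS} = \frac{\partial^2 BS}{\partial s^2}$, and the last equality follows from (9.24). Plugging this expression in (9.25), we obtain:
\[ \mathrm{P\&L}_T(\Sigma) = \frac{1}{2} \int_t^T \left( \Sigma^2 - \sigma_u^2 \right) S_u^2\, \Gamma^{BS}(u, S_u, \Sigma)\, du. \tag{9.26} \]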
An interesting consequence of the latter beautiful formula is the following robustness property of the Black-Scholes model which holds true in the very general setting of the model (9.23).
Proposition 9.3. Assume that $\sigma_t \le \Sigma$, $0 \le t \le T$, a.s. Then $\mathrm{P\&L}_T(\Sigma) \ge 0$, a.s., i.e. hedging the European call option within the (wrong) Black-Scholes model induces a super-hedging strategy for the seller of the option.
Proof. It suffices to observe that $\Gamma^{BS} \ge 0$.
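The robustness property can be observed in a simulation. The sketch below makes several assumptions that are not in the text: the hedge is rebalanced on a discrete grid (so the identity (9.26) only holds approximately), the true volatility is a toy process drawn uniformly in $[0.10, 0.15]$ at each step, and the hedger uses the constant $\Sigma = 0.30$. All names and parameter values are illustrative.

```python
import math
import random

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price_and_delta(s, strike, r, big_sigma, tau):
    """Black-Scholes call price and delta with volatility big_sigma, time to maturity tau."""
    k_tilde = strike * math.exp(-r * tau)
    root = big_sigma * math.sqrt(tau)
    d1 = math.log(s / k_tilde) / root + 0.5 * root
    return s * normal_cdf(d1) - k_tilde * normal_cdf(d1 - root), normal_cdf(d1)

def hedging_pnl(s0, strike, r, big_sigma, maturity, n_steps, rng):
    """Final P&L of the seller who delta-hedges with the constant volatility big_sigma."""
    dt = maturity / n_steps
    s = s0
    value, delta = bs_price_and_delta(s, strike, r, big_sigma, maturity)
    for i in range(n_steps):
        # Toy stochastic volatility, bounded above by 0.15 <= big_sigma (assumption).
        sigma_t = 0.10 + 0.05 * rng.random()
        z = rng.gauss(0.0, 1.0)
        s_next = s * math.exp((r - 0.5 * sigma_t ** 2) * dt + sigma_t * math.sqrt(dt) * z)
        # Self-financing update: shares move with the asset, cash earns the rate r.
        value = delta * s_next + (value - delta * s) * math.exp(r * dt)
        s = s_next
        if i + 1 < n_steps:
            _, delta = bs_price_and_delta(s, strike, r, big_sigma, maturity - (i + 1) * dt)
    return value - max(s - strike, 0.0)

rng = random.Random(0)
pnls = [hedging_pnl(100.0, 100.0, 0.02, 0.30, 1.0, 250, rng) for _ in range(200)]
print(min(pnls), sum(pnls) / len(pnls))
```

In this experiment the overestimated volatility leaves the seller with a positive P&L on the simulated paths, in line with Proposition 9.3.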
9.3 Complement: barrier options in the Black-
Scholes model
So far, we have developed the pricing and hedging theory for the so-called plain vanilla options defined by payoffs $g(S_T)$ depending on the final value of the security at maturity. We now examine the example of barrier options which are the simplest representatives of the so-called path-dependent options.
A European barrier call (resp. put) option is a European call (resp. put) option which appears or disappears upon passage from some barrier. To simplify the presentation, we will only concentrate on European barrier call options. The corresponding definition for European barrier put options follows by replacing calls by puts.
The main technical tool for the derivation of explicit formulae for the no-arbitrage prices of barrier options is the explicit form of the joint distribution of the Brownian motion $W_t$ and its running maximum $W^*_t := \max_{s \le t} W_s$, see Proposition 4.12:
\[ f_{W^*_t, W_t}(m, w) = \frac{2 (2m - w)}{t \sqrt{2\pi t}} \exp\left( -\frac{(2m - w)^2}{2t} \right) 1_{\{m \ge 0\}}\, 1_{\{w \le m\}}. \]
In our context, the risky asset price process is dened as an exponential of a
drifted Brownian motion, i.e. the Black and Scholes model. For this reason, we
need the following result.
Proposition 9.4. For a given constant $a \in \mathbb{R}$, let $X_t = W_t + at$ and $X^*_t = \max_{s \le t} X_s$ the corresponding running maximum process. Then, the joint distribution of $(X^*_t, X_t)$ is characterized by the density:
\[ f_{X^*_t, X_t}(y, x) = \frac{2 (2y - x)}{t \sqrt{2\pi t}} \exp\left( a x - \frac{a^2}{2} t - \frac{(2y - x)^2}{2t} \right) 1_{\{y \ge 0\}}\, 1_{\{x \le y\}}. \]
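Proof. By the Cameron-Martin theorem, $X$ is a Brownian motion under the probability measure $\mathbb{Q}$ with density
\[ \frac{d\mathbb{Q}}{d\mathbb{P}} = e^{-a W_t - \frac{1}{2} a^2 t} = e^{-a X_t + \frac{1}{2} a^2 t}. \]
Then,
\[ \mathbb{P}\left[ X^*_t \le y, X_t \le x \right] = \mathbb{E}^{\mathbb{Q}}\left[ e^{a X_t - \frac{1}{2} a^2 t}\, 1_{\{X^*_t \le y\}}\, 1_{\{X_t \le x\}} \right]. \]
Differentiating, we see that
\[ f_{X^*_t, X_t}(y, x) = e^{a x - \frac{1}{2} a^2 t}\, f^{\mathbb{Q}}_{X^*_t, X_t}(y, x) = e^{a x - \frac{1}{2} a^2 t}\, f_{W^*_t, W_t}(y, x), \]
where we denoted by $f^{\mathbb{Q}}_{X^*_t, X_t}$ the joint density under $\mathbb{Q}$ of the pair $(X^*_t, X_t)$.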
9.3.1 Barrier options prices
We consider a financial market with a non-risky asset $S^0$ defined by
\[ S^0_t = e^{rt}, \quad t \ge 0, \]
and a risky security with price process defined by the Black and Scholes model
\[ S_t = S_0\, e^{\left( r - \frac{\sigma^2}{2} \right) t + \sigma B_t}, \quad t \ge 0, \]
where $B$ is a Brownian motion under the risk neutral measure $\mathbb{Q}$.
An up-and-out call option is defined by the payoff at maturity $T$:
\[ UOC_T := (S_T - K)^+\, 1_{\{\max_{0 \le t \le T} S_t \le B\}}. \]
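Introducing the parameters
\[ a := \left( \frac{r}{\sigma} - \frac{\sigma}{2} \right) \quad\text{and}\quad b = \frac{1}{\sigma} \log\left( \frac{B}{S_0} \right) \]
we may re-write the payoff of the up-and-out call option as:
\[ UOC_T = \left( S_0 e^{\sigma X_T} - K \right)^+ 1_{\{X^*_T \le b\}} \quad\text{where}\quad X_t := B_t + at, \quad t \ge 0, \]
and $X^*_t = \max_{0 \le u \le t} X_u$, $t \ge 0$. The no-arbitrage price at time 0 of the up-and-out call is
\[ UOC_0 = \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} \left( S_0 e^{\sigma X_T} - K \right)^+ 1_{\{X^*_T \le b\}} \right]. \]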
We now show how to obtain an explicit formula for the up-and-out call option price in the present Black and Scholes framework.
a. By our general no-arbitrage valuation theory, together with the change of measure, it follows that
\[ UOC_0 = \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT}\, UOC_T \right] = S_0\, \hat{\mathbb{P}}\left[ X_T \ge k, X^*_T \le b \right] - K e^{-rT}\, \mathbb{Q}\left[ X_T \ge k, X^*_T \le b \right] \tag{9.27} \]
where we set:
\[ k := \frac{1}{\sigma} \log\left( \frac{K}{S_0} \right) \]
and $\hat{\mathbb{P}}$ is an equivalent probability measure defined by the density:
\[ \frac{d\hat{\mathbb{P}}}{d\mathbb{Q}} = e^{-rT} e^{\sigma X_T} = e^{-rT} \frac{S_T}{S_0}. \]
By the Cameron-Martin theorem, the process
\[ \hat W_t = B_t - \sigma t, \quad t \ge 0, \]
defines a Brownian motion under $\hat{\mathbb{P}}$. We introduce one more notation
\[ \hat a := a + \sigma \quad\text{so that}\quad X_t = B_t + at = \hat W_t + \hat a t, \quad t \ge 0. \tag{9.28} \]
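b. Using the explicit joint distribution of $(X_T, X^*_T)$ derived in Proposition 9.4, we compute that:
\[ \begin{aligned} \mathbb{Q}\left[ X_T \le k, X^*_T \le b \right] &= \int_0^b \int_{-\infty}^{y \wedge k} \frac{2 (2y - x)}{T \sqrt{2\pi T}} \exp\left( a x - \frac{1}{2} a^2 T - \frac{(2y - x)^2}{2T} \right) dx\, dy \\ &= \int_{-\infty}^k \frac{e^{a x - \frac{1}{2} a^2 T}}{\sqrt{2\pi T}} \int_{x^+}^b \frac{4 (2y - x)}{2T}\, e^{-\frac{(2y - x)^2}{2T}}\, dy\, dx \\ &= \int_{-\infty}^k \frac{e^{a x - \frac{1}{2} a^2 T}}{\sqrt{2\pi T}} \left( -e^{-(2b - x)^2 / 2T} + e^{-x^2 / 2T} \right) dx \\ &= N\left( \frac{k - aT}{\sqrt{T}} \right) - e^{2ab}\, N\left( \frac{k - aT - 2b}{\sqrt{T}} \right). \end{aligned} \tag{9.29} \]
By (9.28), it follows that the corresponding probability under $\hat{\mathbb{P}}$ can be immediately deduced from (9.29) by substituting $\hat a$ to $a$:
\[ \hat{\mathbb{P}}\left[ X_T \le k, X^*_T \le b \right] = N\left( \frac{k - (a + \sigma) T}{\sqrt{T}} \right) - e^{2 (a + \sigma) b}\, N\left( \frac{k - (a + \sigma) T - 2b}{\sqrt{T}} \right). \]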
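c. We next compute
\[ \begin{aligned} \mathbb{Q}\left[ X^*_T \le b \right] &= 1 - \mathbb{Q}\left[ X^*_T > b \right] \\ &= 1 - \mathbb{Q}\left[ X^*_T > b, X_T < b \right] - \mathbb{Q}\left[ X^*_T > b, X_T \ge b \right] \\ &= 1 - \mathbb{Q}\left[ X^*_T > b, X_T < b \right] - \mathbb{Q}\left[ X_T \ge b \right] \\ &= \mathbb{Q}\left[ X^*_T \le b, X_T \le b \right], \end{aligned} \]
which is given by (9.29) with $k = b$. Then,
\[ \begin{aligned} \mathbb{Q}\left[ X_T \ge k, X^*_T \le b \right] &= \mathbb{Q}\left[ X^*_T \le b \right] - \mathbb{Q}\left[ X_T \le k, X^*_T \le b \right] \\ &= N\left( \frac{b - aT}{\sqrt{T}} \right) - e^{2ab}\, N\left( -\frac{b + aT}{\sqrt{T}} \right) - N\left( \frac{k - aT}{\sqrt{T}} \right) + e^{2ab}\, N\left( \frac{k - aT - 2b}{\sqrt{T}} \right). \end{aligned} \tag{9.30} \]
Similarly:
\[ \hat{\mathbb{P}}\left[ X_T \ge k, X^*_T \le b \right] = N\left( \frac{b - \hat a T}{\sqrt{T}} \right) - e^{2 \hat a b}\, N\left( -\frac{b + \hat a T}{\sqrt{T}} \right) - N\left( \frac{k - \hat a T}{\sqrt{T}} \right) + e^{2 \hat a b}\, N\left( \frac{k - \hat a T - 2b}{\sqrt{T}} \right). \tag{9.31} \]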
d. The explicit formula for the price of the up-and-out call option is then
obtained by combining (9.27), (9.30) and (9.31).
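The combination of (9.27), (9.30) and (9.31) can be sketched as follows. The function names are hypothetical, and the sketch assumes $S_0 < B$ and $K \le B$ (so that $k \le b$). A simple sanity check is that the price approaches the Black-Scholes call price when the barrier is sent far away.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_above_k_below_barrier(a, k, b, maturity):
    """Probability of {X_T >= k, X*_T <= b} for X_t = B_t + a t, as in (9.30), with k <= b."""
    root = math.sqrt(maturity)
    reflect = math.exp(2.0 * a * b)
    return (normal_cdf((b - a * maturity) / root)
            - reflect * normal_cdf(-(b + a * maturity) / root)
            - normal_cdf((k - a * maturity) / root)
            + reflect * normal_cdf((k - a * maturity - 2.0 * b) / root))

def up_and_out_call(s0, strike, barrier, r, sigma, maturity):
    """Price (9.27), assuming s0 < barrier and strike <= barrier."""
    a = r / sigma - 0.5 * sigma
    a_hat = a + sigma                      # drift of X under the measure P^ of (9.28)
    b = math.log(barrier / s0) / sigma
    k = math.log(strike / s0) / sigma
    return (s0 * prob_above_k_below_barrier(a_hat, k, b, maturity)
            - strike * math.exp(-r * maturity) * prob_above_k_below_barrier(a, k, b, maturity))

print(up_and_out_call(100.0, 100.0, 120.0, 0.05, 0.2, 1.0))  # well below the vanilla price
print(up_and_out_call(100.0, 100.0, 1.0e6, 0.05, 0.2, 1.0))  # close to the vanilla price
```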
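e. An up-and-in call option is defined by the payoff at maturity $T$:
\[ UIC_T := (S_T - K)^+\, 1_{\{\max_{0 \le t \le T} S_t \ge B\}}. \]
The no-arbitrage price at time 0 of the up-and-in call is easily deduced from the explicit formula of the up-and-out call price:
\[ \begin{aligned} UIC_0 &= \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} \left( S_0 e^{\sigma X_T} - K \right)^+ 1_{\{X^*_T \ge b\}} \right] \\ &= \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} \left( S_0 e^{\sigma X_T} - K \right)^+ \right] - \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} \left( S_0 e^{\sigma X_T} - K \right)^+ 1_{\{X^*_T \le b\}} \right] \\ &= c_0 - UOC_0, \end{aligned} \]
where $c_0$ is the Black-Scholes price of the corresponding European call option.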
f. A down-and-out call option is defined by the payoff at maturity $T$:
\[ DOC_T := (S_T - K)^+\, 1_{\{\min_{0 \le t \le T} S_t \ge B\}} = \left( S_0 e^{\sigma X_T} - K \right)^+ 1_{\{\min_{0 \le t \le T} X_t \ge b\}}. \]
Observe that the process $\{X_t = B_t + at, t \ge 0\}$ has the same distribution as the process $\{-x_t, t \ge 0\}$, where
\[ x_t := B_t - at, \quad t \ge 0. \]
Moreover $\min_{0 \le t \le T} X_t$ has the same distribution as $-\max_{0 \le t \le T} x_t$. Then the no-arbitrage price at time 0 of the down-and-out call is
\[ DOC_0 = \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} \left( S_0 e^{-\sigma x_T} - K \right)^+ 1_{\{\max_{0 \le t \le T} x_t \le -b\}} \right]. \]
We can then exploit the formula established above for the up-and-out call option after substituting $(-\sigma, -a, -b)$ to $(\sigma, a, b)$.
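g. A down-and-in call option is defined by the payoff at maturity $T$:
$$DIC_T := (S_T - K)^+ \mathbf{1}_{\{\min_{0 \le t \le T} S_t \le B\}}.$$
The problem of pricing the down-and-in call option reduces to that of the down-and-out call option:
$$\begin{aligned}
DIC_0 &= E^{\mathbb{Q}}\Big[e^{-rT}(S_T - K)^+ \mathbf{1}_{\{\min_{0 \le t \le T} X_t \le b\}}\Big] \\
&= E^{\mathbb{Q}}\Big[e^{-rT}(S_T - K)^+\Big] - E^{\mathbb{Q}}\Big[e^{-rT}(S_T - K)^+ \mathbf{1}_{\{\min_{0 \le t \le T} X_t \ge b\}}\Big] \\
&= c_0 - DOC_0,
\end{aligned}$$
where $c_0$ is the Black-Scholes price of the corresponding European call option.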
[Figure 9.5: Types of barrier options. Left panel: regular down-and-in call, with $B < K < S_0$ marked on the $S_T$ axis. Right panel: reverse up-and-in call, with $S_0 < K < B$ marked on the $S_T$ axis.]
9.3.2 Dynamic hedging of barrier options
We only indicate how the Black-Scholes hedging theory extends to the case of barrier options. We leave the technical details for the reader. If the barrier is hit before maturity, the barrier option value at that time is known to be either zero, or the price of the corresponding European option. Hence, it is sufficient to find the hedge before hitting the barrier, that is, up to $T_B \wedge T$ with
$$T_B := \inf\{t \ge 0 : S_t = B\}.$$
Prices of barrier options are smooth functions of the underlying asset price in the in-region, so Itô's formula may be applied up to the stopping time $T \wedge T_B$. By following the same line of argument as in the case of plain vanilla options, it then follows that a perfect replicating strategy consists in:
• holding $\frac{\partial f}{\partial s}(t, S_t)$ shares of the underlying asset for $t \le T_B \wedge T$,
where $f(t, S_t)$ is the price of the barrier option at time $t$.
9.3.3 Static hedging of barrier options
In contrast with European calls and puts, the delta of barrier options is not bounded, which makes these options difficult to hedge dynamically. We conclude this section by presenting a hedging strategy for barrier options, due to P. Carr et al. [8], which uses only static positions in European products.
A barrier option is said to be regular if its pay-off function is zero at and beyond the barrier, and reverse otherwise (see Figure 9.5 for an illustration).
In the following, we will treat barrier options with arbitrary pay-off functions (not necessarily calls or puts). The price of an Up and In barrier option which pays $f(S_T)$ at date $T$ if the barrier $B$ has been crossed before $T$ will be denoted by $UI_t(S_t, B, f(S_T), T)$, where $t$ is the current date and $S_t$ is the current stock price. In the same way, $UO$ denotes the price of an Up and Out option and $EUR_t(S_t, f(S_T), T)$ is the price of a European option with pay-off $f(S_T)$. These functions satisfy the following straightforward parity relations:
$$\begin{aligned}
& UI_t + UO_t = EUR_t, \\
& UI_t(S_t, B, f(S_T), T) = EUR_t(S_t, f(S_T), T) \quad \text{if } f(z) = 0 \text{ for } z < B, \\
& UI_t(S_t, B, f(S_T), T) = UI_t\big(S_t, B, f(S_T)\mathbf{1}_{\{S_T < B\}}, T\big) + EUR_t\big(S_t, f(S_T)\mathbf{1}_{\{S_T \ge B\}}, T\big) \quad \text{in general.}
\end{aligned}$$
This means that in order to hedge an arbitrary barrier option, it is sufficient to study options of type In Regular. In addition, Up and Down options can be treated in the same manner, so we shall concentrate on Up and In regular options.
The method is based on the following symmetry relationship:
$$EUR_t(S_t, f(S_T), T) = EUR_t\Big(S_t, \Big(\frac{S_T}{S_t}\Big)^{\gamma} f\Big(\frac{S_t^2}{S_T}\Big), T\Big), \tag{9.32}$$
with $\gamma = 1 - \frac{2(r-q)}{\sigma^2}$, where $r$ is the interest rate, $q$ the dividend rate and $\sigma$ the volatility. It is easy to check that this relation holds in the Black-Scholes model, but the method also applies to other models which possess a similar symmetry property.
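The symmetry relationship (9.32) can be checked numerically in the Black-Scholes model. The sketch below is only an illustration under stated assumptions: it takes $S_T = S_t \exp((r - q - \sigma^2/2)\tau + \sigma\sqrt{\tau} Z)$ with $Z$ standard normal and $\tau = T - t$, and approximates the discounted expectation by a plain trapezoid rule; the function names and the quadrature parameters are hypothetical choices.

```python
from math import exp, pi, sqrt


def european_price(payoff, S, r, q, sigma, tau, n=20001, zmax=8.0):
    """Discounted Black-Scholes expectation of payoff(S_T).

    Trapezoid rule over the standard normal variable z in [-zmax, zmax].
    """
    mu = (r - q - 0.5 * sigma ** 2) * tau
    vol = sigma * sqrt(tau)
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * payoff(S * exp(mu + vol * z)) * exp(-0.5 * z * z)
    return exp(-r * tau) * total * h / sqrt(2.0 * pi)


def reflected_payoff(payoff, S, gamma):
    """Pay-off (S_T / S)^gamma * f(S^2 / S_T) on the right side of (9.32)."""
    return lambda ST: (ST / S) ** gamma * payoff(S * S / ST)
```

For instance, with a put pay-off $f(z) = (90 - z)^+$, $S_t = 100$, $r = 3\%$, $q = 1\%$, $\sigma = 25\%$ and $\tau = 0.75$, one has $\gamma = 0.36$ and both sides of (9.32) agree up to the quadrature error.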
Replication of regular options Let $f$ be the pay-off function of an Up and In regular option. This means that $f(z) = 0$ for $z \ge B$. We denote by $T_B$ the first passage time by the price process above the level $B$. Consider the following static hedging strategy:
• At date $t$, buy the European option $EUR_t\Big(S_t, \Big(\frac{S_T}{B}\Big)^{\gamma} f\Big(\frac{B^2}{S_T}\Big), T\Big)$.
• When and if the barrier is hit, sell $EUR_{T_B}\Big(B, \Big(\frac{S_T}{B}\Big)^{\gamma} f\Big(\frac{B^2}{S_T}\Big), T\Big)$ and buy $EUR_{T_B}(B, f(S_T), T)$. This transaction is costless by the symmetry relationship (9.32).
It is easy to check that this strategy replicates the option $UI_t(S_t, B, f(S_T), T)$. As a by-product, we obtain the pricing formula:
$$UI_t(S_t, B, f(S_T), T) = EUR_t\Big(S_t, \Big(\frac{S_T}{B}\Big)^{\gamma} f\Big(\frac{B^2}{S_T}\Big), T\Big) = \Big(\frac{S_t}{B}\Big)^{\gamma} EUR_t\Big(S_t, f\Big(\frac{B^2}{S_t^2} S_T\Big), T\Big). \tag{9.33}$$
The case of calls and puts Equation (9.33) shows that the price of a regular In option can be expressed via the price of the corresponding European option, for example,
$$UIP_t(S_t, B, K, T) = \Big(\frac{S_t}{B}\Big)^{\gamma - 2} p_t\Big(S_t, \frac{K S_t^2}{B^2}, T\Big).$$
However, unless $\gamma = 1$, the replication strategies will generally involve European payoffs other than calls or puts. If $\gamma = 1$ (that is, the dividend yield equals the risk-free rate), then regular In options can be statically replicated with a single call / put option. For example,
$$EUR_t\Big(S_t, \frac{S_T}{B}\Big(K - \frac{B^2}{S_T}\Big)^+, T\Big) = EUR_t\Big(S_t, \Big(\frac{K S_T}{B} - B\Big)^+, T\Big) = \frac{K}{B}\, c_t\Big(S_t, \frac{B^2}{K}, T\Big).$$
The replication of reverse options will involve payoffs other than calls or puts even if $\gamma = 1$.
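The pricing formula for the regular up-and-in put can be sketched as follows. This is a hedged illustration, not part of the original text: it assumes the Black-Scholes model with continuous dividend rate $q$, $K \le B$ (regular option), and uses the standard Black-Scholes put formula for $p_t$; the function names are hypothetical.

```python
from math import erfc, exp, log, sqrt


def norm_cdf(x):
    return 0.5 * erfc(-x / sqrt(2.0))


def bs_put(S, K, r, q, sigma, tau):
    """Black-Scholes put price with continuous dividend rate q."""
    d1 = (log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return K * exp(-r * tau) * norm_cdf(-d2) - S * exp(-q * tau) * norm_cdf(-d1)


def up_and_in_put(S, B, K, r, q, sigma, tau):
    """Regular up-and-in put (K <= B) via (S/B)^(gamma-2) * p(S, K S^2 / B^2)."""
    if S >= B:
        return bs_put(S, K, r, q, sigma, tau)  # barrier already reached
    gamma = 1.0 - 2.0 * (r - q) / sigma ** 2
    return (S / B) ** (gamma - 2.0) * bs_put(S, K * S * S / (B * B), r, q, sigma, tau)
```

At $S_t = B$ the prefactor equals one and the modified strike equals $K$, so the formula reduces to the vanilla put price, as it should once the option has been knocked in.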
Chapter 10
Local volatility models and
Dupire's formula
10.1 Implied volatility
In the Black-Scholes model the only unobservable parameter is the volatility. We therefore focus on the dependence of the Black-Scholes formula on the volatility parameter, and we denote:
$$C_{BS}(\sigma) := s\,N\big(d_+(s, \tilde K, \sigma^2 T)\big) - \tilde K\,N\big(d_-(s, \tilde K, \sigma^2 T)\big),$$
where $N$ is the cumulative distribution function of the $\mathcal{N}(0,1)$ distribution, $T$ is the time to maturity, $s$ is the spot price of the underlying asset, and $\tilde K$, $d_\pm$ are given in (9.5).
In this section, we provide more quantitative results on the volatility calibration discussed in Section 9.2.5. First, observe that the model can be calibrated from a single option price because the Black-Scholes price function is strictly increasing in volatility:
$$\lim_{\sigma \downarrow 0} C_{BS}(\sigma) = (s - \tilde K)^+, \quad \lim_{\sigma \uparrow \infty} C_{BS}(\sigma) = s \quad \text{and} \quad \frac{\partial C_{BS}}{\partial \sigma} = s\,N'(d_+)\sqrt{T} > 0. \tag{10.1}$$
Then, whenever the observed market price $C$ of the call option lies within the no-arbitrage bounds:
$$(s - \tilde K)^+ < C < s,$$
there is a unique solution $I(C)$ to the equation
$$C_{BS}(\sigma) = C$$
called the implied volatility of this option. Direct calculation also shows that
$$\frac{\partial^2 C_{BS}}{\partial \sigma^2} = \frac{s\,N'(d_+)\sqrt{T}}{\sigma}\Big(\frac{m^2}{\sigma^2 T} - \frac{\sigma^2 T}{4}\Big), \quad \text{where } m = \ln\Big(\frac{s}{Ke^{-rT}}\Big) \tag{10.2}$$
is the option moneyness. Equation (10.2) shows that the function $\sigma \mapsto C_{BS}(\sigma)$ is convex on the interval $\big(0, \sqrt{2|m|/T}\big)$ and concave on $\big(\sqrt{2|m|/T}, \infty\big)$. Then the implied volatility can be approximated by means of Newton's algorithm:
$$\sigma_0 = \sqrt{\frac{2|m|}{T}} \quad \text{and} \quad \sigma_n = \sigma_{n-1} + \frac{C - C_{BS}(\sigma_{n-1})}{\frac{\partial C_{BS}}{\partial \sigma}(\sigma_{n-1})},$$
which produces a monotonic sequence of positive scalars $(\sigma_n)_{n \ge 0}$. However, in practice, when $C$ is too close to the arbitrage bounds, the derivative $\frac{\partial C_{BS}}{\partial \sigma}(\sigma_n)$ becomes too small, leading to numerical instability. In this case, it is better to use the bisection method.
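The procedure just described can be sketched in Python as follows. This is a minimal illustration under assumptions: the call price is written in terms of the spot $s$ and the discounted strike $\tilde K = Ke^{-rT}$, Newton's iteration starts from the inflection point $\sqrt{2|m|/T}$ (replaced by a small positive value when $m = 0$), and the code switches to bisection when the vega becomes too small. The function names, tolerances and safeguards are hypothetical choices.

```python
from math import erfc, exp, log, pi, sqrt


def norm_cdf(x):
    return 0.5 * erfc(-x / sqrt(2.0))


def bs_call(s, k_disc, sigma, T):
    """C_BS(sigma) with discounted strike k_disc = K * exp(-r * T)."""
    v = sigma * sqrt(T)
    d_plus = log(s / k_disc) / v + 0.5 * v
    return s * norm_cdf(d_plus) - k_disc * norm_cdf(d_plus - v)


def bs_vega(s, k_disc, sigma, T):
    """Derivative of C_BS with respect to sigma, see (10.1)."""
    v = sigma * sqrt(T)
    d_plus = log(s / k_disc) / v + 0.5 * v
    return s * exp(-0.5 * d_plus * d_plus) / sqrt(2.0 * pi) * sqrt(T)


def implied_vol(C, s, k_disc, T, tol=1e-12, max_iter=50):
    if not max(s - k_disc, 0.0) < C < s:
        raise ValueError("price outside the no-arbitrage bounds")
    m = log(s / k_disc)
    sigma = max(sqrt(2.0 * abs(m) / T), 1e-4)  # start at the inflection point
    for _ in range(max_iter):
        diff = C - bs_call(s, k_disc, sigma, T)
        if abs(diff) < tol:
            return sigma
        vega = bs_vega(s, k_disc, sigma, T)
        if vega < 1e-10 or sigma + diff / vega <= 0.0:
            break  # Newton step unreliable: fall back to bisection
        sigma += diff / vega
    lo, hi = 1e-8, 1.0
    while bs_call(s, k_disc, hi, T) < C:  # bracket the root from above
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k_disc, mid, T) < C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A round trip (price an option at a known volatility, then invert) is a convenient way to check such a routine.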
In the Black-Scholes model, the implied volatility of all options on the same underlying must be the same and equal to the historical volatility (standard deviation of annualized returns) of the underlying. However, when $I$ is computed from market-quoted option prices, one observes that
• The implied volatility is always greater than the historical volatility of the underlying.
• The implied volatilities of different options on the same underlying depend on their strikes and maturity dates.
The left graph on Fig. 10.1 shows the implied volatilities of options on the S&P 500 index as function of their strike and maturity, observed on January 23, 2006. One can see that
• For almost all the strikes, the implied volatility is decreasing in strike (the skew phenomenon).
• For very large strikes, a slight increase of implied volatility can sometimes be observed (the smile phenomenon).
• The smile and skew are more pronounced for short maturity options; the implied volatility profile as function of strike flattens out for longer maturities.
The difference between implied volatility and historical volatility of the underlying can be explained by the fact that the cost of hedging an option in reality is actually higher than its Black-Scholes price, due, in particular, to the transaction costs and the need to hedge the risk sources not captured by the Black-Scholes model (such as the volatility risk). The skew phenomenon is due to the fact that the Black-Scholes model underestimates the probability of a market crash or a large price movement in general. The traders correct this probability by increasing the implied volatilities of options far from the money. Finally, the smile can be explained by the liquidity premiums that are higher for far from the money options. The right graph in Figure 10.1 shows that the implied volatilities of far from the money options are almost exclusively explained by the Bid prices that have higher premiums for these options because of a lower offer.
[Figure 10.1: Left: Implied volatility surface of options on the S&P500 index on January 23, 2006. Right: Implied volatilities of Bid and Ask prices (curves: Bid, Ask, Midquote).]
10.2 Local volatility models
In Section 10.1 we saw that the Black-Scholes model with constant volatility cannot reproduce all the option prices observed in the market for a given underlying because their implied volatility varies with strike and maturity. To take into account the market implied volatility smile while staying within a Markovian and complete model (one risk factor), a natural solution is to model the volatility as a deterministic function of time and the value of the underlying:
$$\frac{dS_t}{S_t} = r\,dt + \sigma(t, S_t)\,dB_t, \tag{10.3}$$
where $r$ is the interest rate, assumed to be constant, and $B$ is the Brownian motion under the risk-neutral measure $\mathbb{Q}$. The SDE (10.3) defines a local volatility model.
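We recall from Section 8.6 that the price of an option with payoff $h(S_T)$ at date $T$ is given by
$$C(t, s) = E^{\mathbb{Q}}\Big[e^{-r(T-t)} h(S_T) \,\Big|\, S_t = s\Big],$$
and is characterized by the partial differential equation:
$$rC = \frac{\partial C}{\partial t} + rs\frac{\partial C}{\partial s} + \frac12 \sigma(t,s)^2 s^2 \frac{\partial^2 C}{\partial s^2}, \quad C(T, s) = h(s). \tag{10.4}$$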
The self-financing hedging portfolio contains $\Delta_t = (\partial C/\partial s)(t, S_t)$ shares and the amount $\Delta^0_t = C(t, S_t) - \Delta_t S_t$ in cash. The pricing equation has the same form as in the Black-Scholes model, but one can no longer deduce an explicit pricing formula, because the volatility is now a function of the underlying.
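Since no explicit formula is available, the pricing equation (10.4) is typically solved numerically. The following is a minimal sketch of one possible approach, an explicit finite-difference scheme for a call payoff; the scheme, the grid sizes and the boundary conditions ($C = 0$ at $s = 0$ and $C = s_{\max} - Ke^{-r(T-t)}$ at the upper boundary) are assumptions made for illustration, and the function name is hypothetical. The time step must be small enough for the explicit scheme to be stable.

```python
from math import exp


def local_vol_call_fd(sigma, K, r, T, s_max, n_s=150, n_t=2000):
    """Explicit finite-difference solution of (10.4) with h(s) = (s - K)^+.

    sigma is a callable (t, s) -> local volatility.
    Returns the spot grid and the option values at t = 0.
    """
    ds = s_max / n_s
    dt = T / n_t
    grid = [i * ds for i in range(n_s + 1)]
    C = [max(s - K, 0.0) for s in grid]  # terminal condition C(T, s)
    for step in range(1, n_t + 1):
        t = T - step * dt  # time level being computed
        new = C[:]
        for i in range(1, n_s):
            s = grid[i]
            vol = sigma(t + dt, s)  # coefficient at the known time level
            diffusion = 0.5 * vol * vol * s * s * (C[i + 1] - 2.0 * C[i] + C[i - 1]) / (ds * ds)
            convection = r * s * (C[i + 1] - C[i - 1]) / (2.0 * ds)
            new[i] = C[i] + dt * (diffusion + convection - r * C[i])
        new[0] = 0.0
        new[n_s] = s_max - K * exp(-r * (T - t))
        C = new
    return grid, C
```

With a constant volatility function the scheme should reproduce the Black-Scholes price up to discretization error, which gives a simple check before plugging in a genuinely state-dependent $\sigma(t, s)$.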
The naive way to use the model is to estimate the parameters under the statistical measure $\mathbb{P}$, and to estimate the risk premium, typically by using historical data on options prices. This allows one to specify completely the risk-neutral probability measure and the model (10.3).
The drawback of this approach is that option prices would then be completely determined by the historical model, and there is no hope for such an estimated model to produce exactly the observed prices of the quoted options. Consequently, the model cannot be used in this form because it would immediately lead to arbitrage opportunities on the options market.
For this reason, practitioners have adopted a different approach which allows them to use all observed prices of quoted options as an input for their pricing and hedging activities. This is the so-called model calibration approach. The parameters obtained by calibration are of course different from those which would be obtained by historical estimation. But this does not imply any problem related to the presence of arbitrage opportunities.
The model calibration approach is adopted in view of the fact that financial markets do not obey any fundamental law except the simplest no-dominance or the slightly stronger no-arbitrage, see Section 1.2. There is no universally accurate model in finance, and any proposed model is wrong. Therefore, practitioners primarily base their strategies on comparison between assets; this is exactly what calibration does.
10.2.1 CEV model
A well studied example of a parametric local volatility model is provided by the CEV (Constant Elasticity of Variance) model [11]. In this model, the volatility is a power-law function of the level of the underlying. For simplicity, we formulate a CEV model on the forward price of the underlying $F_t = e^{r(T-t)} S_t$:
$$dF_t = \sigma_0 F_t^{\beta}\,dB_t, \quad \text{for some } \beta \in (0, 1], \tag{10.5}$$
together with the restriction that the left endpoint 0 is an absorbing boundary: if $F_t = 0$ for some $t$, then $F_s \equiv 0$ for all $s \ge t$. The constraint $0 < \beta \le 1$ needs to be imposed to ensure that the above equation defines a martingale, see Lemma 10.2 below. One can show that for $\beta > 1$, $(F_t)$ is a strict local martingale, that is, not a true martingale. This can lead, for example, to Call-Put parity violation and other problems.
In the above CEV model, the volatility function $\sigma(f) := \sigma_0 f^{\beta}$ has a constant elasticity:
$$\frac{f \sigma'(f)}{\sigma(f)} = \beta.$$
The Black-Scholes model and the Gaussian model are particular cases of this formulation corresponding to $\beta = 1$ and $\beta = 0$ respectively. When $\beta < 1$, the CEV model exhibits the so-called leverage effect, commonly observed in equity markets, where the volatility of a stock is decreasing in terms of the spot price of the stock.
For the equation (10.5) the existence and uniqueness of solution do not follow from the classical theory of strong solutions of stochastic differential equations, because of the non-Lipschitz nature of the coefficients. The following exercise shows that the existence of a weak solution can be shown by relating the CEV process with the so-called Bessel processes. We also refer to [16] for the proof of the existence of an equivalent martingale measure in this model.
Exercise 10.1. Let $W$ be a scalar Brownian motion. For $\delta \in \mathbb{R}$, $X_0 > 0$, we assume that there is a unique strong solution $X$ to the SDE:
$$dX_t = \delta\,dt + 2\sqrt{|X_t|}\,dW_t,$$
called the $\delta$-dimensional square Bessel process.
1. Let $\bar W$ be a Brownian motion in $\mathbb{R}^d$, $d \ge 2$. Show that $\|\bar W\|^2$ is a $d$-dimensional square Bessel process.
2. Find a scalar power $p$ and a constant $a \in \mathbb{R}$ so that the process $Y_t := a X_t^{p}$, $t \ge 0$, is a CEV process satisfying (10.5), as long as $X$ does not hit the origin.
3. Conversely, given a CEV process (10.5), define a Bessel process by an appropriate change of variable.
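Lemma 10.2. For $0 < \beta \le 1$, let $F$ be a solution of the SDE (10.5). Then $F$ is a square-integrable martingale on $[0, T]$ for all $T < \infty$.
Proof. It suffices to show that
$$E\Big[\sigma_0^2 \int_0^T F_t^{2\beta}\,dt\Big] < \infty. \tag{10.6}$$
Let $\tau_n = \inf\{t : F_t \ge n\}$. Then, $F_{T \wedge \tau_n}$ is square integrable and, since $x^{2\beta} \le 1 + x^2$ for all $0 < \beta \le 1$,
$$E\big[F^2_{\tau_n \wedge T}\big] - F_0^2 = \sigma_0^2 E\Big[\int_0^{\tau_n \wedge T} F_t^{2\beta}\,dt\Big] \le \sigma_0^2 E\Big[\int_0^{\tau_n \wedge T} (1 + F_t^2)\,dt\Big] \le \sigma_0^2 E\Big[\int_0^{T} (1 + F^2_{t \wedge \tau_n})\,dt\Big].$$
By Gronwall's lemma we then get
$$\sigma_0^2 E\Big[\int_0^{\tau_n \wedge T} F_t^{2\beta}\,dt\Big] = E\big[F^2_{\tau_n \wedge T}\big] - F_0^2 \le \big(F_0^2 + \sigma_0^2 T\big) e^{\sigma_0^2 T},$$
and (10.6) now follows by monotone convergence.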
[Figure 10.2: Skew (decreasing profile) of implied volatility in the CEV model with $\sigma = 0.3$, $\beta = 0.5$, $S_0 = 1$ and $T = 1$. Horizontal axis: strike $K$; vertical axis: implied volatility.]
The shape of the implied volatility in the CEV model is known from the asymptotic approximation (small volatility) of Hagan and Woodward [24]:
$$\sigma^{imp}(K, T) = \frac{\sigma_0}{F_m^{1-\beta}} \Big\{ 1 + \frac{(1-\beta)(2+\beta)}{24} \Big(\frac{F_0 - K}{F_m}\Big)^2 + \frac{(1-\beta)^2}{24} \frac{\sigma_0^2 T}{F_m^{2-2\beta}} + \dots \Big\}, \quad F_m = \frac12 (F_0 + K).$$
To the first order, therefore, $\sigma^{imp}(K, T) \approx \frac{\sigma_0}{F_m^{1-\beta}}$: the implied volatility has the same shape as local volatility but with an at the money slope which is two times smaller than that of the local volatility (see Figure 10.2).
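The approximation above is straightforward to evaluate. The sketch below simply transcribes the three displayed terms (the higher-order terms hidden in the dots are ignored); the function name is a hypothetical choice.

```python
def cev_implied_vol(K, T, F0, sigma0, beta):
    """Hagan-Woodward approximation of the CEV implied volatility.

    Keeps only the terms displayed in the text.
    """
    Fm = 0.5 * (F0 + K)
    leading = sigma0 / Fm ** (1.0 - beta)
    skew_term = (1.0 - beta) * (2.0 + beta) / 24.0 * ((F0 - K) / Fm) ** 2
    time_term = (1.0 - beta) ** 2 / 24.0 * sigma0 ** 2 * T / Fm ** (2.0 - 2.0 * beta)
    return leading * (1.0 + skew_term + time_term)
```

For $\beta = 1$ both corrections vanish and the function returns $\sigma_0$, consistent with the Black-Scholes case; for $\beta < 1$ the values decrease in $K$, which is the skew of Figure 10.2.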
10.3 Dupire's formula
We now want to exploit the pricing partial differential equation (10.4) to deduce the local volatility function $\sigma(t, s)$ from observed call option prices for all strikes and all maturities. Unfortunately, equation (10.4) does not allow one to reconstruct the local volatility from the formula
$$\sigma^2(t, s) = \frac{rC - \frac{\partial C}{\partial t} - rs\frac{\partial C}{\partial s}}{\frac12 s^2 \frac{\partial^2 C}{\partial s^2}},$$
because at a given date, the values of $t$ and $s$ are fixed, and the corresponding partial derivatives cannot be evaluated. The solution to this problem was given by Bruno Dupire [20] who suggested a method for computing $\sigma(t, s)$ from the observed option prices for all strikes and maturities at a given date.
To derive Dupire's equation, we need the following conditions on the local volatility model (10.3).
Assumption 10.3. For all $x > 0$ and all small $\delta > 0$, there exists $\alpha > 0$ and a continuous function $c(t)$ such that:
$$|x\sigma(t, x) - y\sigma(t, y)| \le c(t)|x - y|^{\alpha} \quad \text{for } |x - y| < \delta,\ t \in (t_0, \infty).$$
Theorem 10.4. Let Assumption 10.3 hold true. Let $t_0 \ge 0$ be fixed and $(S_t)_{t_0 \le t}$ be a square integrable solution of (10.3) with $E\Big[\int_{t_0}^{t} S_u^2\,du\Big] < \infty$ for all $t \ge t_0$. Assume further that the random variable $S_t$ has a continuous density $p(t, x)$ on $(t_0, \infty) \times (0, \infty)$. Then the call price function $C(T, K) = e^{-r(T - t_0)} E[(S_T - K)^+]$ satisfies Dupire's equation
$$\frac{\partial C}{\partial T} = \frac12 \sigma^2(T, K) K^2 \frac{\partial^2 C}{\partial K^2} - rK \frac{\partial C}{\partial K}, \quad (T, K) \in [t_0, \infty) \times [0, \infty) \tag{10.7}$$
with the initial condition $C(t_0, K) = (S_{t_0} - K)^+$.
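Proof. The proof is based on an application of Itô's formula to the process $e^{-rt}(S_t - K)^+$. Since the function $f(x) = x^+$ is not $C^2$, the usual Itô formula does not apply directly. A possible solution [21] is to use the Meyer-Itô formula for convex functions [34]. The approach used here is instead to regularize the function $f$, making it suitable for the usual Itô formula. Introduce the function
$$f_\varepsilon(x) = \frac{(x + \varepsilon/2)^2}{2\varepsilon} \mathbf{1}_{\{-\varepsilon/2 \le x \le \varepsilon/2\}} + x \mathbf{1}_{\{x > \varepsilon/2\}}.$$
Notice that $f_\varepsilon$ and $f$ are equal outside the interval $[-\varepsilon/2, \varepsilon/2]$. Direct calculation provides:
$$f'_\varepsilon(x) = \frac{x + \varepsilon/2}{\varepsilon} \mathbf{1}_{\{-\varepsilon/2 \le x \le \varepsilon/2\}} + \mathbf{1}_{\{x > \varepsilon/2\}}, \quad \text{and for } 2|x| \ne \varepsilon, \quad f''_\varepsilon(x) = \frac{1}{\varepsilon} \mathbf{1}_{\{-\varepsilon/2 \le x \le \varepsilon/2\}}.$$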
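Then, we may apply Itô's formula (with generalized derivatives, see Remark 6.3) to $e^{-rt} f_\varepsilon(S_t - K)$ between $T$ and $T + \theta$:
$$\begin{aligned}
e^{-r(T+\theta)} f_\varepsilon(S_{T+\theta} - K) - e^{-rT} f_\varepsilon(S_T - K) = {} & -r \int_T^{T+\theta} e^{-rt} f_\varepsilon(S_t - K)\,dt + \int_T^{T+\theta} e^{-rt} f'_\varepsilon(S_t - K)\,dS_t \\
& + \frac12 \int_T^{T+\theta} e^{-rt} f''_\varepsilon(S_t - K) \sigma^2(t, S_t) S_t^2\,dt.
\end{aligned} \tag{10.8}$$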
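The last term satisfies
$$\begin{aligned}
\int_T^{T+\theta} e^{-rt} f''_\varepsilon(S_t - K) \sigma^2(t, S_t) S_t^2\,dt = {} & \int_T^{T+\theta} dt\, e^{-rt} K^2 \sigma^2(t, K) \frac{1}{\varepsilon} \mathbf{1}_{\{K - \varepsilon/2 \le S_t \le K + \varepsilon/2\}} \\
& + \int_T^{T+\theta} dt\, e^{-rt} \big(S_t^2 \sigma^2(t, S_t) - K^2 \sigma^2(t, K)\big) \frac{1}{\varepsilon} \mathbf{1}_{\{K - \varepsilon/2 \le S_t \le K + \varepsilon/2\}}.
\end{aligned}$$
Using Assumption 10.3 above, the last term is dominated, up to a constant, by
$$\int_T^{T+\theta} dt\, e^{-rt} c(t)\, \varepsilon^{\alpha - 1} \mathbf{1}_{\{K - \varepsilon/2 \le S_t \le K + \varepsilon/2\}}. \tag{10.9}$$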
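Taking the expectation of each term in (10.8) under the square integrability assumption of the theorem (which ensures that the stochastic integral with respect to $B$ has zero expectation), we find
$$\begin{aligned}
e^{-r(T+\theta)} E\big[f_\varepsilon(S_{T+\theta} - K)\big] - e^{-rT} E\big[f_\varepsilon(S_T - K)\big] = {} & -r \int_T^{T+\theta} e^{-rt} E\big[f_\varepsilon(S_t - K)\big]\,dt + \int_T^{T+\theta} e^{-rt} E\big[f'_\varepsilon(S_t - K) S_t\big]\, r\,dt \\
& + \frac12 \int_T^{T+\theta} e^{-rt} K^2 \sigma^2(t, K) \frac{1}{\varepsilon} E\big[\mathbf{1}_{\{K - \varepsilon/2 \le S_t \le K + \varepsilon/2\}}\big]\,dt + O(\varepsilon^{\alpha}),
\end{aligned} \tag{10.10}$$
where the estimate $O(\varepsilon^{\alpha})$ for the last term is obtained using (10.9) and the continuous density assumption. By the square integrability of $S$, we can pass to the limit $\varepsilon \to 0$ (we take $t_0 = 0$ to lighten notation):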
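$$\begin{aligned}
C(T + \theta, K) - C(T, K) = {} & -r \int_T^{T+\theta} e^{-rt} E\big[(S_t - K)^+\big]\,dt + r \int_T^{T+\theta} e^{-rt} E\big[S_t \mathbf{1}_{\{S_t \ge K\}}\big]\,dt \\
& + \frac12 \int_T^{T+\theta} e^{-rt} \sigma^2(t, K) K^2 p(t, K)\,dt \\
= {} & rK \int_T^{T+\theta} e^{-rt}\, \mathbb{P}\big[S_t \ge K\big]\,dt + \frac12 \int_T^{T+\theta} e^{-rt} \sigma^2(t, K) K^2 p(t, K)\,dt.
\end{aligned}$$
Dividing both sides by $\theta$ and passing to the limit $\theta \to 0$, this gives
$$\frac{\partial C}{\partial T} = rK e^{-rT}\, \mathbb{P}\big[S_T \ge K\big] + \frac12 e^{-rT} \sigma^2(T, K) K^2 p(T, K).$$
Finally, observing that
$$e^{-rT}\, \mathbb{P}[S_T \ge K] = -\frac{\partial C}{\partial K} \quad \text{and} \quad e^{-rT} p(T, K) = \frac{\partial^2 C}{\partial K^2},$$
the proof of Dupire's equation is completed.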
The Dupire equation (10.7) can be used to deduce the volatility function $\sigma(\cdot, \cdot)$ from option prices. In a local volatility model, the volatility function $\sigma$ can therefore be uniquely recovered via
$$\sigma(T, K) = \sqrt{2\, \frac{\frac{\partial C}{\partial T} + rK \frac{\partial C}{\partial K}}{K^2 \frac{\partial^2 C}{\partial K^2}}}. \tag{10.11}$$
Notice that the fact that one can find a unique continuous Markov process from European option prices does not imply that there are no other models (non-Markovian or discontinuous) that produce the same European option prices. Knowledge of European option prices determines the marginal distributions of the process, but the law of the process is not limited to these marginal distributions.
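Formula (10.11) can be illustrated on synthetic data. In the sketch below, which is only an illustration under stated assumptions, call prices are generated by the Black-Scholes formula with a constant volatility and the derivatives in (10.11) are replaced by central finite differences; the step sizes `dT` and `dK` and the function names are hypothetical choices. The recovered local volatility should then be close to the constant used to generate the prices.

```python
from math import erfc, exp, log, sqrt


def norm_cdf(x):
    return 0.5 * erfc(-x / sqrt(2.0))


def bs_call(S, K, r, sigma, T):
    """Black-Scholes call price, used here only to generate test data."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d1 - sigma * sqrt(T))


def dupire_local_vol(C, T, K, r, dT=1e-3, dK=0.5):
    """Local volatility from (10.11); C is a callable (T, K) -> call price."""
    C_T = (C(T + dT, K) - C(T - dT, K)) / (2.0 * dT)
    C_K = (C(T, K + dK) - C(T, K - dK)) / (2.0 * dK)
    C_KK = (C(T, K + dK) - 2.0 * C(T, K) + C(T, K - dK)) / (dK * dK)
    numerator = C_T + r * K * C_K
    if C_KK <= 0.0 or numerator < 0.0:
        raise ValueError("prices violate the butterfly or calendar constraint")
    return sqrt(2.0 * numerator / (K * K * C_KK))
```

On market data the same finite differences amplify small price errors, which is one way to see the ill-posedness discussed in Section 10.3.1.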
Another feature of the Dupire representation is the following. Suppose that the true model (under the risk-neutral probability) can be written in the form
$$\frac{dS_t}{S_t} = r\,dt + \sigma_t\,dB_t,$$
where $\sigma$ is a general adapted process, and not necessarily a deterministic function of the underlying. It can be shown that in this case the square of Dupire's local volatility given by equation (10.11) coincides with the expectation of the squared stochastic volatility conditioned by the value of the underlying:
$$\sigma^2(t, S) = E\big[\sigma_t^2 \,\big|\, S_t = S\big].$$
Dupire's formula can therefore be used to find the Markovian diffusion which has the same marginal distributions as a given Itô martingale. In this sense, a local volatility surface can be seen as an arbitrage-free representation of a set of call prices for all strikes and all maturities just as the implied volatility represents the call price for a single strike and maturity.
Theorem 10.4 allows one to recover the volatility coefficient starting from a complete set of call prices at a given date if we know that these prices were produced by a local volatility model. It does not directly answer the following question: given a system of call option prices $(C(T, K))_{T \ge 0, K \ge 0}$, does there exist a continuous diffusion model reproducing these prices? To apply Dupire's formula (10.11), we need to at least assume that
$$\frac{\partial^2 C}{\partial K^2} > 0 \quad \text{and} \quad \frac{\partial C}{\partial T} + rK \frac{\partial C}{\partial K} \ge 0.$$
These constraints correspond to arbitrage constraints of the positivity of butterfly spreads and calendar spreads respectively.
A butterfly spread is a portfolio containing one call with strike $K - \Delta$, one call with strike $K + \Delta$ and a short position in two calls with strike $K$, where all the options have the same expiry date. Since the terminal pay-off of this portfolio is nonnegative, its price must be nonnegative at all dates: $C(K - \Delta) - 2C(K) + C(K + \Delta) \ge 0$. This shows that the prices of call (and put) options are convex in strike, which implies $\frac{\partial^2 C}{\partial K^2} > 0$ if the price is twice differentiable and the second derivative remains strictly positive.
A (modified) calendar spread is a combination of a call option with strike $K$ and maturity date $T + \theta$ with a short position in a call option with strike $Ke^{-r\theta}$ and maturity date $T$. The Call-Put parity implies that this portfolio has a nonnegative value at date $T$; its value must therefore be nonnegative at all dates before $T$: $C(K, T + \theta) \ge C(Ke^{-r\theta}, T)$. Passing to the limit $\theta \downarrow 0$ we have, under differentiability assumption,
$$\lim_{\theta \downarrow 0} \frac{C(K, T + \theta) - C(Ke^{-r\theta}, T)}{\theta} = \frac{\partial C}{\partial T} + rK \frac{\partial C}{\partial K} \ge 0.$$
Nevertheless, it may happen that the volatility $\sigma(t, s)$ exists but does not lead to a Markov process satisfying the three assumptions of Theorem 10.4. For example, models with jumps in stock prices typically lead to explosive volatility surfaces, for which the SDE (10.3) does not have a solution.
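The two arbitrage constraints can be tested directly on a price function before attempting to apply Dupire's formula. The sketch below is a hedged illustration: `C` is any callable `(K, T) -> call price` (the argument order follows the notation $C(K, T)$ used for the spreads above), and the Black-Scholes function with fixed default parameters serves only as example data; all names are hypothetical.

```python
from math import erfc, exp, log, sqrt


def norm_cdf(x):
    return 0.5 * erfc(-x / sqrt(2.0))


def bs_call(K, T, S=100.0, r=0.05, sigma=0.2):
    """Example price function (Black-Scholes), arguments ordered as C(K, T)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d1 - sigma * sqrt(T))


def butterfly_value(C, K, T, dK):
    """Price of the butterfly spread C(K - dK) - 2 C(K) + C(K + dK)."""
    return C(K - dK, T) - 2.0 * C(K, T) + C(K + dK, T)


def calendar_value(C, K, T, theta, r):
    """Price of the modified calendar spread C(K, T + theta) - C(K e^{-r theta}, T)."""
    return C(K, T + theta) - C(K * exp(-r * theta), T)
```

Both quantities are positive for Black-Scholes prices, while a price function that is concave in strike produces a negative butterfly value and would make the ratio under the square root in (10.11) negative.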
10.3.1 Dupire's formula in practice
Figure 10.3 shows the results of applying Dupire's formula to artificially simulated data (left) and real prices of options on the S&P 500 index (right). While on the simulated data the formula produces a smooth local volatility surface, its performance for real data is not satisfactory for several reasons:
• Market prices are not known for all strikes and all maturities. They must be interpolated and the final result is very sensitive to the interpolation method used.
• Because of the need to calculate the second derivative of the option price function $C(T, K)$, small data errors lead to very large errors in the solution (ill-posed problem).
Due to these two problems, in practice, Dupire's formula is not used directly on the market prices. To avoid solving the ill-posed problem, practitioners typically use one of two approaches:
• Start by a preliminary calibration of a parametric functional form to the implied volatility surface (for example, a function quadratic in strike and exponential in time may be used). With this smooth parametric function, recalculate option prices for all strikes, which are then used to calculate the local volatility by Dupire's formula.
• Reformulate Dupire's equation as an optimization problem by introducing a penalty term to limit the oscillations of the volatility surface. For example, Lagnado and Osher [30], Crepey [14] and other authors propose to minimize the functional
$$J(\sigma) := \sum_{i=1}^{N} w_i \big(C(T_i, K_i, \sigma) - C_M(T_i, K_i)\big)^2 + \alpha \|\nabla \sigma\|_2^2, \tag{10.12}$$
$$\|\nabla \sigma\|_2^2 := \int_{K_{\min}}^{K_{\max}} dK \int_{T_{\min}}^{T_{\max}} dT \Big\{ \Big(\frac{\partial \sigma}{\partial K}\Big)^2 + \Big(\frac{\partial \sigma}{\partial T}\Big)^2 \Big\}, \tag{10.13}$$
where $C_M(T_i, K_i)$ is the market price of the option with strike $K_i$ and expiry date $T_i$, and $C(T_i, K_i, \sigma)$ corresponds to the price of the same option computed with the local volatility surface $\sigma(\cdot, \cdot)$.
10.3.2 Link between local and implied volatility
Dupire's formula (10.11) can be rewritten in terms of market implied volatilities, observing that for every option,
$$C(T, K) = C_{BS}(T, K, I(T, K)),$$
where $C_{BS}(T, K, \sigma)$ denotes the Black-Scholes call price with volatility $\sigma$ and $I(T, K)$ is the implied volatility for strike $K$ and maturity date $T$.
[Figure 10.3: Examples of local volatility surface. Left: artificial data; the implied volatility is of the form $I(K) = 0.15 \times \frac{100}{K}$ for all maturities ($S_0 = 100$). Right: local volatility computed from S&P 500 option prices with spline interpolation.]
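Substituting this expression into Dupire's formula, we get
$$\begin{aligned}
\sigma^2(T, K) &= 2\, \frac{\frac{\partial C_{BS}}{\partial T} + \frac{\partial C_{BS}}{\partial \sigma} \frac{\partial I}{\partial T} + rK \Big(\frac{\partial C_{BS}}{\partial K} + \frac{\partial C_{BS}}{\partial \sigma} \frac{\partial I}{\partial K}\Big)}{K^2 \Big(\frac{\partial^2 C_{BS}}{\partial K^2} + 2 \frac{\partial^2 C_{BS}}{\partial K \partial \sigma} \frac{\partial I}{\partial K} + \frac{\partial^2 C_{BS}}{\partial \sigma^2} \Big(\frac{\partial I}{\partial K}\Big)^2 + \frac{\partial C_{BS}}{\partial \sigma} \frac{\partial^2 I}{\partial K^2}\Big)} \\
&= \frac{\frac{I}{T} + 2 \frac{\partial I}{\partial T} + 2rK \frac{\partial I}{\partial K}}{K^2 \Big(\frac{1}{K^2 I T} + 2 \frac{d_+}{K I \sqrt{T}} \frac{\partial I}{\partial K} + \frac{d_+ d_-}{I} \Big(\frac{\partial I}{\partial K}\Big)^2 + \frac{\partial^2 I}{\partial K^2}\Big)},
\end{aligned} \tag{10.14}$$
with the usual notation
$$d_\pm = \frac{\log\Big(\frac{S}{Ke^{-rT}}\Big) \pm \frac12 I^2 T}{I \sqrt{T}}.$$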
Suppose first that the implied volatility does not depend on the strike (no smile). In this case, the local volatility is also independent from the strike and equation (10.14) is reduced to
$$\sigma^2(T) = I^2(T) + 2 I(T) T \frac{\partial I}{\partial T},$$
and so
$$I^2(T) = \frac{\int_0^T \sigma^2(s)\,ds}{T}.$$
The implied volatility is thus equal to the root of mean squared local volatility over the lifetime of the option.
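As a small numerical illustration of this relation, the following sketch computes $I(T)$ from a time-dependent local volatility by a midpoint rule. The function name, the quadrature and the example volatility are illustrative assumptions.

```python
from math import sqrt


def implied_vol_from_term_structure(local_vol, T, n=10000):
    """I(T) = sqrt((1/T) * integral_0^T sigma(s)^2 ds), by the midpoint rule.

    local_vol is a callable s -> sigma(s).
    """
    h = T / n
    integral = sum(local_vol((i + 0.5) * h) ** 2 for i in range(n)) * h
    return sqrt(integral / T)
```

For instance, a local volatility equal to 20% during the first year and 30% during the second gives $I(2) = \sqrt{(0.04 + 0.09)/2} \approx 25.5\%$, which lies between the two levels.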
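To continue the study of equation (10.14), let us make a change of variable to switch from the strike $K$ to the log-moneyness variable $x = \log(S/\tilde K)$, with $I(T, K) = J(T, x)$. The equation (10.14) becomes
$$2JT \frac{\partial J}{\partial T} + J^2 - \sigma^2 \Big(1 - \frac{x}{J} \frac{\partial J}{\partial x}\Big)^2 - \sigma^2 JT \frac{\partial^2 J}{\partial x^2} + \frac14 \sigma^2 J^2 T^2 \Big(\frac{\partial J}{\partial x}\Big)^2 = 0.$$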
Assuming that $I$ and its derivatives remain bounded when $T \to 0$, we obtain by sending $T$ to 0:
$$J^2(0, x) = \sigma^2(0, x) \Big(1 - \frac{x}{J} \frac{\partial J}{\partial x}\Big)^2.$$
This differential equation can be solved explicitly:
$$J(0, x) = \Big\{ \int_0^1 \frac{dy}{\sigma(0, xy)} \Big\}^{-1}. \tag{10.15}$$
We have thus shown that, in the limit of very short time to maturity, the implied volatility is equal to the harmonic mean of local volatilities. This result was established by Berestycki and Busca [7]. When the local volatility $\sigma(0, x)$ is differentiable at $x = 0$, equation (10.15) allows one to prove that (the details are left to the reader)
$$\frac{\partial J(0, 0)}{\partial x} = \frac12 \frac{\partial \sigma(0, 0)}{\partial x}.$$
The slope of the local volatility at the money is equal, for short maturities, to twice the slope of the implied volatility.
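The harmonic-mean formula (10.15) and the slope relation can be checked numerically. The sketch below is an illustration under assumptions: the integral over $y \in [0, 1]$ is approximated by a midpoint rule, the local volatility $\sigma(0, x) = 0.2 + 0.1x$ is an arbitrary example, and the function name is hypothetical.

```python
def short_maturity_implied_vol(local_vol0, x, n=2000):
    """J(0, x) from (10.15): inverse of the integral of 1/sigma(0, x*y) over y in [0, 1].

    local_vol0 is a callable x -> sigma(0, x); midpoint rule with n cells.
    """
    h = 1.0 / n
    integral = sum(1.0 / local_vol0(x * (i + 0.5) * h) for i in range(n)) * h
    return 1.0 / integral


def atm_slope(local_vol0, dx=1e-3):
    """Central finite-difference estimate of the slope of J(0, x) at x = 0."""
    up = short_maturity_implied_vol(local_vol0, dx)
    down = short_maturity_implied_vol(local_vol0, -dx)
    return (up - down) / (2.0 * dx)
```

With the example $\sigma(0, x) = 0.2 + 0.1x$, the local volatility slope is 0.1 and the estimated at the money slope of the implied volatility is close to 0.05.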
This asymptotic makes it clear that the local volatility model, although it allows one to calibrate the prices of all options on a given date, does not reproduce the dynamic behavior of these prices well enough. Indeed, the market implied volatility systematically flattens out for long maturities (see Figure 10.1), which results in the flattening of the local volatility surface computed from Dupire's formula. Assuming that the model is correct and that the local volatility surface remains constant over time, we therefore find that the ATM slope of the implied volatility for very short maturities should systematically decrease with time, a property which is not observed in the data. This implies that the local volatility surface cannot remain constant but must evolve with time: $\sigma(T, K) = \sigma_t(T, K)$, an observation which leads to local stochastic volatility models.
Chapter 11
Gaussian interest rates
models
In this chapter, we provide an introduction to the modelling of the term structure of interest rates. We will develop a pricing theory for securities that depend on default-free interest rates or bond prices. The general approach will exploit the fact that bonds of many different maturities are driven by a few common factors. Therefore, in contrast with the previous theory developed for finite securities markets, we will be in the context where the number of traded assets is larger (in fact infinite) than the number of sources of randomness.
The first models introduced in the literature stipulate some given dynamics of the instantaneous interest rate process under the risk neutral measure $\mathbb{Q}$, which is assumed to exist. The prices of bonds of all maturities are then deduced by computing the expected values of the corresponding discounted payoff under $\mathbb{Q}$. We shall provide a detailed analysis of the most representative of this class, namely the Vasicek model. An important limitation of this class of models is that the yield curve predicted by the model does not match the observed yield curve, i.e. the calibration to the spot yield curve is not possible.
The Heath-Jarrow-Morton approach (1992) solves this calibration problem by taking the spot yield curve as the initial condition for the dynamics of the entire yield curve. The dynamics of the yield curve is driven by a finite-dimensional Brownian motion. In order to exclude all possible arbitrage opportunities, we will assume the existence of a risk neutral probability measure, for all bonds with all maturities. In the present context of a large financial market, this condition leads to the so-called Heath-Jarrow-Morton restriction which states that the dynamics of the yield curve is defined by the volatility process of zero-coupon bonds together with a risk premia process which is common to all bonds with all maturities.
Finally, a complete specification of an interest rates model requires the specification of the volatility of bonds. As in the context of finite securities markets, this is achieved by a calibration technique to the options markets. We therefore provide an introduction to the main tools for the analysis of fixed income derivatives. An important concept is the notion of forward neutral measure, which turns the forward price processes with pre-specified maturity into martingales. In the simplest models defined by deterministic volatilities of zero-coupon bonds, this allows one to express the prices of European options on zero-coupon bonds in closed form by means of a Black-Scholes type of formula. The structure of implied volatilities extracted from these prices provides a powerful tool for the calibration of the yield curve to spot interest rates and options.
11.1 Fixed income terminology
11.1.1 Zero-coupon bonds
Throughout this chapter, we will denote by $P_t(T)$ the price at time $t$ of a pure discount bond paying 1 at date $T \ge t$. By definition, we have $P_T(T) = 1$.
In real financial markets, the prices $P_t(T_i)$ are available, at each time $t$, for various maturities $T_i$, $i = 1, \dots, k$. We shall see later that these data are directly available for maturities shorter than one year, and can be extracted from bond prices for larger maturities by the so-called bootstrapping technique.
Since the integer $k$ recording the number of maturities is typically large, the models developed below allow for trading the zero-coupon bonds $P_t(T)$ for any $T > t$. We are then in the context of infinitely many risky assets. This is a first major difference with the theory of derivative securities on stocks developed in previous chapters.
The second specificity is that the zero-coupon bond with price $P_t(T)$ today will be a different asset at a later time $u \in [t, T]$ as its time to maturity is shortened to $T - u$. This leads to important arbitrage restrictions.
Given the prices of all zero-coupon bonds, one can derive an unambiguous price of any deterministic income stream: consider an asset which pays $F_i$ at each time $t_i$, $i = 1, \dots, n$. Then the no-arbitrage price at time 0 of this asset is
$$\sum_{i=1}^{n} F_i P_0(t_i).$$
If $F_i$ is random, the above formula does not hold true, as the correlation between $F_i$ and interest rates enters into the picture.
Coupon-bearing bonds are quoted on financial markets. Their prices are
related to the prices of zero-coupon bonds by
\[ P_0 = \sum_{i=1}^n c\, P_0(T_i) + K P_0(T) = \sum_{i=1}^n \rho K P_0(T_i) + K P_0(T), \]
where $c = \rho K$ is the coupon corresponding to the pre-assigned interest $\rho > 0$,
$T_1 \le \ldots \le T_n \le T$ are the dates where the coupons are paid, and $K$ is the
Principal (or face value) to be paid at the maturity $T$.
The yield to maturity is defined as the (unique!) scalar $Y_0$ such that
\[ P_0 = \sum_{i=1}^n c\, e^{-Y_0 T_i} + K e^{-Y_0 T}. \]
The bond is said to be priced
- at par if $\rho = Y_0$,
- below par if $\rho < Y_0$,
- above par if $\rho > Y_0$.
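The defining equation of the yield to maturity has no closed-form solution in general, but the right-hand side is strictly decreasing in $Y_0$, which is why the solution is unique and why a bisection search is enough. The following is a minimal sketch; the function names and the cash flows (coupon 5, principal 100, annual dates) are hypothetical and chosen only for illustration.

```python
import math

def bond_price_from_yield(y, coupon, principal, coupon_dates, maturity):
    """Price of a coupon bond when every cash flow is discounted at the flat rate y."""
    coupons = sum(coupon * math.exp(-y * t) for t in coupon_dates)
    return coupons + principal * math.exp(-y * maturity)

def yield_to_maturity(price, coupon, principal, coupon_dates, maturity,
                      lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection on y: the price is strictly decreasing in y, so the root is unique."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bond_price_from_yield(mid, coupon, principal, coupon_dates, maturity) > price:
            lo = mid   # model price too high, so the yield must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical bond: coupons of 5 at years 1, 2, 3 and principal 100 at year 3.
dates = [1.0, 2.0, 3.0]
p = bond_price_from_yield(0.04, 5.0, 100.0, dates, 3.0)
y0 = yield_to_maturity(p, 5.0, 100.0, dates, 3.0)
```

Here the bond is first priced at a known yield of 4%, and the solver is then asked to recover that yield from the price alone.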
Only short term zero-coupon bonds are quoted on the market (less than
one-year maturity). Zero-coupon bond prices for longer maturities are inferred
from coupon-bearing bonds (or interest rate swaps introduced below).
On the US market, Government debt securities are called:
- Treasury bills (T-bills): zero-coupon bonds with maturity $\le$ 1 year,
- Treasury notes (T-notes): coupon-bearing with maturity $\le$ 10 years,
- Treasury bonds (T-bonds): coupon-bearing with maturity > 10 years.
A government bond is traded in terms of its price, which is quoted in terms of
its face value.
11.1.2 Interest rates swaps
Let $T_0 > 0$, $\delta > 0$, $T_i = T_0 + i\delta$, $i = 1, \ldots, n$, and denote by
$\mathcal{T} := \{T_0 < T_1 < \ldots < T_n\}$ the set of such defined maturities. We denote by
$L(T_{j-1})$ the LIBOR rate at time $T_{j-1}$, i.e. the floating rate received at time $T_j$
and set at time $T_{j-1}$ by reference to the price of the zero-coupon bond over that period:
\[ P_{T_{j-1}}(T_j) = \frac{1}{1 + \delta L(T_{j-1})}. \]
The interest rate swap is defined by the comparison of the two following streams
of payments:
- the floating leg: consists of the payments $\delta L(T_{j-1})$ at each maturity $T_j$
  for $j = 1, \ldots, n$, and the unit payment (1) at the final maturity $T_n$,
- the fixed leg: consists of the payments $\delta\kappa$ at each maturity $T_j$ for
  $j = 1, \ldots, n$, and the unit payment (1) at the final maturity $T_n$, for some
  given constant rate $\kappa$.
The interest rate swap rate corresponding to the set of maturities $\mathcal{T}$ is defined
as the constant rate $\kappa$ which equates the values at time $T_0$ of the above floating
and fixed legs. Direct calculation leads to the following expression of the swap
rate:
\[ \kappa_{T_0}(\delta, n) = \frac{1 - P_{T_0}(T_n)}{\delta \sum_{j=1}^n P_{T_0}(T_j)}. \]
We leave the verification of this formula as an exercise.
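As a numerical companion to the swap rate formula, the sketch below computes the rate from a vector of discount factors and checks that the fixed leg is then worth 1 at time $T_0$, which is the value of the floating leg (including its final unit payment). The flat 3% curve and the semi-annual schedule are hypothetical inputs.

```python
import math

def swap_rate(discount_factors, delta):
    """Swap rate from the discount factors P_{T0}(T_1), ..., P_{T0}(T_n)."""
    annuity = delta * sum(discount_factors)
    return (1.0 - discount_factors[-1]) / annuity

# Hypothetical flat curve: continuously compounded yield of 3%, semi-annual dates.
delta, n, y = 0.5, 10, 0.03
dfs = [math.exp(-y * delta * j) for j in range(1, n + 1)]
kappa = swap_rate(dfs, delta)

# Value at T0 of the fixed leg: coupons delta*kappa plus the final unit payment.
fixed_leg = sum(delta * kappa * p for p in dfs) + dfs[-1]
```

On a flat curve the swap rate reduces to the simply compounded rate over one period, here $(e^{0.015} - 1)/0.5$.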
11.1.3 Yields from zero-coupon bonds
We define the yields corresponding to zero-coupon bonds by
\[ P_t(T) = e^{-(T-t)R_t(T)}, \quad \text{i.e.} \quad R_t(T) := -\frac{\ln P_t(T)}{T - t}. \]
The term structure of interest rates represents at each time $t$ the curve
$T \mapsto R_t(T)$ of yields on zero-coupon bonds for all maturities $T > 0$. It is also
commonly called the yield curve.
In practice, there are many different term structures of interest rates,
depending on whether zero-coupon bonds are deduced
- from observed bond prices,
- from observed swap prices,
- from bonds issued by the government of some country, or by a firm with
  some confidence on its liability (rating).
We conclude this section with some stylized facts observed on real financial
market data:
a- The term structure of interest rates exhibits different shapes: (almost) flat,
increasing (most frequently observed), decreasing, decreasing for short term
maturities then increasing, increasing for short term maturities then decreasing.
Examples of observed yield curves are displayed in Figure 11.1 below.
b- Interest rates are positive: we shall however make use of gaussian models
which offer more analytic solutions, although negative values are allowed by
such models (but with very small probability).
c- Interest rates exhibit mean reversion, i.e. oscillate around some average level
and tend to be attracted to it. See Figure 11.2 below.
d- Interest rates for various maturities are not perfectly correlated.
e- Short term interest rates are more volatile than long term interest rates. See
Figure 11.3.
11.1.4 Forward Interest Rates
The forward rate $F_t(T)$ is the rate at which agents are willing, at date $t$, to
borrow or lend money over the period $[T, T + h]$ for $h \searrow 0$. It can be
deduced directly from the zero-coupon bonds $\{P_t(T), T \ge t\}$ by the following
no-arbitrage argument:
- start from the initial capital $P_t(T)$, and lend it over the short periods
  $[t, t + h]$, $[t + h, t + 2h]$, ..., at rates agreed now; for $h \searrow 0$, this strategy
  yields the payoff $P_t(T)\, e^{\int_t^T F_t(u)du}$.
- Since the alternative strategy of buying one discount bond, which costs $P_t(T)$,
  yields a certain unit payoff (1), it must be the case that
\[ P_t(T) = e^{-\int_t^T F_t(u)du}. \]
Figure 11.1: Various shapes of yield curves from real data
In terms of yields to maturity, the above relation can be rewritten as:
\[ R_t(T) = \frac{1}{T - t} \int_t^T F_t(u)du. \]
So zero-coupon bonds and the corresponding yields can be defined from forward
rates. Conversely, forward rates can be obtained by
\[ F_t(T) = \frac{\partial}{\partial T}\big\{(T - t)R_t(T)\big\} \quad \text{and} \quad F_t(T) = -\frac{\partial}{\partial T}\ln P_t(T). \]
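The two relations between yields and forward rates are inverse to each other, which can be checked numerically. The sketch below starts from a hypothetical smooth yield curve $R_0(T)$ (an assumption made only for this illustration), differentiates $T R_0(T)$ by a central difference to get the forward rates, and then averages the forward rates to recover the yield.

```python
import math

def yield_curve(T):
    """Hypothetical spot yield curve R_0(T), increasing from 2% towards 5%."""
    return 0.05 - 0.03 * math.exp(-0.5 * T)

def forward_rate(T, h=1e-5):
    """F_0(T) = d/dT [T * R_0(T)], approximated by a central difference."""
    up = (T + h) * yield_curve(T + h)
    down = (T - h) * yield_curve(T - h)
    return (up - down) / (2.0 * h)

def yield_from_forwards(T, steps=2000):
    """R_0(T) = (1/T) * integral_0^T F_0(u) du, by the midpoint rule."""
    du = T / steps
    return sum(forward_rate((i + 0.5) * du) for i in range(steps)) * du / T

T = 5.0
recovered = yield_from_forwards(T)
```

For an increasing yield curve such as this one, the forward rate lies above the yield of the same maturity, since the yield is an average of the forward rates up to that maturity.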
11.1.5 Instantaneous interest rates
The instantaneous interest rate is given by
\[ r_t = R_t(t) = F_t(t). \]
It does not correspond to any tradable asset, and is not directly observable.
However, assuming that the market admits a risk neutral measure $Q$, it follows
from the valuation theory developed in the previous chapter that:
\[ P_t(T) = E^Q\Big[ e^{-\int_t^T r_u du} \,\Big|\, \mathcal{F}_t \Big]. \]
Hence, given the dynamics of the instantaneous interest rates under $Q$, we may
deduce the prices of zero-coupon bonds.
Figure 11.2: Mean reversion of interest rates

Figure 11.3: Volatility of interest rates is decreasing in terms of maturity
11.2 The Vasicek model
Consider the process $\rho$ defined by:
\[ \rho_t = \rho_0 + m\lambda \int_0^t e^{\lambda s} ds + \sigma \int_0^t e^{\lambda s} dB_s
          = \rho_0 + m\big(e^{\lambda t} - 1\big) + \sigma \int_0^t e^{\lambda s} dB_s, \]
where $B$ is a Brownian motion under the risk neutral measure $Q$. Observe that
the above stochastic integral is well-defined by Theorem 6.1. The Vasicek model
(1977) assumes that the instantaneous interest rate is given by:
\[ r_t := \rho_t e^{-\lambda t} \quad \text{for } t \ge 0. \]
An immediate application of Itô's formula provides the following dynamics of
the interest rate process:
\[ dr_t = \lambda(m - r_t)dt + \sigma dB_t. \]
This is the so-called Ornstein-Uhlenbeck process in the theory of stochastic
processes. Observe that this process satisfies the mean reversion property around
the level $m$ with intensity $\lambda$:
- if $r_t < m$, then the drift is positive, and the interest rate is pushed upward
  with the intensity $\lambda$,
- if $r_t > m$, then the drift is negative, and the interest rate is pushed downward
  with the intensity $\lambda$.
The process $\{r_t, t \ge 0\}$ is explicitly given by
\[ r_t = m + (r_0 - m)e^{-\lambda t} + \sigma \int_0^t e^{-\lambda(t-u)} dB_u. \tag{11.1} \]
Using the Itô isometry, this shows that $\{r_t, t \ge 0\}$ is a gaussian process with
mean
\[ E^Q[r_t] = m + (r_0 - m)e^{-\lambda t}, \quad t \ge 0, \]
and covariance function
\[ \mathrm{Cov}^Q[r_t, r_s] = \sigma^2 E^Q\Big[ \int_0^t e^{-\lambda(t-u)} dB_u \int_0^s e^{-\lambda(s-u)} dB_u \Big]
   = \frac{\sigma^2}{2\lambda}\Big( e^{-\lambda|t-s|} - e^{-\lambda(t+s)} \Big), \quad \text{for } s, t \ge 0. \]
In particular, this model allows for negative interest rates with positive (but
small) probability!
For fixed $t > 0$, the distribution under $Q$ of $r_t$ is $N\big(E^Q[r_t], V^Q[r_t]\big)$, which
converges to the stationary distribution
\[ N\Big(m, \frac{\sigma^2}{2\lambda}\Big) \quad \text{as } t \to \infty. \]
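The mean and variance of $r_t$ can be checked by simulation. The sketch below is only an illustration with hypothetical parameter values: it advances the rate with the exact Gaussian transition of the Ornstein-Uhlenbeck process over each step, which follows from (11.1) applied between two consecutive dates, and compares the sample moments at time $t$ with the formulas above.

```python
import math
import random

def simulate_vasicek(r0, lam, m, sigma, t, n_steps, n_paths, seed=1):
    """Simulate r_t with the exact Gaussian transition of the Ornstein-Uhlenbeck process."""
    rng = random.Random(seed)
    dt = t / n_steps
    decay = math.exp(-lam * dt)
    step_sd = sigma * math.sqrt((1.0 - decay**2) / (2.0 * lam))
    finals = []
    for _ in range(n_paths):
        r = r0
        for _ in range(n_steps):
            # r_{s+dt} = m + (r_s - m) e^{-lam dt} + Gaussian noise
            r = m + (r - m) * decay + step_sd * rng.gauss(0.0, 1.0)
        finals.append(r)
    return finals

# Hypothetical parameters.
r0, lam, m, sigma, t = 0.01, 1.0, 0.05, 0.02, 2.0
sample = simulate_vasicek(r0, lam, m, sigma, t, n_steps=20, n_paths=5000)
sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
theory_mean = m + (r0 - m) * math.exp(-lam * t)
theory_var = sigma**2 / (2.0 * lam) * (1.0 - math.exp(-2.0 * lam * t))
```

With these values the rate has already moved most of the way from $r_0 = 1\%$ towards $m = 5\%$ at $t = 2$, and its variance is close to the stationary value $\sigma^2/(2\lambda)$.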
11.3 Zero-coupon bonds prices
Recall that the price at time 0 of a zero-coupon bond with maturity $T$ is given
by
\[ P_0(T) = E^Q\Big[ e^{-\int_0^T r_t dt} \Big]. \]
In order to develop the calculation of this expectation, we now provide the
distribution of the random variable $\int_0^T r_t dt$ by integrating (11.1):
\[ \int_0^T r_t dt = mT + (r_0 - m)\int_0^T e^{-\lambda t} dt + \sigma \int_0^T \int_0^t e^{-\lambda(t-u)} dB_u\, dt \tag{11.2} \]
\[ \phantom{\int_0^T r_t dt} = mT + (r_0 - m)\frac{1 - e^{-\lambda T}}{\lambda} + \sigma \int_0^T \int_0^t e^{-\lambda(t-u)} dB_u\, dt. \tag{11.3} \]
In order to derive the distribution of the last double integral, we need to reverse
the order of integration. To do this, we introduce the process
$Y_t := \int_0^t e^{\lambda u} dB_u$, $t \ge 0$, and we compute by Itô's formula that
\[ d\big(e^{-\lambda t} Y_t\big) = e^{-\lambda t} dY_t - \lambda e^{-\lambda t} Y_t dt = dB_t - \lambda e^{-\lambda t} Y_t dt. \]
Integrating between 0 and $T$ and recalling the expression of $Y_t$, this provides:
\[ \lambda \int_0^T e^{-\lambda t} \int_0^t e^{\lambda u} dB_u\, dt = B_T - e^{-\lambda T} Y_T = \int_0^T \big(1 - e^{-\lambda(T-t)}\big) dB_t. \]
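Plugging this expression into (11.2), we obtain:
\[ \int_0^T r_t dt = mT + (r_0 - m)\Lambda(T) + \sigma \int_0^T \Lambda(T - t) dB_t
   \quad \text{where} \quad \Lambda(u) := \frac{1 - e^{-\lambda u}}{\lambda}. \]
This shows that
\[ \int_0^T r_t dt \ \text{ is distributed as } \ N\Big( E^Q\Big[\int_0^T r_t dt\Big],\ V^Q\Big[\int_0^T r_t dt\Big] \Big), \tag{11.4} \]
where
\[ E^Q\Big[\int_0^T r_t dt\Big] = mT + (r_0 - m)\Lambda(T), \qquad
   V^Q\Big[\int_0^T r_t dt\Big] = \sigma^2 \int_0^T \Lambda(u)^2 du. \]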
Given this explicit distribution, we can now compute the prices at time zero of
the zero-coupon bonds:
\[ P_0(T) = \exp\Big( -mT - (r_0 - m)\Lambda(T) + \frac{\sigma^2}{2} \int_0^T \Lambda(u)^2 du \Big) \quad \text{for } T \ge 0. \]
Since the above Vasicek model is time-homogeneous, we can deduce the price
at any time $t \ge 0$ of the zero-coupon bonds with all maturities:
\[ P_t(T) = \exp\Big( -m(T - t) - (r_t - m)\Lambda(T - t) + \frac{\sigma^2}{2} \int_0^{T-t} \Lambda(u)^2 du \Big) \tag{11.5} \]
for $T \ge t \ge 0$. The term structure of interest rates is also immediately obtained:
\[ R_t(T) = -\frac{\ln P_t(T)}{T - t} = m + (r_t - m)\frac{\Lambda(T - t)}{T - t} - \frac{\sigma^2}{2(T - t)} \int_0^{T-t} \Lambda(u)^2 du. \tag{11.6} \]
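Formulas (11.5) and (11.6) are easy to evaluate once the integral of $\Lambda^2$ is written in closed form; expanding the square gives $\int_0^\tau \Lambda(u)^2 du = \lambda^{-2}\big[\tau - 2\Lambda(\tau) + (1 - e^{-2\lambda\tau})/(2\lambda)\big]$. The sketch below uses this expression to compute a Vasicek yield curve; the function names and the parameter values are hypothetical.

```python
import math

def Lambda(u, lam):
    return (1.0 - math.exp(-lam * u)) / lam

def integral_Lambda_sq(tau, lam):
    """Closed form of int_0^tau Lambda(u)^2 du."""
    return (tau - 2.0 * Lambda(tau, lam)
            + (1.0 - math.exp(-2.0 * lam * tau)) / (2.0 * lam)) / lam**2

def vasicek_bond_price(r, tau, lam, m, sigma):
    """P_t(T) from (11.5), with tau = T - t and r = r_t."""
    return math.exp(-m * tau - (r - m) * Lambda(tau, lam)
                    + 0.5 * sigma**2 * integral_Lambda_sq(tau, lam))

def vasicek_yield(r, tau, lam, m, sigma):
    """R_t(T) from (11.6)."""
    return -math.log(vasicek_bond_price(r, tau, lam, m, sigma)) / tau

# Hypothetical parameters: short rate at 2%, mean reversion level at 5%.
lam, m, sigma, r = 0.5, 0.05, 0.01, 0.02
curve = [(tau, vasicek_yield(r, tau, lam, m, sigma)) for tau in (0.25, 1, 2, 5, 10, 30)]
```

The resulting curve starts near the short rate $r_t$ for small maturities and flattens for long maturities towards $m - \sigma^2/(2\lambda^2)$, which follows from letting $T - t \to \infty$ in (11.6).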
Exercise 11.1. Show that the joint distribution of the pair $\big(r_T, \int_0^T r_t dt\big)$ is
gaussian, and provide its characteristics in explicit form. Hint: compute its
Laplace transform.
We conclude this section by deducing from (11.5) the dynamics of the price
process of the zero-coupon bonds, by a direct application of Itô's formula. An
important observation for this calculation is that the drift term in this
differential representation is already known: $dP_t(T) = P_t(T) r_t dt + (\cdots) dB_t$, since
$\{P_t(T), 0 \le t \le T\}$ is the price of a security traded on the financial market.
Therefore, we only need to compute the volatility coefficient of the zero-coupon
price process. This is immediately obtained from (11.5):
\[ \frac{dP_t(T)}{P_t(T)} = r_t dt - \sigma \Lambda(T - t) dB_t, \quad t < T. \]
11.4 Calibration to the spot yield curve and the generalized Vasicek model
An important requirement that the interest rate model must satisfy is to reproduce
the observed market data for the zero-coupon bond prices
\[ B^*_0(T), \quad T \ge 0, \]
or equivalently, the spot yield curve at time zero
\[ R^*_0(T), \quad T \ge 0, \]
or equivalently the spot forward rates curve
\[ F^*_0(T), \quad T \ge 0. \]
In practice, the prices of the zero-coupon bonds for some given maturities are
either observed (for maturities shorter than one year), or extracted from
coupon-bearing bonds or interest rate swaps; the yield curve is then constructed by an
interpolation method.
Since the Vasicek model is completely determined by the choice of the four
parameters $r_0, \lambda, m, \sigma$, there is no hope for the yield curve (11.6) predicted by
this model to match some given observed spot yield curve $R^*_0(T)$ for every $T \ge 0$.
In other words, the Vasicek model cannot be calibrated to the spot yield curve.
Hull and White (1992) suggested a slight extension of the Vasicek model
which solves this calibration problem. In order to meet the infinite number
of constraints imposed by the calibration problem, they suggest modeling the
instantaneous interest rates by
\[ r_t := \rho_t e^{-\lambda t} \quad \text{where} \quad \rho_t = \rho_0 + \lambda \int_0^t m(s) e^{\lambda s} ds + \sigma \int_0^t e^{\lambda s} dB_s, \]
which provides the dynamics of the instantaneous interest rates:
\[ dr_t = \lambda\big(m(t) - r_t\big) dt + \sigma dB_t. \tag{11.7} \]
Here, the key point is that $m(\cdot)$ is a deterministic function to be determined by
the calibration procedure. This extension increases the number of parameters
of the model, while keeping the main features of the model: mean reversion,
gaussian distribution, etc.
All the calculations of the previous section can be reproduced in this context.
One can show that the above extension of the Vasicek model defines a Gaussian
process. The distribution of the cumulated interest rate can also be shown to
be gaussian, and its mean and variance can be computed explicitly. The details
of these computations are left as an exercise for the reader.
In the present context, the calibration to the spot yield curve is possible by
choosing:
\[ m(T) = \frac{e^{-\lambda T}}{\lambda} \frac{\partial}{\partial T}\Big\{ e^{\lambda T}\Big( F^*_0(T) + \frac{1}{2}\sigma^2 \Lambda(T)^2 \Big) \Big\}. \]
Of course, such a calibration must be updated at every time instant. This means
that the coefficient $m(\cdot)$, which is supposed to be deterministic, will typically be
re-fixed at every time instant by the calibration procedure. Hence, similarly to
the implied volatility parameter in the case of European options on stocks, the
Hull-White model is based on a gaussian model, but its practical implementation
violates its founding assumptions by allowing for a stochastic evolution of the
mean reversion level $m(\cdot)$.
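A simple sanity check of the calibration formula is to feed it a forward curve generated by the constant-parameter Vasicek model: the calibrated function $m(\cdot)$ should then be constant and equal to $m$. The sketch below does this with a central finite difference for the derivative in $T$. The Vasicek forward curve $F_0(T) = m + (r_0 - m)e^{-\lambda T} - \frac{1}{2}\sigma^2\Lambda(T)^2$ used as input is obtained by differentiating $-\ln P_0(T)$ in (11.5); all parameter values are hypothetical.

```python
import math

LAM, SIGMA, M_TRUE, R0 = 0.8, 0.02, 0.05, 0.03   # hypothetical parameters

def Lambda(u):
    return (1.0 - math.exp(-LAM * u)) / LAM

def vasicek_forward(T):
    """Forward curve F_0(T) implied by (11.5) with constant m = M_TRUE."""
    return M_TRUE + (R0 - M_TRUE) * math.exp(-LAM * T) - 0.5 * SIGMA**2 * Lambda(T)**2

def calibrated_m(forward, T, h=1e-5):
    """m(T) = e^{-lam T}/lam * d/dT { e^{lam T} (F_0(T) + sigma^2 Lambda(T)^2 / 2) }."""
    def g(s):
        return math.exp(LAM * s) * (forward(s) + 0.5 * SIGMA**2 * Lambda(s)**2)
    dg = (g(T + h) - g(T - h)) / (2.0 * h)   # central difference in T
    return math.exp(-LAM * T) / LAM * dg

m_values = [calibrated_m(vasicek_forward, T) for T in (0.5, 1.0, 5.0, 10.0)]
```

In practice the forward curve comes from interpolated market data rather than from a model, so the derivative in $T$ is sensitive to the interpolation method; this is one reason why a smooth interpolation is usually preferred.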
11.5 Multiple Gaussian factors models
In the one factor Vasicek model and its Hull-White extension, all the
yields-to-maturity $R_t(T)$ are linear in the spot interest rate $r_t$, see (11.6). An immediate
consequence of this model is that yields corresponding to different maturities
are perfectly correlated:
\[ \mathrm{Cor}^Q\big[ R_t(T), R_t(T') \,\big|\, \mathcal{F}_t \big] = 1, \]
which is not consistent with empirical observation. The purpose of this section is
to introduce a simple model which avoids this perfect correlation, while keeping
the analytical tractability: the two-factor Hull-White model.
Let $X$ and $Y$ be two factors driven by the centred Vasicek model:
\[ dX_t = -\lambda X_t dt + \sigma dB_t, \qquad dY_t = -\theta Y_t dt + \zeta dB^*_t. \]
Here $\lambda$, $\sigma$, $\theta$, and $\zeta$ are given parameters, and $B$, $B^*$ are two independent
Brownian motions under the risk neutral measure $Q$. The instantaneous interest rate
is modelled as an affine function of the factors $X$, $Y$:
\[ r_t = a(t) + X_t + Y_t. \]
This two-factor model is then defined by four parameters and one deterministic
function $a(t)$ to be determined by calibration on the market data.
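Exploiting the independence of the factors $X$ and $Y$, we immediately
compute that
\[ P_t(T) = E^Q\Big[ e^{-\int_t^T r_u du} \,\Big|\, \mathcal{F}_t \Big]
          = e^{-\int_t^T a(u)du}\, E^Q\Big[ e^{-\int_t^T X_u du} \,\Big|\, \mathcal{F}_t \Big]\, E^Q\Big[ e^{-\int_t^T Y_u du} \,\Big|\, \mathcal{F}_t \Big] \]
\[ \phantom{P_t(T)} = \exp\big[ -A(t, T) - \Lambda(T - t)X_t - \Theta(T - t)Y_t \big], \]
where
\[ A(t, T) := \int_t^T a(u)du - \frac{\sigma^2}{2}\int_0^{T-t} \Lambda(u)^2 du - \frac{\zeta^2}{2}\int_0^{T-t} \Theta(u)^2 du \]
and
\[ \Lambda(u) := \frac{1 - e^{-\lambda u}}{\lambda}, \qquad \Theta(u) := \frac{1 - e^{-\theta u}}{\theta}. \]
The latter explicit expression is derived by analogy with the previous
computation in (11.5), applied to each centred factor. The term structure of interest rates is now given by
\[ R_t(T) = \frac{1}{T - t}\big[ A(t, T) + \Lambda(T - t)X_t + \Theta(T - t)Y_t \big], \]
and the yields with different maturities are not perfectly correlated. Further
explicit calculation can be performed in order to calibrate this model to the
spot yield curve by fixing the function $a(\cdot)$. We leave this calculation as an
exercise for the reader.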
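To see how the second factor breaks the perfect correlation, one can compute the correlation between two yields from their factor loadings $\Lambda(\tau)/\tau$ and $\Theta(\tau)/\tau$. The sketch below is an illustration under an additional assumption that is not made in the text: the two independent factors are taken at their stationary variances $\sigma^2/(2\lambda)$ and $\zeta^2/(2\theta)$, and the correlation is the unconditional one. Parameter values are hypothetical.

```python
import math

def loading(speed, tau):
    """Factor loading (1 - e^{-speed*tau}) / (speed*tau) of a yield with time to maturity tau."""
    return (1.0 - math.exp(-speed * tau)) / (speed * tau)

def yield_correlation(tau1, tau2, lam, sigma, theta, zeta):
    # Assumption: factors at their stationary variances, independent of each other.
    var_x, var_y = sigma**2 / (2.0 * lam), zeta**2 / (2.0 * theta)
    a1, b1 = loading(lam, tau1), loading(theta, tau1)
    a2, b2 = loading(lam, tau2), loading(theta, tau2)
    cov = a1 * a2 * var_x + b1 * b2 * var_y
    return cov / math.sqrt((a1**2 * var_x + b1**2 * var_y) * (a2**2 * var_x + b2**2 * var_y))

# One fast and one slow factor: the 1y and 10y yields are no longer perfectly correlated.
rho_two_factor = yield_correlation(1.0, 10.0, lam=1.0, sigma=0.01, theta=0.05, zeta=0.01)
# Equal mean reversion speeds: both factors load identically, correlation is back to 1.
rho_degenerate = yield_correlation(1.0, 10.0, lam=1.0, sigma=0.01, theta=1.0, zeta=0.01)
```

The degenerate case shows that the decorrelation comes from the difference between the two mean reversion speeds, not merely from the presence of two Brownian motions.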
We finally comment on the interpretation of the factors. It is usually
desirable to write the above model in terms of factors which can be identified on
the financial market. A possible parameterization is obtained by projecting the
model on the short and long rates. This is achieved as follows:
- By direct calculation, we find the expression of the short rate in terms of
  the factors:
\[ r_t = X_t + Y_t + A_T(t, t), \]
  where
\[ A_T(t, t) = \frac{\partial A}{\partial T}\Big|_{T=t} = a(t) - \frac{\sigma^2}{2}\Lambda(0)^2 - \frac{\zeta^2}{2}\Theta(0)^2 = a(t). \]
- Fix some positive time-to-maturity $\bar{T}$ (say, 30 years), and let
\[ \ell_t = R_t(t + \bar{T}) \]
  represent the long rate at time $t$. In the above two-factor model, we have
\[ \ell_t = \frac{1}{\bar{T}}\big[ A(t, t + \bar{T}) + \Lambda(\bar{T})X_t + \Theta(\bar{T})Y_t \big]. \]
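- In order to express the factors in terms of the short and the long rates,
  we now solve the linear system
\[ \begin{pmatrix} r_t \\ \ell_t \end{pmatrix} = D \begin{pmatrix} X_t \\ Y_t \end{pmatrix}
   + \begin{pmatrix} A_T(t, t) \\ A(t, t + \bar{T})/\bar{T} \end{pmatrix} \tag{11.8} \]
  where
\[ D := \begin{pmatrix} 1 & 1 \\ \Lambda(\bar{T})/\bar{T} & \Theta(\bar{T})/\bar{T} \end{pmatrix}. \]
  Assuming that $\lambda \ne \theta$ and $\bar{T} \ne 0$, it follows that the matrix $D$ is invertible,
  and the above system allows us to express the factors in terms of the short
  and long rates:
\[ \begin{pmatrix} X_t \\ Y_t \end{pmatrix} = D^{-1}\Big[ \begin{pmatrix} r_t \\ \ell_t \end{pmatrix}
   - \begin{pmatrix} A_T(t, t) \\ A(t, t + \bar{T})/\bar{T} \end{pmatrix} \Big]. \tag{11.9} \]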
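- Finally, using the dynamics of the factors $(X, Y)$, it follows from (11.8) and
  (11.9) that the dynamics of the short and long rates are given by:
\[ \begin{pmatrix} dr_t \\ d\ell_t \end{pmatrix} = K\Big[ b(t, \bar{T}) - \begin{pmatrix} r_t \\ \ell_t \end{pmatrix} \Big] dt
   + D \begin{pmatrix} \sigma\, dB_t \\ \zeta\, dB^*_t \end{pmatrix} \]
  where
\[ K := D \begin{pmatrix} \lambda & 0 \\ 0 & \theta \end{pmatrix} D^{-1}, \]
  and
\[ b(t, \bar{T}) := K^{-1} \begin{pmatrix} (A_{tT} + A_{TT})(t, t) \\ (A_t + A_T)(t, t + \bar{T})/\bar{T} \end{pmatrix}
   + \begin{pmatrix} A_T(t, t) \\ A(t, t + \bar{T})/\bar{T} \end{pmatrix}. \]
Notice from the above dynamics that the pair $(r_t, \ell_t)$ is a two-dimensional
Hull-White model with mean reversion toward $b(t, \bar{T})$.
Exercise 11.2. Repeat the calculations of this section in the case where
$E[B_t B^*_t] = \rho t$, $t \ge 0$, for some $\rho \in (-1, 1)$.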
11.6 Introduction to the Heath-Jarrow-Morton model
11.6.1 Dynamics of the forward rates curve
These models were introduced in 1992 in order to overcome the two following
shortcomings of factor models:
- Factor models completely ignore the distribution under the statistical measure,
  and take the existence of the risk neutral measure for granted. In particular, this
  implies an inconsistency between these models and those built by economists
  for prediction purposes.
- The calibration of factor models to the spot yield curve is artificial. First,
  the structure of the yield curve implied by the model has to be computed, then
  the parameters of the model have to be fixed so as to match the observed yield
  curve. Furthermore, the calibration must be repeated at every instant in time,
  leading to an inconsistency of the model.
Heath, Jarrow and Morton suggest modeling directly the dynamics of the
observable yield curve. Given the spot forward rate curve $F_0(T)$ for all maturities
$0 \le T \le \bar{T}$, the dynamics of the forward rate curve is defined by:
\[ F_t(T) = F_0(T) + \int_0^t \alpha_u(T)du + \int_0^t \sigma_u(T) \cdot dW_u
          = F_0(T) + \int_0^t \alpha_u(T)du + \sum_{i=1}^n \int_0^t \sigma^i_u(T) dW^i_u, \]
where $W$ is a Brownian motion under the statistical measure $P$ with values in
$\mathbb{R}^n$, and $\{\alpha_t(T), t \le \bar{T}\}$, $\{\sigma^i_t(T), t \le \bar{T}\}$, $i = 1, \ldots, n$, are adapted processes
for every fixed maturity $T$. Throughout this section, we will assume that all
stochastic integrals are well-defined, and we will ignore all technical conditions
needed for the subsequent analysis.
11.6.2 The Heath-Jarrow-Morton drift condition
The first important question is whether such a model allows for arbitrage.
Indeed, one may take advantage of the infinite number of assets available for
trading in order to build an arbitrage opportunity. To answer the question, we
shall derive the dynamics of the price process of zero-coupon bonds, and impose
the existence of a risk neutral measure for these tradable securities. This is a
sufficient condition for the absence of arbitrage opportunities, as the discounted
wealth process corresponding to any portfolio strategy would be turned into a
local martingale under the risk neutral measure, hence into a supermartingale
thanks to the finite credit line condition. The latter supermartingale property
guarantees that no admissible portfolio of zero-coupon bonds would lead to an
arbitrage opportunity.
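We first observe that
\[ d\int_t^T F_t(u)du = -F_t(t)dt + \int_t^T dF_t(u)\, du
   = -r_t dt + \int_t^T \alpha_t(u)du\, dt + \int_t^T \sigma_t(u)du \cdot dW_t \]
\[ \phantom{d\int_t^T F_t(u)du} = -r_t dt + \gamma_t(T)dt + \Gamma_t(T) \cdot dW_t, \]
where we introduced the adapted processes:
\[ \gamma_t(T) = \int_t^T \alpha_t(u)du \quad \text{and} \quad \Gamma_t(T) = \int_t^T \sigma_t(u)du. \]
Since $P_t(T) = e^{-\int_t^T F_t(u)du}$, it follows from Itô's formula that
\[ dP_t(T) = P_t(T)\Big[ \Big( r_t - \gamma_t(T) + \frac{1}{2}|\Gamma_t(T)|^2 \Big) dt - \Gamma_t(T) \cdot dW_t \Big]. \]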
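We next impose that the zero-coupon bond price satisfies the risk neutral
dynamics:
\[ \frac{dP_t(T)}{P_t(T)} = r_t dt - \Gamma_t(T) \cdot dB_t, \]
where
\[ B_t := W_t + \int_0^t \lambda_u du, \quad t \ge 0, \]
defines a Brownian motion under some risk neutral measure $Q$, and the so-called
risk premium, the adapted $\mathbb{R}^n$-valued process $\{\lambda_t, t \ge 0\}$, is independent of
the maturity variable $T$. This leads to the Heath-Jarrow-Morton drift condition:
\[ \Gamma_t(T) \cdot \lambda_t = \gamma_t(T) - \frac{1}{2}|\Gamma_t(T)|^2. \tag{11.10} \]
Recall that
\[ \frac{\partial \Gamma}{\partial T}(t, T) = \sigma(t, T) \quad \text{and} \quad \frac{\partial \gamma}{\partial T}(t, T) = \alpha(t, T). \]
Then, differentiating with respect to the maturity $T$, we see that
\[ \sigma_t(T) \cdot \lambda_t = \alpha_t(T) - \sigma_t(T) \cdot \Gamma_t(T), \]
and therefore
\[ dF_t(T) = \alpha_t(T)dt + \sigma_t(T) \cdot dW_t = \sigma_t(T) \cdot \Gamma_t(T)dt + \sigma_t(T) \cdot dB_t. \]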
We finally derive the risk neutral dynamics of the instantaneous interest rate
under the HJM drift restriction (11.10). Recall that:
\[ r_T = F_T(T) = F_0(T) + \int_0^T \alpha_u(T)du + \int_0^T \sigma_u(T) \cdot dW_u. \]
Then:
\[ dr_T = \frac{\partial}{\partial T}F_0(T)\, dT + \alpha_T(T)\, dT + \int_0^T \frac{\partial}{\partial T}\alpha_u(T)du\, dT
        + \int_0^T \frac{\partial}{\partial T}\sigma_u(T) \cdot dW_u\, dT + \sigma_T(T) \cdot dW_T. \]
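Organizing the terms, we get:
\[ dr_t = \mu_t dt + \sigma_t(t) \cdot dW_t \]
where
\[ \mu_t = \frac{\partial}{\partial T}F_0(t) + \alpha_t(t) + \int_0^t \frac{\partial}{\partial T}\alpha_u(t)du + \int_0^t \frac{\partial}{\partial T}\sigma_u(t) \cdot dW_u, \]
or, in terms of the $Q$-Brownian motion:
\[ dr_t = \mu^0_t dt + \sigma_t(t) \cdot dB_t \]
where, using $\Gamma_t(t) = 0$ in the second equality,
\[ \mu^0_t = \frac{\partial}{\partial T}F_0(t) + \sigma_t(t) \cdot \Gamma_t(t) + \int_0^t \frac{\partial}{\partial T}\big( \sigma_u(t) \cdot \Gamma_u(t) \big) du + \int_0^t \frac{\partial}{\partial T}\sigma_u(t) \cdot dB_u \]
\[ \phantom{\mu^0_t} = \frac{\partial}{\partial T}F_0(t) + \int_0^t \frac{\partial}{\partial T}\big( \sigma_u(t) \cdot \Gamma_u(t) \big) du + \int_0^t \frac{\partial}{\partial T}\sigma_u(t) \cdot dB_u. \]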
11.6.3 The Ho-Lee model
The Ho and Lee model corresponds to the one factor case ($n = 1$) with a
constant volatility $\sigma$ of the forward rate:
\[ dF_t(T) = \alpha_t(T)dt + \sigma dW_t = \sigma^2 (T - t)dt + \sigma dB_t. \]
The dynamics of the zero-coupon bond price is given by:
\[ \frac{dP_t(T)}{P_t(T)} = r_t dt - \sigma(T - t)dB_t \quad \text{with} \quad r_t = F_0(t) + \frac{1}{2}\sigma^2 t^2 + \sigma B_t. \]
By the dynamics of the forward rates, we see that the only possible movements
in the yield curve are parallel shifts, i.e. all rates along the yield curve fluctuate
in the same way.
11.6.4 The Hull-White model
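The Hull and White model corresponds to the one-factor case ($n = 1$) with the
following dynamics of the forward rates:
\[ dF_t(T) = \sigma^2 e^{-\lambda(T-t)}\, \frac{1 - e^{-\lambda(T-t)}}{\lambda}\, dt + \sigma e^{-\lambda(T-t)} dB_t. \]
The dynamics of the zero-coupon bond price is given by:
\[ \frac{dP_t(T)}{P_t(T)} = r_t dt - \frac{\sigma}{\lambda}\big( 1 - e^{-\lambda(T-t)} \big) dB_t \]
with
\[ r_t = a(t) + \sigma \int_0^t e^{-\lambda(t-u)} dB_u, \]
and
\[ a(t) := F_0(t) + \frac{\sigma^2}{2\lambda^2}\big( e^{-2\lambda t} - 1 \big) + \frac{\sigma^2}{\lambda}\, \frac{1 - e^{-\lambda t}}{\lambda}. \]
This implies that the dynamics of the short rate are:
\[ dr_t = \lambda\big( m(t) - r_t \big) dt + \sigma dB_t \quad \text{where} \quad \lambda m(t) = \lambda a(t) + a'(t). \]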
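Both the Ho-Lee and the Hull-White drifts are instances of the risk neutral drift $\sigma_t(T)\Gamma_t(T)$ with $\Gamma_t(T) = \int_t^T \sigma_t(u)du$. The sketch below computes this drift by numerical quadrature for an arbitrary deterministic one-factor volatility function and compares it with the two closed forms. The helper names and parameter values are hypothetical.

```python
import math

def hjm_drift(vol, t, T, steps=4000):
    """Risk neutral drift sigma_t(T) * Gamma_t(T), with Gamma_t(T) = int_t^T sigma_t(u) du."""
    du = (T - t) / steps
    gamma = sum(vol(t, t + (i + 0.5) * du) for i in range(steps)) * du   # midpoint rule
    return vol(t, T) * gamma

SIGMA, LAM = 0.01, 0.3   # hypothetical parameters

def ho_lee_vol(t, T):
    return SIGMA

def hull_white_vol(t, T):
    return SIGMA * math.exp(-LAM * (T - t))

t, T = 1.0, 6.0
ho_lee = hjm_drift(ho_lee_vol, t, T)
hull_white = hjm_drift(hull_white_vol, t, T)
```

The same function can be applied to any other deterministic volatility specification, which is the practical content of the drift condition: once the volatility structure is chosen, the risk neutral drift of the forward rates is no longer a free input.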
11.7 The forward neutral measure
Let $T_0 > 0$ be some fixed maturity. The $T_0$-forward neutral measure $Q^{T_0}$ is
defined by the density with respect to the risk neutral measure $Q$
\[ \frac{dQ^{T_0}}{dQ} = \frac{e^{-\int_0^{T_0} r_t dt}}{P_0(T_0)}, \]
and will be shown in the next section to be a powerful tool for the calculation
of prices of derivative securities in a stochastic interest rates framework.
Proposition 11.3. Let $M = \{M_t, 0 \le t \le T_0\}$ be an $\mathbb{F}$-adapted process, and
assume that the discounted process $\tilde{M}_t := e^{-\int_0^t r_u du} M_t$ is a $Q$-martingale. Then the process
\[ \phi_t := \frac{M_t}{P_t(T_0)}, \quad 0 \le t \le T_0, \]
is a martingale under the $T_0$-forward neutral measure $Q^{T_0}$.
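Proof. We first verify that $\phi$ is $Q^{T_0}$-integrable. Indeed:
\[ E^{Q^{T_0}}[|\phi_t|] = P_0(T_0)^{-1} E^Q\Big[ e^{-\int_0^{T_0} r_u du}\, \frac{|M_t|}{P_t(T_0)} \Big]
   = P_0(T_0)^{-1} E^Q\Big[ e^{-\int_0^t r_u du} |M_t| \Big]
   = P_0(T_0)^{-1} E^Q\big[ |\tilde{M}_t| \big] < \infty, \]
where the second equality follows from the tower property of conditional
expectations. We next compute for $0 \le s < t$ that
\[ E^{Q^{T_0}}[\phi_t | \mathcal{F}_s]
   = \frac{E^Q\Big[ e^{-\int_0^{T_0} r_u du}\, \frac{M_t}{P_t(T_0)} \,\Big|\, \mathcal{F}_s \Big]}{E^Q\Big[ e^{-\int_0^{T_0} r_u du} \,\Big|\, \mathcal{F}_s \Big]}
   = \frac{E^Q\Big[ e^{-\int_s^{T_0} r_u du}\, \frac{M_t}{P_t(T_0)} \,\Big|\, \mathcal{F}_s \Big]}{E^Q\Big[ e^{-\int_s^{T_0} r_u du} \,\Big|\, \mathcal{F}_s \Big]} \]
\[ \phantom{E^{Q^{T_0}}[\phi_t | \mathcal{F}_s]}
   = \frac{E^Q\Big[ e^{-\int_s^t r_u du} M_t \,\Big|\, \mathcal{F}_s \Big]}{E^Q\Big[ e^{-\int_s^{T_0} r_u du} \,\Big|\, \mathcal{F}_s \Big]}
   = \frac{e^{\int_0^s r_u du}\, E^Q\big[ \tilde{M}_t \,\big|\, \mathcal{F}_s \big]}{P_s(T_0)}
   = \frac{M_s}{P_s(T_0)} = \phi_s, \]
where we used the Bayes rule for the first equality, the tower property of
conditional expectation in the third equality, and the $Q$-martingale property of
$\tilde{M}$ in the last step.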
The above result has an important financial interpretation. Let $S$ be the
price process of any tradable security. Then, the no-arbitrage condition ensures
that $M = S$ has a discounted process $\tilde{S}$ which is a martingale under some risk
neutral measure $Q$. By definition, $\phi$ is the price process of the $T_0$-forward
contract on the security $S$. Hence Proposition 11.3 states that
    the price process of the $T_0$-forward contract on any tradable security
    is a martingale under the $T_0$-forward measure $Q^{T_0}$.
We continue our discussion of the $T_0$-forward measure in the context of the
gaussian Heath-Jarrow-Morton model for the zero-coupon bond prices:
\[ \frac{dP_t(T)}{P_t(T)} = r_t dt - \sigma\Lambda(T - t)dB_t \quad \text{where} \quad \Lambda(u) = \frac{1 - e^{-\lambda u}}{\lambda}, \]
which corresponds to the solution
\[ P_t(T) = P_0(T) \exp\Big( \int_0^t \Big( r_u - \frac{1}{2}\sigma^2 \Lambda(T - u)^2 \Big) du - \sigma \int_0^t \Lambda(T - u) dB_u \Big), \quad 0 \le t \le T. \tag{11.11} \]
Recall that this model corresponds also to the Hull-White extension of the
Vasicek model, up to the calibration to the spot yield curve. Since $P_{T_0}(T_0) = 1$,
it follows from (11.11), applied with $T = t = T_0$, that
\[ \frac{dQ^{T_0}}{dQ} = \frac{e^{-\int_0^{T_0} r_u du}}{P_0(T_0)}
   = \exp\Big( -\frac{1}{2}\int_0^{T_0} \sigma^2 \Lambda(T_0 - u)^2 du - \sigma \int_0^{T_0} \Lambda(T_0 - u) dB_u \Big), \]
and by the Cameron-Martin formula, we deduce that the process
\[ W^{T_0}_t := B_t + \sigma \int_0^t \Lambda(T_0 - u)du, \quad 0 \le t \le T_0, \tag{11.12} \]
is a Brownian motion under the $T_0$-forward neutral measure $Q^{T_0}$.
11.8 Derivatives pricing under stochastic interest rates and volatility calibration
11.8.1 European options on zero-coupon bonds
The objective of this section is to derive a closed formula for the price of a
European call option on a zero-coupon bond defined by the payoff at time $T_0 > 0$:
\[ G := \big( P_{T_0}(T) - K \big)^+ \quad \text{for some } T \ge T_0, \]
in the context of the above gaussian Heath-Jarrow-Morton model.
We first show how the use of the forward measure leads to a substantial
reduction of the problem. By definition of the $T_0$-forward neutral measure, the
no-arbitrage price at time zero of the European call option defined by the above
payoff is given by
\[ p_0(G) = E^Q\Big[ e^{-\int_0^{T_0} r_t dt} \big( P_{T_0}(T) - K \big)^+ \Big] = P_0(T_0)\, E^{Q^{T_0}}\Big[ \big( P_{T_0}(T) - K \big)^+ \Big]. \]
Notice that, while the first expectation requires the knowledge of the joint
distribution of the pair $\big( e^{-\int_0^{T_0} r_t dt}, P_{T_0}(T) \big)$ under $Q$, the second expectation only
requires the distribution of $P_{T_0}(T)$ under $Q^{T_0}$. But, in view of Proposition
11.3, an additional simplification can be gained by passing to the price of the
$T_0$-forward contract on the zero-coupon bond with maturity $T$:
\[ \phi_t = \frac{P_t(T)}{P_t(T_0)}, \quad 0 \le t \le T_0. \]
Since $P_{T_0}(T_0) = 1$, it follows that $\phi_{T_0} = P_{T_0}(T)$, and therefore:
\[ p_0(G) = P_0(T_0)\, E^{Q^{T_0}}\Big[ (\phi_{T_0} - K)^+ \Big]. \]
Since the process $\phi$ is a $Q^{T_0}$-martingale by Proposition 11.3, we only need to
compute the volatility of this process. An immediate calculation by means of
Itô's formula shows that
\[ \frac{d\phi_t}{\phi_t} = -\sigma\big( \Lambda(T - t) - \Lambda(T_0 - t) \big) dW^{T_0}_t. \]
By analogy with the previously derived Black-Scholes formula with deterministic
coefficients (9.8), this provides:
\[ p_0(G) = P_0(T_0)\big[ \phi_0 N\big(d_+(\phi_0, K, v(T_0))\big) - K N\big(d_-(\phi_0, K, v(T_0))\big) \big] \]
\[ \phantom{p_0(G)} = P_0(T) N\big( d_+(P_0(T), \tilde{K}, v(T_0)) \big) - \tilde{K} N\big( d_-(P_0(T), \tilde{K}, v(T_0)) \big), \tag{11.13} \]
where
\[ \tilde{K} := K P_0(T_0), \qquad v(T_0) := \sigma^2 \int_0^{T_0} \big( \Lambda(T - t) - \Lambda(T_0 - t) \big)^2 dt. \tag{11.14} \]
Given this simple formula for the prices of options on zero-coupon bonds, one
can fix the parameters $\sigma$ and $\lambda$ so as to obtain the best fit to the observed
options prices or, equivalently, the corresponding implied volatilities. Of course,
with only two free parameters ($\sigma$ and $\lambda$) there is no hope of perfectly calibrating
the model to the whole structure of the implied volatility surface. However, this
can be done by a further extension of this model.
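Formulas (11.13) and (11.14) translate directly into a few lines of code. The sketch below makes two assumptions that should be checked against the earlier chapter: it takes the convention $d_\pm(s, k, v) = \ln(s/k)/\sqrt{v} \pm \sqrt{v}/2$ for the functions appearing in (9.8), and it evaluates $v(T_0)$ by a midpoint quadrature rather than in closed form. The discount factors, strike and parameters are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus_minus(s, k, v):
    """Assumed convention for (9.8): d_pm = ln(s/k)/sqrt(v) +/- sqrt(v)/2."""
    root = math.sqrt(v)
    return math.log(s / k) / root + 0.5 * root, math.log(s / k) / root - 0.5 * root

def total_variance(sigma, lam, T0, T, steps=20000):
    """v(T0) of (11.14), computed by the midpoint rule."""
    Lam = lambda u: (1.0 - math.exp(-lam * u)) / lam
    dt = T0 / steps
    return sigma**2 * dt * sum(
        (Lam(T - (i + 0.5) * dt) - Lam(T0 - (i + 0.5) * dt)) ** 2 for i in range(steps))

def bond_call(P0_T0, P0_T, K, sigma, lam, T0, T):
    """Price (11.13) of the call with payoff (P_{T0}(T) - K)^+."""
    v = total_variance(sigma, lam, T0, T)
    K_tilde = K * P0_T0
    dp, dm = d_plus_minus(P0_T, K_tilde, v)
    return P0_T * norm_cdf(dp) - K_tilde * norm_cdf(dm)

# Hypothetical inputs: flat 3% curve, option maturity 1 year on a 5-year bond.
P0_T0, P0_T = math.exp(-0.03 * 1.0), math.exp(-0.03 * 5.0)
price = bond_call(P0_T0, P0_T, K=0.88, sigma=0.01, lam=0.5, T0=1.0, T=5.0)
v_num = total_variance(0.01, 0.5, 1.0, 5.0)
```

Since $\Lambda(T - t) - \Lambda(T_0 - t) = e^{-\lambda(T_0 - t)}\Lambda(T - T_0)$, the quadrature can be cross-checked against $v(T_0) = \sigma^2 \Lambda(T - T_0)^2 (1 - e^{-2\lambda T_0})/(2\lambda)$. In a calibration, `bond_call` would be called repeatedly inside a least-squares search over $(\sigma, \lambda)$.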
11.8.2 The Black-Scholes formula under stochastic interest rates
In this section, we provide an extension of the Black-Scholes formula for the
price of a European call option defined by the payoff at some maturity $T > 0$:
\[ G := (S_T - K)^+ \quad \text{for some exercise price } K > 0, \]
to the context of a stochastic interest rate. Namely, the underlying asset price
process is defined by
\[ S_t = S_0 \exp\Big( \int_0^t \Big( r_u - \frac{1}{2}|\Sigma(u)|^2 \Big) du + \int_0^t \Sigma(u) \cdot dB_u \Big), \quad t \ge 0, \]
where $B$ is a Brownian motion in $\mathbb{R}^2$ under the risk neutral measure $Q$, and
$\Sigma = (\Sigma_1, \Sigma_2) : \mathbb{R}_+ \to \mathbb{R}^2$ is a deterministic $C^1$ function. The interest rates
process is defined by the Heath-Jarrow-Morton model for the prices of
zero-coupon bonds:
\[ P_t(T) = P_0(T) \exp\Big( \int_0^t \Big( r_u - \frac{1}{2}\sigma^2 \Lambda(T - u)^2 \Big) du - \sigma \int_0^t \Lambda(T - u) dB^1_u \Big), \]
where
\[ \Lambda(t) := \frac{1 - e^{-\lambda t}}{\lambda}, \]
for some parameters $\sigma, \lambda > 0$. This model allows for a possible correlation
between the dynamics of the underlying asset and the zero-coupon bonds.
Using the concept of forward measure, we rewrite the no-arbitrage price of
the European call option as:
\[ p_0(G) = E^Q\Big[ e^{-\int_0^T r_t dt} (S_T - K)^+ \Big] = P_0(T)\, E^{Q^T}\big[ (S_T - K)^+ \big] = P_0(T)\, E^{Q^T}\big[ (\phi_T - K)^+ \big], \]
where $\phi_t := P_t(T)^{-1} S_t$ is the price of the $T$-forward contract on the security $S$.
By Proposition 11.3, the process $\{\phi_t, t \ge 0\}$ is a $Q^T$-martingale, so its
dynamics has zero drift when expressed in terms of the $Q^T$-Brownian motion
$W^T$. We then calculate the volatility component in its dynamics by means of
Itô's formula, and we obtain:
\[ \frac{d\phi_t}{\phi_t} = \big( \Sigma_1(t) + \sigma\Lambda(T - t) \big) dW^{T,1}_t + \Sigma_2(t) dW^{T,2}_t. \]
Hence, under the $T$-forward neutral measure $Q^T$, the process $\phi$ follows a
time-dependent Black-Scholes model with zero interest rate and time dependent
squared volatility $(\Sigma_1(t) + \sigma\Lambda(T - t))^2 + \Sigma_2(t)^2$. We can now take advantage
of the calculation performed previously in (9.8), and conclude that
\[ p_0(G) = P_0(T)\big[ \phi_0 N\big(d_+(\phi_0, K, v(T))\big) - K N\big(d_-(\phi_0, K, v(T))\big) \big] \]
\[ \phantom{p_0(G)} = S_0 N\big( d_+(S_0, \tilde{K}, v(T)) \big) - \tilde{K} N\big( d_-(S_0, \tilde{K}, v(T)) \big), \]
where
\[ \tilde{K} := K P_0(T) \quad \text{and} \quad v(T) := \int_0^T \Big[ \big( \Sigma_1(t) + \sigma\Lambda(T - t) \big)^2 + \Sigma_2(t)^2 \Big] dt. \]
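The formula above differs from the usual Black-Scholes formula only through the total variance $v(T)$, which now includes the bond volatility $\sigma\Lambda(T - t)$. The sketch below illustrates this under simplifying assumptions that are not in the text: $\Sigma_1$ and $\Sigma_2$ are taken constant, $v(T)$ is computed by a midpoint rule, and the convention $d_\pm(s, k, v) = \ln(s/k)/\sqrt{v} \pm \sqrt{v}/2$ is assumed for (9.8). All numerical inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_stochastic_rates(S0, K, P0_T, T, Sigma1, Sigma2, sigma, lam, steps=20000):
    """Call price S0*N(d+) - K~*N(d-) with K~ = K*P_0(T) and total variance v(T)."""
    Lam = lambda u: (1.0 - math.exp(-lam * u)) / lam
    dt = T / steps
    # v(T) = int_0^T [(Sigma1 + sigma*Lambda(T-t))^2 + Sigma2^2] dt, constant Sigma assumed
    v = dt * sum((Sigma1 + sigma * Lam(T - (i + 0.5) * dt)) ** 2 + Sigma2**2
                 for i in range(steps))
    K_tilde = K * P0_T
    root = math.sqrt(v)
    d_plus = math.log(S0 / K_tilde) / root + 0.5 * root
    d_minus = d_plus - root
    return S0 * norm_cdf(d_plus) - K_tilde * norm_cdf(d_minus)

P0_T = math.exp(-0.05)   # hypothetical one-year discount factor
# sigma = 0: deterministic rates, the classical Black-Scholes price is recovered.
price_det = call_stochastic_rates(100.0, 100.0, P0_T, 1.0, 0.2, 0.0, 0.0, 0.5)
# sigma > 0 with Sigma1 > 0: the interest rate risk adds to the total variance.
price_stoch = call_stochastic_rates(100.0, 100.0, P0_T, 1.0, 0.2, 0.0, 0.01, 0.5)
```

With $\sigma = 0$ this reproduces the textbook at-the-money value of about 10.45 for a 20% volatility and a 5% rate. A negative $\Sigma_1$ would instead let the two sources of risk partially offset each other.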
Chapter 12
Introduction to financial
risk management
The publication by Harry Markowitz in 1953 of his dissertation on the theory
of risk and return has led to the understanding that financial institutions take
risks as part of their everyday activities, and that these risks must therefore
be measured and actively managed. The importance of risk managers within
banks and investment funds has been growing ever since, and now the chief risk
officer (CRO) is often a member of the top management committee. The job
of a risk manager now requires sophisticated technical skills, and risk control
departments of major banks employ many mathematicians and engineers.
The role of risk management in a financial institution has several important
aspects. The first objective is to identify the risk exposures, that is the different
types of risk (see below) which affect the company. These exposures should then
be quantified, and measured. It is generally not possible to quantify the risk
exposure with a single number, or even associate a probability distribution to
it, since some uncertain outcomes cannot be assigned a probability in a reliable
manner. Modern risk management usually combines a probabilistic approach
(e.g. Value at Risk) with worst-case scenario analysis. These quantitative and
qualitative assessments of risk exposures form the basis of reports to senior
management, which must convey the global picture in a concise and non-technical
way.
More importantly, the risk management must then design a risk mitigation
strategy, that is, decide, taking into account the global constraints imposed by
the senior management, which risk exposures are deemed acceptable, and which
must be reduced, either by hedges or by limiting the size of the positions. In
a trading environment, this strategy will result in precise exposure limits for
each trading desk, in terms of the Value at Risk, the sensitivities to different
risk factors, and the notional amounts for different products. For acceptable
exposures, provisions will be made, in order to ensure the solvency of the bank
if the corresponding risky scenarios are realized.
Finally, it is the role of risk management to monitor the implementation and
performance of the chosen risk mitigation strategy, by validating the models and
algorithms used by the front office for pricing and computing hedge ratios, and
by double-checking various parameter estimates using independent data sources.
This part in particular requires extensive technical skills.
With the globalization of the financial system, the bank risk management
is increasingly becoming an international affair, since, as we have recently
witnessed, the bankruptcy of a single bank or even a hedge fund can trigger financial
turmoil around the world and bring the entire financial system to the edge of
a collapse. This was the main reason for introducing the successive Basel
Capital Accords, which define the best practices for risk management of financial
institutions, determine the interaction between the risk management and the
financial regulatory authorities and formalize the computation of the regulatory
capital, a liquidity reserve designed to ensure the solvency of a bank under
unfavorable risk scenarios. These agreements will be discussed in more detail in
the last section of this chapter.
12.1 Classification of risk exposures
The different risk exposures faced by a bank are usually categorized into several
major classes. Of course this classification is somewhat arbitrary: some risk
types are difficult to assign a category and others may well belong to several
categories. Still, some classification is important since it gives a better idea
about the scope of possible risk sources, which is important for effective risk
management.
12.1.1 Market risk
Market risk is probably the best studied risk type, which does not mean that it
is always the easiest to quantify. It refers to the risk associated with movements
of market prices of securities and rates, such as interest rate or exchange rate.
Different types of market risk are naturally classified by product class: interest
rate risk, where one distinguishes the risk of overall movements of interest
rates and the risk of a change in the shape of the yield curve; equity price
risk, with a distinction between global market risk and idiosyncratic risk of individual
stocks; foreign exchange risk, etc. Many products will be sensitive to
several kinds of market risk at the same time.
One can also distinguish the risk associated with the underlying prices and
rates themselves, and the risk associated with other quantities which influence asset
prices, such as volatility, implied volatility smile, correlation, etc. If the volatility
risk is taken into account by the modeling framework, as in a stochastic
volatility model, it is best viewed as market risk; if it is ignored by
the model, as in the Black-Scholes framework, then it will contribute to
model risk. Note that volatility and correlation were initially interpreted as model
parameters or non-observable risk factors; nowadays, however, these parameters
may be directly observable and interpretable via the quoted market prices of
volatility / correlation swaps and the implied volatilities of vanilla options.
Tail risk Yet another type of market risk, the tail risk, or gap risk, is associated
with large sudden moves (gaps) in asset prices, and is related to the fat tails
of return distributions. A distribution is said to have fat tails if it assigns to very
large or very small outcomes higher probability than the Gaussian distribution
with the same variance. For a Gaussian random variable, the probability of a
downside move greater than 5 standard deviations is about $3 \times 10^{-7}$, but such moves
do happen in practice, causing painful losses to financial institutions. These
deviations from Gaussianity can be quantified using the skewness $s(X)$ and the
kurtosis $\kappa(X)$:
$$s(X) = \frac{E[(X - EX)^3]}{(\operatorname{Var} X)^{3/2}}, \qquad \kappa(X) = \frac{E[(X - EX)^4]}{(\operatorname{Var} X)^{2}}.$$
The skewness measures the asymmetry of a distribution and the kurtosis measures
the fatness of tails. For a Gaussian random variable $X$, $s(X) = 0$ and
$\kappa(X) = 3$, while for stock returns typically $s(X) < 0$ and $\kappa(X) > 3$. In a recent
study, $\kappa$ was found to be close to 16 for 5-minute returns of S&P index futures.
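As a rough numerical companion to the quantities above, the following sketch evaluates Gaussian lower-tail probabilities and the sample skewness and kurtosis. The function names and the list `returns` are hypothetical and chosen only for illustration; the sample is made up and contains one large downward move.

```python
import math

def gaussian_lower_tail(k):
    """P[Z < -k] for a standard Gaussian Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def skewness_kurtosis(sample):
    """Sample skewness s(X) and kurtosis kappa(X) computed from central moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

tail_3_sigma = gaussian_lower_tail(3.0)  # roughly 1.3e-3
tail_5_sigma = gaussian_lower_tail(5.0)  # roughly 2.9e-7

# Hypothetical daily returns (not real data) with one large downward move.
returns = [0.01, -0.02, 0.005, 0.0, -0.08, 0.015, 0.01, -0.005]
s, kappa = skewness_kurtosis(returns)
print(tail_3_sigma, tail_5_sigma, s, kappa)
```

For this sample the skewness is negative and the kurtosis exceeds 3, which is the qualitative pattern described for stock returns.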
Assume that conditionally on the value of a random variable $V$, $X$ is centered
Gaussian with variance $f(V)$. Since $E[X^4 \mid V] = 3 f(V)^2$, Jensen's inequality yields
$$\kappa(X) = \frac{E[X^4]}{E[X^2]^2} = \frac{3E[f^2(V)]}{E[f(V)]^2} > 3$$
whenever $f(V)$ is not almost surely constant.
Therefore, all conditionally Gaussian models, such as GARCH and stochastic
volatility, produce fat-tailed distributions.
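The kurtosis formula for conditionally Gaussian variables can be evaluated exactly when $V$ takes finitely many values. The sketch below assumes a hypothetical two-regime model (a calm and a crisis volatility); the function name and the numbers are illustrative only.

```python
def mixture_kurtosis(variances, probabilities):
    """Kurtosis 3 E[f(V)^2] / E[f(V)]^2 of a centered Gaussian variance mixture."""
    mean_var = sum(p * v for v, p in zip(variances, probabilities))
    mean_var_sq = sum(p * v * v for v, p in zip(variances, probabilities))
    return 3.0 * mean_var_sq / mean_var ** 2

# Hypothetical regimes: daily volatility 1% with probability 0.9, 4% with probability 0.1.
calm_and_crisis = mixture_kurtosis([0.01 ** 2, 0.04 ** 2], [0.9, 0.1])

# With a constant variance the mixture is Gaussian and the kurtosis equals 3.
constant_vol = mixture_kurtosis([0.02 ** 2], [1.0])
print(calm_and_crisis, constant_vol)
```

Even a small probability of the high-volatility regime pushes the kurtosis well above the Gaussian value.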
The tail risk is related to correlation risk, since large downward moves are
usually more strongly correlated than small regular movements: during a sys-
temic crisis all stocks fall together.
Managing the tail risk In a static framework (one-period model), the tail
risk can be accounted for using fat-tailed distributions (Pareto). In a dynamic
approach this is usually accomplished by adding stochastic volatility and/or
jumps to the model, e.g. in the Merton model
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma_t\,dW_t + dJ_t,$$
where $J_t$ is the process of log-normal jumps.
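A path of such a jump-diffusion can be simulated along the following lines. This is only a sketch under simplifying assumptions that are not in the text: the volatility is taken constant, at most one jump is allowed per time step (jump probability `jump_rate * dt`), and log-jump sizes are Gaussian. All parameter names and values are hypothetical.

```python
import math
import random

def simulate_merton_path(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                         horizon, n_steps, rng):
    """Simulate log S on a time grid, with at most one log-normal jump per step."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        # Diffusion part of the log-price increment (constant volatility assumed).
        log_inc = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: small-dt approximation of a Poisson arrival.
        if rng.random() < jump_rate * dt:
            log_inc += rng.gauss(jump_mean, jump_std)
        path.append(path[-1] * math.exp(log_inc))
    return path

rng = random.Random(42)  # fixed seed for reproducibility
path = simulate_merton_path(100.0, 0.05, 0.2, 1.0, -0.1, 0.15, 1.0, 250, rng)
print(min(path), max(path))
```

Working with the logarithm keeps the simulated prices positive whatever the size of the jumps.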
To take into account even more extreme events, to which it is difficult to
assign a probability, one can add stress scenarios to models by using extreme
values of volatility / correlations or taking historical data from crisis periods.
Country   Rating    Country         Rating
Brazil    BBB-      Japan           AA
China     A+        Russia          BBB
France    AAA       Tunisia         BBB
India     BBB-      United States   AAA
Table 12.1: Examples of credit ratings (source: Standard & Poor's, data from
November 2009).
12.1.2 Credit risk
Credit risk is the risk that a default, or a change in credit quality of an entity
(an individual, a company, or a sovereign country) will negatively affect the value of the
bank's portfolio. The portfolio may contain bonds or other products issued by
that entity, or credit derivative products linked to that entity. One distinguishes
the default risk from the spread¹ risk, i.e., the risk of depreciation of bonds due
to a deterioration of the creditworthiness of their issuer, without default.
Credit rating The credit quality is measured by the credit rating: an evalu-
ation of the creditworthiness of the borrower computed internally by the bank
or externally by a rating agency. The credit ratings for sovereign countries
and large corporations are computed by international rating agencies (the best
known ones are Standard & Poor's, Moody's and Fitch Ratings) and have letter
designations. The rating scale of Standard & Poor's is AAA, AA, A, BBB,
BB, B, CCC, CC, C, D, where AAA is the best possible rating and D corresponds
to default. Ratings from AA to CCC may be further refined by the
addition of a plus (+) or minus (-). Bonds with ratings above and including
BBB- are considered investment grade or suitable for long-term investment,
whereas all others are considered speculative. Table 12.1 reproduces the S&P
credit ratings for several countries as of November 2009.
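The ordering of the rating scale and the investment grade threshold can be encoded in a few lines. The sketch below uses hypothetical helper names (`SP_SCALE`, `rating_rank`, `is_investment_grade`) and does not check that a modifier is only attached to grades between AA and CCC.

```python
SP_SCALE = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]

def rating_rank(rating):
    """Lower rank means better credit quality; '+' and '-' refine a letter grade."""
    letters = rating.rstrip("+-")
    modifier = rating[len(letters):]
    return 3 * SP_SCALE.index(letters) + {"+": 0, "": 1, "-": 2}[modifier]

def is_investment_grade(rating):
    """Ratings above and including BBB- are considered investment grade."""
    return rating_rank(rating) <= rating_rank("BBB-")

print(is_investment_grade("BBB-"), is_investment_grade("BB+"))
```

With this convention all ratings listed in Table 12.1 are investment grade, while BB+ is the best speculative rating.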
Credit derivatives The growing desire of banks and other financial institutions
to remove credit risk from their books in order to reduce regulatory capital,
and more generally, to transfer credit risk to investors willing to bear it, led
to the appearance, in the late 1990s, of several classes of credit derivative products.
The most widely used ones are Credit Default Swaps (CDS) and Collateralized
Debt Obligations (CDO). The structure, the complexity and the role of these
two types of instruments are entirely different. The credit default swap is designed
to offer protection against the default of a single entity (let us call it
Risky Inc.). The buyer of the protection (that is, the buyer of the CDS) makes
regular premium payments to the seller of the CDS until the default of Risky
Inc. or the maturity of the CDS. In exchange, when and if Risky Inc. defaults,
the seller of the CDS makes a one-time payment to the buyer to cover the losses
from the default.
¹ The credit spread of an entity is defined as the difference between the yield to maturity
of the bonds issued by the entity and the yield to maturity of the risk-free sovereign bonds.
The CDO is designed to transfer credit risk from the books of a bank to
investors looking for extra premium. Suppose that a bank owns a portfolio of
defaultable loans (P). To reduce the regulatory capital charge, the bank creates a
separate company called a Special Purpose Vehicle (SPV), and sells the portfolio
P to this company. The company, in turn, issues bonds (CDOs) which are
then sold to investors. This process of converting illiquid loans into more liquid
securities is known as securitization. The bonds issued by the SPV are divided
into several categories, or tranches, which are reimbursed in different order from
the cash flows of the initial portfolio P. The bonds from the Senior tranche are
reimbursed first, followed by Mezzanine, Junior and Equity tranches (in this
order). The senior tranche thus (in theory) has a much lower default risk than
the bonds in the original portfolio P, since it is only affected by defaults in P after
all other tranches are destroyed. The tranches of a CDO are evaluated separately
by rating agencies, and before the start of the 2008 subprime crisis, senior
tranches received the highest ratings, similar to the bonds of the most financially
solid corporations and sovereign states. This explained the spectacular growth
of the CDO market, with the global notional of CDOs issued in 2007 totalling
almost 500 billion US dollars. However, senior CDO tranches are much more
sensitive to systemic risk and tend to have lower recovery rates² than corporate
bonds in the same rating class. As a typical example, consider the situation
when the portfolio P mainly consists of residential mortgages. If the defaults
are due to individual circumstances of each borrower, the senior tranche will
be protected by diversification effects. However, in case of a global downturn
of housing prices, such as the one that happened in the US in 2007-2008, a
large proportion of borrowers may default, leading to a severe depreciation of
the senior tranche.
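The order in which tranches absorb losses can be illustrated with a small loss waterfall. The tranche notionals and the two loss scenarios below are hypothetical, and the function name `allocate_losses` is introduced only for this sketch.

```python
def allocate_losses(tranches, portfolio_loss):
    """Allocate a portfolio loss to tranches, most junior first.

    `tranches` lists (name, notional) from most senior to most junior.
    Returns a dict mapping each tranche name to the loss it absorbs.
    """
    remaining = portfolio_loss
    losses = {}
    for name, notional in reversed(tranches):
        hit = min(notional, remaining)
        losses[name] = hit
        remaining -= hit
    return losses

# Hypothetical capital structure of a portfolio with notional 100.
tranches = [("Senior", 70.0), ("Mezzanine", 15.0), ("Junior", 10.0), ("Equity", 5.0)]

idiosyncratic = allocate_losses(tranches, 8.0)   # a few isolated defaults
systemic = allocate_losses(tranches, 40.0)       # a market-wide downturn
print(idiosyncratic, systemic)
```

In the first scenario the senior tranche is untouched; in the second, the junior tranches are wiped out and the senior tranche starts to lose money.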
Counterparty risk The counterparty risk is associated with a default or a
downgrade of a counterparty, as opposed to the entity underlying a credit derivative
product, which may not necessarily be the counterparty. Consider a situation
where a bank B holds bonds issued by company C, partially protected with a
credit default swap issued by another bank A. In this case, the portfolio of B is
sensitive not only to the credit quality of C but also to the credit quality of A,
since in case of a default of C, A may not be able to meet its obligations on the
CDS bought by B.
² The recovery rate of a bond is the proportion of the notional recovered by the lender after
default.
12.1.3 Liquidity risk
Liquidity risk may refer to asset liquidity risk, that is, the risk of not being able
to liquidate the assets at the prevailing market price, because of an insufficient
depth of the market, and funding liquidity risk, that is, the risk of not being
able to raise capital for current operations. The standard practice of marking to
market a portfolio of derivatives refers to determining the price of this portfolio
for accounting purposes using the prevailing market price of its components.
However, even for relatively liquid markets, where many buyers and sellers are
present, and there is a well-defined market price, the number of buy orders
close to this price is relatively small. To liquidate a large number of assets the
seller will need to dig deep into the order book³, obtaining therefore a much
lower average price than if he only wanted to sell a single share (see Fig. 12.1).
This type of risk is even more important for illiquid assets, where the balance-sheet
price is computed using an internal model (marking to model). The model
price may be very far from the actual price which can be obtained in the market,
especially in periods of high volatility. This is also related to the issue of
model risk discussed below. The 2008 financial crisis started essentially as an
asset liquidity crisis, when the market for CDOs suddenly shrank, leading to
massive depreciation of these products in the banks' balance sheets. The fear of
imminent default of major banks, created by these massive depreciations, made
it difficult for them to raise money and created a funding liquidity crisis.
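The effect of market depth on the liquidation price can be sketched by walking down the buy side of an order book. The book below is hypothetical and is not the one plotted in Figure 12.1; `average_sell_price` is a name introduced for this illustration.

```python
def average_sell_price(buy_orders, quantity):
    """Average price obtained by a market sell order executed against buy orders.

    `buy_orders` is a list of (price, size) pairs.
    """
    remaining = quantity
    proceeds = 0.0
    for price, size in sorted(buy_orders, reverse=True):  # best (highest) bid first
        filled = min(size, remaining)
        proceeds += filled * price
        remaining -= filled
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough depth in the order book")
    return proceeds / quantity

# Hypothetical buy side of the book: (price, number of shares).
book = [(10.0, 2), (9.5, 3), (9.0, 5), (8.0, 10)]
single_share = average_sell_price(book, 1)
large_order = average_sell_price(book, 20)
print(single_share, large_order)
```

A single share is sold at the best bid, whereas the larger order consumes several price levels and obtains a visibly lower average price.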
12.1.4 Operational risk
Operational risk refers to losses resulting from inadequate or failed internal pro-
cesses, people and systems or from external events. This includes deliberate
fraud by employees, several spectacular examples of which we have witnessed in
recent years. The Basel II agreement provides a framework for measuring
operational risk and making capital provisions for it, but the implementation of
these quantitative approaches faces major problems due to the lack of historical
data and extreme heavy-tailedness of some types of operational risk (when a sin-
gle event may destroy the entire institution). No provisions for operational risk
can be substituted for strict and frequent internal controls, up-to-date computer
security and expert judgement.
12.1.5 Model risk
Model risk is especially important for determining the prices of complex derivative
products which are not readily quoted in the market. Consider a simple
European call option. Even if it is not quoted, its price in the balance sheet
may be determined using the Black-Scholes formula. This is the basis of a
widely used technique known as marking to model, as opposed to marking to
market. However, the validity of this method is conditioned by the validity of
the Black-Scholes model assumptions, such as, for example, constant volatility.
If the volatility changes, the price of the option will also change. More generally,
model risk can be caused by an inadequate choice of the pricing model,
parameter errors (due to statistical estimation errors or nonstationary parameters)
and inadequate implementation (not necessarily programming bugs but
perhaps an unstable algorithm which amplifies data errors). While the effect
of parameter estimation errors may be quantified by varying the parameters
within confidence bounds, and inadequate implementations can be singled out
by scrupulous expert analysis, the first type of model risk (inadequate models)
is much more difficult to analyze and quantify. The financial environment is an
extremely complex system, and no single model can be used to price all products
in a bank's portfolio. Therefore, a specific model is usually chosen for each
class of products, and it is extremely important to choose a model which takes
into account the risk factors relevant for a given product class. For example,
stochastic volatility may not be really necessary for pricing short-dated European
options, but it is essential for long-dated forward start options or cliquets.
The selected model will then be fitted to a set of calibration instruments, and
here it is essential on one hand that the model is rich enough to match the
prices of all instruments in the calibration set (for instance it is impossible to
calibrate the constant volatility Black-Scholes model to the entire smile), and on
the other hand that the model parameters are identifiable in a unique and stable
way from the prices of calibration instruments (it is impossible to calibrate a
complex stochastic volatility model using a single option price).
³ The order book contains all outstanding limit orders at a given time.
Finally, the calibrated model is used for pricing the non-quoted exotic products.
The final price is always explained to some extent by the model used and
to some extent by the calibration instruments. Ideally, it should be completely
determined by the calibration instruments: we want every model, fitted to the
prices of calibration instruments, to yield more or less the same price of the
exotic. If this is not the case, then one speaks of model risk. To get an
idea of this risk, one can therefore price the exotic option with a set of different
models calibrated to the same instruments. See [37] for an example of possible
bounds obtained that way and [9] for more details on model risk.
The notion of model risk is closely linked to model validation, one of the
roles of a risk manager in a bank, which consists in scrutinizing and determining
validity limits for models used by the front office. The above arguments show
that, for sensible results, models must be validated in conjunction with the set
of calibration instruments that will be used to fit the model, and with the class
of products which the model will be used to price.
Other risk types which we do not discuss here due to the lack of space include
legal and regulatory risk, business risk, strategic risk and reputation risk [12].
12.2 Risk exposures and risk limits: sensitivity
approach to risk management
A traditional way to control the risk of a trading desk is to identify the risk
factors relevant for this desk and impose limits on the sensitivities to these
risk factors. The sensitivity approach is an important element of the risk manager's
toolbox, and it is therefore important to understand the strengths and
weaknesses of this methodology.
Figure 12.1: Illustration of asset liquidity risk (plot of buy order sizes against price): a sell order for a single share
would be executed at a price of 10 euros, while a market sell order for 20 shares
would be executed at a weighted average price of 8.05 euros per share.
The sensitivity is defined as the variation of the price of a financial product
which corresponds to a small change in the value of a given risk factor, when
all other risk factors are kept constant. In mathematical terms, sensitivity is
very close to a partial derivative. For example, the standard measure of the
interest rate risk of a bond is called DV01, that is, the dollar value of one basis
point, or, in other words, the change of the value of a bond corresponding to
a 1-bp decrease of its yield to maturity. For an option, whose price will be
denoted by $C$, the basic sensitivities are the delta $\frac{\partial C}{\partial S}$, the vega $\frac{\partial C}{\partial \sigma}$, where $\sigma$ is
the volatility, the theta $\frac{\partial C}{\partial t}$ and the rho $\frac{\partial C}{\partial r}$, where $r$ is the interest rate of the
zero-coupon bond with the same maturity as the option. The gamma $\frac{\partial^2 C}{\partial S^2}$ is
the price sensitivity of delta, and can also be used to quantify the sensitivity of
the option price itself to larger price moves (as the second term in the Taylor
expansion).
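Since a sensitivity is essentially a partial derivative, it can be approximated by bumping one risk factor and repricing. The sketch below does this for a Black-Scholes call and for the DV01 of a bond; the option and bond parameters are hypothetical, and the bond is assumed to pay annual coupons.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call with time to maturity t."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def central_diff(f, x, h):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def bond_price(y, coupon, years, face=100.0):
    """Price of an annual-coupon bond at yield to maturity y."""
    coupons = sum(coupon * face / (1.0 + y) ** n for n in range(1, years + 1))
    return coupons + face / (1.0 + y) ** years

s0, k, t, r, sigma = 100.0, 100.0, 1.0, 0.02, 0.2
delta = central_diff(lambda s: bs_call(s, k, t, r, sigma), s0, 0.01)
vega = central_diff(lambda v: bs_call(s0, k, t, r, v), sigma, 1e-4)
rho = central_diff(lambda x: bs_call(s0, k, t, x, sigma), r, 1e-5)
h = 0.1
gamma = (bs_call(s0 + h, k, t, r, sigma) - 2.0 * bs_call(s0, k, t, r, sigma)
         + bs_call(s0 - h, k, t, r, sigma)) / h ** 2

# DV01: change of the bond value for a 1-bp decrease of its yield to maturity.
dv01 = bond_price(0.03 - 1e-4, 0.04, 10) - bond_price(0.03, 0.04, 10)
print(delta, gamma, vega, rho, dv01)
```

Each number is obtained by moving a single input while the others stay fixed, which is exactly the scenario implicit in the definition of a sensitivity.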
An option trading desk typically has a limit on its delta and vega exposure
(for each underlying), and other sensitivities are also monitored. The vega is
typically interpreted as the exposure to Black-Scholes implied volatility, and
since the implied volatility depends on the strike and maturity of the option,
the sensitivity to the implied volatility surface is a natural extension. This is
typically taken into account by bucketing the smile, that is, computing sensitivities
to perturbations of specific sections of the smile (for a certain range of
strikes and maturities). The same approach is usually applied to compute the
sensitivities to the yield curve.
When working with sensitivities it is important to understand the following
features and limitations of this approach:
• Sensitivities do not provide information about the actual risk faced by a
bank, but only relate changes in the values of derivative products to the
changes of basic risk factors.
• Sensitivities cannot be aggregated across different underlyings and across
different types of sensitivities (delta plus gamma): they are therefore local
measures and do not provide global information about the entire portfolio
of a bank.
• Sensitivities are meaningful only for small changes of risk factors: they
do not provide accurate information about the reaction of the portfolio to
larger moves (jumps).
• The notion of a sensitivity to a given risk factor is associated with a very
specific scenario of market evolution, when only this risk factor changes
while others are kept constant. The relevance of a particular sensitivity
depends on whether this specific scenario is plausible. As an example, consider
the delta of an option, which is often defined as the partial derivative
of the Black-Scholes price of this option computed using its actual implied
volatility: $\Delta^{imp}_t(T, K) = \frac{\partial BS}{\partial s}(S_t, \sigma^{imp}, K, T)$ (see chapter 8). The implicit
assumption is that when the underlying changes, the implied volatility
of an option with a given strike remains constant. This is the so-called
sticky strike behavior, and it is usually not observed in the markets, which
tend to have the sticky moneyness behavior, where the implied volatility
of an option with a given moneyness level $m = K/S_t$ is close to constant:
$\sigma^{imp} = \sigma^{imp}(K/S_t)$. Therefore, the true sensitivity of the option price to
changes in the underlying is
$$\Delta_t = \frac{d}{ds} BS(S_t, \sigma^{imp}(K/S_t), K, T) = \Delta^{imp}_t(T, K) - \frac{K}{S_t^2} \frac{\partial BS}{\partial \sigma}(S_t, \sigma^{imp}, K, T) \frac{d\sigma^{imp}}{dm}.$$
In equity markets, the implied volatility is usually decreasing as a function
of the moneyness value (skew effect), and the Black-Scholes delta therefore
underestimates the sensitivity of the option price to the movements of the
underlying.
• When computing the sensitivities to the implied volatility smile or the
yield curve, the bucketing approach assumes that one small section of
the smile/curve moves while the other ones remain constant, which is
clearly unrealistic. A more satisfactory approach is to identify possible
orthogonal deformation patterns, such as level changes, twists, convexity
changes, from historical data (via principal component analysis), and
compute sensitivities to such realistic deformations.
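The difference between the sticky strike and sticky moneyness deltas discussed in the list above can be checked numerically. The sketch assumes a hypothetical linear smile in the moneyness $m = K/S$ with negative slope; `smile`, `SLOPE` and the parameter values are illustrative and not taken from market data.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call with time to maturity t."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

SLOPE = -0.1  # hypothetical d(sigma_imp)/dm, negative as in an equity skew

def smile(m, level=0.2):
    """Hypothetical implied volatility as a function of moneyness m = K/S."""
    return level + SLOPE * (m - 1.0)

s0, k, t, r = 100.0, 100.0, 1.0, 0.02
h = 0.01
sigma_imp = smile(k / s0)

# Sticky strike: bump the spot, keep the implied volatility fixed.
sticky_strike_delta = (bs_call(s0 + h, k, t, r, sigma_imp)
                       - bs_call(s0 - h, k, t, r, sigma_imp)) / (2 * h)

# Sticky moneyness: the implied volatility follows K/S when the spot is bumped.
sticky_moneyness_delta = (bs_call(s0 + h, k, t, r, smile(k / (s0 + h)))
                          - bs_call(s0 - h, k, t, r, smile(k / (s0 - h)))) / (2 * h)

# Chain-rule formula: delta_imp - (K / S^2) * vega * d(sigma_imp)/dm.
dv = 1e-4
vega = (bs_call(s0, k, t, r, sigma_imp + dv) - bs_call(s0, k, t, r, sigma_imp - dv)) / (2 * dv)
formula_delta = sticky_strike_delta - (k / s0 ** 2) * vega * SLOPE
print(sticky_strike_delta, sticky_moneyness_delta, formula_delta)
```

With a decreasing smile the sticky moneyness delta is larger than the Black-Scholes delta, and the bump-and-reprice value agrees with the chain-rule formula up to discretization error.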
12.3 Value at Risk and the global approach
The need to define a global measure of the amount of capital that the bank
is actually risking has led to the appearance of the Value at Risk (VaR). The
VaR belongs to the class of monetary risk measures, which quantify the risk as a
dollar amount and can therefore be interpreted as the regulatory capital required to
cover the risk. The VaR of a position is associated with a time horizon $T$ (usually
1 or 10 days) and a confidence level $\alpha$ (usually 99% or 95%), and is defined as the
opposite of the $(1-\alpha)$-quantile of the profit and loss $X$ of this position in $T$ days
(see Fig. 12.2):
$$\mathrm{VaR}_\alpha(X) := -\inf\{x \in \mathbb{R} : P[X \le x] \ge 1 - \alpha\}.$$
Figure 12.2: Definition of the Value at Risk (plot of a probability density with the VaR level marked on the horizontal axis).
The advantages of the VaR include its simplicity (the risk is summarized in
a single number, whose meaning is easy to explain to people without technical
knowledge) and the fact that it is a portfolio risk measure, that is, it summarizes
all risk factors affecting a large portfolio, taking into account correlations and
dependencies.
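For a finite sample of profit and loss outcomes, the quantile in the definition of the VaR reduces to an order statistic. The following sketch applies the definition to the empirical distribution; the function name and the toy sample are hypothetical.

```python
import math

def empirical_var(pnl_sample, alpha):
    """VaR_alpha = -inf{x : F_n(x) >= 1 - alpha} for the empirical distribution F_n."""
    ordered = sorted(pnl_sample)
    n = len(ordered)
    # Smallest k with k / n >= 1 - alpha; the small offset guards against
    # floating-point noise in n * (1 - alpha).
    k = max(1, math.ceil(n * (1.0 - alpha) - 1e-9))
    return -ordered[k - 1]

pnl = [i - 50 for i in range(100)]  # hypothetical P&L outcomes -50, ..., 49
var_95 = empirical_var(pnl, 0.95)
var_99 = empirical_var(pnl, 0.99)
print(var_95, var_99)
```

With 100 equally likely outcomes, the 95% VaR is the fifth worst outcome with its sign changed, and the 99% VaR is the worst one.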
Computing Value at Risk The first step in implementing a VaR computation
engine is to identify a reasonable number of risk factors, which span
sufficiently well the universe of risks faced by the bank. This is mainly done to
reduce the dimension of the problem compared to the total number of products
in the portfolio. For example, for a bond portfolio on the same yield curve, three
risk factors (short-term, medium-term and long-term yield) may be sufficient.
The next step is to identify the dependency of each product in the portfolio on
the different risk factors (via a pricing model). Finally, scenarios or probability
laws for the evolution of risk factors are identified and a specific method is
applied to compute the VaR.
In local valuation methods, the portfolio value and its sensitivities are computed
at the current values of risk factors only. In full valuation methods, all
products in the portfolio must be repriced with perturbed values of risk factors.
A particularly simple method is the Gaussian or normal VaR, where the risk factors
are assumed Gaussian, which makes it possible to compute the VaR without
simulation. In the historical VaR method, the historically observed changes of risk
factors are used, and in the Monte Carlo VaR method the variations of risk factors
are sampled from a fully calibrated model and the VaR value is estimated by
Monte Carlo.
We now discuss the three most popular methods for computing VaR using
the example of a portfolio of $N$ options written on $d$ different underlyings with
values $S^1_t, \dots, S^d_t$. We denote by $P^i_t$ the price of the $i$-th option at time $t$:
$P^i_t = P_i(t, S^1_t, \dots, S^d_t)$. Let $w_i$ be the quantity of the $i$-th option in the portfolio, so that
the portfolio value is $V_t = \sum_i w_i P^i_t$.
• The delta-normal approach only takes into account the first-order sensitivities
(deltas) of option prices, and assumes that the daily increments of
risk factors follow a normal distribution: $\Delta S_t \sim N(0, \Omega_t)$. The means are
usually ignored in this approach, and the covariance matrix $\Omega_t$ is estimated
from the historical time series using a moving window. The variation of
the portfolio is approximated by
$$\Delta V \approx \sum_{i,j} w_i \frac{\partial P^i}{\partial S^j} \Delta S^j.$$
Therefore, the variance of $\Delta V$ is given by
$$\operatorname{Var}[\Delta V] \approx \sum_{ijkl} w_i \frac{\partial P^i}{\partial S^j} w_k \frac{\partial P^k}{\partial S^l} \Omega^{jl}_t,$$
and the daily Value at Risk for a given confidence level $\alpha$ may be estimated
via
$$\mathrm{VaR}_\alpha = N^{-1}(\alpha) \sqrt{\operatorname{Var}[\Delta V]},$$
where $N$ is the standard normal distribution function and $N^{-1}$ its inverse.
This method is extremely fast, but not so popular in practice due to its
important drawbacks: it may be very inaccurate for non-linear derivatives
and does not allow for fat tails in the distributions of risk factors.
• The historical approach is a full valuation method which relies on historical
data for obtaining risk factor scenarios. Most often, one uses one year of
daily increments of risk factor values: $(\Delta S^j_i)^{j=1\dots d}_{i=1\dots 250}$. These increments
are used to obtain 250 possible values for the variation of the portfolio price on the next
day:
$$\Delta V_i = \sum_{k=1}^N w_k \left[ P_k(t+1, S^1_t + \Delta S^1_i, \dots, S^d_t + \Delta S^d_i) - P_k(t, S^1_t, \dots, S^d_t) \right].$$
The Value at Risk is then estimated as the corresponding empirical quantile
of $\Delta V$.
This method preserves the dependency structure and the distributional
properties of the data, and applies to some extent to nonlinear products;
however, it also has a number of drawbacks. All intertemporal dependencies
which may be present in the data, such as stochastic volatility, are
destroyed. Some dependency on the current volatility may be preserved by
taking a relatively short time window, but in this case important scenarios
which have occurred in the not-so-recent past may be lost. Also, important
no-arbitrage relations between risk factors (such as between stock and
option prices) may be violated.
• The Monte Carlo VaR, used by most major banks, is the most flexible
approach, but it is also the most time consuming. It is a full valuation
method where the increments $\Delta S_i$ are simulated using a full-fledged statistical
model, which may include fat tails, intertemporal dependencies
such as stochastic volatility, correlations or copula-based cross-sectional
dependencies, and so on. The Monte Carlo computation of a global VaR
estimate for the entire bank's portfolio is probably the most time-consuming
single computation that a bank needs to perform, and may take an
entire night of computing on a cluster of processors.
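The first two methods in the list above can be sketched for a small portfolio. The code below is a minimal illustration under stated assumptions: the portfolio is linear in two underlyings, the aggregated deltas, the covariance matrix and the scenarios are hypothetical, and the passage of time between $t$ and $t+1$ is ignored in the revaluation. A Monte Carlo VaR would reuse `historical_var` with simulated instead of historical increments.

```python
import math
from statistics import NormalDist

def delta_normal_var(deltas, cov, alpha):
    """Delta-normal VaR; deltas[j] = sum_i w_i dP_i/dS_j, cov = covariance of increments."""
    d = len(deltas)
    variance = sum(deltas[j] * cov[j][l] * deltas[l] for j in range(d) for l in range(d))
    return NormalDist().inv_cdf(alpha) * math.sqrt(variance)

def historical_var(price_fn, spot, increments, alpha):
    """Full revaluation VaR; price_fn maps a list of underlying values to the portfolio value."""
    base = price_fn(spot)
    pnl = sorted(price_fn([s + ds for s, ds in zip(spot, scenario)]) - base
                 for scenario in increments)
    # Empirical (1 - alpha)-quantile; the offset guards against floating-point noise.
    k = max(1, math.ceil(len(pnl) * (1.0 - alpha) - 1e-9))
    return -pnl[k - 1]

# Hypothetical portfolio: long 10 units of asset 1, short 5 units of asset 2.
deltas = [10.0, -5.0]
cov = [[4.0, 1.0], [1.0, 9.0]]  # hypothetical covariance of daily increments
dn_var = delta_normal_var(deltas, cov, 0.99)

spot = [100.0, 50.0]
scenarios = [(i - 10, 0.5 * (10 - i)) for i in range(20)]  # made-up increments
hist_var = historical_var(lambda s: 10.0 * s[0] - 5.0 * s[1], spot, scenarios, 0.95)
print(dn_var, hist_var)
```

For nonlinear products only `price_fn` changes in the historical method, whereas the delta-normal number would remain a first-order approximation.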
Independently of the chosen method of computation, the VaR engine must
be systematically back-tested, that is, the number of daily losses exceeding the
previous day's VaR in absolute value must be carefully monitored. For a 95%
VaR, these losses should be observed roughly on five days out of 100 (with both
a larger and a smaller number being an indication of a poor VaR computation),
and they must be uniformly distributed over time, rather than appear in clusters.
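A basic back-test counts the exceptions and compares their number with what a binomial model would predict. The sketch below assumes independent exceptions with probability $1 - \alpha$ each day, which is an idealization, and does not test for clustering; the data are hypothetical.

```python
import math

def count_exceptions(pnl, var_forecasts):
    """Number of days on which the loss exceeded the VaR forecast for that day."""
    return sum(1 for x, v in zip(pnl, var_forecasts) if x < -v)

def binomial_tail(n, p, k):
    """P[N >= k] for N ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical daily P&L and a constant VaR forecast of 2.0.
pnl = [0.5, -1.0, -2.5, 0.3, 1.2, -0.4, -3.1, 0.8, 0.1, -1.9]
exceptions = count_exceptions(pnl, [2.0] * len(pnl))

# Probability of observing at least 10 exceptions in 100 days with a correct 95% VaR.
p_ten_or_more = binomial_tail(100, 0.05, 10)
print(exceptions, p_ten_or_more)
```

A small tail probability for the observed count suggests that the VaR is underestimated; an unusually low count points to an overly conservative engine.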
Shortcomings of the VaR approach The Value at Risk takes into account
the probability of loss but not the actual loss amount above the quantile level:
as long as the probability of loss is smaller than $1 - \alpha$, the VaR does not distinguish
between losing 1000 dollars and losing 1 billion dollars. More generally, VaR is only suitable for
everyday activities and loss sizes; it does not account for extreme losses which
happen with small probability. A sound risk management system cannot therefore
be based exclusively on the VaR and must include extensive stress testing
and scenario analysis.
Because the VaR lacks the crucial subadditivity property (see next section),
in some situations it may penalize diversification: $\mathrm{VaR}(A + B) > \mathrm{VaR}(A) + \mathrm{VaR}(B)$.
For example, let $A$ and $B$ be two independent portfolios with distribution
(amounts in dollars)
$$P[A = -1000] = P[B = -1000] = 0.04, \qquad P[A = 0] = P[B = 0] = 0.96.$$
Then $\mathrm{VaR}_{0.95}(A) = \mathrm{VaR}_{0.95}(B) = 0$ but $P[A + B \le -1000] = 0.0784$ and
$\mathrm{VaR}_{0.95}(A+B) = 1000 > 0$. Such situations are not uncommon in the domain
of credit derivatives, where the distributions are strongly non-Gaussian.
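The two-portfolio example can be verified by computing the law of $A + B$ explicitly. In this sketch, distributions are represented as dictionaries mapping outcomes to probabilities; the helper names are hypothetical.

```python
from itertools import product

def value_at_risk(dist, alpha):
    """VaR_alpha(X) = -inf{x : P[X <= x] >= 1 - alpha} for a discrete law {outcome: prob}."""
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= 1.0 - alpha - 1e-12:  # tolerance for rounding in the sums
            return -x
    raise ValueError("probabilities must sum to one")

def independent_sum(dist_a, dist_b):
    """Law of A + B for independent discrete A and B."""
    total = {}
    for (x, p), (y, q) in product(dist_a.items(), dist_b.items()):
        total[x + y] = total.get(x + y, 0.0) + p * q
    return total

a = {-1000: 0.04, 0: 0.96}
b = dict(a)
var_a = value_at_risk(a, 0.95)
var_b = value_at_risk(b, 0.95)
var_sum = value_at_risk(independent_sum(a, b), 0.95)
print(var_a, var_b, var_sum)
```

Each portfolio alone has zero VaR at the 95% level, while the combined portfolio has a VaR of 1000, so the VaR of the sum exceeds the sum of the VaRs.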
More dangerously, this lack of convexity makes it possible for unscrupulous
traders to introduce bias into risk estimates, by putting all risk in the tail of the
distribution which the VaR does not see, as illustrated in the following example.
Portfolio composition   n stocks           n stocks + n put options
Initial value           $nS_0$             $n(S_0 + P_0)$
Terminal value          $nS_T$             $n(S_T + P_0 - (K - S_T)^+)$
Portfolio P&L           $n(S_T - S_0)$     $n(S_T - S_0 - (K - S_T)^+)$
Portfolio VaR           $-nq_{1-\alpha}$   $-nq_{1-\alpha}$
Table 12.2: The sale of an out of the money put option with exercise probability
less than $1 - \alpha$ generates immediate profits but has no effect on $\mathrm{VaR}_\alpha$,
although the risk of the position is increased (see Example 12.1 for details).
Example 12.1. Consider a portfolio containing $n$ units of stock $(S_t)$, and let $q_\alpha$
denote the $\alpha$-quantile of the distribution (assumed continuous) of the stock price change
$S_T - S_0$ for the time horizon $T$:
$$q_\alpha := \inf\{x \in \mathbb{R} : P[S_T - S_0 \le x] \ge \alpha\}.$$
The value at risk of this portfolio for the time horizon $T$ and confidence level $\alpha$
is given by $-nq_{1-\alpha}$ (see Table 12.2). Assume now that the trader sells $n$ put
options on $S_T$ with strike $K$ satisfying $K \le S_0 + q_{1-\alpha}$, maturity $T$ and initial
price $P_0$. The initial value of the portfolio becomes equal to $n(S_0 + P_0)$ and the
terminal value is $n(S_T + P_0 - (K - S_T)^+)$. Since
$$P[n(S_T - S_0 - (K - S_T)^+) \le nq_{1-\alpha}] = P[n(S_T - S_0) \le nq_{1-\alpha};\ S_T \ge K] + P[S_T < K] = 1 - \alpha,$$
the VaR of the portfolio is unchanged by this transaction, although the risk is
increased.
12.4 Convex and coherent risk measures
In the seminal paper [1], Artzner et al. defined a set of properties that a risk
measure must possess if it is to be used for computing regulatory capital in a
sensible risk management system. Let $\mathcal{A}$ be the linear space of possible pay-offs,
containing the constants.
Definition 12.2. A mapping $\rho : \mathcal{A} \to \mathbb{R} \cup \{+\infty\}$ is called a coherent risk
measure if it possesses the following properties:
• Monotonicity: $X \le Y$ implies $\rho(X) \ge \rho(Y)$.
• Cash invariance: for all $m \in \mathbb{R}$, $\rho(X + m) = \rho(X) - m$.
• Subadditivity: $\rho(X + Y) \le \rho(X) + \rho(Y)$.
• Positive homogeneity: $\rho(\lambda X) = \lambda \rho(X)$ for $\lambda \ge 0$.
While the first three conditions seem quite natural (in particular, the subadditivity
is linked to the ability of the risk measure to encourage diversification),
the positive homogeneity property has been questioned by many authors. In
particular, a risk measure with this property does not take into account the
liquidity risk associated with liquidation costs of large portfolios.
Under the positive homogeneity property, the subadditivity is equivalent to
convexity: $\rho(\lambda X + (1-\lambda)Y) \le \lambda\rho(X) + (1-\lambda)\rho(Y)$, for $0 \le \lambda \le 1$. The Value
at Risk possesses the monotonicity, the cash invariance and the positive homogeneity
properties, but we have seen that it is not subadditive, and therefore it
is not a coherent risk measure.
The smallest coherent risk measure which dominates the VaR is known as
the Conditional VaR or expected shortfall, and is defined as the average VaR
with confidence levels between $\alpha$ and 1:
$$ES_\alpha(X) = \frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_\beta(X)\,d\beta. \qquad (12.1)$$
The following proposition clarifies the interpretation of $ES_\alpha$ as the expectation
of losses in excess of VaR.
Proposition 12.3. The Expected Shortfall admits the probabilistic representation
$$ES_\alpha(X) = \mathrm{VaR}_\alpha(X) + \frac{1}{1-\alpha} E[(-\mathrm{VaR}_\alpha(X) - X)^+].$$
If the distribution function of $X$, denoted by $F(x)$, is continuous then in addition
$$ES_\alpha(X) = -E[X \mid X < -\mathrm{VaR}_\alpha(X)].$$
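Proof. Let $F^{-1}(u) := \inf\{x \in \mathbb{R} : F(x) \ge u\}$ be the left-continuous generalized
inverse of $F$, such that $\mathrm{VaR}_\alpha(X) = -F^{-1}(1-\alpha)$. It is well known that if $U$
is a random variable, uniformly distributed on $[0, 1]$, then $F^{-1}(U)$ has the same
law as $X$. Then
$$\mathrm{VaR}_\alpha(X) + \frac{1}{1-\alpha} E[(-\mathrm{VaR}_\alpha(X) - X)^+] = \frac{1}{1-\alpha} E[(F^{-1}(1-\alpha) - F^{-1}(U))^+] - F^{-1}(1-\alpha)$$
$$= \frac{1}{1-\alpha} \int_0^{1-\alpha} (F^{-1}(1-\alpha) - F^{-1}(u))\,du - F^{-1}(1-\alpha) = -\frac{1}{1-\alpha} \int_0^{1-\alpha} F^{-1}(u)\,du,$$
which equals the right-hand side of (12.1) because $-F^{-1}(u) = \mathrm{VaR}_{1-u}(X)$. This
finishes the proof of the first part. The second observation follows from
the fact that if $F$ is continuous, $P[X \le -\mathrm{VaR}_\alpha(X)] = 1 - \alpha$.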
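The two expressions for the expected shortfall can be compared numerically on a discrete distribution. The sketch below reuses the law of a single position losing 1000 with probability 0.04 and the law of the sum of two independent such positions; function names are hypothetical.

```python
def value_at_risk(dist, alpha):
    """VaR_alpha(X) = -inf{x : P[X <= x] >= 1 - alpha} for a discrete law {outcome: prob}."""
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= 1.0 - alpha - 1e-12:  # tolerance for rounding in the sums
            return -x
    raise ValueError("probabilities must sum to one")

def expected_shortfall(dist, alpha):
    """ES via the representation VaR + E[(-VaR - X)^+] / (1 - alpha)."""
    var = value_at_risk(dist, alpha)
    excess = sum(p * max(-var - x, 0.0) for x, p in dist.items())
    return var + excess / (1.0 - alpha)

def tail_average(dist, alpha):
    """ES as minus the average of the quantile function over (0, 1 - alpha)."""
    remaining = 1.0 - alpha
    integral = 0.0
    for x in sorted(dist):
        mass = min(dist[x], remaining)
        integral += mass * x
        remaining -= mass
        if remaining <= 0.0:
            break
    return -integral / (1.0 - alpha)

a = {-1000: 0.04, 0: 0.96}
a_plus_b = {-2000: 0.0016, -1000: 0.0768, 0: 0.9216}  # sum of two independent copies
es_a = expected_shortfall(a, 0.95)
es_sum = expected_shortfall(a_plus_b, 0.95)
print(es_a, tail_average(a, 0.95), es_sum, tail_average(a_plus_b, 0.95))
```

Both formulas give 800 for the single position and 1032 for the sum, so on this example the expected shortfall, unlike the VaR, does not penalize the combination of the two portfolios.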
The monotonicity, cash invariance and positive homogeneity properties of the
expected shortfall are clear from the definition (12.1). The following proposition
establishes a dual representation, which shows that unlike the VaR, the expected
shortfall is convex (as the upper bound of a family of linear functions). It is
therefore the simplest example of a coherent risk measure.
Proposition 12.4. The expected shortfall admits the representation
$$ES_\alpha(X) = \sup\{E^Q[-X] : Q \in \mathcal{Q}_{1-\alpha}\},$$
where $\mathcal{Q}_{1-\alpha}$ is the set of probability measures on $\mathcal{A}$ satisfying $\frac{dQ}{dP} \le \frac{1}{1-\alpha}$.
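Proof. Let $Q \in \mathcal{Q}_{1-\alpha}$ with $Z = \frac{dQ}{dP}$, and let $x = -VaR_\alpha(X)$. Then, using the representation of Proposition 12.3,
$$\begin{aligned}
ES_\alpha(X) + E[ZX] &= \frac{1}{1-\alpha}E\big[(x - X)1_{x \ge X} + (1-\alpha)Z(X - x)\big]\\
&= \frac{1}{1-\alpha}E\big[(x - X)1_{x \ge X}\big(1 - (1-\alpha)Z\big)\big] + \frac{1}{1-\alpha}E\big[(X - x)1_{x < X}(1-\alpha)Z\big] \ge 0, \qquad (12.2)
\end{aligned}$$
which shows that $ES_\alpha(X) \ge \sup\{E^Q[-X] : Q \in \mathcal{Q}_{1-\alpha}\}$. On the other hand, there exists $c \in [0,1]$ such that $P[X < x] + cP[X = x] = 1-\alpha$. Taking
$$Z = \frac{1_{X<x}}{1-\alpha} + \frac{c\,1_{X=x}}{1-\alpha},$$
we get equality in (12.2).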
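The optimal density constructed in the proof can be checked numerically on a finite probability space. The sketch below assumes a discrete gain distribution given by `outcomes` and `probs`; all function names are hypothetical helpers introduced for this illustration only.

```python
def quantile(outcomes, probs, level):
    """Left-continuous generalized inverse F^{-1}(level) of a discrete law."""
    cum = 0.0
    for v, p in sorted(zip(outcomes, probs)):
        cum += p
        if cum >= level - 1e-12:   # tolerance for floating-point noise
            return v
    return max(outcomes)

def es_primal(outcomes, probs, alpha):
    """ES via Proposition 12.3, with x = -VaR_alpha(X)."""
    level = 1 - alpha
    x = quantile(outcomes, probs, level)
    excess = sum(p * max(x - v, 0.0) for v, p in zip(outcomes, probs))
    return -x + excess / level

def worst_case_density(outcomes, probs, alpha):
    """Density Z = (1_{X<x} + c 1_{X=x}) / (1 - alpha) from the proof."""
    level = 1 - alpha
    x = quantile(outcomes, probs, level)
    p_less = sum(p for v, p in zip(outcomes, probs) if v < x)
    p_eq = sum(p for v, p in zip(outcomes, probs) if v == x)
    c = (level - p_less) / p_eq    # solves P[X<x] + c P[X=x] = 1 - alpha
    return [(1.0 if v < x else c if v == x else 0.0) / level for v in outcomes]

outcomes = [-10, -4, 0, 5, 8]
probs = [0.02, 0.08, 0.4, 0.3, 0.2]
z = worst_case_density(outcomes, probs, 0.95)
es_dual = sum(-v * zi * p for v, zi, p in zip(outcomes, z, probs))  # E^Q[-X]
```

Here `z` integrates to one, is bounded by $1/(1-\alpha)$, and `es_dual` agrees with `es_primal(outcomes, probs, 0.95)`, as the equality case of (12.2) predicts.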
The expected shortfall presents all the advantages of the Value at Risk and
avoids many of its drawbacks: it encourages diversification, presents computational
advantages compared to VaR and does not allow for regulatory arbitrage.
Yet to this day, although certain major banks use the expected shortfall for
internal monitoring purposes, the VaR remains by far the most widely used
measure for the computation of regulatory capital, and the Expected Shortfall
is not even mentioned in the Basel capital accords on banking supervision. In
addition to a certain conservatism of the banking environment, a possible reason
for that is that the Expected Shortfall is a more conservative risk measure, and
would therefore imply higher costs for the bank in terms of regulatory capital.
12.5 Regulatory capital and the Basel framework
Starting from the early 1980s, the banking regulators of the developed nations became increasingly aware of the necessity to rethink and standardize regulatory practices, in order to avoid the dangers of the growing exposure to derivative products and loans to emerging markets on the one hand, and to ensure fair competition between internationally operating banks on the other hand. The work of the Basel Committee on Banking Supervision, created for the purpose of developing a set of recommendations for regulators, resulted in the publication of the 1988 Basel Accord (Basel I) [3]. The Accord concerned exclusively credit risk, and contained simple rules for computing minimal capital requirements depending on the credit exposures of a bank. More precisely, the Accord defines two minimal capital requirements, to be met by a bank at all times: the assets to capital multiple, and the risk-based capital ratio (the Cooke ratio). The assets to capital multiple is the ratio of the total notional amount of the bank's assets to the bank's capital (meaning equity capital, that is, the difference between assets and debt, plus some additions). The maximum allowed multiple is 20:
$$\frac{\text{Total assets}}{\text{Capital}} \le 20.$$
The risk-based capital ratio is the ratio of the capital to the sum of all assets, weighted by their respective risk factors. This ratio must not be less than 8 per cent:
$$\frac{\text{Capital}}{\text{Risk-weighted assets}} \ge 8\%.$$
The risk weights reflect the relative riskiness of very broad asset classes: for example,
cash, gold and OECD government bonds are considered risk-free and have risk
weight zero; claims on OECD banks and public agencies have risk weight 0.20;
uninsured residential mortgages have weight 0.50 and all other claims such as
corporate bonds have risk weight 1.00. The credit ratings of bond issuers are
not explicitly taken into account under the Basel I accord.
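The two Basel I constraints can be illustrated with a short computation. This is only a sketch: the risk weights are those quoted in the paragraph above, while the asset class labels, the balance sheet figures and the function names are made up for the example.

```python
# Basel I risk weights as quoted in the text (class labels are illustrative)
RISK_WEIGHTS = {
    "cash": 0.0,
    "oecd_government_bond": 0.0,
    "oecd_bank_claim": 0.20,
    "residential_mortgage": 0.50,
    "corporate_bond": 1.00,
}

def risk_weighted_assets(book):
    """Sum of notional amounts weighted by their risk weights."""
    return sum(RISK_WEIGHTS[asset] * amount for asset, amount in book.items())

def cooke_ratio(capital, book):
    """Risk-based capital ratio: capital / risk-weighted assets."""
    return capital / risk_weighted_assets(book)

def assets_to_capital_multiple(capital, book):
    """Total notional amount of assets divided by capital."""
    return sum(book.values()) / capital

# Hypothetical balance sheet (same monetary unit throughout)
book = {"cash": 100, "oecd_bank_claim": 200,
        "residential_mortgage": 300, "corporate_bond": 400}
capital = 60
rwa = risk_weighted_assets(book)
compliant = (cooke_ratio(capital, book) >= 0.08
             and assets_to_capital_multiple(capital, book) <= 20)
```

For this hypothetical bank the risk-weighted assets amount to 590, so a capital of 60 satisfies both the 8 per cent ratio and the maximum multiple of 20.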
The rapid growth of banks' trading activity, especially in derivative products, prompted the Basel Committee to develop a set of recommendations for the computation of regulatory capital needed for protection against market risk, known as the 1996 market risk amendment [4]. This amendment allows the banks to choose between a standard model proposed by the regulator (the standardized approach) and an internally developed VaR model (the internal models approach). To be eligible for the internal models approach, a bank must have a strong risk management team reporting only to the senior management, implement a robust back-testing scheme and meet a number of other requirements. This creates an incentive for banks to develop strong risk management, since using internal models allows them to reduce the regulatory capital by 20 to 50 per cent, thanks to correlations and diversification effects. The capital requirements are computed using the 10-day VaR at the 99% confidence level, multiplied by an adjustment factor imposed by the regulator and reflecting provisions for model risk, quality of ex-post performance etc.
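As a rough numerical illustration of the last rule, the sketch below converts a one-day VaR into a capital figure. Two ingredients are assumptions of this example and are not taken from the text: the square-root-of-time scaling from 1 to 10 days (exact only for i.i.d. centered Gaussian returns) and the value of the adjustment factor, which is set by the regulator.

```python
import math

def ten_day_var(one_day_var):
    """Scale a 1-day VaR to 10 days (assumption: square-root-of-time rule)."""
    return one_day_var * math.sqrt(10)

def market_risk_capital(var_10d, adjustment_factor):
    """Capital requirement: 10-day 99% VaR times the regulatory factor."""
    return adjustment_factor * var_10d

# Hypothetical inputs: 1-day 99% VaR of 2.0 (in millions), factor 3.0
capital = market_risk_capital(ten_day_var(2.0), adjustment_factor=3.0)
```

With these hypothetical inputs the requirement is about 19 million; a larger adjustment factor raises it proportionally.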
Already in the late 1990s, it was clear that the Basel I accord needed replacement, because of such notorious problems as rating-independent risk weights and the possibility of regulatory arbitrage via securitization. In 2004, the Basel Committee published a new capital adequacy framework known as Basel II [5]. This framework describes capital provisions for credit, market and newly introduced operational risk. In addition to describing the computation of minimum regulatory requirements (Pillar I of the framework), it also describes different aspects of interaction between banks and their regulators (Pillar II) and the requirements for disclosure of risk information to encourage market discipline (Pillar III). The basic Cooke ratio formula remains the same, but the method for computing the risk-weighted assets is considerably modified. For different risk types, the banks have the choice between several approaches, depending on the strength of their risk management teams. For credit risk, these approaches are the Standardized approach, the Foundation internal ratings based approach (IRBA) and the Advanced IRBA. Under the Standardized approach, the banks use supervisory formulas and the ratings provided by an external rating agency. Under the Foundation IRBA the banks are allowed to estimate their own default probability, and under the Advanced IRBA other parameters such as loss given default (LGD) and exposure at default (EAD) are also estimated internally. These inputs are then plugged into the supervisor-provided general formula to compute the capital requirements.
Appendix A
Preliminaries of measure theory
A.1 Measurable spaces and measures
Throughout this section, $\Omega$ denotes an arbitrary set, and $\mathcal{P}(\Omega)$ is the collection of all its subsets.
A.1.1 Algebras, $\sigma$-algebras
Definition A.1. Let $\mathcal{A}_0, \mathcal{A} \subset \mathcal{P}(\Omega)$. We say that
(i) $\mathcal{A}_0$ is an algebra on $\Omega$ if $\mathcal{A}_0$ contains $\Omega$ and is stable under complementation and union.
(ii) $\mathcal{A}$ is a $\sigma$-algebra if it is an algebra stable under countable union. We then say that $(\Omega, \mathcal{A})$ is a measurable space.
Note that an algebra must also contain $\emptyset$, and is stable under intersection and symmetric difference, i.e.
$$A \cap B \ \text{ and } \ A \Delta B := (A \cup B) \setminus (A \cap B) \in \mathcal{A}_0 \quad \text{for all } A, B \in \mathcal{A}_0,$$
and that a $\sigma$-algebra is stable under countable intersection. $\mathcal{P}(\Omega)$ is the largest $\sigma$-algebra on $\Omega$. It turns out, however, that this $\sigma$-algebra is often too large for the necessary mathematical tools to be developed on it.
Apart from very simple cases, it is often impossible to list the elements of an algebra or of a $\sigma$-algebra. It is then convenient to characterize them by sufficiently rich subsets.
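Thus, for every $\mathcal{C} \subset \mathcal{P}(\Omega)$, we define the $\sigma$-algebra $\sigma(\mathcal{C})$ generated by $\mathcal{C}$. It is the smallest $\sigma$-algebra on $\Omega$ containing $\mathcal{C}$, defined as the intersection of all $\sigma$-algebras on $\Omega$ containing $\mathcal{C}$.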
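On a finite set, the generated $\sigma$-algebra can be computed explicitly by closing the collection under complements and unions (on a finite set, algebras and $\sigma$-algebras coincide). The following is a minimal sketch of this closure; the function name is ours, and the brute-force approach is only meant for very small sets.

```python
def generated_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on the finite set omega containing collection."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        snapshot = list(sets)
        for a in snapshot:
            # close under complement and pairwise union
            for candidate in [omega - a] + [a | b for b in snapshot]:
                if candidate not in sets:
                    sets.add(candidate)
                    changed = True
    return sets

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
```

In this example the atoms are {1}, {2} and {3, 4}, so the generated $\sigma$-algebra has $2^3 = 8$ elements; the points 3 and 4 cannot be separated.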
Example A.2. If $\Omega$ is a topological space, the Borel $\sigma$-algebra, denoted by $\mathcal{B}_\Omega$, is the $\sigma$-algebra generated by the open subsets of $\Omega$. For the real line, the description of $\mathcal{B}_{\mathbb{R}}$ can even be simplified:
$$\mathcal{B}_{\mathbb{R}} = \sigma(\pi(\mathbb{R})) \quad \text{where} \quad \pi(\mathbb{R}) := \{\,]-\infty, x] : x \in \mathbb{R}\}$$
(Exercise!)
The previous example is generalized by the following notion:
Definition A.3. Let $\mathcal{I} \subset \mathcal{P}(\Omega)$. We say that $\mathcal{I}$ is a $\pi$-system if it is stable under finite intersection.
Thus the set $\pi(\mathbb{R})$ of the above example is a $\pi$-system. The importance of this notion will become apparent in Proposition A.5 below, as well as in the monotone class theorem A.18 of Section A.2.
A.1.2 Measures
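Definition A.4. Let $\mathcal{A}_0$ be an algebra on $\Omega$, and $\mu_0 : \mathcal{A}_0 \to \mathbb{R}_+$ a nonnegative function.
(i) $\mu_0$ is said to be additive if $\mu_0(\emptyset) = 0$ and for all $A, B \in \mathcal{A}_0$:
$$\mu_0(A \cup B) = \mu_0(A) + \mu_0(B) \quad \text{whenever } A \cap B = \emptyset.$$
(ii) $\mu_0$ is said to be $\sigma$-additive if $\mu_0(\emptyset) = 0$ and for every sequence $(A_n)_{n \ge 0} \subset \mathcal{A}_0$:
$$A = \bigcup_{n \ge 0} A_n \in \mathcal{A}_0 \ \text{ and the } A_n \text{ disjoint} \implies \mu_0(A) = \sum_{n \ge 0} \mu_0(A_n).$$
(iii) A $\sigma$-additive function $\mu : \mathcal{A} \to \mathbb{R}_+$ on a measurable space $(\Omega, \mathcal{A})$ is called a measure, and $(\Omega, \mathcal{A}, \mu)$ is called a measure space.
(iv) A measure space $(\Omega, \mathcal{A}, \mu)$ is said to be finite if $\mu(\Omega) < \infty$, and $\sigma$-finite if there exists a sequence $(\Omega_n)_{n \ge 0} \subset \mathcal{A}$ such that $\mu(\Omega_n) < \infty$ and $\bigcup_{n \ge 0} \Omega_n = \Omega$.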
Proposition A.5. Let $\mathcal{I}$ be a $\pi$-system, and $\mu$, $\nu$ two finite measures on the measurable space $(\Omega, \sigma(\mathcal{I}))$. If $\mu = \nu$ on $\mathcal{I}$ then $\mu = \nu$ on $\sigma(\mathcal{I})$.
The proof is deferred, as a complement, to the appendix of this chapter. The following result is essential for constructing interesting measures.
Theorem A.6. (Carathéodory extension) Let $\mathcal{A}_0$ be an algebra on $\Omega$, and $\mu_0 : \mathcal{A}_0 \to \mathbb{R}_+$ a $\sigma$-additive function. Then there exists a measure $\mu$ on $\mathcal{A} := \sigma(\mathcal{A}_0)$ such that $\mu = \mu_0$ on $\mathcal{A}_0$. If in addition $\mu_0(\Omega) < \infty$, then such an extension is unique.
The proof is deferred, as a complement, to the appendix of this chapter. With this result, we can now construct an important measure on the measurable space $(]0,1], \mathcal{B}_{]0,1]})$.
Example A.7. (Lebesgue measure) We shall define a measure on $\mathcal{B}_{]0,1]}$ which measures lengths.
1- We first observe that the collection $\mathcal{A}_0$ of subsets $A \subset\, ]0,1]$ of the form
$$A = \bigcup_{1 \le i \le n} (a_i, b_i] \quad \text{for } n \in \mathbb{N} \text{ and } 0 \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le 1, \qquad (A.1)$$
is an algebra such that $\mathcal{B}_{]0,1]} = \sigma(\mathcal{A}_0)$. For every $A \in \mathcal{A}_0$ of the form (A.1), we define
$$\lambda_0(A) := \sum_{i=1}^n (b_i - a_i).$$
2- Then $\lambda_0 : \mathcal{A}_0 \to \mathbb{R}_+$ is a well-defined map and is obviously additive. One can show that it is $\sigma$-additive (this is less obvious, see the first-year course). Since $\lambda_0(]0,1]) < \infty$, we deduce from Carathéodory's theorem the existence of a unique extension $\lambda$ defined on $\mathcal{B}_{]0,1]}$.
This finite measure $\lambda$ is called the Lebesgue measure on $]0,1]$. The Lebesgue measure on $[0,1]$ is obtained by a trivial modification, since the singleton $\{0\}$ has zero Lebesgue measure.
3- By the same reasoning, one can construct the Lebesgue measure on $\mathcal{B}_{\mathbb{R}}$ as the extension of a set function on the algebra of finite unions of disjoint half-open intervals. In this case, the Lebesgue measure is only $\sigma$-finite.
Definition A.8. (i) On a measure space $(\Omega, \mathcal{A}, \mu)$, a set $N \in \mathcal{A}$ is said to be negligible if $\mu(N) = 0$.
(ii) Let $P(\omega)$ be a property which depends only on an element $\omega \in \Omega$. We say that $P$ holds almost everywhere, and we write $\mu$-a.e., if the set $\{\omega \in \Omega : P(\omega) \text{ is false}\}$ is contained in a negligible set.
Remark A.9. From the $\sigma$-additivity property of the measure, one easily sees that any countable union of negligible sets is negligible.
A.1.3 Elementary properties of measures
We start with properties involving a finite number of sets.
Proposition A.10. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and $(A_i)_{i \le n} \subset \mathcal{A}$. Then:
(i) $\mu\big(\bigcup_{i \le n} A_i\big) \le \sum_{i \le n} \mu(A_i)$,
(ii) If in addition $\mu(\Omega) < \infty$, we have
$$\mu\Big(\bigcup_{i \le n} A_i\Big) = \sum_{k \le n} (-1)^{k-1} \sum_{i_1 < \ldots < i_k \le n} \mu(A_{i_1} \cap \ldots \cap A_{i_k}).$$
The proof of this result is an immediate consequence of the definition of a measure. Part (ii), which is specific to finite measures, gives a formula for the measure of a finite union of sets which alternates between over-estimation and under-estimation. For $n = 2$ this formula is nothing but the well-known property $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$ for $A, B \in \mathcal{A}$.
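The inclusion-exclusion formula of part (ii) is easy to verify on finite sets. The sketch below uses the counting measure (the default `mu=len`), which is a choice made for this illustration; any finite measure given as a function on sets would do.

```python
from itertools import combinations

def union_measure(sets, mu=len):
    """Measure of a finite union computed by inclusion-exclusion."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k - 1)
        for combo in combinations(sets, k):
            # measure of the intersection of the k selected sets
            total += sign * mu(set.intersection(*combo))
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
via_formula = union_measure(sets)
direct = len(set.union(*sets))
```

The partial sums over $k$ alternate between upper and lower bounds of the true value, here $9$, $9 - 5 = 4$, and $9 - 5 + 1 = 5$.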
The following (simple) result is fundamental in measure theory. For a sequence of sets $(A_n)_n$, we simply write $A_n \uparrow A$ to indicate that the sequence is nondecreasing ($A_n \subset A_{n+1}$) and $\bigcup_n A_n = A$. The notation $A_n \downarrow A$ has a similar meaning in the case where the sequence is nonincreasing.
Proposition A.11. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and $(A_n)_n$ a sequence in $\mathcal{A}$. Then
(i) $A_n \uparrow A \implies \mu(A_n) \uparrow \mu(A)$,
(ii) $A_n \downarrow A$ and $\mu(A_k) < \infty$ for some integer $k$ $\implies \mu(A_n) \downarrow \mu(A)$.
The simple proof of this result is left as an exercise. Let us just make two remarks:
• A consequence of Proposition A.11 is that a countable union of sets of measure zero has measure zero.
• The example $A_n = \,]n, \infty[$ in the measure space $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, \lambda)$, $\lambda$ being the Lebesgue measure on $\mathbb{R}$, shows that the additional condition in (ii) is necessary.
These results allow us to establish important tools for the analysis of the convergence of measures of sets. We recall the notions of liminf and limsup for a sequence of sets $(E_n)_n$:
$$\limsup E_n := \bigcap_n \bigcup_{k \ge n} E_k = \{\omega \in \Omega : \omega \in E_n \text{ for infinitely many } n\},$$
$$\liminf E_n := \bigcup_n \bigcap_{k \ge n} E_k = \{\omega \in \Omega : \omega \in E_n \text{ for all } n \ge n_0(\omega), \text{ for some rank } n_0(\omega)\}.$$
The following result is very useful.
Lemma A.12. (Fatou's lemma for sets) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and $(A_n)_n$ a sequence in $\mathcal{A}$. Then
$$\mu[\liminf A_n] \le \liminf \mu[A_n].$$
Proof. By definition, we have $B_n := \bigcap_{k \ge n} A_k \uparrow B := \liminf A_n$, and we deduce from Proposition A.11 (i) that $\mu[B] = \lim \uparrow \mu[B_n]$. To conclude, it suffices to observe that $B_n \subset A_n$ and hence $\mu[B_n] \le \mu[A_n]$, implying that $\lim \uparrow \mu[B_n] \le \liminf \mu[A_n]$.
If the measure is finite, the following result shows that the reverse inequality in Fatou's lemma for sets holds upon exchanging liminf and limsup. We shall see later that the situation is more complicated for functions.
Lemma A.13. (reverse Fatou for sets) Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space, and $(A_n)_n$ a sequence in $\mathcal{A}$. Then
$$\mu[\limsup A_n] \ge \limsup \mu[A_n].$$
Proof. By definition, we have $C_n := \bigcup_{k \ge n} A_k \downarrow C := \limsup A_n$. Proposition A.11 (ii), which requires the measure to be finite, gives $\mu[C] = \lim \downarrow \mu[C_n]$. To conclude, it suffices to observe that $C_n \supset A_n$ and hence $\mu[C_n] \ge \mu[A_n]$, implying that $\lim \downarrow \mu[C_n] \ge \limsup \mu[A_n]$.
Finally, we state the following result, which will be used several times and which will be completed later, once we have introduced the notions of independence.
Lemma A.14. (First Borel-Cantelli lemma) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and $(A_n)_n \subset \mathcal{A}$. Then
$$\sum_n \mu[A_n] < \infty \implies \mu[\limsup A_n] = 0.$$
Proof. With the notation of the proof of Lemma A.13, we have $\limsup A_n \subset C_n = \bigcup_{k \ge n} A_k$, and therefore $\mu(\limsup A_n) \le \mu(C_n) \le \sum_{k \ge n} \mu(A_k)$. The result is obtained by sending $n$ to infinity.
A.2 The Lebesgue integral
In this section, we consider a measure space $(\Omega, \mathcal{A}, \mu)$, and we develop the theory of integration of a function with respect to the measure $\mu$. If $\Omega$ is countable, $\mathcal{A} = \mathcal{P}(\Omega)$, and $\mu(\{\omega\}) = 1$ for every $\omega \in \Omega$, a function is identified with a sequence $(a_n)_n$; it is integrable if and only if $\sum_n |a_n| < \infty$, and the integral is given by the value of the series $\sum_n a_n$. The real difficulty therefore lies with uncountable spaces.
A.2.1 Measurable functions
The central object in topology is the structure of open sets, and continuous functions are characterized by the property that the inverse images of the open subsets of the target set are open subsets of the source set. In measure theory, open sets are replaced by measurable sets, and measurable functions replace continuous functions.
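Definition A.15. A function $f : (\Omega, \mathcal{A}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is said to be measurable if the inverse image of every Borel set is in $\mathcal{A}$. We denote by $L^0(\mathcal{A})$ the set of measurable functions. The subsets of nonnegative (resp. bounded) measurable functions are denoted by $L^0_+(\mathcal{A})$ (resp. $L^\infty(\mathcal{A})$).
Equivalently, $f \in L^0(\mathcal{A})$ if and only if the inverse $f^{-1}$ is well defined as a map from $\mathcal{B}_{\mathbb{R}}$ to $\mathcal{A}$, i.e. $f^{-1} : \mathcal{B}_{\mathbb{R}} \to \mathcal{A}$. If $\mathcal{C} \subset \mathcal{B}_{\mathbb{R}}$ is such that $\sigma(\mathcal{C}) = \mathcal{B}_{\mathbb{R}}$, then it suffices to check that $f^{-1} : \mathcal{C} \to \mathcal{A}$.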
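Remark A.16. (i) Taking $\mathcal{C} = \pi(\mathbb{R})$, the $\pi$-system of intervals of the form $]-\infty, c]$, $c \in \mathbb{R}$, we see that
$$f \in L^0(\mathcal{A}) \quad \text{iff} \quad \{f \le c\} \in \mathcal{A} \text{ for all } c \in \mathbb{R}.$$
(ii) Suppose that $\Omega$ is a topological space, and that $f : \Omega \to \mathbb{R}$ is continuous. Then $f$ is $\mathcal{B}_\Omega$-measurable. Indeed, with $\mathcal{C} = \{\text{open subsets of } \mathbb{R}\}$, continuity implies $f^{-1} : \mathcal{C} \to \mathcal{B}_\Omega$, and $\sigma(\mathcal{C}) = \mathcal{B}_{\mathbb{R}}$. We say that $f$ is a Borel function.
(iii) Let $X$ be a map from $\Omega$ into a countable set $X(\Omega) = \{x_n, n \in \mathbb{N}\}$. We equip $X(\Omega)$ with the largest $\sigma$-algebra $\mathcal{P}(X(\Omega))$ and we observe that $\mathcal{P}(X(\Omega)) = \sigma(\{\{\omega\} : \omega \in X(\Omega)\})$. This allows us to conclude that $X$ is measurable if and only if $\{X = x_n\} \in \mathcal{A}$ for all $n \in \mathbb{N}$.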
Measurability is preserved by the usual operations on functions.
Proposition A.17. (i) For $f, g \in L^0(\mathcal{A})$, $h \in L^0(\mathcal{B}_{\mathbb{R}})$, and $\lambda \in \mathbb{R}$, we have $f + g$, $\lambda f$, $fg$ and $h \circ f \in L^0(\mathcal{A})$.
(ii) For a sequence $(f_n)_n \subset L^0(\mathcal{A})$, we have $\inf f_n$, $\liminf f_n$, $\sup f_n$ and $\limsup f_n \in L^0(\mathcal{A})$.
The proof is simple and is left as an exercise. Before turning to the central object of this chapter, namely the construction of the Lebesgue integral, we state a simple version of the monotone class theorem, which will only be used later in the construction of product measure spaces.
Theorem A.18. (monotone classes) Let $\mathcal{H}$ be a class of bounded real functions on $\Omega$ satisfying the following conditions:
(H1) $\mathcal{H}$ is a vector space containing the constant function 1,
(H2) for every nondecreasing sequence $(f_n)_n \subset \mathcal{H}$ of nonnegative functions such that $f := \lim \uparrow f_n$ is bounded, we have $f \in \mathcal{H}$.
Let $\mathcal{I}$ be a $\pi$-system such that $\{1_A : A \in \mathcal{I}\} \subset \mathcal{H}$. Then $L^\infty(\sigma(\mathcal{I})) \subset \mathcal{H}$.
The proof is deferred, as a complement, to the appendix of this chapter.
A.2.2 Integration of nonnegative functions
The aim of this paragraph is to define, for every nonnegative measurable function $f$, a notion of integral with respect to the measure $\mu$:
$$\int f\,d\mu, \quad \text{also denoted by } \mu(f),$$
which is a commonly accepted abuse of notation ($\mu : \mathcal{A} \to \mathbb{R}$!) motivated by the fact that our definition must satisfy
$$\int 1_A\,d\mu = \mu(A) \quad \text{for all } A \in \mathcal{A}.$$
More generally, let $\mathcal{S}^+$ be the set of functions from $\Omega$ to $\mathbb{R}_+$ of the form
$$g = \sum_{i=1}^n a_i 1_{A_i}, \qquad (A.2)$$
for some integer $n \ge 1$, sets $A_i \in \mathcal{A}$, and scalars $a_i \in [0, \infty]$, $1 \le i \le n$. Here, it is convenient to allow the value $+\infty$, and we shall use the computation rules $0 \times \infty = \infty \times 0 = 0$. The integral on $\mathcal{S}^+$ is defined by:
$$\mu(g) = \sum_{i=1}^n a_i\,\mu(A_i). \qquad (A.3)$$
It is clear that $\mu(g)$ is well defined, i.e. two different representations (A.2) of an element $g \in \mathcal{S}^+$ give the same value. We now extend the definition of $\mu$ to the set $L^0_+(\mathcal{A})$ of nonnegative $\mathcal{A}$-measurable functions.
Definition A.19. For $f \in L^0_+(\mathcal{A})$, the integral of $f$ with respect to $\mu$ is defined by
$$\mu(f) := \sup\{\mu(g) : g \in \mathcal{S}^+ \text{ and } g \le f\}.$$
The set $\{g \in \mathcal{S}^+ : g \le f\}$, whose supremum defines the integral, contains the zero function. One can also construct nontrivial elements by introducing the function
$$\alpha_n(x) := n 1_{]n, \infty[}(x) + \sum_{i \ge 1} (i-1)2^{-n} 1_{B^n_i}(x), \qquad B^n_i := [0, n] \cap\, ](i-1)2^{-n}, i2^{-n}].$$
Indeed, for every $f \in L^0_+(\mathcal{A})$:
$$(\alpha_n \circ f)_n \subset \mathcal{S}^+ \text{ is a nondecreasing sequence which converges to } f. \qquad (A.4)$$
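The staircase function $\alpha_n$ is simple enough to be coded directly, which makes the monotone approximation (A.4) concrete. The sketch below evaluates $\alpha_n$ at a nonnegative real number; the name `alpha` and the test values are ours.

```python
import math

def alpha(n, x):
    """Dyadic staircase: n above level n, else the left end of the dyadic
    interval ((i-1) 2^-n, i 2^-n] containing x (and 0 at x = 0)."""
    if x > n:
        return float(n)
    if x <= 0:
        return 0.0
    i = math.ceil(x * 2 ** n)      # x lies in ((i-1) 2^-n, i 2^-n]
    return (i - 1) / 2 ** n

# Successive approximations of the value f(omega) = 0.7
approximations = [alpha(n, 0.7) for n in range(1, 6)]
```

The values increase with $n$ and, as soon as $n \ge x$, lie within $2^{-n}$ of $x$, which is the pointwise convergence used in (A.4).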
The definition of the integral immediately implies that
$$\mu(cf) = c\,\mu(f) \quad \text{for all } c \in \mathbb{R}_+ \text{ and } f \in L^0_+(\mathcal{A}), \qquad (A.5)$$
as well as the following monotonicity property.
Lemma A.20. For $f_1, f_2 \in L^0_+(\mathcal{A})$ with $f_1 \le f_2$, we have $0 \le \mu(f_1) \le \mu(f_2)$. Moreover $\mu(f_1) = 0$ if and only if $f_1 = 0$, $\mu$-a.e.
Proof. For the first part, it suffices to observe that $\{g \in \mathcal{S}^+ : g \le f_1\} \subset \{g \in \mathcal{S}^+ : g \le f_2\}$. For the second part of the statement, write $f := f_1$ and recall that $\mu(\{f > 0\}) = \lim \uparrow \mu(\{f > n^{-1}\})$ by Proposition A.11. If $\mu(\{f > 0\}) > 0$, we have $\mu(\{f > n^{-1}\}) > 0$ for $n$ large enough. Then $f \ge g := n^{-1} 1_{\{f > n^{-1}\}} \in \mathcal{S}^+$, and we deduce from the definition of the integral that $\mu(f) \ge \mu(g) = n^{-1}\mu(\{f > n^{-1}\}) > 0$.
The result at the basis of integration theory is the following extension of the monotone convergence property of measures of sets stated in Proposition A.11 (i).
Theorem A.21. (monotone convergence) Let $(f_n)_n \subset L^0_+(\mathcal{A})$ be a $\mu$-a.e. nondecreasing sequence, i.e. for all $n \ge 1$, $f_n \le f_{n+1}$ $\mu$-a.e. Then
$$\mu(\lim \uparrow f_n) = \lim \uparrow \mu(f_n).$$
Proof. We proceed in two steps.
Step 1. We first assume that $f_n \le f_{n+1}$ on $\Omega$. We denote $f := \lim \uparrow f_n$. By Lemma A.20, the sequence of integrals $(\mu(f_n))_n$ inherits the monotonicity of the sequence $(f_n)_n$ and is bounded from above by $\mu(f)$. This shows the inequality $\lim \uparrow \mu(f_n) \le \mu(\lim \uparrow f_n)$.
To establish the reverse inequality, we must show that $\lim \uparrow \mu(f_n) \ge \mu(g)$ for every $g = \sum_{i=1}^k a_i 1_{A_i} \in \mathcal{S}^+$ satisfying $g \le f$. For every $c \in [0, 1[$, we deduce from Lemma A.20 and (A.5) that:
$$\mu(f_n) \ge \mu(f_n 1_{\{f_n \ge cg\}}) \ge c\,\mu(g 1_{\{f_n \ge cg\}}) = c\sum_{i=1}^k a_i\,\mu(A_i \cap \{f_n \ge ca_i\}).$$
Using the monotone convergence property of measures of sets stated in Proposition A.11 (i), we then obtain:
$$\lim \uparrow \mu(f_n) \ge c\sum_{i=1}^k a_i\,\mu(A_i) = c\,\mu(g) \to \mu(g) \quad \text{as } c \to 1.$$
Step 2. In the remainder of the proof, we want to pass from the monotonicity of the sequence $(f_n)_n$ on $\Omega$ to the $\mu$-a.e. monotonicity. To this end, we introduce $\Omega_0 = \{\omega \in \Omega : (f_n(\omega))_n \text{ nondecreasing}\}$, the sequence (nondecreasing on $\Omega$) $\tilde f_n := f_n 1_{\Omega_0}$, and the nondecreasing (on $\Omega$) approximations by simple functions $(\alpha_k \circ f_n)_k$, $(\alpha_k \circ \tilde f_n)_k$ of $f_n$, $\tilde f_n$, as in (A.4). The definition of the integral for simple functions trivially gives $\mu(\alpha_k \circ f_n) = \mu(\alpha_k \circ \tilde f_n)$, and hence $\mu(f_n) = \mu(\tilde f_n)$ by Step 1. The result of the theorem is finally obtained by applying the result of Step 1 to the sequence $(\tilde f_n)_n$.
Remark A.22. By the same argument as in Step 2 above (approximation by the simple functions (A.4) and use of the monotone convergence theorem), one easily shows that:
(i) For $f_1, f_2 \in L^0_+(\mathcal{A})$ such that $f_1 = f_2$ $\mu$-a.e., we have $\mu(f_1) = \mu(f_2)$.
(ii) For $f_1, f_2 \in L^0_+(\mathcal{A})$, we have $f_1 + f_2 \in L^0_+(\mathcal{A})$ and $\mu(f_1 + f_2) = \mu(f_1) + \mu(f_2)$.
Here is a simple and very useful consequence of the monotone convergence theorem.
Lemma A.23. (Fatou) For a sequence of functions $(f_n)_n$ in $L^0_+(\mathcal{A})$, we have
$$\mu(\liminf f_n) \le \liminf \mu(f_n).$$
Proof. By the monotonicity of the integral, $\inf_{k \ge n} \mu(f_k) \ge \mu(\inf_{k \ge n} f_k)$ for all $n \ge 1$, and we obtain the result by applying the monotone convergence theorem.
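The inequality in Fatou's lemma may be strict. A standard example, added here as an illustration and not taken from the original notes, is the following sequence on $[0,1]$ equipped with its Borel $\sigma$-algebra and the Lebesgue measure $\lambda$:

```latex
f_n := n\,1_{]0,1/n[}, \qquad n \ge 1.
% For every fixed x in [0,1], f_n(x) = 0 as soon as n >= 1/x, hence
\liminf_{n\to\infty} f_n = \lim_{n\to\infty} f_n = 0
\quad\text{pointwise, so that}\quad
\lambda\big(\liminf_{n} f_n\big) = 0 ,
% whereas each f_n has integral n * (1/n) = 1:
\qquad
\liminf_{n\to\infty} \lambda(f_n) = 1 .
```

The mass of $f_n$ concentrates near the origin and is lost in the limit; the sequence is not nondecreasing, so this does not contradict the monotone convergence theorem.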
A.2.3 Integration of real functions
For a function $f \in L^0(\mathcal{A})$, we denote $f^+ := \max\{f, 0\}$ and $f^- := \max\{-f, 0\}$, so that $f = f^+ - f^-$ and $|f| = f^+ + f^-$. These two functions inherit the $\mathcal{A}$-measurability of $f$.
Definition A.24. A function $f \in L^0(\mathcal{A})$ is said to be $\mu$-integrable if $\mu(|f|) = \mu(f^+) + \mu(f^-) < \infty$, and its integral is defined by
$$\mu(f) := \mu(f^+) - \mu(f^-).$$
We denote by $L^1(\mathcal{A}, \mu)$ the set of $\mu$-integrable functions.
One immediately sees that $L^1(\mathcal{A}, \mu)$ is a vector space, of which further topological properties will be given later.
Before continuing, let us immediately remove a source of ambiguity concerning the integration of a function $f \in L^1(\mathcal{A}, \mu)$ over a subset $A \in \mathcal{A}$. Indeed, this can be done either by integrating the integrable function $f 1_A$, or by integrating the restriction $f|_A$ with respect to the restriction $\mu_A$ of $\mu$ to the measurable space $(A, \mathcal{A}_A)$, where $\mathcal{A}_A$ is the $\sigma$-algebra defined by $\mathcal{A}_A := \mathcal{P}(A) \cap \mathcal{A}$.
Proposition A.25. For all $f \in L^1(\mathcal{A}, \mu)$ and $A \in \mathcal{A}$, we have $\mu(f 1_A) = \mu_A(f|_A)$.
Proof. First, this property holds for the functions $f = 1_B$, $B \in \mathcal{A}$, since in this case $\mu(1_B 1_A) = \mu(A \cap B) = \mu_A(1_B|_A)$. By linearity, this equality remains true for simple functions, then by monotone convergence for nonnegative measurable functions. Finally, for $f \in L^1(\mathcal{A}, \mu)$, we decompose $f = f^+ - f^-$, and we obtain the desired result by applying the equality to $f^+$ and $f^-$.
Here is a result which recalls a classical property of Riemann integrals, possibly improper.
Lemma A.26. Let $f \in L^1(\mathcal{A}, \mu)$ and $\varepsilon > 0$. Then, there exists $\delta > 0$ such that for all $A \in \mathcal{A}$ satisfying $\mu(A) < \delta$, we have $\mu(|f| 1_A) < \varepsilon$.
Proof. Suppose, on the contrary, that there exist $\varepsilon_0$ and a sequence $(A_n)_n \subset \mathcal{A}$ such that $\mu(A_n) < 2^{-n}$ and $\mu(|f| 1_{A_n}) \ge \varepsilon_0$. By the first Borel-Cantelli lemma, Lemma A.14, we deduce that $A := \limsup A_n$ is negligible. In particular $\mu(|f| 1_A) = 0$, and we obtain a contradiction by observing that
$$\mu(|f| 1_A) = \mu(|f|) - \mu(|f| 1_{A^c}) \ge \mu(|f|) - \liminf \mu(|f| 1_{A_n^c}) = \limsup \mu(|f| 1_{A_n}) \ge \varepsilon_0,$$
where we used Fatou's lemma.
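A.2.4 From $\mu$-a.e. convergence to $L^1$ convergence
Theorem A.27. (dominated convergence) Let $(f_n)_n \subset L^0(\mathcal{A})$ be a sequence such that $f_n \to f$ $\mu$-a.e. for some function $f \in L^0(\mathcal{A})$. If $\sup_n |f_n| \in L^1(\mathcal{A}, \mu)$, then
$$f_n \to f \text{ in } L^1(\mathcal{A}, \mu), \quad \text{i.e.} \quad \mu(|f_n - f|) \to 0.$$
In particular, $\mu(f_n) \to \mu(f)$.
Proof. Set $g := \sup_n |f_n|$ and $h_n := f_n - f$. Then the functions $2g + h_n$ and $2g - h_n$ are nonnegative, and Fatou's lemma gives $\liminf \mu(g - f_n) \ge \mu(g - f)$ and $\liminf \mu(g + f_n) \ge \mu(g + f)$. Since $g$ is integrable, we may use the linearity of the integral, and we arrive at $\mu(f) \le \liminf \mu(f_n) \le \limsup \mu(f_n) \le \mu(f)$. Applying Fatou's lemma in the same way to the nonnegative functions $2g - |h_n|$, which converge to $2g$ $\mu$-a.e., gives $\limsup \mu(|f_n - f|) \le 0$, which is the convergence in $L^1(\mathcal{A}, \mu)$.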
The following result gives a necessary and sufficient condition for a $\mu$-a.e. convergent sequence to converge in $L^1(\mathcal{A}, \mu)$.
Lemma A.28. (Scheffé) Let $(f_n)_n \subset L^1(\mathcal{A}, \mu)$ be such that $f_n \to f$ $\mu$-a.e. for some function $f \in L^1(\mathcal{A}, \mu)$. Then:
$$f_n \to f \text{ in } L^1(\mathcal{A}, \mu) \quad \text{iff} \quad \mu(|f_n|) \to \mu(|f|).$$
Proof. The implication $\implies$ is trivial. For the converse implication, we proceed in two steps.
Step 1. Suppose that $f_n, f \ge 0$, $\mu$-a.e. Then $(f_n - f)^- \le f \in L^1(\mathcal{A}, \mu)$, and we deduce from the dominated convergence theorem that $\mu((f_n - f)^-) \to 0$. To conclude, we write that $\mu(|f_n - f|) = \mu(f_n) - \mu(f) + 2\mu((f_n - f)^-) \to 0$.
Step 2. For $f_n$ and $f$ of arbitrary sign, we use Fatou's lemma to obtain
$$\mu(|f|) = \lim\{\mu(f_n^+) + \mu(f_n^-)\} \ge \liminf \mu(f_n^+) + \liminf \mu(f_n^-) \ge \mu(f^+) + \mu(f^-) = \mu(|f|),$$
and hence all inequalities are equalities, i.e. $\lim \mu(f_n^+) = \mu(f^+)$ and $\lim \mu(f_n^-) = \mu(f^-)$. We are then reduced to the context of Step 1, which provides $f_n^+ \to f^+$ and $f_n^- \to f^-$ in $L^1(\mathcal{A}, \mu)$, and we conclude by writing $|f_n - f| \le |f_n^+ - f^+| + |f_n^- - f^-|$ and using the monotonicity of the integral.
Exercise A.29. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $I$ an open interval of $\mathbb{R}$, and $f : I \times \Omega \to \mathbb{R}$ a function such that $f(x, .) \in L^0(\mathcal{A})$ for all $x \in I$.
1. Suppose that there exists a function $g \in L^1_+(\mathcal{A}, \mu)$ such that $|f(x, .)| \le g$, $\mu$-a.e. Show that, if $f(., \omega)$ is continuous at a point $x_0 \in I$, $\mu$-a.e., the function $\varphi : I \to \mathbb{R}$ defined by
$$\varphi(x) := \int f(x, \omega)\,d\mu(\omega); \quad x \in I,$$
is well defined, and that it is continuous at the point $x_0$.
2. Suppose that the partial derivative $f_x := (\partial f/\partial x)$ exists for all $x \in I$, $\mu$-a.e., and that there exists a function $h \in L^1_+(\mathcal{A}, \mu)$ such that $|f_x(x, .)| \le h$, $\mu$-a.e. Show that $\varphi$ is differentiable on $I$, and
$$\varphi'(x) = \int f_x(x, \omega)\,d\mu(\omega); \quad x \in I.$$
3. Give conditions which ensure that $\varphi$ is continuously differentiable on $I$.
A.2.5 Lebesgue integral and Riemann integral
In this paragraph, we give some elements which explain the advantage of the Lebesgue integral over the Riemann integral. To be more concrete, we consider the problem of integration on $\mathbb{R}$.
(a) The Riemann integral is constructed on a compact interval $[a, b]$ of $\mathbb{R}$. There is indeed an extension by means of improper integrals, but this leads to a rather restrictive framework.
(b) The Riemann integral is constructed by approximating the function by step functions, i.e. functions which are constant on subintervals of $[a, b]$ of small length. On a picture, this is a vertical approximation. In contrast, the Lebesgue integral is constructed by partitioning the image interval and approximating $f$ on the inverse images of these intervals. In this case, this is a horizontal approximation of the function to be integrated.
(c) Riemann integrable functions are Lebesgue integrable. Let us show this on $[0, 1]$. Let $f$ be a bounded Riemann integrable function on $\Omega = [0, 1]$ with integral (in the sense of Riemann) $\int_0^1 f(x)\,dx$. Then $f$ is Lebesgue integrable with integral $\lambda(f) = \int_0^1 f(x)\,dx$. If $f$ is a step function, this result is trivial. For an arbitrary Riemann integrable function $f$, one can find two sequences of step functions $(g_n)_n$ and $(h_n)_n$, nondecreasing and nonincreasing respectively, such that $g_n \le f \le h_n$ and
$$\inf_n \int_0^1 (h_n - g_n)(x)\,dx = \lim_n \int_0^1 (h_n - g_n)(x)\,dx = 0.$$
Without loss of generality, we may assume $h_n \le 2M$ with $M := \|f\|_\infty$. The functions $f_* := \sup_n g_n$ and $f^* := \inf_n h_n = 2M - \sup_n (2M - h_n)$ are Borel measurable, and we have $f_* \le f \le f^*$. By the monotonicity of the integral:
$$0 \le \lambda(f^* - f_*) = \lambda(\inf_n (h_n - g_n)) \le \inf_n \lambda(h_n - g_n) = 0,$$
and hence $f = f^* = f_*$, $\lambda$-a.e. Finally:
$$\lambda(f_*) = \lim \uparrow \lambda(g_n) = \lim \uparrow \int_0^1 g_n(x)\,dx = \int_0^1 f(x)\,dx.$$
The converse is not true. For instance, the function $f = 1_{\mathbb{Q} \cap [0,1]}$ is $\lambda$-integrable, but is not Riemann integrable.
(d) The dominated convergence theorem has no equivalent in the framework of the Riemann integral, and it yields a complete space of integrable functions (we shall see this result later). In contrast, one can construct examples of Cauchy sequences of Riemann integrable functions whose limit is not Riemann integrable.
(e) For functions defined by integrals, the continuity and differentiability results are simply obtained thanks to the dominated convergence theorem. Their analogue in the framework of Riemann integrals leads to rather restrictive results.
(f) The Lebesgue integral is naturally defined in $\mathbb{R}^n$, whereas the situation is slightly more complicated for the Riemann integral. In particular, Fubini's theorem is remarkably simple in the framework of the Lebesgue integral.
A.3 Transforms of measures
A.3.1 Image measure
Let $(\Omega_1, \mathcal{A}_1, \mu_1)$ be a measure space, $(\Omega_2, \mathcal{A}_2)$ a measurable space and $f : \Omega_1 \to \Omega_2$ a measurable function, i.e. $f^{-1} : \mathcal{A}_2 \to \mathcal{A}_1$. One immediately checks that the map:
$$\mu_2(A_2) := \mu_1\big(f^{-1}(A_2)\big) \quad \text{for all } A_2 \in \mathcal{A}_2,$$
defines a measure on $(\Omega_2, \mathcal{A}_2)$.
Definition A.30. $\mu_2$ is called the image measure of $\mu_1$ by $f$, and is denoted by $\mu_1 f^{-1}$.
Theorem A.31. (transfer) Let $\mu_2 := \mu_1 f^{-1}$ be the image measure of $\mu_1$ by $f$, and $h \in L^0(\mathcal{A}_2)$. Then $h \in L^1(\mathcal{A}_2, \mu_2)$ if and only if $h \circ f \in L^1(\mathcal{A}_1, \mu_1)$. Under these conditions, we have
$$\int_{\Omega_2} h\,d(\mu_1 f^{-1}) = \int_{\Omega_1} (h \circ f)\,d\mu_1. \qquad (A.6)$$
Proof. We first check the transfer formula (A.6) for nonnegative functions. The formula holds for the functions $1_{A_2}$, $A_2 \in \mathcal{A}_2$, then by linearity for nonnegative simple functions, and we conclude by means of the monotone convergence theorem. For an integrable $h$ of arbitrary sign, we apply the previous result to $h^+$ and $h^-$. Finally, the transfer formula shows that $h \in L^1(\mathcal{A}_2, \mu_2)$ iff $h^+ \circ f$ and $h^- \circ f \in L^1(\mathcal{A}_1, \mu_1)$, and the equivalence follows from the fact that $h^+ \circ f = (h \circ f)^+$ and $h^- \circ f = (h \circ f)^-$.
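For a measure carried by finitely many points, both sides of the transfer formula (A.6) reduce to finite sums and can be compared directly. The sketch below represents such a measure as a dictionary mapping points to masses; this representation and the function names are choices made for the illustration.

```python
from collections import defaultdict

def image_measure(mu1, f):
    """Image of a discrete measure {point: mass} under the map f."""
    mu2 = defaultdict(float)
    for w, mass in mu1.items():
        mu2[f(w)] += mass          # mass of f^{-1}({f(w)}) accumulates
    return dict(mu2)

def integrate(mu, h):
    """Integral of h with respect to a discrete measure."""
    return sum(h(w) * mass for w, mass in mu.items())

mu1 = {-2: 0.5, -1: 1.0, 1: 1.5, 2: 2.0}
f = lambda w: w * w
h = lambda y: y + 1

lhs = integrate(image_measure(mu1, f), h)       # integral of h d(mu1 f^{-1})
rhs = integrate(mu1, lambda w: h(f(w)))         # integral of (h o f) d mu1
```

Since $f$ maps $\pm 1$ to 1 and $\pm 2$ to 4, the image measure puts mass 2.5 on each of these two points, and both integrals are equal to 17.5.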
A.3.2 Measures defined by densities
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and let $f \in L^0_+(\mathcal{A})$ be a finite nonnegative measurable function. We define
$$\nu(A) := \mu(f 1_A) = \int_A f\,d\mu \quad \text{for all } A \in \mathcal{A}.$$
Exercise A.32. Check that $\nu$ is a measure on $(\Omega, \mathcal{A})$.
Definition A.33. (i) The measure $\nu$ is called the measure with density $f$ with respect to $\mu$, and we write $\nu = f \cdot \mu$.
(ii) Let $\mu_1$, $\mu_2$ be two measures on a measurable space $(\Omega, \mathcal{A})$. We say that $\mu_2$ is absolutely continuous with respect to $\mu_1$, and we write $\mu_2 \ll \mu_1$, if $\mu_1(A) = 0 \implies \mu_2(A) = 0$ for all $A \in \mathcal{A}$. Otherwise, we say that $\mu_2$ is foreign to $\mu_1$. If $\mu_2 \ll \mu_1$ and $\mu_1 \ll \mu_2$, we say that $\mu_1$ and $\mu_2$ are equivalent, and we write $\mu_1 \sim \mu_2$. Finally, if $\mu_2 \not\ll \mu_1$ and $\mu_1 \not\ll \mu_2$, we say that $\mu_1$ and $\mu_2$ are singular.
Thus, the measure $f \cdot \mu$ is absolutely continuous with respect to $\mu$.
Theorem A.34. (i) For $g : \Omega \to [0, \infty]$ nonnegative and $\mathcal{A}$-measurable, we have $(f \cdot \mu)(g) = \mu(fg)$.
(ii) For $g \in L^0_+(\mathcal{A})$, we have $g \in L^1(\mathcal{A}, f \cdot \mu)$ iff $fg \in L^1(\mathcal{A}, \mu)$, and then $(f \cdot \mu)(g) = \mu(fg)$.
Exercise A.35. Prove Theorem A.34 (using the usual proof scheme).
A.4 Remarkable inequalities
In this paragraph, we state three inequalities which are very useful. In order to accustom the reader to the manipulation of measures and integration, we formulate the results as exercises.
Exercise A.36. (Markov inequality) Let $f$ be an $\mathcal{A}$-measurable function, and $g : \mathbb{R} \to \mathbb{R}_+$ a nonnegative nondecreasing Borel function.
1. Justify that $g \circ f$ is a measurable function, and that for all $c \in \mathbb{R}$:
$$\mu(g \circ f) \ge g(c)\,\mu(\{f \ge c\}). \qquad (A.7)$$
2. Show that
$$c\,\mu(\{f \ge c\}) \le \mu(f) \quad \text{for all } f \in L^0_+(\mathcal{A}) \text{ and } c > 0,$$
$$c\,\mu[|f| \ge c] \le \mu(|f|) \quad \text{for all } f \in L^1(\mathcal{A}, \mu) \text{ and } c > 0.$$
3. Show the Chebyshev inequality:
$$c^2\mu[|f| \ge c] \le \mu(f^2) \quad \text{for all } f^2 \in L^1(\mathcal{A}, \mu) \text{ and } c > 0.$$
4. Show that
$$\mu(\{f \ge c\}) \le \inf_{\tau > 0} e^{-\tau c}\,\mu(e^{\tau f}) \quad \text{for all } f \in L^0(\mathcal{A}) \text{ and } c \in \mathbb{R}.$$
Exercise A.37. (Schwarz inequality) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, and $f, g : \Omega \to \mathbb{R}_+$ two nonnegative measurable functions such that $\mu(f^2) + \mu(g^2) < \infty$.
1. Show that $\mu(fg) < \infty$.
2. Show that $\mu(fg)^2 \le \mu(f^2)\mu(g^2)$ (Hint: consider the function $xf + g$, $x \in \mathbb{R}$).
3. Show that the Schwarz inequality of question 2 holds without the positivity condition on $f$ and $g$.
Exercise A.38. (Hölder inequality, Minkowski inequality) We admit the Jensen inequality, valid for a measure $\nu$ on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that $\nu(\mathbb{R}) = 1$:
$$\nu(c(f)) \ge c(\nu(f)) \quad \text{for } f, c(f) \in L^1(\mathcal{B}_{\mathbb{R}}, \nu) \text{ and } c(.) \text{ convex},$$
which will be proved in Chapter B, Theorem B.6.
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f, g : \Omega \to \mathbb{R}$ two measurable functions with
$$\mu(|f|^p) < \infty, \quad \mu(|g|^q) < \infty \quad \text{where } p > 1, \ \frac{1}{p} + \frac{1}{q} = 1. \qquad (A.8)$$
1. Suppose that $f, g \ge 0$ and $\mu(f^p) > 0$. Show the Hölder inequality:
$$\mu(|fg|) \le \mu(|f|^p)^{1/p}\mu(|g|^q)^{1/q} \quad \text{for } p > 1, \ \frac{1}{p} + \frac{1}{q} = 1 \text{ and } \mu(|f|^p) + \mu(|g|^q) < \infty.$$
(Hint: introduce the measure $\nu := \frac{f^p}{\mu(f^p)} \cdot \mu$.)
2. Show that the Hölder inequality of question 1 holds under the conditions (A.8) without the additional conditions of the previous question.
3. Deduce the Minkowski inequality:
$$\mu(|f + g|^p)^{1/p} \le \mu(|f|^p)^{1/p} + \mu(|g|^p)^{1/p} \quad \text{for } p > 1 \text{ and } \mu(|f|^p) + \mu(|g|^p) < \infty.$$
(Hint: decompose $|f + g|^p = |f + g|\,|f + g|^{p-1}$.)
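A.5 Product spaces
A.5.1 Construction and integration
In this section, we construct the product measure on the product of two measure spaces.
Let $(\Omega_1, \mathcal{A}_1, \mu_1)$, $(\Omega_2, \mathcal{A}_2, \mu_2)$ be two measure spaces. On the product space $\Omega_1 \times \Omega_2$, one immediately checks that $\mathcal{A}_1 \times \mathcal{A}_2$ is a $\pi$-system. We then define the $\sigma$-algebra that it generates:
$$\mathcal{A}_1 \otimes \mathcal{A}_2 := \sigma(\mathcal{A}_1 \times \mathcal{A}_2).$$
On the measurable space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2)$, we want to define a measure $\mu$ such that
$$\mu(A_1 \times A_2) = \mu_1(A_1)\,\mu_2(A_2) \quad \text{for all } (A_1, A_2) \in \mathcal{A}_1 \times \mathcal{A}_2, \tag{A.9}$$
and then define the integral of an integrable function $f : \Omega_1 \times \Omega_2 \to \mathbb{R}$:
$$\int_{\Omega_1 \times \Omega_2} f\, d\mu.$$
An important question is to relate this quantity to the double integrals
$$\int_{\Omega_2} \Big( \int_{\Omega_1} f\, d\mu_1 \Big) d\mu_2 \quad \text{and} \quad \int_{\Omega_1} \Big( \int_{\Omega_2} f\, d\mu_2 \Big) d\mu_1,$$
which first raises the questions of
(1a) the $\mathcal{A}_1$-measurability of the function $f_2^{\omega_2} : \omega_1 \mapsto f(\omega_1, \omega_2)$,
(2a) the $\mathcal{A}_2$-measurability of the function $f_1^{\omega_1} : \omega_2 \mapsto f(\omega_1, \omega_2)$,
and then, once these questions are settled,
(1b) the $\mathcal{A}_1$-measurability of the function $I_1^f : \omega_1 \mapsto \int f(\omega_1, \omega_2)\, d\mu_2(\omega_2)$,
(2b) the $\mathcal{A}_2$-measurability of the function $I_2^f : \omega_2 \mapsto \int f(\omega_1, \omega_2)\, d\mu_1(\omega_1)$.
These problems are easily solved by means of the monotone class theorem: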
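Lemma A.39. (a) Let $f \in \mathcal{L}^\infty(\mathcal{A}_1 \otimes \mathcal{A}_2)$. Then, for all $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$:
$$f_1^{\omega_1} \in \mathcal{L}^\infty(\mathcal{A}_2) \quad \text{and} \quad f_2^{\omega_2} \in \mathcal{L}^\infty(\mathcal{A}_1).$$
(b) Assume moreover that $\mu_1$ and $\mu_2$ are finite. Then $I_i^f \in \mathcal{L}^1(\mathcal{A}_i, \mu_i)$ for $i = 1, 2$ and
$$\int_{\Omega_1} I_1^f\, d\mu_1 = \int_{\Omega_2} I_2^f\, d\mu_2.$$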
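Proof. (a) Let $H := \{ f \in \mathcal{L}^\infty(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2) : f_1^{\omega_1} \in \mathcal{L}^\infty(\Omega_2, \mathcal{A}_2) \text{ and } f_2^{\omega_2} \in \mathcal{L}^\infty(\Omega_1, \mathcal{A}_1) \text{ for all } \omega_1, \omega_2 \}$. Conditions H1 and H2 of the monotone class theorem are trivially satisfied by $H$. Moreover, recall that $\mathcal{A}_1 \times \mathcal{A}_2$ is a $\pi$-system generating $\mathcal{A}_1 \otimes \mathcal{A}_2$, by definition. It is clear that $H \supset \{ 1_A : A \in \mathcal{A}_1 \times \mathcal{A}_2 \}$. The monotone class theorem then allows us to conclude that $H = \mathcal{L}^\infty(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2)$.
(b) It suffices to repeat the same type of argument as for (a).
Thanks to the last result, we can now define a candidate for the measure on the product space $\Omega_1 \times \Omega_2$ by:
$$\mu(A) := \int \Big( \int 1_A\, d\mu_1 \Big) d\mu_2 = \int \Big( \int 1_A\, d\mu_2 \Big) d\mu_1 \quad \text{for all } A \in \mathcal{A}_1 \otimes \mathcal{A}_2.$$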
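Theorem A.40. (Fubini) The map $\mu$ is a measure on $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2)$, called the product measure of $\mu_1$ and $\mu_2$, and denoted $\mu_1 \otimes \mu_2$. It is the unique measure on $\Omega_1 \times \Omega_2$ satisfying (A.9). Moreover, for all $f \in \mathcal{L}^0_+(\mathcal{A}_1 \otimes \mathcal{A}_2)$,
$$\int f\, d\mu_1 \otimes \mu_2 = \int \Big( \int f\, d\mu_1 \Big) d\mu_2 = \int \Big( \int f\, d\mu_2 \Big) d\mu_1 \in [0, \infty]. \tag{A.10}$$
Finally, if $f \in \mathcal{L}^1(\mathcal{A}_1 \otimes \mathcal{A}_2, \mu_1 \otimes \mu_2)$, the equalities (A.10) hold.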
Proof. One checks that $\mu_1 \otimes \mu_2$ is a measure by means of the elementary properties of the Lebesgue integral. Uniqueness is an immediate consequence of Proposition A.5. The equalities (A.10) have already been established in Lemma A.39 (b) for bounded $f$ and finite measures. To generalize to nonnegative measurable functions $f$, one introduces increasing approximations and uses the monotone convergence theorem. Finally, for functions $f \in \mathcal{L}^1(\mathcal{A}_1 \otimes \mathcal{A}_2, \mu_1 \otimes \mu_2)$, one applies the previous result to $f^+$ and $f^-$.
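On finite spaces every integral is a finite sum, so the identity (A.10) can be checked directly. The following minimal sketch uses two hypothetical finite measure spaces and an arbitrary nonnegative function `f`; none of these choices come from the notes.

```python
from fractions import Fraction

# Hypothetical finite measure spaces, each described by the weights of its atoms.
mu1 = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
mu2 = {0: Fraction(2), 1: Fraction(1, 3), 2: Fraction(1)}

def f(w1, w2):
    """An arbitrary nonnegative function on the product space (illustrative)."""
    return Fraction(w2 + 1) * (2 if w1 == "a" else 5)

def product_integral():
    """Integral of f against the product measure, built from mu(A1 x A2) = mu1(A1) mu2(A2)."""
    return sum(f(w1, w2) * m1 * m2 for w1, m1 in mu1.items() for w2, m2 in mu2.items())

def inner_mu1_then_mu2():
    """Integrate in w1 against mu1 first, then in w2 against mu2."""
    return sum(sum(f(w1, w2) * m1 for w1, m1 in mu1.items()) * m2 for w2, m2 in mu2.items())

def inner_mu2_then_mu1():
    """Integrate in w2 against mu2 first, then in w1 against mu1."""
    return sum(sum(f(w1, w2) * m2 for w2, m2 in mu2.items()) * m1 for w1, m1 in mu1.items())

print(product_integral(), inner_mu1_then_mu2(), inner_mu2_then_mu1())
```

The three quantities coincide here simply because finite sums can be reordered; the content of the theorem is that this survives in the general measurable setting.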
Remark A.41. (i) The construction of this section, as well as the integration results above, extend without difficulty to the product of $n$ measure spaces, at the price of more cumbersome notation.
(ii) Let now $(\Omega_i, \mathcal{A}_i)_{i \ge 1}$ be a countable family of measurable spaces, and $\Omega := \prod_{i \ge 1} \Omega_i$. For every finite subset $I \subset \mathbb{N}$, and for all $A_i \in \mathcal{A}_i$, $i \in I$, we define the cylinder
$$C(A_i, i \in I) := \{ \omega \in \Omega : \omega_i \in A_i \text{ for } i \in I \}.$$
The product $\sigma$-algebra is then defined by
$$\mathcal{A} := \otimes_{i \ge 1} \mathcal{A}_i := \sigma\big( C(A_i, i \in I) : I \subset \mathbb{N}, \ \mathrm{card}(I) < \infty \big).$$
A.5.2 Image measure and change of variable
Let $\mathcal{O} = \mathbb{R}^n$, or more generally an $n$-dimensional space. The tools developed in the previous sections allow us to define the Lebesgue measure on $\mathbb{R}^n$ from our construction of the Lebesgue measure on $\mathbb{R}$.
In this section, we consider a function
$$g : \Omega_1 \to \Omega_2, \quad \text{where } \Omega_1, \Omega_2 \text{ are open subsets of } \mathbb{R}^n.$$
We write $g = (g_1, \dots, g_n)$. If $g$ is differentiable at a point $x \in \Omega_1$, we denote by
$$Dg(x) := \Big( \frac{\partial g_i}{\partial x_j} \Big)_{1 \le i,j \le n} \quad \text{and} \quad \det[Dg(x)]$$
the Jacobian matrix of $g$ at $x$ and its determinant. Finally, recall that $g$ is a $C^1$-diffeomorphism if $g$ is a bijection such that $g$ and $g^{-1}$ are of class $C^1$, and that in this case
$$\det[Dg^{-1}(y)] = \frac{1}{\det[Dg \circ g^{-1}(y)]}.$$
Theorem A.42. Let $\mu_1$ be a measure on $(\Omega_1, \mathcal{B}_{\Omega_1})$ with density $f_1 \in \mathcal{L}^0_+(\mathcal{B}_{\Omega_1})$ with respect to the Lebesgue measure, i.e. $\mu_1(dx) = 1_{\Omega_1}(x) f_1(x)\, dx$. If $g$ is a $C^1$-diffeomorphism, the image measure $\mu_2 := \mu_1 g^{-1}$ is absolutely continuous with respect to the Lebesgue measure, with density
$$f_2(y) = 1_{\Omega_2}(y)\, f_1\big( g^{-1}(y) \big)\, \big| \det[Dg^{-1}(y)] \big| \quad \text{and} \quad \int_{\Omega_1} h \circ g(x)\, f_1(x)\, dx = \int_{\Omega_2} h(y)\, f_2(y)\, dy$$
for every function $h : \Omega_2 \to \mathbb{R}$ which is nonnegative or $\mu_2$-integrable.
For the proof, we refer to the first-year course.
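A one-dimensional numerical check may help fix the roles of $f_1$, $f_2$ and the Jacobian factor. The sketch below assumes the illustrative choices $\Omega_1 = \Omega_2 = (0,1)$, $g(x) = x^2$, $f_1 \equiv 1$ and $h(y) = y$ (none of them from the notes), and compares the two integrals of the theorem with a midpoint Riemann sum.

```python
import math

def midpoint_integral(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over (a, b); the endpoints are never evaluated."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

# Illustrative choices (assumptions, not from the notes).
def g(x):
    return x * x                   # C^1-diffeomorphism from (0, 1) onto (0, 1)

def g_inv(y):
    return math.sqrt(y)

def dg_inv(y):
    return 0.5 / math.sqrt(y)      # derivative of g_inv, i.e. the 1x1 Jacobian of g^{-1}

def f1(x):
    return 1.0                     # density of mu_1: Lebesgue measure on (0, 1)

def h(y):
    return y

def f2(y):
    """Density of the image measure mu_2 = mu_1 g^{-1}, as given by the theorem."""
    return f1(g_inv(y)) * abs(dg_inv(y))

lhs = midpoint_integral(lambda x: h(g(x)) * f1(x), 0.0, 1.0)
rhs = midpoint_integral(lambda y: h(y) * f2(y), 0.0, 1.0)
print(lhs, rhs)  # both approximate the integral of x^2 over (0, 1), i.e. 1/3
```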
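A.6 Appendix to Chapter A
A.6.1 $\pi$-system, d-system and uniqueness of measures
We start by introducing an additional notion of classes of sets.
Definition A.43. A class $\mathcal{D} \subset \mathcal{P}(\Omega)$ is called a d-system if $\Omega \in \mathcal{D}$, $B \setminus A \in \mathcal{D}$ for all $A, B \in \mathcal{D}$ with $A \subset B$, and $\cup_n A_n \in \mathcal{D}$ for every increasing sequence $(A_n)_n \subset \mathcal{D}$.
Lemma A.44. A class $\mathcal{C} \subset \mathcal{P}(\Omega)$ is a $\sigma$-algebra if and only if $\mathcal{C}$ is a $\pi$-system and a d-system.
The easy proof of this result is left as an exercise. For every class $\mathcal{C}$, we define the set
$$d(\mathcal{C}) := \cap \{ \mathcal{D} \supset \mathcal{C} : \mathcal{D} \text{ is a d-system} \},$$
which is the smallest d-system containing $\mathcal{C}$. The inclusion $d(\mathcal{C}) \subset \sigma(\mathcal{C})$ is obvious.
Lemma A.45. For a $\pi$-system $\mathcal{I}$, we have $d(\mathcal{I}) = \sigma(\mathcal{I})$.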
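Proof. By Lemma A.44, it suffices to show that $d(\mathcal{I})$ is a $\pi$-system, i.e. that $d(\mathcal{I})$ is stable under finite intersections. We define the set $\mathcal{D}' := \{ A \in d(\mathcal{I}) : A \cap B \in d(\mathcal{I}) \text{ for all } B \in d(\mathcal{I}) \}$, and we shall show that $\mathcal{D}' = d(\mathcal{I})$, which completes the proof.
1- We first show that the set $\mathcal{D}_0 := \{ B \in d(\mathcal{I}) : B \cap C \in d(\mathcal{I}) \text{ for all } C \in \mathcal{I} \}$ is a d-system. Indeed:
- $\Omega \in \mathcal{D}_0$;
- let $A, B \in \mathcal{D}_0$ be such that $A \subset B$, and $C \in \mathcal{I}$; since $A, B \in \mathcal{D}_0$, we have $(A \cap C)$ and $(B \cap C) \in d(\mathcal{I})$, and since $d(\mathcal{I})$ is a d-system, we see that $(B \setminus A) \cap C = (B \cap C) \setminus (A \cap C) \in d(\mathcal{I})$;
- finally, if $\mathcal{D}_0 \ni A_n \uparrow A$ and $C \in \mathcal{I}$, we have $A_n \cap C \in d(\mathcal{I})$ and therefore $\lim \uparrow (A_n \cap C) = A \cap C \in d(\mathcal{I})$, since $d(\mathcal{I})$ is a d-system.
2- By definition $\mathcal{D}_0 \subset d(\mathcal{I})$, and since we have just shown that it is a d-system containing $\mathcal{I}$, we in fact have $\mathcal{D}_0 = d(\mathcal{I})$; one now checks that this implies $\mathcal{I} \subset \mathcal{D}'$.
3- Finally, proceeding as in the previous steps, we see that $\mathcal{D}'$ is a d-system. Being a d-system that contains $\mathcal{I}$ and is contained in $d(\mathcal{I})$, it equals $d(\mathcal{I})$.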
Proof of Proposition A.5. Writing $\mu$ and $\nu$ for the two measures of the proposition, one easily checks that the set $\mathcal{D} := \{ A \in \sigma(\mathcal{I}) : \mu(A) = \nu(A) \}$ is a d-system (this is where we use that the measures are finite, in order to avoid indeterminate forms of the type $\infty - \infty$). By assumption, $\mathcal{D}$ contains the $\pi$-system $\mathcal{I}$. We then deduce from Lemma A.45 that $\mathcal{D}$ contains $\sigma(\mathcal{I})$, and therefore $\mathcal{D} = \sigma(\mathcal{I})$.
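A.6.2 Outer measure and extension of measures
The purpose of this section is to prove the Carathéodory theorem A.6, whose statement we recall.
Theorem A.6. Let $\mathcal{A}_0$ be an algebra on $\Omega$, and $\mu_0 : \mathcal{A}_0 \to \mathbb{R}_+$ a $\sigma$-additive function. Then there exists a measure $\mu$ on $\mathcal{A} := \sigma(\mathcal{A}_0)$ such that $\mu = \mu_0$ on $\mathcal{A}_0$. If moreover $\mu_0(\Omega) < \infty$, then such an extension is unique.
To prepare the proof, we consider a $\sigma$-algebra $\mathcal{A}' \subset \mathcal{P}(\Omega)$, and a map $\lambda : \mathcal{A}' \to [0, \infty]$ satisfying $\lambda(\emptyset) = 0$.
Definition A.46. We say that $\lambda$ is an outer measure on $(\Omega, \mathcal{A}')$ if
(i) $\lambda(\emptyset) = 0$,
(ii) $\lambda$ is nondecreasing: for $A_1, A_2 \in \mathcal{A}'$, $\lambda(A_1) \le \lambda(A_2)$ whenever $A_1 \subset A_2$,
(iii) $\lambda$ is $\sigma$-sub-additive: for $(A_n)_n \subset \mathcal{A}'$, we have $\lambda(\cup_n A_n) \le \sum_n \lambda(A_n)$.
Definition A.47. We say that an element $A \in \mathcal{A}'$ is a $\lambda$-set if
$$\lambda(A \cap B) + \lambda(A^c \cap B) = \lambda(B) \quad \text{for all } B \in \mathcal{A}'$$
(in particular, $\lambda(\emptyset) = 0$). We denote by $\mathcal{A}'_\lambda$ the set of all $\lambda$-sets of $\mathcal{A}'$.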
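The following result only uses the fact that $\mathcal{A}'$ is an algebra.
Lemma A.48. The set $\mathcal{A}'_\lambda$ of $\lambda$-sets of $\mathcal{A}'$ is an algebra, and the restriction of $\lambda$ to $\mathcal{A}'_\lambda$ is additive and satisfies, for all $B \in \mathcal{A}'$:
$$\lambda\big( \cup_{i=1}^n (A_i \cap B) \big) = \sum_{i=1}^n \lambda(A_i \cap B) \quad \text{whenever } A_1, \dots, A_n \in \mathcal{A}'_\lambda \text{ are disjoint}.$$
This lemma, whose (easy) proof is postponed to the end of the section, allows us to show the following result: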
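Lemma A.49. (Carathéodory) Let $\lambda$ be an outer measure on $(\Omega, \mathcal{A}')$, and let $\mathcal{A}'_\lambda$ be the set of $\lambda$-sets of $\mathcal{A}'$. Then $\mathcal{A}'_\lambda$ is a $\sigma$-algebra, and the restriction of $\lambda$ to $\mathcal{A}'_\lambda$ is $\sigma$-additive; consequently $\lambda$ is a measure on $(\Omega, \mathcal{A}'_\lambda)$.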
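Proof. Denote by $\mathcal{A}'_\lambda$ the set of $\lambda$-sets of $\mathcal{A}'$. In view of Lemma A.48, it remains to show that for a sequence of disjoint sets $(A_n)_n \subset \mathcal{A}'_\lambda$, we have
$$\cup_n A_n \in \mathcal{A}'_\lambda \quad \text{and} \quad \lambda(\cup_n A_n) = \sum_n \lambda(A_n). \tag{A.11}$$
Denote $\bar{A}_n := \cup_{i \le n} A_i$, $\bar{A} := \cup_n A_n$, and notice that $\bar{A}^c \subset \bar{A}_n^c$. By Lemma A.48, $\bar{A}_n \in \mathcal{A}'_\lambda$ and for all $B \in \mathcal{A}'$:
$$\lambda(B) = \lambda(\bar{A}_n^c \cap B) + \lambda(\bar{A}_n \cap B) \ge \lambda(\bar{A}^c \cap B) + \lambda(\bar{A}_n \cap B) = \lambda(\bar{A}^c \cap B) + \sum_{i \le n} \lambda(A_i \cap B).$$
We continue by letting $n$ tend to infinity, and by using (twice) the sub-additivity of $\lambda$:
$$\lambda(B) \ge \lambda(\bar{A}^c \cap B) + \sum_n \lambda(A_n \cap B) \ge \lambda(\bar{A}^c \cap B) + \lambda(\bar{A} \cap B) \ge \lambda(B).$$
We deduce that all the inequalities are equalities, which proves that $\bar{A} \in \mathcal{A}'_\lambda$, and for $B = \bar{A}$ we obtain the $\sigma$-additivity property of $\lambda$, completing the proof of (A.11).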
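We now have all the ingredients to prove the Carathéodory extension theorem.
Proof of Theorem A.6. We consider the $\sigma$-algebra $\mathcal{A}' := \mathcal{P}(\Omega)$, and we define the map $\lambda$ on $\mathcal{P}(\Omega)$ by:
$$\lambda(A) := \inf \Big\{ \sum_n \mu_0(B_n) : (B_n)_n \subset \mathcal{A}_0, \ B_n \text{ disjoint and } A \subset \cup_n B_n \Big\}.$$
Step 1. Let us show that $\lambda$ is an outer measure on $(\Omega, \mathcal{P}(\Omega))$, which implies by Lemma A.49 that
$$\lambda \text{ is a measure on } (\Omega, \mathcal{A}'_\lambda), \tag{A.12}$$
where $\mathcal{A}'_\lambda$ denotes the set of $\lambda$-sets of $\mathcal{A}'$. It is clear that $\lambda(\emptyset) = 0$ and that $\lambda$ is nondecreasing, so it remains to check that $\lambda$ is $\sigma$-sub-additive. Let $(A_n)_n \subset \mathcal{P}(\Omega)$ be a sequence such that $\lambda(A_n) < \infty$ for all $n$, and let $A := \cup_n A_n$. For every $\varepsilon > 0$ and $n \ge 1$, we consider an $\varepsilon$-optimal sequence $(B_i^{n,\varepsilon})_i \subset \mathcal{A}_0$ for the minimization problem $\lambda(A_n)$, i.e. $B_i^{n,\varepsilon} \cap B_j^{n,\varepsilon} = \emptyset$ for $i \ne j$,
$$A_n \subset \cup_k B_k^{n,\varepsilon} \quad \text{and} \quad \lambda(A_n) > \sum_k \mu_0(B_k^{n,\varepsilon}) - \varepsilon 2^{-n}.$$
Then $\lambda(A) \le \sum_{n,k} \mu_0(B_k^{n,\varepsilon}) < \varepsilon + \sum_n \lambda(A_n) \to \sum_n \lambda(A_n)$ as $\varepsilon \to 0$.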
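Step 2. Let $\mathcal{A}'_\lambda$ denote the set of $\lambda$-sets of $\mathcal{A}' = \mathcal{P}(\Omega)$. Since $\mathcal{A}'_\lambda$ is a $\sigma$-algebra, we have $\sigma(\mathcal{A}_0) \subset \mathcal{A}'_\lambda$ as soon as $\mathcal{A}_0 \subset \mathcal{A}'_\lambda$. Hence, to complete the proof of the existence of an extension, it remains to show that
$$\mathcal{A}_0 \subset \mathcal{A}'_\lambda \quad \text{and} \quad \lambda = \mu_0 \text{ on } \mathcal{A}_0, \tag{A.13}$$
so as to define $\mu$ as the restriction of $\lambda$ to $\sigma(\mathcal{A}_0)$.
1- We first show that $\lambda = \mu_0$ on $\mathcal{A}_0$. The inequality $\lambda \le \mu_0$ on $\mathcal{A}_0$ is trivial. For the reverse inequality, we consider $A \in \mathcal{A}_0$ and a sequence $(B_n)_n \subset \mathcal{A}_0$ of disjoint elements such that $A \subset \cup_n B_n$. Then, using the $\sigma$-additivity of $\mu_0$ on $\mathcal{A}_0$:
$$\mu_0(A) = \mu_0\big( \cup_n (A \cap B_n) \big) = \sum_n \mu_0(A \cap B_n) \le \sum_n \mu_0(B_n),$$
and taking the infimum over all such sequences $(B_n)_n$ yields $\mu_0(A) \le \lambda(A)$.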
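2- We now show that $\mathcal{A}_0 \subset \mathcal{A}'_\lambda$, the set of $\lambda$-sets of $\mathcal{A}' = \mathcal{P}(\Omega)$. Let $A \in \mathcal{A}'$, $\varepsilon > 0$ and $(B_n)_n \subset \mathcal{A}_0$ an $\varepsilon$-optimal sequence for the minimization problem $\lambda(A)$. Then, for all $A_0 \in \mathcal{A}_0$, we have
$$\lambda(A) + \varepsilon \ge \sum_n \mu_0(B_n) = \sum_n \mu_0(A_0 \cap B_n) + \sum_n \mu_0(A_0^c \cap B_n) \ge \lambda(A_0 \cap A) + \lambda(A_0^c \cap A) \ge \lambda(A),$$
where the last two inequalities follow respectively from the definition and monotonicity of $\lambda$, and from its sub-additivity. Since $\varepsilon > 0$ is arbitrary, this shows that $A_0$ is a $\lambda$-set, i.e. $A_0 \in \mathcal{A}'_\lambda$.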
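Proof of Lemma A.48. Throughout, $\mathcal{A}'_\lambda$ denotes the set of $\lambda$-sets of $\mathcal{A}'$ and $B$ is an arbitrary element of $\mathcal{A}'$.
1- We first show that $\mathcal{A}'_\lambda$ is an algebra. It is clear that $\Omega \in \mathcal{A}'_\lambda$ and that $\mathcal{A}'_\lambda$ is stable under complementation. It remains to show that $A = A_1 \cap A_2 \in \mathcal{A}'_\lambda$ for all $A_1, A_2 \in \mathcal{A}'_\lambda$. Using successively the fact that $A_2 \in \mathcal{A}'_\lambda$ and that $A_2 \cap A^c = A_1^c \cap A_2$, $A_2^c \cap A^c = A_2^c$, we compute directly:
$$\lambda(A^c \cap B) = \lambda(A_2 \cap A^c \cap B) + \lambda(A_2^c \cap A^c \cap B) = \lambda(A_1^c \cap A_2 \cap B) + \lambda(A_2^c \cap B).$$
We continue by using the fact that $A_1, A_2 \in \mathcal{A}'_\lambda$:
$$\lambda(A^c \cap B) = \big[ \lambda(A_2 \cap B) - \lambda(A \cap B) \big] + \lambda(A_2^c \cap B) = \lambda(B) - \lambda(A \cap B).$$
2- For disjoint sets $A_1, A_2 \in \mathcal{A}'_\lambda$, we have $(A_1 \cup A_2) \cap A_1 = A_1$ and $(A_1 \cup A_2) \cap A_1^c = A_2$, and we use the fact that $A_1 \in \mathcal{A}'_\lambda$ to see that $\lambda((A_1 \cup A_2) \cap B) = \lambda(A_1 \cap B) + \lambda(A_2 \cap B)$, which is the announced equality for $n = 2$. The extension to larger $n$ is trivial, and the additivity of $\lambda$ is an immediate consequence.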
A.6.3 Proof of the monotone class theorem
We recall the statement.
Theorem A.18. Let $H$ be a class of bounded real functions on $\Omega$ satisfying the following conditions:
(H1) $H$ is a vector space containing the constant function 1,
(H2) for every nondecreasing sequence $(f_n)_n \subset H$ of nonnegative functions such that $f := \lim \uparrow f_n$ is bounded, we have $f \in H$.
Let $\mathcal{I}$ be a $\pi$-system such that $\{ 1_A : A \in \mathcal{I} \} \subset H$. Then $\mathcal{L}^\infty(\sigma(\mathcal{I})) \subset H$.
Proof. From conditions H1 and H2, one immediately sees that the set $\mathcal{D} := \{ F \subset \Omega : 1_F \in H \}$ is a d-system. Moreover, since $\mathcal{D}$ contains the $\pi$-system $\mathcal{I}$, we deduce from Lemma A.45 that $\sigma(\mathcal{I}) \subset \mathcal{D}$. Let now $f \in \mathcal{L}^\infty(\sigma(\mathcal{I}))$ be bounded by $M > 0$, and
$$\varphi_n(\omega) := \sum_{i=0}^{M 2^n} i 2^{-n}\, 1_{A_i^n}(\omega), \quad \text{where } A_i^n := \{ \omega \in \Omega : i 2^{-n} \le f^+(\omega) < (i+1) 2^{-n} \}.$$
Since $A_i^n \in \sigma(\mathcal{I})$, we deduce from the vector space structure of $H$ (condition H1) that $\varphi_n \in H$. Moreover, $(\varphi_n)_n$ being a nondecreasing sequence of nonnegative functions converging to the bounded function $f^+$, condition H2 ensures that $f^+ \in H$. One shows similarly that $f^- \in H$ and, consequently, $f = f^+ - f^- \in H$ by H1.
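The dyadic approximation $\varphi_n$ used in this proof is easy to experiment with: at a point where $f^+$ takes the value $v$, $\varphi_n$ equals the largest multiple of $2^{-n}$ not exceeding $v$. The helper name `dyadic_approx` and the sample values below are illustrative assumptions.

```python
import math

def dyadic_approx(value, n):
    """Value of phi_n at a point where f^+ equals `value` >= 0:
    i * 2^-n with i the unique integer such that i 2^-n <= value < (i + 1) 2^-n."""
    return math.floor(value * 2**n) / 2**n

# Illustrative values of f^+ at a few points.
for v in [0.0, 0.3, 1.0, 2.71828]:
    print(v, [dyadic_approx(v, n) for n in range(1, 6)])
```

For each fixed value the printed sequence is nondecreasing in $n$ and stays within $2^{-n}$ below the value, which is exactly what condition H2 requires of $(\varphi_n)_n$.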
Appendix B
Preliminaries of probability theory
In this chapter, we specialize the analysis to the case of a probability measure, i.e. a measure $\mathbb{P} : \mathcal{A} \to \mathbb{R}_+$ such that $\mathbb{P}[\Omega] = 1$. We then say that $(\Omega, \mathcal{A}, \mathbb{P})$ is a probability space.
Of course, all the results of the previous chapter are valid in the present setting. In addition to these results, we shall exploit probabilistic intuition to introduce new concepts and obtain new results.
Thus, the set $\Omega$ is interpreted as the set of elementary events, and every point $\omega \in \Omega$ is an elementary event. The $\sigma$-algebra $\mathcal{A}$ is the set of all realizable events.
We shall systematically replace the terminology $\mathbb{P}$-a.e. by $\mathbb{P}$-almost surely, denoted $\mathbb{P}$-a.s., or more simply a.s. when there is no risk of confusion.
Measurable functions are called random variables (we shall write r.v.), and will most often be denoted by capital letters, typically $X$. The image law $\mathbb{P} X^{-1}$ is called the distribution of the r.v. $X$, and will be denoted $\mathcal{L}_X$ when there is no need to recall the probability $\mathbb{P}$.
B.1 Random variables
B.1.1 $\sigma$-algebra generated by a r.v.
We start by giving a precise meaning to the information revealed by a family of random variables.
Definition B.1. Let $T$ be a set, and $\{ X_\tau, \tau \in T \}$ an arbitrary family of r.v. The $\sigma$-algebra generated by this family, $\mathcal{X} := \sigma(X_\tau : \tau \in T)$, is the smallest $\sigma$-algebra on $\Omega$ such that $X_\tau$ is $\mathcal{X}$-measurable for all $\tau \in T$, i.e.
$$\sigma(X_\tau : \tau \in T) = \sigma\big( \{ X_\tau^{-1}(A) : \tau \in T \text{ and } A \in \mathcal{B}_{\mathbb{R}} \} \big). \tag{B.1}$$
It is clear that if the $X_\tau$ are $\mathcal{A}$-measurable, then $\sigma(X_\tau : \tau \in T) \subset \mathcal{A}$.
Lemma B.2. Let $X$ and $Y$ be two r.v. on $(\Omega, \mathcal{A}, \mathbb{P})$ taking values respectively in $\mathbb{R}$ and in $\mathbb{R}^n$. Then $X$ is $\sigma(Y)$-measurable if and only if there exists a Borel function $f : \mathbb{R}^n \to \mathbb{R}$ such that $X = f(Y)$.
Proof. Only the necessary condition is nontrivial. Moreover, up to transforming $X$ by a bounded bijective function, we may restrict to the case where $X$ is bounded. We define
$$H := \{ f(Y) : f \in \mathcal{L}^\infty(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}) \},$$
and we notice that $\{ 1_A : A \in \sigma(Y) \} \subset H$: by (B.1), for every $A \in \sigma(Y)$, there exists $B \in \mathcal{B}_{\mathbb{R}^n}$ such that $A = Y^{-1}(B)$, and therefore $1_A = 1_B(Y)$.
To conclude, it suffices to show that $H$ satisfies the conditions of the monotone class theorem. It is clear that $H$ is a vector space containing the constant r.v. 1. Let $X \in \mathcal{L}^\infty_+(\mathcal{A}, \mathbb{P})$ and $(f_n(Y))_n$ a nondecreasing sequence of $H$ such that $f_n(Y) \uparrow X$. Then $X = f(Y)$, where $f = \limsup f_n$ is $\mathcal{B}_{\mathbb{R}^n}$-measurable and bounded (since $X$ is).
B.1.2 Distribution of a r.v.
The distribution, or law, of a r.v. $X$ on $(\Omega, \mathcal{A}, \mathbb{P})$ is defined by the image measure $\mathcal{L}_X := \mathbb{P} X^{-1}$. Using the $\pi$-system $\pi(\mathbb{R}) = \{ ]-\infty, c] : c \in \mathbb{R} \}$, we deduce from Proposition A.5 that the law $\mathcal{L}_X$ is characterized by the function
$$F_X(c) := \mathcal{L}_X(]-\infty, c]) = \mathbb{P}[X \le c], \quad c \in \mathbb{R}. \tag{B.2}$$
The function $F_X$ is called the (cumulative) distribution function.
Proposition B.3. (i) The function $F_X$ is nondecreasing, right-continuous, and $F_X(-\infty) = 0$, $F_X(\infty) = 1$.
(ii) Let $F$ be a nondecreasing right-continuous function with $F(-\infty) = 0$, $F(\infty) = 1$. Then there exists a random variable $\tilde{X}$ on a probability space $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{\mathbb{P}})$ such that $F = F_{\tilde{X}}$.
Proof. (i) is trivial. For (ii), a first approach consists in constructing a law $\tilde{\mathcal{L}}$ by following the construction scheme of the Lebesgue measure in Example A.7, which uses the Carathéodory extension theorem; one then takes $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{\mathbb{P}}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, \tilde{\mathcal{L}})$ and $\tilde{X}(\omega) = \omega$. The following remark provides an alternative approach.
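Remark B.4. Given a distribution function, or a law, here is an explicit construction of a corresponding r.v. This construction is useful, for instance, for the simulation of r.v. On the probability space $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{\mathbb{P}}) := ([0,1], \mathcal{B}_{[0,1]}, \lambda)$, $\lambda$ being the Lebesgue measure, we define
$$\overline{X}(\omega) := \inf\{ u : F(u) > \omega \} \quad \text{and} \quad \underline{X}(\omega) := \inf\{ u : F(u) \ge \omega \}.$$
1- $F_{\underline{X}} = F$: we shall show that
$$\omega \le F(c) \iff \underline{X}(\omega) \le c, \tag{B.3}$$
and therefore $\tilde{\mathbb{P}}[\underline{X} \le c] = F(c)$.
The implication $\Longrightarrow$ follows from the definition. For the reverse implication, we observe that $F(\underline{X}(\omega)) \ge \omega$. Indeed, if this were not the case, we would deduce from the right-continuity of $F$ that $F(\underline{X}(\omega) + \varepsilon) < \omega$ for sufficiently small $\varepsilon > 0$, implying the absurdity $\underline{X}(\omega) + \varepsilon \le \underline{X}(\omega)$!
With this observation and the monotonicity of $F$, we see that $\underline{X}(\omega) \le c$ implies $\omega \le F(\underline{X}(\omega)) \le F(c)$, which implies $\omega \le F(c)$.
2- $F_{\overline{X}} = F$: by definition of $\overline{X}$, we have that $\omega < F(c)$ implies $\overline{X}(\omega) \le c$. But $\overline{X}(\omega) \le c$ implies $\underline{X}(\omega) \le c$ since $\underline{X} \le \overline{X}$. We deduce that $F(c) \le \tilde{\mathbb{P}}[\overline{X} \le c] \le \tilde{\mathbb{P}}[\underline{X} \le c] = F(c)$.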
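The construction of this remark is the basis of simulation by inversion of the distribution function. Here is a minimal sketch for a discrete law with three atoms; the atoms, the probabilities and the function names are illustrative assumptions, and the probabilities are dyadic so that the cumulative sums are exact in floating point.

```python
import bisect
import random

# Hypothetical discrete law: P[X = atoms[i]] = probs[i].
atoms = [-1.0, 0.0, 2.0]
probs = [0.25, 0.5, 0.25]

cum = []          # cum[i] = F(atoms[i])
total = 0.0
for p in probs:
    total += p
    cum.append(total)

def F(c):
    """Distribution function of the law."""
    return sum(p for a, p in zip(atoms, probs) if a <= c)

def lower_quantile(omega):
    """inf{u : F(u) >= omega}, for omega in (0, 1]. At omega = 0 the infimum is
    -infinity; the first atom is returned instead, which only affects a null set."""
    return atoms[bisect.bisect_left(cum, omega)]

def upper_quantile(omega):
    """inf{u : F(u) > omega}, for omega in [0, 1)."""
    return atoms[bisect.bisect_right(cum, omega)]

def sample(rng, size):
    """Apply lower_quantile to uniform draws on [0, 1)."""
    return [lower_quantile(rng.random()) for _ in range(size)]

rng = random.Random(0)
draws = sample(rng, 20_000)
print({a: draws.count(a) / len(draws) for a in atoms})
```

The two quantile functions differ only at the levels where $F$ is flat (here $\omega = 0.25$ and $\omega = 0.75$), a Lebesgue-null set, which is why both have distribution function $F$.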
B.2 Expectation of random variables
For a r.v. $X \in \mathcal{L}^1(\Omega, \mathcal{A}, \mathbb{P})$, the expectation, in probabilistic vocabulary, is the integral of $X$ with respect to $\mathbb{P}$:
$$\mathbb{E}[X] := \mathbb{P}(X) = \int_\Omega X\, d\mathbb{P}.$$
For a nonnegative r.v., $\mathbb{E}[X] \in [0, \infty]$ is always well defined. Of course, all the properties of Chapter A are valid. We shall obtain further ones as a consequence of $\mathbb{P}[\Omega] = 1$.
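B.2.1 Random variables with density
Let us now return to the law $\mathcal{L}_X$ on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ of a r.v. $X$ on $(\Omega, \mathcal{A}, \mathbb{P})$. By definition, we have:
$$\mathcal{L}_X(B) = \mathbb{P}[X \in B] \quad \text{for all } B \in \mathcal{B}_{\mathbb{R}}.$$
By linearity of the integral (with respect to $\mathcal{L}_X$), we obtain $\mathbb{E}[g(X)] = \mathcal{L}_X(g) = \int_{\mathbb{R}} g\, d\mathcal{L}_X$ for every simple function $g \in \mathcal{S}^+$. We then extend this relation to nonnegative measurable functions $g$ by the monotone convergence theorem, and then to $\mathcal{L}^1$ by decomposing $g = g^+ - g^-$. This shows that $g(X) \in \mathcal{L}^1(\Omega, \mathcal{A}, \mathbb{P})$ if and only if $g \in \mathcal{L}^1(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, \mathcal{L}_X)$, and
$$\mathbb{E}[g(X)] = \mathcal{L}_X(g) = \int_{\mathbb{R}} g\, d\mathcal{L}_X. \tag{B.4}$$
Definition B.5. We say that $X$ has a probability density $f_X$ if $\mathcal{L}_X$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ and:
$$\mathbb{P}[X \in B] = \int_B f_X(x)\, dx \quad \text{for all } B \in \mathcal{B}_{\mathbb{R}}.$$
The link between the probability density, if it exists, and the distribution function (which always exists) is easily established by considering $B = ]-\infty, c]$:
$$F_X(c) = \int_{]-\infty, c]} f_X(x)\, dx \quad \text{for all } c \in \mathbb{R},$$
which expresses that $f_X$ is the derivative of $F_X$ at the continuity points of $f_X$. Finally, for a r.v. $X$ with density $f_X$, we can reformulate (B.4) as:
$$g(X) \in \mathcal{L}^1(\Omega, \mathcal{A}, \mathbb{P}) \quad \text{if and only if} \quad \int_{\mathbb{R}} |g(x)|\, f_X(x)\, dx < \infty,$$
and
$$\mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x)\, f_X(x)\, dx.$$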
B.2.2 Jensen inequality
A convex function $g : \mathbb{R}^n \to \mathbb{R}$ lies above its tangent hyperplane at every point of the interior of its domain. If we admit this result, then we can write, for an integrable r.v. $X$, that
$$g(X) \ge g(\mathbb{E}[X]) + \langle p_{\mathbb{E}[X]}, X - \mathbb{E}[X] \rangle,$$
where $p_{\mathbb{E}[X]}$ is the gradient of $g$ at the point $\mathbb{E}[X]$, if $g$ is differentiable at this point. If $g$ is not differentiable, this result is still valid upon replacing the gradient by the notion of subgradient. In the proof below, we shall avoid this notion of convex analysis and use an approximation argument. Taking expectations in the last inequality, we obtain the Jensen inequality:
Theorem B.6. Let $X \in \mathcal{L}^1(\Omega, \mathcal{A}, \mathbb{P})$ and $g : \mathbb{R}^n \to \mathbb{R}$ a convex function such that $\mathbb{E}[|g(X)|] < \infty$. Then $\mathbb{E}[g(X)] \ge g(\mathbb{E}[X])$.
Proof. If $g$ is differentiable on the interior of its domain, the result follows from the discussion preceding the statement.
In the general case, we first assume that $X$ is bounded, and we consider an approximation of $g$ by a sequence of functions $(g_n)_n$ such that $g_n$ is differentiable, convex, $-\infty < -\sup_n \|g_n^-\|_\infty \le g_n \le g$ for all $n$, and $g_n \to g$ (see footnote 1). We then write that $g_n(X)$ lies above its tangent hyperplane at the point $\mathbb{E}[X]$, and we obtain by taking expectations that $\mathbb{E}[g_n(X)] \ge g_n(\mathbb{E}[X])$. The dominated convergence theorem allows us to conclude.
For an integrable random variable $X$, we apply the previous result to $X_n := (-n) \vee X \wedge n$, and we pass to the limit by a dominated convergence argument.
Footnote 1: An example of such a function is given by the inf-convolution $g_n(x) := \inf_{y \in \mathbb{R}^n} \{ g(y) + n|y - x|^2 \}$, see Aubin [2].
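For a finitely supported law the Jensen inequality can be checked by exact arithmetic. The following sketch uses an illustrative four-point distribution and three convex functions; all names and numbers are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical law of X: P[X = values[i]] = probs[i].
values = [Fraction(-2), Fraction(0), Fraction(1), Fraction(5)]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]

def expectation(func):
    """E[func(X)] for the finitely supported law above."""
    return sum(p * func(x) for x, p in zip(values, probs))

mean = expectation(lambda x: x)

convex_functions = {
    "square": lambda x: x * x,
    "abs": abs,
    "positive part": lambda x: max(x, 0),
}

# Jensen gap E[g(X)] - g(E[X]); the theorem says it is nonnegative for convex g.
gaps = {name: expectation(g) - g(mean) for name, g in convex_functions.items()}
print(mean, gaps)
```

For the square function the gap is the variance of $X$, which gives a quick way to remember the direction of the inequality.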


B.2.3 Characteristic function
Throughout this section, $X$ denotes a random vector on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$, with values in $\mathbb{R}^n$.
Definition B.7. The characteristic function of $X$ is the function $\Phi_X : \mathbb{R}^n \to \mathbb{C}$ defined by
$$\Phi_X(u) := \mathbb{E}\big[ e^{i \langle u, X \rangle} \big] \quad \text{for all } u \in \mathbb{R}^n.$$
The characteristic function depends only on the law of $X$:
$$\Phi_X(u) = \int_{\mathbb{R}^n} e^{i \langle u, x \rangle}\, d\mathcal{L}_X(x),$$
and is nothing but the Fourier transform of $\mathcal{L}_X$ at the point $-u/2\pi$. The Lebesgue integral of a complex-valued function is defined in the natural way by separating the real and imaginary parts. The characteristic function is well defined for all $u \in \mathbb{R}^n$ as the integral of a function of modulus 1. Finally, we have
$$\Phi_{-X}(u) = \overline{\Phi_X(u)} \quad \text{and} \quad \Phi_{aX+b}(u) = e^{i \langle u, b \rangle}\, \Phi_X(au) \quad \text{for all } u \in \mathbb{R}^n, \ a \in \mathbb{R}, \ b \in \mathbb{R}^n.$$
The following properties of characteristic functions can be easily proved by means of the dominated convergence theorem.
Lemma B.8. Let $\Phi_X$ be the characteristic function of a r.v. $X$. Then $\Phi_X(0) = 1$, and $\Phi_X$ is continuous and bounded (by 1) on $\mathbb{R}^n$.
Proof. $\Phi_X(0) = 1$ and $|\Phi_X| \le 1$ are obvious properties; continuity is an immediate consequence of the dominated convergence theorem.
Exercise B.9. 1. For a Gaussian vector $X$ with mean $b$ and variance matrix $V$, show that
$$\Phi_X(u) = e^{i \langle u, b \rangle - \frac{1}{2} \langle u, V u \rangle}.$$
(This is a useful formula to remember.)
2. If $\mathcal{L}_X$ is symmetric with respect to the origin, i.e. $\mathcal{L}_X = \mathcal{L}_{-X}$, show that $\Phi_X$ is real-valued.
3. For a real r.v., assume that $\mathbb{E}[|X|^p] < \infty$ for some integer $p \ge 1$. Show that $\Phi_X$ is $p$ times differentiable and
$$\Phi_X^{(k)}(0) = i^k\, \mathbb{E}[X^k] \quad \text{for } k = 1, \dots, p.$$
The purpose of this section is to show that the characteristic function allows us, as its name indicates, to characterize the law $\mathcal{L}_X$ of $X$. This provides an alternative way of handling random vectors for which the distribution function is difficult to manipulate. However, the usefulness of this notion is not limited to the multidimensional case $n > 1$. For instance, the manipulation of sums of r.v. is often simpler by means of characteristic functions.
In these notes, we restrict ourselves to proving this result in the one-dimensional case.
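Theorem B.10. For a real r.v., the function $\Phi_X$ characterizes the law $\mathcal{L}_X$. More precisely,
$$\frac{1}{2} \mathcal{L}_X(\{a\}) + \frac{1}{2} \mathcal{L}_X(\{b\}) + \mathcal{L}_X(]a, b[) = \frac{1}{2\pi} \lim_{T \to \infty} \int_{-T}^{T} \Phi_X(u)\, \frac{e^{-iua} - e^{-iub}}{iu}\, du$$
for all $a < b$. Moreover, if $\Phi_X$ is integrable, then $\mathcal{L}_X$ is absolutely continuous with respect to the Lebesgue measure, with density
$$f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-iux}\, \Phi_X(u)\, du, \quad x \in \mathbb{R}.$$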
Proof. We restrict to the one-dimensional case $n = 1$ to simplify the presentation. For $a < b$, one checks without difficulty that the condition for applying the Fubini theorem is satisfied, and one computes that:
$$\frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-iua} - e^{-iub}}{iu}\, \Phi_X(u)\, du = \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-iua} - e^{-iub}}{iu} \Big( \int_{\mathbb{R}} e^{iuv}\, d\mathcal{L}_X(v) \Big) du = \frac{1}{2\pi} \int_{\mathbb{R}} \Big( \int_{-T}^{T} \frac{e^{iu(v-a)} - e^{iu(v-b)}}{iu}\, du \Big) d\mathcal{L}_X(v).$$
Then, one computes directly that
$$\frac{1}{2\pi} \int_{-T}^{T} \frac{e^{iu(v-a)} - e^{iu(v-b)}}{iu}\, du = \frac{S((v-a)T) - S((v-b)T)}{\pi}, \tag{B.5}$$
where $S(x) := \mathrm{sgn}(x) \int_0^{|x|} \frac{\sin t}{t}\, dt$, and $\mathrm{sgn}(x) = 1_{\{x > 0\}} - 1_{\{x < 0\}}$. One can check that $\lim_{x \to \infty} S(x) = \frac{\pi}{2}$, that the expression (B.5) is uniformly bounded in $v$ and $T$, and that it converges, as $T \to \infty$, to
$$0 \text{ if } v \notin [a, b], \quad \frac{1}{2} \text{ if } v \in \{a, b\}, \quad \text{and } 1 \text{ if } v \in ]a, b[.$$
We then obtain the announced result by the dominated convergence theorem.
Assume moreover that $\int_{\mathbb{R}} |\Phi_X(u)|\, du < \infty$. Then, taking the limit $T \to \infty$ in the expression of the theorem, and assuming in a first step that $\mathcal{L}_X$ has no atoms, we obtain:
$$\mathcal{L}_X(]a, b]) = F_X(b) - F_X(a) = \frac{1}{2\pi} \int_{\mathbb{R}} \frac{e^{-iua} - e^{-iub}}{iu}\, \Phi_X(u)\, du$$
by the dominated convergence theorem. One then observes that the right-hand side is continuous in $a$ and $b$ and, consequently, $\mathcal{L}_X$ has no atoms and the above expression holds. To find the expression of the density $f_X$, it suffices to take the limit $b \to a$ after normalization by $b - a$, and to use the dominated convergence theorem.
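The density formula of Theorem B.10 lends itself to a direct numerical check. The sketch below assumes the standard Gaussian law, whose characteristic function $e^{-u^2/2}$ follows from Exercise B.9 with $b = 0$, $V = 1$, and approximates the inversion integral by a trapezoidal rule on a truncated interval; the cutoff and the number of nodes are illustrative choices.

```python
import cmath
import math

def phi_gauss(u):
    """Characteristic function of the standard Gaussian law (Exercise B.9 with b = 0, V = 1)."""
    return math.exp(-0.5 * u * u)

def density_from_cf(phi, x, cutoff=12.0, n=4000):
    """Trapezoidal approximation of (1 / 2pi) * integral of exp(-i u x) phi(u) du
    over [-cutoff, cutoff]; cutoff and n are illustrative numerical parameters."""
    step = 2.0 * cutoff / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        u = -cutoff + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(-1j * u * x) * phi(u)
    return (total * step / (2.0 * math.pi)).real

def gauss_density(x):
    """Closed-form standard Gaussian density, used as the reference value."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, density_from_cf(phi_gauss, x), gauss_density(x))
```

The truncation is harmless here because $\Phi_X$ decays like $e^{-u^2/2}$; for a characteristic function with slow decay, the cutoff would have to be chosen with more care.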
B.3 $\mathbb{L}^p$ spaces and functional convergences of random variables
B.3.1 Geometry of the space $\mathbb{L}^2$
We denote by $\mathbb{L}^2 = \mathbb{L}^2(\Omega, \mathcal{A}, \mathbb{P})$ the vector space of real random variables with $\mathbb{P}$-integrable square. A simple application of the Jensen inequality shows that $\mathbb{L}^2 \subset \mathbb{L}^1 = \mathbb{L}^1(\Omega, \mathcal{A}, \mathbb{P})$.
The map $(X, Y) \mapsto \mathbb{E}[XY]$ defines a scalar product on $\mathbb{L}^2$ if we identify r.v. which are equal a.s. We denote the corresponding norm by $\|X\|_2 := \mathbb{E}[X^2]^{1/2}$. In particular, this guarantees the Schwarz inequality (valid for measures, see Exercise A.37):
$$|\mathbb{E}[XY]| \le \mathbb{E}[|XY|] \le \|X\|_2\, \|Y\|_2 \quad \text{for all } X, Y \in \mathbb{L}^2,$$
as well as the triangle inequality
$$\|X + Y\|_2 \le \|X\|_2 + \|Y\|_2 \quad \text{for all } X, Y \in \mathbb{L}^2.$$
(One can check that the proofs of these results are not affected by the problem of identification of r.v. equal a.s.)
In probability, the expectation quantifies the mean of the r.v. It is also important, at least intuitively, to have a measure of the dispersion of the law. This is quantified by the notions of variance and covariance:
$$\mathbb{V}[X] := \mathbb{E}[(X - \mathbb{E}X)^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$
and
$$\mathrm{Cov}[X, Y] := \mathbb{E}[(X - \mathbb{E}X)(Y - \mathbb{E}Y)] = \mathbb{E}[XY] - \mathbb{E}[X]\, \mathbb{E}[Y].$$
If $X$ takes values in $\mathbb{R}^n$, these notions are extended in the natural way. In this setting, $\mathbb{V}[X]$ is a symmetric nonnegative matrix of size $n$.
Finally, the correlation between the r.v. $X$ and $Y$ is defined by
$$\mathrm{Cor}[X, Y] := \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathbb{V}[X]\, \mathbb{V}[Y]}} = \frac{\langle X - \mathbb{E}X, Y - \mathbb{E}Y \rangle_2}{\|X - \mathbb{E}X\|_2\, \|Y - \mathbb{E}Y\|_2},$$
i.e. the cosine of the angle formed by the centered vectors $X - \mathbb{E}X$ and $Y - \mathbb{E}Y$. The Schwarz inequality guarantees that the correlation is a real number in the interval $[-1, 1]$. The Pythagoras theorem reads
$$\mathbb{E}[(X + Y)^2] = \mathbb{E}[X^2] + \mathbb{E}[Y^2] \quad \text{whenever } \mathbb{E}[XY] = 0,$$
or, in terms of variances,
$$\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y] \quad \text{whenever } \mathrm{Cov}[X, Y] = 0.$$
Warning: the variance is not a linear operator, and the above formula is only valid if $\mathrm{Cov}[X, Y] = 0$. Finally, the parallelogram law reads
$$\|X + Y\|_2^2 + \|X - Y\|_2^2 = 2\|X\|_2^2 + 2\|Y\|_2^2 \quad \text{for all } X, Y \in \mathbb{L}^2.$$
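These identities can be verified by exact arithmetic on a finite probability space. The sketch below uses a hypothetical uniform four-point sample space and two uncorrelated r.v.; the names `E`, `cov`, `var` and the data are illustrative.

```python
from fractions import Fraction

# Hypothetical uniform probability on a four-point sample space.
P = [Fraction(1, 4)] * 4
X = [Fraction(1), Fraction(-1), Fraction(1), Fraction(-1)]
Y = [Fraction(2), Fraction(2), Fraction(-2), Fraction(-2)]

def E(Z):
    """Expectation of a r.v. given by its values on the sample points."""
    return sum(p * z for p, z in zip(P, Z))

def cov(A, B):
    return E([a * b for a, b in zip(A, B)]) - E(A) * E(B)

def var(A):
    return cov(A, A)

S = [x + y for x, y in zip(X, Y)]        # X + Y
Dif = [x - y for x, y in zip(X, Y)]      # X - Y
sq = lambda Z: [z * z for z in Z]

print("Cov[X,Y] =", cov(X, Y))
print("V[X+Y] =", var(S), " V[X] + V[Y] =", var(X) + var(Y))
print("V[2X] =", var([2 * x for x in X]), " 2 V[X] =", 2 * var(X))
print("parallelogram:", E(sq(S)) + E(sq(Dif)), "=", 2 * E(sq(X)) + 2 * E(sq(Y)))
```

The third line illustrates the warning above: $\mathbb{V}[2X] = 4\,\mathbb{V}[X]$, so the variance is not linear.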
B.3.2 The spaces $\mathcal{L}^p$ and $\mathbb{L}^p$
For $p \in [1, \infty[$, we denote by $\mathcal{L}^p := \mathcal{L}^p(\Omega, \mathcal{A}, \mathbb{P})$ the vector space of random variables $X$ such that $\mathbb{E}[|X|^p] < \infty$. We denote $\|X\|_p := (\mathbb{E}[|X|^p])^{1/p}$. Notice that $\|X\|_p = 0$ only implies that $X = 0$ a.s., so $\|\cdot\|_p$ does not define a norm on $\mathcal{L}^p$.
Definition B.11. The space $\mathbb{L}^p$ is the set of equivalence classes of $\mathcal{L}^p$ for the relation defined by a.s. equality.
Thus the space $\mathbb{L}^p$ identifies random variables which are equal a.s., and $\|\cdot\|_p$ does define a norm on $\mathbb{L}^p$.
We shall nevertheless continue to work on the space $\mathcal{L}^p$, and we shall pass to $\mathbb{L}^p$ only when necessary.
By a direct application of the Jensen inequality, we see that
$$\|X\|_p \le \|X\|_r \quad \text{if } 1 \le p \le r < \infty, \text{ for all } X \in \mathcal{L}^r, \tag{B.6}$$
and in particular $X \in \mathcal{L}^p$. This shows that $\mathcal{L}^r \subset \mathcal{L}^p$ whenever $1 \le p \le r < \infty$.
We shall show that the space $\mathcal{L}^p$ can be turned (again by quotienting by the class of r.v. which are null a.s.) into a Banach space.
Theorem B.12. For $p \ge 1$, the space $\mathbb{L}^p$ is a Banach space, and $\mathbb{L}^2$ is a Hilbert space. More precisely, let $(X_n)_n$ be a Cauchy sequence in $\mathcal{L}^p$, i.e. $\|X_n - X_m\|_p \to 0$ as $n, m \to \infty$. Then there exists a r.v. $X \in \mathcal{L}^p$ such that $\|X_n - X\|_p \to 0$.
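Proof. If $(X_n)_n$ is a Cauchy sequence, we can find an increasing sequence $(k_n)_n \subset \mathbb{N}$, $k_n \uparrow \infty$, such that
$$\|X_m - X_l\|_p \le 2^{-n} \quad \text{for all } m, l \ge k_n. \tag{B.7}$$
Then, we deduce from inequality (B.6) that
$$\mathbb{E}[|X_{k_{n+1}} - X_{k_n}|] \le \|X_{k_{n+1}} - X_{k_n}\|_p \le 2^{-n},$$
and that $\mathbb{E}[\sum_n |X_{k_{n+1}} - X_{k_n}|] < \infty$. Then the series $\sum_n (X_{k_{n+1}} - X_{k_n})$ is absolutely convergent a.s. Since this is a telescoping series, this shows that
$$\lim_n X_{k_n} = X \ \text{a.s., where } X := \limsup_n X_{k_n}.$$
Returning to (B.7), we see that for $l \ge k_n$ and $m \ge n$, we have $\mathbb{E}[|X_l - X_{k_m}|^p] = \|X_l - X_{k_m}\|_p^p \le 2^{-np}$. Letting $m \to \infty$, we deduce from the Fatou lemma that $\mathbb{E}[|X_l - X|^p] \le 2^{-np}$ for all $l \ge k_n$. This shows that $X \in \mathcal{L}^p$ and that $\|X_l - X\|_p \to 0$ as $l \to \infty$.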
B.3.3 The spaces $\mathcal{L}^0$ and $\mathbb{L}^0$
We denote by $\mathcal{L}^0 := \mathcal{L}^0(\mathcal{A})$ the vector space of $\mathcal{A}$-measurable random variables on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$, and we introduce the quotient space $\mathbb{L}^0$ consisting of the equivalence classes of $\mathcal{L}^0$ for the relation defined by a.s. equality.
Definition B.13. (Convergence in probability) Let $(X_n)_n$ and $X$ be r.v. in $\mathcal{L}^0$. We say that $(X_n)_n$ converges in probability to $X$ if
$$\lim_{n \to \infty} \mathbb{P}[|X_n - X| \ge \varepsilon] = 0 \quad \text{for all } \varepsilon > 0.$$
This notion of convergence is weaker than a.s. convergence and than convergence in $\mathbb{L}^p$, in the following sense.
Lemma B.14. (i) Convergence a.s. implies convergence in probability.
(ii) Let $p \ge 1$. Convergence in norm in $\mathbb{L}^p$ implies convergence in probability.
Proof. (i) follows from an immediate application of the dominated convergence theorem. For (ii), it suffices to use the Markov inequality of Exercise A.36.
The purpose of this section is to show that convergence in probability is metrizable and that it endows $\mathbb{L}^0$ with the structure of a complete metric space. To this end, we introduce the function $D : \mathcal{L}^0 \times \mathcal{L}^0 \to \mathbb{R}_+$ defined by:
$$D(X, Y) = \mathbb{E}[|X - Y| \wedge 1] \quad \text{for all } X, Y \in \mathcal{L}^0. \tag{B.8}$$
One immediately checks that $D$ is a distance on $\mathbb{L}^0$, but not on $\mathcal{L}^0$, for the same reasons as in the previous section.
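Lemma B.15. Convergence in probability is equivalent to convergence in the sense of the distance $D$.
Proof. For $X \in \mathcal{L}^0$ and $\varepsilon \in ]0, 1]$, we obtain by the Markov inequality of Exercise A.36:
$$\mathbb{P}[|X| \ge \varepsilon] = \mathbb{P}[|X| \wedge 1 \ge \varepsilon] \le \frac{\mathbb{E}[|X| \wedge 1]}{\varepsilon},$$
which allows us to deduce that convergence in the sense of $D$ implies convergence in probability. For the reverse implication, we estimate:
$$\mathbb{E}[|X| \wedge 1] = \mathbb{E}[(|X| \wedge 1) 1_{\{|X| \ge \varepsilon\}}] + \mathbb{E}[(|X| \wedge 1) 1_{\{|X| < \varepsilon\}}] \le \mathbb{P}[|X| \ge \varepsilon] + \varepsilon,$$
from which we conclude that convergence in probability implies convergence in the sense of $D$.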
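The two estimates of this proof can be observed on a finite probability space, where $D$ is a finite sum. In the sketch below, the uniform ten-point space and the r.v. `Z` are illustrative assumptions, and the bounds are checked for a few values of $\varepsilon \in ]0, 1]$.

```python
from fractions import Fraction

def D(P, X, Y):
    """D(X, Y) = E[|X - Y| ^ 1] on a finite probability space with weights P."""
    return sum(p * min(abs(x - y), 1) for p, x, y in zip(P, X, Y))

def prob_at_least(P, Z, eps):
    """P[|Z| >= eps]."""
    return sum(p for p, z in zip(P, Z) if abs(z) >= eps)

def bounds_hold(P, Z, eps):
    """Check eps * P[|Z| >= eps] <= D(Z, 0) <= P[|Z| >= eps] + eps, for eps in (0, 1]."""
    zero = [0] * len(Z)
    d = D(P, Z, zero)
    tail = prob_at_least(P, Z, eps)
    return eps * tail <= d <= tail + eps

# Hypothetical example: uniform probability on ten points, Z(k) = k^2 / 20.
P = [Fraction(1, 10)] * 10
Z = [Fraction(k * k, 20) for k in range(10)]

print(D(P, Z, [0] * 10))
print([bounds_hold(P, Z, eps) for eps in (Fraction(1, 10), Fraction(1, 2), Fraction(1))])
```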
Theorem B.16. $(\mathbb{L}^0, D)$ is a complete metric space.
Proof. Let $(X_n)_n$ be a Cauchy sequence for $D$. Then it is a Cauchy sequence for convergence in probability by Lemma B.15, and we can construct a sequence $(n_k)_k \uparrow \infty$ such that
$$\mathbb{P}\big[ |X_{n_{k+1}} - X_{n_k}| \ge 2^{-k} \big] \le 2^{-k} \quad \text{for all } k \ge 1,$$
and therefore $\sum_k \mathbb{P}\big[ |X_{n_{k+1}} - X_{n_k}| \ge 2^{-k} \big] < \infty$. The first Borel-Cantelli lemma (Lemma A.14) then implies that $\mathbb{P}\big[ \cup_n \cap_{k \ge n} \{ |X_{n_{k+1}} - X_{n_k}| < 2^{-k} \} \big] = 1$ and, consequently, for almost every $\omega$, $(X_{n_k}(\omega))_k$ is a Cauchy sequence in $\mathbb{R}$. Thus, the r.v. $X := \limsup_k X_{n_k}$ satisfies $X_{n_k} \to X$ a.s., hence in probability, and we conclude as in the proof of Theorem B.12.
B.3.4 Lien entre les convergences L
p
, en proba et p.s.
Nous avons vu que la convergence en probabilite est plus faible que la conver-
gence p.s. Le resultat suivant etablit un lien precis entre ces deux notions de
convergence.
Theorem B.17. Soient X
n
, n 1 et X des v.a. dans L
0
.
(i) X
n
X p.s. ssi sup
mn
[X
m
X[ 0 en probabilite.
(ii) X
n
X en probabilite ssi de toute suite croissante dentiers (n
k
)
k
, on
peut extraire une sous-suite (n
k
j
)
j
telle que X
n
k
j
X p.s.
La demonstration est reportee `a la n de ce paragraphe. On continue par
une consequence immediate du teor`eme B.17 (ii).
Corollary B.18. (Slutsky) Soient (X
n
)
n
une suite ` a valeur dans R
d
, et :
R
n
R
d
une fonction continue. Si X
n
X en probabilite, alors (X
n
)
(X) en probabilite.
Ceci est une consequence immediate du theor`eme B.17 (ii). En particulier, il
montre que la convergence en probabilite est stable pour les operations usuelles
daddition, de multiplication, de min, de max, etc...
Avant de demontrer le theor`eme B.17, enon cons le resultat etablissant le lien
precis entre la convergence en probabilite et la convergence dans L
1
.
Denition B.19. Une famille ( de v.a. est dite uniformement integrable, et
on note U.I. si
lim
c
sup
XC
E[[X[1
{|X|c}
= 0.
Theorem B.20. Soient X
n
, n 1 et X des v.a. dans L
1
. Alors X
n
X
dans L
1
si et seulement si
(a) X
n
X en probabilite,
(b) (X
n
)
n
est U.I.
The proof of this result is postponed to the end of this section. The following exercise collects the essential results concerning uniform integrability.
Exercise B.21. Let $(X_n)_n$ be a sequence of real-valued r.v.
1. Suppose that $(X_n)_n$ is U.I.
(a) Show that $(X_n)_n$ is bounded in $L^1$, i.e. $\sup_n E[|X_n|] < \infty$.
(b) On the probability space $([0,1], \mathcal{B}_{[0,1]}, \lambda)$, $\lambda$ being the Lebesgue measure, consider the sequence $Y_n := n 1_{[0,1/n]}$. Show that $(Y_n)_n$ is bounded in $L^1$, but is not U.I.
2. Suppose that $E[\sup_n |X_n|] < \infty$. Show that $(X_n)_n$ is U.I. (Hint: use the monotonicity of the function $x \mapsto x 1_{\{x \ge c\}}$ on $\mathbb{R}_+$.)
3. Suppose that there exists $p > 1$ such that $(X_n)_n$ is bounded in $L^p$.
(a) Show that $E\big[|X_n| 1_{\{|X_n| \ge c\}}\big] \le \|X_n\|_p \, P[|X_n| \ge c]^{1 - 1/p}$.
(b) Deduce that $(X_n)_n$ is U.I.
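The counterexample of part 1(b) can be checked by exact arithmetic. The following is only an illustrative sketch: the helper names `l1_norm`, `tail_expectation` and `sup_tail` are ours, and the supremum over $n$ is approximated by a maximum over finitely many indices.

```python
from fractions import Fraction

def l1_norm(n):
    # E[|Y_n|] = n * lambda([0, 1/n]) for Y_n = n * 1_{[0, 1/n]}
    return n * Fraction(1, n)

def tail_expectation(n, c):
    # E[|Y_n| 1_{|Y_n| >= c}]: Y_n equals n on [0, 1/n] and 0 elsewhere,
    # so the whole mass of Y_n lies above level c as soon as n >= c.
    return n * Fraction(1, n) if n >= c else Fraction(0)

def sup_tail(c, n_max):
    # Maximum over 1 <= n <= n_max, a finite proxy for the supremum over all n
    return max(tail_expectation(n, c) for n in range(1, n_max + 1))
```

The $L^1$ norms stay equal to 1, while `sup_tail(c, n_max)` equals 1 for every level `c` as long as `n_max >= c`, so the limit in Definition B.19 cannot be 0.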
We now turn to the proofs of the theorems of this section.
Proof of Theorem B.17. (i) Observe that
$$C := \{X_n \to X\} = \cap_{k \ge 1} \cup_{n} \cap_{m \ge n} \{|X_m - X| \le k^{-1}\} = \lim_{k} \downarrow \cup_n A_n^k,$$
where $A_n^k := \cap_{m \ge n} \{|X_m - X| \le k^{-1}\}$. The a.s. convergence of $X_n$ to $X$ reads $P[C] = 1$, and is equivalent to $P[\cup_n A_n^k] = 1$ for all $k \ge 1$. Since the sequence $(A_n^k)_n$ is increasing, this is equivalent to $\lim_n \uparrow P[A_n^k] = 1$ for all $k \ge 1$, which expresses exactly the convergence in probability of $\sup_{m \ge n} |X_m - X|$ to $0$.
(ii) Suppose first that $X_n \to X$ in probability. Let $(n_k)_k$ be an increasing sequence of indices, and $\tilde X_k := X_{n_k}$. Define
$$k_j := \inf\big\{i : P[|\tilde X_i - X| \ge 2^{-j}] \le 2^{-j}\big\}.$$
Then $\sum_j P[|\tilde X_{k_j} - X| \ge 2^{-j}] < \infty$, and we deduce from the first Borel-Cantelli lemma (Lemma A.14) that $|\tilde X_{k_j} - X| < 2^{-j}$ for $j$ large enough, a.s. In particular, this shows that $\tilde X_{k_j} \to X$ a.s.
For the sufficient condition, suppose to the contrary that $X_n \not\to X$ in probability. Then, by Lemma B.15, there exist an increasing subsequence $(n_k)_k$ and $\varepsilon > 0$ such that $D(X_{n_k}, X) \ge \varepsilon$ for all $k$. We arrive at a contradiction by extracting a subsequence $(X_{n_{k_j}})_j$ which converges a.s. to $X$, and by invoking the dominated convergence theorem to pass to the limit.
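To see why Theorem B.17 (ii) only provides a.s. convergence along subsequences, one can look at the classical "typewriter" sequence on $([0,1], \mathcal{B}_{[0,1]}, \lambda)$. This example is not in the text above; it is a sketch under our own conventions, with $X_n := 1_{[j 2^{-k}, (j+1) 2^{-k}]}$ for $n = 2^k + j$, $0 \le j < 2^k$, and hypothetical helper names.

```python
from fractions import Fraction

def typewriter_interval(n):
    # For n = 2**k + j with 0 <= j < 2**k, return the endpoints of [j/2**k, (j+1)/2**k]
    k = n.bit_length() - 1
    j = n - 2**k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)

def X(n, omega):
    # X_n(omega) is the indicator of the n-th interval
    a, b = typewriter_interval(n)
    return 1 if a <= omega <= b else 0

def prob_nonzero(n):
    # Lebesgue measure of {X_n != 0}, i.e. P[|X_n| >= eps] for any 0 < eps <= 1
    a, b = typewriter_interval(n)
    return b - a

omega = Fraction(1, 3)
# Indices n < 2**10 where the path n -> X_n(omega) takes the value 1: one per dyadic level
hits = [n for n in range(1, 2**10) if X(n, omega) == 1]
# Along n_k = 2**k the interval is [0, 2**-k], so X_{n_k}(omega) = 0 once 2**-k < omega
subseq = [X(2**k, omega) for k in range(10)]
```

Here `prob_nonzero(n)` tends to 0, so $X_n \to 0$ in probability, yet every path returns to 1 once per dyadic level and does not converge; the subsequence $n_k = 2^k$ converges to 0 for every $\omega > 0$.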
Proof of Theorem B.20. Suppose first that conditions (a) and (b) hold. The function $\varphi_c(x) := -c \vee x \wedge c$, $x \in \mathbb{R}$, is Lipschitz and satisfies $|\varphi_c(x) - x| \le |x| 1_{\{|x| \ge c\}}$. We then deduce
- from the U.I. of $(X_n)_n$ and the integrability of $X$ that, as $c \to \infty$:
$$\sup_n E[|\varphi_c(X_n) - X_n|] \to 0 \quad\text{and}\quad E[|\varphi_c(X) - X|] \to 0,$$
- and from the convergence in probability of $X_n$ to $X$, together with Corollary B.18, that $\varphi_c(X_n) \to \varphi_c(X)$ in probability for every fixed $c$, hence also in $L^1$ since $|\varphi_c| \le c$ (combine Theorem B.17 (ii) with the dominated convergence theorem).
We can now conclude that $X_n \to X$ in $L^1$ by decomposing
$$E[|X_n - X|] \le E[|X_n - \varphi_c(X_n)|] + E[|\varphi_c(X_n) - \varphi_c(X)|] + E[|\varphi_c(X) - X|].$$
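The two properties of the truncation used in this argument (the bound by the tail $|x| 1_{\{|x| \ge c\}}$ and the Lipschitz property) can be sanity-checked on a grid. This is a numerical sketch only; the function name `phi`, the level `c = 2.0` and the grid are our choices.

```python
def phi(c, x):
    # Truncation at level c: x clipped to the interval [-c, c]
    return max(-c, min(x, c))

c = 2.0
grid = [k / 10 for k in range(-50, 51)]  # grid points in [-5, 5]

# |phi_c(x) - x| <= |x| 1_{|x| >= c}: no error below level c, at most |x| above it
bound_ok = all(abs(phi(c, x) - x) <= abs(x) * (abs(x) >= c) for x in grid)

# 1-Lipschitz property, with a small tolerance for floating-point rounding
lipschitz_ok = all(abs(phi(c, x) - phi(c, y)) <= abs(x - y) + 1e-12
                   for x in grid for y in grid)
```

A grid check is of course not a proof; it only illustrates the inequalities that the proof takes for granted.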
Conversely, suppose that $X_n \to X$ in $L^1$. Then the convergence in probability (a) is an immediate consequence of the Markov inequality (A.7) (Exercise A.36). To prove (b), fix $\varepsilon > 0$. The $L^1$ convergence of $(X_n)_n$ yields a rank $N$ beyond which
$$E|X_n - X| < \varepsilon \quad\text{for all } n > N. \tag{B.9}$$
Moreover, by Lemma A.26, there exists $\delta > 0$ such that for all $A \in \mathcal{A}$:
$$\sup_{n \le N} E[|X_n| 1_A] < \varepsilon \quad\text{and}\quad E[|X| 1_A] < \varepsilon \quad\text{whenever } P[A] < \delta. \tag{B.10}$$
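We shall use this inequality with the sets $A_n := \{|X_n| > c\}$, which indeed satisfy
$$\sup_n P[A_n] \le c^{-1} \sup_n E[|X_n|] < \delta \quad\text{for } c \text{ large enough}, \tag{B.11}$$
where we used the Markov inequality (A.7) (Exercise A.36), together with the boundedness in $L^1$ of the sequence $(X_n)_n$, which follows from its convergence in $L^1$. Thus, we deduce from (B.10) and (B.11) that
$$\begin{aligned}
\sup_n E\big[|X_n| 1_{\{|X_n| > c\}}\big]
&= \max\Big\{\sup_{n \le N} E\big[|X_n| 1_{\{|X_n| > c\}}\big],\ \sup_{n > N} E\big[|X_n| 1_{\{|X_n| > c\}}\big]\Big\} \\
&\le \max\Big\{\varepsilon,\ \sup_{n > N} E\big[|X_n| 1_{\{|X_n| > c\}}\big]\Big\} \\
&\le \max\Big\{\varepsilon,\ \sup_{n > N} \Big(E\big[|X| 1_{\{|X_n| > c\}}\big] + E\big[|X - X_n| 1_{\{|X_n| > c\}}\big]\Big)\Big\} \\
&\le \max\Big\{\varepsilon,\ \sup_{n > N} \Big(E\big[|X| 1_{A_n}\big] + E\big[|X - X_n|\big]\Big)\Big\} < 2\varepsilon,
\end{aligned}$$
where the last inequality follows from (B.9), (B.10) and (B.11).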
B.4 Convergence in distribution
In this section, we are interested in the convergence of distributions. Note immediately that this can only be a weaker sense of convergence than the functional convergences studied in the previous section, since in general nothing can be said about the underlying random variables. For example, if $X$ is a r.v. with centered Gaussian distribution, then $-X$ has the same distribution as $X$ (we write $X \overset{\mathcal{L}}{=} -X$). Worse still, two real r.v. $X$ and $Y$ defined on different probability spaces $(\Omega_1, \mathcal{A}_1, P_1)$ and $(\Omega_2, \mathcal{A}_2, P_2)$ may have the same distribution.
In this section, $C_b(\mathbb{R})$ denotes the set of bounded continuous functions on $\mathbb{R}$, and $\mathcal{P}(\mathbb{R})$ the set of probability measures on $\mathbb{R}$.
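B.4.1 Definitions
Let $\mu$ and $\mu_n$, $n \in \mathbb{N}$, be probability measures on $\mathbb{R}$. We say that $(\mu_n)_n$ converges weakly, or narrowly, to $\mu$ if
$$\mu_n(f) \to \mu(f) \quad\text{for every function } f \in C_b(\mathbb{R}).$$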
Let $X$ and $X_n$, $n \in \mathbb{N}$, be r.v. in $L^0(\mathcal{A}, P)$. We say that $(X_n)_n$ converges in distribution to $X$ if $(\mathcal{L}_{X_n})_n$ converges weakly to $\mathcal{L}_X$, i.e.
$$E[f(X_n)] \to E[f(X)] \quad\text{for all } f \in C_b(\mathbb{R}).$$
In the last definition, the r.v. $X$, $X_n$, $n \in \mathbb{N}$, need not be defined on the same probability space. Let us now show that the convergences introduced in the previous sections are stronger than convergence in distribution.
Proposition B.22. Convergence in probability implies convergence in distribution.
Proof. Suppose that $X_n \to X$ in probability, and let $g \in C_b(\mathbb{R})$. The real sequence $u_n := E[g(X_n)]$, $n \in \mathbb{N}$, is bounded. To prove convergence in distribution, it suffices to verify that every convergent subsequence $(u_{n_k})_k$ converges to $E[g(X)]$. For this, it suffices to use Theorem B.17 (ii) and the dominated convergence theorem.
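Since convergence in probability is weaker than both $L^1$ convergence and a.s. convergence, we obtain the following diagram summarizing the links between the different types of convergence encountered:
$$\begin{array}{ccccc}
L^p & \Longrightarrow & L^1 & & \\
 & & \Downarrow & & \\
\text{a.s.} & \Longrightarrow & P & \Longrightarrow & \text{distribution}
\end{array}$$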
B.4.2 Characterization of convergence in distribution by distribution functions
Every probability distribution $\mu$ on $\mathbb{R}$ is characterized by the corresponding distribution function $F(x) := \mu(]-\infty, x])$. Thus, if $F$, $F_n$, $n \in \mathbb{N}$, are distribution functions on $\mathbb{R}$, we say that $(F_n)_n$ converges in distribution to $F$ if the corresponding measures converge in distribution.
In this section, we express the definition of weak convergence equivalently in terms of distribution functions.
Remark B.23. The discontinuity points of $F$, if any, play a particular role. On $([0,1], \mathcal{B}_{[0,1]}, \lambda)$, let $\mu_n := \delta_{1/n}$ be the Dirac mass at the point $1/n$ (this is the distribution of the deterministic r.v. $X_n = 1/n$). Then $(\mu_n)_n$ converges in distribution to $\delta_0$, the Dirac mass at the point $0$. However, for all $n \ge 1$, $F_n(0) = 0 \not\to F_{\delta_0}(0) = 1$.
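The behavior described in Remark B.23 can be tabulated directly. The sketch below uses our own helper names (`cdf_dirac`, `F_n`) and evaluates the distribution functions at the discontinuity point 0 and at two nearby continuity points of the limit.

```python
def cdf_dirac(a):
    # Distribution function of the Dirac mass at a: F(x) = 1 if x >= a, else 0
    return lambda x: 1 if x >= a else 0

F = cdf_dirac(0)  # limit distribution delta_0

def F_n(n):
    # Distribution function of the deterministic r.v. X_n = 1/n
    return cdf_dirac(1 / n)

ns = (1, 10, 100, 1000)
at_zero = [F_n(n)(0) for n in ns]      # stays at 0, whereas F(0) = 1
at_pos = [F_n(n)(0.05) for n in ns]    # reaches F(0.05) = 1 once 1/n <= 0.05
at_neg = [F_n(n)(-0.05) for n in ns]   # always equal to F(-0.05) = 0
```

Convergence holds at the continuity points $\pm 0.05$ of the limit but fails at its jump point 0, which is exactly the situation described by Theorem B.24.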
Theorem B.24. Let $F$, $F_n$, $n \in \mathbb{N}$, be distribution functions on $\mathbb{R}$. Then $(F_n)_n$ converges in distribution to $F$ if and only if
$$\text{for all } x \in \mathbb{R}, \quad F(x-) = F(x) \implies F_n(x) \to F(x).$$
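Proof. 1- For $\eta > 0$ and $x \in \mathbb{R}$, define the functions
$$g_1(y) := 1_{]-\infty, x+\eta]}(y) - \frac{y - x}{\eta} 1_{]x, x+\eta]}(y) \quad\text{and}\quad g_2(y) := g_1(y + \eta), \quad y \in \mathbb{R},$$
and observe that $1_{]-\infty, x]} \le g_1 \le 1_{]-\infty, x+\eta]}$, $1_{]-\infty, x-\eta]} \le g_2 \le 1_{]-\infty, x]}$ and, consequently,
$$F_n(x) \le \mu_n(g_1), \quad \mu(g_1) \le F(x + \eta), \quad\text{and}\quad F_n(x) \ge \mu_n(g_2), \quad \mu(g_2) \ge F(x - \eta).$$
Since $g_1, g_2 \in C_b(\mathbb{R})$, we deduce from the weak convergence of $(F_n)_n$ to $F$ that $\mu_n(g_1) \to \mu(g_1)$, $\mu_n(g_2) \to \mu(g_2)$, and
$$F(x - \eta) \le \liminf_n F_n(x) \le \limsup_n F_n(x) \le F(x + \eta) \quad\text{for all } \eta > 0,$$
which indeed implies that $F_n(x) \to F(x)$ if $x$ is a continuity point of $F$.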
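2- For the sufficient condition, define as in Remark B.4 the r.v. $\underline{X}$, $\overline{X}$, $\underline{X}_n$, $\overline{X}_n$ which have distribution functions $F$ and $F_n$, respectively. By definition of $\overline{X}$, for all $x > \overline{X}(\omega)$ we have $F(x) > \omega$. If $x$ is a continuity point of $F$, this implies that $F_n(x) > \omega$ for $n$ large enough and, consequently, $x \ge \overline{X}_n(\omega)$. Since $F$ is nondecreasing, the set of its discontinuity points is at most countable. We can therefore let $x$ decrease to $\overline{X}(\omega)$ along continuity points of $F$, and obtain the inequality $\overline{X}(\omega) \ge \limsup_n \overline{X}_n(\omega)$. The symmetric result is obtained by reasoning on $\underline{X}$ and $\underline{X}_n$. Hence:
$$\underline{X}(\omega) \le \liminf_n \underline{X}_n(\omega) \le \limsup_n \overline{X}_n(\omega) \le \overline{X}(\omega).$$
Since $P[\underline{X} = \overline{X}] = 1$, this shows that $\overline{X}_n \to \overline{X}$ a.s. and hence in distribution.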
B.4.3 Convergence of distribution functions
The importance of convergence in distribution stems from the ease with which limit theorems are obtained. Indeed, sequences of measures converge in distribution at little cost, along a subsequence, to a limit which is however not necessarily a probability distribution. If the limit is not a probability distribution, we say that there is a loss of mass.
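A standard illustration of loss of mass, not taken from the text, is the sequence of Dirac masses $\mu_n := \delta_n$ escaping to infinity. The sketch below, with hypothetical helper names `F_n` and `mass`, shows that the distribution functions converge pointwise to the zero function, which is not a distribution function.

```python
def F_n(n):
    # Distribution function of the Dirac mass at n
    return lambda x: 1.0 if x >= n else 0.0

xs = [-10.0, 0.0, 10.0, 1000.0]
# For each fixed x, F_n(x) = 0 as soon as n > x, so the pointwise limit is F = 0
limit_values = [F_n(10**6)(x) for x in xs]

def mass(n, K):
    # mu_n([-K, K]) for mu_n = delta_n: all the mass leaves [-K, K] once n > K
    return 1.0 if -K <= n <= K else 0.0
```

For any fixed `K`, `mass(n, K)` is 0 for all `n > K`: the whole unit of mass has left every bounded interval in the limit.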
Before stating a precise result, let us explain the ideas behind these deep results. Distribution functions have a very special structure: viewing the graph of a distribution function in the coordinates $(x + y, -x + y)$ (obtained by rotating the initial coordinates by $45^\circ$), the graph becomes that of a function whose slope is bounded by 1 in absolute value: the slopes $-1$ and $1$ correspond, respectively, to the flat parts and to the jumps of the distribution function. Thus, in this coordinate system, the graph loses the monotonicity property, but becomes 1-Lipschitz. The Ascoli theorem then guarantees the existence of a convergent subsequence. The proof below uses an even more elementary argument.
Lemma B.25. Let $(F_n)_n$ be a sequence of distribution functions on $\mathbb{R}$. Then there exist a nondecreasing right-continuous function $F : \mathbb{R} \to [0, 1]$ and a subsequence $(n_k)_k$ such that $F_{n_k} \to F$ pointwise at every continuity point of $F$.
Proof. Enumerate the elements of the set of rationals $\mathbb{Q} = \{q_i, i \in \mathbb{N}\}$. The sequence $(F_n(q_1))_n$ is bounded, hence converges along a subsequence: $F_{n^1_k}(q_1) \to G(q_1)$ as $k \to \infty$. Similarly, the sequence $\big(F_{n^1_k}(q_2)\big)_k$ is bounded, hence converges along a further subsequence: $F_{n^2_k}(q_2) \to G(q_2)$ as $k \to \infty$, etc. Then, setting $k_j := n^j_j$, we obtain
$$F_{k_j}(q) \to G(q) \quad\text{for all } q \in \mathbb{Q}.$$
It is clear that $G$ is nondecreasing on $\mathbb{Q}$ and takes values in $[0, 1]$. We then define the function $F$ by
$$F(x) := \lim_{\mathbb{Q} \ni q \downarrow x} G(q) \quad\text{for all } x \in \mathbb{R},$$
which satisfies the properties announced in the lemma.
In order to avoid the loss of mass in the limit, we introduce a new notion.
Definition B.26. A sequence $(F_n)_{n \ge 1}$ of distribution functions on $\mathbb{R}$ is said to be tight if for all $\varepsilon > 0$, there exists $K > 0$ such that
$$\mu_n([-K, K]) := F_n(K) - F_n(-K) > 1 - \varepsilon \quad\text{for all } n \ge 1.$$
The following result is a direct consequence of the previous lemma.
Lemma B.27. Let $(F_n)_n$ be a sequence of distribution functions on $\mathbb{R}$.
(i) If $F_n \to F$ in distribution, then $(F_n)_n$ is tight.
(ii) If $(F_n)_n$ is tight, then there exist a distribution function $F$ on $\mathbb{R}$ and a subsequence $(n_k)_k$ such that $F_{n_k} \to F$ in distribution.
B.4.4 Convergence in distribution and characteristic functions
The characteristic function characterizes a probability distribution just as well as the distribution function does. The following result characterizes convergence in distribution in terms of characteristic functions.
Theorem B.28. (Lévy's convergence theorem) Let $(F_n)_n$ be a sequence of distribution functions on $\mathbb{R}$, and $(\Phi_n)_n$ the sequence of corresponding characteristic functions. Suppose that there exists a function $\Phi$ on $\mathbb{R}$ such that
$$\Phi_n \to \Phi \text{ pointwise on } \mathbb{R} \quad\text{and}\quad \Phi \text{ is continuous at } 0.$$
Then $\Phi$ is a characteristic function corresponding to a distribution function $F$, and $F_n \to F$ in distribution.
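Proof. 1- Let us first show that
$$(F_n)_n \text{ is tight.} \tag{B.12}$$
Let $\varepsilon > 0$. By the continuity of $\Phi$ at $0$, there exists $\alpha > 0$ such that $|1 - \Phi| < \varepsilon$ on $[-\alpha, \alpha]$. It is clear that $2 - \Phi_n(u) - \Phi_n(-u) \in \mathbb{R}_+$ and that this property is inherited by $\Phi$ in the limit. Then $0 \le \int_0^\alpha [2 - \Phi(u) - \Phi(-u)] \, du \le 2 \varepsilon \alpha$, and we deduce from the convergence of $\Phi_n$ to $\Phi$ and the dominated convergence theorem that, for all $n$ beyond some rank $N$:
$$\begin{aligned}
4 \varepsilon &\ge \frac{1}{\alpha} \int_0^\alpha [2 - \Phi_n(u) - \Phi_n(-u)] \, du \\
&= \frac{1}{\alpha} \int_{-\alpha}^{\alpha} \int_{\mathbb{R}} \big(1 - e^{iu\omega}\big) \, dF_n(\omega) \, du \\
&= \frac{1}{\alpha} \int_{\mathbb{R}} \int_{-\alpha}^{\alpha} \big(1 - e^{iu\omega}\big) \, du \, dF_n(\omega) = 2 \int_{\mathbb{R}} \Big(1 - \frac{\sin(\alpha\omega)}{\alpha\omega}\Big) \, dF_n(\omega)
\end{aligned}$$
by the Fubini theorem. Since $|\sin x| \le |x|$ for all $x \in \mathbb{R}$, the last integrand is nonnegative, and it is at least $1/2$ when $|\alpha\omega| \ge 2$. We then deduce that for all $\varepsilon > 0$, there exists $\alpha > 0$ such that, for $n \ge N$:
$$4 \varepsilon \ge 2 \int_{\{|\omega| \ge 2\alpha^{-1}\}} \Big(1 - \frac{\sin(\alpha\omega)}{\alpha\omega}\Big) \, dF_n(\omega) \ge \int_{\{|\omega| \ge 2\alpha^{-1}\}} dF_n(\omega).$$
The finitely many remaining indices $n < N$ are handled by enlarging the bound $2\alpha^{-1}$ if necessary, proving (B.12).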
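The inner integral in the Fubini step has the closed form $\frac{1}{\alpha} \int_{-\alpha}^{\alpha} (1 - e^{iu\omega}) \, du = 2 \big(1 - \frac{\sin(\alpha\omega)}{\alpha\omega}\big)$. The following sketch compares the two sides numerically with a midpoint rule; the function names `lhs` and `rhs` and the number of quadrature steps are our own choices, not part of the proof.

```python
import cmath
import math

def lhs(alpha, omega, steps=20000):
    # Midpoint rule for (1/alpha) * integral over [-alpha, alpha] of (1 - exp(i*u*omega)) du
    h = 2 * alpha / steps
    total = 0j
    for k in range(steps):
        u = -alpha + (k + 0.5) * h
        total += 1 - cmath.exp(1j * u * omega)
    return total * h / alpha

def rhs(alpha, omega):
    # Closed form 2 * (1 - sin(alpha*omega) / (alpha*omega)), for alpha*omega != 0
    x = alpha * omega
    return 2 * (1 - math.sin(x) / x)
```

The closed form also makes the tightness estimate visible: `rhs(alpha, omega)` is at least 1 whenever `abs(alpha * omega) >= 2`, since then `sin(x) / x` is at most 1/2.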
2- Since $(F_n)_n$ is tight, we deduce from Lemma B.27 that $F_{n_k} \to F$ in distribution along a subsequence $(n_k)_k$, where $F$ is a distribution function. By the definition of convergence in distribution, the corresponding characteristic functions also converge: $\Phi_{n_k} \to \Phi_F$. Hence $\Phi = \Phi_F$.
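3- It remains to show that $F_n \to F$ in distribution. Suppose to the contrary that there exists a continuity point $x$ of $F$ such that $F_n(x) \not\to F(x)$. Then there exist $\varepsilon > 0$ and a subsequence $(n_k)_k$ such that
$$F(x-) = F(x) \quad\text{and}\quad |F_{n_k}(x) - F(x)| \ge \varepsilon \quad\text{for all } k. \tag{B.13}$$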
Since $(F_{n_k})_k$ is tight by Step 1, we have $F_{n_{k_j}} \to \tilde F$ in distribution along a subsequence $(n_{k_j})_j$, where $\tilde F$ is a distribution function. Arguing as in the previous step, we see that $\Phi_{n_{k_j}} \to \Phi_{\tilde F} = \Phi = \Phi_F$, and we deduce that $\tilde F = F$ by injectivity. Thus $F_{n_{k_j}} \to F$ in distribution, contradicting (B.13).
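B.5 Independence
B.5.1 Independent $\sigma$-algebras
Let $(\Omega, \mathcal{A}, P)$ be a probability space, and $(\mathcal{A}_n)_n \subset \mathcal{A}$ a sequence of $\sigma$-algebras. We say that the $(\mathcal{A}_n)_n$ are independent (under $P$) if for all integers $n \ge 1$ and $1 \le i_1 < \ldots < i_n$:
$$P\big[\cap_{k=1}^n A_{i_k}\big] = \prod_{k=1}^n P[A_{i_k}] \quad\text{for all } A_{i_k} \in \mathcal{A}_{i_k},\ 1 \le k \le n. \tag{B.14}$$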
Observe that the monotone convergence theorem shows that (B.14) is also valid for $n = \infty$, i.e.
$$P\big[\cap_{k \ge 1} A_{i_k}\big] = \prod_{k \ge 1} P[A_{i_k}] \quad\text{for all } A_{i_k} \in \mathcal{A}_{i_k},\ k \ge 1. \tag{B.15}$$
Starting from this general definition for $\sigma$-algebras, independence is extended to arbitrary subfamilies of $\mathcal{A}$ and to r.v.
Definition B.29. We say that the events $(A_n)_n \subset \mathcal{A}$ are independent if $(\sigma(A_n))_n$ are independent or, equivalently, if the r.v. $(1_{A_n})_n$ are independent.
In the preceding definition, there is no need to verify (B.14) for all possible choices in the $\sigma$-algebras $\sigma(A_n) = \{\emptyset, \Omega, A_n, A_n^c\}$. Indeed, one can easily show that it suffices to verify that
$$P\big[\cap_{k=1}^n A_{i_k}\big] = \prod_{k=1}^n P[A_{i_k}] \quad\text{for } n \ge 1 \text{ and } 1 \le i_1 < \ldots < i_n.$$
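The product rule must hold for every finite subfamily, not only for pairs. A classical example, which is not in the text, uses two fair coin tosses: the events below satisfy the rule pairwise but not jointly. The sketch enumerates the four outcomes exactly; all names are ours.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product("HT", repeat=2))  # four equally likely outcomes

def P(event):
    # Uniform probability on the four outcomes
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first toss is heads
B = {w for w in omega if w[1] == "H"}    # second toss is heads
C = {w for w in omega if w[0] == w[1]}   # both tosses agree
events = [A, B, C]

def product_rule_holds(family):
    # Check P[intersection of the family] == product of the P[E]
    inter = set(omega)
    prod = Fraction(1)
    for E in family:
        inter &= E
        prod *= P(E)
    return P(inter) == prod

pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
triple = product_rule_holds(events)
```

Here `pairwise` is true while `triple` is false, because $P[A \cap B \cap C] = 1/4$ whereas the product of the three probabilities is $1/8$.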
Here is a more general formulation of this reduction to generating events.
Lemma B.30. Let $(\mathcal{I}_n)_n \subset \mathcal{A}$ be a sequence of $\pi$-systems. Then the sub-$\sigma$-algebras $(\sigma(\mathcal{I}_n))_n$ are independent if and only if (B.14) holds for the events of the $\mathcal{I}_n$, i.e. if for all integers $n \ge 1$ and $1 \le i_1 < \ldots < i_n$, we have:
$$P\big[\cap_{k=1}^n I_{i_k}\big] = \prod_{k=1}^n P[I_{i_k}] \quad\text{for all } I_{i_k} \in \mathcal{I}_{i_k},\ 1 \le k \le n.$$
Proof. It suffices to verify the result for two $\pi$-systems $\mathcal{I}_1$, $\mathcal{I}_2$. Fix an event $I_1 \in \mathcal{I}_1$, and introduce the maps from $\sigma(\mathcal{I}_2)$ to $[0, P[I_1]]$ defined by $\mu(I_2) := P(I_1 \cap I_2)$ and $\nu(I_2) := P(I_1) P(I_2)$. It is clear that $\mu$ and $\nu$ are measures on $\sigma(\mathcal{I}_2)$ which agree on the $\pi$-system $\mathcal{I}_2$. They are therefore equal on $\sigma(\mathcal{I}_2)$ by Proposition A.5. It now suffices to recall that $I_1 \in \mathcal{I}_1$ is arbitrary, and to repeat exactly the same argument with the roles of $\mathcal{I}_1$ and $\mathcal{I}_2$ exchanged.

B.5.2 Independent random variables
Definition B.31. We say that the r.v. $(X_n)_n$ are independent if the corresponding sub-$\sigma$-algebras $(\sigma(X_n))_n$ are independent.
A direct application of Lemma B.30 and the Fubini theorem yields the following criterion for the independence of r.v.
Proposition B.32. The r.v. $(X_n)_n$ are independent if and only if, for all $n \ge 1$ and $1 \le i_1 < \ldots < i_n$, one of the following assertions holds:
(a) $P[X_{i_k} \le x_k \text{ for } 1 \le k \le n] = \prod_{k=1}^n P[X_{i_k} \le x_k]$ for all $x_1, \ldots, x_n \in \mathbb{R}$,
(b) $E\big[\prod_{k=1}^n f_{i_k}(X_{i_k})\big] = \prod_{k=1}^n E[f_{i_k}(X_{i_k})]$ for all bounded measurable functions $f_{i_k} : \mathbb{R} \to \mathbb{R}$, $1 \le k \le n$,
(c) $\mathcal{L}_{(X_{i_1}, \ldots, X_{i_n})} = \mathcal{L}_{X_{i_1}} \otimes \cdots \otimes \mathcal{L}_{X_{i_n}}$.
Exercise B.33. Prove Proposition B.32.
Remark B.34. If $X$, $Y$ are two independent real r.v., Proposition B.32 implies that the characteristic function of the pair factorizes:
$$\Phi_{(X,Y)}(u, v) = \Phi_X(u) \Phi_Y(v) \quad\text{for all } u, v \in \mathbb{R}.$$
Remark B.35. Let $X$, $Y$ be two independent integrable real r.v. (square integrable for the variance identity). Then, by Proposition B.32, we have
$$E[XY] = E[X]E[Y], \quad \mathrm{Cov}[X, Y] = 0 \quad\text{and}\quad V[X + Y] = V[X] + V[Y].$$
Observe that a vanishing covariance does not imply independence in general. In the very particular case where the pair $(X, Y)$ is a Gaussian vector, however, independence is equivalent to a vanishing covariance.
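A standard example of uncorrelated but dependent r.v., not taken from the text, is $X$ uniform on $\{-1, 0, 1\}$ and $Y := X^2$. The exact computation below is a sketch with our own variable names.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X**2
law_X = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

def E(f):
    # Expectation of f(X) under the law of X
    return sum(p * f(x) for x, p in law_X.items())

EX = E(lambda x: x)
EY = E(lambda x: x**2)
EXY = E(lambda x: x * x**2)
cov = EXY - EX * EY

# Independence would require P[X = 0, Y = 0] = P[X = 0] * P[Y = 0];
# here {X = 0} and {Y = 0} are the same event.
p_joint = law_X[0]
p_product = law_X[0] * law_X[0]
```

The covariance vanishes, but `p_joint` is 1/3 while `p_product` is 1/9, so criterion (B.14) fails for the events $\{X = 0\}$ and $\{Y = 0\}$.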
If the $(X_n)_n$ are independent r.v. with densities, then we deduce from assertion (a) above that the random vector $(X_{i_1}, \ldots, X_{i_n})$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$, with density
$$f_{(X_{i_1}, \ldots, X_{i_n})}(x_1, \ldots, x_n) := f_{X_{i_1}}(x_1) \cdots f_{X_{i_n}}(x_n). \tag{B.16}$$
Conversely, if the random vector $(X_{i_1}, \ldots, X_{i_n})$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$ with a separable density, as in (B.16), $f_{(X_{i_1}, \ldots, X_{i_n})}(x_1, \ldots, x_n) = \varphi_1(x_1) \cdots \varphi_n(x_n)$, then the r.v. $X_{i_k}$ are independent with densities $f_{X_{i_k}} = \varphi_k$ (after normalizing each $\varphi_k$ to integrate to 1).
B.5.3 Asymptotics of sequences of independent events
The following result plays a central role in probability theory. Note right away that part (i) restates the result established more generally for measures in Lemma A.14.
Lemma B.36. (Borel-Cantelli) Let $(A_n)_n$ be a sequence of events of a probability space $(\Omega, \mathcal{A}, P)$.
(i) If $\sum_n P[A_n] < \infty$, then $P[\limsup_n A_n] = 0$.
(ii) If $\sum_n P[A_n] = \infty$ and $(A_n)_n$ are independent, then $P[\limsup_n A_n] = 1$.
(iii) If $(A_n)_n$ are independent, then either $\limsup_n A_n$ is negligible, or $(\limsup_n A_n)^c$ is negligible.
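Proof. Part (i) is Lemma A.14, and (iii) follows from (i) and (ii), so it remains to prove (ii). By the definition of independence and (B.15), we have
$$P\big[\cap_{m \ge n} A_m^c\big] = \prod_{m \ge n} (1 - P[A_m]) \le \prod_{m \ge n} e^{-P[A_m]} = e^{-\sum_{m \ge n} P[A_m]} = 0.$$
Thus, for all $n \ge 1$, the event $\cap_{m \ge n} A_m^c$ is negligible. The countable union of negligible sets $(\limsup_n A_n)^c = \cup_{n \ge 1} \cap_{m \ge n} A_m^c$ is then negligible.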
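The product estimate in this proof can be followed on a concrete case. The sketch below assumes independent events with $P[A_m] = 1/m$, an example of our choosing for which the series diverges, and compares the exact finite product with the exponential bound; the function names are hypothetical.

```python
import math
from fractions import Fraction

def prob_none(n, M):
    # P[no A_m occurs for n <= m <= M] for independent events with P[A_m] = 1/m
    prod = Fraction(1)
    for m in range(n, M + 1):
        prod *= 1 - Fraction(1, m)
    return prod

def exp_bound(n, M):
    # Upper bound exp(-sum_{m=n}^{M} P[A_m]) coming from 1 - p <= exp(-p)
    return math.exp(-sum(1 / m for m in range(n, M + 1)))
```

The product telescopes to $(n-1)/M$, which tends to 0 as $M \to \infty$ for every fixed $n$, in line with the divergence of the harmonic series.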
The following result is quite striking, and is a consequence of the Borel-Cantelli lemma.
Theorem B.37. Let $(X_n)_n$ be a sequence of independent r.v., and $\mathcal{T} := \cap_n \sigma(X_m, m > n)$ the associated tail $\sigma$-algebra. Then $\mathcal{T}$ is trivial, that is:
(i) For every event $A \in \mathcal{T}$, we have $P[A] P[A^c] = 0$.
(ii) Every $\mathcal{T}$-measurable r.v. is deterministic a.s.
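Proof. (i) From the independence of the $(X_n)_n$, we deduce that for all $n \ge 1$, the $\sigma$-algebras $\mathcal{A}_n := \sigma(X_1, \ldots, X_n)$ and $\mathcal{T}_n := \sigma(X_m, m > n)$ are independent. Since $\mathcal{T} \subset \mathcal{T}_n$, we see that $\mathcal{A}_n$ and $\mathcal{T}$ are independent, and consequently $\cup_n \mathcal{A}_n$ and $\mathcal{T}$ are independent. Observing that $\cup_n \mathcal{A}_n$ is a $\pi$-system, we deduce from Lemma B.30 that $\mathcal{A}_\infty := \sigma(\cup_n \mathcal{A}_n)$ and $\mathcal{T}$ are independent. Now $\mathcal{T} \subset \mathcal{A}_\infty$, so the independence between $\mathcal{T}$ and $\mathcal{A}_\infty$ implies that $\mathcal{T}$ is independent of itself, and for all $A \in \mathcal{T}$, $P[A] = P[A \cap A] = P[A]^2$.
(ii) Let $\xi$ be a $\mathcal{T}$-measurable r.v. For all $x \in \mathbb{R}$, $P[\xi \le x] \in \{0, 1\}$ by (i). Let $c := \sup\{x : P[\xi \le x] = 0\}$. If $c = -\infty$ or $c = +\infty$, we see immediately that $\xi = c$ (deterministic) a.s. If $|c| < \infty$, the definition of $c$ implies that $P[\xi \le c - \varepsilon] = P[\xi > c + \varepsilon] = 0$ for all $\varepsilon > 0$. Then $1 \ge E[1_{]c - \varepsilon, c + \varepsilon]}(\xi)] = P[c - \varepsilon < \xi \le c + \varepsilon] = 1$, i.e. $1_{]c - \varepsilon, c + \varepsilon]}(\xi) = 1$ a.s., and we complete the proof by sending $\varepsilon$ to 0.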
The tail $\sigma$-algebra introduced in Theorem B.37 contains many interesting events, for example
$$\Big\{\lim_n X_n \text{ exists}\Big\}, \quad \Big\{\sum_n X_n \text{ converges}\Big\}, \quad \Big\{\lim_n \frac{1}{n} \sum_{i=1}^n X_i \text{ exists}\Big\}.$$
Examples of $\mathcal{T}$-measurable r.v. are $\limsup_n X_n$, $\liminf_n \frac{1}{n} \sum_{i=1}^n X_i$, ...
B.5.4 Asymptotics of averages of independent r.v.
In this section, we manipulate sequences of independent and identically distributed r.v., abbreviated iid.
We start by stating the law of large numbers for sequences of integrable iid r.v.; a proof by the martingale approach is available in the lecture notes of MAP 432 [40]. Then, we prove the central limit theorem.
Theorem B.38. (Strong law of large numbers) Let $(X_n)_n$ be a sequence of integrable iid r.v. Then
$$\frac{1}{n} \sum_{i=1}^n X_i \to E[X_1] \quad\text{a.s.}$$
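The statement can be illustrated by a small simulation. The sketch below assumes iid Uniform(0, 1) draws, so that $E[X_1] = 1/2$; the function name `empirical_mean` and the seed are our choices, and a single simulated path is only an illustration, not a verification of the almost sure statement.

```python
import random

def empirical_mean(n, seed=0):
    # Average of n iid Uniform(0, 1) draws; the fixed seed makes the run reproducible
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n
```

For example, `empirical_mean(200000)` should lie close to 0.5; the size of the typical deviation is quantified by the central limit theorem below.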
If the iid r.v. are square integrable, the central limit theorem gives precise information on the rate of convergence of the empirical mean to the expectation, or theoretical mean.
Theorem B.39. Let $(X_n)_n$ be a sequence of square integrable iid r.v. Then
$$\sqrt{n} \Big(\frac{1}{n} \sum_{i=1}^n X_i - E[X_1]\Big) \to \mathcal{N}(0, V[X_1]) \quad\text{in distribution},$$
where $\mathcal{N}(0, V[X_1])$ denotes the centered normal distribution with variance $V[X_1]$.
Proof. Denote $\bar X_i := X_i - E[X_1]$ and $G_n := \sqrt{n} \, \frac{1}{n} \sum_{i=1}^n \bar X_i$. Using the properties of the characteristic function for the iid variables $(X_i)_i$, we obtain
$$\Phi_{G_n}(u) = \Phi_{\sum_{i=1}^n \bar X_i / \sqrt{n}}(u) = \prod_{i=1}^n \Phi_{\bar X_i / \sqrt{n}}(u) = \Big(\Phi_{\bar X_1}\Big(\frac{u}{\sqrt{n}}\Big)\Big)^n.$$
By 2.7 and the fact that $E[\bar X_1] = 0$ and $E[\bar X_1^2] = V[X_1] < \infty$, we can write the following second-order expansion:
$$\Phi_{G_n}(u) = \Big(1 - \frac{1}{2} \frac{u^2}{n} V[X_1] + o\Big(\frac{1}{n}\Big)\Big)^n \to \Phi(u) := e^{-\frac{u^2}{2} V[X_1]}.$$
We then recognize that $\Phi = \Phi_{\mathcal{N}(0, V[X_1])}$, see question 1 of Exercise B.9, and we conclude by Lévy's convergence theorem (Theorem B.28).
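The convergence of characteristic functions used in this proof can be observed on an explicit case. The sketch below assumes symmetric $\pm 1$ steps (Rademacher variables, our choice of example), for which $E[X_1] = 0$, $V[X_1] = 1$ and $\Phi_{X_1}(u) = \cos u$, so that $\Phi_{G_n}(u) = \cos(u/\sqrt{n})^n$.

```python
import math

def phi_Gn(u, n):
    # Characteristic function of G_n for Rademacher steps: (cos(u / sqrt(n)))**n
    return math.cos(u / math.sqrt(n)) ** n

def phi_limit(u):
    # Characteristic function of N(0, 1): exp(-u**2 / 2)
    return math.exp(-u * u / 2)

# Distance to the Gaussian limit at u = 1.5 for increasing n
errors = [abs(phi_Gn(1.5, n) - phi_limit(1.5)) for n in (10, 100, 1000, 10000)]
```

The errors decrease roughly like $1/n$, which matches the next term in the expansion of $\log \cos$.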
Bibliography
[1] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance, (1999).
[2] J.-P. Aubin (1975), L'analyse non linéaire et ses motivations économiques, Masson.
[3] Basel Committee on Banking Supervision, International convergence of capital measurement and capital standards, Bank of International Settlements, 1988.
[4] Basel Committee on Banking Supervision, Amendment to the capital accord to incorporate market risks, Bank of International Settlements, 1996.
[5] Basel Committee on Banking Supervision, Basel II: International convergence of capital measurement and capital standards: A revised framework, Bank of International Settlements, 2005.
[6] Black F. and Scholes M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy, 81, 637-654.
[7] H. Berestycki, J. Busca, and I. Florent, Asymptotics and calibration of local volatility models, Quant. Finance, 2 (2002), pp. 61-69.
[8] Carr P., Ellis K. and Gupta V. (1998), Static Hedging of Exotic Options, Journal of Finance, June 1998, pp. 1165-90.
[9] R. Cont, Model uncertainty and its impact on the pricing of derivative instruments, Mathematical Finance, 16 (2006), pp. 519-542.
[10] Conze A. and Viswanathan R. (1991), Path dependent options: the case of lookback options, Journal of Finance, 46, 1893-1907.
[11] J. C. Cox, The constant elasticity of variance option pricing model, Journal of Portfolio Management, 22 (1996), pp. 15-17.
[12] M. Crouhy, D. Galai, and R. Mark, The Essentials of Risk Management, McGraw-Hill, 2005.
[13] Cox J.C., Ross S.A. and Rubinstein M. (1979), Option pricing: a simplified approach, Journal of Financial Economics, 7, 229-263.
[14] S. Crépey, Calibration of the local volatility in a trinomial tree using Tikhonov regularization, Inverse Problems, 19 (2003), pp. 91-127.
[15] Dana R.A. and Jeanblanc-Piqué M. (1998), Marchés financiers en temps continu, valorisation et équilibre, 2ème édition, Economica.
[16] F. Delbaen and H. Shirakawa, A note on option pricing for the constant elasticity of variance model, Asia-Pacific Financial Markets, 9 (2002), pp. 85-99.
[17] Dellacherie C. and Meyer P.A. (1975), Probabilités et potentiel, Hermann.
[18] Demange G. and Rochet J.C. (1992), Méthodes mathématiques de la finance, Economica.
[19] Duffie D. (1996), Dynamic Asset Pricing Theory, 2nd edition, Princeton University Press, Princeton (New Jersey).
[20] B. Dupire, Pricing with a smile, RISK, 7 (1994), pp. 18-20.
[21] N. El Karoui, Couverture des risques dans les marchés financiers. Lecture notes for master Probability and Finance, Paris VI university.
[22] El Karoui N. and Gobet E. (2005), Modèles Stochastique en Finance, Polycopié de cours, Ecole Polytechnique, Département de Mathématiques Appliquées.
[23] El Karoui N. and Rochet J.C. (1989), A pricing formula for options on zero-coupon bonds. Preprint.
[24] P. S. Hagan and D. E. Woodward, Equivalent black volatilities. Research report, 1998.
[25] Harrison J.M. and Kreps D. (1979), Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory, 20, 381-408.
[26] Harrison J.M. and Pliska S. (1981), Martingales and stochastic integrals in the theory of continuous trading: complete markets, Stochastic Processes and their Applications, 11, 215-260.
[27] J. Jacod and P. Protter (2002), Probability Essentials, 2nd Edition. Universitext, Springer-Verlag.
[28] Jacod, J. and Shiryaev, A.N. (1998), Local martingales and the fundamental asset pricing theorems in the discrete-time case. Finance and Stochastics 2, 259-273.
[29] Karatzas, I. and Shreve, S. (1991), Brownian Motion and Stochastic Calculus, Springer Verlag.
[30] R. Lagnado and S. Osher, A technique for calibrating derivative security pricing models: numerical solution of the inverse problem, J. Comput. Finance, 1 (1997).
[31] Lamberton D. and Lapeyre B. (1997), Introduction au calcul stochastique appliqué à la finance, 2ème édition, Ellipses.
[32] Musiela M. and Rutkowski M. (1997), Martingale Methods in Financial Modelling, Springer.
[33] Pliska S. (1997), Introduction to Mathematical Finance, Blackwell, Oxford.
[34] P. Protter, Stochastic integration and differential equations, Springer, Berlin, 1990.
[35] Revuz D. and Yor M. (1991), Continuous Martingales and Brownian Motion, Springer Verlag.
[36] Schachermayer W. (1992), A Hilbert space proof of the fundamental theorem of asset pricing in finite discrete-time, Insurance: Mathematics and Economics, 11, 291-301.
[37] W. Schoutens, E. Simons, and J. Tistaert, A perfect calibration! Now what?, Wilmott Magazine, March (2004).
[38] A. Shiryaev (1989), Probability, 2nd Edition. Graduate Texts in Mathematics 95, Springer-Verlag.
[39] Shiryaev A. (1998), Essentials of Stochastic Finance: Facts, Models and Theory. World Scientific.
[40] Touzi, N. (2009), Chaînes de Markov et martingales en temps discret, Polycopié MAP432, Ecole Polytechnique, Paris.
[41] Williams D. (1991), Probability with martingales, Cambridge Mathematical Textbooks.
[42] Yamada T. and Watanabe S. (1971), On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ. 11, pp. 155-167.
