Useful textbooks
There are a huge number of books on financial derivatives. Here is a selection, worth consulting
for background reading. The numbers in square brackets refer to the bibliography at the end of
the notes.
Steven E. Shreve, Stochastic calculus for finance I: The binomial asset pricing model,
Springer 2004 [16]
(A superb probabilistic account of the binomial model.)
Steven E. Shreve, Stochastic calculus for finance II: Continuous-time models, Springer
2004 [17]
(A superb first text on stochastic calculus for finance with many examples.)
Alison Etheridge, A course in financial calculus, CUP 2002 [6]
(An excellent primer on stochastic calculus for finance.)
Paul Wilmott, Sam Howison and Jeff Dewynne, The mathematics of financial derivatives:
A student introduction, CUP 1995 [19]
(A decent first text on the PDE aspects of the subject.)
Tomas Björk, Arbitrage theory in continuous time, 3rd Ed., OUP 2009 [2]
(A good all-round text which covers many topics outside the scope of the course.)
Nick H. Bingham and Ruediger Kiesel, Risk-neutral valuation: Pricing and hedging of
financial derivatives, 2nd Ed., Springer 2004 [1]
(A decent all-round text.)
Douglas Kennedy, Stochastic financial models, CRC Press 2010 [13]
(A good text based on a Cambridge Part III course, with a different emphasis, focusing a
little more on portfolio optimisation as opposed to derivative security valuation.)
Hugo D. Junghenn, Option valuation: A first course in financial mathematics, CRC Press
2012 [11]
(A good recent text at about the same level as the course.)
John C. Hull, Options, futures and other derivatives, 8th Ed., Pearson 2011 [9]
(A bestseller that has a more financial as opposed to mathematical bias, and was one of
the first textbooks on the subject, becoming a mainstay of many trading rooms.)
Jean Jacod and Philip Protter, Probability essentials, Springer 2003 [10]
(An excellent text on measure-theoretic probability, good for background.)
Geoffrey R. Grimmett and David R. Stirzaker, Probability and random processes, 3rd Ed.,
OUP 2001 [7]
(An excellent and encyclopedic background probability text.)
The lecture notes
Date: March 12, 2014.
MICHAEL MONOYIOS
These notes contain the core material and more, in somewhat more detail than we will be able
to cover in lectures. Material marked with an asterisk is not examinable. Some probability
theory underlying conditional expectation and martingales is contained in the Appendix, for
those who wish to brush up on some probabilistic material. The material in the Appendix is not
examinable. The notes are no substitute for attending lectures. Some topics might be covered in
a little more or less detail than in these notes.
Regarding the mathematical material that you will need to be fluent in, here is some guidance.
You are expected to become familiar with the use of some probabilistic terminology (σ-algebras,
filtrations, random variables that are measurable with respect to a σ-algebra). You are expected
to have some familiarity with the properties of conditional expectation (but will not be examined
on proofs of these) and martingales, and to be able to use them.
You are expected to know the defining properties of a stochastic process W = (W_t)_{t≥0} known as
Brownian motion (BM), and to understand how these lead to the fact that its quadratic variation
(QV) process [W] is equal to the time elapsed: [W]_t = t. You are expected to know Lévy's
criterion: any continuous martingale M satisfying [M]_t = t is a BM (and to be able to sketch the
proof in the one-dimensional case).
You should be able to use the properties of Brownian motion (such as its independent Gaussian
increments property and its quadratic variation property). You should have some appreciation
of how the properties of BM lead to the properties of the Itô integral (such as the martingale
property and the Itô isometry) for elementary integrands, and you are required to know (but not
to prove) that these properties extend to the Itô integral for general integrands. You are not
required to know the theory of the construction of the Itô integral for general integrands.
You are expected to have an appreciation of how the quadratic variation property of BM leads to
the Itô formula and to properties of the Itô integral (that is, stochastic calculus). You are expected
to be able to use the Itô formula (both the one-dimensional and multi-dimensional versions)
fluently. You are expected to understand (and prove, using the Itô formula) the connection
between PDEs and stochastic calculus, in the form of the Feynman-Kac theorem.
Contents
Useful textbooks
The lecture notes
1. Introduction to financial derivatives
1.1. Underlying assets
1.2. Interest rates and time value of money
1.3. Forward and futures contracts
1.4. Arbitrage
1.5. Options
1.6. Some history*
2. Coin-toss space: a finite probability space
3. The binomial stock price process
4. Conditional expectation, martingales, equivalent measures
4.1. Conditional expectation
4.2. Properties of conditional expectation
4.3. Martingales
4.4. Equivalent measures
5. Contingent claim valuation in the binomial model
5.1. Equivalent martingale measures and no arbitrage
5.2. Valuation by replication in the binomial model
5.3. Completeness of the multiperiod binomial model
6. American options in the binomial model
6.1. Value of hedging portfolio for an American option*
7. Brownian motion
1.1. Underlying assets. Typical assets which are traded in financial markets, and which can
be the underlying assets for a derivative contract, include:
shares (stocks)
commodities (metals, oil, other physical products)
currencies
bonds (assets used as borrowing tools by governments and companies) which pay fixed
amounts at regular intervals to the bond holder.
An agent who holds an asset will be said to hold a long position in the asset, or to be long in
the asset.
An agent who has sold an asset will be said to hold a short position in the asset, or to be short
in the asset.
For the most part in this course, we will focus on derivative securities which have a stock as
underlying asset. The stock price will be a stochastic process denoted by S = (S_t)_{0≤t≤T} on a
filtered probability space (Ω, F, F := (F_t)_{0≤t≤T}, P). This means that for each t ∈ [0, T], S_t, the
value of the stock at time t, is a random variable that is measurable with respect to the σ-algebra
F_t (this means that the information represented by the σ-algebra F_t is enough to know the value
of the stock price at that time). When S_t is F_t-measurable for all t ∈ [0, T] we say that S is a
process that is adapted to the filtration F. We shall see later that this means the following: each
F_t is a collection of subsets of a set Ω (the sample space), closed under complements and under
countable unions, and with F_s ⊆ F_t for s < t (such an increasing sequence of σ-algebras is called
a filtration, and will represent increasing information as time evolves). Each S_t is a function from
Ω to R_+ with the property that sets of the form
{ω ∈ Ω : S_t(ω) ∈ A}, for A ⊆ R_+,
lie in F_t. This is what we mean by saying that S_t is an F_t-measurable random variable. The
adaptedness property (S_t is F_t-measurable for each t ∈ [0, T]) is tantamount to the idea that the
information available at time t is sufficient to know the value of S_t (that is, if you observe the
stock market up to time t, you will know the current value of the stock price).
1.2. Interest rates and time value of money. Let us measure time in some convenient units,
say years. If an interest rate r is quoted per annum with compounding frequency at time
intervals Δt, this means that an amount C invested for a time period Δt will grow to C(1 + rΔt).
If this is re-invested for another period Δt, the balance becomes C(1 + rΔt)^2, and so on. So after n
periods, with t := nΔt, we have C(1 + rΔt)^n = C(1 + rt/n)^n. A continuously compounded interest
rate corresponds to the limit n → ∞, or Δt → 0. In this case, after time t an amount C will grow
to
lim_{n→∞} C(1 + rt/n)^n = Ce^{rt}.
So an amount C invested at time zero for a time t will grow to an amount Ce^{rt}, where r is the
continuously compounded risk-free interest rate. We call the amount Ce^{rt} the future value of C
invested at time zero, and the factor e^{rt} is called an accumulation factor.
By the same token, receiving an amount C at time t is equivalent to receiving Ce^{−rt} at time
zero. We call Ce^{−rt} the present value of C received at time t (we say that C is discounted to the
present) and the factor e^{−rt} is called a discount factor.
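As a quick numerical check of this limit, here is a short Python sketch (the function name `future_value` is my own):

```python
import math

def future_value(C, r, t, n=None):
    """Future value of C invested at annual rate r for t years.
    n = number of compounding periods over [0, t]; n=None means
    continuous compounding, giving C * e^{rt}."""
    if n is None:
        return C * math.exp(r * t)
    return C * (1 + r * t / n) ** n

C, r, t = 100.0, 0.05, 2.0
for n in (1, 12, 365, 10**6):
    print(n, future_value(C, r, t, n))
print("continuous:", future_value(C, r, t))

# Present value (discounting): C e^{-rt}
print("PV of", C, "received at time t:", C * math.exp(-r * t))
```

As n grows, the discretely compounded value converges to Ce^{rt}.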
It is usually convenient (but nothing more) to assume that interest is continuously compounded.
We do not need to assert that, in reality, interest is continuously compounded, in order to use a
continuously compounded interest rate in all our analysis. If the interest is actually compounded
m times a year at an interest rate of R per annum, then we can still use a continuously compounded
interest rate r simply by making the identification
(1.1)    C(1 + R/m)^{mt} = C e^{rt},
so that there is a one-to-one correspondence between the interest rate R (compounded m times
per annum) and the continuously compounded interest rate r. In this course, we will generally use
continuously compounded rates when considering continuous time models (but will not necessarily
do so when using discrete-time models such as the binomial model).
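Taking logarithms in (1.1) gives r = m log(1 + R/m). A small sketch of this correspondence (the function name and numerical values are my own, for illustration):

```python
import math

def continuous_rate(R, m):
    """Continuously compounded rate r equivalent to a rate R compounded
    m times per annum, obtained by taking logs in (1.1):
    r = m * log(1 + R/m)."""
    return m * math.log(1 + R / m)

R, m, C, t = 0.06, 4, 1000.0, 3.0   # illustrative values
r = continuous_rate(R, m)
# The two accumulation factors coincide:
print(C * (1 + R / m) ** (m * t))
print(C * math.exp(r * t))
```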
A differential version of the above arguments is as follows. In continuous time, we model the
time evolution of cash in a bank account in terms of a riskless asset which we shall call a money
market account, which is the value at time t > 0 of $1 invested at time zero and continuously
earning interest which is reinvested. We shall often denote the value of this asset at time t by
S^{(0)}_t, which satisfies
(1.2)    dS^{(0)}_t = r S^{(0)}_t dt,    S^{(0)}_0 = 1,
where r is the (assumed constant) interest rate. Then the value of the bank account at time t is
given by S^{(0)}_t = e^{rt}, which we see is the accumulation factor we encountered above.
A more complex model could assume that interest rates are time-varying (possibly stochastic).
In this case the money market account would satisfy
(1.3)    dS^{(0)}_t = r_t S^{(0)}_t dt,    S^{(0)}_0 = 1,
where r_t is the instantaneous (or short term) interest rate. We have allowed for this to be
time-varying, and r_t represents the interest rate in the time interval [t, t + dt). From (1.3) we see
that
(1.4)    S^{(0)}_t = exp( ∫_0^t r_u du ),
and this is the accumulation factor in this case. This is the factor by which $1 invested at time
zero grows by time t, when the interest generated is continually reinvested.
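A numerical sketch of (1.3)-(1.4): an Euler scheme for the money market ODE should agree with the exponential of the integrated rate. The deterministic rate function chosen below is purely illustrative:

```python
import math

def accumulation_factor(r, t, steps=100_000):
    """Integrate dS_t = r(t) S_t dt, S_0 = 1, by an Euler scheme, and
    compare with exp(integral of r over [0, t]) computed by the
    trapezoidal rule. r is any deterministic rate function."""
    dt = t / steps
    S = 1.0
    integral = 0.0
    for k in range(steps):
        u = k * dt
        S += r(u) * S * dt                      # Euler step for (1.3)
        integral += 0.5 * (r(u) + r(u + dt)) * dt
    return S, math.exp(integral)                 # the latter is (1.4)

# Example: a time-varying short rate (my own choice)
r = lambda u: 0.03 + 0.01 * u
S_euler, S_exact = accumulation_factor(r, 2.0)
print(S_euler, S_exact)
```

The two values agree to the discretisation error of the Euler scheme.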
1.3. Forward and futures contracts.
Definition 1.4 (Forward contract). A forward contract obliges its holder to buy an underlying
asset (a stock, say) at some future time T (the maturity time) for a price K (the delivery price)
that is fixed at contract initiation. Hence, at time T, when the stock price is S_T, the contract is
worth S_T − K (the payoff of the forward) to the holder. This payoff is shown in Figure 1.
account (the holder of a futures contract receives the change in value of the futures price after
each day, for each contract held). One also has to maintain the balance in the margin account at
some minimum value (the maintenance margin), and receives a so-called margin call (a demand
to top-up the margin account) if the balance in the margin account falls below the maintenance
margin. This mechanism is designed to remove the risk of default from the market, and hence
futures markets are very liquid. See Hull [9] for detailed descriptions of the workings of futures
exchanges.
1.3.1. Valuation of forward contracts. In what follows we value forward contracts on a
non-dividend-paying stock, that is, an asset with price process S = (S_t)_{0≤t≤T} that pays no income to
its holder. We shall assume a constant interest rate r ≥ 0.
Lemma 1.5. The value at time t ≤ T of a forward contract with delivery price K and maturity
T, on an asset with price process S = (S_t)_{0≤t≤T}, is f_{t,T} ≡ f(t, S_t) ≡ f(t, S_t; T) ≡ f(t, S_t; T, K),
given by
(1.5)    f_{t,T} = S_t − K e^{−r(T−t)},    0 ≤ t ≤ T.
Proof. This is a simple hedging argument which provides our first example of a riskless hedging
strategy. Start with zero wealth at time t T , and sell the contract at this time for some price
f_{t,T}. Hedge this sale by purchasing the asset for price S_t. This requires borrowing of S_t − f_{t,T}.
At time T, sell the asset for price K under the terms of the forward contract, and require that
this is enough to pay back the loan. Hence we must have
K = (S_t − f_{t,T}) e^{r(T−t)},
and the result follows.
Corollary 1.6. The forward price of the asset at time t ≤ T for delivery at T is F_{t,T} given by
F_{t,T} = S_t e^{r(T−t)},    0 ≤ t ≤ T.
Proof. Set f_{t,T} = 0 in (1.5) and then by definition we must have K = F_{t,T}.
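Lemma 1.5 and Corollary 1.6 can be checked in a few lines: in particular, a forward struck at the forward price F_{t,T} has zero initial value. The function names are my own:

```python
import math

def forward_value(S_t, K, r, tau):
    """f_{t,T} = S_t - K e^{-r * tau}, with tau = T - t (Lemma 1.5)."""
    return S_t - K * math.exp(-r * tau)

def forward_price(S_t, r, tau):
    """F_{t,T} = S_t e^{r * tau} (Corollary 1.6)."""
    return S_t * math.exp(r * tau)

S_t, r, tau = 100.0, 0.05, 0.5
F = forward_price(S_t, r, tau)
# A forward struck at the forward price has zero initial value:
print(forward_value(S_t, F, r, tau))
```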
1.4. Arbitrage. The simple argument above for valuing a forward contract is an example of
valuation by the principle of no arbitrage. If the relationship in Lemma 1.5 is violated, then
an elementary example of a riskless profit opportunity, called an arbitrage, ensues. Here is a
definition of arbitrage.
Definition 1.7 (Arbitrage). Let X = (X_t)_{0≤t≤T} denote the wealth process of a trading strategy.
An arbitrage over [0, T] is a strategy satisfying X_0 = 0, P[X_T ≥ 0] = 1 and P[X_T > 0] > 0.
So an arbitrage is guaranteed not to lose money and has a positive probability of making a
profit. If the valuation formula (1.5) for the forward contract is violated, an immediate arbitrage
opportunity occurs, as we now illustrate.
Suppose f_{t,T} > S_t − K e^{−r(T−t)}. Then one can short the forward contract and buy the
stock, by borrowing S_t − f_{t,T} at time t. At maturity, one sells the stock for K under the terms of
the forward contract and uses the proceeds to pay back the loan, yielding a profit of
K − (S_t − f_{t,T}) e^{r(T−t)} > 0.
This is an arbitrage. A symmetrical argument applies if f_{t,T} < S_t − K e^{−r(T−t)} (and you
should supply this).
The principle of riskless hedging and no arbitrage will also apply, rather less trivially, to the
valuation of options later in the course.
An equivalent way of looking at no arbitrage is sometimes called the law of one price. Two
portfolios which give the same payoff at T should have the same value at time t T . Let us show
MICHAEL MONOYIOS
how this applies to the valuation of a forward contract. Consider the following two portfolios at
time t T :
a long position in one forward contract,
a long position in the stock plus a short cash position of K e^{−r(T−t)}.
At time T, these are both worth S_T − K, so their values at time t ≤ T must be equal, yielding
f_{t,T} = S_t − K e^{−r(T−t)}, as before. Notice that the second portfolio perfectly replicates
(or perfectly hedges) the payoff of the forward contract, meaning that it reproduces the payoff
S_T − K. Denote the position in the stock that is needed to perfectly hedge a forward contract by
H^{(f)}_t. Then we have that H^{(f)}_t = 1 for all t ∈ [0, T], and note that
H^{(f)}_t = f_x(t, S_t) = 1,    0 ≤ t ≤ T,
where f(t, x) := x − K e^{−r(T−t)}. This is a simple example of a delta hedging rule, in which one
differentiates the pricing function of the derivative with respect to the variable representing the
underlying asset price, in order to get the hedging strategy. We will see a similar result when
valuing options.
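The delta hedging rule can be checked numerically: a central finite difference of f(t, x) = x − Ke^{−r(T−t)} in the x variable returns 1. Parameter values below are illustrative:

```python
import math

def f(t, x, K=100.0, r=0.05, T=1.0):
    """Forward pricing function f(t, x) = x - K e^{-r(T - t)}.
    Default parameter values are illustrative."""
    return x - K * math.exp(-r * (T - t))

# Delta by central finite difference: approximates f_x(t, S_t) = 1.
t, S_t, h = 0.3, 110.0, 1e-5
delta = (f(t, S_t + h) - f(t, S_t - h)) / (2 * h)
print(delta)
```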
1.4.1. Forward contract on a dividend-paying stock. The stock in the preceding analysis was assumed to pay no dividends. Now assume that the stock pays dividends as a continuous income
stream with dividend yield q. This means that in the interval [t, t + dt), the income received
by someone holding one share of the stock will be qS_t dt. In Problem Sheet 1 you are asked to
consider what happens to one's holding of shares if all such income is immediately re-invested in
more shares, to show that if the initial holding is n_0 at time zero, we have
n_t = n_0 e^{qt},    0 ≤ t ≤ T,
and then to use this result to value a forward contract on the dividend-paying stock, arriving at
the following.
Lemma 1.8. The value at time t ≤ T of a forward contract with delivery price K and maturity
T, on a stock with price process S = (S_t)_{0≤t≤T} paying dividends at a dividend yield q, is given by
(1.6)    f_{t,T} = S_t e^{−q(T−t)} − K e^{−r(T−t)},    0 ≤ t ≤ T.
1.5. Options. An option is a contract that gives the holder the right but not the obligation to
buy or sell an asset for some price that is defined in advance.
The two most basic option types are a European call and a European put.
Definition 1.11 (European call option). A European call option on a stock is a contract that
gives its holder the right (but not the obligation) to purchase the stock at some future time T
(the maturity time) for a price K (the strike price or exercise price) that is fixed at contract
initiation. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a call
option is (S_T − K)^+, as shown in Figure 2.
Definition 1.12 (European put option). A European put option on a stock is a contract that
entitles the holder to sell the underlying stock for a fixed price K, the strike price, at a future
time T. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a put option
is (K − S_T)^+, as shown in Figure 3.
[Figure 3: the put payoff (K − S_T)^+ as a function of S_T.]
Lemma 1.13 (Put-call parity). The European call and put prices c(t, S_t) and p(t, S_t) of options
with the same strike K and maturity T on a non-dividend paying traded stock with price S_t at
time t ∈ [0, T] are related by
c(t, S_t) − p(t, S_t) = S_t − K e^{−r(T−t)},    0 ≤ t ≤ T.
(2) S_T > K, in which case the call is exercised. The arbitrageur buys the stock for K to
close out the short position, using the proceeds from the bank account, which stand at
(S_t − c(t, S_t))e^{r(T−t)} prior to buying the stock. This leaves a profit of
(S_t − c(t, S_t))e^{r(T−t)} − K = e^{r(T−t)}(S_t − K e^{−r(T−t)} − c(t, S_t)) > 0.
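Put-call parity rests on the payoff identity (S_T − K)^+ − (K − S_T)^+ = S_T − K: a long call plus a short put replicates a forward payoff, and discounting that payoff gives the parity formula. A few lines of Python confirm the identity:

```python
def call_payoff(S_T, K):
    """European call payoff (S_T - K)^+."""
    return max(S_T - K, 0.0)

def put_payoff(S_T, K):
    """European put payoff (K - S_T)^+."""
    return max(K - S_T, 0.0)

# Long call + short put replicates the forward payoff S_T - K:
K = 100.0
for S_T in (60.0, 100.0, 137.5):
    assert call_payoff(S_T, K) - put_payoff(S_T, K) == S_T - K
```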
[Figure: model-independent bounds on the call price C as a function of the stock price S, between C = S and C = S − K exp(−rT).]
We can derive similar model-independent bounds on a put option price. A put option gives its
holder the right to receive an amount K for the stock, so the most it can be worth at maturity
is K. Hence
p(t, S_t) ≤ K e^{−r(T−t)},    0 ≤ t ≤ T.
Similarly, for the value of a put at expiry we have p(T, S_T) = (K − S_T)^+ ≥ K − S_T. That is, a
put option is at least as valuable as a short position in a forward contract. Hence we have the
lower bound
p(t, S_t) ≥ K e^{−r(T−t)} − S_t,    0 ≤ t ≤ T.
The results in this section are model-independent. To say more about option values we need a
model for the dynamic evolution of a stock price. One of the simplest continuous-time models is
the Black-Scholes-Merton (BSM) model, which we shall describe later, and one of the simplest
discrete-time models is the binomial model, which we shall also see shortly.
1.5.3. Combinations of options. Options can be combined to give a variety of payoffs for different
hedging purposes, or for speculation on movements in the underlying asset price, and they are
often used to do so because the option premiums are relatively small in some cases, thus proving
attractive to gamblers.
A straddle is a call and a put with the same strike and maturity. The payoff of a long position
in a straddle is
(1.7)    (S_T − K)^+ + (K − S_T)^+ = K − S_T if S_T < K, and S_T − K if S_T ≥ K.
This payoff is illustrated in Figure 5.
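Equation (1.7) says the straddle pays the absolute distance |S_T − K| of the final stock price from the strike, whichever way the market moves; a quick check:

```python
def straddle_payoff(S_T, K):
    """Payoff (S_T - K)^+ + (K - S_T)^+ of a long straddle, equation (1.7)."""
    return max(S_T - K, 0.0) + max(K - S_T, 0.0)

K = 100.0
for S_T in (80.0, 100.0, 125.0):
    # The straddle pays the absolute move |S_T - K| away from the strike.
    assert straddle_payoff(S_T, K) == abs(S_T - K)
```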
1.6. Some history*. As remarked earlier, the origins of derivatives lie in medieval agreements
between farmers and merchants to insure farmers against low crop prices.
[Figure 5: the straddle payoff as a function of S_T.]
In the 1860s the Chicago Board of Trade was founded to trade commodity futures (contracts
that set trading prices of commodities in advance), formalising the act of hedging against future
price changes of important products.
Options were first valued by Bachelier in 1900 in his PhD thesis, a translation of which can
be found in the book by Davis and Etheridge [4]. Bachelier introduced a stochastic process now
known as Brownian motion (BM) to model stock price movements in continuous time. Bachelier
did this before a rigorous treatment of BM was available in mathematics. His work was decades
ahead of its time, both mathematically and economically speaking, and was therefore not given
the credit it deserved at the time. In the decades that followed, mathematicians and physicists
(Einstein, Wiener, Lévy, Kolmogorov, Feller to name but a few) developed a rigorous theory of
Brownian motion, and Itô developed a rigorous theory of stochastic integration with respect to
Brownian motion, leading to the notion of a stochastic calculus, which we shall encounter. In the
1960s, economists re-discovered Bachelier's work, and this was one of the ingredients that led to
the modern theory of option valuation.
In the early 1970s a combination of forces existed which made markets more risky, derivatives
more prominent, and their valuation and trading possible. The system of fixed exchange rates
that existed before 1970 collapsed, and the Middle East oil crises caused a big increase in the
volatility of financial prices. This increased the demand for risk management products such as
options. At the same time Black and Scholes [3] and Merton [14] (BSM) published their seminal
work on how to price options, based on managing the risk associated with selling such an asset.
This breakthrough, for which Scholes and Merton received a Nobel Prize (Black having passed
away in 1995), coincided with the opening of the Chicago Board Options Exchange (CBOE), giving
individuals both a means to value option contracts and a marketplace where they could profit
from this knowledge of the fair price.
Following on from this, the financial deregulation of the 1980s, allied to technological developments which made it possible to trade securities globally and to run large portfolios of complex
products, caused a huge increase in risky trading across international borders. This opened up yet
more risks across currencies, interest rates and equities, and financial institutions very skillfully
(or opportunistically, perhaps) created markets to trade derivatives and to sell these products to
customers. This has led to the massive increase in derivative trading that we now see, with the
volume of derivative contracts traded now dwarfing that in the associated underlying assets.
The papers of Black-Scholes [3] and Merton [14] attracted mathematicians to the subject,
and led to a mathematically rigorous approach to valuing derivatives, based on probability and
martingale theory, inspired by Harrison and Pliska [8]. This led directly to modern financial
mathematics, and has also contributed to the advent of derivatives written on a plethora of
underlying stochastic reference entities, such as interest rates, weather indices, default events, as
well as on more traditional traded underlying securities such as stocks, currencies and interest
rates.
2. Coin-toss space: a finite probability space
The binomial stock price model (which we are working towards) is a discrete time stochastic
model of a stock price process in which a fictitious coin is tossed and a stock price depends on the
outcome of the coin tosses. Hence our first task is to introduce some probabilistic notions and
terminology in the context of a finite coin-toss probability space.
Let T := {0, 1, . . . , n} represent a discrete time set. Let Ω = Ω_n, the set of all outcomes of n
coin tosses. The finite set Ω is called the sample space, with elements ω called sample points,
representing the possible outcomes of the random experiment in which the coin is tossed. Each
sample point ω is a sequence of length n, written as ω = ω_1ω_2 . . . ω_n, where each ω_t, t ∈ T, is
either H (head) or T (tail), representing the outcome of the t-th coin toss.
Let F be the set of all subsets of Ω; F is a σ-algebra (or σ-field), that is, a collection of subsets
of Ω with the properties: (i) Ω ∈ F, (ii) if A ∈ F then A^c ∈ F, (iii) if A_1, A_2, . . . is a sequence of
sets in F, then ∪_{k=1}^∞ A_k is also in F. We interpret σ-algebras as a record of information (as we
shall see shortly). The pair (Ω, F) is called a measurable space.
We place a probability measure P on (Ω, F). A probability measure P is a function mapping
P : F → [0, 1] with the properties: (i) P(Ω) = 1, (ii) if A_1, A_2, . . . is a sequence of disjoint sets
in F, then P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k). The interpretation is that, for a set A ∈ F, there is a
probability in [0, 1] that the outcome of a random experiment will lie in the set A. We think of
P(A) as this probability. The set A ∈ F is called an event.
For A ∈ F we define
(2.1)    P(A) := Σ_{ω∈A} P(ω).
We can define P(A) in this way because A has only finitely many elements.
Let the probability of H on each coin toss be p ∈ (0, 1), so that the probability of T is q := 1 − p.
For each ω = (ω_1ω_2 . . . ω_n) ∈ Ω we define
(2.2)    P(ω) := p^{Number of H in ω} q^{Number of T in ω}.
Then for each A ∈ F we define P(A) according to (2.1).
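Definitions (2.1) and (2.2) translate directly into code. The sketch below builds the n = 3 coin-toss space and recovers the probability of the event "H on the first toss" (variable names are my own):

```python
from itertools import product

def P_omega(omega, p):
    """P(omega) = p^{number of H in omega} * q^{number of T}, as in (2.2)."""
    q = 1 - p
    return p ** omega.count("H") * q ** omega.count("T")

def P(A, p):
    """P(A) = sum of P(omega) over omega in A, as in (2.1)."""
    return sum(P_omega(w, p) for w in A)

n, p = 3, 0.6
Omega = ["".join(w) for w in product("HT", repeat=n)]
A_H = [w for w in Omega if w[0] == "H"]   # H on the first toss
print(P(Omega, p))   # total probability: 1
print(P(A_H, p))     # equals p
```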
In the finite coin toss space, for each t ∈ T let F_t be the σ-algebra generated by the first t coin
tosses. This is a σ-algebra which encapsulates the information one has if one observes the outcome
of the first t coin tosses (but not the full outcome of all n coin tosses). Then F_t is composed of
all the sets A such that F_t is indeed a σ-algebra, and such that if you know the outcome of the
first t coin tosses, then you can say whether ω ∈ A or ω ∉ A, for each A ∈ F_t. The (increasing)
sequence of σ-algebras F := (F_t)_{t∈T} is an example of a filtration, defined formally below.
Definition 2.1 (Filtration). Let T = {0, 1, . . . , n}. A filtration F = (F_t)_{t∈T} is a sequence of
increasing σ-algebras F_0, F_1, . . . , F_n. That is, F_s ⊆ F_t if s < t.
A filtration records increasing information flow, as we shall see in an example below. A probability
space (Ω, F, P) equipped with a filtration F = (F_t)_{t∈T}, with each F_t ⊆ F, is called a filtered
probability space.
Example 2.2 (3-period coin toss space). Take n = 3, so that Ω = Ω_3, given by the finite set
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Define
A_H = {HHH, HHT, HTH, HTT},    A_T = {THH, THT, TTH, TTT},
corresponding to the events that the first coin toss results in H and T respectively.
Define also
A_HH = {HHH, HHT},  A_HT = {HTH, HTT},
A_TH = {THH, THT},  A_TT = {TTH, TTT},
corresponding to the events that the first two coin tosses result in HH, HT, TH and TT respectively.
Using (2.2) and (2.1) we can compute the probability of any event. For instance,
P(A_H) = P{H on first toss} = P{HHH, HHT, HTH, HTT} = p^3 + 2p^2 q + pq^2 = p,
precisely in accordance with intuition, and similarly P(A_T) = q.
We can construct the following σ-algebras of subsets of Ω:
F_0 = {∅, Ω},
F_1 = {∅, Ω, A_H, A_T}.
It is easy to see that for an indicator function 1_A of an event A ∈ F, the definition of expectation
leads to E[1_A] = P(A).
Remark 2.8. When the sample space is infinite and, in particular, uncountable, the summation
in the definition of expectation is replaced by an integral. In general, the integral over an abstract
measurable space (Ω, F) with respect to a probability measure P is a so-called Lebesgue integral
(which has all the linearity and comparison properties we associate with ordinary integrals). The
expectation E[X] becomes the Lebesgue integral over Ω of X with respect to P, written as
(2.4)    E[X] = ∫_Ω X dP = ∫_Ω X(ω) dP(ω) = ∫_R x dμ_X(x) = ∫_R x μ_X(dx),
where μ_X denotes the distribution (law) of X.
3. The binomial stock price process
In the binomial model, the money market account satisfies
S^{(0)}_{t+1} = (1 + r)S^{(0)}_t,    t = 0, 1, . . . , n − 1,    S^{(0)}_0 = 1,
so that
(3.1)    S^{(0)}_t = (1 + r)^t,    t = 0, 1, . . . , n,
and hence S^{(0)}_t represents the value at time t of one unit of currency (say $1) invested at time
zero.
Regarding the stock price process S = (S_t)_{t∈T}, for each t ∈ T, S_t ≡ S_t(ω) (for ω ∈ Ω) is a
one-dimensional random variable on the measurable space (Ω, F), such that S_t(ω) = S_t(ω_1 . . . ω_t)
(where the second notation encapsulates the fact that each S_t, for t ∈ T, depends only on the
outcome of the first t coin tosses) is the stock price after t coin tosses. The sequence of random
variables S = (S_t)_{t∈T} is a stochastic process. We shall see that for each t ∈ T, S_t is F_t-measurable,
so that S is an F-adapted process. This encapsulates the idea that the information at time t ∈ T,
represented by F_t, is sufficient information to know the values of S_s for all s ≤ t.
Define two constants u, d satisfying u > 1 + r > d > 0. The evolution of the stock price is given
by (see Figure 6)
(3.2)    S_{t+1}(ω) = S_t u, if ω_{t+1} = H;  S_t d, if ω_{t+1} = T,    t = 0, 1, . . . , n − 1.
We may sometimes write (3.2) in the equivalent notation
S_{t+1}(ω_1 . . . ω_{t+1}) = S_t(ω_1 . . . ω_t)u, if ω_{t+1} = H;  S_t(ω_1 . . . ω_t)d, if ω_{t+1} = T,    t = 0, 1, . . . , n − 1,
whenever we wish to emphasise that S_t actually depends only on the outcome of the first t coin
tosses, and we abbreviate this notation further by sometimes suppressing the dependence on
ω_1 . . . ω_t, and writing
S_{t+1}(ω_{t+1}) = S_t u, if ω_{t+1} = H;  S_t d, if ω_{t+1} = T,    t = 0, 1, . . . , n − 1.
[Figure 6: one step of the binomial tree: from S_t the price moves to S_t u with probability p, or to S_t d with probability q := 1 − p.]
After t coin tosses of which j came up heads, the stock price is
S_t = S_0 u^j d^{t−j},    j = 0, 1, . . . , t,    t ∈ T.
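The formula S_t = S_0 u^j d^{t−j} is easy to tabulate over all paths; a short sketch, with the illustrative parameter choice S_0 = 4, u = 2, d = 1/2 (my own choice of values):

```python
from itertools import product

def stock_price(omega, S0, u, d):
    """S_t = S0 * u^j * d^(t-j), where j heads occur among the t tosses
    recorded in the string omega (e.g. "HHT")."""
    j = omega.count("H")
    return S0 * u ** j * d ** (len(omega) - j)

S0, u, d = 4.0, 2.0, 0.5
for omega in ("".join(w) for w in product("HT", repeat=3)):
    print(omega, stock_price(omega, S0, u, d))
```

Note that paths with the same number of heads give the same price, which is why the tree recombines.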
Example 3.1 (One-period binomial model). Let n = 1, so T = {0, 1}, and Ω = Ω_1 is the finite set
Ω_1 := {H, T},
the set of outcomes of a single coin toss. The stock price process is (S_t)_{t∈{0,1}}, and S_1(ω) takes
on two possible values, S_1(H) or S_1(T), given by
S_1(ω) = S_0 u, if ω = H;  S_0 d, if ω = T.
Example 3.2 (3-period binomial model). Let n = 3, so that T = {0, 1, 2, 3} and Ω = Ω_3, given by
the finite set
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
the set of all possible outcomes of three coin tosses.
We can write down all the stock prices in a binomial tree as:
S_1(H) = uS_0,    S_1(T) = dS_0,
S_2(HH) = u^2 S_0,    S_2(HT) = S_2(TH) = udS_0,    S_2(TT) = d^2 S_0,
S_3(HHH) = u^3 S_0,    S_3(HHT) = S_3(HTH) = S_3(THH) = u^2 d S_0,
S_3(HTT) = S_3(THT) = S_3(TTH) = ud^2 S_0,    S_3(TTT) = d^3 S_0.
Recall the filtration (F_t)_{t=0}^3 of Example 2.2. The stock price process is adapted to this filtration,
meaning that for each t ∈ T, the random variable S_t is F_t-measurable, which in turn means that
for each t ∈ T, all sets of the form
{ω ∈ Ω : S_t(ω) ∈ A}, for A ⊆ R, written {S_t ∈ A},
lie in F_t (as you can check in a few cases), and this in turn means that the information in F_t is
enough to know the value of S_t.
Now we consider the σ-algebra generated by the random variable S2, denoted by σ(S2). If we list, in as minimal a fashion as possible, the subsets of Ω that we can get as preimages under S2 of sets in ℝ, along with sets which can be built by taking unions and complements of these, then this collection of sets turns out to be σ(S2).
Now, if ω ∈ AHH, then S2(ω) = u²S0. If ω ∈ AHT ∪ ATH, then S2(ω) = udS0. If ω ∈ ATT, then S2(ω) = d²S0. Hence σ(S2) is composed of {∅, Ω, AHH, AHT ∪ ATH, ATT}, plus all relevant unions and complements. Using the identities
AHH ∪ (AHT ∪ ATH) = ATT^c,
AHH ∪ ATT = (AHT ∪ ATH)^c,
(AHT ∪ ATH) ∪ ATT = AHH^c,
we obtain
(3.3) σ(S2) = {∅, Ω, AHH, AHT ∪ ATH, ATT, AHH^c, (AHT ∪ ATH)^c, ATT^c}.
The information content of the σ-algebra σ(S2) is exactly the information learnt by observing S2. So, suppose the coin is tossed three times and you do not know the outcome ω, but you are told, for each set in σ(S2), whether ω is in that set or not. For instance, you might be told that ω is not in AHH, is in AHT ∪ ATH, and is not in ATT. Then you know that in the first two throws there was a head and a tail, but you are not told in which order they occurred. This is the same information you would have got by being told that the value of S2(ω) is udS0.
Note that F2 contains all the sets which are in σ(S2), and even more. In other words, the information in the first two throws is greater than the information in S2. In particular, if you see the first two tosses you can distinguish AHT from ATH, but you cannot make this distinction from knowing the value of S2 alone.
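This generating partition can be checked by brute force. The following Python sketch (my own illustration, not part of the notes; the parameter values u = 2, d = 1/2, S0 = 4 are chosen only for concreteness) enumerates the eight outcomes of three tosses and groups them by the value of S2, recovering exactly the atoms AHH, AHT ∪ ATH and ATT:

```python
from itertools import product

# Illustrative parameters (assumptions, not from the notes)
u, d, S0 = 2.0, 0.5, 4.0

omega = ["".join(w) for w in product("HT", repeat=3)]

def S2(w):
    # stock price after two tosses: multiply by u for H, d for T
    x = S0
    for toss in w[:2]:
        x *= u if toss == "H" else d
    return x

# group outcomes by the observed value of S2: the preimages of S2
partition = {}
for w in omega:
    partition.setdefault(S2(w), []).append(w)

for value, atoms in sorted(partition.items()):
    print(value, sorted(atoms))
```

The three groups printed are the atoms of σ(S2); all other sets in (3.3) are unions and complements of these.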
Note (we do not prove this here) that there is always a random variable Y satisfying the above properties (that is, conditional expectations always exist) provided that E|X| < ∞. There can be more than one random variable satisfying the above properties, but if Y′ is another one, then Y = Y′ with probability 1 (or almost surely (a.s.)).
For random variables X, Y it is standard notation to write
E[X|Y] ≡ E[X|σ(Y)].
4.2. Properties of conditional expectation. Proofs (or sketch proofs) of the properties below are given in the Appendix. All the X below satisfy E|X| < ∞.
(1) E[E[X|G]] = E[X].
The conditional expectation of X is thus an unbiased estimator of the random variable
X.
(2) If X is G-measurable, then E[X|G] = X.
In other words, if the information content of G is sufficient to determine X, then the
best estimate of X based on G is X itself.
(3) (Linearity) For a1, a2 ∈ ℝ,
E[a1 X1 + a2 X2|G] = a1 E[X1|G] + a2 E[X2|G].
(4) (Positivity) If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.
(5) (Jensen's inequality) If φ : ℝ → ℝ is convex and E|φ(X)| < ∞, then
E[φ(X)|G] ≥ φ(E[X|G]).
(6) (Tower property) If H is a sub-σ-algebra of G, then
E[E[X|G]|H] = E[X|H],
a.s.
The intuition here is that G contains more information than H. If we estimate X based
on the information in G, and then estimate the estimator based on the smaller amount of
information in H, then we get the same result as if we had estimated X directly based on
the information in H.
(7) (Taking out what is known). If Z is G-measurable, then
E[ZX|G] = Z E[X|G].
(8) (Role of independence) If X is independent of H (i.e. if σ(X) and H are independent σ-algebras), then
(4.1)
E[X|H] = E[X].
The intuition behind (4.1) is that if X is independent of H, then the best estimate of
X based on the information in H is E[X], the same as the best estimate of X based on
no information.
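Properties (1) and (6) can be checked numerically on a small example. The following Python sketch (my own illustration, not from the notes; the biased coin with P(H) = 1/3 and the random variable X, the squared number of heads, are arbitrary choices) verifies that averaging the F1-conditional expectation over the first toss recovers E[X]:

```python
from itertools import product

# Two-toss space with a biased coin (assumed parameters for illustration)
p, q = 1/3, 2/3                      # P(H) = 1/3, P(T) = 2/3

def prob(w):
    return (p if w[0] == "H" else q) * (p if w[1] == "H" else q)

# X = (number of heads)^2, an arbitrary F2-measurable random variable
X = {w: sum(1 for c in w if c == "H") ** 2
     for w in ("".join(t) for t in product("HT", repeat=2))}

EX = sum(prob(w) * X[w] for w in X)

# E[X|F1] is constant on {first toss = H} and on {first toss = T}:
def cond_E_given_first(first):
    atoms = [w for w in X if w[0] == first]
    pa = sum(prob(w) for w in atoms)
    return sum(prob(w) * X[w] for w in atoms) / pa

# property (1): E[ E[X|F1] ] = E[X]
tower = p * cond_E_given_first("H") + q * cond_E_given_first("T")
print(EX, tower)
```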
Example 4.2 (Conditional expectation in 3-period binomial model). Recall Example 3.2, featuring a 3-period binomial model. We continue this to give an example of a conditional expectation: suppose we wish to estimate S1, given S2, and denote this estimate by E[S1|S2]. This should have two properties: (i) it should be a random variable, so should depend on ω, E[S1|S2] = E[S1|S2](ω), and (ii) it should be σ(S2)-measurable, that is, if the value of S2 is known then the value of E[S1|S2] should also be known.
In particular, if ω = HHH or ω = HHT, then S2(ω) = u²S0 and, even without knowing ω, we know that S1(ω) = uS0. We thus define
E[S1|S2](HHH) = E[S1|S2](HHT) := uS0.
In other words,
E[S1|S2](ω) = uS0, ω ∈ AHH.
Similarly we define
E[S1|S2](TTT) = E[S1|S2](TTH) := dS0.
In other words,
E[S1|S2](ω) = dS0, ω ∈ ATT.
Finally, if ω ∈ A := AHT ∪ ATH = {HTH, HTT, THH, THT}, then S2(ω) = udS0, so that S1(ω) = uS0 or S1(ω) = dS0. So, to get E[S1|S2] in this case, we take a weighted average, as follows. For ω ∈ A we define
E[S1|S2](ω) := (1/P(A)) ∫_A S1 dP,
which is a partial average of S1 over the set A, normalised by the probability of A.
Now, P(A) = 2pq and ∫_A S1 dP = pq(u + d)S0, so that
E[S1|S2](ω) = (1/2)(u + d)S0, ω ∈ A = AHT ∪ ATH.
(In other words, the best estimate of S1, given that S2 = udS0, is the average of the possibilities uS0 and dS0.) Then we have that
∫_A E[S1|S2] dP = ∫_A S1 dP.
To summarise, we can write E[S1|S2] = g(S2), where
g(x) = uS0, if x = u²S0,
g(x) = (1/2)(u + d)S0, if x = udS0,
g(x) = dS0, if x = d²S0.
In other words, E[S1|S2] is random only through its dependence on S2 (and hence is σ(S2)-measurable).
This random variable satisfies:
(1) E[S1|S2] is σ(S2)-measurable;
(2) for every A ∈ σ(S2),
∫_A E[S1|S2] dP = ∫_A S1 dP.
As a second example, we compute E[S2|F1]. The σ-algebra F1 is generated by the sets AH and AT, so it suffices to verify the partial averaging property on AH and AT. (Obviously the partial averaging property is true on ∅ (all integrals are zero) and it will be true on Ω if it is true on AH and AT, since AH ∪ AT = Ω.) On AH we have
∫_{AH} E[S2|F1] dP = P(AH) E[S2|F1](ω)  (since E[S2|F1] is constant over AH)
= p E[S2|F1](ω), ω ∈ AH,
whilst on the other hand
∫_{AH} S2 dP = p²u²S0 + pq·udS0.
Hence
E[S2|F1](ω) = pu²S0 + q·udS0 = (pu + qd)uS0 = (pu + qd)S1(ω), ω ∈ AH.
A similar computation on AT gives E[S2|F1](ω) = (pu + qd)S1(ω) for ω ∈ AT. So overall we get
E[S2|F1](ω) = (pu + qd)S1(ω),
and so we have
E[S1|F0] = (pu + qd)S0.
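The computation above can be verified by enumeration. The following Python sketch (my own illustration, not from the notes; the parameter values are arbitrary) computes the partial average of S2 over each atom of F1 and compares it with the formula (pu + qd)S1:

```python
from itertools import product

# Illustrative parameters (assumptions, not from the notes)
u, d, S0, p = 2.0, 0.5, 4.0, 0.6
q = 1 - p

def prob(w):
    return (p if w[0] == "H" else q) * (p if w[1] == "H" else q)

def S(w, t):
    # stock price after the first t tosses of outcome w
    x = S0
    for toss in w[:t]:
        x *= u if toss == "H" else d
    return x

omega = ["".join(w) for w in product("HT", repeat=2)]
for first in "HT":
    atoms = [w for w in omega if w[0] == first]         # A_H or A_T
    pa = sum(prob(w) for w in atoms)                    # P(A_H) = p, P(A_T) = q
    cond = sum(prob(w) * S(w, 2) for w in atoms) / pa   # partial average of S2
    S1 = S0 * (u if first == "H" else d)
    print(first, cond, (p * u + q * d) * S1)
```

On both atoms the partial average agrees with (pu + qd)S1.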
We can generalise the above results to an n-period model.
Lemma 4.3. In an n-period binomial model, we have
(4.2) E[St+1|Ft] = (pu + qd)St, t = 0, 1, . . . , n−1.
This holds because the ratio St+1/St is independent of Ft, taking the value u with probability p and d with probability q, so that
E[St+1|Ft] = St E[St+1/St | Ft] = (pu + qd)St, t = 0, 1, . . . , n−1.
Definition 4.4 (Martingale). An adapted process M = (Mt)_{t=0}^n with E|Mt| < ∞ for each t ∈ T is a martingale if
E[Mt+1|Ft] = Mt, t = 0, 1, . . . , n−1.
So martingales tend to go neither up nor down. A supermartingale tends to go down, i.e. the second condition above is replaced by E[Mt+1|Ft] ≤ Mt. A submartingale tends to go up, i.e. E[Mt+1|Ft] ≥ Mt.
Definition 4.5 (Predictable process). A predictable process (φt)_{t=1}^n on a filtered probability space (Ω, F, (Ft)_{t=0}^n, P) is one such that, for each t ∈ {1, . . . , n}, φt is F_{t−1}-measurable.
See the Appendix for some more background material on discrete-time martingales, including
the idea of a martingale transform, or discrete stochastic integral (Proposition A.13), which uses
the notion of a predictable process.
Two probability measures P and Q on (Ω, F) are said to be equivalent if P(A) = 0 if and only if Q(A) = 0, for all A ∈ F.
See the Appendix for some more on equivalent and absolutely continuous measures, including
the concept of a Radon-Nikodym derivative.
5. Contingent claim valuation in the binomial model
Take the standard n-period binomial stock price process S of Section 3 on the filtered probability space (Ω, F, F := (Ft)_{t∈T}, P), generated by n coin tosses, with time index set T = {0, 1, 2, . . . , n}. The sample space Ω is finite, and the probability measure P is called the physical measure (or objective measure, or the market measure). We assume Fn = F and F0 = {∅, Ω}.
The sample space Ω is the set of all outcomes of n coin tosses, so each ω is of the form ω = (ω1 ω2 . . . ωn), with ωt ∈ {H, T}, for each t ∈ {1, . . . , n}. The evolution of the stock price process S = (St)_{t=0}^n is given by
St+1 = St u, if ωt+1 = H,
St+1 = St d, if ωt+1 = T,
for t = 0, 1, . . . , n−1, where u > 1 + r > d > 0.
Introduce a financial agent with initial wealth X0 at time zero, who can choose at each time how to split his wealth between the riskless and risky assets. The agent's trading strategy is the two-dimensional stochastic process
(φ^(0)_t, φt), t ∈ {1, . . . , n},
where, for t ∈ {1, . . . , n}, φ^(0)_t denotes the number of units of the riskless asset held over the interval [t−1, t) and φt denotes the number of units of the stock held over the interval [t−1, t). The positions in the portfolio at time t, for t ∈ {1, . . . , n}, are decided at time t−1 and kept until time t, when new asset price quotations are available.
Assumption 5.1. The portfolio process (φ^(0)_t, φt)_{t∈{1,...,n}} is predictable, so that for each t ∈ {1, . . . , n}, φ^(0)_t and φt are F_{t−1}-measurable.
The initial wealth satisfies
(5.1) X0 = φ^(0)_1 + φ1 S0. (budget constraint)
Equation (5.1) is a budget constraint: the agent splits all his initial wealth between cash and the risky stock.
The time 1 wealth is
(5.2) X1 = (1 + r)φ^(0)_1 + φ1 S1,
where we have assumed that no wealth has been taken out of the portfolio for (say) consumption and no outside income has been injected into the portfolio. Equation (5.2) is thus one form of a self-financing condition on the portfolio wealth evolution. Using the budget constraint (5.1) we re-cast (5.2) into the form
(5.3) X1 = (1 + r)X0 + φ1(S1 − (1 + r)S0).
Similar self-financing portfolio rebalancing occurs at each time t ∈ {1, . . . , n−1}. Define the wealth process X = (Xt)_{t∈T}, where Xt denotes the wealth at time t (that is, at the end of the interval [t−1, t) and the beginning of the interval [t, t+1)), for t = 0, 1, . . . , n−1, with Xn the final wealth at the end of the interval [n−1, n]. We then have the following evolution.
At the beginning of the interval [t−1, t), and just after portfolio rebalancing has taken place, the wealth is X_{t−1}, given by
(5.4) X_{t−1} = φ^(0)_t S^(0)_{t−1} + φt S_{t−1} = φ^(0)_t (1 + r)^{t−1} + φt S_{t−1},
where the last equality follows from the expression (3.1) for the value of the riskless asset at any time in T. The position (φ^(0)_t, φt) is held over [t−1, t), and the wealth Xt achieved at the end of this interval (and hence at the start of the interval [t, t+1)) is
(5.5) Xt = φ^(0)_t S^(0)_t + φt St = φ^(0)_t (1 + r)^t + φt St.
At this time, t, the portfolio is rebalanced to (φ^(0)_{t+1}, φ_{t+1}), so that Xt is also given by
Xt = φ^(0)_{t+1} S^(0)_t + φ_{t+1} St.
Equating the two expressions for Xt gives the self-financing condition
φ^(0)_{t+1} S^(0)_t + φ_{t+1} St = φ^(0)_t S^(0)_t + φt St, t = 1, . . . , n−1.
Using (5.4) to eliminate φ^(0)_t from (5.5), the wealth evolves according to
(5.6) Xt = (1 + r)X_{t−1} + φt(St − (1 + r)S_{t−1}), t = 1, . . . , n.
This can be put into a neater form if we work with discounted quantities, that is, we evaluate all quantities in units of the bond price. The discounted stock price process S̃ is defined by
S̃t := St/S^(0)_t = St/(1 + r)^t, t = 0, 1, . . . , n,
and the discounted wealth process is similarly defined by
X̃t := Xt/S^(0)_t = Xt/(1 + r)^t, t = 0, 1, . . . , n.
Then, in terms of discounted quantities, the wealth evolution equation (5.6) becomes
(5.7) X̃t = X̃_{t−1} + φt(S̃t − S̃_{t−1}), t = 1, . . . , n.
Iterating (5.7) gives
(5.8) X̃t = X0 + Σ_{s=1}^t φs(S̃s − S̃_{s−1}), t = 1, . . . , n.
From this we see that the wealth process is completely specified by the initial wealth X0 and the choice of stock portfolio φ. When we need to emphasise the dependence of wealth on the chosen portfolio we write X(φ) ≡ X.
The sum in (5.8) is known as the (discrete-time) stochastic integral of φ with respect to S̃, denoted by (φ · S̃):
(φ · S̃)t := Σ_{s=1}^t φs(S̃s − S̃_{s−1}), t = 1, . . . , n.
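As a numerical sanity check (my own sketch, not from the notes; the path, portfolio and parameter values are arbitrary assumptions), the undiscounted recursion (5.6) and the discounted sum (5.8) should describe the same wealth:

```python
# Arbitrary illustrative parameters, one fixed stock path, and an arbitrary
# (deterministic, hence predictable) portfolio
u, d, r, S0, X0 = 2.0, 0.5, 0.25, 4.0, 10.0
path = "HTTH"                        # one outcome of four tosses
phi = [0.5, -1.0, 2.0, 0.3]          # phi_t held over [t-1, t), t = 1..4

S = [S0]
for toss in path:
    S.append(S[-1] * (u if toss == "H" else d))

# recursion (5.6): X_t = (1+r) X_{t-1} + phi_t (S_t - (1+r) S_{t-1})
X = X0
for t in range(1, len(S)):
    X = (1 + r) * X + phi[t - 1] * (S[t] - (1 + r) * S[t - 1])

# discounted form (5.8): X~_n = X_0 + sum_s phi_s (S~_s - S~_{s-1})
Sd = [S[t] / (1 + r) ** t for t in range(len(S))]
Xd = X0 + sum(phi[s - 1] * (Sd[s] - Sd[s - 1]) for s in range(1, len(S)))

print(X, Xd * (1 + r) ** 4)   # the two should agree
```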
(5.11) Q(ωt = H) = pQ := (1 + r − d)/(u − d), Q(ωt = T) = qQ := 1 − pQ = (u − 1 − r)/(u − d).
In this case, for t = 0, . . . , n, we call Vt := Xt the no-arbitrage price at time t of Y , and the
portfolio which attains Xn = Y is called the replicating portfolio for the claim.
Definition 5.11 (Complete market). A financial market is said to be complete if every contingent
claim is attainable. Otherwise, the market is said to be incomplete.
Theorem 5.12 (Second Fundamental Theorem of Asset Pricing (FTAP II)). A finite-state
discrete-time arbitrage-free market is complete if and only if there is a unique equivalent martingale measure.
Here are some examples of European claims in a discrete-time setting with time index set T = {0, 1, . . . , n}.
• A European call option, with payoff Y = (Sn − K)+ for fixed strike K ≥ 0.
• A European put option, with payoff Y = (K − Sn)+ for fixed strike K ≥ 0.
• A fixed strike lookback call option, with payoff Y = (Mn − K)+ for fixed strike K ≥ 0, where Mn is the maximum of the stock price over {0, 1, . . . , n}, that is Mn = max_{t∈T} St.
• A floating strike lookback call option, with payoff Y = (Sn − mn)+, where mn is the minimum of the stock price over {0, 1, . . . , n}, that is mn = min_{t∈T} St.
• An arithmetic average fixed strike Asian call option, with payoff Y = (An − K)+ for fixed strike K ≥ 0, where An is the arithmetic average of the stock price over {0, 1, . . . , n}, that is
An = (1/(n + 1)) Σ_{t=0}^n St.
In a complete market, all contingent claims are attainable. So, given a contingent claim Y ,
there is a unique trading strategy with wealth process X = (Xt )tT such that Xn = Y almost
surely. This immediately implies that, to avoid arbitrage, the price of the claim at any time t T
must be Vt := Xt , as in Definition 5.10.
Denote the discount factor from time t ∈ T to time zero by Dt. So, with constant interest rate r, Dt = (1 + r)^{−t} = 1/S^(0)_t.
Lemma 5.13. The no-arbitrage price of an attainable claim Y is given by
(5.10) Vt = (1/Dt) E^Q[Dn Y | Ft], t ∈ T.
Any other price for the claim will lead to an arbitrage opportunity.
Proof. Let X = (Xt)_{t∈T} be the wealth process of the replicating strategy. The discounted wealth process X̃ = DX is a Q-martingale, so satisfies
E^Q[Dn Xn | Ft] = Dt Xt, t ≤ n.
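At t = 0, formula (5.10) reads V0 = (1 + r)^{−n} E^Q[Y], which can be evaluated by summing over all 2^n paths. A Python sketch (my own, not from the notes; a 3-period call with illustrative parameter values):

```python
from itertools import product

# Illustrative parameters (assumptions): 3-period call, strike 5
u, d, r, S0, K, n = 2.0, 0.5, 0.25, 4.0, 5.0, 3
pQ = (1 + r - d) / (u - d)           # risk-neutral up-probability
qQ = 1 - pQ

V0 = 0.0
for w in product("HT", repeat=n):
    Sn, prob = S0, 1.0
    for toss in w:
        Sn *= u if toss == "H" else d
        prob *= pQ if toss == "H" else qQ
    V0 += prob * max(Sn - K, 0.0)    # E^Q[Y] built up path by path
V0 /= (1 + r) ** n                   # discount back to time zero
print(V0)
```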
We now carry out the replication of a claim explicitly, from which the EMM emerges naturally.
5.2.1. Replication in a one-period binomial model. First consider a one-period model, n = 1, so
T = {0, 1}. Suppose an agent sells a claim on the stock at time zero that expires at time 1. There
are just two points ω ∈ Ω, given by ω = H and ω = T.
The claim pays off an amount Y at time 1, where Y is an F1 -measurable random variable.
This measurability condition is relevant; it says that the value of the claim at its maturity date
is determined by the coin toss, that is, by the value of the stock price at time 1. This is why it
does not make sense to use some stock unrelated to the derivative security in valuing it.
The agent sells the claim at time zero for some price V0 (to be determined) and attempts to manage the risk from this sale by building a hedging portfolio composed of a number φ1 of shares of the underlying stock and a number φ^(0)_1 of shares of the riskless asset (which has initial value S^(0)_0 = 1).
We suppose that the proceeds from the sale of the claim, V0, are all that the agent uses to construct a hedging portfolio. Therefore the initial wealth in the hedging portfolio is
(5.12) X0 = φ^(0)_1 + φ1 S0.
As the stock price evolves in time the hedging portfolio and option value will also evolve. The option payoff is the random variable Y(ω) (so for, say, a call option, Y(ω) = (S1(ω) − K)+, where K is the option's strike and S1(ω) is the stock price after one coin toss). The agent's hedge portfolio wealth at time 1 is X1(ω), given by
X1(ω) = (1 + r)φ^(0)_1 + φ1 S1(ω).
Eliminating φ^(0)_1 using (5.12), we write X1 as
X1(ω) = (1 + r)X0 + φ1(S1(ω) − (1 + r)S0).
If the hedging portfolio is to successfully manage the risk from the option sale, its value must replicate the option payoff in each possible final state, so we require X1(ω) = Y(ω) for ω = H and ω = T, yielding the equations
(5.13) (1 + r)X0 + φ1(S1(H) − (1 + r)S0) = Y(H), if ω = H,
(5.14) (1 + r)X0 + φ1(S1(T) − (1 + r)S0) = Y(T), if ω = T.
Subtracting these, the stock holding must be
φ1 = (Y(H) − Y(T))/(S1(H) − S1(T)).
Substituting back into either equation, the initial wealth is
X0 = V0 = (1/(1 + r)) E^Q[Y].
The measure Q is the unique EMM for this one-period market, and is also known as the riskneutral probability measure.
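The one-period replication can be carried out numerically. The following Python sketch (my own illustration, not from the notes; a call option with arbitrary parameter values) solves the two replication equations for the stock holding and initial wealth, and confirms that the initial wealth agrees with the risk-neutral valuation formula:

```python
# Illustrative one-period parameters (assumptions): call with strike 5
u, d, r, S0, K = 2.0, 0.5, 0.25, 4.0, 5.0
S1 = {"H": u * S0, "T": d * S0}
Y = {w: max(S1[w] - K, 0.0) for w in "HT"}     # call payoff in each state

# stock holding from subtracting the two replication equations
phi = (Y["H"] - Y["T"]) / (S1["H"] - S1["T"])
# initial wealth from the H-state equation
X0 = (Y["H"] - phi * (S1["H"] - (1 + r) * S0)) / (1 + r)

# risk-neutral valuation with pQ = (1+r-d)/(u-d)
pQ = (1 + r - d) / (u - d)
V0 = (pQ * Y["H"] + (1 - pQ) * Y["T"]) / (1 + r)
print(phi, X0, V0)
```

The hedge also replicates in the T state, which the test below checks directly.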
It is clear that Q is equivalent to the physical measure P, and that Q is indeed a martingale measure, in that
E^Q[S1/(1 + r)] = S0.
It is also clear that the discounted wealth process, and hence the discounted claim price process, is also a Q-martingale, just as we would have expected from the FTAPs.
5.2.2. Replication in an n-period binomial model. We can easily generalise the above analysis to an n-period model, by concatenating a sequence of 1-period models.
Let us place ourselves at some time t−1, where t ∈ {1, . . . , n}. Given a fixed outcome ω1 . . . ω_{t−1} of the first t−1 coin tosses, suppose that the values of the stock and a derivative security at time t are St(ωt), Vt(ωt) respectively, if the outcome of the tth coin toss is ωt (see Figure 7). Then one can trade over [t−1, t) to reproduce the values of the derivative one period later, as follows.
[Figure 7: one-period transition: from (S_{t−1}, V_{t−1}), an up-move leads to St(H) = S_{t−1}u, Vt(H), and a down-move to St(T) = S_{t−1}d, Vt(T).]
Holding φ^(0)_t units of the riskless asset and φt shares of stock over [t−1, t) gives time-t wealth
(5.16) Xt(ωt) = φ^(0)_t (1 + r)S^(0)_{t−1} + φt S_{t−1} αt, where αt = u if ωt = H, and αt = d if ωt = T.
We require Xt(ωt) = Vt(ωt), for both ωt = H and ωt = T, yielding the equations
(5.17) Xt(H) = (1 + r)X_{t−1} + φt(St(H) − (1 + r)S_{t−1}) = Vt(H),
(5.18) Xt(T) = (1 + r)X_{t−1} + φt(St(T) − (1 + r)S_{t−1}) = Vt(T).
Subtracting, the stock holding at the beginning of the interval [t−1, t) must be
φt = (Vt(H) − Vt(T))/(St(H) − St(T)) = (Vt(H) − Vt(T))/(S_{t−1}(u − d)).
The required wealth at time t−1 is then given from either of (5.17) or (5.18) as
X_{t−1} = (1/(1 + r))[pQ Vt(H) + qQ Vt(T)] = E^Q[(1 + r)^{−1} Vt | F_{t−1}], t = 1, . . . , n.
For no-arbitrage, we must then have that the derivative value at time t−1 is given by V_{t−1} = X_{t−1}:
(5.19) V_{t−1} = E^Q[(1 + r)^{−1} Vt | F_{t−1}], t = 1, . . . , n.
Notice that this implies that the discounted option value ((1 + r)^{−t} Vt)_{t=0}^n is a Q-martingale (as it should be, since it is replicated by a discounted wealth process which is a Q-martingale).
This shows that one can always find a strategy at any time to reproduce the value of a contingent
claim one period later. The key to valuing the contingent claim is thus to begin at the maturity
time and work backwards, computing risk-neutral discounted expectations. The next section
formalises this.
5.3. Completeness of the multiperiod binomial model. The above analysis can clearly be iterated so that in a multiperiod binomial model, we can replicate any contingent claim. The next theorem rigorously demonstrates that a portfolio process to hedge any contingent claim in the binomial model exists, and derives an expression for φt, t = 1, . . . , n.
Define the unique EMM Q by setting the Q-probability of H on each coin toss to be pQ, and the Q-probability of T to be qQ := 1 − pQ, given by (5.11).
Theorem 5.14. The n-period binomial model is complete. In particular, let Y be a European claim with maturity time n, and define
Vt(ω1 . . . ωt) := (1 + r)^t E^Q[(1 + r)^{−n} Y | Ft](ω1 . . . ωt), t = 0, . . . , n,
φt(ω1 . . . ω_{t−1}) := (Vt(ω1 . . . ω_{t−1}H) − Vt(ω1 . . . ω_{t−1}T))/(St(ω1 . . . ω_{t−1}H) − St(ω1 . . . ω_{t−1}T)), t = 1, . . . , n.
Then, starting with initial wealth X0 := V0 = E^Q[(1 + r)^{−n} Y], the self-financing wealth process corresponding to the portfolio process φ1, . . . , φn is the process V0, . . . , Vn.
Proof. Let V0, . . . , Vn and φ1, . . . , φn be defined as in the theorem. Observe that Vn = Y almost surely.
Start with wealth X0 = V0 = E^Q[(1 + r)^{−n} Y] and consider the self-financing wealth process corresponding to φ1, . . . , φn. This wealth satisfies the recursive formula
Xt+1 = (1 + r)Xt + φt+1(St+1 − (1 + r)St), t = 0, 1, . . . , n−1.
We claim that
(5.20) Xt = Vt, t = 0, 1, . . . , n.
We proceed by induction. For t = 0, (5.20) holds by definition of X0. Now assume that (5.20) holds for some fixed value of t ∈ {0, . . . , n−1}, i.e. for each fixed (ω1 . . . ωt) we have
Xt(ω1 . . . ωt) = Vt(ω1 . . . ωt).
Then we need to show that
Xt+1(ω1 . . . ωtH) = Vt+1(ω1 . . . ωtH),
Xt+1(ω1 . . . ωtT) = Vt+1(ω1 . . . ωtT).
We shall prove the first equality, and note that the second can be proved similarly (an exercise). Note first that ((1 + r)^{−t} Vt)_{t=0}^n is a martingale under Q, since
E^Q[(1 + r)^{−(t+1)} Vt+1 | Ft] = E^Q[E^Q[(1 + r)^{−n} Y | Ft+1] | Ft]  (defn. of Vt+1)
= E^Q[(1 + r)^{−n} Y | Ft]  (tower property)
= (1 + r)^{−t} Vt.
So in particular,
Vt(ω1 . . . ωt) = E^Q[(1 + r)^{−1} Vt+1 | Ft](ω1 . . . ωt)
= (1/(1 + r))(pQ Vt+1(ω1 . . . ωtH) + qQ Vt+1(ω1 . . . ωtT)).
Since (ω1 . . . ωt) will be fixed for the rest of the proof, we simplify notation by suppressing these symbols. For example, the last equation is written as
(5.21) Vt = (1/(1 + r))(pQ Vt+1(H) + qQ Vt+1(T)).
Now we compute
Xt+1(H) = (1 + r)Xt + φt+1(St+1(H) − (1 + r)St)
= (1 + r)Vt + φt+1(St+1(H) − (1 + r)St)  (since Xt = Vt)
= (1 + r)Vt + [(Vt+1(H) − Vt+1(T))/(St+1(H) − St+1(T))](St+1(H) − (1 + r)St)
= pQ Vt+1(H) + qQ Vt+1(T) + [(Vt+1(H) − Vt+1(T))/(St+1(H) − St+1(T))](St+1(H) − (1 + r)St)  (by (5.21))
= pQ Vt+1(H) + qQ Vt+1(T) + qQ(Vt+1(H) − Vt+1(T))
= Vt+1(H),
where we have used St+1(H) = St u and St+1(T) = St d, so that (St+1(H) − (1 + r)St)/(St+1(H) − St+1(T)) = (u − 1 − r)/(u − d) = qQ.
Example 5.15 (European call in 2-period model). Let u = 2, d = 1/u, r = 1/4, S0 = 4, so that pQ = qQ = 1/2. Consider a European call with expiration time 2 and payoff function Y = (S2 − 5)+. The possible stock prices in this model are shown in Figure 8.
[Figure 8: the two-period tree: S0 = 4; S1(H) = 8, S1(T) = 2; S2(HH) = 16, S2(HT) = S2(TH) = 4, S2(TT) = 1.]
Then using the binomial algorithm in Theorem 5.14 we work backwards in time, using the fact that V is a Q-martingale, to obtain
V1(H) = (1/(1 + r))(pQ V2(HH) + qQ V2(HT)) = (4/5)[(1/2)(11) + (1/2)(0)] = 22/5,
V1(T) = (1/(1 + r))(pQ V2(TH) + qQ V2(TT)) = (4/5)[(1/2)(0) + (1/2)(0)] = 0,
V0 = (1/(1 + r))(pQ V1(H) + qQ V1(T)) = (4/5)[(1/2)(22/5) + (1/2)(0)] = 44/25.
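The backward induction can be coded in a few lines on a recombining lattice. This Python sketch (my own, not from the notes) reproduces the values of Example 5.15:

```python
# Parameters of Example 5.15: u = 2, d = 1/2, r = 1/4, S0 = 4, call strike 5
u, d, r, S0, K, n = 2.0, 0.5, 0.25, 4.0, 5.0, 2
pQ = (1 + r - d) / (u - d)           # = 1/2
qQ = 1 - pQ

# terminal call values on the lattice: index j counts up-moves out of n
V = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]

# backward recursion (5.19): discounted risk-neutral expectation at each node
for t in range(n, 0, -1):
    V = [(pQ * V[j + 1] + qQ * V[j]) / (1 + r) for j in range(t)]

V0 = V[0]
print(V0)   # 44/25 = 1.76
```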
6. American options in the binomial model
We briefly discuss the pricing of American derivative securities in the binomial model. American
derivative securities can be exercised at any time prior to maturity.
Definition 6.1. In a discrete-time framework with time set T = {0, 1, . . . , n}, an American
derivative security with maturity n is a sequence of nonnegative random variables (Yt )nt=0 such
that for each t T, Yt is Ft -measurable. The owner of an American derivative security can
exercise at any time t T, and if he does, he receives the payment Yt .
For example, an American put option of strike K on a stock price S = (St)_{t=0}^n can be exercised at any time t ∈ T to give the owner a payment Yt := (K − St)+, which is called the intrinsic value of the option at time t.
Recall the pricing of European securities. Consider a binomial model with n periods, so the time set is T = {0, 1, . . . , n}. Suppose Yn is the payoff of a European derivative. For t ∈ T, we define by backward recursion
(6.1) Vn := Yn, Vt := (1/(1 + r))[pQ Vt+1(H) + qQ Vt+1(T)], t = 0, . . . , n−1,
where, as before, the second equation is a shorthand for
Vt(ω1 . . . ωt) = E^Q[(1 + r)^{−1} Vt+1 | Ft](ω1 . . . ωt)
= (1/(1 + r))(pQ Vt+1(ω1 . . . ωtH) + qQ Vt+1(ω1 . . . ωtT)).
Then Vt is the value of the option at time t ∈ T, and the hedging portfolio over [t−1, t) is φt, given by
φt = (Vt(H) − Vt(T))/(St(H) − St(T)) = (Vt(H) − Vt(T))/(S_{t−1}(u − d)), t = 1, . . . , n,
which is shorthand for
φt(ω1 . . . ω_{t−1}) = (Vt(ω1 . . . ω_{t−1}H) − Vt(ω1 . . . ω_{t−1}T))/(St(ω1 . . . ω_{t−1}H) − St(ω1 . . . ω_{t−1}T)), t = 1, . . . , n.
Now suppose the option is American, with payoff Y = (Yt )nt=0 . At any time t T, the holder of
the American derivative can exercise the option and receive the payment Yt . Hence, the hedging
portfolio should create a wealth process X which satisfies
Xt ≥ Yt, t ∈ T,
almost surely.
This is because the value of the American option at time t is at least as much as the so-called
intrinsic value Yt , and the value of the hedging portfolio at that time must equal the value of the
option.
This suggests that, to price an American derivative, we should replace the European algorithm (6.1) by the following American algorithm:
(6.2) Vn = Yn, Vt = max( Yt, (1/(1 + r))[pQ Vt+1(H) + qQ Vt+1(T)] ), t = 0, . . . , n−1,
which checks whether the intrinsic value is greater than the value of the discounted risk-neutral
expectation, which would signify that the option would be exercised in that state. Then Vt would
be the value of the American derivative at time t T.
Remark 6.2 (Supermartingale property of American option price). In valuing European options we found that the discounted option value is a Q-martingale (recall, for example, (5.19)). From (6.2) we see that the value of the American option can be greater than that given by a discounted risk-neutral expectation, because of the possibility of early exercise. In other words, we might have that
Vt > E^Q[(1/(1 + r)) Vt+1 | Ft],
or, equivalently,
E^Q[(1 + r)^{−(t+1)} Vt+1 | Ft] < (1 + r)^{−t} Vt,
so that the discounted option value is a Q-supermartingale. (It turns out that the discounted value process of an American option is the smallest supermartingale that dominates the discounted payoff, though we do not prove this here.)
Example 6.3 (American put in a 2-period model). Consider an American put option in a 2-period binomial model with u = 2, d = 1/u, r = 1/4, S0 = 4, so that pQ = qQ = 1/2. Let the option have payoff function Yt = (5 − St)+. The possible stock prices in this model are shown in Figure 9. The terminal values of the option are given by V2 = Y2 = (5 − S2)+, and these are also shown in the figure.
[Figure 9: the two-period tree with terminal put values: S0 = 4; S1(H) = 8, S1(T) = 2; S2(HH) = 16, V2(HH) = 0; S2(HT) = S2(TH) = 4, V2(HT) = V2(TH) = 1; S2(TT) = 1, V2(TT) = 4.]
which also yields φ1 = −13/30. Now let us try to compute φ2 in a similar manner:
X2(HH) = (1 + r)X1(H) + φ2(H)(S2(HH) − (1 + r)S1(H)) = V2(HH) = 0,
which yields φ2(H) = −1/12. The same result is obtained if one considers the wealth X2(HT). Now let us try to compute φ2(T) as follows: the corresponding equations in the states TH and TT turn out to be inconsistent, so no single holding φ2(T) replicates both V2(TH) and V2(TT) starting from wealth X1(T) = V1(T). The resolution is to modify the self-financing wealth evolution to allow consumption:
(6.3) Xt = (1 + r)(X_{t−1} − C_{t−1}) + φt(St − (1 + r)S_{t−1}), t = 1, . . . , n,
where, for t ∈ {0, 1, . . . , n−1}, Ct ≥ 0 represents the amount of wealth consumed at time t. In other words, we are allowing for some funds to be withdrawn from the self-financing portfolio. We
To appreciate why this adjustment might be needed, consider the American algorithm in (6.2).
We see that the value of the option can be greater than that given by a discounted risk-neutral
expectation, because of the possibility of early exercise. In other words, we might have that
Vt > E^Q[(1/(1 + r)) Vt+1 | Ft],
or, equivalently,
E^Q[(1 + r)^{−(t+1)} Vt+1 | Ft] < (1 + r)^{−t} Vt,
so that the discounted option value is a supermartingale. (It turns out that the discounted value process of an American option is the smallest supermartingale that dominates the discounted payoff, though we do not prove this here.)
To see how consumption enters the hedging portfolio, consider the situation in which
(6.4) Vt > E^Q[(1/(1 + r)) Vt+1 | Ft].
Then the holder of the American option should exercise (this is the case in the state ω1 = T in Example 6.3), so that hedging should stop at this point (which is why we had difficulty isolating what the hedging portfolio should be in the example). If the holder of the option does not exercise, then the seller of the option may consume, to close the gap between the left- and right-hand sides of (6.4). By doing this, he can ensure that Xt = Vt for all t ∈ T, where Vt is the value defined by the American algorithm.
In Example 6.3, we had V1(T) = 3, V2(TH) = 1, V2(TT) = 4, so that
E^Q[(1/(1 + r)) V2 | F1](T) = (4/5)[(1/2)(1) + (1/2)(4)] = 2,
and there is a gap of size 1 in (6.4). If the owner of the option does not exercise it at time 1 in the state ω1 = T, then the seller can consume an amount 1 at time 1. Thereafter he uses the usual hedging portfolio
φt = (Vt(H) − Vt(T))/((u − d)S_{t−1}).
In the example, we had V1(T) = Y1(T), which means that, acting optimally, the holder of the option should exercise. It turns out that it is optimal for the owner of the American option to exercise whenever its value Vt agrees with its intrinsic value Yt.
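The American algorithm (6.2) is equally short in code. This Python sketch (my own, not from the notes) reproduces Example 6.3: it gives V0 = 1.36, and at the time-1 down node the intrinsic value 3 exceeds the continuation value 2, the gap of size 1 in (6.4):

```python
# Parameters of Example 6.3: American put, strike 5, two periods
u, d, r, S0, K, n = 2.0, 0.5, 0.25, 4.0, 5.0, 2
pQ = (1 + r - d) / (u - d)           # = 1/2
qQ = 1 - pQ

# terminal put values: index j counts up-moves
V = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

# American algorithm (6.2): max of intrinsic value and discounted continuation
for t in range(n, 0, -1):
    cont = [(pQ * V[j + 1] + qQ * V[j]) / (1 + r) for j in range(t)]
    intrinsic = [max(K - S0 * u**j * d**(t - 1 - j), 0.0) for j in range(t)]
    V = [max(i, c) for i, c in zip(intrinsic, cont)]

V0 = V[0]
print(V0)   # 1.36, versus 44/25 - type martingale pricing without early exercise
```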
7. Brownian motion
We now move on to the continuous-time modelling of financial assets, based on a continuous stochastic process called Brownian motion (BM). This has many remarkable properties, as we shall see, not least of which is that it is simultaneously Gaussian, Markovian, and a martingale. Our first foray into this subject is to think of BM as a suitably scaled and speeded-up random walk.
We will use the notation X ∼ N(μ, σ²) to denote that a random variable X is normally distributed with mean μ and variance σ².
7.1. BM as scaled limit of symmetric random walk. Toss a coin infinitely many times, so that the sample space Ω is the set of all infinite sequences ω = (ω1ω2 . . .) of H and T. One can construct a well-defined probability space (Ω, F, P) called the space of infinite coin tosses (though this is not completely trivial, as Ω is an uncountably infinite space), as well as a filtration (Fj)_{j=0}^∞ on this space, where Fj denotes the σ-algebra generated by the first j coin tosses. We shall not delve into the construction of infinite coin toss space here, but we take it as given that it is well-defined. Chapters 1 and 2 of Shreve [17] have a detailed account.
Assume that each toss is independent. Take P{ωj = H} = P{ωj = T} = 1/2, for j = 1, 2, . . . On the infinite coin toss space (Ω, F, P) define the random variables
Xj(ω) := +1 if ωj = H, −1 if ωj = T, j = 1, 2, . . .
So each Xj has mean zero and variance 1 (that is, E[Xj|F_{j−1}] = E[Xj] = 0 and E[Xj²|F_{j−1}] = E[Xj²] = 1). Thus X1, X2, . . . is a sequence of independent, identically distributed (i.i.d.) random variables. Then define the symmetric random walk M = (Mk)_{k=0}^∞ via
M0 := 0, Mk := Σ_{j=1}^k Xj, k = 1, 2, . . . .
By the strong law of large numbers, Mk/k → 0 almost surely, as k → ∞.
By the Central Limit Theorem we know that for large k, Mk/√k is approximately standard normal:
(1/√k) Mk → Z ∼ N(0, 1), in distribution, as k → ∞.
Brownian motion arises if we suitably speed up the tossing of the coins and scale the size of each random walk increment. To this end, first fix some time t ≥ 0 in which k coin tosses take place, with some time interval Δt between each toss, so that
t = kΔt = k/n,
where n := 1/Δt, for positive integers k, n. Then define a continuous-time process W^(n) via
W^(n)_t := (1/√n) M_{nt} = √(t/k) Mk = √Δt M_{t/Δt}, t ≥ 0,
with linear interpolation used to define W^(n)_t for any times t ≥ 0 not of the form k/n. Take the limit k → ∞, with t fixed. Equivalently, n → ∞, or equivalently, Δt → 0, so we are speeding up the coin tossing, and since W^(n)_t = √Δt M_{t/Δt} = √Δt Mk, we are scaling each increment of the random walk by √Δt. By the Central Limit Theorem,
W^(n)_t → Wt ∼ N(0, t), in distribution, as n → ∞,
and the limit process W is Brownian motion. Note, however, that the time derivative dWt/dt does not exist, since
dW^(n)_t/dt = lim_{Δt→0} (W^(n)_{t+Δt} − W^(n)_t)/Δt = lim_{Δt→0} (1/√Δt) X_{k+1} = ±∞.
Suppose instead we scale each increment by Δt rather than √Δt, defining
V^(n)_t := Δt M_{t/Δt}, t ≥ 0.
Then V^(n)_t = t Mk/k, so by the strong law of large numbers,
V^(n)_t → 0, almost surely, as n → ∞,
and
dV^(n)_t/dt = lim_{Δt→0} (V^(n)_{t+Δt} − V^(n)_t)/Δt = lim_{Δt→0} X_{k+1} = ±1,
so while the derivative of V^(n) is defined (unlike that of W^(n)), the process V^(n) is trivially zero in the limit.
In other words, the Brownian particle can only have motion if it has infinite velocity. This is a manifestation of the fact that the paths of W are almost surely continuous but nowhere differentiable, and of infinite length, as we shall see.
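The scaling limit can be illustrated by simulation. This Python sketch (my own, not from the notes, with an arbitrary seed and sample sizes) samples the scaled walk W^(n)_t = (1/√n) M_{nt} at t = 1 and checks that its sample mean and variance are close to 0 and t:

```python
import random

random.seed(0)                # arbitrary seed for reproducibility
n, t, reps = 400, 1.0, 2000   # assumed simulation sizes
k = int(n * t)                # number of coin tosses in [0, t]

samples = []
for _ in range(reps):
    # symmetric random walk M_k, then scale by 1/sqrt(n)
    M = sum(random.choice((-1, 1)) for _ in range(k))
    samples.append(M / n ** 0.5)

mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, var)   # should be close to 0 and to t = 1
```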
7.2. Brownian motion definition. We now give a standard definition of BM, and assume without further mention that it exists, though the existence of such a process is not trivial, and there are many advanced texts devoted to various constructions of it (for instance, Karatzas and Shreve [12]).
Definition 7.1 (Brownian motion). A standard 1-dimensional Brownian motion (BM) is a continuous adapted process W := (Wt, Ft)_{0≤t<∞} on some filtered probability space (Ω, F, F := (Ft)_{t≥0}, P) with the properties that W0 = 0 a.s. and, for 0 ≤ s < t, Wt − Ws is independent of Fs and normally distributed as Wt − Ws ∼ N(0, t − s).
The above definition is often summarised by saying that BM is a continuous process with stationary and independent Gaussian increments (the stationarity refers to the fact that the distribution of Wt − Ws depends only on the elapsed time t − s, and not individually on t, s).
7.2.1. Other definitions*. Here are some (non-examinable) remarks and pointers to more advanced topics.
Here is another definition of BM, based on a process called quadratic variation (one definition of
which is given below). Let M2 denote the space of right-continuous square-integrable martingales on a complete filtered probability space (Ω, F, F := (Ft)_{t≥0}, P): that is, for M := (Mt)_{t≥0} ∈ M2 we have M0 = 0 a.s., and E[Mt²] < ∞, for all t ≥ 0.
Definition 7.2 (Quadratic variation). For M ∈ M2, the quadratic variation (QV) of M is the unique, increasing adapted process [M] such that [M]0 = 0 a.s. and such that (Mt² − [M]t)_{t≥0} is a martingale.
Definition 7.3 (Cross-variation). For X, Y ∈ M2, define their cross-variation process ([X, Y]t)_{t≥0} by
[X, Y]t := (1/4)([X + Y]t − [X − Y]t), t ≥ 0.
For X, Y ∈ M2^c (i.e. continuous), this is the unique adapted process [X, Y] of finite variation such that [X, Y]0 = 0 a.s. and such that ((XY − [X, Y])t)_{t≥0} is a martingale.
Remark 7.4. For Brownian motion W := (Wt)_{t≥0}, we have [W]t = t, since Wt² − t is a martingale (see Problem Sheet 2). Indeed, Brownian motion may be defined as the unique continuous martingale, null at zero, with quadratic variation [M]t = t. This is related to the Lévy criterion which we shall meet in Section 7.8.
7.2.2. Filtration generated by BM*. We denote by (Ft)_{t≥0} the filtration generated by Brownian motion. Its required properties are:
• for each t, Wt is Ft-measurable;
• for each t and for t < t1 < t2 < . . . < tn, the Brownian motion increments Wt1 − Wt, Wt2 − Wt1, . . . , Wtn − Wtn−1 are independent of Ft.
Here is one way to construct Ft. First fix t. Let s ∈ [0, t] and A ∈ B(ℝ) be given. Put the set
{Ws ∈ A} = {ω ∈ Ω : Ws(ω) ∈ A}
in Ft. Do this for all possible numbers s ∈ [0, t] and all Borel sets A ∈ B(ℝ). Then put in every other set required by the σ-algebra properties. This σ-algebra Ft contains exactly the information learnt by observing the Brownian motion up to time t, and (Ft)_{t≥0} is called the filtration generated by the Brownian motion.
7.3. Properties of BM. We discuss some properties of BM, with starred subsections being (as
usual) non-examinable background.
7.3.1. Stationarity*. We say a stochastic process $X = (X_t)_{t\geq 0}$ is stationary if $X_t$ has the same distribution as $X_{t+h}$ for any $h > 0$. Brownian motion has stationary increments. To see this, define the increment process $I = (I_t)_{t\geq 0}$ by $I_t := W_{t+h} - W_t$. Then $I_t \sim N(0,h)$, and $I_{t+h} = W_{t+2h} - W_{t+h} \sim N(0,h)$ have the same distribution. This is equivalent to saying that the process $(W_{t+h} - W_t)_{h\geq 0}$ has the same distribution for all $t$.
7.3.2. Martingale property. The independent increments property allows us to show that BM is a martingale. For $0 \leq s \leq t$ we have
$$\mathbb{E}[W_t|\mathcal{F}_s] = \mathbb{E}[W_t - W_s + W_s|\mathcal{F}_s] = \mathbb{E}[W_t - W_s|\mathcal{F}_s] + W_s = \mathbb{E}[W_t - W_s] + W_s = W_s.$$
7.3.3. Covariance of BM at different times*. Let $0 \leq s \leq t$ be given. Then $W_s$ and $W_t - W_s$ are independent, and $(W_s, W_t)$ are jointly normal with $\mathbb{E}[W_s] = \mathbb{E}[W_t] = \mathbb{E}[W_t - W_s] = 0$, $\mathrm{var}(W_s) = s$, $\mathrm{var}(W_t) = t$, $\mathrm{var}(W_t - W_s) = t - s$, so that the covariance of $W_s$ and $W_t$ is
$$\mathrm{cov}(W_s, W_t) := \mathbb{E}[(W_s - \mathbb{E}[W_s])(W_t - \mathbb{E}[W_t])] = \mathbb{E}[W_s W_t] = \mathbb{E}[W_s(W_t - W_s + W_s)] = \mathbb{E}[W_s(W_t - W_s)] + \mathbb{E}[W_s^2] = 0 + s = \min(s,t).$$
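A quick Monte Carlo sanity check of $\mathrm{cov}(W_s, W_t) = \min(s,t)$ (an illustration, not part of the notes; the times $s$, $t$ and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
s, t, N = 0.4, 1.0, 400_000
Ws = rng.normal(0.0, np.sqrt(s), N)            # W_s ~ N(0, s)
Wt = Ws + rng.normal(0.0, np.sqrt(t - s), N)   # add the independent increment W_t - W_s
cov = np.mean(Ws * Wt)                         # E[W_s W_t], since the means are zero
print(cov)                                     # close to min(s, t) = 0.4
```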
MICHAEL MONOYIOS
In this lemma, take $\mathcal{G} = \mathcal{F}_s$, $X = W_s$, $Y = W_{s+t} - W_s$, and $f(x,y) = h(x+y)$. Then define
$$g(x) := \mathbb{E}[h(W_{s+t} - W_s + x)] = \mathbb{E}[h(x + W_t)] \quad (\text{since } W_t \sim N(0,t) \text{ has the same distribution as } W_{s+t} - W_s) = \mathbb{E}^x[h(W_t)].$$
Then
$$\mathbb{E}[h(W_{s+t})|\mathcal{F}_s] = g(W_s) = \mathbb{E}^{W_s}[h(W_t)],$$
which is the Markov property.
Remark 7.6 (Strong Markov property*). In fact, Brownian motion has the strong Markov property
given below (though we do not prove this).
Fix $x \in \mathbb{R}$ and define the stopping time
$$\tau := \min\{t \geq 0 : W_t = x\}.$$
Then we have
$$\mathbb{E}[h(W_{\tau+t})|\mathcal{F}_\tau] = g(x) = \mathbb{E}^x[h(W_t)].$$
7.4. Quadratic variation of BM. This section is the most important one on BM, giving the crucial property that we shall use repeatedly, and which will form the bedrock of the Itô stochastic calculus to come.
Definition 7.7 ($p$th variation). Let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$, i.e.
$$0 = t_0 \leq t_1 \leq \ldots \leq t_n = t.$$
The mesh of the partition is defined to be
$$\|P\| = \max_{k=0,\ldots,n-1} |t_{k+1} - t_k|.$$
The $p$th variation of a function $f : \mathbb{R}_+ \to \mathbb{R}$ on an interval $[0,t]$, written $[f,f]_t^{(p)} \equiv [f]_t^{(p)}$, is defined by
$$[f]_t^{(p)} := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^p. \tag{7.1}$$
In particular, if $p = 1$ this is called the total variation (or the first variation) and if $p = 2$ it is called the quadratic variation.
7.4.1. First variation. Consider the first variation (or total variation), $[f]_t^{(1)}$, of a function $f$. Suppose $f$ is differentiable. Then the Mean Value Theorem¹ implies that in each subinterval $[t_k, t_{k+1}]$, there is a point $t_k^*$ such that
$$f(t_{k+1}) - f(t_k) = (t_{k+1} - t_k)f'(t_k^*).$$
Then
$$\sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)| = \sum_{k=0}^{n-1} |f'(t_k^*)|(t_{k+1} - t_k),$$
and so
$$[f]_t^{(1)} = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |f'(t_k^*)|(t_{k+1} - t_k) = \int_0^t |f'(s)| \, ds.$$
Thus, first variation measures the total amount of up and down motion of the path of $f$ over the interval $[0,t]$.
¹The Mean Value Theorem states that if $f$ is differentiable in $(a,b)$, then there is a point $x \in (a,b)$ at which $f(b) - f(a) = (b-a)f'(x)$.
Lemma 7.8. If $f$ is differentiable, then its quadratic variation vanishes: $[f]_t^{(2)} = 0$.
Proof. Using the Mean Value Theorem as above,
$$[f]_t^{(2)} = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^2 = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k)^2 \leq \lim_{\|P\|\to 0} \left( \|P\| \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k) \right) = \lim_{\|P\|\to 0} \|P\| \int_0^t |f'(s)|^2 \, ds = 0.$$
Theorem 7.9. For Brownian motion $W = (W_t)_{t\geq 0}$ we have
$$[W]_t = t, \quad t \geq 0,$$
or more precisely
$$\mathbb{P}\{\omega \in \Omega : [W]_t(\omega) = t\} = 1.$$
In particular, the paths of Brownian motion are not differentiable.
We shall first prove that the quadratic variation of Brownian motion over [0, t] is equal to t
in mean square. (We shall then prove that the result holds almost surely, but the almost sure
convergence proof is not examinable).
Recall that a sequence $(X_n)_{n\in\mathbb{N}}$ of random variables converges in mean square (or in $L^2(\Omega, \mathcal{F}, \mathbb{P})$) to a random variable $X$ if $\mathbb{E}[|X_n - X|^2] \to 0$ as $n \to \infty$, and converges to $X$ almost surely if $\mathbb{P}\{\omega \in \Omega : X_n(\omega) \to X(\omega)\} = 1$.
Proof of Theorem 7.9 I: convergence in $L^2$. Let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$. Set $D_k := W_{t_{k+1}} - W_{t_k}$ and define the sample quadratic variation
$$Q_P := \sum_{k=0}^{n-1} D_k^2.$$
Then
$$Q_P - t = \sum_{k=0}^{n-1} \left( D_k^2 - (t_{k+1} - t_k) \right).$$
We want to show that $\lim_{\|P\|\to 0}(Q_P - t) = 0$ in mean square. Consider an individual summand $D_k^2 - (t_{k+1} - t_k)$. This has expectation zero, so
$$\mathbb{E}[Q_P - t] = \sum_{k=0}^{n-1} \mathbb{E}[D_k^2 - (t_{k+1} - t_k)] = 0.$$
It remains to compute $\mathrm{var}(Q_P - t) = \mathbb{E}[(Q_P - t)^2]$.
Since the summands are independent,
$$\mathrm{var}(Q_P - t) = \sum_{k=0}^{n-1} \mathrm{var}\left[D_k^2 - (t_{k+1} - t_k)\right] = \sum_{k=0}^{n-1} \mathbb{E}\left[D_k^4 - 2(t_{k+1} - t_k)D_k^2 + (t_{k+1} - t_k)^2\right].$$
Using $\mathbb{E}[D_k^4] = 3(t_{k+1} - t_k)^2$ (the fourth moment of a centred normal is three times the variance squared), this gives
$$\mathrm{var}(Q_P - t) = 2\sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \leq 2\|P\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = 2\|P\| t.$$
Thus we have
$$\mathbb{E}[Q_P - t] = 0, \quad \mathrm{var}(Q_P - t) \leq 2\|P\| t \to 0 \text{ as } \|P\| \to 0,$$
so $Q_P \to t$ in $L^2$.
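The bound $\mathrm{var}(Q_P - t) \leq 2\|P\|t$ can be observed numerically (an illustration, not part of the notes; path counts and meshes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
t, N = 1.0, 10_000                 # N independent paths per mesh size
for n in (10, 100, 1000):
    dt = t / n                     # uniform mesh ||P|| = t/n
    dW = rng.normal(0.0, np.sqrt(dt), size=(N, n))
    Q = np.sum(dW ** 2, axis=1)    # sample QV per path
    print(n, Q.var(), 2 * dt * t)  # empirical variance vs bound 2||P||t
```

For a uniform mesh the variance equals $2t^2/n$ exactly, so the two printed numbers agree up to sampling error.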
Proof of Theorem 7.9 II: a.s. convergence*. To show that the convergence is also almost sure, consider the dyadic partition $t_k = kt/2^m$, $k = 0, 1, \ldots, 2^m$, i.e. we partition $[0,t]$ into $2^m$ intervals of width $t/2^m$, so that the mesh of the partition approaches zero as $m \to \infty$. Then the sample quadratic variation over $[0,t]$ may be written as
$$Q_m(t) := \sum_{k=0}^{2^m-1} \left( W_{(k+1)t/2^m} - W_{kt/2^m} \right)^2 =: \sum_{k=0}^{2^m-1} (\Delta W_k)^2.$$
As before, $\mathbb{E}[Q_m(t) - t] = 0$, and by independence of the increments
$$\mathrm{var}(Q_m(t) - t) = \sum_{k=0}^{2^m-1} \mathrm{var}\left((\Delta W_k)^2\right) = 2^m \cdot 2\left(\frac{t}{2^m}\right)^2 = \frac{2t^2}{2^m} \to 0, \text{ as } m \to \infty.$$
Therefore, since the limit of $Q_m(t)$ as $m \to \infty$ is $[W]_t$, we have established the mean square convergence
$$[W]_t = \lim_{m\to\infty} Q_m(t) = t, \quad \text{in } L^2.$$
Now we show almost sure convergence using the Chebyshev inequality and the Borel-Cantelli lemmas (see, for instance, Grimmett and Stirzaker [7], Section 7.3).² By Chebyshev's inequality we have, for $a > 0$,
$$\mathbb{P}\{|Q_m(t) - t| > a\} \leq \frac{1}{a^2}\mathbb{E}[(Q_m(t) - t)^2] = \frac{2t^2}{a^2 2^m}.$$
So
$$\mathbb{P}\{|Q_m(t) - t| > 1/m\} \leq m^2 \mathbb{E}[(Q_m(t) - t)^2] = \frac{2t^2 m^2}{2^m}.$$
Write $A_m = \{|Q_m(t) - t| > 1/m\}$, and consider the sequence of events $(A_m)_{m=1}^\infty$. Then $\sum_{m=1}^\infty \mathbb{P}(A_m) < \infty$, so by the Borel-Cantelli lemmas, the event that infinitely many of the $A_m$ occur has probability given by
$$\mathbb{P}\left(\limsup_{m\to\infty} A_m\right) = \mathbb{P}\left(\bigcap_{m=1}^\infty \bigcup_{k=m}^\infty A_k\right) = 0.$$
²Chebyshev's inequality follows from the following result, which is Theorem 7.3.1 in [7]. For a non-negative function $h$,
$$\mathbb{P}(h(X) \geq a) \leq \frac{\mathbb{E}[h(X)]}{a}, \quad a > 0.$$
Proof. Let $A := \{h(X) \geq a\}$. Then $h(X) \geq a\mathbf{1}_A$. Taking expectations gives the result.
Setting $h(x) = |x|$ gives Markov's inequality. Taking $h(x) = x^2$ gives Chebyshev's inequality: $\mathbb{P}(|X| \geq a) \leq \mathbb{E}[X^2]/a^2$.
The Borel-Cantelli lemmas (Theorem 7.3.10 in [7]) state:
Theorem 7.11 (Borel-Cantelli lemmas). Let $A_1, A_2, \ldots$ be an infinite sequence of events from some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $A$ be the event that infinitely many of the $A_n$ occur (written $\{A_n \text{ infinitely often}\} = \{A_n \text{ i.o.}\}$), given by
$$A := \{A_n \text{ i.o.}\} = \limsup_{n\to\infty} A_n = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k.$$
Then:
(1) $\mathbb{P}(A) = 0$ if $\sum_{n=1}^\infty \mathbb{P}(A_n) < \infty$;
(2) $\mathbb{P}(A) = 1$ if $\sum_{n=1}^\infty \mathbb{P}(A_n) = \infty$ and $A_1, A_2, \ldots$ are independent events.
Hence, with probability one, only finitely many of the events $A_m$ occur, so $Q_m(t) \to t$ almost surely.
7.5. Path length*. Given a continuous function $f : [0,t] \to \mathbb{R}$, its total variation over $[0,t]$ is, over partitions $P = \{0 = t_0 \leq t_1 \leq \ldots \leq t_n = t\}$ of $[0,t]$,
$$TV(f)_t \equiv [f]_t^{(1)} := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|.$$
This may be infinite, or some finite number, in which case we say that $f$ has bounded variation.
Consider an element of arc length $\Delta s_k$ along $f$ in the interval $[t_k, t_{k+1}]$. If this interval is small, we have $(\Delta s_k)^2 \approx (\Delta t_k)^2 + (\Delta f_k)^2$, where we have written $\Delta t_k = t_{k+1} - t_k$ and $\Delta f_k = f(t_{k+1}) - f(t_k)$. By the triangle inequality we have
$$|\Delta f_k| \leq \Delta s_k \leq |\Delta f_k| + \Delta t_k.$$
Denoting the total arc length (or path length) of $f$ over $[0,t]$ by $s(f)_t$ we therefore have, in the limit $\|P\| \to 0$,
$$TV(f)_t \leq s(f)_t \leq TV(f)_t + t.$$
Therefore,
$$\text{finite path length} \iff TV(f)_t < \infty.$$
In contrast, the quadratic variation of $f$ over $[0,t]$ is
$$[f]_t = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |\Delta f_k||\Delta f_k| \leq \lim_{\|P\|\to 0} \left( \max_{k=0,\ldots,n-1} |\Delta f_k| \right) \sum_{k=0}^{n-1} |\Delta f_k| = \lim_{\|P\|\to 0} \left( \max_{k=0,\ldots,n-1} |\Delta f_k| \right) TV(f)_t.$$
For any continuous function, $\lim_{\|P\|\to 0} (\max_{k=0,\ldots,n-1} |\Delta f_k|) \to 0$,³ so we conclude that
$$TV(f)_t < \infty \implies [f]_t = 0 \quad \text{for all } t \geq 0.$$
Taking the contrapositive, since $[W]_t = t > 0$, Brownian motion must have $TV(W)_t = \infty$. In other words, paths of Brownian motion $(W_s)_{0\leq s\leq t}$ over the interval $[0,t]$ have infinite path length.
Because the total variation of Brownian motion is infinite (i.e. Brownian paths are very long), one is not readily able to give meaning to integrals with respect to Brownian motion, $\int_0^t b_s \, dW_s$, via a path-by-path procedure. Thus we are led to a new type of integral, the Itô stochastic integral, which we shall describe shortly.
Remark 7.12 (Heuristics). If we (formally) write $dW_t$ for the infinitesimal (corresponding to the infinitesimal time interval $dt$) increase in $W_t$, then we have $\int_0^t dW_s \, dW_s = t$, which is often summarised by the formula
$$dW_t \, dW_t = dt,$$
or by
$$d[W]_t = dt.$$
Formally, note that if $dW_t \, dW_t = dt$, then in some sense $dW_t/dt \approx 1/\sqrt{dt} \to \infty$ as $dt \downarrow 0$. In other words, Brownian motion is nowhere differentiable, as we saw earlier.
3This is a standard theorem from real analysis, proven from compactness arguments.
Write
$$\Delta t_k := t_{k+1} - t_k, \quad k = 0, 1, \ldots, n-1.$$
We have that
$$\mathbb{E}[D_k^2] = \Delta t_k, \quad \mathrm{var}(D_k^2) = 2(\Delta t_k)^2.$$
It is tempting to argue that, because the variance of $D_k^2$ is much smaller than its mean, we have that for small $\Delta t_k$, $D_k^2 \approx \Delta t_k$. But this equation has no content: when $\Delta t_k$ is small, it would be true because both sides are near zero. A better way to capture what we think is going on might be to write
$$\frac{D_k^2}{\Delta t_k} \approx 1.$$
But this is never true either. The left hand side is the square of the standard normal random variable
$$Y_k := \frac{D_k}{\sqrt{\Delta t_k}} \sim N(0,1),$$
whose distribution is the same no matter how small we make $\Delta t_k$.
To better understand what is going on, for some large positive integer $n$, define $t_k := kt/n$, $k = 0, 1, \ldots, n$, so that $\Delta t_k = t/n$ for all $k = 0, 1, \ldots, n-1$. Then
$$D_k^2 = \frac{t Y_k^2}{n}, \quad k = 0, 1, \ldots, n-1.$$
The random variables $Y_0, Y_1, \ldots, Y_{n-1}$ are i.i.d., so the Law of Large Numbers implies that $\frac{1}{n}\sum_{k=0}^{n-1} Y_k^2$ converges to the common mean $\mathbb{E}[Y_k^2] = 1$ as $n \to \infty$, and hence $\sum_{k=0}^{n-1} D_k^2$ converges to $t$. Each of the terms $D_k^2$ in this sum can be quite different from its mean $\Delta t_k = t/n$, but when we sum many terms like this, the differences average out to zero.
The point is that although we write $dW_t \, dW_t = dt$ frequently, this has no rigorous mathematical meaning unless we consider the integrated relation
$$[W]_t := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} D_k^2 = t.$$
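The Law of Large Numbers viewpoint above is easy to see numerically (an illustration, not part of the notes; the horizon $t = 2$ and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
t = 2.0
for n in (10, 1_000, 100_000):
    Y = rng.standard_normal(n)   # Y_k = D_k / sqrt(t/n) ~ N(0,1), i.i.d.
    D2 = (t / n) * Y ** 2        # D_k^2 = (t/n) Y_k^2
    print(n, D2.sum())           # sum converges to t as n grows
```

Individual terms $D_k^2$ fluctuate wildly relative to their mean, but the sum settles down at $t$.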
In addition to this, we can compute the cross-variation of $W_t$ with $t$ and the quadratic variation of $t$, given by
$$\lim_{\|P\|\to 0} \sum_{k=0}^{n-1} D_k \Delta t_k = 0, \quad \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} (\Delta t_k)^2 = 0. \tag{7.3}$$
We know that the second of these limits is zero since $t$ is a differentiable function (see Lemma 7.8, or the argument below). To see that the first limit in (7.3) is zero, observe that
$$|D_k \Delta t_k| \leq \max_{0\leq j\leq n-1} |D_j| \, \Delta t_k,$$
and hence
$$\left| \sum_{k=0}^{n-1} D_k \Delta t_k \right| \leq \max_{0\leq j\leq n-1} |D_j| \cdot t \to 0 \text{ as } \|P\| \to 0,$$
by continuity of Brownian motion. For the second limit,
$$\sum_{k=0}^{n-1} (\Delta t_k)^2 \leq \|P\| \sum_{k=0}^{n-1} \Delta t_k = \|P\| t \to 0.$$
These limits are summarised by the informal rules
$$dW_t \, dt = 0, \quad dt \, dt = 0.$$
7.7. Other variations of Brownian motion. Consider the first variation (or total variation) of BM, denoted by
$$TV(W)_t := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |D_k|.$$
We have
$$t = [W]_t = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} D_k^2 \leq \lim_{\|P\|\to 0} \left( \max_{0\leq j\leq n-1} |D_j| \right) TV(W)_t,$$
and since $\max_{0\leq j\leq n-1} |D_j| \to 0$ by continuity of the paths, this forces $TV(W)_t = \infty$, as we found in Section 7.5. For the third variation,
$$[W,W]_t^{(3)} := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |D_k|^3,$$
we have:
Lemma 7.14.
$$[W,W]_t^{(3)} = 0, \quad t \geq 0.$$
Proof.
$$[W,W]_t^{(3)} := \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} |D_k|^3 \leq [W]_t \lim_{\|P\|\to 0} \max_{0\leq j\leq n-1} |D_j| = 0. \qquad \Box$$
7.8. Lévy's characterisation of Brownian motion. BM $W$ is a martingale with continuous paths whose quadratic variation is $[W]_t = t$. In fact, this is a complete characterisation of BM, given in the following theorem (we give a sketch of the proof in Problem Sheet 2).
Theorem 7.15 (Lévy's theorem, 1-dimensional). Let $M$ be a martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M]_t = t$ for all $t \geq 0$. Then $M$ is a BM.
8. The Itô integral
We consider how to define an integral with respect to Brownian motion. The probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (with $\mathbb{F} = (\mathcal{F}_t)_{t\geq 0}$ the filtration generated by Brownian motion) is given, and always lurks in the background, even when not explicitly mentioned.
We want to construct the Itô integral, which we write as
$$I_t = \int_0^t b_s \, dW_s, \quad t \geq 0.$$
The integrator is Brownian motion, $(W_t)_{t\geq 0}$, with associated filtration $(\mathcal{F}_t)_{t\geq 0}$ and the following properties:
This won't work when the integrator is Brownian motion, because the paths of Brownian motion are not differentiable.
8.1. Itô integral of an elementary integrand. For some fixed $T > 0$ let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,T]$:
$$0 = t_0 \leq t_1 \leq \ldots \leq t_n = T.$$
Assume that $b$ is constant on each interval $[t_k, t_{k+1})$, so that for $t \in [t_k, t_{k+1})$, $b_t = b_{t_k}$. We call such a process $b$ an elementary process, or a simple process. The Itô integral $I_t$ of such a process is defined as follows.
Definition 8.2 (Itô integral of elementary process). Suppose $t \in [0,T]$ is such that $t_k \leq t < t_{k+1}$, for some $k \in \{0, 1, \ldots, n-1\}$. Then the Itô integral $I_t = \int_0^t b_s \, dW_s$ of the elementary process $b$ is defined by
$$I_t := \int_0^t b_s \, dW_s := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad t \geq 0.$$
Note that we can let $T > 0$ in the above definition be arbitrarily large and in this way construct the integral $I_t$ for any $t \geq 0$, yielding the process $(I_t)_{t\geq 0}$.
8.1.1. Properties of the Itô integral of an elementary process.
Property 8.3 (Adaptedness). For each $t \geq 0$, $I_t$ is $\mathcal{F}_t$-measurable.
Property 8.4 (Linearity). With
$$I_t = \int_0^t b_s \, dW_s, \quad J_t = \int_0^t a_s \, dW_s,$$
then for $\alpha, \beta \in \mathbb{R}$,
$$\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s) \, dW_s.$$
Property 8.5 (Martingale property). The process
$$I_t = \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad t \geq 0,$$
is a $(\mathbb{P}, \mathbb{F})$-martingale.
Proof. Let $0 \leq s \leq t$ be given. We treat the general case that $s$ and $t$ are in different subintervals. That is, there are partition points $t_\ell$ and $t_k$ such that $s \in [t_\ell, t_{\ell+1})$ and $t \in [t_k, t_{k+1})$.
Write
$$I_t = \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell}) + \sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}). \tag{8.1}$$
The first sum is $\mathcal{F}_s$-measurable, so
$$\mathbb{E}\left[\sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\Big|\, \mathcal{F}_s\right] = \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}),$$
while
$$\mathbb{E}\left[b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell})|\mathcal{F}_s\right] = b_{t_\ell}\left(\mathbb{E}[W_{t_{\ell+1}}|\mathcal{F}_s] - W_{t_\ell}\right) = b_{t_\ell}(W_s - W_{t_\ell}).$$
These are the conditional expectations of the first two terms on the RHS of (8.1). They add up to $I_s$ and so contribute this to $\mathbb{E}[I_t|\mathcal{F}_s]$. We show that the third and fourth terms contribute zero: by the tower property,
$$\mathbb{E}\left[\sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\Big|\, \mathcal{F}_s\right] = \mathbb{E}\left[\sum_{j=\ell+1}^{k-1} b_{t_j}\left(\mathbb{E}[W_{t_{j+1}}|\mathcal{F}_{t_j}] - W_{t_j}\right) \,\Big|\, \mathcal{F}_s\right] = 0,$$
and
$$\mathbb{E}\left[b_{t_k}(W_t - W_{t_k})|\mathcal{F}_s\right] = \mathbb{E}\left[b_{t_k}\left(\mathbb{E}[W_t|\mathcal{F}_{t_k}] - W_{t_k}\right)|\mathcal{F}_s\right] = 0. \qquad \Box$$
Property 8.7 (The Itô isometry). Because $(I_t)_{t\geq 0}$ is a martingale and $I_0 = 0$ we have $\mathbb{E}[I_t] = 0$ for all $t \geq 0$. It follows that $\mathrm{var}(I_t) = \mathbb{E}[I_t^2]$, a quantity given by the formula in the next theorem.
Theorem 8.8 (Itô isometry). The Itô integral of the elementary process $b$, defined by
$$I_t := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \tag{8.2}$$
satisfies
$$\mathbb{E}[I_t^2] = \mathbb{E}\left[\int_0^t b_s^2 \, ds\right], \quad t \geq 0.$$
Proof. Set $D_j := W_{t_{j+1}} - W_{t_j}$ for $j = 0, \ldots, k-1$, and $D_k := W_t - W_{t_k}$, so that $I_t = \sum_{j=0}^k b_{t_j} D_j$ and
$$I_t^2 = \sum_{j=0}^k b_{t_j}^2 D_j^2 + 2\sum_{0\leq i<j\leq k} b_{t_i} b_{t_j} D_i D_j.$$
We first show that the expected value of the cross terms is zero. For $i < j$, the random variable $b_{t_i} b_{t_j} D_i$ is $\mathcal{F}_{t_j}$-measurable, while the Brownian increment $D_j$ is independent of $\mathcal{F}_{t_j}$, so $\mathbb{E}[D_j|\mathcal{F}_{t_j}] = \mathbb{E}[D_j] = 0$. Therefore,
$$\mathbb{E}[b_{t_i} b_{t_j} D_i D_j] = \mathbb{E}\left[\mathbb{E}[b_{t_i} b_{t_j} D_i D_j|\mathcal{F}_{t_j}]\right] = \mathbb{E}\left[b_{t_i} b_{t_j} D_i \mathbb{E}[D_j|\mathcal{F}_{t_j}]\right] = 0.$$
Now consider the square terms $b_{t_j}^2 D_j^2$. The random variable $b_{t_j}^2$ is $\mathcal{F}_{t_j}$-measurable, while the squared Brownian increment $D_j^2$ is independent of $\mathcal{F}_{t_j}$, so $\mathbb{E}[D_j^2|\mathcal{F}_{t_j}] = \mathbb{E}[D_j^2] = t_{j+1} - t_j$, for $j = 0, \ldots, k-1$, and $\mathbb{E}[D_k^2|\mathcal{F}_{t_k}] = \mathbb{E}[D_k^2] = t - t_k$. Therefore,
$$\mathbb{E}[I_t^2] = \sum_{j=0}^k \mathbb{E}[b_{t_j}^2 D_j^2] = \sum_{j=0}^k \mathbb{E}\left[\mathbb{E}[b_{t_j}^2 D_j^2|\mathcal{F}_{t_j}]\right] = \sum_{j=0}^k \mathbb{E}\left[b_{t_j}^2 \mathbb{E}[D_j^2]\right] = \mathbb{E}\left[\sum_{j=0}^{k-1} b_{t_j}^2(t_{j+1} - t_j) + b_{t_k}^2(t - t_k)\right].$$
But $b_{t_j}$ is constant on $[t_j, t_{j+1})$, so $b_{t_j}^2(t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2 \, ds$ and similarly $b_{t_k}^2(t - t_k) = \int_{t_k}^t b_s^2 \, ds$, so
$$\mathbb{E}[I_t^2] = \mathbb{E}\left[\sum_{j=0}^{k-1} \int_{t_j}^{t_{j+1}} b_s^2 \, ds\right] + \mathbb{E}\left[\int_{t_k}^t b_s^2 \, ds\right] = \mathbb{E}\left[\int_0^t b_s^2 \, ds\right]. \qquad \Box$$
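The isometry is easy to test by simulation (an illustration, not part of the notes; the partition and integrand values are arbitrary). For a deterministic step function $b$, $\mathbb{E}[I_t^2]$ should match $\int_0^t b_s^2\,ds = \sum_j b_{t_j}^2 \Delta t_j$:

```python
import numpy as np

rng = np.random.default_rng(5)
t_grid = np.array([0.0, 0.25, 0.5, 1.0])
b_vals = np.array([0.5, 1.5, -1.0])            # step integrand
N = 300_000
dW = rng.normal(0.0, np.sqrt(np.diff(t_grid)), size=(N, 3))
I = dW @ b_vals                                # Ito integral per path
lhs = np.mean(I ** 2)                          # Monte Carlo E[I_1^2]
rhs = np.sum(b_vals ** 2 * np.diff(t_grid))    # integral of b^2 for a step function
print(lhs, rhs)                                # the two agree
```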
Property 8.9 (Quadratic variation of the integral). The quadratic variation of the integral is the quadratic variation process $([I]_t)_{t\geq 0}$ of the integral process $I = (I_t)_{t\geq 0}$. For Brownian motion we may write $W_t = \int_0^t 1 \, dW_s$ and this has quadratic variation $[W]_t = \int_0^t 1^2 \, d[W]_s = \int_0^t 1 \, ds = t$. We say that Brownian motion accumulates quadratic variation at the rate of one per unit time. In the Itô integral $I_t = \int_0^t b_s \, dW_s$, BM is scaled in a time- and path-dependent way (depending on $(s,\omega) \in [0,t] \times \Omega$) by the integrand $b_s$. Because increments are squared in the computation of quadratic variation, the QV of BM will be scaled by $b_s^2$ as it enters the integral. The following theorem gives the precise statement.
Theorem 8.10 (Quadratic variation of the Itô integral). Let $b$ be a simple process. Then the Itô integral
$$I_t = \int_0^t b_s \, dW_s, \quad t \geq 0,$$
has quadratic variation
$$[I]_t = \int_0^t b_s^2 \, ds, \quad t \geq 0.$$
Proof (sketch). Consider first the QV accumulated over one subinterval $[t_j, t_{j+1}]$, on which $b_s = b_{t_j}$ is constant. Partition $[t_j, t_{j+1}]$ by points $t_j = s_0 < s_1 < \ldots < s_m = t_{j+1}$. Then
$$\sum_{i=0}^{m-1} (I_{s_{i+1}} - I_{s_i})^2 = b_{t_j}^2 \sum_{i=0}^{m-1} (W_{s_{i+1}} - W_{s_i})^2. \tag{8.3}$$
As $m \to \infty$ and the mesh of the partition, $\max_{i=0,\ldots,m-1}(s_{i+1} - s_i)$, approaches zero, the term $\sum_{i=0}^{m-1}(W_{s_{i+1}} - W_{s_i})^2$ converges to the QV accumulated by BM over $[t_j, t_{j+1})$, which is $t_{j+1} - t_j$. Therefore, the limit of the RHS of (8.3), which is the QV accumulated by the integral over $[t_j, t_{j+1})$, is
$$b_{t_j}^2(t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2 \, ds,$$
where we have used the fact that $b_s$ is constant for $s \in [t_j, t_{j+1})$. Similarly, the QV accumulated by the integral over $[t_k, t]$ is $\int_{t_k}^t b_s^2 \, ds$. Adding up all these contributions proves the theorem. $\Box$
Informally, we establish the theorem in differential form via
$$dI_t = b_t \, dW_t \implies d[I]_t = dI_t \, dI_t = b_t^2 \, dW_t \, dW_t = b_t^2 \, d[W]_t = b_t^2 \, dt,$$
just as we wrote $d[W]_t = dW_t \, dW_t = dt$ earlier. In fact, one can do a lot of the calculations in Itô calculus simply by applying the informal multiplication rules:
$$dW_t \, dW_t = dt, \quad dW_t \, dt = dt \, dt = 0.$$
Remark 8.11. Note the contrast between Theorems 8.8 and 8.10. The QV $[I]_t$ is computed path-by-path, so the result can depend on the path, and so in principle is random. The variance of the integral is precisely the expectation of the QV, as given by the Itô isometry (i.e. it is an average over all possible paths of the QV), and so is non-random.
8.2. Itô integral of a general integrand*. It turns out that one can construct the Itô integral for more general integrands, and the resulting integral inherits the same properties as the Itô integral of an elementary integrand. We will take these properties as given and use them freely from now on, but we do not prove them, and the general theory of Itô integration for general adapted integrands (so proofs of all the results in this sub-section) is not examinable.
Fix $t > 0$. Let $b$ be a process (not necessarily an elementary process) such that
$b_s$ is $\mathcal{F}_s$-measurable, $s \in [0,t]$;
$\mathbb{E}\left[\int_0^t b_s^2 \, ds\right] < \infty$.
Proof. See [17], Section 4.3, [15], Section 3.1, or [12], Section 3.2 and Problem 3.2.5 in [12].
We have shown how to define
$$I_t^{(n)} = \int_0^t b_s^{(n)} \, dW_s,$$
for an approximating sequence $(b^{(n)})_{n=1}^\infty$ of elementary processes, and the Itô integral of $b$ is defined as the limit of the $I_t^{(n)}$.
The only difficulty with this approach is that we need to make sure the above limit exists. Suppose $m$ and $n$ are large positive integers. Then
$$\mathbb{E}\left[|I_t^{(n)} - I_t^{(m)}|^2\right] = \mathrm{var}(I_t^{(n)} - I_t^{(m)}) = \mathbb{E}\left[\left(\int_0^t (b_s^{(n)} - b_s^{(m)}) \, dW_s\right)^2\right]$$
$$(\text{Itô isometry}) \quad = \mathbb{E}\left[\int_0^t (b_s^{(n)} - b_s^{(m)})^2 \, ds\right]$$
$$(\text{triangle inequality}) \quad \leq \mathbb{E}\left[\int_0^t \left(|b_s^{(n)} - b_s| + |b_s - b_s^{(m)}|\right)^2 ds\right]$$
$$\left((a+b)^2 \leq 2(a^2 + b^2)\right) \quad \leq 2\mathbb{E}\left[\int_0^t |b_s^{(n)} - b_s|^2 \, ds\right] + 2\mathbb{E}\left[\int_0^t |b_s^{(m)} - b_s|^2 \, ds\right],$$
which approaches zero as $m, n \to \infty$, by Theorem 8.12. This guarantees that the sequence $(I_t^{(n)})_{n=1}^\infty$ is a Cauchy sequence in $L^2(\Omega, \mathcal{F}, \mathbb{P})$ and so has a limit.
8.2.1. Properties of the general Itô integral. The general Itô integral is
$$I_t = \int_0^t b_s \, dW_s,$$
where $b$ is any adapted, square-integrable process. Its properties are inherited from the properties of Itô integrals of simple processes and are summarised below. You are expected to know these properties as being inherited from the properties of the integral of elementary processes, but are not required to be able to prove any of them in this general case.
Property 8.13 (Adaptedness). For each $t \geq 0$, $I_t$ is $\mathcal{F}_t$-measurable.
Property 8.14 (Linearity). If
$$I_t = \int_0^t b_s \, dW_s, \quad J_t = \int_0^t a_s \, dW_s,$$
then for $\alpha, \beta \in \mathbb{R}$,
$$\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s) \, dW_s.$$
Example 8.19. Consider
$$I_t = \int_0^t W_s \, dW_s.$$
We approximate the integrand by an elementary process $b_s^{(n)}$, $s \in [0,t]$, in the following way. Partition the interval $[0,t]$ into $n$ time intervals of length $\Delta t = t/n$, so that
$$0 = t_0 < t_1 = \Delta t = \frac{t}{n} < \ldots < t_k = k\Delta t = \frac{kt}{n} < \ldots < t_n = t,$$
and define $b_s^{(n)}$ by
$$b_s^{(n)} = W_{t_k} = W_{kt/n}, \quad \text{if } \frac{kt}{n} \leq s < \frac{(k+1)t}{n}, \quad k = 0, \ldots, n-1.$$
Then by definition
$$I_t = \int_0^t W_s \, dW_s = \lim_{n\to\infty} \sum_{k=0}^{n-1} W_{kt/n}\left(W_{(k+1)t/n} - W_{kt/n}\right).$$
Writing $W_k$ as shorthand for $W_{kt/n}$, this reads
$$\int_0^t W_s \, dW_s = \lim_{n\to\infty} \sum_{k=0}^{n-1} W_k(W_{k+1} - W_k).$$
Expanding the squares, we obtain
$$\sum_{k=0}^{n-1} W_k(W_{k+1} - W_k) = \frac{1}{2}\left[\sum_{k=0}^{n-1} \left(W_{k+1}^2 - W_k^2\right) - \sum_{k=0}^{n-1} (W_{k+1} - W_k)^2\right] = \frac{1}{2}\left(W_n^2 - \sum_{k=0}^{n-1} (W_{k+1} - W_k)^2\right), \quad (W_0 = 0)$$
where the first sum telescopes. As $n \to \infty$ the remaining sum of squared increments converges to the quadratic variation $[W]_t = t$, so
$$\int_0^t W_s \, dW_s = \frac{1}{2}(W_t^2 - t).$$
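A direct numerical check of this computation (an illustration, not part of the notes; the grid size and seed are arbitrary): the left-endpoint Riemann sum approaches $\frac{1}{2}(W_t^2 - t)$ on the same path.

```python
import numpy as np

rng = np.random.default_rng(6)
t, n = 1.0, 100_000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))     # path values W_0, ..., W_n
ito_sum = np.sum(W[:-1] * dW)                  # left-endpoint (Ito) sum of W dW
print(ito_sum, 0.5 * (W[-1] ** 2 - t))         # the two agree closely
```

Using the right endpoint instead would add the quadratic variation and give a different limit, which is exactly why the evaluation point matters in stochastic integration.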
Remark 8.20 (Reason for the $\frac{1}{2}t$ term). If $f$ is a differentiable function with $f(0) = 0$, then
$$\int_0^t f(s) \, df(s) = \int_0^t f(s)f'(s) \, ds = \frac{1}{2}f^2(s)\Big|_0^t = \frac{1}{2}f^2(t).$$
In contrast, for Brownian motion, we have
$$\int_0^t W_s \, dW_s = \frac{1}{2}(W_t^2 - t).$$
The extra term $\frac{1}{2}t$ comes from the nonzero quadratic variation of Brownian motion. It has to be there, because $\mathbb{E}[\int_0^t W_s \, dW_s] = 0$ (the Itô integral is a martingale), but $\mathbb{E}[\frac{1}{2}W_t^2] = \frac{1}{2}t$.
8.3. Itô and martingale representation theorems for Brownian motion*. For adapted integrands $b$ satisfying $\mathbb{E}[\int_0^t b_s^2 \, ds] < \infty$ we have that the Itô integral $\int_0^t b_s \, dW_s$ is a martingale. There is also a converse result, known as the martingale representation theorem (which we do not prove; see [15] for example).
Theorem 8.21 (Itô representation theorem for Brownian motion). Let $(W_t)_{t\geq 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\geq 0}, \mathbb{P})$, with $(\mathcal{F}_t)_{t\geq 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \leq s \leq t)$. Suppose that $X \in L^2(\Omega, \mathcal{F}_t, \mathbb{P})$ (i.e. $X$ is $\mathcal{F}_t$-measurable and $\mathbb{E}[X^2] < \infty$). Then there exists an adapted process $b$ such that $\mathbb{E}\left[\int_0^t b_s^2 \, ds\right] < \infty$, $t \geq 0$, and
$$X = \mathbb{E}[X] + \int_0^t b_s \, dW_s.$$
Theorem 8.22 (Martingale representation theorem for Brownian motion). Let $(W_t)_{t\geq 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\geq 0}, \mathbb{P})$, with $(\mathcal{F}_t)_{t\geq 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \leq s \leq t)$. Suppose that the process $M = (M_t)_{t\geq 0}$ is a square-integrable martingale with respect to this filtration, written $M \in \mathcal{M}^2$ (that is, $M_t \in L^2(\Omega, \mathcal{F}_t, \mathbb{P})$ for all $t \geq 0$, or $\mathbb{E}[M_t^2] < \infty$, for all $t \geq 0$). Then there exists an adapted process $b$ such that $\mathbb{E}\left[\int_0^t b_s^2 \, ds\right] < \infty$, $t \geq 0$, and
$$M_t = M_0 + \int_0^t b_s \, dW_s.$$
9. The Itô formula
9.1. Itô's formula for one Brownian motion. We want a rule to differentiate expressions of the form $f(W_t)$. If the paths of $W_t$ were differentiable then the ordinary chain rule would give
$$\frac{d}{dt}f(W_t) = f'(W_t)W_t',$$
which could be written in differential notation as
$$df(W_t) = f'(W_t)W_t' \, dt = f'(W_t) \, dW_t.$$
However, $W_t$ is not differentiable, and in particular has nonzero quadratic variation, so the correct formula has an extra term, namely,
$$df(W_t) = f'(W_t) \, dW_t + \frac{1}{2}f''(W_t) \, d[W]_t,$$
with the understanding that $d[W]_t = dt$. This is a version of Itô's formula in differential form. Integrating this, we obtain a version of Itô's formula in integral form.
Theorem 9.1 (Itô formula for one BM). If $f(x)$ is a $C^2(\mathbb{R})$ function and $t \geq 0$, then
$$f(W_t) - f(W_0) = \int_0^t f'(W_s) \, dW_s + \frac{1}{2}\int_0^t f''(W_s) \, d[W]_s. \tag{9.1}$$
Remark 9.2 (Differential versus integral forms). The mathematically meaningful form of Itô's formula is its integral form, because we have solid definitions for the integrals appearing on the RHS of (9.1). For pencil and paper computations, the more convenient form is the differential form.
Proof of Theorem 9.1. Fix $t > 0$ and let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$. By Taylor's theorem we have
$$f(W_t) - f(W_0) = \sum_{k=0}^{n-1} \left[f(W_{t_{k+1}}) - f(W_{t_k})\right] = \sum_{k=0}^{n-1} \left[f'(W_{t_k})(W_{t_{k+1}} - W_{t_k}) + \frac{1}{2}f''(W_{t_k})(W_{t_{k+1}} - W_{t_k})^2 + \text{higher order terms}\right]$$
$$\xrightarrow{\|P\|\to 0} \int_0^t f'(W_s) \, dW_s + \frac{1}{2}\int_0^t f''(W_s) \, d[W]_s,$$
with higher order terms disappearing (since the third variation of BM is zero by Lemma 7.14 and $|\sum_{k=0}^{n-1} D_k^3| \leq \sum_{k=0}^{n-1} |D_k|^3$, where $D_k := W_{t_{k+1}} - W_{t_k}$), and the second-order summation converging to the integral $\int_0^t f''(W_s) \, d[W]_s$, since it becomes the quadratic variation of an Itô integral.
That is, for the Itô integral
$$I_t = \int_0^t b_s \, dW_s = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} b_{t_k}(W_{t_{k+1}} - W_{t_k}),$$
we have
$$\int_0^t b_s^2 \, ds = [I]_t = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} (I_{t_{k+1}} - I_{t_k})^2 = \lim_{\|P\|\to 0} \sum_{k=0}^{n-1} b_{t_k}^2 (W_{t_{k+1}} - W_{t_k})^2. \qquad \Box$$
A heuristic derivation would simply state that, by Taylor's theorem,
$$df(W_t) = f'(W_t) \, dW_t + \frac{1}{2}f''(W_t) \, dt,$$
where we have used $dW_t \, dW_t = dt$ in the last term on the RHS, and higher order terms are neglected.
Example 9.3. Applying the Itô formula to $f(x) = x^2$ we have
$$d(W_t^2) = 2W_t \, dW_t + d[W]_t = 2W_t \, dW_t + dt.$$
Integrating over $[0,t]$ and re-arranging we get
$$\int_0^t W_s \, dW_s = \frac{1}{2}(W_t^2 - t), \quad t \geq 0,$$
which reproduces the result obtained in Example 8.19 from first principles.
Corollary 9.4 (Itô formula for a function of time and one Brownian motion). If $S_t = f(t, W_t)$ for some $C^{1,2}(\mathbb{R}_+ \times \mathbb{R})$ function $f(t,x)$, then
$$dS_t = df(t, W_t) = f_t(t, W_t) \, dt + f_x(t, W_t) \, dW_t + \frac{1}{2}f_{xx}(t, W_t) \, d[W]_t,$$
and higher order terms do not contribute, since we have shown earlier in Section 7.6 that we have the informal rules $dW_t \, dt = 0$ and $dt \, dt = 0$.
Definition 9.5 (Geometric Brownian motion). Geometric Brownian motion is the process $S = (S_t)_{t\geq 0}$ given by
$$S_t = S_0 \exp\left(\sigma W_t + \left(\mu - \frac{1}{2}\sigma^2\right)t\right),$$
where $\mu$ and $\sigma > 0$ are constant, and the parameter $\sigma$ is called the volatility of the process $S$.
Define
$$f(t,x) = S_0 \exp\left(\sigma x + \left(\mu - \frac{1}{2}\sigma^2\right)t\right),$$
so that $S_t = f(t, W_t)$. Then by Corollary 9.4,
$$dS_t = df(t, W_t) = f_t(t, W_t) \, dt + f_x(t, W_t) \, dW_t + \frac{1}{2}f_{xx}(t, W_t) \, dt = \left(\mu - \frac{1}{2}\sigma^2\right)f(t, W_t) \, dt + \sigma f(t, W_t) \, dW_t + \frac{1}{2}\sigma^2 f(t, W_t) \, dt = \left(\mu - \frac{1}{2}\sigma^2\right)S_t \, dt + \sigma S_t \, dW_t + \frac{1}{2}\sigma^2 S_t \, dt = \mu S_t \, dt + \sigma S_t \, dW_t,$$
which is geometric Brownian motion in differential form. Geometric Brownian motion in integral form may be written as
$$S_t = S_0 + \int_0^t \mu S_s \, ds + \int_0^t \sigma S_s \, dW_s, \quad t \geq 0.$$
9.1.1. Quadratic variation of geometric Brownian motion. In the integral form of geometric Brownian motion,
$$S_t = S_0 + \int_0^t \mu S_s \, ds + \int_0^t \sigma S_s \, dW_s,$$
the Riemann integral
$$F(t) = \int_0^t \mu S_s \, ds$$
is differentiable with $F'(t) = \mu S_t$. This term has zero quadratic variation. The Itô integral
$$G(t) = \int_0^t \sigma S_s \, dW_s$$
has quadratic variation $[G]_t = \int_0^t \sigma^2 S_s^2 \, ds$ by Theorem 8.10, and this is the quadratic variation of $S$:
$$[S]_t = \int_0^t \sigma^2 S_s^2 \, ds.$$
This may also be obtained using the informal multiplication rules involving the differentials $dt$ and $dW_t$:
$$d[S]_t = dS_t \, dS_t = (\mu S_t \, dt + \sigma S_t \, dW_t)^2 = \sigma^2 S_t^2 \, dt,$$
using $d[W]_t = dW_t \, dW_t = dt$ and $dW_t \, dt = dt \, dt = 0$.
9.2. Itô's formula for Itô processes.
Definition 9.7 (Itô process). Let $(W_t, \mathcal{F}_t)_{t\geq 0}$ be a standard Brownian motion. An Itô process is a stochastic process of the form
$$X_t = X_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s, \quad t \geq 0, \tag{9.2}$$
where $X_0$ is non-random and $a, b$ are adapted stochastic processes satisfying $\mathbb{E}\left[\int_0^t b_s^2 \, ds\right] < \infty$ and $\int_0^t |a_s| \, ds < \infty$ a.s., for all $t \geq 0$.
Lemma 9.8. The quadratic variation of the Itô process (9.2) is
$$[X]_t = \int_0^t b_s^2 \, ds.$$
Proof. This is immediate from the fact that the quadratic variation of $\int_0^t a_s \, ds$ is zero. $\Box$
Definition 9.9 (Integral with respect to an Itô process). Let $(X_t)_{t\geq 0}$ be the Itô process (9.2) and let $(\Gamma_t)_{t\geq 0}$ be an adapted process satisfying
$$\mathbb{E}\left[\int_0^t \Gamma_s^2 b_s^2 \, ds\right] < \infty, \quad \int_0^t |\Gamma_s a_s| \, ds < \infty \text{ a.s.}$$
Define the integral of $\Gamma$ with respect to $X$ by
$$\int_0^t \Gamma_s \, dX_s := \int_0^t \Gamma_s a_s \, ds + \int_0^t \Gamma_s b_s \, dW_s,$$
a process whose quadratic variation is $\int_0^t \Gamma_s^2 b_s^2 \, ds$.
It is usually easier to remember and use this theorem in the differential form
$$df(t, X_t) = f_t(t, X_t) \, dt + f_x(t, X_t) \, dX_t + \frac{1}{2}f_{xx}(t, X_t) \, d[X]_t,$$
where $d[X]_t = dX_t \, dX_t$ is computed according to the rules
$$dt \, dt = dt \, dW_t = dW_t \, dt = 0, \quad dW_t \, dW_t = dt.$$
Example 9.11 (Generalised geometric Brownian motion). Define the Itô process
$$X_t = \int_0^t \sigma_s \, dW_s + \int_0^t \left(\mu_s - \frac{1}{2}\sigma_s^2\right) ds, \quad t \geq 0,$$
where $\mu, \sigma$ are adapted processes. Then
$$dX_t = \sigma_t \, dW_t + \left(\mu_t - \frac{1}{2}\sigma_t^2\right) dt.$$
In particular, with $\mu \equiv 0$, the process $Z_t := e^{X_t}$ satisfies, by the Itô formula,
$$Z_t = 1 + \int_0^t \sigma_s Z_s \, dW_s, \quad t \geq 0,$$
so that $Z$ is a martingale provided
$$\mathbb{E}\left[\int_0^T \sigma_t^2 Z_t^2 \, dt\right] < \infty.$$
Remark 9.13 (Novikov condition*). It can be shown (though outside the scope of this course) that a sufficient condition on $\sigma$ alone for $Z$ to be a martingale is the Novikov condition
$$\mathbb{E}\left[\exp\left(\frac{1}{2}\int_0^T \sigma_t^2 \, dt\right)\right] < \infty.$$
In particular, $Z$ will be a martingale if $\sigma$ is bounded (for instance, a constant).
or in differential form
$$dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t, \quad t \geq 0, \tag{9.4}$$
for well-behaved (see Remark 9.14 further below) functions $a(t,x)$, $b(t,x)$. Then (9.4) (or equivalently (9.3)) is called a stochastic differential equation (SDE) for $X$, and the process $X$ is Markovian (though we do not prove this here):
$$\mathbb{E}[h(X_T)|\mathcal{F}_t] = \mathbb{E}[h(X_T)|X_t], \quad 0 \leq t \leq T.$$
The basic existence result is as follows. Suppose there is a constant $K$ such that for all $x, y, t$ we have
$$|a(t,x) - a(t,y)| \leq K|x-y|, \quad |b(t,x) - b(t,y)| \leq K|x-y|, \quad |a(t,x)| + |b(t,x)| \leq K(1 + |x|).$$
(The first two conditions are Lipschitz continuity in $x$.) Then the SDE (9.5) has a unique, adapted, continuous Markovian solution, and there exists a constant $C$ such that
$$\mathbb{E}[|X_t|^2] \leq Ce^{Ct}(1 + |x|^2).$$
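Under these conditions, SDEs are routinely approximated by the Euler (Euler-Maruyama) scheme, $X_{k+1} = X_k + a(X_k)\Delta t + b(X_k)\sqrt{\Delta t}\,\xi_k$ with $\xi_k \sim N(0,1)$. The sketch below (not part of the notes) uses the Ornstein-Uhlenbeck coefficients $a(x) = -x$, $b(x) = 1$ purely as an example of Lipschitz, linearly growing coefficients; with $X_0 = 0$ the exact variance at time $t$ is $(1 - e^{-2t})/2$.

```python
import numpy as np

rng = np.random.default_rng(8)

def euler_maruyama_paths(a, b, x0, t, n_steps, n_paths, rng):
    # Vectorised Euler scheme: advance n_paths independent paths together.
    dt = t / n_steps
    X = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        X = X + a(X) * dt + b(X) * rng.normal(0.0, np.sqrt(dt), size=n_paths)
    return X

XT = euler_maruyama_paths(lambda x: -x, lambda x: np.ones_like(x),
                          0.0, 1.0, 200, 50_000, rng)
print(XT.var())   # close to (1 - e^{-2})/2 ~ 0.432 for this example
```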
for a function $h(x)$ such that the above expectations are defined. A consequence of the Markov property is that the right-hand-side of (9.7) is a function of $(t, X_t)$ only. Write
$$v(t,x) := \mathbb{E}[h(X_T)|X_t = x], \quad 0 \leq t \leq T. \tag{9.8}$$
Assume that $v \in C^{1,2}([0,T] \times \mathbb{R})$ and that the process defined by $Y_t := v(t, X_t)$ is integrable ($\mathbb{E}[|Y_t|] < \infty$).
Lemma 9.15. The process $Y = (Y_t)_{0\leq t\leq T}$ defined by $Y_t := v(t, X_t)$ is a martingale.
Proof. By the Markov property, we have $Y_t = \mathbb{E}[h(X_T)|X_t] = \mathbb{E}[h(X_T)|\mathcal{F}_t]$. Then, for $0 \leq s \leq t \leq T$, by the tower property,
$$\mathbb{E}[Y_t|\mathcal{F}_s] = \mathbb{E}\left[\mathbb{E}[h(X_T)|\mathcal{F}_t]\,|\,\mathcal{F}_s\right] = \mathbb{E}[h(X_T)|\mathcal{F}_s] = \mathbb{E}[h(X_T)|X_s] = v(s, X_s) = Y_s. \qquad \Box$$
Theorem 9.16 (Feynman-Kac). The function $v$ defined by (9.8) solves the PDE
$$v_t(t,x) + a(t,x)v_x(t,x) + \frac{1}{2}b^2(t,x)v_{xx}(t,x) = 0, \quad v(T,x) = h(x). \tag{9.9}$$
Proof. By the Itô formula
$$dY_t = dv(t, X_t) = \left[v_t(t, X_t) + a(t, X_t)v_x(t, X_t) + \frac{1}{2}b^2(t, X_t)v_{xx}(t, X_t)\right] dt + b(t, X_t)v_x(t, X_t) \, dW_t.$$
Since $Y$ is a martingale the coefficient of the $dt$ term must be zero for all $(t, X_t)$, and (9.9) follows. $\Box$
Note that the PDE (9.9) may be written
$$v_t(t,x) + \mathcal{A}v(t,x) = 0, \quad v(T,x) = h(x), \tag{9.10}$$
where $\mathcal{A}$ is the generator of the diffusion, $\mathcal{A}v(t,x) := a(t,x)v_x(t,x) + \frac{1}{2}b^2(t,x)v_{xx}(t,x)$.
Let $W = (W^1, \ldots, W^d)$ be a vector of independent Brownian motions, so that $[W^i]_t = t$, $i = 1, \ldots, d$. To compute the cross-variation $[W^i, W^j]$ for $i \neq j$, $i, j = 1, \ldots, d$, let $P = \{t_0, \ldots, t_n\}$ be a partition of $[0,t]$ and define
$$C_P := \sum_{k=0}^{n-1} (W^i_{t_{k+1}} - W^i_{t_k})(W^j_{t_{k+1}} - W^j_{t_k}).$$
The increments appearing on the RHS of the above equation are all independent of one another and all have mean zero. Therefore
$$\mathbb{E}[C_P] = 0.$$
We compute $\mathrm{var}(C_P) = \mathbb{E}[C_P^2]$. First note that
$$C_P^2 = \sum_{k=0}^{n-1} (W^i_{t_{k+1}} - W^i_{t_k})^2(W^j_{t_{k+1}} - W^j_{t_k})^2 + 2\sum_{\ell<k} (W^i_{t_{\ell+1}} - W^i_{t_\ell})(W^j_{t_{\ell+1}} - W^j_{t_\ell})(W^i_{t_{k+1}} - W^i_{t_k})(W^j_{t_{k+1}} - W^j_{t_k}).$$
All the increments appearing in the sum of cross terms are independent of one another and have mean zero. Therefore
$$\mathrm{var}(C_P) = \mathbb{E}[C_P^2] = \sum_{k=0}^{n-1} \mathbb{E}\left[(W^i_{t_{k+1}} - W^i_{t_k})^2(W^j_{t_{k+1}} - W^j_{t_k})^2\right].$$
But $(W^i_{t_{k+1}} - W^i_{t_k})^2$ and $(W^j_{t_{k+1}} - W^j_{t_k})^2$ are independent of one another, and each has expectation $(t_{k+1} - t_k)$. It follows that
$$\mathrm{var}(C_P) = \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \leq \|P\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = \|P\| t \to 0 \text{ as } \|P\| \to 0,$$
so $C_P \to 0$ in $L^2$,⁴ and hence $[W^i, W^j]_t = 0$ for $i \neq j$.
In two dimensions, a pair of Itô processes $X = (X^1, X^2)$ driven by $W = (W^1, W^2)$ may be written
$$dX_t = a_t \, dt + b_t \, dW_t,$$
where
$$a_t = \begin{pmatrix} a^1_t \\ a^2_t \end{pmatrix}, \quad b_t = \begin{pmatrix} b^{11}_t & b^{12}_t \\ b^{21}_t & b^{22}_t \end{pmatrix},$$
or in integral form
$$X^1_t = x^1 + \int_0^t a^1_s \, ds + \int_0^t b^{11}_s \, dW^1_s + \int_0^t b^{12}_s \, dW^2_s,$$
$$X^2_t = x^2 + \int_0^t a^2_s \, ds + \int_0^t b^{21}_s \, dW^1_s + \int_0^t b^{22}_s \, dW^2_s,$$
or in compact form
$$X_t = x + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s, \quad t \geq 0, \quad x = \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}.$$
Such processes, consisting of a nonrandom initial condition, plus a Riemann integral, plus one or more Itô integrals, are examples of semimartingales. The integrands $a_s, b_s$ can be any adapted processes such that the relevant integrals exist. The adaptedness of the integrands guarantees that $X$ is also adapted.
4The convergence also holds almost surely, though we do not prove this here.
The variation results for the components of a multidimensional Brownian motion are thus summarised by
$$[W^i, W^j]_t = \delta_{ij} t, \quad \text{where } \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$
10.2.1. Markovian diffusion case. If, in (10.1), we have $a_t = a(t, X_t)$, $b_t = b(t, X_t)$ for well-behaved⁵ functions $a(t,x)$, $b(t,x)$, so that
$$dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t,$$
then the process $X$ is Markovian:
$$\mathbb{E}[h(X_T)|\mathcal{F}_t] = \mathbb{E}[h(X_T)|X_t], \quad 0 \leq t \leq T.$$
The generator of the diffusion acts on functions $f(t,x)$ via
$$\mathcal{A}f(t,x) := \sum_{i=1}^2 a^i(t,x)f_{x_i}(t,x) + \frac{1}{2}\sum_{i=1}^2 \sum_{j=1}^2 (bb^\top)^{ij}(t,x)f_{x_i x_j}(t,x). \tag{10.2}$$
We also record the integration-by-parts (product) rule for two Itô processes $X, Y$:
$$X_t Y_t = X_0 Y_0 + \int_0^t X_s \, dY_s + \int_0^t Y_s \, dX_s + \int_0^t d[X,Y]_s, \quad t \geq 0.$$
10.3. Multidimensional Itô formula.
10.3.1. Multidimensional Itô process. Let $W_t = (W^1_t, \ldots, W^d_t)$ be a vector of $d$ independent Brownian motions, that is, $W_t$ is $d$-dimensional Brownian motion. We can use the Brownian motion vector to form the following $n$ Itô processes $X^1_t, \ldots, X^n_t$:
$$dX^1_t = a^1_t \, dt + b^{11}_t \, dW^1_t + \cdots + b^{1d}_t \, dW^d_t$$
$$\vdots$$
$$dX^n_t = a^n_t \, dt + b^{n1}_t \, dW^1_t + \cdots + b^{nd}_t \, dW^d_t,$$
or, in vector form,
$$dX_t = a_t \, dt + b_t \, dW_t, \tag{10.3}$$
where
$$X_t = \begin{pmatrix} X^1_t \\ \vdots \\ X^n_t \end{pmatrix}, \quad a_t = \begin{pmatrix} a^1_t \\ \vdots \\ a^n_t \end{pmatrix}, \quad b_t = \begin{pmatrix} b^{11}_t & \cdots & b^{1d}_t \\ \vdots & & \vdots \\ b^{n1}_t & \cdots & b^{nd}_t \end{pmatrix}. \tag{10.4}$$
Note that the coefficients $a$ and $b$ are required to satisfy certain conditions so that the integrals implicit in the above equations are well defined. In particular, their elements should all be adapted processes, so that we know their values at time $t$.
Theorem 10.6 (Multidimensional Itô formula). Suppose $X_t$ satisfies (10.3). Let
$$f(t,x) = (f^1(t,x), \ldots, f^p(t,x))$$
be a twice differentiable map from $[0,\infty) \times \mathbb{R}^n$ into $\mathbb{R}^p$. Then the process $Y_t := f(t, X_t)$ is again an Itô process, whose $k$th component, $Y^k_t$, is given by the multidimensional Itô formula as
$$dY^k_t = \frac{\partial f^k}{\partial t}(t, X_t) \, dt + \sum_{i=1}^n \frac{\partial f^k}{\partial x_i}(t, X_t) \, dX^i_t + \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f^k}{\partial x_i \partial x_j}(t, X_t) \, d[X^i, X^j]_t, \tag{10.5}$$
where $d[X^i, X^j]_t = dX^i_t \, dX^j_t$ is computed according to the rules
$$dW^i_t \, dW^j_t = \delta_{ij} \, dt, \quad dt \, dt = dW^i_t \, dt = dt \, dW^i_t = 0.$$
Example (Bessel process). Let $R_t := |W_t|$, where $W$ is an $n$-dimensional Brownian motion. For $x \neq 0$ we have
$$\frac{\partial}{\partial x_i}|x| = \frac{x_i}{|x|}, \quad \frac{\partial^2}{\partial x_i \partial x_j}|x| = \frac{\delta_{ij}}{|x|} - \frac{x_i x_j}{|x|^3}.$$
Then, by the multidimensional Itô formula,
$$dR_t = \sum_{i=1}^n \frac{W^i_t}{|W_t|} \, dW^i_t + \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \left(\frac{\delta_{ij}}{|W_t|} - \frac{W^i_t W^j_t}{|W_t|^3}\right)\delta_{ij} \, dt = \sum_{i=1}^n \frac{W^i_t \, dW^i_t}{R_t} + \frac{n-1}{2R_t} \, dt.$$
10.4. Multi-dimensional Feynman-Kac theorem. Recall the connection between stochastic calculus for Markov diffusions and partial differential equations (PDEs), the Feynman-Kac theorem (Theorem 9.16).
There is an obvious generalisation to a multi-dimensional situation. We content ourselves with the following two-dimensional version. Suppose we have a two-dimensional diffusion $X = (X^1, X^2)$ following
$$dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t, \tag{10.6}$$
where
$$a(t, X_t) = \begin{pmatrix} a^1(t, X_t) \\ a^2(t, X_t) \end{pmatrix}, \quad b(t, X_t) = \begin{pmatrix} b^{11}(t, X_t) & b^{12}(t, X_t) \\ b^{21}(t, X_t) & b^{22}(t, X_t) \end{pmatrix},$$
and define
$$v(t,x) := \mathbb{E}[h(X_T)|X_t = x], \quad 0 \leq t \leq T. \tag{10.7}$$
The generator of the diffusion is
$$\mathcal{A}f(t,x) := \sum_{i=1}^2 a^i(t,x)f_{x_i}(t,x) + \frac{1}{2}\sum_{i=1}^2 \sum_{j=1}^2 (bb^\top)^{ij}(t,x)f_{x_i x_j}(t,x).$$
Theorem 10.8 (Feynman-Kac, two-dimensional). The function $v(t,x)$ in (10.7) satisfies the PDE
$$v_t(t,x) + \mathcal{A}v(t,x) = 0, \quad v(T,x) = h(x),$$
where $\mathcal{A}$ is the generator of the diffusion (10.6).
10.5. The Girsanov Theorem*. Given a Brownian motion $W := (W_t)_{0\leq t\leq T}$ on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ with the filtration $\mathbb{F} := (\mathcal{F}_t)_{0\leq t\leq T}$ being that generated by $W$, and given an adapted process $\theta := (\theta_t)_{0\leq t\leq T}$, define the (local) martingale $Z$ by
$$Z_t := \mathcal{E}(-\theta \cdot W)_t := \exp\left(-\int_0^t \theta_s \, dW_s - \frac{1}{2}\int_0^t \theta_s^2 \, ds\right), \quad 0 \leq t \leq T,$$
where $\mathcal{E}$ is the so-called Doléans exponential. We have that $Z$ follows
$$dZ_t = -\theta_t Z_t \, dW_t.$$
Then, provided $\theta$ satisfies the Novikov condition
$$\mathbb{E}\left[\exp\left(\frac{1}{2}\int_0^T \theta_t^2 \, dt\right)\right] < \infty, \tag{10.8}$$
we can define a new probability measure $\mathbb{Q} \sim \mathbb{P}$ on $\mathcal{F} \equiv \mathcal{F}_T$ by
$$\mathbb{Q}(A) = \int_A Z_T \, d\mathbb{P}, \quad A \in \mathcal{F},$$
such that the process $W^{\mathbb{Q}}$ defined by
$$W^{\mathbb{Q}}_t := W_t + \int_0^t \theta_s \, ds, \quad 0 \leq t \leq T,$$
is a $\mathbb{Q}$-Brownian motion. We write $Z_T = \frac{d\mathbb{Q}}{d\mathbb{P}}$, so that for an $\mathcal{F}_T$-measurable random variable $X$,
$$\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[XZ_T]. \tag{10.9}$$
Remark 10.9. The Novikov condition (10.8) is sufficient to guarantee that Z is a (P, F)-martingale,
so that E[ZT ] = 1 and Q is indeed a probability measure.
As well as (10.9) we have the following results connecting conditional expectations under Q and P. Let 0 ≤ t ≤ T. If X is F_t-measurable, then

E^Q[X] = E[X Z_t].

The Bayes formula: if X is F_t-measurable and 0 ≤ s ≤ t ≤ T, then

Z_s E^Q[X | F_s] = E[X Z_t | F_s].
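For constant θ these identities are easy to sanity-check by Monte Carlo: with Z_T = exp(−θW_T − (1/2)θ²T) we should find E[Z_T] = 1 and E[W_T Z_T] = E^Q[W_T] = −θT, since W_t^Q = W_t + θt is a Q-Brownian motion. A sketch (θ = 0.5, T = 1 are arbitrary choices):

```python
import random
from math import exp, sqrt

random.seed(42)
theta, T, n = 0.5, 1.0, 200_000

z_sum = 0.0
wz_sum = 0.0
for _ in range(n):
    w_T = random.gauss(0.0, sqrt(T))              # W_T under P
    z_T = exp(-theta * w_T - 0.5 * theta**2 * T)  # density dQ/dP
    z_sum += z_T
    wz_sum += w_T * z_T

mean_z = z_sum / n    # estimates E[Z_T] = 1
mean_wz = wz_sum / n  # estimates E[W_T Z_T] = E^Q[W_T] = -theta * T

assert abs(mean_z - 1.0) < 0.02
assert abs(mean_wz + theta * T) < 0.03
```

The second estimate shows the drift distortion produced by the change of measure: under Q the original Brownian motion W acquires drift −θ.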
There is a multi-dimensional version of Girsanov's Theorem. Once again we content ourselves with a two-dimensional version. Given a two-dimensional Brownian motion W = (W^1, W^2) on a stochastic basis (Ω, F, F := (F_t)_{0≤t≤T}, P), and a two-dimensional adapted process θ = (θ^1, θ^2), define a (local) martingale Z by

Z_t = E(−θ · W)_t ≡ E(−θ^1 · W^1 − θ^2 · W^2)_t
    = exp( −∫_0^t θ_s^1 dW_s^1 − ∫_0^t θ_s^2 dW_s^2 − (1/2)∫_0^t [(θ_s^1)² + (θ_s^2)²] ds ).

Then, provided we have the two-dimensional Novikov condition

(10.10)  E[ exp( (1/2)∫_0^T [(θ_t^1)² + (θ_t^2)²] dt ) ] < ∞,

we can define a new probability measure Q ∼ P on F ≡ F_T by

Q(A) = ∫_A Z_T dP,  A ∈ F,

under which the two-dimensional process W^Q ≡ (W^{Q,1}, W^{Q,2}) defined by

W_t^{Q,1} := W_t^1 + ∫_0^t θ_s^1 ds,  W_t^{Q,2} := W_t^2 + ∫_0^t θ_s^2 ds,  0 ≤ t ≤ T,

is a two-dimensional Brownian motion under Q.
The riskless asset S^{(0)} follows

dS_t^{(0)} = rS_t^{(0)} dt,  S_0^{(0)} = 1,
where r ≥ 0 is the interest rate. Hence the price of the riskless asset is given by the usual accumulation factor

S_t^{(0)} = exp(rt),  t ≥ 0.
The model makes a number of idealised assumptions, of a continuous-time frictionless market. This means that continuous trading is possible; there are no exploitable arbitrage opportunities; there are no trading costs, limits, or taxes (so assets can be held in any amount); assets are divisible; and short-selling is always permitted. We also assume constant parameters in the price model (this can be relaxed to some extent) and that the stock pays no dividends (this can also be relaxed).
11.1. Portfolio wealth evolution. An agent trades a portfolio of stock and cash in a self-financing manner, meaning all profits and losses are generated by price changes and by adjusting the proportion of wealth allocated to the stock and the bond. Let X = (X_t)_{0≤t≤T} denote the wealth process of the agent.
Denote the (adapted) processes for the number of shares in the bond and in the stock by H^{(0)} and H, so that the wealth of the agent at time t ∈ [0, T] is

(11.1)  X_t := H_t^{(0)}S_t^{(0)} + H_tS_t,  0 ≤ t ≤ T.
The self-financing condition asserts that once we set the portfolio up we will neither put any more
money into it nor take any out; any increase in the number of stocks must be financed by selling
bonds, any increase in the number of bonds must be financed by selling stocks, and nothing is
sold unless the funds are needed to buy something else. In other words, on seeing the new bond
and stock prices and deciding how many units of each to buy and sell, the change in the number
of stocks must be financed by the change in the number of bonds, and vice versa.
Lemma 11.1. A self-financing portfolio satisfies

(11.2)  S_t^{(0)} dH_t^{(0)} + dH_t^{(0)} dS_t^{(0)} + S_t dH_t + dH_t dS_t = 0  (self-financing condition).

Proof. In the time interval [t, t + dt) the wealth evolves to X_t + dX_t, given by

X_t + dX_t = H_t^{(0)}(S_t^{(0)} + dS_t^{(0)}) + H_t(S_t + dS_t)
           = (H_t^{(0)} + dH_t^{(0)})(S_t^{(0)} + dS_t^{(0)}) + (H_t + dH_t)(S_t + dS_t),

the last equality following from the rebalancing of the portfolio to new positions (H_t + dH_t, H_t^{(0)} + dH_t^{(0)}). Hence, the self-financing portfolio satisfies

dH_t^{(0)}(S_t^{(0)} + dS_t^{(0)}) + dH_t(S_t + dS_t) = 0,

which, on expanding, is the self-financing condition (11.2).

Subtracting X_t = H_t^{(0)}S_t^{(0)} + H_tS_t from the first equality above, we arrive at the evolution of the wealth of a self-financing portfolio, given by

dX_t = H_t dS_t + H_t^{(0)} dS_t^{(0)}.

Many books simply take this as a definition of a self-financing portfolio. Using the definition (11.1) of X and dS_t^{(0)} = rS_t^{(0)} dt, we can write

dX_t = H_t dS_t + r(X_t − H_tS_t) dt.
11.2. Perfect hedging. Consider selling a European claim with payoff h(S_T) at time zero. Suppose that there exists a function v : [0, T] × R_+ → R_+ such that the claim's value at t ∈ [0, T] is v(t, S_t) (so we suppose the claim is sold for v(0, S_0)). The goal is to characterise the function v(t, x) that is consistent with the no-arbitrage principle.
We suppose that the proceeds from the option sale are invested in a self-financing portfolio, so X_0 = v(0, S_0). We want to show that we can achieve replication, that is, find a portfolio whose final value matches the option payoff. To achieve this we insist that the portfolio wealth matches the option value at all times t ∈ [0, T]:

X_t = v(t, S_t),  0 ≤ t ≤ T,

and hence also dX_t = dv(t, S_t). Expanding both sides of this equality, using the Itô formula on the right-hand side, gives

(11.3)  H_t dS_t + r(X_t − H_tS_t) dt = v_t(t, S_t) dt + v_x(t, S_t) dS_t + (1/2)σ²S_t²v_xx(t, S_t) dt.

Equating the terms multiplying dS_t yields

(11.4)  H_t = v_x(t, S_t) =: Δ_t,  0 ≤ t ≤ T.
This is the celebrated Black-Scholes (BS) delta hedging rule, and the quantity in (11.4) is called
the delta of the claim.
Using (11.4) and equating terms multiplying dt in (11.3) yields that the option pricing function v(t, x) must satisfy the BS PDE

v_t(t, x) + rxv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0.

For v to represent the option pricing function we would also require the terminal condition v(T, S_T) = h(S_T). We then have a terminal value problem for v:

(11.5)  v_t(t, x) + rxv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0,  v(T, x) = h(x).

Provided we can solve this PDE, then, to avoid arbitrage, v(t, S_t) must be the unique option price at time t ∈ [0, T]. If it were not, an immediate arbitrage opportunity presents itself. For instance, if the claim is available in the market at time zero at a price V_0 > v(0, S_0), then one can sell the claim and invest in the replicating portfolio. The excess V_0 − v(0, S_0) can be invested in the bank account. At time T, one uses the proceeds from the replicating portfolio to pay one's obligations under the claim, leaving a profit of (V_0 − v(0, S_0))e^{rT} > 0. A symmetric argument is possible if V_0 < v(0, S_0), with reversed positions in the claim and the replicating portfolio.
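The replication argument can be illustrated by simulating a discretely rebalanced delta hedge: fund the portfolio with the model price, hold H_t = v_x(t, S_t) shares, and check that the terminal wealth is close to the call payoff even when the stock drifts at μ ≠ r. A rough sketch (all parameter values are arbitrary choices):

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_call(x, K, r, sigma, tau):
    # Black-Scholes call price; tau = time to maturity
    if tau <= 0:
        return max(x - K, 0.0)
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def delta(x, K, r, sigma, tau):
    # call delta = v_x(t, x)
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return Phi(y)

def hedge_error(S0, K, r, mu, sigma, T, n_steps):
    # simulate one stock path under P and delta-hedge along it
    dt = T / n_steps
    S = S0
    X = bs_call(S0, K, r, sigma, T)  # initial wealth = option price
    for i in range(n_steps):
        tau = T - i * dt
        H = delta(S, K, r, sigma, tau)
        dW = random.gauss(0.0, sqrt(dt))
        dS = S * (mu * dt + sigma * dW)
        X += H * dS + r * (X - H * S) * dt  # self-financing wealth update
        S += dS
    return X - max(S - K, 0.0)  # terminal wealth minus payoff

random.seed(1)
errors = [hedge_error(10.0, 10.0, 0.1, 0.15, 0.25, 1.0, 500)
          for _ in range(50)]
mean_abs = sum(abs(e) for e in errors) / len(errors)
assert mean_abs < 0.1  # small relative to the option price of about 1.5
```

Note the stock drift μ = 0.15 differs from r = 0.1, yet the hedge still works: this is the drift-independence of the BS PDE in action.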
11.2.1. Riskless portfolio argument. An equivalent route to the BS PDE is to construct a riskless portfolio involving the option and the stock. Take a position of one unit in the claim and a short position of H shares, that is, a position −H in the stock, so the overall portfolio has wealth Y given by

Y_t = v(t, S_t) − H_tS_t,
where we once again assume the price process for the claim is given by some function v(t, St ).
The dynamics of the portfolio are given by

dY_t = dv(t, S_t) − H_t dS_t
     = v_t(t, S_t) dt + v_x(t, S_t) dS_t + (1/2)v_xx(t, S_t) d[S]_t − H_t dS_t.
Choose H such that the terms involving dSt vanish (so that the terms involving dWt vanish).
This implies that H must be chosen such that
H_t = v_x(t, S_t),  0 ≤ t ≤ T,
matching the delta hedging condition we found earlier. With this choice, the portfolio value will
only contain a finite variation term (that is, a dt term) in its dynamics, so in the absence of
arbitrage it must be a riskless portfolio satisfying
dYt = rYt dt.
Combining this with the above choice for H yields once again that the pricing function v must
satisfy the BS PDE as before.
Notice that the BS PDE has no dependence on the stock's P-drift μ. This is a legacy of removing all risk associated with the claim.
11.3. Feynman-Kac solution of the BSM equation. By the Feynman-Kac Theorem, a solution to (11.5) is given by

(11.6)  v(t, x) = E^Q[e^{−r(T−t)}h(S_T) | S_t = x],

where, under the measure Q, the stock price follows

(11.7)  dS_t = rS_t dt + σS_t dW_t^Q,

where W^Q is a Q-Brownian motion. This measure is called a risk-neutral measure, or an equivalent local martingale measure (ELMM), because under it, the discounted stock price is a local martingale:

d(e^{−rt}S_t) = σe^{−rt}S_t dW_t^Q.

We will see later a probabilistic argument that gives a justification for the risk-neutral valuation result (11.6).
Note that, under the measure Q, the stock price has an average growth rate of r, so in this sense
behaves like a riskless asset. This is no accident. The measure Q has arisen out of an argument
in which all risk associated with a claim was eliminated by dynamic trading. The result of this is
that the claim can be priced by expectation with the caveat that one treats the stock as though
its price grows, on average, like that of a riskless asset. For this reason the formula (11.6) is often
called a risk-neutral valuation formula.
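The risk-neutral valuation result (11.6) can be implemented by brute force: simulate S_T under the Q-dynamics (11.7) and average the discounted payoff. A Monte Carlo sketch for a call (the parameter values are arbitrary, and the reference value ≈ 1.50 is the corresponding closed-form Black-Scholes price):

```python
import random
from math import exp, sqrt, log

random.seed(7)
x, K, r, sigma, tau = 10.0, 10.0, 0.1, 0.25, 1.0
n = 200_000

total = 0.0
for _ in range(n):
    # log S_T = log x + (r - sigma^2/2) tau + sigma sqrt(tau) Z under Q
    z = random.gauss(0.0, 1.0)
    s_T = x * exp((r - 0.5 * sigma**2) * tau + sigma * sqrt(tau) * z)
    total += max(s_T - K, 0.0)

price = exp(-r * tau) * total / n
assert abs(price - 1.4966) < 0.05  # close to the BS value of about 1.50
```

The Monte Carlo estimate converges at rate O(n^{−1/2}); the closed-form formula derived next makes this simulation unnecessary for vanilla payoffs, but the same recipe extends to path-dependent claims.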
11.4. BS option pricing formulae. We will use the risk-neutral valuation formula

v(t, x) = E^Q[e^{−r(T−t)}h(S_T) | S_t = x]

to derive formulae for some European options.
11.4.1. European call price. For a European call, h(x) = (x − K)^+, and under the ELMM Q the log-stock price is Gaussian. Given S_t = x, under Q we have

log S_T = log x + (r − (1/2)σ²)(T − t) + σ(W_T^Q − W_t^Q),  0 ≤ t ≤ T.

Hence the probability law of log S_T under Q, given S_t = x, is

Law^Q[log S_T | S_t = x] = N(m(x, t, T), Σ²(t, T)),  0 ≤ t ≤ T,

where N(m, s²) denotes the Gaussian probability law of mean m and variance s², and where

m(x, t, T) = log x + (r − (1/2)σ²)(T − t),  Σ²(t, T) = σ²(T − t),  0 ≤ t ≤ T.

In terms of Y := log S, we have, writing c(t, x) for the call option pricing function:

c(t, x) = e^{−r(T−t)} E^Q[(e^{Y_T} − K)1_{{Y_T > log K}} | Y_t = log x].
Then an easy computation using Gaussian integrals gives the celebrated Black-Scholes formula for a call option as c(t, S_t), where

(11.8)  c(t, x) = xΦ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)),

(11.9)  y = (1/(σ√(T−t)))[ log(x/K) + (r + σ²/2)(T − t) ],

where Φ(·) denotes the standard cumulative normal distribution function, defined by

Φ(y) := (1/√(2π)) ∫_{−∞}^y exp(−u²/2) du,

so that Φ(y) is the probability that a standard normal random variable (one with mean zero and variance 1) is less than or equal to y. We have Φ(−y) = 1 − Φ(y), by the symmetry with respect to negation of the function exp(−u²/2). The call price function is plotted as a function of stock price in Figure 10.
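The Black-Scholes call formula transcribes directly into code, with Φ supplied by `NormalDist` in the Python standard library; a minimal sketch:

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cdf

def bs_call(x, K, r, sigma, tau):
    # Black-Scholes call price; tau = T - t is the time to maturity
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

# parameters of Figure 10: K = 10, r = 10%, sigma = 25%, T = 1 year
c = bs_call(10.0, 10.0, 0.1, 0.25, 1.0)
assert 1.45 < c < 1.55                 # at-the-money price is about 1.50
assert c > 10.0 - 10.0 * exp(-0.1)     # price exceeds the bound x - K e^{-r tau}
```

The second assertion is the familiar no-arbitrage lower bound c(t, x) ≥ x − Ke^{−r(T−t)}.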
Figure 10. Intrinsic and time value of a European call as a function of the stock price S. The parameters are K = 10, r = 10%, σ = 25%, T = 1 year.
11.5.1. Delta. Differentiating the call pricing formula gives the delta of a call as c_x(t, x) = Φ(y), with y defined in (11.9). The call option delta is plotted as a function of stock
11. It is positive, meaning that if one sells a call option, it is hedged with a dynamically adjusted
long position in the stock.
Figure 11. Black-Scholes call delta as function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.
For a put option, the delta can be computed using put-call parity, which implies that c_x(t, x) − p_x(t, x) = 1, so that

p_x(t, x) = −Φ(−y),
and this function is plotted in Figure 12. It is negative, meaning that if one sells a put option, it
is hedged with a dynamically adjusted short position in the stock.
11.5.2. Theta. The theta of a call is

c_t(t, x) = −rKe^{−r(T−t)}Φ(y − σ√(T−t)) − (σx/(2√(T−t)))Φ′(y).

Because Φ(·) and Φ′(·) are always positive, the theta of a call is always negative, meaning that the price of the call declines as we approach maturity (if all other factors remain unchanged). For a put, put-call parity gives p_t(t, x) = c_t(t, x) + rKe^{−r(T−t)}, so the put theta is typically, though not always, negative: it can become positive for a deep in-the-money put.
11.5.3. Gamma. The gamma of a call is

c_xx(t, x) = Φ′(y)/(xσ√(T−t)),

which is always positive, and is equal to p_xx(t, x), the put gamma (again this follows easily from put-call parity). The BS gamma is plotted in Figure 13.
Gamma is closely related to volatility and to the risk introduced into the BS hedging program
if trading is not continuous. To get an intuitive understanding of this effect, notice that gamma
measures how quickly the delta of an option changes as the stock price changes. If the magnitude
of gamma is small, then delta changes slowly, so a trader will not have to re-hedge very often in
Figure 12. Black-Scholes put delta as function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.
Figure 13. Black-Scholes call gamma as function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.
order to maintain delta neutrality. On the other hand, if the magnitude of gamma is large, then
the trader must re-hedge very often to maintain delta neutrality.
To make a portfolio gamma neutral, one cannot use a position in the underlying asset, as this has zero gamma. In other words, gamma neutrality can only be achieved by adding more option positions to one's portfolio.
11.5.4. Vega. The vega of an option is the derivative of the option price with respect to volatility, and is given by

c_σ(t, x; σ) = x√(T−t) Φ′(y).

This function is plotted versus the stock price in Figure 14. Note the similarity with the gamma plot, which captures our intuitive notion that gamma does indeed measure sensitivity to volatility in some way.
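The Greeks above can be collected in code and cross-checked against finite differences of the pricing function; a sketch using the parameters of Figures 11-14:

```python
from math import exp, log, sqrt, pi
from statistics import NormalDist

Phi = NormalDist().cdf

def phi(u):
    # standard normal density, Phi'(u)
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def d1(x, K, r, sigma, tau):
    return (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

def call_price(x, K, r, sigma, tau):
    y = d1(x, K, r, sigma, tau)
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def call_delta(x, K, r, sigma, tau):
    return Phi(d1(x, K, r, sigma, tau))

def call_gamma(x, K, r, sigma, tau):
    return phi(d1(x, K, r, sigma, tau)) / (x * sigma * sqrt(tau))

def call_vega(x, K, r, sigma, tau):
    return x * sqrt(tau) * phi(d1(x, K, r, sigma, tau))

# K = 1, r = 0.1, sigma = 0.25, tau = 0.5, evaluated at x = 1
args = (1.0, 1.0, 0.1, 0.25, 0.5)
h = 1e-5

# delta matches a central finite difference of the price
fd_delta = (call_price(1.0 + h, *args[1:]) - call_price(1.0 - h, *args[1:])) / (2 * h)
assert abs(call_delta(*args) - fd_delta) < 1e-6

# gamma matches a central finite difference of delta
fd_gamma = (call_delta(1.0 + h, *args[1:]) - call_delta(1.0 - h, *args[1:])) / (2 * h)
assert abs(call_gamma(*args) - fd_gamma) < 1e-5

# put delta via parity: p_x = c_x - 1 = -Phi(-y)
assert abs((call_delta(*args) - 1.0) + Phi(-d1(*args))) < 1e-12
```

The finite-difference agreement is a convenient regression test when transcribing the closed-form Greeks.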
Figure 14. Variation of vega with stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.
11.6. Probabilistic (martingale) interpretation of perfect hedging*. The wealth process in the BSM model satisfies

dX_t = rX_t dt + σH_tS_t(λ dt + dW_t),

where λ := (μ − r)/σ is called the market price of risk of the stock. Define the discounted stock price process S̃ and discounted wealth process X̃ by

S̃_t := e^{−rt}S_t,  X̃_t := e^{−rt}X_t,  t ≥ 0.

Then S̃, X̃ satisfy, under the physical measure P,

dS̃_t = σS̃_t(λ dt + dW_t),  dX̃_t = H_t dS̃_t = σH_tS̃_t(λ dt + dW_t).

By the Girsanov theorem, if we define the measure Q ∼ P by

dQ/dP = Z_T := exp( −λW_T − (1/2)λ²T ),

then the process W^Q given by

W_t^Q := W_t + λt,  t ≥ 0,

is a Brownian motion under Q. Hence the discounted stock price and discounted wealth process are local Q-martingales (and in fact, can be shown to be martingales in the BSM model).
The measure Q, known as an equivalent (local) martingale measure (ELMM), or as a risk-neutral measure, is defined as one such that discounted traded asset prices are local Q-martingales, and notice that it is uniquely defined in the BSM model. This is a consequence of the model being complete (though we do not prove this here).
Earlier, we found a portfolio wealth process which could replicate the payoff of an option, and in this case it must satisfy X_t = v(t, S_t) for all t ∈ [0, T], else there is arbitrage. This led us to the BS PDE and, via the Feynman-Kac theorem, to the representation for the option value:

v(t, S_t) = E^Q[e^{−r(T−t)}v(T, S_T) | S_t],  0 ≤ t ≤ T.

Hence the discounted option price is a Q-martingale, and hence the discounted wealth process of a replicating strategy must also be a Q-martingale.
There is a purely probabilistic route to these conclusions, with the arguments as follows. Begin with the Q-dynamics of the wealth process of any self-financing trading strategy:

(11.10)  β_tX_t = X_0 + ∫_0^t σβ_sH_sS_s dW_s^Q,  0 ≤ t ≤ T,

where β_t := e^{−rt} is the discount factor for t ∈ [0, T].
Introduce a European contingent claim with F_T-measurable payoff C at time T, and then define the Q-martingale

M_t := E^Q[β_TC | F_t],  0 ≤ t ≤ T.

Then by the representation of Brownian martingales as stochastic integrals (the martingale representation theorem), there exists an adapted process ψ : [0, T] → R with E^Q[∫_0^T ψ_t² dt] < ∞ such that we have

(11.11)  M_t = M_0 + ∫_0^t ψ_s dW_s^Q,  0 ≤ t ≤ T.

Choosing the trading strategy H such that σβ_tH_tS_t = ψ_t for 0 ≤ t ≤ T, and the initial wealth X_0 = M_0 = E^Q[β_TC], we obtain β_tX_t = M_t for all t ∈ [0, T], and in particular

β_TX_T = M_T = β_TC,  that is,  X_T = C,  a.s.
so that replication is guaranteed. This argument relies only on the martingale representation
theorem and the existence of a unique measure Q such that the discounted wealth process is a
local Q-martingale. The parameters μ, σ in the BSM model could just as well have been random,
provided they were F-adapted processes. Notice that the wealth process of the replicating portfolio
is actually a Q-martingale, not just a local martingale.
Since XT = C for the replication portfolio wealth process, we must have that the value of the
claim at all earlier times is equal to the wealth process. Let V be the value process of the claim.
We then have
(11.12)  V_t = X_t = E^Q[e^{−r(T−t)}C | F_t],  0 ≤ t ≤ T.

For a path-independent claim C = h(S_T) with pricing function v(t, x), applying the Itô formula to β_tv(t, S_t) and comparing with (11.10) shows that

H_t = v_x(t, S_t),  0 ≤ t ≤ T,
the delta hedging rule we found before, and that the claim pricing function must satisfy
v_t(t, x) + L^{S,Q}v(t, x) − rv(t, x) = 0,
which is the BS PDE.
We are seeing a manifestation of deep results connecting absence of arbitrage with existence
of ELMMs and with completeness. These are called the Fundamental Theorems of Asset Pricing
(FTAPs). In continuous asset price models such as the BSM model, the theorems state that
no-arbitrage is equivalent to the existence of an ELMM, and completeness is equivalent to there
being a unique ELMM. Here is an easy part of the statements in the FTAPs to prove.
Lemma 11.2. If a model has an equivalent martingale measure Q such that the discounted wealth process X̃ := βX is a Q-martingale, then it admits no arbitrage.

Proof. Suppose there is an arbitrage. Then there exists a portfolio wealth process X with

X̃_0 = 0,  X̃_T ≥ 0 a.s.,  and  P[X̃_T > 0] > 0.

But the martingale property gives

E^Q[X̃_T] = X̃_0 = 0,

and since X̃_T ≥ 0 almost surely, this forces Q[X̃_T > 0] = 0, hence (as Q ∼ P) P[X̃_T > 0] = 0, a contradiction.
11.7. Black-Scholes-Merton analysis for dividend-paying stock. Suppose S pays dividends at a constant dividend yield q. Then the wealth dynamics for a portfolio with H shares of S and H^{(0)} shares of S^{(0)} become

dX_t = H_t dS_t + qH_tS_t dt + H_t^{(0)} dS_t^{(0)},

since the dividend income received in the interval [t, t + dt) is qH_tS_t dt. Using X_t = H_tS_t + H_t^{(0)}S_t^{(0)} and dS_t^{(0)} = rS_t^{(0)} dt, this converts to

(11.15)  dX_t = H_t dS_t + qH_tS_t dt + r(X_t − H_tS_t) dt.
It is easy to apply the same replication analysis as in Section 11.2 to once again yield the same delta hedging rule as before:

H_t = v_x(t, S_t),  0 ≤ t ≤ T,

and this time the option pricing function satisfies the BS PDE with dividend yield q, given as

v_t(t, x) + (r − q)xv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0,

with v(T, x) = h(x) (for a claim with path-independent payoff). We then obtain the risk-neutral pricing formula

v(t, x) = E^Q[e^{−r(T−t)}h(S_T) | S_t = x],

where E^Q denotes expectation under Q ∼ P, and under which S follows

dS_t = (r − q)S_t dt + σS_t dW_t^Q.
One can then go through a similar computation for the price function c(t, x) of a call option, to obtain

c(t, x) = xe^{−q(T−t)}Φ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)),  y = (1/(σ√(T−t)))[ log(x/K) + (r − q + σ²/2)(T − t) ].

Notice that we can obtain this formula by the replacement x → xe^{−q(T−t)} in the original BS formula (11.8).
Using put-call parity, namely,

c(t, S_t) − p(t, S_t) = S_te^{−q(T−t)} − Ke^{−r(T−t)},

we can compute the put option price function as

p(t, x) = Ke^{−r(T−t)}Φ(σ√(T−t) − y) − xe^{−q(T−t)}Φ(−y).
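The dividend-adjusted call and put pricing functions can be checked against the parity relation in code; a sketch (parameter values arbitrary):

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def y_div(x, K, r, q, sigma, tau):
    return (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

def call_div(x, K, r, q, sigma, tau):
    # call on a stock with dividend yield q
    y = y_div(x, K, r, q, sigma, tau)
    return x * exp(-q * tau) * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def put_div(x, K, r, q, sigma, tau):
    # put obtained via the same replacements
    y = y_div(x, K, r, q, sigma, tau)
    return K * exp(-r * tau) * Phi(sigma * sqrt(tau) - y) - x * exp(-q * tau) * Phi(-y)

x, K, r, q, sigma, tau = 1.2, 1.0, 0.1, 0.03, 0.25, 0.5
lhs = call_div(x, K, r, q, sigma, tau) - put_div(x, K, r, q, sigma, tau)
rhs = x * exp(-q * tau) - K * exp(-r * tau)
assert abs(lhs - rhs) < 1e-12  # put-call parity with dividends
```

Since Φ(u) + Φ(−u) = 1, the parity identity holds exactly, so the assertion succeeds up to floating-point rounding.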
So the terms involving dS_t equate provided we choose the usual delta-hedging rule:

H_t = v_x(t, S_t),  0 ≤ t ≤ T,

and then the remaining terms equate provided v satisfies the BS PDE with time-dependent parameters:

v_t(t, x) + (r(t) − q(t))xv_x(t, x) + (1/2)σ²(t)x²v_xx(t, x) − r(t)v(t, x) = 0,

with v(T, x) = h(x) (for a claim with path-independent payoff).
Define the (deterministic) discount factor

β(t) := exp( −∫_0^t r(s) ds ),  t ≥ 0.

Consider the discounted claim price process u(t, S_t) := β(t)v(t, S_t). This satisfies

u_t(t, x) + (r(t) − q(t))xu_x(t, x) + (1/2)σ²(t)x²u_xx(t, x) = 0,  u(T, x) = β(T)h(x).
By the Feynman-Kac theorem, u(t, x) has the representation

u(t, x) = E^Q[ exp( −∫_0^T r(s) ds ) h(S_T) | S_t = x ],

where E^Q denotes expectation under the risk-neutral measure Q, under which S follows

dS_t = (r(t) − q(t))S_t dt + σ(t)S_t dW_t^Q,

with W^Q a Q-BM. Then it is immediate that v(t, x) is given by the risk-neutral pricing formula

v(t, x) = E^Q[ exp( −∫_t^T r(s) ds ) h(S_T) | S_t = x ].
Applying the Itô formula (under Q) to log S we have

d(log S_t) = ( r(t) − q(t) − (1/2)σ²(t) ) dt + σ(t) dW_t^Q.

Given t ≤ T and S_t = x, we therefore have

log S_T = log x + ∫_t^T ( r(s) − q(s) − (1/2)σ²(s) ) ds + ∫_t^T σ(s) dW_s^Q.
So the law of the log-stock price process under Q is Gaussian:

(11.16)  Law^Q(log S_T | S_t = x) = N(m, Σ²),

with

m = log x + ∫_t^T ( r(s) − q(s) − (1/2)σ²(s) ) ds,  Σ² = ∫_t^T σ²(s) ds.
In the standard BS model, the law of the terminal stock price is given by

Law^Q_{BS}(log S_T | S_t = x) = N(m_{BS}, Σ²_{BS}),

with

m_{BS} = log x + ( r − q − (1/2)σ² )(T − t),  Σ²_{BS} = σ²(T − t).

Comparing with the law of the log-stock price in (11.16), it is evident that we obtain BS-style formulae for option prices if we make the following replacements in the standard formulae: r → r̄, q → q̄, σ → σ̄, where

r̄(T − t) = ∫_t^T r(s) ds,  q̄(T − t) = ∫_t^T q(s) ds,  σ̄²(T − t) = ∫_t^T σ²(s) ds.
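The replacement parameters r̄, q̄, σ̄ can be computed numerically for given deterministic coefficient functions; a sketch using a simple trapezoidal rule (the example functions are arbitrary choices):

```python
def trapz(f, a, b, n=10_000):
    # composite trapezoidal rule for the integral of f over [a, b]
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def averaged_params(r, q, sigma, t, T):
    # effective constants r_bar, q_bar, sigma_bar over [t, T]
    tau = T - t
    r_bar = trapz(r, t, T) / tau
    q_bar = trapz(q, t, T) / tau
    sigma_bar = (trapz(lambda s: sigma(s) ** 2, t, T) / tau) ** 0.5
    return r_bar, q_bar, sigma_bar

# constant inputs must be recovered (up to quadrature error)
r_bar, q_bar, s_bar = averaged_params(lambda s: 0.05, lambda s: 0.01,
                                      lambda s: 0.2, 0.0, 2.0)
assert abs(r_bar - 0.05) < 1e-10
assert abs(q_bar - 0.01) < 1e-10
assert abs(s_bar - 0.2) < 1e-10

# a time-dependent example: sigma(s) = 0.1 + 0.1 s on [0, 1];
# the integral of (0.1 + 0.1 s)^2 over [0, 1] equals 0.07/3
_, _, s_bar2 = averaged_params(lambda s: 0.05, lambda s: 0.0,
                               lambda s: 0.1 + 0.1 * s, 0.0, 1.0)
assert abs(s_bar2 ** 2 - 0.07 / 3) < 1e-6
```

With these effective constants in hand, the standard BS formulae apply unchanged.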
The value at time t of a forward contract on S with delivery price K and maturity T′ is f(t, T′; K) = S_t − Ke^{−r(T′−t)}, so that

df(t, T′; K) = dS_t − rKe^{−r(T′−t)} dt.

So the change in value for the holder of a futures contract entered into at time t ought to be the amount

df(t, T′; F_{t,T′}) = dS_t − rS_t dt.

Note that

(12.1)  dF_{t,T′} = e^{r(T′−t)}( dS_t − rS_t dt ) = e^{r(T′−t)} df(t, T′; F_{t,T′}).

So, in the BS model, under the physical measure P, we have futures price dynamics

dF_{t,T′} = F_{t,T′}[ (μ − r) dt + σ dW_t ].
The mechanics of futures markets are such that the holder of a futures contract receives the
amount dFt,T 0 in the interval [t, t + dt) (despite the fact that this is not the change in value of
the associated forward contract).
Definition 12.1. A futures contract with maturity T′ is a contract which costs nothing to acquire at any time t ∈ [0, T′], and is such that after each time interval [t, t + dt) the contract holder receives the amount dF_t, where F_t = S_te^{r(T′−t)}.
The result of this is that, if you hold a dynamic portfolio of futures contracts with position (H_t)_{0≤t≤T′}, plus cash, then the associated portfolio wealth evolves according to

dX_t = H_t dF_{t,T′} + rX_t dt.

Comparing this with (11.15) we see that a futures contract can be considered as an asset which pays a dividend yield q = r.
12.2. Options on futures contracts. In this section, the maturity time T′ of a futures contract will be fixed, so we write F_t ≡ F_{t,T′} from now on.
A European option, with maturity T ≤ T′, on a futures contract with maturity T′, is a contract with payoff h(F_T) at time T. For instance, a call option with strike K on a futures contract pays (F_T − K)^+ at time T. Note that if T = T′, then since F_T = S_T, the futures option in this case pays the same as a conventional option on the stock.
If one holds a dynamic portfolio of futures contracts with position H = (H_t)_{0≤t≤T′}, plus some cash, the associated portfolio wealth at time t evolves according to

dX_t = H_t dF_t + rX_t dt.

Notice that this evolution is precisely that which we would obtain for an asset with price process F and dividend yield q = r. This means that we can value a futures option with BS-style formulae provided we set the underlying asset price to F and the dividend yield to r, as we now demonstrate.
Consider a European futures option with maturity T , and with price process (v(t, Ft ))0tT ,
where v(t, x) is some function. This evolves according to
dv(t, F_t) = v_t(t, F_t) dt + v_x(t, F_t) dF_t + (1/2)v_xx(t, F_t) d[F]_t.
We attempt to hedge this option with a dynamic portfolio of futures contracts. Imposing the
replication condition

X_t = v(t, F_t),  0 ≤ t ≤ T,

and hence also requiring dX_t = dv(t, F_t), gives the required hedge as

H_t = v_x(t, F_t),  0 ≤ t ≤ T.
Then using the fact that d[F]_t = e^{2r(T′−t)} d[S]_t (from (12.1)), we have d[F]_t = σ²F_t² dt (in the BS model), so the futures option price function solves the PDE

v_t(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0,  v(T, x) = h(x),

which is indeed the BS PDE for an asset with dividend yield q = r.
Hence we obtain, via the Feynman-Kac theorem, the risk-neutral valuation formula

v(t, x) = E^Q[e^{−r(T−t)}h(F_T) | F_t = x],

where, under Q, F follows

dF_t = σF_t dW_t^Q,

with W^Q a Q-BM. Note that F is a Q-martingale, since we have

F_T = F_t exp( σ(W_T^Q − W_t^Q) − (1/2)σ²(T − t) ),  t ≤ T,

so E^Q[F_T | F_t] = F_t for t ≤ T (and this is also clear from the fact that F_t = S_te^{r(T′−t)} = E^Q[S_{T′} | F_t] = E^Q[F_{T′} | F_t]).
We observe that we can indeed recover the option price formula from the standard BS model
with dividends by treating the futures option as being written on an underlying asset with price
process F and dividend yield q = r. Hence, the call price function for a futures option is given by
c(t, x) = e^{−r(T−t)}[ xΦ(y) − KΦ(y − σ√(T−t)) ],  y = (1/(σ√(T−t)))[ log(x/K) + (1/2)σ²(T − t) ].
Lemma 12.2 (Put-call parity). For European call and put futures options with common maturity
T and strike K, the prices at t ≤ T satisfy

c(t, F_t) − p(t, F_t) = e^{−r(T−t)}(F_t − K),  0 ≤ t ≤ T.
Proof. One can either observe that the required relation follows from the standard put-call parity
relation for an asset with price process F and dividend yield q = r, or else compute that
c(T, FT ) p(T, FT ) = FT K.
Then a discounted risk-neutral expectation, using the fact that the discounted option prices as
well as the futures price are Q-martingales, gives the result.
From the put-call parity relation, the put pricing function is obtained as

p(t, x) = e^{−r(T−t)}[ KΦ(σ√(T−t) − y) − xΦ(−y) ],  y = (1/(σ√(T−t)))[ log(x/K) + (1/2)σ²(T − t) ].
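These futures-option pricing functions (often known as the Black-76 formulae) can be coded and the parity relation of Lemma 12.2 checked directly; a sketch (parameter values arbitrary):

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def y_fut(F, K, sigma, tau):
    return (log(F / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))

def fut_call(F, K, r, sigma, tau):
    # futures call: discounted BS formula with dividend yield q = r
    y = y_fut(F, K, sigma, tau)
    return exp(-r * tau) * (F * Phi(y) - K * Phi(y - sigma * sqrt(tau)))

def fut_put(F, K, r, sigma, tau):
    y = y_fut(F, K, sigma, tau)
    return exp(-r * tau) * (K * Phi(sigma * sqrt(tau) - y) - F * Phi(-y))

F, K, r, sigma, tau = 105.0, 100.0, 0.05, 0.2, 0.75
c, p = fut_call(F, K, r, sigma, tau), fut_put(F, K, r, sigma, tau)

assert abs((c - p) - exp(-r * tau) * (F - K)) < 1e-10  # Lemma 12.2
assert c > exp(-r * tau) * max(F - K, 0.0)             # positive time value
```

Parity holds identically here, since Φ(u) + Φ(−u) = 1.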
13. Multi-asset derivatives
It is straightforward to generalise the BSM analysis to the situation where there is more than
one stock in the market and when there are derivatives written on multiple stocks. The key idea
is that we can still perfectly hedge claims on many assets provided the number of traded stocks
is the same as the number of independent Brownian motions driving the stock prices. This keeps
the market complete, as we shall see.
To be concrete, we shall consider a market with two stocks. Their prices S, Y are assumed to follow the geometric Brownian motions (GBMs)

(13.1)  dS_t = S_t(μ dt + σ dW_t),  dY_t = Y_t(ν dt + ξ dB_t),  μ, ν ∈ R,  σ, ξ ∈ R_+,

where B, W are correlated BMs, with B = ρW + √(1 − ρ²)W^⊥, for ρ ∈ [−1, 1] and W, W^⊥ independent BMs. A European claim on S, Y has payoff h(S_T, Y_T) at time T. Denote its price process by (v(t, S_t, Y_t))_{0≤t≤T}, for some function v(t, x, y). The interest rate is r ≥ 0.
Form a dynamic self-financing portfolio using S, Y and cash. Let the processes for the number of shares of S, Y be H^S, H^Y respectively. The portfolio wealth process X = H^SS + H^YY + C (C being the cash holding) follows

dX_t = H_t^S dS_t + H_t^Y dY_t + r(X_t − H_t^SS_t − H_t^YY_t) dt.

A replication argument along the lines of Section 11.2 then shows that the pricing function must satisfy the PDE

v_t + A^Q_{S,Y}v − rv = 0,  v(T, x, y) = h(x, y),
where A^Q_{S,Y} denotes the generator of S, Y under the EMM Q:

A^Q_{S,Y}v := r(xv_x + yv_y) + (1/2)(σ²x²v_xx + ξ²y²v_yy) + ρσξ xyv_xy,

where the arguments of the function have been omitted for brevity. Under Q, the asset prices follow

dS_t = S_t(r dt + σ dW_t^Q),  dY_t = Y_t(r dt + ξ dB_t^Q),  B^Q = ρW^Q + √(1 − ρ²)W^{⊥,Q},

and (W^Q, W^{⊥,Q}) is a two-dimensional Q-BM.
Using the Feynman-Kac theorem, we can write down an expectation representation for v(t, x, y):
v(t, x, y) = E^Q[e^{−r(T−t)}h(S_T, Y_T) | S_t = x, Y_t = y],  0 ≤ t ≤ T.
Example 13.1. Suppose h(x, y) = xy. We can derive a closed-form formula for v(t, x, y) in this case. Under Q, S, Y are given by

S_t = S_0 exp( (r − (1/2)σ²)t + σW_t^Q ),  Y_t = Y_0 exp( (r − (1/2)ξ²)t + ξB_t^Q ),  t ≥ 0,

so in particular, given S_tY_t = xy at t ≤ T, we can compute

E^Q[e^{−r(T−t)}S_TY_T | S_t = x, Y_t = y] = xy exp( (r + ρσξ)(T − t) ),

where we have used B^Q = ρW^Q + √(1 − ρ²)W^{⊥,Q}.
This can also be obtained by showing that U = SY follows a GBM. Under P,

(13.2)  dU_t = U_t(a dt + b dZ_t),

where Z = (σW + ξB)/b, a = μ + ν + ρσξ, b = √(σ² + ξ² + 2ρσξ). Under Q, we make the replacements μ, ν → r (and W → W^Q, B → B^Q) and proceed from there.
Example 13.2. A European exchange option allows the holder to exchange one asset for another if this is favourable, so has payoff h(S_T, Y_T) = (S_T − Y_T)^+. In Problem Sheet 3 we show that in the two-dimensional lognormal model (13.1), the value of the exchange option at t ≤ T, given S_t = x, Y_t = y, is given by the so-called Margrabe formula:

v(t, x, y) = xΦ(d) − yΦ(d − σ̄√(T−t)),  d = (1/(σ̄√(T−t)))[ log(x/y) + (1/2)σ̄²(T − t) ],

where σ̄² = σ² + ξ² − 2ρσξ. Notice that the result does not depend on the interest rate.
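The Margrabe formula is straightforward to code; as a consistency check, when ξ = 0 and ρ = 0 it reduces to a zero-interest-rate BS call with strike y. A sketch (parameter values arbitrary):

```python
from math import log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def margrabe(x, y, sigma, xi, rho, tau):
    # value of the option to exchange asset Y for asset S, payoff (S_T - Y_T)^+
    s = sqrt(sigma**2 + xi**2 - 2 * rho * sigma * xi)  # sigma_bar
    d = (log(x / y) + 0.5 * s**2 * tau) / (s * sqrt(tau))
    return x * Phi(d) - y * Phi(d - s * sqrt(tau))

x, y, sigma, xi, tau = 1.0, 1.0, 0.3, 0.2, 1.0

v0 = margrabe(x, y, sigma, xi, 0.0, tau)
assert max(x - y, 0.0) < v0 < x          # no-arbitrage bounds on the payoff

# value decreases as correlation increases (sigma_bar shrinks)
assert margrabe(x, y, sigma, xi, 0.8, tau) < v0

# with xi = 0 and rho = 0, this is a zero-rate BS call with strike y
s = sigma
d = (log(x / y) + 0.5 * s**2 * tau) / (s * sqrt(tau))
bs0 = x * Phi(d) - y * Phi(d - s * sqrt(tau))
assert abs(margrabe(x, y, sigma, 0.0, 0.0, tau) - bs0) < 1e-12
```

Note that, as the formula predicts, no interest rate parameter appears anywhere.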
14. American options in the BSM model
We give an informal treatment of American option pricing in the BS model. The full theory
requires the machinery of stopping times and the theory of optimal stopping, both of which are
beyond the scope of the course.
An American claim with (path-independent) payoff function h(·) and maturity T pays h(S_t) (the so-called intrinsic value) to its holder if exercised at time t ∈ [0, T]. For instance, an American put pays (K − S_t)^+ if exercised at t ∈ [0, T]. It would thus never be exercised if S_t ≥ K. We conjecture that exercise would occur only if the stock price dropped to a low enough level below the strike, and we suppose the existence of some critical stock price at each t ∈ [0, T], denoted S_f(t), such that the option is exercised at t ∈ [0, T] if S_t ≤ S_f(t). The function S_f : [0, T] → R_+ is called the optimal exercise boundary, and is an example of a free boundary. This means it is not known a priori, and must be computed as part of the solution to the pricing problem.
Let (v(t, S_t))_{0≤t≤T} denote the price process of an American claim. We must have

v(t, S_t) ≥ h(S_t),  t ∈ [0, T].

If this were not so, an immediate arbitrage opportunity would ensue: one could buy the option and exercise it immediately to make profit h(S_t) − v(t, S_t).
Recall the American valuation algorithm (6.2) in the binomial model, repeated below:

V_n = h(S_n),  V_t = max[ h(S_t), E^Q[(1 + r)^{−1}V_{t+1} | F_t] ],  t = 0, . . . , n − 1.
At each time, one checks whether the intrinsic value is greater than the value of the discounted risk-neutral expectation, in which case the option would be exercised at that time. We have two possibilities:
Either V_t > h(S_t), in which case the option is not exercised, and then the discounted option price is constant on average, or
V_t = h(S_t), in which case the option is exercised, and then the discounted option price is decreasing on average.
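The valuation algorithm above translates directly into code on a binomial tree; a sketch for an American put (the Cox-Ross-Rubinstein parametrisation u = e^{σ√Δt}, d = 1/u is one standard choice, and the parameter values match the earlier figures):

```python
from math import exp, sqrt

def binomial_put(S0, K, r, sigma, T, n, american=True):
    # CRR tree: V_n = h(S_n), then backward induction with optional early exercise
    dt = T / n
    u = exp(sigma * sqrt(dt))
    d = 1.0 / u
    q = (exp(r * dt) - d) / (u - d)  # risk-neutral up-probability
    disc = exp(-r * dt)

    # payoffs at maturity
    V = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    for i in range(n - 1, -1, -1):
        V = [disc * (q * V[j + 1] + (1 - q) * V[j]) for j in range(i + 1)]
        if american:
            # compare continuation value with intrinsic value at each node
            V = [max(V[j], K - S0 * u**j * d**(i - j)) for j in range(i + 1)]
    return V[0]

S0, K, r, sigma, T = 1.0, 1.0, 0.1, 0.25, 0.5
amer = binomial_put(S0, K, r, sigma, T, 500, american=True)
euro = binomial_put(S0, K, r, sigma, T, 500, american=False)

assert amer >= euro               # early exercise has non-negative value
assert amer >= max(K - S0, 0.0)   # price dominates intrinsic value
assert amer - euro < 0.05         # the early exercise premium is small here
```

The `american=False` branch prices the European put on the same tree, so the difference isolates the early exercise premium.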
We use this to make plausible the result in continuous time. We conjecture that v(t, St ) is
given by comparing the value of immediate exercise with the value of a discounted expectation
under the EMM. Denote the discounted option price process by u(t, S_t) := e^{−rt}v(t, S_t). Under the EMM Q, we have

du(t, S_t) = e^{−rt}[ L_{BS}v(t, S_t) dt + σS_tv_x(t, S_t) dW_t^Q ],

where L_{BS} denotes the BS operator:

(14.1)  L_{BS}v(t, x) := v_t(t, x) + rxv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x).

Observe that when the discounted option value is constant on average, then we have L_{BS}v(t, S_t) = 0, with L_{BS}v(t, S_t) < 0 when the discounted option value is decreasing on average.
We have two possibilities at each time t [0, T ]:
Either v(t, St ) = h(St ), so that exercise is optimal, and the discounted option price is
decreasing on average, that is LBS v(t, St ) < 0, or
v(t, St ) > h(St ), so that the option is not exercised, and the discounted option price is
constant on average, that is LBS v(t, St ) = 0.
This suggests that the option pricing function satisfies

(14.2)  max[ h(x) − v(t, x), L_{BS}v(t, x) ] = 0.
We interpret (14.2) as follows. At time t [0, T ] the option holder faces two possibilities:
(1) Exercise the option, in which case the maximisation in (14.2) is achieved by the first term, and we have v(t, S_t) = h(S_t) (corresponding to S_t ≤ S_f(t) for a put), and L_{BS}v(t, S_t) < 0 (so the discounted option price would be decreasing on average, that is, e^{−rt}v(t, S_t) is a Q-supermartingale). We say that the stock price is in the stopping region.
(2) Do not exercise the option, in which case the maximisation in (14.2) is achieved by the second term, and we have v(t, S_t) > h(S_t) (corresponding to S_t > S_f(t) for a put), and L_{BS}v(t, S_t) = 0 (so the discounted option price would be constant on average, that is, e^{−rt}v(t, S_t) is a Q-martingale). We say that the stock price is in the continuation region.
We conclude that the American option price function solves the free boundary problem

(14.3)  v(t, x) ≥ h(x),  L_{BS}v(t, x) ≤ 0,
with the first inequality holding as equality when we are in the stopping region, in which case the
second inequality is strict, or else the first inequality is strict and the second holds as equality,
when we are in the continuation region.
Example 14.1 (American put). Denote the American put price by pA (t, St ) when the stock price
at t [0, T ] is St . There is some free boundary Sf (t), which must be determined as part of the
solution to the problem, such that the American put pricing function pA (t, x) satisfies
(14.4)  L_{BS}p_A(t, x) = 0,  x > S_f(t)  (continuation region),

(14.5)  p_A(t, x) = K − x,  x ≤ S_f(t)  (stopping region),

where we have used the fact that S_f(t) ≤ K to write (K − x)^+ = K − x in the stopping region.
We would also have the boundary conditions

p_A(t, S_f(t)) = K − S_f(t),  p_A(T, x) = (K − x)^+,  lim_{x→∞} p_A(t, x) = 0,
along with one further condition, on the derivative of the pricing function at the critical stock
price x = Sf (t), called the smooth pasting condition, which we shall discuss shortly.
Figure 15 shows a schematic drawing of the form of the optimal exercise boundary for the
American put. This boundary has been computed numerically in many papers but there is no
known closed form expression for it (just as there is no exact closed form expression for the
American put price).
Figure 15. Schematic of the optimal exercise boundary for the American put: stock price against time, with the continuation region above the boundary and the stopping region below, between t = 0 and t = T.
In summary, the American put pricing function p_A(t, x) and the free boundary S_f(t) satisfy

L_{BS}p_A(t, x) = 0,  x > S_f(t)  (continuation region),
p_A(t, x) = K − x,  L_{BS}p_A(t, x) < 0,  x ≤ S_f(t)  (stopping region),
p_A{}_x(t, S_f(t)) = −1  (smooth pasting),
p_A(T, x) = (K − x)^+,  lim_{x→∞} p_A(t, x) = 0.
14.2. Optimal stopping representation*. We do not prove this here, but the American option
pricing function for a claim with intrinsic value (h(St ))0tT is given by
(14.6)  v(t, x) = sup_{τ ∈ T(t,T)} E^Q[e^{−r(τ−t)} h(S_τ) | St = x],  0 ≤ t ≤ T,
where T(t, T) denotes the class of F-stopping times with values in [t, T],^6 and F denotes the
underlying Brownian filtration.
^6 A stopping time τ with respect to the filtration F = (F_t)_{t∈[0,T]} is a random time variable τ, with values in [0, T], such that the event {τ ≤ t} lies in F_t for every t ∈ [0, T].
[Figure: Two ways the smooth pasting condition can fail for the American put price as a function of St. In panel (i) the slope of the price function at Sf(t) is too steep; in panel (ii) it is too shallow. The critical stock price Sf(t) is marked on the horizontal axis.]
Lemma 14.2. Let h be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then the
discounted intrinsic value process (e^{−rt} h(St))_{0≤t≤T} is a Q-submartingale:
E^Q[e^{−rt} h(St) | F_s] ≥ e^{−rs} h(S_s),  0 ≤ s ≤ t ≤ T.

[Figure: Illustration of convexity: for 0 ≤ λ ≤ 1 and 0 ≤ x_0 ≤ x_1, the chord lies above the graph, f(λx_0 + (1−λ)x_1) ≤ λf(x_0) + (1−λ)f(x_1).]

Proof. Taking x_0 = 0, x_1 = x in the definition of convexity, and using h(0) = 0, we obtain
(14.8)  h(λx) ≤ λh(x),  0 ≤ λ ≤ 1,  x ≥ 0.
Using the conditional Jensen inequality and the fact that (e^{−rt} St)_{0≤t≤T} is a Q-martingale, we
have
(14.9)  E^Q[h(e^{−r(t−s)} St) | F_s] ≥ h(E^Q[e^{−r(t−s)} St | F_s]) = h(S_s),  0 ≤ s ≤ t ≤ T.
Combining (14.8) (applied with λ = e^{−r(t−s)}) and (14.9) gives the submartingale property for (e^{−rt} h(St))_{0≤t≤T}.
Theorem 14.3. Let h(x) be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then
the value of the American claim expiring at time T and having intrinsic value process (h(St))_{0≤t≤T} is
the same as the value of the European claim with payoff h(S_T).
Proof. Take s = t and terminal time T in the submartingale property of Lemma 14.2 to obtain
(14.10)  h(St) ≤ E^Q[e^{−r(T−t)} h(S_T) | F_t],  0 ≤ t ≤ T,
which says that the value of the European claim always dominates the exercise value of the
American claim, that is, there is no value in early exercise.
Applying the above results to the call payoff shows that an American call on a non-dividend-paying stock is never exercised early.
Remark 14.4. An alternative (but equivalent) way to proceed is as follows. Let Z_t := e^{−rt} h(St) =
e^{−rt}(St − K)^+ be the discounted intrinsic value process of the call, which is a Q-submartingale.
Then, for all stopping times τ with values in [t, T] we must have^7
E^Q[Z_T | F_t] ≥ E^Q[Z_τ | F_t] ≥ Z_t,  t ≤ τ ≤ T.
Taking the supremum over τ ∈ T(t, T) in the optimal stopping representation (14.6) then gives
cA(t, St) ≤ c(t, St),  0 ≤ t ≤ T.
But we also have the reverse inequality cA(t, St) ≥ c(t, St), hence cA(t, St) = c(t, St).
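This conclusion is easy to verify numerically: in a binomial tree (a discrete sketch with purely illustrative parameters), allowing early exercise of a call on a non-dividend-paying stock leaves the price unchanged, because the continuation value dominates the intrinsic value at every node.

```python
import math

def crr_call(S0, K, r, sigma, T, N, american):
    """Call price in a Cox-Ross-Rubinstein tree; early exercise optional."""
    dt = T / N
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)
    disc = math.exp(-r * dt)
    v = [max(S0 * u**j * d**(N - j) - K, 0.0) for j in range(N + 1)]
    for i in range(N - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (q * v[j + 1] + (1 - q) * v[j])
            intrinsic = max(S0 * u**j * d**(i - j) - K, 0.0)
            v[j] = max(cont, intrinsic) if american else cont
    return v[0]

cA = crr_call(100, 95, 0.05, 0.25, 1.0, 400, american=True)
cE = crr_call(100, 95, 0.05, 0.25, 1.0, 400, american=False)
```

Here cA and cE agree to machine precision: the early exercise maximum never binds, exactly as the submartingale argument predicts.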
15. Simple exotic options
Any option which is not a plain vanilla call or put is called an exotic option. There are
(usually) no exchange markets in these options and they are bought over-the-counter (OTC). Effective
risk management of these products is important, as they are much less liquid than standard options.
They often have discontinuous payoffs and can have large deltas near expiration, which can make
them difficult to hedge.
We recall that we can price any option with payoff function h(S_T) via the risk-neutral valuation
formula
v(t, x) = E^Q[e^{−r(T−t)} h(S_T) | St = x],  0 ≤ t ≤ T,
where Q denotes the EMM under which the stock price follows
dSt = rSt dt + σSt dW^Q_t,
and that the risk-neutral pricing formula corresponds to the Feynman-Kac solution of the BS
PDE
v_t(t, x) + rxv_x(t, x) + ½σ²x²v_xx(t, x) − rv(t, x) = 0,  v(T, x) = h(x).
Note that the risk-neutral valuation result generalises to any payoff, not necessarily one that is
just a function of the final stock price. For a (possibly path-dependent) European claim payoff C
(some F_T-measurable random variable), its price at t ≤ T is given by
V_t = E^Q[e^{−r(T−t)} C | F_t],  0 ≤ t ≤ T.
15.1. Digital options. Digital (or binary, or cash-or-nothing) options have discontinuous payoffs.
For example, the cash-or-nothing digital call with strike K and maturity T has payoff h^{c/n}(S_T)
given by
h^{c/n}(S_T) = 0, if S_T < K;  1, if S_T ≥ K.
The price of a digital call at t ≤ T is c^{c/n}(t, St), given by
c^{c/n}(t, x) = e^{−r(T−t)} E^Q[1_{{S_T ≥ K}} | St = x],
and the usual computation with Gaussian integrals yields
c^{c/n}(t, x) = e^{−r(T−t)} Φ(y − σ√(T−t)),  y := (1/(σ√(T−t))) [log(x/K) + (r + ½σ²)(T−t)],
where Φ denotes the standard normal distribution function.
An alternative way of obtaining the above formula is to observe that the standard call option has
price function given by
c(t, x) = xΦ(y) − Ke^{−r(T−t)} Φ(y − σ√(T−t)),
so that the digital call price is obtained by differentiating with respect to the strike: c^{c/n}(t, x) = −∂c/∂K (t, x). We also have the digital put-call parity
c^{c/n}(t, St) + p^{c/n}(t, St) = e^{−r(T−t)},  0 ≤ t ≤ T,
which follows from c^{c/n}(T, S_T) + p^{c/n}(T, S_T) = 1 along with a risk-neutral expectation.
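The closed-form digital price, the risk-neutral expectation and the digital put-call parity can all be checked numerically. A sketch using only the standard library (Φ is statistics.NormalDist().cdf; the function names and parameters are illustrative):

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def digital_call(t, x, K, r, sigma, T):
    """Cash-or-nothing call: e^{-r(T-t)} Phi(y - sigma*sqrt(T-t))."""
    tau = T - t
    y = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return math.exp(-r * tau) * Phi(y - sigma * math.sqrt(tau))

def digital_put(t, x, K, r, sigma, T):
    """Cash-or-nothing put, via the parity c + p = e^{-r(T-t)}."""
    return math.exp(-r * (T - t)) - digital_call(t, x, K, r, sigma, T)

def digital_call_mc(x, K, r, sigma, T, n=200_000, seed=1):
    """Risk-neutral Monte Carlo estimate of the same price."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        ST = x * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        if ST >= K:
            hits += 1
    return math.exp(-r * T) * hits / n

price = digital_call(0.0, 100, 100, 0.05, 0.2, 1.0)
```

The Monte Carlo estimate agrees with the Gaussian-integral formula to within sampling error, and call plus put recovers the discounted unit of cash.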
An asset-or-nothing call option has payoff
h^{a/n}(S_T) = 0, if S_T < K;  S_T, if S_T ≥ K.
Once again, a computation using Gaussian integrals can be used to give the price at time t ≤ T.
Or we can use the observation that a vanilla call option can be decomposed into a portfolio of an
asset-or-nothing option plus a short position in K cash-or-nothing options. We have
c(t, x) = c^{a/n}(t, x) − K c^{c/n}(t, x),
so that
c^{a/n}(t, x) = c(t, x) + K c^{c/n}(t, x) = xΦ(y).
15.3. Multi-stage options. Multi-stage options are contracts that allow decisions to be made, or
stipulate conditions, at intermediate dates during the life of the contract. An example is a forward
start call option with maturity T and forward start time T_1 < T (with the contract initiated at
some time t ≤ T_1). At time T_1 < T, the option holder receives an at-the-money call option (one
whose strike is equal to the asset price at T_1, S_{T_1}) with maturity time T.
The procedure for valuing an option with expiry at T and some intermediate stage with date
T_1 is to first determine the final payoff of the option at time T, then determine the value of this
payoff at the intermediate time T_1, and then determine the value at t ≤ T_1 by using the value of
the contract at time T_1 as a terminal payoff.
Denote the price function of the multi-stage option by v^{m/s}(t, x). Then
v^{m/s}(t, x) = E^Q[e^{−r(T_1−t)} v^{m/s}(T_1, S_{T_1}) | St = x],  0 ≤ t ≤ T_1 < T.
Example 15.1 (Chooser option). A chooser option allows the holder at time T_1 the choice of buying,
for an amount K_1, either a call or a put, these options having strike K and maturity T > T_1.
The terminal payoff at time T is therefore (S_T − K)^+ or (K − S_T)^+.
At time T_1, the chooser option will be exercised if either the underlying call or put at that
time is worth more than the chooser strike K_1, that is, if c(T_1, S_{T_1}) > K_1 or p(T_1, S_{T_1}) > K_1.
The holder of the chooser option will select the vanilla option with the larger T_1-value. Hence the
value of the chooser option at time T_1 is given by
v^{ch}(T_1, S_{T_1}) = max(c(T_1, S_{T_1}; T, K) − K_1, p(T_1, S_{T_1}; T, K) − K_1, 0).
For K_1 = 0, we can use put-call parity to write
v^{ch}(T_1, S_{T_1}) = c(T_1, S_{T_1}; T, K) + max(0, Ke^{−r(T−T_1)} − S_{T_1}).
Hence the chooser with strike K_1 = 0 is equivalent to a call of maturity T and strike K plus a
put of maturity T_1 and strike Ke^{−r(T−T_1)}. The value of the chooser with strike K_1 = 0 at t ≤ T_1
is then given by
v^{ch}(t, St) = c(t, St; T, K) + p(t, St; T_1, Ke^{−r(T−T_1)}).
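The decomposition for K_1 = 0 gives a pricing routine using nothing but two Black-Scholes formulas. A sketch (bs_price and the parameter values below are our own illustrative choices):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_price(x, K, r, sigma, tau, kind):
    """Black-Scholes call or put price with time to maturity tau > 0."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    if kind == "call":
        return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)
    return K * math.exp(-r * tau) * Phi(-d2) - x * Phi(-d1)

def chooser_K1_zero(x, K, r, sigma, t, T1, T):
    """Chooser with strike K1 = 0: a call maturing at T plus a put maturing
    at T1 with strike K*exp(-r*(T - T1)), as in the decomposition above."""
    return (bs_price(x, K, r, sigma, T - t, "call")
            + bs_price(x, K * math.exp(-r * (T - T1)), r, sigma, T1 - t, "put"))

chooser0 = chooser_K1_zero(100, 100, 0.05, 0.2, 0.0, 0.5, 1.0)
```

The chooser value exceeds the vanilla call value (the holder can do at least as well as always choosing the call) but is below the cost of holding both the call and the put.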
Problem Sheet 4 values two types of multi-stage option, a forward start option and a ratchet
option.
16. Barrier Options
Barrier options are claims that are activated or de-activated if the asset price crosses a barrier.
These claims have path-dependent payoffs, that is, the history of the asset price process determines the payout at expiry. Barrier options are sometimes described as weakly path-dependent,
because their value turns out to depend only on time and the current asset price.
There are two broad classes of barrier option: knock-in options (which are activated if the
barrier is breached) and knock-out options (which are de-activated if the barrier is breached).
The knock-in options are classified as follows:
(1) An up-and-in option is activated if the barrier is hit from below.
(2) A down-and-in option is activated if the barrier is hit from above.
The knock-out options are classified as follows:
(1) An up-and-out option is de-activated (so becomes worthless) if the barrier is hit from
below.
(2) A down-and-out option is de-activated (so becomes worthless) if the barrier is hit from
above.
Call options with a barrier allow price reduction over the plain vanilla call. If you want to buy
a call, and believe the asset price will not fall very much, then a down-and-out call is cheaper than
a standard call (though of course you run the risk of losing the option if the barrier is crossed).
More complex barrier options, with multiple barriers, can be constructed. For example, a
double knock-out option has two barriers, say B2 > B1 , and is de-activated if the barrier B1 is
breached from above, and also if the barrier B2 is breached from below.
Some American digital options can be viewed as barrier options, since they become active, and
are exercised, as soon as the asset price reaches the strike. For instance, an American cash-or-nothing call pays h^{c/n}(St) if exercised at t ∈ [0, T], where
h^{c/n}(St) = 0, if St < K;  1, if St ≥ K.
If the current asset price lies above the strike, the option is immediately exercised, as there will
be no greater payoff from waiting. If the current asset price is below the strike, the option would
not be exercised, but as soon as the strike is breached from below, the option is exercised.
Barrier options sometimes incorporate a rebate. In this case, if the option knocks out, then the
holder receives a rebate R. This is equivalent to adding an American digital option whose strike
is equal to the barrier. For example, consider a down-and-out call with barrier B and rebate R.
If the barrier is breached from above, the call is cancelled, and the holder receives cash R. Hence,
the holder of the down-and-out call with rebate holds a standard down-and-out call plus a long
position in R American cash-or-nothing puts, each with strike B and payoff g c/n (St ) if exercised
at t [0, T ], given by
g^{c/n}(St) = 1, if St ≤ B;  0, if St > B.
16.1. PDE approach to valuing barrier options. We outline a PDE approach to valuing
barrier options, using the case of a down-and-out call option. Note that we may consider only
knock-out options, as the value of knock-in options can be found from the observation that a
portfolio of a knock-out option plus a knock-in option with the same barrier and strike is equivalent
to a standard option. Hence, for a down-and-out call with price cd/o (t, St ) and a down-and-in call
with price cd/i (t, St ) at time t [0, T ], we have
cd/o (t, St ) + cd/i (t, St ) = c(t, St ),
where c(t, St ) c(t, St ; K) denotes the price of a vanilla call with the same strike K as the two
barrier options.
Denote the running minimum of the stock price by
m_t := min_{0≤s≤t} S_s,  0 ≤ t ≤ T.
While the barrier has not been breached, the down-and-out call price function satisfies the BS PDE^8
L_BS c^{d/o}(t, St) = 0,  for St > B,
where L_BS denotes the BS operator in (14.1).
Now, for a standard call we have boundary conditions c(T, S_T) = (S_T − K)^+, c(t, 0) = 0, and
c(t, x) ∼ x as x → ∞. For the down-and-out call option, we still have the first and third of
these as long as the option is still active. But since the option is cancelled (and hence becomes
worthless) as soon as the barrier is breached from above, in place of the zero boundary condition
at zero stock price we instead have a zero boundary condition at the barrier:
(16.1)  c^{d/o}(t, B) = 0.
In summary, the problem of valuing the down-and-out call becomes one of finding a solution to
the BS PDE subject to this altered boundary condition:
(16.2)  L_BS c^{d/o}(t, x) = 0,  for x > B,  c^{d/o}(T, x) = (x − K)^+,  c^{d/o}(t, B) = 0.
We now perform a change of variables which converts the BS PDE to the heat equation. Define
y(x) := log(x/K),  τ(t) := ½σ²(T − t).
^8 This can be justified using a probabilistic approach to the valuation, by computing a risk-neutral expectation
of the payoff (S_T − K)^+ 1_{{m_T > B}}, as in Remark 16.1.
Then, writing
c(t, x) = Ku(τ(t), y(x)) exp(aτ(t) + by(x)),  τ(t) = ½σ²(T − t),  y(x) = log(x/K),
for suitable constants a and b, the function u will satisfy a heat equation with an appropriate initial condition, as we now show. We have
∂c/∂t = Ke^{aτ+by} (u_τ + au) dτ/dt = −½σ² Ke^{aτ+by} (u_τ + au).
Similarly,
∂c/∂x = Ke^{aτ+by} (u_y + bu) dy/dx = (K/x) e^{aτ+by} (u_y + bu),
and
∂²c/∂x² = (K/x²) e^{aτ+by} (u_yy + (2b − 1)u_y + b(b − 1)u).
Then computing the LHS of the BS PDE gives the diffusion equation u_τ = u_yy, on using the
definitions of a, b and some algebra.
The boundary condition c(T, x) = (x − K)^+ translates to the initial condition
u(0, y) = h^c(y) := (exp(½(k + 1)y) − exp(½(k − 1)y))^+,  k := 2r/σ²,
so that the vanilla call problem becomes the initial value problem
(16.3)  u_τ(τ, y) = u_yy(τ, y),  −∞ < y < ∞,  u(0, y) = h^c(y).
We can apply the same change of variables to the down-and-out call. That is, we write
c^{d/o}(t, x) = Ku^{d/o}(τ(t), y(x)) exp(aτ(t) + by(x)),
for some function u^{d/o}(τ, y). The value of y corresponding to the barrier level B is y_B := log(B/K), so the
boundary condition (16.1) becomes
(16.4)  u^{d/o}(τ, y_B) = 0.
(Notice also that the large-x behaviour c^{d/o}(t, x) ∼ x translates to u^{d/o}(τ, y) ∼ exp(−aτ + (1 − b)y) as y → ∞.)
Further, the transformed down-and-out call price function u^{d/o} satisfies the heat equation for
y > y_B, so the IBVP (16.2) becomes
u^{d/o}_τ(τ, y) = u^{d/o}_yy(τ, y),  for y > y_B,
u^{d/o}(0, y) = h^c(y),  for y > y_B,
u^{d/o}(τ, y_B) = 0.
Having translated BS-type PDEs to the heat equation, we may apply the so-called method of
images to the problem of the down-and-out call. The argument is as follows.
The problem (16.3) for the standard call option is analogous to the flow of heat in an infinite
rod (since < y < ) with initial condition u(0, y) = hc (y). The corresponding problem for
the down-and-out call is analogous to the flow of heat in a semi-infinite rod with the temperature
held at zero at one end, corresponding to the boundary condition (16.4). The method of images
relies on the key observation that solutions to the heat equation are unaffected by translation or
reflection of the spatial co-ordinate: if u(τ, y) solves the heat equation, then so do u(τ, y + y_0) and
u(τ, y_0 − y), for any constant y_0.
Hence, to solve the down-and-out call valuation problem (in the semi-infinite interval) we
use a solution made up of the solution of two infinite problems with equal and opposite initial
temperature distributions which cancel at the point yB , so that we get the correct boundary
condition (16.4). Then, by uniqueness of the solution to the initial-and-boundary-value problem
(IBVP) the resulting function is the solution we want.
We therefore reflect the initial data about the point yB , and at the same time change its sign
(creating an image solution) and combine the original solution and the image solution so as to
respect the boundary condition (16.4). The required initial condition is thus given by
u^{d/o}(0, y) = h^c(y) − h^c(2y_B − y),
which automatically satisfies ud/o (0, yB ) = 0. The solution for the down-and-out call at arbitrary
time is then given by
(16.5)  u^{d/o}(τ, y) = u(τ, y) − u(τ, 2y_B − y),
where u is the solution of the vanilla call problem, and by construction we respect the boundary
condition (16.4). In other words we have written
u^{d/o}(τ, y) = u(τ, y) + u_0(τ, y),
where u_0(τ, y) = −u(τ, 2y_B − y) is a solution of a problem on an infinite interval with anti-symmetric initial data.
Translating the solution (16.5) into the original variables, we have
c^{d/o}(t, x) = Ku^{d/o}(τ, y)e^{aτ+by}
= K(u(τ, y) − u(τ, 2y_B − y))e^{aτ+by}
= c(t, x) − Ku(τ, 2y_B − y)e^{aτ+by}.
But for any y ∈ R we have Ku(τ, y)e^{aτ+by} = c(t, x), with x = Ke^y. Using
2y_B − y = log((B²/x)/K),
we obtain
Ku(τ, 2y_B − y)e^{aτ+by} = c(t, B²/x) e^{2b(y−y_B)},
or
Ku(τ, 2y_B − y)e^{aτ+by} = (x/B)^{1−2r/σ²} c(t, B²/x).
Hence the down-and-out call price function for the case B < K is given by c^{d/o}(t, x) ≡ c^{d/o}(t, x; K),
where
c^{d/o}(t, x) = c(t, x) − (x/B)^{1−2r/σ²} c(t, B²/x),  B < K.
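The reflection formula is simple to implement. The sketch below (parameters are illustrative) also checks the qualitative behaviour: the knocked-out value vanishes at the barrier and approaches the vanilla price as the barrier drops.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_call(x, K, r, sigma, tau):
    """Vanilla Black-Scholes call with time to maturity tau."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d1 - sigma * math.sqrt(tau))

def down_and_out_call(x, K, B, r, sigma, tau):
    """Reflection formula, valid for B < K:
    c_do(t, x) = c(t, x) - (x/B)^(1 - 2r/sigma^2) * c(t, B^2/x)."""
    assert B < K and x > B
    return bs_call(x, K, r, sigma, tau) \
        - (x / B) ** (1.0 - 2.0 * r / sigma**2) * bs_call(B * B / x, K, r, sigma, tau)

c_vanilla = bs_call(100, 100, 0.05, 0.2, 1.0)
c_do_80 = down_and_out_call(100, 100, 80, 0.05, 0.2, 1.0)
c_do_60 = down_and_out_call(100, 100, 60, 0.05, 0.2, 1.0)
```

A lower barrier knocks out less often, so c_do_60 > c_do_80, and both lie below the vanilla price; just above x = B the formula is close to zero, matching the boundary condition (16.1).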
16.1.2. Second case: barrier above strike. If the barrier is above the strike, B > K, then the
payoff of the down-and-out call is discontinuous. The payoff at the terminal time is that of a
standard call option with strike B together with (B − K) cash-or-nothing calls of strike B. We
can apply the same reasoning as above to construct the reflected solution, and hence the price
function of the down-and-out call in this case.
Remark 16.1 (Probabilistic valuation of barrier option*). An alternative to the PDE derivation
above would be to value the down-and-out call using a risk-neutral expectation:
c^{d/o}(t, x) = e^{−r(T−t)} E^Q[(S_T − K)^+ 1_{{m_T > B}} | St = x].
This would require the joint distribution of ST and the running minimum of the stock price. (It
is not obvious that the result for the down-and-out value would depend only on the current value
of the stock price, but this turns out to be true.) This type of computation is carried out for an
up-and-out call in Shreve [17], Section 7.3.3.
17. Lookback options
Lookback options allow the holder to purchase (respectively, sell) a stock at the lowest (respectively, highest) price attained over the lifetime of the option, or to achieve the maximum absolute
difference between the stock price at maturity and its maximum (or minimum).
Example 17.1 (Floating and fixed strike lookback put payoffs). Denote the running maximum of
the stock price by
M_t := max_{0≤s≤t} S_s,  0 ≤ t ≤ T.
Then the payoff of the floating strike lookback put option is M_T − S_T.
The payoff of the fixed strike lookback put option is (K − m_T)^+, where m denotes the running
minimum of the stock price.
17.1. PDE satisfied by lookback pricing function. Let (V_t)_{0≤t≤T} denote the price process
of a lookback option. For concreteness, let us consider a floating strike lookback put, with payoff
V_T = h(S_T, M_T) = M_T − S_T. Recall that the general risk-neutral valuation result is valid for
path-dependent payoffs, so the price at time t ≤ T of the option is given by
V_t = e^{−r(T−t)} E^Q[M_T − S_T | F_t],  0 ≤ t ≤ T,
and in particular, the discounted option price process is a Q-martingale. We shall use this property
shortly to derive a PDE satisfied by the pricing function of a floating strike lookback option.
Lemma 17.2. The two-dimensional process (S, M) = (St, M_t)_{0≤t≤T} is a Markov process.
To prove this lemma we will use the following result, called the Independence Lemma, a
proof of which is given in the Appendix (Lemma B.2).
Lemma 17.3 (Independence Lemma). Let X and Y be two random variables (which can be
vector-valued) on a probability space (Ω, F, P). Let G be a sub-σ-algebra of F. Suppose that X is
independent of G while Y is G-measurable. Then for any function f such that E[|f(X, Y)|] < ∞,
we have
E[f(X, Y) | G] = g(Y),
where
g(y) := E[f(X, y)].
Proof. See the Appendix (Lemma B.2).
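A quick Monte Carlo sanity check of the Independence Lemma, with f(x, y) = (x + y)², X ~ N(0, 1) independent of G, and Y a fixed G-measurable value. Here g(y) = E[(X + y)²] = 1 + y². (The numbers below are purely illustrative.)

```python
import random
import statistics

rng = random.Random(2)
y = 0.7                                   # a realisation of the G-measurable Y
# Monte Carlo estimate of E[f(X, Y) | Y = y] for f(x, y) = (x + y)^2
samples = [(rng.gauss(0.0, 1.0) + y) ** 2 for _ in range(200_000)]
estimate = statistics.fmean(samples)
g_of_y = 1.0 + y ** 2                     # the lemma's g(y) = E[f(X, y)]
```

The sample mean matches g(y) to within Monte Carlo error, as the lemma asserts.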
Proof of Lemma 17.2. For fixed t ∈ [0, T], consider E^Q[f(S_T, M_T) | F_t] (we shall work under
the EMM Q, though this does not affect the result). The stock price is given by
S_t = S_0 exp(σ(W^Q_t + νt)),  ν := (1/σ)(r − ½σ²),  t ≥ 0,
for some Q-BM W^Q. Define the process
Ŵ_t := W^Q_t + νt,  t ∈ [0, T],
along with its running maximum
M̂_t := max_{0≤s≤t} Ŵ_s,  t ∈ [0, T].
In terms of Ŵ, M̂, we have
(17.1)  S_t = S_0 exp(σŴ_t),  M_t = S_0 exp(σM̂_t),  t ∈ [0, T].
Hence
S_T = S_t exp(σ(Ŵ_T − Ŵ_t)),  M_T = M_t exp(σ(M̂_T − M̂_t)),  t ∈ [0, T].
Now, M̂_T ≥ M̂_t, with strict inequality only if the maximum of Ŵ over [t, T] exceeds M̂_t.
Otherwise, M̂_T − M̂_t = 0. Hence, we have
M̂_T − M̂_t = (max_{t≤u≤T} Ŵ_u − M̂_t)^+ = (max_{t≤u≤T} (Ŵ_u − Ŵ_t) − (M̂_t − Ŵ_t))^+,
so that, using M̂_t − Ŵ_t = (1/σ) log(M_t/S_t),
M_T = M_t exp(σ(M̂_T − M̂_t)) = M_t exp(σ [max_{t≤u≤T} (Ŵ_u − Ŵ_t) − (1/σ) log(M_t/S_t)]^+).
Now apply the Independence Lemma with
X := (Ŵ_T − Ŵ_t, max_{t≤u≤T} (Ŵ_u − Ŵ_t)),  Y := (Y_1, Y_2) := (S_t, M_t).
The increments of Ŵ over [t, T] are independent of F_t, so X is independent of F_t, while Y is
F_t-measurable, and S_T, M_T have been written as functions of (X, Y). Hence
E^Q[f(S_T, M_T) | F_t] = g(S_t, M_t),
for some function g, which establishes the Markov property.
Lemma 17.4. The running maximum process M has zero quadratic variation.

Proof. Let P = {t_0, …, t_n} be a partition of [0, t] for any fixed t ≥ 0, and write ΔM_k :=
M_{t_{k+1}} − M_{t_k} for k = 0, 1, …, n − 1. Compute the sample quadratic variation
Q_P := Σ_{k=0}^{n−1} (ΔM_k)² ≤ (max_{k=0,…,n−1} ΔM_k) Σ_{k=0}^{n−1} ΔM_k.
This converges to zero as ‖P‖ → 0, because the first factor on the RHS vanishes, due to the fact
that the paths of M are continuous, while the second factor telescopes to M_t − M_0.
Observe that the above proof worked because M has non-decreasing paths, so we did not have to use the absolute value |ΔM_k| inside the maximisation operator.
So, M has zero quadratic variation, and similar arguments (consider these as exercises) show
that it is of finite variation (equal to M_t − M_0 over [0, t] for any t ≥ 0), and that the cross variation between
S and M is zero: [S, M]_t = 0 for all t ≥ 0. However, note that there is no process θ such that
M_t = M_0 + ∫_0^t θ_u du. It turns out (though we do not show this here; see Shreve [17] Section 7.4.2
for more details) that M only increases on sets of Lebesgue measure zero. (The paths of M are
said to be singularly continuous.)
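The vanishing quadratic variation is visible in simulation. The sketch below (illustrative grid sizes) simulates one Brownian path, forms its running maximum, and compares sample quadratic variations on a coarse and a refined grid of the same path; since the increments of the running maximum are non-negative, refining the partition can only decrease the sum of squares.

```python
import math
import random

def brownian_path(n_steps, T=1.0, seed=7):
    """Simulate a Brownian path W on a grid of n_steps points over [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def running_max(path):
    m, out = -math.inf, []
    for x in path:
        m = max(m, x)
        out.append(m)
    return out

def sample_qv(path):
    """Sample quadratic variation over the path's own grid."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

w = brownian_path(2 ** 16)
m = running_max(w)
qv_w = sample_qv(w)               # close to T = 1 for the BM itself
qv_m_coarse = sample_qv(m[::16])  # running max on a coarse sub-grid
qv_m_fine = sample_qv(m)          # same path, refined grid: smaller still
```

The Brownian path accumulates quadratic variation close to T = 1, while the running maximum's sample quadratic variation is already tiny and shrinks further under refinement.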
Theorem 17.5 (PDE for floating strike lookback option). In the standard BSM model, the pricing
function v(t, x, y) of the floating strike lookback put option,
(17.2)  v(t, x, y) := e^{−r(T−t)} E^Q[M_T − S_T | St = x, M_t = y],
satisfies the PDE
(17.3)  L_BS v(t, x, y) ≡ (v_t + rxv_x + ½σ²x²v_xx − rv)(t, x, y) = 0,  0 ≤ t < T,  0 < x ≤ y,
with boundary conditions
v(t, 0, y) = e^{−r(T−t)} y,  0 ≤ t ≤ T,  y ≥ 0,
v_y(t, y, y) = 0,  0 ≤ t < T,  y > 0,
v(T, x, y) = y − x,  0 ≤ x ≤ y.
Proof. We have
v(t, x, y) = e^{−r(T−t)} E^Q[M_T − S_T | St = x, M_t = y].
For λ > 0, define the processes S^{(λ)} := λS, M^{(λ)} := λM. Re-write v(t, x, y) as
(17.8)  v(t, x, y) = e^{−r(T−t)} E^Q[(1/λ)(M^{(λ)}_T − S^{(λ)}_T) | S^{(λ)}_t = λx, M^{(λ)}_t = λy].
The process S^{(λ)} satisfies the same SDE as S, and hence a similar result will also hold for M^{(λ)}. We can therefore re-write (17.8) as
v(t, x, y) = e^{−r(T−t)} (1/λ) E^Q[M_T − S_T | St = λx, M_t = λy] = (1/λ) v(t, λx, λy).
Choosing λ = 1/y gives v(t, x, y) = y u(t, z), where z := x/y ∈ (0, 1] and u(t, z) := v(t, z, 1).
Substituting into (17.3) and the boundary conditions, u satisfies
u_t + rzu_z + ½σ²z²u_zz − ru = 0,  0 ≤ t < T,  0 < z ≤ 1,
u(t, 0) = e^{−r(T−t)},  0 ≤ t ≤ T,
u_z(t, 1) = u(t, 1),  0 ≤ t < T,
u(T, z) = 1 − z,  0 ≤ z ≤ 1.
It can be checked that the PDE and boundary conditions are satisfied by an explicit function of
(t, z), expressible in terms of the quantity
d := (1/(σ√(T−t))) [log z + (r + ½σ²)(T−t)];
Shreve [17] has the full computation in Exercise 7.5.
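A crude Monte Carlo check of the floating strike lookback put, monitoring the maximum on a discrete grid (which biases the price slightly downward relative to the continuous-monitoring value). Function names and parameters are illustrative.

```python
import math
import random

def lookback_floating_put_mc(S0, r, sigma, T, n_paths=10_000, n_steps=100, seed=3):
    """Monte Carlo price of the floating strike lookback put, payoff M_T - S_T."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        s = S0
        m = S0                       # running maximum along this path
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            m = max(m, s)
        total += m - s               # payoff M_T - S_T (always non-negative)
    return math.exp(-r * T) * total / n_paths

lb_price = lookback_floating_put_mc(100, 0.05, 0.2, 1.0)
```

Note the payoff is non-negative on every path and dominates the at-the-money put payoff, so the lookback price comes out well above the corresponding vanilla put value.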
18. Asian options
An Asian option is a claim involving the average of the asset price over the option lifetime.
Define the process
Y_t := ∫_0^t S_u du,  t ≥ 0.
The average of the stock price over [0, t] is then defined as A_t := Y_t/t. Then an Asian option
will have some payoff h(S_T, A_T). For instance, the fixed strike Asian call option with maturity
T and strike K has payoff (A_T − K)^+, while the floating strike Asian call option has payoff
(S_T − A_T)^+. The original motivation for introducing these claims into financial markets was as
a way of preventing people easily affecting the realised payoff of an option by manipulating the
stock price.
It turns out that Asian options are strongly path-dependent. The claim price process depends
on both S and Y (and Y depends on the entire history of the asset price). To see this, observe
that the SDEs for S, Y under Q are
dSt = rSt dt + σSt dW^Q_t,  dY_t = St dt,  0 ≤ t ≤ T.
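Fixed strike Asian calls are straightforward to price by Monte Carlo, approximating A_T by the arithmetic average over the simulation grid; pricing the vanilla call on the same paths shows how averaging damps the effective volatility. Parameters below are illustrative.

```python
import math
import random

def asian_and_vanilla_call_mc(S0, K, r, sigma, T, n_paths=20_000, n_steps=100, seed=5):
    """Monte Carlo prices of the fixed strike Asian call, payoff (A_T - K)^+,
    and the vanilla call (S_T - K)^+ on the same simulated paths."""
    rng = random.Random(seed)
    dt = T / n_steps
    asian = vanilla = 0.0
    for _ in range(n_paths):
        s = S0
        running = 0.0
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            running += s
        a = running / n_steps            # discrete approximation of A_T
        asian += max(a - K, 0.0)
        vanilla += max(s - K, 0.0)
    disc = math.exp(-r * T)
    return disc * asian / n_paths, disc * vanilla / n_paths

asian_price, vanilla_price = asian_and_vanilla_call_mc(100, 100, 0.05, 0.2, 1.0)
```

As expected, the Asian call is considerably cheaper than the vanilla call of the same strike and maturity.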
If one plots the implied volatility (IV) of options of different strikes (equivalently, of different
moneyness St/K) and of different maturities, using market prices of options, then if the BS model
were a true depiction of stock price dynamics, one would see a flat IV surface across moneyness
and maturity. But this is not observed in market option prices. These show a variation in
implied volatility with strike price and maturity, with an apparently stochastic dynamics. This
may be evidence of stochastic volatility in the underlying stock price process. Indeed, stochastic
volatility option pricing models can produce some of the observed features seen in empirical
implied volatilities, though there is evidence that additional stochastic factors are needed to
capture all the features of the observed implied volatility surface.
19.2. Local volatility model. Here is a simple generalisation of the BS model that allows the
volatility to depend on both time and the current level of the stock price (and hence be stochastic,
but only through dependence on the stock price):
dSt = μSt dt + σ(t, St)St dW_t,
for some non-negative function σ(·, ·). Then the market is still complete, with the usual replication
arguments going through unchanged. Option price functions will still satisfy the BS PDE with
volatility σ(t, x):
v_t(t, x) + rxv_x(t, x) + ½σ²(t, x)x²v_xx(t, x) − rv(t, x) = 0.
Remark 19.2 (Recovering the local volatility function from option prices*). Dupire [5] has shown
how it is possible to find a volatility function σ(t, x) that is consistent with an observed implied
volatility surface on a given date, given (the idealised scenario) that we can observe market prices
of vanilla options of all strikes K ≥ 0. This is not to say that a local volatility model represents
a true model of how volatilities actually evolve. It is better thought of as an "effective theory":
a code which represents a convenient parametrisation of today's observed option prices (which
might be generated by a stochastic volatility model, or a model with yet more factors).
Indeed, empirical studies show that the local volatility function consistent with option prices
on a fixed date is unstable over time. This reflects the limitations of such models: they contradict
the empirical observation that price and volatility are not perfectly correlated (empirical studies
show a negative correlation between price and volatility), and since they are complete, options are
redundant in these models, so they say nothing about vega hedging. In essence, to get a true
reflection of the underlying process, one needs volatility to be genuinely stochastic, as described
shortly. However, this does not necessarily mean that inaccurate models are not useful in practical
hedging of options, as discussed in Section 19.4.
See Shreve [17] Exercise 6.10 for a guide to Dupires result.
19.3. Stochastic volatility models. A stochastic volatility model is one in which we let the
volatility process of a stock be a stochastic process in its own right, with a risky component that
is correlated with that driving the stock price. Here is one specification of such a model. Under
the physical measure P, a stock price S and its volatility Y follow
dSt = μSt dt + Yt St dW_t,
dY_t = a(Y_t) dt + b(Y_t) dB_t,
for suitable drift and diffusion functions a(·), b(·),
where the Brownian motions W, B have correlation ρ ∈ [−1, 1]. We write B = ρW + √(1 − ρ²) W^⊥,
with (W, W^⊥) := (W_t, W^⊥_t)_{0≤t≤T} a two-dimensional Brownian motion on a complete filtered
probability space (Ω, F, F := (F_t)_{0≤t≤T}, P).
We claim that this market is incomplete, in that there is one traded asset and two (so more
than the number of traded assets) independent driving Brownian motions W, W^⊥, which are the
sources of risk. (In general, if the number of sources of risk is greater than the number of traded
risky assets, then the model is incomplete.)
For simplicity, let us take the interest rate to be zero. Suppose we have a European claim on
the stock, with payoff h(ST ). (We could allow for a claim whose payoff also depends on Y , but
95
this would not change the arguments below, so for simplicity we do not do this here.) Consider
attempting to replicate this option with a dynamic self-financing portfolio involving the traded
asset S (plus cash). The portfolio wealth process X := (X_t)_{0≤t≤T} satisfies
(19.1)  dX_t = H_t dS_t,
where H = (H_t)_{0≤t≤T} is the process for the number of shares of S in the portfolio.
The value process of the claim will be (V_t)_{t∈[0,T]}, where V_t = v(t, St, Y_t), for some function
v(t, x, y). In other words, because Y enters into the dynamics for S, claims on S will have price
processes that depend on Y. The Itô formula gives
(19.2)  dv(t, St, Y_t) = v_t(t, St, Y_t) dt + v_x(t, St, Y_t) dSt + ½ v_xx(t, St, Y_t) d[S]_t
+ v_y(t, St, Y_t) dY_t + ½ v_yy(t, St, Y_t) d[Y]_t + v_xy(t, St, Y_t) d[S, Y]_t.
It is immediately apparent from (19.1) and (19.2) that perfect hedging of the claim using the
portfolio with value process X is not possible, because of the term in (19.2) involving dYt , which
has a component involving dW^⊥_t, representing the unhedgeable risk associated with the claim.
19.3.1. Completion of the market. Now introduce a second traded asset in the form of another
claim, with payoff g(S_{T′}) at time T′ ≥ T, and we emphasise that this claim is traded in the market.
Denote the price of this claim at t ≤ T′ by U_t = u(t, St, Y_t), for some function u(t, x, y).
Form a portfolio with H_t units of S and H^U_t units of U at time t ∈ [0, T], with the remaining
wealth in the form of cash. The value process of the portfolio is now X = (X_t)_{0≤t≤T}, satisfying
dX_t = H_t dSt + H^U_t dU_t,
with dU_t = du(t, St, Y_t) given by (19.2) with v(t, St, Y_t) replaced by u(t, St, Y_t). Use this portfolio
to (attempt to) perfectly hedge the claim V, so set X_t = v(t, St, Y_t) for all t ∈ [0, T]. This requires
dX_t = dv(t, St, Y_t). Equating terms involving dY_t gives
(19.3)  H^U_t u_y(t, St, Y_t) = v_y(t, St, Y_t),  0 ≤ t ≤ T.
This is a vega hedging formula. The volatility risk associated with sensitivity to changes in Y for
the claim V is represented by the right-hand side of (19.3), and this is hedged using H^U_t units of
claim U.
Using (19.3) and equating terms involving dSt then gives
(19.4)  H_t = v_x(t, St, Y_t) − (v_y/u_y)(t, St, Y_t) u_x(t, St, Y_t),
which is a (generalised) delta hedging formula. (The number of units of stock is the delta of the
claim to be hedged, less the delta of the H^U_t units of the claim that has been used to achieve the
vega hedge.)
Finally, using (19.3) and (19.4) and equating the finite variation terms in dXt = dv(t, St , Yt )
gives that the functions u, v must satisfy
(19.5)  (1/u_y)(u_t + ½x²y²u_xx + ½b²(y)u_yy + ρxyb(y)u_xy)
      = (1/v_y)(v_t + ½x²y²v_xx + ½b²(y)v_yy + ρxyb(y)v_xy),
for all (t, x, y) ∈ [0, T] × R²_+ (with the arguments of u, v omitted for brevity).
Now, the left-hand side of (19.5) contains terms involving u(t, x, y) only, while the right-hand side
contains terms involving v(t, x, y) only. In principle, therefore, the left-hand side
of (19.5) depends on T′ (but not on T) while the right-hand side depends on T (but
not on T′). Hence we can only have equality in (19.5) if both sides of the equation do not depend
on either of T, T′. In other words, both sides of (19.5) must be equal to some function of
(t, x, y) only, which we write as −φ(t, x, y). Then both u(t, x, y) and v(t, x, y) satisfy a PDE of the same form. For v we obtain
(19.6)  v_t + ½x²y²v_xx + φ(t, x, y)v_y + ½b²(y)v_yy + ρxyb(y)v_xy = 0,  v(T, x, y) = h(x),
with u satisfying the same PDE but with terminal condition u(T′, x, y) = g(x).
By the Feynman-Kac theorem, both u and v will have expectation representations. For v we
have
v(t, x, y) = E^Q[h(S_T) | St = x, Y_t = y],
where Q is a measure equivalent to P and under which S, Y have dynamics
dSt = Y_t St dW^Q_t,
dY_t = φ(t, St, Y_t) dt + b(Y_t) dB^Q_t,
where W^Q, B^Q are Q-BMs with correlation ρ, and φ is the function appearing in the PDE above. We observe that Q is an EMM, as S is a Q-martingale (recall that the interest rate is zero), while the drift of Y is arbitrary, as it is not a traded
asset. There are many possible EMMs, given the arbitrary drift of Y. This is a reflection of the
incompleteness of the market with stochastic volatility.
19.4. Robustness of the Black-Scholes formula. We end with a remarkable robustness property of BS-style hedging. We know that the stock price dynamics in the BS model are almost
certainly wrong, but this does not necessarily imply that we cannot use a delta-hedging rule
based on the BS formula to achieve a successful hedge, even in the face of severe model error, as
the following argument shows.
Suppose the true price process of a stock is
dSt = μ_t St dt + σ_t St dW_t,
where (μ_t, σ_t)_{t≥0} are processes adapted to a filtration F = (F_t)_{t≥0}. The market is not necessarily
complete, so the filtration F can be larger than the filtration generated by the BM W.
Suppose a trader sells an option (say, a call with some maturity T) at time zero using an IV of
σ_0. That is, the option is sold for v(0, S_0), where v(t, x) solves the BS PDE with volatility σ_0:
(19.7)  v_t(t, x) + rxv_x(t, x) + ½σ_0²x²v_xx(t, x) − rv(t, x) = 0.
The trader uses the proceeds of the option sale to form a hedge portfolio with initial value
X_0 = v(0, S_0), and then uses the hedge H_t = v_x(t, St) (so that X_t − H_tSt is in cash) at t ∈ [0, T].
Define R_t := X_t − v(t, St), the tracking error (or residual risk). Using the Itô formula and
the PDE satisfied by v(t, x), we have (exercise!)
d(e^{−rt}R_t) = ½ e^{−rt} St² v_xx(t, St)(σ_0² − σ_t²) dt.
We conclude that, since v_xx(t, St) ≥ 0 (for both a call and a put), we have R_T ≥ 0 a.s. if σ_0 ≥ σ_t
for all t ∈ [0, T]. In other words, the hedging strategy makes a profit with probability 1 if the
implied volatility σ_0 is high enough. In this sense, successful hedging is entirely a matter of good
volatility estimation.
This is a crucial result, as it shows that successful hedging is quite possible even under significant
model error. Without some robustness property of this kind, it is hard to imagine that the
derivatives industry could exist at all.
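The robustness result can be seen in a simulation sketch: sell the call at implied volatility σ_0, delta-hedge at σ_0 on a discrete time grid, and let the stock actually move at a lower volatility. Discrete rebalancing adds noise, so individual paths can lose money, but the average tracking error is strongly positive. All names and parameters below are our own illustrative choices.

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_call(x, K, r, sig, tau):
    d1 = (math.log(x / K) + (r + 0.5 * sig**2) * tau) / (sig * math.sqrt(tau))
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d1 - sig * math.sqrt(tau))

def bs_delta(x, K, r, sig, tau):
    return Phi((math.log(x / K) + (r + 0.5 * sig**2) * tau) / (sig * math.sqrt(tau)))

def mean_tracking_error(sigma0, sigma_true, S0=100.0, K=100.0, r=0.02, T=1.0,
                        n_steps=200, n_paths=1_000, seed=11):
    """Sell a call at implied vol sigma0 and delta-hedge at sigma0, while the
    stock moves with volatility sigma_true; return the average terminal
    tracking error R_T = X_T - (S_T - K)^+."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        s = S0
        x = bs_call(S0, K, r, sigma0, T)        # proceeds of the option sale
        for i in range(n_steps):
            tau = T - i * dt
            h = bs_delta(s, K, r, sigma0, tau)  # hedge ratio at the wrong vol
            cash = x - h * s
            s *= math.exp((r - 0.5 * sigma_true**2) * dt
                          + sigma_true * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            x = h * s + cash * math.exp(r * dt)
        total += x - max(s - K, 0.0)
    return total / n_paths

avg_error = mean_tracking_error(sigma0=0.25, sigma_true=0.15)
```

With σ_0 = 0.25 against a true volatility of 0.15, the average profit is of the order of the gamma-weighted integral in the tracking error formula above.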
Appendix A. Conditional expectation, martingales, equivalent measures
A.1. Independence. Here is a general treatment of independence in a finite probability space
(Ω, F, P). Many of the definitions as written here extend to general probability spaces.
Definition A.1 (Independence of sets). Two sets A ∈ F and B ∈ F are independent if
P(A ∩ B) = P(A)P(B).
To see that this is a correct definition, suppose that a random experiment is conducted, and
ω is the outcome. The probability that ω ∈ A is P(A). Suppose you are not told ω, but you are
told that ω ∈ B. Conditional on this information, the probability that ω ∈ A is
P(A|B) := P(A ∩ B)/P(B).
The sets A and B are independent if and only if this conditional probability is the unconditional
probability P(A), i.e. knowing that ω ∈ B does not change the probability you assign to A. This
discussion is symmetric with respect to A and B; if A and B are independent and you know that
ω ∈ A, the conditional probability you assign to B is still the unconditional probability P(B).
Note that whether two sets are independent depends on the probability measure P.
Definition A.2 (Independence of σ-algebras). Let G and H be sub-σ-algebras of F. We say that
G and H are independent if every set in G is independent of every set in H, i.e.

P(A ∩ B) = P(A)P(B), for every A ∈ G, B ∈ H.

Definition A.3 (Independence of random variables). Two random variables X and Y are independent if the σ-algebras they generate, σ(X) and σ(Y), are independent.
The above definition says that for independent random variables X and Y , every set defined
in terms of X is independent of every set defined in terms of Y .
Suppose X and Y are independent random variables. The measure induced by X on R is
μX(A) := P{X ∈ A}, for A ⊆ R. Similarly, the measure induced by Y is μY(B) := P{Y ∈ B},
for B ⊆ R. The pair (X, Y) takes values in the plane R², and we define the measure induced by
the pair (X, Y) as

μX,Y(C) := P{(X, Y) ∈ C}, C ⊆ R².

In particular, C could be a rectangle, i.e. a set of the form A × B, where A ⊆ R and B ⊆ R. In
this case

{(X, Y) ∈ C} = {(X, Y) ∈ A × B} = {X ∈ A} ∩ {Y ∈ B},

and X and Y are independent if and only if

μX,Y(A × B) = P({X ∈ A} ∩ {Y ∈ B})
            = P{X ∈ A}P{Y ∈ B}
            = μX(A)μY(B).

In other words, for independent random variables X and Y, the joint distribution represented by
μX,Y factorises into the product of the marginal distributions represented by the measures μX
and μY.
Theorem A.4. Suppose X and Y are independent random variables. Let g and h be functions
from R to R. Then g(X) and h(Y ) are also independent random variables.
Proof. We prove this only in the special case where g, h are bijections (but the result is true in
general). Put W = g(X) and Z = h(Y). We must consider sets in σ(W) and σ(Z). But a typical
set in σ(W) is of the form

{ω : W(ω) ∈ A} = {ω : g(X(ω)) ∈ A},

which is defined in terms of the random variable X, and is therefore in σ(X). So, every set in
σ(W) is also in σ(X). Similarly every set in σ(Z) is also in σ(Y). Since every set in σ(X) is
independent of every set in σ(Y), we conclude that every set in σ(W) is independent of every set
in σ(Z).
MICHAEL MONOYIOS
Definition A.5. Let X1, X2, . . . be a sequence of random variables. We say that these random
variables are independent if for every sequence of sets A1 ∈ σ(X1), A2 ∈ σ(X2), . . ., and for every
positive integer n,

P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1)P(A2) . . . P(An).
Theorem A.6. If two random variables X and Y are independent, and if g and h are functions
from R to R, then
E[g(X)h(Y )] = E[g(X)] E[h(Y )],
provided all the expectations are defined.
Proof. Note that by Theorem A.4 it is enough to prove the result for g(x) = x and h(y) = y.
We prove this only in a finite probability space when X, Y can take on only finitely many values
X = xi, i = 1, . . . , K and Y = yj, j = 1, . . . , L. We use the fact that in this case, the expectation
of X has the familiar form E[X] = Σ_{i=1}^K xi P{X = xi}. So we have

E[XY] = Σ_{i=1}^K Σ_{j=1}^L xi yj P({X = xi} ∩ {Y = yj})
      = Σ_{i=1}^K Σ_{j=1}^L xi yj P{X = xi}P{Y = yj}
      = (Σ_{i=1}^K xi P{X = xi})(Σ_{j=1}^L yj P{Y = yj})
      = E[X] E[Y].
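In a finite probability space the proof above is just a double sum, which can be checked directly. Here is a minimal sketch in Python; the laws of X and Y below are hypothetical numbers.

```python
from itertools import product

# Laws of two independent discrete random variables (hypothetical numbers):
# X takes value x with probability xs[x], likewise Y with ys[y].
xs = {1.0: 0.2, 2.0: 0.5, 5.0: 0.3}
ys = {-1.0: 0.6, 3.0: 0.4}

def E(law):
    """Expectation of a discrete law given as {value: probability}."""
    return sum(v * p for v, p in law.items())

# Under independence the joint law is the product measure,
# P{X = x, Y = y} = xs[x] * ys[y], so E[XY] is the double sum of the proof.
E_XY = sum(x * y * px * py for (x, px), (y, py) in product(xs.items(), ys.items()))

assert abs(E_XY - E(xs) * E(ys)) < 1e-12
```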
Remark A.7 (The standard machine). For general probability spaces, the above theorem is proved
using the Lebesgue integral representation of expectation, and an argument which Shreve [17]
(Section 1.5) calls the standard machine. Let g(x) = 1A(x) and h(y) = 1B(y) be indicator
functions. Then the equation we are trying to prove becomes

P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A}P{Y ∈ B},

which is true because X and Y are independent. Now this is extended to simple functions
(sums of indicator functions) by linearity of expectation. Sequences of such functions can always
be constructed that converge to general functions g and h, and then an integral convergence
theorem, the Monotone Convergence Theorem,9 gives the result.
The covariance of two random variables X and Y is

cov(X, Y) := E[(X − EX)(Y − EY)] = E[XY] − E[X] E[Y],

so var(X) = cov(X, X). According to Theorem A.6, two independent random variables have zero
covariance (though the converse is not necessarily true!).
9 The Monotone Convergence Theorem is as follows. Let Xn, n = 1, 2, . . . be a sequence of random variables
converging almost surely to a random variable X (that is, P{limn→∞ Xn = X} = 1). Assume that

0 ≤ X1 ≤ X2 ≤ . . . , almost surely.

Then

∫ X dP = limn→∞ ∫ Xn dP,

or equivalently E[X] = limn→∞ E[Xn].
For independent random variables, the variance of their sum is the sum of their variances.
Indeed, for any two random variables X and Y, if Z = X + Y, then

var(Z) = var(X + Y) = var(X) + var(Y) + 2 cov(X, Y),
so that for independent X and Y , var(X + Y ) = var(X) + var(Y ).
This argument extends to any finite number of random variables. If we are given independent
random variables X1 , X2 , . . . , Xn , then
var(X1 + . . . + Xn ) = var(X1 ) + . . . + var(Xn ).
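The additivity of variances can be verified exactly on a small example by building the law of Z = X + Y from the product measure. A sketch with hypothetical discrete laws:

```python
from itertools import product

# Laws of two independent random variables (hypothetical numbers).
X_law = {0.0: 0.5, 1.0: 0.5}
Y_law = {1.0: 0.25, 2.0: 0.5, 4.0: 0.25}

def moments(law):
    """Mean and variance of a discrete law {value: probability}."""
    m = sum(v * p for v, p in law.items())
    var = sum((v - m) ** 2 * p for v, p in law.items())
    return m, var

# Law of Z = X + Y built from the product measure (this is where
# independence enters; for dependent variables the joint law differs).
Z_law = {}
for (x, px), (y, py) in product(X_law.items(), Y_law.items()):
    Z_law[x + y] = Z_law.get(x + y, 0.0) + px * py

_, var_X = moments(X_law)
_, var_Y = moments(Y_law)
_, var_Z = moments(Z_law)
assert abs(var_Z - (var_X + var_Y)) < 1e-12
```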
Example A.8. Toss a coin twice, so Ω = {HH, HT, TH, TT}, with probability p ∈ (0, 1) for H
and probability q = 1 − p for T on each toss. Let A = {HH, HT} and B = {HT, TH}. We have
P(A) = p² + pq = p, P(B) = pq + qp = 2pq, and P(A ∩ B) = pq. The sets A and B are independent
if and only if 2p²q = pq, that is, if and only if p = 1/2.

Let G = F1 be the σ-algebra determined by the first toss and H be the σ-algebra determined
by the second toss. Then, writing AH := {HH, HT} and AT := {TH, TT}, we have

G = {∅, Ω, AH, AT},
H = {∅, Ω, {HH, TH}, {HT, TT}}.

It is easy to see that these two σ-algebras are independent. For example, if we choose AH from
G and {HH, TH} from H, we find

P(AH)P{HH, TH} = p(p² + qp) = p²,
P(AH ∩ {HH, TH}) = P{HH} = p².

This will be true no matter which sets we choose from G and H. This captures the notion that
the coin tosses are independent of each other.
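The claims of Example A.8 can be checked exhaustively on the four-point sample space. A sketch (the choice p = 0.3 is arbitrary, deliberately not 1/2):

```python
from itertools import product

p = 0.3                     # any p in (0, 1); p != 1/2 chosen deliberately
q = 1.0 - p
# Product measure on the four outcomes of two tosses.
P = {w: (p if w[0] == 'H' else q) * (p if w[1] == 'H' else q)
     for w in ('HH', 'HT', 'TH', 'TT')}

def prob(A):
    return sum(P[w] for w in A)

# Non-trivial sets of G (determined by toss 1) and H (determined by toss 2).
G_sets = [{'HH', 'HT'}, {'TH', 'TT'}]
H_sets = [{'HH', 'TH'}, {'HT', 'TT'}]
for A, B in product(G_sets, H_sets):
    assert abs(prob(A & B) - prob(A) * prob(B)) < 1e-12   # independent

# But A = {HH, HT} and B = {HT, TH} are NOT independent unless p = 1/2.
A, B = {'HH', 'HT'}, {'HT', 'TH'}
assert abs(prob(A & B) - prob(A) * prob(B)) > 1e-6
```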
A.2. Conditional expectation. Recall Definition 4.1 of conditional expectation E[X|G] given
a random variable X on (Ω, F, P), with G a sub-σ-algebra of F.

A.2.1. Partial averaging. The partial averaging property is

(A.1)    ∫_A E[X|G] dP = ∫_A X dP,  A ∈ G.

Note that 1A(ω) (which equals 1 for ω ∈ A and 0 otherwise) is a G-measurable random variable.
Equation (A.1) suggests (and it is indeed true) that the following holds.
Lemma A.9. If V is any G-measurable random variable, then provided E|V E[X|G]| < ∞,

(A.2)    E[V E[X|G]] = E[V X].
Proof. Here is a sketch of the proof in a general probability space, using an argument that Williams
[18] calls the standard machine.

First use (A.1) and linearity of expectations to prove (A.2) when V is a simple G-measurable
random variable, i.e. V = Σ_{k=1}^K ck 1_{Ak}, where each Ak ∈ G and each ck is constant. Next consider
the case that V is a nonnegative G-measurable random variable, not necessarily simple. Such a V
can be written as the limit of an increasing (almost surely) sequence of simple random variables
Vn. We write (A.2) for each Vn and pass to the limit n → ∞, using the Monotone Convergence
Theorem, to obtain (A.2) for V. Finally, the general (integrable) G-measurable random variable
V can be written as the difference of two nonnegative random variables, V = V⁺ − V⁻, and since
(A.2) holds for V⁺ and V⁻ it must hold for V as well.
Based on Lemma A.9, we can replace the second condition in the definition of conditional
expectation by (A.2), so that the defining properties of Y = E[X|G] are:
(1) Y = E[X|G] is G-measurable.
(2) For every G-measurable random variable V , we have
E[V E[X|G]] = E[V X].
Note that we can write (A.2) as

E[V(E[X|G] − X)] = 0,

which allows an interpretation of E[X|G] as the projection of the vector X on to the subspace
of G-measurable random variables. Then E[X|G] − X is perpendicular to any V in the subspace.
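On a finite probability space, when G is generated by a partition, E[X|G] is simply the P-weighted average of X over each partition block, and the orthogonality above can be checked directly. A minimal sketch with hypothetical numbers:

```python
# Finite probability space: outcomes with probabilities (hypothetical numbers).
P = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
X = {'a': 5.0, 'b': 1.0, 'c': 2.0, 'd': 6.0}

# G is generated by the partition {a, b}, {c, d}.
partition = [{'a', 'b'}, {'c', 'd'}]

# E[X|G] is constant on each block: the P-weighted average over the block.
condE = {}
for block in partition:
    pb = sum(P[w] for w in block)
    avg = sum(X[w] * P[w] for w in block) / pb
    for w in block:
        condE[w] = avg

# Any G-measurable V is constant on blocks; check E[V(E[X|G] - X)] = 0.
V = {'a': 7.0, 'b': 7.0, 'c': -2.0, 'd': -2.0}
inner = sum(V[w] * (condE[w] - X[w]) * P[w] for w in P)
assert abs(inner) < 1e-9
```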
A.2.2. Properties of conditional expectation. Here are some proofs of the properties of conditional
expectation given in Section 4.2 (and repeated below). All the X below satisfy E|X| < ∞.

(1) E[E[X|G]] = E[X].

Proof. Take A = Ω in the partial averaging property (or, equivalently, V = 1 in (A.2)).

(2) If X is G-measurable, then E[X|G] = X.

Proof. The partial averaging property ∫_A Y dP = ∫_A X dP, with Y = E[X|G], holds trivially
when Y is replaced by X. Then, if X is G-measurable, it satisfies the first requirement in
the definition of conditional expectation as well.
(3) (Linearity) For a1, a2 ∈ R,

E[a1X1 + a2X2|G] = a1E[X1|G] + a2E[X2|G].

Proof. By linearity of integrals (i.e. of expectations), as follows: E[a1X1 + a2X2|G] is
G-measurable and satisfies, for any A ∈ G,

∫_A E[a1X1 + a2X2|G] dP = ∫_A (a1X1 + a2X2) dP  (partial averaging)
    = a1 ∫_A X1 dP + a2 ∫_A X2 dP  (linearity of integrals)
    = a1 ∫_A E[X1|G] dP + a2 ∫_A E[X2|G] dP  (partial averaging)
    = ∫_A (a1E[X1|G] + a2E[X2|G]) dP  (linearity of integrals),

so a1E[X1|G] + a2E[X2|G] satisfies the defining properties of E[a1X1 + a2X2|G].

(4) (Positivity) If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.

Proof. Let A := {E[X|G] < 0} ∈ G. By partial averaging,

∫_A E[X|G] dP = ∫_A X dP.

The RHS of this is ≥ 0 and the LHS is < 0 unless P(A) = 0. Therefore we must have
P(A) = 0, i.e. E[X|G] ≥ 0 almost surely.
(7) (Taking out what is known) If Z is G-measurable and E|ZX| < ∞, then E[ZX|G] = Z E[X|G]
a.s.

Proof. Z E[X|G] is G-measurable, and for any A ∈ G,

∫_A Z E[X|G] dP = E[1A Z E[X|G]] = E[1A ZX] = ∫_A ZX dP

(the second equality obtained using (A.2) with V = 1A Z), so the partial averaging property holds.
(8) (Role of independence) If X is independent of H (i.e. if σ(X) and H are independent
σ-algebras), then

E[X|H] = E[X].
Proof. Observe first that E[X] is H-measurable, since it is not random. So we only need
to check the partial averaging property; we require that

∫_A E[X] dP = ∫_A X dP,  A ∈ H.

If X = 1B for some B ∈ σ(X), then for any A ∈ H,

∫_A X dP = P(A ∩ B) = P(A)P(B) = ∫_A E[X] dP,

and so the partial averaging property holds because the sets A and B are independent. The
partial averaging property for general X independent of H then follows by the standard
machine.
Remark A.10. There are also analogues of integral convergence theorems such as Fatou's Lemma,
and the Monotone and Dominated Convergence Theorems, for conditional expectations as opposed
to ordinary expectations.
A.3. Martingales. A simple argument using the tower property and induction shows the following.
Lemma A.11. Let (Mt)_{t=0}^n be a martingale with respect to the filtration (Ft)_{t=0}^n. Then

E[Mt+u|Ft] = Mt,

for arbitrary u ∈ {1, 2, . . . , n − t}.

Proof. Consider E[Mt+2|Ft]. By the tower property,

E[Mt+2|Ft] = E[E[Mt+2|Ft+1]|Ft] = E[Mt+1|Ft] = Mt,

and continuing in this fashion we get

E[Mt+u|Ft] = Mt,

for u = 1, 2, . . . , n − t.
Lemma A.12. Let X be an integrable random variable (E[|X|] < ∞) on a filtered probability
space (Ω, F, F := (Ft)_{t=0}^n, P). Define

Mt := E[X|Ft],  t ∈ {0, 1, . . . , n}.

Then (Mt)_{t=0}^n is a (P, F)-martingale.

Proposition A.13. Let (Mt)_{t=0}^n be a (P, F)-martingale and let (φt)_{t=1}^n be a predictable
process. Then N0 := 0 and

Nt := Σ_{s=1}^t φs(Ms − Ms−1),  t ∈ {1, . . . , n},

is a (P, F)-martingale.

Proof. N is adapted, and for t ∈ {0, 1, . . . , n − 1},

E[Nt+1|Ft] = E[Nt + φt+1(Mt+1 − Mt)|Ft]
           = Nt + φt+1(E[Mt+1|Ft] − Mt)  (since Nt and φt+1 are Ft-measurable)
           = Nt,

since E[Mt+1|Ft] − Mt = 0.
Remark A.14. The process N is called a martingale transform or a discrete-time stochastic integral,
and is sometimes denoted Nt = ∫_{(0,t]} φs dMs or Nt = (φ · M)t.
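The martingale transform can be checked exactly on a small tree by enumerating all paths. A sketch for a symmetric ±1 random walk, with a hypothetical predictable strategy φ that looks only at past steps:

```python
from itertools import product

# Symmetric +/-1 steps under P(step = 1) = 1/2: the partial sums form a martingale.
n = 4
prob = 0.5

def phi(past_steps):
    """A (hypothetical) predictable strategy: phi_t uses steps 1..t-1 only."""
    return 1.0 if sum(past_steps) >= 0 else 2.0

# E[N_n] computed exactly over all 2^n paths.
E_N = 0.0
for steps in product((1, -1), repeat=n):
    p_path = prob ** n
    N = 0.0
    for t in range(1, n + 1):
        # phi_t is decided from the path up to time t-1 (predictability),
        # then multiplied by the martingale increment at time t.
        N += phi(steps[:t - 1]) * steps[t - 1]
    E_N += p_path * N

assert abs(E_N) < 1e-12   # E[N_n] = N_0 = 0, as Proposition A.13 asserts
```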
The following proposition is another very useful characterisation of martingales.
Proposition A.15. On a filtered probability space (Ω, F, (Ft)_{t=0}^n, P), denote F := (Ft)_{t=0}^n. An
adapted sequence of real random variables (Mt)_{t=0}^n is a (P, F)-martingale if and only if for any
predictable process (φt)_{t=1}^n, we have

E[Σ_{s=1}^t φs ΔMs] = 0,  t ∈ {1, . . . , n},

where ΔMs := Ms − Ms−1.
Proof. If (Mt)_{t=0}^n is a martingale, define the process X := (Xt)_{t=0}^n by X0 := 0 and, for t =
1, . . . , n, Xt := Σ_{s=1}^t φs ΔMs, for any predictable process (φt)_{t=1}^n. Then X is also a martingale, by
Proposition A.13, and so E[Xt] = X0 = 0.

Conversely, if E[Σ_{s=1}^t φs ΔMs] = 0 holds for any predictable φ, take m ∈ {0, 1, . . . , n − 1}, let
A ∈ Fm be given, and define a predictable process by setting φm+1 = 1A, φt = 0 for all other
t ∈ {1, . . . , n}. Then, taking t = m + 1,

0 = E[Σ_{s=1}^{m+1} φs ΔMs]
  = E[1A(Mm+1 − Mm)]
  = E[E[1A(Mm+1 − Mm)|Fm]]
  = E[1A(E[Mm+1|Fm] − Mm)].

Since this holds for all A ∈ Fm it follows that E[Mm+1|Fm] = Mm, so M is a martingale.
A.4. Equivalent measures and the Radon-Nikodym theorem. Here is a deep theorem,
which we do not prove.

Theorem A.16 (Radon-Nikodym). Let P and Q be two probability measures on a measurable
space (Ω, F), such that Q is absolutely continuous with respect to P. Under this assumption, there
is a nonnegative random variable Z such that

(A.3)    Q(A) = ∫_A Z dP,  A ∈ F.

The random variable Z is called the Radon-Nikodym derivative of Q with respect to P, written
Z = dQ/dP.

Taking X = 1A in (A.3) gives EQ[X] = Q(A) = E[1A Z] = E[XZ], A ∈ F.
This is then extended to general X via the standard machine argument.
If P and Q are equivalent and Z is the Radon-Nikodym derivative of Q w.r.t. P, then 1/Z is the
Radon-Nikodym derivative of P w.r.t. Q, i.e.

(A.4)    EQ[X] = E[XZ], for all X,
(A.5)    E[Y] = EQ[Y(1/Z)], for all Y,

and letting X and Y be related by Y = XZ we see that the above two equations are the same.
Example A.17 (Radon-Nikodym theorem in 2-period coin toss space). Let Ω = Ω2 be given by

Ω2 = {HH, HT, TH, TT},

the set of coin toss sequences of length 2. Let P correspond to probability 1/3 for H and 2/3 for T,
and let Q correspond to probability 1/2 for H and 1/2 for T. Then the Radon-Nikodym derivative of
Q w.r.t. P is easily seen to be

Z(ω) = Q(ω)/P(ω),  ω ∈ Ω,

so that

Z(HH) = 9/4,  Z(HT) = 9/8,  Z(TH) = 9/8,  Z(TT) = 9/16.
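These values, and the change-of-measure identity EQ[X] = E[XZ], can be verified exactly with rational arithmetic. A sketch (the random variable X below is an arbitrary, hypothetical choice):

```python
from itertools import product
from fractions import Fraction

# Two-period coin toss space with P(H) = 1/3 and Q(H) = 1/2.
pH, qH = Fraction(1, 3), Fraction(1, 2)
Omega = [''.join(w) for w in product('HT', repeat=2)]

P = {w: (pH if w[0] == 'H' else 1 - pH) * (pH if w[1] == 'H' else 1 - pH)
     for w in Omega}
Q = {w: (qH if w[0] == 'H' else 1 - qH) * (qH if w[1] == 'H' else 1 - qH)
     for w in Omega}

Z = {w: Q[w] / P[w] for w in Omega}          # Radon-Nikodym derivative dQ/dP
assert Z == {'HH': Fraction(9, 4), 'HT': Fraction(9, 8),
             'TH': Fraction(9, 8), 'TT': Fraction(9, 16)}

# E_Q[X] = E_P[XZ] for an arbitrary (hypothetical) random variable X.
X = {'HH': 3, 'HT': 1, 'TH': 0, 'TT': -2}
EQ_X = sum(X[w] * Q[w] for w in Omega)
EP_XZ = sum(X[w] * Z[w] * P[w] for w in Omega)
assert EQ_X == EP_XZ
```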
A.4.1. Radon-Nikodym martingales. Let Ω be a finite set (such as the set of all sequences of n
coin tosses). Let F = (Ft)_{t=0}^n be a filtration. Let P be a probability measure and let Q be a
measure absolutely continuous with respect to P, written as Q ≪ P. Assume

P(ω) > 0,  Q(ω) > 0,  for every ω ∈ Ω,

so that P and Q are equivalent, and define the Radon-Nikodym derivative

Z(ω) := Q(ω)/P(ω),

along with the Radon-Nikodym martingale

Zt := E[Z|Ft],  t = 0, 1, . . . , n.
Lemma A.18. If X is Ft-measurable, then EQ[X] = E[XZt].

Proof.

EQ[X] = E[XZ]
      = E[E[XZ|Ft]]
      = E[X E[Z|Ft]]  (since X is Ft-measurable)
      = E[XZt].

Note that Lemma A.18 implies that if X is Ft-measurable, then for any A ∈ Ft,

EQ[1A X] = E[1A XZt],

or equivalently,

∫_A X dQ = ∫_A XZt dP.
Lemma A.19. If X is Ft-measurable and 0 ≤ s ≤ t ≤ n, then

EQ[X|Fs] = (1/Zs) E[XZt|Fs].

Proof. Note first that (1/Zs)E[XZt|Fs] is Fs-measurable. So for any A ∈ Fs, we have

∫_A (1/Zs)E[XZt|Fs] dQ = ∫_A E[XZt|Fs] dP  (Lemma A.18)
                       = ∫_A XZt dP  (partial averaging)
                       = ∫_A X dQ  (Lemma A.18)
                       = ∫_A EQ[X|Fs] dQ  (partial averaging).

Since this holds for every A ∈ Fs, the result follows.
Example A.20 (Radon-Nikodym theorem in 2-period coin toss space, continued). We show in
Figure 19 the values of the martingale Zt. Note that we always have Z0 = 1, since

Z0 = E[Z] = ∫_Ω Z dP = Q(Ω) = 1.

Figure 19. The values of the Radon-Nikodym martingale Zt in the 2-period binomial model
example: Z0 = 1; Z1(H) = 3/2, Z1(T) = 3/4; Z2(HH) = 9/4, Z2(HT) = Z2(TH) = 9/8,
Z2(TT) = 9/16.
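The tree values in Figure 19 follow by averaging out the later tosses under P, which can be reproduced exactly:

```python
from fractions import Fraction

# Same measures as Example A.17: P gives H probability 1/3, Q gives 1/2.
P1 = {'H': Fraction(1, 3), 'T': Fraction(2, 3)}
Q1 = {'H': Fraction(1, 2), 'T': Fraction(1, 2)}
Omega = ['HH', 'HT', 'TH', 'TT']

# Z_2 = Z on each full path.
Z2 = {w: (Q1[w[0]] * Q1[w[1]]) / (P1[w[0]] * P1[w[1]]) for w in Omega}

# Z_1 = E[Z|F_1]: average out the second toss under P.
Z1 = {a: sum(P1[b] * Z2[a + b] for b in 'HT') for a in 'HT'}
# Z_0 = E[Z]: average out both tosses under P.
Z0 = sum(P1[a] * P1[b] * Z2[a + b] for a in 'HT' for b in 'HT')

assert Z1 == {'H': Fraction(3, 2), 'T': Fraction(3, 4)}
assert Z0 == 1
```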
A.4.2. Conditional expectation and Radon-Nikodym theorem. Here, we give another application
of the Radon-Nikodym theorem.
Let (Ω, F, Q) be a probability space. Let G be a sub-σ-algebra of F, and let X be a non-negative
random variable with

(A.6)    ∫_Ω X dQ = 1.

We can construct the conditional expectation (under Q) of X given G. Recall that this is the
(unique) G-measurable random variable EQ[X|G] that satisfies the partial averaging property

∫_A EQ[X|G] dQ = ∫_A X dQ,  A ∈ G.
On G we can define two probability measures P and P̃ by

P(A) := Q(A),  A ∈ G,

and

P̃(A) := ∫_A X dQ,  A ∈ G.

For every G-measurable random variable Y, the expectation of Y under P satisfies

(A.7)    E[Y] = EQ[Y],

since if Y = 1A for some A ∈ G then (A.7) is just the definition of P, and the full result follows
from the standard machine.
Also, if A ∈ G and P(A) = 0, then Q(A) = 0, so that P̃(A) = 0. In other words, the measure
P̃ is absolutely continuous with respect to the measure P. The Radon-Nikodym Theorem then
implies that there exists a G-measurable random variable Z such that

P̃(A) = ∫_A Z dP,  A ∈ G,

that is,

∫_A X dQ = ∫_A Z dP,  A ∈ G,

or, by (A.7),

∫_A X dQ = ∫_A Z dQ,  A ∈ G.

Thus Z has the partial averaging property, and since it is G-measurable, it is the conditional
expectation (under Q) of X given G. In other words, the existence of conditional expectations is
a consequence of the Radon-Nikodym Theorem.
Appendix B. Markov processes
Definition B.1. Let (Ω, F, P) be a probability space. Let (Ft)_{t=0}^n be a filtration, with each Ft
a sub-σ-algebra of F. Let (Xt)_{t=0}^n be a stochastic process on (Ω, F, P). This process is said to be
Markov if (Xt) is adapted to the filtration (Ft), and:

The Markov property: For each t = 0, 1, . . . , n − 1, the distribution of Xt+1 conditioned
on Ft is the same as the distribution of Xt+1 conditioned on Xt.
It is intuitively clear that the stock price process in the binomial model is Markov (we shall
prove this formally later). If we want to estimate the distribution of h(St+1 ), where h is any
function, based on the information in Ft , the only relevant piece of information is the value of St .
For example,
E(St+1 |Ft ) = (pu + qd)St = (1 + r)St ,
which is a function of St .
B.1. Proving a process is Markov. Recall the notions of independence of σ-algebras, and of
a random variable X being independent of a σ-algebra G (that is, the σ-algebra generated by X
is independent of G). Recall also the following facts about expectations involving independent
random variables.

Any random variable X induces a measure μX on the measurable space (R, B(R)), defined by

μX(B) := P(X⁻¹(B)) = P{X ∈ B},

for any set B ⊆ R in the Borel σ-algebra B(R). Then the expectation of h(X) is defined by the
Lebesgue integral

E[h(X)] := ∫_Ω h(X) dP = ∫_R h(x) dμX(x) = ∫_R h(x) μX(dx).
If X and Y are independent, then for any rectangle C = A × B ⊆ R²,

μX,Y(C) = P{(X, Y) ∈ C} = P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A}P{Y ∈ B} = μX(A)μY(B).

In other words, the joint distribution of X, Y factorises into the product of the marginal distributions μX, μY. In particular, we then have

(B.1)    E[f(X, Y)] = ∫_{R²} f(x, y) μX,Y(dx, dy) = ∫_{R²} f(x, y) μX(dx) μY(dy).
Lemma B.2 (Independence Lemma). Suppose X is an Rⁿ-valued random variable that is independent of the sub-σ-algebra G, and Y is an Rᵐ-valued G-measurable random variable. Let f(x, y)
be a function of the dummy variables x ∈ Rⁿ and y ∈ Rᵐ, and define

g(y) := E[f(X, y)].

Then

E[f(X, Y)|G] = g(Y).

Here, μX is the distribution of X, the measure on the Borel sets Bⁿ of Rⁿ defined by μX(B) =
P(X ∈ B) for B ∈ Bⁿ.
Proof. Recall that the partial averaging property is equivalent to the statement that for any
bounded G-measurable random variable Z, we have

E[Z E[f(X, Y)|G]] = E[Z f(X, Y)].

We therefore need to show that for all such G-measurable Z we have

E[Z f(X, Y)] = E[Z g(Y)].

Let μX,Y,Z be the distribution of the R^{n+m+1}-valued random variable (X, Y, Z). Since X is independent of G, the random variables X and (Y, Z) are independent, so that μX,Y,Z(dx, dy, dz) =
μX(dx) μY,Z(dy, dz). Hence

E[Z f(X, Y)] = ∫_{R^{n+m+1}} z f(x, y) μX,Y,Z(dx, dy, dz)
             = ∫_{R^{m+1}} z (∫_{R^n} f(x, y) μX(dx)) μY,Z(dy, dz)
             = ∫_{R^{m+1}} z g(y) μY,Z(dy, dz)
             = E[Z g(Y)].
Example B.3 (The binomial stock price is Markov). Consider an n-period binomial model. Fix a
time t and define X := St+1/St and G := Ft. Then X = u if ωt+1 = H and X = d if ωt+1 = T.
Since X depends only on the outcome of coin toss t + 1, X is independent of G. Define Y := St,
so that Y is G-measurable. Let h be any function and set f(x, y) := h(xy). Then

g(y) := E[f(X, y)] = E[h(Xy)] = ph(uy) + qh(dy).

The Independence Lemma asserts that

E[h(St+1)|Ft] = E[h(XY)|G] = E[f(X, Y)|G] = g(Y) = ph(uSt) + qh(dSt).

This shows the stock price is Markov. Indeed, if we condition both sides of the above equation
on σ(St) and use the tower property on the left and the fact that the right hand side is σ(St)-measurable, we obtain

E[h(St+1)|St] = ph(uSt) + qh(dSt).

Thus E[h(St+1)|St] and E[h(St+1)|Ft] are equal. Not only have we shown that the stock price
process is Markov, but we have also obtained a formula for E[h(St+1)|Ft] as a function of St.
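The conclusion of Example B.3 can be checked by brute force: enumerate all paths to time t, compute E[h(St+1)|Ft] along each, and confirm it depends on the path only through St. A sketch with hypothetical parameters (S0 = 4, u = 2, d = 1/2, and a call-type payoff):

```python
from itertools import product
from collections import defaultdict

# Two-period check in a binomial model with hypothetical parameters.
S0, u, d, p = 4.0, 2.0, 0.5, 0.5
q = 1.0 - p
h = lambda s: max(s - 5.0, 0.0)     # a call-type payoff, strike 5 (hypothetical)
t = 2

by_state = defaultdict(set)
for path in product('HT', repeat=t):
    S_t = S0
    for c in path:
        S_t *= u if c == 'H' else d
    # E[h(S_{t+1}) | F_t] along this path, from the distribution of toss t+1:
    cond = p * h(u * S_t) + q * h(d * S_t)
    by_state[S_t].add(round(cond, 12))

# Paths HT and TH both lead to S_t = 4; the conditional expectation agrees,
# so E[h(S_{t+1})|F_t] depends on the path only through S_t (Markov property).
assert all(len(v) == 1 for v in by_state.values())
```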
References
[1] N. H. Bingham and R. Kiesel, Risk-neutral valuation, Springer Finance, Springer-Verlag London Ltd.,
London, second ed., 2004. Pricing and hedging of financial derivatives.
[2] T. Björk, Arbitrage theory in continuous time, Oxford University Press, third ed., 2009.
[3] F. Black and M. Scholes, The pricing of options and corporate liabilities, J. Polit. Econ., 81 (1973), pp. 637–659.
[4] M. H. A. Davis and A. Etheridge, Louis Bachelier's Theory of Speculation: the origins of modern finance, PUP, 2006.
[5] B. Dupire, Pricing with a smile, Risk, 7 (1994), pp. 18–20.
[6] A. Etheridge, A course in financial calculus, Cambridge University Press, Cambridge, 2002.
[7] G. R. Grimmett and D. R. Stirzaker, Probability and random processes, Oxford University Press, New
York, third ed., 2001.
[8] J. M. Harrison and S. R. Pliska, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Process. Appl., 11 (1981), pp. 215–260.
[9] J. C. Hull, Options, futures and other derivatives, Pearson, eighth ed., 2011.
[10] J. Jacod and P. Protter, Probability essentials, Universitext, Springer-Verlag, Berlin, second ed., 2003.
[11] H. D. Junghenn, Option valuation, Chapman & Hall/CRC Financial Mathematics Series, CRC Press, Boca
Raton, FL, 2012. A first course in financial mathematics.
[12] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus, vol. 113 of Graduate Texts in
Mathematics, Springer-Verlag, New York, second ed., 1991.
[13] D. Kennedy, Stochastic financial models, Chapman & Hall/CRC Financial Mathematics Series, CRC Press,
Boca Raton, FL, 2010.
[14] R. C. Merton, Theory of rational option pricing, Bell J. Econom. and Management Sci., 4 (1973), pp. 141–183.
[15] B. Øksendal, Stochastic differential equations, Universitext, Springer-Verlag, Berlin, sixth ed., 2003. An introduction with applications.
[16] S. E. Shreve, Stochastic calculus for finance. I, Springer Finance, Springer-Verlag, New York, 2004. The
binomial asset pricing model.
[17] S. E. Shreve, Stochastic calculus for finance. II, Springer Finance, Springer-Verlag, New York, 2004. Continuous-time models.
[18] D. Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press,
Cambridge, 1991.
[19] P. Wilmott, S. Howison, and J. Dewynne, The mathematics of financial derivatives, Cambridge University
Press, Cambridge, 1995. A student introduction.
Michael Monoyios, Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
E-mail address: monoyios@maths.ox.ac.uk