
B10.1B MATHEMATICAL MODELS OF FINANCIAL DERIVATIVES


HILARY TERM 2013
MICHAEL MONOYIOS
MATHEMATICAL INSTITUTE
UNIVERSITY OF OXFORD

Useful textbooks
There are a huge number of books on financial derivatives. Here is a selection, worth consulting
for background reading. The numbers in square brackets refer to the bibliography at the end of
the notes.
Steven E. Shreve, Stochastic calculus for finance I: The binomial asset pricing model,
Springer 2004 [16]
(A superb probabilistic account of the binomial model.)
Steven E. Shreve, Stochastic calculus for finance II: Continuous-time models, Springer
2004 [17]
(A superb first text on stochastic calculus for finance with many examples.)
Alison Etheridge, A course in financial calculus, CUP 2002 [6]
(An excellent primer on stochastic calculus for finance.)
Paul Wilmott, Sam Howison and Jeff Dewynne, The mathematics of financial derivatives:
A student introduction, CUP 1995 [19]
(A decent first text on the PDE aspects of the subject.)
Tomas Björk, Arbitrage theory in continuous time, 3rd Ed., OUP 2009 [2]
(A good all-round text which covers many topics outside the scope of the course.)
Nick H. Bingham and Ruediger Kiesel, Risk-neutral valuation: Pricing and hedging of
financial derivatives, 2nd Ed., Springer 2004 [1]
(A decent all-round text.)
Douglas Kennedy, Stochastic financial models, CRC Press 2010 [13]
(A good text based on a Cambridge Part III course, with a different emphasis, focusing a
little more on portfolio optimisation as opposed to derivative security valuation.)
Hugo D. Junghenn, Option valuation: A first course in financial mathematics, CRC Press
2012 [11]
(A good recent text at about the same level as the course.)
John C. Hull, Options, futures and other derivatives, 8th Ed., Pearson 2011 [9]
(A bestseller that has a more financial as opposed to mathematical bias, and was one of
the first textbooks on the subject, becoming a mainstay of many trading rooms.)
Jean Jacod and Philip Protter, Probability essentials, Springer 2003 [10]
(An excellent text on measure-theoretic probability, good for background.)
Geoffrey R. Grimmett and David R. Stirzaker, Probability and random processes, 3rd Ed.,
OUP 2001 [7]
(An excellent and encyclopedic background probability text.)
The lecture notes
Date: March 12, 2014.

These notes contain the core material and more, in somewhat more detail than we will be able
to cover in lectures. Material marked with an asterisk is not examinable. Some probability


theory underlying conditional expectation and martingales is contained in the Appendix, for
those who wish to brush up on some probabilistic material. The material in the Appendix is not
examinable. The notes are no substitute for attending lectures. Some topics might be covered in
a little more or less detail than in these notes.
Regarding the mathematical material that you will need to be fluent in, here is some guidance.
You are expected to become familiar with the use of some probabilistic terminology (σ-algebras,
filtrations, random variables that are measurable with respect to a σ-algebra). You are expected
to have some familiarity with the properties of conditional expectation (but will not be examined
on proofs of these) and martingales, and to be able to use them.
You are expected to know the defining properties of a stochastic process W = (W_t)_{t≥0} known as
Brownian motion (BM), and to understand how these lead to the fact that its quadratic variation
(QV) process [W] is equal to the time elapsed: [W]_t = t. You are expected to know Lévy's
criterion: any continuous martingale M satisfying [M]_t = t is a BM (and to be able to sketch the
proof in the one-dimensional case).
You should be able to use the properties of Brownian motion (such as its independent Gaussian
increments property and its quadratic variation property). You should have some appreciation
of how the properties of BM lead to the properties of the Itô integral (such as the martingale
property and the Itô isometry) for elementary integrands, and you are required to know (but not
to prove) that these properties extend to the Itô integral for general integrands. You are not
required to know the theory of the construction of the Itô integral for general integrands.
You are expected to have an appreciation of how the quadratic variation property of BM leads to
the Itô formula and to properties of the Itô integral (that is, stochastic calculus). You are expected
to be able to use the Itô formula (both the one-dimensional and multi-dimensional versions)
fluently. You are expected to understand (and prove, using the Itô formula) the connection
between PDEs and stochastic calculus, in the form of the Feynman-Kac theorem.
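The quadratic variation fact [W]_t = t can be illustrated numerically. The following Python sketch (step count and random seed are arbitrary illustrative choices, not part of the course material) simulates the increments of one Brownian path on [0, 1] and sums their squares:

```python
import math
import random

# One Brownian path on [0, 1] over a fine partition: each increment
# W_{t_{k+1}} - W_{t_k} is N(0, dt).  The sum of squared increments
# approximates the quadratic variation [W]_1 = 1.
random.seed(0)
n = 100_000                 # number of partition intervals (illustrative)
dt = 1.0 / n
qv = 0.0
for _ in range(n):
    dW = random.gauss(0.0, math.sqrt(dt))   # Brownian increment
    qv += dW * dW                           # accumulate squared increment

print(qv)   # close to 1, since [W]_t = t
```

The standard deviation of the sum is of order sqrt(2/n), so with n = 100,000 the result is within a fraction of a percent of 1.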

Contents
Useful textbooks
The lecture notes
1. Introduction to financial derivatives
1.1. Underlying assets
1.2. Interest rates and time value of money
1.3. Forward and futures contracts
1.4. Arbitrage
1.5. Options
1.6. Some history*
2. Coin-toss space: a finite probability space
3. The binomial stock price process
4. Conditional expectation, martingales, equivalent measures
4.1. Conditional expectation
4.2. Properties of conditional expectation
4.3. Martingales
4.4. Equivalent measures
5. Contingent claim valuation in the binomial model
5.1. Equivalent martingale measures and no arbitrage
5.2. Valuation by replication in the binomial model
5.3. Completeness of the multiperiod binomial model
6. American options in the binomial model
6.1. Value of hedging portfolio for an American option*
7. Brownian motion
7.1. BM as scaled limit of symmetric random walk
7.2. Brownian motion definition
7.3. Properties of BM
7.4. Quadratic variation of BM
7.5. Path length*
7.6. Cross-variation of W and t and QV of t
7.7. Other variations of Brownian motion
7.8. Lévy's characterisation of Brownian motion
8. The Itô integral
8.1. Itô integral of an elementary integrand
8.2. Itô integral of a general integrand*
8.3. Itô and martingale representation theorems for Brownian motion*
9. The Itô formula
9.1. Itô's formula for one Brownian motion
9.2. Itô's formula for Itô processes
9.3. Markovian diffusions
9.4. Connection with PDEs: Feynman-Kac theorem
10. Multidimensional Brownian motion
10.1. Cross-variations of Brownian motions
10.2. Two-dimensional Itô formula
10.3. Multidimensional Itô formula
10.4. Multi-dimensional Feynman-Kac theorem
10.5. The Girsanov Theorem*
11. The Black-Scholes-Merton model
11.1. Portfolio wealth evolution
11.2. Perfect hedging
11.3. Feynman-Kac solution of the BSM equation
11.4. BS option pricing formulae
11.5. Sensitivity parameters (Greeks)*
11.6. Probabilistic (martingale) interpretation of perfect hedging*
11.7. Black-Scholes-Merton analysis for dividend-paying stock
11.8. Time-dependent parameters
12. Claims on futures contracts
12.1. The mechanics of futures markets
12.2. Options on futures contracts
13. Multi-asset derivatives
14. American options in the BSM model
14.1. Smooth pasting condition for American put
14.2. Optimal stopping representation*
14.3. American call on a non-dividend-paying stock
15. Simple exotic options
15.1. Digital options
15.2. Pay-later options
15.3. Multi-stage options
16. Barrier options
16.1. PDE approach to valuing barrier options
17. Lookback options
17.1. PDE satisfied by lookback pricing function
18. Asian options
19. Incomplete markets
19.1. Implied volatility
19.2. Local volatility model
19.3. Stochastic volatility models
19.4. Robustness of the Black-Scholes formula
Appendix A. Conditional expectation, martingales, equivalent measures
A.1. Independence
A.2. Conditional expectation
A.3. Martingales
A.4. Equivalent measures and the Radon-Nikodym theorem
Appendix B. Markov processes
B.1. Proving a process is Markov
References

1. Introduction to financial derivatives


Definition 1.1. A European derivative security (or European contingent claim) is a financial
contract which pays its holder a random amount (the payoff of the claim) at some future time T
(the maturity time of the derivative).
An American derivative delivers the payoff at a random time τ ≤ T chosen by the holder of
the contract. The payoff is (typically) contingent on the value of some other underlying security
(or securities), or on the level of some non-traded reference index.
Example 1.2 (Forward contract). The holder of a forward contract agrees to buy an asset at some
future time T for a fixed price K (the delivery price) that is decided at initiation of the contract.
Hence, the forward contract has a value (to the holder) at maturity of S_T - K, where S_T is the
underlying asset value at maturity.
The forward contract thus allows the holder to fix the purchase price of the underlying asset
in advance, and so can be used to mitigate the risk inherent in the price uncertainty (that is, to
hedge the price risk). It can also be used to speculate against future price moves.
Example 1.2 (European call option). A European call option has payoff (S_T - K)^+ at maturity.
It confers on the owner the right (but not the obligation) to buy the underlying asset at maturity
time T for a fixed price K (the strike price, or exercise price).
The origins of derivatives lie in medieval agreements between farmers and merchants to trade
the farmer's harvest at some future date, at a price set in advance. This allowed farmers to fix
the selling price of their crop, and reduced the risk of having to sell at a lower price than their
cost of production, which might happen in a bumper harvest year. This is one motivation for
the existence of derivatives: they give random payoffs which can be used to eliminate uncertainty
from future asset price trades. The act of removing uncertainty in finance is called hedging.
Since derivatives have random payoffs, they can also be used to take risk by speculating on the
future values of asset prices, and they are often a cheaper device for doing so than investing in the
underlying asset. For instance, a European call option on a stock, with payoff (S_T - K)^+, allows
the holder to profit if the stock price at maturity is above the strike, and the cost of acquiring a
call option is usually only a fraction of the cost of buying the stock itself.
This course will be about how to assign a value to a derivative at any time t ≤ T. This will
involve modelling the randomness in the underlying asset price process S = (S_t)_{0≤t≤T}. To do
this, we will need the notion of a stochastic process on a filtered probability space. We shall see
that the key to valuing derivatives is to attempt to use the underlying asset to remove the risk
from selling (or buying) the derivative. That is, derivative valuation is via a hedging argument.


1.1. Underlying assets. Typical assets which are traded in financial markets, and which can
be the underlying assets for a derivative contract, include:
shares (stocks)
commodities (metals, oil, other physical products)
currencies
bonds (assets used as borrowing tools by governments and companies) which pay fixed
amounts at regular intervals to the bond holder.
An agent who holds an asset will be said to hold a long position in the asset, or to be long in
the asset.
An agent who has sold an asset will be said to hold a short position in the asset, or to be short
in the asset.
For the most part in this course, we will focus on derivative securities which have a stock as
underlying asset. The stock price will be a stochastic process denoted by S = (S_t)_{0≤t≤T} on a
filtered probability space (Ω, F, F := (F_t)_{0≤t≤T}, P). This means that for each t ∈ [0, T], S_t, the
value of the stock at time t, is a random variable that is measurable with respect to the σ-algebra
F_t (this means that the information represented by the σ-algebra F_t is enough to know the value
of the stock price at that time). When S_t is F_t-measurable for all t ∈ [0, T] we say that S is a
process that is adapted to the filtration F. We shall see later that this means the following: each
F_t is a collection of subsets of a set Ω (the sample space), closed under complements and under
countable unions, and with F_s ⊆ F_t for s < t (such an increasing sequence of σ-algebras is called
a filtration, and will represent increasing information as time evolves). Each S_t is a function from
Ω to R_+ with the property that sets of the form

{ω ∈ Ω | S_t(ω) ∈ A ⊆ R_+}

lie in F_t. This is what we mean by saying that S_t is an F_t-measurable random variable. The
adaptedness property (S_t is F_t-measurable for each t ∈ [0, T]) is tantamount to the idea that the
information available at time t is sufficient to know the value of S_t (that is, if you observe the
stock market up to time t, you will know the current value of the stock price).
1.2. Interest rates and time value of money. Let us measure time in some convenient units,
say years. If an interest rate r is quoted per annum and with compounding frequency at time
intervals ∆t, this means that an amount C invested for a time period ∆t will grow to C(1 + r∆t).
If this is re-invested for another period ∆t, the balance becomes C(1 + r∆t)^2, and so on. So after n
periods, with t := n∆t, we have C(1 + r∆t)^n = C(1 + rt/n)^n. A continuously compounded interest
rate corresponds to the limit n → ∞, or ∆t → 0. In this case, after time t an amount C will grow
to

lim_{n→∞} C(1 + rt/n)^n = Ce^{rt}.

So an amount C invested at time zero for a time t will grow to an amount Ce^{rt}, where r is the
continuously compounded risk-free interest rate. We call the amount Ce^{rt} the future value of C
invested at time zero, and the factor e^{rt} is called an accumulation factor.
By the same token, receiving an amount C at time t is equivalent to receiving Ce^{-rt} at time
zero. We call Ce^{-rt} the present value of C received at time t (we say that C is discounted to the
present) and the factor e^{-rt} is called a discount factor.
It is usually convenient (but nothing more) to assume that interest is continuously compounded.
We do not need to assert that, in reality, interest is continuously compounded, in order to use a
continuously compounded interest rate in all our analysis. If the interest is actually compounded
m times a year at an interest rate of R per annum, then we can still use a continuously compounded
interest rate r simply by making the identification

(1.1)    C(1 + R/m)^{mt} = C exp(rt),

so that there is a one-to-one correspondence between the interest rate R (compounded m times
per annum) and the continuously compounded interest rate r. In this course, we will generally use
continuously compounded rates when considering continuous time models (but will not necessarily
do so when using discrete-time models such as the binomial model).
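Solving (1.1) for r gives r = m ln(1 + R/m). A short Python sketch of the correspondence (the 6% quarterly rate is an illustrative choice):

```python
import math

def continuous_rate(R: float, m: int) -> float:
    """Continuously compounded rate r equivalent to a rate R compounded
    m times per year, from C(1 + R/m)^{mt} = C e^{rt}."""
    return m * math.log(1.0 + R / m)

R, m = 0.06, 4                    # 6% per annum, compounded quarterly
r = continuous_rate(R, m)
t = 3.0                           # check the accumulation factors agree
print(r, (1 + R / m) ** (m * t), math.exp(r * t))
```

Both accumulation factors coincide for every horizon t, which is exactly the one-to-one correspondence described above.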
A differential version of the above arguments is as follows. In continuous time, we model the
time evolution of cash in a bank account in terms of a riskless asset which we shall call a money
market account, which is the value at time t > 0 of $1 invested at time zero and continuously
earning interest which is reinvested. We shall often denote the value of this asset at time t by
S_t^{(0)}, which satisfies

(1.2)    dS_t^{(0)} = r S_t^{(0)} dt,    S_0^{(0)} = 1,

where r is the (assumed constant) interest rate. Then the value of the bank account at time t is
given by S_t^{(0)} = e^{rt}, which we see is the accumulation factor we encountered above.
A more complex model could assume that interest rates are time-varying (possibly stochastic).
In this case the money market account would satisfy

(1.3)    dS_t^{(0)} = r_t S_t^{(0)} dt,    S_0^{(0)} = 1,

where r_t is the instantaneous (or short term) interest rate. We have allowed for this to be
time-varying, and r_t represents the interest rate in the time interval [t, t + dt). From (1.3) we see
that

(1.4)    S_t^{(0)} = exp(∫_0^t r_u du),

and this is the accumulation factor in this case. This is the amount to which $1 invested at time
zero grows by time t, when the interest generated is continually reinvested.
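The accumulation factor (1.4) can be approximated numerically for any given rate function; here is a Python sketch with an illustrative deterministic rate r_u = 0.03 + 0.02u, whose integral can also be computed in closed form for comparison:

```python
import math

def accumulation(rate, t: float, n: int = 100_000) -> float:
    """Approximate exp(integral of rate(u) over [0, t]) by a Riemann sum."""
    du = t / n
    integral = sum(rate(k * du) * du for k in range(n))
    return math.exp(integral)

r = lambda u: 0.03 + 0.02 * u     # illustrative time-varying short rate
t = 2.0
# Exact integral: 0.03 t + 0.01 t^2 = 0.10, so the factor is e^{0.10}.
print(accumulation(r, t), math.exp(0.10))
```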
1.3. Forward and futures contracts.
Definition 1.4 (Forward contract). A forward contract obliges its holder to buy an underlying
asset (a stock, say) at some future time T (the maturity time) for a price K (the delivery price)
that is fixed at contract initiation. Hence, at time T, when the stock price is S_T, the contract is
worth S_T - K (the payoff of the forward) to the holder. This payoff is shown in Figure 1.

[Figure 1. Forward contract payoff S_T - K as a function of the final underlying asset price S_T]


A futures contract is a rather specialised forward contract, traded on an organised exchange,
and such that, if a contract is traded at some time t ≤ T, the delivery price is set to a special
value F_{t,T}, called the futures price of the asset or the forward price of the asset, chosen so that
the value of the futures contract at initiation (that is, at time t) is zero.
Futures markets have other specialised features that we will not dwell on too much in this
course. Principally, a participant in a futures market is required to set up a so-called margin
account as collateral, and one's daily profits and losses are reflected by adjustments in the margin
account (the holder of a futures contract receives the change in value of the futures price after
each day, for each contract held). One also has to maintain the balance in the margin account at
some minimum value (the maintenance margin), and receives a so-called margin call (a demand
to top-up the margin account) if the balance in the margin account falls below the maintenance
margin. This mechanism is designed to remove the risk of default from the market, and hence
futures markets are very liquid. See Hull [9] for detailed descriptions of the workings of futures
exchanges.
1.3.1. Valuation of forward contracts. In what follows we value forward contracts on a non-dividend-paying
stock, that is, an asset with price process S = (S_t)_{0≤t≤T} that pays no income to
its holder. We shall assume a constant interest rate r ≥ 0.
Lemma 1.5. The value at time t ≤ T of a forward contract with delivery price K and maturity
T, on an asset with price process S = (S_t)_{0≤t≤T}, is f_{t,T} ≡ f(t, S_t) ≡ f(t, S_t; T) ≡ f(t, S_t; T, K),
given by

(1.5)    f_{t,T} = S_t - K exp(-r(T - t)),    0 ≤ t ≤ T.

Proof. This is a simple hedging argument which provides our first example of a riskless hedging
strategy. Start with zero wealth at time t ≤ T, and sell the contract at this time for some price
f_{t,T}. Hedge this sale by purchasing the asset for price S_t. This requires borrowing of S_t - f_{t,T}.
At time T, sell the asset for price K under the terms of the forward contract, and require that
this is enough to pay back the loan. Hence we must have

K = (S_t - f_{t,T}) exp(r(T - t)),

and the result follows.

Corollary 1.6. The forward price of the asset at time t ≤ T for delivery at T is F_{t,T} given by

F_{t,T} = S_t exp(r(T - t)),    0 ≤ t ≤ T.

Proof. Set f_{t,T} = 0 in (1.5) and then by definition we must have K = F_{t,T}.
1.4. Arbitrage. The simple argument above for valuing a forward contract is an example of
valuation by the principle of no arbitrage. If the relationship in Lemma 1.5 is violated, then
an elementary example of a riskless profit opportunity, called an arbitrage, ensues. Here is a
definition of arbitrage.
Definition 1.7 (Arbitrage). Let X = (X_t)_{0≤t≤T} denote the wealth process of a trading strategy.
An arbitrage over [0, T] is a strategy satisfying X_0 = 0, P[X_T ≥ 0] = 1 and P[X_T > 0] > 0.
So an arbitrage is guaranteed not to lose money and has a positive probability of making a
profit. If the valuation formula (1.5) for the forward contract is violated, an immediate arbitrage
opportunity occurs, as we now illustrate.
Suppose f_{t,T} > S_t - K exp(-r(T - t)). Then one can short the forward contract and buy the
stock, by borrowing S_t - f_{t,T} at time t. At maturity, one sells the stock for K under the terms of
the forward contract and uses the proceeds to pay back the loan, yielding a profit of

K - (S_t - f_{t,T}) exp(r(T - t)) > 0.

This is an arbitrage. A symmetrical argument applies if f_{t,T} < S_t - K exp(-r(T - t)) (and you
should supply this).
The principle of riskless hedging and no arbitrage will also apply, rather less trivially, to the
valuation of options later in the course.
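A numerical check of the overpriced-forward arbitrage above (the mispricing of 2 and the other figures are illustrative assumptions): short the forward, buy the stock by borrowing S - f, then at maturity deliver the stock for K and repay the loan.

```python
import math

S, K, r, tau = 100.0, 95.0, 0.05, 1.0      # illustrative market data
fair = S - K * math.exp(-r * tau)          # fair forward value, (1.5)
f = fair + 2.0                             # assumed mispriced forward

# Short forward, buy stock, borrow S - f; at T receive K, repay loan.
profit = K - (S - f) * math.exp(r * tau)
print(fair, profit)                        # profit is positive
```

The riskless profit equals the mispricing carried forward at the risk-free rate, (f - fair)e^{r tau}, independently of where the stock ends up.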
An equivalent way of looking at no arbitrage is sometimes called the law of one price. Two
portfolios which give the same payoff at T should have the same value at time t ≤ T. Let us show
how this applies to the valuation of a forward contract. Consider the following two portfolios at
time t ≤ T:
a long position in one forward contract,
a long position in the stock plus a short cash position of K exp(-r(T - t)).
At time T, these are both worth S_T - K, so their values at time t ≤ T must be equal, yielding
f_{t,T} = S_t - K exp(-r(T - t)), as before. Notice that the second portfolio perfectly replicates
(or perfectly hedges) the payoff of the forward contract, meaning that it reproduces the payoff
S_T - K. Denote the position in the stock that is needed to perfectly hedge a forward contract by
H_t^{(f)}. Then we have that H_t^{(f)} = 1 for all t ∈ [0, T], and note that

H_t^{(f)} = f_x(t, S_t) = 1,    0 ≤ t ≤ T,

where f(t, x) := x - K e^{-r(T-t)}. This is a simple example of a delta hedging rule, in which one
differentiates the pricing function of the derivative with respect to the variable representing the
underlying asset price, in order to get the hedging strategy. We will see a similar result when
valuing options.
1.4.1. Forward contract on a dividend-paying stock. The stock in the preceding analysis was
assumed to pay no dividends. Now assume that the stock pays dividends as a continuous income
stream with dividend yield q. This means that in the interval [t, t + dt), the income received
by someone holding one share of the stock will be qS_t dt. In Problem Sheet 1 you are asked to
consider what happens to one's holding of shares if all such income is immediately re-invested in
more shares, to show that if the initial holding is n_0 at time zero, we have

n_t = n_0 exp(qt),    0 ≤ t ≤ T,

and then to use this result to value a forward contract on the dividend-paying stock, arriving at
the following.
Lemma 1.8. The value at time t ≤ T of a forward contract with delivery price K and maturity
T, on a stock with price process S = (S_t)_{0≤t≤T} paying dividends at a dividend yield q, is given by

(1.6)    f_{t,T} = S_t exp(-q(T - t)) - K exp(-r(T - t)),    0 ≤ t ≤ T.

Proof. Problem Sheet 1.

Corollary 1.9. The forward price of the dividend-paying asset at time t ≤ T is given by

F_{t,T} = S_t exp((r - q)(T - t)),    0 ≤ t ≤ T.

Proof. Problem Sheet 1.



Remark 1.10 (Forwards and futures on currencies). A foreign currency is treated as an asset which
pays a dividend yield equal to the foreign interest rate r_f. Hence, if S = (S_t)_{0≤t≤T} is
the exchange rate (the value in dollars of one unit of foreign currency), then a forward contract
on the foreign currency has value at time t ≤ T given by

f_{t,T} = S_t exp(-r_f(T - t)) - K exp(-r(T - t)),    0 ≤ t ≤ T,

where T is the maturity and K is the delivery price.
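Lemma 1.8, Corollary 1.9 and the currency case of Remark 1.10 can be sketched together in Python (the exchange rate 1.25 and the domestic and foreign rates are illustrative assumptions):

```python
import math

def forward_value_div(S, K, r, q, tau):
    """Forward value on a dividend-paying asset, (1.6):
    f = S e^{-q tau} - K e^{-r tau}, with tau = T - t."""
    return S * math.exp(-q * tau) - K * math.exp(-r * tau)

def forward_price_div(S, r, q, tau):
    """Forward price making the contract worth zero: S e^{(r - q) tau}."""
    return S * math.exp((r - q) * tau)

# Currency forward: S is the dollar value of one unit of foreign currency
# and the foreign rate rf plays the role of the dividend yield q.
S, r, rf, tau = 1.25, 0.04, 0.02, 1.0
F = forward_price_div(S, r, rf, tau)
print(F, forward_value_div(S, F, r, rf, tau))   # value zero at K = F
```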


1.5. Options. An option is a contract that gives the holder the right but not the obligation to
buy or sell an asset for some price that is defined in advance.
The two most basic option types are a European call and a European put.
Definition 1.11 (European call option). A European call option on a stock is a contract that
gives its holder the right (but not the obligation) to purchase the stock at some future time T
(the maturity time) for a price K (the strike price or exercise price) that is fixed at contract
initiation. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a call
option is (S_T - K)^+, as shown in Figure 2.

[Figure 2. Call option payoff (S_T - K)^+ as a function of S_T]

Definition 1.12 (European put option). A European put option on a stock is a contract that
entitles the holder to sell the underlying stock for a fixed price K, the strike price, at a future
time T. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a put option
is (K - S_T)^+, as shown in Figure 3.

[Figure 3. Put option payoff (K - S_T)^+ as a function of S_T]


The act of choosing to buy or sell the asset under the terms of the option contract is called
exercising the option.
Options which can be exercised any time before the maturity date T are called American
options, whilst European options can only be exercised at T . Hence, an American call (respectively,
put) option allows the holder to buy (respectively, sell) the underlying stock for price K at any
time before maturity.
1.5.1. Put-call parity.


Lemma 1.13 (Put-call parity). The European call and put prices c(t, S_t) and p(t, S_t) of options
with the same strike K and maturity T on a non-dividend-paying traded stock with price S_t at
time t ∈ [0, T] are related by

c(t, S_t) - p(t, S_t) = S_t - Ke^{-r(T-t)},    0 ≤ t ≤ T.

Proof. The payoffs of a call and put satisfy

c(T, S_T) - p(T, S_T) = (S_T - K)^+ - (K - S_T)^+ = S_T - K,

which shows the (obvious) fact that a long position in a call combined with a short position in
a put is equivalent to a long position in a forward contract. Hence, their prices at t ≤ T must
satisfy

c(t, S_t) - p(t, S_t) = f_{t,T} = S_t - Ke^{-r(T-t)},    0 ≤ t ≤ T,

where f_{t,T} is the value of a forward contract at t ≤ T.
Remark 1.14 (Put-call parity for dividend-paying stock). The same argument applied to a
dividend-paying stock yields

c(t, S_t) - p(t, S_t) = f_{t,T} = S_t e^{-q(T-t)} - Ke^{-r(T-t)},    0 ≤ t ≤ T,

where q is the dividend yield.
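The payoff identity underlying the parity proof, (S_T - K)^+ - (K - S_T)^+ = S_T - K, holds for every terminal price, as this short Python sketch checks (the strike and sample prices are illustrative):

```python
import math

K = 10.0                                  # illustrative strike
for ST in (0.0, 5.0, 10.0, 15.0):         # sample terminal prices
    call = max(ST - K, 0.0)               # (S_T - K)^+
    put = max(K - ST, 0.0)                # (K - S_T)^+
    assert math.isclose(call - put, ST - K)   # forward payoff

print("parity holds at maturity")
```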


1.5.2. European call and put price bounds. Put-call parity is a model-independent result. From
it, we see that c(t, S_t) ≥ f(t, S_t). That is, a call option is always at least as valuable as a forward
contract (an obvious fact).
A call option gives its holder the right to buy the underlying stock, which means that its value
can never be greater than that of the stock, so c(t, S_t) ≤ S_t.
From these facts we deduce model-independent bounds on a European call option price on a
non-dividend-paying stock:

S_t - Ke^{-r(T-t)} ≤ c(t, S_t) ≤ S_t,    0 ≤ t ≤ T.

These bounds are shown in Figure 4.


In Figure 4 we have plotted the upper and lower bounds of a European call value (the dotted
graphs) as well as the Black-Scholes value of the call (the solid graph; later we will show how this
function arises). If the above call option pricing bounds are violated, then arbitrage opportunities
arise.
For example, if c(t, S_t) < S_t - Ke^{-r(T-t)} one should buy the call and short the stock, which
gives a cash amount S_t - c(t, S_t) to be invested in a bank account. At time T, we have two
possibilities:
(1) S_T ≤ K, in which case the call is not exercised. The arbitrageur buys the stock in the
market to close out the short position, using the proceeds from the bank account, which
stand at (S_t - c(t, S_t))e^{r(T-t)} prior to buying the stock. This leaves a profit of

(S_t - c(t, S_t))e^{r(T-t)} - S_T ≥ (S_t - c(t, S_t))e^{r(T-t)} - K
= e^{r(T-t)}(S_t - Ke^{-r(T-t)} - c(t, S_t)) > 0.

(2) S_T > K, in which case the call is exercised. The arbitrageur buys the stock for K to
close out the short position, using the proceeds from the bank account, which stand at
(S_t - c(t, S_t))e^{r(T-t)} prior to buying the stock. This leaves a profit of

(S_t - c(t, S_t))e^{r(T-t)} - K = e^{r(T-t)}(S_t - Ke^{-r(T-t)} - c(t, S_t)) > 0.
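The worst case for this arbitrageur is exercise (case (2)), and even then the profit is positive. A Python sketch with illustrative parameters (mispricing of 0.50 assumed):

```python
import math

# Arbitrage when the lower call bound is violated: c < S - K e^{-r tau}.
# Buy the call, short the stock, bank the proceeds S - c.
S, K, r, tau = 12.0, 10.0, 0.10, 1.0       # illustrative market data
lower = S - K * math.exp(-r * tau)         # lower bound on the call
c = lower - 0.50                           # assumed mispriced call

bank = (S - c) * math.exp(r * tau)         # bank balance at maturity
worst_profit = bank - K                    # case S_T > K: pay K on exercise
print(lower, worst_profit)                 # worst-case profit still positive
```

If instead S_T ≤ K, the stock is bought back in the market for S_T ≤ K, so the profit can only be larger.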
[Figure 4. Bounds on European call option value: upper bound C = S, lower bound C = S - Ke^{-rT}, with the Black-Scholes value between them; parameters K = 10, r = 10%, σ = 25%, T = 1 year]

We can derive similar model-independent bounds on a put option price. A put option gives its
holder the right to receive an amount K for the stock, so the most it can be worth at maturity
is K (if the final stock price is S_T = 0). Hence, its current value can never be greater than the
present value of K, so that

p(t, S_t) ≤ Ke^{-r(T-t)},    0 ≤ t ≤ T.

Similarly, for the value of a put at expiry we have p(T, S_T) = (K - S_T)^+ ≥ K - S_T. That is, a
put option is at least as valuable as a short position in a forward contract. Hence we have the
lower bound

p(t, S_t) ≥ Ke^{-r(T-t)} - S_t,    0 ≤ t ≤ T.
The results in this section are model-independent. To say more about option values we need a
model for the dynamic evolution of a stock price. One of the simplest continuous-time models is
the Black-Scholes-Merton (BSM) model, which we shall describe later, and one of the simplest
discrete-time models is the binomial model, which we shall also see shortly.
1.5.3. Combinations of options. Options can be combined to give a variety of payoffs for different
hedging purposes, or for speculation on movements in the underlying asset price, and they are
often used to do so because the option premiums are relatively small in some cases, thus proving
attractive to gamblers.
A straddle is a call and a put with the same strike and maturity. The payoff of a long position
in a straddle is

(1.7)    (S_T - K)^+ + (K - S_T)^+ = { K - S_T,  S_T < K,
                                     { S_T - K,  S_T ≥ K.

This payoff is illustrated in Figure 5.
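The straddle payoff (1.7) is simply |S_T - K|; a minimal Python sketch (strike and sample prices illustrative):

```python
def straddle_payoff(ST: float, K: float) -> float:
    """Long straddle: call plus put with the same strike, i.e. |S_T - K|."""
    return max(ST - K, 0.0) + max(K - ST, 0.0)

K = 10.0                          # illustrative strike
for ST in (6.0, 10.0, 14.0):
    print(ST, straddle_payoff(ST, K))   # 4.0, 0.0, 4.0
```

A long straddle therefore profits from a large move in either direction, which is why it is a standard bet on volatility.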
1.6. Some history*. As remarked earlier, the origins of derivatives lie in medieval agreements
between farmers and merchants to insure farmers against low crop prices.


[Figure 5. Long straddle payoff |S_T - K| as a function of S_T]

In the 1860s the Chicago Board of Trade was founded to trade commodity futures (contracts
that set trading prices of commodities in advance), formalising the act of hedging against future
price changes of important products.
Options were first valued by Bachelier in 1900 in his PhD thesis, a translation of which can
be found in the book by Davis and Etheridge [4]. Bachelier introduced a stochastic process now
known as Brownian motion (BM) to model stock price movements in continuous time. Bachelier
did this before a rigorous treatment of BM was available in mathematics. His work was decades
ahead of its time, both mathematically and economically speaking, and was therefore not given
the credit it deserved at the time. In the decades that followed, mathematicians and physicists
(Einstein, Wiener, Lévy, Kolmogorov, Feller to name but a few) developed a rigorous theory of
Brownian motion, and Itô developed a rigorous theory of stochastic integration with respect to
Brownian motion, leading to the notion of a stochastic calculus, which we shall encounter. In the
1960s, economists re-discovered Bachelier's work, and this was one of the ingredients that led to
the modern theory of option valuation.
In the early 1970s a combination of forces existed which made markets more risky, derivatives
more prominent, and their valuation and trading possible. The system of fixed exchange rates
that existed before 1970 collapsed, and the Middle East oil crises caused a big increase in the
volatility of financial prices. This increased the demand for risk management products such as
options. At the same time Black and Scholes [3] and Merton [14] (BSM) published their seminal
work on how to price options, based on managing the risk associated with selling such an asset.
This breakthrough, for which Scholes and Merton received a Nobel Prize (Black having passed
away in 1995) coincided with the opening of the Chicago Board Options Exchange (CBOE), giving
individuals both a means to value option contracts and a marketplace where they could profit
from this knowledge of the fair price.
Following on from this, the financial deregulation of the 1980s, allied to technological developments which made it possible to trade securities globally and to run large portfolios of complex
products, caused a huge increase in risky trading across international borders. This opened up yet
more risks across currencies, interest rates and equities, and financial institutions very skillfully
(or opportunistically, perhaps) created markets to trade derivatives and to sell these products to
customers. This has led to the massive increase in derivative trading that we now see, with the
volume of derivative contracts traded now dwarfing that in the associated underlying assets.

B10.1B FINANCIAL DERIVATIVES


The papers of Black-Scholes [3] and Merton [14] attracted mathematicians to the subject,
and led to a mathematically rigorous approach to valuing derivatives, based on probability and
martingale theory, inspired by Harrison and Pliska [8]. This led directly to modern financial
mathematics, and has also contributed to the advent of derivatives written on a plethora of
underlying stochastic reference entities, such as interest rates, weather indices and default events,
as well as on more traditional traded underlying securities such as stocks and currencies.
2. Coin-toss space: a finite probability space
The binomial stock price model (which we are working towards) is a discrete time stochastic
model of a stock price process in which a fictitious coin is tossed and a stock price depends on the
outcome of the coin tosses. Hence our first task is to introduce some probabilistic notions and
terminology in the context of a finite coin-toss probability space.
Let T := {0, 1, . . . , n} represent a discrete time set. Let Ω = Ω_n, the set of all outcomes of n
coin tosses. The finite set Ω is called the sample space, with elements ω called sample points,
representing the possible outcomes of the random experiment in which the coin is tossed. Each
sample point ω is a sequence of length n, written as ω = ω_1 ω_2 . . . ω_n, where each ω_t, t = 1, . . . , n, is
either H (head) or T (tail), representing the outcome of the t-th coin toss.
Let F be the set of all subsets of Ω; F is a σ-algebra (or σ-field), that is, a collection of subsets
of Ω with the properties: (i) ∅ ∈ F, (ii) if A ∈ F then A^c ∈ F, (iii) if A_1, A_2, . . . is a sequence of
sets in F, then ∪_{k=1}^∞ A_k is also in F. We interpret σ-algebras as a record of information (as we
shall see shortly). The pair (Ω, F) is called a measurable space.
We place a probability measure P on (Ω, F). A probability measure P is a function mapping
P : F → [0, 1] with the properties: (i) P(Ω) = 1, (ii) if A_1, A_2, . . . is a sequence of disjoint sets
in F, then P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k). The interpretation is that, for a set A ∈ F, there is a
probability in [0, 1] that the outcome of a random experiment will lie in the set A. We think of
P(A) as this probability. The set A ∈ F is called an event.
For A ∈ F we define

(2.1)    P(A) := Σ_{ω ∈ A} P(ω).

We can define P(A) in this way because A has only finitely many elements.
Let the probability of H on each coin toss be p ∈ (0, 1), so that the probability of T is q := 1 − p.
For each ω = (ω_1 ω_2 . . . ω_n) ∈ Ω we define

(2.2)    P(ω) := p^{number of H in ω} q^{number of T in ω}.

Then for each A ∈ F we define P(A) according to (2.1).
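As an illustration, definitions (2.1) and (2.2) can be checked on a small coin-toss space. The following Python sketch (function names and parameter values are illustrative, not part of the notes) enumerates Ω_n and computes P(ω) and P(A):

```python
from itertools import product

def P_omega(omega, p):
    """Probability of a single outcome, as in (2.2): p^(#H) * q^(#T)."""
    q = 1.0 - p
    return p ** omega.count("H") * q ** omega.count("T")

def P_event(A, p):
    """Probability of an event A (a set of outcomes), as in (2.1)."""
    return sum(P_omega(omega, p) for omega in A)

n, p = 3, 0.5
Omega = {"".join(w) for w in product("HT", repeat=n)}
assert abs(P_event(Omega, p) - 1.0) < 1e-12  # P(Omega) = 1, as required
```

Since Ω is finite, summing P(ω) over any subset is all that (2.1) requires.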
In the finite coin toss space, for each t ∈ T let F_t be the σ-algebra generated by the first t coin
tosses. This is a σ-algebra which encapsulates the information one has if one observes the outcome
of the first t coin tosses (but not the full outcome of all n coin tosses). Then F_t is composed of
all the sets A such that F_t is indeed a σ-algebra, and such that if you know the outcome of the
first t coin tosses, then you can say whether ω ∈ A or ω ∉ A, for each A ∈ F_t. The (increasing)
sequence of σ-algebras F := (F_t)_{t ∈ T} is an example of a filtration, defined formally below.

Definition 2.1 (Filtration). Let T = {0, 1, . . . , n}. A filtration F = (F_t)_{t ∈ T} is a sequence of
increasing σ-algebras F_0, F_1, . . . , F_n. That is, F_s ⊆ F_t if s < t.

A filtration records increasing information flow, as we shall see in an example below. A probability
space (Ω, F, P) equipped with a filtration F = (F_t)_{t ∈ T}, with each F_t ⊆ F, is called a filtered
probability space.
Example 2.2 (3-period coin toss space). Take n = 3, so that Ω = Ω_3, given by the finite set

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},


the set of all possible outcomes of three coin tosses.


Define the following two subsets of Ω:

A_H = {HHH, HHT, HTH, HTT},    A_T = {THH, THT, TTH, TTT},

corresponding to the events that the first coin toss results in H and T respectively.
Define also

A_HH = {HHH, HHT},  A_HT = {HTH, HTT},  A_TH = {THH, THT},  A_TT = {TTH, TTT},

corresponding to the events that the first two coin tosses result in HH, HT, TH and TT respectively.
Using (2.2) and (2.1) we can compute the probability of any event. For instance,
P(A_H) = P{H on first toss} = P{HHH, HHT, HTH, HTT} = p^3 + 2p^2 q + pq^2 = p,

precisely in accordance with intuition, and similarly P(A_T) = q.
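The algebraic identity p^3 + 2p^2 q + pq^2 = p(p + q)^2 = p can also be verified numerically; a quick sketch with an arbitrary illustrative choice p = 0.3:

```python
# Sum P(omega) over the event A_H = {first toss is H}, as in (2.1)-(2.2)
p, q = 0.3, 0.7
AH = {"HHH", "HHT", "HTH", "HTT"}
P_AH = sum(p ** w.count("H") * q ** w.count("T") for w in AH)
assert abs(P_AH - p) < 1e-12  # matches P(A_H) = p
```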
We can construct the following σ-algebras of subsets of Ω:

F_0 = {∅, Ω},    F_1 = {∅, Ω, A_H, A_T}.

It is easy to see (exercise) that F_0, F_1 are both σ-algebras.


The σ-algebra F_1 contains the information of the first toss, or the information up to time
1. If one has information on the first toss only, then one cannot say what the actual outcome
ω = ω_1 ω_2 ω_3 of the three coin tosses is. With information up to time 1, all one knows is that
either ω_1 = H or that ω_1 = T. In this case one can only say whether ω ∈ A or ω ∉ A for every
A ∈ F_1. One cannot say whether ω ∈ {HHH, HHT, THH, THT} = A_HH ∪ A_TH (corresponding
to the event that the second toss results in H), as one would need to know the outcome of the
second toss to answer such a question. This is why F_1 = {∅, Ω, A_H, A_T}.
The trivial σ-algebra F_0 contains no information: knowing whether the outcome ω of the three
tosses is in ∅ (it is not) and whether it is in Ω (it is) tells you nothing about ω, in accordance
with the idea that at time zero one knows nothing about the eventual outcome of the three
coin tosses. All one can say is that ω ∉ ∅ and ω ∈ Ω, and so F_0 = {∅, Ω}.
Consider the collection of sets

F_2 = {∅, Ω, A_HH, A_HT, A_TH, A_TT, plus all unions of these}.

Then F_2 can be written as follows:

F_2 = {∅, Ω, A_HH, A_HT, A_TH, A_TT, A_H, A_T,
      A_HH ∪ A_TH, A_HH ∪ A_TT, A_HT ∪ A_TH, A_HT ∪ A_TT,
      A^c_HH, A^c_HT, A^c_TH, A^c_TT}.

Then F_2 is indeed a σ-algebra which contains the information of the first two tosses, or the
information up to time 2. This is because, if you know the outcome of the first two coin tosses,
you can say whether the outcome ω of all three tosses satisfies ω ∈ A or ω ∉ A, for each
A ∈ F_2.
Similarly, F_3 ≡ F, the set of all subsets of Ω, contains full information about the outcome
of all three tosses. The sequence of increasing σ-algebras F = {F_0, F_1, F_2, F_3} is a filtration.
The general principle illustrated by this example is:
The σ-algebra F_t corresponding to information at time t ∈ T is composed of all sets A such
that F_t is indeed a σ-algebra, and such that one can say whether ω ∈ A or ω ∉ A, given that one
has information on the outcome of the first t coin tosses.
Definition 2.3 (Random variable). An R-valued random variable X ≡ X(ω) on (Ω, F) is a
measurable function X : Ω → R, that is, the set X^{−1}(A) = {ω ∈ Ω : X(ω) ∈ A} ≡ {X ∈ A} is in F,
for every Borel set A ⊆ R.


A simple example of a random variable is the indicator function of a set A ∈ F, defined as

1_A(ω) := 1 if ω ∈ A,  and  1_A(ω) := 0 if ω ∉ A.

Since a random variable maps Ω into R, we can look at the preimage, under the random
variable, of sets in R, that is, sets of the form

X^{−1}(A) = {ω ∈ Ω : X(ω) ∈ A ⊆ R} ≡ {X ∈ A},

which is, of course, a subset of Ω. The complete list of subsets of Ω that you can get as preimages
(under X) of sets in R turns out to be a σ-algebra, whose information content is exactly the
information obtained by observing X, and is called the σ-algebra generated by the random variable
X. This is defined formally below, along with the idea of measurability of a random variable with
respect to a σ-algebra.
Definition 2.4. Let Ω be a nonempty finite set and let F be the σ-algebra of all subsets of Ω.
Let X be a random variable on (Ω, F). The σ-algebra σ(X) generated by X is defined to be
the collection of all sets of the form {ω ∈ Ω : X(ω) ∈ A}, where A is a subset of R. Let G be a
sub-σ-algebra of F. We say that X is G-measurable if every set in σ(X) is also in G.
Intuitively speaking, if X is measurable with respect to a σ-algebra G, then the information
content of G is sufficient to reveal the value of the random variable X.
Definition 2.5 (Induced measure (distribution) of a random variable X). Let X be a random
variable on (Ω, F, P). For A ⊆ R, we define the induced measure of the set A to be

μ_X(A) := P{ω ∈ Ω : X(ω) ∈ A} ≡ P{X ∈ A}.

So the induced measure of a set A tells us the probability that X takes a value in A. By the
distribution of a random variable X, we mean any of the several ways of characterising X (so
we shall sometimes refer to the induced measure μ_X associated with a random variable X as the
distribution of X).
Remark 2.6. We make a clear distinction between random variables and their distributions. A
random variable is a mapping from Ω to R, nothing more, and has an existence quite apart from
any discussion of probabilities. The distribution of a random variable is a measure μ_X on R, that
is, a way of assigning probabilities to sets in R. It depends on the random variable X and on the
probability measure P we use on Ω. If we change P, we change the distribution of the random
variable X, but not the random variable itself. Thus, a random variable can have more than one
distribution (e.g. an "objective" or "market" distribution, and a "risk-neutral" distribution, and we
shall see such constructs in finance). In a similar vein, two different random variables can have
the same distribution.
Definition 2.7. Let X be a random variable on (Ω, F, P). The expected value or expectation of
X is defined to be

(2.3)    E[X] := Σ_{ω ∈ Ω} X(ω)P(ω).

It is easy to see that for an indicator function 1_A of an event A ∈ F, the definition of expectation
leads to E[1_A] = P(A).
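A short sketch of (2.3), and of the identity E[1_A] = P(A), in Python (the names and parameter values are illustrative):

```python
from itertools import product

p, q = 0.4, 0.6
Omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in Omega}

def E(X):
    """Expectation as in (2.3): a weighted sum over sample points."""
    return sum(X(w) * P[w] for w in Omega)

# E[1_A] = P(A) for the event A = {first toss is H}
A = {w for w in Omega if w[0] == "H"}
indicator = lambda w: 1.0 if w in A else 0.0
assert abs(E(indicator) - sum(P[w] for w in A)) < 1e-12
```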
Remark 2.8. When the sample space Ω is infinite and, in particular, uncountable, the summation
in the definition of expectation is replaced by an integral. In general, the integral over an abstract
measurable space (Ω, F) with respect to a probability measure P is a so-called Lebesgue integral
(which has all the linearity and comparison properties we associate with ordinary integrals). The
expectation E[X] becomes the Lebesgue integral over Ω of X with respect to P, written as

(2.4)    E[X] = ∫_Ω X dP ≡ ∫_Ω X(ω) dP(ω) = ∫_R x dμ_X(x) ≡ ∫_R x μ_X(dx).

When X takes on a continuum of values and has a density f_X, then dμ_X(x) = f_X(x) dx and the
integral on the right-hand side of (2.4) reduces to the familiar Riemann integral ∫_R x f_X(x) dx.
We do not delve into the construction of Lebesgue integrals over abstract spaces here. Merely
think of the RHS of (2.4) as an alternative notation for the sum Σ_{ω ∈ Ω} X(ω)P(ω). See Jacod and
Protter [10], Shreve [17] and Williams [18] for more details on Lebesgue integration.
Definition 2.9 (Discrete-time stochastic process). Let T = {0, 1, . . . , n} be a discrete time set.
A discrete-time stochastic process (X_t)_{t ∈ T} is a sequence of random variables.

Definition 2.10 (Adapted stochastic process). Let T = {0, 1, . . . , n}. A stochastic process
(X_t)_{t ∈ T} on a filtered space (Ω, F, (F_t)_{t ∈ T}) is adapted to the filtration (F_t)_{t ∈ T} if X_t is F_t-measurable
for each t ∈ T.
3. The binomial stock price process
Take the filtered n-period coin toss probability space (Ω, F, F = (F_t)_{t ∈ T}, P), with Ω = Ω_n,
the set of outcomes of n coin tosses, T = {0, 1, . . . , n} and, for t ∈ T, F_t denoting the σ-algebra
generated by the first t coin tosses. Let F = F_n, and define the probability measure P via (2.1)
and (2.2), with p ∈ (0, 1) being the probability of H on any coin toss and q := 1 − p being the
probability of T on any coin toss.
The binomial model contains two assets, a riskless asset (or cash account, or money market
account, or bond), with price process S^{(0)} = (S^{(0)}_t)_{t ∈ T}, and a risky asset or stock, with price
process S = (S_t)_{t ∈ T}.
The process S^{(0)} evolves according to

S^{(0)}_{t+1} = (1 + r)S^{(0)}_t,    t = 0, 1, . . . , n − 1,    S^{(0)}_0 = 1,

where r ≥ 0 is the one-period (assumed constant) interest rate. Hence we have

(3.1)    S^{(0)}_t = (1 + r)^t,    t = 0, 1, . . . , n,

and hence S^{(0)}_t represents the value at time t of one unit of currency (say $1) invested at time
zero.
Regarding the stock price process S = (S_t)_{t ∈ T}: for each t ∈ T, S_t ≡ S_t(ω) (for ω ∈ Ω) is a
one-dimensional random variable on the measurable space (Ω, F), such that S_t(ω) = S_t(ω_1 . . . ω_t)
(where the second notation encapsulates the fact that each S_t, for t ∈ T, depends only on the
outcome of the first t coin tosses) is the stock price after t coin tosses. The sequence of random
variables S = (S_t)_{t ∈ T} is a stochastic process. We shall see that for each t ∈ T, S_t is F_t-measurable,
so that S is an F-adapted process. This encapsulates the idea that the information at time t ∈ T,
represented by F_t, is sufficient information to know the values of S_s for all s ≤ t.
Define two constants u, d satisfying u > 1 + r > d > 0. The evolution of the stock price is given
by (see Figure 6)

(3.2)    S_{t+1}(ω) = S_t(ω)u if ω_{t+1} = H,  S_t(ω)d if ω_{t+1} = T,    t = 0, 1, . . . , n − 1.

We may sometimes write (3.2) in the equivalent notation

S_{t+1}(ω_1 . . . ω_{t+1}) = S_t(ω_1 . . . ω_t)u if ω_{t+1} = H,  S_t(ω_1 . . . ω_t)d if ω_{t+1} = T,    t = 0, 1, . . . , n − 1,

whenever we wish to emphasise that S_t actually depends only on the outcome of the first t coin
tosses, and we abbreviate this notation further by sometimes suppressing the dependence on
ω_1 . . . ω_t, and writing

S_{t+1}(ω_{t+1}) = S_t u if ω_{t+1} = H,  S_t d if ω_{t+1} = T,    t = 0, 1, . . . , n − 1.


Figure 6. Binomial process for the stock price: over one period, S_t moves up to S_t u with
probability p ∈ (0, 1), or down to S_t d with probability q := 1 − p. We have associated the
probability p with an upward stock price move.

At time t ∈ T the possible stock prices are S^{(j)}_t, given by

S^{(j)}_t = S_0 u^j d^{t−j},    j = 0, 1, . . . , t,    t ∈ T.
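A minimal sketch generating the lattice of possible prices S_t^{(j)} = S_0 u^j d^{t−j} (the function name and parameter values are illustrative):

```python
def lattice(S0, u, d, n):
    """Possible prices S_t^{(j)} = S0 * u^j * d^(t-j), j = 0, ..., t."""
    return [[S0 * u**j * d**(t - j) for j in range(t + 1)] for t in range(n + 1)]

tree = lattice(S0=100.0, u=1.2, d=0.8, n=3)
assert tree[0] == [100.0]                         # single node at time 0
assert abs(tree[2][1] - 100.0 * 1.2 * 0.8) < 1e-9  # the "ud" node at time 2
```

Note that the lattice recombines: an up move followed by a down move leads to the same price as a down move followed by an up move, so time t has only t + 1 nodes rather than 2^t.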

Example 3.1 (One-period binomial model). Let n = 1, so T = {0, 1}, and Ω = Ω_1 is the finite set

Ω_1 := {H, T},

the set of outcomes of a single coin toss. The stock price process is (S_t)_{t ∈ {0,1}}, and S_1(ω) takes
on two possible values, S_1(H) or S_1(T), given by

S_1(ω) = S_0 u if ω = H,  S_0 d if ω = T.
Example 3.2 (3-period binomial model). Let n = 3, so that T = {0, 1, 2, 3} and Ω = Ω_3, given by
the finite set

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
the set of all possible outcomes of three coin tosses.
We can write down all the stock prices in a binomial tree as:

S_0;
S_1(H) = uS_0,    S_1(T) = dS_0;
S_2(HH) = u^2 S_0,    S_2(HT) = S_2(TH) = udS_0,    S_2(TT) = d^2 S_0;
S_3(HHH) = u^3 S_0,    S_3(HHT) = S_3(HTH) = S_3(THH) = u^2 dS_0,
S_3(HTT) = S_3(THT) = S_3(TTH) = ud^2 S_0,    S_3(TTT) = d^3 S_0.

Recall the filtration (F_t)_{t=0}^3 of Example 2.2. The stock price process is adapted to this filtration,
meaning that for each t ∈ T, the random variable S_t is F_t-measurable, which in turn means that
for each t ∈ T, all sets of the form

{ω ∈ Ω : S_t(ω) ∈ A ⊆ R} ≡ {S_t ∈ A}

lie in F_t (as you can check in a few cases), and this in turn means that the information in F_t is
enough to know the value of S_t.


Now we consider the σ-algebra generated by the random variable S_2, denoted by σ(S_2). If we
list, in as minimal a fashion as possible, the subsets of Ω that we can get as preimages under S_2
of sets in R, along with sets which can be built by taking unions and complements of these, then
this collection of sets turns out to be σ(S_2).
Now, if ω ∈ A_HH, then S_2(ω) = u^2 S_0. If ω ∈ A_HT ∪ A_TH, then S_2(ω) = udS_0. If ω ∈ A_TT, then
S_2(ω) = d^2 S_0. Hence σ(S_2) is composed of {∅, Ω, A_HH, A_HT ∪ A_TH, A_TT}, plus all relevant unions
and complements. Using the identities

A_HH ∪ (A_HT ∪ A_TH) = A^c_TT,    A_HH ∪ A_TT = (A_HT ∪ A_TH)^c,    (A_HT ∪ A_TH) ∪ A_TT = A^c_HH,

we obtain

(3.3)    σ(S_2) = {∅, Ω, A_HH, A_HT ∪ A_TH, A_TT, A_HH ∪ A_TT, A^c_HH, A^c_TT}.

The information content of the σ-algebra σ(S_2) is exactly the information learnt by observing S_2.
So, suppose the coin is tossed three times and you do not know the outcome ω, but you are told,
for each set in σ(S_2), whether ω is in that set or not. For instance, you might be told that ω is
not in A_HH, is in A_HT ∪ A_TH, and is not in A_TT. Then you know that in the first two tosses
there was a head and a tail, but you are not told in which order they occurred. This is the same
information you would have got by being told that the value of S_2(ω) is udS_0.
Note that F_2 contains all the sets which are in σ(S_2), and even more. In other words, the
information in the first two tosses is greater than the information in S_2. In particular, if you
see the first two tosses you can distinguish A_HT from A_TH, but you cannot make this distinction
from knowing the value of S_2 alone.
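The atoms of σ(S_2) can be recovered mechanically by grouping outcomes according to the observed value of S_2. A Python sketch (with illustrative parameters chosen as powers of 2, so that all prices are exact in floating point):

```python
from itertools import product

S0, u, d = 8.0, 2.0, 0.5
Omega = ["".join(w) for w in product("HT", repeat=3)]

def S2(w):
    """Stock price after two tosses: depends only on w[0], w[1]."""
    return S0 * u ** w[:2].count("H") * d ** w[:2].count("T")

# Group outcomes by the observed value of S2: these groups are the atoms
# of sigma(S2); every set in sigma(S2) is a union of them (plus the empty set).
atoms = {}
for w in Omega:
    atoms.setdefault(S2(w), set()).add(w)

assert atoms[u * d * S0] == {"HTH", "HTT", "THH", "THT"}  # A_HT together with A_TH
assert len(atoms) == 3  # values u^2 S0, ud S0, d^2 S0
```

Observing S_2 tells you which atom ω lies in, but (as in the text) cannot separate A_HT from A_TH, since both give S_2 = udS_0.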

4. Conditional expectation, martingales, equivalent measures


The conditional expectation E[X|G] of a random variable X given the σ-algebra G is a random
variable that expresses mathematically the best estimate of X given the information represented
by G.
4.1. Conditional expectation.
Definition 4.1 (Conditional expectation). Let X be a random variable on a probability space
(Ω, F, P), and let G be a sub-σ-algebra of F. The conditional expectation E[X|G] is defined to be
any random variable Y that satisfies:
(1) Y = E[X|G] is G-measurable;
(2) for every set A ∈ G, we have the partial averaging property

∫_A Y dP ≡ ∫_A E[X|G] dP = ∫_A X dP.

Note (we do not prove this here) that there is always a random variable Y satisfying the above
properties (that is, conditional expectations always exist) provided that E|X| < ∞. There can
be more than one random variable satisfying the above properties, but if Y′ is another one, then
Y = Y′ with probability 1 (or almost surely (a.s.)).
For random variables X, Y it is standard notation to write

E[X|Y ] ≡ E[X|σ(Y )].

4.2. Properties of conditional expectation. Proofs (or sketch proofs) of the properties below
are given in the Appendix. All the X below satisfy E|X| < ∞.
(1) E[E[X|G]] = E[X].
The conditional expectation of X is thus an unbiased estimator of the random variable
X.
(2) If X is G-measurable, then E[X|G] = X.
In other words, if the information content of G is sufficient to determine X, then the
best estimate of X based on G is X itself.
(3) (Linearity) For a_1, a_2 ∈ R,
E[a_1 X_1 + a_2 X_2 |G] = a_1 E[X_1 |G] + a_2 E[X_2 |G].
(4) (Positivity) If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.
(5) (Jensen's inequality) If φ : R → R is convex and E|φ(X)| < ∞, then
E[φ(X)|G] ≥ φ(E[X|G]).
(6) (Tower property) If H is a sub-σ-algebra of G, then
E[E[X|G]|H] = E[X|H],    a.s.
The intuition here is that G contains more information than H. If we estimate X based
on the information in G, and then estimate the estimator based on the smaller amount of
information in H, then we get the same result as if we had estimated X directly based on
the information in H.
(7) (Taking out what is known) If Z is G-measurable, then
E[ZX|G] = Z E[X|G].
(8) (Role of independence) If X is independent of H (i.e. if σ(X) and H are independent
σ-algebras), then

(4.1)    E[X|H] = E[X].

The intuition behind (4.1) is that if X is independent of H, then the best estimate of
X based on the information in H is E[X], the same as the best estimate of X based on
no information.

Example 4.2 (Conditional expectation in 3-period binomial model). Recall Example 3.2, featuring
a 3-period binomial model. We continue this to give an example of a conditional expectation:
suppose we wish to estimate S_1, given S_2, and denote this estimate by E[S_1|S_2]. This should
have two properties: (i) it should be a random variable, so should depend on ω, E[S_1|S_2] =
E[S_1|S_2(ω)] = E[S_1|S_2](ω), and (ii) it should be σ(S_2)-measurable, that is, if the value of S_2 is
known then the value of E[S_1|S_2] should also be known.
In particular, if ω = HHH or ω = HHT, then S_2(ω) = u^2 S_0 and, even without knowing ω, we
know that S_1(ω) = uS_0. We thus define

E[S_1|S_2](HHH) := E[S_1|S_2](HHT) = uS_0.

In other words,

E[S_1|S_2](ω) = uS_0,    ω ∈ A_HH.

Similarly we define

E[S_1|S_2](TTT) := E[S_1|S_2](TTH) = dS_0.

In other words,

E[S_1|S_2](ω) = dS_0,    ω ∈ A_TT.


Finally, if ω ∈ A := A_HT ∪ A_TH = {HTH, HTT, THH, THT}, then S_2(ω) = udS_0, so that S_1(ω) =
uS_0 or S_1(ω) = dS_0. So, to get E[S_1|S_2] in this case, we take a weighted average, as follows. For
ω ∈ A we define

E[S_1|S_2](ω) := (∫_A S_1 dP) / P(A),

which is a partial average of S_1 over the set A, normalised by the probability of A.
Now, P(A) = 2pq and ∫_A S_1 dP = pq(u + d)S_0, so that for ω ∈ A

E[S_1|S_2](ω) = (1/2)(u + d)S_0,    ω ∈ A = A_HT ∪ A_TH.

(In other words, the best estimate of S_1, given that S_2 = udS_0, is the average of the possibilities
uS_0 and dS_0.) Then we have that

∫_A E[S_1|S_2] dP = ∫_A S_1 dP.

In conclusion, we can write

E[S_1|S_2](ω) = g(S_2(ω)),

where

g(x) = uS_0 if x = u^2 S_0,    (1/2)(u + d)S_0 if x = udS_0,    dS_0 if x = d^2 S_0.

In other words, E[S_1|S_2] is random only through dependence on S_2 (and hence is σ(S_2)-measurable).
This random variable satisfies:
(1) E[S_1|S_2] is σ(S_2)-measurable;
(2) for every A ∈ σ(S_2),

∫_A E[S_1|S_2] dP = ∫_A S_1 dP,

which is the partial averaging property.
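The partial-averaging computation of E[S_1|S_2] can be reproduced numerically. The following sketch (illustrative parameters, with p = 1/2) recovers g(S_2(ω)) atom by atom:

```python
from itertools import product

S0, u, d, p = 8.0, 2.0, 0.5, 0.5
q = 1.0 - p
Omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in Omega}

def S(t, w):
    """Stock price after t tosses of the outcome w."""
    return S0 * u ** w[:t].count("H") * d ** w[:t].count("T")

def cond_exp_S1_given_S2(w):
    """Partial average of S1 over the atom {S2 = S2(w)} of sigma(S2)."""
    atom = [v for v in Omega if S(2, v) == S(2, w)]
    return sum(S(1, v) * P[v] for v in atom) / sum(P[v] for v in atom)

assert cond_exp_S1_given_S2("HTH") == 0.5 * (u + d) * S0  # the udS0 atom
assert cond_exp_S1_given_S2("HHT") == u * S0              # the u^2 S0 atom
```

On the atom where S_2 = udS_0 the function returns the weighted average (1/2)(u + d)S_0, exactly as computed in the text.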


Here is another (simpler) example of conditional expectation in the 3-period binomial model.
Recall the σ-algebra determined by the first toss, F_1 = {∅, Ω, A_H, A_T}, where A_H (respectively
A_T) is the event corresponding to a H (respectively a T) on the first toss.
Using the partial averaging property on the sets A_H and A_T, we can show the (intuitively clear)
fact that

E[S_2|F_1](ω) = (pu + qd)S_1(ω),

as follows: E[S_2|F_1] is constant on A_H and on A_T (since it is F_1-measurable) and must satisfy
the partial averaging property on these sets:

∫_{A_H} E[S_2|F_1] dP = ∫_{A_H} S_2 dP,    ∫_{A_T} E[S_2|F_1] dP = ∫_{A_T} S_2 dP.

(Obviously the partial averaging property is true on ∅ (all the integrals are zero) and it will be
true on Ω if it is true on A_H and A_T, since A_H ∪ A_T = Ω.) On A_H we have

∫_{A_H} E[S_2|F_1] dP = P(A_H)E[S_2|F_1](ω)  (since E[S_2|F_1] is constant over A_H)
                      = pE[S_2|F_1](ω),    ω ∈ A_H,

whilst on the other hand

∫_{A_H} S_2 dP = p^2 u^2 S_0 + pq udS_0.

Hence

E[S_2|F_1](ω) = pu^2 S_0 + q udS_0 = (pu + qd)uS_0 = (pu + qd)S_1(ω),    ω ∈ A_H.


Similarly, we can show that

E[S_2|F_1](ω) = (pu + qd)S_1(ω),    ω ∈ A_T.

So overall we get

E[S_2|F_1](ω) = (pu + qd)S_1(ω),    ω ∈ Ω.

With F_0 = {∅, Ω}, we can show similarly that

E[S_1|F_0] = (pu + qd)S_0.

F_0 contains no information, so any F_0-measurable random variable must be constant (non-random).
Therefore E[S_1|F_0] is that constant which satisfies the averaging property

∫_Ω E[S_1|F_0] dP = ∫_Ω S_1 dP = E[S_1] = (pu + qd)S_0,

and so we have

E[S_1|F_0] = (pu + qd)S_0.
We can generalise the above results to an n-period model.
Lemma 4.3. In an n-period binomial model, we have

(4.2)    E[S_{t+1}|F_t] = (pu + qd)S_t,    t = 0, 1, . . . , n − 1.

Proof. To show this, define, for any t = 0, 1, . . . , n − 1, the random variable

X := S_{t+1}/S_t.

Then X = u if ω_{t+1} = H and X = d if ω_{t+1} = T, and X is independent of F_t because each coin
toss is independent. Hence

E[S_{t+1}|F_t] = E[XS_t|F_t] = S_t E[X|F_t] = S_t E[X] = (pu + qd)S_t.  □
Notice that the right-hand side of (4.2) depends only on the current stock price S_t, signifying
that the stock price process is a Markov process.
4.3. Martingales.
Definition 4.4 (Martingale). A stochastic process M = (M_t)_{t=0}^n on a filtered probability space
(Ω, F, F := (F_t)_{t=0}^n, P) is a martingale with respect to the filtration F = (F_t)_{t=0}^n if: (i) each
M_t is F_t-measurable (so the process (M_t)_{t=0}^n is adapted to the filtration (F_t)_{t=0}^n); (ii) for each
t ∈ {0, 1, . . . , n}, E[|M_t|] < ∞; and (iii)

E[M_{t+1}|F_t] = M_t,    t = 0, 1, . . . , n − 1.

So martingales tend to go neither up nor down. A supermartingale tends to go down, i.e. the
last condition above is replaced by E[M_{t+1}|F_t] ≤ M_t. A submartingale tends to go up, i.e.
E[M_{t+1}|F_t] ≥ M_t.

Definition 4.5 (Predictable process). A predictable process (θ_t)_{t=1}^n on a filtered probability
space (Ω, F, (F_t)_{t=0}^n, P) is one such that, for each t ∈ {1, . . . , n}, θ_t is F_{t−1}-measurable.
See the Appendix for some more background material on discrete-time martingales, including
the idea of a martingale transform, or discrete stochastic integral (Proposition A.13), which uses
the notion of a predictable process.


4.4. Equivalent measures.


Definition 4.6 (Absolutely continuous measures). Let P and Q be two probability measures on
a measurable space (Ω, F). Assume that for every A ∈ F satisfying P(A) = 0, we also have
Q(A) = 0. Then we say Q is absolutely continuous with respect to P, written Q ≪ P.

Definition 4.7 (Equivalent measures). If Q is absolutely continuous with respect to P and P is
absolutely continuous with respect to Q, then we say that P and Q are equivalent, written Q ∼ P.

In other words, P and Q are equivalent if and only if

P(A) = 0 exactly when Q(A) = 0,    for every A ∈ F.

See the Appendix for some more on equivalent and absolutely continuous measures, including
the concept of a Radon-Nikodym derivative.
5. Contingent claim valuation in the binomial model
Take the standard n-period binomial stock price process S of Section 3 on the filtered probability
space (Ω, F, F := (F_t)_{t ∈ T}, P), generated by n coin tosses, with time index set T = {0, 1, 2, . . . , n}.
The sample space Ω is finite, and the probability measure P is called the physical measure (or
objective measure, or the market measure). We assume F_n = F and F_0 = {∅, Ω}.
The sample space Ω is the set of all outcomes of n coin tosses, so each ω ∈ Ω is of the form
ω = (ω_1 ω_2 . . . ω_n), with each ω_t ∈ {H, T}, for each t ∈ {1, . . . , n}. The evolution of the stock price
process S = (S_t)_{t=0}^n is given by

S_{t+1} = S_t u if ω_{t+1} = H,  S_t d if ω_{t+1} = T,    t = 0, 1, . . . , n − 1,

where u > 1 + r > d > 0.
Introduce a financial agent with initial wealth X_0 at time zero, who can choose at each time
how to split his wealth between the riskless and risky assets. The agent's trading strategy is the
two-dimensional stochastic process

(θ^{(0)}_t, θ_t),    t ∈ {1, . . . , n},

where, for t ∈ {1, . . . , n}, θ^{(0)}_t denotes the number of units of the riskless asset held over the
interval [t − 1, t) and θ_t denotes the number of units of the stock held over the interval [t − 1, t).
The positions in the portfolio at time t, for t ∈ {1, . . . , n}, are decided at time t − 1 and kept until
time t, when new asset price quotations are available.

Assumption 5.1. The portfolio process (θ^{(0)}_t, θ_t)_{t ∈ {1,...,n}} is predictable, so that for each
t ∈ {1, . . . , n}, θ^{(0)}_t and θ_t are F_{t−1}-measurable.

The initial wealth is given by

(5.1)    X_0 = θ^{(0)}_1 + θ_1 S_0.    (budget constraint)

Equation (5.1) is a budget constraint: the agent splits all his initial wealth between cash and
the risky stock.
The time-1 wealth is

(5.2)    X_1 = (1 + r)θ^{(0)}_1 + θ_1 S_1,

where we have assumed that no wealth has been taken out of the portfolio for (say) consumption
and no outside income has been injected into the portfolio. Equation (5.2) is thus one form of a
self-financing condition on the portfolio wealth evolution. Using the budget constraint (5.1) we
re-cast (5.2) into the form

(5.3)    X_1 = (1 + r)X_0 + θ_1(S_1 − (1 + r)S_0).


Similar self-financing portfolio rebalancing occurs at each time t ∈ {1, . . . , n − 1}. Define the
wealth process X = (X_t)_{t ∈ T}, where X_t denotes the wealth at time t (that is, at the end of the
interval [t − 1, t) and the beginning of the interval [t, t + 1)), for t = 0, 1, . . . , n − 1, with X_n the
final wealth at the end of the interval [n − 1, n]. We then have the following evolution.
At the beginning of the interval [t − 1, t), and just after portfolio rebalancing has taken place,
the wealth is X_{t−1}, given by

(5.4)    X_{t−1} = θ^{(0)}_t S^{(0)}_{t−1} + θ_t S_{t−1} = θ^{(0)}_t (1 + r)^{t−1} + θ_t S_{t−1},

where the last equality follows from the expression (3.1) for the value of the riskless asset at any
time in T. The position (θ^{(0)}_t, θ_t) is held over [t − 1, t), and the wealth X_t achieved at the end of
this interval (and hence at the start of the interval [t, t + 1)) is

(5.5)    X_t = θ^{(0)}_t S^{(0)}_t + θ_t S_t = θ^{(0)}_t (1 + r)^t + θ_t S_t.

At this time, t, the portfolio is rebalanced to (θ^{(0)}_{t+1}, θ_{t+1}), so that X_t is also given by

X_t = θ^{(0)}_{t+1} S^{(0)}_t + θ_{t+1} S_t.

Hence the general self-financing condition is

θ^{(0)}_{t+1} S^{(0)}_t + θ_{t+1} S_t = θ^{(0)}_t S^{(0)}_t + θ_t S_t,    t = 1, . . . , n − 1.

We can raise this to a definition.

Definition 5.2. A trading strategy (θ^{(0)}_t, θ_t)_{t=1}^n is self-financing if for every t = 1, . . . , n − 1, we
have

θ^{(0)}_{t+1} S^{(0)}_t + θ_{t+1} S_t = θ^{(0)}_t S^{(0)}_t + θ_t S_t.
Using (5.4) to eliminate θ^{(0)}_t from (5.5), we can write the portfolio wealth evolution as

(5.6)    X_t = (1 + r)X_{t−1} + θ_t(S_t − (1 + r)S_{t−1}),    t = 1, . . . , n.

This can be put into a neater form if we work with discounted quantities, that is, we evaluate all
quantities in units of the bond price. The discounted stock price process S̃ is defined by

S̃_t = S_t/S^{(0)}_t = S_t/(1 + r)^t,    t = 0, 1, . . . , n,

and the discounted wealth process is similarly defined by

X̃_t = X_t/S^{(0)}_t = X_t/(1 + r)^t,    t = 0, 1, . . . , n.

Then, in terms of discounted quantities, the wealth evolution equation (5.6) becomes

(5.7)    X̃_t = X̃_{t−1} + θ_t(S̃_t − S̃_{t−1}),    t = 1, . . . , n.

Iterating this evolution from time zero to t ∈ T we obtain

(5.8)    X̃_t = X_0 + Σ_{s=1}^t θ_s(S̃_s − S̃_{s−1}),    t = 1, . . . , n.

From this we see that the wealth process is completely specified by the initial wealth X_0 and the
choice of stock portfolio θ. When we need to emphasise the dependence of wealth on the chosen
portfolio we write X(θ) ≡ X.
The sum in (5.8) is the (discrete-time) stochastic integral of θ with respect to S̃, denoted by
(θ · S̃):

(θ · S̃)_t := Σ_{s=1}^t θ_s(S̃_s − S̃_{s−1}),    t = 1, . . . , n.
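As a sanity check, the wealth recursion (5.6) and its discounted, summed form (5.8) can be verified to agree along a single path; in the Python sketch below, the path, initial wealth and portfolio values are arbitrary illustrations (and powers of 2 are used for the price parameters to keep arithmetic exact):

```python
S0, u, d, r = 8.0, 2.0, 0.5, 0.25
path = "HTH"                   # one outcome of three tosses
theta = [0.0, 0.3, -0.2, 0.5]  # theta[t]: stock held over [t-1, t); theta[0] unused

# Stock prices along the path
S = [S0]
for toss in path:
    S.append(S[-1] * (u if toss == "H" else d))

# Wealth recursion (5.6): X_t = (1+r) X_{t-1} + theta_t (S_t - (1+r) S_{t-1})
X = [10.0]  # initial wealth X_0
for t in range(1, len(S)):
    X.append((1 + r) * X[t - 1] + theta[t] * (S[t] - (1 + r) * S[t - 1]))

# Discounted form (5.8): X~_n = X_0 + sum_s theta_s (S~_s - S~_{s-1})
Sd = [s / (1 + r) ** t for t, s in enumerate(S)]
Xd = X[0] + sum(theta[s] * (Sd[s] - Sd[s - 1]) for s in range(1, len(S)))
assert abs(Xd - X[-1] / (1 + r) ** len(path)) < 1e-12
```

The agreement reflects exactly the derivation in the text: dividing (5.6) by (1 + r)^t yields (5.7), and iterating gives (5.8).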


5.1. Equivalent martingale measures and no arbitrage.


Definition 5.3 (Equivalent martingale measure). An equivalent martingale measure (EMM), also
called a risk-neutral measure, is a probability measure Q ∼ P such that the discounted stock price
S̃ is a Q-martingale.

Lemma 5.4. If a martingale measure Q exists, then the discounted wealth process of a self-financing
portfolio process is a Q-martingale.

Proof. If a martingale measure Q exists, we have E^Q[S̃_t|F_{t−1}] = S̃_{t−1}. Then, from (5.7) we obtain

E^Q[X̃_t|F_{t−1}] = X̃_{t−1},    t = 1, . . . , n,

so that the discounted wealth process is also a Q-martingale.  □

Remark 5.5. Lemma 5.4 also follows from the fact that the discounted wealth process is a finite
sum of stochastic integrals, and hence a martingale transform (Proposition A.13). For any self-financing
strategy θ, the discounted wealth process is given by (5.8), and combining this with
Proposition A.13, the discounted wealth process X̃ is a Q-martingale.
5.1.1. The risk-neutral measure in the binomial model. In the binomial model, there is a unique EMM $\mathbb{Q}$ defined as follows. For $t \in \{1, \dots, n\}$, define

$$(5.9) \qquad \mathbb{Q}(\omega_t = H) = p^{\mathbb{Q}} := \frac{1+r-d}{u-d}, \qquad \mathbb{Q}(\omega_t = T) = q^{\mathbb{Q}} := \frac{u-(1+r)}{u-d}.$$

It is clear that $\mathbb{Q} \sim \mathbb{P}$, and that $E^{\mathbb{Q}}[S_{t+1}|\mathcal{F}_t] = (1+r)S_t$, so that $\mathbb{Q}$ is indeed an EMM. We will see shortly that $\mathbb{Q}$ emerges naturally when we try to value derivatives in the binomial model via a replication argument. This is one manifestation of deep results called the Fundamental Theorems of Asset Pricing, which hold in great generality (but are off-syllabus). We mention these theorems in passing.
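The formula (5.9) and the one-step martingale property are easy to check numerically. A minimal sketch in Python, using the illustrative parameter values $u = 2$, $d = 1/2$, $r = 1/4$ (the same values as in Example 5.15 later in the notes):

```python
# Risk-neutral probabilities in the binomial model, eq. (5.9).
# Illustrative parameters; any 0 < d < 1 + r < u gives a valid EMM.
u, d, r = 2.0, 0.5, 0.25

p_q = (1 + r - d) / (u - d)    # Q(omega_t = H)
q_q = (u - (1 + r)) / (u - d)  # Q(omega_t = T)
assert abs(p_q + q_q - 1) < 1e-12  # probabilities sum to one

# One-step martingale check: E_Q[S_{t+1} | F_t] = (1 + r) S_t.
S_t = 4.0
expected = p_q * u * S_t + q_q * d * S_t
assert abs(expected - (1 + r) * S_t) < 1e-12
print(p_q, q_q)  # 0.5 0.5
```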

5.1.2. Fundamental theorems of asset pricing*. Recall the definition of arbitrage.


Definition 5.6 (Arbitrage). An arbitrage over $\mathbb{T} = \{0, 1, \dots, n\}$ is a strategy $\phi$ such that the associated wealth process satisfies $X_0 = 0$, $\mathbb{P}[X_n(\phi) \ge 0] = 1$ and $\mathbb{P}[X_n(\phi) > 0] > 0$.
Theorem 5.7 (First Fundamental Theorem of Asset Pricing (FTAP I)). A finite sample space,
discrete-time financial market is arbitrage-free if and only if there exists an equivalent martingale
measure.
Remark 5.8. The proof of Theorem 5.7 is easy in one direction: suppose there exists an equivalent martingale measure $\mathbb{Q}$. Then for any self-financing strategy we have, from Lemma 5.4, that the discounted wealth process is a $\mathbb{Q}$-martingale, so $E^{\mathbb{Q}}[\tilde X_n] = \tilde X_0$. This immediately precludes the possibility of arbitrage. For suppose $\tilde X$ is such that $\tilde X_0 = 0$ and $\tilde X_n \ge 0$ $\mathbb{P}$-a.s., so that $\tilde X_n \ge 0$ $\mathbb{Q}$-a.s. (since $\mathbb{P}$ and $\mathbb{Q}$ are equivalent). But since $E^{\mathbb{Q}}[\tilde X_n] = \tilde X_0 = 0$, it must be the case that $\tilde X_n = 0$, $\mathbb{Q}$-a.s. This implies that $\tilde X_n = 0$ $\mathbb{P}$-a.s., since $\mathbb{P}$ and $\mathbb{Q}$ are equivalent, so there is no arbitrage.
Definition 5.9. A European contingent claim with expiration time $n$ is a non-negative $\mathcal{F}_n$-measurable random variable $Y$, which is called the payoff of the claim.

Definition 5.10. A European contingent claim $Y$ is said to be attainable (or hedgeable, or replicable) if there exists a constant $X_0$ and a portfolio process $\phi = (\phi_t)_{t=1}^{n}$ such that the self-financing wealth process $(X_t)_{t=0}^{n}$ satisfies
$$X_n(\omega) = Y(\omega), \quad \text{for every } \omega \in \Omega.$$
In this case, for $t = 0, \dots, n$, we call $V_t := X_t$ the no-arbitrage price at time $t$ of $Y$, and the portfolio which attains $X_n = Y$ is called the replicating portfolio for the claim.


Definition 5.11 (Complete market). A financial market is said to be complete if every contingent claim is attainable. Otherwise, the market is said to be incomplete.

Theorem 5.12 (Second Fundamental Theorem of Asset Pricing (FTAP II)). A finite-state discrete-time arbitrage-free market is complete if and only if there is a unique equivalent martingale measure.

Here are some examples of European claims in a discrete-time setting with time index set $\mathbb{T} = \{0, 1, \dots, n\}$.

- A European call option, with payoff $Y = (S_n - K)^+$ for fixed strike $K \ge 0$.
- A European put option, with payoff $Y = (K - S_n)^+$ for fixed strike $K \ge 0$.
- A fixed strike lookback call option, with payoff $Y = (M_n - K)^+$ for fixed strike $K \ge 0$, where $M_n$ is the maximum of the stock price over $\{0, 1, \dots, n\}$, that is $M_n = \max_{t \in \mathbb{T}} S_t$.
- A floating strike lookback call option, with payoff $Y = (S_n - m_n)^+$, where $m_n$ is the minimum of the stock price over $\{0, 1, \dots, n\}$, that is $m_n = \min_{t \in \mathbb{T}} S_t$.
- An arithmetic average fixed strike Asian call option, with payoff $Y = (A_n - K)^+$ for fixed strike $K \ge 0$, where $A_n$ is the arithmetic average of the stock price over $\{0, 1, \dots, n\}$, that is
$$A_n = \frac{1}{n+1}\sum_{t=0}^{n} S_t.$$

In a complete market, all contingent claims are attainable. So, given a contingent claim $Y$, there is a unique trading strategy with wealth process $X = (X_t)_{t \in \mathbb{T}}$ such that $X_n = Y$ almost surely. This immediately implies that, to avoid arbitrage, the price of the claim at any time $t \in \mathbb{T}$ must be $V_t := X_t$, as in Definition 5.10.

Denote the discount factor from time $t \in \mathbb{T}$ to time zero by $D_t$. So, with constant interest rate $r$, $D_t = (1+r)^{-t} = 1/S^{(0)}_t$.

Lemma 5.13. The no-arbitrage price of an attainable claim $Y$ is given by

$$(5.10) \qquad V_t = \frac{1}{D_t}E^{\mathbb{Q}}[D_n Y \,|\, \mathcal{F}_t], \quad t \in \mathbb{T}.$$

Any other price for the claim will lead to an arbitrage opportunity.

Proof. Let $X = (X_t)_{t \in \mathbb{T}}$ be the wealth process of the replicating strategy. The discounted wealth process $\tilde X = DX$ is a $\mathbb{Q}$-martingale, so satisfies
$$E^{\mathbb{Q}}[D_n X_n \,|\, \mathcal{F}_t] = D_t X_t, \quad t \le n.$$
Using $X_n = Y$ and the definition $V_t := X_t$ yields (5.10).

To show that there is arbitrage if (5.10) is violated, consider buying or selling the claim at time zero (a similar argument holds at any time $t \in \mathbb{T}$).

First, suppose $V_0 > E^{\mathbb{Q}}[D_n Y]$. Sell the claim for $V_0$ and use the proceeds to invest in the replicating portfolio, which requires an initial investment of $X_0 = E^{\mathbb{Q}}[D_n Y]$. The wealth in this portfolio at time $n$ is $X_n = Y$, by assumption. Therefore, one can, at time zero, invest $V_0 - X_0 > 0$ in the bank, use the proceeds from the replicating portfolio to settle one's obligations from the claim, and make a riskless profit of $(V_0 - X_0)(1+r)^n > 0$. This is an arbitrage.

Similarly, if $V_0 < E^{\mathbb{Q}}[D_n Y]$, one buys the claim and sells the replicating portfolio, leading to a riskless profit of $(X_0 - V_0)(1+r)^n > 0$.

5.2. Valuation by replication in the binomial model. In this (most important) section we shall show that in the binomial model it is possible to replicate any European claim, so the model is complete. We shall see that the unique EMM $\mathbb{Q}$, defined by

$$(5.11) \qquad \mathbb{Q}(\omega_t = H) = p^{\mathbb{Q}} := \frac{1+r-d}{u-d}, \qquad \mathbb{Q}(\omega_t = T) = q^{\mathbb{Q}} := \frac{u-(1+r)}{u-d},$$

emerges naturally.
5.2.1. Replication in a one-period binomial model. First consider a one-period model, $n = 1$, so $\mathbb{T} = \{0, 1\}$. Suppose an agent sells a claim on the stock at time zero that expires at time 1. There are just two points $\omega \in \Omega$, given by $\omega = H$ and $\omega = T$.

The claim pays off an amount $Y$ at time 1, where $Y$ is an $\mathcal{F}_1$-measurable random variable. This measurability condition is relevant; it says that the value of the claim at its maturity date is determined by the coin toss, that is, by the value of the stock price at time 1. This is why it does not make sense to use some stock unrelated to the derivative security in valuing it.

The agent sells the claim at time zero for some price $V_0$ (to be determined) and attempts to manage the risk from this sale by building a hedging portfolio composed of a number $\phi_1 \equiv \phi$ of shares of the underlying stock and a number $\phi^{(0)}_1 \equiv \phi^{(0)}$ of shares of the riskless asset (which has initial value $S^{(0)}_0 = 1$).

We suppose that the proceeds from the sale of the claim, $V_0$, are all that the agent uses to construct a hedging portfolio. Therefore the initial wealth in the hedging portfolio is

$$(5.12) \qquad X_0 = \phi^{(0)} + \phi S_0.$$

As the stock price evolves in time the hedging portfolio and option value will also evolve. The option payoff is the random variable $Y(\omega)$ (so for, say, a call option, $Y(\omega) = (S_1(\omega) - K)^+$, where $K$ is the option's strike and $S_1(\omega)$ is the stock price after one coin toss). The agent's hedge portfolio wealth at time 1 is $X_1(\omega)$, given by
$$X_1(\omega) = (1+r)\phi^{(0)} + \phi S_1(\omega).$$
Eliminating $\phi^{(0)}$ using (5.12), we write $X_1$ as
$$X_1(\omega) = (1+r)X_0 + \phi(S_1(\omega) - (1+r)S_0).$$
If the hedging portfolio is to successfully manage the risk from the option sale, its value must replicate the option payoff in each possible final state, so we require $X_1(\omega) = Y(\omega)$ for $\omega = H$ and $\omega = T$, yielding the equations

$$(5.13) \qquad (1+r)X_0 + \phi(S_1(H) - (1+r)S_0) = Y(H), \quad \text{if } \omega = H,$$
$$(5.14) \qquad (1+r)X_0 + \phi(S_1(T) - (1+r)S_0) = Y(T), \quad \text{if } \omega = T.$$

Solving these equations for $\phi$ gives
$$\phi = \frac{Y(H) - Y(T)}{S_1(H) - S_1(T)}.$$
Then, the initial wealth is computed from either of (5.13) or (5.14) as

$$(5.15) \qquad X_0 = \frac{1}{1+r}\left[p^{\mathbb{Q}}Y(H) + q^{\mathbb{Q}}Y(T)\right],$$

where we have used (5.11) and $S_1(H) = uS_0$, $S_1(T) = dS_0$. (The cash position required can be obtained using (5.12).)

If the agent holds the portfolio $(\phi^{(0)}, \phi)$, then he will be able to meet all his obligations associated with the claim. Therefore the current claim price is the initial wealth required to do this, or $V_0 = X_0$, as given by (5.15). So we get the claim price at time zero as
$$V_0 = \frac{1}{1+r}E^{\mathbb{Q}}[Y].$$
The measure $\mathbb{Q}$ is the unique EMM for this one-period market, and is also known as the risk-neutral probability measure.
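The one-period replication argument can be sketched numerically. The following Python snippet uses the parameter values of Example 5.15 ($u = 2$, $d = 1/2$, $r = 1/4$, $S_0 = 4$) with an illustrative call of strike $K = 5$, and verifies that the portfolio $(\phi^{(0)}, \phi)$ matches the payoff in both states:

```python
# One-period replication of a call with strike K = 5 (illustrative claim).
u, d, r, S0, K = 2.0, 0.5, 0.25, 4.0, 5.0
S1 = {'H': u * S0, 'T': d * S0}
Y = {w: max(S1[w] - K, 0.0) for w in ('H', 'T')}

# Stock holding and initial wealth from the replication equations.
phi = (Y['H'] - Y['T']) / (S1['H'] - S1['T'])
pq = (1 + r - d) / (u - d)
X0 = (pq * Y['H'] + (1 - pq) * Y['T']) / (1 + r)  # eq. (5.15)
phi0 = X0 - phi * S0                              # cash position, from (5.12)

# Verify X1(w) = Y(w) in both states.
for w in ('H', 'T'):
    X1 = (1 + r) * phi0 + phi * S1[w]
    assert abs(X1 - Y[w]) < 1e-12
```

Here $X_0 = 1.2$ is the no-arbitrage claim price $V_0$, and the replication requires holding $\phi = 1/2$ shares of stock financed by a short cash position.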


It is clear that $\mathbb{Q}$ is equivalent to the physical measure $\mathbb{P}$, and that $\mathbb{Q}$ is indeed a martingale measure, in that
$$E^{\mathbb{Q}}\left[\frac{S_1}{1+r}\right] = S_0.$$
It is also clear that the discounted wealth process, and hence the discounted claim price process, is also a $\mathbb{Q}$-martingale, just as we would have expected from the FTAPs.
5.2.2. Replication in an $n$-period binomial model. We can easily generalise the above analysis to an $n$-period model, by concatenating a sequence of one-period models.

Let us place ourselves at some time $t-1$, where $t \in \{1, \dots, n\}$. Given a fixed outcome $\omega_1 \dots \omega_{t-1}$ of the first $t-1$ coin tosses, suppose that the values of the stock and a derivative security at time $t$ are $S_t(\omega_t)$, $V_t(\omega_t)$ respectively, if the outcome of the $t$th coin toss is $\omega_t$ (see Figure 7). Then one can trade over $[t-1, t)$ to reproduce the values of the derivative one period later, as follows.

[Figure 7 shows the one-period branching: $(S_{t-1}, V_{t-1})$ moves up to $(S_t(H) = S_{t-1}u,\ V_t(H))$ or down to $(S_t(T) = S_{t-1}d,\ V_t(T))$.]

Figure 7. Binomial process for stock price and derivative.


At time $t-1$, after portfolio rebalancing has taken place, the wealth with strategy $(\phi^{(0)}, \phi) = (\phi^{(0)}_t, \phi_t)_{t=1}^{n}$ is given by

$$(5.16) \qquad X_{t-1} = \phi^{(0)}_t S^{(0)}_{t-1} + \phi_t S_{t-1}.$$

This evolves to the wealth $X_t(\omega_t)$ at time $t$, given by
$$X_t(\omega_t) = \phi^{(0)}_t(1+r)S^{(0)}_{t-1} + \phi_t S_{t-1}\epsilon_t, \qquad \epsilon_t = \begin{cases} u, & \omega_t = H, \\ d, & \omega_t = T. \end{cases}$$
Eliminating $\phi^{(0)}_t S^{(0)}_{t-1}$ using (5.16) we get
$$X_t(\omega_t) = (1+r)X_{t-1} + \phi_t S_{t-1}(\epsilon_t - (1+r)), \quad \omega_t = H, T.$$
Writing this out fully as two equations, we have

$$(5.17) \qquad X_t(H) = (1+r)X_{t-1} + \phi_t S_{t-1}(u - (1+r)),$$
$$(5.18) \qquad X_t(T) = (1+r)X_{t-1} - \phi_t S_{t-1}(1 + r - d).$$

We require $X_t(\omega_t) = V_t(\omega_t)$, for both $\omega_t = H$ and $\omega_t = T$. This requires that the stock holding at the beginning of the interval $[t-1, t)$ must be
$$\phi_t = \frac{V_t(H) - V_t(T)}{S_{t-1}(u-d)} = \frac{V_t(H) - V_t(T)}{S_t(H) - S_t(T)}.$$
The required wealth at time $t-1$ is then given from either of (5.17) or (5.18) as
$$X_{t-1} = \frac{1}{1+r}\left[p^{\mathbb{Q}}V_t(H) + q^{\mathbb{Q}}V_t(T)\right] = E^{\mathbb{Q}}[(1+r)^{-1}V_t \,|\, \mathcal{F}_{t-1}], \quad t = 1, \dots, n.$$
For no-arbitrage, the derivative value at time $t-1$ must then be given by $V_{t-1} = X_{t-1}$:

$$(5.19) \qquad V_{t-1} = E^{\mathbb{Q}}[(1+r)^{-1}V_t \,|\, \mathcal{F}_{t-1}], \quad t = 1, \dots, n.$$


Notice that this implies that the discounted option value $((1+r)^{-t}V_t)_{t=0}^{n}$ is a $\mathbb{Q}$-martingale (as it should be, since it is replicated by a discounted wealth process which is a $\mathbb{Q}$-martingale).
This shows that one can always find a strategy at any time to reproduce the value of a contingent
claim one period later. The key to valuing the contingent claim is thus to begin at the maturity
time and work backwards, computing risk-neutral discounted expectations. The next section
formalises this.
5.3. Completeness of the multiperiod binomial model. The above analysis can clearly be iterated so that in a multiperiod binomial model, we can replicate any contingent claim. The next theorem rigorously demonstrates that a portfolio process to hedge any contingent claim in the binomial model exists, and derives an expression for $\phi_t$, $t = 1, \dots, n$.

Define the unique EMM $\mathbb{Q}$ by setting the $\mathbb{Q}$-probability of H on each coin toss to be $p^{\mathbb{Q}}$, and the $\mathbb{Q}$-probability of T to be $q^{\mathbb{Q}} := 1 - p^{\mathbb{Q}}$, given by (5.11).

Theorem 5.14. The $n$-period binomial model is complete. In particular, let $Y$ be a European claim with maturity time $n$, and define
$$V_t(\omega_1 \dots \omega_t) := (1+r)^t E^{\mathbb{Q}}[(1+r)^{-n}Y \,|\, \mathcal{F}_t](\omega_1 \dots \omega_t), \quad t = 0, \dots, n,$$
$$\phi_t(\omega_1 \dots \omega_{t-1}) := \frac{V_t(\omega_1 \dots \omega_{t-1}H) - V_t(\omega_1 \dots \omega_{t-1}T)}{S_t(\omega_1 \dots \omega_{t-1}H) - S_t(\omega_1 \dots \omega_{t-1}T)}, \quad t = 1, \dots, n.$$
Then, starting with initial wealth $X_0 := V_0 = E^{\mathbb{Q}}[(1+r)^{-n}Y]$, the self-financing wealth process corresponding to the portfolio process $\phi_1, \dots, \phi_n$ is the process $V_0, \dots, V_n$.
Proof. Let $V_0, \dots, V_n$ and $\phi_1, \dots, \phi_n$ be defined as in the theorem. Observe that $V_n = Y$ almost surely.

Start with wealth $X_0 = V_0 = E^{\mathbb{Q}}[(1+r)^{-n}Y]$ and consider the self-financing wealth process of the portfolio process $\phi_1, \dots, \phi_n$. This wealth satisfies the recursive formula
$$X_{t+1} = (1+r)X_t + \phi_{t+1}(S_{t+1} - (1+r)S_t), \quad t = 0, 1, \dots, n-1.$$
We need to show that, with $X_t$, $V_t$, $\phi_t$ defined as above, we have

$$(5.20) \qquad X_t = V_t, \quad \text{almost surely}, \quad t \in \{0, \dots, n\}.$$

We proceed by induction. For $t = 0$, (5.20) holds by definition of $X_0$. Now assume that (5.20) holds for some fixed value of $t \in \{0, \dots, n-1\}$, i.e. for each fixed $(\omega_1 \dots \omega_t)$ we have
$$X_t(\omega_1 \dots \omega_t) = V_t(\omega_1 \dots \omega_t).$$
Then we need to show that
$$X_{t+1}(\omega_1 \dots \omega_t H) = V_{t+1}(\omega_1 \dots \omega_t H), \qquad X_{t+1}(\omega_1 \dots \omega_t T) = V_{t+1}(\omega_1 \dots \omega_t T).$$
We shall prove the first equality, and note that the second can be proved similarly (an exercise).

Note first that $\{(1+r)^{-t}V_t\}_{t=0}^{n}$ is a martingale under $\mathbb{Q}$, since
$$E^{\mathbb{Q}}[(1+r)^{-(t+1)}V_{t+1} \,|\, \mathcal{F}_t] = E^{\mathbb{Q}}\big[E^{\mathbb{Q}}[(1+r)^{-n}Y \,|\, \mathcal{F}_{t+1}] \,\big|\, \mathcal{F}_t\big] \quad \text{(defn. of } V_{t+1})$$
$$= E^{\mathbb{Q}}[(1+r)^{-n}Y \,|\, \mathcal{F}_t] \quad \text{(tower property)}$$
$$= (1+r)^{-t}V_t.$$
So in particular,
$$V_t(\omega_1 \dots \omega_t) = E^{\mathbb{Q}}[(1+r)^{-1}V_{t+1} \,|\, \mathcal{F}_t](\omega_1 \dots \omega_t) = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_{t+1}(\omega_1 \dots \omega_t H) + q^{\mathbb{Q}}V_{t+1}(\omega_1 \dots \omega_t T)\right).$$


Since $(\omega_1 \dots \omega_t)$ will be fixed for the rest of the proof, we simplify notation by suppressing these symbols. For example, the last equation is written as

$$(5.21) \qquad V_t = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_{t+1}(H) + q^{\mathbb{Q}}V_{t+1}(T)\right).$$

Now we compute
$$X_{t+1}(H) = (1+r)X_t + \phi_{t+1}(S_{t+1}(H) - (1+r)S_t)$$
$$= (1+r)V_t + \phi_{t+1}(S_{t+1}(H) - (1+r)S_t) \quad (\text{since } X_t = V_t)$$
$$= (1+r)V_t + \frac{V_{t+1}(H) - V_{t+1}(T)}{S_{t+1}(H) - S_{t+1}(T)}(S_{t+1}(H) - (1+r)S_t)$$
$$= p^{\mathbb{Q}}V_{t+1}(H) + q^{\mathbb{Q}}V_{t+1}(T) + \frac{V_{t+1}(H) - V_{t+1}(T)}{S_{t+1}(H) - S_{t+1}(T)}(S_{t+1}(H) - (1+r)S_t)$$
$$= p^{\mathbb{Q}}V_{t+1}(H) + q^{\mathbb{Q}}V_{t+1}(T) + q^{\mathbb{Q}}(V_{t+1}(H) - V_{t+1}(T))$$
$$= V_{t+1}(H),$$
where we have used (5.21) and $S_{t+1}(H) = S_t u$, $S_{t+1}(T) = S_t d$.

Example 5.15 (European call in 2-period model). Let $u = 2$, $d = 1/u$, $r = 1/4$, $S_0 = 4$, so that $p^{\mathbb{Q}} = q^{\mathbb{Q}} = 1/2$. Consider a European call with expiration time 2 and payoff function $Y = (S_2 - 5)^+$. The possible stock prices in this model are shown in Figure 8.

[Figure 8 shows the two-period lattice: $S_0 = 4$; $S_1(H) = 8$, $S_1(T) = 2$; $S_2(HH) = 16$, $S_2(HT) = S_2(TH) = 4$, $S_2(TT) = 1$.]

Figure 8. Two-period binomial lattice.

There are four elements $\omega \in \Omega = \{HH, HT, TH, TT\}$, so in principle there are four possible final stock prices. But in fact, two of the outcomes lead to the same stock price. We say that the stock price is path-independent since it only depends on the number of H and T in the sequence $\omega = (\omega_1, \omega_2)$ (where $\omega_t$, $t = 1, 2$, is either H or T), and does not depend on the order in which the H and T occur. Thus $S_2(HT) = S_2(TH) = 4$, for example. The terminal option payoffs for each $\omega$ are
$$Y(HH) = 11, \qquad Y(HT) = Y(TH) = Y(TT) = 0,$$
and these are of course the option values at time 2:
$$V_2(HH) = 11, \qquad V_2(HT) = V_2(TH) = V_2(TT) = 0.$$
Then, using the binomial algorithm in Theorem 5.14, we work backwards in time using the fact that the discounted value process is a $\mathbb{Q}$-martingale, to obtain
$$V_1(H) = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_2(HH) + q^{\mathbb{Q}}V_2(HT)\right) = \frac{4}{5}\left(\frac{1}{2}(11) + \frac{1}{2}(0)\right) = \frac{22}{5},$$
$$V_1(T) = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_2(TH) + q^{\mathbb{Q}}V_2(TT)\right) = \frac{4}{5}\left(\frac{1}{2}(0) + \frac{1}{2}(0)\right) = 0,$$
$$V_0 = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_1(H) + q^{\mathbb{Q}}V_1(T)\right) = \frac{4}{5}\left(\frac{1}{2}\cdot\frac{22}{5} + \frac{1}{2}(0)\right) = \frac{44}{25}.$$
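The backward induction in this example can be checked in a few lines of Python. The sketch below uses exact rational arithmetic and reproduces $V_0 = 44/25$:

```python
from fractions import Fraction as F

# Backward induction of Theorem 5.14 for Example 5.15 (European call, n = 2).
u, d, r, S0 = F(2), F(1, 2), F(1, 4), F(4)
pq = (1 + r - d) / (u - d)  # = 1/2

def S(omega):  # path-independent stock price along the path omega
    return S0 * u**omega.count('H') * d**omega.count('T')

# Terminal values V_2 = (S_2 - 5)^+, then V_t = E_Q[(1+r)^{-1} V_{t+1} | F_t].
V = {w: max(S(w) - 5, F(0)) for w in ('HH', 'HT', 'TH', 'TT')}
for paths in ({'H', 'T'}, {''}):
    V = {w: (pq * V[w + 'H'] + (1 - pq) * V[w + 'T']) / (1 + r) for w in paths}

assert V[''] == F(44, 25)  # matches the value computed in the example
```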
6. American options in the binomial model

We briefly discuss the pricing of American derivative securities in the binomial model. American derivative securities can be exercised at any time prior to maturity.

Definition 6.1. In a discrete-time framework with time set $\mathbb{T} = \{0, 1, \dots, n\}$, an American derivative security with maturity $n$ is a sequence of nonnegative random variables $(Y_t)_{t=0}^{n}$ such that for each $t \in \mathbb{T}$, $Y_t$ is $\mathcal{F}_t$-measurable. The owner of an American derivative security can exercise at any time $t \in \mathbb{T}$, and if he does, he receives the payment $Y_t$.

For example, an American put option of strike $K$ on a stock price $S = (S_t)_{t=0}^{n}$ can be exercised at any time $t \in \mathbb{T}$ to give the owner a payment $Y_t := (K - S_t)^+$, which is called the intrinsic value of the option at time $t$.

Recall the pricing of European securities. Consider a binomial model with $n$ periods, so the time set is $\mathbb{T} = \{0, 1, \dots, n\}$. Suppose $Y_n$ is the payoff of a European derivative. For $t \in \mathbb{T}$, we define by backward recursion

$$(6.1) \qquad V_n := Y_n, \quad V_t := \frac{1}{1+r}\left[p^{\mathbb{Q}}V_{t+1}(H) + q^{\mathbb{Q}}V_{t+1}(T)\right], \quad t = 0, \dots, n-1,$$

where, as before, the second equation is a shorthand for
$$V_t(\omega_1 \dots \omega_t) = E^{\mathbb{Q}}[(1+r)^{-1}V_{t+1} \,|\, \mathcal{F}_t](\omega_1 \dots \omega_t) = \frac{1}{1+r}\left(p^{\mathbb{Q}}V_{t+1}(\omega_1 \dots \omega_t H) + q^{\mathbb{Q}}V_{t+1}(\omega_1 \dots \omega_t T)\right).$$
Then $V_t$ is the value of the option at time $t \in \mathbb{T}$, and the hedging portfolio over $[t-1, t)$ is $\phi_t$, given by
$$\phi_t = \frac{V_t(H) - V_t(T)}{S_t(H) - S_t(T)} = \frac{V_t(H) - V_t(T)}{S_{t-1}(u-d)}, \quad t = 1, \dots, n,$$
which is shorthand for
$$\phi_t(\omega_1 \dots \omega_{t-1}) = \frac{V_t(\omega_1 \dots \omega_{t-1}H) - V_t(\omega_1 \dots \omega_{t-1}T)}{S_t(\omega_1 \dots \omega_{t-1}H) - S_t(\omega_1 \dots \omega_{t-1}T)}, \quad t = 1, \dots, n.$$
Now suppose the option is American, with payoff process $Y = (Y_t)_{t=0}^{n}$. At any time $t \in \mathbb{T}$, the holder of the American derivative can exercise the option and receive the payment $Y_t$. Hence, the hedging portfolio should create a wealth process $X$ which satisfies
$$X_t \ge Y_t, \quad t \in \mathbb{T}, \quad \text{almost surely}.$$
This is because the value of the American option at time $t$ is at least as much as the so-called intrinsic value $Y_t$, and the value of the hedging portfolio at that time must equal the value of the option.

This suggests that, to price an American derivative, we should replace the European algorithm (6.1) by the following American algorithm:

$$(6.2) \qquad V_n = Y_n, \quad V_t = \max\left(Y_t, \frac{1}{1+r}\left[p^{\mathbb{Q}}V_{t+1}(H) + q^{\mathbb{Q}}V_{t+1}(T)\right]\right), \quad t = 0, \dots, n-1,$$


which checks whether the intrinsic value is greater than the value of the discounted risk-neutral expectation, which would signify that the option should be exercised in that state. Then $V_t$ would be the value of the American derivative at time $t \in \mathbb{T}$.

Remark 6.2 (Supermartingale property of American option price). In valuing European options we found that the discounted option value is a $\mathbb{Q}$-martingale (recall, for example, (5.19)). From (6.2), we see that the value of the American option can be greater than that given by a discounted risk-neutral expectation, because of the possibility of early exercise. In other words, we might have
$$V_t > E^{\mathbb{Q}}\left[\frac{1}{1+r}V_{t+1}\,\Big|\,\mathcal{F}_t\right],$$
or, equivalently,
$$E^{\mathbb{Q}}[(1+r)^{-(t+1)}V_{t+1} \,|\, \mathcal{F}_t] < (1+r)^{-t}V_t,$$
so that the discounted option value is a $\mathbb{Q}$-supermartingale. (It turns out that the value process of an American option is the smallest supermartingale that dominates the payoff, though we do not prove this here.)
Example 6.3 (American put in a 2-period model). Consider an American put option in a 2-period binomial model with $u = 2$, $d = 1/u$, $r = 1/4$, $S_0 = 4$, so that $p^{\mathbb{Q}} = q^{\mathbb{Q}} = 1/2$. Let the option have payoff function $Y_t = (5 - S_t)^+$. The possible stock prices in this model are shown in Figure 9. The terminal values of the option are given by $V_2 = Y_2 = (5 - S_2)^+$ and these are also shown in the figure.

[Figure 9 shows the lattice: $S_0 = 4$; $S_1(H) = 8$, $S_1(T) = 2$; $S_2(HH) = 16$, $V_2(HH) = 0$; $S_2(HT) = S_2(TH) = 4$, $V_2(HT) = V_2(TH) = 1$; $S_2(TT) = 1$, $V_2(TT) = 4$.]

Figure 9. Stock price and terminal value of American put.


Then the values of the option at time 1 are:
$$V_1(H) = \max\left((5-8)^+,\ \frac{4}{5}\left(\frac{1}{2}(0) + \frac{1}{2}(1)\right)\right) = \max\left[0, \frac{2}{5}\right] = \frac{2}{5},$$
$$V_1(T) = \max\left((5-2)^+,\ \frac{4}{5}\left(\frac{1}{2}(1) + \frac{1}{2}(4)\right)\right) = \max[3, 2] = 3.$$
In particular, we notice that at time 1, and for $\omega_1 = T$, the option should be exercised, as the intrinsic value is greater than the discounted risk-neutral expectation of later values.

The option value at time zero is
$$V_0 = \max\left((5-4)^+,\ \frac{4}{5}\left(\frac{1}{2}\cdot\frac{2}{5} + \frac{1}{2}(3)\right)\right) = \max\left[1, \frac{34}{25}\right] = \frac{34}{25} = 1.36.$$
Now let us attempt to construct the hedging portfolio for this option. We begin with initial wealth $X_0 = 34/25$, and we compute $\phi_1$ via the replication condition for $\omega_1 = H$:
$$X_1(H) = (1+r)X_0 + \phi_1(S_1(H) - (1+r)S_0) = V_1(H) = \frac{2}{5},$$
which yields $\phi_1 = -13/30$. We could just as well calculate $\phi_1$ by looking at the wealth $X_1(T)$, as follows:
$$X_1(T) = (1+r)X_0 + \phi_1(S_1(T) - (1+r)S_0) = V_1(T) = 3,$$
which also yields $\phi_1 = -13/30$. Now let us try to compute $\phi_2$ in a similar manner:
$$X_2(HH) = (1+r)X_1(H) + \phi_2(H)(S_2(HH) - (1+r)S_1(H)) = V_2(HH) = 0,$$
which yields $\phi_2(H) = -1/12$. The same result is obtained if one considers the wealth $X_2(HT)$. Now let us try to compute $\phi_2(T)$ as follows:
$$X_2(TH) = (1+r)X_1(T) + \phi_2(T)(S_2(TH) - (1+r)S_1(T)) = V_2(TH) = 1,$$
which yields $\phi_2(T) = -11/6$. However, if we try to compute $\phi_2(T)$ using $X_2(TT)$, we get
$$X_2(TT) = (1+r)X_1(T) + \phi_2(T)(S_2(TT) - (1+r)S_1(T)) = V_2(TT) = 4,$$
which yields $\phi_2(T) = -1/6$. In other words, we get different answers for $\phi_2(T)$, the position in stock that should be chosen at the start of the interval $[1, 2)$ when $\omega_1 = T$! This apparent anomaly has arisen because $X_1(T) = 3$ (since the American put is exercised when $\omega_1 = T$) rather than 2, which would be the case if the option were European (and you can check that in this case the above calculations would both have yielded $\phi_2(T) = -1$).
This example shows that we need to analyse the hedging portfolio for an American option more
closely.
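The American algorithm (6.2) applied to this example can be sketched in Python as follows (exact rational arithmetic; the code reproduces the value $V_0 = 34/25$ found above):

```python
from fractions import Fraction as F

# American algorithm (6.2) applied to the put of Example 6.3 (n = 2).
u, d, r, S0 = F(2), F(1, 2), F(1, 4), F(4)
pq = F(1, 2)

def S(w):  # stock price along the path w
    return S0 * u**w.count('H') * d**w.count('T')

def intrinsic(w):  # Y_t = (5 - S_t)^+
    return max(5 - S(w), F(0))

# Backward induction: V_t = max(Y_t, discounted risk-neutral expectation).
V = {w: intrinsic(w) for w in ('HH', 'HT', 'TH', 'TT')}
for paths in ({'H', 'T'}, {''}):
    cont = {w: (pq * V[w + 'H'] + (1 - pq) * V[w + 'T']) / (1 + r) for w in paths}
    V = {w: max(intrinsic(w), cont[w]) for w in paths}

assert V[''] == F(34, 25)  # V_0 = 1.36, as in the example
```

After the first backward step one can also observe $V_1(T) = 3 > 2$, confirming early exercise in the state $\omega_1 = T$.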
6.1. Value of hedging portfolio for an American option*. Consider the following generalisation of the evolution of the wealth of a self-financing portfolio, equation (5.6):

$$(6.3) \qquad X_t = (1+r)(X_{t-1} - C_{t-1}) + \phi_t(S_t - (1+r)S_{t-1}), \quad t = 1, \dots, n,$$

where, for $t \in \{0, 1, \dots, n-1\}$, $C_t$ represents the amount of wealth consumed at time $t$. In other words, we are allowing for some funds to be withdrawn from the self-financing portfolio. We found earlier that, for a self-financing portfolio, the discounted wealth process $((1+r)^{-t}X_t)_{t=0}^{n}$ is a martingale. The consequence of allowing consumption from the portfolio is that the discounted wealth process will be a supermartingale (i.e. it will tend to go down).
To appreciate why this adjustment might be needed, consider the American algorithm in (6.2). We see that the value of the option can be greater than that given by a discounted risk-neutral expectation, because of the possibility of early exercise. In other words, we might have
$$V_t > E^{\mathbb{Q}}\left[\frac{1}{1+r}V_{t+1}\,\Big|\,\mathcal{F}_t\right],$$
or, equivalently,
$$E^{\mathbb{Q}}[(1+r)^{-(t+1)}V_{t+1} \,|\, \mathcal{F}_t] < (1+r)^{-t}V_t,$$
so that the discounted option value is a supermartingale. (It turns out that the value process of an American option is the smallest supermartingale that dominates the payoff, though we do not prove this here.)
To see how consumption enters the hedging portfolio, consider the situation in which

$$(6.4) \qquad V_t > E^{\mathbb{Q}}\left[\frac{1}{1+r}V_{t+1}\,\Big|\,\mathcal{F}_t\right].$$

Then the holder of the American option should exercise (this is the case in the state $\omega_1 = T$ in Example 6.3), so that hedging should stop at this point (which is why we had difficulty isolating what the hedging portfolio should be in the example). If the holder of the option does not exercise, then the seller of the option may consume to close the gap between the left and right hand sides of (6.4). By doing this, he can ensure that $X_t = V_t$ for all $t \in \mathbb{T}$, where $V_t$ is the value defined by the American algorithm.

In Example 6.3, we had $V_1(T) = 3$, $V_2(TH) = 1$, $V_2(TT) = 4$, so that
$$E^{\mathbb{Q}}\left[\frac{1}{1+r}V_2\,\Big|\,\mathcal{F}_1\right](T) = \frac{4}{5}\left(\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 4\right) = 2,$$
and there is a gap of size 1 in (6.4). If the owner of the option does not exercise it at time 1 in the state $\omega_1 = T$, then the seller can consume an amount 1 at time 1. Thereafter he uses the usual hedging portfolio
$$\phi_t = \frac{V_t(H) - V_t(T)}{(u-d)S_{t-1}}.$$
In the example, we had $V_1(T) = Y_1(T)$, which means that, acting optimally, the holder of the option should exercise. It turns out that it is optimal for the owner of the American option to exercise whenever its value $V_t$ agrees with its intrinsic value $Y_t$.
7. Brownian motion

We now move on to the continuous-time modelling of financial assets, based on a continuous stochastic process called Brownian motion (BM). This has many remarkable properties, as we shall see, not least of which is that it is simultaneously Gaussian, Markovian, and a martingale. Our first foray into this subject is to think of BM as a suitably scaled and speeded up random walk.

We will use the notation $X \sim N(\mu, \sigma^2)$ to denote that a random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$.

7.1. BM as scaled limit of symmetric random walk. Toss a coin infinitely many times, so that the sample space $\Omega$ is the set of all infinite sequences $\omega = (\omega_1\omega_2\dots)$ of H and T. One can construct a well-defined probability space $(\Omega, \mathcal{F}, \mathbb{P})$ called the space of infinite coin tosses (though this is not completely trivial, as $\Omega$ is an uncountably infinite space), as well as a filtration $(\mathcal{F}_j)_{j=0}^{\infty}$ on this space, where $\mathcal{F}_j$ denotes the $\sigma$-algebra generated by the first $j$ coin tosses. We shall not delve into the construction of infinite coin toss space here, but we take it as given that it is well-defined. Chapters 1 and 2 of Shreve [17] have a detailed account.
Assume that each toss is independent. Take $\mathbb{P}\{\omega_j = H\} = \mathbb{P}\{\omega_j = T\} = \frac{1}{2}$, for $j = 1, 2, \dots$. On the infinite coin toss space $(\Omega, \mathcal{F}, \mathbb{P})$ define the random variables
$$X_j(\omega) := \begin{cases} 1 & \text{if } \omega_j = H, \\ -1 & \text{if } \omega_j = T, \end{cases} \qquad j = 1, 2, \dots$$
So each $X_j$ has mean zero and variance 1 (that is, $E[X_j \,|\, \mathcal{F}_{j-1}] = E[X_j] = 0$ and $E[X_j^2 \,|\, \mathcal{F}_{j-1}] = E[X_j^2] = 1$). Thus $X_1, X_2, \dots$ is a sequence of independent, identically distributed (i.i.d.) random variables. Then define the symmetric random walk $M = (M_k)_{k=0}^{\infty}$ via
$$M_0 := 0, \qquad M_k := \sum_{j=1}^{k}X_j, \quad k = 1, 2, \dots$$
By the Law of Large Numbers we know that
$$\frac{1}{k}M_k \to 0, \quad \text{almost surely, as } k \to \infty.$$
By the Central Limit Theorem we know that for large $k$, $M_k/\sqrt{k}$ is approximately standard normal:
$$\frac{1}{\sqrt{k}}M_k \to Z \sim N(0, 1), \quad \text{in distribution, as } k \to \infty.$$
Brownian motion arises if we suitably speed up the tossing of the coins and scale the size of each random walk increment. To this end, first fix some time $t \ge 0$ in which $k$ coin tosses take place, with some time interval $\delta t$ between each toss, so that
$$t = k\,\delta t = \frac{k}{n},$$
where $n := 1/\delta t$, for positive integers $k, n$. Then define a continuous-time process $W^{(n)}$ via
$$W^{(n)}_t := \frac{1}{\sqrt{n}}M_{nt} = \sqrt{\frac{t}{k}}\,M_k = \sqrt{\delta t}\,M_{t/\delta t}, \quad t \ge 0,$$
with linear interpolation used to define $W^{(n)}_t$ for any times $t \ge 0$ not of the form $k/n$. Take the limit $k \to \infty$, with $t$ fixed. Equivalently, $n \to \infty$, or equivalently, $\delta t \to 0$, so we are speeding up the coin tossing, and since $W^{(n)}_t = \sqrt{\delta t}\,M_{t/\delta t} = \sqrt{\delta t}\,M_k$, we are scaling each increment of the random walk by $\sqrt{\delta t}$. Then, since $\frac{1}{\sqrt{k}}M_k \to Z \sim N(0, 1)$ as $k \to \infty$, we have that
$$W^{(n)}_t \to W_t \sim N(0, t), \quad \text{as } n \to \infty,$$
and we call the process $(W_t)_{t\ge0}$ a standard Brownian motion.
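This scaling can be illustrated by simulation. The following Monte Carlo sketch (the values of $n$ and the sample size are illustrative, and the tolerances deliberately loose) checks that $W^{(n)}_t = \sqrt{\delta t}\,M_k$ has mean approximately 0 and variance approximately $t$:

```python
import random

# Scaled random walk W^(n)_t = sqrt(dt) * M_{t/dt}; for large n it is
# approximately N(0, t).  Monte Carlo sketch with a fixed seed.
random.seed(0)
t, n, reps = 1.0, 400, 2000
dt = 1.0 / n
k = round(t * n)          # number of coin tosses up to time t

samples = []
for _ in range(reps):
    Mk = sum(random.choice((1, -1)) for _ in range(k))  # symmetric walk M_k
    samples.append(dt ** 0.5 * Mk)                      # W^(n)_t = sqrt(dt) M_k

mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
assert abs(mean) < 0.1 and abs(var - t) < 0.15  # mean ~ 0, variance ~ t
```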


Notice that, with $t = k\,\delta t$, we have (though this is purely formal)
$$\frac{dW^{(n)}_t}{dt} = \lim_{\delta t \to 0}\frac{W^{(n)}_{t+\delta t} - W^{(n)}_t}{\delta t} = \lim_{\delta t \to 0}\frac{1}{\sqrt{\delta t}}X_{k+1} = \pm\infty.$$
If, instead of $W^{(n)}_t$, we were to define
$$V^{(n)}_t := \frac{1}{n}M_{nt} = \frac{t}{k}M_k = \delta t\,M_{t/\delta t},$$
then since $M_k/k \to 0$ as $k \to \infty$ (by the Law of Large Numbers), we have that
$$V^{(n)}_t \to 0, \quad \text{as } n \to \infty,$$
and
$$\frac{dV^{(n)}_t}{dt} = \lim_{\delta t \to 0}\frac{V^{(n)}_{t+\delta t} - V^{(n)}_t}{\delta t} = \lim_{\delta t \to 0}X_{k+1} = \pm 1,$$
so while the derivative of $V^{(n)}$ is defined (unlike that of $W^{(n)}$), the process $V^{(n)}$ is trivially zero in the limit.

In other words, the Brownian particle can only have motion if it has infinite velocity. This is a manifestation of the fact that the paths of $W$ are almost surely continuous but not differentiable, and of infinite length, as we shall see.

7.2. Brownian motion definition. We now give a standard definition of BM, and assume without further mention that it exists, though the existence of such a process is not trivial, and there are many advanced texts devoted to various constructions of it (for instance, Karatzas and Shreve [12]).

Definition 7.1 (Brownian motion). A standard 1-dimensional Brownian motion (BM) is a continuous adapted process $W := (W_t, \mathcal{F}_t)_{0 \le t < \infty}$ on some filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$ with the properties that $W_0 = 0$ a.s. and, for $0 \le s < t$, $W_t - W_s$ is independent of $\mathcal{F}_s$ and normally distributed as $W_t - W_s \sim N(0, t-s)$.

The above definition is often summarised by saying that BM is a continuous process with stationary and independent Gaussian increments (the stationarity refers to the fact that the distribution of $W_t - W_s$ depends only on the elapsed time $t - s$, and not individually on $t$, $s$).
7.2.1. Other definitions*. Here are some (non-examinable) remarks and pointers to more advanced topics.

Here is another definition of BM, based on a process called quadratic variation (one definition of which is given below). Let $\mathcal{M}_2$ denote the space of right-continuous square-integrable martingales on a complete filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$: that is, for $M := (M_t)_{t\ge0} \in \mathcal{M}_2$ we have $M_0 = 0$ a.s., and $E[M_t^2] < \infty$, for all $t \ge 0$.

Definition 7.2 (Quadratic variation). For $M \in \mathcal{M}_2$, the quadratic variation (QV) of $M$ is the unique, increasing adapted process $[M]$ such that $[M]_0 = 0$ a.s. and such that $(M_t^2 - [M]_t)_{t\ge0}$ is a martingale.

Definition 7.3 (Cross-variation). For $X, Y \in \mathcal{M}_2$, define their cross-variation process $([X, Y]_t)_{t\ge0}$ by
$$[X, Y]_t := \frac{1}{4}\left([X+Y]_t - [X-Y]_t\right), \quad t \ge 0.$$
For $X, Y \in \mathcal{M}_2^c$ (i.e. continuous), this is the unique adapted finite-variation process $[X, Y]$ such that $[X, Y]_0 = 0$ a.s. and such that $((XY - [X, Y])_t)_{t\ge0}$ is a martingale.

Remark 7.4. For Brownian motion $W := (W_t)_{t\ge0}$, we have $[W]_t = t$, since $W_t^2 - t$ is a martingale (see Problem Sheet 2). Indeed, Brownian motion may be defined as the unique continuous process that satisfies this property. This is related to the Lévy criterion which we shall meet in Section 7.8.
7.2.2. Filtration generated by BM*. We denote by $(\mathcal{F}_t)_{t\ge0}$ the filtration generated by Brownian motion. Its required properties are:

- For each $t$, $W_t$ is $\mathcal{F}_t$-measurable;
- for each $t$ and for $t < t_1 < t_2 < \dots < t_n$, the Brownian motion increments $W_{t_1} - W_t, W_{t_2} - W_{t_1}, \dots, W_{t_n} - W_{t_{n-1}}$ are independent of $\mathcal{F}_t$.

Here is one way to construct $\mathcal{F}_t$. First fix $t$. Let $s \in [0, t]$ and $A \in \mathcal{B}(\mathbb{R})$ be given. Put the set
$$\{W_s \in A\} = \{\omega \in \Omega : W_s(\omega) \in A\}$$
in $\mathcal{F}_t$. Do this for all possible numbers $s \in [0, t]$ and all Borel sets $A \in \mathcal{B}(\mathbb{R})$. Then put in every other set required by the $\sigma$-algebra properties. This $\sigma$-algebra $\mathcal{F}_t$ contains exactly the information learnt by observing the Brownian motion up to time $t$, and $(\mathcal{F}_t)_{t\ge0}$ is called the filtration generated by the Brownian motion.
7.3. Properties of BM. We discuss some properties of BM, with starred subsections being (as usual) non-examinable background.

7.3.1. Stationarity*. We say a stochastic process $X = (X_t)_{t\ge0}$ is stationary if $X_t$ has the same distribution as $X_{t+h}$ for any $h > 0$. Brownian motion has stationary increments. To see this, define the increment process $I = (I_t)_{t\ge0}$ by $I_t := W_{t+h} - W_t$. Then $I_t \sim N(0, h)$ and $I_{t+h} = W_{t+2h} - W_{t+h} \sim N(0, h)$ have the same distribution. This is equivalent to saying that the process $(W_{t+h} - W_t)_{h\ge0}$ has the same distribution for all $t$.
7.3.2. Martingale property. The independent increments property allows us to show that BM is a martingale. For $0 \le s \le t$ we have
$$E[W_t \,|\, \mathcal{F}_s] = E[W_t - W_s + W_s \,|\, \mathcal{F}_s] = E[W_t - W_s \,|\, \mathcal{F}_s] + W_s = E[W_t - W_s] + W_s = W_s.$$
7.3.3. Covariance of BM at different times*. Let $0 \le s \le t$ be given. Then $W_s$ and $W_t - W_s$ are independent, and $(W_s, W_t)$ are jointly normal with $E[W_s] = E[W_t] = E[W_t - W_s] = 0$, $\mathrm{var}(W_s) = s$, $\mathrm{var}(W_t) = t$, $\mathrm{var}(W_t - W_s) = t - s$, so that the covariance of $W_s$ and $W_t$ is
$$\mathrm{cov}(W_s, W_t) := E[(W_s - E[W_s])(W_t - E[W_t])] = E[W_s W_t]$$
$$= E[W_s(W_t - W_s + W_s)] = E[W_s(W_t - W_s)] + E[W_s^2]$$
$$= E[W_s]E[W_t - W_s] + s \quad \text{(by independence)}$$
$$= s.$$

Thus, for any $s \ge 0$, $t \ge 0$ (not necessarily $s \le t$), we have
$$\mathrm{cov}(W_s, W_t) = E[W_s W_t] = s \wedge t = \min(s, t),$$
or, equivalently, the covariance matrix of the vector $W_{s,t} = (W_s, W_t)$ is $C \equiv A^{-1}$, given by
$$C = A^{-1} = \begin{pmatrix} s & s \wedge t \\ s \wedge t & t \end{pmatrix}, \quad \text{(positive definite, symmetric)}.$$
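The covariance formula $\mathrm{cov}(W_s, W_t) = \min(s, t)$ is easy to check by Monte Carlo, sampling $W_t = W_s + (W_t - W_s)$ with independent Gaussian increments (a sketch with illustrative $s$, $t$ and a loose tolerance):

```python
import random

# Monte Carlo check of cov(W_s, W_t) = min(s, t), using the decomposition
# W_t = W_s + (W_t - W_s) with independent increments.  Fixed seed.
random.seed(1)
s, t, reps = 0.5, 2.0, 20000

acc = 0.0
for _ in range(reps):
    Ws = random.gauss(0.0, s ** 0.5)            # W_s ~ N(0, s)
    Wt = Ws + random.gauss(0.0, (t - s) ** 0.5) # increment ~ N(0, t - s)
    acc += Ws * Wt
cov = acc / reps  # E[W_s] = E[W_t] = 0, so this estimates the covariance

assert abs(cov - min(s, t)) < 0.05
```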
7.3.4. Transition density*.

Definition 7.5 (Transition density). Fix $x \in \mathbb{R}$, $t_0 \in \mathbb{R}_+$. Then
$$\mathbb{P}(W_{t_0+t} \in [y, y+dy] \,|\, W_{t_0} = x) = p(t, x, y)\,dy,$$
where the transition density of Brownian motion is the function
$$p(t, x, y) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{(x-y)^2}{2t}\right), \quad y \in \mathbb{R},\ t > 0.$$
This is the probability density that the BM moves from $x$ to $y \in \mathbb{R}$ in a time period $t$.
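As a numerical sanity check (a midpoint-rule sketch with illustrative parameter values), the density should integrate to 1 over $y$, with mean $x$ and variance $t$:

```python
import math

# Numerical check of the transition density p(t, x, y): total mass 1,
# mean x, variance t.  Midpoint rule on [x - 8, x + 8] (sketch).
def p(t, x, y):
    return math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

t, x = 0.7, 1.3
m, h = 16000, 16.0 / 16000  # m midpoints of width h covering [x - 8, x + 8]
ys = [x - 8 + h * (i + 0.5) for i in range(m)]
mass = sum(p(t, x, y) * h for y in ys)
mean = sum(y * p(t, x, y) * h for y in ys)
var = sum((y - x) ** 2 * p(t, x, y) * h for y in ys)

assert abs(mass - 1) < 1e-6 and abs(mean - x) < 1e-6 and abs(var - t) < 1e-6
```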
7.3.5. Starting points other than zero*. For a standard Brownian motion $W$ that starts at zero we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ that satisfies $\mathbb{P}\{W_0 = 0\} = 1$. Then for $t \ge 0$, $W_t \sim N(0, t)$. For $x \in \mathbb{R}$, we can define a process $W^x_t := x + W_t$ which will satisfy $\mathbb{P}\{W^x_0 = x\} = 1$ and, for $t \ge 0$, $W^x_t \sim N(x, t)$.

Equivalently, we can define another probability measure $\mathbb{P}^x$ (or, more precisely, a probability space $(\Omega, \mathcal{F}, \mathbb{P}^x)$) under which $\mathbb{P}^x\{W_0 = x\} = 1$, and with $W$ having stationary independent increments under $\mathbb{P}^x$: for $s \le t$, $W_t - W_s \sim N(0, t-s)$ and independent of $\mathcal{F}_s$. Then, under $\mathbb{P}^x$, $W_t \sim N(x, t)$. In this case, we say that $W$ is a Brownian motion starting at $x$. We see that such a Brownian motion is equivalent to $x + W$, where $W$ is a standard Brownian motion starting at zero.

Note that:

- If $x \ne 0$, then $\mathbb{P}^x$ puts all its probability on a completely different set from $\mathbb{P}$.
- The distribution of $W_t$ under $\mathbb{P}^x$ is the same as the distribution of $W^x_t = x + W_t$ under $\mathbb{P}$, that is
$$\mathrm{Law}(W^x, \mathbb{P}) = \mathrm{Law}(W, \mathbb{P}^x).$$
7.3.6. Markov property. We will need to be aware that BM is a Markov process (but you are not
expected to know the proof below for examination purposes).
We can show that W is a Markov process as follows. Recall that the Markov property is
equivalent to stating that for s ≥ 0, t ≥ 0, we have E[h(W_{s+t}) | F_s] = g(W_s), where h and g are
functions. Consider

    E[h(W_{s+t}) | F_s] = E[h(W_{s+t} − W_s + W_s) | F_s].

Use the properties that W_{s+t} − W_s is independent of F_s, and that W_s is F_s-measurable, along with
the following independence lemma: if X, Y are random variables on a probability space (Ω, F, P),
and if G is a sub-σ-algebra of F, with
• X G-measurable,
• Y independent of G,
then if f(x, y) is a function of two variables, and if we define

    g(x) := E[f(x, Y)],

then we have

    E[f(X, Y) | G] = g(X).

In this lemma, take G = F_s, X = W_s, Y = W_{s+t} − W_s, and f(x, y) = h(x + y). Then define

    g(x) := E[h(W_{s+t} − W_s + x)]
          = E[h(x + W_t)]   (since W_t ∼ N(0, t) has the same distribution as W_{s+t} − W_s)
          = E^x[h(W_t)].

Then

    E[h(W_{s+t}) | F_s] = g(W_s) = E^{W_s}[h(W_t)],

which is the Markov property.
Remark 7.6 (Strong Markov property*). In fact, Brownian motion has the strong Markov property
given below (though we do not prove this).
Fix x ∈ ℝ and define the stopping time

    τ := min{t ≥ 0 | W_t = x}.

Then we have

    E[h(W_{τ+t}) | F_τ] = g(x) = E^x[h(W_t)].
7.4. Quadratic variation of BM. This section is the most important one on BM, giving the
crucial property that we shall use repeatedly, and which will form the bedrock of the Itô stochastic
calculus to come.

Definition 7.7 (pth variation). Let P = {t₀, t₁, . . . , t_n} be a partition of [0, t], i.e.

    0 = t₀ ≤ t₁ ≤ . . . ≤ t_n = t.

The mesh of the partition is defined to be

    ‖P‖ = max_{k=0,...,n−1} |t_{k+1} − t_k|.

The pth variation of a function f : ℝ₊ → ℝ on an interval [0, t], [f, f]^{(p)}_t ≡ [f]^{(p)}_t, is defined by

(7.1)   [f]^{(p)}_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|^p.

In particular, if p = 1 this is called the total variation (or the first variation) and if p = 2 this is
called the quadratic variation.
7.4.1. First variation. Consider the first variation (or total variation), [f]^{(1)}_t, of a function f.
Suppose f is differentiable. Then the Mean Value Theorem¹ implies that in each subinterval
[t_k, t_{k+1}], there is a point t*_k such that

    f(t_{k+1}) − f(t_k) = (t_{k+1} − t_k) f′(t*_k).

Then

    Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)| = Σ_{k=0}^{n−1} |f′(t*_k)| (t_{k+1} − t_k),

and so

    [f]^{(1)}_t = lim_{‖P‖→0} Σ_{k=0}^{n−1} |f′(t*_k)| (t_{k+1} − t_k) = ∫₀ᵗ |f′(s)| ds.

Thus, first variation measures the total amount of up and down motion of the path of f over the
interval [0, t].

¹The Mean Value Theorem states that if f is differentiable in (a, b), then there is a point x ∈ (a, b) at which
f(b) − f(a) = (b − a) f′(x).

7.4.2. Quadratic variation of Brownian motion. To simplify notation, we write [f, f]_t ≡ [f]_t for
the quadratic variation [f]^{(2)}_t of a function f over the interval [0, t].

Lemma 7.8. If f is differentiable, then [f]_t = 0.

Proof. By the Mean Value Theorem,

    Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|² = Σ_{k=0}^{n−1} |f′(t*_k)|² (t_{k+1} − t_k)²
                                         ≤ ‖P‖ Σ_{k=0}^{n−1} |f′(t*_k)|² (t_{k+1} − t_k),

and so

    [f]_t ≤ lim_{‖P‖→0} ‖P‖ · lim_{‖P‖→0} Σ_{k=0}^{n−1} |f′(t*_k)|² (t_{k+1} − t_k)
          = lim_{‖P‖→0} ‖P‖ ∫₀ᵗ |f′(s)|² ds
          = 0.  □
Theorem 7.9. For Brownian motion W = (W_t)_{t≥0} we have

    [W]_t = t,   t ≥ 0,

or more precisely

    P{ω ∈ Ω : [W]_t(ω) = t} = 1.

In particular, the paths of Brownian motion are not differentiable.
We shall first prove that the quadratic variation of Brownian motion over [0, t] is equal to t
in mean square. (We shall then prove that the result holds almost surely, but the almost sure
convergence proof is not examinable.)
Recall that a sequence (X_n)_{n∈ℕ} of random variables converges in mean square (or in L²(Ω, F, P))
to a random variable X if E[|X_n − X|²] → 0 as n → ∞, and converges to X almost surely if
P{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)} = 1.
Proof of Theorem 7.9 I: convergence in L². Let P = {t₀, t₁, . . . , t_n} be a partition of [0, t]. Set
D_k := W_{t_{k+1}} − W_{t_k} and define the sample quadratic variation

    Q_P := Σ_{k=0}^{n−1} D_k².

Then

    Q_P − t = Σ_{k=0}^{n−1} [D_k² − (t_{k+1} − t_k)].

We want to show that lim_{‖P‖→0} (Q_P − t) = 0 in mean square. Consider an individual summand
D_k² − (t_{k+1} − t_k). This has expectation zero, so

    E[Q_P − t] = E[ Σ_{k=0}^{n−1} [D_k² − (t_{k+1} − t_k)] ] = 0.

Therefore, if we compute E[(Q_P − t)²] = var(Q_P − t) and find it to approach zero as ‖P‖ → 0,
then we have shown that the quadratic variation of Brownian motion is equal to t in mean square
or, equivalently, that var(Q_P) → 0 as ‖P‖ → 0, so that Q_P essentially becomes non-stochastic as
‖P‖ → 0.
For j ≠ k, the terms D_j² − (t_{j+1} − t_j) and D_k² − (t_{k+1} − t_k) are independent (due to the
independent increments property of BM), so

    var(Q_P − t) = Σ_{k=0}^{n−1} var[D_k² − (t_{k+1} − t_k)]
                 = Σ_{k=0}^{n−1} E[D_k⁴ − 2(t_{k+1} − t_k)D_k² + (t_{k+1} − t_k)²]
                 = Σ_{k=0}^{n−1} [3(t_{k+1} − t_k)² − 2(t_{k+1} − t_k)² + (t_{k+1} − t_k)²]
                 = 2 Σ_{k=0}^{n−1} (t_{k+1} − t_k)²
                 ≤ 2‖P‖ Σ_{k=0}^{n−1} (t_{k+1} − t_k)
                 = 2‖P‖ t,

where we used E[D_k⁴] = 3(t_{k+1} − t_k)², since D_k ∼ N(0, t_{k+1} − t_k). Thus we have

    E[Q_P − t] = 0,   var(Q_P − t) ≤ 2‖P‖ t.

As ‖P‖ → 0, var(Q_P − t) → 0, that is, E[(Q_P − t)²] → 0 as ‖P‖ → 0, so

    Q_P → t,   in L².  □


Proof of Theorem 7.9 II: a.s. convergence*. To show that the convergence is also almost sure,
consider the dyadic partition t_k = kt/2^m, k = 0, 1, . . . , 2^m, i.e. we partition [0, t] into 2^m intervals
of width t/2^m, so that the mesh of the partition approaches zero as m → ∞. Then the sample
quadratic variation over [0, t] may be written as

    Q_m(t) := Σ_{k=0}^{2^m−1} (W_{(k+1)t/2^m} − W_{kt/2^m})² =: Σ_{k=0}^{2^m−1} (ΔW_k)²,

where we have written ΔW_k = W_{(k+1)t/2^m} − W_{kt/2^m}. We have ΔW_k ∼ N(0, t/2^m), ΔW_k, ΔW_j
are independent for k ≠ j, and hence (ΔW_k)², (ΔW_j)² are independent for k ≠ j.
Recall that for X ∼ N(0, v) we have E[X⁴] = 3v², so that

    var[X²] = E[X⁴] − (E[X²])² = 3v² − v² = 2v².

Therefore, from E[(ΔW_k)²] = t/2^m we get

    E[Q_m(t)] = t,

regardless of m. Further, by the independence of the squared increments we have

    E[(Q_m(t) − t)²] = var(Q_m(t))
                     = var( Σ_{k=0}^{2^m−1} (ΔW_k)² )
                     = Σ_{k=0}^{2^m−1} var((ΔW_k)²)
                     = 2^m · 2(t/2^m)²
                     = 2t²/2^m → 0, as m → ∞.

Therefore, since the limit of Q_m(t) as m → ∞ is [W]_t, we have established the mean square
convergence

    [W]_t = lim_{m→∞} Q_m(t) = t,   in L².

Now we show almost sure convergence using the Chebyshev inequality and the Borel–Cantelli
lemmas (see, for instance, Grimmett and Stirzaker [7], Section 7.3).² By Chebyshev's inequality
we have, for a > 0,

    P{|Q_m(t) − t| > a} ≤ (1/a²) E[(Q_m(t) − t)²] = 2t²/(a² 2^m).

So

    P{|Q_m(t) − t| > 1/m} ≤ m² E[(Q_m(t) − t)²] = 2t² m²/2^m.

Write A_m = {|Q_m(t) − t| > 1/m}, and consider the sequence of events (A_m)_{m=1}^∞. Then
Σ_{m=1}^∞ P(A_m) < ∞, so by the Borel–Cantelli lemmas, the event that infinitely many of the A_m
occur has probability given by

    P( lim sup_{m→∞} A_m ) = P( ∩_{m=1}^∞ ∪_{k=m}^∞ A_k ) = 0.

²Chebyshev's inequality follows from the following result, which is Theorem 7.3.1 in [7].

Theorem 7.10. Let h : ℝ → [0, ∞) be a non-negative function. Then

    P(h(X) ≥ a) ≤ E[h(X)]/a,   a > 0.

Proof. Let A := {h(X) ≥ a}. Then h(X) ≥ a·1_A. Taking expectations gives the result.  □

Setting h(x) = |x| gives Markov's inequality. Taking h(x) = x² gives Chebyshev's inequality:
P(|X| ≥ a) ≤ E[X²]/a².
The Borel–Cantelli lemmas (Theorem 7.3.10 in [7]) state:

Theorem 7.11 (Borel–Cantelli lemmas). Let A₁, A₂, . . . be an infinite sequence of events from some probability
space (Ω, F, P). Let A be the event that infinitely many of the A_n occur (or {A_n infinitely often} = {A_n i.o.}), given
by

    A := {A_n i.o.} = lim sup_{n→∞} A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k.

Then:
(1) P(A) = 0 if Σ_{n=1}^∞ P(A_n) < ∞,
(2) P(A) = 1 if Σ_{n=1}^∞ P(A_n) = ∞ and A₁, A₂, . . . are independent events.

In other words, |Q_m(t) − t| ≤ 1/m for large m, almost surely, or

    [W]_t = lim_{m→∞} Q_m(t) = t,

almost surely.  □


7.5. Path length*. Given a continuous function f : [0, t] → ℝ, its total variation over [0, t] is,
over any partition P = {0 = t₀ ≤ t₁ ≤ . . . ≤ t_n = t} of [0, t],

    TV(f)_t ≡ [f]^{(1)}_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|.

This may be infinite, or some finite number, in which case we say that f has bounded variation.
Consider an element of arc length Δs_k along f in the interval [t_k, t_{k+1}]. If this interval is
small, we have (Δs_k)² ≈ (Δt_k)² + (Δf_k)², where we have written Δt_k = t_{k+1} − t_k and Δf_k =
f(t_{k+1}) − f(t_k). By the triangle inequality we have

    |Δf_k| ≤ |Δs_k| ≤ |Δf_k| + |Δt_k|.

Denoting the total arc length (or path length) of f over [0, t] by s(f)_t we therefore have, in the
limit ‖P‖ → 0,

    TV(f)_t ≤ s(f)_t ≤ TV(f)_t + t.

Therefore,

    finite path length ⟺ TV(f)_t < ∞.
In contrast, the quadratic variation of f over [0, t] is

    [f]_t = lim_{‖P‖→0} Σ_{k=0}^{n−1} |Δf_k||Δf_k|
          ≤ lim_{‖P‖→0} ( max_{k=0,...,n−1} |Δf_k| ) lim_{‖P‖→0} Σ_{k=0}^{n−1} |Δf_k|
          = lim_{‖P‖→0} ( max_{k=0,...,n−1} |Δf_k| ) · TV(f)_t.

For any continuous function, lim_{‖P‖→0} ( max_{k=0,...,n−1} |Δf_k| ) = 0³, so we conclude that

    TV(f)_t < ∞ ⟹ [f]_t = 0 for all t ≥ 0.

Since [W]_t = t > 0, the total variation of Brownian motion must be infinite. In other words, paths
of Brownian motion (W_s)_{0≤s≤t} over the interval [0, t] have infinite path length.
Because the total variation of Brownian motion is infinite (i.e. Brownian paths are very long)
one is not readily able to give meaning to integrals with respect to Brownian motion, ∫₀ᵗ b_s dW_s,
via a path-by-path procedure. Thus we are led to a new type of integral, the Itô stochastic integral,
which we shall describe shortly.
Remark 7.12 (Heuristics). If we (formally) write dW_t for the infinitesimal (corresponding to the
infinitesimal time interval dt) increase in W_t, then we have ∫₀ᵗ dW_s dW_s = t, which is often
summarised by the formula

    dW_t dW_t = dt,

or by

    d[W]_t = dt.

Formally, note that if dW_t dW_t = dt, then in some sense |dW_t|/dt ≈ 1/√dt → ∞ as dt → 0. In
other words, Brownian motion is nowhere differentiable, as we saw earlier.

³This is a standard theorem from real analysis, proven from compactness arguments.

For the partition P defined by

    0 = t₀ < t₁ < . . . < t_n = t,

we defined

    D_k := W_{t_{k+1}} − W_{t_k},   Δt_k := t_{k+1} − t_k,   k = 0, 1, . . . , n − 1.

We have that

    E[D_k²] = Δt_k,   var(D_k²) = 2(Δt_k)².

It is tempting to argue that, because the variance of D_k² is much smaller than its mean, then we
have that for small Δt_k, D_k² ≈ Δt_k. But this equation has no content: when Δt_k is small, it
would be true because both sides are near zero. A better way to capture what we think is going
on might be to write

    D_k²/Δt_k ≈ 1.

But this is never true either. The left hand side is the square of the standard normal random
variable

    Y_k := D_k/√Δt_k ∼ N(0, 1),

whose distribution is the same no matter how small we make Δt_k.
To better understand what is going on, for some large positive integer n, define t_k := kt/n, k =
0, 1, . . . , n, so that Δt_k = t/n for all k = 0, 1, . . . , n − 1. Then

    D_k² = t Y_k²/n,   k = 0, 1, . . . , n − 1.

The random variables Y₀, Y₁, . . . , Y_{n−1} are i.i.d., so the Law of Large Numbers implies that
(1/n) Σ_{k=0}^{n−1} Y_k² converges to the common mean E[Y_k²] = 1 as n → ∞, and hence Σ_{k=0}^{n−1} D_k²
converges to t. Each of the terms D_k² in this sum can be quite different from its mean Δt_k = t/n,
but when we sum many terms like this, the differences average out to zero.
The point is that although we write dW_t dW_t = dt frequently, this has no rigorous mathematical
meaning unless we consider the integrated relation [W]_t = t.
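The averaging argument above can be illustrated numerically (a sketch not in the original notes; n, t, and the seed are arbitrary choices):

```python
import random

# With t_k = k t / n, each increment is D_k = sqrt(t/n) Y_k where the Y_k
# are i.i.d. N(0, 1).  The Law of Large Numbers gives
# (1/n) sum Y_k^2 -> E[Y_k^2] = 1, hence sum D_k^2 = t * (1/n) sum Y_k^2 -> t.
rng = random.Random(3)
n, t = 50_000, 1.5
mean_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
print(mean_sq, t * mean_sq)  # ~1 and ~t = 1.5
```

Each individual Y_k² is far from 1, but the sample mean over many increments is not, which is exactly why Σ D_k² stabilises at t.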

7.6. Cross-variation of W and t, and QV of t. For the partition P defined by

    0 = t₀ < t₁ < . . . < t_n = t,

we have computed the quadratic variation

(7.2)   [W]_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} D_k² = t.

In addition to this, we can compute the cross variation of W_t with t and the quadratic variation of
t, given by

(7.3)   lim_{‖P‖→0} Σ_{k=0}^{n−1} D_k Δt_k = 0,   lim_{‖P‖→0} Σ_{k=0}^{n−1} (Δt_k)² = 0.
We know that the second of these limits is zero since t is a differentiable function (see Lemma
7.8, or the argument below). To see that the first limit in (7.3) is zero, observe that

    |D_k Δt_k| ≤ ( max_{0≤j≤n−1} |D_j| ) Δt_k,

and hence

    | Σ_{k=0}^{n−1} D_k Δt_k | ≤ ( max_{0≤j≤n−1} |D_j| ) t,

which converges to zero as ‖P‖ → 0 since W is continuous.
For the second equality in (7.3) we observe that

    Σ_{k=0}^{n−1} (Δt_k)² ≤ ‖P‖ Σ_{k=0}^{n−1} Δt_k = ‖P‖ t,

which clearly converges to zero as ‖P‖ → 0.
Just as we informally write dW_t dW_t = dt for (7.2), we capture (7.3) by writing

    dW_t dt = 0,   dt dt = 0.
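A quick numerical check of the first limit in (7.3) (a sketch not in the original notes; partition sizes and seed are arbitrary):

```python
import math, random

def cross_variation(t, n, rng):
    """Sum of D_k * dt_k over a uniform partition of [0, t] into n
    pieces; by (7.3) this should vanish as the mesh t / n shrinks."""
    dt = t / n
    sd = math.sqrt(dt)
    return sum(rng.gauss(0.0, sd) * dt for _ in range(n))

rng = random.Random(4)
vals = {n: cross_variation(1.0, n, rng) for n in (10, 1_000, 100_000)}
print(vals)  # magnitudes typically shrink towards zero as n grows
```

For a uniform partition the sum collapses to (t/n)·W_t, which makes the convergence to zero plain.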

7.7. Other variations of Brownian motion. Consider the first variation (or total variation)
of BM, denoted by

    TV(W)_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} |D_k|.

Lemma 7.13. The first variation of BM is infinite, TV(W)_t = ∞, for t > 0.

Proof. We have

    t = [W]_t = lim_{‖P‖→0} Σ_{k=0}^{n−1} D_k² ≤ lim_{‖P‖→0} ( max_{0≤j≤n−1} |D_j| ) · TV(W)_t.

By continuity of W, max_{0≤j≤n−1} |D_j| → 0 as ‖P‖ → 0, so TV(W)_t < ∞ would force [W]_t = 0,
which is false, so the result follows.  □
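Lemma 7.13 can be seen numerically (a sketch not in the original notes): the sampled first-variation sum has mean √(2nt/π), which diverges as the partition is refined. The partition sizes and seed below are arbitrary.

```python
import math, random

def total_variation_sum(t, n, rng):
    """Sum of |D_k| over a uniform partition of [0, t] into n pieces;
    its mean is sqrt(2 n t / pi), which diverges as n -> infinity."""
    sd = math.sqrt(t / n)
    return sum(abs(rng.gauss(0.0, sd)) for _ in range(n))

rng = random.Random(5)
tvs = {n: total_variation_sum(1.0, n, rng) for n in (100, 10_000, 1_000_000)}
print(tvs)  # roughly 8, 80, 800: refining the partition blows the sum up
```

Contrast this with the quadratic-variation sums above, which stabilise at t instead of diverging.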



For the third variation, defined by

    [W, W]^{(3)}_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} |D_k|³,

we have:

Lemma 7.14. [W, W]^{(3)}_t = 0, t ≥ 0.

Proof.

    [W, W]^{(3)}_t := lim_{‖P‖→0} Σ_{k=0}^{n−1} |D_k|³ ≤ lim_{‖P‖→0} ( max_{0≤j≤n−1} |D_j| ) Σ_{k=0}^{n−1} D_k²
                    = lim_{‖P‖→0} ( max_{0≤j≤n−1} |D_j| ) · [W]_t = 0.  □

7.8. Lévy's characterisation of Brownian motion. BM W is a martingale with continuous
paths whose quadratic variation is [W]_t = t. In fact, this is a complete characterisation of BM,
given in the following theorem (we give a sketch of the proof in Problem Sheet 2).

Theorem 7.15 (Lévy's theorem, 1-dimensional). Let M be a martingale relative to a filtration,
with M₀ = 0, continuous paths, and [M]_t = t for all t ≥ 0. Then M is a BM.
8. The Itô integral

We consider how to define an integral with respect to Brownian motion. The probability space
(Ω, F, P) (with F = (F_t)_{t≥0} the filtration generated by Brownian motion) is given, and always
lurks in the background, even when not explicitly mentioned.
We want to construct the Itô integral, which we write as

    I_t = ∫₀ᵗ b_s dW_s,   t ≥ 0.

The integrator is Brownian motion, (W_t)_{t≥0}, with associated filtration (F_t)_{t≥0} and the following
properties:
(1) s ≤ t ⟹ every set in F_s is also in F_t;
(2) W_t is F_t-measurable, ∀t ≥ 0;
(3) for t ≤ t₁ ≤ . . . ≤ t_n, the increments W_{t₁} − W_t, W_{t₂} − W_{t₁}, . . . , W_{t_n} − W_{t_{n−1}} are independent
of F_t.
The integrand is a process b = (b_t)_{t≥0}, where
(1) b_t is F_t-measurable ∀t ≥ 0 (i.e. (b_t)_{t≥0} is adapted to the filtration (F_t)_{t≥0});
(2) b is square-integrable:

    E[ ∫₀ᵗ b_s² ds ] < ∞,   t ≥ 0.

Remark 8.1. For a differentiable function f(t), we can define

    ∫₀ᵗ b(s) df(s) = ∫₀ᵗ b(s) f′(s) ds.

This won't work when the integrator is Brownian motion, because the paths of Brownian motion
are not differentiable.
8.1. Itô integral of an elementary integrand. For some fixed T > 0 let P = {t₀, t₁, . . . , t_n}
be a partition of [0, T]:

    0 = t₀ ≤ t₁ ≤ . . . ≤ t_n = T.

Assume that b is constant on each interval [t_k, t_{k+1}), such that for t ∈ [t_k, t_{k+1}), b_t = b_{t_k}. We
call such a process b an elementary process, or a simple process. The Itô integral I_t of such a
process is defined as follows.

Definition 8.2 (Itô integral of elementary process). Suppose t ∈ [0, T] is such that t_k ≤ t < t_{k+1},
for some k ∈ {0, 1, . . . , n − 1}. Then the Itô integral I_t = ∫₀ᵗ b_s dW_s of the elementary process b is
defined by

    I_t := ∫₀ᵗ b_s dW_s := Σ_{j=0}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_k}(W_t − W_{t_k}),   t ≥ 0.

Note that we can let T > 0 in the above definition be arbitrarily large and in this way construct
the integral I_t for any t ≥ 0, yielding the process (I_t)_{t≥0}.
8.1.1. Properties of the Itô integral of an elementary process.

Property 8.3 (Adaptedness). For each t ≥ 0, I_t is F_t-measurable.

Property 8.4 (Linearity). With

    I_t = ∫₀ᵗ b_s dW_s,   J_t = ∫₀ᵗ a_s dW_s,

then for α, β ∈ ℝ,

    αI_t + βJ_t = ∫₀ᵗ (αb_s + βa_s) dW_s.

Property 8.5 (Martingale property). (I_t)_{t≥0} is a martingale.


Theorem 8.6 (Martingale property). The process I = (I_t)_{t≥0} defined by

    I_t := Σ_{j=0}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_k}(W_t − W_{t_k}),

is a (P, F)-martingale.

Proof. Let 0 ≤ s ≤ t be given. We treat the general case that s and t are in different subintervals.
That is, there are partition points t_ℓ and t_k such that s ∈ [t_ℓ, t_{ℓ+1}) and t ∈ [t_k, t_{k+1}).
Write

(8.1)   I_t = Σ_{j=0}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_k}(W_t − W_{t_k})
            = Σ_{j=0}^{ℓ−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_ℓ}(W_{t_{ℓ+1}} − W_{t_ℓ})
              + Σ_{j=ℓ+1}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_k}(W_t − W_{t_k}).

Compute conditional expectations. For 0 ≤ s ≤ t, we have

    E[ Σ_{j=0}^{ℓ−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) | F_s ] = Σ_{j=0}^{ℓ−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}),

    E[ b_{t_ℓ}(W_{t_{ℓ+1}} − W_{t_ℓ}) | F_s ] = b_{t_ℓ}( E[W_{t_{ℓ+1}} | F_s] − W_{t_ℓ} ) = b_{t_ℓ}(W_s − W_{t_ℓ}).

These are the conditional expectations of the first two terms on the RHS of (8.1). They add
up to I_s and so contribute this to E[I_t | F_s]. We show that the third and fourth terms contribute
zero:

    E[ Σ_{j=ℓ+1}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) | F_s ] = Σ_{j=ℓ+1}^{k−1} E[ E[ b_{t_j}(W_{t_{j+1}} − W_{t_j}) | F_{t_j} ] | F_s ]
        = Σ_{j=ℓ+1}^{k−1} E[ b_{t_j}( E[W_{t_{j+1}} | F_{t_j}] − W_{t_j} ) | F_s ] = 0,

and

    E[ b_{t_k}(W_t − W_{t_k}) | F_s ] = E[ b_{t_k}( E[W_t | F_{t_k}] − W_{t_k} ) | F_s ] = 0.  □
Property 8.7 (The Itô isometry). Because (I_t)_{t≥0} is a martingale and I₀ = 0 we have E[I_t] = 0 for
all t ≥ 0. It follows that var(I_t) = E[I_t²], a quantity given by the formula in the next theorem.

Theorem 8.8 (Itô isometry). The Itô integral of the elementary process b, defined by

(8.2)   I_t := Σ_{j=0}^{k−1} b_{t_j}(W_{t_{j+1}} − W_{t_j}) + b_{t_k}(W_t − W_{t_k}),

satisfies

    E[I_t²] = E[ ∫₀ᵗ b_s² ds ],   t ≥ 0.

Proof. To simplify notation, write D_j = W_{t_{j+1}} − W_{t_j}, j = 0, . . . , k − 1 and D_k = W_t − W_{t_k}, so
that (8.2) is written as I_t = Σ_{j=0}^{k} b_{t_j} D_j. Then

    I_t² = Σ_{j=0}^{k} b_{t_j}² D_j² + 2 Σ_{0≤i<j≤k} b_{t_i} b_{t_j} D_i D_j.
We first show that the expected value of the cross terms is zero. For i < j, the random variable
b_{t_i} b_{t_j} D_i is F_{t_j}-measurable, while the Brownian increment D_j is independent of F_{t_j}, so E[D_j | F_{t_j}] =
E[D_j] = 0. Therefore,

    E[b_{t_i} b_{t_j} D_i D_j] = E[ E[b_{t_i} b_{t_j} D_i D_j | F_{t_j}] ]
                               = E[ b_{t_i} b_{t_j} D_i E[D_j | F_{t_j}] ]
                               = 0.

Now consider the square terms b_{t_j}² D_j². The random variable b_{t_j}² is F_{t_j}-measurable, while the
squared Brownian increment D_j² is independent of F_{t_j}, so E[D_j² | F_{t_j}] = E[D_j²] = t_{j+1} − t_j, for
j = 0, . . . , k − 1, and E[D_k² | F_{t_k}] = E[D_k²] = t − t_k. Therefore,

    E[I_t²] = Σ_{j=0}^{k} E[b_{t_j}² D_j²]
            = Σ_{j=0}^{k} E[ E[b_{t_j}² D_j² | F_{t_j}] ]
            = Σ_{j=0}^{k} E[ b_{t_j}² E[D_j² | F_{t_j}] ]
            = Σ_{j=0}^{k} E[ b_{t_j}² E[D_j²] ]
            = Σ_{j=0}^{k−1} E[b_{t_j}²(t_{j+1} − t_j)] + E[b_{t_k}²(t − t_k)].

But b_{t_j} is constant on [t_j, t_{j+1}), so b_{t_j}²(t_{j+1} − t_j) = ∫_{t_j}^{t_{j+1}} b_s² ds and similarly, b_{t_k}²(t − t_k) = ∫_{t_k}^{t} b_s² ds,
so

    E[I_t²] = Σ_{j=0}^{k−1} E[ ∫_{t_j}^{t_{j+1}} b_s² ds ] + E[ ∫_{t_k}^{t} b_s² ds ]
            = E[ ∫₀ᵗ b_s² ds ].  □


Property 8.9 (Quadratic variation of the integral). The quadratic variation of the integral is the
quadratic variation process ([I]_t)_{t≥0} of the integral process I = (I_t)_{t≥0}. For Brownian motion we
may write W_t = ∫₀ᵗ 1 dW_s and this has quadratic variation [W]_t = ∫₀ᵗ 1² d[W]_s = ∫₀ᵗ 1 ds = t. We
say that Brownian motion accumulates quadratic variation at the rate of one per unit time. In
the Itô integral I_t = ∫₀ᵗ b_s dW_s, BM is scaled in a time- and path-dependent way (depending on
(s, ω) ∈ [0, t] × Ω) by the integrand b_s. Because increments are squared in the computation of
quadratic variation, the QV of BM will be scaled by b_s² as it enters the integral. The following
theorem gives the precise statement.

Theorem 8.10 (Quadratic variation of the Itô integral). Let b be a simple process. Then the Itô
integral

    I_t = ∫₀ᵗ b_s dW_s,   t ≥ 0,

has quadratic variation process ([I]_t)_{t≥0} given by

    [I]_t = ∫₀ᵗ b_s² ds,   t ≥ 0.

We say that the Itô integral accumulates quadratic variation at a rate b_s² (s ∈ [0, t]) per unit
time, and that the quadratic variation accumulated up to time t by the integral is [I]_t = ∫₀ᵗ b_s² ds.
Proof. First compute the quadratic variation accumulated by the integral on one of the subintervals
[t_j, t_{j+1}) on which b_s = b_{t_j}, s ∈ [t_j, t_{j+1}), is constant. Choose partition points

    t_j = s₀ < s₁ < . . . < s_m = t_{j+1},

and consider

(8.3)   Σ_{i=0}^{m−1} (I_{s_{i+1}} − I_{s_i})² = Σ_{i=0}^{m−1} [b_{t_j}(W_{s_{i+1}} − W_{s_i})]²
                                               = b_{t_j}² Σ_{i=0}^{m−1} (W_{s_{i+1}} − W_{s_i})².

As m → ∞ and the mesh of the partition, max_{i=0,...,m−1}(s_{i+1} − s_i), approaches zero, the term
Σ_{i=0}^{m−1} (W_{s_{i+1}} − W_{s_i})² converges to the QV accumulated by BM over [t_j, t_{j+1}), which is t_{j+1} − t_j.
Therefore, the limit of the RHS of (8.3), which is the QV accumulated by the integral over
[t_j, t_{j+1}), is

    b_{t_j}²(t_{j+1} − t_j) = ∫_{t_j}^{t_{j+1}} b_s² ds,

where we have used the fact that b_s is constant for s ∈ [t_j, t_{j+1}). Similarly, the QV accumulated
by the integral over [t_k, t] is ∫_{t_k}^{t} b_s² ds. Adding up all these contributions proves the theorem.  □

Informally, we establish the theorem in differential form via

    dI_t = b_t dW_t ⟹ d[I]_t = dI_t dI_t = b_t² dW_t dW_t = b_t² d[W]_t = b_t² dt,

just as we wrote d[W]_t = dW_t dW_t = dt earlier. In fact, one can do a lot of the calculations in
Itô calculus simply by applying the informal multiplication rules:

    dW_t dW_t = dt,   dW_t dt = dt dt = 0.

Remark 8.11. Note the contrast between Theorems 8.8 and 8.10. The QV [I]_t is computed path-by-path, so the result can depend on the path, and so in principle is random. The variance of
the integral is precisely the expectation of the QV, as given by the Itô isometry (i.e. it is an
average over all possible paths of the QV), and so is non-random.
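The Itô isometry (Theorem 8.8) can be illustrated by Monte Carlo (a sketch not in the original notes). Below we take the elementary integrand b_s = W_{t_j} on [t_j, t_{j+1}) over a uniform partition; the step count, path count, and seed are arbitrary. For this integrand both E[I_t²] and E[∫₀ᵗ b_s² ds] equal t²/2 (up to discretisation error).

```python
import math, random

def elementary_ito(t, n, rng):
    """One path: the Ito sum I_t = sum_j b_{t_j} D_j for the elementary
    integrand b_s = W_{t_j} on [t_j, t_{j+1}), together with the Riemann
    sum approximating int_0^t b_s^2 ds on the same path."""
    dt = t / n
    sd = math.sqrt(dt)
    w = ito = qsum = 0.0
    for _ in range(n):
        d = rng.gauss(0.0, sd)   # increment D_j ~ N(0, dt)
        ito += w * d             # left-endpoint value b_{t_j} = W_{t_j}
        qsum += w * w * dt       # b_s^2 is constant on [t_j, t_{j+1})
        w += d
    return ito, qsum

rng = random.Random(6)
t, n, n_paths = 1.0, 100, 10_000
sum_i2 = sum_q = 0.0
for _ in range(n_paths):
    ito, qsum = elementary_ito(t, n, rng)
    sum_i2 += ito * ito
    sum_q += qsum
print(sum_i2 / n_paths, sum_q / n_paths)  # both near t^2 / 2 = 0.5
```

The two averages agree because var(I_t) = E[I_t²] is exactly the expectation of the pathwise quadratic variation.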
8.2. Itô integral of a general integrand*. It turns out that one can construct the Itô integral
for more general integrands, and the resulting integral inherits the same properties as the Itô
integral of an elementary integrand. We will take these properties as given and use them freely
from now on, but we do not prove them, and the general theory of Itô integration for general
adapted integrands (so proofs of all the results in this sub-section) is not examinable.
Fix t > 0. Let b be a process (not necessarily an elementary process) such that
• b_s is F_s-measurable, ∀s ∈ [0, t];
• E[ ∫₀ᵗ b_s² ds ] < ∞.

We then have the following result.

Theorem 8.12. There is a sequence of elementary processes (b^{(n)})_{n=1}^∞ such that

    lim_{n→∞} E[ ∫₀ᵗ |b_s^{(n)} − b_s|² ds ] = 0.

Proof. See [17], Section 4.3, [15], Section 3.1, or [12], Section 3.2 and Problem 3.2.5 in [12].  □

We have shown how to define

    I_t^{(n)} = ∫₀ᵗ b_s^{(n)} dW_s,

for every n ∈ ℕ. We now define the general Itô integral by

    ∫₀ᵗ b_s dW_s := lim_{n→∞} ∫₀ᵗ b_s^{(n)} dW_s.

The only difficulty with this approach is that we need to make sure the above limit exists. Suppose
m and n are large positive integers. Then

    E[|I_t^{(n)} − I_t^{(m)}|²] = var(I_t^{(n)} − I_t^{(m)})
        = E[ ( ∫₀ᵗ (b_s^{(n)} − b_s^{(m)}) dW_s )² ]
        (Itô isometry) = E[ ∫₀ᵗ (b_s^{(n)} − b_s^{(m)})² ds ]
        (triangle inequality) ≤ E[ ∫₀ᵗ ( |b_s^{(n)} − b_s| + |b_s − b_s^{(m)}| )² ds ]
        ((a + b)² ≤ 2(a² + b²)) ≤ 2E[ ∫₀ᵗ |b_s^{(n)} − b_s|² ds ] + 2E[ ∫₀ᵗ |b_s^{(m)} − b_s|² ds ],

which approaches zero as m, n → ∞, by Theorem 8.12. This guarantees that the sequence
(I_t^{(n)})_{n=1}^∞ is a Cauchy sequence in L²(Ω, F, P) and so has a limit.
8.2.1. Properties of the general Itô integral. The general Itô integral is

    I_t = ∫₀ᵗ b_s dW_s,

where b is any adapted, square-integrable process. Its properties are inherited from the properties
of Itô integrals of simple processes and are summarised below. You are expected to know these
properties as being inherited from the properties of the integral of elementary processes, but are
not required to be able to prove any of them in this general case.

Property 8.13 (Adaptedness). For each t ≥ 0, I_t is F_t-measurable.

Property 8.14 (Linearity). If

    I_t = ∫₀ᵗ b_s dW_s,   J_t = ∫₀ᵗ a_s dW_s,

then for α, β ∈ ℝ,

    αI_t + βJ_t = ∫₀ᵗ (αb_s + βa_s) dW_s.

Property 8.15 (Martingale property). (I_t)_{t≥0} is a martingale.


(There is also a converse result, known as the martingale representation theorem, stated without
proof in Section 8.3.)

Property 8.16 (The Itô isometry). The variance of the Itô integral is var(I_t) = E[I_t²], given by

    E[I_t²] = E[ ∫₀ᵗ b_s² ds ].

Property 8.17 (Continuity). I_t is a continuous function of the upper limit of integration t.

Property 8.18 (Quadratic variation). The Itô integral

    I_t = ∫₀ᵗ b_s dW_s,   t ≥ 0,

has quadratic variation process ([I]_t)_{t≥0} given by

    [I]_t = ∫₀ᵗ b_s² ds.

Example 8.19. Consider the Itô integral

    I_t = ∫₀ᵗ W_s dW_s.

We approximate the integrand by an elementary process b_s^{(n)}, s ∈ [0, t], in the following way.
Partition the interval [0, t] into n time intervals of length Δt, so that t = nΔt, and

    0 = t₀ < t₁ = Δt = t/n < . . . < t_k = kΔt = kt/n < . . . < t_n = t,

and define b_s^{(n)} by

    b_s^{(n)} = W_{t_k} = W_{kt/n},   if kt/n ≤ s < (k+1)t/n,   k = 0, . . . , n − 1.

Then by definition

    I_t = ∫₀ᵗ W_s dW_s = lim_{n→∞} Σ_{k=0}^{n−1} W_{kt/n}( W_{(k+1)t/n} − W_{kt/n} ).

To simplify notation, write W_k ≡ W_{kt/n} so that

    ∫₀ᵗ W_s dW_s = lim_{n→∞} Σ_{k=0}^{n−1} W_k(W_{k+1} − W_k).

Then we note that

    W_{k+1}² − W_k² = (W_{k+1} − W_k)² + 2W_k W_{k+1} − 2W_k²
                    = (W_{k+1} − W_k)² + 2W_k(W_{k+1} − W_k),

so that

    Σ_{k=0}^{n−1} W_k(W_{k+1} − W_k) = ½ [ Σ_{k=0}^{n−1} (W_{k+1}² − W_k²) − Σ_{k=0}^{n−1} (W_{k+1} − W_k)² ]
                                     = ½ ( W_n² − Σ_{k=0}^{n−1} (W_{k+1} − W_k)² ).   (W₀ = 0)

Now we let n → ∞ and use the definition of quadratic variation to get

    ∫₀ᵗ W_s dW_s = ½(W_t² − [W]_t) = ½(W_t² − t).

Remark 8.20 (Reason for the ½t term). If f is a differentiable function with f(0) = 0, then

    ∫₀ᵗ f(s) df(s) = ∫₀ᵗ f(s) f′(s) ds = ½ f²(s)|₀ᵗ = ½ f²(t).

In contrast, for Brownian motion, we have

    ∫₀ᵗ W_s dW_s = ½(W_t² − t).

The extra term −½t comes from the nonzero quadratic variation of Brownian motion. It has to be
there, because E[∫₀ᵗ W_s dW_s] = 0 (the Itô integral is a martingale), but E[½W_t²] = ½t.
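The pathwise identity of Example 8.19 can be checked on a single simulated path (a sketch not in the original notes; the step count and seed are arbitrary): the left-endpoint Riemann sum of ∫₀ᵗ W dW and ½(W_t² − t) agree up to the discretisation error in Σ D_k² ≈ t.

```python
import math, random

def ito_sum_w_dw(t, n, rng):
    """Left-endpoint (Ito) Riemann sum of int_0^t W dW on one path,
    returned together with the terminal value W_t of the same path."""
    sd = math.sqrt(t / n)
    w = s = 0.0
    for _ in range(n):
        d = rng.gauss(0.0, sd)
        s += w * d     # W_{t_k} (W_{t_{k+1}} - W_{t_k})
        w += d
    return s, w

rng = random.Random(7)
t, n = 1.0, 200_000
s, wt = ito_sum_w_dw(t, n, rng)
print(s, 0.5 * (wt * wt - t))  # the two numbers nearly coincide
```

Note a right-endpoint sum would instead converge to ½(W_t² + t): the choice of endpoint matters precisely because of the nonzero quadratic variation.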
8.3. Itô and martingale representation theorems for Brownian motion*. For adapted
integrands b satisfying E[∫₀ᵗ b_s² ds] < ∞ we have that the Itô integral ∫₀ᵗ b_s dW_s is a martingale.
There is also a converse result, known as the martingale representation theorem (which we do not
prove; see [15] for example).

Theorem 8.21 (Itô representation theorem for Brownian motion). Let (W_t)_{t≥0} be a Brownian
motion on a filtered probability space (Ω, F, F := (F_t)_{t≥0}, P), with (F_t)_{t≥0} the natural filtration
F_t = σ(W_s, 0 ≤ s ≤ t). Suppose that X ∈ L²(Ω, F_t, P) (i.e. X is F_t-measurable and E[X²] < ∞).
Then there exists an adapted process b such that E[ ∫₀ᵗ b_s² ds ] < ∞, t ≥ 0, and

    X = E[X] + ∫₀ᵗ b_s dW_s.

Theorem 8.22 (Martingale representation theorem for Brownian motion). Let (W_t)_{t≥0} be a
Brownian motion on a filtered probability space (Ω, F, F := (F_t)_{t≥0}, P), with (F_t)_{t≥0} the natural
filtration F_t = σ(W_s, 0 ≤ s ≤ t). Suppose that the process M = (M_t)_{t≥0} is a square-integrable
martingale with respect to this filtration, written M ∈ M² (that is, M_t ∈ L²(Ω, F_t, P) for all t ≥ 0,
or E[M_t²] < ∞, for all t ≥ 0). Then there exists an adapted process b such that E[ ∫₀ᵗ b_s² ds ] <
∞, t ≥ 0, and

    M_t = M₀ + ∫₀ᵗ b_s dW_s.

9. The Itô formula

9.1. Itô's formula for one Brownian motion. We want a rule to differentiate expressions
of the form f(W_t). If the paths of W_t were differentiable then the ordinary chain rule would give

    (d/dt) f(W_t) = f′(W_t) W_t′,

which could be written in differential notation as

    df(W_t) = f′(W_t) W_t′ dt = f′(W_t) dW_t.

However, W_t is not differentiable, and in particular has nonzero quadratic variation, so the correct
formula has an extra term, namely,

    df(W_t) = f′(W_t) dW_t + ½ f″(W_t) d[W]_t,

with the understanding that d[W]_t = dt. This is a version of Itô's formula in differential form.
Integrating this, we obtain a version of Itô's formula in integral form.

Theorem 9.1 (Itô formula for one BM). If f(x) is a C²(ℝ) function and t ≥ 0, then

(9.1)   f(W_t) − f(W_0) = ∫₀ᵗ f′(W_s) dW_s + ½ ∫₀ᵗ f″(W_s) d[W]_s.

Remark 9.2 (Differential versus integral forms). The mathematically meaningful form of Itô's
formula is its integral form, because we have solid definitions for the integrals appearing on the
RHS of (9.1). For pencil and paper computations, the more convenient form is the differential
form.
Proof of Theorem 9.1. Fix t > 0 and let P = {t₀, t₁, . . . , t_n} be a partition of [0, t]. By Taylor's
theorem we have

    f(W_t) − f(W_0) = Σ_{k=0}^{n−1} [f(W_{t_{k+1}}) − f(W_{t_k})]
                    = Σ_{k=0}^{n−1} [ f′(W_{t_k})(W_{t_{k+1}} − W_{t_k}) + ½ f″(W_{t_k})(W_{t_{k+1}} − W_{t_k})² + · · · ]
                    → ∫₀ᵗ f′(W_s) dW_s + ½ ∫₀ᵗ f″(W_s) d[W]_s,   as ‖P‖ → 0,

with higher order terms disappearing (since the third variation of BM is zero by Lemma 7.14 and
|Σ_{k=0}^{n−1} D_k³| ≤ Σ_{k=0}^{n−1} |D_k|³, where D_k := W_{t_{k+1}} − W_{t_k}), and the second-order summation
converges to the Riemann integral ∫₀ᵗ f″(W_s) d[W]_s, since it becomes the quadratic variation of an
Itô integral. That is, for the Itô integral

    I_t = ∫₀ᵗ b_s dW_s = lim_{‖P‖→0} Σ_{k=0}^{n−1} b_{t_k}(W_{t_{k+1}} − W_{t_k}),

we have

    ∫₀ᵗ b_s² ds = [I]_t = lim_{‖P‖→0} Σ_{k=0}^{n−1} (I_{t_{k+1}} − I_{t_k})²
                        = lim_{‖P‖→0} Σ_{k=0}^{n−1} b_{t_k}²(W_{t_{k+1}} − W_{t_k})².  □

A heuristic derivation would simply state that, by Taylor's theorem,

    df(W_t) = f′(W_t) dW_t + ½ f″(W_t) dt,

where we have used dW_t dW_t = dt in the last term on the RHS, and higher order terms are
neglected.
Example 9.3. Applying the Itô formula to f(x) = x² we have

    d(W_t²) = 2W_t dW_t + d[W]_t = 2W_t dW_t + dt.

Integrating over [0, t] and re-arranging we get

    ∫₀ᵗ W_s dW_s = ½(W_t² − t),   t ≥ 0,

which reproduces the result obtained in Example 8.19 from first principles.
Corollary 9.4 (Itô formula for function of time and one Brownian motion). If S_t = f(t, W_t) for
some C^{1,2}(ℝ₊ × ℝ) function f(t, x), then

    dS_t = df(t, W_t) = f_t(t, W_t) dt + f_x(t, W_t) dW_t + ½ f_{xx}(t, W_t) d[W]_t,

and higher order terms do not contribute, since we have shown earlier in Section 7.6 that we have
the informal rules dW_t dt = 0 and dt dt = 0.

Definition 9.5 (Geometric Brownian motion). Geometric Brownian motion is the process S =
(S_t)_{t≥0} given by

    S_t = S₀ exp( σW_t + (μ − ½σ²)t ),

where μ and σ > 0 are constant, and the parameter σ is called the volatility of the process S.
Define

    f(t, x) = S₀ exp( σx + (μ − ½σ²)t ),

so that S_t = f(t, W_t) and

    f_t(t, x) = (μ − ½σ²) f(t, x),   f_x(t, x) = σ f(t, x),   f_{xx}(t, x) = σ² f(t, x),

with the subscripts denoting partial derivatives. Then by Itô's formula

    dS_t = df(t, W_t)
         = f_t(t, W_t) dt + f_x(t, W_t) dW_t + ½ f_{xx}(t, W_t) dt
         = (μ − ½σ²) f(t, W_t) dt + σ f(t, W_t) dW_t + ½ σ² f(t, W_t) dt
         = (μ − ½σ²) S_t dt + σ S_t dW_t + ½ σ² S_t dt
         = μ S_t dt + σ S_t dW_t,

which is geometric Brownian motion in differential form. Geometric Brownian motion in integral
form may be written as

    S_t = S₀ + ∫₀ᵗ μ S_s ds + ∫₀ᵗ σ S_s dW_s,   t ≥ 0.
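Since Definition 9.5 gives S_t in closed form, it can be sampled exactly, without any time-stepping scheme. The following sketch (not in the original notes; parameter values and seed are arbitrary) checks the implied first moment E[S_t] = S₀ e^{μt} by Monte Carlo.

```python
import math, random

def gbm_sample(s0, mu, sigma, t, rng):
    """Exact draw of S_t = S_0 exp(sigma W_t + (mu - sigma^2/2) t),
    using W_t ~ N(0, t); no discretisation error is involved."""
    wt = rng.gauss(0.0, math.sqrt(t))
    return s0 * math.exp(sigma * wt + (mu - 0.5 * sigma * sigma) * t)

rng = random.Random(8)
s0, mu, sigma, t, n_paths = 100.0, 0.05, 0.2, 1.0, 200_000
mean_st = sum(gbm_sample(s0, mu, sigma, t, rng) for _ in range(n_paths)) / n_paths
print(mean_st)  # close to S_0 e^{mu t} = 100 e^{0.05} ~ 105.13
```

The −½σ²t correction in the exponent is exactly what makes E[S_t] = S₀e^{μt}, via the lognormal moment E[e^{σW_t}] = e^{σ²t/2}.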

9.1.1. Quadratic variation of geometric Brownian motion. In the integral form of geometric Brownian motion,

    S_t = S₀ + ∫₀ᵗ μ S_s ds + ∫₀ᵗ σ S_s dW_s,

the Riemann integral

    F(t) = ∫₀ᵗ μ S_s ds

is differentiable with F′(t) = μ S_t. This term has zero quadratic variation. The Itô integral

    G(t) = ∫₀ᵗ σ S_s dW_s

is not differentiable. It has quadratic variation

    [G]_t = ∫₀ᵗ σ² S_s² ds.

Thus the quadratic variation of S is given by the quadratic variation of G, i.e.

    [S]_t = [G]_t = ∫₀ᵗ σ² S_s² ds.

In differential notation we write

    d[S]_t = dS_t dS_t = σ² S_t² dW_t dW_t,

or equivalently

    d[S]_t = σ² S_t² d[W]_t,

using the informal multiplication rules involving the differentials dt and dW_t:

    d[W]_t = dW_t dW_t = dt,   dW_t dt = dt dW_t = dt dt = 0.

Remark 9.6. Note that

    ∫₀ᵗ d[S]_s/S_s² = ∫₀ᵗ σ² ds = σ²t,

indicating that for geometric Brownian motion, the quadratic variation, when scaled by the square
of the stock price process, is a measure of the volatility of the process S.

9.2. Itô's formula for Itô processes.

Definition 9.7 (Itô process). Let (W_t, F_t)_{t≥0} be a standard Brownian motion. An Itô process is
a stochastic process of the form

(9.2)   X_t = X₀ + ∫₀ᵗ a_s ds + ∫₀ᵗ b_s dW_s,   t ≥ 0,

where X₀ is non-random and a, b are adapted stochastic processes satisfying ∫₀ᵗ |a_s| ds < ∞ and
E[ ∫₀ᵗ b_s² ds ] < ∞.
In differential form we write (9.2) as

    dX_t = a_t dt + b_t dW_t.

Lemma 9.8 (Quadratic variation of an Itô process). The quadratic variation of the Itô process
(9.2) is the process ([X]_t)_{t≥0} given by

    [X]_t = ∫₀ᵗ b_s² ds,   t ≥ 0.

Proof. This is immediate from the fact that the quadratic variation of ∫₀ᵗ a_s ds is zero.  □

Definition 9.9 (Integral with respect to an Itô process). Let (X_t)_{t≥0} be the Itô process (9.2)
and let (Γ_t)_{t≥0} be an adapted process satisfying

    E[ ∫₀ᵗ Γ_s² b_s² ds ] < ∞,   ∫₀ᵗ |Γ_s a_s| ds < ∞,

for every t ≥ 0. The Itô integral of Γ with respect to X is the process J = (J_t)_{t≥0} defined by

    J_t := ∫₀ᵗ Γ_s dX_s := ∫₀ᵗ Γ_s a_s ds + ∫₀ᵗ Γ_s b_s dW_s.
Theorem 9.10 (Itô formula for Itô processes). Let $(X_t)_{t \geq 0}$ be the Itô process (9.2) and let $f(t, x) \in C^{1,2}([0, \infty) \times \mathbb{R})$. Then, for every $t > 0$,
\begin{align*}
f(t, X_t) &= f(0, X_0) + \int_0^t f_t(s, X_s) \, ds + \int_0^t f_x(s, X_s) \, dX_s + \frac{1}{2} \int_0^t f_{xx}(s, X_s) \, d[X]_s \\
&= f(0, X_0) + \int_0^t \left( f_t(s, X_s) + a_s f_x(s, X_s) + \frac{1}{2} b_s^2 f_{xx}(s, X_s) \right) ds + \int_0^t b_s f_x(s, X_s) \, dW_s.
\end{align*}
Proof. As for the Itô formula with respect to BM, and use the fact that $[X]_t = [I]_t = \int_0^t b_s^2 \, ds$, where $I_t := \int_0^t b_s \, dW_s$ is the Itô integral part of $X$. $\square$


It is usually easier to remember and use this theorem in the differential form
\[
df(t, X_t) = f_t(t, X_t) \, dt + f_x(t, X_t) \, dX_t + \frac{1}{2} f_{xx}(t, X_t) \, d[X]_t,
\]
where $d[X]_t = dX_t \, dX_t$ is computed according to the rules
\[
dt \, dt = dt \, dW_t = dW_t \, dt = 0, \qquad dW_t \, dW_t = dt.
\]

Example 9.11 (Generalised geometric Brownian motion). Define the Itô process
\[
X_t = \int_0^t \sigma_s \, dW_s + \int_0^t \left( \mu_s - \frac{1}{2} \sigma_s^2 \right) ds, \quad t \geq 0,
\]
where $\mu, \sigma$ are adapted processes. Then
\[
dX_t = \sigma_t \, dW_t + \left( \mu_t - \frac{1}{2} \sigma_t^2 \right) dt, \qquad d[X]_t = \sigma_t^2 \, d[W]_t = \sigma_t^2 \, dt.
\]
A common model for an asset price process $S = (S_t)_{t \geq 0}$ is given by
\[
S_t = S_0 e^{X_t},
\]
with $S_0 > 0$ non-random, which is called a generalised geometric Brownian motion. We write $S_t = f(X_t)$ where $f(x) = S_0 e^x$. The Itô formula gives
\[
dS_t = \mu_t S_t \, dt + \sigma_t S_t \, dW_t.
\]
Applying the Itô formula to the function $g(t, S_t) = \log S_t$, we find that
\[
d(\log S_t) = dX_t = \sigma_t \, dW_t + \left( \mu_t - \frac{1}{2} \sigma_t^2 \right) dt.
\]
Example 9.12 (Exponential martingale). Let $\sigma$ be a process adapted to the filtration of the Brownian motion $W$. Define the process $Z = (Z_t)_{0 \leq t \leq T}$ by
\[
Z_t = \exp\left( \int_0^t \sigma_s \, dW_s - \frac{1}{2} \int_0^t \sigma_s^2 \, d[W]_s \right).
\]
Using the Itô formula we see that (and there is a similar exercise on Problem Sheet 2)
\[
dZ_t = \sigma_t Z_t \, dW_t,
\]
so $Z$ is the Itô process given by
\[
Z_t = 1 + \int_0^t \sigma_s Z_s \, dW_s, \quad t \geq 0,
\]
and $Z$ is a martingale provided that $\mathbb{E}\left[\int_0^T \sigma_t^2 Z_t^2 \, dt\right] < \infty$.
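For constant $\sigma$ the martingale property can be checked by simulation: $Z_T = \exp(\sigma W_T - \tfrac{1}{2}\sigma^2 T)$ should have mean $Z_0 = 1$. A minimal Monte Carlo sketch (not part of the notes; parameters are illustrative):

```python
import numpy as np

# For constant sigma, Z_T = exp(sigma*W_T - 0.5*sigma^2*T) should have mean 1,
# the martingale property E[Z_T] = Z_0 evaluated at t = T.
rng = np.random.default_rng(1)
sigma, T, n = 0.4, 1.0, 1_000_000
W_T = np.sqrt(T) * rng.standard_normal(n)
Z_T = np.exp(sigma * W_T - 0.5 * sigma**2 * T)
print(Z_T.mean())  # close to 1.0
```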

Remark 9.13 (Novikov condition*). It can be shown (though outside the scope of this course) that a sufficient condition on $\sigma$ alone for $Z$ to be a martingale is the Novikov condition
\[
\mathbb{E}\left[ \exp\left( \frac{1}{2} \int_0^T \sigma_t^2 \, dt \right) \right] < \infty.
\]
In particular, $Z$ will be a martingale if $\sigma$ is bounded (for instance, a constant).


9.3. Markovian diffusions. Suppose an Itô process $X := (X_t)_{t \geq 0}$ is given by
\[
(9.3) \qquad X_t = x + \int_0^t a(s, X_s) \, ds + \int_0^t b(s, X_s) \, dW_s, \quad x \in \mathbb{R},
\]
or in differential form
\[
(9.4) \qquad dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t,
\]
for well-behaved (see Remark 9.14 further below) functions $a(t, x), b(t, x)$. Then (9.4) (or equivalently (9.3)) is called a stochastic differential equation (SDE) for $X$, and the process $X$ is Markovian (though we do not prove this here):
\[
\mathbb{E}[h(X_T) \,|\, \mathcal{F}_t] = \mathbb{E}[h(X_T) \,|\, X_t], \quad 0 \leq t \leq T,
\]
for any function $h(\cdot)$ such that the expectations exist.


Given a function $f \in C^{1,2}([0, \infty) \times \mathbb{R})$, the process $(Y_t)_{t \geq 0}$ defined by $Y_t := f(t, X_t)$ has differential given by
\[
dY_t \equiv df(t, X_t) = f_t(t, X_t) \, dt + f_x(t, X_t) \, dX_t + \frac{1}{2} f_{xx}(t, X_t) \, d[X]_t,
\]
where $d[X]_t = dX_t \, dX_t$ is computed according to the rules
\[
dt \, dt = dt \, dW_t = dW_t \, dt = 0, \qquad dW_t \, dW_t = dt.
\]
In integral form $Y_t$ is given by
\[
Y_t = Y_0 + \int_0^t \left( f_t(s, X_s) + a(s, X_s) f_x(s, X_s) + \frac{1}{2} b^2(s, X_s) f_{xx}(s, X_s) \right) ds + \int_0^t b(s, X_s) f_x(s, X_s) \, dW_s, \quad t \geq 0.
\]
This may be written as
\[
Y_t = Y_0 + \int_0^t \left( f_t(s, X_s) + A f(s, X_s) \right) ds + \int_0^t b(s, X_s) f_x(s, X_s) \, dW_s, \quad t \geq 0,
\]
where $A$ is called the generator of the diffusion $X$, and is defined by
\[
A f(t, x) := a(t, x) f_x(t, x) + \frac{1}{2} b^2(t, x) f_{xx}(t, x).
\]
Remark 9.14 (Existence of solutions to SDEs*). There are conditions on the functions $a, b$ such that there exists a well-defined process $X$ satisfying the stochastic differential equation (SDE)
\[
(9.5) \qquad dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t,
\]
or, more precisely, such that there exists a process $X$ satisfying $X_0 = x$ and
\[
X_t = x + \int_0^t a(s, X_s) \, ds + \int_0^t b(s, X_s) \, dW_s, \quad t \geq 0.
\]
The basic existence result is as follows. Suppose there is a constant $K$ such that for all $x, y, t$ we have
\[
|a(t, x) - a(t, y)| \leq K|x - y|, \quad |b(t, x) - b(t, y)| \leq K|x - y|, \quad |a(t, x)| + |b(t, x)| \leq K(1 + |x|).
\]
(The first two conditions are Lipschitz continuity in $x$; the third is a linear growth condition.) Then the SDE (9.5) has a unique, adapted, continuous Markovian solution, and there exists a constant $C$ such that
\[
\mathbb{E}[|X_t|^2] \leq C e^{Ct} (1 + |x|^2).
\]
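An SDE of the form (9.5) rarely has a closed-form solution, but on a time grid it can be approximated by the Euler-Maruyama scheme $X_{k+1} = X_k + a(t_k, X_k)\,\Delta t + b(t_k, X_k)\sqrt{\Delta t}\, Z_k$, with $Z_k$ i.i.d. standard normals. A minimal sketch (an illustration, not part of the notes), applied to the Ornstein-Uhlenbeck SDE $dX_t = -X_t\,dt + dW_t$, for which $\mathbb{E}[X_T] = X_0 e^{-T}$ is known in closed form:

```python
import numpy as np

# Euler-Maruyama scheme for dX = a(t,X) dt + b(t,X) dW, illustrated on the
# Ornstein-Uhlenbeck SDE dX = -X dt + dW (a(t,x) = -x, b(t,x) = 1).
def euler_maruyama(a, b, x0, T, n_steps, n_paths, rng):
    dt = T / n_steps
    X = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        X = X + a(t, X) * dt + b(t, X) * dW
    return X

rng = np.random.default_rng(2)
X_T = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, x0=1.0, T=1.0,
                     n_steps=500, n_paths=100_000, rng=rng)
print(X_T.mean())  # close to exp(-1)
```

The drift and diffusion here satisfy the Lipschitz and linear growth conditions above, so the scheme converges to the unique solution as the step size shrinks.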


9.4. Connection with PDEs: Feynman-Kac theorem. There is a remarkable connection between stochastic calculus for Markov diffusions and partial differential equations (PDEs). Consider the one-dimensional diffusion
\[
(9.6) \qquad dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t.
\]
The process $X = (X_t)_{t \geq 0}$ is a Markov process, satisfying
\[
(9.7) \qquad \mathbb{E}[h(X_T) \,|\, \mathcal{F}_t] = \mathbb{E}[h(X_T) \,|\, X_t], \quad 0 \leq t \leq T,
\]
for a function $h(x)$ such that the above expectations are defined. A consequence of the Markov property is that the right-hand side of (9.7) is a function of $(t, X_t)$ only. Write
\[
(9.8) \qquad v(t, x) := \mathbb{E}[h(X_T) \,|\, X_t = x].
\]
Assume that $v \in C^{1,2}([0, T] \times \mathbb{R})$ and that the process defined by $Y_t := v(t, X_t)$ is integrable ($\mathbb{E}[|Y_t|] < \infty$).
Lemma 9.15. The process $Y = (Y_t)_{0 \leq t \leq T}$ defined by $Y_t := v(t, X_t)$ is a martingale.

Proof. By the Markov property, we have $Y_t = \mathbb{E}[h(X_T)|X_t] = \mathbb{E}[h(X_T)|\mathcal{F}_t]$. Then, for $0 \leq s \leq t \leq T$,
\begin{align*}
\mathbb{E}[Y_t | \mathcal{F}_s] &= \mathbb{E}\left[ \mathbb{E}[h(X_T)|X_t] \,\big|\, \mathcal{F}_s \right] \\
&= \mathbb{E}\left[ \mathbb{E}[h(X_T)|\mathcal{F}_t] \,\big|\, \mathcal{F}_s \right] \quad \text{(by the Markov property)} \\
&= \mathbb{E}[h(X_T)|\mathcal{F}_s] \quad \text{(by the tower property)} \\
&= \mathbb{E}[h(X_T)|X_s] \quad \text{(by the Markov property)} \\
&= Y_s. \qquad \square
\end{align*}


Theorem 9.16 (Feynman-Kac). The function $v$ defined by (9.8) solves the PDE
\[
(9.9) \qquad v_t(t, x) + a(t, x) v_x(t, x) + \frac{1}{2} b^2(t, x) v_{xx}(t, x) = 0, \quad v(T, x) = h(x).
\]
Proof. By the Itô formula,
\[
dY_t = dv(t, X_t) = \left[ v_t(t, X_t) + a(t, X_t) v_x(t, X_t) + \frac{1}{2} b^2(t, X_t) v_{xx}(t, X_t) \right] dt + b(t, X_t) v_x(t, X_t) \, dW_t.
\]
Since $Y$ is a martingale, the coefficient of the $dt$ term must be zero for all $(t, X_t)$, and (9.9) follows. $\square$

Note that the PDE (9.9) may be written
\[
(9.10) \qquad v_t(t, x) + A v(t, x) = 0, \quad v(T, x) = h(x),
\]
where $A$ is the generator of the diffusion (9.6):
\[
A v(t, x) = a(t, x) v_x(t, x) + \frac{1}{2} b^2(t, x) v_{xx}(t, x),
\]
(and this is the form that generalises to a multi-dimensional situation, as we shall see later). Note also that the theorem is still valid if we replace $h(X_T)$ in (9.8) by $h(T, X_T)$, a function dependent on $T$ as well as $X_T$.
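The theorem can be sanity-checked numerically in a case where both sides are available in closed form. For the diffusion $dX_t = \sigma\,dW_t$ ($a = 0$, $b = \sigma$ constant) and payoff $h(x) = x^2$, the conditional expectation is $v(t,x) = x^2 + \sigma^2(T - t)$, which indeed solves $v_t + \tfrac{1}{2}\sigma^2 v_{xx} = 0$, $v(T,x) = x^2$. A Monte Carlo sketch (illustrative parameters, not part of the notes):

```python
import numpy as np

# Feynman-Kac check: for dX = sigma dW and h(x) = x^2,
# v(t, x) = E[h(X_T) | X_t = x] = x^2 + sigma^2 * (T - t).
rng = np.random.default_rng(3)
sigma, t, T, x, n = 0.3, 0.25, 1.0, 1.5, 1_000_000
X_T = x + sigma * np.sqrt(T - t) * rng.standard_normal(n)
mc = (X_T**2).mean()
exact = x**2 + sigma**2 * (T - t)
print(mc, exact)  # the two values should agree closely
```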


10. Multidimensional Brownian motion

Definition 10.1 (d-dimensional Brownian motion). A $d$-dimensional Brownian motion is a $d$-dimensional process $W = (W_t)_{t \geq 0}$, where
\[
W_t = (W_t^1, \ldots, W_t^d)',
\]
with the following properties:
- Each $W^i$ ($i = 1, \ldots, d$) is a one-dimensional Brownian motion;
- If $i \neq j$, then the processes $W^i$ and $W^j$ are independent.

Associated with a $d$-dimensional Brownian motion, we have a filtration $(\mathcal{F}_t)_{t \geq 0}$ such that:
- For each $t$, the random vector $W_t$ is $\mathcal{F}_t$-measurable;
- For each $t \leq t_1 \leq \cdots \leq t_n$, the vector increments
\[
W_{t_1} - W_t, \; \ldots, \; W_{t_n} - W_{t_{n-1}}
\]
are independent of $\mathcal{F}_t$.
10.1. Cross-variations of Brownian motions. Because each component $W^i$ of $W$ is a one-dimensional Brownian motion, we have
\[
[W^i]_t = t, \quad i = 1, \ldots, d.
\]
However, if we define the cross-variation between $W^i$ and $W^j$ as
\[
[W^i, W^j]_t := \lim_{\|P\| \to 0} \sum_{k=0}^{n-1} (W^i_{t_{k+1}} - W^i_{t_k})(W^j_{t_{k+1}} - W^j_{t_k}), \quad i, j = 1, \ldots, d,
\]
where $P = \{t_0, t_1, \ldots, t_n\}$ is a partition of $[0, t]$, then we have:

Theorem 10.2. If $i \neq j$, then
\[
[W^i, W^j]_t = 0.
\]
Proof. Let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, t]$. For $i \neq j$, define the sample cross-variation of $W^i$ and $W^j$ on $[0, t]$ to be
\[
C_P := \sum_{k=0}^{n-1} (W^i_{t_{k+1}} - W^i_{t_k})(W^j_{t_{k+1}} - W^j_{t_k}).
\]
The increments appearing on the RHS of the above equation are all independent of one another and all have mean zero. Therefore
\[
\mathbb{E}[C_P] = 0.
\]
We compute $\mathrm{var}(C_P) = \mathbb{E}[C_P^2]$. First note that
\begin{align*}
C_P^2 = {} & \sum_{k=0}^{n-1} (W^i_{t_{k+1}} - W^i_{t_k})^2 (W^j_{t_{k+1}} - W^j_{t_k})^2 \\
& + 2 \sum_{\ell < k} (W^i_{t_{\ell+1}} - W^i_{t_\ell})(W^j_{t_{\ell+1}} - W^j_{t_\ell})(W^i_{t_{k+1}} - W^i_{t_k})(W^j_{t_{k+1}} - W^j_{t_k}).
\end{align*}
All the increments appearing in the sum of cross terms are independent of one another and have mean zero. Therefore
\[
\mathrm{var}(C_P) = \mathbb{E}[C_P^2] = \sum_{k=0}^{n-1} \mathbb{E}\left[ (W^i_{t_{k+1}} - W^i_{t_k})^2 (W^j_{t_{k+1}} - W^j_{t_k})^2 \right].
\]
But $(W^i_{t_{k+1}} - W^i_{t_k})^2$ and $(W^j_{t_{k+1}} - W^j_{t_k})^2$ are independent of one another, and each has expectation $(t_{k+1} - t_k)$. It follows that
\[
\mathrm{var}(C_P) = \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \leq \|P\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = \|P\| t.
\]
As $\|P\| \to 0$ we have $\mathrm{var}(C_P) \to 0$, so $C_P$ converges in mean square (and in fact also almost surely, though we do not prove this here) to the constant $\mathbb{E}[C_P] = 0$. $\square$
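The contrast between the quadratic variation $[W^i]_t = t$ and the vanishing cross-variation can be seen directly on simulated increments. A minimal sketch (not part of the notes; grid and horizon are illustrative):

```python
import numpy as np

# Sample cross-variation of two independent Brownian motions on [0, t]: the sum
# of products of increments should vanish as the mesh shrinks, while the
# quadratic variation of each component stays close to t.
rng = np.random.default_rng(4)
t, n = 1.0, 1_000_000
dW1 = np.sqrt(t / n) * rng.standard_normal(n)
dW2 = np.sqrt(t / n) * rng.standard_normal(n)
cross = np.sum(dW1 * dW2)   # approximates [W^1, W^2]_t = 0
qv1 = np.sum(dW1**2)        # approximates [W^1]_t = t
print(cross, qv1)
```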



10.1.1. Lévy's characterisation of Brownian motion. Lévy's characterisation of BM (as given in Theorem 7.15) extends to the multi-dimensional case (see Shreve [17], Section 4.6.3 for more details). You are expected to know this fact, but are not expected to be able to prove it in the multi-dimensional case.

Theorem 10.3 (Lévy's theorem, d-dimensional). Let $M$ be a $d$-dimensional martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M^i, M^j]_t = \delta_{ij} t$ for all $t \geq 0$. Then $M$ is a $d$-dimensional BM.
10.2. Two-dimensional Itô formula. There is a multi-dimensional version of the Itô formula. We content ourselves for now with the following two-dimensional version. The formula generalises (as we shall see) to any number of processes driven by a Brownian motion of any number (not necessarily the same number) of dimensions. Let $W := (W^1, W^2)'$ be a two-dimensional Brownian motion (so that $W^1, W^2$ are independent Brownian motions), and let $X := (X^1, X^2)'$ be a two-dimensional Itô process following
\[
(10.1) \qquad dX_t = a_t \, dt + b_t \, dW_t,
\]
where
\[
a_t = \begin{pmatrix} a^1_t \\ a^2_t \end{pmatrix}, \qquad b_t = \begin{pmatrix} b^{11}_t & b^{12}_t \\ b^{21}_t & b^{22}_t \end{pmatrix},
\]
so that (10.1) is equivalent to
\begin{align*}
dX^1_t &= a^1_t \, dt + b^{11}_t \, dW^1_t + b^{12}_t \, dW^2_t, \\
dX^2_t &= a^2_t \, dt + b^{21}_t \, dW^1_t + b^{22}_t \, dW^2_t,
\end{align*}
or in integral form
\begin{align*}
X^1_t &= x^1 + \int_0^t \left( a^1_s \, ds + b^{11}_s \, dW^1_s + b^{12}_s \, dW^2_s \right), \\
X^2_t &= x^2 + \int_0^t \left( a^2_s \, ds + b^{21}_s \, dW^1_s + b^{22}_s \, dW^2_s \right),
\end{align*}
or in compact form
\[
X_t = x + \int_0^t (a_s \, ds + b_s \, dW_s), \quad t \geq 0, \qquad x = \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}.
\]
Such processes, consisting of a non-random initial condition, plus a Riemann integral, plus one or more Itô integrals, are examples of semimartingales. The integrands $a_s, b_s$ can be any adapted processes such that the relevant integrals exist. The adaptedness of the integrands guarantees that $X$ is also adapted.


Theorem 10.4 (Two-dimensional Itô formula). Let $f(t, x_1, x_2)$ be a (sufficiently smooth, say $C^{1,2}$) function $f : [0, \infty) \times \mathbb{R}^2 \to \mathbb{R}$. Then the process $Y := (Y_t)_{t \geq 0}$ defined by $Y_t := f(t, X^1_t, X^2_t) \equiv f(t, X_t)$ follows
\begin{align*}
dY_t = {} & f_t(t, X^1_t, X^2_t) \, dt + f_{x_1}(t, X^1_t, X^2_t) \, dX^1_t + f_{x_2}(t, X^1_t, X^2_t) \, dX^2_t \\
& + \frac{1}{2} f_{x_1 x_1}(t, X^1_t, X^2_t) \, d[X^1]_t + \frac{1}{2} f_{x_2 x_2}(t, X^1_t, X^2_t) \, d[X^2]_t + f_{x_1 x_2}(t, X^1_t, X^2_t) \, d[X^1, X^2]_t,
\end{align*}
where $d[X^i, X^j]_t = dX^i_t \, dX^j_t$, $i, j = 1, 2$, are computed according to the rules
\[
dt \, dt = dt \, dW^i_t = dW^i_t \, dt = 0, \qquad dW^i_t \, dW^j_t = \delta_{ij} \, dt,
\]
with
\[
\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}
\]
In integral form the theorem is
\begin{align*}
Y_t - Y_0 = {} & f(t, X^1_t, X^2_t) - f(0, X^1_0, X^2_0) \\
= {} & \int_0^t f_t(s, X^1_s, X^2_s) \, ds + \int_0^t f_{x_1}(s, X^1_s, X^2_s) \, dX^1_s + \int_0^t f_{x_2}(s, X^1_s, X^2_s) \, dX^2_s \\
& + \frac{1}{2} \int_0^t f_{x_1 x_1}(s, X^1_s, X^2_s) \, d[X^1]_s + \frac{1}{2} \int_0^t f_{x_2 x_2}(s, X^1_s, X^2_s) \, d[X^2]_s \\
& + \int_0^t f_{x_1 x_2}(s, X^1_s, X^2_s) \, d[X^1, X^2]_s.
\end{align*}

10.2.1. Markovian diffusion case. If, in (10.1), we have $a_t = a(t, X_t)$, $b_t = b(t, X_t)$ for well-behaved functions $a(t, x), b(t, x)$ (Lipschitz continuity and linear growth conditions are usually sufficient), so that
\[
dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t,
\]
then the process $X$ is Markovian:
\[
\mathbb{E}[h(X_T) \,|\, \mathcal{F}_t] = \mathbb{E}[h(X_T) \,|\, X_t], \quad 0 \leq t \leq T.
\]
The integral equation for $Y$ may be written
\[
Y_t = Y_0 + \int_0^t \left( f_t(s, X_s) + A f(s, X_s) \right) ds + \int_0^t (\nabla f(s, X_s))' \, b(s, X_s) \, dW_s,
\]
where $A$ is the generator of the two-dimensional diffusion $X$, defined by
\[
(10.2) \qquad A f(t, x) := \sum_{i=1}^2 a^i(t, x) f_{x_i}(t, x) + \frac{1}{2} \sum_{i=1}^2 \sum_{j=1}^2 (bb')^{ij}(t, x) f_{x_i x_j}(t, x),
\]
where $'$ denotes matrix transposition, and where
\[
\nabla f(t, x) = \begin{pmatrix} f_{x_1}(t, x) \\ f_{x_2}(t, x) \end{pmatrix}.
\]
Example 10.5 (The product rule). Let $X, Y$ be two one-dimensional Itô processes. Applying the two-dimensional Itô formula with $f(x, y) = xy$ we can derive the product rule
\[
d(X_t Y_t) = X_t \, dY_t + Y_t \, dX_t + d[X, Y]_t,
\]
or, in integral form, with $X_0 = x$, $Y_0 = y$,
\[
X_t Y_t = xy + \int_0^t X_s \, dY_s + \int_0^t Y_s \, dX_s + \int_0^t d[X, Y]_s, \quad t \geq 0.
\]
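The role of the bracket term is visible numerically in the simplest case $X = Y = W$, where the product rule gives $W_t^2 = 2\int_0^t W_s \, dW_s + t$ (since $[W, W]_t = t$). A sketch using left-endpoint (Itô) sums, with an illustrative grid (not part of the notes):

```python
import numpy as np

# Product rule check with X = Y = W: W_t^2 = 2 * int_0^t W_s dW_s + t.
# Discretise the Ito integral with left-endpoint sums (the Ito convention).
rng = np.random.default_rng(5)
t, n = 1.0, 1_000_000
dW = np.sqrt(t / n) * rng.standard_normal(n)
W = np.concatenate(([0.0], np.cumsum(dW)))
ito_integral = np.sum(W[:-1] * dW)   # int_0^t W_s dW_s
lhs = W[-1]**2
rhs = 2 * ito_integral + t
print(lhs, rhs)  # the two sides should agree closely
```

Without the correction term $t$ the two sides would differ systematically, which is exactly the failure of the classical product rule for Brownian paths.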


10.3. Multidimensional Itô formula.

10.3.1. Multidimensional Itô process. Let $W_t = (W^1_t, \ldots, W^d_t)'$ be a vector of $d$ independent Brownian motions, that is, $W$ is a $d$-dimensional Brownian motion. We can use the Brownian motion vector to form the following $n$ Itô processes $X^1_t, \ldots, X^n_t$:
\begin{align*}
dX^1_t &= a^1_t \, dt + b^{11}_t \, dW^1_t + \cdots + b^{1d}_t \, dW^d_t \\
&\;\; \vdots \\
dX^n_t &= a^n_t \, dt + b^{n1}_t \, dW^1_t + \cdots + b^{nd}_t \, dW^d_t,
\end{align*}
or, in matrix notation, with $X = (X^1, \ldots, X^n)'$,
\[
(10.3) \qquad dX_t = a_t \, dt + b_t \, dW_t,
\]
where
\[
(10.4) \qquad X_t = \begin{pmatrix} X^1_t \\ \vdots \\ X^n_t \end{pmatrix}, \quad a_t = \begin{pmatrix} a^1_t \\ \vdots \\ a^n_t \end{pmatrix}, \quad b_t = \begin{pmatrix} b^{11}_t & \cdots & b^{1d}_t \\ \vdots & & \vdots \\ b^{n1}_t & \cdots & b^{nd}_t \end{pmatrix}.
\]
Note that the coefficients $a$ and $b$ are required to satisfy certain conditions so that the integrals implicit in the above equations are well defined. In particular, their elements should all be adapted processes, so that their values at time $t$ are known given the information $\mathcal{F}_t$.
Theorem 10.6 (Multidimensional Itô formula). Suppose $X_t$ satisfies (10.3). Let
\[
f(t, x) = (f^1(t, x), \ldots, f^p(t, x))'
\]
be a twice differentiable map from $[0, \infty) \times \mathbb{R}^n$ into $\mathbb{R}^p$. Then the process $Y_t := f(t, X_t)$ is again an Itô process, whose $k$-th component, $Y^k_t$, is given by the multidimensional Itô formula as
\[
(10.5) \qquad dY^k_t = \frac{\partial f^k}{\partial t}(t, X_t) \, dt + \sum_{i=1}^n \frac{\partial f^k}{\partial x_i}(t, X_t) \, dX^i_t + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f^k}{\partial x_i \partial x_j}(t, X_t) \, d[X^i, X^j]_t,
\]
where $d[X^i, X^j]_t = dX^i_t \, dX^j_t$ is computed according to the rules
\[
dW^i_t \, dW^j_t = \delta_{ij} \, dt, \qquad dt \, dt = dW^i_t \, dt = dt \, dW^i_t = 0.
\]

Example 10.7. Let $W = (W^1, \ldots, W^n)'$ be Brownian motion in $\mathbb{R}^n$, for $n \geq 2$. Consider
\[
R_t := |W_t| = \left( (W^1_t)^2 + \cdots + (W^n_t)^2 \right)^{1/2},
\]
which is a process describing the distance of the $n$-dimensional Brownian motion from the origin. Now, the function $f(t, x) = |x|$ is not differentiable at the origin, but since $W_t$ never hits the origin (almost surely, or with probability one) when $n \geq 2$ (see, for example, Øksendal [15], Exercise 9.7), the multidimensional Itô formula still works.

Take $X_t = W_t$, so that $dX_t = dW_t$, and consider the process $Y_t = R_t = f(t, X_t) = f(t, W_t) = |W_t|$. Then $f(t, x) = (x_1^2 + \cdots + x_n^2)^{1/2}$, so that
\[
\frac{\partial f}{\partial t} = 0, \qquad \frac{\partial f}{\partial x_i} = \frac{x_i}{|x|}, \qquad \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\delta_{ij}}{|x|} - \frac{x_i x_j}{|x|^3}.
\]
Then
\[
dR_t = \sum_{i=1}^n \frac{W^i_t}{|W_t|} \, dW^i_t + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \left( \frac{\delta_{ij}}{|W_t|} - \frac{W^i_t W^j_t}{|W_t|^3} \right) \delta_{ij} \, dt = \sum_{i=1}^n \frac{W^i_t \, dW^i_t}{R_t} + \frac{n-1}{2 R_t} \, dt.
\]


10.4. Multi-dimensional Feynman-Kac theorem. Recall the connection between stochastic calculus for Markov diffusions and partial differential equations (PDEs), the Feynman-Kac theorem (Theorem 9.16). There is an obvious generalisation to a multi-dimensional situation. We content ourselves with the following two-dimensional version. Suppose we have a two-dimensional diffusion $X = (X^1, X^2)'$ following
\[
(10.6) \qquad dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t,
\]
where
\[
a(t, X_t) = \begin{pmatrix} a^1(t, X_t) \\ a^2(t, X_t) \end{pmatrix}, \qquad b(t, X_t) = \begin{pmatrix} b^{11}(t, X_t) & b^{12}(t, X_t) \\ b^{21}(t, X_t) & b^{22}(t, X_t) \end{pmatrix},
\]
so that (10.6) is equivalent to
\begin{align*}
dX^1_t &= a^1(t, X_t) \, dt + b^{11}(t, X_t) \, dW^1_t + b^{12}(t, X_t) \, dW^2_t, \\
dX^2_t &= a^2(t, X_t) \, dt + b^{21}(t, X_t) \, dW^1_t + b^{22}(t, X_t) \, dW^2_t.
\end{align*}
Let $h(x) \equiv h(x_1, x_2)$ be a function $h : \mathbb{R}^2 \to \mathbb{R}$. Define the function
\[
(10.7) \qquad v(t, x) := \mathbb{E}[h(X_T) \,|\, X_t = x], \quad 0 \leq t \leq T.
\]
The generator of the diffusion (10.6) is $A$, given by (10.2):
\[
A f(t, x) := \sum_{i=1}^2 a^i(t, x) f_{x_i}(t, x) + \frac{1}{2} \sum_{i=1}^2 \sum_{j=1}^2 (bb')^{ij}(t, x) f_{x_i x_j}(t, x).
\]
Theorem 10.8 (Feynman-Kac, two-dimensional). The function $v(t, x)$ in (10.7) satisfies the PDE
\[
v_t(t, x) + A v(t, x) = 0, \quad v(T, x) = h(x),
\]
where $A$ is the generator of the diffusion (10.6).
10.5. The Girsanov Theorem*. Given a Brownian motion $W := (W_t)_{0 \leq t \leq T}$ on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ with the filtration $\mathbb{F} := (\mathcal{F}_t)_{0 \leq t \leq T}$ being that generated by $W$, and given an adapted process $\theta := (\theta_t)_{0 \leq t \leq T}$, define the (local) martingale $Z$ by
\[
Z_t := \mathcal{E}(-\theta \cdot W)_t := \exp\left( -\int_0^t \theta_s \, dW_s - \frac{1}{2} \int_0^t \theta_s^2 \, ds \right), \quad 0 \leq t \leq T,
\]
where $\mathcal{E}$ is the so-called Doléans exponential. We have that $Z$ follows
\[
dZ_t = -\theta_t Z_t \, dW_t.
\]
Then, provided $\theta$ satisfies the Novikov condition
\[
(10.8) \qquad \mathbb{E}\left[ \exp\left( \frac{1}{2} \int_0^T \theta_t^2 \, dt \right) \right] < \infty,
\]
we can define a new probability measure $\mathbb{Q} \sim \mathbb{P}$ on $\mathcal{F} \equiv \mathcal{F}_T$ by
\[
\mathbb{Q}(A) = \int_A Z_T \, d\mathbb{P}, \quad A \in \mathcal{F},
\]
and the process
\[
W^{\mathbb{Q}}_t := W_t + \int_0^t \theta_s \, ds, \quad 0 \leq t \leq T,
\]
is a $\mathbb{Q}$-Brownian motion. We write $Z_T = \frac{d\mathbb{Q}}{d\mathbb{P}}$, and we have, for any $\mathcal{F}$-measurable random variable $X$,
\[
(10.9) \qquad \mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[X Z_T].
\]


Remark 10.9. The Novikov condition (10.8) is sufficient to guarantee that $Z$ is a $(\mathbb{P}, \mathbb{F})$-martingale, so that $\mathbb{E}[Z_T] = 1$ and $\mathbb{Q}$ is indeed a probability measure.

As well as (10.9) we have the following results connecting conditional expectations under $\mathbb{Q}$ and $\mathbb{P}$.
- Let $0 \leq t \leq T$. If $X$ is $\mathcal{F}_t$-measurable, then $\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[X Z_t]$.
- The Bayes formula: if $X$ is $\mathcal{F}_t$-measurable and $0 \leq s \leq t \leq T$, then $Z_s \, \mathbb{E}^{\mathbb{Q}}[X | \mathcal{F}_s] = \mathbb{E}[X Z_t | \mathcal{F}_s]$.
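The change of measure can be checked by Monte Carlo for constant $\theta$: formula (10.9) gives $\mathbb{E}^{\mathbb{Q}}[W_T] = \mathbb{E}[Z_T W_T]$, and since $W_T + \theta T$ is a centred $\mathbb{Q}$-Gaussian this should equal $-\theta T$. A sketch with illustrative parameters (not part of the notes):

```python
import numpy as np

# Girsanov check for constant theta: Z_T = exp(-theta*W_T - 0.5*theta^2*T),
# E[Z_T] = 1 (so Q is a probability measure) and E[Z_T * W_T] = -theta*T,
# i.e. under Q the Brownian motion W acquires drift -theta.
rng = np.random.default_rng(6)
theta, T, n = 0.5, 1.0, 2_000_000
W_T = np.sqrt(T) * rng.standard_normal(n)
Z_T = np.exp(-theta * W_T - 0.5 * theta**2 * T)
print(Z_T.mean())          # close to 1
print((Z_T * W_T).mean())  # close to -theta*T
```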
There is a multi-dimensional version of Girsanov's Theorem. Once again we content ourselves with a two-dimensional version. Given a two-dimensional Brownian motion $W = (W^1, W^2)'$ on a stochastic basis $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{0 \leq t \leq T}, \mathbb{P})$, and a two-dimensional adapted process $\theta = (\theta^1, \theta^2)'$, define a (local) martingale $Z$ by
\begin{align*}
Z_t &= \mathcal{E}(-\theta \cdot W)_t \equiv \mathcal{E}(-\theta^1 \cdot W^1 - \theta^2 \cdot W^2)_t \\
&= \exp\left( -\int_0^t \theta^1_s \, dW^1_s - \int_0^t \theta^2_s \, dW^2_s - \frac{1}{2} \int_0^t \left( (\theta^1_s)^2 + (\theta^2_s)^2 \right) ds \right).
\end{align*}
Then, provided we have the two-dimensional Novikov condition
\[
(10.10) \qquad \mathbb{E}\left[ \exp\left( \frac{1}{2} \int_0^T \left( (\theta^1_t)^2 + (\theta^2_t)^2 \right) dt \right) \right] < \infty,
\]
we can define a new probability measure $\mathbb{Q} \sim \mathbb{P}$ on $\mathcal{F} \equiv \mathcal{F}_T$ by
\[
\mathbb{Q}(A) = \int_A Z_T \, d\mathbb{P}, \quad A \in \mathcal{F},
\]
and the process $W^{\mathbb{Q}} \equiv (W^{\mathbb{Q},1}, W^{\mathbb{Q},2})'$ defined by
\[
W^{\mathbb{Q},1}_t := W^1_t + \int_0^t \theta^1_s \, ds, \qquad W^{\mathbb{Q},2}_t := W^2_t + \int_0^t \theta^2_s \, ds,
\]
is a two-dimensional $\mathbb{Q}$-Brownian motion.


11. The Black-Scholes-Merton model

This is the classical option pricing model dating back to Black-Scholes [3] and Merton [14]. The basic idea is that one can use absence of arbitrage to value an option on a stock: if the option is valued correctly, then one should not be able to make sure profits by taking positions in the option and the stock. In our rendition below we employ the equivalent notion that one can trade the underlying stock to reproduce the option payoff, and in the absence of arbitrage the wealth of the replicating portfolio must be the option value.

We are given a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ on which we define a standard BM $W$. A single stock follows the SDE
\[
dS_t = \mu S_t \, dt + \sigma S_t \, dW_t,
\]
where the drift $\mu$ and volatility $\sigma$ are constants. By the Itô formula
\[
d(\log S_t) = \left( \mu - \frac{1}{2} \sigma^2 \right) dt + \sigma \, dW_t,
\]
so that
\[
S_t = S_0 \exp\left( \left( \mu - \frac{1}{2} \sigma^2 \right) t + \sigma W_t \right), \quad t \geq 0.
\]
There is also a riskless asset with price process $S^{(0)}$ following
\[
dS^{(0)}_t = r S^{(0)}_t \, dt, \quad S^{(0)}_0 = 1,
\]
where $r \geq 0$ is the interest rate. Hence the price of the riskless asset is given by the usual accumulation factor
\[
S^{(0)}_t = \exp(rt), \quad t \geq 0.
\]
The model makes a number of idealised assumptions, of a continuous-time frictionless market. This means that continuous trading is possible, with the absence of any exploitable arbitrage opportunities; there are no trading costs or limits or taxes (so assets can be held in any amount); assets are divisible and short-selling is always permitted. We also assume constant parameters in the price model (this can be relaxed to some extent) and that the stock pays no dividends (this can be relaxed).
11.1. Portfolio wealth evolution. An agent trades a portfolio of stock and cash in a self-financing manner, meaning all profits and losses are generated by price changes and by adjusting the proportion of wealth allocated to the stock and the bond. Let $X = (X_t)_{0 \leq t \leq T}$ denote the wealth process of the agent.

Denote the (adapted) processes for the number of shares in the bond and in the stock by $H^{(0)}$ and $H$, so that the wealth of the agent at time $t \in [0, T]$ is
\[
(11.1) \qquad X_t := H^{(0)}_t S^{(0)}_t + H_t S_t, \quad 0 \leq t \leq T.
\]
The self-financing condition asserts that once we set the portfolio up we will neither put any more money into it nor take any out; any increase in the number of stocks must be financed by selling bonds, any increase in the number of bonds must be financed by selling stocks, and nothing is sold unless the funds are needed to buy something else. In other words, on seeing the new bond and stock prices and deciding how many units of each to buy and sell, the change in the number of stocks must be financed by the change in the number of bonds, and vice versa.

Lemma 11.1. A self-financing portfolio satisfies
\[
(11.2) \qquad S_t \, dH_t + d[S, H]_t + S^{(0)}_t \, dH^{(0)}_t + d[S^{(0)}, H^{(0)}]_t = 0 \quad \text{(self-financing condition)}.
\]

Proof. In the time interval $[t, t + dt)$ the wealth evolves to $X_t + dX_t$, given by
\begin{align*}
X_t + dX_t &= H_t (S_t + dS_t) + H^{(0)}_t (S^{(0)}_t + dS^{(0)}_t) \\
&= (H_t + dH_t)(S_t + dS_t) + (H^{(0)}_t + dH^{(0)}_t)(S^{(0)}_t + dS^{(0)}_t),
\end{align*}
the last equality following from the rebalancing of the portfolio to new positions $(H_t + dH_t, H^{(0)}_t + dH^{(0)}_t)$. Hence, the self-financing portfolio satisfies
\[
(S_t + dS_t) \, dH_t + (S^{(0)}_t + dS^{(0)}_t) \, dH^{(0)}_t = 0,
\]
which is the same as (11.2). $\square$



Applying the Itô product rule to the definition (11.1) of the wealth process we have
\[
dX_t = H_t \, dS_t + S_t \, dH_t + d[S, H]_t + H^{(0)}_t \, dS^{(0)}_t + S^{(0)}_t \, dH^{(0)}_t + d[S^{(0)}, H^{(0)}]_t,
\]
and augmenting this with the self-financing condition (11.2) we arrive at the evolution of the wealth of a self-financing portfolio, given by
\[
dX_t = H_t \, dS_t + H^{(0)}_t \, dS^{(0)}_t = H_t \, dS_t + r H^{(0)}_t S^{(0)}_t \, dt.
\]
Many books simply take this as a definition of a self-financing portfolio. Using the definition (11.1) of $X$ we can write
\[
dX_t = H_t \, dS_t + r(X_t - H_t S_t) \, dt.
\]


11.2. Perfect hedging. Consider selling a European claim with payoff $h(S_T)$ at time zero. Suppose that there exists a function $v : [0, T] \times \mathbb{R}_+ \to \mathbb{R}_+$ such that the claim's value at $t \in [0, T]$ is $v(t, S_t)$ (so we suppose the claim is sold for $v(0, S_0)$). The goal is to characterise the function $v(t, x)$ that is consistent with the no-arbitrage principle.

We suppose that the proceeds from the option sale are invested in a self-financing portfolio, so $X_0 = v(0, S_0)$. We want to show that we can achieve replication, that is, find a portfolio whose final value matches the option payoff. To achieve this we insist that the portfolio wealth matches the option value at all times $t \in [0, T]$:
\[
X_t = v(t, S_t), \quad 0 \leq t \leq T.
\]
For this to hold we will also require
\[
(11.3) \qquad dX_t = dv(t, S_t).
\]
By the Itô formula, the infinitesimal change in the process $v(t, S_t)$ is
\[
dv(t, S_t) = v_t(t, S_t) \, dt + v_x(t, S_t) \, dS_t + \frac{1}{2} v_{xx}(t, S_t) \, d[S]_t.
\]
Impose (11.3). Equating terms multiplying $dS_t$ gives the appropriate holding of shares as
\[
(11.4) \qquad H_t = v_x(t, S_t) =: \Delta_t, \quad 0 \leq t \leq T.
\]
This is the celebrated Black-Scholes (BS) delta hedging rule, and the quantity in (11.4) is called the delta of the claim.

Using (11.4) and equating terms multiplying $dt$ in (11.3) yields that the option pricing function $v(t, x)$ must satisfy the BS PDE
\[
v_t(t, x) + r x v_x(t, x) + \frac{1}{2} \sigma^2 x^2 v_{xx}(t, x) - r v(t, x) = 0.
\]
For $v$ to represent the option pricing function we would also require the terminal condition $v(T, S_T) = h(S_T)$. We then have a terminal value problem for $v$:
\[
(11.5) \qquad v_t(t, x) + r x v_x(t, x) + \frac{1}{2} \sigma^2 x^2 v_{xx}(t, x) - r v(t, x) = 0, \quad v(T, x) = h(x).
\]
Provided we can solve this PDE, then, to avoid arbitrage, $v(t, S_t)$ must be the unique option price at time $t \in [0, T]$. If it were not, an immediate arbitrage opportunity affords itself. For instance, if the claim is available in the market at time zero at a price $V_0 > v(0, S_0)$, then one can sell the claim and invest in the replicating portfolio. The excess $V_0 - v(0, S_0)$ can be invested in the bank account. At time $T$, one uses the proceeds from the replicating portfolio to pay one's obligations under the claim, leaving a profit of $(V_0 - v(0, S_0)) e^{rT} > 0$. A symmetric argument is possible if $V_0 < v(0, S_0)$, with reversed positions in the claim and the replicating portfolio.
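The delta hedging rule (11.4) can be illustrated numerically (a sketch, not part of the notes): simulate one stock path under $\mathbb{P}$, hold $H_t = v_x(t, S_t)$ shares of a call, keep the remainder of the wealth in the bank at rate $r$, and check that the terminal wealth is close to the payoff. The call price and delta used below are the Black-Scholes formulae derived in Section 11.4; the parameters are illustrative.

```python
import numpy as np
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal cdf

def bs_call(t, x, K, r, sigma, T):
    """Black-Scholes call price v(t, x) and delta v_x(t, x) = Phi(y)."""
    tau = T - t
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    price = x * N(y) - K * exp(-r * tau) * N(y - sigma * sqrt(tau))
    return price, N(y)

# Start with X_0 = v(0, S_0), rebalance to H_t = v_x(t, S_t) along one path
# simulated under P (drift mu), investing the residual at rate r.
rng = np.random.default_rng(7)
S0, K, r, sigma, mu, T, n = 1.0, 1.0, 0.1, 0.25, 0.15, 0.5, 20_000
dt = T / n
S = S0
X, H = bs_call(0.0, S0, K, r, sigma, T)
for k in range(n):
    S_new = S * exp((mu - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * rng.standard_normal())
    X = X + H * (S_new - S) + r * (X - H * S) * dt   # self-financing wealth update
    S = S_new
    if k < n - 1:
        _, H = bs_call((k + 1) * dt, S, K, r, sigma, T)

payoff = max(S - K, 0.0)
print(abs(X - payoff))  # replication error, small for large n
```

The replication error shrinks as the rebalancing frequency increases, and the result does not depend on the drift $\mu$ chosen for the simulation, consistent with the absence of $\mu$ from the BS PDE.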

11.2.1. Riskless portfolio argument. An equivalent route to the BS PDE is to construct a riskless portfolio involving the option and the stock. Take a position of one unit in the claim and a short position of $H$ shares, that is, a position $-H$ in the stock, so the overall portfolio has wealth $Y$ given by
\[
Y_t = v(t, S_t) - H_t S_t,
\]
where we once again assume the price process for the claim is given by some function $v(t, S_t)$. The dynamics of the portfolio are given by
\[
dY_t = dv(t, S_t) - H_t \, dS_t = v_t(t, S_t) \, dt + v_x(t, S_t) \, dS_t + \frac{1}{2} v_{xx}(t, S_t) \, d[S]_t - H_t \, dS_t.
\]
Choose $H$ such that the terms involving $dS_t$ vanish (so that the terms involving $dW_t$ vanish). This implies that $H$ must be chosen such that
\[
H_t = v_x(t, S_t), \quad 0 \leq t \leq T,
\]
matching the delta hedging condition we found earlier. With this choice, the portfolio value will only contain a finite variation term (that is, a $dt$ term) in its dynamics, so in the absence of arbitrage it must be a riskless portfolio satisfying
\[
dY_t = r Y_t \, dt.
\]
Combining this with the above choice for $H$ yields once again that the pricing function $v$ must satisfy the BS PDE as before.

Notice that the BS PDE has no dependence on the stock's $\mathbb{P}$-drift $\mu$. This is a legacy of removing all risk associated with the claim.
11.3. Feynman-Kac solution of the BSM equation. By the Feynman-Kac theorem, a solution to (11.5) is given by
\[
(11.6) \qquad v(t, x) = \mathbb{E}^{\mathbb{Q}}[e^{-r(T-t)} h(S_T) \,|\, S_t = x],
\]
where $\mathbb{E}^{\mathbb{Q}}$ denotes expectation under a measure $\mathbb{Q} \sim \mathbb{P}$, under which $S$ follows
\[
(11.7) \qquad dS_t = r S_t \, dt + \sigma S_t \, dW^{\mathbb{Q}}_t,
\]
where $W^{\mathbb{Q}}$ is a $\mathbb{Q}$-Brownian motion. This measure is called a risk-neutral measure, or an equivalent local martingale measure (ELMM), because under it, the discounted stock price is a local martingale:
\[
d(e^{-rt} S_t) = \sigma e^{-rt} S_t \, dW^{\mathbb{Q}}_t.
\]
We will see later a probabilistic argument that gives a justification for the risk-neutral valuation result (11.6).

Note that, under the measure $\mathbb{Q}$, the stock price has an average growth rate of $r$, so in this sense behaves like a riskless asset. This is no accident. The measure $\mathbb{Q}$ has arisen out of an argument in which all risk associated with a claim was eliminated by dynamic trading. The result of this is that the claim can be priced by expectation with the caveat that one treats the stock as though its price grows, on average, like that of a riskless asset. For this reason the formula (11.6) is often called a risk-neutral valuation formula.
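Formula (11.6) lends itself directly to Monte Carlo valuation: simulate $S_T = x \exp\left((r - \tfrac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,Z\right)$ with $Z \sim N(0,1)$, average the discounted payoff, and compare with the closed form of Section 11.4. A sketch with illustrative parameters (not part of the notes):

```python
import numpy as np
from math import exp, log, sqrt
from statistics import NormalDist

# Risk-neutral Monte Carlo valuation of a call versus the Black-Scholes formula.
rng = np.random.default_rng(8)
x, K, r, sigma, t, T, n = 1.0, 1.0, 0.1, 0.25, 0.0, 0.5, 2_000_000
tau = T - t
Z = rng.standard_normal(n)
S_T = x * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * Z)
mc_price = exp(-r * tau) * np.maximum(S_T - K, 0.0).mean()

N = NormalDist().cdf
y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
bs_price = x * N(y) - K * exp(-r * tau) * N(y - sigma * sqrt(tau))
print(mc_price, bs_price)  # should agree closely
```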
11.4. BS option pricing formulae. We will use the risk-neutral valuation formula
\[
v(t, x) = \mathbb{E}^{\mathbb{Q}}[e^{-r(T-t)} h(S_T) \,|\, S_t = x],
\]
to derive formulae for some European options.

11.4.1. European call price. For a European call, $h(x) = (x - K)^+$, and under the ELMM $\mathbb{Q}$ the log-stock price is Gaussian. Given $S_t = x$, under $\mathbb{Q}$ we have
\[
\log S_T = \log x + \left( r - \frac{1}{2} \sigma^2 \right)(T - t) + \sigma (W^{\mathbb{Q}}_T - W^{\mathbb{Q}}_t), \quad 0 \leq t \leq T.
\]
Hence the probability law of $\log S_T$ under $\mathbb{Q}$, given $S_t = x$, is
\[
\mathrm{Law}^{\mathbb{Q}}[\log S_T \,|\, S_t = x] = N(m(x, t, T), \Sigma^2(t, T)), \quad 0 \leq t \leq T,
\]
where $N(m, s^2)$ denotes the Gaussian probability law of mean $m$ and variance $s^2$, and where
\[
m(x, t, T) = \log x + \left( r - \frac{1}{2} \sigma^2 \right)(T - t), \qquad \Sigma^2(t, T) = \sigma^2 (T - t), \quad 0 \leq t \leq T.
\]
In terms of $Y := \log S$, we have, writing $c(t, x)$ for the call option pricing function:
\[
c(t, x) = e^{-r(T-t)} \mathbb{E}^{\mathbb{Q}}\left[ (e^{Y_T} - K) \mathbf{1}_{\{Y_T > \log K\}} \,\big|\, Y_t = \log x \right].
\]
Then an easy computation using Gaussian integrals gives the celebrated Black-Scholes formula for a call option as $c(t, S_t)$, where
\[
(11.8) \qquad c(t, x) = x \Phi(y) - K e^{-r(T-t)} \Phi(y - \sigma\sqrt{T - t}),
\]
\[
(11.9) \qquad y = \frac{1}{\sigma\sqrt{T - t}} \left[ \log\left( \frac{x}{K} \right) + \left( r + \frac{1}{2} \sigma^2 \right)(T - t) \right],
\]
where $\Phi(\cdot)$ denotes the standard cumulative normal distribution function, defined by
\[
\Phi(y) := \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right) du,
\]
so that $\Phi(y)$ is the probability that a standard normal random variable (one with mean zero and variance 1) is less than or equal to $y$. We have $\Phi(-y) = 1 - \Phi(y)$, by the symmetry with respect to negation of the function $\exp(-u^2/2)$. The call price function is plotted as a function of stock price in Figure 10.
[Figure 10. Black-Scholes call value as a function of stock price, showing the intrinsic and time value of a European call. The parameters are $K = 10$, $r = 10\%$, $\sigma = 25\%$, $T = 1$ year.]


11.4.2. European put price. The Black-Scholes put valuation formula can be obtained from (11.8) by put-call parity, as
\[
p(t, S_t) = K e^{-r(T-t)} \Phi(-y + \sigma\sqrt{T - t}) - S_t \Phi(-y),
\]
where we have used the property $\Phi(-y) = 1 - \Phi(y)$.
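Formulae (11.8) and the put formula above are straightforward to implement, and put-call parity $c - p = x - K e^{-r(T-t)}$ provides a built-in consistency check. A sketch (parameters as in Figure 10; not part of the notes):

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cdf

def bs_call_put(t, x, K, r, sigma, T):
    """Black-Scholes call price (11.8) and the put price obtained by parity."""
    tau = T - t
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    call = x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))
    put = K * exp(-r * tau) * Phi(-y + sigma * sqrt(tau)) - x * Phi(-y)
    return call, put

c, p = bs_call_put(0.0, 10.0, 10.0, 0.10, 0.25, 1.0)
print(c, p)  # put-call parity: c - p = x - K*exp(-r*(T-t))
```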
11.5. Sensitivity parameters (Greeks)*.

11.5.1. Delta. The derivatives of the call pricing function $c(t, x)$ with respect to various variables are called sensitivity parameters, or Greeks. We have already come across one of these, the delta, which is
\[
c_x(t, x) = \Phi(y),
\]
with $y$ defined in (11.9). The call option delta is plotted as a function of stock price in Figure 11. It is positive, meaning that if one sells a call option, it is hedged with a dynamically adjusted long position in the stock.
[Figure 11. Black-Scholes call delta as a function of stock price. The parameters are $K = 1$, $r = 0.1$, $T = 0.5$, $t = 0$, $q = 0$, $\sigma = 0.25$.]
For a put option, the delta can be computed using put-call parity, which implies that $c_x(t, x) - p_x(t, x) = 1$, so that
\[
p_x(t, x) = -\Phi(-y),
\]
and this function is plotted in Figure 12. It is negative, meaning that if one sells a put option, it is hedged with a dynamically adjusted short position in the stock.
11.5.2. Theta. The theta of a call is
\[
c_t(t, x) = -r K e^{-r(T-t)} \Phi(y - \sigma\sqrt{T - t}) - \frac{\sigma x}{2\sqrt{T - t}} \Phi'(y).
\]
Because $\Phi(\cdot)$ and $\Phi'(\cdot)$ are always positive, the call theta is always negative, meaning that the price of a call declines as we approach maturity (if all other factors remain unchanged). For a put, put-call parity gives $p_t(t, x) = c_t(t, x) + r K e^{-r(T-t)}$, so the put theta can be positive for a deep in-the-money put when $r > 0$, though it is typically negative as well.
11.5.3. Gamma. The gamma of a call is
\[
c_{xx}(t, x) = \frac{\Phi'(y)}{x \sigma \sqrt{T - t}},
\]
which is always positive, and is equal to $p_{xx}(t, x)$, the put gamma (again this follows easily from put-call parity). The BS gamma is plotted in Figure 13.

Gamma is closely related to volatility and to the risk introduced into the BS hedging program if trading is not continuous. To get an intuitive understanding of this effect, notice that gamma measures how quickly the delta of an option changes as the stock price changes. If the magnitude of gamma is small, then delta changes slowly, so a trader will not have to re-hedge very often in order to maintain delta neutrality. On the other hand, if the magnitude of gamma is large, then the trader must re-hedge very often to maintain delta neutrality.

[Figure 12. Black-Scholes put delta as a function of stock price. The parameters are $K = 1$, $r = 0.1$, $T = 0.5$, $t = 0$, $q = 0$, $\sigma = 0.25$.]

[Figure 13. Black-Scholes call gamma as a function of stock price. The parameters are $K = 1$, $r = 0.1$, $T = 0.5$, $t = 0$, $q = 0$, $\sigma = 0.25$.]
To make a portfolio gamma neutral, one cannot use a position in the underlying asset, as this has zero gamma. In other words, gamma neutrality can only be achieved by adding more option positions to one's portfolio.
11.5.4. Vega. The vega of an option is the derivative of the option price with respect to volatility, and is given by
\[
c_\sigma(t, x; \sigma) = x \sqrt{T - t} \, \Phi'(y).
\]
This function is plotted versus the stock price in Figure 14. Note the similarity with the gamma plot, which captures our intuitive notion that gamma does indeed measure sensitivity to volatility in some way.
[Figure 14. Variation of vega with stock price. The parameters are $K = 1$, $r = 0.1$, $T = 0.5$, $t = 0$, $q = 0$, $\sigma = 0.25$.]
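Since each Greek is a partial derivative of the pricing function, the closed forms above can be checked against central finite differences of the call price. A sketch with the figure parameters (not part of the notes):

```python
from math import exp, log, sqrt
from statistics import NormalDist

nd = NormalDist()
Phi, phi = nd.cdf, nd.pdf  # standard normal cdf Phi and density Phi'

def bs_call(t, x, K, r, sigma, T):
    tau = T - t
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

# Closed-form Greeks versus central finite differences of the price.
t, x, K, r, sigma, T, h = 0.0, 1.0, 1.0, 0.1, 0.25, 0.5, 1e-4
tau = T - t
y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

delta = Phi(y)
gamma = phi(y) / (x * sigma * sqrt(tau))
vega = x * sqrt(tau) * phi(y)

fd_delta = (bs_call(t, x + h, K, r, sigma, T) - bs_call(t, x - h, K, r, sigma, T)) / (2 * h)
fd_gamma = (bs_call(t, x + h, K, r, sigma, T) - 2 * bs_call(t, x, K, r, sigma, T)
            + bs_call(t, x - h, K, r, sigma, T)) / h**2
fd_vega = (bs_call(t, x, K, r, sigma + h, T) - bs_call(t, x, K, r, sigma - h, T)) / (2 * h)
print(delta - fd_delta, gamma - fd_gamma, vega - fd_vega)  # all near zero
```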
11.6. Probabilistic (martingale) interpretation of perfect hedging*. The wealth process in the BSM model satisfies
\[
dX_t = r X_t \, dt + \sigma H_t S_t (\lambda \, dt + dW_t),
\]
where $\lambda := (\mu - r)/\sigma$ is called the market price of risk of the stock. Define the discounted stock price process $\tilde{S}$ and discounted wealth process $\tilde{X}$ by
\[
\tilde{S}_t := e^{-rt} S_t, \qquad \tilde{X}_t := e^{-rt} X_t, \quad t \geq 0.
\]
Then $\tilde{S}, \tilde{X}$ satisfy, under the physical measure $\mathbb{P}$,
\[
d\tilde{S}_t = \sigma \tilde{S}_t (\lambda \, dt + dW_t), \qquad d\tilde{X}_t = H_t \, d\tilde{S}_t = \sigma H_t \tilde{S}_t (\lambda \, dt + dW_t).
\]
By the Girsanov theorem, if we define the measure $\mathbb{Q} \sim \mathbb{P}$ by
\[
\frac{d\mathbb{Q}}{d\mathbb{P}} = Z_T := \exp\left( -\lambda W_T - \frac{1}{2} \lambda^2 T \right),
\]

then the process $W^{\mathbb{Q}}$, defined by
\[
W^{\mathbb{Q}}_t := W_t + \lambda t, \quad t \geq 0,
\]
is a Brownian motion under $\mathbb{Q}$. Hence the discounted stock price and discounted wealth process are local $\mathbb{Q}$-martingales (and in fact, can be shown to be martingales in the BSM model).

The measure $\mathbb{Q}$, known as an equivalent (local) martingale measure (ELMM), or as a risk-neutral measure, is defined as one such that discounted traded asset prices are local $\mathbb{Q}$-martingales, and notice that it is uniquely defined in the BSM model. This is a consequence of the model being complete (though we do not prove this here).

Earlier, we found a portfolio wealth process which could replicate the payoff of an option, and in this case it must satisfy $X_t = v(t, S_t)$ for all $t \in [0, T]$, else there is arbitrage. This led us to the BS PDE and, via the Feynman-Kac theorem, to the representation for the option value:
\[
v(t, S_t) = \mathbb{E}^{\mathbb{Q}}[e^{-r(T-t)} v(T, S_T) \,|\, S_t], \quad 0 \leq t \leq T.
\]
Hence the discounted option price is a $\mathbb{Q}$-martingale, and hence the discounted wealth process of a replicating strategy must also be a $\mathbb{Q}$-martingale.
There is a purely probabilistic route to these conclusions, with the arguments as follows. Begin
with the Q-dynamics of the wealth process of any self-financing trading strategy:
(11.10) γt Xt = X0 + ∫_0^t γs σHs Ss dWsQ, 0 ≤ t ≤ T,
where γt := e^{-rt} is the discount factor for t ∈ [0, T].
Introduce a European contingent claim with FT-measurable payoff C at time T, and then define
the Q-martingale
Mt := EQ[γT C | Ft], 0 ≤ t ≤ T.
Then by the representation of Brownian martingales as stochastic integrals (the martingale representation theorem), there exists an adapted process ψ : [0, T] → R with EQ[∫_0^T ψt² dt] < ∞
such that we have
Mt = M0 + ∫_0^t ψs dWsQ, 0 ≤ t ≤ T.
Since X̃ = γX is a Q-martingale, we make the identification M = X̃, so X0 = M0 = EQ[γT C],
and
(11.11) X̃t = γt Xt = EQ[γT C | Ft], 0 ≤ t ≤ T,
by construction. Equivalently,
γt Xt = EQ[γT C] + ∫_0^t ψs dWsQ, 0 ≤ t ≤ T.
Comparing this with (11.10), we choose the portfolio process H to be determined by γσHS = ψ.
Moreover, since we have γT XT = X̃T = MT = γT C, we have
XT = C, a.s.

so that replication is guaranteed. This argument relies only on the martingale representation
theorem and the existence of a unique measure Q such that the discounted wealth process is a
local Q-martingale. The parameters μ, σ in the BSM model could just as well have been random,
provided they were F-adapted processes. Notice that the wealth process of the replicating portfolio
is actually a Q-martingale, not just a local martingale.
Since XT = C for the replication portfolio wealth process, we must have that the value of the
claim at all earlier times is equal to the wealth process. Let V be the value process of the claim.
We then have
(11.12) Vt = Xt, 0 ≤ t ≤ T,


with VT = C. Moreover, using (11.11) and (11.12) we arrive at
Vt = e^{-r(T-t)} EQ[C | Ft], 0 ≤ t ≤ T,
which is a risk-neutral valuation formula, valid for any European claim.
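The risk-neutral valuation formula can be illustrated numerically. The following Monte Carlo sketch (my own illustration, with hypothetical parameter values; not part of the notes) simulates ST under Q and checks that the discounted expected call payoff reproduces the BS price:

```python
import random
from math import log, sqrt, exp, erf

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def mc_call(x, K, r, sigma, tau, n=100_000, seed=1):
    """Discounted Q-expectation of the call payoff, by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # terminal stock price under Q: drift r, volatility sigma
        s_T = x * exp((r - 0.5 * sigma**2) * tau + sigma * sqrt(tau) * z)
        total += max(s_T - K, 0.0)
    return exp(-r * tau) * total / n

x, K, r, sigma, tau = 1.0, 1.0, 0.1, 0.25, 0.5
assert abs(mc_call(x, K, r, sigma, tau) - bs_call(x, K, r, sigma, tau)) < 0.01
```

The Monte Carlo error decreases like 1/√n, so the two prices agree to a few tenths of a percent at this sample size.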


We must also have dVt = dXt. Under Q, we have the dynamics of the discounted wealth
process
d(γt Xt) = γt σHt St dWtQ.
For the discounted claim value we have
d(γt Vt) = γt dVt + Vt dγt,
with no cross-variation term since γ is of finite variation. Hence, we have
d(γt Vt) = γt dVt − rγt Vt dt.
Since V = X when X is the replication portfolio wealth process, we have, under Q:
γt dVt − rγt Vt dt = γt σHt St dWtQ.
Now suppose the model is Markovian, and assume Vt = v(t, St). Then we obtain
(11.13) γt [(vt + L^{S,Q} v)(t, St) − rv(t, St)] dt + γt σSt vx(t, St) dWtQ = γt σHt St dWtQ,
where L^{S,Q} is the generator of S under Q, given by
L^{S,Q} v(t, x) = rx vx(t, x) + ½σ²x² vxx(t, x).
Then (11.13) implies that the hedging strategy for the claim is given by
Ht = vx(t, St), 0 ≤ t ≤ T,
the delta hedging rule we found before, and that the claim pricing function must satisfy
vt(t, x) + L^{S,Q} v(t, x) − rv(t, x) = 0,
which is the BS PDE.
We are seeing a manifestation of deep results connecting absence of arbitrage with existence
of ELMMs and with completeness. These are called the Fundamental Theorems of Asset Pricing
(FTAPs). In continuous asset price models such as the BSM model, the theorems state that
no-arbitrage is equivalent to the existence of an ELMM, and completeness is equivalent to there
being a unique ELMM. Here is an easy part of the statements in the FTAPs to prove.
Lemma 11.2. If a model has an equivalent martingale measure Q such that the discounted wealth
process X̃ := γX is a Q-martingale, then it admits no arbitrage.
Proof. Suppose there is an arbitrage. Then there exists a portfolio wealth process X with
X̃0 = 0, X̃T ≥ 0, a.s., and P[X̃T > 0] > 0.
By the martingale property
(11.14) EQ[X̃T] = X̃0 = 0.
But since X̃T ≥ 0 P-almost surely, we have X̃T ≥ 0 Q-almost surely (since Q ∼ P). Similarly,
P[X̃T > 0] > 0 implies Q[X̃T > 0] > 0. These properties imply that EQ[X̃T] > 0, which
contradicts (11.14), so we conclude that there is no arbitrage. □


72

MICHAEL MONOYIOS

11.7. Black-Scholes-Merton analysis for dividend-paying stock. Suppose S pays dividends at a constant dividend yield q. Then the wealth dynamics for a portfolio with Ht shares of
S and Ht(0) shares of S(0) become
dXt = Ht dSt + qHt St dt + Ht(0) dSt(0),
since the dividend income received in the interval [t, t + dt) is qHt St dt. Using Xt = Ht St + Ht(0) St(0)
and dSt(0) = rSt(0) dt, this converts to
(11.15) dXt = Ht dSt + rXt dt + (q − r)Ht St dt.
It is easy to apply the same replication analysis as in Section 11.2 to once again yield the same
delta hedging rule as before:
Ht = vx(t, St), 0 ≤ t ≤ T,
and this time the option pricing function satisfies the BS PDE with dividend yield q, given as
vt(t, x) + (r − q)x vx(t, x) + ½σ²x² vxx(t, x) − rv(t, x) = 0,
with v(T, x) = h(x) (for a claim with path-independent payoff). We then obtain the risk-neutral
pricing formula
v(t, x) = EQ[e^{-r(T-t)} h(ST) | St = x],
where EQ denotes expectation under Q ∼ P, and under which S follows
dSt = (r − q)St dt + σSt dWtQ.
One can then go through a similar computation for the price function c(t, x) of a call option, to
obtain
c(t, x) = x e^{-q(T-t)} Φ(y) − K e^{-r(T-t)} Φ(y − σ√(T − t)),
y = (1/(σ√(T − t))) [log(x/K) + (r − q + ½σ²)(T − t)].
Notice that we can obtain this formula by the replacement x → x e^{-q(T-t)} in the original BS
formula (11.8).
Using put-call parity, namely,
c(t, St) − p(t, St) = St e^{-q(T-t)} − K e^{-r(T-t)},
we can compute the put option price function as
p(t, x) = K e^{-r(T-t)} Φ(−y + σ√(T − t)) − x e^{-q(T-t)} Φ(−y).
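A minimal implementation sketch of these dividend-adjusted formulae (my own; function names invented), verifying the put-call parity relation numerically with hypothetical parameter values:

```python
from math import log, sqrt, exp, erf

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call_div(x, K, r, q, sigma, tau):
    """Call on a stock with continuous dividend yield q."""
    y = (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * exp(-q * tau) * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def bs_put_div(x, K, r, q, sigma, tau):
    """Put on a stock with continuous dividend yield q."""
    y = (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return K * exp(-r * tau) * Phi(sigma * sqrt(tau) - y) - x * exp(-q * tau) * Phi(-y)

# put-call parity: c - p = x e^{-q tau} - K e^{-r tau}
x, K, r, q, sigma, tau = 1.0, 1.0, 0.1, 0.03, 0.25, 0.5
lhs = bs_call_div(x, K, r, q, sigma, tau) - bs_put_div(x, K, r, q, sigma, tau)
rhs = x * exp(-q * tau) - K * exp(-r * tau)
assert abs(lhs - rhs) < 1e-12
```

Parity holds to machine precision because both sides are exact functions of the same Φ values.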

11.8. Time-dependent parameters. It is straightforward to adapt the BS analysis to deal with
time-dependent parameters. So suppose μ, σ, r, q are time-dependent, but not stochastic, that is,
the stock price process under P evolves according to
dSt = μ(t)St dt + σ(t)St dWt,
where μ(t), σ(t) are deterministic functions of time.
For a portfolio with Ht shares of S plus cash, the wealth process evolves according to
dXt = Ht dSt + q(t)Ht St dt + r(t)(Xt − Ht St) dt,
where r(t), q(t) are deterministic functions of time.
The replication analysis goes through unchanged from the case with constant parameters. For
a claim with pricing function v(t, x), set Xt = v(t, St) for all t ∈ [0, T], and hence also require
dXt = dv(t, St). Using the Itô formula we have
dv(t, St) = vt(t, St) dt + vx(t, St) dSt + ½vxx(t, St) d[S]t.


So the terms involving dSt equate provided we choose the usual delta-hedging rule:
Ht = vx(t, St), 0 ≤ t ≤ T,
and then the remaining terms equate provided v satisfies the BS PDE with time-dependent
parameters:
vt(t, x) + (r(t) − q(t))x vx(t, x) + ½σ²(t)x² vxx(t, x) − r(t)v(t, x) = 0,
with v(T, x) = h(x) (for a claim with path-independent payoff).
Define the (deterministic) discount factor
γ(t) := exp(−∫_0^t r(s) ds), t ≥ 0.
Consider the discounted claim price process u(t, St) := γ(t)v(t, St). This satisfies
ut(t, x) + (r(t) − q(t))x ux(t, x) + ½σ²(t)x² uxx(t, x) = 0, u(T, x) = γ(T)h(x).
By the Feynman-Kac theorem, u(t, x) has the representation
u(t, x) = EQ[exp(−∫_0^T r(s) ds) h(ST) | St = x],
where EQ denotes expectation under Q ∼ P, and under which S follows
dSt = (r(t) − q(t))St dt + σ(t)St dWtQ,
with WQ a Q-BM. Then it is immediate that v(t, x) is given by the risk-neutral pricing formula
v(t, x) = EQ[exp(−∫_t^T r(s) ds) h(ST) | St = x].

Applying the Itô formula (under Q) to log S we have
d(log St) = (r(t) − q(t) − ½σ²(t)) dt + σ(t) dWtQ.
Given t ≤ T and St = x, we therefore have
log ST = log x + ∫_t^T (r(s) − q(s) − ½σ²(s)) ds + ∫_t^T σ(s) dWsQ.
So the law of the log-stock price process under Q is Gaussian:
(11.16) LawQ(log ST | St = x) = N(m, Σ²),
with
m = log x + ∫_t^T (r(s) − q(s) − ½σ²(s)) ds, Σ² = ∫_t^T σ²(s) ds.
In the standard BS model, the law of the terminal log-stock price is given by
LawQ_BS(log ST | St = x) = N(mBS, Σ²BS),
with
mBS = log x + (r − q − ½σ²)(T − t), Σ²BS = σ²(T − t).
Comparing with the law of the log-stock price in (11.16), it is evident that we obtain BS-style
formulae for option prices if we make the following replacements in the standard formulae: r → r̄,
q → q̄, σ → σ̄, where
r̄(T − t) = ∫_t^T r(s) ds, q̄(T − t) = ∫_t^T q(s) ds, σ̄²(T − t) = ∫_t^T σ²(s) ds.
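These replacements translate directly into code. The sketch below (my own, not from the notes; the deterministic parameter functions are hypothetical) computes r̄, q̄, σ̄ by numerical quadrature and plugs them into the dividend-adjusted BS call formula:

```python
from math import log, sqrt, exp, erf

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def averaged_params(r_fn, q_fn, sig_fn, t, T, n=2000):
    """Effective constants r-bar, q-bar, sigma-bar via the trapezoidal rule."""
    tau = T - t
    h = tau / n
    def trap(f):
        s = 0.5 * (f(t) + f(T))
        for i in range(1, n):
            s += f(t + i * h)
        return s * h
    r_bar = trap(r_fn) / tau
    q_bar = trap(q_fn) / tau
    sig_bar = sqrt(trap(lambda s: sig_fn(s)**2) / tau)  # average the variance
    return r_bar, q_bar, sig_bar

def call_td(x, K, r_fn, q_fn, sig_fn, t, T):
    """Call price with deterministic time-dependent r, q, sigma."""
    r, q, sigma = averaged_params(r_fn, q_fn, sig_fn, t, T)
    tau = T - t
    y = (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * exp(-q * tau) * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

# hypothetical linearly varying parameters
price = call_td(1.0, 1.0,
                r_fn=lambda s: 0.05 + 0.02 * s,
                q_fn=lambda s: 0.0,
                sig_fn=lambda s: 0.2 + 0.1 * s,
                t=0.0, T=0.5)
assert 0.0 < price < 1.0
```

With constant parameter functions the quadrature is exact and the price reduces to the standard BS value, which is a convenient regression test.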


Under Q, the discounted option price process
Ut := exp(−∫_0^t r(s) ds) v(t, St), 0 ≤ t ≤ T,
should (of course) be a martingale, so should satisfy an SDE of the form
dUt = Rt dWtQ,
for some process R, where WQ is a Q-BM. With Ut =: u(t, St), we have
dUt = ut(t, St) dt + ux(t, St) dSt + ½uxx(t, St) d[S]t.
Using the dynamics of S under Q and the PDE satisfied by u we arrive at
dUt = Rt dWtQ,
where
Rt = σ(t)St ux(t, St) = σ(t)St exp(−∫_0^t r(s) ds) vx(t, St), 0 ≤ t ≤ T.

12. Claims on futures contracts


12.1. The mechanics of futures markets. Consider a futures contract on a non-dividend-paying
stock with maturity T′. If entered into at time t ≤ T′, then it obliges the holder to buy
the stock at time T′ for the so-called futures price Ft,T′ = St e^{r(T′-t)}. The cost of the contract at
time t is zero.
So the futures contract is a forward contract with maturity T′ and delivery price K = Ft,T′.
The change in value of a forward contract with delivery price K and maturity T′ is
df(t, T′; K) = d(St − K e^{-r(T′-t)}) = dSt − rK e^{-r(T′-t)} dt.
So the change in value for the holder of a futures contract entered into at time t ought to be the
amount
df(t, T′; Ft,T′) = dSt − rSt dt.
Note that
(12.1) dFt,T′ = d(St e^{r(T′-t)}) = e^{r(T′-t)} (dSt − rSt dt) = e^{r(T′-t)} df(t, T′; Ft,T′).
So, in the BS model, under the physical measure P, we have futures price dynamics
dFt,T′ = Ft,T′ [(μ − r) dt + σ dWt].
The mechanics of futures markets are such that the holder of a futures contract receives the
amount dFt,T′ in the interval [t, t + dt) (despite the fact that this is not the change in value of
the associated forward contract).
Definition 12.1. A futures contract with maturity T′ is a contract which costs nothing to acquire
at any time t ∈ [0, T′], and is such that after each time interval [t, t + dt) the contract holder
receives the amount dFt, where Ft = St e^{r(T′-t)}.
The result of this is that, if you hold a dynamic portfolio of futures contracts with position
(Ht)0≤t≤T′ plus cash, then the associated portfolio wealth evolves according to
dXt = Ht dFt,T′ + rXt dt.
Comparing this with (11.15) we see that a futures contract can be considered as an asset which
pays a dividend yield q = r.


12.2. Options on futures contracts. In this section, the maturity time T′ of a futures contract
will be fixed, so we write Ft ≡ Ft,T′ from now on.
A European option, with maturity T ≤ T′, on a futures contract with maturity T′, is a contract
with payoff h(FT) at time T. For instance, a call option with strike K on a futures contract pays
(FT − K)+ at time T. Note that if T = T′, then since FT = ST, the futures option in this case
pays the same as a conventional option on the stock.
If one holds a dynamic portfolio of futures contracts with position H = (Ht)0≤t≤T′, plus some
cash, the associated portfolio wealth at time t evolves according to
dXt = Ht dFt + rXt dt.
Notice that this evolution is precisely that which we would obtain for an asset with price process
F and dividend yield q = r. This means that we can value a futures option with BS-style
formulae provided we set the underlying asset price to F and the dividend yield to r, as we now
demonstrate.
Consider a European futures option with maturity T, and with price process (v(t, Ft))0≤t≤T,
where v(t, x) is some function. This evolves according to
dv(t, Ft) = vt(t, Ft) dt + vx(t, Ft) dFt + ½vxx(t, Ft) d[F]t.
We attempt to hedge this option with a dynamic portfolio of futures contracts. Imposing the
replication condition
Xt = v(t, Ft), 0 ≤ t ≤ T,
and hence also requiring dXt = dv(t, Ft), gives the required hedge as
Ht = vx(t, Ft), 0 ≤ t ≤ T.
Then using the fact that d[F]t = e^{2r(T′-t)} d[S]t (from (12.1)), we have d[F]t = σ²Ft² dt (in the
BS model), so the futures option price function solves the PDE
vt(t, x) + ½σ²x² vxx(t, x) − rv(t, x) = 0, v(T, x) = h(x),
which is indeed the BS PDE for an asset with dividend yield q = r.
Hence we obtain, via the Feynman-Kac theorem, the risk-neutral valuation formula
v(t, x) = EQ[e^{-r(T-t)} h(FT) | Ft = x],
where, under Q, F follows
dFt = σFt dWtQ,
with WQ a Q-BM. Note that F is a Q-martingale, since we have
FT = Ft exp(σ(WTQ − WtQ) − ½σ²(T − t)), t ≤ T,
so EQ[FT | Ft] = Ft for t ≤ T (and this is also clear from the fact that Ft = St e^{r(T′-t)} =
EQ[ST′ | Ft] = EQ[FT′ | Ft]).
We observe that we can indeed recover the option price formula from the standard BS model
with dividends by treating the futures option as being written on an underlying asset with price
process F and dividend yield q = r. Hence, the call price function for a futures option is given by
c(t, x) = e^{-r(T-t)} [x Φ(y) − K Φ(y − σ√(T − t))],
y = (1/(σ√(T − t))) [log(x/K) + ½σ²(T − t)].
Lemma 12.2 (Put-call parity). For European call and put futures options with common maturity
T and strike K, the prices at t ≤ T satisfy
c(t, Ft) − p(t, Ft) = e^{-r(T-t)} (Ft − K), 0 ≤ t ≤ T.


Proof. One can either observe that the required relation follows from the standard put-call parity
relation for an asset with price process F and dividend yield q = r, or else compute that
c(T, FT) − p(T, FT) = FT − K.
Then a discounted risk-neutral expectation, using the fact that the discounted option prices as
well as the futures price are Q-martingales, gives the result. □
From the put-call parity relation, the put pricing function is obtained as
p(t, x) = e^{-r(T-t)} [K Φ(σ√(T − t) − y) − x Φ(−y)],
y = (1/(σ√(T − t))) [log(x/K) + ½σ²(T − t)].
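These futures-option formulae can be sketched in code as follows (my own illustration; function names invented, parameters hypothetical). The final assertion checks the put-call parity relation of Lemma 12.2:

```python
from math import log, sqrt, exp, erf

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def futures_call(F, K, r, sigma, tau):
    """Call on a futures price F (BS formula with dividend yield q = r)."""
    y = (log(F / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    return exp(-r * tau) * (F * Phi(y) - K * Phi(y - sigma * sqrt(tau)))

def futures_put(F, K, r, sigma, tau):
    """Put on a futures price F."""
    y = (log(F / K) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    return exp(-r * tau) * (K * Phi(sigma * sqrt(tau) - y) - F * Phi(-y))

# put-call parity: c - p = e^{-r tau} (F - K)
F, K, r, sigma, tau = 1.1, 1.0, 0.1, 0.25, 0.5
lhs = futures_call(F, K, r, sigma, tau) - futures_put(F, K, r, sigma, tau)
assert abs(lhs - exp(-r * tau) * (F - K)) < 1e-12
```

Note that the interest rate enters only through the overall discount factor, consistent with F itself being a Q-martingale.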
13. Multi-asset derivatives
It is straightforward to generalise the BSM analysis to the situation where there is more than
one stock in the market and when there are derivatives written on multiple stocks. The key idea
is that we can still perfectly hedge claims on many assets provided the number of traded stocks
is the same as the number of independent Brownian motions driving the stock prices. This keeps
the market complete, as we shall see.
To be concrete, we shall consider a market with two stocks. Their prices S, Y are assumed to
follow the geometric Brownian motions (GBMs)
(13.1) dSt = St (μ dt + σ dWt), dYt = Yt (α dt + β dBt), μ, α ∈ R, σ, β ∈ R+,
where B, W are correlated BMs, with B = ρW + √(1 − ρ²) W⊥, for ρ ∈ [−1, 1] and W, W⊥
independent BMs. A European claim on S, Y has payoff h(ST, YT) at time T. Denote its price
process by (v(t, St, Yt))0≤t≤T, for some function v(t, x, y). The interest rate is r ≥ 0.
Form a dynamic self-financing portfolio using S, Y and cash. Let the processes for the number
of shares of S, Y be HS, HY respectively. The portfolio wealth process X = HS S + HY Y + C (C
being the cash holding) follows
dXt = HtS dSt + HtY dYt + r(Xt − HtS St − HtY Yt) dt.


For the option price process, the Itô formula gives
dv(t, St, Yt) = vt(t, St, Yt) dt + vx(t, St, Yt) dSt + vy(t, St, Yt) dYt
+ ½(vxx(t, St, Yt) d[S]t + vyy(t, St, Yt) d[Y]t) + vxy(t, St, Yt) d[S, Y]t.
Impose the replication condition Xt = v(t, St, Yt) for all t ∈ [0, T], and hence dXt = dv(t, St, Yt).
Then we require
HtS = vx(t, St, Yt), HtY = vy(t, St, Yt), 0 ≤ t ≤ T.
Using this, the replication condition Xt = v(t, St, Yt), and equating the finite variation terms in
dXt = dv(t, St, Yt) yields that the function v satisfies
vt(t, x, y) + A^Q_{S,Y} v(t, x, y) − rv(t, x, y) = 0, v(T, x, y) = h(x, y),
where A^Q_{S,Y} denotes the generator of (S, Y) under the EMM Q:
A^Q_{S,Y} v := r(x vx + y vy) + ½(σ²x² vxx + β²y² vyy) + ρσβ xy vxy,
where the arguments of the function have been omitted for brevity. Under Q, the asset prices
follow
dSt = St (r dt + σ dWtQ), dYt = Yt (r dt + β dBtQ), BQ = ρWQ + √(1 − ρ²) W⊥,Q,
and (WQ, W⊥,Q) is a two-dimensional Q-BM.


Using the Feynman-Kac theorem, we can write down an expectation representation for v(t, x, y):
v(t, x, y) = EQ[e^{-r(T-t)} h(ST, YT) | St = x, Yt = y], 0 ≤ t ≤ T.
Example 13.1. Suppose h(x, y) = xy. We can derive a closed-form formula for v(t, x, y) in this
case. Under Q, S, Y are given by
St = S0 exp((r − ½σ²)t + σWtQ), Yt = Y0 exp((r − ½β²)t + βBtQ), t ≥ 0,
so in particular, given St Yt = xy at t ≤ T, we can compute
EQ[e^{-r(T-t)} ST YT | St = x, Yt = y] = xy exp((r + ρσβ)(T − t)),
where we have used BQ = ρWQ + √(1 − ρ²) W⊥,Q.
This can also be obtained by showing that U = SY follows a GBM. Under P,
(13.2) dUt = Ut (a dt + b dZt),
where Z = (σW + βB)/b, a = μ + α + ρσβ, b = √(σ² + β² + 2ρσβ). Under Q, we make the
replacements μ → r, α → r (and W → WQ, B → BQ) and proceed from there.
Example 13.2. A European exchange option allows the holder to exchange one asset Y for another
asset S if this is favourable, so has payoff h(ST, YT) = (ST − YT)+. In Problem Sheet 3 we show
that in the two-dimensional lognormal model (13.1), the value of the exchange option at t ≤ T,
given St = x, Yt = y, is given by the so-called Margrabe formula:
v(t, x, y) = x Φ(d) − y Φ(d − σ̄√(T − t)),
d = (1/(σ̄√(T − t))) [log(x/y) + ½σ̄²(T − t)],
where σ̄² = σ² + β² − 2ρσβ. Notice that the result does not depend on the interest rate.
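A sketch of the Margrabe formula in code (my own illustration; parameter values hypothetical). As a sanity check, setting β = ρ = 0 collapses Y to a constant under Q with r = 0, so the formula reduces to a BS call price with zero interest rate:

```python
from math import log, sqrt, erf

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def margrabe(x, y, sigma, beta, rho, tau):
    """Value of the option to receive S and give up Y, payoff (S_T - Y_T)^+."""
    sig_bar = sqrt(sigma**2 + beta**2 - 2.0 * rho * sigma * beta)
    d = (log(x / y) + 0.5 * sig_bar**2 * tau) / (sig_bar * sqrt(tau))
    return x * Phi(d) - y * Phi(d - sig_bar * sqrt(tau))

# hypothetical parameters; note the interest rate does not appear at all
v = margrabe(1.2, 1.0, 0.25, 0.0, 0.0, 0.5)
assert v > max(1.2 - 1.0, 0.0)  # the option value dominates its intrinsic value
```

The absence of r from the code mirrors the remark above: the exchange option price depends only on the effective volatility σ̄ of the ratio S/Y.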
14. American options in the BSM model
We give an informal treatment of American option pricing in the BS model. The full theory
requires the machinery of stopping times and the theory of optimal stopping, both of which are
beyond the scope of the course.
An American claim with (path-independent) payoff function h(·) and maturity T pays h(St)
(the so-called intrinsic value) to its holder if exercised at time t ∈ [0, T]. For instance, an American
put pays (K − St)+ if exercised at t ∈ [0, T]. It would thus never be exercised if St ≥ K. We
conjecture that exercise would occur only if the stock price dropped to a low enough level below
the strike, and we suppose the existence of some critical stock price at each t ∈ [0, T], denoted
Sf(t), such that the option is exercised at t ∈ [0, T] if St ≤ Sf(t). The function Sf : [0, T] → R+
is called the optimal exercise boundary, and is an example of a free boundary. This means it is
not known a priori, and must be computed as part of the solution to the pricing problem.
Let (v(t, St))0≤t≤T denote the price process of an American claim. We must have
v(t, St) ≥ h(St), t ∈ [0, T].
If this were not so, an immediate arbitrage opportunity would ensue: one could buy the option
and exercise it immediately to make profit h(St) − v(t, St).
Recall the American valuation algorithm (6.2) in the binomial model, repeated below:
Vn = h(Sn), Vt = max[h(St), EQ[(1 + r)^{-1} Vt+1 | Ft]], t = 0, . . . , n − 1.
At each time, one checks whether the intrinsic value is greater than the value of the discounted
risk-neutral expectation, in which case the option would be exercised at that time. We have two
possibilities:
• Either Vt > h(St), in which case the option is not exercised, and then the discounted
option price is constant on average, or
• Vt = h(St), in which case the option is exercised, and then the discounted option price is
decreasing on average.
We use this to make plausible the result in continuous time. We conjecture that v(t, St) is
given by comparing the value of immediate exercise with the value of a discounted expectation
under the EMM. Denote the discounted option price process by u(t, St) := e^{-rt} v(t, St). Under
the EMM Q, we have
du(t, St) = e^{-rt} [LBS v(t, St) dt + σSt vx(t, St) dWtQ],
where LBS denotes the BS operator:
(14.1) LBS v(t, x) := vt(t, x) + rx vx(t, x) + ½σ²x² vxx(t, x) − rv(t, x).
Observe that when the discounted option value is constant on average, then we have LBS v(t, St) =
0, with LBS v(t, St) < 0 when the discounted option value is decreasing on average.
We have two possibilities at each time t ∈ [0, T]:
• Either v(t, St) = h(St), so that exercise is optimal, and the discounted option price is
decreasing on average, that is LBS v(t, St) < 0, or
• v(t, St) > h(St), so that the option is not exercised, and the discounted option price is
constant on average, that is LBS v(t, St) = 0.
This suggests that the option pricing function satisfies
(14.2) max[h(x) − v(t, x), LBS v(t, x)] = 0.

We interpret (14.2) as follows. At time t ∈ [0, T] the option holder faces two possibilities:
(1) Exercise the option, in which case the maximisation in (14.2) is achieved by the first term,
and we have v(t, St) = h(St) (corresponding to St ≤ Sf(t) for a put), and LBS v(t, St) < 0
(so the discounted option price would be decreasing on average, that is, e^{-rt} v(t, St) is a
Q-supermartingale). We say that the stock price is in the stopping region.
(2) Do not exercise the option, in which case the maximisation in (14.2) is achieved by the
second term, and we have v(t, St) > h(St) (corresponding to St > Sf(t) for a put), and
LBS v(t, St) = 0 (so the discounted option price would be constant on average, that is,
e^{-rt} v(t, St) is a Q-martingale). We say that the stock price is in the continuation region.
We conclude that the American option price function solves the free boundary problem
(14.3) v(t, x) ≥ h(x), LBS v(t, x) ≤ 0,
with the first inequality holding as equality when we are in the stopping region, in which case the
second inequality is strict, or else the first inequality is strict and the second holds as equality,
when we are in the continuation region.
Example 14.1 (American put). Denote the American put price by pA(t, St) when the stock price
at t ∈ [0, T] is St. There is some free boundary Sf(t), which must be determined as part of the
solution to the problem, such that the American put pricing function pA(t, x) satisfies
(14.4) LBS pA(t, x) = 0, pA(t, x) > (K − x)+, if x > Sf(t) (continuation region),
(14.5) LBS pA(t, x) < 0, pA(t, x) = K − x, if x ≤ Sf(t) (stopping region),
where we have used the fact that Sf(t) ≤ K to write (K − x)+ = K − x in the stopping region.
We would also have the boundary conditions
pA(t, Sf(t)) = K − Sf(t), pA(T, x) = (K − x)+, lim_{x→∞} pA(t, x) = 0,
along with one further condition, on the derivative of the pricing function at the critical stock
price x = Sf(t), called the smooth pasting condition, which we shall discuss shortly.


Figure 15 shows a schematic drawing of the form of the optimal exercise boundary for the
American put. This boundary has been computed numerically in many papers but there is no
known closed form expression for it (just as there is no exact closed form expression for the
American put price).
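A minimal Cox-Ross-Rubinstein sketch (my own, with hypothetical parameters; not part of the notes) of such a numerical computation, in the spirit of the binomial algorithm (6.2): at each node the continuation value is compared with the intrinsic value.

```python
from math import exp, sqrt

def american_put_binomial(x, K, r, sigma, tau, n=500):
    """Cox-Ross-Rubinstein binomial tree for the American put price."""
    dt = tau / n
    u = exp(sigma * sqrt(dt))
    d = 1.0 / u
    disc = exp(-r * dt)
    p = (exp(r * dt) - d) / (u - d)  # risk-neutral up-probability
    # option values at maturity; j = number of up-moves
    values = [max(K - x * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):
        for j in range(step + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            intrinsic = max(K - x * u**j * d**(step - j), 0.0)
            values[j] = max(cont, intrinsic)  # early-exercise check at each node
    return values[0]

# hypothetical parameters
pA = american_put_binomial(1.0, 1.0, 0.1, 0.25, 0.5)
assert pA >= 0.0  # and pA dominates the intrinsic value (K - x)^+ = 0 here
```

The nodes where `intrinsic` wins the comparison approximate the stopping region, so the boundary Sf(t) can be read off the tree level by level.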
Figure 15. American put option optimal exercise boundary (schematic, stock
price against time): the continuation region lies above the boundary S_{f}(t),
the stopping region below it.


14.1. Smooth pasting condition for American put. It turns out (though we do not prove
this here) that the pricing function pA(t, x) is continuous in t and x. We also have the so-called
smooth pasting condition (also not proven here)
pAx(t, Sf(t)) = −1.
That is, the pricing function joins the payoff smoothly, and the derivative pAx(t, x) is continuous
in x. This is shown schematically in Figure 16. The smooth pasting condition is not at all
straightforward to establish rigorously. It cannot be deduced from pA(t, Sf(t)) = K − Sf(t),
because we do not know a priori where Sf(t) is, and we need an extra condition to fix it and to
be able to solve the free boundary problem.
Here is a plausibility argument for the smooth pasting condition. This is by no means a rigorous
demonstration. Consider what would happen if smooth pasting did not hold.
In the top picture of Figure 17, we have pAx(t, Sf(t)) < −1, which results in pA(t, St) < (K − St)+
for St > Sf(t), and this violates the lower bound in (14.3), that the option price always dominates
the payoff. This creates an immediate arbitrage opportunity, so is excluded.
In the bottom picture of Figure 17, we have pAx(t, Sf(t)) > −1. We shall argue that this is also
not possible by considering the exercise policy of the option holder. Recall that in the continuation
region C, the price function pA(t, x) satisfies (14.4) (repeated below)
LBS pA(t, x) = 0, pA(t, x) > (K − x)+, if x > Sf(t) (continuation region),
with boundary condition
pA(t, Sf(t)) = K − Sf(t).


Figure 16. American put price as a function of stock price, showing the smooth
pasting point at S_{f}(t).


This boundary condition is part of the pricing problem, providing boundary data for the PDE
LBS pA(t, x) = 0. This means that the choice of Sf(t) affects the values of pA(t, x) for all x > Sf(t).
When pAx(t, Sf(t)) > −1, one could increase the option value pA(t, St) near St = Sf(t) by choosing
a lower exercise threshold Sf(t), and then the slope pAx(t, Sf(t)) would decrease, and approach −1
(this is indicated by the red curve). One could continue this process until pAx(t, Sf(t)) = −1, but
we cannot achieve a more negative slope, because of the arguments implied by the top picture in
Figure 17. So we conclude that we must have pAx(t, Sf(t)) = −1.
In summary, the American put pricing function satisfies
pA(t, x) ≥ (K − x)+, LBS pA(t, x) = 0, x > Sf(t) (continuation region),
pA(t, x) = K − x, LBS pA(t, x) < 0, x ≤ Sf(t) (stopping region),
along with boundary conditions
pA(t, Sf(t)) = K − Sf(t), pAx(t, Sf(t)) = −1, pA(T, x) = (K − x)+, lim_{x→∞} pA(t, x) = 0.

14.2. Optimal stopping representation*. We do not prove this here, but the American option
pricing function for a claim with intrinsic value (h(St))0≤t≤T is given by
(14.6) v(t, x) = sup_{τ ∈ T(t,T)} EQ[e^{-r(τ-t)} h(Sτ) | St = x], 0 ≤ t ≤ T,
where T(t, T) denotes the class of F-stopping times with values in [t, T],6 and F denotes the
underlying Brownian filtration.
6A stopping time τ with respect to the filtration F = (Ft)t∈[0,T] is a random time variable, with values in [0, T],
such that the event {τ ≤ t} lies in Ft for every t ∈ [0, T].


Figure 17. Violation of smooth pasting condition: (i) slope of the price function
too steep at S_{f}(t); (ii) slope of the price function too shallow at S_{f}(t).


14.3. American call on a non-dividend-paying stock. If S pays no dividends, we have
cA(t, St) = c(t, St), that is, the American call has the same value as its European counterpart,
and so is never exercised before maturity. For r > 0, here is a simple argument. We have, for
t < T and r > 0,
cA(t, St) ≥ c(t, St) ≥ St − K e^{-r(T-t)} > St − K.
But since the last term is the exercise value, the American option price is strictly greater than
this value (for r > 0), so is never exercised early.
A more rigorous approach, which can establish the property for all r ≥ 0, is to use the optimal
stopping representation (14.6) and then to show that (e^{-rt} (St − K)+)0≤t≤T is a Q-submartingale
(via the conditional Jensen inequality) so tends to rise, making it optimal to wait at all times
t < T. Here is the argument.
Lemma 14.2. Let h(x) be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then the
discounted intrinsic value (e^{-rt} h(St))0≤t≤T of an American claim that pays h(St) upon exercise,
is a Q-submartingale.


Proof. Since h is convex, we have
h(λx0 + (1 − λ)x1) ≤ λh(x0) + (1 − λ)h(x1), 0 ≤ λ ≤ 1, 0 ≤ x0 ≤ x1.
This is shown in Figure 18.


Figure 18. A convex function: f(λx0 + (1 − λ)x1) ≤ λf(x0) + (1 − λ)f(x1).


Taking x0 = 0, x1 = x and writing λ in place of 1 − λ gives
(14.7) h(λx) ≤ λh(x), for all x ≥ 0, λ ∈ [0, 1].
For 0 ≤ s ≤ t ≤ T, 0 ≤ e^{-r(t-s)} ≤ 1, so (14.7) implies that
(14.8) EQ[e^{-r(t-s)} h(St) | Fs] ≥ EQ[h(e^{-r(t-s)} St) | Fs].
Using the conditional Jensen inequality and the fact that (e^{-rt} St)0≤t≤T is a Q-martingale, we
have
(14.9) EQ[h(e^{-r(t-s)} St) | Fs] ≥ h(EQ[e^{-r(t-s)} St | Fs]) = h(Ss), 0 ≤ s ≤ t ≤ T.
Combining (14.8) and (14.9) gives the submartingale property for (e^{-rt} h(St))0≤t≤T. □
Theorem 14.3. Let h(x) be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then
the value of the American claim expiring at time T and having intrinsic value (h(St))0≤t≤T, is
the same as the value of the European claim with payoff h(ST).
Proof. Apply the submartingale property of Lemma 14.2 between times t and T to obtain
(14.10) EQ[e^{-r(T-t)} h(ST) | Ft] ≥ h(St), 0 ≤ t ≤ T,
which says that the value of the European claim always dominates the exercise value of the
American claim, that is, there is no value in early exercise. □

Applying the above results to the call payoff shows that an American call on a non-dividend-paying stock is never exercised early.


Remark 14.4. An alternative (but equivalent) way to proceed is as follows. Let Zt := e^{-rt} h(St) =
e^{-rt} (St − K)+ be the discounted intrinsic value process of the call, which is a Q-submartingale.
Then, for all stopping times τ ∈ [t, T] we must have7
EQ[ZT | Ft] ≥ EQ[Zτ | Ft] ≥ Zt, t ≤ τ ≤ T.
Taking the supremum over stopping times and re-arranging gives
c(t, St) ≥ cA(t, St) ≥ h(St), 0 ≤ t ≤ T.
But we also have the reverse inequality cA(t, St) ≥ c(t, St), hence cA(t, St) = c(t, St).
15. Simple exotic options
Any option which is not a plain vanilla call or put is called an exotic option. There are
(usually) no markets in these options and they are bought over-the-counter (OTC). Effective
risk management of these products is important as they are much less liquid than standard options.
They often have discontinuous payoffs and can have large deltas near expiration which can make
them difficult to hedge.
We recall that we can price any option with payoff function h(ST) via the risk-neutral valuation
formula
v(t, x) = EQ[e^{-r(T-t)} h(ST) | St = x], 0 ≤ t ≤ T,
where Q denotes the EMM under which the stock price follows
dSt = rSt dt + σSt dWtQ,
and that the risk-neutral pricing formula corresponds to the Feynman-Kac solution of the BS
PDE
vt(t, x) + rx vx(t, x) + ½σ²x² vxx(t, x) − rv(t, x) = 0, v(T, x) = h(x).
Note that the risk-neutral valuation result generalises to any payoff, not necessarily one that is
just a function of the final stock price. For a (possibly path-dependent) European claim payoff C
(some FT-measurable random variable), its price at t ≤ T is given by
Vt = EQ[e^{-r(T-t)} C | Ft], 0 ≤ t ≤ T.

15.1. Digital options. Digital (or binary, or cash-or-nothing) options have discontinuous payoffs.
For example, the cash-or-nothing digital call with strike K and maturity T has payoff hc/n(ST)
given by
hc/n(ST) = 0, if ST < K; 1, if ST ≥ K.
The price of a digital call at t ≤ T is cc/n(t, St), given by
cc/n(t, x) = e^{-r(T-t)} EQ[1{ST>K} | St = x],
and the usual computation with Gaussian integrals yields
cc/n(t, x) = e^{-r(T-t)} Φ(y − σ√(T − t)),
y = (1/(σ√(T − t))) [log(x/K) + (r + ½σ²)(T − t)].
An alternative way of obtaining the above formula is to observe that the standard call option has
price function given by
c(t, x) = e^{-r(T-t)} EQ[(ST − K)1{ST>K} | St = x] = x Φ(y) − K e^{-r(T-t)} Φ(y − σ√(T − t)).

7Here, we use a result called the optional sampling theorem: for bounded stopping times ρ, τ satisfying ρ ≤ τ
almost surely, if X is a sub-martingale, then E[Xτ | Fρ] ≥ Xρ.


The first and second terms on the RHS are
e^{-r(T-t)} EQ[ST 1{ST>K} | St = x] = x Φ(y),
K e^{-r(T-t)} EQ[1{ST>K} | St = x] = K e^{-r(T-t)} Φ(y − σ√(T − t)),
and the second of these gives the formula for the digital call.
A digital (or cash-or-nothing) put option pays 1 if the terminal stock price is less than the
strike and zero otherwise, so the put price can be obtained via the put-call parity relation
cc/n(t, St) + pc/n(t, St) = e^{-r(T-t)}, 0 ≤ t ≤ T,
which follows from cc/n(T, ST) + pc/n(T, ST) = 1 along with a risk-neutral expectation.
An asset-or-nothing call option has payoff

h^{a/n}(S_T) = 0, if S_T < K;   S_T, if S_T ≥ K.

Once again, a computation using Gaussian integrals can be used to give the price at time t ≤ T. Or we can use the observation that a vanilla call option can be decomposed into a portfolio of an asset-or-nothing option plus a short position in K cash-or-nothing options. We have

c(t, x) = x Φ(y) − K e^{-r(T-t)} Φ(y − σ√(T−t)) = c^{a/n}(t, x) − K c^{c/n}(t, x),   0 ≤ t ≤ T.

Hence c^{a/n}(t, x) = x Φ(y), and a European call option can be replicated by buying one asset-or-nothing call and selling K cash-or-nothing calls.
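The decomposition can be sketched in code. This is an illustrative implementation (function names and the test parameters are my own choices): the vanilla call is assembled exactly as in the text, as one asset-or-nothing call minus K cash-or-nothing calls.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus(t, x, T, K, r, sigma):
    """The quantity y in the text: (log(x/K) + (r + sigma^2/2)(T-t)) / (sigma sqrt(T-t))."""
    tau = T - t
    return (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))

def cash_or_nothing_call(t, x, T, K, r, sigma):
    """c^{c/n}(t,x) = e^{-r(T-t)} Phi(y - sigma sqrt(T-t))."""
    tau = T - t
    y = d_plus(t, x, T, K, r, sigma)
    return math.exp(-r * tau) * norm_cdf(y - sigma * math.sqrt(tau))

def asset_or_nothing_call(t, x, T, K, r, sigma):
    """c^{a/n}(t,x) = x Phi(y)."""
    return x * norm_cdf(d_plus(t, x, T, K, r, sigma))

def vanilla_call(t, x, T, K, r, sigma):
    """Replication from the text: one asset-or-nothing call, short K cash-or-nothing calls."""
    return (asset_or_nothing_call(t, x, T, K, r, sigma)
            - K * cash_or_nothing_call(t, x, T, K, r, sigma))
```

For S_t = K = 100, r = 0.05, σ = 0.2, T − t = 1, `vanilla_call` reproduces the usual BS call price.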
15.2. Pay-later options. Digital options can be used in the valuation of pay-later (or contingent premium) options. A pay-later option is a standard European option which costs nothing to initiate: the holder pays the premium at maturity, and then only if the option is in the money. At maturity time T, therefore, a pay-later call option is equivalent to a portfolio consisting of a long position in a standard call (strike K), plus a short position in cash-or-nothing calls of strike K, chosen such that the contract has zero value at initiation time t ≤ T. Let the required short position in cash-or-nothing calls be a(t, S_t). At maturity time T, the value of the pay-later call initiated at t ≤ T is c^{p/l}(T, S_T), given by

c^{p/l}(T, S_T) = c(T, S_T) − a(t, S_t) c^{c/n}(T, S_T).

Hence, the price function of the pay-later call is given by

c^{p/l}(t, x) = E^Q[e^{-r(T-t)} ( c(T, S_T) − a(t, S_t) c^{c/n}(T, S_T) ) | S_t = x] = c(t, x) − a(t, x) c^{c/n}(t, x) = 0,

the last equality by definition. Hence, the required value of a(t, x) is given by

a(t, x) = c(t, x) / c^{c/n}(t, x) = x e^{r(T-t)} Φ(y)/Φ(y − σ√(T−t)) − K.
15.3. Multi-stage options. Multi-stage options are contracts that allow decisions to be made, or stipulate conditions, at intermediate dates during the life of the contract. An example is a forward start call option with maturity T and forward start time T_1 < T (with the contract initiated at some time t ≤ T_1). At time T_1 < T, the option holder receives an at-the-money call option (one whose strike is equal to the asset price at T_1, S_{T_1}) with maturity time T.

The procedure for valuing an option with expiry at T and some intermediate stage with date T_1 is to first determine the final payoff of the option at time T, then determine the value of this payoff at the intermediate time T_1, and then determine the value at t ≤ T_1 by using the value of the contract at time T_1 as a terminal payoff.

Denote the price function of the multi-stage option by v^{m/s}(t, x). Then

v^{m/s}(t, x) = E^Q[e^{-r(T_1 - t)} v^{m/s}(T_1, S_{T_1}) | S_t = x],   0 ≤ t ≤ T_1 < T.

Example 15.1 (Chooser option). A chooser option allows the holder at time T_1 the choice of buying, for an amount K_1, either a call or a put, these options having strike K and maturity T > T_1. The terminal payoff at time T is therefore (S_T − K)^+ or (K − S_T)^+.

At time T_1, the chooser option will be exercised if either of the underlying call or put at that time is worth more than the chooser strike K_1, that is, if c(T_1, S_{T_1}) > K_1 or p(T_1, S_{T_1}) > K_1. The holder of the chooser option will select the vanilla option with the larger T_1-value. Hence the value of the chooser option at time T_1 is given by

v^{ch}(T_1, S_{T_1}) = max( c(T_1, S_{T_1}; T, K) − K_1, p(T_1, S_{T_1}; T, K) − K_1, 0 ).

For K_1 = 0, we can use put-call parity to write

v^{ch}(T_1, S_{T_1}) = c(T_1, S_{T_1}; T, K) + max( 0, K e^{-r(T - T_1)} − S_{T_1} ).

Hence the chooser with strike K_1 = 0 is equivalent to a call of maturity T and strike K plus a put of maturity T_1 and strike K e^{-r(T - T_1)}. The value of the chooser with strike K_1 = 0 at t ≤ T_1 is then given by

v^{ch}(t, S_t) = c(t, S_t; T, K) + p(t, S_t; T_1, K e^{-r(T - T_1)}).
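The K_1 = 0 decomposition can be checked numerically: price the chooser once via the closed form above, and once by the two-stage procedure, discounting the T_1-value max(c(T_1, ·), p(T_1, ·)) under Q. A sketch with illustrative parameter values:

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    """Vanilla BS call with time to maturity tau."""
    if tau == 0.0:
        return max(x - K, 0.0)
    y = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return x * norm_cdf(y) - K * math.exp(-r * tau) * norm_cdf(y - sigma * math.sqrt(tau))

def bs_put(x, K, r, sigma, tau):
    """Vanilla BS put via put-call parity."""
    return bs_call(x, K, r, sigma, tau) - x + K * math.exp(-r * tau)

S0, K, r, sigma, T1, T = 100.0, 100.0, 0.05, 0.2, 0.5, 1.0

# Decomposition from the text (K1 = 0): a T-maturity call of strike K plus a
# T1-maturity put of strike K e^{-r(T - T1)}.
chooser_closed = (bs_call(S0, K, r, sigma, T)
                  + bs_put(S0, K * math.exp(-r * (T - T1)), r, sigma, T1))

# Direct two-stage valuation: simulate S_{T1} under Q and discount max(c, p).
rng = random.Random(7)
n_paths = 100_000
total = 0.0
for _ in range(n_paths):
    z = rng.gauss(0.0, 1.0)
    s1 = S0 * math.exp((r - 0.5 * sigma**2) * T1 + sigma * math.sqrt(T1) * z)
    total += max(bs_call(s1, K, r, sigma, T - T1), bs_put(s1, K, r, sigma, T - T1))
chooser_mc = math.exp(-r * T1) * total / n_paths
```

The two values agree up to Monte Carlo error, and the chooser is worth strictly more than the call alone, since the embedded put is non-negative.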
Problem Sheet 4 values two types of multi-stage option, a forward start option and a ratchet
option.
16. Barrier Options
Barrier options are claims that are activated or de-activated if the asset price crosses a barrier.
These claims have path-dependent payoffs, that is, the history of the asset price process determines the payout at expiry. Barrier options are sometimes described as weakly path dependent,
because their value turns out to depend only on time and the current asset price.
There are two broad classes of barrier option: knock-in options (which are activated if the
barrier is breached) and knock-out options (which are de-activated if the barrier is breached).
The knock-in options are classified as follows:
(1) An up-and-in option is activated if the barrier is hit from below.
(2) A down-and-in option is activated if the barrier is hit from above.
The knock-out options are classified as follows:
(1) An up-and-out option is de-activated (so becomes worthless) if the barrier is hit from
below.
(2) A down-and-out option is de-activated (so becomes worthless) if the barrier is hit from
above.
Call options with a barrier allow price reduction over the plain vanilla call. If you want to buy
a call, and believe the asset price will not fall very much, then a down-and-out call is cheaper than
a standard call (though of course you run the risk of losing the option if the barrier is crossed).
More complex barrier options, with multiple barriers, can be constructed. For example, a
double knock-out option has two barriers, say B2 > B1 , and is de-activated if the barrier B1 is
breached from above, and also if the barrier B2 is breached from below.
Some American digital options can be viewed as barrier options, since they become active, and are exercised, as soon as the asset price reaches the strike. For instance, an American cash-or-nothing call pays h^{c/n}(S_t) if exercised at t ∈ [0, T], where

h^{c/n}(S_t) = 0, if S_t < K;   1, if S_t ≥ K.

If the current asset price lies above the strike, the option is immediately exercised, as there will be no greater payoff from waiting. If the current asset price is below the strike, the option would not be exercised, but as soon as the strike is breached from below, the option is exercised.


Barrier options sometimes incorporate a rebate. In this case, if the option knocks out, then the
holder receives a rebate R. This is equivalent to adding an American digital option whose strike
is equal to the barrier. For example, consider a down-and-out call with barrier B and rebate R.
If the barrier is breached from above, the call is cancelled, and the holder receives cash R. Hence,
the holder of the down-and-out call with rebate holds a standard down-and-out call plus a long
position in R American cash-or-nothing puts, each with strike B and payoff g^{c/n}(S_t) if exercised at t ∈ [0, T], given by

g^{c/n}(S_t) = 1, if S_t ≤ B;   0, if S_t > B.
16.1. PDE approach to valuing barrier options. We outline a PDE approach to valuing
barrier options, using the case of a down-and-out call option. Note that we may consider only
knock-out options, as the value of knock-in options can be found from the observation that a
portfolio of a knock-out option plus a knock-in option with the same barrier and strike is equivalent
to a standard option. Hence, for a down-and-out call with price c^{d/o}(t, S_t) and a down-and-in call with price c^{d/i}(t, S_t) at time t ∈ [0, T], we have

c^{d/o}(t, S_t) + c^{d/i}(t, S_t) = c(t, S_t),

where c(t, S_t) ≡ c(t, S_t; K) denotes the price of a vanilla call with the same strike K as the two barrier options.

Denote the running minimum of the stock price by

m_t := min_{0 ≤ s ≤ t} S_s,   0 ≤ t ≤ T.

The payoff of the down-and-out call is then

h^{d/o}(S_T, m_T) = (S_T − K)^+ 1_{m_T > B}.
16.1.1. First case: barrier below strike. First consider the case where the barrier lies below the
strike: B < K.
Suppose that at t ∈ [0, T], we have m_t > B. Then the barrier has not been breached from above, and hence we also have S_t > B. In this case the option has not been cancelled, and the down-and-out call is an active standard call option at this point. Assuming (as we have) that the down-and-out call option price function is a function only of time and current stock price,8 then in this region of the (t, S_t)-plane, the option will satisfy the BS PDE

L^{BS} c^{d/o}(t, S_t) = 0,   for S_t > B,

where L^{BS} denotes the BS operator in (14.1).

Now, for a standard call we have boundary conditions c(T, S_T) = (S_T − K)^+, c(t, 0) = 0, and c(t, S_t) ∼ S_t as S_t → ∞. For the down-and-out call option, we still have the first and third of these as long as the option is still active. But since the option is cancelled (and hence becomes worthless) as soon as the barrier is breached from above, in place of the zero boundary condition at zero stock price we instead have a zero boundary condition at the barrier:

(16.1)   c^{d/o}(t, B) = 0.

In summary, the problem of valuing the down-and-out call becomes one of finding a solution to the BS PDE subject to this altered boundary condition:

(16.2)   L^{BS} c^{d/o}(t, S_t) = 0   for S_t > B,   subject to c^{d/o}(t, B) = 0.

We now perform a change of variables which converts the BS PDE to the heat equation. Define

y(x) := log(x/K),   τ(t) := (1/2) σ² (T − t).
8 This can be justified using a probabilistic approach to the valuation, by computing a risk-neutral expectation of the payoff (S_T − K)^+ 1_{m_T > B}, as in Remark 16.1.


For the standard call option price function, set

u(τ, y) := (c(t, x)/K) exp(aτ + by),   a := (1/4)(k + 1)²,   b := (1/2)(k − 1),   k := 2r/σ².

Then u will satisfy a heat equation with an appropriate initial condition, as we now show. We have

c(t, x) = K u(τ(t), y(x)) exp(−(aτ(t) + by(x))),   τ(t) = (1/2)σ²(T − t),   y(x) = log(x/K).

So

∂c/∂t = K e^{−(aτ+by)} ( ∂u/∂τ − au ) dτ/dt = −(1/2)σ² K e^{−(aτ+by)} ( ∂u/∂τ − au ).

Similarly

∂c/∂x = K e^{−(aτ+by)} ( ∂u/∂y − bu ) dy/dx = (K/x) e^{−(aτ+by)} ( ∂u/∂y − bu ),

and

∂²c/∂x² = (K/x²) e^{−(aτ+by)} ( ∂²u/∂y² − (2b + 1) ∂u/∂y + b(b + 1)u ).

Then computing the LHS of the BS PDE gives the diffusion equation u_τ = u_yy, on using the definitions of a, b and some algebra.

The boundary condition c(T, x) = (x − K)^+ translates to

K u(τ(T), y(x)) exp(−(aτ(T) + by(x))) = K(e^y − 1)^+,

and using τ(T) = 0 this converts to

u(0, y) = (e^y − 1)^+ e^{by} = [ exp((1/2)(k + 1)y) − exp((1/2)(k − 1)y) ]^+.

In summary, u satisfies the heat equation with an appropriate initial condition:

(16.3)   u_τ(τ, y) = u_yy(τ, y),   u(0, y) = [ exp((1/2)(k + 1)y) − exp((1/2)(k − 1)y) ]^+ =: h^c(y).

We can apply the same change of variables to the down-and-out call. That is, we write

c^{d/o}(t, x) = K u^{d/o}(τ(t), y(x)) exp(−(aτ(t) + by(x))),

for some function u^{d/o}(τ, y). The down-and-out call price function satisfies the BS PDE for x > B, and the value of y corresponding to the barrier level B is y_B := log(B/K), so the boundary condition (16.1) becomes

(16.4)   u^{d/o}(τ, y_B) = 0.

(Notice also that c^{d/o}(t, x) ∼ x as x → ∞ translates to u^{d/o}(τ, y) ∼ exp(aτ + (1 + b)y) as y → ∞.)

Further, the transformed down-and-out call price function u^{d/o} satisfies the heat equation for y > y_B, so the IBVP (16.2) becomes

u^{d/o}_τ(τ, y) = u^{d/o}_{yy}(τ, y),   for y > y_B,   subject to u^{d/o}(τ, y_B) = 0.

Having translated BS-type PDEs to the heat equation, we may apply the so-called method of
images to the problem of the down-and-out call. The argument is as follows.
The problem (16.3) for the standard call option is analogous to the flow of heat in an infinite
rod (since < y < ) with initial condition u(0, y) = hc (y). The corresponding problem for
the down-and-out call is analogous to the flow of heat in a semi-infinite rod with the temperature
held at zero at one end, corresponding to the boundary condition (16.4). The method of images
relies on the key observation that solutions to the heat equation are unaffected by translation or
reflection of the spatial co-ordinate: if u(τ, y) solves the heat equation, then so do u(τ, y + y_0) and u(τ, y_0 − y), for any constant y_0.


Hence, to solve the down-and-out call valuation problem (in the semi-infinite interval) we
use a solution made up of the solution of two infinite problems with equal and opposite initial
temperature distributions which cancel at the point yB , so that we get the correct boundary
condition (16.4). Then, by uniqueness of the solution to the initial-and-boundary-value problem
(IBVP) the resulting function is the solution we want.
We therefore reflect the initial data about the point yB , and at the same time change its sign
(creating an image solution) and combine the original solution and the image solution so as to
respect the boundary condition (16.4). The required initial condition is thus given by

u^{d/o}(0, y) = h^c(y) − h^c(2y_B − y),

which automatically satisfies u^{d/o}(0, y_B) = 0. The solution for the down-and-out call at arbitrary time is then given by

(16.5)   u^{d/o}(τ, y) = u(τ, y) − u(τ, 2y_B − y),

where u is the solution of the vanilla call problem, and by construction we respect the boundary condition (16.4). In other words we have written

u^{d/o}(τ, y) = u(τ, y) + u_0(τ, y),

where u_0(τ, y) = −u(τ, 2y_B − y) is a solution of a problem on an infinite interval with anti-symmetric initial data.
Translating the solution (16.5) into the original variables, we have

c^{d/o}(t, x) = K u^{d/o}(τ, y) e^{−(aτ+by)}
            = K ( u(τ, y) − u(τ, 2y_B − y) ) e^{−(aτ+by)}
            = c(t, x) − K u(τ, 2y_B − y) e^{−(aτ+by)}.

But for any y ∈ R we have

K u(τ, y) = c(t, Ke^y) e^{aτ+by}.

Using

2y_B − y = log( (B²/x)/K ),

we obtain

K u(τ, 2y_B − y) e^{−(aτ+by)} = c(t, B²/x) e^{−2b(y−y_B)},

or

K u(τ, 2y_B − y) e^{−(aτ+by)} = (x/B)^{1−2r/σ²} c(t, B²/x).

Hence the down-and-out call price function for the case B < K is given by c^{d/o}(t, x) ≡ c^{d/o}(t, x; K), where

c^{d/o}(t, x) = c(t, x) − (x/B)^{1−2r/σ²} c(t, B²/x),   B < K.
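The reflection formula is straightforward to implement. A sketch (function names and the numerical parameters B = 80 < K = 100, S_t = 100, r = 0.05, σ = 0.2, T − t = 1 are illustrative):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, x, T, K, r, sigma):
    """Vanilla BS call price c(t, x)."""
    tau = T - t
    y = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return x * norm_cdf(y) - K * math.exp(-r * tau) * norm_cdf(y - sigma * math.sqrt(tau))

def down_and_out_call(t, x, T, K, r, sigma, B):
    """Reflection formula for the case B < K (valid for x >= B):
    c^{d/o}(t,x) = c(t,x) - (x/B)^{1 - 2r/sigma^2} c(t, B^2/x).
    At x = B the two terms cancel, giving the barrier condition c^{d/o}(t,B) = 0."""
    assert B < K and x >= B
    k = 2.0 * r / sigma**2
    return (bs_call(t, x, T, K, r, sigma)
            - (x / B)**(1.0 - k) * bs_call(t, B**2 / x, T, K, r, sigma))
```

The knocked-out call is always worth no more than the vanilla call, and the gap widens as the spot approaches the barrier.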
16.1.2. Second case: barrier above strike. If the barrier is above the strike, B > K, then the payoff of the down-and-out call is discontinuous. The payoff at the terminal time is that of a standard call option with strike B together with (B − K) cash-or-nothing calls of strike B. We can apply the same reasoning as above to construct the reflected solution, to find that the price function of the down-and-out call is given by

c^{d/o}(t, x) = c(t, x; B) + (B − K) c^{c/n}(t, x; B) − (x/B)^{1−2r/σ²} [ c(t, B²/x; B) + (B − K) c^{c/n}(t, B²/x; B) ],

where c(t, x; B) and c^{c/n}(t, x; B) denote the price functions of a vanilla call and a cash-or-nothing call of strike B.


Remark 16.1 (Probabilistic valuation of barrier option*). An alternative to the PDE derivation
above would be to value the down-and-out call using a risk-neutral expectation:
c^{d/o}(t, x) = e^{−r(T−t)} E^Q[(S_T − K)^+ 1_{m_T > B} | S_t = x].
This would require the joint distribution of ST and the running minimum of the stock price. (It
is not obvious that the result for the down-and-out value would depend only on the current value
of the stock price, but this turns out to be true.) This type of computation is carried out for an
up-and-out call in Shreve [17], Section 7.3.3.
17. Lookback options
Lookback options allow the holder to purchase (respectively, sell) a stock at the lowest (respectively, highest) price attained over the lifetime of the option, or to achieve the maximum absolute difference between the stock price at maturity and its maximum (or minimum).

Example 17.1 (Floating and fixed strike lookback put payoffs). Denote the running maximum of the stock price by

M_t := max_{0 ≤ s ≤ t} S_s,   0 ≤ t ≤ T.

Then the payoff of the floating strike lookback put option is (M_T − S_T).
The payoff of the fixed strike lookback put option is (K − m_T)^+, where m denotes the running minimum of the stock price.
17.1. PDE satisfied by lookback pricing function. Let (V_t)_{0≤t≤T} denote the price process of a lookback option. For concreteness, let us consider a floating strike lookback put, with payoff V_T = h(S_T, M_T) = M_T − S_T. Recall that the general risk-neutral valuation result is valid for path-dependent payoffs, so the price at time t ≤ T of the option is given by

V_t = e^{−r(T−t)} E^Q[M_T − S_T | F_t],   0 ≤ t ≤ T,

and in particular, the discounted option price process is a Q-martingale. We shall use this property shortly to derive a PDE satisfied by the pricing function of a floating strike lookback option.

Lemma 17.2. The two-dimensional process (S, M) = (S_t, M_t)_{0≤t≤T} is a Markov process.

To prove this lemma we will use the following result, called the Independence Lemma, a proof of which is given in the Appendix (Lemma B.2).
Lemma 17.3 (Independence Lemma). Let X and Y be two random variables (which can be vector valued) on a probability space (Ω, F, P). Let G be a sub-σ-algebra of F. Suppose that X is independent of G while Y is G-measurable. Then for any function f such that E[|f(X, Y)|] < ∞, we have

E[f(X, Y) | G] = g(Y),   where   g(y) := E[f(X, y)].

Proof. See the Appendix (Lemma B.2). □

Proof of Lemma 17.2. For a fixed t ∈ [0, T], consider E^Q[f(S_T, M_T) | F_t] (we shall work under the EMM Q, though this does not affect the result). The stock price is given by

S_t = S_0 exp( σ(νt + W_t^Q) ),   ν := (1/σ)( r − (1/2)σ² ),   t ≥ 0,

for some Q-BM W^Q. Define the process

Ŵ_t := W_t^Q + νt,   t ∈ [0, T],

which is a drifted Q-Brownian motion. Define also its running maximum by

M̂_t := max_{0 ≤ s ≤ t} Ŵ_s,   t ∈ [0, T].

In terms of Ŵ, M̂, we have

(17.1)   S_t = S_0 exp(σŴ_t),   M_t = S_0 exp(σM̂_t),   t ∈ [0, T].

Hence, we may write

S_T = S_t exp( σ(Ŵ_T − Ŵ_t) ),   M_T = M_t exp( σ(M̂_T − M̂_t) ),   t ∈ [0, T].

If Ŵ attains a new maximum in [t, T], then

M̂_T − M̂_t = max_{t ≤ u ≤ T} Ŵ_u − M̂_t.

Otherwise, M̂_T − M̂_t = 0. Hence, we have

M̂_T − M̂_t = ( max_{t ≤ u ≤ T} Ŵ_u − M̂_t )^+ = ( max_{t ≤ u ≤ T} (Ŵ_u − Ŵ_t) − (M̂_t − Ŵ_t) )^+.

Multiplying this equation by σ and using (17.1), we obtain

σ(M̂_T − M̂_t) = ( σ max_{t ≤ u ≤ T} (Ŵ_u − Ŵ_t) − log(M_t/S_t) )^+.

Hence

M_T = M_t exp( σ(M̂_T − M̂_t) ) = M_t exp[ ( σ max_{t ≤ u ≤ T} (Ŵ_u − Ŵ_t) − log(M_t/S_t) )^+ ].

We define the random variables

X := (X_1, X_2) := ( Ŵ_T − Ŵ_t, max_{t ≤ u ≤ T} (Ŵ_u − Ŵ_t) ),   Y := (Y_1, Y_2) := (S_t, M_t).

Then X is independent of F_t and Y is F_t-measurable, and we have

f(S_T, M_T) = f( S_t e^{σX_1}, M_t e^{(σX_2 − log(M_t/S_t))^+} ).

With this X, Y, and with G = F_t in the Independence Lemma, define

g(y_1, y_2) := E^Q[ f( y_1 e^{σX_1}, y_2 e^{(σX_2 − log(y_2/y_1))^+} ) ].

The Independence Lemma then gives

E^Q[f(S_T, M_T) | F_t] = g(S_t, M_t),

which is the Markov property. □

The above Markov property implies that the price of the lookback option at t ≤ T is given by V_t = v(t, S_t, M_t), for some function v(t, x, y), that we shall assume is nice enough to be able to apply the Itô formula. To do this, we shall need to know the properties of the differential dM_t and, in particular, d[M]_t. It turns out that because M is a non-decreasing process, it is of finite variation, and in particular has zero quadratic variation.

Lemma 17.4. The process M = (M_t)_{t≥0} has zero quadratic variation: [M]_t = 0 for all t ≥ 0.


Proof. Let P = {t_0, . . . , t_n} be a partition of [0, t] for any fixed t ≥ 0, and write ΔM_k := M_{t_{k+1}} − M_{t_k} for k = 0, 1, . . . , n − 1. Compute the sample quadratic variation

Q_P := Σ_{k=0}^{n−1} (ΔM_k)² ≤ ( max_{k=0,...,n−1} ΔM_k ) Σ_{k=0}^{n−1} ΔM_k.

This converges to zero as ||P|| → 0 because the first factor on the RHS vanishes, due to the fact that the paths of M are continuous. □

Observe that the above proof worked because M has non-decreasing paths, so we did not have to use the absolute value |ΔM_k| inside the maximisation operator.

So, M has zero quadratic variation, and similar arguments (consider these as exercises) show that it is of finite variation (with variation on [0, t] equal to M_t − M_0 for any t ≥ 0), and that the cross variation between S and M is zero: [S, M]_t = 0 for all t ≥ 0. However, note that there is no process θ such that M_t = M_0 + ∫_0^t θ_u du. It turns out (though we do not show this here, see Shreve [17] Section 7.4.2 for more details) that M only increases on sets of Lebesgue measure zero. (The paths of M are said to be singularly continuous.)
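The estimate in the proof of Lemma 17.4 can be seen numerically. The sketch below (with illustrative step counts and seed) simulates a Brownian path, forms its running maximum M, and records the sample quadratic variation together with the two factors of the bound Σ(ΔM_k)² ≤ (max_k ΔM_k)(M_1 − M_0):

```python
import math
import random

def running_max_variations(n_steps, seed=0):
    """Simulate a Brownian path on [0, 1] with n_steps increments, form its running
    maximum M, and return (sample quadratic variation, largest increment of M, M_1).
    Since M_0 = 0 here, M_1 also equals the first-variation sum of the ΔM_k."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    w, m, qv, max_inc = 0.0, 0.0, 0.0, 0.0
    for _ in range(n_steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        inc = max(w - m, 0.0)   # ΔM_k >= 0: M is non-decreasing
        qv += inc * inc
        max_inc = max(max_inc, inc)
        m += inc
    return qv, max_inc, m

coarse = running_max_variations(1_000)      # ||P|| = 1e-3
fine = running_max_variations(100_000)      # ||P|| = 1e-5
```

The first variation M_1 stays of order one while the largest increment shrinks with the mesh, so the quadratic variation is squeezed to zero, exactly as in the proof.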
Theorem 17.5 (PDE for floating strike lookback option). In the standard BSM model, the pricing function v(t, x, y) of the floating strike lookback put option,

(17.2)   v(t, x, y) := e^{−r(T−t)} E^Q[M_T − S_T | S_t = x, M_t = y],

satisfies the PDE

(17.3)   L^{BS} v(t, x, y) ≡ ( v_t + rxv_x + (1/2)σ²x²v_xx − rv )(t, x, y) = 0,   0 ≤ t < T,   0 < x ≤ y,

with boundary conditions

(17.4)   v(t, 0, y) = e^{−r(T−t)} y,   0 ≤ t ≤ T, y ≥ 0,
(17.5)   v_y(t, y, y) = 0,   0 ≤ t < T, y > 0,
(17.6)   v(T, x, y) = y − x,   0 ≤ x ≤ y.

Proof. Apply the Itô formula to the discounted price process e^{−rt} v(t, S_t, M_t) to obtain

d( e^{−rt} v(t, S_t, M_t) ) = e^{−rt} [ L^{BS} v(t, S_t, M_t) dt + v_y(t, S_t, M_t) dM_t + σS_t v_x(t, S_t, M_t) dW_t^Q ].

This must be the increment of a martingale, so the coefficient of the dt term must be zero. This leads to the BSM PDE (17.3). But we must also have that the term involving dM_t is zero. It is certainly zero whenever M does not increase (so when S_t < M_t), but at points of increase of M (when S_t = M_t) we therefore need v_y(t, M_t, M_t) = 0, yielding (17.5). If at any time t ≤ T we have S_t = 0, then S_T = 0, so M will be constant on [t, T], and hence if M_t = y then M_T = y. In other words, if S_t = 0 and M_t = y, then M_T − S_T = y, and we discount this value to get the option value at t ≤ T, leading to (17.4). Finally, (17.6) is just the option payoff. □

One way to solve the lookback price function PDE is via an extension of the method of images that we used for barrier options. An alternative is to compute the risk-neutral expectation (17.2), which would need the joint law of (S_T, M_T). (This is done in Section 7.4.4 of Shreve [17].) Neither of these computations will be carried out here, but we present the solution, after reducing the dimension of the PDE by using the homogeneity of v in (x, y), as given by the following lemma.

Lemma 17.6. The function v in (17.2) satisfies

(17.7)   v(t, λx, λy) = λ v(t, x, y),   for any λ > 0.

92

MICHAEL MONOYIOS

Proof. We have

v(t, λx, λy) = e^{−r(T−t)} E^Q[M_T − S_T | S_t = λx, M_t = λy].

Define the processes S^{(λ)} := λS, M^{(λ)} := λM. Re-write v(t, λx, λy) as

(17.8)   v(t, λx, λy) = e^{−r(T−t)} E^Q[ λ · (1/λ)( M_T^{(λ)} − S_T^{(λ)} ) | S_t^{(λ)} = λx, M_t^{(λ)} = λy ].

Recall that the dynamics of S under Q are

dS_t = rS_t dt + σS_t dW_t^Q,

where W^Q is a Q-BM. The dynamics of S^{(λ)} under Q are the same as those of S, and hence so is their probability law, and the same holds for M^{(λ)}. Put another way, given S_t = x, the terminal stock price is given by

S_T = x exp( ν(T − t) + σW_{t,T}^Q ),   ν := r − (1/2)σ²,   W_{t,T}^Q := W_T^Q − W_t^Q,

from which it is evident that

Law^Q( S_T^{(λ)} | S_t^{(λ)} = x ) = Law^Q( S_T | S_t = x ),

and hence a similar result will also hold for M. We can therefore re-write (17.8) as

v(t, λx, λy) = λ e^{−r(T−t)} E^Q[M_T − S_T | S_t = x, M_t = y] = λ v(t, x, y),

which is the result we want. □

If we now define z := x/y and u(t, z) := v(t, z, 1), then v(t, x, y) = y u(t, x/y) and the lookback PDE and boundary conditions reduce to the problem

L^{BS} u(t, z) = 0,   0 ≤ t < T,   0 < z ≤ 1,
u(t, 0) = e^{−r(T−t)},   0 ≤ t ≤ T,
u_z(t, 1) = u(t, 1),   0 ≤ t < T,
u(T, z) = 1 − z,   0 ≤ z ≤ 1.

It can be checked that the following function satisfies the PDE and boundary conditions (Shreve [17] has the full computation in Exercise 7.5):

u(t, z) = −z Φ(−d) + e^{−r(T−t)} Φ( −d + σ√(T−t) ) + (σ²/2r) [ z Φ(d) − e^{−r(T−t)} z^{1−2r/σ²} Φ( d − (2r/σ)√(T−t) ) ],

where

d = (1/(σ√(T−t))) [ log z + (r + (1/2)σ²)(T−t) ].
18. Asian options

An Asian option is a claim involving the average of the asset price over the option lifetime. Define the process

Y_t := ∫_0^t S_u du,   t ≥ 0.

The average of the stock price over [0, t] is then defined as A_t := Y_t/t. Then an Asian option will have some payoff h(S_T, A_T). For instance, the fixed strike Asian call option with maturity T and strike K has payoff (A_T − K)^+, while the floating strike Asian call option has payoff (S_T − A_T)^+. The original motivation for introducing these claims into financial markets was as


a way of preventing people easily affecting the realised payoff of an option by manipulating the
stock price.
It turns out that Asian options are strongly path-dependent. The claim price process depends on both S and Y (and Y depends on the entire history of the asset price). To see this, observe that the SDEs for S, Y under Q are

dS_t = rS_t dt + σS_t dW_t^Q,   dY_t = S_t dt.

If we define the two-dimensional process X = (S, Y)^⊤ (with ⊤ denoting transposition), then X satisfies an SDE of the form (under Q, though this is not the salient point)

dX_t = a(X_t) dt + b(X_t) dW_t^Q,

so that X is Markov. Hence, the Asian option pricing function will be of the form

v(t, x, y) = e^{−r(T−t)} E^Q[g(T, S_T, Y_T) | S_t = x, Y_t = y],   0 ≤ t ≤ T,

where g(T, S_T, Y_T) is the claim payoff.


There is no closed form formula for the pricing function of such Asian options. We can, however, derive the PDE satisfied by the pricing function, using the usual replication argument (or equivalently by the fact that the discounted price process (e^{−r(T−t)} v(t, S_t, Y_t))_{t∈[0,T]} will be a Q-martingale).

Construct a hedging portfolio with wealth process X = HS + C, where C is the cash position and H is the process for the number of shares. Then, as usual,

dX_t = H_t dS_t + r(X_t − H_t S_t) dt,

while by the Itô formula, the claim price process satisfies

dv(t, S_t, Y_t) = v_t(t, S_t, Y_t) dt + v_x(t, S_t, Y_t) dS_t + (1/2) v_xx(t, S_t, Y_t) d[S]_t + v_y(t, S_t, Y_t) dY_t.

Here we have used the fact that since dY_t = S_t dt, then [Y] = [S, Y] = 0. We then impose the replication condition X_t = v(t, S_t, Y_t) for all t ∈ [0, T]; requiring dX_t = dv(t, S_t, Y_t) yields the usual delta-hedging rule H_t = v_x(t, S_t, Y_t). Equating the finite variation terms in dX_t = dv(t, S_t, Y_t) and using X_t = v(t, S_t, Y_t) then yields that the price function satisfies the PDE

v_t(t, x, y) + rxv_x(t, x, y) + xv_y(t, x, y) + (1/2)σ²x²v_xx(t, x, y) − rv(t, x, y) = 0,   v(T, x, y) = g(T, x, y).
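With no closed form available, Monte Carlo under Q is a natural pricing method. A sketch for the fixed strike Asian call, with the time integral Y_T approximated by a sum over a discrete grid (all parameter values are illustrative, and the discrete averaging introduces a small bias relative to the continuous average):

```python
import math
import random

def asian_fixed_strike_call_mc(S0, K, r, sigma, T, n_steps=100, n_paths=20_000, seed=3):
    """Monte Carlo price of the fixed strike Asian call, payoff (A_T - K)^+ with
    A_T = (1/T) int_0^T S_u du, the integral approximated by the grid average."""
    rng = random.Random(seed)
    dt = T / n_steps
    drift = (r - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s, s_sum = S0, 0.0
        for _ in range(n_steps):
            # exact GBM step under Q over the grid interval
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            s_sum += s
        a_T = s_sum / n_steps
        total += max(a_T - K, 0.0)
    return math.exp(-r * T) * total / n_paths
```

For S_0 = K = 100, r = 0.05, σ = 0.2, T = 1 this comes out well below the vanilla call price (about 10.45), since averaging reduces the effective volatility of the payoff.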
19. Incomplete markets
The most salient feature of the BSM model was that it was possible to replicate all claims. The BSM model is an example of a complete market. But this is a very special case. One can easily construct models where the completeness property is lost, simply by introducing extra risk factors, in the form of more Brownian motions driving the asset prices than there are traded assets. There is ample evidence that the BS model is not a true reflection of reality. This is obvious from its assumption of perfect, frictionless markets, but also from empirical evidence that stock price volatility is stochastic.
19.1. Implied volatility. Suppose we observe a market price at time t ≤ T of a traded call (or put) option with maturity T and strike K. Denote this observed market price by C(t, S_t; T, K). Denote by BS(t, S_t; T, K; σ) the BS formula for the price of such a vanilla option, with the volatility value σ input into the formula.

Definition 19.1 (Implied volatility). The implied volatility of an option with observed market price C(t, S_t; T, K) is the unique number σ̃(t; T, K) satisfying

BS(t, S_t; T, K; σ̃(t; T, K)) = C(t, S_t; T, K),   0 ≤ t ≤ T.

If one plots the implied volatility (IV) of options of different strikes (equivalently, of different
moneyness St /K) and of different maturity, using market prices of options, then if the BS model
was a true depiction of stock price dynamics, one would see a flat IV surface across moneyness
and maturity. But this is not observed in market option prices. These show a variation in
implied volatility with strike price and maturity, with an apparently stochastic dynamics. This
may be evidence of stochastic volatility in the underlying stock price process. Indeed, stochastic
volatility option pricing models can produce some of the observed features seen in empirical
implied volatilities, though there is evidence that additional stochastic factors are needed to
capture all the features of the observed implied volatility surface.
19.2. Local volatility model. Here is a simple generalisation of the BS model that allows the volatility to depend on both time and the current level of the stock price (and hence be stochastic, but only through dependence on the stock price):

dS_t = μS_t dt + σ(t, S_t)S_t dW_t,

for some non-negative function σ(·, ·). Then the market is still complete, with the usual replication arguments going through unchanged. Option price functions will still satisfy the BS PDE with volatility σ(t, x):

v_t(t, x) + rxv_x(t, x) + (1/2)σ²(t, x)x²v_xx(t, x) − rv(t, x) = 0.
Remark 19.2 (Recovering the local volatility function from option prices*). Dupire [5] has shown how it is possible to find a volatility function σ(t, x) that is consistent with an observed implied volatility surface on a given date, given (the idealised scenario) that we can observe market prices of vanilla options of all strikes K ≥ 0. This is not to say that a local volatility model represents a true model of how volatilities actually evolve. It is better thought of as an 'effective theory', a code which represents a convenient parametrisation of today's observed option prices (which might be generated by a stochastic volatility model, or a model with yet more factors).

Indeed, empirical studies show that the local volatility function consistent with option prices on a fixed date is unstable over time. This reflects the limitations of such models: they contradict the empirical observation that price and volatility are not perfectly correlated (empirical studies show a negative correlation between price and volatility), and since they are complete, options are redundant in these models, so they say nothing about vega hedging. In essence, to get a true reflection of the underlying process, one needs volatility to be genuinely stochastic, as described shortly. However, this does not necessarily mean that inaccurate models are not useful in practical hedging of options, as discussed in Section 19.4.

See Shreve [17] Exercise 6.10 for a guide to Dupire's result.
19.3. Stochastic volatility models. A stochastic volatility model is one in which we let the volatility process of a stock be a stochastic process in its own right, with a risky component that is correlated with that driving the stock price. Here is one specification of such a model. Under the physical measure P, a stock price S and its volatility Y follow

dS_t = μS_t dt + Y_t S_t dW_t,   dY_t = a(Y_t) dt + b(Y_t) dB_t,

where the Brownian motions W, B have correlation ρ ∈ [−1, 1]. We write B = ρW + √(1 − ρ²) W^⊥, with (W, W^⊥) := (W_t, W_t^⊥)_{0≤t≤T} a two-dimensional Brownian motion on a complete filtered probability space (Ω, F, F := (F_t)_{0≤t≤T}, P).

We claim that this market is incomplete, in that there is one traded asset and two (so more than the number of traded assets) independent driving Brownian motions W, W^⊥, which are the sources of risk. (In general, if the number of sources of risk is greater than the number of traded risky assets, then the model is incomplete.)
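The correlated system is easy to simulate by an Euler-Maruyama scheme, constructing dB = ρ dW + √(1 − ρ²) dW^⊥ from two independent Gaussian increments exactly as above. The sketch below is illustrative only: the specific mean-reverting choices a(y) = κ(θ − y), b(y) = β, all parameter values, and the positivity floor on Y are my own assumptions, not part of the notes.

```python
import math
import random

def simulate_sv_path(S0=100.0, Y0=0.2, mu=0.08, kappa=2.0, theta=0.2, beta=0.3,
                     rho=-0.7, T=1.0, n_steps=2_000, seed=11):
    """Euler-Maruyama discretisation of
        dS = mu S dt + Y S dW,    dY = kappa (theta - Y) dt + beta dB,
    with dB = rho dW + sqrt(1 - rho^2) dW_perp. Returns (S_T, Y_T)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    s, y = S0, Y0
    for _ in range(n_steps):
        dW = sq * rng.gauss(0.0, 1.0)
        dWp = sq * rng.gauss(0.0, 1.0)          # the independent factor W^perp
        dB = rho * dW + math.sqrt(1.0 - rho**2) * dWp
        s += mu * s * dt + y * s * dW
        y += kappa * (theta - y) * dt + beta * dB
        y = max(y, 1e-6)                         # crude floor keeping volatility positive
    return s, y
```

The negative ρ reflects the empirically observed negative price-volatility correlation mentioned in Remark 19.2; note that dW^⊥ moves Y but cannot be hedged by trading S alone, which is the source of incompleteness.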
For simplicity, let us take the interest rate to be zero. Suppose we have a European claim on the stock, with payoff h(S_T). (We could allow for a claim whose payoff also depends on Y, but this would not change the arguments below, so for simplicity we do not do this here.) Consider attempting to replicate this option with a dynamic self-financing portfolio involving the traded asset S (plus cash). The portfolio wealth process X := (X_t)_{0≤t≤T} satisfies

(19.1)   dX_t = H_t dS_t,

where H = (H_t)_{0≤t≤T} is the process for the number of shares of S in the portfolio.

The value process of the claim will be (V_t)_{t∈[0,T]}, where V_t = v(t, S_t, Y_t), for some function v(t, x, y). In other words, because Y enters into the dynamics for S, claims on S will have price processes that depend on Y. The Itô formula gives

(19.2)   dv(t, S_t, Y_t) = v_t(t, S_t, Y_t) dt + v_x(t, S_t, Y_t) dS_t + (1/2)v_xx(t, S_t, Y_t) d[S]_t + v_y(t, S_t, Y_t) dY_t + (1/2)v_yy(t, S_t, Y_t) d[Y]_t + v_xy(t, S_t, Y_t) d[S, Y]_t.

It is immediately apparent from (19.1) and (19.2) that perfect hedging of the claim using the portfolio with value process X is not possible, because of the term in (19.2) involving dY_t, which has a component involving dW_t^⊥, representing the unhedgeable risk associated with the claim.
19.3.1. Completion of the market. Now introduce a second traded asset in the form of another claim with payoff $g(S_{T'})$ at time $T' \ge T$, and we emphasise that this claim is traded in the market. Denote the price of this claim at $t \le T'$ by $U_t = u(t, S_t, Y_t)$, for some function $u(t,x,y)$.
Form a portfolio with $H_t$ units of $S$ and $H_t^U$ units of $U$ at time $t \in [0,T]$, with the remaining wealth in the form of cash. The value process of the portfolio is now $X = (X_t)_{0 \le t \le T}$, satisfying
$$dX_t = H_t\,dS_t + H_t^U\,dU_t,$$
with $dU_t = du(t,S_t,Y_t)$ given by (19.2) with $v(t,S_t,Y_t)$ replaced by $u(t,S_t,Y_t)$. Use this portfolio to (attempt to) perfectly hedge the claim $V$, so set $X_t = v(t,S_t,Y_t)$ for all $t \in [0,T]$. This requires $dX_t = dv(t,S_t,Y_t)$. Equating terms involving $dY_t$ gives
(19.3)
$$H_t^U u_y(t,S_t,Y_t) = v_y(t,S_t,Y_t), \quad 0 \le t \le T.$$

This is a vega hedging formula. The volatility risk associated with sensitivity to changes in $Y$ for the claim $V$ is represented by the right-hand side of (19.3), and this is hedged using $H_t^U$ units of claim $U$.
Using (19.3) and equating terms involving $dS_t$ then gives

(19.4)
$$H_t = v_x(t,S_t,Y_t) - \frac{v_y}{u_y}\, u_x(t,S_t,Y_t),$$

which is a (generalised) delta hedging formula. (The number of units of stock is the delta of the claim to be hedged, less the delta of the $H_t^U$ units of the claim that has been used to achieve the vega hedge.)
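The hedge ratios (19.3)-(19.4) can be computed numerically once the pricing functions $u, v$ are known (for example, from a PDE solver). The sketch below uses a made-up smooth surface as a stand-in for a solved price $u$, and approximates the partial derivatives by central differences; the claim to be hedged is taken as $v = 2u$, so the exact answer is two units of $U$ for the vega hedge and zero shares of stock.

```python
import math

def u(t, x, y):
    # Hypothetical smooth price surface of the traded claim U
    # (a stand-in for the solution of the pricing PDE).
    return x * math.exp(y) * (2.0 - t)

def v(t, x, y):
    # Claim to be hedged: twice the traded claim, so the exact
    # hedge is H^U = 2 units of U and H = 0 shares of stock.
    return 2.0 * u(t, x, y)

def partials(f, t, x, y, h=1e-5):
    """Central finite-difference approximations of f_x and f_y."""
    fx = (f(t, x + h, y) - f(t, x - h, y)) / (2 * h)
    fy = (f(t, x, y + h) - f(t, x, y - h)) / (2 * h)
    return fx, fy

def hedge_ratios(t, x, y):
    ux, uy = partials(u, t, x, y)
    vx, vy = partials(v, t, x, y)
    HU = vy / uy                 # vega hedge, equation (19.3)
    H = vx - (vy / uy) * ux      # delta hedge, equation (19.4)
    return H, HU

H, HU = hedge_ratios(0.5, 100.0, 0.2)
print(H, HU)   # H close to 0, HU close to 2
```

In practice $u, v$ would come from solving the pricing PDE on a grid; the finite-difference construction of the ratios is unchanged.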
Finally, using (19.3) and (19.4) and equating the finite variation terms in $dX_t = dv(t,S_t,Y_t)$ gives that the functions $u, v$ must satisfy

(19.5)
$$\frac{1}{u_y}\left( u_t + \frac{1}{2} x^2 y^2 u_{xx} + \frac{1}{2} b^2(y) u_{yy} + \rho x y b(y) u_{xy} \right) = \frac{1}{v_y}\left( v_t + \frac{1}{2} x^2 y^2 v_{xx} + \frac{1}{2} b^2(y) v_{yy} + \rho x y b(y) v_{xy} \right),$$

for all $(t,x,y) \in [0,T] \times \mathbb{R}^2_+$ (with the arguments of $u, v$ omitted for brevity).
Now, the left-hand side of (19.5) contains terms involving $u(t,x,y)$ only, while the right-hand side of (19.5) contains terms involving $v(t,x,y)$ only. In principle, therefore, the left-hand side of (19.5) depends on $T'$ (but not on $T$), while the right-hand side of (19.5) depends on $T$ (but not on $T'$). Hence we can only have equality in (19.5) if both sides of the equation do not depend on either of $T, T'$. In other words, both sides of (19.5) must be equal to some function of $(t,x,y)$ only, which we write as $-\lambda(t,x,y)$. Then both $u(t,x,y)$ and $v(t,x,y)$ satisfy a PDE of the same form. For $v$ we obtain

(19.6)
$$v_t + \frac{1}{2} x^2 y^2 v_{xx} + \lambda(t,x,y) v_y + \frac{1}{2} b^2(y) v_{yy} + \rho x y b(y) v_{xy} = 0, \qquad v(T,x,y) = h(x),$$

with $u$ satisfying the same PDE but with terminal condition $u(T',x,y) = g(x)$.
By the Feynman-Kac theorem, both $u$ and $v$ will have expectation representations. For $v$ we have
$$v(t,x,y) = E^Q[h(S_T) \mid S_t = x, Y_t = y],$$
where $Q$ is a measure equivalent to $P$ and under which $S, Y$ have dynamics
$$dS_t = Y_t S_t\,dW_t^Q, \qquad dY_t = \lambda(t,S_t,Y_t)\,dt + b(Y_t)\,dB_t^Q,$$
where $W^Q, B^Q$ are $Q$-Brownian motions with correlation $\rho$. We observe that $Q$ is an EMM, as $S$ is a $Q$-martingale, while the drift of $Y$ is arbitrary, as $Y$ is not a traded asset. There are many possible EMMs, given the arbitrary drift of $Y$. This is a reflection of the incompleteness of the market with stochastic volatility.
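Once a particular EMM (that is, a drift for $Y$ under $Q$) has been chosen, the expectation representation can be evaluated by Monte Carlo. The sketch below uses a log-Euler scheme for $S$ and an Euler scheme for $Y$ with correlated Gaussian increments; the mean-reverting drift $\kappa(\theta - y)$ and constant diffusion coefficient for $Y$ are illustrative assumptions, not part of the notes. As a sanity check, switching the $Y$-dynamics off freezes the volatility at $Y_0$, and the simulated call price should then agree with the zero-rate Black-Scholes price to within Monte Carlo error.

```python
import math
import random

def mc_price(h, S0, Y0, lam, b, rho, T, n_steps=50, n_paths=20_000, seed=1):
    """Monte Carlo estimate of E^Q[h(S_T)] for dS = Y S dW^Q,
    dY = lam(t,S,Y) dt + b(Y) dB^Q, with corr(W^Q, B^Q) = rho."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s, y, t = S0, Y0, 0.0
        for _ in range(n_steps):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            dW = sq * z1
            dB = sq * (rho * z1 + math.sqrt(1 - rho**2) * z2)
            # log-Euler for S (exact when Y is constant), Euler for Y
            y_new = y + lam(t, s, y) * dt + b(y) * dB
            s *= math.exp(-0.5 * y * y * dt + y * dW)
            y = y_new
            t += dt
        total += h(s)
    return total / n_paths

def bs_call(S0, K, sigma, T):
    # Black-Scholes call price with zero interest rate
    d1 = (math.log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return S0 * N(d1) - K * N(d2)

S0, K, T = 100.0, 100.0, 1.0
call = lambda s: max(s - K, 0.0)

# Sanity check: lam = 0 and b = 0 reduce to constant volatility Y0
p0 = mc_price(call, S0, 0.2, lambda t, x, y: 0.0, lambda y: 0.0, -0.5, T)
print(p0, bs_call(S0, K, 0.2, T))   # the two should be close

# Price under an illustrative mean-reverting volatility drift
kappa, theta, beta = 2.0, 0.2, 0.1
p1 = mc_price(call, S0, 0.2, lambda t, x, y: kappa * (theta - y),
              lambda y: beta, -0.5, T)
print(p1)
```

Different choices of the drift function give different EMMs and hence different prices, which is exactly the incompleteness discussed above.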
19.4. Robustness of the Black-Scholes formula. We end with a remarkable robustness property of BS-style hedging. We know that the stock price dynamics in the BS model are almost certainly wrong, but this does not necessarily imply that we cannot use a delta-hedging rule based on the BS formula to achieve a successful hedge, even in the face of severe model error, as the following argument shows.
Suppose the true price process of a stock is
$$dS_t = \mu_t S_t\,dt + \sigma_t S_t\,dW_t,$$
where $(\mu_t, \sigma_t)_{t \ge 0}$ are processes adapted to a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$. The market is not necessarily complete, so the filtration $\mathbb{F}$ can be larger than the filtration generated by the Brownian motion $W$.
Suppose a trader sells an option (say, a call with some maturity $T$) at time zero using an implied volatility of $\sigma_0$. That is, the option is sold for $v(0,S_0)$, where $v(t,x)$ solves the BS PDE with volatility $\sigma_0$:

(19.7)
$$v_t(t,x) + rxv_x(t,x) + \frac{1}{2}\sigma_0^2 x^2 v_{xx}(t,x) - rv(t,x) = 0.$$

The trader uses the proceeds of the option sale to form a hedge portfolio with initial value $X_0 = v(0,S_0)$, and then uses the hedge $H_t = v_x(t,S_t)$ (so that $X_t - H_t S_t$ is in cash) at $t \in [0,T]$.
Define $R_t := X_t - v(t,S_t)$ as the tracking error (or residual risk). Using the Itô formula and the PDE satisfied by $v(t,x)$, we have (exercise!)
$$d(e^{-rt} R_t) = \frac{1}{2} e^{-rt} S_t^2 v_{xx}(t,S_t)\,(\sigma_0^2 - \sigma_t^2)\,dt.$$
We conclude that, since $v_{xx}(t,S_t) \ge 0$ (for both a call and a put), we have $R_T \ge 0$ a.s. if $\sigma_0 \ge \sigma_t$ for all $t \in [0,T]$. In other words, the hedging strategy makes a profit with probability 1 if the implied volatility $\sigma_0$ is high enough. In this sense, successful hedging is entirely a matter of good volatility estimation.
This is a crucial result, as it shows that successful hedging is quite possible even under significant
model error. Without some robustness property of this kind, it is hard to imagine that the
derivatives industry could exist at all.
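The robustness argument can be seen in a discrete-time simulation: hedge a sold call using the BS delta at an implied volatility $\sigma_0$ while the stock actually moves with a lower volatility. The continuous-time conclusion $R_T \ge 0$ holds only up to discrete-rebalancing error, so with a reasonably fine grid almost every path should show a profit; all parameter values below are illustrative.

```python
import math
import random

def bs_delta(S, K, sigma, tau):
    # Black-Scholes call delta with zero interest rate
    d1 = (math.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    return 0.5 * (1 + math.erf(d1 / math.sqrt(2)))

def bs_call(S, K, sigma, tau):
    d1 = (math.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return S * N(d1) - K * N(d2)

def tracking_errors(sigma0, sigma_true, mu, S0=100.0, K=100.0, T=1.0,
                    n_steps=250, n_paths=2000, seed=7):
    """R_T = X_T - payoff for a delta hedge run at implied vol sigma0,
    while the stock actually has volatility sigma_true (r = 0)."""
    rng = random.Random(seed)
    dt = T / n_steps
    out = []
    for _ in range(n_paths):
        S = S0
        X = bs_call(S0, K, sigma0, T)        # option premium received
        for i in range(n_steps):
            tau = T - i * dt
            H = bs_delta(S, K, sigma0, tau)  # hedge at implied vol
            S_new = S * math.exp((mu - 0.5 * sigma_true**2) * dt
                                 + sigma_true * math.sqrt(dt) * rng.gauss(0, 1))
            X += H * (S_new - S)             # self-financing update
            S = S_new
        out.append(X - max(S - K, 0.0))
    return out

R = tracking_errors(sigma0=0.30, sigma_true=0.15, mu=0.05)
print(sum(R) / len(R))                  # average profit, positive
print(sum(r > 0 for r in R) / len(R))   # fraction of profitable paths
```

Note that the true drift plays no role in the sign of the result, exactly as in the continuous-time argument.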
Appendix A. Conditional expectation, martingales, equivalent measures

A.1. Independence. Here is a general treatment of independence in a finite probability space $(\Omega, \mathcal{F}, P)$. Many of the definitions as written here extend to general probability spaces.

Definition A.1 (Independence of sets). Two sets $A \in \mathcal{F}$ and $B \in \mathcal{F}$ are independent if
$$P(A \cap B) = P(A)P(B).$$


To see that this is a correct definition, suppose that a random experiment is conducted, and $\omega$ is the outcome. The probability that $\omega \in A$ is $P(A)$. Suppose you are not told $\omega$, but you are told that $\omega \in B$. Conditional on this information, the probability that $\omega \in A$ is
$$P(A \mid B) := \frac{P(A \cap B)}{P(B)}.$$
The sets $A$ and $B$ are independent if and only if this conditional probability is the unconditional probability $P(A)$, i.e. knowing that $\omega \in B$ does not change the probability you assign to $A$. This discussion is symmetric with respect to $A$ and $B$; if $A$ and $B$ are independent and you know that $\omega \in A$, the conditional probability you assign to $B$ is still the unconditional probability $P(B)$.
Note that whether two sets are independent depends on the probability measure P.
Definition A.2 (Independence of σ-algebras). Let $\mathcal{G}$ and $\mathcal{H}$ be sub-σ-algebras of $\mathcal{F}$. We say that $\mathcal{G}$ and $\mathcal{H}$ are independent if every set in $\mathcal{G}$ is independent of every set in $\mathcal{H}$, i.e.
$$P(A \cap B) = P(A)P(B), \quad \text{for every } A \in \mathcal{G},\ B \in \mathcal{H}.$$

Definition A.3 (Independence of random variables). Two random variables $X$ and $Y$ are independent if the σ-algebras they generate, $\sigma(X)$ and $\sigma(Y)$, are independent.
The above definition says that for independent random variables X and Y , every set defined
in terms of X is independent of every set defined in terms of Y .
Suppose $X$ and $Y$ are independent random variables. The measure induced by $X$ on $\mathbb{R}$ is $\mu_X(A) := P\{X \in A\}$, for $A \subseteq \mathbb{R}$. Similarly, the measure induced by $Y$ is $\mu_Y(B) := P\{Y \in B\}$, for $B \subseteq \mathbb{R}$. The pair $(X,Y)$ takes values in the plane $\mathbb{R}^2$, and we define the measure induced by the pair $(X,Y)$ as
$$\mu_{X,Y}(C) := P\{(X,Y) \in C\}, \quad C \subseteq \mathbb{R}^2.$$
In particular, $C$ could be a rectangle, i.e. a set of the form $A \times B$, where $A \subseteq \mathbb{R}$ and $B \subseteq \mathbb{R}$. In this case
$$\{(X,Y) \in C\} = \{(X,Y) \in A \times B\} = \{X \in A\} \cap \{Y \in B\},$$
and $X$ and $Y$ are independent if and only if
$$\mu_{X,Y}(A \times B) = P(\{X \in A\} \cap \{Y \in B\}) = P\{X \in A\}\,P\{Y \in B\} = \mu_X(A)\mu_Y(B).$$
In other words, for independent random variables $X$ and $Y$, the joint distribution represented by $\mu_{X,Y}$ factorises into the product of the marginal distributions represented by the measures $\mu_X$ and $\mu_Y$.
Theorem A.4. Suppose $X$ and $Y$ are independent random variables. Let $g$ and $h$ be functions from $\mathbb{R}$ to $\mathbb{R}$. Then $g(X)$ and $h(Y)$ are also independent random variables.

Proof. We prove this only in the special case where $g, h$ are bijections (but the result is true in general). Put $W = g(X)$ and $Z = h(Y)$. We must consider sets in $\sigma(W)$ and $\sigma(Z)$. But a typical set in $\sigma(W)$ is of the form
$$\{\omega : W(\omega) \in A\} = \{\omega : g(X(\omega)) \in A\},$$
which is defined in terms of the random variable $X$, and is therefore in $\sigma(X)$. So, every set in $\sigma(W)$ is also in $\sigma(X)$. Similarly, every set in $\sigma(Z)$ is also in $\sigma(Y)$. Since every set in $\sigma(X)$ is independent of every set in $\sigma(Y)$, we conclude that every set in $\sigma(W)$ is independent of every set in $\sigma(Z)$.



Definition A.5. Let $X_1, X_2, \ldots$ be a sequence of random variables. We say that these random variables are independent if for every sequence of sets $A_1 \in \sigma(X_1), A_2 \in \sigma(X_2), \ldots$, and for every positive integer $n$,
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n).$$
Theorem A.6. If two random variables $X$ and $Y$ are independent, and if $g$ and $h$ are functions from $\mathbb{R}$ to $\mathbb{R}$, then
$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)],$$
provided all the expectations are defined.
Proof. Note that by Theorem A.4 it is enough to prove the result for $g(x) = x$ and $h(y) = y$. We prove this only in a finite probability space, when $X, Y$ can take on only finitely many values $x_i, i = 1, \ldots, K$ and $y_j, j = 1, \ldots, L$. We use the fact that in this case the expectation of $X$ has the familiar form $E[X] = \sum_{i=1}^K x_i P\{X = x_i\}$. So we have
$$E[XY] = \sum_{i=1}^K \sum_{j=1}^L x_i y_j\, P(\{X = x_i\} \cap \{Y = y_j\}) = \sum_{i=1}^K \sum_{j=1}^L x_i y_j\, P\{X = x_i\}\,P\{Y = y_j\} = \left(\sum_{i=1}^K x_i P\{X = x_i\}\right)\left(\sum_{j=1}^L y_j P\{Y = y_j\}\right) = E[X]\,E[Y].$$
Remark A.7 (The standard machine). For general probability spaces, the above theorem is proved using the Lebesgue integral representation of expectation, and an argument which Shreve [17] (Section 1.5) calls the standard machine. Let $g(x) = 1_A(x)$ and $h(y) = 1_B(y)$ be indicator functions. Then the equation we are trying to prove becomes
$$P(\{X \in A\} \cap \{Y \in B\}) = P\{X \in A\}\,P\{Y \in B\},$$
which is true because $X$ and $Y$ are independent. Now this is extended to simple functions (sums of indicator functions) by linearity of expectation. Sequences of such functions can always be constructed that converge to general functions $g$ and $h$, and then an integral convergence theorem, the Monotone Convergence Theorem, gives the result.
The covariance of two random variables $X$ and $Y$ is
$$\operatorname{cov}(X,Y) := E[(X - EX)(Y - EY)] = E[XY] - E[X]\,E[Y],$$
so $\operatorname{var}(X) = \operatorname{cov}(X,X)$. According to Theorem A.6, two independent random variables have zero covariance (though the converse is not necessarily true!).
The Monotone Convergence Theorem is as follows. Let $X_n, n = 1, 2, \ldots$ be a sequence of random variables converging almost surely to a random variable $X$ (that is, $P\{\lim_{n\to\infty} X_n = X\} = 1$). Assume that
$$0 \le X_1 \le X_2 \le \cdots, \quad \text{almost surely}.$$
Then
$$\int_\Omega X\,dP = \lim_{n\to\infty} \int_\Omega X_n\,dP, \quad \text{or equivalently} \quad E[X] = \lim_{n\to\infty} E[X_n].$$


For independent random variables, the variance of their sum is the sum of their variances. Indeed, for any two random variables $X$ and $Y$, with $Z = X + Y$, we have
$$\operatorname{var}(Z) = \operatorname{var}(X+Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X,Y),$$
so that for independent $X$ and $Y$, $\operatorname{var}(X+Y) = \operatorname{var}(X) + \operatorname{var}(Y)$. This argument extends to any finite number of random variables: if we are given independent random variables $X_1, X_2, \ldots, X_n$, then
$$\operatorname{var}(X_1 + \cdots + X_n) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_n).$$
Example A.8. Toss a coin twice, so $\Omega = \{HH, HT, TH, TT\}$, with probability $p \in (0,1)$ for $H$ and probability $q = 1-p$ for $T$ on each toss. Let $A = \{HH, HT\}$ and $B = \{HT, TH\}$. We have $P(A) = p^2 + pq = p$, $P(B) = pq + qp = 2pq$, and $P(A \cap B) = pq$. The sets $A$ and $B$ are independent if and only if $2p^2 q = pq$, that is, if and only if $p = \frac{1}{2}$.

Let $\mathcal{G} = \mathcal{F}_1$ be the σ-algebra determined by the first toss and $\mathcal{H}$ be the σ-algebra determined by the second toss. Then, writing $A_H := \{HH, HT\}$ and $A_T := \{TH, TT\}$, we have
$$\mathcal{G} = \{\emptyset, \Omega, A_H, A_T\}, \qquad \mathcal{H} = \{\emptyset, \Omega, \{HH, TH\}, \{HT, TT\}\}.$$
It is easy to see that these two σ-algebras are independent. For example, if we choose $A_H$ from $\mathcal{G}$ and $\{HH, TH\}$ from $\mathcal{H}$, we find
$$P(A_H)\,P\{HH, TH\} = p(p^2 + pq) = p^2, \qquad P(A_H \cap \{HH, TH\}) = P\{HH\} = p^2.$$
This will be true no matter which sets we choose from $\mathcal{G}$ and $\mathcal{H}$. This captures the notion that the coin tosses are independent of each other.
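Example A.8 is easy to verify by enumerating the four-point sample space. The snippet below checks, with exact rational arithmetic, that $A$ and $B$ are independent precisely when $p = 1/2$, while every pair of sets from the first-toss and second-toss σ-algebras is independent for any $p$ (here tested at $p = 1/3$).

```python
from fractions import Fraction as F

def prob(event, p):
    """Probability of a set of two-toss outcomes, with P(H) = p per toss."""
    q = 1 - p
    w = lambda om: (p if om[0] == 'H' else q) * (p if om[1] == 'H' else q)
    return sum(w(om) for om in event)

def independent(E1, E2, p):
    return prob(E1 & E2, p) == prob(E1, p) * prob(E2, p)

A, B = {'HH', 'HT'}, {'HT', 'TH'}
print(independent(A, B, F(1, 2)))   # True:  p = 1/2
print(independent(A, B, F(1, 3)))   # False: p != 1/2

# First-toss vs second-toss sigma-algebras: independent for any p
G = [set(), {'HH', 'HT', 'TH', 'TT'}, {'HH', 'HT'}, {'TH', 'TT'}]
H = [set(), {'HH', 'HT', 'TH', 'TT'}, {'HH', 'TH'}, {'HT', 'TT'}]
print(all(independent(E1, E2, F(1, 3)) for E1 in G for E2 in H))  # True
```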
A.2. Conditional expectation. Recall Definition 4.1 of the conditional expectation $E[X|\mathcal{G}]$ of a random variable $X$ on $(\Omega, \mathcal{F}, P)$, with $\mathcal{G}$ a sub-σ-algebra of $\mathcal{F}$.
A.2.1. Partial averaging. The partial averaging property is
$$\int_A E[X|\mathcal{G}]\,dP = \int_A X\,dP, \quad A \in \mathcal{G}.$$
We can rewrite this as

(A.1)
$$E[1_A\, E[X|\mathcal{G}]] = E[1_A X].$$

Note that $1_A(\omega)$ (which equals 1 for $\omega \in A$ and 0 otherwise) is a $\mathcal{G}$-measurable random variable. Equation (A.1) suggests (and it is indeed true) that the following holds.
Lemma A.9. If $V$ is any $\mathcal{G}$-measurable random variable, then provided $E|V\,E[X|\mathcal{G}]| < \infty$,

(A.2)
$$E[V\,E[X|\mathcal{G}]] = E[VX].$$

Proof. Here is a sketch of the proof in a general probability space, using an argument that Williams [18] calls the standard machine.
First use (A.1) and linearity of expectations to prove (A.2) when $V$ is a simple $\mathcal{G}$-measurable random variable, i.e. $V = \sum_{k=1}^K c_k 1_{A_k}$, where each $A_k \in \mathcal{G}$ and each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable, not necessarily simple. Such a $V$ can be written as the limit of an increasing (almost surely) sequence of simple random variables $V_n$. We write (A.2) for each $V_n$ and pass to the limit as $n \to \infty$, using the Monotone Convergence Theorem, to obtain (A.2) for $V$. Finally, the general (integrable) $\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables, $V = V^+ - V^-$, and since (A.2) holds for $V^+$ and $V^-$ it must hold for $V$ as well.



Based on Lemma A.9, we can replace the second condition in the definition of conditional expectation by (A.2), so that the defining properties of $Y = E[X|\mathcal{G}]$ are:
(1) $Y = E[X|\mathcal{G}]$ is $\mathcal{G}$-measurable.
(2) For every $\mathcal{G}$-measurable random variable $V$, we have
$$E[V\,E[X|\mathcal{G}]] = E[VX].$$
Note that we can write (A.2) as
$$E[V(E[X|\mathcal{G}] - X)] = 0,$$
which allows an interpretation of $E[X|\mathcal{G}]$ as the projection of the vector $X$ onto the subspace of $\mathcal{G}$-measurable random variables. Then $E[X|\mathcal{G}] - X$ is perpendicular to any $V$ in the subspace.
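On a finite sample space, conditional expectation with respect to a σ-algebra generated by a partition is just block averaging, and the projection picture can be checked numerically: $E[X|\mathcal{G}]$ averages $X$ over each partition block, and $E[V(E[X|\mathcal{G}] - X)] = 0$ for every $V$ constant on blocks. The partition and values below are made up for illustration.

```python
from fractions import Fraction as F

# Finite sample space with probabilities, and a random variable X on it
P = {'a': F(1, 6), 'b': F(1, 3), 'c': F(1, 4), 'd': F(1, 4)}
X = {'a': 4, 'b': 1, 'c': -2, 'd': 6}

# G generated by the partition {a,b} | {c,d}
partition = [{'a', 'b'}, {'c', 'd'}]

def cond_exp(X, partition, P):
    """E[X|G]: on each block, the P-weighted average of X."""
    Y = {}
    for block in partition:
        pb = sum(P[om] for om in block)
        avg = sum(X[om] * P[om] for om in block) / pb
        for om in block:
            Y[om] = avg
    return Y

Y = cond_exp(X, partition, P)
E = lambda Z: sum(Z[om] * P[om] for om in P)

# Partial averaging: integrals over each block agree
for block in partition:
    assert sum(Y[om] * P[om] for om in block) == \
           sum(X[om] * P[om] for om in block)

# Orthogonality: E[V (E[X|G] - X)] = 0 for any G-measurable V
V = {'a': 7, 'b': 7, 'c': -3, 'd': -3}   # constant on blocks
print(E({om: V[om] * (Y[om] - X[om]) for om in P}))   # 0
```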
A.2.2. Properties of conditional expectation. Here are some proofs of the properties of conditional expectation given in Section 4.2 (and repeated below). All the $X$ below satisfy $E|X| < \infty$.
(1) $E[E[X|\mathcal{G}]] = E[X]$.

Proof. Take $A = \Omega$ in the partial averaging property (or, equivalently, $V = 1$ in (A.2)).

(2) If $X$ is $\mathcal{G}$-measurable, then $E[X|\mathcal{G}] = X$.

Proof. The partial averaging property $\int_A Y\,dP \equiv \int_A E[X|\mathcal{G}]\,dP = \int_A X\,dP$ holds trivially when $Y$ is replaced by $X$. Then, if $X$ is $\mathcal{G}$-measurable, it satisfies the first requirement in the definition of conditional expectation as well.

(3) (Linearity) For $a_1, a_2 \in \mathbb{R}$,
$$E[a_1 X_1 + a_2 X_2 \mid \mathcal{G}] = a_1 E[X_1|\mathcal{G}] + a_2 E[X_2|\mathcal{G}].$$
Proof. By linearity of integrals (i.e. of expectations), as follows: $E[a_1 X_1 + a_2 X_2|\mathcal{G}]$ is $\mathcal{G}$-measurable and satisfies, for any $A \in \mathcal{G}$,
$$\begin{aligned}
\int_A E[a_1 X_1 + a_2 X_2|\mathcal{G}]\,dP &= \int_A (a_1 X_1 + a_2 X_2)\,dP \quad \text{(partial averaging)} \\
&= a_1 \int_A X_1\,dP + a_2 \int_A X_2\,dP \quad \text{(linearity of integrals)} \\
&= a_1 \int_A E[X_1|\mathcal{G}]\,dP + a_2 \int_A E[X_2|\mathcal{G}]\,dP \quad \text{(partial averaging)} \\
&= \int_A (a_1 E[X_1|\mathcal{G}] + a_2 E[X_2|\mathcal{G}])\,dP \quad \text{(linearity of integrals)},
\end{aligned}$$
and this is the partial averaging property.



(4) (Positivity) If $X \ge 0$ almost surely, then $E[X|\mathcal{G}] \ge 0$ almost surely.

Proof. Let $A = \{\omega : E[X|\mathcal{G}](\omega) < 0\}$. This set is in $\mathcal{G}$ since $E[X|\mathcal{G}]$ is $\mathcal{G}$-measurable. Now, the partial averaging property says that
$$\int_A E[X|\mathcal{G}]\,dP = \int_A X\,dP.$$
The right-hand side of this is $\ge 0$, and the left-hand side is $< 0$ unless $P(A) = 0$. Therefore we must have $P(A) = 0$, so $E[X|\mathcal{G}] \ge 0$ almost surely.


(5) (Jensen's inequality) If $\phi : \mathbb{R} \to \mathbb{R}$ is convex and $E|\phi(X)| < \infty$, then
$$E[\phi(X)|\mathcal{G}] \ge \phi(E[X|\mathcal{G}]).$$

Proof. Recall the usual Jensen inequality: $E[\phi(X)] \ge \phi(E[X])$. The proof of the conditional version follows exactly the same lines (see the proof of Theorem 23.9 in Jacod and Protter [10]).

(6) (Tower property) If $\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$, then
$$E[E[X|\mathcal{G}]\,|\,\mathcal{H}] = E[X|\mathcal{H}], \quad \text{a.s.}$$

Proof. If $H \in \mathcal{H}$, then it is also true that $H \in \mathcal{G}$, since $\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$. Hence
$$\int_H E[E[X|\mathcal{G}]\,|\,\mathcal{H}]\,dP = \int_H E[X|\mathcal{G}]\,dP = \int_H X\,dP = \int_H E[X|\mathcal{H}]\,dP,$$
so by a.s. uniqueness of conditional expectations, there is a unique $\mathcal{H}$-measurable random variable on the left- and right-hand sides of the above, so we must have $E[E[X|\mathcal{G}]\,|\,\mathcal{H}] = E[X|\mathcal{H}]$ a.s.

(7) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then
$$E[ZX|\mathcal{G}] = Z\,E[X|\mathcal{G}].$$

Proof. $Z\,E[X|\mathcal{G}]$ is $\mathcal{G}$-measurable (since the product of $\mathcal{G}$-measurable functions is $\mathcal{G}$-measurable), so it satisfies the first property of a conditional expectation. So we check the partial averaging property. For $A \in \mathcal{G}$ we have
$$\int_A Z\,E[X|\mathcal{G}]\,dP = E[1_A Z\,E[X|\mathcal{G}]] = E[1_A ZX] = \int_A ZX\,dP = \int_A E[ZX|\mathcal{G}]\,dP$$
(the second equality obtained using (A.2) with $V = 1_A Z$), so the partial averaging property holds.

(8) (Role of independence) If $X$ is independent of $\mathcal{H}$ (i.e. if $\sigma(X)$ and $\mathcal{H}$ are independent σ-algebras), then
$$E[X|\mathcal{H}] = E[X].$$

Proof. Observe first that $E[X]$ is $\mathcal{H}$-measurable, since it is not random. So we only need to check the partial averaging property; we require that
$$\int_A E[X]\,dP = \int_A X\,dP, \quad A \in \mathcal{H}.$$
If $X$ is an indicator of some set $B$, which by assumption must be independent of $\mathcal{H}$, then the partial averaging equation we must check is
$$\int_A P(B)\,dP = \int_A 1_B\,dP.$$
The left-hand side is $P(A)P(B)$, and the right-hand side is
$$\int_\Omega 1_A 1_B\,dP = \int_\Omega 1_{A \cap B}\,dP = P(A \cap B),$$
and so the partial averaging property holds because the sets $A$ and $B$ are independent. The partial averaging property for general $X$ independent of $\mathcal{H}$ then follows by the standard machine.

Remark A.10. There are also analogues of integral convergence theorems such as Fatou's Lemma and the Monotone and Dominated Convergence Theorems for conditional expectations, as opposed to ordinary expectations.
A.3. Martingales. A simple argument using the tower property and induction shows the following.

Lemma A.11. Let $(M_t)_{t=0}^n$ be a martingale with respect to the filtration $(\mathcal{F}_t)_{t=0}^n$. Then
$$E[M_{t+u}|\mathcal{F}_t] = M_t,$$
for arbitrary $u \in \{1, 2, \ldots, n-t\}$.
Proof. Consider $E[M_{t+2}|\mathcal{F}_t]$. By the tower property,
$$E[M_{t+2}|\mathcal{F}_t] = E[E[M_{t+2}|\mathcal{F}_{t+1}]\,|\,\mathcal{F}_t] = E[M_{t+1}|\mathcal{F}_t] = M_t,$$
and continuing in this fashion we get
$$E[M_{t+u}|\mathcal{F}_t] = M_t, \quad \text{for } u = 1, 2, \ldots, n-t.$$


Lemma A.12. Let $X$ be an integrable random variable ($E|X| < \infty$) on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^n, P)$. Define
$$M_t := E[X|\mathcal{F}_t], \quad t \in \{0, 1, \ldots, n\}.$$
Then $M := (M_t)_{t=0}^n$ is a $(P, \mathbb{F})$-martingale.

Proof. Integrability of $M$ is clear, since we have
$$E[|M_t|] = E[|E[X|\mathcal{F}_t]|] \le E[E[|X|\,|\,\mathcal{F}_t]] = E[|X|] < \infty,$$
the inequality following from the conditional Jensen inequality and the monotonicity of the expectation operator. For the martingale property, we have
$$E[M_{t+1}|\mathcal{F}_t] = E[E[X|\mathcal{F}_{t+1}]\,|\,\mathcal{F}_t] = E[X|\mathcal{F}_t] \quad \text{(by the tower property)} \quad = M_t.$$

Recall Definition 4.5 of a predictable process.
Proposition A.13 (Martingale transform). Let $(M_t)_{t=0}^n$ be a martingale on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^n, P)$. Let $(\phi_t)_{t=1}^n$ be a bounded predictable process. Then the process $N := (N_t)_{t=1}^n$ defined by
$$N_t := \sum_{s=1}^{t} \phi_s (M_s - M_{s-1})$$
is a $(P, \mathbb{F})$-martingale.


Proof. Boundedness of $\phi$ leads easily to integrability of $N$. We also have
$$\begin{aligned}
E[N_{t+1}|\mathcal{F}_t] &= E\Big[\sum_{s=1}^{t+1} \phi_s(M_s - M_{s-1}) \,\Big|\, \mathcal{F}_t\Big] \\
&= E\Big[\sum_{s=1}^{t} \phi_s(M_s - M_{s-1}) + \phi_{t+1}(M_{t+1} - M_t) \,\Big|\, \mathcal{F}_t\Big] \\
&= E[N_t + \phi_{t+1}(M_{t+1} - M_t) \mid \mathcal{F}_t] \\
&= N_t + \phi_{t+1}\big(\underbrace{E[M_{t+1}|\mathcal{F}_t] - M_t}_{=0}\big) \quad \text{(since $N_t$ and $\phi_{t+1}$ are $\mathcal{F}_t$-measurable)} \\
&= N_t.
\end{aligned}$$

Remark A.14. The process $N$ is called a martingale transform or a discrete-time stochastic integral, and is sometimes denoted $N_t = \int_{[0,t]} \phi_s\,dM_s$ or $N_t = (\phi \cdot M)_t$.
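The martingale transform is easy to verify by exhaustive enumeration on a small example. Below, $M$ is a symmetric simple random walk over $n$ coin tosses, and the predictable process is a deliberately non-trivial rule depending on the path so far; exact enumeration over all $2^n$ paths confirms $E[N_n] = 0$, as Proposition A.13 implies.

```python
from fractions import Fraction as F
from itertools import product

n = 8
half = F(1, 2)
total = F(0)
for path in product([1, -1], repeat=n):   # increments of the walk M
    M = [0]
    for x in path:
        M.append(M[-1] + x)
    # Predictable phi_s: may depend only on M_0, ..., M_{s-1};
    # here, bet 1 when the walk is currently below 0, else bet 2
    N = 0
    for s in range(1, n + 1):
        phi = 1 if M[s - 1] < 0 else 2
        N += phi * (M[s] - M[s - 1])
    total += half**n * N

print(total)   # 0: the transform of a martingale has zero expectation
```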
The following proposition is another very useful characterisation of martingales.

Proposition A.15. On a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^n, P)$, an adapted sequence of real random variables $(M_t)_{t=0}^n$ is a $(P, \mathbb{F})$-martingale if and only if for any predictable process $(\phi_t)_{t=1}^n$, we have
$$E\Big(\sum_{s=1}^t \phi_s\,\Delta M_s\Big) = 0,$$
where $\Delta M_s := M_s - M_{s-1}$.
Proof. If $(M_t)_{t=0}^n$ is a martingale, define the process $X := (X_t)_{t=0}^n$ by $X_0 := 0$ and, for $t = 1, \ldots, n$, $X_t := \sum_{s=1}^t \phi_s\,\Delta M_s$, for any predictable process $(\phi_t)_{t=1}^n$. Then $X$ is also a martingale, by Proposition A.13, and so $E[X_t] = X_0 = 0$.

Conversely, suppose $E\big(\sum_{s=1}^t \phi_s\,\Delta M_s\big) = 0$ holds for any predictable $\phi$. Take $m \in \{0, 1, \ldots, n-1\}$, let $A \in \mathcal{F}_m$ be given, and define a predictable process by setting $\phi_{m+1} = 1_A$, $\phi_t = 0$ for all other $t \in \{1, \ldots, n\}$. Then
$$0 = E\Big(\sum_{s=1}^n \phi_s\,\Delta M_s\Big) = E[1_A(M_{m+1} - M_m)] = E\big[E[1_A(M_{m+1} - M_m)\,|\,\mathcal{F}_m]\big] = E\big[1_A(E[M_{m+1}|\mathcal{F}_m] - M_m)\big].$$
Since this holds for all $A \in \mathcal{F}_m$, it follows that $E[M_{m+1}|\mathcal{F}_m] = M_m$, so $M$ is a martingale.

A.4. Equivalent measures and the Radon-Nikodym theorem. Here is a deep theorem, which we do not prove.

Theorem A.16 (Radon-Nikodym). Let $P$ and $Q$ be two probability measures on a measurable space $(\Omega, \mathcal{F})$, such that $Q$ is absolutely continuous with respect to $P$. Under this assumption, there is a nonnegative random variable $Z$ such that

(A.3)
$$Q(A) = \int_A Z\,dP, \quad A \in \mathcal{F},$$

and $Z$ is called the Radon-Nikodym derivative of $Q$ with respect to $P$.


The random variable $Z$ is often written as
$$Z = \frac{dQ}{dP}.$$
Equation (A.3) implies the apparently stronger condition
$$E^Q[X] = E[XZ]$$
for every random variable $X$ for which $E|XZ| < \infty$. To see this, note that (A.3) in Theorem A.16 is equivalent to
$$E^Q[1_A] = E[1_A Z], \quad A \in \mathcal{F}.$$
This is then extended to general $X$ via the standard machine argument.
If $P$ and $Q$ are equivalent and $Z$ is the Radon-Nikodym derivative of $Q$ w.r.t. $P$, then $\frac{1}{Z}$ is the Radon-Nikodym derivative of $P$ w.r.t. $Q$, i.e.

(A.4)
$$E^Q[X] = E[XZ], \quad \text{for all } X,$$
(A.5)
$$E[Y] = E^Q\Big[Y\,\frac{1}{Z}\Big], \quad \text{for all } Y,$$

and letting $X$ and $Y$ be related by $Y = XZ$ we see that the above two equations are the same.
Example A.17 (Radon-Nikodym theorem in 2-period coin toss space). Let $\Omega = \Omega_2$ be given by
$$\Omega_2 = \{HH, HT, TH, TT\},$$
the set of coin toss sequences of length 2. Let $P$ correspond to probability $\frac{1}{3}$ for $H$ and $\frac{2}{3}$ for $T$, and let $Q$ correspond to probability $\frac{1}{2}$ for $H$ and $\frac{1}{2}$ for $T$. Then the Radon-Nikodym derivative of $Q$ w.r.t. $P$ is easily seen to be
$$Z(\omega) = \frac{Q(\omega)}{P(\omega)}, \quad \omega \in \Omega,$$
so that
$$Z(HH) = \frac{9}{4}, \quad Z(HT) = \frac{9}{8}, \quad Z(TH) = \frac{9}{8}, \quad Z(TT) = \frac{9}{16}.$$
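These values, and the change-of-measure identity $E^Q[X] = E[XZ]$, can be checked mechanically with exact arithmetic (the random variable $X$ below is an arbitrary choice for illustration):

```python
from fractions import Fraction as F

pH, qH = F(1, 3), F(1, 2)   # P(H) and Q(H) for a single toss
Omega = ['HH', 'HT', 'TH', 'TT']

def P(om):
    return (pH if om[0] == 'H' else 1 - pH) * (pH if om[1] == 'H' else 1 - pH)

def Q(om):
    return (qH if om[0] == 'H' else 1 - qH) * (qH if om[1] == 'H' else 1 - qH)

Z = {om: Q(om) / P(om) for om in Omega}
print(Z['HH'], Z['HT'], Z['TH'], Z['TT'])   # 9/4 9/8 9/8 9/16

# E^Q[X] = E[X Z] for an arbitrary random variable X
X = {'HH': 4, 'HT': 2, 'TH': 1, 'TT': 0}
EQ_X = sum(X[om] * Q(om) for om in Omega)
E_XZ = sum(X[om] * Z[om] * P(om) for om in Omega)
print(EQ_X, E_XZ)   # equal
```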

A.4.1. Radon-Nikodym martingales. Let $\Omega$ be a finite set (such as the set of all sequences of $n$ coin tosses). Let $\mathbb{F} = (\mathcal{F}_t)_{t=0}^n$ be a filtration. Let $P$ be a probability measure and let $Q$ be a measure absolutely continuous with respect to $P$, written as $Q \ll P$. Assume
$$P(\omega) > 0, \quad Q(\omega) > 0, \quad \omega \in \Omega,$$
so that $P$ and $Q$ are equivalent. The Radon-Nikodym derivative of $Q$ with respect to $P$ is
$$Z(\omega) = \frac{Q(\omega)}{P(\omega)}.$$
Define the $P$-martingale
$$Z_t := E[Z|\mathcal{F}_t], \quad t = 0, 1, \ldots, n.$$
We can check that $(Z_t)$ is indeed a martingale:
$$E[Z_{t+1}|\mathcal{F}_t] = E[E[Z|\mathcal{F}_{t+1}]\,|\,\mathcal{F}_t] = E[Z|\mathcal{F}_t] = Z_t.$$

Lemma A.18. For $t \in \{0, 1, \ldots, n\}$, if $X$ is $\mathcal{F}_t$-measurable, then $E^Q[X] = E[XZ_t]$.


Proof.
$$E^Q[X] = E[XZ] = E[E[XZ|\mathcal{F}_t]] = E[X\,E[Z|\mathcal{F}_t]] = E[XZ_t],$$
using (A.4), iterated expectations, and taking out what is known ($X$ is $\mathcal{F}_t$-measurable).

Note that Lemma A.18 implies that if $X$ is $\mathcal{F}_t$-measurable, then for any $A \in \mathcal{F}_t$,
$$E^Q[1_A X] = E[1_A X Z_t],$$
or equivalently,
$$\int_A X\,dQ = \int_A X Z_t\,dP.$$

Lemma A.19. If $X$ is $\mathcal{F}_t$-measurable and $0 \le s \le t$, then
$$E^Q[X|\mathcal{F}_s] = \frac{1}{Z_s} E[XZ_t|\mathcal{F}_s].$$

Proof. Note first that $\frac{1}{Z_s} E[XZ_t|\mathcal{F}_s]$ is $\mathcal{F}_s$-measurable. So for any $A \in \mathcal{F}_s$, we have
$$\begin{aligned}
\int_A \frac{1}{Z_s} E[XZ_t|\mathcal{F}_s]\,dQ &= \int_A E[XZ_t|\mathcal{F}_s]\,dP \quad \text{(Lemma A.18)} \\
&= \int_A XZ_t\,dP \quad \text{(partial averaging)} \\
&= \int_A X\,dQ \quad \text{(Lemma A.18)} \\
&= \int_A E^Q[X|\mathcal{F}_s]\,dQ \quad \text{(partial averaging)}.
\end{aligned}$$


Example A.20 (Radon-Nikodym theorem in 2-period coin toss space, continued). We show in Figure 19 the values of the martingale $Z_t$. Note that we always have $Z_0 = 1$, since
$$Z_0 = E[Z] = \int_\Omega Z\,dP = Q(\Omega) = 1.$$

[Figure 19. The values of the Radon-Nikodym martingale $Z_t$ in the 2-period binomial model example: $Z_0 = 1$; $Z_1(H) = \frac{3}{2}$, $Z_1(T) = \frac{3}{4}$; $Z_2(HH) = \frac{9}{4}$, $Z_2(HT) = Z_2(TH) = \frac{9}{8}$, $Z_2(TT) = \frac{9}{16}$.]
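The tree values in Figure 19 follow from one-step conditional averaging under $P$: $Z_1$ on each first-toss atom is the $P$-average of the $Z_2$ values it leads to, and $Z_0$ is the $P$-average of the $Z_1$ values. A quick exact check:

```python
from fractions import Fraction as F

p, q = F(1, 3), F(2, 3)   # P-probabilities of H and T

Z2 = {'HH': F(9, 4), 'HT': F(9, 8), 'TH': F(9, 8), 'TT': F(9, 16)}

# Z_1 = E[Z_2 | first toss]: average over the second toss under P
Z1 = {'H': p * Z2['HH'] + q * Z2['HT'],
      'T': p * Z2['TH'] + q * Z2['TT']}

# Z_0 = E[Z_1]: average over the first toss under P
Z0 = p * Z1['H'] + q * Z1['T']

print(Z1['H'], Z1['T'], Z0)   # 3/2 3/4 1
```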


A.4.2. Conditional expectation and the Radon-Nikodym theorem. Here, we give another application of the Radon-Nikodym theorem.
Let $(\Omega, \mathcal{F}, Q)$ be a probability space. Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$, and let $X$ be a non-negative random variable with

(A.6)
$$\int_\Omega X\,dQ = 1.$$

We can construct the conditional expectation (under $Q$) of $X$ given $\mathcal{G}$. Recall that this is the (unique) $\mathcal{G}$-measurable random variable $E^Q[X|\mathcal{G}]$ that satisfies the partial averaging property
$$\int_A E^Q[X|\mathcal{G}]\,dQ = \int_A X\,dQ, \quad A \in \mathcal{G}.$$

On $\mathcal{G}$ we can define two probability measures $P$ and $\tilde{P}$ by
$$P(A) = Q(A), \quad A \in \mathcal{G},$$
and
$$\tilde{P}(A) = \int_A X\,dQ, \quad A \in \mathcal{G}.$$
Notice that $\tilde{P}$ is indeed a probability measure, since it satisfies $\tilde{P}(\Omega) = 1$, by (A.6).
= 1, by (A.6).
Now, whenever $Y$ is a $\mathcal{G}$-measurable random variable, we have

(A.7)
$$\int_\Omega Y\,dP = \int_\Omega Y\,dQ,$$

since if $Y = 1_A$ for some $A \in \mathcal{G}$ then (A.7) is just the definition of $P$, and the full result follows from the standard machine.
Also, if $A \in \mathcal{G}$ and $P(A) = 0$, then $Q(A) = 0$, so that $\tilde{P}(A) = 0$. In other words, the measure $\tilde{P}$ is absolutely continuous with respect to the measure $P$. The Radon-Nikodym Theorem then implies that there exists a $\mathcal{G}$-measurable random variable $Z$ such that
$$\tilde{P}(A) = \int_A Z\,dP, \quad A \in \mathcal{G},$$
that is,
$$\int_A X\,dQ = \int_A Z\,dP, \quad A \in \mathcal{G},$$
or, by (A.7),
$$\int_A X\,dQ = \int_A Z\,dQ, \quad A \in \mathcal{G}.$$
Thus $Z$ has the partial averaging property, and since it is $\mathcal{G}$-measurable, it is the conditional expectation (under $Q$) of $X$ given $\mathcal{G}$. In other words, the existence of conditional expectations is a consequence of the Radon-Nikodym Theorem.
Appendix B. Markov processes

Definition B.1. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $(\mathcal{F}_t)_{t=0}^n$ be a filtration of sub-σ-algebras of $\mathcal{F}$. Let $(X_t)_{t=0}^n$ be a stochastic process on $(\Omega, \mathcal{F}, P)$. This process is said to be Markov if $(X_t)$ is adapted to the filtration $(\mathcal{F}_t)$, and:

The Markov property: For each $t = 0, 1, \ldots, n-1$, the distribution of $X_{t+1}$ conditioned on $\mathcal{F}_t$ is the same as the distribution of $X_{t+1}$ conditioned on $X_t$.
It is intuitively clear that the stock price process in the binomial model is Markov (we shall prove this formally later). If we want to estimate the distribution of $h(S_{t+1})$, where $h$ is any function, based on the information in $\mathcal{F}_t$, the only relevant piece of information is the value of $S_t$. For example,
$$E[S_{t+1}|\mathcal{F}_t] = (pu + qd)S_t = (1+r)S_t,$$
which is a function of $S_t$.
B.1. Proving a process is Markov. Recall the notions of independence of σ-algebras, and of a random variable $X$ being independent of a σ-algebra $\mathcal{G}$ (that is, the σ-algebra generated by $X$ is independent of $\mathcal{G}$). Recall also the following facts about expectations involving independent random variables.
Any random variable $X$ induces a measure $\mu_X$ on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, defined by
$$\mu_X(B) := P(X^{-1}(B)) = P\{X \in B\},$$
for any set $B \subseteq \mathbb{R}$ in the Borel σ-algebra $\mathcal{B}(\mathbb{R})$. Then the expectation of $h(X)$ is defined by the Lebesgue integral
$$E[h(X)] := \int_\Omega h(X)\,dP = \int_{\mathbb{R}} h(x)\,d\mu_X(x) \equiv \int_{\mathbb{R}} h(x)\,\mu_X(dx).$$
[Aside: To prove the last equality: if $h(x) = 1_B(x)$ for some $B \subseteq \mathbb{R}$, then we have $E[h(X)] = P\{X \in B\} =: \mu_X(B) = \int_{\mathbb{R}} 1_B(x)\,d\mu_X(x) = \int_{\mathbb{R}} h(x)\,d\mu_X(x)$, which is true by definition. Now use the standard machine to extend the result to general $h(\cdot)$.]
For two random variables $X, Y$, their joint law is the measure $\mu_{X,Y}$ on $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$ defined by
$$\mu_{X,Y}(C) := P\{(X,Y) \in C\},$$
for $C \in \mathcal{B}(\mathbb{R}^2)$. Then the expectation of $f(X,Y)$ is given by
$$E[f(X,Y)] := \int_\Omega f(X,Y)\,dP = \int_{\mathbb{R}^2} f(x,y)\,d\mu_{X,Y}(x,y) \equiv \int_{\mathbb{R}^2} f(x,y)\,\mu_{X,Y}(dx,dy).$$

Suppose $X$ and $Y$ are independent. Let $C \subseteq \mathbb{R}^2$ be a rectangle, $C = A \times B$, for $A, B \subseteq \mathbb{R}$. In this case,
$$\{(X,Y) \in C\} = \{(X,Y) \in A \times B\} = \{X \in A\} \cap \{Y \in B\}.$$
Then, as $X$ and $Y$ are independent,
$$\mu_{X,Y}(C) = \mu_{X,Y}(A \times B) = P\{(X,Y) \in C\} = P(\{X \in A\} \cap \{Y \in B\}) = P\{X \in A\}\,P\{Y \in B\} = \mu_X(A)\mu_Y(B).$$
In other words, the joint distribution of $X, Y$ factorises into the product of the marginal distributions $\mu_X, \mu_Y$. In particular, we then have

(B.1)
$$E[f(X,Y)] = \int_{\mathbb{R}^2} f(x,y)\,\mu_{X,Y}(dx,dy) = \int_{\mathbb{R}^2} f(x,y)\,\mu_X(dx)\,\mu_Y(dy).$$

(Note this implies that $E[XY] = E[X]\,E[Y]$.)


B.1.1. The Independence Lemma. The following lemma will be useful in proving a process is Markov.

Lemma B.2 (Independence Lemma). Let $X$ and $Y$ be random variables taking values in $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Suppose that $X$ is independent of $\mathcal{G}$ while $Y$ is $\mathcal{G}$-measurable. Then for any measurable function $f : \mathbb{R}^{n+m} \to \mathbb{R}$ such that $E[|f(X,Y)|] < \infty$, we have
$$E[f(X,Y)|\mathcal{G}] = g(Y),$$
where
$$g(y) := \int_{\mathbb{R}^n} f(x,y)\,\mu_X(dx) = E[f(X,y)].$$
Here, $\mu_X$ is the distribution of $X$, the measure on the Borel sets $\mathcal{B}^n$ of $\mathbb{R}^n$ defined by $\mu_X(B) = P(X \in B)$ for $B \in \mathcal{B}^n$.


Proof. Recall that the partial averaging property is equivalent to the statement that for any bounded $\mathcal{G}$-measurable random variable $Z$, we have
$$E[Z\,E[f(X,Y)|\mathcal{G}]] = E[Z f(X,Y)].$$
We therefore need to show that for all such $\mathcal{G}$-measurable $Z$ we have
$$E[Z f(X,Y)] = E[Z g(Y)].$$
Let $\mu_{X,Y,Z}$ be the distribution of the $\mathbb{R}^{n+m+1}$-valued random variable $(X,Y,Z)$. Since $X$ is independent of $\mathcal{G}$, the random variables $X$ and $(Y,Z)$ are independent, so that $\mu_{X,Y,Z}(dx,dy,dz) = \mu_X(dx)\,\mu_{Y,Z}(dy,dz)$. Hence
$$\begin{aligned}
E[Zf(X,Y)] &= \int_{\mathbb{R}^{n+m+1}} z f(x,y)\,\mu_{X,Y,Z}(dx,dy,dz) \\
&= \int_{\mathbb{R}^{m+1}} z \left( \int_{\mathbb{R}^n} f(x,y)\,\mu_X(dx) \right) \mu_{Y,Z}(dy,dz) \\
&= \int_{\mathbb{R}^{m+1}} z g(y)\,\mu_{Y,Z}(dy,dz) \\
&= E[Zg(Y)].
\end{aligned}$$

Example B.3 (The binomial stock price is Markov). Consider an $n$-period binomial model. Fix a time $t$ and define $X := S_{t+1}/S_t$ and $\mathcal{G} := \mathcal{F}_t$. Then $X = u$ if $\omega_{t+1} = H$ and $X = d$ if $\omega_{t+1} = T$. Since $X$ depends only on the outcome of coin toss $t+1$, $X$ is independent of $\mathcal{G}$. Define $Y := S_t$, so that $Y$ is $\mathcal{G}$-measurable. Let $h$ be any function and set $f(x,y) := h(xy)$. Then
$$g(y) := E[f(X,y)] = E[h(Xy)] = ph(uy) + qh(dy).$$
The Independence Lemma asserts that
$$E[h(S_{t+1})|\mathcal{F}_t] = E[h(XY)|\mathcal{G}] = E[f(X,Y)|\mathcal{G}] = g(Y) = ph(uS_t) + qh(dS_t).$$
This shows the stock price is Markov. Indeed, if we condition both sides of the above equation on $\sigma(S_t)$ and use the tower property on the left and the fact that the right-hand side is $\sigma(S_t)$-measurable, we obtain
$$E[h(S_{t+1})|S_t] = ph(uS_t) + qh(dS_t).$$
Thus $E[h(S_{t+1})|S_t]$ and $E[h(S_{t+1})|\mathcal{F}_t]$ are equal. Not only have we shown that the stock price process is Markov, but we have also obtained a formula for $E[h(S_{t+1})|\mathcal{F}_t]$ as a function of $S_t$.
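This can be confirmed by brute force on a small binomial tree: for each path prefix (an atom of $\mathcal{F}_t$), compute the conditional expectation of $h(S_{t+1})$ directly from the tree, check it against the formula, and check that atoms sharing the same $S_t$ value (e.g. the recombining paths $HT$ and $TH$) give the same answer, which is the Markov content. The parameters and the function $h$ are illustrative.

```python
from fractions import Fraction as F
from itertools import product

S0, u, d = 4, 2, F(1, 2)
p, q = F(1, 2), F(1, 2)
n = 4
h = lambda x: max(x - 5, 0)   # any function of the stock price

def stock(prefix):
    """S_t along a path prefix (a tuple of 'H'/'T' outcomes)."""
    s = F(S0)
    for toss in prefix:
        s *= u if toss == 'H' else d
    return s

for t in range(n):
    cond = {}   # map: S_t value -> set of conditional expectations seen
    for prefix in product('HT', repeat=t):   # atoms of F_t
        # E[h(S_{t+1}) | F_t] on this atom, computed from the tree
        e = p * h(stock(prefix + ('H',))) + q * h(stock(prefix + ('T',)))
        St = stock(prefix)
        cond.setdefault(St, set()).add(e)
        # ... and it matches the formula of Example B.3
        assert e == p * h(u * St) + q * h(d * St)
    # Markov: atoms with the same S_t give the same conditional expectation
    assert all(len(v) == 1 for v in cond.values())

print('binomial stock price verified Markov for', n, 'periods')
```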
References

[1] N. H. Bingham and R. Kiesel, Risk-neutral valuation: Pricing and hedging of financial derivatives, Springer Finance, Springer-Verlag London Ltd., London, second ed., 2004.
[2] T. Björk, Arbitrage theory in continuous time, Oxford University Press, third ed., 2009.
[3] F. Black and M. Scholes, The pricing of options and corporate liabilities, J. Polit. Econ., 81 (1973), pp. 637-659.
[4] M. H. A. Davis and A. Etheridge, Louis Bachelier's Theory of Speculation: the origins of modern finance, Princeton University Press, 2006.
[5] B. Dupire, Pricing with a smile, Risk, 7 (1994), pp. 18-20.
[6] A. Etheridge, A course in financial calculus, Cambridge University Press, Cambridge, 2002.
[7] G. R. Grimmett and D. R. Stirzaker, Probability and random processes, Oxford University Press, New York, third ed., 2001.
[8] J. M. Harrison and S. R. Pliska, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Process. Appl., 11 (1981), pp. 215-260.
[9] J. C. Hull, Options, futures and other derivatives, Pearson, eighth ed., 2011.
[10] J. Jacod and P. Protter, Probability essentials, Universitext, Springer-Verlag, Berlin, second ed., 2003.
[11] H. D. Junghenn, Option valuation: A first course in financial mathematics, Chapman & Hall/CRC Financial Mathematics Series, CRC Press, Boca Raton, FL, 2012.
[12] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus, vol. 113 of Graduate Texts in Mathematics, Springer-Verlag, New York, second ed., 1991.
[13] D. Kennedy, Stochastic financial models, Chapman & Hall/CRC Financial Mathematics Series, CRC Press, Boca Raton, FL, 2010.
[14] R. C. Merton, Theory of rational option pricing, Bell J. Econom. and Management Sci., 4 (1973), pp. 141-183.
[15] B. Øksendal, Stochastic differential equations: An introduction with applications, Universitext, Springer-Verlag, Berlin, sixth ed., 2003.
[16] S. E. Shreve, Stochastic calculus for finance. I: The binomial asset pricing model, Springer Finance, Springer-Verlag, New York, 2004.
[17] S. E. Shreve, Stochastic calculus for finance. II: Continuous-time models, Springer Finance, Springer-Verlag, New York, 2004.
[18] D. Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.
[19] P. Wilmott, S. Howison, and J. Dewynne, The mathematics of financial derivatives: A student introduction, Cambridge University Press, Cambridge, 1995.
Michael Monoyios, Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
E-mail address: monoyios@maths.ox.ac.uk
