– Typeset by FoilTEX –
Contents

2 Probability Theory
  2.3 Expectation and integration w.r.t. measures
  2.7 Independence
3 Stochastic Processes
  3.3 Martingales
  3.5 Distribution of some important variables associated with Brownian motion
4 Stochastic Integration
  4.8 Stochastic integral representation
1 Warm-up & Motivation
Modern finance makes critical use of mathematical tools, in particular tools from a discipline called "stochastics", which comprises probability calculus, stochastic processes and stochastic calculus. Practical applications in finance requiring a profound knowledge of these prerequisites are, for example, risk management, statistical analysis of financial markets and, most importantly, the pricing and hedging of complex financial products such as derivatives.
This section takes the problem of pricing and hedging an option as a starting point to develop
the motivation and a course programme for our journey through the theory of stochastic processes.
Consider a model for a dynamic financial market, where time is evolving discretely, t =
0, 1, 2, . . . . The time t = 0 refers to today, whereas t = 1, 2, . . . are future points in time.
In our financial market there are liquidly traded financial securities $S^1$, $S^2$ with prices $S^1_t$, resp. $S^2_t$, at time t = 0, 1, 2, . . . . Assume security $S^1$ is a risk-free bank account that increases in each time period at an interest rate r:
$$S^1_0 = 1, \qquad S^1_t = (1+r)^t.$$
The other security is assumed to be risky, i.e., its price at future points in time is unknown today; it is a random quantity. Of course, the price today, $S^2_0$, is known, and we assume that $S^2_0 = 1$. For the future time points we suppose a random behaviour consisting of two possible price branchings from time step to time step:
$$S^2_{t+1} = S^2_t \cdot \begin{cases} u & \text{("up" move)} \\ d & \text{("down" move)}, \end{cases}$$
where u and d are fixed numbers with u > d. At time t the possible values of the random price $S^2_t$ are
$$S^2_t \in \big\{S^2_0 \cdot u^k d^{t-k},\ k = 0, 1, \ldots, t\big\}.$$
Here is an example with
$$u = 1.100, \qquad d = 1/u = 0.9091.$$
Starting from $S^2_0 = 1.000$, the possible prices of $S^2$ are:

t = 0: 1.000
t = 1: 1.100, 0.9091
t = 2: 1.210, 1.000, 0.8265
. . .
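The lattice of attainable prices is easy to generate mechanically. The following is a small sketch in Python (the function name and layout are our own), using the parameters of the example:

```python
# Possible values of the risky security S^2 in the binomial tree:
# at time t the price is S2_0 * u^k * d^(t-k) for k = 0, ..., t.

def binomial_lattice(s0, u, d, T):
    """Return a list of lists: level t holds the t+1 possible prices,
    ordered from k = t "up" moves down to k = 0 "up" moves."""
    return [[s0 * u**k * d**(t - k) for k in range(t, -1, -1)]
            for t in range(T + 1)]

u = 1.100
d = 1 / u          # = 0.9091, as in the example
tree = binomial_lattice(1.0, u, d, 2)
for t, level in enumerate(tree):
    print(f"t={t}:", [round(x, 4) for x in level])
```

Printed with four decimals this reproduces the tree above (up to rounding of d = 1/1.1).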
We are now interested in obtaining a "fair" price today for a payoff XT at time T that is random and depends on the random price of $S^2$ at time T, or, even more generally, on the random price path of $S^2$ up to time T,
$$X_T = F(S^2_T) \qquad \text{or} \qquad X_T = F(S^2_t,\ t = 0, 1, \ldots, T),$$
with some payoff function F. The simplest example is a call option with strike (exercise) price K: $X_T = (S^2_T - K)^+$.
At first glance it looks as if there were no solution for the "fair" price of XT because, naively, one would expect that the price depends on our expectations about the future behaviour of the underlying security $S^2$, in particular on the likelihoods we would assign to the up and down moves. But those expectations are highly subjective and possibly different for each market participant. It is one of the most remarkable inventions of modern finance that there is indeed a fair price that is independent of subjective expectations! The reason is the following. Although XT, whose payoff is determined by its underlying $S^2$, introduces a new security to our market, its fair price at times t < T must somehow be linked to the prices of the primary securities $S^1$, $S^2$ to avoid price conflicts between the prices of the now three securities in our market. This will become more transparent soon.
It turns out that XT does not really introduce a new financial instrument to our market, since it can be replicated by a clever strategy of trading in the primary securities $S^1$, $S^2$. A trading strategy in $S^1$, $S^2$ is given by a pair $\theta = (\theta^1_t, \theta^2_t)_{t=0,1,2,\ldots}$, where $\theta^i_t$ is the quantity of shares of security $S^i$ we hold at time t, more precisely, between time t and t + 1. Of course, the strategy $(\theta^1_t, \theta^2_t)$ at time t is not predetermined as of today; it may depend on the random behaviour of the prices of $S^1$, $S^2$ up to time t.
In the binomial tree model the random behaviour of the prices of $S^1$, $S^2$ is characterized by a sequence ω of moves ξt, each being "up" (u) or "down" (d):
$$\omega = (\xi_1, \xi_2, \ldots), \qquad \xi_t \in \{u, d\}.$$
Such an ω is also called a state of the world. The behaviour up to time t is described by the up-to-time-t part ωt of this sequence: $\omega_t = (\xi_1, \xi_2, \ldots, \xi_t)$. A trading strategy is then fully specified by the quantities

t = 0: $\theta^i_0$, i = 1, 2
t = 1: $\theta^i_1(u)$, $\theta^i_1(d)$, i = 1, 2
t = 2: $\theta^i_2(uu)$, $\theta^i_2(ud)$, $\theta^i_2(du)$, $\theta^i_2(dd)$, i = 1, 2
. . .

For example, the quantity $\theta^i_2(ud)$ is the number of shares of security $S^i$ to be held between time t = 2 and t = 3 if the realized price moves up to time t = 2 were an "up" u followed by a "down" d.
In our binomial tree example above consider a call option with maturity T = 2 and payoff
$$X_2 = (S^2_2 - 0.98)^+.$$
The option payouts in the possible states at maturity are X2(uu) = 0.23, X2(ud) = 0.02, X2(du) = 0.02, X2(dd) = 0.0. For security $S^1$ we assume an interest rate of r = 5%. Consider the following trading strategy:

t = 0: $\theta^1_0 = -0.67871$, $\theta^2_0 = 0.79937$
t = 1: $\theta^1_1(u) = -0.88889$, $\theta^2_1(u) = 1.00000$; $\theta^1_1(d) = -0.08638$, $\theta^2_1(d) = 0.11524$
At time t = 0 the strategy requires an initial investment of $V_0(\theta) = \theta^1_0 S^1_0 + \theta^2_0 S^2_0 = 0.12066$, which is the value of the position we hold at time t = 0. If at time t = 1 the market goes "up", the strategy generates a profit & loss of
$$\theta^1_0\,[S^1_1(u) - S^1_0] + \theta^2_0\,[S^2_1(u) - S^2_0] = 0.046,$$
giving us a gross value of 0.16667 at time t = 1 if the market went up. To continue from here with our strategy from time t = 1 to t = 2 we need capital of $V_1(\theta)(u) = \theta^1_1(u) S^1_1(u) + \theta^2_1(u) S^2_1(u) = 0.16667$, fitting exactly what we have obtained so far by our strategy! So rearranging our portfolio is costless. Continuing from time t = 1 to t = 2 with our strategy and adding up the profit & loss generated, we end up with a portfolio value from our strategy at time t = 2 of
$$V_2(\theta)(uu) = 0.23, \qquad V_2(\theta)(ud) = 0.02.$$
But this is exactly the payoff of our call option X2 in the states (uu), (ud). One can easily verify that the given strategy replicates the payoff of the option XT for all possible future paths of the market. This means that the payoff of the option XT and the result of the above strategy are indistinguishable in all states ω of our world. The above strategy perfectly replicates the option payoff; it is called a replicating strategy for XT.
As a consequence, the price of the option and the cost of replication must coincide. Since the replication only required an initial investment of V0(θ), the fair price of the option is V0(θ) = 0.12066. Any price different from that would give rise to a contradiction and would imply that there is a money printing machine.
So far we have made no assumption on the probabilities associated with the up and down moves, and the fair price of XT clearly does not depend on any probability assignments! The reason is that the replication strategy above works in all states of the world, independently of their probabilities.
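The replication idea can be sketched as a small backward-solving program. The code below is our own illustration, assuming an interest rate of r = 0.05 (the value consistent with the figures of this example); at each node it solves the 2×2 linear system that matches the portfolio value to both branches:

```python
# Backward solving of the replicating strategy in the two-period binomial
# tree example (u = 1.1, d = 1/u, r = 0.05 assumed). Function and variable
# names are our own.

def replicate(payoff, u, d, r, T, s0=1.0):
    """payoff: dict mapping a path string like 'ud' to the payout X_T.
    Returns (V0, strategy), where strategy maps each node (path up to
    time t) to the pair (theta1, theta2) held from t to t+1."""
    strategy = {}
    values = dict(payoff)                      # V_T at the terminal nodes
    for t in range(T - 1, -1, -1):
        new_values = {}
        for path in {p[:t] for p in values}:   # nodes at time t
            s1_next = (1 + r) ** (t + 1)       # S^1_{t+1}, deterministic
            s2 = s0 * u**path.count('u') * d**path.count('d')  # S^2_t
            vu, vd = values[path + 'u'], values[path + 'd']
            # Solve the 2x2 system matching both branches:
            #   theta1*s1_next + theta2*s2*u = vu
            #   theta1*s1_next + theta2*s2*d = vd
            theta2 = (vu - vd) / (s2 * (u - d))
            theta1 = (vu - theta2 * s2 * u) / s1_next
            strategy[path] = (theta1, theta2)
            new_values[path] = theta1 * (1 + r) ** t + theta2 * s2  # V_t
        values = new_values
    return values[''], strategy

u, r = 1.1, 0.05
d = 1 / u
payoff = {'uu': 0.23, 'ud': 0.02, 'du': 0.02, 'dd': 0.0}
v0, theta = replicate(payoff, u, d, r, T=2)
print(round(v0, 5))                # initial cost of replication
print({k: (round(a, 5), round(b, 5)) for k, (a, b) in theta.items()})
```

Running it reproduces the initial investment V0(θ) = 0.12066 and the strategy table of the example.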
Now it is time to summarize and formalize what we have learned from the example.
We have replicated the option payoff XT in all states of the world by a clever strategy $\theta = (\theta^1, \theta^2)$. Replication means that the initial investment of the strategy plus the cumulative profit & loss from the strategy, finally, at time T, generates exactly the same payout as the option,
$$X_T = V_0(\theta) + \sum_{i=1,2} \sum_{t=1}^{T} \theta^i_{t-1}\,[S^i_t - S^i_{t-1}] = \sum_{i=1,2} \theta^i_{T-1} S^i_T. \tag{1.1}$$
During the replication, rearranging the strategy was costless since
$$\sum_{i=1,2} \theta^i_{t-1} S^i_t = \sum_{i=1,2} \theta^i_t S^i_t.$$
A strategy with this property is also called self-financing. Recall that all these equalities are
equalities between random variables, i.e., the random variables coincide for all states ω .
The appropriate strategy, $\theta^1_t(\omega_t), \theta^2_t(\omega_t)$, both depending on the state of the world $\omega_t$ up to time t, always¹ exists. It is obtained from the following backward solving algorithm:
$$X_T(\omega_T) = \sum_{i=1,2} \theta^i_{T-1}(\omega_{T-1})\, S^i_T(\omega_T), \tag{1.2}$$
$$\sum_{i=1,2} \theta^i_t(\omega_t)\, S^i_t(\omega_t) = \sum_{i=1,2} \theta^i_{t-1}(\omega_{t-1})\, S^i_t(\omega_t), \tag{1.3}$$
$$t = T-1, T-2, \ldots, 1. \tag{1.4}$$
Finally, the fair price V0(XT ) of XT today has to be the same as the costs of replicating XT by
the strategy θ . In view of (1.3) there are only the costs of initially setting up the strategy, i.e.,
V0(XT ) = V0(θ).
1 Up to the pathological case where u = d or u = 1 + r or d = 1 + r .
In other words, the fair price of the derivative XT is equal to its costs of replication.
Clearly it is cumbersome to calculate the replicating strategy by the backward induction (1.2),
(1.3) just to obtain the amount that has to be invested initially. It turns out that there is a clever
tool to shorten the calculation.
The replication algorithm can also be formalized in the following way. We started with the goal of generating at time T a portfolio value VT(θ) that replicates the payoff XT,
$$V_T(\theta) = X_T = \sum_i \theta^i_{T-1} S^i_T. \tag{1.5}$$
Then for each time step t = T, T − 1, . . . , 1 the required capital in the previous step is
$$V_{t-1}(\theta) := \sum_i \theta^i_{t-1} S^i_{t-1}, \tag{1.6}$$
which, by the self-financing property (1.3), satisfies
$$V_t(\theta) = \sum_i \theta^i_{t-1} S^i_t. \tag{1.7}$$
Proposition 1.1. In the binomial tree model the value process Vt(θ), t = T, T − 1, . . . , 1, satisfies the recursion equation
$$V_{t-1}(\theta)(\omega_{t-1}) = \frac{1}{1+r}\Big[\, p\, V_t(\theta)(\omega_{t-1}, u) + (1-p)\, V_t(\theta)(\omega_{t-1}, d) \,\Big], \tag{1.8}$$
where
$$p = \frac{1+r-d}{u-d} \tag{1.9}$$
and $(\omega_{t-1}, u)$ denotes the path $\omega_{t-1}$ continued with an "up" move u in step t, and analogously for $(\omega_{t-1}, d)$.
If we assume that
d < (1 + r) < u,
then 0 < p < 1 can be interpreted as a probability and (1.8) tells us that Vt−1(θ) is the
expectation of Vt(θ) divided (“discounted”) by (1 + r).
Warning. One has to be careful interpreting the probability p. The whole approach so far was completely independent of any assumptions on the probabilities associated with the u, d moves. Also, here we do not assume that p is the "real" or any subjective likelihood of the up move! The probability p is purely artificial and, as such, just a tool or trick to shorten calculations.
Proof. For the two branchings u, d continuing the path $\omega_{t-1}$, equation (1.7) reads as
$$V_t(\theta)(\omega_{t-1}, u) = \theta^1_{t-1}(\omega_{t-1})\, S^1_{t-1}(\omega_{t-1})\,(1+r) + \theta^2_{t-1}(\omega_{t-1})\, S^2_{t-1}(\omega_{t-1})\, u,$$
$$V_t(\theta)(\omega_{t-1}, d) = \theta^1_{t-1}(\omega_{t-1})\, S^1_{t-1}(\omega_{t-1})\,(1+r) + \theta^2_{t-1}(\omega_{t-1})\, S^2_{t-1}(\omega_{t-1})\, d.$$
Multiplying the first equation by $\frac{p}{1+r}$ and the second by $\frac{1-p}{1+r}$ and, finally, adding both equations yields the assertion.
In each step, associating with the up move u a probability p and with the down move d the probability 1 − p, we construct a probability distribution Q on the set Ω of all states of the world $\omega = (\xi_1, \xi_2, \ldots)$, $\xi_i \in \{u, d\}$. For the set of states ω which follow a given path $(\xi_1, \xi_2, \ldots, \xi_T)$ up to time T we set
$$Q(\{\omega : \omega_T = (\xi_1, \xi_2, \ldots, \xi_T)\}) = p^{\#u(\omega_T)}\,(1-p)^{\#d(\omega_T)}, \tag{1.10}$$
where $\#u(\omega_T)$, resp. $\#d(\omega_T)$, denotes the number of occurrences of "ups" u, resp. "downs" d, in the path $\omega_T = (\xi_1, \xi_2, \ldots, \xi_T)$.
We will learn from the exercises that this particular probability distribution Q admits an interpretation as a risk-neutral distribution if p is given by (1.9). Moreover, under Q the moves of the security $S^2$ are independent from step to step and the security $S^2$ is a Markov process.
Corollary 1.2. Again assume a binomial tree model. Consider a derivative with payoff $X_T(\omega_T)$ at time T in state $\omega_T$ which can be replicated by a strategy θ as in (1.2), (1.3). The fair price V0(XT) of XT today is then given by
$$V_0(X_T) = \frac{1}{(1+r)^T} \sum_{\omega_T} X_T(\omega_T)\, p^{\#u(\omega_T)}\,(1-p)^{\#d(\omega_T)} = \frac{1}{(1+r)^T}\, E_Q(X_T), \tag{1.11}$$
where p is as in (1.9). So V0(XT) is nothing else but the "discounted" expectation of the payoff XT, where the expectation is taken under an appropriate artificial probability distribution assigning probability p (resp. 1 − p) to the "up" (resp. "down") move.
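Formula (1.11) shortens the calculation considerably. The following sketch (our own code, again assuming r = 0.05) prices the call of the example directly as a discounted expectation under the artificial probability p from (1.9):

```python
from math import comb

# Fair price via the 'discounted expectation' formula (1.11), sketched
# in Python. We assume r = 0.05; u, d and the strike are taken from the
# example of Section 1.1.

def price(payoff_fn, s0, u, d, r, T):
    """Discounted Q-expectation of a payoff depending on the terminal
    price only: group the 2^T paths by their number k of 'up' moves."""
    p = (1 + r - d) / (u - d)          # risk-neutral up-probability (1.9)
    total = 0.0
    for k in range(T + 1):
        sT = s0 * u**k * d**(T - k)    # terminal price after k 'up' moves
        # comb(T, k) paths share this terminal price and probability
        total += comb(T, k) * p**k * (1 - p)**(T - k) * payoff_fn(sT)
    return total / (1 + r) ** T

u, r = 1.1, 0.05
d = 1 / u
v0 = price(lambda s: max(s - 0.98, 0.0), 1.0, u, d, r, T=2)
print(round(v0, 5))
```

The result agrees with the cost of replication, V0(θ) = 0.12066, obtained by backward induction.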
To be able to carry out the same analysis as in the preceding two sections, but in a much more general modelling framework than just a binomial tree model, it is obvious that we have to get a grip on the mathematical concepts and tools necessary to do the job.
2 Probability Theory
The mathematical model of an event with uncertain outcome is a probability space (Ω, F , P)
with Ω as the (nonempty) set of all possible outcomes (states of the world) ω ∈ Ω, which is also
often named sample space. F is called the set of observable events, it is a set of subsets
of Ω. Finally, P is the probability measure which associates a probability P(A) to each
event A ∈ F .
The family F of observable events is assumed to be a σ-algebra, i.e.,
(i) ∅ ∈ F,
(ii) A ∈ F ⇒ $A^c$ ∈ F, where $A^c$ denotes the complement of A in Ω,
(iii) A1, A2, · · · ∈ F ⇒ $\bigcup_{n=1}^{\infty} A_n \in F$.
As a consequence of these three conditions one can easily show that F is closed under all kinds
of set-theoretic operations (union, intersection, set-difference . . . )
The probability measure P is a mapping P|F → [0, 1] with the following properties:
(i) P(Ω) = 1,
(ii) for every sequence A1, A2, · · · ∈ F of pairwise disjoint events,
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$
Again, many simple calculation rules for probabilities can be derived as a consequence of these
two properties.
For a random variable X, i.e., a (B, F)-measurable mapping X|Ω → R, the probabilities
$$P_X(B) := P(X^{-1}(B)), \qquad B \in B, \tag{2.1}$$
are well-defined. PX is a probability measure on the space (R, B); it is called the distribution of X.
Closely related is the distribution function FX of X, which is defined by
$$F_X(x) := P_X((-\infty, x]) = P(X \le x), \qquad x \in R. \tag{2.2}$$
Associated with a random variable X is the σ-algebra σ(X) generated by the random variable X, defined by²
$$\sigma(X) = \{X^{-1}(B) : B \in B\}. \tag{2.3}$$
The family σ(X) is a sub-σ -algebra of F , i.e., σ(X) ⊆ F . Informally, σ(X) contains all
events, ”information”, from Ω that are observable by observing the random variable X .
There is a quite useful little result on the structure of any other random variable Y that is
σ(X)-measurable3.
2 Verify that the right hand side is indeed a σ -algebra!
3 More precisely, (B, σ(X))–measurable.
Lemma 2.1. Let Y be a random variable on (Ω, F , P). Then Y is σ(X) measurable if and
only if there is a function g|R → R that is (B, B)–measurable and such that Y = g(X).
For an at most countable space of outcomes Ω the expectation EX of the random variable X is
defined as the probability weighted average of all possible values of X ,
$$EX = \sum_{\omega \in \Omega} X(\omega)\, P(\{\omega\}). \tag{2.4}$$
For a general sample space Ω and a non-negative random variable X one defines the approximating simple random variables
$$X_n = \sum_{k=0}^{n2^n - 1} \frac{k}{2^n}\, 1_{\left\{\frac{k}{2^n} \le X < \frac{k+1}{2^n}\right\}} + n\, 1_{\{X \ge n\}};$$
then for n → ∞ we have Xn(ω) ↑ X(ω), ∀ω ∈ Ω, the expectation EXn is well-defined and
increasing in n and we define finally
EX = lim EXn.
n→∞
The random variable X is called integrable, if EX + < ∞ and EX − < ∞, which is the
same as E|X| < ∞.
One can easily extend the definition of EX by allowing that at most one of the two components, EX⁺ or EX⁻, takes the value ∞. The expectation in this generalized sense then takes values in [−∞, ∞].
Observe that for any function g|R → R which is (B, B)-measurable4, the variable g(X)
is again a random variable. So along the above lines we have also defined the expectation
E(g(X)).
Proposition 2.2. The expectation of a random variable satisfies the following properties.
(i) (Linearity) For integrable X1, X2 and c1, c2 ∈ R, E(c1X1 + c2X2) = c1EX1 + c2EX2.
(ii) (Monotonicity) If X and Y are integrable and X ≤ Y P–a.s., then
EX ≤ EY.
4 Such a function is also called Borel-measurable.
(iii) (Jensen’s inequality) If X is integrable and g|R → R is a convex function such that
g(X) is integrable, then
E(g(X)) ≥ g(EX). (2.6)
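As a quick numerical illustration of (2.6), take a small discrete distribution and the convex function g(x) = x² (a toy example of our own):

```python
# Jensen's inequality E(g(X)) >= g(EX) for convex g, checked on a small
# discrete distribution (toy values of our own choosing).
xs = [-1.0, 0.0, 2.0]          # values of X
ps = [0.2, 0.5, 0.3]           # their probabilities

g = lambda x: x * x            # a convex function

EX  = sum(x * p for x, p in zip(xs, ps))        # = 0.4
EgX = sum(g(x) * p for x, p in zip(xs, ps))     # E(g(X))
print(EX, EgX, g(EX))          # EgX dominates g(EX)
```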
When calculating the expectation of a random variable X or g(X), the integration over the
sample space Ω can be transformed equivalently into an integration over the value space R by
replacing the measure P on (Ω, F ) by PX on (R, B),
$$E(g(X)) = \int_{\Omega} g(X(\omega))\, dP(\omega) = \int_{R} g(x)\, dP_X(x) = \int_{-\infty}^{\infty} g(x)\, dF_X(x), \tag{2.7}$$
the latter integral being a Lebesgue-Stieltjes integral w.r.t. the increasing function FX .
Consider a sequence of random variables X1, X2, . . . defined on the probability space
(Ω, F , P). There are many different ways of how this sequence can converge to a certain
limiting random variable X . In the simplest case, we can think of pointwise convergence
limn Xn(ω) = X(ω), ∀ω ∈ Ω. However, since for most applications, in the limit exceptional
sets of vanishing probability can be ignored it is more appropriate to work with weaker concepts
of convergence than pointwise convergence.
(i) Convergence P-almost surely: $X_n \xrightarrow{P\text{-a.s.}} X$, if $P\big(\omega : \lim_n X_n(\omega) = X(\omega)\big) = 1$,
(ii) Convergence in P-probability: $X_n \xrightarrow{P} X$, if for every ε > 0,
$$\lim_n P\big(\omega : |X_n(\omega) - X(\omega)| > \varepsilon\big) = 0,$$
(iii) $L^p$ convergence: $X_n \xrightarrow{L^p} X$, if $E|X_n|^p < \infty$ and $\lim_n E|X_n - X|^p = 0$,
(iv) Convergence in distribution (weak convergence): $X_n \xrightarrow{d} X$, if $\lim_n F_{X_n}(x) = F_X(x)$ for all points x ∈ R of continuity of FX.⁵
Convergence P–almost surely implies convergence in probability, but not vice versa. Convergence in $L^p$ and convergence in probability are equivalent, provided $E|X_n|^p < \infty$ and the family $|X_n|^p$, n = 1, 2, . . . , is uniformly integrable, a kind of boundedness condition.
The notion of P–almost surely is also important and useful in other contexts. We say a
statement A holds P–almost surely, in short, P-a.s., if P(A) = 1.
It is important for many applications to know under which conditions taking the limit and
taking expectation are interchangeable operations.
Theorem 2.4. [Monotone convergence] Let (Xn) be a P–almost surely increasing sequence of non-negative random variables with limit X, P(X1 ≤ X2 ≤ . . . , limn Xn = X) = 1. Then
$$EX = \lim_n EX_n.$$
⁵ Observe that it is even not necessary that the random variables Xn and X are defined on the same probability space.
Consider first the situation of a finite sample space Ω where for each ω the one point set is
in the σ –algebra F , {ω} ∈ F . Then the probability measure P assigns a certain probability
to each ω , and we assume for simplicity P(ω) = P({ω}) > 0. Then any other probability
measure Q on (Ω, F ) can be seen as nothing else but a re-scaling of the probabilities P with a
scaling function Z , indeed,
$$Q(\omega) = Z(\omega)\, P(\omega), \tag{2.9}$$
with $Z(\omega) = \frac{Q(\omega)}{P(\omega)}$. For Q defined by (2.9) to be a probability measure it is necessary and sufficient that Z(ω) ≥ 0 and $E_P Z = 1$, where $E_P$ denotes the expectation taken w.r.t. the measure P.
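On a finite sample space the re-scaling (2.9) is easy to make concrete. The following sketch (toy distributions of our own choosing) verifies that Z = Q/P has P-expectation 1 and that expectations under Q can be computed under P after weighting with Z:

```python
# Change of measure on a finite sample space: Q(w) = Z(w) P(w), with
# Z = Q/P. The distributions below are toy values of our own.
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
Q = {'a': 0.25, 'b': 0.25, 'c': 0.5}

Z = {w: Q[w] / P[w] for w in P}        # the density (scaling function)

# Z is nonnegative and has P-expectation 1:
EP_Z = sum(Z[w] * P[w] for w in P)
print(EP_Z)

# Expectations under Q equal P-expectations after weighting with Z:
X = {'a': 1.0, 'b': 2.0, 'c': 3.0}
EQ_X  = sum(X[w] * Q[w] for w in Q)
EP_ZX = sum(Z[w] * X[w] * P[w] for w in P)
print(EQ_X, EP_ZX)
```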
In the case of an infinite sample space Ω the transformation (2.9) may not make sense anymore, since there will be ω's such that P(ω) = 0 and Q(ω) = 0. This indicates that the transformation has to be done on the basis of sets instead of single outcomes.
Theorem 2.6. On the probability space (Ω, F, P) let Z be a P–a.s. nonnegative random variable with $E_P Z = 1$. Define
$$Q(A) = \int_A Z(\omega)\, dP(\omega) = \int_{\Omega} 1_A(\omega)\, Z(\omega)\, dP(\omega), \qquad A \in F. \tag{2.10}$$
Then Q is a probability measure on (Ω, F).
Theorem 2.7. On the probability space (Ω, F, P) let Q be another probability measure absolutely continuous w.r.t. P. Then there exists a P–a.s. unique nonnegative random variable Z with $E_P Z = 1$ such that
$$Q(A) = \int_A Z(\omega)\, dP(\omega), \qquad \forall A \in F. \tag{2.11}$$
The random variable Z is called the density, or Radon–Nikodym derivative, of Q w.r.t. P, in symbols
$$Z = \frac{dQ}{dP}.$$
If, moreover, Z > 0 P–a.s., then P is in turn absolutely continuous w.r.t. Q with
$$P(A) = \int_A \frac{1}{Z}\, dQ, \qquad \text{i.e.,} \quad \frac{dP}{dQ} = \frac{1}{Z}.$$
For events A, B ∈ F the conditional probability of A given B is defined by
$$P(A|B) := \frac{P(A \cap B)}{P(B)},$$
where this definition makes sense only if P(B) > 0. Analogously one defines the conditional expectation E(X|B) of the random variable X given B:
$$E(X|B) := \frac{1}{P(B)} \int_{\Omega} X\, 1_B\, dP = \frac{\int_B X\, dP}{P(B)}. \tag{2.12}$$
Intuitively P(A|B) resp. E(X|B) is the probability of A, resp. the expectation of X , provided
that the event B has ”occurred”. Conditional expectation and conditional probability are linked
by
$$P(A|B) = E(1_A\,|\,B),$$
$$E(X|B) = \int_{\Omega} X(\omega)\, dP(\omega|B) = E_{P(\cdot|B)}\, X.$$
The σ–algebra F in our probability space was interpreted as the set of all observable events to which the probability P is assigned. For a random variable X, observing the outcomes of X, we know whether an event of the form $X^{-1}(B) = \{\omega : X(\omega) \in B\}$, B ∈ B, has occurred or not. So observing X gives us information on the occurrence of all sets of the form $X^{-1}(B)$, which form the σ–algebra σ(X) generated by X. Clearly, $X^{-1}(B) \in F$ by the definition of a random variable, but it is in general not true that every A ∈ F is of the form $X^{-1}(B)$. This means that in general the observation of X gives us only information about the events contained in σ(X), but not about the whole of F.
Theorem 2.8. Let X be an integrable random variable on the probability space (Ω, F, P). Then for any σ-algebra G ⊆ F there exists a P–a.s. unique random variable X̃ such that
(i) X̃ is G–measurable,
(ii) $\int_A \tilde{X}\, dP = \int_A X\, dP$ for all A ∈ G.
Definition 2.9. Any random variable X̃ with the properties of Theorem 2.8 is called condi-
tional expectation of X under the σ –algebra G and is denoted formally by E(X|G).
Example. Let us calculate the conditional expectation of the random variable X under the σ–algebra G generated by observing whether the given event B ∈ F has occurred or not: $G = \{B, B^c, \Omega, \emptyset\}$. Assume 0 < P(B) < 1. Then
$$E(X|G)(\omega) = \begin{cases} \dfrac{1}{P(B)} \displaystyle\int_B X\, dP & \text{if } \omega \in B, \\[8pt] \dfrac{1}{P(B^c)} \displaystyle\int_{B^c} X\, dP & \text{if } \omega \in B^c. \end{cases}$$
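The example can be made concrete on a small finite sample space (toy numbers of our own):

```python
# E(X|G) for G = {∅, B, B^c, Ω}: the conditional expectation is constant
# on B and on B^c, equal to the P-average of X over each set.
omega = ['w1', 'w2', 'w3', 'w4']
P = {'w1': 0.1, 'w2': 0.4, 'w3': 0.3, 'w4': 0.2}
X = {'w1': 5.0, 'w2': 1.0, 'w3': 2.0, 'w4': 4.0}
B = {'w1', 'w2'}                       # the observed event

def cond_exp(w):
    """E(X|G)(w): average X over whichever of B, B^c contains w."""
    A = B if w in B else set(omega) - B
    return sum(X[v] * P[v] for v in A) / sum(P[v] for v in A)

# Tower property: E(E(X|G)) = E(X)
EX = sum(X[w] * P[w] for w in omega)
E_cond = sum(cond_exp(w) * P[w] for w in omega)
print(EX, E_cond)
```

On B the conditional expectation equals the P-average of X over B, and averaging the conditional expectation recovers EX.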
When working with conditional expectations the following fundamental calculation rules are
extremely useful.
(i) (Linearity) If X1 and X2 are integrable random variables and c1, c2 ∈ R, then
$$E(c_1 X_1 + c_2 X_2\,|\,G) = c_1 E(X_1|G) + c_2 E(X_2|G), \quad P\text{–a.s.}$$
(ii) (Taking out what is known) If XY and Y are integrable and X is G–measurable, then
$$E(XY\,|\,G) = X\, E(Y|G), \quad P\text{–a.s.}$$
(v) (Jensen's inequality) If X is integrable and g|R → R is a convex function such that g(X) is integrable, then
$$g\big(E(X|G)\big) \le E\big(g(X)\,|\,G\big), \quad P\text{–a.s.} \tag{2.17}$$
Proof: . . .
Proposition 2.11. Let the random variable X be square integrable, i.e., EX 2 < ∞. Denote
by L2(G) the space of all square integrable random variables that are G –measurable. Then
E(X|G) ∈ L2(G) and E(X|G) is characterized by the fact that
$$E\big[X - E(X|G)\big]^2 = \min_{Z \in L^2(G)} E\big[X - Z\big]^2. \tag{2.18}$$
Definition 2.12. We define the conditional probability of the event A ∈ F under the
σ –algebra G , by
P(A|G) = E( 1A|G), P–a.s. (2.19)
Consider the conditional expectation E(Y|σ(X)) of the random variable Y w.r.t. the σ–algebra σ(X) generated by another random variable X. As we know from Lemma 2.1 there exists a Borel–measurable function g|R → R such that
$$E(Y|\sigma(X)) = g(X).$$
This suggests the notation
$$E(Y|X = x) := g(x), \qquad x \in R.$$
The expression E(Y|X = x) on the left-hand side is a purely formal notation which gets defined by the function g. Recall that, referring to (2.12), the conditional expectation E(Y|{X = x}) is only well-defined if P(X = x) > 0!
2.7 Independence
More generally, a family Ai, i ∈ I is called independent, if for any n and every finite choice
of n different indices i1, i2, . . . , in ∈ I we have
$$P\left(\bigcap_{k=1}^{n} A_{i_k}\right) = \prod_{k=1}^{n} P(A_{i_k}).$$
The concept of independence is extended to sets of events in the following way. Let G1, G2
be subsets of F , in particular, G1, G2 could be sub-σ –algebras of F . Then G1, G2 are called
independent, if for every choice A1 ∈ G1, A2 ∈ G2 the events A1, A2 are independent, i.e.,
P(A1 ∩ A2) = P(A1) P(A2).
As a special case consider two random variables X1, X2 and the σ –algebras σ(X1), σ(X2)
generated by X1, X2, respectively. Then the random variables X1, X2 are named independent,
if σ(X1), σ(X2) are independent. Independence of X1, X2 is equivalent to
P(X1 < x1, X2 < x2) = P(X1 < x1)P(X2 < x2), for all x1, x2 ∈ R.
Here are some extremely useful rules related to independence for working with conditional
expectations.
Theorem 2.13. Let (Ω, F, P) be a probability space and let G and H be sub-σ–algebras of F. The random variable X is assumed to be integrable.
(i) If X is independent of G, then
$$E(X|G) = EX, \quad P\text{–a.s.}$$
(ii) If X and G are independent, G and H are independent, and G ∨ H denotes the smallest σ–algebra containing G and H, then
$$E(X\,|\,G \vee H) = E(X\,|\,H), \quad P\text{–a.s.}$$
(iii) Assume the random variables X1, . . . , Xn are G–measurable and the random variables Y1, . . . , Ym are independent of G. Then for any bounded function f = f(x1, . . . , xn, y1, . . . , ym) we have that
$$E\big(f(X_1, \ldots, X_n, Y_1, \ldots, Y_m)\,\big|\,G\big) = g(X_1, \ldots, X_n),$$
where g is given by
$$g(x_1, \ldots, x_n) = E\big(f(x_1, \ldots, x_n, Y_1, \ldots, Y_m)\big).$$
Proof: . . .
Property (iii) is in some sense an extension of the "Taking out what is known" property of conditional expectation, combined with property (i) above.
3 Stochastic Processes
A stochastic process can be seen as a random variable taking values in the space of all
functions from T to R, but to make that precise one has to equip this space with an appropriate
σ –algebra to define measurability of the mapping.
For all stochastic processes of interest in this course one can show that there is a so-called version which has regular paths, meaning paths which are right-continuous with left limits ("càdlàg"), or left-continuous with right limits ("càglàd"), or even continuous.
(i) For a stochastic process X = (Xt)t∈T define the σ–algebra
$$F^X_t = \bigvee_{s \le t} \sigma(X_s) = \sigma(X_s : s \le t),$$
which contains all events of F observable through the stochastic process X until time t. Clearly $F^X_s \subseteq F^X_t$ for s ≤ t. The family $F^X = (F^X_t)_{t \in T}$ is called the filtration or flow of information generated by X.
(ii) A family F = (Ft)t∈T of sub–σ –algebras Ft of F is called a filtration if Fs ⊆ Ft for all
s, t ∈ T with s ≤ t.
(iii) A stochastic process Y = (Yt)t∈T is called F–adapted, if, for every t ∈ T, the random
variable Yt is (B, Ft)–measurable.
(iv) A filtered probability space (Ω, F , (Ft)t∈T, P) is a probability space equipped with a
filtration F = (Ft)t∈T of sub–σ –algebras of F .
If Y = (Yt) is F–adapted this means, intuitively, that by observing the flow of information F = (Ft) the random variable Yt is "known" at each point in time t.
In most of our applications the underlying filtration will be generated by the observation of the price processes of securities $(S^i_t)_{t \in T}$, i = 1, . . . , N,
$$F_t = \sigma(S^i_s : s \le t,\ i = 1, \ldots, N).$$
Brownian motion is the prototype of many types of stochastic processes: it is at the same time a martingale, a Markov process, a Gaussian process, a process with independent increments, and a process with stationary increments. On the other hand, it can be shown that almost every stochastic process with continuous paths can be obtained by a suitable transformation from Brownian motion.
Brownian motion plays an outstanding role in mathematical finance since the majority of popular models are based on it.
Definition 3.2. The stochastic process W = (Wt)t∈T defined on a probability space (Ω, F , P)
is called a Brownian motion or Wiener process if
(i) W possesses continuous paths, i.e., for every ω ∈ Ω the function t 7→ Wt(ω) is continuous.
(ii) W has independent increments, i.e., for all t1 < t2 < · · · < tn the increments Wt2 −
Wt1 , Wt3 − Wt2 , . . . , Wtn − Wtn−1 are independent random variables.
(iii) For all t > s the increment Wt − Ws has Normal distribution with mean zero and variance
t − s, in symbols (Wt − Ws) ∼ N(0, t − s).
(iv) W0 = 0.
A filtration F = (Ft)t∈T is called a filtration for the Brownian motion W if
(i) W is F–adapted,
(ii) for all t > s the increment Wt − Ws is independent of the σ–algebra Fs.
Clearly, the filtration FW generated by W is a filtration for the Brownian motion W . Any F
being a filtration for the Brownian motion W satisfies F ⊇ FW , i.e., Ft ⊇ FtW , ∀t.
Many useful transformations of a given Brownian motion result again in another Brownian
motion.
Proposition 3.3. Let W = (Wt)t≥0 be a Wiener process. Then each of the following processes B = (Bt) is again a Wiener process:
(i) Bt = −Wt, t ≥ 0,
(ii) Bt = Ws+t − Ws, t ≥ 0, for arbitrary fixed s ∈ T,
(iii) $B_t = \frac{1}{c} W_{c^2 t}$, t ≥ 0, for some fixed c ∈ R, c ≠ 0.
The paths W.(ω) of Brownian motion admit some very interesting properties that will stretch our imagination. On the other hand, understanding those path properties is critical when we approach the problem of defining the integral $\int_0^t g_s\, dW_s$ in Section 4.
The p-variation of a path on [0, T] is defined as the limit (if it exists)
$$\lim_{\delta_n \to 0} \sum_{i=1}^{n} |X_{t^n_i} - X_{t^n_{i-1}}|^p,$$
where $0 = t^n_0 < t^n_1 < \cdots < t^n_n = T$ is a partition of [0, T] and $\delta_n = \max_i (t^n_i - t^n_{i-1})$.
For p = 1 we simply speak of finite resp. bounded variation. The case p = 2 is called
quadratic variation.
Proposition 3.4. For P–almost all ω ∈ Ω the path W.(ω) of a Brownian motion has the
following properties.
Proof: . . .
The quadratic variation of a stochastic process, if it exists, will play an outstanding role in
the definition of the general stochastic integral. Quadratic variation of the stochastic process
X = (Xt)t≥0 up to time t is usually denoted by the symbol
$$[X, X]_t = \lim_{\delta_n \to 0} \sum_{i=1}^{n} |X_{t^n_i} - X_{t^n_{i-1}}|^2, \tag{3.2}$$
where $\delta_n$ is the maximum width of the partition of [0, t]. So for a Brownian motion W we have
$$[W, W]_t = t, \qquad t \ge 0, \quad P\text{–a.s.}$$
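This result can be observed in simulation: sampling a Brownian path on a fine grid of [0, 1], the sum of squared increments is close to 1, while the sum of absolute increments (the first variation) is huge. A sketch using NumPy; the grid size and seed are our own choice:

```python
import numpy as np

# Simulated check of [W, W]_1 = 1: on a fine partition of [0, 1] the sum
# of squared Brownian increments concentrates near 1, while the first
# variation grows without bound as the grid is refined.
rng = np.random.default_rng(0)
n, t = 100_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)  # independent N(0, dt) increments

quad_var = np.sum(dW**2)                   # approximates [W, W]_1 = 1
first_var = np.sum(np.abs(dW))             # of order sqrt(n) -> infinity
print(quad_var, first_var)
```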
3.3 Martingales
The theory of martingales, although developed by probabilists during the 50s to 80s, has turned
out to be THE MACHINERY for modern finance.
When working with different probability measures and different filtrations one has to be more
precise saying that X is a martingale (sub-, supermartingale) with respect to the measure P and
the filtration F.
Example 1. Consider the binomial tree model from Section 1.1. Under the probability measure Q as introduced in Section 1.3, formula (1.10), the process $X_t = S^2_t / S^1_t$, t = 0, 1, . . . , is a martingale w.r.t. the filtration $F^{S^1, S^2}$.
More generally, for any integrable random variable ξ the process
$$X_t = E(\xi\,|\,F_t), \qquad t \in T, \tag{3.3}$$
is a martingale w.r.t. F.
Proposition 3.6. Let W be a Brownian motion on the filtered probability space (Ω, F, F, P). The filtration is assumed to be a filtration for the Brownian motion W; in particular, we could take $F = F^W$. Then
(i) W is a martingale,
(ii) $(W_t^2 - t)_{t \ge 0}$ is a martingale,
(iii) for every σ ∈ R the process $\big(\exp(\sigma W_t - \tfrac{1}{2}\sigma^2 t)\big)_{t \ge 0}$ is a martingale.
Proof. . . .
The martingale in item (iii) above is called the exponential martingale associated with
σW .
Definition 3.7. Given a Brownian motion W, we call the process X = (Xt) defined by
$$X_t = X_0 \exp\Big(\sigma W_t + \big(\mu - \tfrac{1}{2}\sigma^2\big)\, t\Big), \qquad t \ge 0, \tag{3.4}$$
a geometric Brownian motion.
For easy reference in forthcoming examples and remarks we introduce the dynamics of the
stochastic security price processes underlying the famous Black & Scholes model. The model
assumes geometric Brownian motion for the price St of a stock. Under the analogous risk-neutral
distribution Q as investigated in Section 1.3 the drift of the geometric Brownian motion is equal
to the (continuously compounded) interest rate r , i.e.,
$$S_t = S_0 \exp\Big(rt + \sigma W_t - \tfrac{1}{2}\sigma^2 t\Big).$$
The model also assumes that we can invest and borrow money at the risk-free rate r; that is, there is another security, the money market account, with deterministic price process
$$B_t = \exp(rt).$$
The flow of information (filtration) F observable in the market is generated by observing the
stock price path through time, i.e., F = FS .
The ratio (St/Bt), which is the discounted stock price process, is clearly a (Q, F)–martingale.
Consider a payoff XT at time T , which is dependent on the stock price (path) via the payoff
function F :
XT = F (St, t ≤ T ).
No-arbitrage pricing theory and our analysis in Section 4.9 justifies that for any time 0 ≤ t ≤ T
the fair price Vt(XT ) is given by
$$V_t(X_T) = B_t\, E_Q\Big(\frac{X_T}{B_T}\,\Big|\,F_t\Big), \tag{3.5}$$
in particular,
$$V_0(X_T) = E_Q\Big(\frac{X_T}{B_T}\Big).$$
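The discounted expectation can be evaluated by Monte Carlo simulation and compared with the classical Black–Scholes closed-form call price, which is not derived in these notes; we quote it as standard. The parameter values below are our own choice:

```python
import numpy as np
from math import log, sqrt, exp, erf

# Monte Carlo evaluation of V_0(X_T) = E_Q(X_T / B_T) for a call payoff
# X_T = (S_T - K)^+ under the risk-neutral geometric Brownian motion,
# checked against the classical Black-Scholes formula.

def bs_call(s0, k, r, sigma, T):
    """Classical Black-Scholes call price (standard normal cdf via erf)."""
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return s0 * N(d1) - k * exp(-r * T) * N(d2)

s0, k, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
rng = np.random.default_rng(42)
W_T = rng.normal(0.0, sqrt(T), size=500_000)                  # W_T ~ N(0, T)
S_T = s0 * np.exp(r * T + sigma * W_T - 0.5 * sigma**2 * T)   # GBM under Q
mc_price = exp(-r * T) * np.mean(np.maximum(S_T - k, 0.0))    # E_Q(X_T/B_T)

print(mc_price, bs_call(s0, k, r, sigma, T))
```

With half a million samples the two prices agree to a few cents, in line with the Monte Carlo standard error.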
Generations of probabilists have worked hard to reveal the probability distribution of almost all
random quantities of interest related to Brownian motion. For example, we are interested in the
distribution of the maximum of the Brownian path up to some fixed time, or the distribution of
the first time the path hits some level.
For a stochastic process X and a level a ∈ R define the hitting time
$$\tau^X_a := \inf\{t \ge 0 : X_t = a\},$$
the first time X hits the level a. We use the convention inf ∅ = ∞. Also define the running maximum $M^X_t$ and the running minimum $m^X_t$ of the process X up to time t by⁸
$$M^X_t(\omega) = \max\{X_s(\omega) : s \le t\}, \qquad m^X_t(\omega) = \min\{X_s(\omega) : s \le t\}. \tag{3.7}$$
8 We are a bit sloppy here. In general one should use the supremum resp. infimum instead of the maximum resp. minimum which is not
always defined. But since we are working mostly with continuous processes X this is not a problem.
Proposition 3.8. Let W be a Brownian motion.
(i) For every a ∈ R the random hitting time $\tau^W_a$ is finite P–a.s.,
$$P(\tau^W_a < \infty) = 1.$$
(ii) For a > 0 the distribution of $\tau^W_a$ is given by
$$P(\tau^W_a \le t) = \frac{2}{\sqrt{2\pi t}} \int_a^{\infty} e^{-\frac{x^2}{2t}}\, dx. \tag{3.8}$$
Proof. . . .
Using the same ideas as in the proof of Proposition 3.8 one shows the following result on the
joint distribution of a Brownian motion and its running maximum.
Proposition 3.9. Let W be a Brownian motion. The joint distribution of $(W_T, M^W_T)$ possesses a density that is given by
$$f_{(W_T, M^W_T)}(x, y) = \frac{2(2y - x)}{T\sqrt{2\pi T}}\, e^{-\frac{(2y-x)^2}{2T}}, \qquad x \le y,\ y > 0. \tag{3.9}$$
Proof. . . .
As motivated by our analysis in Section 1.3 and justified by no-arbitrage pricing theory, the fair
price V0(XT) of this derivative is the "discounted" expectation
$$V_0(X_T) = \frac{1}{(1+r)^T}\, E_Q\Big[(S_T - E)^+\, 1_{\{M^S_T \le B\}}\Big],$$
under an appropriately chosen probability measure Q. To calculate this expectation the knowledge
of the joint distribution of (ST , MTS ) is required. In the framework of the Black & Scholes
model the security S is modeled as a geometric Brownian motion with a certain drift. To derive a closed pricing formula for barrier options, in Proposition 4.10 below we will generalize the result of Proposition 3.9 to Brownian motions with drift, applying Girsanov's theorem.
In applications many phenomena are modeled as stochastic processes that are so-called Markov
processes. The theory of Markov processes is extremely rich and offers a fruitful interplay with
functional analysis. Many well-known techniques in financial applications, such as, for example,
using partial differential equations or tree methods to solve for option prices, are critically based
on the Markov property of the underlying process.
However, despite the richness of methods available for Markov processes, one should be aware that it is sometimes highly questionable whether a real-life process, such as the price process of a stock, is really Markovian.
A process X is called a Markov process w.r.t. F if

(i) X is F–adapted,

(ii) for every s ≤ t and B ∈ B it holds that

P(X_t ∈ B | F_s) = P(X_t ∈ B | X_s).   (3.10)

As a consequence of Lemma 2.1, for every s, t, B the right hand side of (3.10) can be written as some Borel measurable function P(s, ·, t, B) of the random variable X_s:

P(X_t ∈ B | X_s) = P(s, X_s, t, B).
The function P (s, x, t, B) is called the transition probability function of the Markov
process X . It can be shown that, considered as a function of B ∈ B, the transition
probability P (s, x, t, B) is a probability measure on R. In case this measure possesses a density
p(s, x, t, y), i.e.,

P(s, x, t, B) = ∫_B p(s, x, t, y) dy,

then p is called the transition density of X.
Theorem 3.11. Let W be a Wiener process with filtration F. Then W is a Markov process
w.r.t. F. Moreover W admits a transition density given by
P(W_t ∈ dy | W_s = x) = p(s, x, t, y) dy = (1/√(2π(t − s))) e^{−(y−x)²/(2(t−s))} dy,   t > s.   (3.12)
Proof: . . .
The Markov property is the key to the popular backward-induction method frequently used
in pricing derivatives. Let us illustrate this going back to the Binomial tree model of Section 1.1.
The underlying probability space (Ω, F , Q) is given by
There are two securities S 1, S 2 which are modeled as stochastic processes for times
T = {0, 1, 2, . . . }:
S_t^1(ω) = (1 + r)^t,   (3.17)

S_t^2(ω) = S_0^2 · u^{♯u(ω^t)} d^{♯d(ω^t)},   (3.18)

where ♯u(ω^t) resp. ♯d(ω^t) counts the up- resp. down-moves among the first t coordinates of ω. The filtration is generated by the price processes,

F = F^{S^1,S^2} = F^{S^2}.
The following properties hold:

(i) For every t the return ratio S_{t+1}^2 / S_t^2 is independent of F_t.

(ii) The process S^2/S^1 is a martingale.

(iii) The process S^2 is a Markov process w.r.t. F.
Consider a derivative with integrable payoff XT at time T . As known from Corollary 1.2, the fair
price V0(XT ) today is given by
V_0(X_T) = (1/(1 + r)^T) E_Q(X_T).
Denote Vt(XT ) = (1 + r)−(T −t)EQ(XT |Ft), t ≤ T . As will become clear from arbitrage
pricing theory, Vt(XT ) is indeed the fair price of the derivative XT at time t – but this is not
important for now.
If X_T is of the form X_T = F(S_T^2), then applying the Markov property of S^2 and the rule of iterated conditioning (2.15) from Theorem 2.10, one derives that V_t(X_T) can be written as a function of t and the current price S_t^2 alone; in particular, it does not depend on the history of the price S_u^2, u < t.
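The backward-induction method can be sketched in a few lines of Python (our own function names; q below is the usual risk-neutral up-probability of the binomial model):

```python
def binomial_price(payoff, S0, u, d, r, T):
    """Price a payoff payoff(S_T) by backward induction on a
    recombining binomial tree with T periods."""
    q = (1.0 + r - d) / (u - d)  # risk-neutral up-probability
    # values at maturity, indexed by the number k of up-moves
    v = [payoff(S0 * u**k * d**(T - k)) for k in range(T + 1)]
    for t in range(T - 1, -1, -1):
        # one-step discounted conditional expectation under Q
        v = [(q * v[k + 1] + (1.0 - q) * v[k]) / (1.0 + r) for k in range(t + 1)]
    return v[0]
```

For example, with S_0 = 4, u = 2, d = 1/2, r = 0, a call with strike 5 and T = 1 gives q = 1/3 and price 1.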
4 Stochastic Integration

In this section we give a meaning to stochastic integrals of the form

∫_0^t H_s dX_s,

where H and X are stochastic processes on a filtered probability space (Ω, F, F, P). From Subsection 4.2 on we will mainly deal with the case that X is a Brownian motion.
Recall our motivation for looking for integrals of stochastic integrands H integrated w.r.t. a
stochastic process X . In Section 1, working there on a discrete time scale, we have seen that
for a strategy θ = (θt) of trading in the security S = (St) the cumulative profit and loss up to
time t is of the form

∫_0^t θ_u dS_u.
Consider first integrating a ”simple” process H , where there is a partition 0 = t0 < t1 <
t2 · · · < tn of time such that H is ”constant” on each interval (ti, ti+1]:
H_t = ξ_0 1_{{0}}(t) + Σ_{i=0}^{n} ξ_i 1_{(t_i, t_{i+1}]}(t),   t ≥ 0.   (4.1)
The ”values” ξ_i are not necessarily constant; they could be random variables. Then a natural definition of the stochastic integral would be the following. For every t ≥ 0 and ω ∈ Ω we define the random variable (∫_0^t H_s dX_s)(ω) by

(∫_0^t H_s dX_s)(ω) = Σ_{i=0}^{n} ξ_i(ω) (X_{t∧t_{i+1}}(ω) − X_{t∧t_i}(ω)),   t ≥ 0.   (4.2)
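Formula (4.2) is straightforward to implement pathwise. A small sketch (our own names), where X is one fixed path given as a function of time:

```python
def simple_integral(xi, grid, X, t):
    """Evaluate (4.2): sum_i xi[i] * (X(t ^ t_{i+1}) - X(t ^ t_i))
    for a simple integrand taking the value xi[i] on (grid[i], grid[i+1]]."""
    total = 0.0
    for i, value in enumerate(xi):
        total += value * (X(min(t, grid[i + 1])) - X(min(t, grid[i])))
    return total
```

For the bounded-variation path X(s) = s, the integrand with ξ_0 = 1 on (0, 1] and ξ_1 = 2 on (1, 2] yields the Riemann–Stieltjes value 1·1 + 2·1 = 3 at t = 2.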
Now, as is standard, one could try to extend the definition of the integral to more general integrands H by taking limits of sequences of simple integrands, hoping that the respective integrals also converge to a limit, and so on. Unfortunately, this does not work in the case of particular interest to us, namely when X is a Brownian motion.
To give an idea where the problem is coming from, we consider a result which can be found in
[13], Theorem 52. Assume h = h(t) and x = x(t) are deterministic functions of time t ∈ [0, 1].
For a partition π = {0 = t_0 < t_1 < . . . < t_n = 1} define I_π = Σ_i h(t_i)(x(t_{i+1}) − x(t_i)).
If the limit lim|π|→0 Iπ exists for every continuous integrand h then x = x(t) is necessarily a
function of bounded variation!
But as we know from Proposition 3.4 the Brownian path W.(ω) is of unbounded variation,
so a naive path by path definition of the stochastic integral is not viable for X being a Brownian
motion. To solve this problem we have to think of a limiting procedure which is not based on
a path by path convergence but rather on a weaker concept of measuring the distance when
passing to the limit.
From now on we assume that the integrating process X is a Brownian motion W = (Wt) with
filtration F. For the simple integrand H ,
H_t = ξ_0 1_{{0}}(t) + Σ_{i=0}^{n} ξ_i 1_{(t_i, t_{i+1}]}(t),   t ≥ 0,   (4.3)
we suppose now in addition that the random variable ξi is Fti –measurable and square integrable.
This measurability assumption implies that H is F–adapted. Recall from our motivation that the
integrands of interest to us are trading strategies which have to be adapted anyway.
Theorem 4.1. The stochastic integral I(H) = (It(H)) satisfies the following properties.
(i) (Continuity) The stochastic process I(H) = (It(H)) possesses continuous paths.
(ii) (Adaptedness) The process I(H) is F–adapted, i.e., It(H) is Ft–measurable for every t.
(iii) (Linearity) For two simple integrands H^1, H^2 and constants c_1, c_2 we have that

I_t(c_1H^1 + c_2H^2) = c_1I_t(H^1) + c_2I_t(H^2),   t ≥ 0.
(iv) (Martingale property) The integral process I(H) = (It(H)) is a martingale which
starts at zero, I0(H) = 0.
(v) (Isometry) For every t it holds that
E (I_t(H))² = E ∫_0^t (H_s)² ds.   (4.5)
(vi) (Covariance) For every t and two simple integrands H, K the covariance of the integrals is
E (I_t(H) I_t(K)) = E ∫_0^t H_s K_s ds.   (4.6)
(vii) (Quadratic Variation) The quadratic variation of the integral process I(H) is
[I(H), I(H)]_t = ∫_0^t (H_s)² ds.   (4.7)
Proof: . . .
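The martingale property and the isometry (4.5) can be illustrated by simulation. For the integrand H_s = W_s (left-endpoint values on a fine grid) one has E I_1(H) = 0 and E I_1(H)² = ∫_0^1 E W_s² ds = 1/2. A Monte Carlo sketch (our own names, illustrative parameters):

```python
import math
import random

random.seed(1)

def sample_W_dW(n_steps=500):
    # one sample of int_0^1 W_s dW_s using left-endpoint (non-anticipating) sums
    dt = 1.0 / n_steps
    w, total = 0.0, 0.0
    for _ in range(n_steps):
        dw = random.gauss(0.0, math.sqrt(dt))
        total += w * dw  # integrand evaluated at the left endpoint
        w += dw
    return total

samples = [sample_W_dW() for _ in range(4000)]
mean = sum(samples) / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
# mean should be close to 0, second_moment close to 1/2
```

Note that the left-endpoint evaluation is essential: it is what makes the integral a martingale.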
The isometry property is the key to extending the definition of the stochastic integral to
more general integrands H. We assume that H = (H_t) is F–adapted and satisfies the condition that

E (∫_0^t (H_s)² ds) < ∞,   ∀t ≥ 0.   (4.8)

10 Implicitly this assumes also that for every ω the integral ∫_0^t (H_s(ω))² ds is defined and gives again a random variable. For that to hold H_s(ω) has to be B(R^+) ⊗ F–measurable.
Then one can show that H can be approximated by a sequence H n of simple processes such that
lim_n E (∫_0^t (H_s^n − H_s)² ds) = 0,
cf. [6], Lemma 6.3.5. Applying the isometry property we can show that the stochastic integrals
It(H n) converge11 to a limiting variable, which we denote by It(H), where the convergence is
in the sense of mean square convergence:
lim_n E (I_t(H^n) − I_t(H))² = 0.
It turns out that all properties of the stochastic integral in Theorem 4.1 are preserved by the
limiting procedure.
Finally, it is possible to extend the definition of the stochastic integral to an even wider class of F–adapted integrands H, namely those satisfying only

∫_0^t (H_s)² ds < ∞,   P–a.s., ∀t ≥ 0.   (4.9)

11 To be more precise, I_t(H^n) forms a Cauchy sequence in the complete space L²(Ω, F, P) and has therefore a limit.
To extend the definition of the integral further to those integrands we have to utilize an even
weaker concept of convergence. One can show that any such integrand H can be approximated12
by a sequence H n of integrands satisfying (4.8). The corresponding integrals It(H n) are then
convergent in probability P and the integral It(H) is defined to be their limit.
Observe that for integrands H satisfying only the weaker condition (4.9) the martingale
property of the stochastic integral as well as the isometry and the formula for the covariance get
lost. The integral It(H) is in general no longer square integrable and it is merely a so-called
local martingale instead of a martingale.
In case the integrand H is non-stochastic the integral preserves the normal distribution of
the Brownian motion.
Proposition 4.2. Let h = h(t) be a deterministic function satisfying

∫_0^t (h(s))² ds < ∞.

Then the stochastic integral I_t(h) = ∫_0^t h(s) dW_s is a Gaussian process. In particular, its expectation and covariance function are

E I_t(h) = 0,   (4.10)

E (I_t(h) I_s(h)) = ∫_0^{t∧s} (h(u))² du.   (4.11)
The definition of the stochastic integral is trivially extended from Brownian motion as the
integrating process to so-called Ito processes.
Definition 4.3. Given two F–adapted processes µ = (µ_t) and σ = (σ_t) satisfying

∫_0^t (σ_s)² ds < ∞,   P–a.s., ∀t ≥ 0,   (4.12)

∫_0^t |µ_s| ds < ∞,   P–a.s., ∀t ≥ 0,   (4.13)

the process

X_t = X_0 + ∫_0^t σ_s dW_s + ∫_0^t µ_s ds,   t ≥ 0,   (4.14)

is called an Ito process.
Lemma 4.4. For an Ito process (4.14) to be a martingale, it is necessary that the ”dt” part vanishes, i.e., µ ≡ 0.
The Ito-formula is one of the most powerful tools of stochastic calculus. It is a version of the
well known chain rule of differential calculus, adapted and modified for the particular needs of
stochastic integration.
Let us recall the chain rule. For smooth functions f and g we know that

(d/dt) f(g(t)) = f′(g(t)) g′(t),

which can also be written as

df(g(t)) = f′(g(t)) g′(t) dt = f′(g(t)) dg(t).
A formula of this type remains valid as long as the function g is of bounded variation. In case of
the stochastic integral we know that the integrating Brownian motion has paths of unbounded
variation, which forced us to define the integral in the sense of a mean square convergence. This
will be exactly the reason for the appearance of an additional term in the chain rule of stochastic
calculus.
Theorem 4.5. [Ito formula for Brownian motion] Let f = f (t, x) be a function for which
the partial derivatives ft(t, x), fx(t, x), fxx(t, x) exist and are continuous. Then
f(t, W_t) = f(0, W_0) + ∫_0^t f_x(s, W_s) dW_s + ∫_0^t f_t(s, W_s) ds + (1/2) ∫_0^t f_xx(s, W_s) ds,   t ≥ 0.   (4.16)
Proof: We restrict ourselves to the case of a function f that does not depend on time,
f = f (x). We sketch the main steps of the proof discussing more details in the class.
For f = f (x) depending only on x the Ito formula reads as
f(W_t) = f(W_0) + ∫_0^t f_x(W_s) dW_s + (1/2) ∫_0^t f_xx(W_s) ds.
Take a partition (t_i^n) of the interval [0, t], 0 = t_0^n < t_1^n < . . . < t_n^n = t. Then, clearly,

f(W_t) = f(W_0) + Σ_{i=0}^{n−1} ( f(W_{t_{i+1}^n}) − f(W_{t_i^n}) ).

A second-order Taylor expansion of each increment gives

f(W_t) = f(W_0) + Σ_{i=0}^{n−1} f_x(W_{t_i^n}) (W_{t_{i+1}^n} − W_{t_i^n}) + (1/2) Σ_{i=0}^{n−1} f_xx(θ_i^n) (W_{t_{i+1}^n} − W_{t_i^n})²,
with θ_i^n inside the interval spanned by W_{t_i^n} and W_{t_{i+1}^n}. It is not difficult to see that the first sum converges to the stochastic integral,

Σ_{i=0}^{n−1} f_x(W_{t_i^n}) (W_{t_{i+1}^n} − W_{t_i^n}) → ∫_0^t f_x(W_s) dW_s.
It is a bit more tricky to see that the second sum converges to a ”ds”–integral,

Σ_{i=0}^{n−1} f_xx(θ_i^n) (W_{t_{i+1}^n} − W_{t_i^n})² → ∫_0^t f_xx(W_s) ds.
Observe that in classical differential calculus this sum would converge to zero: if the function W_s(ω), s ∈ [0, 1], had bounded variation then its quadratic variation would be zero (cf. Exercises!), causing the second sum to disappear in the limit.
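The fact behind the extra term — the quadratic variation of W over [0, t] equals t, while a smooth path has quadratic variation zero — is easy to see numerically (a sketch; the step count is illustrative):

```python
import math
import random

random.seed(2)

n = 100_000
dt = 1.0 / n

# sum of squared Brownian increments over [0, 1]: converges to t = 1
qv_brownian = sum(random.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

# the same sum for the smooth path g(s) = s is n * dt^2, which vanishes as n grows
qv_smooth = sum(dt ** 2 for _ in range(n))
```

The Brownian sum concentrates around 1 while the smooth-path sum is of order 1/n.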
Along the same lines one derives an Ito formula for general Ito processes.
Theorem 4.6. [Ito formula for Ito processes] Suppose X = (Xt) is an Ito process,
X_t = X_0 + ∫_0^t σ_s dW_s + ∫_0^t µ_s ds.
Let f = f(t, x) be a function for which the partial derivatives f_t(t, x), f_x(t, x), f_xx(t, x) exist and are continuous. Then

f(t, X_t) = f(0, X_0) + ∫_0^t f_x(s, X_s) σ_s dW_s + ∫_0^t ( f_t(s, X_s) + f_x(s, X_s) µ_s ) ds + (1/2) ∫_0^t f_xx(s, X_s) (σ_s)² ds,   t ≥ 0.
The last integral is often written using the quadratic variation [X, X] of the Ito process X. By (4.7), and by the fact that the ”ds” part of X does not contribute to the quadratic variation, the quadratic variation of X is

[X, X]_t = ∫_0^t (σ_s)² ds.
Now it is time to relax from the hard work so far. Let us enjoy the power of the Ito formula
applying it now to solve some stochastic differential equations that are of extreme importance in
finance.
Consider the linear stochastic differential equation

X_t = X_0 + ∫_0^t σX_s dW_s + ∫_0^t µX_s ds,   t ≥ 0,   (4.18)

or, formally,

dX_t = σX_t dW_t + µX_t dt,

with constants σ, µ. A stochastic process X satisfying this equation is called a solution of the SDE (4.18).
Proposition 4.7. The linear stochastic differential equation (4.18) possesses a unique solution
that is given by

X_t = X_0 exp( σW_t + (µ − ½σ²)t ),   t ≥ 0.   (4.19)
In other words, the solution of (4.18) is a geometric Brownian motion with volatility σ and drift
µ.
Proof: . . .
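One can convince oneself numerically that (4.19) solves (4.18): drive an Euler discretization of the SDE and the closed-form solution with the same Brownian increments and compare the endpoints (a sketch; step count and coefficients are illustrative):

```python
import math
import random

def gbm_endpoints(x0=1.0, sigma=0.2, mu=0.05, t=1.0, n_steps=20_000, seed=3):
    """Return (Euler endpoint, closed-form endpoint (4.19)) on one path."""
    random.seed(seed)
    dt = t / n_steps
    x_euler, w = x0, 0.0
    for _ in range(n_steps):
        dw = random.gauss(0.0, math.sqrt(dt))
        x_euler += sigma * x_euler * dw + mu * x_euler * dt  # dX = sigma X dW + mu X dt
        w += dw
    x_exact = x0 * math.exp(sigma * w + (mu - 0.5 * sigma * sigma) * t)
    return x_euler, x_exact
```

For a fine grid the two endpoints agree closely, and both stay strictly positive, as a geometric Brownian motion must.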
In the same way one derives that the solution of the inhomogeneous SDE

dX_t = σ_tX_t dW_t + µ_tX_t dt,   (4.20)

with F–adapted coefficient processes σ = (σ_t) and µ = (µ_t), is given by

X_t = X_0 exp( ∫_0^t σ_s dW_s + ∫_0^t (µ_s − ½σ_s²) ds ).   (4.21)
In particular, for µ ≡ 0,

X_t = X_0 exp( ∫_0^t σ_s dW_s − ½ ∫_0^t σ_s² ds )

is the solution of the SDE dX_t = X_tσ_t dW_t, and X is called the exponential (local) martingale or the exponential associated with ∫_0^t σ_s dW_s. Exponential martingales will act as Radon–Nikodym densities when changing probability measures on filtered spaces. We will meet those exponentials soon in Section 4.6 below.
In the generalized Vasicek model (also called Hull–White model) the so-called short rate (r_t) is modeled by the mean-reverting SDE

dr_t = λ(t)(ϑ(t) − r_t) dt + σ(t) dW_t.   (4.22)
Proposition 4.8. The unique solution (r_t) of the SDE (4.22) is given by

r_t = A(t)^{−1} [ r_0 + ∫_0^t A(s)λ(s)ϑ(s) ds + ∫_0^t A(s)σ(s) dW_s ],   (4.23)

where A(t) = exp( ∫_0^t λ(s) ds ).
Proof: Use Ito’s formula. Hint: start calculating the dynamics of the process rtA(t).
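A quick consistency check of (4.23) (our own helper names): for constant λ, ϑ and σ ≡ 0 the formula must reduce to the solution r_t = ϑ + (r_0 − ϑ)e^{−λt} of the mean-reverting ODE:

```python
import math

def vasicek_deterministic(r0, lam, theta, t):
    # formula (4.23) with sigma = 0 and constant lambda, theta:
    # A(t) = exp(lam t),  int_0^t A(s) lam theta ds = theta (A(t) - 1)
    A = math.exp(lam * t)
    return (r0 + theta * (A - 1.0)) / A
```

This indeed equals ϑ + (r_0 − ϑ)e^{−λt}: the rate is pulled from r_0 toward the level ϑ at speed λ.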
Clearly, when switching from a measure P to another measure Q, a Wiener process W under the probability P will in general no longer be a Wiener process w.r.t. Q. Girsanov’s theorem identifies the drift that W acquires under the new measure. Define the exponential

Z_t = exp( ∫_0^t H_s dW_s − ½ ∫_0^t (H_s)² ds ),   t ≤ T,   (4.24)

where the integrand H is such that the Ito integral is well-defined. Define a new probability measure Q by

dQ = Z_T dP.   (4.25)

Theorem 4.9. [Girsanov] Suppose that

E_P Z_T = 1.   (4.26)

Then the process

W̃_t = W_t − ∫_0^t H_s ds,   t ≤ T,

is a Q–Brownian motion.
Proof: . . .
The condition (4.26) is not easy to verify. There is a sufficient condition for (4.26) to hold
true due to Novikov:
(Novikov condition)

E_P exp( ½ ∫_0^T H_s² ds ) < ∞   ⇒   E_P Z_T = 1.   (4.28)
As an application we use Girsanov’s theorem to generalize Proposition 3.9 and to derive the
joint distribution of the process and its running maximum for a Brownian motion with drift µ
Xt = Wt + µt.
Proposition 4.10. The joint distribution of (X_T, M_T^X) possesses a density that is given by

f_{(X_T, M_T^X)}(x, y) = (2(2y − x) / (T√(2πT))) e^{µx − µ²T/2} e^{−(2y−x)²/(2T)},   x ≤ y, y > 0.   (4.29)
Proof. . . .
We can use this result to get the explicit valuation formula for an up-and-out call (barrier
option) with strike E and up-and-out barrier B in the Black & Scholes model
E_Q exp(−rT)(S_T − E)^+ 1_{{M_T^S ≤ B}},

where

S_t = S_0 exp( rt + σW_t − ½σ²t ).
See the Exercises.
According to no-arbitrage pricing theory the price of a payoff is given by the expectation under
an appropriate probability measure.
Given that the underlying security price processes are Markovian, it turns out that expectations
are solutions to certain partial differential equations (PDE). This is the key to using methods of
PDE solving to calculate prices.
Consider the process X being the solution of the stochastic differential equation

dX_t = σ(t, X_t) dW_t + µ(t, X_t) dt.   (4.30)
Under relatively mild conditions on the functions σ(t, x), µ(t, x) one can show existence and
uniqueness of the solution.
Theorem 4.11. The solution X of equation (4.30) is a Markov process, i.e., for every bounded Borel function h : R → R and t ≤ T it holds that

E(h(X_T) | F_t) = E(h(X_T) | X_t).
Proof: We give only an intuitive argument why the solution is Markov, a detailed proof is quite
technical. Let t = T − δ for some small δ . Then
X_T = X_{T−δ} + ∫_{T−δ}^T σ(s, X_s) dW_s + ∫_{T−δ}^T µ(s, X_s) ds.
Now applying Proposition 2.13 (iii), using that XT −δ is FT −δ –measurable and that (WT −WT −δ )
is independent of FT −δ , we get
E(h(X_T) | F_{T−δ})

≈ E( h( X_{T−δ} + σ(T−δ, X_{T−δ})(W_T − W_{T−δ}) + µ(T−δ, X_{T−δ}) δ ) | F_{T−δ} )

≈ ∫_R h( X_{T−δ} + σ(T−δ, X_{T−δ}) z + µ(T−δ, X_{T−δ}) δ ) n_{0,δ}(z) dz,
where n_{0,δ}(z) is the density of the normal distribution with expectation zero and variance δ. The expression on the right hand side is obviously σ(X_{T−δ})–measurable. Now repeating the argument from T − δ to T − 2δ and so on, and applying the property of iterated conditional expectations, ”proves” the assertion.
Due to the Markov property, we know that E(h(X_T)|F_t) is a σ(X_t)–measurable random variable which, by Lemma 2.1, can be written as a function of X_t,

E(h(X_T)|F_t) = g(t, X_t).

It is common to write

g(t, x) = E_{t,x}(h(X_T)).
Theorem 4.12. [Feynman–Kac] Let X be the solution of the stochastic differential equation (4.30). Then the function

f(t, x) = E_{t,x}( e^{−∫_t^T r(s,X_s) ds} h(X_T) ),   0 ≤ t ≤ T, x ∈ R,

solves the partial differential equation

f_t(t, x) + µ(t, x) f_x(t, x) + ½ σ²(t, x) f_xx(t, x) = r(t, x) f(t, x),   (4.33)

with terminal condition f(T, x) = h(x).
Proof: We sketch the proof in the case r ≡ 0, where f(t, x) = g(t, x) = E_{t,x}(h(X_T)). First, one can show that g(t, x) is a smooth function, so that we can apply Ito’s formula for Ito processes, Theorem 4.6, to the process g(t, X_t):

dg(t, X_t) = g_t(t, X_t) dt + g_x(t, X_t) µ(t, X_t) dt + g_x(t, X_t) σ(t, X_t) dW_t + ½ g_xx(t, X_t) σ²(t, X_t) dt.
On the other hand, being a conditional expectation of the random variable h(X_T), the process g(t, X_t) = E(h(X_T)|F_t) is a martingale (cf. (3.3)). Therefore, by Lemma 4.4 the ”dt” contribution has to vanish, i.e.,

g_t(t, X_t) + g_x(t, X_t) µ(t, X_t) + ½ g_xx(t, X_t) σ²(t, X_t) = 0,
which ”shows” that g(t, x) has to solve the asserted PDE. The terminal condition is obvious.
As an application we derive the Black & Scholes partial differential equation for the price of an option in the framework of the Black & Scholes model, see Section 3.4. Under the pricing measure Q the stock price S = (S_t) satisfies the SDE

dS_t = rS_t dt + σS_t dW_t,

and the price at time t of an option with payoff h(S_T) is v(t, S_t) with v(t, x) = E_{t,x}(e^{−r(T−t)} h(S_T)). Now applying the Feynman–Kac theorem we get immediately that v(t, x) solves the PDE

v_t(t, x) + rx v_x(t, x) + ½ σ²x² v_xx(t, x) = rv(t, x),   (4.34)
with terminal condition
v(T, x) = h(x).
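The PDE (4.34) can be verified numerically against the classical Black & Scholes call price formula (restated here for reference; the numerical parameters are illustrative). The finite-difference residual of (4.34) should vanish up to discretization error:

```python
import math

def norm_cdf(x):
    # standard normal distribution function Phi(x)
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(t, x, T=1.0, K=1.0, r=0.05, sigma=0.2):
    # Black & Scholes price v(t, x) of a call with strike K and maturity T
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(t, x, h=1e-4, r=0.05, sigma=0.2):
    # v_t + r x v_x + 0.5 sigma^2 x^2 v_xx - r v, via central differences
    vt = (bs_call(t + h, x) - bs_call(t - h, x)) / (2.0 * h)
    vx = (bs_call(t, x + h) - bs_call(t, x - h)) / (2.0 * h)
    vxx = (bs_call(t, x + h) - 2.0 * bs_call(t, x) + bs_call(t, x - h)) / h ** 2
    return vt + r * x * vx + 0.5 * sigma ** 2 * x ** 2 * vxx - r * bs_call(t, x)
```

The residual is numerically zero away from maturity, and near maturity the price approaches the payoff (x − K)^+.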
The result of this section is the key to answering the question whether a payoff X_T can be replicated by trading in the underlying securities. Recall from Section 1 that this was our starting point for pricing payoffs by the no-arbitrage argument. Basically, payoffs which cannot be replicated cannot be priced by no-arbitrage.
Theorem 4.13. [Martingale representation] Let W be a Brownian motion with filtration F^W.

(i) Let X_T be an integrable random variable which is F_T^W–measurable. Then there exists an appropriate integrand H = (H_t) such that X_T has the representation

X_T = EX_T + ∫_0^T H_s dW_s.   (4.35)
We now come to the most important application of the previous result in the field of finance. Consider the Black & Scholes model with the underlying stock price S modeled as

dS_t = rS_t dt + σS_t dW_t

under the risk-neutral measure Q, and with the risk-free bank account B given by

dB_t = rB_t dt.

The filtration is generated by the price processes,

F = F^{S,B} = F^S.
We are interested in the pricing of a payoff XT at time T which is an integrable FTS –measurable
random variable. The payoff X_T can be replicated with the strategy θ = (θ_t^S, θ_t^B) if the value generated by the strategy until time T coincides with X_T in all states ω of the world. The value V_t(θ) at time t of the portfolio associated with the strategy is defined as

V_t(θ) = θ_t^S S_t + θ_t^B B_t.

Replication means that

X_T(ω) = V_T(θ)(ω) = θ_T^S(ω) S_T(ω) + θ_T^B(ω) B_T(ω),   ∀ω ∈ Ω.

Moreover, the strategy has to be self-financing, meaning that there are only initial costs V_0(θ):

V_t(θ) = V_0(θ) + ∫_0^t θ_u^S dS_u + ∫_0^t θ_u^B dB_u,   t ≤ T.
Proposition 4.14. In the framework of the Black & Scholes model any integrable payoff XT at
time T can be replicated by a self-financing strategy.
Proof: Because of F^S = F^W we can apply the martingale representation Theorem 4.13 (i) to get

X_T/B_T = E_Q(X_T/B_T) + ∫_0^T H_s dW_s.
Now we have to translate the dW–integral into a dS̃–integral. Let S̃ = S/B; then

dS̃_t = σ S̃_t dW_t,

and thus

X_T/B_T = E_Q(X_T/B_T) + ∫_0^T (H_s/(σS̃_s)) dS̃_s.
Now define

θ_t^S = H_t/(σS̃_t),

θ_t^B = E_Q(X_T/B_T) + ∫_0^t θ_s^S dS̃_s − θ_t^S S̃_t.

The initial costs are

V_0(θ) = θ_0^S S_0 + θ_0^B B_0 = E_Q(X_T/B_T).   (4.37)
The strategy replicates X_T. Indeed,

θ_T^S S_T + θ_T^B B_T = θ_T^S S_T + ( E_Q(X_T/B_T) + ∫_0^T θ_s^S dS̃_s − θ_T^S S̃_T ) B_T = X_T.
It remains to show that θ = (θ_t^S, θ_t^B) is self-financing. Right from the definition of the strategy we obtain

V_t(θ) = θ_t^S S_t + θ_t^B B_t = B_t ( V_0(θ) + ∫_0^t θ_s^S dS̃_s ).
Now, Ito’s formula applied to the Ito process Y_t = V_0(θ) + ∫_0^t θ_s^S dS̃_s = θ_t^B + θ_t^S S̃_t, remembering that B_t = e^{rt}, yields

dV_t(θ) = d(B_tY_t) = d(e^{rt} Y_t)
  = r e^{rt} Y_t dt + e^{rt} θ_t^S dS̃_t
  = (θ_t^B + θ_t^S S̃_t) dB_t + B_t θ_t^S dS̃_t
  = θ_t^B dB_t + θ_t^S dS_t,
where the last step uses Ito’s formula for B_tS̃_t. This shows that

V_t(θ) = V_0(θ) + ∫_0^t θ_u^B dB_u + ∫_0^t θ_u^S dS_u,

i.e., the strategy θ is indeed self-financing.
Knowing that every payoff XT can be replicated, the price of XT equals its costs of
replication, i.e., V0(XT ) = V0(θ). As a side result of the proof, namely equation (4.37), we
have shown the following result, which is the analogue to our result for the Binomial model in
Section 1.3.
Proposition 4.15. The fair price of the integrable payoff XT is given by the expectation
V_0(X_T) = E_Q(X_T/B_T).
The other representation for θ_t^B follows from the PDE (4.34).
The replicating position θ_t^S is called the option delta, which is nothing else but the first derivative (the sensitivity) of the option price v(t, S_t) with respect to the current stock price S_t.
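For a call in the Black & Scholes model the delta has the well-known closed form v_x(t, x) = Φ(d_1). A sketch (our helper names; illustrative parameters) that checks it against a finite difference of the price:

```python
import math

def norm_cdf(x):
    # standard normal distribution function Phi(x)
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(t, x, T=1.0, K=1.0, r=0.05, sigma=0.2):
    # Black & Scholes call price v(t, x)
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))

def bs_delta(t, x, T=1.0, K=1.0, r=0.05, sigma=0.2):
    # the replicating stock position theta_t^S = v_x(t, x) = Phi(d1)
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)
```

Before maturity the call delta always lies strictly between 0 and 1, so the replicating portfolio never shorts the stock and never holds more than one share per option.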
References
[10] Klebaner, F.C.: Introduction to Stochastic Calculus with Applications, Imperial College Press 2005
[11] Lamberton, D. and Lapeyre, B.: Introduction to Stochastic Calculus Applied to
Finance, Chapman & Hall London, 1996
[12] Øksendal, B.: Stochastic Differential Equations, Springer Berlin 1995
[13] Protter, P.: Stochastic Integration and Differential Equations, Springer Berlin 1990

[14] Royden, H.L.: Real Analysis, Macmillan New York 1988
[15] Shreve, S.E.: Stochastic Calculus for Finance I, II, Springer Berlin 2004