
COURSE NOTES

Financial Mathematics MTH3251


Modelling in Finance and Insurance ETC 3510.
Lecturers: Andrea Collevecchio and Fima Klebaner
School of Mathematical Sciences
Monash University
Semester 1, 2016

Contents

1 Introduction
  1.1 Example of models
  1.2 Application in Finance
  1.3 Application in Insurance

2 Review of probability
  2.1 Distribution of Random Variables. General.
  2.2 Expected value or mean
  2.3 Variance Var, and SD
  2.4 General Properties of Expectation
  2.5 Exponential moments of Normal distribution
  2.6 LogNormal distribution

3 Independence
  3.1 Joint and marginal densities
  3.2 Multivariate Normal distributions
  3.3 A linear combination of a multivariate normal
  3.4 Independence
  3.5 Covariance
  3.6 Properties of Covariance and Variance
  3.7 Covariance function

4 Conditional Expectation
  4.1 Conditional Distribution and its mean
  4.2 Properties of Conditional Expectation
  4.3 Expectation as best predictor
  4.4 Conditional Expectation as Best Predictor
  4.5 Conditional expectation with many predictors

5 Random Walk and Martingales
  5.1 Simple Random Walk
  5.2 Martingales
  5.3 Martingales in Random Walks
  5.4 Exponential martingale in Simple Random Walk (q/p)^{X_n}

6 Optional Stopping Theorem and Applications
  6.1 Stopping Times
  6.2 Optional Stopping Theorem
  6.3 Hitting probabilities in a simple Random Walk
  6.4 Expected duration of a game
  6.5 Discrete time Risk Model
  6.6 Ruin Probability

7 Applications in Insurance
  7.1 The bound for the ruin probability. Constant R.
  7.2 R in the Normal model
  7.3 Simulations
  7.4 The Acceptance-Rejection method

8 Brownian Motion
  8.1 Definition of Brownian Motion
  8.2 Independence of Increments

9 Brownian Motion is a Gaussian Process
  9.1 Proof of Gaussian property of Brownian Motion
  9.2 Processes obtained from Brownian motion
  9.3 Conditional expectation with many predictors
  9.4 Martingales of Brownian Motion

10 Stochastic Calculus
  10.1 Non-differentiability of Brownian motion
  10.2 Ito Integral
  10.3 Distribution of Ito integral of simple deterministic processes
  10.4 Simple stochastic processes and their Ito integral
  10.5 Ito integral for general processes
  10.6 Properties of Ito Integral
  10.7 Rules of Stochastic Calculus
  10.8 Chain Rule: Ito's formula for f(B_t)
  10.9 Martingale property of Ito integral

11 Stochastic Differential Equations
  11.1 Ordinary Differential equation for growth
  11.2 Black-Scholes stochastic differential equation for stocks
  11.3 Solving SDEs by Ito's formula. Black-Scholes equation.
  11.4 Ito's formula for functions of two variables
  11.5 Stochastic Product Rule or Integration by parts
  11.6 Ornstein-Uhlenbeck process
  11.7 Vasicek's model for interest rates
  11.8 Solution to the Vasicek's SDE
  11.9 Stochastic calculus for processes driven by two or more Brownian motions
  11.10 Summary of stochastic calculus

12 Options
  12.1 Financial Concepts
  12.2 Functions x+ and x-
  12.3 The problem of Option price
  12.4 One-step Binomial Model
  12.5 One-period Binomial Pricing Model
  12.6 Replicating Portfolio
  12.7 Option Price as expected payoff
  12.8 Martingale property of the stock under p
  12.9 Binomial Model for Option pricing
  12.10 Black-Scholes formula

13 Options pricing in the Black-Scholes Model
  13.1 Self-financing Portfolios
  13.2 Replication of Option by self-financing portfolio
  13.3 Replication in Black-Scholes model
  13.4 Black-Scholes Partial Differential Equation
  13.5 Option Price as discounted expected payoff
  13.6 Stock price S_T under EMM Q

14 Fundamental Theorems of Asset Pricing
  14.1 Introduction
  14.2 Arbitrage
  14.3 Fundamental theorems of Mathematical Finance
  14.4 Completeness of Black-Scholes and Binomial models
  14.5 A general formula for option price
  14.6 Summary

15 Models for Interest Rates
  15.1 Term Structure of Interest Rates
  15.2 Bonds and the Yield Curve
  15.3 General bond pricing formula
  15.4 Models for the spot rate
  15.5 Forward rates
  15.6 Bonds in Vasicek's model
  15.7 Bonds in Cox-Ingersoll-Ross (CIR) model
  15.8 Options on bonds
  15.9 Caplet as a Put Option on Bond

1 Introduction

In order to study Finance and Insurance, we need mathematical tools. We start with a review of probability theory: random variables, their expected values, variance, independence, etc. We then introduce Random Walks, Martingales, Brownian motion and stochastic differential equations. These are sophisticated mathematical tools, and we compromise: we are going to learn how to use them. They are also useful in other areas, such as Engineering and Biology.

1.1 Example of models

Let x_t be the amount of money in a savings account. Suppose the interest rate is r, and x_0 > 0. The evolution of x_t is described by the differential equation

dx_t/dt = r x_t.

We solve this equation as follows. Divide by x_t to get x'_t/x_t = r. We know that the derivative of ln x_t equals x'_t/x_t. Hence, by integration, ln x_t = rt + C, where C is a constant. Finally,

x_t = e^C e^{rt}.

In order to find the value of e^C we need to know x_0: by plugging in t = 0 we have x_0 = e^C. Hence we get

x_t = x_0 e^{rt}.

What is it for? It allows us to predict x_t at a future time t, or to find the rate r if both x_t and x_0 are known.

What if we introduce a random perturbation? Consider

dX_t = r X_t dt + dξ_t,

where ξ_t is a random process. This is a strong generalization. We will introduce this class of equations and study how to solve some cases of it. Such equations are called Stochastic Differential Equations.
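As an illustration (a sketch that is not part of the original notes, and which assumes the perturbation is driven by Brownian motion increments, the case developed later in the course), the deterministic solution x_t = x_0 e^{rt} can be compared with a randomly perturbed path generated by a simple Euler scheme:

```python
import numpy as np

# Compare deterministic growth with a perturbed path dX_t = r*X_t dt + sigma dB_t.
# The parameter values (r, sigma, n) are illustrative choices only.
r, sigma, x0, T, n = 0.05, 0.2, 1.0, 1.0, 1000
dt = T / n
rng = np.random.default_rng(0)

x_det = x0 * np.exp(r * np.arange(n + 1) * dt)    # exact solution x_t = x_0 e^{rt}

x = np.empty(n + 1)
x[0] = x0
for k in range(n):
    dB = rng.normal(0.0, np.sqrt(dt))             # Brownian increment ~ N(0, dt)
    x[k + 1] = x[k] + r * x[k] * dt + sigma * dB  # Euler step for the perturbed equation

print(x_det[-1], x[-1])                           # value at time T, with and without noise
```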

1.2 Application in Finance

[Figure 1: Prices of stocks — observed price series plotted against time for Boral, BHP, LLC and NCP.]

Observed prices of stocks as functions of time: plot the price S_t at time t on the y-axis and time t on the x-axis. We want a model for such functions. Simulations from such a model produce functions of time that look like stock prices: random functions, continuous but not smooth (not differentiable).

Using such models we solve the problem of Option Pricing in Finance. An option is a financial contract that allows one to buy assets in the future for a price agreed at present. This is the modern approach to risk management in markets, used by banks and other large financial companies.

1.3 Application in Insurance

Consider a sequence of independent games, and suppose that your payoff at the end of each game is X_i, which is a random variable. We assume that the X_i are identically distributed. The Random Walk is simply Σ_{i=1}^{n} X_i for n ∈ N. This is the discrete counterpart of Brownian motion. Using the Random Walk to model the insurance surplus we can calculate the ruin probability.

[Figure 2: Computer simulations — simulated random functions of time for mu = 0, mu = -1, mu = 1 and mu = 2.]


The equation for the surplus at the end of year n is

U_n = U_0 + cn − Σ_{k=1}^{n} X_k,

where U_0 is the initial fund, c is the premium collected in each year and X_k is the amount of claims paid out in year k. The insurance company wants to compute the probability of ruin, i.e. the probability that sooner or later the process (U_n, n ≥ 1) hits zero or becomes negative. This model allows us to find sufficient initial funds to control the probability of ruin.

2 Review of probability

2.1 Distribution of Random Variables. General.

A random variable refers to a quantity that takes different values with some probabilities. A random variable is completely defined by its cumulative probability distribution function (cdf),

F(x) = Pr(X ≤ x),   x ∈ IR.

The probability of observing an outcome in an interval A = (a, b] is

Pr(X ∈ A) = F(b) − F(a).

Sometimes it is more convenient to describe the distribution by the probability density function (pdf). The probability density function of a continuous random variable is

f(x) = dF(x)/dx.

Using the relation between the integral and the derivative we can calculate probabilities of outcomes by using the pdf. The probability of observing an outcome in the range (a, b] (or (a, b)) is

Pr(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx.

Any probability density is a non-negative function, f(x) ≥ 0, that integrates to 1,

∫_{−∞}^{∞} f(x) dx = 1.

Conversely, any such f corresponds to some probability distribution.

The Uniform(0,1) distribution has density

f(x) = 1 if x ∈ (0, 1), and f(x) = 0 otherwise.

The cumulative distribution function in this case is

F(x) = 0 if x ≤ 0;   F(x) = x if x ∈ (0, 1);   F(x) = 1 if x ≥ 1.

The Exponential distribution with parameter λ has density

f(x) = λ e^{−λx} if x > 0, and f(x) = 0 otherwise.

The cumulative distribution function in this case is

F(x) = 0 if x ≤ 0, and F(x) = 1 − e^{−λx} if x > 0.

Standard Normal Distribution N(0, 1):

f(x) = (1/√(2π)) e^{−x²/2}.

The general Normal distribution involves two numbers (parameters), μ and σ. The density of the Normal N(μ, σ²) distribution is given by

f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.

The cumulative probability function of the Standard Normal is denoted by Φ(x),

Φ(x) = ∫_{−∞}^{x} f(u) du.

It cannot be expressed in terms of other elementary functions. It is available in Excel and in Tables.

2.2 Expected value or mean

The expected value, or the mean, is defined as

E(X) = ∫ x f(x) dx.

Interpretation: if f(x) is a mass density then EX is the centre of gravity.

2.3 Variance Var, and SD

The variance is defined as

Var(X) = E(X − EX)².

The Standard Deviation (SD) is defined as

SD = σ = √(E(X − EX)²).

The SD shows how far, on average, the values are away from the mean. It turns out that for the N(μ, σ²) distribution the mean is μ and the variance is σ².

Theorem 1 If X has the N(μ, σ²) distribution then

E(X) = μ,   Var(X) = σ²,   SD(X) = σ.

The proof is an exercise in Calculus.

Linear transform

The Normal Distribution N(μ, σ²) is obtained from the standard Normal by a linear transformation.

Theorem 2 If Z has the standard Normal distribution N(0, 1), then the random variable

X = μ + σZ

has the N(μ, σ²) distribution. Conversely, if X has the N(μ, σ²) distribution, then

Z = (X − μ)/σ

has the standard Normal distribution N(0, 1).

Proof. Write P(X ≤ x) and differentiate.

Exercise 1. Find the distribution of X = μ + σ N(0, 1).

2. Find the distribution of X = σ N(0, 1).

This result allows us to calculate probabilities for any Normal distribution by using tables of the Standard Normal. It also allows us to generate any Normal random variable by using a standard Normal one.

Example: X ~ N(1, 2). Find P(X > 0).

P(X > 0) = P((X − 1)/√2 > (0 − 1)/√2) = P(Z > −1/√2) = 1 − Φ(−0.707) ≈ 0.76.
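As a quick numerical check (a sketch using scipy, not part of the original notes), the same probability can be computed from the standard Normal cdf Φ:

```python
from scipy.stats import norm
import numpy as np

# P(X > 0) for X ~ N(1, 2): standardise, then use the standard Normal cdf.
mu, var = 1.0, 2.0
z = (0.0 - mu) / np.sqrt(var)   # z = -1/sqrt(2), about -0.707
print(1 - norm.cdf(z))          # about 0.76, matching the table value
```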

Example. Consider the process X_t = √t Z, where Z is N(0, 1). Give the distribution of X_t. Give the distribution of the increments of X_t.

2.4 General Properties of Expectation

1. Expectation is linear: E(aX + bY) = aE(X) + bE(Y).
2. If X ≥ 0, then E(X) ≥ 0.
3. If X = c is a constant then E(X) = E(c) = c.
4. Expectation of a function of a random variable:

   E h(X) = ∫ h(x) f_X(x) dx.

5. Expectation of an indicator I_A(X) (where I_A(X) = 1 if X ∈ A and 0 if X ∉ A):

   E I_A(X) = P(X ∈ A).

These properties are established from the definition of expectation. Remark: if h(x) = x^n then E h(X) = E(X^n) is called the n-th moment of the random variable X.

2.5 Exponential moments of Normal distribution

The exponential moment of a random variable X is E e^{uX} for a number u. It is also known as the moment generating function of X, when considered as a function of the argument u.

Theorem 3 The exponential moment of the N(μ, σ²) distribution is given by

E e^{uX} = e^{μu + σ²u²/2}.

Proof: By Property 4 of the expectation (expectation of a function of a random variable, E h(X) = ∫ h(x) f_X(x) dx) with h(x) = e^{ux},

E e^{uX} = ∫ e^{ux} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx.

The rest is an exercise in integration. Putting the exponential terms together and completing the square,

E e^{uX} = ∫ (1/(σ√(2π))) e^{−(x² − 2μx + μ² − 2σ²ux)/(2σ²)} dx
         = ∫ (1/(σ√(2π))) e^{−(x² − 2x(μ + σ²u) + μ²)/(2σ²)} dx
         = ∫ (1/(σ√(2π))) e^{−((x − (μ + σ²u))² − (μ + σ²u)² + μ²)/(2σ²)} dx
         = e^{(2μσ²u + (σ²u)²)/(2σ²)} ∫ (1/(σ√(2π))) e^{−(x − (μ + σ²u))²/(2σ²)} dx.

Taking the term that does not involve x outside the integral, we have

E e^{uX} = e^{μu + σ²u²/2} ∫ (1/(σ√(2π))) e^{−(x − (μ + σ²u))²/(2σ²)} dx.

Recognising that the function under the integral is the probability density of the N(μ + σ²u, σ²) distribution, whose integral equals 1,

E e^{uX} = e^{μu + σ²u²/2} · 1 = e^{μu + σ²u²/2}.  □
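A simulation-based sanity check of this formula (a sketch, not part of the original notes): estimate E e^{uX} by averaging over Normal samples and compare with the closed form e^{μu + σ²u²/2}.

```python
import numpy as np

# Monte Carlo check of E[e^{uX}] = exp(mu*u + sigma^2 * u^2 / 2) for X ~ N(mu, sigma^2).
mu, sigma, u = 0.5, 1.3, 0.7            # illustrative parameter values
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=10**6)

print(np.mean(np.exp(u * x)))                 # Monte Carlo estimate of the exponential moment
print(np.exp(mu * u + sigma**2 * u**2 / 2))   # closed-form value from Theorem 3
```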

2.6 LogNormal distribution

X is Lognormal, LN(μ, σ²), if log X is Normal N(μ, σ²). In other words,

X = e^Y,

where Y is Normal N(μ, σ²). Since e^x > 0 for any x, a lognormal variable is always positive. The Lognormal density is given by the formula: for x > 0,

f(x) = (1/(σx√(2π))) e^{−(ln x − μ)²/(2σ²)}.

Exercise: Derive this formula by using the definition and the normal density.

Example: X ~ LN(1, 2). Find P(X > 1).
X = e^Y where Y ~ N(1, 2). Then

P(X > 1) = P(e^Y > 1) = P(Y > 0) = 0.76,

where the last value comes from the previous example.

Theorem 4 If X has the LN(μ, σ²) distribution then its mean is

EX = e^{μ + σ²/2},

and its standard deviation is SD(X) = e^{μ + σ²/2} (e^{σ²} − 1)^{1/2}.

Proof: The mean is just the mgf of N(μ, σ²) evaluated at u = 1.  □

3 Independence.

The concept of independence of random variables involves their joint distributions. When we model many random variables together we can look at them as a vector X = (X_1, X_2, ..., X_n). It takes values

x = (x_1, x_2, ..., x_n)

according to some probability distribution, called the joint distribution. Its probability density is a function of n variables, f(x), which is non-negative and integrates to 1. Similarly to the one-dimensional case, probabilities are given, by the definition of f(x), by a multiple integral: for a set B in R^n,

Pr(X ∈ B) = ∫ ... ∫_B f(x) dx_1 dx_2 ... dx_n.

Note that this formula is only sometimes used for calculations.


The probability density functions for each Xi are called marginal density functions.

3.1 Joint and marginal densities

Consider the case n = 2.

Theorem 5 If X and Y have a joint density f(x, y) then the marginal densities are given by integrating out the other variable:

f_X(x) = ∫ f(x, y) dy,   and   f_Y(y) = ∫ f(x, y) dx.

Proof:

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du.

Then

F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, v) dv du.

Differentiating with respect to x gives the formula for the marginal density of X.  □

3.2 Multivariate Normal distributions

The Multivariate Normal distribution is a collection of a number of Normal distributions which are correlated with each other.

Definition. The Multivariate Normal distribution is determined by its mean vector and its covariance matrix: X = (X_1, X_2, ..., X_d) is N(μ, Σ), where

μ = (EX_1, EX_2, ..., EX_d),   Σ = (Cov(X_i, X_j))_{i,j=1,...,d},

if its probability density function is given by

f_X(x) = (1/((2π)^{d/2} √(det Σ))) e^{−(x−μ) Σ^{−1} (x−μ)^T / 2}.

Here det(Σ) is the determinant of the square matrix Σ and Σ^{−1} is its inverse, Σ Σ^{−1} = I.

Example. A bivariate normal with

μ = 0 and Σ = ( 1  ρ
                ρ  1 ).

Calculations give

f_X(x, y) = (1/(2π√(1−ρ²))) e^{−(x² − 2ρxy + y²)/(2(1−ρ²))}.

Standard Multivariate Normal: Z is N(0, I), where I is the identity matrix, e.g. for d = 3,

I = ( 1 0 0
      0 1 0
      0 0 1 ).

It is easy to see that Z ~ N(0, I) is a vector of independent standard Normals Z_i: det(I) = 1, I^{−1} = I, and

f_Z(z) = (1/(2π)^{d/2}) e^{−z z^T / 2} = ∏_{i=1}^{d} (1/√(2π)) e^{−z_i²/2} = ∏_{i=1}^{d} f_{Z_i}(z_i).

In a similar way one can show the following.

Exercise. If the random variables (X_1, X_2, ..., X_d) are jointly Normal, then they are independent if and only if they are uncorrelated.

A multivariate Normal is a linear transformation of the standard multivariate Normal, just like in one dimension.

Theorem 6 If X = (X_1, X_2, ..., X_d) is N(μ, Σ) then

X = μ + AZ,

where Z is standard multivariate Normal and A is a matrix square root of Σ, satisfying Σ = AA^T.

The matrix square root is not unique (cf. √4 = ±2).

Proof: This is an exercise in multivariable calculus. The probability density of Z is the product of the marginals, by independence. Then perform a change of variables in the multiple integral to find the probability density of X.  □
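A small sketch (not from the notes; the parameter values are illustrative) of how Theorem 6 is used in practice to sample from N(μ, Σ), using the Cholesky factor of Σ as one choice of matrix square root:

```python
import numpy as np

# Sample a bivariate Normal N(mu, Sigma) as X = mu + A Z, where A is the
# Cholesky factor of Sigma, one possible matrix square root (A @ A.T == Sigma).
mu = np.array([0.0, 0.0])
rho = 0.6
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
A = np.linalg.cholesky(Sigma)

rng = np.random.default_rng(2)
Z = rng.standard_normal((2, 10**5))     # independent standard Normals
X = mu[:, None] + A @ Z                 # each column is one sample of (X1, X2)

print(np.cov(X))                        # sample covariance, close to Sigma
```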

3.3 A linear combination of a multivariate normal

A linear combination of the components of a multivariate vector is aX for a nonrandom vector a. It is a scalar random variable,

aX = a_1 X_1 + a_2 X_2 + ... + a_d X_d.

Theorem 7 If X is multivariate Normal N(μ, Σ) then aX is N(aμ, aΣa^T).

This theorem can be proved by using transforms of distributions, given later. Note that in this theorem the joint distribution is multivariate normal; it is not enough that the marginal distributions (i.e. the distribution of each X_i) are normal. A counterexample: Z is standard normal, and let X_1 = Z and X_2 = −Z. Then both X_1 and X_2 are standard normal, but X_1 + X_2 = 0.

Example. Find the distribution of X_1 + X_2, and specify its variance, where X_1, X_2 are correlated normals:

X = (X_1, X_2) is N(μ, Σ),   Σ = ( σ_1²      ρσ_1σ_2
                                    ρσ_1σ_2   σ_2²   ).

Note that the sum can be written as a scalar product, X_1 + X_2 = aX, where a = (1, 1). Then

aΣa^T = (1, 1) Σ (1, 1)^T = σ_1² + 2ρσ_1σ_2 + σ_2²,

as it should be, since it can be verified directly that

Var(X_1 + X_2) = σ_1² + 2ρσ_1σ_2 + σ_2².

Example. The average of Normals, even if they are correlated, is again Normal, but this is not so for LogNormals (e^{N(μ,σ²)}). If X is N(μ, Σ), find the distribution of

X̄ = (1/n) Σ_{i=1}^{n} X_i.

Find the distribution of

(∏_{i=1}^{n} e^{X_i})^{1/n}   and of   (1/n) Σ_{i=1}^{n} e^{X_i}.

Remark. Let X be multivariate Normal and U = BX, for a nonrandom matrix B. Using Theorem 7, show that U is multivariate Normal with mean Bμ and covariance matrix BΣB^T.

3.4 Independence

Events A_1 and A_2 are independent if the probability that they occur together is given by the product of their probabilities,

P(A_1 ∩ A_2) = P(A_1)P(A_2).

Random variables X and Y are independent if the joint probability distribution is a product of the marginal probabilities; in terms of densities,

f_{X,Y}(x, y) = f_X(x) f_Y(y).

In general it is not enough to know the distribution of each variable X and Y in order to know the distribution of the random vector (X, Y). But if the variables X and Y are independent then their marginal distributions determine their joint distribution (by the product formula).

An important corollary of independence is the following.

Theorem 8 If random variables X and Y are independent then

E(XY) = E(X)E(Y).

Proof:

E(XY) = ∫∫ xy f(x, y) dx dy = ∫∫ xy f_X(x) f_Y(y) dx dy   (by independence)
      = (∫ x f_X(x) dx)(∫ y f_Y(y) dy) = E(X)E(Y).  □

Independence can be formulated as a property of expectations.

Theorem 9 X and Y are independent if and only if for any bounded functions h and g,

E(h(X)g(Y)) = E h(X) E g(Y).

Independence for many variables

Events A_1, A_2, ..., A_n are independent if for any subcollection of them the probability that they occur together is given by the product of their probabilities. Random variables are independent if the joint probability distribution is a product of the marginal probabilities, and the joint density function is a product of the marginal density functions,

f(x) = f_1(x_1) f_2(x_2) ... f_n(x_n).

In general it is not enough to know the distribution of each variable X_i in order to know the distribution of them all, i.e. of the random vector (X_1, X_2, ..., X_n). But if the variables X_1, X_2, ..., X_n are independent then their marginal distributions determine their joint distribution (by the product formula).

3.5 Covariance

Let X and Y be two random variables with finite second moments, E(X²) < ∞ and E(Y²) < ∞. Their covariance is defined as

Cov(X, Y) = E(X − EX)(Y − EY).

Theorem 10

Cov(X, Y) = E(XY) − E(X)E(Y).

Proof:

E(X − EX)(Y − EY) = E(XY − Y EX − X EY + EX EY).

Now use the property of expectation that constants can be taken out, E(aX) = aEX:

= E(XY) − 2 EX EY + EX EY = E(XY) − E(X)E(Y).  □

Correlation is defined as

ρ = Cov(X, Y) / √(Var(X) Var(Y)).

Now Theorem 8 has the following

Corollary. If X and Y are independent then they are uncorrelated.

3.6 Properties of Covariance and Variance

1. Cov(X, Y) = E(XY) − E(X)E(Y).
2. Covariance is bilinear (as when multiplying polynomials):
   Cov(aX + bY, cU + dV) = ac Cov(X, U) + ad Cov(X, V) + bc Cov(Y, U) + bd Cov(Y, V).
3. Var(X) = Cov(X, X).
4. Var(X) = E(X²) − (E(X))² = E((X − E(X))²). It is always nonnegative.
5. Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y).
6. If X and Y are independent or uncorrelated, then Var(X + Y) = Var(X) + Var(Y).

3.7 Covariance function

Definition. The covariance function of a random process X_t is defined by

γ(s, t) = Cov(X_t, X_s) = E[(X_t − EX_t)(X_s − EX_s)] = E(X_t X_s) − EX_t EX_s.

4 Conditional Expectation

4.1 Conditional Distribution and its mean

Recall the expectation, or mean,

E(X) = ∫ x f_X(x) dx.

Similarly, the conditional expectation is the integral with respect to the conditional distribution,

E(X | Y = y) = ∫ x f(x|y) dx.

The conditional distribution is defined as follows. Let X, Y have joint density f(x, y) and marginal densities f_X(x) and f_Y(y). The conditional distribution of X given Y = y is defined by the density

f(x|y) = f(x, y) / f_Y(y),

at any point y where f_Y(y) > 0. It is easy to see that f(x|y) so defined is indeed a probability density, as it is nonnegative and integrates to one. The expectation of this distribution, when it exists, is called the conditional expectation of X given Y = y, and is given by the above formula.

Example. Let X and Y have a standard bivariate normal distribution with parameter ρ. Then:
1. The conditional distribution of X given Y = y is normal N(ρy, 1 − ρ²).
2. E(X | Y = y) = ρy, and E(X | Y) = ρY.

Proof: 1. The joint density is

f(x, y) = (1/(2π√(1−ρ²))) exp{ −[x² − 2ρxy + y²] / (2(1−ρ²)) },

and the marginal density is f_Y(y) = (1/√(2π)) e^{−y²/2}. Hence the conditional density of X given Y = y is f(x, y)/f_Y(y):

f_{X|y}(x) = (1/√(2π(1−ρ²))) exp{ −[x² − 2ρxy + y²]/(2(1−ρ²)) + y²/2 }
           = (1/√(2π(1−ρ²))) exp{ −[(x − ρy)² + (1−ρ²)y²]/(2(1−ρ²)) + y²/2 }
           = (1/√(2π(1−ρ²))) exp{ −(x − ρy)²/(2(1−ρ²)) }.

But this is the density of the N(μ, σ²) distribution with μ = ρy and σ² = 1 − ρ².
2. The mean of N(μ, σ²) is μ; thus from 1., E(X | Y = y) = ρy.  □

Conditional expectation as a random variable

E(X | Y = y) is a function of y. If g denotes this function, that is, g(y) = E(X | Y = y), then by replacing y by Y we obtain a new random variable g(Y), which is called the conditional expectation of X given Y,

E(X | Y) = g(Y).

In the above example E(X | Y) = ρY. Similarly to this example, in the case of a multivariate normal vector the conditional expectation E(X | Y) is a linear function of Y, Theorem 23.

4.2 Properties of Conditional Expectation

1. Conditional expectation is linear in X:
   E(aX_1 + bX_2 | Y) = aE(X_1 | Y) + bE(X_2 | Y).
2. E(E(X | Y)) = E(X). The law of double expectation.
3. If X is a function of Y (also said to be Y-measurable), then E(X | Y) = X.
4. If U is Y-measurable, then it is treated as a constant: E(XU | Y) = U E(X | Y).
5. If X is independent of Y, then E(X | Y) = EX; that is, if the information we know provides no clues about X, then the conditional expectation of X is simply its mean value.

4.3 Expectation as best predictor

Let X denote a random variable. If we predict the outcome of X by a number c, then the difference between the actual and the predicted outcomes is (X − c). This represents the error in our prediction. If we want to predict an outcome so that the error, irrespective of its sign, is smallest, we minimize the mean-squared error E(X − c)².

Theorem 11 The best mean-square predictor of X is its mean, X̂ = E(X).

Proof: The mean-squared error is a function of c, E(X − c)². Minimize it in c:

E(X − c)² = E(X² − 2cX + c²) = E(X²) + c² − 2cE(X).

The number c that minimizes this is found by differentiating and equating to zero:

d/dc E(X − c)² = d/dc (E(X²) + c² − 2cE(X)) = 2(c − E(X)).

Equating to 0 gives c = E(X). The second derivative is 2 > 0, therefore the critical point is a minimum. Thus X̂ = E(X).  □

4.4 Conditional Expectation as Best Predictor

We now look for the best possible predictor of X based on Y, i.e. some function h(Y) of Y. We define the optimal predictor or estimator X̂ as the one that minimizes the mean-squared error, i.e. for any random variable Z which is a function of Y,

E(X − X̂)² ≤ E(X − Z)².

It turns out that the best predictor of X based on Y is the conditional expectation of X given Y, denoted E(X | Y).

Theorem 12 The best predictor (optimal estimator) X̂ based on Y is given by X̂ = E(X | Y); in other words, for any random variable Z that is Y-measurable (a function of Y),

E(X − E(X | Y))² ≤ E(X − Z)².

For the proof we need the following result.

Theorem 13 Any random variable Z which is Y-measurable (a function of Y) is uncorrelated with X − X̂ = X − E(X | Y).

Proof:

Cov(Z, X − X̂) = E(Z(X − X̂)) − E(Z)E(X − X̂).

The second term is zero because, by the law of double expectation,

E(X − X̂) = E(X) − E(E(X | Y)) = E(X) − E(X) = 0.

Thus

Cov(Z, X − X̂) = E(Z(X − X̂)) = E(ZX) − E(ZX̂) = 0,

where the last equality holds by the law of double expectation:

E(ZX) = E(E(ZX | Y)) = E(Z E(X | Y)) = E(ZX̂),

since Z is Y-measurable. Finally Cov(Z, X − X̂) = 0.  □

In particular, we have a

Corollary. X̂ = E(X | Y) and X − X̂ = X − E(X | Y) are uncorrelated.

Proof of the Theorem that the best predictor is X̂ = E(X | Y): Take any Z which is a function of Y. We need to show that

E(X − X̂)² ≤ E(X − Z)².

Indeed,

E(X − Z)² = E(X − X̂ + X̂ − Z)²
          = E(X − X̂)² + E(X̂ − Z)² + 2E[(X − X̂)(X̂ − Z)]
          = E(X − X̂)² + E(X̂ − Z)²   (by the previous result)
          ≥ E(X − X̂)².

Thus X̂ = E(X | Y) is the optimal, best predictor/estimator.  □

4.5 Conditional expectation with many predictors

Let X, Y_1, Y_2, ..., Y_n be random variables. By definition, the optimal predictor X̂ minimizes the mean square error, i.e. for any Z which is a function of the Y's,

E(X − X̂)² ≤ E(X − Z)².

Theorem 14 The best predictor X̂ based on Y_1, Y_2, ..., Y_n is given by

X̂ = E(X | Y_1, Y_2, ..., Y_n).

Conditional expectation given many random variables is defined similarly, as the mean of the conditional distribution. It is denoted by E(X | Y_1, Y_2, ..., Y_n).

Notation: if we denote the information generated by Y_1, Y_2, ..., Y_n by F_n, then

E(X | Y_1, Y_2, ..., Y_n) = E(X | F_n).

Note that often it is hard to find a formula for the conditional expectation. But in the multivariate Normal case it is known, and is established by direct calculations.

Theorem 15 (Normal Correlation) Suppose X and Y jointly form a multivariate normal distribution. Then the vector of conditional expectations is given by

E(X | Y) = E(X) + Cov(X, Y) Cov^{−1}(Y, Y)(Y − E(Y)).

Cov(X, Y) denotes the matrix with elements Cov(X_i, Y_j), and Cov^{−1}(Y, Y) denotes the inverse of the covariance matrix of Y.

Example. Best predictor of X based on Y in the Bivariate Normal case. Direct application of the formula gives

E(X | Y) = EX + (Cov(X, Y)/Var(Y)) (Y − EY).

5 Random Walk and Martingales

5.1 Simple Random Walk

A model of pure chance is provided by an ideal coin being tossed, with equal probabilities for Heads and Tails to come up. Introduce a random variable Y taking values +1 (Heads) and −1 (Tails) with probability 1/2 each. If the coin is tossed n times then a sequence of random variables Y_1, Y_2, ..., Y_n describes this experiment. All Y_i have exactly the same distribution as Y_1; moreover they are all independent.

The Random Walk is the process X_n defined by

X_n = X_0 + Y_1 + Y_2 + ... + Y_n.

X_n gives the fortune of a player in a game of chance after n plays, where a coin is tossed and one wins $1 if Heads come up and loses $1 when Tails come up. The random walk is the central model for stock prices; the standard assumption is that returns on stocks follow a random walk.

A more general Random Walk is

X_n = X_0 + Y_1 + Y_2 + ... + Y_n,

where the Y_i's are i.i.d. (not necessarily ±1). The RW is unbiased if EY_i = 0, and biased otherwise.

Mean and Variance of the Random Walk

For the simple random walk, E(Y_i) = 0 and Var(Y_i) = E(Y_i²) = 1, so the mean and the variance of the random walk are given by

E(X_n) = X_0 + E(Σ_{i=1}^{n} Y_i) = X_0 + Σ_{i=1}^{n} E(Y_i) = X_0,

Var(X_n) = Var(X_0 + Σ_{i=1}^{n} Y_i) = Σ_{i=1}^{n} Var(Y_i) = n Var(Y_1) = n.

Useful tools: the Strong Law of Large Numbers and the Central Limit Theorem. In general, if X_1, X_2, ..., X_n, ... are i.i.d. random variables with finite mean, we have

lim_{n→∞} (1/n) Σ_{i=1}^{n} X_i = E[X_1].

Moreover, if the X_i have finite variance σ², we have

lim_{n→∞} P( (1/(σ√n)) Σ_{i=1}^{n} (X_i − E[X_i]) ≤ x ) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du.
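A short simulation sketch (not part of the notes) of the simple random walk, checking the formulas E(X_n) = X_0 and Var(X_n) = n derived above:

```python
import numpy as np

# Simulate many simple random walks X_n = X_0 + Y_1 + ... + Y_n with Y_i = +/-1
# (fair coin) and check the mean and variance formulas.
n, paths, x0 = 100, 10**5, 0
rng = np.random.default_rng(3)
Y = rng.choice([-1, 1], size=(paths, n))   # coin-toss steps
Xn = x0 + Y.sum(axis=1)                    # value of each walk at time n

print(Xn.mean())   # close to X_0 = 0
print(Xn.var())    # close to n = 100
```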

5.2 Martingales

Definition. A process (X_n), n = 0, 1, 2, ..., is called a martingale if for all n,

E|X_n| < ∞, and the martingale property holds:

E(X_{n+1} | X_1, X_2, ..., X_n) = X_n.

Martingale property of the Random Walk

Since

E|X_n| = E|Σ_{i=1}^{n} Y_i| ≤ Σ_{i=1}^{n} E|Y_i| = n E|Y_1|,

X_n is integrable provided E|Y_1| < ∞. For any time n, given X_n,

E(X_{n+1} | X_1, X_2, ..., X_n) = X_n + E(Y_{n+1} | X_1, X_2, ..., X_n).

Since Y_{n+1} is independent of the past, and X_n is determined by the first n variables, Y_{n+1} is independent of X_n. Therefore E(Y_{n+1} | X_1, X_2, ..., X_n) = E(Y_{n+1}). It now follows that if E(Y_{n+1}) = 0, then

E(X_{n+1} | X_1, X_2, ..., X_n) = X_n + E(Y_{n+1} | X_1, X_2, ..., X_n) = X_n + 0 = X_n.

Thus X_n is a martingale.

5.3 Martingales in Random Walks

Some questions about Random Walks, such as ruin probabilities, can be answered with the help of martingales.

Theorem 16 Let X_n, n = 0, 1, 2, ..., be a Random Walk. Then the following processes are martingales.

1. X_n − μn, where μ = E(Y_1). In particular, if the Random Walk is unbiased (μ = 0), then it is itself a martingale.
2. (X_n − μn)² − σ²n, where σ² = E(Y_1 − μ)² = Var(Y_1).
3. For any u, e^{uX_n − nh(u)}, where h(u) = ln E(e^{uY_1}) (exponential martingales). Using the moment generating function notation m(u) = E(e^{uY_1}), this martingale becomes (m(u))^{−n} e^{uX_n}.

Proof.
1. Since, by the triangle inequality |a + b| ≤ |a| + |b|,

E|X_n − μn| = E|X_0 + Σ_{i=1}^{n} Y_i − μn| ≤ E|X_0| + Σ_{i=1}^{n} E|Y_i| + n|μ| = E|X_0| + n(E|Y_1| + |μ|),

so X_n − μn is integrable provided E|Y_1| < ∞ and E|X_0| < ∞. To establish the martingale property, consider for any n

E(X_{n+1} | X_n) = X_n + E(Y_{n+1} | X_n).

Since Y_{n+1} is independent of the past, and X_n is determined by the first n variables, Y_{n+1} is independent of X_n. Therefore E(Y_{n+1} | X_n) = E(Y_{n+1}) = μ. It now follows that

E(X_{n+1} | X_n) = X_n + μ,

and subtracting (n + 1)μ from both sides of the equation, the martingale property is obtained:

E(X_{n+1} − (n + 1)μ | X_n) = X_n − nμ.

2. This is left as an exercise.

3. Put M_n = e^{uX_n − nh(u)}. Since M_n ≥ 0, E|M_n| = E(M_n), which is given by

E(M_n) = E e^{uX_n − nh(u)} = e^{−nh(u)} E e^{uX_n} = e^{−nh(u)} E e^{u(X_0 + Σ_{i=1}^{n} Y_i)}
       = e^{uX_0} e^{−nh(u)} E ∏_{i=1}^{n} e^{uY_i} = e^{uX_0} e^{−nh(u)} ∏_{i=1}^{n} E(e^{uY_i})   (by independence)
       = e^{uX_0} e^{−nh(u)} ∏_{i=1}^{n} e^{h(u)} = e^{uX_0} < ∞.

The martingale property is shown by using the fact that

X_{n+1} = X_n + Y_{n+1},                                  (1)

with Y_{n+1} independent of X_n and of all previous Y_i's, i ≤ n, i.e. independent of F_n. Using the properties of conditional expectation, we have

E(e^{uX_{n+1}} | F_n) = E(e^{uX_n + uY_{n+1}} | F_n) = e^{uX_n} E(e^{uY_{n+1}} | F_n) = e^{uX_n} E(e^{uY_{n+1}}) = e^{uX_n + h(u)}.

Multiplying both sides of the above equation by e^{−(n+1)h(u)}, the martingale property is obtained: E(M_{n+1} | F_n) = M_n.

5.4 Exponential martingale in Simple Random Walk: (q/p)^{X_n}

In the special case when P(Y_i = 1) = p and P(Y_i = −1) = q = 1 − p, choosing u = ln(q/p) in the previous martingale we have e^{uY_1} = (q/p)^{Y_1} and E(e^{uY_1}) = 1. Thus h(u) = ln E(e^{uY_1}) = 0, and e^{uX_n − nh(u)} = (q/p)^{X_n}. Alternatively, in this case the martingale property of (q/p)^{X_n} is easy to verify directly, and this is left as an exercise.

6 Optional Stopping Theorem and Applications

6.1 Stopping Times

Let X_1, X_2, ..., X_n, ... be a sequence of random variables. A random time τ is called a stopping time if for any n one can decide whether the event {τ ≤ n} (and hence the complementary event {τ > n}) has occurred by observing the first n variables X_1, X_2, ..., X_n.

Another way of expressing the fact that τ is a stopping time is that for any n you can tell whether {τ = n} holds by looking at X_1, X_2, ..., X_n. Equivalently, you can tell whether {τ ≤ n} holds by looking at X_1, X_2, ..., X_n.

Let us first see an example of a random variable that is not a stopping time. Flip a fair coin 10 times. Denote by τ the last time you observe a Head. Is this a stopping time? No. Looking at X_1 and X_2 is not enough to tell whether τ = 2. In fact, if X_1 = 0 (here 0 means Tail), X_2 = 1 and X_i = 0 for all i ∈ {3, 4, ..., 10}, then τ = 2. On the other hand, if X_1 = 0, X_2 = 1 and X_i = 1 for all i ∈ {3, 4, ..., 10}, then τ = 10. Hence the first two observations X_1 and X_2 are not enough to tell whether τ = 2 holds.

The time of ruin is a stopping time:

τ = min{n : X_n = 0},   {τ > n} = {X_1 ≠ 0, X_2 ≠ 0, ..., X_n ≠ 0}.

If we can tell whether τ > n, we can also tell whether {τ ≤ n}. So by observing the capital at times 1, 2, ..., n, we can decide whether ruin has occurred by time n or not; e.g. if X_1 ≠ 0, X_2 ≠ 0, X_3 ≠ 0 then τ > 3.

The time when something happens for the first time is a stopping time, e.g. the first time a Random Walk hits the value 1 (or 100). Say you gamble from 8pm to 11pm, and τ is the first time you win $100. By observing your winnings you can decide whether τ has or has not occurred.

A stopping time is allowed to take the value +∞ with positive probability. For example, if τ is the first time a RW with a positive drift hits 0, then P(τ = ∞) > 0.

One way to see that a random variable τ is finite is to establish that it has finite mean: if E(τ) < ∞ then P(τ < ∞) = 1.

If τ_1 and τ_2 are stopping times then their minimum,

τ = min(τ_1, τ_2) = τ_1 ∧ τ_2,

is also a stopping time. We use this result mainly when one of the stopping times is a constant, τ_2 = N. Clearly, any constant N is a stopping time. Then τ ∧ N is a stopping time which is bounded by N. For example, if τ is the first time one wins $5 in a game of coin tossing, then τ ∧ 10 is the time of winning $5 if it happens within the first 10 tosses, or time 10 if $5 was not won by toss 10.

Note that max(τ_1, τ_2) = τ_1 ∨ τ_2 and τ_1 + τ_2 are also stopping times, but we do not use these properties.

6.2 Optional Stopping Theorem

A martingale has a constant mean at any deterministic time: for example, if (M_n) is a martingale, then one can prove that

E(M_5) = E(M_4) = E(M_3) = E(M_2) = E(M_1) = M_0.

There is nothing special about time 5; the same can be proved for all fixed times. What if we substitute a fixed deterministic time with a random one? It turns out that the mean of the stopped martingale is also unchanged for some random times, such as bounded stopping times, but in general the equality above might fail. This is why we need the following theorem.

Theorem 17 (Optional Stopping Theorem) Let M_n be a martingale.

1. If τ ≤ K < ∞ is a bounded stopping time, then
   E(M_τ) = E(M_0).
2. If the M_n are uniformly bounded, |M_n| ≤ C for all n, then for any stopping time τ (even a non-finite one),
   E(M_τ) = E(M_0).

The proof of this theorem is outside this course.

6.3 Hitting probabilities in a simple Random Walk

Unbiased RW. Suppose that you are playing a game of chance by betting on the outcomes of tosses of a fair coin (p = 0.5). You win $1 if heads come up and lose $1 if tails come up. You start with $20. Find the probability of winning $10 before losing all of your initial capital of $20.

Solution. Denote the required probability by a. Let X_0, X_1, ..., X_n, ... denote the capital at times 0, 1, ..., n, .... Then X_0 = 20 and, for any n, X_{n+1} = X_n + Y_{n+1}, where Y_{n+1} is the outcome of the (n+1)-st toss, Y_{n+1} = ±1 with probabilities 0.5. X_n is an unbiased random walk.

Denote by τ the time when you either win 10 or lose 20. In terms of the process X_n,

τ = min{n : X_n = 30 or X_n = 0}.

Denote by a the probability that you win 10 before losing 20, i.e. X_τ = 30; then 1 − a is the probability that X_τ = 0, i.e. that you lose 20 before winning 10. Thus the distribution of X_τ is: X_τ = 30 with probability a and X_τ = 0 with probability 1 − a.

We have seen that the process X_n is a martingale. Applying the Optional Stopping Theorem (without proving that we can),

E(X_τ) = E(X_0) = X_0 = 20.

On the other hand, calculating the expectation directly, E(X_τ) = 30a + 0·(1 − a). Thus 30a = 20, and a = 2/3. So the probability of winning $10 before losing the initial capital of $20 is 2/3.

The same calculation gives that the probability of the process X_n hitting level b before it hits level c, having started at x, b < x < c, is given by

a = (c − x)/(c − b).

Biased RW. Let a simple random walk move to the right with probability p and to the left with probability q = 1 − p. We want to find the probability that it hits level b before it hits level c, when started at x, b < x < c. Let τ be the stopping time of the random walk hitting b or c.

Stopping the exponential martingale M_n = (q/p)^{X_n}, we have

E(q/p)^{X_τ} = (q/p)^x.

But X_τ = b with probability a and X_τ = c with probability 1 − a. Hence

a(q/p)^b + (1 − a)(q/p)^c = (q/p)^x.

Solving for a,

a = ((q/p)^x − (q/p)^c) / ((q/p)^b − (q/p)^c).
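The following Monte Carlo sketch (not from the notes; the parameter values are illustrative) checks the biased-walk hitting probability formula, with a the probability of hitting the lower level b before the upper level c:

```python
import numpy as np

# Estimate P(hit b before c | start at x) for a biased simple random walk
# and compare with a = ((q/p)^x - (q/p)^c) / ((q/p)^b - (q/p)^c).
p, b, c, x = 0.45, 0, 30, 20
q = 1 - p
rng = np.random.default_rng(4)

trials, hits_b = 10_000, 0
for _ in range(trials):
    pos = x
    while b < pos < c:
        pos += 1 if rng.random() < p else -1
    hits_b += (pos == b)

a_formula = ((q/p)**x - (q/p)**c) / ((q/p)**b - (q/p)**c)
print(hits_b / trials, a_formula)    # the two values should be close
```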

6.4 Expected duration of a game

Unbiased RW. We use the martingale M_n = X_n² − n and stop it at τ. Assuming Optional Stopping is allowed, E(X_τ²) − E(τ) = x², so

E(τ) = a b² + (1 − a) c² − x²,

where a is the hitting probability in the unbiased RW.

Biased RW. We use the martingale M_n = X_n − μn. Here μ = p − q = 2p − 1. Stopping it at τ gives

E(X_τ) − μ E(τ) = x,

so E(τ) = (a b + (1 − a) c − x)/(2p − 1), where a is the hitting probability in the biased RW.

Exercise: Give a proof that Optional Stopping applies to the martingales above.

6.5

Discrete time Risk Model

Time is discrete n = 0, 1, 2, . . . (years).


The insurer charges a premium of ck > 0 in the kth year. Let Xk denote the
aggregate claim amount (sum total of all claims) in the kth year. The insurer has
funds x at the start of year 1.
Un denotes the insurance company surplus at time n. U0 = x is the initial fund.
The premium in year n is c. The payout at time n is Xn (Xn is the aggregate
claim, ).
Then the equation for surplus at the end of year n is
Un = U0 + cn

k=1

29

Xk

Assumptions.
c E(Xn ) > 0. The premiums are greater than the expected payout.
X1 , X2 , . . . are identically distributed and independent.
Exercise

1. Find the expected surplus and the sd of the surplus Un .

2. Use the Law of Large Numbers to give an approximate value for Un .


3. Use the Central Limit Theorem to give the approximate distribution of Un

6.6

Ruin Probability

The probability of ruin is the probability that surplus becomes negative.


More precisely,
the time of ruin T , T = min{n : Un < 0}, where T = if Un 0 for all
n = 1, 2, .
The probability that ruin has occurred by time n is
P (T n).
The probability that ruin occurs is
P (T < ).
This probability is the central question of study in Actuarial mathematics/
Insurance.

7 Applications in Insurance

Insurance is an agreement where, for an upfront payment (called the premium), the company agrees to pay the policyholder a certain amount if a specific loss occurs. The individual transfers this risk to an insurance company in exchange for a fixed premium.

Theorem 18 Assume that {c − X_k, k = 1, 2, ...} are i.i.d. random variables, and that there exists a constant R > 0 such that

E e^{−R(c − X_1)} = 1.

Then for all n,

P(T ≤ n | U_0 = x) ≤ e^{−Rx}.

Proof.
Step 1. Show that M_n = e^{−RU_n} is a martingale.
Step 2. Use the Martingale Stopping Theorem with the stopping time min(T, n) = T ∧ n.
Step 3. Extract information from the resulting equation.

Step 1. Finite expectation:

E|e^{−RU_n}| = E(e^{−RU_n}) = e^{−Rx} ∏_{k=1}^{n} E(e^{−R(c − X_k)}) = e^{−Rx} < ∞.

Proof of the martingale property. Since

U_{n+1} = U_n + c − X_{n+1},

we have

E(M_{n+1} | U_1, ..., U_n) = E(e^{−RU_{n+1}} | U_1, ..., U_n)                 (by definition of M_{n+1})
                           = E(e^{−RU_n − R(c − X_{n+1})} | U_1, ..., U_n)     (by definition of U_{n+1})
                           = e^{−RU_n} E(e^{−R(c − X_{n+1})} | U_1, ..., U_n)  (since U_n is known)
                           = e^{−RU_n} E(e^{−R(c − X_{n+1})})                  (by independence)
                           = e^{−RU_n}                                          (by definition of R).

This, together with the finite expectation, implies that M_n = e^{−RU_n}, n = 0, 1, ..., is a martingale.

Step 2. We have seen that T is a stopping time. T ∧ n is a stopping time bounded by n, min(T, n) ≤ n. We can apply the Martingale Stopping Theorem, E(M_{T∧n}) = E(M_0):

E(e^{−RU_{T∧n}}) = E(e^{−RU_0}) = e^{−Rx}.

Step 3. We now expand T ∧ n by using indicators: T ∧ n = T·I(T ≤ n) + n·I(T > n). Thus

e^{−RU_{T∧n}} = e^{−RU_T} I(T ≤ n) + e^{−RU_n} I(T > n),

and from the above,

e^{−Rx} = E(e^{−RU_T} I(T ≤ n)) + E(e^{−RU_n} I(T > n))
        ≥ E(e^{−RU_T} I(T ≤ n))       (since E(e^{−RU_n} I(T > n)) ≥ 0)
        ≥ E(I(T ≤ n))                  (since U_T < 0 and so e^{−RU_T} > 1)
        = P(T ≤ n),   as required.  □

7.1 The bound for the ruin probability. Constant R.

The bound on the ruin probability is e^{−Rx}. We now turn to finding the constant R. The constant R is found from the equation

E(e^{−R(c − X)}) = 1.

Rewriting,

e^{−Rc} E(e^{RX}) = 1.

Recall that the second factor is the moment generating function of X,

E(e^{RX}) = m_X(R),

so that R solves the equation

m_X(R) = e^{Rc}.

7.2 R in the Normal model

Example. Suppose that the aggregate claims have the N(μ, σ²) distribution. We give a bound on the ruin probability. The mgf of N(μ, σ²) is given by

m_X(R) = E(e^{RX}) = E(e^{N(Rμ, R²σ²)}) = e^{Rμ + R²σ²/2}

(or use the formula for the Normal moment generating function). Thus the equation for R becomes

m_X(R) = e^{Rc},   i.e.   e^{Rμ + R²σ²/2} = e^{Rc}.

Taking logs and solving,

Rμ + R²σ²/2 = Rc,

R = 2(c − μ)/σ².

Remark. The aggregate claims in consecutive years, X_1, X_2, ..., X_n, ..., are assumed to have the same distribution, say that of X_1. Suppose that there are n insured individuals, each with individual claim distribution Y. Then in one year the aggregate claim is

X_1 = Σ_{i=1}^{n} Y_i,

where Y_i is the claim of person i. If the individual claim has mean μ_Y and variance σ_Y², then the CLT states that

(X_1 − nμ_Y)/(σ_Y √n) ≈ N(0, 1).

In other words,

X_1 ≈ N(nμ_Y, nσ_Y²).

Example. Consider a car owner who has an 80% chance of no accidents in a year, a 20% chance of being in a single accident in a year, and no chance of being in more than one accident in a year. For simplicity, assume that there is a 50% probability that after an accident the car will need repairs costing 500, a 40% probability that the repairs will cost 5000, and a 10% probability that the car will need to be replaced, which will cost 15,000. Hence the distribution of the random variable Y, the loss due to an accident, is

f(x) = 0.80 if x = 0;   0.10 if x = 500;   0.08 if x = 5000;   0.02 if x = 15000.

The car owner's expected loss is the mean of this distribution, E(Y) = 750. The standard deviation of the loss is σ_Y = 2442.

Consider an insurance company that will reimburse repair costs resulting from accidents for 100 such car owners. For the company, the loss in one year is the sum of the losses on each car: if the loss on car i is Y_i, then

X_1 = Σ_{i=1}^{100} Y_i,

and similarly in subsequent years. Note that most of the Y_i's are zero; this fact is taken into account in the loss (claim) distribution.

For the company, the expected loss in one year is the sum of the expected losses,

μ = μ_X = E(Σ_{i=1}^{100} Y_i) = 100 μ_car = 75,000.

The variance is

σ² = σ_X² = Var(Σ_{i=1}^{100} Y_i) = 100 σ_car² = 596,336,400.

So the aggregate loss in one year, X, has approximately a Normal distribution with these parameters.

Suppose the premium is set to be 30% higher than the expected claim, c = 1.3μ. Then

R = 2(c − μ)/σ² = 0.6μ/σ² = 45,000/596,336,400 ≈ 7.55 × 10^{−5}.

So, if the company has an initial fund of x = 100,000 = 10^5, then the ruin probability is less than e^{−7.55} ≈ 0.0005. Note that an initial fund of only x = 10,000 = 10^4 is not enough: there the bound on the ruin probability is only e^{−0.755} ≈ 0.47.
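The numbers in this example can be reproduced directly (a small sketch, not part of the notes):

```python
import numpy as np

# Ruin-probability bound exp(-R*x) in the Normal model, using the figures
# from the car-insurance example above.
mu = 75_000.0               # expected aggregate claim in one year
var = 596_336_400.0         # variance of the aggregate claim
c = 1.3 * mu                # premium, 30% above the expected claim

R = 2 * (c - mu) / var      # constant R in the Normal model
print(R)                    # about 7.55e-05
for x in (100_000.0, 10_000.0):
    print(x, np.exp(-R * x))    # bound on the ruin probability for initial fund x
```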

7.3 Simulations

Suppose you want to simulate from a strictly increasing c.d.f. F. Let U be a Uniform(0,1) random variable. Then

Y = F^{−1}(U)

has c.d.f. F. In fact,

P(Y ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x).

Example: simulation of an exponential.

The cumulative distribution function of the exp(1) distribution is F(x) = 1 − e^{−x}. To find the inverse, solve F(x) = y for x:

x = −ln(1 − y),

so F^{−1}(y) = −ln(1 − y). By the above result, if U is Uniform(0,1), then

X = −ln(1 − U) ~ exp(1).

Since 1 − U also comes from the same Uniform(0,1) distribution, exp(1) can be simulated by

X = −ln U.
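A minimal sketch of this inverse-transform recipe (not part of the notes):

```python
import numpy as np

# Inverse-transform sampling: X = -ln(U) with U ~ Uniform(0,1) has the exp(1) distribution.
rng = np.random.default_rng(5)
u = rng.uniform(size=10**6)
x = -np.log(u)

print(x.mean(), x.var())   # both close to 1, the mean and variance of exp(1)
```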

7.4 The Acceptance-Rejection method

Suppose we want to simulate from a distribution F with density f, where F^{−1} is difficult to calculate. The idea is to start with a random variable Y with a density g(x) which is easy to simulate from, and which has the property f(x) ≤ C g(x) for some finite constant C. Given Y = x, one accepts Y and sets X = Y with probability f(x)/(C g(x)); otherwise a new Y is generated, until acceptance is achieved.

Algorithm:
1. Generate Y from the density g(x).
2. Generate U, uniform on (0,1).
3. If U ≤ f(Y)/(C g(Y)), set X = Y and stop (acceptance); otherwise (rejection) go to step 1.

Example: Simulate from f(x) = 20x(1 − x)³, 0 < x < 1 (a Beta distribution). Take g(x) = 1. It is an exercise in Calculus to see that f(x)/g(x) ≤ C = max f(x) = 135/64. Thus:
1. Generate Y and U_2 from U(0, 1).
2. If U_2 ≤ (64/135)·20Y(1 − Y)³, then set X = Y and stop; otherwise sample again.

This method applies to any distribution with bounded density f(x).
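A sketch of the algorithm for this Beta example (not part of the notes):

```python
import numpy as np

# Acceptance-rejection sampling from f(x) = 20 x (1-x)^3 on (0,1),
# with the uniform proposal g(x) = 1 and constant C = 135/64.
C = 135 / 64
rng = np.random.default_rng(6)

def sample(n):
    out = []
    while len(out) < n:
        y = rng.uniform()                    # step 1: proposal Y ~ g
        u = rng.uniform()                    # step 2: U ~ Uniform(0,1)
        if u <= 20 * y * (1 - y)**3 / C:     # step 3: accept with probability f(Y)/(C g(Y))
            out.append(y)
    return np.array(out)

x = sample(10**5)
print(x.mean())    # close to 1/3, the mean of this Beta(2,4) density
```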


8 Brownian Motion

The botanist R. Brown described the motion of a pollen particle suspended in fluid in 1828. It was observed that a particle moves in an irregular, random fashion. In 1900 L. Bachelier used Brownian motion as a model for the movement of stock prices in his mathematical theory of speculation. A. Einstein in 1905 explained Brownian motion as the result of bombardment of the particle by the molecules of the fluid. The mathematical foundation of Brownian motion as a stochastic process was given by N. Wiener in 1931; hence it is also called the Wiener process.

8.1 Definition of Brownian Motion

Defining Properties of Brownian Motion {B_t}. Time t, 0 ≤ t ≤ T.

1. (Normal or Gaussian increments) For all s < t, B_t − B_s has the N(0, t − s) distribution, the Normal distribution with mean 0 and variance t − s.
2. (Independent increments) B_t − B_s is independent of the past, that is, of B_u, 0 ≤ u ≤ s.
3. (Continuity of paths) B_t, t ≥ 0, are continuous functions of t.

The initial point B_0 is a constant, often 0. If B_0 = x then B_t is BM started at x. We explain these properties below.

Defining Property 1 of Brownian Motion

B_t − B_s is N(0, t − s) for s < t. By Theorem 2 with σ = √(t − s), the distribution of B_t − B_s is the same as the distribution of √(t − s) Z, where Z is N(0, 1). Hence

E(B_t − B_s) = 0.

By the linearity of expectation, E(B_t − B_s) = EB_t − EB_s = 0. Thus for all s and t

EB_t = EB_s.

In particular EB_t = EB_0 = B_0, the last equality because the expectation of a constant is that constant. Next, for a random variable X with zero mean, EX = 0, we have

Var(X) = E(X − EX)² = E(X²).

Since B_t − B_s has zero mean, by a property of the N(0, σ²) distribution

E(B_t − B_s)² = Var(B_t − B_s) = t − s,   SD(B_t − B_s) = √(t − s).

If we take s = 0 then we obtain E(B_t − B_0) = 0 and E(B_t − B_0)² = t.
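Brownian motion paths can be simulated by summing independent N(0, dt) increments, which follows directly from properties 1 and 2 (a sketch, not part of the notes; T and n are illustrative):

```python
import numpy as np

# Simulate one Brownian motion path on [0, T] by summing independent N(0, dt) increments.
T, n = 1.0, 1000
dt = T / n
rng = np.random.default_rng(7)

increments = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(increments)))   # B_0 = 0, then partial sums

print(B[-1])   # one sample of B_T, distributed N(0, T)
```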

8.2 Independence of Increments

For any times s and t, s < t, the random variable B_t − B_s is independent of all the variables B_u, u ≤ s.

Theorem 19 Brownian motion has covariance function Cov(B_s, B_t) = min(t, s).

Proof: Take t > s. Then B_t can be written as the sum of B_s and the increment (B_t − B_s),

B_t = B_s + (B_t − B_s).

Hence

E(B_s B_t) = E(B_s²) + E(B_s (B_t − B_s)).

Now Brownian motion has independent increments: (B_t − B_s) and B_s are independent, therefore the expectation of their product is the product of their expectations (Theorem 8), so that

E(B_s (B_t − B_s)) = EB_s · E(B_t − B_s).

Brownian motion has Normal increments: (B_t − B_s) is N(0, t − s), therefore its mean is zero, E(B_t − B_s) = 0. So

E(B_s B_t) = E(B_s²).

Next, writing B_s = B_0 + (B_s − B_0) and using the independence of the terms, we have

E(B_s²) = E(B_0² + (B_s − B_0)² + 2B_0(B_s − B_0)) = E(B_0²) + s = B_0² + s.

Here we used that E(B_s − B_0)² = s, the variance of the N(0, s) distribution, and that B_0 is non-random, E(B_0²) = B_0². Next, for any t,

EB_t = E(B_0) + E(B_t − B_0) = E(B_0) = B_0.

Hence EB_t EB_s = B_0². Finally,

Cov(B_s, B_t) = E(B_t B_s) − EB_t EB_s = B_0² + s − B_0² = s.

If t < s, then similarly (or by exchanging the roles of s and t) Cov(B_s, B_t) = t. Therefore

Cov(B_s, B_t) = min(t, s).  □

9 Brownian Motion is a Gaussian Process

The distributions of B(t) for a fixed time t are called the marginal distributions of Brownian motion. The joint distributions of the vector (B(t_1), B(t_2)) of Brownian motion sampled at two arbitrary times t_1 < t_2 are called bivariate distributions. Similarly, for any n, the joint distributions of the vector (B(t_1), B(t_2), ..., B(t_n)) of Brownian motion sampled at n arbitrary times t_1 < t_2 < ... < t_n are called n-dimensional distributions. The finite dimensional distributions are the joint distributions for n = 1, 2, 3, ....

To describe a random process it is not enough to know the distributions of its values at each time t; one also needs the joint distributions. A stochastic (random) process is called Gaussian if all its finite dimensional distributions are multivariate Normal. In this lecture we prove that Brownian motion is a Gaussian process.

Theorem 20 Brownian Motion is a Gaussian process.

9.1 Proof of Gaussian property of Brownian Motion

Proof of Theorem 20. We need to show that all joint distributions of BM at time points t_1, t_2, ..., t_n, for all n = 1, 2, ..., are multivariate Normal. Take BM started at 0, B_0 = 0.

Start with n = 1. By the property of increments of BM with s = 0, B_t − B_0 has the N(0, t) distribution. Hence B_t has the N(0, t) distribution.

Now take n = 2. Write

(B(t_1), B(t_2)) = (B(t_1), B(t_1) + (B(t_2) − B(t_1))).

Denote X = B(t_1) and Y = B(t_2) − B(t_1). By the property of independence of increments of BM, X and Y are independent, X ~ N(0, t_1), Y ~ N(0, t_2 − t_1). Then

(B(t_1), B(t_2)) = (X, X + Y).

Write X = √t_1 Z_1 and Y = √(t_2 − t_1) Z_2, where Z_1, Z_2 are independent standard Normals. Denote σ_1 = √t_1, σ_2 = √(t_2 − t_1). Then the vector

(X, X + Y) = (σ_1 Z_1, σ_1 Z_1 + σ_2 Z_2) = AZ,

where the matrix

A = ( σ_1   0
      σ_1   σ_2 ).

Therefore (X, X + Y) is bivariate Normal with mean vector (0, 0) and covariance matrix

AA^T = ( σ_1²        σ_1²
         σ_1²   σ_1² + σ_2² ) = ( t_1  t_1
                                   t_1  t_2 ).

Similarly, for n = 3, the joint distribution of the vector (B(t_1), B(t_2), B(t_3)) is trivariate normal with mean (0, 0, 0) and covariance matrix

( t_1  t_1  t_1
  t_1  t_2  t_2
  t_1  t_2  t_3 ).

For a general n one can complete the proof by induction. Alternatively, write directly

(B(t_1), B(t_2), ..., B(t_n)) = (B(t_1), B(t_1) + (B(t_2) − B(t_1)), ..., B(t_{n−1}) + (B(t_n) − B(t_{n−1}))).

Denote Y_1 = B(t_1), and for k > 1, Y_k = B(t_k) − B(t_{k−1}). Then by the property of independence of increments of Brownian motion, the Y_k's are independent. They also have normal distributions, Y_1 ~ N(0, t_1) and Y_k ~ N(0, t_k − t_{k−1}). B(t_2) = Y_1 + Y_2, etc., B(t_k) = Y_1 + Y_2 + ... + Y_k. Z_1 = Y_1/√t_1 and Z_k = Y_k/√(t_k − t_{k−1}) are independent standard normals. Thus

(B(t_1), B(t_2), ..., B(t_n)) = A(Z_1, Z_2, ..., Z_n)^T,

with

A = ( σ_1   0    0    ...  0
      σ_1   σ_2  0    ...  0
      ...   ...  ...       ...
      σ_1   σ_2  σ_3  ...  σ_n ),   σ_1 = √t_1,  σ_k = √(t_k − t_{k−1}).

So (B(t_1), B(t_2), ..., B(t_n)) is a linear transformation of the standard normal vector Z, and therefore it is multivariate normal.  □

Corollary. Brownian motion is a Gaussian process with constant mean function and covariance function min(t, s).

Example. Find the distribution of B(1) + B(2) + B(3) + B(4).

Consider X = (B(1), B(2), B(3), B(4)). Since Brownian motion is a Gaussian process, all its finite dimensional distributions are Normal; in particular X has a multivariate Normal distribution with mean vector zero and covariance matrix given by Σ_{ij} = Cov(X_i, X_j). For example, Cov(X_1, X_3) = Cov(B(1), B(3)) = 1. Here

Σ = ( 1 1 1 1
      1 2 2 2
      1 2 3 3
      1 2 3 4 ).

Now let a = (1, 1, 1, 1). Then

aX = X_1 + X_2 + X_3 + X_4 = B(1) + B(2) + B(3) + B(4).

aX has a Normal distribution with mean zero and variance aΣa^T, which in this case is the sum of the elements of the covariance matrix. Thus B(1) + B(2) + B(3) + B(4) has a Normal distribution with mean zero and variance 30. Alternatively, we can calculate the variance of the sum by the covariance formula,

Var(X_1 + X_2 + X_3 + X_4) = Cov(X_1 + X_2 + X_3 + X_4, X_1 + X_2 + X_3 + X_4) = Σ_{i,j} Cov(X_i, X_j) = 30.
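This value can be confirmed by simulation (a sketch, not part of the notes):

```python
import numpy as np

# Check by simulation that B(1)+B(2)+B(3)+B(4) has mean 0 and variance 30.
rng = np.random.default_rng(8)
paths = 10**5
Y = rng.normal(0.0, 1.0, size=(paths, 4))   # independent N(0,1) increments over unit intervals
B = np.cumsum(Y, axis=1)                    # columns are B(1), B(2), B(3), B(4)
S = B.sum(axis=1)                           # B(1)+B(2)+B(3)+B(4) for each path

print(S.mean(), S.var())   # close to 0 and 30
```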


9.2 Processes obtained from Brownian motion

Two processes used in applications are Arithmetic and Geometric Brownian motion.

Arithmetic Brownian motion: X_t = μt + σB_t, where μ and σ are constants. This is also known as Brownian motion with drift.

Theorem 21 If X_t is the Brownian motion with drift above, then (X_t − μt)/σ is a standard Brownian motion.

It is easy to show that X_t is a Gaussian process. Calculation of its mean and covariance functions is left as an exercise.

Geometric Brownian motion:

S_t = S_0 e^{μt + σB_t}.

What is the distribution of S_t? Compute its mean and variance.

Recall from Section 4.4 that, in particular, we have a

Corollary. X̂ = E(X | Y) and X − X̂ = X − E(X | Y) are uncorrelated.

Proof of the Theorem that the best predictor is X̂ = E(X | Y): Take any Z which is a function of Y. We need to show that

E(X − X̂)² ≤ E(X − Z)².

Indeed,

E(X − Z)² = E(X − X̂ + X̂ − Z)²
          = E(X − X̂)² + E(X̂ − Z)² + 2E[(X − X̂)(X̂ − Z)]
          = E(X − X̂)² + E(X̂ − Z)²   (by the previous result)
          ≥ E(X − X̂)².

Thus X̂ = E(X | Y) is the optimal, best predictor/estimator.  □

9.3 Conditional expectation with many predictors

Let X, Y_1, Y_2, ..., Y_n be random variables.
By definition, the optimal predictor X̂ minimizes the mean square error, i.e. for any Z that is a function of the Y's,
E(X − X̂)² ≤ E(X − Z)².

Theorem 22 The best predictor X̂ based on Y_1, Y_2, ..., Y_n is given by
X̂ = E(X|Y_1, Y_2, ..., Y_n).

Conditional expectation given many random variables is defined similarly, as the mean of the conditional distribution. It is denoted by
E(X|Y_1, Y_2, ..., Y_n).
Notation: if we denote the information generated by Y_1, Y_2, ..., Y_n by F_n, then
E(X|Y_1, Y_2, ..., Y_n) = E(X|F_n).

Note that often it is hard to find a formula for the conditional expectation. But in the multivariate Normal case it is known and is established by direct calculations.
Theorem 23 (Normal Correlation) Suppose X and Y jointly have a multivariate Normal distribution. Then the vector of conditional expectations is given by
E(X|Y) = E(X) + Cov(X, Y) Cov^{−1}(Y, Y)(Y − E(Y)).
Here Cov(X, Y) denotes the matrix with elements Cov(X_i, Y_j), and Cov^{−1}(Y, Y) denotes the inverse of the covariance matrix of Y.

Example Best predictor of X based on Y in the Bivariate Normal case.
Direct application of the formula gives
E(X|Y) = EX + (Cov(X, Y)/Var(Y)) (Y − EY).

Example Best predictor of the future value of Brownian motion based on the present value.

Consider the best predictor of Brownian motion B_{t+s} at the future time t + s if we know the present value B_t.
Since (B_t, B_{t+s}) is Bivariate Normal with Var(B_t) = t, Cov(B_t, B_{t+s}) = min(t, t + s) = t and EB_t = 0, we obtain
E(B_{t+s}|B_t) = B_t.
Further, one can check that even if we know many past values of Brownian motion at times t_1 < t_2 < ... < t_n = t,
E(B_{t+s}|B_{t_1}, B_{t_2}, ..., B_t) = B_t.
This is known as the martingale property of Brownian motion.

9.4 Martingales of Brownian Motion

A process M_t, t ≥ 0, is a martingale if
for all t, E|M_t| < ∞;
for all t and s > 0, E(M_{t+s}|M_u, u ≤ t) = M_t.
Introduce the notation F_t = {M_u, u ≤ t} for the values of the process (prices) before time t, the history up to time t.
Then the martingale property reads: for all t and s > 0,
E(M_{t+s}|F_t) = M_t.
(In fact, F_t is a model for the flow of information, called a σ-field, and their collection is called a filtration, but we don't study these concepts in this course.)

Theorem 24 The following processes are martingales:
1. B_t.
2. B_t² − t.
3. e^{B_t − t/2}.

Proof: The proof is by direct calculations. Compute conditional expectations by using the representation
B_{t+s} = B_t + (B_{t+s} − B_t)
and the fact that the increment (B_{t+s} − B_t) is independent of the past F_t.
For 1. we have
E(B_{t+s}|F_t) = E(B_t + (B_{t+s} − B_t)|F_t) = B_t + E(B_{t+s} − B_t|F_t) = B_t + E(B_{t+s} − B_t) = B_t.
The proof of 2. and 3. is left as an exercise. □
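A numerical sanity check of the theorem (not a substitute for the exercise) is to simulate many Brownian paths and verify that the means of the three processes stay constant in t, as they must for martingales. A Python sketch, assuming NumPy:

import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 100_000, 200, 2.0
dt = T / n_steps
t = np.arange(1, n_steps + 1) * dt
# Brownian paths: cumulative sums of independent N(0, dt) increments, one row per path
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

for name, M in [("B_t", B), ("B_t^2 - t", B**2 - t), ("exp(B_t - t/2)", np.exp(B - t / 2))]:
    means = M.mean(axis=0)
    print(name, "mean at t = dt, T/2, T:", means[0], means[n_steps // 2 - 1], means[-1])

The sample means stay near 0, 0 and 1 respectively for all t.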


10 Stochastic Calculus

10.1 Non-differentiability of Brownian motion

Theorem 25 Brownian motion B_t is not differentiable at any point t (although it is continuous at any point t).

While it is hard to prove that a random function B_t is nowhere differentiable, it is easy to see why, for any given point t_0, the derivative B'_{t_0} does not exist.
Non-differentiability of Brownian motion at a given point: consider the difference quotient
(B_{t_0+Δ} − B_{t_0})/Δ.
This random variable has distribution
N(0, 1/Δ) = (1/√Δ) N(0, 1).
Clearly, as Δ → 0 this random variable converges to plus or minus ∞.

10.2 Itô Integral

Here we give a concise introduction to the definition and properties of the stochastic integral, the Itô integral.
Firstly it is defined for simple processes as a sum, and then a general process is approximated by simple ones.
If X_t is a constant c, then the integral should be
∫_0^T c dB_t = c(B_T − B_0).
The integral over (0, T] should be the sum of the integrals over two sub-intervals (0, a] and (a, T]. Thus if X_t takes two values, c_1 on (0, a] and c_2 on (a, T], then the integral of X with respect to B is easily defined.

Simple processes

A simple deterministic process X_t:
X_t = c_i   if t_i < t ≤ t_{i+1},  i = 0, ..., n − 1.
The Itô integral ∫_0^T X_t dB_t is defined as a sum
∫_0^T X_t dB_t = Σ_{i=0}^{n−1} c_i (B(t_{i+1}) − B(t_i)).

Example Let X_t = 1 for 0 ≤ t ≤ 1, X_t = 1 for 1 < t ≤ 2, and X_t = 2 for 2 < t ≤ 3. Then (note that c_i = X(t_{i+1}), i = 0, ..., n − 1)
c_0 = X(t_1) = X(1) = 1,  c_1 = X(t_2) = X(2) = 1,  c_2 = X(t_3) = X(3) = 2,
∫_0^3 X(s) dB(s) = c_0 (B(1) − B(0)) + c_1 (B(2) − B(1)) + c_2 (B(3) − B(2))
= B(1) + (B(2) − B(1)) + 2(B(3) − B(2))
~ N(0, 1) + N(0, 1) + N(0, 4) = N(0, 6),
as a sum of independent Normal random variables.
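A quick simulation of this sum (a Python sketch assuming NumPy) confirms the N(0, 6) distribution:

import numpy as np

rng = np.random.default_rng(2)
c = np.array([1.0, 1.0, 2.0])                 # values of the simple process on (0,1], (1,2], (2,3]
dB = rng.normal(0.0, 1.0, size=(500_000, 3))  # increments B(1)-B(0), B(2)-B(1), B(3)-B(2), each N(0, 1)
I = dB @ c                                    # the Ito integral of the simple process, one value per path
print("mean:", I.mean(), "variance:", I.var())  # approximately 0 and 6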

10.3 Distribution of the Itô integral of simple deterministic processes

The Itô integral of a deterministic simple process,
∫_0^T X_t dB_t = Σ_{i=0}^{n−1} c_i (B(t_{i+1}) − B(t_i)),
is a Normal random variable with mean zero.
It can be written as c·b with c = (c_0, ..., c_{n−1}) and the random vector
b = (B(t_1) − B(t_0), B(t_2) − B(t_1), ..., B(t_n) − B(t_{n−1})).
Use properties of the multivariate Normal to see that the stochastic integral of a simple deterministic process, which is the sum above, is a Normal random variable with mean zero and variance ∫_0^T X_t² dt.

A way to calculate the variance is by using properties of covariance:
Var(∫ X dB) = Cov(∫ X dB, ∫ X dB)
= Cov( Σ_{i=0}^{n−1} c_i (B(t_{i+1}) − B(t_i)), Σ_{j=0}^{n−1} c_j (B(t_{j+1}) − B(t_j)) )
= Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} Cov( c_i (B(t_{i+1}) − B(t_i)), c_j (B(t_{j+1}) − B(t_j)) )
= Σ_{i=0}^{n−1} Cov( c_i (B(t_{i+1}) − B(t_i)), c_i (B(t_{i+1}) − B(t_i)) )   (the terms with i ≠ j vanish by independence of increments)
= Σ_{i=0}^{n−1} Var( c_i (B(t_{i+1}) − B(t_i)) ) = Σ_{i=0}^{n−1} c_i² (t_{i+1} − t_i) = ∫_0^T X²(t) dt.

10.4 Simple stochastic processes and their Itô integral

If the c_i's are replaced by random variables ξ_i, then in order to carry out calculations, and to have convenient properties of the integral, the random variables ξ_i are allowed to depend on the past values of Brownian motion, but not on future values:
X_t = ξ_i   if t_i < t ≤ t_{i+1},  i = 0, ..., n − 1,
where ξ_i can depend on the values of Brownian motion B_t up to time t_i (ξ_i is F_{t_i}-measurable). The Itô integral is defined as
∫_0^T X_t dB_t = Σ_{i=0}^{n−1} ξ_i (B(t_{i+1}) − B(t_i)).
It is also required that Eξ_i² < ∞.

Properties of the Itô integral
Note that due to the independence property of Brownian increments, and the fact that ξ_i depends only on the past values up to B_{t_i}, ξ_i and (B(t_{i+1}) − B(t_i)) are independent.
This allows us to establish that the Itô integral has zero mean and variance ∫_0^T EX_t² dt.
This is similar to the case of simple deterministic processes, except that for random ξ_i's the distribution of the Itô integral is no longer Normal.

10.5 Itô integral for general processes

Stochastic integrals are defined for adapted processes X_t such that ∫_0^T X_t² dt < ∞. Adapted means that for a given t the value X_t may depend on the past and present values of Brownian motion B(u), u ≤ t, but not on the future values B(u) for u > t.
The integral for general processes is defined by approximation by integrals of simple processes. This mathematical theory is too advanced to cover here.
∫_0^T X_t dB_t = lim_{n→∞} ∫_0^T X_t^{(n)} dB_t = lim_{n→∞} Σ_i X^{(n)}(t_{i−1})(B(t_i) − B(t_{i−1})),
where the X_t^{(n)} are simple adapted processes. The limit is the limit in probability, which is not covered here.

10.6 Properties of the Itô Integral

1. Linearity. If X_t and Y_t are adapted processes and α and β are constants, then
∫_0^T (αX_t + βY_t) dB_t = α ∫_0^T X_t dB_t + β ∫_0^T Y_t dB_t.
2. If ∫_0^T EX_t² dt < ∞, then
Zero mean property: E ∫_0^T X_t dB_t = 0.
Isometry property: E( ∫_0^T X_t dB_t )² = ∫_0^T E(X_t²) dt.
Note that there are cases when the Itô integral does not have a mean.

Example Let J = ∫_0^1 t dB_t. We calculate E(J) and Var(J).
Since ∫_0^1 t² dt < ∞, the Itô integral is defined. Since the integrand t is nonrandom, the integral has the first two moments: E(J) = 0, and Var(J) = E(J²) = ∫_0^1 t² dt = 1/3.
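These moments can be reproduced by approximating J with the left-point sums used in the definition of the Itô integral (a Python sketch assuming NumPy; illustrative only):

import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 200_000, 1_000
dt = 1.0 / n_steps
t_left = np.arange(n_steps) * dt                       # left endpoints of the partition of [0, 1]
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
J = dB @ t_left                                        # sum_i t_i (B(t_{i+1}) - B(t_i))
print("E(J) ~", J.mean(), "  Var(J) ~", J.var())       # approximately 0 and 1/3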


Example Consider ∫_0^T B_t dB_t. Here ∫_0^T B_t² dt < ∞, because B_t is continuous and thus bounded on [0, T], and E ∫_0^T B_t² dt = ∫_0^T t dt = T²/2 < ∞.
Therefore
E( ∫_0^T B_t dB_t ) = 0  and  E( ∫_0^T B_t dB_t )² = ∫_0^T E(B_t²) dt = ∫_0^T t dt = T²/2.

10.7 Rules of Stochastic Calculus

The rules of stochastic calculus are different from the usual ones. This has to do with the properties of the Brownian motion paths B_t.
In the usual calculus only terms which have dt are important, and higher order terms are all taken to be zero:
(dt)² = dt·dt = 0.
In stochastic calculus, in addition to this,
(dB_t)² = dB_t·dB_t = dt,
but
dt·dB_t = dB_t·dt = 0.

For a differentiable function g(t) = g_t,
dg_t = g'_t dt
and
(dg_t)² = (g'_t dt)² = (g'_t)²(dt)² = 0.
But for Brownian motion
(dB_t)² = dB_t·dB_t = dt.

One can recover the stochastic calculus rules from the usual ones by using Taylor's formula (up to the second order terms). Recall
f(x + dx) = f(x) + f'(x)dx + ½ f''(x)(dx)² + ...
The differential of f(x) is the linear part of the increment over [x, x + dx]. Thus
df(x) = f'(x)dx.
So if dx = 0.1, then
Δf(x) ≈ f'(x) · 0.1.
Inclusion of the next term would change the result only by ½ f''(x)(0.1)², in the next decimal place, so it is not included.

10.8 Chain Rule: Itô's formula for f(B_t)

Since (dB_t)² = dt gives a linear term in dt, we need to keep the quadratic term to obtain the stochastic differential:
df(x) = f'(x)dx + ½ f''(x)(dx)².
Using (dB_t)² = dt, and letting x = B_t, we have Itô's formula for Brownian motion:
df(B_t) = f'(B_t)dB_t + ½ f''(B_t)dt.

Example Calculate d(B_t²).
Take f(x) = x². Then f'(x) = 2x and f''(x) = 2. Taylor's formula gives
d(x²) = 2x dx + ½·2·(dx)² = 2x dx + (dx)².
Now we put x = B_t and obtain
d(B_t²) = 2B_t dB_t + (dB_t)² = 2B_t dB_t + dt.
The meaning of this is given by its integral form:
∫_0^t B_s dB_s = ½ ∫_0^t d(B_s²) − ½ ∫_0^t ds = ½ B_t² − ½ t.
Compare the stochastic integral ∫_0^t B_s dB_s to the Riemann integral of a differentiable function g with g_0 = 0, ∫_0^t g_s dg_s. Making the change of variable u = g_s,
∫_0^t g_s dg_s = ∫_0^{g_t} u du = ½ g_t².
We see that in the stochastic integral there is a stochastic correction term −½ t.
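The correction term is visible numerically: approximate ∫_0^T B_s dB_s by the left-point sums from the definition and compare with ½B_T² − ½T on the same path (a Python sketch assuming NumPy):

import numpy as np

rng = np.random.default_rng(4)
n_steps, T = 100_000, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B = np.concatenate(([0.0], np.cumsum(dB)))    # B(0) = 0, B(t_1), ..., B(T)

ito_sum = np.sum(B[:-1] * dB)                 # left-point approximation of the Ito integral
print("Ito sum                :", ito_sum)
print("B_T^2/2 - T/2          :", B[-1]**2 / 2 - T / 2)   # close to the Ito sum
print("B_T^2/2 (no correction):", B[-1]**2 / 2)           # off by about T/2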


10.9 Martingale property of the Itô integral

Let X_t be adapted and such that ∫_0^T EX_t² dt < ∞. Then the process
∫_0^t X_s dB_s,  t ≤ T,
is a martingale. This can be easily proved for the Itô integral of simple processes and then by taking limits for general processes.

Examples.
1. Since ∫_0^T EB_t² dt = ∫_0^T t dt < ∞, it follows that ∫_0^t B_s dB_s is a martingale.
This is also verified by direct evaluation of the integral above:
∫_0^t B_s dB_s = ½ B_t² − ½ t,
which is a martingale, since B_t² − t is a martingale.
2. Since ∫_0^T E(e^{B_t})² dt = ∫_0^T E(e^{2B_t}) dt < ∞, it follows that ∫_0^t e^{B_s} dB_s is a martingale.

Now we give results that help to check the martingale property by using stochastic calculus. Stochastic integrals are martingales under some conditions.
Proposition ∫_0^t X_s dB_s is a martingale provided ∫_0^T EX_t² dt < ∞.
It is now intuitively clear, and can be proven, that
Proposition For a process M_t to be a martingale, it is necessary that its stochastic differential dM_t has no dt term.
This proposition is used together with Itô's formula to obtain equations for the pricing of options, such as the Black-Scholes partial differential equation.
Proposition A stochastic integral with respect to a martingale is again a martingale, provided some integrability conditions hold.

Examples.
1. M_t = B_t. Then dM_t = dB_t. Here X_t = 1 and ∫_0^T X_t² dt = T < ∞. M_t is a martingale.
2. M_t = B_t² − t. dM_t = d(B_t²) − dt = 2B_t dB_t + dt − dt = 2B_t dB_t.
To check the technical condition: here X_t = 2B_t, and E( ∫_0^T X_t² dt ) = E( ∫_0^T 4B_t² dt ) = 4 ∫_0^T E(B_t²) dt = 4 ∫_0^T t dt < ∞. M_t is a martingale.
3. M_t = e^{B_t − t/2}: left as an exercise.
4. Let S_t = S_0 e^{μt + σB_t}. Find a condition on μ and σ so that S_t is a martingale.
First calculate dS_t by using Itô's formula. Then equate the coefficient of dt to zero to obtain μ = −σ²/2.


11 Stochastic Differential Equations

11.1 Ordinary differential equation for growth

Consider the equation describing growth x_t in which the rate of growth is constant and the change is proportional to x_t; for example, the amount of money in a savings account with continuously compounded interest, or bacteria growth.
If b_t is the amount in the account at time t, then db_t is the change in the account over the interval of time [t, t + dt], where dt denotes a small change in time, e.g. 1 day.
Continuous compounding means
db_t = r b_t dt.
This is an ordinary differential equation (ODE).
To solve it, notice that the variables separate:
db_t/b_t = r dt,   ∫_{t=0}^{t=T} db_t/b_t = rT;
changing the variable, x = b_t,
∫_{x=b(0)}^{x=b(T)} dx/x = ln x |_{x=b(0)}^{x=b(T)} = ln b(T) − ln b(0) = rT.
Solving it, we have
b_t = b_0 e^{rt}.

11.2 Black-Scholes stochastic differential equation for stocks

Black and Scholes assumed that the rate of return on a stock satisfies
dS_t/S_t = a dt + b dB_t.
Rewriting this as an equation for the stock price,
dS_t = a S_t dt + b S_t dB_t,
we obtain a stochastic differential equation (SDE), which is understood in the integral form
S_t = S_0 + ∫_0^t a S_u du + ∫_0^t b S_u dB_u,
where the second integral is the new stochastic integral.

Most of the time it is convenient to use μ instead of a and σ instead of b. We chose the less common notation a and b to avoid confusion with the more general class of equations below. With this notation, the solution of the Black-Scholes equation is given by
S_t = S_0 e^{(a − b²/2)t + bB_t},
which is a Geometric Brownian motion, with Lognormal marginal distributions.
Notice that the exponent is different from what we would expect: (a − b²/2)t instead of at. This is due to the new rules of stochastic integration.

11.3 Solving SDEs by Itô's formula. The Black-Scholes equation.

In stochastic differential equations the variables do not separate, but other techniques, such as the change of variables given by Itô's formula, apply.
Let X_t solve
dX_t = μ(X_t)dt + σ(X_t)dB_t.
Itô's formula then reads
df(X_t) = f'(X_t)dX_t + ½ σ²(X_t) f''(X_t)dt.

Example The Black-Scholes equation
dX_t = aX_t dt + bX_t dB_t
can be solved by using Itô's formula with f(x) = ln x:
f'(x) = 1/x and f''(x) = −1/x².
By Itô's formula we have
d(ln X_t) = (1/X_t)dX_t + ½ (−1/X_t²) b²X_t² dt
= (1/X_t)(aX_t dt + bX_t dB_t) − (b²/2)dt,
so
d(ln X_t) = (a − b²/2)dt + b dB_t.
Integrating, we have
ln X_t − ln X_0 = (a − b²/2)t + bB_t,
and finally
X_t = X_0 e^{(a − b²/2)t + bB_t}.
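A path of the solution can be simulated directly from this formula (a Python sketch assuming NumPy; the parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(5)
a, b, X0, T, n_steps = 0.1, 0.3, 100.0, 1.0, 252   # illustrative drift, volatility, initial value
dt = T / n_steps
t = np.arange(1, n_steps + 1) * dt
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n_steps))   # a Brownian path on the grid
X = X0 * np.exp((a - b**2 / 2) * t + b * B)                  # exact solution of the Black-Scholes SDE
print("X at time T:", X[-1])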

11.4 Itô's formula for functions of two variables

Using Taylor's formula for 2 variables, keeping the quadratic terms (dx)², (dy)², (dx)(dy) and using (dB_t)² = dt, we obtain for a function f(x, y) of two diffusion processes X_t, Y_t:
df(X_t, Y_t) = (∂f/∂x)dX_t + (∂f/∂y)dY_t + ½(∂²f/∂x²)(dX_t)² + ½(∂²f/∂y²)(dY_t)² + (∂²f/∂x∂y)(dX_t)(dY_t),
where all derivatives of f are evaluated at the point (X_t, Y_t). Using the rules for (dX_t)², (dY_t)² and (dX_t)(dY_t) we have
df(X_t, Y_t) = (∂f/∂x)(X_t, Y_t)dX_t + (∂f/∂y)(X_t, Y_t)dY_t
+ ½(∂²f/∂x²)(X_t, Y_t) σ_X²(X_t)dt + ½(∂²f/∂y²)(X_t, Y_t) σ_Y²(Y_t)dt
+ (∂²f/∂x∂y)(X_t, Y_t) σ_X(X_t)σ_Y(Y_t)dt.

Itô's formula for functions of the form f(X_t, t)
Let f(x, t) be twice continuously differentiable in x and continuously differentiable in t. Then (by taking Y_t = t) we have
df(X_t, t) = (∂f/∂x)(X_t, t)dX_t + (∂f/∂t)(X_t, t)dt + ½ σ_X²(X_t, t)(∂²f/∂x²)(X_t, t)dt,
since (dY_t)² = (dt)² = 0.

Example d(e^{B_t − t/2}). Use a) f(x, t) = e^{x − t/2}; b) integration by parts.

11.5 Stochastic Product Rule or Integration by Parts

The usual integration by parts formula states that for two differentiable functions u(t) and v(t),
d(uv) = u dv + v du.
Here we show a similar rule when the functions are functions of Brownian motion, X_t and Y_t, and may not be differentiable.
If we take f(x, y) = xy, then we obtain the differential of a product (the product rule), which gives the integration by parts formula:
∂f/∂x = y, ∂f/∂y = x, ∂²f/∂x² = 0, ∂²f/∂y² = 0, ∂²f/∂x∂y = 1,
d(X_t Y_t) = X_t dY_t + Y_t dX_t + dX_t dY_t.
Expanding dX_t dY_t = σ_X(X_t)σ_Y(Y_t)dt,
d(X_t Y_t) = X_t dY_t + Y_t dX_t + σ_X(X_t)σ_Y(Y_t)dt.
Note that if one of the processes is a usual differentiable function, i.e. σ_X = 0, then the integration by parts is the same as in ordinary calculus.
For example, with Y_t = e^{−rt},
d(X_t e^{−rt}) = e^{−rt}dX_t − re^{−rt}X_t dt.

11.6 Ornstein-Uhlenbeck process

Here we define the process by the SDE (the Langevin equation)
dX_t = −αX_t dt + σ dB_t,
where α and σ are some nonnegative constants. We solve it and later show that it gives the Ornstein-Uhlenbeck process, a Gaussian process with the specified mean and covariance functions.
To solve this equation consider the process
Y_t = X_t e^{αt}.
Using the differential of the product rule, we have
dY_t = e^{αt}dX_t + αe^{αt}X_t dt.
Using the SDE for dX_t we obtain
dY_t = σe^{αt}dB_t.
This gives
Y_t = Y_0 + σ ∫_0^t e^{αs}dB_s.
Now the solution for X_t:
X_t = e^{−αt}( X_0 + σ ∫_0^t e^{αs}dB_s ) = e^{−αt}X_0 + σ ∫_0^t e^{−α(t−s)}dB_s.
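The solution can be checked by simulation. By the zero mean and isometry properties of the Itô integral, E(X_t) = e^{−αt}X_0 and Var(X_t) = σ²(1 − e^{−2αt})/(2α); the Euler-Maruyama sketch below (Python, assuming NumPy; illustrative parameters) reproduces these values.

import numpy as np

rng = np.random.default_rng(6)
alpha, sigma, X0 = 2.0, 0.5, 1.0
T, n_steps, n_paths = 1.0, 500, 100_000
dt = T / n_steps

X = np.full(n_paths, X0)
for _ in range(n_steps):     # Euler-Maruyama step: X <- X - alpha*X*dt + sigma*dB
    X = X - alpha * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n_paths)

print("sample mean:", X.mean(), " theory:", X0 * np.exp(-alpha * T))
print("sample var :", X.var(),  " theory:", sigma**2 * (1 - np.exp(-2 * alpha * T)) / (2 * alpha))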

11.7 Vasicek's model for interest rates

The spot rate in Vasicek's model satisfies
dr_t = b(a − r_t)dt + σ dB_t.
This equation conveys the mean reversion effect: if the rate is above a then a − r_t is negative, which pushes the rate down; if the rate is below a then a − r_t is positive, which pushes the rate up.

Writing the equation in the integral form, taking expectations, and using that E(B_t) = 0,
E(r_t) − E(r_0) = ∫_0^t b(a − E(r_s))ds + σE(B_t) = ∫_0^t b(a − E(r_s))ds.
Let h_t = E(r_t); then we have an integral equation for it:
h_t − h_0 = ∫_0^t b(a − h_s)ds.
Taking derivatives,
dh_t = b(a − h_t)dt.
This equation is solved by separating variables. Integrating from 0 to t, and performing the change of variable u = h_s, we have ln[(a − h_0)/(a − h_t)] = bt, and finally
h_t = a − e^{−bt}(a − h_0).
Note that in the long run the rate approaches the value a: lim_{t→∞} E(r_t) = lim_{t→∞} h_t = a.


11.8 Solution to Vasicek's SDE

X_t = r_t − h_t satisfies
dX_t = dr_t − dh_t = b(a − r_t)dt + σ dB_t − b(a − h_t)dt = −b(r_t − h_t)dt + σ dB_t,
or dX_t = −bX_t dt + σ dB_t.
But this is the Ornstein-Uhlenbeck equation; its solution was found earlier:
X_t = e^{−bt}X_0 + σ ∫_0^t e^{−b(t−s)}dB_s.
Hence, since X_0 = r_0 − h_0 = 0 (for a known initial rate, h_0 = E(r_0) = r_0),
r_t = X_t + h_t = h_t + σ ∫_0^t e^{−b(t−s)}dB_s,
so that
r_t = a − e^{−bt}(a − r_0) + σ ∫_0^t e^{−b(t−s)}dB_s.
It is seen that r_t is normally distributed; indeed, it is a Gaussian process.

11.9 Stochastic calculus for processes driven by two or more Brownian motions

Often one needs to model a number of correlated assets. To do this we use a number of independent Brownian motions B_t^i, i = 1, 2, ..., n.
Similar rules apply:
(dt)² = 0,  dt·dB_t^i = 0,  (dB_t^i)² = dt,
but for independent Brownian motions
dB_t^i dB_t^j = 0,  i ≠ j.
Correlated Brownian motions are given by
B_1 and W = ρB_1 + √(1 − ρ²) B_2.
Indeed,
dB_1 dW = dB_1 (ρ dB_1 + √(1 − ρ²) dB_2) = ρ(dB_1)² + √(1 − ρ²) dB_1 dB_2 = ρ dt.
Check that W is also a Brownian motion. For each t, W(t) = ρB_1(t) + √(1 − ρ²) B_2(t), with
ρB_1(t) ~ N(0, ρ²t)  and  √(1 − ρ²) B_2(t) ~ N(0, (1 − ρ²)t).
The sum of two independent normals is also normal, with the sum of the means and variances:
W(t) ~ N(0, ρ²t + (1 − ρ²)t) = N(0, t).
Thus B_1 and W are both Brownian motions and they are correlated with correlation ρ.
Multivariate case: let B = (B_1, B_2, ..., B_n)^T be independent Brownian motions and A an n × n matrix, W = AB. One can write a condition on A so that W is also a Brownian motion (each coordinate is a BM): Σ_j a_{i,j}² = 1 for each i.

11.10 Summary of stochastic calculus

The Itô integral ∫_0^t X_s dB_s is defined for adapted processes X_t with ∫_0^t X_s² ds < ∞.
If ∫_0^T E(X_s²)ds < ∞, then ∫_0^t X_s dB_s, t ≤ T, is a martingale, with
E( ∫_0^t X_s dB_s ) = 0,  E( ∫_0^t X_s dB_s )² = ∫_0^t EX_s² ds.
If X_t is deterministic, then ∫_0^t X_s dB_s is a Normal random variable.
Conventions:
(dB_t)² = dt,  dB_t dt = 0,  (dt)² = 0.
Chain rule (Itô's formula):
df(X_t) = f'(X_t)dX_t + ½ f''(X_t)(dX_t)².
Product rule (integration by parts):
d(X_t Y_t) = X_t dY_t + Y_t dX_t + dX_t dY_t.
Chain rule (Itô's formula) for functions of two variables:
df(X, Y) = (∂f/∂x)(X, Y)dX + (∂f/∂y)(X, Y)dY + ½(∂²f/∂x²)(X, Y)(dX)² + ½(∂²f/∂y²)(X, Y)(dY)² + (∂²f/∂x∂y)(X, Y)(dX)(dY).


12 Options

12.1 Financial Concepts

Markets
In Finance a market is where people sell and buy financial papers (agreements). For example, the stock market, bond market, currencies market (FX), options markets, etc.
http://www.asx.com.au/products/all-products.htm
Look up BHP: price and history chart.

Shares
To raise capital a company issues shares to shareholders. By buying a share a shareholder owns a part of that company. Prices of shares are determined by the market (ASX) and fluctuate in time.
Example: a paper that represents 1 SHARE of BHP.
On February 23, 2011, the price of 1 BHP share was $46.58. On March 27, 2008, it was $24.80.
Notation: the price of a share at time t is denoted by S_t.

Options
OPTION on BHP: this paper gives its holder the right to buy 1 share of BHP for K at time T (or before).
Example: T = 23/06/2011, K = 46.00. The price on 23/2/2011 is $2.695. An option contract is on 1000 shares and costs $2695.
More formally, an option is a contract between two parties, the buyer and the seller, which either
1. gives its holder the right (not the obligation) to buy a certain amount of shares of stock at the agreed price at any time on (or before) a given date (call option); or
2. gives its holder the right (not the obligation) to sell a certain amount of shares of stock at the agreed price at any time on (or before) a given date (put option).
We denote by T the given date and by K the agreed price. The contract is set up at time t prior to T.

Types
We have the following types of options.
American options: the option can be exercised at any time before the expiration date T.
European options: the option can be exercised only exactly on the expiration date T.
Other financial derivatives: futures.
Forward (futures) contract: the parties initially agree to buy and sell an asset for a price agreed upon today (the forward price), with delivery and payment occurring at a future point, the delivery date. Example: today (4/02/2016) we agree on a price for 1 BHP share; the delivery and the payment occur at time T = 7/02/2016.

Bond
A bond is an instrument of indebtedness of the bond issuer to the holders. The issuer owes the holders a debt and, depending on the terms of the bond, is obliged to pay them interest and/or to repay the principal at a later date, which we call the maturity date.

Savings account
When the principal is invested in a continuously compounding account, then at time t the amount satisfies
db_t = rb_t dt,  so that  b_t = b_0 e^{rt}.
For example, if $1000 is invested at a continuously compounding rate of 3%, then after 10 years the amount will be 1000·e^{10×0.03} ≈ 1349.9.
The above equation is the equation for exponential growth (with rate r).

Value of option at maturity T, or Payoff
A European call with exercise price (strike) K is worth max(0, S_T − K), where S_T is the price at maturity T.
This is because: if S_T < K the option is worthless (it gives the holder the right to buy stock for K from the writer, but he/she can buy it from the market for S_T < K); if S_T > K then the option is worth S_T − K, because the holder can buy the share for K instead of the price S_T. Thus
C_T = max(S_T − K, 0).

12.2 Functions x⁺ and x⁻

Denote x⁺ = max(x, 0) and x⁻ = max(−x, 0).
For example 5⁺ = 5, 5⁻ = 0, (−2)⁺ = 0, (−2)⁻ = 2.
Note that both x⁺ and x⁻ are nonnegative.
Any number x can be written as
x = x⁺ − x⁻,
and
|x| = x⁺ + x⁻.
Letting x vary we have the functions x⁺ and x⁻.
Exercise: Draw the graphs of the functions x⁺, x⁻, x, |x|.
Why is the plus function useful? We can rewrite C_T above as
C_T = (S_T − K)⁺.

Payoff graph
Let S_T = x; then we have the payoff function of the option. Here we take K = 10 and x for S_T:
Payoff(x) = 0 if x ≤ 10, and x − 10 if x > 10, i.e. (x − 10)⁺.
A European put with strike K pays max(0, K − S_T) (an option to sell):
Payoff(x) = 0 if x ≥ 10, and 10 − x if x < 10, i.e. (10 − x)⁺.

12.3 The problem of the Option price

The value of the option at expiration T is given by the contract, e.g. (S_T − K)⁺. What is its value at time t < T?
Since S_T is random, it seems that the price at time zero should be
E(S_T − K)⁺.
But this is not so, or at least we have to choose the distribution carefully! In fact it could give rise to possibilities of arbitrage. Arbitrage will be defined rigorously below; roughly, it is the possibility for an agent to make money with no risk. We shall see this first in the simplest model to price this option: the one-period model.

12.4 One-step Binomial Model

Assume the simplest model for stock movement: trading in only one period, T = 1. Assume the following model: the current price is 10, and after one period the price S_1 is either 12 (up) or 8 (down).
Suppose the interest rate over the period is 10%.
Consider pricing of a call option with exercise price K = 10. Suppose the call is priced at $1 per share. We claim that this price allows one to make a profit out of nothing without taking any risk (arbitrage).
Consider the strategy: buy call options on 200 shares and sell 100 shares of stock. At this stage it is not clear why we chose such a strategy. Look at what happens under all the possibilities of the model.

                           now    S_1 = 12   S_1 = 8
Buy 200 options           -200        400         0
Sell (short) 100 shares   1000      -1200      -800
Invest                    -800        880       880
Profit                       0        +80       +80

In either case a profit of $80 is realized. In a case like this it is said that there exists an arbitrage, i.e. a strategy of making money with no risk involved, also known as a free lunch.
Thus the price of $1 for the option allows for arbitrage; $1 is too little. Arbitrage strategies are not allowed by the theory.
Suppose the call is priced at $2. Then the opposite strategy gives an arbitrage: sell calls on 200 shares and buy 100 shares.

                           now    S_1 = 12   S_1 = 8
Sell 200 options           400       -400         0
Buy 100 shares           -1000       1200       800
Borrow                     600       -660      -660
Profit                       0       +140      +140

In this case the reverse strategy gives an arbitrage opportunity.
The price that does not allow for arbitrage strategies is $1.36. How to compute it? We show this in the next section.

12.5 One-period Binomial Pricing Model

One period, T = 1.
Assume that in one period the stock price moves up by a factor u or down by a factor d, d < 1 < u. In the previous example, d = 8/10 and u = 12/10.
So the model for the random future price of the stock at time 1 is: S_1 equals uS (up) or dS (down), where S is the current price. These values are realized with some probabilities, but it turns out that they are not important for the purpose of option pricing.
Savings account: in the one-period model (and other discrete time models) the interest rate r > 1 (e.g. 10% corresponds to r = 1.1). It corresponds to e^r in continuous time models.
The value of the option at time 1 is denoted by C_1, which is given by the following formula:
C_1 = Cu = (uS − K)⁺ if the price goes up,
C_1 = Cd = (dS − K)⁺ if the price goes down.
Note that the values of the claim Cu and Cd can be computed.

12.6 Replicating Portfolio

A portfolio that replicates the payoff of C consists of a shares of stock and b dollars in the savings account.
After one period the value of this portfolio is
aS_1 + br = auS + br if S_1 = uS,
aS_1 + br = adS + br if S_1 = dS.
Since this portfolio is equivalent to the claim C, we obtain two equations with two unknowns:
auS + br = Cu,
adS + br = Cd.
Solving them gives
a = (Cu − Cd)/((u − d)S),   b = (uCd − dCu)/((u − d)r).
Thus to avoid arbitrage C must equal the following:
C = aS + b.

This is because if C is larger than the value of the portfolio, then we can use the following strategy to make money: sell the option and buy the portfolio. In one period they will be the same. If the option is priced below this value, then buy it and sell the portfolio. Either way this gives an arbitrage strategy.
Example (continued): S = 10, u = 1.2, d = 0.8.
Call with K = 10, C_1 = (S_1 − K)⁺. Let us find the price C of this option.
Cu = 2 (if S_1 = 12), Cd = 0 (if S_1 = 8), C = ?
Take r = 1.1. Then by solving the equations for a and b we find
a = 0.5,  b = −3.64.
Thus this option is replicated by the portfolio consisting of borrowing 3.64 dollars and buying 0.5 shares of stock.
The initial value of this portfolio is
0.5 × 10 − 3.64 = 1.36,
which gives the no-arbitrage value for the call option.
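The replication computation is short enough to code directly (an illustrative Python sketch):

def one_period_call(S=10.0, K=10.0, u=1.2, d=0.8, r=1.1):
    # Replicating-portfolio price of a one-period call: C = a*S + b
    Cu, Cd = max(u * S - K, 0.0), max(d * S - K, 0.0)
    a = (Cu - Cd) / ((u - d) * S)           # shares of stock held
    b = (u * Cd - d * Cu) / ((u - d) * r)   # dollars in the savings account (negative means borrowing)
    return a, b, a * S + b

a, b, C = one_period_call()
print(a, round(b, 4), round(C, 4))          # 0.5, -3.6364, 1.3636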

12.7 Option Price as expected payoff

Some algebraic manipulations allow us to represent the price of the option as an expected value of its final payoff, but using another, artificial probability p.
The formula for the price of the claim C can be written as
C = aS + b = (1/r)[pCu + (1 − p)Cd]
with
p = (r − d)/(u − d).
This can be viewed as the discounted expected payoff of the claim, with probability p of an up movement and (1 − p) of a down movement.
This probability p is calculated from the given returns of the stock and has nothing to do with a subjective personal assessment of the market going up or down.
This recovers the main principle of pricing options by no arbitrage, which applies in all other models: the price of an option is the expected discounted payoff, but under a new probability.
C_1 = Cu with probability p, and C_1 = Cd with probability 1 − p, so
C = (1/r)E(C_1).
For the call option C_1 = (S_1 − K)⁺,
C = (1/r)E(S_1 − K)⁺ = (1/r)[(uS − K)⁺ p + (dS − K)⁺ (1 − p)] = (1/r) p Cu,  since here Cd = 0.
In our example p = (1.1 − 0.8)/(1.2 − 0.8) = 0.75, so C = (1/1.1) × 2 × 0.75 = 1.36.

Remark When is p a probability? When does it make sense? We need d ≤ r ≤ u; otherwise we have arbitrage. Why?
Remark We can represent the above measure as a martingale measure. This is explained in the next section. This is the core of mathematical finance, and is the main ingredient of the first and second fundamental theorems of finance. Notice that we were lucky to be able to identify the martingale measure explicitly; often we have to content ourselves with knowing that it exists.

12.8 Martingale property of the stock under p

Consider two random variables X and Y. We say that {X, Y} is a two-step martingale if E(Y|X) = X and both E|X| and E|Y| are finite. Next, we show that the price in the one-period binomial model is connected with a particular two-step martingale.

Theorem 26 The discounted stock price S_t/r^t, t = 0, 1, is a martingale under the new probability p = (r − d)/(u − d).

Proof: Since there are only two values, S_0 and S_1, all we need to check is that
E(S_1/r | S_0) = S_0.
But under the new probability
P(S_1 = uS_0) = p,  P(S_1 = dS_0) = 1 − p,
hence
E(S_1 | S_0) = uS_0 p + dS_0 (1 − p) = uS_0 (r − d)/(u − d) + dS_0 (u − r)/(u − d) = S_0 r,
so E(S_1/r | S_0) = S_0 and the result follows. □

Notice how the process of pricing translated into finding a martingale measure. This will be seen in a general abstract setting in the next chapter. In particular, here we were able to write explicitly the probability distribution which achieves the martingale property. This can be computed explicitly only in a few cases in this course. In the next chapter we give sufficient conditions for the existence and the uniqueness of such a probability measure.

12.9 Binomial Model for Option pricing

The one-period formula can be applied recursively to price the claim C when trading is done one period after another.
Take the 2-period model, T = 2. If all the parameters (r, u, d) are the same for both periods, then the option values form a tree: C branches into Cu and Cd, Cu branches into Cuu and Cud, and Cd branches into Cdu and Cdd, where
Cu = (1/r)[pCuu + (1 − p)Cud],   Cd = (1/r)[pCdu + (1 − p)Cdd],
and using the formula again
C = (1/r)[pCu + (1 − p)Cd] = (1/r²)[p²Cuu + p(1 − p)Cud + (1 − p)pCdu + (1 − p)²Cdd].
C, again, is the discounted expected payoff of the security, where the probability of the market going up is p:
C = (1/r²)E(C_2).

Multiperiod model
Continuing by induction, if the value depends only on the number of up and down moves (Cudu...du = Cu...ud...d), then
C = (1/r^T) Σ_{i=0}^{T} (T choose i) p^i (1 − p)^{T−i} C_{u...ud...d},   with i up moves and T − i down moves.
In particular, for a call option,
C = (1/r^T) Σ_{i=0}^{T} (T choose i) p^i (1 − p)^{T−i} (u^i d^{T−i} S − K)⁺,
the price now of a call which is to be exercised T periods from now.
C, again, is the discounted expected payoff E(C_T) of the option: there are T + 1 outcomes, and the probability of outcome i is the binomial probability of the market going up i times and down T − i times, with probability p of going up.
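The multiperiod formula translates directly into code (a Python sketch using only the standard library):

from math import comb

def binomial_call(S=10.0, K=10.0, u=1.2, d=0.8, r=1.1, T=2):
    # Price of a European call in the T-period binomial model
    p = (r - d) / (u - d)                                    # risk-neutral up probability
    payoff = lambda i: max(u**i * d**(T - i) * S - K, 0.0)   # payoff after i up moves
    return sum(comb(T, i) * p**i * (1 - p)**(T - i) * payoff(i) for i in range(T + 1)) / r**T

print(binomial_call(T=1))   # 1.3636..., the one-period price found earlier
print(binomial_call(T=2))   # the two-period price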

12.10 Black-Scholes formula

The price of a European call option at time t is given by
C_t = S_t Φ(h_t) − K e^{−r(T−t)} Φ(h_t − σ√(T − t)),
where
Φ(h) is the standard normal distribution function (also denoted by N(h) in finance), Φ(h) = (1/√(2π)) ∫_{−∞}^{h} e^{−x²/2} dx;
S_t is the stock price at time t;
r is the continuously compounding interest rate;
σ is the volatility, the standard deviation of the return on the stock;
T is the exercise (maturity) time of the call, and T − t is the time remaining to expiration;
K is the exercise (strike) price;
h_t = [ln(S_t/K) + (r + σ²/2)(T − t)] / (σ√(T − t)).

Remarks.
Φ(h_t) gives the number of shares held in the replicating portfolio, the Δ (delta) of the portfolio. K e^{−r(T−t)} Φ(h_t − σ√(T − t)) gives the amount borrowed in the replicating portfolio.

Example On July 28, 2000 the following information is found: BHP last sale S = 18.50. For the August call option with strike K = 18.50, working out the Black-Scholes value:
Time to expiration = 1 month = 1/12 ≈ 0.083;
interest rate r = 0.062, from the Bank Bill rate;
take volatility σ = 0.25. Hence the Black-Scholes call price is C = 0.5789.
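The number in the example can be reproduced with a few lines of code (a Python sketch using only the standard library; Φ is evaluated via the error function):

from math import log, sqrt, exp, erf

def Phi(x):                                  # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    # Black-Scholes price of a European call, tau = T - t is the time to expiration
    h = (log(S / K) + (r + sigma**2 / 2) * tau) / (sigma * sqrt(tau))
    return S * Phi(h) - K * exp(-r * tau) * Phi(h - sigma * sqrt(tau))

print(bs_call(S=18.50, K=18.50, r=0.062, sigma=0.25, tau=1/12))   # approximately 0.58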


13 Options pricing in the Black-Scholes Model

The market model involves two or more assets. One is riskless (the savings account) with value β_t at time t. The others are risky assets (stocks). We consider only one risky asset, S_t.
The model for β_t is β_t = e^{rt}, dβ_t = rβ_t dt.
The model for stock prices is given by
dS_t = μS_t dt + σS_t dB_t.
Here μ is the annual yield on the stock, the mean of returns, and σ is the volatility, the standard deviation of returns.
We have seen that the solution to this SDE is given by
S_t = S_0 e^{(μ − σ²/2)t + σB_t}.
Note that the marginal distributions of S_t are Lognormal.

13.1 Self-financing Portfolios

A portfolio is a combination of a certain number of shares and money in the savings account. The number of shares held at time t is a_t, and b_t is the number of units of the savings account.
The price of 1 share at time t is S_t. The value of the shares in the portfolio is a_t S_t. The savings account value in the portfolio is b_t β_t. Thus the value of the portfolio is given by
V_t = a_t S_t + b_t β_t.
Definition (a_t, b_t) is a self-financing portfolio if no funds are added or withdrawn after its initial value V_0; the change in the portfolio is only through re-distribution of funds.
Therefore a self-financing portfolio is defined by the self-financing condition
dV_t = a_t dS_t + b_t dβ_t.
This implies that the value of a self-financing portfolio at any time equals its initial value plus the gain from trade:
V_t = V_0 + ∫_0^t a_u dS_u + ∫_0^t b_u dβ_u.


13.2 Replication of an Option by a self-financing portfolio

Theorem 27 Suppose we can find a self-financing portfolio (a_t, b_t) that replicates an option that pays X at time T,
V_T = X.
Then the price of this option at time t < T must be given by the value of this portfolio at time t,
C_t = V_t.

Since the value of the portfolio is known at time t, the above equation gives the value of the option, and solves the option pricing problem.

Proof: If C_t < V_t, then sell the portfolio and buy the option. The difference is V_t − C_t > 0. At time T the values of the option and the portfolio are the same (the condition of the Theorem). It costs nothing to run the portfolio, as it is self-financing. Thus we have an arbitrage profit of V_t − C_t plus the interest on it. Since arbitrage is not allowed, we must rule out C_t < V_t.
If C_t > V_t, then the opposite strategy of selling the option and buying the portfolio results in an arbitrage profit. Thus we cannot have C_t > V_t. The only possibility left is C_t = V_t. □
In finance a self-financing replicating portfolio is called a hedge.

13.3 Replication in the Black-Scholes model

The Black, Scholes and Merton approach: let (a_t, b_t) be a self-financing replicating portfolio.
By the above result
C_t = V_t.
Hence
dC_t = dV_t = a_t dS_t + b_t d(e^{rt}),
because the portfolio (a_t, b_t) is self-financing.
Now by using Itô's formula
dC_t = dC(S_t, t) = (∂C/∂x)dS_t + (∂C/∂t)dt + ½(∂²C/∂x²)(dS_t)².
Comparing the two equations (separating the terms with dS_t and dt) we obtain
a_t = (∂C/∂x)(S_t, t)
and
b_t d(e^{rt}) = ( (∂C/∂t) + ½(∂²C/∂x²) σ²S_t² )dt,
where all derivatives are taken at (S_t, t).

13.4 Black-Scholes Partial Differential Equation

We derive the PDE for the price of the option, and then give its solution (we don't solve it).
Putting these back into the equation C_t = V_t,
C_t = a_t S_t + b_t e^{rt},
and replacing S_t by x, we obtain the Black-Scholes PDE:
½ σ²x² (∂²C/∂x²) + rx (∂C/∂x) + (∂C/∂t) − rC = 0.
Boundary conditions for a call option with exercise price K:
C(x, T) = (x − K)⁺,  C(0, t) = 0.

The solution to the Black-Scholes PDE (derived in 1973 by Black and Scholes) is the Black-Scholes formula
C(x, t) = x Φ(h_t) − K e^{−r(T−t)} Φ(h_t − σ√(T − t)),
h_t = [ln(x/K) + (r + ½σ²)(T − t)] / (σ√(T − t)).
Proof: by direct verification.

Corollary The replicating self-financing portfolio for a call option in the Black-Scholes model is given by
a_t = Φ(h_t),   b_t = −K e^{−rT} Φ(h_t − σ√(T − t)).

Remark For other options in the Black-Scholes model the same PDE holds, but the boundary conditions are different. For an option with payoff g(x) the boundary conditions are:
C(x, T) = g(x),  C(0, t) = e^{−r(T−t)} g(0).


13.5 Option Price as discounted expected payoff

It can be seen, by using calculations with the Lognormal random variable, that the Black-Scholes formula can be written as the discounted expected final payoff of the option,
C = e^{−rT} E_Q (S_T − K)⁺,
but for a different probability Q. This probability makes the discounted stock price S_t e^{−rt} into a martingale. Q is called an equivalent martingale probability measure (EMM), also known as the risk-neutral probability.

13.6 Stock price S_T under the EMM Q

Options are priced not under the real probability measure but under the risk-neutral one, the EMM Q. For calculations of options prices, including simulations, the equations for the stock under Q must be used, not the original model.

Theorem 28 There is a Brownian motion B̃_t (under Q) such that X_t = S_t e^{−rt} = X_0 e^{σB̃_t − σ²t/2} is a martingale. Further, the SDE for S_t with the new Brownian motion is
dS_t = rS_t dt + σS_t dB̃_t,
with solution for S_T
S_T = S_0 e^{(r − σ²/2)T + σB̃_T}.

The derivation relies on Girsanov's theorem below.
Proof: Write the stochastic differential of S_t e^{−rt} (product rule):
d(S_t e^{−rt}) = e^{−rt}dS_t − re^{−rt}S_t dt = e^{−rt}(μS_t dt + σS_t dB_t − rS_t dt)
= σS_t e^{−rt}( ((μ − r)/σ) dt + dB_t ).
By Girsanov's theorem with c = (μ − r)/σ, there is a Q so that ct + B_t = B̃_t is a Brownian motion.
Hence under Q the SDE for the discounted stock price X_t = S_t e^{−rt} is
dX_t = σX_t dB̃_t.
Solving this, we have
X_t = X_0 e^{σB̃_t − σ²t/2}.
Recall now that e^{σB̃_t − σ²t/2} is a martingale (the exponential martingale of the Brownian motion B̃_t). Hence under Q the discounted stock price X_t = S_t e^{−rt} is a martingale. □

Remarks
The effect of Q is to change μ to r in the coefficient of dt. Financially it makes sense: in the risk-neutral world (Q) the return is r (the same as the risk-free rate), not μ (an average return μ > r can only be due to uncertainty in returns, i.e. when there is a possibility of losses).
When the price of an option is evaluated by simulations, the SDE for the stock under Q must be used.
S_0 e^{σB̃_T − σ²T/2} has a Lognormal distribution with parameters ln S_0 − σ²T/2 and σ²T. Doing the (long) calculation we obtain the Black-Scholes formula.
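For example, a Monte Carlo price of the call obtained by simulating S_T under Q from the formula above agrees with the Black-Scholes value (a Python sketch assuming NumPy; the parameters are those of the earlier BHP example):

import numpy as np

rng = np.random.default_rng(7)
S0, K, r, sigma, T = 18.50, 18.50, 0.062, 0.25, 1 / 12

Z = rng.normal(0.0, 1.0, size=1_000_000)                      # B~_T / sqrt(T) under Q
ST = S0 * np.exp((r - sigma**2 / 2) * T + sigma * np.sqrt(T) * Z)
price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()       # discounted expected payoff under Q
print("Monte Carlo price:", price)                            # close to the Black-Scholes value of about 0.58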

Girsanov's theorem states that if we have a Brownian motion with drift, then there is an equivalent measure under which this process is a Brownian motion.

Theorem 29 (Girsanov) Let B_t, 0 ≤ t ≤ T, be a Brownian Motion (under the original probability measure P) and let c be a constant. Then there exists an equivalent measure Q such that the process B̃_t = B_t + ct is a Q Brownian motion.

The proof is outside this course.

14 Fundamental Theorems of Asset Pricing

14.1 Introduction

Pricing options.
Definition: A contingent claim (derivative) with delivery time T is a random variable X ∈ F_T. It represents that at t = T the amount X is paid to the holder of the claim by the seller.
Example: (European Call Option) X = max[S_T − K, 0] = (S_T − K)⁺ (S_T = stock price at time T).
We want to find a price so that there are no arbitrage possibilities.
Arbitrage
An arbitrage strategy is a way to make money out of nothing without taking risk. An arbitrage possibility is a mis-pricing in the market. In the mathematical theory of options, models in which arbitrage strategies exist are not allowed.

14.2 Arbitrage

Definition. An arbitrage strategy is a self-financing portfolio with
V(0) = 0, P(V(T) ≥ 0) = 1 and P(V(T) > 0) > 0.
Interpretation: Borrowing money and investing it in a risky asset represents a portfolio with V(0) = 0.
If V(T) ≥ 0 in every possible scenario and V(T) > 0 in some, then such a portfolio gives an arbitrage strategy.
However, if there is also a chance to lose money, i.e. V(T) < 0 even with a small probability, then this portfolio is not an arbitrage strategy.

14.3 Fundamental theorems of Mathematical Finance

The first theorem gives a necessary and sufficient condition for models not to have arbitrage strategies.

Theorem 30 (First fundamental theorem) A model does not have arbitrage strategies if and only if there is an equivalent martingale probability measure (EMM) Q (also known as risk-neutral) such that the discounted stock price S_t e^{−rt} is a martingale.

Equivalent probability measures (EMM)
A probability Q is equivalent to a probability P if they agree on what is possible and impossible, i.e. for an event A,
Q(A) = 0 if and only if P(A) = 0.
For example, any two normal probabilities N(μ_1, σ_1²) and N(μ_2, σ_2²) are equivalent, whereas they are not equivalent to an exponential distribution, which assigns zero probability to the negative half line. Also, a discrete distribution is never equivalent to a continuous distribution.
The probability measures referred to in the theorem are on the space of values of stock prices, which is more complex than the real line. We do not cover this.
The proof of the theorem relies on Functional Analysis results and advanced Probability and is not given here.

Since we have seen that there is an EMM in the Black-Scholes model, we have
Corollary The Black-Scholes model does not have arbitrage.
Remark. It is possible to prove this directly by showing that the discounted portfolio V_t e^{−rt} is a martingale (see Theorem 32) and using the fact that a martingale has a constant mean.

Example: Binomial model.
S_1 = uS_0 or dS_0. The process S_0, S_1/r is a martingale if
E(S_1/r) = S_0.
Since E(S_1) = puS_0 + (1 − p)dS_0, solving for p we have
p = (r − d)/(u − d).

It is a probability only if d < r < u. This is the no-arbitrage condition in the Binomial model.
Exercise: Give arbitrage strategies when the condition d < r < u does not hold.

When can an option be hedged, i.e. replicated by a self-financing portfolio?
Definition A market model is called complete if any option can be replicated by a self-financing portfolio.

Theorem 31 (Second fundamental theorem) A market model is complete, i.e. any option on the stock can be replicated by a self-financing portfolio, if there is only one EMM (equivalent martingale measure), i.e. the EMM Q is unique.

14.4 Completeness of the Black-Scholes and Binomial models

Since the EMM Q is unique, the Black-Scholes market model is complete. This means that any option can be replicated by a self-financing portfolio and therefore priced by the no-arbitrage approach.
In the Binomial model, if d < r < u then the martingale probability Q exists and is unique. Therefore the one-step Binomial model is arbitrage free and complete, i.e. any option can be replicated and priced by a self-financing portfolio.

14.5 A general formula for the option price

We know that the arbitrage method consists of finding a self-financing replicating portfolio, and then C_t = V_t. But how do we find V_t? It is possible to give a general formula. It relies on the insight that the discounted portfolio can be represented as an integral with respect to the discounted stock price.

Theorem 32 If the discounted stock price is a martingale, then the discounted value of a self-financing portfolio is also a martingale.
78

Proof:
d(V_t e^{−rt}) = e^{−rt}dV_t + V_t d(e^{−rt}).
Using the self-financing condition dV_t = a_t dS_t + b_t d(e^{rt}) and V_t = a_t S_t + b_t e^{rt}, we have
d(V_t e^{−rt}) = e^{−rt}a_t dS_t + e^{−rt}b_t d(e^{rt}) + (a_t S_t + b_t e^{rt})d(e^{−rt})
= a_t (e^{−rt}dS_t + S_t d(e^{−rt})) = a_t d(S_t e^{−rt}),
since the terms with b_t cancel: e^{−rt}d(e^{rt}) + e^{rt}d(e^{−rt}) = 0.
Thus
V_t e^{−rt} = V_0 + ∫_0^t a_u d(S_u e^{−ru})
is a stochastic integral with respect to a martingale. Hence it is a martingale. □

Corollary The price of an option is given by the discounted expected payoff taken under the martingale probability; e.g. for an option paying X at time T the price at time t is
C_t = e^{−r(T−t)} E_Q (X | F_t).
For example, for the call option
C_t = e^{−r(T−t)} E_Q ( (S_T − K)⁺ | S_u, u ≤ t ).

Proof: To avoid arbitrage
C_t = V_t.
But V_t e^{−rt} is a Q-martingale. Therefore C_t e^{−rt} is a Q-martingale.
The claim pays C_T = X at time T. By the martingale property (E(M_T|F_t) = M_t),
C_t e^{−rt} = E_Q (X e^{−rT} | F_t). □

Remark If the interest rate is itself random, r_t, and the savings account is given by β_t = e^{∫_0^t r_s ds}, then the pricing formula takes the form
C_t = E_Q ( (β_t/β_T) X | F_t ).

14.6 Summary

An option is characterized by its payoff, a function of the price at expiration, or a functional of future prices.
Options are priced by matching their payoff with a self-financing portfolio. The price of the option at any time is the price of this portfolio.
The price of an option is the expected discounted payoff. The expectation is taken in the risk-neutral world, under the arbitrage-free probability Q.
In the Binomial model the arbitrage-free probability is given by p = (r − d)/(u − d).
In the Black-Scholes model the arbitrage-free probability is obtained by changing the drift in the model for the stock from μ to r.

15 Models for Interest Rates

15.1 Term Structure of Interest Rates

If $1 is invested at time t until time T > t, it will result in an amount greater than $1 due to interest. The length of the investment period, T − t, is called the term. Money invested for different terms yields a different rate of interest. The function R(t, T) of the argument T is called the yield curve, or the term structure of interest rates.
The rates themselves are not traded. They are derived from prices of bonds, which are traded on the bond market. This leads to the construction of models for bonds and no-arbitrage pricing for bonds and their options. In this section we denote the standard Brownian motion by W_t rather than B_t (this is because in some other texts the bond is denoted by B_t).

15.2 Bonds and the Yield Curve

A $1 bond with maturity T is a contract that guarantees the holder $1 at T. Sometimes bonds also pay a certain amount, called a coupon, during the life of the bond, but for the theory it suffices to consider only bonds without coupons (zero-coupon bonds). Denote by P(t, T) the price at time t of the bond paying $1 at T; P(T, T) = 1. The yield to maturity of the bond is defined as
R(t, T) = −ln P(t, T)/(T − t),
and, as a function of T, it is called the yield curve at time t. Assume also that a savings account paying at time t the instantaneous rate r(t), called the spot (or short) rate, is available. $1 invested until time t will result in
β(t) = e^{∫_0^t r(s)ds}.

15.3 General bond pricing formula

To avoid arbitrage between bonds and the savings account, a certain relation must hold between bonds and the spot rate. If there were no uncertainty, then to avoid arbitrage the following relation must hold:
P(t, T) = e^{−∫_t^T r(s)ds},

since investing either of these amounts at time t results in $1 at time T. When the rate is random, then ∫_t^T r(s)ds is also random and lies in the future of t, whereas the price P(t, T) is known at time t, and the above relation holds only on average.

The no-arbitrage approach is used for pricing bonds and their options. The market model for bonds is incomplete; hence there are many EMMs. The model for the rates is often specified directly under the EMM Q.
We can use the fundamental theorem to price a bond as an option on the rate. By the arbitrage pricing theory the price of the bond P(t, T) is given by
P(t, T) = E_Q( e^{−∫_t^T r(s)ds} | F_t ),
where Q is the EMM such that simultaneously for all maturities T' ≤ T the processes
P(t, T')/β(t) = P(t, T') e^{−∫_0^t r(s)ds} are martingales.
This formula is just the martingale condition for the above martingale at times t and T (E(M_T|F_t) = M_t):
E( P(T, T) e^{−∫_0^T r(s)ds} | F_t ) = P(t, T) e^{−∫_0^t r(s)ds}.
Now use P(T, T) = 1 and re-arrange.

15.4 Models for the spot rate

Some of the well-known models for the spot rate:
The Merton model
dr(t) = μ dt + σ dW(t).
The Vasicek model
dr(t) = b(a − r(t))dt + σ dW(t).
The Cox-Ingersoll-Ross (CIR) model
dr(t) = b(a − r(t))dt + σ √r(t) dW(t).
The Hull-White model
dr(t) = b(t)(a(t) − r(t))dt + σ(t) dW(t).

15.5 Forward rates

Forward rates f(t, T), t ≤ T, are defined by the relation
P(t, T) = e^{−∫_t^T f(t,u)du}.
Thus the forward rate f(t, T), t ≤ T, is the (continuously compounding) rate at time T as seen from time t,
f(t, T) = −∂ ln P(t, T)/∂T.
The spot rate is r(t) = f(t, t). Consequently the savings account β(t) grows according to
β(t) = e^{∫_0^t f(s,s)ds}.
The class of models suggested by Heath, Jarrow, and Morton (1992) is based on modelling the forward rates. We don't cover this.

15.6 Bonds in Vasicek's model

Recall Vasicek's model for the interest rate. We have seen that the solution to Vasicek's SDE
dr(t) = b(a − r(t))dt + σ dW(t)
is given by
r_t = r_0 e^{−bt} + a(1 − e^{−bt}) + σ ∫_0^t e^{−b(t−s)} dW_s.

The formula for the bond prices in Vasicek's model is
P(t, T) = e^{A(τ) − C(τ) r_t},
where
τ = T − t is the time to maturity, also called the term,
C(τ) = (1 − e^{−bτ})/b,
A(τ) = (C(τ) − τ)(a − σ²/(2b²)) − σ²C(τ)²/(4b).
From the bond prices the forward rates can be determined, and then the yield curve.
Exercise: Find these for Vasicek's model.
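These formulas are straightforward to evaluate. The Python sketch below (assuming NumPy; the parameter values are illustrative, and the functions A and C are the standard ones written above) computes Vasicek bond prices and the corresponding yields R(t, T) = −ln P(t, T)/(T − t):

import numpy as np

def vasicek_bond_price(r_t, tau, a=0.05, b=0.5, sigma=0.01):
    # Zero-coupon bond price P(t, T) in the Vasicek model, tau = T - t
    C = (1 - np.exp(-b * tau)) / b
    A = (C - tau) * (a - sigma**2 / (2 * b**2)) - sigma**2 * C**2 / (4 * b)
    return np.exp(A - C * r_t)

taus = np.array([0.25, 1.0, 2.0, 5.0, 10.0])
P = vasicek_bond_price(r_t=0.03, tau=taus)
yields = -np.log(P) / taus                  # the yield curve R(t, T)
for tau, p, y in zip(taus, P, yields):
    print(f"term {tau:5.2f}  P = {p:.4f}  yield = {y:.4%}")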


15.7 Bonds in the Cox-Ingersoll-Ross (CIR) model

The CIR SDE has the same drift as Vasicek's, but the diffusion coefficient involves the square root of the rate. Unlike Vasicek's model, the CIR process is always positive (this is not proved here):
dr(t) = b(a − r(t))dt + σ √r(t) dW(t).
Bond prices have a similar form, except for different functions A(τ) and C(τ):
P(t, T) = e^{A(τ) − C(τ) r_t},
where
C(τ) = 2(e^{γτ} − 1) / ((γ + b)(e^{γτ} − 1) + 2γ),
A(τ) = (2ab/σ²) log( 2γ e^{(γ+b)τ/2} / ((γ + b)(e^{γτ} − 1) + 2γ) ),
γ = √(b² + 2σ²).
From the bond prices the forward rates, and then the yield curve, can be determined by the formulas above.
Exercise: Find these for the CIR model.

15.8 Options on bonds

A call option to buy a bond at time S with maturity T gives its holder the right to buy the T-bond at time S < T. It pays (P(S, T) − K)⁺ at time S. The arbitrage-free price of this call at time t < S is given by the option pricing formula, replacing X by its expression in this case:
E_Q( e^{−∫_t^S r_u du} (P(S, T) − K)⁺ | F_t ).
In Vasicek's model the conditional distribution of ∫_t^T r(s)ds given F_t is the same as that given r(t) (the Markov property) and is a Normal distribution. Hence in Vasicek's model the price of bonds is Lognormal with known mean and variance, and a closed form expression for the price of an option on the bond can be obtained. It looks like a version of the Black-Scholes formula.

Options on bonds are used to cap interest rates. It can be seen that a cap corresponds to a put option, and a floor to a call option.
A cap is a contract that gives its holder the right to pay the smaller of two rates of interest: the floating rate, and the rate k specified in the contract. A party holding the cap will never pay a rate exceeding k; the rate of payment is capped at k. Since the payments are made at a sequence of payment dates T_1, T_2, ..., T_n, called a tenor, with T_{i+1} = T_i + δ (e.g. δ = 1/4 of a year), the rate is capped over intervals of time of length δ. Thus a cap is a collection of caplets.

Consider a caplet over [T, T + δ]. Without the caplet, the holder of a loan must pay at time T + δ an interest payment of δf, where f is the floating, simple rate over the interval [T, T + δ].

[Figure 3: Payment dates t ≤ T_0 < T_1 < T_2 < ... < T_i < T_{i+1} < ... < T_n and the simple rates f_i.]

If f > k, then a caplet allows the holder to pay δk. Thus the caplet is worth δf − δk at time T + δ. If f < k, then the caplet is worthless. Therefore, the caplet's worth to the holder is δ(f − k)⁺. In other words, a caplet pays to its holder the amount δ(f − k)⁺ at time T + δ. Therefore a caplet is a call option on the rate f, and its price at time t, as for any other option, is given by the expected discounted payoff at maturity under the EMM Q,
Caplet(t) = E_Q( (β(t)/β(T + δ)) δ(f − k)⁺ | F_t ).

By definition, 1/P(T, T + δ) = 1 + δf. This relation is justified by comparing the amounts obtained at time T + δ when $1 is invested at time T in the bond and in an investment account with the simple rate f. Thus
f = (1/δ)( 1/P(T, T + δ) − 1 ).

15.9 Caplet as a Put Option on a Bond

We show next that a caplet is in effect a put option on the bond. From the basic relation (under the EMM), P(T, T + δ) = E( β(T)/β(T + δ) | F_T ). Proceeding from the caplet pricing formula above by the law of double expectation, with E = E_Q,
Caplet(t) = E( E( (β(t)/β(T + δ)) (1/P(T, T + δ) − 1 − δk)⁺ | F_T ) | F_t )
= E( (β(t)/β(T)) (1/P(T, T + δ) − 1 − δk)⁺ E( β(T)/β(T + δ) | F_T ) | F_t )
= E( (β(t)/β(T)) (1/P(T, T + δ) − 1 − δk)⁺ P(T, T + δ) | F_t )
= (1 + δk) E( (β(t)/β(T)) ( 1/(1 + δk) − P(T, T + δ) )⁺ | F_t ).

Thus a caplet is a put option on P(T, T + δ) with strike 1/(1 + δk) and exercise time T. In practical modelling, as in models with deterministic volatilities, the distribution of P(T, T + δ) is Lognormal, giving rise to a Black-Scholes type formula for a caplet, Black's (1976) formula.

