
An Introduction to

Monte Carlo Simulations

Notes by

Mattias Jonsson

Version of October 4, 2006

Math 472, Fall 2006


Numerical Methods with Financial Applications

Contents
1. Introduction
2. Computing areas, volumes and integrals
2.1. Examples
2.1.1. Example A: Area of a rectangle
2.1.2. Example B: Area of a disc
2.1.3. Example C: Area of a superellipse
2.1.4. Example D: Volume of an n-dimensional ball
2.1.5. Example E: Computing a triple integral
2.2. Methods
2.3. First method: exact computation
2.4. Second method: the lattice method
2.4.1. Example A
2.4.2. Examples B and C
2.4.3. Example D
2.4.4. Example E
2.5. Third method: the Monte Carlo method
2.5.1. Examples B and C
2.5.2. Example D
2.5.3. Example E
3. A crash course in probability theory
3.1. Outcomes, events and probabilities
3.2. Random variables
3.3. Discrete random variables
3.4. Continuous random variables
3.5. Expectation and variance
3.6. Some important distributions
3.6.1. Binomial random variables
3.6.2. Poisson random variables
3.6.3. Normal random variables
3.7. Random vectors
3.8. Independence
3.9. Conditional expectation and variance
4. Computing expected values
4.1. General strategy
4.2. Sampling
4.2.1. Sampling discrete random variables
4.2.2. Sampling continuous random variables
4.3. The law of large numbers
4.4. Presenting the results
4.5. Examples
4.5.1. Flipping a coin
4.5.2. Rolling dice
4.5.3. Normal random variables


4.5.4. Computing integrals


4.6. Variance reduction techniques
4.6.1. Antithetic variables
4.6.2. Moment matching
5. Examples from Financial and Actuarial Mathematics
5.1. Option prices in the Black-Scholes model
5.2. The collective loss model
5.3. The binomial tree model
5.3.1. Calibration
5.3.2. Pricing a barrier option


1. Introduction
These notes are meant to serve as an introduction to the Monte Carlo
method applied to some problems in Financial and Actuarial Mathematics.
Roughly speaking, Monte Carlo simulation is a method for numerically
computing integrals and expected values. It is a probabilistic method at
heart, based on simulating random outcomes or scenarios. Such a method
may bring up the analogy with gambling at the casino. At any rate the
method is named after the casino in Monte Carlo, Monaco.
2. Computing areas, volumes and integrals
2.1. Examples. In order to introduce the Monte Carlo method, we start
by giving a few examples of problems where the method applies. First we
list the problems, then we discuss methods of solving them.
2.1.1. Example A: Area of a rectangle. We start with a seemingly trivial problem: compute the area of the rectangle

Ω = [0.2, 0.8] × [0.1, 0.6]

sitting inside the square [0, 1]².

2.1.2. Example B: Area of a disc. The next example seems essentially as easy: compute the area of the unit disc

Ω = {(x, y) | x² + y² < 1}

sitting inside the square [−1, 1]².

2.1.3. Example C: Area of a superellipse. Next we want to compute the area of a so-called superellipse

Ω = {(x, y) | x⁴ + y⁴ < 1}

sitting inside the square [−1, 1]².

2.1.4. Example D: Volume of an n-dimensional ball. In a given dimension n ≥ 1 we wish to find the volume of the unit ball

Ω = {(x_1, ..., x_n) | x_1² + ... + x_n² < 1}

sitting inside the cube [−1, 1]ⁿ.

2.1.5. Example E: Computing a triple integral. We wish to compute the triple integral

∫∫∫_{x_1⁴ + x_2⁴ + x_3⁴ < 1} e^(x_1 + x_2 + x_3) dx_1 dx_2 dx_3.

2.2. Methods. How can we compute the quantities in the examples above? We shall discuss three different methods.
• Exact computations.
• The lattice method.
• The Monte Carlo method.

2.3. First method: exact computation. Of course, if we can compute an area, a volume or an integral exactly, then there is usually no need to employ a numerical method. Consider the examples above.
• In Example A, the area equals 0.3.
• In Example B, the area is π.
• In Example C, it is possible but not so easy to compute the exact area.
• In Example D, it is also possible to find the exact answer. For example, the answer is 2, π, 4π/3 and π²/2 for n = 1, 2, 3, 4, respectively.
• In Example E, however, it is (as far as I know) not possible to compute the exact answer as a closed formula.
2.4. Second method: the lattice method. The lattice method essentially exploits the definition of the (multiple, Riemann) integral. Let us illustrate it on the five examples above.

2.4.1. Example A. Fix an integer J ≥ 1 and set N = J². Divide the unit square [0, 1]² into N equally large subsquares R_k, k = 1, ..., N. Each square has area 1/N = 1/J². Let p_k be the center of R_k. These centers have coordinates ((2i − 1)/(2J), (2j − 1)/(2J)), where 1 ≤ i, j ≤ J. Now we use the approximation

Area(Ω) ≈ A_N = Σ_{p_k ∈ Ω} Area(R_k) = (1/N) #{k | p_k ∈ Ω}.

Exercise: Check that A_16 = 0.25 but that A_N = 0.3 whenever J is divisible by 10.
2.4.2. Examples B and C. The lattice method can be used to compute the area of quite general domains Ω in the plane. We proceed as follows:
• First find a rectangle R that you know contains the region Ω.
• Pick a (large) integer J ≥ 1 and set N = J².
• Divide the rectangle R into N equally large subrectangles R_k, 1 ≤ k ≤ N.
• Let p_k be the center of R_k. The centers form a lattice, explaining the name of the method!
• Approximate the exact area A = A(Ω) of Ω by

A_N = Σ_{p_k ∈ Ω} Area(R_k) = (Area(R)/N) #{k | p_k ∈ Ω}.

• The method is easy to implement as long as it is easy to decide whether a given point belongs to Ω or not. This is the case in Examples B and C.
• It follows by the definition of A = Area(Ω) that A_N converges to A as N → ∞ (i.e. J → ∞).
• One can check that if Ω is a nice domain (more precisely, if it has sufficiently smooth boundary), then the convergence rate is equal to

O(1/J) = O(1/√N).

This convergence is not very fast. Basically, to get one more significant digit, we have to increase N by a factor 100.
Here is a simple matlab function that uses the lattice method to compute
an approximation to the area of the unit disk.
function area=da_lattice(J)
% Lattice-method approximation to the area of the unit disc,
% using the J^2 subsquare centers of [-1,1]^2.
v=(2*[1:J]-1)/J-1;                    % centers of the J subintervals of [-1,1]
[x,y]=ndgrid(v);                      % all J^2 lattice points
area=4*sum(sum(x.^2+y.^2<1))/(J^2);   % Area(R)=4 times the fraction inside the disc
Using this program we compute the approximations A_N for a few values of N.

N      1      100      10,000     1,000,000
A_N    4      3.2      3.144      3.1418

The accuracy is in accordance with the rate of convergence O(1/√N).


2.4.3. Example D. In principle the lattice method works in any dimension n. In the case of the unit ball we would divide the cube [−1, 1]ⁿ (which has volume 2ⁿ) into N = Jⁿ subcubes having centers p_k, 1 ≤ k ≤ N. Then we would approximate the volume of the ball by

V_N = Σ_{p_k ∈ Ω} Vol(R_k) = (2ⁿ/N) #{k | p_k ∈ Ω}.

The drawback of this method is that for large dimensions n, and in fact even for n = 3, the number N = Jⁿ of subcubes has to be very large in order to achieve a reasonably accurate number. In fact, one can prove that for smoothly bounded domains (such as a ball), the rate of convergence is

O(1/J) = O(1/N^(1/n)).

Notice that N^(1/n) decreases as n increases. This phenomenon is sometimes called the curse of dimensionality.
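To make this concrete, here is a minimal matlab sketch of the lattice method for the volume of the n-dimensional unit ball (this function, bv_lattice, is my own illustration and not part of the original notes). It visits the Jⁿ subcube centers one at a time, which quickly becomes very slow as n grows, illustrating the curse of dimensionality.

function vol=bv_lattice(n,J)
% Lattice-method approximation to the volume of the unit ball in R^n,
% using the N=J^n subcube centers of [-1,1]^n.
N=J^n;
v=(2*[1:J]-1)/J-1;               % centers of the J subintervals of [-1,1]
count=0;
for k=0:N-1
    m=k; idx=zeros(1,n);
    for i=1:n                    % decode k into n base-J digits
        idx(i)=mod(m,J)+1;
        m=floor(m/J);
    end
    p=v(idx);                    % the corresponding lattice point
    count=count+(sum(p.^2)<1);   % is it inside the unit ball?
end
vol=2^n*count/N;                 % Vol([-1,1]^n)=2^n times the fraction inside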
2.4.4. Example E. Computing an integral instead of an area is not much harder. In Example E we proceed as follows:
• Let R be the cube [−1, 1]³ and Ω the region of integration {(x_1, x_2, x_3) | x_1⁴ + x_2⁴ + x_3⁴ < 1}.
• Pick a (large) integer J ≥ 1 and set N = J³.
• Divide the cube R into N equally large subcubes R_k, 1 ≤ k ≤ N.
• Let p_k be the center of R_k.
• Approximate the exact value of the integral I by

I_N = Σ_{p_k ∈ Ω} Vol(R_k) f(p_k) = (Vol(R)/N) Σ_{p_k ∈ Ω} f(p_k),

where f is the function f(x_1, x_2, x_3) = e^(x_1 + x_2 + x_3).


It should be clear how to modify the algorithm in order to compute more general integrals

∫_Ω f(x) dx,    ∫∫_Ω f(x_1, x_2) dx_1 dx_2,    ∫∫∫_Ω f(x_1, x_2, x_3) dx_1 dx_2 dx_3,  ...

The implementation is easy as long as we can evaluate the function f, and decide whether a given point belongs to the domain Ω.
It follows by the definition of the integral that I_N converges to I as N → ∞ (i.e. J → ∞). As before, for nice domains Ω and nice functions f, the convergence rate is equal to

O(1/J) = O(1/N^(1/n)),

where n is the dimension.
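For concreteness, here is a matlab sketch of the lattice method for the integral in Example E (my own illustration; the function name int3d_lattice is not from the notes).

function int3d=int3d_lattice(J)
% Lattice-method approximation to the triple integral of Example E.
% R=[-1,1]^3 is split into N=J^3 subcubes with centers p_k, and the
% integrand is summed over the centers lying in x1^4+x2^4+x3^4<1.
v=(2*[1:J]-1)/J-1;                  % centers of the J subintervals of [-1,1]
[x1,x2,x3]=ndgrid(v);               % all J^3 lattice points
inside=(x1.^4+x2.^4+x3.^4<1);       % indicator of the domain
f=exp(x1+x2+x3);                    % the integrand at the lattice points
int3d=(2^3/J^3)*sum(f(inside));     % Vol(R)/N times the sum over inside points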

2.5. Third method: the Monte Carlo method. The idea of the Monte Carlo method is quite simple. Instead of using all points on a lattice, we pick them totally randomly. We illustrate the method on Examples B–E above.

2.5.1. Examples B and C. Let us use the Monte Carlo method to compute the area of a domain Ω in the plane.
• First find a rectangle R = R_x × R_y that you know contains the region Ω.
• Pick a (large) integer N (not necessarily of the form N = J²).
• Pick 2N independent samples x_1, ..., x_N and y_1, ..., y_N from the uniform distribution on R_x and R_y, respectively. Set p_k = (x_k, y_k). Then p_1, ..., p_N are independent samples of points from the uniform distribution on R.
• Approximate the exact area A = A(Ω) of Ω exactly as before, that is, by A_N, where

A_N = (Area(R)/N) #{k | p_k ∈ Ω}.

• As before, the method is easy to implement as long as it is easy to decide whether a given point belongs to Ω or not.
• Notice that A_N is a random variable. Different runs will give rise to different approximations!
• We hope that A_N converges (in some sense) as N → ∞. More on this later.
The following simple matlab function computes an approximation to the area of the unit disk using the Monte Carlo method.
function area=da_mc(N)
% Monte Carlo approximation to the area of the unit disc:
% N points uniform in [-1,1]^2; count the fraction that falls inside.
x=2*rand(1,N)-1;
y=2*rand(1,N)-1;
area=4*sum(x.^2+y.^2<1)/N;
The function rand in matlab samples from the uniform distribution U (0, 1)
on the interval (0, 1).
Using this program we compute the approximations A_N for a few values of N.

N      1      100       10,000     1,000,000
A_N    4      3.3200    3.1356     3.1420

(If you run this program, you may not get the exact same numbers!) The accuracy is lower than that of the lattice method, but it seems like the rate of convergence is still O(1/√N), as we will verify later.
At this stage it may not be clear why, and in what sense, AN would
converge to A. We will give a probabilistic explanation shortly.
2.5.2. Example D. The Monte Carlo method is easy to implement in any dimension n. In Example D we would draw nN independent samples x_{ik}, 1 ≤ i ≤ n, 1 ≤ k ≤ N, from the uniform distribution on (−1, 1). Then we would set

V_N = (2ⁿ/N) #{k | x_{1k}² + ... + x_{nk}² < 1}

and hope that V_N converges to the actual volume as N → ∞.
The following matlab function computes the volume of the unit ball in
dimension dim using the Monte Carlo method.
function vol=bv_mc(dim,N)
% Monte Carlo approximation to the volume of the unit ball in R^dim:
% N points uniform in [-1,1]^dim; count the fraction inside the ball.
x=2*rand(dim,N)-1;
vol=2^dim*sum(sum(x.^2)<1)/N;
Using this program we compute the approximations V_N for a few values of N (here in dimension dim = 4).

N      1      100      10,000     1,000,000
V_N    0      5.28     4.888      4.9329

The exact value is π²/2 ≈ 4.9348. The accuracy seems to be about as good as in dimension 2. In fact, one of the advantages of the Monte Carlo method is that it does not suffer from the curse of dimensionality discussed above.

2.5.3. Example E. Having worked through the previous examples, it should now be straightforward to modify the lattice method for the multiple integral in Example E to instead use the Monte Carlo method. We approximate the (multiple) integral

I = ∫∫∫_Ω f(x_1, x_2, x_3) dx_1 dx_2 dx_3

by

I_N = (Vol(R)/N) Σ_{p_k ∈ Ω} f(p_k),

where the p_k are points taken from a uniform distribution on a cube R containing the domain Ω.
The following matlab function computes the integral in Example E using the Monte Carlo method.
function int3d=int3d_mc(N)
% Monte Carlo estimate of the triple integral in Example E:
% N points uniform in [-1,1]^3; average the integrand over the points in
% the domain x1^4+x2^4+x3^4<1 and scale by Vol(R)=2^3.
x=2*rand(3,N)-1;
int3d=(2^3/N)*sum((sum(x.^4)<1).*exp(sum(x)));
Again, it may not be obvious why (and in what sense) IN should converge
to I. We will discuss this shortly.
3. A crash course in probability theory
Since the Monte Carlo method is probabilistic in nature, we need some
probability theory in order to understand it. In this section we recall some
relevant notions.
For now we consider the static or one-period situation where there
are two points in time: today and tomorrow. We do not know the state
of the world tomorrow, but we can build probabilistic models based on our
knowledge of the world today.
3.1. Outcomes, events and probabilities. There is a space Ω of outcomes or states. One may think of an outcome as a roll of a die.
We wish to assign likelihoods not only to individual outcomes, but to whole subsets of Ω, or events. For technical reasons, we may not be able to assign a likelihood to all subsets of Ω. The correct notion is as follows: there exists a collection F of subsets of Ω that is closed under complements and under countable unions and intersections. Such a collection is called a σ-algebra.
The intuition behind this is that a σ-algebra F represents information. For instance, suppose the outcome is the roll of a regular six-sided die but that we are wearing bad glasses so that we can only tell whether we rolled an even or odd number. Then Ω = {1, 2, 3, 4, 5, 6} and the events we know anything about are ∅, {1, 3, 5}, {2, 4, 6} and Ω, so the relevant σ-algebra contains exactly those events, and no others. For instance, {1} is not an event in F, since it corresponds to knowing that we rolled a 1, and nothing else.
A probability measure is a function P : F → [0, 1] satisfying certain natural conditions. It should be thought of as a rule that assigns a likelihood to an event. An event happens almost surely if it has probability one. The triple (Ω, F, P) is called a probability space.
3.2. Random variables. A random variable is, formally, a function X : Ω → R which is F-measurable, that is, for any x ∈ R, the set {ω ∈ Ω | X(ω) ≤ x} is in F.
The intuition behind this is that the information encoded in F about the experiment allows us to tell whether or not X is at most x. By properties of σ-algebras this means that we are able to tell many other things about X, for instance whether X belongs to a given interval.
For many purposes, what we need to know about a random variable is encoded in its distribution function F_X : R → [0, 1], defined by

F_X(x) = P{X ≤ x}.

Notice that there is no explicit reference to Ω or F here.
A natural example is to think of X as the price of an asset tomorrow. Given information today, we typically want to know the probability of the asset price tomorrow being, say, less than $100.
3.3. Discrete random variables. A random variable X is discrete if it can take only finitely or countably many values x_1, x_2, .... The simplest nontrivial example is when X is the number of heads after flipping a fair coin once. In this case X takes only the values 0 and 1. A more complicated example is that of a Poisson random variable X, which can take any nonnegative integer as its value.
Consider a discrete random variable X taking only the nonnegative integers as values, i.e. X = 0, 1, 2, .... Then the distribution of X is conveniently described by the probability mass function (pmf) f_X. This function is simply defined by

f_X(n) = P{X = n}.

We can easily obtain the cdf (cumulative distribution function) from the pmf:

F_X(x) = Σ_{n ≤ x} f_X(n).

3.4. Continuous random variables. Together with the discrete ones, the continuous random variables are the most important ones. While a discrete random variable can only take (at most) countably many values, the probability of a continuous random variable taking a specific value is always zero. They can be defined as random variables X admitting a probability density function (pdf) f_X. The pdf is related to the cumulative distribution function by

F_X(x) = ∫_{−∞}^{x} f_X(u) du.

Thus a continuous random variable X can be thought of as a random variable for which the cdf is (more or less) differentiable. By contrast, the cdf of a discrete random variable is a step function.
3.5. Expectation and variance. Intuitively, it is natural to talk about the typical or average value of a random variable X, as well as the size of the typical fluctuation of X around this value. The corresponding mathematical notions are those of expected value (or expectation) and variance.
If X is discrete, taking values X = 0, 1, 2, ..., then the expected value is given by

E[X] = Σ_{n=0}^{∞} n f_X(n).

More generally, if g : N → R is a function, then g(X) is again a discrete random variable (not necessarily integer-valued) with expectation

E[g(X)] = Σ_{n=0}^{∞} g(n) f_X(n).

When instead X is continuous, the expected value is given by

E[X] = ∫_{−∞}^{∞} x f_X(x) dx.

More generally, if g : R → R is a function, then g(X) is a (not necessarily continuous) random variable with expectation

E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx.

For an arbitrary random variable X, the variance of X is defined by

Var[X] = E[(X − E[X])²] = E[X²] − E[X]².

In general, there is no guarantee that the integrals or sums defining the expected value or variance are convergent. There are random variables for which the expected value or variance are not defined. However, these random variables will not play an important role in these notes.
3.6. Some important distributions. There are many important families
of distributions that are commonly used throughout applied mathematics.
We shall make no attempt to list them all, but content ourselves with recalling a few selected distributions.


3.6.1. Binomial random variables. A binomial random variable with parameters (m, p) can be thought of as the number of heads after tossing m independent biased coins, where the probability of heads is p for each single throw. If X is binomial, then we write

X ~ Bin(m, p).

Clearly X then is discrete, takes values 0, 1, ..., m and the pmf of X is given by

f_X(k) = (m choose k) p^k (1 − p)^(m−k),

where (m choose k) = m!/(k!(m−k)!) is the binomial coefficient.
Sometimes a binomial random variable with parameters (1, p) is called a Bernoulli random variable. A binomial random variable can be seen as the sum of m i.i.d. Bernoulli random variables.
A binomial random variable X ~ Bin(m, p) has expected value and variance given by

E[X] = mp   and   Var[X] = mp(1 − p).

Binomial random variables are often used to model the number of events during a fixed time period.
3.6.2. Poisson random variables. Another distribution commonly used to model the number of events is the Poisson distribution. We write

X ~ Poisson(λ)

for a Poisson random variable with parameter λ > 0. Then X takes values X = 0, 1, 2, ... and has pmf

f_X(k) = e^(−λ) λ^k / k!.

A Poisson random variable X ~ Poisson(λ) has expected value E[X] = λ and variance Var[X] = λ.
It is possible to view the Poisson random variable as the limit of binomial random variables, but we shall not dwell on this here.
3.6.3. Normal random variables. A very important example is a normal (or Gaussian) random variable, having density function

f_X(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²)),

for some constants m ∈ R and σ > 0. We write X ~ N(m, σ²).
If X ~ N(m, σ²), then E[X] = m and Var[X] = σ².


3.7. Random vectors. Let X_1, ..., X_n be random variables on the same probability space (Ω, F, P). We may think of this as an n-dimensional random vector X = (X_1, ..., X_n).
The joint distribution function of X is the function F_X : Rⁿ → [0, 1] defined by

F_X(x_1, ..., x_n) = P{X_1 ≤ x_1, ..., X_n ≤ x_n}.

We say that X is continuous if there exists a function f_X ≥ 0, the joint density function, such that

F_X(x_1, ..., x_n) = ∫_{−∞}^{x_1} ... ∫_{−∞}^{x_n} f_X(u_1, ..., u_n) du_1 ... du_n.

A normal random vector is specified by its mean m = (m_1, ..., m_n) and its positive definite variance-covariance matrix R = (R_ij)_{1≤i,j≤n}. It is a continuous random vector X with joint density function

f_X(x) = (1/((2π)^(n/2) √(det R))) exp(−(1/2)(x − m)ᵀ R⁻¹ (x − m)).

3.8. Independence. There is an intuitive (but imprecise) notion of two real-world events happening independently of each other. In probability theory, this is formulated as follows: given a probability space (Ω, F, P), two events A, B ∈ F are independent if P(A ∩ B) = P(A)P(B). For our applications, it is more important to talk about independent random variables. We say that the random variables X_1, ..., X_n are independent if, for any (Borel) subsets A_1, ..., A_n of R, we have

P{X_1 ∈ A_1, ..., X_n ∈ A_n} = P{X_1 ∈ A_1} ... P{X_n ∈ A_n}.

This means that the joint distribution function F_X of (X_1, ..., X_n) is given in terms of the individual distribution functions as

F_X(x_1, ..., x_n) = F_{X_1}(x_1) ... F_{X_n}(x_n).

If X is continuous with density function f_X, then this is in turn equivalent to

f_X(x_1, ..., x_n) = f_{X_1}(x_1) ... f_{X_n}(x_n).
3.9. Conditional expectation and variance. Let (Ω, F, P) be a probability space. If B ∈ F is an event with nonzero probability, then we can define a new probability measure on (Ω, F) by

P(A|B) = P(A ∩ B)/P(B).

This is called the probability of A conditioned on B, or the conditional probability of A given B. If A and B are independent, then P(A|B) = P(A).
The expectation of a random variable X with respect to this measure is given by

E[X|B] = E[X 1_B]/P(B),

where, for an event A ∈ F, 1_A denotes the (F-measurable) random variable whose value at ω is 1 if ω ∈ A and 0 if ω ∉ A. The interpretation of this is the expected value of X given that the event B occurred.
If X and Y are F-measurable random variables, then we want to define E[X|Y], the conditional expectation of X given Y, to be a function of Y. Its value at Y = y should mean the expected value of X given that Y is taking the value y. Since the probability of the latter event could be zero, one has to be careful in defining E[X|Y]. One way is to define E[X|Y] to be the function g(Y) of Y that minimizes the value of E[(X − g(Y))²].
We may also define the conditional variance of X given Y:

Var[X|Y] = E[(X − E[X|Y])²|Y] = E[X²|Y] − E[X|Y]².

An important property is the law of iterated expectations, asserting that

E[X] = E[E[X|Y]].

A consequence of this formula is

Var[X] = E[Var[X|Y]] + Var[E[X|Y]].
4. Computing expected values
The Monte Carlo method, as far as we will use it, is a method for computing (numerically) the expected value E[X] of a random variable X. Notice
that both the probability of an event and the variance of a random variable
can be viewed as the expectation of some other random variable, so the
Monte Carlo method will allow us to estimate probabilities and variances,
too.
We shall also see that the computations of area, volumes and multiple
integrals in Section 2.5 can be cast as computations of expected values.
4.1. General strategy. In principle, the Monte Carlo method is very simple. To compute the expected value E[X] of X one can do as follows:
(i) Pick a large number N.
(ii) Generate (somehow) N independent samples x_1, x_2, ..., x_N of X.
(iii) Compute E_N = (1/N) Σ_{n=1}^{N} x_n.
(iv) We hope that E_N ≈ E[X] if N is large enough.
We shall discuss the convergence in (iv) (and the choice of N) later on. For now we shall briefly discuss how to sample the random variables in step (ii) and how to present and interpret the results from a Monte Carlo simulation.
4.2. Sampling. It can be tricky to sample from an arbitrary distribution. In matlab, there are two basic commands for generating samples of random variables:
• The command rand samples from U(0, 1), the uniform distribution on the interval [0, 1].
• The command randn samples from N(0, 1), the standard normal distribution.
By adding arguments to these functions, matlab will produce matrices of random numbers. For instance, rand(2,3) will produce a 2×3 matrix whose elements are independent samples from U(0, 1).
It should be noted that the numbers produced by matlab or any similar software are not true random numbers but merely pseudo-random numbers. We will not discuss what this means here. The algorithms used by matlab seem fairly robust and tend to produce good results if used properly.
One more remark about matlab is in order. Every time you start a matlab session, the seed used by the random number generator is reset. This means that you may get the same result when running the same program several times. Occasionally, this may be desirable, but typically it is not a good thing. A standard way around this problem is to reset the seed using the computer's clock. For instance, one can use the command

rand('state',sum(100*clock))

This command should then be invoked at the start of the matlab session, but not every time a particular program is run. Resetting the seed too often will compromise the samples.
4.2.1. Sampling discrete random variables. One can use the function rand in matlab to sample discrete random variables.
For example, to generate N independent samples of a Bernoulli random variable, i.e. a random variable X with pmf f_X(1) = p, f_X(0) = 1 − p for some 0 < p < 1, we can set

x=(rand(1,N)<p);

As another example, we generate N independent samples from the uniform distribution on the integers 1, 2, ..., n, i.e. f_X(i) = 1/n for 1 ≤ i ≤ n:

x=ceil(n.*rand(1,N));

More generally, if X is a discrete random variable taking values X = a_1, a_2, ... and with pmf f_X(a_n) = P{X = a_n} = p_n, then we can generate samples of X by observing that X has the same distribution as the random variable X′ given by X′ = a_n, where n = n(Y) is a random variable defined by

Σ_{k=1}^{n−1} p_k < Y < Σ_{k=1}^{n} p_k,

and Y has a uniform distribution on the interval (0, 1). The idea is of course that Y can be easily sampled in matlab. We leave it to the reader to figure out the exact matlab implementation of the algorithm above.


4.2.2. Sampling continuous random variables. In principle one can sample any continuous random variable X (with values in R) as follows:

X = F_X^{−1}(Y),   that is,   Y = F_X(X),

where Y is uniformly distributed on (0, 1) (hence can be sampled using rand) and F_X : R → [0, 1] is the cdf of X.
For example, when X is exponentially distributed with expected value one, then

F_X(x) = 1 − e^(−x),   which implies   F_X^{−1}(y) = −log(1 − y).

In general, however, it can be difficult to apply this procedure in a computationally efficient way, unless there is an explicit formula for the inverse cdf F_X^{−1}.
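For instance, a short matlab sketch of this inverse-cdf recipe for the exponential distribution with expected value one (my own illustration) is:

N=100000;
x=-log(1-rand(1,N));         % N samples, obtained by applying the inverse cdf to uniform draws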
A special (and important) case that matlab knows how to handle well is that of a standard normal distribution. The command

x=randn(1,N)

generates a vector of N independent samples from the standard normal distribution.
Sometimes one can do simple transformations to get new distributions. For instance,

x=mu+sigma*randn(1,N)

generates samples from the normal distribution N(μ, σ²) with mean μ and variance σ², and

x=exp(mu+sigma*randn(1,N))

generates samples from a log-normal distribution.
4.3. The law of large numbers. The reason why Monte Carlo works is the Law of Large Numbers.

Theorem (Kolmogorov's Strong Law of Large Numbers). If X_1, X_2, ..., X_n, ... are i.i.d. random variables with (well-defined) expected value E[X_n] = μ, then

(1/N) Σ_{n=1}^{N} X_n → μ   a.s. as N → ∞.

This version of the law of large numbers does not tell us how fast the convergence is. In general, we have a rule of thumb: in Monte Carlo simulations, the error is O(1/√N) when using N samples.
Note: this is quite slow: to get 1 more decimal, we need 100 times more simulations!
The reason for this rule of thumb is the following lemma.

Lemma. If X_1, X_2, ..., X_n, ... are i.i.d. random variables with expected value E[X_n] = μ and variance Var[X_n] = σ², then

E[(1/N) Σ_{n=1}^{N} X_n] = μ   and   Var[(1/N) Σ_{n=1}^{N} X_n] = σ²/N.

The formula for the variance implies that the random variable (1/N) Σ_{n=1}^{N} X_n has standard deviation σ/√N.
While the strong law of large numbers is rather difficult to prove, the
two formulas above are elementary. A more precise (and non-elementary!)
version of the rule of thumb is given by
Theorem (Central Limit Theorem). If X_1, X_2, ..., X_n, ... are i.i.d. random variables with mean μ and variance σ², i.e. E[X_n] = μ and Var[X_n] = σ², then

(Σ_{n=1}^{N} X_n − Nμ)/(σ√N) → N(0, 1)   as N → ∞.

In other words, after subtracting the mean and dividing by the standard deviation, the sum converges to a standard normal/Gaussian.
In particular we get

(1/N) Σ_{n=1}^{N} X_n ≈ μ + error,   where the error is approximately N(0, σ²/N),

so the error has standard deviation σ/√N.

4.4. Presenting the results. The Monte Carlo method differs from most other methods by its probabilistic nature. In particular, the approximation we get will vary from simulation to simulation. For this reason it is important to present the results of a Monte Carlo simulation in an instructive way.
It is a good habit to always include the following elements in a presentation of the results of a Monte Carlo simulation:
• N: the number of paths
• x̄_N: the Monte Carlo estimate
• ε_N: the standard error
• A convergence diagram
Let us explain the items above. We have a random variable X with (unknown) expected value E[X] = μ and variance Var[X] = σ², and we are interested in computing μ. To this end, we use independent samples x_1, ..., x_N from X and form the average

x̄_N = (1/N)(x_1 + x_2 + ... + x_N).

This is the Monte Carlo estimate of μ referred to above.
The standard error ε_N is supposed to measure the error in the Monte Carlo simulation. Of course, we cannot know the error exactly, or else there would be no point in doing the Monte Carlo simulation in the first place. What we would like to know is the approximate size of the error. Now, if we think of x̄_N as a random variable rather than a number, then by the Central Limit Theorem, x̄_N is approximately distributed as N(μ, σ²/N). Unfortunately, we do not know σ! However, we can use the following estimate for σ:

σ_N = √( (1/(N − 1)) (Σ_{i=1}^{N} x_i² − N x̄_N²) ).

The standard error is then defined as

ε_N = σ_N/√N.

Finally, a convergence diagram is a plot of x̄_n against n for 1 ≤ n ≤ N. It is a good visual tool for determining whether a given Monte Carlo simulation is close to having converged. See the examples below for more details on how to do this.
4.5. Examples. Let us illustrate the Monte Carlo method for approximating expected values in a few simple cases. More interesting examples will be given later on.

4.5.1. Flipping a coin. Let X be a Bernoulli random variable, i.e. X = 0 or X = 1 with probability f_X(0) = 1 − p and f_X(1) = p. Then X can be thought of as the number of heads after a single toss of an unfair (if p ≠ 0.5) coin. We wish to use Monte Carlo simulations to approximate the expected value of X and compare it with the exact value E[X] = p.
We saw above how to sample X using matlab. The following matlab code generates N samples, computes a Monte Carlo estimate of E[X], computes the standard error, and produces a convergence diagram. In the code, we have set p = 0.4 and N = 100000, but this can of course be modified.
clear;
N=100000;p=0.4;
x=(rand(1,N)<p);                                 % N Bernoulli(p) samples
hx=cumsum(x)./[1:N];                             % running Monte Carlo estimates
eps=sqrt((sum(x.^2)-N*hx(N)^2)/(N-1))/sqrt(N);   % standard error
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hx(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hx(Nmin:N));
title('MC convergence diagram: expectation of Bin(1,0.4)');
xlabel('No paths');
ylabel('estimate');
4.5.2. Rolling dice. Next we want to use Monte Carlo simulations to compute the probability of rolling a full house in one roll in the game of Yahtzee. This means rolling five (fair, independent, six-sided) dice and getting a pattern of type jjjkk, where 1 ≤ j, k ≤ 6 and j ≠ k.
Recall that the probability of an event A is the same as the expected value of the random variable 1_A taking value 1 if A occurs and zero otherwise.
We can use the following matlab code to compute a Monte Carlo estimate of the probability of a full house.
clear;
N=100000;
x=sort(ceil(6*rand(5,N)));       % N rolls of five dice, each column sorted
% full house: three equal dice and a different pair (the dice in each column are sorted)
y=((x(1,:)==x(2,:))&(x(2,:)==x(3,:))&(x(4,:)==x(5,:))&(x(3,:)~=x(4,:)))|...
  ((x(3,:)==x(4,:))&(x(4,:)==x(5,:))&(x(1,:)==x(2,:))&(x(2,:)~=x(3,:)));
hy=cumsum(y)./[1:N];
eps=sqrt((sum(y.^2)-N*hy(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hy(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hy(Nmin:N));
title('MC convergence diagram: probability of full house');
xlabel('No paths');
ylabel('estimate');
4.5.3. Normal random variables. Let X_i, i = 1, 2, 3, be independent normal random variables with mean μ_i and variance σ_i², respectively. Let us compute the expected value of the difference between the maximum and the minimum of the X_i. In other words, we want to compute E[Y], where

Y = max{X_1, X_2, X_3} − min{X_1, X_2, X_3}.

For this we use the normal random number generator discussed above. A possible partial matlab code is as follows:
clear;
N=100000;
mu=[0.1 0.3 -0.1];
sigma=[0.1 0.2 0.5];
x=randn(3,N);                              % standard normal samples
z=repmat(mu',1,N)+repmat(sigma',1,N).*x;   % shift and scale to N(mu_i,sigma_i^2)
y=max(z)-min(z);                           % samples of Y
hy=cumsum(y)./[1:N];
eps=sqrt((sum(y.^2)-N*hy(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hy(N));
fprintf('standard error: %f\n',eps);
(We have not included the code for plotting the convergence diagram.)
4.5.4. Computing integrals. Our first examples of Monte Carlo simulations concerned the computation of multiple integrals and, as special cases, areas and volumes of regions in the plane or space. Let us show how these computations fit into the framework of computing expected values of random variables.
Let us stick to dimension three for definiteness. Suppose we want to compute the triple integral

I = ∫∫∫_Ω f(x_1, x_2, x_3) dx_1 dx_2 dx_3,

where Ω is a (bounded) region in space and f is a (sufficiently nice) function defined on Ω.
Pick a large box R of the form

R = [a_1, b_1] × [a_2, b_2] × [a_3, b_3] = {a_i ≤ x_i ≤ b_i, i = 1, 2, 3}

such that Ω is contained in R. Define a function f̃ on R by

f̃(x_1, x_2, x_3) = f(x_1, x_2, x_3) if (x_1, x_2, x_3) ∈ Ω,   and   f̃(x_1, x_2, x_3) = 0 if (x_1, x_2, x_3) ∉ Ω.

Then we have

I = Vol(R) E[f̃(X_1, X_2, X_3)],

where X_1, X_2, X_3 are independent random variables and X_i ~ U(a_i, b_i) is uniformly distributed on [a_i, b_i] for i = 1, 2, 3.
This point of view leads to the Monte Carlo simulation algorithm that we discussed in connection with Example E above.
4.6. Variance reduction techniques. The Monte Carlo method is, generally speaking, easy to implement and does not suffer from the curse of dimensionality. It does, however, have the drawback that it is relatively slow, with rate of convergence (in the probabilistic sense) equal to O(1/√N). Here we will briefly discuss a couple of methods for speeding up the convergence. They will typically still have a convergence rate of O(1/√N), but the constant hidden in the O()-terminology may be smaller. More precisely, the variance of the Monte Carlo estimator is smaller, explaining the name variance reduction methods for these techniques.
In these notes we shall cover two variance reduction methods: antithetic variables and moment matching. There are many more variance reduction techniques that we have to leave out due to limited space. Some of the more important ones that we have left out are stratified sampling, importance sampling, control variates and low-discrepancy sequences (or quasi-Monte Carlo methods).
4.6.1. Antithetic variables. The method of antithetic variables is based on the symmetry of the distributions U(0, 1) and N(0, 1):
• if Y ~ U(0, 1) then 1 − Y ~ U(0, 1);
• if Y ~ N(0, 1) then −Y ~ N(0, 1).
This can be exploited as follows. Suppose we need to generate 2N samples of a random variable X, and that this is done through 2N samples of a uniform random variable Y ~ U(0, 1), say X = f(Y) for some function f. Normally we would use 2N independent samples

y_1, ..., y_{2N}

of Y and convert these to samples x_n = f(y_n) of X.
With antithetic variables, we would instead only use N independent samples

y_1, ..., y_N

and generate the 2N samples of X as

f(y_1), f(1 − y_1), f(y_2), f(1 − y_2), ..., f(y_N), f(1 − y_N).

If the underlying random variable Y were normal rather than uniform, say Y ~ N(0, 1), then we would generate the 2N samples of X as

f(y_1), f(−y_1), f(y_2), f(−y_2), ..., f(y_N), f(−y_N).

The method of antithetic variables typically reduces the variance of the Monte Carlo estimator, but it is hard to quantify how large the reduction will be. Generally speaking, antithetic variables are good to use unless there is a specific reason not to.
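As an illustration (not part of the original notes), here is a minimal matlab sketch of antithetic variables for estimating E[X] with X = f(Y) and Y ~ U(0, 1); the choice f(y) = e^y is purely hypothetical.

f=@(y) exp(y);               % hypothetical choice of f
N=50000;                     % gives 2N samples in total
y=rand(1,N);                 % only N uniform draws are needed
x=[f(y) f(1-y)];             % the antithetic pairs f(y_n), f(1-y_n)
est=mean(x);                 % Monte Carlo estimate of E[f(Y)]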
4.6.2. Moment matching. Moment matching is somewhat similar in spirit to antithetic variables. Suppose we want to sample a random variable X = f(Y), where Y ~ N(0, 1). Let y_1, ..., y_N be i.i.d. samples of Y and think of the y_n's as random variables. Then the random variables

μ̂ = (1/N) Σ_{n=1}^{N} y_n   and   σ̂² = (1/(N − 1)) Σ_{n=1}^{N} (y_n − μ̂)²

have expected values

E[μ̂] = 0   and   E[σ̂²] = 1.

Of course, there is no reason why, for a particular sample, we would have μ̂ = 0 and σ̂² = 1.
The idea is now to correct the numbers y_1, ..., y_N to match the first two moments:

z_n = (y_n − μ̂)/σ̂,   n = 1, ..., N.

Now use z_1, ..., z_N as the generated samples of N(0, 1), i.e. set x_n = f(z_n).
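A minimal matlab sketch of moment matching (my own illustration, with a hypothetical choice of f) could read as follows; note that matlab's std uses the 1/(N − 1) normalization, matching the definition of σ̂ above.

f=@(y) max(y,0);             % hypothetical choice of f
N=50000;
y=randn(1,N);                % raw standard normal samples
z=(y-mean(y))/std(y);        % matched samples: sample mean 0, sample std 1
est=mean(f(z));              % Monte Carlo estimate of E[f(Y)]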
5. Examples from Financial and Actuarial Mathematics
To end these notes we outline how to apply the Monte Carlo technique to three different problems in Financial and Actuarial Mathematics.

5.1. Option prices in the Black-Scholes model. Let us consider a (simplified) model for the price of a call option on a stock. We have a so-called one-period model with two time points: today (t = 0) and one time unit (e.g. one year) from now (t = 1). The stock price S_0 today is known. The stock price at t = 1 is a random variable. We assume that it is a log-normal random variable. This means that

S_1 = S_0 e^Y,

where Y ~ N(r − σ²/2, σ²) is a normal random variable with mean r − σ²/2 and variance σ². Here r is the interest rate and σ is the volatility of the stock.
We are interested in computing the price of a call option expiring at time t = 1 and with strike price K. Such an option gives the holder (i.e. the owner) the right, but not the obligation, to buy one stock at time t = 1 for the predetermined price K. At time t = 1, the value of this option will be

C_1 = max{0, S_1 − K} = S_1 − K if S_1 > K, and 0 otherwise.

One can then argue that the market price of the option at time t = 0 (today) should be

C_0 = e^(−r) E[C_1] = e^(−r) E[max{0, S_1 − K}].

To implement this model we could do as follows:
• Fix a large number N.
• Generate N independent samples z_1, ..., z_N of a standard normal random variable Z ~ N(0, 1).
• Set y_n = σ z_n + (r − σ²/2). Then y_1, ..., y_N are independent samples of a random variable Y ~ N(r − σ²/2, σ²).
• Set s_{1,n} = S_0 e^(y_n). Then s_{1,1}, ..., s_{1,N} are independent samples from the distribution of S_1.
• Set c_{1,n} = max{0, s_{1,n} − K}. These are the option values at time t = 1.
• Set c_n = e^(−r) (1/n) Σ_{k=1}^{n} c_{1,k}. These are the successive Monte Carlo estimates of the option price at time t = 0.
• Report N, the final MC estimate c_N, the standard error ε_N and plot a convergence diagram.
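A matlab sketch of this algorithm (my own illustration; the values of S0, K, r and sigma are hypothetical) might look as follows.

clear;
N=100000;
S0=100; K=100; r=0.05; sigma=0.2;        % hypothetical model parameters
z=randn(1,N);                            % standard normal samples
y=sigma*z+(r-sigma^2/2);                 % samples of Y ~ N(r-sigma^2/2,sigma^2)
s1=S0*exp(y);                            % samples of the stock price S1
x=exp(-r)*max(s1-K,0);                   % discounted option payoffs
hc=cumsum(x)./[1:N];                     % successive MC estimates of C0
eps=sqrt((sum(x.^2)-N*hc(N)^2)/(N-1))/sqrt(N);   % standard error
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hc(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hc(Nmin:N));     % convergence diagram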
The reader should note that there is an exact formula, the celebrated
Black-Scholes formula for the option price C0 . In practice, one would therefore not use Monte Carlo simulations to compute the option prices. However,
it is well known that the Black-Scholes model does not accurately reflect all
the features of typical stock prices. In more advanced (and realistic) models,
it may not be possible to produce an exact formula for the option price and
then numerical methods are crucial. Monte Carlo simulation is one such method that is easy to implement.
5.2. The collective loss model. Next we study an example from Actuarial Mathematics. Pretend that you are an insurance company insuring, say, cars. There are a lot of policy holders (car owners) and each time a policy holder has an accident, loses his/her car to theft or fire etc., he/she makes a claim and you have to pay some money. In compensation for this, you receive money in the form of insurance premia. How big premia is it reasonable to collect?
A general way to look at this problem is to say that the total loss S, i.e. the total amount of money that you (the insurer) must pay out during, say, a year, is a random variable. Definitely, you would like to set the premium so that the total premium collected is at least the expected value E[S], or else you will be in the red on average. In practice, premia are higher than that. Let us for simplicity say that the total premium collected is twice the expected
value E[S]. Let us ask what the probability is that the total (or aggregate) loss exceeds the collected premia.
To attack this problem, we must know the distribution of the aggregate loss S (during a fixed time period, say a year). A common model for S is the so-called collective risk model, in which

S = X_1 + X_2 + ... + X_N,

where N is the total number of accidents (the frequency) and X_j is what you have to pay to the insured in accident j. Here both N and the X_j are random variables. We assume that:
• the random variables X_1, X_2, ... are i.i.d., so they have the same distribution as a random variable X called the severity;
• N and X_1, X_2, ... are all independent.
One can then show that

E[S] = E[N] E[X],

and we are interested in computing the probability

P{S > 2 E[S]}.

To do this we must know the distributions of N and X above, but even if we know them it can be difficult to calculate the probability above exactly. This is where Monte Carlo simulations come into the picture.


Let us consider the case when the frequency N is a Poisson random variable and the severity X ≥ 0 is a Pareto random variable. This means that N takes values 0, 1, 2, ... and has a pmf of the form

f_N(k) = e^(−λ) λ^k / k!,

for some λ > 0, and that X is a continuous random variable with pdf

f_X(x) = α β^α / (x + β)^(α+1),   x ≥ 0,

for some constants β > 0 and α > 1. Then we have

E[S] = E[N] E[X] = λβ/(α − 1).

We can generate samples of the Poisson distribution using the matlab command poissrnd in the statistics toolbox. It can also be done by hand, but that is a little tricky.
As for the Pareto distribution, it can be sampled by inverting the cdf:

y = F_X(x) = 1 − β^α/(x + β)^α   ⟺   x = β((1 − y)^(−1/α) − 1).
To obtain a Monte Carlo estimate for the probability P{S > 2E[S]} we proceed as follows:
• Pick a large number M (we do not call it N this time...).
• Generate M independent samples n_1, n_2, ..., n_M from the Poisson distribution.
• Generate n_1 + ... + n_M independent samples

x_{mj}, 1 ≤ m ≤ M, 1 ≤ j ≤ n_m,

from the Pareto distribution.
• Set x_m = x_{m1} + ... + x_{m n_m} for 1 ≤ m ≤ M.
• Set z_m = 1 if x_m > 2 E[S] = 2λβ/(α − 1) and z_m = 0 otherwise.
• Set z̄_m = (1/m) Σ_{j=1}^{m} z_j for 1 ≤ m ≤ M.
• Report M, the final Monte Carlo estimate z̄_M, the standard error ε_M and draw a convergence diagram.
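A matlab sketch of this simulation (my own illustration; the values of lambda, alpha and beta are hypothetical, and poissrnd requires the statistics toolbox) could read as follows.

clear;
M=100000;
lambda=5; alpha=3; beta=10;              % hypothetical parameters
ES=lambda*beta/(alpha-1);                % exact expected aggregate loss E[S]
n=poissrnd(lambda,1,M);                  % claim counts for the M scenarios
s=zeros(1,M);
for m=1:M
    u=rand(1,n(m));
    x=beta*((1-u).^(-1/alpha)-1);        % Pareto claims via the inverse cdf
    s(m)=sum(x);                         % aggregate loss in scenario m
end
z=(s>2*ES);                              % indicator that the loss exceeds the premia
hz=cumsum(z)./[1:M];                     % successive MC estimates
eps=sqrt((sum(z.^2)-M*hz(M)^2)/(M-1))/sqrt(M);
fprintf('No paths: %i\n',M);
fprintf('MC estimate: %f\n',hz(M));
fprintf('standard error: %f\n',eps);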

5.3. The binomial tree model. Finally we consider another model for the evolution of stock prices: the binomial tree model. In this model, time starts at t = 0 and ends at some time t = T in the future. In between, there are M time steps, so time is discrete,

t = 0, Δt, 2Δt, ..., MΔt = T.

We consider a market with a single stock. The price of the stock at time mΔt is denoted by S_m. It is a random variable. Given its value S_m at time mΔt, its value S_{m+1} is equal to

S_{m+1} = S_m u with probability p_u,   S_{m+1} = S_m d with probability p_d.

Here u and d are constants with d < 1 < u and p_u, p_d > 0, p_u + p_d = 1.
This implies that, given the value S_0 of the stock at time 0, S_m can take the m + 1 possible values

S_0 d^m, S_0 u d^(m−1), ..., S_0 u^m.

It is possible to transform S_m into a binomial random variable.
Now, we wish to do two things with this model:
• Calibrate the parameters u, d, p_u and p_d.
• Compute the price of a so-called barrier option in this model.

5.3.1. Calibration. We wish to use the binomial tree to model the behavior of a real stock. How should we pick the parameters u, d, p_u and p_d? Basically we need four conditions to nail down these four unknowns. One condition is

(1)   p_u + p_d = 1,

so we need three more.
We can get two conditions by specifying the conditional expected value and variance of S_{m+1} given S_m. For financial reasons it is natural to demand

E[S_{m+1} | S_m] = S_m (1 + rΔt)   and   Var[S_{m+1} | S_m] = S_m² σ² Δt,

where r is the interest rate and σ is the volatility of the stock (assumed constant). This leads to the equations

(2)   p_u u + p_d d = 1 + rΔt,
(3)   p_u (u − (1 + rΔt))² + p_d (1 + rΔt − d)² = σ² Δt.

Finally, it is common to put in a condition of symmetry, namely

(4)   ud = 1.

It is in general not possible to find an analytic expression for the values of u, d, p_u, p_d determined by the four conditions (1)-(4). However, we can reduce the four equations to one, and solve the latter using e.g. Newton's method!
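As an illustration (not from the original notes), here is one way to carry out this reduction in matlab. Equations (1), (2) and (4) give d = 1/u and p_u = (1 + rΔt − d)/(u − d); substituting into (3) leaves a single equation g(u) = 0, which is solved below with fzero (a Newton iteration would work equally well). The parameter values are hypothetical.

r=0.05; sigma=0.2; dt=0.01;        % hypothetical parameter values
a=1+r*dt;                          % conditional mean growth factor
% g(u)=0 is equation (3) after substituting d=1/u, pu=(a-d)/(u-d), pd=1-pu.
g=@(u) ((a-1./u)./(u-1./u)).*(u-a).^2+((u-a)./(u-1./u)).*(a-1./u).^2-sigma^2*dt;
u=fzero(g,exp(sigma*sqrt(dt)));    % start near the standard guess exp(sigma*sqrt(dt))
d=1/u;
pu=(a-d)/(u-d); pd=1-pu;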
5.3.2. Pricing a barrier option. We wish to use Monte Carlo simulations to
price an up-and-out barrier call option in the binomial tree model. Such
an option works in the same way as a (vanilla) call option, but it becomes
worthless if the stock price exceeds a certain barrier B prior to expiry.
Today is t = 0 and the option expires at time t = T. At the latter time, the option is worth

C_M = 0 if S_M ≤ K,
C_M = 0 if S_m ≥ B for some m, 0 < m ≤ M,
C_M = S_M − K otherwise.

Here S_M and S_m, 1 ≤ m ≤ M, are of course unknown at time t = 0.
One can then argue that at time 0, the option is worth

C_0 = (1 + rΔt)^(−M) E[max{S_M − K, 0} · 1_{max_{1≤m≤M} S_m < B}].

An algorithm for computing the price C_0 is as follows:
• Calibrate the model and calculate the values of u, d, p_u and p_d as above.
• Pick a large number N.
• Generate NM independent samples z_{n,m}, 1 ≤ n ≤ N, 1 ≤ m ≤ M, from the Bernoulli(p_u) distribution. Thus z_{n,m} = 0 or z_{n,m} = 1 for all m, n.
• Set s_{n,m} = S_0 u^(Σ_{j=1}^{m} z_{n,j}) d^(m − Σ_{j=1}^{m} z_{n,j}) for 1 ≤ n ≤ N and 1 ≤ m ≤ M.
• Set x_n = max{s_{n,M} − K, 0} · 1_{max_{1≤m≤M} s_{n,m} < B} for 1 ≤ n ≤ N.
• Set c_n = (1 + rΔt)^(−M) (1/n) Σ_{j=1}^{n} x_j for 1 ≤ n ≤ N. These are the successive Monte Carlo estimates of the option price C_0.
• Report the final Monte Carlo estimate c_N, the standard error ε_N, and draw a convergence diagram.
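A matlab sketch of this algorithm (my own illustration; the parameter values are hypothetical, and u, d and pu are set to roughly the values the calibration sketch above produces for these parameters) might look as follows.

clear;
N=100000; M=50;
S0=100; K=100; B=130; r=0.05; sigma=0.2; T=0.5;
dt=T/M;
u=1.0202; d=1/u; pu=0.5075;              % approximate calibrated values
z=(rand(N,M)<pu);                        % Bernoulli(pu) up/down indicators
s=S0*u.^cumsum(z,2).*d.^cumsum(1-z,2);   % simulated price paths s(n,m)
alive=(max(s,[],2)<B);                   % paths that never reach the barrier
x=(1+r*dt)^(-M)*(max(s(:,M)-K,0).*alive)';   % discounted payoffs as a row vector
hc=cumsum(x)./[1:N];                     % successive MC estimates of C0
eps=sqrt((sum(x.^2)-N*hc(N)^2)/(N-1))/sqrt(N);
fprintf('No paths: %i\n',N);
fprintf('MC estimate: %f\n',hc(N));
fprintf('standard error: %f\n',eps);
Nmin=ceil(N/100);
figure(1);plot([Nmin:N],hc(Nmin:N));     % convergence diagram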

You might also like