
W4413 Nonparametric statistics Lecture 5 - 4/10/2014

Monte Carlo Method


Lecturer: Arian Maleki Scribe: Arian Maleki

1 Monte Carlo method


In this section we review the Monte Carlo technique for evaluating integrals and expected values. This topic will be used many times in our course.

1.0.1 Numeric calculation of expected values


In many problems we would like to calculate $E[g(x)]$ or $E[g(x_1, \dots, x_n)]$. You will see some examples later in the course. Suppose that $g(x)$ is given and $x \sim f(x)$. Furthermore, suppose that $f(x)$ is zero outside the interval $[a, b]$. One way to do this calculation is to employ numeric integration techniques and the formula
$$\int_a^b g(x) f(x)\, dx.$$
We will review this method and its benefits and drawbacks in Section 1.0.2.
Another way to address this question is to employ the law of large numbers and the central limit theorem. If we have access to $x_1, x_2, \dots, x_n \overset{iid}{\sim} f$, then according to the weak law of large numbers
$$\frac{1}{n} \sum_{i=1}^{n} g(x_i) \overset{p}{\to} E[g(x)],$$
and according to the central limit theorem
$$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(x_i) - E[g(x)] \right) \overset{d}{\to} N(0, \mathrm{var}(g)).$$
In other words, it seems that $\frac{1}{n} \sum_{i=1}^{n} g(x_i)$ provides a good estimate of $E[g(x)]$. This is the basic idea of the Monte Carlo method, as we will describe in Section 1.0.3. But before we discuss whether this approach is good or not, let us start with a more standard approach that you might have seen in calculus.
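To make this concrete, here is a minimal Python sketch of the idea (the choice $g(x) = x^2$ with $x \sim \mathrm{Uniform}(0, 1)$, for which $E[g(x)] = 1/3$, is an illustrative assumption, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    return x ** 2  # illustrative choice; E[g(x)] = 1/3 for x ~ Uniform(0, 1)

# By the weak law of large numbers, the sample mean of g(x_i)
# converges in probability to E[g(x)] as n grows.
for n in [10, 1_000, 100_000]:
    x = rng.uniform(0.0, 1.0, size=n)
    print(n, g(x).mean())  # settles near 1/3
```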

1.0.2 Numeric integration


Suppose that we are interested in the numeric evaluation of the following integral:
$$\int_a^b g(x)\, dx.$$

Note that this integral corresponds to the area below the curve g(x).

One easy way to approximate the value of this integral is to partition the interval $[a, b]$ into thin slices of width $\Delta$ and approximate the area of each slice with $\Delta\, g(x_i)$. This is shown in Figure 1. In other words,

Figure 1: $\int_a^b g(x)\, dx$ calculates the area below the curve $g$. Numerical methods partition the interval $[a, b]$ into thin slices and then approximate the area of each thin slice.

$$\int_a^b g(x)\, dx \approx \sum_{i=1}^{n} \Delta\, g(x_i),$$

where $x_i$ is the midpoint of the $i$th slice and $n = \lceil \frac{b-a}{\Delta} \rceil$. Clearly, this approximation suffers from a certain amount of error. Using the mean value theorem we can provide an upper bound for the amount of error. Also, it is intuitively clear that as $\Delta$ gets smaller, the number of slices $n$ increases and the error decreases. Let's formalize this statement by providing an upper bound for the approximation error. Suppose $g$ is differentiable and $|g'(x)| < C$ for all values of $x$, where $C$ is a constant. Using the mean value theorem we have
$$|g(x) - g(x_i)| \leq \frac{C\Delta}{2}, \qquad \forall x \in \left[ x_i - \frac{\Delta}{2},\, x_i + \frac{\Delta}{2} \right].$$
Can you prove why? Therefore, we have
$$\left| \int_{x_i - \Delta/2}^{x_i + \Delta/2} g(x)\, dx - \Delta\, g(x_i) \right| = \left| \int_{x_i - \Delta/2}^{x_i + \Delta/2} \left( g(x) - g(x_i) \right) dx \right| \leq \frac{C\Delta^2}{2}. \tag{1}$$

Try to describe why each step is correct. Using (1), it is straightforward to prove that
$$\left| \int_a^b g(x)\, dx - \sum_{i=1}^{n} \Delta\, g(x_i) \right| \leq \frac{C(b-a)\Delta}{2} = \frac{C(b-a)^2}{2n}. \tag{2}$$

Again, try to prove this last step. Remember that in numeric integration $n$ is usually very large, and, similar to what we had in regression, the rate of decay of the error as $n$ grows is an important goodness measure for a numeric integration method. The simple method we described above gives us a $\frac{1}{n}$ rate, as shown in (2).
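As a quick sanity check, here is a minimal Python sketch of this midpoint rule (the test function $\sin$ on $[0, \pi]$, whose integral is exactly 2, is an illustrative choice):

```python
import numpy as np

def midpoint_integrate(g, a, b, n):
    """Midpoint rule: sum g at the slice midpoints, weighted by the width."""
    delta = (b - a) / n
    midpoints = a + delta * (np.arange(n) + 0.5)
    return delta * g(midpoints).sum()

# Illustrative test: the integral of sin over [0, pi] is exactly 2.
for n in [10, 20, 40, 80]:
    err = abs(midpoint_integrate(np.sin, 0.0, np.pi, n) - 2.0)
    print(n, err)
```

The bound (2) guarantees at least a $\frac{1}{n}$ rate; for a smooth function like this one the midpoint rule in fact does even better.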
The simple approach we described above can be extended to higher dimensional functions. Consider a function $g: \mathbb{R}^d \to \mathbb{R}$ and suppose that we are interested in the numeric evaluation of
$$\int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d.$$

Note that for notational simplicity we have assumed that we are interested in the integral over $[a, b]^d$, but all the discussion remains true for more general forms of intervals.

Figure 2: Depiction of integration in dimensions greater than 1.

Then, again, if we would like to perform this integration, we should break $[a, b]^d$ into subintervals and approximate the integral with
$$\sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \approx \int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d.$$

Figure 2 depicts the partitions we consider for the numeric integration. However, as you can imagine, as the dimension goes up, we should consider many more intervals to achieve a certain level of accuracy. In fact, under some minimal conditions, such as the boundedness of the gradient of the function, one can prove that
$$\left| \int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d - \sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \right| \leq \frac{C'}{n^{1/d}}, \tag{3}$$

where $C'$ is a constant that depends on the maximum size of the partial derivatives and the interval of integration, but is free of $n$. We will prove this statement later. As is clear from (3), as the dimension increases the rate of decay of the error decreases. This means that we need more slices to obtain a certain level of accuracy. As the number of points (slices) increases, the computational complexity of the integration increases. Is it clear why? Therefore, numeric integration becomes dramatically slower.
Let me give you an example. Assume for a moment that $C' = 1$ and we want to evaluate the integral with error $= 0.01$. Then we have
$$\frac{1}{n^{1/d}} \leq 0.01 \quad \Rightarrow \quad n \geq 10^{2d}.$$
This means that if you want to calculate a numerical integral of a 10-dimensional function, then you have to calculate the function at $10^{20}$ points!
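A quick back-of-the-envelope computation in Python (a sketch under the same $C' = 1$ assumption) makes the explosion visible:

```python
# Grid points needed for error <= 0.01 when C' = 1:
# 1 / n**(1/d) <= 0.01  implies  n >= 100**d = 10**(2*d).
for d in [1, 2, 5, 10]:
    n = 100 ** d
    print(f"d = {d:2d}: need n >= 10^{2 * d} grid points ({n:.0e})")
```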
Those of you who have tried MATLAB and R numeric integration have noticed this problem: it is almost impossible to get any result from them when the dimension is larger than 5. Now, let's try to prove the upper bound for the $d$-dimensional numeric integration. Suppose that the function $g$ satisfies the following property:
$$\max_{i=1,2,\dots,d}\; \sup_{x_1, x_2, \dots, x_d} \left| \frac{\partial g(x_1, x_2, \dots, x_d)}{\partial x_i} \right| \leq C. \tag{4}$$

We would like to calculate the integral
$$\int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d.$$
We partition the integration region into $n$ equal-volume cubes of size $\Delta^d$, where $n\Delta^d = (b-a)^d$. Call the center of the $i$th cube $(x_1^i, x_2^i, \dots, x_d^i)$. We then have
Theorem 1. Let $g$ be a $d$-variate function that satisfies (4). Then
$$\left| \int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d - \sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \right| \leq \frac{Cd(b-a)^{d+1}}{2n^{1/d}}.$$

Proof.
$$\begin{aligned}
&\left| \int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d - \sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \right| \\
&\overset{(a)}{=} \left| \sum_{i=1}^{n} \int_{x_1^i - \Delta/2}^{x_1^i + \Delta/2} \dots \int_{x_d^i - \Delta/2}^{x_d^i + \Delta/2} g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d - \sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \right| \\
&\overset{(b)}{=} \left| \sum_{i=1}^{n} \int_{x_1^i - \Delta/2}^{x_1^i + \Delta/2} \dots \int_{x_d^i - \Delta/2}^{x_d^i + \Delta/2} \left( g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i) \right) dx_1\, dx_2 \dots dx_d \right| \\
&\overset{(c)}{\leq} \sum_{i=1}^{n} \int_{x_1^i - \Delta/2}^{x_1^i + \Delta/2} \dots \int_{x_d^i - \Delta/2}^{x_d^i + \Delta/2} \left| g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i) \right| dx_1\, dx_2 \dots dx_d. \tag{5}
\end{aligned}$$

Therefore, we should find an upper bound for $|g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)|$. We do it in the following way:
$$\begin{aligned}
&|g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \\
&\overset{(d)}{=} |g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2, x_3, \dots, x_d) + g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \\
&\overset{(e)}{\leq} |g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2, x_3, \dots, x_d)| + |g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)|. \tag{6}
\end{aligned}$$
It is straightforward to use the mean value theorem to bound the first term above. However, the second term still looks complicated. Therefore, we use the same technique to simplify the second term:
$$\begin{aligned}
&|g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \\
&= |g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, x_3, \dots, x_d) + g(x_1^i, x_2^i, x_3, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \\
&\leq |g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, x_3, \dots, x_d)| + |g(x_1^i, x_2^i, x_3, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)|. \tag{7}
\end{aligned}$$
If we repeat this process we can easily provide the following upper bound:
$$\begin{aligned}
&|g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \\
&\overset{(f)}{\leq} |g(x_1, x_2, \dots, x_d) - g(x_1^i, x_2, x_3, \dots, x_d)| + |g(x_1^i, x_2, x_3, \dots, x_d) - g(x_1^i, x_2^i, x_3, \dots, x_d)| + \dots \\
&\quad + |g(x_1^i, x_2^i, \dots, x_{d-1}^i, x_d) - g(x_1^i, x_2^i, \dots, x_d^i)| \overset{(g)}{\leq} \frac{Cd\Delta}{2}. \tag{8}
\end{aligned}$$
If we combine (5) with (8) we obtain
$$\left| \int_a^b \int_a^b \dots \int_a^b g(x_1, x_2, \dots, x_d)\, dx_1\, dx_2 \dots dx_d - \sum_{i=1}^{n} g(x_1^i, x_2^i, \dots, x_d^i)\, \Delta^d \right| \leq \frac{nCd\Delta^{d+1}}{2} = \frac{Cd(b-a)^d \Delta}{2} = \frac{Cd(b-a)^{d+1}}{2n^{1/d}}. \tag{9}$$
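As an informal check of Theorem 1, here is a Python sketch that runs the $d$-dimensional midpoint rule on a two-dimensional test case (the function $g(x_1, x_2) = \sin(x_1 + x_2)$ on $[0, 1]^2$, for which $C = 1$, is an illustrative assumption) and compares the error with the bound:

```python
import numpy as np

def grid_integrate(g, a, b, d, m):
    """d-dimensional midpoint rule with m slices per axis (n = m**d cubes)."""
    delta = (b - a) / m
    mids = a + delta * (np.arange(m) + 0.5)
    pts = np.stack(np.meshgrid(*([mids] * d)), axis=-1).reshape(-1, d)
    return (delta ** d) * g(pts).sum()

# Illustrative test: g = sin(x1 + x2) on [0, 1]^2, so |dg/dxi| <= 1 and C = 1.
# The exact integral is 2*sin(1) - sin(2).
g = lambda p: np.sin(p.sum(axis=1))
truth = 2 * np.sin(1.0) - np.sin(2.0)
d, a, b, C = 2, 0.0, 1.0, 1.0
for m in [4, 8, 16]:
    n = m ** d
    err = abs(grid_integrate(g, a, b, d, m) - truth)
    bound = C * d * (b - a) ** (d + 1) / (2 * n ** (1 / d))
    print(n, err, bound)  # the error stays below the theorem's bound
```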

In the next section we will see that in certain cases we can solve this problem more efficiently by the
Monte Carlo method.

1.0.3 Monte Carlo method


Again, consider the problem of evaluating $E[g(x)]$ numerically for $x \sim f(x)$. We assume that drawing iid samples $x_1, x_2, \dots, x_n \overset{iid}{\sim} f(x)$ is not too complicated. Once we have these samples, according to the CLT we have
$$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(x_i) - E[g(x)] \right) \overset{d}{\to} N(0, \mathrm{var}(g)).$$
Practically speaking, this means that $\left| \frac{1}{n} \sum_{i=1}^{n} g(x_i) - E[g(x)] \right|$ is of the order of $\frac{1}{\sqrt{n}}$. If we compare this rate with the rate of the one-dimensional numeric integration in (2), we conclude that the Monte Carlo method is weaker. However, the Monte Carlo method has several advantages. (i) The rate of decay of the Monte Carlo method is still $\frac{1}{\sqrt{n}}$ for multidimensional functions $g: \mathbb{R}^d \to \mathbb{R}$. In other words, as soon as the dimension of the function is greater than 2, the MC method provides a better decay rate.$^1$ (ii) The MC method does not require access to the pdf of the random variable. In many cases the pdf has a very complicated form; however, drawing samples from it is still straightforward. This is the case in many Bayesian settings. We will also see some examples in our course. (iii) The assumptions we have on $g$ are minimal. We don't even need the smoothness of $g$ anymore. With this introduction, let us now state the exact form of the Monte Carlo method.

Monte Carlo method for evaluating $E[g(x)]$

1. Generate $x_1, \dots, x_n \overset{iid}{\sim} f$.

2. Estimate $E[g(x)]$ with $\frac{1}{n} \sum_{i=1}^{n} g(x_i)$.

3. Characterize the confidence interval based on the central limit theorem.
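Here is a minimal Python sketch of these three steps (the target $E[x^2]$ for $x \sim \mathrm{Uniform}(0, 1)$ and the 95% level are illustrative choices); the confidence interval plugs the sample variance of $g$ into the CLT:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo(g, sampler, n):
    """Steps 1-3: sample, average, and form a CLT-based 95% interval."""
    x = sampler(n)                    # 1. generate x_1, ..., x_n iid from f
    vals = g(x)
    est = vals.mean()                 # 2. estimate E[g(x)] by the sample mean
    half = 1.96 * vals.std(ddof=1) / np.sqrt(n)  # 3. CLT half-width
    return est, (est - half, est + half)

# Illustrative target: E[x^2] = 1/3 for x ~ Uniform(0, 1).
est, ci = monte_carlo(lambda x: x ** 2, lambda n: rng.uniform(0.0, 1.0, n), 100_000)
print(est, ci)  # the interval should cover 1/3 about 95% of the time
```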

$^1$ Note that we are ignoring a subtle point here: the variance of the function $g$ usually grows for higher dimensional functions. Therefore, in practice, Monte Carlo methods are not necessarily faster for high dimensional functions. But if the variance of $g$ is not too large, and we can easily obtain an estimate of the variance of $g$ from our samples, then Monte Carlo methods can be efficient.
