
Tutorial on quasi-Monte Carlo methods

Josef Dick
School of Mathematics and Statistics, UNSW, Sydney, Australia
josef.dick@unsw.edu.au
Comparison: MCMC, MC, QMC
Roughly speaking:
Markov chain Monte Carlo and quasi-Monte Carlo are for different types of problems;
if you have a problem where Monte Carlo does not work, then chances are quasi-Monte Carlo will not work either;
if Monte Carlo works, but you want a faster method, try (randomized) quasi-Monte Carlo (some tweaking might be necessary).
Quasi-Monte Carlo is an "experimental design" approach to Monte Carlo simulation.
In this talk we shall discuss how quasi-Monte Carlo can be faster than Monte Carlo under certain assumptions.
Outline
DISCREPANCY, REPRODUCING KERNEL HILBERT SPACES AND WORST-CASE ERROR
QUASI-MONTE CARLO POINT SETS
RANDOMIZATIONS
WEIGHTED FUNCTION SPACES AND TRACTABILITY
The task is to approximate an integral
\[
I_s(f) = \int_{[0,1]^s} f(z)\,dz
\]
for some integrand $f$ by some quadrature rule
\[
Q_{N,s}(f) = \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
\]
at some sample points $x_0, \ldots, x_{N-1} \in [0,1]^s$.
\[
\int_{[0,1]^s} f(x)\,dx \approx \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
\]
In other words:
Area under curve = Volume of cube $\times$ average function value.
If $x_0, \ldots, x_{N-1} \in [0,1]^s$ are chosen randomly $\Rightarrow$ Monte Carlo.
If $x_0, \ldots, x_{N-1} \in [0,1]^s$ are chosen deterministically $\Rightarrow$ quasi-Monte Carlo.
Smooth integrands
Integrand $f\colon [0,1] \to \mathbb{R}$; say continuously differentiable.
We want to study the integration error:
\[
\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n).
\]
Representation:
\[
f(x) = f(1) - \int_x^1 f'(t)\,dt = f(1) - \int_0^1 1_{[0,t]}(x)\,f'(t)\,dt,
\]
where
\[
1_{[0,t]}(x) = \begin{cases} 1 & \text{if } x \in [0,t], \\ 0 & \text{otherwise.} \end{cases}
\]
Integration error
Substitute:
\[
\int_0^1 f(x)\,dx = \int_0^1 \Bigl( f(1) - \int_0^1 1_{[0,t]}(x)\,f'(t)\,dt \Bigr)\,dx
= f(1) - \int_0^1 \int_0^1 1_{[0,t]}(x)\,f'(t)\,dx\,dt,
\]
\[
\frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
= \frac{1}{N}\sum_{n=0}^{N-1} \Bigl( f(1) - \int_0^1 1_{[0,t]}(x_n)\,f'(t)\,dt \Bigr)
= f(1) - \int_0^1 \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n)\,f'(t)\,dt.
\]
Integration error:
\[
\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
= \int_0^1 \Bigl( \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - \int_0^1 1_{[0,t]}(x)\,dx \Bigr) f'(t)\,dt.
\]
Local discrepancy
Integration error (using $\int_0^1 1_{[0,t]}(x)\,dx = t$):
\[
\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
= \int_0^1 \Bigl( \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t \Bigr) f'(t)\,dt.
\]
Let $\mathcal{P} = \{x_0, \ldots, x_{N-1}\} \subset [0,1]$. Define the local discrepancy by
\[
\Delta_{\mathcal{P}}(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t, \qquad t \in [0,1].
\]
Then
\[
\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \int_0^1 \Delta_{\mathcal{P}}(t)\,f'(t)\,dt.
\]
Koksma-Hlawka inequality
\[
\Bigl| \int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \Bigr|
= \Bigl| \int_0^1 \Delta_{\mathcal{P}}(t)\,f'(t)\,dt \Bigr|
\le \Bigl( \int_0^1 |\Delta_{\mathcal{P}}(t)|^p\,dt \Bigr)^{1/p}
\Bigl( \int_0^1 |f'(t)|^q\,dt \Bigr)^{1/q}
= \|\Delta_{\mathcal{P}}\|_{L_p}\,\|f'\|_{L_q},
\qquad \frac{1}{p} + \frac{1}{q} = 1,
\]
where $\|g\|_{L_p} = \bigl( \int |g|^p \bigr)^{1/p}$.
Interpretation of discrepancy
Let $\mathcal{P} = \{x_0, \ldots, x_{N-1}\} \subset [0,1]$. Recall the definition of the local discrepancy:
\[
\Delta_{\mathcal{P}}(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t, \qquad t \in [0,1].
\]
The local discrepancy measures the difference between the uniform distribution and the empirical distribution of the quadrature points $\mathcal{P} = \{x_0, \ldots, x_{N-1}\}$.
This is the Kolmogorov-Smirnov statistic for the difference between the empirical distribution of $x_0, \ldots, x_{N-1}$ and the uniform distribution.
Function space
Representation:
\[
f(x) = f(1) - \int_x^1 f'(t)\,dt.
\]
Define the inner product
\[
\langle f, g \rangle = f(1)\,g(1) + \int_0^1 f'(t)\,g'(t)\,dt
\]
and the norm
\[
\|f\| = \sqrt{ |f(1)|^2 + \int_0^1 |f'(t)|^2\,dt }.
\]
Function space:
\[
\mathcal{H} = \{ f\colon [0,1] \to \mathbb{R} : f \text{ absolutely continuous and } \|f\| < \infty \}.
\]
Worst-case error
The worst-case error is defined by
\[
e(\mathcal{H}, \mathcal{P}) = \sup_{f \in \mathcal{H},\, \|f\| \le 1}
\Bigl| \int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \Bigr|.
\]
Reproducing kernel Hilbert space
Recall:
\[
f(y) = f(1)\cdot 1 + \int_0^1 f'(x)\,\bigl(-1_{[0,x]}(y)\bigr)\,dx
\]
and
\[
\langle f, g \rangle = f(1)\,g(1) + \int_0^1 f'(x)\,g'(x)\,dx.
\]
Goal: find a set of functions $g_y \in \mathcal{H}$, one for each $y \in [0,1]$, such that
\[
\langle f, g_y \rangle = f(y) \quad \text{for all } f \in \mathcal{H}.
\]
Conclusions:
$g_y(1) = 1$ for all $y \in [0,1]$;
\[
g_y'(x) = -1_{[0,x]}(y) = \begin{cases} -1 & \text{if } y \le x, \\ 0 & \text{otherwise;} \end{cases}
\]
make $g_y$ continuous such that $g_y \in \mathcal{H}$.
Reproducing kernel Hilbert space
It follows that
\[
g_y(x) = 1 + \min\{1-x,\,1-y\}.
\]
The function
\[
K(x, y) := g_y(x) = 1 + \min\{1-x,\,1-y\}, \qquad x, y \in [0,1],
\]
is called the reproducing kernel.
The space $\mathcal{H}$ is called a reproducing kernel Hilbert space (with reproducing kernel $K$).
Numerical integration in reproducing kernel Hilbert spaces
Function representation:
\[
\int_0^1 f(z)\,dz = \int_0^1 \langle f, K(\cdot, z) \rangle\,dz
= \Bigl\langle f, \int_0^1 K(\cdot, z)\,dz \Bigr\rangle,
\]
\[
\frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
= \frac{1}{N}\sum_{n=0}^{N-1} \langle f, K(\cdot, x_n) \rangle
= \Bigl\langle f, \frac{1}{N}\sum_{n=0}^{N-1} K(\cdot, x_n) \Bigr\rangle.
\]
Integration error
\[
\int_0^1 f(z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)
= \Bigl\langle f, \int_0^1 K(\cdot, z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} K(\cdot, x_n) \Bigr\rangle
= \langle f, h \rangle,
\]
where
\[
h(x) = \int_0^1 K(x, z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} K(x, x_n).
\]
Worst-case error in reproducing kernel Hilbert spaces
Thus
\[
e(\mathcal{H}, \mathcal{P})
= \sup_{f \in \mathcal{H},\, \|f\| \le 1} \Bigl| \int_0^1 f(z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \Bigr|
= \sup_{f \in \mathcal{H},\, \|f\| = 1} |\langle f, h \rangle|
= \sup_{f \in \mathcal{H},\, f \ne 0} \Bigl| \Bigl\langle \frac{f}{\|f\|}, h \Bigr\rangle \Bigr|
= \frac{\langle h, h \rangle}{\|h\|} = \|h\|,
\]
since the supremum is attained by choosing $f = h/\|h\| \in \mathcal{H}$.
Worst-case error in reproducing kernel Hilbert spaces
\[
e^2(\mathcal{H}, \mathcal{P}) = \int_0^1 \int_0^1 K(x, y)\,dx\,dy
- \frac{2}{N}\sum_{n=0}^{N-1} \int_0^1 K(x, x_n)\,dx
+ \frac{1}{N^2}\sum_{n,m=0}^{N-1} K(x_n, x_m).
\]
Numerical integration in higher dimensions
Tensor product space $\mathcal{H}_s = \mathcal{H} \otimes \cdots \otimes \mathcal{H}$.
Reproducing kernel
\[
K(x, y) = \prod_{i=1}^s \bigl[ 1 + \min\{1-x_i,\,1-y_i\} \bigr],
\]
where $x = (x_1, \ldots, x_s),\ y = (y_1, \ldots, y_s) \in [0,1]^s$.
Functions $f \in \mathcal{H}_s$ have partial mixed derivatives up to order 1 in each variable,
\[
\frac{\partial^{|u|} f}{\partial x_u} \in L_2([0,1]^s),
\]
where $u \subseteq \{1, \ldots, s\}$, $x_u = (x_i)_{i \in u}$ and $|u|$ denotes the cardinality of $u$, and where
\[
\frac{\partial^{|u|} f}{\partial x_u}(x_u, 1) = 0 \quad \text{for all } u \subsetneq \{1, \ldots, s\}.
\]
Worst-case error
Again
\[
e^2(\mathcal{H}_s, \mathcal{P}) = \int_{[0,1]^s} \int_{[0,1]^s} K(x, y)\,dx\,dy
- \frac{2}{N}\sum_{n=0}^{N-1} \int_{[0,1]^s} K(x, x_n)\,dx
+ \frac{1}{N^2}\sum_{n,m=0}^{N-1} K(x_n, x_m)
\]
and the integration error can again be expressed via the local discrepancy: for $f \in \mathcal{H}_s$ satisfying the boundary conditions above,
\[
\Bigl| \int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \Bigr|
= \Bigl| \int_{[0,1]^s} \Delta_{\mathcal{P}}(x)\,\frac{\partial^s f}{\partial x}(x)\,dx \Bigr|,
\]
where
\[
\Delta_{\mathcal{P}}(x) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,x]}(x_n) - \prod_{i=1}^s x_i.
\]
Discrepancy in higher dimensions
Point set $\mathcal{P} = \{x_0, \ldots, x_{N-1}\} \subset [0,1]^s$, $t = (t_1, \ldots, t_s) \in [0,1]^s$.
Local discrepancy:
\[
\Delta_{\mathcal{P}}(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - \prod_{i=1}^s t_i,
\]
where $[0,t] = \prod_{i=1}^s [0, t_i]$.
Koksma-Hlawka inequality
Let $f\colon [0,1]^s \to \mathbb{R}$ with
\[
\|f\| = \Bigl( \int_{[0,1]^s} \Bigl| \frac{\partial^s f}{\partial x}(x) \Bigr|^q\,dx \Bigr)^{1/q},
\]
and where
\[
\frac{\partial^{|u|} f}{\partial x_u}(x_u, 1) = 0 \quad \text{for all } u \subsetneq \{1, \ldots, s\}.
\]
Then
\[
\Bigl| \int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \Bigr|
\le \|f\|\,\|\Delta_{\mathcal{P}}\|_{L_p},
\qquad \frac{1}{p} + \frac{1}{q} = 1.
\]
Construct points $\mathcal{P} = \{x_0, \ldots, x_{N-1}\} \subset [0,1]^s$ with small discrepancy
\[
L_p(\mathcal{P}) = \|\Delta_{\mathcal{P}}\|_{L_p}.
\]
It is often useful to consider different reproducing kernels yielding
different worst-case errors and discrepancies. We will see further
examples later.
Constructions of low-discrepancy sequences
Lattice rules (Bilyk, Brauchart, Cools, D., Hellekalek, Hickernell, Hlawka, Joe, Keller, Korobov, Kritzer, Kuo, Larcher, L'Ecuyer, Lemieux, Leobacher, Niederreiter, Nuyens, Pillichshammer, Sinescu, Sloan, Temlyakov, Wang, Woźniakowski, ...)
Digital nets and sequences (Baldeaux, Bierbrauer, Brauchart, Chen, D., Edel, Faure, Hellekalek, Hofer, Keller, Kritzer, Kuo, Larcher, Leobacher, Niederreiter, Owen, Özbudak, Pillichshammer, Pirsic, Schmid, Skriganov, Sobol', Wang, Xing, Yue, ...)
Hammersley-Halton sequences (Atanassov, De Clerck, Faure, Halton, Hammersley, Kritzer, Larcher, Lemieux, Pillichshammer, Pirsic, White, ...)
Kronecker sequences (Beck, Hellekalek, Larcher, Niederreiter, Schoißengeier, ...)
Lattice rules
Let $N \in \mathbb{N}$ and let
\[
g = (g_1, \ldots, g_s) \in \{1, \ldots, N-1\}^s.
\]
Choose the quadrature points as
\[
x_n = \Bigl\{ \frac{n g}{N} \Bigr\}, \qquad n = 0, \ldots, N-1,
\]
where $\{z\} = z - \lfloor z \rfloor$ for $z \in \mathbb{R}_{\ge 0}$ denotes the fractional part, applied componentwise.
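A minimal sketch (assuming NumPy) that generates a rank-1 lattice point set from $N$ and $g$ as defined above; the Fibonacci example $N = 55$, $g = (1, 34)$ is taken from the next slide.

```python
import numpy as np

def lattice_points(N, g):
    """Rank-1 lattice point set x_n = {n g / N}, n = 0, ..., N-1."""
    g = np.asarray(g, dtype=np.int64)
    n = np.arange(N).reshape(-1, 1)
    return (n * g % N) / N            # fractional parts of n*g/N, componentwise

# Fibonacci lattice rule: N = 55 points with generating vector g = (1, 34).
P = lattice_points(55, [1, 34])
print(P[:5])
```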
Fibonacci lattice rules
Lattice rule with N = 55 points and generating vector g = (1, 34).
Fibonacci lattice rules
Lattice rule with N = 89 points and generating vector g = (1, 55).
In comparison: Random point set
Random set of 64 points generated by Matlab using Mersenne
Twister.
Lattice rules
How to find the generating vector $g$?
Reproducing kernel:
\[
K_\alpha(x, y) = \prod_{i=1}^s \bigl( 1 + 2 B_\alpha(\{x_i - y_i\}) \bigr),
\]
where $\{x_i - y_i\} = (x_i - y_i) - \lfloor x_i - y_i \rfloor$ is the fractional part of $x_i - y_i$.
Reproducing kernel Hilbert space of Fourier series:
\[
f(x) = \sum_{h \in \mathbb{Z}^s} \hat{f}(h)\, e^{2\pi i h \cdot x}.
\]
Worst-case error:
\[
e^2_\alpha(g) = -1 + \frac{1}{N}\sum_{n=0}^{N-1} \prod_{i=1}^s
\Bigl( 1 + 2 B_\alpha\Bigl(\Bigl\{ \frac{n g_i}{N} \Bigr\}\Bigr) \Bigr),
\]
where $B_\alpha$ is the Bernoulli polynomial of order $\alpha$. For instance,
\[
B_2(x) = x^2 - x + \tfrac{1}{6}.
\]
Component-by-component construction (Korobov, Sloan-Reztsov, Sloan-Kuo-Joe, Nuyens-Cools)
Set $g_1 = 1$.
For $d = 2, \ldots, s$ assume that we have found $g_1, \ldots, g_{d-1}$. Then find $g_d \in \{1, \ldots, N-1\}$ which minimizes $e^2_\alpha(g_1, \ldots, g_{d-1}, g)$ as a function of $g$, i.e.
\[
g_d = \operatorname*{argmin}_{g \in \{1, \ldots, N-1\}} e^2_\alpha(g_1, \ldots, g_{d-1}, g).
\]
Using the fast Fourier transform (Nuyens, Cools), a good generating vector $g \in \{1, \ldots, N-1\}^s$ can be found in $O(sN\log N)$ operations.
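The following is a naive sketch of the component-by-component search (assuming NumPy), using the $B_2$-based criterion from the previous slide. It runs in $O(sN^2)$ operations rather than the $O(sN\log N)$ achievable with the FFT-based approach of Nuyens and Cools; the values $N = 127$, $s = 5$ are only an example.

```python
import numpy as np

def bernoulli2(x):
    """Bernoulli polynomial B_2(x) = x^2 - x + 1/6, evaluated on [0, 1)."""
    return x**2 - x + 1.0 / 6.0

def cbc(N, s):
    """Naive O(s N^2) component-by-component construction minimizing
    e^2(g) = -1 + (1/N) sum_n prod_i (1 + 2 B_2({n g_i / N}))."""
    n = np.arange(N)
    g = [1]
    # prod[n] = prod over the components found so far of (1 + 2 B_2({n g_i / N}))
    prod = 1.0 + 2.0 * bernoulli2((n * g[0] % N) / N)
    for d in range(2, s + 1):
        best = (np.inf, None, None)
        for cand in range(1, N):
            trial = prod * (1.0 + 2.0 * bernoulli2((n * cand % N) / N))
            err = -1.0 + trial.mean()
            if err < best[0]:
                best = (err, cand, trial)
        g.append(best[1])
        prod = best[2]
    return np.array(g)

print(cbc(127, 5))    # a generating vector for N = 127, s = 5
```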
Hammersley-Halton sequence
Radical inverse function in base $b$:
Let $n \in \mathbb{N}_0$ have base-$b$ expansion
\[
n = n_0 + n_1 b + \cdots + n_{a-1} b^{a-1}.
\]
Set
\[
\phi_b(n) = \frac{n_0}{b} + \frac{n_1}{b^2} + \cdots + \frac{n_{a-1}}{b^a} \in [0,1].
\]
Hammersley-Halton sequence:
Let $p_1, p_2, \ldots$ be the prime numbers in increasing order, i.e. $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7, \ldots$.
Define the quadrature points $x_0, x_1, \ldots$ by
\[
x_n = \bigl( \phi_{p_1}(n), \phi_{p_2}(n), \ldots, \phi_{p_s}(n) \bigr) \quad \text{for } n = 0, 1, 2, \ldots.
\]
Hammersley-Halton sequence
Hammersley-Halton point set with 64 points.
Digital nets
Choose a prime number $b$ and the finite field $\mathbb{Z}_b = \{0, 1, \ldots, b-1\}$ of order $b$.
Choose $C_1, \ldots, C_s \in \mathbb{Z}_b^{m \times m}$.
Let $n = n_0 + n_1 b + \cdots + n_{m-1} b^{m-1}$ and set
\[
\vec{n} = (n_0, \ldots, n_{m-1})^\top \in \mathbb{Z}_b^m.
\]
Let
\[
\vec{y}_{n,i} = C_i \vec{n} \quad \text{for } 1 \le i \le s,\ 0 \le n < b^m.
\]
For $\vec{y}_{n,i} = (y_{n,i,1}, \ldots, y_{n,i,m})^\top \in \mathbb{Z}_b^m$ let
\[
x_{n,i} = \frac{y_{n,i,1}}{b} + \cdots + \frac{y_{n,i,m}}{b^m}.
\]
Set $x_n = (x_{n,1}, \ldots, x_{n,s})$ for $0 \le n < b^m$.
(t, m, s)-net property
Let $m, s \ge 1$ and $b \ge 2$ be integers. A point set $\mathcal{P} = \{x_0, \ldots, x_{b^m-1}\}$ is called a $(t, m, s)$-net in base $b$ if, for all integers $d_1, \ldots, d_s \ge 0$ with
\[
d_1 + \cdots + d_s = m - t,
\]
the number of points in each of the elementary intervals
\[
\prod_{i=1}^s \Bigl[ \frac{a_i}{b^{d_i}}, \frac{a_i+1}{b^{d_i}} \Bigr),
\qquad 0 \le a_i < b^{d_i},
\]
is $b^t$.
If $\mathcal{P} = \{x_0, \ldots, x_{b^m-1}\} \subset [0,1]^s$ is a $(t, m, s)$-net, then the local discrepancy function satisfies
\[
\Delta_{\mathcal{P}}\Bigl( \frac{a_1}{b^{d_1}}, \ldots, \frac{a_s}{b^{d_s}} \Bigr) = 0
\]
for all $0 \le a_i < b^{d_i}$, $d_i \ge 0$, $d_1 + \cdots + d_s = m - t$.
Sobol' point set
The first 64 points of a Sobol' sequence, which form a (0, 6, 2)-net in base 2.
Niederreiter-Xing point set
The first 64 points of a Niederreiter-Xing sequence.
Niederreiter-Xing point set
The first 1024 points of a Niederreiter-Xing sequence.
Kronecker sequence
Let $\alpha_1, \ldots, \alpha_s \in \mathbb{R}$ be such that $1, \alpha_1, \ldots, \alpha_s$ are linearly independent over $\mathbb{Q}$.
Let
\[
x_n = \bigl( \{n\alpha_1\}, \ldots, \{n\alpha_s\} \bigr) \quad \text{for } n = 0, 1, 2, \ldots.
\]
For instance, one can choose $\alpha_i = \sqrt{p_i}$, where $p_i$ is the $i$-th prime number.
Kronecker sequence
The first 64 points of a Kronecker sequence.
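A minimal sketch (assuming NumPy) of the Kronecker construction with $\alpha_i = \sqrt{p_i}$ as suggested above; the prime generator is the same illustrative trial-division loop as before.

```python
import numpy as np

def kronecker_points(N, s):
    """First N Kronecker points ({n*alpha_1}, ..., {n*alpha_s}), alpha_i = sqrt(p_i)."""
    primes, cand = [], 2
    while len(primes) < s:                    # first s primes by trial division
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    alpha = np.sqrt(np.array(primes, dtype=float))
    n = np.arange(N).reshape(-1, 1)
    return (n * alpha) % 1.0                  # componentwise fractional part

print(kronecker_points(5, 2))
```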
Discrepancy bounds
It is known that for all the constructions above one has
\[
L_p(\mathcal{P}) \le C_s \frac{(\log N)^{c(s)}}{N}
\quad \text{for all } N \ge 1,\ 1 \le p \le \infty,
\]
where $C_s > 0$ and $c(s) \le s$.
Lower bound: for all point sets $\mathcal{P}$ consisting of $N$ points we have
\[
L_p(\mathcal{P}) \ge C'_s \frac{(\log N)^{(s-1)/2}}{N}
\quad \text{for all } N \ge 1,\ 1 < p \le \infty.
\]
In comparison, for a random point set:
\[
\mathbb{E}(L_2(\mathcal{P})) = O\Bigl( \sqrt{\frac{\log\log N}{N}} \Bigr).
\]
For $1 < p < \infty$, the exact order of convergence is known to be of order
\[
\frac{(\log N)^{(s-1)/2}}{N}.
\]
Explicit constructions of such points were achieved by Chen & Skriganov for $p = 2$ and by Skriganov for $1 < p < \infty$.
Great open problem: what is the correct order of convergence of
\[
\min_{\substack{\mathcal{P} \subset [0,1]^s \\ |\mathcal{P}| = N}} L_\infty(\mathcal{P})\,?
\]
Randomized quasi-Monte Carlo
Introduce a random element into the deterministic point set.
This has several advantages:
it yields an unbiased estimator;
there is a statistical error estimate;
the random-case error has a better rate of convergence than the worst-case error;
it performs similarly to Monte Carlo for $L_2$ functions but has a better rate of convergence for smooth functions.
Randomly shifted lattice rules
Lattice rules can be randomized using a random shift.
Lattice point set:
\[
\Bigl\{ \frac{n g}{N} \Bigr\}, \qquad n = 0, 1, \ldots, N-1.
\]
Shifted lattice rule: choose $\Delta \in [0,1]^s$ uniformly distributed; then the shifted lattice point set is given by
\[
\Bigl\{ \frac{n g}{N} + \Delta \Bigr\}, \qquad n = 0, 1, \ldots, N-1.
\]
Lattice point set
Shifted lattice point set
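A short sketch (assuming NumPy) of a randomly shifted lattice rule, together with the usual way it is used in practice: average over a few independent shifts to obtain an unbiased estimate and a statistical error estimate. The generating vector (1, 47) and the product-form test integrand are arbitrary illustrative choices, not optimized ones.

```python
import numpy as np

def shifted_lattice(N, g, rng=None):
    """Randomly shifted rank-1 lattice: x_n = {n g / N + shift}, shift ~ U[0,1)^s."""
    rng = np.random.default_rng(rng)
    g = np.asarray(g, dtype=np.int64)
    shift = rng.random(len(g))
    n = np.arange(N).reshape(-1, 1)
    return ((n * g % N) / N + shift) % 1.0

f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)    # toy integrand with integral 1
estimates = []
for r in range(10):                               # 10 independent random shifts
    P = shifted_lattice(127, [1, 47], rng=r)
    estimates.append(np.mean(f(P)))
print(np.mean(estimates), np.std(estimates) / np.sqrt(len(estimates)))
```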
Owen's scrambling
Let $\mathbb{Z}_b = \{0, 1, \ldots, b-1\}$ and let
\[
x = \frac{x_1}{b} + \frac{x_2}{b^2} + \cdots, \qquad x_1, x_2, \ldots \in \mathbb{Z}_b.
\]
Randomly choose permutations $\pi,\ \pi_{x_1},\ \pi_{x_1,x_2},\ \ldots \colon \mathbb{Z}_b \to \mathbb{Z}_b$.
Let
\[
y_1 = \pi(x_1), \qquad
y_2 = \pi_{x_1}(x_2), \qquad
y_3 = \pi_{x_1,x_2}(x_3), \qquad \ldots
\]
Set
\[
y = \frac{y_1}{b} + \frac{y_2}{b^2} + \cdots.
\]
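A straightforward (and deliberately slow) sketch of this nested uniform scrambling, assuming NumPy: for every coordinate, the permutation applied to the $k$-th digit depends on the preceding digits, exactly as in the notation above. The digit depth of 30 is an arbitrary truncation.

```python
import numpy as np

def owen_scramble(points, b=2, depth=30, rng=None):
    """Owen (nested uniform) scrambling of points in [0, 1)^s in base b."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    out = np.zeros_like(points)
    for j in range(points.shape[1]):              # scramble each coordinate separately
        perms = {}                                # one random permutation per digit history
        for i, x in enumerate(points[:, j]):
            y, prefix = 0.0, ()
            for k in range(1, depth + 1):
                digit = int(x * b**k) % b         # k-th base-b digit of x
                if prefix not in perms:
                    perms[prefix] = rng.permutation(b)
                y += perms[prefix][digit] / b**k
                prefix += (digit,)
            out[i, j] = y
    return out

# Example: owen_scramble(digital_net([C1, C2], 2), b=2) scrambles the 8-point
# base-2 net from the digital-net sketch above.
```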
Scrambled Sobol' point set
The first 1024 points of a Sobol' sequence (left) and a scrambled Sobol' sequence (right).
Scrambled net variance
Let
\[
e(R) = \int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{b^m-1} f(y_n), \qquad N = b^m.
\]
Then
\[
\mathbb{E}(e(R)) = 0, \qquad
\operatorname{Var}(e(R)) = O\Bigl( \frac{(\log N)^s}{N^{2\alpha+1}} \Bigr)
\]
for integrands with smoothness $0 \le \alpha \le 1$.
Dependence on the dimension
Although the convergence rate is better for quasi-Monte Carlo, there is a stronger dependence on the dimension:
Monte Carlo: error $= O(N^{-1/2})$;
Quasi-Monte Carlo: error $= O\bigl( (\log N)^{s-1}/N \bigr)$.
Notice that $g(N) := N^{-1}(\log N)^{s-1}$ is an increasing function for $N \le e^{s-1}$.
So if $s$ is large, say $s = 30$, then $N^{-1}(\log N)^{29}$ increases for $N \le 10^{12}$.
Intractable
For the integration error in $\mathcal{H}$ with reproducing kernel
\[
K(x, y) = \prod_{i=1}^s \min\{1-x_i,\,1-y_i\}
\]
it is known that
\[
e^2(\mathcal{H}, \mathcal{P}) \ge \Bigl( 1 - N \Bigl( \frac{8}{9} \Bigr)^s \Bigr) e^2(\mathcal{H}, \emptyset).
\]
The error can only decrease if
\[
N > \text{constant} \cdot \Bigl( \frac{9}{8} \Bigr)^s.
\]
Weighted function spaces (Sloan and Woźniakowski)
Study weighted function spaces: introduce weights $\gamma_1, \gamma_2, \ldots > 0$ and define
\[
K(x, y) = \prod_{i=1}^s \bigl( 1 + \gamma_i \min\{1-x_i,\,1-y_i\} \bigr).
\]
Then if
\[
\sum_{i=1}^\infty \gamma_i < \infty
\]
we have
\[
e(\mathcal{H}_{s,\gamma}, \mathcal{P}) \le C N^{-\tau},
\]
where $C > 0$ is independent of the dimension $s$ and $\tau < 1$.
Tractability
Minimal error over all methods using $N$ points:
\[
e^*_N(\mathcal{H}) = \inf_{\mathcal{P} : |\mathcal{P}| = N} e(\mathcal{H}, \mathcal{P});
\]
Inverse of the error:
\[
N^*(s, \varepsilon) = \min\{ N \in \mathbb{N} : e^*_N(\mathcal{H}) \le \varepsilon\, e^*_0(\mathcal{H}) \}
\quad \text{for } \varepsilon > 0;
\]
Strong tractability:
\[
N^*(s, \varepsilon) \le C \varepsilon^{-\beta}
\]
for some constants $C, \beta > 0$ independent of the dimension $s$.
Sloan and Woźniakowski: for the function space considered above, we get strong tractability if
\[
\sum_{i=1}^\infty \gamma_i < \infty.
\]
Tractability of star-discrepancy
Recall the local discrepancy function
\[
\Delta_{\mathcal{P}}(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t)}(x_n) - t_1 \cdots t_s,
\qquad \text{where } \mathcal{P} = \{x_0, \ldots, x_{N-1}\}.
\]
The star-discrepancy is given by
\[
D^*_N(\mathcal{P}) = \sup_{t \in [0,1]^s} |\Delta_{\mathcal{P}}(t)|.
\]
Result by Heinrich, Novak, Woźniakowski and Wasilkowski:
Minimal star-discrepancy
\[
D^*(s, N) = \inf_{\mathcal{P}} D^*_N(\mathcal{P}) \le C \sqrt{\frac{s}{N}}
\quad \text{for all } s, N \in \mathbb{N}.
\]
This implies that
\[
N^*(s, \varepsilon) \le C' s\, \varepsilon^{-2}.
\]
Extensions, open problems and future research
Quasi-Monte Carlo methods which achieve convergence rates of order $N^{-\alpha}(\log N)^{s\alpha}$, $\alpha > 1$, for sufficiently smooth integrands;
construction of point sets whose star-discrepancy is tractable;
completely uniformly distributed point sets to speed up convergence in Markov chain Monte Carlo algorithms;
infinite-dimensional integration: integrands have infinitely many variables;
connections to codes, orthogonal arrays and experimental designs;
quasi-Monte Carlo for function approximation;
choosing the weights in applications;
quasi-Monte Carlo on the sphere.
Book
Thank You!
Special thanks to
Dirk Nuyens for creating many of the pictures in this talk;
Friedrich Pillichshammer for letting me use a picture of his
daughters;
Fred Hickernell, Peter Kritzer, Art Owen, and Friedrich
Pillichshammer for helpful comments on the slides;
All colleagues who worked on quasi-Monte Carlo methods over
the years.