Coupling
ABSTRACT
Coupling is a method in probability theory through which random objects are put onto a single probability space with the aim of comparing them with each other. Coupling is a powerful tool that has been applied in a wide variety of contexts, e.g. to derive probabilistic inequalities, to prove limit theorems and identify associated rates of convergence, and to obtain approximations.
After that, the course first explains what coupling is and what general framework it fits into; a number of applications are then described, which illustrate the power of coupling and at the same time serve as a guided tour through some key areas of modern probability theory.
The course is intended for master students and PhD students. A basic knowledge of probability theory and measure theory is required.
Lindvall [10] provides a brief history of how coupling was invented in the late 1930s by
Wolfgang Doeblin.
PRELUDE
Contents

1 Introduction
  1.1 Markov chains
  1.2 Birth-death processes
  1.3 Poisson approximation

2 Basic theory of coupling
  2.1 Definition of coupling
  2.2 Coupling inequalities
  2.3 Rates of convergence
  2.4 Distributional coupling
  2.5 Maximal coupling

3 Random walks
  3.1 Random walks in dimension 1
  3.2 Random walks in dimension d
  3.3 Random walks and the discrete Laplacian

4 Card shuffling
  4.1 Random shuffles
  4.2 Top-to-random shuffle

5 Poisson approximation
  5.1 Coupling
  5.2 Stein-Chen method
  5.3 Two applications

6 Markov chains
  6.1 Case 1: Positive recurrent
  6.2 Case 2: Null recurrent
  6.3 Case 3: Transient

7 Probabilistic inequalities
  7.1 Fully ordered state spaces
  7.2 Partially ordered state spaces
    7.2.1 Ordering for probability measures
    7.2.2 Ordering for Markov chains
  7.3 The FKG inequality
  7.4 The Holley inequality

8 Percolation
  8.1 Ordinary percolation
  8.2 Invasion percolation
  8.3 Invasion percolation on regular trees

9 Interacting particle systems

10 Diffusions
  10.1 Diffusions on the half-line
  10.2 Diffusions on the full line
  10.3 Diffusions in higher dimensions

References
References
[1] O. Angel, J. Goodman, F. den Hollander and G. Slade, Invasion percolation on regular trees, Annals of Probability 36 (2008) 420-466.
[2] A.D. Barbour, L. Holst and S. Janson, Poisson Approximation, Oxford Studies in Probability 2, Clarendon Press, Oxford, 1992.
[3] P. Diaconis, The cutoff phenomenon in finite Markov chains, Proc. Natl. Acad. Sci. USA 93 (1996) 1659-1664.
[4] G.R. Grimmett, Percolation, Springer, Berlin, 1989.
[5] O. Häggström, Finite Markov Chains and Algorithmic Applications, London Mathematical Society Student Texts 52, Cambridge University Press, Cambridge, 2002.
[6] F. den Hollander and M.S. Keane, Inequalities of FKG type, Physica 138A (1986) 167-182.
[7] C. Kraaikamp, Markov Chains: An Introduction, lecture notes, TU Delft, 2010.
[8] D.A. Levin, Y. Peres and E.L. Wilmer, Markov Chains and Mixing Times, American Mathematical Society, Providence RI, 2009.
[9] T.M. Liggett, Interacting Particle Systems, Grundlehren der mathematischen Wissenschaften 276, Springer, New York, 1985.
[10] T. Lindvall, W. Doeblin 1915-1940, Annals of Probability 19 (1991) 929-934.
[11] T. Lindvall, Lectures on the Coupling Method, John Wiley & Sons, New York, 1992. Reprint: Dover paperback edition, 2002.
[12] H. Nooitgedagt, Two convergence limits of Markov chains: Cut-off and Metastability, MSc thesis, Mathematical Institute, Leiden University, 31 August 2010.
[13] J.A. Rice, Mathematical Statistics and Data Analysis (3rd edition), Duxbury Advanced Series, Thomson Brooks/Cole, Belmont, California, 2007.
[14] F. Spitzer, Principles of Random Walk, Springer, New York, 1976.
[15] H. Thorisson, Coupling, Stationarity and Regeneration, Springer, New York, 2000.
Introduction
We begin by describing three examples of coupling illustrating both the method and its usefulness. Each of these examples will be worked out in more detail later. The symbol N0 is
used for the set N {0} with N = {1, 2, . . .}. The symbol tv is used for the total variation
distance, which is defined at the beginning of Section 2.
1.1 Markov chains
Let $X = (X_n)_{n \in \mathbb{N}_0}$ be a Markov chain on a countable state space $S$, with initial distribution $\lambda = (\lambda_i)_{i \in S}$ and transition matrix $P = (P_{ij})_{i,j \in S}$. If $X$ is irreducible, aperiodic and positive recurrent, then it has a unique stationary distribution $\pi$ solving the equation $\pi = \pi P$, and
\[ \lim_{n \to \infty} \lambda P^n = \pi \quad \text{componentwise on } S. \tag{1.1} \]
This is the standard Markov Chain Convergence Theorem (MCCT) (see e.g. Häggström [5], Chapter 5, or Kraaikamp [7], Section 2.2).
A coupling proof of (1.1) goes as follows. Let $X' = (X'_n)_{n \in \mathbb{N}_0}$ be an independent copy of the same Markov chain, but starting from $\pi$. Since $\pi P^n = \pi$ for all $n$, $X'$ is stationary. Run $X$ and $X'$ together, and let
\[ T = \inf\{k \in \mathbb{N}_0 \colon X_k = X'_k\} \]
be their first meeting time. Note that $T$ is a stopping time, i.e., for each $n \in \mathbb{N}_0$ the event $\{T = n\}$ is an element of the sigma-algebra generated by $(X_k)_{0 \leq k \leq n}$ and $(X'_k)_{0 \leq k \leq n}$. For $n \in \mathbb{N}_0$, define
\[ X''_n = \begin{cases} X_n, & \text{if } n < T, \\ X'_n, & \text{if } n \geq T. \end{cases} \]
Then, because of the strong Markov property, we have that $X'' = (X''_n)_{n \in \mathbb{N}_0}$ is a copy of $X$. Now write, for $i \in S$,
\[ \begin{aligned}
(\lambda P^n)_i - \pi_i &= \mathbb{P}(X''_n = i) - \mathbb{P}(X'_n = i) \\
&= \mathbb{P}(X''_n = i, T \leq n) + \mathbb{P}(X''_n = i, T > n) - \mathbb{P}(X'_n = i, T \leq n) - \mathbb{P}(X'_n = i, T > n) \\
&= \mathbb{P}(X''_n = i, T > n) - \mathbb{P}(X'_n = i, T > n),
\end{aligned} \]
where we use $\mathbb{P}$ as the generic symbol for probability. Hence
\[ \|\lambda P^n - \pi\|_{tv} = \sum_{i \in S} |(\lambda P^n)_i - \pi_i| \leq \sum_{i \in S} \big[ \mathbb{P}(X''_n = i, T > n) + \mathbb{P}(X'_n = i, T > n) \big] = 2\,\mathbb{P}(T > n). \]
The l.h.s. is the total variation norm of $\lambda P^n - \pi$. The conditions in the MCCT guarantee that $\mathbb{P}(T < \infty) = 1$ (as will be explained in Section 6). The latter is expressed by saying that the coupling is successful. Hence the claim follows by letting $n \to \infty$.
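The mechanics of this proof can be checked numerically. The sketch below uses a small hypothetical three-state chain (the transition matrix is an arbitrary illustration, not taken from the text): it runs two independent copies until they meet, estimates P(T > n) by simulation, and compares the exact total variation distance at time n with the bound 2 P(T > n).

```python
import random

# A small irreducible aperiodic chain on S = {0, 1, 2}; the matrix is a
# hypothetical illustration, not taken from the text.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def step(i, rng):
    """Draw the next state from row i of P."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return len(P[i]) - 1

def meeting_time(x0, y0, rng, max_steps=10_000):
    """First meeting time T of two independent copies of the chain."""
    x, y = x0, y0
    for n in range(max_steps):
        if x == y:
            return n
        x, y = step(x, rng), step(y, rng)
    return max_steps

rng = random.Random(1)
n = 10
times = [meeting_time(0, 2, rng) for _ in range(2000)]
frac_not_met = sum(t > n for t in times) / len(times)

# Exact total variation distance between the two time-n laws:
# ||delta_0 P^n - delta_2 P^n||_tv = sum_i |(P^n)_{0i} - (P^n)_{2i}|.
row0, row2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(n):
    row0 = [sum(row0[i] * P[i][j] for i in range(3)) for j in range(3)]
    row2 = [sum(row2[i] * P[i][j] for i in range(3)) for j in range(3)]
tv = sum(abs(a - b) for a, b in zip(row0, row2))

print(tv, 2 * frac_not_met)  # coupling inequality: tv <= 2 P(T > n)
```

For a fast-mixing chain like this one both quantities are already tiny at n = 10; the inequality is typically far from tight, since the independent coupling makes no effort to bring the copies together.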
1.2 Birth-death processes
Let $X = (X_t)_{t \geq 0}$ be the Markov process with state space $\mathbb{N}_0$, birth rates $b = (b_i)_{i \in \mathbb{N}_0}$, death rates $d = (d_i)_{i \in \mathbb{N}_0}$ ($d_0 = 0$), and initial distribution $\lambda = (\lambda_i)_{i \in \mathbb{N}_0}$. Suppose that $b$ and $d$ are such that $X$ is recurrent (see Kraaikamp [7], Section 3.6, for conditions on $b$ and $d$ that guarantee recurrence). Let $X' = (X'_t)_{t \geq 0}$ be an independent copy of the same Markov process, but starting from a different initial distribution $\mu = (\mu_i)_{i \in \mathbb{N}_0}$. Run $X$ and $X'$ together, and let
\[ T = \inf\{t \geq 0 \colon X_t = X'_t\}. \]
For $t \geq 0$, define
\[ X''_t = \begin{cases} X_t, & \text{if } t < T, \\ X'_t, & \text{if } t \geq T. \end{cases} \]
If $X$ is positive recurrent, then $X$ has a unique stationary distribution $\pi$, solving the equation $\pi P_t = \pi$ for all $t \geq 0$. In that case, by picking $\mu = \pi$ we get
\[ \lim_{t \to \infty} \|\lambda P_t - \pi\|_{tv} = 0. \tag{1.2} \]
1.3 Poisson approximation
Let $Y_m$, $m = 1, \dots, n$, be independent $\{0,1\}$-valued random variables with $\mathbb{P}(Y_m = 1) = p_m$ and $\mathbb{P}(Y_m = 0) = 1 - p_m$, and put $X = \sum_{m=1}^n Y_m$. If all the $p_m$'s are small, then $X$ is approximately Poisson distributed with parameter $\sum_{m=1}^n p_m$ (see Rice [13], Section 2.1.5). How good is this approximation? For $\lambda > 0$, define
\[ p_\lambda(i) = e^{-\lambda} \frac{\lambda^i}{i!}, \qquad i \in \mathbb{N}_0, \tag{1.3} \]
i.e., $p_\lambda$ is the Poisson distribution with parameter $\lambda$.
Thus, it suffices to find a coupling of $X$ and a random variable $X'$ with distribution $p_\lambda$, $\lambda = \sum_{m=1}^n p_m$, that makes them equal with high probability. Choosing them independently will not do.

Let $(Y_m, Y'_m)$, $m = 1, \dots, n$, be independent $\{0,1\} \times \mathbb{N}_0$-valued random variables with distribution
\[ \mathbb{P}\big((Y_m, Y'_m) = (i, i')\big) = \begin{cases}
1 - p_m, & \text{if } i = 0,\, i' = 0, \\
e^{-p_m} - (1 - p_m), & \text{if } i = 1,\, i' = 0, \\
0, & \text{if } i = 0,\, i' \in \mathbb{N}, \\
e^{-p_m} \dfrac{p_m^{i'}}{i'!}, & \text{if } i = 1,\, i' \in \mathbb{N},
\end{cases} \qquad m = 1, \dots, n. \]
By summing out over $i'$, respectively, $i$ we see that
\[ \mathbb{P}(Y_m = i) = \begin{cases} 1 - p_m, & \text{if } i = 0, \\ p_m, & \text{if } i = 1, \end{cases} \qquad \mathbb{P}(Y'_m = i') = e^{-p_m} \frac{p_m^{i'}}{i'!}, \quad i' \in \mathbb{N}_0, \]
i.e., $Y'_m$ is Poisson distributed with parameter $p_m$. Put $X' = \sum_{m=1}^n Y'_m$, so that $X'$ is Poisson distributed with parameter $\lambda$. Then
\[ \begin{aligned}
\mathbb{P}(X \neq X') &= \mathbb{P}\Big( \sum_{m=1}^n Y_m \neq \sum_{m=1}^n Y'_m \Big)
\leq \mathbb{P}\big( \exists\, m = 1, \dots, n \colon Y_m \neq Y'_m \big)
\leq \sum_{m=1}^n \mathbb{P}(Y_m \neq Y'_m) \\
&= \sum_{m=1}^n \Big[ \big( e^{-p_m} - (1 - p_m) \big) + \sum_{i'=2}^{\infty} e^{-p_m} \frac{p_m^{i'}}{i'!} \Big]
= \sum_{m=1}^n p_m \big( 1 - e^{-p_m} \big) \leq \sum_{m=1}^n p_m^2.
\end{aligned} \]
Hence, for $\lambda = \sum_{m=1}^n p_m$,
\[ \|\mathbb{P}(X \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \leq 2 \sum_{m=1}^n p_m^2 \leq 2 \lambda M, \]
with $M = \max_{m=1,\dots,n} p_m$. This quantifies the extent to which the approximation is good for $M$ small. Both $\lambda$ and $M$ will in general depend on $n$. Typical applications will have $\lambda$ of order 1 and $M$ tending to zero as $n \to \infty$.
The coupling produced above will turn out to be the best possible: it is a maximal coupling.
The crux is that (Ym , Ym0 ) = (0, 0) and (1, 1) are given the largest possible probabilities. More
details follow in Section 5.
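The joint distribution above is easy to sample from, which gives a quick numerical sanity check. The following sketch (the value of p is an arbitrary illustration) draws pairs (Y_m, Y'_m) from the table for a single m and verifies that the marginals come out right and that the disagreement probability p(1 − e^{−p}) stays below p².

```python
import math, random

def sample_pair(p, rng):
    """Draw (Y, Y') from the joint table: Y ~ BER(p), Y' ~ POISSON(p)."""
    u = rng.random()
    if u < 1.0 - p:
        return 0, 0                              # cell (0, 0): mass 1 - p
    u -= 1.0 - p
    if u < math.exp(-p) - (1.0 - p):
        return 1, 0                              # cell (1, 0): mass e^{-p} - (1 - p)
    u -= math.exp(-p) - (1.0 - p)
    i, acc = 1, math.exp(-p) * p                 # cells (1, i'), i' >= 1
    while u >= acc and i < 50:
        i += 1
        acc += math.exp(-p) * p**i / math.factorial(i)
    return 1, i

rng = random.Random(7)
p = 0.2                                          # an arbitrary illustrative value
pairs = [sample_pair(p, rng) for _ in range(50_000)]
mean_y = sum(y for y, _ in pairs) / len(pairs)
mean_yp = sum(yp for _, yp in pairs) / len(pairs)
frac_diff = sum(y != yp for y, yp in pairs) / len(pairs)
print(mean_y, mean_yp, frac_diff)  # both means ~ p; frac_diff ~ p(1 - e^{-p}) <= p^2
```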
Basic theory of coupling

We need to arm ourselves with a number of basic facts about coupling before we can proceed to describe further examples.

Definition 2.1 Given a bounded signed measure $\mathbb{M}$ on a measurable space $(E, \mathcal{E})$, the total variation norm of $\mathbb{M}$ is defined as
\[ \|\mathbb{M}\|_{tv} = \sup_{f \colon \|f\|_\infty \leq 1} \int_E f \, d\mathbb{M}, \]
where the supremum runs over all functions $f \colon E \to \mathbb{R}$ that are bounded and measurable w.r.t. $\mathcal{E}$, and $\|f\|_\infty = \sup_{x \in E} |f(x)|$ is the supremum norm.

By the Jordan-Hahn decomposition theorem, there exists a set $D \in \mathcal{E}$ such that $\mathbb{M}^+(\cdot) = \mathbb{M}(\cdot \cap D)$ and $\mathbb{M}^-(\cdot) = -\mathbb{M}(\cdot \cap D^c)$ are both non-negative measures on $(E, \mathcal{E})$. Clearly, $\mathbb{M} = \mathbb{M}^+ - \mathbb{M}^-$ and $\sup_{A \in \mathcal{E}} \mathbb{M}(A) = \mathbb{M}(D) = \mathbb{M}^+(E)$. It therefore follows that $\|\mathbb{M}\|_{tv} = \int (1_D - 1_{D^c}) \, d\mathbb{M} = \mathbb{M}^+(E) + \mathbb{M}^-(E)$. If $\mathbb{M}(E) = 0$, then $\mathbb{M}^+(E) = \mathbb{M}^-(E)$, in which case $\|\mathbb{M}\|_{tv} = 2\,\mathbb{M}^+(E) = 2 \sup_{A \in \mathcal{E}} \mathbb{M}(A)$.
2.1 Definition of coupling

A probability space is a triple $(E, \mathcal{E}, \mathbb{P})$, with $(E, \mathcal{E})$ a measurable space consisting of a sample space $E$ and a $\sigma$-algebra $\mathcal{E}$ of subsets of $E$, and with $\mathbb{P}$ a probability measure on $\mathcal{E}$. Typically, $E$ is a Polish space and $\mathcal{E}$ consists of its Borel sets.
Definition 2.2 A coupling of two probability measures $\mathbb{P}$ and $\mathbb{P}'$ on the same measurable space $(E, \mathcal{E})$ is any (!) probability measure $\hat{\mathbb{P}}$ on the product measurable space $(E \times E, \mathcal{E} \otimes \mathcal{E})$ (where $\mathcal{E} \otimes \mathcal{E}$ is the smallest sigma-algebra containing $\mathcal{E} \times \mathcal{E}$) whose marginals are $\mathbb{P}$ and $\mathbb{P}'$, i.e.,
\[ \mathbb{P} = \hat{\mathbb{P}} \circ \pi^{-1}, \qquad \mathbb{P}' = \hat{\mathbb{P}} \circ (\pi')^{-1}, \]
where $\pi$ and $\pi'$ are the two coordinate projections, $\pi(x, x') = x$ and $\pi'(x, x') = x'$, $(x, x') \in E \times E$.

A similar definition holds for random variables. Given a probability space $(\Omega, \mathcal{F}, \mathbb{Q})$, a random variable $X$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(E, \mathcal{E})$. The image of $\mathbb{Q}$ under $X$ is $\mathbb{P}$, the probability measure of $X$ on $(E, \mathcal{E})$. When we are interested in $X$ only, we may forget about $(\Omega, \mathcal{F}, \mathbb{Q})$ and work with $(E, \mathcal{E}, \mathbb{P})$ only.
Definition 2.3 A coupling of two random variables $X$ and $X'$ taking values in $(E, \mathcal{E})$ is any (!) pair of random variables $(\hat{X}, \hat{X}')$ taking values in $(E \times E, \mathcal{E} \otimes \mathcal{E})$ whose marginals have the same distribution as $X$ and $X'$, i.e.,
\[ \hat{X} \overset{D}{=} X, \qquad \hat{X}' \overset{D}{=} X'. \]

Note: The law $\hat{\mathbb{P}}$ of $(\hat{X}, \hat{X}')$ is a coupling of the laws $\mathbb{P}, \mathbb{P}'$ of $X, X'$ in the sense of Definition 2.2.

Note: Couplings are not unique. Two trivial examples are:
\[ \hat{\mathbb{P}} = \mathbb{P} \otimes \mathbb{P}' \quad \Longleftrightarrow \quad \hat{X}, \hat{X}' \text{ are independent}, \]
\[ \mathbb{P} = \mathbb{P}', \; \hat{\mathbb{P}} \text{ lives on the diagonal} \quad \Longleftrightarrow \quad \hat{X} = \hat{X}'. \]

In applications the challenge is to find a coupling that makes $\hat{\mathbb{P}}(\hat{X} \neq \hat{X}')$ as small as possible. Coupling allows for flexibility: coupling is an art, not a recipe.
2.2 Coupling inequalities
The basic coupling inequality for two random variables $X, X'$ with probability distributions $\mathbb{P}, \mathbb{P}'$ reads as follows:

Theorem 2.4 Given two random variables $X, X'$ with probability distributions $\mathbb{P}, \mathbb{P}'$, any (!) coupling $\hat{\mathbb{P}}$ of $\mathbb{P}, \mathbb{P}'$ satisfies
\[ \|\mathbb{P} - \mathbb{P}'\|_{tv} \leq 2\,\hat{\mathbb{P}}(\hat{X} \neq \hat{X}'). \]

Proof. Pick any $A \in \mathcal{E}$ and write
\[ \begin{aligned}
\mathbb{P}(X \in A) - \mathbb{P}'(X' \in A) &= \hat{\mathbb{P}}(\hat{X} \in A) - \hat{\mathbb{P}}(\hat{X}' \in A) \\
&= \hat{\mathbb{P}}(\hat{X} \in A, \hat{X} = \hat{X}') + \hat{\mathbb{P}}(\hat{X} \in A, \hat{X} \neq \hat{X}') \\
&\qquad - \hat{\mathbb{P}}(\hat{X}' \in A, \hat{X} = \hat{X}') - \hat{\mathbb{P}}(\hat{X}' \in A, \hat{X} \neq \hat{X}') \\
&= \hat{\mathbb{P}}(\hat{X} \in A, \hat{X} \neq \hat{X}') - \hat{\mathbb{P}}(\hat{X}' \in A, \hat{X} \neq \hat{X}').
\end{aligned} \]
Hence, by Definition 2.1,
\[ \|\mathbb{P} - \mathbb{P}'\|_{tv} = 2 \sup_{A \in \mathcal{E}} \big[ \mathbb{P}(A) - \mathbb{P}'(A) \big] \leq 2 \sup_{A \in \mathcal{E}} \hat{\mathbb{P}}(\hat{X} \in A, \hat{X} \neq \hat{X}') = 2\,\hat{\mathbb{P}}(\hat{X} \neq \hat{X}').
\]
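As a concrete illustration of Theorem 2.4, the following sketch (with two arbitrary hypothetical distributions on three points) computes the total variation distance exactly and compares it with 2 P̂(X̂ ≠ X̂') for the independent coupling, which is typically far from optimal.

```python
from itertools import product

# Two arbitrary hypothetical distributions on {0, 1, 2}.
P  = {0: 0.5, 1: 0.3, 2: 0.2}
Pp = {0: 0.2, 1: 0.3, 2: 0.5}

# ||P - P'||_tv = 2 sup_A [P(A) - P'(A)] = sum_x |P(x) - P'(x)|.
tv = sum(abs(P[x] - Pp[x]) for x in P)

# Independent coupling Phat = P x P': its mismatch probability Phat(X != X').
mismatch_indep = sum(P[x] * Pp[y] for x, y in product(P, Pp) if x != y)

print(tv, 2 * mismatch_indep)  # Theorem 2.4: tv <= 2 * mismatch probability
```

Here the distance is 0.6 while the independent coupling only gives the bound 1.42; Theorem 2.10 below shows that a cleverer coupling closes this gap completely.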
There is also a version of the coupling inequality for sequences of random variables. Let $X = (X_n)_{n \in \mathbb{N}_0}$ and $X' = (X'_n)_{n \in \mathbb{N}_0}$ be two sequences of random variables taking values in $(E^{\mathbb{N}_0}, \mathcal{E}^{\otimes \mathbb{N}_0})$. Let $(\hat{X}, \hat{X}')$ be a coupling of $X$ and $X'$. Define
\[ T = \inf\{n \in \mathbb{N}_0 \colon \hat{X}_m = \hat{X}'_m \text{ for all } m \geq n\}, \]
which is the coupling time of $\hat{X}$ and $\hat{X}'$, i.e., the first time from which the two sequences agree onwards.

Theorem 2.5 For two sequences of random variables $X = (X_n)_{n \in \mathbb{N}_0}$ and $X' = (X'_n)_{n \in \mathbb{N}_0}$ taking values in $(E^{\mathbb{N}_0}, \mathcal{E}^{\otimes \mathbb{N}_0})$, let $(\hat{X}, \hat{X}')$ be a coupling of $X$ and $X'$, and let $T$ be the coupling time. Then
\[ \|\mathbb{P}(X_n \in \cdot\,) - \mathbb{P}'(X'_n \in \cdot\,)\|_{tv} \leq 2\,\hat{\mathbb{P}}(T > n). \]

Proof. This follows from Theorem 2.4 because $\{\hat{X}_n \neq \hat{X}'_n\} \subseteq \{T > n\}$.

In Section 1.1 we already saw an example of sequence coupling: there $X$ and $X'$ were two copies of a Markov chain starting from different initial distributions.
A stronger form of sequence coupling can be obtained by introducing the left-shift $\theta$ on $E^{\mathbb{N}_0}$, defined by
\[ \theta(x_0, x_1, x_2, \dots) = (x_1, x_2, \dots), \]
i.e., drop the first element of the sequence.

Theorem 2.6 Let $X, X'$ and $T$ be defined as in Theorem 2.5. Then
\[ \|\mathbb{P}(\theta^n X \in \cdot\,) - \mathbb{P}'(\theta^n X' \in \cdot\,)\|_{tv} \leq 2\,\hat{\mathbb{P}}(T > n). \]

Proof. This also follows from Theorem 2.4 because
\[ \{\hat{X}_m \neq \hat{X}'_m \text{ for some } m \geq n\} \subseteq \{T > n\}. \]

Note: Similar inequalities hold for continuous-time random processes $X = (X_t)_{t \geq 0}$ and $X' = (X'_t)_{t \geq 0}$.
Since the total variation distance never increases under a mapping, we have the following corollary.

Corollary 2.7 Let $\psi$ be a measurable map from $(E, \mathcal{E})$ to $(E^*, \mathcal{E}^*)$. Let $\mathbb{Q} = \mathbb{P} \circ \psi^{-1}$ and $\mathbb{Q}' = \mathbb{P}' \circ \psi^{-1}$ (i.e., $\mathbb{Q}(B) = \mathbb{P}(\psi^{-1}(B))$ and $\mathbb{Q}'(B) = \mathbb{P}'(\psi^{-1}(B))$ for $B \in \mathcal{E}^*$). Then
\[ \|\mathbb{Q} - \mathbb{Q}'\|_{tv} \leq \|\mathbb{P} - \mathbb{P}'\|_{tv} \leq 2\,\hat{\mathbb{P}}(\hat{X} \neq \hat{X}'). \]

Proof. Simply estimate
\[ \|\mathbb{Q} - \mathbb{Q}'\|_{tv} = 2 \sup_{B \in \mathcal{E}^*} \big[ \mathbb{Q}(B) - \mathbb{Q}'(B) \big] \leq 2 \sup_{A \in \mathcal{E}} \big[ \mathbb{P}(A) - \mathbb{P}'(A) \big] = \|\mathbb{P} - \mathbb{P}'\|_{tv} \qquad (A = \psi^{-1}(B)), \]
where the inequality comes from the fact that $\mathcal{E}$ may be larger than $\psi^{-1}(\mathcal{E}^*)$. Use Theorem 2.4 to get the second bound.
2.3 Rates of convergence
Suppose that we have some control on the moments of the coupling time $T$, e.g. for some $\phi \colon \mathbb{N}_0 \to [0, \infty)$ non-decreasing with $\lim_{n \to \infty} \phi(n) = \infty$ we know that
\[ \hat{\mathbb{E}}(\phi(T)) < \infty. \]

Theorem 2.8 Let $X, X'$ and $\phi$ be as above. Then
\[ \|\mathbb{P}(\theta^n X \in \cdot\,) - \mathbb{P}'(\theta^n X' \in \cdot\,)\|_{tv} = o\big( 1/\phi(n) \big) \quad \text{as } n \to \infty. \]

Proof. Estimate
\[ \phi(n)\,\hat{\mathbb{P}}(T > n) \leq \hat{\mathbb{E}}\big( \phi(T)\, 1_{\{T > n\}} \big). \]
Note that the r.h.s. tends to zero as $n \to \infty$ by dominated convergence because $\hat{\mathbb{E}}(\phi(T)) < \infty$, and use Theorem 2.6.

Similar results hold for continuous-time random processes. Typical examples are:
\[ \phi(n) = n^\alpha, \; \alpha > 0 \text{ (polynomial rate)}, \qquad \phi(n) = e^{\alpha n}, \; \alpha > 0 \text{ (exponential rate)}. \]
For instance, for finite-state irreducible aperiodic Markov chains there exists an $M < \infty$ such that $\hat{\mathbb{P}}(T > 2M \mid T > M) \leq \frac12$ (see Häggström [5], Chapter 5), which implies that there exists an $\alpha > 0$ such that $\hat{\mathbb{E}}(e^{\alpha T}) < \infty$. In Section 3 we will see that for random walks we typically have $\hat{\mathbb{E}}(T^\alpha) < \infty$ for all $0 < \alpha < \frac12$.
2.4 Distributional coupling
Suppose that a coupling $(\hat{X}, \hat{X}')$ of two random sequences $X = (X_n)_{n \in \mathbb{N}_0}$ and $X' = (X'_n)_{n \in \mathbb{N}_0}$ comes with two random times $T$ and $T'$ such that not only
\[ \hat{X} \overset{D}{=} X, \qquad \hat{X}' \overset{D}{=} X', \]
but also
\[ (\theta^T \hat{X}, T) \overset{D}{=} (\theta^{T'} \hat{X}', T'). \]
Here we compare the two sequences shifted over different random times, rather than the same random time.

Theorem 2.9 Let $X, X', T, T'$ be as above. Then
\[ \|\mathbb{P}(\theta^n X \in \cdot\,) - \mathbb{P}'(\theta^n X' \in \cdot\,)\|_{tv} \leq 2\,\hat{\mathbb{P}}(T > n) = 2\,\hat{\mathbb{P}}(T' > n). \]

Proof. Write, for $A \in \mathcal{E}^{\otimes \mathbb{N}_0}$,
\[ \hat{\mathbb{P}}(\theta^n \hat{X} \in A, T \leq n) = \sum_{m=0}^n \hat{\mathbb{P}}\big( \theta^{n-m}(\theta^m \hat{X}) \in A, T = m \big) = \sum_{m=0}^n \hat{\mathbb{P}}\big( \theta^{n-m}(\theta^m \hat{X}') \in A, T' = m \big) = \hat{\mathbb{P}}(\theta^n \hat{X}' \in A, T' \leq n). \]
It follows that
\[ \hat{\mathbb{P}}(\theta^n \hat{X} \in A) - \hat{\mathbb{P}}(\theta^n \hat{X}' \in A) = \hat{\mathbb{P}}(\theta^n \hat{X} \in A, T > n) - \hat{\mathbb{P}}(\theta^n \hat{X}' \in A, T' > n) \leq \hat{\mathbb{P}}(T > n), \]
and hence
\[ \|\mathbb{P}(\theta^n X \in \cdot\,) - \mathbb{P}'(\theta^n X' \in \cdot\,)\|_{tv} = 2 \sup_{A \in \mathcal{E}^{\otimes \mathbb{N}_0}} \big[ \hat{\mathbb{P}}(\theta^n \hat{X} \in A) - \hat{\mathbb{P}}(\theta^n \hat{X}' \in A) \big] \leq 2\,\hat{\mathbb{P}}(T > n). \]
Finally, $T \overset{D}{=} T'$ by the distributional identity above, so that $\hat{\mathbb{P}}(T > n) = \hat{\mathbb{P}}(T' > n)$.
Note: A restrictive feature of distributional coupling is that $T \overset{D}{=} T'$, i.e., the two random times must have the same distribution. In Section 3 we will encounter an example illustrating the usefulness of distributional coupling.
2.5 Maximal coupling
Does there exist a best possible coupling, one that gives the sharpest estimate on the total variation, in the sense that the inequality in Theorem 2.4 becomes an equality? The answer is yes!

Theorem 2.10 For any two probability measures $\mathbb{P}$ and $\mathbb{P}'$ on a measurable space $(E, \mathcal{E})$ there exists a coupling $\hat{\mathbb{P}}$ such that
(i) $\|\mathbb{P} - \mathbb{P}'\|_{tv} = 2\,\hat{\mathbb{P}}(\hat{X} \neq \hat{X}')$;
(ii) $\hat{X}$ and $\hat{X}'$ are independent conditional on $\{\hat{X} \neq \hat{X}'\}$, provided the latter event has positive probability.

Proof. Let $\Delta = \{(x, x) \colon x \in E\}$ be the diagonal of $E \times E$. Let $\phi \colon E \to E \times E$ be the map defined by $\phi(x) = (x, x)$, which is measurable because $E$ is a Polish space. Put
\[ \nu = \mathbb{P} + \mathbb{P}', \qquad g = \frac{d\mathbb{P}}{d\nu}, \qquad g' = \frac{d\mathbb{P}'}{d\nu}, \]
and note that $g$ and $g'$ are well defined because $\mathbb{P}$ and $\mathbb{P}'$ are both absolutely continuous w.r.t. $\nu$. Define
\[ \frac{d\mathbb{Q}}{d\nu} = g \wedge g', \qquad \hat{\mathbb{Q}} = \mathbb{Q} \circ \phi^{-1}. \]
Then $\hat{\mathbb{Q}}$ puts all its mass on $\Delta$. Call this mass $\gamma = \hat{\mathbb{Q}}(E \times E) = \mathbb{Q}(E)$, and put
\[ \mu = \mathbb{P} - \mathbb{Q}, \qquad \mu' = \mathbb{P}' - \mathbb{Q}, \qquad \hat{\mathbb{P}} = \frac{1}{1 - \gamma}\, \mu \otimes \mu' + \hat{\mathbb{Q}} \]
(for $\gamma = 1$, i.e., $\mathbb{P} = \mathbb{P}'$, simply put $\hat{\mathbb{P}} = \hat{\mathbb{Q}}$). Then
\[ \hat{\mathbb{P}}(A \times E) = \frac{1}{1 - \gamma}\,\mu(A)\,\mu'(E) + \hat{\mathbb{Q}}(A \times E) = \mu(A) + \mathbb{Q}(A) = \mathbb{P}(A), \]
because $\mu'(E) = 1 - \gamma$, and similarly $\hat{\mathbb{P}}(E \times A) = \mathbb{P}'(A)$, so that the marginals are correct and we indeed have a proper coupling. To get (i), compute
\[ \|\mathbb{P} - \mathbb{P}'\|_{tv} = \int_E |g - g'| \, d\nu = 2 \Big[ 1 - \int_E (g \wedge g') \, d\nu \Big] = 2\,\big[ 1 - \mathbb{Q}(E) \big] = 2(1 - \gamma) = 2\,\hat{\mathbb{P}}(\Delta^c) = 2\,\hat{\mathbb{P}}(\hat{X} \neq \hat{X}'). \]
Here, the first equality uses the Jordan-Hahn decomposition of signed measures into a difference of non-negative measures.

Exercise 2.11 Prove the first equality.

To get (ii), note that $\mu$ and $\mu'$ are mutually singular: $\mu$ is concentrated on $\{g > g'\}$ and $\mu'$ on $\{g' > g\}$, so that $(\mu \otimes \mu')(\Delta) = 0$ and
\[ \hat{\mathbb{P}}\big( \cdot \mid \hat{X} \neq \hat{X}' \big) = \frac{1}{(1 - \gamma)^2}\, (\mu \otimes \mu')(\cdot), \]
which is a product measure, i.e., $\hat{X}$ and $\hat{X}'$ are independent conditional on $\{\hat{X} \neq \hat{X}'\}$.

What Theorem 2.10 says is that, by being creative enough, we can in principle find a coupling that gives the correct value for the total variation. However, in practice it is often difficult to find this maximal coupling explicitly, and we have to content ourselves with good estimates or approximations. We will encounter examples in Section 9.
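For discrete distributions the maximal coupling of Theorem 2.10 can be implemented directly: put the overlap mass g ∧ g' on the diagonal, and distribute the rest as the product of the normalized residuals μ and μ'. The sketch below (two arbitrary distributions on three points; the degenerate case P = P' is not handled) checks that 2 P̂(X̂ ≠ X̂') = 2(1 − γ) equals the total variation distance.

```python
import random

# Two arbitrary hypothetical distributions on {0, 1, 2} (P != P', so gamma < 1).
P  = {0: 0.5, 1: 0.3, 2: 0.2}
Pp = {0: 0.2, 1: 0.3, 2: 0.5}

q = {x: min(P[x], Pp[x]) for x in P}       # overlap density g ∧ g'
gamma = sum(q.values())                     # mass gamma put on the diagonal
mu  = {x: P[x]  - q[x] for x in P}          # residual mu  = P  - Q
mup = {x: Pp[x] - q[x] for x in P}          # residual mu' = P' - Q

def draw(weights, rng):
    """Sample a point with probability proportional to its weight."""
    u, acc = rng.random() * sum(weights.values()), 0.0
    for x, w in weights.items():
        acc += w
        if u < acc:
            return x
    return x

def maximal_coupling(rng):
    if rng.random() < gamma:
        x = draw(q, rng)
        return x, x                         # on the diagonal, probability gamma
    return draw(mu, rng), draw(mup, rng)    # independent residuals off the diagonal

rng = random.Random(3)
samples = [maximal_coupling(rng) for _ in range(50_000)]
frac_diff = sum(x != y for x, y in samples) / len(samples)
tv = sum(abs(P[x] - Pp[x]) for x in P)
print(tv, 2 * (1 - gamma), 2 * frac_diff)   # all three agree (the last up to noise)
```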
Random walks

Random walks on $\mathbb{Z}^d$, $d \geq 1$, are special cases of Markov chains: the transition probability to go from site $x$ to site $y$ only depends on the difference vector $y - x$. Because of this translation invariance, random walks can be analyzed in great detail. A standard reference is Spitzer [14]. One key fact that we will use below is that any random walk whose step distribution has zero mean and finite variance is recurrent in $d = 1, 2$ and transient in $d \geq 3$. In $d = 1$ any random walk whose step distribution has zero mean and a finite first moment is recurrent.
3.1 Random walks in dimension 1

Let $S = (S_n)_{n \in \mathbb{N}_0}$ be the simple random walk on $\mathbb{Z}$: $S_0 = 0$ and $S_n = \sum_{i=1}^n Y_i$, with $Y = (Y_i)_{i \in \mathbb{N}}$ i.i.d. and
\[ P(Y_i = -1) = P(Y_i = +1) = \tfrac12. \]
The following theorem says that, modulo period 2, the distribution of $S_n$ becomes flat for large $n$.
Theorem 3.1 Let $S$ be a simple random walk. Then, for every $k \in \mathbb{Z}$ even,
\[ \lim_{n \to \infty} \|P(S_n \in \cdot\,) - P(S_n + k \in \cdot\,)\|_{tv} = 0. \]

Proof. Let $S'$ denote an independent copy of $S$ starting at $S'_0 = k$. Write $P$ for the joint probability distribution of $(S, S')$, and let
\[ T = \min\{n \in \mathbb{N}_0 \colon S_n = S'_n\}. \]
Then
\[ \|P(S_n \in \cdot\,) - P(S_n + k \in \cdot\,)\|_{tv} = \|P(S_n \in \cdot\,) - P(S'_n \in \cdot\,)\|_{tv} \leq 2\,P(T > n). \]
Now, $\bar{S} = (\bar{S}_n)_{n \in \mathbb{N}_0}$ defined by $\bar{S}_n = S'_n - S_n$ is a random walk on $\mathbb{Z}$ starting at $\bar{S}_0 = k$ with i.i.d. increments $\bar{Y} = (\bar{Y}_i)_{i \in \mathbb{N}}$ given by
\[ P(\bar{Y}_i = -2) = P(\bar{Y}_i = +2) = \tfrac14, \qquad P(\bar{Y}_i = 0) = \tfrac12. \]
This is a simple random walk on $2\mathbb{Z}$ with a random time delay, i.e., it steps only half of the time. Since
\[ T = \bar{\sigma}_0 = \inf\{n \in \mathbb{N}_0 \colon \bar{S}_n = 0\} \]
and $k$ is even, it follows from the recurrence of $\bar{S}$ that $P(T < \infty) = 1$. Let $n \to \infty$ to get the claim.
In analytical terms, if $p(\cdot, \cdot)$ denotes the transition kernel of the simple random walk, $p^n(\cdot, \cdot)$, $n \in \mathbb{N}$, denotes the $n$-fold composition of $p(\cdot, \cdot)$, and $\delta_k(\cdot)$, $k \in \mathbb{Z}$, denotes the point measure at $k$, then Theorem 3.1 says that for $k$ even
\[ \lim_{n \to \infty} \|\delta_k p^n(\cdot) - \delta_0 p^n(\cdot)\|_{tv} = 0. \]
It is possible to prove the latter statement by hand, i.e., by computing $\delta_k p^n(\cdot)$, evaluating the total variation distance and letting $n \to \infty$. However, this computation turns out to be somewhat cumbersome. Note that for $k$ odd the claim fails: $S_n$ and $S_n + k$ then have different parities, so that $\|P(S_n \in \cdot\,) - P(S_n + k \in \cdot\,)\|_{tv} = 2$ for all $n \in \mathbb{N}_0$, $k \in \mathbb{Z}$ odd.
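The coupling in the proof of Theorem 3.1 is easy to simulate, since only the difference walk matters. The sketch below (the even starting point k = 4 is an arbitrary choice) estimates the tail of the meeting time T and illustrates that P(T > n) shrinks on the scale 1/√n.

```python
import random

def meeting_time(k, rng, max_steps=50_000):
    """Meeting time of two independent simple random walks started at 0 and k (k even).

    Only the difference walk matters: it starts at k and steps -2, 0, +2
    with probabilities 1/4, 1/2, 1/4."""
    d = k
    for n in range(max_steps):
        if d == 0:
            return n
        d += rng.choice((-2, 0, 0, 2))
    return max_steps

rng = random.Random(11)
times = [meeting_time(4, rng) for _ in range(1000)]
tail = {n: sum(t > n for t in times) / len(times) for n in (100, 400, 1600)}
print(tail)  # P(T > n) decays roughly like 1/sqrt(n): quadrupling n about halves it
```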
Does the same result as in Theorem 3.1 hold for random walks other than the simple random walk? Yes, it does! To formulate the appropriate statement, let $S$ be the random walk on $\mathbb{Z}$ with i.i.d. increments $Y$ satisfying the aperiodicity condition
\[ \gcd\{z' - z \colon z, z' \in \mathbb{Z},\; P(Y_1 = z)\,P(Y_1 = z') > 0\} = 1. \tag{3.1} \]

Theorem 3.3 Let $S$ be a random walk on $\mathbb{Z}$ whose i.i.d. increments satisfy (3.1). Then, for every $k \in \mathbb{Z}$,
\[ \lim_{n \to \infty} \|P(S_n \in \cdot\,) - P(S_n + k \in \cdot\,)\|_{tv} = 0. \]

Proof. We try to use the same coupling as in the proof of Theorem 3.1. Namely, we put $\bar{S}_n = S'_n - S_n$, $n \in \mathbb{N}_0$, and note that $\bar{S} = (\bar{S}_n)_{n \in \mathbb{N}_0}$ is a random walk starting at $\bar{S}_0 = k$ whose i.i.d. increments $\bar{Y} = (\bar{Y}_i)_{i \in \mathbb{N}}$ are given by
\[ P(\bar{Y}_1 = \bar{z}) = \sum_{\substack{z, z' \in \mathbb{Z} \\ z' - z = \bar{z}}} P(Y_1 = z)\,P(Y_1 = z'), \qquad \bar{z} \in \mathbb{Z}. \tag{3.2} \]
However, it may happen that $E(|\bar{Y}_1|) = \infty$, in which case $\bar{S}$ is not necessarily recurrent (see Spitzer [14], Section 3).
The lack of recurrence may be circumvented by slightly adapting the coupling. Namely, instead of letting the two copies of the random walk, $S$ and $S'$, step independently, we let them make independent small steps but dependent large steps. Formally, we let $Y''$ be an independent copy of $Y$, and we define $Y'$ by putting
\[ Y'_i = \begin{cases} Y''_i, & \text{if } |Y_i - Y''_i| \leq N, \\ Y_i, & \text{if } |Y_i - Y''_i| > N, \end{cases} \tag{3.3} \]
i.e., $S'$ copies the jumps of $S''$ when they differ from the jumps of $S$ by at most $N$, otherwise it copies the jumps of $S$. The value of $N \in \mathbb{N}$ is arbitrary and will later be taken large enough.

First, we check that $S'$ is a copy of $S$. This is so because, for every $z \in \mathbb{Z}$,
\[ P'(Y'_1 = z) = P(Y'_1 = z, |Y_1 - Y''_1| \leq N) + P(Y'_1 = z, |Y_1 - Y''_1| > N) = P(Y''_1 = z, |Y_1 - Y''_1| \leq N) + P(Y_1 = z, |Y_1 - Y''_1| > N), \]
and the first term in the r.h.s. equals $P(Y_1 = z, |Y_1 - Y''_1| \leq N)$ by symmetry (use that $Y$ and $Y''$ are independent), so that we get $P'(Y'_1 = z) = P(Y_1 = z)$.

Next, we note from (3.3) that the difference random walk $\bar{S} = S - S'$ has increments
\[ \bar{Y}_i = Y_i - Y'_i = \begin{cases} Y_i - Y''_i, & \text{if } |Y_i - Y''_i| \leq N, \\ 0, & \text{if } |Y_i - Y''_i| > N, \end{cases} \]
i.e., no jumps larger than $N$ can occur. Moreover, by picking $N$ large enough we also have that
\[ P(\bar{Y}_1 \neq 0) > 0 \quad \text{and the analogue of the aperiodicity condition (3.1) holds for } \bar{Y}. \]

Exercise 3.4 Prove the last two statements.

Thus, $\bar{S}$ is an aperiodic symmetric random walk on $\mathbb{Z}$ with bounded step size. Consequently, $\bar{S}$ is recurrent, and therefore we have $P(\bar{\sigma}_0 < \infty) = 1$, so that the proof of Theorem 3.3 can be completed in the same way as the proof of Theorem 3.1.

Remark: The coupling in (3.3) is called the Ornstein coupling.
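The marginal-preservation argument for the Ornstein coupling can be tested empirically. The sketch below (a hypothetical five-point step distribution and the cutoff N = 2 are both arbitrary choices) builds increment pairs according to (3.3) and checks that Y' keeps the step distribution of Y, while the difference walk makes no jumps larger than N.

```python
import random

def ornstein_pairs(steps, N, rng, n=100_000):
    """Generate increment pairs (Y_i, Y'_i) according to (3.3):
    Y' copies the independent copy Y'' on small discrepancies,
    and copies Y itself on large ones."""
    out = []
    for _ in range(n):
        y  = rng.choice(steps)                 # increment of S
        y2 = rng.choice(steps)                 # increment of the auxiliary copy S''
        yp = y2 if abs(y - y2) <= N else y
        out.append((y, yp))
    return out

rng = random.Random(5)
steps = (-5, -1, 0, 1, 5)    # a hypothetical step distribution (uniform on 5 values)
pairs = ornstein_pairs(steps, N=2, rng=rng)

# Y' keeps the (uniform) step distribution of Y ...
freq = {z: sum(yp == z for _, yp in pairs) / len(pairs) for z in steps}
# ... while the difference walk makes no jumps larger than N
max_jump = max(abs(y - yp) for y, yp in pairs)
print(freq, max_jump)
```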
Remark: Theorem 3.1 may be sharpened by noting that
\[ P(T > n) = O\Big( \frac{1}{\sqrt{n}} \Big). \]
Indeed, this follows from a classical result for random walks in $d = 1$ with zero mean and finite variance, namely, $P(\bar{\sigma}_0 > n) = O(1/\sqrt{n})$ for every starting point $z \neq 0$ (see Spitzer [14], Section 3). Consequently,
\[ \|P(S_n \in \cdot\,) - P(S_n + k \in \cdot\,)\|_{tv} = O\Big( \frac{1}{\sqrt{n}} \Big), \qquad k \in \mathbb{Z} \text{ even}. \]
A direct proof without coupling turns out to be very hard, especially for an arbitrary random walk in $d = 1$ with zero mean and finite variance. Even a well-trained analyst typically does not manage to cook up a proof in a day!
3.2 Random walks in dimension d

The analogue of Theorem 3.3 holds for random walks on $\mathbb{Z}^d$, $d \geq 1$, under an appropriate aperiodicity condition on the increments, stated in (3.4); this is the content of Theorem 3.6.

Proof. Combine the componentwise coupling with the cutting out of large steps as in the Ornstein coupling (3.3), applied to each component.

Exercise 3.7 Write out the details of the proof. Warning: The argument is easy when the random walk can move in only one direction at a time (like the simple random walk). For other random walks a projection argument is needed.

Again, the total variation norm does not tend to zero when (3.4) fails.
3.3 Random walks and the discrete Laplacian

The result in Theorem 3.6 has an interesting corollary. Let $\Delta$ denote the discrete Laplacian, acting on functions $f \colon \mathbb{Z}^d \to \mathbb{R}$ as
\[ (\Delta f)(x) = \frac{1}{2d} \sum_{\substack{y \in \mathbb{Z}^d \\ \|y - x\| = 1}} [f(y) - f(x)], \qquad x \in \mathbb{Z}^d. \]
A function $f$ is called harmonic when $\Delta f \equiv 0$, i.e., $f$ is at every site equal to the average of its values at the neighboring sites. Iterating this relation, a bounded harmonic function $f$ satisfies, for $x, y \in \mathbb{Z}^d$ and $n \in \mathbb{N}$,
\[ f(x) - f(y) = \sum_{z \in \mathbb{Z}^d} f(z)\,[P(S_n = z - x) - P(S_n = z - y)], \]
where $S$ is the lazy simple random walk, which at each step stays put with probability $\frac12$ and otherwise jumps to a uniformly chosen neighbor (laziness removes the periodicity of the simple random walk, and $f$ is harmonic for the lazy walk as well). By Theorem 3.6 the right-hand side tends to zero as $n \to \infty$, so every bounded harmonic function on $\mathbb{Z}^d$ is constant.
Card shuffling

Card shuffling is a topic that combines coupling, algebra and combinatorics. Diaconis [3] contains many ideas. Levin, Peres and Wilmer [8] provide a broad panorama on mixing properties of Markov chains, with Chapter 8 devoted to card shuffling. Two examples of random shuffles are described in the MSc thesis by Nooitgedagt [12].
4.1 Random shuffles

Consider a deck of $N$ cards, and let $X = (X_n)_{n \in \mathbb{N}_0}$ be the Markov chain on the set $\mathcal{P}_N$ of arrangements of the deck that is obtained by repeatedly applying a random shuffle. Let $\pi$ be the uniform distribution on $\mathcal{P}_N$, which is the stationary distribution of the shuffle dynamics.

Definition 4.2 A sequence $(t_N)_{N \in \mathbb{N}}$ is called a sequence of threshold times when, for every $\epsilon > 0$,
\[ \lim_{N \to \infty} \|P(X_{(1-\epsilon) t_N} \in \cdot\,) - \pi(\cdot)\|_{tv} = 2, \qquad \lim_{N \to \infty} \|P(X_{(1+\epsilon) t_N} \in \cdot\,) - \pi(\cdot)\|_{tv} = 0. \]

Definition 4.3 A random time $T$ is called a strong uniform time when:
1. $T$ is a stopping time.
2. $X_T \overset{D}{=} \pi$.
3. $X_T$ and $T$ are independent.

Remark: Think of $T$ as a random time at which the random shuffling of the deck is stopped such that the arrangement of the deck is completely random. In typical cases the threshold times $(t_N)_{N \in \mathbb{N}}$ are such that $\lim_{N \to \infty} P(1 - \epsilon < T/t_N < 1 + \epsilon) = 1$ for all $\epsilon > 0$. In Section 4.2 we will construct $T$ for a special example.
Theorem 4.4 If $T$ is a strong uniform time, then
\[ \|P(X_n \in \cdot\,) - \pi(\cdot)\|_{tv} \leq 2\,P(T > n), \qquad n \in \mathbb{N}_0. \]

Proof. By now the intuition behind this inequality should be obvious. For $n \in \mathbb{N}_0$ and $A \subseteq \mathcal{P}_N$, write
\[ \begin{aligned}
P(X_n \in A, T \leq n) &= \sum_{\sigma \in \mathcal{P}_N} \sum_{i=0}^n P(X_n \in A \mid X_i = \sigma, T = i)\, P(X_i = \sigma, T = i) \\
&= \sum_{i=0}^n P(T = i) \Big[ \sum_{\sigma \in \mathcal{P}_N} P(X_{n-i} \in A \mid X_0 = \sigma)\, \pi(\sigma) \Big] \\
&= \sum_{i=0}^n P(T = i)\, \pi(A) = \pi(A)\, P(T \leq n),
\end{aligned} \]
where the second equality holds by the strong Markov property of $X$ in combination with Definition 4.3, and the third equality holds because $\pi$ is the invariant distribution. Hence
\[ P(X_n \in A) - \pi(A) = P(X_n \in A, T > n) - \pi(A)\, P(T > n), \]
from which the claim follows after taking the supremum over $A$.

Remark: Note that $T$ really is the coupling time to a parallel deck that starts in $\pi$, even though this deck is not made explicit.
4.2 Top-to-random shuffle

We next focus on a particular random shuffle: take the top card and insert it randomly back into the deck, i.e., with probability $1/N$ put it at each of the $N$ possible locations, including the top itself. This is called the top-to-random shuffle.

Theorem 4.5 For the top-to-random shuffle the sequence $(t_N)_{N \in \mathbb{N}}$ with $t_N = N \log N$ is a sequence of threshold times.

Proof. Let $T$ be the time at which the card that was originally at the bottom of the deck comes to the top and is reinserted; this is a strong uniform time, and
\[ T \overset{D}{=} V \tag{4.1} \]
with $V$ the number of random draws with replacement from an urn with $N$ balls until each ball has been drawn at least once. To see why this holds, put, for $i = 0, 1, \dots, N$,
\[ \begin{aligned}
T_i &= \text{the first time that there are } i \text{ cards below the original bottom card}, \\
V_i &= \text{the number of draws required to obtain } i \text{ distinct balls},
\end{aligned} \]
with $T_N$ the time of reinsertion of the original bottom card. Then
\[ T_{i+1} - T_i \overset{D}{=} V_{N-i} - V_{N-(i+1)} \overset{D}{=} \text{GEO}\Big( \frac{i+1}{N} \Big), \quad i = 0, 1, \dots, N-1, \text{ all independent}, \tag{4.2} \]
where GEO($p$) denotes the geometric distribution with parameter $p \in [0,1]$, assigning probability $p(1-p)^{k-1}$ to $k \in \mathbb{N}$.

Exercise 4.7 Prove (4.2).

Since $T = T_N = \sum_{i=0}^{N-1} (T_{i+1} - T_i)$ and $V = V_N = \sum_{i=0}^{N-1} (V_{N-i} - V_{N-(i+1)})$, this proves (4.1).
Label the balls $1, \dots, N$ and let $A_i$ be the event that ball $i$ is not drawn in the first $(1+\epsilon) N \log N$ draws, $i = 1, \dots, N$. Then, for $N \to \infty$,
\[ P\big( T > (1+\epsilon) N \log N \big) = P\big( V > (1+\epsilon) N \log N \big) = P\Big( \bigcup_{i=1}^N A_i \Big) \leq \sum_{i=1}^N P(A_i) = N \Big( 1 - \frac{1}{N} \Big)^{(1+\epsilon) N \log N} \leq N e^{-(1+\epsilon) \log N} = N^{-\epsilon} \to 0, \]
which yields the second line of Definition 4.2 via Theorem 4.4.

To get the first line of Definition 4.2, pick $\delta > 0$, pick $j = j(\delta)$ so large that $1/j! < \frac12 \delta$, and define
\[ B_N = \{\text{arrangements in which the } j \text{ original bottom cards appear in their original relative order}\}, \qquad N \geq j. \]
Then $\pi(B_N) = 1/j!$, and $\{X_n \in B_N\}$ is the event that the order of the original $j$ bottom cards is retained at time $n$. Since the first time the card with label $N - j + 1$ comes to the top is distributed like $V_{N-j+1}$, we have
\[ P\big( X_{(1-\epsilon) N \log N} \in B_N \big) \geq P\big( V_{N-j+1} > (1-\epsilon) N \log N \big). \tag{4.3} \]
Indeed, for the original order to be destroyed, the card with label $N - j + 1$ must come to the top and must subsequently be inserted below one of the cards with labels $N - j + 2, \dots, N$. We will show that, for $N \geq N(\delta, \epsilon)$,
\[ P\big( V_{N-j+1} \leq (1-\epsilon) N \log N \big) < \tfrac12 \delta. \tag{4.4} \]
From this it will follow that
\[ \begin{aligned}
\|P(X_{(1-\epsilon) N \log N} \in \cdot\,) - \pi(\cdot)\|_{tv} &\geq 2\,\big[ P(X_{(1-\epsilon) N \log N} \in B_N) - \pi(B_N) \big] \\
&\geq 2\,\big[ 1 - P(V_{N-j+1} \leq (1-\epsilon) N \log N) \big] - 2\,\pi(B_N) \\
&\geq 2\,\big[ 1 - \tfrac12 \delta - \tfrac12 \delta \big] = 2(1 - \delta).
\end{aligned} \]
The first inequality follows from the definition of total variation, the third inequality from (4.3) and (4.4). By letting $N \to \infty$ followed by $\delta \downarrow 0$, we get the first line of Definition 4.2.
To prove (4.4), we compute
\[ E(V_{N-j+1}) = \sum_{i=j-1}^{N-1} E(V_{N-i} - V_{N-i-1}) = \sum_{i=j-1}^{N-1} \frac{N}{i+1} \sim N \log\Big( \frac{N}{j} \Big) \sim N \log N, \qquad N \to \infty, \]
\[ \text{Var}(V_{N-j+1}) = \sum_{i=j-1}^{N-1} \text{Var}(V_{N-i} - V_{N-i-1}) = \sum_{i=j-1}^{N-1} \Big( \frac{N}{i+1} \Big)^2 \Big( 1 - \frac{i+1}{N} \Big) \leq c_j N^2, \qquad c_j = \sum_{k \geq j} k^{-2}. \]
Here we use that $E(\text{GEO}(p)) = 1/p$ and $\text{Var}(\text{GEO}(p)) = (1-p)/p^2$. Chebyshev's inequality therefore gives
\[ \begin{aligned}
P\big( V_{N-j+1} \leq (1-\epsilon) N \log N \big) &= P\big( V_{N-j+1} - E(V_{N-j+1}) \leq -\epsilon N \log N\, [1 + o(1)] \big) \\
&\leq P\big( [V_{N-j+1} - E(V_{N-j+1})]^2 \geq \epsilon^2 N^2 \log^2 N\, [1 + o(1)] \big) \\
&\leq \frac{\text{Var}(V_{N-j+1})}{\epsilon^2 N^2 \log^2 N}\, [1 + o(1)] \leq \frac{c_j}{\epsilon^2 \log^2 N}\, [1 + o(1)] = O\Big( \frac{1}{\log^2 N} \Big).
\end{aligned} \]
This proves (4.4).
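The coupon-collector representation (4.1)-(4.2) is easy to probe by simulation. The sketch below (urn size N = 52, an arbitrary deck-sized choice) estimates E(V), which should be close to (slightly above) N log N, and checks that V rarely exceeds 2 N log N, in line with the second line of Definition 4.2.

```python
import math, random

def collect_all(N, rng):
    """Number of uniform draws with replacement until all N balls have appeared (V)."""
    seen, draws = set(), 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        draws += 1
    return draws

rng = random.Random(2)
N = 52                                   # an arbitrary deck-sized urn
vals = [collect_all(N, rng) for _ in range(2000)]
mean_v = sum(vals) / len(vals)
late = sum(v > 2 * N * math.log(N) for v in vals) / len(vals)   # epsilon = 1
print(mean_v, N * math.log(N), late)
```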
Poisson approximation

5.1 Coupling

Fix $n \in \mathbb{N}$ and $p_1, \dots, p_n \in [0, 1)$. Let
\[ Y_i \overset{D}{=} \text{BER}(p_i), \qquad i = 1, \dots, n, \quad \text{be independent}, \]
i.e., $P(Y_i = 1) = p_i$ and $P(Y_i = 0) = 1 - p_i$, and put $X = \sum_{i=1}^n Y_i$.
Theorem 5.1 With the above definitions, and with $\lambda_i = -\log(1 - p_i)$ and $\lambda = \sum_{i=1}^n \lambda_i$,
\[ \|P(X \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \leq \sum_{i=1}^n \lambda_i^2. \]

Proof. Let $Y'_i \overset{D}{=} \text{POISSON}(\lambda_i)$, $i = 1, \dots, n$, be independent, and put $X' = \sum_{i=1}^n Y'_i$. Couple $Y_i$ and $Y'_i$ by putting
\[ Y_i = Y'_i \wedge 1, \qquad i = 1, \dots, n. \]
Then
\[ Y_i \overset{D}{=} \text{BER}(p_i), \qquad X' \overset{D}{=} \text{POISSON}(\lambda), \]
where the first line uses that $e^{-\lambda_i} = 1 - p_i$, and the second line uses that an independent sum of Poisson random variables with given parameters is again Poisson, with parameter equal to the sum of the constituent parameters. It follows that
\[ P(X \neq X') \leq \sum_{i=1}^n P(Y_i \neq Y'_i) = \sum_{i=1}^n P(Y'_i \geq 2), \]
and
\[ P(Y'_i \geq 2) = \sum_{k=2}^{\infty} e^{-\lambda_i} \frac{\lambda_i^k}{k!} \leq \tfrac12 \lambda_i^2 \sum_{l=0}^{\infty} e^{-\lambda_i} \frac{\lambda_i^l}{l!} = \tfrac12 \lambda_i^2, \]
where the inequality uses that $k! \geq 2\,(k-2)!$ for $k \geq 2$. Since
\[ \|P(X \in \cdot\,) - p_\lambda(\cdot)\|_{tv} = \|P(X \in \cdot\,) - P(X' \in \cdot\,)\|_{tv} \leq 2\,P(X \neq X'), \]
the claim follows.

Remark: The interest in Theorem 5.1 is when $n$ is large, $p_1, \dots, p_n$ are small and $\lambda$ is of order 1. (Note that $\sum_{i=1}^n \lambda_i^2 \leq \lambda M$ with $M = \max\{\lambda_1, \dots, \lambda_n\}$.) A typical example is $p_i \equiv c/n$, in which case $\sum_{i=1}^n \lambda_i^2 = n\,[-\log(1 - c/n)]^2 \sim c^2/n$ as $n \to \infty$.

Remark: In Section 1.3 we derived a bound similar to Theorem 5.1 but with $\lambda_i = p_i$. For small $p_i$ we have $\lambda_i \approx p_i$, and so the difference between the two bounds is minor.
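Theorem 5.1 can be verified exactly for small n, since the law of X (a Poisson-binomial distribution) can be computed by convolution. The sketch below (the p_i's are arbitrary illustrative values) compares the exact total variation distance with the bound Σ λ_i².

```python
import math

p = [0.05, 0.1, 0.02, 0.08, 0.04]          # arbitrary small success probabilities
lam_i = [-math.log(1 - pi) for pi in p]
lam = sum(lam_i)

# Law of X = sum of independent BER(p_i) (Poisson-binomial), by convolution.
dist = [1.0]
for pi in p:
    new = [0.0] * (len(dist) + 1)
    for k, w in enumerate(dist):
        new[k] += w * (1 - pi)
        new[k + 1] += w * pi
    dist = new

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

tv = sum(abs(dist[k] - poisson_pmf(lam, k)) for k in range(len(dist)))
tv += sum(poisson_pmf(lam, k) for k in range(len(dist), 60))   # Poisson tail mass
bound = sum(li**2 for li in lam_i)
print(tv, bound)  # the exact distance stays below the bound of Theorem 5.1
```

With this choice of λ_i the two laws give exactly the same mass to 0, which is why the coupling in the proof is so effective.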
5.2 Stein-Chen method

We next turn our attention to a more sophisticated way of achieving a Poisson approximation, called the Stein-Chen method. Not only will this lead to better bounds, it will also make it possible to deal with random variables that are dependent. For details, see Barbour, Holst and Janson [2].

Again, we fix $n \in \mathbb{N}$ and $p_1, \dots, p_n \in [0, 1)$, and we let
\[ Y_i \overset{D}{=} \text{BER}(p_i), \qquad i = 1, \dots, n, \]
be random variables that are now allowed to be dependent. Put $W = \sum_{i=1}^n Y_i$ and $\lambda = \sum_{i=1}^n p_i$. For $j = 1, \dots, n$, let $U_j$ and $V_j$ be random variables such that
\[ U_j \overset{D}{=} W, \qquad P(U_j \in \cdot\,) = P(W \in \cdot\,), \qquad V_j \overset{D}{=} (W - 1) \mid Y_j = 1, \qquad P(V_j \in \cdot\,) = P(W - 1 \in \cdot \mid Y_j = 1), \tag{5.1} \]
where we note that $W - 1 = \sum_{i \neq j} Y_i$ when $Y_j = 1$ (and we put $V_j = 0$ when $P(Y_j = 1) = 0$). Clearly, if $U_j = V_j$, $j = 1, \dots, n$, with large probability, then we expect the $Y_i$'s to be weakly dependent. In that case, if the $p_i$'s are small, then we expect that a good Poisson approximation is possible.

Before we proceed, we state two core ingredients of the Stein-Chen method.
Lemma 5.2 If $Z \overset{D}{=} \text{POISSON}(\lambda)$ for some $\lambda \in (0, \infty)$, then for any bounded function $f \colon \mathbb{N}_0 \to \mathbb{R}$,
\[ E\big( \lambda f(Z+1) - Z f(Z) \big) = 0. \tag{5.2} \]

Proof. In essence, (5.2) is a recursion relation that is specific to the Poisson distribution. Indeed, let $p_\lambda(k) = e^{-\lambda} \lambda^k / k!$, $k \in \mathbb{N}_0$, denote the coefficients of POISSON($\lambda$). Then
\[ \lambda\, p_\lambda(k) = (k+1)\, p_\lambda(k+1), \qquad k \in \mathbb{N}_0, \tag{5.3} \]
and hence
\[ E\big( \lambda f(Z+1) \big) = \sum_{k \in \mathbb{N}_0} \lambda\, p_\lambda(k)\, f(k+1) = \sum_{k \in \mathbb{N}_0} (k+1)\, p_\lambda(k+1)\, f(k+1) = \sum_{l \in \mathbb{N}} p_\lambda(l)\, l f(l) = E\big( Z f(Z) \big). \]
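Identity (5.2) is easy to confirm numerically. The sketch below (λ = 1.7 and the test function are arbitrary choices) evaluates E(λf(Z+1) − Zf(Z)) by truncating the Poisson sum far in the tail, where the weights are negligible.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.7                                  # an arbitrary illustrative parameter
f = lambda k: math.sin(k) + 2.0            # an arbitrary bounded test function

# E(lam * f(Z+1) - Z * f(Z)) for Z ~ POISSON(lam), truncated at k = 100.
val = sum(poisson_pmf(lam, k) * (lam * f(k + 1) - k * f(k)) for k in range(100))
print(val)  # ~ 0, as asserted by (5.2)
```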
(5.3)
Lemma 5.3 For (0, ) and A N0 , let g,A : N0 R be the solution to the recursive
equation
g,A (k + 1) kg,A (k) = 1A (k) p (A),
k N0 ,
g,A (0) = 0.
Then, uniformly in A,
kg,A k = sup |g,A (k + 1) g,A (k)| 1 1 .
kN0
Proof. For k N0 , let Uk = {0, 1, . . . , k}. Then the solution of the recursive equation is given
by g,A (0) = 0 and
g,A (k + 1) =
1
[p (A Uk ) p (A)p (Uk )] ,
p (k)
k N0 ,
(5.4)
jA
g,A = g,Ac ,
(5.5)
with Ac = N0 \ A.
Exercise 5.4 Check the last claim.
For A = {j}, the solution reads
p (j)
p
(k)
(
g,{j} (k + 1) =
p (j)
+ p
(k)
Pk
k < j,
k j,
l=0 p (l),
l=k+1 p (l),
j1
p (j) X
1 p (j) X
p (l) +
g,{j} (j + 1) g,{j} (j) =
p (l)
p (j)
p (j 1)
l=0
l=j+1
j1
X
X
1
p (l) +
=
p (l)
j
l=j+1
l=0
X
X
1
l
=
p (l) +
p (l)
j
l=j+1
p (l) =
l=1
27
l=1
1
(1 e ) 1 1 ,
(5.6)
where the second and third equality use (5.3). It follows from (5.4) that
g,A (k + 1) g,A (k) 1 1 ,
where we use that the jumps from negative to positive in (5.6) occur at disjoint positions as
j runs through A. Combine the latter inequality with (5.5) to get
g,A (k + 1) g,A (k) (1 1 ),
so that kg,A k 1 1 .
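Both the recursion and the bound of Lemma 5.3 can be checked numerically via the closed form (5.4). The sketch below (λ = 2.5 and the set A are arbitrary choices) verifies that g_{λ,A} solves the recursive equation and that its increments stay below 1 ∧ λ^{−1}; residual and increments are evaluated only for small k, where the closed form is numerically stable.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.5                       # arbitrary illustrative choices
A = {1, 3, 4, 7}
pA = sum(poisson_pmf(lam, k) for k in A)

def g(k):
    """g_{lam,A}(k) via the closed form (5.4), with g(0) = 0."""
    if k == 0:
        return 0.0
    km = k - 1
    pAU = sum(poisson_pmf(lam, j) for j in A if j <= km)    # p_lam(A intersect U_km)
    pU = sum(poisson_pmf(lam, j) for j in range(km + 1))    # p_lam(U_km)
    return (pAU - pA * pU) / (lam * poisson_pmf(lam, km))

# g solves the recursive equation lam*g(k+1) - k*g(k) = 1_A(k) - p_lam(A) ...
resid = max(abs(lam * g(k + 1) - k * g(k) - ((1.0 if k in A else 0.0) - pA))
            for k in range(15))
# ... and its increments obey the uniform bound of Lemma 5.3
max_inc = max(abs(g(k + 1) - g(k)) for k in range(15))
print(resid, max_inc, min(1.0, 1.0 / lam))
```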
We are now ready to state the result we are after.

Theorem 5.5 Let $n \in \mathbb{N}$, $p_1, \dots, p_n \in [0, 1)$ and $W, U_j, V_j$ be as defined above. Then
\[ \|P(W \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \leq 2\,(1 \wedge \lambda^{-1}) \sum_{j=1}^n p_j\, E(|U_j - V_j|). \]

Proof. Abbreviate $g = g_{\lambda,A}$. By the recursive equation in Lemma 5.3,
\[ \begin{aligned}
P(W \in A) - p_\lambda(A) &= E\big( 1_A(W) - p_\lambda(A) \big) = E\big( \lambda g(W+1) - W g(W) \big) \\
&= \sum_{j=1}^n \big[ p_j\, E(g(W+1)) - E(Y_j\, g(W)) \big] = \sum_{j=1}^n p_j\, \big[ E(g(U_j+1)) - E(g(V_j+1)) \big] \\
&\leq \|\Delta g\|_\infty \sum_{j=1}^n p_j\, E(|U_j - V_j|) \leq (1 \wedge \lambda^{-1}) \sum_{j=1}^n p_j\, E(|U_j - V_j|),
\end{aligned} \]
where the fourth equality uses (5.1) together with $E(Y_j\, g(W)) = p_j\, E(g(W) \mid Y_j = 1) = p_j\, E(g(V_j + 1))$, and the two inequalities use Lemma 5.3. Take the supremum over $A$ to get the claim.
To put Theorem 5.5 to use, we look at a subclass of dependent $Y_1,\dots,Y_n$.
Definition 5.6 The above random variables $Y_1,\dots,Y_n$ are said to be negatively related if there exist arrays of random variables
$$Y_{j1},\dots,Y_{jn}, \qquad Y'_{j1},\dots,Y'_{jn}, \qquad j = 1,\dots,n,$$
with $(Y_{j1},\dots,Y_{jn}) \stackrel{D}{=} (Y_1,\dots,Y_n)$, such that, for each $j$ with $P(Y_j = 1) > 0$,
$$(Y'_{j1},\dots,Y'_{jn}) \stackrel{D}{=} (Y_1,\dots,Y_n)\,\big|\,Y_j = 1, \qquad Y'_{ji} \le Y_{ji}, \quad i \ne j.$$
What negative relation means is that the condition $Y_j = 1$ has a tendency to force $Y_i = 0$ for $i \ne j$. Thus, negative relation is like negative correlation (although the notion is in fact stronger).
An important consequence of negative relation is that there exists a coupling such that $U_j \ge V_j$ for all $j$. Indeed, we may pick
$$U_j = \sum_{i=1}^{n} Y_{ji}, \qquad V_j = \sum_{\substack{i=1 \\ i \ne j}}^{n} Y'_{ji},$$
so that $1 + V_j \stackrel{D}{=} W \mid Y_j = 1$, as required by (5.1), while $U_j \ge \sum_{i \ne j} Y_{ji} \ge \sum_{i \ne j} Y'_{ji} = V_j$.
Theorem 5.7 If $Y_1,\dots,Y_n$ are negatively related, then
$$\|P(W \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \le 2\,(1 \wedge \lambda^{-1})\, \big[ \lambda - \mathrm{Var}(W) \big].$$
Proof. By Theorem 5.5 and the above coupling,
$$\sum_{j=1}^{n} p_j\, E(|U_j - V_j|) = \sum_{j=1}^{n} p_j\, E(U_j - V_j)
= \sum_{j=1}^{n} p_j\, E(W) - \sum_{j=1}^{n} p_j\, E(W \mid Y_j = 1) + \sum_{j=1}^{n} p_j
= E(W)^2 - \sum_{j=1}^{n} E(Y_j W) + \lambda
= E(W)^2 - E(W^2) + \lambda = \lambda - \mathrm{Var}(W),$$
where the second equality uses (5.1).
Note: The upper bound in Theorem 5.7 only contains the unknown quantity Var(W ). It
turns out that in many examples this quantity can be either computed or estimated.
5.3 Two applications
1. Let $Y_1,\dots,Y_n$ be independent (as assumed previously). Then $\mathrm{Var}(W) = \sum_{i=1}^{n} p_i(1-p_i) = \lambda - \sum_{i=1}^{n} p_i^2$, and the bound in Theorem 5.7 reads
$$\|P(W \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \le 2\,(1 \wedge \lambda^{-1}) \sum_{i=1}^{n} p_i^2 .$$
2. Draw $n$ balls without replacement from an urn containing $N$ balls, $m$ of which are red. Let $Y_i$ be the indicator that the $i$-th draw is red and $W = \sum_{i=1}^{n} Y_i$ the number of red balls drawn. Then $p_i = m/N$, $\lambda = mn/N$, and $W$ has the hypergeometric distribution, with
$$\mathrm{Var}(W) = \frac{mn}{N} \Big( 1 - \frac{m}{N} \Big) \frac{N-n}{N-1}.$$
The arrays required by Definition 5.6 can be obtained by a swapping construction: force the $j$-th draw to be red by exchanging it, if necessary, with a uniformly chosen red draw; this can only turn other draws from red to non-red, so that $Y'_{ji} \le Y_{ji}$ for $i \ne j$.
Exercise 5.9 Check that the above construction produces arrays with the properties required by Definition 5.6.
We expect that if $m/N, n/N \ll 1$, then $W$ is approximately Poisson distributed. The formal computation goes as follows. Using Theorem 5.7 and Exercise 5.9, we get
$$\|P(W \in \cdot\,) - p_\lambda(\cdot)\|_{tv} \le 2\,(1 \wedge \lambda^{-1})\,[\lambda - \mathrm{Var}(W)]
= 2\,(1 \wedge \lambda^{-1})\,\lambda\, \frac{(m+n-1)N - mn}{N(N-1)}
\le 2\, \frac{m+n-1}{N-1}.$$
Indeed, this is small when m/N, n/N 1.
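For the independent case in application 1 the bound can be checked directly, since the law of $W$ can be computed exactly by convolution. A small illustration (Python; the $p_i$-values are our own illustrative choices):

```python
import math

def poisson_binomial_pmf(ps):
    # Exact law of W = sum of independent BER(p_i), by convolution.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            new[k] += w * (1 - p)
            new[k + 1] += w * p
        pmf = new
    return pmf

def tv_to_poisson(ps):
    # ||P(W in .) - POISSON(lam)||_tv in the factor-2 convention of the
    # notes, i.e. sum_k |P(W = k) - p_lam(k)|.
    lam = sum(ps)
    pmf = poisson_binomial_pmf(ps)       # supported on {0, ..., n}
    q = math.exp(-lam)                   # p_lam(0)
    tv, poisson_mass = 0.0, 0.0
    for k, w in enumerate(pmf):
        tv += abs(w - q)
        poisson_mass += q
        q *= lam / (k + 1)
    return tv + (1 - poisson_mass)       # Poisson mass above n counts fully

ps = [0.1, 0.05, 0.2, 0.15, 0.1]         # illustrative p_i
lam = sum(ps)
bound = 2 * min(1.0, 1.0 / lam) * sum(p * p for p in ps)
assert tv_to_poisson(ps) <= bound
```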
6 Markov chains
In Section 1.1 we already briefly described coupling for Markov chains. We now return to this topic. We recall that $X = (X_n)_{n \in \mathbb{N}_0}$ is a Markov chain on a countable state space $S$, with an initial distribution $\mu = (\mu_i)_{i \in S}$ and with a transition matrix $P = (P_{ij})_{i,j \in S}$ that is irreducible and aperiodic.
There are three cases:
1. positive recurrent,
2. null recurrent,
3. transient.
In case 1 there exists a unique stationary distribution $\pi$, solving the equation $\pi = \pi P$ and satisfying $\pi > 0$, and $\lim_{n\to\infty} \mu P^n = \pi$ componentwise on $S$. This is the standard Markov Chain Convergence Theorem, and we want to investigate the rate of convergence. In cases 2 and 3 there is no stationary distribution, and $\lim_{n\to\infty} \mu P^n = 0$ componentwise. We want to investigate the rate of convergence as well, and see what the role is of the initial distribution $\mu$.
6.1 Case 1: Positive recurrent
For $i \in S$, let
$$T_i = \min\{n \in \mathbb{N} \colon X_n = i\}, \qquad m_i = E_i(T_i) = E(T_i \mid X_0 = i),$$
which, by positive recurrence, are finite. A basic result of Markov chain theory is that $\pi_i = 1/m_i$, $i \in S$ (see Häggström [5], Chapter 5, and Kraaikamp [7], Section 2.2).
We want to compare two copies of the Markov chain starting from different initial distributions $\mu = (\mu_i)_{i \in S}$ and $\nu = (\nu_i)_{i \in S}$, which we denote by $X = (X_n)_{n \in \mathbb{N}_0}$ and $X' = (X'_n)_{n \in \mathbb{N}_0}$, respectively. Let
$$T = \min\{n \in \mathbb{N}_0 \colon X_n = X'_n\}$$
denote their first meeting time. Then the standard coupling inequality in Theorem 2.5 gives
$$\|\mu P^n - \nu P^n\|_{tv} \le 2\, \hat{P}_{\mu,\nu}(T > n),$$
where $\hat{P}_{\mu,\nu}$ denotes any probability measure that couples $X$ and $X'$. We will choose the independent coupling $\hat{P}_{\mu,\nu} = P_\mu \otimes P_\nu$, and instead of $T$ focus on
$$T_0 = \min\{n \in \mathbb{N}_0 \colon X_n = X'_n = 0\},$$
their first meeting time at $0$ (where $0$ is any chosen state in $S$). Since $T \le T_0$, we have
$$\|\mu P^n - \nu P^n\|_{tv} \le 2\, \hat{P}_{\mu,\nu}(T_0 > n).$$
The key fact that we will use is the following.
Theorem 6.1 Under positive recurrence,
$$\hat{P}_{\mu,\nu}(T_0 < \infty) = 1 \qquad \forall\, \mu, \nu. \qquad (6.1)$$
Proof. The successive visits to $0$ by $X$ and $X'$, recorded by the $\{0,1\}$-valued sequences
$$Y = (Y_k)_{k \in \mathbb{N}_0}, \qquad Y' = (Y'_k)_{k \in \mathbb{N}_0},$$
constitute a renewal process: each time $0$ is hit, the process of returns to $0$ starts from scratch. Define
$$\bar{Y}_k = Y_k\, Y'_k, \qquad k \in \mathbb{N}_0.$$
Then also $\bar{Y} = (\bar{Y}_k)_{k \in \mathbb{N}_0}$ is a renewal process. Let
$$I = \{\bar{Y}_k = 1 \text{ for infinitely many } k\}.$$
It suffices to show that $\hat{P}_{\mu,\nu}(I) = 1$ for all $\mu, \nu$.
If $\mu = \nu = \pi$, then $\bar{Y}$ is stationary, and since $\hat{P}_{\pi,\pi}(\bar{Y}_0 = 1) = \pi_0^2 > 0$, it follows from the renewal property that $\hat{P}_{\pi,\pi}(I) = 1$. This in turn implies that
$$P^*(I) = 1, \qquad P^*(\,\cdot\,) = \hat{P}_{\pi,\pi}(\,\cdot \mid \bar{Y}_0 = 1),$$
because $I$ is a tail event for the renewal process. For arbitrary $\mu, \nu$, decompose according to the first simultaneous visit to $0$:
$$\hat{P}_{\mu,\nu}(I) = \sum_{m,n \in \mathbb{N}_0} \hat{P}_{\mu,\nu}(I \mid A_{mn})\, \hat{P}_{\mu,\nu}(A_{mn}), \qquad (6.2)$$
with $A_{mn}$ the event that $X$ first hits $0$ at time $m$ and $X'$ first hits $0$ at time $n$. By positive recurrence the events $A_{mn}$ exhaust the probability space, and by the strong Markov property each conditional probability in (6.2) reduces to a renewal probability of the type handled above, hence equals $1$.
Remark: When $|S| < \infty$, the convergence is exponentially fast: there exist $k \in \mathbb{N}$ and $\delta > 0$ such that $\hat{P}_{\mu,\nu}(T_0 > nk) \le (1-\delta)^n$ for all $\mu, \nu$ and $n$, so that the rate of decay is at least $\frac{1}{k} \log[1/(1-\delta)] > 0$.
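The independent coupling behind Theorem 6.1 is easy to simulate. A sketch for an (assumed, purely illustrative) positive recurrent chain on three states, estimating the meeting time at $0$:

```python
import random

# An irreducible aperiodic transition matrix on S = {0, 1, 2};
# illustrative values, not from the notes.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def step(i, rng):
    # One transition of the chain from state i.
    u, acc = rng.random(), 0.0
    for j, pij in enumerate(P[i]):
        acc += pij
        if u < acc:
            return j
    return len(P[i]) - 1

def meeting_time_at_0(rng, x=1, y=2, cap=10_000):
    # T0 = min{n : X_n = X'_n = 0} under the independent coupling.
    for n in range(1, cap + 1):
        x, y = step(x, rng), step(y, rng)
        if x == 0 and y == 0:
            return n
    return cap

rng = random.Random(0)
times = [meeting_time_at_0(rng) for _ in range(2000)]
# Positive recurrence: T0 is a.s. finite (Theorem 6.1); here it is short.
assert max(times) < 10_000
assert sum(times) / len(times) < 50
```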
Remark: All rates of decay are possible when $|S| = \infty$: sometimes exponential, sometimes polynomial. With the help of Theorem 2.8 it is possible to estimate the rate when some additional control of the moments of $T$ or $T_0$ is available (recall Section 2.3). This typically requires additional structure. For simple random walk on $\mathbb{Z}$ and $\mathbb{Z}^2$ it is known that $P(T > n)$ decays only slowly with $n$.
6.2 Case 2: Null recurrent
In cases 2 and 3 there is no stationary distribution, and
$$\lim_{n\to\infty} \mu P^n = 0 \quad \text{componentwise} \qquad \forall\, \mu. \qquad (6.3)$$
The question is whether the initial distribution is nonetheless forgotten, i.e., whether
$$\lim_{n\to\infty} \|\mu P^n - \nu P^n\|_{tv} = 0 \qquad \forall\, \mu, \nu. \qquad (6.4)$$
It suffices to show that there exists a coupling $\hat{P}_{\mu,\nu}$ such that $\hat{P}_{\mu,\nu}(T_0 < \infty) = 1$. However, the proof of Theorem 6.1 for positive recurrent Markov chains does not carry over, because it relies on $\pi_0 > 0$. It is, in fact, enough to show that there exists a coupling $\hat{P}_{\mu,\nu}$ such that $\hat{P}_{\mu,\nu}(T < \infty) = 1$, which seems easier because the two copies of the Markov chain only need to meet somewhere, not necessarily at $0$.
Theorem 6.3 Under null recurrence,
$$\hat{P}_{\mu,\nu}(T < \infty) = 1 \qquad \forall\, \mu, \nu.$$
Proof. A proof of this theorem, and hence of (6.4), is beyond the scope of the present course. We refer to Lindvall [11], Section III.21, for more details. As a weak substitute we prove the Cesàro average version of (6.4):
$$X \text{ recurrent} \implies \lim_{N\to\infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} \mu P^n - \frac{1}{N} \sum_{n=0}^{N-1} \nu P^n \Big\|_{tv} = 0 \qquad \forall\, \mu, \nu.$$
The proof uses the notion of shift-coupling, i.e., coupling with a random time shift. Let $X$ and $X'$ be two independent copies of the Markov chain starting from $\mu$ and $\nu$. Let $\tau_0$ and $\tau'_0$ denote their first hitting times of $0$. Couple $X$ and $X'$ by letting their paths coincide after $\tau_0$, respectively, $\tau'_0$:
$$X_{\tau_0 + k} = X'_{\tau'_0 + k}, \qquad k \in \mathbb{N}_0.$$
Fix $M \in \mathbb{N}$ and $A \subseteq S$. Conditional on $(\tau_0, \tau'_0) = (m, m')$, the coupled paths satisfy
$$\hat{P}_{\mu,\nu}\big( X_{m+k} \in A \mid (\tau_0, \tau'_0) = (m, m') \big) = \hat{P}_{\mu,\nu}\big( X'_{m'+k} \in A \mid (\tau_0, \tau'_0) = (m, m') \big), \qquad k \in \mathbb{N}_0.$$
Shift the two Cesàro sums by $m$, respectively, $m'$, and cut them at $(N-m-1) \wedge (N-m'-1)$; the number of summands that are lost in doing so is $m + m' + |m - m'| = 2(m \vee m')$, and the sum over the coinciding shifted pieces cancels by the coupling. Splitting according to whether $\tau_0 \vee \tau'_0 < M$ or not, and bounding each lost summand by $1$, we obtain
$$\frac{1}{N} \Big| \sum_{n=0}^{N-1} (\mu P^n)(A) - \sum_{n=0}^{N-1} (\nu P^n)(A) \Big| \le 2\, \hat{P}_{\mu,\nu}(\tau_0 \vee \tau'_0 \ge M) + \frac{2}{N}\, \hat{E}_{\mu,\nu}\big( (\tau_0 \vee \tau'_0)\, 1_{\{\tau_0 \vee \tau'_0 < M\}} \big).$$
Since the bound is uniform in $A$, we get the claim by taking the supremum over $A$ and letting $N \to \infty$ followed by $M \to \infty$: by recurrence $\tau_0 \vee \tau'_0 < \infty$ a.s., so the first term vanishes as $M \to \infty$.
6.3 Case 3: Transient
There is no general result for transient Markov chains: (6.3) always holds, but (6.4) may hold or may fail. For the special case of random walks on $\mathbb{Z}^d$, $d \ge 1$, we saw with the help of the Ornstein coupling that (6.4) holds.
7 Probabilistic inequalities
7.1 Fully ordered state spaces
Let $X, X'$ be $\mathbb{R}$-valued random variables with distribution functions
$$F(x) = P(X \le x), \qquad F'(x) = P(X' \le x), \qquad x \in \mathbb{R}.$$
We say that $X'$ stochastically dominates $X$, and write $P \preceq P'$, if $F(x) \ge F'(x)$ for all $x \in \mathbb{R}$, i.e., $F \ge F'$ pointwise.
Theorem 7.1 Let $X, X'$ be $\mathbb{R}$-valued random variables with probability measures $P, P'$. If $P \preceq P'$, then there exists a coupling $(\hat{X}, \hat{X}')$ of $X$ and $X'$ with probability measure $\hat{P}$ such that
$$\hat{P}(\hat{X} \le \hat{X}') = 1.$$
Proof. Let $F^{-1}, F'^{-1}$ denote the generalized inverses of $F, F'$ defined by
$$F^{-1}(u) = \inf\{x \in \mathbb{R} \colon F(x) \ge u\}, \qquad F'^{-1}(u) = \inf\{x \in \mathbb{R} \colon F'(x) \ge u\}, \qquad u \in (0,1).$$
Let $U$ be uniformly distributed on $(0,1)$, and put $\hat{X} = F^{-1}(U)$ and $\hat{X}' = F'^{-1}(U)$. Then $\hat{X} \stackrel{D}{=} X$, $\hat{X}' \stackrel{D}{=} X'$, and $\hat{X} \le \hat{X}'$ because $F \ge F'$ implies $F^{-1} \le F'^{-1}$. This construction, via a common $U$, provides the desired coupling.
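The common-$U$ construction in the proof can be illustrated for discrete distributions. A sketch (Python; the two distributions below are our own illustrative choices with $F \ge F'$ pointwise):

```python
import random

def quantile(cdf_points, u):
    # Generalized inverse F^{-1}(u) = inf{x : F(x) >= u} for a discrete
    # distribution given as [(x, F(x)), ...] with increasing x.
    for x, Fx in cdf_points:
        if Fx >= u:
            return x
    return cdf_points[-1][0]

# Two distributions on {0,1,2,3} with F >= F' pointwise (P' dominates P).
F  = [(0, 0.4), (1, 0.7), (2, 0.9), (3, 1.0)]
Fp = [(0, 0.1), (1, 0.4), (2, 0.8), (3, 1.0)]

rng = random.Random(1)
for _ in range(10_000):
    u = rng.random()
    x, xp = quantile(F, u), quantile(Fp, u)
    assert x <= xp  # the coupling driven by a common U is ordered a.s.
```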
Theorem 7.2 If $P \preceq P'$, then
$$\int_{\mathbb{R}} f\, dP \le \int_{\mathbb{R}} f\, dP' \qquad \text{for all } f \colon \mathbb{R} \to \mathbb{R} \text{ measurable, bounded and non-decreasing}.$$
Actually, the converses of Theorems 7.1 and 7.2 are also true, as is easily seen by picking sets $[x, \infty)$ and functions $1_{[x,\infty)}$ for running $x \in \mathbb{R}$. Therefore stochastic domination, the existence of an ordered coupling, and the ordering of integrals of non-decreasing functions are all equivalent.
7.2 Partially ordered state spaces
What we did in Section 7.1 can be extended to partially ordered state spaces.
7.2.1 Ordering for probability measures
We will show that the above equivalence continues to hold for more general state spaces, provided it is possible to put a partial ordering on them. In what follows, $E$ is Polish and $\mathcal{E}$ is the $\sigma$-algebra of Borel subsets of $E$.
Definition 7.5 A relation $\preceq$ on a space $E$ is called a partial ordering if ($x, y, z$ are generic elements of $E$)
1. $x \preceq x$,
2. $x \preceq y,\ y \preceq z \implies x \preceq z$,
3. $x \preceq y,\ y \preceq x \implies x = y$.
Definition 7.6 Given two probability measures $P, P'$ on $E$, we say that $P'$ stochastically dominates $P$, and write $P \preceq P'$, if
$$P(A) \le P'(A) \qquad \text{for all } A \in \mathcal{E} \text{ non-decreasing},$$
or, equivalently,
$$\int_E f\, dP \le \int_E f\, dP' \qquad \text{for all } f \colon E \to \mathbb{R} \text{ measurable, bounded and non-decreasing}.$$
Theorem 7.7 (Strassen) The following three statements are equivalent:
1. $P \preceq P'$,
2. there exists a coupling $(\hat{X}, \hat{X}')$ of $P$ and $P'$ with probability measure $\hat{P}$ such that $\hat{P}(\hat{X} \preceq \hat{X}') = 1$,
3. $\int_E f\, dP \le \int_E f\, dP'$ for all $f$ measurable, bounded and non-decreasing.
Examples:
$E = \{0,1\}^{\mathbb{Z}}$, $x = (x_i)_{i \in \mathbb{Z}} \in E$, $x \preceq y$ if and only if $x_i \le y_i$ for all $i \in \mathbb{Z}$. For $p \in [0,1]$, let $P_p$ denote the probability measure on $E$ under which $X = (X_i)_{i \in \mathbb{Z}}$ has i.i.d. $\mathrm{BER}(p)$ components. Then $P_p \preceq P_{p'}$ if and only if $p \le p'$.
It is possible to build in dependency. For instance, let $Y = (Y_i)_{i \in \mathbb{Z}}$ be defined by $Y_i = 1_{\{X_{i-1} = X_i = 1\}}$, and let $\tilde{P}_p$ be the law of $Y$ induced by the law $P_p$ of $X$. Then the components of $Y$ are not independent, but again $\tilde{P}_p \preceq \tilde{P}_{p'}$ if and only if $p \le p'$.
Exercise 7.9 Prove the last two claims.
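Both claims rest on the common-uniform coupling: realize both fields from one array of uniforms. A sketch on a finite window of $\mathbb{Z}$ (Python; the $p$-values are illustrative):

```python
import random

def coupled_fields(p, pp, n, rng):
    # One array of uniforms drives both fields: X_i = 1{U_i <= p},
    # X'_i = 1{U_i <= p'}; for p <= p' this gives X <= X' coordinatewise.
    us = [rng.random() for _ in range(n)]
    X  = [int(u <= p)  for u in us]
    Xp = [int(u <= pp) for u in us]
    return X, Xp

rng = random.Random(42)
X, Xp = coupled_fields(0.3, 0.6, 1000, rng)
assert all(x <= xp for x, xp in zip(X, Xp))

# The dependent field Y_i = 1{X_{i-1} = X_i = 1} inherits the ordering.
Y  = [X[i - 1] & X[i] for i in range(1, len(X))]
Yp = [Xp[i - 1] & Xp[i] for i in range(1, len(Xp))]
assert all(y <= yp for y, yp in zip(Y, Yp))
```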
More examples will be encountered in Section 9.
7.2.2 Ordering for Markov chains
The notions of partial ordering and stochastic domination are important also for Markov chains. Let $E$ be a Polish space equipped with a partial ordering $\preceq$. A transition kernel $K$ on $E \times \mathcal{E}$ is a mapping from $E \times \mathcal{E}$ to $[0,1]$ such that:
1. $K(x, \cdot)$ is a probability measure on $E$ for every $x \in E$;
2. $K(\cdot, A)$ is a measurable mapping from $E$ to $[0,1]$ for every $A \in \mathcal{E}$.
The meaning of $K(x, A)$ is the probability for the Markov chain to jump from $x$ into $A$. An example is
$$E = \mathbb{R}^d, \qquad K(x, A) = \frac{|B_1(x) \cap A|}{|B_1(x)|},$$
which corresponds to a Lévy flight on $\mathbb{R}^d$, i.e., a random walk that makes i.i.d. jumps drawn randomly from the unit ball $B_1(0)$ around the origin. The special case where $E$ is a countable set leads to transition matrices: $K(i, A) = \sum_{j \in A} P_{ij}$, $i \in E$.
Definition 7.10 Given two transition kernels $K$ and $K'$ on $E \times \mathcal{E}$, we say that $K'$ stochastically dominates $K$ if
$$K(x, \cdot) \preceq K'(x', \cdot) \qquad \text{for all } x \preceq x'.$$
If $K = K'$ and the latter condition holds, then we say that $K$ is monotone.
Remark: Not all transition kernels are monotone, which is why we cannot simply write $K \preceq K'$ for the property in Definition 7.10.
Lemma 7.11 If $\mu \preceq \mu'$ and $K'$ stochastically dominates $K$, then the laws of the $n$-step paths are ordered:
$$\mu \otimes K^n \preceq \mu' \otimes K'^n \qquad \text{for all } n \in \mathbb{N}_0,$$
where $\mu \otimes K^n$ denotes the law on $E^{n+1}$ of the first $n+1$ states of the chain with initial distribution $\mu$ and kernel $K$.
Proof. The proof is by induction on $n$. The ordering holds for $n = 0$ because $\mu \preceq \mu'$. Suppose that the ordering holds for $n$. Let $f$ be an arbitrary bounded and non-decreasing function on $E^{n+2}$. Then
$$\int_{E^{n+2}} f(x_0, \dots, x_n, x_{n+1})\, (\mu \otimes K^{n+1})(dx_0, \dots, dx_{n+1}) = \int_{E^{n+1}} (\mu \otimes K^n)(dx_0, \dots, dx_n) \int_E f(x_0, \dots, x_n, x_{n+1})\, K(x_n, dx_{n+1}). \qquad (7.1)$$
Since $K'$ stochastically dominates $K$, the inner integral is bounded above by the same expression with $K'(x_n, \cdot)$ in place of $K(x_n, \cdot)$, and the resulting function of $(x_0, \dots, x_n)$ is again bounded and non-decreasing. The induction hypothesis therefore allows us to replace $\mu \otimes K^n$ by $\mu' \otimes K'^n$, which yields the claim for $n + 1$.
By Strassen's theorem (Theorem 7.7), it follows that the two Markov chains $(Z_0, \dots, Z_n)$ with law $\mu \otimes K^n$ and $(Z'_0, \dots, Z'_n)$ with law $\mu' \otimes K'^n$ can be coupled so that $\hat{P}(\hat{Z}_k \preceq \hat{Z}'_k\ \forall\, k) = 1$ for every $n \in \mathbb{N}_0$.
1. $E = \mathbb{R}$, $\preceq$ becomes $\le$. The result says that if $\mu \preceq \mu'$ and $K(x, \cdot) \preceq K'(x', \cdot)$ for all $x \le x'$, then the two Markov chains on $\mathbb{R}$ can be coupled so that they are ordered for all times.
2. $E = \{0,1\}^{\mathbb{Z}}$. Think of an infinite sequence of lamps, labelled by $\mathbb{Z}$, that can be either off or on. The initial distributions are $\mu = P_p$ and $\mu' = P_{p'}$ with $p < p'$. The transition kernels $K$ and $K'$ are such that the lamps change their state independently, at rates
$$K \colon\ 0 \to 1 \text{ at rate } u,\ \ 1 \to 0 \text{ at rate } v, \qquad K' \colon\ 0 \to 1 \text{ at rate } u',\ \ 1 \to 0 \text{ at rate } v',$$
with $u' > u$ and $v' < v$, i.e., $K'$ flips more rapidly on and less rapidly off compared to $K$.
Exercise 7.14 Give an example where the flip rate of a lamp depends on the states of the
two neighboring lamps.
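A discretized sketch of such a coupling for finitely many lamps (Python; the rates and time step are illustrative, and the discrete-time scheme only approximates the continuous-time dynamics):

```python
import random

def flip(state, U, up_rate, down_rate, dt):
    # One discretized update of a single lamp: 0 -> 1 if U < up_rate*dt,
    # 1 -> 0 if U > 1 - down_rate*dt (disjoint regions for small dt).
    if state == 0 and U < up_rate * dt:
        return 1
    if state == 1 and U > 1 - down_rate * dt:
        return 0
    return state

# K flips 0->1 at rate u, 1->0 at rate v; K' at rates u' > u, v' < v.
u, v, up, vp, dt = 1.0, 2.0, 1.5, 0.5, 0.05
rng = random.Random(7)
lamps, lampsp = [0] * 50, [0] * 50     # start from the ordered pair (0, 0)
for _ in range(2000):
    for i in range(50):
        U = rng.random()               # the SAME uniform drives both chains
        lamps[i] = flip(lamps[i], U, u, v, dt)
        lampsp[i] = flip(lampsp[i], U, up, vp, dt)
        assert lamps[i] <= lampsp[i]   # ordering is preserved at all times
```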
7.3 The FKG inequality
Let $S$ be a finite set and let $\mathcal{P}(S)$ be the set of all subsets of $S$ (called the power set of $S$). Then $\mathcal{P}(S)$ is partially ordered by inclusion. A probability measure $\mu$ on $\mathcal{P}(S)$ is called log-convex if
$$\mu(a \cup b)\, \mu(a \cap b) \ge \mu(a)\, \mu(b) \qquad \forall\, a, b \in \mathcal{P}(S). \qquad (7.3)$$
A function $f$ on $\mathcal{P}(S)$ is called non-decreasing if
$$f(b) \ge f(a) \qquad \forall\, a, b \in \mathcal{P}(S) \text{ with } a \subseteq b. \qquad (7.4)$$
Abbreviate $\mu[f] = \sum_{a \in \mathcal{P}(S)} f(a)\, \mu(a)$.
Theorem 7.15 (FKG inequality) If $\mu$ is log-convex and $f, g$ are non-decreasing, then $\mu[fg] \ge \mu[f]\, \mu[g]$.
Proof. The proof is by induction on $|S|$. For $|S| = 0$ the claim is trivial. For the induction step, pick $s \in S$, put $S' = S \setminus \{s\}$, and define on $\mathcal{P}(S')$
$$\mu'(a) = \mu(a) + \mu(a \cup \{s\}), \qquad f'(a) = \frac{f(a)\mu(a) + f(a \cup \{s\})\mu(a \cup \{s\})}{\mu'(a)},$$
and similarly $g'$, so that $\mu'[f'] = \mu[f]$ and $\mu'[g'] = \mu[g]$.
Step 1: $\mu'$ is log-convex on $\mathcal{P}(S')$. For $a, b \in \mathcal{P}(S')$, abbreviate
$$s_1 = \mu(a \cap b), \quad s_2 = \mu(a \cup b), \quad s_3 = \mu([a \cap b] \cup \{s\}), \quad s_4 = \mu([a \cup b] \cup \{s\}),$$
$$t_1 = \mu(a), \quad t_2 = \mu(b), \quad t_3 = \mu(a \cup \{s\}), \quad t_4 = \mu(b \cup \{s\}).$$
The inequality $\mu'(a \cup b)\, \mu'(a \cap b) \ge \mu'(a)\, \mu'(b)$ amounts to
$$(s_1 + s_3)(s_2 + s_4) \ge (t_1 + t_3)(t_2 + t_4),$$
which can be reduced, via Lemma 7.16, to the inequalities
$$s_1 s_2 \ge t_1 t_2, \qquad s_3 s_4 \ge t_3 t_4, \qquad s_2 s_3 \ge t_1 t_4, \qquad s_2 s_3 \ge t_2 t_3.$$
Exercise 7.18 Check the latter inequality by using (7.3) and Lemma 7.16.
Step 2: $f', g'$ are non-decreasing on $\mathcal{P}(S')$. For $a, b \in \mathcal{P}(S')$ with $a \subseteq b$, write
$$f'(b) - f'(a) = \frac{1}{\mu'(a)\,\mu'(b)} \Big\{ [\mu(a) + \mu(a \cup \{s\})]\,[f(b)\mu(b) + f(b \cup \{s\})\mu(b \cup \{s\})] - [\mu(b) + \mu(b \cup \{s\})]\,[f(a)\mu(a) + f(a \cup \{s\})\mu(a \cup \{s\})] \Big\}$$
$$\ge \frac{\mu(a) + \mu(a \cup \{s\})}{\mu'(a)\,\mu'(b)} \Big\{ [f(b) - f(a)]\,\mu(b) + [f(b \cup \{s\}) - f(a \cup \{s\})]\,\mu(b \cup \{s\}) \Big\} \ge 0,$$
where the first inequality uses (7.5) below and the second uses that $f$ is non-decreasing. The same computation applies to $g'$.
Step 3: Conclusion. For each $a \in \mathcal{P}(S')$, the difference $(fg)'(a) - f'(a)g'(a)$ is the covariance of $f$ and $g$ with respect to the two-point measure on $\{a, a \cup \{s\}\}$. The two factors $f(a \cup \{s\}) - f(a)$ and $g(a \cup \{s\}) - g(a)$ in this covariance are either both $\ge 0$ or both $\le 0$, and hence $(fg)'(a) \ge f'(a)g'(a)$ for all $a \in \mathcal{P}(S')$. Combining the three steps with the induction hypothesis applied to $\mu', f', g'$, we get
$$\mu[fg] = \mu'[(fg)'] \ge \mu'[f'g'] \ge \mu'[f']\,\mu'[g'] = \mu[f]\,\mu[g].$$
Remark: The intuition behind log-convexity is the following. First, note that the inequality in (7.3) holds for all $a, b \in \mathcal{P}(S)$ if and only if
$$\frac{\mu(a \cup \{s\})}{\mu(a)} \ge \frac{\mu(\{s\})}{\mu(\emptyset)} \qquad \forall\, a \in \mathcal{P}(S),\ s \in S \setminus a. \qquad (7.5)$$
Next, let $X \in \mathcal{P}(S)$ be the random variable with distribution $P(X = a) = \mu(a)$, $a \in \mathcal{P}(S)$. Define
$$p(a, \{s\}) = P\big( s \in X \mid X \cap (S \setminus \{s\}) = a \big), \qquad a \in \mathcal{P}(S),\ s \in S \setminus a, \qquad (7.6)$$
and note that
$$p(a, \{s\}) = \left( 1 + \Big( \frac{\mu(a \cup \{s\})}{\mu(a)} \Big)^{-1} \right)^{-1}, \qquad a \in \mathcal{P}(S),\ s \in S \setminus a. \qquad (7.7)$$
In view of (7.5)--(7.6), the latter says: larger $X$ are more likely to contain an extra point than smaller $X$.
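The FKG inequality of Section 7.3 can be checked by brute force on a small set $S$. A sketch (Python; the measure below is our own illustrative log-convex choice):

```python
import math
from itertools import combinations

S = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(S, r)]

def mu(a):
    # exp(beta * |a|^2) is log-convex: |a∪b|^2 + |a∩b|^2 >= |a|^2 + |b|^2.
    return math.exp(0.5 * len(a) ** 2)

Z = sum(mu(a) for a in subsets)

def mean(f):
    # mu[f] with mu normalized to a probability measure.
    return sum(f(a) * mu(a) for a in subsets) / Z

# Check the lattice condition (7.3) ...
for a in subsets:
    for b in subsets:
        assert mu(a | b) * mu(a & b) >= mu(a) * mu(b) - 1e-9

# ... and the FKG inequality for two non-decreasing functions.
f = lambda a: float(len(a))
g = lambda a: float(2 in a)
assert mean(lambda a: f(a) * g(a)) >= mean(f) * mean(g)
```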
7.4 The Holley inequality
A closely related result is the Holley inequality: if two probability measures $\mu_1, \mu_2$ on $\mathcal{P}(S)$ satisfy
$$\mu_1(a \cap b)\, \mu_2(a \cup b) \ge \mu_1(a)\, \mu_2(b) \qquad \forall\, a, b \in \mathcal{P}(S), \qquad (7.8)$$
then $\mu_1 \preceq \mu_2$.
The proof runs via coupling. Construct a Markov chain $(\xi_t, \eta_t)_{t \ge 0}$ on $\mathcal{P}(S) \times \mathcal{P}(S)$ whose marginal chains have stationary distributions $\mu_1$, respectively, $\mu_2$:
$$\xi_t \to \mu_1, \qquad \eta_t \to \mu_2 \qquad \text{in distribution as } t \to \infty. \qquad (7.9)$$
The transition rates of the pair are built from the ratios $\mu_i(\,\cdot \cup \{s\})/\mu_i(\,\cdot\,)$, $s \in S$, chosen in such a way that an ordered pair $(\xi, \eta)$ with $\xi \preceq \eta$ moves to an ordered pair again, e.g. jointly to $(\xi \cup \{s\}, \eta \cup \{s\})$ or singly to $(\xi \setminus \{s\}, \eta)$.
Exercise 7.23 Check property (3) by showing that the allowed transitions preserve the ordering of the Markov chains, i.e., if $\xi \preceq \eta$, then the same is true after every allowed transition.
Consequently,
$$\xi_0 \preceq \eta_0 \implies \xi_t \preceq \eta_t \qquad \forall\, t > 0. \qquad (7.10)$$
Check properties (1) and (2). Condition (7.8) is needed to ensure that the rates of the two marginal chains are ordered, i.e., that
$$\frac{\mu_2(\eta \cup \{s\})}{\mu_2(\eta)} \ge \frac{\mu_1(\xi \cup \{s\})}{\mu_1(\xi)} \qquad \text{whenever } \xi \preceq \eta.$$
Letting $t \to \infty$ in (7.10) yields $\mu_1 \preceq \mu_2$.
8 Percolation
8.1 Ordinary percolation
Consider the $d$-dimensional integer lattice $\mathbb{Z}^d$, $d \ge 2$. Draw edges between neighboring sites. Associate with each edge $e$ a random variable $w(e)$, drawn independently from $\mathrm{UNIF}(0,1)$. This gives
$$w = (w(e))_{e \in \mathcal{E}(\mathbb{Z}^d)},$$
where $\mathcal{E}(\mathbb{Z}^d)$ is the set of edges. Pick $p \in [0,1]$, and partition $\mathbb{Z}^d$ into $p$-clusters by connecting all sites that are connected by edges whose weight is $\le p$, i.e.,
$$x \leftrightarrow_p y$$
if there is a path $\pi$ connecting $x$ and $y$ such that $w(e) \le p$ for all $e \in \pi$. Let $C_p(0)$ denote the $p$-cluster containing the origin, and define
$$\theta(p) = P(|C_p(0)| = \infty)$$
with $P$ denoting the law of $w$. Clearly,
$$C_0(0) = \{0\}, \qquad C_1(0) = \mathbb{Z}^d, \qquad p \mapsto C_p(0) \text{ is non-decreasing},$$
so that
$$\theta(0) = 0, \qquad \theta(1) = 1, \qquad p \mapsto \theta(p) \text{ is non-decreasing}.$$
Define
$$p_c = \sup\{p \in [0,1] \colon \theta(p) = 0\}.$$
It is known that $p_c \in (0,1)$ (for $d \ge 2$), and that $p \mapsto \theta(p)$ is continuous for all $p \ne p_c$. Continuity is expected to hold also at $p = p_c$, but this has only been proved for $d = 2$ and $d \ge 19$. It is further known that $p_c = \frac{1}{2}$ for $d = 2$, while no explicit expression for $p_c$ is known for $d \ge 3$. There are good numerical approximations available for $p_c$, as well as expansions in powers of $\frac{1}{2d}$ for $d$ large.
With the partial ordering on $\{0,1\}^{\mathbb{Z}^d}$ obtained by inclusion, the random fields $X = (X_z)_{z \in \mathbb{Z}^d}$ and $X' = (X'_z)_{z \in \mathbb{Z}^d}$ defined by
$$X_z = 1_{\{z \in C_p(0)\}}, \qquad X'_z = 1_{\{z \in C_{p'}(0)\}},$$
satisfy $X \preceq X'$ when $p < p'$.
8.2 Invasion percolation
Again consider $\mathbb{Z}^d$ and $\mathcal{E}(\mathbb{Z}^d)$ with the random field of weights $w$. Grow a cluster from $0$ as follows:
1. Invade the origin: $I(0) = \{0\}$.
2. Look at all the edges touching $I(0)$, choose the edge with the smallest weight, and invade the vertex that lies at the other end: $I(1) = \{0, x\}$, with $x = \mathrm{argmin}_{y \colon \|y\| = 1}\, w(\{0, y\})$.
3. Repeat 2 with $I(1)$ replacing $I(0)$, etc.
In this way we obtain a sequence of growing sets $I = (I(n))_{n \in \mathbb{N}_0}$ with $I(n) \subseteq \mathbb{Z}^d$ and $|I(n)| = n + 1$. The invasion percolation cluster is defined as
$$C_{\mathrm{IPC}} = \lim_{n\to\infty} I(n).$$
This is an infinite subset of $\mathbb{Z}^d$, which is random because $w$ is random. Note that the sequence $I$ is uniquely determined by $w$ (because no two edges have the same weight).
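The growth algorithm is straightforward to simulate on $\mathbb{Z}^2$. A sketch (Python, our own illustrative implementation); it also illustrates Theorem 8.1 below, in that the weights accepted at late steps concentrate near $p_c = \frac{1}{2}$:

```python
import heapq, random

def invasion_percolation(steps, rng):
    # Grow the invasion cluster from the origin of Z^2: repeatedly invade
    # the boundary edge of smallest UNIF(0,1) weight. Returns the invaded
    # vertex set and the accepted edge weights W_1, W_2, ...
    invaded = {(0, 0)}
    edge_w = {}
    frontier = []                           # heap of (weight, end vertex)

    def weight(a, b):
        e = (min(a, b), max(a, b))          # one weight per undirected edge
        if e not in edge_w:
            edge_w[e] = rng.random()
        return edge_w[e]

    def push_edges(v):
        x, y = v
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in invaded:
                heapq.heappush(frontier, (weight(v, nb), nb))

    push_edges((0, 0))
    weights_used = []
    while len(weights_used) < steps:
        w, v = heapq.heappop(frontier)
        if v in invaded:
            continue                        # stale entry, vertex already in
        invaded.add(v)
        weights_used.append(w)
        push_edges(v)
    return invaded, weights_used

rng = random.Random(3)
invaded, W = invasion_percolation(4000, rng)
assert len(invaded) == 4001
late = W[2000:]                             # late accepted weights ~ p_c
assert 0.4 < max(late) < 0.8
```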
The first question we may ask is whether $C_{\mathrm{IPC}} = \mathbb{Z}^d$. The answer is no:
$$C_{\mathrm{IPC}} \subsetneq \mathbb{Z}^d \qquad \text{a.s.}$$
In fact, $C_{\mathrm{IPC}}$ has zero density:
$$\lim_{N\to\infty} \frac{1}{|B_N|}\, |B_N \cap C_{\mathrm{IPC}}| = 0 \quad \text{a.s.}, \qquad B_N = [-N, N]^d \cap \mathbb{Z}^d.$$
A key result for invasion percolation is the following. Let $W_n$ denote the weight of the edge that is traversed in the $n$-th step of the growth of $C_{\mathrm{IPC}}$, i.e., in going from $I(n-1)$ to $I(n)$.
Theorem 8.1
$$\limsup_{n\to\infty} W_n = p_c \qquad \text{a.s.}$$
Proof. Pick $p > p_c$. Then the union of all the $p$-clusters contains a unique infinite component, denoted by $C_p^\infty$, whose asymptotic density is $\theta(p) > 0$. All edges incident to $C_p^\infty$, i.e., with exactly one endpoint in $C_p^\infty$, have weight $> p$. Let $\tau_p$ denote the first time a vertex in $C_p^\infty$ is invaded:
$$\tau_p = \inf\{n \in \mathbb{N}_0 \colon I(n) \cap C_p^\infty \ne \emptyset\}.$$
Lemma 8.2 $P(\tau_p < \infty) = 1$.
Proof. Each time the invasion breaks out of the box with center $0$ it is contained in, it sees a never-before-explored region containing a half-space. There is an independent probability $\ge \theta(p) > 0$ that it hits $C_p^\infty$ at such a break-out time. (This argument uses that $p_c(\mathbb{Z}^d) = p_c(\text{halfspace})$.)
We proceed with the proof of Theorem 8.1. After time $\tau_p$ the invaded region contains a vertex of $C_p^\infty$. Since the vertices of $C_p^\infty$ are connected to one another by paths of edges of weight $\le p$, from time $\tau_p$ onwards there is always a boundary edge of weight $\le p$ available, and so the invasion never again accepts an edge of weight $> p$: after time $\tau_p$ the invasion is stuck inside $C_p^\infty$ forever. Not only does this show that $C_{\mathrm{IPC}} \subseteq I(\tau_p) \cup C_p^\infty \subsetneq \mathbb{Z}^d$, it also shows that $W_n \le p$ for all $n$ large enough a.s. Since $p > p_c$ is arbitrary, it follows that
$$\limsup_{n\to\infty} W_n \le p_c \qquad \text{a.s.}$$
To see that equality must hold, suppose that $W_n \le p$ for all $n$ large enough, for some $p < p_c$. Then $C_{\mathrm{IPC}} \subseteq C_p(x) \cup \{\text{some finite set}\}$ for some $x \in \mathbb{Z}^d$. But all $p$-clusters are finite a.s. for $p < p_c$, and this contradicts $|C_{\mathrm{IPC}}| = \infty$. (The finite set is $I(n_0)$ with $n_0$ the smallest integer such that $W_n \le p$ for all $n \ge n_0$.)
Theorem 8.1 shows that invasion percolation is an example of a stochastic dynamics that exhibits self-organized criticality: $C_{\mathrm{IPC}}$ is in some sense close to $C_{p_c}$ for ordinary percolation. Informally:
$$C_{\mathrm{IPC}} = \text{``}C_{p_c} + \epsilon\text{''}.$$
Very little is known about the probability distribution of $C_{\mathrm{IPC}}$.
8.3 Invasion percolation on a tree
If we replace $\mathbb{Z}^d$ by $\mathcal{T}_\sigma$, the rooted tree with branching number $\sigma \ge 2$, then a lot can be said about $C_{\mathrm{IPC}}$ in detail. What follows is taken from Angel, Goodman, den Hollander and Slade [1].
Proof. We only give the proof for $C_{\mathrm{IPC}}$. By symmetry, all possible backbones are equally likely. Condition on the backbone, abbreviated BB. Conditional on $W = (W_k)_{k \in \mathbb{N}_0}$, the forward maximal weights along the backbone, the following is true for every vertex $x \in \mathcal{T}_\sigma$:
$$x \in C_{\mathrm{IPC}} \iff \text{every edge on the path between } x_{\mathrm{BB}} \text{ and } x \text{ has weight } < W_k,$$
where $x_{\mathrm{BB}}$ is the vertex where the path downwards from $x$ hits BB and $k$ is the height of $x_{\mathrm{BB}}$. Therefore, the event $\{\mathrm{BB} = bb,\, W = w\}$ is the same as the event that for all $k \in \mathbb{N}_0$ there is no percolation below level $W_k$ (i.e., for $p < W_k$) in each of the branches off BB at height $k$, and the forward maximal weights along $bb$ are equal to $w$.
On the tree, there is a nice duality relation between subcritical and supercritical percolation.
Lemma 8.5 A supercritical percolation cluster with parameter $p > p_c$ conditioned to stay finite has the same law as a subcritical percolation cluster with dual parameter $p^* < p_c$ given by
$$p^* = p\, \zeta(p)^{\sigma - 1},$$
with $\zeta(p)$ the probability that the cluster along a particular branch from $0$ is finite.
Proof. For $v \in \mathcal{T}_\sigma$, let $C(v)$ denote the forward cluster of $v$ for $p$-percolation. Let $U$ be any finite subtree of $\mathcal{T}_\sigma$ rooted at $v$ with, say, $m$ edges, and hence with $(\sigma - 1)m + \sigma$ boundary edges. Then
$$P_p\big( U \subseteq C(v) \mid |C(v)| < \infty \big) = \frac{p^m\, \zeta(p)^{(\sigma-1)m+\sigma}}{\zeta(p)^{\sigma}} = \big( p\, \zeta(p)^{\sigma-1} \big)^m,$$
the numerator being the probability of the event that the edges of $U$ are open and there is no percolation from any of the sites in $U$. The r.h.s. equals
$$(p^*)^m = P_{p^*}\big( U \subseteq C(v) \big),$$
which proves the duality. To see that $p > p_c$ implies $p^* < p_c$, note that $\zeta(p)$ is the smallest solution of
$$\zeta(p) = 1 - p + p\, \zeta(p)^{\sigma},$$
from which the claim follows by a direct computation.
We can now complete the proof of $C_{\mathrm{IPC}} \preceq C_{\mathrm{IIC}}$: since $C_{\mathrm{IPC}}$ has subcritical clusters hanging off its backbone, these branches are all stochastically smaller than the critical branches hanging off the backbone of $C_{\mathrm{IIC}}$.
9 Interacting particle systems
9.1 Definitions
An Interacting Particle System (IPS) is a Markov process $\eta = (\eta_t)_{t \ge 0}$ on the state space $\Omega = \{0,1\}^{\mathbb{Z}^d}$ (or $\Omega = \{-1,+1\}^{\mathbb{Z}^d}$), $d \ge 1$, where
$$\eta_t = \{\eta_t(x) \colon x \in \mathbb{Z}^d\}$$
denotes the configuration at time $t$, with $\eta_t(x) = 1$ or $0$ meaning that there is a particle or hole at site $x$ at time $t$, respectively.
hole at site x at time t, respectively. Alternative interpretations are
1 = infected/spin-up/democrat
0 = healthy/spin-down/republican.
The configuration changes with time and this models how a virus spreads through a population, how magnetic atoms in iron flip up and down as a result of noise due to temperature, or
how the popularity of two political parties evolves in an election campaign.
The evolution is modeled by specifying a set of local transition rates
$$c(x, \eta), \qquad x \in \mathbb{Z}^d,\ \eta \in \Omega, \qquad (9.1)$$
playing the role of the rate at which the state at site $x$ changes in the configuration $\eta$, i.e.,
$$\eta \to \eta^x$$
with $\eta^x$ the configuration obtained from $\eta$ by changing the state at site $x$ (either $0 \to 1$ or $1 \to 0$). Since there are only two possible states at each site, such systems are called spin-flip systems.
Remark: It is possible to allow more than two states, e.g. $\{-1, 0, 1\}$ or $\mathbb{N}_0$. It is also possible to allow more than one site to change state at a time, e.g. swapping states ($01 \to 10$ or $10 \to 01$). In what follows we focus entirely on spin-flip systems.
If $c(x, \eta)$ depends on $\eta$ only via $\eta(x)$, the value of the spin at $x$, then $\eta$ consists of independent spin-flips. In general, however, the rate to flip the spin at $x$ may depend on the spins in the neighborhood of $x$ (possibly even on all spins). This dependence models an interaction between the spins at different sites. In order for $\eta$ to be well-defined, some restrictions must be placed on the family in (9.1), e.g. $c(x, \eta)$ must depend only weakly on the states at far-away sites (formally, $\eta \mapsto c(x, \eta)$ is continuous in the product topology), and must not be too large (formally, uniformly bounded in some appropriate sense).
9.2 Shift-invariance and attractiveness
A natural assumption is that the rates are shift-invariant:
$$c(x, \eta) = c(0, \sigma_x \eta), \qquad x \in \mathbb{Z}^d,\ \eta \in \Omega, \qquad (9.2)$$
with $\sigma_y$ the shift of space over $y$, i.e., $(\sigma_y \eta)(x) = \eta(x + y)$, $x \in \mathbb{Z}^d$. Property (9.2) says that the flip rate at $x$ only depends on the configuration as seen relative to $x$, which is natural when the interaction between spins is homogeneous in space. Another useful and frequently used assumption is that the interaction favors spins that are alike, i.e.,
$$c(x, \eta) \le c(x, \eta') \quad \text{if } \eta \preceq \eta',\ \eta(x) = \eta'(x) = 0,$$
$$c(x, \eta) \ge c(x, \eta') \quad \text{if } \eta \preceq \eta',\ \eta(x) = \eta'(x) = 1. \qquad (9.3)$$
Property (9.3) says that the spin at $x$ flips up faster in $\eta'$ than in $\eta$ when $\eta'$ is everywhere larger than $\eta$, but flips down slower. In other words, the dynamics preserves the order $\preceq$. Spin-flip systems with this property are called attractive.
Exercise 9.1 Give the proof of the above statement with the help of maximal coupling.
We next give four examples of systems satisfying properties (9.2) and (9.3).
1. (Ferromagnetic) Stochastic Ising Model (SIM):
This model is defined on $\Omega = \{-1,+1\}^{\mathbb{Z}^d}$ with rates
$$c(x, \eta) = \exp\Big[ -\beta\, \eta(x) \sum_{y \sim x} \eta(y) \Big], \qquad \beta \ge 0,$$
which means that spins prefer to align with the majority of the neighboring spins.
2. Contact Process (CP):
This model is defined on $\Omega = \{0,1\}^{\mathbb{Z}^d}$ with rates
$$c(x, \eta) = \begin{cases} \lambda \sum_{y \sim x} \eta(y), & \text{if } \eta(x) = 0, \\ 1, & \text{if } \eta(x) = 1, \end{cases} \qquad \lambda > 0,$$
which means that infected sites become healthy at rate $1$ and healthy sites become infected at rate $\lambda$ times the number of infected neighbors.
3. Voter Model (VM):
This model is defined on $\Omega = \{0,1\}^{\mathbb{Z}^d}$ with rates
$$c(x, \eta) = \frac{1}{2d} \sum_{y \sim x} 1_{\{\eta(y) \ne \eta(x)\}},$$
which means that sites choose a random neighbor at rate $1$ and adopt the opinion of that neighbor.
4. Majority Vote Process (MVP):
This model is defined on $\Omega = \{0,1\}^{\mathbb{Z}}$ with rates
$$c(x, \eta) = \begin{cases} 1 - \delta, & \text{if } \eta(x-1) = \eta(x+1) \ne \eta(x), \\ \delta, & \text{otherwise}, \end{cases} \qquad \delta \in (0, \tfrac{1}{2}],$$
which means that sites change opinion at rate $1 - \delta$ when both neighbors have a different opinion and at rate $\delta$ otherwise.
In the sequel we will discuss each model in some detail, with coupling techniques playing a
central role. We will see that properties (9.2) and (9.3) allow for a number of interesting
conclusions about the equilibrium behavior of these systems, as well as the convergence to
equilibrium.
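A Gillespie-type simulation of the contact process on a finite ring illustrates the extinction/survival dichotomy that is analyzed in the following sections (Python sketch with illustrative parameters, not part of the notes):

```python
import random

def contact_process(lam, n_sites, t_max, rng):
    # Gillespie simulation of the CP on a ring of n_sites: infected sites
    # heal at rate 1, healthy sites get infected at rate lam times the
    # number of infected neighbours. Starts fully infected.
    eta = [1] * n_sites
    infected = n_sites
    t = 0.0
    while t < t_max and infected > 0:
        rates = []
        for x in range(n_sites):
            if eta[x] == 1:
                rates.append(1.0)
            else:
                nb = eta[(x - 1) % n_sites] + eta[(x + 1) % n_sites]
                rates.append(lam * nb)
        total = sum(rates)
        t += rng.expovariate(total)
        u, acc = rng.random() * total, 0.0
        for x, r in enumerate(rates):
            acc += r
            if u < acc:
                break
        eta[x] = 1 - eta[x]                     # flip the chosen site
        infected += 1 if eta[x] == 1 else -1
    return infected                             # infected sites at the end

rng = random.Random(11)
# Far below the critical value the infection dies out quickly ...
assert contact_process(0.05, 30, 200.0, rng) == 0
# ... far above it, it is still alive at the time horizon.
assert contact_process(5.0, 30, 200.0, rng) > 0
```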
9.3 Convergence to equilibrium
Write $[0]$ and $[1]$ to denote the configurations $\eta \equiv 0$ and $\eta \equiv 1$, respectively. These are the smallest, respectively, the largest configurations in the partial order, and hence
$$[0] \preceq \eta \preceq [1], \qquad \eta \in \Omega.$$
Since the dynamics preserves the partial order, we can obtain information about what happens when the system starts from any $\eta$ by comparing with what happens when it starts from $[0]$ or $[1]$.
Lemma 9.2 Let $P = (P_t)_{t \ge 0}$ denote the semigroup of transition kernels associated with $\eta$. Write $\delta_\eta P_t$ to denote the law of $\eta_t$ conditional on $\eta_0 = \eta$ (which is a probability distribution on $\Omega$). Then
$$t \mapsto \delta_{[0]} P_t \text{ is stochastically increasing}, \qquad t \mapsto \delta_{[1]} P_t \text{ is stochastically decreasing}.$$
Proof. For $t, h \ge 0$,
$$\delta_{[0]} P_{t+h} = (\delta_{[0]} P_h) P_t \succeq \delta_{[0]} P_t, \qquad \delta_{[1]} P_{t+h} = (\delta_{[1]} P_h) P_t \preceq \delta_{[1]} P_t,$$
where we use that $\delta_{[0]} P_h \succeq \delta_{[0]}$ and $\delta_{[1]} P_h \preceq \delta_{[1]}$ for any $h \ge 0$, and also use Strassen's theorem (Theorem 7.7) to take advantage of the coupling representation that goes with the partial order.
Corollary 9.3 Both
$$\nu_- = \lim_{t\to\infty} \delta_{[0]} P_t \quad \text{(lower stationary law)}, \qquad \nu_+ = \lim_{t\to\infty} \delta_{[1]} P_t \quad \text{(upper stationary law)},$$
exist as probability distributions on $\Omega$ and are equilibria for the dynamics. Any other equilibrium $\nu$ satisfies $\nu_- \preceq \nu \preceq \nu_+$.
Proof. This is an immediate consequence of Lemma 9.2 and the sandwich $\delta_{[0]} P_t \preceq \mu P_t \preceq \delta_{[1]} P_t$ for $\mu$ any distribution on $\Omega$ and $t \ge 0$.
The class of all equilibria for the dynamics is a convex set in the space of signed bounded measures on $\Omega$. An element of this set is called extremal if it is not a proper linear combination of any two distinct elements in the set, i.e., not of the form $p \nu_1 + (1-p) \nu_2$ for some $p \in (0,1)$ and $\nu_1 \ne \nu_2$.
Lemma 9.4 Both $\nu_-$ and $\nu_+$ are extremal.
Proof. We give the proof for $\nu_-$, the proof for $\nu_+$ being analogous. Suppose that $\nu_- = p \nu_1 + (1-p) \nu_2$. Since $\nu_1$ and $\nu_2$ are equilibria, we have by Corollary 9.3 that
$$\int f\, d\nu_1 \ge \int f\, d\nu_-, \qquad \int f\, d\nu_2 \ge \int f\, d\nu_-,$$
for every non-decreasing $f$. Since
$$\int f\, d\nu_- = p \int f\, d\nu_1 + (1-p) \int f\, d\nu_2$$
and $p \in (0,1)$, it follows that both inequalities must be equalities. Since the integrals of increasing functions determine the measure w.r.t. which is integrated, it follows that $\nu_1 = \nu_- = \nu_2$.
Exercise 9.5 Prove that the integrals of increasing functions determine the measure.
Corollary 9.6 The following three properties are equivalent (for shift-invariant spin-flip systems):
(1) $\eta$ is ergodic (i.e., $\mu P_t$ converges to the same limit distribution as $t \to \infty$ for all $\mu$),
(2) there is a unique stationary distribution,
(3) $\nu_- = \nu_+$.
Proof. Obvious because of the sandwiching of all the configurations between $[0]$ and $[1]$.
9.4 Four examples
9.4.1 Example 1: Stochastic Ising Model (SIM)
For $\beta = 0$, $c(x, \eta) = 1$ for all $x$ and $\eta$, in which case the dynamics consists of independent spin-flips, up and down at rate $1$. In that case $\nu_- = \nu_+ = (\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{+1})^{\otimes \mathbb{Z}^d}$.
For $\beta > 0$ the dynamics has a tendency to align spins. For small $\beta$ this tendency is weak, for large $\beta$ it is strong. It turns out that in $d \ge 2$ there is a critical value $\beta_d \in (0,\infty)$ such that
$$\beta \le \beta_d \colon\ \nu_- = \nu_+, \qquad \beta > \beta_d \colon\ \nu_- \ne \nu_+.$$
The proof uses the so-called Peierls argument, which we will encounter in Section 9.5. In the first case (high temperature), there is a unique ergodic equilibrium, which depends on $\beta$ and is denoted by $\nu_\beta$. In the second case (low temperature), there are two extremal equilibria, both of which depend on $\beta$ and are denoted by
$$\nu_\beta^+ = \text{plus-state with } \int_\Omega \eta(0)\, \nu_\beta^+(d\eta) > 0, \qquad \nu_\beta^- = \text{minus-state with } \int_\Omega \eta(0)\, \nu_\beta^-(d\eta) < 0,$$
which are called the magnetized states. Note that $\nu_\beta^+$ and $\nu_\beta^-$ are images of each other under the swapping of $+1$'s and $-1$'s. It can be shown that in $d = 2$ all equilibria are a convex combination of $\nu_\beta^+$ and $\nu_\beta^-$, while in $d \ge 3$ also other equilibria are possible (e.g. not shift-invariant) when $\beta$ is large enough. It turns out that $\beta_1 = \infty$, i.e., in $d = 1$ the SIM is ergodic for all $\beta > 0$.
9.4.2 Example 2: Contact Process (CP)
Note that $[0]$ is a trap for the dynamics (if all sites are healthy, then no infection will ever occur), and so
$$\nu_- = \delta_{[0]}.$$
For small $\lambda$ infection is transmitted slowly, for large $\lambda$ rapidly. It turns out that in $d \ge 1$ there is a critical value $\lambda_d \in (0,\infty)$ such that
$$\lambda \le \lambda_d \colon\ \nu_+ = \delta_{[0]}, \qquad \lambda > \lambda_d \colon\ \nu_+ \ne \delta_{[0]}.$$
9.4.3 Example 3: Voter Model (VM)
Note that $[0]$ and $[1]$ are both traps for the dynamics (if all sites have the same opinion, then no change of opinion occurs), and so
$$\nu_- = \delta_{[0]}, \qquad \nu_+ = \delta_{[1]}.$$
It turns out that in $d = 1, 2$ these are the only extremal equilibria, while in $d \ge 3$ there is a $1$-parameter family of equilibria
$$(\nu_\rho)_{\rho \in [0,1]}$$
with $\rho$ the density of $1$'s, i.e., $\nu_\rho(\eta(0) = 1) = \rho$. This is remarkable because the VM has no parameter to play with. For $\rho = 0$ and $\rho = 1$ these equilibria coincide with $\delta_{[0]}$ and $\delta_{[1]}$, respectively.
Remark: The dichotomy $d = 1, 2$ versus $d \ge 3$ is directly related to simple random walk being recurrent in $d = 1, 2$ and transient in $d \ge 3$.
9.4.4 Example 4: Majority Vote Process (MVP)
It turns out that $\nu_- = \nu_+$ for all $\delta \in (0, \frac{1}{2}]$. This can be proved with the help of an auxiliary two-state Markov chain. The convergence to equilibrium is exponentially fast. Note that $\delta = \frac{1}{2}$ corresponds to independent spin-flips.
9.5 A closer look at the contact process
We will next prove (i)--(iii) in Lemma 9.7. This will take up some space. In the proof we need a property of the CP called self-duality. We will not explain in detail what this is, but only state that it says the following:
the CP locally dies out (in the sense of weak convergence) starting from $[1]$ if and only if the CP fully dies out when starting from a configuration with finitely many infections, e.g. from the single infection $\{0\}$.
For details we refer to Liggett [9].
9.5.1 Monotonicity in $\lambda$
Pick $\lambda_1 < \lambda_2$. Let $c_{\lambda_1}(x, \eta)$ and $c_{\lambda_2}(x, \eta)$ denote the local transition rates of the CP with parameters $\lambda_1$ and $\lambda_2$, respectively. Then it is easily checked that, for all $x \in \mathbb{Z}^d$ and $\eta, \eta' \in \Omega$,
$$c_{\lambda_1}(x, \eta) \le c_{\lambda_2}(x, \eta') \quad \text{if } \eta \preceq \eta',\ \eta(x) = \eta'(x) = 0, \qquad c_{\lambda_1}(x, \eta) \ge c_{\lambda_2}(x, \eta') \quad \text{if } \eta \preceq \eta',\ \eta(x) = \eta'(x) = 1.$$
(For the CP the last inequality is in fact an equality.) Consequently,
$$\delta_{[1]} P_t^{\lambda_1} \preceq \delta_{[1]} P_t^{\lambda_2} \qquad \forall\, t \ge 0,$$
by the maximal coupling, with $P^\lambda = (P_t^\lambda)_{t \ge 0}$ denoting the semigroup of the CP with parameter $\lambda$. Letting $t \to \infty$, we get
$$\bar{\nu}_{\lambda_1} \preceq \bar{\nu}_{\lambda_2}$$
with $\bar{\nu}_\lambda$ the upper invariant measure of the CP with parameter $\lambda$. With $\rho(\lambda) = \bar{\nu}_\lambda(\eta(0) = 1)$ denoting the density of $1$'s in equilibrium, it follows that $\rho(\lambda_1) \le \rho(\lambda_2)$. Hence
$$\lambda_d = \inf\{\lambda \ge 0 \colon \rho(\lambda) > 0\} = \sup\{\lambda \ge 0 \colon \rho(\lambda) = 0\}$$
defines a unique critical value, separating a phase of (local) extinction of the infection from a phase of (local) survival of the infection. The curve $\lambda \mapsto \rho(\lambda)$ is continuous on $[0,\infty)$. The continuity at $\lambda = \lambda_d$ is hard to prove.
9.5.2 A lower bound on the critical value
Pick $A_0 \subseteq \mathbb{Z}^d$ finite and consider the CP with parameter $\lambda$ starting from the set $A_0$ as the set of infected sites. Let $A_t$ denote the set of infected sites at time $t$. Then
$$|A_t| \text{ decreases by } 1 \text{ at rate } |A_t|, \qquad |A_t| \text{ increases by } 1 \text{ at rate at most } 2d\lambda |A_t|,$$
where the latter holds because each site in $A_t$ has at most $2d$ non-infected neighbors. Now consider the random process $X = (X_t)_{t \ge 0}$ with $X_t = |A_t|$, and let $Y = (Y_t)_{t \ge 0}$ be the birth-death process on $\mathbb{N}_0$ that moves at rate $n$ from $n$ to $n-1$ (death) and at rate $(2d\lambda)n$ from $n$ to $n+1$ (birth), both starting from $n_0 = |A_0|$. Then $X$ and $Y$ can be coupled such that
$$\hat{P}(X_t \le Y_t\ \forall\, t \ge 0) = 1,$$
where $\hat{P}$ denotes the coupling measure. Note that $n = 0$ is a trap for both $X$ and $Y$. If $2d\lambda < 1$, then this trap is hit with probability $1$ by $Y$, i.e., $\lim_{t\to\infty} Y_t = 0$ a.s., and hence it follows that $\lim_{t\to\infty} X_t = 0$ a.s. Therefore $\rho(\lambda) = 0$ when $2d\lambda < 1$. Consequently, $2d\lambda_d \ge 1$.
9.5.3 Comparison between dimensions $d$ and $1$
The idea is to couple two CPs that live in dimensions $1$ and $d$. Let $A = (A_t)_{t \ge 0}$ with $A_t$ the set of infected sites at time $t$ of the CP in dimension $d$ with parameter $\lambda$ starting from $A_0 = \{0\}$. Let $B = (B_t)_{t \ge 0}$ be the same as $A$, but for the CP in dimension $1$ with parameter $d\lambda$ starting from $B_0 = \{0\}$.
Define the projection $\pi_d \colon \mathbb{Z}^d \to \mathbb{Z}$ as
$$\pi_d(x_1, \dots, x_d) = x_1 + \dots + x_d.$$
We will construct a coupling $\hat{P}$ of $A$ and $B$ such that
$$\hat{P}\big( B_t \subseteq \pi_d(A_t)\ \forall\, t \ge 0 \big) = 1.$$
From this we get
$$P\big( A_t \ne \emptyset \mid A_0 = \{0\} \big) = P\big( \pi_d(A_t) \ne \emptyset \mid A_0 = \{0\} \big) \ge P\big( B_t \ne \emptyset \mid B_0 = \{0\} \big),$$
which implies that if $A$ dies out, then also $B$ dies out. In other words, if $\lambda \le \lambda_d$, then $d\lambda \le \lambda_1$, which implies that $d\lambda_d \le \lambda_1$, as claimed.
The construction of the coupling is as follows. Fix $t \ge 0$. Suppose that $A_t = A$ and $B_t = B$ with $B \subseteq \pi_d(A)$. For each $y \in B$ there is at least one $x \in A$ with $y = \pi_d(x)$. Pick one such $x$ for every $y$ (e.g. choose the closest up or the closest down), and couple the healing and infection events of $y$ with those of its representative $x$.
The proof that $\lambda_1 < \infty$ proceeds via comparison with directed site percolation on $\mathbb{Z}^2$. We first make a digression into this part of percolation theory.
Each site is open with probability $p$ and closed with probability $1 - p$, independently of all other sites, with $p \in [0,1]$. The associated probability law on configuration space is denoted by $P_p$. We say that $y$ is connected to $x$, written as $x \leadsto y$, if there is a path from $x$ to $y$ such that
1. all sites in the path are open (including $x$ and $y$),
2. the path traverses bonds in the upward direction.
Fact 9.9 For directed site percolation on the half-lattice $H$, $p_c \le \frac{80}{81}$.
Proof. Pick $N \in \mathbb{N}$ and define
$$C_N = \bigcup_{i=0}^{N} \{x \in H \colon (2i, 0) \leadsto x\} = \text{all sites connected to the lower left boundary of } H \text{ (including the origin)}.$$
We want to lay a contour around $C_N$. To do so, we consider the oriented lattice that is obtained by shifting all sites and bonds downward by $1$. We call this the dual lattice, because the two lattices together make up $\mathbb{Z}^2$ (with upward orientation). Now define
$$\Gamma_N = \text{the exterior boundary of the set of all faces in the dual lattice containing a site of } C_N \text{ or one of the boundary sites } (2i+1, -1),\ i = 1, \dots, N.$$
Think of $\Gamma_N$ as a path from $(0, -1)$ to $(2N, -1)$ in the dual lattice, enclosing $C_N$ and being allowed to cross bonds in both directions. We call $\Gamma_N$ the contour of $C_N$. We need the following observations:
If $C_N$ is finite, then $\Gamma_N$ is a contour of some length $n \ge 2N$, each contour of length $n$ forces at least $n/4$ specified sites to be closed, and the number of contours of length $n$ is at most $4 \cdot 3^{n-2}$. Hence
$$P_p(|C_N| < \infty) \le \sum_{n=2N}^{\infty} 4 \cdot 3^{n-2}\, (1-p)^{n/4}.$$
If $p > \frac{80}{81}$, then $3(1-p)^{1/4} < 1$ and so the sum is smaller than $1$ for $N$ sufficiently large, i.e., $P_p(|C_N| = \infty) > 0$ for $N \ge N_0(p)$. Using the translation invariance, we have
$$P_p(|C_N| = \infty) \le (N+1)\, P_p(|C_0| = \infty).$$
Hence, if $p > \frac{80}{81}$, then $P_p(|C_0| = \infty) > 0$, which implies that $p_c \le \frac{80}{81}$.
The contour argument above is referred to as a Peierls argument. A similar argument works
for many other models as well (such as SIM).
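Because the contour bound is a geometric series in r = 3(1 − p)^{1/4}, its tail has a closed form that can be evaluated directly. The sketch below (function name ours) takes the bound exactly as displayed above, with terms 4 · 3^{n−2}(1 − p)^{n/4} summed from n = 2N, and locates the first N for which the tail drops below 1.

```python
def contour_sum(p, N):
    """Closed form of sum_{n >= 2N} 4 * 3**(n - 2) * (1 - p)**(n / 4).

    Each term equals (4/9) * r**n with r = 3 * (1 - p)**0.25, so for
    r < 1 the tail sums to (4/9) * r**(2 * N) / (1 - r)."""
    r = 3.0 * (1.0 - p) ** 0.25
    if r >= 1.0:
        return float("inf")        # series diverges for r >= 1
    return (4.0 / 9.0) * r ** (2 * N) / (1.0 - r)

p = 81.0 / 82.0                    # any p > 80/81 gives r < 1
values = [contour_sum(p, N) for N in range(1, 5000)]
# First N certifying P_p(|C_N| = infinity) > 0 via the Peierls bound.
N0 = next(N for N, v in zip(range(1, 5000), values) if v < 1.0)
```

For p only slightly above 80/81 the required N is large, since r is close to 1, which matches the phrase "for N sufficiently large".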
Fact 9.9 is the key to proving that λ_1 < ∞, as we next show. The proof uses a coupling
argument showing that the one-dimensional CP observed at times 0, Δ, 2Δ, . . ., with Δ > 0 small
enough, dominates oriented percolation with p > 80/81 ≥ p_c and therefore (locally) survives.
Fact 9.11 The one-dimensional CP survives if
( λ/(λ + 1) )² ( 1/(λ + 1) )^{2/λ} > 80/81.
Proof. Again consider the half-lattice H that was used for directed percolation. Pick Δ > 0
and shrink the vertical direction by a factor Δ. Add dotted vertical lines that represent the
time axes associated with the sites of Z. In this graph we are going to construct the CP and
oriented percolation together. This construction comes in three steps.
Step 1: With each time axis we associate three Poisson processes:
1. healing marks h, at rate 1,
2. infection arrows i+ pointing to the right neighbor, at rate λ,
3. infection arrows i− pointing to the left neighbor, at rate λ.
All Poisson point processes at all time axes are independent. Given their realization, we define
A_t = the set of x ∈ Z such that (x, t) can be reached from (0, 0)
by a path that only goes upwards along stretches without
h's and sideways along arrows i+ or i−.
Exercise 9.12 Show that A = (A_t)_{t≥0} is the CP with parameter λ starting from A_0 = {0}.
Step 2: We say that the site (x, n) is open if
(i) between times (n − 1)Δ and (n + 1)Δ there is no h;
(ii) between times nΔ and (n + 1)Δ there are both an i+ and an i−.
Define
B_n = the set of x ∈ Z such that (0, 0) ⇝ (x, n).
Exercise 9.13 Show that B_n = {x ∈ Z : (x, n) ∈ C_0}, where C_0 is the cluster at the origin
in oriented percolation with p = e^{−2Δ} (1 − e^{−λΔ})².
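The value of p is just the product of the probabilities of three independent Poisson-process events, which can be checked by simulation. The sketch below (function names ours) builds the Poisson counts from exponential inter-arrival times rather than assuming the formula being verified.

```python
import math
import random

def npoints(rate, length, rng):
    """Number of points of a rate-`rate` Poisson process in [0, length],
    generated from exponential inter-arrival times."""
    t, n = rng.expovariate(rate), 0
    while t <= length:
        n += 1
        t += rng.expovariate(rate)
    return n

def open_site_mc(lam, delta, trials, rng):
    """Monte Carlo estimate of the probability that a site is open:
    no h-mark (rate 1) in a stretch of length 2*delta, and at least
    one i+ and one i- arrow (rate lam each) in a stretch of length delta."""
    hits = 0
    for _ in range(trials):
        no_h = npoints(1.0, 2.0 * delta, rng) == 0
        has_ip = npoints(lam, delta, rng) >= 1
        has_im = npoints(lam, delta, rng) >= 1
        hits += no_h and has_ip and has_im
    return hits / trials

lam, delta = 4.0, 0.3
exact = math.exp(-2.0 * delta) * (1.0 - math.exp(-lam * delta)) ** 2
estimate = open_site_mc(lam, delta, 100_000, random.Random(1))
```

The estimate agrees with the closed-form p up to Monte Carlo error, which confirms the factorization into the three independent events (i) and (ii).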
Step 3: The key part of the coupling is
A_{nΔ} ⊇ B_n ∀ n ∈ N_0.
Since
P( B_n ≠ ∅ ∀ n ∈ N_0 | B_0 = {0} ) = P_p(|C_0| = ∞),
we obtain, with the help of Fact 9.9, that the one-dimensional CP with parameter λ survives
if
sup_{Δ>0} e^{−2Δ} (1 − e^{−λΔ})² > 80/81,
where in the l.h.s. we optimize over Δ, which is allowed because the previous estimates hold
for all Δ > 0. The supremum is attained at
Δ = (1/λ) log(λ + 1),
which yields the claim in Fact 9.11.
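The optimization over Δ can be verified numerically. The sketch below (function names ours) checks that Δ = (1/λ) log(λ + 1) beats a fine grid of alternatives and that the resulting maximum exceeds 80/81 once λ is large.

```python
import math

def p_open(lam, delta):
    """Site-open probability of the embedded oriented percolation:
    exp(-2*delta) * (1 - exp(-lam*delta))**2."""
    return math.exp(-2.0 * delta) * (1.0 - math.exp(-lam * delta)) ** 2

def delta_star(lam):
    """Claimed maximizer of delta -> p_open(lam, delta)."""
    return math.log(lam + 1.0) / lam

def p_max(lam):
    """Closed form of the maximum:
    (lam/(lam+1))**2 * (1/(lam+1))**(2/lam)."""
    return ((lam / (lam + 1.0)) * (lam + 1.0) ** (-1.0 / lam)) ** 2

lam = 5.0
best = p_open(lam, delta_star(lam))
# No grid point should beat the claimed maximizer.
grid = [p_open(lam, 0.001 * k) for k in range(1, 5000)]
```

Evaluating `p_max` for increasing λ also shows the expression creeping up towards 1, in line with the limit discussed next.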
Since
lim_{λ→∞} ( λ/(λ + 1) )² ( 1/(λ + 1) )^{2/λ} = 1,
the condition in Fact 9.11 is satisfied for λ large enough, and hence λ_1 < ∞.
10 Diffusions
Brownian motion B = (B_t)_{t≥0} is a Markov process taking values in R and having continuous
paths. The law P of B is called the Wiener measure, a probability measure on the set
of continuous paths such that increments over disjoint time intervals are independent and
normally distributed. To define B properly requires a formal construction that is part of
stochastic analysis, a subarea of probability theory that uses functional-analytic machinery to
study continuous-time random processes taking values in R. B is an example of a diffusion.
Definition 10.1 A diffusion X = (X_t)_{t≥0} is a Markov process on R with continuous paths
having the strong Markov property.
We write P_x to denote the law of X given X_0 = x ∈ R. The sample space is the space of
continuous functions with values in R, written C_R[0, ∞), endowed with the Borel σ-algebra
of subsets of C_R[0, ∞) generated by the Skorohod topology.
Remark: The time interval need not be [0, ∞). It can also be (−∞, ∞), [0, 1], etc., depending
on what X describes. It is also possible that X takes values in R^d, d ≥ 1, etc.
An example of a diffusion is X solving the stochastic differential equation
dX_t = b(X_t) dt + σ(X_t) dB_t,    (10.1)
where b(X_t) denotes the local drift function and σ(X_t) the local dispersion function. The
integral form of (10.1) reads
X_t = X_0 + ∫_0^t b(X_s) ds + ∫_0^t σ(X_s) dB_s,
where the last integral is a so-called Itô-integral. Equation (10.1) is short-hand for the
statement:
The increment of X over the infinitesimal time interval [t, t + dt) is the sum of two
parts, b(X_t) dt and σ(X_t) dB_t, with dB_t the increment of B over the same time
interval.
Again, a formal definition of (10.1) requires functional-analytic machinery. The functions
b : R → R and σ : R → R need to satisfy mild regularity properties, e.g. local Lipschitz
continuity and modest growth at infinity. The solution of (10.1) is called an Itô-diffusion.
The special case with b ≡ 0, σ ≡ 1 is Brownian motion itself. The interpretation of X is:
X is a Brownian motion whose increments are blown up by a factor σ(·) and
shifted by a factor b(·), both of which depend on the value of the process itself.
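This interpretation translates directly into the Euler–Maruyama scheme, the simplest way to simulate (10.1). The sketch below is ours (helper name, step size, seed all illustrative); it is only a first-order discretization, not the formal Itô construction.

```python
import math
import random

def euler_maruyama(b, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of dX_t = b(X_t) dt + sigma(X_t) dB_t.

    Each step adds a drift part b(x)*dt and a noise part sigma(x)*dB,
    where dB ~ N(0, dt) is the Brownian increment over [t, t + dt)."""
    x, path = x0, [x0]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        x = x + b(x) * dt + sigma(x) * db
        path.append(x)
    return path

# b == 0, sigma == 1 reduces to (discretized) Brownian motion itself.
bm = euler_maruyama(lambda x: 0.0, lambda x: 1.0, 0.0, 0.01, 1000, random.Random(42))
```

With b(x) = −x and σ ≡ 1 the same routine simulates an Ornstein–Uhlenbeck diffusion, and with σ ≡ 0 it degenerates to the Euler scheme for the ODE dX_t = b(X_t) dt.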
Definition 10.2 A diffusion is called regular if
P_x(τ_y < ∞) > 0 ∀ x, y ∈ R.
For a regular diffusion,
P_x(τ_b < τ_a) = ( s(x) − s(a) ) / ( s(b) − s(a) ) ∀ a, b ∈ R, a < x < b,
for some s : R → R continuous and strictly increasing. This s is called the scale function for
X. A diffusion is in natural scale when s is the identity. An example of such a diffusion is
Brownian motion B. More generally, Y = (Y_t)_{t≥0} with Y_t = s(X_t) is in natural scale, and is
an Itô-diffusion with b ≡ 0.
Exercise 10.3 Check the last claim.
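As a concrete instance of the last claim, take Brownian motion with constant drift μ (b ≡ μ, σ ≡ 1), for which s(x) = 1 − e^{−2μx} is a standard choice of scale function; by Itô's formula the drift of Y = s(X) is b s′ + ½ σ² s″, which vanishes for this s. The finite-difference check below is a sketch with our own helper names, not a solution of the exercise in full generality.

```python
import math

def ito_drift_of_transform(b, sigma, s, x, h=1e-4):
    """Drift of Y = s(X) for dX_t = b(X_t) dt + sigma(X_t) dB_t, via
    Ito's formula: b(x) s'(x) + 0.5 sigma(x)**2 s''(x).  Derivatives
    are approximated by central finite differences of step h."""
    s1 = (s(x + h) - s(x - h)) / (2.0 * h)
    s2 = (s(x + h) - 2.0 * s(x) + s(x - h)) / (h * h)
    return b(x) * s1 + 0.5 * sigma(x) ** 2 * s2

mu = 0.7
b = lambda x: mu                              # constant drift
sigma = lambda x: 1.0                         # unit dispersion
s = lambda x: 1.0 - math.exp(-2.0 * mu * x)   # increasing scale function

# The transformed drift vanishes at every point ...
drifts = [ito_drift_of_transform(b, sigma, s, x) for x in (-1.0, 0.0, 0.5, 2.0)]
# ... while the identity map keeps the original drift mu.
identity_drift = ito_drift_of_transform(b, sigma, lambda x: x, 0.5)
```

The same routine can be pointed at any candidate s to test whether it removes the drift.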
Definition 10.4 A diffusion is called recurrent if
P_x(τ_y < ∞) = 1 ∀ x, y ∈ R.

10.1 Diffusions on the half-line
For recurrent diffusions on the half-line we have a successful coupling starting from any two
starting points. Indeed, let
T = inf{t ∈ [0, ∞) : X_t = X′_t}
be the coupling time of X = (X_t)_{t≥0} and X′ = (X′_t)_{t≥0}. Because X and X′ are continuous
(skip-free), we have
T ≤ τ_0 ∨ τ′_0,
and so recurrence implies that P̂_{xx′}(T < ∞) = 1 for all x, x′ ∈ [0, ∞), with P̂_{xx′} = P_x ⊗ P_{x′} the
independent coupling.
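The skip-free argument has a discrete stand-in: two independent simple random walks on {0, 1, 2, . . .}, reflected at 0 and started at points of equal parity. Their difference moves by −2, 0 or +2, so the walks cannot cross without meeting exactly, and the meeting time T is at most the first time the upper walk hits 0. The conventions below (reflection step 0 → 1, the chosen starting points) are ours; this is an analogue, not the diffusion itself.

```python
import random

def reflected_walk_pair(x, y, rng, max_steps=100_000):
    """Two independent simple random walks on {0, 1, 2, ...}, reflected
    at 0 (a step from 0 always goes to 1).  Returns (T, tau_x, tau_y):
    the first meeting time and the first hitting times of 0, with None
    for anything not observed within max_steps."""
    X, Y = x, y
    T = tau_x = tau_y = None
    for t in range(1, max_steps + 1):
        X = 1 if X == 0 else X + rng.choice((-1, 1))
        Y = 1 if Y == 0 else Y + rng.choice((-1, 1))
        if T is None and X == Y:
            T = t
        if tau_x is None and X == 0:
            tau_x = t
        if tau_y is None and Y == 0:
            tau_y = t
        if None not in (T, tau_x, tau_y):
            break
    return T, tau_x, tau_y

rng = random.Random(7)
# Same-parity starts (2 and 6): the difference is always even, so a
# crossing forces an exact meeting -- the discrete form of skip-freeness.
results = [reflected_walk_pair(2, 6, rng) for _ in range(100)]
```

In every completed run the meeting happens no later than the initially-upper walk's first visit to 0, mirroring T ≤ τ_0 ∨ τ′_0.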
10.2 Diffusions on the full line
For recurrent diffusions on the full-line a similar result holds. The existence of a successful
coupling is proved as follows. Without loss of generality we assume that X is in natural scale.
Fix x < y and pick 0 < N_1 < N_2 < · · · with N_k ≥ 2N_{k−1}, and put A_k = [−N_k, N_k]. Since X
is in natural scale, X stopped at the exit time of A_k is a bounded martingale, so that, writing
X_{A_k} for the exit position,
|P_z(X_{A_k} = N_k) − 1/2| ≤ 1/4 ∀ z ∈ A_{k−1}, k ∈ N.
Running the two copies independently, at each stage they exit A_k on opposite sides with
probability at least (1/4)(1/4) = 1/16, uniformly in their current positions, and hence
P̂_{xy}( X_{A_k} < X′_{A_k} for all 1 ≤ k ≤ l ) ≤ (1 − 1/16)^l, l ∈ N.
By skip-freeness, on the complementary event the two paths cross and thus meet, so the
coupling is successful.
Theorem 10.6 Let P = (P_t)_{t≥0} be the semigroup of a regular recurrent diffusion, and let
f : R → R be bounded and measurable. Then, for every t > 0, the map x ↦ (P_t f)(x),
x ∈ R, is continuous.
Proof. Fix t > 0. Let X and X′ be independent copies of the diffusion starting from x and x′,
respectively. Then
|(P_t f)(x) − (P_t f)(x′)| = | Ê_{xx′}[f(X_t)] − Ê_{xx′}[f(X′_t)] | ≤ 2 P̂_{xx′}(T > t) ‖f‖_∞.
The claim follows from the fact that
lim_{x′→x} P̂_{xx′}(T > t) = 0 ∀ t > 0.
Theorem 10.7 Let P = (P_t)_{t≥0} be the semigroup of a regular diffusion. Then
μ ⪯ ν implies μP_t ⪯ νP_t ∀ t ≥ 0.
Proof. This is immediate from the skip-freeness, which allows a coupling with X_0 ≤ X′_0, and
hence X_t ≤ X′_t for all t ≥ 0, when X_0, X′_0 start from μ, ν.
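Order preservation is easy to see in a synchronous coupling, where both copies are driven by the same Brownian increments; for a one-dimensional diffusion with common dispersion the noise cancels in the difference, so the initial order persists. This is a different coupling from the one in the proof above, chosen because it is simple to simulate. A minimal sketch (our choice of example) for the Ornstein–Uhlenbeck case dX_t = −X_t dt + dB_t, where the gap contracts deterministically:

```python
import math
import random

def synchronous_pair(x0, y0, dt, n_steps, rng):
    """Euler scheme for two copies of dX_t = -X_t dt + dB_t driven by
    the SAME Brownian increments (synchronous coupling).  The noise
    cancels in the difference, so the initial order is preserved."""
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        xs.append(xs[-1] - xs[-1] * dt + db)
        ys.append(ys[-1] - ys[-1] * dt + db)
    return xs, ys

xs, ys = synchronous_pair(-1.0, 2.0, 0.01, 2000, random.Random(3))
gap = [y - x for x, y in zip(xs, ys)]   # evolves as gap_0 * (1 - dt)**n
```

Sampling X_0 from μ and X′_0 from ν in an ordered way (e.g. by inverse transforms of a common uniform) then yields μP_t ⪯ νP_t for this diffusion.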
10.3 Diffusions in higher dimensions
In R^d, d ≥ 2, regularity and recurrence are defined by
P_x( τ_{B_ε(y)} < ∞ ) > 0 ∀ x, y ∈ R^d, ε > 0,
and
P_x( τ_{B_ε(y)} < ∞ ) = 1 ∀ x, y ∈ R^d, ε > 0,
respectively, where B_ε(y) is the ball of radius ε around y, i.e., points are replaced by small
balls around points in all statements about hitting times.
Itô-diffusions are defined by
dX_t = b(X_t) dt + σ(X_t) dB_t,    (10.2)
where b : R^d → R^d and σ : R^d → R^{d×d} are the vector local drift function and the matrix local
dispersion function, both subject to regularity properties.
Diffusions in R^d, d ≥ 2, are more difficult to analyze than in R. A lot is known for special
classes of diffusions (e.g. with certain symmetry properties). Stochastic analysis has developed
a vast arsenal of ideas, results and techniques. The stochastic differential equation in (10.2) is
very important because it has a wide range of applications, e.g. in transport, finance, filtering,
coding, statistics, genetics, etc.