Markov Chains: an introduction

C. Kraaikamp
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
April 2011

Based on lecture notes by J.A.M. van der Weide and G. Hooghiemstra, and elaborated by F.M. Dekking.

Bestelnummer: 06917490019

VERSION 2011
1  Discrete-time Markov chains
Up to now we have primarily encountered sequences of independent random variables X_1, X_2, . . . , X_n, each having the same distribution. Such a sequence is a model for the n-fold repetition of a certain experiment. However, one can also use such sequences to describe the state of a system. For instance, X_n may denote the number of customers in a shop at time n, where the time step n is measured in minutes, or seconds, or in any convenient discrete time step. Of course one could also continuously report the state of the system; then we consider random variables X_t, with t ∈ I, where I is some interval, for example I = [0, ∞) or I = [0, 1]. In both the discrete and the continuous case we call {X_n}_{n∈N}, respectively {X_t}_{t∈I}, a stochastic process. In contrast to the model that describes the repetition of an experiment, we do not demand that the random variables in a stochastic process are either independent or identically distributed, only that the random variables X_n (resp. X_t) are all defined on the same sample space Ω. In fact, if one wants to describe the state of a system, a property such as independence seems highly unlikely and questionable.
In this chapter we will introduce a stochastic process where, given the state of the system at the present time n, the future is independent of the past. Although this seems only a small step away from the models we studied up to this point, where all the random variables are independent, it turns out that this stochastic process, which is called a Markov chain after A.A. Markov¹, is widely applicable, and one of the most important stochastic processes studied. In this chapter discrete Markov chains will be introduced, and some of their elementary properties studied. In the next chapter we will study the long-term behavior of these stochastic processes. In Chapter 3 continuous-time Markov chains will be introduced and studied.
¹ In fact, Markov was not the first to study the stochastic processes we now call Markov processes.
P(X_{n+1} = i − 1 | X_n = i) = i/N, for i = 1, 2, . . . , N,

and

P(X_{n+1} = i + 1 | X_n = i) = (N − i)/N, for i = 0, 1, . . . , N − 1.

Due to the way we modeled the movement of the gas we moreover have that

P(X_{n+1} = j | X_n = i) = 0, for j = 0, 1, . . . , N, j ≠ i − 1, i + 1.
Note that X_{n+1} is determined only by X_n. In fact we will see that for processes such as in this example one has that, given X_n, the random variable X_{n+1} is independent of X_k for k ≤ n − 1. I.e., for all i, x_{n−1}, . . . , x_0 ∈ {0, 1, . . . , N} with

P(X_n = i, X_{n−1} = x_{n−1}, . . . , X_0 = x_0) > 0,

we have that

P(X_{n+1} = j | X_n = i, X_{n−1} = x_{n−1}, . . . , X_0 = x_0) = P(X_{n+1} = j | X_n = i).

This is called the Markov property.
Since the conditional probabilities in our example do not depend on n, we define for all n ≥ 0 the transition probabilities p_{i,j} by

p_{i,j} = P(X_{n+1} = j | X_n = i), for i, j ∈ S = {0, 1, . . . , N}.

We say that the stochastic process {X_n}_{n∈N} is a time-homogeneous Markov chain on the state space S = {0, 1, . . . , N}.
Note that each row of transition probabilities sums to one:

∑_{j=0}^{N} p_{i,j} = 1, for i ∈ S.

By the law of total probability,

P(X_n = j) = ∑_{i=0}^{N} P(X_n = j | X_0 = i) P(X_0 = i), for j ∈ S.

If the chain starts in X_0 = N, then X_1 = N − 1 with certainty, and

P(X_2 = j) = (N − 1)/N  if j = N − 2,
             1/N        if j = N,
             0          for all other values of j ∈ S.
Quick exercise 1.2 In our molecules example, suppose that the initial distribution is given by P(X_0 = i) = \binom{N}{i} 2^{−N} for i ∈ S. Then P(X_n = j) = \binom{N}{j} 2^{−N} for each j ∈ S and each n ≥ 1. Show that this holds for N = 5, n = 1, and for a general n ≥ 2. Why is this initial distribution not so far-fetched as it may seem at first view?
So we showed in this Quick exercise that if the initial distribution of the number of molecules in chamber A is Bin(N, 1/2), it will remain so for all n. In fact we will see in the next chapter for this particular example that, whatever the initial distribution, the distribution of the X_n's will be approximately Bin(N, 1/2) for n large, i.e., that

lim_{n→∞} P(X_n = j) = \binom{N}{j} 2^{−N}, for all j ∈ S.

In this example the Bin(N, 1/2) distribution is the so-called stationary distribution, and we will see in Theorem 2.2 that the X_n's converge in distribution to this stationary distribution.
P =
  p_11  p_12  · · ·  p_1N
  p_21  p_22  · · ·  p_2N
   ⋮      ⋮            ⋮
  p_N1  p_N2  · · ·  p_NN .
The Markov property can be generalized as follows. Let K_n ⊆ S^n be a subset of the set S^n of vectors of length n with entries from S, and let V be the event given by

V = {(X_0, X_1, . . . , X_{n−1}) ∈ K_n}.

If P(X_n = i, V) > 0, then the Markov property yields that (see also Exercise 1.15)

P(X_{n+1} = j | X_n = i, V) = P(X_{n+1} = j | X_n = i) = p_ij.²

Let m ≥ 1, and let the event Z for L_m ⊆ S^m be given by

Z = {(X_{n+1}, X_{n+2}, . . . , X_{n+m}) ∈ L_m};

then we find that the Markov property can also be given by

P(Z | X_n = i, V) = P(Z | X_n = i).  (1.1)

² Note that in the previous section we denoted these transition probabilities as p_{i,j}.
We have the following theorem, which states that in a Markov chain, given
that the chain is now in state i, the future (which is the event Z) is independent of the past (i.e., the event V ). Conversely, if the future is independent
of the past, given the present state i, then the stochastic process is a Markov
chain.
Theorem 1.1 Let V and Z be defined as above. Then the Markov property (1.1) is equivalent with

P(V ∩ Z | X_n = i) = P(V | X_n = i) P(Z | X_n = i).  (1.2)

Proof. One starts from

P(V ∩ Z | X_n = i) = · · ·  (use (1.1)).
p_ij^{[n]} = P(X_n = j | X_0 = i), for i, j ∈ S and n ≥ 0.

³ Some authors, see e.g. [3] and [4], say that i and j "communicate", but to me this is a rather one-way form of communication.
Note that, since our Markov chains are time homogeneous, we have for every k ≥ 0 that p_ij^{[n]} = P(X_{k+n} = j | X_k = i), for i, j ∈ S and n ≥ 0. Obviously, p_ij^{[1]} = p_ij.
The Chapman–Kolmogorov equations state that

p_ij^{[m+n]} = ∑_{k∈S} p_ik^{[m]} p_kj^{[n]}, for all m, n ≥ 0 and i, j ∈ S.

Proof. The proof rests on the law of total probability ([7], p. 18), and the Markov property (1.1). We have that

p_ij^{[m+n]} = P(X_{m+n} = j | X_0 = i)
 = ∑_{k∈S} P(X_{m+n} = j, X_m = k | X_0 = i)
 = ∑_{k∈S} ( P(X_{m+n} = j, X_m = k, X_0 = i) / P(X_m = k, X_0 = i) ) · ( P(X_m = k, X_0 = i) / P(X_0 = i) )
 = ∑_{k∈S} P(X_{m+n} = j | X_m = k, X_0 = i) P(X_m = k | X_0 = i)
 = ∑_{k∈S} P(X_{m+n} = j | X_m = k) P(X_m = k | X_0 = i)
 = ∑_{k∈S} p_ik^{[m]} p_kj^{[n]}.
This theorem has nice corollaries. In the first corollary the n-step transition probabilities p_ij^{[n]} are linked to powers of the transition matrix P.
Corollary 1.1 Let (X_n)_{n≥0} be a discrete time-homogeneous Markov chain on the state space S. Furthermore, let p_ij^{[n]} be the n-step transition probabilities of this Markov chain, and let P^{[n]} = (p_ij^{[n]}) be the matrix of the n-step transition probabilities. Then for m, n ≥ 0 we have

P^{[m+n]} = P^{[m]} P^{[n]}.

In particular, for n ≥ 0 we have that P^{[n]} = P^n.
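Corollary 1.1 lends itself to a quick numerical check. Below is a minimal Python sketch; the helper names `mat_mul` and `mat_pow` are ours, and the matrix used is the two-state weather chain of Exercise 1.2 (state 0 = rain, state 1 = no rain):

```python
# n-step transition probabilities via matrix powers (Corollary 1.1):
# P^[n] equals the n-th power of the one-step transition matrix P.

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """P^n, with P^0 the identity matrix (matching p_ii^[0] = 1)."""
    size = len(P)
    Q = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        Q = mat_mul(Q, P)
    return Q

# Two-state weather chain of Exercise 1.2: rain persists with probability 0.7.
P = [[0.7, 0.3],
     [0.3, 0.7]]

P2 = mat_pow(P, 2)
# Probability of rain the day after tomorrow, given rain today:
print(P2[0][0])   # 0.7*0.7 + 0.3*0.3 = 0.58
```

The Chapman–Kolmogorov identity P^{[m+n]} = P^{[m]} P^{[n]} can then be verified by comparing `mat_mul(mat_pow(P, m), mat_pow(P, n))` with `mat_pow(P, m + n)`.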
The idea behind the proof of the second corollary is the same idea behind the solution of Quick exercise 1.2: the law of total probability, i.e.,

P(X_n = j) = ∑_{i∈S} P(X_n = j | X_0 = i) P(X_0 = i).
Now let i, j ∈ S be two states such that i can be reached from j, and conversely, j can be reached from i. I.e., there exist m, n ≥ 0 such that p_ij^{[m]} > 0 and p_ji^{[n]} > 0. In this case we⁴ say that i and j communicate, and we write i ↔ j. In the molecules example each state i communicates with each other state j, but in general this need not be the case. In fact, ↔ is an equivalence relation on S, and consequently S can be written as the disjoint union of subsets of elements of S (the so-called equivalence classes) which only communicate with one another. To show that ↔ is indeed an equivalence relation on S is not very hard. Obviously ↔ is reflexive (since by definition p_ii^{[0]} = 1 for every i ∈ S), and symmetric. Finally, the relation ↔ is also transitive: suppose that i ↔ j and j ↔ k; then there exist n, m ≥ 0 such that p_ij^{[n]} > 0 and p_jk^{[m]} > 0, so that by the Chapman–Kolmogorov equations p_ik^{[n+m]} ≥ p_ij^{[n]} p_jk^{[m]} > 0, i.e., k can be reached from i. In the same way one finds that i can be reached from k. If there is only one equivalence class (so when all the states communicate with one another), we say that the Markov chain is irreducible. In this case also the matrix of transition probabilities P is called irreducible.
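For a finite chain the equivalence classes can be computed mechanically from the reachability relation: i ↔ j iff each state can be reached from the other along edges with positive probability. A Python sketch (the function names are ours, not the text's):

```python
# Communicating classes of a finite Markov chain; the chain is
# irreducible iff there is exactly one class.

def reachable(P, i):
    """States reachable from i in zero or more steps (edges where p > 0)."""
    seen = {i}
    stack = [i]
    while stack:
        k = stack.pop()
        for j, p in enumerate(P[k]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def communicating_classes(P):
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}  # i <-> j both ways
        classes.append(cls)
        assigned |= cls
    return classes

# The matrix of Example 1.1 (b), with states renumbered 0..3:
P_b = [[2/3, 0, 0, 1/3],
       [0, 1/2, 1/2, 0],
       [0, 1/2, 1/2, 0],
       [8/9, 0, 0, 1/9]]
print(communicating_classes(P_b))   # two classes: {0, 3} and {1, 2}
```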
Examples 1.1
Let (X_n)_{n≥0} be a Markov chain on the state space S = {1, 2, 3, 4}, and let P be the matrix of transition probabilities. We will consider three different P's, and see that the behavior of the chain is qualitatively very different for each of these three cases.

(a) Let

P =
  2/3   0   1/3   0
  1/2   0   1/2   0
   0   1/2   0   1/2
   0   1/9   0   8/9 .
It is also very instructive to put in one figure the various transitions between the states; see Figure 1.1. With the aid of Figure 1.1 one convinces oneself quickly that there is only one communicating class: the Markov chain is irreducible.

⁴ Some authors, see e.g. [3] and [4], say that i and j "intercommunicate"; see also the footnote on the previous page.
(b) Now suppose that P is given as follows:

P =
  2/3   0    0   1/3
   0   1/2  1/2   0
   0   1/2  1/2   0
  8/9   0    0   1/9 .
From Figure 1.2 we see that there are two equivalence classes: {1, 4} and {2, 3}.
The Markov chain is reducible; each of these classes acts as a subworld.
[Figure 1.1 and Figure 1.2: the transition diagrams of the chains in (a) and (b).]
(c) Finally, let P be given by

P =
   1    0    0    0
  1/3  1/3  1/3   0
   0   1/2  1/2   0
  1/9   0    0   8/9 .
From Figure 1.3 it is now clear that there are three classes: {1}, {2, 3}, and
{4}. The last two classes are special; they are transient. Starting in {2, 3} or
in {4}, one will eventually leave this class, never to return. Note that the time
it takes to leave the class {4} is geometrically distributed, with parameter
p = 1/9. The class {1} is recurrent; it will occur infinitely often.
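The geometric leaving time of the class {4} can be checked by simulation. A small Python sketch (assuming the case (c) matrix as printed; the helper `step` is ours):

```python
# Simulating the chain of Example 1.1 (c), started in state 4 (index 3).
# The time needed to leave the transient class {4} should be geometric
# with parameter 1/9, hence its mean should be close to 9.
import random

random.seed(1)

P_c = [[1, 0, 0, 0],
       [1/3, 1/3, 1/3, 0],
       [0, 1/2, 1/2, 0],
       [1/9, 0, 0, 8/9]]

def step(P, i):
    """One transition from state i, sampled from row i of P."""
    u = random.random()
    acc = 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return len(P) - 1   # guard against floating-point rounding

times = []
for _ in range(20000):
    state, t = 3, 0
    while state == 3:        # stay until the chain leaves state 4
        state = step(P_c, state)
        t += 1
    times.append(t)

mean_time = sum(times) / len(times)
print(mean_time)   # close to the geometric mean 1/(1/9) = 9
```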
[Figure 1.3: the transition diagram of the chain in (c).]
Note that for a reducible Markov chain the recurrent classes are like subworlds, behaving as irreducible Markov chains (once you get into such a class, you'll never get out again). It is for this reason that we will often only consider irreducible Markov chains; reducible ones can be studied piecemeal. Not only classes can be recurrent, but states also. It is intuitively clear that if one state in a class is recurrent (transient), all the other states in that class are also recurrent (transient), since all the states communicate with each other. We will give a proof of this at the end of this section. We start with a formal definition of a recurrent (transient) state.
Recurrent and transient states. Let (X_n)_{n≥0} be a discrete time-homogeneous Markov chain on the state space S. A state i ∈ S is called recurrent (or persistent) if

f_i = P(X_n = i for some n ≥ 1 | X_0 = i) = 1.

A state i which is not recurrent is called transient.
Setting

f_ii^{[n]} = P(X_1 = i | X_0 = i),                           n = 1,
             P(X_n = i, X_m ≠ i for 1 ≤ m < n | X_0 = i),    n > 1,

we have that

f_i = ∑_{n=1}^{∞} f_ii^{[n]}.

Clearly, f_ii^{[1]} = p_ii, and f_ii^{[n]} ≤ p_ii^{[n]}.

Now let N_i be the number of visits to state i:

N_i = ∑_{n=0}^{∞} 1_{{X_n = i}}.
One can show that

P(N_i = k | X_0 = i) = f_i^{k−1}(1 − f_i), for k = 1, 2, . . . ,  (1.3)

so that for a transient state i the expected number of visits equals

E(N_i | X_0 = i) = 1/(1 − f_i).

On the other hand,

E(N_i | X_0 = i) = ∑_{n=0}^{∞} E(1_{{X_n = i}} | X_0 = i) = ∑_{n=0}^{∞} p_ii^{[n]}.

It follows that the state i is recurrent if and only if

∑_{n=0}^{∞} p_ii^{[n]} = ∞,

and transient if and only if

∑_{n=0}^{∞} p_ii^{[n]} < ∞.
As an example of a Markov chain on an infinite state space, consider the simple random walk on Z, where for all i ∈ Z the transition probabilities are given by

p_{i,i+1} = p,   p_{i,i−1} = q = 1 − p.

[Figure: the transitions of the simple random walk on Z.]
Since the walk can return to its starting point only in an even number of steps,

∑_{n=0}^{∞} p_ii^{[n]} = ∑_{k=0}^{∞} \binom{2k}{k} p^k q^k.

By Stirling's formula k! ≈ √(2πk) (k/e)^k we obtain

\binom{2k}{k} = (2k)!/(k!)² ≈ √(4πk) (2k/e)^{2k} / ( 2πk (k/e)^{2k} ) = 2^{2k}/√(πk),

so that

\binom{2k}{k} p^k q^k ≈ (4p(1 − p))^k / √(πk).

Since 4p(1 − p) < 1 when p ≠ 1/2, we find that

∑_{n=0}^{∞} p_ii^{[n]} < ∞,

while

∑_{n=0}^{∞} p_ii^{[n]} = ∞, when p = 1/2.
Let T_i denote the first time the chain returns to state i, and let m_i be the mean recurrence time:

m_i = E(T_i | X_0 = i) = ∑_{n=1}^{∞} n f_ii^{[n]}   when i is recurrent,
                         ∞                          when i is transient.

It is important to note that m_i may be infinite, even if i is recurrent. In fact the simple random walk with p = 1/2 is an example of this (but this is not so easy to show; in fact it is a consequence of the main theorem in Chapter 2; see Exercise 2.11). We have the following definition.

Definition 1.2 The recurrent state i is called null-recurrent if m_i = ∞, and positive recurrent (or non-null recurrent) if m_i < ∞.

In a finite irreducible Markov chain all states are positive recurrent; see also Exercise 1.14.
P = (p_ij) =
   0    1    0    0    0    0
  1/5   0   4/5   0    0    0
   0   2/5   0   3/5   0    0
   0    0   3/5   0   2/5   0
   0    0    0   4/5   0   1/5
   0    0    0    0    1    0 .

Clearly, all the rows of this matrix sum up to 1.
By the law of total probability,

P(X_1 = j) = ∑_{i=0}^{5} P(X_1 = j | X_0 = i) P(X_0 = i).

Now plugging in the transition probabilities we found in Quick exercise 1.1, the desired result follows.
We prove the general statement by induction. We have just shown that the statement holds for n = 1. Next suppose that for some n ≥ 2 one has that P(X_{n−1} = j) = \binom{N}{j} 2^{−N} for j ∈ S. We need to show that P(X_n = j) = \binom{N}{j} 2^{−N} for j ∈ S. We have

P(X_n = j) = ∑_{i=0}^{N} P(X_n = j | X_{n−1} = i) P(X_{n−1} = i).
1.5 Exercises
1.1 Let A, B, and C be three events. Show that the statements

P(A | B ∩ C) = P(A | B)

and

P(A ∩ C | B) = P(A | B) P(C | B)

are equivalent.
1.2 Suppose that the weather of tomorrow only depends on the weather
conditions of today. If it rains today, it will rain tomorrow with probability
0.7. If there was no rain today, there will be no rain tomorrow with probability
0.7. Define for n ≥ 0 the stochastic process (X_n) by

X_n = 1  if there is no rain on the nth day,
      0  if it rains on the nth day.
a. What is the state space S, and what are the transition probabilities pij ?
b. If today it rained, what is the probability it will rain the day after tomorrow? After three days?
1.3 Clearly the weather model in Exercise 1.2 is not very realistic. Suppose
the weather of today depends on the weather conditions of the previous two
days. Suppose that it will rain today with probability 0.7, if it is given that it
rained the previous two days. If it rained only yesterday, but not the day before
yesterday, it will rain today with probability 0.5. If it rained two days ago,
but not yesterday, then the probability that it will rain today is 0.4. Finally,
if the past two days were without rain, it will rain today with probability 0.2.
a. Translate the statement "it will rain today with probability 0.7, if it rained the previous two days" into a statement involving probabilities of the process (X_n)_{n≥0}.
b. Why is the process (X_n)_{n≥0} not a Markov chain?
c. Define

Y_n = 0  if X_{n−1} = 1 and X_n = 1,
      1  if X_{n−1} = 0 and X_n = 1,
      2  if X_{n−1} = 1 and X_n = 0,
      3  if X_{n−1} = 0 and X_n = 0.

Show that (Y_n)_{n≥1} is a Markov chain. What is the state space, and what are the transition probabilities?
1.4 You repeatedly throw a fair die. Let Xn be the outcome of the nth throw,
and let Mn be the maximum of the first n throws:
Mn = max{X1 , . . . , Xn },
(so Mn = X(n) ). You may assume that the random variables Xn are independent, and discrete uniformly distributed on S = {1, 2, 3, 4, 5, 6}.
a. Show that the stochastic process (Mn )n1 is a Markov chain with state
space S.
b. Find the matrix P of transition probabilities of the Markov chain (M_n)_{n≥1}, and classify the states.
c. Let T be the first time a 6 has appeared:

T = min{n : M_n = 6}  in case {n : M_n = 6} ≠ ∅,
    ∞                 in case {n : M_n = 6} = ∅.

Determine the probability distribution of T.
1.5 Supply a proof of Corollaries 1.1 and 1.2.
1.6 Consider the Markov chain (X_n)_{n≥0}, with state space S = {0, 1}, and with matrix of transition probabilities P, given by

P =
  1/3  2/3
  1/2  1/2 .

What happens when the initial distribution is ( 3/7  4/7 ), or, more generally, α = ( α_0  α_1 ), with α_0 ≥ 0, α_1 ≥ 0, and α_0 + α_1 = 1?
In the next chapter we will investigate this, but here we will use our simple set-up to get a feel of what's going on. In view of Corollary 1.2 we are interested in powers P^n of the matrix of transition probabilities P.
a. Find the determinant, the eigenvalues, and eigenvectors of P.
b. Let T be the matrix whose columns are the eigenvectors of P, and let D be a diagonal matrix with the corresponding eigenvalues on the main diagonal. Argue that P = T D T^{−1}. Use this to find

P^∞ = lim_{n→∞} P^n.

c. Let α be an initial distribution. Use Corollary 1.2 and your results from b. to show that

lim_{n→∞} P(X_n = 0) = 3/7.
1.8 Let (Y_n)_{n≥0} be a sequence of independent random variables, all with the same distribution, given by

P(Y_n = 0) = 2/3,   P(Y_n = 1) = 1/6,   P(Y_n = 2) = 1/6,

and set

X_{n+1} = X_n − Y_n      if X_n = 4,
          X_n − 1 + Y_n  if 1 ≤ X_n ≤ 3,
          Y_n            if X_n = 0,

for n ≥ 0.
P =
   1    0    0    0
  1/2  1/2   0    0
  1/2   0   1/2   0
  1/2   0    0   1/2 ,

and

P =
   1    0    0    0
  1/2   0   1/2   0
  1/2   0    0   1/2
  1/2   0   1/2   0 .
1.10 In simple random walk, show that for every n ≥ 1 one has that

p_ii^{[n]} = ∑_{k=1}^{n} f_ii^{[k]} p_ii^{[n−k]}.
∑_{m=1}^{∞} P(X_ℓ ≠ i, ℓ = 1, . . . , m − 1, X_m = i, . . . , ℓ = m + 1, . . .) ·

P(X_m = i) / P(X_0 = i).

P(X_m = i) / P(X_0 = i) = f_ii^{[m]},
2  Limit behavior of discrete Markov chains
In this chapter the long-term behavior of a discrete Markov chain will be investigated. Already in Exercise 1.7 we have seen that if the Markov chain is "nice", something remarkable happens: the matrix P^n of n-step transition probabilities converges to a matrix in which the values per column are identical. In fact each row of this limiting matrix is equal to the stationary distribution. In Section 2.2 this will be further investigated. However, we will start with a process where the Markov chain does not have the nice properties of the chain in Exercise 1.7, and where we still can say a lot about the long-term behavior of the chain.
The number of offspring of each individual is given by a random variable Z, with distribution

p_j = P(Z = j), j = 0, 1, 2, . . . .
We will assume that for every j one has that p_j < 1 (otherwise the whole process becomes deterministic: with probability one every individual will have exactly j offspring), and also that p_0 > 0 (otherwise it is trivial what will happen in the long run: the number of offspring will eventually pass any given limit). Finally, we will assume that X_0 = 1; this is not really necessary, but makes the analysis more transparent.
Setting Z_i^{(n−1)} as the number of offspring of the i-th member of the (n−1)st generation, we thus have that

X_0 = 1,
X_n = Z_1^{(n−1)} + Z_2^{(n−1)} + · · · + Z_{X_{n−1}}^{(n−1)}.

Setting

μ = E[Z] = ∑_{j=0}^{∞} j P(Z = j) = ∑_{j=0}^{∞} j p_j,

we see that the expected number of individuals E[X_1] in the first generation is μ. In general we have the following result.
Theorem 2.1 For n ≥ 0 we have that

E[X_n] = μ^n.
Proof. Because X_0 = 1, the statement in the theorem is correct for n = 0, and we have just seen that it is also correct for n = 1. We may therefore assume that n ≥ 2. Using the law of total expectation (see [7], page 149), we find that

E[X_n] = ∑_{i=0}^{∞} E(X_n | X_{n−1} = i) P(X_{n−1} = i).

Since E(X_n | X_{n−1} = i) = E[ Z_1^{(n−1)} + Z_2^{(n−1)} + · · · + Z_i^{(n−1)} ] = iμ, we find that

E[X_n] = ∑_{i=0}^{∞} iμ P(X_{n−1} = i) = μ ∑_{i=0}^{∞} i P(X_{n−1} = i) = μ E[X_{n−1}].

Since E[X_0] = 1, we find by iteration the desired result: E[X_n] = μ^n.
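Theorem 2.1 is easy to probe by simulation. A Python sketch (helper names are ours), using the offspring law of Exercise 2.3, P(Z = 0) = P(Z = 1) = 1/4 and P(Z = 2) = 1/2, for which μ = 5/4:

```python
# Monte-Carlo check of Theorem 2.1: E[X_n] = mu^n for a branching process.
# Offspring law: P(Z=0) = P(Z=1) = 1/4, P(Z=2) = 1/2, so mu = 5/4.
import random

random.seed(7)

def offspring():
    u = random.random()
    if u < 0.25:
        return 0
    if u < 0.5:
        return 1
    return 2

def generation(n):
    """Size X_n of the n-th generation, starting from X_0 = 1."""
    x = 1
    for _ in range(n):
        x = sum(offspring() for _ in range(x))
    return x

runs = 40000
mean_x3 = sum(generation(3) for _ in range(runs)) / runs
print(mean_x3)   # should be close to (5/4)**3 = 1.953125
```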
Trivially, the (n+1)st generation does not have any individuals when the nth generation does not have any individuals, i.e., {X_n = 0} ⊆ {X_{n+1} = 0}, implying that for n ≥ 1, P(X_{n+1} = 0) ≥ P(X_n = 0). I.e., the sequence (c_n)_{n≥0}, where c_n = P(X_n = 0), is a monotonically non-decreasing sequence of probabilities, and is therefore bounded by 1. But then the limit exists as n tends to infinity; say this limit is η_0; then

η_0 = lim_{n→∞} P(X_n = 0).
Suppose first that μ < 1. Then

μ^n = E[X_n] = ∑_{j=0}^{∞} j P(X_n = j) ≥ ∑_{j=1}^{∞} P(X_n = j) = 1 − P(X_n = 0),

yielding that

1 − μ^n ≤ P(X_n = 0) ≤ 1,

from which it follows that η_0 = 1. Also in case μ = 1 one can show that η_0 = 1. However, in case μ > 1 we have that η_0 < 1.
Theorem 2.2 The probability of ultimate extinction η_0 is the smallest non-negative solution of the equation

x = ∑_{j=0}^{∞} p_j x^j.  (2.2)
Proof. This proof uses the notion of probability generating functions of random variables with values in the natural numbers. These are close cousins of moment generating functions: the moment generating function of Z is M_Z(t) = E[e^{tZ}], and the probability generating function of Z is G_Z(s) = E[s^Z]. So you can go from one to the other by simply substituting s for e^t. In this way the properties we know for moment generating functions ([7], Section 4.5) carry over to probability generating functions. As an example: "always M_Z(0) = 1" corresponds to "always G_Z(1) = 1".
Note that the right-hand side of (2.2) is the probability generating function of Z:

G_Z(s) = E[s^Z] = ∑_{j=0}^{∞} p_j s^j.
[Figure 2.1: the graph of the generating function G_Z together with the line y = x; η_0 is the smallest non-negative point of intersection.]
By definition of η_0,

η_0 = lim_{n→∞} P(X_n = 0)
    = lim_{n→∞} ∑_{j=0}^{∞} P(X_n = 0, X_1 = j)
    = lim_{n→∞} ∑_{j=0}^{∞} P(X_n = 0 | X_1 = j) P(X_1 = j)
    = lim_{n→∞} ∑_{j=0}^{∞} p_j P(X_n = 0 | X_1 = j).

What can one say about P(X_n = 0 | X_1 = j)? Note this is the probability of extinction at time n, given that we started j independent trees at time 1. Each of these trees is extinct at time n with probability P(X_{n−1} = 0), so, due to the independence of these j trees, we find

P(X_n = 0 | X_1 = j) = ( P(X_{n−1} = 0) )^j.

But then it follows that

η_0 = lim_{n→∞} ∑_{j=0}^{∞} P(X_n = 0, X_1 = j)
    = lim_{n→∞} ∑_{j=0}^{∞} p_j P(X_n = 0 | X_1 = j)
    = lim_{n→∞} ∑_{j=0}^{∞} p_j ( P(X_{n−1} = 0) )^j
    = ∑_{j=0}^{∞} p_j η_0^j.
Example 2.1
Let (X_n)_{n≥0} be a Markov chain on the state space S = {1, 2, 3, 4}, with matrix of transition probabilities P given by

P =
   0   1/2   0   1/2
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0 ;

see also Figure 2.2.
Fig. 2.2. The transitions in Example 2.1 (all transitions are with probability 1/2).
From Figure 2.2 we see that the Markov chain is irreducible, something that also follows from the following Quick exercise. But for every i ∈ S we have that p_ii^{[n]} = 0 if n is odd, while p_ii^{[n]} = 1/2 if n is even: the states behave in a periodic manner.
Quick exercise 2.2 Show that for the Markov chain in Example 2.1 we have that

P^n =
   0   1/2   0   1/2
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0    if n is odd,

and

P^n =
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0
   0   1/2   0   1/2   if n ≥ 2 is even.
In the following example we will see that a small change in the values of P has dramatic consequences for the values of P^n, with n ≥ 2.
Example 2.2
Again, let (X_n)_{n≥0} be a Markov chain on the finite state space S = {1, 2, 3, 4}, but let the matrix of transition probabilities P now be given by

P =
  1/3  1/3   0   1/3
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0 ;
see also Figure 2.3.
From Figure 2.3 we see that the Markov chain is irreducible. Moreover, using MAPLE (or Matlab, Mathematica, . . . ) we see that P^n converges to a matrix with constant values per column (the values in the matrices are rounded off):

P² =
  4/9    1/9   1/3    1/9
  1/6   5/12    0    5/12
  1/2     0    1/2     0
  1/6   5/12    0    5/12 ,

P⁵ =
  67/243   283/972   23/162   283/972
  283/648    8/81    79/216     8/81
  23/108   79/216     1/18    79/216
  283/648    8/81    79/216     8/81 ,

and, for example, two of the rows of P¹¹ read

  .2845713941  .2804299713  .1545686633  .2804299713
  .3752871027  .1721414630  .2804299713  .1721414630 ,

while P¹⁰⁰ contains the rows

  .3333334084  .2222221326  .2222223264  .2222221326
  .3333332687  .2222222993  .2222221326  .2222222993 .

Of course, things get really interesting when n becomes really big. For instance, if n = 1000, we find rows of P¹⁰⁰⁰ that all read

  .3333333333  .2222222222  .2222222222  .2222222222 .
This suggests (but certainly does not prove!) that

lim_{n→∞} P^n =
  1/3  2/9  2/9  2/9
  1/3  2/9  2/9  2/9
  1/3  2/9  2/9  2/9
  1/3  2/9  2/9  2/9 .  (2.3)

In fact, following Exercise 1.7 it is easy to show that (2.3) holds; see also Exercise 2.5.
Quick exercise 2.3 Let π be the vector given by

π = ( 1/3  2/9  2/9  2/9 ),

i.e., π is any of the rows of the matrix in (2.3). Show that πP = π.
In Examples 1.1 (b) and (c) the Markov chain was reducible. In Example 1.1 (b) there are clearly two subworlds: these are the equivalence classes {1, 4} and {2, 3}. Again using MAPLE one sees that

P^n →
  1/4   0    0   3/4
   0   1/2  1/2   0
   0   1/2  1/2   0
  1/4   0    0   3/4

as n → ∞.
So on the equivalence class {1, 4} the Markov chain (X_n)_{n≥0} behaves as a chain with only two states 1 and 4, and with matrix of transition probabilities

P_{1,4} =
  2/3  1/3
  1/9  8/9 ;   P_{1,4}^n →
  1/4  3/4
  1/4  3/4   as n → ∞,

while the chain behaves on the class {2, 3} as a chain with matrix of transition probabilities

P_{2,3} =
  1/2  1/2
  1/2  1/2 .

Note that P_{2,3}^n = P_{2,3} for every n ≥ 1.
In Example 1.1 (c) we have seen that state 1 behaves as a sink: everything is eventually sucked into it. Using MAPLE this also becomes apparent. For example, for n = 100 one finds that

P^n ≈
  1           0                  0                  0
  .999999990  .482986938·10⁻⁸    .482986938·10⁻⁸    0
  .999999985  .724480408·10⁻⁸    .724480408·10⁻⁸    0
  .999992330  0                  0                  .766915923·10⁻⁵ ,

while for n = 1000,

P^n ≈
  1  0                   0                   0
  1  .2635202196·10⁻⁷⁹   .2635202196·10⁻⁷⁹   0
  1  .3952803294·10⁻⁷⁹   .3952803294·10⁻⁷⁹   0
  1  0                   0                   .7038458473·10⁻⁵¹ .
Periodicity

In view of Example 2.1 we have the following definition.

Definition 2.1 Let i be a recurrent state. The period d(i) of state i is the greatest common divisor of the set

{n ≥ 1 : p_ii^{[n]} > 0}.

The period is constant on a communicating class: if i ↔ j, there are positive integers m and ℓ with p_ij^{[m]} > 0 and p_ji^{[ℓ]} > 0, and then both p_jj^{[ℓ+m]} > 0 and p_jj^{[ℓ+m+2n]} > 0 for every n with p_ii^{[n]} > 0; from this one deduces that d(i) = d(j).
Proof. Since i ↔ j, there exists a positive integer m such that p_ij^{[m]} > 0. From the Chapman–Kolmogorov equations we obtain

p_ij^{[m+nd]} ≥ p_ij^{[m]} p_jj^{[nd]}.

The desired result now follows from the following lemma from number theory; see [1], Section 2.4 and Appendix A.
Lemma Let d be the gcd of the set of positive integers A = {a_n ; n ∈ N}, and suppose that A is closed under addition (i.e., a_n + a_m ∈ A for all n, m ∈ N). Then there exists a positive integer n_0 such that nd ∈ A for all n ≥ n_0.
The main theorem

In the Ehrenfest gas-of-molecules example in Section 1.1 we saw that the vector π = ( π_0 π_1 . . . π_N ), given by

π_i = \binom{N}{i} 2^{−N}, i = 0, 1, . . . , N,
p_ij^{[n]} → π_j as n → ∞, for all i, j ∈ S.
Remarks. (i) If the state space S is finite and the chain is irreducible, automatically all states are non-null recurrent; see Exercise 1.14.
(ii) In an irreducible aperiodic Markov chain in which all states are non-null recurrent, the limit lim_{n→∞} p_ij^{[n]} does not depend on the starting point X_0 = i: the chain "forgets" its origin. By Corollary 1.2,

P(X_n = j) → π_j as n → ∞,

irrespective of the initial distribution; see also Exercise 2.19, where you are invited to give a proof of this in case S is finite.
In fact a more general result holds, which is known as the ergodic theorem.

Ergodic theorem Let (X_n)_{n≥0} be an irreducible Markov chain on a finite state space S. Let P = (p_ij) be its transition matrix, and π its unique stationary distribution. Furthermore, let f : S → R be a function on S. Then with probability 1 we have that

lim_{n→∞} (1/n) ∑_{k=0}^{n−1} f(X_k) = ∑_{i∈S} f(i) π_i.
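The ergodic theorem is easy to see in a simulation. A sketch with the two-state weather chain of Exercise 1.2 and f the indicator of rain, so that the time average should approach the stationary probability π_0 = 1/2 (the chain's matrix is doubly stochastic; the code names are ours):

```python
# Ergodic theorem in simulation: along a single trajectory of the
# two-state weather chain (rain persists with probability 0.7), the
# fraction of rainy days tends to the stationary probability 1/2.
import random

random.seed(3)

P = [[0.7, 0.3],
     [0.3, 0.7]]

state = 0            # start on a rainy day
visits_rain = 0
n = 200000
for _ in range(n):
    if state == 0:
        visits_rain += 1
    state = 0 if random.random() < P[state][0] else 1

frac = visits_rain / n
print(frac)          # close to pi_0 = 1/2
```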
(iv) The main theorem implicitly contains an algorithm to determine the stationary distribution π: one can find π from

πP = π

and the fact that

∑_{i∈S} π_i = 1.
As an example, consider the Markov chain from Quick exercise 1.1. In this example (Ehrenfest's model for N = 5), we have that

P =
   0    1    0    0    0    0
  1/5   0   4/5   0    0    0
   0   2/5   0   3/5   0    0
   0    0   3/5   0   2/5   0
   0    0    0   4/5   0   1/5
   0    0    0    0    1    0 .
From π = πP we find that

π_0 = (1/5)π_1,   π_1 = π_0 + (2/5)π_2,   π_2 = (4/5)π_1 + (3/5)π_3,
π_3 = (3/5)π_2 + (4/5)π_4,   π_4 = (2/5)π_3 + π_5,   π_5 = (1/5)π_4,

yielding that

π_1 = 5π_0,  π_2 = 10π_0,  π_3 = 10π_0,  π_4 = 5π_0,  π_5 = π_0,

and therefore

π_0 + 5π_0 + 10π_0 + 10π_0 + 5π_0 + π_0 = 1, so π_0 = 1/32.

We find that

π = ( 1/32  5/32  10/32  10/32  5/32  1/32 ).
For the Markov chain of Example 2.1 we have

P² =
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0
   0   1/2   0   1/2 ,   and   P³ = P² P =
   0   1/2   0   1/2
  1/2   0   1/2   0
   0   1/2   0   1/2
  1/2   0   1/2   0
 = P.
2.3 With π = ( 1/3  2/9  2/9  2/9 ) and P the matrix of Example 2.2, a direct computation gives πP = π.
2.4 Exercises
2.1 Consider a branching process where the distribution of the number of offspring Z is given by p_j = P(Z = j), j = 0, 1, 2, . . . , with

P(Z = 0) = (1 − b − c)/(1 − c),   P(Z = j) = b c^{j−1} for j = 1, 2, . . . ,

and 0 < b ≤ 1 − c.

a. Find the probability generating function of Z, and use this generating function to determine the expectation μ = E[Z].
b. Determine for all values of μ the probability of ultimate extinction η_0.
2.3 Let (X_n)_{n≥0} be a branching process with X_0 = 1, and let the number of offspring of each individual be given by the random variable Z, with probability mass function given by

P(Z = 0) = P(Z = 1) = 1/4  and  P(Z = 2) = 1/2.
Determine the stationary distribution of this Markov chain. Do you find the
same answer as in Exercise 1.7? Why or why not?
2.9 Suppose we have a vase with N balls. At each time n ∈ N some of these balls will be white and the others black (although it is possible that there are times n where all the balls are black, or all white). At every time n we throw a fair coin: if heads shows we select completely at random a ball from the urn, and replace it by a white ball. In case we throw tails, we also select completely at random a ball from the urn, but now replace it by a black ball. Let X_n be the number of white balls in the urn at time n.
a. Explain in words why (X_n)_{n≥0} is a Markov chain. What is the state space S?
b. Determine the transition matrix P , i.e., determine the transition probabilities pij = P(Xn+1 = j | Xn = i) for i = 0, 1, . . . , N , j = 0, 1, . . . , N .
c. What are the equivalence classes? Is the chain irreducible? What is the
period d(i) for i = 0, 1, . . . , N ?
d. Suppose that N = 2. Determine the stationary distribution π. What is m_i for i = 0, 1, 2?
2.10 A transition matrix P is called doubly stochastic if the sum over the entries of each column is equal to 1, i.e., if

∑_i p_ij = 1, for each j.

Let (X_n)_{n≥0} be an aperiodic irreducible Markov chain on the finite state space S, say S = {0, 1, . . . , M}, with a doubly stochastic transition matrix P. Show that the stationary distribution is equal to the discrete uniform distribution on S, i.e., that

π_i = 1/(M + 1), for each i ∈ S.
2.11 In Section 1.3 we considered as an example of a Markov chain on an infinite state space the simple random walk on Z; cf. page 10. Here the transition probabilities for all i ∈ Z are given by

p_{i,i+1} = p,   p_{i,i−1} = q = 1 − p,

and we saw that the chain is irreducible, but that only in case p = 1/2 the states are recurrent.
S_n = X_1 + X_2 + · · · + X_n.

Show that

lim_{n→∞} P(S_n is a multiple of 5) = 1/5.
π_j = lim_{n→∞} (1/n) ∑_{k=0}^{n−1} p_ij^{[k]}, for all i, j ∈ S,  (2.4)

and

π_j = lim_{n→∞} p_ij^{[n]}, for all i, j ∈ S.  (2.5)

This suggests that (2.5) is a stronger property than (2.4) (i.e., that (2.5) implies (2.4)). This is indeed so, as the following will show.
a. Investigate whether the limits
lim_{n→∞} (1/n) ∑_{k=1}^{n} ak and lim_{n→∞} an
exist. Now set
Sn = ∑_{k=1}^{n} ak,
and show that for every ε > 0 there exists a positive integer N, such that
SN − ε(n − N) < Sn < SN + ε(n − N),
for all n ≥ N. Use this to show that Sn/n → 0 as n → ∞.
2.15 Let (Xn)n≥0 be a Markov chain on the state space S = {1, 2, 3}, with transition matrix P given by

P = ( 1/2 1/3 1/6
      1/6 1/3 1/2
      1/3 1/3 1/3 )

a. Determine lim_{n→∞} (X0 + X1 + ··· + Xn−1)/n.
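The limit in part a can be checked numerically. The sketch below uses the transition matrix as reconstructed here (treat its exact entries as an assumption); it is doubly stochastic, so the stationary distribution is uniform and the long-run average of Xn is (1 + 2 + 3)/3 = 2:

```python
# Transition matrix of Exercise 2.15 (entries as reconstructed above).
P = [[1/2, 1/3, 1/6],
     [1/6, 1/3, 1/2],
     [1/3, 1/3, 1/3]]

pi = [1.0, 0.0, 0.0]       # start in state 1; the limit does not depend on this
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Long-run average of X_n equals sum_i i * pi_i.
average = sum((i + 1) * pi[i] for i in range(3))
print(round(average, 6))   # 2.0
```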
2.16 (Continuation of Exercise 2.15). Suppose that we are also given that, for n ≥ 1,
Yn = Xn + Xn−1.
a. Calculate
P(Y3 = 5 | Y2 = 3, Y1 = 2) and P(Y3 = 5 | Y2 = 3, Y1 = 3).
Is the stochastic process (Yn)n≥1 a Markov chain (on SY = {2, 3, 4, 5, 6})?
b. Determine
lim_{n→∞} (1/n) ∑_{k=1}^{n} Yk.
2.17 Let us return once more to Ehrenfest's model of molecules in a gas. Suppose we made a film of the transitions, and we started this movie somewhere in the middle, without telling you whether it moved forward or backward in time. You wouldn't be able to tell the difference! In other words, the forward transition probabilities pij = P(Xn+1 = j | Xn = i) are identical to the backward transition probabilities qij = P(Xn = j | Xn+1 = i), for each i, j ∈ S. In fact, for any Markov chain one can reverse the order of time, and get a new Markov chain, with transition matrix Q = (qij). Here, in the Ehrenfest example, we moreover have that P = Q. In general, when P = Q, we say that the Markov chain is time-reversible.
a. Show that an irreducible positively recurrent Markov chain (Xn)n≥0 with stationary distribution π is time-reversible if and only if
πi pij = πj pji, for all i, j ∈ S.
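The detailed-balance condition πi pij = πj pji can be verified concretely for Ehrenfest's model. The sketch below assumes the classical rates (from state i one of N molecules is picked at random and moved to the other half, so p_{i,i+1} = (N − i)/N and p_{i,i−1} = i/N) and the Bin(N, 1/2) stationary distribution:

```python
from math import comb

# Ehrenfest model with N molecules: stationary distribution is Bin(N, 1/2).
N = 10
pi = [comb(N, i) / 2 ** N for i in range(N + 1)]

# Detailed balance between neighbouring states i and i + 1.
for i in range(N):
    flow_right = pi[i] * (N - i) / N       # pi_i * p_{i,i+1}
    flow_left = pi[i + 1] * (i + 1) / N    # pi_{i+1} * p_{i+1,i}
    assert abs(flow_right - flow_left) < 1e-12
```

The check rests on the identity C(N, i)(N − i) = C(N, i + 1)(i + 1), which is exactly why the binomial distribution is reversible for this chain.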
3
Continuous-time Markov chains
In this chapter we will study the continuous-time analogue of the discrete-time Markov chains from Chapters 1 and 2. This continuous-time Markov chain is a stochastic process (Xt)t∈I, where the random variables Xt are indexed by some (time-)interval I (usually the interval [0, ∞)), and where the Markov property
given the present state, the future is independent of the past
applies. Continuous-time Markov chains have many important applications, and some of these (in queueing theory) will be outlined.
The analysis of continuous-time Markov chains is harder than that of its discrete-time counterpart, so often the proofs of its properties will only be sketched, or heuristically motivated. Having said this, we will see that there are important similarities between these two stochastic processes. We will see, for example, that the main theorem from Chapter 2 plays an important role in understanding the long-term behavior of continuous-time Markov chains. In the next section, a formal definition will be given, and we will briefly consider an important example of such stochastic processes, a process that you have already met on page 46 of [7]: the Poisson process. We will see in later sections that one of the fundamental properties of the Poisson process also applies to general continuous-time Markov chains: the interarrival times are exponentially distributed.
N(t1) − N(t0), N(t2) − N(t1), . . . , N(tn) − N(tn−1)
are independent;
(iii) for all t, s ≥ 0,
P(N(t + s) − N(t) = k) = ((λs)^k / k!) e^{−λs}.   (3.1)
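Property (3.1) can be illustrated by simulation: generating a Poisson process from independent Exp(λ) interarrival times, the count N(t) should have mean λt. A small sketch (the sample size and tolerance are arbitrary choices):

```python
import random

# Simulate N(t) for a Poisson process with intensity lam by adding
# Exp(lam) interarrival times until time t is exceeded.
random.seed(1)
lam, t, runs = 2.0, 5.0, 20000

def poisson_count(lam, t):
    arrivals, clock = 0, random.expovariate(lam)
    while clock <= t:
        arrivals += 1
        clock += random.expovariate(lam)
    return arrivals

mean = sum(poisson_count(lam, t) for _ in range(runs)) / runs
# By (3.1), N(t) has a Poisson(lam * t) distribution, so E[N(t)] = 10.
assert abs(mean - lam * t) < 0.2
```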
so we find that
P(N(t) = j | N(s) = i, V) = P(N(t) = j | N(s) = i),
and it follows that the Poisson process (N(t))t≥0 is indeed a continuous-time Markov chain. It is time-homogeneous, because it follows from (iii) that P(N(t + s) = j | N(s) = i) does not depend on s, but only on t.
Quick exercise 3.1 Let (N(t))t≥0 be a Poisson process with intensity λ. Determine pij(s, t), for all i, j ∈ N and all t ≥ s ≥ 0.
An assumption we will make throughout is that we will only consider chains which are time-homogeneous; the transition probability pij(s, t) only depends on the time difference t − s, not on the values of s and t, i.e.,
pij(s, t) = pij(0, t − s), for all t ≥ s ≥ 0.
In view of this it suffices to write pij(t) instead of pij(s, t), and Pt or P(t) instead of P(s, t). So we have that
pij(t) ≥ 0, and ∑_{j∈S} pij(t) = 1.   (3.2)
lim_{h↓0} pij(h) = 1 if i = j, and lim_{h↓0} pij(h) = 0 if i ≠ j.   (3.3)
This makes the semigroup {Pt}t≥0 a standard semigroup (or: continuous semigroup). We have the following proposition.
Proposition 3.1 Let {Pt}t≥0 be a standard semigroup on S. Then for all i, j ∈ S and all t ≥ 0 we have that
lim_{h→0} pij(t + h) = pij(t),
that is, for all i, j ∈ S and all t ≥ 0 the function t ↦ pij(t) is continuous at t.
Proof. This follows directly from the assumption (3.3) that the semigroup is standard and the next lemma, taking s = h, and t for right-continuity, t − h for left-continuity.
Lemma 3.1 For all s, t ≥ 0 and i, j ∈ S,
|pij(t + s) − pij(t)| ≤ 1 − pii(s).
Proof. For all s, t ≥ 0 one has on the one hand that
pij(t + s) = pii(s) pij(t) + ∑_{k≠i} pik(s) pkj(t) ≤ pij(t) + ∑_{k≠i} pik(s) = pij(t) + 1 − pii(s),
∑_{j∈S} pij(h) ≈ 1 + h ∑_{j∈S} qij,
suggesting that
∑_{j∈S} qij = 0, for all i ∈ S.
Indeed, formally,
∑_{j∈S} qij = ∑_{j∈S} p′ij(0) = (d/dt ∑_{j∈S} pij(t))|_{t=0} = (d/dt)(1) = 0.
In matrix notation the generator Q is obtained as
Q = lim_{h↓0} (1/h)(Ph − I),
and ∑_{j∈S} qij = 0 means that each row of Q sums to zero.
(d/dt) Pt = Q Pt   (3.6)
is Kolmogorov's backward equation, and
(d/dt) Pt = Pt Q   (3.7)
is Kolmogorov's forward equation. Note that these equations are differential equations, and that we have as initial condition that P0 = I. In case S is finite, the unique solution of these equations is given by
Pt = e^{tQ}.   (3.8)
Recall from your linear algebra classes that if C is a finite-dimensional matrix,
e^C = ∑_{n=0}^{∞} C^n / n!,
where the sum is taken per entry.
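The series for e^{tQ} can be computed numerically. The sketch below uses a made-up two-state generator Q = (−λ λ; μ −μ), for which one can check by hand that p00(t) = μ/(λ+μ) + λ/(λ+μ) e^{−(λ+μ)t}; the truncated series is compared against this closed form:

```python
import math

# e^{tQ} by truncating the series sum_n (tQ)^n / n! for the 2x2 generator
#   Q = ( -lam  lam ;  mu  -mu )   (a made-up example).
lam, mu, t = 1.0, 2.0, 0.7
tQ = [[-lam * t, lam * t], [mu * t, -mu * t]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P_t = [[1.0, 0.0], [0.0, 1.0]]     # running sum, starts at the identity
term = [[1.0, 0.0], [0.0, 1.0]]    # term_n = (tQ)^n / n!
for n in range(1, 40):
    prod = mat_mul(term, tQ)
    term = [[prod[i][j] / n for j in range(2)] for i in range(2)]
    P_t = [[P_t[i][j] + term[i][j] for j in range(2)] for i in range(2)]

exact = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
assert abs(P_t[0][0] - exact) < 1e-9
assert abs(sum(P_t[0]) - 1.0) < 1e-9   # each row of Pt sums to 1
```

Forty terms suffice here because the entries of tQ are of modest size; for stiff generators one would use a dedicated matrix-exponential routine instead.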
In case S is infinite, problems may arise when taking the limit h ↓ 0 in (3.5). We have the following theorem.
Theorem 3.1 (Kolmogorov's backward equation) If the standard semigroup {Pt}t≥0 is stable and conservative, Kolmogorov's backward equation (3.6) is satisfied.
For the forward equation the result is far less general, mainly due to the lack of regularity assumptions on the trajectories of the Markov chain.
Theorem 3.2 (Kolmogorov's forward equation) If the standard semigroup {Pt}t≥0 is stable and conservative, and moreover, if for all states i and all t ≥ 0,
∑_{j∈S} pij(t) q(j) < ∞,
then Kolmogorov's forward equation (3.7) is satisfied.
pii(u/2^n) = 1 − q(i) u/2^n + o(u/2^n), as n → ∞,
so that
lim_{n→∞} (pii(u/2^n))^{2^n} = e^{−q(i)u}.
Consequently,
lim_{n→∞} ∑_{N=1}^{∞} P(X(t + k u/2^n) = i, 0 ≤ k ≤ 2^n + N − 1, X(t + (2^n + N) u/2^n) = j)
= lim_{n→∞} P(X(t) = i) (pii(u/2^n))^{2^n} ∑_{N=1}^{∞} (pii(u/2^n))^{N−1} pij(u/2^n)
= lim_{n→∞} P(X(t) = i) (pii(u/2^n))^{2^n} pij(u/2^n) / (1 − pii(u/2^n))
= P(X(t) = i) e^{−q(i)u} qij / q(i).
Since P(X(t + Ti) = j, Ti > u | X(t) = i) is this limit divided by P(X(t) = i), we find that
P(X(t + Ti) = j, Ti > u | X(t) = i) = e^{−q(i)u} qij / q(i),
rij = qij / q(i), if i ≠ j, and rij = 0, if i = j.
Quick exercise 3.5 Show that R is a Markov matrix (i.e., a stochastic matrix) on S.
So our previous characterization of a continuous-time Markov chain is as follows: given that X(t) = i, the chain stays an exponentially distributed time Ti in state i, and then jumps (independently of Ti) to state j according to the stochastic matrix R.
This characterization makes it possible to simulate continuous-time Markov chains. We also see that in essence discrete-time and continuous-time chains behave in the same way, the difference being that the time between two steps is fixed in the former case, and exponentially distributed in the latter.
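The simulation recipe just described can be sketched directly: sit in state i an Exp(q(i)) amount of time, then jump according to R. The three-state generator below is a made-up example:

```python
import random

random.seed(7)
Q = [[-2.0, 1.5, 0.5],
     [1.0, -3.0, 2.0],
     [0.5, 0.5, -1.0]]    # a made-up generator; rows sum to 0

def simulate(Q, start, t_end):
    """Return the list of (time, state) jump epochs up to time t_end."""
    path, state, clock = [(0.0, start)], start, 0.0
    while True:
        q_i = -Q[state][state]
        clock += random.expovariate(q_i)       # holding time T_i ~ Exp(q(i))
        if clock > t_end:
            return path
        # Jump according to r_ij = q_ij / q(i), for j != i.
        u, acc, nxt = random.random(), 0.0, None
        for j in range(len(Q)):
            if j == state:
                continue
            acc += Q[state][j] / q_i
            nxt = j
            if u <= acc:
                break
        state = nxt
        path.append((clock, state))

path = simulate(Q, start=0, t_end=50.0)
# Sanity checks: jump times increase and every jump changes the state.
assert all(t1 < t2 for (t1, _), (t2, _) in zip(path, path[1:]))
assert all(s1 != s2 for (_, s1), (_, s2) in zip(path, path[1:]))
```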
Example 3.2
Consider a factory with two machines and one repairman in case one or both machines break down. Operating machines break down after an exponentially distributed time with parameter λ. Repair times are exponentially distributed with parameter μ.
Let X(t) be the number of operational machines at time t; then X(t) can attain as values 0, 1, and 2, so we have S = {0, 1, 2} as state space. Suppose that at time t both machines work, so X(t) = 2, and let X1 and X2 be the times until failure of the first respectively the second machine. Let T2 = min{X1, X2}; then T2 is (as the minimum of two independent exponentially distributed random variables, both with parameter λ) exponentially distributed with parameter 2λ. So X(s) = 2 for s ∈ [t, t + T2) and X(s) = 1 for s = t + T2. (Note that with probability zero both machines break down at exactly the same moment.) Suppose that at this time s = t + T2 it is the first machine that breaks down. Due to the memoryless property the residual life of the second machine starts all over again! We now have one operating machine, and one machine the repairman is trying to fix. Let Y be the time needed to fix the broken machine, and Xres the time the operating machine still runs. Then T1 = min{Xres, Y}, and we have that X(s) = 1 if s ∈ [t + T2, t + T2 + T1), and that at time s = t + T2 + T1 the chain either jumps with probability r10 to state 0 (the second machine also breaks down, while the repairman is still working on the broken machine), or to state 2 with probability r12 (the
repairman finished repairing the broken machine before the other machine broke down). Etcetera. Note that T1 is exponentially distributed with parameter λ + μ (because T1 = min{Xres, Y}), and T0 is exponentially distributed with parameter μ. The schematic representation in Figure 3.1 helps us to find the generator matrix Q.
[Figure 3.1. Transition rates between the states 0, 1, and 2.]
We find that

Q = ( −μ     μ      0
       λ  −(λ+μ)    μ
       0    2λ    −2λ )
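A numerical check of this example: since the chain is a birth-death process on {0, 1, 2}, the balance equations give π1 = (μ/λ) π0 and π2 = (μ/(2λ)) π1, and the resulting π should satisfy πQ = 0. The sketch below assumes the generator as reconstructed above, with the sample values λ = 1, μ = 2:

```python
# Stationary distribution of Example 3.2 via the birth-death balance
# equations, then verified against pi Q = 0 (lam, mu are sample values).
lam, mu = 1.0, 2.0

pi0 = 1.0
pi1 = (mu / lam) * pi0          # rate out of 0 equals rate into 0
pi2 = (mu / (2 * lam)) * pi1    # balance across the 1 <-> 2 boundary
total = pi0 + pi1 + pi2
pi = [pi0 / total, pi1 / total, pi2 / total]

Q = [[-mu, mu, 0.0],
     [lam, -(lam + mu), mu],
     [0.0, 2 * lam, -2 * lam]]
for j in range(3):
    assert abs(sum(pi[i] * Q[i][j] for i in range(3))) < 1e-12

print([round(x, 4) for x in pi])   # [0.2, 0.4, 0.4]
```

With these rates the factory has both machines down a fraction π0 = 0.2 of the time.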
Example 3.3
Suppose that in the previous example we have two repairmen, each having an exponentially distributed repair time with parameter μ, and where these repair times are independent. Furthermore, we assume that each repairman works alone. In this case the transition intensities are given in Figure 3.2.
[Figure 3.2. Transition rates for the model with two repairmen.]
We now find that

Q = ( −2μ    2μ     0
       λ  −(λ+μ)    μ
       0    2λ    −2λ )
Example 3.4
In the previous example the assumption that both repairmen can do the job
in the same time seems to be somewhat unrealistic. Suppose that the first
repairman has an exponential repair time with parameter μ1, and that the second repairman has an exponential repair time with parameter μ2, and say μ1 < μ2 (i.e., in the mean the second repairman works faster than the first one). In view of this, the board of directors of the factory has decided that the second repairman always should start to repair a broken machine after a period in which both machines were operational. Once a repairman starts to work on a machine, he/she will also finish the job (so the first repairman is not taken from his job if the second repairman finished repairing her machine before the first repairman was finished). Clearly, we cannot describe this anymore as a continuous-time Markov chain with state space S = {0, 1, 2} (as we did in the previous two examples). We need to split state 1. We set S = {0, 1₁, 1₂, 2}, and X(t) = 1₁ now means that at time t one machine is operational, and that repairman 1 is working on the other (broken) machine. In the same way, X(t) = 1₂ now means that at time t one machine is operational, and that repairman 2 is working on the other machine. The transition intensities are given in Figure 3.3.
[Figure 3.3. Transition rates for the model with two different repairmen; the state space is S = {0, 1₁, 1₂, 2}.]

Q = ( −(μ1+μ2)    μ2        μ1       0
         λ     −(λ+μ1)      0        μ1
         λ        0      −(λ+μ2)     μ2
         0        0        2λ      −2λ )
Proof. Since lim_{h↓0} pii(h) = 1, we find that pii(h) > 0 for small values of h. From this, and the inequality
pii(s + t) ≥ pii(s) pii(t),
it follows that pii(t) > 0 for all t ≥ 0.
Now suppose that i ≠ j, and that there exists a value s > 0 for which pij(s) > 0, so i → j. Then we can find states i0 = i ≠ i1 ≠ i2 ≠ ··· ≠ in−1 ≠ in = j, for which
r_{i0 i1} r_{i1 i2} r_{i2 i3} ··· r_{in−1 in} > 0,
and therefore
q_{i0 i1} q_{i1 i2} q_{i2 i3} ··· q_{in−1 in} > 0.
Since ik−1 ≠ ik for k = 1, 2, . . . , n, we see that
p_{ik−1 ik}(h) = q_{ik−1 ik} h + o(h), as h ↓ 0,
implies that p_{ik−1 ik}(h) > 0 for h sufficiently small, yielding (due to Chapman-Kolmogorov) that p_{ik−1 ik}(t) > 0 for t > 0. But then we find that
pij(t) ≥ p_{i0 i1}(t/n) p_{i1 i2}(t/n) ··· p_{in−1 in}(t/n) > 0.
Another definition which is recycled from the discrete case is that of a stationary distribution. The vector π = (πi)_{i∈S} is a stationary distribution of the chain if π is a probability vector, and
π = π Pt, for all t ≥ 0.
Note that if the probability vector π(0) is the initial distribution, i.e.,
P(X(0) = i) = πi^(0), for i ∈ S,
then (using the law of total probability; see also Corollary 1.2) the distribution π(t) at time t is given by
π(t) = π(0) Pt.
So if π(0) = π, then π(t) = π, for all t.
Recall that for discrete-time Markov chains we find π by solving π = πP. For continuous-time Markov chains we need to solve π = πPt for all t. This might seem to be a hard task, but the intensity matrix Q comes to our aid here. As in the discrete case the stationary distribution need not exist, or (if it exists) need not be unique. We have the following theorem, which we state for finite state spaces S; from the proof it will be clear what extra conditions are needed in case S is countably infinite.
(3.9)
(3.10)
Then Ph is a stochastic matrix, whose entries are all positive due to Lemma 3.2. But then we see that Ph is equal to the transition matrix P of an irreducible and aperiodic discrete-time Markov chain (Yn)n∈N on S (and because S is finite this discrete chain is non-null recurrent); this discrete-time chain (Yn)n∈N is called a skeleton of the continuous-time chain (X(t))t≥0. So the transition probabilities pij of the skeleton are given by pij = pij(h), and therefore the n-step transition probabilities of the skeleton satisfy pij^[n] = pij(nh), for n ∈ N.
Due to the main theorem from Chapter 2 we know that there exists for the skeleton a unique probability vector π = (πi)_{i∈S}, satisfying
lim_{n→∞} pij(nh) = πj, for all i, j ∈ S.   (3.11)
We now show that (3.9) holds. According to (3.11) there exists an N ∈ N such that for all j ∈ S
|pij(nh) − πj| < ε/2, for all n ≥ N.
For each t ≥ Nh there exists a unique n ∈ N such that (n − 1)h ≤ t < nh, and using Lemma 3.1 and Equation (3.10) we find that
|pij(t) − πj| ≤ |pij(t) − pij(nh)| + |pij(nh) − πj| ≤ 1 − pii(nh − t) + |pij(nh) − πj| < ε/2 + ε/2 = ε.
Since ε > 0 was chosen arbitrarily, we see that (3.9) follows. Note that the existence of this limit implies that π is unique.
That π is stationary follows by letting s go to infinity in Ps+t = Ps Pt.
Finally we have
πQ = 0 ⇔ πQ^n = 0 for all n ≥ 1 ⇔ π t^n Q^n/n! = 0 for all n ≥ 1 and t ≥ 0,
so that πQ = 0 if and only if πPt = π for all t ≥ 0.
lim_{t→∞} pij(t) = 0, for all i, j ∈ S.   (3.12)
For a sketch of a proof of this, see [3]. So we see that for an irreducible continuous-time chain either the stationary distribution π exists, in which case (3.9) holds, or π does not exist, in which case we have (3.12).
[Figure 3.4. Transition rates of a birth-death process: from state i the birth rate is λi and the death rate is μi.]
Quick exercise 3.8 Determine the intensity matrix Q and the jump matrix R for a birth-death process with birth rates λi and death rates μi.
In order to find the stationary distribution we must solve πQ = 0. Writing this out (and using what you found in Quick exercise 3.8), we find for birth-death processes the so-called rate out = rate in principle:
State 0:  λ0 π0 = μ1 π1
State 1:  (λ1 + μ1) π1 = λ0 π0 + μ2 π2
State 2:  (λ2 + μ2) π2 = λ1 π1 + μ3 π3
 . . .
State n (n ≥ 1):  (λn + μn) πn = λn−1 πn−1 + μn+1 πn+1.
If we now subtract from each of these (infinitely many) equations the equation directly above it, we find that
λ0 π0 = μ1 π1,  λ1 π1 = μ2 π2,  λ2 π2 = μ3 π3,  . . . ,  λn πn = μn+1 πn+1 for n ≥ 0,
i.e.,
π1 = (λ0/μ1) π0,  π2 = (λ1/μ2) π1,  π3 = (λ2/μ3) π2,  . . . ,  πn+1 = (λn/μn+1) πn for n ≥ 0,
so that
πn = ((λ0 λ1 ··· λn−1)/(μ1 μ2 ··· μn)) π0.
Since π0 + π1 + ··· = 1, we obtain
π0 = 1/(1 + ∑_{n=1}^{∞} (λ0 λ1 ··· λn−1)/(μ1 μ2 ··· μn)),
and hence
πn = ((λ0 λ1 ··· λn−1)/(μ1 μ2 ··· μn)) · 1/(1 + ∑_{n=1}^{∞} (λ0 λ1 ··· λn−1)/(μ1 μ2 ··· μn)).   (3.13)
We see from (3.13) that the birth-death process with birth rates λi and death rates μi has a stationary distribution π if and only if
∑_{n=1}^{∞} (λ0 λ1 ··· λn−1)/(μ1 μ2 ··· μn) < ∞.
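Formula (3.13) translates directly into code by truncating the sum. The sketch below checks it for constant rates (λi = λ, μi = μ with λ < μ), where the stationary distribution is the geometric πn = (1 − ρ)ρ^n with ρ = λ/μ; the truncation level is an arbitrary choice:

```python
# Stationary distribution of a birth-death process via (3.13), with the
# state space truncated at n_max (valid when the tail weights are negligible).
def birth_death_stationary(birth, death, n_max):
    """birth(i), death(i) are the rates lambda_i and mu_i."""
    weights = [1.0]                       # weights[n] = lam_0...lam_{n-1} / (mu_1...mu_n)
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * birth(n - 1) / death(n))
    total = sum(weights)
    return [w / total for w in weights]

lam, mu = 1.0, 2.0                        # rho = 1/2 < 1, so pi exists
pi = birth_death_stationary(lambda i: lam, lambda i: mu, n_max=200)

rho = lam / mu
for n in range(10):
    assert abs(pi[n] - (1 - rho) * rho ** n) < 1e-10
```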
Examples 3.5
Consider the M/M/1 queue: customers arrive according to a Poisson process with intensity λ, and a single server has exponentially distributed service times with parameter μ, so that λi = λ and μi = μ. Writing ρ = λ/μ, the recursion above gives
π1 = ρ π0,  π2 = ρ² π0,  π3 = ρ³ π0,  . . .
In case ρ < 1 we have
∑_{i=0}^{∞} ρ^i = 1/(1 − ρ) < ∞,
so that π0 = 1 − ρ and πi = ρ^i (1 − ρ), for i ≥ 0. The expected number of customers in the system is then
∑_{i=0}^{∞} i πi = ∑_{i=0}^{∞} i ρ^i (1 − ρ) = ρ/(1 − ρ).
[Figure 3.5. Transition rates of the M/M/2 queue.]
This models a shop with two servers, both with exponentially distributed service times with parameter μ, where customers arrive according to a Poisson process with intensity λ. In the name M/M/2 the M again stands for Markov, while the 2 tells you that there are two servers. From Figure 3.5 we see that

Q = ( −λ     λ       0       0   · · ·
       μ  −(λ+μ)     λ       0   · · ·
       0    2μ   −(λ+2μ)     λ   · · ·
      ...   ...     ...     ...       )

The rate out = rate in principle now yields that
λ π0 = μ π1
(λ + μ) π1 = λ π0 + 2μ π2
(λ + 2μ) πn = λ πn−1 + 2μ πn+1, n ≥ 2.
Subtracting as before, and writing ρ = λ/(2μ), we find
π1 = 2ρ π0,  2μ π2 = λ π1, so π2 = ρ π1 = 2ρ² π0,
and in general
πn = 2ρ^n π0, n ≥ 1.
In case ρ < 1, the condition π0 + π1 + ··· = 1 gives
π0 = (1 − ρ)/(1 + ρ),
yielding that
πn = 2(1 − ρ) ρ^n/(1 + ρ), n ≥ 1.
The expected number of customers in the system is
∑_{i=0}^{∞} i πi = (2(1 − ρ)/(1 + ρ)) ∑_{i=0}^{∞} i ρ^i = (2(1 − ρ)/(1 + ρ)) · ρ/(1 − ρ)² = 2ρ/(1 − ρ²).
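The M/M/2 formulas can be checked against the general product formula (3.13), using the rates as reconstructed above (death rate μ in state 1 and 2μ from state 2 onwards); λ and μ below are sample values:

```python
# M/M/2 check: compare the truncated product formula (3.13) with the
# closed forms pi_0 = (1-rho)/(1+rho), pi_n = 2(1-rho) rho^n / (1+rho).
lam, mu = 1.0, 1.5
rho = lam / (2 * mu)                 # rho = 1/3 < 1

weights = [1.0]                      # weights[n] is proportional to pi_n
for n in range(1, 400):
    death = mu if n == 1 else 2 * mu
    weights.append(weights[-1] * lam / death)
total = sum(weights)
pi = [w / total for w in weights]

assert abs(pi[0] - (1 - rho) / (1 + rho)) < 1e-10
for n in range(1, 10):
    assert abs(pi[n] - 2 * (1 - rho) * rho ** n / (1 + rho)) < 1e-10

# Expected number of customers: 2 rho / (1 - rho^2).
mean = sum(n * p for n, p in enumerate(pi))
assert abs(mean - 2 * rho / (1 - rho ** 2)) < 1e-10
```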
3.2 The proof of (3.2) is essentially the same as that of Theorem 1.2; for s, t ≥ 0,
pij(s + t) = P(X(s + t) = j | X(0) = i)
= ∑_{k∈S} P(X(s + t) = j, X(s) = k | X(0) = i)
= ∑_{k∈S} P(X(s + t) = j | X(s) = k) P(X(s) = k | X(0) = i)
= ∑_{k∈S} pik(s) pkj(t).
3.3 Since pii(0) = 1 and 0 ≤ pii(h) ≤ 1 (after all, pii(h) is a probability!), we see that pii(h) − pii(0) ≤ 0 for all h ≥ 0, implying that qii ≤ 0. In case i ≠ j we have that pij(0) = 0, and thus we find (since pij(h) is a probability) that pij(h) − pij(0) ≥ 0, yielding that qij ≥ 0.
3.4 In solving Quick exercise 3.1 we have seen that pij(t) = 0 whenever i > j, and that
pij(t) = ((λt)^{j−i}/(j − i)!) e^{−λt}, if i ≤ j.
But then we have that
p′ij(t) = 0, if i > j,
p′ij(t) = −λ e^{−λt}, if i = j,
p′ij(t) = λ e^{−λt} − λ² t e^{−λt}, if j = i + 1,
p′ij(t) = λ ((λt)^{j−i−1}/(j − i − 1)!) e^{−λt} − λ ((λt)^{j−i}/(j − i)!) e^{−λt}, if j > i + 1,
yielding that
qij = p′ij(0) = −λ if i = j, λ if j = i + 1, and 0 in all other cases.
3.5 Since ∑_{j∈S} qij = 0, we have q(i) = ∑_{j∈S, j≠i} qij. We see that
∑_{j∈S} rij = (1/q(i)) ∑_{j∈S, j≠i} qij = q(i)/q(i) = 1.
3.6 For Example 3.2 the jump matrix is

R = (    0        1        0
      λ/(λ+μ)     0     μ/(λ+μ)
         0        1        0    ).
3.7 In the Poisson process there are only births; λi = λ for all i ∈ N, and μi = 0 for all i ∈ N.
3.8 Note that q(0) = λ0, and q(i) = λi + μi for i ≥ 1. From Figure 3.4 we see that the intensity matrix Q is given by:

Q = ( −λ0     λ0       0        0    · · ·
       μ1  −(λ1+μ1)    λ1       0    · · ·
       0      μ2    −(λ2+μ2)    λ2   · · ·
      ...     ...      ...     ...        )

From Q we see that r01 = 1 and
r_{i,i+1} = λi/(λi + μi),  r_{i,i−1} = μi/(λi + μi),  for i ≥ 1.
3.8 Exercises
3.1 Let X be an exponentially distributed random variable, with expected value 1/λ. Furthermore, let S be a continuous random variable, independent of X, with probability density function fS, satisfying fS(x) = 0 for x < 0.
a. Show that X has the memoryless property: for s ≥ 0 and for t ≥ 0,
P(X > s + t | X > s) = P(X > t).
b. Show that for t ≥ 0 one has that
P(X > S + t | X > S) = P(X > t).
3.2 In a post office two counters are open for service. Suppose that the service time at either counter is exponentially distributed, both with expected value 1/μ. Two customers enter the post office at a moment both counters are idle, and their service starts immediately. What is the probability distribution of the residual service time of the customer whose service time was longer than the service time of the other customer, measured from the moment the other customer's service was ready?
Hint: let S1 and S2 be the service times of these two customers; what can you say about P(|S1 − S2| > t), for t ≥ 0?
3.3 You and two other customers enter the post office from Exercise 3.2 at a moment that the post office is empty. The two other customers are helped at once, while you wait for your turn.
a. What is the probability that you will leave the post office while one of the other customers is still being attended?
b. What is the probability that you are still in the post office, while the other customers already have left?
3.4 Let X1 and X2 be two independent exponentially distributed random variables, with E[X1] = 1/λ1 and E[X2] = 1/λ2. As usual, let
X(1) = min{X1, X2} and X(2) = max{X1, X2}.
a. Determine the expectation and variance of X(1).
b. Determine the expectation and variance of X(2).
c. Argue why X(1) and X(2) − X(1) are independent.
3.5 Consider the factory with two machines, as described in Examples 3.2 and 3.3.
a. Determine the stationary distribution π for both examples.
b. If λ = 1 and μ = 2, what is the proportion of the time in each of these examples that both machines are out of order?
3.6 Answer the same questions posed in Exercise 3.5, but now for the factory described in Example 3.4.
3.7 Suppose that each bacterium in a group of bacteria either splits into two new bacteria after an exponentially distributed time with parameter λ, or dies after an exponentially distributed time with parameter μ.
a. Describe this as a birth-death process. What are the birth rates λi, and the death rates μi?
b. Can you find a stationary distribution π?
3.8 In a birth-death process with birth rates λi for i ≥ 0, and death rates μi for i ≥ 1, determine the expected time to go from state 0 to state 3.
3.9 Show that Kolmogorov's backward equation yields for birth-death processes that
p′0j(t) = λ0 (p1j(t) − p0j(t)),
p′ij(t) = λi pi+1,j(t) + μi pi−1,j(t) − (λi + μi) pij(t), i ≥ 1.
Setting
h(t) = p00(t) − μ/(λ + μ),
derive that
h′(t) = −(λ + μ) h(t).   (3.14)
e. Show that
h(t) = K e^{−(λ+μ)t}
is the solution of the differential equation (3.14).
f. Conclude from your answer in e. that
p00(t) = K e^{−(λ+μ)t} + μ/(λ + μ).
A
Short answers to selected exercises
f0 = ∑_{n=1}^{∞} f00^[n] = 1/3 + (2/3) ∑_{k=0}^{∞} (1/2)^{k+1} = 1.
1.8b p34^[2] = ∑_{k=0}^{4} p3k pk4 = 1/36 + 2/18 = 5/36.
2.1c π0 = 1/2.
2.3a 2 1/2 resp. 14.90116.
3.3b 1/2.
B
Solutions to selected exercises
P(A ∩ C | B) = P(A | B ∩ C) P(C | B),   (B.1)
since
P(A | B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) = P(A ∩ C | B) · P(B)/P(B ∩ C) = P(A ∩ C | B)/P(C | B).   (B.2)
In particular, if P(A ∩ C | B) = P(A | B) P(C | B), then
P(A | B ∩ C) = P(A | B) P(C | B)/P(C | B) = P(A | B).
f00^[1] = 1/3, and f00^[2] = (2/3)(1/2) = 1/3. For n ≥ 2: f00^[n] = (2/3)(1/2)^{n−2}(1/2) = (2/3)(1/2)^{n−1}. Hence
f0 = 1/3 + (2/3)(1/2) + (2/3)(1/2)² + (2/3)(1/2)³ + ···
= 1/3 + (2/3)(∑_{k=0}^{∞} (1/2)^k − 1)
= 1/3 + (2/3)(1/(1 − 1/2) − 1) = 1.
P(Xn = j) = (πP^n)_j = ∑_{i∈S} πi pij^[n].
Here, with
P = ( 1/3 2/3
      1/2 1/2 ),
we have π = (3/7, 4/7), and indeed πP = (3/7, 4/7) = π. Diagonalizing P, we write P T = T D, with
T = (  4 1        D = ( −1/6 0
      −3 1 ),            0  1 ),
so that
P^n = T D^n T^{−1} = T ( (−1/6)^n 0
                          0       1 ) T^{−1} → T ( 0 0
                                                   0 1 ) T^{−1} = ( 3/7 4/7
                                                                    3/7 4/7 ), as n → ∞.
1.12 Suppose that state i is recurrent. Due to symmetry it suffices to show that j is also recurrent. Since i ↔ j, there exist positive integers k and m, such that
pij^[k] > 0 and pji^[m] > 0,
yielding that
pjj^[m+n+k] ≥ pji^[m] pii^[n] pij^[k].
Since
∑_n pjj^[n] ≥ ∑_n pjj^[m+n+k] ≥ pji^[m] pij^[k] ∑_n pii^[n] = ∞,
we find that
∑_n pjj^[n] = ∞,
so state j is recurrent as well. Next,
1 = P(Ni = ∞ | X0 = i) ≤ P(∪_{m>n} {Xm = i} | X0 = i) ≤ ∑_{k∈S} pik^[n] P(∪_{m>0} {Xm = i} | X0 = k).
Note that
P(∪_{m>0} {Xm = i} | X0 = k) = 1
for all states k ∈ S for which pik^[n] > 0. In particular we find (since i ↔ j), that
P(∪_{m>0} {Xm = i} | X0 = j) = 1,
∑_{k=0}^{∞} P(Z = k) s^k = (1 − b − c)/(1 − c) + b ∑_{k=1}^{∞} c^{k−1} s^k
= (1 − b − c)/(1 − c) + bs ∑_{ℓ=0}^{∞} (cs)^ℓ
= (1 − b − c)/(1 − c) + bs/(1 − cs).
Differentiating and letting s ↑ 1 yields μ = E[Z] = b/(1 − c)².
2.2b We have that η0 = 1 if and only if μ ≤ 1, which is (for this exercise) equivalent with b ≤ (1 − c)². In case b > (1 − c)², we know (due to Theorem 4.2) that η0 is the smallest positive solution x of x = GZ(x), i.e., of
x = (1 − b − c)/(1 − c) + bx/(1 − cx).
It follows that
η0 = (1 − b − c)/(c(1 − c)).
2.3a A long (and tedious) answer is the following: P(X2 = 0 | X1 = 2) = 1/16 (both parents' family branches die out), P(X2 = 1 | X1 = 2) = 2/16 (one parent does not have any offspring, the other parent has one offspring), P(X2 = 2 | X1 = 2) = 5/16 (either both parents have one offspring, or one parent has no offspring and the other has two), P(X2 = 3 | X1 = 2) = 4/16 (one parent has one offspring, the other parent has two), and P(X2 = 4 | X1 = 2) = 4/16 (both parents have two offspring). But then we have that
E[X2 | X1 = 2] = 0 · (1/16) + 1 · (2/16) + 2 · (5/16) + 3 · (4/16) + 4 · (4/16) = 2 1/2.
(in case p ≤ 1/3, η0 = 1), where x is the smallest positive solution of
x = 1 − 2p + px + px².
In case 1/3 < p ≤ 1/2,
η0 = (1 − 2p)/p.
for i = 0, 1,
P = (  0   1−p   0    0    p
       p    0   1−p   0    0
       0    p    0   1−p   0
       0    0    p    0   1−p
      1−p   0    0    p    0  )

Clearly i ↔ j, for all i, j ∈ S = {1, 2, 3, 4, 5}. So the Markov chain is irreducible. Furthermore, p00^[2] = 2p(1 − p) > 0, and p00^[5] = p⁵ + (1 − p)⁵ > 0, so state 1 (and therefore all other states as well) is aperiodic.
2.12b Because the matrix is doubly stochastic, we know from Exercise 2.10 that the stationary distribution is given by
π = (1/5, 1/5, 1/5, 1/5, 1/5).
From the Main Theorem it then follows that each smoker will hold the pipe 20% of the time.
2.12c More or less everything is the same, except that every state is now periodic (with period d(i) = 2, for i ∈ S = {1, 2, . . . , 6}).
2.13 Define for n ≥ 0 the random variables Yn by Y0 = 0, and
Yn ≡ Sn (mod 5).
Consequently, the state space is given by S = {0, 1, 2, 3, 4}, and
P(Yn+1 = j | Yn = i) = 1/2 if j ≡ i (mod 5), and 1/2 if j ≡ i + 1 (mod 5),
and P(Yn+1 = j | Yn = i) = 0 otherwise. So the matrix of transition probabilities P is given by

P = ( 1/2 1/2  0   0   0
       0  1/2 1/2  0   0
       0   0  1/2 1/2  0
       0   0   0  1/2 1/2
      1/2  0   0   0  1/2 )

Since the event {Yn+1 = j} is determined by the event {Yn = i}, it is clear that (Yn)n≥1 is a Markov chain. For an explicit proof of this, first note that
P(Yn+1 = j | Yn = i, Yn−1 = yn−1, . . . , Y1 = y1, Y0 = 0) = 0
in case j ≢ i, i + 1 (mod 5), and we are done, since pij = 0 in this case. Therefore we may assume that j ≡ i or j ≡ i + 1 (mod 5). Next, one should realize that
Yn+1 = j, Yn = i, Yn−1 = yn−1, . . . , Y1 = y1, Y0 = 0
uniquely determines
Sn+1 = sn+1, Sn = sn, Sn−1 = sn−1, . . . , S1 = s1, S0 = 0,
which in its turn uniquely determines
Xn+1 = xn+1, Xn = xn, Xn−1 = xn−1, . . . , X1 = x1,
where the xi are 0 or 1. Since
P(Yn+1 = j | Yn = i, Yn−1 = yn−1, . . . , Y1 = y1, Y0 = 0)
is equal to
P(Xn+1 = xn+1, Xn = xn, Xn−1 = xn−1, . . . , X1 = x1) / P(Xn = xn, Xn−1 = xn−1, . . . , X1 = x1),
this conditional probability equals P(Xn+1 = xn+1) = 1/2 = pij, so (Yn) is indeed a Markov chain, and
lim_{n→∞} P(Sn is a multiple of 5) = lim_{n→∞} P(Yn = 0) = 1/5.
2.15a We have
lim_{n→∞} (X0 + X1 + ··· + Xn−1)/n = ∑_{i∈S} i πi = (1 + 2 + 3)/3 = 2,
i.e.,
P( lim_{n→∞} (X0 + X1 + ··· + Xn−1)/n = 2 ) = 1.
2.15b Note that the initial distribution is equal to the stationary distribution π. Consequently, for every n ≥ 0 and every i ∈ S we have that P(Xn = i) = 1/3. Now,
P(Xn = i | Xn+1 = j) = P(Xn+1 = j, Xn = i)/P(Xn+1 = j) = pij P(Xn = i)/P(Xn+1 = j) = pij · (1/3)/(1/3) = pij.
2.16a Obviously,
P(Y3 = 5 | Y2 = 3, Y1 = 2) = P(Y3 = 5, Y2 = 3, Y1 = 2)/P(Y2 = 3, Y1 = 2).
Now Y1 = 2 and Y2 = 3 force X0 = X1 = 1 and X2 = 2, so
P(Y2 = 3, Y1 = 2) = P(X2 = 2, X1 = 1, X0 = 1)
= P(X2 = 2 | X1 = 1) P(X1 = 1 | X0 = 1) P(X0 = 1)   (Markov property)
= 1/3 · 1/2 · 1/3 = 1/18.
Furthermore,
P(Y3 = 5, Y2 = 3, Y1 = 2) = P(X3 + X2 = 5, X2 + X1 = 3, X1 + X0 = 2)
= P(X3 = 3, X2 = 2, X1 = 1, X0 = 1)
= P(X3 = 3 | X2 = 2, X1 = 1, X0 = 1) P(X2 = 2, X1 = 1, X0 = 1)
= P(X3 = 3 | X2 = 2) P(X2 = 2, X1 = 1, X0 = 1)
= p23 · (1/18).
So we find that
P(Y3 = 5 | Y2 = 3, Y1 = 2) = p23 = 1/2.
To determine P(Y3 = 5 | Y2 = 3, Y1 = 3), note that
P(Y2 = 3, Y1 = 3) = P(X2 + X1 = 3, X1 + X0 = 3)
= P(X2 = 1, X1 = 2, X0 = 1) + P(X2 = 2, X1 = 1, X0 = 2)
= ··· = 2 p12 p21 · (1/3) = 1/27,
and that
P(Y3 = 5, Y2 = 3, Y1 = 3) = P(X3 + X2 = 5, X2 + X1 = 3, X1 + X0 = 3)
= P(X3 = 3, X2 = 2, X1 = 1, X0 = 2)
= ··· = 1/108.
But then we find that
P(Y3 = 5 | Y2 = 3, Y1 = 3) = (1/108)/(1/27) = 1/4.
Since
P(Y3 = 5 | Y2 = 3, Y1 = 2) = 1/2 ≠ 1/4 = P(Y3 = 5 | Y2 = 3, Y1 = 3),
the process (Yn)n≥1 is not a Markov chain.
2.16b We have
lim_{n→∞} (1/n) ∑_{k=1}^{n} Yk = lim_{n→∞} (1/n) ∑_{k=1}^{n} (Xk + Xk−1)
= lim_{n→∞} ( (1/n) ∑_{k=1}^{n} Xk + (1/n) ∑_{k=0}^{n−1} Xk )
= lim_{n→∞} ( (2/n) ∑_{k=0}^{n−1} Xk + (Xn − X0)/n )
= 2 ∑_{i∈S} i πi + 0 = 4.
lim_{n→∞} P(Xn = j) = lim_{n→∞} ∑_{i∈S} pij^[n] P(X0 = i) = ∑_{i∈S} πj P(X0 = i) = πj ∑_{i∈S} P(X0 = i) = πj,
since ∑_{i∈S} P(X0 = i) = 1.
3.2 Let S1 and S2 be the service times of the two customers. Then the residual time R is given by R = |S1 − S2|, and we have that
P(R > t) = P(|S1 − S2| > t) = P(S1 > S2 + t, S1 > S2) + P(S2 > S1 + t, S2 > S1).
From Exercise 3.1b it follows that
P(S1 > S2 + t | S1 > S2) = e^{−μt},
and therefore we have that
P(S1 > S2 + t, S1 > S2) = e^{−μt} P(S1 > S2).
Similarly,
P(S2 > S1 + t, S2 > S1) = e^{−μt} P(S2 > S1),
and we find that, since P(S1 = S2) = 0,
P(R > t) = e^{−μt} P(S1 > S2) + e^{−μt} P(S2 > S1) = e^{−μt},
i.e., R = |S1 − S2| has an Exp(μ) distribution.
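This result can be checked by simulation: the empirical survival probability of |S1 − S2| should match e^{−μt} (the sample size and tolerance below are arbitrary choices):

```python
import random
import math

# If S1, S2 are independent Exp(mu), then R = |S1 - S2| is again Exp(mu).
random.seed(3)
mu, t, runs = 2.0, 0.5, 100000

hits = 0
for _ in range(runs):
    r = abs(random.expovariate(mu) - random.expovariate(mu))
    if r > t:
        hits += 1

empirical = hits / runs
# Compare with the exact survival probability exp(-mu t) = exp(-1).
assert abs(empirical - math.exp(-mu * t)) < 0.01
```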
3.3a One of the two other customers (who are being served before you) will leave first, and due to the memoryless property of the exponential distribution, the service time of the remaining customer starts all over again, exactly at the same time your service starts. But then your service will be finished before the other person's service with probability
μ/(μ + μ) = 1/2.
3.3b Essentially the same reasoning as in part a. yields that you will still be in the post office after both other customers have left with probability 1/2.
3.7a Obviously we have that S = {0, 1, 2, . . . }, and that if there are i bacteria at some time t (with i ≥ 1), we can only move to state i + 1 (one of the bacteria splits into two new ones), or to state i − 1 (one of the bacteria dies). We cannot move to other states, because then we would have (for example) that two bacteria die at the same time, which has probability 0. Furthermore, for i ≥ 1 we have that λi = iλ, and that μi = iμ. Finally, λ0 = 0; once all the bacteria have died, one cannot have new births (except if you believe in spontaneous regeneration).
Here
Q = λ ( −1  1
         1 −1 ).
But then we have that
(tQ)^0 = ( 1 0        (tQ)^1 = ( −λt  λt
           0 1 ),                 λt −λt ),
(tQ)^2 = (  2λ²t² −2λ²t²        (tQ)^3 = ( −4λ³t³  4λ³t³
           −2λ²t²  2λ²t² ),                 4λ³t³ −4λ³t³ ),
and in general
(tQ)^n = (λt)^n (−2)^{n−1} ( −1  1
                              1 −1 ), for n ≥ 1.
Summing the series per entry,
e^{tQ} = ∑_{n=0}^{∞} (tQ)^n/n! = I + (1/2)(1 − e^{−2λt}) ( −1  1
                                                            1 −1 )
= ( 1/2 + (1/2)e^{−2λt}   1/2 − (1/2)e^{−2λt}
    1/2 − (1/2)e^{−2λt}   1/2 + (1/2)e^{−2λt} ).
In particular, with λ = 1 and t = 100,
p00(100) = 1/2 + (1/2)e^{−200} ≈ 0.5000000.
Furthermore,
p′00(t) = μ(1 − p00(t)) − λ p00(t) = μ − (λ + μ) p00(t).
From πQ = 0 (which yields that π1 = (λ/μ) π0) and π0 + π1 = 1, we find
π0 = 1/(1 + λ/μ) = μ/(λ + μ) and π1 = λ/(λ + μ).
3.11d By definition of h(t), we have that h′(t) = p′00(t). But then it follows from c. immediately that h′(t) = −(λ + μ) h(t).
3.11e From calculus we know that the differential equation (3.14) has as solution
h(t) = K e^{−(λ+μ)t}.
But then we find, by definition of h(t), that
p00(t) = K e^{−(λ+μ)t} + μ/(λ + μ).
In this case λn = λ, for n ≥ 0, and μn = nμ, for n ≥ 1. The rate out = rate in principle yields
λ π0 = μ π1
λ π0 + 2μ π2 = (λ + μ) π1,
from which
π1 = (λ/μ) π0,  π2 = (λ²/(2μ²)) π0,
and in general
πn = (λ^n/(n! μ^n)) π0, for n ≥ 0.
Since π0 + π1 + ··· = 1,
π0 = 1/(1 + (λ/μ)/1! + (λ/μ)²/2! + (λ/μ)³/3! + ···) = e^{−λ/μ},
so that
πn = ((λ/μ)^n/n!) e^{−λ/μ}, for n ≥ 0,
which are exactly the probabilities of a Pois(λ/μ) distribution!
References