
Stochastic Operations Research Models

Hans Reijnierse
August 2012
This syllabus deals with the first seven weeks. It is roughly based on topics in
A First Course in Stochastic Models
by
Henk C. Tijms
Wiley
ISBN 0-471-49880-7 (Cloth)
ISBN 0-471-49881-5 (Paper)
Contents
1 Introduction
2 Discrete Time Markov Chains
  2.1 Examples of Discrete Time Markov Chains
  2.2 Multiple Step Transition Probabilities
  2.3 Mean First Passage Times
  2.4 The Long Term Behavior of a DTMC
  2.5 Communication
  2.6 Cost-Reward Structures on Finite Irreducible DTMCs
3 Continuous Time Processes
  3.1 The Markov Property for Continuous Time Processes
  3.2 Poisson and Other Counting Processes
  3.3 Continuous Time Markov Chains
    3.3.1 Time Dependent Transition Probabilities and Transition Rates
    3.3.2 Equilibrium Distributions
    3.3.3 Long Term Behavior
    3.3.4 Cost-Reward Structures
    3.3.5 The PASTA Theorem and Little's Law
A Appendix: Mathematical Tools to Build Stochastic Models
B Index
1 Introduction
In order to study economic situations, it is convenient to rephrase them into a robust
mathematical model. Advantages are that ambiguities are (or should be) eliminated, one
can leave side issues out of the model and focus on the main ones, and one can simplify
the situation if it is too complex to handle as a whole. Of course, too much simplification
will lead to a poor description of reality; there is always a trade-off between effort and
accuracy. Another advantage is that mathematical models may reveal general phenomena
that apply to many situations. The objective of this course can be put in the one-liner

  ... to model and optimize dynamic systems under uncertainty ...
The term Operations Research in the name of the course refers to a scientific discipline
described by Wikipedia as follows:

  ... an interdisciplinary branch of applied mathematics and formal science that uses
  analytical methods to arrive at (near-)optimal solutions to complex decision-making
  problems.

It is often concerned with determining the maximum (of profit, performance, or yield) or
minimum (of loss, risk, or cost) of some objective function. This course deals with some
techniques for modeling and optimizing systems under uncertainty. It aims to increase
the capability of analyzing managerial problems under uncertainty which occur, for
example, in
- scheduling,
- inventory,
- production control,
- telecommunications,
- maintenance,
- insurance.
The emphasis is on providing insight into the theory, on formulating an economic situation
as a mathematical model, and on providing practical examples in which the discussed
models can be applied. Examples of mathematical models of OR-problems are, e.g.,
- a time dependent variable counting the number of jobs waiting in a queue,
- a directed graph representing cities and roads (e.g., to model the Traveling Salesman
  Problem),
- Markov chains; these will form the main subject of the first half of this course.
The following exercise is an example of a stochastic optimization problem concerning
inventory:
Exercise 1
A vendor in the lobby of a big office building sells coffee. Suppose that the daily demand
X for coffee, measured in liters, is exponentially distributed¹ with rate λ > 0. Suppose
that she must pay €c per liter for coffee, receives €b for each liter sold (assume that
b > c), but suffers a penalty of €a per liter in lost goodwill for unfulfilled demand. How
much coffee should she buy to maximize her expected profit per day?
Hint: Let I be the amount of coffee the vendor purchases; the decision variable. Then
E[X] = 1/λ, and show that E[X | X > I] = I + 1/λ. Apply the law of total expectation
to obtain E[X | X < I].
Answer: If I is the amount of coffee she buys, it is optimal to choose I subject to

  P[X > I] = c / (b + a).
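The answer can also be checked by simulation. The following Python sketch (not part of the exercise; the parameter values a, b, c and λ are ours, chosen purely for illustration) compares the theoretical optimum with a brute-force search over a grid:

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c, lam = 2.0, 5.0, 3.0, 1.5            # illustrative values (assumptions)
    X = rng.exponential(1 / lam, size=200_000)   # sampled daily demands

    def expected_profit(I):
        sold = np.minimum(X, I)                  # liters actually sold
        lost = np.maximum(X - I, 0)              # unfulfilled demand
        return np.mean(b * sold - c * I - a * lost)

    # Theoretical optimum: P[X > I] = c/(b+a)  <=>  I = -ln(c/(b+a)) / lam
    I_star = -np.log(c / (b + a)) / lam
    grid = np.linspace(0.01, 3, 300)
    I_best = grid[np.argmax([expected_profit(I) for I in grid])]
    print(I_star, I_best)                        # the two values should roughly agree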
Besides uncertainty, time will have a crucial role in the problems that we will consider.
Real life situations, but also mathematical models, that involve time and uncertainty
are called stochastic processes. Examples are the weather, the Dow Jones index, queues, ...
The following queueing example displays a basic queue and its costs and rewards. The
questions raised can be answered with the theory provided by this course.

¹ This syllabus contains an appendix with definitions of mathematical notions and an index section
to find them.
Example 1 A call center has two servers. Callers who find both servers busy are put
in a waiting queue. The first come, first served policy is used to handle the services.
Each caller pays €a per minute while waiting and €b per minute during his own service.
Suppose that each idle server costs €c per hour and each busy server costs €d per hour.
- What would be the net rewards per hour in the long run?
- What will be the average waiting time per client?
Actually, we will see that a queue like in the previous example is a Markov chain. In
order to treat the topic of Markov chains (rather) rigorously, we will spend half of
the course on it. Markov chains can be categorized into two types: Section 2 involves
discrete time Markov chains and Section 3 treats continuous time Markov chains.
2 Discrete Time Markov Chains
Stochastic processes involve time and uncertainty. They describe dynamic systems. A
random variable X turns into a process when you flag it with a time moment. E.g.,
X = {X_0, X_1, X_2, X_3, ...} is a stochastic process, and Y = {Y(t) : t ≥ 0} is another one.
More generally, X_n need not be a real number, but can denote the state of a system. The
set of all possible states is called the state space of the process and is usually denoted by
I. E.g., if X_n describes the weather type on day n, then I might be I = {sunny, moderate,
freezing cold, stormy}. The interpretations of X_7 and Y(3.5) are the probabilistic values
of X and Y at times 7 and 3.5 respectively. Examples of stochastic processes are the
weather and the AEX index.
Stochastic processes have a time index set, denoted by T. There are two time index sets
used in this course:
- Discrete time index set: T = IN_0 = {0, 1, 2, 3, ...}.
  The variable n (∈ IN_0) is used for discrete time moments.
- Continuous time index set: T = [0, ∞).
  The variable t (∈ IR_+; the set of non-negative real numbers) is used for continuous
  time moments.
A sample or realization of a stochastic process is a function s : T → I, so for each time
moment, the function tells the state of the system at that moment. If T = IN_0, a sample
is thereby an infinite sequence of states.
In general, stochastic processes are extremely complex. In principle, a process is
described by its so-called fdds (finite dimensional distributions), i.e., all probabilities of
the form P[X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n] with t_1 < ··· < t_n ∈ T and (x_1, ..., x_n) ∈ IR^n. This is
of course in general not practical; too complex. Simplifying assumptions are useful, for
instance the i.i.d. assumption (see the Appendix):

  P[X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n]
    = P[X_{t_1} ≤ x_1] ··· P[X_{t_n} ≤ x_n]    (by independence)
    = F(x_1) ··· F(x_n)                        (because of identical distributions).

The describing probabilities have become a lot more manageable. Actually, all information
is given by a single distribution function F. A less rigorous assumption is the Markov
property.
Sloppily formulated, in the discrete time context: if the process is in some state i at
time n, then the probability that it moves to state j at time n + 1 depends only on the
current state i. Let us formalize this:
Definition 2.1 Let {X_n : n ∈ IN_0} be a discrete time stochastic process with state
space I. It has the Markov property if for all finite increasing sequences of time moments
t_1 < ··· < t_{n+1} and all (i_1, ..., i_{n+1}) ∈ I^{n+1}:

  P[X_{t_{n+1}} = i_{n+1} | X_{t_1} = i_1, ..., X_{t_n} = i_n] = P[X_{t_{n+1}} = i_{n+1} | X_{t_n} = i_n].   (2.1)

A process with the Markov property is called a Markov chain.
An interpretation can be:
- t_1, ..., t_{n−1}: (times in the) past,
- t_n: present,
- t_{n+1}: moment in the future.

Past and future are independent, given the present. Or: where you go to doesn't depend
on how you got here.
The one-step transition probabilities are given by P[X_{n+1} = j | X_n = i] for all n ∈ IN_0
and i, j ∈ I. If these do not depend on n, the Markov chain is said to be time-homogeneous;
i.e., for all n ∈ IN_0 we have

  p_ij := P[X_1 = j | X_0 = i] = P[X_{n+1} = j | X_n = i].   (2.2)

Of course, 0 ≤ p_ij ≤ 1 and Σ_{j∈I} p_ij = 1. They form the (one-step) transition probability
matrix P. In the case I = IN_0, it looks like
  P = [ p_00  p_01  p_02  ···
        p_10  p_11  p_12  ···
        p_20  p_21  p_22  ···
          ⋮     ⋮     ⋮    ⋱  ].

P is row-stochastic; each row-sum equals 1.
If one knows that the distribution of X_n equals π, the distribution of the system at the
next time moment is readily found by right multiplying π with P:

  P[X_{n+1} = j] = Σ_{i∈I} P[X_n = i] · P[X_{n+1} = j | X_n = i] = Σ_{i∈I} π_i p_ij = (πP)_j.   (2.3)

In principle, a time-homogeneous DTMC is determined by its initial distribution (the
distribution of X_0) and its transition probability matrix, i.e., all fdds can be found if
you know P and F_{X_0}.
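Equation (2.3) is directly computable. A minimal Python sketch (the two-state weather chain below is a hypothetical example of ours, not from the text) propagates a distribution through a few steps by repeated right multiplication with P:

    import numpy as np

    # Hypothetical two-state weather chain: state 0 = sunny, state 1 = rainy.
    P = np.array([[0.8, 0.2],
                  [0.5, 0.5]])        # row-stochastic one-step matrix
    pi = np.array([1.0, 0.0])         # initial distribution: start sunny

    for n in range(5):
        pi = pi @ P                   # Equation (2.3): next distribution is pi P
        print(n + 1, pi)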
2.1 Examples of Discrete Time Markov Chains
Example 2 Playing Roulette
A gambler starts with 20 fiches and constantly plays red at a roulette table. He bets
one fiche at a time and stops either when he is broke or owns 40 fiches. Intuitively,
this is a finite but unbounded process: the gambler will eventually either reach his goal
or end up broke, but there is no bound on the number of turns until this happens.
We have defined DTMCs on infinite time horizons. Still, this situation can be described
by a DTMC by a simple modeling trick. Define I := {0, ..., 40}, P[X_0 = 20] = 1, and

  p_ij = 1       if i = j = 0 or i = j = 40,
         19/37   if j = i − 1,
         18/37   if j = i + 1,
         0       otherwise.

States 0 and 40 are called absorbing. When a DTMC reaches an absorbing state, this
can be interpreted as the end of the process (at least, nothing interesting will happen
any further).
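A small simulation sketch (ours, added for illustration) of this chain estimates the probability that the gambler reaches 40 fiches before going broke:

    import random

    def play_once(start=20, goal=40, p_win=18/37):
        """Run the roulette chain until it hits an absorbing state (0 or goal)."""
        x = start
        while 0 < x < goal:
            x += 1 if random.random() < p_win else -1
        return x == goal

    wins = sum(play_once() for _ in range(100_000))
    print(wins / 100_000)   # estimated probability of ending with 40 fiches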
Many other games, like the game of checkers (dammen), are Markov chains. Checkers
has a huge state space: each possible configuration of men and kings (damstenen en
dammen), combined with the information of who is to move, defines a state.
Example 3 Simply counting can be considered to be a DTMC with X_1 = 1, X_2 = 2,
X_3 = 3, ... and an infinite state space I = IN.
Example 4 An insurance problem: bonus malus in automobile insurance
Suppose that you have your car insured by the following system. There are four possible
premium classes with respective premium payments of size C_i. C_1 is the highest
premium, C_4 the lowest. Each year without a claim leads to a shift to one premium class
higher next year (if not already in class 4). Each claim brings you back to class 1 next year.
If your car is damaged, should you pay for it yourself, or use your insurance?
Figure 1: A transition probability diagram
The following policy may be optimal. Determine four numbers a_1, ..., a_4, and make a
claim if the damage is more than €a_i while your current premium equals C_i.
Can we model this situation by means of a Markov chain? Since the chosen policy only
uses information of the current year (the current premium and damage), the Markov
property is valid. Let D^i_n be the event that in year n, you face at least once some damage
to your car that exceeds a_i. Assume that P[D^i_n] = P[D^i_m] and D^i_m ⊥ D^i_n for all m ≠ n.
Let π_i := P[D^i_1].
First define the state space: I = {1, 2, 3, 4}. The name of the state indicates the level
of premium you have to pay in a typical year. It might give insight to draw a (one-step)
transition probability diagram. In such a diagram there is a circle for each state
and an arrow from one state to another (or itself) if the system can move from the first
to the latter in one unit of time. It is customary to put the name of the state inside its
circle and to flag an arrow with the probability that it is used, given that the system is
in the state at the tail of the arrow. Figure 1 displays the diagram corresponding to this
Markov chain. It stores the same information as the transition probability matrix P:
  P = [ π_1  1−π_1    0      0
        π_2    0    1−π_2    0
        π_3    0      0    1−π_3
        π_4    0      0    1−π_4 ].
Example 5 The s-S-stock inventory system
Suppose some shop sells some product as a regular stock item. The demand for it is
stable over time. The demands in the successive weeks are independent of each other.
Backlogging is not allowed: an empty stock means demand lost and unhappy customers.
The owner of the shop uses the so-called periodic review s-S control rule for stock
replenishment of the item. Here s and S are fixed natural numbers with 0 ≤ s < S. On
Monday mornings, stock is inspected.
Policy: when at inspection it turns out that the level of stock has decreased below the
critical level s, a replenishment order of size S minus the current level of stock is placed.
Before the store opens, the stock is back to maximal volume S. Typical questions are:
- What is the average ordering frequency?
- What is the average amount of demand that is lost per week?
These questions can be answered by the theory of Markov chains. Let D_1, D_2, ... be
an i.i.d. sequence of integer valued random variables; D_n denotes the number of items
requested in week n. Denote φ_j = P[D_n = j] for all n, j ∈ IN_0, so φ is the common
pmf for the demand of any week. Time is discrete and expressed in weeks: t ∈ T =
{0, 1, 2, ...}. Assume that at time t = 0 the stock equals S. Let X_n denote the stock at
the beginning of week n, just prior to review.
We get

  X_{n+1} = max(X_n − D_n, 0)   if X_n ≥ s,
            max(S − D_n, 0)     if X_n < s.

X_1, ..., X_{n−1} only implicitly (i.e., via X_n) influence X_{n+1}. In other words, the process
{X_n : n ∈ IN} possesses the Markov property. It is a discrete time-homogeneous Markov
chain with state space I = {0, 1, ..., S}. How can we find the transition probabilities p_ij?
If i ≥ s:  p_i0 = P[X_1 = 0 | X_0 = i] = P[D_1 ≥ i] = Σ_{k=i}^∞ φ_k,
           p_ij = P[D_1 = i − j] = φ_{i−j}   (1 ≤ j ≤ i),
           p_ij = 0   if j > i.
If i < s:  p_i0 = P[D_1 ≥ S] = Σ_{k=S}^∞ φ_k,
           p_ij = P[D_1 = S − j] = φ_{S−j}   if 0 < j ≤ S.
The transition probability matrix thus has two kinds of rows:

  row i (i < s):  ( Σ_{k=S}^∞ φ_k,  φ_{S−1},  φ_{S−2},  ...,  φ_1,  φ_0 ),
  row i (i ≥ s):  ( Σ_{k=i}^∞ φ_k,  φ_{i−1},  ...,  φ_0,  0,  ...,  0 ).

All rows with i < s coincide (after a replenishment the week effectively starts at level S),
and each row with i ≥ s ends in S − i zeros, since the stock cannot increase without a
replenishment.
2.2 Multiple Step Transition Probabilities
If one knows the one-step probabilities of a time-homogeneous DTMC, it should in
principle be possible to find out the chances to move in n steps from one state to another.
The following result shows that these probabilities are, given P, easy to compute (at
least for a computer). Let p^(n)_ij := P[X_n = j | X_0 = i]. We have p^(1)_ij = p_ij and, by
convention,

  p^(0)_ij = 1   if i = j,
             0   if i ≠ j.
Theorem 2.2  p^(n+m)_ij = Σ_{a∈I} p^(n)_ia p^(m)_aj, or, in matrix notation, P^(n+m) = P^(n) P^(m).

In particular, the n-step transition matrix P^(n) equals the n-th power of P, which is
denoted by P^n. Therefore we might as well leave out the brackets again and denote the
n-step transition probability matrix by P^n. Note that when we would denote p^(n)_ij simply
by p^n_ij this would cause ambiguity (it would read as the n-th power of the number p_ij),
so for the entries of P^n we keep the notation p^(n)_ij. To prove this theorem, condition
upon X_n, given X_0 = i:
Proof
  p^(n+m)_ij = P[X_{n+m} = j | X_0 = i]
    = Σ_{a∈I} P[X_{n+m} = j | X_n = a, X_0 = i] · P[X_n = a | X_0 = i]
    = Σ_{a∈I} P[X_{n+m} = j | X_n = a] · P[X_n = a | X_0 = i]
    = Σ_{a∈I} P[X_m = j | X_0 = a] · P[X_n = a | X_0 = i]
    = Σ_{a∈I} p^(m)_aj p^(n)_ia.

The second equality is nothing but the law of total probability. The third is due to the
Markov property. The fourth is valid because of time homogeneity.  □
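As the theorem suggests, a computer obtains n-step probabilities simply as matrix powers. A minimal sketch (reusing the hypothetical weather chain from the earlier sketch):

    import numpy as np

    P = np.array([[0.8, 0.2],
                  [0.5, 0.5]])
    # P^(n) = P^n: the n-step transition probabilities are a matrix power
    P10 = np.linalg.matrix_power(P, 10)
    print(P10[0, 1])   # p^(10)_01: probability of being in state 1 after 10 steps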
2.3 Mean First Passage Times
Can we find the expected time until a DTMC jumps to some specified state j? Let

  τ_j := min{n ≥ 1 : X_n = j}.

τ_j is called the first passage time of state j. We have

  τ_j ∈ IN   if the chain ever visits state j after n = 0,
  τ_j = ∞    else.

The question can be answered by computing, for a given state j and each state i, the
conditional mean first passage times μ_ij := E[τ_j | X_0 = i] (we call μ_jj the mean return
time of state j). This will be done by applying the law of total expectation, telling that

  E[τ_j] = Σ_{i∈I} μ_ij P[X_0 = i].

So, all we have to find out are, for a given state j, the values of μ_ij (i ∈ I).
Condition upon X_1:

  μ_ij = Σ_{a∈I} P[X_1 = a | X_0 = i] E[τ_j | X_0 = i, X_1 = a]
       = P[X_1 = j | X_0 = i] E[τ_j | X_0 = i, X_1 = j]
         + Σ_{a≠j} P[X_1 = a | X_0 = i] E[τ_j | X_0 = i, X_1 = a]
       = p_ij · 1 + Σ_{a≠j} p_ia E[τ_j | X_1 = a]
       = p_ij · 1 + Σ_{a≠j} p_ia (1 + E[τ_j | X_0 = a])
       = 1 + Σ_{a≠j} p_ia E[τ_j | X_0 = a].

The second equality is due to the Markov property. The reason that E[τ_j | X_1 = a] =
1 + E[τ_j | X_0 = a] for a ≠ j is as follows: it takes 1 time unit to jump to a, and when
in a at time moment 1, it takes another E[τ_j | X_0 = a] time units to jump to j for the
first time. This proves the following theorem and corollary:
Theorem 2.3 Let {X_n : n ∈ IN_0} be a DTMC with state space I. Let for all i, j ∈ I,
μ_ij be defined by μ_ij = E[min{n ∈ IN : X_n = j} | X_0 = i]. Then for all i, j ∈ I we have

  μ_ij = 1 + Σ_{a≠j} p_ia μ_aj.   (2.4)

For a given state j, and i running through the state space, we have |I| unknowns and the
same number of equations (one for each state i). Often, the set of equations has a unique
solution, e.g., if I is finite and τ_j < ∞ with probability 1 for all initial distributions.
Why will be explained later on.
Corollary 2.4 Let j ∈ I and define τ_j to be the first time moment at which the system
jumps to state j. Then E[τ_j] = Σ_{i∈I} μ_ij P[X_0 = i], which can be computed if the
system of equations (2.4) can be solved.
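For a finite chain, (2.4) is a linear system that can be solved directly. A sketch (assuming state j is reached with probability 1 from every state, so the system has a unique solution):

    import numpy as np

    def mean_first_passage(P, j):
        """Solve (2.4): mu_ij = 1 + sum_{a != j} p_ia * mu_aj, for a fixed target j."""
        n = P.shape[0]
        others = [i for i in range(n) if i != j]
        Q = P[np.ix_(others, others)]                            # p_ia for i, a != j
        v = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))   # mu_ij for i != j
        mu_jj = 1 + P[j, others] @ v                             # mean return time of j
        return v, mu_jj

    # Example: the hypothetical two-state chain used earlier
    P = np.array([[0.8, 0.2], [0.5, 0.5]])
    print(mean_first_passage(P, 1))   # mu_01 = 5, mu_11 = 3.5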
2.4 The Long Term Behavior of a DTMC
In general, if we would like to predict the state of a DTMC in the next time unit, it is
important to know the current state. However, if you would like to give probabilities of
a state of the system in the far future, the current state might not be relevant at all.
E.g., if we think of the s-S-inventory (Example 5), the stock size at week 52 hardly depends
on the stock size in week 1. On the other hand, if the gambler in Example 2 had
started with 1 fiche, his chances of ending up winning would be much lower than when
he had started with 39 fiches. Furthermore, if he starts with an even number of
fiches, he will continue to have an even number of fiches at every even time moment. So in
two ways, the initial state does give information about the far future. In this subsection,
we will get more insight into these phenomena. First, let us discuss the long term rate
(frequency) of visits of a specific state j.
Assume for the time being that X_0 = i for some i ∈ I.
- The expected number of visits of state j at time 1 equals P[X_1 = j] = p_ij.
- The expected number of visits of state j at time 2 equals P[X_2 = j] = p^(2)_ij.
  ⋮
- The expected number of visits of state j at time n equals P[X_n = j] = p^(n)_ij.

By aggregating, we find that the expected number of visits of state j in period {1, ..., n}
equals Σ_{k=1}^n p^(k)_ij, and the expected average number of visits of state j in period
{1, ..., n} equals (1/n) Σ_{k=1}^n p^(k)_ij. In the long run this leads to a frequency of visits
of state j of

  lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij.
Without a proof we postulate that these limits always exist: for every j ∈ I, the long
run frequency of visits of state j is well defined. Furthermore, for states that have a
finite expected first passage time, these limits have an intuitive value:

Theorem 2.5 Let j be a state of a DTMC with E[τ_j] < ∞. Then the long run rate of
visits of state j equals

  lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij = 1/μ_jj.   (2.5)

Note that even if E[τ_j] < ∞, it might be the case that μ_jj = ∞ (can you provide an
example?). In that case 1/μ_jj should be interpreted as 0. If we drop the assumption that
X_0 = i, Expression (2.5) turns into
  lim_{n→∞} (1/n) Σ_{k=1}^n Σ_{i∈I} p^(k)_ij P[X_0 = i] = 1/μ_jj.

The intuition behind this is straightforward: if a bus station is visited by a bus about
every half hour, the long run frequency of visits equals 2 buses per hour. Likewise, if
it takes μ_jj time units on average between two visits of state j, the chain will visit this
state 1/μ_jj times per time unit.
One might expect that the probabilities of being in a state will converge in the long run
to these frequencies. If this is the case, we say that the system reaches equilibrium. The
corresponding distributions are important enough to be named:

Definition 2.6 A probability distribution π = {π_i : i ∈ I} is called an equilibrium or
stationary distribution for the DTMC {X_n : n ∈ IN} if

  π_j = Σ_{k∈I} π_k p_kj, or, equivalently, π = πP.

Such a π is called stationary since if X_0 ∼ π, then X_1 ∼ π, and, by induction, X_n ∼ π
for all n ∈ IN (cf. Equation (2.3)):

  P[X_1 = j] = (πP)_j = π_j.

So, if a chain starts in equilibrium, it remains in equilibrium forever. Many chains,
however, converge to equilibrium regardless of their initial distributions. Can we find
sufficient conditions that guarantee existence of, or even convergence to, equilibrium? For
the existence, it suffices to find a so-called regenerative state. A state r is called regenerative
if μ_ir < ∞ for all i. So regardless of the initial state of the system, state r will be visited
in finite time.
Theorem 2.7 If the DTMC {X_n : n ∈ IN_0} has a regenerative state, then it has a unique
equilibrium distribution π = {π_j}_{j∈I}, and for any i ∈ I:

  π_j = lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij.

Instead of computing all these limits, it is far easier to solve the system π = πP,
Σ_{i∈I} π_i = 1. We will provide the proof for chains with a finite state space, but it is
facultative.
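Solving π = πP together with Σ_{i∈I} π_i = 1 is a small linear-algebra exercise; a sketch:

    import numpy as np

    def equilibrium(P):
        """Solve pi = pi P together with sum(pi) = 1 (finite state space)."""
        n = P.shape[0]
        # (P^T - I) pi^T = 0, stacked with the normalization row of ones
        A = np.vstack([P.T - np.eye(n), np.ones(n)])
        b = np.append(np.zeros(n), 1.0)
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pi

    P = np.array([[0.8, 0.2], [0.5, 0.5]])
    print(equilibrium(P))   # for this chain: [5/7, 2/7]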
Proof (facultative) First assume that X_0 = r. Define

  π_j = lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_rj.

So π_j is the long run frequency of being in state j. The long run frequency of being
somewhere is 1, so Σ_{j∈I} π_j = 1, i.e., π is a distribution.²

We have

  (πP)_j = Σ_{ℓ∈I} ( lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_rℓ ) p_ℓj
         = lim_{n→∞} (1/n) Σ_{k=1}^n Σ_{ℓ∈I} p^(k)_rℓ p_ℓj
         = lim_{n→∞} (1/n) Σ_{k=1}^n p^(k+1)_rj
         = lim_{n→∞} (1/n) Σ_{k=2}^{n+1} p^(k)_rj
         = π_j.

² Here we apply that |I| is finite(!)
This makes π an equilibrium distribution. Now, drop the assumption that X_0 = r. Let
i ∈ I be arbitrarily chosen and assume that X_0 = i. Since μ_ir < ∞, we know that, in
finite time, state r is visited, say after T time units. Long run frequencies do not change
if we do not take the first T time units into account:
  lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij
    = lim_{n→∞} (1/n) ( Σ_{k=1}^T p^(k)_ij + Σ_{k=T+1}^n p^(k−T)_rj )
    = lim_{n→∞} (1/n) Σ_{k=1}^T p^(k)_ij + lim_{n→∞} (1/n) Σ_{k=T+1}^n p^(k−T)_rj
    = 0 + lim_{n→∞} (1/n) Σ_{k=1}^{n−T} p^(k)_rj
    = lim_{m→∞} (1/(m+T)) Σ_{k=1}^m p^(k)_rj
    = lim_{m→∞} (m/(m+T)) · (1/m) Σ_{k=1}^m p^(k)_rj
    = lim_{m→∞} (1/m) Σ_{k=1}^m p^(k)_rj
    = π_j.
So, regardless of which state is the initial one, the long term frequencies form the
equilibrium distribution π. Why are there not more equilibrium distributions? Suppose y
is a stationary distribution. We have

  y = yP,  y = yP²,  ...,  y = yP^n   ⟹   y_j = (1/n) Σ_{k=1}^n (yP^k)_j for all j ∈ I.

Finally,

  y_j = (1/n) Σ_{k=1}^n Σ_{i∈I} y_i p^(k)_ij = Σ_{i∈I} y_i (1/n) Σ_{k=1}^n p^(k)_ij → Σ_{i∈I} y_i π_j = π_j.   □
To apply the previous theorem one needs to find a regenerative state. For this we need
some notions concerning communication. This will be the topic of the next subsection.
2.5 Communication
Definition 2.8 State j is accessible from state i if p^(n)_ij > 0 for some n ≥ 0. We denote
this by i → j. If two states are accessible from each other, we say that they communicate;
notation: i ↔ j.

Properties of the relation ↔ are:
1. reflexivity: i ↔ i for all i ∈ I (take n = 0: p^(0)_ii = 1 > 0),
2. symmetry: i ↔ j implies j ↔ i,
3. transitivity: i ↔ j and j ↔ k imply i ↔ k.

These three properties make communication an equivalence relation, i.e., communication
divides I into disjoint equivalence classes. A Markov chain is called irreducible if all states
communicate.
Example 6 Let I = {0, 1, 2, 3},

  P = [ 1/2  1/2   0    0
        1/2  1/2   0    0
        1/4  1/4  1/4  1/4
         0    0    0    1  ].

P alone does not define a Markov chain; an initial distribution, i.e., the distribution of
X_0, is needed as well. However, the state classification does not depend on the initial
distribution. Draw yourself a transition probability diagram. We have p_33 = 1, so 3 is
an absorbing state. The communication partition is

  { {0, 1}, {2}, {3} }.
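Communication classes can also be computed mechanically from P. A sketch (ours) using reachability: j is accessible from i exactly when j can be reached in the directed graph with an edge wherever p_ij > 0:

    import numpy as np

    def communication_classes(P):
        """Partition the states of P into communication classes."""
        n = P.shape[0]
        R = (((P > 0) + np.eye(n)) > 0).astype(int)   # one-step reachability, n = 0 included
        for _ in range(n):
            R = ((R + R @ R) > 0).astype(int)         # transitive closure
        classes = []
        for i in range(n):
            cls = frozenset(j for j in range(n) if R[i, j] and R[j, i])
            if cls not in classes:
                classes.append(cls)
        return classes

    P = np.array([[.5, .5, 0, 0], [.5, .5, 0, 0],
                  [.25, .25, .25, .25], [0, 0, 0, 1]])
    print(communication_classes(P))   # [{0, 1}, {2}, {3}]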
Definition 2.9 A set of states C ⊆ I is called closed if

  X_n ∈ C  ⟹  X_{n+1} ∈ C.

A set is minimal closed if it does not contain a proper closed subset.
I itself is by this definition always closed. In the previous example, the closed sets are
I, {0, 1, 3}, {0, 1} and {3}. Without a proof we postulate:
Theorem 2.10 Let {X_n : n ∈ IN_0} be a DTMC with finite state space I. Then there
exists a regenerative state if and only if the chain has a unique minimal closed set.

Corollary 2.11 A DTMC with a finite state space has a unique equilibrium distribution
if and only if it has a unique minimal closed set.
The next question that arises is: will this equilibrium always be reached, i.e., is it the
case that lim_{n→∞} P[X_n = j | X_0 = i] = π_j for all initial states i? A counterexample is
readily found:

Example 7 Take I = {0, 1},

  P = [ 0  1
        1  0 ],

and X_0 = 0. Both states are regenerative, and the equilibrium distribution is given by
π_0 = π_1 = 1/2. Nevertheless, lim_{n→∞} P[X_n = 0] does not exist since P[X_n = 0] keeps
on toggling between 0 and 1.
To ensure convergence one other notion is needed: periodicity.

Definition 2.12 Let i be a state with μ_ii < ∞. The period of state i is said to be d if
d is the greatest common divisor³ of the indices n ≥ 1 for which p^(n)_ii > 0. A state i with
period d = 1 is said to be aperiodic.

³ Let A ⊆ IN. The greatest common divisor, or gcd for short, is the largest natural number such that
all members of A are divisible by this number: gcd(A) = max{k ∈ IN : a/k ∈ IN for all a ∈ A}.
Theorem 2.13 Let {X_n : n ∈ IN_0} be a DTMC with a finite state space and an aperiodic
regenerative state. Then in the long run the chain reaches its unique equilibrium π, i.e.,

  lim_{n→∞} P[X_n = j | X_0 = i] = π_j   for all i, j ∈ I.
Summarizing, for each DTMC we have:
- For every i, j ∈ I, the long run frequency of visits of state j given that X_0 = i is
  well defined. If E[τ_j] < ∞, then lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij = 1/μ_jj.
- If the process possesses a regenerative state, these limits do not depend on the initial
  state and we can define π_j = lim_{n→∞} (1/n) Σ_{k=1}^n p^(k)_ij. It holds that π = πP.
- If moreover the state space is finite, then Σ_{j∈I} π_j = 1 and π is the unique equilibrium
  distribution.
- If moreover the regenerative state is aperiodic, the process converges to equilibrium,
  i.e., lim_{n→∞} P[X_n = j | X_0 = i] = π_j for all i, j ∈ I.
2.6 Cost-Reward Structures on Finite Irreducible DTMCs
Let f : I → IR be a reward function, i.e., every time state i is visited, a reward of
f(i) is gained. If the chain is finite and irreducible, then we can apply the following
intuitive theorem.

Theorem 2.14 Let {X_k : k ∈ IN_0} be an irreducible DTMC with finite state space I
and let f : I → IR be a cost (reward) structure. Let π be the unique equilibrium
distribution of the Markov chain (cf. Theorem 2.7). Then

  lim_{n→∞} (1/n) Σ_{k=1}^n f(X_k) = Σ_{j∈I} π_j f(j).   (2.6)

The left hand side of Equation (2.6) concerns long run average costs, the right hand
side concerns the mean costs per time unit at equilibrium. Note that even if the chain
does not converge to equilibrium (because of periodicity), the theorem is still valid.
Often costs (rewards) r(i → j) are incurred at transitions rather than at states. If so, is it
still possible to apply the theorem above? We can perform the following trick: consider
the coming transition costs as expected costs of the current state.

Example 8 A taxi rides between three cities 0, 1 and 2, so I = {0, 1, 2}. After each
ride, it waits in the city where it brings its client until a new client asks for a ride.
Let X_0 be the city where the taxi starts its working day and let X_n denote the city to
which the taxi brings its n-th client. Suppose the (mean) waiting costs depend on the
city where the taxi is situated and are expressed by c(i) (i ∈ I). The rewards per
ride depend on both departure and destination and are called r(i → j). We can define
a (net) reward structure by

  f(i) = −c(i) + E[reward next ride] = −c(i) + Σ_{j∈I} p_ij r(i → j).

Theorem 2.14 can be applied to find that the average reward per ride will be

  Σ_{j∈{0,1,2}} π_j f(j).
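A numeric sketch of Example 8 (the transition matrix, waiting costs and ride rewards below are invented purely for illustration):

    import numpy as np

    P = np.array([[0.0, 0.6, 0.4],
                  [0.5, 0.0, 0.5],
                  [0.3, 0.7, 0.0]])            # hypothetical ride destinations
    c = np.array([2.0, 1.0, 3.0])              # mean waiting cost per city
    r = np.array([[0, 8, 12],
                  [8, 0, 9],
                  [12, 9, 0]], dtype=float)    # reward r(i -> j) per ride

    f = -c + (P * r).sum(axis=1)               # f(i) = -c(i) + sum_j p_ij r(i -> j)

    # equilibrium distribution: solve pi = pi P with sum(pi) = 1
    n = len(c)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    pi, *_ = np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)

    print(pi @ f)                              # long run average net reward per ride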
3 Continuous Time Processes
Like discrete time stochastic processes, continuous time processes have a state space,
denoted by I. The time index set of a continuous time process is an interval, usually
T = [0, ∞). Where a discrete time process can jump to another state only at integer
time moments, a continuous time process X = {X(t) : t ≥ 0} can jump to another state
at any moment in T. In this course, we will only consider processes that switch only
finitely many times in finite segments of time. Therefore, a sample of such a process is
a so-called step function. To visualize such a function, put time on the x-axis and state
space I on the y-axis. The graph will be horizontal everywhere but on a sequence of time
moments. A continuous time process starts at time 0 in some state i, i.e., X(0) = i,
stays there for a while, and then jumps after a random while T_1 to another state j, so
j ≠ i. Then it stays at j for a random period of length T_2 before it makes its second
transition (jump), and so on. Just as in the discrete time case, we will focus on processes
with the Markov property.
3.1 The Markov Property for Continuous Time Processes
The Markov property can be copied from the discrete time context, except that the time
index is flagged differently (X(t) instead of X_n):

Definition 3.1 A stochastic process {X(t) : t ∈ [0, ∞)} with state space I possesses
the Markov property if for all increasing sequences of time moments t_1 < ··· < t_{n+1} and
i_1, ..., i_{n+1} in I we have

  P[X(t_{n+1}) = i_{n+1} | X(t_1) = i_1, ..., X(t_n) = i_n] = P[X(t_{n+1}) = i_{n+1} | X(t_n) = i_n].   (3.1)

Such a process is called a Continuous Time Markov Chain, or CTMC for short. The
interpretation of the Markov property remains the same. Given the present, the history
of a Markov process does not influence its future, or, cryptically,

  [X(future) | X(present)] ⊥ X(past).
A CTMC is called time-homogeneous if for all t, u ≥ 0 and i, j ∈ I,

  P[X(t + u) = j | X(u) = i] = P[X(t) = j | X(0) = i].   (3.2)

In this course, time-homogeneity is assumed.
Suppose we know the state in which the process starts, let us say X(0) = i. Let T_1 be
the time it takes until the first jump (transition), so X(u) = i for all u ∈ [0, T_1) and
X(T_1) ≠ i. It is convenient and customary to assume that T_1 is a proper random variable
in the sense that P[T_1 = ∞] = 0. We have

  P[T_1 > t + s | T_1 > s, X(0) = i]
    = P[X(u) = i ∀u ∈ [s, t+s] | X(u) = i ∀u ∈ [0, s]]
    = P[X(u) = i ∀u ∈ [s, t+s] | X(s) = i]
    = P[X(u) = i ∀u ∈ [0, t] | X(0) = i]
    = P[T_1 > t | X(0) = i].   (3.3)

The second equality is due to the Markov property and the third one uses time
homogeneity. We conclude that, given the initial state, the time it takes until the first
transition has the memoryless property! So, there is a thin line between the Markov
property and the memoryless property. The first class of CTMCs we will discuss is
the class of Poisson processes:
3.2 Poisson and Other Counting Processes
A counting process is an idealized stochastic model to count events that occur randomly
in time (generically called renewals or arrivals). Counting processes are used as a
foundation for building more realistic models. They are often found embedded in other
stochastic processes, e.g., to describe incoming calls at a service center.

Definition 3.2 A counting process N = {N(t) : t ≥ 0} is a stochastic process with
state space I = {0, 1, 2, 3, ...} such that there exists a series of nonnegative random
variables T_1, T_2, T_3, ... with
- N(t) = 0 for all t ∈ [0, T_1),
- N(t) = 1 for all t ∈ [T_1, T_1 + T_2),
  ⋮
- N(t) = n for all t with T_1 + ··· + T_n ≤ t < T_1 + ··· + T_{n+1}.

Let S_k := T_1 + ··· + T_k, so S_k denotes the time moment at which the k-th event takes
place. N(t) denotes the number of events until time t, i.e.,

  N(t) := max{k ∈ IN | S_k ≤ t}.

{N(t) : t ≥ 0}, or simply N, is called the counting process with interarrival times
T_1, T_2, ... Typical events that are counted in this course are
- customers entering a shop,
- people joining a queue,
- internet users visiting a web page,
- job completions of a server,
- machine breakdowns; replacements.
In many models it is assumed that the times T_n between the successive arrivals are
independent and identically distributed. If so, we also speak of a renewal process. If,
moreover, it is assumed that the interarrival times are exponentially distributed, we
speak of a Poisson process.

Definition 3.3 A counting process {N(t) : t ≥ 0} with interarrival times T_1, T_2, ...
is called a Poisson process (shortly PP or PP(λ)) if the interarrival times form an i.i.d.
sequence and T_i ∼ exp(λ) for some λ > 0 and all i ∈ IN.
The parameter λ is the frequency with which the to-be-counted events occur. It is
called the arrival rate, failure rate, or service rate, depending on what is counted. The
interpretation of λ is as follows: it takes on average 1/λ time units between two events
that count, so λ events per time unit are to be expected. So, the main property of a
Poisson process is that the events occur at a constant rate λ.
Why does a Poisson process have the Markov property? It is difficult to give a formal proof,
but an intuitive argument is readily given. Suppose you have been monitoring a Poisson
process up to time moment t. You have stored every time moment at which the events
occur in the list (S_1, ..., S_{N(t)}). What does this tell about the forthcoming part of the
process? Because of the constant arrival rate, the waiting time until the next arrival
is exp(λ) distributed; you did not need to observe the process to know that. The only
relevant information that the current sequence (S_1, ..., S_{N(t)}) gives is the number of
arrivals so far, i.e., N(t). Therefore the present situation (N(t)) gives the same relevant
information about the future of the process as the complete history {N(s) : s ∈ [0, t]}.
The random variable S_n, being the sum of n independent exp(λ) distributed random
variables, has a so-called Erlang distribution; see e.g. http://en.wikipedia.org/wiki/Erlang_distribution.
Its distribution function is given by

  P[S_n ≤ t] = F_{S_n}(t) = 1 − Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!.   (3.4)
This enables us to find the distribution of N(t), the number of arrivals up to time t:

Proposition 3.4 Let N = {N(t) : t ≥ 0} be a Poisson process with rate λ. Then for
each t > 0, N(t) has the Poisson(λt) distribution, i.e.,

  P[N(t) = n] = e^{−λt} (λt)^n / n!.   (3.5)

Proof The event {N(t) ≥ n} can also be described by {S_n ≤ t}. Hence,

  P[N(t) ≥ n] = P[S_n ≤ t] = 1 − Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!.

Therefore,

  P[N(t) = n] = P[N(t) ≥ n] − P[N(t) ≥ n + 1]
    = ( 1 − Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k! ) − ( 1 − Σ_{k=0}^{n} e^{−λt} (λt)^k / k! )
    = e^{−λt} (λt)^n / n!.   □
As a consequence we nd that
E[N(t)] =

n=1
n e
t
(t)
n
n!
= e
t
t

n=1
(t)
n1
(n 1)!
= e
t
t

k=0
(t)
k
k!
= e
t
t e
t
= t.
(3.6)
This makes sense; if on average events per time unit take place, then after t time units
you can expect t events so far.
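Both Proposition 3.4 and (3.6) are easy to check empirically. A simulation sketch (ours; λ and t are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    lam, t, runs = 0.7, 10.0, 50_000

    def count_arrivals():
        """Sample N(t): add exponential interarrival times until they pass t."""
        s, n = 0.0, 0
        while True:
            s += rng.exponential(1 / lam)   # next interarrival time ~ exp(lam)
            if s > t:
                return n
            n += 1

    counts = np.array([count_arrivals() for _ in range(runs)])
    print(counts.mean(), lam * t)           # should be close: E[N(t)] = lam * t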
Example 9 Taxis wait for passengers at Central Station in Amsterdam. A so-called
trein-taxi is a cheap alternative for a regular taxi: several passengers share one taxi.
Suppose that passengers arrive according to a PP(λ) process with λ = 0.4, the rate of
passengers arriving at the taxi stand per minute. A taxi departs if either 4 passengers
have been collected, or the first passenger has waited for 10 minutes.
a. What is the probability that the first passenger has to wait for 10 minutes?
b. What is the probability that the first passenger has to wait for 10 minutes, given
   that a third passenger enters exactly 5 minutes after the first?

a. Set t = 0 as the moment that the first passenger enters the taxi. He has to wait
   for 10 minutes if fewer than 3 other passengers arrive at the scene. In terms of a
   Poisson(λ)-process, this is the event {N(10) < 3}. The random variable N(10) has
   a Poisson(10λ)-distribution, so the answer to question a equals

     P[N(10) < 3] = Σ_{n=0}^{2} e^{−10λ} (10λ)^n / n!.

b. Set t = 0 as the moment that the third passenger enters the taxi. Now, the first
   passenger has to wait for 5 more minutes if no passenger shows up within 5 minutes.
   This is the event {N(5) = 0}. The answer to question b equals P[N(5) = 0] = e^{−5λ}.
The remainder of this section will be spent on properties of Poisson processes. The first
four characterize Poisson processes.

(A) A Poisson process has independent increments:
    if [a, b] and [c, d] are disjoint time segments, then N(b) − N(a) ⊥ N(d) − N(c).

(B) A Poisson process has stationary increments, i.e., the distribution of the number
    of arrivals within a time segment depends on the length of the segment, but not on
    its starting time. In formula,

      for all t ≥ 0: N(b) − N(a) ∼ N(b + t) − N(a + t).

    As a consequence, for all s ≥ 0, {N(t + s) − N(s) : t ≥ 0} is another Poisson process.

(C) If Δt > 0 is a small increase in time, then the probability of one arrival in an
    interval [a, a + Δt] is about λΔt. Formally,

      P[N(a + Δt) − N(a) = 1] = λΔt + o(Δt),

    in which o(Δt) denotes an unspecified expression (called the error term) with the
    property that lim_{Δt↓0} o(Δt)/Δt = 0.
The proof of property (C) uses the standard summation e^x = Σ_{k=0}^∞ x^k / k!. We have

  e^{−λΔt} = 1/0! − (λΔt)/1! + (λΔt)²/2! − ···   (3.7)

Hence,

  P[N(a + Δt) − N(a) = 1] = P[N(Δt) = 1]
    = e^{−λΔt} λΔt
    = λΔt [1 − λΔt + ½(λΔt)² − ···]
    = λΔt + o(Δt).

In this elaboration the first equality is due to Property (B), the second uses Proposition
3.4, and in the final one all negligible terms (the terms containing a factor (Δt)²)
are put in the error term o(Δt).
(D) For small Δt > 0, the probability of two or more arrivals in time segment [a, a + Δt]
    is negligible. Formally,

      P[N(a + Δt) − N(a) ≥ 2] = o(Δt).

The proof of property (D) is facultative:

  P[N(Δt) ≥ 2] = 1 − P[N(Δt) = 0] − P[N(Δt) = 1]
    = 1 − e^{−λΔt} − e^{−λΔt} λΔt
    = 1 − (1 + λΔt) e^{−λΔt}
    = 1 − (1 + λΔt)(1 − λΔt + o(Δt))
    = λ²(Δt)² + o(Δt)
    = o(Δt).

The fourth equality is valid because of Lemma A.5.
Without a proof, we state that the four properties above characterize a Poisson process:

Theorem 3.5 Any counting process that obeys properties (A)–(D) is a Poisson process.

Corollary In order to show that some counting process is a Poisson process, just
check properties (A)–(D).
Suppose traffic is analyzed in order to be able to decrease the congestion at some road.
The number of vehicles passing a certain road crossing is counted, which turns out to
produce a Poisson(λ) process {N_v(t) : t ≥ 0}. After a while, one realizes that it is
relevant to count cars and motorcycles separately. Luckily the person who has stored
the arrival moments so far has also written down the type of each vehicle. This leads to
two new counting processes {N_c(t) : t ≥ 0} and {N_m(t) : t ≥ 0} respectively, obeying
N_v = N_c + N_m. Under which condition will these new processes be Poisson processes as
well, and if so, with what rates? The next theorem (the proof is skipped) answers this
question. A lot of independence has to be assumed, but nothing further.
Theorem 3.6 [Splitting Theorem for Poisson Processes]
Let {N(t) : t ≥ 0} be PP(λ). Suppose there are two types of arrivals, 1 and 2. Suppose,
independent of everything, an arrival is of type i with probability p_i ∈ (0, 1). Let
{N_i(t) : t ≥ 0} count the arrivals of type i. Then N_1 and N_2 are two independent
Poisson processes with arrival rates λ_1 := p_1 λ and λ_2 := p_2 λ.
A similar result works the other way round. Given that the processes N_c and N_m are
independent Poisson processes, the counting of vehicles N_v := N_c + N_m is a Poisson
process as well:

Theorem 3.7 [Merging Theorem for Poisson Processes]
Let {N_1(t) : t ≥ 0} and {N_2(t) : t ≥ 0} be independent Poisson processes with arrival
rates λ_1 and λ_2. Then {N(t) : t ≥ 0} defined by N(t) := N_1(t) + N_2(t) for all t is a
Poisson process with arrival rate λ_1 + λ_2.
Let I_k be the type of the k-th arrival, and X_k the k-th interarrival time of the merged
process N. Then

  P[I_k = i] = P[I_k = i | X_k = t] = λ_i / (λ_1 + λ_2)   (so I_k ⊥ X_k).
The proof of this theorem is beyond the scope of this course as well, but the following
convenient corollary can be shown relatively easily.

Corollary 3.8 Let X ∼ Poisson(λ_1), Y ∼ Poisson(λ_2), X ⊥ Y. Then X + Y ∼
Poisson(λ_1 + λ_2).

Proof Let N_i be a PP(λ_i) process (i ∈ {1, 2}) with N_1 ⊥ N_2. Then X ∼ N_1(1)
and Y ∼ N_2(1). Hence, X + Y ∼ [N_1 + N_2](1). By the merging theorem, N_1 + N_2 is a
PP(λ_1 + λ_2) process. So, [N_1 + N_2](1) ∼ Poisson(λ_1 + λ_2).  □
Finally, we will discuss a relation between Poisson processes and the uniform distribution.
It is about the conditional distribution of an arrival. Suppose it is known that in a given
time segment [0, t] exactly one arrival has taken place, i.e., N(t) = 1. What is the
(conditional) distribution of the time of arrival of this customer? Or, suppose you know
that a soccer game ended in 1-0; in which minute has the goal been scored? In formula,
find

  P[S_1 ≤ x | N(t) = 1].

One might have the intuition that time segments of equal length have equal probabilities
to contain the arrival, so the uniform distribution sounds reasonable. Let U ∼ U[0, t] be
a random variable with the uniform distribution on [0, t].
Lemma 3.9 For all x ∈ [0, t] we have

  P[S_1 ≤ x | N(t) = 1] = P[U ≤ x] = x/t.   (3.8)

Proof

  P[S_1 ≤ x | N(t) = 1] = P[S_1 ≤ x, N(t) = 1] / P[N(t) = 1]
    = P[N(x) = 1, N(t) − N(x) = 0] / P[N(t) = 1]
    = P[N(x) = 1] · P[N(t) − N(x) = 0] / P[N(t) = 1]
    = (λx e^{−λx} · e^{−λ(t−x)}) / (λt e^{−λt})
    = x/t.

The second equality might be best understood by drawing a sample. Indeed, under the
knowledge of {N(t) = 1}, the event {S_1 ≤ x} is also described by telling that N(x) = 1
and N(t) − N(x) = 0. The third equality follows from Properties (B) and (A). The
fourth equality is valid since all probabilities involve events with Poisson distributions.  □
As a consequence, S_1 | {N(t) = 1} has distribution U[0, t] and E[S_1 | N(t) = 1] = t/2. The
same intuition that, given {N(t) = n}, each small interval [a, a + Δt] ⊆ [0, t] has the
same probability to contain an arrival leads to the idea that it is possible to simulate a
Poisson process conditioned on {N(t) = n} by means of n applications of the uniform
distribution on [0, t]. This is expressed by the following theorem, which is a direct
generalization of Lemma 3.9:

Theorem 3.10 For each t > 0 and n ∈ IN,

  P[S_1 ≤ x_1, S_2 ≤ x_2, ..., S_n ≤ x_n | N(t) = n]

equals

  P[U_(1) ≤ x_1, U_(2) ≤ x_2, ..., U_(n) ≤ x_n],

in which U_1, ..., U_n are i.i.d. with U_i ∼ U[0, t], and U_(k) is the k-th in line if U_1, ..., U_n
are arranged in increasing order.

In words: suppose you generate n samples with distribution U[0, t] and arrange the
outcomes in increasing order, resulting in n outcomes

  U_(1) < ··· < U_(n).

Then this n-tuple has the same distribution as

  (S_1 | {N(t) = n}, ..., S_n | {N(t) = n}).
The theorem itself is facultative, but you might want to apply its consequence:

Corollary 3.11 For all integers k, n with 1 ≤ k ≤ n we have

  E[S_k | N(t) = n] = kt / (n + 1).

An intuitive argument for the validity of this corollary is as follows. The symmetric
shape of the uniform density gives E[S_1 | N(t) = 1] = t/2. For notational purposes,
denote S_0 = 0 and S_{n+1} = t. The same argument gives that S_k is expected to be
situated right in the middle of time segment [S_{k−1}, S_{k+1}] for all k ∈ {1, ..., n}. So, all n
interarrival times T_k and the time segment [S_n, t] have equal expected lengths, i.e.,
t/(n + 1). Hence, E[S_k | N(t) = n] = E[T_1 + ··· + T_k | N(t) = n] = kt/(n + 1).
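A quick empirical check of Corollary 3.11 (a sketch of ours; we condition on {N(t) = n} by rejection, which is wasteful but simple):

    import numpy as np

    rng = np.random.default_rng(2)
    lam, t, n = 1.0, 5.0, 4
    samples = []
    while len(samples) < 20_000:
        arrivals = np.cumsum(rng.exponential(1 / lam, size=50))   # S_1, S_2, ...
        if np.sum(arrivals <= t) == n:                            # keep runs with N(t) = n
            samples.append(arrivals[:n])

    print(np.mean(samples, axis=0))                    # observed E[S_k | N(t) = n]
    print([k * t / (n + 1) for k in range(1, n + 1)])  # theoretical kt/(n+1)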
3.3 Continuous Time Markov Chains
This section returns to general stochastic processes with a continuous time index set
and the Markov property. Furthermore, we assume time homogeneity and restrict the
number of jumps to finitely many in finite time segments. Throughout the section,
X = {X(t) : t ≥ 0} is a process with the mentioned properties and state space I.
Let i ∈ I be any state and assume that at some time X(t) = i. Let J_i be the time it
takes until the first jump (transition) out of state i, so X(u) = i for all u ∈ [t, t + J_i) and
X(t + J_i) ≠ i. J_i is called the sojourn time (verblijfstijd) at i. If the chain never leaves
state i, then J_i := ∞ and state i is called absorbing. To avoid exceptions, we assume
that there are no absorbing states. In order to determine the distribution function of
J_i, we might as well take t = 0, because of time-homogeneity. In that case J_i = T_1,
which has been defined to be the time it takes until the first jump of the system (see
Subsection 3.1). We have seen that T_1 has the memoryless property and therefore has
an exponential distribution. Hence, there exists a positive real number ν_i such that
J_i ∼ exp(ν_i). ν_i is called the transition rate at state i.
So, given that the system is in state i at a certain time, it stays there for an exponentially
distributed time. What about the next state, i.e., can we say something about the
probability that it will be, say, state j? Because of time homogeneity and the Markov
property, the state to which the system jumps may depend on the current state i, but
not on any further information about how the system got into state i. Therefore, just
like in the discrete time setting, we can speak of transition probabilities:

  p_ij := P[j is the next state | i is the current state].   (3.9)
Unlike in the discrete time setting, p_ii > 0 is not possible; the system can only jump
to another state, since transitions are associated with changes of the state of the system.
Whereas we spoke of one-step transitions in the previous section, here 'one-jump
transitions' would be more appropriate.
Just like the class of Poisson processes, the class of CTMCs without absorbing states
can be characterized by a list of four properties:

Theorem 3.12 Let X = {X(t) : t ≥ 0} be a stochastic process with state space I.
Then X is a CTMC without absorbing states if and only if it obeys the following four
properties:
a) Each time the process reaches state i, it stays there for a stochastic period with
   length ∼ exp(ν_i).   (i ∈ I)
b) If the chain leaves state i, it jumps to state j with probability p_ij, with Σ_{j≠i} p_ij = 1
   and p_ii = 0.   (i, j ∈ I)
c) All sojourn times and transitions are independent of each other.
d) There exists an M ∈ IN such that 0 < ν_i ≤ M for all i ∈ I.
Property d) might appear out of the blue. This assumption avoids the phenomenon of
an explosion of jumps, as the following example displays.

Example 10
Suppose we have a counting process N = {N(t) : t ≥ 0} obeying properties a), b), and
c) with ν_i = 2^i for all i ∈ I. Since N is a counting process, p_{i,i+1} = 1 for each i ∈ IN_0
and the chain starts in state N(0) = 0. The expected time until it jumps to state 1
equals 1/ν_0 = 1. When in state 1, the next arrival will take place after an expected time
of length 1/ν_1 = 1/2. Each consecutive jump will happen in expectation twice as fast as
the previous one. So E[S_k] = 1 + 1/2 + ··· + 1/2^{k−1} = 2 − (1/2)^{k−1}. After 2 time
units, the expected number of jumps is infinite! So after a while one could say that
X(t) = ∞, but ∞ ∉ I. So, the example is not only unrealistic, it is also infeasible.
That is why property d) is needed.
Let us turn our attention to a widely applied model:

Example 11 The M/M/1-queue
In queueing theory, Kendall's notation is the standard system used to describe and
classify the queueing model that a queueing system corresponds to. First suggested
by D. G. Kendall in 1953 as a three-factor A/B/C notation system for characterizing
queues, it has since been extended to include up to six different factors, but we stick to
the original three. See e.g. http://en.wikipedia.org/wiki/Kendalls_notation if
you would like to have more information.

M: The first M represents exponential interarrival times of the customers that form a
   queue. Usually the arrival rate is called λ. Let T_1, T_2, ... represent the sequence of
   interarrival times. It is assumed that these are i.i.d. with T_n ∼ exp(λ), so the related
   process that counts the arrivals of customers is a Poisson(λ) process.

M: The second M denotes that service times have exponential distributions as well. The
   service completion times SC_1, SC_2, ... form another i.i.d. sequence and the service
   rate is usually denoted by μ, so SC_n ∼ exp(μ) for all n ∈ IN.

1: The third factor denotes the number of servers; 1 in this example.

Let X(t) give the number of customers present at time t, including the one in service (if
any). Does {X(t) : t ≥ 0} obey the Markov property? Instead of verifying the property
directly, we might as well check the four characterizing properties a), b), c) and d).
A transition from one state to another can be caused by two types of events: arrivals
of customers and service completions. In state i = 0, the server is idle, so J_0 ∼ exp(λ)
and ν_0 = λ. In any other state i > 0, the transition time J_i until a jump will take
place can be expressed as J_i = min{T, SC}, in which T stands for a typical interarrival
time and SC for a typical service completion time. Due to the memoryless property it
does not matter how long a service is already in progress or how long ago the previous
customer arrived. Theorem A.2 tells that the minimum of two independent exponentially
distributed random variables is exponentially distributed itself: J_i = min{T, SC} ∼
exp(λ + μ). This gives property a):

  J_i ∼ exp(λ)       if the server is free,
        exp(λ + μ)   if the server is busy.

b) Theorem A.4 tells that P[min{T, SC} = T] = λ/(λ + μ), yielding

  p_ij = 1             if i = 0 and j = 1,
         μ/(λ + μ)     if 0 < i = j + 1,
         λ/(λ + μ)     if 0 < i = j − 1,
         0             if |i − j| ≠ 1.

c) is valid by assumption (or not, but then there is no story).
d) 0 < ν_i ≤ λ + μ for all i ∈ I.  □
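The characterization a)–d) translates directly into a simulation. A sketch (ours) that estimates the long run fraction of time the M/M/1 server is busy, which should approach λ/μ when λ < μ:

    import random

    lam, mu = 0.8, 1.0              # arrival and service rates (lam < mu assumed)
    t, x, busy_time = 0.0, 0, 0.0
    horizon = 100_000.0

    while t < horizon:
        rate = lam if x == 0 else lam + mu     # nu_i: total rate out of state x
        dt = random.expovariate(rate)          # sojourn time ~ exp(nu_i)
        if x > 0:
            busy_time += dt
        t += dt
        if x == 0 or random.random() < lam / (lam + mu):
            x += 1                             # arrival
        else:
            x -= 1                             # service completion

    print(busy_time / horizon)                 # close to lam/mu = 0.8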
3.3.1 Time Dependent Transition Probabilities and Transition Rates
Let, for all t ≥ 0 and i, j ∈ I:

  p_ij(t) := P[X(t) = j | X(0) = i].   (3.10)

In words, p_ij(t) is the probability that, when in i, the system is in state j after t time
units. It is not required that the movement from i to j takes place in a single transition.
In general, p_ij(t) is very hard to derive. However, the cases that t is close to zero and
that t tends to infinity can be manageable. Let Δt > 0 and think of a number so small
that (Δt)² is negligible compared to Δt.
The probability that two transitions take place in period [t, t + Δt] is negligible, because
if X(t) = i and if k is the next state visited after t, then

  P[2 jumps in [t, t + Δt]] ≤ P[J_i < Δt] · P[J_k < Δt]
    = (1 − e^{−ν_i Δt})(1 − e^{−ν_k Δt})
    ≤ (1 − e^{−M Δt})²
    = (M Δt + o(Δt))²
    = o(Δt).   (3.11)

The second inequality uses property d) of a CTMC (see Theorem 3.12). The penultimate
equality is valid by Lemma A.5. Similarly, the probability that three (or any other
natural number exceeding two) transitions take place in period [t, t + Δt] is negligible.
Therefore,

  p_ij(Δt) = P[a single jump to j in period [0, Δt] | X(0) = i] + o(Δt)
    = P[any jump in [0, Δt] | X(0) = i] · P[first jump is to j | X(0) = i] + o(Δt)
    = P[J_i < Δt | X(0) = i] · P[first jump is to j | X(0) = i] + o(Δt)
    = (1 − e^{−ν_i Δt}) · P[first jump is to j | X(0) = i] + o(Δt)
    = (ν_i Δt + o(Δt)) p_ij + o(Δt)
    = ν_i p_ij Δt + o(Δt).

The second equality uses the independence of J_i and the destination of the first jump
(property c)). For the fifth equality we refer again to Lemma A.5. We conclude that

  lim_{Δt↓0} p_ij(Δt)/Δt = ν_i p_ij.
In words, when the chain is in state i, jumps to state j occur at a constant rate of ν_i p_ij.
Hence, we define

  q_ij := ν_i p_ij,

and call q_ij the transition rate from i to j. If it is not possible to jump directly from
i to j (i.e., p_ij = 0), this is expressed by a transition rate from i to j equal to 0. The
notions p_ij, q_ij, and ν_i relate to each other by

  p_ij = q_ij / ν_i   if j ≠ i,
         0            if j = i.

In particular, we have Σ_{j≠i} q_ij = Σ_{j≠i} ν_i p_ij = ν_i. In words, the sum over all rates out
of state i with a specified destination equals the rate out of state i without a specified
destination.
Instead of drawing a transition probability diagram, it is more natural and more
informative to draw a transition rate diagram; simply replace the probabilities by the rates
(the arrows remain the same). E.g., Figure 2 displays the transition rate diagram of
an M/M/1-queue.

Figure 2: The transition rate diagram of an M/M/1-queue (states 0, 1, 2, 3, ...; each
arrow to the right carries rate λ, each arrow to the left carries rate μ).

Just like a one-step transition probability matrix for DTMCs, we denote the (one-jump)
transition probability matrix by P. If the state space I equals IN_0, it looks like
  P = [  0    p_01  p_02  p_03  ···
        p_10   0    p_12  p_13  ···
        p_20  p_21   0    p_23  ···
          ⋮     ⋮     ⋮     ⋮   ⋱  ].   (3.12)
Similarly, we introduce a transition rate matrix, usually denoted by a capital A, with
entries a_ij. Here,

  a_ij := q_ij   if j ≠ i,
          −ν_i   if j = i,   (3.13)

so the transition rate matrix for a CTMC with state space IN_0 is given by

  A = [ −ν_0  q_01  q_02  q_03  ···
        q_10  −ν_1  q_12  q_13  ···
        q_20  q_21  −ν_2  q_23  ···
          ⋮     ⋮     ⋮     ⋮   ⋱  ].   (3.14)

The diagonal of A contains the unspecified transition rates ν_i with minus signs. Therefore,
the rows of A sum up to 0. It may seem a rather artificial notation; why this has
been done will be clarified when discussing equilibrium distributions.
Example 12 A single product inventory
Customers arrive PP(λ). Each customer buys 1 unit if available; no backlogging is
allowed. Opportunities to replenish occur PP(μ). Refill is only possible if the stock
is empty. Each refill has deterministic size Q. Define X(t) to be the amount of stock at
time t. Assume that X(0) = Q. Is X = {X(t) : t ≥ 0} a time homogeneous CTMC?
Its state space is I = {0, ..., Q}. Instead of verifying the definition of a CTMC, we
apply Theorem 3.12 and check the four characterizing properties. The argumentation
why properties a) and b) are valid is similar to that in the M/M/1-example 11 and
therefore left out.

Property a): ν_i = λ   if i ∈ {1, ..., Q},
                   μ   if i = 0.
Property b): p_ij = 1   if j = i − 1 or (i, j) = (0, Q),
                    0   else.
Property c): true by assumption.
Property d): 0 < ν_i ≤ M := max{λ, μ}.

We conclude that X is a CTMC. The transition rate and probability matrices are,
respectively,
  A = [ −μ   0    0   ···   0    μ
         λ  −λ    0   ···   0    0
         0   λ   −λ   ···   0    0
         ⋮        ⋱    ⋱         ⋮
         0   0   ···   0    λ   −λ ]

and

  P = [ 0  0  0  ···  0  1
        1  0  0  ···  0  0
        0  1  0  ···  0  0
        ⋮        ⋱       ⋮
        0  0  ···  0  1  0 ].
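A small Python sketch (ours; the parameter values are chosen purely for illustration) that builds this rate matrix and checks the defining property that every row sums to 0:

    import numpy as np

    def inventory_rate_matrix(Q, lam, mu):
        """Transition rate matrix A of the single product inventory CTMC."""
        A = np.zeros((Q + 1, Q + 1))
        A[0, 0], A[0, Q] = -mu, mu                # empty stock: refill to Q at rate mu
        for i in range(1, Q + 1):
            A[i, i - 1], A[i, i] = lam, -lam      # a sale moves i -> i-1 at rate lam
        return A

    A = inventory_rate_matrix(Q=4, lam=2.0, mu=0.5)
    print(A.sum(axis=1))   # all zeros: the rows of a rate matrix sum to 0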
3.3.2 Equilibrium Distributions
Intuitively, {X(t) : t ≥ 0} is in equilibrium if there exists a distribution {p_j}_{j∈I} with
P[X(t) = j] = p_j for all j ∈ I, t ≥ 0. In words, for all time moments, the probability to
be in a certain state is the same. If so, we have for all t ≥ 0, Δt > 0,

  P[system reaches j in [t, t + Δt]] = P[system leaves j in [t, t + Δt]],
  Σ_{k≠j} P[jump from k to j in [t, t + Δt]] + o(Δt) = Σ_{k≠j} P[jump from j to k in [t, t + Δt]] + o(Δt),
  Σ_{k≠j} P[X(t) = k] (q_kj Δt + o(Δt)) = Σ_{k≠j} P[X(t) = j] (q_jk Δt + o(Δt)),
  Σ_{k≠j} p_k q_kj Δt + o(Δt) = Σ_{k≠j} p_j q_jk Δt + o(Δt).

Divide by Δt and let Δt ↓ 0 to infer that

  Σ_{k≠j} p_k q_kj = p_j ν_j.   (3.15)
This gives rise to the following definition of an equilibrium distribution.

Definition 3.13 A probability distribution {p_j : j ∈ I} is an equilibrium distribution
(or stationary distribution) if for all j ∈ I we have

  Σ_{k≠j} p_k q_kj = p_j ν_j.

The equation can also be found by taking the inner product of p and the j-th column
of A. This is exactly the reason why A has been defined as it has. Apparently, if p is
an equilibrium distribution, then pA = 0. So, to find an equilibrium distribution, just
solve the system

  pA = 0,  Σ_{j∈I} p_j = 1,  p ≥ 0.   (3.16)

Often it is convenient to search for an x with xA = 0, and then normalize.
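A minimal numerical sketch for solving (3.16); the 3-state rate matrix below is invented purely for illustration:

    import numpy as np

    # A hypothetical 3-state rate matrix (rows sum to 0)
    A = np.array([[-3.0,  2.0,  1.0],
                  [ 1.0, -1.0,  0.0],
                  [ 2.0,  2.0, -4.0]])

    n = A.shape[0]
    M = np.vstack([A.T, np.ones(n)])        # stack p A = 0 with sum(p) = 1
    b = np.append(np.zeros(n), 1.0)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    print(p, p @ A)                          # equilibrium p; residual p A should be ~0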
The j-th equation of pA = 0 can be interpreted as follows. When the system is in
equilibrium, events forcing the system to move into state j happen at the same rate
as events that move the system out of state j:

  rate into state j = Σ_{k≠j} p_k q_kj = p_j ν_j = rate out of j.   (3.17)

This is called a balance equation or flow rate equation. More generally,

  rate into set A = rate out of set A   (A ⊆ I),

i.e.,

  Σ_{k∉A} p_k Σ_{j∈A} q_kj = Σ_{j∈A} p_j Σ_{k∉A} q_jk.
Assume that an equilibrium p exists (later on we will discuss when this assumption is
justified). Such equations can be visualized by a transition rate diagram in which each
circle contains the equilibrium probability p_i of the state i it represents. Let us
clarify this by discussing the M/M/c-queue.

Example 13 The M/M/c-queue (the Erlang delay model)
Two models are named after the Danish mathematician Agner Krarup Erlang. In the
Erlang loss model, potential customers are not willing to wait when all servers are busy,
and the revenues they would generate are lost. In the Erlang delay model, all customers
patiently wait for their turn. The two Ms in M/M/c denote that both the interarrival
times between consecutive customers and all service times have exponential distributions.
Their respective rates are λ and μ. c is a natural number and specifies the number of
servers, all identical. How can we model such a queue by means of a CTMC {X(t) :
t ≥ 0}? Let X(t) denote the number of customers in the system at time t, including the
ones in service. Assume that X(0) = 0.
First, we define the state space I to be {0, 1, . . .} and determine the transition rates. An increase of the number of customers is the result of an arrival. Arrivals take place at a constant rate λ, so
$$
q_{j,j+1} = \lambda \quad \text{for all } j \in I.
$$
A decrease of the number of customers is the result of a service completion. Suppose that the system is in a state in which k servers are busy. At what rate do service completions occur? If SC_j represents the service time of server j, then it is the shortest service time that forces the state transition, say SC = min{SC_1, . . . , SC_k}. We know by Theorem A.2 already that if k = 2, then SC ∼ exp(2μ). By induction, this statement is easily generalized to arbitrary k. Indeed, assume that it holds for k − 1, say SC′ := min{SC_1, . . . , SC_{k−1}} ∼ exp((k − 1)μ); then SC = min{SC′, SC_k}, so
$$
SC \sim \exp((k-1)\mu + \mu) = \exp(k\mu).
$$
In state k there are either k or c servers at work, depending on which of the two numbers is the smallest. Hence, we have
$$
q_{k,k-1} = \min\{k, c\}\,\mu \quad \text{for all } k \in \mathrm{IN}.
$$
This leads to the following transition rate diagram:
[Transition rate diagram: states 0, 1, 2, . . . , c − 1, c, c + 1, . . . ; each state j has an arrow labeled λ to j + 1, and arrows labeled μ, 2μ, 3μ, . . . , (c − 1)μ, cμ, cμ, . . . point from each state back to its predecessor.]
How to find an equilibrium probability vector p? Let p be an equilibrium probability vector. We are going to describe all coordinates of p in terms of p_c. Consider the subset {0, . . . , c} of I. This subset is visualized by the rectangle in the diagram below. Instead of the names of the states, we put the coordinates of p inside the circles:
[The same transition rate diagram, now with the probabilities p_0, p_1, p_2, . . . , p_{c−1}, p_c, p_{c+1}, . . . inside the circles and a box drawn around the states {0, . . . , c}.]
There is just one arrow pointing from a cell inside the box around set {0, . . . , c} to a cell outside the box. In equilibrium it generates an outflow of rate p_c q_{c,c+1}. There is just one arrow pointing from a cell outside the box to a cell inside the box. In equilibrium it generates an inflow of rate p_{c+1} q_{c+1,c}. In other words, flow out of set {0, . . . , c} = flow into set {0, . . . , c} gives p_c λ = cμ p_{c+1}. Hence, p_{c+1} = p_c · λ/(cμ). Let us call this fraction ρ; ρ = λ/(cμ). Now, we consider the set {0, . . . , c + 1}:
[The same diagram once more, with the box now drawn around the states {0, . . . , c + 1}.]
Flow out of set {0, . . . , c + 1} = flow into set {0, . . . , c + 1} gives p_{c+1} λ = cμ p_{c+2}. Hence, p_{c+2} = ρ p_{c+1} = p_c ρ². By induction, p_{c+k} = p_c ρ^k for all k ∈ IN and
$$
\sum_{i=c}^{\infty} p_i = p_c\,(1 + \rho + \rho^2 + \cdots) =
\begin{cases}
\dfrac{p_c}{1 - \rho} & \text{if } \rho < 1, \\[4pt]
\infty & \text{if } \rho \geq 1.
\end{cases}
$$
This implies that p can only exist if ρ < 1. There is a heuristic argument for this. There is a flow of λ jobs per hour, each having an expected service length of 1/μ hours. So, the total number of working hours per hour equals λ/μ on average. Hence, each individual server must work λ/(cμ) = ρ hours per hour on average. This is only possible if ρ ≤ 1. Servers cannot work all of the time (there are periods in which no customers are in the queue), so actually ρ must be strictly smaller than 1. Because of this interpretation of ρ, it is called the server utilization.
Let us continue with computing p. For i < c, flow out of set {0, . . . , i} = flow into set {0, . . . , i} gives p_i λ = (i + 1)μ p_{i+1}. Hence, p_i = p_{i+1} (i + 1)μ/λ.
This describes p in terms of p_c. The normalizing equation Σ_k p_k = 1 now fixes p. Let us explicitly calculate p in terms of ρ for the case that c = 1 (the M/M/1-queue). We have
$$
p_0 = p_1\,\mu/\lambda, \text{ so } p_1 = \rho\, p_0, \qquad
p_{k+c} = p_{k+1} = p_1\, \rho^k = p_0\, \rho^{k+1}.
$$
Aggregating leads to
$$
\sum_{k=0}^{\infty} p_k = p_0 \sum_{k=0}^{\infty} \rho^k = \frac{p_0}{1 - \rho}.
\quad \text{Hence } p_0 = 1 - \rho \text{ and } p_k = \rho^k\,(1 - \rho).
$$
This comports with the interpretation of ρ as the server utilization; 1 − ρ is the fraction of the time that the server is free in the long run (or in equilibrium).
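The recursions just derived translate directly into a numerical routine. Below is a sketch (our own illustration; the function name, the parameter values and the truncation level K are assumptions, not part of the model) that computes the M/M/c equilibrium distribution:

```python
import numpy as np

def mmc_equilibrium(lam, mu, c, K=200):
    """Equilibrium of the M/M/c-queue, truncated K states beyond c.
    Uses p_i * lam = (i+1) * mu * p_{i+1} for i < c and the geometric
    tail p_{c+k} = p_c * rho**k derived above."""
    rho = lam / (c * mu)
    assert rho < 1, "no equilibrium unless rho < 1"
    x = [1.0]                                  # unnormalized; x_0 = 1
    for i in range(c):                         # climb from state 0 up to c
        x.append(x[-1] * lam / ((i + 1) * mu))
    for k in range(1, K):                      # geometric tail beyond c
        x.append(x[c] * rho**k)
    x = np.array(x)
    return x / x.sum()                         # normalize

p = mmc_equilibrium(lam=3.0, mu=1.0, c=4)
print(p[:6])
```

For c = 1 the routine reproduces p_k = ρ^k(1 − ρ).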
3.3.3 Long Term Behavior
CTMCs behave similarly to DTMCs in the long run. For every CTMC, there is a DTMC with almost the same characteristics. They have the same state space, the same minimal closed sets, the same regenerative states, and if one of them has a unique equilibrium, the other one has a unique equilibrium too; not the same one, however. There is one other difference: the phenomenon of periodicity does not occur in the CTMC setting.
A discrete time indexed process {X_n : n ∈ IN_0} is called an embedding of a CTMC {X(t) : t ≥ 0} if there is an increasing sequence of stochastic time moments t_0, t_1, . . . such that X_n = X(t_n) for all n ∈ IN_0. Let, similar to the notation in Section 3.2 about Poisson processes, T_k denote the interarrival time between the (k − 1)st and kth state transitions of X and let S_k = T_1 + · · · + T_k. An important embedding is the one that stores the sequence of states that are visited, i.e., X_0 := X(0) and X_n = X(S_n) for all n ∈ IN. It inherits the Markov property and the time homogeneity of the original chain. Furthermore, its one step transition probability matrix equals the one jump transition probability matrix P of the original process.
For DTMCs, we have seen theorems that provide sufficient conditions guaranteeing existence of, or convergence to, an equilibrium. Without proof, we state that these theorems apply to CTMCs as well. The reason is that the CTMC inherits its long term behavior from its embedding. One might wonder why the equilibria of both processes do not coincide. The following example might give insight:
Example 14 Let I = {0, 1, 2} and
$$
A = \begin{pmatrix} -200 & 100 & 100 \\ 10 & -20 & 10 \\ 1 & 1 & -2 \end{pmatrix}.
\quad \text{Then} \quad
P = \begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{pmatrix}.
$$
Because of the symmetry of P, it is immediate that π = (1/3, 1/3, 1/3) is the equilibrium of the embedded DTMC (recall that π must satisfy πP = π). π_j gives the long run fraction of transitions that result in a visit of state j. This means that in the long run each state is visited equally often. Solving pA = 0 leads to the equilibrium distribution p = (1/111, 10/111, 100/111) of the CTMC. The reason for this difference is that in the continuous setting the sojourn times in the states are taken into account. p_j gives the long run fraction of time that the system is in state j. Since ν_0 = 200 = 10ν_1 = 100ν_2, each visit of state 0 lasts on average only a tenth as long as a visit of state 1 and a hundredth as long as a visit of state 2. Therefore, the ratios p_0 : p_1 : p_2 relate as (1/3)/200 : (1/3)/20 : (1/3)/2, i.e., 1 : 10 : 100. This explains that p = (1/111, 10/111, 100/111). In general, the way p and π relate follows from the fact that each visit of state j lasts 1/ν_j in expectation.
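The relation between p and π in this example is easy to check numerically; the sketch below (illustrative, not part of the syllabus) recovers π from P and confirms that normalizing π_j/ν_j yields p:

```python
import numpy as np

# Numerical check of Example 14: pi solves pi P = pi, and p_j is
# proportional to pi_j / nu_j.
A = np.array([[-200.0, 100.0, 100.0],
              [  10.0, -20.0,  10.0],
              [   1.0,   1.0,  -2.0]])
nu = -np.diag(A)                    # leaving rates nu_j = 200, 20, 2
P = A / nu[:, None]
np.fill_diagonal(P, 0.0)            # one jump transition matrix

# pi is the left eigenvector of P with eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()                      # (1/3, 1/3, 1/3)

p = pi / nu
p /= p.sum()                        # (1/111, 10/111, 100/111)
print(pi, p)
```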
So, the requirements for having a unique equilibrium distribution are the same as in the discrete time setting. The notions accessibility, communication, irreducibility, regenerative state, and closedness of a subset of I can be copied directly from that setting. The CTMC has one of the properties above if and only if the embedded DTMC has that property. Recall, e.g., that a state r is called regenerative if it is visited in finite expected time, wherever the chain starts. If such a state exists, there can only be one minimal closed set of states and r must be an element of this set (why?).
If the state space I is finite, having a unique minimal closed set is not only necessary, but also sufficient. Each element of it is regenerative.
If |I| = ∞, more is required. In general, it is very hard to verify whether there is a regenerative state. In the case of a queue type of chain, i.e., I = {0, 1, 2, 3, . . .} and p_{ij} > 0 if and only if |i − j| = 1, there must exist a so-called drift to the left, i.e., p_{i,i−1} > p_{i,i+1} for all states i in some tail {N, N + 1, N + 2, . . .} of the state space. For the M/M/c-queue, this is exactly the case when ρ < 1.
Theorem 3.14 Let X = {X(t) : t ≥ 0} be a CTMC. If the chain has a regenerative state, then there exist a unique equilibrium distribution p of X and a unique equilibrium distribution π of the embedded chain {X_n : n ∈ IN_0}, where X_n := X(S_n). They relate by
$$
p_j = c\, \frac{\pi_j}{\nu_j},
$$
in which c equals the normalizing constant $\big(\sum_{k \in I} \pi_k/\nu_k\big)^{-1}$. Moreover, independent of the initial distribution (the distribution of X(0)), the CTMC converges to the equilibrium, i.e., for all i, j ∈ I we have
$$
p_j = \lim_{t \to \infty} p_{ij}(t).
$$
3.3.4 Cost-Reward Structures
Similarly to the DTMC setting, cost-reward structures can be defined to model optimization issues. There, we only defined costs or rewards for states. Here, we follow a different approach: transitions can lead to revenues or costs as well. So, a reward structure consists of two parts:
For each state j, a reward rate r(j) denotes the rewards per time unit that are earned when the system is in state j. These rewards are called continuous rewards.
Each transition (j → k) can lead to revenues or costs too. The value of such a transition is denoted by d_{jk}. These rewards are called discrete rewards.
Let R(t) be the total reward in period [0, t]:
$$
R(t) = \int_0^t r(X(u))\, du + \sum_{j,k:\, j \neq k} d_{jk} \cdot \#\{\text{jumps in } [0, t] \text{ from } j \text{ to } k\}. \tag{3.18}
$$
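Equation (3.18) can be mimicked in a short simulation. The sketch below (a made-up helper, assuming every state has a positive leaving rate; none of the names come from the syllabus) accumulates continuous and discrete rewards along one sample path:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_reward(A, r, d, x0, horizon):
    """Simulate a finite-state CTMC with rate matrix A up to `horizon` and
    return R(horizon): reward rate r[j] per time unit in state j plus a
    discrete reward d[j][k] for every jump from j to k."""
    nu = -np.diag(A)                           # leaving rates
    P = A / nu[:, None]
    np.fill_diagonal(P, 0.0)                   # jump probabilities
    t, x, R = 0.0, x0, 0.0
    while True:
        sojourn = rng.exponential(1.0 / nu[x]) # exp(nu_x) stay in state x
        if t + sojourn >= horizon:             # horizon reached mid-stay
            return R + r[x] * (horizon - t)
        R += r[x] * sojourn                    # continuous reward
        t += sojourn
        y = rng.choice(len(nu), p=P[x])        # next state
        R += d[x][y]                           # discrete reward for the jump
        x = y
```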
Example 15 A reward structure on an M/M/1-queue
In an M/M/1-queue, customers pay a starting fee of €5, pay €0.10 per minute while waiting, and €10 per hour while being served. Can we find the long run average rewards per hour for the server?
We can only expect an affirmative answer when the system reaches equilibrium. This is the case when ρ = λ/μ < 1. Set the unit of time to be 1 minute. What is the appropriate reward structure? As long as the system stays in state 0, there is no income. We can model this by defining r(0) = 0. If there is at least one customer present, then the customer being served pays 10/60 per time unit and all others pay 1/10 per minute, so define r(j) = 10/60 + (j − 1)/10 for all j ≥ 1. Each time somebody joins the queue, 5 is earned. Hence, define d_{j,j+1} = 5. The completion of a service yields no further rewards, so d_{j,j−1} must be chosen to be 0.
The total rewards in the first t minutes equal R(t). The average rewards per minute in time segment [0, t] equal R(t)/t. Therefore the long run average rewards per minute equal lim_{t→∞} R(t)/t. Theorem 3.15 below states that the long run average rewards per time unit equal the expected reward per time unit in equilibrium. In the example we find that the long run average rewards per minute equal
$$
\begin{aligned}
\sum_{j \in I} r(j)\, p_j + \sum_{j \in I} p_j \sum_{k \neq j} q_{jk}\, d_{jk}
&= \sum_{j=1}^{\infty} \Big(\tfrac{10}{60} + \tfrac{j-1}{10}\Big)(1 - \rho)\rho^j + \sum_{j=0}^{\infty} (1 - \rho)\rho^j\, \lambda \cdot 5 \\
&= 5\lambda + \sum_{j=1}^{\infty} \frac{4 + 6j}{60}\,(1 - \rho)\rho^j \\
&= 5\lambda + \sum_{k=0}^{\infty} \frac{4 + 6(k+1)}{60}\,(1 - \rho)\rho^{k+1} \\
&= 5\lambda + \rho \sum_{k=0}^{\infty} \frac{10 + 6k}{60}\,(1 - \rho)\rho^{k} \\
&= 5\lambda + \frac{\rho}{6}\,(1 - \rho) \sum_{k=0}^{\infty} \rho^k + \frac{\rho}{10}\,(1 - \rho) \sum_{k=0}^{\infty} k\,\rho^k \\
&= 5\lambda + \frac{\rho}{6} + \frac{\rho^2}{10(1 - \rho)}.
\end{aligned}
$$
The final equality of the elaboration uses standard summations that can be found in the Appendix (Equations (A.11) and (A.12)).
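A quick numerical check of the elaboration (for an arbitrary illustrative choice λ = 0.5, μ = 1, so ρ = 0.5) compares a truncated version of the series with the closed form:

```python
# Sanity check of Example 15's closed form against the truncated series.
lam, mu = 0.5, 1.0                 # illustrative rates, rho < 1
rho = lam / mu
series = 5 * lam + sum((4 + 6 * j) / 60 * (1 - rho) * rho**j
                       for j in range(1, 10_000))
closed = 5 * lam + rho / 6 + rho**2 / (10 * (1 - rho))
print(series, closed)              # both approximately 2.6333
```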
Theorem 3.15 [Renewal Reward Theorem]
Let {X(t) : t ≥ 0} be a CTMC with unique equilibrium distribution {p_j : j ∈ I}. Let r and d be the continuous and discrete parts of a cost-reward structure. If the number of states is finite or if the chain is irreducible, then, with probability 1, the long run average rewards per time unit equal
$$
\lim_{t \to \infty} \frac{R(t)}{t} = \sum_{j \in I} r(j)\, p_j + \sum_{j \in I} p_j \sum_{k \neq j} q_{jk}\, d_{jk}.
$$
Corollary 3.16 Let N ∼ PP(λ). Then lim_{t→∞} N(t)/t = λ.
Proof Let {X(t) : t ≥ 0} be an M/M/1-queue and N the Poisson process counting arrivals of customers. The discrete reward structure that counts these arrivals is given by d_{jk} = 1 if k = j + 1 and d_{jk} = 0 if k ≠ j + 1. Then R(t) = N(t) and
$$
\lim_{t \to \infty} \frac{N(t)}{t} = \sum_{j \in I} p_j \sum_{k \neq j} q_{jk}\, d_{jk} = \sum_{j \in I} p_j\, \lambda = \lambda. \qquad \Box
$$
3.3.5 The PASTA Theorem and Little's Law
This final subsection discusses two generally applicable theorems: the PASTA Theorem and Little's Law. Both will be illustrated by an example before being stated formally.
Example 12 A single product inventory (revisited)
Recall that I = {0, 1, . . . , Q} and that the transition rate and probability matrices in
this example are respectively
$$
A = \begin{pmatrix}
-\mu & 0 & \cdots & \cdots & 0 & \mu \\
\lambda & -\lambda & 0 & \cdots & \cdots & 0 \\
0 & \lambda & -\lambda & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & \lambda & -\lambda
\end{pmatrix}
\quad \text{and} \quad
P = \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}.
$$
The matrix P is the one step transition matrix of the embedded DTMC as well. If X(0) = X_0 = Q, the embedding is the deterministic and repetitive sequence of states
$$
Q,\; Q-1,\; Q-2,\; \ldots,\; 1,\; 0,\; Q,\; Q-1,\; \ldots
$$
So the embedding does not converge to equilibrium. The reason is that its states are periodic. Theorem 3.14 tells us, however, that both the original process and the embedding have a unique equilibrium, called p and π respectively. It is readily seen that π is the uniform distribution on I. p can be found by flow rate equations:
Let x be a solution of yA = 0. Choose x_0 = 1 (later on we normalize anyway). Rate into state i = rate out of state i says
$$
\begin{aligned}
x_0\, \mu &= x_1\, \lambda, && \text{so } x_1 = \mu/\lambda, \\
x_j\, \lambda &= x_{j+1}\, \lambda, && \text{so } x_{j+1} = x_j = \mu/\lambda \quad (j \in \{1, \ldots, Q-1\}), \\
x_Q\, \lambda &= x_0\, \mu. &&
\end{aligned}
$$
The last equation follows from the previous ones and can serve as a verification. Hence, x = (1, μ/λ, . . . , μ/λ). Find an equilibrium distribution by normalizing:
$$
p = \frac{1}{\lambda + Q\mu}\,(\lambda, \mu, \ldots, \mu). \tag{3.19}
$$
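It is easy to verify (3.19) numerically; the sketch below (an illustration with made-up values Q = 4, λ = 2, μ = 3) builds the rate matrix and checks that pA = 0:

```python
import numpy as np

# Numerical check of (3.19) for hypothetical values of Q, lam and mu.
Q, lam, mu = 4, 2.0, 3.0
A = np.zeros((Q + 1, Q + 1))
A[0, 0], A[0, Q] = -mu, mu                   # empty stock: refill to Q
for i in range(1, Q + 1):
    A[i, i], A[i, i - 1] = -lam, lam         # a sale moves i to i - 1
p = np.append(lam, np.full(Q, mu)) / (lam + Q * mu)
print(np.allclose(p @ A, 0.0))               # True: p solves p A = 0
```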
Two natural questions that arise are
(i) What is the long run fraction of time that the store is out of stock?
(ii) What is the long run fraction of customers that will have to be disappointed?
In order to answer the first question, define a continuous time cost-reward structure r by r(0) = 1 and r(j) = 0 for all j ∈ {1, . . . , Q}. We have
$$
R(t) = \int_0^t r(X(u))\, du = \text{the total reward in period } [0, t] = \text{total time with empty stock in period } [0, t].
$$
The Renewal Reward Theorem 3.15 claims that
$$
\lim_{t \to \infty} \frac{R(t)}{t} = \sum_{j \in I} r(j)\, p_j = p_0 = \frac{\lambda}{\lambda + Q\mu}.
$$
For Question (ii), we define a discrete cost-reward structure, i.e., r(j) = 0 for all j ∈ I. Furthermore, let d_{j,j−1} = 1 for all j ∈ {1, . . . , Q} and let d_{0Q} = 0. This reward structure counts sold items; R(t) equals the total number of items sold in period [0, t]. So, the total lost demand equals N(t) − R(t) and the fraction of the demand that is lost equals (N(t) − R(t))/N(t). The Renewal Reward Theorem tells us that in the long run the average number of items sold per time unit equals
$$
\lim_{t \to \infty} \frac{R(t)}{t} = \sum_{j \in I} p_j \sum_{k \neq j} q_{jk}\, d_{jk} = \sum_{j=1}^{Q} p_j\, q_{j,j-1} = \sum_{j=1}^{Q} p_j\, \lambda = \sum_{j=1}^{Q} \frac{\mu}{\lambda + Q\mu}\, \lambda = \frac{Q\lambda\mu}{\lambda + Q\mu}.
$$
The answer is thereby found as follows:
$$
\lim_{t \to \infty} \frac{N(t) - R(t)}{N(t)} = 1 - \lim_{t \to \infty} \frac{R(t)}{t} \cdot \frac{t}{N(t)} = 1 - \frac{Q\lambda\mu}{\lambda + Q\mu} \cdot \frac{1}{\lambda} = \frac{\lambda}{\lambda + Q\mu}.
$$
The answers coincide! This is no coincidence, but due to a general phenomenon, described by the PASTA theorem.
Before we can state the theorem, we need more notation. Let N = {N(t) : t ≥ 0} be a Poisson process. Let X = {X(t) : t ≥ 0} be a process with state space I. It need not be a CTMC, but we do assume that it converges to equilibrium; there exists a distribution p on I with p_i = lim_{t→∞} P[X(t) = i]. It is assumed that N is an exogenous factor, i.e., the process of arrivals is independent of the past of the chain. In formula, for all t > 0 we have
$$
\{N(t + \Delta t) - N(t) : \Delta t > 0\} \perp \{X(u) : u \in [0, t)\}.
$$
The idea of the theorem is that the arrivals of the Poisson process form a sequence of time moments at which process X is observed. In the long run, an observer has a probability of p_i to find the system in state i. Here, i is the state of the system just before the arrival, since the observer's arrival might itself adjust the state. E.g., in a queueing system he finds a queue of length i and his arrival enlarges the length of the queue to i + 1.
Theorem 3.17 [The PASTA Theorem]
Let X = {X(t) : t ≥ 0} be a stochastic process converging to equilibrium p and let N = {N(t) : t ≥ 0} be an exogenous Poisson process. Let X_n be the state of the system X just prior to the nth arrival of the Poisson process. Then for all j ∈ I we have
$$
\lim_{n \to \infty} \frac{1}{n}\, \#\big\{ k \in \{1, \ldots, n\} : X_k = j \big\} = p_j. \tag{3.20}
$$
Here #A denotes the number of elements of set A. In words, the PASTA theorem claims that the long run fraction of arrivals finding the system in state j equals the long run fraction of time that the system is in j. The name of the theorem refers to the abbreviation Poisson Arrivals See Time Averages.
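The theorem is easy to observe in simulation. The sketch below (an illustration with arbitrary rates λ = 1, μ = 2) runs an M/M/1-queue, records the state just before each Poisson arrival, and compares the fraction of arrivals seeing state j with the fraction of time spent in j:

```python
import numpy as np

rng = np.random.default_rng(1)

lam, mu, T = 1.0, 2.0, 200_000.0   # illustrative rates, rho = 0.5
t, x = 0.0, 0
seen = []                           # queue length just before each arrival
time_in = {}                        # total time spent in each state
while t < T:
    rate = lam + (mu if x > 0 else 0.0)
    dt = rng.exponential(1.0 / rate)
    time_in[x] = time_in.get(x, 0.0) + dt
    t += dt
    if rng.random() < lam / rate:   # the next event is an arrival
        seen.append(x)
        x += 1
    else:                           # the next event is a departure
        x -= 1
seen = np.array(seen)
for j in range(3):                  # both columns near (1 - rho) * rho**j
    print(j, (seen == j).mean(), time_in[j] / t)
```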
Are there situations in which PASTA does not hold?
Example 16
Consider a U/D/1-queue in which each interarrival time lasts longer than each service time. The U in the notation stands for uniformly distributed interarrival times and the D expresses deterministic service times. E.g., T_k ∼ U(55, 65) (in minutes) and SC_k = 30 minutes for each service completion. In this situation, each new customer finds an empty queue, since the server has ample time. On the other hand, the working load of the server equals 0.5, since on average one client enters per hour, costing him 30 minutes of time. Hence, the long run fraction of time that the server is busy equals 1/2, while every arrival sees an idle server. The reason that PASTA does not apply is that the arrivals do not constitute a Poisson process.
The following exercise is representative of an exam question.
Exercise 2
An assembly line for a certain product has two stations in series. Each station has only room for a single unit of the product. If the assembly of a unit is completed at station 1, it is forwarded immediately to station 2 provided station 2 is idle; otherwise the unit remains in station 1 until station 2 becomes free. Units for assembly arrive at station 1 according to a Poisson process with rate λ, but a newly arriving unit is only accepted by station 1 when no other unit is present in station 1. Each unit rejected is handled elsewhere. The assembly times at stations 1 and 2 are exponentially distributed with respective rates μ_1 and μ_2. Formulate a continuous-time Markov chain to analyze the situation at both stations. Specify the state variable(s) and the transition rate diagram. Find the long run fraction of units accepted and the long run average time spent in the system by an accepted unit. You don't have to calculate the unique equilibrium distribution. Just tell why it exists, call it p, and give the answers in terms of p.
Elaboration
Station 1 can be free (F), busy (B) or waiting (W). Station 2 can be free or busy. It is recommended not to speak of the state of a station, but, e.g., of its condition or situation, just to avoid confusion: save the term state for the complete system. The state (W, F) does not occur, so we have five states in total:
$$
I = \{(FF),\, (BF),\, (FB),\, (BB),\, (WB)\}.
$$
Let {(X_1(t), X_2(t)) : t ≥ 0} describe the state of the system varying over time. At an exam, you only have to verify the Markov property, the time homogeneity property, or the four characterizing properties a), b), c), and d) of Theorem 3.12 when explicitly asked for. Here, the four characterizing properties are straightforward.
In order to move, e.g., from state (W, B) to state (F, B), a service completion at the second station is needed. Given that the system is in state (W, B), these service completions occur with a rate of μ_2. Therefore, q_{(WB),(FB)} = μ_2. The other six positive transition rates can be found in the following transition rate diagram:
[Transition rate diagram on the five states: (F,F) → (B,F) at rate λ; (F,B) → (B,B) at rate λ; (B,F) → (F,B) at rate μ_1; (B,B) → (W,B) at rate μ_1; (F,B) → (F,F) at rate μ_2; (B,B) → (B,F) at rate μ_2; (W,B) → (F,B) at rate μ_2.]
In order to prove the existence of a unique equilibrium p to which the system converges (regardless of the initial state of the system), it suffices to remark that there are finitely many states and that the chain is irreducible, and then refer to Theorem 3.14.
By the PASTA theorem, the fraction p_acc of the units that is accepted equals the fraction of time that the system is in a state in which the first station is free:
$$
p_{\mathrm{acc}} := p_{(FF)} + p_{(FB)}.
$$
Let {N_acc(t) : t ≥ 0} be the counting process that counts the accepted arrivals. This is in general not a Poisson process (the rate of accepted arrivals is not constant), but it is if the system is in equilibrium. There is a constant rate λ of incoming units. In equilibrium, each of these units has a probability of p_acc to be accepted. Therefore, the long run average rate of accepted units equals
$$
\lambda_{\mathrm{acc}} := \lambda\, p_{\mathrm{acc}}.
$$
Let L denote the long run average number of units present in the system. By means of a cost-reward structure we can find L. Define r(i, j) to be the number of units present in state (i, j), so r(F, F) = 0, r(B, F) = r(F, B) = 1, and r(B, B) = r(W, B) = 2. Then
$$
L = \lim_{t \to \infty} \frac{R(t)}{t} = \sum_{(i,j) \in I} p_{(i,j)}\, r(i, j) = p_{(F,B)} + p_{(B,F)} + 2p_{(B,B)} + 2p_{(W,B)}.
$$
Let W denote the long run average time that an accepted unit spends in the system. We have
$$
W = \lim_{t \to \infty} \frac{\text{accumulated system time in } [0, t]}{\text{total \# accepted units in } [0, t]}
= \lim_{t \to \infty} \frac{R(t)}{N_{\mathrm{acc}}(t)}
= \lim_{t \to \infty} \frac{R(t)}{t} \cdot \frac{t}{N_{\mathrm{acc}}(t)}
= \frac{L}{\lambda_{\mathrm{acc}}}.
$$
We have found an easy relation: W = L/λ_acc. It is called Little's Law and it is widely applicable. It applies whenever the three notions L, W, and λ_acc are well defined limits (or constants). It is much more general than PASTA, since the process {N(t) : t ≥ 0} of observation moments need not be a Poisson process; it suffices that lim_{t→∞} N(t)/t exists. In a queueing context, L represents the Length of the queue and W the Waiting time. E.g., in the U/D/1-queue example 16, we find that W = 30 minutes, L = 1/2, and λ = 1/60 arrivals per minute on average in the long run.
Theorem 3.18 [Little's Law]
Let {X(t) : t ≥ 0} be a stochastic process and let {N(t) : t ≥ 0} be the process that counts arrivals of people or objects into the system. Let λ be the long run rate of arrivals. Let L denote the long run average number of people/objects inside the system. Let W be the long run average system time of a person/object. Then
$$
L = \lambda\, W.
$$
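Little's Law, too, is easy to watch in a simulation. The following sketch (illustrative rates λ = 1, μ = 2; a FIFO M/M/1-queue) compares L with λW:

```python
import numpy as np

rng = np.random.default_rng(2)

lam, mu, n = 1.0, 2.0, 100_000      # illustrative rates, n customers
arrivals = np.cumsum(rng.exponential(1.0 / lam, n))
services = rng.exponential(1.0 / mu, n)
departures = np.empty(n)
free_at = 0.0                       # time the server next becomes free
for i in range(n):
    start = max(arrivals[i], free_at)
    departures[i] = start + services[i]
    free_at = departures[i]
T = departures[-1]
W = (departures - arrivals).mean()       # average system time per customer
L = (departures - arrivals).sum() / T    # time-average number in system
print(L, lam * W)                        # the two sides nearly coincide
```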
A Appendix; Mathematical Tools to Build Stochastic Models
Random Variables
Random variables (stochasten in Dutch) are often denoted by capitalized characters: X, Y, Z, N, . . . Mathematically, they are just functions with a domain and a range. Intuitively, these are variables of which some information is known. The domain of a random variable is mostly called Ω (also called sample space), e.g., Ω = {head, tail} when tossing a coin. Suppose you gamble and bet €1 on head. The random variable X modeling this bet is given by X(head) = 1 and X(tail) = −1.
Elements of Ω are called states, samples, realizations or outcomes. Subsets A of Ω are called events. Each event has a given probability P(A) to occur. E.g., if you roll a die, then Ω = {1, 2, 3, 4, 5, 6} and the event of an odd number thrown is {1, 3, 5}.
Each random variable X has a (cumulative) distribution function (cdf)
$$
F(x) = P(X \leq x) := P(\{\omega \in \Omega : X(\omega) \leq x\}).
$$
Random variables in applications are either discrete, continuous, or a mixture of these two types. A discrete random variable has a (probability) mass function (pmf) p : IR → [0, 1] representing the probability of each single possible outcome to realize: p(x) = P(X = x). In order to have a discrete random variable, there must exist a (finite or infinite) sequence of different numbers x_1, x_2, x_3, . . . such that
$$
\sum_{i=1}^{\infty} p(x_i) = 1.
$$
Consequently, P(X = x) = 0 for all x ∈ IR with x ∉ {x_1, x_2, . . .}.
A random variable is called continuous if its cdf is continuous. It then has a (probability) density function (pdf) f : IR → IR_+ with F(x) = ∫_{−∞}^{x} f(y) dy. If F is differentiable at x, then F′(x) = f(x). If X is continuous, then P(X = x) = 0 for all x ∈ IR (why?), so X cannot be discrete.
A real life example of a random variable that is neither discrete nor continuous can be
constructed as follows.
Example 17 Suppose the policy of the university is to replace PCs as soon as they seriously crash or reach the age of 5. Assume that a PC has a continuously distributed lifetime L (the time until it crashes when no preventive replacement would be scheduled). Then the time T that a PC is in use is neither continuous nor discrete; there is a positive probability that T = 5.
This syllabus uses the customary notations:
X ∼ F denotes that F is the cdf of X,
X ∼ Y denotes that the random variables X and Y have the same cdfs, i.e., for all x ∈ IR we have that P(X ≤ x) = P(Y ≤ x),
X ∼ F, f denotes that X has cdf F and pdf f.
Independence
A pair of events A and B are called independent, notation A ⊥ B, if
$$
P(A \cap B) = P(A) \cdot P(B).
$$
Let A, B be events with P(B) > 0. The occurrence of the event B might have impact on the probability that A will happen. P(A | B) denotes the conditional probability that A occurs, given that B occurs. By definition, P(A | B) = P(A ∩ B)/P(B). A pair of random variables X and Y are called independent, X ⊥ Y, if for all x, y ∈ IR
$$
\{X \leq x\} \perp \{Y \leq y\}.
$$
Geometric Distribution
A random variable N is said to have a geometric distribution with parameter p ∈ (0, 1) if it is non-negative and integer valued with
$$
P[N = n] = (1 - p)^n\, p
$$
for all n ∈ {0, 1, 2, 3, . . .}. Such a random variable expresses, e.g., the number of failures before success in a series of independent attempts.
Poisson Distribution
A random variable N is said to have a Poisson distribution with parameter λ > 0, notation N ∼ Poisson(λ), if it is non-negative and integer valued with
$$
P[N = n] = \frac{\lambda^n e^{-\lambda}}{n!}
$$
for all n ∈ {0, 1, 2, 3, . . .}. Such a random variable expresses, e.g., the number of customers entering a queue in the first time unit when the arrival rate equals λ customers per time unit. The sum of two independent Poisson distributed random variables has a Poisson distribution as well, with as its rate the sum of the original rates (see Corollary 3.8).
Uniform Distribution
A random variable X is said to have a uniform distribution over a set A, notation X ∼ U(A), if A is finite and P[X = a] = 1/|A| for all a ∈ A. Another option is that A equals a finite interval (a, b). In that case X is continuous and for x ∈ (a, b) we have
$$
F(x) = \frac{x - a}{b - a}.
$$
Exponential Distribution
A random variable X is said to have an exponential distribution with rate λ > 0, notation X ∼ exp(λ), if
$$
P[X \leq t] = 1 - e^{-\lambda t}
$$
for all t ≥ 0. This is, e.g., the distribution of the excess waiting time until a new customer joins a queue when the arrival rate equals λ customers per time unit. So λ is not just a parameter; it has an important interpretation.⁵
Exponentially distributed random variables have the so-called memoryless property. E.g., if the lifetime of a machine is exp(λ)-distributed, then given that the machine still works at time t, it is as good as new at time t. I.e., a machine of t years old still has an exp(λ)-distributed lifetime.
⁵ That is why the notation has been chosen to follow the book of Tijms and not other textbooks, in which X ∼ exp(λ) implies that P[X ≤ t] = 1 − e^{−t/λ}.
Definition A.1 A nonnegative random variable X possesses the memoryless property if for all t, u ∈ IR_+ we have
$$
P(X > t + u \mid X > t) = P(X > u). \tag{A.1}
$$
It is straightforward that exponentially distributed random variables possess the memoryless property:
$$
P(X > t + u \mid X > t) = \frac{P(X > t + u)}{P(X > t)} = \frac{e^{-\lambda(t+u)}}{e^{-\lambda t}} = e^{-\lambda u} = P(X > u).
$$
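A Monte Carlo check of the memoryless property (with illustrative values λ = 0.5, t = 2, u = 1) looks as follows:

```python
import numpy as np

rng = np.random.default_rng(3)

lam, t, u = 0.5, 2.0, 1.0          # illustrative parameter choices
x = rng.exponential(1.0 / lam, 1_000_000)
lhs = (x > t + u).mean() / (x > t).mean()   # estimates P(X > t+u | X > t)
rhs = np.exp(-lam * u)                      # P(X > u)
print(lhs, rhs)                             # both near 0.6065
```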
Actually, they are the only ones with the property, but there are other random variables
that obey weaker versions, as the following exercise displays:
Exercise 3 A way to adapt the memoryless property such that it makes sense for integer valued random variables is the following:
$$
P[X \geq m + n \mid X \geq m] = P[X \geq n] \quad \text{for all } m, n \in \mathrm{IN}_0. \tag{A.2}
$$
Let N be a random variable with its outcomes in IN_0, satisfying (A.2), with P[N = 0] = p for some p ∈ (0, 1).
a. Determine P[N ≥ n] for all n ∈ IN_0.
b. Determine P[N = n] for all n ∈ IN.
Another easy to prove and frequently applied result about exponentially distributed
random variables is the following:
Theorem A.2 Let X_1 ∼ exp(λ_1), X_2 ∼ exp(λ_2) and X_1 ⊥ X_2. Then their minimum is an exponentially distributed random variable with rate λ_1 + λ_2, i.e.,
$$
\min(X_1, X_2) \sim \exp(\lambda_1 + \lambda_2).
$$
Proof
$$
P(\min(X_1, X_2) > z) = P(X_1 > z \text{ and } X_2 > z) = P(X_1 > z) \cdot P(X_2 > z) = e^{-\lambda_1 z}\, e^{-\lambda_2 z} = e^{-(\lambda_1 + \lambda_2) z}. \qquad \Box
$$
Failure Rate Function
Let X ∼ F, f be a positive random variable with a probability distribution function F and a continuous probability density f. For example, the random variable X represents the lifetime of some item. The failure rate function of the random variable X is defined by
$$
r(t) = \frac{f(t)}{1 - F(t)}
$$
for those values of t with F(t) < 1. The failure rate has a useful probabilistic interpretation. Think of the random variable X as the lifetime of an item. The probability that an item of age t will fail in the next Δt time units is given by
$$
P[t < X \leq t + \Delta t \mid X > t] = \frac{P[t < X \leq t + \Delta t]}{P[X > t]} = \frac{f(t)\,\Delta t}{1 - F(t)} + o(\Delta t) \quad \text{as } \Delta t \to 0.
$$
Hence r(t)Δt gives approximately the probability that an item of age t will fail in the next Δt time units when Δt is small. Hence the name failure rate.
It is easy to verify that exponentially distributed random variables have a constant failure rate. This comports with the memoryless property. It is not difficult to prove that these are the only random variables with a constant failure rate; the course Differentiation & Integration Theory will provide the theory needed to do this. Other important cases are the case of an increasing failure rate (the older, the worse) and the case of a decreasing failure rate (the older, the better).
Expectation
If X is discrete, its expectation is defined by E[X] = Σ_{x: p(x)>0} x P(X = x). If X is continuous, it has an expectation of E[X] = ∫_{−∞}^{∞} x f(x) dx. Other convenient formulas are
$$
E[X] = \int_0^{\infty} [1 - F(x)]\, dx \tag{A.3}
$$
for any non-negative random variable, and
$$
E[N] = \sum_{n=0}^{\infty} P(N > n) \tag{A.4}
$$
if N is non-negative and integer-valued.
Conditioning upon an Event or Random Variable
Many times, when a problem is too hard to tackle at once because there are too many unknowns, it might be convenient to act as if some unknown variable is known. One can solve the problem conditioned on the value of the presumed-to-be-known unknown. Suppose Y is some random variable and the optimum of some problem equals f(y) when we assume that {Y = y}. Then the expected answer will be E[f(Y)]. In general, this leads to the Law of Total Expectation. Let A be an event and let X and Y be random variables. Two variants are:
$$
E[X] = \sum_{y:\, p(y) > 0} E[X \mid Y = y]\, P(Y = y) \quad \text{if } Y \text{ is discrete,} \tag{A.5}
$$
$$
E[X] = \int_{y:\, f_Y(y) > 0} E[X \mid Y = y]\, f_Y(y)\, dy \quad \text{if } Y \text{ is continuous.} \tag{A.6}
$$
The technique can also be applied to compute the probability of some event. Then it is called the Law of Total Probability, again in two variants:
$$
P(A) = \sum_{y:\, p(y) > 0} P(A \mid Y = y)\, P(Y = y) \quad \text{if } Y \text{ is discrete,} \tag{A.7}
$$
$$
P(A) = \int_{y:\, f_Y(y) > 0} P(A \mid Y = y)\, f_Y(y)\, dy \quad \text{if } Y \text{ is continuous.} \tag{A.8}
$$
Independent and Identically Distributed (i.i.d.)
Let X_1, X_2, . . . be a sequence of random variables. The sequence is called independent and identically distributed (i.i.d.) if
$$
X_i \perp X_j \quad \text{for all } i \neq j,
$$
$$
X_i \sim X_j \quad \text{for all } i \neq j.
$$
Examples of Conditioning
The following theorem can be proven by applying the law of total expectation. There
are milder criteria under which the expression remains valid, but for the sake of the
argument we choose the (i.i.d.)-assumption.
Theorem A.3 Let X_1, X_2, . . . be an i.i.d. sequence of random variables. Let N be a discrete random variable with outcomes in IN = {1, 2, 3, . . .}. If N ⊥ X_i for all i ∈ IN, then
$$
E\Big[\sum_{k=1}^{N} X_k\Big] = E[N]\, E[X_1]. \tag{A.9}
$$
Proof Condition upon the event {N = n}, i.e., apply the law of total expectation:
$$
\begin{aligned}
E\Big[\sum_{k=1}^{N} X_k\Big]
&= \sum_{n=1}^{\infty} E\Big[\sum_{k=1}^{N} X_k \,\Big|\, N = n\Big]\, P(N = n) \\
&= \sum_{n=1}^{\infty} E\Big[\sum_{k=1}^{n} X_k \,\Big|\, N = n\Big]\, P(N = n) \\
&= \sum_{n=1}^{\infty} E\Big[\sum_{k=1}^{n} X_k\Big]\, P(N = n) \\
&= \sum_{n=1}^{\infty} \sum_{k=1}^{n} E(X_k)\, P(N = n) \\
&= \sum_{n=1}^{\infty} \sum_{k=1}^{n} E(X_1)\, P(N = n) \\
&= E(X_1) \sum_{n=1}^{\infty} P(N = n) \sum_{k=1}^{n} 1 \\
&= E(X_1) \sum_{n=1}^{\infty} n\, P(N = n) \\
&= E(X_1)\, E(N).
\end{aligned}
$$
The third equality is due to the assumption that the X_i's are independent of N. □
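Theorem A.3 can be illustrated by simulation. In the sketch below (our own illustrative choices) X_i ∼ exp(1), so E[X_1] = 1, and N = 1 + Poisson(3), so E[N] = 4, with N drawn independently of the X_i:

```python
import numpy as np

rng = np.random.default_rng(4)

trials = 50_000
N = 1 + rng.poisson(3, trials)              # N >= 1, E[N] = 4, independent
sums = np.array([rng.exponential(1.0, n).sum() for n in N])
print(sums.mean(), N.mean() * 1.0)          # both near E[N] * E[X_1] = 4
```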
Another example of conditioning provides the proof of the following theorem:
Theorem A.4 Let X ∼ exp(λ_1), Y ∼ exp(λ_2) and X ⊥ Y. Then
$$
P(X \leq Y) = \frac{\lambda_1}{\lambda_1 + \lambda_2}. \tag{A.10}
$$
Proof Condition P(X ≤ Y) upon the event {Y = y}:
$$
\begin{aligned}
P(X \leq Y) &= \int_0^{\infty} P(X \leq Y \mid Y = y)\, f_Y(y)\, dy \\
&= \int_0^{\infty} P(X \leq y \mid Y = y)\, \lambda_2 e^{-\lambda_2 y}\, dy \\
&= \int_0^{\infty} P(X \leq y)\, \lambda_2 e^{-\lambda_2 y}\, dy \\
&= \int_0^{\infty} (1 - e^{-\lambda_1 y})\, \lambda_2 e^{-\lambda_2 y}\, dy \\
&\;\;\vdots \\
&= \frac{\lambda_1}{\lambda_1 + \lambda_2}.
\end{aligned}
$$
Which equality uses the assumption that X ⊥ Y? □
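A Monte Carlo check of (A.10), with illustrative rates λ_1 = 2 and λ_2 = 3:

```python
import numpy as np

rng = np.random.default_rng(5)

lam1, lam2 = 2.0, 3.0               # illustrative rates
x = rng.exponential(1.0 / lam1, 1_000_000)
y = rng.exponential(1.0 / lam2, 1_000_000)
print((x <= y).mean(), lam1 / (lam1 + lam2))   # both near 0.4
```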
Three Useful Sums
These three standard summations are supposed to be known and might be needed in exercises and exams:
$$
\sum_{k=0}^{\infty} a^k = \frac{1}{1-a} \quad \text{if } |a| < 1, \tag{A.11}
$$
$$
\sum_{k=1}^{\infty} k\, a^k = \frac{a}{(1-a)^2} \quad \text{if } |a| < 1, \tag{A.12}
$$
$$
\sum_{k=0}^{\infty} \frac{a^k}{k!} = e^a \quad \text{for all } a \in \mathrm{IR}. \tag{A.13}
$$
The Error Term o(t)
Let f : IR → IR be a function. Suppose that f(t) is relatively small compared to t for small instances of t, or, in formula, lim_{t→0} f(t)/t = 0. Then f is called of order smaller than t, notation f(t) = o(t). The notation o(t) stands for any unspecified expression with the property that lim_{t→0} o(t)/t = 0. o(t) is called the error term.
Lemma A.5 Let λ ∈ IR. Then the function f given by f(t) = e^{λt} − 1 − λt is of order o(t).
Proof We apply the standard summation in Equation (A.13) with a = λt:
$$
e^{\lambda t} - 1 - \lambda t = -1 - \lambda t + \frac{1}{0!} + \frac{\lambda t}{1!} + \frac{\lambda^2 t^2}{2!} + \frac{\lambda^3 t^3}{3!} + \cdots = \frac{\lambda^2 t^2}{2!} + \frac{\lambda^3 t^3}{3!} + \cdots = o(t).
$$
The last equality is due to the fact that t^k is relatively small compared to t for small instances of t and any k > 1. □
B Index
⊥; independence, ii
μ_ij; mean first passage time, 11
μ_jj; mean returning time, 11
IN_(0) = {(0, )1, 2, 3, . . .}, 4
#; number of elements, 47
o(Δt) (error term), viii, 25
p; equilibrium of a CTMC, 36
π; equilibrium of a DTMC, 14
p_ij; one step transition probability, 6
p_ij; one jump transition probability, 30
p_ij^(n); n-step transition probability, 10
IR; the set of real numbers, 5
ρ; server utilization, 39
X ∼ F, ii
τ_j; first passage time, 11
absorbing state, 7, 17, 30
accessible, 16
aperiodic, 17
arrival rate, 22
balance equation, 37
cdf, i
closed set, 17
communication, 16
conditioning, vi
continuous random variable, i
counting process, 21
CTMC, 20
density, i
discrete random variable, i
embedding, 40
equilibrium, 14, 36
equivalence relation, 16
Erlang Distribution, 23
event, i
exogenous, 46
expectation, v
Exponential Distribution, iii
failure rate, v, 22
flow rate equation, 37
Geometric Distribution, ii
greatest common divisor, 17
i.i.d., vi, 5
independent increments, 25
interarrival times, 22
irreducible, 16
law of total expectation, probability, vi
Little's Law, 49
Markov chain, 5
Markov property, 5
memoryless property, iv
minimal closed set, 17
n-step probability, 10
one-step transition probabilities, 6
outcome, i
PASTA, 46
pdf, i
periodicity, 17
pmf, i, 9
Poisson Distribution, iii
Poisson Process, 22
probability, i
Random Variable, i
rate, 13
realization, i, 5
regenerative state, 14
row-stochastic, 6
sample, i, 5
sample space, i
service rate, 22
sojourn time, 30
s-S-stock inventory, 9
state, i, 4
state space, 4
stationary, 36
stationary distribution, 14
stationary increments, 25
time index set, 4
time-homogeneous, 6, 21
transition probability matrix, 34
transition rate, 30, 34
transition rate matrix, 34
transition probability diagram, 8
Uniform Distribution, iii, 28