
ECM3724 Stochastic Processes


Overview of Probability

We call $(\Omega, X, P)$ a probability space. Here $\Omega$ is the sample space, $X : \Omega \to \mathbb{R}$ is a random variable (RV) and $P$ is a probability (measure). This is a function on subsets of $\Omega$. Elements $\omega \in \Omega$ are called outcomes. Subsets of $\Omega$ are called events.
Given $A \subseteq \mathbb{R}$, $P\{X \in A\} = P\{\omega \in \Omega : X(\omega) \in A\}$. Given $x \in \mathbb{R}$, $\{X = x\} = \{\omega \in \Omega : X(\omega) = x\}$.
Example
Suppose we toss a coin twice; then $\Omega = \{HH, HT, TH, TT\}$, $|\Omega| = 4$, and $X$ is the number of heads. Then $\mathcal{X}$ is the set of values that $X$ takes, that is $\mathcal{X} = \{0, 1, 2\}$. Now $\{X = 1\} = \{\omega : X(\omega) = 1\} = \{HT, TH\}$, so $P(X = 1) = P(HT) + P(TH)$.
$\Omega$ (or $\mathcal{X}$) could be discrete, for example $\mathcal{X} = \{x_1, x_2, \ldots\}$. We require $\sum_{x \in \mathcal{X}} P(X = x) = 1$. $\Omega$ (or $\mathcal{X}$) could also be continuous. If $\mathcal{X} = [0, 1]$, then $P(X \in A) = \int_A f_X(x)\,dx$. Here $f_X(x)$ is a probability density function (pdf), with $f_X(x) \geq 0$ and $\int_{\mathcal{X}} f_X(x)\,dx = 1$.
Expectation
If $\mathcal{X} = \{x_1, x_2, \ldots\}$ and $g : \mathcal{X} \to \mathbb{R}$ then $E(g(X)) = \sum_i g(x_i) P(X = x_i)$. If $g(X) = X$, then $E(X) := \mu_X$, the mean of $X$. If $g(X) = (X - \mu_X)^2$ then $E(g(X)) := \mathrm{Var}(X) = \sigma_X^2 > 0$, the variance of $X$. Also $\mathrm{Var}(X) = E(X^2) - \mu_X^2$.
In the continuous case, $E(g(X)) = \int_{\mathcal{X}} g(x) f_X(x)\,dx$.
Common Distributions
If $X \sim \mathrm{Unif}[0,1]$ then we say that $X$ is distributed as Uniform$[0,1]$. If $A = [a, b] \subseteq \mathcal{X}$ then $P(X \in A) = \int_a^b 1\,dx = b - a$. If $A = \mathbb{Q} \cap [0,1]$, $P(X \in A) = 0$. There exist subsets of $[0,1]$ for which $P(X \in A)$ is undefined.
If $X \sim \mathrm{Ber}(p)$ then we say that $X$ is distributed as a Bernoulli trial with success probability $p$. Here $\mathcal{X} = \{0, 1\}$, $P(X = 0) = 1 - p$, $P(X = 1) = p$, $p \in (0, 1)$. We can extend this over multiple independent, identically distributed (IID) trials (recall that events $A$ and $B$ are independent if $P(A|B) = P(A)$, $P(B|A) = P(B)$, or $P(A \cap B) = P(A)P(B)$). In this case $X \sim \mathrm{Bin}(n, p)$, that is $X$ is distributed binomially with $n$ trials and success probability $p$. In this case $\mathcal{X} = \{0, 1, 2, \ldots, n\}$. For $0 \leq r \leq n$, $P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$. The Binomial distribution has $E(X) = np$, $\mathrm{Var}(X) = np(1-p)$.
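The identities $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$ can be checked by simulation. The following is an illustrative sketch only; the parameters $n = 10$, $p = 0.3$ are assumed for the demo:

```python
import random

def simulate_binomial(n, p, trials=100_000, seed=0):
    """Simulate Bin(n, p) as a sum of n IID Bernoulli(p) trials."""
    rng = random.Random(seed)
    samples = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((x - mean) ** 2 for x in samples) / trials
    return mean, var

mean, var = simulate_binomial(n=10, p=0.3)
print(mean, var)  # close to np = 3.0 and np(1-p) = 2.1
```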
Let $\mathcal{X} = \{0, 1, 2, \ldots\}$. We say that $X \sim \mathrm{Poisson}(\lambda)$ if $P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}$ for $r \in \mathbb{N} \cup \{0\}$. Recall that $\exp(x) = \sum_{r \geq 0} \frac{x^r}{r!}$, which means that $\sum_r P(X = r) = 1$. For the Poisson distribution the mean and variance are both $\lambda$. Other discrete distributions include the geometric and hypergeometric distributions.

We say $X \sim \mathrm{Exp}(\lambda)$ if the pdf of $X$ is given by $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. Now
$$P(X > x) = \int_x^\infty \lambda e^{-\lambda u}\,du = \left[-e^{-\lambda u}\right]_x^\infty = e^{-\lambda x}.$$
The exponential distribution is memoryless, that is the time left to wait does not depend on the time already waited. More specifically, $P(X > t + s \mid X > s) = P(X > t) = e^{-\lambda t}$.
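The memoryless property can be seen empirically. The sketch below (with assumed parameters $\lambda = 2$, $t = 0.5$, $s = 1$) compares the conditional tail $P(X > t + s \mid X > s)$ with $P(X > t) = e^{-\lambda t}$:

```python
import math
import random

def memoryless_check(lam=2.0, t=0.5, s=1.0, n=200_000, seed=1):
    """Estimate P(X > t+s | X > s) and P(X > t) for X ~ Exp(lam)."""
    rng = random.Random(seed)
    xs = [rng.expovariate(lam) for _ in range(n)]
    beyond_s = [x for x in xs if x > s]
    cond = sum(x > t + s for x in beyond_s) / len(beyond_s)
    uncond = sum(x > t for x in xs) / n
    return cond, uncond, math.exp(-lam * t)

cond, uncond, exact = memoryless_check()
print(cond, uncond, exact)  # all approximately e^(-1) = 0.368
```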
Gaussian/Normal distribution
We say $X \sim N(\mu, \sigma^2)$ if the pdf is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
for $x \in \mathbb{R}$. Here $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
The Central Limit Theorem (CLT)
Suppose $(X_i)_{i=1}^n$ are IID RVs with $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$, and let $S_n = \sum_{i=1}^n X_i$. Then
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \to N(0, 1),$$
that is, the distribution of $Z_n$ converges to $N(0, 1)$. If $A = [a, b]$ then
$$P(Z_n \in A) \to \int_a^b \frac{\exp(-u^2/2)}{\sqrt{2\pi}}\,du.$$
This can be applied, for example, if we take $S_n$ to be the number of heads in $n$ coin tosses. Then $E(S_n) = n/2$ and $\mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i) = n/4$. Hence
$$P\left(\frac{S_n - \frac{n}{2}}{\frac{1}{2}\sqrt{n}} \in [a, b]\right) \to \int_a^b \frac{\exp(-u^2/2)}{\sqrt{2\pi}}\,du.$$
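As an illustrative sketch of the CLT for coin tosses (the sample sizes are arbitrary choices), we can standardise the number of heads and estimate $P(-1 \leq Z_n \leq 1)$, which should approach $\Phi(1) - \Phi(-1) \approx 0.683$:

```python
import math
import random

def clt_coin_estimate(n=10_000, trials=20_000, seed=2):
    """Estimate P(-1 <= Z_n <= 1) where Z_n = (S_n - n/2) / (sqrt(n)/2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = bin(rng.getrandbits(n)).count("1")  # S_n for n fair tosses
        z = (heads - n / 2) / (math.sqrt(n) / 2)
        if -1 <= z <= 1:
            hits += 1
    return hits / trials

p = clt_coin_estimate()
print(p)  # approximately 0.683
```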

Moment Generating Functions (MGF)


For a RV $X$, the MGF of $X$ is the function $M_X(t) = E(e^{tX})$, a function of $t \in \mathbb{R}$. In the discrete case, $E(e^{tX}) = \sum_i e^{t x_i} P(X = x_i)$, where $X \in \mathcal{X} = \{x_1, x_2, \ldots\}$. In the continuous case, $E(e^{tX}) = \int_{\mathcal{X}} e^{tx} f_X(x)\,dx$.
Properties
$M_X(0) = 1$.
$\left.\frac{d^r M_X}{dt^r}\right|_{t=0} = E(X^r)$.
If $Z = X + Y$ and $X, Y$ are independent then $M_Z(t) = M_X(t) M_Y(t)$.
If $X, Y$ are RVs and $M_X(t) = M_Y(t)$ then $X$ and $Y$ have the same probability distribution, provided $M_X(t)$ is continuous in a neighbourhood of $t = 0$.
Exercise
Compute $M_X(t)$ for the Bernoulli distribution. Compute $M_Y(t)$ for $Y = X_1 + \ldots + X_n$, where the $X_i$ are IID Bernoulli RVs. What is the distribution of $Y$? What happens as $n \to \infty$ (and $p \to 0$ with $\lambda = np$ fixed)?
We have $M_X(t) = p(e^t - 1) + 1$. Hence $M_Y(t) = (p(e^t - 1) + 1)^n$ by the properties of the MGF. If $Y \sim \mathrm{Bin}(n, p)$ then $P(Y = r) = \binom{n}{r} p^r (1-p)^{n-r}$, so
$$E(e^{tY}) = \sum_{r=0}^n \binom{n}{r} p^r (1-p)^{n-r} e^{tr} = (p(e^t - 1) + 1)^n$$
by the Binomial theorem. Hence the sum of IID Bernoulli trials has a Binomial distribution. Now fix $\lambda > 0$ and let $\lambda = np$ with $n \to \infty$ (so $p \to 0$). (Note that in a special case with $n \to \infty$ and $p$ close to $\frac{1}{2}$ we can apply the CLT.) Using the fact that $\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x$, we have
$$M_Y(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to \exp(\lambda(e^t - 1)).$$
If $Z \sim \mathrm{Poisson}(\lambda)$, $P(Z = r) = \frac{\lambda^r e^{-\lambda}}{r!}$ for $r \geq 0$. Then
$$M_Z(t) = \sum_{r=0}^\infty e^{tr} \frac{\lambda^r e^{-\lambda}}{r!} = \sum_{r=0}^\infty \frac{(\lambda e^t)^r e^{-\lambda}}{r!} = \exp(\lambda(e^t - 1))$$
since $e^x = \sum_{r=0}^\infty \frac{x^r}{r!}$. This agrees with the limiting case of the Binomial distribution.
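The Binomial-to-Poisson limit can also be seen numerically. This sketch (with assumed values $\lambda = 3$, $n = 1000$) compares the $\mathrm{Bin}(n, \lambda/n)$ and $\mathrm{Poisson}(\lambda)$ probabilities term by term:

```python
import math

def binom_pmf(n, p, r):
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(lam, r):
    return lam**r * math.exp(-lam) / math.factorial(r)

lam, n = 3.0, 1000
for r in range(6):
    print(r, round(binom_pmf(n, lam / n, r), 5), round(poisson_pmf(lam, r), 5))
# the two columns agree to roughly three decimal places
```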

Probability generating functions (PGF)


These are useful in cases when X takes integer values.
Definition
Suppose $X$ takes values in $\mathbb{N}$. The PGF for $X$ is the function $G_X(\eta) = E(\eta^X)$, that is $G_X(\eta) = \sum_{n=0}^\infty \eta^n P(X = n)$. If $\eta = e^t$, then we recover $M_X(t)$.
Properties
$G_X(1) = 1$.
$\frac{dG_X}{d\eta} = \sum_{n=1}^\infty n \eta^{n-1} P(X = n)$, hence $\left.\frac{dG_X}{d\eta}\right|_{\eta=1} = E(X)$.
$G''_X(1) = E(X(X-1))$.
$G_{X+Y}(\eta) = G_X(\eta) G_Y(\eta)$ if $X, Y$ are independent.
$\mathrm{Var}(X) = G''_X(1) + G'_X(1) - [G'_X(1)]^2$.
Given a series for $G_X(\eta)$, the coefficient of $\eta^n$ is precisely $P(X = n)$. Moreover, $G_X(0) = P(X = 0)$.
The final property can be compared to $M''_X(0) - M'_X(0)^2 = \mathrm{Var}(X)$.

Example
Let $X = X_1 + \ldots + X_n$, where the $X_i \sim \mathrm{Bernoulli}(p)$. Now $G_{X_i}(\eta) = 1 - p + p\eta$. If the $X_i$ are independent then $G_X(\eta) = G_{X_1}(\eta) G_{X_2}(\eta) \cdots G_{X_n}(\eta) = (1 - p + p\eta)^n$. So $X \sim \mathrm{Bin}(n, p)$.
Example
Consider $G_X(\eta) = \frac{1}{2 - \eta}$. What distribution does $X$ have? Now
$$G_X(\eta) = E(\eta^X) = \frac{1}{2 - \eta} = \frac{1}{2} \cdot \frac{1}{1 - \eta/2} = \frac{1}{2}\sum_{n=0}^\infty \left(\frac{\eta}{2}\right)^n = \sum_{n=0}^\infty \eta^n P(X = n),$$
which means that $P(X = n) = \frac{1}{2^{n+1}}$, that is $X$ has a geometric distribution.
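We can verify the coefficient extraction numerically (an illustrative sketch): summing the claimed probabilities $1/2^{n+1}$ against powers of $\eta$ should reproduce $G_X(\eta) = 1/(2 - \eta)$, and the mean should equal $G'_X(1) = 1$:

```python
def pgf(eta):
    """G_X(eta) = 1 / (2 - eta) from the example."""
    return 1.0 / (2.0 - eta)

def pmf(n):
    """Claimed coefficients: P(X = n) = 1 / 2**(n + 1)."""
    return 1.0 / 2 ** (n + 1)

eta = 0.7
series = sum(pmf(n) * eta**n for n in range(200))  # partial sum of the PGF series
mean = sum(n * pmf(n) for n in range(200))         # E(X) from the pmf
print(series, pgf(eta))  # equal: the coefficients reproduce G_X
print(mean)              # approximately 1 = G_X'(1)
```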

Conditional Expectation and Joint RVs


Consider $X$ and $Y$ discrete RVs taking values in $\mathcal{X} = \{x_1, x_2, \ldots\}$ and $\mathcal{Y} = \{y_1, y_2, \ldots\}$. The joint probability function is given by $f_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j)$. The marginal probability (distribution) function is $f_X(x_i) = P(X = x_i) = \sum_j f_{X,Y}(x_i, y_j)$, or $f_Y(y_j) = P(Y = y_j) = \sum_i f_{X,Y}(x_i, y_j)$. The conditional probability of $X = x_i$ given $Y = y_j$ is
$$f_{X|Y}(x_i | y_j) = P(X = x_i | Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)}.$$
If $X, Y$ are independent then $f_{X,Y}(x_i, y_j) = f_X(x_i) f_Y(y_j)$ for all $i, j$. Given $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, $E(g(X,Y)) = \sum_{i,j} g(x_i, y_j) f_{X,Y}(x_i, y_j)$. If $X, Y$ are independent and $g(X,Y) = h_1(X) h_2(Y)$ then $E(g(X,Y)) = E(h_1(X)) E(h_2(Y))$.
The conditional expectation of $X$ given $Y$ is the quantity $E(X|Y)$. This is a function of $Y$, the average over $X$ given a value of $Y$. If $Y = y_j$, then
$$E(X | Y = y_j) = \sum_i x_i P(X = x_i | Y = y_j) = \sum_i x_i \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)},$$
a function of $y_j$. $E(X|Y)$ is a RV which is governed by the probability distribution of $Y$, hence we can also take expectations.
Tower rule
$E(E(X|Y)) = E(X)$.
We have a useful check: if $X$ and $Y$ are independent then $E(X|Y) = E(X)$. In general
$$E(E(X|Y)) = \sum_j \left(\sum_i x_i \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)}\right) f_Y(y_j) = \sum_i x_i \sum_j f_{X,Y}(x_i, y_j) = \sum_i x_i f_X(x_i) = E(X).$$
Compound processes
Suppose $(X_i)_{i=1}^\infty$ are IID RVs with PGF $G_X(\eta)$ (since the $X_i$ are IID, $G_X = G_{X_i}$). Suppose $N$ is a RV with PGF $G_N(\eta)$, independent of the $X_i$. Let $Z = X_1 + X_2 + \ldots + X_N$. $Z$ is a compound process, a random sum of random variables.
Proposition
For the compound process $Z$, the PGF is $G_Z(\eta) = G_N(G_X(\eta)) = G_N \circ G_X(\eta)$.


Proof
By definition
$$G_Z(\eta) = E(\eta^Z) = \sum_{n=0}^\infty \eta^n P(Z = n) = E(\eta^{X_1 + X_2 + \ldots + X_N}) = E(E(\eta^{X_1 + X_2 + \ldots + X_N} | N)) \quad \text{[Tower rule]}$$
$$= \sum_{n=0}^\infty E(\eta^{X_1 + X_2 + \ldots + X_n} | N = n) P(N = n) = \sum_{n=0}^\infty E(\eta^{X_1}) E(\eta^{X_2}) \cdots E(\eta^{X_n}) P(N = n) \quad \text{[Independence]}$$
$$= \sum_{n=0}^\infty (G_X(\eta))^n P(N = n) = G_N(G_X(\eta)) \quad \text{[Definition of PGF]}.$$
The coefficient of $\eta^n$ in $G_Z(\eta)$ gives $P(Z = n)$.


Example
Suppose we roll a dice and then flip a number of coins equal to the number on the dice. If $Z$ is the number of heads, what is $P(Z = k)$?
By the previous proposition we have $G_Z(\eta) = G_N(G_X(\eta))$, where $N \sim \mathrm{Unif}\{1, \ldots, 6\}$ (the value on the dice) and $X \sim \mathrm{Bernoulli}(1/2)$ (the flip of the coin). Now
$$G_X(\eta) = E(\eta^X) = \sum_{n=0}^1 \eta^n P(X = n) = \frac{1}{2}(1 + \eta)$$
and
$$G_N(\eta) = E(\eta^N) = \sum_{n=1}^6 \eta^n P(N = n) = \frac{1}{6}\sum_{n=1}^6 \eta^n.$$
Hence
$$G_Z(\eta) = G_N\left(\frac{1}{2}(1 + \eta)\right) = \frac{1}{6}\sum_{n=1}^6 \frac{1}{2^n}(1 + \eta)^n.$$
It follows that $P(Z = k)$ is given by the $\eta^k$ coefficient in the sum. By the binomial theorem, we have
$$P(Z = k) = \frac{1}{6}\sum_{n=1}^6 \binom{n}{k}\frac{1}{2^n},$$
recalling that $\binom{n}{k} = 0$ for $k > n$.
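A Monte Carlo check of this formula (an illustrative sketch; the trial count is arbitrary):

```python
import math
import random

def exact_pz(k):
    """P(Z = k) = (1/6) * sum_{n=1}^{6} C(n, k) / 2**n."""
    return sum(math.comb(n, k) / 2**n for n in range(1, 7)) / 6

def simulate_pz(k, trials=200_000, seed=3):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n = rng.randint(1, 6)                              # roll the dice
        heads = sum(rng.random() < 0.5 for _ in range(n))  # flip n coins
        hits += heads == k
    return hits / trials

for k in range(3):
    print(k, round(exact_pz(k), 4), round(simulate_pz(k), 4))
```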

Branching Processes

Let $S_n$ be the number of individuals in a population at time $n$. Suppose $S_0 = 1$ (one individual at time 0). Individuals evolve at each timestep according to a common RV $X$, and evolve independently of the others. We assume $X$ has PGF $G_X(\eta)$. Let $X_i$, $i \geq 1$, be IID copies of $X$. We want to work out the long term behaviour of $S_n$, $E(S_n)$ and $P(S_n = 0)$.
We use generating function analysis. For $S_n$, denote the PGF by $G_n(\eta)$. Since $S_1 = X$, $G_1(\eta) = G_X(\eta)$. For $S_2$, $G_2(\eta) = E(\eta^{S_2}) = E(E(\eta^{S_2} | S_1)) = G_X \circ G_X(\eta)$ by the previous proposition. Similarly, $G_3(\eta) = E(E(\eta^{S_3} | S_2)) = G_X \circ G_X \circ G_X(\eta)$.
Proposition
$G_n(\eta) = G_X \circ G_X \circ \ldots \circ G_X(\eta)$ ($n$-fold composition). Moreover $G_n(\eta) = G_X(G_{n-1}(\eta)) = G_{n-1}(G_X(\eta))$.
Proof
This follows easily by induction.
Remark: the coefficient of $\eta^k$ in $G_n(\eta)$ gives $P(S_n = k)$.


Expected behaviour of $S_n$
We want to study $E(S_n) = \left.\frac{dG_n}{d\eta}\right|_{\eta=1} = G'_n(1)$. Let $\mu = E(X) = G'_X(1)$. We work out $E(S_n)$ iteratively. Now
$$G_n(\eta) = G_X(G_{n-1}(\eta)) \implies G'_n(\eta) = G'_X(G_{n-1}(\eta))\, G'_{n-1}(\eta) \quad \text{[Chain rule]}$$
$$\implies G'_n(1) = G'_X(G_{n-1}(1))\, G'_{n-1}(1) = G'_X(1)\, G'_{n-1}(1) \quad [\text{since } G_{n-1}(1) = 1 \text{ for all RVs}]$$
$$\implies E(S_n) = \mu E(S_{n-1}).$$
Since $E(S_1) = \mu$, we can apply this iteratively to get $\mu_n := E(S_n) = \mu^n$.
Probability of extinction
Recall, given $G_n(\eta)$, $P(S_n = 0) = G_n(0)$. Let $e_n = G_n(0)$ be the probability of extinction by time $n$. Let $e = \lim_{n\to\infty} e_n$ be the probability of ultimate extinction. Now $e_1 = G_X(0)$, $e_2 = G_X(G_X(0)) = G_X(e_1)$. By iteration, $e_{n+1} = G_X(e_n)$, that is $e_n = G_X \circ G_X \circ \ldots \circ G_X(0)$ ($n$-fold composition).
Finding e
We can begin to find $e$ by plotting $G_X(\eta)$ for $\eta \in [0, 1]$. Note that $G_X(0) \in [0, 1]$, $G_X(1) = 1$ and, since $G_X(\eta) = \sum_{n=0}^\infty \eta^n P(X = n)$, $G_X(\eta)$ is increasing. There are two cases, as seen in the following figure.

Figure 1: The two behaviours of the sequence $(e_n)$ as $n \to \infty$.

$(e_n)$ is an increasing sequence, and $e = \lim_{n\to\infty} e_{n+1} = \lim_{n\to\infty} G_X(e_n) = G_X(\lim_{n\to\infty} e_n)$ since $G_X(\eta)$ is continuous for $\eta \in [0, 1]$. Hence $e = G_X(e)$, that is $e$ is a fixed point of $G_X(\eta)$. Remark: if $\mu_X := G'_X(1) \leq 1$ then $e = 1$. If $\mu_X > 1$ then $e \neq 1$; in fact $e$ is the smallest root in $[0, 1]$ of $G_X(\eta) = \eta$. Useful check: $G_X(1) = 1$, so $\eta = 1$ is always a root.
Example
Suppose $P(X = 0) = 0.3$, $P(X = 1) = 0.5$, $P(X = 2) = 0.2$. Work out $E(S_n)$, $P(S_2 = 2)$ and $e$.
Note $G_X(\eta) = 0.3 + 0.5\eta + 0.2\eta^2$. $\mu_X = 0.5 \cdot 1 + 0.2 \cdot 2 = 0.9$, hence $E(S_n) = 0.9^n$ since $\mu_n = \mu_X^n$. Since $\mu_X \leq 1$, $e = 1$. $G_2(\eta) = G_X \circ G_X(\eta) = 0.3 + 0.5(0.3 + 0.5\eta + 0.2\eta^2) + 0.2(0.3 + 0.5\eta + 0.2\eta^2)^2$. Hence, since $P(S_2 = 2)$ is the $\eta^2$ coefficient of $G_2(\eta)$, $P(S_2 = 2) = 0.5 \cdot 0.2 + 0.12 \cdot 0.2 + 0.2 \cdot (0.5)^2 = 0.174$.
Example
Suppose $G_X(\eta) = 0.2 + 0.4\eta + 0.3\eta^2 + 0.1\eta^3$. Work out $e$ and $E(S_n)$.
$E(S_n) = (G'_X(1))^n = (0.4 + 0.6 + 0.3)^n = (1.3)^n$. Since $\mu_X = 1.3 > 1$, $e < 1$. We need to solve $G_X(\eta) = \eta$. Rearranging this equation we get $\eta^3 + 3\eta^2 - 6\eta + 2 = 0$, that is $(\eta - 1)(\eta^2 + 4\eta - 2) = 0$, so the three roots are $\eta = 1$ and $\eta = -2 \pm \sqrt{6}$. We need to take the positive root for this to make sense as a probability, so $e = \sqrt{6} - 2$.
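The iteration $e_{n+1} = G_X(e_n)$, starting from $e_1 = G_X(0)$, can be run numerically for this example's PGF; it converges to the smallest fixed point $\sqrt{6} - 2 \approx 0.449$ (an illustrative sketch):

```python
import math

def G(eta):
    """Offspring PGF from this example."""
    return 0.2 + 0.4 * eta + 0.3 * eta**2 + 0.1 * eta**3

e = 0.0
for _ in range(100):  # e_{n+1} = G_X(e_n), starting from e_1 = G_X(0)
    e = G(e)

print(e, math.sqrt(6) - 2)  # both approximately 0.4495
```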

Envelope problem
Take two envelopes, one of which contains twice the amount of the other. Pick an envelope; suppose it contains amount $x$. Then the other envelope contains either $x/2$ or $2x$. The expected value for switching is $\frac{1}{2}\cdot\frac{x}{2} + \frac{1}{2}(2x) = \frac{5x}{4} > x$, suggesting that you should always switch. This is a misconception, since the same argument can be applied again, meaning that you would then be better off staying with the envelope you already have, which gives a paradox.
Consider $S_0 = Y$ random, with PGF $G_Y(\eta)$ specified. How do the previous results change? In this case, $S_1 = X_1 + X_2 + \ldots + X_Y$ and $S_{n+1} = X_1 + X_2 + \ldots + X_{S_n}$.
Proposition
The PGF $G_n(\eta)$ for $S_n$ is given by $G_n(\eta) = G_Y \circ \tilde{G}_n(\eta)$, where $\tilde{G}_n(\eta)$ is the $S_0 = 1$ case PGF.
Proof
We make the observation that $G_1(\eta) = E(E(\eta^{S_1} | Y)) = G_Y \circ \tilde{G}_1(\eta) = G_Y \circ G_X(\eta)$. We can apply this argument repeatedly to get the result.
Consequences
$\mu_n := E(S_n) = \mu_Y \tilde{\mu}_n$, where $\tilde{\mu}_n$ is the $S_0 = 1$ case mean. $e_n = G_Y(\tilde{e}_n)$, where $\tilde{e}_n$ is the $S_0 = 1$ case extinction probability. Hence $e = G_Y(\tilde{e})$, where $\tilde{e}$ is the ultimate extinction probability assuming $S_0 = 1$.
Example
Suppose $S_0 = 6$ and $G_X(\eta) = 0.3 + 0.5\eta + 0.2\eta^2$. Work out $E(S_n)$ and $e$.
We have $E(S_n) = \mu_Y \tilde{\mu}_n$. Since $\mu_Y = 6$ and $\tilde{\mu}_n = 0.9^n$, $E(S_n) = 6(0.9)^n$. Similarly, $e = G_Y(\tilde{e})$. Since $\tilde{e} = 1$, $e = G_Y(1) = 1$.
What if $G_Y(\eta) = (0.4\eta + 0.6)^3$?
This is a Binomial$(3, 0.4)$ distribution for $S_0$. So $E(Y) = 1.2$, which means that $E(S_n) = 1.2(0.9)^n$. Also $e = G_Y(\tilde{e}) = (0.4\tilde{e} + 0.6)^3 = 1$ since $\tilde{e} = 1$. Remark: if $\tilde{e} = 1$, then we always get $e = 1$ when $G_Y(\eta)$ is a well-defined PGF.

Poisson Processes

Definition
Events occur as a Poisson process if the intervals of time between events are IID exponentially distributed RVs. Recall that a RV $T$ has an exponential distribution if its pdf is given by $f_T(t) = \lambda e^{-\lambda t}$ for $t \geq 0$ and $\lambda > 0$ (and 0 otherwise). For example,
$$P(T > t) = \int_t^\infty \lambda e^{-\lambda u}\,du = e^{-\lambda t}, \qquad E(T) = \int_0^\infty t \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}.$$
The mean time between successive events is $\frac{1}{\lambda}$. The mean number of events per unit time is $\lambda$.
Let $T_1$ be the time to the first event, $T_2$ the time between the first and second events, ..., $T_k$ the time between the $(k-1)$-th and the $k$-th events. Let $S_n = T_1 + T_2 + \ldots + T_n$; this is the time to the $n$-th event. Assume $\{T_k\}_{k=1}^n$ is a sequence of IID RVs, each with exponential distribution of rate $\lambda$. Recall the memoryless property of the exponential distribution: $P(T > t + s | T > s) = P(T > t)$.
Questions: What is the distribution of $S_n$? Given a specified time $t$, how many events occur within this time?
Remark: Suppose we have $n$ lightbulbs in sequence. We may be interested to find $P(\min\{T_1, \ldots, T_n\} \leq t)$ or $P(\max\{T_1, \ldots, T_n\} \leq t)$.
Theorem 2.1
The time $S_n$ to the $n$-th event follows a gamma distribution with pdf $g_n(t) = \frac{\lambda(\lambda t)^{n-1}\exp(-\lambda t)}{(n-1)!}$ for $t \geq 0$, $n \geq 1$. Note that $n = 1$ gives the exponential distribution.
Proof
We will compute the moment generating function (MGF) for $S_n$ via the MGFs for the $T_k$, and show that this coincides with the MGF for a RV $Y$ with pdf $g_n(t)$. By uniqueness of MGFs, the distributions must then coincide. Recall that $M_T(t) = E(e^{tT})$. $T$ has an exponential distribution, hence
$$M_T(t) = \int_0^\infty e^{tu}(\lambda e^{-\lambda u})\,du = \lambda\int_0^\infty e^{-u(\lambda - t)}\,du = \left[-\frac{\lambda}{\lambda - t}\exp(-u(\lambda - t))\right]_0^\infty = \frac{\lambda}{\lambda - t},$$
assuming $t < \lambda$. It follows that
$$M_{S_n}(t) = E(e^{tS_n}) = E(e^{tT_1} e^{tT_2} \cdots e^{tT_n}) = E(e^{tT_1}) E(e^{tT_2}) \cdots E(e^{tT_n}) = \left(\frac{\lambda}{\lambda - t}\right)^n$$
since $T_1, \ldots, T_n$ are independent. If $Y$ is a RV with the gamma distribution pdf $g_n(x)$, then the MGF for $Y$ is precisely $\int_0^\infty e^{tx} g_n(x)\,dx$. By direct calculation, using $\int_0^\infty x^n e^{-x}\,dx = n!$ and substituting $y = x(\lambda - t)$, we have $M_Y(t) = \left(\frac{\lambda}{\lambda - t}\right)^n$. We can conclude that $M_{S_n}(t) = M_Y(t)$. Since these are continuous near $t = 0$, $S_n$ has the same (gamma) distribution as $Y$.

Suppose we now wish to fix some period of time $t$ and consider the number $N_t$ of events in this time.
Theorem 2.2
The number $N_t$ of events in time $t$ follows a Poisson distribution with parameter $\lambda t$, that is $N_t \sim \mathrm{Po}(\lambda t)$. That is, $P(N_t = r) = \frac{(\lambda t)^r \exp(-\lambda t)}{r!}$ for $r \geq 0$.
Proof
For $r \geq 1$, $P(\text{at least } r \text{ events in time } t) = P(\text{time to the } r\text{-th event is at most } t) = \int_0^t g_r(x)\,dx$. Then
$$P(\text{exactly } r \text{ events in time } t) = \int_0^t g_r(x)\,dx - \int_0^t g_{r+1}(x)\,dx = \frac{(\lambda t)^r \exp(-\lambda t)}{r!}.$$
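Theorem 2.2 can be illustrated by simulation (a sketch with assumed parameters $\lambda = 3$, $t = 2$): generate exponential inter-arrival times, count the events that fall in $[0, t]$, and compare the empirical mean and variance with $\lambda t$:

```python
import random

def count_events(lam, t, rng):
    """Number of arrivals in [0, t] with Exp(lam) inter-arrival times."""
    time, count = 0.0, 0
    while True:
        time += rng.expovariate(lam)
        if time > t:
            return count
        count += 1

lam, t, trials = 3.0, 2.0, 100_000
rng = random.Random(4)
counts = [count_events(lam, t, rng) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(mean, var)  # both approximately lam * t = 6
```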
We can draw several consequences from this.
Combining Poisson processes
A Poisson stream is a sequence of arrivals (events) where the inter-arrival times are independent and follow an
exponential distribution.
Suppose males ($M$) arrive at a shop as a Poisson stream of rate $\lambda_m$. Suppose females ($F$) also arrive as a Poisson stream with rate $\lambda_f$. Assume both streams are independent. We want to analyse the combined process for the total arrivals.
Method 1
Let $G(t) := P(\text{time to the next arrival is less than } t)$. This is the probability distribution of the arrival time of the next customer, irrespective of being male or female. Now $G(t) = 1 - P(\text{no arrivals before time } t) = 1 - P(\text{no males arrive in time } t)\,P(\text{no females arrive in time } t) = 1 - \exp(-\lambda_m t)\exp(-\lambda_f t) = 1 - \exp(-(\lambda_m + \lambda_f)t)$.
Method 2
Let $N$ be the number of arrivals in time $t$, $N^{(m)}$ the number of male arrivals and $N^{(f)}$ the number of female arrivals. Then $N^{(m)} \sim \mathrm{Po}(\lambda_m t)$ and $N^{(f)} \sim \mathrm{Po}(\lambda_f t)$. Then
$$P(N = k) = P(N^{(m)} + N^{(f)} = k) = \sum_{r=0}^k P(N^{(m)} = r \text{ and } N^{(f)} = k - r) = \sum_{r=0}^k P(N^{(m)} = r)\,P(N^{(f)} = k - r)$$
$$= \sum_{r=0}^k \frac{(\lambda_m t)^r \exp(-\lambda_m t)}{r!} \cdot \frac{(\lambda_f t)^{k-r} \exp(-\lambda_f t)}{(k-r)!} = \frac{\exp(-\lambda_m t)\exp(-\lambda_f t)}{k!} \sum_{r=0}^k \binom{k}{r} (\lambda_m t)^r (\lambda_f t)^{k-r}$$
$$= \frac{((\lambda_m + \lambda_f)t)^k \exp(-(\lambda_m + \lambda_f)t)}{k!}.$$
Hence $N$ has a Poisson distribution with parameter $(\lambda_m + \lambda_f)t$.

In general, given $X$ and $Y$ with $Z = X + Y$, the distribution of $Z$ is the convolution of the distributions of $X$ and $Y$.


Splitting processes
Suppose customers arrive as a Poisson stream with combined rate $\lambda$. For each customer that arrives, there is a probability $p$ that the customer is male, and probability $1 - p$ that the customer is female. Arrivals are independent. We want to find the arrival process for females alone. We expect $\lambda_f = \lambda(1 - p)$. Now
$$P(\text{no females arrive in time } t) = P(\text{no arrivals in time } t) + \sum_{n=1}^\infty P(n \text{ arrivals in time } t, \text{ each arrival is male})$$
$$= \exp(-\lambda t) + \sum_{n=1}^\infty \frac{(\lambda t)^n \exp(-\lambda t)}{n!} p^n = \exp(-\lambda t)\left(1 + \sum_{n=1}^\infty \frac{(\lambda t p)^n}{n!}\right) = \exp(-\lambda t)\exp(\lambda t p) = \exp(-\lambda(1-p)t).$$
Hence the inter-arrival times for females have an exponential distribution with parameter $\lambda(1 - p)$.
Remark: we can extend combining and splitting to an arbitrary number of Poisson stream types.
Example
Suppose events occur as a Poisson stream. Suppose we know that there are $N$ events in time $T$ ($N$, $T$ fixed). Now fix $t < T$. What is the probability distribution that governs the number of events in time $t$?
We can split the time interval $T$ into two parts, one up to time $t$ and one after time $t$. Now
$$P(r \text{ events in time } t \mid N \text{ events in time } T) = \frac{P(\{r \text{ events in time } t\} \cap \{N \text{ events in time } T\})}{P(N \text{ events in time } T)} = \frac{P(\{r \text{ events in time } t\} \cap \{N - r \text{ events in time } T - t\})}{P(N \text{ events in time } T)}$$
by the memoryless property of the exponential distribution. The top events are independent so we can take the product of the probabilities. Hence
$$P(r \text{ events in time } t \mid N \text{ events in time } T) = \frac{(\lambda t)^r \exp(-\lambda t)}{r!} \cdot \frac{(\lambda(T - t))^{N-r} \exp(-\lambda(T - t))}{(N - r)!} \cdot \frac{N!}{(\lambda T)^N \exp(-\lambda T)} = \binom{N}{r} p_t^r (1 - p_t)^{N-r}$$
where $p_t = \frac{t}{T}$. So it has a Binomial$(N, \frac{t}{T})$ distribution.

Remark: Also consider the time to the $r$-th event given $N$ events in time $T$. The corresponding distribution is a beta distribution and the mean time taken is $\frac{rT}{N+1}$.
Example
Suppose students arrive to Harrison as a Poisson stream of rate 3 per unit time.
(i) Find $P(5$ students enter within time $2)$.
(ii) Find $P(\text{time taken for the 4th student to arrive is at least } 2)$.
(iii) Find $P(3$ students enter in time $1 \mid 5$ students enter by time $2)$.
(iv) If one student entered in time 1, show that the time the student entered is uniformly distributed on $[0, 1]$.
(i) We need to find $P(N_2 = 5)$ for $\lambda = 3$. $P(N_2 = 5) = \frac{6^5 \exp(-6)}{5!} = 0.161$ to 3 s.f.
(ii) Following the proof of Theorem 2.2:
$$P(\text{4th event in time } t \geq 2) = P(\leq 3 \text{ events in time } 2) = P(N_2 \leq 3) = e^{-6}\left(1 + 6 + \frac{6^2}{2} + \frac{6^3}{3!}\right) = 0.151 \text{ to 3 s.f.}$$
(iii) For $N = 5$, $T = 2$, $r = 3$, $t = 1$: $P(3 \text{ students in time } 1 \mid 5 \text{ students in time } 2) = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{5}{16}$.
(iv) The single event in time $T$ is governed by a Binomial$(1, \frac{t}{T})$. Since $T = 1$, the probability of success is $t$, which gives the probability distribution of a Uniform$[0, 1]$ RV.
Poisson Rate Equations (Steady state)
Previously we had the number of arrivals in time $t$ as $N_t \sim \mathrm{Poisson}(\lambda t)$. If only the arrival process is specified, the system state will just tend to $\infty$. We'll consider a corresponding departure process (also with a Poisson distribution). We'll examine the long run average, the probability distribution governing the state of the system in future time (after transient effects). We hope that the long run probabilities that govern the number of individuals in the system do not depend on time.

Consider the number of arrivals (events) in a short time period $\delta t$. Define $o(x)$ as any function of $x$ such that $\lim_{x\to 0} \frac{o(x)}{x} = 0$; $x^{3/2}$ is a suitable choice for $o(x)$, but $x$ isn't. In time period $\delta t$, the number $N$ of events is governed by a $\mathrm{Poisson}(\lambda \delta t)$ distribution. We want to look at
$$P(N = 0) = \exp(-\lambda \delta t) = 1 - \lambda \delta t + \frac{\lambda^2 \delta t^2}{2!} - \ldots = 1 - \lambda \delta t + o(\delta t)$$
$$P(N = 1) = \lambda \delta t \exp(-\lambda \delta t) = \lambda \delta t + o(\delta t)$$
$$P(N \geq 2) = \frac{(\lambda \delta t)^2 \exp(-\lambda \delta t)}{2} + \frac{(\lambda \delta t)^3 \exp(-\lambda \delta t)}{3!} + \ldots = o(\delta t).$$
We'll treat probabilities that are $o(\delta t)$ as insignificant. We'll now consider a system which contains a population of individuals. Denote the system state by $n$, that is the number of individuals in the system. Given state $n$, we assume arrivals are governed by a $\mathrm{Poisson}(\lambda_n \delta t)$ distribution, and departures are governed by a Poisson distribution with rate $\mu_n$ (per unit time). Observe that if the state of the system changes then so do the probability distributions that govern arrivals and departures, that is the rates $\lambda_n$ and $\mu_n$ will change. If the system is in state $n = 0$ (empty) then $\mu_0 = 0$. Also, assume an upper limit on capacity so that if we are in state $N$ (for some fixed number) then $\lambda_N = 0$.
State diagram

Figure 2: The state diagram for the system, showing the directions of the rate constants $\lambda_i$ and $\mu_j$.

Define $P_n(t)$ to be the probability of being in state $n$ at time $t$. We'll compare $P_n(t)$ to its neighbours; in fact the evolution of $P_n(t)$ with time depends only on the neighbouring states, since transitions between states differing by at least 2 have probability $o(\delta t)$ on the time scale $\delta t$. Note we assume all processes are independent. Now suppose $P_n(t)$ is given for all $0 \leq n \leq N$. Consider $P_0(t + \delta t)$, the probability of being in state 0 at time $t + \delta t$. Now $P_0(t + \delta t) = P(\text{state 0 at time } t \text{ and no arrivals in time } \delta t) + P(\text{state 1 at time } t \text{ and one departure in time } \delta t) + P(\text{state at least 2 at time } t \text{ and sufficient departures to reach state } 0)$. Hence
$$P_0(t + \delta t) = P_0(t)(1 - \lambda_0 \delta t) + P_1(t)\mu_1 \delta t + o(\delta t). \tag{1}$$

Similarly
$$P_N(t + \delta t) = P_N(t)(1 - \mu_N \delta t) + P_{N-1}(t)\lambda_{N-1} \delta t + o(\delta t). \tag{2}$$
Now we consider $0 < n < N$. In this case we also need to include the possibility of no arrivals or departures about our given state. Hence
$$P_n(t + \delta t) = P_{n-1}(t)\lambda_{n-1}\delta t + P_{n+1}(t)\mu_{n+1}\delta t + P_n(t)(1 - \lambda_n \delta t)(1 - \mu_n \delta t) + o(\delta t) \tag{3}$$
$$= P_{n-1}(t)\lambda_{n-1}\delta t + P_{n+1}(t)\mu_{n+1}\delta t + P_n(t)(1 - \lambda_n \delta t - \mu_n \delta t) + o(\delta t). \tag{4}$$
We can rearrange Equations 1 to 4 to get the LHSs in the form $\frac{P_n(t + \delta t) - P_n(t)}{\delta t}$ and take $\delta t \to 0$. These LHSs become $\frac{dP_n}{dt}$ for $0 \leq n \leq N$. Hence we obtain the Poisson rate equations:
From Equation 1:
$$\frac{dP_0}{dt} = \mu_1 P_1(t) - \lambda_0 P_0(t) \tag{5}$$

From Equation 2:
$$\frac{dP_N}{dt} = \lambda_{N-1} P_{N-1}(t) - \mu_N P_N(t) \tag{6}$$
From Equation 4:
$$\frac{dP_n}{dt} = \lambda_{n-1} P_{n-1}(t) + \mu_{n+1} P_{n+1}(t) - (\lambda_n + \mu_n) P_n(t) \quad \text{for } 0 < n < N. \tag{7}$$

We now have $N + 1$ coupled ODEs. We will be interested in the steady state, meaning that the long run behaviour is time-independent. Sufficient conditions for (that is, conditions that imply) a steady state are at least one of the following: (i) an upper limit on capacity; (ii) after some state $n = n_0$, the departure rate $\mu_n$ is greater than the arrival rate $\lambda_n$ for all $n \geq n_0$.
We'll assume that the system tends to a steady state. Moreover we assume that the steady state is independent of the initial state and that transient effects can be ignored (quickly) as time $t$ increases. As we approach the steady state, $P_n(t) \to P_n$ as $t \to \infty$ (for some constant $P_n$). Hence $\frac{dP_n}{dt} \to 0$, and so we set $\frac{dP_n}{dt} = 0$ in Equations 5 to 7 to get:
$$0 = \mu_1 P_1 - \lambda_0 P_0 \tag{8}$$
$$0 = \lambda_{N-1} P_{N-1} - \mu_N P_N \tag{9}$$
$$0 = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1} - (\lambda_n + \mu_n) P_n \quad \text{for } 0 < n < N. \tag{10}$$
These equations are homogeneous, and $P_n = 0$ for all $n$ is a solution. However we have linear dependence, and we can remove this by imposing $\sum_{n=0}^N P_n = 1$. This will lead to a unique solution. We take a change of variable, letting $\epsilon_n = \lambda_{n-1} P_{n-1} - \mu_n P_n$. Then from Equation 8 we find $\epsilon_1 = 0$; from Equation 10 we find $\epsilon_n = \epsilon_{n+1}$, so $\epsilon_n = 0$ for all $n \leq N - 1$; and finally from Equation 9 we get $\epsilon_N = 0$. Since $\epsilon_n = 0$, we have $P_n = \frac{\lambda_{n-1} P_{n-1}}{\mu_n}$ for $n = 1, \ldots, N$. By iteration we find
$$P_n = \frac{\lambda_{n-1}\lambda_{n-2}\cdots\lambda_0}{\mu_n \mu_{n-1}\cdots\mu_1} P_0. \tag{11}$$
Since $\sum_{n=0}^N P_n = 1$, we have $1 + \sum_{n=1}^N \frac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1} = \frac{1}{P_0}$, which can be solved for $P_0$, and hence for $P_n$ from Equation 11. This is the steady state relation.

Steady state diagram analysis

We want to work out the steady state probabilities. We don't need to remember Equations 1 to 11. A probability flow from state $n$ to state $n + 1$ is just $\lambda_n P_n$. A probability flow from state $n + 1$ to state $n$ is just $\mu_{n+1} P_{n+1}$. In fact, at a steady state, the probability flows balance, that is $\mu_{n+1} P_{n+1} = \lambda_n P_n$. This analysis is equivalent to solving $\epsilon_n = 0$ (or $\epsilon_n = \epsilon_{n+1}$). Hence $P_{n+1} = \frac{\lambda_n P_n}{\mu_{n+1}}$.
Method
1. Draw the steady state diagram and equate probability flows to get steady state equations in $P_n$.
2. Solve these equations for $P_n$ in terms of $P_0$.
3. Solve for $P_0$ using $\sum_{n=0}^N P_n = 1$ (note that $N = \infty$ is allowed).
4. Find $P_n$ via the equations in Step 2.
5. Compute the expected system state $\sum_{n=0}^N n P_n$.


Recall that if $N = \infty$ we require $\lim_{n\to\infty} \frac{\lambda_n}{\mu_n} < 1$. If $\lim_{n\to\infty} \frac{\lambda_n}{\mu_n} > 1$ then there is no steady state (that is, the state tends to infinity with probability 1). If $\lim_{n\to\infty} \frac{\lambda_n}{\mu_n} = 1$ then a steady state is possible but not always guaranteed.
Example
Suppose we have one engineer to repair a set of three photocopiers. Individual machines break down at a rate of once per hour. Repair time is 30 minutes per machine on average. The state of the system is the number of machines broken. Times are exponentially distributed.
The individual breakdown rate is 1 per hour so $\lambda = 1$. The repair rate is 2 per hour so $\mu = 2$.

Figure 3: The steady state diagram for this example.

From the steady state diagram, we can derive the steady state equations: $3P_0 = 2P_1$, $2P_1 = 2P_2$, $P_2 = 2P_3$, which in turn mean $P_1 = \frac{3P_0}{2}$, $P_2 = \frac{3P_0}{2}$, $P_3 = \frac{3P_0}{4}$. Hence by applying $\sum_{n=0}^3 P_n = 1$ we get $P_0 = \frac{4}{19}$, $P_1 = \frac{6}{19}$, $P_2 = \frac{6}{19}$ and $P_3 = \frac{3}{19}$. From this we can find the expected state of the system: $\frac{27}{19}$.
Example
Suppose there are 3 machines with a breakdown rate of $\lambda = 1$ per hour each. Suppose there are 2 engineers and each repairs a single machine with a mean service time of 30 minutes. Let $X$ be the number of machines broken. Find $P_0$ and $E(X)$.
Note that the rate of service (per engineer) is $\mu = 2$ per hour.

Figure 4: The steady state diagram for the case of 3 photocopiers and 2 engineers.

To find the required information we equate probability flows. Doing this, we find the equations $3P_0 = 2P_1$, $2P_1 = 4P_2$ and $P_2 = 4P_3$. Solving each of these equations in terms of $P_0$ and using $P_0 + P_1 + P_2 + P_3 = 1$ we find $P_0 = \frac{16}{55}$. Using this we can calculate $E(X) = \frac{57}{55}$.

Queueing Theory

As usual, we ignore transient effects and assume steady state. The analysis is via steady state diagrams. We use the notation that a $G_1/G_2/n$ queue is a queue whose arrival process is governed by a process $G_1$, with a service (departure) process given by $G_2$, and $n$ is the number of servers. Denote by $G_1/G_2/n/\infty$ the queue as above with infinite capacity (we usually omit the $\infty$).

We'll consider $G_1 = M(\lambda)$ and $G_2 = M(\mu)$, where the arrival rate is $\lambda$ and the individual service rate is $\mu$, and $G_1$, $G_2$ are Poisson streams (note the 'M' denotes Markov). The mean time between successive arrivals is $\frac{1}{\lambda}$ (from the exponential distribution). We will focus on $M/M/n$ queues with $n = 1, 2$ specifically. We'll also consider finite or infinite capacity for $n = 1$. We'll analyse the probability distribution of the system size, the expected system size and the waiting time in the system.
Suppose we have an $M/M/1 = M/M/1/\infty$ queue. This is a single server queue with infinite capacity. There is an arrival rate $\lambda$ and a service rate $\mu$. The state is the number of individuals in the system, that is the sum of the number of people in the queue and the number of people being served. Let $\rho = \frac{\lambda}{\mu}$ be the traffic intensity parameter. We get the steady state equations $\lambda P_0 = \mu P_1$, ..., $\lambda P_n = \mu P_{n+1}$, ... By induction we can see that $P_1 = \rho P_0$, $P_2 = \rho^2 P_0$, ..., $P_n = \rho^n P_0$. We then find $P_0$ from $\sum_{n=0}^\infty P_n = 1$. Hence $P_0\left(\sum_{n=0}^\infty \rho^n\right) = 1$, which means that $P_0 = 1 - \rho$ (provided $\rho < 1$). If $\rho \geq 1$ then there is no steady state solution and the system size tends to $\infty$. Hence $P_n = \rho^n(1 - \rho)$.
We define $L_s$ to be the mean number of individuals in the system and $L_q$ to be the mean number of individuals in the queue. Now
$$L_s = \sum_{n=0}^\infty n P_n = \sum_{n=0}^\infty n \rho^n (1 - \rho) = \rho(1 - \rho)\sum_{n=0}^\infty n \rho^{n-1} = \frac{\rho}{1 - \rho},$$
since $\sum_{n=0}^\infty n x^{n-1} = \frac{1}{(1-x)^2}$ provided $|x| < 1$. We can obtain $L_q$ in two ways. Firstly, $L_q = \sum_{n=1}^\infty (n - 1) P_n = \frac{\rho^2}{1 - \rho}$. Alternatively, $L_s$ is the mean number of people in the queue plus the mean number of people being served, that is $L_q + 0 \cdot P(\text{system empty}) + 1 \cdot P(\text{system busy})$. Hence
$$L_s = L_q + 0 \cdot P_0 + 1 \cdot (1 - P_0) = L_q + \rho \implies L_q = \frac{\rho}{1 - \rho} - \rho = \frac{\rho^2}{1 - \rho}.$$

Now let $W_s$ be the average waiting time in the system and $W_q$ the average waiting time in the queue. We have the result that $W_s = \sum_{n=0}^\infty \left(\frac{n+1}{\mu}\right) P_n$; that is, if there are $n$ customers in the system then an arriving customer waits a time $\frac{n}{\mu}$ in the queue and a time $\frac{1}{\mu}$ being served. Similarly to before we have $W_s = W_q + \frac{1}{\mu}$, hence
$$W_s = \frac{1}{\mu}\sum_{n=0}^\infty n P_n + \frac{1}{\mu} = \frac{1}{\mu}\left(\frac{\rho}{1 - \rho} + 1\right) = \frac{1}{\mu(1 - \rho)} = \frac{L_s}{\lambda}.$$
We can see similarly that
$$W_q = \frac{\rho}{\mu(1 - \rho)} = \frac{L_q}{\lambda}.$$
We can summarise this as $W_s = W_q + \frac{1}{\mu}$, $L_s = L_q + \rho$ and $L_s = \lambda W_s$, $L_q = \lambda W_q$ (Little's formula).
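As an illustrative check of these formulas (assumed parameters $\lambda = 1$, $\mu = 2$), we can estimate $W_q$ by simulating successive customers' queueing delays with the Lindley recurrence $W_{k+1} = \max(0, W_k + S_k - A_{k+1})$ (the recurrence is standard queueing machinery, not derived in these notes) and compare with $W_q = \frac{\rho}{\mu(1-\rho)}$:

```python
import random

def mm1_wq(lam, mu, customers=400_000, seed=5):
    """Average M/M/1 queueing delay via W_{k+1} = max(0, W_k + S_k - A_{k+1})."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(customers):
        total += w
        service = rng.expovariate(mu)        # S_k
        interarrival = rng.expovariate(lam)  # A_{k+1}
        w = max(0.0, w + service - interarrival)
    return total / customers

lam, mu = 1.0, 2.0
rho = lam / mu
wq_sim = mm1_wq(lam, mu)
print(wq_sim, rho / (mu * (1 - rho)))  # both approximately 0.5
```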


Suppose we have an $M/M/2$ queue with 2 servers and infinite capacity. The arrival rate is $\lambda$ and the service rate per server is $\mu$.

Figure 5: The set up for the M/M/2 queue; note that a single line is formed and the first customer joins any empty server.

We can derive the steady state equations $\lambda P_0 = \mu P_1$, $\lambda P_1 = 2\mu P_2$, ..., $\lambda P_n = 2\mu P_{n+1}$ for $n \geq 1$. We can solve these in terms of $P_0$ to get $P_1 = 2\rho P_0$, ..., $P_n = 2\rho^n P_0$ for $n \geq 1$, where $\rho = \frac{\lambda}{2\mu}$. We solve $\sum_{n=0}^\infty P_n = 1$ to find $P_0$, specifically
$$P_0\left(1 + 2\sum_{n=1}^\infty \rho^n\right) = 1 \implies P_0\left(1 + \frac{2\rho}{1 - \rho}\right) = 1 \implies P_0 = \frac{1 - \rho}{1 + \rho}, \quad P_n = 2\rho^n\,\frac{1 - \rho}{1 + \rho} \text{ for } n \geq 1.$$
This steady state solution is valid provided $\rho < 1$. As before we have
$$L_s = \sum_{n=0}^\infty n P_n = \frac{2\rho}{1 - \rho^2}, \qquad L_q = \sum_{n=2}^\infty (n - 2) P_n = \frac{2\rho^3}{1 - \rho^2}.$$
We can also use the relation that $L_s = L_q$ plus the expected number of people being served. To get $W_s$ and $W_q$ we use Little's formula: $W_s = \frac{L_s}{\lambda}$ and $W_q = \frac{L_q}{\lambda}$, the expected time spent in the system and the queue respectively.
M/M/N systems
Suppose we have a system with $N$ servers, infinite capacity, an arrival rate $\lambda$ and a service rate $\mu$ (per server).

Figure 6: The state diagram for the M/M/N/$\infty$ system.

From the state diagram we can derive the steady state equations $\lambda P_0 = \mu P_1$, $\lambda P_1 = 2\mu P_2$, ..., $\lambda P_{N-1} = N\mu P_N$, and $\lambda P_n = N\mu P_{n+1}$ for all $n \geq N$. From these equations we can derive
$$P_n = \begin{cases} \dfrac{\alpha^n}{n!} P_0 & \text{if } n < N \\[2mm] \dfrac{\alpha^n}{N!\,N^{n-N}} P_0 & \text{if } n \geq N \end{cases}$$
provided we set $\alpha = \frac{\lambda}{\mu}$. As before we can set $\sum_{n=0}^\infty P_n = 1$ to solve for $P_0$ and hence for $P_n$. In this case we can apply Little's theorem, so $L_s = \sum_{n=0}^\infty n P_n$ and $W_s = \frac{L_s}{\lambda}$, $L_q = \sum_{n=N}^\infty (n - N) P_n$ and $W_q = \frac{L_q}{\lambda}$.
Example
Suppose we have an $M/M/1$ queue with finite capacity $N$. The possible system states are $0, 1, 2, \ldots, N$. The arrival rate is $\lambda$ and the service rate is $\mu$, and we define the state of the system to be the number of people in the shop. We can use our previous steady state equations to find the steady state probabilities $P_n = \rho^n P_0$ for $n \leq N$, where $\rho = \frac{\lambda}{\mu}$. As usual we solve for $P_0$ by setting $\sum_{n=0}^N P_n = 1$, that is $P_0(1 + \rho + \rho^2 + \ldots + \rho^N) = 1$. This is a geometric series, so $P_0 = \frac{1 - \rho}{1 - \rho^{N+1}}$ provided $\rho \neq 1$. In the case of $\rho = 1$ we have $P_0(1 + 1 + \ldots + 1) = 1$, so $P_0 = \frac{1}{N+1}$. Using the $\rho \neq 1$ case we have $P_n = \frac{\rho^n(1 - \rho)}{1 - \rho^{N+1}}$ for $n \leq N$. As $N \to \infty$, the results are consistent with the infinite capacity case. Now $L_s = \sum_{n=0}^N n P_n = P_0 \sum_{n=0}^N n \rho^n$. Let $X$ be the state of the system. Then
$$G_X(\eta) = E(\eta^X) = \sum_{n=0}^N \eta^n P(X = n) = \sum_{n=0}^N P_0 (\rho\eta)^n = \frac{P_0(1 - (\rho\eta)^{N+1})}{1 - \rho\eta}.$$
We then compute $L_s = G'_X(1)$ to get a formula for $E(X)$ in terms of $N$ and $\rho$, specifically
$$L_s = \frac{\rho(1 - (N + 1)\rho^N + N\rho^{N+1})}{(1 - \rho)(1 - \rho^{N+1})}.$$
In this case we are unable to apply Little's theorem directly (due to customers potentially being turned away). We need to replace $\lambda$ by a modified effective arrival rate $\lambda_{\mathrm{eff}} = \lambda(1 - P_N)$.


Little's Formulae
Let λ_eff denote the effective arrival rate, that is the rate at which customers arrive and actually join the queue, i.e. the arrival rate for customers who eventually get served.
Little's Theorem
Ls = λ_eff Ws, Lq = λ_eff Wq, where Ls (Lq) is the average number of customers in the system (queue) and Ws (Wq) is the average waiting time in the system (queue).
Idea of proof
An arriving customer sees Ls in the system (on average). The customer spends time Ws in the system before departing; in this time a further λ_eff Ws have arrived. Since we are in steady state, the number seen on arrival should balance the number seen on departure, hence Ls = λ_eff Ws.
In general λ_eff ≠ λ. For M/M/1, M/M/2 queues with infinite capacity, λ_eff = λ. For an M/M/1 queue with finite capacity N, λ_eff = λ(1 − P_N). In general,

    λ_eff = Σ_{n≥0} (Probability of being in state n)(Probability customer stays, given state n)(Arrival rate to state n).

For example, in an infinite-capacity M/M/1 queue, λ_eff = Σ_{n≥0} Pn · 1 · λ = λ. In a model with finite capacity N, λ_eff = λ Σ_{n=0}^{N−1} Pn = λ(1 − P_N).

Remark: We have Ls = Σ_{n≥0} n Pn and Lq = Σ_{n≥r} (n − r) Pn (for r servers). Also, Ls = Lq + expected number being served = Lq + 0·P0 + 1·P1 + ... + r·Pr + r Σ_{n≥r+1} Pn.


Method
Compute Pn , P0 via steady state diagrams.
Compute Ls , Lq directly.
Compute λ_eff, then Ws, Wq via Little's Theorem.
Queue efficiency
Is an M/M/2 queue faster than two parallel M/M/1 systems?
We take μ = 1 and let λ be the arrival rate, with λ < 2.

Figure 7: The two queue systems to be compared. In the case of the two queues, new customers join either queue
with equal probability.

We can quote the results for Ls and Ws for the M/M/1 and M/M/2 queues. Each M/M/1 queue sees arrival rate λ/2, so per server ρ = λ/2 and

    Individual server Ls = ρ/(1 − ρ) = (λ/2)/(1 − λ/2) = λ/(2 − λ),

hence the system total is 2Ls = 2λ/(2 − λ). For each M/M/1 queue we have

    Ws = Ls / (Arrival rate to the server) = (λ/(2 − λ)) / (λ/2) = 2/(2 − λ).

For M/M/2, ρ = λ/(2μ) = λ/2, so

    Ls = 2(λ/2)/(1 − (λ/2)²) = 4λ/(4 − λ²)  and  Ws = Ls/λ = 4/(4 − λ²)

(which is valid given our assumption that λ < 2).


We can now compare the two systems:


Waiting time in parallel M/M/1 − Waiting time in M/M/2 = 2/(2 − λ) − 4/(4 − λ²) = 2λ/(4 − λ²) > 0.

Hence the expected waiting time in two M/M/1 queues is longer than that of one M/M/2 queue. We can also
compare the expected number of people in each of the two systems:
Expected number in parallel M/M/1 − Expected number in M/M/2 = 2λ/(2 − λ) − 4λ/(4 − λ²) = 2λ²/(4 − λ²) > 0.

Hence we can expect to see more people in the two M/M/1 queues than in the single M/M/2 queue.
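The algebra of this comparison can be double-checked numerically. A small Python sketch (assuming μ = 1 throughout; function names are ours):

```python
def parallel_mm1_ws(lam):
    """Mean wait in one of two parallel M/M/1 queues, each fed lam/2 (mu = 1)."""
    rho = lam / 2
    Ls = rho / (1 - rho)       # per-server number in system
    return Ls / (lam / 2)      # Little's theorem, per server

def mm2_ws(lam):
    """Mean wait in the single M/M/2 queue (mu = 1)."""
    Ls = 4 * lam / (4 - lam**2)
    return Ls / lam

for lam in (0.5, 1.0, 1.9):
    gap = parallel_mm1_ws(lam) - mm2_ws(lam)
    assert abs(gap - 2 * lam / (4 - lam**2)) < 1e-9
    assert gap > 0             # the M/M/2 queue is faster for all 0 < lam < 2
```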
Limitations in the model
1. Finite capacity is usual in realistic models. However for N large, the infinite capacity model is a good
approximation.
2. Customers tend to opt for the queue of minimum length.
3. Arrival and service processes are not always Poisson.

Markov Chains

Each day is considered either cloudy (C) or sunny (S). If C occurs on any given day then on the following day C occurs with probability 1/2. If S occurs on any given day then it is cloudy with probability 1/3 on the next day.
Let Sn denote the event of being sunny on day n. Let Cn denote the event of being cloudy on day n. Let
P (0) = (P (S0 ), P (C0 )) denote the initial probability state at time 0. For example, if P (0) = (0, 1) then we are
certain that it is cloudy on day 0. We let P (n) = (P (Sn ), P (Cn )). We want to know P (n) given some P (0) . We can
use our information to draw a probability transition diagram.

Figure 8: The probability transition diagram for this weather example.


This gives the day-to-day probability transition rules: P(S1) = (2/3)P(S0) + (1/2)P(C0) and P(C1) = (1/3)P(S0) + (1/2)P(C0). Hence

    (P(S1), P(C1)) = (P(S0), P(C0)) (2/3  1/3) = (P(S0), P(C0)) T.
                                    (1/2  1/2)

The components in row 1 of T are the transitions from S and the entries in row 2 are the transitions from C. Note that the row sum is always 1. Notice also that this rule does not depend on the day n. Hence P^(n+1) = P^(n) T. In


general T could depend on time n, and in this case we write T := T (n). However, here T does not depend on time.
By iteration, we see that P^(1) = P^(0) T, ..., P^(n) = P^(0) T^n. We can observe that as n → ∞, P^(n) settles to a limit vector, which we call P^∞. That is lim_{n→∞} P^(n) = P^∞, and moreover lim_{n→∞} P^(n+1) = P^∞. But we have

    lim_{n→∞} P^(n+1) = lim_{n→∞} (P^(n) T) = (lim_{n→∞} P^(n)) T = P^∞ T  ⟹  P^∞ = P^∞ T.
Let P^∞ = (P1, P2). We solve for P^∞ using linear algebra, specifically

    (P1, P2) = (P1, P2) (2/3  1/3)  ⟹  (P1, P2) = ((2/3)P1 + (1/2)P2, (1/3)P1 + (1/2)P2).
                        (1/2  1/2)

This matrix equation gives 3P2 = 2P1 from both components. However P1 + P2 = 1 (since P^∞ is a probability vector). Solving these simultaneous equations we get P1 = 3/5 and P2 = 2/5. So P^∞ = (3/5, 2/5) describes the long term state behaviour. P^∞ is also independent of P^(0) in this example. The speed of convergence of P^(n) to P^∞ depends on the eigenvalues of T. Observe that P^∞ is a row eigenvector of T with eigenvalue 1. The next largest eigenvalue of modulus less than 1 determines this rate of convergence. To find this eigenvalue we solve det(T − λI) = 0 for λ, which gives the equation (2/3 − λ)(1/2 − λ) − 1/6 = 0. We know that λ = 1 is a solution to this equation and λ = 1/6 is the other. Hence |P^(0) T^n − P^∞| ≤ c(1/6)^n for some c > 0.
When computing P^(n) for small n, it is often easiest to calculate the probabilities directly from the probability transition diagram, especially in the case of a large number of states in the system.
Some conventions use column vectors, and in this case we require the column sums of T to be equal to 1. The
problem is equivalent by taking transposes.
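The convergence of P^(n) to (3/5, 2/5) is easy to observe numerically. A minimal Python sketch of the row-vector iteration (variable names are ours):

```python
T = [[2/3, 1/3],   # row S: transitions from sunny
     [1/2, 1/2]]   # row C: transitions from cloudy

def step(p, T):
    """One step of P^(n+1) = P^(n) T for a row vector p."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]

p = [0.0, 1.0]           # certainly cloudy on day 0
for _ in range(30):      # the error shrinks like (1/6)^n, so 30 steps is plenty
    p = step(p, T)

assert abs(p[0] - 3/5) < 1e-12 and abs(p[1] - 2/5) < 1e-12
```

Starting instead from p = [1.0, 0.0] gives the same limit, illustrating the independence from P^(0).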
General Theory
We consider a system with m states {1, 2, ..., m} (also labelled (1), (2), ..., (m) or E1, ..., Em) and time n ∈ Z, but usually n ≥ 0. The state at time n will be denoted by a RV Xn taking values in {1, ..., m}. We will study P(X_{n+1} = j | Xn = i) for 1 ≤ i, j ≤ m and study (for example) P(X_{n+1} = j | Xn = i_n, X_{n−1} = i_{n−1}) and so on. We also assume that Tij = P(X_{n+1} = j | Xn = i) is given for all 1 ≤ i, j ≤ m and all n. To work out the state at time n + 1, we just need to know the state at time n and the transition probabilities Tij. T is an m × m matrix with elements Tij = P(X_{n+1} = j | Xn = i). The system is memoryless in the sense that we do not need the entire history X0, X1, ..., X_{n−1}, Xn to work out X_{n+1}; we just need Xn. The Markov (chain) property states that

    P(X_{n+1} = j | Xn = i_n, X_{n−1} = i_{n−1}, ..., X0 = i_0) = P(X_{n+1} = j | Xn = i_n).
Moreover

    P(X_{n+1} = j) = Σ_{i=1}^{m} P(X_{n+1} = j | Xn = i) P(Xn = i)  ⟹  P^(n+1) = P^(n) T.

Here P^(n) = (P(Xn = 1), P(Xn = 2), ..., P(Xn = m)).


Remarks
1. m is usually finite, but we can take m N (a countable number of states), for example random walks which
will be discussed later.
2. It suffices to take time n 0.
3. T could depend on time n; we write T := T(n). In this case P^(n) = P^(n−1) T(n), so P^(n) = P^(0) T(1)T(2)...T(n). An example of this situation would be a model with two states where the probability of changing state is 1/(n + 1) for the n-th move.
4. T is an m × m matrix and each row sum Σ_j Tij = 1. Note in the probability transition diagram we omit arrows where Tij = 0.
Aims
We want to study the long term, steady state behaviour of the system. We also want to classify the states in the


system in terms of their recurrence properties. That is, we want to know the frequency of visits to each state as
time evolves.
Consider two urns, Urn 1 and Urn 2. These contain between them 3 balls labelled 1, 2, 3. A ball is selected
at random with an equal chance for any of them to be chosen. We then take that labelled ball and transfer it from
one urn to the other. The state of the system at time n, denoted by Xn, is the number of balls in Urn 1. We want to find P^(n) and P^∞ (the behaviour as n → ∞).

Figure 9: The probability transition diagram for the case of two urns and three balls.
From the probability transition diagram, we can see that

    T = ( 0    1    0    0 )
        (1/3   0   2/3   0 )
        ( 0   2/3   0   1/3)
        ( 0    0    1    0 )

We observe that as n increases, P^(n) oscillates and does not converge to a limit. So P^∞ = lim_{n→∞} P^(n) does not exist as previously defined. However, solving P T = P subject to Σ_{i=0}^{3} Pi = 1 gives a solution P = (1/8, 3/8, 3/8, 1/8).
Before solving this, let's fully understand the two state Markov chain.

Figure 10: The probability transition diagram for the two state Markov chain, where 0 a, b 1.
From the probability transition diagram, we can see that

    T = (1 − a    a  )
        (  b    1 − b).


If P is the steady state probability then P = P T and P1 + P2 = 1. From the matrix equation we get

    P1 = (1 − a)P1 + bP2,  P2 = aP1 + (1 − b)P2  ⟹  aP1 = bP2.

Combining this with the probability equation gives P = (b/(a+b), a/(a+b)) for 0 < a, b < 1. Solving det(T − λI) = 0 gives the eigenvalues λ = 1 and λ = 1 − a − b ∈ (−1, 1) for 0 < a, b < 1. In this case, if |1 − a − b| < 1 then for any P^(0), P^(0) T^n → P with rate bounded by |1 − a − b|^n. There are also two special cases to consider. If a = b = 1 then the state vectors form an alternating sequence (1, 0) → (0, 1) → (1, 0) → .... Moreover (1/2, 1/2) → (1/2, 1/2) under T. However, if we instead take the average of all of the state vectors, then (1/n) Σ_{k=0}^{n−1} P^(k) → P. The second special case is b = 0, a ≠ 0. In this case P = (0, 1) and P^(n) = ((1 − a)^n, 1 − (1 − a)^n), given P^(0) = (1, 0).
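A quick numerical check of P = (b/(a+b), a/(a+b)) and the geometric convergence rate |1 − a − b| (the parameter values below are arbitrary):

```python
a, b = 0.3, 0.2
T = [[1 - a, a],
     [b, 1 - b]]

p = [1.0, 0.0]                      # start in state 1
for _ in range(60):                 # error shrinks like |1 - a - b|^n = 0.5^n
    p = [p[0] * T[0][0] + p[1] * T[1][0],
         p[0] * T[0][1] + p[1] * T[1][1]]

P = [b / (a + b), a / (a + b)]      # predicted steady state: (0.4, 0.6)
assert abs(p[0] - P[0]) < 1e-12 and abs(p[1] - P[1]) < 1e-12
```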
Going back to the urn problem with three balls, P^(n) oscillates with period 2; however

    (1/n) Σ_{k=0}^{n−1} P^(k) → (1/8, 3/8, 3/8, 1/8).
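The Cesàro average for the urn chain can also be checked numerically. A Python sketch (the averaging horizon n is an arbitrary choice):

```python
T = [[0,   1,   0,   0  ],
     [1/3, 0,   2/3, 0  ],
     [0,   2/3, 0,   1/3],
     [0,   0,   1,   0  ]]

def step(p, T):
    """One step of P^(k+1) = P^(k) T for a row vector p."""
    return [sum(p[i] * T[i][j] for i in range(4)) for j in range(4)]

p = [1.0, 0.0, 0.0, 0.0]    # start with Urn 1 empty
n = 20000
avg = [0.0] * 4
for _ in range(n):
    avg = [s + x / n for s, x in zip(avg, p)]   # running Cesaro average
    p = step(p, T)

target = [1/8, 3/8, 3/8, 1/8]
assert all(abs(s - t) < 1e-3 for s, t in zip(avg, target))
```

The individual P^(k) keep oscillating between the even and odd step profiles, but the average settles down at rate O(1/n).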

What if we now want to consider M balls in two urns? Again each ball is equally likely to be chosen and Xn denotes
the number of balls in Urn 1.

Figure 11: The probability transition diagram for the urn problem with M balls and two urns.
As before we want to determine P. We observe that Tij = P(X_{n+1} = j | Xn = i) is an (M + 1) × (M + 1) matrix with zeros on the diagonal. We examine the vector P = (P0, P1, ..., PM) such that P = P T subject to P0 + P1 + ... + PM = 1. This gives Pi = (M choose i)(1/2)^M for 0 ≤ i ≤ M (this is the Binomial distribution). For M = 3, we saw P = (1/8, 3/8, 3/8, 1/8). However P^(n) = P^(0) T^n does not converge to P as n → ∞ (it oscillates). Moreover, for the long-run average, (1/n) Σ_{k=0}^{n−1} P^(k) → P as n → ∞.
Classification of states (subchains)
Definition
A state j is accessible from i if (T^n)_{ij} > 0 for some n > 0. Two states i, j communicate if each is accessible from the other. Communication (↔) is an equivalence relation on the states:
1. If i ↔ j then j ↔ i (symmetry).
2. i ↔ i (reflexivity).
3. If i ↔ j and j ↔ k then i ↔ k (transitivity).
Consequently, communication splits the Markov chain into subchains. These are disjoint equivalence classes.
A problem in the study of Markov chains is finding these irreducible subchains (consisting of communicating
states). A Markov chain is irreducible if all states within communicate with each other, for example the urn
problem is irreducible.
A state i is called absorbing if Tii = 1 and Tij = 0 for j ≠ i. A state i is periodic with period k > 1 if (T^n)_{ii} > 0 when k | n and (T^n)_{ii} = 0 otherwise. State i is aperiodic if no such k exists. The urn problem is a Markov chain with period 2, since all states have period 2.
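The period of a state can also be computed numerically as the gcd of all n with (T^n)_{ii} > 0, a standard equivalent characterisation. A Python sketch (function names and the power cutoff are ours):

```python
from math import gcd

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def period(T, i, max_n=40):
    """gcd of the return times n <= max_n with (T^n)_{ii} > 0; 0 if none seen."""
    g, M = 0, [row[:] for row in T]   # M holds T^n, starting at n = 1
    for n in range(1, max_n + 1):
        if M[i][i] > 0:
            g = gcd(g, n)
        M = matmul(M, T)
    return g

urn = [[0, 1, 0, 0], [1/3, 0, 2/3, 0], [0, 2/3, 0, 1/3], [0, 0, 1, 0]]
weather = [[2/3, 1/3], [1/2, 1/2]]
assert period(urn, 0) == 2        # every state of the urn chain has period 2
assert period(weather, 0) == 1    # the weather chain is aperiodic
```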
Recurrence of states


Let f_i^(n) be the probability of the first return to state i occurring at time n (starting from i at time 0). Let f_i = Σ_{n≥1} f_i^(n); this is the probability of eventual return to state i (starting from i initially). Notice that f_i^(n) ≠ (T^n)_{ii}, where (T^n)_{ij} = P(Xn = j | X0 = i). The latter (T^n)_{ii} includes intermediate returns before time n.

Classification of recurrence
If f_i = 1 then state i is called recurrent, hence the return to state i is certain (this is equivalently expressed by Σ_{n≥1} (T^n)_{ii} = ∞). If f_i < 1 then state i is said to be transient and return is not certain (again characterised by Σ_{n≥1} (T^n)_{ii} < ∞). The number of returns is then governed by a geometric distribution with parameter f_i in the case of a transient state. In a recurrent state, any number of returns occurs with probability 1. Let μ_i = Σ_{n=1}^{∞} n f_i^(n), the expected recurrence time. For a recurrent state i, we say that state i is positively recurrent if μ_i is finite, but we say state i is null recurrent if μ_i is infinite. We say a state is ergodic if it is aperiodic and positively recurrent. Similarly, a subchain is called ergodic if all of its communicating states are ergodic.
Remark: The urn problem has an irreducible Markov chain but is not ergodic. The previous example considering sunny and cloudy weather can be shown to be ergodic.
For ergodic chains, there exists some n such that (T^n)_{ij} > 0 for all i, j. For ergodic chains, we see that P^(n) → P with P = P T. This need not hold for irreducible (for example periodic) chains. Instead (1/n) Σ_{k=0}^{n−1} P^(k) → P as n → ∞.
General approach
Given a Markov matrix T, first draw the corresponding probability transition diagram.
Decide which states communicate, and hence identify subchains.
Decide which states are absorbing, periodic or aperiodic.
Decide which states are recurrent or transient. Calculate f_i = Σ_{n≥1} f_i^(n) (the sum of the probabilities of first return at time n over all positive n). If f_i = 1 then state i is recurrent and if f_i < 1 then state i is transient.
If state i is recurrent, we compute μ_i = Σ_{n≥1} n f_i^(n). If μ_i < ∞ then state i is positively recurrent. If μ_i = ∞ then state i is null recurrent (this requires the use of series convergence tests).
Remark: If T is a constant transition matrix (in time n) on finitely many states, then every recurrent state has μ_i < ∞ (f_i^(n) decays exponentially fast).

Gambler's Ruin Problems

We pose a classical problem. At each play of a game the gambler wins 1 with probability p and loses 1 with probability q = 1 − p. The gambler aims to reach N before being ruined at 0. The gambler starts with i, for 0 ≤ i ≤ N. What is the probability that the gambler wins? Let Xn be the gambler's fortune at time n, then Xn ∈ {0, 1, ..., N}. By considering Markov chains, we observe that states 0 and N are absorbing states.

Figure 12: The probability transition diagram for the gamblers ruin problem.
Let Ei be the event of winning given that we start in state i, and let πi be the probability of winning given that we start in state i. We can show that the states 1, ..., N − 1 are transient, so with probability 1 we will eventually reach either state 0 or state N. We aim to get a recurrence relation (difference equation) between πi, π_{i−1} and π_{i+1}


which we will then solve for πi in terms of i. Notice that we can write

    πi = P(Winning | Win at time 1)P(Win at time 1) + P(Winning | Lose at time 1)P(Lose at time 1) = p π_{i+1} + q π_{i−1}.   (12)

We know that π0 = 0 and πN = 1. We take a trial solution πi = Aλ^i for some unknown λ and A. Then

    Aλ^i = pAλ^{i+1} + qAλ^{i−1}  ⟹  pλ² − λ + q = 0  ⟹  (λ − 1)(pλ − q) = 0  ⟹  λ = 1 or λ = q/p.

Note that we should always have λ = 1 as a solution. Combining these two values of λ, we get the general solution πi = A + B(q/p)^i. From the boundary conditions we have

    π0 = 0 ⟹ A + B = 0,
    πN = 1 ⟹ A + B(q/p)^N = 1  ⟹  πi = (1 − (q/p)^i) / (1 − (q/p)^N).   (13)

This is valid only for q ≠ p. If q = p = 1/2 then the general solution is πi = A + Bi, and

    πi = i/N   (14)

is the specific solution.
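Equation 13 can be checked against a direct Monte Carlo simulation of the game. A Python sketch (trial count and seed are arbitrary choices):

```python
import random

def win_prob_exact(i, N, p):
    """Equation 13 (or 14 when p = 1/2): probability of reaching N from i."""
    q = 1 - p
    if abs(p - q) < 1e-12:
        return i / N
    r = q / p
    return (1 - r**i) / (1 - r**N)

def win_prob_sim(i, N, p, trials=20000, seed=1):
    """Play the game to absorption many times and count wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = i
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
        wins += (x == N)
    return wins / trials

assert win_prob_exact(5, 10, 0.5) == 0.5
assert abs(win_prob_sim(3, 10, 0.6) - win_prob_exact(3, 10, 0.6)) < 0.02
```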


Remark
Let N → ∞, and keep state 0 as an absorbing state. From Equations 13 and 14 we can see that if p > q then we escape to ∞ with positive probability. If p ≤ q then we reach state 0 with probability 1 (that is, πi → 0 as N → ∞). There are analogous results if the lower bound is allowed to tend to −∞ with N fixed, that is we will reach state N with probability 1. If both boundaries are extended to ±∞, we get a random walk on Z.
If the states are a, a + s, a + 2s, ..., a + Ns = b for some a, b, s ∈ R, we need to translate the problem to {0, 1, 2, ..., N} in order to quote the results from Equations 13 and 14. For example if Xn ∈ {−4, −2, 0, 2, 4} then Yn = (Xn + 4)/2 ∈ {0, 1, 2, 3, 4}.
Extended models and applications
The general set up is that we have N states. Suppose r < N of the states are absorbing and define A = {a1, a2, ..., ar}, where each ai is an absorbing state. Then A = W ∪ L, where W is the set of winning states and L is the set of losing states. Given some state i ∉ A, we let Ei be the event of reaching W before L. Let πi = P(Ei). We have two results.
Lemma

    πi = Σ_j πj Tij   (15)

where (Tij) is the transition matrix.


Proof
By the total law of probability,

    πi = Σ_j P(Ei | i → j) P(i → j)  ⟹  πi = Σ_j πj Tij.

Note that this result is not the same as the steady state vector equation P = P T ⟺ Pj = Σ_i Pi Tij. The steady state vector equation has non-zero values for states in A.
In order to solve this problem, we also need to impose boundary conditions. If state i ∈ A then πi = 0 if i ∈ L and πi = 1 if i ∈ W. In the classical gambler's ruin problem, A = {0, N}, L = {0} and W = {N}. Let Di be the expected time to reach some state in A given that we start in some state i ∉ A. From this we can define a set of boundary conditions: Di = 0 if state i ∈ A.


Lemma
If Tij is the transition matrix then

    Di = Σ_j Dj Tij + 1.   (16)

Proof
Let E(n, i) be the event of reaching A in time n, starting in state i. Then

    P(E(n, i)) = Σ_j P(E(n, i) | i → j) P(i → j) = Σ_j P(E(n − 1, j)) Tij.   (17)

By definition, Di = Σ_{n≥0} n P(E(n, i)). Then Di = Σ_{n≥1} Σ_j n P(E(n − 1, j)) Tij. We interchange the sums and relabel n by n + 1, then

    Di = Σ_j Σ_{n≥0} (n + 1) P(E(n, j)) Tij = Σ_j (Σ_{n≥0} n P(E(n, j))) Tij + Σ_j (Σ_{n≥0} P(E(n, j))) Tij = Σ_j Dj Tij + 1.
Example
We consider the classical gambler's ruin problem. Immediately, Equation 16 gives Di = pD_{i+1} + qD_{i−1} + 1. This is a difference equation with homogeneous solution Di = A + B(q/p)^i, and we take a trial particular solution Di = C + Ei. We substitute this into the above to find C and E. Then we use D0 = DN = 0 to find A and B. If p = q = 1/2 we can check that Di = i(N − i).
It is usually best to solve Equation 15 (for πi) or Equation 16 (for Di) by direct algebra of simultaneous equations.
Example
Suppose we toss a fair coin. How long do we have to wait in order to see 3 heads in a row?
Let En be the event of seeing HHH at time n. Then the En are not independent over n, since if the (n − 1)-th toss comes up T then P(En) = P(E_{n+1}) = 0.

Figure 13: The probability transition diagram for the coin toss example. Note that the unlabelled arrows have probability 1/2.
In this example we have states {S, T, H, HH, HHH}, and DHHH = 0 since it is an absorbing state. From Equation 16 we get

    DS = (1/2)DH + (1/2)DT + 1,
    DT = (1/2)DT + (1/2)DH + 1,
    DH = (1/2)DHH + (1/2)DT + 1,
    DHH = (1/2)DHHH + (1/2)DT + 1.

Using the boundary condition DHHH = 0 and back-substitution gives DS = 14. We can extend this to requiring N heads in a row, in which case we can show that DS = 2^(N+1) − 2.
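The value DS = 14 agrees with a direct simulation of waiting for HHH. A Python sketch (trial count and seed are arbitrary choices):

```python
import random

def mean_time_to_3_heads(trials=40000, seed=7):
    """Average number of fair-coin tosses until 3 heads appear in a row."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        run = t = 0                 # current head run length, tosses so far
        while run < 3:
            t += 1
            run = run + 1 if rng.random() < 0.5 else 0
        total += t
    return total / trials

est = mean_time_to_3_heads()
assert abs(est - 14) < 0.5          # theory: DS = 2^(3+1) - 2 = 14
```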


Example
Suppose coins are flipped in sequence. Player 1 wins when HH occurs and Player 2 wins when T H occurs. The
sequence of coin tosses continues until one of these events occurs. What is the probability that Player 1 wins?

Figure 14: The probability transition diagram for this game. Note that each arrow has probability 1/2 and the red values are the probabilities of reaching each winning state.

We can see from the diagram that the probability of Player 1 winning is 1/4, since the only way for Player 1 to win is if the first two coins are H. We can see this another way. Let πi be the probability of reaching HH from state i. We want to know πS. The states of the system are {S, H, T, HH, TH} and the boundary conditions are πHH = 1, πTH = 0. We then solve πi = Σ_j Tij πj, so πS = (1/2)πH + (1/2)πT, πT = (1/2)πT + (1/2)πTH, πH = (1/2)πHH + (1/2)πT. Solving these equations simultaneously gives πS = 1/4. We can also calculate Di, the mean time to finish given state i, in a similar way, noting the boundary conditions DHH = DTH = 0.
This idea generalises. Given any sequence of coin states, we can create a sequence that beats the original more often than not by removing the last state and adding a suitable state to the beginning of the sequence; for example HTHHTTH loses to HHTHHTT more often than not. Proving the relative probabilities proceeds in a similar method to above.
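The value πS = 1/4 for the HH versus TH game can also be confirmed by simulation. A Python sketch (trial count and seed are arbitrary choices):

```python
import random

def p_player1_wins(trials=40000, seed=11):
    """Estimate P(HH occurs before TH) in a fair coin sequence."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prev = 'H' if rng.random() < 0.5 else 'T'
        while True:
            cur = 'H' if rng.random() < 0.5 else 'T'
            if cur == 'H':
                # The game ends at the first H after position 1: the
                # preceding flip decides whether HH or TH occurred first.
                wins += (prev == 'H')
                break
            prev = cur
    return wins / trials

est = p_player1_wins()
assert abs(est - 0.25) < 0.02
```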
