OF COMMUNICATIONS
NETWORKS AND SYSTEMS
ISBN-13: 978-0-521-85515-0 hardback
ISBN-10: 0-521-85515-2 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
to my wife Saskia
and my sons Vincent, Nathan and Laurens
Contents

Preface

1   Introduction

Part I   Probability theory

2   Random variables
3   Basic distributions
4   Correlation
5   Inequalities
6   Limit laws

Part II   Stochastic processes

7   The Poisson process
    A stochastic process · Definition · Properties of the Poisson process ·
    The nonhomogeneous Poisson process · The failure rate function · Problems
8   Renewal theory
    Basic notions · Limit theorems · The residual waiting time ·
    The renewal reward process · Problems
9
10
11  Definition · Properties of continuous-time Markov processes · Steady-state ·
    The embedded Markov chain · The transitions in a continuous-time Markov chain ·
    Example: the two-state Markov chain in continuous-time · Time reversibility · Problems
12  Branching processes
13  A queueing system · The waiting process: Lindley's approach ·
    The Beneš approach to the unfinished work · The counting process ·
    PASTA · Little's Law
14  Queueing models

Part III   Physics of networks

15  Introduction · The number of paths with m hops · The degree of a node in a graph ·
    Connectivity and robustness · Graph metrics · Random graphs ·
    The hopcount in a large, sparse graph with unit link weights · Problems
16
17
18  Introduction · General analysis · The n-ary tree · The uniform recursive tree (URT) ·
    Approximate analysis · The performance measure in exponentially growing trees

Appendix A   Stochastic matrices
Appendix B
Appendix C   Solutions of problems

Bibliography
Index
Preface

Performance analysis belongs to the domain of applied mathematics. The major domain of application in this book concerns telecommunications systems and networks. We will mainly use stochastic analysis and probability theory to address problems in the performance evaluation of telecommunications systems and networks. The first chapter provides a motivation and a statement of several problems.

This book aims to present methods rigorously, hence mathematically, with minimal resort to intuition. It is my belief that intuition is often gained after the result is known and rarely before the problem is solved, unless the problem is simple. Techniques and terminologies of axiomatic probability (such as definitions of probability spaces, filtration, measures, etc.) have been omitted and a more direct, less abstract approach has been adopted. In addition, most of the important formulas are interpreted in the sense of "What does this mathematical expression teach me?" This last step justifies the word "applied", since most mathematical treatises do not interpret, as interpretation carries the risk of being imprecise and incomplete.

The field of stochastic processes is much too large to be covered in a single book and only a selected number of topics has been chosen. Most of the topics are considered classical. Perhaps the largest omission is a treatment of Brownian processes and the many related applications. A weak excuse for this omission (besides the considerable mathematical complexity) is that Brownian theory applies more to physics (analogue fields) than to system theory (discrete components). The list of omissions is rather long and only the most noteworthy are summarized: recent concepts such as martingales and the coupling theory of stochastic variables, queueing networks, scheduling rules, and the theory of long-range dependent random variables that currently governs the Internet. The confinement to stochastic analysis also excludes the recent framework called Network Calculus by Le Boudec and Thiran (2001). Network calculus is based on min-plus algebra and has been applied to (inter)network problems in a deterministic setting.

As prerequisites, familiarity with elementary probability and knowledge of the theory of functions of a complex variable are assumed. Parts of the text in small font refer to more advanced topics or to computations that can be skipped at first reading. Part I (Chapters 2-6) reviews probability theory and is included to make the remainder self-contained. The book essentially starts with Chapter 7 (Part II) on Poisson processes.

January 2006
1
Introduction
The aim of this first chapter is to motivate why stochastic processes and probability theory are useful for solving problems in the domain of telecommunications systems and networks.

In any system, or for any transmission of information, there is always a non-zero probability of failure or of error penetration. Many problems in quantifying the failure rate or the bit error rate, or in computing the redundancy needed to recover from hazards, are successfully treated by probability theory. In communications we often deal with a large variety of signals, calls, source-destination pairs, messages, the number of customers per region, and so on. And, most often, precise information at any time is not available or, if it is available, deterministic studies or simulations are simply not feasible due to the large number of different parameters involved. For such problems, a stochastic approach is often a powerful vehicle, as has been demonstrated in the field of physics.
Perhaps the first impressive result of a stochastic approach was the statistical theory of Boltzmann and Maxwell. They studied the behavior of particles in an ideal gas and described how macroscopic quantities such as pressure and temperature can be related to the microscopic motion of the huge number of individual particles. Boltzmann also introduced the stochastic notion of the thermodynamic concept of entropy S,

    S = k log W

where W denotes the total number of ways in which the ensembles of particles can be distributed in thermal equilibrium and where k is a proportionality factor, afterwards attributed to Boltzmann as the Boltzmann constant. The pioneering work of these early physicists such as Boltzmann, Maxwell and others was the germ of a large number of breakthroughs in science. Shortly after their introduction of stochastic theory in classical physics, the
    Pr[N_S = m] = (ρ^m / m!) / (Σ_{j=0}^{m} ρ^j / j!)

where the load or traffic intensity ρ is the ratio of the arrival rate of calls to the telephone local exchange or switch over the processing rate of the switch per line. By equating the desired blocking probability p = Pr[N_S = m], say p = 10^{-4}, the number of input lines m can be computed for each load ρ. Due to its importance, books with tables relating p, ρ and m were published.
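The dimensioning step described here is easy to reproduce numerically. The sketch below is not from the book; it evaluates the Erlang B blocking probability with the standard recursion B(m, ρ) = ρB(m−1, ρ)/(m + ρB(m−1, ρ)), which is algebraically equivalent to the closed-form ratio but avoids overflow of ρ^m and m!, and then searches for the smallest number of lines m that meets a target blocking probability (the function names are illustrative).

```python
def erlang_b(m: int, rho: float) -> float:
    """Blocking probability (rho**m / m!) / sum_{j=0}^{m} rho**j / j!,
    evaluated with the stable recursion B(m) = rho*B(m-1) / (m + rho*B(m-1))."""
    b = 1.0  # B(0, rho) = 1: with zero lines every call is blocked
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b

def lines_needed(rho: float, target: float) -> int:
    """Smallest number of lines m whose blocking probability is <= target."""
    m = 0
    while erlang_b(m, rho) > target:
        m += 1
    return m

print(round(erlang_b(5, 3.0), 4))  # blocking with 5 lines at load rho = 3
print(lines_needed(3.0, 1e-4))     # lines needed for p = 10^-4 at rho = 3
```

For ρ = 3 and p = 10⁻⁴ the search returns m = 12 lines, which is exactly the kind of entry the blocking tables mentioned above would list.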
Another pioneer in the field of communications who deserves to be mentioned is Shannon. Shannon explored the concept of entropy S. He introduced (see e.g. Walrand, 1998) the notion of the Shannon capacity of a channel, the maximum rate at which bits can be transmitted with arbitrarily small (but non-zero) probability of error, and the concept of the entropy rate of a source, which is the minimum average number of bits per symbol required to encode the output of a source. Many others have extended his basic ideas and so it is fair to say that Shannon founded the field of information theory.
A recent important driver in telecommunications is the concept of quality of service (QoS). Customers can use the network to transmit different types of information such as pictures, files, voice, etc., by requiring a specific level of service depending on the type of transmitted information. For example, a telephone conversation requires that the voice packets arrive at the receiver D ms later, while a file transfer is mostly not time-critical but requires an extremely low information loss probability. The value of the mouth-to-ear delay D is clearly related to the perceived quality of the voice conversation. As long as D < 150 ms, the voice conversation has toll quality, which is, roughly speaking, the quality that we are used to in classical
is a certain level of stringency and h_N(m) is the number of hops towards the nearest of the m servers in a network with N routers.
The popularity of the Internet results in a number of new challenges. The traditional mathematical models such as the Erlang B formula assume smooth traffic flows (small correlation and Markovian in nature). However, TCP/IP traffic has been shown to be bursty (long-range dependent, self-similar and even chaotic, non-Markovian (Veres and Boda, 2000)). As a consequence, many traditional dimensioning and control problems ask for a new solution. The self-similar and long-range dependent TCP/IP traffic is mainly caused by new, complex interactions between protocols and technologies (e.g. TCP/IP/ATM/SDH) and by the transport of information other than voice. It is observed that the content size of information in the Internet varies considerably, causing the "Noah effect": although immense floods are extremely rare, their occurrence impacts Internet behavior significantly on a global scale. Unfortunately, the mathematics to cope with self-similar and long-range dependent processes turns out to be fairly complex and beyond the scope of this book.
Finally, we mention the current interest in understanding and modeling complex networks such as the Internet, biological networks, social networks and utility infrastructures for water, gas, electricity and transport (cars, goods, trains). Since these networks consist of a huge number of nodes N and links L, classical and algebraic graph theory is often not suited to produce even approximate results. The beginning of probabilistic graph theory is commonly attributed to the appearance of papers by Erdős and Rényi in the late 1940s. They investigated a particularly simple growing model for a graph: start from N nodes and connect in each step an arbitrary random, not yet connected pair of nodes until all L links are used. After about N/2 steps, as shown in Section 16.7.1, they observed the birth of a giant component that, in subsequent steps, swallows the smaller ones at a high rate. This phenomenon is called a phase transition and often occurs in nature. In physics it is studied in, for example, percolation theory. To some extent, the Internet's graph bears some resemblance to the Erdős-Rényi random graph. The Internet is best regarded as a dynamic and growing network, whose graph is continuously changing. Yet, in order to deploy services over the Internet, an accurate graph model that captures the relevant structural properties is desirable. As shown in Part III, a probabilistic approach based on random graphs seems an efficient way to learn about the Internet's intriguing behavior. Although the Internet's topology is not a simple Erdős-Rényi random graph, results such as the hopcount of the shortest path and the size of a multicast tree deduced from the simple random graphs provide
a first-order estimate for the Internet. Moreover, analytic formulas based on other classes of graphs than the simple random graph prove difficult to obtain. This observation is similar to queueing theory, where, besides the M/G/x class of queues, hardly any closed-form expressions exist.
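The growing random graph just described (add random links to N initially isolated nodes and watch the largest connected component) can be simulated with a union-find structure. The code below is an illustrative sketch, not taken from the book; for simplicity it allows an already-connected or identical pair to be drawn again, which merely wastes a step.

```python
import random

def largest_component_history(n: int, steps: int, seed: int = 7) -> list:
    """Size of the largest connected component after each added random link."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    largest, history = 1, []
    for _ in range(steps):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            if size[a] < size[b]:  # union by size
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            largest = max(largest, size[a])
        history.append(largest)
    return history

n = 10_000
h = largest_component_history(n, 2 * n)
# Well before ~n/2 links the largest component stays tiny; well after it,
# a giant component containing a constant fraction of all nodes has emerged.
print(h[n // 4 - 1], h[-1])
```

Running this for several seeds shows the abrupt growth of the giant component around N/2 added links, i.e. the phase transition described above.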
We hope that this brief overview provides sufficient motivation to surmount the mathematical barriers. Skill with probability theory is deemed necessary to understand complex phenomena in telecommunications. Once mastered, the power and beauty of mathematics will be appreciated.
Part I
Probability theory
2
Random variables
This chapter reviews basic concepts from probability theory. A random variable (rv) is a variable that takes certain values by chance. Throughout this book, this imprecise and intuitive definition suffices. The precise definition involves axiomatic probability theory (Billingsley, 1995).

Here, a distinction between discrete and continuous random variables is made, although a unified approach including also mixed cases via the Stieltjes integral (Hardy et al., 1999, pp. 152-157), ∫ g(x) df(x), is possible. In general, the distribution F_X(x) = Pr[X ≤ x] holds in both cases, and

    ∫ g(x) dF_X(x) = Σ_k g(k) Pr[X = k]          where X is a discrete rv

                   = ∫ g(x) (dF_X(x)/dx) dx      where X is a continuous rv
"La règle des partis", a chapter in Pascal's mathematical work (Pascal, 1954), consists of a series of letters to Fermat that discuss the following problem (together with a more complex question that is essentially a variant of the probability of gambler's ruin treated in Section 11.2.1): consider the game in which two dice are thrown n times. How many times n do we have to throw the two dice to throw double six with probability p = 1/2?
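The question can be answered directly: the probability of at least one double six in n throws is 1 − (35/36)^n, so the smallest sufficient n is ⌈log 2 / log(36/35)⌉. A quick check (not from the book):

```python
from math import ceil, log
from fractions import Fraction

# Pr[at least one double six in n throws] = 1 - (35/36)**n >= 1/2
n = ceil(log(2) / log(Fraction(36, 35)))
print(n)  # smallest sufficient number of throws

# Exact probabilities just below and at that n confirm the threshold.
p_before = 1 - Fraction(35, 36) ** (n - 1)
p_at = 1 - Fraction(35, 36) ** n
print(float(p_before), float(p_at))
```

The threshold lies at n = 25 throws: 24 throws give a probability just below 1/2, and 25 just above it.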
Axiom 2 (because A ∩ ∅ = ∅ and A = A ∪ ∅); for mutually exclusive events A and B it holds that Pr[A ∩ B] = 0.

As a classical example that explains the formal definitions, let us consider the experiment of throwing a fair die. The sample space Ω consists of all possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. A particular outcome of the experiment, say ω = 3, is a sample point ω ∈ Ω. One may be interested in the event A where the outcome is even, in which case A = {2, 4, 6} ⊂ Ω and A^c = {1, 3, 5}.
If A and B are events, the union of these events A ∪ B can be written using set theory as

    A ∪ B = (A ∩ B) ∪ (A^c ∩ B) ∪ (A ∩ B^c)

because A ∩ B, A^c ∩ B and A ∩ B^c are mutually exclusive events. The relation is immediately understood by drawing a Venn diagram as in Fig. 2.1 (two overlapping sets A and B with regions A ∩ B^c, A ∩ B and A^c ∩ B). Taking probabilities of both sides yields

    Pr[A ∪ B] = Pr[A ∩ B] + Pr[A^c ∩ B] + Pr[A ∩ B^c]      (2.1)

where the last relation follows from Axiom 2. Figure 2.1 shows that A = (A ∩ B) ∪ (A ∩ B^c) and B = (A ∩ B) ∪ (A^c ∩ B). Since the events are mutually exclusive, Axiom 2 states that

    Pr[A] = Pr[A ∩ B] + Pr[A ∩ B^c]
    Pr[B] = Pr[A ∩ B] + Pr[A^c ∩ B]

Substitution into (2.1) yields the important relation

    Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B]      (2.2)

Although derived for the measure Pr[·], relation (2.2) also holds for other measures, for example, the cardinality (the number of elements) of a set.
The inclusion-exclusion formula generalizes (2.2) to n events,

    Pr[∪_{k=1}^{n} A_k] = Σ_{k1=1}^{n} Pr[A_{k1}] − Σ_{k1=1}^{n} Σ_{k2=k1+1}^{n} Pr[A_{k1} ∩ A_{k2}]
                          + Σ_{k1=1}^{n} Σ_{k2=k1+1}^{n} Σ_{k3=k2+1}^{n} Pr[A_{k1} ∩ A_{k2} ∩ A_{k3}]
                          − ... + (−1)^{n−1} Σ_{k1=1}^{n} Σ_{k2=k1+1}^{n} ... Σ_{kn=k(n−1)+1}^{n} Pr[∩_{j=1}^{n} A_{kj}]      (2.3)

The formula shows that the probability of the union consists of the sum of probabilities of the individual events (first term). Since sample points can belong to more than one event A_k, the first term possesses double countings. The second term removes all probabilities of sample points that belong to precisely two event sets. However, by doing so (draw a Venn diagram), we also subtract the probabilities of sample points that belong to three event sets more than needed. The third term adds these again, and so on. The inclusion-exclusion formula can be written more compactly as

    Pr[∪_{k=1}^{n} A_k] = Σ_{j=1}^{n} (−1)^{j−1} Σ_{k1=1}^{n} Σ_{k2=k1+1}^{n} ... Σ_{kj=k(j−1)+1}^{n} Pr[∩_{m=1}^{j} A_{km}]      (2.4)

or, with

    S_j = Σ_{1≤k1<k2<...<kj≤n} Pr[∩_{m=1}^{j} A_{km}]

as

    Pr[∪_{k=1}^{n} A_k] = Σ_{j=1}^{n} (−1)^{j−1} S_j      (2.5)
Proof of the inclusion-exclusion formula³: Let A = ∪_{k=1}^{n−1} A_k and B = A_n, such that A ∪ B = ∪_{k=1}^{n} A_k and A ∩ B = A_n ∩ (∪_{k=1}^{n−1} A_k) = ∪_{k=1}^{n−1} (A_k ∩ A_n) by the distributive law in set theory; then application of (2.2) yields the recursion in n

    Pr[∪_{k=1}^{n} A_k] = Pr[∪_{k=1}^{n−1} A_k] + Pr[A_n] − Pr[∪_{k=1}^{n−1} (A_k ∩ A_n)]      (2.6)

³ Another proof (Grimmett and Stirzaker, 2001, p. 56) uses the indicator function defined in Section 2.2.1. Useful indicator function relations are

    1_{A∩B} = 1_A 1_B
    1_{A^c} = 1 − 1_A
    1_{A∪B} = 1 − 1_{(A∪B)^c} = 1 − 1_{A^c ∩ B^c} = 1 − 1_{A^c} 1_{B^c}
            = 1 − (1 − 1_A)(1 − 1_B) = 1_A + 1_B − 1_A 1_B = 1_A + 1_B − 1_{A∩B}

Generalizing the last relation yields

    1_{∪_{k=1}^{n} A_k} = 1 − Π_{k=1}^{n} (1 − 1_{A_k})

Multiplying out and taking expectations using (2.13) leads to (2.3).
Substitution of (2.3) into the above expression yields, after suitable grouping of the terms in which A_{n+1} appears with the corresponding sums over indices up to n,

    Pr[∪_{k=1}^{n+1} A_k] = Σ_{k1=1}^{n+1} Pr[A_{k1}] − Σ_{k1=1}^{n+1} Σ_{k2=k1+1}^{n+1} Pr[A_{k1} ∩ A_{k2}]
                            + Σ_{k1=1}^{n+1} Σ_{k2=k1+1}^{n+1} Σ_{k3=k2+1}^{n+1} Pr[A_{k1} ∩ A_{k2} ∩ A_{k3}]
                            − ... + (−1)^{n} Σ_{k1=1}^{n+1} ... Σ_{k(n+1)=kn+1}^{n+1} Pr[∩_{j=1}^{n+1} A_{kj}]

which is (2.3) with n replaced by n + 1. Since (2.2) is the case n = 2, the inclusion-exclusion formula (2.3) follows by induction on n.
Although impressive, the inclusion-exclusion formula is useful when dealing with dependent random variables because of its general nature. In particular, if Pr[∩_{m=1}^{j} A_{km}] = a_j does not depend on the specific indices k_m, the inclusion-exclusion formula (2.4) becomes more attractive,

    Pr[∪_{k=1}^{n} A_k] = Σ_{j=1}^{n} (−1)^{j−1} C(n, j) a_j      (2.8)

since the number of index tuples 1 ≤ k_1 < k_2 < ... < k_j ≤ n equals the binomial coefficient C(n, j). Retaining only the first term of (2.3) gives Boole's inequality,

    Pr[∪_{k=1}^{n} A_k] ≤ Σ_{k=1}^{n} Pr[A_k]      (2.9)

and, applied to the complements,

    Pr[∩_{k=1}^{n} A_k] ≥ 1 − Σ_{k=1}^{n} Pr[A_k^c]

In terms of cardinalities, the inclusion-exclusion argument gives

    |(∪_{k=1}^{n} A_k)^c| = Σ_{j=0}^{n} (−1)^j |S_j|      (2.10)

where the total number of elements in the sample space is |S_0| = N and

    |S_j| = Σ_{1≤k1<k2<...<kj≤n} |∩_{m=1}^{j} A_{km}|
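The inclusion-exclusion identity can be checked mechanically on a small finite sample space; the three events below are arbitrary illustrative choices, not an example from the book.

```python
from itertools import combinations
from fractions import Fraction

omega = range(12)  # twelve equiprobable sample points
events = [set(range(0, 8)), set(range(4, 10)), {0, 5, 9, 11}]

def pr(s: set) -> Fraction:
    return Fraction(len(s), len(omega))

# Left-hand side: the probability of the union of all events.
lhs = pr(set().union(*events))

# Right-hand side: alternating sum of S_j, where S_j sums Pr over
# all j-wise intersections of the events.
rhs = Fraction(0)
for j in range(1, len(events) + 1):
    s_j = sum(pr(set.intersection(*c)) for c in combinations(events, j))
    rhs += (-1) ** (j - 1) * s_j

print(lhs, rhs)  # both sides agree for these events
```

Enumerating the intersections this way also makes the double-counting argument in the text tangible: the pairwise terms remove exactly the points counted twice by the first sum.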
An integer number p is prime if p > 1 and p has no other integer divisors than 1 and itself. The sequence of the first primes is 2, 3, 5, 7, 11, 13, etc. If a and b are divisors of n, then n = ab, from which it follows that a and b cannot both exceed √n. Hence, any composite number n is divisible by a prime p that does not exceed √n.
The number of primes smaller than a real number x is π(x) and, evidently, if p_n denotes the n-th prime, then π(p_n) = n. Let A_k denote the set of the multiples of the k-th prime p_k that belong to Ω. The number of such sets A_k in the sieve of Eratosthenes equals the index n of the largest prime number p_n smaller than or equal to √N, hence, n = π(√N). If q ∈ (∪_{k=1}^{n} A_k)^c, this means that q is not divisible by any prime number smaller than or equal to p_n and that q is a prime number lying in √N < q ≤ N. The cardinality of the set (∪_{k=1}^{n} A_k)^c, the number of primes q with √N < q ≤ N, is

    |(∪_{k=1}^{n} A_k)^c| = π(N) − π(√N)

On the other hand, if r ∈ ∩_{m=1}^{j} A_{km} for 1 ≤ k_1 < k_2 < ... < k_j ≤ n, then r is a multiple of p_{k1} p_{k2} ... p_{kj} and the number of multiples of the integer p_{k1} p_{k2} ... p_{kj} in Ω is

    |∩_{m=1}^{j} A_{km}| = ⌊N / (p_{k1} p_{k2} ... p_{kj})⌋

Applying the inclusion-exclusion formula (2.10) with |Ω| = S_0 = N − 1 and n = π(√N) gives

    π(N) − π(√N) = N − 1 + Σ_{j=1}^{n} (−1)^j Σ_{1≤k1<...<kj≤n} ⌊N / (p_{k1} p_{k2} ... p_{kj})⌋

The knowledge of the prime numbers smaller than or equal to √N, i.e. the first n = π(√N) primes, suffices to compute the number of primes π(N) smaller than or equal to N without explicitly knowing the primes q lying in √N < q ≤ N.
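This prime-counting computation (Legendre's method) is short to implement; the sketch below, not from the book, uses a small sieve only to obtain the primes up to √N and then applies the inclusion-exclusion sum above.

```python
from itertools import combinations
from math import isqrt, prod

def primes_upto(limit: int) -> list:
    """Sieve of Eratosthenes for the primes <= limit."""
    flags = [False, False] + [True] * (limit - 1)
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def prime_count(N: int) -> int:
    """pi(N) via pi(N) - pi(sqrt(N)) = N - 1 + sum_j (-1)^j sum floor(N/(p_k1...p_kj))."""
    base = primes_upto(isqrt(N))  # the first n = pi(sqrt(N)) primes
    total = N - 1                 # |S_0| = N - 1, the size of {2, ..., N}
    for j in range(1, len(base) + 1):
        for c in combinations(base, j):
            d = prod(c)
            if d > N:
                continue  # floor(N/d) = 0: the term does not contribute
            total += (-1) ** j * (N // d)
    return total + len(base)

print(prime_count(100), prime_count(1000))
```

With only the four primes 2, 3, 5, 7 below √100, the routine returns π(100) = 25 without ever producing the primes between 10 and 100 explicitly, exactly as the text claims.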
The expectation E[X] is also called the mean or average or first moment of X. More generally, if X is a discrete random variable and g is a function, then Y = g(X) is also a discrete random variable with expectation E[Y] equal to

    E[g(X)] = Σ_x g(x) Pr[X = x]      (2.12)

For the indicator function 1_{X=a}, which equals 1 if X = a and 0 otherwise,

    E[1_{X=a}] = Pr[X = a]      (2.13)

The higher moments of a random variable are defined as the case where g(x) = x^n,

    E[X^n] = Σ_x x^n Pr[X = x]      (2.14)

From the definition (2.11), it follows that the expectation is a linear operator,

    E[Σ_{k=1}^{n} a_k X_k] = Σ_{k=1}^{n} a_k E[X_k]

The variance of X is

    Var[X] = E[(X − E[X])²]      (2.15)
where the last equality follows from (2.12). If X is integer-valued and non-negative, then the pgf is the Taylor expansion of the complex function φ_X(z). Commonly the latter restriction applies; otherwise the substitution z = e^{it} is used, such that (2.17) expresses the Fourier series of φ_X(e^{it}). The importance of the pgf mainly lies in the fact that the theory of functions can be applied. Numerous examples of the power of analysis will be illustrated. Concentrating on non-negative integer random variables X,

    φ_X(z) = Σ_{k=0}^{∞} Pr[X = k] z^k      (2.18)

The probabilities are recovered from the pgf by differentiation or by Cauchy's integral formula,

    Pr[X = k] = (1/k!) dᵏφ_X(z)/dzᵏ |_{z=0}      (2.19)

              = (1/2πi) ∮_{C(0)} φ_X(z)/z^{k+1} dz      (2.20)

where C(0) is a contour around z = 0. A similar inversion formula for Fourier series exists (see e.g. Titchmarsh (1948)).

Combining (2.12) and (2.19),

    E[g(X)] = Σ_{k=0}^{∞} (g(k)/k!) dᵏφ_X(z)/dzᵏ |_{z=0}      (2.21)

Differentiating (2.18) n times gives

    dⁿφ_X(z)/dzⁿ = E[X(X−1)...(X−n+1) z^{X−n}] = n! E[C(X, n) z^{X−n}]

such that

    E[C(X, n)] = (1/n!) dⁿφ_X(z)/dzⁿ |_{z=1}      (2.22)

Similarly, with z = e^t,

    dⁿφ_X(e^t)/dtⁿ = E[Xⁿ e^{tX}]

from which the moments follow as

    E[Xⁿ] = dⁿφ_X(e^t)/dtⁿ |_{t=0}      (2.23)

    E[(X − a)ⁿ] = dⁿ(e^{−ta} φ_X(e^t))/dtⁿ |_{t=0}      (2.24)

In particular, the mean is

    E[X] = φ'_X(1)      (2.25)

and the logarithm of the generating function is denoted by

    L_X(z) = log(φ_X(z))      (2.26)
    Var[X] = φ''_X(1) + φ'_X(1) − (φ'_X(1))²      (2.27)

           = L''_X(1) + L'_X(1)      (2.28)

For a continuous random variable X, the probability density function (pdf) is

    f_X(x) = dF_X(x)/dx      (2.30)

An example of a continuous function that is nowhere differentiable is the Weierstrass function

    f(x) = Σ_{n=0}^{∞} bⁿ cos(aⁿ π x)

where 0 < b < 1 and a is an odd positive integer. Since the series is uniformly convergent for any x, f(x) is continuous everywhere. Titchmarsh (1964, Chapter IX) demonstrates for ab > 1 + 3π/2 that (f(x+h) − f(x))/h takes arbitrarily large values, such that f'(x) does not exist. Another class of continuous, non-differentiable functions are the sample paths of a Brownian motion. The Cantor function, which is discussed in (Berger, 1993, p. 21) and (Billingsley, 1995, p. 407), is another classical, noteworthy function with peculiar properties.
If F_X(x) is differentiable at x, then F_X(x + Δx) − F_X(x) = f_X(x) Δx + O((Δx)²), and the definition (2.30) indicates that

    f_X(x) = lim_{Δx→0} Pr[x < X ≤ x + Δx] / Δx      (2.31)

If F_X is not continuous at x, it jumps upwards at x by ΔF_X(x). In that case, there is a probability mass with magnitude ΔF_X(x) at the point x. Although the second definition (2.31) is, strictly speaking, not valid in that case, one sometimes denotes the pdf at y = x by f_X(y) = ΔF_X(x) δ(y − x), where δ(x) is the Dirac impulse or delta function, with the basic property that ∫_{−∞}^{+∞} δ(y − x) dy = 1. Even apart from the above-mentioned difficulties⁷ for certain classes of non-differentiable but continuous functions, the fact that probabilities are always confined to the region [0, 1] may suggest that 0 ≤ f_X(x) ≤ 1. However, the second definition (2.31) shows that f_X(x) can be much larger than 1. For example, if X is a Gaussian random variable with mean μ and variance σ² (see Section 3.2.3), then f_X(μ) = 1/(σ√(2π)) can be made arbitrarily large. In fact,

    lim_{σ→0} exp(−(x − μ)²/(2σ²)) / (σ√(2π)) = δ(x − μ)

⁷ In Lebesgue measure theory (Titchmarsh, 1964; Billingsley, 1995), it is said that a countable, finite or enumerable (i.e. function evaluations at individual points) set is measurable, but its measure is zero.
For a monotone function g, the event {g(X) ≤ x} is equivalent to {X ≤ g⁻¹(x)} if dg/dx > 0 and to {X > g⁻¹(x)} if dg/dx < 0. Hence,

    F_Y(x) = Pr[g(X) ≤ x] = F_X(g⁻¹(x)),          if dg/dx > 0
                          = 1 − F_X(g⁻¹(x)),      if dg/dx < 0      (2.32)

For well-behaved continuous random variables, we may rewrite (2.31) in terms of differentials,

    f_X(x) dx = Pr[x ≤ X ≤ x + dx]

and, similarly for f_Y(y),

    f_Y(y) dy = Pr[y ≤ Y = g(X) ≤ y + dy]

If g is increasing, then the event {y ≤ g(X) ≤ y + dy} is equivalent to {g⁻¹(y) ≤ X ≤ g⁻¹(y + dy)} = {x ≤ X ≤ x + dx}, such that

    f_Y(y) dy = f_X(x) dx

If g is decreasing, we find that f_Y(y) dy = −f_X(x) dx. Thus, if g⁻¹ and g' exist, then the relation between the pdf of a well-behaved continuous random variable X and that of the transformed random variable Y = g(X) is

    f_Y(y) = f_X(x) |dx/dy| = f_X(x) / |g'(x)|

This expression also follows by straightforward differentiation of (2.32). The chi-square distribution introduced in Section 3.3.3 is a nice example of the transformation of random variables.
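The transformation rule can be sanity-checked numerically. Below, X is standard Gaussian and g(x) = e^x is an increasing transform; the density predicted by f_X(x)/|g'(x)| is compared with a finite-difference derivative of the exact cdf F_Y(y) = F_X(log y). This is an illustrative sketch, not an example from the book.

```python
from math import erf, exp, log, pi, sqrt

def f_X(x: float) -> float:
    """Standard Gaussian pdf."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def F_X(x: float) -> float:
    """Standard Gaussian cdf, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Y = g(X) = exp(X): g is increasing with g'(x) = exp(x), g^{-1}(y) = log(y).
def f_Y(y: float) -> float:
    x = log(y)
    return f_X(x) / exp(x)  # f_X(x) / |g'(x)|

def F_Y(y: float) -> float:
    return F_X(log(y))  # cdf of Y for increasing g

h = 1e-6
for y in (0.5, 1.0, 2.0, 5.0):
    numeric = (F_Y(y + h) - F_Y(y - h)) / (2 * h)  # d F_Y / dy
    print(y, round(f_Y(y), 6), round(numeric, 6))
```

The two columns agree at every test point; the resulting density is that of a lognormal random variable, a distribution the book treats later.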
2.3.2 The expectation

Analogously to the discrete case, we define the expectation of a continuous random variable as

    E[X] = ∫_{−∞}^{∞} x f_X(x) dx      (2.33)

In addition, for the expectation to exist⁸, we require that ∫_{−∞}^{∞} |x| f_X(x) dx < ∞. If X is a continuous random variable and g is a continuous function, then

    E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx      (2.34)

For a non-negative random variable X, partial integration, using 1 − F_X(x) = ∫_x^{∞} f_X(u) du, yields

    E[X] = ∫_0^{∞} x f_X(x) dx = [−x (1 − F_X(x))]_0^{∞} + ∫_0^{∞} (1 − F_X(x)) dx
         = ∫_0^{∞} (1 − F_X(x)) dx      (2.35)

since the boundary term vanishes when E[X] exists. For a random variable that also takes negative values, the same computation on (−∞, 0] gives

    E[X] = ∫_0^{∞} (1 − F_X(x)) dx − ∫_{−∞}^{0} F_X(x) dx

The discrete analogue for an integer-valued random variable is

    E[X] = Σ_{k=−∞}^{∞} k Pr[X = k]
         = Σ_{k=−∞}^{−1} k (Pr[X ≤ k] − Pr[X ≤ k − 1]) + Σ_{k=0}^{∞} k (Pr[X ≥ k] − Pr[X ≥ k + 1])
         = −Pr[X ≤ −1] + Σ_{k=−∞}^{−2} k Pr[X ≤ k] − Σ_{k=−∞}^{−2} (k + 1) Pr[X ≤ k]
           + Σ_{k=1}^{∞} k Pr[X ≥ k] − Σ_{k=1}^{∞} (k − 1) Pr[X ≥ k]
         = Σ_{k=1}^{∞} Pr[X ≥ k] − Σ_{k=−∞}^{−1} Pr[X ≤ k]

⁸ This requirement is borrowed from measure theory and Lebesgue integration (Titchmarsh, 1964, Chapter X; Royden, 1988, Chapter 4), where a measurable function is said to be integrable (in the Lebesgue sense) over A if f⁺ = max(f(x), 0) and f⁻ = max(−f(x), 0) are both integrable over A. Although this restriction seems only of theoretical interest, it matters in some applications.
The analogue of the pgf for a continuous random variable X is the Laplace transform

    φ_X(z) = E[e^{−zX}] = ∫_{−∞}^{∞} e^{−zt} f_X(t) dt      (2.37)

Similarly, E[e^{−z(X−a)}] = e^{za} φ_X(z), so that

    E[(X − a)ⁿ] = (−1)ⁿ dⁿ(e^{za} φ_X(z))/dzⁿ |_{z=0}      (2.39)

The main difference with the discrete case lies in the definition E[e^{−zX}] (continuous) versus E[z^X] (discrete)⁹. Since the exponential is an entire¹⁰ function, e^{−zX} = Σ_{k=0}^{∞} (−1)^k X^k z^k / k!, and taking expectations termwise gives the Taylor series around z = 0,

    E[e^{−zX}] = Σ_{k=0}^{∞} (−1)^k E[X^k] z^k / k!      (2.40)

provided¹¹ E[X^k] = O(k!), which is a necessary condition for the summation to converge for z ≠ 0. Assuming convergence¹², the Taylor series of E[e^{−zX}] around z = 0 is expressed as a function of the moments of X, whereas in the discrete case the Taylor series of E[z^X] around z = 0, given by (2.18), is expressed in terms of probabilities of X. This observation has led to calling E[e^{−zX}] the moment generating function, while E[z^X] is the probability generating function of the random variable X. On the other hand, series expansion of E[z^X] around z = 1 gives

    φ_X(z) = Σ_{k=0}^{∞} Pr[X = k] (z + 1 − 1)^k = Σ_{k=0}^{∞} Pr[X = k] Σ_{j=0}^{k} C(k, j) (z − 1)^j
           = Σ_{j=0}^{∞} [ Σ_{k=j}^{∞} C(k, j) Pr[X = k] ] (z − 1)^j = Σ_{j=0}^{∞} E[C(X, j)] (z − 1)^j

If moments are desired, the substitution z → e^{−u} in E[z^X] is appropriate. Finally,

    L_X(z) = log(φ_X(z)) = log E[e^{−zX}]      (2.41)

⁹ We remark that

    E[X] = Σ_{k=−∞}^{∞} k Pr[X = k] = Σ_{k=−∞}^{∞} k (Pr[X ≥ k] − Pr[X ≥ k + 1])
         ≠ Σ_{k=−∞}^{∞} k Pr[X ≥ k] − Σ_{k=−∞}^{∞} k Pr[X ≥ k + 1] = Σ_{k=−∞}^{∞} Pr[X ≥ k]

because the series in the second line are diverging. In fact, there exists a finite integer k* such that, for any real, arbitrarily small ε > 0, it holds that Pr[X ≥ k*] = 1 − ε and Pr[X ≥ k*] ≤ Pr[X ≥ k] for all k < k*. Hence,

    Σ_{k=−∞}^{k*} Pr[X ≥ k] + Σ_{k=k*}^{∞} Pr[X ≥ k] ≥ (1 − ε) Σ_{k=−∞}^{k*} 1 + c → ∞

where Σ_{k=k*}^{∞} Pr[X ≥ k] = c is finite. Also, even for negative X, Σ_{k=−∞}^{∞} Pr[X ≥ k] is always positive.

¹⁰ An entire (or integral) function is a complex function without singularities in the finite complex plane. Hence, a power series around any finite point has infinite radius of convergence. In other words, it exists for all finite complex values.

¹¹ The Landau big O-notation specifies the order of a function when the argument tends to some limit. Most often the limit is to infinity, but the O-notation can also be used to characterize the behavior of a function around some finite point. Formally, f(x) = O(g(x)) for x → ∞ means that there exist positive numbers c and x₀ for which |f(x)| ≤ c|g(x)| for x > x₀.

¹² The lognormal distribution defined by (3.43) is an example where the summation (2.40) diverges for any z ≠ 0.
Moments follow from (2.40) by differentiation,

    E[Xⁿ] = (−1)ⁿ dⁿφ_X(z)/dzⁿ |_{z=0}      (2.42)

and, because E[X²] = φ''_X(0) and E[X] = −φ'_X(0),

    Var[X] = φ''_X(0) − (φ'_X(0))² = L''_X(0)      (2.43)

The latter expression makes L_X(z) for a continuous random variable particularly useful. Since the variance is always positive, it demonstrates that L_X(z) is convex (see Section 5.5) around z = 0. Finally, we mention that

    E[(X − E[X])³] = −L'''_X(0)
The conditional probability of the event A given the event B is defined as

    Pr[A|B] = Pr[A ∩ B] / Pr[B]      (2.44)

The definition implicitly assumes that the event B has positive probability; otherwise the conditional probability remains undefined. We quote Feller (1970, p. 116):

"Taking conditional probabilities of various events with respect to a particular hypothesis B amounts to choosing B as a new sample space with probabilities proportional to the original ones; the proportionality factor Pr[B] is necessary in order to reduce the total probability of the new sample space to unity. This formulation shows that all general theorems on probabilities are valid for conditional probabilities with respect to any particular hypothesis."

For example, the law Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B] takes the form

    Pr[A ∪ B|C] = Pr[A|C] + Pr[B|C] − Pr[A ∩ B|C]      (2.45)
Finally, since all events A ∩ B_k are mutually exclusive, Pr[A] = Σ_k Pr[A ∩ B_k]. Thus, if Ω = ∪_k B_k and, in addition, for any pair j, k it holds that B_k ∩ B_j = ∅, we have proved the law of total probability or decomposability,

    Pr[A] = Σ_k Pr[A|B_k] Pr[B_k]      (2.46)

Combining this with the definition (2.44) yields Bayes' rule,

    Pr[B_k|A] = Pr[B_k ∩ A]/Pr[A] = Pr[A ∩ B_k]/Pr[A] = Pr[A|B_k] Pr[B_k]/Pr[A]      (2.48)
where the Pr[B_k] are called the a priori probabilities, while the Pr[B_k|A] are the a posteriori probabilities.

The conditional distribution function of the random variable Y given X is defined by

    F_{Y|X}(y|x) = Pr[Y ≤ y|X = x]      (2.50)

In the discrete case, the conditional probabilities are Pr[Y = y|X = x] = Pr[X = x, Y = y]/Pr[X = x], while, in the continuous case, the conditional probability density function is

    f_{Y|X}(y|x) = f_{XY}(x, y) / f_X(x)      (2.51)
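The pair of results (2.46) and (2.48) is the workhorse of detection problems. A tiny illustrative example (the numbers are made up, not from the book): a bit B is 0 with a priori probability 9/10, and a noisy channel reports the event A = "1 received" with the likelihoods below.

```python
from fractions import Fraction

prior = {0: Fraction(9, 10), 1: Fraction(1, 10)}        # a priori Pr[B_k]
likelihood = {0: Fraction(1, 20), 1: Fraction(19, 20)}  # Pr[A | B_k]

# Law of total probability (2.46): Pr[A] = sum_k Pr[A|B_k] Pr[B_k]
pr_A = sum(likelihood[k] * prior[k] for k in prior)

# Bayes' rule (2.48): a posteriori probabilities Pr[B_k | A]
posterior = {k: likelihood[k] * prior[k] / pr_A for k in prior}

print(pr_A)          # 7/50
print(posterior[1])  # 19/28: even a reliable "1" report leaves sizeable doubt
```

The example shows the a priori/a posteriori terminology at work: observing A shifts the probability of B = 1 from 1/10 up to 19/28.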
The simplest example of the general function is Z = X + Y. In that case, the sum is over all x and y that satisfy x + y = z. Thus,

    Pr[X + Y = z] = Σ_x Pr[X = x, Y = z − x] = Σ_y Pr[X = z − y, Y = y]

and, if X and Y are independent,

    Pr[X + Y = z] = Σ_y Pr[X = z − y] Pr[Y = y]      (2.56)
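The last formula is a discrete convolution; for two independent fair dice it reproduces the familiar triangular pmf of the sum (a sketch, not from the book).

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # pmf of one fair die

def convolve(p: dict, q: dict) -> dict:
    """Pmf of X + Y for independent X ~ p, Y ~ q:
    Pr[X + Y = z] = sum_y Pr[X = z - y] Pr[Y = y]."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, Fraction(0)) + px * qy
    return r

total = convolve(die, die)
print(total[7], total[2], total[12])  # 7 is the most likely sum
```

Because the masses are exact fractions, the result sums to exactly 1, a convenient check that no term of the double sum was dropped.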
Another example: let U be uniform on [0, 1] and X = cos(2πU) and Y = sin(2πU). Using (2.34),

    E[XY] = ∫_0^1 cos(2πu) sin(2πu) du = 0

as well as E[X] = E[Y] = 0. Thus, Cov[X, Y] = 0, but X and Y are perfectly dependent because X = cos(arcsin Y) = √(1 − Y²).
then E[S_n] = Σ_{k=1}^{n} μ_k and

    Var[S_n] = E[(S_n − E[S_n])²] = E[(Σ_{k=1}^{n} (X_k − μ_k))²]
             = E[Σ_{k=1}^{n} Σ_{j=1}^{n} (X_k − μ_k)(X_j − μ_j)]
             = E[Σ_{k=1}^{n} (X_k − μ_k)² + 2 Σ_{k=1}^{n} Σ_{j=k+1}^{n} (X_k − μ_k)(X_j − μ_j)]

Using the linearity of the expectation operator and the definition of the covariance (2.56) yields

    Var[S_n] = Σ_{k=1}^{n} Var[X_k] + 2 Σ_{k=1}^{n} Σ_{j=k+1}^{n} Cov[X_k, X_j]      (2.57)

Observe that, for a set of independent random variables {X_k}, the double sum with the covariances vanishes.
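The variance-of-a-sum identity is easy to verify exactly on a small joint pmf with dependence; the joint distribution below is an arbitrary illustration, not an example from the book.

```python
from fractions import Fraction

# Joint pmf of (X1, X2): a positively dependent Bernoulli-like pair.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def E(f) -> Fraction:
    return sum(p * f(x) for x, p in joint.items())

mu = [E(lambda x, i=i: x[i]) for i in range(2)]
var = [E(lambda x, i=i: (x[i] - mu[i]) ** 2) for i in range(2)]
cov = E(lambda x: (x[0] - mu[0]) * (x[1] - mu[1]))

direct = E(lambda x: (x[0] + x[1] - mu[0] - mu[1]) ** 2)  # Var[X1 + X2] directly
formula = var[0] + var[1] + 2 * cov                       # the identity for n = 2
print(direct, formula)  # both 3/4; independence would give 1/2
```

The positive covariance (1/8 here) inflates the variance of the sum above the independent value, which is the qualitative content of the double-sum term.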
The Cauchy-Schwarz inequality (5.17) derived in Chapter 5 indicates that

    (E[(X − μ_X)(Y − μ_Y)])² ≤ E[(X − μ_X)²] E[(Y − μ_Y)²]

such that the covariance is always bounded by

    |Cov[X, Y]| ≤ σ_X σ_Y
2.5.3 The linear correlation coefficient

Since the covariance is not dimensionless, the linear correlation coefficient, defined as

    ρ(X, Y) = Cov[X, Y] / (σ_X σ_Y)      (2.58)

is often convenient to relate two (or more) different physical quantities expressed in different units. The linear correlation coefficient remains invariant (possibly apart from the sign) under a linear transformation because

    ρ(aX + b, cY + d) = sign(ac) ρ(X, Y)

This transform shows that the linear correlation coefficient ρ(X, Y) is independent of the value of the mean μ_X and of the variance σ_X², provided σ_X² > 0. Therefore, many computations simplify if we normalize the random variable properly. Let us introduce the concept of a normalized random variable
X* = (X − μ_X)/σ_X. The normalized random variable has a zero mean and a variance equal to one. By the invariance under a linear transform, the correlation coefficient ρ(X, Y) = ρ(X*, Y*) and also ρ(X, Y) = Cov[X*, Y*]. The variance of X* ± Y* follows from (2.57) as

    Var[X* ± Y*] = 2 ± 2ρ(X, Y)

For the linear regression line y = a_R x + b_R, the intercept is

    b_R = E[Y] − a_R E[X]

Since a correlation coefficient ρ(X, Y) = 1 implies Cov[X, Y] = σ_X σ_Y, we see that a_R = σ_Y/σ_X, as derived above with normalized random variables.
Although the linear correlation coefficient is a natural measure of the dependence between random variables, it has some disadvantages. First, the variances of $X$ and $Y$ must exist, which may cause problems with heavy-tailed distributions. Second, as illustrated above, dependence can lead to uncorrelation, which is awkward. Third, linear correlation is not invariant under non-linear strictly increasing transformations $T$, in the sense that $\rho(T(X), T(Y)) \neq \rho(X,Y)$. Common intuition expects that dependence measures should be invariant under these transforms $T$. This leads to the definition of rank correlation, which satisfies that invariance property. Here, we merely mention Spearman's rank correlation coefficient, which is defined as
$$\rho_S(X,Y) = \rho(F_X(X), F_Y(Y))$$
where $\rho$ is the linear correlation coefficient and where the non-linear strictly increasing transform is the probability distribution. More details are found in Embrechts et al. (2001b) and in Chapter 4.
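The invariance of the rank correlation under strictly increasing transforms can be checked numerically. The sketch below is our own illustration (assuming NumPy is available; the helper name `rank_correlation` is ours): it computes Spearman's coefficient as the linear correlation of the ranks and applies a monotone transform.

```python
import numpy as np

def rank_correlation(x, y):
    """Spearman's rank correlation: the linear correlation coefficient
    of the ranks, i.e. of the empirical F_X(X) and F_Y(Y)."""
    rx = np.argsort(np.argsort(x))  # ranks of the samples of X
    ry = np.argsort(np.argsort(y))  # ranks of the samples of Y
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(42)
x = rng.standard_normal(5000)
y = x + 0.5 * rng.standard_normal(5000)

rho_before = rank_correlation(x, y)
# A strictly increasing transform (here exp) leaves the ranks, and hence
# the rank correlation, unchanged, while the linear correlation of
# (e^X, e^Y) generally differs from that of (X, Y).
rho_after = rank_correlation(np.exp(x), np.exp(y))
```

Since `exp` preserves the ordering of the samples, `rho_before` and `rho_after` coincide exactly.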
The joint probability density function is
$$f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\, \partial y} \qquad (2.59)$$
Hence,
$$F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(u,v)\, du\, dv \qquad (2.60)$$
and the expectation of a function $g$ of two random variables is
$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dx\, dy \qquad (2.61)$$
Most of the difficulties occur in the evaluation of the multiple integrals. The change of variables in multiple dimensions involves the Jacobian. Consider the transformed random variables $U = g_1(X,Y)$ and $V = g_2(X,Y)$ and denote the inverse transform by $x = h_1(u,v)$ and $y = h_2(u,v)$, then
$$f_{UV}(u,v) = f_{XY}(h_1(u,v),\, h_2(u,v))\, |J(u,v)|$$
where the Jacobian $J(u,v)$ is
$$J(u,v) = \det\begin{bmatrix}\dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v}\\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v}\end{bmatrix}$$
The generating function of a sum $S_n = \sum_{k=1}^{n} X_k$ of independent random variables $X_k$ is
$$\varphi_{S_n}(z) = E\left[z^{S_n}\right] \qquad (2.64)$$
and, by independence,
$$\varphi_{S_n}(z) = E\left[\prod_{k=1}^{n} z^{X_k}\right] = \prod_{k=1}^{n}\varphi_{X_k}(z) \qquad (2.65)$$
If the number of terms $N$ in the sum $S_N = \sum_{k=1}^{N} X_k$ is itself a random variable, independent of the $X_k$, then, by the law of total probability,
$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty}\sum_{x} z^{x}\, \Pr[S_N = x\,|\,N = k]\, \Pr[N = k] \qquad (2.66)$$
we have
$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty}\varphi_{S_k}(z)\, \Pr[N = k] \qquad (2.67)$$
Differentiation at $z = 1$ gives
$$\varphi'_{S_N}(1) = \sum_{k=0}^{\infty}\varphi'_{S_k}(1)\, \Pr[N = k] = \sum_{k=0}^{\infty} E[S_k]\, \Pr[N = k] \qquad (2.68)$$
Since $E[S_k] = E\left[\sum_{j=1}^{k} X_j\right] = \sum_{j=1}^{k} E[X_j]$ and assuming that all random variables $X_j$ have equal mean $E[X_j] = E[X]$, we have
$$E[S_N] = \sum_{k=0}^{\infty} k\, E[X]\, \Pr[N = k]$$
or
$$E[S_N] = E[X]\, E[N] \qquad (2.69)$$
A similar computation with the second derivative yields the variance of the random sum,
$$\mathrm{Var}[S_N] = E[N]\,\mathrm{Var}[X] + \mathrm{Var}[N]\,(E[X])^2 \qquad (2.70)$$
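Relation (2.69) is easy to verify by simulation. The following sketch is our own illustration (standard library only; all names are ours): it draws a random number $N$ of exponential summands and compares the Monte Carlo average of $S_N$ with $E[X]E[N]$.

```python
import random

random.seed(7)

def random_sum():
    # N: number of successes in 20 Bernoulli(0.3) trials, so E[N] = 6
    n = sum(random.random() < 0.3 for _ in range(20))
    # X_j: exponential with rate 0.5, so E[X] = 2
    return sum(random.expovariate(0.5) for _ in range(n))

runs = 200_000
estimate = sum(random_sum() for _ in range(runs)) / runs
# (2.69) predicts E[S_N] = E[X] E[N] = 2 * 6 = 12
```

With 200,000 runs the standard error of the estimate is well below the tolerance used in the check.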
The conditional density $f_{Y|X}(y|x)$, defined by (2.51), of the random variable $Y_c = Y|X$ can be regarded as a function of $y$ only. Using the definition of the expectation (2.33) for continuous random variables (the discrete case is analogous), we have
$$E[Y|X=x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\, dy \qquad (2.71)$$
Since this expression holds for any value of $x$ that the random variable $X$ can take, we see that $E[Y|X=x] = g(x)$ is a function of $x$ and, in addition, since $X = x$, $E[Y|X] = g(X)$ can be regarded as a random variable that is a function of the random variable $X$. Having identified the conditional expectation as a random variable, let us compute its expectation, or the expectation of the slightly more general random variable $h(X)\, g(X)$ with $g(X) = E[Y|X]$. From the general definition (2.34) of the expectation, it follows that
$$E[h(X)\, g(X)] = \int_{-\infty}^{\infty} h(x)\, g(x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} h(x)\, E[Y|X=x]\, f_X(x)\, dx$$
where we have used (2.51) and (2.61). Thus, we find the interesting relation
$$E\left[h(X)\, E[Y|X]\right] = E[h(X)\, Y] \qquad (2.72)$$
As a special case where $h(x) = 1$, the expectation of the conditional expectation follows as
$$E[Y] = E_X\left[E_Y[Y|X=x]\right]$$
where the index in $E_Z$ clarifies that the expectation is over the random variable $Z$. Applying this relation to $Y = z^{S_N}$, where $S_N = \sum_{k=1}^{N} X_k$ and all $X_k$ are independent, yields
$$\varphi_{S_N}(z) = E\left[z^{S_N}\right] = E_N\left[E_S\left[z^{S_N}|N = n\right]\right] = \sum_{k=0}^{\infty}\varphi_{S_k}(z)\, \Pr[N = k]$$
which is (2.67).
3
Basic distributions

The binomial random variable $X$ counts the number of successes in $n$ independent Bernoulli trials, each with success probability $p$ and failure probability $q = 1-p$,
$$\Pr[X = k] = \binom{n}{k} p^k q^{n-k} \qquad (3.1)$$
with probability generating function
$$\varphi_X(z) = (q + pz)^n \qquad (3.2)$$
Expanding the binomial pgf in powers of $z$, which justifies the name binomial,
$$\varphi_X(z) = \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\, z^k \qquad (3.3)$$
The alternative, probabilistic approach starts with (3.3). Indeed, the probability that $X$ has $k$ successes out of $n$ trials consists of precisely $k$ successes (an event with probability $p^k$) and $n-k$ failures (with probability equal to $q^{n-k}$). The total number of ways in which $k$ successes out of $n$ trials can be obtained is precisely $\binom{n}{k}$.

The mean follows from (2.23) or from the definition $X = \sum_{j=1}^{n} X_{\mathrm{Bernoulli}}$ and the linearity of the expectation as $E[X] = np$. Higher order moments around the mean can be derived from (2.24) as
$$E[(X-\mu)^m] = \frac{d^m}{dt^m}\left[e^{-tnp}\left(q + pe^t\right)^n\right]_{t=0} = \frac{d^m}{dt^m}\left[\sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\, e^{t(k-np)}\right]_{t=0}$$
$$= \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\, (k - np)^m \qquad (3.4)$$
The geometric random variable $X$ counts the number of independent Bernoulli trials up to and including the first success,
$$\Pr[X = k] = p\, q^{k-1}, \qquad k = 1, 2, \ldots \qquad (3.5)$$
with probability generating function
$$\varphi_X(z) = \sum_{k=1}^{\infty} p\, q^{k-1} z^k = \frac{pz}{1-qz} \qquad (3.6)$$
The centered moments, with $\mu = \frac{1}{p}$, follow from
$$E[(X-\mu)^n] = p\,\frac{d^n}{dt^n}\left[\frac{e^{-tq/p}}{1-qe^t}\right]_{t=0} = p\sum_{k=0}^{\infty} q^k\left(k - \frac{q}{p}\right)^n$$
Similarly as for the binomial random variable, the variance most easily follows from (2.27) with $L_X(z) = \log p + \log z - \log(1-qz)$, $L'_X(z) = \frac{1}{z} + \frac{q}{1-qz}$, $L''_X(z) = -\frac{1}{z^2} + \frac{q^2}{(1-qz)^2}$. Thus,
$$\mathrm{Var}[X] = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2} \qquad (3.7)$$
The distribution function $F_X(k) = \Pr[X \le k] = \sum_{j=1}^{k}\Pr[X = j]$ is obtained as
$$\Pr[X \le k] = p\sum_{j=0}^{k-1} q^j = p\,\frac{1-q^k}{1-q} = 1 - q^k \qquad (3.8)$$
Hence, the probability that the number of trials until the first success is larger than $k$ decreases geometrically in $k$ with rate $q$. Let us now consider an important application of the conditional probability. The probability that, given the success is not found in the first $k$ trials, success does not occur within the next $m$ trials, is, with (2.44),
$$\Pr[X > k+m\,|\,X > k] = \frac{\Pr[\{X > k+m\}\cap\{X > k\}]}{\Pr[X > k]} = \frac{\Pr[X > k+m]}{\Pr[X > k]} = \frac{q^{k+m}}{q^k} = q^m = \Pr[X > m]$$
which shows that the geometric random variable is memoryless: the past has no influence on the future.
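The memoryless property can be checked directly from (3.8). A minimal sketch (our own, in Python):

```python
# geometric distribution with success probability p
p = 0.2
q = 1 - p

def tail(k):
    # Pr[X > k] = q^k, by (3.8)
    return q ** k

# Pr[X > k + m | X > k] equals Pr[X > m] for every k and m
k, m = 5, 3
conditional = tail(k + m) / tail(k)
unconditional = tail(m)
```

Both quantities evaluate to $q^m = 0.8^3 = 0.512$.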
The Poisson random variable $X$ with parameter $\lambda > 0$ has $\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!}$ and probability generating function
$$\varphi_X(z) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\, z^k = e^{\lambda(z-1)} \qquad (3.10)$$
from which $E[X] = \varphi'_X(1) = \lambda$ and $\mathrm{Var}[X] = \lambda$. The centered moments follow from
$$E[(X-\lambda)^n] = \frac{d^n}{dt^n}\left[e^{-\lambda(t - e^t + 1)}\right]_{t=0} \qquad (3.11)$$
from which the distribution function follows as
$$F_X(m) = \sum_{k=0}^{m}\frac{\lambda^k e^{-\lambda}}{k!}$$
The Poisson distribution arises as the limit of the binomial distribution for a large number $n$ of trials, each with small success probability $p = \frac{\lambda}{n}$ (the law of rare events). With $p = \frac{\lambda}{n}$,
$$\Pr[X = k] = \frac{n!}{k!\,(n-k)!}\,\frac{\lambda^k}{n^k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!}\prod_{j=1}^{k-1}\left(1-\frac{j}{n}\right)\left(1-\frac{\lambda}{n}\right)^{-k}\left(1-\frac{\lambda}{n}\right)^{n}$$
or
$$\log\left(\Pr[X = k]\right) = \log\frac{\lambda^k}{k!} + \sum_{j=1}^{k-1}\log\left(1-\frac{j}{n}\right) - k\log\left(1-\frac{\lambda}{n}\right) + n\log\left(1-\frac{\lambda}{n}\right)$$
For large $n$, we use the Taylor expansion $\log\left(1-\frac{x}{n}\right) = -\frac{x}{n} - \frac{x^2}{2n^2} + O(n^{-3})$ to obtain, up to order $O(n^{-2})$,
$$\log\left(\Pr[X = k]\right) = \log\frac{\lambda^k}{k!} - \lambda + \frac{k\lambda}{n} - \frac{k(k-1)}{2n} - \frac{\lambda^2}{2n} + O(n^{-2}) = \log\frac{\lambda^k}{k!} - \lambda - \frac{(k-\lambda)^2 - k}{2n} + O(n^{-2})$$
With $e^x = 1 + x + O(x^2)$, we finally obtain the approximation for large $n$,
$$\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!}\left(1 - \frac{(k-\lambda)^2 - k}{2n} + O(n^{-2})\right)$$
42
Basic distributions
The coe!cient of
1
q
k
is negative if n M +
1
2
t
+ 14 > +
1
2
t
l
+ 14 . In that n-interval,
the Poisson density is a lower bound for the binomial density for large q and qs = . The reverse
holds for values of n outside that interval. Since for the Poisson density
Pr[[=n]
Pr[[=n31]
,
n
we
see that Pr[[ = n] increases as A n and decreases as ? n. Thus, the maximum of the
Poisson density lies around n = = H[[]. In conclusion, we can say that the Poisson density
approximates the binomial density for large q and qs = from below in the region of about the
I
standard deviation around the mean H[[] = and from above outside this region (in the
tails of the distribution).
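The quality of the Poisson approximation, including the first-order correction term derived above, can be inspected numerically. The sketch below is our own illustration (all names are ours): it compares the exact binomial probabilities with the plain and corrected Poisson approximations.

```python
from math import comb, exp, factorial

n, lam = 200, 4.0
p = lam / n

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

def corrected(k):
    # Poisson pmf times the first-order correction derived above
    return poisson_pmf(k) * (1 - ((k - lam)**2 - k) / (2 * n))

err_plain = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(30))
err_corr = max(abs(binom_pmf(k) - corrected(k)) for k in range(30))
```

Near the mean ($k = \lambda = 4$) the Poisson density indeed lies below the binomial density, and the corrected approximation is an order of magnitude more accurate than the plain one.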
The same conclusion follows from the generating function, since
$$\lim_{n\to\infty}\varphi_X(z) = \lim_{n\to\infty}\left(1 + \frac{\lambda(z-1)}{n}\right)^n = e^{\lambda(z-1)}$$
Invoking the Continuity Theorem 6.1.3, comparison with (3.10) shows that the limit probability generating function corresponds to a Poisson distribution. The Stein–Chen (1975) Theorem¹ generalizes the law of rare events: this law even holds when the Bernoulli trials are weakly dependent.

As a final remark, let $S_n$ be the sum of $n$ i.i.d. Bernoulli trials, each with mean $p$; then $S_n$ is binomially distributed as shown in Section 3.1.2. If $p$ is a constant, independent of the number of trials $n$, the Central Limit Theorem 6.3.1 states that $\frac{S_n - np}{\sqrt{np(1-p)}}$ tends to a Gaussian distribution with density $\frac{e^{-x^2/2}}{\sqrt{2\pi}}$, whereas $S_n$ itself tends to a Poisson random variable, $\Pr[S_n = k] \to \frac{\lambda^k e^{-\lambda}}{k!}$, when $p = \frac{\lambda}{n}$ decreases with $n$.

¹ The proof (see e.g. Grimmett and Stirzaker (2001, pp. 130–132)) involves coupling theory of stochastic random variables. The degree of dependence is expressed in terms of the total variation distance. The total variation distance between two discrete random variables $X$ and $Y$ is defined as
$$d_{TV}(X,Y) = \sum_{k}\left|\Pr[X = k] - \Pr[Y = k]\right|$$
and satisfies
$$d_{TV}(X,Y) = 2\sup_{A\subset\mathbb{Z}}\left|\Pr[X \in A] - \Pr[Y \in A]\right|$$
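The total variation distance between the binomial and its Poisson approximation can be computed directly from the definition above. A small sketch (our own; the truncation at `kmax` is a numerical convenience, justified because both tails are negligible for small $\lambda$):

```python
from math import comb, exp, factorial

def dtv_binomial_poisson(n, lam, kmax=80):
    """Total variation distance, in the summed form defined above,
    between Bin(n, lam/n) and Poisson(lam)."""
    p = lam / n
    d = 0.0
    for k in range(kmax):
        b = comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
        q = lam**k * exp(-lam) / factorial(k)
        d += abs(b - q)
    return d

# the distance shrinks as n grows with lam = np held fixed
d_small = dtv_binomial_poisson(50, 2.0)
d_large = dtv_binomial_poisson(500, 2.0)
```

Le Cam's inequality guarantees that the summed distance is at most $2np^2 = 2\lambda^2/n$, consistent with what the computation returns.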
The uniform random variable $X$ on the interval $[a,b]$ has probability density function
$$f_X(x) = \frac{1}{b-a}\, 1_{x\in[a,b]} \qquad (3.12)$$
with Laplace transform
$$\varphi_X(z) = \frac{e^{-za} - e^{-zb}}{z(b-a)} \qquad (3.13)$$
The centered moments, with $\mu = \frac{a+b}{2}$, follow from
$$E[(X-\mu)^n] = \frac{2(-1)^n}{b-a}\,\frac{d^n}{dz^n}\left[\frac{\sinh\left(\frac{(b-a)z}{2}\right)}{z}\right]_{z=0}$$
which leads to
$$E\left[(X-\mu)^{2n}\right] = \frac{(b-a)^{2n}}{(2n+1)\,2^{2n}}, \qquad E\left[(X-\mu)^{2n+1}\right] = 0 \qquad (3.14)$$
Notice that, for $b > a > 0$, $\frac{ab}{b-a}\left(e^{-at} - e^{-bt}\right)$ equals the convolution $f * g$ of two exponential densities $f$ and $g$ with rates $a$ and $b$, respectively.

Let us define $U$ as the uniform random variable on the interval $[0,1]$. If $W = 1-U$, then $W$ is again a uniform random variable on $[0,1]$: $W$ and $U$ have the same distribution.
The exponential random variable $X$ has probability density function
$$f_X(x) = \lambda e^{-\lambda x}, \qquad x \ge 0 \qquad (3.15)$$
where $\lambda$ is the rate at which events occur. The corresponding Laplace transform is
$$\varphi_X(z) = \int_0^{\infty}\lambda e^{-\lambda t}\, e^{-zt}\, dt = \frac{\lambda}{z+\lambda} \qquad (3.16)$$
and the probability distribution is, for $x \ge 0$,
$$F_X(x) = 1 - e^{-\lambda x} \qquad (3.17)$$
45
The mean or average follows from (2.33) or from H [[] = *0[ (0) as =
H [[] = 1 . The centered moments are obtained from (2.39) as
q
q h}@
g
}+
1
H [
= (1)q
g} q
}=0
h}@
around } = 0 is
!
q
"
"
"
n X
h}@ X 1 } n X
1 q X (1)n
n }
}q
=
(1)
=
}+
n!
n!
n=0
we nd that
n=0
}+
q=0
n=0
q
1 q
q! X (1)n
H [
= q
n!
(3.18)
n=0
Pr [{[ w + W } _ {[ A w}]
Pr [[ w + W ]
=
Pr [[ A w]
Pr [[ A w]
46
Basic distributions
transition from the discrete to continuous space involves the limit process
w $ 0 subject to a xed average waiting time H [W ]. Let w = nw, then
w w@{w
= h3w@H[W ]
lim Pr[W A w] = lim 1
{w<0
{w<0
H [W ]
For arbitrary small time units, the waiting time for the rst success and
with average H [W ] turns out to be an exponential random variable.
A Gaussian random variable has probability density function
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (3.19)$$
which explicitly shows its dependence on the average $\mu$ and variance $\sigma^2$. The importance of the Gaussian random variables stems from the Central Limit Theorem 6.3.1. Often a Gaussian, also called normal, random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N(\mu,\sigma^2)$. The distribution function is
$$F_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x}\exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt = \Phi\left(\frac{x-\mu}{\sigma}\right) \qquad (3.20)$$
where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\, dt$ is the normalized Gaussian distribution corresponding to $\mu = 0$ and $\sigma = 1$. Abramowitz and Stegun (1968, Section 7.1.1) define the error function as
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^{z} e^{-t^2}\, dt$$
in terms of which
$$F_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x}\exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right) \qquad (3.21)$$
The double-sided Laplace transform is
$$\varphi_X(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-zt}\exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt = \exp\left(\frac{\sigma^2 z^2}{2} - \mu z\right) \qquad (3.22)$$
The centered moments follow from (3.22) as
$$E\left[(X-\mu)^{2n}\right] = \frac{d^{2n}}{dz^{2n}}\left[e^{\frac{\sigma^2 z^2}{2}}\right]_{z=0} = \frac{(2n)!}{n!}\,\frac{\sigma^{2n}}{2^n}, \qquad E\left[(X-\mu)^{2n+1}\right] = 0 \qquad (3.23)$$
The sum of $n$ independent Gaussian random variables $\sum_{k=1}^{n} N(\mu_k, \sigma_k^2)$ is again a Gaussian random variable $N\left(\sum_{k=1}^{n}\mu_k,\; \sum_{k=1}^{n}\sigma_k^2\right)$. If $X = N(\mu,\sigma^2)$, then the scaled random variable $Y = aX$ is a $N(a\mu, (a\sigma)^2)$ random variable, which is verified by computing $\Pr[Y \le y] = \Pr\left[X \le \frac{y}{a}\right]$. Similarly for translation: if $Y = X + b$, then $Y = N(\mu + b, \sigma^2)$. Hence, a linear combination of Gaussian random variables is again a Gaussian random variable,
$$\sum_{k=1}^{n} a_k\, N(\mu_k, \sigma_k^2) + b = N\left(\sum_{k=1}^{n} a_k\mu_k + b,\; \sum_{k=1}^{n} a_k^2\sigma_k^2\right)$$
The sum $S_n$ of $n$ independent exponential random variables with rates $\lambda_k$ has Laplace transform
$$\varphi_{S_n}(z) = \prod_{k=1}^{n}\frac{\lambda_k}{z + \lambda_k}$$
The density follows by inverse Laplace transform. For equal rates $\lambda_k = \lambda$, the contour can be closed over the negative half plane and the $n$-th order pole at $z = -\lambda$ is deduced from Cauchy's relation for the $n$-th derivative of a complex function,
$$\frac{1}{n!}\left.\frac{d^n f(z)}{dz^n}\right|_{z=z_0} = \frac{1}{2\pi i}\oint_{C(z_0)}\frac{f(\omega)\, d\omega}{(\omega - z_0)^{n+1}}$$
as
$$f_{S_n}(t) = \frac{\lambda^n}{(n-1)!}\left.\frac{d^{n-1} e^{zt}}{dz^{n-1}}\right|_{z=-\lambda} = \frac{\lambda(\lambda t)^{n-1}}{(n-1)!}\, e^{-\lambda t} \qquad (3.24)$$
This is the Erlang density, a special case of the Gamma density, obtained by extending the integer $n$ to any real $\alpha > 0$,
$$f_X(t) = \frac{\lambda(\lambda t)^{\alpha-1}}{\Gamma(\alpha)}\, e^{-\lambda t} \qquad (3.25)$$
with Laplace transform
$$\varphi_X(z;\alpha,\lambda) = \left(\frac{\lambda}{z+\lambda}\right)^{\alpha} = \left(1 + \frac{z}{\lambda}\right)^{-\alpha} \qquad (3.26)$$
and distribution
$$F_X(x;\alpha,\lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\int_0^{x} t^{\alpha-1} e^{-\lambda t}\, dt \qquad (3.27)$$
The mean is $E[X] = \mu = \frac{\alpha}{\lambda}$ and the centered moments follow from (3.26) as
$$E[(X-\mu)^n] = (-1)^n\frac{d^n}{dz^n}\left[e^{\frac{\alpha z}{\lambda}}\left(1+\frac{z}{\lambda}\right)^{-\alpha}\right]_{z=0} = \frac{(-\alpha)^n}{\lambda^n}\sum_{m=0}^{n}\binom{n}{m}\frac{(-1)^m\,\Gamma(\alpha+m)}{\Gamma(\alpha)\,\alpha^m}$$
a terminating confluent hypergeometric sum. In particular,
$$\sigma^2 = \frac{\alpha}{\lambda^2}, \quad E[(X-\mu)^3] = \frac{2\alpha}{\lambda^3}, \quad E[(X-\mu)^4] = \frac{3\alpha(\alpha+2)}{\lambda^4}, \quad E[(X-\mu)^5] = \frac{4\alpha(5\alpha+6)}{\lambda^5}$$
The sum $S_k$ of $k$ i.i.d. uniform random variables on $[0,1]$ has Laplace transform
$$\varphi_{S_k}(z) = \left(\frac{1-e^{-z}}{z}\right)^k \qquad (3.29)$$
with distribution function
$$F_{S_k}(x) = \sum_{j=0}^{\lfloor x\rfloor}(-1)^j\frac{(x-j)^k}{j!\,(k-j)!}$$
and density
$$f_{S_k}(x) = \frac{1}{(k-1)!}\sum_{j=0}^{k}(-1)^j\binom{k}{j}(x-j)^{k-1}\, 1_{x-j\ge 0} \qquad (3.30)$$
The square of a normalized Gaussian random variable has density $\frac{e^{-x/2}}{\sqrt{2\pi x}}$ and, more generally, the sum of the squares of $n$ independent normalized Gaussian random variables is a chi-square random variable with $n$ degrees of freedom,
$$f_{\chi_n^2}(x) = \frac{x^{\frac{n}{2}-1}\, e^{-\frac{x}{2}}}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} \qquad (3.31)$$
For $m$ independent random variables $X_1, \ldots, X_m$,
$$\Pr\left[\min_{1\le k\le m} X_k \le x\right] = 1 - \prod_{k=1}^{m}\Pr[X_k > x] \qquad (3.32)$$
Similarly,
$$\Pr\left[\max_{1\le k\le m} X_k > x\right] = \Pr[\text{not all } X_k \le x] = 1 - \prod_{k=1}^{m}\Pr[X_k \le x]$$
or
$$\Pr\left[\max_{1\le k\le m} X_k \le x\right] = \prod_{k=1}^{m}\Pr[X_k \le x] \qquad (3.33)$$
For independent exponential random variables $X_k$ with rates $\lambda_k$,
$$\Pr\left[\min_{1\le k\le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} e^{-\lambda_k x} = 1 - \exp\left(-x\sum_{k=1}^{m}\lambda_k\right)$$
so the minimum of independent exponential random variables is again exponential, with rate $\sum_{k=1}^{m}\lambda_k$. An alternative argument for independent random variables is that the event $\{\min_{1\le k\le m} X_k > x\}$ occurs if and only if $\{X_k > x\}$ for each $1\le k\le m$. Similarly, the event $\{\max_{1\le k\le m} X_k \le x\}$ occurs if and only if $\{X_k \le x\}$ for each $1\le k\le m$.
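The fact that the minimum of independent exponentials is again exponential with the summed rate is easily confirmed by simulation. A minimal sketch (our own, standard library only):

```python
import math
import random

random.seed(1)
rates = [0.5, 1.0, 2.5]   # rates of three independent exponentials

runs = 100_000
mins = [min(random.expovariate(r) for r in rates) for _ in range(runs)]

# The minimum is exponential with rate sum(rates) = 4.0, hence mean 0.25
mean_min = sum(mins) / runs
x = 0.3
empirical_cdf = sum(v <= x for v in mins) / runs
predicted_cdf = 1 - math.exp(-sum(rates) * x)
```

Both the empirical mean and the empirical distribution function agree with the exponential with rate 4.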
The joint density of the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(m)}$ of $m$ i.i.d. random variables is
$$f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \frac{\partial^m}{\partial x_1\cdots\partial x_m}\Pr\left[X_{(1)} \le x_1, \ldots, X_{(m)} \le x_m\right] = m!\prod_{j=1}^{m} f_X(x_j) \qquad (3.34)$$
because there are precisely $m!$ permutations of the set $\{X_k\}_{1\le k\le m}$ onto the given ordered sequence $\{x_1, x_2, \ldots, x_m\}$. If the sequence is not ordered, such that $x_k > x_l$ for at least one couple of indices $k < l$, then the probability is zero because the event $\left\{X_{(k)} > X_{(l)}\right\}$ is, by definition, impossible. Finally, the product in (3.34) follows by independence.

If the set $\{X_k\}_{1\le k\le m}$ is uniformly distributed over $[0,t]$, then
$$f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \begin{cases}\dfrac{m!}{t^m} & 0 \le x_1 < x_2 < \cdots < x_m \le t\\[1mm] 0 & \text{elsewhere}\end{cases}$$
Similarly, for i.i.d. exponential random variables with rate $\lambda$,
$$f_{\{X_{(j)}\}}(x_1, \ldots, x_m) = \begin{cases} m!\,\lambda^m\, e^{-\lambda\sum_{j=1}^{m} x_j} & 0 \le x_1 < x_2 < \cdots < x_m\\ 0 & \text{elsewhere}\end{cases}$$
The order of a set of random variables is preserved after a continuous, non-decreasing transform $g$, i.e. $g(X)_{(k)} = g(X_{(k)})$.
The event $\left\{X_{(k)} \le x\right\}$ means that at least $k$ among the $m$ random variables $\{X_j\}_{1\le j\le m}$ are smaller than $x$. Since each of the $m$ random variables is chosen independently from a same distribution $F_X$, the probability that precisely $n$ of the $m$ random variables are smaller than $x$ is binomially distributed with parameter $p = \Pr[X \le x]$. Hence,
$$\Pr\left[X_{(k)} \le x\right] = \sum_{n=k}^{m}\binom{m}{n}\left(\Pr[X \le x]\right)^n\left(1 - \Pr[X \le x]\right)^{m-n} \qquad (3.35)$$
The probability density function can be obtained in the usual, though cumbersome, way by differentiation,
$$f_{X_{(k)}}(x) = \frac{d\Pr\left[X_{(k)} \le x\right]}{dx} = \sum_{n=k}^{m}\binom{m}{n}\frac{d}{dx}\left[\left(\Pr[X \le x]\right)^n\left(1 - \Pr[X \le x]\right)^{m-n}\right]$$
$$= f_X(x)\sum_{n=k}^{m} n\binom{m}{n}\left(\Pr[X \le x]\right)^{n-1}\left(1 - \Pr[X \le x]\right)^{m-n} - f_X(x)\sum_{n=k}^{m}(m-n)\binom{m}{n}\left(\Pr[X \le x]\right)^{n}\left(1 - \Pr[X \le x]\right)^{m-n-1}$$
Using $n\binom{m}{n} = m\binom{m-1}{n-1}$, $(m-n)\binom{m}{n} = m\binom{m-1}{n}$ and lowering the upper index in the last summation, we have
$$f_{X_{(k)}}(x) = m f_X(x)\sum_{n=k}^{m}\binom{m-1}{n-1}\left(\Pr[X \le x]\right)^{n-1}\left(1-\Pr[X \le x]\right)^{m-n} - m f_X(x)\sum_{n=k+1}^{m}\binom{m-1}{n-1}\left(\Pr[X \le x]\right)^{n-1}\left(1-\Pr[X \le x]\right)^{m-n}$$
Only the term $n = k$ survives, so that
$$f_{X_{(k)}}(x) = m f_X(x)\binom{m-1}{k-1}\left(F_X(x)\right)^{k-1}\left(1 - F_X(x)\right)^{m-k} \qquad (3.36)$$
The more elegant and faster argument is as follows: in order for $X_{(k)}$ to be equal to $x$, exactly $k-1$ of the $m$ random variables $\{X_j\}_{1\le j\le m}$ must be less than $x$, one equal to $x$ and the other $m-k$ must all be greater than $x$. Abusing the notation $f_X(x) = \Pr\left[X_{(k)} = x\right]$ and observing that $m\binom{m-1}{k-1} = \frac{m!}{1!\,(k-1)!\,(m-k)!}$ is an instance of the multinomial coefficient $\frac{m!}{n_1!\, n_2!\cdots n_k!}$, which gives the number of ways of putting $m = n_1 + n_2 + \cdots + n_k$ different objects into $k$ different boxes with $n_j$ in the $j$-th box, leads alternatively to (3.36).
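For uniform random variables on $[0,1]$, (3.36) reduces to a Beta$(k, m-k+1)$ density with mean $\frac{k}{m+1}$, which a short simulation confirms (our own sketch; all names are ours):

```python
import random

random.seed(3)
m, k = 7, 3   # sample size and order index

runs = 100_000
total = 0.0
for _ in range(runs):
    sample = sorted(random.random() for _ in range(m))
    total += sample[k - 1]   # the k-th smallest value, X_(k)

empirical_mean = total / runs
# (3.36) with F_X(x) = x gives a Beta(k, m-k+1) density, mean k/(m+1)
predicted_mean = k / (m + 1)
```

Here the predicted mean is $3/8 = 0.375$.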
3.5 Examples of other distributions
1. The Gumbel distribution appears in the theory of extremes (see Section 6.4) and is defined by the distribution function
$$F_{\mathrm{Gumbel}}(x) = e^{-e^{-a(x-b)}} \qquad (3.37)$$
Its Laplace transform is
$$\varphi_{\mathrm{Gumbel}}(z) = \int_{-\infty}^{\infty} e^{-zt}\, e^{-e^{-a(t-b)}}\, a e^{-a(t-b)}\, dt = e^{-bz}\,\Gamma\left(1 + \frac{z}{a}\right)$$
from which the mean follows as $E[X] = -\frac{d}{dz}\left[e^{-bz}\Gamma\left(1+\frac{z}{a}\right)\right]_{z=0} = b + \frac{\gamma}{a}$, where $\gamma = 0.57721\ldots$ is the Euler constant. The variance is best computed with (2.43), resulting in $\mathrm{Var}[X] = \frac{\pi^2}{6a^2}$.
2. The Cauchy distribution has the probability density function
$$f_{\mathrm{Cauchy}}(x) = \frac{1}{\pi(1+x^2)} \qquad (3.38)$$
and corresponding distribution,
$$F_{\mathrm{Cauchy}}(x) = \frac{1}{2} + \frac{\arctan x}{\pi}$$
The Fourier transform is
$$\varphi_X(\omega) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\, dx}{1+x^2}$$
For $\omega \ge 0$, we close the contour over the negative $\mathrm{Im}(x)$-plane: on the semi-circle $x = re^{i\theta}$ with $-\pi \le \theta \le 0$, $\left|e^{-i\omega re^{i\theta}}\right| = e^{\omega r\sin\theta}$ and $\sin\theta \le 0$, so the contribution of the semi-circle vanishes as $r \to \infty$. The contour encloses the simple pole (zero of $x^2+1 = (x-i)(x+i)$) at $x = -i$. Applying Cauchy's residue theorem, we obtain
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\, dx}{1+x^2} = -\frac{2\pi i}{\pi}\lim_{x\to -i}\frac{e^{-i\omega x}(x+i)}{1+x^2} = e^{-\omega}$$
If $\omega \le 0$, we close the contour over the positive $\mathrm{Im}(x)$-plane such that the contribution of the semi-circle to the contour $C$ again vanishes. The resulting contour then encloses the simple pole at $x = i$ and
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\, dx}{1+x^2} = \frac{2\pi i}{\pi}\lim_{x\to i}\frac{e^{-i\omega x}(x-i)}{1+x^2} = e^{\omega}$$
Combining both expressions results in
$$\varphi_X(\omega) = e^{-|\omega|}$$
Since $e^{-|\omega|}$ is not analytic around $\omega = 0$, none of the moments of the Cauchy distribution exists! Hence, the Cauchy distribution is an example of a distribution without mean (see the requirement for the existence of the expectation in Section 2.3.2), although the improper integral $\int_{-\infty}^{\infty}\frac{x\, dx}{1+x^2} = 0$ due to symmetry (in the Riemann sense), but both $\int_{-\infty}^{0}\frac{x\, dx}{1+x^2}$ and $\int_0^{\infty}\frac{x\, dx}{1+x^2}$ diverge.

In addition, if $S_n = \sum_{k=1}^{n} X_k$ is the sum of i.i.d. Cauchy random variables $X_k$, the sample mean $\frac{S_n}{n}$ has the Fourier transform
$$E\left[e^{-i\omega\frac{S_n}{n}}\right] = E\left[e^{-i\frac{\omega}{n}\sum_{k=1}^{n}X_k}\right] = \prod_{k=1}^{n} E\left[e^{-i\frac{\omega}{n}X_k}\right] = \left(E\left[e^{-i\frac{\omega}{n}X}\right]\right)^n = e^{-|\omega|}$$
Hence, the sample mean $\frac{S_n}{n}$ of i.i.d. Cauchy random variables is again a Cauchy random variable, independent of $n$. This means that the law of large numbers (see Section 6.2) does not hold for the Cauchy random variable, as a consequence of the non-existence of the mean. Also, the sum $S_n$ has Fourier transform $e^{-n|\omega|}$ and its pdf equals $f_{S_n}(x) = \frac{1}{\pi n}\frac{1}{1+(x/n)^2}$.
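The failure of the law of large numbers for Cauchy random variables is striking in simulation: the sample mean of $n$ draws is itself standard Cauchy, so it does not concentrate as $n$ grows. A sketch (our own, using inverse-transform sampling from the distribution $F(x) = \frac{1}{2} + \frac{\arctan x}{\pi}$):

```python
import math
import random

random.seed(5)

def cauchy():
    # inverse-transform sampling: x = tan(pi (u - 1/2))
    return math.tan(math.pi * (random.random() - 0.5))

# Since the sample mean of n Cauchy draws is again standard Cauchy, the
# fraction of sample means inside [-1, 1] stays near F(1) - F(-1) = 1/2
# no matter how large n grows.
n, runs = 1000, 2000
inside = 0
for _ in range(runs):
    s = sum(cauchy() for _ in range(n)) / n
    inside += -1 <= s <= 1
frac = inside / runs
```

For a distribution with finite mean, `frac` would approach 1 as `n` grows; here it stays near 0.5.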
3. The Weibull distribution, with pdf defined for $x \ge 0$ and $a, b > 0$ by
$$f_{\mathrm{Weibull}}(x) = \frac{\exp\left(-\left(\frac{x}{a}\right)^b\right)}{a\,\Gamma\left(1+\frac{1}{b}\right)} \qquad (3.39)$$
generalizes the exponential distribution (3.17), corresponding to $b = 1$ and $a = \frac{1}{\lambda}$. It is related to the Gaussian distribution if $b = 2$. Let $X$ be a Weibull random variable, then
$$E\left[X^k\right] = \frac{1}{a\,\Gamma\left(1+\frac{1}{b}\right)}\int_0^{\infty} x^k\exp\left(-\left(\frac{x}{a}\right)^b\right)dx = \frac{a^k\,\Gamma\left(\frac{k+1}{b}\right)}{b\,\Gamma\left(1+\frac{1}{b}\right)}$$
The generating function possesses the expansion
$$\varphi_X(z) = E\left[e^{-zX}\right] = \sum_{k=0}^{\infty}\frac{(-z)^k}{k!}E\left[X^k\right] = \frac{1}{b\,\Gamma\left(1+\frac{1}{b}\right)}\sum_{k=0}^{\infty}\Gamma\left(\frac{k+1}{b}\right)\frac{(-za)^k}{k!}$$
which cannot be summed in explicit form for general $b$.

Sometimes an alternative definition of the Weibull distribution appears,
$$f_{\mathrm{Weibull}}(x) = ab\, x^{b-1}\, e^{-ax^b} \qquad (3.40)$$
with distribution function $F_{\mathrm{Weibull}}(x) = 1 - e^{-ax^b}$, moments
$$E\left[X^k\right] = \int_0^{\infty} x^k f_{\mathrm{Weibull}}(x)\, dx = \frac{\Gamma\left(1+\frac{k}{b}\right)}{a^{k/b}}$$
and variance
$$\mathrm{Var}[X] = \int_0^{\infty}\left(x - E[X]\right)^2 f_{\mathrm{Weibull}}(x)\, dx = \frac{\Gamma\left(1+\frac{2}{b}\right) - \Gamma^2\left(1+\frac{1}{b}\right)}{a^{2/b}}$$
The interest of the Weibull distribution in the Internet stems from the self-similarity and long-range dependence of observables (i.e. quantities that can be measured, such as the delay, the interarrival times of packets, etc.). Especially if the shape factor $b \in (0,1)$, the Weibull has a sub-exponential tail that decays more slowly than an exponential, but still faster than any power law.
4. Power law behavior is often described via the Pareto distribution with pdf, for $x \ge 0$ and $\alpha > 0$,
$$f_{\mathrm{Pareto}}(x) = \alpha\left(1+x\right)^{-\alpha-1} \qquad (3.41)$$
and with distribution function
$$F_{\mathrm{Pareto}}(x) = \int_0^{x}\alpha(1+t)^{-\alpha-1}\, dt = 1 - (1+x)^{-\alpha} \qquad (3.42)$$
Since $\lim_{x\to\infty} F(x) = 1$, the power $\alpha$ must exceed 0. The higher moments are Beta functions (Abramowitz and Stegun, 1968, Section 6.2.1),
$$E\left[X^k\right] = \int_0^{\infty}\frac{\alpha\, x^k\, dx}{(1+x)^{\alpha+1}} = k!\,\frac{\Gamma(\alpha-k)}{\Gamma(\alpha)}$$
and show that $E\left[X^k\right]$ only exists if $\alpha > k$. Hence, the mean $E[X]$ only exists if $\alpha > 1$. The deep tail asymptotic for large $x$ is $f_{\mathrm{Pareto}}(x) = O\left(x^{-\alpha-1}\right)$ and $\Pr[X > x] = O\left(x^{-\alpha}\right)$. For example, the distribution of the nodal degree in the Internet has an exponent around $\alpha = 2.4$ (see Section 15.3).
5. Another distribution with heavy tails is the lognormal distribution, defined as the random variable $X = e^{Y}$ where $Y = N(\mu,\sigma^2)$ is a Gaussian random variable. Then
$$F_{\mathrm{lognormal}}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\log x}\exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt \qquad (3.43)$$
and, for $x > 0$,
$$f_{\mathrm{lognormal}}(x) = \frac{\exp\left[-\frac{(\log x - \mu)^2}{2\sigma^2}\right]}{x\,\sigma\sqrt{2\pi}} \qquad (3.44)$$
The moments follow from
$$E\left[X^k\right] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_0^{\infty} x^{k-1}\exp\left(-\frac{(\log x-\mu)^2}{2\sigma^2}\right)dx = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{ku}\exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right)du$$
or, explicitly,
$$E\left[X^k\right] = \exp(k\mu)\exp\left(\frac{k^2\sigma^2}{2}\right) \qquad (3.45)$$
and
$$\mathrm{Var}[X] = e^{2\mu} e^{\sigma^2}\left(e^{\sigma^2} - 1\right) \qquad (3.46)$$
The Laplace transform
$$\varphi_X(z;\mu,\sigma^2) = \int_0^{\infty} e^{-zt}\,\frac{e^{-\frac{(\log t-\mu)^2}{2\sigma^2}}}{t\,\sigma\sqrt{2\pi}}\, dt = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-ze^{x}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \qquad (3.47)$$
only exists for $\mathrm{Re}(z) \ge 0$. The integral (3.47) indicates that $\varphi_X(z;\mu,\sigma^2)$ is not analytic at any point $z = it$ on the imaginary axis, because the circle with arbitrarily small but non-zero radius around $z = it$ necessarily encircles points with $\mathrm{Re}(z) < 0$ where $\varphi_X(z;\mu,\sigma^2)$ does not exist. Hence, the Taylor expansion (2.40) of the generating function around $z = 0$ does not exist, although all moments or derivatives at $z = 0$ exist: the formal Taylor series is a divergent series (except for $\sigma = 0$ or $z = 0$). The fact that the pgf (3.47) is not available in closed form complicates the computation of the sum of i.i.d. lognormal random variables via (2.66). This sum appears in radio communications with several transmitters and receivers.

In radio communications, the received signal levels decrease with the distance between the transmitter and the receiver. This phenomenon is called pathloss. Attenuation of radio signals due to pathloss has been modeled by averaging the measured signal powers over long times and over various locations with the same distances to the transmitter. The mean value of the signal power found in this way is referred to as the area mean power $P_a$ (in Watts) and is well modeled as $P_a(r) = c\, r^{-\eta}$, where $c$ is a constant and $\eta$ is the pathloss exponent⁵. In reality the received power levels may vary significantly around the mean power $P_a(r)$ due to irregularities in the surroundings of the receiving and transmitting antennas. Measurements have revealed that the logarithm of the mean power $P(r)$ at different locations on a circle with radius $r$ around the transmitter is approximately normally distributed with mean equal to the logarithm of the area mean power $P_a(r)$. The lognormal shadowing model assumes that the logarithm of $P(r)$ is precisely normally distributed around the logarithmic value of the area mean power: $\log_{10}(P(r)) = \log_{10}(P_a(r)) + X$, where $X = N(0,\sigma^2)$ is a zero-mean normally distributed random variable (in dB) with standard deviation $\sigma$ (also in dB and, for severe fluctuations, up to 12 dB). Hence, the random variable $P(r) = P_a(r)\,10^{X}$ has a lognormal distribution (3.43) equal to
$$\Pr[P(r) \le x] = \Pr\left[X \le \log_{10}\frac{x}{P_a(r)}\right] = \frac{1}{\sqrt{2\pi}\,\sigma\log 10}\int_0^{x}\exp\left[-\frac{\left(\log_{10} u - \log_{10}(P_a(r))\right)^2}{2\sigma^2}\right]\frac{du}{u}$$
The basic discrete distributions are summarized in the following table.

Distribution   Pr[X = k]                         E[X]    Var[X]        φ_X(z) = E[z^X]
Bernoulli      Pr[X = 1] = p, Pr[X = 0] = 1-p    p       p(1-p)        1 - p + pz
Binomial       C(n,k) p^k (1-p)^(n-k)            np      np(1-p)       ((1-p) + pz)^n
Geometric      p(1-p)^(k-1)                      1/p     (1-p)/p^2     pz/(1 - (1-p)z)
Poisson        λ^k e^(-λ)/k!                     λ       λ             e^(λ(z-1))

⁵ The constant $c$ depends on the transmitted power, the receiver and the transmitter antenna gains and the wavelength. The pathloss exponent $\eta$ depends on the environment and terrain structure and can vary from 2 in free space to 6 in urban areas.
The basic continuous distributions are summarized in the following table ($\gamma = 0.5772\ldots$ is the Euler constant; a dash means that no closed form exists).

Distribution   f_X(x)                              E[X]              Var[X]                          φ_X(z) = E[e^(-zX)]
Uniform        1_{a≤x≤b}/(b-a)                     (a+b)/2           (b-a)^2/12                      (e^(-za) - e^(-zb))/(z(b-a))
Exponential    λ e^(-λx)                           1/λ               1/λ^2                           λ/(z+λ)
Gaussian       exp(-(x-μ)^2/(2σ^2))/(σ√(2π))       μ                 σ^2                             exp(σ^2 z^2/2 - μz)
Gamma          λ(λx)^(α-1) e^(-λx)/Γ(α)            α/λ               α/λ^2                           (λ/(z+λ))^α
Gumbel         a e^(-a(x-b)) e^(-e^(-a(x-b)))      b + γ/a           π^2/(6a^2)                      e^(-bz) Γ(1 + z/a)
Cauchy         1/(π(1+x^2))                        does not exist    does not exist                  e^(-|Im(z)|)  (Re(z) = 0)
Weibull        ab x^(b-1) e^(-a x^b)               Γ(1+1/b)/a^(1/b)  (Γ(1+2/b) - Γ^2(1+1/b))/a^(2/b) —
Pareto         α(1+x)^(-α-1)                       1/(α-1), α>1      α/((α-1)^2 (α-2)), α>2          —
Lognormal      exp(-(log x-μ)^2/(2σ^2))/(xσ√(2π))  e^(μ+σ^2/2)       e^(2μ) e^(σ^2)(e^(σ^2)-1)       —
3.7 Problems
(i) If $\varphi_X(z)$ is the probability generating function of a non-zero discrete random variable $X$, find an expression of $E[\log X]$ in terms of $\varphi_X(z)$.
(ii) Compute the mean value of the $k$-th order statistic in an ensemble of (a) $m$ i.i.d. exponentially distributed random variables with mean $\frac{1}{\lambda}$ and (b) $m$ i.i.d. polynomially distributed random variables on $[0,1]$.
(iii) Discuss how a probability density function of a continuous random variable $X$ can be approximated from a set $\{x_1, x_2, \ldots, x_n\}$ of $n$ measurements or simulations.
(iv) In a circle with radius $r$ around a sending mobile node, there are $N-1$ other mobile nodes uniformly distributed over that circle. The possible interference caused by these other mobile nodes depends on their distance to the sending node at the center. Derive, for large $N$ but constant density of mobile nodes, the pdf of the distance of the $m$-th nearest node to the center.
(v) Let $X$ and $Y$ be two independent random variables. What is the probability that the one is larger than the other?
4
Correlation

Transforming the joint density $\frac{1}{2\pi}e^{-\frac{x^2+y^2}{2}}$ of two independent normalized Gaussian random variables to polar coordinates $x = r\cos\theta$, $y = r\sin\theta$ gives
$$f_{R\Theta}(r,\theta) = \frac{r\, e^{-\frac{r^2}{2}}}{2\pi}$$
which shows that $f_{R\Theta}(r,\theta)$ does not depend on $\theta$. Hence, we can write $f_{R\Theta}(r,\theta) = f_R(r)\, f_{\Theta}(\theta)$ with $f_{\Theta}(\theta) = c$, where $c$ is a constant and $f_R(r) = \frac{r e^{-r^2/2}}{2\pi c}$. Thus, choosing $c = \frac{1}{2\pi}$, we end up with $f_R(r) = r\, e^{-\frac{r^2}{2}}$ and $f_{\Theta}(\theta) = \frac{1}{2\pi}$. These two independent random variables $R$ and $\Theta$ can each be generated separately from a uniform random variable $U$ on $[0,1]$, as discussed in Section 3.2.1, leading to
$$R = \sqrt{-2\ln(U_1)}, \qquad \Theta = 2\pi U_2$$
and, finally, to the independent Gaussian random variables
$$X_1 = \sqrt{-2\ln(U_1)}\cos(2\pi U_2), \qquad X_2 = \sqrt{-2\ln(U_1)}\sin(2\pi U_2)$$
The procedure can be used to generate a single Gaussian random variable, but also more independent Gaussians by repeating the generation procedure.
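The generation procedure above is the Box–Muller method. A minimal implementation (our own sketch, standard library only):

```python
import math
import random

random.seed(9)

def box_muller():
    """One pair of independent N(0,1) draws from two uniforms on [0,1]."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 avoids log(0)
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

pairs = [box_muller() for _ in range(100_000)]
xs = [p[0] for p in pairs]
mean_x = sum(xs) / len(xs)
var_x = sum(v * v for v in xs) / len(xs)
cov_xy = sum(a * b for a, b in pairs) / len(pairs)
```

The empirical mean, variance and cross-covariance of the generated pairs match $0$, $1$ and $0$ up to Monte Carlo noise.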
4.1.2 The n-joint Gaussian probability distribution function
A collection of $n$ random variables $X_i$ is called a random vector $X = (X_1, X_2, \ldots, X_n)^T$, a matrix with dimension $n \times 1$. The average of a random vector is a vector with components $E[X_i]$ for $1 \le i \le n$. The variance of a random vector generalizes to the covariance matrix $\Sigma_X = E\left[(X - E[X])(X - E[X])^T\right]$, which is non-negative definite: for any real vector $x$,
$$x^T\Sigma_X x = E\left[x^T(X - E[X])(X - E[X])^T x\right] = E\left[\left((X - E[X])^T x\right)^T\left((X - E[X])^T x\right)\right] = E\left[\left\|(X - E[X])^T x\right\|^2\right] \ge 0$$
which implies that all real eigenvalues $\lambda_i$ are non-negative. Hence, there exists an orthogonal matrix $U$ such that
$$\Sigma_X = U\,\mathrm{diag}(\lambda_i)\, U^T \qquad (4.1)$$
The $n$-joint Gaussian probability generating function is
$$\varphi_X(z) = E\left[e^{-z^T X}\right] = \exp\left(\frac{1}{2} z^T\Sigma_X z - E[X]^T z\right) \qquad (4.2)$$
Using (4.1), and the fact that $U$ is an orthogonal matrix such that $U^{-1} = U^T$ and $UU^T = I$,
$$\varphi_X(z) = \exp\left(\frac{1}{2}\left(U^T z\right)^T\mathrm{diag}(\lambda_i)\left(U^T z\right) - \left(U^T E[X]\right)^T\left(U^T z\right)\right)$$
Denote the vectors $w = U^T z$ and $m = U^T E[X]$. Then we have
$$\varphi_X(z) = \exp\left(\frac{1}{2} w^T\mathrm{diag}(\lambda_i)\, w - m^T w\right) = \exp\left(\sum_{j=1}^{n}\left(\frac{\lambda_j w_j^2}{2} - m_j w_j\right)\right) = \prod_{j=1}^{n} e^{\frac{\lambda_j w_j^2}{2} - m_j w_j}$$
and $e^{\frac{\lambda_j w_j^2}{2} - m_j w_j} = \varphi_{X_j}(w_j)$ is the Laplace transform (3.22) of a Gaussian random variable $X_j$ because all $\lambda_j$ are real and non-negative. With (2.65), this shows that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Reversing the order of the manipulations also justifies that (4.2) indeed defines a general $n$-joint Gaussian probability generating function. If $X_1, X_2, \ldots, X_n$ are joint normal and not correlated, then $\Sigma_X$ is a diagonal matrix, which implies that $X_1, X_2, \ldots, X_n$ are independent. As discussed in Section 2.5.2, independence implies non-correlation, but the converse is generally not true. These properties make Gaussian random variables particularly suited to deal with correlations.
Fig. 4.1. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0$.

The joint density follows by inverse Laplace transform of (4.2) as
$$f_X(x) = \frac{1}{\left(\sqrt{2\pi}\right)^n\sqrt{\det\Sigma_X}}\exp\left(-\frac{1}{2}(x - E[X])^T\Sigma_X^{-1}(x - E[X])\right) \qquad (4.3)$$
The inverse Laplace transform for $n = 2$ is computed in Section C.2. After computing the inverse matrix and the determinant in (4.3) explicitly, the two-dimensional ($n = 2$) or bi-variate Gaussian probability density function is
$$f_{XY}(x, y;\rho) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left[-\frac{\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}}{2(1-\rho^2)}\right] \qquad (4.4)$$
65
2
fXY(x,y)
0.25
0.2
0.15
0.1
0.05
0
2
0
Fig. 4.2. The joint probability density function (4.4) with [ = \ = 0 and
[ = \ = 1 and = 0=8=
2
fXY(x,y)
0.25
0.2
0.15
0.1
0.05
0
2
0
Fig. 4.3. The joint probability density function (4.4) with [ = \ = 0 and
[ = \ = 1 and = 0=8=
If we denote {W =
(4.4) reduces to
({3[ )
[
and |W =
(|3\ )
,
\
h
i
W 2
W | W +(| W )2
exp ({ ) 32{
2(132 )
p
i[\ ({W > |W ; ) =
2[ \ 1 2
66
Correlation
(4.5)
(4.6)
To generate a Gaussian random vector $Y = AX + E[Y]$ with a given covariance matrix $\Sigma_Y$ from a vector $X$ of independent normalized Gaussian random variables, observe that
$$\Sigma_Y = E\left[(Y - E[Y])(Y - E[Y])^T\right] = E\left[AX(AX)^T\right] = E\left[AXX^T A^T\right] = A\, E\left[XX^T\right] A^T = A\,\Sigma_X A^T = AA^T$$
From the eigenvalue decomposition $\Sigma_Y = U\,\mathrm{diag}(\lambda_i)\, U^T$, with real eigenvalues $\lambda_i \ge 0$, and the fact that $\mathrm{diag}(\lambda_i) = \mathrm{diag}\left(\sqrt{\lambda_i}\right)\left(\mathrm{diag}\left(\sqrt{\lambda_i}\right)\right)^T$, such that
$$AA^T = U\,\mathrm{diag}\left(\sqrt{\lambda_i}\right)\left(U\,\mathrm{diag}\left(\sqrt{\lambda_i}\right)\right)^T$$
we obtain
$$A = U\,\mathrm{diag}\left(\sqrt{\lambda_i}\right)$$
The matrix $A$ is also called the square root matrix of $\Sigma_Y$ and can be found from the singular value decomposition of $\Sigma_Y$ or from Cholesky factorization (Press et al., 1992).

Example Generate a normal vector $Y$ with $E[Y] = (300, 300)^T$, standard deviations $\sigma_1 = 106.066$, $\sigma_2 = 35.355$ and correlation $\rho_Y = 0.8$. The covariance matrix is
$$\Sigma_Y = \begin{bmatrix}\sigma_1^2 & \rho_Y\sigma_1\sigma_2\\ \rho_Y\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix} = \begin{bmatrix}11250 & 3000\\ 3000 & 1250\end{bmatrix}$$
A square root matrix $A$ of $\Sigma_Y$ is
$$A = \begin{bmatrix}63.640 & 84.853\\ 0 & 35.355\end{bmatrix}$$
which is readily checked by computing $AA^T = \Sigma_Y$. It remains to generate $m$ independent draws for $X_1$ and $X_2$ from a normal distribution with zero mean and unit variance, as explained in Section 4.1.1. Each pair $(X_1, X_2)$ out of the $m$ pairs is transformed as $Y = AX + E[Y]$. The result, component $Y_2$ versus $Y_1$, is shown in Fig. 4.4.
Fig. 4.4. The component $Y_2$ versus $Y_1$ of the generated pairs $Y = AX + E[Y]$.
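The example can be reproduced in a few lines. The sketch below (our own; plain Python without a linear algebra library) checks $AA^T = \Sigma_Y$ for the printed square root matrix and generates correlated pairs:

```python
import random

random.seed(11)

# Covariance matrix from the example and its printed square root matrix A
sigma = [[11250.0, 3000.0], [3000.0, 1250.0]]
A = [[63.640, 84.853], [0.0, 35.355]]
mean = [300.0, 300.0]

# check A A^T = Sigma_Y (up to rounding of the printed entries)
AAT = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]

def correlated_draw():
    x = (random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    return tuple(sum(A[i][k] * x[k] for k in range(2)) + mean[i]
                 for i in range(2))

draws = [correlated_draw() for _ in range(50_000)]
n = len(draws)
m1 = sum(d[0] for d in draws) / n
m2 = sum(d[1] for d in draws) / n
v1 = sum((d[0] - m1) ** 2 for d in draws) / n
v2 = sum((d[1] - m2) ** 2 for d in draws) / n
c12 = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
rho = c12 / (v1 * v2) ** 0.5
```

The empirical correlation of the generated pairs is close to the prescribed $\rho_Y = 0.8$.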
68
Correlation
|e
|e
Pr [ A
I\ (|) = Pr [\ |] = Pr [d[ + e |] = Pr [
d
d
|e
= 1 I[
A0
d
which contradicts the fact that I\ (|) = 0 for | ? 0. Hence, positive random variables with innite range cannot be correlated with = 1= The
requirement that the range needs to be unbounded is necessary because two
uniform random variables on [0> 1], X1 and X2 , are negatively correlated with
= 1 if X1 = 1 X2 .
In summary, the set of all possible correlations is a closed interval [min > max ]
for which min ? 0 ? max . The precise computation of min and max is, in
general, di!cult, as shown below.
4.3 The non-linear transformation method
The non-linear transformation approach starts from a given set of two random variables $X_1$ and $X_2$ that have a correlation coefficient $\rho_X \in [-1, 1]$ and joint distribution function
$$F_{X_1X_2}(x_1, x_2;\rho_X) = \Pr[X_1 \le x_1, X_2 \le x_2] = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f_{X_1X_2}(u, v;\rho_X)\, du\, dv$$
Since for any random variable $X$ it holds that $F_X(X) = U$, where $U$ is a uniform random variable on $[0,1]$, it follows that $U_1 = F_{X_1}(X_1)$ and $U_2 = F_{X_2}(X_2)$ are correlated uniform random variables with correlation coefficient $\rho_U$. As shown in Section 3.2.1, if $U$ is a uniform random variable on $[0,1]$, any other random variable $Y$ with distribution function $g(x)$ can be constructed as $g^{-1}(U)$. By combining the two transforms, we can generate $Y_1 = g_1^{-1}(F_{X_1}(X_1))$ and $Y_2 = g_2^{-1}(F_{X_2}(X_2))$ that are correlated because $X_1$ and $X_2$ are correlated. It may be possible to construct directly the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$ if the transforms $T_1$ and $T_2$ are known.

The goal is to determine the linear correlation coefficient $\rho_Y$ defined in (2.58),
$$\rho_Y = \frac{E[Y_1Y_2] - E[Y_1]E[Y_2]}{\sqrt{\mathrm{Var}[Y_1]}\sqrt{\mathrm{Var}[Y_2]}}$$
as a function of $\rho_X$. Using (2.61),
$$E[Y_1Y_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1^{-1}(F_{X_1}(u))\, g_2^{-1}(F_{X_2}(v))\, f_{X_1X_2}(u, v;\rho_X)\, du\, dv$$
From the partial differential equation (4.5) of $f_{X_1X_2}(u, v;\rho_X)$, it follows that
$$\frac{\partial E[Y_1Y_2]}{\partial\rho_X} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1^{-1}(F_{X_1}(u))\, g_2^{-1}(F_{X_2}(v))\,\frac{\partial^2 f_{X_1X_2}(u, v;\rho_X)}{\partial u\,\partial v}\, du\, dv$$
After partial integration, using $\frac{dg^{-1}(x)}{dx} = \frac{1}{g'(g^{-1}(x))}$, which gives
$$\frac{dg^{-1}(F_X(u))}{du} = \left.\frac{dg^{-1}(x)}{dx}\right|_{x=F_X(u)}\frac{dF_X(u)}{du} = \frac{f_X(u)}{g'\left(g^{-1}(F_X(u))\right)}$$
Since $g'(x)$ and $f_X(u)$ are probability density functions and positive, $\frac{\partial E[Y_1Y_2]}{\partial\rho_X} > 0$ and hence $\frac{\partial\rho_Y}{\partial\rho_X} > 0$: the correlation coefficient $\rho_Y$ is strictly increasing in $\rho_X$. Since $\rho_X \in [-1,1]$, $\rho_Y$ increases from $\rho_{Y,\min}$ at $\rho_X = -1$ to $\rho_{Y,\max}$ corresponding to $\rho_X = 1$. In the sequel, we will derive expressions to compute the boundary cases $\rho_X = -1$ and $\rho_X = 1$.
Theorem 4.3.2 (of Lancaster) For any two strictly increasing real functions $T_1$ and $T_2$ that transform the correlated Gaussian random variables $X_1$ and $X_2$ to the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$, it holds that
$$|\rho_Y| \le |\rho_X|$$
If two correlated random variables $Y_1$ and $Y_2$ can be obtained by separate transformations from a bi-variate normal distribution with correlation coefficient $\rho_X$, the correlation coefficient $\rho_Y$ of the transformed random variables cannot in absolute value exceed $\rho_X$. The interest of the proof is that it uses powerful properties of orthogonal polynomials and that $\rho_Y$ is expanded in a power series in $\rho_X$ in (4.12).

Proof: The proof is based on the orthogonal Hermite polynomials $H_n(x)$ (see e.g. Rainville (1960) and Abramowitz and Stegun (1968, Chapter 22)), defined by the generating function
$$\exp\left(2xt - t^2\right) = \sum_{n=0}^{\infty}\frac{H_n(x)\, t^n}{n!} \qquad (4.8)$$
After expanding $\exp\left(2xt - t^2\right)$ in a Taylor series and equating corresponding powers in $t$, we find that
$$H_n(x) = n!\sum_{k=0}^{\lfloor n/2\rfloor}\frac{(-1)^k(2x)^{n-2k}}{k!\,(n-2k)!} \qquad (4.9)$$
with $H_0(x) = 1$. The Hermite polynomials satisfy the orthogonality relations
$$\int_{-\infty}^{\infty} e^{-x^2} H_n(x)\, H_m(x)\, dx = 0, \qquad m \neq n$$
$$\int_{-\infty}^{\infty} e^{-x^2} H_n^2(x)\, dx = 2^n n!\sqrt{\pi}$$
If the expansion of a function $f(x) = \sum_{k=0}^{\infty} a_k H_k(x)$ converges for all $x$, then it follows from the orthogonality relations that
$$a_k = \frac{1}{2^k k!\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2} f(x)\, H_k(x)\, dx$$
The joint normalized Gaussian density function can be expanded (Rainville, 1960, pp. 197–198) in terms of Hermite polynomials,
$$\frac{\exp\left(-\frac{x^2 - 2\rho xy + y^2}{1-\rho^2}\right)}{\sqrt{1-\rho^2}} = e^{-x^2 - y^2}\sum_{n=0}^{\infty}\frac{H_n(x)\, H_n(y)}{2^n n!}\,\rho^n \qquad (4.10)$$
In order for the covariance $\mathrm{Cov}[Y_1, Y_2]$ to exist, both $E\left[Y_1^2\right]$ and $E\left[Y_2^2\right]$ must be finite. Since $Y_j = T_j(X_j)$ for $j = 1, 2$, with $X_j = N(\mu_j, \sigma_j^2)$, the mean is
$$E[Y_j] = \int_{-\infty}^{\infty} T_j(x)\, f_{X_j}(x)\, dx = \frac{1}{\sqrt{2\pi}\,\sigma_j}\int_{-\infty}^{\infty} T_j(x)\exp\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right)dx = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} T_j\left(\mu_j + \sqrt{2}\sigma_j u\right)e^{-u^2}\, du$$
Let
$$T_j\left(\mu_j + \sqrt{2}\sigma_j u\right) = \sum_{k=0}^{\infty} a_{k;j}\, H_k(u) \qquad (4.11)$$
with
$$a_{k;j} = \frac{1}{2^k k!\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}\, T_j\left(\mu_j + \sqrt{2}\sigma_j x\right) H_k(x)\, dx$$
so that $E[Y_j] = a_{0;j}$, while, by the orthogonality relations,
$$E\left[Y_j^2\right] = \frac{1}{\sqrt{2\pi}\,\sigma_j}\int_{-\infty}^{\infty} T_j^2(x)\exp\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right)dx = \sum_{k=0}^{\infty} 2^k k!\, a_{k;j}^2$$
whence $\mathrm{Var}[Y_j] = \sum_{k=1}^{\infty} 2^k k!\, a_{k;j}^2$. Next,
$$E[Y_1Y_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} T_1(u)\, T_2(v)\, f_{X_1X_2}(u, v;\rho_X)\, du\, dv$$
and, after substitution towards normalized variables $u = \mu_1 + \sqrt{2}\sigma_1 x$ and $v = \mu_2 + \sqrt{2}\sigma_2 y$,
$$E[Y_1Y_2] = \frac{1}{\pi\sqrt{1-\rho_X^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} T_1\left(\mu_1 + \sqrt{2}\sigma_1 x\right) T_2\left(\mu_2 + \sqrt{2}\sigma_2 y\right)\exp\left(-\frac{x^2 - 2\rho_X xy + y^2}{1-\rho_X^2}\right)dx\, dy$$
Substituting the expansions (4.11) and using (4.10),
$$E[Y_1Y_2] = \sum_{n=0}^{\infty}\frac{\rho_X^n}{2^n n!\,\pi}\sum_{k=0}^{\infty} a_{k;1}\sum_{m=0}^{\infty} a_{m;2}\int_{-\infty}^{\infty} e^{-x^2} H_k(x) H_n(x)\, dx\int_{-\infty}^{\infty} e^{-y^2} H_m(y) H_n(y)\, dy$$
The orthogonality relations leave only the terms with $k = m = n$, each integral contributing $2^n n!\sqrt{\pi}$, so that
$$E[Y_1Y_2] = \sum_{n=0}^{\infty} 2^n n!\, a_{n;1}\, a_{n;2}\,\rho_X^n$$
Hence,
$$\mathrm{Cov}[Y_1,Y_2] = E[Y_1Y_2] - a_{0;1}a_{0;2} = \sum_{n=1}^{\infty} 2^n n!\, a_{n;1}\, a_{n;2}\,\rho_X^n$$
so that, with $\beta_n = \frac{\sqrt{2^n n!}\, a_{n;1}}{\sqrt{\mathrm{Var}[Y_1]}}$ and $\gamma_n = \frac{\sqrt{2^n n!}\, a_{n;2}}{\sqrt{\mathrm{Var}[Y_2]}}$, which obey $\sum_{n=1}^{\infty}\beta_n^2 = \sum_{n=1}^{\infty}\gamma_n^2 = 1$,
$$\rho_Y = \sum_{n=1}^{\infty}\beta_n\gamma_n\,\rho_X^n \qquad (4.12)$$
If $\beta_1^2 = 1$ and $\gamma_1^2 = 1$, then $|\rho_Y| = |\rho_X|$ because all other $\beta_k$ and $\gamma_k$ must then vanish. In all other cases, either $\beta_1^2 < 1$ or $\gamma_1^2 < 1$ or both, such that
$$\rho_Y = \beta_1\gamma_1\rho_X + \sum_{n=2}^{\infty}\beta_n\gamma_n\,\rho_X^n$$
and, by the Cauchy–Schwarz inequality,
$$\left|\sum_{n=2}^{\infty}\beta_n\gamma_n\,\rho_X^n\right| \le \sum_{n=2}^{\infty}|\beta_n\gamma_n|\,|\rho_X|^n \le \sqrt{\sum_{n=2}^{\infty}\beta_n^2|\rho_X|^n}\sqrt{\sum_{n=2}^{\infty}\gamma_n^2|\rho_X|^n} \le \sqrt{1-\beta_1^2}\sqrt{1-\gamma_1^2}\,|\rho_X|^2$$
because $\sum_{n=2}^{\infty}\beta_n^2|\rho_X|^n \le |\rho_X|^2\sum_{n=2}^{\infty}\beta_n^2 = \left(1-\beta_1^2\right)|\rho_X|^2$. Hence,
$$|\rho_Y| \le |\beta_1\gamma_1|\,|\rho_X| + \sqrt{1-\beta_1^2}\sqrt{1-\gamma_1^2}\,|\rho_X|^2 \le \left(|\beta_1\gamma_1| + \sqrt{1-\beta_1^2}\sqrt{1-\gamma_1^2}\right)|\rho_X|$$
Finally, for $\beta_1^2 \le 1$ and $\gamma_1^2 \le 1$, the inequality $|\beta_1\gamma_1| + \sqrt{1-\beta_1^2}\sqrt{1-\gamma_1^2} \le 1$ holds. This proves Lancaster's theorem because $|\rho_X| \le 1$.
For the degenerate case $\rho_X = 1$, where $X_1 = X_2 = X$, the joint density reduces to
$$f_{X_1X_2}(u, v; 1) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\frac{(u-\mu_X)^2}{2\sigma_X^2}\right)\delta(u - v)$$
which follows from $\Pr[X_1 \le x_1, X_2 \le x_2] = \Pr[X \le x_1, X \le x_2] = \Pr[X \le x]$ with $x = \max(x_1, x_2)$. In that case,
$$E[Y_1Y_2] = \int_{-\infty}^{\infty} g_1^{-1}(F_X(u))\, g_2^{-1}(F_X(u))\, dF_X(u) = \int_0^1 g_1^{-1}(u)\, g_2^{-1}(u)\, du \qquad (4.13)$$
Similarly, for $\rho_X = -1$, where $X_2 - \mu_X = -(X_1 - \mu_X)$,
$$f_{X_1X_2}(u, v; -1) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\frac{(u-\mu_X)^2}{2\sigma_X^2}\right)\delta\left((u-\mu_X) + (v-\mu_X)\right)$$
which follows from the symmetry relations (4.6). In that case,
$$E[Y_1Y_2] = \int_{-\infty}^{\infty} g_1^{-1}(F_X(u))\, g_2^{-1}(1 - F_X(u))\, dF_X(u) = \int_0^1 g_1^{-1}(u)\, g_2^{-1}(1-u)\, du \qquad (4.14)$$
Uniform random variables. Let $U_j=F_{X_j}(X_j)$, such that each $U_j$ is uniform on $[0,1]$ with $E[U_j]=\frac12$ and $\mathrm{Var}[U_j]=\frac1{12}$. Then
$$E[U_1U_2]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}F_{X_1}(u)F_{X_2}(v)\,f_{X_1X_2}(u,v;\rho_X)\,du\,dv$$
Substituting successively $u'=\frac{u-\mu_{X_1}}{\sigma_{X_1}}$, $v'=\frac{v-\mu_{X_2}}{\sigma_{X_2}}$, $t'=\frac{t-\mu_{X_1}}{\sigma_{X_1}}$ and $\theta'=\frac{\theta-\mu_{X_2}}{\sigma_{X_2}}$, we obtain
$$E[U_1U_2]=\frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}du'\int_{-\infty}^{\infty}dv'\int_{-\infty}^{u'}e^{-\frac{t'^2}{2}}dt'\int_{-\infty}^{v'}e^{-\frac{\theta'^2}{2}}d\theta'\;\frac{\exp\left(-\frac{u'^2-2\rho_X u'v'+v'^2}{2(1-\rho_X^2)}\right)}{\sqrt{1-\rho_X^2}}$$
Differentiation with respect to $\rho_X$, combined with the identity $\frac{\partial f}{\partial\rho_X}=\frac{\partial^2 f}{\partial u'\,\partial v'}$ for the normalized bivariate Gaussian density and partial integration, yields
$$\frac{\partial E[U_1U_2]}{\partial\rho_X}=\frac{1}{2\pi\sqrt{4-\rho_X^2}}$$
so that, after integration,
$$\rho_U=\frac{E[U_1U_2]-\frac14}{\frac1{12}}=\frac{6}{\pi}\arcsin\frac{\rho_X}{2}+c$$
It remains to determine the constant $c$. We have shown in Section 4.3.2 that random variables generated from uncorrelated Gaussian random variables are also uncorrelated, implying that $\rho_U=0$ if $\rho_X=0$ and, hence, that the constant $c=0$. This finally results in
$$\rho_U=\frac{6}{\pi}\arcsin\frac{\rho_X}{2}\qquad(4.15)$$
Exponential random variables. For an exponential random variable $Y_j$ with rate $\alpha_j$, it holds that $\mu_{Y_j}=\sigma_{Y_j}=\frac{1}{\alpha_j}$, so that
$$\rho_Y=\frac{E[Y_1Y_2]-\mu_{Y_1}\mu_{Y_2}}{\sigma_{Y_1}\sigma_{Y_2}}=E[\alpha_1\alpha_2Y_1Y_2]-1$$
As above, we generate $Y_1=-\frac{1}{\alpha_1}\log F_{X_1}(X_1)$ and $Y_2=-\frac{1}{\alpha_2}\log F_{X_2}(X_2)$, where $X_1$ and $X_2$ are correlated Gaussian random variables with correlation coefficient $\rho_X$. Then,
$$E[\alpha_1\alpha_2Y_1Y_2]=E\left[\log F_{X_1}(X_1)\log F_{X_2}(X_2)\right]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\log F_{X_1}(u)\log F_{X_2}(v)\,f_{X_1X_2}(u,v;\rho_X)\,du\,dv$$
In the general case $\rho_X\neq0$, the previous method can be followed, which yields after substitution towards normalized variables,
$$E[\alpha_1\alpha_2Y_1Y_2]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\log\left(\int_{-\infty}^{u}\frac{e^{-\frac{t^2}{2}}}{\sqrt{2\pi}}\,dt\right)\log\left(\int_{-\infty}^{v}\frac{e^{-\frac{\theta^2}{2}}}{\sqrt{2\pi}}\,d\theta\right)\frac{\exp\left(-\frac{u^2-2\rho_X uv+v^2}{2(1-\rho_X^2)}\right)}{2\pi\sqrt{1-\rho_X^2}}\,du\,dv$$
Unfortunately, we cannot evaluate this integral analytically.

Let us compute the upper bound $\rho_Y^{\max}$ from (4.13) with $g_1^{-1}(x)=-\frac{1}{\alpha_1}\log x$ and $g_2^{-1}(x)=-\frac{1}{\alpha_2}\log x$,
$$E[\alpha_1\alpha_2Y_1Y_2;\rho_X=1]=\int_0^1\log^2x\,dx=2$$
and thus $\rho_Y^{\max}=1$. The lower boundary $\rho_Y^{\min}$ follows from (4.14) as
$$E[\alpha_1\alpha_2Y_1Y_2;\rho_X=-1]=\int_0^1\log x\log(1-x)\,dx=2-\frac{\pi^2}{6}$$
and thus $\rho_Y^{\min}=1-\frac{\pi^2}{6}$. Indeed, since $\log(1-x)=-\sum_{k=1}^{\infty}\frac{x^k}{k}$ and the substitution $x=e^{-u}$ gives
$$\int_0^1x^k\log x\,dx=-\int_0^{\infty}e^{-(k+1)u}\,u\,du=-\frac{1}{(k+1)^2}$$
we find
$$\int_0^1\log x\log(1-x)\,dx=\sum_{k=1}^{\infty}\frac{1}{k(k+1)^2}$$
Since $\frac{1}{k(k+1)^2}=\frac1k-\frac{1}{k+1}-\frac{1}{(k+1)^2}$,
$$\sum_{k=1}^{\infty}\frac{1}{k(k+1)^2}=\sum_{k=1}^{\infty}\left(\frac1k-\frac{1}{k+1}\right)-\sum_{k=2}^{\infty}\frac{1}{k^2}=1-\left(\frac{\pi^2}{6}-1\right)=2-\frac{\pi^2}{6}$$
Hence, $\rho_Y$ is limited to the interval $\left[1-\frac{\pi^2}{6},\,1\right]$. As explained in the introduction of Section 4.2, the exponential random variables are positive with infinite range, for which not all negative correlations are possible. The analysis demonstrates that it is not possible to construct two exponential random variables with a correlation coefficient smaller than $\rho_Y^{\min}=1-\frac{\pi^2}{6}\simeq-0.645$.
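The two ingredients of the lower bound, the series $\sum_{k\ge1}\frac{1}{k(k+1)^2}$ and the integral $\int_0^1\log x\log(1-x)\,dx$, can be checked numerically; a small sketch (illustrative tolerances chosen by me):

```python
import math

# Series: sum_{k>=1} 1/(k*(k+1)^2) should equal 2 - pi^2/6.
series = sum(1.0 / (k * (k + 1) ** 2) for k in range(1, 200_000))
target = 2 - math.pi ** 2 / 6

# Midpoint rule for the improper (but integrable) integral of log(x)*log(1-x).
n = 200_000
integral = sum(math.log((i + 0.5) / n) * math.log(1 - (i + 0.5) / n)
               for i in range(n)) / n

rho_y_min = target - 1   # lower bound of the correlation coefficient, ~ -0.645
print(round(series, 6), round(integral, 6), round(rho_y_min, 3))
```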
Lognormal random variables. With $Y_j=e^{a_jX_j}$,
$$E[Y_1Y_2]=E\left[e^{a_1X_1+a_2X_2}\right]=\exp\left(a_1\mu_1+a_2\mu_2+\frac{a_1^2\sigma_1^2}{2}+a_1a_2\sigma_1\sigma_2\rho_X+\frac{a_2^2\sigma_2^2}{2}\right)$$
where the Laplace transform (4.2) for $n=2$ has been used. Invoking (3.45) and (3.46) with $\mu_j\to a_j\mu_j$ and $\sigma_j^2\to a_j^2\sigma_j^2$, the correlation coefficient $\rho_Y$ is
$$\rho_Y=\frac{e^{a_1\sigma_1a_2\sigma_2\rho_X}-1}{\sqrt{\left(e^{a_1^2\sigma_1^2}-1\right)\left(e^{a_2^2\sigma_2^2}-1\right)}}\qquad(4.16)$$
If at least one (but not all) of the quantities $\sigma_1,\sigma_2,a_1$ or $a_2$ grows large, $\rho_Y$ tends to zero irrespective of $\rho_X$. Thus, even if $X_1$ and $X_2$ and, hence, also $Y_1$ and $Y_2$, have the strongest kind of dependence possible, i.e. $\rho_X=1$, the correlation coefficient $\rho_Y$ can be made arbitrarily small. In case $a_1\sigma_1=a_2\sigma_2=\sigma$, (4.16) reduces to
$$\rho_Y=\frac{e^{\sigma^2\rho_X}-1}{e^{\sigma^2}-1}$$
Consider the linear combination of two independent random variables $V$ and $W$,
$$\begin{pmatrix}X\\Y\end{pmatrix}=\begin{pmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{pmatrix}\begin{pmatrix}V\\W\end{pmatrix}$$
such that $E[XY]=a_{11}a_{21}E\left[V^2\right]+(a_{11}a_{22}+a_{12}a_{21})E[V]E[W]+a_{12}a_{22}E\left[W^2\right]$. Since $V$ and $W$ are independent, $E[VW]=E[V]E[W]$, and with the definition of the variance (2.16) and denoting $\sigma_V^2=\mathrm{Var}[V]$ and similarly for $W$, we obtain
$$\mathrm{Cov}[X,Y]=a_{11}a_{21}\sigma_V^2+a_{12}a_{22}\sigma_W^2$$
In the same way, we find
$$\sigma_X^2=a_{11}^2\sigma_V^2+a_{12}^2\sigma_W^2\qquad\qquad\sigma_Y^2=a_{21}^2\sigma_V^2+a_{22}^2\sigma_W^2\qquad(4.17)$$
In order to achieve our goal of constructing two correlated random variables $X$ and $Y$, we can choose the coefficients of the matrix $A$ to obtain an expression as simple as possible. If we choose $X=V$, or $a_{11}=1$ and $a_{12}=b_1=0$, the correlation coefficient reduces to
$$\rho=\frac{a_{21}\sigma_X^2}{\sigma_X\sqrt{a_{21}^2\sigma_X^2+a_{22}^2\sigma_W^2}}=\frac{1}{\sqrt{1+\dfrac{a_{22}^2\sigma_W^2}{a_{21}^2\sigma_X^2}}}$$
Choosing further $a_{21}=\rho\frac{\sigma_Y}{\sigma_X}$ and $a_{22}=\sqrt{1-\rho^2}$, the random variables $X$ and $Y$ are specified as
$$X=V\qquad\qquad Y=\rho\frac{\sigma_Y}{\sigma_X}\,V+\sqrt{1-\rho^2}\,W+b_2$$
and the corresponding variances (4.17) are $\sigma_X^2=\sigma_V^2$ and $\sigma_Y^2=\sigma_W^2$. Finally, we require that $E[W]=\mu_W=0$, which specifies
$$b_2=E[Y]-\rho\frac{\sigma_Y}{\sigma_X}E[X]\qquad(4.18)$$
In the sequel, we take the positive sign for $\rho$.
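The construction $X=V$, $Y=\rho\frac{\sigma_Y}{\sigma_X}V+\sqrt{1-\rho^2}\,W+b_2$ with $b_2$ from (4.18) can be sketched directly in code. Here both $V$ and $W$ are taken Gaussian (the case treated below where the method is exact); the distributions and parameter values are illustrative choices of mine.

```python
import random, math

def correlated_pair(rho, mu_x, sig_x, mu_y, sig_y, rng):
    """One draw of (X, Y) via the linear combination method.
    V is X itself; W is an independent zero-mean variable with sigma_W = sigma_Y."""
    v = rng.gauss(mu_x, sig_x)
    w = rng.gauss(0, sig_y)                     # E[W] = 0, sigma_W = sigma_Y
    b2 = mu_y - rho * (sig_y / sig_x) * mu_x    # equation (4.18)
    y = rho * (sig_y / sig_x) * v + math.sqrt(1 - rho ** 2) * w + b2
    return v, y

rng = random.Random(1)
rho = -0.5
data = [correlated_pair(rho, 2.0, 1.5, -1.0, 3.0, rng) for _ in range(100_000)]
xs = [d[0] for d in data]; ys = [d[1] for d in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
r = cov / (math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
           * math.sqrt(sum((y - my) ** 2 for y in ys) / n))
print(round(r, 2))  # close to the target rho
```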
Let us now investigate what happens with the distribution functions of $X$ and $Y$. Using the pgfs for continuous random variables, $\varphi_X(z)=E\left[e^{-zX}\right]$, the construction gives
$$\varphi_Y(z)=E\left[e^{-zY}\right]=e^{-z\left(\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}E\left[e^{-z\rho\frac{\sigma_Y}{\sigma_X}X}\right]E\left[e^{-z\sqrt{1-\rho^2}\,W}\right]$$
or
$$\varphi_Y(z)=e^{-z\left(\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\,\varphi_X\!\left(\rho\frac{\sigma_Y}{\sigma_X}z\right)\varphi_W\!\left(\sqrt{1-\rho^2}\,z\right)\qquad(4.19)$$
In order to produce two random variables $X$ and $Y$ that are correlated with correlation coefficient $\rho$, the pgf of the zero-mean random variable $W$ must thus equal
$$\varphi_W(z)=e^{\frac{z}{\sqrt{1-\rho^2}}\left(\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\;\frac{\varphi_Y\!\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_X\!\left(\frac{\rho\sigma_Y}{\sigma_X}\frac{z}{\sqrt{1-\rho^2}}\right)}\qquad(4.20)$$
The joint pdf then follows from the double inverse Laplace transform
$$f_{XY}(x,y)=\frac{1}{(2\pi i)^2}\int_{c_1-i\infty}^{c_1+i\infty}dz_1\int_{c_2-i\infty}^{c_2+i\infty}dz_2\;e^{z_1x+z_2y}\,\varphi_{XY}(z_1,z_2)\qquad(4.21)$$
with joint pgf
$$\varphi_{XY}(z_1,z_2)=E\left[e^{-z_1X-z_2Y}\right]=e^{-z_2\left(\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\,\varphi_X\!\left(z_1+z_2\rho\frac{\sigma_Y}{\sigma_X}\right)\varphi_W\!\left(z_2\sqrt{1-\rho^2}\right)\qquad(4.22)$$
Introduced into the complex double integral (4.21), the joint probability density function of the two correlated random variables can be computed. The main deficiency of the linear combination method is the implicit assumption that any joint distribution function $f_{XY}(x,y)$ can be constructed from two independent random variables $X$ and $W$. The corresponding joint pgf (4.22) possesses a product form that cannot always be made compatible with the form of an arbitrary pgf $\varphi_{XY}(z_1,z_2)$. The examples below illustrate this deficiency.
For Gaussian random variables, (4.22) gives
$$E\left[e^{-z_1X-z_2Y}\right]=\exp\left(-z_1\mu_X-z_2\mu_Y+\frac{\sigma_X^2}{2}z_1^2+z_1z_2\rho\sigma_X\sigma_Y+\frac{\sigma_Y^2}{2}z_2^2\right)$$
Since
$$\frac{\sigma_X^2}{2}z_1^2+z_1z_2\rho\sigma_X\sigma_Y+\frac{\sigma_Y^2}{2}z_2^2=\frac12\begin{pmatrix}z_1&z_2\end{pmatrix}\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}\begin{pmatrix}z_1\\z_2\end{pmatrix}$$
formula (4.2) indicates that $E\left[e^{-z_1X-z_2Y}\right]$ is the two-dimensional pgf of a joint Gaussian with pdf (4.4). The linear combination method thus provides the exact result for correlated Gaussian random variables.
As a second example, consider two correlated exponential random variables with mean $T$, for which $\varphi_X(z)=\varphi_Y(z)=\frac{1}{1+Tz}$ and $\mu_X=\mu_Y=\sigma_X=\sigma_Y=T$. The pgf (4.20) of the auxiliary random variable $W$ becomes
$$\varphi_W(z)=e^{\frac{(1-\rho)Tz}{\sqrt{1-\rho^2}}}\;\frac{1+\frac{\rho Tz}{\sqrt{1-\rho^2}}}{1+\frac{Tz}{\sqrt{1-\rho^2}}}$$
and the corresponding distribution function is, for $c>0$,
$$F_W(t)=\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{1+\frac{\rho Tz}{\sqrt{1-\rho^2}}}{1+\frac{Tz}{\sqrt{1-\rho^2}}}\;e^{z\left(t+\frac{(1-\rho)T}{\sqrt{1-\rho^2}}\right)}\,\frac{dz}{z}$$
Substituting $z\to\frac{z}{\sqrt{1-\rho^2}}$, the contour integral is evaluated by the residues at $z=0$ and $z=-\frac1T$,
$$F_W(t)=\lim_{z\to0}\frac{(1+\rho Tz)}{(1+zT)}\,e^{z(t+(1-\rho)T)}+\lim_{z\to-\frac1T}\frac{(1+zT)}{(1+zT)}\,\frac{(1+\rho Tz)}{z}\,e^{z(t+(1-\rho)T)}\cdot\frac1T$$
$$=1-(1-\rho)\,e^{-(1-\rho)}\,e^{-\frac tT}$$
Hence, for the generation of two exponential, correlated random variables, the auxiliary random variable $W$ has an exponential distribution with an atom of size $1-(1-\rho)e^{-(1-\rho)}$ at $t=0$, which is fortunately easy to generate with a computer. It appears that only for $\rho\ge0$ does the linear combination method lead to correct results for exponential random variables. Moreover, the method does not give an indication of the valid range of $\rho$. We have shown above that $\rho_Y\in\left[1-\frac{\pi^2}{6},\,1\right]$.

While the linear combination method applied to generate two exponential random variables still correctly treats a range of $\rho$, its application to correlated uniform random variables leads to bizarre results and definitely shows the deficiency of the method. The difficulties already encountered in this chapter in generating $n=2$ correlated random variables with arbitrary distributions suggest that the case $n>2$ must be even more intractable.
4.6 Problems
(i) Show, in two dimensions, that (4.3), or in explicit form (4.4), is indeed the joint pdf corresponding to (4.2).
5
Inequalities

Hardy et al. (1999) view the best-known inequalities from various angles, provide several different proofs and relate the nature of these inequalities. For example, starting from the most basic inequality between the geometric and arithmetic mean,
$$\min(x,y)\le\sqrt{xy}\le\frac{x+y}{2}\le\max(x,y)\qquad(5.1)$$
which directly follows from $\left(\sqrt x-\sqrt y\right)^2\ge0$, they masterfully extend this relation to the theorem of the arithmetic and geometric mean in several real variables $x_k$,
$$\prod_{k=1}^{n}x_k^{q_k}\le\sum_{k=1}^{n}q_kx_k\qquad(5.2)$$
where $\sum_{k=1}^{n}q_k=1$. They further move to the inequalities of Cauchy–Schwarz, of Hölder, of Minkowski and many more. Only a few inequalities are reviewed here, and we recommend the classic treatise on inequalities by Hardy, Littlewood and Pólya for those who search for more depth, elegance and insight.

The arithmetic-geometric mean $M(x,y)$ is the limit for $n\to\infty$ of the recursion $x_n=\frac12(x_{n-1}+y_{n-1})$, which is an arithmetic mean, and $y_n=\sqrt{x_{n-1}y_{n-1}}$, which is a geometric mean, with initial values $x_0=x$ and $y_0=y$. Gauss's famous discovery of intriguing properties of $M(x,y)$ (which lead e.g. to very fast converging series for computing $\pi$) is narrated in a paper by Almkvist and Berndt (1988).
A function $f$ satisfying, in an interval $I$,
$$f\!\left(\frac{x+y}{2}\right)\le\frac{f(x)+f(y)}{2}$$
is called convex in that interval $I$. If $f$ is convex, $-f$ is concave. Hardy et al. (1999, Section 3.6) demonstrate that this condition is fundamental, and that from it the more general condition
$$f\!\left(\sum_{k=1}^{n}q_kx_k\right)\le\sum_{k=1}^{n}q_kf(x_k)\qquad(5.3)$$
with $\sum_{k=1}^{n}q_k=1$ follows; in particular, for $0\le\theta\le1$,
$$f\left(\theta x+(1-\theta)y\right)\le\theta f(x)+(1-\theta)f(y)\qquad(5.4)$$
The convexity concept can be generalized (Hardy et al., 1999, Section 98) to several variables, in which case the condition (5.3) becomes
$$f\!\left(\sum_kq_kx_k,\sum_kq_ky_k\right)\le\sum_kq_kf(x_k,y_k)$$
Geometrically, the basic condition means that the midpoint of any chord lies above or on the curve $f$ in the interval $I$. The more general form (5.3) asserts that the centre of gravity of any number of arbitrarily weighted points of the curve lies above or on the curve.

[Fig. 5.1. A convex function $f$, with chord $c_1$ over $(a,b)$ and chord $c_2$ over $(a',b')$ for points $u\le a\le a'\le b'\le b\le v$.]

Figure 5.1 illustrates that, for any convex function $f$ and points $a,a',b,b'\in[u,v]$ such that $a\le a'\le b'$ and $a<b\le b'$, the chord $c_1$ over $(a,b)$ has a smaller slope than the chord $c_2$ over $(a',b')$ or, equivalently, $\frac{f(b)-f(a)}{b-a}\le\frac{f(b')-f(a')}{b'-a'}$. Suppose that $f(x)$ is twice differentiable in the interval $I$; then a necessary and sufficient condition for convexity is $f''(x)\ge0$ for each $x\in I$. This theorem is proved in Hardy et al. (1999, pp. 76–77). Moreover, they prove that equality in (5.3) can only occur if $f(x)$ is linear.

Applied to probability, relation (5.3) with $q_k=\Pr[X=k]$ and $x_k=k$ is written with (2.12) as
$$f\left(E[X]\right)\le E[f(X)]\qquad(5.5)$$
and is known as Jensen's inequality. Jensen's inequality (5.5) also holds for continuous random variables. Indeed, if $f$ is differentiable and convex, then $f(x)-f(y)\ge f'(y)(x-y)$. Substituting $x$ by the random variable $X$ and $y=E[X]$, then
$$f(X)-f\left(E[X]\right)\ge f'\left(E[X]\right)\left(X-E[X]\right)$$
After applying the expectation operator to both sides, we obtain (5.5). An important application of Jensen's inequality is obtained for $f(x)=e^{-zx}$ with real $z$: any probability generating function $\varphi_X(z)$ is, for real $z$, bounded from below by $e^{-zE[X]}$.

A continuous analog of (5.3) with $f(x)=e^x$ (and similarly for $f(x)=\log x$),
$$\exp\left(\frac{1}{v-u}\int_u^vf(x)\,dx\right)\le\frac{1}{v-u}\int_u^ve^{f(x)}\,dx$$
can be regarded as a generalization of the inequality between the arithmetic and geometric mean.
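The pgf lower bound $\varphi_X(z)\ge e^{-zE[X]}$ from Jensen's inequality is easy to observe numerically. A minimal sketch (my own illustration; the exponential distribution and rate are arbitrary choices):

```python
import random, math

rng = random.Random(3)
samples = [rng.expovariate(2.0) for _ in range(50_000)]   # X ~ Exp(2), E[X] = 0.5
mean = sum(samples) / len(samples)

z = 1.0
f = lambda x: math.exp(-z * x)                      # convex function
e_f = sum(f(x) for x in samples) / len(samples)     # ~ pgf of X at z
print(f(mean) <= e_f)                               # Jensen: f(E[X]) <= E[f(X)]
# exact pgf of Exp(2) at z = 1 is 2/3, while e^{-z E[X]} = e^{-0.5} ~ 0.607
```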
An important application of Taylor's Theorem (or of the mean value theorem) to the exponential function gives a list of inequalities. First, for some $0<\theta<1$,
$$e^x=1+x+\frac{x^2}{2}\,e^{\theta x}\qquad(5.8)$$
which shows that $e^x\ge1+x$ for all real $x$, with equality only at $x=0$. More generally,
$$e^x=\sum_{k=0}^{n-1}\frac{x^k}{k!}+\frac{x^n}{n!}\,e^{\theta x}$$
Since the remainder is non-negative for even $n=2m$,
$$e^x\ge\sum_{k=0}^{2m-1}\frac{x^k}{k!}$$
for all real $x$, while, for $n=2m+1$,
$$e^x>\sum_{k=0}^{2m}\frac{x^k}{k!}\quad(x>0)\qquad\qquad e^x<\sum_{k=0}^{2m}\frac{x^k}{k!}\quad(x<0)$$
Applying $1+a_kx<e^{a_kx}$ (for $a_kx\neq0$) to each factor of the product $\prod_{k=0}^{n}(1+a_kx)$ yields
$$\prod_{k=0}^{n}(1+a_kx)<\exp\left(x\sum_{k=0}^{n}a_k\right)$$
A tighter bound is obtained if all $a_k>0$ (e.g. $a_k$ is a probability). The above relation indicates that $g(x)=\prod_{k=0}^{n}(1+a_kx)$ is smaller than $f(x)=\exp\left(x\sum_{k=0}^{n}a_k\right)$ for any $x\neq0$ and $g(0)=f(0)=1$. Further, from $(1+a_kx)<e^{a_kx}$ it can be verified that, for all Taylor coefficients $1<k\le n$, it holds that $0<g_k\le f_k$ and $g_2<f_2$, such that $g(x)=\sum_{k=0}^{n}g_kx^k<\sum_{k=0}^{n}f_kx^k<\sum_{k=0}^{\infty}f_kx^k$ for $x>0$. Thus, for $x=1$, we have
$$\prod_{k=0}^{n}(1+a_k)<\sum_{k=0}^{n}\frac{1}{k!}\left(\sum_{j=0}^{n}a_j\right)^k$$
The Markov inequality states that, for a non-negative random variable $X$ and $a>0$,
$$\Pr[X\ge a]\le\frac{E[X]}{a}\qquad(5.9)$$
Another proof of the Markov inequality follows after taking the expectation of the inequality $a\,1_{X\ge a}\le X$ for $X\ge0$. The restriction to non-negative random variables can be circumvented by considering the random variable $X=(Y-E[Y])^2$ and $a=t^2$ in (5.9),
$$\Pr\left[(Y-E[Y])^2\ge t^2\right]\le\frac{E\left[(Y-E[Y])^2\right]}{t^2}=\frac{\mathrm{Var}[Y]}{t^2}$$
From this, the Chebyshev inequality follows as
$$\Pr\left[|X-E[X]|\ge t\right]\le\frac{\sigma^2}{t^2}\qquad(5.10)$$
The Chebyshev inequality quantifies the spread of $X$ around the mean $E[X]$: the smaller $\sigma$, the more concentrated $X$ is around the mean.

Further extensions of the Markov inequality use the equivalence between the events $\{X\ge a\}\Leftrightarrow\{g(X)\ge g(a)\}$, where $g$ is a monotonously increasing function. Hence, (5.9) becomes
$$\Pr[X\ge a]\le\frac{E[g(X)]}{g(a)}$$
For example, if $g(x)=x^k$, then $\Pr[X\ge a]\le\frac{E\left[X^k\right]}{a^k}$. An interesting application of this idea is based on the equivalence of the events $\{X\ge E[X]+t\}\Leftrightarrow\{e^{uX}\ge e^{u(E[X]+t)}\}$ provided $u\ge0$. For $u\ge0$ and a binomial random variable $X$ with $E[X]=np$,
$$e^{-u(E[X]+t)}E\left[e^{uX}\right]=e^{-u(np+t)+n\log(q+pe^u)}$$
Provided $\frac{d^2}{du^2}\,e^{-u(E[X]+t)}E\left[e^{uX}\right]\Big|_{u=u^*}>0$, the sharpest bound follows from
$$\frac{d}{du}\,e^{-u(E[X]+t)}E\left[e^{uX}\right]=0$$
Explicitly,
$$\frac{d}{du}\,e^{-u(E[X]+t)}E\left[e^{uX}\right]=e^{-u(np+t)+n\log(q+pe^u)}\left(-(np+t)+\frac{npe^u}{q+pe^u}\right)$$
from which $u^*$ follows using $q=1-p$ as
$$u^*=\log\frac{npq+qt}{npq-pt}$$
Hence,
$$e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right]=\left(1-\frac{t}{nq}\right)^{t-nq}\left(1+\frac{t}{np}\right)^{-(t+np)}=\exp\left[(t-nq)\log\left(1-\frac{t}{nq}\right)-(t+np)\log\left(1+\frac{t}{np}\right)\right]$$
For large $n$, but $p$ and $t$ fixed, expanding the logarithms shows that $e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right]=e^{-\frac{t^2}{2npq}}\left(1+O\left(\frac1n\right)\right)$. Since $\mathrm{Var}[X]=npq$ and by denoting $y^2=\frac{t^2}{\mathrm{Var}[X]}$, we find in the asymptotic regime for large $n$,
$$\Pr\left[\frac{|X-E[X]|}{\sqrt{\mathrm{Var}[X]}}\ge y\right]\lesssim e^{-\frac{y^2}{2}}\qquad(5.13)$$
which is in agreement with the Central Limit Theorem 6.3.1. The corresponding Chebyshev inequality,
$$\Pr\left[\frac{|X-E[X]|}{\sqrt{\mathrm{Var}[X]}}\ge y\right]\le\frac{1}{y^2}$$
is considerably less tight for the binomial distribution than the Chernoff bound (5.13). More advanced and sharper inequalities than that of Chebyshev are surveyed by Janson (2002).
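The gap between the Chernoff-type bound (5.13) and the Chebyshev bound is easy to see numerically for a binomial. A minimal sketch (my own illustration; $n$, $p$ and $t$ are arbitrary, and the factor 2 accounts for both tails in the leading-order exponential bound):

```python
import math

def binom_tail_two_sided(n, p, t):
    # Pr[|X - np| >= t] for X ~ Bin(n, p), computed exactly via log-factorials.
    mean = n * p
    logfact = [0.0] * (n + 1)
    for k in range(2, n + 1):
        logfact[k] = logfact[k - 1] + math.log(k)
    total = 0.0
    for k in range(n + 1):
        if abs(k - mean) >= t:
            logp = (logfact[n] - logfact[k] - logfact[n - k]
                    + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(logp)
    return total

n, p, t = 1000, 0.5, 50
var = n * p * (1 - p)                 # npq = 250
y2 = t * t / var                      # y^2 = 10
exact = binom_tail_two_sided(n, p, t)
chebyshev = 1 / y2                    # 0.1
chernoff = 2 * math.exp(-y2 / 2)      # ~0.013, leading-order bound for both tails
print(exact, chernoff, chebyshev)     # exact < chernoff < chebyshev
```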
The Hölder inequality involves conjugate exponents $p>1$ and $q$ with
$$\frac1p+\frac1q=1\qquad(5.14)$$
It can be deduced from the basic convexity inequality (5.4). Since $\log x$ is a concave function for real $x>0$, the basic convexity inequality (5.4) holds with reversed sign: with $0\le\theta\le1$,
$$\log\left(\theta u+(1-\theta)v\right)\ge\theta\log u+(1-\theta)\log v$$
After exponentiation, we obtain for $u,v>0$ a more general inequality than (5.1), which corresponds to $\theta=\frac12$,
$$u^{\theta}v^{1-\theta}\le\theta u+(1-\theta)v$$
Substitute $u=\dfrac{|x_j|^s}{\sum_{j=1}^n|x_j|^s}$ and $v=\dfrac{|y_j|^t}{\sum_{j=1}^n|y_j|^t}$, then
$$\frac{|x_j|^{\theta s}\,|y_j|^{(1-\theta)t}}{\left(\sum_{j=1}^n|x_j|^s\right)^{\theta}\left(\sum_{j=1}^n|y_j|^t\right)^{1-\theta}}\le\theta\,\frac{|x_j|^s}{\sum_{j=1}^n|x_j|^s}+(1-\theta)\,\frac{|y_j|^t}{\sum_{j=1}^n|y_j|^t}$$
Summing over $j$ gives
$$\sum_{j=1}^{n}|x_j|^{\theta s}|y_j|^{(1-\theta)t}\le\left(\sum_{j=1}^{n}|x_j|^s\right)^{\theta}\left(\sum_{j=1}^{n}|y_j|^t\right)^{1-\theta}\qquad(5.15)$$
By choosing $\theta s=1$ and $(1-\theta)t=1$, thus $p=s=\frac1\theta>1$ and $q=t=\frac{1}{1-\theta}$ with $\frac1p+\frac1q=1$,
$$\sum_{j=1}^{n}|x_jy_j|\le\left(\sum_{j=1}^{n}|x_j|^p\right)^{1/p}\left(\sum_{j=1}^{n}|y_j|^q\right)^{1/q}\qquad(5.16)$$
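Hölder's inequality (5.16) can be checked on random vectors; a minimal sketch (my own illustration; the vectors and exponents are arbitrary):

```python
import random

rng = random.Random(5)
x = [rng.uniform(-1, 1) for _ in range(100)]
y = [rng.uniform(-1, 1) for _ in range(100)]

def holder_ok(p):
    q = p / (p - 1)                     # conjugate exponent, 1/p + 1/q = 1
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = (sum(abs(a) ** p for a in x) ** (1 / p)
           * sum(abs(b) ** q for b in y) ** (1 / q))
    return lhs <= rhs + 1e-12           # tolerance for floating point

print(all(holder_ok(p) for p in (1.5, 2.0, 3.0, 10.0)))  # p = 2 is Cauchy-Schwarz
```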
The Hölder inequality is characteristic in the following sense (Hardy et al., 1999, Theorem 101 (p. 82)). Suppose that $f(x)$ is convex (such that the inverse $g(x)=f^{-1}(x)$ is also convex) and that $f(0)=0$. If $F(x)=\int_0^xf(u)\,du$ and $G(x)=\int_0^xg(u)\,du$, and if
$$\sum_{k=1}^{n}q_ka_kb_k\le F^{-1}\!\left(\sum_{k=1}^{n}q_kF(a_k)\right)G^{-1}\!\left(\sum_{k=1}^{n}q_kG(b_k)\right)$$
with $\sum_{k=1}^{n}q_k=1$ holds for all positive $a_k$ and $b_k$, then $f(x)=x^r$ and the above inequality is Hölder's inequality.

The next inequalities are of a different type. For $p>1$, the Minkowski inequality is
$$\left(E\left[|X+Y|^p\right]\right)^{1/p}\le\left(E\left[|X|^p\right]\right)^{1/p}+\left(E\left[|Y|^p\right]\right)^{1/p}\qquad(5.18)$$
Suppose that $f(x)$ is continuous and strictly increasing for $x\ge0$ and $f(0)=0$. Then the inverse function $g(x)=f^{-1}(x)$ satisfies the same conditions. The Young inequality states that, for $a\ge0$ and $b\ge0$,
$$ab\le\int_0^af(u)\,du+\int_0^bg(u)\,du\qquad(5.19)$$
Finally, the logarithm $L_X(z)=\log\varphi_X(z)$ of a pgf is convex for real $z$: applying the Cauchy–Schwarz inequality to $e^{-\frac z2X}\cdot Xe^{-\frac z2X}$, we obtain $\left(\varphi_X'(z)\right)^2=\left(E\left[Xe^{-zX}\right]\right)^2\le E\left[e^{-zX}\right]E\left[X^2e^{-zX}\right]=\varphi_X(z)\varphi_X''(z)$. Hence, $L_X''(z)\ge0$.
Since $\mathrm{Var}[X]=E\left[X^2\right]-\left(E[X]\right)^2\ge0$,
$$\left(E[X]\right)^2\le E\left[X^2\right]\qquad(5.20)$$
where $\sigma=\sqrt{\mathrm{Var}[X]}$ is the standard deviation.

The last inequality of this chapter is due to Gauss and bounds a quantile by the standard deviation. Consider the normalized random variable $X^*=\frac{X-E[X]}{\sigma}$, which we again denote by $X$, so that $E[X]=0$ and $\mathrm{Var}[X]=E\left[X^2\right]=\sigma^2$. Let $x=g(y)$ be the inverse function of the integral
$$y=F_X(x)-F_X(-x)=\Pr[|X|\le x]\qquad(5.21)$$
so that $g(m)$ is the quantile of $|X|$ at level $m$ and
$$g'(y)=\frac{1}{F_X'(x)+F_X'(-x)}=\frac{1}{f_X(x)+f_X(-x)}$$
The theorem states that, if $g'(y)$ is non-decreasing (which is the basic assumption of the theorem), then $g(m)<\sqrt3\,\sigma m$ for $m<\frac23$, while $g(m)<\frac{2\sigma}{3\sqrt{1-m}}$ for $m>\frac23$; the second regime is equivalent to Gauss's inequality $\Pr\left[|X-E[X]|\ge t\right]\le\frac{4\sigma^2}{9t^2}$ for $t\ge\frac{2\sigma}{\sqrt3}$. The proof of this theorem only uses real function theory and is characteristic of the genius of Gauss.

Proof: An interesting general property of the inverse function is
$$\int_0^1g^2(u)\,du=\int_{-\infty}^{\infty}x^2f_X(x)\,dx=\sigma^2\qquad(5.22)$$
which is verified by the substitution $x=g(u)$, using $E[X]=0$ and $\mathrm{Var}[X]=E\left[X^2\right]$. Since $g''(y)\ge0$, we have that $yg'(y)-g(y)\ge0$ and, since $yg'(y)>0$ (for $y>0$), that
$$h(y)=1-\frac{g(y)}{y\,g'(y)}$$
lies in the interval $[0,1]$. Consider the linear function through $(m,g(m))$ with slope $g'(m)$,
$$G(y)=\frac{g(m)}{m\left(1-h(m)\right)}\left(y-mh(m)\right)\qquad(5.23)$$
Clearly, $G(m)=g(m)$ and $G'(y)=\frac{g(m)}{m(1-h(m))}=g'(m)$ is independent of $y$. Since $g'(y)$ is non-decreasing, the difference $g'(y)-G'(y)$ is negative if $y<m$, but positive if $y>m$. Since $g'(y)-G'(y)=\frac{d}{dy}\left(g(y)-G(y)\right)$, the function $g(y)-G(y)$ is convex with minimum at $y=m$, for which $g(m)-G(m)=0$. Hence, $g(y)-G(y)\ge0$ for all $y\in[0,1]$. Further, $G(y)$ is positive for $y\in(mh(m),1]$. Especially in this interval, the inequality $g(y)\ge G(y)$ is sharp because $g(y)$ is positive in $(0,1]$. Thus,
$$\int_{mh(m)}^{1}G^2(y)\,dy<\int_{mh(m)}^{1}g^2(y)\,dy\le\int_0^1g^2(y)\,dy=\sigma^2$$
and the explicit integral on the left gives
$$\frac{g^2(m)\left(1-mz\right)^3}{3m^2\left(1-z\right)^2}<\sigma^2\qquad(5.24)$$
where $z=h(m)\in[0,1]$. The derivative of $\frac{(1-z)^2}{(1-mz)^3}$ with respect to $z$,
$$\frac{d}{dz}\,\frac{(1-z)^2}{(1-mz)^3}=\frac{(1-z)(3m-2-mz)}{(1-mz)^4}$$
shows that $\frac{(1-z)^2}{(1-mz)^3}$ is monotonously decreasing for all $z\in[0,1]$ if $m<\frac23$, with maximum at $z=0$. Thus, if $m<\frac23$, evaluating (5.24) at $z=0$ yields $g(m)<\sqrt3\,\sigma m$. On the other hand, if $m>\frac23$, then $\frac{(1-z)^2}{(1-mz)^3}$ is maximal provided $3m-2-mz=0$, or for $z=3-\frac2m$. With that value of $z$, the inequality (5.24) yields $g(m)<\frac{2\sigma}{3\sqrt{1-m}}$. Both regimes $m>\frac23$ and $m<\frac23$ tend to the same bound $g(m)<\frac{2\sigma}{\sqrt3}$ if $m\to\frac23$. The converse is similarly derived from (5.24).

If $X$ has a symmetric uniform distribution with $f_X(x)=\frac{1}{2a}1_{x\in[-a,a]}$, then $g(m)=am$ and $\sigma=\frac{a}{\sqrt3}$, from which $g(m)=\sqrt3\,\sigma m$. This example shows that Gauss's bound cannot be improved.
Suppose that the pgf of a discrete random variable $X$ is a rational function of $z$ with simple poles $p_k$, ordered as $1<p_0<p_1<\cdots$, so that
$$\varphi_X(z)=\sum_{j=0}^{\infty}\Pr[X=j]\,z^j=\sum_{k=0}^{\infty}\frac{r_k}{p_k-z}\qquad(5.25)$$
Using the identity
$$\frac{1}{p_k-z}=\sum_{m=0}^{N}\frac{z^m}{p_k^{m+1}}+\frac{z^{N+1}}{p_k^{N+1}(p_k-z)}\qquad(5.26)$$
the pgf splits into the first $N+1$ probabilities plus a remainder. Evaluating at $z=1$, where $\varphi_X(1)=1$, yields
$$\sum_{j=0}^{N}\Pr[X=j]=1-\sum_{k=0}^{\infty}\frac{r_k}{p_k^{N+1}(p_k-1)}\qquad(5.27)$$
Beyond the first $N+1$ probabilities,
$$\Pr[X=j]=\sum_{k=0}^{\infty}\frac{r_k}{p_k^{j+1}}\qquad(j>N)\qquad(5.29)$$
such that, for $K>N$, summing the geometric series gives the tail probability
$$\Pr[X>K]=\sum_{j=K+1}^{\infty}\Pr[X=j]=\sum_{k=0}^{\infty}\frac{r_k}{p_k^{K+1}(p_k-1)}\qquad(5.30)$$
For large $K$, the smallest pole $p_0$ dominates, $\Pr[X>K]\simeq\frac{r_0}{p_0^{K+1}(p_0-1)}$ (5.31). In terms of the queue occupancy in ATM, the initial $\Pr[X=j]$ regime for $j<N$ reflects the cell scale, while the asymptotic regime $j\ge N$ refers to the burst scale.

A general tail bound follows from
$$\Pr[X>K]=\sum_{j=K+1}^{\infty}\Pr[X=j]\le\sum_{j=K+1}^{\infty}x^{j-K-1}\Pr[X=j]\qquad(x\in\mathbb{R}\text{ and }x\ge1)$$
whence
$$\log\Pr[X>K]\le\log\left(x^{-(K+1)}\sum_{j=0}^{\infty}x^j\Pr[X=j]\right)\qquad(5.32)$$
This inequality holds for all real $x\ge1$. To get the tightest bound, we determine the maximizer $x_{\max}$ of (5.32), thus $I(K)=\sup_{x\ge1}\left[(K+1)\log x-\log\varphi_X(x)\right]$, where here $\varphi_X(x)=\sum_{j=0}^{\infty}x^j\Pr[X=j]$. There exists such a supremum on account of the convexity of $I(K)$, because $\varphi_X(x)$ and $\log\varphi_X(x)$ are convex for $x\ge1$, as shown in Section 5.5. Assuming that the maximum $x_{\max}$ exists, it is the solution of $x_{\max}=(K+1)\frac{\varphi_X(x_{\max})}{\varphi_X'(x_{\max})}$, and the large deviations estimate becomes
$$\Pr[X>K]\le e^{-\left[(K+1)\log x_{\max}-\log\varphi_X(x_{\max})\right]}=\varphi_X(x_{\max})\,x_{\max}^{-(K+1)}\qquad(5.33)$$
Observe that (5.33) can be obtained directly from (5.11) with $K=t+E[X]$. Comparing (5.33) and (5.31) indicates, for large $K$, that $x_{\max}\to p_0$ because
$$\lim_{K\to\infty}\frac{-\log\Pr[X>K]}{K}=\log p_0=\lim_{K\to\infty}\log x_{\max}$$
6
Limit laws

Limit laws lie at the heart of analysis and probability theory. Solutions of problems often simplify considerably in limit cases. For example, in Section 16.5.1, the flooding time in the complete graph with $N$ nodes and exponentially distributed link weights can be computed exactly. However, the expression is unattractive, but, fortunately, the limit result for $N\to\infty$ is appealing. Many more results and deep discussions are found in the books of Feller (1970, 1971). In this chapter, we will mainly be concerned with sums of independent random variables, $S_n=\sum_{k=1}^{n}X_k$.

6.1.1 Summability

We will need results from analysis on summability. In his classical treatise on Divergent Series, Hardy (1948) discusses Cesàro, Abel, Euler and Borel summability in depth. First the discrete case is presented and then the continuous case.

Lemma 6.1.1 Let $\{a_n\}_{n\ge1}$ be a sequence of numbers with $\lim_{n\to\infty}a_n=a$; then the average of the partial sums converges to $a$,
$$\lim_{n\to\infty}\frac1n\sum_{m=1}^{n}a_m=a\qquad(6.1)$$
Proof: For any $\varepsilon>0$, there is an $n_0$ such that $|a_m-a|<\varepsilon$ for all $m\ge n_0$. Split
$$\frac1n\sum_{m=1}^{n}(a_m-a)=\frac1n\sum_{m=1}^{n_0}(a_m-a)+\frac1n\sum_{m=n_0+1}^{n}(a_m-a)$$
Hence, with $s_n=\frac1n\sum_{m=1}^{n}a_m$,
$$|s_n-a|\le\frac1n\sum_{m=1}^{n_0}|a_m-a|+\frac1n\sum_{m=n_0+1}^{n}|a_m-a|<\frac cn+\frac{n-n_0}{n}\,\varepsilon<\frac cn+\varepsilon$$
Since $c$ is a constant, $\frac cn$ can be made arbitrarily small for $n$ large enough, which proves the lemma.

The continuous counterpart is analogous: if $\lim_{t\to\infty}g(t)=g_\infty$, then, choosing $T$ such that $|g(u)-g_\infty|<\varepsilon$ for $u\ge T$,
$$\left|\frac1t\int_0^tg(u)\,du-g_\infty\right|\le\frac1t\int_0^T|g(u)-g_\infty|\,du+\frac1t\int_T^t|g(u)-g_\infty|\,du<\frac ct+\frac{t-T}{t}\,\varepsilon$$
such that $\lim_{t\to\infty}\frac1t\int_0^tg(u)\,du=g_\infty$. Both in Markov theory (Section 9.3.2) and in Little's Law (Section 13.6), these lemmas will be used.
If
$$\Pr\left[\lim_{k\to\infty}|X_k-X|=0\right]=1$$
the sequence $\{X_k\}_{k\ge0}$ is said to converge almost surely to $X$. If, for every $\varepsilon>0$,
$$\lim_{k\to\infty}\Pr\left[|X_k-X|>\varepsilon\right]=0$$
then it is said that the sequence $\{X_k\}_{k\ge0}$ converges in probability or in measure to $X$. This mode of convergence is denoted by $X_k\xrightarrow{p}X$ as $k\to\infty$. Convergence in probability is a weaker notion of convergence than almost sure convergence. Almost sure convergence implies convergence in probability, whereas convergence in probability only means that there exists a subsequence of $\{X_k\}_{k\ge0}$ that converges almost surely. An equivalent criterion for almost sure convergence is
$$\Pr\left[|X_k-X|>\varepsilon\ \text{i.o.}\right]=0$$
where i.o. stands for "infinitely often", thus for an infinite number of $k$.

If, for all $x$ with the possible exception of a set of measure zero where $F_X(x)$ is discontinuous, the distributions satisfy
$$\lim_{k\to\infty}F_{X_k}(x)=F_X(x)$$
the sequence converges in distribution, denoted $X_k\xrightarrow{d}X$. If $\lim_{k\to\infty}E\left[|X_k-X|^q\right]=0$, then the sequence $\{X_k\}_{k\ge0}$ converges to $X$ in $L^q$, the space of all functions $f$ for which $\int_{-\infty}^{\infty}|f(x)|^q\,dx<\infty$. The most common values of $q$ are 1, 2 and $q=\infty$. This convergence is also called convergence in norm (see Appendix A.3). The Markov inequality (5.9),
$$\Pr\left[|X_k-X|\ge\varepsilon\right]\le\frac{E\left[|X_k-X|\right]}{\varepsilon}$$
shows that convergence in mean ($q=1$) implies convergence in probability. In general, it is fair to say that the convergence of sequences belongs to the most complicated topics in both analysis and probability theory. In many limit theorems, for example the Law of Large Numbers in Section 6.2 and Little's Law in Section 13.6, the art consists in proving the theorem with the least possible number of assumptions or in its most widely applicable form.

Finally, if there exists a real function $g(x)$ such that $|f_n(x)|<g(x)$ and for which the random variable $g(X)$ has finite expectation, then
$$\lim_{n\to\infty}E\left[f_n(X)\right]=E\left[f(X)\right]$$
Theorem 6.2.1 (Weak Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with mean $\mu=E[X]$. Then, for every $\varepsilon>0$,
$$\lim_{n\to\infty}\Pr\left[\left|\frac{S_n}{n}-\mu\right|\ge\varepsilon\right]=0\qquad(6.2)$$
Proof: Replacing $X_k$ by $X_k-\mu$ demonstrates that, without loss of generality, we may assume that $\mu=0$. Denote $U_n=\frac{S_n}{n}$, then $\varphi_{U_n}(z)=E\left[e^{-zU_n}\right]=E\left[e^{-zS_n/n}\right]$. Since the set $\{X_k\}$ is independent with common distribution, applying relation (2.66) yields
$$\varphi_{U_n}(z)=\left(\varphi_X\!\left(\frac zn\right)\right)^n$$
Since the expectation exists ($\mu=0$), the Taylor expansion (2.40) of $\varphi_X$ around $z=0$ is $\varphi_X(z)=1+o(z)$ and $\varphi_{U_n}(z)=\left(1+o\!\left(\frac zn\right)\right)^n$. Taking the logarithm, $\log\varphi_{U_n}(z)=n\log\left(1+o\!\left(\frac zn\right)\right)=n\,o\!\left(\frac zn\right)=o(z)$ for large $n$, such that $\lim_{n\to\infty}\varphi_{U_n}(z)=1$. By the Continuity Theorem 6.1.3, $\varphi_U(z)=E\left[e^{-zU}\right]=1$, which implies that $U_n\xrightarrow{d}0$. Hence, the sequence $\frac{S_n}{n}$ converges in distribution to $\mu=0$, which is equivalent to (6.2).

The Weak Law of Large Numbers does not claim that $\frac{S_n}{n}$ equals $\mu$ for large $n$. In fact, large fluctuations in $\frac{S_n}{n}$ can happen; the Weak Law of Large Numbers only concludes that large deviations of $\frac{S_n}{n}$ occur with (very) small probability. For example, in a coin-tossing experiment with a fair coin such that $\Pr[X_k=1]=\Pr[X_k=0]=\mu=\frac12$ in $n$ trials, the sequence of always heads $\{X_k=1\}_{1\le k\le n}$ is possible with probability $2^{-n}$, and then $\frac{S_n}{n}=1>\mu$. Only for $n\to\infty$ does this always-heads sequence become impossible ($\lim_{n\to\infty}2^{-n}=0$). For all finite $n$ there is a non-zero probability of having a large deviation from the mean.
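The coin-tossing statement (6.2) is directly observable by simulation; a minimal sketch (my own illustration; trial counts and $\varepsilon$ are arbitrary choices):

```python
import random

rng = random.Random(11)

def deviation_prob(n, eps, trials=2000):
    # Fraction of coin-toss experiments with |S_n/n - 1/2| >= eps.
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() < 0.5 for _ in range(n))
        if abs(s / n - 0.5) >= eps:
            hits += 1
    return hits / trials

probs = [deviation_prob(n, eps=0.05) for n in (10, 100, 1000)]
print(probs)  # decreasing toward 0, as the Weak Law (6.2) asserts
```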
If we assume, in addition to the existence of the expectation, that also the variance Var[$X$] exists, the Weak Law follows from the Chebyshev inequality (5.10),
$$\Pr\left[\left|\frac{S_n}{n}-\mu\right|\ge\varepsilon\right]\le\frac{\mathrm{Var}[X]}{n\varepsilon^2}$$
which tends to zero for any fixed $\varepsilon$ and finite Var[$X$]. In fact, with the additional assumption of a finite variance Var[$X$], a much more precise result can be proved, known as the Central Limit Theorem (Section 6.3). We remark that the Weak Law of Large Numbers also holds in case Var[$X$] does not exist.

Theorem 6.2.2 (Strong Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with $\mu=E[X]$. If the expectation $\mu=E[X]$ and the variance Var[$X$] exist, then
$$\Pr\left[\lim_{n\to\infty}\frac{S_n}{n}=\mu\right]=1\qquad(6.3)$$
Proof: See e.g. Feller (1970, pp. 259–261), Berger (1993, pp. 46–48) or Wolff (1989, pp. 40–41). Their proof is based on the Kolmogorov criterion: the convergence of $\sum_{k=1}^{\infty}\frac{\mathrm{Var}[X_k]}{k^2}$ is a sufficient condition for the Strong Law of Large Numbers for independent random variables $X_k$ with mean $E[X_k]$.

The Strong Law of Large Numbers roughly states that $\frac{S_n}{n}-\mu$ remains small for sufficiently large $n$ with overwhelming probability. The importance of the Law of Large Numbers is the mathematical foundation of the intuition that the sample mean is the best estimator.
Theorem 6.2.3 (Law of the Iterated Logarithm) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with $\mu=E[X]$ and, if Var[$X$] $=\sigma^2$ exists, then
$$\Pr\left[\limsup_{n\to\infty}\frac{S_n-n\mu}{\sigma\sqrt{2n\log\log n}}=1\right]=1\qquad(6.4)$$
Proof: See e.g. Feller (1970, Section VIII.5).

In addition to the Weak and Strong Laws of Large Numbers, the Law of the Iterated Logarithm provides information about large values of $S_n-n\mu$: for any $\delta>0$, the event $S_n-n\mu>(1+\delta)\sigma\sqrt{2n\log\log n}$ occurs only finitely often almost surely, whereas $S_n-n\mu>(1-\delta)\sigma\sqrt{2n\log\log n}$ is satisfied infinitely often, almost surely.
Theorem 6.3.1 (Central Limit Theorem) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with finite $\mu=E[X]$ and $\sigma^2=\mathrm{Var}[X]$. Then, for every $x$,
$$\lim_{n\to\infty}\Pr\left[\frac{S_n-n\mu}{\sigma\sqrt n}\le x\right]=\Phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{t^2}{2}}\,dt$$
Proof: Without loss of generality, we may confine ourselves to normalized random variables, replacing $X_k$ by $\frac{X_k-\mu}{\sigma}$, such that $\mu=0$ and $\sigma=1$. Consider the scaled random variable $U_n=\alpha_nS_n$, where $\alpha_n$ is a real number depending on $n$ and to be determined later. Similarly as in the proof of the Weak Law of Large Numbers, we find that $\varphi_{U_n}(z)=\left(\varphi_X(\alpha_nz)\right)^n$. Due to the existence of the variance, the Taylor expansion (2.40) of $\varphi_X$ around $z=0$ is known with higher precision as $\varphi_X(z)=1+\frac{z^2}{2}+o(z^2)$. For sufficiently small $z$, the logarithm
$$\log\varphi_{U_n}(z)=n\log\left(1+\frac{\alpha_n^2z^2}{2}+o\left(\alpha_n^2z^2\right)\right)=\frac{n\alpha_n^2z^2}{2}+o\left(n\alpha_n^2z^2\right)$$
only converges to a finite (non-zero) number if $\alpha_n=O\left(\frac{1}{\sqrt n}\right)$. Choosing the simplest function that satisfies this condition, $\alpha_n=\frac{1}{\sqrt n}$, leads to $\lim_{n\to\infty}\log\varphi_{U_n}(z)=\frac{z^2}{2}$ or, since the logarithm is a continuous, increasing function, $\lim_{n\to\infty}\varphi_{U_n}(z)=\exp\left(\frac{z^2}{2}\right)$. The transform (3.22) shows that the corresponding limit random variable is a Gaussian $N(0,1)$. The theorem then follows by virtue of the Continuity Theorem 6.1.3.
An alternative formulation of the Central Limit Theorem is that the $k$-fold convolution of any probability density function converges to a Gaussian probability distribution,
$$f_X^{(k*)}(x)\to\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
with $\mu=kE[X]$ and $\sigma^2=k\,\mathrm{Var}[X]$. Both the Law of Large Numbers and the Central Limit Theorem can be shown to be valid for a surprisingly large class of sequences $\{X_k\}$ where each random variable may have a different distribution. The conditions for the extension of the Central Limit Theorem are summarized in the Lindeberg conditions (Feller, 1971, p. 263). An example where the sum of independent random variables tends to a different limit distribution than the Gaussian appears in Section 16.5.1.

If higher moments are known, the convergence to the Gaussian distribution can be bounded. Feller (1971, Chapter XVI) devotes a chapter to expansions related to the Central Limit Theorem, culminating in the Berry–Esseen Theorem.

Theorem 6.3.2 (Berry–Esseen Theorem) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with finite $\mu=E[X]$, $\sigma^2=\mathrm{Var}[X]$ and $\rho=E\left[|X-\mu|^3\right]$. Then, with $C=3$,
$$\sup_x\left|\Pr\left[\frac{S_n-n\mu}{\sigma\sqrt n}\le x\right]-\Phi(x)\right|\le\frac{C\rho}{\sigma^3\sqrt n}\qquad(6.5)$$
Proof: See e.g. Feller (1971, Section XVI.5). The constant $C$ can be slightly improved to $C\approx2.05$.

As an example of the rate of convergence towards the Gaussian distribution, the $k$-fold convolutions of the uniform density given by (3.30) are plotted in Fig. 6.1 together with the Gaussian approximation (3.19).
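For uniform summands the bound (6.5) can be evaluated explicitly, since the exact distribution of a sum of uniforms (the Irwin–Hall distribution) is known in closed form. A sketch (my own check; $n=8$ and the evaluation grid are arbitrary choices; $E|X-\mu|^3=\frac1{32}$ for the uniform on $[0,1]$):

```python
import math

def irwin_hall_cdf(x, n):
    # Distribution of the sum of n i.i.d. uniforms on [0,1].
    s = 0.0
    for k in range(int(math.floor(x)) + 1):
        s += (-1) ** k * math.comb(n, k) * (x - k) ** n
    return s / math.factorial(n)

def gauss_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 8
mu, sig = 0.5, math.sqrt(1 / 12)
rho3 = 1 / 32                         # E|X - mu|^3 for the uniform on [0,1]
sup_err = max(
    abs(irwin_hall_cdf(n * mu + sig * math.sqrt(n) * t, n) - gauss_cdf(t))
    for t in [i / 100 for i in range(-400, 401)]
)
bound = 3 * rho3 / (sig ** 3 * math.sqrt(n))   # right-hand side of (6.5)
print(sup_err, bound)  # the actual error is far below the universal bound
```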
[Fig. 6.1. Both the exact $k$-fold convolution $f_U^{(k*)}(x)$ of the uniform density $f_U(x)=1_{0\le x\le1}$ and the Gaussian approximation, for $k=2,3,4,8$ and $16$.]
For $m$ i.i.d. random variables $X_k$ with common distribution function $F$,
$$\Pr\left[\max_{1\le k\le m}X_k\le x\right]=F^m(x)\qquad\qquad\Pr\left[\min_{1\le k\le m}X_k>x\right]=\left(1-F(x)\right)^m$$
Consider a sequence $\{x_m\}$ and
$$\log\Pr\left[\max_{1\le k\le m}X_k\le x_m\right]=m\log F(x_m)$$
Since $0\le F(x_m)\le1$ and since the logarithm has a Taylor expansion $\log(1-x)=-\sum_{k=1}^{\infty}\frac{x^k}{k}$ around $x=0$, convergent for $|x|<1$, we rewrite the right-hand side as $m\log F(x_m)=m\log\left[1-\left(1-F(x_m)\right)\right]$ and, after expansion,
$$\log\Pr\left[\max_{1\le k\le m}X_k\le x_m\right]=-m\left(1-F(x_m)\right)+o\left[m\left(1-F(x_m)\right)\right]$$
Theorem 6.4.1 If the sequence $\{x_m\}$ is chosen such that $\lim_{m\to\infty}m\left(1-F(x_m)\right)=\xi$, then
$$\lim_{m\to\infty}\Pr\left[\max_{1\le k\le m}X_k\le x_m\right]=e^{-\xi}\qquad(6.6)$$
and, similarly, if $\lim_{m\to\infty}mF(x_m)=\xi$, then
$$\lim_{m\to\infty}\Pr\left[\min_{1\le k\le m}X_k>x_m\right]=e^{-\xi}\qquad(6.7)$$

[Fig. 6.2. The probability density functions of the three types of extremal distributions (Weibull, Fréchet and Gumbel), each for $\alpha=0.5$, $1$ and $2$.]
1. For i.i.d. exponential random variables with $F(x)=1-e^{-\alpha x}$, the maximum requires $m\left(1-F(x_m)\right)=me^{-\alpha x_m}\to\xi$, which is satisfied for $x_m=\frac1\alpha(\log m+x)$ with $\xi=e^{-x}$. Hence,
$$\lim_{m\to\infty}\Pr\left[\max_{1\le k\le m}X_k\le\frac1\alpha\left(\log m+x\right)\right]=e^{-e^{-x}}$$
For the minimum, $mF(x_m)=m\left(1-e^{-\alpha x_m}\right)\to\xi$ requires $x_m\simeq\frac{\xi}{\alpha m}$, such that, with $\xi=x$,
$$\lim_{m\to\infty}\Pr\left[\min_{1\le k\le m}X_k>\frac{x}{\alpha m}\right]=e^{-x}$$
For both the maximum and the minimum of exponential random variables, a scaling law exists that leads to the Gumbel distribution $F_{\mathrm{Gumbel}}(x)=e^{-e^{-x}}$: for large $m$, the random variable $M=\alpha\max_{1\le k\le m}X_k-\log m$ has a Gumbel limit distribution, and so does $N=-\log\left(\alpha m\min_{1\le k\le m}X_k\right)$, since $\Pr[N\le x]=\Pr\left[\min_{1\le k\le m}X_k\ge\frac{e^{-x}}{\alpha m}\right]\to e^{-e^{-x}}$.

2. Another example is the maximum of a set of i.i.d. uniform random variables $\{U_k\}$ on $[0,1]$ with $F(x)=x$ for $0\le x\le1$. Since $m(1-x_m)\to\xi$ with $0\le x_m\le1$, we have $x_m\simeq1-\frac{\xi}{m}$ and, after putting $\xi=x$ with $x\ge0$,
$$\lim_{m\to\infty}\Pr\left[\max_{1\le k\le m}U_k\le1-\frac xm\right]=e^{-x}$$
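The Gumbel limit for the scaled maximum of exponentials is easy to confirm by simulation; a minimal sketch (my own illustration; $m$, the number of repetitions and the rate $\alpha$ are arbitrary choices):

```python
import random, math

rng = random.Random(9)
m, trials = 2000, 2000
alpha = 2.0
# M = alpha * max(X_k) - log m for exponentials with rate alpha -> Gumbel.
samples = [alpha * max(rng.expovariate(alpha) for _ in range(m)) - math.log(m)
           for _ in range(trials)]

def emp_cdf(x):
    # Empirical distribution of the scaled maximum M.
    return sum(s <= x for s in samples) / trials

# Compare with F_Gumbel(x) = exp(-exp(-x)) at a few points.
errs = [abs(emp_cdf(x) - math.exp(-math.exp(-x))) for x in (-1.0, 0.0, 1.0, 2.0)]
print(max(errs))  # small
```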
3. The minimum weight of a shortest path. Consider $m$ different paths, each consisting of $h$ hops (links), where each link has an independent exponential weight with mean 1. The weight $W_{h,k}$ of path $k$ is the sum of $h$ exponential random variables, such that, for small $y$, $\Pr[W_{h,k}\le y]\simeq\frac{y^h}{h!}$. If the path weights were independent, the weight $W$ of the shortest $h$-hop path would obey
$$\Pr[W\le y]\simeq1-\exp\left(-m\,\frac{y^h}{h!}\right)$$
and, with $m\,\frac{x_m^h}{h!}\to x$, Theorem 6.4.1 gives the Weibull-type limit
$$\lim_{m\to\infty}\Pr\left[\min_{1\le k\le m}W_{h,k}>\left(\frac{h!\,x}{m}\right)^{1/h}\right]=e^{-x}$$
The mean minimum weight is
$$E[W]=\int_0^{\infty}\left(1-F_W(x)\right)dx=\int_0^{\infty}\exp\left(-m\,\frac{x^h}{h!}\right)dx=\Gamma\left(1+\frac1h\right)\left(\frac{h!}{m}\right)^{1/h}$$
Any path in a rectangular lattice can be represented by a sequence of r(ight), l(eft), u(p) and d(own), which is called an encoded path word. The encoded path word of the shortest-hop path between diagonal corner points consists of $z_1$ r's (or l's) and $z_2$ d's (or u's). The total number of these paths equals $\binom{z_1+z_2}{z_1}$. Two paths coincide in a same lattice point at $j\le h$ hops from the source node if their encoded path words have the same sum of r's and d's in the first $j$ letters. The number of overlapping links between two paths equals the number of the same consecutive letters (r or d) in a block after a same sum of r's and d's in the encoded path words. Checking for overlap between $h$-hop paths requires a comparison of $\binom{z_1+z_2}{z_1}$ permutations of the encoded path words.

Hence, with $h=z_1+z_2$ and $m=\binom{z_1+z_2}{z_1}$, the mean becomes $E[W]=\Gamma\left(1+\frac1h\right)\left(z_1!\,z_2!\right)^{1/h}$. For a square lattice where $z_1=z_2=\frac h2$, we have
$$E[W]=\Gamma\left(1+\frac1h\right)\left[\left(\frac h2\right)!\right]^{2/h}$$
Using Stirling's formula (Abramowitz and Stegun, 1968, Section 6.1.38) for the factorial, $h!=\sqrt{2\pi}\,h^{h+\frac12}e^{-h+\frac{\theta}{12h}}$ with $0<\theta<1$, the mean $E[W]$ increases, for large $h$, about linearly in the number of hops $h$,
$$E[W]\simeq\frac{h}{2e}\left(\pi h\right)^{1/h}\simeq\frac{h}{2e}$$
The average weight of a link (or 1 hop) of the shortest $h$-hop path is roughly $\frac{1}{2e}\simeq0.184$.

In spite of the fact that path dependence (overlap) has been ignored in the computation of the minimum weight, $E[W]=O(h)=O\left(\sqrt N\right)$ is correct. However, the approximate analysis gives neither the correct prefactor in $E[W]$ nor the correct limit pdf, which turns out to be Gaussian. Hence, if random variables are not independent, Theorem 6.4.1 does not apply. Finally, a shortest $h$-hop path is not necessarily the overall shortest path, because it is possible, though with small probability, that the overall shortest path has $h+2j$ hops with $j>0$.
4. The probability density function of the longest shortest path. The most commonly used process that informs each node about changes in a network topology (e.g. an autonomous domain) is called flooding: every router forwards the packet on all interfaces except for the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overhead in routers is ignored). Therefore, the interesting problem lies in the determination of the flooding time T_N, which is the minimum time needed to inform the last node in a network with N nodes. Only after T_N are all topology databases at each router in the network again synchronized, i.e. all routers possess the same topology information. Rather than investigating the flooding time T_N (for which we refer to Section 16.5), the largest number of traversed routers (hops), or the longest shortest path from the emitting node to the furthermost node in its shortest path tree, is computed.
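In hop terms, the longest shortest path from a source is exactly the depth of its breadth-first-search tree, i.e. the number of forwarding rounds a flooded packet needs to reach the last node. A minimal sketch (the toy graph and function name are mine, not from the text):

```python
from collections import deque

def longest_shortest_path(adj: dict, source) -> int:
    """Hop count of the longest shortest path from `source`: the depth of
    its BFS (shortest-path) tree, which is the number of rounds flooding
    needs before the furthermost node is informed."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit = shortest hop distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# Toy topology: a path 0-1-2-3 plus a shortcut 0-2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(longest_shortest_path(adj, 0))  # node 3 is 2 hops away
```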
The number of hops, in short the hopcount h_N, along the shortest path between two arbitrary nodes in a network containing N nodes is modeled subject to the following assumptions: (a) the hopcount h_N is a Poisson random variable with mean E[h_N] = λ = α log N with α > 0, which is motivated in Section 16.3.1; (b) the number of nodes N is very large; (c) all shortest paths from the emitting node towards any other node in the network are independent. The problem reduces to computing the pdf of the random variable max_{1≤k≤N−1} h_k. The distribution function follows from (3.9) with

F_{h_N}(x) = Σ_{k=0}^{x} (λ^k e^{−λ})/k! = 1 − Σ_{k=x+1}^{∞} (λ^k e^{−λ})/k!
Limit laws

as

Pr[max_{1≤k≤N−1} h_k ≤ x_N] = (F_{h_N}(x_N))^{N−1} → exp(−lim_{N→∞} N Σ_{k=x_N+1}^{∞} (λ^k e^{−λ})/k!)
from which we must choose the appropriate x_N as a function of N. Observe that the maximum term in the series has index k = [λ], where the latter denotes the largest integer smaller than or equal to λ. For, the ratio between two consecutive (positive) terms in the k-sum equals a_k/a_{k−1} = λ/k, such that, if λ > k, then a_k > a_{k−1}, implying that the terms increase, while, if λ < k, the terms a_k < a_{k−1} form a decreasing sequence. The series is rewritten as
"
[
n={Q +1
"
{Q +1 [ ({Q + 1)!n
n
=
n!
({Q + 1)! n=0 ({Q + 1 + n)!
{Q +1
2
=
1+
+
+
({Q + 1)!
{Q + 2
({Q + 2)({Q + 3)
We choose x_N = [λ] + [ελ] ≈ λ(1+ε) for large N and, thus, large λ, where ε must be related to N. The series then consists of decreasing terms. Moreover, for large λ,
"
[
n
(1+)+1
1
1
=
1+
+
+
n!
((1 + ) + 1)!
(1 + ) + 2@
((1 + ) + 2@)((1 + ) + 3@)
n=[]+[]+1
1+
(1+)+1
((1 + ) + 1)!
1
1
+
+
(1 + )
(1 + )2
and thus,

N Σ_{k=[λ]+[ελ]+1}^{∞} (λ^k e^{−λ})/k! = N e^{−λ} (λ^{λ(1+ε)+1}/((λ(1+ε)+1)!)) (1 + 1/ε)(1 + O(1/λ))

Using Stirling's formula x! ≈ √(2π) x^{x+1/2} e^{−x}, for large λ,

λ^{λ(1+ε)+1}/((λ(1+ε)+1)!) ≈ λ^{λ(1+ε)+1}/(√(2π) (λ(1+ε)+1)^{λ(1+ε)+3/2} e^{−λ(1+ε)−1}) ≈ e^{λ(1+ε)[1−log(1+ε)]}/√(2πλ(1+ε))

so that, with λ = α log N,

N Σ_{k=x_N+1}^{∞} (λ^k e^{−λ})/k! ≈ (1 + 1/ε) (N^{α(1+ε)[1−log(1+ε)]+1−α}/√(2πα(1+ε) log N)) (1 + O(1/log N))
Requiring that this quantity tends to a finite, non-zero constant ξ as N → ∞ gives, after taking logarithms,

log ξ + (α − 1) log N + ½ log log N + O(1/log N) = α(1+ε)[1 − log(1+ε)] log N − ½ log(2πα(1+ε)) − log(ε/(1+ε))      (6.8)
At this point, we will assume that ε < 1, which justifies the expansion log(1+ε) = ε + O(ε²). This assumption will be checked later. Thus,

(1+ε)[1 − log(1+ε)] ≈ (1+ε)[1 − ε] ≈ 1 − ε²
so that, neglecting the lower order term log(ε/(1+ε)), equation (6.8) reduces to α(1−ε²) log N ≈ R with

R = log(ξ√(2πα)) + (α − 1) log N + ½ log log N

The Newton-Raphson iteration can be applied with starting value ε₀ to find the solution of the equation up to the leading order in log N, i.e. R ≈ α(1−ε²) log N. Hence,

ε₀ = √(1 − R/(α log N)) ≈ 1 − R/(2α log N) ≈ 1 − (α−1)/(2α) − (log(ξ√(2πα)) + ½ log log N)/(2α log N)
which demonstrates that, for α ≥ 1, the assumption ε < 1 is correct for large N. The case α < 1 requires the application of Newton-Raphson's method on (6.8), which we omit here. The second iteration in Newton-Raphson's method leads to

ε₁ = ε₀ − (R + log ε₀ − α(1−ε₀²) log N)/(2αε₀ log N + 1/ε₀)

and shows that the k-th iteration improves the previous one with a quantity of order O(log^{−k} N). Since (6.8) is only accurate up to O(log^{−1} N), a second iteration is superfluous and we obtain
the choice x_N = λ(1 + ε₀), or

x_N = ((3α + 1)/2) log N − ½ log(ξ√(2πα)) − ¼ log log N
Hence, writing ξ = e^{−y},

Pr[max_{1≤k≤N−1} h_k ≤ ((3α + 1)/2) log N − ¼ log log N − ¼ log(2πα) + y/2] → e^{−e^{−y}}

from which the pdf of the hopcount of the longest shortest path (lsp) follows as

f_lsp(x) = 2 e^{−e^{−2(x−c)}} e^{−2(x−c)}      (6.9)
with

c = ((3α + 1)/2) log N − ¼ log log N − ¼ log(2πα)
  = (3/2 + 1/(2α)) E[h_N] − ¼ log E[h_N] − ¼ log(2π)

and, with γ ≈ 0.5772 denoting Euler's constant,

E[lsp] = c + γ/2 = (3/2 + 1/(2α)) E[h_N] − ¼ log E[h_N] − 0.170

var[lsp] = π²/24 ≈ 0.4112

Observe that the average longest shortest path is about twice the average hopcount if α = 1, while the variance is small, constant and independent of the scaling parameter c or α. Figure 6.3 compares the above approximate analysis with simulations.
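The Gumbel-type pdf (6.9) can be verified numerically: for any c it must integrate to 1 with mean c + γ/2 ≈ c + 0.2886 and variance π²/24 ≈ 0.4112. A crude trapezoidal check (the value of c below is arbitrary):

```python
import math

def f_lsp(x: float, c: float) -> float:
    """Pdf (6.9): 2*exp(-exp(-2(x-c)))*exp(-2(x-c))."""
    return 2.0 * math.exp(-math.exp(-2.0 * (x - c))) * math.exp(-2.0 * (x - c))

def moments(c: float, lo=-20.0, hi=20.0, steps=200000):
    """Trapezoidal zeroth/first/second moments of f_lsp around c."""
    h = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps + 1):
        x = c + lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        p = w * f_lsp(x, c)
        m0 += p; m1 += p * x; m2 += p * x * x
    m0 *= h; m1 *= h; m2 *= h
    return m0, m1 / m0, m2 / m0 - (m1 / m0) ** 2

total, mean, var = moments(c=9.5)
print(total, mean - 9.5, var)  # ≈ 1, ≈ 0.2886 (= gamma/2), ≈ 0.4112 (= pi^2/24)
```

This confirms that the spread of the longest shortest path is a small constant, independent of c and α.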
Fig. 6.3. The pdf Pr[lsp = k] of the hopcount of the longest shortest path for N = 4000. Both simulations, based on an Internet-like topology generator (with unit link weights), and the theory f_lsp(k) with α = 0.4786 are shown.
Notes
(i) The classical theory of extremes, extremal properties of dependent sequences and extreme values in continuous time are treated in detail in the book by Leadbetter et al. (1983).
(ii) A more recent book by Embrechts et al. (2001a) applies the theory of extremal events to problems in insurance and finance.
Part II
Stochastic processes

7
The Poisson process

The word stochastic is derived from στοχάζεσθαι in Greek, which means "to aim at, try to hit".
Fig. 7.1. Two different sample paths of the experiment: roll a die and record the outcome. The total number of different sample paths is 6^T, where T is the number of times an outcome is recorded. The state space only contains the 6 possible outcomes {1, 2, 3, 4, 5, 6}.
Stochastic processes are distinguished by (a) their state space, (b) the index set T and (c) the dependence relations between the random variables X(t). For example, a standard Brownian motion (or Wiener process)² is defined as a stochastic process X(t) having continuous sample paths and stationary independent increments, where X(t) has a normal distribution N(0, t). A Poisson process, defined in more detail in Section 7.2, is a stochastic process X(t) having discontinuous sample paths and stationary independent increments, where X(t) has a Poisson distribution. A generalization of the Poisson process is a counting process. A counting process is defined as a stochastic process N(t) ≥ 0 with discontinuous sample paths and stationary independent increments, but with an arbitrary distribution. A counting process N(t) represents the total number of events that have occurred in the time interval [0, t]. Examples of a counting process are the number of telephone calls at a local exchange during an interval, the number of failures in a telecommunication network, the number of corrupted bits after transmission due to channel errors, etc.
² Harrison (1990) shows that the converse is also true: if Y is a continuous process with stationary independent increments, then Y is a Brownian motion.
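A counting process with i.i.d. interarrival times is straightforward to simulate; the sketch below (names and parameters are mine) counts events in [0, t] for exponential interarrivals, which anticipates the Poisson process of Section 7.2:

```python
import random

def count_events(t: float, rate: float, rng: random.Random) -> int:
    """Number of events in [0, t] when interarrival times are exponential
    with mean 1/rate (one sample of a Poisson counting process)."""
    n, clock = 0, rng.expovariate(rate)
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)
    return n

rng = random.Random(42)
samples = [count_events(10.0, 0.5, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # ≈ rate * t = 5
```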
Fig. 7.2. The raw data of the end-to-end delay of IP test packets along a same path of 13 hops in the Internet, measured during 3.5 hours (5:00 a.m. to 8:30 a.m.). Annotations in the plot: interdeparture time = 12 s; hopcount of the IP path = 13; number of measurement points = 1006; E[D] = 35.03 ms; V[D] = 1.36 ms; min[D] = 34.18 ms; max[D] = 53.95 ms.
The end-to-end delay along a fixed path between source and destination measured during some interval is an example of a continuous-time stochastic process. We have received data of the delay measured at RIPE-NCC as illustrated in Fig. 7.2. Figure 7.2 shows a sample path of this continuous stochastic process. The precise details of the measurement configuration are not relevant for the present purpose. It suffices to add that Figure 7.2 shows the time difference between the departure of an IP test packet of 100 bytes at the sending box and its arrival at the destination box, accurate to within 10 μs. The average sending rate of IP test packets is 1/12 packets per second. Each IP test packet is assumed to follow the same path from sending to receiving box. The steadiness of the path is checked by trace-route measurements every 6 minutes.
Usually, in the next step, the histogram of the raw data is made. A histogram counts the number of data points that lie in an interval of Δ ms, where Δ is often called the bin size. Most graphical packages allow one to choose the bin size. Figure 7.3 shows two different histograms with bin sizes Δ = 0.5 ms and Δ = 0.1 ms. In general, there is no universal rule to choose the bin size Δ. Clearly, the bin size is bounded below by the measurement accuracy, in our case Δ > 10 μs. A finer bin size provides more detail, but the resulting histogram also exhibits more stochastic variations, because there are fewer data points in a small bin and adjacent bins may possess a significantly different number of data points. Hence, compared to one larger bin that covers the same interval, less averaging or smoothing occurs in a set of smaller bins. The normalized histogram, obtained by dividing the counts per bin by the total number of data points, provides a first approximation to the probability density function of the delay D. However, it is still discrete and approximates Pr[kΔ < D ≤ kΔ + Δ]. A more precise description of constructing a histogram is given in Section C.1.
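The normalized-histogram construction just described takes only a few lines; the sketch below uses synthetic delays (not the RIPE-NCC data) and bins of my own choosing:

```python
import random

def normalized_histogram(data, bin_size):
    """Count samples per bin of width `bin_size` and divide by the total,
    giving a discrete approximation of Pr[k*bin <= D < (k+1)*bin]."""
    counts = {}
    for d in data:
        k = int(d // bin_size)
        counts[k] = counts.get(k, 0) + 1
    n = len(data)
    return {k: c / n for k, c in sorted(counts.items())}

rng = random.Random(7)
delays_ms = [34.0 + rng.expovariate(1.0) for _ in range(1006)]  # synthetic delays
hist = normalized_histogram(delays_ms, bin_size=0.5)
print(abs(sum(hist.values()) - 1.0) < 1e-9)  # bin probabilities sum to 1
```

Halving `bin_size` roughly halves the count per bin, which is exactly the detail-versus-noise trade-off discussed above.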
The histogram is generally better suited to decide whether outliers in the data points may be due to measurement errors or not. Figure 7.3 suggests to either neglect the data points with D > 40 ms or to measure at a higher sending rate of IP test packets in order to have more detail in the intervals exceeding 38 or 40 ms. If there existed a good stochastic model for the end-to-end delay along fixed Internet paths, a normal procedure in engineering and physics would be to fit the histogram with that stochastic model to obtain the parameters of that stochastic model. The accuracy of the fit can
Fig. 7.3. The histogram of the end-to-end delay D (in ms) with a bin size of 0.1 ms (the insert has a bin size of 0.5 ms).
of the process for all t₀ < ⋯ < t_{k−1} < t_k < ⋯ < t_n are assumed to be independent or weakly dependent. The study of Markov processes (Chapters 9-11) basically tries to compute and analyze the process in steady state. Figure 7.2 is measured over a relatively long period of time and indicates that after 8:00 a.m. the background traffic increases. The background traffic interferes with the IP test packets and causes them to queue longer in routers, such that larger variations are observed. However, it is in general difficult to ascertain that (a part of) the measurement is performed while the system operates in a certain stable regime (or steady state).
We have touched upon some aspects of the art of modeling to motivate the importance of studying stochastic processes. In the sequel of this chapter, one of the most basic and simplest stochastic processes is investigated.
Pr[X(t) = k] = ((λt)^k e^{−λt})/k!      (7.1)

E[X(t)] = λt      (7.2)

Relation (7.2) explains why λ is called the rate of the Poisson process, namely, the derivative over time t or the number of events per time unit.
(b) The probability that exactly one event occurs in an arbitrarily small time interval of length h follows from condition (iii) as

Pr[X(h+s) − X(s) = 1] = λh e^{−λh} = λh + o(h)

while the probability that no event occurs in an arbitrarily small time interval of length h is

Pr[X(h+s) − X(s) = 0] = e^{−λh} = 1 − λh + o(h)

Similarly, the probability that more than one event occurs in an arbitrarily small time interval of length h is

Pr[X(h+s) − X(s) > 1] = o(h)
Example 1 A conversation in a wireless ad-hoc network is severely disturbed by interference signals arriving according to a Poisson process of rate λ = 0.1 per minute. (a) What is the probability that no interference signals occur within the first two minutes of the conversation? (b) Given that the first two minutes are free of disturbing effects, what is the probability that in the next minute precisely 1 interfering signal disturbs the conversation?
(a) Let X(t) denote the Poisson interference process; then Pr[X(2) = 0] needs to be computed. Since X(0) = 0 and with (7.1), we can write Pr[X(2) = 0] = Pr[X(2) − X(0) = 0] = e^{−2λ}, which equals Pr[X(2) = 0] = e^{−0.2} = 0.8187.
(b) The events during two non-overlapping intervals of a Poisson process are independent. Thus the event {X(2) − X(0) = 0} is independent of the event {X(3) − X(2) = 1}, which means that the asked conditional probability Pr[X(3) − X(2) = 1 | X(2) − X(0) = 0] = Pr[X(3) − X(2) = 1]. From (7.1), we obtain Pr[X(3) − X(2) = 1] = 0.1 e^{−0.1} = 0.0905.
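Both answers of Example 1 are one-line applications of the Poisson pmf (7.1); a quick check (function name mine):

```python
import math

def poisson_pmf(k: int, lam: float, t: float) -> float:
    """Pr[X(t) = k] = (lam*t)^k * exp(-lam*t) / k!  -- relation (7.1)."""
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

lam = 0.1  # interference rate per minute
print(round(poisson_pmf(0, lam, 2.0), 4))  # (a) Pr[X(2)=0] = e^{-0.2} = 0.8187
print(round(poisson_pmf(1, lam, 1.0), 4))  # (b) one event in a 1-minute interval: 0.0905
```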
Example 2 During a certain time interval [t₁, t₁ + 10 s], the number of IP packets that arrive at a router is on average 40/s. A service provider asks us to compute the probability that 20 packets arrive in the period [t₁, t₁ + 1 s] and 30 IP packets in [t₁, t₁ + 3 s]. We may regard the arrival process as a Poisson process.
We are asked to compute Pr[X(1) = 20, X(3) = 30] knowing that λ = 40 s⁻¹. Using the independence of increments and (7.1), we rewrite

Pr[X(1) = 20, X(3) = 30] = Pr[X(1) − X(0) = 20, X(3) − X(1) = 10]
  = Pr[X(1) − X(0) = 20] Pr[X(3) − X(1) = 10]
  = ((40)^{20} e^{−40}/20!) ((80)^{10} e^{−80}/10!)
which means that the request of the service provider does not occur in
practice.
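Evaluating the product in log-space shows just how small this probability is (function name mine):

```python
import math

def poisson_prob(k: int, mean: float) -> float:
    """Pr[Poisson(mean) = k], computed in log-space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

# Pr[X(1)=20, X(3)=30] = Pr[Po(40)=20] * Pr[Po(80)=10]
p = poisson_prob(20, 40.0) * poisson_prob(10, 80.0)
print(p)  # on the order of 1e-26
```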
By independence (ii),

Pr[N(t+h) − N(t) = j | N(t) = n−j] = Pr[N(t+h) − N(t) = j]

and, by definition Pr[N(t) = n−j] = P_{n−j}(t), we have

P_n(t+h) = Σ_{j=0}^{n} Pr[N(t+h) − N(t) = j] P_{n−j}(t)
  = (1 − λh + o(h)) P_n(t) + (λh + o(h)) P_{n−1}(t) + Σ_{j=2}^{n} o(h) P_{n−j}(t)
d/dt (e^{λt} P_n(t)) = λ e^{λt} P_{n−1}(t)      (7.3)

In case n = 1, the differential equation (7.3) reduces with P₀(t) = e^{−λt} to d/dt (e^{λt} P₁(t)) = λ. The general solution is e^{λt} P₁(t) = λt + C and, from the initial condition P₁(0) = 0, we have C = 0 and P₁(t) = λt e^{−λt}. The general solution to (7.3) is proved by induction. Assume that P_n(t) = ((λt)^n e^{−λt})/n! holds for n; then the case n+1 follows from (7.3) as

d/dt (e^{λt} P_{n+1}(t)) = λ (λt)^n/n!

and integrating from 0 to t using P_{n+1}(0) = 0 yields P_{n+1}(t) = ((λt)^{n+1} e^{−λt})/(n+1)!, which establishes the induction and finalizes the proof of the theorem.
The second theorem has very important applications, since it relates the number of events in non-overlapping intervals to the interarrival times between these events.

Theorem 7.3.2 Let {X(t); t ≥ 0} be a Poisson process with rate λ > 0 and denote by t₀ = 0 < t₁ < t₂ < ⋯ the successive occurrence times of events. Then the interarrival times τ_n = t_n − t_{n−1} are independent identically distributed exponential random variables with mean 1/λ.

Proof: For any s ≥ 0 and any n ≥ 1, the event {τ_n > s} is equivalent to the event {X(t_{n−1}+s) − X(t_{n−1}) = 0}. Indeed, the n-th interarrival time τ_n is longer than s time units if and only if the n-th event has not yet occurred s time units after the occurrence of the (n−1)-th event at t_{n−1}. Since the Poisson process has independent increments (condition (ii) in the definition of the Poisson process), changes in the value of the process in non-overlapping time intervals are independent. By the equivalence in events, this implies that the set of interarrival times τ_n are independent random variables. Further, by the stationarity of the Poisson process (deduced from condition (iii) in the definition of the Poisson process),

Pr[τ_n > s] = Pr[X(t_{n−1}+s) − X(t_{n−1}) = 0] = e^{−λs}

which implies that any interarrival time has an identical, exponential distribution,

F_τ(x) = Pr[τ ≤ x] = 1 − e^{−λx}
The converse of Theorem 7.3.2 also holds: if the interarrival times {τ_n} of a counting process {N(t), t ≥ 0} are i.i.d. exponential random variables with mean 1/λ, then {N(t), t ≥ 0} is a Poisson process with rate λ.
An association to the exponential distribution is the memoryless property,

Pr[τ_n > s + t | τ_n > s] = Pr[τ_n > t]

By the equivalence of the events, for any t, s ≥ 0,

Pr[τ_n > s + t | τ_n > s] = Pr[X(t_{n−1}+s+t) − X(t_{n−1}) = 0 | X(t_{n−1}+s) − X(t_{n−1}) = 0]
  = Pr[X(t_{n−1}+s+t) − X(t_{n−1}+s) = 0 | X(t_{n−1}+s) − X(t_{n−1}) = 0]
  = Pr[X(t_{n−1}+s+t) − X(t_{n−1}+s) = 0] = e^{−λt} = Pr[τ_n > t]

Since the equivalence {W_n ≤ t} ⟺ {X(t) ≥ n} holds for the waiting time W_n = t_n of the n-th event, the distribution of W_n follows as

Pr[W_n ≤ t] = Σ_{k=n}^{∞} ((λt)^k e^{−λt})/k!
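The last formula can be cross-checked numerically: the Poisson tail sum for Pr[W_n ≤ t] must equal the Erlang-n distribution obtained by integrating the density of a sum of n exponential interarrival times (sketch, names mine):

```python
import math

def waiting_time_cdf(n: int, lam: float, t: float) -> float:
    """Pr[W_n <= t] = sum_{k>=n} (lam t)^k e^{-lam t}/k!, via its complement."""
    return 1.0 - sum((lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
                     for k in range(n))

def erlang_cdf(n: int, lam: float, t: float, steps: int = 100000) -> float:
    """Trapezoidal integral of the Erlang-n density lam^n u^{n-1} e^{-lam u}/(n-1)!,
    the distribution of a sum of n exponential interarrival times."""
    h = t / steps
    dens = lambda u: lam ** n * u ** (n - 1) * math.exp(-lam * u) / math.factorial(n - 1)
    total = 0.5 * (dens(0.0) + dens(t)) + sum(dens(i * h) for i in range(1, steps))
    return total * h

print(waiting_time_cdf(3, 2.0, 1.5), erlang_cdf(3, 2.0, 1.5))  # both ≈ 0.5768
```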
Conditioned on {X(t) = n}, the joint density of the occurrence times 0 < t₁ < t₂ < ⋯ < t_n ≤ t follows, with s₀ = 0, from the i.i.d. exponential interarrival times as

f(s₁, …, s_n; X(t) = n) = (∏_{j=1}^{n} λ e^{−λ(s_j − s_{j−1})}) e^{−λ(t−s_n)} = λ^n ∏_{j=1}^{n} e^{−λ(s_j − s_{j−1})} e^{−λ(t−s_n)} = λ^n e^{−λt}
Thus,

Pr[t₁ ≤ s₁, t₂ ≤ s₂, …, t_n ≤ s_n | X(t) = n] = (λ^n e^{−λt} ∏_{j=1}^{n} (s_j − s_{j−1})) / (((λt)^n e^{−λt})/n!) = (n!/t^n) ∏_{j=1}^{n} (s_j − s_{j−1})

and the conditional joint density

f_{t_j}(s₁, s₂, …, s_n | X(t) = n) = ∂^n Pr[t₁ ≤ s₁, …, t_n ≤ s_n | X(t) = n] / (∂s₁ ⋯ ∂s_n)

follows as n!/t^n, which is independent of the rate λ. If 0 < t₁ < t₂ < ⋯ < t_n < t are the successive occurrence times of n Poisson events in the interval [0, t], then the random variables t₁, t₂, …, t_n are distributed as a set of order statistics, defined in Section 3.4.2, of n uniform random variables on [0, t]. In other words, if n i.i.d. uniform random variables on [0, t] are sorted in increasing order, they may represent n successive occurrence times of a Poisson process. The average spacing between these n ordered i.i.d. uniform random variables is t/(n+1), as computed in Problem (ii) of Section 3.7.
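The order-statistics property is easy to observe in simulation: conditional on X(t) = n, the first occurrence time should average t/(n+1), like the minimum of n uniforms on [0, t]. A sketch with parameters of my own choosing:

```python
import random

rng = random.Random(1)
t, lam, trials = 10.0, 0.5, 4000
first_times = []
for _ in range(trials):
    # One Poisson sample path on [0, t] built from exponential interarrivals.
    times, clock = [], rng.expovariate(lam)
    while clock <= t:
        times.append(clock)
        clock += rng.expovariate(lam)
    if len(times) == 5:                 # condition on X(t) = n = 5 events
        first_times.append(times[0])

# Given X(t) = 5, the occurrence times behave as 5 sorted uniforms on [0, t],
# so the first one has mean t/(n+1) = 10/6 ≈ 1.67.
m = sum(first_times) / len(first_times)
print(m)
```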
A related example is the conditional probability, where 0 < s < t and 0 ≤ k ≤ n,

Pr[X(s) = k | X(t) = n] = (n choose k) (s/t)^k (1 − s/t)^{n−k}

Given that a total number of n Poisson events have occurred in the time interval [0, t], the chance that precisely k events have taken place in the sub-interval [0, s] is binomially distributed with parameters n and p = s/t. Observe that also this conditional probability is independent of the rate λ. In addition, since lim_{t→∞} X(t) = ∞ such that n → ∞ (with n/t → λ), applying the law of rare events results in

lim_{t→∞} Pr[X(s) = k | X(t) = n] = ((λs)^k e^{−λs})/k!
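The binomial form of the conditional probability is an exact algebraic identity, as a direct computation from the Poisson pmf confirms (the parameter values below are mine):

```python
import math

def pois(k: int, mean: float) -> float:
    return mean ** k * math.exp(-mean) / math.factorial(k)

lam, s, t, n, k = 3.7, 2.0, 5.0, 8, 3
# Pr[X(s)=k | X(t)=n] = Pr[X(s)=k] * Pr[X(t)-X(s)=n-k] / Pr[X(t)=n]
cond = pois(k, lam * s) * pois(n - k, lam * (t - s)) / pois(n, lam * t)
binom = math.comb(n, k) * (s / t) ** k * (1 - s / t) ** (n - k)
print(cond, binom)  # identical, and independent of lam
```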
is immediate. Rewritten as d/dt log P₀(t) = −λ(t), after integration over (s, t], we find log P₀(t) = −(Λ(t) − Λ(s)), since P₀(s) = Pr[N(s) − N(s) = 0] = 1. Thus, for the case n = 0, we find P₀(t) = exp[−(Λ(t) − Λ(s))], which proves the theorem for n = 0.
The remainder of the proof (n > 0) uses the same ingredients as the proof of Theorem 7.3.1 and is omitted.
A nonhomogeneous Poisson process X(t) with rate λ(t) can be transformed into a homogeneous Poisson process Y(u) with rate 1 by the time transform u = Λ(t). For, Y(u) = Y(Λ(t)) = X(t), and Y(u + Δu) = Y(Λ(t) + ΔΛ(t)) = X(t + Δt) because ΔΛ(t) = λ(t)Δt for small Δt, such that

Pr[Y(u+Δu) − Y(u) = 1] = Pr[X(t+Δt) − X(t) = 1] = λ(t)Δt + o(Δt) = Δu + o(Δu)

because Δu = λ(t)Δt + o(Δt). Hence, all problems concerning nonhomogeneous Poisson processes can be reduced to the homogeneous case treated above.
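The time transform also gives a simulation recipe: generate a unit-rate Poisson process on [0, Λ(T)] and map each event time back through Λ^{-1}. The rate function below, λ(t) = 2t with Λ(t) = t², is my own example, not from the text:

```python
import math
import random

def nonhomogeneous_times(T: float, rng: random.Random):
    """Event times of a nonhomogeneous Poisson process with rate lam(t) = 2t,
    i.e. Lambda(t) = t^2, via the time transform t = sqrt(u)."""
    times, u = [], rng.expovariate(1.0)
    while u <= T * T:                 # unit-rate events on [0, Lambda(T)]
        times.append(math.sqrt(u))    # mapped back through Lambda^{-1}(u)
        u += rng.expovariate(1.0)
    return times

rng = random.Random(3)
counts = [len(nonhomogeneous_times(2.0, rng)) for _ in range(5000)]
print(sum(counts) / len(counts))  # ≈ Lambda(2) = 4
```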
Consider Pr[t ≤ X ≤ t + Δt | X > t], the probability that an object still functioning at time t fails in the next interval of length Δt. In medical sciences, X can represent in general the time for a certain event to occur: for example, the time it takes for an organism to die, the time to recover from illness, the time for a patient to respond to a therapy, and so on. Recall the discussion in Section 2.3.
Since Pr[t < X ≤ t + Δt] ≈ f_X(t)Δt,

Pr[t ≤ X ≤ t + Δt | X > t] ≈ (f_X(t)/(1 − F_X(t))) Δt

The quantity

f_X(t)/(1 − F_X(t))      (7.4)
can be interpreted as the intensity or rate at which a t-year-old object will fail. It is called the failure rate r(t), and

R(t) = 1 − F_X(t) = Pr[X > t]      (7.5)

is called the reliability function. From (7.4),

r(t) = −(dR(t)/dt)/R(t) = −d ln R(t)/dt      (7.6)

and, after integration,

R(t) = exp(−∫₀^t r(u) du)      (7.7)

The expressions (7.6) and (7.7) are inverse relations that specify r(t) as a function of R(t) and vice versa. The reliability function R(t) is non-increasing with maximum at t = 0, since it is the tail 1 − F_X(t) of a probability distribution function. On the other hand, the failure rate r(t), being a conditional probability density function, can take any positive real value. From (7.4) we obtain the density function of the lifetime X in terms of the failure rate r(t) as

f_X(t) = r(t)R(t) = r(t) exp(−∫₀^t r(u) du)

with f_X(0) = r(0). Using the tail relation (2.35) for the expectation of the lifetime X immediately gives the mean time to failure,

E[X] = ∫₀^∞ R(t) dt      (7.8)

In biology, medical sciences and physics, R(t) is called the survival function and r(t) is the corresponding mortality rate or hazard rate.
t = T. In practice, the failure rate r(t) is relatively high for small t due to initial imperfections that cause a number of objects to fail early, and r(t) is increasing towards the maximum lifetime T due to aging or wear and tear. This shape of r(t), as illustrated in Fig. 7.4, is called a "bath-tub curve", which is convex.

Fig. 7.4. [Sketch of the convex, bath-tub-shaped failure rate r(t), starting at r(0) = f_X(0) and rising as t approaches the maximum lifetime T.]
An often used model for the failure rate is r(t) = aλt^{a−1}, with corresponding reliability function R(t) = exp(−λt^a), where the lifetime X has a Weibull distribution function F_X(t) = 1 − R(t) as in (3.40). In case a = 1, the failure rate r(t) = λ is constant over time, while a > 1 (a < 1) reflects an increasing (decreasing) failure rate over time. Hence, a bath-tub-shaped (realistic) failure function as in Fig. 7.4 can be modeled by a Weibull model for r(t) with a < 1 in the beginning, a = 1 in the middle and a > 1 at the end of the lifetime.
For an exponential lifetime, where f_X(t) = λe^{−λt}, the failure rate (7.4) equals r(t) = λ and is independent of time. This means that the failure rate for a t-year-old object is the same as for a new object, which is a manifestation of the memoryless property of the exponential distribution. It also explains why λ, in both the exponential distribution and the Poisson process, is often called a rate.
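The inverse relations (7.6)-(7.7) can be checked for the Weibull model: numerically integrating the hazard r(t) = aλt^{a−1} must reproduce R(t) = exp(−λt^a). A sketch with parameters of my own choosing:

```python
import math

a, lam = 2.0, 0.5  # a > 1: increasing failure rate (wear-out phase)

def r(t: float) -> float:
    """Weibull hazard r(t) = a*lam*t^(a-1)."""
    return a * lam * t ** (a - 1)

def R_exact(t: float) -> float:
    """Weibull reliability R(t) = exp(-lam*t^a)."""
    return math.exp(-lam * t ** a)

def R_from_hazard(t: float, steps: int = 100000) -> float:
    """R(t) = exp(-integral_0^t r(u) du), integral done by trapezoids (7.7)."""
    h = t / steps
    integral = (0.5 * (r(0.0) + r(t)) + sum(r(i * h) for i in range(1, steps))) * h
    return math.exp(-integral)

print(R_exact(3.0), R_from_hazard(3.0))  # both ≈ exp(-4.5) ≈ 0.0111
```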
7.6 Problems
(i) A series of test strings, each with a variable number N of bits all equal to 1, is transmitted over a channel. Due to transmission errors, each 1-bit can be affected independently from the others and only arrives non-corrupted with probability p. The length N of the test strings (words) is a Poisson random variable with mean α bits. In this test, the sum Y of the bits in the arriving words is investigated to determine the channel quality via p. Compute the pdf of Y.
(ii) At a router, four QoS classes are supported and for each class packets arrive according to a Poisson process with rate λ_j for j = 1, 2, 3, 4. Suppose that the router had a failure at time t₁ that lasted T time units. What is the probability density function of the total number of packets of the four classes that has arrived during that period?
(iii) Let N(t) = N₁(t) + N₂(t) be the sum of two independent Poisson processes with rates λ₁ and λ₂. Given that the process N(t) had an arrival, what is the probability that that arrival came from the process N₁(t)?
(iv) Peter has been monitoring the highway for nearly his entire life and found that the cars pass his house according to a Poisson process. Moreover, he discovered that the Poisson process in one lane is independent from that in the other lanes. The rate of these independent processes differs per lane and is denoted by λ₁, λ₂, λ₃, where λ_j is expressed in the number of cars on lane j per hour.
(a) Given that one car passed Peter, what is the probability that it passed in lane 1?
(b) What is the probability that n cars pass Peter in 1 hour?
(c) What is the probability that in 1 hour n cars have passed and that they all have used lane 1?
(v) In a game, audio signals arrive in the interval (0, T) according to a Poisson process with rate λ, where T > 1/λ. The player wins only if at least one audio signal arrives in that interval, and if he or she pushes a button (only one push allowed) upon the last of the signals. The player uses the following strategy: he or she pushes the button upon the arrival of the first signal (if any) after a fixed time s ≤ T.
(a) What is the probability that the player wins?
(b) Which value of s maximizes the probability of winning, and what is the probability in that case?
(vi) The arrival of voice over IP (VoIP) packets at a router is close to a Poisson process with rate λ = 0.1 packets per minute. Due to an upgrade to install weighted fair queueing as priority scheduling rule, the router is switched off for 10 minutes.
(a) What is the probability of receiving no VoIP packets when switched off?
(b) What is the probability that more than ten VoIP packets will arrive during this upgrade?
(c) If there was one VoIP packet in the meantime, what is the most probable minute of the arrival?
(vii) A link of a packet network carries on average ten packets per second. The packets arrive according to a Poisson process. A packet has a probability of 30% to be an acknowledgment (ACK) packet, independent of the others. The link is monitored during an interval of 1 second.
(a) What is the probability that at least one ACK packet has been observed?
(b) What is the expected number of all packets given that five ACK packets have been spotted on the link?
(c) Given that eight packets have been observed in total, what is the probability that two of them are ACK packets?
(viii) An ADSL helpdesk treats exclusively customer requests of one of three types: (i) login problems, (ii) ADSL hardware problems and (iii) ADSL software problems. The opening hours of the helpdesk are from 8:00 until 16:00. All requests arrive at the helpdesk according to a Poisson process, with different rates: λ₁ = 8 requests with login problems/hour, λ₂ = 6 requests with hardware problems/hour, and λ₃ = 6 requests with software problems/hour. The Poisson arrival processes for different types of requests are independent.
(a) What is the expected number of requests in one day?
(b) What is the probability that in 20 minutes exactly three requests arrive, and that all of them have hardware problems?
(c) What is the probability that no requests will arrive in the last 15 minutes of the opening hours?
(d) What is the probability that one request arrives between 10:00 and 10:12 and two requests arrive between 10:06 and 10:30?
(e) If at the moment t + s there are k + m requests, what is the probability that there were k requests at the moment t?
(ix) The arrival of virus attacks at a PC can be modeled by a Poisson process with rate λ = 6 attacks per hour.
(a) What is the probability that exactly one attack will arrive between 1 p.m. and 2 p.m.?
(b) Suppose that at the moment the PC is turned on there were no attacks on the PC, but at the shut-down time precisely 60 attacks have been observed. What is the expected amount of time that the PC has been on?
(c) Given that six attacks arrive between 1 p.m. and 2 p.m., what is the probability that the fifth attack will arrive between 1:30 p.m. and 2 p.m.?
(d) What is the expected arrival time of that fifth attack?
(x) Consider a system S consisting of n subsystems in series as shown in Fig. 7.5. The system S operates correctly only if all subsystems operate correctly. Assume that the probability that a failure in a subsystem S_i occurs is independent of that in subsystem S_j. Given the reliability functions R_j(t) of each subsystem S_j, compute the reliability function R(t) of the system S.

Fig. 7.5. A system consisting of n subsystems S₁, S₂, …, S_n in series.

Fig. 7.6. A system consisting of n subsystems in parallel.
8
Renewal theory

Fig. 8.1. The relation between the renewal counting process N(t), the interarrival times τ_n and the waiting times W_n.
As illustrated in Fig. 8.1, the waiting time W_n = Σ_{k=1}^{n} τ_k (for n ≥ 1, with W₀ = 0 by convention) is related to the counting process {N(t), t ≥ 0} by the equivalence {N(t) ≥ n} ⟺ {W_n ≤ t}: the number of events (renewals) up to time t is at least n if and only if the n-th renewal occurred on or before time t. Alternatively, the number of events by time t equals the largest value of n for which the n-th event occurs before or at time t, N(t) = max[n : W_n ≤ t]. The convention that W₀ = 0 implies that N(0) = 0: the counting process starts counting from zero at time 0. The main objective of renewal theory is to deduce properties of the process {N(t), t ≥ 0} as a function of the interarrival distribution F_τ(t) = Pr[τ ≤ t].
8.1.1 The distribution of the waiting time W_n
If we assume that the interarrival times are i.i.d. having a Laplace transform

φ_τ(z) = ∫₀^∞ e^{−zt} dF_τ(t) = ∫₀^∞ e^{−zt} f_τ(t) dt

then, since W_n is the sum of n i.i.d. interarrival times, φ_{W_n}(z) = φ_τ^n(z). By partial integration, we find the Laplace transform of the distribution F_{W_n}(t) = Pr[W_n ≤ t] = ∫₀^t f_{W_n}(u) du,

∫₀^∞ e^{−zt} F_{W_n}(t) dt = φ_{W_n}(z)/z = φ_τ^n(z)/z      (8.2)

The inverse Laplace transform follows¹ with (2.38) as

Pr[W_n ≤ t] = (1/(2πi)) ∫_{c−i∞}^{c+i∞} (φ_τ^n(z)/z) e^{zt} dz      (8.3)
¹ Applying (2.38) directly gives

∫₀^t f_X(u) du = (1/(2πi)) ∫_{c−i∞}^{c+i∞} φ_X(z) ((e^{zt} − 1)/z) dz

whose form seems different from (8.3). However, (1/(2πi)) ∫_{c−i∞}^{c+i∞} (φ_X(z)/z) dz = 0, because the contour can be closed over the positive Re(z) > c plane where φ_X(z) is analytic, and because lim_{R→∞} φ_X(Re^{iθ}) = 0 for −π/2 < θ < π/2, which follows from the existence of the Laplace integral ∫₀^∞ e^{−zt} f_X(t) dt.
Integrated,

Pr[W_n ≤ t] = ∫_{−∞}^{t} du ∫_{−∞}^{∞} f_{W_{n−1}}(u − y) f_τ(y) dy = ∫_{−∞}^{∞} F_{W_{n−1}}(t − y) f_τ(y) dy = ∫₀^{t} Pr[W_{n−1} ≤ t − y] f_τ(y) dy

since the interarrival times are non-negative. By denoting Pr[W_n ≤ t] = F^{(n*)}(t), we have

F^{(n*)}(t) = ∫₀^{t} F^{((n−1)*)}(t − y) dF(y)

These equations also show that we can define F^{(0*)}(t) = 1. Let us define U_n(t) = Σ_{k=1}^{n} F^{(k*)}(t). By summing both sides in the last equation, we obtain

U_n(t) = ∫₀^{t} Σ_{k=1}^{n} F^{((k−1)*)}(t − y) f(y) dy = ∫₀^{t} Σ_{k=0}^{n−1} F^{(k*)}(t − y) f(y) dy

and, with F^{(0*)}(t) = 1, we arrive at

U_n(t) = ∫₀^{t} U_{n−1}(t − y) dF(y) + F(t)      (8.4)
Since F^{((n−1)*)}(t − y) ≤ F^{((n−1)*)}(t) for y ≥ 0, which follows from the monotone increasing nature of any distribution function, iteration on n starting from F^{(0*)}(t) = 1 shows that

F^{(n*)}(t) ≤ (F(t))^n      (8.5)

In other words,

Pr[Σ_{k=1}^{n} τ_k ≤ x] ≤ Pr[max_{1≤k≤n} τ_k ≤ x] = (F(x))^n

which is rather obvious because Σ_{k=1}^{n} τ_k ≥ max_{1≤k≤n} τ_k. The equality sign is only possible if n − 1 of the τ_k are zero.
8.1.2 The renewal function m(t) = E[N(t)]
From the equivalence {N(t) ≥ n} ⟺ {W_n ≤ t}, we directly have

Pr[N(t) ≥ n] = Pr[W_n ≤ t] = F^{(n*)}(t)      (8.6)

so that the renewal function becomes

m(t) = E[N(t)] = Σ_{n=1}^{∞} Pr[N(t) ≥ n] = Σ_{k=1}^{∞} F^{(k*)}(t)      (8.7)

The bound (8.5) shows that

m(t) ≤ Σ_{k=1}^{∞} (F(t))^k = 1/(1 − F(t)) − 1

Hence, for finite t where F(t) < 1, the renewal function m(t) converges at least as fast as a geometric series and is bounded. In the limit t → ∞, where lim_{t→∞} F(t) = 1, we see that m(t) is not bounded anymore. Intuitively, the number of repeated events (renewals) in an infinite time interval is clearly infinite.
The renewal function m(t) completely characterizes the renewal process. Indeed, if φ_m(z) is the Laplace transform of m(t), then, after taking the Laplace transform of both sides in (8.7) and using the definition Pr[W_n ≤ t] = F^{(n*)}(t) together with (8.2), we obtain

φ_m(z) = (1/z) Σ_{k=1}^{∞} φ_τ^k(z) = (1/z) (φ_τ(z)/(1 − φ_τ(z)))      (8.8)

provided |φ_τ(z)| < 1. From this expression, the interarrival time distribution can be found from

φ_τ(z) = zφ_m(z)/(1 + zφ_m(z))

after inverse Laplace transform. By taking the inverse Laplace transform (2.38), m(t) is written as a complex integral

m(t) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} (φ_τ(z)/(1 − φ_τ(z))) (e^{zt}/z) dz
8.1.3 The renewal equation
After taking the inverse Laplace transform of φ_m(z) = φ_m(z)φ_τ(z) + φ_τ(z)/z, which is deduced from (8.8), a third relation for m(t) that often occurs is

m(t) = ∫₀^{t} m(t − u) dF(u) + F(t) = ∫₀^{t} F(t − u) dm(u) + F(t)      (8.9)

and is called the renewal equation. Taking the limit n → ∞ in (8.4) also leads to the renewal equation. Since m(0) = 0, the renewal equation implies that F(0) = Pr[τ ≤ 0] = 0, or that processes where a zero interarrival time is possible (e.g. in simultaneous events) are ruled out. For a Poisson process, Theorem 7.3.1 states that the probability of simultaneous events (h → 0) is zero. The requirement m(0) = 0 generalizes the exclusion of simultaneous events to any renewal process.
The probabilistic argument that leads to the renewal equation is as follows. By conditioning on the first renewal, for k > 0,

Pr[N(t) = k | W₁ = s] = 0,   t < s
Pr[N(t) = k | W₁ = s] = Pr[N(t − s) = k − 1],   t ≥ s

where in the last case, for t ≥ s, the event {N(t) = k} is only possible if k − 1 renewals occur in the time interval (s, t], which is, due to the stationarity of the renewal process, equal in probability to k − 1 renewals in an interval of length t − s.
Multiplying both sides by k and summing over all k ≥ 1 gives the average at the left-hand side,

E[N(t) | W₁ = s] = Σ_{k=1}^{∞} k Pr[N(t) = k | W₁ = s]

while, for t ≥ s,

Σ_{k=1}^{∞} k Pr[N(t − s) = k − 1] = Σ_{k=0}^{∞} (k + 1) Pr[N(t − s) = k] = E[N(t − s)] + 1

Combining both sides and averaging over the first renewal time W₁ yields

E[N(t)] = F(t) + ∫₀^{t} E[N(t − s)] dF(s)

which is the renewal equation (8.9) with m(t) = E[N(t)]. More generally, consider the general renewal equation

Y(t) = h(t) + ∫₀^{t} Y(t − u) dF(u)      (8.11)

in the unknown function Y(t), where h(t) is a known function and F(t) is a distribution function. This equation can be written using the convolution notation as

Y(t) = h(t) + Y ⋆ F(t)

By conditioning on the first renewal as shown above, many renewal problems can be recast into the form of the general renewal equation (8.11). An example is the derivation of the residual life or waiting time given in Section 8.3. Therefore, it is convenient to present the solution to the general renewal equation (8.11).
Lemma 8.1.1 If h(t) is bounded for all t, then the unique solution of the general renewal equation (8.11) is

Y(t) = h(t) + ∫₀^{t} h(t − u) dm(u)      (8.12)

where m(t) = Σ_{k=1}^{∞} F^{(k*)}(t).
Proof: Let us first concentrate on the formal solution. In general, convolutions are best treated in the transformed domain. After taking the Laplace transform of the general renewal equation (8.11), we obtain

φ_Y(z) = φ_h(z) + φ_Y(z) φ_F(z)

such that

φ_Y(z) = φ_h(z)/(1 − φ_F(z))

There always exists a region in the z-domain where |φ_F(z)| < 1, such that the geometric series applies,

φ_Y(z) = φ_h(z) Σ_{k=0}^{∞} (φ_F(z))^k = φ_h(z) (1 + Σ_{k=1}^{∞} (φ_F(z))^k)

Back-transforming and taking into account that (φ_F(z))^k is the transform of a k-fold convolution yields

Y(t) = h(t) + h ⋆ Σ_{k=1}^{∞} F^{(k*)}(t) = h(t) + h ⋆ m(t)
To show uniqueness, let V(t) denote the difference of two bounded solutions of (8.11), which satisfies V(t) = V ⋆ F(t). By convolving both sides with F and using the original equation, we deduce that V(t) = V ⋆ F ⋆ F(t). Continuing this process, for each k, we have that V(t) = V ⋆ F^{(k*)}(t). Since F^{(k*)}(t) → 0 for all finite t and k → ∞ (because m(t) exists for all finite t), and since V(t) is bounded, this implies that V(t) = 0 for all finite t. This demonstrates the uniqueness and motivates the requirement that h(t) should be bounded.
gh}w
= w. This result, of course, follows directly from the
p(w) = g}
}=0
denition of the Poisson process given in (7.2). We see that the renewal
function p(w) for the Poisson process is linear for all w. Moreover, the Poisson
process is the only continuous time renewal process with a linear renewal
function p(w). Indeed, if3 p (w) = w, the renewal equation is
Z w
Z w
w =
((w x)) gI (x) + I (w) =
I (x)gx wI (0) + I (w)
0
gI (w)
gw
whose solution is I (w) = 1 h3w . By Theorem 7.3.2, exponential interarrival times characterize a Poisson process with rate = .
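The linearity of the Poisson renewal function is easy to check numerically. The sketch below (the rate, horizon and number of runs are illustrative assumptions, not values from the text) estimates m(T) = E[N(T)] by simulation and compares it with \lambda T.

```python
import random

random.seed(7)

LAM = 2.0    # rate of the Poisson process (assumed for the illustration)
T = 10.0     # observation horizon
RUNS = 20000

def count_renewals(t, rate):
    """Count renewals in (0, t] for exponential interarrival times."""
    total, n = 0.0, 0
    while True:
        total += random.expovariate(rate)
        if total > t:
            return n
        n += 1

m_t = sum(count_renewals(T, LAM) for _ in range(RUNS)) / RUNS
print(f"estimated m(T) = {m_t:.2f}, lambda*T = {LAM * T:.2f}")
```

The estimate settles close to \lambda T = 20, consistent with m(t) = \lambda t.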
8.2 Limit theorems

In the limit t \to \infty, the equivalence relation (8.6) indicates that, for any fixed value of n, \Pr[N(t) \ge n] \to 1, which means that the number of events N(t) \to \infty as t \to \infty. Let us consider W_{N(t)}/N(t), which is the sample mean of the first N(t) interarrival times in the interval (0, t]. The Strong Law of Large Numbers (6.3) indicates that

    \Pr\Big[ \lim_{n\to\infty} \frac{W_n}{n} = \mu \Big] = 1

and, because N(t) \to \infty as t \to \infty, we have that W_{N(t)}/N(t) \to \mu = E[\tau] as t \to \infty. Since W_{N(t)} \le t < W_{N(t)+1}, we obtain the inequality

    \frac{W_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{W_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)}

Since both lower and upper bound tend to \mu, we arrive at the important result that \lim_{t\to\infty} N(t)/t = 1/\mu. The random variable counting the number of events in (0, t] per interval length t converges to 1/\mu, the reciprocal of the average interarrival time \mu = E[\tau]. Unfortunately(4), we cannot simply deduce the intuitive result that also the expectation E[N(t)/t] tends to 1/\mu. On the other hand, the expectation of W_{N(t)+1} is obtained from Wald's identity (2.69), which leads to the Elementary Renewal Theorem,

    \lim_{t\to\infty} \frac{E[N(t)]}{t} = \frac{1}{\mu}    (8.13)

The left-hand side in (8.13) describes the long-run average number of events (renewals) per unit time. The right-hand side is the reciprocal of the average interarrival time (or life time). For example, in the light bulb replacement process, a bulb lasts on average \mu time units; then, in the long run or steady state, the light bulbs must be replaced at rate 1/\mu per time unit.

(4) As remarked by Ross (1996, p. 108), if U is uniformly distributed on (0, 1), consider the random variables Y_n defined as Y_n = n \cdot 1_{U \le 1/n}. For large n, U > 0 with probability 1, whence Y_n \to 0 if n \to \infty. However, E[Y_n] = n E[1_{U \le 1/n}] = n \cdot (1/n) = 1, for all n. The sequence of random variables Y_n converges to 0, although the expected values of Y_n are all precisely 1. The Elementary Renewal Theorem can be proved by resorting to complex function theory and using Laplace-Stieltjes transforms (Cohen, 1969, p. 100); the limit argument provided by the Strong Law of Large Numbers then follows from a Tauberian theorem.
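The almost-sure limit N(t)/t \to 1/\mu holds for any interarrival distribution with finite mean. A small sketch (the uniform interarrival law and the horizon are assumptions for illustration; its mean is \mu = 1):

```python
import random

random.seed(42)

T = 50_000.0   # large horizon so N(T)/T is close to its limit
# uniform(0.5, 1.5) interarrival times: mu = E[tau] = 1.0 (illustrative choice)
total, n = 0.0, 0
while True:
    total += random.uniform(0.5, 1.5)
    if total > T:
        break
    n += 1

rate = n / T
print(f"N(T)/T = {rate:.4f}, 1/mu = 1.0")
```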
The function Y(t) = m(t) - m(t-T) obeys the general renewal equation (8.11) with h(t) = F(t) - F(t-T), while the Key Renewal Theorem states that \lim_{t\to\infty} Y(t) = \frac{1}{\mu}\int_0^{\infty} h(u)\, du = \frac{T}{\mu}. Hence, we arrive at Blackwell's Theorem: for any fixed T > 0,

    \lim_{t\to\infty} \frac{m(t) - m(t-T)}{T} = \frac{1}{\mu}

A sufficient condition for direct Riemann integrability is (a) g(t) \ge 0 for all t \ge 0, (b) g(t) is non-increasing and (c) \int_0^{\infty} g(u)\, du < \infty.
Based on the relatively new probabilistic concept of coupling, alternative proofs of the Key Renewal Theorem exist (see e.g. Grimmett and Stirzaker (2001, pp. 429-430)).
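Blackwell's Theorem can likewise be checked by simulation: the expected number of renewals in a late window (t - T, t] approaches T/\mu. The interarrival law and all numbers below are illustrative assumptions.

```python
import random

random.seed(1)

T_WINDOW = 2.0   # Blackwell window T (assumed)
t = 200.0        # "large" time
RUNS = 5000
MU = 1.0         # mean of uniform(0.5, 1.5) interarrival times

def renewals_in_window(t, window):
    """Return the number of renewal epochs falling in (t - window, t]."""
    s, count = 0.0, 0
    while s <= t:
        s += random.uniform(0.5, 1.5)
        if t - window < s <= t:
            count += 1
    return count

avg = sum(renewals_in_window(t, T_WINDOW) for _ in range(RUNS)) / RUNS
print(f"m(t) - m(t-T) ~ {avg:.3f}, T/mu = {T_WINDOW / MU:.3f}")
```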
Conversely, assuming only that \lim_{t\to\infty} [m(t) - m(t-T)] = a(T) exists, the Elementary Renewal Theorem suffices to prove that the limit has value T/\mu. Following the argument of Ross (1996, p. 110), we can write, for finite x and y,

    a(x+y) = \lim_{t\to\infty} [m(t) - m(t-x-y)] = \lim_{t\to\infty} [m(t) - m(t-x)] + \lim_{t\to\infty} [m(t-x) - m(t-x-y)] = a(x) + a(y)

Apart from the trivial solution a(x) = 0, the only other(9) solution of a(x+y) = a(x) + a(y) is a(x) = cx, where c is a constant. Hence, given that \lim_{t\to\infty} [m(t) - m(t-T)] = a(T) exists, this is equivalent to the fact that the sequence \{b_n\}_{n\ge 0}, where b_n = \frac{m(t_n) - m(t_n - T)}{T} and t_n > t_{n-1}, converges to a constant c. The simplest sequence with this property is \{b^*_n\}_{n\ge 0}, where b^*_n = m(n) - m(n-1) and T = 1. Lemma 6.1.1 states that

    c = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} b^*_k = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} [m(k) - m(k-1)] = \lim_{n\to\infty} \frac{m(n)}{n} = \frac{1}{\mu}

where the last equality follows from the Elementary Renewal Theorem (8.13).
Theorem 8.2.3 (Asymptotic Renewal Distribution) If the average \mu = E[\tau] and variance \sigma^2 = Var[\tau] of the interarrival time of the events in a renewal process exist, then

    \lim_{t\to\infty} \Pr\Big[ \frac{N(t) - t/\mu}{\sigma \sqrt{t/\mu^3}} < x \Big] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du    (8.15)

Proof: The Elementary Renewal Theorem states that N(t) \approx t/\mu for large t, which suggests to consider the random variable X(t) = N(t) - t/\mu with E[X(t)] \to 0. From the equivalence \{N(t) < n\} \Longleftrightarrow \{W_n > t\}, we have \{X(t) < x_t\} \Longleftrightarrow \{W_{x_t + t/\mu} > t\}, where x_t is such that x_t + t/\mu is a positive integer.

(9) The proof is as follows: (i) if y = 0, we see that a(x+0) = a(x) + a(0), or a(0) = 0. (ii) a(nx) = n a(x) for integer n. (iii) Using (ii), we have that a(nx + my) = n a(x) + m a(y). By choosing nx + my = 0, it follows from (i) that a(-\frac{m}{n} y) = -\frac{m}{n} a(y), such that (ii) holds for rational numbers. Thus, a(q_1 x + q_2 y) = q_1 a(x) + q_2 a(y) for rational numbers q_1 and q_2. (iv) Recalling the definition f(\frac{x+y}{2}) \le \frac{f(x)+f(y)}{2} of a convex function in Section 5.2 and the fact that a function that is both concave and convex is a linear function, it follows that a(x) is linear and, with (i), that a(x) = cx.
Then,

    \Pr[X(t) < x_t] = \Pr[W_{x_t + t/\mu} > t] = \Pr\Big[ \frac{W_{x_t + t/\mu} - \mu(x_t + t/\mu)}{\sigma \sqrt{x_t + t/\mu}} > \frac{t - \mu(x_t + t/\mu)}{\sigma \sqrt{x_t + t/\mu}} \Big]

The waiting time W_n consists of a sum of i.i.d. random variables with mean \mu and variance \sigma^2. By the Central Limit Theorem 6.3.1, there holds that

    \lim_{n\to\infty} \Pr\Big[ \frac{W_n - n\mu}{\sigma \sqrt{n}} > x \Big] = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-u^2/2}\, du

which implies that

    \lim_{t\to\infty} \Pr\Big[ \frac{W_{x_t + t/\mu} - \mu(x_t + t/\mu)}{\sigma \sqrt{x_t + t/\mu}} > \frac{t - \mu(x_t + t/\mu)}{\sigma \sqrt{x_t + t/\mu}} \Big] = \frac{1}{\sqrt{2\pi}} \int_{y}^{\infty} e^{-u^2/2}\, du

provided

    \lim_{t\to\infty} \frac{t - \mu(x_t + t/\mu)}{\sigma \sqrt{x_t + t/\mu}} = \lim_{t\to\infty} \frac{-\mu x_t}{\sigma \sqrt{x_t + t/\mu}} = y

which is satisfied, for large t, if x_t = -y\,\sigma\sqrt{t/\mu^3}. Setting x = -y, this is equivalent to

    \lim_{t\to\infty} \Pr\Big[ \frac{N(t) - t/\mu}{\sigma \sqrt{t/\mu^3}} < x \Big] = \frac{1}{\sqrt{2\pi}} \int_{-x}^{\infty} e^{-u^2/2}\, du

Noting that \int_{-x}^{\infty} e^{-u^2/2}\, du = \int_{-\infty}^{x} e^{-u^2/2}\, du by symmetry proves (8.15).

Comparing Theorem 8.2.3 to the Central Limit Theorem 6.3.1 shows that the asymptotic variance of N(t) behaves as

    \lim_{t\to\infty} \frac{Var[N(t)]}{t} = \frac{\sigma^2}{\mu^3}    (8.16)

Moreover, Theorem 8.2.3 is a central limit theorem for the dependent random variables N(W_n), where the dependence is obvious from N(W_n) = N(W_{n-1}) + 1.
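Relation (8.16) can be verified numerically. For uniform(0.5, 1.5) interarrival times (an illustrative assumption), \mu = 1 and \sigma^2 = 1/12, so Var[N(t)]/t should approach 1/12 \approx 0.083:

```python
import random
import statistics

random.seed(3)

T = 400.0
RUNS = 4000

def n_of_t(t):
    """One sample of N(t) for uniform(0.5, 1.5) interarrival times."""
    s, n = 0.0, 0
    while True:
        s += random.uniform(0.5, 1.5)   # mu = 1, sigma^2 = 1/12
        if s > t:
            return n
        n += 1

samples = [n_of_t(T) for _ in range(RUNS)]
var_rate = statistics.variance(samples) / T
print(f"Var[N(T)]/T = {var_rate:.4f}, sigma^2/mu^3 = {1/12:.4f}")
```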
8.3 The residual waiting time

Suppose we inspect a renewal process at time t and ask the question: "How long do we have to wait on average to see the next renewal?" This question frequently arises in renewal problems. For instance, the arrival of taxis at a station is a renewal process and, often, we are interested to know how long we have to wait until the next taxi. Also, packets arriving at a router may find an earlier packet that is partially served. In order to compute the total time spent in the system, it is desirable to know the residual service time of that packet. In addition, this problem belongs to one of the classical examples that demonstrate how misleading intuition in probability problems can be. There are two different arguments to answer the question above, leading to two different answers:

(i) since my inspection of the process does not alter or influence the process, the distribution of my waiting time should not depend on the time t; hence, my average waiting time equals the average interarrival time of the renewal process.

(ii) the time t of the inspection is chosen at random in (i.e. uniformly distributed over) the interval between two consecutive renewals; hence, my expected waiting time should be half of the average interarrival time.

Both arguments seem reasonable, although it is plain that one of them must be wrong. Let us try to sort out the correct answer to this apparent paradox, which, according to Feller (1971, pp. 12-13), has puzzled many before its solution was properly understood.

Fig. 8.2. Definition of the random variables: the age A(t), the lifetime L(t) and the residual life (or waiting time) R(t). The last renewal before t occurs at W_{N(t)} and the next renewal at W_{N(t)+1}.
Figure 8.2 defines the setting of the renewal problem and the quantities of interest: A(t) is the age at time t, which is the total time elapsed since the last renewal before t at time W_{N(t)}; the residual waiting time (or residual life, or excess life) R(t) is the remaining time at t until the next renewal at time W_{N(t)+1}; and L(t) is the total waiting time (or life time). From Fig. 8.2, we verify that

    A(t) = t - W_{N(t)}
    R(t) = W_{N(t)+1} - t
    L(t) = W_{N(t)+1} - W_{N(t)} = A(t) + R(t)

The distribution of the residual waiting time, F_{R(t)}(x) = \Pr[R(t) \le x], will be derived. Similar to the probabilistic argument before, we condition on the first renewal. If W_1 = s \le t, then the first renewal occurs before time t and the event \{R(t) > x \mid W_1 = s\} has the same probability as the event \{R(t-s) > x\}, because the renewal process restarts from scratch at time s. If s > t, the residual waiting time R(t) lies in the first renewal interval [0, s]. In this case, the residual waiting time R(t) is certainly shorter than x if s is contained in the interval [t, t+x]; otherwise, the residual waiting time R(t) is surely larger than x. In summary,

    \Pr[R(t) > x \mid W_1 = s] = \begin{cases} \Pr[R(t-s) > x] & \text{if } 0 \le s \le t \\ 0 & \text{if } t < s \le t + x \\ 1 & \text{if } s > x + t \end{cases}

Using the law of total probability (2.46),

    \Pr[R(t) > x] = \int_0^{\infty} \Pr[R(t) > x \mid W_1 = s]\, \frac{d\Pr[W_1 \le s]}{ds}\, ds
                  = \int_0^t \Pr[R(t-s) > x]\, f(s)\, ds + \int_{x+t}^{\infty} \frac{d\Pr[\tau \le s]}{ds}\, ds
                  = \int_0^t \Pr[R(t-s) > x]\, dF(s) + 1 - F(x+t)

which also implies that \lim_{t\to\infty} 1 - F(x+t) = 0. Hence, h(t) = 1 - F(x+t) is bounded for all t \ge 0 and Lemma 8.1.1 is applicable, yielding

    \Pr[R(t) > x] = 1 - F(x+t) + \int_0^t [1 - F(x+t-s)]\, dm(s)
Also, the conditions for direct Riemann integrability in the Key Renewal Theorem 8.2.2 are satisfied for g(t) = 1 - F(x+t), such that

    \lim_{t\to\infty} \Pr[R(t) > x] = \lim_{t\to\infty} \int_0^t [1 - F(x+t-s)]\, dm(s)
                                    = \frac{1}{E[\tau]} \int_0^{\infty} (1 - F(x+t))\, dt    \text{with (8.14)}
                                    = \frac{1}{E[\tau]} \int_x^{\infty} (1 - F(t))\, dt

In other words, the steady-state or equilibrium distribution function for the residual waiting time equals

    \lim_{t\to\infty} \Pr[R(t) \le x] = \Pr[R \le x] = F_R(x) = \frac{1}{E[\tau]} \int_0^{x} (1 - F(t))\, dt    (8.17)

Similarly, for t > y, the event \{A(t) > y\} is equivalent to the event \{no renewals in [t-y, t]\}, which is equivalent to \{R(t-y) > y\}. Hence,

    \lim_{t\to\infty} \Pr[A(t) > y] = \lim_{t\to\infty} \Pr[R(t-y) > y] = \lim_{t\to\infty} \Pr[R(t) > y] = \frac{1}{E[\tau]} \int_y^{\infty} (1 - F(t))\, dt

or: both the residual waiting time R and the age A have the same distribution in steady state (t \to \infty). Intuitively, when reversing the time axis in steady state, or looking backward in time, an identically distributed renewal process is observed in which the roles of the age A and the residual life R are interchanged. Thus, by a time-symmetry argument, both distributions must be the same in steady state.
It is instructive to compute the average residual waiting time E[R] = E[A] in steady state. Using the expression of the average in terms of tail probabilities (2.35), we have

    E[R] = \int_0^{\infty} (1 - F_R(x))\, dx = \frac{1}{E[\tau]} \int_0^{\infty} dx \int_x^{\infty} (1 - F(t))\, dt

Reversing the order of the x- and t-integration yields

    E[R] = \frac{1}{E[\tau]} \int_0^{\infty} dt\, (1 - F(t)) \int_0^t dx = \frac{1}{E[\tau]} \int_0^{\infty} t\,(1 - F(t))\, dt

Since, by partial integration, \int_0^{\infty} t(1 - F(t))\, dt = E[\tau^2]/2, we arrive at

    E[R] = \frac{E[\tau^2]}{2 E[\tau]} = \frac{E[\tau]}{2} + \frac{Var[\tau]}{2 E[\tau]}    (8.18)
This expression shows that the average remaining waiting time equals half of the average interarrival time plus half the ratio of the variance to the mean of the interarrival time. The last term is always positive. Since E[A] = E[R] and E[L] = E[A] + E[R], we observe the curious result that

    E[L] = E[\tau] + \frac{Var[\tau]}{E[\tau]} \ge E[\tau]

or that the average total waiting time E[L] is longer than the average interarrival time E[\tau], contrary to intuition. This fact is referred to as the inspection paradox: the steady-state interrenewal time L(t) = W_{N(t)+1} - W_{N(t)}, containing the inspection point at time t, exceeds on average the generic interarrival time, say W_1. The explanation is that the inspection point at time t is uniformly chosen over the time axis and every inspection point is thus equally likely. The chance that the inspection point t lies in a renewal interval is proportional to the length of that interval. Hence, it has a higher probability of falling in a long interval, which explains(10) why E[L] \ge E[\tau]. Only for deterministic interarrival times, where Var[\tau] = 0, does the equality sign hold, E[L] = E[\tau]. For exponential interarrival times, application of (3.18) gives Var[\tau] = (E[\tau])^2 and E[R] = E[\tau], while E[L] = 2E[\tau]: the fact of being inspected at time t changes the lifetime distribution and even doubles the expected total life time for exponentially distributed failure or interoccurrence times.

Returning to the initial question, we observe that the intuitive result that my waiting time is E[R] = E[\tau]/2 is only correct for deterministic processes. Thus, the variability in the interarrival process causes the paradox. We will see later, in queueing theory in Section 14.3.1, that also in queueing systems the variability in the service discipline causes the average waiting time to increase. At last, Feller (1971, p. 187) remarks that an apparently unbiased inspection plan may lead to false conclusions because the actual observations are not typical of the population as a whole. When people complain that buses or trains run irregularly, the inspection paradox shows that above-average interarrival times are experienced more often. The inspection paradox thus implies that complaints may be erroneously based on an overestimation of the real deviations from the regular time schedule of buses or trains.

(10) A similar type of reasoning is used in the computation of the waiting time of the GI/D/m queueing system in Section 14.4.2.
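A short simulation makes the inspection paradox tangible: sample the residual life R(t) at a fixed inspection time and compare with (8.18). The two interarrival laws and all numbers below are illustrative assumptions; for the exponential case (8.18) gives E[R] = E[\tau] = 1, for uniform(0.5, 1.5) it gives 0.5 + (1/12)/2 \approx 0.542.

```python
import random

random.seed(5)

T_INSPECT = 50.0
RUNS = 20000

def residual_life(t, draw):
    """Time from the inspection point t until the first renewal after t."""
    s = 0.0
    while s <= t:
        s += draw()
    return s - t

exp_draw = lambda: random.expovariate(1.0)    # E[tau] = 1, Var[tau] = 1
uni_draw = lambda: random.uniform(0.5, 1.5)   # E[tau] = 1, Var[tau] = 1/12

er_exp = sum(residual_life(T_INSPECT, exp_draw) for _ in range(RUNS)) / RUNS
er_uni = sum(residual_life(T_INSPECT, uni_draw) for _ in range(RUNS)) / RUNS
# (8.18): E[R] = E[tau]/2 + Var[tau]/(2 E[tau])
print(f"exponential: E[R] ~ {er_exp:.3f} (formula: 1.000)")
print(f"uniform:     E[R] ~ {er_uni:.3f} (formula: {0.5 + (1/12)/2:.3f})")
```

Note how the exponential case gives E[R] = E[\tau], not E[\tau]/2: the naive "half an interval" intuition only survives for deterministic interarrival times.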
By separating each renewal interval into two non-overlapping subintervals A(t) and R(t), we have described an alternating renewal process. An alternating renewal process models a system that can be in an on- or off-period, with a repeating pattern X_1, Y_1, X_2, Y_2, ..., where each on-period X_n has the same distribution F_on and is followed by an off-period Y_n. Each off-period also has the same distribution F_off. The off-period Y_n may depend on the on-period X_n, but the n-th renewal cycle with duration X_n + Y_n is independent of any other cycle. An alternating renewal process can be used to model a data stream of packets, where the on-period reflects the time to store or process an arriving packet and the off-period a (random) delay between two packets. Another example is the modeling of the end-to-end delay from a source s to a destination d in the Internet, where the off-period describes a queueing delay in a router due to other interfering traffic along that path. During the on-period, a packet is not blocked by other packets. The on-period equals the propagation delay to travel from the output port of one router to the output port of the next-hop router. The end-to-end delay along a path with h hops equals the sum of h consecutive off-periods augmented by the propagation time from s to d.
8.4 The renewal reward process

The renewal reward process associates with each renewal at time W_n a certain cost or reward R_n, which may vary over time and can be negative. For example, each time a light bulb fails, it must be replaced at a certain cost (negative reward), or each customer in a restaurant pays for his meal (positive reward). The reward R_n may depend on the interarrival time \tau_n or length of the n-th renewal interval, but it is independent of other renewal epochs (different from the n-th). Thus, the pairs (R_n, \tau_n) are assumed to be independent and identically distributed. Most often one is interested in the total reward R(t) over a period t (not to be confused with the residual life time), defined as

    R(t) = \sum_{n=1}^{N(t)} R_n    (8.19)

Writing R(t)/t = \frac{R(t)}{N(t)} \cdot \frac{N(t)}{t}, the Strong Law of Large Numbers and (8.13) show that, with probability 1,

    \lim_{t\to\infty} \frac{R(t)}{t} = \frac{E[R]}{E[\tau]}    (8.20)

The bounds

    \sum_{n=1}^{N(t)} R_n \le R(t) \le \sum_{n=1}^{N(t)} R_n + R_{N(t)+1}

lead, after taking the expectations and using Wald's identity (2.69), to an inequality for the averages,

    E[R] \lim_{t\to\infty} \frac{E[N(t)]}{t} \le \lim_{t\to\infty} \frac{E[R(t)]}{t} \le E[R] \lim_{t\to\infty} \frac{E[N(t)]}{t} + \lim_{t\to\infty} \frac{E[R_{N(t)+1}]}{t}

Since the average reward per renewal period is finite and E[R_{N(t)+1}] = E[R], we obtain by the Elementary Renewal Theorem 8.2.1 that

    \lim_{t\to\infty} \frac{E[R(t)]}{t} = \frac{E[R]}{E[\tau]}    (8.21)

Hence, comparing (8.21) and (8.20), the long-run expected reward per unit time equals the almost-sure time average of the reward rate.
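A quick sketch of (8.21) with an illustrative reward rule (both the interarrival law and the reward R_n = 3\tau_n + 1 are assumptions): the long-run reward rate should approach E[R]/E[\tau] = (3 \cdot 1 + 1)/1 = 4.

```python
import random

random.seed(9)

T = 10_000.0

# Illustrative reward: R_n = 3*tau_n + 1  =>  E[R]/E[tau] = 4
s, reward = 0.0, 0.0
while True:
    tau = random.uniform(0.5, 1.5)   # E[tau] = 1 (assumed law)
    if s + tau > T:
        break                        # ignore the incomplete final cycle
    s += tau
    reward += 3 * tau + 1

rate = reward / T
print(f"R(T)/T = {rate:.3f}, E[R]/E[tau] = 4.000")
```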
Example The hard disc in a network server is replaced at cost C_1 at time T. The lifetime or age of this mass storage has pdf f_A. If the hard disc fails earlier, the cost of the repair and the penalties for the service disruption is C_2. What is the long-run cost of the hard disc in the server per unit time?
8.5 Problems

... the generating function E[z^{N(t)}] of the number of renewals in the interval [0, t], and deduce from that equation the renewal equation (8.9) and a relation for Var[N(t)].

(iii) In a TCP session from A to B, IP data packets and IP acknowledgement packets travel a distance of 2000 km over precisely the same bi-directional path. In case of congestion, the average speed is 40000 km/s and without congestion the speed is three times higher. Congestion only occurs in 20% of the travels. What is the average speed of IP packets in the TCP session?

(iv) The production of digitalized speech samples depends primarily on the codec, with an effective average rate r (bits/s). Since this rate is low compared to the ATM capacity C (bits/s), UMTS will use AAL2 mini-cells in which 1 ATM cell is occupied by N users. The financial cost of an UMTS operator increases at nc euro per unit time whenever ...
9 Discrete-time Markov chains

9.1 Definition

A stochastic process \{X(t), t \in T\} is a Markov process if the future state of the process only depends on the current state of the process and not on its past history. Formally, a stochastic process \{X(t), t \in T\} is a continuous-time Markov process if for all t_0 < t_1 < \cdots < t_{n+1} of the index set T and for any set \{x_0, x_1, \ldots, x_{n+1}\} of the state space it holds that

    \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_0) = x_0, \ldots, X(t_n) = x_n] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n]    (9.1)

Similarly, a discrete-time Markov chain \{X_k, k \in T\} is a stochastic process whose state space is a finite or countably infinite set, with index set T = \{0, 1, 2, \ldots\}, obeying

    \Pr[X_{k+1} = x_{k+1} \mid X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} \mid X_k = x_k]    (9.2)

A Markov process is called a Markov chain if its state space is discrete. The conditional probabilities \Pr[X_{k+1} = j \mid X_k = i] are called the transition probabilities of the Markov chain. In general, these transition probabilities can depend on the (discrete) time k. A Markov chain is entirely defined by the transition probabilities (9.2) and the initial distribution of the Markov chain, since repeated conditioning with (9.2) gives

    \Pr[X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_0 = x_0] \prod_{j=1}^{k} \Pr[X_j = x_j \mid X_{j-1} = x_{j-1}]    (9.3)

which demonstrates that the complete information of the Markov chain is obtained if, apart from the initial distribution, all time-dependent transition probabilities are known. If the transition probabilities are independent of the time k,

    \Pr[X_{k+1} = j \mid X_k = i] = P_{ij}    (9.4)

the Markov chain is called stationary. In the sequel, we will confine ourselves to stationary Markov chains. Since the discrete-time Markov chain is conceptually simpler than the continuous counterpart, we start the discussion with the discrete case.

Let us consider a state space S with N states (where N = dim S can be infinite). It is convenient to introduce a vector notation(1). Since X_k can only take N possible values, we denote the corresponding state vector at discrete-time k by s[k] = [s_1[k]\ s_2[k]\ \cdots\ s_N[k]] with s_i[k] = \Pr[X_k = i]. Hence, s[k] is a 1 x N vector. Since the state X_k at discrete-time k must be in one of the N possible states, we have that \sum_{i=1}^{N} \Pr[X_k = i] = 1 or, in vector notation, s[k] \cdot u = \sum_{i=1}^{N} s_i[k] = 1, where u^T = [1\ 1\ \cdots\ 1]. This fact is also written as \|s[k]\|_1 = 1, where \|a\|_1 is the q = 1 norm of the vector a

(1) Unfortunately, a vector in Markov theory is represented as a single-row matrix, which deviates from the general theory in linear algebra followed in Appendix A, where a vector is represented as a single-column matrix. In order to be consistent with the literature on Markov processes, we have chosen to follow the notation of Markov theory here, but elsewhere we adhere to the general convention of linear algebra.
defined in Appendix A.3. In a stationary Markov chain, the states X_{k+1} and X_k are connected via the law of total probability (2.46),

    \Pr[X_{k+1} = j] = \sum_{i=1}^{N} \Pr[X_{k+1} = j \mid X_k = i] \Pr[X_k = i] = \sum_{i=1}^{N} P_{ij} \Pr[X_k = i]    (9.5)

or, in matrix notation,

    s[k+1] = s[k]\, P    (9.6)

where the N x N transition probability matrix is

    P = \begin{bmatrix}
    P_{11} & P_{12} & P_{13} & \cdots & P_{1,N-1} & P_{1N} \\
    P_{21} & P_{22} & P_{23} & \cdots & P_{2,N-1} & P_{2N} \\
    P_{31} & P_{32} & P_{33} & \cdots & P_{3,N-1} & P_{3N} \\
    \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
    P_{N-1,1} & P_{N-1,2} & P_{N-1,3} & \cdots & P_{N-1,N-1} & P_{N-1,N} \\
    P_{N1} & P_{N2} & P_{N3} & \cdots & P_{N,N-1} & P_{NN}
    \end{bmatrix}    (9.7)

Since (9.6) must hold for any initial state vector s[0], we may choose s[0] equal to a base vector [0 \cdots 0\ 1\ 0 \cdots 0] (all components zero except for component i), which expresses that the Markov chain starts from one of the possible states, say state i; then s[1] = [P_{i1}\ P_{i2}\ \cdots\ P_{iN}]. Furthermore, since \|s[k]\|_1 = 1 for any k, it must hold that \sum_{j=1}^{N} P_{ij} = 1 for any state i. The relation

    \sum_{j=1}^{N} P_{ij} = 1    (9.8)

expresses that each row of P is itself a probability distribution: P is a stochastic matrix.
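As a concrete sketch (the 3-state matrix is an assumed example, not one from the text), the row-sum property (9.8) and the evolution (9.6) look as follows in code:

```python
# A small illustrative 3-state chain (the matrix entries are assumptions).
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

# (9.8): every row of a transition probability matrix sums to 1
row_sums = [sum(row) for row in P]

def step(s, P):
    """One application of (9.6): s[k+1] = s[k] P."""
    n = len(P)
    return [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]

s = [1.0, 0.0, 0.0]       # chain starts in state 1
for _ in range(3):
    s = step(s, P)
print("row sums:", row_sums)
print("s[3] =", [round(x, 4) for x in s], "sum =", round(sum(s), 6))
```

Because P is stochastic, every iterate s[k] remains a probability vector with \|s[k]\|_1 = 1.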
[Fig.: a Markov graph on seven states, in which the possible transitions are P_{12}, P_{16}, P_{22}, P_{32}, P_{34}, P_{41}, P_{45}, P_{47}, P_{55}, P_{56}, P_{63}, P_{67}, P_{75} and P_{76}.] Its transition probability matrix is

    P = \begin{bmatrix}
    0 & P_{12} & 0 & 0 & 0 & P_{16} & 0 \\
    0 & P_{22} & 0 & 0 & 0 & 0 & 0 \\
    0 & P_{32} & 0 & P_{34} & 0 & 0 & 0 \\
    P_{41} & 0 & 0 & 0 & P_{45} & 0 & P_{47} \\
    0 & 0 & 0 & 0 & P_{55} & P_{56} & 0 \\
    0 & 0 & P_{63} & 0 & 0 & 0 & P_{67} \\
    0 & 0 & 0 & 0 & P_{75} & P_{76} & 0
    \end{bmatrix}

Given the initial state vector s[0], the general solution of (9.6) is

    s[k] = s[0]\, P^k    (9.9)
Similarly, when knowledge of the Markov chain at discrete-time k is available, we obtain from (9.6) that

    s[k+n] = s[k]\, P^n

The elements of the matrix P^n are called the n-step transition probabilities,

    P^n_{ij} = \Pr[X_{k+n} = j \mid X_k = i]    (9.10)

for k \ge 0 and n \ge 0. Since the discrete Markov chain must surely be in one of the N states n time units later, given that it started at time k in state i, we obtain an extension of (9.8), for all n \ge 1,

    \sum_{j=1}^{N} P^n_{ij} = 1    (9.11)

Indeed,

    \sum_{j=1}^{N} P^{n+1}_{ij} = \sum_{j=1}^{N} \sum_{l=1}^{N} P_{il} P^n_{lj} = \sum_{l=1}^{N} P_{il} \sum_{j=1}^{N} P^n_{lj} = \sum_{l=1}^{N} P_{il}    \text{(induction argument)}
                                = 1    \text{(n = 1 case)}

This proves (9.11).
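A minimal check of (9.10)-(9.11) (the 3-state matrix is an illustrative assumption): compute P^5 by repeated matrix multiplication and verify that each row still sums to 1.

```python
# n-step transition probabilities P^n via repeated multiplication;
# the 3-state matrix below is an illustrative assumption.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pn = P
for _ in range(4):            # Pn becomes P^5
    Pn = matmul(Pn, P)

# (9.11): each row of P^n still sums to 1
row_sums = [sum(row) for row in Pn]
print("row sums of P^5:", [round(r, 10) for r in row_sums])
```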
The Markov graph can also be studied via its adjacency matrix A, where A_{ij} = 1 if P_{ij} > 0 and A_{ij} = 0 otherwise: the number of walks of length k from state i to state j is equal to the element (A^k)_{ij}. A directed graph is strongly connected if and only if each non-diagonal element of the matrix \sum_{k=1}^{N-1} P^k or, equivalently, of B = \sum_{k=1}^{N-1} A^k is positive. Since P has N states, the longest possible path between two states consists of N-1 hops. By summing over all powers 1 \le k \le N-1, the element b_{ij} of the matrix B equals the number of all possible walks (of any possible length) between i and j. Hence, if b_{ij} > 0 for all i \ne j, there exist walks from any state i to any other state j. The converse is readily verified.

Another way to determine irreducibility follows from the definition of reducibility in Appendix A.4. However, the methods for strong connectivity or irreducibility are still algebraic in that they require matrix operations. A computationally more efficient method consists of applying all-pair shortest path algorithms to the Markov graph. Examples of all-pair shortest path algorithms are that of Floyd-Warshall (with computational complexity C_Floyd-Warshall = O(N^3)) or the algorithm of Johnson (complexity C_Johnson = O(N^2 log N + N L), where L is the total number of links in the Markov graph). These algorithms are nicely discussed in Cormen et al. (1991).
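Rather than summing matrix powers, irreducibility can be tested with a plain graph search, in the spirit of the shortest-path remark above. The sketch below (with assumed example matrices) runs a breadth-first search from every state:

```python
from collections import deque

def reachable(P, start):
    """States reachable from `start` in the directed Markov graph of P."""
    n = len(P)
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if P[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def irreducible(P):
    """A chain is irreducible iff every state reaches every other state."""
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

# Illustrative examples (matrix entries are assumptions).
P_irr = [[0.0, 1.0], [0.5, 0.5]]
P_red = [[1.0, 0.0], [0.5, 0.5]]   # state 1 is absorbing: not irreducible
print(irreducible(P_irr), irreducible(P_red))
```

Each BFS costs O(N + L), so the full test is O(N(N + L)), cheaper than the O(N^3) matrix-power approach for sparse chains.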
As an example of a periodic chain, consider the Markov chain whose states 1, 2, \ldots, N are visited cyclically, with transition probability matrix

    P = \begin{bmatrix}
    0 & 1 & 0 & \cdots & 0 \\
    0 & 0 & 1 & \cdots & 0 \\
    \vdots & & & \ddots & \vdots \\
    0 & 0 & 0 & \cdots & 1 \\
    1 & 0 & 0 & \cdots & 0
    \end{bmatrix}

Every return to a state takes a multiple of N steps, so that each state has period N. In general, the period d_j of state j is the greatest common divisor of the set \{n \ge 1 : P^n_{jj} > 0\}. Since the greatest common divisor of a set is the largest integer d that divides every integer in the set, it is at most the minimum element of the set. Thus,

    1 \le d_j \le \min\{n : P^n_{jj} > 0\}

The relation

    P^{n+m}_{jj} = P^n_{jj} P^m_{jj} + \sum_{l=1, l \ne j}^{N} P^n_{jl} P^m_{lj}

deduced from matrix multiplication, and the fact that all elements of P are non-negative, shows that P^{n+m}_{jj} \ge P^n_{jj} P^m_{jj}. More generally,

    P^{n+l+m}_{ii} = \sum_{r=1}^{N} \sum_{k=1}^{N} P^n_{ir} P^l_{rk} P^m_{ki} \ge P^n_{ij} P^l_{jj} P^m_{ji}    (9.12)
The hitting time T_j is also called the first passage time into a state j,

    T_j = \min\{n \ge 1 : X_n = j\}    (9.14)

By definition of the hitting time, \{T_j = m\} = \bigcap_{k=1}^{m-1} \{X_k \ne j\} \cap \{X_m = j\}, such that, for m \le n,

    \Pr[X_n = j \mid T_j = m, X_0 = i] = \Pr[X_n = j \mid X_m = j] = P^{n-m}_{jj}

where the last step follows from the Markov property (9.2). Thus we obtain

    \Pr[X_n = j \mid X_0 = i] = \sum_{m=1}^{n} \Pr[T_j = m \mid X_0 = i]\, P^{n-m}_{jj}    (9.15)

The probability of ever visiting state j when starting from state i is

    r_{ij} = \Pr[T_j < \infty \mid X_0 = i] = \sum_{m=1}^{\infty} \Pr[T_j = m \mid X_0 = i]    (9.16)

If the starting state i equals the target state j, then r_{jj} is the probability of ever returning to state j. If r_{jj} = 1, the state j is a recurrent state, while, if r_{jj} < 1, state j is a transient state. If j is a recurrent state, the Markov chain started at j will definitely (i.e. with probability 1) return to state j after some time. On the other hand, if j is a transient state, the Markov chain started at j has probability 1 - r_{jj} of never returning to state j. For an absorbing state j, defined by P_{jj} = 1, we have by (9.16) that r_{jj} = 1, implying that an absorbing state is a recurrent state. Further, the mean return time to state j when the chain started in j is denoted by

    m_j = E[T_j \mid X_0 = j]    (9.18)

Let N_n(j) denote the number of visits to state j in the first n time steps. Its expectation, given that the chain started from state i, is

    E[N_n(j) \mid X_0 = i] = \sum_{q=1}^{n} \Pr[X_q = j \mid X_0 = i] = \sum_{q=1}^{n} P^q_{ij}    (9.19)

The average number of times that the Markov chain is ever in state j, given that it started from state i, is, with N(j) = \lim_{n\to\infty} N_n(j),

    E[N(j) \mid X_0 = i] = \sum_{q=1}^{\infty} \Pr[X_q = j \mid X_0 = i] = \sum_{q=1}^{\infty} P^q_{ij}

Consider now the probability that the number of visits to state j exceeds q, given that the Markov chain started from state i. The event \{N(j) \ge q\} is equivalent to the occurrence of the event \{N(j) \ge q-1\} together with the event that the Markov chain will return to j again, given that it started from j. The probability of the latter event is precisely r_{jj}. Thus, we obtain the recursion

    \Pr[N(j) \ge q \mid X_0 = i] = r_{jj} \Pr[N(j) \ge q-1 \mid X_0 = i]

with solution, for q \ge 1,

    \Pr[N(j) \ge q \mid X_0 = i] = (r_{jj})^{q-1} \Pr[N(j) \ge 1 \mid X_0 = i]

Now, \Pr[N(j) \ge 1 \mid X_0 = i] = \Pr[T_j < \infty \mid X_0 = i] = r_{ij}, such that

    \Pr[N(j) \ge q \mid X_0 = i] = (r_{jj})^{q-1} r_{ij}    (9.20)

Summing (9.20) over q \ge 1 gives

    E[N(j) \mid X_0 = i] = \frac{r_{ij}}{1 - r_{jj}}    (9.21)

provided r_{ij} > 0. If r_{ij} = 0, then (9.20) vanishes for every q and thus E[N(j) \mid X_0 = i] = 0, which means that state j is not reachable from state i. In summary:

For a recurrent state j, for which r_{jj} = 1, we obtain from (9.21) that E[N(j) \mid X_0 = i] \to \infty (if r_{ij} \ne 0, else E[N(j) \mid X_0 = i] = 0) and, from (9.20),

    \Pr[N(j) = \infty \mid X_0 = i] = \lim_{q\to\infty} \Pr[N(j) \ge q \mid X_0 = i] = r_{ij}

A state j is recurrent if and only if \sum_{q=1}^{\infty} P^q_{jj} diverges.

For a transient state j, for which r_{jj} < 1, E[N(j) \mid X_0 = i] is finite and \Pr[N(j) = \infty \mid X_0 = i] = 0 or, equivalently, \Pr[N(j) < \infty \mid X_0 = i] = 1. A state j is transient if and only if \sum_{q=1}^{\infty} P^q_{jj} is finite.

These relations explain the difference between a recurrent and a transient state. When the Markov chain starts at a recurrent state, it returns infinitely often to that state because r_{jj} = \Pr[N(j) = \infty \mid X_0 = j] = 1. If the chain starts at some other state i from which state j is reachable (r_{ij} > 0), then the chain will visit state j infinitely often. From this analysis, some consequences arise.
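The geometric structure of (9.20)-(9.21) can be illustrated with a tiny chain (the chain below is an assumed example): from state 0 the chain stays with probability 0.5 and otherwise jumps to an absorbing state 1, so r_00 = 0.5 and (9.21) predicts E[N(0) | X_0 = 0] = 0.5/(1 - 0.5) = 1.

```python
import random

random.seed(11)

# Two-state illustrative chain: state 0 is transient (it leaks to the
# absorbing state 1 with probability 0.5), so r_00 = 0.5 and, by (9.21),
# E[N(0) | X_0 = 0] = r_00 / (1 - r_00) = 1.
RUNS = 20000

def visits_to_zero():
    """Number of returns to state 0 (visits at times n >= 1), X_0 = 0."""
    count, state = 0, 0
    while state == 0:
        state = 0 if random.random() < 0.5 else 1
        if state == 0:
            count += 1
    return count

avg = sum(visits_to_zero() for _ in range(RUNS)) / RUNS
print(f"E[N(0) | X_0 = 0] ~ {avg:.3f} (theory: 1.000)")
```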
Corollary 9.2.2 A finite-state Markov chain must have at least one recurrent state.

Proof: Suppose, on the contrary, that the state space S is finite and that all states are transient. For a transient state j it follows from (9.21) that \sum_{k=1}^{\infty} P^k_{ij} is finite, which implies that \lim_{k\to\infty} P^k_{ij} = 0 for any other state i. If the state space is finite and all states are transient, then

    \sum_{j \in S} \lim_{k\to\infty} P^k_{ij} = 0

Since the summation has a finite number of terms, the limit and summation operator can be reversed,

    \lim_{k\to\infty} \sum_{j \in S} P^k_{ij} = 0

which contradicts (9.11). Hence, not all states can be transient.

Suppose now that state i is recurrent and that state j can be reached from i and conversely: P^n_{ij} > 0 and P^m_{ji} > 0 for some finite n and m. Applying (9.12), P^{m+l+n}_{jj} \ge P^m_{ji} P^l_{ii} P^n_{ij}, and summing over l,

    \sum_{k=1}^{\infty} P^k_{jj} \ge \sum_{l=1}^{\infty} P^{m+l+n}_{jj} \ge P^m_{ji} P^n_{ij} \sum_{l=1}^{\infty} P^l_{ii}

It follows from (9.21) that the right-hand side diverges. Hence, \sum_{l=1}^{\infty} P^l_{jj} diverges and relation (9.21) indicates that r_{jj} = 1, or that j must be a recurrent state: recurrence is a class property.

A steady-state vector \pi of the Markov chain is a state vector that is invariant under the transitions,

    \pi = \pi P    (9.22)

or, componentwise,

    \pi_j = \sum_{k=1}^{N} P_{kj} \pi_k    (9.23)
If the Markov chain possesses a limit distribution, the dependence on the initial state s[0] vanishes: with A = \lim_{k\to\infty} P^k, relation (9.9) becomes \pi_j = \sum_{i=1}^{N} s_i[0] a_{ij} and, because all rows of A are identical, \pi_j = a_{1j} \sum_{i=1}^{N} s_i[0] = a_{1j}. Hence, A = \lim_{k\to\infty} P^k = u \cdot \pi or, componentwise, for all 1 \le j \le N,

    \lim_{k\to\infty} P^k_{ij} = \pi_j    (9.24)

The sequence of matrices P, P^2, P^3, \ldots, P^k thus converges to A = u \cdot \pi for sufficiently large k. Instead of multiplying the last matrix P^k in the sequence by P to obtain the next one, P^{k+1}, with the same computational effort the sequence P, P^2, P^4, \ldots, P^{2^k}, obtained by successive squaring, converges considerably faster to A = u \cdot \pi and may be useful for sparse P.

On the other hand, relation (9.22) is an eigenvalue equation with eigenvalue \lambda = 1 and eigenvector \pi. The Frobenius Theorem A.4.2 states that the transition probability matrix P has one eigenvalue \lambda = 1 with corresponding eigenvector \pi. Since in (9.22) the set (P - I)^T \pi^T = 0 has rank N-1, the normalization condition \|\pi\|_1 = 1 furnishes the (last) remaining equation. Except for the trivial case where P is the identity matrix I, the solution for \pi is obtained from

    \begin{bmatrix}
    P_{11}-1 & P_{21} & P_{31} & \cdots & P_{N-1,1} & P_{N1} \\
    P_{12} & P_{22}-1 & P_{32} & \cdots & P_{N-1,2} & P_{N2} \\
    P_{13} & P_{23} & P_{33}-1 & \cdots & P_{N-1,3} & P_{N3} \\
    \vdots & \vdots & \vdots & & \vdots & \vdots \\
    P_{1,N-1} & P_{2,N-1} & P_{3,N-1} & \cdots & P_{N-1,N-1}-1 & P_{N,N-1} \\
    1 & 1 & 1 & \cdots & 1 & 1
    \end{bmatrix}
    \begin{bmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \vdots \\ \pi_{N-1} \\ \pi_N \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}    (9.25)

(6) A third method consists of a directed graph solution of linear algebraic equations, discussed by Chen (1971, Chapter 3) and applied to the steady-state equation (9.22) by Hooghiemstra and Koole (2000).
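In code, (9.25) amounts to replacing one of the balance equations by the normalization and solving the resulting linear system. The sketch below does this with plain Gaussian elimination for an assumed 3-state example.

```python
# Solve (9.25): build (P^T - I), replace the last row by all ones, and set
# the right-hand side to [0, ..., 0, 1]. The 3-state matrix is an
# illustrative assumption.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]
n = len(P)

A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
A[-1] = [1.0] * n
b = [0.0] * (n - 1) + [1.0]

# Gaussian elimination with partial pivoting.
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, n):
        f = A[r][col] / A[col][col]
        for c in range(col, n):
            A[r][c] -= f * A[col][c]
        b[r] -= f * b[col]
pi = [0.0] * n
for r in range(n - 1, -1, -1):
    pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]

# Verify the balance equations (9.23): pi P = pi.
residual = max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j])
               for j in range(n))
print("pi =", [round(x, 4) for x in pi], "residual =", residual)
```

The successive-squaring alternative mentioned above (iterating P, P^2, P^4, ...) would converge to the same \pi as the identical rows of lim P^k.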
Invoking (9.19), where N_n(j)/n is the fraction of the time the chain is in state j during the interval [1, n], the relation is equivalent to

    \lim_{n\to\infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = \pi_j    (9.26)

The time average of the average number of visits to state j, given that the Markov chain started in state i, converges to the steady-state distribution. In other words, the long-run mean fraction of time that the chain spends in state j equals \pi_j and is independent of the initial state i. From (9.21), it immediately follows that, if j is a transient state, \pi_j = 0. Only recurrent states j have a non-zero probability \pi_j that the steady state is in recurrent state j. Lemma 6.1.1 and its consequence (9.26) suggest to investigate N_n(j)/n for recurrent states j.

If the Markov chain starts in a recurrent state j, we know from (9.21) that the chain returns to state j infinitely often. Let W_k(j) denote the time of the k-th visit of the Markov chain to state j. Then,

    W_k(j) = \min_{m \ge 1}(N_m(j) = k)

The interarrival time between the k-th and (k-1)-th visit is \tau_k(j) = W_k(j) - W_{k-1}(j). The interarrival times \{\tau_k(j)\}_{k \ge 1} are independent and identically distributed random variables, as follows from the Markov property. Indeed, every time t the Markov chain returns to state j, it behaves from that time onwards as if the Markov process had started from state j, ignoring the past before time t. Moreover, they have a common mean E[\tau(j)] = E[\tau_1(j)], equal to the mean return time to j given by E[T_j \mid X_0 = j] = m_j, because the hitting time is T_j = \tau_1(j). In other words, just as in renewal theory in Chapter 8, we have a counting process \{N_m(j), m \ge 1\} with associated waiting times W_k(j) and i.i.d. interarrival times \tau_k(j), specified by the equivalence

    \{N_m(j) < k\} \Longleftrightarrow \{W_k(j) > m\}

Invoking the Elementary Renewal Theorem (8.13), we obtain, with m_j = E[T_j \mid X_0 = j],

    \lim_{n\to\infty} \frac{N_n(j)}{n} = \frac{1}{m_j}    (9.27)

Thus, the chain returns to state j on average every m_j time units and, hence, the fraction of time the chain is in state j is roughly 1/m_j. These results are summarized as follows:
Theorem 9.3.1 (Limit Law of Markov Chains) If j is a recurrent state and the Markov chain starts in state i, then, with probability 1,

    \lim_{n\to\infty} \frac{N_n(j)}{n} = \frac{1_{\{T_j < \infty\}}}{m_j}    (9.28)

and

    \lim_{n\to\infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = \frac{r_{ij}}{m_j}    (9.29)

Proof: Above we have proved the case (9.27) where the initial state is X_0 = j; in that case 1_{\{T_j < \infty\}} = 1. For an arbitrary initial distribution, it is possible that the chain will never reach the recurrent state j; in that case 1_{\{T_j < \infty\}} = 0 given X_0 = i. It remains to prove (9.29). By definition, 0 \le N_n(j) \le n or 0 \le N_n(j)/n \le 1, which demonstrates that, for any n, N_n(j)/n is bounded. From the Dominated Convergence Theorem 6.1.4, we have

    \lim_{n\to\infty} E\Big[\frac{N_n(j)}{n} \Big| X_0 = i\Big] = E\Big[\lim_{n\to\infty} \frac{N_n(j)}{n} \Big| X_0 = i\Big] = E\Big[\frac{1_{\{T_j < \infty\}}}{m_j} \Big| X_0 = i\Big] = \frac{\Pr[T_j < \infty \mid X_0 = i]}{m_j} = \frac{r_{ij}}{m_j}

which completes the proof.

Theorem 9.3.1 introduces the need for an additional definition. A recurrent state j is called null recurrent if m_j = \infty, in which case (9.29) reduces to

    \lim_{n\to\infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = 0    (9.30)

By Tauberian theorems (which investigate conditions for the converse of Lemma 6.1.1, but which are far more difficult, as illustrated in the book by Hardy (1948)), it can be shown that, for null recurrent states, the stronger result \lim_{n\to\infty} P^n_{ij} = 0 also holds. A recurrent state j is called positive recurrent if m_j < \infty. The difference between a transient and a null recurrent state, which both obey (9.30), lies in the fact that, for a transient state, the limit \lim_{n\to\infty} E[N_n(j) \mid X_0 = j] is finite while, for a null recurrent state, \lim_{n\to\infty} E[N_n(j) \mid X_0 = j] = \infty. Relation (9.30) indicates that, for a null recurrent state,

    E[N_n(j) \mid X_0 = j] = O(n^a)

where 0 < a < 1, while for a positive recurrent state

    E[N_n(j) \mid X_0 = j] = \pi_j n + o(n)

The strength of the increase of E[N_n(j) \mid X_0 = j] leads to terming positive recurrent states also strongly ergodic states, while null recurrent states are called weakly ergodic. Figure 9.1 sketches the classification of states in a Markov process.
[Fig. 9.1. Classification of the states in a Markov process with the
corresponding steady-state vector component π_j: a state j is either transient
or recurrent; a recurrent state is null recurrent (π_j = 0) or positive
recurrent (π_j > 0), and either aperiodic (lim_{k→∞} P^k_ij = π_j) or periodic
with period d (lim_{k→∞} P^{kd}_ij = d π_j).]
When taking the limit n → ∞ of both sides, the summation and limit
operator can be reversed because the summation involves a finite number of
terms. Hence,

Σ_{j=1}^N lim_{n→∞} E[N_n(j) | X_0 = i]/n = 1

Consider next a non-negative vector a = (a_1, ..., a_N) with Σ_{k=1}^N a_k = 1
that satisfies the steady-state equations

a_j = Σ_{k=1}^N P_kj a_k        (9.31)

Multiplying both sides by P_ji and summing over j yields

Σ_{j=1}^N P_ji a_j = Σ_{j=1}^N P_ji Σ_{k=1}^N P_kj a_k = Σ_{k=1}^N a_k Σ_{j=1}^N P_kj P_ji = Σ_{k=1}^N a_k P²_ki

and, after iterating n times, a_i = Σ_{k=1}^N a_k P^n_ki. Letting n → ∞ and
using lim_{n→∞} P^n_ki = π_i gives a_i = π_i Σ_{k=1}^N a_k = π_i: the
steady-state vector is the unique such solution. Moreover,

lim_{n→∞} E[N_n(j) | X_0 = i]/n = lim_{n→∞} N_n(j)/n = 1/m_j = π_j        (9.32)

and a central limit theorem for the number of visits holds,

(N_n(j) − n π_j)/(σ_j π_j^{3/2} √n) →d N(0, 1)        (9.33)

where σ_j² is the variance of the return time to state j. Finally, summing
π_j = 1/m_j over all states gives

Σ_{j=1}^N 1/m_j = 1        (9.34)
A Markov chain that is irreducible and for which all states are positive
recurrent is said to be ergodic. Ergodicity implies that the steady-state
distribution π and the long-run probability distribution lim_{k→∞} s[k] are the
same. Ergodic Markov chains are basic stochastic processes in the study of
queueing theory.
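Theorem 9.3.1 and the identity π_j = 1/m_j can be checked numerically. The
sketch below (plain Python, with an illustrative two-state chain that is not
taken from the text) compares the empirical visit fraction N_n(0)/n with the
reciprocal of the empirical mean return time to state 0:

```python
import random

def simulate(P, x0, n, seed=42):
    """Simulate n steps of a discrete-time Markov chain with
    transition matrix P (list of rows), starting from state x0."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for j, pij in enumerate(P[x]):
            acc += pij
            if u < acc:
                x = j
                break
        path.append(x)
    return path

# Two-state chain with p = Pr[0->1] = 0.3 and q = Pr[1->0] = 0.1,
# so the steady-state component is pi_0 = q/(p+q) = 0.25.
P = [[0.7, 0.3], [0.1, 0.9]]
path = simulate(P, 0, 200_000)

# N_n(0)/n : fraction of time spent in state 0
visit_fraction = path.count(0) / len(path)

# m_0 : empirical mean return time to state 0 (average gap between visits)
returns = [k for k, s in enumerate(path) if s == 0]
mean_return = (returns[-1] - returns[0]) / (len(returns) - 1)
```

By Theorem 9.3.1, `visit_fraction` and `1/mean_return` should both approach
π_0 = 0.25 for a long run.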
9.3.3 Example: the two-state Markov chain

The two-state Markov chain is defined by

P = [ 1−p    p  ]
    [  q   1−q ]

and illustrated in Fig. 9.2. A matrix computation of the two-state Markov
chain is presented in Appendix A.4.2. Here, we follow a probabilistic approach.
Since there are only two states, at any discrete time k, there holds that
Pr[X_k = 0] = 1 − Pr[X_k = 1]. Hence, it suffices to compute Pr[X_k = 0].
By the law of total probability and the Markov property (9.2), we have

Pr[X_{k+1} = 0] = Pr[X_{k+1} = 0 | X_k = 1] Pr[X_k = 1]
                + Pr[X_{k+1} = 0 | X_k = 0] Pr[X_k = 0]

or, from Fig. 9.2, the Markov chain can only be in state 0 at time k+1 if
it is in state 0 at time k and the next event at time k+1 brings it back to
that same state 0, or if it is in state 1 at time k and the next event at time
k+1 induces a transfer to state 0. Introducing the transition probabilities,

Pr[X_{k+1} = 0] = q Pr[X_k = 1] + (1−p) Pr[X_k = 0]
                = q (1 − Pr[X_k = 0]) + (1−p) Pr[X_k = 0]
                = (1 − p − q) Pr[X_k = 0] + q

This recursion can be iterated back to k = 0,

Pr[X_k = 0] = (1−p−q)^k Pr[X_0 = 0] + q Σ_{j=0}^{k−1} (1−p−q)^j

Using the geometric series Σ_{j=0}^{k−1} x^j = (1−x^k)/(1−x), this yields

Pr[X_k = 0] = q/(p+q) + (1−p−q)^k ( Pr[X_0 = 0] − q/(p+q) )        (9.35)

and, similarly,

Pr[X_k = 1] = p/(p+q) + (1−p−q)^k ( Pr[X_0 = 1] − p/(p+q) )        (9.36)

Observe from (9.35) and (9.36) that, if Pr[X_0 = 0] = q/(p+q) = Pr[X_∞ = 0]
and Pr[X_0 = 1] = p/(p+q) = Pr[X_∞ = 1], the Markov chain starts and remains
the whole time (for all k) in the steady state. In addition, the probability of
a particular sequence of states can be computed from (9.3) or directly from
Fig. 9.2.
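The recursion and the closed form (9.35) are easy to cross-check numerically;
the following sketch (illustrative values p = 0.3, q = 0.2) iterates the
recursion and compares it with (9.35):

```python
# Check the closed form (9.35) for the two-state chain against the
# recursion Pr[X_{k+1}=0] = (1-p-q) Pr[X_k=0] + q (illustrative values).
p, q = 0.3, 0.2
pr0 = 1.0          # Pr[X_0 = 0]
trace = [pr0]
for _ in range(50):
    pr0 = (1 - p - q) * pr0 + q
    trace.append(pr0)

def closed_form(k, start=1.0):
    # (9.35): q/(p+q) + (1-p-q)^k (Pr[X_0=0] - q/(p+q))
    return q / (p + q) + (1 - p - q) ** k * (start - q / (p + q))

max_err = max(abs(trace[k] - closed_form(k)) for k in range(51))
steady = q / (p + q)     # the limit Pr[X_oo = 0] = 0.4
```

Since |1−p−q| < 1, the iterates converge geometrically to q/(p+q).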
9.4 Problems
(i) Given the transition probability matrix

P = [ 0.8  0.2  0.0 ]
    [ 0.8  0.0  0.2 ]
    [ 0.0  0.8  0.2 ]

(a) draw the Markov chain, (b) compute the steady-state vector in
three different ways.
(ii) Consider the discrete-time Markov chain with N states and with
transition probabilities at each state j,

P_{j,j+1} = 1 − 1/j        P_{j,1} = 1/j

(a) draw the Markov chain, (b) show that the drift is positive, but
that the Markov chain is nevertheless recurrent.
(iii) Assume that trees in a forest fall into four age groups. Let b[k],
y[k], m[k] and u[k] denote the number of baby trees, young trees,
middle-aged trees and old trees, respectively, in the forest at a given
time period k. A time period lasts 15 years. During a time period,
the total number of trees remains constant, but a certain percentage
of trees in each age group dies and is replaced with baby trees. All
surviving trees in the baby, young and middle-aged groups enter
the next age group. Surviving old trees remain old. Let 0 < p_b, p_y,
p_m, p_o < 1 denote the loss rates in each age group in percent.
(a) Make a discrete Markov chain representation of the process of
aging and replacement in the forest.
(b) The distribution of the tree population amongst the different age
categories in time period k is represented by

x[k] = ( b[k]  y[k]  m[k]  u[k] )^T

If x[k+1] = P x[k], what is the transition probability matrix P?
(c) Let p_b = 0.1, p_y = 0.2, p_m = 0.3, p_o = 0.4 and suppose that
x[k] = ( 5000  0  0  0 )^T. What is the number of trees in
each category after 15 and after 30 years?
(d) What is the steady-state situation?
(iv) A faulty digital video conferencing system shows a clustered error
pattern. If a bit is received correctly, then the chance to receive the
next bit correctly is 0.999. If a bit is received incorrectly, then the
next bit is incorrect with probability 0.95.
(a) Model the error pattern of this system using a discrete-time
Markov chain.
(b) How many communicating classes does the Markov chain have?
Is it irreducible?
(c) In the long run, what is the fraction of correctly received bits
and the fraction of incorrectly received bits?
(d) After the system is repaired, it works properly for 99.9% of
the time. A test sequence after repair shows that, when always
starting with a correctly received bit, the next 10 bits are
correctly received with probability 0.9999. What is the
probability now that a correctly (and analogously incorrectly)
received bit is followed by another correct (incorrect) bit?
10
Continuous-time Markov chains

10.1 Definition

For the continuous-time Markov chain {X(t), t ≥ 0} with N states, the
Markov property (9.1) can be written as

Pr[X(t+τ) = j | X(τ) = i, X(u) = x(u), 0 ≤ u < τ] = Pr[X(t+τ) = j | X(τ) = i]

and reflects the fact that the future state at time t+τ only depends on the
current state at time τ. Similarly as for the discrete-time Markov chain,
we assume that the transition probabilities for the continuous-time Markov
chain {X(t), t ≥ 0} are stationary, i.e. independent of the point in time τ,

P_ij(t) = Pr[X(t+τ) = j | X(τ) = i] = Pr[X(t) = j | X(0) = i]        (10.1)

Analogous to (9.5) and (9.6), the state vector s(t) in continuous time, with
components s_k(t) = Pr[X(t) = k], obeys

s(t+τ) = s(τ) P(t)        (10.2)

and the transition probability matrix P(t) satisfies

P(t+u) = P(u) P(t) = P(t) P(u)        (10.3)

This fundamental relation (10.3) is called the Chapman–Kolmogorov equation.
Furthermore, since the Markov chain must at any time be in one of the N
states, the analogon of (9.8) is, for any state i,

Σ_{j=1}^N P_ij(t) = 1        (10.4)

Finally,

P(0) = I        (10.5)

where P(0) = lim_{t↓0} P(t). The relations (10.1), (10.3), (10.4) and (10.5)
are sufficient to describe the continuous-time Markov process completely.

On a higher level of abstraction, P(t) can be viewed as a linear operator
acting upon the vector space defined by all possible state vectors s(t). The
relation P(t+u) = P(u)P(t) is known as the semigroup property. The family of
these commuting operators possesses an interesting algebraic structure (see
e.g. Schoutens (2000)).
Under quite general conditions, the limit

lim_{h↓0} (P(h) − I)/h = P′(0) = Q        (10.6)

exists. This matrix Q is called the infinitesimal generator of the
continuous-time Markov process, and it plays an important role, as shown
below. The infinitesimal generator Q corresponds to P − I in discrete time.
From (10.4),

Σ_{j=1, j≠i}^N P_ij(h) = 1 − P_ii(h)

and, dividing both sides by h and letting h approach zero, we find, for each
i, with the definition of Q, that

Σ_{j=1, j≠i}^N q_ij = −q_ii ≥ 0        (10.7)

Hence, the sum of each row of Q is zero, q_ij = lim_{h↓0} P_ij(h)/h ≥ 0 for
i ≠ j, and q_ii ≤ 0. The elements q_ij of Q are derivatives of probabilities
and reflect a change in transition probability from state i towards state j,
which suggests calling them rates. Usually, one defines q_i = −q_ii ≥ 0. Then
Σ_{j=1}^N |q_ij| = 2 q_i, which demonstrates that Q is bounded if and only if
the rates q_i are bounded. Karlin and Taylor (1981, p. 140) show that q_ij is
always finite. For finite-state Markov processes, the q_j are finite (since the
q_ij are finite) but, in general, q_j can be infinite. If q_j = ∞, the state is
called instantaneous because, when the process enters this state, it
immediately leaves it. In the sequel, we confine the discussion to
non-instantaneous states, thus 0 ≤ q_j < ∞. Continuous-time Markov chains with
all states non-instantaneous are coined conservative.

Probabilistically, (10.1) indicates that, for small h,

Pr[X(t+h) = j | X(t) = i] = q_ij h + o(h)    (i ≠ j)
Pr[X(t+h) = i | X(t) = i] = 1 − q_i h + o(h)        (10.8)

which clearly generalizes the Poisson process (see Theorem 7.3.1) and
motivates us to call q_i the rate corresponding to state i.
Lemma 10.2.2 Given the infinitesimal generator Q, the transition probability
matrix P(t) is differentiable for all t ≥ 0,

P′(t) = P(t) Q        (10.9)
      = Q P(t)        (10.10)

These equations are called the forward (10.9) and backward (10.10) equation.

Proof: For t = 0, the lemma follows from the existence of Q = P′(0).
The derivative P′(t) is defined, for t > 0, as

P′(t) = lim_{h→0} (P(t+h) − P(t))/h

with elements dP_ij(t)/dt.

A similar differential equation holds for the state vector. Using (10.2),

s_k(t+h) = Σ_{j=1}^N s_j(t) P_jk(h)

from which

(s_k(t+h) − s_k(t))/h = s_k(t) (P_kk(h) − 1)/h + Σ_{j=1, j≠k}^N s_j(t) P_jk(h)/h

With q_jk = lim_{h↓0} P_jk(h)/h and q_k = lim_{h↓0} (1 − P_kk(h))/h, this
becomes

s′_k(t) = −q_k s_k(t) + Σ_{j=1, j≠k}^N q_jk s_j(t)        (10.11)

which, together with the initial condition s_k(0), completely determines the
probability s_k(t) that the Markov process is in state k at time t.
The basic relation between the transition probability matrix and the
infinitesimal generator is

P(t) = e^{Qt}        (10.12)

where the generator has zero row sums,

Q = [ −q_1    q_12    ...    q_1N ]
    [  q_21   −q_2    ...    q_2N ]
    [  ...                        ]
    [  q_N1   q_N2  ...  q_{N,N−1}  −q_N ]

Writing the eigenvalue decomposition of Q, with right-eigenvectors x_k and
left-eigenvectors y_k normalized such that the scalar product y_k^T x_k = 1,
we obtain

P(t) = Σ_{k=1}^N e^{λ_k t} x_k y_k^T        (10.14)

where the outer product x_k y_k^T is the N × N matrix

x_k y_k^T = [ x_k1 y_k1  x_k1 y_k2  ...  x_k1 y_kN ]
            [ x_k2 y_k1  x_k2 y_k2  ...  x_k2 y_kN ]
            [   ...                                ]
            [ x_kN y_k1  x_kN y_k2  ...  x_kN y_kN ]

If we further assume (thus omitting pathological cases) that P(t) is a
stochastic, irreducible matrix for any time t, the Frobenius Theorem A.4.2
indicates that all eigenvalues satisfy e^{λ_k t} ≤ 1 and that only the largest
one is precisely equal to 1, say e^{λ_1 t} = 1, which corresponds to the
steady-state eigenvectors y_1^T = π and x_1 = u, where u^T = [1 1 ... 1]. The
Frobenius Theorem A.4.2 thus implies that all eigenvalues of Q have a negative
real part, except for the steady-state eigenvalue λ_1 = 0. Hence, we may write

P(t) = u π + Σ_{k=2}^N e^{λ_k t} x_k y_k^T        (10.15)

where P_∞ = u π is the N × N matrix with each row containing the steady-state
vector π. The expression (10.15) is called the spectral or eigendecomposition
of the transition probability matrix P(t).

Apart from the eigendecomposition method, P(t) can be computed from the Taylor
expansion

e^{Qt} = Σ_{k=0}^∞ (Qt)^k/k!        (10.16)

or from the limit

P(t) = e^{Qt} = lim_{n→∞} (I + Qt/n)^n        (10.17)
Pr[τ_j ≥ t + T]/Pr[τ_j ≥ T] = P_jj(t)

which holds for any T and thus also for T = 0, where Pr[τ_j ≥ 0] = 1. The
distribution of the sojourn time τ_j at state j therefore satisfies

Pr[τ_j ≥ t] = e^{−α_j t} = P_jj(t)

for some rate α_j ≥ 0. After differentiation evaluated at t = 0, we find
α_j = q_j.

2. An alternative demonstration of the exponential sojourn times starts
by considering, for an initial state j, the probability H_n that the process
remains in state j during an interval [0, t]. The idea is to first sample the
continuous-time interval with step t/n and afterwards proceed to the limit
n → ∞, which corresponds to a sampling with infinitesimally small step,

H_n = Pr[X(0) = j, X(t/n) = j, X(2t/n) = j, ..., X(t) = j]
    = Π_{m=0}^{n−1} Pr[X((m+1)t/n) = j | X(mt/n) = j] · Pr[X(0) = j]
    = (Pr[X(t/n) = j | X(0) = j])^n Pr[X(0) = j]
    = (P_jj(t/n))^n Pr[X(0) = j]

where (9.3) and (10.1) are used. For large n, P_jj(t/n) can be expanded in a
Taylor series around the origin,

P_jj(t/n) = P_jj(0) + (t/n) P′_jj(0) + O(1/n²)
          = 1 − q_j t/n + O(1/n²)

such that

(P_jj(t/n))^n = exp( n log(1 − q_j t/n + O(1/n²)) )

with

log(1 − q_j t/n + O(1/n²)) = −q_j t/n + O(1/n²)

which shows that

lim_{n→∞} (P_jj(t/n))^n = e^{−q_j t}

Hence, the probability that the process remains in state j at least for a
duration t equals

Pr[X(u) = j, 0 ≤ u ≤ t] = e^{−q_j t} Pr[X(0) = j]

Conditioned on the initial state with (2.44),

Pr[X(u) = j, 0 ≤ u ≤ t | X(0) = j] = Pr[τ_j ≥ t] = e^{−q_j t}        (10.18)
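The limit lim_{n→∞} (P_jj(t/n))^n = e^{−q_j t} used above can be observed
numerically. The sketch below assumes (as an illustration, not from the text)
a two-state chain with rates α out of state 0 and β out of state 1, for which
P_00(s) = β/(α+β) + α/(α+β) e^{−(α+β)s} in closed form:

```python
import math

# Numerical check of lim_{n->inf} (P_jj(t/n))^n = exp(-q_j t) for a
# two-state chain with rates alpha (0->1) and beta (1->0); q_0 = alpha.
alpha, beta, t = 2.0, 5.0, 0.7

def P00(s):
    # closed-form transition probability of the two-state chain
    return beta / (alpha + beta) + alpha / (alpha + beta) * math.exp(-(alpha + beta) * s)

exact = math.exp(-alpha * t)                    # e^{-q_0 t}
approx = [P00(t / n) ** n for n in (1, 10, 100, 10_000)]
errors = [abs(a - exact) for a in approx]
```

The error decreases roughly as 1/n, consistent with the O(1/n²) remainder in
the Taylor expansion per factor.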
10.3 Steady-state

Theorems 9.3.4 and 9.3.6 demonstrate that, when a finite-state Markov chain
is irreducible (all states communicate and P_ij(t) > 0), the steady state
exists. Since, by definition, the steady state does not change over time, or
lim_{t→∞} P′(t) = 0, it follows from (10.9) and (10.10) that

Q P_∞ = P_∞ Q = 0

where lim_{t→∞} P(t) = P_∞. This relation implies that P_∞ is the adjoint
matrix of Q belonging to the eigenvalue λ = 0, which plays a role analogous
to λ = 1 in the discrete case. By the same arguments as in the discrete
case, and as shown in Section 10.2.2, all rows of P_∞ are proportional to the
left-eigenvector π of Q belonging to λ = 0. Thus, the steady-state (row)
vector π is the solution of

π Q = 0        (10.19)

or, componentwise, for each state i,

π_i q_i = Σ_{j=1, j≠i}^N π_j q_ji        (10.20)

This equation has a continuity or conservation-law interpretation. The
left-hand side reflects the long-run rate at which the process leaves state i.
The right-hand side is the sum of the long-run rates of transitions towards
state i from the other states j ≠ i, i.e. the aggregate long-run rate into
state i. The in- and outward flux at any state i are, in the steady state,
precisely in balance. Therefore, the relations (10.20) are called the balance
equations. The balance equation (10.20) directly follows from the differential
equation (10.11) of the state probabilities s_k(t), since lim_{t→∞} s_k(t) = π_k
and lim_{t→∞} s′_k(t) = 0.

Alternatively, the steady-state vector π obeys (10.2), or

π = s(0) P_∞ = lim_{t→∞} s(0) e^{Qt}

which, together with (10.14), implies that all eigenvalues of Q must have a
negative real part, such that only λ = 0 determines the steady state. This
stability condition on the eigenvalues corresponds to that in a linear,
time-invariant system. Since all rows in P_∞ are equal (see also (10.15)), the
dependence of the steady-state vector π on the initial state drops out.
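In practice, π is found by solving the singular system π Q = 0 together with
the normalization Σ_i π_i = 1. A minimal sketch, without external libraries
(the two-state generator at the end is illustrative):

```python
# Solve pi Q = 0 with sum(pi) = 1: the equations sum_i pi_i Q[i][j] = 0
# for j < n-1, plus the normalization replacing the last equation.
def steady_state(Q):
    n = len(Q)
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # plain Gaussian elimination with partial pivoting
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

# Two-state generator with rates alpha = 2 (0->1) and beta = 5 (1->0):
# the balance equation gives pi = (beta, alpha)/(alpha+beta) = (5/7, 2/7).
Q = [[-2.0, 2.0], [5.0, -5.0]]
pi = steady_state(Q)
```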
10.4 The embedded Markov chain

Consider the conditional probability V_ij that, given that a transition out of
state i occurs within a small interval of length h, the process moves to state
j:

V_ij = Pr[{X(h) = j} ∩ {X(h) ≠ i} | X(0) = i] / Pr[X(h) ≠ i | X(0) = i]
     = P_ij(h)/(1 − P_ii(h)) → q_ij/q_i    (h ↓ 0, i ≠ j)

By (10.7), we see that Σ_{j=1, j≠i}^N V_ij = 1, demonstrating that, given a
transition, it is a transition out of state i to another state j. The
quantities V_ij correspond to the transition probabilities of the embedded
Markov chain.
189
Alternatively, we can write the rate tlm in terms of the transition probabilities Ylm of the embedded Markov chain as
tlm = tl Ylm
(10.21)
Since tl is the rate (i.e. the number of transitions per unit time) of the
process in state l, relation (10.21) shows that the transition rate tlm from
state l to state m equals the rate of transitions in state l multiplied by the
probability that a transition from state l to state m occurs. By denition,
Yll = 0. For, if we assume that Ylm A 0, relation (10.21) would result in
tll = Yll tl A 0 which contradicts the denition tll = tl . Hence, in the
embedded Markov chain specied by the transition probability matrix Y ,
there are no self-transitions (Yll = 0), which is equivalent to the fact that
the sum of the eigenvalues of Y is zero (A.7), since trace(Y ) = 0.
From the steady-state equation or balance equation (10.20), (10.21) and
Yll = 0, we observe that
l tl =
Q
X
m tm Yml
m=1
On the other hand, the embedded Markov chain has a steady-state vector y
that obeys (9.22) or (9.23)
yl =
Q
X
ym Yml
m=1
and kyk1 = 1. The relations between the steady-state vectors of the continuoustime Markov chain and of its corresponding embedded discrete-time Markov
chain y, are
l tl
(10.22)
yl = PQ
m=1 l tl
yl @tl
l = PQ
m=1 ym @tm
(10.23)
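The conversions (10.22) and (10.23) are inverses of each other, which the
following sketch verifies on assumed (purely illustrative) values of π and the
rates q_i:

```python
# Relations (10.22)-(10.23) between the steady state pi of a
# continuous-time chain and the steady state v of its embedded chain.
pi = [0.5, 0.3, 0.2]          # illustrative continuous-time steady state
q = [1.0, 4.0, 2.5]           # illustrative rates q_i

# (10.22): v_i = pi_i q_i / sum_j pi_j q_j
Z = sum(p_ * q_ for p_, q_ in zip(pi, q))
v = [p_ * q_ / Z for p_, q_ in zip(pi, q)]

# (10.23): recover pi_i = (v_i / q_i) / sum_j v_j / q_j
W = sum(v_ / q_ for v_, q_ in zip(v, q))
pi_back = [v_ / q_ / W for v_, q_ in zip(v, q)]
```

States with a high rate q_i are left quickly, so they weigh more in the
jump-chain vector v than in the time-average vector π.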
10.4.1 Uniformization

The restriction V_ii = 0, or q_ii = 0, which means that there are no
self-transitions from a state into itself, can be removed. Indeed, for any
λ ≥ max_i q_i, we can rewrite the basic relation (10.12) between the transition
probability matrix P(t) and the infinitesimal generator Q as

P(t) = exp( −λt I + λt (I + Q/λ) ) = e^{−λt} exp( λt (I + Q/λ) )

Defining T(λ) = I + Q/λ, a description alternative to (10.15), (10.16) and
(10.17) appears:

P_ij(t) = e^{−λt} Σ_{k=0}^∞ ((λt)^k/k!) T^k_ij(λ)        (10.24)

where T(λ) is a stationary transition probability matrix and, hence, a
stochastic matrix. We also observe that λ T(λ) = Q + λI can be regarded as a
rate matrix with the property that, for each state i,

Σ_{j=1}^N λ T_ij(λ) = Σ_{j=1}^N Q_ij + λ Σ_{j=1}^N δ_ij = λ

the transition rate in any state i is precisely the same, equal to λ. Whereas
the embedded Markov chain defined by (10.21) has no self-transitions
(V_ii = 0), we see that T_ii(λ) = 1 − (1/λ) Σ_{j=1, j≠i}^N q_ij = 1 − q_i/λ ≥ 0,
while, for i ≠ j, T_ij(λ) = q_ij/λ. Hence, T(λ) can be interpreted as an
embedded Markov chain that allows self-transitions. In view of (10.21), the
embedded structure of T(λ) is summarized as

q_ij = λ T_ij(λ)    for i ≠ j
[Uniformized chain T(λ) for a six-state example: off-diagonal entries q_ij/λ
and diagonal entries 1 − q_i/λ.]

The steady-state vector t(λ) of the uniformized chain T(λ) obeys
t(λ) = t(λ) T(λ), or

t_j(λ) = Σ_{k=1}^N t_k(λ) (δ_kj + q_kj/λ) = t_j(λ) + (1/λ) Σ_{k=1}^N t_k(λ) q_kj

or

t_j(λ) q_j = Σ_{k=1, k≠j}^N t_k(λ) q_kj

where t_k(λ) = π_k (independent of λ), since it satisfies the balance equation
(10.20) and Theorem 9.3.5 assures that the steady state of a positive
recurrent chain is unique.
We will now interpret (10.24) probabilistically. Let N(t) denote the total
number of transitions in [0, t] of the uniformized (discrete) process
{X_k(λ)}. Since the transition rates q_i = λ are all the same, N(t) is a
Poisson process with rate λ because, for any continuous-time Markov chain,
the inter-transition or sojourn times are i.i.d. exponential random variables.
Thus, Pr[N(t) = k] = e^{−λt} (λt)^k/k! is recognized as the probability that
the number of transitions that occur in [0, t] in the uniformized Markov chain
with rate λ equals k. With (9.10), T^k_ij(λ) = Pr[X_k(λ) = j | X_0(λ) = i] is
the k-step transition probability of the discrete uniformized Markov chain
{X_k(λ)}. Combining both factors in (10.24),

P_ij(t) = Σ_{k=0}^∞ Pr[N(t) = k] T^k_ij(λ)

or: the probability that the continuous Markov process moves from state i to
state j in a time interval of length t can be decomposed into an infinite sum
of probabilities, each corresponding to a transition from state i to state j
in k steps, where the number of intermediate transitions k is a Poisson
counting process with rate λ.
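Uniformization is also a practical recipe for computing P(t): truncate the
Poisson sum in (10.24) after enough terms. The sketch below applies it to an
illustrative two-state generator and compares with the known closed form of
P_11(t):

```python
import math

# Uniformization (10.24) for the two-state generator Q = [[-a,a],[b,-b]]:
# P(t) = e^{-Lt} sum_k (Lt)^k/k! T^k with T = I + Q/L, L >= max q_i.
a, b, t, L = 2.0, 5.0, 0.4, 6.0      # L >= max(a, b)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[1 - a / L, a / L], [b / L, 1 - b / L]]
P = [[0.0, 0.0], [0.0, 0.0]]
Tk = [[1.0, 0.0], [0.0, 1.0]]        # T^0 = I
weight = math.exp(-L * t)            # Poisson(Lt) probability of k = 0
for k in range(60):                  # truncate the Poisson sum
    for i in range(2):
        for j in range(2):
            P[i][j] += weight * Tk[i][j]
    Tk = matmul(Tk, T)
    weight *= L * t / (k + 1)

# closed-form result for the two-state chain
P00_exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
```

Since every T^k is stochastic, the truncation error is bounded by the Poisson
tail probability, which is negligible here.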
10.4.2 A sampled-time Markov chain

The sampled-time Markov chain approximates the continuous Markov process in
that the transition probabilities P_ij(t) are expanded to first order, as in
(10.8), with fixed step h = Δt. The transition probabilities of the
sampled-time Markov chain are

P_ij = q_ij Δt    (i ≠ j)
P_ii = 1 − q_i Δt

Clearly, the sampled-time Markov chain also allows self-transitions, as
illustrated in Fig. 10.1.

[Fig. 10.1. A five-state continuous-time Markov process with rates q_12, q_23,
q_34, q_45, q_51, q_52, q_53 (top) and the corresponding sampled-time Markov
chain (bottom), with transition probabilities q_ij Δt and self-transition
probabilities such as 1 − q_12 Δt and 1 − (q_51 + q_52 + q_53) Δt.]

From (10.8), we observe that the approximation lies in two facts: (a) Δt is
fixed, such that the approximation q_ij Δt ≈ Pr[X(t+Δt) = j | X(t) = i] is
increasingly accurate as Δt → 0, and (b) transitions occur at discrete times,
every Δt time units. The sampling step Δt should be chosen such that the
transition probabilities obey 0 ≤ P_ij ≤ 1, from which we find that
Δt ≤ 1/max_{1≤i≤N} q_i.
The steady-state vector v of the sampled-time chain obeys
v_j = Σ_{k=1}^N P_kj v_k, or

v_j = Δt Σ_{k=1, k≠j}^N q_kj v_k + (1 − q_j Δt) v_j

or

q_j v_j = Σ_{k=1, k≠j}^N q_kj v_k

which is again the balance equation (10.20), so that the sampled-time Markov
chain possesses the same steady-state vector as the continuous-time Markov
process.
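While the steady state is exact, the transient behaviour of the sampled-time
chain is only a first-order approximation, with an error that shrinks roughly
linearly in Δt. A sketch with illustrative two-state rates (note the
constraint Δt ≤ 1/max q_i):

```python
import math

# Sampled-time approximation (Section 10.4.2): with step dt, the matrix
# I + Q dt is stochastic and (I + Q dt)^{t/dt} approximates P(t).
a, b, t = 2.0, 5.0, 0.4

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sampled(dt):
    S = [[1 - a * dt, a * dt], [b * dt, 1 - b * dt]]    # I + Q dt
    M = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(round(t / dt)):
        M = matmul(M, S)
    return M[0][0]

# closed-form P_11(t) of the two-state chain
exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
errs = [abs(sampled(dt) - exact) for dt in (0.1, 0.01, 0.001)]
```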
Proof: If i is an absorbing state (q_i = 0), then, by definition,
P_ij(t) = δ_ij for all t ≥ 0. For a non-absorbing state i and a process
starting from state i, the event {τ ≤ t, X(τ) = k} ∩ {X(t) = j} is possible if
and only if the first transition, from i to k, occurs at some time u ∈ [0, t]
and the next transitions, from k to j, take place in the remaining time t − u.
The probability density function of the sojourn time is
f_i(t) = (d/dt) Pr[τ_i ≤ t] = q_i e^{−q_i t} for t ≥ 0. Furthermore,

Pr[τ ≤ t and X(t) = j | X(0) = i] = Σ_{k≠i} V_ik ∫_0^t q_i e^{−q_i u} P_kj(t−u) du

and

Pr[τ > t and X(t) = j | X(0) = i] = δ_ij Pr[τ_i > t] = δ_ij e^{−q_i t}

Finally,

P_ij(t) = Pr[X(t) = j | X(0) = i]
        = Pr[τ ≤ t and X(t) = j | X(0) = i] + Pr[τ > t and X(t) = j | X(0) = i]

Combining all the above relations into the last one proves the theorem.

By a change of variable, s = t − u, in (10.25), we have

P_ij(t) = δ_ij e^{−q_i t} + q_i e^{−q_i t} Σ_{k≠i} V_ik ∫_0^t e^{q_i s} P_kj(s) ds
10.6 Example: the two-state Markov chain in continuous-time

Consider the two-state continuous-time Markov chain with infinitesimal
generator

Q = [ −α    α ]
    [  β   −β ]

where α, β > 0. We will solve P(t) from the forward equation (10.9),

[ P′_11(t)  P′_12(t) ]   [ P_11(t)  P_12(t) ] [ −α    α ]
[ P′_21(t)  P′_22(t) ] = [ P_21(t)  P_22(t) ] [  β   −β ]

which actually contains only two independent transition probabilities, because
P_12(t) = 1 − P_11(t) and P_21(t) = 1 − P_22(t). The forward equation
simplifies to

P′_11(t) = −(α+β) P_11(t) + β
P′_22(t) = −(α+β) P_22(t) + α

Only the first equation needs to be solved since, by symmetry, the solution of
P_22(t) follows from that of P_11(t) after changing the roles α → β and β → α.
The solution of the linear, first-order, non-homogeneous differential equation
consists of the solution of the corresponding homogeneous differential
equation and a particular solution. The solution of the homogeneous
differential equation, P′_11(t) = −(α+β) P_11(t), is P_11(t) = C e^{−(α+β)t};
the particular solution is generally found by variation of the constant C.
With the initial conditions P_11(0) = P_22(0) = 1, the result is

P_11(t) = β/(α+β) + α/(α+β) e^{−(α+β)t}
P_22(t) = α/(α+β) + β/(α+β) e^{−(α+β)t}
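A quick sanity check of this solution: the closed form should satisfy the
forward equation P′_11 = −(α+β) P_11 + β, which the sketch below tests with a
central finite difference (the rates are illustrative):

```python
import math

# Finite-difference check that the closed-form P_11(t) solves the
# forward equation P'_11 = -(alpha+beta) P_11 + beta.
alpha, beta = 1.5, 0.5

def P11(t):
    s = alpha + beta
    return beta / s + alpha / s * math.exp(-s * t)

h = 1e-6
residuals = []
for t in (0.1, 0.5, 2.0):
    deriv = (P11(t + h) - P11(t - h)) / (2 * h)     # central difference
    residuals.append(abs(deriv - (-(alpha + beta) * P11(t) + beta)))
```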
10.7 Time reversibility

Consider a stationary Markov process {X_n} and reverse the time axis. The
reversed process is again a Markov process. Indeed, consider the conditional
probability

Pr[X_n = x_n | ∩_{m=1}^k {X_{n+m} = x_{n+m}}]

Since the intersection is commutative, A ∩ B = B ∩ A, the indices can be
reversed. The original stationary process is a Markov process that satisfies
(9.2). Using (9.2) and (9.3), we have

Pr[∩_{m=0}^k {X_{n+m} = x_{n+m}}]
  = Pr[X_{n+k} = x_{n+k} | X_{n+k−1} = x_{n+k−1}] ··· Pr[X_{n+1} = x_{n+1} | X_n = x_n] Pr[X_n = x_n]

and, similarly,

Pr[∩_{m=1}^k {X_{n+m} = x_{n+m}}]
  = Pr[X_{n+k} = x_{n+k} | X_{n+k−1} = x_{n+k−1}] ··· Pr[X_{n+2} = x_{n+2} | X_{n+1} = x_{n+1}] Pr[X_{n+1} = x_{n+1}]

Hence, the ratio of both reduces to

Pr[X_n = x_n | ∩_{m=1}^k {X_{n+m} = x_{n+m}}]
  = Pr[X_{n+1} = x_{n+1} | X_n = x_n] Pr[X_n = x_n] / Pr[X_{n+1} = x_{n+1}]

which only depends on x_n and x_{n+1}. Applying Bayes' rule (2.48) to the last
relation finally proves the theorem.
The transition probabilities of the time-reversed chain are therefore

R_ij = π_j P_ji / π_i        (10.26)

A Markov chain is said to be time reversible if, for all i and j, P_ij = R_ij.
From (10.26), the condition for time reversibility is

π_i P_ij = π_j P_ji        (10.27)

This condition means that, for all states i and j, the rate π_i P_ij from
state i → j equals the rate π_j P_ji from state j → i. An interesting property
of time-reversible Markov chains is that any vector x satisfying ||x||_1 = 1
and

x_i P_ij = x_j P_ji        (10.28)

for all i and j equals the steady-state vector π.

We will now show that the sojourn times of the time-reversed continuous
Markov process are indeed exponential random variables. Assume that the
time-reversed process is in state i at time t. The probability that the
process is still in state i at reversed time t − u is, using Theorem 10.2.3,

Pr[X(τ) = i, t−u ≤ τ ≤ t] / Pr[X(t) = i] = Pr[X(t−u) = i] e^{−q_i u} / Pr[X(t) = i] = e^{−q_i u}

because, in the steady state, Pr[X(t−u) = i] = Pr[X(t) = i] = π_i. The
embedded chain of the reversed process has transition probabilities
v_j V_ji / v_i, so that the rates r_ij of the time-reversed process obey

r_ij = q_i v_j V_ji / v_i = π_j q_j V_ji / π_i = π_j q_ji / π_i        (10.29)
Comparing (10.29) with the discrete case (10.26), we see that the transition
probabilities P_ij and R_ij are exchanged for the rates q_ij and r_ij. We know
that π_j is the portion of time the process (both forward and reversed) spends
in state j, and that q_ij is the rate at which the process makes transitions
from state i to state j. Equation (10.29) again has a balance interpretation:
π_j q_ji is the rate at which the forward process moves from state j to i,
while π_i r_ij is the rate of the time-reversed process from state i to j, and
both rates are equal. Intuitively, when a process jumps from state i → j in
forward time, it is plain that the process makes, in reversed time, just the
opposite transition, from j → i. Similarly as above, a continuous-time Markov
chain is time reversible if, for all i and j, it holds that r_ij = q_ij. For
these processes (which occur often in practice, as demonstrated in the
chapters on queueing), the rate from i → j equals the rate from j → i, since
π_i q_ij = π_j q_ji.
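Birth-death-type chains (treated in Chapter 11) are the standard example:
their product-form steady state satisfies detailed balance π_i q_ij = π_j q_ji
by construction, so they are time reversible. A minimal check with
illustrative rates:

```python
# Time reversibility of a small birth-death chain: build pi from the
# product form and check detailed balance pi_j lambda_j = pi_{j+1} mu_{j+1}.
lam = [1.0, 2.0, 0.5]        # illustrative birth rates lambda_0..lambda_2
mu = [3.0, 1.5, 2.0]         # illustrative death rates mu_1..mu_3

# unnormalized pi_j = prod_{m<j} lambda_m / mu_{m+1}
w = [1.0]
for j in range(3):
    w.append(w[-1] * lam[j] / mu[j])
Z = sum(w)
pi = [x / Z for x in w]

# detailed balance: pi_j * lambda_j == pi_{j+1} * mu_{j+1}
balance_gaps = [abs(pi[j] * lam[j] - pi[j + 1] * mu[j]) for j in range(3)]
```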
10.8 Problems

(i) Consider a computer that has two identical and independent processors.
The time between failures has an exponential distribution. The mean value
of this distribution is 1000 hours. The repair time for a damaged
processor is exponentially distributed as well, with a mean value of 100
hours. We assume damaged processors can be repaired in parallel. There
are clearly three states for this computer: (1) both processors work,
(2) one processor is damaged and (3) both processors are damaged.
(a) Make a continuous Markov chain representation of these states.
(b) What is the infinitesimal generator matrix Q for this Markov
chain? Give the relation between the state probability at time t
and its derivative.
(c) Calculate the steady state of this process.
(d) What is the availability of the computer if (i) both processors are
required to work, or (ii) at least one processor should work?
(ii) Consider two identical servers that are working in parallel. When one
server fails, the other has to do the whole job alone, under a higher
load. The failure times of the servers are exponentially distributed,
with rate λ_E = 3 × 10^{−4} h^{−1} when the servers are equally loaded
and λ_C = 7 × 10^{−4} h^{−1} when one of the servers works under the
full load. In addition, both servers may fail at the same time, with a
failure rate of λ_B = 6 × 10^{−5} h^{−1}. As soon as one of the servers
fails, the repair is initiated.
11
Applications of Markov chains
This chapter illustrates the theory of Markov chains with several examples.
Examples of queueing problems are deferred to later chapters. Generally,
Markov processes can be solved explicitly provided the transition probability
matrix P or the infinitesimal generator Q has a special structure. Only in a
very small number of problems is the entire time dependence of the process
available in analytic form.
Let {Y_n, n ≥ 1} be i.i.d. non-negative, integer random variables with
Pr[Y_n = k] = a_k. The maximum process X_n = max(Y_1, ..., Y_n) is a
discrete-time Markov chain: for j > i, Pr[X_{n+1} = j | X_n = i] =
Pr[Y_{n+1} = j] = a_j, while, for j = i,

Pr[X_{n+1} = i | X_n = i] = Pr[Y_{n+1} ≤ i] = Σ_{k=1}^i a_k = A_i

so that, for four states, the transition probability matrix is

P = [ A_1  a_2  a_3  a_4 ]
    [ 0    A_2  a_3  a_4 ]
    [ 0    0    A_3  a_4 ]
    [ 0    0    0    A_4 ]

A related discrete-time Markov chain is X_n = Σ_{k=1}^n Y_k, which obeys
X_{n+1} = X_n + Y_{n+1}. Furthermore, if j ≤ i, Pr[X_{n+1} = j | X_n = i] = 0,
because the random variables Y_n are non-negative, such that the sum cannot
decrease by adding a new member. If j > i, then Pr[X_{n+1} = j | X_n = i] =
Pr[Y_{n+1} = j − i] = a_{j−i}, which leads to an upper-triangular Toeplitz
transition probability matrix.

This list can be extended by considering other integer functions of the set
{Y_n}_{n≥1} (such as X_{n+1} = min[X_n, Y_{n+1}], etc.).
11.2 The general random walk

The general random walk is an important model that describes the motion of an
item that is constrained to moving either one step forward, staying at its
current position, or moving one step backward. In general, this
three-possibility motion has transition probabilities that depend on the
position j, as depicted in Fig. 11.1. Figure 11.1 illustrates that, if the
process is in state j, it has three possible choices: remain in state j with
probability r_j, move to state j+1 with probability p_j, or move to state j−1
with probability q_j. The transition probability matrix is

P = [ r_0  p_0   0    0   ...  0 ]
    [ q_1  r_1  p_1   0   ...  0 ]
    [ 0    q_2  r_2  p_2  ...  0 ]
    [ ...                        ]        (11.1)

[Fig. 11.1. The general random walk: from state j, the walk moves to j+1 with
probability p_j, to j−1 with probability q_j, and stays in j otherwise.]

The general random walk serves as a model for a number of practical phenomena:

- The one-dimensional motion of physical particles, e.g. electrons that hop
  from one atom to another. In this case, the number of states N can be very
  large.
- The gambler's ruin problem: a state j reflects the capital of a gambler,
  p_j is the chance that the gambler wins and q_j is the probability that he
  loses. The gambler achieves his target when he reaches state N, but he is
  ruined at state 0. In that case, both states are absorbing, with
  r_0 = r_N = 1. In games, most often the probabilities are independent of
  the state and simplify to p_j = p, q_j = q and r_j = 1 − p − q.
- The continuous-time counterpart, the birth and death process (Section
  11.3), has applications to queueing processes.

For a wealth of examples and applications of the random walk, we refer to the
classical treatise of Feller (1970, Chapters III, XIV).
Let u_j denote the probability that the random walk, starting from state j, is
eventually absorbed in state 0 (the ruin probability in the gambler's
interpretation). First-step analysis based on the Markov property gives, for
1 ≤ j < N,

u_j = Σ_{k=0}^N P_jk u_k = q_j u_{j−1} + r_j u_j + p_j u_{j+1}

with boundary conditions u_0 = 1 and u_N = 0. Since r_j = 1 − p_j − q_j, the
recursion can be rewritten as

u_{j+1} = (1 + q_j/p_j) u_j − (q_j/p_j) u_{j−1}        (11.2)

Iteration on j for the first few values, using u_0 = 1, yields

u_2 = (1 + q_1/p_1) u_1 − q_1/p_1
u_3 = (1 + q_1/p_1 + q_1 q_2/(p_1 p_2)) u_1 − (q_1/p_1 + q_1 q_2/(p_1 p_2))

which suggests

u_j = (1 + Σ_{k=1}^{j−1} Π_{m=1}^k (q_m/p_m)) u_1 − Σ_{k=1}^{j−1} Π_{m=1}^k (q_m/p_m)

The boundary condition u_N = 0 determines u_1 and leads to

u_j = Σ_{k=j}^{N−1} Π_{m=1}^k (q_m/p_m) / (1 + Σ_{k=1}^{N−1} Π_{m=1}^k (q_m/p_m))        (11.3)
Let η_j = E[T | X_0 = j] denote the mean duration of the game, i.e. the mean
time until absorption when starting from state j. A similar first-step
analysis gives η_j = 1 + q_j η_{j−1} + r_j η_j + p_j η_{j+1}, with
η_0 = η_N = 0, or

η_{j+1} = (1 + q_j/p_j) η_j − (q_j/p_j) η_{j−1} − 1/p_j

By iteration,

η_2 = (1 + q_1/p_1) η_1 − 1/p_1
η_3 = (1 + q_1/p_1 + q_1 q_2/(p_1 p_2)) η_1 − (1/p_1 + 1/p_2 + q_2/(p_1 p_2))

or, in general,

η_j = (1 + Σ_{k=1}^{j−1} Π_{m=1}^k (q_m/p_m)) η_1
      − Σ_{k=1}^{j−1} (1/p_k) (1 + Σ_{n=1}^{k−1} Π_{m=1}^n (q_{k−m+1}/p_{k−m}))        (11.4)

where the boundary condition η_N = 0 determines

η_1 = Σ_{k=1}^{N−1} (1/p_k) (1 + Σ_{n=1}^{k−1} Π_{m=1}^n (q_{k−m+1}/p_{k−m}))
      / (1 + Σ_{k=1}^{N−1} Π_{m=1}^k (q_m/p_m))
For state-independent probabilities p_j = p and q_j = q, the ruin probability
(11.3) simplifies, with ρ = q/p, to

u_j = (ρ^j − ρ^N) / (1 − ρ^N)    (p ≠ q)

and u_j = (N − j)/N in the fair game p = q. If the number of states N tends to
infinity,

lim_{N→∞} u_j = 1          if q ≥ p
lim_{N→∞} u_j = (q/p)^j    if q < p

which demonstrates that the gambler will surely lose all his money if his
chances p of winning are smaller than those q of losing. Even in a fair game,
where p = q, he will surely be defeated. In a favorable game (p > q) and with
start capital j, ruin is still possible, with probability (q/p)^j. Another
interpretation is a game with two players a and b, in which player a starts
with capital j and has winning chance p, while player b starts with capital
N − j and wins with probability q = 1 − p.

Similarly, the mean duration of the game (11.4) simplifies, for p ≠ q, to

E[T | X_0 = j] = (1/(p−q)) ( N (1 − (q/p)^j)/(1 − (q/p)^N) − j )        (11.5)
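The closed-form ruin probability can be compared against a direct Monte Carlo
simulation of the game (illustrative values p = 0.6, N = 10, starting capital
3):

```python
import random

# Gambler's ruin with constant p and q = 1-p: compare the closed form
# u_j = ((q/p)^j - (q/p)^N) / (1 - (q/p)^N) with simulation.
p, N, j0 = 0.6, 10, 3
q = 1 - p
rho = q / p
u_exact = (rho ** j0 - rho ** N) / (1 - rho ** N)

rng = random.Random(1)
ruins = 0
runs = 100_000
for _ in range(runs):
    j = j0
    while 0 < j < N:                      # play until absorption
        j += 1 if rng.random() < p else -1
    ruins += (j == 0)
u_sim = ruins / runs
```

With 10^5 runs the statistical error is of the order of 10^{-3}, well below
the tolerance used here.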
The steady-state vector π of the general random walk follows from π = π P. The
balance equations at the boundary states read p_0 π_0 = q_1 π_1 and
p_{N−1} π_{N−1} = q_N π_N. Explicitly, for a few values of j, using the
state-0 relation, we observe that

p_1 π_1 = (p_0 π_0 − q_1 π_1) + q_2 π_2 = q_2 π_2
p_2 π_2 = (p_1 π_1 − q_2 π_2) + q_3 π_3 = q_3 π_3

and, in general,

p_j π_j = q_{j+1} π_{j+1}

Starting at j = 0, we find

π_j = (p_{j−1}/q_j) (p_{j−2}/q_{j−1}) ··· (p_0/q_1) π_0 = π_0 Π_{m=0}^{j−1} (p_m/q_{m+1})

which, with the normalization Σ_{j=0}^N π_j = 1, determines the complete
steady-state vector for the general random walk as

π_j = Π_{m=0}^{j−1} (p_m/q_{m+1}) / (1 + Σ_{k=1}^N Π_{m=0}^{k−1} (p_m/q_{m+1}))        (11.8)

These relations remain valid even when the number of states N tends to
infinity, provided the infinite sum converges.
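The product form (11.8) can be verified against the balance equations
directly; the sketch below builds π for an illustrative five-state walk and
checks that the probability inflow at every state equals π_j:

```python
# Check the product form (11.8) against pi = pi P for a small general
# random walk with illustrative state-dependent p_j, q_j.
N = 4
p = [0.5, 0.4, 0.3, 0.2, 0.0]      # p_j, with p_N = 0
q = [0.0, 0.3, 0.4, 0.5, 0.6]      # q_j, with q_0 = 0
r = [1 - p[j] - q[j] for j in range(N + 1)]

# pi_j proportional to prod_{m=0}^{j-1} p_m / q_{m+1}
w = [1.0]
for j in range(N):
    w.append(w[-1] * p[j] / q[j + 1])
Z = sum(w)
pi = [x / Z for x in w]

# residuals of pi_j = p_{j-1} pi_{j-1} + r_j pi_j + q_{j+1} pi_{j+1}
res = []
for j in range(N + 1):
    inflow = r[j] * pi[j]
    if j > 0:
        inflow += p[j - 1] * pi[j - 1]
    if j < N:
        inflow += q[j + 1] * pi[j + 1]
    res.append(abs(pi[j] - inflow))
```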
If the ratio p_m/q_{m+1} = ρ is independent of m, (11.8) reduces to

π_j = (1 − ρ) ρ^j / (1 − ρ^{N+1})        (11.9)

11.3 The birth and death process

The birth and death process is the continuous-time counterpart of the general
random walk: from state j, only transitions to the neighboring states are
possible, a birth (j → j+1) occurring with rate λ_j and a death (j → j−1)
with rate μ_j. The infinitesimal generator of the birth and death process is

Q = [ −λ_0       λ_0           0            0          ... ]
    [  μ_1    −(λ_1+μ_1)      λ_1           0          ... ]
    [  0         μ_2       −(λ_2+μ_2)      λ_2         ... ]
    [  0         0             μ_3      −(λ_3+μ_3)     ... ]
    [ ...                                                  ]

The transition graph is shown in Fig. 11.2. Although the theory in the
previous chapter was derived for finite-state Markov chains, the birth and
death process is a generalization to an infinite number of states. The general
random walk (Section 11.2) forms the embedded Markov chain of the birth and
death process, with transition probabilities specified by (10.21), resulting
in V_{i,i−1} = μ_i/(λ_i + μ_i), V_{i,i+1} = λ_i/(λ_i + μ_i) and V_{ik} = 0 for
k ≠ i − 1, i + 1. The transition probability matrix is a tri-band diagonal
matrix, which is irreducible if all λ_j > 0 and μ_j > 0.

[Fig. 11.2. The birth and death process: from state j, births (to state j+1)
occur with rate λ_j and deaths (to state j−1) with rate μ_j.]

The differential equations (10.11) for the state probabilities become

s′_0(t) = −λ_0 s_0(t) + μ_1 s_1(t)        (11.10)
s′_k(t) = −(λ_k + μ_k) s_k(t) + λ_{k−1} s_{k−1}(t) + μ_{k+1} s_{k+1}(t)        (11.11)

with initial condition s_k(0) = Pr[X(0) = k]. Exact analytic solutions for

Kleinrock (1975, p. ix) mentions that William Feller was the father of the
birth and death process.
209
any n and n are not possible. Indeed, let us denote the Laplace transform
of vn (w) by
Z "
h3}w vn (w)gw=
(11.12)
Vn (}) =
0
Since vn (w) is a continuous and bounded function (|vn (w)| 1 for all w A 0),
the Laplace transform exists for Re(}) A 0. The Laplace transform of (11.10)
and (11.11) becomes,
(0 + }) V0 (}) = v0 (0) + 1 V1 (})
(11.13)
(11.14)
which is a set of dierence equations more complex due to the initial condition vn (0) than the set (A.51) in Appendix A.5.2.3. That set (A.51) appears
in the general random walk whose solution is shown to be intractable in general. This innite set of dierential equations has been thoroughly studied
over years under several simplifying conditions for n and n , for example,
n = and n = for all n. As shown in Chapter 13, they form the basis
for the simplest set of queueing models of the family M/M/m/K.
m =
p
p=0 p+1
m=1
Qm31 p
p=0 p+1
P" Qm31 p
+ m=1 p=0 p+1
1+
1
P" Qm31
(11.15)
m1
(11.16)
Theorem 9.3.4 states that an irreducible Markov chain with a finite number of
states is necessarily recurrent. However, it is in general difficult to decide
whether an irreducible Markov chain with an infinite number of states is
recurrent or transient. In the case of the birth and death process, it is
possible to determine when the process is transient or recurrent. The process
is transient if and only if the embedded Markov chain (determined by the
general random walk) is transient. Thus, for any fixed initial state j, the
condition for a recurrent chain,

Pr[T_j < ∞ | X_0 = i] = 1

is, by (11.3), only possible in the limit N → ∞ if
lim_{N→∞} Σ_{k=0}^{N−1} Π_{m=1}^k (q_m/p_m) = ∞. Transformed to the birth and
death rates via the embedded-chain probabilities, the condition for recurrence
becomes

S_2 = Σ_{j=1}^∞ Π_{m=1}^j (μ_m/λ_m) = ∞

Furthermore, we observe from (11.15) that the infinite series

S_1 = Σ_{j=1}^∞ Π_{m=0}^{j−1} (λ_m/μ_{m+1})

must converge in order to have a stationary or steady-state distribution. In
summary, if S_1 < ∞ and S_2 = ∞, the birth and death process is positive
recurrent; if S_1 = ∞ and S_2 = ∞, it is null recurrent; if S_2 < ∞, the
birth and death process is transient.
11.3.2 A pure birth process
A pure birth process is defined as a process $\{X(t),\, t \ge 0\}$ for which in any state $i$ it holds that $\mu_i = 0$. It follows from Fig. 11.2 that a birth process can only jump to higher states, such that $P_{ij}(t) = 0$ for $j < i$. Similarly, in a pure death process $\{X(t),\, t \ge 0\}$ all birth rates $\lambda_i = 0$.
11.3.2.1 The Poisson process
Let us first consider the simplest case where all birth rates are equal, $\lambda_i = \lambda$, and where $P_{jj}(t) = \Pr[\tau_j > t \,|\, X(0) = j] = e^{-\lambda t}$. Using either the backward or forward equation, or (10.25) with $V_{i,i+1} = \lambda_{i,i+1}$, yields
$$P_{ij}(t) = \delta_{ij}\, e^{-\lambda t} + \lambda\int_0^t e^{-\lambda u}\, P_{i+1,j}(t-u)\,du$$
Iterating this integral equation gives
$$P_{i,i+k}(t) = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t} \qquad (11.17)$$
Explicitly, for $k = 1$,
$$P_{i,i+1}(t) = \lambda\int_0^t e^{-\lambda u}\, e^{-\lambda(t-u)}\,du = \lambda t\, e^{-\lambda t}$$
and the general induction step reads
$$P_{i,i+k}(t) = \lambda\int_0^t e^{-\lambda u}\,\frac{(\lambda(t-u))^{k-1}}{(k-1)!}\, e^{-\lambda(t-u)}\,du = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}$$
The increments over $(u, t)$ then obey
$$\Pr[X(t) - X(u) = k] = \sum_{i\ge 0}\Pr[X(u) = i]\, P_{i,i+k}(t-u) = \frac{(\lambda(t-u))^k}{k!}\, e^{-\lambda(t-u)} \qquad (11.18)$$
and, since $X(0) = 0$ and the increments are independent (Markov property), we conclude that the pure birth process with constant rate is a Poisson process (Section 7.2).
11.3.2.2 The general birth process
In case the birth rates $\lambda_k$ depend on the actual state $k$, the pure birth process can be regarded as the simplest generalization of the Poisson process. The Laplace transform of the set (11.13)-(11.14) with $\mu_k = 0$ is solved by
$$S_k(z) = \sum_{j=0}^{k} s_j(0)\,\frac{\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j}^{k}(\lambda_m+z)} \qquad (11.19)$$
with the convention that $\prod_{m=a}^{b} f(m) = 1$ if $a > b$. The validity of this general solution is verified by substitution into the difference equation for $S_k(z)$. The form of $S_k(z)$ is a ratio that can always be transformed back to the time domain, provided that $\lambda_k$ is known. If all $\lambda_k > 0$ are distinct, using (2.38) with $c > 0$, we find
$$s_k(t) = \sum_{j=0}^{k} s_j(0)\prod_{m=j}^{k-1}\lambda_m\;\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}}{\prod_{m=j}^{k}(\lambda_m+z)}\,dz$$
By closing the contour over the negative real plane ($\operatorname{Re}(z) < 0$), only simple poles at $z = -\lambda_n$ are encountered,
$$\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}}{\prod_{m=j}^{k}(\lambda_m+z)}\,dz = \sum_{n=j}^{k}\frac{e^{-\lambda_n t}}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m-\lambda_n)}$$
resulting in
$$s_k(t) = \sum_{j=0}^{k} s_j(0)\sum_{n=j}^{k} e^{-\lambda_n t}\,\frac{\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m-\lambda_n)} \qquad (11.20)$$
11.3.2.3 The Yule process
For the Yule process, where $\lambda_m = m\lambda$, we have $\prod_{m=j}^{n-1}(\lambda_m - \lambda_n) = \lambda^{n-j}\prod_{m=j}^{n-1}(m-n) = (-\lambda)^{n-j}(n-j)!$ and $\prod_{m=n+1}^{k}(\lambda_m-\lambda_n) = \lambda^{k-n}(k-n)!$, while $\prod_{m=j}^{k-1}\lambda_m = \lambda^{k-j}\frac{(k-1)!}{(j-1)!}$, so that
$$\frac{\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m-\lambda_n)} = \frac{(-1)^{n-j}\,(k-1)!}{(j-1)!\,(n-j)!\,(k-n)!}$$
such that
$$\sum_{n=j}^{k} e^{-\lambda_n t}\,\frac{\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m-\lambda_n)} = \frac{(k-1)!\,e^{-j\lambda t}}{(k-j)!\,(j-1)!}\sum_{n=0}^{k-j}\binom{k-j}{n}\left(-e^{-\lambda t}\right)^{n} = \binom{k-1}{j-1}\, e^{-j\lambda t}\left(1-e^{-\lambda t}\right)^{k-j}$$
Finally, for the Yule process, we obtain from (11.20) the evolution of the state probabilities over time,
$$s_k(t) = \sum_{j=0}^{k}\binom{k-1}{j-1}\, e^{-j\lambda t}\left(1-e^{-\lambda t}\right)^{k-j} s_j(0) \qquad (11.21)$$
In practice, $s_j(0) = \delta_{jn}$ if the process starts from state $n$ (implying $s_k(t) = 0$ for $k < n$, because the process moves to the right for $t \ge 0$) and the general form simplifies to
$$s_k(t) = \binom{k-1}{n-1}\, e^{-n\lambda t}\left(1-e^{-\lambda t}\right)^{k-n} \qquad (11.22)$$
The Yule process has been used as a simple model for the evolution of a population in which each individual gives birth at exponential rate $\lambda$ and $X(t)$ denotes the number of individuals in the population (which never decreases, as there are no deaths) as a function of time $t$. At each state $k$ the population has precisely $k$ individuals that each generate births, such that $\lambda_k = k\lambda$ is the birth rate of the population. If the population starts at $t = 0$ with one individual ($n = 1$), the evolution over time has the distribution $s_k(t) = e^{-\lambda t}\left(1-e^{-\lambda t}\right)^{k-1}$, which is recognized from (3.5) as a geometric distribution with mean $e^{\lambda t}$. Since the sojourn times of a Markov process are independent exponential random variables, the average time $T_k$ to reach $k$ individuals from one ancestor equals $E[T_k] = \sum_{j=1}^{k}\frac{1}{\lambda_j} = \frac{1}{\lambda}\sum_{j=1}^{k}\frac{1}{j}$, which is well approximated (Abramowitz and Stegun, 1968, Section 6.3.18) as $E[T_k] \approx \frac{\log(k+1)+\gamma}{\lambda}$, where $\gamma = 0.577\,215\ldots$ is Euler's constant. If the population starts with $n$ individuals, the distribution (11.22) at time $t$ consists of a sum of $n$ i.i.d. geometric random variables, which is a negative binomial distribution. The Yule process has been employed, for example, as a crude model to estimate the spread of a disease or epidemic and the split of molecules into new species by cosmic rays.
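The geometric character of the Yule distribution can be checked numerically; the following sketch (the parameters $\lambda = 1$, $t = 2$ are illustrative) evaluates (11.22) with one ancestor and compares the mean population with $e^{\lambda t}$:

```python
import math

# Sketch: evaluate the Yule distribution (11.22) with one ancestor (n = 1),
# s_k(t) = e^{-lambda t} (1 - e^{-lambda t})^{k-1}, a geometric distribution,
# and check its mean against e^{lambda t}.
def yule_pmf(k, t, lam=1.0):
    p = math.exp(-lam * t)               # geometric "success" probability
    return p * (1 - p) ** (k - 1)

t, lam = 2.0, 1.0
ks = range(1, 20000)                     # truncation; the tail is negligible
total = sum(yule_pmf(k, t, lam) for k in ks)
mean = sum(k * yule_pmf(k, t, lam) for k in ks)
assert abs(total - 1.0) < 1e-6           # probabilities sum to 1
assert abs(mean - math.exp(lam * t)) < 1e-3   # mean population e^{lambda t}
```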
$$\pi_j = (1-\rho)\rho^j, \qquad j \ge 0 \qquad (11.23)$$
only depends on the ratio $\rho = \frac{\lambda}{\mu}$ of birth over death rate. The time-dependent constant rate birth and death process can still be computed in analytic form. In this case, the matrix form of the infinitesimal generator $Q$ has a tri-band Toeplitz structure, which can be diagonalized in analytic form as shown in Appendix A.5.2.1. In this section, we present an alternative approach. Instead of dealing with an infinite set of difference equations, a generating function approach seems more convenient. Let us denote the generating function of the Laplace transforms $S_k(z)$ by
$$\varphi(x,z) = \sum_{k=0}^{\infty} S_k(z)\,x^k = \int_0^\infty e^{-zt}\sum_{k=0}^{\infty} s_k(t)\,x^k\,dt \qquad (11.24)$$
where the reversal of summation and integration is allowed because all terms are positive. Since $0 \le s_k(t) \le 1$, the sum converges at least for $|x| < 1$, which shows that $\varphi(x,z)$ is analytic inside the unit circle $|x| < 1$ for any $\operatorname{Re}(z) > 0$.
After multiplying (11.14) by $x^k$, summing over all $k \ge 1$ and adding (11.13), we obtain
$$(\lambda+z)S_0(z) + (\lambda+\mu+z)\sum_{k=1}^{\infty}S_k(z)x^k = \sum_{k=0}^{\infty}s_k(0)x^k + \mu S_1(z) + \lambda\sum_{k=1}^{\infty}S_{k-1}(z)x^k + \mu\sum_{k=1}^{\infty}S_{k+1}(z)x^k$$
If the process starts in state $j$, i.e. $s_k(0) = \delta_{kj}$, solving for $\varphi(x,z)$ gives
$$\varphi(x,z) = \frac{\mu(1-x)\,S_0(z) - x^{j+1}}{\lambda x^2 - x(\lambda+\mu+z) + \mu} \qquad (11.25)$$
The result (11.25) still depends on the unknown function $S_0(z)$. The following derivation, involving the theory of complex functions, demonstrates a standard procedure that will also be useful in other queueing problems.
The denominator in (11.25) has two roots,
$$x_{1,2} = \frac{\lambda+\mu+z}{2\lambda} \pm \frac{1}{2\lambda}\sqrt{(\lambda+\mu+z)^2 - 4\lambda\mu}$$
We need the powerful theorem of Rouché (Titchmarsh, 1964, p. 116) to deduce more on the location of $x_1$ and $x_2$.
Theorem 11.3.1 (Rouché) If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z)+g(z)$ have the same number of zeros inside $C$.
Choose $f(x) = \mu - x(\lambda+\mu+z)$ and $g(x) = \lambda x^2$, such that $f(x)+g(x) = \lambda x^2 - x(\lambda+\mu+z) + \mu$, the denominator in (11.25). Since both $f(x)$ and $g(x)$ are polynomials, they are analytic everywhere in the complex $x$-plane. We know that $\varphi(x,z)$ is analytic inside the unit disk. If the root $x_1$ or $x_2$ lies inside the unit disk, the numerator in (11.25) must have a zero at precisely the same place in order for $\varphi(x,z)$ to be analytic inside the unit disk. Hence, we consider as contour $C$ in Rouché's Theorem the unit circle $|x| = 1$. Clearly, $f(x)$ has one single zero $\frac{\mu}{\lambda+\mu+z}$ inside the unit circle (because $\lambda > 0$, $\mu > 0$ and $\operatorname{Re}(z) > 0$). Furthermore, on the unit circle $|x| = 1$,
$$|\mu - x(\lambda+\mu+z)| \ge \bigl|\,|x|\,|\lambda+\mu+z| - \mu\,\bigr| = |\lambda+\mu+z| - \mu > \lambda = |\lambda x^2|$$
which shows that $|g(x)| < |f(x)|$ on the unit circle. Rouché's Theorem then tells us that $f(x)+g(x)$ has precisely one zero inside the unit circle. This implies that $|x_1| > 1$ and $|x_2| < 1$, and that the numerator in (11.25) must have a zero at $x_2$,
$$\mu(1-x_2)\,S_0(z) - x_2^{\,j+1} = 0$$
This relation determines the unknown function $S_0(z)$ as
$$S_0(z) = \frac{x_2^{\,j+1}}{\mu(1-x_2)}$$
so that, with $\lambda x^2 - x(\lambda+\mu+z) + \mu = \lambda(x-x_1)(x-x_2)$,
$$\varphi(x,z) = \frac{(1-x)\,x_2^{\,j+1} - (1-x_2)\,x^{j+1}}{\lambda(1-x_2)(x-x_1)(x-x_2)}$$
We know that the numerator can be divided by $(x-x_2)$, or explicitly,
$$(1-x)\,x_2^{\,j+1} - (1-x_2)\,x^{j+1} = x_2^{\,j+1} - x^{j+1} + x_2\,x\left(x^j - x_2^{\,j}\right) = -(x-x_2)\left[x_2^{\,j} + (1-x_2)\sum_{k=1}^{j} x_2^{\,j-k}\, x^k\right]$$
Finally,
$$\varphi(x,z) = \frac{x_2^{\,j} + (1-x_2)\sum_{k=1}^{j} x_2^{\,j-k}\, x^k}{\lambda(1-x_2)(x_1-x)} \qquad (11.26)$$
Expanding $\frac{1}{x_1-x} = \frac{1}{x_1}\sum_{m=0}^{\infty}\left(\frac{x}{x_1}\right)^m$, which converges for $|x| < |x_1|$ and hence inside the unit disk, and equating the coefficients of $x^k$ in (11.24) and (11.26) yields
$$S_k(z) = \frac{x_2^{\,j}}{\lambda(1-x_2)\,x_1^{\,k+1}} + \frac{1}{\lambda}\sum_{n=1}^{\min(j,k)}\frac{x_2^{\,j-n}}{x_1^{\,k+1-n}}$$
This expression can be put in different forms by using relations among the zeros $x_1$ and $x_2$, such as $x_1 + x_2 = \frac{\lambda+\mu+z}{\lambda}$ and $x_1 x_2 = \frac{\mu}{\lambda}$. This ingenuity is required to recognize in $S_k(z)$ a known Laplace transform. Otherwise, one has to proceed by computing the inverse Laplace transform by contour integration via (2.38). In any case, the computation needs advanced skills in complex function theory and we content ourselves here with presenting the result without derivation (see e.g. Cohen (1969, pp. 80-82)),
$$s_k(t) = e^{-(\lambda+\mu)t}\left[\rho^{(k-j)/2}\, I_{k-j}(at) + \rho^{(k-j-1)/2}\, I_{k+j+1}(at) + (1-\rho)\,\rho^{k}\sum_{m=k+j+2}^{\infty}\rho^{-m/2}\, I_m(at)\right] \qquad (11.27)$$
where $\rho = \frac{\lambda}{\mu}$, $a = 2\sqrt{\lambda\mu}$ and where $I_v(z)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6.1). Using the asymptotic formulas for the modified Bessel function, the behavior of $s_k(t)$ for large $t$ can be derived (see e.g. Cohen (1969, p. 84)): for $\rho \ne 1$,
$$s_k(t) = (1-\rho)\rho^k + O\!\left(t^{-3/2}\, e^{-(\sqrt{\lambda}-\sqrt{\mu})^2 t}\right)$$
while for $\rho = 1$ the approach towards equilibrium is only algebraic, of order $t^{-1/2}$. Thus, for $\rho < 1$, $s_k(t)$ converges to the steady state $(1-\rho)\rho^k$ with a relaxation rate $(\sqrt{\lambda}-\sqrt{\mu})^2$. Clearly, the higher $\rho$, the lower the relaxation rate and the slower the process tends to equilibrium, as illustrated in Fig. 11.3. Intuitively, two effects play a role. Since the probability that states with large $k$ are visited increases with increasing $\rho$, the build-up time for this occupation will be larger. In addition, the variability of the number of visited states (further derived for the M/M/1 queue in Section 14.1) increases with increasing $\rho$, which
[Figure 11.3]
Fig. 11.3. The probability $s_4(t)$ that the process is in state 4, given that it started from state 0, as a function of time (in units of average death time, $\mu = 1$) for various $\rho = \frac{\lambda}{\mu}$. The insert shows the relaxation time $\theta = (1-\sqrt{\rho})^{-2}$ (in units of average death time, $\mu = 1$). The corresponding steady-state probabilities $\pi_4$ are 0.0012, 0.015, 0.051, 0.072, 0.082, 0.065 for $\rho = 0.2, 0.4, 0.6, 0.7, 0.8, 0.9$, respectively. Observe that, for $\rho = 0.9$, the plotted 100 time units are smaller than the relaxation time, which is 379 time units.
suggests that larger oscillations of the sample paths around the steady-state are likely to occur, enlarging the convergence time.
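A minimal sketch of this relaxation behavior (the rates and the truncation level are illustrative assumptions, not the book's computation): integrating the forward equations for constant rates on a truncated state space shows $s_k(t)$ approaching the steady state (11.23):

```python
# Sketch: Euler integration of the constant-rate birth-death equations
# s_k'(t) = lambda s_{k-1} - (lambda + mu) s_k + mu s_{k+1}, truncated at K,
# started in state 0; the solution relaxes to the steady state (11.23).
def transient(lam, mu, t_end, dt=0.005, K=60):
    s = [0.0] * (K + 1)
    s[0] = 1.0                                     # X(0) = 0
    for _ in range(int(t_end / dt)):
        new = s[:]
        for k in range(K + 1):
            inflow = (lam * s[k - 1] if k > 0 else 0.0) + \
                     (mu * s[k + 1] if k < K else 0.0)
            outrate = (lam if k < K else 0.0) + (mu if k > 0 else 0.0)
            new[k] = s[k] + dt * (inflow - outrate * s[k])
        s = new
    return s

lam, mu = 0.5, 1.0                                  # rho = 0.5
s = transient(lam, mu, t_end=80.0)
rho = lam / mu
assert abs(sum(s) - 1.0) < 1e-6                     # probability is conserved
assert abs(s[4] - (1 - rho) * rho**4) < 5e-3        # s_4(t) -> (1 - rho) rho^4
```

For $\rho$ closer to 1 the same experiment needs a much longer `t_end`, in line with the relaxation time $(1-\sqrt{\rho})^{-2}$ shown in Fig. 11.3.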
With transition probabilities $P_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N} w_{ik}}$, the normalization constraint (9.8) destroys the symmetry in the link weight structure ($w_{ij} = w_{ji}$) because, in general, $P_{ij} \ne P_{ji}$, since $\sum_{k=1}^{N} w_{ik} \ne \sum_{k=1}^{N} w_{jk}$. The sequence of nodes (or links) visited by the packet resembles a random walk on the graph $G(N,L)$ and constitutes a Markov chain. Moreover, the steady-state of this Markov process is readily obtained by observing that the chain is time reversible. Indeed, the condition for time reversibility (10.27) becomes
$$\frac{\pi_i\, w_{ij}}{\sum_{k=1}^{N} w_{ik}} = \frac{\pi_j\, w_{ji}}{\sum_{k=1}^{N} w_{jk}}$$
or, since $w_{ij} = w_{ji}$,
$$\frac{\pi_i}{\sum_{k=1}^{N} w_{ik}} = \frac{\pi_j}{\sum_{k=1}^{N} w_{jk}}$$
This implies that $\pi_i = \alpha\sum_{k=1}^{N} w_{ik}$ and, using the normalization $\|\pi\|_1 = 1$, we obtain the steady-state probabilities for all nodes $i$,
$$\pi_i = \frac{\sum_{k=1}^{N} w_{ik}}{\sum_{i=1}^{N}\sum_{k=1}^{N} w_{ik}} = \frac{\sum_{k=1}^{N} w_{ik}}{2\sum_{i=1}^{N}\sum_{k=i+1}^{N} w_{ik}}$$
This Markov process can model an active packet that monitors the network by collecting state information (number of packets, number of lost or retransmitted packets, etc.) in each router. Of course, the link weight structure $w_{ij}$ for the active packet is decisive and requires additional information to be chosen efficiently. For example, for traffic monitoring, the distribution of the number of packets forwarded by each router must be obtained. For the collection of these data, the active packet should in steady-state visit all nodes about equally frequently, i.e. $\pi_i = \frac{1}{N}$, implying that the Markov transition matrix $P$ must be doubly stochastic (see Appendix A.5.1).
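The proportionality $\pi_i \propto \sum_k w_{ik}$ can be verified on a small assumed weighted graph by power iteration (the weight matrix below is invented for illustration):

```python
# Sketch on an assumed 4-node weighted graph: the stationary distribution of
# the random walk with P_ij = w_ij / sum_k w_ik is proportional to the
# node strength sum_k w_ik (a consequence of time reversibility).
W = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 4],
     [0, 1, 4, 0]]                        # symmetric link weights w_ij = w_ji
N = len(W)
strength = [sum(row) for row in W]
P = [[W[i][j] / strength[i] for j in range(N)] for i in range(N)]

pi = [1.0 / N] * N                        # power iteration: pi <- pi P
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(N)) for j in range(N)]

expected = [s / sum(strength) for s in strength]
assert all(abs(p - e) < 1e-9 for p, e in zip(pi, expected))
```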
11.5 Slotted Aloha
The Aloha protocol is a basic example of a multiple access communication scheme, of which Ethernet² is considered the direct descendant. Aloha, which means "hello" in the Hawaiian language, was invented by Norman Abramson at the University of Hawaii in the beginning of the 1970s to provide packet-switched radio communication between a central computer and various data terminals on the campus. Slotted Aloha is a discrete-time version of the pure Aloha protocol, where all transmitted packets have equal length and where each packet requires one timeslot for transmission.
Consider a network consisting of $N$ nodes that can communicate with each other via a shared communication channel (e.g. a radio channel) using the slotted Aloha protocol. The simplest arrival process of packets at each node is a Poisson process. We assume that the Poisson arrivals at a node are independent of the Poisson arrivals at another node and that all Poisson arrivals at a node have the same rate $\frac{\lambda}{N}$, where $\lambda$ is the overall arrival rate at the network of $N$ nodes. The idea of the Aloha protocol is that, upon receipt of a packet, a node transmits that newly arrived packet in the next timeslot. In case two nodes happen to transmit a packet in the same timeslot, a collision occurs, which results in a retransmission of the packets. A node with a packet that must be retransmitted is said to be backlogged. Even if new packets arrive at a backlogged node, the retransmitted packet is the first one to be transmitted and, for simplicity (to ignore queueing of packets at a node), we assume that those new packets are discarded. If backlogged nodes retransmitted their packet in the immediately following timeslot, surely a new collision would occur. Therefore, backlogged nodes wait for some random number of timeslots before retransmitting. We assume, for simplicity, that $p_r$ is the probability (the same for all backlogged nodes) that a backlogged node retransmits in the next time slot, and that this probability is the same for each timeslot. The number of time slots between the occurrence of a collision and a retransmission attempt is then a geometric random variable $T_r$ (see Section 3.1.3) with parameter $p_r$, such that $\Pr[T_r = k] = p_r(1-p_r)^{k-1}$.

² The essential difference with Ethernet's CSMA/CD (carrier sense multiple access with collision detection) is that Aloha does not use carrier sensing and does not stop transmitting when collisions are detected. Carrier sensing is only adequate if the nodes are near to each other (as in a local area network), such that collisions can be detected before the completion of transmission. Only then is a timely reaction possible.
Denote by $p_a$ the probability that a non-backlogged node transmits a newly arrived packet in a timeslot. In state $j$, i.e. with $j$ backlogged nodes, the probability that $n$ of the $N-j$ non-backlogged nodes transmit a new packet equals
$$u_n(j) = \binom{N-j}{n}\, p_a^{\,n}\,(1-p_a)^{N-j-n}$$
while the probability that $n$ of the $j$ backlogged nodes retransmit equals $b_n(j) = \binom{j}{n}\, p_r^{\,n}\,(1-p_r)^{j-n}$. A packet is transmitted successfully if and only if (a) one new arrival and no backlogged packet, or (b) no new arrival and one backlogged packet is transmitted. The probability of successful transmission in state $j$ and per time slot equals
$$p_s(j) = u_1(j)\,b_0(j) + u_0(j)\,b_1(j)$$
The transition probability $P_{j,j+m}$ equals
$$P_{j,j+m} = \begin{cases} u_m(j) & 2 \le m \le N-j \\ u_1(j)\left(1-b_0(j)\right) & m = 1 \\ u_1(j)\,b_0(j) + u_0(j)\left(1-b_1(j)\right) & m = 0 \\ u_0(j)\,b_1(j) & m = -1 \end{cases}$$
The state with $j$ backlogged nodes jumps to the state $j-1$, with one backlogged node less, if no new packets are sent and there is precisely one successful retransmission. The state remains $j$ if there is one new arrival and no retransmission, or if there is no new arrival and either none or more than one retransmission. The state $j$ jumps to state $j+1$ if there is one new arrival from a non-backlogged node and at least one retransmission, because then there are surely collisions and the number of backlogged nodes increases by one. Finally, the state $j$ jumps to state $j+m$ if $m$ new packets arrive from $m$ different non-backlogged nodes, which always causes collisions, irrespective of how many backlogged nodes also retransmit in the next time slot.
The Markov chain is illustrated in Fig. 11.4, which shows that the state can only decrease by at most 1.
[Figure 11.4]
Fig. 11.4. Graph of the Markov chain for slotted Aloha. Each state $j$ counts the number of backlogged nodes.
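The transition probabilities above can be sketched in code ($N$, $p_a$ and $p_r$ below are illustrative choices); each row of the resulting matrix must sum to one:

```python
from math import comb

# Sketch of the slotted-Aloha transition probabilities (assumed parameters):
# u_n(j) counts new transmissions from the N-j fresh nodes, b_n(j) counts
# retransmissions from the j backlogged nodes, combined as in the case table.
N, pa, pr = 10, 0.05, 0.1

def u(n, j):  # n of the N-j non-backlogged nodes transmit a new packet
    return comb(N - j, n) * pa**n * (1 - pa) ** (N - j - n)

def b(n, j):  # n of the j backlogged nodes retransmit
    return comb(j, n) * pr**n * (1 - pr) ** (j - n)

def P(j, m):  # transition probability from j to j+m backlogged nodes
    if m >= 2:
        return u(m, j)
    if m == 1:
        return u(1, j) * (1 - b(0, j))
    if m == 0:
        return u(1, j) * b(0, j) + u(0, j) * (1 - b(1, j))
    if m == -1:
        return u(0, j) * b(1, j)
    return 0.0

for j in range(N + 1):
    row = sum(P(j, m) for m in range(-1, N - j + 1))
    assert abs(row - 1.0) < 1e-12        # each row is a probability vector
```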
222
S0Q
9 S10 S11 S12
S1Q
9
9 0 S21 S22
S2Q
9
S =9 .
..
.
.
.
.
..
..
..
..
9 ..
.
9
7 0
0 SQ31>Q32 SQ31;Q31 SQ31;Q
0
0
0
SQ;Q31
SQQ
6
:
:
:
:
:
:
:
8
(11.28)
The quantity $E[X_{k+1} - X_k \,|\, X_k = j]$, often called the drift, equals the expected number of new arrivals minus the expected number of successful transmissions per timeslot, $D_j = (N-j)p_a - p_s(j)$. If the drift is positive for all timeslots $k$, the Markov chain moves (on average) to higher states, or to the right in Fig. 11.4. Since the probability of success $p_s(j)$ vanishes for large $j$ while new packets keep arriving, the drift tends to infinity as $N \to \infty$, which means that, on average, the number of backlogged nodes increases unboundedly and suggests (but does not prove; a counterexample is given in problem (ii) of Section 9.4) that the Markov chain is transient for $N \to \infty$.
A more detailed discussion and engineering approaches to cure this instability are found in Bertsekas and Gallager (1992, Chapter 4). The interest of the analysis of slotted Aloha lies in the fact that other types of multiple access protocols, such as the important class of carrier sense multiple access (CSMA) protocols, can be analyzed in a similar manner. Of the CSMA class with collision detection, Ethernet is by far the most important because it is the basis of local area networks. Multiple access protocols of the CSMA/CD type are discussed in our book Data Communications Networking (Van Mieghem, 2004a).
The probability of successful transmission per time slot in state $j$ can be rewritten as
$$p_s(j) = \left[\frac{(N-j)\,p_a}{1-p_a} + \frac{j\,p_r}{1-p_r}\right](1-p_a)^{N-j}(1-p_r)^{j}$$
For small arrival probability $p_a$ and small retransmission probability $p_r$, the probability of successful transmission in state $j$ can be approximated by using the Taylor expansions $(1-x)^y = e^{y\ln(1-x)} = e^{-xy}(1+o(1))$ and $\frac{x}{1-x} = x + o(x^2)$ as $p_s(j) \simeq G(j)\,e^{-G(j)}$, where $G(j) = (N-j)p_a + j\,p_r$ is the expected number of transmission attempts per slot. The maximum of $G\,e^{-G}$ equals $e^{-1} \approx 37\%$, attained at $G = 1$. Pure Aloha, which does not use slots, only performs half as efficiently as slotted Aloha, with $\eta_{\mathrm{pure\ Aloha}} = 18\%$. Recall that each packet is assumed to have an equal length that corresponds to the length of one timeslot. In pure Aloha, a packet transmitted at time $t$ is successful if no other packet is sent during $(t-1, t+1)$. This time interval is precisely equal to two timeslots in slotted Aloha, which explains why $\eta_{\mathrm{pure\ Aloha}} = \frac{1}{2}\,\eta_{\mathrm{slotted\ Aloha}}$. The same observation tells us that, in pure Aloha, the probability of no interfering transmission is approximately $e^{-2G(j)}$, because in the successful interval the expected number of arrivals and retransmissions is twice that in slotted Aloha. The throughput $S$ roughly equals the total rate of transmission attempts $G$ (the same as in slotted Aloha) multiplied by $e^{-2G}$; hence, $S_{\mathrm{pure\ Aloha}} = G\,e^{-2G}$.
[Figure 11.5]
Fig. 11.5. A subgraph of the World Wide Web (a) and the corresponding transition probability matrix $P$ (b).
A zero row in $P$ corresponds, for example, to an important document on the web which itself does not refer to any other webpage. The corresponding row in $P$ possesses only zero elements, which violates the basic law (9.8) of a stochastic matrix. To rectify the deviation from a stochastic matrix, each zero row must be replaced by a particular non-zero row vector³ $v^T$ that obeys (9.8), i.e. $\|v\|_1 = v^T x = 1$, where $x^T = [1\ 1\ \cdots\ 1]$. Again, the simplest recipe is to invoke uniformity and to replace any zero row by $v^T = \frac{x^T}{N}$. In our example, we replace the third row by $\left[\frac15\ \frac15\ \frac15\ \frac15\ \frac15\right]$ and obtain the stochastic matrix $\bar P$, equal to $P$ apart from that third row. To model a surfer who occasionally jumps to a random page instead of following hyperlinks, the Google matrix combines $\bar P$ with a rank-one matrix,
$$\bar{\bar P} = \alpha\,\bar P + (1-\alpha)\,\frac{x\,x^T}{N}$$
or, more generally, with a personalization vector $v$,
$$\bar{\bar P} = \alpha\,\bar P + (1-\alpha)\,x\,v^T \qquad (11.29)$$

³ We use the normal vector algebra convention, but remark that the stochastic vectors $\pi$ and $s[k]$ are also row vectors (without the transpose sign)!
For $v^T = \frac{1}{16}\,[1\ 4\ 6\ 4\ 1]$ and $\alpha = \frac{4}{5}$, the Google matrix $\bar{\bar P} = \frac45\,\bar P + \frac15\,x\,v^T$ in our example becomes a dense $5\times 5$ matrix; for instance, every entry of $\bar{\bar P}$ corresponding to a zero entry in the first or last column of $\bar P$ equals $\frac15\cdot\frac1{16} = \frac1{80}$.
If the presented method were implemented, the initially very sparse matrix $P$ would be replaced by the dense matrix $\bar{\bar P}$, which for the size $N$ of the web would increase storage dramatically. Therefore, a more effective way is to define a special vector $u$ whose component $u_j = 1$ if row $j$ in $P$ is a zero row, i.e. node $j$ is a dangling node, and $u_j = 0$ otherwise. Then, $\bar P = P + u\,v^T$ is a rank-one update of $P$, and so is $\bar{\bar P}$, because
$$s[k+1] = s[k]\,\bar{\bar P} = \alpha\, s[k]\,P + \bigl(\alpha\, s[k]\,u + (1-\alpha)\bigr)\,v^T \qquad (11.30)$$
This formula indicates that only the product of $s[k]$ with the (extremely) sparse matrix $P$ needs to be computed, and that $\bar P$ and $\bar{\bar P}$ are never formed nor stored. As shown in Appendix A.4.3, the rate of convergence of a Markov chain towards the steady-state is determined by the second largest eigenvalue. Furthermore, Lemma A.4.4 demonstrates that, for any personalization vector $v$, the second largest eigenvalue of $\bar{\bar P}$ equals $\alpha$.
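A sketch of the iteration (11.30) on a small hypothetical web graph (the link lists below are invented for illustration); only the sparse matrix $P$ is ever touched:

```python
# Sketch of (11.30): power iteration s[k+1] = alpha s[k] P
# + (alpha s[k] u + 1 - alpha) v^T; the dense Google matrix is never formed.
alpha = 0.85
links = {0: [1, 3, 4], 1: [2, 3, 4], 3: [2], 4: [0, 1]}   # page 2 is dangling
N = 5
v = [1.0 / N] * N                       # uniform personalization vector

s = [1.0 / N] * N
for _ in range(100):
    sP = [0.0] * N
    for i, outs in links.items():       # sparse product s[k] P
        for j in outs:
            sP[j] += s[i] / len(outs)
    dangling = sum(s[i] for i in range(N) if i not in links)   # s[k] u
    s = [alpha * sP[j] + (alpha * dangling + 1 - alpha) * v[j]
         for j in range(N)]

assert abs(sum(s) - 1.0) < 1e-9         # s[k] stays a probability vector
assert all(x > 0 for x in s)
```

Since the second largest eigenvalue of $\bar{\bar P}$ equals $\alpha$, roughly $\log(10^{-9})/\log(0.85) \approx 130$ iterations suffice for nine digits of accuracy.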
12
Branching processes

$$X_{k+1} = \sum_{j=1}^{X_k} Y_{k,j} \qquad (12.1)$$

[Figure 12.1]
Fig. 12.1. A branching process with one root ($X_0 = 1$), drawn as a tree in which all nodes of generation $k$ lie at the same distance $k$ from the root (label 0); here $X_1 = 3 = \sum_{j=1}^{X_0} Y_{0,j}$.
The branching process is entirely defined by the basic law (12.1) and the distribution of the initial set $X_0$. The basic law (12.1) indicates that the number of items $X_{k+1}$ in generation $k+1$ only depends on the number of items $X_k$ in the previous generation $k$. The Markov property (9.2),
$$\Pr[X_{k+1} = x_{k+1} \,|\, X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} \,|\, X_k = x_k] = \Pr\!\left[\sum_{j=1}^{x_k} Y = x_{k+1}\right]$$
is obeyed, which shows that the branching process $\{X_k\}_{k\ge 0}$ is a Markov chain with transition probabilities $P_{ij} = \Pr[iY = j]$, where $iY$ is shorthand for a sum of $i$ i.i.d. copies of the production $Y$. The discrete branching process can be extended to a continuous-time branching process in which items are produced continuously in time, rather than by generations. Since continuous-time Markov processes are mathematically more difficult than their discrete counterparts, we omit continuous-time branching processes, but refer to the book of Harris (1963) and to a simple example, the Yule process, in Section 11.3.2.3.
There are many examples of branching processes and we briefly describe some of the most important. In biology, a certain species generates offspring and the survival of that species after $n$ generations is studied as a branching process. In the same vein, what is the probability that a family name that is inherited by sons only will eventually become extinct? This was the question posed by Galton and Watson that gave birth to the theory of branching processes in 1874. In physics, branching processes have been studied to understand nuclear chain reactions: a nucleus is split by a neutron and several new free neutrons are generated; each of these free neutrons may in turn hit another nucleus, producing additional free neutrons, and so on. In micro-electronics, the avalanche breakdown of a diode is another example of a branching process. In queueing theory, all new arrivals of packets during the service time of a particular packet can be described as a branching process. The process continues as long as the queue lasts. The number of duplicates generated by a flooding process in a communications network is a branching process: a flooded packet is sent on all interfaces of a router except for the incoming interface. The spread of computer viruses in the Internet can be modeled approximately as a branching process. The application of a branching process to compute the hopcount of the shortest path between two arbitrary nodes in a network is discussed in Section 15.7.
Denote the average production per item by $\mu = E[Y]$. Taking the expectation of (12.1), conditioned on $X_k$, yields
$$E[X_{k+1}] = E\!\left[\sum_{j=1}^{X_k} Y_{k,j}\right] = E[Y]\,E[X_k] = \mu\,E[X_k] \qquad (12.2)$$
Iteration starting from a given average $E[X_0]$ of the initial population gives
$$E[X_k] = \mu^k\, E[X_0] \qquad (12.3)$$
The variance follows from
$$\varphi''_{X_{k+1}}(z) = \varphi''_{X_k}(\varphi_Y(z))\left(\varphi'_Y(z)\right)^2 + \varphi'_{X_k}(\varphi_Y(z))\,\varphi''_Y(z)$$
as
$$\operatorname{Var}[X_{k+1}] = \varphi''_{X_{k+1}}(1) + \varphi'_{X_{k+1}}(1) - \left(\varphi'_{X_{k+1}}(1)\right)^2 = \varphi''_{X_k}(1)\left(\varphi'_Y(1)\right)^2 + \varphi'_{X_k}(1)\,\varphi''_Y(1) + \varphi'_{X_k}(1)\,\varphi'_Y(1) - \left(\varphi'_{X_k}(1)\,\varphi'_Y(1)\right)^2$$
Iteration starting from a given variance $\operatorname{Var}[X_0]$ of the initial set of items leads to
$$\operatorname{Var}[X_k] = \mu^{2k}\operatorname{Var}[X_0] + \mu^{k-1}\, E[X_0]\operatorname{Var}[Y]\,\frac{1-\mu^k}{1-\mu} \qquad (12.4)$$
Substitution into the recursion for $\operatorname{Var}[X_k]$ justifies the correctness of (12.4).
The relations for the expectation (12.3) and the variance (12.4) of the number of items in generation $k$ imply that, if the average production per generation is $E[Y] = \mu = 1$, then $E[X_k] = E[X_0]$ and $\operatorname{Var}[X_k] = \operatorname{Var}[X_0] + k\,E[X_0]\operatorname{Var}[Y]$. In the case that the average production $E[Y] = \mu > 1$ (respectively $E[Y] = \mu < 1$), the average population per generation grows (decreases) exponentially in $k$ with rate $\log\mu$ and, similarly for large $k$, the standard deviation $\sqrt{\operatorname{Var}[X_k]}$ grows (decreases) exponentially in $k$ with the same rate $\log\mu$. Hence, the most important factor in the branching process is the average production $E[Y] = \mu$ per generation. The variance terms and $E[X_0]$ only play a role as prefactors. A branching process is called critical if $\mu = 1$, subcritical if $\mu < 1$ and supercritical if $\mu > 1$. In the sequel, we will only consider supercritical ($\mu > 1$) branching processes.
Often, the initial set of items consists of only one item. In that case, $X_0 = 1$ and $\varphi_{X_0}(z) = z$, and
$$E[X_k] = \mu^k, \qquad \operatorname{Var}[X_k] = \operatorname{Var}[Y]\,\mu^{k-1}\,\frac{1-\mu^k}{1-\mu}$$
while the explicit nested form of the probability generating function indicates that
$$\varphi_{X_{k+1}}(z) = \varphi_Y\!\left(\varphi_{X_k}(z)\right) \qquad (12.5)$$
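The mean relation (12.3) can be verified exactly for a small assumed production distribution (the probabilities below are illustrative) by iterating the distribution of $X_k$, which is the probability-vector counterpart of the composition (12.5):

```python
# Sketch verifying (12.3): with X_0 = 1 and production Y on {0, 1, 2},
# the exact distribution of X_k follows by convolving pY with itself.
pY = [0.2, 0.3, 0.5]                     # Pr[Y=0], Pr[Y=1], Pr[Y=2]
mu = sum(k * p for k, p in enumerate(pY))   # E[Y] = 1.3 > 1: supercritical

def next_gen(pX):
    """Distribution of a sum of i i.i.d. copies of Y, with i drawn from pX."""
    out = [0.0] * (2 * len(pX))          # X_{k+1} <= 2 X_k for this Y
    for i, pi in enumerate(pX):
        conv = [1.0]                     # distribution of a sum of i copies
        for _ in range(i):
            new = [0.0] * (len(conv) + 2)
            for a, pa in enumerate(conv):
                for b, pb in enumerate(pY):
                    new[a + b] += pa * pb
            conv = new
        for n, pn in enumerate(conv):
            out[n] += pi * pn
    return out

pX = [0.0, 1.0]                          # X_0 = 1
for k in range(1, 6):
    pX = next_gen(pX)
    mean = sum(n * p for n, p in enumerate(pX))
    assert abs(mean - mu**k) < 1e-7      # E[X_k] = mu^k, relation (12.3)
```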
Conditioning (12.1) on $X_k$ shows that
$$E[X_{k+1}\,|\,X_k] = E\!\left[\sum_{j=1}^{X_k} Y_{k,j}\,\Big|\,X_k\right] = E[Y]\,X_k = \mu X_k$$
is a random variable, which suggests considering the scaled random variable $W_k = \frac{X_k}{\mu^k}$, because
$$E[W_{k+1}\,|\,W_k, W_{k-1}, \ldots, W_1] = W_k$$
while (12.3) shows that $E[W_k] = E[X_0]$ for all $k$. The stochastic process $\{W_k\}_{k\ge 1}$ is a martingale, a generalization of a fair game, with the characteristic property that at each step $k$ in the process $E[W_k]$ is a constant (independent of $k$). From (12.4), the variance of the scaled random variable $W_k = \frac{X_k}{\mu^k}$ is
$$\operatorname{Var}[W_k] = \operatorname{Var}[X_0] + E[X_0]\operatorname{Var}[Y]\,\frac{1-\mu^{-k}}{\mu^2-\mu} \qquad (12.6)$$
which tends geometrically, provided $E[Y] = \mu > 1$, to a constant independent of $k$. The expression for the variance (12.6) indicates that
$$\operatorname{Var}[W] = \lim_{k\to\infty}\operatorname{Var}[W_k] = \operatorname{Var}[X_0] + \frac{E[X_0]\operatorname{Var}[Y]}{\mu^2-\mu} \qquad (12.7)$$
Theorem 12.2.1 For $\mu > 1$, the martingale $\{W_k\}$ converges almost surely to a limit random variable $W$.
Proof: Consider
$$E\left[(W_{k+n}-W_n)^2\right] = E\left[W_{k+n}^2\right] + E\left[W_n^2\right] - 2\,E[W_{k+n}W_n]$$
Using (2.72) with $h(x) = x$ and the Markov property, $E[W_{k+n}W_n] = E[W_n^2]$, so that
$$E\left[(W_{k+n}-W_n)^2\right] = \operatorname{Var}[W_{k+n}] - \operatorname{Var}[W_n] = E[X_0]\operatorname{Var}[Y]\,\frac{\mu^{-n}-\mu^{-n-k}}{\mu^2-\mu}$$
In the limit $k \to \infty$,
$$E\left[(W-W_n)^2\right] = O\!\left(\mu^{-n}\right)$$
which means that the series $\sum_{n=1}^{\infty}(W-W_n)^2$ has finite expectation and is finite with probability 1. The convergence of this series implies, for large $n$, that $(W-W_n)^2 \to 0$ with probability 1, or that $W_n \to W$ a.s.
Theorem 12.2.1 means that the number of items in generation $k$ is, for large $k$, well approximated by $X_k \approx W\mu^k$. Hence, an asymptotic analysis of a branching process crucially relies on the properties of the limit random variable $W$.
The generating function $\varphi_W(z) = E\left[z^W\right]$ of this limit random variable can be deduced as the limit of the sequence of generating functions
$$\varphi_{W_k}(z) = E\left[z^{X_k/\mu^k}\right] = \varphi_{X_k}\!\left(z^{\mu^{-k}}\right) \qquad (12.8)$$
which, with (12.5), obeys the recursion
$$\varphi_{W_{k+1}}(z) = \varphi_Y\!\left(\varphi_{W_k}\!\left(z^{\frac{1}{\mu}}\right)\right)$$
and, in the limit $k \to \infty$,
$$\varphi_W(z) = \varphi_Y\!\left(\varphi_W\!\left(z^{\frac{1}{\mu}}\right)\right) \qquad (12.9)$$
Since $W$ is a continuous random variable except at $W = 0$, as explained below (see (12.19)), it is more convenient to define the moment generating function $\psi_W(t) = E\left[e^{-tW}\right]$. Obviously, the relation between the two generating functions is, with $z = e^{-t}$,
$$\psi_W(t) = \varphi_W(e^{-t})$$
With $z = e^{-t}$ in (12.9), the functional equation for $\psi_W(t)$ is, for $t \ge 0$ and $E[W] = E[X_0] = 1$,
$$\psi_W(t) = \varphi_Y\!\left(\psi_W\!\left(\frac{t}{\mu}\right)\right) \qquad (12.10)$$
The functional equation (12.10) is simpler than (12.9) and $\psi_W(t)$ is convex for all $t$, while $\varphi_W(z)$ is not convex for all $z$. In particular, $\varphi_W(z) = \psi_W(-\log z)$ is not analytic at $z = 0$ and appears² to have a concave regime near $z \downarrow 0$ where $\psi'_W(-\log z) + \psi''_W(-\log z) < 0$.
Lemma 12.2.2 $\psi_W(t)$ is the only probability generating function satisfying the functional equation (12.10).
Proof: Let $\psi_{W^*}(t) = E\left[e^{-tW^*}\right]$ and $\psi_W(t) = E\left[e^{-tW}\right]$ be two probability generating functions that both satisfy (12.10). Then $\psi_{W^*}(t) - \psi_W(t)$ is continuous for $\operatorname{Re}(t) \ge 0$ and, since $E[W] = E[W^*] = 1$, the Taylor series (2.40) around $t = 0$ is
$$\psi_{W^*}(t) - \psi_W(t) = (-t)\,E[W^* - W] + E\!\left[\sum_{k=2}^{\infty}\frac{(-t)^k}{k!}\left((W^*)^k - W^k\right)\right] = t\,E\!\left[\sum_{k=1}^{\infty}\frac{(-t)^k}{(k+1)!}\left(W^{k+1} - (W^*)^{k+1}\right)\right]$$
from which $\psi_{W^*}(t) - \psi_W(t) = t\,h(t)$ with $h(0) = 0$. Since $|\varphi'_Y(z)| \le \mu$ for $|z| \le 1$, equation (5.6) of the Mean Value Theorem implies $|\varphi_Y(a) - \varphi_Y(b)| \le \mu\,|a-b|$ for any $|a|, |b| \in [0,1]$. Since $|\psi_W(t)| \le 1$ and $|\psi_{W^*}(t)| \le 1$ for $\operatorname{Re}(t) \ge 0$, we obtain
$$|t\,h(t)| = \left|\varphi_Y\!\left(\psi_{W^*}\!\left(\tfrac{t}{\mu}\right)\right) - \varphi_Y\!\left(\psi_W\!\left(\tfrac{t}{\mu}\right)\right)\right| \le \mu\left|\psi_{W^*}\!\left(\tfrac{t}{\mu}\right) - \psi_W\!\left(\tfrac{t}{\mu}\right)\right| = |t|\left|h\!\left(\tfrac{t}{\mu}\right)\right|$$
or
$$|h(t)| \le \left|h\!\left(\tfrac{t}{\mu}\right)\right|$$
After $K$ iterations, we have that $|h(t)| \le \left|h\!\left(\frac{t}{\mu^K}\right)\right|$, which holds for any integer $K$. Hence, for any finite $t$ and since $\mu > 1$,
$$|h(t)| \le \lim_{K\to\infty}\left|h\!\left(\frac{t}{\mu^K}\right)\right| = |h(0)| = 0$$
which proves the Lemma.

² This fact is observed for both a geometric and a Poisson production distribution function.
Lemma 12.2.2 is important because solving the functional equation, for example by Taylor expansion, is one of the primary tools to determine $\psi_W(t)$. If $\varphi_Y(z)$ is analytic inside a circle with radius $R_Y > 0$ centered at $z = 1$, then the Taylor series around $z_0 = 1$,
$$\varphi_Y(z) = 1 + \sum_{k=1}^{\infty} u_k (z-1)^k$$
can be inserted into (12.10) to determine, recursively, the coefficients of
$$\psi_W(t) = \sum_{k=0}^{\infty} \omega_k\, t^k \qquad (12.11)$$
Since $\psi_W(t)$ is, for real $t \ge 0$, positive and monotone decreasing, the limit
$$\pi_0 = \lim_{t\to\infty}\psi_W(t) \qquad (12.12)$$
exists and $0 < \pi_0 \le 1$. The existence of this limit and the fact that a probability generating function is analytic for $|z| < 1$, and hence continuous, allows us to interchange $\lim_{k\to\infty}\varphi_Y(q_k) = \varphi_Y(\lim_{k\to\infty} q_k)$ for any convergent sequence $q_k$ in $[0,1]$.
Applied to the sequence $q_k = \Pr[X_k = 0] = \varphi_{X_k}(0)$, which by (12.5) satisfies $q_{k+1} = \varphi_Y(q_k)$ and increases to the extinction probability $\pi_0$, this yields
$$\pi_0 = \varphi_Y(\pi_0) \qquad (12.13)$$
The convexity of $\psi_W(t)$ implies that, for any real value of $t$, $\psi_W(t) \ge \pi_0$.
An alternative, more probabilistic derivation of equation (12.13) is as follows. Applying the law of total probability (2.46) to the definition of the extinction probability,
$$\pi_0 = \Pr[X_n = 0 \text{ for some } n > 0] = \sum_{j=0}^{\infty}\Pr[X_n = 0 \text{ for some } n > 0 \,|\, X_1 = j]\,\Pr[X_1 = j]$$
Only if $X_0 = 1$, relation (12.5) indicates that $\varphi_{X_1}(z) = \varphi_Y(z)$, which implies that $\Pr[X_1 = j] = \Pr[Y = j]$. In addition, given that the first generation consists of $j$ items, the branching process will eventually terminate if and only if each of the $j$ sets of items generated by the first generation eventually dies out. Since each set evolves independently and since the probability that any set generated by a particular ancestor in the first generation becomes extinct is $\pi_0$, we arrive at
$$\pi_0 = \sum_{j=0}^{\infty}\pi_0^{\,j}\,\Pr[Y = j] = \varphi_Y(\pi_0)$$
[Figure 12.2]
Fig. 12.2. The generating function $\varphi_Y(x)$ along the positive real axis $x$. The two possible cases are shown: curve $a$ corresponds to $E[Y] < 1$ and curve $b$ to $E[Y] > 1$. The fast convergence towards the zero $\pi_0$ is exemplified by the sequence $x_0 > x_1 = \varphi_Y(x_0) > x_2 = \varphi_Y(x_1) > x_3 = \varphi_Y(x_2)$.
A root equation such as (12.13) also appears in queueing models such as the M/G/1 (Section 14.3) and GI/D/1 (Section 14.4) and reflects the asymptotic behavior as explained in Section 5.7. The extinction probability $\pi_0$ can be expressed explicitly as a Lagrange series, as demonstrated in Van Mieghem (1996).
The branching process with infinitely many generations $k \to \infty$ can be viewed as an infinite directed tree in which each node has a finite degree a.s. The fact that $\pi_0 < 1$ if $E[Y] > 1$ implies that, in infinite directed trees, there exists an infinitely long path starting from the root with probability $1 - \pi_0$.
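The iteration $q_{k+1} = \varphi_Y(q_k)$ sketched in Fig. 12.2 can be coded directly; here for an assumed Poisson production with mean $\mu$, whose pgf is $\varphi_Y(z) = e^{\mu(z-1)}$:

```python
import math

# Sketch: fixed-point iteration q_{k+1} = phi_Y(q_k) for the extinction
# probability of a branching process with Poisson(mu) production,
# where phi_Y(z) = exp(mu (z - 1)).
def extinction(mu, iters=200):
    q = 0.0                              # q_k = Pr[X_k = 0], q_0 = 0
    for _ in range(iters):
        q = math.exp(mu * (q - 1.0))
    return q

pi0 = extinction(2.0)
assert abs(pi0 - math.exp(2.0 * (pi0 - 1.0))) < 1e-12   # pi_0 = phi_Y(pi_0)
assert 0.0 < pi0 < 1.0                   # supercritical: extinction not sure
```

The convergence is geometric with rate $\varphi'_Y(\pi_0) < 1$, exactly as the staircase construction in Fig. 12.2 suggests.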
Theorem 12.3.2 The limiting branching process with $X_0 = 1$ obeys, for $k \to \infty$,
$$\Pr[X_k = 0] \to \pi_0 \qquad\text{and}\qquad \Pr[X_k = j] \to 0 \quad\text{for any } j > 0$$
Proof: First, if $E[Y] < 1$, then Theorem 12.3.1 states that $\pi_0 = 1$. For any probability generating function $\varphi(z)$, it holds that $|\varphi(z)| \le 1$ for $|z| \le 1$. Hence, $\varphi_{X_k}(x) \le 1$ for real $x \in [0,1]$. Moreover, $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x)$. In the limit $k \to \infty$, $q_k \to \pi_0 = 1$, which implies that for all $x \in [0,1]$ it holds that $\varphi_{X_k}(x) \to \pi_0 = 1$. The fact that a probability generating function, a Taylor series around $z = 0$, converges to a constant $\pi_0$ for $0 \le x \le 1$ implies that $\Pr[X_k = j] \to 0$ for any $j > 0$ and $\Pr[X_k = 0] \to \pi_0$.
The second case, $E[Y] > 1$, possesses an extinction probability $\pi_0 < 1$. For $x \in (\pi_0, 1)$, Fig. 12.2 shows that $\pi_0 < \varphi_Y(x) < x < 1$. By induction using (12.5), we find that $\pi_0 < \varphi_{X_k}(x) < \varphi_{X_{k-1}}(x) < \cdots < 1$, or $\lim_{k\to\infty}\varphi_{X_k}(x) = \pi_0$ for $x \in (\pi_0, 1)$. For $x \in [0, \pi_0)$, the same argument $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x) \le \pi_0$ shows that $\lim_{k\to\infty}\varphi_{X_k}(x) = \pi_0$ for $x \in [0,1)$. This proves the theorem.
Since $\psi_W(t) \in [\pi_0, 1]$ for real $t \ge 0$, we have $\varphi'_Y(\pi_0) \le \varphi'_Y\!\left(\psi_W\!\left(\frac{t}{\mu^j}\right)\right) \le \mu$ for any $j$. Theorem 12.3.1 states that if $\mu = \varphi'_Y(1) > 1$, then there are two zeros, $\pi_0$ and $1$, of $f(z) = \varphi_Y(z) - z$ in $z \in [0,1]$. By Rolle's Theorem applied to the continuous function $f(z) = \varphi_Y(z) - z$, there exists an $\eta \in (\pi_0, 1)$ for which $f'(\eta) = 0$. Equivalently, $\varphi'_Y(\eta) = 1$ and $\eta > \pi_0$. Since $\varphi'_Y(z)$ is monotonously increasing in $z \in [0,1]$, we have that $\varphi'_Y(0) = \Pr[Y = 1] \le \varphi'_Y(\pi_0) < 1$. Since $\psi_W(t)$ is continuous and monotone decreasing, there exists an integer $K_0$ such that $\varphi'_Y\!\left(\psi_W\!\left(\frac{t}{\mu^j}\right)\right) < 1$ for $j > K_0$ and any $t > 0$. Hence,
$$\lim_{K\to\infty}\prod_{j=0}^{K-1}\varphi'_Y\!\left(\psi_W\!\left(\frac{t}{\mu^j}\right)\right) = \prod_{j=0}^{K_0-1}\varphi'_Y\!\left(\psi_W\!\left(\frac{t}{\mu^j}\right)\right)\,\lim_{K\to\infty}\prod_{j=K_0}^{K-1}\varphi'_Y\!\left(\psi_W\!\left(\frac{t}{\mu^j}\right)\right) \to 0$$
for $K \to \infty$, which implies the lemma.
Lemma 12.4.1 is, for large $t$, equivalent to $|\psi'_W(t)| \le C\,t^{-\beta-1}$ for some real $\beta > 0$, where $C$ is a finite positive real number. Lemma 12.4.1 thus suggests that
$$\psi'_W(t) = -g(t)\,t^{-\beta-1} \qquad (12.14)$$
for a positive function $g(t)$. At an extremum $\tau$ of $g(t)$,
$$\tau = \frac{-\psi'_W(\tau)}{\psi''_W(\tau)}\,(\beta+1) > 0 \qquad (12.15)$$
Since $\psi'_W(0) = -1$, implying that $g(t) = t^{\beta+1}(1+o(t))$ as $t \downarrow 0$, so that $g(t)$ is initially monotone increasing in $t$, the extremum at $t = \tau$ is a maximum. The derivative of $g(t) = -\psi'_W(t)\,t^{\beta+1}$ is, with (12.14),
$$g'(t) = \frac{\beta+1}{t}\,g(t) - \psi''_W(t)\,t^{\beta+1}$$
Since $\psi''_W(t) \ge 0$ for all $t$, we obtain the bound $g'(t) \le \frac{\beta+1}{t}\,g(t)$, while the functional equation (12.10) yields a constant $A$ with $g(t) \ge A\,g\!\left(\frac{t}{\mu}\right)$. For $t < \tau$, $g(t)$ is monotone increasing, which requires that $A \ge 1$ for $\mu > 1$. But, since the inequality with $A \ge 1$ holds for all $t > 0$, we must have that $\tau \to \infty$. Hence, $g(t)$ is continuous and strictly increasing for all $t \ge 0$ with a maximum at infinity, which proves the existence of a unique limit $F = \lim_{t\to\infty} g(t) \le C$.
If $F = 0$, the suggestion (12.14) is not correct, implying that $\psi'_W(t)$ decreases faster than any power of $t^{-1}$. The proof of Lemma 12.4.1 indicates that this case can occur if $\varphi'_Y(\pi_0) = 0$.
The exponent $\tau$ is determined by $\varphi'_Y(\pi_0)$ and $\mu$ as

$$\tau = -\frac{\log \varphi'_Y(\pi_0)}{\log \mu} \qquad (12.17)$$

For large $t$, the Laplace transform then behaves as

$$\varphi_W(t) = \pi_0 + \frac{F}{\tau}\, t^{-\tau}\,(1 + o(1)) \qquad (12.18)$$

which corresponds, for small $x$, to the probability density function

$$f_W(x) \approx \frac{F\, x^{\tau - 1}}{\Gamma(\tau + 1)} \qquad (12.19)$$
Fig. 12.3. The probability density function $f_W(x)$ of the limit random variable $W$ for both a geometric and a Poisson production process, for the same set of values ($\mu = 2, 3, 4, 5$) of the average $\mu = E[Y]$.
$$f(z) = \frac{az + b}{cz + d}, \qquad f(z) - f(x) = \frac{(ad - bc)(z - x)}{(cz + d)(cx + d)}$$

The linear fractional transformation $f(z)$ is an automorphism of the extended complex plane and is basic in the geometric theory of complex functions, for which we refer to the book of Sansone and Gerretsen (1960, vol. 2). Fixed points of an automorphism of the extended plane are solutions of $z = f(z)$, which is a quadratic equation $cz^2 + (d - a)z - b = 0$ and which shows that there are at most two different fixed points.
Let us now confine ourselves to the two fixed points, $x_0$ and $x_1$, of $f(z)$ that are solutions of $f(z) = z$, and let $\sigma = \frac{cx_1 + d}{cx_0 + d}$; then

$$\frac{f(z) - x_0}{f(z) - x_1} = \sigma\, \frac{z - x_0}{z - x_1}$$

Now, substitute $z \to f(z)$; then

$$\frac{f(f(z)) - x_0}{f(f(z)) - x_1} = \sigma\, \frac{f(z) - x_0}{f(z) - x_1} = \sigma^2\, \frac{z - x_0}{z - x_1}$$

Let us denote the iterates of $f(z)$ by $w_n = f_n(z) = f(f_{n-1}(z))$. By iterating, we find that the iterates obey

$$\frac{w_n - x_0}{w_n - x_1} = \sigma^n\, \frac{z - x_0}{z - x_1}$$

or

$$w_n = \frac{x_0(z - x_1) - x_1 \sigma^n (z - x_0)}{(z - x_1) - \sigma^n (z - x_0)} \qquad (12.20)$$
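The invariant ratio above can be checked numerically for an arbitrary linear fractional map; the coefficients, the starting point and the iteration depth below are all illustrative assumptions, not values from the text.

```python
import cmath

a, b, c, d = 1.0, 2.0, 1.0, 3.0            # an arbitrary linear fractional map
f = lambda z: (a * z + b) / (c * z + d)

# Fixed points solve c z^2 + (d - a) z - b = 0
disc = cmath.sqrt((d - a) ** 2 + 4 * b * c)
x0 = (-(d - a) + disc) / (2 * c)
x1 = (-(d - a) - disc) / (2 * c)
sigma = (c * x1 + d) / (c * x0 + d)

z = 0.25 + 0.1j                             # arbitrary starting point
w, n = z, 6
for _ in range(n):
    w = f(w)                                # w = f_n(z)

lhs = (w - x0) / (w - x1)
rhs = sigma ** n * (z - x0) / (z - x1)
print(abs(lhs - rhs) < 1e-12)               # the invariant ratio of (12.20)
```

Because $|\sigma| < 1$ here, the iterates are attracted to $x_0$, mirroring the convergence of $\varphi_{X_n}$ used below.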
For a geometric production process with $\varphi_Y(z) = \frac{q}{1 - pz}$ and $\mu = E[Y] = \frac{p}{q}$, the fixed points are $x_0 = \frac{q}{p} = \frac{1}{\mu}$ and $x_1 = 1$, so that

$$\sigma = \frac{1 - p x_1}{1 - p x_0} = \frac{q}{p} = \frac{1}{\mu}$$

The functional equation (12.5) associates $w_n = \varphi_{X_n}(z)$ and, after substitution in (12.20), we obtain

$$\varphi_{X_n}(z) = \frac{(\mu^{n-1} - 1)z - \mu^{n-1} + \mu^{-1}}{(\mu^n - 1)z - \mu^n + \mu^{-1}} \qquad (12.21)$$

In the case that $E[Y] = \mu = 1$ or $p = q$, using the rule of de l'Hospital gives

$$\varphi_{X_n}(z) = \frac{n - (n-1)z}{(n+1) - nz}$$

while, in general,

$$\Pr[X_k = 0] = \varphi_{X_k}(0) = \frac{\mu^k - 1}{\mu^{k+1} - 1}$$
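The closed form (12.21) can be checked against direct $n$-fold composition of the geometric production pgf written as $\varphi_Y(z) = \frac{1}{\mu + 1 - \mu z}$; this sketch is not from the text, and $\mu = 2$ is an arbitrary choice.

```python
mu = 2.0
phi = lambda z: 1.0 / ((mu + 1.0) - mu * z)   # geometric production, E[Y] = mu

def phi_Xn(z, n):
    """Closed form (12.21) for the n-fold composition of phi."""
    num = (mu ** (n - 1) - 1.0) * z - mu ** (n - 1) + 1.0 / mu
    den = (mu ** n - 1.0) * z - mu ** n + 1.0 / mu
    return num / den

for n in (1, 2, 5):
    for z in (0.0, 0.3, 0.9):
        w = z
        for _ in range(n):
            w = phi(w)                         # direct n-fold composition
        assert abs(w - phi_Xn(z, n)) < 1e-12

# Pr[X_k = 0] = (mu^k - 1)/(mu^(k+1) - 1) tends to the extinction probability 1/mu
print(round(phi_Xn(0.0, 30), 6))  # 0.5
```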
The generating function of $W_k = X_k / \mu^k$ then follows from (12.21) as

$$\varphi_{W_k}(z) = \varphi_{X_k}\!\left(z^{\mu^{-k}}\right) = \frac{(\mu^{k-1} - 1)\, z^{\mu^{-k}} - \mu^{k-1} + \mu^{-1}}{(\mu^k - 1)\, z^{\mu^{-k}} - \mu^k + \mu^{-1}} \to \frac{-\mu^{-1}\log z + 1 - \mu^{-1}}{-\log z + 1 - \mu^{-1}} \quad (k \to \infty) \qquad (12.22)$$
and, with $z = e^{-t}$,

$$\varphi_{W;\text{Geo}}(t) = \frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}} = 1 + \sum_{k=1}^{\infty} (-1)^k \left(\frac{\mu}{\mu - 1}\right)^{k-1} t^k \qquad (12.23)$$
Since $\varphi_W(t) = E\!\left[e^{-Wt}\right]$ and using (2.40), all moments are found as

$$E\!\left[W^k\right] = k!\left(\frac{\mu}{\mu - 1}\right)^{k-1}$$

Furthermore, with $\pi_0 = \lim_{t \to \infty} \varphi_W(t) = \frac{1}{\mu}$ and from (2.38), the probability density function follows as

$$f_{W;\text{Geo}}(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}}\, e^{xt}\, dt \qquad (c > 0)$$

By closing the contour for $x > 0$ over the negative $\operatorname{Re}(t)$-plane, we encounter a simple pole at $t = -1 + \frac{1}{\mu} = -(1 - \pi_0) < 0$ (since $\mu > 1$), resulting in
$$f_{W;\text{Geo}}(x) = \begin{cases} \left(1 - \frac{1}{\mu}\right)^2 \exp\!\left(-\left(1 - \frac{1}{\mu}\right)x\right) & x > 0 \\[4pt] \frac{1}{\mu}\,\delta(x) & x = 0 \\[4pt] 0 & x < 0 \end{cases} \qquad (12.24)$$
From (12.7), the variance is $\operatorname{Var}[W_{\text{Geo}}] = \frac{\mu + 1}{\mu - 1}$. The limit random variable $W_{\text{Geo}}$ of a geometric branching process is exponentially distributed with an atom at $x = 0$ equal to the extinction probability $\pi_0 = \frac{1}{\mu}$. From (12.17), the exponent $\tau_{\text{Geo}} = 1$ for any value of $\mu > 1$. Comparing (12.24) and the general relation (12.19) for small $x$ indicates that the parameter $F = \left(1 - \frac{1}{\mu}\right)^2$ for a geometric production process.
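The structure of (12.24) — an atom $\pi_0 = 1/\mu$ at zero plus an exponential part with rate $1 - 1/\mu$ — can be checked by Monte Carlo. This sketch is not from the text; the seed, the sample size and $\mu = 3$ are illustrative assumptions.

```python
import random
random.seed(42)

mu = 3.0
pi0, lam = 1.0 / mu, 1.0 - 1.0 / mu
n = 200_000

# W = 0 with probability pi0, otherwise exponential with rate 1 - 1/mu
samples = [0.0 if random.random() < pi0 else random.expovariate(lam)
           for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

print(abs(mean - 1.0) < 0.03)                     # E[W] = 1
print(abs(var - (mu + 1.0) / (mu - 1.0)) < 0.1)   # Var[W] = (mu+1)/(mu-1) = 2
```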
The limit random variable $W$ for production processes $Y$ of which all moments exist can be computed via Taylor series expansions. In Van Mieghem (2005), series for both $\varphi_{W;\text{Po}}(t)$ and $f_{W;\text{Po}}(x)$ of a Poisson branching process are presented. Fig. 12.3 illustrates that the probability density function $f_{W;\text{Po}}(x)$ of a Poisson branching process is definitely distinct from that of a geometric branching process. Since $E[W] = 1$, the variance $\operatorname{Var}[W_{\text{Po}}] = \frac{1}{\mu - 1}$ of a Poisson limit random variable $W_{\text{Po}}$ implies that $f_{W;\text{Po}}(x)$ is centered around $x = 1$ more tightly as $\mu$ increases.
13

General queueing theory

Queueing theory describes basic phenomena such as the waiting time, the throughput, the losses, the number of queueing items, etc. in queueing systems. Following Kleinrock (1975), any system in which arrivals place demands upon a finite-capacity resource can be broadly termed a queueing system.

Queueing theory is a relatively new branch of applied mathematics that is generally considered to have been initiated by A. K. Erlang in 1918 with his paper on the design of automatic telephone exchanges, in which the famous Erlang blocking probability, the Erlang B-formula (14.17), was derived (Brockmeyer et al., 1948, p. 139). It was only after the Second World War, however, that queueing theory was boosted, mainly by the introduction of computers and the digitalization of the telecommunications infrastructure. For engineers, the two volumes by Kleinrock (1975, 1976) are perhaps the best known, while in applied mathematics, apart from the penetrating influence of Feller (1970, 1971), The Single Server Queue of Cohen (1969) is regarded as a landmark. Since Cohen's book, which incorporates most of the important work before 1969, a wealth of books and excellent papers have appeared, an evolution that is still continuing today.
[Fig. 13.1: block diagram of a queueing system — the arrival process feeds the queueing process, which is served by the service process and produces the departure process.]

(13.1)
The service process needs additional specifications. First of all, in a single-server queueing system, only one packet (customer) is served at a time. If there is more than one server, more packets can evidently be served simultaneously. Next, we must detail the service discipline or scheduling rule, which describes the way a packet is treated. There is a large variety of service disciplines. If all packets are of equal priority, the simplest rule is first-in-first-out (FIFO), which serves the packets in the same order in which they arrive. Other types such as last-in-first-out or a random order are possible, though in telecommunications FIFO occurs most often. If we have packets of different multimedia flows, all with different quality of service requirements, not all packets have equal priority. For instance, a delay-sensitive packet (of e.g. a voice call) must be served as soon as possible, preferably before non-delay-sensitive packets (of e.g. a file transfer). In these cases, packets are extracted from the queue by a certain scheduling rule. The simplest case is a two-priority system with a head-of-the-line scheduling rule: high-priority packets are always served before low-priority packets. In the sequel, we confine the presentation to a single-server system with one type of packet and a FIFO discipline. Hence, we omit a discussion of scheduling rules. A next assumption is that of work conservation: if there is a packet waiting for service, the server will always serve the packet. Thus, the server is only idle if there are no packets waiting in the buffer, and it immediately starts service when the first packet is placed in the queue or arrives. In a non-work-conserving system, the server may stay idle even if there are customers waiting (e.g. a situation where patients have to wait during a coffee break in a hospital). Finally, we assume that the arrival process is independent of the service process. Situations where arriving packets of some type (e.g. control) change the way the remaining packets in the buffer are served, or a service discipline that serves at a rate proportional to the number of waiting packets, are not treated.

The service in a router consists in fetching the packet from the buffer, inspecting the header to determine the correct output port, and in placing the packet on the output link for transmission.
In this chapter, unless the contrary is explicitly mentioned, we consider
Fig. 13.2. The unfinished work $v(t)$ and the number of packets in the system $N_S(t)$ as a function of time. At any new arrival at $t_n$, $v(t_n) = w_n + x_n$ holds. The unfinished work $v(t)$ decreases with slope $-1$ between two arrivals. The waiting times $w_n$ and departure times $r_n$ are also shown. Notice that $w_1 = w_5 = 0$.
$$\rho = \frac{E[x]}{E[\tau]} \qquad (13.2)$$

where $E[x]$ and $E[\tau]$ are the mean service time and the mean interarrival time, respectively. Clearly, if $\rho > 1$ or $E[x] > E[\tau]$, which means that the mean service time is longer than the mean interarrival time, then the queue will grow indefinitely long for large $t$, because packets are arriving faster on average than they can be served. In this case ($\rho > 1$), the queueing system is unstable and will never reach a steady state. The case where $\rho = 1$ is critical. In practice, therefore, mostly situations where $\rho < 1$ are of interest. If $\rho < 1$, a steady state can be reached. These considerations are a direct consequence of the law of conservation of packets in the system, but they can be proved rigorously by ergodic theory or Markov steady-state theory, which determine when the process is positive recurrent.
13.2 The waiting process: Lindley's approach

From the definition of the waiting time and from Fig. 13.2, a relation between $w_{n+1}$ and $w_n$ is found. Suppose the waiting time for the first packet is $w_1 = w$, which is the initialization. If $r_n \le t_{n+1}$, which means that the $n$-th packet leaves the queueing system before the $(n+1)$-th packet arrives, the system is idle and $w_{n+1} = 0$. In all other situations, $r_n > t_{n+1}$, the $n$-th packet is still in the queueing system when the next $(n+1)$-th packet arrives and $w_{n+1} = t_n + w_n + x_n - t_{n+1}$. Indeed, the waiting time of the $(n+1)$-th packet equals the system time $T_n = w_n + x_n$ of the $n$-th packet, which started at $t_n$, minus its own arrival time $t_{n+1}$. During the interval $[t_n, t_{n+1}]$, the queueing system has processed an amount of the unfinished work equal to $t_{n+1} - t_n = \tau_{n+1}$ time units. Hence, we arrive at the general recursion for the waiting time,

$$w_{n+1} = \max\left(w_n + x_n - \tau_{n+1},\, 0\right) \qquad (13.3)$$

Let $\zeta_n = x_n - \tau_{n+1}$; then

$$w_{n+1} = \max[0,\, w_n + \zeta_n] = \max[0,\, \max[w_{n-1} + \zeta_{n-1},\, 0] + \zeta_n] = \max[0,\, \zeta_n,\, w_{n-1} + \zeta_{n-1} + \zeta_n]$$

and, by iteration,

$$w_{n+1} = \max\left[0,\, \zeta_n,\, \zeta_{n-1} + \zeta_n,\, \zeta_{n-2} + \zeta_{n-1} + \zeta_n,\, \ldots,\, \sum_{k=1}^{n} \zeta_k + w_1\right] \qquad (13.4)$$
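Recursion (13.3) is directly simulable. The sketch below (not from the text) drives it with exponential interarrival and service times, i.e. an M/M/1 queue, whose steady-state mean waiting time is $\rho/(\mu - \lambda)$; all parameter values and the seed are illustrative assumptions.

```python
import random
random.seed(7)

def lindley_waits(n, arrival_rate, service_rate):
    """Iterate w_{n+1} = max(w_n + x_n - tau_{n+1}, 0), recursion (13.3)."""
    w, waits = 0.0, []
    for _ in range(n):
        waits.append(w)
        x = random.expovariate(service_rate)    # service time x_n
        tau = random.expovariate(arrival_rate)  # interarrival time tau_{n+1}
        w = max(w + x - tau, 0.0)
    return waits

# Stable case rho = 0.5: the M/M/1 mean wait is rho/(mu - lambda) = 1
stable = lindley_waits(200_000, arrival_rate=0.5, service_rate=1.0)
print(abs(sum(stable) / len(stable) - 1.0) < 0.2)          # True

# Unstable case rho = 1.5: the waiting times drift without bound
unstable = lindley_waits(200_000, arrival_rate=1.5, service_rate=1.0)
print(unstable[-1] > unstable[len(unstable) // 2] > 100)   # True
```

The drift in the unstable case reflects $E[\zeta] = E[x] - E[\tau] > 0$, the quantity that also governs Lindley's Strong-Law argument below.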
In other words, this relation is similar to (13.4), as if the system were started from $k = m$ with $w_m = 0$ instead of from $k = 1$ with $w_1 = w$. Any busy period can be regarded as a renewal of the waiting process, independent of the previous busy periods.

Third, again invoking the assumption that the $\zeta_n$ are i.i.d. random variables, the order in the sequence $\{\zeta_n\}_{n \ge 1}$ is of no importance in (13.4) and we may relabel the random variables in (13.4) as $\zeta_k \to \zeta_{n-k+1}$ to obtain a new random variable

$$\omega_{n+1} = \max\left[0,\, \zeta_1,\, \zeta_1 + \zeta_2,\, \ldots,\, \sum_{k=1}^{n-1} \zeta_k,\, \sum_{k=1}^{n} \zeta_k + w_1\right]$$
which cannot decrease² if an additional term $\sum_{k=0}^{n+1} \zeta_k$ is added. Thus, if $w_1 = 0$, the event $\{\omega_{n+1} < x\}$ is always contained in $\{\omega_n < x\}$. In steady state, which is reached if $n \to \infty$,

$$\lim_{n \to \infty} \{\omega_n < x\} = \bigcap_{n=1}^{\infty} \{\omega_n < x\} = \left\{\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right\}$$

which means that the random variable $\omega_n$, with the same distribution as the waiting time $w_n$, converges to a limit random variable that is the supremum of the partial sums $\sum_{k=0}^{j} \zeta_k$ in the series. From this relation, it follows that the steady-state distribution $W(x)$ of the waiting time is

$$W(x) = \lim_{n \to \infty} \Pr[w_n < x] = \lim_{n \to \infty} \Pr[\omega_n < x] = \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right]$$

if the latter probability exists, i.e. is not zero for all $x$. Lindley has proved that, if $\rho < 1$, the latter corresponds to a proper probability distribution. In other words, the steady-state distribution of the waiting time in a GI/G/1 system³ exists. Alternatively, the Markov process $\{w_n\}_{n \ge 1}$ is ergodic if $\rho < 1$.
Lindley's proof is as follows. Due to the assumption that the $\zeta_n$ are i.i.d. random variables, the Strong Law of Large Numbers (6.3) is applicable: $\Pr\left[\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} \zeta_k = E[\zeta]\right] = 1$, where $E[\zeta] = E[x] - E[\tau] < 0$ (the mean service time is smaller than the mean interarrival time) if $\rho < 1$, while $E[\zeta] > 0$ if $\rho > 1$. In case $\rho > 1$, there exists a number $\delta > 0$ and a $\nu > 1$ such that, for all $n > \nu$, $\sum_{k=0}^{n} \zeta_k \ge \delta\, E[\zeta]\, n$ holds with probability 1. For large $n$, $\sum_{k=0}^{n} \zeta_k$ can be made larger than any fixed $x$, such that $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] = 0$. In case $\rho < 1$, we have for sufficiently large $n$ that $\sum_{k=0}^{n} \zeta_k < 0$. Thus, for any $x > 0$ and $\epsilon > 0$, there exists a number $\nu$ (independent of $x$) such that, for all $n > \nu$,

$$\Pr\left[\sum_{k=0}^{n} \zeta_k < x\right] \ge \Pr\left[\sum_{k=0}^{n} \zeta_k < 0\right] > 1 - \epsilon$$

Since $\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k$ is attained for $j < \nu$ or $j > \nu$ and because both regimes can be bounded by the same lower bound,

$$\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] > \Pr\left[\sum_{k=0}^{j} \zeta_k < x\right] 1_{j < \nu} + \Pr\left[\sum_{k=0}^{j} \zeta_k < x\right] 1_{j > \nu} > 1 - \epsilon$$

Clearly, $\lim_{x \to \infty} \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] = 1$ and $\Pr[w_n < 0] = 0$; thus $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right]$ is non-decreasing and a proper probability distribution. We omit the considerations for the case $\rho = 1$.

² This observation cannot be made from (13.4) because $\zeta_n$, which affects all but the first term in the maximum, can be negative.
³ Notice that the analysis crucially relies on the independence of the interarrival and service processes.
$$\Pr[w_{n+1} < x] = \begin{cases} \Pr[w_n + \zeta_n < x] & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

With the law of total probability (2.46), and since $\zeta_n$ can be negative, the right-hand side is

$$\Pr[w_n + \zeta_n < x] = \int_{-\infty}^{\infty} \Pr[w_n < x - s\,|\,\zeta_n = s]\, \frac{d}{ds}\Pr[\zeta_n < s]\, ds$$

Using the independence of $w_n$ and $\zeta_n$, and that $w_n \ge 0$, we obtain for $x \ge 0$,

$$\Pr[w_n + \zeta_n < x] = \int_{-\infty}^{x} \Pr[w_n < x - s]\, d\Pr[\zeta_n < s]$$

while $\Pr[w_{n+1} < x] = 0$ if $x < 0$. In the steady state $n \to \infty$, this yields Lindley's integral equation,

$$W(x) = \int_{-\infty}^{x} W(x - s)\, d\Pr[\zeta < s] \qquad (x \ge 0) \qquad (13.5)$$
The integral equation (13.5) is of the Wiener–Hopf type and is treated in general by Titchmarsh (1948, Section 11.17) and specifically by Kleinrock (1975, Section 8.2) and Cohen (1969, p. 337). Apart from Lindley's approach, Pollaczek has used variants of the complex integral expression

$$\max(x, 0) = \frac{x\, e^{-ax}}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{xz}}{z - a}\, dz \qquad (c > \operatorname{Re}(a) > 0)$$

to treat the complicating non-linear function $\max(x, 0)$ in (13.3). Several other approaches (Kleinrock, 1975, Chapter 8) have been proposed to solve (13.3). We will only discuss the approach due to Beneš, because his approach does not make the confining assumption that both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables. As mentioned before, in Internet traffic, which has been shown to be long-range dependent (i.e. correlated over many time units, mainly due to TCP's control loop), the interarrival times can be far from independent.
Fig. 13.3. The amount of work $\Lambda(t)$ arriving to the queueing system versus time $t$. At $t = u$, we observe that $\xi(u) = \Lambda(u) - u + v(0^-) > 0$. The largest value of $\xi(u) - \xi(s)$ is found for $s = t_1$, the only point where $\xi$ is negative in $[0, u)$. Graphically, we shift the line at $45°$ so as to intersect the point $(t_1, \Lambda(t_1))$ to determine $v(u)$. At $t = \theta$, $\xi(\theta) < 0$ and the largest negative value of $\xi(t)$ in $[0, \theta)$ is attained at $t = t_8$. Three of the five idle periods have also been shown.
When the packets arrive according to a counting process $A(t)$, the amount of work arriving in $[0, t)$ is

$$\Lambda(t) = \sum_{j=1}^{A(t)} x_j$$

In general, however, the work may arrive continuously over time, possibly with jumps at certain times. The purpose is to determine the unfinished work or virtual waiting time $v(t)$ at a time instant $t$, and not over a time interval $[0, t)$ as the previously defined quantities and $\Delta(t) = \int_0^t N_A(u)\, du$. Clearly, for $t > 0$, the unfinished work at time $t$ consists of the total amount of work brought in by arrivals during $[0, t)$, plus the amount of work present just before $t = 0$, minus the total time the server has been active,

$$v(t) = v(0^-) + \Lambda(t) - b(t) \qquad (13.6)$$

where $b(t)$ is the total busy time of the server in $[0, t)$ and the complementary total idle time $\iota(t)$ obeys

$$t = b(t) + \iota(t) \qquad (13.7)$$

Moreover, $\Lambda(t)$, $\iota(t)$ and $b(t)$ are non-decreasing and right-continuous (jumps may occur) functions of time $t$. Since $\iota(t)$ and $b(t)$ are complementary, it is convenient to eliminate $b(t)$ from (13.6) and (13.7) and further concentrate on the total idle time $\iota(t)$, given as

$$\iota(t) = v(t) + t - v(0^-) - \Lambda(t) \qquad (13.8)$$

If $v(u) > 0$ at any time $u \in [0, t)$, then $\iota(t) = 0$. On the other hand, if $v(u) = 0$ at some time $u \in [0, t)$, then it follows from (13.8) that

$$\iota(u) = u - v(0^-) - \Lambda(u) \qquad (13.9)$$

Since $\iota(t)$ is non-decreasing in $t$, the total idle time in the interval $[0, t)$ is the largest value of (13.9) over the moments $u$ when the buffer is empty ($v(u) = 0$) that can be reached in $[0, t)$, and the supremum is needed because $\iota(t)$ can increase discontinuously (in jumps). Combining the two regimes, we obtain in general that

$$\iota(t) = \max\left[0,\, \sup_{0 < u < t}\left(u - v(0^-) - \Lambda(u)\right)\right] \qquad (13.10)$$
Equating the two general expressions (13.8) and (13.10) for the total idle time of the server leads to a new relation for the unfinished work,

$$v(t) = v(0^-) + \Lambda(t) - t + \max\left[0,\, \sup_{0 < u < t}\left(u - v(0^-) - \Lambda(u)\right)\right]$$
$$= \max\left[v(0^-) + \Lambda(t) - t,\, \sup_{0 < u < t}\left\{\Lambda(t) - t - (\Lambda(u) - u)\right\}\right]$$

With the excess work defined as $\xi(t) = v(0^-) + \Lambda(t) - t$, this becomes

$$v(t) = \max\left[\xi(t),\, \sup_{0 < u < t}\left\{\xi(t) - \xi(u)\right\}\right]$$

and, with the convention that $v(0^-) = \sup_{0 < u < 0^-}\{\xi(0) - \xi(u)\} = \xi(0)$,

$$v(t) = \sup_{0 \le u \le t}\left\{\xi(t) - \xi(u)\right\} \qquad (13.11)$$

The unfinished work $v(t)$ at time $t$ is equal to the largest value of the overload or excess work during any interval $[u, t) \subseteq [0, t)$. The relation (13.11) is illustrated and further explained in Fig. 13.3. This general relation (13.11) shows that the unfinished work is the maximum of a stochastic process. Furthermore, if $v(t) = 0$, (13.11) indicates that $\sup_{0 \le u \le t}\{\xi(t) - \xi(u)\} = 0$. Let $u^*$ denote the value at which $\sup_{0 \le u \le t}\{\xi(t) - \xi(u)\} = \xi(t) - \xi(u^*) = 0$. But $\xi(u^*)$ is then the lowest value of $\xi$ in $[0, t)$ and, unless an arrival occurs during the interval $[t, t + \Delta t]$, $\xi(t + \Delta t) = \xi(t) - \Delta t < \xi(t)$. This argument shows that, as soon as a new idle period begins, $\xi(t)$ attains the minimum value so far.

During the idle period, as shown in Fig. 13.4, $\xi(t)$ further decreases linearly with slope $-1$ towards a new minimum $\xi(b_j)$ in $[0, b_j]$ until the beginning of a new busy period, say the $j$-th at $t = b_j$. Then, for all $b_j < t < b_{j+1}$,

$$v(t) = \xi(t) - \xi(b_j) = \sup_{b_j \le u \le t}\left\{\xi(t) - \xi(u)\right\}$$

In other words, we observe that idle periods decouple the past behavior from future behavior, as deduced earlier from the waiting time analysis in Section 13.2. As illustrated in Fig. 13.4, the series $\{\xi(b_j)\}$, where $b_j$ denotes the start of the $j$-th busy period, is monotonously decreasing in $b_j$, i.e. $\xi(b_j) > \xi(b_{j+1})$ for any $j$.
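In discrete time, the equivalence between the direct recursion for the unfinished work and the reflection form (13.11), $v_k = \max_{0 \le j \le k}(\xi_k - \xi_j) = \xi_k - \min_{0 \le j \le k} \xi_j$, can be verified on a sample path. The slotted model (one unit of work drained per slot) and the arrival distribution below are illustrative assumptions, not the book's example.

```python
import random
random.seed(1)

# Work a_k (in whole slots) arrives in slot k; the server drains 1 per slot.
a = [random.choice([0, 0, 1, 3]) for _ in range(1000)]

# Direct recursion for the unfinished work (discrete-time Lindley)
v, direct = 0, []
for work in a:
    v = max(v + work - 1, 0)
    direct.append(v)

# Reflection form of (13.11): v_k = xi_k - min(xi_0, ..., xi_k), xi_0 = 0
xi, running_min, reflected = 0, 0, []
for work in a:
    xi += work - 1
    running_min = min(running_min, xi)
    reflected.append(xi - running_min)

print(direct == reflected)  # True on every sample path
```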
Fig. 13.4. The excess work $\xi(t)$ for the same process as in the previous plot. The arrows with $b_j$ denote the start of the $j$-th busy period. Observe that $\xi(b_j)$ is the minimum so far and that a busy period ends at the $t > b_j$ for which $\xi(t) = \xi(b_j)$. The length of a busy period has been represented by a double arrow.
Let us proceed to compute the distribution of the unfinished work following an idea due to Beneš. Beneš applies the identity⁴, valid for all $z$,

$$e^{-zt} = 1 - z \int_0^t e^{-z\eta}\, d\eta$$

Let $t \to \iota(t)$; then

$$e^{-z\iota(t)} = 1 - z \int_{\iota^{-1}(0)}^{\iota^{-1}(t)} e^{-z\iota(u)}\, d\iota(u)$$

where $\iota^{-1}(t)$ is the inverse function. Note that $\iota(0) = 0$ and that $d\iota(u) = 1_{\{v(u) = 0\}}\, du$, so that

$$e^{-z\iota(t)} = 1 - z \int_0^t e^{-z\iota(u)}\, 1_{\{v(u) = 0\}}\, du$$

Substituting (13.9) in the integral, which is only valid if $v(u) = 0$, and (13.8) at the left-hand side, which is generally valid, gives

$$e^{-z\left(v(t) + t - v(0^-) - \Lambda(t)\right)} = 1 - z \int_0^t e^{-z\left(u - v(0^-) - \Lambda(u)\right)}\, 1_{\{v(u) = 0\}}\, du$$

⁴ Borovkov (1976, p. 30) proposes another, less simple approach that avoids the identity ingeniously introduced by Beneš.
Since $v(0^-) + \Lambda(t) - t = \xi(t)$, the last relation can be rewritten as

$$e^{-z v(t)} = e^{-z\xi(t)} - z \int_0^t e^{-z(\xi(t) - \xi(u))}\, 1_{\{v(u) = 0\}}\, du$$

Taking expectations and recalling (2.34), the definition of a generating function (2.37), and further (2.61), the expectation under the integral,

$$Q = E\left[e^{-z(\xi(t) - \xi(u))}\, 1_{\{v(u) = 0\}}\right]$$

can, with (2.45), be written as

$$Q = \int_{-\infty}^{\infty} e^{-zx}\, \frac{d}{dx}\left\{\Pr[\xi(t) - \xi(u) \le x\,|\,v(u) = 0]\, \Pr[v(u) = 0]\right\} dx$$

Hence,

$$\int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = \int_{-\infty}^{\infty} e^{-zx} f_{\xi(t)}(x)\, dx - z \int_{-\infty}^{\infty} e^{-zx}\, \frac{d}{dx}\left[\int_0^t \Pr[\xi(t) - \xi(u) \le x\,|\,v(u) = 0]\, \Pr[v(u) = 0]\, du\right] dx$$

After partial integration of the left- and right-hand sides and division by $z$, we arrive at

$$\int_{-\infty}^{\infty} e^{-zx}\, \Pr[v(t) \le x]\, dx = \int_{-\infty}^{\infty} e^{-zx}\left\{\Pr[\xi(t) \le x] - \frac{d}{dx}\int_0^t \Pr[\xi(t) - \xi(u) \le x\,|\,v(u) = 0]\, \Pr[v(u) = 0]\, du\right\} dx$$

which is equivalent to

$$\Pr[v(t) \le x] = \Pr[\xi(t) \le x] - \frac{d}{dx}\int_0^t \Pr[\xi(t) - \xi(u) \le x\,|\,v(u) = 0]\, \Pr[v(u) = 0]\, du \qquad (13.12)$$
This general relation for the distribution of the unfinished work in terms of the excess work is the Beneš equation. If $v(u) = 0$ for all $u \in [0, t)$, this means that during that interval no work arrives and that $\xi(t) - \xi(u) = u - t$, or that $-t \le \xi(t) - \xi(u) \le 0$ for any $u \in [0, t)$. Thus, if we choose $x \in [-t, 0)$ such that the event $\{\xi(t) - \xi(u) = u - t \le x\}$ is possible, the probabilities appearing in the right-hand side are not identically zero while $\Pr[v(t) \le x] = 0$. Hence, for $x \in [-t, 0)$, the Beneš equation reduces to

$$\Pr[\xi(t) \le x] = \int_0^t \frac{d\Pr[\xi(t) - \xi(u) \le x\,|\,v(u) = 0]}{dx}\, \Pr[v(u) = 0]\, du$$

from which the unknown probability of an empty system, $\Pr[v(u) = 0]$, can be found⁵ for $t + x \le u \le t$. The Beneš equation translates the problem of finding the time-dependent virtual waiting time or unfinished work into an integral equation that, in principle, can be solved. We further note that in the derivation hardly any assumptions about the queueing system or the arrival process are made, such that the Beneš equation provides the most general description of the unfinished work in any queueing system. Of course, the price for generality is a considerable complexity in the integral equation to be solved. However, we will see examples⁶ of its use in ATM.
13.3.1 A constant service rate

If the server operates deterministically, as in ATM for example, the amount of work arriving to the queueing system in the interval $[0, t)$ simplifies to $\Lambda(t) = A(t)$, the number of ATM cells arriving in the interval $[0, t)$, because $x_j = x$ is the time to process one ATM cell, which we take as time unit $x = 1$. With this convention, we have that $\xi(t) = A(t) - t$. After substitution of $u = t - y$, the integral $I$ in (13.12) is

$$I = \int_0^t \frac{d\Pr[\xi(t) - \xi(t - y) \le x\,|\,v(t - y) = 0]}{dx}\, \Pr[v(t - y) = 0]\, dy$$

and the event

$$\{\xi(t) - \xi(t - y) \le x\} = \{A(t) - A(t - y) \le x + y\}$$

⁵ This relation in the unknown function $f(u) = \Pr[v(u) = 0]$ is a Volterra equation of the first kind (see e.g. Morse and Feshbach (1978, Chapter 8)),

$$g(z) = \int_a^z K(z|u)\, f(u)\, du$$

These integral equations frequently appear in physics, in boundary problems, potential theory and Green's function theory.

⁶ Borovkov (1976) investigates the Beneš method in more detail. He further derives from (13.12) formulae for light and heavy traffic, and for the discrete-time process.
262
b{+wc
L=
n=d{e
Hence, for a discrete queue with time slots equal to the constant service
time, the Benes equation reduces to
Pr [y(w) {] = Pr [D(w) b{ + wc]
X
(13.13)
b{+wc
n=d{e
(w)
lim (w) = lim w
1 = 4
w<"
w<"
w
and thus limw<" Pr [(w) {] = 1 and limw<" (w)
w = 1 ? 0. From (13.7),
we see that
(w)
e(w)
= 1 lim
=1
lim
w<" w
w<" w
263
(13.14)
If the Strong Law of Large Numbers is applicable, which implies that the lengths of the idle periods are independent and identically distributed, the corresponding relation $V(0) = \lim_{t\to\infty} \Pr[v(t) = 0] = 1 - \rho$ is proved to be true by Borovkov (1976, pp. 33–34). Hence, for any stationary single-server system with traffic intensity $\rho$, the probability of an empty system at an arbitrary time is $1 - \rho$.

Taking the limit $t \to \infty$ in (13.12) then yields

$$V(x) = 1 - \lim_{t \to \infty} \int_0^t \frac{d\Pr[\xi(t) - \xi(t - y) \le x\,|\,v(t - y) = 0]}{dx}\, \Pr[v(t - y) = 0]\, dy$$

The tail probability $1 - V(x) = \lim_{t \to \infty} \Pr[v(t) > x] = \Pr[v(t_\infty) > x]$ is

$$\Pr[v(t_\infty) > x] = \int_0^\infty \frac{d\Pr[\xi(t_\infty) - \xi(t_\infty - y) \le x\,|\,v(t_\infty - y) = 0]}{dx}\, \Pr[v(t_\infty - y) = 0]\, dy$$

This relation shows that, at a point in the steady state $t_\infty \to \infty$, the contributions to $V(x)$ are due to arrivals and idle periods in the past. The corresponding steady-state equation for (13.13) is

$$\Pr[v(t_\infty) > x] = \sum_{k = \lceil x \rceil}^{\infty} \Pr\left[v(t_\infty + x - k) = 0,\, \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k\right] \qquad (13.15)$$
When the arrivals per timeslot rather than the interarrival times are specified, the counting process has more advantages in a discrete-time analysis. In the latter, the queueing system is observed at certain moments in time, for instance, at the beginning of a timeslot $k$ that starts immediately⁷ after the departure of the $k$-th packet and is equal to the interval $[r_k, r_{k+1}]$, for all $r_k > 0$ and $r_0 = 0$. It will be convenient to simplify the notation: $S_k = N_S(r_k^+)$ denotes the system content (i.e. the number of occupied queue positions including the packets currently being served) at the beginning of timeslot $k$, $Q_k = N_Q(r_k^+)$ is the queue content at the beginning of timeslot $k$, and $U_k$ and $A_k$ are the number of served packets and of arriving packets during timeslot $k$, respectively. The system content satisfies the continuity (or balance) equation

$$S_{k+1} = (S_k - U_k)^+ + A_k \qquad (13.16)$$

where $(x)^+ \equiv \max(x, 0)$. On the other hand, the relation between system and queue content implies that

$$Q_k = (S_k - U_k)^+ \qquad (13.17)$$

such that (13.16) is rewritten as

$$S_{k+1} = Q_k + A_k \qquad (13.18)$$
This yields the implication

$$\left\{N_S(r_n^+) \ge j\right\} \Longrightarrow \left\{N_S(t_{n+j+1}^-) \ge j\right\}$$

Fig. 13.5. Relation between queue observations at arrival and at departure epochs.

Consider now the converse. Suppose that the $(n+j+1)$-th packet sees precisely $k \ge j$ packets in front of it upon arrival: $N_S(t_{n+j+1}^-) = k$. This implies that the $(n+j+1-k)$-th packet is the first packet that will leave the system after $t = t_{n+j+1}$ and that the $(n+j-k)$-th packet has already left the system. At its departure at $t = r_{n+j-k} \le t_{n+j+1}$, it has observed at most $k$ packets behind it, because only arrivals are possible in the interval $[r_{n+j-k}, t_{n+j+1})$. Hence, $N_S(r_{n+j-k}^+) \ge j$ and, setting $k = j$, $N_S(r_n^+) \ge j$, leading to the implication

$$\left\{N_S(t_{n+j+1}^-) \ge j\right\} \Longrightarrow \left\{N_S(r_n^+) \ge j\right\}$$

Combining both implications leads to the equivalence

$$\left\{N_S(t_{n+j+1}^-) \ge j\right\} \Longleftrightarrow \left\{N_S(r_n^+) \ge j\right\}$$

or, for any sample path (or realization), it holds, for any non-zero integer $j$, that

$$\Pr\left[N_S(t_{n+j+1}^-) \ge j\right] = \Pr\left[N_S(r_n^+) \ge j\right]$$

In steady state for $n \to \infty$, with $\lim_{n \to \infty} N_S(t_n^-) = N_{S;A}$ and $\lim_{n \to \infty} N_S(r_n^+) = N_{S;D}$, we find that

$$\Pr[N_{S;A} = j] = \Pr[N_{S;D} = j] \qquad (13.19)$$
13.5 PASTA

Let us denote by $\lim_{t \to \infty} N_S(t) = N_S$ the steady-state system content or the number of packets in the system in steady state. To compute the waiting time distribution (under a FIFO service discipline), we must take the view of how a typical arriving packet in steady state finds the queue. Therefore, it is of interest to know when

$$\Pr[N_{S;A} = j] \overset{?}{=} \Pr[N_S = j] \qquad (13.20)$$

The equality would imply that, in steady state, the probability that an arriving packet finds the system in state $j$ equals the probability that the system is in state $j$. Recall with (6.1) that the existence of the probabilities means that $\Pr[N_S = j]$ also equals the long-run fraction of the time the system contains $j$ packets or is in state $j$. Similarly, $\Pr[N_{S;A} = j]$ also equals the long-run fraction of arriving packets that see the system in state $j$. In general, relation (13.20) is unfortunately not true. For example, consider a D/D/1 queue with a constant interarrival time $\tau_c$ and a constant service time $x_c < \tau_c$. Clearly, the D/D/1 system has a periodic service cycle: a busy period takes $x_c$ time units and the idle period equals $\tau_c - x_c$ time units. Thus, every arriving packet always finds the system empty, and we conclude $\Pr[N_{S;A} = 0] = 1$, while $\Pr[N_S = 1] = \frac{x_c}{\tau_c}$ and $\Pr[N_S = 0] = \frac{\tau_c - x_c}{\tau_c}$. The waiting time computation of the GI/D/c system in Section 14.4.2 is another counterexample. Since the arrival process $\{N_A(t),\, t \ge 0\}$ interacts with the system process $\{N_S(t),\, t \ge 0\}$, because every arrival increases the system content by one, they are dependent processes. Relation (13.20) is true for Poisson arrivals, and this property is called Poisson arrivals see time averages (PASTA).
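The D/D/1 counterexample can be spelled out numerically; the values of $\tau_c$ and $x_c$ below are illustrative assumptions.

```python
tau_c, x_c = 5.0, 2.0           # constant interarrival and service times, x_c < tau_c

# Time-average distribution of the system content over one cycle of length tau_c
p_busy = x_c / tau_c            # Pr[N_S = 1] = 0.4
p_idle = (tau_c - x_c) / tau_c  # Pr[N_S = 0] = 0.6

# Every (deterministic) arrival finds the previous packet long gone
p_arrival_sees_empty = 1.0

print(p_arrival_sees_empty == p_idle)  # False: arrivals do NOT see time averages
```

The mismatch arises precisely because the deterministic arrivals sample the cycle at its most favorable instants, unlike Poisson arrivals.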
Theorem 13.5.1 (PASTA) The long-run fraction of time that a process spends in state $j$ is equal to the long-run fraction of Poisson arrivals that find the process in state $j$,

$$\Pr[N_{S;A} = j] = \Pr[N_S = j]$$

Proof: See⁸ e.g. Wolff (1982).

The Poisson process has the typical property that future increments are independent of the past and thus also of the past system history. In a certain sense, Poisson arrivals perform a random sampling which is sufficient to characterize the steady state of the system exactly. The PASTA property also applies to Markov chains. The transitions in continuous-time Markov chains are Poisson processes if self-transitions are allowed (see Section 10.4.1). For any state $j$, the fraction of Poisson events that see the chain in state $j$ is $\pi_j$, which (see Lemma 6.1.2) also equals the fraction of time the chain is in state $j$.

13.6 Little's Law

Applied to the queue content and the waiting time, Little's Law reads

$$E[N_Q] = \lambda E[w] \qquad (13.21)$$

⁸ Although Wolff's general proof (Wolff, 1982) only contains two pages, it is based on martingales and on axiomatic probability theory.
Little's Law holds in steady state, provided the three limits

$$\lambda = \lim_{t \to \infty} \frac{A(t)}{t} \qquad (13.22)$$

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t N_S(u)\, du = E[N_S] \qquad (13.23)$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} T_k = E[T] \qquad (13.24)$$

exist.
Fig. 13.6. The arrival (bold) and departure (dotted) process, together with the system time $T_k$ for each packet in the queueing system.
Proof: Recall that $A(t)$ represents the total number of arrivals in the time interval $[0, t]$. If $N_S(t) = 0$, or the system is idle at time $t$, then

$$\int_0^t N_S(u)\, du = \int_0^t \sum_{j=1}^{\infty} 1_{\{t_j \le u < t_j + T_j\}}\, du = \sum_{j=1}^{A(t)} \int_{t_j}^{t_j + T_j} du = \sum_{k=1}^{A(t)} T_k$$

The general case where $N_S(t) \ge 0$ is more complicated, as Fig. 13.6 shows for $t = \theta$, because not all intervals $[t_j, t_j + T_j)$ for $1 \le j \le A(\theta)$ are contained in $[0, \theta)$. Hence, $\sum_{k=1}^{A(\theta)} T_k$ counts too much and is an upper bound for $\int_0^\theta N_S(u)\, du$. If $D(t)$ denotes the number of departures in $[0, t]$, Fig. 13.6 illustrates that the area (in grey) in an interval $[0, t]$, which equals the total number of packets in the system in that interval, $\int_0^t (A(u) - D(u))\, du = \int_0^t N_S(u)\, du$, can be bounded for any realization (sample path) and any $t \ge 0$ by

$$\sum_{k:\, T_k + t_k \le t} T_k \le \int_0^t N_S(u)\, du \le \sum_{k=1}^{A(t)} T_k$$
where the lower bound only counts the packets that have left the system by time $t$. By dividing by $t$, we have

$$\frac{1}{t} \sum_{k:\, T_k + t_k \le t} T_k \le \frac{1}{t} \int_0^t N_S(u)\, du \le \frac{A(t)}{t} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} \qquad (13.25)$$

Since we assume that the limit (13.22) exists, we have that $A(t) = O(t)$ for $t \to \infty$. From the existence of the limit (13.24), we can thus write

$$\lim_{t \to \infty} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} = E[T]$$

Moreover,

$$\frac{T_n}{n} = \frac{1}{n} \sum_{k=1}^{n} T_k - \frac{n-1}{n}\cdot\frac{1}{n-1} \sum_{k=1}^{n-1} T_k \to E[T] - E[T] = 0 \quad \text{as } n \to \infty$$

which implies that, for any $\epsilon > 0$, there exists a fixed $m$ such that, for all $k > m$, we have that $\frac{T_k}{t_k} < \epsilon$, or $t_k + T_k < (1 + \epsilon)t_k$. For $t > t_m$, the lower bound in (13.25) is

$$\sum_{k > m:\, T_k + t_k \le t} T_k + \sum_{k=1}^{m} T_k \ge \sum_{k=1}^{A(t/(1+\epsilon))} T_k$$

or

$$\frac{1}{t} \sum_{k \ge 1:\, T_k + t_k \le t} T_k \ge \frac{1}{1 + \epsilon} \cdot \frac{A(t/(1+\epsilon))}{t/(1+\epsilon)} \sum_{k=1}^{A(t/(1+\epsilon))} \frac{T_k}{A(t/(1+\epsilon))} \to \frac{\lambda\, E[T]}{1 + \epsilon}$$

Since $\epsilon > 0$ is arbitrary, both bounds in (13.25) tend to $\lambda E[T]$, which proves Little's Law, $E[N_S] = \lambda E[T]$.
Although the proof may seem rather technical⁹ for, after all, an intuitive result, it reveals that no assumptions about the distributions of the arrival and service process, apart from steady-state convergence, are made. There are no probabilistic arguments used. In essence, Little's Law is proved by showing that two limits exist for any sample path or realization of the process, which guarantees a very general theorem. Moreover, no assumptions are made about the service discipline, about the dependence between arrival and service process, or about the number of servers, which means that Little's Law also holds for non-FIFO scheduling disciplines, in fact for any scheduling discipline! Little's Law connects three essential quantities: once two of them are known, the third is determined by (13.21). Little's Law is very important in operations, where it relates the average inventory (similar to $E[N_S]$), the average flow rate or throughput and the average flow time $E[T]$ in a process flow of products or services. Several examples can be found in Chapter 14, in Anupindi et al. (2006) and in Bertsekas and Gallager (1992, pp. 157–162).

⁹ We have chosen a very general proof. Other proofs (e.g. in Ross (1996) and Gallager (1996)) use arguments from renewal reward theory (Section 8.4), which makes them less general because they require that the system has renewals.
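The sample-path character of the proof can be made concrete: integrating $N_S(t)$ through its jumps and measuring $\lambda$ and $E[T]$ on the same realization makes $E[N_S] = \lambda E[T]$ hold up to floating-point error on any horizon at which the system empties. The M/M/1 input and all parameters below are illustrative assumptions (not from the text).

```python
import random
random.seed(11)

lam, mu, n = 0.5, 1.0, 50_000

# Arrival and FIFO departure epochs of a single-server queue
t = r = 0.0
arrivals, departures = [], []
for _ in range(n):
    t += random.expovariate(lam)             # arrival epoch t_k
    r = max(r, t) + random.expovariate(mu)   # departure epoch r_k = max(r_{k-1}, t_k) + x_k
    arrivals.append(t)
    departures.append(r)

horizon = departures[-1]                     # the system is empty again here

# Integrate the step function N_S(t) through its +1/-1 jumps
events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
area, level, prev = 0.0, 0, 0.0
for time, jump in events:
    area += level * (time - prev)
    level += jump
    prev = time

EN = area / horizon                          # time average of N_S
rate = n / horizon                           # empirical lambda
ET = sum(d - a for a, d in zip(arrivals, departures)) / n   # mean system time
print(abs(EN - rate * ET) < 1e-6)            # Little's Law on this sample path
```

No distributional assumption is used in the check itself; replacing the exponential draws with any other positive random variables leaves the identity intact, exactly as the proof asserts.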
14

Queueing models

This chapter presents some of the simplest and most basic queueing models. Unfortunately, most queueing problems are not available in analytic form, and many queueing problems require a specific and sometimes tailor-made solution.

Besides the simple and classical queueing models, we also present two other exactly solvable models that have played a key role in the development of Asynchronous Transfer Mode (ATM). In these ATM queueing systems the service discipline is deterministic and only the arrival process is the distinguishing element. The first is the N*D/D/1 queue (Roberts, 1991, Section 6.2), whose solution relies on the Beneš approach. The arrivals consist of $N$ periodic sources, each with a period of $D$ time slots, but randomly phased with respect to each other. The second model is the fluid flow model of Anick et al. (1982), known as the AMS queue, which considers $N$ on-off sources as input. The solution uses Markov theory. Since the Markov transition probability matrix has a special tri-band diagonal structure, the eigenvector and eigenvalue decomposition can be computed analytically.

We would like to refer to a few other models. Norros (1994) succeeded in deriving the asymptotic probability distribution of the unfinished work for a queue with self-similar input, modeled via a fractional Brownian motion. The resulting asymptotic probability distribution turns out to be a Weibull distribution (3.40). Finally, Neuts (1989) has established a matrix analytic framework and was the founder of the class of Markov modulated arrival processes and derivatives such as the Batch Markovian Arrival Process (BMAP).
14.1 The M/M/1 queue

The M/M/1 queue consists of a Poisson arrival process, an exponentially distributed service time, one server and an infinitely long queue. The M/M/1 queue is a basic model in queueing theory for several reasons. First, as shown below, the M/M/1 queue can be computed in analytic form, even its transient behavior. Apart from this computational advantage, the M/M/1 queue possesses the basic feature of queueing systems: the quantities of interest (waiting time, number of packets, etc.) increase monotonically with the traffic intensity ρ.
Packets arrive at the M/M/1 queue with arrival rate λ and are served with service rate μ. The M/M/1 queue is precisely described by a constant-rate birth and death process. Any arrival of a packet to the queueing system can be regarded as a birth. The current state n, which reflects the number of packets in the M/M/1 system, jumps to state n + 1 at the arrival of a new packet, and the transition rate equals the arrival rate λ: on average, every 1/λ time units a packet arrives at the system. A packet leaves the M/M/1 system after service, which corresponds to a death: at each departure from the system the current state is decreased by one, with death rate equal to the service rate μ: on average, every 1/μ time units a packet is served. In the sequel we concentrate on the steady-state behavior and refer for the transient behavior to the discussion of the birth and death process in Section 11.3.3.
From the steady-state analysis of the birth and death process with λ_n = λ and μ_n = μ, the number of packets N_S in the M/M/1 system is geometrically distributed with the traffic intensity ρ = λ/μ < 1,

    Pr[N_S = n] = (1 − ρ) ρ^n    (14.1)

with probability generating function

    φ_{N_S}(z) = Σ_{n=0}^∞ Pr[N_S = n] z^n = (1 − ρ)/(1 − ρz)

The average number of packets in the M/M/1 system E[N_S] = φ'_{N_S}(1) equals

    E[N_{S;M/M/1}] = ρ/(1 − ρ)

while the variance Var[N_S] follows from (2.27) as

    Var[N_{S;M/M/1}] = ρ/(1 − ρ)²

Both the mean and the variance of the number of packets in the system diverge as ρ → 1. When the arrival rate tends to the service rate, the queue grows indefinitely long with indefinitely large variation. From Little's law (13.21), the average time spent in the M/M/1 system equals

    E[T_{M/M/1}] = E[N_S]/λ = 1/(μ(1 − ρ)) = 1/(μ − λ)    (14.2)

while the average waiting time in the queue is

    E[w_{M/M/1}] = E[T_{M/M/1}] − 1/μ = ρ/(μ(1 − ρ))    (14.3)
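These relations are straightforward to evaluate numerically. The sketch below (plain Python; the parameter values λ = 4, μ = 5 are illustrative choices of our own, not from the text) computes (14.1)-(14.3) and checks Little's law E[N_S] = λ E[T].

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 metrics for arrival rate lam and service rate mu (lam < mu)."""
    rho = lam / mu                      # traffic intensity
    EN = rho / (1 - rho)                # mean number in system
    VarN = rho / (1 - rho) ** 2         # variance of number in system
    ET = 1 / (mu - lam)                 # mean system time (14.2)
    Ew = rho / (mu * (1 - rho))         # mean waiting time in queue (14.3)
    return rho, EN, VarN, ET, Ew

rho, EN, VarN, ET, Ew = mm1_metrics(lam=4.0, mu=5.0)
assert abs(EN - 4.0 * ET) < 1e-12       # Little's law: E[N_S] = lam * E[T]
assert abs(ET - (Ew + 1 / 5.0)) < 1e-12 # E[T] = E[w] + E[x]
```

At ρ = 0.8 the queue already holds four packets on average, illustrating the 1/(1 − ρ) blow-up near ρ = 1.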
The system time of a packet that arrives when k packets are present is the sum of k + 1 independent exponential service times, with Erlang density μ(μt)^k e^{−μt}/k!. Using the law of total probability (2.46), the density f_{T_n}(t) = d Pr[T_n ≤ t]/dt of the system time T_n of the n-th packet, or the virtual waiting time at time t = t_n, becomes

    f_{T_n}(t) = Σ_{k=0}^∞ f_{T_n}(t | N_S(t_n^−) = k) Pr[N_S(t_n^−) = k]
               = μ e^{−μt} Σ_{k=0}^∞ ((μt)^k/k!) Pr[N_S(t_n^−) = k]

In Section 11.3.3, s_k(t_n^−) = Pr[N_S(t_n^−) = k] is computed in (11.27) assuming that the system starts with j packets, i.e. s_k(0) = δ_{kj}. In steady state, where t_n → ∞, it is shown that s_k(t_n^−) → (1 − ρ)ρ^k. In most cases, however, a time-dependent solution is not available in closed form. Fortunately, for Poisson arrivals, the PASTA property helps to circumvent this inconvenience. Based on the PASTA property, in steady state, lim_{n→∞} Pr[N_S(t_n^−) = k] = Pr[N_S = k] given by (14.1). The probability density function f_T(t) = lim_{n→∞} f_{T_n}(t) of the steady-state system time T (or the total waiting time of a packet) is

    f_T(t) = μ e^{−μt} Σ_{k=0}^∞ ((μt)^k/k!) (1 − ρ) ρ^k

or

    f_T(t) = μ(1 − ρ) e^{−μ(1−ρ)t}    (14.4)
Hence, the system time T in the steady-state M/M/1 queue is exponentially distributed with parameter μ(1 − ρ) = μ − λ. The waiting time w = T − x in the queue has density

    f_w(t) = (1 − ρ) δ(t) + λ(1 − ρ) e^{−μ(1−ρ)t}    (14.5)

where the first term with the Dirac delta function reflects a zero queueing time provided the system is empty, which has probability Pr[N_S = 0] = 1 − ρ. The Laplace transform of the waiting time in the queue follows from T_n = w_n + x_n as

    φ_w(z) = φ_T(z)/φ_x(z) = (1 − ρ)(z + μ)/(z + μ(1 − ρ))
           = (1 − ρ) + ρ μ(1 − ρ)/(z + μ(1 − ρ))
The interdeparture time r is conditioned on whether a departing packet leaves an empty system behind (case (a)) or at least one packet in the queue (case (b)),

    Pr[r ≤ t] = Pr[r ≤ t | N_S = 0] Pr[N_S = 0] + Pr[r ≤ t | N_S > 0] Pr[N_S > 0]

In case (a), we must wait for the next packet to arrive and to be served. This total time is the sum of an exponential random variable with rate λ and an exponential random variable with rate μ. It is most convenient to compute the Laplace transform, as shown in Section 3.3.1,

    φ_{r|N_S=0}(z) = ∫_0^∞ e^{−zt} d(Pr[r ≤ t | N_S = 0]) = (λ/(z + λ)) (μ/(z + μ))

In case (b), the next packet leaves the M/M/1 queue after an exponential service time with rate μ,

    φ_{r|N_S>0}(z) = ∫_0^∞ e^{−zt} d(Pr[r ≤ t | N_S > 0]) = μ/(z + μ)

Hence,

    φ_r(z) = φ_{r|N_S=0}(z) Pr[N_S = 0] + φ_{r|N_S>0}(z) Pr[N_S > 0]
           = (λ/(z + λ)) (μ/(z + μ)) (1 − ρ) + (μ/(z + μ)) ρ = λ/(z + λ)

so that the interdeparture time is exponentially distributed with rate λ, which proves the theorem.
Burke's Theorem states that the steady-state arrival and departure processes of the M/M/1 queue are the same! Consequently, the steady-state departure rate equals the steady-state arrival rate λ.
14.2 Variants of the M/M/1 queue
A number of variants readily obtained from the birth-death analogy are worth considering here. Mainly a steady-state analysis is presented.
14.2.1 The M/M/m queue
Instead of one server, we consider the case with m servers. The buffer is still infinitely long and the interarrival process is exponential with arrival rate λ. The M/M/m queue can model a router with m physically different interfaces (or output ports) with the same transmission rate towards the same next hop. All packets destined to that next hop can be transmitted over any of the m interfaces. This type of load balancing frequently occurs in the Internet.
As shown in Fig. 14.1, the M/M/m system can still be described by a birth and death process with birth rate λ_k = λ, but with death rate μ_k = kμ for 0 ≤ k ≤ m and μ_k = mμ if k ≥ m. Indeed, if there are k ≤ m packets in the system, they can all be served and the departure (or death) rate from the system is kμ. Only if there are more packets, k > m, can just m of them be served, such that the death rate is limited to the maximum service rate mμ.
Fig. 14.1. State transition graph of the M/M/m birth and death process: from every state the birth rate is λ; the death rate is kμ in state k ≤ m (μ, 2μ, ..., (m − 1)μ, mμ) and mμ in every state k ≥ m.
From the steady-state relations for the birth and death process, we find

    Pr[N_S = 0] = ( Σ_{j=0}^{m−1} (λ/μ)^j/j! + (λ/μ)^m/(m!(1 − ρ)) )^{−1}    (14.6)

and

    Pr[N_S = j] = ((λ/μ)^j/j!) Pr[N_S = 0]    for j ≤ m    (14.7)
    Pr[N_S = j] = ((λ/μ)^j/(m^{j−m} m!)) Pr[N_S = 0] = (ρ^j m^m/m!) Pr[N_S = 0]    for j ≥ m    (14.8)

The traffic intensity is ρ = λ/(mμ), the ratio between the average arrival rate and the average (maximum) service rate. Again, ρ < 1 corresponds to the stable (ergodic) regime.
For the M/M/m system it is of interest to know the probability of queueing. Queueing occurs when an arriving packet finds all servers busy, which happens with probability Pr[N_S ≥ m], or explicitly,

    Pr[N_S ≥ m] = Pr[N_S = 0] (mρ)^m/(m!(1 − ρ))    (14.9)
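Formula (14.9) is the classical Erlang C formula. A minimal numerical sketch (illustrative implementation and parameters, not from the text) evaluates it via (14.6); for m = 1 the queueing probability reduces to ρ itself, which serves as a check.

```python
from math import factorial

def erlang_c(lam, mu, m):
    """Pr[N_S >= m] in an M/M/m queue via (14.6) and (14.9); requires lam/(m*mu) < 1."""
    a = lam / mu                        # offered load in Erlang
    rho = a / m                         # traffic intensity
    p0 = 1.0 / (sum(a**j / factorial(j) for j in range(m))
                + a**m / (factorial(m) * (1 - rho)))
    return p0 * a**m / (factorial(m) * (1 - rho))

# single server: the probability of queueing is Pr[N_S >= 1] = rho
assert abs(erlang_c(0.5, 1.0, 1) - 0.5) < 1e-12
```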
The waiting time w in the queue is zero if an arriving packet finds fewer than m packets in the system. By the law of total probability,

    Pr[w ≤ t] = 1 − Pr[N_S ≥ m] + Pr[w ≤ t | N_S ≥ m] Pr[N_S ≥ m]    (14.10)

We observe from (14.1) that, if all m servers are busy, the system content of an M/M/m system behaves as that in an M/M/1 system with service rate mμ: given N_S = m + j upon arrival, the waiting time is the sum of j + 1 exponential interdeparture times with rate mμ,

    f_w(t | N_S = m + j) = mμ (mμt)^j e^{−mμt}/j!

Since Pr[N_S = m + j | N_S ≥ m] = (1 − ρ)ρ^j, the conditional probability density function of the waiting time in the M/M/m queue is also an exponential distribution,

    f_w(t | N_S ≥ m) = (1 − ρ) mμ e^{−mμt} Σ_{j=0}^∞ (ρmμt)^j/j! = (1 − ρ) mμ e^{−(1−ρ)mμt}

or

    Pr[w ≤ t | N_S ≥ m] = 1 − e^{−(1−ρ)mμt}

Substitution in (14.10) finally results in the distribution of the waiting time in the queue of the M/M/m system,

    F_w(t) = Pr[w ≤ t] = 1 − Pr[N_S ≥ m] e^{−(1−ρ)mμt}    (14.11)

with probability density function

    f_w(t) = (1 − Pr[N_S ≥ m]) δ(t) + Pr[N_S ≥ m] (1 − ρ) mμ e^{−(1−ρ)mμt}    (14.12)

The density of the system time T = w + x follows by convolution with the exponential service density μe^{−μt},

    f_T(t) = (1 − Pr[N_S ≥ m]) μ e^{−μt} + Pr[N_S ≥ m] (1 − ρ) mμ (e^{−(1−ρ)mμt} − e^{−μt})/(1 − m(1 − ρ))    (14.13)

and the average system time can be computed from (14.13) with (2.33), or directly from E[T] = E[w] + E[x], as

    E[T] = 1/μ + Pr[N_S ≥ m]/(mμ(1 − ρ))    (14.14)

Also, in the single-server case (m = 1), (14.12) reduces to the pdf (14.5) of the M/M/1 queue. Furthermore, Burke's Theorem 14.1.1 can be extended to the M/M/m queue: the arrival and departure processes of the M/M/m queue are both Poisson processes with rate λ.
14.2.2 The M/M/m/m queue
The difference with the M/M/m queue is that the number of packets (calls) in the M/M/m/m queue is limited to m. Hence, when more than m packets (calls) arrive, they are lost. This situation corresponds to classical telephony, where a conversation is possible if no more than m trunks are occupied; otherwise you hear a busy tone and the connection cannot be set up. The limitation to m arrivals is modeled in the birth and death process by limiting the arrival rates: λ_k = λ if k < m and λ_k = 0 if k ≥ m. The death rates are the same as in the M/M/m queue, μ_k = kμ for 0 ≤ k ≤ m and μ_k = mμ if k ≥ m.
From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find

    Pr[N_S = 0] = ( Σ_{j=0}^{m} (λ/μ)^j/j! )^{−1}    (14.15)

    Pr[N_S = j] = ((λ/μ)^j/j!) Pr[N_S = 0]  for j ≤ m,  and  Pr[N_S = j] = 0  for j > m    (14.16)

The quantity of interest in the M/M/m/m system is the probability that all trunks (servers) are busy, which is known as the Erlang B formula,

    Pr[N_S = m] = ((λ/μ)^m/m!) / Σ_{j=0}^{m} (λ/μ)^j/j!    (14.17)
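Numerically, (14.17) is best evaluated with the well-known recursion B(0, r) = 1, B(k, r) = rB(k − 1, r)/(k + rB(k − 1, r)), which avoids large factorials. The sketch below is an illustrative implementation (not from the text), checked against (14.17) for small cases.

```python
def erlang_b(m, r):
    """Blocking probability Pr[N_S = m] of (14.17) for m servers and offered load r = lam/mu."""
    b = 1.0                             # B(0, r) = 1
    for k in range(1, m + 1):
        b = r * b / (k + r * b)         # B(k, r) from B(k-1, r)
    return b

# small cases, checked directly against (14.17):
assert abs(erlang_b(1, 1.0) - 0.5) < 1e-12   # r/(1 + r)
assert abs(erlang_b(2, 1.0) - 0.2) < 1e-12   # (1/2)/(1 + 1 + 1/2)
```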
For m → ∞, the steady-state probabilities (14.16) tend to

    Pr[N_S = j] = ((λ/μ)^j/j!) exp(−λ/μ)

This queueing system is denoted as M/M/∞. Thus, the number in the M/M/∞ system (in steady state) is Poisson distributed with parameter λ/μ. Hence, in case m → ∞, the average number in the system is E[N_S] = λ/μ (as follows from (3.11)) and the average time in the system follows from Little's theorem (13.21) as E[T] = 1/μ. The fact that, if m → ∞, the mean time in the M/M/∞ system equals the average service time has a consistent explanation: if the number of servers m → ∞, implying that there is infinite service capacity, no packet ever waits, and the only time a packet spends in the system is its service time 1/μ.
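As a quick numerical confirmation of the Poisson law above (illustrative code with λ/μ = 2.5 chosen arbitrarily):

```python
from math import exp, factorial

def mm_inf_pmf(lam, mu, j):
    """Pr[N_S = j] in the M/M/inf queue: Poisson with parameter lam/mu."""
    a = lam / mu
    return a**j / factorial(j) * exp(-a)

# the mean of the distribution equals lam/mu, here 2.5
mean = sum(j * mm_inf_pmf(2.5, 1.0, j) for j in range(60))
assert abs(mean - 2.5) < 1e-9
```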
Example 2 Consider two voice over IP (VoIP) gateways connected by a link with capacity C. Denote the capacity of a voice call by C_voice (in bit/s). For example, in ISDN, C_voice = 64 kb/s. In general, C_voice in VoIP depends on the specifics of the codecs used. The offered voice traffic can be expressed in terms of the number a of call attempts per hour and the mean call duration d (in seconds) as

    λ/μ = a d/3600

The number m of calls that the link can carry simultaneously is

    m = C/C_voice

Since the arrival process of voice calls is well modeled by a Poisson process with exponential holding times, the Erlang B formula (14.17) is applicable to compute the blocking probability or grade of service (GoS) as

    ε = (r^m/m!) / Σ_{j=0}^{m} r^j/j!    (14.18)

where r = λ/μ = mρ and ρ = λ/(mμ) is the traffic intensity. This relation (14.18) specifies the probability that admission control will have to refuse a call request between the two VoIP gateways because the link is already transporting m calls. An Internet service provider can make a trade-off between the link capacity C (by hiring more links or a higher-capacity link from a network provider) and the blocking probability or GoS. The latter must be small enough to keep its subscribed customers, but large enough to make a profit. A reasonable value for the GoS seems ε = 10^{−4}. If the Internet service provider hires a 2 Mb/s link and offers its customers VoIP software with codec rate 40 kb/s (G.726 standard), then m = 50. Since the left-hand side of (14.18) is strictly increasing in r, solving the equation (14.18) for r yields r ≈ 28.87, or a traffic intensity ρ = 0.5775. Furthermore, since ρ = λ/(mμ) and C_voice = 40 kb/s, the average capacity used on the link equals ρC = 1.155 Mb/s. If the mean call duration d (in seconds) is known, the number of call attempts per hour then follows as a = 4158129/d. If we assume that a telephone call lasts on average 2 minutes, or d = 120 s, the number of call attempts per hour that the Internet service provider can handle with a GoS of 10^{−4} equals a = 34651.
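Dimensioning exercises of this kind invert the Erlang B formula: given m and a target GoS ε, find the admissible load r. Since the blocking probability is strictly increasing in r, a simple bisection suffices. The sketch below is illustrative (the recursion for B(m, r) is the standard one; the parameters follow the example above, but the code itself is not from the text).

```python
def erlang_b(m, r):
    """Erlang B blocking probability via the standard stable recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = r * b / (k + r * b)
    return b

def offered_load_for_gos(m, eps):
    """Offered load r (Erlang) such that B(m, r) = eps, found by bisection on [0, m]."""
    lo, hi = 0.0, float(m)              # B is 0 at r = 0 and well above eps at r = m
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if erlang_b(m, mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo

r = offered_load_for_gos(m=50, eps=1e-4)
assert abs(erlang_b(50, r) - 1e-4) < 1e-6   # the GoS target is met
```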
The M/M/1/K queue limits the M/M/1 storage capacity to K packets: λ_j = λ for j < K and λ_j = 0 for j ≥ K. Since Σ_{j=0}^{K} ρ^j = (1 − ρ^{K+1})/(1 − ρ), the pdf of the system content for the M/M/1/K system becomes

    Pr[N_S = j] = (1 − ρ) ρ^j/(1 − ρ^{K+1})  for 0 ≤ j ≤ K,  and  Pr[N_S = j] = 0  for j > K    (14.19)

with Pr[N_S = 0] = (1 − ρ)/(1 − ρ^{K+1}). The probability that the system is completely filled with K packets equals

    Pr[N_S = K] = (1 − ρ) ρ^K/(1 − ρ^{K+1})    (14.20)

This probability also equals the loss probability for packets in the M/M/1/K system. Regarding the QoS problem in multimedia over IP networks, a first crude estimate of the packet loss in a router with K buffer positions can be derived from (14.20). The estimate is rather crude because the arrival process of packets in the Internet is likely not a Poisson process and the variable length of the packets does not necessarily lead to an exponential service rate.
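The loss probability (14.20) is trivial to compute; the sketch below (illustrative, not from the text) also cross-checks the case K = 1, which leaves room only for the packet in service and therefore coincides with Erlang B for a single server, r/(1 + r).

```python
def mm1k_loss(rho, K):
    """Loss probability Pr[N_S = K] of (14.20) in the M/M/1/K queue (rho != 1)."""
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

# K = 1 is the M/M/1/1 system, i.e. Erlang B with m = 1: rho/(1 + rho)
assert abs(mm1k_loss(0.5, 1) - 0.5 / 1.5) < 1e-12
```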
Hence, since A_n ≥ 0, we see that S_{n+1} ≥ (S_n − 1)^+, or that S_{n+1} < (S_n − 1)^+ is impossible. Hence, V_{ij} = 0 for j < i − 1 and i > 1 while, for i > 0, V_{ij} = Pr[i − 1 + A_n = j | S_n = i] = Pr[A_n = j − i + 1]. The case i = 0 results in V_{0j} = Pr[A_n = j] = V_{1j}. Denoting a_j = Pr[A_n = j], the transition probability matrix becomes²

        | a_0  a_1  a_2  a_3  ... |
        | a_0  a_1  a_2  a_3  ... |
    V = |  0   a_0  a_1  a_2  ... |
        |  0    0   a_0  a_1  ... |
        |  .    .    .    .   ... |

and the corresponding transition graph is sketched in Fig. 14.2.

Fig. 14.2. State transition graph for the M/G/1 embedded Markov chain: from state i ≥ 1, a transition to state i − 1 + j occurs with probability a_j.

The number of Poisson arrivals during a "time slot" [r_n, r_{n+1}] clearly depends on the length of the service time x_{n+1} = r_{n+1} − r_n, which is distributed according to F_x(t), independently of the specific packet n. Furthermore, the arrival process is a Poisson process with rate λ and independent of the state of the queueing process, thus Pr[A_n = j] = Pr[A = j]. Hence, using the law of total probability (2.46),

    Pr[A = j] = ∫_0^∞ Pr[A = j | x = t] dF_x(t) = ∫_0^∞ e^{−λt} ((λt)^j/j!) dF_x(t)    (14.22)

In terms of the Laplace transform φ_x(z) = ∫_0^∞ e^{−zt} dF_x(t) of the service time, (14.22) reads

    Pr[A = j] = ((−λ)^j/j!) (d^j φ_x(z)/dz^j)|_{z=λ}    (14.23)

and the mean number of arrivals during a service time, the traffic intensity, follows as

    ρ = E[A] = λ ∫_0^∞ t dF_x(t) = λE[x]    (14.24)

² The structure of this transition probability matrix V has been investigated in great depth by Neuts (1989). Moreover, V belongs to the class of matrices whose eigenstructure is explicitly given in Appendix A.5.3.
We apply the general relation for the pgf of the system content derived in Section 14.4,

    S(z) = (1 − A'(1)) (z − 1) A(z)/(z − A(z))

We further introduce into this general equation the details of the M/G/1 queueing system by specifying A(z) = Σ_{j=0}^∞ Pr[A = j] z^j. With (14.23), we find the Taylor expansion

    A(z) = Σ_{j=0}^∞ ((−λz)^j/j!) (d^j φ_x(s)/ds^j)|_{s=λ} = φ_x(λ − λz)    (14.25)

and, with A'(1) = ρ, the probability generating function of the system content of the steady-state M/G/1 queueing system,

    S(z) = (1 − ρ)(z − 1) φ_x(λ − λz)/(z − φ_x(λ − λz))    (14.26)
The average number of packets in the steady-state M/G/1 system follows from E[N_S] = S'(1). Since φ''_x(0) = E[x²], this yields

    E[N_S] = ρ + λ² E[x²]/(2(1 − ρ))    (14.27)

Hence, the average number of packets in the M/G/1 system (in steady state) is proportional to the second moment of the service time distribution. Since E[x²] = Var[x] + (E[x])², the relation⁴ (14.27) shows that, for equal average service rates, the service process with the highest variability leads to the largest average number of packets in the system. One of the early successes of the Japanese industry was the just-in-time (JIT) principle, which essentially tries to minimize the variability in a manufacturing process. Minimization of variability is also very important in the design of scheduling rules: the less variability, the more efficiently buffer places in a router are used. Since a deterministic server has the lowest variance (namely zero), the M/D/1 queue will hold on average the lowest number of packets. This design principle was used in ATM, where all service times precisely equal the time needed to serve one ATM cell. The average time spent in the system follows directly from Little's law (13.21),

    E[T] = E[N_S]/λ = E[x] + λ E[x²]/(2(1 − ρ))

and, since E[T] = E[x] + E[w], the average waiting time in the queue is

    E[w] = λ E[x²]/(2(1 − ρ))    (14.28)

Observe a general property of averages in queueing systems: there is a simple pole at ρ = 1. Both the average number of packets in the system (and in the queue) and the average waiting time grow unboundedly as ρ → 1.

⁴ The variability of a random variable X is commonly measured by the coefficient of variation √Var[X]/E[X].
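The effect of service-time variability in (14.28) is easily made concrete. The sketch below (illustrative parameters of our own) compares E[w] for exponential service (E[x²] = 2/μ²) and deterministic service (E[x²] = 1/μ²) at the same load: the M/D/1 queue waits exactly half as long as the M/M/1 queue.

```python
def pk_wait(lam, Ex, Ex2):
    """Mean waiting time (14.28) in an M/G/1 queue: lam*E[x^2] / (2*(1 - rho))."""
    rho = lam * Ex
    return lam * Ex2 / (2 * (1 - rho))

lam, mu = 0.8, 1.0
w_mm1 = pk_wait(lam, 1 / mu, 2 / mu**2)   # exponential service: E[x^2] = 2/mu^2
w_md1 = pk_wait(lam, 1 / mu, 1 / mu**2)   # deterministic service: E[x^2] = 1/mu^2
assert abs(w_md1 / w_mm1 - 0.5) < 1e-12   # M/D/1 waits half as long as M/M/1
assert abs(w_mm1 - 0.8 / (1.0 * 0.2)) < 1e-12   # agrees with the M/M/1 result (14.3)
```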
The same reasoning as in (14.22) and (14.23) applies to the number of Poisson arrivals A_T during the system time T of a packet,

    Pr[A_T = j] = ∫_0^∞ e^{−λt} ((λt)^j/j!) dF_T(t) = ((−λ)^j/j!) (d^j φ_T(z)/dz^j)|_{z=λ}

with probability generating function

    A_T(z) = Σ_{j=0}^∞ Pr[A_T = j] z^j = φ_T(λ − λz)

where φ_T(z) is the Laplace transform of the system time T. Since the number of Poisson arrivals A_T during the system time T of a packet in steady state equals the number of packets left behind by that packet, Pr[A_T = j] = Pr[N_{S;d} = j]. The PASTA property (Theorem 13.5.1) states that, in steady state, the observed number of packets in the queue at departure or arrival times is equal in distribution to the actual number of packets in the queue, or Pr[N_{S;d} = j] = Pr[N_S = j]. By considering the pgfs of both sides, A_T(z) = S(z), such that with (14.26),

    φ_T(λ − λz) = (1 − ρ)(z − 1) φ_x(λ − λz)/(z − φ_x(λ − λz))

After a change of variable s = λ − λz, we end up with the result that the Laplace transform of the total system time in steady state is a function of the Laplace transform of the service time,

    φ_T(s) = (1 − ρ) s φ_x(s)/(s − λ + λφ_x(s))    (14.29)
The Laplace transform of the waiting time w = T − x follows as φ_w(s) = φ_T(s)/φ_x(s), or

    φ_w(s) = (1 − ρ)/(1 − ρ φ_{r_x}(s))    (14.30)

where φ_{r_x}(s) = (1 − φ_x(s))/(sE[x]) is the Laplace transform of the residual service time (Section 8.3). It shows that the dominant tail behavior (see Section 5.7) arises from the pole at φ_{r_x}(s) = 1/ρ. By formal expansion into a geometric series (only valid for |ρ φ_{r_x}(s)| < 1),

    φ_w(s) = (1 − ρ) Σ_{k=0}^∞ ρ^k φ_{r_x}^k(s)

The pdf f_w(t) of the waiting time in the queue can be interpreted as a sum of k-fold convolved residual service time pdfs f_{r_x}^{(k*)}(t), weighted by (1 − ρ)ρ^k = Pr[N_S = k], the steady-state probability of the system content in the M/M/1 system (14.1).
With X_n = m cells served per time slot, X_n(z) = E[z^{X_n}] = z^m. Substituting (13.16) leads to

    S_{n+1}(z) = E[z^{(S_n − m)^+ + A_n}]    (14.31)

At this point, a further general evaluation of the expression (14.31) is only possible by assuming independence between the random variables A_n and S_n. From (13.18), it then follows that S_{n+1}(z) = Q_n(z) A_n(z), where Q_n(z) is the pgf of (S_n − m)^+. This crucial assumption facilitates the analysis considerably. For,

    S_{n+1}(z) = E[z^{(S_n−m)^+} z^{A_n}] = E[z^{(S_n−m)^+}] E[z^{A_n}]    (by independence)
               = A_n(z) Σ_{j=0}^∞ Pr[(S_n − m)^+ = j] z^j

Now,

    Σ_{j=0}^∞ Pr[(S_n − m)^+ = j] z^j = Pr[(S_n − m)^+ = 0] + Σ_{j=1}^∞ Pr[S_n = m + j] z^j
        = Σ_{j=0}^{m} Pr[S_n = j] + Σ_{j=m+1}^∞ Pr[S_n = j] z^{j−m}
        = z^{−m} ( S_n(z) − Σ_{j=0}^{m} Pr[S_n = j](z^j − z^m) )

so that

    S_{n+1}(z) = A_n(z) z^{−m} ( S_n(z) − Σ_{j=0}^{m} Pr[S_n = j](z^j − z^m) )    (14.32)

In the single-server case m = 1, where precisely one cell is served per time slot (provided the queue is not empty), equation (14.32) simplifies with S_n(0) = Pr[S_n = 0] to

    S_{n+1}(z) = A_n(z) z^{−1} ( S_n(z) − Pr[S_n = 0](1 − z) )
               = A_n(z) ( (S_n(z) − S_n(0))/z + S_n(0) )    (14.33)
291
l
|D(})| = 1 + %h + r(%) = (1 + % cos + r(%))2 + (% sin + r(%))2
m=0
which determine the unknown probabilities Pr[S = m]. Since s(}) is a poly5
For any probability generating function *J (}) it holds for |}| $ 1 that
[
[
"
"
[
"
m
|*J (})| =
Pr [J = m] } $
Pr [J = m] } m $
Pr [J = m] = 1
m=0
m=0
m=0
292
Queueing models
|z| = 1
p31
Y
(} q )
q=1
}<1 } p
s(})
=1
D(})
p31
Y
q=1
1
=
(1 q ) lim
p31
}<1 p}
D0 (})
Qp31
(1 q )
p
q=1
Finally, we arrive at the generating function of the buer content via that
of the system content V(}) = D(})T(}),
T(}) =
p31
(p )(} 1) Y } q
} p D(})
1 q
(14.34)
q=1
(} 1) D(})
V(}) = 1 D0 (1)
} D(})
}1
T(}) = (1 )
} D(})
(14.35)
(14.36)
293
D00 (1)
2 (1 )
(14.37)
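Relation (14.37) can be verified numerically by iterating the slot recursion S_{n+1} = (S_n − 1)^+ + A_n on the probability distribution itself (exact convolution, so no simulation noise). The sketch below (illustrative, not from the text) uses Poisson arrivals with λ = 0.5, for which A''(1) = λ² and (14.37) predicts E[S] = ρ + λ²/(2(1 − ρ)) = 0.75; truncation at 80 states is ample at this load.

```python
from math import exp, factorial

lam, SIZE = 0.5, 80
a = [exp(-lam) * lam**j / factorial(j) for j in range(SIZE)]  # Poisson arrival pmf

p = [1.0] + [0.0] * (SIZE - 1)          # start empty: Pr[S_0 = 0] = 1
for _ in range(2000):                    # iterate S_{n+1} = (S_n - 1)^+ + A_n exactly
    shifted = [p[0] + p[1]] + p[2:] + [0.0]                   # law of (S_n - 1)^+
    p = [sum(shifted[i] * a[j - i] for i in range(j + 1)) for j in range(SIZE)]

mean = sum(j * pj for j, pj in enumerate(p))
expected = lam + lam**2 / (2 * (1 - lam))                     # (14.37)
assert abs(mean - expected) < 1e-6
```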
The system time T of an arriving test cell is

    T = ⌊((S − m)^+ + C)/m⌋ + 1

where ⌊x⌋ denotes the largest integer smaller than or equal to x, and C is the number of cells that arrive in the same slot as the test cell but are served before it. Indeed, (S − m)^+ + C is the number of packets in the system just before the arrival of the test packet. At the beginning of a time slot, at most m packets are served, which explains the integer division. The service time takes precisely one additional time slot. Let us simplify the notation by defining R = (S − m)^+ + C. From this expression for the system time we deduce, for each integer k ≥ 1 (the minimal system time equals one time slot), that

    Pr[T = k] = Σ_{j=0}^{m−1} Pr[R = (k − 1)m + j]

with generating function

    T(z) = Σ_{k=1}^∞ Pr[T = k] z^k = Σ_{k=1}^∞ Σ_{j=0}^{m−1} Pr[R = (k − 1)m + j] z^k
         = z Σ_{j=0}^{m−1} Σ_{k=0}^∞ Pr[R = km + j] z^k
Also, evaluating the generating function at z^m,

    T(z^m) = z^m Σ_{j=0}^{m−1} z^{−j} Σ_{k=0}^∞ Pr[R = km + j] z^{km+j}
           = z^m Σ_{j=0}^{m−1} z^{−j} Σ_{k=0}^∞ Σ_{q=0}^∞ Pr[R = q] z^q 1_{q=km+j}

The indicator function is expressed with the m-th roots of unity,

    Σ_{k=−∞}^∞ 1_{q−j=km} = 1_{m | (q−j)} = (1/m) Σ_{k=0}^{m−1} e^{2πik(q−j)/m}

since the geometric sum on the right vanishes unless m divides q − j, in which case every term equals 1. Because z^{−j} z^q e^{2πik(q−j)/m} = (z e^{2πik/m})^q (z e^{2πik/m})^{−j},

    T(z^m) = (z^m/m) Σ_{k=0}^{m−1} R(z e^{2πik/m}) Σ_{j=0}^{m−1} (z e^{2πik/m})^{−j}

where R(z) = Σ_{q=0}^∞ Pr[R = q] z^q. Since (z e^{2πik/m})^{−m} = z^{−m}, the inner geometric sum equals (1 − z^{−m})/(1 − (z e^{2πik/m})^{−1}), so that

    T(z^m) = ((z^m − 1)/m) Σ_{k=0}^{m−1} R(z e^{2πik/m})/(1 − (z e^{2πik/m})^{−1})
It remains to determine the distribution of C. The test cell is more likely to arrive in a slot with many arrivals: the number of arrivals A_T in the slot of the test cell has the size-biased distribution Pr[A_T = j] = j Pr[A = j]/E[A], with proportionality factor 1/E[A] because Σ_{j=0}^∞ Pr[A_T = j] = 1. The test packet is uniformly distributed among the A_T arriving packets in its time slot (in steady state). The probability of having precisely n packets in front of the test packet, given A_T = j, equals

    Pr[C = n | A_T = j] = 1_{n<j}/j

Indeed, the test packet has equal probability 1/j of occupying any of the j possible positions, and occupation of position n + 1 implies precisely n cells in front of the test packet in a FIFO discipline. Using the law of total probability (2.46),

    Pr[C = n] = Σ_{j=1}^∞ Pr[C = n | A_T = j] Pr[A_T = j]
              = Σ_{j=n+1}^∞ (1/j)(j Pr[A = j]/E[A]) = (1/E[A]) Σ_{j=n+1}^∞ Pr[A = j]
The generating function of C follows as

    C(z) = Σ_{n=0}^∞ Pr[C = n] z^n = (1/E[A]) Σ_{n=0}^∞ Σ_{j=n+1}^∞ Pr[A = j] z^n
         = (1/E[A]) Σ_{j=1}^∞ Pr[A = j] Σ_{n=0}^{j−1} z^n = (1/E[A]) Σ_{j=1}^∞ Pr[A = j] (1 − z^j)/(1 − z)
         = (A(z) − 1)/((z − 1) A'(1))

Assuming that C and (S − m)^+ are independent, R(z) = Q(z)(A(z) − 1)/((z − 1)A'(1)), with Q(z) the pgf (14.34), and substitution in the expression for T(z^m) above, with ζ_k = z e^{2πik/m}, gives

    T(z^m) = ((z^m − 1)/(m A'(1))) Σ_{k=0}^{m−1} Q(ζ_k)(A(ζ_k) − 1)/((ζ_k − 1)(1 − ζ_k^{−1}))

For the single-server case (m = 1), the generating function of the system time (queueing time plus service time) considerably simplifies to

    T(z) = ((1 − ρ)/ρ) z (A(z) − 1)/(z − A(z))

from which E[T] and Var[T] readily follow. The computation of the pdf, given the arrival process A(z), is more complex, as illustrated for the M/D/1/K queue in the next section.
Fig. 14.4. On the left, the Poisson input process with λ = 0.8 in terms of the number of arriving cells versus the time slot. In the middle, the buffer occupancy for a buffer with K = 20 as a function of time. On the right, the M/D/1/K output process in cells served per time slot.

For a Poisson process, Pr[A = k] = (λ^k/k!) e^{−λ} and A(z) = e^{λ(z−1)}. The pgf Q(z) of the buffer content immediately follows from (14.36) as

    Q(z) = (1 − λ)(1 − z) e^{λ(1−z)}/(1 − z e^{λ(1−z)})

The average buffer content follows from (14.37), with A''(1) = λ² and E[Q] = E[S] − ρ, as

    E[N_{Q;M/D/1}] = E[Q] = λ²/(2(1 − λ))

and Little's law (13.21) provides the average waiting time in the queue,

    E[w_{M/D/1}] = E[N_{Q;M/D/1}]/λ = λ/(2(1 − λ))    (14.38)
The distribution q[k] = Pr[Q = k] follows by expanding Q(z) in powers of z. With the geometric series in z e^{λ(1−z)},

    Q(z) = (1 − λ)(1 − z) Σ_{n=0}^∞ z^n e^{λ(n+1)(1−z)}

and the Taylor expansion

    e^{λ(n+1)(1−z)} = e^{λ(n+1)} Σ_{q=0}^∞ ((−1)^q (n+1)^q λ^q/q!) z^q

collecting the coefficient of z^k yields

    q[k] = (1 − λ) Σ_{n=1}^{k+1} (−1)^{k+1−n} e^{nλ} ( (nλ)^{k+1−n}/(k+1−n)! + (nλ)^{k−n}/(k−n)! )    (14.39)

where the second term between brackets is absent for n = k + 1; for example, q[0] = (1 − λ)e^λ and q[1] = (1 − λ)(e^{2λ} − e^λ(1 + λ)). One readily observes from the derivation above that the probability s[k] = Pr[S = k] that k positions in the system are occupied is s[k] = q[k − 1] for k ≥ 2, because the cell in service occupies precisely one position, while s[0] = 1 − λ and s[1] = q[0] − (1 − λ) = (1 − λ)(e^λ − 1).
The probabilities (14.39) can be summed. Defining

    g(x; m) = Σ_{n=0}^{m} (−λx)^n (m − n)^n/n!    (14.41)

the distribution function of the buffer content of the M/D/1 queue can be written compactly as

    Pr[Q ≤ K] = Σ_{k=0}^{K} q[k] = (1 − λ) e^{λ(K+1)} g(e^{−λ}; K + 1)    (14.42)

The expressions in (14.39) and (14.42) are numerically only useful for small K because the series are alternating. This problem may be solved by considering a famous result due to Lagrange (Markushevich, 1985, Vol. 2, Chapter 3, Section 14),

    e^{βz} = 1 + β Σ_{q=1}^∞ ((β + qα)^{q−1}/q!) (z e^{−αz})^q    (14.43)

which is the expansion of e^{βz} in powers of w = z e^{−αz}, because de^{βz}/dw = (de^{βz}/dz)(dz/dw) = β e^{βz}/((1 − αz) e^{−αz}). From (14.43) one deduces the identity

    e^{−λm}/(1 − λ) = Σ_{q=0}^∞ ((q − m)λ)^q e^{−λq}/q! = g(e^{−λ}; m) + Σ_{q=m+1}^∞ ((q − m)λ)^q e^{−λq}/q!

and thus

    g(e^{−λ}; m) = e^{−λm}/(1 − λ) − Σ_{q=m+1}^∞ ((q − m)λ)^q e^{−λq}/q!

in which the remaining series has positive terms only. Substituted in (14.42), the overflow probability of the M/D/1 queue becomes

    Pr[Q > K] = (1 − λ) λ^{K+1} Σ_{q=1}^∞ q^{q+K+1} (λe^{−λ})^q/(q + K + 1)!    (14.44)

For large K, an asymptotic evaluation of (14.44) shows that the overflow probability decays geometrically,

    Pr[Q > K] ≈ c η^{−K−1}    (14.45)

where η > 1 is the real zero of z − e^{λ(z−1)} larger than 1, the dominant pole of Q(z), and c the corresponding residue factor (Section 5.7).
Fig. 14.5. Sketch of an ATM concentrator where N input lines are multiplexed onto a single output line. The N*D/D/1 queue models this ATM basic switching unit accurately.
Whereas the arrivals in the M/D/1 queue are uncorrelated, the successive interarrival times in the N*D/D/1 queue are negatively correlated. For the same average arrival rate, this more regular arrival process results in shorter queues than in the M/D/1 queue, where the higher variability in the arrival process causes longer queues.
Due to the dependence of the arrivals over many time slots, the solution method is based on the Beneš approach and starts from the complementary distribution (13.15) of the virtual waiting time or unfinished work in steady state, Pr[v(t_∞) > x] = lim_{t→∞} Pr[v(t) > x]. Applied to the N*D/D/1 queue, the unfinished work equals the number of ATM cells in the system, thus Pr[v(t_∞) > x] = Pr[N_S > x]. Hence, in the steady state for ρ < 1 or N < D,

    Pr[N_S > x] = Σ_{k=⌈x⌉}^∞ Pr[ v(t_∞ + x − k) = 0 | ∫_{t_∞+x−k}^{t_∞} N_A(u) du = k ] Pr[ ∫_{t_∞+x−k}^{t_∞} N_A(u) du = k ]

The periodic cell trains with period equal to D time slots at each input line lead to a periodic aggregate arrival stream of the N input lines, also with period D. Each cell train transports precisely one cell per period D, which allows us to observe the characteristics of the aggregate arrival process during the time interval [0, D). The computations are most conveniently performed if we choose the steady-state observation point t_∞ = D. Each of the N ATM cells arrives uniformly in [0, D] due to the random phasing of the cell trains, and the probability that a particular cell arrives in [D + x − k, D] is p = (k − x)/D. Hence, the number of arrivals in [D + x − k, D] is a sum of Bernoulli random variables, which is binomially distributed,

    Pr[ ∫_{D+x−k}^{D} N_A(u) du = k ] = C(N, k) ((k − x)/D)^k (1 − (k − x)/D)^{N−k}

The conditional probability is obtained as follows. The unfinished work at time D + x − k only depends on past arrivals in the interval [0, D + x − k]. Given that the number of arrivals in [D + x − k, D] equals k, while there are always precisely N in [0, D], the number of arrivals in [0, D + x − k] equals N − k, and the corresponding traffic intensity ρ' = (N − k)/(D + x − k) < 1, since N < D and thus N − k < D − k for any k. From Section 13.3.2, we use the local stationarity result: for any stationary single-server queueing system with traffic intensity ρ, the probability of an empty system at an arbitrary time is 1 − ρ. If we take a random point t ∈ [0, D + x − k], then stationarity implies that Pr[v(t) = 0] = 1 − ρ' = (D + x − N)/(D + x − k).
Combining both factors yields

    Pr[N_S > x] = Σ_{k=⌈x⌉}^{N} C(N, k) ((k − x)/D)^k (1 − (k − x)/D)^{N−k} (D + x − N)/(D + x − k)    (14.46)

Substituting j = N − k,

    Pr[N_S > x] = ((D + x − N)/D^N) Σ_{j=0}^{N−⌈x⌉} C(N, j) (N − j − x)^{N−j} (D + x − N + j)^{j−1}

Applying Abel's identity (Comtet, 1974, p. 128), valid for all u, y, w,

    (u + y)^n = u Σ_{k=0}^{n} C(n, k) (u − kw)^{k−1} (y + kw)^{n−k}    (14.47)

whose full sum over 0 ≤ j ≤ N equals D^N/(D + x − N), gives the complementary form

    Pr[N_S > x] = 1 − ((D + x − N)/D^N) Σ_{j=N−⌈x⌉+1}^{N} C(N, j) (D + x − N + j)^{j−1} (N − j − x)^{N−j}    (14.48)

demonstrating that, indeed, Pr[N_S ≥ 0] = 1. For small x, relation (14.48) is convenient, while (14.46) is more suited for large x → N. For example,

    Pr[N_S > 1] = 1 − ((D + 1 − N)/(D + 1)) (1 + 1/D)^N,  while  Pr[N_S > N − 1] = D^{−N}
An accurate approximation of (14.46), obtained by approximating the arrival process by a Brownian motion, is

    Pr[N_S > x] ≈ e^{−2x(x/N + 1 − ρ)}    (14.49)

Figure 14.6 compares the exact overflow probability (14.46) and the Brownian approximation (14.49) for ρ = 0.95. Observe with (14.45) that

    Pr[N_S > x] ≈ e^{−2x²/N} Pr[N_{M/D/1} > x]

which shows that, for sufficiently high N, the overflow probability of the N*D/D/1 queue tends to that of the M/D/1 queue. Thus, an arrival process consisting of a superposition of a large number of periodic processes tends to a Poisson arrival process. The decaying factor e^{−2x²/N} reflects the effect of the negative correlations in the arrival process and shows that a Poisson process overestimates the tail probability in heavy traffic. Comparison of (14.46) and (14.44) for lower loads ρ = N/D illustrates that the Poisson approximation becomes more accurate.
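The binomial sum (14.46) and the approximation (14.49) are straightforward to evaluate; the sketch below is an illustrative implementation (not from the text; the sum runs over k > x, and the case N = 1, D = 2 checks Pr[N_S > 0] = ρ, the probability that the server is busy).

```python
from math import comb, exp

def ndd1_overflow(N, D, x):
    """Exact overflow probability Pr[N_S > x] (14.46) of the N*D/D/1 queue, N < D."""
    total = 0.0
    for k in range(int(x) + 1, N + 1):   # terms with k > x
        p = (k - x) / D                   # per-source probability of arriving in [D+x-k, D]
        total += (comb(N, k) * p**k * (1 - p)**(N - k)
                  * (D + x - N) / (D + x - k))
    return total

def ndd1_brownian(N, D, x):
    """Brownian approximation (14.49): exp(-2x(x/N + 1 - rho)) with rho = N/D."""
    return exp(-2 * x * (x / N + 1 - N / D))

assert abs(ndd1_overflow(1, 2, 0) - 0.5) < 1e-12   # Pr[N_S > 0] = rho = 1/2
```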
Fig. 14.6. The overflow probability Pr[N_S > x] in the N*D/D/1 queue for ρ = 0.95 and various numbers of sources N (N = 50, 100, 200, 500, 1000 and 5000), together with the Brownian approximation and the M/D/1 limit N → ∞.
In the AMS fluid model of Anick et al. (1982), N on-off sources feed a buffer that is emptied at a constant output rate c, where [c] denotes the integer part of c. Each source independently switches from off to on at rate λ and from on to off at rate 1, and generates fluid at unit rate while on. Let Φ_j(x) = Pr[v ≤ x, j sources on] denote the stationary probability that the buffer content v does not exceed x while j sources are active. The vector Φ(x) = (Φ_0(x), ..., Φ_N(x))^T obeys the linear differential equation

    dΦ(x)/dx = D^{−1} M Φ(x)    (14.51)

where D = diag(j − c) contains the net input rates (drifts) in the states j = 0, 1, ..., N, and M is the tri-band diagonal rate matrix of the birth and death process of the number of active sources, with entries

    M_{j,j−1} = (N − j + 1)λ,  M_{jj} = −((N − j)λ + j),  M_{j,j+1} = j + 1

The general solution of (14.51) is a linear combination of exponential modes,

    Φ(x) = Σ_{j=0}^{N} a_j e^{ζ_j x} x_j

where ζ_j are the eigenvalues of D^{−1}M with right-eigenvectors x_j and where, as shown in Appendix A.5.2.2, the eigenvalues are labeled in increasing order ζ_{N−[c]−1} < ... < ζ_1 < ζ_0 < ζ_N = 0 < ζ_{N−1} < ... < ζ_{N−[c]}. This way of writing distinguishes between underload and overload eigenvalues. Only bounded solutions are allowed. As shown in Appendix A.5.2.2, there are precisely N − [c] − 1 negative real eigenvalues, ζ_j with j ∈ [0, N − [c] − 1]. In addition, j = N corresponds to the eigenvalue ζ_N = 0 and the Φ(∞) eigenvector. The general bounded solution of (14.51) is

    Φ(x) = Φ(∞) + Σ_{j=0}^{N−[c]−1} a_j e^{ζ_j x} x_j    (14.53)

where the scalar coefficients a_j = y_j^T Φ(0) still need to be determined. Rather than determining y_j^T Φ(0) as in Appendix A.5.2.2, a more elegant, physical method is used. The eigenvalue solution in Appendix A.5.2.2 has scaled the eigenvectors by setting the N-th component equal to 1, hence (x_j)_N = 1. Writing the N-th component of (14.53) gives, with the binomial steady-state distribution of the number of active sources,

    Φ_j(∞) = C(N, j) λ^j/(1 + λ)^N    (14.52)

the relation

    Φ_N(x) = λ^N/(1 + λ)^N + Σ_{j=0}^{N−[c]−1} a_j e^{ζ_j x}    (14.54)

Since the buffer cannot be empty while the input rate exceeds the output rate, Φ_j(0) = 0 for [c] + 1 ≤ j ≤ N. In particular, Φ_N(0) = 0 in (14.54) yields

    Σ_{j=0}^{N−[c]−1} a_j = −λ^N/(1 + λ)^N

and shows that N − [c] − 1 additional equations are needed to determine all coefficients a_j. By differentiating (14.54) p times and evaluating at x = 0, we find these additional equations,

    d^p Φ_N(x)/dx^p |_{x=0} = Σ_{j=0}^{N−[c]−1} a_j ζ_j^p
which will be determined with the help of the differential equation (14.51). Indeed, for p = 1, the differential equation (14.51) gives

    dΦ(x)/dx |_{x=0} = D^{−1} M Φ(0)

The important observation is that the effect of multiplication by D^{−1}M, which is tri-band diagonal, decreases the number of zero components in Φ(0) by one. Since d^pΦ(x)/dx^p = (D^{−1}M)^p Φ(x), we thus find, for 0 ≤ p ≤ N − [c] − 1, that

    d^p Φ_N(x)/dx^p |_{x=0} = 0
Combined with Φ_N(0) = 0, the coefficients a_j therefore satisfy

    Σ_{j=0}^{N−[c]−1} a_j = −λ^N/(1 + λ)^N  and  Σ_{j=0}^{N−[c]−1} a_j ζ_j^p = 0  for 1 ≤ p ≤ N − [c] − 1

or, in matrix form, V a = b with the Vandermonde matrix V_{pj} = ζ_j^p, 0 ≤ p, j ≤ N − [c] − 1, and b = (−λ^N/(1 + λ)^N, 0, ..., 0)^T, with determinant

    det(V) = Π_{i=0}^{N−[c]−2} Π_{j=i+1}^{N−[c]−1} (ζ_j − ζ_i)

Since all eigenvalues appearing in the Vandermonde matrix are distinct (Appendix A.5.2.2), det(V) ≠ 0 and a unique solution follows, for all 0 ≤ j ≤ N − [c] − 1, from Cramer's rule as

    a_j = −(λ/(1 + λ))^N Π_{i=0, i≠j}^{N−[c]−1} ζ_i/(ζ_i − ζ_j)    (14.55)

Together with the exact determination of the eigenvalues ζ_j and corresponding right-eigenvectors x_j explicitly given in Appendix A.5.2.2, the coefficients a_j completely solve the AMS queue.
The buffer overflow probability Pr[N_S > x] = 1 − Σ_{j=0}^{N} Φ_j(x) becomes, with Σ_{j=0}^{N} Φ_j(∞) = 1,

    Pr[N_S > x] = −Σ_{j=0}^{N−[c]−1} a_j e^{ζ_j x} Σ_{l=0}^{N} (x_j)_l    (14.56)

Using the explicit form of the generating function (A.44), where the roots r_1 and r_2 belonging to eigenvalue ζ_k are specified in (A.42) and the residue in (A.43), the component sums Σ_l (x_j)_l can be evaluated in closed form. For large x, only the largest negative eigenvalue ζ_0 contributes. Writing that eigenvalue (A.47) in terms of the traffic intensity ρ gives

    ζ_0 = −(1 + λ)(1 − ρ)/(1 − c/N)

From (A.49), we have Σ_{l=0}^{N} (x_0)_l = N/c. Combined with (14.55), the asymptotic formula for the buffer overflow probability becomes

    Pr[N_S > x] ≈ (N/c)(λ/(1 + λ))^N e^{ζ_0 x} Π_{i=1}^{N−[c]−1} ζ_i/(ζ_i − ζ_0)    (14.57)
Fig. 14.7. The overflow probability (14.56) in the AMS queue versus the buffer level x for fixed λ = 1/2. For each traffic intensity ρ = 0.5, 0.7 and 0.9, the upper curve corresponds to N = 40 and the lower to N = 100. The asymptotic formula (14.57) is shown as a dotted line.
Figure 14.7 shows both the exact (14.56) and the asymptotic (14.57) overflow probability.

The cell loss ratio of a finite system is the ratio of the average number of lost cells per time slot and the average number A'(1) of arriving cells per time slot,

    closs = E[number of lost cells per slot]/A'(1)    (14.58)

In the combinatorial view, only the arrival process is viewed from a position in the buffer and the number of ways in which cells are lost is counted, leading to

    closs = (1/A'(1)) Σ_{q=0}^∞ q Σ_{j=0}^{K} t[K − j] a[j + q]    (14.59)

where a[j] denotes the probability of j arrivals in a slot and t[j] the probability that j of the K buffer positions are occupied.
Since differentiation of a generating function at z = 1 produces the factor q, (14.59) can be written as

    closs A'(1) = dV(z)/dz |_{z=1}    (14.60)

where

    V(z) = Σ_{q=0}^∞ z^q Σ_{j=0}^{K} t[K − j] a[j + q] = Σ_{j=0}^{K} t[K − j] z^{−j} Σ_{q=0}^∞ a[j + q] z^{j+q}

Rearranging in terms of the generating function A(z) of the arrivals and of the buffer occupancy Q(z) = Σ_{j=0}^{K} t[j] z^j, where t[j] = 0 for j > K, yields

    V(z) = Σ_{j=0}^{K} t[K − j] z^{−j} ( A(z) − Σ_{q=0}^{j−1} a[q] z^q )
         = z^{−K} A(z) Σ_{j=0}^{K} t[K − j] z^{K−j} − Σ_{j=0}^{K} t[K − j] z^{−j} Σ_{q=0}^{j−1} a[q] z^q
         = z^{−K} A(z) Q(z) − z^{−K} Σ_{j=0}^{K} t[j] z^j Σ_{q=0}^{K−j−1} a[q] z^q    (14.61)
In order to express the cell loss ratio entirely in terms of the generating functions A(z) and Q(z), we employ (2.20),

    Σ_{j=0}^{q} y[j] z^j = (1/(2πi)) ∮_C Y(ω) ( Σ_{j=0}^{q} z^j/ω^{j+1} ) dω = (1/(2πi)) ∮_C (Y(ω)/(ω − z)) (1 − (z/ω)^{q+1}) dω
                         = Y(z) − (1/(2πi)) ∮_C (Y(ω)/(ω − z)) (z/ω)^{q+1} dω    (14.62)

where C is a contour enclosing the origin and the point z and lying within the convergence region of Y(z). Since the double sum in (14.61) equals the partial sum up to degree K − 1 of the coefficients of A(z)Q(z), combining (14.61) and (14.62) gives

    V(z) = z^{−K} A(z) Q(z) − z^{−K} A(z) Q(z) + (1/(2πi)) ∮_C (A(ω) Q(ω)/((ω − z) ω^K)) dω
         = (1/(2πi)) ∮_C (A(ω) Q(ω)/((ω − z) ω^K)) dω

Finally, with (14.60), our expression for the cell loss ratio in a GI/G/1/K system reads

    closs = (1/(2πi A'(1))) ∮_C (A(ω) Q(ω)/((ω − 1)² ω^K)) dω    (14.63)

where the contour C encloses both the origin and the point z = 1 and lies in the convergence region of A(z). Usually, A(z) is known, while Q(z) proves to be more complicated to obtain. The product Q(z)A(z) = S(z) is the pgf of the system content.
If Q(z) and A(z) are meromorphic functions and if

$$\lim_{z\to\infty}\frac{A(z)\,Q(z)}{(z-1)^{2}\,z^{K-1}} = 0,$$

the contour C in (14.63) can be closed over the |ω| > 1 plane to get

$$\mathrm{clr} = -\frac{1}{A'(1)}\sum_{p}\operatorname{Res}_{\omega\to p}\frac{A(\omega)Q(\omega)}{\omega^{K}(\omega-1)^{2}} \tag{14.64}$$

where p are the poles of A(z)Q(z) outside the unit circle. If these conditions are met, a non-trivial evaluation of the cell loss ratio can be obtained. In case the buffer pgf of the finite system is known, Q(z) is a polynomial of degree at most K, so that the only pole of Q(z)/z^K is zero and lim_{z→∞} Q(z)/z^K = q(K) ≤ 1, and the above conditions simplify to lim_{z→∞} A(z)/(z(z−1)²) = 0. Executing (14.63) then leads to

$$\mathrm{clr} = -\frac{1}{A'(1)}\sum_{p}\frac{Q(p)}{p^{K}(p-1)^{2}}\operatorname{Res}_{\omega\to p}A(\omega) \tag{14.65}$$
where only the poles p of the arrival pgf A(z) play a role. For example, if the number of arrivals has a geometric distribution a[k] = (1−α)α^k with 0 ≤ α < 1 and generating function (3.6), A_geo(z) = (1−α)/(1−αz), then the conditions for (14.65) are satisfied and we obtain

$$\mathrm{clr}_{\mathrm{geo}} = \frac{1-\rho}{\rho}\,\frac{\Pr[Q > K-1]}{1-\Pr[Q > K-1]} \tag{14.66}$$

where, as usual, the traffic intensity ρ = λ/μ and Pr[Q > K−1] is the overflow probability in the corresponding infinite M/G/1 system. Transforming (14.66) to discrete time using clr_discr = ρ clr_cont/(1 + ρ clr_cont) yields

$$\mathrm{clr}_{\mathrm{M/G/1/K;discr}} = \frac{(1-\rho)\Pr[Q > K-1]}{1-\rho\Pr[Q > K-1]} \tag{14.67}$$
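A quick consistency check of the loss formula above is possible in a case where everything is known exactly: for M/M/1, the overflow probability of the infinite queue is Pr[Q > K−1] = ρ^K, and the finite M/M/1/K queue has the well-known blocking probability (1−ρ)ρ^K/(1−ρ^{K+1}). The sketch below (helper names are ours, not the book's) verifies that the discrete-time loss formula reproduces this exact result:

```python
# Check: the discrete-time loss formula with the M/M/1 overflow probability
# Pr[Q > K-1] = rho**K reproduces the exact M/M/1/K blocking probability
#   P_B = (1 - rho) * rho**K / (1 - rho**(K+1)).

def clr_from_overflow(rho: float, overflow: float) -> float:
    """Cell loss ratio from the infinite-system overflow probability."""
    return (1 - rho) * overflow / (1 - rho * overflow)

def mm1k_blocking(rho: float, K: int) -> float:
    """Exact blocking probability of the M/M/1/K queue (rho != 1)."""
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

for rho in (0.3, 0.7, 0.95):
    for K in (2, 5, 20):
        approx = clr_from_overflow(rho, rho**K)   # Pr[Q > K-1] = rho**K in M/M/1
        assert abs(approx - mm1k_blocking(rho, K)) < 1e-12
```

Algebraically the two expressions coincide: (1−ρ)ρ^K/(1−ρ·ρ^K) = (1−ρ)ρ^K/(1−ρ^{K+1}).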
14.9 Problems

(i) A router processes data packets 80% of the time. On average 3.2 packets are waiting for service. What is the mean waiting time of a packet, given that the mean processing time equals 1/μ?
(ii) Compute in an M/M/m/m queue the average number of busy servers.
(iii) Let us model a router by an M/M/1 system with average service time equal to 0.5 s.
(a) What is the relation between the average response time (average system time) and the arrival rate λ?
(b) How many jobs/s can be processed for a given average response
time of 2.5 s?
(c) What is the increase in average response time if the arrival
rate increases by 10%?
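For problem (iii), the standard M/M/1 relation T = 1/(μ − λ) for the mean response time answers all three sub-questions; a minimal sketch (helper names are ours, with μ = 2 jobs/s from the 0.5 s mean service time):

```python
# M/M/1 mean response (system) time: T = 1/(mu - lam), valid for lam < mu.
mu = 1 / 0.5          # service rate: 2 jobs/s for a 0.5 s mean service time

def response_time(lam: float) -> float:
    assert lam < mu, "queue is unstable for lam >= mu"
    return 1 / (mu - lam)

# (b) arrival rate that yields T = 2.5 s: T = 1/(mu - lam)  =>  lam = mu - 1/T
lam_b = mu - 1 / 2.5
assert abs(response_time(lam_b) - 2.5) < 1e-12   # lam_b = 1.6 jobs/s

# (c) relative increase in T when the arrival rate grows by 10%
lam = 1.0
increase = response_time(1.1 * lam) / response_time(lam) - 1
```

For λ = 1 job/s, the 10% increase in arrival rate raises the response time by a factor 1/0.9, i.e. about 11%.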
(iv) Assume that a company has a call center with two phone lines for service. During some measurements it was observed that both lines are busy 10% of the time. On the other hand, the average call holding time was 10 minutes. Calculate the call blocking probability in the case that the average call holding time increases from 10 minutes to 15 minutes. Call arrivals are Poissonian with constant rate.
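Problem (iv) is an Erlang B computation: with m = 2 lines, the fraction of time both lines are busy equals the blocking probability, from which the offered load can be solved; a sketch under these assumptions (function names are ours):

```python
from math import sqrt

def erlang_b(m: int, A: float) -> float:
    """Erlang B blocking for m lines and offered load A (erlang), via the
    recursion B(0,A) = 1, B(m,A) = A*B(m-1,A)/(m + A*B(m-1,A))."""
    B = 1.0
    for k in range(1, m + 1):
        B = A * B / (k + A * B)
    return B

# "Both of the 2 lines are busy 10% of the time": in the Erlang loss model
# B(2, A) = (A**2/2) / (1 + A + A**2/2) = 0.1, i.e. 0.45*A**2 - 0.1*A - 0.1 = 0.
A = (0.1 + sqrt(0.1**2 + 4 * 0.45 * 0.1)) / (2 * 0.45)   # ~0.595 erlang
assert abs(erlang_b(2, A) - 0.1) < 1e-12

# Holding time 10 -> 15 min raises the offered load A = lambda/mu by 3/2.
blocking_15 = erlang_b(2, 1.5 * A)
```

With the load scaled by 3/2, the blocking probability rises from 0.10 to roughly 0.17.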
(v) Consider a queueing network with Poisson arrivals consisting of two infinitely long single-server queues in tandem with exponential service times. We assume that the service times of a customer at the first and second queue are mutually independent as well as independent of the arrival process. Let the rate of the Poisson arrival process be λ, and let the mean service rates at queues 1 and 2 be μ1 and μ2, respectively. Moreover, assume that λ < μ1 and λ < μ2. Give the probability that in steady state there are n customers at queue 1 and m customers at queue 2.
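By Burke's theorem the departure process of the first M/M/1 queue is again Poisson(λ), so the tandem has the product form π(n, m) = (1−ρ1)ρ1^n (1−ρ2)ρ2^m with ρi = λ/μi; a sketch of this claim (function name ours):

```python
# Product-form steady state of the two-node tandem of problem (v):
#   pi(n, m) = (1 - rho1) * rho1**n * (1 - rho2) * rho2**m,  rho_i = lam/mu_i,
# valid for lam < mu1 and lam < mu2 (Burke's theorem / Jackson network).

def tandem_pi(n: int, m: int, lam: float, mu1: float, mu2: float) -> float:
    rho1, rho2 = lam / mu1, lam / mu2
    return (1 - rho1) * rho1**n * (1 - rho2) * rho2**m

# the probabilities over a large truncated grid should sum to ~1
total = sum(tandem_pi(n, m, 1.0, 2.0, 3.0) for n in range(200) for m in range(200))
assert abs(total - 1.0) < 1e-10
```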
(vi) Let us consider the following simple design question: which queue of the M/M/m family is most suitable if the arrival rate is λ and the required service rate is kμ, with k > 1? We have the three options illustrated in Fig. 14.8 at our disposal.

Fig. 14.8. Three different options: (A) one M/M/1 queue with service rate kμ, (B) k M/M/1 queues, each with arrival rate λ/k and service rate μ, and (C) one M/M/k queue with service rate μ per server.

Since all queues have infinite buffers and the same traffic intensity ρ = λ/(kμ), they have the same throughput. The QoS qualifier of interest here is the delay.
When a line has been assigned to a customer, this customer is transferred from the still-demanding subgroup to the served group. The number of call attempts decreases with the size of the served group, whose members each occupy one line. More precisely, the arrival rate in the Engset model is proportional to the size of the still-demanding subgroup, and the interarrival times are exponentially distributed. The holding time of a line is also exponentially distributed with mean 1/μ.

(a) Describe the M/M/m/m/s queue as a birth-death process.
(b) Compute the steady state.
(c) Compute the blocking probability (similar to the blocking in the Erlang model).
(xii) Compare the cell loss ratio of the M/M/1/K queue and of the discrete-time M/D/1/K queue using the dominant pole approximation of Section 5.7. Hint: approximate the cell loss ratio by the overflow probability.
Part III
Physics of networks
15
General characteristics of graphs
15.1 Introduction
Network topologies as drawn in Fig. 15.1 are examples of graphs. A graph
J is a data structure consisting of a set of Y vertices connected by a set
of H edges. In stochastic graph theory and communications networking,
the vertices and edges are called nodes and links, respectively. In order
to dierentiate between the expectation operator H [=], the set of links is
denoted by L and the number of links by O and similarly, the set of nodes
by N and number of nodes by Q . Thus, the usual notation of a graph
J (Y> H) in graph theory is here denoted by J (Q> O).
The full mesh or complete graph NQ consists of Q nodes and O = Omax =
Q(Q31)
links, where every node has a link to every other node. The graph
2
that is generated by the statement any l is directly connected to any m in
319
320
full mesh
(complete graph)
star
ring
2D (square) lattice
Tree (connected,
loopless graph)
a population of Q members,
is a complete graph NQ . Since in NQ the number of links Omax = R Q 2 for large Q , it demonstrates Metcalfes law:
the value of networking increases quadratically in the number of connected
members.
The interconnection pattern of a network with N nodes can be represented by an adjacency matrix A consisting of elements a_ij that are either one or zero depending on whether there is a link between nodes i and j or not. The adjacency matrix is a real, symmetric N × N matrix when we assume bi-directional transport over links: if there is a link from i to j (a_ij = 1), then there is a link from j to i (a_ji = 1) for any j ≠ i. Moreover, we exclude self-loops (a_jj = 0) and multiple links between two nodes i and j. More properties of the adjacency matrix of a graph are found in Appendix B.
A walk from node A to node B with k − 1 hops or links is the node list W_{A→B} = n1 → n2 → ⋯ → n_{k−1} → n_k, where n1 = A and n_k = B. A path from node A to node B with k − 1 hops or links is the node list P_{A→B} = n1 → n2 → ⋯ → n_{k−1} → n_k, where n1 = A and n_k = B and where n_j ≠ n_i for each pair of indices i and j. Sometimes the shorter notation P_{A→B} = n1 n2 ⋯ n_{k−1} n_k is used. All links n_i → n_j and all nodes n_j in the path P_{A→B} are different, whereas in a walk W_{A→B} no restriction is put on the node list. If the starting node A equals the destination node B, that path P_{A→A} is called a cycle or loop. In telecommunications networks, paths, and not walks, are the basic entities in connecting two communicating parties. Two paths between A and B are node- (link-) disjoint if they have no nodes (links) in common.

Apart from the topological structure specified via the adjacency matrix A, the link between nodes i and j is further characterized by a link weight w(i→j), most often a positive real number¹ that reflects the
(15.1)

where 1_x is the indicator function. The number of paths with one hop equals X_1(A→B; N) = 1_{A→B}. The maximum number of j-hop paths is attained in the complete graph K_N, where 1_{k1→k2} = 1 for each link k1 → k2, and equals

$$\max(X_j(A\to B;N)) = \frac{(N-2)!}{(N-j-1)!} \tag{15.2}$$

The maximum number of hops in any path is N − 1. This maximum occurs, for example, in a line graph, where the path runs from the one extreme node to the other, or in a ring (see Fig. 15.1) between neighboring nodes, where there is a one-hop and an (N−1)-hop path.
The total number of paths P_N between two nodes in the complete graph is

$$P_N = \sum_{j=1}^{N-1}\max(X_j(A\to B;N)) = \sum_{j=1}^{N-1}\frac{(N-2)!}{(N-j-1)!} = (N-2)!\sum_{k=0}^{N-2}\frac{1}{k!} = (N-2)!\,e - R$$

¹ In Cisco's OSPF implementation, it is suggested to use w(i→j) = 10^8/B(i→j), where B(i→j) denotes the capacity (in bit/s) of the link between nodes i and j. An approach to optimize the OSPF weights to reflect actual traffic loads is presented by Fortz and Thorup (2000).
where

$$R = (N-2)!\sum_{j=N-1}^{\infty}\frac{1}{j!} = \sum_{j=0}^{\infty}\frac{(N-2)!}{(N-1+j)!} = \frac{1}{N-1}+\frac{1}{(N-1)N}+\frac{1}{(N-1)N(N+1)}+\cdots < \sum_{j=1}^{\infty}\left(\frac{1}{N-1}\right)^{j} = \frac{1}{N-2} \tag{15.3}$$

where e = 2.718281… and [x] denotes the largest integer smaller than or equal to x. Since any graph is a subgraph of the complete graph, the maximum total number of paths between two nodes in any graph is upper bounded by [e(N−2)!].
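Since R < 1/(N−2) < 1 for N ≥ 3, the bound above pins P_N down to an integer; a quick check, for small N, that P_N = Σ_j (N−2)!/(N−j−1)! indeed equals ⌊e(N−2)!⌋ (sketch in Python, names ours):

```python
from math import factorial, floor, e

def total_paths(N: int) -> int:
    """P_N: total number of paths between two nodes of the complete graph K_N."""
    return sum(factorial(N - 2) // factorial(N - j - 1) for j in range(1, N))

for N in range(3, 12):
    assert total_paths(N) == floor(e * factorial(N - 2))
```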
$$\sum_{j=1}^{N} d_j = 2L,$$

since each link belongs to precisely two nodes and, hence, is counted twice. In directed graphs, the in- (out-) degree is defined as the number of in- (out-) going links at a node, while the sum of in- and out-degree equals the degree. The minimum nodal degree in the graph G is denoted by d_min = min_{j∈G} d_j. The average degree of a graph is defined as d_a = (1/N) Σ_{j=1}^{N} d_j = 2L/N, which, for a connected graph, is bounded by 2 − 2/N ≤ d_a ≤ N − 1. The lower bound is obtained for any spanning tree, a graph that connects all nodes, contains no cycles and has L = L_min = N − 1 links. The upper bound is reached in the complete graph K_N with L_max = N(N−1)/2. Graphs where d_min = d_a, such as K_N and the ring topology in Fig. 15.1, are called regular graphs, since any node has precisely d_a links.
Sometimes networks are classified either as dense, if d_a is high, or as sparse, if d_a is low.
Fig. 15.2. Degree graph with τ = 2.4 and N = 300. All nodes are drawn on a circle.

$$\Pr[d_j = k] = \frac{k^{-\tau}}{\zeta(\tau)} \tag{15.4}$$

with³ τ ∈ (2.2, 2.5), where ζ(s) = Σ_{k=1}^{∞} k^{−s} for Re(s) > 1 is the Riemann zeta function (Titchmarsh and Heath-Brown, 1986). A graph of this class is called a degree graph. Figures 15.2 and 15.3 show two instances of a degree graph.
Also the web graph, consisting of websites and hyperlinks, features a power law for the in-degree. David Aldous has given the following argument why a power law of the in-degree of the web graph is natural. To a good approximation, the number of websites is growing exponentially at rate λ > 0. This means that the lifetime T of a random website satisfies Pr[T > t] ≈ e^{−λt}.

³ A more general expression than (15.4) is Pr[d_j = k] = c k^{−τ} g(k), where c is a normalization constant and where g(k) is a slowly varying function (Feller, 1971, pp. 275–284) with the basic property that lim_{t→∞} g(tx)/g(t) = 1 for every x > 0.
Fig. 15.3. Degree graph with τ = 2.4 and N = 200. The higher-degree nodes are put inside the circle.
Let ℓ(u) denote the number of links into a site at time u after its creation. At observation time t, the distribution of the number of links X into a random website is, by the law of total probability,

$$\Pr[X>k] = \int_{0}^{t}\Pr[X>k\,|\,T=u]\,\frac{d\Pr[T\leq u]}{du}\,du \approx \lambda\int_{0}^{t}e^{-\lambda u}\Pr[X>k\,|\,T=u]\,du \approx \lambda\int_{0}^{t}e^{-\lambda u}\,1_{\{\ell(u)>k\}}\,du = \lambda\int_{\ell^{-1}(k)}^{t}e^{-\lambda u}\,du = -e^{-\lambda t}+e^{-\lambda\,\ell^{-1}(k)}$$

For an exponential growth ℓ(u) = e^{αu}, we have ℓ^{−1}(k) = (log k)/α and the power law

$$\Pr[X>k]\approx k^{-\lambda/\alpha}$$

arises for sufficiently large t. For a polynomial growth ℓ(u) ≈ u^β and large t,

$$\Pr[X>k]\approx e^{-\lambda k^{1/\beta}}$$
The difference between these two examples illustrates the importance of the growth law of ℓ(u). The argument shows that a polynomial scaling law, commonly referred to as a power law, is a natural consequence of exponential growth. An exponential growth possesses the property that dℓ(u)/du = αℓ(u), which is established by preferential attachment. Preferential attachment means that new links are on average added to sites proportionally to their size. The more links a site has, the larger the probability that a new link attaches to this site. For example, already popular websites are increasingly more often linked to than small or less popular websites. Since many aspects of the Internet, such as the number of IP packets, users, websites, routers, etc., are currently growing approximately exponentially fast, the often observed power laws are more or less expected.
15.4 Connectivity and robustness

A graph G is connected if there is a path between each pair of nodes, and disconnected otherwise. A telecommunication network should be connected. Moreover, it is essential that the network be robust: it should still operate if some of the links between routers or switches are broken or temporarily blocked by other calls. Hence, the network graph should possess a redundancy of links. The minimum number of links to connect all nodes in the network equals N − 1. This minimum configuration is called redundancy level 1. In general, a redundancy level of D is defined by Baran (2002) as the link-to-node ratio in an infinite D-lattice⁴. A redundancy level of at least 3 is regarded as a highly robust network. A consequence of this insight was employed in the design of the early Internet (Arpanet): it would be theoretically possible to build extremely reliable communication networks out of unreliable links by the proper use of redundancy. Another, more timely, application of the same principle is the design of reliable ad-hoc and sensor networks.

⁴ A D-lattice is a graph where each nodal position corresponds to a point with integer coordinates within a D-dimensional hyper-cube with size Z. Apart from the border nodes, each node has the same degree, equal to 2D. The number of nodes equals N = Z^D. From (B.2), the link-to-node ratio follows as L/N = (1/2N) Σ_{j=1}^{N} d_j = D − r, where the correction r = O(N^{−1/D}) is due to the border nodes. For an infinite D-lattice, where the limit Z → ∞ (which implies N → ∞), we obtain lim_{Z→∞} L/N = D.
There exist interesting results from graph theory that help to dimension a reliable telecommunication network. Instead of the redundancy level, the edge and vertex connectivity seem more natural quantifiers from which robustness can be derived. The edge connectivity λ(G) of a connected graph G is the smallest number of edges (links) whose removal disconnects G. The vertex connectivity κ(G) of a connected graph different⁵ from the complete graph K_N is the smallest number of vertices (nodes) whose removal disconnects G.

Fig. 15.4. An example of the edge connectivity, λ(G) = 1, and the vertex connectivity, κ(G) = 1, of a graph.

These definitions are illustrated in Fig. 15.4. For any connected graph G it holds that

$$\kappa(G) \leq \lambda(G) \leq d_{\min}(G) \tag{15.5}$$

⁵ The complete graph K_N cannot be disconnected by removing nodes and we define κ(K_N) = N − 1 for N ≥ 3.
⁶ A second general inequality (B.23) relates the second smallest eigenvalue of the Laplacian to the edge and vertex connectivity (see Section B.4).
is minimized. Since the minimum cannot exceed the average, we have that d_min ≤ d_a = 2L/N. From (15.5), it follows that the best possible reliability is achieved if the network graph is designed such that

$$\kappa(G) = \lambda(G) = d_{\min} = \frac{2L}{N}$$

The clustering coefficient of a node v measures the local link density around v: if y denotes the number of links between the d_v neighbors of v, then

$$c_v = \frac{2y}{d_v(d_v-1)} \tag{15.6}$$
The expansion e_G(h) of a graph reflects the number of nodes that can be reached in h hops,

$$e_G(h) = \frac{1}{N^{2}}\sum_{v\in\mathcal{N}}|C(h)| \tag{15.7}$$

where C(h) is the set of nodes that can be reached in h hops from a node v and |A| represents the number of elements in the set A. We can interpret C(h) geometrically as a ball centered at node v with radius h.

The resilience r_G(m) measures the connectivity or robustness of a graph. Let m = |C(h)| denote the number of nodes in a ball centered at node v and with radius h, and define l(v, m) as the number of links that need to be removed to split C(h) into two sets with roughly equal numbers of nodes (around m/2). The resilience r_G(m) of a graph is

$$r_G(m) = \frac{1}{L}\sum_{v\in\mathcal{N}} l(v,m) \tag{15.8}$$

The distortion t_G(m) measures how closely the graph resembles a tree and is defined as

$$t_G(m) = \frac{1}{N}\sum_{v\in\mathcal{N}} w(C(h)) \tag{15.9}$$

where w(G) is the value of the minimum spanning tree in G with unit link weight w(i→j) = 1 for each link of G.

Consider a flow with a unit amount of traffic between each pair of nodes in the graph G. Each flow between a node pair follows the shortest path between that node pair. The betweenness B of a link (node) is defined as the number of shortest paths, between all possible pairs of nodes in G, that traverse the link (node). If H_{i→j} denotes the number of hops in the shortest path from i to j, then the total number of hops in all shortest paths in G is H_G = Σ_{i=1}^{N} Σ_{j=i+1}^{N} H_{i→j}. This number is also equal to H_G = Σ_{l=1}^{L} B_l, where B_l is the betweenness of link l in G. Taking the expectation of both relations gives the average betweenness of a link in terms of the average hopcount,

$$E[B] = \frac{\binom{N}{2}E[H_N]}{L} \geq E[H_N]$$

with equality only for the complete graph.
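The identity H_G = Σ_{i<j} H_{i→j} = Σ_l B_l can be illustrated directly: route one shortest path (found by breadth-first search) per node pair and count the link traversals. A sketch on a small example graph (graph and names are ours):

```python
# Route one BFS shortest path per node pair; the total hop count then equals
# the sum of the per-link betweenness counts by construction.
from collections import deque

def bfs_parents(adj, s):
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
hops_total = 0
betweenness = {}          # link -> number of routed shortest paths using it
for i in sorted(adj):
    parent = bfs_parents(adj, i)
    for j in sorted(adj):
        if j <= i:
            continue
        v = j
        while parent[v] is not None:       # walk the shortest path back to i
            link = frozenset((v, parent[v]))
            betweenness[link] = betweenness.get(link, 0) + 1
            hops_total += 1
            v = parent[v]
assert hops_total == sum(betweenness.values())   # H_G = sum of link betweennesses
```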
$$E[X_j] = \frac{(N-2)!}{(N-j-1)!}\,p^{j} \tag{15.10}$$

for 1 ≤ j ≤ N − 1. The average total number of paths between two arbitrary
Fig. 15.6. A connected random graph G_p(N) with N = 300 and p = 0.013, drawn on a circle.

where the latter bound is closely approached for large N. Moreover, when the random graph reduces to the complete graph (p = 1), we again obtain (15.3). Since the degree d_j of a node j is the number of links incident with that node, it follows directly from the definition of G_p(N) that the probability density function of the degree D_deg of an arbitrary node in G_p(N) equals

$$\Pr[D_{\mathrm{deg}} = k] = \binom{N-1}{k}p^{k}(1-p)^{N-1-k} \tag{15.11}$$
The interest in random graphs is fueled by the fact that the topology of the Internet is inaccurately known and also that good models are lacking. In some sense, the Internet can be regarded as a growing and changing organism. Such complex networks also arise in other fields. Increased interest
formula,

$$\binom{\binom{N+1}{2}}{L} = \sum_{v=0}^{N}\binom{N}{v}\sum_{\tau=v}^{\binom{v+1}{2}} C(v+1,\tau)\binom{\binom{N-v}{2}}{L-\tau} \tag{15.12}$$

$$\sum_{N=1}^{\infty}\sum_{L=1}^{\infty}\frac{C(N,L)}{N!}\,x^{N}y^{L} = \log\left(1+\sum_{k=1}^{\infty}(1+y)^{\binom{k}{2}}\frac{x^{k}}{k!}\right) \tag{15.13}$$

Isolating the term with v = N in (15.12) gives

$$\binom{\binom{N+1}{2}}{L} = \sum_{v=0}^{N-1}\binom{N}{v}\sum_{\tau=v}^{\binom{v+1}{2}} C(v+1,\tau)\binom{\binom{N-v}{2}}{L-\tau}+\sum_{\tau=N}^{\binom{N+1}{2}} C(N+1,\tau)\binom{0}{L-\tau}$$

Since $\binom{0}{L-\tau} = \delta_{\tau,L}$, we arrive, after the substitution N → N−1, at the recursion formula

$$C(N,L) = \binom{\binom{N}{2}}{L}-\sum_{v=0}^{N-2}\binom{N-1}{v}\sum_{\tau=v}^{\binom{v+1}{2}} C(v+1,\tau)\binom{\binom{N-1-v}{2}}{L-\tau} \tag{15.14}$$
C(3,3) = 1
C(4,4) = 15      C(4,5) = 6       C(4,6) = 1
C(5,5) = 222     C(5,6) = 205     C(5,7) = 120     C(5,9) = 10      C(5,10) = 1
C(6,6) = 3660    C(6,7) = 5700    C(6,8) = 6165    C(6,10) = 2997   C(6,11) = 1365   C(6,12) = 455    C(6,14) = 15     C(6,15) = 1
C(7,7) = 68295   C(7,8) = 156555  C(7,9) = 258125  C(7,11) = 343140 C(7,12) = 290745 C(7,13) = 202755 C(7,15) = 54257  C(7,16) = 20349  C(7,17) = 5985   C(7,19) = 210    C(7,20) = 21     C(7,21) = 1
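The recursion (15.14) is straightforward to program; a sketch (with the base case C(1, 0) = 1, a choice of ours) that reproduces the table entries above:

```python
# Number of connected labeled graphs with n nodes and l links, via the
# recursion (15.14); C(1, 0) = 1 serves as base case.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def C(n: int, l: int) -> int:
    if n == 1:
        return 1 if l == 0 else 0
    if l < n - 1 or l > comb(n, 2):
        return 0           # too few links to connect, or more links than K_n has
    total = comb(comb(n, 2), l)
    for v in range(n - 1):
        for tau in range(v, min(l, comb(v + 1, 2)) + 1):
            total -= comb(n - 1, v) * C(v + 1, tau) * comb(comb(n - 1 - v, 2), l - tau)
    return total

assert C(4, 4) == 15 and C(5, 5) == 222 and C(6, 6) == 3660
assert C(7, 7) == 68295 and C(7, 21) == 1
assert C(4, 3) == 16   # Cayley's formula: 4**2 labeled trees on 4 nodes
```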
$$\lim_{N\to\infty}\Pr\left[G_r\!\left(N,\left[\tfrac{1}{2}N\log N+xN\right]\right)\text{ is connected}\right] = e^{-e^{-2x}} \tag{15.15}$$

Ignoring the integral part [·] operator and eliminating x using the number of links L = ½N log N + xN gives, for large N,

$$\Pr[G_r(N,L)\text{ is connected}]\approx e^{-Ne^{-2L/N}} \tag{15.16}$$

The exact probability that G_r(N, L) is connected follows from the number C(N, L) of connected graphs,

$$\Pr[G_r(N,L)\text{ is connected}] = \frac{C(N,L)}{\binom{\binom{N}{2}}{L}} \tag{15.17}$$

In contrast to the unattractive computation of the exact C(N, L) via the recursion (15.14), the Erdős and Rényi asymptotic expression (15.16) is simple. The accuracy for relatively small N is shown in Fig. 15.7.
Fig. 15.7. The probability that a random graph G_r(N, L) is disconnected (for N up to 60): a comparison between the exact result (15.17) and Erdős's asymptotic formula (15.16) for L = N, L = (3/2)N, L = 2N and L = (2/3)N log N.
The key observation of Erdős and Rényi (1959) is that a phase transition in random graphs with N nodes occurs when the number of links L is around
$$\lim_{N\to\infty}\Pr[\text{number of nodes in } G_C(N,L_x) = N-k] = \frac{(e^{-2x})^{k}\,e^{-e^{-2x}}}{k!} \tag{15.18}$$

If k = 0, then all nodes belong to the giant component and the graph is completely connected, in which case (15.18) leads to (15.16).
The total number of graphs G_r(N, L_x) with k ≥ 1 isolated nodes equals $\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}$: the number of ways in which k isolated nodes can be chosen out of the total of N nodes, multiplied by the number of graphs that can be constructed with N − k nodes and L_x links. Observe that this total number also includes those graphs where not all of the N − k nodes are necessarily connected. In other words, this total number includes the graphs that do not possess property A_k. The total number of graphs T_0 without isolated nodes follows from the inclusion-exclusion formula (2.10) as

$$T_0(N,L_x) = \sum_{k=0}^{N}(-1)^{k}\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}$$

where the index k = 0 term equals the total number of graphs with N nodes and L_x links, i.e. the total number of elements in the sample space. Evidently, the total number C(N, L_x) of connected random graphs of the class G_r(N, L_x) is smaller than T_0(N, L_x), because all of them must obey property A_0 as well.
Since⁹

$$\lim_{N\to\infty}\frac{\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{(e^{-2x})^{k}}{k!}$$

we obtain

$$\lim_{N\to\infty}\frac{T_0(N,L_x)}{\binom{\binom{N}{2}}{L_x}} = \sum_{k=0}^{\infty}(-1)^{k}\frac{(e^{-2x})^{k}}{k!} = e^{-e^{-2x}}$$
⁹ Let

$$t_k = \frac{\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{1}{k!}\prod_{j=0}^{k-1}(N-j)\prod_{j=0}^{L_x-1}\frac{\binom{N-k}{2}-j}{\binom{N}{2}-j}$$

so that, using $\binom{N-k}{2}/\binom{N}{2} = \left(1-\frac{k}{N}\right)\left(1-\frac{k}{N-1}\right)$,

$$\log(k!\,t_k) = k\log N+\sum_{j=0}^{L_x-1}\left[\log\left(1-\frac{k}{N}\right)+\log\left(1-\frac{k}{N-1}\right)+\log\left(1-\frac{2j}{(N-k)(N-1-k)}\right)-\log\left(1-\frac{2j}{N(N-1)}\right)\right]$$

For large N and using the expansion log(1 − z) = −z + O(z²), we have, for fixed k,

$$\log\left(1-\frac{2j}{(N-k)(N-1-k)}\right) = \log\left(1-\frac{2j}{N(N-1)}\right)+O(N^{-3})$$

so that

$$\log(k!\,t_k) = k\log N+O(N^{-1})-L_x\frac{2k}{N}+O(L_x^{2}N^{-3})$$

In order to have a finite limit lim_{N→∞} log(k! t_k) = c ∈ ℝ, we must require that k log N − 2kL_x/N = c, which implies that L_x = (N/2)(log N − c/k). For this scaling, the order term O(L_x² N^{−3}) indeed vanishes if N → ∞. By choosing x = −c/(2k), we arrive at the correct scaling L_x = ½N log N + xN postulated above, and c = −2kx.
Hence,

$$\frac{\binom{N}{k}\,T_0(N-k,L_x)}{\binom{\binom{N}{2}}{L_x}}\to\frac{(e^{-2x})^{k}}{k!}\,e^{-e^{-2x}}$$

where the limit gives the correct result because the small difference between the total number and that without property A_k tends to zero.
$$\Pr[D_{\min}\geq 1]\approx(\Pr[D_{\mathrm{deg}}\geq 1])^{N} = (1-\Pr[D_{\mathrm{deg}} = 0])^{N} = \left(1-(1-p)^{N-1}\right)^{N}$$

which shows that Pr[D_min ≥ 1] rapidly tends to one for fixed 0 < p < 1 and large N. Therefore, the asymptotic behavior of Pr[G_p(N) is connected]
If we denote $c_N \triangleq N(1-p_N)^{N-1}$, then

$$\sum_{j=2}^{N}\binom{N}{j}(1-p_N)^{(N-1)j}\leq\sum_{j=2}^{\infty}\frac{c_N^{j}}{j!\,N^{j-1}}$$

can be made arbitrarily small for large N, provided we choose c_N = O(N^α) with α < 1/2. Thus, for large N, we have that

$$\Pr[G_p(N)\text{ is connected}] = e^{-c_N}\left(1+O(N^{2\alpha-1})\right)$$

which tends to 0 for 0 < α < 1/2 and to 1 for α < 0. Hence, the critical exponent where a sharp transition occurs is α = 0. In that case, c_N = c (a real positive constant) and

$$p_N = 1-\exp\left(-\frac{\log\frac{N}{c}}{N-1}\right) = \frac{\log\frac{N}{c}}{N}+O\left(\left(\frac{\log N}{N}\right)^{2}\right)$$

In summary, for large N,

$$\Pr[G_p(N)\text{ is connected}]\to\begin{cases}0 & \text{if } p<\frac{\log N}{N}\\[2pt] 1 & \text{if } p>\frac{\log N}{N}\end{cases} \tag{15.19}$$
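The threshold (15.19) is easy to observe numerically; a Monte Carlo sketch (pure Python, seeded for reproducibility, names ours) that samples G_p(N) and tests connectivity with a breadth-first search:

```python
import random
from collections import deque
from math import log

def gp_connected(N: int, p: float, rng: random.Random) -> bool:
    """Sample G_p(N) and test connectivity by breadth-first search."""
    adj = [[] for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == N

rng = random.Random(42)
N, trials = 200, 100
above = sum(gp_connected(N, 3.0 * log(N) / N, rng) for _ in range(trials))
below = sum(gp_connected(N, 0.1 * log(N) / N, rng) for _ in range(trials))
assert above >= 95 and below <= 5   # sharp transition around p = log(N)/N
```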
Since in G_p(N) all neighbors of n are independent¹⁰, the conditional probability becomes, with 1 − S = Pr[n ∉ C],

$$\Pr[\text{all } k \text{ neighbors of } n\notin C\,|\,d_n = k] = (\Pr[n\notin C])^{k} = (1-S)^{k}$$

Moreover, this probability holds for any node n ∈ G_p(N) such that, writing the random variable D_deg instead of an instance d_n,

$$1-S = \sum_{k=0}^{\infty}\Pr[D_{\mathrm{deg}} = k]\,(1-S)^{k} = \varphi_{D_{\mathrm{deg}}}(1-S)$$

where φ_{D_deg}(u) = E[u^{D_deg}] is the generating function of the degree D_deg in G_p(N). For large N, the degree distribution in G_p(N) is Poisson with mean degree μ_deg = p(N−1) and φ_{D_deg}(u) ≃ e^{μ_deg(u−1)}. For large N, the fraction S of nodes in the giant component in the random graph thus satisfies an equation similar to (12.13) for the extinction probability in a branching process,

$$S = 1-e^{-\mu_{\mathrm{deg}}S} \tag{15.20}$$

and the average size of the giant component is NS. For μ_deg < 1 the only solution is S = 0, whereas for μ_deg > 1 there is a non-zero solution for the size of the giant component. The solution can be expressed as a Lagrange series using (5.34),

$$S(\mu_{\mathrm{deg}}) = 1-e^{-\mu_{\mathrm{deg}}}\sum_{n=0}^{\infty}\frac{(n+1)^{n}}{(n+1)!}\left(\mu_{\mathrm{deg}}\,e^{-\mu_{\mathrm{deg}}}\right)^{n} \tag{15.21}$$

By reversing (15.20), the average degree in the random graph can be expressed in terms of the fraction S of nodes in the giant component,

$$\mu_{\mathrm{deg}}(S) = \frac{-\log(1-S)}{S} \tag{15.22}$$

¹⁰ This argument is not valid, for example, for a two-dimensional lattice Z²_p in which each link between adjacent nodes at integer-valued coordinates in the plane exists with probability p. The critical link density for connectivity in Z²_p is p_c = 1/2, a famous result proved in the theory of percolation (see, for example, Grimmett (1989)).
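The transcendental equation S = 1 − e^{−μS} has no closed form, but a fixed-point iteration converges quickly; a sketch (names ours) that also checks the inverse relation μ(S) = −log(1−S)/S:

```python
from math import exp, log

def giant_fraction(mu: float, iterations: int = 200) -> float:
    """Solve S = 1 - exp(-mu*S) by fixed-point iteration started at S = 1."""
    S = 1.0
    for _ in range(iterations):
        S = 1.0 - exp(-mu * S)
    return S

S = giant_fraction(2.0)
assert abs(S - (1 - exp(-2 * S))) < 1e-12      # fixed point of the equation
assert abs(-log(1 - S) / S - 2.0) < 1e-9       # the inverse relation recovers mu
assert giant_fraction(0.5) < 1e-12             # below mu = 1, only S = 0 survives
```

For μ = 2 the iteration gives S ≈ 0.797, i.e. roughly 80% of the nodes lie in the giant component.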
15.7 The hopcount in a large, sparse graph with unit link weights

Routers in the Internet forward IP packets to the next-hop router, which is found by routing protocols (such as OSPF and BGP). Intra-domain routing such as OSPF is based on the Dijkstra shortest path algorithm, while inter-domain routing with BGP is policy-based, which implies that BGP does not minimize a length criterion. Nevertheless, end-to-end paths in the Internet are shortest paths in roughly 70% of the cases. Therefore, we consider the shortest path between two arbitrary nodes because (a) the IP address does not reflect a precise geographical location and (b) uniformly distributed world-wide communication, especially on the web, seems natural, since the information stored in servers can be located in places unexpected and unknown to browsing users. The Internet type of communication is different from classical telephony because (a) telephone numbers have a direct binding with a physical location and (b) the intensity of average human interaction rapidly decreases with distance. We prefer to study the hopcount H_N because it is simple to measure via the trace-route utility, it is an integer, dimensionless, and the quality of service (QoS) measures (such as packet delay, jitter and packet loss) depend on the hopcount, the number of traversed routers. In this section, we first investigate the hopcount in a sparse but connected graph where all links have unit weight. Chapter 16 treats graphs with other link weight structures.
which consists of the ratio of all combinations in which the n_B nodes around B can be chosen out of the remaining nodes that do not belong to the set C_A, over all combinations in which n_B nodes can be chosen in the graph with N nodes except for node A. Furthermore,

$$\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \frac{(N-n_A-1)(N-n_A-2)\cdots(N-n_A-n_B)}{(N-1)(N-2)\cdots(N-n_B)} = \prod_{k=1}^{n_B}\frac{1-\frac{n_A+k}{N}}{1-\frac{k}{N}}$$

Taking the logarithm gives

$$\sum_{k=1}^{n_B}\left[\log\left(1-\frac{n_A+k}{N}\right)-\log\left(1-\frac{k}{N}\right)\right] = -\sum_{j=1}^{\infty}\frac{1}{jN^{j}}\sum_{k=1}^{n_B}\left[(n_A+k)^{j}-k^{j}\right] = -\frac{n_An_B}{N}-\left(\frac{n_An_B}{N}\right)^{2}\left(\frac{1}{2n_A}+\frac{1}{2n_B}+\frac{1}{2n_An_B}\right)-R$$

where

$$R = \sum_{j=3}^{\infty}\frac{1}{jN^{j}}\sum_{k=1}^{n_B}\sum_{m=0}^{j-1}\binom{j}{m}n_A^{\,j-m}k^{m} = O\!\left(\left(\frac{n_An_B}{N}\right)^{3}\right)$$

After exponentiation,

$$\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = e^{-\frac{n_An_B}{N}}\left(1+O\!\left(\left(\frac{n_An_B}{N}\right)^{2}\right)\right)$$

and, for large N, we obtain the tail probability (15.23) of the hopcount.
With the approximation $|C(l)|\approx W\sum_{k=0}^{l}\mu^{k} = W\frac{\mu^{l+1}-1}{\mu-1}$, the tail probability behaves as

$$\Pr[H_N>k]\approx\exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^{2}}\right)$$

With the tail probability expression (2.36) for the average, we arrive at a lower bound for the expected hopcount in large graphs,

$$E[H_N] = \sum_{k=0}^{\infty}\Pr[H_N>k]\gtrsim\sum_{k=0}^{\infty}\exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^{2}}\right)$$
The sum $S_1(t) = \sum_{k=0}^{\infty}\exp(-t\mu^{k})$ can be evaluated exactly¹¹ as

$$S_1(t) = \frac{1}{2}-\frac{\log t+\gamma}{\log\mu}+\varepsilon(t)+\sum_{k=1}^{\infty}\left(1-e^{-t\mu^{-k}}\right)$$

where ε(t) is a small oscillating term, a Fourier series in log t/log μ whose coefficients involve Γ(2πki/log μ) and decay as 1/sinh(2π²k/log μ). The amplitude of the oscillation decreases with μ; already for μ = 5 it is smaller than 0.0035. Since t = μ²/(N(μ−1)²) is small and μ > 1, we approximate

$$S_1(t)\approx\frac{1}{2}-\frac{\log t+\gamma}{\log\mu} \tag{15.24}$$

¹¹ Using $\Gamma(s) = \int_{0}^{\infty}t^{s-1}e^{-t}\,dt$ and the Mellin inversion $e^{-t\mu^{k}} = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(s)\,(t\mu^{k})^{-s}\,ds$, summation over k gives

$$S_1(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\Gamma(s)\,t^{-s}}{1-\mu^{-s}}\,ds$$

By moving the line of integration to the left, we encounter a double pole at s = 0, from Γ(s) and (1 − μ^{−s})^{−1}, and simple poles at s = 2πki/log μ from (1 − μ^{−s})^{−1}. Invoking Cauchy's residue theorem leads to the result.
Applying (15.24) with t = μ²/(N(μ−1)²) yields

$$E[H_N]\gtrsim\frac{\log N}{\log\mu}+\frac{2\log(\mu-1)-\gamma}{\log\mu}-\frac{3}{2}$$

This shows that in large, sparse graphs, for which the discovery process is well modeled by a branching process, E[H_N] scales as log N/log μ, where μ = E[Δ] − 1 > 1 is the average degree minus 1 in the graph.
We can refine the above analysis. Let us now assume that the convergence of W_k → W is sufficiently fast for large N, and that W > 0, such that

$$|C_A(l)|\approx W_A\sum_{k=0}^{l}\mu^{k} = W_A\,\frac{\mu^{l+1}-1}{\mu-1}$$

is a good approximation (and similarly for |C_B(l)|). The verification of this approximation is difficult in general. Theorem 12.3.2 states that Pr[W = 0] = π_0 and, equivalently, Pr[W > 0] = 1 − π_0, where the extinction probability π_0 obeys equation (12.13). Using this approximation, we find from (15.23)

$$\Pr[H_N>2l]\approx E\left[\exp\left(-\frac{W_AW_B\,\mu^{2l+2}}{N(\mu-1)^{2}}\right)\,\middle|\;W_A,W_B>0\right]$$

where the condition W > 0 is required, since otherwise there are no clusters C_A(l) and C_B(l), nor a path. Since the same asymptotics also holds for odd values of the hopcount, we finally arrive, for k ≥ 1 and large N, at

$$E[H_N]\approx E\left[\frac{1}{2}-\frac{2\log W-\log N+2\log(\mu-1)+\gamma}{\log\mu}\,\middle|\;W>0\right] = \frac{1}{2}+\frac{\log N-2\log(\mu-1)-\gamma}{\log\mu}-\frac{2\,E[\log W\,|\,W>0]}{\log\mu}$$
In sparse graphs with average degree E[Δ] equal to μ and for a large number of nodes N, the average hopcount is well approximated¹² by

$$E[H_N] = \frac{\log N}{\log\mu}+\frac{1}{2}-\frac{\gamma+2\log(\mu-1)+2\,E[\log W\,|\,W>0]}{\log\mu} \tag{15.25}$$

This expression (15.25) for the average hopcount, which is more refined than the commonly used estimate E[H_N] ≈ log N/log μ, contains the curious average E[log W | W > 0], where W is the limit random variable of the branching process produced by the graph's degree distribution Δ.
Application to G_p(N). The above analysis holds for fixed E[Δ] = p(N−1), such that, for large N, we require that p = μ/N, where μ is approximately equal to the average degree. Since the binomial distribution (15.11) for the degree in G_p(N) is very well approximated by the Poisson distribution Pr[D_deg = k] ≈ (μ^k/k!) e^{−μ} for large N and constant μ, formula (15.25) requires the computation of E[log W | W > 0] in a Poisson branching process, which is presented in Hooghiemstra and Van Mieghem (2005) and summarized in Fig. 15.8. The numerical evaluation of the average hopcount (15.25) in a random graph of the class G_p(N) for small average degree and large N shows that (15.25) is much more accurate than its first term log N/log μ alone.

Fig. 15.8. The quantity E[log W | W > 0] of a Poisson branching process versus the average degree μ.

¹² A more rigorous derivation, which stochastically couples the growth of the graph specified by a certain degree distribution to a corresponding branching process, is found in van der Hofstad et al. (2005). In particular, the analysis is shown to be valid for any randomly constructed graph with a finite variance of the degree. More details on the result for the average hopcount are presented in Hooghiemstra and Van Mieghem (2005).
At the other end of the scale, for a constant link density p = c < 1, which corresponds to an average degree E[Δ] = c(N−1), the above analysis no longer applies, because the average degree is too large. Fortunately, in that case, an exact asymptotic analysis is possible (see Problem (iii)):

$$\Pr[H_N = 1] = p \qquad \Pr[H_N = 2] = (1-p)\left(1-(1-p^{2})^{N-2}\right) \tag{15.26}$$

since Pr[H_N > 2] = (1 − p)(1 − p²)^{N−2} tends to zero rapidly for sufficiently large N. Hence, E[H_N] ≈ Pr[H_N = 1] + 2 Pr[H_N = 2] ≈ 2 − p and, similarly, we find Var[H_N] ≈ p(1 − p). This asymptotic analysis even holds for the larger link density regime p = cN^{−1/2+ε} with ε > 0, because

$$\lim_{N\to\infty}\Pr[H_N>2] = \lim_{N\to\infty}\left(1-cN^{-1/2+\varepsilon}\right)\left(1-c^{2}N^{-1+2\varepsilon}\right)^{N-2} = 0$$
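The dense-regime limits E[H_N] → 2 − p and Var[H_N] → p(1 − p) follow directly from the two probabilities above; a small numerical sketch (names ours):

```python
# Hopcount moments in the dense regime from Pr[H=1] = p and
# Pr[H=2] = (1-p)(1-(1-p^2)^(N-2)); the residual tail Pr[H > 2] vanishes fast.

def hop_moments(N: int, p: float):
    pr1 = p
    pr2 = (1 - p) * (1 - (1 - p * p) ** (N - 2))
    tail = (1 - p) * (1 - p * p) ** (N - 2)      # Pr[H_N > 2]
    EH = 1 * pr1 + 2 * pr2                        # ignoring the vanishing tail
    VarH = 1 * pr1 + 4 * pr2 - EH ** 2
    return EH, VarH, tail

EH, VarH, tail = hop_moments(N=1000, p=0.3)
assert tail < 1e-12
assert abs(EH - (2 - 0.3)) < 1e-9                # E[H] -> 2 - p
assert abs(VarH - 0.3 * 0.7) < 1e-9              # Var[H] -> p(1 - p)
```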
16
The Shortest Path Problem
The shortest path problem asks for the computation of the path from a source to a destination node that minimizes the sum of the positive weights¹ of its constituent links. The related shortest path tree (SPT) is the union of the shortest paths from a source node to a set of m other nodes in the graph with N nodes. If m = N-1, the SPT connects all nodes and is termed a spanning tree. The SPT belongs to the fundamentals of graph theory and has many applications. Moreover, powerful shortest path algorithms, like that of Dijkstra, exist. Section 15.7 studied the hopcount, the number of hops (links) in the shortest path, in sparse graphs with unit link weights. In this chapter, the influence of the link weight structure on the properties of the SPT will be analyzed. Starting from one of the simplest possible graph models, the complete graph with i.i.d. exponential link weights, the characteristics of the shortest path will be derived and compared to Internet measurements.
The link weights seriously impact the path properties in QoS routing (Kuipers and Van Mieghem, 2003). In addition, from a traffic engineering perspective, an ISP may want to tune the weight of each link such that the resulting shortest paths between a particular set of in- and egresses follow the desirable routes in its network. Thus, apart from the topology of the graph, the link weight structure clearly plays an important role. Often, as in the Internet or other large infrastructures, both the topology and the link weight structure are not accurately known. This uncertainty about the precise structure leads us to consider both the underlying graph and each of the link weights as random variables.
¹ A zero link weight is regarded as the coincidence of two nodes (which we exclude), while an infinite link weight means the absence of a link.
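To make the setting concrete, here is a minimal sketch (not from the book) of Dijkstra's algorithm on the complete graph K_N with i.i.d. exponential link weights; the helper name spt_complete_graph is ours, and for small N we simply materialize all N(N-1)/2 weights:

```python
import heapq
import random

def spt_complete_graph(N, rng):
    """Dijkstra on K_N with i.i.d. exponential(1) link weights.
    Returns, for every node, its shortest-path weight and hopcount
    from the source node 0."""
    w = {}
    for i in range(N):
        for j in range(i + 1, N):
            w[(i, j)] = rng.expovariate(1.0)
    weight = lambda i, j: w[(min(i, j), max(i, j))]

    dist = [float("inf")] * N
    hops = [0] * N
    dist[0] = 0.0
    visited = [False] * N
    pq = [(0.0, 0)]
    while pq:
        d, u = heapq.heappop(pq)
        if visited[u]:
            continue
        visited[u] = True
        for v in range(N):
            if v != u and d + weight(u, v) < dist[v]:
                dist[v] = d + weight(u, v)
                hops[v] = hops[u] + 1
                heapq.heappush(pq, (dist[v], v))
    return dist, hops

dist, hops = spt_complete_graph(100, random.Random(1))
print(max(dist) < float("inf"), min(hops[1:]) >= 1)
```

The shortest-path weights and hopcounts produced this way are exactly the quantities W_N and h_N studied in this chapter.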
The link weight structure is characterized by the behavior of the link weight distribution F_w(x) near x = 0,

\alpha = \lim_{x \downarrow 0}\frac{\log F_w(x)}{\log x}, \qquad \alpha > 0    (16.1)

Fig. 16.1. A schematic drawing of the distribution of the link weights for the three different \alpha-regimes. The shortest path problem is mainly sensitive to the small region around zero. The scaling invariant property of the shortest path allows us to divide all link weights by the largest possible such that F_w(1) = 1 for all link weight distributions.
the n-th interattachment time \tau_n = v_n - v_{n-1} is exponentially distributed with rate n(N-n), for n \in [1, N-1]    (16.2)

Indeed, initially there is only the source node A with label² 0, hence n = 1. From this first node A precisely N-1 new nodes can be reached in the complete graph K_N. Alternatively, one can say that N-1 nodes are competing with each other, each with exponentially distributed strength, to be discovered, and the winner amongst them, say C with label 1, is the one reached in the shortest time, which corresponds to an exponential variable with rate N-1.
Fig. 16.2. On the left, the Markov discovery process as a function of time in a graph with N = 9 nodes. The circles centered at the discovering node A with label 0 present equi-time lines and v_n is the discovering time of the n-th node, while \tau_n = v_n - v_{n-1} is the n-th interattachment time. The set of discovered nodes redrawn per level is shown on the right, where a level gives the number of hops k from the source node A. The tree is a uniform recursive tree (URT).
² When continuous measures such as time and weight of a path are computed, the source node is most conveniently labeled by zero, whereas in counting processes, such as the number of hops of a path, the source node is labeled by one.
Denote by X_N^{(k)} the k-th level set of a tree T, which is the set of nodes in the tree T at hopcount k from the root A in a graph with N nodes, and by X_N^{(k)} also the number of elements in that set. Then we have X_N^{(0)} = 1, because the zeroth level can only contain the root node A itself. For all k > 0, it holds that 0 \le X_N^{(k)} \le N-1 and that

\sum_{k=0}^{N-1} X_N^{(k)} = N    (16.3)
Theorem 16.2.1 Let \{Y_N^{(k)}\}_{k,N \ge 0} and \{Z_N^{(k)}\}_{k,N \ge 0} be two independent copies of the vector of level sets of two sequences of independent URTs. Then the level sets X_N^{(k)} satisfy the distributional recursion (16.4).

A URT of size N is grown by the sequence of attachments

(n_2 \leftarrow 2), (n_3 \leftarrow 3), \ldots, (n_N \leftarrow N)    (16.5)

where (n_j \leftarrow j) means that the j-th node is attached to node n_j \in [1, j-1] and n_2 = 1. Hence, n_j is the predecessor of j and the predecessor relation is indicated by the arrow \leftarrow. Moreover, n_j is a discrete uniform random variable on [1, j-1] and all n_2, n_3, \ldots, n_N are independent.
Fig. 16.3. An instance of a uniform recursive tree with N = 26 nodes organized per level X_N^{(0)}, \ldots, X_N^{(4)}, i.e. 0 \le k \le 4. The node number (inside the circle) indicates the order in which the nodes were attached to the tree.
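The uniform attachment rule can be sketched directly; grow_urt is our own helper name, and the empirical mean depth is compared against the harmonic sum \sum_{l=2}^{N} 1/l that (16.9) below derives for E[h_N]:

```python
import random

def grow_urt(N, rng):
    """Grow a uniform recursive tree: node j attaches to a uniformly
    chosen earlier node n_j in [1, j-1]. Returns each node's hopcount
    (depth) from the root, node 1."""
    depth = [0]                       # the root has hopcount 0
    for j in range(2, N + 1):
        parent = rng.randrange(1, j)  # uniform on 1..j-1
        depth.append(depth[parent - 1] + 1)
    return depth

rng = random.Random(7)
N, trees = 50, 4000
total = sum(sum(grow_urt(N, rng)) for _ in range(trees))
mean_depth = total / (N * trees)                 # estimates E[h_N]
harmonic = sum(1 / l for l in range(2, N + 1))   # sum_{l=2}^{N} 1/l
print(abs(mean_depth - harmonic) < 0.05)
```

The agreement illustrates that the depth of a uniformly chosen node in a URT grows only logarithmically in N.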
Since n_2 = 1 and each n_j is uniform on [1, j-1], the total number of different URTs with N nodes equals

\sum_{n_2=1}^{1}\sum_{n_3=1}^{2}\cdots\sum_{n_N=1}^{N-1} 1 = (N-1)!

In general, Cayley's Theorem (Appendix B.1, art. 3) states that there are N^{N-2} possible labeled trees. The set of URTs is a subset of the set of all possible labeled trees. Not all labeled trees are URTs, because the nodes that are further away from the root must have larger labels.
The shortest path tree from the source or root A to the other nodes in the complete graph is the tree associated with the Markov discovery process, where the number of nodes X(t) at time t is constructed as follows. Just as the discovery process, the associated tree starts at the root A. We now investigate the embedded Markov chain (Section 10.4) of the continuous-time discovery process. After each transition in the continuous-time Markov chain, X(t) increases by one and the newly discovered node is attached uniformly to one of the already discovered nodes, so that the associated tree is a URT. A uniformly chosen node in a URT of size N has probability

\Pr[h_N = k] = \frac{E\left[X_N^{(k)}\right]}{N}    (16.7)

of having hopcount k. If the size of the URT grows from n to n+1 nodes, each node at hopcount k-1 from the root can generate a node at hopcount k with probability 1/n. Hence, for k \ge 1,

E\left[X_N^{(k)}\right] = \sum_{n=k}^{N-1}\frac{E\left[X_n^{(k-1)}\right]}{n}
Combining with (16.7) gives

\Pr[h_N = k] = \frac{1}{N}\sum_{n=k}^{N-1}\Pr[h_n = k-1]

Multiplying both sides by z^k and summing over k yields the probability generating function

\varphi_{h_N}(z) = E\left[z^{h_N}\right] = \frac{1}{N} + \frac{1}{N}\sum_{k=1}^{N-1}\sum_{n=k}^{N-1}\Pr[h_n = k-1]\, z^k
 = \frac{1}{N} + \frac{z}{N}\sum_{n=1}^{N-1}\sum_{k=1}^{n}\Pr[h_n = k-1]\, z^{k-1}
 = \frac{1}{N} + \frac{z}{N}\sum_{n=1}^{N-1}\varphi_{h_n}(z)

Taking the difference between (N+1)\varphi_{h_{N+1}}(z) and N\varphi_{h_N}(z) results in the recursion

(N+1)\,\varphi_{h_{N+1}}(z) = (N+z)\,\varphi_{h_N}(z)

Iterating this recursion starting from \varphi_{h_1}(z) = E[z^{h_1}] = E[z^0] = 1 leads to (16.6).
\Pr[h_N = k] = \frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!}    (16.8)

Proof: The probability generating function \varphi_{h_N}(z) in (16.6) is also the generating function of the Stirling numbers S_N^{(k)} of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3), such that the probability that a uniformly chosen node in the URT has hopcount k equals (16.8).
The explicit form of the generating function shows that the average hopcount h_N in a URT of size N equals

E[h_N] = \varphi'_{h_N}(1) = \left.\frac{d}{dz}\log\varphi_{h_N}(z)\right|_{z=1} = \sum_{l=2}^{N}\frac{1}{l}    (16.9)
 = \psi(N+1) + \gamma - 1

where \psi(z) = \Gamma'(z)/\Gamma(z) is the digamma function (Abramowitz and Stegun, 1968, Section 6.3) and the Euler constant is \gamma = 0.57721\ldots. Similarly,
the variance (2.27) follows from the logarithm of the generating function L_{h_N}(z) = \log\Gamma(N+z) - \log\Gamma(N+1) - \log\Gamma(z+1) as

Var[h_N] = \psi'(N+1) - \psi'(2) + \psi(N+1) + \gamma - 1 = \psi(N+1) + \gamma - \frac{\pi^2}{6} + \psi'(N+1)

Using the asymptotic formulae for the digamma function leads to

E[h_N] = \log N + \gamma - 1 + O\!\left(\frac{1}{N}\right)    (16.10)

Var[h_N] = \log N + \gamma - \frac{\pi^2}{6} + O\!\left(\frac{1}{N}\right)    (16.11)

For large N, the generating function itself obeys

\varphi_{h_N}(z) = \frac{N^{z-1}}{\Gamma(z+1)}\left(1 + O\!\left(\frac{1}{N}\right)\right)

Introducing the Taylor series \frac{1}{\Gamma(z)} = \sum_{k=1}^{\infty} c_k z^k, where the coefficients c_k are listed in Abramowitz and Stegun (1968, Section 6.1.34), we obtain with N^z = e^{z\log N},

\varphi_{h_N}(z) = \frac{1+O(1/N)}{N}\sum_{k=0}^{\infty}\frac{\log^k N}{k!}\, z^k \sum_{k=1}^{\infty} c_k z^{k-1}
 = \frac{1+O(1/N)}{N}\sum_{k=0}^{\infty} z^k \sum_{m=0}^{k} c_{m+1}\frac{\log^{k-m} N}{(k-m)!}
so that the hopcount moments of the URT are approximately E[h_N] \approx Var[h_N] \approx \log N. The accuracy of the Poisson approximation can be estimated by comparison with the average (16.10) and the variance (16.11) found above, accurate up to second order in N. For example, if the URT has N = 10^4 nodes, the Poisson approximation yields E[h_N] = Var[h_N] = 9.21034, while the average (16.10) is E[h_N] = 8.78756, accurate up to 10^{-4}, and the variance (16.11) is Var[h_N] = 8.14262. The exact results are E[h_N] = 8.78761 and Var[h_N] = 8.14277.
Since \Pr[h_N = 0] = \frac{(-1)^{N-1} S_N^{(1)}}{N!} = \frac{1}{N}, we obtain, for 1 \le k \le N-1,

\Pr[H_N = k] = \frac{\Pr[h_N = k,\, h_N \neq 0]}{\Pr[h_N \neq 0]} = \frac{N}{N-1}\,\Pr[h_N = k]

Using (16.8), we find

\Pr[H_N = k] = \frac{N}{N-1}\,\frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!}    (16.14)
The corresponding generating function is

\varphi_{H_N}(z) = \sum_{k=1}^{N-1}\Pr[H_N = k]\, z^k = \frac{N}{N-1}\sum_{k=0}^{N-1}\Pr[h_N = k]\, z^k - \frac{N}{N-1}\,\Pr[h_N = 0]
 = \frac{N}{N-1}\,\varphi_{h_N}(z) - \frac{1}{N-1}    (16.15)

from which E[H_N] = \varphi'_{H_N}(1) = \frac{N}{N-1}\sum_{l=2}^{N}\frac{1}{l}
so that

\Pr[H_N = k] = \Pr[h_N = k] + O\!\left(\frac{1}{N}\right)
Fig. 16.4. The histograms of the hopcount derived from the trace-route measurements in three continents from CAIDA in 2004, fitted by the pdf (16.12) of the hopcount in the URT with \log N_{Asia} = 13.5, \log N_{Europe} = 12.6 and \log N_{USA} = 12.9.
The Laplace transform of the discovery time v_k = \sum_{n=1}^{k}\tau_n of the k-th attached node follows from the independence of the interattachment times \tau_n as

E\left[e^{-z v_k}\right] = E\left[\exp\left(-z\sum_{n=1}^{k}\tau_n\right)\right] = \prod_{n=1}^{k} E\left[e^{-z\tau_n}\right] = \prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)}    (16.16)

The weight W_N of the shortest path between the root and a uniformly chosen destination node then has the generating function

\varphi_{W_N}(z) = \frac{1}{N-1}\sum_{k=1}^{N-1}\prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)}    (16.17)

because any node apart from the root A, but including the destination node B, has equal probability to be the k-th attached node.
The average weight is

E[W_N] = -\left.\frac{d\varphi_{W_N}(z)}{dz}\right|_{z=0} = -\frac{1}{N-1}\sum_{k=1}^{N-1}\left.\frac{d}{dz}\prod_{n=1}^{k}\frac{n(N-n)}{z+n(N-n)}\right|_{z=0}

Since

\left.\frac{d}{dz}\prod_{n=1}^{k}\frac{n(N-n)}{z+n(N-n)}\right|_{z=0} = \left.\prod_{n=1}^{k}\frac{n(N-n)}{z+n(N-n)}\,\frac{d}{dz}\sum_{n=1}^{k}\log\frac{n(N-n)}{z+n(N-n)}\right|_{z=0} = -\sum_{n=1}^{k}\frac{1}{n(N-n)}

this gives

E[W_N] = \frac{1}{N-1}\sum_{k=1}^{N-1}\sum_{n=1}^{k}\frac{1}{n(N-n)} = \frac{1}{N-1}\sum_{n=1}^{N-1}\frac{N-n}{n(N-n)} = \frac{1}{N-1}\sum_{n=1}^{N-1}\frac{1}{n} = \frac{\psi(N)+\gamma}{N-1}    (16.18)

For large N,

E[W_N] = \frac{\log N + \gamma}{N} + O\!\left(\frac{1}{N^2}\right)

Similarly, the variance is computed (see Problem (ii) in Section 16.9) as

Var[W_N] = \frac{3}{N(N-1)}\sum_{n=1}^{N-1}\frac{1}{n^2} - \frac{1}{N(N-1)^2}\left(\sum_{n=1}^{N-1}\frac{1}{n}\right)^2    (16.19)

and, for large N,

Var[W_N] = \frac{\pi^2}{2N^2} + O\!\left(\frac{\log^2 N}{N^3}\right)    (16.20)
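The representation behind (16.17) — W_N = \sum_{n=1}^{K}\tau_n with K uniform on \{1,\ldots,N-1\} and \tau_n exponential with rate n(N-n) — makes a direct Monte Carlo check of (16.18) easy; sample_WN is our own helper name:

```python
import random

def sample_WN(N, rng):
    """One sample of the shortest-path weight W_N: the sum of the first
    K interattachment times, K uniform on {1,...,N-1},
    tau_n ~ Exp(rate n(N-n))."""
    K = rng.randrange(1, N)
    return sum(rng.expovariate(n * (N - n)) for n in range(1, K + 1))

rng = random.Random(42)
N, runs = 50, 200_000
mean = sum(sample_WN(N, rng) for _ in range(runs)) / runs
H1 = sum(1 / n for n in range(1, N))       # sum_{n=1}^{N-1} 1/n
print(abs(mean - H1 / (N - 1)) < 0.002)    # compare with (16.18)
```

The standard error of the estimate is of order 10^{-4} by (16.20), so the simulated mean pins down (16.18) to three digits here.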
Since W_N equals the sum of the link weights of the shortest path from the root to an arbitrary node, and since H_N = h_N \mid h_N > 0 is the number of links in that shortest path (where the arbitrary destination node is different from the root), one may wonder whether there is a relation between them. Although the shortest path has precisely H_N hops, the destination node of that path is not necessarily the H_N-th attached node of the URT grown at the root. The destination node cannot be discovered sooner than the H_N-th attached node, otherwise the hopcount of the shortest path would be shorter than H_N. Hence, the destination node is the k-th discovered node, attached to the URT somewhere in between the H_N-th and the last attached node. Thus, k \in [H_N, N-1]. If k = H_N, then all previously discovered nodes belong to the shortest path and the j-th attached node in the URT is linked to the (j-1)-th, for all j \le k. If k > H_N, precisely k - H_N of the attached nodes do not belong to the shortest path. Hence, W_N = v_k, provided k - H_N nodes in the URT discovered so far do not belong to the path and precisely H_N do. The latter condition requires the determination of all structurally favorable possibilities, which is rather complex.
Curiously, the probability that the shortest path consists of the direct link between source and destination is, with (16.14), (16.18) and S_N^{(2)} = (-1)^N (N-1)!\sum_{k=1}^{N-1}\frac{1}{k},

\Pr[H_N = 1] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{1}{k} = E[W_N]
The flooding time T_N is the time needed to reach all nodes in the graph from the source, i.e. T_N = v_{N-1} = \sum_{n=1}^{N-1}\tau_n. Since the interattachment times \tau_n are independent exponential random variables with rate n(N-n), the Laplace transform of the flooding time is

\varphi_{T_N}(x) = E\left[e^{-x T_N}\right] = \prod_{n=1}^{N-1}\frac{n(N-n)}{n(N-n) + x}    (16.21)

The average flooding time is

E[T_N] = \sum_{n=1}^{N-1} E[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n(N-n)} = \frac{2}{N}\sum_{n=1}^{N-1}\frac{1}{n} = \frac{2}{N}\left(\psi(N) + \gamma\right)    (16.22)

and the variance is

Var[T_N] = \sum_{n=1}^{N-1} Var[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n^2(N-n)^2} = \frac{2}{N^2}\sum_{n=1}^{N-1}\frac{1}{n^2} + \frac{4}{N^3}\sum_{n=1}^{N-1}\frac{1}{n}    (16.23)

For large N, we have that Var[T_N] = \frac{\pi^2}{3N^2} + O\!\left(\frac{\log N}{N^3}\right).
Both the exponential and the uniform distribution are regular distributions with extreme value index \alpha = 1. This means that the small link weights that are most likely included in the shortest path are almost identically distributed for all regular distributions with the same f_w(0).
Rewriting each factor gives

\varphi_{T_N}(x) = \prod_{n=1}^{N-1}\frac{n(N-n)}{x + n(N-n)} = \prod_{n=1}^{N-1}\left(1 + \frac{x}{n(N-n)}\right)^{-1}    (16.24)

For N = 2M, using \frac{\Gamma(z+m)}{\Gamma(z+1)} = \prod_{n=1}^{m-1}(z+n) and n(2M-n) + x = \left(\sqrt{x+M^2}\right)^2 - (n-M)^2,

\varphi_{T_{2M}}(x) = \left(\frac{\Gamma(2M)\,\Gamma\!\left(1+\sqrt{x+M^2}-M\right)}{\Gamma\!\left(M+\sqrt{x+M^2}\right)}\right)^2    (16.25)

For large M, there holds \sqrt{x+M^2} \simeq M + \frac{x}{2M}, provided |x| \ll M^2. After substitution of x = 2My in (16.25), with |y| < 1, we obtain

\varphi_{T_{2M}}(2My) \simeq \frac{\Gamma^2(1+y)\,\Gamma^2(2M)}{\Gamma^2(2M+y)} \simeq \Gamma^2(1+y)\,(2M)^{-2y}
so that, for |y| < 1,

\lim_{N\to\infty} E\left[e^{-y(N T_N - 2\log N)}\right] = \lim_{N\to\infty}\frac{1}{N}\int_{-\infty}^{\infty} e^{-yt}\, f_{T_N}\!\left(\frac{t + 2\log N}{N}\right) dt = \Gamma^2(1+y)    (16.26)

This limit demonstrates that the probability distribution function of the random variable N T_N - 2\log N converges to a probability distribution with Laplace transform \Gamma^2(1+y). Let us define the normalized density function

g_N(t) = \frac{1}{N}\, f_{T_N}\!\left(\frac{t + 2\log N}{N}\right)    (16.27)
We can prove convergence in density, i.e. \lim_{N\to\infty} g_N(t) = g(t) and that the latter exists. By the inversion theorem for Laplace transforms we obtain, for t \in \mathbb{R},

\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{ty}\, N^{2y}\,\varphi_{T_N}(Ny)\, dy

where 0 < c < 1. Since \Gamma(z) is analytic over the entire complex plane except for simple poles at the points z = -n for n = 0, 1, 2, \ldots, we find that N^{2y}\varphi_{T_N}(Ny) is analytic whenever the real part of y is non-negative. Evaluation along the line Re(y) = c = 0 then gives

\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu}\, N^{2iu}\,\varphi_{T_N}(iNu)\, du

Moreover, |\varphi_{T_N}(iNu)| is bounded by \frac{1+u^2}{u^4} when |u| > 1, and |\varphi_{T_N}(iNu)| \le 1 for |u| \le 1. This follows from the first equality in (16.24), using only the factors in the product with n = 1 and n = N-1, and bounding the other factors by

\frac{n(N-n)}{|n(N-n) + iNu|} \le 1

The Dominated Convergence Theorem 6.1.4 allows us to interchange the limit and the integration operator, such that

\lim_{N\to\infty} g_N(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu}\,\Gamma^2(1+iu)\, du = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{ty}\,\Gamma^2(1+y)\, dy    (16.28)
which shows that (16.28) is the two-fold convolution of the probability density function \frac{d\Lambda}{dt}(t), where \Lambda(t) = e^{-e^{-t}} is the Gumbel distribution (3.37). Furthermore, the two-fold convolution is given by

\frac{d\Lambda^{(2*)}}{dt}(t) = \int_{-\infty}^{\infty} e^{-u} e^{-e^{-u}}\, e^{-(t-u)} e^{-e^{-(t-u)}}\, du = e^{-t}\int_{-\infty}^{\infty} e^{-e^{-u} - e^{-(t-u)}}\, du
 = e^{-t}\int_{-\infty}^{\infty}\exp\!\left(-2e^{-t/2}\cosh\!\left(u - \tfrac{t}{2}\right)\right) du
 = 2e^{-t}\int_{0}^{\infty}\exp\!\left(-2e^{-t/2}\cosh u\right) du = 2e^{-t}\, K_0\!\left(2e^{-t/2}\right)

where K_\nu(x) denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6) of order \nu. In summary,

\lim_{N\to\infty} g_N(t) = g(t) = \frac{d\Lambda^{(2*)}}{dt}(t) = 2e^{-t}\, K_0\!\left(2e^{-t/2}\right)    (16.29)

and the corresponding distribution function is

\Lambda^{(2*)}(t) = \int_{-\infty}^{t} 2e^{-u}\, K_0\!\left(2e^{-u/2}\right) du    (16.30)
The right-hand side of (16.29) is maximal for t = 0.506357, which is slightly smaller than \gamma = 0.57722, but still in accordance with E[T_N] given by (16.22). The asymmetry shows that the event \{N T_N \ge 2\log N + z\} is much more likely than the event \{N T_N \le 2\log N - z\}, which confirms the intuition that the flooding time can be much longer than the average E[T_N], but not so much shorter than E[T_N]. Figure 16.5 illustrates the convergence of g_N(t) to the limit in (16.29).

Fig. 16.5. The scaled density g_N(t) for three values of N = 2M (M = 5, 10, 20, dotted lines) and the asymptotic result (full line) on a log-lin scale. The insert is drawn on a lin-lin scale.

When comparing (16.26) with the corresponding result (C.6) for the weight of the shortest path, we observe that, for large N, the random variable N T_N - 2\log N consists of the sum of N W_{N;1} - \log N and N W_{N;2} - \log N, where both N W_{N;j} - \log N are i.i.d. random variables. Intuitively, we can say that the flooding time consists of the time to travel from a left-hand corner of the graph to the center and from the center to a right-hand corner of the graph.
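Because T_N is an explicit sum of independent exponentials, the moments (16.22)-(16.23) are easy to check by simulation; sample_TN is our own helper name:

```python
import random

def sample_TN(N, rng):
    """One sample of the flooding time T_N: sum of independent
    exponentials with rates n(N-n), as in (16.21)."""
    return sum(rng.expovariate(n * (N - n)) for n in range(1, N))

rng = random.Random(3)
N, runs = 40, 100_000
mean = sum(sample_TN(N, rng) for _ in range(runs)) / runs
exact = (2 / N) * sum(1 / n for n in range(1, N))   # (16.22)
print(abs(mean - exact) < 1e-3)
```

With Var[T_N] \approx \pi^2/(3N^2) the standard error of this estimate is about 1.4 x 10^{-4}, well inside the tolerance.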
The asymptotic distribution (16.30) is a beautiful example of a sum of Q
Denote by D_N^{(k)} the set of nodes with degree k in a graph with N nodes, and by D_N^{(k)} also the cardinality (the number of elements) of this set. Since each node appears in only one set, it holds for any graph that

\sum_{k=1}^{N-1} D_N^{(k)} = N    (16.31)

Moreover, since D_N^{(k)} = \sum_{j=1}^{N} 1_{\{d_j = k\}}, where d_j is the degree of node j,

E\left[D_N^{(k)}\right] = \sum_{j=1}^{N}\Pr[d_j = k]    (16.32)
The evolution scenario in three parts is generally applicable for any class of trees that possesses a growth law. It does not hold for graphs in general, because only in a tree does a node have one well-defined parent node and an in-degree of one. Using the law of total probability (2.46) yields

\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j \mid n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]\Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]
 + \Pr\left[D_N^{(k)} = j+1 \mid n_{N+1} \in D_N^{(k)}\right]\Pr\left[n_{N+1} \in D_N^{(k)}\right]
 + \Pr\left[D_N^{(k)} = j-1 \mid n_{N+1} \in D_N^{(k-1)}\right]\Pr\left[n_{N+1} \in D_N^{(k-1)}\right]

where n_{N+1} \in D_N^{(k)} denotes the event that the new node attaches to a node of the set D_N^{(k)}. If the process of attaching a new node N+1 does not depend on the way the N previous nodes are attached, but rather on their number, there holds \Pr[D_N^{(k)} = j \mid n_{N+1}] = \Pr[D_N^{(k)} = j]. This property holds for the URT. We obtain a three-point recursion for k > 1,

\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j\right]\Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]
 + \Pr\left[D_N^{(k)} = j+1\right]\Pr\left[n_{N+1} \in D_N^{(k)}\right]
 + \Pr\left[D_N^{(k)} = j-1\right]\Pr\left[n_{N+1} \in D_N^{(k-1)}\right]
The probability generating function

\varphi_D(z, N; k) = E\left[z^{D_N^{(k)}}\right] = \sum_{j=0}^{\infty}\Pr\left[D_N^{(k)} = j\right] z^j
exists for any k \in [1, k_{max}], because the attachment of the node n_{N+1} is possible to any non-empty set; this means that the absence of nodes with degree k \in [1, k_{max}] cannot occur in URTs, thus \Pr[D_N^{(k)} = 0] = 0. A consequence is that the probability generating function \varphi_D(z, N; k) is at least O(z) as z \to 0 (for N > 1). After using \varphi_D(0, N; k) = 0 and eliminating \Pr[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}], the recursion relation for the probability generating function becomes⁵

\varphi_D(z, N+1; k) = \left(1 + \left(\frac{\Pr[n_{N+1} \in D_N^{(k)}]}{z} - \Pr[n_{N+1} \in D_N^{(k-1)}]\right)(1-z)\right)\varphi_D(z, N; k)    (16.33)

The special case for k = 1 and N > 1 is

\varphi_D(z, N+1; 1) = z\left(1 + \frac{1-z}{z}\,\Pr[n_{N+1} \in D_N^{(1)}]\right)\varphi_D(z, N; 1)

In a URT, the new node attaches to a uniformly chosen node, so that \Pr[n_{N+1} \in D_N^{(k)}] equals \frac{E[D_N^{(k)}]}{N}, and (16.33) becomes

\varphi_D(z, N+1; k) = \left(1 + \left(\frac{E[D_N^{(k)}]}{Nz} - \frac{E[D_N^{(k-1)}]}{N}\right)(1-z)\right)\varphi_D(z, N; k)    (16.34)

from which the mean obeys, for k \ge 2,

N\, E\left[D_{N+1}^{(k)}\right] = (N-1)\, E\left[D_N^{(k)}\right] + E\left[D_N^{(k-1)}\right]    (16.35)

⁵ With the initialization \varphi_D(z, m; k) = E[z^{D_m^{(k)}}] = 1 for all m \le k, because D_m^{(k)} = 0 if 1 < m \le k, iterating (16.33) leads to

\varphi_D(z, N; k) = \prod_{m=k}^{N-1}\left(1 + \left(\frac{\Pr[n_{m+1} \in D_m^{(k)}]}{z} - \Pr[n_{m+1} \in D_m^{(k-1)}]\right)(1-z)\right)
For large N, and using the asymptotics of the Stirling numbers S_N^{(k)} of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3.III), the asymptotic law is

\Pr[D_{URT} = k] = \frac{E\left[D_N^{(k)}\right]}{N} = \frac{1}{2^k} + O\!\left(\frac{\log^{k-1} N}{N^2}\right)    (16.38)

The ratio of the average number of nodes with degree k over the total number of nodes, which equals the probability that an arbitrary node in a URT of size N has degree k, decreases exponentially fast with rate \ln 2.
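The geometric law (16.38) shows up already in modest simulations; urt_degrees is our own helper name:

```python
import random

def urt_degrees(N, rng):
    """Degrees of all nodes in a URT of size N grown by uniform attachment."""
    deg = [0] * (N + 1)
    for j in range(2, N + 1):
        parent = rng.randrange(1, j)   # uniform on 1..j-1
        deg[parent] += 1
        deg[j] += 1
    return deg[1:]

rng = random.Random(11)
N, trees = 2000, 200
counts = {}
for _ in range(trees):
    for d in urt_degrees(N, rng):
        counts[d] = counts.get(d, 0) + 1
for k in (1, 2, 3, 4):
    print(k, round(counts[k] / (N * trees), 3), 2.0 ** -k)
```

The empirical fractions land close to 1/2, 1/4, 1/8 and 1/16, as (16.38) predicts.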
The variance Var[D_N^{(k)}] is most conveniently computed from the logarithm of the probability generating function with (2.27). By taking the logarithm of both sides in (16.34), differentiating twice and adding (16.35), we obtain

Var\left[D_{N+1}^{(k)}\right] = f(N; k) + Var\left[D_N^{(k)}\right]

where

f(N; k) = -\left(\frac{E[D_N^{(k)}]}{N} - \frac{E[D_N^{(k-1)}]}{N}\right)^2 + \frac{E[D_N^{(k)}]}{N} + \frac{E[D_N^{(k-1)}]}{N}
Since Var[D_m^{(k)}] = 0 for m \le k, the general solution is

Var\left[D_N^{(k)}\right] = \sum_{j=k}^{N-1} f(j; k)

For large N,

\frac{Var\left[D_N^{(k)}\right]}{N} = \frac{3}{2^k} - \frac{1}{2^{2k}} + O\!\left(\frac{\log^{2k-2} N}{N^2}\right)    (16.39)
In practice, if we use the estimator t_N^{(k)} = \frac{D_N^{(k)}}{N} for the probability that the degree of a node equals k, then (a) the estimator is unbiased, because the mean of the estimator E[t_N^{(k)}] equals the correct mean \frac{E[D_N^{(k)}]}{N}, and (b) the variance Var[t_N^{(k)}] = Var\left[\frac{D_N^{(k)}}{N}\right] = \frac{Var[D_N^{(k)}]}{N^2} \to 0 as O\!\left(\frac{1}{N}\right) for large N.
Fig. 16.6. The histogram of the degree D_U derived from the graph G_U formed by the union of paths measured via trace-route in the Internet. Both measurements, in 2003 and 2004, are fitted on a log-lin plot and the correlation coefficient quantifies the quality of the fit.
The law (16.38) is observed in Fig. 16.6, which plots the histogram of the degree D_U in the graph G_U. The graph G_U is obtained from the union of trace-routes from each RIPE measurement box to any other box positioned
16.6.3 The degree of the shortest path tree in the complete graph with i.i.d. exponential link weights

In the complete graph K_N with i.i.d. exponential link weights, any node n possesses equal properties in probability because of symmetry. If we denote by d_n the degree of node n in the shortest path tree rooted at that node n, the symmetry implies that \Pr[d_n = k] = \Pr[d_i = k] for any nodes n and i. In fact, we consider here the degree of a URT as an overlay tree in a complete graph. Concentrating on the node with label 1, we obtain from (16.32)

E\left[D_N^{(k)}\right] = N\,\Pr[d_1 = k] = N\,\Pr\left[X_N^{(1)} = k\right]

The latter follows from the fact that the degree of a node is equal to the number of its direct neighbors, the nodes at level 1, X_N^{(1)}. By definition of the URT, the second node surely belongs to the level set 1, while node 3 has equal probability to be attached to the root or to node 2. In general, when attaching a node j to a URT of size j-1, the probability that node j is attached to the root equals \frac{1}{j-1}. Thus, the number of nodes at level 1 in the URT (constructed upon the complete graph) is in distribution equal to the sum of N-1 independent Bernoulli random variables, each with a different mean \frac{1}{j-1},

X_N^{(1)} \overset{d}{=} \sum_{j=2}^{N}\text{Bernoulli}\left(\frac{1}{j-1}\right) = \sum_{j=1}^{N-1}\text{Bernoulli}\left(\frac{1}{j}\right)
Using the probability generating function (3.1) of a Bernoulli random variable, E\left[z^{\text{Bernoulli}(1/j)}\right] = 1 - \frac{1}{j} + \frac{z}{j}, yields

E\left[z^{X_N^{(1)}}\right] = \prod_{j=1}^{N-1}\frac{z + j - 1}{j} = \frac{\Gamma(z + N - 1)}{\Gamma(z)\,\Gamma(N)} = z\,\varphi_{h_{N-1}}(z)

so that, with (16.8),

\Pr[d_n = k] = \Pr[h_{N-1} = k - 1] = \frac{(-1)^{N-1-k}\, S_{N-1}^{(k)}}{(N-1)!}    (16.40)
The degree of an arbitrary node in the union of all shortest path trees in the complete graph K_N with i.i.d. exponential link weights is also given by (16.40), because in that union each node n is once a root and further plays, by symmetry, the role of the j-th attached node in the URT rooted at any other node in K_N.
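The Bernoulli-sum representation gives the exact pmf of the root degree by a direct convolution, without Stirling numbers; degree_pmf is our own helper name:

```python
def degree_pmf(N):
    """Pmf of the root degree X_N^{(1)} = sum over j = 1..N-1 of
    independent Bernoulli(1/j) variables (the URT level-1 size)."""
    p = [1.0]                       # pmf of the empty sum
    for j in range(1, N):
        q = [0.0] * (len(p) + 1)
        for k, pk in enumerate(p):
            q[k] += (1 - 1 / j) * pk
            q[k + 1] += pk / j
        p = q
    return p

p = degree_pmf(20)
mean = sum(k * pk for k, pk in enumerate(p))
# by linearity, E[X_N^{(1)}] = sum_{j=1}^{N-1} 1/j
print(abs(sum(p) - 1) < 1e-12, abs(mean - sum(1 / j for j in range(1, 20))) < 1e-12)
```

The mean check is just linearity of expectation over the Bernoulli terms; the full vector p reproduces (16.40).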
16.7 The minimum spanning tree

From an algorithmic point of view, the shortest path problem is closely related to the computation of the minimum spanning tree (MST). The Dijkstra shortest path algorithm is similar to Prim's minimum spanning tree algorithm (Cormen et al., 1991). In this section, we compute the average weight of the MST in a graph with a general link weight structure.
Fig. 16.8. An intermediate stage in the Kruskal growth process of the MST, illustrating the six types of links labelled a-f.
We will now transform the mean degree \mu_{rg} in the random graph G_p(N) to the mean degree \mu_{MST} in the corresponding stage of the Kruskal growth process of the MST. In the early stages of the growth, each selected link will be added with high probability, such that \mu_{MST} \approx \mu_{rg} almost surely. After some time, the probability that a selected link is forbidden increases, and thus \mu_{rg} exceeds \mu_{MST}. In the end, when connectivity of all N nodes is reached, \mu_{MST} = 2 (since it is a tree) while \mu_{rg} = O(\log N), as follows from (15.19) and the critical threshold p_c \sim \frac{\log N}{N}.
Consider now an intermediate stage of the growth, as illustrated in Fig. 16.8. Assume there is a giant component of average size NS and n_l = N(1-S)/s_l small components of average size s_l each. Then we can distinguish six types of links, labelled a-f in Fig. 16.8. Types a and b are links that have been chosen earlier in the giant component (a) and in the small components (b), respectively. Types c and d are eligible links between the giant component and a small component (c) and between small components (d), respectively. Types e and f are forbidden links connecting nodes within the giant component (e), respectively within a small component (f). For large N, we can enumerate the average number of links L_x of each type x:

L_a + L_b = \tfrac{1}{2}\mu_{MST} N
L_c = SN(1-S)N
L_d = \tfrac{1}{2} n_l^2 s_l^2 \simeq \tfrac{1}{2} N^2 (1-S)^2
L_e = \tfrac{1}{2}\left((SN)^2 - SN\right) \simeq \tfrac{1}{2} N^2 S^2
L_f = \tfrac{1}{2} n_l s_l (s_l - 1)

For large N, L_f is negligible compared to the other eligible and forbidden links, so that the probability q that a randomly selected new link is acceptable equals

q = \frac{L_c + L_d}{L_c + L_d + L_e + L_f} = 1 - S^2    (16.41)
In contrast with the growth of the random graph G_p(N), where at each stage a link is added with probability p, in the Kruskal growth of the MST we only succeed in adding one link (with probability 1) per 1/q stages on average. Thus the average number of links added in the random graph corresponding to one link in the MST is \frac{1}{q} = \frac{1}{1-S^2}. This provides an asymptotic mapping between \mu_{rg} and \mu_{MST} in the form of a differential equation,

\frac{d\mu_{rg}}{d\mu_{MST}} = \frac{1}{1-S^2}

By using (15.22), we find

\frac{d\mu_{MST}}{dS} = \frac{d\mu_{MST}}{d\mu_{rg}}\,\frac{d\mu_{rg}}{dS} = \frac{(1+S)\left(S + (1-S)\log(1-S)\right)}{S^2}

Integration with the initial condition \mu_{MST} = 2 at S = 1 finally gives the average degree \mu_{MST} in the MST as a function of the fraction S of nodes in the giant component,

\mu_{MST}(S) = 2S - \frac{(1-S)^2\log(1-S)}{S}    (16.42)

As shown in Fig. 16.9, the asymptotic result (16.42) agrees well with the simulation (even for a single sample), except in a small region around the transition \mu_{MST} = 1 and for relatively small N.
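One can verify numerically that (16.42) indeed solves the differential equation with the stated boundary value; the function names below are ours:

```python
import math

def mu_mst(S):
    # average MST degree as a function of the giant-component fraction S, (16.42)
    return 2 * S - (1 - S) ** 2 * math.log(1 - S) / S

def dmu_dS(S):
    # right-hand side of the differential equation for d(mu_MST)/dS
    return (1 + S) * (S + (1 - S) * math.log(1 - S)) / S ** 2

# finite-difference check of the ODE, and the boundary value mu_MST(1) = 2
S, h = 0.6, 1e-6
num = (mu_mst(S + h) - mu_mst(S - h)) / (2 * h)
print(abs(num - dmu_dS(S)) < 1e-5, round(mu_mst(0.999999), 3))  # True 2.0
```

Expanding (16.42) for small S also gives \mu_{MST}(0^+) = 1, consistent with the transition region mentioned above.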
Fig. 16.9. Size of the giant component (divided by N) as a function of the mean degree \mu_{MST}. Each simulation, for N = 1000, 10000 and 25000 nodes, consists of one MST sample.

The key observation is that all transition probabilities in the Kruskal growth process asymptotically depend on merely one parameter S, the fraction of nodes in the giant component; S is called an order parameter in statistical physics. In general, the expectation of an order parameter distinguishes the qualitatively different regimes (states) below and above the phase transition. In higher dimensions, fluctuations of the order parameter around the mean can be neglected and the mean value can be computed from a self-consistent mean-field theory. In our problem, the underlying complete (or random) graph topology makes the problem effectively infinite-dimensional. The argument leading to (15.20) is essentially a mean-field argument.
The weight of the MST is

W_{MST} = \sum_{j=1}^{L} w_{(j)}\, 1_{j \in MST}    (16.43)

where w_{(j)} is the j-th smallest link weight and L is the total number of links. The average MST weight is

E[W_{MST}] = \sum_{j=1}^{L} E\left[w_{(j)}\, 1_{j \in MST}\right]

The random variables w_{(j)} and 1_{j \in MST} are independent, because the j-th smallest link weight w_{(j)} only depends on the link weight distribution and the number of links L, while the appearance of the j-th link in the MST only depends on the graph's topology, as shown in Section 16.7.1. Hence,

E[W_{MST}] = \sum_{j=1}^{L} E\left[w_{(j)}\right]\Pr[j \in MST]    (16.44)
m=1
I1 h3
2
(m3)2
2 2
, which
we
{m = H z(m) ' Iz31 ( Om ).
peaks at m = . For large Q and xed
We found before in (16.41) that the link ranked m appears in the MST
with probability
have7
Pr [m 5 MST] = 1 Vm2
where Vm is the fraction of nodes in the giant component during the construction process of the random graph at the stage where the number of
links precisely equals m. Since links are added independently, that stage in
fact establishes the random graph Ju (Q> O = m). Our graph under
Q consideration is the complete graph NQ such that we add in total O = 2 links.
⁷ In general, it holds that w_{(k)} = F_w^{-1}(U_{(k)}) and

E\left[w_{(k)}\right] = E\left[F_w^{-1}(U_{(k)})\right] \neq F_w^{-1}\left(E[U_{(k)}]\right)

but, for a large number L of order statistics, the Central Limit Theorem 6.3.1 leads to

E\left[w_{(k)}\right] \simeq F_w^{-1}\left(E[U_{(k)}]\right) \simeq F_w^{-1}\left(\frac{k}{L}\right)

because, for a uniform random variable U on [0,1], the average of the k-th smallest value is exactly

E\left[U_{(k)}\right] = \frac{k}{L+1} \simeq \frac{k}{L}
378
2O
Q,
it follows that
log(1 Vm )
2m
=
Q
Vm
Hence,
H [ZMST ] '
O
X
m=1
Iz31
(16.46)
m
1 Vm2
O
Iz31
H [ZMST ] '
1 Vx2 gx
O
1
Substituting { = 2x
Q (which is the average degree in any graph J (Q> x))
2
yields for large Q where O ' Q2 ,
Z
Z
Q Q 31 {
Q Q 3131 {
2
1 V Q { g{ '
Iz
Iz
H [ZMST ] '
1 V 2 ({) g{
2
2
Q
2 0
Q
2
Q
It is known (Janson et al., 1993) that, if the number of links in the growth
process of the random graph is below Q2 , with high probability (and ignoring
a small onset region just below Q2 ), there is no giant component such that
V ({) = 0 for { 5 [0> 1]. Thus, we arrive at the general formula valid for
large Q ,
Z
Z
Q Q 31 {
Q 1 31 {
g{ +
Iz
Iz
H [ZMST ] '
1 V 2 ({) g{
2 0
Q
2 1
Q
(16.47)
The rst term is the contribution from the smallest Q@2 links in the graph,
which are included in the MST almost surely. The remaining part comes
from the more expensive links in the graph, which are included with diminishing probability since 1 V 2 ({) decreases exponentially for large { as
can be deduced from (15.21). The rapid decrease
of 1 V 2 ({) makes only
infinity. These cases occur, for example, for polynomial link weights with f_w(x) = \alpha x^{\alpha-1} and \alpha \neq 1. For polynomial link weights, however, it holds that \frac{N}{2} F_w^{-1}\left(\frac{x}{N}\right) = \frac{N^{1-1/\alpha}}{2}\, x^{1/\alpha}. Formally, this latter expression reduces to the first order Taylor approach for \alpha = 1, apart from the constant factor \frac{1}{f_w(0)}. Therefore, we will first compute E[W_{MST}] for polynomial link weights and then return to the case in which the Taylor expansion is useful.
16.7.2.1 Polynomial link weights

The average weight of the MST for polynomial link weights F_w(x) = x^\alpha follows⁸ from (16.47) as

E[W_{MST}(\alpha)] \simeq \frac{N^{1-1/\alpha}}{2}\left(\frac{1}{\frac{1}{\alpha}+1} + \int_1^{N} x^{1/\alpha}\left(1 - S^2(x)\right) dx\right)

Let y = S(x) and use (15.22); then x = S^{-1}(y) = -\frac{\log(1-y)}{y} and dx = \frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right) dy, while y = S(1) = 0 and y = S(N) = 1, such that

\int_1^{N} x^{1/\alpha}\left(1 - S^2(x)\right) dx = \int_0^1\left(-\frac{\log(1-y)}{y}\right)^{1/\alpha}\left(1 - y^2\right)\frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right) dy

Partial integration and the substitution y = 1 - e^{-x} lead to

E[W_{MST}(\alpha)] \simeq N^{1-1/\alpha}\,\frac{\alpha}{\alpha+1}\int_0^{\infty}\frac{x^{\frac{1}{\alpha}+1}\, e^{-x}}{\left(1 - e^{-x}\right)^{1/\alpha}}\, dx    (16.48)

For \alpha = 1, the integral equals \int_0^{\infty}\frac{x^2 e^{-x}}{1 - e^{-x}}\, dx = 2\zeta(3), so that

E[W_{MST}(1)] = \zeta(3)    (16.49)

where we have used (Abramowitz and Stegun, 1968, Section 23.2.7) the integral of the Riemann Zeta function, \zeta(s)\Gamma(s) = \int_0^{\infty}\frac{u^{s-1}}{e^u - 1}\, du, which is convergent for Re(s) > 1. This particular case for \alpha = 1 has been proved earlier by Frieze (1985), based on a different method.

⁸ Since the average of the k-th smallest link weight can be computed from (3.36) as

E\left[w_{(k)}\right] = \frac{L!}{\Gamma\left(L+1+\frac{1}{\alpha}\right)}\,\frac{\Gamma\left(k+\frac{1}{\alpha}\right)}{\Gamma(k)}

the exact formula (16.44) reduces to

E[W_{MST}(\alpha)] = \frac{L!}{\Gamma\left(L+1+\frac{1}{\alpha}\right)}\sum_{j=1}^{L}\frac{\Gamma\left(j+\frac{1}{\alpha}\right)}{\Gamma(j)}\left(1 - S_j^2\right)
16.7.2.2 Generalizations

We now return to the Taylor series, valid for link weights with 0 < f_w(0) < \infty. The above result for \alpha = 1 immediately yields

E[W_{MST}] = \frac{\zeta(3)}{f_w(0)}    (16.50)
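Frieze's \zeta(3) limit is easy to observe by running Kruskal's algorithm on K_N with uniform(0,1) weights (so that f_w(0) = 1); mst_weight_complete is our own helper name:

```python
import random

def mst_weight_complete(N, rng):
    """Kruskal on K_N with i.i.d. uniform(0,1) link weights, so that
    (16.50) predicts E[W_MST] -> zeta(3) for large N."""
    edges = sorted((rng.random(), i, j)
                   for i in range(N) for j in range(i + 1, N))
    parent = list(range(N))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    total, used = 0.0, 0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # the link is eligible (no cycle)
            parent[ri] = rj
            total += w
            used += 1
            if used == N - 1:
                break
    return total

rng = random.Random(5)
avg = sum(mst_weight_complete(200, rng) for _ in range(40)) / 40
print(abs(avg - 1.202) < 0.08)   # zeta(3) = 1.20206...
```

Even at N = 200 the average MST weight is already within a few percent of \zeta(3).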
16.8.1 The case k = N-1: E[D_N^{(N-1)}]

A node of degree N can only appear in a URT of size N+1 if the new node attaches to the (necessarily unique) node of degree N-1, so that the mean recursion reduces to E\left[D_{N+1}^{(N)}\right] = \frac{E[D_N^{(N-1)}]}{N}. With E[D_2^{(1)}] = 2, the solution

E\left[D_N^{(N-1)}\right] = \frac{2}{(N-1)!}    (16.51)

is readily verified. Since for any URT it holds that \Pr[D_N^{(k)} = j] = 0 for j > N-k, we have that E[D_N^{(N-1)}] = \Pr[D_N^{(N-1)} = 1]. Since there exist in total (N-1)! different URTs of size N, this result (16.51) means that there are precisely two possible URTs with a node of degree N-1. Indeed, one is the root with N-1 children and the other is the root with one child of degree N-1 that in turn possesses N-2 children. Also,

r_N^{(N-1)} = (N-1)\, E\left[D_N^{(N-1)}\right] = \frac{2}{(N-2)!}    (16.52)
16.8.2 The case k = 1: E[D_N^{(1)}]
If k = 1 and N \ge 3, the recursion (16.35) is slightly different, because the newly attached node n_{N+1} necessarily belongs to the set of degree-1 nodes in the URT of size N+1, such that

E\left[D_{N+1}^{(1)}\right] = \frac{N-1}{N}\, E\left[D_N^{(1)}\right] + 1

with D_1^{(0)} = 1 and D_2^{(1)} = 2. Hence, E[D_3^{(1)}] = 2. With r_N^{(1)} = (N-1)\, E[D_N^{(1)}], the recursion for k = 1 becomes

r_{N+1}^{(1)} = r_N^{(1)} + N

whose general solution is

E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{c}{N-1}

Using E[D_3^{(1)}] = 2 shows that c = 1, such that, for N > 2,

E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{1}{N-1}    (16.53)
16.8.3 The general case: E[D_N^{(k)}]

Let us denote

R(x, y) = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k)}\, x^k y^N    (16.54)
In terms of r_N^{(k)} = (N-1)\, E[D_N^{(k)}], the mean recursion (16.35) reads, for k \ge 2,

(N-1)\, r_{N+1}^{(k)} = (N-1)\, r_N^{(k)} + r_N^{(k-1)}    (16.36)

Multiplying (16.36) by x^k y^N and summing over all N \ge 3 and 2 \le k \le N-1 translates the recursion into a differential equation for R(x, y). For the left-hand side, shifting the summation index and using r_N^{(N-1)} = \frac{2}{(N-2)!} from (16.52), together with

\sum_{N=3}^{\infty}(N-2)\frac{(xy)^{N-1}}{(N-2)!} = \sum_{N=3}^{\infty}\frac{(xy)^{N-1}}{(N-3)!} = (xy)^2 e^{xy}

yields

\sum_{N=3}^{\infty}\sum_{k=2}^{N-1}(N-1)\, r_{N+1}^{(k)}\, x^k y^N = \frac{\partial R(x,y)}{\partial y} - \frac{2}{y}\, R(x,y) - 2(xy)^2 e^{xy}
Similarly, for the right-hand side,

\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k-1)}\, x^k y^N = x\sum_{N=3}^{\infty}\sum_{k=1}^{N-2} r_N^{(k)}\, x^k y^N = x\, R(x,y) + x^2\sum_{N=3}^{\infty} r_N^{(1)}\, y^N - x\sum_{N=3}^{\infty} r_N^{(N-1)}\, x^{N-1} y^N

Invoking r_N^{(1)} = (N-1)\, E[D_N^{(1)}] = \frac{N(N-1)}{2} + 1 leads to

x^2\sum_{N=3}^{\infty} r_N^{(1)}\, y^N = \frac{(xy)^2}{2}\,\frac{d^2}{dy^2}\left(\frac{y^3}{1-y}\right) + \frac{x^2 y^3}{1-y}

while

x\sum_{N=3}^{\infty} r_N^{(N-1)}\, x^{N-1} y^N = 2\sum_{N=3}^{\infty}\frac{(xy)^N}{(N-2)!} = 2(xy)^2\left(e^{xy} - 1\right)

such that

\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k-1)}\, x^k y^N = x\, R(x,y) + \frac{(xy)^2}{2}\,\frac{d^2}{dy^2}\left(\frac{y^3}{1-y}\right) + \frac{x^2 y^3}{1-y} - 2(xy)^2\left(e^{xy} - 1\right)
Combining all transforms the recursion (16.36) into a first order linear partial differential equation,

(1-y)\,\frac{\partial R(x,y)}{\partial y} + \left(1 - x - \frac{2}{y}\right) R(x,y) = x^2 y^2\left(\frac{1}{1-y} + \frac{1}{(1-y)^3}\right)
Since \frac{r_N^{(k)}}{N-1} = E[D_N^{(k)}], dividing by y^2 and integrating yields

\int\frac{R(x,y)}{y^2}\, dy = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1}\frac{r_N^{(k)}}{N-1}\, x^k y^{N-1} = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} E\left[D_N^{(k)}\right] x^k y^{N-1}

Hence, if x = 1, using \sum_{k=2}^{N-1} E[D_N^{(k)}] = N - E[D_N^{(1)}] and (16.53),

\int\frac{R(1,y)}{y^2}\, dy = \sum_{N=3}^{\infty}\left(\frac{N}{2} - \frac{1}{N-1}\right) y^{N-1} = \frac{1}{2}\left(\frac{1}{(1-y)^2} - 1 - 2y\right) + y + \log(1-y)

or

R(1, y) = \frac{y^2}{(1-y)^3} - \frac{y^2}{1-y}    (16.55)
The homogeneous solution of the differential equation is R_h(x,y) = (1-y)^{-x-1} y^2, and variation of the constant gives

R(x,y) = y^2 (1-y)^{-x-1}\left(x^2\int\left((1-y)^{x-3} + (1-y)^{x-1}\right) dy + c(x)\right)
 = y^2 (1-y)^{-x-1}\left(-\frac{x^2 (1-y)^{x-2}}{x-2} - x(1-y)^x + c(x)\right)
 = -\frac{(xy)^2}{(x-2)(1-y)^3} - \frac{x y^2}{1-y} + c(x)\,(1-y)^{-x-1} y^2

The initial condition R(0,y) = 0 shows that c(0) = 0, while the boundary condition (16.55) implies that c(1) = 0. Expanding the homogeneous solution in a power series around x = 0 and y = 0 yields

R_h(x,y) = (1-y)^{-x-1} y^2 = \sum_{N=0}^{\infty}\binom{-x-1}{N}(-1)^N y^{N+2}
From the generating function of the Stirling numbers of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3),

\sum_{j=0}^{n} S_n^{(j)}\, x^j = \frac{\Gamma(x+1)}{\Gamma(x+1-n)}    (16.56)

we observe that

\binom{-x-1}{N} = \frac{\Gamma(-x)}{N!\,\Gamma(-x-N)} = \sum_{k=0}^{N}\frac{S_{N+1}^{(k+1)}(-1)^k}{N!}\, x^k

such that

R_h(x,y) = \sum_{N=0}^{\infty}\sum_{k=0}^{N}\frac{S_{N+1}^{(k+1)}}{N!}(-1)^{N+k}\, x^k y^{N+2} = \sum_{N=2}^{\infty}\sum_{k=0}^{N-2}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k}\, x^k y^N
Hence, expanding also the particular solution, with \frac{x^2}{2-x} = \sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} and \frac{y^2}{(1-y)^3} = \sum_{N=2}^{\infty}\binom{N}{2} y^N,

R(x,y) = \sum_{N=2}^{\infty}\sum_{k=2}^{\infty}\frac{\binom{N}{2}}{2^{k-1}}\, x^k y^N - x\sum_{N=2}^{\infty} y^N + c(x)\sum_{N=2}^{\infty}\sum_{k=0}^{N-2}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k}\, x^k y^N

It remains to determine c(x) by equating the corresponding powers in x and y at both sides. With the definition (16.54), R(x,y) contains no y^2 term, so that equating the second power (N = 2) in y yields

0 = \sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} - x + c(x)

or

c(x) = x - \frac{x^2}{2-x}

Writing c(x) = \sum_{k=0}^{\infty} c_k x^k with c_0 = 0,
we have c_1 = 1 and c_k = -2^{1-k} for k \ge 2. Equating the coefficient of x^k y^N for N \ge 3 and k \ge 2 then gives

r_N^{(k)} = \frac{\binom{N}{2}}{2^{k-1}} + \sum_{j=0}^{k-1} c_{k-j}\,(-1)^{N+j}\,\frac{S_{N-1}^{(j+1)}}{(N-2)!}

while the coefficient of the first power of x vanishes, since c_1 (-1)^N S_{N-1}^{(1)}/(N-2)! = 1 cancels the term -x.
1
16.9 Problems
385
uQ =
(m+1)
Q n31
[
VQ 31
(31)Q +m
f
+
n3m
2n31 2
(Q 3 2)!
m=0
Since the N = 2 relation shows that c_1 = 1 and c_j = -2^{-(j-1)} for j \geq 2, the coefficients can be written explicitly as

r_N^{(k)} = \binom{N}{2}\frac{1}{2^{k-1}} + \frac{(-1)^{N+k-1}S_{N-1}^{(k)}}{(N-2)!} + \frac{(-1)^{N}}{2^{k}(N-2)!}\sum_{j=1}^{k-1}S_{N-1}^{(j)}(-2)^{j}

For k = N-1, the Stirling sums can be evaluated in closed form with (16.56). Also E\left[D_N^{(N-1)}\right] = \frac{1}{(N-1)!} is readily verified: only the star topology contains a node of degree N-1, and a URT of size N is a star with probability \prod_{j=2}^{N}\frac{1}{j-1} = \frac{1}{(N-1)!}.
16.9 Problems
(i) Comparison of simulations with exact results. Many of the theoretical results are easily verified by simulations. Consider the following standard simulation: (a) construct a graph of a certain class, e.g. an instance of the random graph G_p(N) with exponentially distributed link weights; (b) determine in that graph a desired property, e.g. the hopcount of the shortest path between two different arbitrary nodes; (c) store the hopcount in a histogram; and (d) repeat the sequence (a)-(c) n times, each time with a different graph instance in (a). Estimate the relative error of the simulated hopcount in G_p(N) with p = 1 for n = 10^4, 10^5 and 10^6.
(ii) Given the probability generating function (16.17) of the weight of the shortest path in a complete graph with independent exponential link weights, compute the variance of W_N.
(iii) Prove the asymptotic law (16.20) of the weight of the shortest path in a complete graph with i.i.d. exponential link weights.
(iv) In a communication network, often two paths are computed for each important flow to guarantee sufficient reliability. Apart from the shortest path between a source A and a destination B, a second path between A and B is chosen that does not travel over any intermediate router of the shortest path. We call such a path node-disjoint to the shortest path. Derive a good approximation for the distribution of
17
The efficiency of multicast

(Footnote: The assumption ignores shared-tree multicast forwarding such as core-based trees (CBT, see RFC 2201).)

... that, if m and N are large, deviations from the uniformity assumption are negligibly small. Also, the Internet measurements of Chalmers and Almeroth (2001) seem to confirm the validity of the uniformity assumption.
17.1 General results for g_N(m)

Theorem 17.1.1 For any connected graph with N nodes,

m \leq g_N(m) \leq \frac{mN}{m+1}   (17.1)

Proof: We need at least one edge for each different user; therefore g_N(m) \geq m, and the lower bound is attained in a star topology with the source at the center.
We will next show that the upper bound is obtained in a line topology. It is sufficient to consider trees, because multicast only uses shortest paths without cycles. If the tree is not a line topology, then at least one node has degree at least 3 or the root has degree at least 2. Take the node closest to the root with this property and cut one of the branches at this node; we paste that branch to a node at the deepest level. Through this procedure the multicast function g_N(m) stays unaltered or increases. Continuing in this fashion until we reach a line topology demonstrates the claim.
For the line topology we place the source at the origin and the other nodes at the integers 1, 2, ..., N-1. The links of the graph are given by (i, i+1), i = 0, 1, ..., N-2. The multicast gain g_N(m) equals E[M], where M is the maximum of a sample of size m, without replacement, from the integers 1, 2, ..., N-1. Thus,
Pr[M \leq k] = \frac{\binom{k}{m}}{\binom{N-1}{m}}, \qquad m \leq k \leq N-1

so that

g_N(m) = E[M] = \sum_{k=m}^{N-1} k\,\frac{\binom{k-1}{m-1}}{\binom{N-1}{m}} = \frac{m}{\binom{N-1}{m}}\sum_{k=m}^{N-1}\binom{k}{m} = \frac{mN}{m+1}\sum_{k=m}^{N-1}\frac{\binom{k}{m}}{\binom{N}{m+1}} = \frac{mN}{m+1}

where we have used that \sum_{k=m}^{N-1}\binom{k}{m}\big/\binom{N}{m+1} = 1, because it is a sum of probabilities over all possible disjoint outcomes.
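The order-statistic computation above is easy to check numerically. The following sketch (in Python; the helper names are ours, not from the text) compares the closed form E[M] = mN/(m+1) with exact enumeration over all m-subsets and with a Monte-Carlo estimate.

```python
from fractions import Fraction
from itertools import combinations
import random

def mean_max_exact(N, m):
    """Closed form E[M] = m*N/(m+1) for the maximum M of a sample of
    size m drawn without replacement from {1, ..., N-1}."""
    return Fraction(m * N, m + 1)

def mean_max_enumerated(N, m):
    """Brute-force average of max over all m-subsets of {1, ..., N-1}."""
    subsets = list(combinations(range(1, N), m))
    return Fraction(sum(max(s) for s in subsets), len(subsets))

def mean_max_sampled(N, m, runs=50_000, seed=42):
    """Monte-Carlo estimate of E[M]."""
    rng = random.Random(seed)
    pop = range(1, N)
    return sum(max(rng.sample(pop, m)) for _ in range(runs)) / runs
```

For instance, mean_max_enumerated(10, 3) returns exactly 15/2 = 3*10/4, in agreement with the closed form.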
Fig. 17.1. The allowable region (in white) of g_N(m). For exponentially growing graphs, E[H_N] = c log N, implying that the allowable region for these graphs is smaller and bounded at the left (dotted line) by the straight line m(c log N).
Theorem 17.1.2 For any connected graph with N nodes, the map m \mapsto g_N(m) is concave and the map m \mapsto \frac{g_N(m)}{f_N(m)} is decreasing.

Proof: Define Y_m to be the random variable giving the additional number of hops necessary to reach the m-th user when the first m-1 users are already connected. Then we have that

E[Y_m] = g_N(m) - g_N(m-1)

Moreover, let Y'_m be the random number of additional hops necessary to reach the m-th multicast group member, when we discard all extra hops of the (m-1)-st group member. An example is illustrated in Fig. 17.2. The random variable Y'_m has the same distribution as Y_{m-1}, because both the (m-1)-st and the m-th group member are chosen uniformly from the remaining N-m+1 nodes. In general, Y'_m \neq Y_{m-1}, but, for each k, Pr[Y'_m = k] = Pr[Y_{m-1} = k] and, hence,

E[Y'_m] = E[Y_{m-1}]   (17.2)

Furthermore, we have by construction that Y_m \leq Y'_m with probability 1, implying that

E[Y_m] \leq E[Y'_m]   (17.3)

Indeed, attaching the m-th group member to the reduced tree takes at least as many hops as attaching that same group member to the non-reduced tree, because the former is contained in the latter and the extra hops added by the (m-1)-st group member can only help us. Combining (17.2) and (17.3) immediately gives

g_N(m) - g_N(m-1) = E[Y_m] \leq E[Y'_m] = E[Y_{m-1}] = g_N(m-1) - g_N(m-2)   (17.4)

This is equivalent to the concavity of the map m \mapsto g_N(m).
Next, we will give a representation for g_N(m) that is valid for all graphs. Let X_i be the number of joint hops that all i uniformly chosen and different group members have in common; then the following general theorem holds.

Theorem 17.1.3 For any connected graph with N nodes,

g_N(m) = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1}E[X_i]   (17.5)

Note that

g_N(1) = f_N(1) = E[X_1] = E[H_N]

so that the decrease in average hops, or the gain by using multicast over unicast, is precisely

g_N(m) - f_N(m) = \sum_{i=2}^{m}\binom{m}{i}(-1)^{i-1}E[X_i]

Proof: Let A_i denote the set of links of the shortest path from the source to the i-th group member. Then E[X_1] = E[|A_i|], for 1 \leq i \leq m, and

E[X_2] = E[|A_i \cap A_j|], \qquad \text{for } 1 \leq i < j \leq m

etc. Now, g_N(m) = E[|A_1 \cup A_2 \cup \cdots \cup A_m|]. Since Q(A) = E[|A|]\big/\binom{N}{2} is a probability measure on the set of all links, we obtain from the inclusion-exclusion formula (2.3) applied to Q, and multiplied with \binom{N}{2} afterwards,

E[|A_1 \cup A_2 \cup \cdots \cup A_m|] = \sum_{i=1}^{m}E[|A_i|] - \sum_{i<j}E[|A_i \cap A_j|] + \cdots + (-1)^{m-1}E[|A_1 \cap A_2 \cap \cdots \cap A_m|]
= mE[X_1] - \binom{m}{2}E[X_2] + \cdots + (-1)^{m-1}E[X_m]

This proves Theorem 17.1.3.
Corollary 17.1.5 means that for any connected graph, including the graph describing the Internet, the ratio of the unicast over multicast efficiency is bounded by the expected hopcount in unicast. In other words, the maximum savings in resources an operator can gain by using multicast (over unicast) never exceeds E[H_N], which is roughly about 15 in the current Internet.
17.2 The random graph G_p(N)

In this section, we confine ourselves to the class RGU, the random graphs of the class G_p(N) with independent, identically and exponentially distributed link weights w with mean E[w] = 1, where Pr[w \leq x] = 1 - e^{-x}, x > 0. In Section 16.2, we have shown that the corresponding SPT is, asymptotically, a URT. The analysis below is exact for the complete graph K_N, while asymptotically correct for connected random graphs G_p(N).
The pgf of the hopcount obeys the recursion

\varphi_{H_N(m)}(z) = \frac{(N-m-1)(N-1+mz)}{(N-1)^2}\,\varphi_{H_{N-1}(m)}(z) + \frac{m^2 z}{(N-1)^2}\,\varphi_{H_{N-1}(m-1)}(z)   (17.8)

Proof: To prove (17.8), we use the recursive growth of URTs: a URT of size N is a URT of size N-1, where we add an additional link to a uniformly chosen node.
Fig. 17.3. The several possible cases (A, B, C and D) in which the N-th node can be attached uniformly to the URT of size N-1. The root is dark shaded while the m multicast member nodes are lightly shaded.
\varphi_{H_N(m)}(z) = \left(1-\frac{m}{N-1}\right)\varphi_{H_{N-1}(m)}(z) + \frac{m}{N-1}\,E\left[z^{1+L_{N-1}(m)}\right]   (17.9)

where L_{N-1}(m) is the number of links in the subtree of the URT of size N-1 spanned by m-1 uniform nodes, and the one refers to the link from
the added N-th node to its ancestor in the URT of size N-1. We complete the proof by investigating the generating function of L_{N-1}(m). Again, there are two cases. In the first case (B in Fig. 17.3), the ancestor of the added N-th node is one of the m-1 previous nodes (which can only happen if it is unequal to the root); else we get one of the cases C and D in Fig. 17.3. The probability of the first event equals \frac{m-1}{N-1}, the probability of the latter equals 1 - \frac{m-1}{N-1}. If the ancestor of the added N-th node is one of the m-1 previous nodes, then the number of links L_{N-1}(m) equals H_{N-1}(m-1); otherwise the generating function of the number of additional links equals

\left(1-\frac{1}{N-m}\right)\varphi_{H_{N-1}(m)}(z) + \frac{1}{N-m}\,\varphi_{H_{N-1}(m-1)}(z)

The first contribution comes from the case where the ancestor of the added N-th node is not the root, and the second from where it is equal to the root, which has probability \frac{1}{N-1-(m-1)} = \frac{1}{N-m}. Therefore,

E\left[z^{L_{N-1}(m)}\right] = \frac{m-1}{N-1}\,\varphi_{H_{N-1}(m-1)}(z) + \left(1-\frac{m-1}{N-1}\right)\left[\left(1-\frac{1}{N-m}\right)\varphi_{H_{N-1}(m)}(z) + \frac{1}{N-m}\,\varphi_{H_{N-1}(m-1)}(z)\right]

which, combined with (17.9), proves (17.8).
Since g_N(m) = E[H_N(m)] = \varphi'_{H_N(m)}(1), we obtain the recursion for g_N(m),

g_N(m) = \left(1-\frac{m^2}{(N-1)^2}\right)g_{N-1}(m) + \frac{m^2}{(N-1)^2}\,g_{N-1}(m-1) + \frac{m}{N-1}   (17.11)
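The recursion (17.11) can be iterated with exact rational arithmetic and compared with the closed form g_N(m) = mN/(N-m) * sum_{k=m+1}^{N} 1/k of (17.14). A small sketch (Python; function names are ours):

```python
from fractions import Fraction

def g_closed(N, m):
    # closed form (17.14): g_N(m) = m*N/(N-m) * sum_{k=m+1}^{N} 1/k
    return Fraction(m * N, N - m) * sum(Fraction(1, k) for k in range(m + 1, N + 1))

def g_table(N_max):
    """Build g[N][m] from the recursion (17.11), with g_N(0) = 0.
    For m = N-1 the first coefficient vanishes, so the (out of range)
    entry g_{N-1}(N-1) is multiplied by zero and never matters."""
    g = {1: {0: Fraction(0)}}
    for N in range(2, N_max + 1):
        g[N] = {0: Fraction(0)}
        c = Fraction(1, (N - 1) ** 2)
        for m in range(1, N):
            g[N][m] = ((1 - m * m * c) * g[N - 1].get(m, Fraction(0))
                       + m * m * c * g[N - 1][m - 1]
                       + Fraction(m, N - 1))
    return g
```

Both routes agree exactly, and the values respect the bounds m <= g_N(m) <= mN/(m+1) of Theorem 17.1.1.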
Theorem 17.2.2 For all N \geq 1 and 1 \leq m \leq N-1,

\varphi_{H_N(m)}(z) = E\left[z^{H_N(m)}\right] = \frac{m!\,(N-1-m)!}{((N-1)!)^2}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\,\frac{\Gamma(N+kz)}{\Gamma(1+kz)}   (17.12)

Consequently,

Pr[H_N(m) = j] = \frac{m!\,(-1)^{N-(j+1)}\,S_N^{(j+1)}\,\mathcal{S}_j^{(m)}}{(N-1)!\,\binom{N-1}{m}}   (17.13)
where S_N^{(j)} and \mathcal{S}_j^{(m)} denote the Stirling numbers of the first and second kind (Abramowitz and Stegun, 1968, Section 24.1).

Proof: By iterating the recursion (17.8) for small values of m, the computations given in van der Hofstad et al. (2006a, Appendix) suggest the solution (17.12) for (17.8). One can verify that (17.12) satisfies (17.8). This proves (17.12) of Theorem 17.2.2. Using (Abramowitz and Stegun, 1968, Section 24.1.3.B), the Taylor expansion around z = 0 equals
\varphi_{H_N(m)}(z) = \frac{m!\,N(N-1-m)!}{(N-1)!}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\left[\frac{\Gamma(N+kz)}{N!\,\Gamma(1+kz)} - \frac{1}{N}\right]
= \frac{m!\,N(N-1-m)!}{(N-1)!}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\sum_{j=1}^{N-1}\frac{(-1)^{N-(j+1)}S_N^{(j+1)}}{N!}\,k^j z^j

Using the definition of the Stirling numbers of the second kind (Abramowitz and Stegun, 1968, 24.1.4.C),

\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}k^j = m!\,\mathcal{S}_j^{(m)}

for which \mathcal{S}_j^{(m)} = 0 if j < m, gives

\varphi_{H_N(m)}(z) = \frac{(m!)^2(N-1-m)!}{((N-1)!)^2}\sum_{j=1}^{N-1}(-1)^{N-(j+1)}S_N^{(j+1)}\mathcal{S}_j^{(m)}\,z^j
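The explicit distribution (17.13) is straightforward to evaluate with integer Stirling-number tables. The sketch below (Python; our own helper names) builds both kinds of Stirling numbers by their standard recurrences and returns the pmf of H_N(m) as exact fractions; the probabilities sum to one and reproduce the mean (17.14).

```python
from fractions import Fraction
from math import comb, factorial

def stirling1(n_max):
    """Signed Stirling numbers of the first kind s[n][k],
    via s(n, k) = s(n-1, k-1) - (n-1) s(n-1, k)."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            s[n][k] = s[n - 1][k - 1] - (n - 1) * s[n - 1][k]
    return s

def stirling2(n_max):
    """Stirling numbers of the second kind S[n][k]."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return S

def hop_pmf(N, m):
    """Pr[H_N(m) = j] per (17.13), for j = 1, ..., N-1."""
    s, S = stirling1(N), stirling2(N)
    denom = factorial(N - 1) * comb(N - 1, m)
    return {j: Fraction(factorial(m) * (-1) ** (N - (j + 1)) * s[N][j + 1] * S[j][m],
                        denom)
            for j in range(1, N)}
```

For example, hop_pmf(3, 1) gives Pr[H_3(1) = 1] = 3/4 and Pr[H_3(1) = 2] = 1/4, which is easily verified directly on the URT of size 3.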
Figure 17.4 plots the probability density function of H_50(m) for different values of m.
Corollary 17.2.3 For all N \geq 1 and 1 \leq m \leq N-1,

g_N(m) = E[H_N(m)] = \frac{mN}{N-m}\sum_{k=m+1}^{N}\frac{1}{k}   (17.14)

and

Var[H_N(m)] = g_N(m)\,\frac{N-1+m}{N+1-m} - \frac{g_N^2(m)}{N+1-m} - \frac{m^2N^2}{(N-m)(N+1-m)}\sum_{k=m+1}^{N}\frac{1}{k^2}   (17.15)

The formula (17.14) is proved in two different ways. The earlier proof presented in Section 17.6 below does not rely on the recursion in Lemma 17.2.1 nor on Theorem 17.2.2. The shorter proof is presented here. Formula (17.14) can be expressed in terms of the digamma function \psi(x) as

g_N(m) = \frac{mN}{N-m}\left[\psi(N) - \psi(m)\right] - 1   (17.16)
Fig. 17.4. The pdf of H_50(m) for m = 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 47.
Proof of Corollary 17.2.3: The expectation and variance of H_N(m) will not be obtained using the explicit probabilities (17.13), but by rewriting (17.12) as

\varphi_{H_N(m)}(z) = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\,\partial_t^{N-1}\left[t^{N-1+kz}\right]_{t=1}
= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_t^{N-1}\left[t^{N-1}(1-t^z)^m\right]_{t=1}   (17.17)

Indeed,

E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_z\partial_t^{N-1}\left[t^{N-1}(1-t^z)^m\right]_{t=z=1}
= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,m(-1)^{m-1}\,\partial_t^{N-1}\left[t^{N}\log t\,(1-t)^{m-1}\right]_{t=1}

and

E[H_N(m)(H_N(m)-1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_z^2\partial_t^{N-1}\left[t^{N-1}(1-t^z)^m\right]_{t=z=1}
= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,m(-1)^{m-1}\,\partial_t^{N-1}\left[t^{N}\log^2 t\,(1-t)^{m-2}\left(-(m-1)t + (1-t)\right)\right]_{t=1}

Using \partial_t^{i}\left[(1-t)^{j}f(t)\right]_{t=1} = j!\,(-1)^{j}\binom{i}{j}f^{(i-j)}(1) for i \geq j, we find

E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,m!\binom{N-1}{m-1}\,\partial_t^{N-m}\left[t^{N}\log t\right]_{t=1}

Since

\partial_t^{k}\left[t^{n}\log t\right]_{t=1} = \frac{n!}{(n-k)!}\sum_{j=n-k+1}^{n}\frac{1}{j}

formula (17.14) follows. Similarly,

E[H_N(m)(H_N(m)-1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(R_1 + R_2)   (17.18)

where

R_1 = m(m-1)(-1)^{m-2}\,\partial_t^{N-1}\left[t^{N+1}\log^2 t\,(1-t)^{m-2}\right]_{t=1}
R_2 = m(-1)^{m-1}\,\partial_t^{N-1}\left[t^{N}\log^2 t\,(1-t)^{m-1}\right]_{t=1}

Using

\partial_t^{k}\left[t^{n}\log^2 t\right]_{t=1} = 2\,\frac{n!}{(n-k)!}\sum_{i=n-k+1}^{n}\sum_{j=i+1}^{n}\frac{1}{ij} = \frac{n!}{(n-k)!}\left[\left(\sum_{i=n-k+1}^{n}\frac{1}{i}\right)^2 - \sum_{i=n-k+1}^{n}\frac{1}{i^2}\right]

we obtain

R_1 = m(m-1)(m-2)!\binom{N-1}{m-2}\,\partial_t^{N-m+1}\left[t^{N+1}\log^2 t\right]_{t=1} = (N+1)!\binom{N-1}{m-2}\left[\left(\sum_{k=m+1}^{N+1}\frac{1}{k}\right)^2 - \sum_{k=m+1}^{N+1}\frac{1}{k^2}\right]

Similarly,

R_2 = m(m-1)!\binom{N-1}{m-1}\,\partial_t^{N-m}\left[t^{N}\log^2 t\right]_{t=1} = N!\binom{N-1}{m-1}\left[\left(\sum_{k=m+1}^{N}\frac{1}{k}\right)^2 - \sum_{k=m+1}^{N}\frac{1}{k^2}\right]

From g_N(m) = E[H_N(m)] and Var[H_N(m)] = E[H_N(m)(H_N(m)-1)] + g_N(m) - g_N^2(m), we obtain (17.15) after simplification.
Fig. 17.5. The average number of hops g_N(m) (left axis) in the SPT and the corresponding standard deviation \sigma_N(m) (right axis) as a function of the number m of multicast group members in the complete graph with N = 1000.
Figure 17.5 also indicates that the standard deviation \sigma_N(m) of H_N(m) is much smaller than the average, even for N = 1000. In fact, we obtain from (17.15) that

\frac{Var[H_N(m)]}{g_N^2(m)} \leq \frac{N-1+m}{(N+1-m)\,g_N(m)} \leq \frac{2N}{(N+1-m)\,g_N(m)} = \frac{2(N-m)}{(N+1-m)\,m\sum_{k=m+1}^{N}\frac{1}{k}}

such that Var[H_N(m)] = o\left(g_N^2(m)\right) and, hence, \frac{H_N(m)}{g_N(m)} \to 1 in probability for N \to \infty.
The average weight of the multicast tree follows as

E[W_N(m)] = \sum_{j=1}^{m}\frac{1}{Nj}\sum_{k=j}^{N-1}\frac{1}{k}   (17.19)

In particular, if the shortest path tree spans the whole graph, then for all N \geq 2,

E[W_N(N-1)] = \sum_{n=1}^{N-1}\frac{1}{n^2}   (17.20)

which tends to \zeta(2) = \pi^2/6 for N \to \infty, while

Var[W_N(N-1)] = \frac{4}{N}\sum_{j=1}^{N-1}\frac{1}{j^3} + O\left(\frac{\log N}{N^2}\right)   (17.21)

so that, for large N,

Var[W_N(N-1)] = \frac{4\zeta(3)}{N} + O\left(\frac{\log N}{N^2}\right)   (17.22)
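The exact spanning-tree result (17.20) can be reproduced by a direct Monte-Carlo experiment: draw exponential link weights on K_N, run Dijkstra from the root, and sum the weights of the tree links. The sketch below is ours (Python, standard library only) and is only a numerical illustration, not the proof.

```python
import heapq
import random

def spt_weight(N, rng):
    """Weight of the shortest-path tree rooted at node 0 in K_N with
    i.i.d. exponential(1) link weights: the tree link toward v weighs
    dist[v] - dist[parent[v]]."""
    w = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            w[i][j] = w[j][i] = rng.expovariate(1.0)
    dist = [float("inf")] * N
    parent = [0] * N
    done = [False] * N
    dist[0] = 0.0
    pq = [(0.0, 0)]
    while pq:
        d, u = heapq.heappop(pq)
        if done[u]:
            continue
        done[u] = True
        for v in range(N):
            if v != u and d + w[u][v] < dist[v]:
                dist[v] = d + w[u][v]
                parent[v] = u
                heapq.heappush(pq, (dist[v], v))
    return sum(dist[v] - dist[parent[v]] for v in range(1, N))

def mean_spt_weight(N, runs, seed=3):
    rng = random.Random(seed)
    return sum(spt_weight(N, rng) for _ in range(runs)) / runs
```

The sample means approach sum_{n=1}^{N-1} 1/n^2 (e.g. 1 + 1/4 = 1.25 for N = 3).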
Fig. 17.6. The pdf of the normalized random variable W_N(m) for N = 100 and m = 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, together with the normalized Gumbel and the normalized Gaussian N(0,1).
a normalized Gumbel (see Theorem 6.4.1). Fig. 17.6 may suggest that, for all m < N,

\sqrt{N}\left(W_N(N-1) - \zeta(2)\right) \;\overset{d}{\to}\; N\left(0, \sigma^2_{SPT}\right), \qquad N \to \infty   (17.23)

with \sigma^2_{SPT} = 4\zeta(3) \approx 4.80823, as follows from (17.22). This shows that simulations alone may be inadequate to deduce asymptotic behavior. Finally, Janson (1995) gave the related result for the minimum spanning tree. He extended Frieze's result (16.49) by proving that the scaled weight of the minimum spanning tree also tends to a Gaussian for large N,

\sqrt{N}\left(W_{MST} - \zeta(3)\right) \;\overset{d}{\to}\; N\left(0, \sigma^2_{MST}\right)

where \sigma^2_{MST} = 6\zeta(4) - 4\zeta(3) \approx 1.6857.
Fig. 17.7. The left-hand tree (k = 2) has N = 31 and D = 4, while the right-hand tree (k = 5) has N = 31 and D = 2.

A complete k-ary tree with depth D contains

N = \frac{k^{D+1}-1}{k-1}   (17.24)

nodes, and the multicast gain equals

g_{N,k}(m) = \sum_{j=0}^{D-1}k^{D-j}\left[1 - \frac{\binom{N-1-\frac{k^{j+1}-1}{k-1}}{m}}{\binom{N-1}{m}}\right]   (17.25)

(Footnote: Wastlund (2005) succeeded in computing the triple sum in Janson's original result for \sigma^2_{MST}.)
(Footnote: The depth D is equal to the number of hops from the root to a node at the leaves.)
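Formulas (17.24) and (17.25) are easy to evaluate exactly. The sketch below (Python; our own function names) computes g_{N,k}(m) for the complete k-ary tree and checks two sanity points: the spanning case g_{N,k}(N-1) = N-1 and the unicast case m = 1, which must equal the average hopcount (17.27).

```python
from fractions import Fraction
from math import comb

def kary_nodes(k, D):
    # (17.24): N = (k^(D+1) - 1)/(k - 1)
    return (k ** (D + 1) - 1) // (k - 1)

def g_kary(k, D, m):
    """Multicast gain (17.25): a link with `sub` nodes below it is used
    unless none of the m uniformly chosen members lies below it."""
    N = kary_nodes(k, D)
    total = Fraction(0)
    for j in range(D):
        sub = (k ** (j + 1) - 1) // (k - 1)   # subtree size below the link
        p_empty = Fraction(comb(N - 1 - sub, m), comb(N - 1, m))
        total += k ** (D - j) * (1 - p_empty)
    return total
```

For k = 2, D = 4 (so N = 31, as in Fig. 17.7), g_kary(2, 4, 1) = 49/15, which equals (17.27): 4*31/30 + 4/30 - 1.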
Fig. 17.8. The multicast gain g_N(m) computed for the k-ary tree with four values of k (k = 2, 3, 5, 10), the random graph (with effective k_rg = e = 2.718...), and the Chuang-Sirbu m^{0.8} power law, for N = 10^4 on a linear scale, where the prefactor E[H_N] is given by (16.10).
Rewriting the binomial ratio in (17.25) as a product gives

g_{N,k}(m) = N - 1 - \sum_{j=0}^{D-1}k^{D-j}\prod_{q=0}^{m-1}\left(1 - \frac{k^{j+1}-1}{(k-1)(N-1-q)}\right)   (17.26)

since \sum_{j=0}^{D-1}k^{D-j} = N-1 and \binom{N-1-A}{m}\big/\binom{N-1}{m} = \prod_{q=0}^{m-1}\left(1-\frac{A}{N-1-q}\right). Moreover, when \frac{k^{j+1}-1}{k-1}\,\frac{m}{N-1} \ll 1,

\prod_{q=0}^{m-1}\left(1 - \frac{k^{j+1}-1}{(k-1)(N-1-q)}\right) \approx \left(1 - \frac{m}{N-1}\right)^{\frac{k^{j+1}-1}{k-1}}
For m = 1, (17.25) reduces to the average hopcount in the k-ary tree,

E[H_{N,k}] = \frac{1}{N-1}\sum_{j=0}^{D-1}k^{D-j}\,\frac{k^{j+1}-1}{k-1} = \frac{ND}{N-1} + \frac{D}{(N-1)(k-1)} - \frac{1}{k-1}   (17.27)

Since

D = \frac{\log\left[1 + N(k-1)\right]}{\log k} - 1 = \log_k N + \log_k(1-1/k) + O(1/N)

it follows that

E[H_{N,k}] = \log_k N + \log_k(1-1/k) - \frac{1}{k-1} + O\left(\frac{\log_k N}{N}\right)   (17.28)

Comparing (17.28) with the average hopcount in the random graph (16.10) shows equality to first order if k_rg = e. Moreover, both the second order terms \gamma - 1 \approx -0.42 and \log(1-1/e) - \frac{1}{e-1} \approx -1.04 are O(1) and independent of N. This shows that the multicast gain in the random graph is well approximated by g_{N,e}(m).
The most realistic graph models for the Internet assume that E[H_N] \approx c\log N, since this implies that the number of routers that can be reached from any starting destination grows exponentially with the number of hops. For these realistic graphs, Corollary 17.4.1 states that the empirical Chuang-Sirbu law does not hold for all m. On the other hand, there are more regular graphs (such as a d-lattice, where E[H_N] \approx \frac{d}{3}N^{1/d}) with E[H_N] \approx N^{0.2+\epsilon} (and \epsilon > 0) for which the mathematical condition m^{0.2} \leq E[H_N] is satisfied for all m and N. As shown in Van Mieghem et al. (2000), however, these classes of graphs, in contrast to random graphs, do not lead to good models for SPTs in the Internet.
The normalized Chuang-Sirbu law is \frac{g_N(xN)}{N} = \frac{E[H_N]}{N^{0.2}}\,x^{0.8}. It is interesting to note that the Chuang-Sirbu law is best if \frac{E[H_N]}{N^{0.2}} = 1, since then both endpoints x = 0 and x = 1 coincide with (17.30). This optimum is achieved when N \approx 250\,000, which is of the order of magnitude of the estimated number of routers in the current Internet. This observation may explain the fairly good correspondence on a less sensitive log-log scale with Internet measurements. At the same time, it shows that for a growing Internet, the fit of the Chuang-Sirbu law will deteriorate. For N \geq 10^6, the Chuang-Sirbu law underestimates g_N(m) for all m.
Fig. 17.9. The multicast efficiency for N = 10^j with j = 3, 4, ..., 7, compared with the m^{0.8} law and the random graph. The endpoint of each curve, g_N(N-1) = N-1, determines N. The insert shows the effective power exponent versus the number of nodes N.
\log E[H_N] + \tau(N)\log m, which is a first order Taylor expansion of \log g_N(m) in \log m. This observation suggests the computation of the effective power exponent \tau(N) as

\tau(N) = \left.\frac{d\log g_N(m)}{d\log m}\right|_{m=1}   (17.31)

Only for a straight line can the differential operator be replaced by the difference operator, such that \tau(N) \approx \tau^{*}(N), where

\tau^{*}(N) = \log_2\frac{g_N(2)}{E[H_N]}   (17.32)

In general, for small m, the effective power exponent (17.31) is not a constant 0.8 as in the Chuang-Sirbu law, but depends on N. Since g_N(m) is concave by Theorem 17.1.2, \tau(N) is the maximum possible value of \frac{d\log g_N(m)}{d\log m} at any m \geq 1. A direct consequence of Theorem 17.1.1 is that the effective power exponent \tau(N) \in \left[\frac{1}{2}, 1\right]. From recent Internet measurements, Chalmers and Almeroth (2001) found that 0.66 \leq \tau(N) \leq 0.7.
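The discrete estimate (17.32) is simple to evaluate from (17.14); the following sketch (Python; the names are ours) does so and illustrates that tau*(N) stays inside [1/2, 1] and grows slowly with N.

```python
import math

def g(N, m):
    # (17.14) evaluated in floating point
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))

def tau_star(N):
    # (17.32): slope of log g_N(m) versus log m between m = 1 and m = 2
    return math.log2(g(N, 2) / g(N, 1))
```

For example, tau_star(100) is approximately 0.83, already well above the Chuang-Sirbu exponent 0.8.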
The effective power exponent \tau(N) as defined in (17.31) for the random graph is

\tau(N) = \frac{N\psi(N) + \gamma N + \frac{\pi^2}{6} - \frac{\pi^2}{6}N}{(N-1)\left[\psi(N) + (\gamma-1) + \frac{1}{N}\right]}

while, according to the definition (17.32),

\tau^{*}(N) = \log_2\frac{g_N(2)}{E[H_N]}

(Footnote: Although (17.5) only has meaning for integer m, analytic continuation to a complex variable is possible and, hence, differentiation can be defined.)
The average number of links of the multicast tree that change when one group member leaves equals

E[\Delta_N(m)] = g_N(m) - g_N(m-1)   (17.33)

Since g_N(m) is concave (Theorem 17.1.2), E[\Delta_N(m)] is always positive and decreasing in m. If the scope of m is extended to real numbers, E[\Delta_N(m)] \approx g'_N(m), which simplifies further estimates.
The situation where on average less than one link changes if one multicast group member leaves may be regarded as a stable regime. Since E[\Delta_N(m)] is always positive and decreasing in m, this stable regime is reached when the group size m exceeds m_1, which satisfies E[\Delta_N(m_1)] = 1. For example, for the URT that is asymptotically the SPT for the class RGU defined in Section 16.2.2, this condition approximately follows from (17.29) as

E[\Delta_N(m)] \approx \frac{mN}{N-m}\log\frac{N}{m} - \frac{(m-1)N}{N-m+1}\log\frac{N}{m-1}   (17.34)

(Footnote: Many recent articles devote attention to power law behavior, but most of them seem prudent: just recall the immense interest (hype?) a few years ago in the long-range and self-similar nature of Internet traffic and its relation to the simple power law with only the Hurst parameter (comparable to \tau(N) here) in the exponent.)
Let x = m/N; then 0 < x < 1 and

E[\Delta_N(xN)] \approx -N\left[\frac{x}{1-x}\log x - \frac{x - 1/N}{1-(x-1/N)}\log\left(x - \frac{1}{N}\right)\right]

After expanding the second term in a Taylor series around x to first order in \frac{1}{N},

E[\Delta_N(xN)] \approx \frac{x - 1 - \log x}{(1-x)^2} + O\left(\frac{1}{N}\right)

For large N, E[\Delta_N(x_1 N)] \approx 1 occurs when x_1 = 0.3161, which is the solution in x of \frac{x-1-\log x}{(1-x)^2} = 1. For the class RGU, a stable tree as defined above is obtained when the multicast group size m is larger than m_1 = 0.3161N \approx \frac{N}{3}. In the sequel, since m_1 is high and of less practical interest, we will focus on multicast group sizes smaller than m_1. The computation of m_1 for other graph types turns out to be difficult. Since, as mentioned above, the comparison with Internet measurements (Van Mieghem et al., 2001a) shows that formula (17.29) provides a fairly good estimate, we expect that m_1 \approx \frac{N}{3} also approximates well the stable regime in the Internet.
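The threshold x_1 can be recovered numerically by solving (x - 1 - log x)/(1 - x)^2 = 1 with a simple bisection; the sketch below (Python, our own names) confirms the quoted value 0.3161.

```python
import math

def excess_links(x):
    # E[Delta_N(xN)] ~ (x - 1 - log x)/(1 - x)^2 for large N
    return (x - 1 - math.log(x)) / (1 - x) ** 2

def stable_fraction(lo=0.05, hi=0.9, iters=80):
    """Bisection for the root x1 of excess_links(x) = 1 on (0, 1);
    the function is decreasing on this interval."""
    f = lambda x: excess_links(x) - 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```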
The following theorem quantifies the stability in the class RGU.

Theorem 17.5.1 For sufficiently large N and fixed m, the number of changed edges \Delta_N(m) in a random graph G_p(N) with uniformly distributed link weights tends to a Poisson distribution,

Pr[\Delta_N(m) = k] \approx e^{-E[\Delta_N(m)]}\,\frac{\left(E[\Delta_N(m)]\right)^k}{k!}   (17.35)
Fig. 17.10. A sketch of a uniform recursive tree, where h_{root->m} = 3 and h_{root->m-1} = 4 and the number of links in common is two (shown in bold: Root-A-B).

overlap, there always exists a node in the SPT, say node B as illustrated in Fig. 17.10, that sees the partial shortest paths from itself to m and m-1 as non-overlapping and independent. Since the SPT is a URT, the subtree rooted at that node B (enclosed in dotted line in Fig. 17.10) is again a URT, as follows from Theorem 16.2.1. With respect to B, the nodes m and m-1 are uniformly chosen, and the number of links \Delta_N(m) that change if the m-th node leaves is just its hopcount with respect to B (instead of the original root). We denote the unknown number of nodes in that subtree rooted at B by \nu(m) \leq N. We have that \nu(m) \leq \nu(m-1), because by adding a group member, the size of the subtree can only decrease. For large N and small m, \nu(m) is large, such that the above-mentioned asymptotic law of the hopcount applies. If both m and N are large, \nu(m) will become too small for the asymptotic law to apply. Thus, for fixed m and large N, this implies that \Delta_N(m) tends to a Poisson random variable with mean E[\Delta_N(m)].
Simulations in Van Mieghem and Janic (2002) indicate that the Poisson law seems more widely valid than just in the asymptotic regime (N \to \infty).
The proof can be extended to a general topology. Assume for a certain class of graphs that the pdf of the hopcount Pr[H_N = k] and the multicast efficiency g_N(m) can be computed for all sizes N. The subtree rooted at B is again an SPT in a subcluster of size \nu(m), which is an unknown random variable. An argument similar to the one in the proof above shows that
410
This argument implicitly assumes that all multicast users are uniformly
distributed over the graph. By the law of total probability,
Q
X
q=1
Q
X
Pr [Kq = n] Pr [(p) = q]
q=1
n Pr KH[(p)] = n , by equating
Indeed, since H KH[(p)] = n=1
H KH[(p)] = jQ (p) jQ (p 1)
a relation in one unknown H [(p)] is found and can be solved for H [(p)].
In conclusion, we end up with the approximation
Lemma 17.6.1 For a > b,

S(a,b) = \sum_{k=1}^{b}\frac{(a-k)!}{(b-k)!}\,\frac{1}{k} = \frac{a!}{b!}\left[\psi(a+1) - \psi(a-b+1)\right]

and

S(b,b) = \sum_{k=1}^{b}\frac{1}{k} = \psi(b+1) + \gamma

Proof: We write

S(a,b) = \sum_{k=1}^{b}\frac{(a-k)(a-1-k)\cdots(b-k+1)}{k} = a\sum_{k=1}^{b}\frac{(a-1-k)\cdots(b-k+1)}{k} - \sum_{k=1}^{b}(a-1-k)\cdots(b-k+1)

Since (a-1-k)\cdots(b-k+1) = (a-b-1)!\binom{a-1-k}{b-k} and, by the recurrence for the binomial coefficients, \sum_{k=1}^{b}\binom{a-1-k}{b-k} = \binom{a-1}{b-1}, we have that

S(a,b) = a\,S(a-1,b) - \frac{1}{a-b}\,\frac{(a-1)!}{(b-1)!}

Iterating this recursion p times yields

S(a,b) = \frac{a!}{(a-p)!}\,S(a-p,b) - \frac{a!}{(b-1)!}\sum_{j=0}^{p-1}\frac{1}{(a-j)(a-j-b)}

Taking p = a-b, using S(b,b) = \psi(b+1) + \gamma and the partial fraction decomposition of \frac{1}{(a-j)(a-j-b)}, leads to the digamma difference in the lemma.
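Lemma 17.6.1 can be verified with exact rational arithmetic, since psi(a+1) - psi(a-b+1) is the harmonic partial sum sum_{j=a-b+1}^{a} 1/j. A short sketch (Python; our own function names):

```python
from fractions import Fraction
from math import factorial

def S_sum(a, b):
    # left-hand side of Lemma 17.6.1
    return sum(Fraction(factorial(a - k), factorial(b - k) * k)
               for k in range(1, b + 1))

def S_closed(a, b):
    # right-hand side: (a!/b!) * [psi(a+1) - psi(a-b+1)]
    #                = (a!/b!) * sum_{j=a-b+1}^{a} 1/j
    return (Fraction(factorial(a), factorial(b))
            * sum(Fraction(1, j) for j in range(a - b + 1, a + 1)))
```

For a = b the closed form degenerates to the harmonic number, matching the second statement of the lemma.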
Proof of equation (17.16): We will investigate E[X_i] = E\left[X_i^{(N)}\right] in the URT with N nodes. Here E[X_i] is the number of joint hops in a multicast SPT from the root to i uniformly chosen nodes in the URT, where all the group member nodes are different from the root. Let \hat{X}_i be the same quantity where we allow the group member nodes to be the root. Then,

E\left[\hat{X}_i\right] = \frac{N-i}{N}\,E[X_i]

since there are i possibilities, each with probability \frac{1}{N}, in which case \hat{X}_i = 0.
The average number of joint hops E\left[\hat{X}_i^{(N)}\right] is deduced from Fig. 17.11, where two clusters are shown, with respectively k and N-k nodes. The first cluster with k nodes does not possess the root (dark shaded), but it contains the i multicast group members (lightly shaded). There is already at least 1 joint hop, because the link between the root and node A, which can be viewed as the root of the first cluster, is used by all i group members lying in the first cluster. Given the size k of the first cluster, the probability that all i uniformly chosen group members belong to the first cluster equals \frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}, because the probability that the first group member belongs to that cluster is \frac{k}{N}, the probability that the second group member also belongs to the first cluster is \frac{k-1}{N-1}, and so on. Since the size of the first cluster connected to the root is uniform in between 1 and N-1, the probability that the size is k equals \frac{1}{N-1}. When all i nodes are in that first cluster of size k, \hat{X}_i is at least 1, and the problem restarts, but with N replaced by k and A being the root. Hence, if all i group members belong to the first cluster, the average number of joint hops is \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + E\left[\hat{X}_i^{(k)}\right]\right), because we must sum over all possible sizes for the first cluster. If not all i group member nodes are in the first cluster, the group member nodes are divided over the two clusters. But, in that case, we have no joint overlaps, or X_i = 0.
Fig. 17.11. The two contributing clusters leading to the E\left[\hat{X}_i^{(N)}\right] recursion.

Thus, if not all i group member nodes are in the first cluster, the only way that there are possible joint overlaps (X_i > 0) is that all i group member nodes are in the second cluster. However, by removing the first cluster, we are left again with a uniform recursive tree of size N-k. The average number of joint hops in this case is \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{(N-k)(N-k-1)\cdots(N-k-i+1)}{N(N-1)\cdots(N-i+1)}\,E\left[\hat{X}_i^{(N-k)}\right]. Adding both contributions results in the recursion formula

E\left[\hat{X}_i^{(N)}\right] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + 2E\left[\hat{X}_i^{(k)}\right]\right)   (17.36)

We next write

\eta_i^{(N)} = N(N-1)\cdots(N-i+1)\,E\left[\hat{X}_i^{(N)}\right] = \frac{N!}{(N-i)!}\,E\left[\hat{X}_i^{(N)}\right]

so that

\eta_i^{(N)} = \frac{1}{N-1}\sum_{k=i}^{N-1}k(k-1)\cdots(k-i+1) + \frac{2}{N-1}\sum_{k=1}^{N-1}\eta_i^{(k)}
Subtracting (N-1)\eta_i^{(N)} - (N-2)\eta_i^{(N-1)} = \frac{(N-1)!}{(N-i-1)!} + 2\eta_i^{(N-1)} gives

\eta_i^{(N)} = \frac{N}{N-1}\,\eta_i^{(N-1)} + \frac{(N-2)!}{(N-i-1)!}   (17.37)

and, after iterating,

\eta_i^{(N)} = \frac{N}{N-k}\,\eta_i^{(N-k)} + N\sum_{j=0}^{k-1}\frac{(N-2-j)!}{(N-j)(N-i-1-j)!}

Since \eta_i^{(i)} = 0, because the root is then always one of the group member nodes, we finally obtain

\eta_i^{(N)} = N\sum_{j=0}^{N-i-1}\frac{(N-2-j)!}{(N-j)(N-i-1-j)!} = N\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!}   (17.38)

Since E[X_i] = \frac{N}{N-i}\,E\left[\hat{X}_i^{(N)}\right] = \frac{N}{N-i}\,\frac{(N-i)!}{N!}\,\eta_i^{(N)}, we have that

E[X_i] = \frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!}   (17.39)

and, for large N, E[X_i] \to \frac{1}{i-1}.
Invoking Theorem 17.1.3, the average number of multicast hops for m uniformly chosen, distinct group members is

g_N(m) = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1}\,\frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=i+1}^{N}\frac{(k-2)!}{k(k-i-1)!}
= -\frac{N}{(N-1)!}\sum_{s=0}^{N-2}\frac{(N-2-s)!}{N-s}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!}

Using \sum_{i=0}^{m}\binom{m}{i}(-1)^{i}x^{N-i-1} = x^{N-1-m}(x-1)^{m} and

\sum_{i=0}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!} = \frac{d^{s}}{dx^{s}}\left[x^{N-1-m}(x-1)^{m}\right]_{x=1} = \binom{s}{m}m!\,\frac{(N-1-m)!}{(N-1-s)!}

we obtain, after adding and subtracting the i = 0 term,

g_N(m) = -\frac{N(N-1-m)!}{(N-1)!}\sum_{s=m}^{N-2}\frac{s!}{(s-m)!\,(N-s)(N-1-s)} + N\sum_{s=0}^{N-2}\frac{1}{(N-s)(N-1-s)}

The second sum telescopes to N\left[\sum_{k=1}^{N-1}\frac{1}{k} - \sum_{k=2}^{N}\frac{1}{k}\right] = N - 1, while the partial fraction decomposition of \frac{1}{(N-s)(N-1-s)} and the substitutions k = N-1-s and k = N-s, respectively, give

g_N(m) = -\frac{N(N-1-m)!}{(N-1)!}\left[\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k} - \sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k}\right] + N - 1

Combining the two sums termwise, using (N-k-1)!\left[(N-k-m) - (N-k)\right] = -m\,(N-k-1)!, yields

g_N(m) = -1 + \frac{mN(N-1-m)!}{(N-1)!}\sum_{k=1}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}

Applying Lemma 17.6.1 with a = N-1 and b = N-m,

\sum_{k=1}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!} = S(N-1, N-m) = \frac{(N-1)!}{(N-m)!}\left[\psi(N) - \psi(m)\right]

so that

g_N(m) = -1 + \frac{mN}{N-m}\left[\psi(N) - \psi(m)\right]   (17.40)

which is (17.16).
Let X_i be the number of joint hops for i different multicast group members in the k-ary tree (we allow the root to be a user, in which case \hat{X}_i = 0). Then,

Pr\left[\hat{X}_i \geq 1\right] = Pr[\text{all group members belong to the same cluster connected to the root}]
= k\,Pr[\text{all group members belong to the first cluster connected to the root}]
= k\left(\frac{(N-1)/k}{N}\right)^{i} = k\left(\frac{1+k+\cdots+k^{D-1}}{1+k+\cdots+k^{D}}\right)^{i}   (17.41)

More generally,

Pr\left[\hat{X}_i \geq j\right] = \prod_{n=D-j+1}^{D}p_i^{(n)}, \qquad p_i^{(n)} = k\left(\frac{1+k+\cdots+k^{n-1}}{1+k+\cdots+k^{n}}\right)^{i}   (17.42)

Note that for i \geq 2 the probability Pr\left[\hat{X}_i \geq D\right] = 0, because if \hat{X}_i = D some destinations must be identical. From (2.36) we obtain for i \geq 2,

E\left[\hat{X}_i\right] = \sum_{j=1}^{D-1}\prod_{n=D-j+1}^{D}p_i^{(n)} = \sum_{j=1}^{D-1}k^{D-j}\left(\frac{1+k+\cdots+k^{j}}{1+k+\cdots+k^{D}}\right)^{i}   (17.43)

Since E[X_i] = \frac{N}{N-i}\,E\left[\hat{X}_i\right], we find

E[X_i] = \frac{N}{N-i}\sum_{j=1}^{D-1}k^{D-j}\left(\frac{1+k+\cdots+k^{j}}{1+k+\cdots+k^{D}}\right)^{i}, \qquad i \geq 2   (17.44)
For the values of E\left[\hat{X}_1\right] and E[X_1] we find

E\left[\hat{X}_1\right] = \frac{1}{N}\sum_{j=1}^{D}k^{j}\left(1+k+\cdots+k^{D-j}\right) = \frac{Dk^{D+1} - (N-1)}{N(k-1)}

and

E[X_1] = \frac{N}{N-1}\,E\left[\hat{X}_1\right] = \sum_{j=1}^{D-1}\frac{k^{D-j}\left(1+k+\cdots+k^{j}\right)}{N-1} + \frac{k^{D}}{N-1}

Invoking Theorem 17.1.3,

g_{N,k}(m) = \frac{mk^{D}}{N-1} - \sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{N}{N-i}\sum_{j=1}^{D-1}k^{D-j}\left(\frac{1+k+\cdots+k^{j}}{1+k+\cdots+k^{D}}\right)^{i}
Writing A_j = \frac{k^{j+1}-1}{k-1} = 1+k+\cdots+k^{j}, this becomes

g_{N,k}(m) = \frac{mk^{D}}{N-1} - N\sum_{j=1}^{D-1}k^{D-j}\,\frac{A_j!}{N!}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(A_j-i)!}

Concentrating on the inner sum with lower sum bound i = 0, denoted as S_j, and substituting n = m-i, we have

S_j = \sum_{n=0}^{m}\binom{m}{n}(-1)^{m-n}\,\frac{\Gamma(N-m+n)}{\Gamma(A_j-m+n+1)}

Invoking the Taylor series of the hypergeometric function (Abramowitz and Stegun, 1968, Section 15.1.1),

F(a,b;c;z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{n=0}^{\infty}\frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\,n!}\,z^{n}

together with (1-z)^{m} = \sum_{n=0}^{m}\binom{m}{n}(-1)^{n}z^{n} and

F(1, N-m; A_j-m+1; z) = \frac{\Gamma(A_j-m+1)}{\Gamma(N-m)}\sum_{n=0}^{\infty}\frac{\Gamma(N-m+n)}{\Gamma(A_j-m+1+n)}\,z^{n}

we obtain

S_j = \frac{1}{m!}\,\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\,\frac{d^{m}}{dz^{m}}\left[(1-z)^{m}F(1, N-m; A_j-m+1; z)\right]_{z=0}

Invoking the differentiation formula (Abramowitz and Stegun, 1968, Section 15.2.7),

\frac{d^{m}}{dz^{m}}\left[(1-z)^{a+m-1}F(a,b;c;z)\right] = \frac{(-1)^{m}\Gamma(a+m)\Gamma(c-b+m)\Gamma(c)}{\Gamma(a)\Gamma(c-b)\Gamma(c+m)}\,(1-z)^{a-1}F(a+m, b; c+m; z)

we have, since a = 1 and F(a,b;c;0) = 1,

S_j = \frac{(-1)^{m}(N-m-1)!\,(A_j-N+m)!}{(A_j-N)!\,A_j!}

Thus, after restoring the subtracted i = 0 term \frac{(N-1)!}{A_j!},

g_{N,k}(m) = \frac{mk^{D}}{N-1} + \sum_{j=1}^{D-1}k^{D-j} + \frac{(-1)^{m-1}(N-m-1)!}{(N-1)!}\sum_{j=1}^{D-1}k^{D-j}\,\frac{(A_j-N+m)!}{(A_j-N)!}
17.8 Problem
(i) Compute the effective power exponent \tau^{*}(N) for the k-ary tree.
18
The hopcount to an anycast group
18.1 Introduction
IPv6 possesses a new address type, anycast, that is not supported in IPv4.
The anycast address is syntactically identical to a unicast address. However, when a set of interfaces is specied by the same unicast address, that
unicast address is called an anycast address. The advantage of anycast
is that a group of interfaces at dierent locations is treated as one single
address. For example, the information on servers is often duplicated over
several secondary servers at dierent locations for reasons of robustness and
accessibility. Changes are only performed on the primary servers, which
are then copied onto all secondary servers to maintain consistency. If both
the primary and all secondary servers have a same anycast address, a query
417
418
from some source towards that anycast address is routed towards the closest
server of the group. Hence, instead of routing the packet to the root server
(primary server) anycast is more e!cient.
Suppose there are m servers (the primary plus all secondary ones) and that these m
servers are uniformly distributed over the Internet. The number of hops from
the querying device A to the closest server is the minimum number of hops,
denoted by h_N(m), over the set of shortest paths from A to these m servers in
a network with N nodes. In order to solve the problem, the shortest path
tree rooted at node A, the querying device, needs to be investigated. We
assume in the sequel that one of the m uniformly distributed servers can
possibly coincide with the router to which the querying machine A is
attached; in that case, h_N(m) = 0. This assumption is also reflected in
the notation, lowercase h, according to the convention made in Section 16.3.2
that capital H for the hopcount excludes the event that the hopcount can
be zero.

Clearly, if m = 1, the problem reduces to the hopcount of the shortest
path from A to one uniformly chosen node in the network and we have that

h_N(1) = h_N

where h_N is the hopcount of the shortest path in a graph with N nodes.
The other extreme, m = N, leads to

h_N(N) = 0

because all nodes in the network are servers. In between these extremes, it
holds that

h_N(m) \le h_N(m-1)

since one additional anycast group member (server) can never increase the
minimum number of hops from an arbitrary node to that larger group.
The hopcount to an anycast group is a stochastic problem. Even if the
network graph is exactly known, an arbitrary node A views the network
along a tree, most often a shortest path tree. Although the sequel
emphasizes shortest path trees, the presented theory is equally valid for
any type of tree. Node A's perception of the network is very likely
different from the view of another node A'. Nevertheless, shortest path
trees in the same graph possess, to some extent, related structural properties
that allow us to treat the problem by considering certain types or classes
of shortest path trees. Hence, instead of varying the arbitrary node A over
all possible nodes in the graph and computing the shortest path tree at
each different node, we vary the structure of the shortest path tree rooted
at A over all possible shortest path trees of a certain type. Of course, the
confinement of the analysis then lies in the type of tree that is investigated.
We will only consider the regular k-ary tree and the irregular URT. It
seems reasonable to assume that real shortest path trees in the Internet
possess a structure somewhere in between these extremes and that scaling
laws observed in both extreme cases may also apply to the Internet.
The presented analysis allows us to address at least two different issues.
First, for a same class of trees, the efficiency of anycast over unicast, defined
in terms of a performance measure γ,

γ = \frac{E[h_N(m)]}{E[h_N(1)]} \le 1

is quantified. The performance measure γ indicates how many hops (or link
traversals, or bandwidth consumption) can be saved, on average, by anycast.
Alternatively, γ also reflects the gain in end-to-end delay, or how much faster
than unicast anycast finds the desired information. Second, the so-called
server placement problem can be treated. More precisely, the question "How
many servers m are needed to guarantee that any user request can access the
information within j hops with probability Pr[h_N(m) > j] \le ε?", where ε is a
certain level of stringency, can be answered. The server placement problem
is expected to gain increased interest, especially for real-time services where
end-to-end QoS (e.g. delay) requirements are desirable. In the most general
setting of this server placement problem, all nodes are assumed to be equally
important, in the sense that user requests are generated equally likely at
any router in the network with N nodes. As mentioned in Chapter 17, the
validity of this assumption has been justified by Phillips et al. (1999). In
the case of uniform user requests, the best strategy is to place the servers also
uniformly over the network. Computation of Pr[h_N(m) > j] < ε for given
stringency ε and hop j allows the determination of the minimum number m
of servers. The solution of this server placement problem may be regarded
as an instance of the general quality of service (QoS) portfolio of a network
operator. When the number of servers for a major application offered by the
service provider is properly computed, the service provider may announce
levels of QoS (e.g. via Pr[h_N(m) > j] < ε) and price the use
of the application accordingly.
18.2 General analysis
Let us consider a particular shortest path tree T rooted at node A, with
the level set L_N = \{X_N^{(k)}\}_{1 \le k \le N-1} as defined in Section 16.2.2. Suppose
that the result of uniformly distributing the m anycast group members over the
graph leads to a number m^{(k)} of those anycast group member nodes that
are k hops away from the root. These m^{(k)} distinct nodes all belong to the
k-th level set. Similarly as for X_N^{(k)}, some relations are immediate.
First, m^{(0)} = 0 means that none of the m anycast group members coincides
with the root node A, while m^{(0)} = 1 means that one of them (and at most one)
is attached to the same router A as the querying device. Also, for all k > 0,
it holds that 0 \le m^{(k)} \le X_N^{(k)} and that

\sum_{k=0}^{N-1} m^{(k)} = m    (18.1)
Let e_j denote the event that none of the m anycast group members lies within
j hops of the root, e_j = \bigcap_{k=0}^{j}\{m^{(k)} = 0\}, so that

\Pr[h_N(m) > j \mid L_N] = \Pr[e_j]    (18.2)

and

\Pr[e_j] = \Pr[m^{(j)} = 0 \mid e_{j-1}]\,\Pr[e_{j-1}]    (18.3)

The assumption that all m anycast group members are uniformly distributed
over the graph implies that

\Pr[m^{(j)} = 0 \mid e_{j-1}] = \frac{\binom{N - \sum_{k=0}^{j} X_N^{(k)}}{m}}{\binom{N - \sum_{k=0}^{j-1} X_N^{(k)}}{m}}    (18.4)

which follows from \Pr[m^{(0)} = 0] = 1 - \frac{m}{N}, which equals
\Pr[m^{(0)} = 0 \mid e_{-1}] (although the event e_{-1} is meaningless).
Observe that \Pr[m^{(0)} = 1] = \frac{m}{N} holds for any tree, such that

\Pr[h_N(m) = 0] = \frac{m}{N}

By iteration of (18.3), we obtain

\Pr[e_j] = \prod_{s=0}^{j} \frac{\binom{N - \sum_{k=0}^{s} X_N^{(k)}}{m}}{\binom{N - \sum_{k=0}^{s-1} X_N^{(k)}}{m}} = \frac{\binom{N - \sum_{k=0}^{j} X_N^{(k)}}{m}}{\binom{N}{m}}    (18.5)

where the convention in summation is that \sum_{k=a}^{b} f_k = 0 if a > b. Finally,
combining (18.2) with (18.4) and (18.5), we arrive at the general conditional
expression for the minimum hopcount to the anycast group,

\Pr[h_N(m) = j \mid L_N] = \frac{\binom{N - \sum_{k=0}^{j-1} X_N^{(k)}}{m} - \binom{N - \sum_{k=0}^{j} X_N^{(k)}}{m}}{\binom{N}{m}}    (18.6)

Clearly, \Pr[h_N(0) = j \mid L_N] = 0 since there is no path, while for m = 1,

\Pr[h_N(1) = j \mid L_N] = \frac{X_N^{(j)}}{N}

It directly follows from (18.6) that

\Pr[h_N(m) > q \mid L_N] = \frac{\binom{N - \sum_{k=0}^{q} X_N^{(k)}}{m}}{\binom{N}{m}}    (18.7)

If N - \sum_{k=0}^{q} X_N^{(k)} < m or, equivalently, \sum_{k=q+1}^{N-1} X_N^{(k)} < m, then equation
(18.7) shows that \Pr[h_N(m) > q \mid L_N] = 0. The maximum possible hopcount
of a shortest path to an anycast group strongly depends on the specifics of the
shortest path tree or the level set L_N. A general result is worth mentioning:
for any tree,

\Pr[h_N(N-1) = 1] = \frac{1}{N}

Using the tail probability formula (2.36) for the average, it follows from
(18.7) that

E[h_N(m) \mid L_N] = \frac{1}{\binom{N}{m}} \sum_{q=0}^{N-2} \binom{N - \sum_{k=0}^{q} X_N^{(k)}}{m}    (18.8)

from which we find

E[h_N(1) \mid L_N] = \frac{1}{N} \sum_{k=1}^{N-1} k\,X_N^{(k)}
or explicitly,

\Pr[h_N(m) = j] = \sum_{x_1 + \cdots + x_{N-1} = N-1} \frac{\binom{\sum_{k=j}^{N-1} x_k}{m} - \binom{\sum_{k=j+1}^{N-1} x_k}{m}}{\binom{N}{m}}\,\Pr\!\left[X_N^{(1)} = x_1, \ldots, X_N^{(N-1)} = x_{N-1}\right]    (18.9)

where the integers x_k \ge 0. This expression explicitly shows the
importance of the level structure L_N of the shortest path tree T: the level
set L_N entirely determines the shape of the tree T. Unfortunately, a general
form for \Pr[L_N] or \Pr[h_N(m) = j] is difficult to obtain. In principle, via
extensive trace-route measurements from several roots, the shortest path
tree and \Pr[L_N] can be constructed, such that a (rough) estimate of the
level set L_N in the Internet can be obtained.
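For a given level set, (18.6) and (18.8) reduce to elementary binomial arithmetic. A minimal sketch in Python (the function names are ours, not the book's); the level sizes are supplied as a list with the root counted as level 0:

```python
from math import comb

def anycast_hop_pmf(levels, m):
    """Pr[h_N(m) = j | L_N] from (18.6).

    levels[j] = X_N^(j), the number of nodes j hops from the root;
    levels[0] = 1 accounts for the root itself, and N = sum(levels).
    """
    N = sum(levels)
    pmf = []
    cum = 0  # running sum_{k=0}^{j-1} X_N^(k)
    for X_j in levels:
        # [C(N - cum, m) - C(N - cum - X^(j), m)] / C(N, m); math.comb
        # returns 0 whenever fewer than m nodes remain, as (18.7) requires
        pmf.append((comb(N - cum, m) - comb(N - cum - X_j, m)) / comb(N, m))
        cum += X_j
    return pmf

def mean_hopcount(levels, m):
    """E[h_N(m) | L_N], equivalent to the tail formula (18.8)."""
    return sum(j * p for j, p in enumerate(anycast_hop_pmf(levels, m)))
```

For m = 1 the routine returns X_N^{(j)}/N, and the j = 0 entry equals m/N for any level set, exactly as the analysis predicts.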
18.3 The k-ary tree
For regular trees, explicit expressions are possible because the summation
in (18.9) simplifies considerably. For example, for the k-ary tree defined in
Section 17.3,

X_N^{(j)} = k^j

Provided the set L_N contains precisely these values of X_N^{(j)} for each j, we have
that \Pr[L_N] = 1; else it is zero (because then L_N is not consistent with a
k-ary tree). Summarizing, for the k-ary tree with N = \frac{k^{D+1}-1}{k-1} nodes and D levels,
the distribution of the minimum hopcount to the anycast group is

\Pr[h_N(m) = j] = \frac{\binom{N - \frac{k^j - 1}{k-1}}{m} - \binom{N - \frac{k^{j+1} - 1}{k-1}}{m}}{\binom{N}{m}}    (18.10)

Extension of the integer k to real numbers in the formula (18.10) is expected to be of value, as suggested in Section 17.3. When a k-ary tree was
used to fit corresponding Internet multicast measurements (Van Mieghem
et al., 2001a), a remarkably accurate agreement was found for the value
k \approx 3.2, which is about the average degree of the Internet graph. Hence,
if we were to use the k-ary tree as a model for the hopcount to an anycast
group, we expect that k \approx 3.2 is the best value for Internet shortest path
trees. However, we feel we ought to mention that the shortest path tree between two arbitrary nodes is definitely not a
k-ary tree, because in a k-ary tree \Pr[h_N(1) = j] increases with the hopcount j, which is
in conflict with Internet trace-route measurements (see, for example, the
bell-shaped curve in Fig. 16.4).
Figure 18.1 displays \Pr[h_N(m) \le j] for a k-ary tree with outdegree k = 3 and
N = 500, for anycast group sizes m = 1, 2, 5, 10 and 50.

Fig. 18.1. The distribution function of h_{500}(m) versus the number of hops j for
various sizes m of the anycast group in a k-ary tree with k = 3 and N = 500.

This type of plot allows us to solve the server placement problem. For example, assuming that the
k-ary tree is a good model and the network consists of N = 500 nodes,
Fig. 18.1 shows that at least m = 10 servers are needed to assure that any
user is not more than four hops separated from a server with a probability
of 93%. More precisely, the equation \Pr[h_{500}(m) > 4] < 0.07 is obeyed if m \ge 10.
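This computation follows directly from (18.10). A minimal sketch (helper names are ours); the smallest adequate group size m is found by a linear search over the tail probability:

```python
from math import comb

def kary_hop_pmf(k, D, m):
    """Pr[h_N(m) = j], j = 0..D, for the k-ary tree with D levels, eq. (18.10).

    The tree has N = (k^(D+1) - 1)/(k - 1) nodes and k^j nodes at level j.
    """
    N = (k**(D + 1) - 1) // (k - 1)
    pmf = []
    for j in range(D + 1):
        below = (k**j - 1) // (k - 1)        # nodes fewer than j hops away
        upto = (k**(j + 1) - 1) // (k - 1)   # nodes at most j hops away
        pmf.append((comb(N - below, m) - comb(N - upto, m)) / comb(N, m))
    return pmf

def min_servers(k, D, j, eps):
    """Smallest m with Pr[h_N(m) > j] < eps (the server placement problem)."""
    N = (k**(D + 1) - 1) // (k - 1)
    for m in range(1, N + 1):
        if 1 - sum(kary_hop_pmf(k, D, m)[:j + 1]) < eps:
            return m
    return N
```

For the book's example one would call min_servers with k = 3 and eps = 0.07; the sketch uses exact integer D, whereas the fit to Internet data extends k to real values.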
Figure 18.2 gives an idea of how the performance measure γ decreases with
the size of the anycast group in k-ary trees (all with outdegree k = 3) but
with different sizes N. For values of m up to around 20% of N, we observe
that γ decreases logarithmically in m.
Fig. 18.2. The performance measure γ for several sizes N of k-ary trees (with k = 3
and N = 100, 500, 5000, 10^5 and 10^6) as a function of the ratio of anycast nodes
over the total number of nodes.

Theorem 16.2.1 of the URT, applied to the anycast minimum hop problem, is
illustrated in Fig. 18.3.
Fig. 18.3. A uniform recursive tree consisting of two subtrees T_1 and T_2, with k and
N − k nodes respectively, joined at the root by the link R_1. The first cluster contains i anycast members, while the
cluster with N − k nodes contains m − i anycast members.

Figure 18.3 shows that any URT can be separated into two subtrees T_1 and
T_2 with k and N − k nodes, respectively. Moreover, Theorem 16.2.1 states
that each subtree is independent of the other and is again a URT. Consider
now a specific separation of a URT T into T_1 = t_1 and T_2 = t_2, where the tree
t_1 contains k nodes and i of the m anycast members, and t_2 possesses N − k
nodes and the remaining m − i anycast members. The event \{h_T(m) = j\}
equals the union over all possible sizes N_1 = k and subgroups m_1 = i of the
event \{h_{t_1}(i) = j-1\} \cap \{h_{t_2}(m-i) \ge j\} and the event \{h_{t_1}(i) > j-1\} \cap \{h_{t_2}(m-i) = j\},

\{h_T(m) = j\} = \cup_k \cup_i \left(\{h_{t_1}(i) = j-1\} \cap \{h_{t_2}(m-i) \ge j\}\right) \cup \left(\{h_{t_1}(i) > j-1\} \cap \{h_{t_2}(m-i) = j\}\right)

Because h_N(0) is meaningless, the relation must be modified for the case
i = 0 to

\{h_T(m) = j\} = \{h_{t_2}(m) = j\}

and for the case i = m to

\{h_T(m) = j\} = \{h_{t_1}(m) = j-1\}

This decomposition holds for any URT T_1 and T_2, not only for the specific
ones t_1 and t_2. The transition towards probabilities becomes

\Pr[h_T(m) = j] = \sum_{\text{all } t_1, t_2, k, i} \left(\Pr[h_{t_1}(i) = j-1]\,\Pr[h_{t_2}(m-i) \ge j] + \Pr[h_{t_1}(i) > j-1]\,\Pr[h_{t_2}(m-i) = j]\right) \Pr[N_1 = k, m_1 = i]

Since \Pr[N_1 = k] = \frac{1}{N-1} and, given the subtree sizes, the m members are
spread hypergeometrically, \Pr[N_1 = k, m_1 = i] = \frac{\binom{k}{i}\binom{N-k}{m-i}}{(N-1)\binom{N}{m}}, so that

\Pr[h_N(m) = j] = \sum_{k=1}^{N-1}\sum_{i=1}^{m-1} \frac{\binom{k}{i}\binom{N-k}{m-i}}{(N-1)\binom{N}{m}} \left(\Pr[h_k(i) = j-1] \sum_{q=j}^{N-k-1} \Pr[h_{N-k}(m-i) = q] + \Pr[h_k(i) > j-1]\,\Pr[h_{N-k}(m-i) = j]\right)
+ \sum_{k=1}^{N-1} \frac{\binom{N-k}{m}}{(N-1)\binom{N}{m}}\,\Pr[h_{N-k}(m) = j] + \sum_{k=1}^{N-1} \frac{\binom{k}{m}}{(N-1)\binom{N}{m}}\,\Pr[h_k(m) = j-1]    (18.11)

This recursion (18.11) is solved numerically for N = 20. The result is shown
in Fig. 18.4, which demonstrates that \Pr[h(m) > N-m] = 0, or that the
path with the longest hopcount to an anycast group of m members consists
of at most N − m links.
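The decomposition translates directly into a numerical routine, assuming the URT split law Pr[N_1 = k] = 1/(N−1) of Theorem 16.2.1 and a hypergeometric spread of the members over the two subtrees. The sketch below (our own code, not the book's; adequate for the small N used here) returns the full pmf of h_N(m):

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def urt_pmf(N, m):
    """(Pr[h_N(m) = j])_{j=0..N-1} in a uniform recursive tree, 1 <= m <= N."""
    if N == 1:
        return (1.0,)              # the single member sits at the root
    p = [0.0] * N
    for k in range(1, N):          # |T1| = k, reached through one extra hop
        w = 1.0 / ((N - 1) * comb(N, m))
        for i in range(0, m + 1):  # i members fall into T1
            c = comb(k, i) * comb(N - k, m - i)
            if c == 0:
                continue
            if i == 0:             # all members in T2: h = h_{T2}(m)
                for j, pj in enumerate(urt_pmf(N - k, m)):
                    p[j] += w * c * pj
            elif i == m:           # all members in T1: h = 1 + h_{T1}(m)
                for j, pj in enumerate(urt_pmf(k, m)):
                    p[j + 1] += w * c * pj
            else:
                q1, q2 = urt_pmf(k, i), urt_pmf(N - k, m - i)
                for j in range(N):
                    # h = j needs {h_{T1} = j-1, h_{T2} >= j} or
                    # {h_{T1} >= j, h_{T2} = j}, cf. the event decomposition
                    a = q1[j - 1] if 1 <= j <= len(q1) else 0.0
                    b = q2[j] if j < len(q2) else 0.0
                    p[j] += w * c * (a * sum(q2[j:]) + sum(q1[j:]) * b)
    return tuple(p)
```

The returned pmf reproduces Pr[h_N(m) = 0] = m/N and vanishes beyond j = N − m, in line with Fig. 18.4.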
Since there are (N−1)! possible recursive trees (Theorem 16.2.2) and
there is only one line tree with N − 1 hops, in which each node has precisely
one child node, the probability of having precisely N − 1 hops from the root is
\frac{1}{(N-1)!} (which also is \Pr[h_N = N-1] given in (16.8)). The longest possible
hopcount from a root to m anycast members occurs in the line tree where
all m anycast members occupy the last m positions. Hence, the probability
equals

\Pr[h_N(m) = N-m] = \frac{1}{(N-1)!\,\binom{N}{m}} = \frac{m!\,(N-m)!}{(N-1)!\,N!}    (18.12)
Fig. 18.4. The pdf of h_N(m) in a URT with N = 20 nodes for all possible m, on a
logarithmic scale. Observe that \Pr[h_N(m) > N-m] = 0; this relation connects the
various curves to the value of m.
Figure 18.4 allows us to solve the server placement problem. For example, consider the scenario in which a network operator announces that
any user request will reach a server of the anycast group in no more than
j = 4 hops in 99.9% of the cases. Assuming his network has N = 20 routers
and the shortest path tree is a URT, the network operator has to compute
the number of anycast servers m to place, uniformly spread over the
N = 20 routers, by solving \Pr[h_{20}(m) > 4] < 10^{-3}. Figure 18.4 shows that
the intersection of the line j = 4 and the level \Pr[h_{20}(m) = 4] = 10^{-3} is the
curve for m = 7. Since the curves for m \le 7 are exponentially decreasing,
\Pr[h_{20}(m) > 4] is safely¹ approximated by \Pr[h_{20}(m) = 4], which leads to
the placing of m = 7 servers. When following the line j = 4, we also observe
that the curves for m = 5, 6, 7, 8 lie near to that of m = 7. This means that

¹ More precisely, since \Pr[h_{20}(4) > 4] = 0.00106 and \Pr[h_{20}(5) > 4] = 0.00032, only m = 5
servers are sufficient.
(a) The recursion (18.11) is consistent with \Pr[h_N(m) = 0] = \frac{m}{N}.
Setting j = 0, only the terms containing \Pr[h_{N-k}(m-i) = 0] = \frac{m-i}{N-k} survive, and

r = \frac{1}{(N-1)\binom{N}{m}} \sum_{k=1}^{N-1}\sum_{i=0}^{m-1} \frac{m-i}{N-k}\binom{k}{i}\binom{N-k}{m-i}
= \frac{1}{(N-1)\binom{N}{m}} \sum_{k=1}^{N-1}\sum_{i=0}^{m-1} \binom{k}{i}\binom{N-1-k}{m-1-i}
= \frac{1}{(N-1)\binom{N}{m}} \sum_{k=1}^{N-1} \binom{N-1}{m-1} = \frac{m}{N}

where Vandermonde's identity has been used in the second-to-last step.

(b) Observe that \Pr[h_N(N) = j] = 0 for j > 0.

(c) For m = 1,

\Pr[h_N(1) = j] = \frac{1}{N-1}\sum_{k=1}^{N-1} \frac{k}{N}\left(\Pr[h_k = j] + \Pr[h_k = j-1]\right)

Multiplying both sides by z^j and summing over all j leads to the recursion for
the generating function (16.6),

(N+1)\,\varphi_{N+1}(z) = (z+N)\,\varphi_N(z)
For m = 2, the recursion can be solved explicitly in terms of the Stirling
numbers of the first kind S_N^{(k)},

\Pr[h_N(2) = j] = \frac{2(-1)^{N-1-j} S_N^{(j+1)}}{N!} + \frac{2(-1)^{N-j}}{N!(N-1)}\sum_{k=1}^{j}\binom{2k-1}{k}(-1)^k S_N^{(k+j+1)}
+ \frac{2(-1)^{N-j}}{N!(N-1)}\sum_{k=0}^{j-1}\frac{2k+1}{k+j+1}\binom{k+j+1}{j}(-1)^k S_N^{(k+j+1)}    (18.13)

In van der Hofstad et al. (2002b) we have demonstrated that the covariance
between the number of nodes at levels r and j, for r \le j in the URT, follows from

E\!\left[X_N^{(r)} X_N^{(j)}\right] = \frac{(-1)^{N-1}}{(N-1)!}\sum_{k=0}^{r}\binom{2k+j-r}{k}(-1)^{k+j} S_N^{(k+j+1)} - \frac{2(-1)^{N-j}}{N!(N-1)}\sum_{k=1}^{j}\binom{2k-1}{k}(-1)^k S_N^{(k+j+1)}

With \binom{x}{2} = \frac{x(x-1)}{2}, averaging (18.6) for m = 2 over the level set expresses
\Pr[h_N(2) = j] in terms of E[X_N^{(j)}], E[(X_N^{(j)})^2] and E[X_N^{(j-1)} X_N^{(j)}], which
leads to (18.13). Similarly, the recursion (18.11) relates \Pr[h_N(m) = N-2] to
\Pr[h_{N-1}(m) = N-3] plus a polynomial correction in m. By substitution into the
recursion (18.11), one may verify these relations.
When each node of the tree independently hosts an anycast member with
probability p, the levels are independent and

\Pr\!\left[\bigcap_{l=0}^{j-1}\{m^{(l)} = 0\}\right] = (1-p)^{\sum_{l=0}^{j-1} X_N^{(l)}}

which implies that the probability that there are no servers in the tree is
(1-p)^N. Since in that case the hopcount is meaningless, we consider
the conditional probability (18.2) of the hopcount given that the level set
contains at least one server (which is denoted by \tilde h_N(m)),

\Pr\!\left[\tilde h_N(m) = j \mid L_N\right] = \frac{\left(1 - (1-p)^{X_N^{(j)}}\right)(1-p)^{\sum_{l=0}^{j-1} X_N^{(l)}}}{1 - (1-p)^N}

Thus,

\Pr\!\left[\tilde h_N(m) \le q \mid L_N\right] = \frac{1 - (1-p)^{\sum_{l=0}^{q} X_N^{(l)}}}{1 - (1-p)^N}

Finally, to avoid the knowledge of the entire level set L_N, we use E[X_N^{(l)}] =
N \Pr[h_N(1) = l] from (16.7) as the best estimate for each X_N^{(l)} and obtain
the approximate formula

\Pr\!\left[\tilde h_N(m) = j\right] \approx \frac{\left(1 - (1-p)^{E[X_N^{(j)}]}\right)(1-p)^{\sum_{l=0}^{j-1} E[X_N^{(l)}]}}{1 - (1-p)^N}    (18.14)

In the dotted lines in Fig. 18.5, we have added the approximate result for
the URT, where E[h_N(m)] is computed based on (18.14) but E[h_N(1)]
is computed exactly. For m = 1, the approximate analysis (18.14) is not
well suited: Fig. 18.5 illustrates this deviation in the fact that γ_{appr}(1) =
E[\tilde h_N(1)]/E[h_N(1)] < 1. For higher values of m we observe a fairly good
correspondence. We found that the probability (18.14) reasonably approximates the exact result plotted on a linear scale. Only the tail behavior (on
log-scale) and the case m = 1 deviate significantly. In summary, for the
URT, the approximation (18.14) for \Pr[h_N(m) = j] is much faster to compute than the exact recursion and it seems appropriate for the computation
of γ for m > 1. However, it is less adequate to solve the server placement
problem, which requires the tail values \Pr[h_N(m) > j].
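Conditional on a level set, the independent-placement model behind (18.14) is a one-pass computation. A minimal sketch (hypothetical names; the unconditional approximation follows by feeding the expected level sizes N Pr[h_N(1) = l] instead of the exact ones):

```python
def binomial_anycast_pmf(levels, p):
    """Pr[h~_N = j | L_N]: hopcount to the nearest server when every node
    independently hosts a server with probability p, conditioned on at
    least one server being present (the model behind (18.14)).

    levels[j] = X_N^(j); levels[0] = 1 is the root, N = sum(levels).
    """
    N = sum(levels)
    denom = 1 - (1 - p) ** N        # Pr[at least one server in the tree]
    pmf, miss = [], 1.0             # miss = (1-p)^(nodes closer than level j)
    for X_j in levels:
        hit = 1 - (1 - p) ** X_j    # at least one server at level j
        pmf.append(hit * miss / denom)
        miss *= (1 - p) ** X_j
    return pmf
```

The pmf telescopes to one by construction, which makes the routine a convenient sanity check against the exact recursion on a linear scale.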
Fig. 18.5. The performance measure γ for several sizes N of URTs as a function of
the ratio m/N. The fitted curves for N = 10, 20, 30 and 50 are approximately
−0.404 ln(m/N), −0.295 ln(m/N), −0.252 ln(m/N) and −0.210 ln(m/N), respectively.
A tree is said to grow exponentially in the number of nodes N with degree σ
if \lim_{j\to\infty} \left(X_N^{(j)}\right)^{1/j} = σ or, equivalently, if X_N^{(j)} grows as σ^j.
Without the limit concept, we cannot specify the precise conditions of exponential growth in a finite shortest path tree. If we assume in finite graphs
that X_N^{(j)} \approx α\,σ^j for j \le l, then \sum_{j=0}^{l} X_N^{(j)} = α\,\frac{σ^{l+1}-1}{σ-1} = N with 0 < α < 1. Indeed,
for σ > 1, the highest hopcount level l possesses by far the most nodes, since
σ^l dominates \frac{σ^{l+1}-1}{σ-1}, and this level cannot contain more than a fraction of the total number
of nodes.
We now present an order calculus to estimate γ for exponentially growing
trees based on relation (18.8). Let us denote x = \sum_{k=0}^{q} X_N^{(k)}; then

y = \frac{\binom{N-x}{m}}{\binom{N}{m}} = \prod_{j=0}^{m-1}\left(1 - \frac{x}{N-j}\right) \approx e^{-mx/N}

for large N, so that (18.8) becomes

E[h_N(m) \mid L_N] = (1 + o(1)) \sum_{q=0}^{l} \exp\!\left(-\frac{m}{N}\sum_{k=0}^{q} X_N^{(k)}\right) + \sum_{q=l+1}^{N-2} \exp\!\left(-\frac{m}{N}\sum_{k=0}^{q} X_N^{(k)}\right)

If there are only a few levels more than l, the last series is much smaller than
1 and can be omitted. Since the slowly varying sequence α_q is unknown, we
approximate α_q = α and, with \sum_{k=0}^{q} X_N^{(k)} of the order α σ^q,

\sum_{q=0}^{l} \exp\!\left(-\frac{mα}{N}σ^q\right) \approx \int_0^{l} \exp\!\left(-\frac{mα}{N}σ^u\right) du = \frac{1}{\log σ}\int_{mα/N}^{mα σ^l/N} \frac{e^{-u}}{u}\,du = \frac{1}{\log σ}\left(\log\frac{N}{mα} + O(1)\right)

where in the last step a series for the exponential integral (Abramowitz and
Stegun, 1968, Section 5.1.11) has been used. Since by definition γ = 1 for
m = 1, we finally arrive at

γ = \left(1 - \frac{\log m}{\log N}\right)(1 + o(1)) + O\!\left(\frac{1}{\log N}\right)

which supplies evidence for the conjecture γ \approx 1 - a\log m: exponentially
growing graphs possess a performance measure γ that decreases logarithmically in m, which is rather slow.
Measurement data in the Internet seem to support this log m scaling law.
Apart from the correspondence with figures in the work of Jamin et al.
(2001), Fig. 6 in Krishnan et al. (2000) shows that the relative measured
traffic flow reduction decreases logarithmically in the number of caches m.
Appendix A
Stochastic matrices

This appendix reviews the matrix theory used for Markov chains. In-depth analyses are found in the classical books by Gantmacher (1959a,b), Wilkinson (1965)
and Meyer (2000).

A.1 Eigenvalues and eigenvectors
1. The algebraic eigenproblem consists in the determination of the eigenvalues λ and the corresponding eigenvectors x of a matrix A, for which the
set of n homogeneous linear equations in n unknowns

A x = λ x    (A.1)

has a non-zero solution. A non-zero solution of (A.1) exists only if

\det(A - λI) = 0    (A.2)

which defines the characteristic polynomial of A,

c(λ) = \det(A - λI) = \sum_{k=0}^{n} c_k λ^k    (A.3)
In terms of the n (not necessarily distinct) roots λ_1, \ldots, λ_n, the characteristic
polynomial factors as

c(λ) = \prod_{k=1}^{n} (λ_k - λ)    (A.5)

Since c(λ) = \det(A - λI), it follows from (A.3) and (A.5) that for λ = 0

\det A = c_0 = \prod_{k=1}^{n} λ_k    (A.6)

while comparing the coefficients of λ^{n-1} yields

\sum_{k=1}^{n} λ_k = \mathrm{trace}(A)    (A.7)

For non-negative eigenvalues and non-negative weights α_k, the weighted
arithmetic–geometric mean inequality gives

\left(\prod_{k=1}^{n} λ_k^{α_k}\right)^{1/\sum_{j=1}^{n} α_j} \le \frac{\sum_{k=1}^{n} α_k λ_k}{\sum_{j=1}^{n} α_j}

and, in particular, with all α_k = 1,

(\det A)^{1/n} \le \frac{\mathrm{trace}(A)}{n}
To any eigenvalue λ, the set (A.1) has at least one non-zero eigenvector x.
Furthermore, if x is a non-zero eigenvector, so is kx for any scalar k ≠ 0.
Therefore, eigenvectors are often normalized; for instance, a probabilistic
eigenvector has the sum of its components equal to 1, or a norm \|x\|_1 = 1
as defined in (A.23). If the rank of A - λI is less than n − 1, there will
be more than one independent eigenvector. Precisely these cases seriously complicate
the eigenvalue problem. In the sequel, we omit the discussion of multiple
eigenvalues and refer to Wilkinson (1965).
2. The eigenproblem of the transpose A^T,

A^T y = λ y    (A.8)

has the same eigenvalues as (A.1), because the determinant of a matrix equals the
determinant of its transpose, \det(A^T - λI) = \det(A - λI), which shows
that the eigenvalues of A and A^T are the same. However, the eigenvectors
are, in general, different. Alternatively, we can write (A.8) as

y^T A = λ y^T    (A.9)

which identifies y as a left-eigenvector of A, while x in (A.1) is a right-eigenvector. Left-multiplying (A.1) for λ_k by y_j^T and right-multiplying (A.9)
for λ_j by x_k and subtracting gives

(λ_k - λ_j)\, y_j^T x_k = 0    (A.10)

since y_j^T x_k = x_k^T y_j holds. Hence, (A.10) expresses that the sets of left- and
right-eigenvectors are orthogonal, y_j^T x_k = 0, whenever λ_j \ne λ_k.
3. If A has n distinct eigenvalues, then the n eigenvectors are linearly independent and span the whole n-dimensional space. The proof is by reductio
ad absurdum. Assume that s is the smallest number of linearly dependent
eigenvectors, labelled by the first s indices. Linear dependence then
means that

\sum_{k=1}^{s} α_k x_k = 0    (A.11)

with all α_k \ne 0. Left-multiplying by A gives

\sum_{k=1}^{s} α_k λ_k x_k = 0    (A.12)

Subtracting λ_s times (A.11) from (A.12) eliminates x_s and, because all
eigenvalues are distinct, implies that there is a smaller
set of s − 1 linearly dependent eigenvectors. This contradicts the initial
hypothesis.
This important property has a number of consequences. First, it applies to
left- as well as to right-eigenvectors. Relation (A.10) then shows that the sets
of left- and right-eigenvectors, suitably normalized such that y_k^T x_k = 1, obey

Y^T X = I    (A.13)

or, the matrix Y^T is the inverse of the matrix X. Furthermore, for any
right-eigenvector (A.1) holds, rewritten in matrix form as

A X = X\,\mathrm{diag}(λ_k)    (A.14)

so that

X^{-1} A X = \mathrm{diag}(λ_k)    (A.15)

Thus, when the eigenvalues of A are distinct, there exists a similarity transform H^{-1} A H, with H = X, that reduces A to diagonal form. In many applications, similarity transforms are applied to simplify matrix problems. Observe that a
similarity transform preserves the eigenvalues because, if A x = λ x, then
λ H^{-1} x = H^{-1} A x = (H^{-1} A H)(H^{-1} x). The eigenvectors are transformed to
H^{-1} x.
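The diagonalization (A.14)–(A.15) and the invariance of the spectrum under a similarity transform are easy to verify numerically. A small NumPy sketch (the matrices are arbitrary illustrative examples, not from the text):

```python
import numpy as np

# a small matrix with distinct eigenvalues (arbitrary example)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, X = np.linalg.eig(A)     # right-eigenvectors are the columns of X
Y_T = np.linalg.inv(X)        # Y^T = X^{-1}, cf. (A.13)

# A = X diag(lambda_k) X^{-1}, the matrix form of (A.14)-(A.15)
A_rebuilt = X @ np.diag(lam) @ Y_T

# a similarity transform H^{-1} A H preserves the eigenvalues
H = np.array([[1.0, 2.0],
              [0.0, 1.0]])
lam_sim = np.linalg.eigvals(np.linalg.inv(H) @ A @ H)
```

Reassembling X diag(λ_k) X^{-1} recovers A, and the transformed matrix has the same spectrum, exactly as the algebra above states.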
When A has multiple eigenvalues, it may be impossible to reduce A to
a diagonal form by similarity transforms. Instead of a diagonal form, the
most compact form when A has r distinct eigenvalues, each with multiplicity
m_j such that \sum_{j=1}^{r} m_j = n, is the Jordan canonical form C,

C = \begin{pmatrix} C_{m_1}(λ_1) & & \\ & \ddots & \\ & & C_{m_r}(λ_r) \end{pmatrix}

built from Jordan submatrices with the eigenvalue on the diagonal and ones
on the first superdiagonal,

C_m(λ) = \begin{pmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{pmatrix}

The number of independent eigenvectors is equal to the number of submatrices. If an eigenvalue λ has multiplicity m, there can be one large
submatrix C_m(λ), but also a number k of smaller submatrices C_{b_j}(λ) such
that \sum_{j=1}^{k} b_j = m. This illustrates, as mentioned in art. 1, the much higher
complexity of the eigenproblem in case of multiple eigenvalues. For more
details we refer to Wilkinson (1965).
5. The companion matrix of the characteristic polynomial (A.3) of A is
defined as

C = \begin{pmatrix} (-1)^{n-1}c_{n-1} & (-1)^{n-1}c_{n-2} & \cdots & (-1)^{n-1}c_1 & (-1)^{n-1}c_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}

Expanding \det(C - λI) in cofactors of the first row yields \det(C - λI) =
c(λ). If A has distinct eigenvalues, A as well as C are similar to \mathrm{diag}(λ_i). It
has been shown that the similarity transform H for A equals H = X. The
similarity transform for C is the Vandermonde matrix V(λ), where

V(x) = \begin{pmatrix} x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \\ x_1^{n-2} & x_2^{n-2} & \cdots & x_n^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}
and the columns of V(x) are linearly independent provided all x_i are
distinct. Furthermore,

V(λ)\,\mathrm{diag}(λ_i) = \begin{pmatrix} λ_1^{n} & λ_2^{n} & \cdots & λ_n^{n} \\ λ_1^{n-1} & λ_2^{n-1} & \cdots & λ_n^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ λ_1^{2} & λ_2^{2} & \cdots & λ_n^{2} \\ λ_1 & λ_2 & \cdots & λ_n \end{pmatrix}    (A.16)

while C\,V(λ) equals the same matrix: the rows below the first simply shift
the rows of V(λ) upwards, and the first row of C\,V(λ) has entries
(-1)^{n-1}\sum_{k=0}^{n-1} c_k λ_j^k = λ_j^{n}, because c(λ_j) = 0 and c_n = (-1)^n. Hence
C\,V(λ) = V(λ)\,\mathrm{diag}(λ_i), so that V(λ) indeed diagonalizes C.

6. Since A = X\,\mathrm{diag}(λ_i)\,X^{-1} when the eigenvalues are distinct, any
polynomial in A is diagonalized by the same transform; in particular,
c(A) = X\,\mathrm{diag}(c(λ_i))\,X^{-1} with every c(λ_i) = 0, so that

c(A) = O    (A.17)

This result is the Cayley–Hamilton theorem: every matrix satisfies its own
characteristic polynomial. There exist several other proofs
of the Cayley–Hamilton theorem.
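The Cayley–Hamilton theorem can be checked numerically on any example matrix; the sketch below (our own illustration) obtains the characteristic-polynomial coefficients with np.poly and evaluates c(A) by Horner's rule:

```python
import numpy as np

# arbitrary example matrix
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])

# monic characteristic polynomial coefficients, highest power first
coeffs = np.poly(A)

# Horner evaluation of c(A); Cayley-Hamilton says this is the zero matrix
C = np.zeros_like(A)
for c in coeffs:
    C = C @ A + c * np.eye(3)
```

Up to floating-point rounding, the Horner loop returns the zero matrix, which is exactly the statement c(A) = O.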
7. Consider an arbitrary matrix polynomial in λ,

F(λ) = \sum_{k=0}^{m} F_k λ^k
Both the left-quotient and left-remainder in F(λ) = B(λ)Q_L(λ) + L(λ), and the
right-quotient and right-remainder in F(λ) = Q_R(λ)B(λ) + R(λ), are unique. Let us
concentrate on the right-remainder in the case where B(λ) = λI - A is a
linear polynomial in λ. Using Euclid's division scheme for polynomials,

F(λ) = F_m λ^{m-1}(λI - A) + (F_m A + F_{m-1})λ^{m-1} + \sum_{k=0}^{m-2} F_k λ^k = \cdots = \left(\sum_{k=1}^{m} λ^{k-1}\sum_{j=k}^{m} F_j A^{j-k}\right)(λI - A) + \sum_{j=0}^{m} F_j A^j

In summary, F(λ) = Q_R(λ)(λI - A) + R(λ) (and similarly for the left-quotient and left-remainder) with

Q_R(λ) = \sum_{k=1}^{m} λ^{k-1}\sum_{j=k}^{m} F_j A^{j-k}, \qquad Q_L(λ) = \sum_{k=1}^{m} λ^{k-1}\sum_{j=k}^{m} A^{j-k} F_j
R(λ) = \sum_{j=0}^{m} F_j A^{j} = F(A), \qquad L(λ) = \sum_{j=0}^{m} A^{j} F_j    (A.18)

where the right-remainder is independent of λ. The Generalized Bézout
theorem states that the polynomial F(λ) is divisible by (λI - A) on the
right (left) if and only if F(A) = O (L(λ) = O).

By the Generalized Bézout theorem, the polynomial F(λ) = g(λ)I - g(A)
is divisible by (λI - A), because F(A) = g(A)I - g(A) = O. If F(λ) is an
ordinary polynomial, the right- and left-quotient and remainder are equal.
The Cayley–Hamilton theorem (A.17) states that c(A) = O, which indicates
that c(λ)I = Q(λ)(λI - A) and also c(λ)I = (λI - A)Q(λ). The matrix
Q(λ) = (λI - A)^{-1} c(λ) is called the adjoint matrix of A. Explicitly, from
(A.18),

Q(λ) = \sum_{k=1}^{n} λ^{k-1}\left(\sum_{j=k}^{n} c_j A^{j-k}\right)

and, with (A.6), Q(0) = -A^{-1}\det A = \sum_{j=1}^{n} c_j A^{j-1}. The main theoretical interest of the adjoint matrix stems from its definition c(λ)I = (λI - A)Q(λ).
Consider an arbitrary polynomial of degree l,

g(x) = g_0 \prod_{j=1}^{l} (x - μ_j)

Substitute x by A; then

g(A) = g_0 \prod_{j=1}^{l} (A - μ_j I)

and taking the determinant yields, with \det(A - μ_j I) = c(μ_j),

\det(g(A)) = g_0^{n} \prod_{j=1}^{l} \det(A - μ_j I) = g_0^{n} \prod_{j=1}^{l} c(μ_j)

With (A.5),

\det(g(A)) = g_0^{n} \prod_{j=1}^{l}\prod_{k=1}^{n} (λ_k - μ_j) = \prod_{k=1}^{n} g_0\prod_{j=1}^{l} (λ_k - μ_j) = \prod_{k=1}^{n} g(λ_k)

If h(x) = g(x) - λ, we arrive at the general result: for any polynomial g(x),
the eigenvalues of g(A) are g(λ_1), \ldots, g(λ_n) and the characteristic
polynomial of g(A) is

\det(g(A) - λI) = \prod_{k=1}^{n} (g(λ_k) - λ)    (A.19)
Since this result holds for an arbitrary polynomial, it should not surprise that, under appropriate conditions of convergence, it can be extended to infinite polynomials, in particular
to the Taylor series of a complex function. As proved in Gantmacher (1959a,
Chapter V), if the power series of a function f(z) around z = z_0,

f(z) = \sum_{j=0}^{\infty} f_j(z_0)\,(z - z_0)^j    (A.20)

converges for all z in the disc |z - z_0| < R, then f(A) = \sum_{j=0}^{\infty} f_j(z_0)\,(A - z_0 I)^j,
provided all eigenvalues of A lie within the region of convergence of (A.20),
i.e. |λ - z_0| < R. For example,

e^{Az} = \sum_{k=0}^{\infty} \frac{z^k A^k}{k!} \quad \text{for all } A

\log A = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}(A - I)^k \quad \text{for } |λ_k - 1| < 1, \text{ all } 1 \le k \le n

and, from (A.19), the eigenvalues of e^{Az} are e^{zλ_1}, \ldots, e^{zλ_n}. Hence, the knowledge of the eigenstructure of a matrix A allows us to compute any function
of A (under the same convergence restrictions as for complex numbers z).
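Both expressions for e^{Az}, the eigenstructure route and the defining power series, are easy to compare numerically. A sketch assuming distinct eigenvalues, so that the eigenvector matrix is invertible (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2, distinct
z = 0.5

# route 1: e^{Az} = X diag(e^{z lambda_k}) X^{-1}
lam, X = np.linalg.eig(A)
expAz_eig = (X @ np.diag(np.exp(z * lam)) @ np.linalg.inv(X)).real

# route 2: the defining power series sum_k (zA)^k / k!
expAz_ser = np.eye(2)
term = np.eye(2)
for k in range(1, 30):
    term = term @ (z * A) / k
    expAz_ser = expAz_ser + term
```

Thirty series terms already agree with the spectral computation to many digits here; for matrices with large entries the series needs scaling tricks, which is precisely why the eigenstructure route is attractive.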
A Hermitian matrix A is a complex matrix that obeys A^H = \overline{A}^{\,T} = A,
where \overline{a_{ij}} is the complex conjugate of a_{ij}. Hermitian matrices
possess a number of attractive properties. A particularly interesting subclass
of Hermitian matrices are the real symmetric matrices that obey A^T = A. The
inner product of vectors y and x is defined as y^H x and obeys \overline{y^H x} =
x^H y. The inner product x^H x = \sum_{j=1}^{n} |x_j|^2 is real and positive
for all vectors except for the null vector.

9. The eigenvalues of a Hermitian matrix are all real. Indeed, left-multiplying (A.1) by x^H yields

x^H A x = λ\,x^H x

and, since \overline{x^H A x} = x^H A^H x = x^H A x, the left-hand side is real; as
x^H x > 0, so is λ.
Moreover, the eigenvectors of a Hermitian matrix can be chosen orthonormal,

x_k^H x_j = δ_{kj}    (A.21)

where δ_{kj} is the Kronecker delta, which is zero if k \ne j and else δ_{kk} = 1.
Consequently, (A.13) reduces to

X^H X = I

which implies that the matrix X formed by the eigenvectors is a unitary
matrix (X^{-1} = X^H). For a real symmetric matrix A, the corresponding
relation X^T X = I implies that X is an orthogonal matrix (X^{-1} = X^T).
Although the arguments so far (see Section A.1) have assumed that the
eigenvalues of A are distinct, the theorem applies in general (as proved in
Wilkinson (1965, Section 47)): for any Hermitian matrix A, there exists a
unitary matrix U such that

U^H A U = \mathrm{diag}(λ_j), \quad \text{with real } λ_j

and for any real symmetric matrix A, there exists an orthogonal matrix U
such that

U^T A U = \mathrm{diag}(λ_j), \quad \text{with real } λ_j

With x = Uz, a real quadratic form transforms to

x^T A x = \sum_{k=1}^{n} λ_k z_k^2    (A.22)

which is only positive for all z provided λ_k > 0 for all k. From (A.6),
a positive definite quadratic form x^T A x possesses a positive determinant,
\det A > 0. This analysis shows that the problem of determining an orthogonal matrix U (or the eigenvectors of A) is equivalent to the geometrical
problem of determining the principal axes of the hyper-ellipsoid

\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\,x_i x_j = 1

Relation (A.22) illustrates that the eigenvalues λ_k are the inverse squares of the
principal semi-axes. A multiple eigenvalue refers to an indeterminacy of the principal axes. For example, if n = 3, an ellipsoid with two equal principal axes
means that any section along the third axis is a circle. Any two perpendicular diameters of the largest circle orthogonal to the third axis are principal
axes of that ellipsoid.
The Hölder p-norm of a vector x is defined as

\|x\|_p = \left(\sum_{j=1}^{n} |x_j|^p\right)^{1/p}    (A.23)

and obeys the Hölder inequality, valid for \frac{1}{p} + \frac{1}{q} = 1 with p, q \ge 1,

|x^H y| \le \|x\|_p\,\|y\|_q    (A.24)

More generally, with non-negative weights α_j, the weighted power mean

\left(\frac{\sum_{j=1}^{n} α_j |x_j|^{s}}{\sum_{j=1}^{n} α_j}\right)^{1/s}

is non-decreasing in s. For α_j = 1, the weights disappear, such that the inequality for the Hölder
q-norm becomes

\|x\|_q \le \|x\|_s \le n^{\frac{1}{s} - \frac{1}{q}}\,\|x\|_q, \quad 0 < s \le q    (A.25)

Indeed, since y_j = \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \le 1 and \frac{q}{s} \ge 1 imply y_j^{q/s} \le y_j,

\frac{\|x\|_q^{q}}{\|x\|_s^{q}} = \sum_{j=1}^{n}\left(\frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}}\right)^{q/s} \le \sum_{j=1}^{n} \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} = 1    (A.26)

which proves the first inequality in (A.25); the second follows from the
non-decreasing property of the power mean with weights α_j = 1.
For m × n matrices A, the most frequently used norms are the Euclidean
or Frobenius norm

\|A\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\right)^{1/2}    (A.27)

and the induced q-norm

\|A\|_q = \sup_{x \ne 0} \frac{\|Ax\|_q}{\|x\|_q}    (A.28)

Since \frac{\|Ax\|_q}{\|x\|_q} = \left\|A\,\frac{x}{\|x\|_q}\right\|_q, an equivalent definition is

\|A\|_q = \sup_{\|x\|_q = 1} \|Ax\|_q    (A.29)

which shows that, for any x,

\|Ax\|_q \le \|A\|_q\,\|x\|_q    (A.30)

Since the vector norm is a continuous function of the vector components and
since the domain \|x\|_q = 1 is closed, there must exist a vector x for which
equality \|Ax\|_q = \|A\|_q\|x\|_q holds. Since the i-th vector component of Ax
is (Ax)_i = \sum_{j=1}^{n} a_{ij}x_j, it follows from (A.23) that

\|Ax\|_q = \left(\sum_{i=1}^{m}\left|\sum_{j=1}^{n} a_{ij}x_j\right|^{q}\right)^{1/q}
For q = 1 and \|x\|_1 = 1,

\|Ax\|_1 = \sum_{i=1}^{m}\left|\sum_{j=1}^{n} a_{ij}x_j\right| \le \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|\,|x_j| = \sum_{j=1}^{n} |x_j|\sum_{i=1}^{m} |a_{ij}| \le \left(\sum_{j=1}^{n} |x_j|\right)\max_{j}\sum_{i=1}^{m} |a_{ij}| = \max_{j}\sum_{i=1}^{m} |a_{ij}|

Clearly, there exists a vector x for which equality holds; namely, if k is the
column in A with maximum absolute sum, then x = e_k, the k-th basis vector
with all components zero, except for the k-th one, which is 1. Similarly, for
all x with \|x\|_\infty = 1,

\|Ax\|_\infty = \max_{i}\left|\sum_{j=1}^{n} a_{ij}x_j\right| \le \max_{i}\sum_{j=1}^{n} |a_{ij}|\,|x_j| \le \max_{i}\sum_{j=1}^{n} |a_{ij}|

Again, if r is the row with maximum absolute sum and x_j = \mathrm{sign}(a_{rj}),
such that \|x\|_\infty = 1, then (Ax)_r = \sum_{j=1}^{n} |a_{rj}| = \max_{i}\sum_{j=1}^{n} |a_{ij}| = \|Ax\|_\infty.
Hence, we have proved that

\|A\|_\infty = \max_{i}\sum_{j=1}^{n} |a_{ij}|    (A.31)

\|A\|_1 = \max_{j}\sum_{i=1}^{m} |a_{ij}|    (A.32)

from which

\|A^H\|_1 = \|A\|_\infty

For q = 2, the norm follows from the largest eigenvalue λ_1 of the Hermitian
matrix A^H A,

\|A\|_2 = \sup_{x \ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \sqrt{λ_1}    (A.33)
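The norm formulas (A.31)–(A.33) and the eigenvalue bound (A.37) below can be illustrated numerically (arbitrary example matrix):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

norm_1 = max(np.abs(A).sum(axis=0))     # max absolute column sum, (A.32)
norm_inf = max(np.abs(A).sum(axis=1))   # max absolute row sum, (A.31)
# sqrt of the largest eigenvalue of A^T A, (A.33)
norm_2 = np.sqrt(max(np.linalg.eigvalsh(A.T @ A)))
# every eigenvalue is bounded by every induced norm, (A.37)
spectral_radius = max(abs(np.linalg.eigvals(A)))
```

The hand-computed column/row sums coincide with NumPy's built-in induced norms, the spectral radius stays below each of them, and the bound \|A\|_2^2 \le \|A\|_1\|A\|_\infty derived below holds as well.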
Similarly, the smallest eigenvalue λ_n of A^H A determines

\sqrt{λ_n} = \min_{\|x\|_2 = 1} \|Ax\|_2    (A.34)

For a Hermitian matrix A, the extreme eigenvalues obey the Rayleigh relations

λ_1 = \sup_{x \ne 0}\frac{x^H A x}{x^H x}, \qquad λ_n = \inf_{x \ne 0}\frac{x^H A x}{x^H x}    (A.35)

Furthermore, the Frobenius norm (A.27) satisfies

\|A\|_F^{2} = \mathrm{trace}(A^H A) = \sum_{k=1}^{n} λ_k(A^H A)    (A.36)

so that \|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2, where the upper bound \sqrt{n}\,\|A\|_2 may be attained.

(a) Since A^k = A A^{k-1}, we have \|A^k\| \le \|A\|\,\|A^{k-1}\| and, by induction, for any
integer k,

\|A^k\| \le \|A\|^{k} \quad \text{and} \quad \lim_{k\to\infty} A^k = 0 \text{ if } \|A\| < 1

(b) By taking the norm of the eigenvalue equation (A.1), \|Ax\| = |λ|\,\|x\|,
and with (A.30),

|λ| \le \|A\|_q    (A.37)
for every eigenvalue λ of A and every induced norm. Applied to A^H A, whose
largest eigenvalue equals \|A\|_2^{2} by (A.33), this gives \|A\|_2^{2} \le \|A^H A\|_q \le \|A^H\|_q\,\|A\|_q.
Choosing q = 1 and using \|A^H\|_1 = \|A\|_\infty,

\|A\|_2^{2} \le \|A^H\|_1\,\|A\|_1 = \|A\|_\infty\,\|A\|_1

(c) Any matrix A can be transformed by a similarity transform H to
a Jordan canonical form C (art. 4) as A = HCH^{-1}, from which A^k =
HC^kH^{-1}. The entries of a typical Jordan submatrix power (C_m(λ))^k are of the
form \binom{k}{i}λ^{k-i} with 0 \le i < m, i.e. polynomials in k times powers of λ. Hence, for large k, A^k \to 0 if and only if |λ| < 1 for all
eigenvalues.
A stochastic matrix P is reducible if there exists a relabeling of the states
that gives P the block form

\tilde P = \begin{pmatrix} P_1 & B \\ O & P_2 \end{pmatrix}

where P_1 and P_2 are square matrices. Relabeling amounts to permuting rows
and columns in the same fashion. Thus, there exists a similarity transform
(a permutation matrix) H such that P = H\tilde P H^{-1}.

Partition the vectors as

v = \begin{pmatrix} v_1 \\ 0 \end{pmatrix} \quad \text{and} \quad z = \begin{pmatrix} z_1 \\ 0 \end{pmatrix}, \quad \text{where } v_1 > 0,\ z_1 > 0

and partition P conformally as

P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}

Then

P\begin{pmatrix} v_1 \\ 0 \end{pmatrix} = \begin{pmatrix} P_{11}v_1 \\ P_{21}v_1 \end{pmatrix}

Since P is irreducible, P_{21} \ne O, such that v_1 > 0 implies that P_{21}v_1 \ne 0,
which proves the lemma.
The theorem proved for stochastic matrices is a special case of the famous
Frobenius theorem for non-negative matrices (for a proof, see e.g. Gantmacher (1959b)).
A useful determinant identity for a rank-one update is

\det(I + c\,d^T) = 1 + d^T c    (A.38)

which follows, after taking the determinant, from the matrix identity

\begin{pmatrix} I & 0 \\ d^T & 1 \end{pmatrix}\begin{pmatrix} I + c\,d^T & c \\ 0 & 1 \end{pmatrix}\begin{pmatrix} I & 0 \\ -d^T & 1 \end{pmatrix} = \begin{pmatrix} I & c \\ 0 & 1 + d^T c \end{pmatrix}

Applied to \tilde P = P + u\,v^T, where u is the all-one right-eigenvector of the
stochastic matrix P (Pu = u), it gives

\det(\tilde P - λI) = \det(P - λI)\left(1 + v^T(P - λI)^{-1}u\right) = \det(P - λI)\left(1 + \frac{v^T u}{1 - λ}\right)

Invoking (A.19) yields

\det(\tilde P - λI) = \left(1 + v^T u - λ\right)\prod_{k=2}^{N}(λ_k - λ)

which shows that the eigenvalues of \tilde P are \{1 + v^T u, λ_2, λ_3, \ldots, λ_N\}: a
rank-one update along the Perron vector u only shifts the eigenvalue λ_1 = 1.
The same technique applies to the bordered matrix

\bar P = \begin{pmatrix} P & (1-α)\,u \\ v^T & 0 \end{pmatrix}

with corresponding eigenvalues \{1, λ_2, λ_3, \ldots, λ_n, 0\}. This result is similarly
proved as Lemma A.4.4 using (Meyer, 2000, p. 475)

\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det A\,\det\!\left(D - CA^{-1}B\right)    (A.39)

provided A^{-1} exists (while for C = 0, the determinant equals \det A\,\det D).
A.4.2 Example: the two-state Markov chain
The two-state Markov chain is defined by
$$P = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix}$$
Observe that $\det P = 1-p-q$. The eigenvalues of $P$ satisfy the characteristic polynomial $c(\lambda) = \lambda^2 - (2-p-q)\lambda + \det P = 0$, from which $\lambda_1 = 1$ and $\lambda_2 = 1-p-q = \det P$. The adjoint matrix $C(\lambda)$ is computed (art. 7) via the polynomial
$$\frac{c(\lambda) - c(\mu)}{\lambda - \mu} = \lambda + \mu - (2-p-q)$$
and, after $\lambda \to \lambda I$ and $\mu \to P$,
$$C(\lambda) = \lambda I + P - (2-p-q)I = \begin{bmatrix} \lambda-1+q & p \\ q & \lambda-1+p \end{bmatrix}$$
The (unscaled) right- (left-) eigenvectors of $P$ follow as the non-zero columns (rows) of $C(\lambda)$. For $\lambda_1 = 1$, we find $x_1 = (1, 1)^T$ and $y_1^T = (q, p)$. For $\lambda_2 = 1-p-q$, the eigenvector $x_2 = (p, -q)^T$ and $y_2^T = (1, -1)$. Normalization (art. 4) requires that $y_k^Tx_k = 1$, or $x_1 = \frac{1}{p+q}(1, 1)^T$ and $x_2 = \frac{1}{p+q}(p, -q)^T$. If the eigenvalues are distinct ($p + q \ne 0$), the matrix $P$ can be written as (art. 4) $P = X\,\mathrm{diag}(\lambda_k)\,Y^T$,
$$P = \frac{1}{p+q}\begin{bmatrix} 1 & p \\ 1 & -q \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1-p-q \end{bmatrix}\begin{bmatrix} q & p \\ 1 & -1 \end{bmatrix}$$
from which any power $P^k$ is immediate as
$$P^k = \frac{1}{p+q}\begin{bmatrix} 1 & p \\ 1 & -q \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & (1-p-q)^k \end{bmatrix}\begin{bmatrix} q & p \\ 1 & -1 \end{bmatrix} = \frac{1}{p+q}\begin{bmatrix} q & p \\ q & p \end{bmatrix} + \frac{(1-p-q)^k}{p+q}\begin{bmatrix} p & -p \\ -q & q \end{bmatrix} \tag{A.40}$$
Hence,
$$P^\infty = \frac{1}{p+q}\begin{bmatrix} q & p \\ q & p \end{bmatrix} \tag{A.41}$$
because $|1-p-q| < 1$.
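The closed form (A.40) and its limit (A.41) are easy to verify numerically; a minimal sketch, where the values of p and q are arbitrary illustrative choices:

```python
import numpy as np

p, q = 0.3, 0.2
P = np.array([[1 - p, p],
              [q, 1 - q]])

k = 8
# (A.40): P^k = (1/(p+q)) [[q, p], [q, p]] + ((1-p-q)^k/(p+q)) [[p, -p], [-q, q]]
Pk_formula = (np.array([[q, p], [q, p]])
              + (1 - p - q) ** k * np.array([[p, -p], [-q, q]])) / (p + q)
Pk_direct = np.linalg.matrix_power(P, k)
assert np.allclose(Pk_formula, Pk_direct)

# (A.41): every row of P^k tends to the steady-state vector (q, p)/(p+q)
Pinf = np.linalg.matrix_power(P, 200)
print(Pinf.round(6))   # both rows approach [0.4, 0.6] for p=0.3, q=0.2
```

Because $|1-p-q| = 0.5 < 1$ here, the second term of (A.40) decays geometrically and only the rank-one steady-state part survives.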
Alternatively, the steady-state vector is a solution of (9.25),
$$\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix}$$
Applying Cramer's rule with $G = \det\begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix} = -(p+q)$, we obtain
$$\pi_1 = G^{-1}\det\begin{bmatrix} 0 & q \\ 1 & 1 \end{bmatrix} \qquad\text{and}\qquad \pi_2 = G^{-1}\det\begin{bmatrix} -p & 0 \\ 1 & 1 \end{bmatrix}$$
or
$$\pi = \begin{bmatrix} \dfrac{q}{p+q} & \dfrac{p}{p+q} \end{bmatrix}$$
which indeed agrees with (A.41) and (9.37).
A.4.3 The tendency towards the steady-state
A stochastic matrix $P$ and the corresponding Markov chain is regular if the only eigenvalue with $|\lambda| = 1$ is $\lambda = 1$. It is fully regular if, in addition, $\lambda = 1$ is a simple zero of the characteristic polynomial of $P$. The Frobenius Theorem A.4.2 indicates that a regular matrix is necessarily irreducible. Application (c) in Section A.3.2 demonstrates that the steady-state only exists for regular Markov chains. Alternatively, a regular matrix $P$ has the property that $P^k > O$ for some $k$, i.e. all elements are strictly positive. In the sequel, we concentrate on fully regular stochastic matrices $P$, where all eigenvalues lie within the unit circle, except for the largest one, $\lambda = 1$. If the $N$ eigenvalues of the regular stochastic matrix $P$ are ordered as $\lambda_1 = 1 > |\lambda_2| \ge \cdots \ge |\lambda_N| \ge 0$, the second largest eigenvalue $\lambda_2$ determines the speed of convergence of the Markov chain towards the steady-state.
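That $|\lambda_2|$ sets the convergence rate can be observed directly; a sketch with an arbitrary illustrative 3-state chain (the matrix entries are not from the text):

```python
import numpy as np

# an arbitrary fully regular 3-state stochastic matrix (illustrative values)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

lam2 = np.sort(np.abs(np.linalg.eigvals(P)))[-2]   # second largest modulus
Pinf = np.linalg.matrix_power(P, 1000)             # numerical steady state

# the distance to the steady state shrinks by a factor |lambda_2| per step
d = [np.abs(np.linalg.matrix_power(P, n) - Pinf).max() for n in (20, 21, 22)]
ratios = [d[1] / d[0], d[2] / d[1]]
print(lam2, ratios)   # the ratios approach |lambda_2|
```

For this matrix $\lambda_2 = 0.4$, so roughly one decimal digit of accuracy is gained every $1/\log_{10}(1/0.4) \approx 2.5$ steps.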
A.4.3.1 Example: the three-state Markov chain
The three-state Markov chain $P$ is defined by (9.7) with $N = 3$. Assuming that $P$ is irreducible, we determine the eigenvalues. Since the Frobenius Theorem A.4.2 already determines one eigenvalue $\lambda_1 = 1$, the remaining two, $\lambda_2$ and $\lambda_3$, are found from (A.6) and (A.7). They obey the equations
$$\lambda_2\lambda_3 = \det P$$
$$\lambda_2 + \lambda_3 = P_{11} + P_{22} + P_{33} - 1 = \mathrm{trace}(P) - 1$$
from which
$$\lambda_{2,3} = \frac{\mathrm{trace}(P)-1}{2} \pm \sqrt{\left(\frac{\mathrm{trace}(P)-1}{2}\right)^2 - \det P}$$
If, in addition, $P$ is doubly stochastic, i.e. each column sum $\sum_{k=1}^{N}P_{kj} = 1$ as well, the steady-state vector is uniform, $\pi = \left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right)$. The example in Section A.4.3 illustrates that a steady-state vector equal to the uniform vector arises in the Markov chain of the random walk and the birth and death process. Moreover, the eigenstructure of the tri-diagonal Toeplitz matrix $A$ can be expressed in analytic form.
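The trace/determinant relations above give the eigenvalues of any 3-state chain without a general eigensolver; a sketch, with an arbitrary illustrative stochastic matrix:

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3],
              [0.1, 0.5, 0.4]])   # illustrative irreducible 3-state chain

s = np.trace(P) - 1.0             # lambda_2 + lambda_3
d = np.linalg.det(P)              # lambda_2 * lambda_3 (since lambda_1 = 1)
disc = (s / 2) ** 2 - d
lam23 = [s / 2 + np.sqrt(disc), s / 2 - np.sqrt(disc)]

ref = np.sort(np.linalg.eigvals(P).real)[::-1]    # [1, lambda_2, lambda_3]
assert np.allclose(lam23, ref[1:])
print(lam23)
```

The discriminant is positive here, so $\lambda_2$ and $\lambda_3$ are real; a negative discriminant would produce a complex conjugate pair, which (A.6)-(A.7) also permit.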
The eigenvalue equation of the $N\times N$ tri-diagonal Toeplitz matrix with diagonal entries $b$, super-diagonal entries $a$ and sub-diagonal entries $c$ reads, for the components $x_k$ of an eigenvector,
$$a\,x_{k+2} + (b-\lambda)\,x_{k+1} + c\,x_k = 0, \qquad 0 \le k \le N-2$$
$$c\,x_{N-1} + (b-\lambda)\,x_N = 0$$
We assume that $a \ne 0$ and $c \ne 0$ and rewrite the set, with $x_0 = x_{N+1} = 0$, as
$$x_{k+2} + \frac{b-\lambda}{a}\,x_{k+1} + \frac{c}{a}\,x_k = 0, \qquad 0 \le k \le N-1$$
which is a second order difference equation with constant coefficients. The general solution of these equations is $x_k = \alpha\,r_1^k + \beta\,r_2^k$, where
$$r_1 + r_2 = \frac{\lambda-b}{a}, \qquad r_1r_2 = \frac{c}{a}$$
The constants $\alpha$ and $\beta$ follow from the boundary requirement $x_0 = x_{N+1} = 0$ as
$$\alpha + \beta = 0, \qquad \alpha\,r_1^{N+1} + \beta\,r_2^{N+1} = 0$$
from which $\left(\frac{r_1}{r_2}\right)^{N+1} = 1$, or $\frac{r_1}{r_2} = e^{\frac{2\pi im}{N+1}}$, such that
$$r_1 = \sqrt{\frac{c}{a}}\,e^{\frac{i\pi m}{N+1}} \qquad\text{and}\qquad r_2 = \sqrt{\frac{c}{a}}\,e^{-\frac{i\pi m}{N+1}}$$
The first root equation is only possible for special values $\lambda = \lambda_m$ with $1 \le m \le N$, which are the eigenvalues,
$$\lambda_m = b + a\sqrt{\frac{c}{a}}\left(e^{\frac{i\pi m}{N+1}} + e^{-\frac{i\pi m}{N+1}}\right) = b + 2\sqrt{ac}\,\cos\frac{\pi m}{N+1}$$
Since there are precisely $N$ different values of $m$, there are $N$ distinct eigenvalues $\lambda_m$. The components $x_k$ of the eigenvector belonging to $\lambda_m$ are
$$x_k = \alpha\left(\frac{c}{a}\right)^{\frac{k}{2}}\left(e^{\frac{i\pi mk}{N+1}} - e^{-\frac{i\pi mk}{N+1}}\right) = 2i\alpha\left(\frac{c}{a}\right)^{\frac{k}{2}}\sin\frac{\pi mk}{N+1}$$
The scaling constant follows from the normalization $\|x\|_1 = 1$, or
$$2i\alpha\sum_{k=1}^{N}\left(\frac{c}{a}\right)^{\frac{k}{2}}\sin\frac{\pi mk}{N+1} = 1$$
Since $\sin\frac{\pi mk}{N+1} = \mathrm{Im}\left[e^{\frac{i\pi mk}{N+1}}\right]$, we have, with $w = \sqrt{\frac{c}{a}}\,e^{\frac{i\pi m}{N+1}}$ and $w^{N+1} = (-1)^m\left(\frac{c}{a}\right)^{\frac{N+1}{2}}$,
$$\sum_{k=1}^{N}\left(\frac{c}{a}\right)^{\frac{k}{2}}\sin\frac{\pi mk}{N+1} = \mathrm{Im}\left[\sum_{k=1}^{N}w^k\right] = \mathrm{Im}\left[\frac{w - w^{N+1}}{1-w}\right] = \frac{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\frac{\pi m}{N+1}}{1 - 2\sqrt{\frac{c}{a}}\cos\frac{\pi m}{N+1} + \frac{c}{a}}$$
Finally, the components $x_k$ of the eigenvector $x$ belonging to $\lambda_m$ become, for $1 \le k \le N$,
$$x_k = \left(\frac{c}{a}\right)^{\frac{k}{2}}\sin\frac{\pi mk}{N+1}\;\frac{1 - 2\sqrt{\frac{c}{a}}\cos\frac{\pi m}{N+1} + \frac{c}{a}}{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\frac{\pi m}{N+1}}$$
Observe that for stochastic matrices $a + b + c = 1$ (see the general random walk in Section 11.2) and for the infinitesimal rate matrix $a + b + c = 0$ (see the birth and death process in Section 11.3), which only changes the eigenvalue $\lambda_m$ through $b$.
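The closed form $\lambda_m = b + 2\sqrt{ac}\cos\frac{\pi m}{N+1}$ can be checked against a direct numerical eigendecomposition; a sketch, with illustrative values of a, b, c:

```python
import numpy as np

N = 8
a, b, c = 0.35, 0.4, 0.25                  # illustrative Toeplitz entries
A = (np.diag(np.full(N, b))
     + np.diag(np.full(N - 1, a), 1)       # super-diagonal a
     + np.diag(np.full(N - 1, c), -1))     # sub-diagonal c

m = np.arange(1, N + 1)
lam_formula = b + 2 * np.sqrt(a * c) * np.cos(np.pi * m / (N + 1))
lam_direct = np.sort(np.linalg.eigvals(A).real)

assert np.allclose(np.sort(lam_formula), lam_direct)
print(np.sort(lam_formula))
```

Although $A$ is not symmetric, its eigenvalues are real whenever $ac > 0$, consistent with the $2\sqrt{ac}\cos(\cdot)$ form.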
The logarithmic derivative of the generating function $G(z) = \sum_{j=0}^{N}x_jz^j$ of the eigenvector components is a rational function,
$$\frac{G'(z)}{G(z)} = \frac{N\rho z - cN\rho}{\rho z^2 - (\rho+1-\lambda)z - 1} = \frac{c_1}{z-r_1} + \frac{c_2}{z-r_2}$$
with residues $c_k = \lim_{z\to r_k}\frac{(z-r_k)\left(N\rho z - cN\rho\right)}{\rho(z-r_1)(z-r_2)}$. Explicitly,
$$r_1 = \frac{(\rho+1-\lambda) + \sqrt{(\rho+1-\lambda)^2 + 4\rho}}{2\rho} > 0, \qquad r_2 = \frac{(\rho+1-\lambda) - \sqrt{(\rho+1-\lambda)^2 + 4\rho}}{2\rho} < 0 \tag{A.42}$$
with $r_1r_2 = -\frac{1}{\rho}$ and $r_1 + r_2 = \frac{\rho+1-\lambda}{\rho}$. Moreover, unless $\lambda = \rho + 1 \mp 2i\sqrt{\rho}$, in which case $r_1 = r_2 = \pm\frac{i}{\sqrt{\rho}}$, the roots are distinct. The residues are
$$c_1 = \frac{Nr_1 - cN}{r_1 - r_2}, \qquad c_2 = \frac{Nr_2 - cN}{r_2 - r_1} = N - c_1 \tag{A.43}$$
Since $\lim_{z\to\infty}\frac{G(z)}{z^N} = x_N$, the obvious scaling for the eigenvector is to choose $x_N = 1$, and we arrive at
$$G(z) = \sum_{j=0}^{N}x_jz^j = (z-r_1)^{c_1}(z-r_2)^{N-c_1} \tag{A.44}$$
which shows that $c_1$ must be an integer $k \in [0, N]$ for $G(z)$ to be a polynomial of degree $N$. Expanding the binomials with $c_1 = k$ gives
$$G(z) = \sum_{j=0}^{k}\binom{k}{j}z^j(-r_1)^{k-j}\;\sum_{n=0}^{N-k}\binom{N-k}{n}z^n(-r_2)^{N-k-n}$$
so that the coefficient of $z^j$ equals
$$x_j(k) = (-1)^{N-j}\sum_{n}\binom{k}{j-n}\binom{N-k}{n}\,r_1^{\,k-j+n}\,r_2^{\,N-k-n} \tag{A.45}$$
The requirement on $c_1$ also leads to equations for the eigenvalues $\lambda$. Indeed, equating $c_1 = k$ in (A.43) and substituting the explicit expressions for the roots $r_1$ and $r_2$, we obtain, after squaring, the quadratic equations for the eigenvalue $\lambda(k)$, for $0 \le k \le N$,
$$A(k)\,\lambda^2(k) + B(k)\,\lambda(k) + C(k) = 0 \tag{A.46}$$
where
$$A(k) = \left(\tfrac{N}{2}-k\right)^2 - \left(\tfrac{N}{2}-c\right)^2$$
$$B(k) = 2(1-\rho)\left(\tfrac{N}{2}-k\right)^2 - N(1+\rho)\left(\tfrac{N}{2}-c\right)$$
$$C(k) = -(1+\rho)^2\left[\left(\tfrac{N}{2}\right)^2 - \left(\tfrac{N}{2}-k\right)^2\right]$$
Each of the $N+1$ quadratic equations (A.46) has two roots, $\lambda_1(k)$ and $\lambda_2(k)$, thus $2(N+1)$ in total, while there are only $N+1$ eigenvalues. The coefficients $A(k)$, $B(k)$ and $C(k)$ only depend on $k$ via $\left(\frac{N}{2}-k\right)^2$, which means that the quadratics (A.46) for which $k' = N-k$ are identical. This observation reduces the set $\{\lambda_1(k), \lambda_2(k)\}_{0\le k\le N}$ of roots to precisely $N+1$ and confines the analysis to $0 \le k \le \frac{N}{2}$. We will show that all roots are real and distinct (except for $k = N/2$).

The discriminant $\Delta(k) = B^2(k) - 4A(k)C(k)$ is, with $y = \left(\frac{N}{2}-k\right)^2 \in \left[0, \left(\frac{N}{2}\right)^2\right]$, a concave function of $y$ because $\frac{d^2\Delta(k)}{dy^2} < 0$, while
$$\Delta\!\left(\tfrac{N}{2}\right) = \frac{2}{1+\rho}\,N^2\left(c(1+\rho) - N\rho\right)^2 > 0$$
For $\kappa < k \le \frac{N}{2}$, the roots $\{\lambda_1(\kappa), \lambda_2(\kappa)\}$ are different from the roots $\{\lambda_1(k), \lambda_2(k)\}$, because $A(k)z^2 + B(k)z + C(k) \ne A(\kappa)z^2 + B(\kappa)z + C(\kappa)$ for all $z$. Indeed, $A(\kappa) - A(k) = \left(\frac{N}{2}-\kappa\right)^2 - \left(\frac{N}{2}-k\right)^2 > 0$ and the discriminant $\left(B(\kappa)-B(k)\right)^2 - 4\left(A(\kappa)-A(k)\right)\left(C(\kappa)-C(k)\right) < 0$ shows that there are no real solutions. Thus, an extreme eigenvalue occurs for $k = 0$, for which $C(0) = 0$, such that $\lambda_1(0) = 0$ and
$$\lambda_2(0) = -\frac{B(0)}{A(0)} \tag{A.47}$$
The stability requirement $\frac{c(1+\rho)}{N} < 1$ and $c < N$ show that $\lambda_2(0) < 0$, and thus $\lambda_2(0)$ is the largest negative eigenvalue. The eigenvalues for other $0 < k \le \frac{N}{2}$ are either larger than $0$ or smaller than $\lambda_2(0)$. We need to consider two different cases, (a) $c < N/2$ and (b) $c > N/2$, while $C(k) < 0$ for all $k \in (0, N)$.

(a) If $c < N/2$, rewriting $B(k)$ shows that both terms are negative.

(b) If $c > N/2$, we see that $A(k) > 0$ for $0 < k < N-c$, leading to $\lambda_1(k) > 0 > \lambda_2(k)$. For $N-c < k < N/2$, we have $A(k) < 0$ and thus $\lambda_1(k)\lambda_2(k) = \frac{C(k)}{A(k)} > 0$, while their common sign follows from $\lambda_1(k) + \lambda_2(k) = -\frac{B(k)}{A(k)}$, which requires us to consider the sign of $B(k)$. If $\rho \le 1$, then $B(k) > 0$. If $\rho > 1$, then, since $\left(\frac{N}{2}-k\right)^2 < \left(c-\frac{N}{2}\right)^2$,
$$B(k) = N(1+\rho)\left(c-\tfrac{N}{2}\right) - 2(\rho-1)\left(\tfrac{N}{2}-k\right)^2 > \left(c-\tfrac{N}{2}\right)\left[N(1+\rho) - 2(\rho-1)\left(c-\tfrac{N}{2}\right)\right] > 0$$
because $c - \frac{N}{2} \le \frac{N}{2}$, which shows that $0 < \lambda_2(k) < \lambda_1(k)$. Hence, there are $N - [c] + 2\left(\frac{N}{2} - N + [c]\right) = [c]$ positive eigenvalues.

In summary, there are $[c]$ positive eigenvalues, one $\lambda_1(0) = 0$ and $N - [c]$ negative eigenvalues. Relabelling the eigenvalues $(\lambda_k, \lambda_{N-k}) = (\lambda_1(k), \lambda_2(k))$ in increasing order distinguishes between underload and overload eigenvalues. In terms of the discriminant $\Delta(k) = B^2(k) - 4A(k)C(k)$, the non-positive eigenvalues are

(a) if $c < N/2$,
$$\lambda_1(k) = \frac{-B(k) - \sqrt{\Delta(k)}}{2A(k)}, \qquad 0 \le k \le [c]$$
$$\lambda_{1,2}(k) = \frac{-B(k) \mp \sqrt{\Delta(k)}}{2A(k)}, \qquad [c]+1 \le k \le \frac{N}{2}$$
(b) if $c > N/2$,
$$\lambda_1(k) = \frac{-B(k) - \sqrt{\Delta(k)}}{2A(k)}, \qquad 0 \le k \le N-[c]-1$$
Multiplying the left-eigenvalue equation on the right by $W$ and inserting $WW^{-1}$,
$$\lambda\,y^TW = y^TW\left(W^{-1}D^{-1}W\right)\left(W^{-1}QW\right)$$
With $y_W^T = y^TW$, $D_W = W^{-1}DW$ and $Q_W = W^{-1}QW = Q_W^T$, we obtain $\lambda\,y_W^T = y_W^T\,D_W^{-1}Q_W$. The transpose, $\lambda\,y_W = Q_WD_W^{-1}y_W$, is equivalent to
$$\lambda\,W^2y = QD^{-1}W^2y$$
which shows, compared to $D^{-1}Qx = \lambda x$, that $x = W^2y$, or, for the vector components, for $0 \le j \le N$,
$$x_j = \binom{N}{j}\,y_j \tag{A.50}$$
A.5.2.3 General tri-diagonal matrices
Since tri-diagonal matrices of the form (11.1) frequently occur in Markov theory, we devote this section to illustrate how far the eigen-analysis can be pushed. The eigenvalue equations are
$$q_j\,x_{j-1} + r_j\,x_j + p_j\,x_{j+1} = \lambda\,x_j, \qquad 1 \le j < N$$
$$q_N\,x_{N-1} + (r_N - \lambda)\,x_N = 0$$
If $p_j = p$ and $q_j = q$, the matrix $P$ reduces to a Toeplitz form, for which the eigenvalues and eigenvectors can be explicitly written, as shown in Appendix A.5.2.1. Here, we consider the general case and show how orthogonal polynomials enter the scene.

Using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$, the set becomes, with $\mu = \lambda - 1$,
$$x_1(\mu) = \frac{p_0 + \mu}{p_0}\,x_0(\mu)$$
$$x_{j+1}(\mu) = \frac{p_j + q_j + \mu}{p_j}\,x_j(\mu) - \frac{q_j}{p_j}\,x_{j-1}(\mu), \qquad 1 \le j < N \tag{A.51}$$
$$x_N(\mu) = \frac{q_N}{q_N + \mu}\,x_{N-1}(\mu)$$
The dependence on the eigenvalue is made explicit. Solving (A.51) iteratively for $j < N$,
$$x_2(\mu) = \frac{x_0(\mu)}{p_0p_1}\left[\mu^2 + (q_1 + p_1 + p_0)\,\mu + p_1p_0\right]$$
$$x_3(\mu) = \frac{x_0(\mu)}{p_2p_1p_0}\left[\mu^3 + (q_1 + q_2 + p_2 + p_1 + p_0)\,\mu^2 + (q_2q_1 + q_2p_0 + p_2q_1 + p_2p_1 + p_2p_0 + p_1p_0)\,\mu + p_2p_1p_0\right] \tag{A.52}$$
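The recursion (A.51) can be used to locate the eigenvalues numerically: iterating from $x_0 = 1$, the unmatched last boundary equation $(q_N+\mu)x_N - q_Nx_{N-1}$ vanishes exactly at $\mu = \lambda - 1$. A sketch, with randomly generated illustrative birth-death rates:

```python
import numpy as np

N = 5
rng = np.random.default_rng(7)
p = rng.uniform(0.1, 0.4, N)       # p_0,...,p_{N-1}
q = rng.uniform(0.1, 0.4, N + 1)   # q_1,...,q_N used (q[0] unused)

P = np.zeros((N + 1, N + 1))
P[0, 0], P[0, 1] = 1 - p[0], p[0]
for j in range(1, N):
    P[j, j - 1], P[j, j], P[j, j + 1] = q[j], 1 - q[j] - p[j], p[j]
P[N, N - 1], P[N, N] = q[N], 1 - q[N]

def boundary_residual(mu):
    """Iterate (A.51) from x_0 = 1; return the scaled unmatched last equation."""
    x_prev, x_cur = 1.0, (p[0] + mu) / p[0]            # x_0, x_1
    for j in range(1, N):
        x_prev, x_cur = x_cur, ((p[j] + q[j] + mu) * x_cur - q[j] * x_prev) / p[j]
    return ((q[N] + mu) * x_cur - q[N] * x_prev) / (1 + abs(x_cur) + abs(x_prev))

mus = np.linalg.eigvals(P).real - 1.0                  # mu = lambda - 1
residuals = [abs(boundary_residual(mu)) for mu in mus]
assert max(residuals) < 1e-7
```

In practice one would scan $\mu$ for sign changes of the residual rather than start from known eigenvalues; the check above merely confirms the recursion.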
with
$$c_j(j) = 1, \qquad c_{j-1}(j) = \sum_{m=0}^{j-1}(p_m + q_m), \qquad c_0(j) = \prod_{m=0}^{j-1}p_m$$
while the intermediate coefficients $c_k(j+1)$ follow recursively from the $c_k(j)$ via (A.51). The vectors $x(\mu)$ and $x(\mu')$ belonging to different eigenvalues obey the orthogonality condition
$$\sum_{j=0}^{N}x_j(\mu)\,x_j(\mu') = 0, \qquad \mu \ne \mu'$$
while $\sum_{j=0}^{N}x_j^2(\mu) = \|x(\mu)\|_2^2 \ge 0$, and the normalization enforces $\|x(\mu)\|_1 = \sum_{j=0}^{N}|x_j(\mu)| = 1$. The scaling $x_0(\mu) = 1$ leads to a polynomial $\sum_{k=0}^{N}b_k\mu^k$ of degree $N$ whose $N$ zeros equal the eigenvalues $\mu \ne 0$ and whose coefficients $b_k$ are, with $p_j = q_{j+1}$, combinations of the quantities $\frac{c_k(j)}{\prod_{m=0}^{j-1}p_m}$; for example, $b_0 = (N+1)\,q_N$. The Newton identities (B.9) relate these coefficients to the sums of integer powers of the real zeros $\mu \ne 0$.

Proceeding much further in the case that $P$ is not symmetric is difficult. A similarity transform is needed to transform the linearly independent set of vectors $x(\mu)$ for different $\mu$ into an orthogonal set, from which the eigenvalues then follow, as in the symmetric case above. Karlin and McGregor (see Schoutens (2000, Chapter 3)) have shown the existence of a set of orthogonal polynomials (similar to our set $x_j(\mu)$) that obey an integral orthogonality condition (similar to Legendre or Chebyshev polynomials) instead of our summation orthogonality condition. Only in particular cases, however, were they able to specify this orthogonal set explicitly.
Consider the stochastic matrix of upper Hessenberg form,
$$P = \begin{bmatrix}
P_{00} & P_{01} & P_{02} & \cdots & P_{0N} \\
P_{10} & P_{11} & P_{12} & \cdots & P_{1N} \\
0 & P_{21} & P_{22} & \cdots & P_{2N} \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\
0 & \cdots & 0 & P_{N,N-1} & P_{NN}
\end{bmatrix}$$
The steady-state equations $\pi_j = \sum_{k=0}^{j+1}P_{kj}\,\pi_k$ can be solved recursively, starting from $\pi_0$,
$$\pi_{j+1} = \frac{1 - P_{jj}}{P_{j+1,j}}\,\pi_j - \sum_{k=0}^{j-1}\frac{P_{kj}}{P_{j+1,j}}\,\pi_k$$
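The recursion can be checked against a direct solution of $\pi P = \pi$; a sketch on a randomly generated illustrative upper Hessenberg stochastic matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6                                    # states 0,...,N
P = rng.uniform(0.1, 1.0, (N + 1, N + 1))
P = np.triu(P, -1)                       # upper Hessenberg: P[j,k] = 0 for k < j-1
P /= P.sum(axis=1, keepdims=True)        # make it stochastic

# recursion: pi_{j+1} = ((1 - P_jj) pi_j - sum_{k<j} P_kj pi_k) / P_{j+1,j}
pi = np.zeros(N + 1)
pi[0] = 1.0                              # unnormalized start
for j in range(N):
    pi[j + 1] = ((1 - P[j, j]) * pi[j]
                 - sum(P[k, j] * pi[k] for k in range(j))) / P[j + 1, j]
pi /= pi.sum()

# reference: left eigenvector of eigenvalue 1
w, vl = np.linalg.eig(P.T)
v = vl[:, np.argmin(np.abs(w - 1))].real
v /= v.sum()
assert np.allclose(pi, v)
```

The recursion needs only $O(N^2)$ operations and one pass, which is why Hessenberg (skip-free) chains are solved this way in queueing models.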
Let us consider the eigenvalue equation (A.1), written for stochastic matrices as $(P - \lambda I)^Tx = 0$. The matrix $(P - \lambda I)^T$ is a $(N+1)\times(N+1)$ matrix of rank $N$, because $\det(P - \lambda I)^T = 0$ (else all eigenvectors $x$ are zero). When writing this set of equations in terms of $x_0$, we produce the following set of $N$ equations,
$$\begin{bmatrix}
P_{10} & 0 & 0 & \cdots & 0 \\
P_{11}-\lambda & P_{21} & 0 & \cdots & 0 \\
P_{12} & P_{22}-\lambda & P_{32} & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
P_{1,N-1} & P_{2,N-1} & \cdots & P_{N-1,N-1}-\lambda & P_{N,N-1}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}
= -x_0\begin{bmatrix} P_{00}-\lambda \\ P_{01} \\ P_{02} \\ \vdots \\ P_{0,N-1} \end{bmatrix}$$
Applying Cramer's rule, we find that $\frac{x_j}{x_0}$ equals the determinant of the above matrix with its $j$-th column replaced by the right-hand side vector, divided by the determinant $\prod_{k=0}^{N-1}P_{k+1,k}$ of the lower-triangular system matrix. The replaced determinant has the block structure
$$\det\begin{bmatrix} A_{jj} & O_{j,N-j} \\ B_{N-j,j} & C_{N-j,N-j} \end{bmatrix} = \det C\;\det A$$
where $\det C = \prod_{k=j}^{N-1}P_{k+1,k}$. In the determinant $\det A$, we can exchange the $j$-th column with the $(j-1)$-th and, subsequently, the $(j-1)$-th with the $(j-2)$-th, and so on, until the last column is permuted to the first column, in total $j-1$ permutations. After changing the sign of that first column, the result is that $\det A = (-1)^j\det\left(P_{jj} - \lambda I_{jj}\right)$, where $P_{jj}$ is the original transition probability matrix limited to $j$ states (instead of $N+1$). Hence, for $1 \le j \le N$,
$$x_j = \frac{x_0\,(-1)^j\det\left(P_{jj} - \lambda I_{jj}\right)}{\prod_{k=0}^{j-1}P_{k+1,k}}$$
and the normalization of eigenvectors $\|x\|_1 = 1$ determines $x_0$ as
$$x_0 = \left(1 + \sum_{j=1}^{N}\left|\frac{(-1)^j\det\left(P_{jj} - \lambda I_{jj}\right)}{\prod_{k=0}^{j-1}P_{k+1,k}}\right|\right)^{-1}$$
Appendix B
Algebraic graph theory
This appendix reviews the elementary basics of the matrix theory for graphs $G(N, L)$. The book by Cvetković et al. (1995) is the current standard work on algebraic graph theory.
Fig. B.1. A graph with $N = 6$ and $L = 9$. The links are lexicographically ordered, $e_1 = 1 \to 2$, $e_2 = 1 \to 3$, $e_3 = 1 \leftarrow 6$, etc.
The $N\times L$ incidence matrix $B$ of a graph has elements
$$b_{ij} = \begin{cases} 1 & \text{if link } e_j \text{ starts at node } i \\ -1 & \text{if link } e_j \text{ ends at node } i \\ 0 & \text{otherwise} \end{cases}$$
For the graph of Fig. B.1,
$$A = \begin{bmatrix}
0&1&1&0&0&1 \\
1&0&1&0&1&1 \\
1&1&0&1&0&0 \\
0&0&1&0&1&0 \\
0&1&0&1&0&1 \\
1&1&0&0&1&0
\end{bmatrix}
\qquad
B = \begin{bmatrix}
1&1&-1&0&0&0&0&0&0 \\
-1&0&0&1&1&1&0&0&0 \\
0&-1&0&-1&0&0&1&0&0 \\
0&0&0&0&0&0&-1&1&0 \\
0&0&0&0&-1&0&0&-1&1 \\
0&0&1&0&0&-1&0&0&-1
\end{bmatrix}$$
The elements of the Laplacian $Q = BB^T$ are
$$q_{ij} = \left(BB^T\right)_{ij} = \sum_{k=1}^{L}b_{ik}\,b_{jk}$$
If $i \ne j$, then $q_{ij} = -1$ when nodes $i$ and $j$ are connected by a link, and $q_{ij} = 0$ otherwise. If $i = j$, then $\sum_{k=1}^{L}b_{ik}^2 = d_i$, the number of links that have node $i$ in common. Also, by the definition of $A$, the row sum $i$ of $A$ equals the degree $d_i$ of node $i$,
$$d_i = \sum_{k=1}^{N}a_{ik} \tag{B.1}$$
Consequently, each row sum $\sum_{k=1}^{N}q_{ik} = 0$, which shows that $Q$ is singular, implying that $\det Q = 0$. The Laplacian is symmetric, $Q = Q^T$, because $A$ and $\Delta$ are both symmetric, and the quadratic form defined in Section A.2, art. 10,
$$x^TQx = x^TQ^Tx = x^TBB^Tx = \left\|B^Tx\right\|_2^2 \ge 0$$
is positive semidefinite, which implies that all eigenvalues of $Q$ are non-negative and at least one is zero, because $\det Q = 0$.

Since $\sum_{i=1}^{N}\sum_{k=1}^{N}a_{ik} = 2L$, the basic law for the degree follows as
$$\sum_{i=1}^{N}d_i = 2L \tag{B.2}$$
$$\mathrm{adj}\,Q = \xi(G)\,J \tag{B.3}$$
where $\mathrm{adj}\,Q$ is the adjugate matrix appearing in $M^{-1} = \frac{\mathrm{adj}\,M}{\det M}$. We omit the proof, but apply the relation (B.3) to the complete graph $K_N$, where $Q = NI - J$. Equation (B.3) demonstrates that all elements of $\mathrm{adj}\,Q$ are equal to $\xi(G)$. Hence, it suffices to compute one suitable element of $\mathrm{adj}\,Q$, for example $(\mathrm{adj}\,Q)_{11}$, which equals the determinant of the $(N-1)\times(N-1)$ principal submatrix of $Q$ obtained by deleting the first row and column in $Q$,
$$(\mathrm{adj}\,Q)_{11} = \det\begin{bmatrix}
N-1 & -1 & \cdots & -1 \\
-1 & N-1 & \cdots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \cdots & N-1
\end{bmatrix}$$
Adding all rows to the first and subsequently adding this new first row to all other rows gives
$$(\mathrm{adj}\,Q)_{11} = \det\begin{bmatrix}
1 & 1 & \cdots & 1 \\
-1 & N-1 & \cdots & -1 \\
\vdots & & \ddots & \vdots \\
-1 & -1 & \cdots & N-1
\end{bmatrix} = \det\begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & N & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & N
\end{bmatrix} = N^{N-2}$$
Hence, the total number of spanning trees in the complete graph $K_N$, which is also the total number of possible spanning trees in any graph with $N$ nodes, equals $N^{N-2}$. This is a famous theorem of Cayley, of which many proofs exist (van Lint and Wilson, 1996, Chapter 2).

4. The complexity of $G$ is also given by
$$\xi(G) = \frac{\det(J + Q)}{N^2} \tag{B.4}$$
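Both the Laplacian cofactor and the formula (B.4) can be confirmed numerically; a sketch for the complete graph $K_4$, for which Cayley's formula predicts $4^{4-2} = 16$ spanning trees:

```python
import numpy as np

N = 4
A = np.ones((N, N)) - np.eye(N)          # adjacency matrix of K_4
Q = np.diag(A.sum(axis=1)) - A           # Laplacian Q = Delta - A
J = np.ones((N, N))

xi_cofactor = np.linalg.det(Q[1:, 1:])   # (adj Q)_11: delete first row/column
xi_formula = np.linalg.det(J + Q) / N**2 # relation (B.4)

print(int(round(xi_cofactor)), int(round(xi_formula)))   # -> 16 16
```

The same two lines work for any graph: replace `A` by its adjacency matrix and both expressions return the number of spanning trees.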
The number of walks from $i$ to $j$ with length $k$ then equals $\left(A^k\right)_{ij} = \sum_{r=1}^{N}\left(A^{k-1}\right)_{ir}a_{rj}$ (by the rules of matrix multiplication). Explicitly,
$$\left(A^k\right)_{ij} = \sum_{r_1=1}^{N}\sum_{r_2=1}^{N}\cdots\sum_{r_{k-1}=1}^{N}a_{ir_1}a_{r_1r_2}\cdots a_{r_{k-1}j}$$
As shown in Section 15.2, the number of paths with $k$ hops between node $i$ and node $j$ is
$$X_k(i \to j; N) = \prod_{l=1}^{k-1}(N-1-l) = \frac{(N-2)!}{(N-k-1)!}$$
A graph is connected if, equivalently, there exists some integer $k > 0$ for which $\left(A^k\right)_{ij} \ne 0$ for each $i, j$. The lowest integer $k$ for which $\left(A^k\right)_{ij} \ne 0$ for each pair of nodes $i, j$ is called the diameter of the graph $G$. Lemma B.1.1 demonstrates that the diameter equals the length of the longest shortest hop path in $G$.
By the Rayleigh principle, the largest eigenvalue is bounded from below as
$$\lambda_1 \ge \frac{u^TAu}{u^Tu} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}a_{ij} = \frac{1}{N}\sum_{i=1}^{N}d_i = \frac{2L}{N} \tag{B.5}$$
The stochastic matrix $P = \Delta^{-1}A$, where $\Delta = \mathrm{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix, has the characteristic polynomial
$$\det\left(\Delta^{-1}A - \lambda I\right) = \frac{\det(A - \lambda\Delta)}{\det\Delta}$$
where $\det\Delta = \prod_{j=1}^{N}d_j$. Since the largest eigenvalue of a stochastic matrix equals $1$, the polynomial $\det(A - \lambda\Delta)$ vanishes at $\lambda = 1$.
Since $\mathrm{trace}(A) = 0$, the coefficient of the characteristic polynomial of $A$ obeys
$$c_{N-1} = -\sum_{k=1}^{N}\lambda_k = 0 \tag{B.6}$$
Consider a polynomial with real zeros,
$$p_n(z) = \sum_{k=0}^{n}a_k(n)\,z^k = a_n(n)\prod_{k=1}^{n}\left(z - z_k(n)\right) \tag{B.7}$$
where $\{z_k(n)\}$ are the $n$ zeros. It follows from (B.7) that $p_n(0) = a_0(n) = a_n(n)\prod_{k=1}^{n}\left(-z_k(n)\right)$. The logarithmic derivative of (B.7) is
$$p_n'(z) = p_n(z)\sum_{k=1}^{n}\frac{1}{z - z_k(n)}$$
For $z > \max_k z_k(n)$ (which is always possible for polynomials, but not for functions), we have that
$$p_n'(z) = p_n(z)\sum_{k=1}^{n}\sum_{j=0}^{\infty}\frac{\left(z_k(n)\right)^j}{z^{j+1}} = p_n(z)\sum_{j=0}^{\infty}\frac{Z_j(n)}{z^{j+1}}$$
where
$$Z_j(n) = \sum_{k=1}^{n}\left(z_k(n)\right)^j$$
Thus
$$\sum_{k=1}^{n}k\,a_k(n)\,z^k = \sum_{k=0}^{n}a_k(n)\,z^k\sum_{j=0}^{\infty}Z_j(n)\,z^{-j}$$
Equating the coefficients of equal powers of $z$ on both sides yields the Newton identities: for $l \ge 0$,
$$a_l(n) = -\frac{1}{n-l}\sum_{k=l+1}^{n}a_k(n)\,Z_{k-l}(n) \tag{B.9}$$
Applied to the characteristic polynomial of $A$, where $Z_j = \sum_{k=1}^{N}\lambda_k^j$ and $c_{N-1} = 0$, the Newton identities give
$$c_{N-2} = -\frac{1}{2}\sum_{k=1}^{N}\lambda_k^2, \qquad c_{N-3} = -\frac{1}{3}\sum_{k=1}^{N}\lambda_k^3, \qquad c_{N-4} = \frac{1}{8}\left(\sum_{k=1}^{N}\lambda_k^2\right)^2 - \frac{1}{4}\sum_{k=1}^{N}\lambda_k^4 \tag{B.10}$$
where $\mathcal{G}_k$ is the set of all subgraphs of $G$ with exactly $k$ nodes and $\mathrm{cycles}(g)$ is the number of cycles in a subgraph $g \in \mathcal{G}_k$. The minor $M_k$ is a determinant of a $k\times k$ submatrix of $A$, defined as
$$\det M_k = \sum_{p}(-1)^{\sigma(p)}\,a_{1p_1}a_{2p_2}\cdots a_{kp_k}$$
where the sum is over all $k!$ permutations $p = (p_1, p_2, \ldots, p_k)$ of $(1, 2, \ldots, k)$ and $\sigma(p)$ is the parity of $p$, i.e. the number of interchanges of $(1, 2, \ldots, k)$ needed to obtain $(p_1, p_2, \ldots, p_k)$. Only if all the links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ are contained in $G$ is the product $a_{1p_1}a_{2p_2}\cdots a_{kp_k}$ non-zero. Since $a_{jj} = 0$, the sequence of contributing links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ is a set of disjoint cycles, and $\sigma(p)$ depends on the number of those disjoint cycles. Now, $\det M_k$ is constructed from a specific set $g \in \mathcal{G}_k$ of $k$ out of $N$ nodes, and in total there are $\binom{N}{k}$ such sets in $\mathcal{G}_k$. Combining all contributions leads to the expression (B.12).
7. Since $A$ is a symmetric 0-1 matrix, we observe that, using (B.1),
$$\left(A^2\right)_{ii} = \sum_{k=1}^{N}a_{ik}a_{ki} = \sum_{k=1}^{N}a_{ik}^2 = \sum_{k=1}^{N}a_{ik} = d_i$$
Hence, with (A.16) or (B.10) and (A.7), the basic law for the degree (B.2) is expressed as
$$\mathrm{trace}(A^2) = \sum_{k=1}^{N}\lambda_k^2 = \sum_{k=1}^{N}d_k = 2L \tag{B.13}$$
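A one-line numerical confirmation of (B.13) on a small illustrative 6-node, 9-link graph:

```python
import numpy as np

A = np.array([[0,1,1,0,0,1],
              [1,0,1,0,1,1],
              [1,1,0,1,0,0],
              [0,0,1,0,1,0],
              [0,1,0,1,0,1],
              [1,1,0,0,1,0]])   # illustrative adjacency matrix (N=6, L=9)

lam = np.linalg.eigvalsh(A)
print(int(A.sum()), round(float(np.sum(lam**2))))   # both equal 2L = 18
```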
Furthermore,
$$\sum_{i=1}^{N}\sum_{j=1;\,j\ne i}^{N}\left(A^2\right)_{ij} = \sum_{i=1}^{N}\sum_{j=1;\,j\ne i}^{N}\sum_{k=1}^{N}a_{ik}a_{kj} = \sum_{k=1}^{N}\sum_{i=1}^{N}a_{ki}\left(d_k - a_{ki}\right) = \sum_{k=1}^{N}d_k^2 - \sum_{k=1}^{N}\sum_{i=1}^{N}a_{ki}$$
or
$$\sum_{i=1}^{N}\sum_{j=1;\,j\ne i}^{N}\left(A^2\right)_{ij} = \sum_{k=1}^{N}d_k(d_k - 1) \tag{B.14}$$
Lemma B.1.1 states that $\sum_{i=1}^{N}\sum_{j=1;j\ne i}^{N}\left(A^2\right)_{ij}$ equals twice the total number of two-hop walks with different source and destination nodes. In other words, the total number of connected triplets of nodes in $G$ equals half of (B.14).
8. The total number $N_k$ of walks of length $k$ in a graph follows from Lemma B.1.1 as
$$N_k = \sum_{i=1}^{N}\sum_{j=1}^{N}\left(A^k\right)_{ij}$$
Since any real symmetric matrix (Section A.2, art. 9) can be written as $A = U\,\mathrm{diag}(\lambda_j)\,U^T$, where $U$ is an orthogonal matrix of the (normalized) eigenvectors of $A$, we have that $A^k = U\,\mathrm{diag}(\lambda_j^k)\,U^T$ and $\left(A^k\right)_{ij} = \sum_{n=1}^{N}u_{in}u_{jn}\lambda_n^k$. Hence,
$$N_k = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{n=1}^{N}u_{in}u_{jn}\lambda_n^k = \sum_{n=1}^{N}\lambda_n^k\left(\sum_{i=1}^{N}u_{in}\right)^2$$
Hadamard's inequality bounds the determinant of $A$ by the product of the Euclidean norms of its rows,
$$\left|\det A\right| \le \prod_{j=1}^{N}\left(\sum_{i=1}^{N}a_{ji}^2\right)^{\frac{1}{2}} = \prod_{j=1}^{N}\left(\sum_{i=1}^{N}a_{ji}\right)^{\frac{1}{2}} = \prod_{j=1}^{N}\sqrt{d_j}$$
such that
$$\prod_{k=1}^{N}\lambda_k^2 \le \prod_{j=1}^{N}d_j \tag{B.15}$$
Applying the Cauchy-Schwarz inequality to the vector $(\lambda_2, \ldots, \lambda_N)$ and the all-one vector $(1, 1, \ldots, 1)$ gives
$$\left(\sum_{k=2}^{N}\lambda_k\right)^2 \le (N-1)\sum_{k=2}^{N}\lambda_k^2$$
Since $\sum_{k=2}^{N}\lambda_k = -\lambda_1$ by (B.6) and $\sum_{k=2}^{N}\lambda_k^2 = 2L - \lambda_1^2$ by (B.13),
$$\lambda_1^2 \le (N-1)\left(2L - \lambda_1^2\right)$$
which leads to the bound for the largest (and positive) eigenvalue $\lambda_1$,
$$\lambda_1 \le \sqrt{\frac{2L(N-1)}{N}} \tag{B.16}$$
Alternatively, in terms of the average degree $d_a = \frac{1}{N}\sum_{j=1}^{N}d_j = \frac{2L}{N}$, the largest eigenvalue $\lambda_1$ is bounded by the geometric mean of the average degree and the maximum possible degree, $\lambda_1 \le \sqrt{d_a(N-1)}$. Combining the lower bound (B.5) and the upper bound (B.16) yields
$$\frac{2L}{N} \le \lambda_1 \le \sqrt{\frac{2L(N-1)}{N}} \tag{B.17}$$
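The two-sided bound (B.17) is easy to exercise on a randomly generated graph; a sketch (the edge probability 0.2 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
A = (rng.uniform(size=(N, N)) < 0.2).astype(float)
A = np.triu(A, 1)
A = A + A.T                              # symmetric 0-1 adjacency, zero diagonal

L = A.sum() / 2
lam1 = np.linalg.eigvalsh(A)[-1]
lower, upper = 2 * L / N, np.sqrt(2 * L * (N - 1) / N)
assert lower - 1e-9 <= lam1 <= upper + 1e-9
print(lower, lam1, upper)
```

For dense graphs $\lambda_1$ hugs the lower bound $2L/N$ (the average degree), while the upper bound is only attained by the complete graph, as art. 13 shows.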
11. From the inequality (A.26) for Hölder $q$-norms, we find that, if
$$\sum_{k=1}^{N}|\lambda_k|^q < \lambda^q$$
then, for $p > q$,
$$\sum_{k=1}^{N}|\lambda_k|^p < \lambda^p$$
Since the number of triangles in $G$ equals $\frac{1}{6}\sum_{k=1}^{N}\lambda_k^3$,
$$\text{the number of triangles in } G = \frac{1}{6}\lambda_1^3 + \frac{1}{6}\sum_{k=2}^{N}\lambda_k^3 \ge \frac{1}{6}\lambda_1^3 - \frac{1}{6}\sum_{k=2}^{N}|\lambda_k|^3 > 0$$
Hence, if $\sum_{k=2}^{N}\lambda_k^2 < \lambda_1^2$, then the number of triangles in $G$ is at least one. Equivalently, in view of (B.10), if $\lambda_1 > \sqrt{L}$, then the graph $G$ contains at least one triangle.
12. A theorem of Turán states that

Theorem B.2.1 A graph $G$ with $N$ nodes and more than $\left[\frac{N^2}{4}\right]$ links contains at least one triangle.

This theorem is a consequence of art. 7 and 11. For, using $L > \left[\frac{N^2}{4}\right] \ge \frac{N^2}{4}$, which implies $N < 2\sqrt{L}$, in the bound on the largest eigenvalue (B.5),
$$\lambda_1 \ge \frac{2L}{N} > \frac{2L}{2\sqrt{L}} = \sqrt{L}$$
and $\lambda_1 > \sqrt{L}$ is precisely the condition in art. 11 to have at least one triangle.
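The spectral triangle count $\frac{1}{6}\sum_k\lambda_k^3$ and the condition $\lambda_1 > \sqrt{L}$ can both be exercised on the complete graph $K_5$, which contains $\binom{5}{3} = 10$ triangles:

```python
import numpy as np

N = 5
A = np.ones((N, N)) - np.eye(N)       # K_5: L = 10 links
lam = np.linalg.eigvalsh(A)
L = A.sum() / 2

triangles = round(float(np.sum(lam**3)) / 6)
print(lam[-1] > np.sqrt(L), triangles)   # True 10
```

Here $\lambda_1 = 4 > \sqrt{10}$, so art. 11 guarantees a triangle, and the cubic trace counts all ten of them.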
The adjacency matrix of the complete graph $K_N$ is $A = J - I$, where $J = uu^T$ is the all-one matrix. Hence,
$$\det(J - I - \lambda I) = \det\left(uu^T - (\lambda+1)I\right) = \left(-(\lambda+1)\right)^N\det\left(I - \frac{uu^T}{\lambda+1}\right)$$
Using (A.38) and $u^Tu = N$,
$$\det(J - I - \lambda I) = (-1)^N(\lambda+1)^{N-1}\left(\lambda + 1 - N\right)$$
gives the eigenvalues of $K_N$. Since the number of links in $K_N$ is $L = \frac{N(N-1)}{2}$, we observe that the equality sign in (B.16) can occur. Since $L \le \frac{N(N-1)}{2}$ for any graph, the upper bound (B.16) shows that $\lambda_1 \le N-1$ for any graph.
14. The difference between the largest eigenvalue $\lambda_1$ and the second largest $\lambda_2$ is never larger than $N$, i.e.
$$\lambda_1 - \lambda_2 \le N \tag{B.18}$$
Since $\sum_{k=1}^{N}\lambda_k = 0$ and $\lambda_k \le \lambda_2$ for $k \ge 2$,
$$0 = \sum_{k=1}^{N}\lambda_k \le \lambda_1 + (N-1)\lambda_2$$
such that
$$\lambda_2 \ge -\frac{\lambda_1}{N-1}$$
Hence,
$$\lambda_1 - \lambda_2 \le \lambda_1 + \frac{\lambda_1}{N-1} = \frac{N\lambda_1}{N-1} \le N$$
because $\lambda_1 \le N-1$.
Writing the eigenvalue equation for a component $x_m = \max_j x_j = 1$ of the eigenvector belonging to $\lambda = d_{\max}$ gives
$$d_{\max} = \sum_{j=1}^{N}a_{mj}\,x_j$$
which implies that all $x_j = 1$ whenever $a_{mj} = 1$, i.e. when the node $j$ is adjacent to node $m$. Hence, the degree of node $m$ is $d_m = d_{\max}$. For any node $j$ adjacent to $m$ for which the component $x_j = 1$, a same eigenvalue relation holds, and thus $d_j = d_{\max}$. Proceeding this process shows that every node $k \in G$ has the same degree $d_k = d_{\max}$, because $G$ is connected. Hence, $x = u$, where $u^T = [1\;1\;\cdots\;1]$. Conversely, if $G$ is connected and regular, then $\sum_{j=1}^{N}a_{mj} = d_{\max}$ for each $m$, such that $u$ is the eigenvector belonging to the eigenvalue $\lambda = d_{\max}$.
$$\det\left(A^c - \lambda I\right) = (-1)^N\,g(\lambda)\prod_{k=1}^{N}\left(\lambda_k + 1 + \lambda\right) \tag{B.19}$$
where
$$g(\lambda) = 1 - u^T\left(A + (\lambda+1)I\right)^{-1}u$$
In general, $g(\lambda)$ is not a simple function of $\lambda$, although a little more can be said. For $|\lambda+1|$ large enough,
$$\left(A + (\lambda+1)I\right)^{-1} = \frac{1}{\lambda+1}\sum_{k=0}^{\infty}\frac{(-1)^kA^k}{(\lambda+1)^k}$$
where the sum $\sum_{k=0}^{\infty}A^kz^k$ can be interpreted as the matrix generating function of the number of walks of length $k$ (see Section B.1, art. 5 and art. 8). Since $A^k = U\,\mathrm{diag}\left(\lambda_j^k\right)U^T$ (Section A.2, art. 9), where the orthogonal matrix $U$ contains the normalized eigenvectors $x_j$ of $A$, writing $\cos\alpha_j = \frac{u^Tx_j}{\sqrt{N}}$ yields
$$g(\lambda) = 1 - N\sum_{j=1}^{N}\frac{\cos^2\alpha_j}{\lambda + 1 + \lambda_j}$$
and, hence,
$$\det\left(A^c - \lambda I\right) = (-1)^N\sum_{j=1}^{N}\left(\lambda + 1 + \lambda_j - N\right)\cos^2\alpha_j\prod_{k=1;\,k\ne j}^{N}\left(\lambda_k + 1 + \lambda\right) \tag{B.20}$$
which shows that the poles of $g(\lambda)$ are precisely compensated by the zeros of the polynomial $c_A(-\lambda-1)$. Thus, the eigenvalues of $A^c$ are generally different from $\{-\lambda_j - 1\}_{1\le j\le N}$, where $\lambda_j$ is an eigenvalue of $A$. Only if $u$ is an eigenvector of $A$ corresponding with $\lambda_k$, then $g(\lambda) = \frac{\lambda+1+\lambda_k-N}{\lambda+1+\lambda_k}$ and all eigenvalues of $A^c$ belong to the set $\{-\lambda_j - 1\}_{1\le j\ne k\le N}\cup\{N - 1 - \lambda_k\}$. According to art. 15, $u$ is only an eigenvector when the graph is regular.
The random walk on the graph $G$ has a steady-state vector with components
$$\pi_j = \frac{d_j}{2L} \tag{B.21}$$
In general, the matrix $P$ is not symmetric, but, after a similarity transform $H = \Delta^{1/2}$, a symmetric matrix $R = \Delta^{1/2}P\Delta^{-1/2} = \Delta^{-1/2}A\Delta^{-1/2}$ is obtained, whose eigenvalues are the same as those of $P$ (Section A.1, art. 4). The powerful property (Section A.2, art. 9) of symmetric matrices shows that all eigenvalues are real and that $R = U^T\mathrm{diag}(\lambda_k)\,U$, where the columns of the orthogonal matrix $U$ consist of the normalized eigenvectors $v_k$ that obey $v_j^Tv_k = \delta_{jk}$. Explicitly written in terms of these eigenvectors,
$$R = \sum_{k=1}^{N}\lambda_k\,v_kv_k^T$$
where, with Frobenius Theorem A.4.2, the real eigenvalues are ordered as $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N \ge -1$. If we exclude bipartite graphs (where the set of nodes is $\mathcal{N} = \mathcal{N}_1\cup\mathcal{N}_2$ with $\mathcal{N}_1\cap\mathcal{N}_2 = \emptyset$ and where each link connects a node in $\mathcal{N}_1$ and one in $\mathcal{N}_2$) and reducible Markov chains (Section A.4), then $|\lambda_k| < 1$ for $k > 1$. Section A.1, art. 4 shows that the similarity transform $H = \Delta^{1/2}$ maps the steady-state vector $\pi$ into $v_1 = \left(\pi\Delta^{-1/2}\right)^T$ and, with (B.21), after normalization,
$$v_{1j} = \frac{\sqrt{d_j}}{\sqrt{\sum_{j=1}^{N}d_j}} = \sqrt{\frac{d_j}{2L}}$$
Hence,
$$P^n = \Delta^{-1/2}R^n\Delta^{1/2} = \sum_{k=1}^{N}\lambda_k^n\,\Delta^{-1/2}v_kv_k^T\Delta^{1/2} = u\pi + \sum_{k=2}^{N}\lambda_k^n\,\Delta^{-1/2}v_kv_k^T\Delta^{1/2}$$
The $n$-step transition probability (9.10) is, with $\left(v_kv_k^T\right)_{ij} = v_{ki}v_{kj}$ and (B.21),
$$P_{ij}^n = \frac{d_j}{2L} + \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}\lambda_k^n\,v_{ki}v_{kj}$$
so that
$$\left|P_{ij}^n - \pi_j\right| \le \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}\left|\lambda_k\right|^n\left|v_{ki}\right|\left|v_{kj}\right| < \sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}\left|\lambda_k\right|^n$$
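The geometric decay of $P^n$ towards the steady state (B.21) can be observed directly; a sketch on a small illustrative connected, non-bipartite graph:

```python
import numpy as np

A = np.array([[0,1,1,0,0,1],
              [1,0,1,0,1,1],
              [1,1,0,1,0,0],
              [0,0,1,0,1,0],
              [0,1,0,1,0,1],
              [1,1,0,0,1,0]], dtype=float)   # illustrative graph

d = A.sum(axis=1)
P = A / d[:, None]                 # random walk matrix Delta^{-1} A
pi = d / d.sum()                   # steady state pi_j = d_j / (2L)   (B.21)

R = A / np.sqrt(np.outer(d, d))    # symmetrized Delta^{-1/2} A Delta^{-1/2}
mu = np.sort(np.abs(np.linalg.eigvalsh(R)))
eta = mu[-2]                       # second largest eigenvalue modulus

err = [np.abs(np.linalg.matrix_power(P, n) - pi).max() for n in (5, 20)]
assert eta < 1 and err[1] < err[0] * eta ** 10   # clear geometric decay
print(eta, err)
```

Since the graph is connected and contains a triangle (so it is not bipartite), $\eta < 1$ and every row of $P^n$ converges to $\pi$.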
Denoting by $\eta = \max\left(|\lambda_2|, |\lambda_N|\right)$ and by $\eta'$ the largest element of the reduced set $\{|\lambda_k|\}\setminus\{\eta\}$ with $2 \le k \le N$, we obtain
$$\left|P_{ij}^n - \pi_j\right| < \sqrt{\frac{d_j}{d_i}}\,\eta^n + O\left(\eta'^{\,n}\right)$$
For a graph consisting of disjoint components with adjacency matrices $A_m$, the characteristic polynomial factors as
$$\det(A - \lambda I) = \prod_{m=1}^{k}\det\left(A_m - \lambda I\right) \tag{B.22}$$
487
min
{W T{
(J) T 2 (J) 1 cos
Q
(B.23)
where (J) and (J) are the vertex and edge connectivity respectively.
488
$$E\left[\sqrt{L}\right] = \sum_{k=0}^{\binom{N}{2}}\sqrt{k}\;\Pr[L = k] = \sum_{k=0}^{\binom{N}{2}}\sqrt{k}\binom{\binom{N}{2}}{k}p^k(1-p)^{\binom{N}{2}-k} \tag{B.24}$$
The degree distribution (15.11) of the random graph is a binomial distribution with mean $E[D_{rg}] = p(N-1)$ and $\mathrm{Var}[D_{rg}] = (N-1)p(1-p)$. The inequality (5.13) indicates that the degree $D_{rg}$ concentrates exponentially fast around the mean $E[D_{rg}]$ for fixed $p$ and large $N$, which means that the random graph tends to a regular graph with high probability. Section B.2, art. 1 states that $\lambda_1 \to p(N-1)$ with high probability. Comparison with the bounds (B.24) indicates that the upper bound is less tight than the lower bound, and that the upper bound is only sharp when $p \to 1$, i.e. for the complete graph. Section B.2, art. 13 shows that only for the complete graph is the upper bound indeed exactly attained. It is known that, for large $N$, the second largest eigenvalue of $G_p(N)$ grows as $O\left(N^{\frac{1}{2}+\epsilon}\right)$.
$$\lim_{N\to\infty}f_{\lambda}(x) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}\;1_{|x|\le2\sigma} \tag{B.25}$$
Since Wigner's first proof (Wigner, 1955) of this theorem and his subsequent generalizations (Wigner, 1957, 1958), many proofs have been published. However, none of them is short and easy enough to include here. Wigner's Semicircle Law illustrates that, for sufficiently large $N$, the distribution of the eigenvalues of $\frac{A_N}{\sqrt{N}}$ does not depend anymore on the probability distribution of the elements $a_{ij}$. Hence, Wigner's Semicircle Law exhibits a universal property of a class of large, real symmetric matrices with independent random elements. Mehta (1991) suspects that, for a much broader class of large random matrices, a mysterious, yet unknown law of large numbers must be hidden. The scaling of $A$ by $\frac{1}{\sqrt{N}}$ can be understood from the previous Section B.5.1. The adjacency matrix of the random graph satisfies the conditions in Theorem B.5.1 with $\sigma^2 = p(1-p)$, and its eigenvalues (apart from the largest) grow as $O\left(\sqrt{N}\right)$. In order to obtain the finite limit distribution (B.25), scaling by $\frac{1}{\sqrt{N}}$ is necessary.

The spectrum of $G_p(50)$ together with the properly rescaled Wigner's Semicircle Law (B.25) is plotted in Fig. B.2. Already for this small value of $N$, we observe that Wigner's Semicircle Law is a reasonable approximation for the intermediate $p$-region. The largest eigenvalue $\lambda_1$ for finite $N$, which is distributed around $p(N-1)$ as demonstrated above and shown in Fig. B.2, but which is not incorporated in Wigner's Semicircle Law, influences the average $E[\lambda] = \frac{1}{N}\sum_{k=1}^{N}\lambda_k = 0$ and causes the major bulk of the pdf around $x = 0$ to shift leftward compared to Wigner's Semicircle Law, which is perfectly centered around $x = 0$.

The complement of $G_p(N)$ is $(G_p(N))^c = G_{1-p}(N)$, because a link in $G_p(N)$ is present with probability $p$ and absent with probability $1-p$, and $(G_p(N))^c$ is also a random graph. For large $N$, there exists a large range of $p$ values for which both $p \ge p_c$ and $1-p \ge p_c$, such that both $G_p(N)$
490
12
p = 0.1
10
p = 0.9
fO(x)
8
6
p = 0.8
p = 0.2
p = 0.7
p = 0.3
2
0
0
10
20
30
40
eigenvalue x
Fig. B.2. The probability density function of an eigenvalue in Js (50) for various
s. Wigners Semicircle Law, rescaled and for s = 0=5 ( 2 = 14 ), is shown in bold.
We observe that the spectrum for s and 1 s is similar, but slightly shifted. The
high peak for s = 0=1 reect disconnectivity, while the high peak at s = 0=9 shows
the tendency to the spectrum of the complete graph where Q 1 eigenvalues are
precisely 1.
and $(G_p(N))^c$ are connected almost surely. Figure B.2 shows that the normalized spectra of $G_p(N)$ and $G_{1-p}(N)$ are, apart from a small shift and ignoring the largest eigenvalue, almost identical. Equation (B.20) indicates that the spectrum of a graph and that of its complement tend to each other if $\cos\alpha_j \to 0$ (except for the largest eigenvalue, whose eigenvector will tend to $u$). This seems to suggest that $G_p(N)$ and $G_{1-p}(N)$ tend to regular graphs with degree $p(N-1)$ and $(1-p)(N-1)$, respectively, and that these regular graphs (even for small $N$) have nearly the same spectrum (apart from the largest eigenvalues $p(N-1)$ and $(1-p)(N-1)$, respectively): $\frac{\lambda_{1-p}}{\sqrt{N}} \simeq \frac{\lambda_p}{\sqrt{N}}$, where $\lambda_p$ is an eigenvalue of $G_p(N)$.

Figure B.3 shows the probability density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix $A$ of $G_p(N)$ with $N = 100$, together with the eigenvalues of the corresponding matrix $A_U$, in which all one-elements of the adjacency matrix of $G_p(100)$ are replaced by i.i.d. uniform random variables on $[0,1]$. Wigner's Semicircle Law provides an already better approximation
Fig. B.3. The spectrum of the adjacency matrix of $G_p(100)$ (full lines) and of the corresponding matrix with i.i.d. uniform elements (dotted lines). The small peaks at higher values of $x$ are due to $\lambda_1$.
than for $N = 50$. Since the elements of $A_U$ are (with probability 1) smaller than those of $A$, the matrix norm $\|A_U\|_2 < \|A\|_2$, which implies, by Section B.2, art. 1, that $\lambda_1(A_U) < \lambda_1(A)$. In addition, relation (B.13) shows that $\sum_{k=1}^{N}\lambda_k^2(A_U) < 2L$, such that $\mathrm{Var}[\lambda(A_U)] < \mathrm{Var}[\lambda(A)]$, which is manifested by a narrower and higher peaked pdf centered around $x = 0$.
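The semicircle limit is easy to reproduce numerically; a sketch with $\pm1$ entries ($\sigma^2 = 1$), comparing the second moment of the scaled eigenvalues with the semicircle value $\sigma^2$ and checking the support $[-2\sigma, 2\sigma]$:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 400
X = rng.choice([-1.0, 1.0], size=(N, N))     # i.i.d. entries, sigma^2 = 1
A = np.triu(X, 1)
A = A + A.T                                  # symmetric, zero diagonal

lam = np.linalg.eigvalsh(A) / np.sqrt(N)     # Wigner scaling by 1/sqrt(N)
m2 = np.mean(lam**2)
print(m2)                                    # close to sigma^2 = 1
assert abs(m2 - 1.0) < 0.05
assert np.abs(lam).max() < 2.2               # edge approaches 2*sigma = 2
```

A histogram of `lam` traces out the semicircle of (B.25); because here the diagonal is zero, the second moment is exactly $(N-1)/N$.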
Appendix C
Solutions of problems
$$E[\log X] = \sum_{k=1}^{\infty}\log k\;\Pr[X = k]$$
while (2.18), $\varphi_X(z) = \sum_{k=0}^{\infty}\Pr[X = k]\,z^k$, shows that we need to express $\log k$ in terms of $z^k$. A possible solution starts from the double integral with $0 < a \le b$,
$$\int_a^b dx\int_0^\infty dt\,e^{-tx} = \int_0^\infty dt\int_a^b dx\,e^{-tx}$$
where the reversal of integration is justified by absolute convergence (Titchmarsh, 1964, Section 1.8). Since $\int_0^\infty dt\,e^{-tx} = \frac{1}{x}$, the left-hand side integral equals
$$\int_a^b\frac{dx}{x} = \log\frac{b}{a} = \int_0^\infty\frac{e^{-ta} - e^{-tb}}{t}\,dt$$
hence,
$$\log k = \int_0^\infty\frac{e^{-t} - e^{-tk}}{t}\,dt$$
Substitution into the sum gives
$$\sum_{k=1}^{\infty}\log k\;\Pr[X = k] = \int_0^\infty\frac{dt}{t}\sum_{k=1}^{\infty}\left(e^{-t} - e^{-tk}\right)\Pr[X = k] = \int_0^\infty\frac{dt}{t}\left(e^{-t}\sum_{k=1}^{\infty}\Pr[X = k] - \sum_{k=1}^{\infty}e^{-tk}\Pr[X = k]\right)$$
(ii) (a) The pdf of the $k$-th smallest order statistic follows from (3.36) for an exponential distribution as
$$f_{X_{(k)}}(x) = \alpha m\binom{m-1}{k-1}\left(1 - e^{-\alpha x}\right)^{k-1}e^{-\alpha(m-k+1)x}$$
The probability generating function (2.37) is
$$\varphi_{X_{(k)}}(z) = E\left[e^{-zX_{(k)}}\right] = \alpha m\binom{m-1}{k-1}\int_0^\infty\left(1 - e^{-\alpha t}\right)^{k-1}e^{-(z+\alpha(m-k+1))t}\,dt$$
Let $u = e^{-\alpha t}$ and $\beta = z + \alpha(m-k+1)$; then the integral reduces to the well-known Beta function (Abramowitz and Stegun, 1968, Section 6.2),
$$\int_0^\infty\left(1 - e^{-\alpha t}\right)^{k-1}e^{-\beta t}\,dt = \frac{1}{\alpha}\int_0^1(1-u)^{k-1}u^{\frac{\beta}{\alpha}-1}\,du = \frac{1}{\alpha}B\!\left(k, \frac{\beta}{\alpha}\right) = \frac{1}{\alpha}\frac{\Gamma(k)\,\Gamma\!\left(\frac{\beta}{\alpha}\right)}{\Gamma\!\left(k + \frac{\beta}{\alpha}\right)}$$
Hence,
$$\varphi_{X_{(k)}}(z) = \frac{m!}{(m-k)!}\frac{\Gamma\!\left(\frac{z}{\alpha} + m + 1 - k\right)}{\Gamma\!\left(\frac{z}{\alpha} + m + 1\right)} = \frac{m!}{(m-k)!}\prod_{j=0}^{k-1}\frac{1}{\frac{z}{\alpha} + m - j}$$
The mean follows from $E\left[X_{(k)}\right] = -L_{X_{(k)}}'(0)$, where $L_X$ is the logarithm of the generating function (2.41), as
$$E\left[X_{(k)}\right] = \frac{1}{\alpha}\sum_{j=0}^{k-1}\frac{1}{m-j} \tag{C.1}$$
(b) For a polynomial probability density function $f_X(x) = \alpha x^{\alpha-1}1_{x\in[0,1]}$ with $\alpha > 0$, we have with (3.36), for $x \in [0,1]$, that
$$f_{X_{(k)}}(x) = \alpha m\binom{m-1}{k-1}x^{\alpha k-1}\left(1 - x^{\alpha}\right)^{m-k}$$
with mean
$$E\left[X_{(k)}\right] = \alpha m\binom{m-1}{k-1}\int_0^1x^{\alpha k}\left(1 - x^{\alpha}\right)^{m-k}dx = m\binom{m-1}{k-1}\int_0^1t^{k+\frac{1}{\alpha}-1}(1-t)^{m-k}\,dt = \frac{m!}{(k-1)!}\frac{\Gamma\!\left(k + \frac{1}{\alpha}\right)}{\Gamma\!\left(m + 1 + \frac{1}{\alpha}\right)}$$
If $\alpha \to \infty$, then $E\left[X_{(k)}\right] = 1$, while for $\alpha \to 0$, $E\left[X_{(k)}\right] = 0$. For a uniform distribution, where $\alpha = 1$, the result is $E\left[X_{(k)}\right] = \frac{k}{m+1}$. Indeed, the $m$ independently chosen uniform random variables divide, after ordering, the line segment $[0,1]$ into $m+1$ subintervals. The length $L$ of each subinterval has a same distribution, which more easily follows by symmetry if the line segment is replaced by a circle of unit perimeter. Since the length $L$ of each subinterval is equal in distribution, one can consider the first subinterval $[0, X_{(1)}]$, whose length $L$ exceeds a value $x \in (0,1)$ if and only if all $m$ uniform random variables belong to $[x, 1]$. The latter event has probability $(1-x)^m$, such that $\Pr[L > x] = (1-x)^m$ and, with (2.35), $E[L] = \frac{1}{m+1}$.
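The mean (C.1) is easy to confirm by simulation; a sketch for exponential order statistics with rate $\alpha = 1$ (the values of m and k are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 5, 2                                   # k-th smallest of m samples
samples = np.sort(rng.exponential(1.0, size=(200000, m)), axis=1)
empirical = samples[:, k - 1].mean()

exact = sum(1.0 / (m - j) for j in range(k))  # (C.1): 1/5 + 1/4 = 0.45
assert abs(empirical - exact) < 0.01
print(empirical, exact)
```

The summands $\frac{1}{m-j}$ reflect the memoryless property: the waiting time until the next smallest sample is exponential with rate equal to the number of samples still running.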
(iii) If $X$ were a discrete random variable, then $\Pr[X=k]\approx\frac{n_k}{n}$, where $n_k$ is the number of values in the set $\{x_1,x_2,\ldots,x_n\}$ that is equal to $k$. For a continuous random variable $X$, the values are generally real numbers ranging from $x_{\min}=\min_{1\le j\le n}x_j$ to $x_{\max}=\max_{1\le j\le n}x_j$. We first construct a histogram $H$ from the set $\{x_1,x_2,\ldots,x_n\}$ by choosing a bin size $\Delta x=\frac{x_{\max}-x_{\min}}{m}$, where $m$ is the number of bins (abscissa points). The choice of $1<m<n$ is in general difficult to determine. However, most computer packages allow us to experiment with $m$, and the human eye proves sensitive enough to make a good choice of $m$: if $m$ is too small, we lose details, while a high $m$ may lead to large irregularities due to the stochastic nature of $X$. Once $m$ is chosen, the histogram consists of the set $\{h_0,h_1,\ldots,h_{m-1}\}$, where $h_j$ equals the number of $X$ values in the set $\{x_1,x_2,\ldots,x_n\}$ that lie in the interval $[x_{\min}+j\Delta x,\,x_{\min}+(j+1)\Delta x]$ for $0\le j\le m-1$. By construction, $\sum_{j=0}^{m-1}h_j=n$.
The histogram $H$ approximates the probability density function $f_X(x)$ after dividing each value $h_j$ by $n\Delta x$, because

$$1=\int_{x_{\min}}^{x_{\max}}f_X(x)\,dx=\lim_{\Delta x\to 0}\sum_j f_X(\xi_j)\,\Delta x\approx\sum_{j=0}^{m-1}\frac{h_j}{n\Delta x}\,\Delta x=1$$

where in the Riemann sum $\xi_j$ denotes a real number $\xi_j\in[x_{\min}+j\Delta x,\,x_{\min}+(j+1)\Delta x]$. Alternatively, from (2.31) we obtain

$$f_X(\xi_j)=\lim_{\Delta x\to 0}\frac{\Pr[\xi_j<X\le \xi_j+\Delta x]}{\Delta x}\approx\frac{\Pr[x_{\min}+j\Delta x<X\le x_{\min}+(j+1)\Delta x]}{\Delta x}$$

such that

$$f_X(\xi_j)\approx\frac{h_j}{n\Delta x}$$
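The construction above translates directly into code. A minimal sketch (mine, not from the text), estimating the density of exponential samples for illustration:

```python
import random

def histogram_density(samples, m):
    """Estimate f_X from n samples with m bins: f_j = h_j / (n * dx),
    so that the Riemann sum sum_j f_j * dx equals 1 by construction."""
    x_min, x_max = min(samples), max(samples)
    dx = (x_max - x_min) / m
    h = [0] * m
    for x in samples:
        j = min(int((x - x_min) / dx), m - 1)  # clamp x_max into last bin
        h[j] += 1
    n = len(samples)
    return [hj / (n * dx) for hj in h], dx

random.seed(1)
samples = [random.expovariate(1.0) for _ in range(100_000)]
f, dx = histogram_density(samples, 40)
print(sum(fj * dx for fj in f))  # equals 1 up to rounding
```

Experimenting with the bin count m, as the text suggests, shows the trade-off between lost detail (small m) and stochastic irregularity (large m).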
(iv) The distance $R$ of a uniformly chosen mobile node to the center of a circular cell with radius $r$ has probability density function $f_R(x)=\frac{2x}{r^2}1_{x\le r}$, whence by integration $F_R(x)=\frac{x^2}{r^2}1_{x\le r}+1_{x>r}$. The (random) position $R_{(m)}$ of the $m$-th nearest mobile node to the center is given by (3.36),

$$f_{R_{(m)}}(x)=N f_R(x)\binom{N-1}{m-1}\left(F_R(x)\right)^{m-1}\left(1-F_R(x)\right)^{N-m}
=2\sigma\pi x\binom{N-1}{m-1}\left(\frac{\sigma\pi x^2}{N}\right)^{m-1}\left(1-\frac{\sigma\pi x^2}{N}\right)^{N-1-(m-1)}$$

where $N=\sigma\pi r^2$ and $\sigma$ is the node density. We recognize, apart from the prefactor $2\sigma\pi x$, a binomial distribution (3.3) with $p=\frac{\sigma\pi x^2}{N}$. Similar to the derivation of the law of rare events in Section 3.1.4, this binomial distribution tends, for large $N$ but constant density $\sigma$, to a Poisson distribution with $\lambda=\sigma\pi x^2$. Hence, asymptotically, the pdf of the position $R_{(m)}$ of the $m$-th nearest mobile node to the center is, for $x\le r$,

$$f_{R_{(m)}}(x)=2\sigma\pi x\,\frac{\left(\sigma\pi x^2\right)^{m-1}}{(m-1)!}\,e^{-\sigma\pi x^2}$$
(v) We use the law of total probability (2.46), first assuming that $W$ is discrete,

$$\Pr[V-W\le x]=\sum_k \Pr[V-W\le x|W=k]\Pr[W=k]$$

Solutions of problems

If $W$ is continuous, the general formula is

$$\Pr[V-W\le x]=\int_{-\infty}^{\infty}\Pr[V\le x+y]\,\frac{d\Pr[W\le y]}{dy}\,dy\qquad(\mathrm{C.2})$$

Differentiation with respect to $x$ gives $f_{V-W}(x)=\int_{-\infty}^{\infty}f_V(x+y)f_W(y)\,dy$, which resembles the convolution integral (2.62). If both $V$ and $W$ have the same distribution, direct integration of (C.2) yields

$$\Pr[V\le W]=\Pr[W\le V]=\frac{1}{2}$$

This equation confirms the intuitive result that two independent random variables with the same density function have equal probability to be larger or smaller than the other.
$$f_{XY}(x,y;\rho)=\frac{1}{(2\pi i)^2}\int_{c_1-i\infty}^{c_1+i\infty}\int_{c_2-i\infty}^{c_2+i\infty}
\exp\!\left[\frac{\sigma_X^2z_1^2+2\rho\sigma_X\sigma_Yz_1z_2+\sigma_Y^2z_2^2}{2}+z_1(x-\mu_X)+z_2(y-\mu_Y)\right]dz_1\,dz_2$$

$$=\frac{1}{2\pi i}\int_{c_2-i\infty}^{c_2+i\infty}e^{\frac{\sigma_Y^2(1-\rho^2)z_2^2}{2}+z_2(y-\mu_Y)}\,L\,dz_2$$

where the inner integral is

$$L=\frac{1}{2\pi i}\int_{c_1-i\infty}^{c_1+i\infty}\exp\!\left[\frac{\sigma_X^2}{2}\left(z_1+\frac{\rho\sigma_Y z_2}{\sigma_X}\right)^2+z_1(x-\mu_X)\right]dz_1$$

Since the integrand is an entire function, the contour can be shifted, which allows substitution as in real analysis. Thus, let $u=t-i\left(c_1+\frac{\rho\sigma_Y z_2}{\sigma_X}\right)$; then

$$L=\frac{1}{2\pi}\,e^{-\frac{\rho\sigma_Y z_2(x-\mu_X)}{\sigma_X}}\int_{-\infty}^{\infty}\exp\!\left[-\frac{\sigma_X^2}{2}u^2+iu(x-\mu_X)\right]du
=\frac{1}{2\pi}\,e^{-\frac{\rho\sigma_Y z_2(x-\mu_X)}{\sigma_X}}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\int_{-\infty}^{\infty}\exp\!\left[-\frac{\sigma_X^2}{2}\left(u-i\frac{x-\mu_X}{\sigma_X^2}\right)^2\right]du$$

By substituting $t=u-i\frac{x-\mu_X}{\sigma_X^2}$, the integral becomes

$$\int_{-\infty}^{\infty}\exp\!\left[-\frac{\sigma_X^2}{2}t^2\right]dt=2\int_0^{\infty}\exp\!\left[-\frac{\sigma_X^2}{2}t^2\right]dt
=\frac{\sqrt{2}}{\sigma_X}\int_0^{\infty}e^{-w}w^{-1/2}\,dw=\frac{\sqrt{2}}{\sigma_X}\,\Gamma\!\left(\frac{1}{2}\right)=\frac{\sqrt{2\pi}}{\sigma_X}$$

where we have used the Gamma function (Abramowitz and Stegun, 1968, Chapter 6). Hence,

$$L=\frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\frac{\rho\sigma_Y z_2(x-\mu_X)}{\sigma_X}}\exp\!\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right]$$

and

$$f_{XY}(x,y;\rho)=\frac{\exp\!\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right]}{\sigma_X\sqrt{2\pi}}\,\frac{1}{2\pi i}\int_{c_2-i\infty}^{c_2+i\infty}
\exp\!\left[\frac{\sigma_Y^2(1-\rho^2)z_2^2}{2}+z_2\!\left(y-\mu_Y-\frac{\rho\sigma_Y(x-\mu_X)}{\sigma_X}\right)\right]dz_2$$

The same Gaussian integral, now with $\sigma_Y\sqrt{1-\rho^2}$ instead of $\sigma_X$, yields

$$f_{XY}(x,y;\rho)=\frac{\exp\!\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right]\exp\!\left[-\frac{\left(y-\mu_Y-\frac{\rho\sigma_Y(x-\mu_X)}{\sigma_X}\right)^2}{2\sigma_Y^2(1-\rho^2)}\right]}{2\pi\,\sigma_X\sigma_Y\sqrt{1-\rho^2}}$$

which finally leads to the joint Gaussian density function (4.4). Hence, the linear combination method leads to exact results for Gaussian random variables.
$$\Pr[Y=k]=\sum_{n=0}^{\infty}\Pr[Y=k|N=n]\Pr[N=n]$$

With (3.3) and (3.9), we have

$$\Pr[Y=k]=\sum_{n=k}^{\infty}\binom{n}{k}p^k q^{n-k}\,\frac{\lambda^n}{n!}e^{-\lambda}
=\frac{p^k\lambda^k}{k!}\,e^{-\lambda}\sum_{n=k}^{\infty}\frac{(q\lambda)^{n-k}}{(n-k)!}
=\frac{(p\lambda)^k}{k!}\,e^{-\lambda+q\lambda}=\frac{(p\lambda)^k}{k!}\,e^{-p\lambda}$$

As an application, we can consider a Poisson arrival flow of packets at a router with rate $\lambda$. If the packets are marked randomly with probability $p=\frac{\lambda_1}{\lambda}$, the resulting flow consists of two types, those marked and those not. Each of these flows is again a Poisson flow, the marked flow with rate $\lambda_1=p\lambda$ and the non-marked flow with $\lambda_2=(1-p)\lambda$, and the Poisson random variables $X_1$ and $X_2$ counting them are independent. Actually, this procedure leads to a decomposition of the Poisson process into two independent Poisson processes and leads to the reverse of Theorem 7.3.4.
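The decomposition (Poisson thinning) described above is easy to check by simulation. A small sketch (mine, not from the text), with assumed rates lam = 10 and marking probability p = 0.25:

```python
import random

random.seed(7)
lam, p, T = 10.0, 0.25, 2000.0  # total rate, marking probability, horizon

# Generate one Poisson stream of rate lam on [0, T] and mark each arrival
# independently with probability p.
t, marked, unmarked = 0.0, [], []
while True:
    t += random.expovariate(lam)
    if t > T:
        break
    (marked if random.random() < p else unmarked).append(t)

# The two substreams are again Poisson, with rates lam*p and lam*(1-p):
# the empirical rates should be close to 2.5 and 7.5 here.
print(len(marked) / T, len(unmarked) / T)
```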
(iv) (a) Applying the solution of the previous exercise immediately gives $\frac{1}{\lambda_1+\lambda_2+\lambda_3}$.
(b) Since the three Poisson processes are independent, the total number of cars on the three lanes, denoted by $X$, is also a Poisson process (Theorem 7.3.4) with rate $\lambda=\lambda_1+\lambda_2+\lambda_3$. Hence, $\Pr[X=n]=\frac{\lambda^n}{n!}e^{-\lambda}$.
(c) Let us denote the Poisson process in lane $j$ by $X_j$. Then, using the independence between the $X_j$,

$$\Pr[X_1=n,X_2=0,X_3=0]=\Pr[X_1=n]\Pr[X_2=0]\Pr[X_3=0]
=\frac{\lambda_1^n}{n!}\,e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3}=\frac{\lambda_1^n}{n!}\,e^{-\lambda}$$
(v) (a) The player relies on the fact that during the remaining time there is exactly one arrival. Since the game rules state that he should identify the last signal in $(0,T)$, signals arriving during $(0,s)$ do not influence his chance to win, because of the memoryless property of the Poisson process. The number of arrivals in the interval $(s,T)$ obeys a Poisson distribution with parameter $\lambda(T-s)$. The probability that precisely one signal arrives in the interval $(s,T)$ is $\Pr[N(T)-N(s)=1]=\lambda(T-s)e^{-\lambda(T-s)}$.
(b) Maximizing this winning probability with respect to $s$ (by equating the first derivative to zero) yields

$$\frac{d}{ds}\Pr[N(T)-N(s)=1]=-\lambda e^{-\lambda(T-s)}+\lambda^2(T-s)e^{-\lambda(T-s)}=0$$

with solution $\lambda(T-s)=1$, or $s=T-1/\lambda$. This maximum (which is readily verified by checking that $\frac{d^2}{ds^2}\Pr[N(T)-N(s)=1]<0$) lies inside the allowed interval $(0,T)$. The maximum probability of winning is $\Pr[N(T)-N(T-1/\lambda)=1]=1/e$.
(vi) (a) We apply the general formula (7.1) for the pdf of a Poisson process with mean $E[X(t)]=\lambda t=1$. Then, $\Pr[X(t+s)-X(s)=0]=e^{-\lambda t}=\frac{1}{e}$.
(b) $\Pr[X(t+s)-X(s)>10]=1-\Pr[X(t+s)-X(s)\le 10]=1-\frac{1}{e}\sum_{k=0}^{10}\frac{1}{k!}$.
(c) Each minute is equally probable, as follows from Theorem 7.3.3.
(vii) This exercise is an application of random marking in a Poisson flow, as explained in solution (iii) above. The total flow of packets can be split up into an ACK stream, a Poisson process $N_1$ with rate $\lambda p=3\,\mathrm{s}^{-1}$, and a data flow, an independent Poisson process $N_2$ with rate $\lambda(1-p)=7\,\mathrm{s}^{-1}$. Then,
(a) $\Pr[N_1\ge 1]=1-\Pr[N_1=0]=1-e^{-3}$
(b) The average number is $E[N_1+N_2|N_1=5]=E[N_1|N_1=5]+E[N_2|N_1=5]=5+E[N_2]=5+7=12$ packets.
(c)

$$\Pr[N_1=2|N_1+N_2=8]=\frac{\frac{3^2e^{-3}}{2!}\cdot\frac{7^6e^{-7}}{6!}}{\frac{10^8e^{-10}}{8!}}\approx 29.65\%$$
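Part (c) can also be checked via the fact that, conditioned on the total $N_1+N_2=8$, each arrival belongs to the ACK stream independently with probability $\lambda_1/(\lambda_1+\lambda_2)=0.3$, so that $N_1$ is binomial. A quick sketch (mine, not from the text):

```python
from math import comb, exp, factorial

lam1, lam2, n, k = 3.0, 7.0, 8, 2

# Direct computation of Pr[N1 = 2 | N1 + N2 = 8] from the Poisson pmfs.
lam = lam1 + lam2
direct = (lam1**k * exp(-lam1) / factorial(k)) * \
         (lam2**(n - k) * exp(-lam2) / factorial(n - k)) / \
         (lam**n * exp(-lam) / factorial(n))

# Conditioned on the total, N1 is binomial with p = lam1 / (lam1 + lam2).
p = lam1 / lam
binom = comb(n, k) * p**k * (1 - p)**(n - k)

print(round(direct, 4), round(binom, 4))  # both are 0.2965
```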
(viii) (a) Since the three Poisson arrival processes are independent, the total number of requests is also a Poisson process with parameter $\lambda=\lambda_1+\lambda_2+\lambda_3=20$ requests/hour (Theorem 7.3.4). The expected number of requests during an 8-hour working day is $E[N]=\lambda t=20\cdot 8=160$ requests.
(b) If we denote the arrival processes of requests with different ADSL problems by random variables $X_l$ for $l=1,2$ and $3$, then, due to their mutual independence,

$$\Pr[X_1=0,X_2=k,X_3=0]=\Pr[X_1=0]\Pr[X_2=k]\Pr[X_3=0]
=e^{-\lambda_1 t}\,\frac{(\lambda_2 t)^k}{k!}\,e^{-\lambda_2 t}\,e^{-\lambda_3 t}$$

which, with the numerical values given, equals $1.7\times 10^{-3}$.
(c) If we denote the total number of requests by $X$, then $\Pr[X=0]=e^{-\lambda t}=e^{-\frac{20}{4}}=6.7\times 10^{-3}$.
(d) The precise time is irrelevant for Poisson processes; only the duration of the interval matters. Here the intervals are overlapping, and we need to split them into disjoint increments before computing the probability from the independent increments of the Poisson process.
(e) Given that at the moment $t+s$ there are $k+m$ requests, the probability that there were $k$ requests at the moment $t$ is

$$\Pr[X(t)=k|X(t+s)=k+m]=\frac{\Pr[\{X(t)=k\}\cap\{X(t+s)=k+m\}]}{\Pr[X(t+s)=k+m]}
=\frac{\Pr[X(t)=k]\Pr[X(t+s)-X(t)=m]}{\Pr[X(t+s)=k+m]}$$
(ix) (a) The number of attacks arriving at the PC is a Poisson random variable $X(t)$ with rate $\lambda=6$. The probability of exactly one ($k=1$) attack during one ($t=1$) hour follows from (7.1) as $\Pr[X(1)=1]=6e^{-6}$.
(b) Applying (7.2), the expected amount of time that the PC has been on is $t=\frac{E[X(t)]}{\lambda}=\frac{60}{6}=10$ hours.
(c) The arrival time of the fifth attack is denoted by $T$. Given that there are six attacks in one hour ($t=1$), we compute the probability $\Pr[T<t|X(1)=6]$ that either five attacks arrive in the interval $(0,t)$ and one arrives in $(t,1)$, or all six attacks arrive in $(0,t)$ and none arrives in the interval $(t,1)$. Hence, for $0\le t<1$,

$$F_T(t)=\Pr[T<t|X(1)=6]=\frac{\Pr[\{X(t)=5\}\cap\{X(1)=6\}]+\Pr[\{X(t)=6\}\cap\{X(1)=6\}]}{\Pr[X(1)=6]}$$
$$=\frac{\Pr[X(t)=5]\Pr[X(1)-X(t)=1]+\Pr[X(t)=6]\Pr[X(1)-X(t)=0]}{\Pr[X(1)=6]}=6t^5-5t^6$$

The probability that the fifth attack arrives between 1:30 p.m. and 2 p.m. is $F_T(1)-F_T\!\left(\frac{1}{2}\right)=1-\frac{7}{64}=\frac{57}{64}$.
(d) The expectation of $T$ given $X(1)=6$ follows from (2.33) as $E[T|X(1)]=\int_0^1 x f_T(x)\,dx$, where $f_T(t)=\frac{dF_T(t)}{dt}$ is derived in (c). Alternatively, the expectation can be computed from (2.35), $E[T|X(1)]=\int_0^1\left(1-(6x^5-5x^6)\right)dx=\frac{5}{7}$. Hence, the expected arrival time of the fifth attack between 1 p.m. and 2 p.m. is about 1:43 p.m.
(x) Let $X$ and $X_j$ denote the lifetime of the system and of subsystem $j$, respectively. For a series connection of subsystems with independent lifetimes $X_j$, the event $\{X>t\}=\bigcap_{j=1}^{n}\{X_j>t\}$ and $\Pr[X>t]=\prod_{j=1}^{n}\Pr[X_j>t]$. Recall with (3.32) that $\Pr[X>t]=\Pr\!\left[\min_{1\le j\le n}X_j>t\right]$. Using the definition of the reliability function (7.5) then yields

$$R_{\mathrm{series}}(t)=\prod_{j=1}^{n}R_j(t)$$

(xi) The probability that the system $S$ shown in Fig. 7.6 fails is determined by the subsystem with the longest lifetime, or $X=\max_{1\le j\le n}X_j$. Invoking relation (3.33) combined with the definition of the reliability function (7.5) leads to

$$R_{\mathrm{parallel}}(t)=1-\prod_{j=1}^{n}\left(1-R_j(t)\right)$$
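The two reliability formulas can be evaluated directly. A short sketch (mine, not from the text), assuming exponential subsystem lifetimes with illustrative rates:

```python
from math import exp

def r_series(t, rates):
    """R_series(t) = prod_j R_j(t) for independent exponential lifetimes."""
    prod = 1.0
    for a in rates:
        prod *= exp(-a * t)
    return prod

def r_parallel(t, rates):
    """R_parallel(t) = 1 - prod_j (1 - R_j(t))."""
    prod = 1.0
    for a in rates:
        prod *= 1.0 - exp(-a * t)
    return 1.0 - prod

rates = [0.1, 0.2, 0.5]
# A series system is weaker than its weakest component; a parallel system
# is stronger than its strongest component.
print(r_series(1.0, rates), r_parallel(1.0, rates))
```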
For the epoch $W_{N(t)}$ of the last renewal before time $t$, conditioning on the number of renewals gives

$$\Pr\!\left[W_{N(t)}\le x\right]=\Pr[W_0\le x,\,W_1>t]+\sum_{n=1}^{\infty}\int_0^{x}\Pr[W_{n+1}>t|W_n=u]\,\frac{d\Pr[W_n\le u]}{du}\,du$$

A renewal process restarts after each renewal from scratch (due to the stationarity and the independent increments of the renewal process). This implies that $\Pr[W_{n+1}>t|W_n=u]=\Pr[\tau_{n+1}>t-u]=1-F(t-u)$, because the interarrival times are i.i.d. random variables. Combined,

$$\Pr\!\left[W_{N(t)}\le x\right]=\Pr[\tau>t]+\sum_{n=1}^{\infty}\int_0^{x}\Pr[\tau>t-u]\,d\Pr[W_n\le u]
=\Pr[\tau>t]+\int_0^{x}\Pr[\tau>t-u]\,d\!\left(\sum_{n=1}^{\infty}\Pr[W_n\le u]\right)$$

With the basic equivalence (8.6) and the definition (8.7) of the renewal function $m(t)$, we arrive at

$$\Pr\!\left[W_{N(t)}\le x\right]=\Pr[\tau>t]+\int_0^{x}\Pr[\tau>t-u]\,dm(u)$$

This equation holds for all $x$. If $x=t$, we can use the renewal equation,

$$\int_0^{t}\Pr[\tau>t-u]\,dm(u)=m(t)-\int_0^{t}\Pr[\tau\le t-u]\,dm(u)$$

The generating function of $N(t)$ satisfies, by conditioning on the first interarrival time,

$$\varphi_{N(t)}(z)=\Pr[N(t)=0]+z\int_0^{t}\left(\sum_{k=1}^{\infty}\Pr[N(t-s)=k-1]\,z^{k-1}\right)f(s)\,ds
=\Pr[N(t)=0]+z\int_0^{t}\varphi_{N(t-s)}(z)\,f(s)\,ds$$

Differentiation,

$$\varphi'_{N(t)}(z)=\int_0^{t}\varphi_{N(t-s)}(z)\,dF(s)+z\int_0^{t}\varphi'_{N(t-s)}(z)\,dF(s)$$

reduces to the renewal equation (8.9) for $z=1$, since $\varphi'_{N(t)}(1)=m(t)$. The second derivative

$$\varphi''_{N(t)}(z)=2\int_0^{t}\varphi'_{N(t-s)}(z)\,dF(s)+z\int_0^{t}\varphi''_{N(t-s)}(z)\,dF(s)$$

evaluated at $z=1$, is

$$\varphi''_{N(t)}(1)=2m(t)-2F(t)+\int_0^{t}\varphi''_{N(t-s)}(1)\,dF(s)$$

where $2\int_0^{t}m(t-s)\,dF(s)=2\left(m(t)-F(t)\right)$ follows from the renewal equation.
(iii) Every time an IP packet is launched by TCP, a renewal occurs, and the reward is that 2000 km are travelled in each renewal; thus $R_n=2000$ km. The speed in a trip that suffers from congestion is, on average, 40 000 km/s, while the speed without congestion is 120 000 km/s. Since congestion occurs in only 1 out of 5 cases, the average length (in s) of a renewal period is

$$E[\tau]=\frac{4}{5}\cdot\frac{2000}{120000}+\frac{1}{5}\cdot\frac{2000}{40000}=\frac{7}{300}$$

The average speed of an IP packet (in km/s) then follows from (8.20) as

$$\lim_{t\to\infty}\frac{R(t)}{t}=\frac{E[R]}{E[\tau]}=\frac{2000}{7/300}=85714.3$$
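The renewal reward theorem used above is easy to confirm by simulation. A small sketch (mine, not from the text) of the IP-packet example:

```python
import random

random.seed(42)
# Each renewal covers R = 2000 km; with probability 1/5 the trip is
# congested (40000 km/s), otherwise it runs at 120000 km/s.
R, n = 2000.0, 200_000
total_time = 0.0
for _ in range(n):
    speed = 40000.0 if random.random() < 0.2 else 120000.0
    total_time += R / speed
# Long-run average speed = E[R]/E[tau] = 2000/(7/300) = 600000/7 km/s.
print(n * R / total_time)
```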
(iv) Every transmission of an ATM cell is a renewal, with the average length of the renewal interval equal to $E[\tau]=N/r$, where $1/r$ is the mean interarrival time of a voice sample. If $\tau_n$ is the time between the $n$-th and $(n+1)$-th arrival of a sample, then the average total cost per ATM cell transmission equals

$$E[R]=E\!\left[\sum_{n=1}^{N-1}nc\,\tau_n+K\right]=c\sum_{n=1}^{N-1}nE[\tau_n]+K=\frac{c}{r}\,\frac{N(N-1)}{2}+K$$

Hence, the average cost per unit time incurred in UMTS is $\frac{E[R]}{E[\tau]}=\frac{c(N-1)}{2}+\frac{Kr}{N}$.
(v) (a) The replacement of a router is a renewal process, where the time at which router $R_j$ is replaced is $W_j=\min(X_j,T)$, and

$$R_j=\begin{cases}A, & \text{if } X_j\le T\\ B, & \text{if } X_j>T\end{cases}$$

The average cost per renewal period is $E[R]=A\Pr[X_j\le T]+B\Pr[X_j>T]$, and the average length of a renewal interval equals

$$E[W]=\int_0^{\infty}\Pr[W_j>t]\,dt=\int_0^{T}\Pr[X_j>t]\,dt$$

such that the time-average cost rate of the policy is $\frac{E[R]}{E[W]}$.
(b) For $A=10000$, $B=7000$, $\Pr[X_j\le T]=1-e^{-\alpha T}$ with mean lifetime $\frac{1}{\alpha}=10$ years and $T=5$, we have

$$E[R]=A\Pr[X_j\le T]+B\Pr[X_j>T]=10000\left(1-e^{-1/2}\right)+7000\,e^{-1/2}\simeq 8200$$

and

$$E[W]=\int_0^{T}\Pr[X_j>t]\,dt=\int_0^{5}e^{-0.1t}\,dt=10\left(1-e^{-1/2}\right)\simeq 4$$

such that the time-average cost rate of the policy ChangeRouter is $C\simeq\frac{8200}{4}=2050$.
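The cost rate of the ChangeRouter policy can be evaluated for any replacement age $T$. A sketch (mine, not from the text), with the rounded values of the solution above:

```python
from math import exp

def cost_rate(T, A=10000.0, B=7000.0, alpha=0.1):
    """Time-average cost E[R]/E[W] of the ChangeRouter policy with
    exponential lifetimes of rate alpha: cost A on a failure before T,
    cost B at the planned replacement age T."""
    p_fail = 1.0 - exp(-alpha * T)
    e_reward = A * p_fail + B * (1.0 - p_fail)
    e_length = p_fail / alpha          # integral of exp(-alpha*t) over [0, T]
    return e_reward / e_length

# T = 5 gives about 2.1e3 per year, consistent with the rounded
# 8200/4 = 2050 in the text.
print(round(cost_rate(5.0)))
```

Sweeping T with this function shows how the policy cost trades off early replacement against failure risk.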
The transition probability matrix of the three-state chain (a general random walk with upward probability $p=0.2$ and downward probability $q=0.8$) is

$$P=\begin{bmatrix}0.8 & 0.2 & 0\\ 0.8 & 0 & 0.2\\ 0 & 0.8 & 0.2\end{bmatrix}$$

The first method computes powers of $P$, which converge rapidly:

$$P^2=\begin{bmatrix}0.800 & 0.160 & 0.040\\ 0.640 & 0.320 & 0.040\\ 0.640 & 0.160 & 0.200\end{bmatrix}\qquad
P^4=\begin{bmatrix}0.768 & 0.186 & 0.046\\ 0.742 & 0.211 & 0.046\\ 0.742 & 0.186 & 0.072\end{bmatrix}$$

$$P^8=\begin{bmatrix}0.762 & 0.190 & 0.048\\ 0.761 & 0.191 & 0.048\\ 0.761 & 0.190 & 0.048\end{bmatrix}\qquad
P^{16}=\begin{bmatrix}0.762 & 0.190 & 0.048\\ 0.762 & 0.190 & 0.048\\ 0.762 & 0.190 & 0.048\end{bmatrix}$$

from which we find that the row vector in $P^{16}$ equals $\pi=\begin{bmatrix}0.762 & 0.190 & 0.048\end{bmatrix}$.
The second method consists in solving the set (9.25) by Cramer's method. Hence,

$$\det M=\begin{vmatrix}-0.2 & 0.8 & 0\\ 0.2 & -1.0 & 0.8\\ 1 & 1 & 1\end{vmatrix}=0.84$$

$$\pi_0=\frac{1}{\det M}\begin{vmatrix}0 & 0.8 & 0\\ 0 & -1 & 0.8\\ 1 & 1 & 1\end{vmatrix}=\frac{0.64}{0.84}=0.762\qquad
\pi_1=\frac{1}{\det M}\begin{vmatrix}-0.2 & 0 & 0\\ 0.2 & 0 & 0.8\\ 1 & 1 & 1\end{vmatrix}=\frac{0.16}{0.84}=0.190$$

$$\pi_2=\frac{1}{\det M}\begin{vmatrix}-0.2 & 0.8 & 0\\ 0.2 & -1 & 0\\ 1 & 1 & 1\end{vmatrix}=\frac{0.04}{0.84}=0.048$$

The third method relies on the specific structure of the Markov chain, a discrete birth and death process or general random walk with constant $p_k=p$ and $q_k=q$. Applying formula (11.9), taking into account that $N=2$ and $\alpha=\frac{0.2}{0.8}=\frac{1}{4}$, yields

$$\pi_0=\frac{1}{1+\alpha+\alpha^2}=\frac{16}{21}=0.762\qquad
\pi_1=\alpha\pi_0=\frac{4}{21}=0.190\qquad
\pi_2=\alpha\pi_1=\frac{1}{21}=0.048$$
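The first method (repeated squaring of P) is a two-line computation. A sketch (mine, not from the text), in pure Python:

```python
# Steady state of the 3-state random walk with p = 0.2 up and q = 0.8 down,
# checked by repeated squaring of P: P^2, P^4, P^8, P^16.
P = [[0.8, 0.2, 0.0],
     [0.8, 0.0, 0.2],
     [0.0, 0.8, 0.2]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

M = P
for _ in range(4):
    M = matmul(M, M)

# Every row of P^16 approximates pi = (16/21, 4/21, 1/21).
print([round(x, 3) for x in M[0]])  # → [0.762, 0.19, 0.048]
```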
(ii) The Markov chain is shown in Fig. C.2: from state $j$, the chain moves to state $j+1$ with probability $\frac{j-1}{j}$ and falls back to state 1 with probability $\frac{1}{j}$. State 1 is an absorbing state. From (9.23),

$$\pi_1=\sum_{k=1}^{N}\frac{\pi_k}{k}\qquad\text{and}\qquad \pi_j=\frac{j-2}{j-1}\,\pi_{j-1}\quad (j\ge 2)$$

so that $\pi_2=0$, or $\pi_1=1$ and $\pi_j=0$ for $j>1$. Hence, the steady-state vector exists and is different from $\pi=0$, which demonstrates that the Markov chain is positive recurrent for any number of states $N$. However, the drift for $j>1$ (because $j=1$ is absorbing) is

$$E[X_{k+1}-X_k|X_k=j]=1-\frac{1}{j}-\frac{1}{j}=1-\frac{2}{j}$$

which is always positive for $j>2$. Hence, given an initial state $j>2$, the Markov chain will, on average, move to the right (towards higher states).
(iii) (a) The Markov chain is shown in Fig. C.3.

Fig. C.3. Markov chain of the growth process of trees in a forest during a period of 15 years.

During each 15-year period, a tree in class b(aby), y(oung), m(iddle-aged) or o(ld) either advances to the next age class or dies (for the old class: is cut) and is replaced by a new baby tree, with probabilities $p_b$, $p_y$, $p_m$ and $p_o$, respectively.
(b) The transition probability matrix is

$$P=\begin{bmatrix}p_b & 1-p_b & 0 & 0\\ p_y & 0 & 1-p_y & 0\\ p_m & 0 & 0 & 1-p_m\\ p_o & 0 & 0 & 1-p_o\end{bmatrix}$$

(c) With the numerical values $p_b=0.1$, $p_y=0.2$, $p_m=0.3$ and $p_o=0.4$, one period maps the initial population vector $(b[0],y[0],m[0],o[0])=(500,4500,0,0)$ into

$$(b[1],y[1],m[1],o[1])=(b[0],y[0],m[0],o[0])\,P=(950,450,3600,0)$$

(d) The steady-state vector obeys equation (9.22) or, equivalently, (9.25). Applying a variant of (9.25), we have

$$\begin{bmatrix}1 & 1 & 1 & 1\\ 1-p_b & -1 & 0 & 0\\ 0 & 1-p_y & -1 & 0\\ 0 & 0 & 1-p_m & -p_o\end{bmatrix}\begin{bmatrix}\pi_b\\ \pi_y\\ \pi_m\\ \pi_o\end{bmatrix}=\begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix}$$

The determinant is $\det S=-p_o-(1-p_b)\left(1+2p_o-p_y-p_m+p_yp_m-p_op_y\right)$, and via Cramer's method we have

$$\pi_b=\frac{1}{\det S}\det\begin{bmatrix}1 & 1 & 1 & 1\\ 0 & -1 & 0 & 0\\ 0 & 1-p_y & -1 & 0\\ 0 & 0 & 1-p_m & -p_o\end{bmatrix}
=\frac{1}{\det S}\det\begin{bmatrix}-1 & 0 & 0\\ 1-p_y & -1 & 0\\ 0 & 1-p_m & -p_o\end{bmatrix}=\frac{-p_o}{\det S}$$

With the numerical values given in (c), $\pi_b=0.25773$. After a similar calculation for the other categories, the total number of trees in steady growth is

$$\begin{bmatrix}5000\,\pi_b\\ 5000\,\pi_y\\ 5000\,\pi_m\\ 5000\,\pi_o\end{bmatrix}\simeq\begin{bmatrix}1289\\ 1160\\ 928\\ 1624\end{bmatrix}$$
(iv) (a) The clustered error pattern is modeled as a two-state discrete Markov chain. When a bit is received incorrectly, the system is in state 0; otherwise it is in state 1. The Markov chain is shown in Fig. 9.2, where $p=1-0.95=0.05$ and $q=1-0.999=0.001$. The transition probability matrix is

$$P=\begin{bmatrix}0.95 & 0.05\\ 0.001 & 0.999\end{bmatrix}$$

(b) There is only one communicating class, because both states 0 and 1 are reachable from each other. The Markov chain is therefore irreducible.
(c) The steady-state vector follows from (9.37) as

$$\pi=\begin{bmatrix}\frac{1}{51} & \frac{50}{51}\end{bmatrix}=\begin{bmatrix}0.0196 & 0.9804\end{bmatrix}$$

The fraction of correctly received bits in the long run is 98.04% and the fraction of incorrectly received bits is 1.96%.
(d) After repair, the system operates correctly in 99.9% of the cases, which implies that $\pi_1=0.999$ and $\pi_0=0.001$. Formula (9.37) indicates that $\frac{p}{p+q}=0.999$ and $\frac{q}{p+q}=0.001$, or $p=999q$. The test sequence shows that
Fig. C.4. The Markov chain for the three states: (1) both processors work, (2) one processor is damaged and (3) both processors are damaged.
(b) The infinitesimal generator is

$$Q=\begin{pmatrix}-2\lambda & 2\lambda & 0\\ \mu & -(\lambda+\mu) & \lambda\\ 0 & 2\mu & -2\mu\end{pmatrix}$$

If the state probability vector is denoted by $s(t)$, we can also write $s(t)\,Q=\frac{d}{dt}s(t)$, or

$$\begin{bmatrix}s_1(t) & s_2(t) & s_3(t)\end{bmatrix}\begin{pmatrix}-2\lambda & 2\lambda & 0\\ \mu & -(\lambda+\mu) & \lambda\\ 0 & 2\mu & -2\mu\end{pmatrix}=\begin{bmatrix}s_1'(t) & s_2'(t) & s_3'(t)\end{bmatrix}$$

(c) In the steady state, $\pi Q=0$:

$$\begin{bmatrix}\pi_1 & \pi_2 & \pi_3\end{bmatrix}\begin{pmatrix}-2\lambda & 2\lambda & 0\\ \mu & -(\lambda+\mu) & \lambda\\ 0 & 2\mu & -2\mu\end{pmatrix}=\begin{bmatrix}0 & 0 & 0\end{bmatrix}$$

Since $\pi_1+\pi_2+\pi_3=1$, we find that

$$\pi_1=\frac{\mu^2}{(\lambda+\mu)^2},\qquad \pi_2=\frac{2\lambda\mu}{(\lambda+\mu)^2},\qquad \pi_3=\frac{\lambda^2}{(\lambda+\mu)^2}$$

From the balance equations, we know that the probability flux from state 1 to state 2 must precisely equal that in the opposite direction, so that $2\lambda\pi_1=\mu\pi_2$, and similarly for the transitions $2\leftrightarrow 3$, $\lambda\pi_2=2\mu\pi_3$. Using these together with $\pi_1+\pi_2+\pi_3=1$ leads faster to the solution. With $\lambda=0.001$ and $\mu=0.01$, the values are $\pi_1=0.8264$, $\pi_2=0.1653$ and $\pi_3=0.0083$.
(d) The availability in case (i) is $\pi_1=0.8264$. The availability in case (ii) is $\pi_1+\pi_2=0.9917$.
(ii) (a) In state 0, both servers are damaged; state 1 refers to one server down and one operating, while in state 2, both servers are operating. The corresponding Markov chain is shown in Fig. C.5.
(b) The infinitesimal generator is

$$Q=\begin{pmatrix}-\mu_B & 0 & \mu_B\\ \lambda_F+\lambda_E & -(\lambda_F+\lambda_E+\mu) & \mu\\ \lambda_E & 2\lambda_H & -(\lambda_E+2\lambda_H)\end{pmatrix}$$
Here $\mu=\frac{1}{15}\ \mathrm{h}^{-1}=6.66\times 10^{-2}\ \mathrm{h}^{-1}$ and $\mu_B=\frac{1}{20}\ \mathrm{h}^{-1}$ are the repair rates, while $\lambda_F=7\times 10^{-4}\ \mathrm{h}^{-1}$, $\lambda_E=6\times 10^{-5}\ \mathrm{h}^{-1}$ and $\lambda_H=3\times 10^{-4}\ \mathrm{h}^{-1}$ are the failure rates.
(c) The steady-state vector follows from $\pi Q=0$ together with $\pi_0+\pi_1+\pi_2=1$ (one balance equation is replaced by the normalization). Solving the resulting linear set yields $\pi_2\approx 0.990$ and $\pi_1\approx 0.009$, so that

$$\pi_0=1-\pi_2-\pi_1=0.0013$$

(d) Theorem 10.2.3 states that the average lifetime of state $j$ is $E[\tau_j]=\frac{1}{q_j}$. This yields

$$E[\tau_0]=\frac{1}{q_0}=\frac{1}{\mu_B}=20\ \mathrm{h},\qquad
E[\tau_1]=\frac{1}{q_1}=\frac{1}{\lambda_F+\lambda_E+\mu}=14.9\ \mathrm{h},\qquad
E[\tau_2]=\frac{1}{q_2}=\frac{1}{\lambda_E+2\lambda_H}=1515\ \mathrm{h}$$

(e) A repair takes place when the system transfers from state 1 to 2. When the system jumps from state 0 to state 2, two repairs take place. The fraction of time during which both servers are damaged is $\pi_0$, and the fraction of time during which one server is operating is $\pi_1$. The rate of repairs is the rate of changing from state 1 to 2, plus two times the rate of changing from state 0 to state 2:

$$f_r=\pi_1 q_{12}+2\pi_0 q_{02}=\pi_1\mu+2\pi_0\mu_B=7.17\times 10^{-4}$$

If we denote by $X$ the random variable of the total number of failures over the period of one year, then the average value of $X$ is

$$E[X]=f_r\cdot 24\cdot 365=6.28$$
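The flux-balance computation for the two-processor chain of part (i) can be verified numerically. A sketch (mine, not from the text), with the values λ = 0.001 and μ = 0.01 from the solution above:

```python
# Steady state of the two-processor chain: flux balance gives
# 2*lam*pi1 = mu*pi2 and lam*pi2 = 2*mu*pi3, plus normalization.
lam, mu = 0.001, 0.01

pi1 = 1.0 / (1.0 + 2 * lam / mu + (lam / mu) ** 2)
pi2 = (2 * lam / mu) * pi1
pi3 = (lam / mu) ** 2 * pi1

# Closed forms: pi1 = mu^2/(lam+mu)^2, pi2 = 2*lam*mu/(lam+mu)^2,
# pi3 = lam^2/(lam+mu)^2.
print(round(pi1, 4), round(pi2, 4), round(pi3, 4))  # → 0.8264 0.1653 0.0083
```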
(a) With $\lambda_m=\lambda$ and $\mu_m=m\mu$, we first compute

$$\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}}=\prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu}=\frac{\rho^j}{j!}$$

with $\rho=\frac{\lambda}{\mu}$, such that

$$\pi_j=\frac{\frac{\rho^j}{j!}}{1+\sum_{j=1}^{\infty}\frac{\rho^j}{j!}}=\frac{\rho^j}{j!}\,e^{-\rho}\qquad(j\ge 0)$$

which demonstrates that the steady-state probability that the birth and death process is in state $j$ is Poisson distributed with mean $\rho$.
(b) Similarly, we first compute with $\lambda_m=\frac{\lambda}{m+1}$ and $\mu_m=\mu$,

$$\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}}=\prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu}=\frac{\rho^j}{j!}$$

which leads to precisely the same steady state as in (a). Indeed, the steady state is only a function of the ratios $\frac{\lambda_m}{\mu_{m+1}}$, which are the same in both (a) and (b).
(ii) All stations in slotted ALOHA operate independently, and each has probability $p_t=0.12$ to transmit in a timeslot. A station is successful in one slot with probability $p_s=p_t(1-p_t)^{N-1}$, where the number of stations is $N=8$. Thus, $p_s=0.049$. The waiting time $W$ to transmit one packet is a geometric random variable with parameter $p_s$, from which (Section 3.1.3) the mean is $E[W]=\frac{1}{p_s}$. Alternatively, $E[W]$ obeys the equation

$$E[W]=p_s+(1-p_s)\left(1+E[W]\right)$$

because the average waiting time equals 1 timeslot with probability $p_s$, plus 1 timeslot increased by the average waiting time with probability $1-p_s$. Solving that equation again yields $E[W]=\frac{1}{p_s}=20.39$ timeslots. The average transmission time for 7 packets is $7E[W]=142.7$ timeslots.
From the foregoing, $\lambda=E[N_x]$. Substituted into Little's law for the waiting time in the buffer, $E[N_Q]=\lambda E[w]$, and using $E[N_Q]=3.2$, this gives

$$E[w]=\frac{E[N_Q]}{\lambda}=\frac{3.2}{E[N_x]}$$
(ii) In an M/M/m/m queue, the number of busy servers equals the number (of packets) in the system $N_S$. From (14.16) and the definition (2.11), the average number of busy servers equals

$$E[N_S]=\sum_{j=0}^{m}j\Pr[N_S=j]=\frac{\sum_{j=1}^{m}\frac{\rho^j}{(j-1)!}}{\sum_{j=0}^{m}\frac{\rho^j}{j!}}$$

such that, with

$$\sum_{j=1}^{m}\frac{\rho^j}{(j-1)!}=\rho\sum_{j=0}^{m-1}\frac{\rho^j}{j!}=\rho\left[\sum_{j=0}^{m}\frac{\rho^j}{j!}-\frac{\rho^m}{m!}\right]$$

we obtain

$$E[N_S]=\rho\left[1-\frac{\frac{\rho^m}{m!}}{\sum_{j=0}^{m}\frac{\rho^j}{j!}}\right]=\rho\left(1-\Pr[N_S=m]\right)$$

For $m=2$ servers, the blocking probability is

$$\Pr[N_S=2]=\frac{\frac{r^2}{2}}{1+r+\frac{r^2}{2}}$$

Setting $\Pr[N_S=2]=P_B$ and solving for the load $r$ yields

$$r=\frac{P_B+\sqrt{2P_B-P_B^2}}{1-P_B}$$

For $P_B=\frac{1}{10}$, we have that $r=\frac{1+\sqrt{19}}{9}$.
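The Erlang B blocking probability used above is usually evaluated with its stable recursion rather than with factorials. A sketch (mine, not from the text):

```python
def erlang_b(m, rho):
    """Erlang B blocking probability via the stable recursion
    B(0) = 1, B(j) = rho*B(j-1) / (j + rho*B(j-1))."""
    b = 1.0
    for j in range(1, m + 1):
        b = rho * b / (j + rho * b)
    return b

# For m = 2 servers, B = (rho^2/2)/(1 + rho + rho^2/2); the load
# r = (1 + sqrt(19))/9 from the text indeed gives blocking 1/10.
r = (1 + 19 ** 0.5) / 9
print(round(erlang_b(2, r), 10))  # → 0.1
```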
(vi) The average system times $E[T]$ for the three different queueing systems are immediate. From (14.2), for system A, we have

$$E[T_A]=\frac{1}{k\mu(1-\rho)}$$

For each of the $k$ subqueues of system B, (14.2) gives

$$E[T_B]=\frac{1}{\mu(1-\rho)}$$

Clearly, $E[T_B]=kE[T_A]$ shows that replacing $k$ small systems by one larger system with the same processing capability decreases the average system time by a factor $k$.
From the relation $E[T_C]=f(k,\rho)E[T_A]$ with $f(k,\rho)=k(1-\rho)+\Pr[N_S\ge k]$, it is more complicated to decide where $f(k,\rho)$ is larger or smaller than 1. The extreme values of $f(k,\rho)$ are known: $f(k,0)=k$ and $f(k,1)=1$. Since $\frac{\partial f(k,\rho)}{\partial\rho}=-k+\frac{\partial\Pr[N_S\ge k]}{\partial\rho}$ and $\frac{\partial\Pr[N_S\ge k]}{\partial\rho}>0$, it cannot be concluded that $f(k,\rho)$ is monotonically decreasing from $k$ to 1, in which case we would have $f(k,\rho)>1$. Assuming $k$ real, we observe that $\frac{\partial f(k,\rho)}{\partial k}=(1-\rho)+\frac{\partial\Pr[N_S\ge k]}{\partial k}>0$ for all $\rho<1$, which implies that $f(2,\rho)<f(3,\rho)<\cdots$ and allows us to concentrate only on $f(2,\rho)$. Numerical results show that $f(2,\rho)<1$ if $\rho\ge 0.85$, but $f(3,\rho)\ge 1$. This leads to the conclusion that for $k>2$, system A always outperforms system C; only if $k=2$ and in the heavy-traffic regime $\rho\ge 0.85$ does system C lead to a slightly shorter average system time, of at most 1.7%. Hence, replacing $k>2$ processing units (servers) by one with the same processing capability always lowers the total time spent in the system. Of course, all conclusions only apply to systems that can be well modeled as M/M/m queueing systems. To first order, a computing device (processor) may be regarded as an M/M/1 queue. Then, the analysis shows that replacing an old processor by a $k$ times faster one is faster (on average) than installing $k$ old processors in parallel.
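The three systems can be compared numerically. The sketch below (mine, not from the text) evaluates the mean system times under the stated M/M/ assumptions, using the standard Erlang C formula for the waiting probability in the M/M/k queue; the function names are my own:

```python
from math import factorial

def erlang_c(k, a):
    """Erlang C formula: probability that all k servers are busy in an
    M/M/k queue with offered load a = lambda/mu (requires a < k)."""
    s = sum(a ** j / factorial(j) for j in range(k))
    tail = a ** k / factorial(k) * k / (k - a)
    return tail / (s + tail)

def mean_system_times(k, rho, mu=1.0):
    """System A: one server k times faster; system B: k separate M/M/1
    queues; system C: one M/M/k queue fed by the aggregate stream."""
    t_a = 1.0 / (k * mu * (1.0 - rho))
    t_b = 1.0 / (mu * (1.0 - rho))
    t_c = 1.0 / mu + erlang_c(k, k * rho) / (k * mu * (1.0 - rho))
    return t_a, t_b, t_c

for rho in (0.5, 0.85, 0.95):
    t_a, t_b, t_c = mean_system_times(2, rho)
    print(rho, round(t_b / t_a, 3), round(t_c / t_a, 3))
```

Here t_b/t_a equals k for every load, while t_c/t_a interpolates between k at light load and 1 in heavy traffic, illustrating the f(k, rho) comparison above.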
(vii) The waiting process for aeroplanes is modeled as an M/D/1 queue: the arrival process is Poisson with rate $\lambda=\frac{1}{10}$ arrivals/minute, there is a single queue (only one aeroplane can land at a time), and the service process (the landing) takes precisely $x=5$ minutes (constant service time). Thus, $E[x]=5$ minutes, $\mathrm{Var}[x]=0$ and $\rho=\frac{5}{10}$. Since the M/D/1 queue is a special case of the M/G/1 queue, we can apply the general formula (14.28) for the average waiting time in the queue of an M/G/1 system,

$$E[w]=\frac{\lambda E[x^2]}{2(1-\rho)}=\frac{\frac{1}{10}\cdot 5^2}{2\cdot\frac{1}{2}}=2.5\ \text{minutes}$$
(viii) (a) We know that the arrival intensity of new calls to the cell is $\lambda_f=20$ calls/min. Let $\lambda_h$ denote the arrival rate of the handover calls. The average time spent by a call in the cell is $E[T]=1.64$ minutes, and the average number of ongoing calls is $E[N]=52$. Furthermore, the blocking rate is $P_B=0.02$. The total arrival rate of calls that are carried by the base station is

$$\lambda_{\mathrm{carried}}=(1-P_B)\,\lambda_{\mathrm{offered}}=(1-P_B)(\lambda_f+\lambda_h)$$

Little's formula (13.21) states that $E[N]=\lambda_{\mathrm{carried}}E[T]$. Note that only the carried calls have an influence on the state of the system. We can solve the asked $\lambda_h$ from these two equations as

$$\lambda_h=\frac{E[N]}{E[T](1-P_B)}-\lambda_f=12.35\ \text{calls/minute}$$

(b) The arrival intensity of lost calls is $\lambda_{\mathrm{lost}}=P_B(\lambda_f+\lambda_h)=0.647$ calls/minute. If only the new calls are blocked, the asked blocking rate is

$$P_{B,f}=\frac{\lambda_{\mathrm{lost}}}{\lambda_f}=3.24\%$$
(b) Use in (14.25) the Laplace transform of an exponential random variable with mean $\frac{1}{\mu}$, given in (3.16),

$$\varphi_x(s)=\frac{\mu}{\mu+s}$$

One obtains

$$E\!\left[z^N\right]=\varphi_x\!\left(\lambda(1-z)\right)=\frac{\mu}{\lambda(1-z)+\mu}=\frac{\mu/(\lambda+\mu)}{1-\frac{\lambda}{\lambda+\mu}z}
=\left(1-\frac{\lambda}{\lambda+\mu}\right)\sum_{k=0}^{\infty}\left(\frac{\lambda}{\lambda+\mu}\right)^k z^k$$

so that the number of arrivals during a service time is geometrically distributed with parameter $\frac{\lambda}{\lambda+\mu}$.

[Birth and death diagram of the Engset model: from state $n$ ($0\le n<m$) the birth rate is $(s-n)\lambda$, and from state $n$ ($1\le n\le m$) the death rate is $n\mu$.]

The steady-state probabilities follow as

$$\pi_m=\frac{\prod_{k=1}^{m}\frac{(s-k+1)\lambda}{k\mu}}{\sum_{n=0}^{m}\prod_{k=1}^{n}\frac{(s-k+1)\lambda}{k\mu}}=\frac{\binom{s}{m}r^m}{\sum_{n=0}^{m}\binom{s}{n}r^n}$$

with $r=\frac{\lambda}{\mu}$.
The computation of the blocking probability is more complex than for the Erlang B formula, because the arrival process is not a Poisson process. Indeed, due to the finite number of customers $s$, the largest number of possible arrivals is finite, and the arrival rate depends on the state. Hence, the PASTA property cannot be applied. For a small time interval $\Delta t$, the blocking probability $P_E$ equals the ratio of $p_b(\Delta t)$, the probability of blocking in $\Delta t$, over $p_a(\Delta t)$, the probability of an arrival in $\Delta t$. Since the arrival rates depend on the state, the probability of an arrival in $\Delta t$ is not equal to $\lambda\Delta t$ as for the Erlang B model. Instead, we have

$$p_a(\Delta t)=\Delta t\,\lambda\sum_{n=0}^{m}(s-n)\Pr[N_S=n]=\Delta t\,\lambda\,\frac{\sum_{n=0}^{m}(s-n)\binom{s}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n}
=s\lambda\Delta t\,\frac{\sum_{n=0}^{m}\binom{s-1}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n}$$

using $(s-n)\binom{s}{n}=s\binom{s-1}{n}$. Furthermore, blocking is only caused if $N_S=m$ and if at least one of the $s-m$ customers of the still-demanding group generates an arrival. However, since the interval $\Delta t$ can be made arbitrarily small¹, the generation of more than one arrival has probability $o(\Delta t)$, such that it suffices to consider only one call attempt. Hence,

$$p_b(\Delta t)=\Delta t\,\lambda(s-m)\Pr[N_S=m]=s\lambda\Delta t\,\frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s}{n}r^n}$$

The Engset call blocking probability $P_E=\frac{p_b(\Delta t)}{p_a(\Delta t)}$ becomes

$$P_E=\frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s-1}{n}r^n}\qquad(\mathrm{C.3})$$

¹ Similar arguments are used in Chapter 7 when studying the Poisson process.
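Formula (C.3) is a one-liner to evaluate. A sketch (mine, not from the text; the function name and example values are my own):

```python
from math import comb

def engset(m, s, r):
    """Engset blocking probability (C.3): binom(s-1, m) * r**m over
    sum_{n=0}^{m} binom(s-1, n) * r**n, for m channels, s sources,
    and per-source load r = lambda/mu."""
    num = comb(s - 1, m) * r ** m
    den = sum(comb(s - 1, n) * r ** n for n in range(m + 1))
    return num / den

# Illustration with m = 5 channels, s = 10 sources, r = 0.2.
# As s grows with s*r held fixed, Engset approaches Erlang B.
print(round(engset(5, 10, 0.2), 4))  # → 0.0078
```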
The cell loss ratio follows as

$$\mathrm{clr}(\rho)\approx\frac{\left(1-\alpha^{-1}\right)\alpha^{-K}}{1-\alpha^{-K-1}}$$

For sufficiently high loads $\rho>0.8$, we use the approximation $\alpha\approx\rho^{-2}$ of Section 5.7 to obtain

$$\mathrm{clr}_{\mathrm{M/D/1/K}}\simeq\frac{(1-\rho)\rho^{2K}}{1-\rho^{2K+1}}\qquad(\mathrm{C.4})$$

Comparing with (14.20) in the M/M/1/K queue,

$$\mathrm{clr}_{\mathrm{M/M/1/K}}\simeq\frac{(1-\rho)\rho^{K}}{1-\rho^{K+1}}$$

the M-server (in continuous time) needs approximately twice as many buffer places to guarantee the same cell loss ratio as the corresponding D-server (in discrete time). Further combining (14.3) and (14.38) shows that

$$\rho_{\mathrm{M/M/1}}E\!\left[w_{\mathrm{M/M/1}}\right]=2\rho_{\mathrm{M/D/1}}E\!\left[w_{\mathrm{M/D/1}}\right]$$

or: the average waiting time in the queue (normalized to the average service time) for the M/M/1 queue is exactly twice as long as for the M/D/1 queue. The variability of the service in the M-server causes these rather large differences in performance. Furthermore, the simple formula (C.4) is particularly useful to engineer ATM buffers or to dimension simple queueing networks. If the number of individual flows that constitute the aggregate flow is large enough and none of the individual flows is dominant, the aggregate arrival process is quite well approximated by a Poisson process. Given as a QoS requirement a stringent cell loss ratio $\mathrm{clr}^*$, the input flow can be limited such that $\mathrm{clr}_{\mathrm{M/D/1/K}}<\mathrm{clr}^*$. Alternatively, the buffer size $K$ can be derived from (C.4) subject to $\mathrm{clr}_{\mathrm{M/D/1/K}}=\mathrm{clr}^*$ for an aggregate Poisson input flow $\rho=0.9$. As long as the input flow is limited to $\rho<0.9$, the buffer size $K$ thus found always guarantees a cell loss ratio below $\mathrm{clr}^*$, provided the input flow can be approximated as a Poisson arrival process.
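The buffer dimensioning described above reduces to inverting (C.4) for K. A sketch (mine, not from the text; the target value 1e-6 is an illustrative QoS requirement):

```python
def clr_md1k(rho, K):
    """Approximate cell loss ratio of the M/D/1/K queue, formula (C.4):
    (1 - rho) * rho**(2K) / (1 - rho**(2K + 1))."""
    return (1 - rho) * rho ** (2 * K) / (1 - rho ** (2 * K + 1))

def min_buffer(rho, target):
    """Smallest buffer size K whose cell loss ratio is below the target."""
    K = 1
    while clr_md1k(rho, K) > target:
        K += 1
    return K

# Dimensioning example: a cell loss ratio below 1e-6 at load 0.9.
print(min_buffer(0.9, 1e-6))  # → 55
```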
(i) The generating function of the distance between two uniformly chosen coordinates in one dimension is

$$\sum_{k=0}^{Z-1}\Pr[|x_A-x_B|=k]\,x^k=\frac{Z-Zx^2+2x\left(x^Z-1\right)}{Z^2(x-1)^2}$$

Since the nodes are uniformly chosen, all coordinate dimensions are independent, and the generating function of the hopcount of the shortest path in a $d$-lattice is² $\varphi_Z^d(x)$. From (2.26) and (2.27), the average number of hops is immediate as

$$E[h_N]=\frac{d}{3Z}\left(Z^2-1\right)\simeq\frac{d}{3}\,N^{1/d}$$

and

$$\mathrm{Var}[h_N]=\frac{d\left(Z^2-1\right)\left(Z^2+2\right)}{18Z^2}\simeq\frac{d}{18}\,N^{2/d}$$

both increasing in $d>1$ (for constant $N$) as well as in $N$ (for constant $d$). For a two-dimensional lattice, the average hopcount scales as $O\!\left(\sqrt{N}\right)$.

² More generally, $\prod_{j=1}^{d}\varphi_{Z_j}(x)$ for side lengths $Z_j$.
(ii) Using the definition (15.6) of the clustering coefficient and applying the law of total probability (2.46) yields

$$\Pr\!\left[c_{G_p(N)}\le x\right]=\sum_{k=0}^{N-1}\Pr\!\left[\frac{2y}{d_v(d_v-1)}\le x\,\Big|\,d_v=k\right]\Pr[d_v=k]$$

The degree distribution $\Pr[d_v=k]$ in the random graph is given by (15.11), and

$$\Pr\!\left[\frac{2y}{d_v(d_v-1)}\le x\,\Big|\,d_v=k\right]=\Pr\!\left[y\le\binom{k}{2}x\,\Big|\,d_v=k\right]
=\sum_{j=0}^{\left\lfloor\binom{k}{2}x\right\rfloor}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}$$

because $y$ is the number of links between the $d_v=k$ neighbors of $v$, which is binomially distributed with parameter $p$. Combined, this gives

$$\Pr\!\left[c_{G_p(N)}\le x\right]=\sum_{k=0}^{N-1}\binom{N-1}{k}p^k(1-p)^{N-1-k}
\sum_{j=0}^{\left\lfloor\binom{k}{2}x\right\rfloor}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}$$

The average $E\!\left[c_{G_p(N)}\right]$ is computed via (2.35) as

$$E\!\left[c_{G_p(N)}\right]=\int_0^1\Pr\!\left[c_{G_p(N)}>x\right]dx
=\sum_{k=0}^{N-1}\binom{N-1}{k}p^k(1-p)^{N-1-k}
\int_0^1\sum_{j=\left\lfloor\binom{k}{2}x\right\rfloor+1}^{\binom{k}{2}}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}\,dx$$

Let $t=\binom{k}{2}x$; then

$$\int_0^1\sum_{j=\left\lfloor\binom{k}{2}x\right\rfloor+1}^{\binom{k}{2}}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}\,dx
=\frac{1}{\binom{k}{2}}\sum_{t=0}^{\binom{k}{2}-1}\sum_{j=t+1}^{\binom{k}{2}}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}
=\frac{1}{\binom{k}{2}}\sum_{j=1}^{\binom{k}{2}}j\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j}=p$$

where the last step uses the mean of the binomial distribution with $\binom{k}{2}$ trials. Hence, we find that $E\!\left[c_{G_p(N)}\right]=p$. Along the same lines, we find that the generating function $\varphi_c(z)$ of the clustering coefficient $c_{G_p(N)}$ is

$$\varphi_c(z)=\sum_{k=0}^{N-1}\binom{N-1}{k}p^k(1-p)^{N-1-k}\left(1-p+p\,e^{z/\binom{k}{2}}\right)^{\binom{k}{2}}$$
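The identity $E[c_{G_p(N)}]=p$ can be checked by sampling. A sketch (mine, not from the text), generating only the neighborhood of a single node of $G_p(N)$:

```python
import random

random.seed(3)
N, p, trials = 60, 0.3, 200

def clustering_of_node(N, p):
    """Sample the clustering coefficient c of one node of G_p(N): draw its
    degree d, then count the y links among the binom(d, 2) neighbor pairs."""
    d = sum(1 for _ in range(N - 1) if random.random() < p)
    if d < 2:
        return None  # c is undefined for degree < 2
    pairs = d * (d - 1) // 2
    y = sum(1 for _ in range(pairs) if random.random() < p)
    return 2 * y / (d * (d - 1))

vals = [c for c in (clustering_of_node(N, p) for _ in range(trials))
        if c is not None]
# E[c_{G_p(N)}] = p, so the sample mean should be close to 0.3.
print(round(sum(vals) / len(vals), 2))
```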
(iii) The probability $\Pr[H_N=2]$ is determined by the intersection of two independent events. First, there is no direct link between node $A$ and $B$; this event has probability $1-p$. Second, there is at least one path with two hops. All $N-2$ possible two-hop paths between $A$ and $B$ have the structure $(A\to j)(j\to B)$, and they have no links in common, i.e. they are mutually independent and independent of the direct link. The probability of the second event equals $1-S_2$, where $S_2$ is the probability that there is no path with two hops. Hence, we have that $\Pr[H_N=2]=(1-p)(1-S_2)$, and it remains to compute $S_2$. The event of no path with two hops is

$$\left\{\bigcup_{j=1}^{N-2}1_{(A\to j)(j\to B)}\right\}^c=\bigcap_{j=1}^{N-2}\left\{1_{(A\to j)(j\to B)}\right\}^c$$

such that

$$S_2=\Pr\!\left[\bigcap_{j=1}^{N-2}\left\{1_{(A\to j)(j\to B)}\right\}^c\right]
=\prod_{j=1}^{N-2}\left(1-\Pr\!\left[1_{(A\to j)(j\to B)}=1\right]\right)=\left(1-p^2\right)^{N-2}$$
Fig. C.7. The relative error of the simulations of the hopcount in the complete graph with exponential link weights versus the hopcount $k$, for $10^4$, $10^5$ and $10^6$ iterations (the companion panel shows $\Pr[H_{50}=k]$ against the exact pdf).

The mean $E[r_n]$ and the standard deviation $\sigma[r_n]$ of the relative error for $n$ iterations versus the hops $k$ are

$$E[r_{10^4}]=0.12\qquad E[r_{10^5}]=0.047\qquad E[r_{10^6}]=0.017$$
$$\sigma[r_{10^4}]=0.17\qquad \sigma[r_{10^5}]=0.073\qquad \sigma[r_{10^6}]=0.02$$

where the range of $k$ values has been limited for $n=10^4$ to 10 hops, for $n=10^5$ to 11 hops, and for $n=10^6$ to 12 hops. For larger hopcounts, the simulations return zeros, because the tail probability $\Pr[H_N>k]$ decreases as $O(1/k!)$ and simulating such a rare event requires on average at least $\left(\Pr[H_N=k]\right)^{-1}$ simulations. The table roughly shows that the average error over the non-zero returned values decreases as $O\!\left(\frac{1}{\sqrt{n}}\right)$, which is in agreement with the Central Limit Theorem 6.3.1. Each iteration of the simulation can be regarded as an independent trial, and the histogram sums in a particular way the number of these trials.
(ii) Using (2.43), we have
2
2
0
Var [ZQ ] = *00
= *00
ZQ (0) 3 *ZQ (0)
ZQ (0) 3 (H [ZQ ])
$2
#
Q
31
Q
31
n
[
[
g2 \ q(Q 3 q)
1
1
1
=
3
Q 3 1 n=1 g} 2 q=1 } + q(Q 3 q)
Q 3 1 q=1 q
}=0
n
\
q(Q 3 q)
}
+
q(Q 3 q)
q=1
g log j(})
.
g}
The second
n
n
[
1
g log j (})
q(Q 3 q)
g [
log
=
=3
g}
g} q=1
} + q(Q 3 q)
}
+
q(Q
3 q)
q=1
n
[
1
g2 log j (})
=
2
g} 2
q=1 (} + q(Q 3 q))
n
[
q=1
$2
1
q(Q 3 q)
Q
31 [
n
[
1
1
+
3
Q 3 1 n=1 q=1 q2 (Q 3 q)2
# SQ 31
1
q=1 q
$2
Q 31
(C.5)
n
[
q=1
and, with
Q
31
[
n=1
$2
1
q(Q 3 q)
Q
31
[
n=1 q=1
SQ 31 Sn
n
[
q=1
n=q
1
m=1 m(Q 3m)
$2
1
q(Q 3 q)
n
[
Q
31
n
[
[
1
1
=
q(Q 3 q) m=1 m(Q 3 m)
q=1
SQ 31 Sn
n=1
1
m=1 m(Q 3m)
SQ 31 Sn
n=q
1
m=1 m(Q 3m)
q(Q 3 q)
Sq31 Sn
n=1
1
m=1 m(Q 3m) ,
3
4
Q
31 [
q31
n
n
[
[[
1
1
1
C
D
3
=
q(Q 3 q) n=1 m=1 m(Q 3 m) n=1 m=1 m(Q 3 m)
q=1
4
3
Q
31
Q
31
Q
31
q31
q31
[
[
[
[
[
1
1
1
C
13
1D
=
q(Q 3 q) m=1 m(Q 3 m) n=m
m(Q 3 m) n=m
q=1
m=1
Q
31
[
Q
31
[
q=1
Q
31
Q
31
q31
[
[
[
1
1
1
1
3
q(Q 3 q) m=1 m
(Q
3
q)
m(Q
3 m)
q=1
m=1
Q
31
[
q=1
q31
[
1
1
q(Q 3 q) m=1 (Q 3 m)
Sn
1
q=1 q(Q 3q)
Sn
1
q=1 q
1
Q
1
Q
SQ 31
1
q=Q 3n q ,
517
we have
q31
Q 31
q31
Q 31
[
[ 1
1
1
1
1
1 [
1 [
=
+
(Q 3 q) m=1 m(Q 3 m)
Q q=1 (Q 3 q) n=1 n
Q q=1 (Q 3 q)
1
Q
Q
31
[
1
m
m=1
Q[
3m31
n=1
1
1
+
n
Q
Q
31
[
1
m
m=1
Q
31
[
n=m+1
Q
31
[
n=Q 3q+1
1
n
1
n
and
Q
31
[
q=1
q31
q31
Q 31
[
[
1
1
1
1 [ 1
1
=
+
q(Q 3 q) m=1 (Q 3 m)
Q q=1 q
Q 3 q m=1 (Q 3 m)
=
Q 31
1 [ 1
Q m=1 m
Q
31
[
n=Q 3m+1
Q 31
Q 31
1
1 [ 1 [ 1
+
n
Q m=1 m n=m+1 n
Hence,
\[
\sum_{k=1}^{N-1}\left(\sum_{j=1}^{k}\frac{1}{j(N-j)}\right)^{2} = \frac{2}{N}\left(\sum_{q=1}^{N-1}\frac{1}{q}\right)^{2}-\frac{1}{N}\sum_{m=1}^{N-1}\frac{1}{m}\left(\sum_{n=1}^{N-m-1}\frac{1}{n}-\sum_{n=N-m+1}^{N-1}\frac{1}{n}\right)
\]
\[
= \frac{2}{N}\left(\sum_{q=1}^{N-1}\frac{1}{q}\right)^{2}-\frac{1}{N}\sum_{m=1}^{N-1}\frac{1}{m}\left(\sum_{n=1}^{N-1}\frac{1}{n}+\frac{1}{N-m}-2\sum_{n=N-m}^{N-1}\frac{1}{n}\right)
\]
\[
= \frac{1}{N}\left(\sum_{q=1}^{N-1}\frac{1}{q}\right)^{2}+\frac{2}{N^{2}}\sum_{m=1}^{N-1}\frac{1}{m}+\frac{2}{N}\sum_{m=1}^{N-1}\frac{1}{m}\sum_{n=N-m+1}^{N-1}\frac{1}{n}
\]
Further,
\[
\sum_{k=1}^{N-1}\sum_{q=1}^{k}\frac{1}{q^{2}(N-q)^{2}} = \sum_{q=1}^{N-1}\frac{1}{q^{2}(N-q)^{2}}\sum_{k=q}^{N-1}1 = \sum_{q=1}^{N-1}\frac{N-q}{q^{2}(N-q)^{2}} = \sum_{q=1}^{N-1}\frac{1}{q^{2}(N-q)}
\]
and, since $\frac{1}{q^{2}(N-q)}=\frac{1}{Nq^{2}}+\frac{1}{N^{2}q}+\frac{1}{N^{2}(N-q)}$,
\[
\sum_{q=1}^{N-1}\frac{1}{q^{2}(N-q)} = \frac{1}{N}\sum_{q=1}^{N-1}\frac{1}{q^{2}}+\frac{2}{N^{2}}\sum_{q=1}^{N-1}\frac{1}{q}
\]
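The partial-fraction reductions used above are exact for every $N$ and can be verified in rational arithmetic; a quick sketch with an illustrative helper name:

```python
from fractions import Fraction

def check_identities(N):
    """Exact checks of the two partial-fraction reductions."""
    # sum_q 1/(q^2 (N-q)) == (1/N) sum 1/q^2 + (2/N^2) sum 1/q
    lhs = sum(Fraction(1, q * q * (N - q)) for q in range(1, N))
    rhs = (sum(Fraction(1, q * q) for q in range(1, N)) / N
           + 2 * sum(Fraction(1, q) for q in range(1, N)) / N ** 2)
    # sum_q 1/(q(N-q)) == (2/N) sum 1/q
    lhs2 = sum(Fraction(1, q * (N - q)) for q in range(1, N))
    rhs2 = 2 * sum(Fraction(1, q) for q in range(1, N)) / N
    return lhs == rhs and lhs2 == rhs2
```

Because `Fraction` arithmetic is exact, equality here is a proof for each tested $N$, not a floating-point approximation.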
Combined,
\[
\operatorname{Var}[W_N] = -\frac{\left(\sum_{q=1}^{N-1}\frac{1}{q}\right)^{2}}{N(N-1)^{2}}+\frac{4\sum_{q=1}^{N-1}\frac{1}{q}}{(N-1)N^{2}}+\frac{2}{N(N-1)}\sum_{m=1}^{N-1}\frac{1}{m}\sum_{n=N-m+1}^{N-1}\frac{1}{n}+\frac{\sum_{q=1}^{N-1}\frac{1}{q^{2}}}{N(N-1)}
\]
Using the substitution $m \to N-m$,
\[
\sum_{m=1}^{N-1}\frac{1}{m}\sum_{n=N-m+1}^{N-1}\frac{1}{n} = \sum_{m=1}^{N-1}\frac{1}{N-m}\sum_{n=m+1}^{N-1}\frac{1}{n} = \sum_{m=1}^{N-1}\frac{1}{N-m}\left(\sum_{n=m}^{N-1}\frac{1}{n}-\frac{1}{m}\right) = \sum_{m=1}^{N-1}\frac{1}{N-m}\sum_{n=m}^{N-1}\frac{1}{n}-\frac{2}{N}\sum_{q=1}^{N-1}\frac{1}{q}
\]
(iii) First, the asymptotic form of the Laplace transform
\[
\varphi_{W_N}(z) = \frac{1}{N-1}\sum_{k=1}^{N-1}\prod_{j=1}^{k}\frac{j(N-j)}{z+j(N-j)}
\]
will be derived, from which the distribution then follows by taking the inverse Laplace transform. Since
\[
z+j(N-j) = \left(\sqrt{\left(\tfrac{N}{2}\right)^{2}+z}+\tfrac{N}{2}-j\right)\left(\sqrt{\left(\tfrac{N}{2}\right)^{2}+z}-\tfrac{N}{2}+j\right)
\]
we have, with $y=\sqrt{\left(\frac{N}{2}\right)^{2}+z}$,
\[
\prod_{j=1}^{k}\frac{j(N-j)}{z+j(N-j)} = \frac{k!\,(N-1)!}{(N-k-1)!}\prod_{j=1}^{k}\frac{1}{y+\frac{N}{2}-j}\,\prod_{j=1}^{k}\frac{1}{y-\frac{N}{2}+j}
\]
Thus,
\[
\varphi_{W_N}(z) = (N-2)!\,\frac{\Gamma\!\left(y-\frac{N}{2}+1\right)}{\Gamma\!\left(y+\frac{N}{2}\right)}\sum_{k=1}^{N-1}\frac{\Gamma(k+1)\,\Gamma\!\left(y+\frac{N}{2}-k\right)}{\Gamma(N-k)\,\Gamma\!\left(y-\frac{N}{2}+k+1\right)}
\]
For even $N = 2M$, the sum in $\varphi_{W_N}(z)$ splits as
\[
\sum_{k=1}^{2M-1}\frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)} = \sum_{k=1}^{M}\frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)}+\sum_{k=M+1}^{2M-1}\frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)}
\]
With the substitutions $m=M-k$ in the first sum and $m=k-M$ in the second,
\[
= \sum_{m=0}^{M-1}\frac{\Gamma(M-m+1)\,\Gamma(y+m)}{\Gamma(M+m)\,\Gamma(y-m+1)}+\sum_{m=1}^{M-1}\frac{\Gamma(M+m+1)\,\Gamma(y-m)}{\Gamma(M-m)\,\Gamma(y+m+1)} = \sum_{m=-(M-1)}^{M-1}\frac{\Gamma(y+m)\,\Gamma(M-m+1)}{\Gamma(M+m)\,\Gamma(y-m+1)}
\]
and
\[
\varphi_{W_{2M}}(z) = \frac{(2M-1)!}{2M-1}\,\frac{\Gamma(y-M+1)}{\Gamma(y+M)}\sum_{m=-(M-1)}^{M-1}\frac{\Gamma(y+m)\,\Gamma(M-m+1)}{\Gamma(M+m)\,\Gamma(y-m+1)}
\]
For large $M$, since $y = \sqrt{M^{2}+z}\approx M+\frac{z}{2M}$,
\[
(2M-1)!\,\frac{\Gamma(y-M+1)}{\Gamma(y+M)} \approx \frac{\Gamma(2M)\,\Gamma\!\left(\frac{z}{2M}+1\right)}{\Gamma\!\left(2M+\frac{z}{2M}\right)} \approx (2M)^{-\frac{z}{2M}}\,\Gamma\!\left(\frac{z}{2M}+1\right)
\]
which suggests that we consider $z \to 2Mz$, since then $y \approx M+z$ and, using (Abramowitz and Stegun, 1968, Section 6.1.47),
\[
\varphi_{W_{2M}}(2Mz) \approx (2M)^{-z}\,\Gamma(z+1)\,\frac{1}{2M-1}\sum_{m=-(M-1)}^{M-1}\left(1+O\!\left(\frac{1}{M}\right)\right) \approx (2M)^{-z}\,\Gamma(z+1)
\]
Hence,
\[
\lim_{N\to\infty} N^{z}\,\varphi_{W_N}(Nz) = \Gamma(z+1) \tag{C.6}
\]
or, equivalently, since $N^{z}\varphi_{W_N}(Nz)=E\!\left[e^{-(NW_N-\log N)z}\right]$,
\[
\lim_{N\to\infty} E\!\left[e^{-(NW_N-\log N)z}\right] = \Gamma(z+1) \tag{C.7}
\]
The goodness of this asymptotic distribution (C.7) for finite $N$ is illustrated in Fig. C.8. Observe from Fig. C.8 that $f_{W_N}(0) = 1$, while the finite-$N$ form of (C.7) yields the value $N^{2}e^{-N} \approx 0$ at the origin. Since $\varphi_{W_N}(z) = \int_{0}^{\infty}e^{-zt}f_{W_N}(t)\,dt$ is a single-sided Laplace transform, integrating by parts yields $z\varphi_{W_N}(z) = f_{W_N}(0)+\int_{0}^{\infty}e^{-zt}f'_{W_N}(t)\,dt$, provided $f'_{W_N}(t)$ exists for all $t \geq 0$. Hence, we find a well-known limit criterion of single-sided Laplace transforms,
\[
f_{W_N}(0) = \lim_{z\to\infty} z\,\varphi_{W_N}(z) \tag{C.8}
\]
Applied to (16.17), this leads to $f_{W_N}(0) = 1$ for all finite $N$; applied to scaled link weights with mean $a^{-1}$, for which $\varphi_{W_N;a}(z) = \varphi_{W_N}(z/a)$, it gives $f_{W_N;a}(0) = a$. The interpretation of this property is related to the choice of the link weights. The shortest
Fig. C.8. The pdf $f_{W_N}(x)$ of the weight of the shortest path for $N = 50$, $100$ and $200$. Each simulation consists of $10^6$ iterations. The bold curves represent the finite-$N$ equivalent (C.7) of the asymptotic result.
path includes, almost surely, the smallest link weights. For a regular link-weight distribution, $F_w(x) = f_w(0)\,x + O(x^{2})$ since $F_w(0) = 0$; both the exponential distribution (with parameter 1) and the uniform distribution are regular with $f_w(0) = 1$. Since the smallest values of the weight of the shortest path $W_N$ in $K_N$ occur a.s. for direct links, the distribution of $W_N$ around zero is dominated by the distribution of the link weight $w$ around zero. The contribution cannot be due to a $k$-hop shortest path with $k > 1$, since such a path consists of the sum of $k$ exponentials, which has a probability density around $x = 0$ of the form $O(x^{k-1})$. Indeed, see (3.24), or apply (C.8) to the Laplace transform of a sum of $k$ independent exponentials, $\varphi_{S_k}(z) = \prod_{j=1}^{k}\frac{\lambda_j}{z+\lambda_j}$.
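The limit criterion (C.8) is easy to exercise numerically: for an exponential link weight with rate $\lambda$ it recovers $f(0)=\lambda$, while for a sum of $k$ exponentials it gives $0$, consistent with the $O(x^{k-1})$ density at the origin. A minimal sketch; the rates $2.5$ and $1,2,3$ are arbitrary illustrative choices:

```python
def phi_exp(z, lam):
    """Laplace transform of an Exp(lam) density."""
    return lam / (z + lam)

def phi_sum3(z):
    """Laplace transform of a sum of three independent exponentials
    with (illustrative) rates 1, 2, 3."""
    p = 1.0
    for lam in (1.0, 2.0, 3.0):
        p *= lam / (z + lam)
    return p

# f(0) = lim_{z -> inf} z * phi(z); approximate the limit with a large z.
big = 1e9
f0 = big * phi_exp(big, 2.5)   # close to 2.5 = f(0) for Exp(2.5)
f0_sum = big * phi_sum3(big)   # close to 0: the density is O(x^2) at 0
```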
(iv) When the intermediate nodes of the shortest path between a source $A$ and a destination $B$ are removed from $K_N$, we obtain again a complete graph, now with $N - h_N + 1$ nodes. The resulting graph contains link weights that are no longer perfectly exponentially distributed, nor are they perfectly independent, because we have removed a special set of nodes rather than a random one. But, since we have removed at each node of the shortest path, apart from the shortest link, $N - 3$ other links, we assume that the dependence between $K_N$ and the reduced graph is negligibly small. Under these assumptions, the shortest node-disjoint path in $K_N$ is a shortest path in $K_{N-h_N+1}$ with exponential link weights with mean 1. The distribution of the hopcount $h^{nd}_N$ of that shortest node-disjoint path is
\[
\Pr\!\left[h^{nd}_N = k\right] \approx \sum_{m=1}^{N-1}\Pr\!\left[h_{N-m+1} = k \mid h_N = m\right]\Pr\left[h_N = m\right]
\]
The hopcount $h_N$ of the shortest path in the complete graph $K_N$ with independent exponential link weights with mean 1 is given in (16.8). With the assumption that
\[
\Pr\!\left[h_{N-m+1} = k \mid h_N = m\right] = \Pr\!\left[h_{N-m+1} = k\right]
\]
Fig. C.9. The pdf $\Pr[h_N = k]$ of the hopcount of the shortest path (thin lines) and of the shortest node-disjoint path (bold lines) for $N = 50$, $100$ and $200$.
we obtain
\[
\Pr\!\left[h^{nd}_N = k\right] \approx \frac{(-1)^{k+1}}{N!}\sum_{m=0}^{N-1}\frac{S^{(k+1)}_{N-m+1}\,S^{(m+1)}_{N}}{(N-m+1)!} \approx \frac{1}{N\,k!}\sum_{m=0}^{N-1}\frac{\left(\log(N-m+1)\right)^{k}}{N-m+1}\,\frac{(\log N)^{m}}{m!}
\]
and, with $\frac{1}{N-m+1}\approx\frac{1}{N}$ and $\log(N-m+1)\approx\log N$ for the dominant terms $m = O(\log N)$, together with $\sum_{m=0}^{N-1}\frac{(\log N)^{m}}{m!}\approx e^{\log N}=N$,
\[
\Pr\!\left[h^{nd}_N = k\right] \approx \frac{1}{k!}\left(\frac{(\log N)^{k}}{N}+O\!\left(\frac{(\log N)^{k-1}}{N^{2}}\right)\right) \approx \Pr\left[h_N = k\right]
\]
For large $N$, we therefore expect that the hopcount of the shortest path and that of the shortest node-disjoint path have approximately the same distribution. The validity of the assumption is illustrated in Fig. C.9 for the relatively small values $N = 50$, $100$ and $200$. Each simulation consisted of $10^6$ iterations. The corresponding weights of the shortest and of the node-disjoint shortest path are drawn in Fig. C.10. The weight of the node-disjoint shortest path is evidently always larger than that of the shortest path in the same graph. Nevertheless, for large $N$, the simulations suggest that both pdfs tend to each other.
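The hopcount law used above, $\Pr[h_N=k]=\left|S_N^{(k+1)}\right|/N!$ with $S_N^{(j)}$ the Stirling numbers of the first kind, can be checked against two exact Stirling-number identities: the probabilities sum to one, and the mean equals $\sum_{i=1}^{N}\frac{1}{i}-1$. The recurrence-based table below is standard; treating this law as the book's (16.8) is an assumption of this sketch.

```python
import math

def unsigned_stirling1(N):
    """Table c[n][k] = |S_n^(k)| via c(n+1,k) = c(n,k-1) + n*c(n,k)."""
    c = [[0] * (N + 1) for _ in range(N + 1)]
    c[0][0] = 1
    for n in range(N):
        for k in range(1, n + 2):
            c[n + 1][k] = c[n][k - 1] + n * c[n][k]
    return c

N = 30
c = unsigned_stirling1(N)
p = [c[N][k + 1] / math.factorial(N) for k in range(N)]

# Normalization: sum_j |S_N^(j)| = N!, so the p[k] sum to 1.
norm = sum(p)
# Mean: h_N + 1 is distributed as the cycle count of a random permutation,
# whose mean is the harmonic number sum_{i<=N} 1/i; hence E[h_N] = H_N - 1.
mean = sum(k * p[k] for k in range(N))
harmonic = sum(1.0 / i for i in range(1, N + 1))
```

The mean $\approx \log N$ matches the $(\log N)^{k}/(N\,k!)$ asymptotic used above.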
Fig. C.10. The pdf $f_{W}(x)$ of the weight of the shortest path (thin lines) and of the node-disjoint shortest path (bold lines) for $N = 50$, $100$ and $200$.
With $D$ the depth of the $k$-ary tree,
\[
g_{N,k}(2) = N-1-\sum_{j=0}^{D-1}k^{D-j}\,\frac{\left(N-1-\frac{k^{j+1}-1}{k-1}\right)\left(N-2-\frac{k^{j+1}-1}{k-1}\right)}{(N-1)(N-2)}
\]
\[
= \frac{(2N-3)DN}{(N-1)(N-2)}+\frac{(2N-3)D}{(N-1)(N-2)(k-1)}-\frac{2N-3}{(N-2)(k-1)}-\frac{N(N-1-2D)}{(N-1)(N-2)(k-1)}-\frac{N-1-2D}{(N-1)(N-2)(k-1)^{2}}-\frac{1}{(N-2)(k-1)^{2}}
\]
\[
= 2D-\frac{3}{k-1}+O\!\left(\frac{\log_{k}N}{N}\right)
\]
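The expansion above can be verified exactly in rational arithmetic; the helper names are illustrative, and the subtree-sum form of $g_{N,k}(2)$ is as reconstructed above.

```python
from fractions import Fraction

def g2_sum(N, k, D):
    """g_{N,k}(2) from the subtree sum (exact, rational arithmetic)."""
    total = Fraction(0)
    for j in range(D):
        s = Fraction(k ** (j + 1) - 1, k - 1)
        total += k ** (D - j) * (N - 1 - s) * (N - 2 - s) / ((N - 1) * (N - 2))
    return N - 1 - total

def g2_closed(N, k, D):
    """The expanded closed form, term for term as displayed above."""
    return (Fraction((2 * N - 3) * D * N, (N - 1) * (N - 2))
            + Fraction((2 * N - 3) * D, (N - 1) * (N - 2) * (k - 1))
            - Fraction(2 * N - 3, (N - 2) * (k - 1))
            - Fraction(N * (N - 1 - 2 * D), (N - 1) * (N - 2) * (k - 1))
            - Fraction(N - 1 - 2 * D, (N - 1) * (N - 2) * (k - 1) ** 2)
            - Fraction(1, (N - 2) * (k - 1) ** 2))
```

For a full $k$-ary tree, take $N=(k^{D+1}-1)/(k-1)$; both forms then agree exactly, and the value approaches $2D-\frac{3}{k-1}$ for large $D$.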
the effective power exponent $\tau(N)$, as defined in (17.32), equals for the $k$-ary tree and large $N$,
\[
\tau(N) \approx \frac{1}{\log 2}\,\log\frac{2D-\frac{3}{k-1}}{D-\frac{1}{k-1}} = 1+\log_{2}\!\left(1-\frac{1}{2(k-1)\left(D-\frac{1}{k-1}\right)}\right)
\]
and, with $D \approx \log_{k}N+\log_{k}\!\left(1-\frac{1}{k}\right)$,
\[
\tau(N) \approx 1+\log_{2}\!\left(1-\frac{1}{2(k-1)\left[\log_{k}N+\log_{k}\!\left(1-\frac{1}{k}\right)-\frac{1}{k-1}\right]}\right) \approx 1+\log_{2}\!\left(1-\frac{1}{2(k-1)E[H_N]}\right) \approx 1-\frac{1}{(\log 4)(k-1)E[H_N]}
\]
which shows, for large $N$, that $\tau(N) < 1$, but that $\tau(N) \to 1$ as $k \to \infty$.
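The last approximation step, $\log_{2}(1-x)\approx -x/\log 2$ with $1/(2\log 2)=1/\log 4$, is easy to check numerically; the values $k=3$ and $E[H_N]=10$ below are illustrative.

```python
import math

def tau_exact(k, EH):
    """1 + log2(1 - 1/(2(k-1)E[H_N]))."""
    return 1 + math.log2(1 - 1 / (2 * (k - 1) * EH))

def tau_approx(k, EH):
    """First-order form 1 - 1/(log(4)(k-1)E[H_N])."""
    return 1 - 1 / (math.log(4) * (k - 1) * EH)

# For a 3-ary tree with E[H_N] = 10 hops the two forms agree closely,
# and both stay below 1; increasing k pushes tau back toward 1.
t1 = tau_exact(3, 10.0)
t2 = tau_approx(3, 10.0)
```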
Bibliography
Chalmers, R. C. and Almeroth, K. C. (2001). Modeling the branching characteristics and efficiency gains in global multicast trees. IEEE INFOCOM 2001, Alaska.
Chen, L. H. Y. (1975). Poisson approximation for dependent trials. The Annals of Probability 3, 3, 534–545.
Chen, W.-K. (1971). Applied Graph Theory. (North-Holland Publishing Company,
Amsterdam).
Chuang, J. and Sirbu, M. A. (1998). Pricing multicast communication: A cost-based approach. Proceedings of the INET'98.
Cohen, J. W. (1969). The Single Server Queue. (North-Holland Publishing Company, Amsterdam).
Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977). Mécanique Quantique. Vol. I and II. (Hermann, Paris).
Comtet, L. (1974). Advanced Combinatorics, revised and enlarged edn. (D. Reidel Publishing Company, Dordrecht, Holland).
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1991). An Introduction to
Algorithms. (MIT Press, Boston).
Cvetkovic, D. M., Doob, M., and Sachs, H. (1995). Spectra of Graphs, Theory and
Applications, third edn. (Johann Ambrosius Barth Verlag, Heidelberg).
Dorogovtsev, S. N. and Mendes, J. F. F. (2003). Evolution of Networks, From
Biological Nets to the Internet and WWW. (Oxford University Press, Oxford).
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2001a). Modelling Extremal Events for Insurance and Finance, 3rd edn. (Springer-Verlag, Berlin).
Embrechts, P., McNeil, A., and Straumann, D. (2001b). Correlation and Dependence in Risk Management: Properties and Pitfalls. Risk Management: Value at Risk and Beyond, ed. M. Dempster and H. K. Moffatt, (Cambridge University Press, Cambridge, UK).
Erdős, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen 6, 290–297.
Erdős, P. and Rényi, A. (1960). On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5, 17–61.
Feller, W. (1970). An Introduction to Probability Theory and Its Applications, 3rd
edn. Vol. 1. (John Wiley & Sons, New York).
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, 2nd
edn. Vol. 2. (John Wiley & Sons, New York).
Floyd, S. and Paxson, V. (2001). Difficulties in simulating the Internet. IEEE/ACM Transactions on Networking 9, 4 (August), 392–403.
Fortz, B. and Thorup, M. (2000). Internet traffic engineering by optimizing OSPF weights. IEEE INFOCOM 2000.
Frieze, A. M. (1985). On the value of a random minimum spanning tree problem. Discrete Applied Mathematics 10, 47–56.
Gallager, R. G. (1996). Discrete Stochastic Processes. (Kluwer Academic Publishers, Boston).
Gantmacher, F. R. (1959a). The Theory of Matrices. Vol. I. (Chelsea Publishing
Company, New York).
Gantmacher, F. R. (1959b). The Theory of Matrices. Vol. II. (Chelsea Publishing
Company, New York).
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae. Pars prior. Gauss Werke 4, 3–26.