Contents

1 Combinatorial analysis
  1.1 Fundamental Principles of Counting: Tree Diagram
  1.2 Factorial Function
  1.3 Permutations
  1.4 Combinations
  1.5 Exercises

4 Theory of sampling
  4.1 Sampling
  4.2 Sampling Statistics
  4.3 The Central Limit Theorem
  4.4 Sampling Distribution
    4.4.1 Population mean µ and variance σ² are known
    4.4.2 Population mean µ and variance σ² are both unknown
  4.5 Exercises

Author index
Index
Chapter 1
Combinatorial analysis
In some cases the number of possible outcomes for a particular event is not very large, and
so direct counting of possible outcomes is not difficult. However, problems often arise where
direct counting becomes a practical impossibility. In these cases use is made of
combinatorial analysis, which could be called a sophisticated way of counting.
We first introduce two rules which are employed in many proofs throughout combinatorial analysis: the Rule of Sum (if a thing A can be chosen in m different ways and a thing B in n different ways, then "A or B" can be chosen in m + n different ways) and the Rule of Product (stated below).
It should be noticed that in the Rule of Sum the choices of A and B are mutually exclusive, that is, one cannot choose both A and B but either A or B.
[Figure 1.1: Tree diagram for example (1.1), with k = 2 local circuits (L1, L2), l = 3 registers (R1, R2, R3), and m = 2 trunk circuits (T1, T2)]
The Rule of Product is often used in cases where the order of choosing is immaterial, that is, where the choices are independent. But in many practical situations the possibility of dependence should not be ignored.
If one thing can be accomplished in n1 different ways, and if after this a second thing can be
accomplished in n2 different ways,· · ·, and finally a k’th thing can be accomplished in nk
different ways, then all k things (which are assumed to be independent of each other) can
be accomplished in the specified order in n = n1 · n2 · · · nk different ways.
A diagram, called a tree diagram because of its appearance (fig.1.1), is often used in
connection with these rules.
Consider a switching system with
k local circuits: L1, L2, · · ·, Lk
l registers: R1, R2, · · ·, Rl
m trunk circuits: T1, T2, · · ·, Tm
A connection (one device of each kind) can then be set up in
n = k · l · m
different ways.
If a malfunction only occurs for a specific combination of devices, it can be very difficult to trace the fault, as it only appears in one out of 60,000 calls (assuming random hunting).
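As a quick illustration, the Rule of Product can be checked by brute-force enumeration. The following Python sketch (the device names follow the text; the counts k = 2, l = 3, m = 2 are the ones in Figure 1.1) lists every path through the tree diagram:

    # Enumerate all (local circuit, register, trunk) paths; the Rule of
    # Product says there are k * l * m of them.
    from itertools import product

    locals_ = ["L1", "L2"]              # k = 2 local circuits
    registers = ["R1", "R2", "R3"]      # l = 3 registers
    trunks = ["T1", "T2"]               # m = 2 trunk circuits

    paths = list(product(locals_, registers, trunks))
    assert len(paths) == len(locals_) * len(registers) * len(trunks)
    print(len(paths), paths[0])         # 12 ('L1', 'R1', 'T1')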
n! = n · (n − 1) · (n − 2) · · · 2 · 1 (1.1)
It is convenient to define
0! = 1 (1.2)
Example 2.1
9.999 · 10^99
A famous result giving an approximate expression for n! is Stirling's formula:
n! ≈ √(2πn) · n^n · e^(−n)    (1.3)
The symbol ≈ means that the ratio of the left side to the right side approaches 1 as n → ∞. For this reason we often call the right side an asymptotic expansion of the left side. The symbol ≃ means approximately equal to.
The gamma function, denoted by Γ(n), is defined for any real value of n > 0:
Γ(n) = ∫_0^∞ t^(n−1) · e^(−t) dt ,  n > 0    (1.4)
A recurrence formula is
Γ(n + 1) = n · Γ(n) (1.5)
We can easily find Γ(1) = 1. If n is a positive integer, then we get (1.1):
Γ(n + 1) = n!
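A small numerical check (a Python sketch using only the standard library) illustrates both the relation Γ(n + 1) = n! and Stirling's asymptotic formula; the last lines relate to Exercise 1 below, where 69! is evaluated via base-10 logarithms:

    import math

    def stirling(n: int) -> float:
        # Stirling's formula: n! ~ sqrt(2*pi*n) * (n/e)**n
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    n = 10
    exact = math.factorial(n)
    print(exact, stirling(n), math.gamma(n + 1))  # Gamma(n+1) equals n!
    print(stirling(n) / exact)                    # about 0.9917; ratio -> 1 as n grows

    # For Exercise 1: log10(69!) via the log-gamma function.
    log10_69fact = math.lgamma(70) / math.log(10)
    print(log10_69fact)                           # about 98.23, i.e. 69! ~ 1.71e98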
1.3 Permutations
Example 3.1
We consider the case where some objects are identical. The number of permutations of n objects consisting of groups of which n1 are identical, n2 are identical, · · ·, and nk are identical, where
n = n1 + n2 + · · · + nk
is
n! / (n1! · n2! · · · nk!) = \binom{n}{n1 n2 · · · nk}
The term on the right-hand side is called the polynomial (multinomial) coefficient.
Example 3.2
Let us consider a group of n circuits. We can look (hunt) for idle circuits in
P_n^n = n!
different ways.
1.4 Combinations
We have
C_r^n = \binom{n}{r} = n! / (r! · (n − r)!)    (1.9)
We can easily derive the expression by noticing that each combination of r different objects
may be ordered in r! ways, and so ordered it is an r-permutation.
Thus we have
r! · C_r^n = P_r^n = n · (n − 1) · · · (n − r + 1) ,  n ≥ r
The numbers \binom{n}{r} are often called binomial coefficients because they arise in the binomial expansion:
(x + y)^n = Σ_{r=0}^{n} \binom{n}{r} · x^r · y^(n−r)    (1.12)
They can be generalized in several ways. Thus we define:
\binom{−n}{r} = (−1)^r · \binom{n + r − 1}{r} ,  n > 0    (1.13)
Example 4.1
i.e.
Σ_{r=0}^{n} \binom{n}{r} = 2^n
Example 4.2
To form all the permutations of 4 letters taken 3 at a time it is necessary to take each
combination and write out all possible permutations of the given combination:
Combinations            Permutations
abc                     abc, acb, bac, bca, cab, cba
abd                     abd, adb, bad, bda, dab, dba
acd                     acd, adc, cad, cda, dac, dca
bcd                     bcd, bdc, cbd, cdb, dcb, dbc

With repetitions allowed the table becomes:

Combinations            Permutations
aaa, aab, aac, aad      aaa, aab, baa, aba, aac, caa, aca, aad, daa, ada
bbb, bba, bbc, bbd      bbb, bba, abb, bab, bbc, cbb, bcb, bbd, dbb, bdb
ccc, cca, ccb, ccd      ccc, cca, acc, cac, ccb, bcc, cbc, ccd, dcc, cdc
ddd, dda, ddb, ddc      ddd, dda, add, dad, ddb, bdd, dbd, ddc, cdd, dcd
abc                     abc, acb, bac, bca, cab, cba
abd                     abd, adb, bad, bda, dab, dba
acd                     acd, adc, cad, cda, dac, dca
bcd                     bcd, bdc, cbd, cdb, dcb, dbc
Example 4.3
Consider n elements divided into k groups, where
n = n1 + n2 + · · · + nk
The number of ways to choose r elements,
r = r1 + r2 + · · · + rk ,  ri ≤ ni
with ri taken from the i'th group, is
\binom{n1}{r1} · \binom{n2}{r2} · · · \binom{nk}{rk}
The total number of combinations with r elements is \binom{n}{r}.
Many combinatorial problems can be reduced to the following form. For a group of n
circuits, p of them are busy and (n − p) of them are idle. A group of k circuits is chosen at
random. We seek the number of combinations which contain exactly x busy circuits. Here x
can be any integer between zero and p or k, whichever is the smaller.
The chosen group contains x busy and k − x idle circuits. Since any choice of busy circuits may be combined with any choice of idle ones, the busy ones can be chosen in \binom{p}{x} different ways and the idle ones in \binom{n−p}{k−x} different ways. Thus the total number of combinations containing x busy circuits is
\binom{p}{x} · \binom{n−p}{k−x}    (1.15)
The total number of combinations containing k circuits (idle or busy) is \binom{n}{k}. So the proportion of combinations containing exactly x busy circuits is the ratio of (1.15) to \binom{n}{k}.
Example 4.4
• Equalities:
\binom{n}{r} = \binom{n}{n−r}    (1.20)
\binom{n}{r} = 0 ,  for r > n and for r < 0    (1.21)
\binom{n}{0} = 1    (1.22)
Σ_{r=0}^{n} \binom{n}{r} = 2^n    (1.24)
Σ_{i=r}^{n} \binom{i}{r} = \binom{n+1}{r+1}    (1.25)
These coefficients can be arranged in Pascal's triangle, in which each entry is the sum of the two entries above it:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
· · ·
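The identities (1.20), (1.24) and (1.25) are easy to verify numerically, for instance with this small Python check (n = 8 and r = 3 are arbitrary test values):

    from math import comb

    n, r = 8, 3
    assert all(comb(n, i) == comb(n, n - i) for i in range(n + 1))   # (1.20)
    assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n           # (1.24)
    assert sum(comb(i, r) for i in range(r, n + 1)) == comb(n + 1, r + 1)  # (1.25)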
• Summary:
Permutation = n! / (n−r)!            if repetitions are not allowed
            = n^r                    if repetitions are allowed
Combination = n! / (r! · (n−r)!)     if repetitions are not allowed
            = (n+r−1)! / (r! · (n−1)!)  if repetitions are allowed
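The four formulas in the summary can be wrapped up as follows (a Python sketch; math.perm and math.comb implement n!/(n−r)! and n!/(r!(n−r)!) directly, and \binom{n+r−1}{r} equals (n+r−1)!/(r!(n−1)!)). The printed values 24, 64, 4 and 20 correspond to 4 letters taken 3 at a time, as in Example 4.2:

    from math import comb, perm

    def permutations(n, r, repetition=False):
        return n ** r if repetition else perm(n, r)

    def combinations(n, r, repetition=False):
        return comb(n + r - 1, r) if repetition else comb(n, r)

    print(permutations(4, 3), permutations(4, 3, True))   # 24, 64
    print(combinations(4, 3), combinations(4, 3, True))   # 4, 20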
1.5 Exercises
1. Evaluate 69! by using Stirling's approximation to n! (use logarithms to base 10). Compare the result with the value given in Example (2.1).
3. How many 6-digit telephone numbers can be formed with the digits 0,1,2,· · ·, 9 (0 is
not allowed in the first digit) if
(a) repetitions are allowed?
(b) repetitions are not allowed?
(c) the last digit must be 0 and repetitions are not allowed?
4. We consider a 4-digit binary number. Every digit may be "0" or "1". How many different numbers have two "0"s (and two "1"s)?
7. A trunk group contains 10 circuits. 7 circuits are busy. What is the number of
combinations which contain X(X = 0, 1, 2, 3, 4) busy circuits?
Chapter 2
Elements of probability theory
Probability theory deals with the study of events whose occurrence cannot be predicted in advance. These kinds of events are termed random events. For example, when throwing a single die, the result may be one of the six numbers: 1, 2, 3, 4, 5, 6. We cannot predict the result. So the outcome of throwing a die is a random event. When observing the number of telephone calls arriving at a telephone exchange during a certain time interval, we are of course unable to predict the actual number of arriving calls. This is also a random event.
Probability theory is usually discussed in terms of experiments and possible outcomes of the
experiments. The set theory plays an important role in the study of probability theory.
A set is a collection of objects called elements of the set. In general we shall denote a set by
a capital letter such as A, B, C, etc. and an element by a lower case letter such as a, b, c, etc.
A set can be defined by listing its elements. If the set A consists of the elements a, b, c, then
we write
A = {a, b, c}
A set can also be defined by describing some properties held by all elements and by
non-elements. We call a set a finite (or infinite) one if it contains a finite (or infinite)
number of elements.
Example 1.1
Example 1.2
If every element of a set A also belongs to a set B, then we call A a subset of B, written:
A ⊆ B ("A is contained in B") or
B ⊇ A ("B contains A")
For all sets we have A ⊆ A. If both A ⊆ B, and B ⊆ A, then A and B are said to be equal,
and we write A = B. In this case A and B have exactly the same elements.
The set consisting of the combinations of the letters a, b, c, and d taken three at a time is
a proper subset of the set consisting of the permutations of the same four letters taken three
at a time.
The set consisting of successful call attempts between 9 a.m. and 10 a.m. is a subset of all
call attempts during the same period.
All sets considered are in general assumed to be subsets of some fixed set called the universe
or the universal set, and denoted by U. It is also useful to define a set having no elements
at all. This is called the null set and is denoted by ∅.
A universe U can be shown graphically by the set of points inside a rectangle (fig. 2.1).
Subsets of U (such as A and B shown in fig. 2.1) can be represented by sets of points inside
circles. Such a diagram is called a Venn Diagram. It often serves to provide geometric
intuition regarding possible relationships between sets.
[Figure 2.1: Venn diagram of the universe U containing the subsets A and B]
2.1.2 Operators
[Figures: Venn diagrams illustrating operations on the sets A and B within the universe U, e.g. union, intersection and complement]
Set operations are similar to those of Boolean algebra as seen from the following theorems.
1. Idempotent laws:
A ∪ A = A    A ∩ A = A
2. Commutative laws:
A ∪ B = B ∪ A    A ∩ B = B ∩ A
3. Associative laws:
A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C
4. Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
5. Identity laws:
A ∪ ∅ = A    A ∩ ∅ = ∅
A ∪ U = U    A ∩ U = A
6. De Morgan's laws:
C(A ∪ B) = C(A) ∩ C(B)
C(A ∩ B) = C(A) ∪ C(B)
7. Complement laws:
A ∪ C(A) = U    A ∩ C(A) = ∅
C(C(A)) = A    C(U) = ∅    C(∅) = U
Any true result involving sets remains true if we replace unions by intersections, intersections by unions, sets by their complements, and reverse the inclusion symbols ⊂ and ⊃.
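Since Python's built-in sets implement exactly this algebra, the laws can be spot-checked mechanically; the following sketch verifies the distributive, De Morgan and complement laws on an arbitrary small universe:

    U = set(range(10))
    A, B, C_ = {1, 2, 3}, {3, 4, 5}, {5, 6}

    def complement(X):
        return U - X                    # C(X) is the complement U - X

    assert A & (B | C_) == (A & B) | (A & C_)                  # distributive law
    assert complement(A | B) == complement(A) & complement(B)  # De Morgan
    assert complement(A & B) == complement(A) | complement(B)  # De Morgan
    assert A | complement(A) == U and A & complement(A) == set()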
Experiments are of great importance in science and engineering. Experiments in which the
outcome will not be the same, even though the conditions are nearly identical, are called
random experiments and are subject to study by probability theory.
The set of all possible outcomes of an experiment is called a sample space, denoted by S. An individual outcome is called a sample point, which is an element of the set S.
Example 2.1
Example 2.2
Example 2.3
Example 2.4
If the sample space has a finite number of points, it is called a finite sample space (Examples 2.1 and 2.2). If it has as many points as there are natural numbers, it is called a countably infinite sample space (Example 2.3). In both cases it is called a discrete sample space. If it has as many points as there are points in some interval, it is called a non-countable infinite sample space or a continuous sample space (Example 2.4).
An event is a subset A of the sample space, i.e. it is a set of possible outcomes. If the
outcome of an experiment is A we say that the event A has occurred or that A is a
realization of the experiment.
An event which consists of one sample point of S is called a simple event. It cannot be broken down into other events. A compound event is the aggregate of a number of simple events. A sure event is an event that will definitely occur. Naturally, an impossible event never occurs.
Example 2.5
For a group of 12 lines the compound event that exactly 2 circuits are busy consists of \binom{12}{2} = 66 simple events.
Since events are sets, statements concerning events can be translated into the language of
set theory and conversely. We can represent events graphically on a Venn Diagram, and we
also have an algebra of events corresponding to the algebra of sets given in section 2.1. By
using the operators of section 2.1 on events in S we can obtain other events in S. For example, if A and B are events, then A ∪ B, A ∩ B, and the complement C(A) are also events.
If A ∩ B = ∅, that is the sets corresponding to events A and B are disjoint, then both
events cannot occur simultaneously. They are mutually exclusive.
A set of events is termed exhaustive if their union is the entire sample space S.
Example 2.6
Consider a trunk group having four trunk circuits. The experiment is to observe the state of
the trunks: busy or idle. A sample space S for this experiment of observing trunk circuits 1
through 4 may be the set of four-tuples (a1, a2, a3, a4), where ai is either 1 or 0, indicating that the i'th trunk circuit is busy or idle. Thus the sample point (1, 0, 1, 0) corresponds to the outcome that the first and the third circuits are busy while the second and the fourth circuits are idle. The sample space consists of 2^4 = 16 sample points.
Let A be an event in which at least two circuits are idle, and let B be an event in which no
more than two circuits are idle. Then A ∪ B is the whole space S. A ∩ B is the collection of
elements of S in which just two trunks are idle. If C is the event that exactly one circuit is
idle, then A ∩ C = ∅ that is to say A and C have no sample points in common.
Probability is a positive measure between 0 and 1 associated with each simple event, the total of all simple event probabilities being 1. From a strict mathematical point of view it is difficult to define the concept of probability. We shall use a relative frequency approach (the a posteriori approach).
In early or classical probability theory, all sample spaces were assumed to be finite, and
each sample point was considered to occur with equal frequency. The definition of the
probability P of an event A was described by the relative frequency by which A occurs:
P(A) = h / n
where h is the number of sample points in A and n is the total number of sample points.
This definition is applicable in some cases as the following example.
According to Example 2.6, there are 2^4 = 16 sample points. A is the event that at least two trunk circuits are idle, B is the event that at most two circuits are idle, and C is the event that exactly one circuit is idle. We use combinatorial analysis to get the probabilities:
h_A = \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 11
and P(A) = h_A / n = 11/16 = 0.6875
h_B = \binom{4}{0} + \binom{4}{1} + \binom{4}{2} = 11
and P(B) = h_B / n = 11/16 = 0.6875
h_{A∪B} = Σ_{i=0}^{4} \binom{4}{i} = 16
and P(A ∪ B) = 16/16 = 1.
h_{A∩B} = \binom{4}{2} = 6
and P(A ∩ B) = 6/16 = 0.375
h_C = \binom{4}{1} = 4
and P(C) = 4/16 = 0.25
h_{A∩C} = 0
and P(A ∩ C) = 0/16 = 0.
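The whole calculation can be replayed by enumerating the 2^4 = 16 sample points of Example 2.6 and counting, in line with the classical definition P(A) = h/n (a Python sketch; 1 denotes busy and 0 idle):

    from itertools import product

    S = list(product((0, 1), repeat=4))       # the 16 four-tuples

    A = [s for s in S if s.count(0) >= 2]     # at least two circuits idle
    B = [s for s in S if s.count(0) <= 2]     # at most two circuits idle
    C = [s for s in S if s.count(0) == 1]     # exactly one circuit idle

    n = len(S)
    print(len(A) / n)                                     # 11/16 = 0.6875
    print(len(B) / n)                                     # 11/16 = 0.6875
    print(len([s for s in S if s in A and s in B]) / n)   # 6/16 = 0.375
    print(len(C) / n, len([s for s in C if s in A]) / n)  # 0.25, 0.0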
Let us consider an experiment with sample space S. Let h be the number of times that the event A occurs in n repetitions of the experiment. Then we define the probability of A by
P(A) = lim_{n→∞} h / n    (2.1)
Thus the probability of an event is the proportion of all experiments in which this event
occurs when we make a very large number of experiments. From the definition we obtain a
number of basic properties:
P (∅) = 0 (2.3)
P (S) = 1 (2.4)
If S is a continuous sample space, then the probability of any particular sample point is zero. We therefore need the concept of a probability density:
P(A) = ∫_A p(s) ds    (2.7)
This is similar to the discrete case (2.5), and all laws of probability still apply if we replace
summation by integration.
Example 3.2
In Example 2.4 we have a continuous sample space {9 ≤ t ≤ 10}. The probability of a call at 9:30 sharp is zero. If we assume the call is equally likely to occur anywhere between 9 and 10, then the density function becomes (1 hour)^(−1) = (60 minutes)^(−1). The probability of a call between 9:29 and 9:31 then becomes 2/60 = 1/30.
Example 3.3
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
In some cases enough is known about the experiment to enumerate all possible outcomes
and to state that these are equally likely. The probability of A is then equal to the ratio of
the number of outcomes in which A is realized to the total number of outcomes.
Combinatorial analysis is useful to find the relevant number of outcomes.
If all possible combinations are equally likely, then the probability of finding k busy circuits is given by (1.18) or (1.19).
Given that the first trunk circuit is known to be idle, we may be interested in the probability P that there are at least two other circuits idle. It is obvious that "the first circuit is idle" is an event, and that "at least two other circuits are idle" is also an event. So P is the probability that one event occurs under the condition that another event has occurred. This kind of probability is naturally called conditional probability.
We consider an experiment. If it is known that an event B has already occurred, then the probability that the event A has also occurred is known as the conditional probability. This is denoted by P(A | B), the conditional probability of A given B, and it is defined by
P(A | B) = P(A ∩ B) / P(B)    (2.10)
i.e. among the experiments in which B is realized, the proportion in which A (and B) is realized as well.
Example 3.5
Let A denote the event ”a group of 5 circuits (a, b, c, d, e) contains 2 calls only which occupy
adjacent circuits”. Let B denote the event ”a group of 5 circuits contains 2 calls only, one of
which occupies the circuit a”.
From combinatorial analysis we know that 2 lines can be busy in \binom{5}{2} = 10 different ways. Therefore we get P(B) = 4/10 (ab, ac, ad, or ae occupied), and P(A ∩ B) = 1/10 (ab occupied).
P(A | B) = (1/10) / (4/10) = 1/4
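The same numbers fall out of a direct enumeration of the 10 equally likely pairs of busy circuits (a Python sketch of Example 3.5):

    from itertools import combinations

    pairs = list(combinations("abcde", 2))           # 10 equally likely outcomes
    adjacent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
    B = [p for p in pairs if "a" in p]               # one call occupies circuit a
    A_and_B = [p for p in B if p in adjacent]        # only ("a", "b") qualifies
    print(len(B) / len(pairs))                       # P(B)     = 4/10
    print(len(A_and_B) / len(pairs))                 # P(A ∩ B) = 1/10
    print(len(A_and_B) / len(B))                     # P(A | B) = 1/4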
A and B are said to be (statistically) independent events when the probability of event A does not depend on whether B has occurred or not:
P (B | A) = P (B)
or
P (A | B) = P (A)
Example 3.6
If the probability of getting B-busy (called party busy) at a random point of time is 0.10,
then the probability of getting B-busy in two call-attempts at two different days is
0.10 · 0.10 = 0.01.
Let A1, A2, · · ·, Ak be mutually exclusive events whose union is the sample space S (exhaustive). Then for any event A we have
P(Ai | A) = P(Ai) · P(A | Ai) / Σ_{j=1}^{k} P(Aj) · P(A | Aj)    (2.13)
From this we can find the probabilities of the events A1, A2, · · ·, Ak which can cause A to occur. In this way we can thus obtain P(A | B) from P(B | A), which in general is not possible.
Example 3.7
A and B are two persons in two different places calling a third person C. It has been found that A talks with C on average 9 times during the same time as B talks with C 10 times. On average, out of 100 call attempts, A succeeds 80 times and B 70 times. During this trial, C's telephone is ringing, but it is unknown which person is calling successfully. What is the probability that C's incoming call is from B?
Let Ta and Tb be the events that the call is from A and B respectively. Considering the ratio of the average number of successful calls from A to the average number of successful calls from B, we have P(Ta) = 0.9 · P(Tb). Let D be the event that a call reaches C. According to the data, we have P(D | Ta) = 0.8 and P(D | Tb) = 0.7. Using Bayes' formula:
P(Tb | D) = P(Tb) · P(D | Tb) / (P(Ta) · P(D | Ta) + P(Tb) · P(D | Tb))
          = 0.7 · P(Tb) / (0.9 · P(Tb) · 0.8 + 0.7 · P(Tb)) = 0.493
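A minimal sketch of this Bayes calculation: since only the ratio P(Ta) = 0.9 · P(Tb) is known, unnormalized prior weights can be used, as they cancel in (2.13):

    # Unnormalized prior weights proportional to P(Ta) and P(Tb).
    w_a, w_b = 0.9, 1.0
    p_d_given_a, p_d_given_b = 0.8, 0.7

    posterior_b = (w_b * p_d_given_b) / (w_a * p_d_given_a + w_b * p_d_given_b)
    print(round(posterior_b, 3))   # 0.493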
2.8 Exercises
1. Telephones returned to a workshop for repair are subject to three kinds of defects,
namely A, B and C. A sample of 1000 pieces was inspected with the following results:
5. Let the sample space in exercise 2 correspond to the throw of a die. We assign equal
probability to the sample points. Find the probability P (B | A).
6. Determine the probability of three sixes in five throws of a fair die.
7. A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random
without replacement, determine the probability that:
(a) all 3 are red
(b) all 3 are white
(c) 2 are red and 1 is blue
(d) at least 1 is white
(e) 1 of each color is drawn
(f) balls are drawn in the order of red, white, blue.
8. Suppose six dice are thrown simultaneously. What is the probability of getting
(a) all faces alike
(b) no two faces alike
(c) only five different faces
Chapter 3
Elements of mathematical statistics
Let us assign a real number to each point of a sample space, i.e. each sample point has a
single real value. This is a function defined on the sample space. This function is called a
stochastic function, and the result of a given experiment which generates sample points is
called a stochastic variable (a random variable). Actually this variable is a
stochastic function defined on the sample space.
Example 1.1
Let X be a discrete stochastic variable which can take the values x1 , x2 , · · ·, xn (finite
number or countably many values). If these values are assumed with probabilities given by
P {X = xk } = f (xk ) (3.2)
then we introduce the (probability) density function (frequency function) denoted by
P {X = x} = f (x) (3.3)
For x = xk this reduces to (3.2), while for other values of x we have f (x) = 0.
The distribution function is obtained from the density function by noting that
F(x) = P{X ≤ x} = Σ_{u≤x} f(u)    (3.6)
Let us define a discrete random variable X that counts the number of busy circuits in a trunk group of 4 trunks. X takes only the values 1, 2, 3 and 4. Let the probabilities be given by p(1) = 0.40, p(2) = 0.35, p(3) = 0.15 and p(4) = 0.10.
Then we can get the distribution function F (x) of the discrete random variable X as follows:
F (0) = 0
F (1) = p(1) = 0.40
F (2) = p(1) + p(2) = 0.75
F (3) = p(1) + p(2) + p(3) = 0.90
F (4) = p(1) + p(2) + p(3) + p(4) = 1.00
[Figure: The distribution function F(x) of X, a staircase function rising from 0 through 0.40, 0.75 and 0.90 to 1.00; vertical axis: probability]
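A short sketch showing how the staircase values of F arise as cumulative sums of the point probabilities p(k) used above:

    from itertools import accumulate

    p = {1: 0.40, 2: 0.35, 3: 0.15, 4: 0.10}        # point probabilities
    F = dict(zip(p, accumulate(p.values())))         # F(x) = sum of p(k) for k <= x
    print({k: round(v, 2) for k, v in F.items()})    # {1: 0.4, 2: 0.75, 3: 0.9, 4: 1.0}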
When X is a continuous stochastic variable, the probability that X takes any one particular
value is in general zero. We noticed, however, in Chapter 2 (2.7) that the probability that
X lies between two different values is meaningful. In fact "a < X ≤ b" is the event corresponding to the set ]a, b].
By analogy to (3.6) we define the distribution function F(x) for a continuous stochastic variable by
F(x) = P{X ≤ x} = P{−∞ < X ≤ x} = ∫_{−∞}^{x} f(u) du    (3.10)
In general, we have
f (x) = F 0 (x) (3.11)
Example 3.1
[Figure: A continuous distribution function F(x), increasing smoothly from 0 to 1]
We are often interested in two stochastic variables X and Y at the same time. These may
be the outcomes of 2 experiments, or they may be a pair of figures emerging from a single
experiment.
(X, Y ) can be regarded as taking values in the product space (S × T ) consisting of all pairs
(s, t) with s ∈ S and t ∈ T .
f (x, y) = P {X = x, Y = y} (3.13)
where
f (x, y) ≥ 0 (3.14)
Σ_{u,v} f(u, v) = 1    (3.15)
f (xj , yk ) = P {X = xj , Y = yk } (3.16)
The continuous case is easily obtained by analogy by replacing sums by integrals. It is also
obvious how the mixed case (discrete - continuous) should be dealt with.
If the events X = x and Y = y are independent for all x and y, then we say that X and Y
are independent stochastic variables. In this case
P {X = x, Y = y} = P {X = x} · P {Y = y} (3.19)
or equivalently
f (x, y) = f (x) · f (y) (3.20)
Example 4.1
Consider again two consecutive throws of a die. Let X and Y correspond to the results of the first and second throw. We can easily see that X and Y are independent, and x, y ∈ {1, 2, 3, 4, 5, 6}, each with probability 1/6. Hence the two-dimensional variable (X, Y) takes on the pairs of values (i, j) where i, j ∈ {1, 2, 3, 4, 5, 6}.
Let Z = X + Y be the sum of the two throws. Then
P(Z = 2) = P(X = 1, Y = 1) = P(X = 1) · P(Y = 1) = 1/36
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 1/18
· · ·
P(Z = 12) = P(X = 6, Y = 6) = P(X = 6) · P(Y = 6) = 1/36
It is easy to verify that Σ_{i=2}^{12} P(Z = i) = 1.
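The verification is immediate by enumeration (a Python sketch using exact fractions):

    from itertools import product
    from fractions import Fraction

    counts = {}
    for i, j in product(range(1, 7), repeat=2):      # 36 equally likely pairs
        counts[i + j] = counts.get(i + j, 0) + 1

    dist = {z: Fraction(c, 36) for z, c in sorted(counts.items())}
    print(dist[2], dist[3], dist[12])                # 1/36, 1/18, 1/36
    print(sum(dist.values()))                        # 1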
For a discrete stochastic variable X taking the possible values x1, x2, · · ·, xn we define the expectation of X, or the mean of X, as follows:
E(X) = Σ_{j=1}^{n} xj · P{X = xj} = Σ_{j=1}^{n} xj · f(xj)    (3.21)
For the continuous case the expectation of X with density function f(x) is defined in a similar way:
E(X) = ∫_{−∞}^{∞} x · f(x) dx    (3.22)
More generally, for a function g(X) of the stochastic variable we have:
Discrete case:
E(g(X)) = Σ_{j=1}^{n} g(xj) · f(xj)    (3.23)
Continuous case:
E(g(X)) = ∫_{−∞}^{+∞} g(x) · f(x) dx    (3.24)
Example 5.1
Assume that the random variable X can take on two values, x1 = −1 with probability p1 = 0.2 and x2 = 1 with probability p2 = 0.8.
Example 5.2
For a Poisson distributed variable (cf. (3.39) below) we get
E(X) = Σ_{x=0}^{∞} x · (λ^x / x!) · e^(−λ) = λ · e^(−λ) · Σ_{x=1}^{∞} λ^(x−1)/(x−1)! = λ · e^(−λ) · e^λ = λ
Example 5.3
1. E(c) = c
2. E(c · X) = c · E(X)
Discrete case:
α_r = Σ_j xj^r · f(xj)    (3.25)
Continuous case:
α_r = ∫_{−∞}^{+∞} x^r · f(x) dx    (3.26)
We notice that:
α_1 = E(X) ,  α_2 = E(X²)
The r’th moment of a stochastic variable X about a is defined by E((X − a)r ).
Discrete case:
µ_r = Σ_j (xj − E(X))^r · f(xj)    (3.27)
Continuous case:
µ_r = ∫_{−∞}^{+∞} (x − E(X))^r · f(x) dx    (3.28)
Of particular interest is the 2nd moment about the mean. This is called the variance:
Var(X) = µ_2 = E((X − E(X))²)    (3.29)
which can also be written
Var(X) = α_2 − α_1²    (3.30)
This is a non-negative number. The square root of the variance is called the standard deviation.
1. V ar(c) = 0
2. V ar(cX) = c2 · V ar(X)
Example 5.4
1. In Example 5.1:
α_1 = E(X) = 0.6
α_2 = E(X²) = (−1)² · 0.2 + 1² · 0.8 = 1
µ_2 = Var(X) = α_2 − α_1² = 0.64
2. In Example 5.2:
α_1 = λ
α_2 = E(X²) = Σ_{x=0}^{∞} x² · (λ^x / x!) · e^(−λ) = λ² + λ
µ_2 = Var(X) = α_2 − α_1² = λ
3. In Example 5.3:
α_1 = E(X) = 0
α_2 = E(X²) = (1/√(2π)) · ∫_{−∞}^{+∞} x² · e^(−x²/2) dx = 1
µ_2 = Var(X) = α_2 − α_1² = 1
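The discrete cases can be checked numerically; the sketch below reproduces case 1 exactly and case 2 by truncating the Poisson sum (λ = 2.5 is an arbitrary test value):

    from math import exp, factorial

    # Case 1: the two-point variable of Example 5.1.
    xs, ps = [-1, 1], [0.2, 0.8]
    a1 = sum(x * p for x, p in zip(xs, ps))        # alpha_1 = E(X)   = 0.6
    a2 = sum(x**2 * p for x, p in zip(xs, ps))     # alpha_2 = E(X^2) = 1
    print(a1, a2 - a1**2)                          # 0.6 0.64

    # Case 2: the Poisson variable; the infinite sums are truncated at
    # x = 60, which is ample for lam = 2.5.
    lam = 2.5
    a1 = sum(x * lam**x / factorial(x) * exp(-lam) for x in range(60))
    a2 = sum(x**2 * lam**x / factorial(x) * exp(-lam) for x in range(60))
    print(round(a1, 6), round(a2 - a1**2, 6))      # both equal lam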
The results given above can be extended to two or more variables having joint density functions, e.g. f(x, y):
E(X) = Σ_u Σ_v u · f(u, v)    (3.32)
An interesting quantity arising in the case of two variables is the covariance, defined by
Cov(X, Y) = E((X − E(X)) · (Y − E(Y)))
Thus we are led to a measure of the dependence of the variables X and Y given by
ρ = Cov(X, Y) / (Var(X) · Var(Y))^(1/2)    (3.35)
Later on we will realize that many important discrete variables arise from the concept of a Bernoulli sequence of experiments. A Bernoulli experiment is a trial with only two possible outcomes. We normally call the two outcomes "success" and "failure", with respective probabilities p and 1 − p. A sequence of such experiments is called a Bernoulli sequence if all the experiments have the same probability of "success" or "failure".
We now consider the Binomial distribution. This distribution is given by (ref. Table 3.1)
P{X = x} = \binom{n}{x} · p^x · (1 − p)^(n−x) ,  x = 0, 1, 2, · · ·, n    (3.36)
We find
E(X) = n · p (3.37)
V ar(X) = np(1 − p) (3.38)
This distribution applies if one makes n independent Bernoulli experiments in which the
probability of ”success” in each experiment is p. P {X = x} is the probability of there being
exactly x ”successes”. This probability is derived by combinatorial analysis.
Example 6.1
This model is usable when we make n test-calls and observe how many of them are unsuccessful.
The Poisson distribution is given by
P{X = x} = (λ^x / x!) · e^(−λ) ,  x = 0, 1, 2, · · ·    (3.39)
We find (cf. Example 5.2 & 5.4):
E(X) = λ (3.40)
V ar(X) = λ (3.41)
This distribution is obtained as the limit of the Binomial distribution when we increase n and at the same time reduce p, keeping n · p constant and equal to λ.
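The limit is easy to watch numerically; the sketch below fixes n · p = λ = 4 and compares P{X = 5} for growing n with the Poisson value (all numbers are chosen for illustration):

    from math import comb, exp, factorial

    lam, x = 4.0, 5
    for n in (10, 100, 1000):
        p = lam / n                                   # keep n * p = lam fixed
        binom = comb(n, x) * p**x * (1 - p)**(n - x)  # Binomial (3.36)
        print(n, round(binom, 5))                     # tends to the Poisson value
    print("poisson", round(lam**x / factorial(x) * exp(-lam), 5))   # 0.15629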
Example 6.2
If we examine a large number of subscribers, each of which has a small probability of being busy, then the number found busy will follow a Poisson distribution.
Example 6.3
The number of calls incoming to an exchange during one hour will also follow a Poisson distribution.
The random variable which in a Bernoulli sequence counts the number of trials needed to get the first success is called a geometric random variable. It is described by the Geometric distribution. In some cases we don't include the successful trial, so that the values assumed are k = 0, 1, · · ·. The geometric distribution is shown in Table 3.1. Notice that this distribution includes the success (k = 1, 2, · · ·). By adding k geometric distributions we get the Negative Binomial distribution (Pascal distribution), which is also shown in Table 3.1. In Chapter 1 we indicated the Hypergeometric distribution (formula (1.17)). From Table 3.1 we notice the close relationship between the Binomial, the Geometric and the Negative Binomial distributions.
The Normal distribution has the density function
f(t) = (1 / (σ√(2π))) · exp(−(1/2) · ((t − µ)/σ)²) ,  −∞ < t < +∞    (3.42)
The standard Normal distribution has a mean of 0 and variance of 1, and forms the basis for tables of the Normal distribution. The properties of other Normal distributions are obtained from these tables by working in terms of the quantity (t − µ)/σ.
This distribution is called the negative exponential distribution in teletraffic theory. The density and distribution functions are
f(t) = λ · e^(−λt) ,  t ≥ 0, λ > 0
and
F(t) = 1 − e^(−λt) ,  t ≥ 0, λ > 0    (3.46)
respectively.
We have:
E(T) = 1/λ    (3.47)
Var(T) = 1/λ²    (3.48)
This is one of the most important distributions in teletraffic theory.
The gamma distribution has the density function
f(t) = ((λt)^(k−1) / Γ(k)) · λ · e^(−λt)    (3.49)
The Erlang-k distribution has density and distribution functions
f(t) = (λk / (k − 1)!) · (λkt)^(k−1) · e^(−λkt) ,  λ > 0    (3.50)
F(t) = 1 − e^(−λkt) · Σ_{j=0}^{k−1} (λkt)^j / j!    (3.51)
and
E(T) = 1/λ ,  Var(T) = 1/(kλ²)    (3.52)
When k = 1, the Erlang-k distribution is identical to the exponential distribution. As k → ∞ we have Var(T) → 0, and the random variable becomes constant.
From Table 3.1 we notice the close relationship between the Exponential, the Poisson and
the Erlang-k distributions. We also notice the relationship to the discrete cases.
Example 7.1
The holding time of a control device has been found to be Erlang-5 distributed with average
value of 500 milliseconds. What is the probability that the holding time does not exceed
750 milliseconds?
Let X be the random variable; we have E(X) = 500 ms, so that λ = 1/500 (ms)^(−1). The probability that X does not exceed 750 milliseconds is given by F_X(750), where λkx = 5 · 750/500 = 7.5:
F_X(750) = 1 − e^(−λkx) · Σ_{j=0}^{k−1} (λkx)^j / j!
         = 1 − e^(−7.5) · Σ_{j=0}^{4} (7.5)^j / j! = 0.8679
3.8 Exercises
1. The number of calls arriving on a group of devices in a telephone system was recorded on a counter. The counter was read off every 3 minutes. The following values xi were recorded:
06 08 08 07 07 06 09 07 06 03
09 07 11 07 12 09 13 08 09 05
08 15 09 09 19 16 10 11 11 15
17 12 16 14 15 14 09 10 14 14
2. Prove (3.30).
Chapter 4
Theory of sampling
4.1 Sampling
We are often interested in drawing conclusions about a large set of objects, which we shall
call a population . The population size N can be finite or infinite. Instead of examining the
entire population (doing this is often impossible in practice) we observe only a
sample of size n, which is a subset of the population. The process of obtaining samples is
called sampling. The purpose is to obtain some knowledge about the population from
results found in the sample.
Sampling where each element of a population may be chosen more than once is called
sampling with replacement. In sampling without replacement each element cannot be
chosen more than once.
By taking random samples from the population these may be used to obtain estimates of
the population parameters. An important problem in sampling theory is to decide how to form the sample statistics which will best estimate a given population parameter.
observations: x1, x2, · · ·, xn
sample size: n
sample mean: x̄ = (1/n) · Σ_{i=1}^{n} xi
sample variance: s² = (1/n) · Σ_{i=1}^{n} (xi − x̄)²
These statistics are functions of stochastic variables and are therefore stochastic variables themselves.
The unknown population mean and variance are estimated by the following unbiased estimators:
µ = E{x̄}    (4.3)
σ² = E{ŝ²} = E{(n/(n−1)) · s²}    (4.4)
(these important results are proven in mathematical statistics)
We now want to know how accurate these results are.
If a sample of size n is taken from a population with finite mean µ and finite variance σ² (and otherwise any statistical distribution), then as n increases the distribution of the sample mean x̄ is asymptotically Normal (cf. section 3.7) with mean value µ and variance σ²/n. Or equivalently, the distribution of
Z = (x̄ − µ) / (σ/√n)    (4.5)
approaches the standard Normal distribution N(0, 1) as n → ∞.
Example 4.1
(x̄ − p) / √(p(1−p)/n) → N(0, 1)  (n → ∞)
or
(Σ_{i=1}^{n} Xi − np) / √(np(1−p)) → N(0, 1)  (n → ∞)
Since we know from section 6 of Chapter 3 that Sn = Σ_{i=1}^{n} Xi is binomially distributed, the above expression shows that for large n an approximation to binomial probabilities can be obtained by using the Normal probabilities of N(np, np(1 − p)).
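As a sketch, the quality of this Normal approximation can be checked with the standard-library error function (n = 100, p = 0.3 and the event {X ≤ 25} are arbitrary illustration values; a continuity correction of 0.5 is applied):

    from math import comb, erf, sqrt

    n, p = 100, 0.3
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    def phi(z):                          # standard Normal distribution function
        return 0.5 * (1 + erf(z / sqrt(2)))

    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 26))
    approx = phi((25.5 - mu) / sigma)    # continuity-corrected approximation
    print(round(exact, 4), round(approx, 4))   # P{X <= 25}: about 0.16 each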
A sample statistic, which is calculated from a sample, is a function of random variables and is therefore itself a random variable. The probability distribution of a sample statistic is called the sampling distribution of the statistic. We shall only consider two sampling distributions for the sample mean.
Suppose that the population from which samples are taken has a probability distribution with mean value µ and variance σ² (not necessarily a Normal distribution). Then it can be shown that the sampling distribution of x̄ is asymptotically Normal with mean µ and variance σ²/n, i.e.:
Z = (x̄ − µ) / (σ/√n) → N(0, 1)  for n → ∞    (4.6)
If we choose a so-called confidence level 1 − α, then we can expect to find x̄ lying between the confidence limits
µ ± z_{1−α/2} · σ/√n    (4.7)
(1 − α) · 100% of the time.
The interval (µ − z_{1−α/2} · σ/√n , µ + z_{1−α/2} · σ/√n) is called the confidence interval. z_{1−α/2} is obtained from the standard Normal distribution:
P{−∞ < T ≤ z_{1−α/2}} = 1 − α/2    (4.8)
Example 4.2
For some values of α we have the following values of z:
α      z_{1−α/2}
10%    1.6449
5%     1.9600
1%     2.5758
Thus (section 3.7) 2.5% of probability (area under the density function) is above t = 1.9600,
and (because of symmetry) 2.5% is below t = −1.9600.
In most practical applications we do not know the population parameters µ and/or σ². Then we estimate these parameters by the sample mean x̄ and the sample variance ŝ², respectively. It can be shown that the standardized sample mean then has a so-called (Student) t-distribution. The confidence interval becomes:
x̄ ± t_{1−α/2, n−1} · ŝ/√n    (4.9)
where the t-value is obtained from a table of the t-distribution, which has an additional parameter: degrees of freedom = n − 1. For increasing n this distribution is asymptotically Normal:
lim_{n→∞} t_{1−α/2, n−1} = z_{1−α/2}    (4.10)
The t-value yields a larger confidence interval than the z-value (we have less information because we don't know the population mean and variance), but for large values of n and for most practical purposes we often use the z-value.
Example 4.3
n     t_{0.975, n}
1     12.71
2     4.30
5     2.57
10    2.23
20    2.09
50    2.01
For a given confidence level we have a relation between the confidence limits (confidence
interval) and the sample size. If we want to reduce the confidence interval by a factor c,
then we must increase the sample size by a factor c2 .
Example 4.4
The average holding time of calls during a certain period in a telephone system is to be estimated. Based on a random sample of 100 holding times, the sample mean and sample variance are calculated as x̄ = 5.74 time units and ŝ² = 2.65 (time units)². Find a 95% confidence interval for the true average holding time of calls in that period.
Let µ denote the true average holding time of calls. The confidence interval for µ, based on formula (4.9), is
(x̄ − (ŝ/√n) · t_{1−α/2, n−1} ,  x̄ + (ŝ/√n) · t_{1−α/2, n−1})
where n = 100, 1 − α = 0.95, x̄ = 5.74, ŝ = √2.65 ≈ 1.63. We have t_{1−α/2, n−1} = 1.984. Therefore the 95% confidence interval is approximately 5.74 ± 1.984 · 1.63/10, i.e. (5.42, 6.06).
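The arithmetic of Example 4.4 in a few lines (a sketch; the t-value 1.984 for 99 degrees of freedom is the one quoted above):

    from math import sqrt

    n, xbar, s2 = 100, 5.74, 2.65
    s_hat = sqrt(s2)                     # sample standard deviation
    t = 1.984                            # t-value for 99 d.o.f. at the 95% level
    half_width = t * s_hat / sqrt(n)
    print(round(xbar - half_width, 2), round(xbar + half_width, 2))  # 5.42 6.06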
4.5 Exercises
1. A bottle is supposed to contain 250 ml of wine, with a standard deviation of 3 ml. If we sample 200 such bottles at random, find the probability that the average amount of wine contained in a bottle will be
(a) At most 248 ml.
(b) At least 252 ml.
(c) Between 249 and 251 ml.
2. If X is a Poisson random variable with mean 81, find the approximate probability
P (X ≥ 75).
4. Suppose that it is observed that the average lifespan of a certain machine part is 5 years, with a standard deviation of 1.2 years. By sampling 100 such parts, we obtain x̄ = 4.75. Construct a confidence interval for µ with confidence level
(a) 99% (b) 95% (c) 90% (d) 80%
Does the length of the intervals increase or decrease as the confidence level decreases?