Contents

1 Combinatorial analysis
  1.1 Fundamental Principles of Counting: Tree Diagram
  1.2 Factorial Function
  1.3 Permutations
  1.4 Combinations
  1.5 Exercises

4 Theory of sampling
  4.1 Sampling
  4.2 Sampling Statistics
  4.3 The Central Limit Theorem
  4.4 Sampling Distribution
    4.4.1 Population mean µ and variance σ² are known
    4.4.2 Population mean µ and variance σ² are both unknown
  4.5 Exercises

Author index
Index
Chapter 1
Combinatorial analysis
In some cases the number of possible outcomes for a particular event is not very large, and
so direct counting of possible outcomes is not difficult. However, problems often arise where
direct counting becomes a practical impossibility. In these cases use is made of
combinatorial analysis, which could be called a sophisticated way of counting.
We first introduce two rules which are employed in many proofs throughout combinatorial analysis: the Rule of Sum (if a thing A can be chosen in m different ways and a thing B in n different ways, then "A or B" can be chosen in m + n different ways) and the Rule of Product (stated below).
It should be noticed that in the Rule of Sum the choices of A and B are mutually exclusive, that is, one cannot choose both A and B but either A or B.
[Figure 1.1: Tree diagram for example (1.1), with k = 2 local circuits (L1, L2), l = 3 registers (R1, R2, R3), and m = 2 trunk circuits (T1, T2)]
The Rule of Product is often used in cases where the order of choosing is immaterial, that is, where the choices are independent. But in many practical situations the possibility of dependence should not be ignored.
If one thing can be accomplished in n1 different ways, and if after this a second thing can be
accomplished in n2 different ways,· · ·, and finally a k’th thing can be accomplished in nk
different ways, then all k things (which are assumed to be independent of each other) can
be accomplished in the specified order in n = n1 · n2 · · · nk different ways.
A diagram, called a tree diagram because of its appearance (fig.1.1), is often used in
connection with these rules.
Consider a switching system with
k local circuits: L1, L2, · · ·, Lk
l registers: R1, R2, · · ·, Rl
m trunk circuits: T1, T2, · · ·, Tm
A connection (one device of each kind) can then be set up in
n = k · l · m
different ways.
If a malfunction only occurs for a specific combination of devices, it can be very difficult to trace the fault, as it only appears in one out of 60,000 calls (assuming random hunting).
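As a quick illustration, the Rule of Product can be checked by brute-force enumeration. The following Python sketch (the device names follow the text; the counts k = 2, l = 3, m = 2 are the ones in Figure 1.1) lists every path through the tree diagram:

    # Enumerate all (local circuit, register, trunk) paths; the Rule of
    # Product says there are k * l * m of them.
    from itertools import product

    locals_ = ["L1", "L2"]              # k = 2 local circuits
    registers = ["R1", "R2", "R3"]      # l = 3 registers
    trunks = ["T1", "T2"]               # m = 2 trunk circuits

    paths = list(product(locals_, registers, trunks))
    assert len(paths) == len(locals_) * len(registers) * len(trunks)
    print(len(paths), paths[0])         # 12 ('L1', 'R1', 'T1')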
n! = n · (n − 1) · (n − 2) · · · 2 · 1 (1.1)
It is convenient to define
0! = 1 (1.2)
Example 2.1
9.999 · 10^99
A famous result giving an approximate expression for n! is Stirling's formula:
n! ≈ √(2πn) · n^n · e^(−n)    (1.3)
The symbol ≈ means that the ratio of the left side to the right side approaches 1 as n → ∞. For this reason we often call the right side an asymptotic expansion of the left side. The symbol ≃ means approximately equal to.
The gamma function, denoted by Γ(n), is defined for any real value of n > 0:
Γ(n) = ∫_0^∞ t^(n−1) · e^(−t) dt ,  n > 0    (1.4)
A recurrence formula is
Γ(n + 1) = n · Γ(n) (1.5)
We can easily find Γ(1) = 1. If n is a positive integer, then we get (1.1):
Γ(n + 1) = n!
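A small numerical check (a Python sketch using only the standard library) illustrates both the relation Γ(n + 1) = n! and Stirling's asymptotic formula; the last lines relate to Exercise 1 below, where 69! is evaluated via base-10 logarithms:

    import math

    def stirling(n: int) -> float:
        # Stirling's formula: n! ~ sqrt(2*pi*n) * (n/e)**n
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    n = 10
    exact = math.factorial(n)
    print(exact, stirling(n), math.gamma(n + 1))  # Gamma(n+1) equals n!
    print(stirling(n) / exact)                    # about 0.9917; ratio -> 1 as n grows

    # For Exercise 1: log10(69!) via the log-gamma function.
    log10_69fact = math.lgamma(70) / math.log(10)
    print(log10_69fact)                           # about 98.23, i.e. 69! ~ 1.71e98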
1.3 Permutations
Example 3.1
We consider the case where some objects are identical. The number of permutations of n objects consisting of groups of which n1 are identical, n2 are identical, · · ·, and nk are identical, where
n = n1 + n2 + · · · + nk
is
n! / (n1! · n2! · · · nk!) = \binom{n}{n1 n2 · · · nk}
The term on the right-hand side is called the polynomial (multinomial) coefficient.
Example 3.2
Let us consider a group of n circuits. We can look (hunt) for idle circuits in
P_n^n = n!
different ways.
1.4 Combinations
We have
C_r^n = \binom{n}{r} = n! / (r! · (n − r)!)    (1.9)
We can easily derive the expression by noticing that each combination of r different objects
may be ordered in r! ways, and so ordered it is an r-permutation.
Thus we have
r! · C_r^n = P_r^n = n · (n − 1) · · · (n − r + 1) ,  n ≥ r
The numbers \binom{n}{r} are often called binomial coefficients because they arise in the binomial expansion:
(x + y)^n = Σ_{r=0}^{n} \binom{n}{r} · x^r · y^(n−r)    (1.12)
They can be generalized in several ways. Thus we define:
\binom{−n}{r} = (−1)^r · \binom{n + r − 1}{r} ,  n > 0    (1.13)
Example 4.1
i.e.
Σ_{r=0}^{n} \binom{n}{r} = 2^n
Example 4.2
To form all the permutations of 4 letters taken 3 at a time it is necessary to take each
combination and write out all possible permutations of the given combination:
Combinations            Permutations
abc                     abc, acb, bac, bca, cab, cba
abd                     abd, adb, bad, bda, dab, dba
acd                     acd, adc, cad, cda, dac, dca
bcd                     bcd, bdc, cbd, cdb, dcb, dbc

With repetitions allowed the table becomes:

Combinations            Permutations
aaa, aab, aac, aad      aaa, aab, baa, aba, aac, caa, aca, aad, daa, ada
bbb, bba, bbc, bbd      bbb, bba, abb, bab, bbc, cbb, bcb, bbd, dbb, bdb
ccc, cca, ccb, ccd      ccc, cca, acc, cac, ccb, bcc, cbc, ccd, dcc, cdc
ddd, dda, ddb, ddc      ddd, dda, add, dad, ddb, bdd, dbd, ddc, cdd, dcd
abc                     abc, acb, bac, bca, cab, cba
abd                     abd, adb, bad, bda, dab, dba
acd                     acd, adc, cad, cda, dac, dca
bcd                     bcd, bdc, cbd, cdb, dcb, dbc
Example 4.3
Consider n elements divided into k groups, where
n = n1 + n2 + · · · + nk
The number of ways to choose r elements,
r = r1 + r2 + · · · + rk ,  ri ≤ ni
with ri taken from the i'th group, is
\binom{n1}{r1} · \binom{n2}{r2} · · · \binom{nk}{rk}
The total number of combinations with r elements is \binom{n}{r}.
Many combinatorial problems can be reduced to the following form. For a group of n
circuits, p of them are busy and (n − p) of them are idle. A group of k circuits is chosen at
random. We seek the number of combinations which contain exactly x busy circuits. Here x
can be any integer between zero and p or k, whichever is the smaller.
The chosen group contains x busy and k − x idle circuits. Since any choice of busy circuits may be combined with any choice of idle ones, the busy ones can be chosen in \binom{p}{x} different ways and the idle ones in \binom{n−p}{k−x} different ways. Thus the total number of combinations containing x busy circuits is
\binom{p}{x} · \binom{n−p}{k−x}    (1.15)
The total number of combinations containing k circuits (idle or busy) is \binom{n}{k}. So the proportion of combinations containing exactly x busy circuits is the ratio of (1.15) to \binom{n}{k}.
Example 4.4
• Equalities:
\binom{n}{r} = \binom{n}{n−r}    (1.20)
\binom{n}{r} = 0 ,  for r > n and for r < 0    (1.21)
\binom{n}{0} = 1    (1.22)
Σ_{r=0}^{n} \binom{n}{r} = 2^n    (1.24)
Σ_{i=r}^{n} \binom{i}{r} = \binom{n+1}{r+1}    (1.25)
These coefficients can be arranged in Pascal's triangle, in which each entry is the sum of the two entries above it:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
· · ·
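The identities (1.20), (1.24) and (1.25) are easy to verify numerically, for instance with this small Python check (n = 8 and r = 3 are arbitrary test values):

    from math import comb

    n, r = 8, 3
    assert all(comb(n, i) == comb(n, n - i) for i in range(n + 1))   # (1.20)
    assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n           # (1.24)
    assert sum(comb(i, r) for i in range(r, n + 1)) == comb(n + 1, r + 1)  # (1.25)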
• Summary:
Permutation = n! / (n−r)!            if repetitions are not allowed
            = n^r                    if repetitions are allowed
Combination = n! / (r! · (n−r)!)     if repetitions are not allowed
            = (n+r−1)! / (r! · (n−1)!)  if repetitions are allowed
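The four formulas in the summary can be wrapped up as follows (a Python sketch; math.perm and math.comb implement n!/(n−r)! and n!/(r!(n−r)!) directly, and \binom{n+r−1}{r} equals (n+r−1)!/(r!(n−1)!)). The printed values 24, 64, 4 and 20 correspond to 4 letters taken 3 at a time, as in Example 4.2:

    from math import comb, perm

    def permutations(n, r, repetition=False):
        return n ** r if repetition else perm(n, r)

    def combinations(n, r, repetition=False):
        return comb(n + r - 1, r) if repetition else comb(n, r)

    print(permutations(4, 3), permutations(4, 3, True))   # 24, 64
    print(combinations(4, 3), combinations(4, 3, True))   # 4, 20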
1.5 Exercises
1. Evaluate 69! by using Stirling's approximation to n! (use logarithms to base 10). Compare the result with the value given in Example (2.1).
3. How many 6-digit telephone numbers can be formed with the digits 0,1,2,· · ·, 9 (0 is
not allowed in the first digit) if
(a) repetitions are allowed?
(b) repetitions are not allowed?
(c) the last digit must be 0 and repetitions are not allowed?
4. We consider a 4-digit binary number. Every digit may be "0" or "1". How many different numbers have two "0"s (and two "1"s)?
7. A trunk group contains 10 circuits. 7 circuits are busy. What is the number of
combinations which contain X(X = 0, 1, 2, 3, 4) busy circuits?
Chapter 2
Elements of probability theory
Probability theory deals with the study of events whose occurrence cannot be predicted in advance. These kinds of events are termed random events. For example, when throwing a single die, the result may be one of the six numbers: 1, 2, 3, 4, 5, 6. We cannot predict the result. So the outcome of throwing a die is a random event. When observing the number of telephone calls arriving at a telephone exchange during a certain time interval, we are of course unable to predict the actual number of arriving calls. This is also a random event.
Probability theory is usually discussed in terms of experiments and possible outcomes of the
experiments. The set theory plays an important role in the study of probability theory.
A set is a collection of objects called elements of the set. In general we shall denote a set by
a capital letter such as A, B, C, etc. and an element by a lower case letter such as a, b, c, etc.
A set can be defined by listing its elements. If the set A consists of the elements a, b, c, then
we write
A = {a, b, c}
A set can also be defined by describing some properties held by all elements and by
non-elements. We call a set a finite (or infinite) one if it contains a finite (or infinite)
number of elements.
Example 1.1
Example 1.2
If every element of a set A also belongs to a set B, then we call A a subset of B, written:
A ⊆ B ("A is contained in B") or
B ⊇ A ("B contains A")
For all sets we have A ⊆ A. If both A ⊆ B, and B ⊆ A, then A and B are said to be equal,
and we write A = B. In this case A and B have exactly the same elements.
The set consisting of the combinations of the letters a, b, c, and d taken three at a time is
a proper subset of the set consisting of the permutations of the same four letters taken three
at a time.
The set consisting of successful call attempts between 9 a.m. and 10 a.m. is a subset of all
call attempts during the same period.
All sets considered are in general assumed to be subsets of some fixed set called the universe
or the universal set, and denoted by U. It is also useful to define a set having no elements
at all. This is called the null set and is denoted by ∅.
A universe U can be shown graphically by the set of points inside a rectangle (fig. 2.1).
Subsets of U (such as A and B shown in fig. 2.1) can be represented by sets of points inside
circles. Such a diagram is called a Venn Diagram. It often serves to provide geometric
intuition regarding possible relationships between sets.
[Figure 2.1: Venn diagram of the universe U containing the subsets A and B]
2.1.2 Operators
[Figures: Venn diagrams illustrating operations on the sets A and B within the universe U, e.g. union, intersection and complement]
Set operations are similar to those of Boolean algebra as seen from the following theorems.
1. Idempotent laws:
A ∪ A = A    A ∩ A = A
2. Commutative laws:
A ∪ B = B ∪ A    A ∩ B = B ∩ A
3. Associative laws:
A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C
4. Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
5. Identity laws:
A ∪ ∅ = A    A ∩ ∅ = ∅
A ∪ U = U    A ∩ U = A
6. De Morgan's laws:
C(A ∪ B) = C(A) ∩ C(B)
C(A ∩ B) = C(A) ∪ C(B)
7. Complement laws:
A ∪ C(A) = U    A ∩ C(A) = ∅
C(C(A)) = A    C(U) = ∅    C(∅) = U
Any true result involving sets remains true if we replace unions by intersections, intersections by unions, sets by their complements, and reverse the inclusion symbols ⊂ and ⊃.
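Since Python's built-in sets implement exactly this algebra, the laws can be spot-checked mechanically; the following sketch verifies the distributive, De Morgan and complement laws on an arbitrary small universe:

    U = set(range(10))
    A, B, C_ = {1, 2, 3}, {3, 4, 5}, {5, 6}

    def complement(X):
        return U - X                    # C(X) is the complement U - X

    assert A & (B | C_) == (A & B) | (A & C_)                  # distributive law
    assert complement(A | B) == complement(A) & complement(B)  # De Morgan
    assert complement(A & B) == complement(A) | complement(B)  # De Morgan
    assert A | complement(A) == U and A & complement(A) == set()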
Experiments are of great importance in science and engineering. Experiments in which the
outcome will not be the same, even though the conditions are nearly identical, are called
random experiments and are subject to study by probability theory.
The set of all possible outcomes of an experiment is called a sample space, denoted by S. An individual outcome is called a sample point, which is an element of the set S.
Example 2.1
Example 2.2
Example 2.3
Example 2.4
If the sample space has a finite number of points, it is called a finite sample space (Examples 2.1 and 2.2). If it has as many points as there are natural numbers, it is called a countably infinite sample space (Example 2.3). In both cases it is called a discrete sample space. If it has as many points as there are points in some interval, it is called a non-countable infinite sample space or a continuous sample space (Example 2.4).
An event is a subset A of the sample space, i.e. it is a set of possible outcomes. If the
outcome of an experiment is A we say that the event A has occurred or that A is a
realization of the experiment.
An event which consists of one sample point of S is called a simple event. It cannot be broken down into other events. A compound event is the aggregate of a number of simple events. A sure event is an event that will definitely occur. Naturally, an impossible event never occurs.
Example 2.5
For a group of 12 lines the compound event that exactly 2 circuits are busy consists of \binom{12}{2} = 66 simple events.
Since events are sets, statements concerning events can be translated into the language of
set theory and conversely. We can represent events graphically on a Venn Diagram, and we
also have an algebra of events corresponding to the algebra of sets given in section 2.1. By
using the operators of section 2.1 on events in S we can obtain other events in S. For example, if A and B are events, then A ∪ B, A ∩ B, and the complement C(A) are also events.
If A ∩ B = ∅, that is the sets corresponding to events A and B are disjoint, then both
events cannot occur simultaneously. They are mutually exclusive.
A set of events is termed exhaustive if their union is the entire sample space S.
Example 2.6
Consider a trunk group having four trunk circuits. The experiment is to observe the state of
the trunks: busy or idle. A sample space S for this experiment of observing trunk circuits 1
through 4 may be the set of four-tuples (a1, a2, a3, a4), where ai is either 1 or 0, indicating that the i'th trunk circuit is busy or idle. Thus the sample point (1, 0, 1, 0) corresponds to the outcome that the first and the third circuits are busy while the second and the fourth circuits are idle. The sample space consists of 2^4 = 16 sample points.
Let A be an event in which at least two circuits are idle, and let B be an event in which no
more than two circuits are idle. Then A ∪ B is the whole space S. A ∩ B is the collection of
elements of S in which just two trunks are idle. If C is the event that exactly one circuit is
idle, then A ∩ C = ∅ that is to say A and C have no sample points in common.
Probability is a positive measure between 0 and 1 associated with each simple event, the total of all simple event probabilities being 1. From a strict mathematical point of view it is difficult to define the concept of probability. We shall use a relative frequency approach (the a posteriori approach).
In early or classical probability theory, all sample spaces were assumed to be finite, and
each sample point was considered to occur with equal frequency. The definition of the
probability P of an event A was described by the relative frequency by which A occurs:
P(A) = h / n
where h is the number of sample points in A and n is the total number of sample points.
This definition is applicable in some cases as the following example.
According to Example 2.6, there are 2^4 = 16 sample points. A is the event that at least two trunk circuits are idle, B is the event that at most two circuits are idle, and C is the event that exactly one circuit is idle. We use combinatorial analysis to get the probabilities:
h_A = \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 11
and P(A) = h_A / n = 11/16 = 0.6875
h_B = \binom{4}{0} + \binom{4}{1} + \binom{4}{2} = 11
and P(B) = h_B / n = 11/16 = 0.6875
h_{A∪B} = Σ_{i=0}^{4} \binom{4}{i} = 16
and P(A ∪ B) = 16/16 = 1.
h_{A∩B} = \binom{4}{2} = 6
and P(A ∩ B) = 6/16 = 0.375
h_C = \binom{4}{1} = 4
and P(C) = 4/16 = 0.25
h_{A∩C} = 0
and P(A ∩ C) = 0/16 = 0.
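The whole calculation can be replayed by enumerating the 2^4 = 16 sample points of Example 2.6 and counting, in line with the classical definition P(A) = h/n (a Python sketch; 1 denotes busy and 0 idle):

    from itertools import product

    S = list(product((0, 1), repeat=4))       # the 16 four-tuples

    A = [s for s in S if s.count(0) >= 2]     # at least two circuits idle
    B = [s for s in S if s.count(0) <= 2]     # at most two circuits idle
    C = [s for s in S if s.count(0) == 1]     # exactly one circuit idle

    n = len(S)
    print(len(A) / n)                                     # 11/16 = 0.6875
    print(len(B) / n)                                     # 11/16 = 0.6875
    print(len([s for s in S if s in A and s in B]) / n)   # 6/16 = 0.375
    print(len(C) / n, len([s for s in C if s in A]) / n)  # 0.25, 0.0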
Let us consider an experiment with sample space S. Let h be the number of times that the event A occurs in n repetitions of the experiment. Then we define the probability of A by
P(A) = lim_{n→∞} h / n    (2.1)
Thus the probability of an event is the proportion of all experiments in which this event
occurs when we make a very large number of experiments. From the definition we obtain a
number of basic properties:
P (∅) = 0 (2.3)
P (S) = 1 (2.4)
If S is a continuous sample space, then the probability of any particular sample point is zero. We therefore need the concept of a probability density:
P(A) = ∫_A p(s) ds    (2.7)
This is similar to the discrete case (2.5), and all laws of probability still apply if we replace
summation by integration.
Example 3.2
In Example 2.4 we have a continuous sample space {9 ≤ t ≤ 10}. The probability of a call at 9:30 sharp is zero. If we assume the call is equally likely to occur anywhere between 9 and 10, then the density function becomes (1 hour)^(−1) = (60 minutes)^(−1). The probability of a call between 9:29 and 9:31 then becomes 2/60 = 1/30.
Example 3.3
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
In some cases enough is known about the experiment to enumerate all possible outcomes
and to state that these are equally likely. The probability of A is then equal to the ratio of
the number of outcomes in which A is realized to the total number of outcomes.
Combinatorial analysis is useful to find the relevant number of outcomes.
If all possible combinations are equally likely, then the probability of finding k busy circuits is given by (1.18) or (1.19).
Given that the first trunk circuit is known to be idle, we may be interested in the probability P that there are at least two other circuits idle. It is obvious that "the first circuit is idle" is an event, and that "at least two other circuits are idle" is also an event. So P is the probability that one event occurs under the condition that another event has occurred. This kind of probability is naturally called conditional probability.
We consider an experiment. If it is known that an event B has already occurred, then the probability that the event A has also occurred is known as the conditional probability. This is denoted by P(A | B), the conditional probability of A given B, and it is defined by
P(A | B) = P(A ∩ B) / P(B)    (2.10)
i.e. among the experiments in which B is realized, the proportion in which A (and B) is realized as well.
Example 3.5
Let A denote the event ”a group of 5 circuits (a, b, c, d, e) contains 2 calls only which occupy
adjacent circuits”. Let B denote the event ”a group of 5 circuits contains 2 calls only, one of
which occupies the circuit a”.
From combinatorial analysis we know that 2 lines can be busy in \binom{5}{2} = 10 different ways. Therefore we get P(B) = 4/10 (ab, ac, ad, or ae occupied), and P(A ∩ B) = 1/10 (ab occupied).
P(A | B) = (1/10) / (4/10) = 1/4
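The same numbers fall out of a direct enumeration of the 10 equally likely pairs of busy circuits (a Python sketch of Example 3.5):

    from itertools import combinations

    pairs = list(combinations("abcde", 2))           # 10 equally likely outcomes
    adjacent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
    B = [p for p in pairs if "a" in p]               # one call occupies circuit a
    A_and_B = [p for p in B if p in adjacent]        # only ("a", "b") qualifies
    print(len(B) / len(pairs))                       # P(B)     = 4/10
    print(len(A_and_B) / len(pairs))                 # P(A ∩ B) = 1/10
    print(len(A_and_B) / len(B))                     # P(A | B) = 1/4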
A and B are said to be (statistically) independent events when the probability of event A does not depend on whether B has occurred or not:
P (B | A) = P (B)
or
P (A | B) = P (A)
Example 3.6
If the probability of getting B-busy (called party busy) at a random point of time is 0.10,
then the probability of getting B-busy in two call-attempts at two different days is
0.10 · 0.10 = 0.01.
Let A1, A2, · · ·, Ak be mutually exclusive events whose union is the sample space S (exhaustive). Then for any event A we have
P(Ai | A) = P(Ai) · P(A | Ai) / Σ_{j=1}^{k} P(Aj) · P(A | Aj)    (2.13)
From this we can find the probabilities of the events A1, A2, · · ·, Ak which can cause A to occur. In this way we can thus obtain P(A | B) from P(B | A), which in general is not possible.
Example 3.7
A and B are two persons in two different places calling a third person C. It has been found that A talks with C on average 9 times during the same time as B talks with C 10 times. On average, out of 100 call attempts, A succeeds 80 times and B 70 times. During this trial, C's telephone is ringing, but it is unknown which person is calling successfully. What is the probability that C's incoming call is from B?
Let Ta and Tb be the events that the call is from A and B respectively. Considering the ratio of the average number of successful calls from A to the average number of successful calls from B, we have P(Ta) = 0.9 · P(Tb). Let D be the event that a call reaches C. According to the data, we have P(D | Ta) = 0.8 and P(D | Tb) = 0.7. Using Bayes' formula:
P(Tb | D) = P(Tb) · P(D | Tb) / (P(Ta) · P(D | Ta) + P(Tb) · P(D | Tb))
          = 0.7 · P(Tb) / (0.9 · P(Tb) · 0.8 + 0.7 · P(Tb)) = 0.493
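A minimal sketch of this Bayes calculation: since only the ratio P(Ta) = 0.9 · P(Tb) is known, unnormalized prior weights can be used, as they cancel in (2.13):

    # Unnormalized prior weights proportional to P(Ta) and P(Tb).
    w_a, w_b = 0.9, 1.0
    p_d_given_a, p_d_given_b = 0.8, 0.7

    posterior_b = (w_b * p_d_given_b) / (w_a * p_d_given_a + w_b * p_d_given_b)
    print(round(posterior_b, 3))   # 0.493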
2.8 Exercises
1. Telephones returned to a workshop for repair are subject to three kinds of defects,
namely A, B and C. A sample of 1000 pieces was inspected with the following results:
5. Let the sample space in exercise 2 correspond to the throw of a die. We assign equal
probability to the sample points. Find the probability P (B | A).
6. Determine the probability of three sixes in five throws of a fair die.
7. A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random
without replacement, determine the probability that:
(a) all 3 are red
(b) all 3 are white
(c) 2 are red and 1 is blue
(d) at least 1 is white
(e) 1 of each color is drawn
(f) balls are drawn in the order of red, white, blue.
8. Suppose six dice are thrown simultaneously. What is the probability of getting
(a) all faces alike
(b) no two faces alike
(c) only five different faces
Chapter 3
Elements of mathematical statistics
Let us assign a real number to each point of a sample space, i.e. each sample point has a
single real value. This is a function defined on the sample space. This function is called a
stochastic function, and the result of a given experiment which generates sample points is
called a stochastic variable (a random variable). Actually this variable is a
stochastic function defined on the sample space.
Example 1.1
Let X be a discrete stochastic variable which can take the values x1 , x2 , · · ·, xn (finite
number or countably many values). If these values are assumed with probabilities given by
P {X = xk } = f (xk ) (3.2)
then we introduce the (probability) density function (frequency function) denoted by
P {X = x} = f (x) (3.3)
For x = xk this reduces to (3.2), while for other values of x we have f (x) = 0.
The distribution function is obtained from the density function by noting that
F(x) = P{X ≤ x} = Σ_{u≤x} f(u)    (3.6)
Let us define a discrete random variable X that counts the number of busy circuits in a trunk group of 4 trunks. X takes only the values 1, 2, 3 and 4. Let the probabilities be given by p(1) = 0.40, p(2) = 0.35, p(3) = 0.15 and p(4) = 0.10.
Then we can get the distribution function F (x) of the discrete random variable X as follows:
F (0) = 0
F (1) = p(1) = 0.40
F (2) = p(1) + p(2) = 0.75
F (3) = p(1) + p(2) + p(3) = 0.90
F (4) = p(1) + p(2) + p(3) + p(4) = 1.00
[Figure: The distribution function F(x) of X, a staircase function rising from 0 through 0.40, 0.75 and 0.90 to 1.00; vertical axis: probability]
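A short sketch showing how the staircase values of F arise as cumulative sums of the point probabilities p(k) used above:

    from itertools import accumulate

    p = {1: 0.40, 2: 0.35, 3: 0.15, 4: 0.10}        # point probabilities
    F = dict(zip(p, accumulate(p.values())))         # F(x) = sum of p(k) for k <= x
    print({k: round(v, 2) for k, v in F.items()})    # {1: 0.4, 2: 0.75, 3: 0.9, 4: 1.0}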
When X is a continuous stochastic variable, the probability that X takes any one particular
value is in general zero. We noticed, however, in Chapter 2 (2.7) that the probability that
X lies between two different values is meaningful. In fact "a < X ≤ b" is the event corresponding to the set ]a, b].
By analogy to (3.6) we define the distribution function F(x) for a continuous stochastic variable by
F(x) = P{X ≤ x} = P{−∞ < X ≤ x} = ∫_{−∞}^{x} f(u) du    (3.10)
In general, we have
f (x) = F 0 (x) (3.11)
Example 3.1
[Figure: A continuous distribution function F(x), increasing smoothly from 0 to 1]
We are often interested in two stochastic variables X and Y at the same time. These may
be the outcomes of 2 experiments, or they may be a pair of figures emerging from a single
experiment.
(X, Y ) can be regarded as taking values in the product space (S × T ) consisting of all pairs
(s, t) with s ∈ S and t ∈ T .
f (x, y) = P {X = x, Y = y} (3.13)
where
f (x, y) ≥ 0 (3.14)
Σ_{u,v} f(u, v) = 1    (3.15)
f (xj , yk ) = P {X = xj , Y = yk } (3.16)
The continuous case is easily obtained by analogy by replacing sums by integrals. It is also
obvious how the mixed case (discrete - continuous) should be dealt with.
If the events X = x and Y = y are independent for all x and y, then we say that X and Y
are independent stochastic variables. In this case
P {X = x, Y = y} = P {X = x} · P {Y = y} (3.19)
or equivalently
f (x, y) = f (x) · f (y) (3.20)
Example 4.1
Consider again two consecutive throws of a die. Let X and Y correspond to the results of the first and second throw. We can easily see that X and Y are independent, and x, y ∈ {1, 2, 3, 4, 5, 6}, each with probability 1/6. Hence the two-dimensional variable (X, Y) takes on the pairs of values (i, j) where i, j ∈ {1, 2, 3, 4, 5, 6}.
Let Z = X + Y be the sum of the two throws. Then
P(Z = 2) = P(X = 1, Y = 1) = P(X = 1) · P(Y = 1) = 1/36
P(Z = 3) = P(X = 1, Y = 2) + P(X = 2, Y = 1) = 1/18
· · ·
P(Z = 12) = P(X = 6, Y = 6) = P(X = 6) · P(Y = 6) = 1/36
It is easy to verify that Σ_{i=2}^{12} P(Z = i) = 1.
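The verification is immediate by enumeration (a Python sketch using exact fractions):

    from itertools import product
    from fractions import Fraction

    counts = {}
    for i, j in product(range(1, 7), repeat=2):      # 36 equally likely pairs
        counts[i + j] = counts.get(i + j, 0) + 1

    dist = {z: Fraction(c, 36) for z, c in sorted(counts.items())}
    print(dist[2], dist[3], dist[12])                # 1/36, 1/18, 1/36
    print(sum(dist.values()))                        # 1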
For a discrete stochastic variable X taking the possible values x1, x2, · · ·, xn we define the expectation of X, or the mean of X, as follows:
E(X) = Σ_{j=1}^{n} xj · P{X = xj} = Σ_{j=1}^{n} xj · f(xj)    (3.21)
For the continuous case the expectation of X with density function f(x) is defined in a similar way:
E(X) = ∫_{−∞}^{∞} x · f(x) dx    (3.22)
More generally, for a function g(X) of the stochastic variable we have:
Discrete case:
E(g(X)) = Σ_{j=1}^{n} g(xj) · f(xj)    (3.23)
Continuous case:
E(g(X)) = ∫_{−∞}^{+∞} g(x) · f(x) dx    (3.24)
Example 5.1
Assume that the random variable X can take on two values, x1 = −1 with probability p1 = 0.2 and x2 = 1 with probability p2 = 0.8.
Example 5.2
For a Poisson distributed variable (cf. (3.39) below) we get
E(X) = Σ_{x=0}^{∞} x · (λ^x / x!) · e^(−λ) = λ · e^(−λ) · Σ_{x=1}^{∞} λ^(x−1)/(x−1)! = λ · e^(−λ) · e^λ = λ
Example 5.3
1. E(c) = c
2. E(c · X) = c · E(X)
Discrete case:
α_r = Σ_j xj^r · f(xj)    (3.25)
Continuous case:
α_r = ∫_{−∞}^{+∞} x^r · f(x) dx    (3.26)
We notice that:
α_1 = E(X) ,  α_2 = E(X²)
The r’th moment of a stochastic variable X about a is defined by E((X − a)r ).
Discrete case:
µ_r = Σ_j (xj − E(X))^r · f(xj)    (3.27)
Continuous case:
µ_r = ∫_{−∞}^{+∞} (x − E(X))^r · f(x) dx    (3.28)
Of particular interest is the 2nd moment about the mean. This is called the variance:
Var(X) = µ_2 = E((X − E(X))²)    (3.29)
which can also be written
Var(X) = α_2 − α_1²    (3.30)
This is a non-negative number. The square root of the variance is called the standard deviation.
1. V ar(c) = 0
2. V ar(cX) = c2 · V ar(X)
Example 5.4
1. In Example 5.1:
α_1 = E(X) = 0.6
α_2 = E(X²) = (−1)² · 0.2 + 1² · 0.8 = 1
µ_2 = Var(X) = α_2 − α_1² = 0.64
2. In Example 5.2:
α_1 = λ
α_2 = E(X²) = Σ_{x=0}^{∞} x² · (λ^x / x!) · e^(−λ) = λ² + λ
µ_2 = Var(X) = α_2 − α_1² = λ
3. In Example 5.3:
α_1 = E(X) = 0
α_2 = E(X²) = (1/√(2π)) · ∫_{−∞}^{+∞} x² · e^(−x²/2) dx = 1
µ_2 = Var(X) = α_2 − α_1² = 1
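The discrete cases can be checked numerically; the sketch below reproduces case 1 exactly and case 2 by truncating the Poisson sum (λ = 2.5 is an arbitrary test value):

    from math import exp, factorial

    # Case 1: the two-point variable of Example 5.1.
    xs, ps = [-1, 1], [0.2, 0.8]
    a1 = sum(x * p for x, p in zip(xs, ps))        # alpha_1 = E(X)   = 0.6
    a2 = sum(x**2 * p for x, p in zip(xs, ps))     # alpha_2 = E(X^2) = 1
    print(a1, a2 - a1**2)                          # 0.6 0.64

    # Case 2: the Poisson variable; the infinite sums are truncated at
    # x = 60, which is ample for lam = 2.5.
    lam = 2.5
    a1 = sum(x * lam**x / factorial(x) * exp(-lam) for x in range(60))
    a2 = sum(x**2 * lam**x / factorial(x) * exp(-lam) for x in range(60))
    print(round(a1, 6), round(a2 - a1**2, 6))      # both equal lam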
The results given above can be extended to two or more variables having joint density functions, e.g. f(x, y):
E(X) = Σ_u Σ_v u · f(u, v)    (3.32)
An interesting quantity arising in the case of two variables is the covariance, defined by
Cov(X, Y) = E((X − E(X)) · (Y − E(Y)))
Thus we are led to a measure of the dependence of the variables X and Y given by
ρ = Cov(X, Y) / (Var(X) · Var(Y))^(1/2)    (3.35)
Later on we will realize that many important discrete variables arise from the concept of a Bernoulli sequence of experiments. A Bernoulli experiment is a trial with only two possible outcomes. We normally call the two outcomes "success" and "failure", with respective probabilities p and 1 − p. A sequence of such experiments is called a Bernoulli sequence if all the experiments have the same probability of "success" or "failure".
We now consider the Binomial distribution. This distribution is given by (ref. Table 3.1)
P{X = x} = \binom{n}{x} · p^x · (1 − p)^(n−x) ,  x = 0, 1, 2, · · ·, n    (3.36)
We find
E(X) = n · p (3.37)
V ar(X) = np(1 − p) (3.38)
This distribution applies if one makes n independent Bernoulli experiments in which the
probability of ”success” in each experiment is p. P {X = x} is the probability of there being
exactly x ”successes”. This probability is derived by combinatorial analysis.
Example 6.1
This model is usable when we make n test-calls and observe how many of them are unsuccessful.
The Poisson distribution is given by
P{X = x} = (λ^x / x!) · e^(−λ) ,  x = 0, 1, 2, · · ·    (3.39)
We find (cf. Example 5.2 & 5.4):
E(X) = λ (3.40)
V ar(X) = λ (3.41)
This distribution is obtained as the limit of the Binomial distribution when we increase n and at the same time reduce p, keeping n · p constant and equal to λ.
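The limit is easy to watch numerically; the sketch below fixes n · p = λ = 4 and compares P{X = 5} for growing n with the Poisson value (all numbers are chosen for illustration):

    from math import comb, exp, factorial

    lam, x = 4.0, 5
    for n in (10, 100, 1000):
        p = lam / n                                   # keep n * p = lam fixed
        binom = comb(n, x) * p**x * (1 - p)**(n - x)  # Binomial (3.36)
        print(n, round(binom, 5))                     # tends to the Poisson value
    print("poisson", round(lam**x / factorial(x) * exp(-lam), 5))   # 0.15629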
Example 6.2
If we examine a large number of subscribers, each of which has a small probability of being busy, then the number found busy will follow a Poisson distribution.
Example 6.3
The number of calls incoming to an exchange during one hour will also follow a Poisson distribution.
The random variable which in a Bernoulli sequence counts the number of trials needed to get the first success is called a geometric random variable. It is described by the Geometric distribution. In some cases we don't include the successful trial, so that the values assumed are k = 0, 1, · · ·. The geometric distribution is shown in Table 3.1. Notice that this distribution includes the success (k = 1, 2, · · ·). By adding k geometric distributions we get the Negative Binomial distribution (Pascal distribution), which is also shown in Table 3.1. In Chapter 1 we indicated the Hypergeometric distribution (formula (1.17)). From Table 3.1 we notice the close relationship between the Binomial, the Geometric and the Negative Binomial distributions.
The Normal distribution has the density function
f(t) = (1 / (σ√(2π))) · exp(−(1/2) · ((t − µ)/σ)²) ,  −∞ < t < +∞    (3.42)
The standard Normal distribution has a mean of 0 and variance of 1, and forms the basis for tables of the Normal distribution. The properties of other Normal distributions are obtained from these tables by working in terms of the quantity (t − µ)/σ.
This distribution is called the negative exponential distribution in teletraffic theory. The density and distribution functions are
f(t) = λ · e^(−λt) ,  t ≥ 0, λ > 0
and
F(t) = 1 − e^(−λt) ,  t ≥ 0, λ > 0    (3.46)
respectively.
We have:
E(T) = 1/λ    (3.47)
Var(T) = 1/λ²    (3.48)
This is one of the most important distributions in teletraffic theory.
The gamma distribution has the density function
f(t) = ((λt)^(k−1) / Γ(k)) · λ · e^(−λt)    (3.49)
The Erlang-k distribution has density and distribution functions
f(t) = (λk / (k − 1)!) · (λkt)^(k−1) · e^(−λkt) ,  λ > 0    (3.50)
F(t) = 1 − e^(−λkt) · Σ_{j=0}^{k−1} (λkt)^j / j!    (3.51)
and
E(T) = 1/λ ,  Var(T) = 1/(kλ²)    (3.52)
When k = 1, the Erlang-k distribution is identical to the exponential distribution. As k → ∞ we have Var(T) → 0, and the random variable becomes constant.
From Table 3.1 we notice the close relationship between the Exponential, the Poisson and
the Erlang-k distributions. We also notice the relationship to the discrete cases.
Example 7.1
The holding time of a control device has been found to be Erlang-5 distributed with average
value of 500 milliseconds. What is the probability that the holding time does not exceed
750 milliseconds?
Let X be the random variable; we have E(X) = 500 ms, so that λ = 1/500 (ms)^(−1). The probability that X does not exceed 750 milliseconds is given by F_X(750), where λkx = 5 · 750/500 = 7.5:
F_X(750) = 1 − e^(−λkx) · Σ_{j=0}^{k−1} (λkx)^j / j!
         = 1 − e^(−7.5) · Σ_{j=0}^{4} (7.5)^j / j! = 0.8679
3.8 Exercises
1. The number of calls arriving on a group of devices in a telephone system was recorded on a counter. The counter was read off every 3 minutes. The following values xi were recorded:
06 08 08 07 07 06 09 07 06 03
09 07 11 07 12 09 13 08 09 05
08 15 09 09 19 16 10 11 11 15
17 12 16 14 15 14 09 10 14 14
2. Prove (3.30).
Chapter 4
Theory of sampling
4.1 Sampling
We are often interested in drawing conclusions about a large set of objects, which we shall
call a population . The population size N can be finite or infinite. Instead of examining the
entire population (doing this is often impossible in practice) we observe only a
sample of size n, which is a subset of the population. The process of obtaining samples is
called sampling. The purpose is to obtain some knowledge about the population from
results found in the sample.
Sampling where each element of a population may be chosen more than once is called
sampling with replacement. In sampling without replacement each element cannot be
chosen more than once.
By taking random samples from the population these may be used to obtain estimates of
the population parameters. An important problem in sampling theory is to decide how to form the sample statistics which will best estimate a given population parameter.
observations: x1, x2, · · ·, xn
sample size: n
sample mean: x̄ = (1/n) · Σ_{i=1}^{n} xi
sample variance: s² = (1/n) · Σ_{i=1}^{n} (xi − x̄)²
These statistics are functions of stochastic variables and are therefore stochastic variables themselves.
The unknown population mean and variance are estimated by the following unbiased estimators:
µ = E{x̄}    (4.3)
σ² = E{ŝ²} = E{(n/(n−1)) · s²}    (4.4)
(these important results are proven in mathematical statistics)
We now want to know how accurate these results are.
If a sample of size n is taken from a population with finite mean µ and finite variance σ² (and otherwise any statistical distribution), then as n increases the distribution of the sample mean x̄ is asymptotically Normal (cf. section 3.7) with mean value µ and variance σ²/n. Or equivalently, the distribution of
Z = (x̄ − µ) / (σ/√n)    (4.5)
approaches the standard Normal distribution N(0, 1) as n → ∞.
Example 4.1
(x̄ − p) / √(p(1−p)/n) → N(0, 1)  (n → ∞)
or
(Σ_{i=1}^{n} Xi − np) / √(np(1−p)) → N(0, 1)  (n → ∞)
Since we know from section 6 of Chapter 3 that Sn = Σ_{i=1}^{n} Xi is binomially distributed, the above expression shows that for large n an approximation to binomial probabilities can be obtained by using the Normal probabilities of N(np, np(1 − p)).
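As a sketch, the quality of this Normal approximation can be checked with the standard-library error function (n = 100, p = 0.3 and the event {X ≤ 25} are arbitrary illustration values; a continuity correction of 0.5 is applied):

    from math import comb, erf, sqrt

    n, p = 100, 0.3
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    def phi(z):                          # standard Normal distribution function
        return 0.5 * (1 + erf(z / sqrt(2)))

    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 26))
    approx = phi((25.5 - mu) / sigma)    # continuity-corrected approximation
    print(round(exact, 4), round(approx, 4))   # P{X <= 25}: about 0.16 each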
A sample statistic, which is calculated from a sample, is a function of random variables and is therefore itself a random variable. The probability distribution of a sample statistic is called the sampling distribution of the statistic. We shall only consider two sampling distributions for the sample mean.
Suppose that the population from which samples are taken has a probability distribution with mean value µ and variance σ² (not necessarily a Normal distribution). Then it can be shown that the sampling distribution of x̄ is asymptotically Normal with mean µ and variance σ²/n, i.e.:
Z = (x̄ − µ) / (σ/√n) → N(0, 1)  for n → ∞    (4.6)
If we choose a so-called confidence level 1 − α, then we can expect to find x̄ lying between the confidence limits
µ ± z_{1−α/2} · σ/√n    (4.7)
(1 − α) · 100% of the time.
The interval (µ − z_{1−α/2} · σ/√n , µ + z_{1−α/2} · σ/√n) is called the confidence interval. z_{1−α/2} is obtained from the standard Normal distribution:
P{−∞ < T ≤ z_{1−α/2}} = 1 − α/2    (4.8)
Example 4.2
For some values of α we have the following values of z:
α      z_{1−α/2}
10%    1.6449
5%     1.9600
1%     2.5758
Thus (section 3.7) 2.5% of probability (area under the density function) is above t = 1.9600,
and (because of symmetry) 2.5% is below t = −1.9600.
In most practical applications we do not know the population parameters µ and/or σ². Then we estimate these parameters by the sample mean x̄ and the sample variance ŝ², respectively. It can be shown that the standardized sample mean then has a so-called (Student) t-distribution. The confidence interval becomes:
x̄ ± t_{1−α/2, n−1} · ŝ/√n    (4.9)
where the t-value is obtained from a table of the t-distribution, which has an additional parameter: degrees of freedom = n − 1. For increasing n this distribution is asymptotically Normal:
lim_{n→∞} t_{1−α/2, n−1} = z_{1−α/2}    (4.10)
The t-value yields a larger confidence interval than the z-value (we have less information because we don't know the population mean and variance), but for large values of n and for most practical purposes we often use the z-value.
Example 4.3
n     t_{0.975, n}
1     12.71
2     4.30
5     2.57
10    2.23
20    2.09
50    2.01
For a given confidence level we have a relation between the confidence limits (confidence
interval) and the sample size. If we want to reduce the confidence interval by a factor c,
then we must increase the sample size by a factor c2 .
Example 4.4
The average holding time of calls during a certain period in a telephone system is to be estimated. Based on a random sample of 100 holding times, the sample mean and sample variance are calculated as x̄ = 5.74 time units and ŝ² = 2.65 (time units)². Find a 95% confidence interval for the true average holding time of calls in that period.
Let µ denote the true average holding time of calls. The confidence interval for µ, based on formula (4.9), is
(x̄ − (ŝ/√n) · t_{1−α/2, n−1} ,  x̄ + (ŝ/√n) · t_{1−α/2, n−1})
where n = 100, 1 − α = 0.95, x̄ = 5.74, ŝ = √2.65 ≈ 1.63. We have t_{1−α/2, n−1} = 1.984. Therefore the 95% confidence interval is approximately 5.74 ± 1.984 · 1.63/10, i.e. (5.42, 6.06).
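The arithmetic of Example 4.4 in a few lines (a sketch; the t-value 1.984 for 99 degrees of freedom is the one quoted above):

    from math import sqrt

    n, xbar, s2 = 100, 5.74, 2.65
    s_hat = sqrt(s2)                     # sample standard deviation
    t = 1.984                            # t-value for 99 d.o.f. at the 95% level
    half_width = t * s_hat / sqrt(n)
    print(round(xbar - half_width, 2), round(xbar + half_width, 2))  # 5.42 6.06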
4.5 Exercises
1. A bottle is supposed to contain 250 ml of wine, with a standard deviation of 3 ml. If we sample 200 such bottles at random, find the probability that the average amount of wine contained in a bottle will be
(a) At most 248 ml.
(b) At least 252 ml.
(c) Between 249 and 251 ml.
2. If X is a Poisson random variable with mean 81, find the approximate probability
P (X ≥ 75).
4. Suppose that it is observed that the average lifespan of a certain machine part is 5 years, with a standard deviation of 1.2 years. By sampling 100 such parts, we obtain x̄ = 4.75. Construct a confidence interval for µ with confidence level
(a) 99% (b) 95% (c) 90% (d) 80%
Does the length of the intervals increase or decrease as the confidence level decreases?