Probability Theory
Our discussion of probability starts with set theory. We first state the basic concepts, then the set operations (with the laws of operation), and finally define a function. We then discuss the different approaches to the theory of probability, the laws of probability, conditional probability and Bayes' theorem, illustrating them with examples.
• The sample space for an experiment is the set of all experimental outcomes.
• A sample point is an element of the sample space, i.e. any one particular experimental outcome.
• The outcomes in a sample space are mutually exclusive (no two can occur at the same time) and exhaustive (one of them must occur).
Sets are unordered collections of elements. Elements are usually named with lower case
letters. Sets are usually named with capital letters.
(2) Describing a set by a characteristic property (a predicate which holds for members of this set). General form: {x | P(x)}, where P is some predicate (condition, property).
(3) Defining a set by rules which generate (define) its members (recursive rules). For example:
a) 4 ∈ E
b) if x ∈ E, then x + 2 ∈ E
c) nothing else belongs to E.
The first rule is the basis of recursion, the second one generates new elements from the
elements defined before and the third rule restricts the defined set to the elements
generated by rules a) and b).
Two sets A and B are equal if and only if all the elements of A belong to B and vice versa. Set A is a subset of B if every element of A is also an element of B.
If two sets have no common elements, they are called disjoint sets (mutually exclusive sets).
Union Set: Set obtained by combining all the elements of two or more sets.
A ∪ B = {x : x ∈ A or x ∈ B}
Intersection Set: Set obtained by combining two or more sets using only common
elements.
A ∩ B = {x : x ∈ A and x ∈ B}
Universal Set (Ω): The totality of all elements under consideration in a given problem.
Set Difference: A − B is the set of all elements that are in A but not in B.
a) Commutative Law: A ∪ B = B ∪ A; A ∩ B = B ∩ A
b) Associative Law: A ∪ ( B ∪ C ) = ( A ∪ B) ∪ C ; A ∩ ( B ∩ C ) = ( A ∩ B) ∩ C
c) Distributive Law:
A ∪ ( B ∩ C ) = ( A ∪ B) ∩ ( A ∪ C ); A ∩ ( B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C )
Note that, given the above laws:
(A^C)^C = A
A ∩ A^C = Φ
A ∩ Ω = A
A ∪ A^C = Ω
De Morgan's laws:
(i) (A ∪ B)^C = A^C ∩ B^C
(ii) (A ∩ B)^C = A^C ∪ B^C
Proof:
Method 1 (by contradiction)
Suppose (A ∪ B)^C ≠ A^C ∩ B^C. Then there is some x with
x ∈ (A ∪ B)^C and x ∉ A^C ∩ B^C.
From the second statement,
x ∉ (Ω − A) ∩ (Ω − B)
x ∉ Ω − (A ∪ B)
x ∉ (A ∪ B)^C,
which contradicts x ∈ (A ∪ B)^C. Hence
(A ∪ B)^C = A^C ∩ B^C
Method 2
Let x ∈ (A ∪ B)^C ⇒ x ∉ A and x ∉ B ⇒ x ∈ A^C and x ∈ B^C
⇒ x ∈ A^C ∩ B^C ⇒ (A ∪ B)^C ⊂ A^C ∩ B^C.
Conversely, let x ∈ A^C ∩ B^C ⇒ x ∈ A^C and x ∈ B^C ⇒ x ∉ A and x ∉ B
⇒ x ∉ A ∪ B ⇒ x ∈ (A ∪ B)^C ⇒ A^C ∩ B^C ⊂ (A ∪ B)^C.
The two inclusions together give (A ∪ B)^C = A^C ∩ B^C.
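These set identities are easy to sanity-check on small finite sets. The sketch below (with an arbitrary illustrative universe Ω = {0, …, 9} and sets A, B, C of my own choosing) verifies De Morgan's laws and a distributive law in Python:

```python
# Quick check of De Morgan's laws and a distributive law on finite sets.
# The universe and the sets A, B, C are illustrative choices, not from the text.
universe = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {0, 4, 7}

def complement(s):
    """Complement relative to the universal set Omega."""
    return universe - s

# (A ∪ B)^C = A^C ∩ B^C
assert complement(A | B) == complement(A) & complement(B)
# (A ∩ B)^C = A^C ∪ B^C
assert complement(A & B) == complement(A) | complement(B)
# Distributive law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert A | (B & C) == (A | B) & (A | C)
```

The `assert` statements raise nothing, confirming each identity for this choice of sets.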
1.3 Function
A function is a rule that associates each element of one set with one and only one element of another set.
Let a₁ ∈ A and a₂ ∈ B with a₂ = f(a₁). Then
f(·) = {a₂ ∈ B | a₁ ∈ A, a₂ = f(a₁)}
Let A ⊂ Ω and x ∈ Ω. We can define the indicator function of A as
I_A(x) = 1 if x ∈ A
I_A(x) = 0 if x ∉ A
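A minimal sketch of an indicator function in Python (the set A here is an arbitrary illustrative choice):

```python
# Indicator function: I_A(x) = 1 if x is in A, else 0.
def indicator(A):
    """Return the indicator function I_A of the set A."""
    return lambda x: 1 if x in A else 0

A = {2, 4, 6}          # an illustrative event in Omega = {1, ..., 6}
I_A = indicator(A)
assert I_A(4) == 1     # 4 is in A
assert I_A(3) == 0     # 3 is not in A
```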
1.4 Sample Space and Event
The sample space for an experiment is the set of all experimental outcomes. The outcomes in a sample space are mutually exclusive (no two can occur at the same time) and exhaustive (one of them must occur).
An event is any collection of sample points. A simple event consists of a single sample point (an outcome with one characteristic), whereas a compound event consists of two or more sample points.
1.5 Different approaches to the theory of Probability
There are three major approaches to the theory of Probability: (1) Classical approach (2)
Frequency approach and (3) Axiomatic approach.
1.5.1 Classical approach
Two assumptions are made:
(a) The sample space is finite.
(b) All elementary events are equally likely to occur in a single trial of the experiment.
Under these assumptions, the probability of any event E is given by the ratio
P(E) = (number of sample points in E) / (total number of sample points in the sample space).
This formula lets us compute the probabilities of many events under the classical (finite sample space, equally likely elementary events) set-up. Note that the probability of any event lies between 0 and 1.
For instance, a typical calculation under this set-up yields a ratio of counts such as
(15C1 × 14C1) / 29C2.
Note that the definition will give P(S) = 1 and P(φ) = 0. The certain event has probability
1 and the impossible event has probability 0!
In the classical (finite sample space, equally likely elementary events) set-up, two events A and B are said to occur independently if P(A ∩ B) = P(A) P(B).
One can easily establish the following results using the definition:
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
2. P(A^C) = 1 − P(A).
3. P(A − B) = P(A) − P(B) if A ⊃ B.
4. If A1, A2, …, Ak are k mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + … + P(Ak).
5. If A1, A2, …, Ak are k mutually exclusive and exhaustive events (i.e. they add up to S), then for any event B,
P(B) = P(B ∩ A1) + P(B ∩ A2) + … + P(B ∩ Ak).
1.5.2 Frequency approach
If the elementary events are not equally likely, or even if the sample space is infinite, one can adopt this approach for any event. Let E be the event whose probability of occurrence we want to compute. Let the experiment be repeated a large number of times, say N times, and let M denote the number of times E has occurred. Then the probability of E is defined by
P(E) = lim_{N→∞} (M/N).
Note that this definition is based on a limiting concept, and the limit has been shown, mathematically, to exist. But one cannot conduct the experiment an infinite number of times, so in practice one approximates P(E) by the ratio M/N for sufficiently large N. The drawback of this approach is that it requires a large number of replications of the experiment.
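The frequency approach is easy to illustrate by simulation. In the sketch below (a fair die and the event "roll a six" are illustrative choices, as is the seed), the relative frequency M/N settles near the true probability 1/6 for large N:

```python
# Frequency-approach sketch: approximate P(E) by M/N for a fair die,
# with E = {roll shows a six}.
import random

random.seed(0)                 # arbitrary seed for reproducibility
N = 200_000                    # number of repetitions of the experiment
M = sum(1 for _ in range(N) if random.randint(1, 6) == 6)
estimate = M / N

# For large N the estimate should be close to the true value 1/6.
assert abs(estimate - 1/6) < 0.01
```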
1.5.3 Axiomatic approach
The sample space, the set of all possible outcomes of the random experiment can be
uncountably infinite like the Real line R=(-∞, ∞). Note that this includes the finite sample
space in the classical approach. Let us denote this by Ω. We do not assign probabilities to
all subsets of Ω as that would be impossible if Ω is infinite. Instead we concentrate on a
class of interesting events that would be sufficient for our inference. Such a class, say F,
should satisfy the following three requirements:
1. Ω ∈ F.
2. If A ∈ F, then A^C ∈ F.
3. If {An} is any sequence of events in F, then ∪_{n=1}^∞ An also belongs to F.
Given the sample space Ω and the event space F, the probability is a non-negative real
valued function on the event space F satisfying the following three axioms:
Axiom 1: P(A) ≥ 0 for every A ∈ F
Axiom 2: P (Ω ) = 1
Axiom 3: For any sequence {An} of mutually disjoint (exclusive) events in F,
P(∪_{n=1}^∞ An) = ∑_{n=1}^∞ P(An).
The triplet (Ω, F, P) is called the probability space.
Theorem 1: P(Φ) = 0.
Proof: Choose the sequence of events Ai = Φ for all i = 1, 2, 3, …. By definition these are mutually exclusive. Using Axiom 3,
P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai), i.e.
P(Φ) = ∑_{i=1}^∞ P(Φ).
A non-negative number can equal the sum of infinitely many copies of itself only if the number is zero. This implies P(Φ) = 0.
Theorem 2: P(A^C) = 1 − P(A).
Proof: A^C = Ω − A and A^C ∩ A = Φ, so by Axioms 2 and 3,
1 = P(Ω) = P(A ∪ A^C) = P(A) + P(A^C),
which gives P(A^C) = 1 − P(A).
Theorem 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: Write A ∪ B = A ∪ (A^C ∩ B), where A ∩ (A^C ∩ B) = Φ. By Axiom 3,
P(A ∪ B) = P(A) + P(A^C ∩ B).
Similarly, B = (A ∩ B) ∪ (A^C ∩ B) with the two parts disjoint, so P(A^C ∩ B) = P(B) − P(A ∩ B). Substituting,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
General Result (inclusion–exclusion):
If A1, A2, …, An are events in F (not necessarily mutually exclusive), then
P(A1 ∪ A2 ∪ … ∪ An) = ∑_{j=1}^n P(Aj) − ∑∑_{i<j} P(Ai ∩ Aj) + ∑∑∑_{i<j<k} P(Ai ∩ Aj ∩ Ak) − … + (−1)^{n+1} P(A1 ∩ A2 ∩ … ∩ An).
Theorem 4: Let A, B ∈ F with A ⊂ B. Then P(A) ≤ P(B).
Proof: B = (B ∩ A) ∪ (B ∩ A^C) = A ∪ (B ∩ A^C), since A ⊂ B implies B ∩ A = A. The two parts are disjoint: A ∩ (B ∩ A^C) = Φ. Hence
P(B) = P(A) + P(B ∩ A^C).
Since P(B ∩ A^C) ≥ 0, it follows that P(B) ≥ P(A), i.e. P(A) ≤ P(B).
Boole’s inequality (Proof omitted)
P( A1 ∪ A2 ∪ ... ∪ An ) ≤ P( A1 ) + P( A2 ) + ... + P( An )
A useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects and the order of selection is important. The number of permutations of N objects taken n at a time is:
NPn = n! × NCn = n! × N!/(n!(N − n)!) = N!/(N − n)!
Another useful counting rule enables us to count the number of experimental outcomes
when n objects are to be selected from a set of N objects. Here the order of selection is
not important. The number of combinations of N objects taken n at a time is:
NCn = N!/(n!(N − n)!)
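Both counting rules are available directly in Python's standard library, which makes the relation NPn = n! × NCn easy to confirm (N = 15, n = 3 below are arbitrary illustrative values):

```python
# Permutations and combinations via math.perm and math.comb.
import math

N, n = 15, 3
# Permutations: order matters, N!/(N-n)!
assert math.perm(N, n) == math.factorial(N) // math.factorial(N - n)
# Combinations: order does not matter, N!/(n!(N-n)!)
assert math.comb(N, n) == math.factorial(N) // (math.factorial(n) * math.factorial(N - n))
# Each combination of n objects can be ordered in n! ways:
assert math.perm(N, n) == math.factorial(n) * math.comb(N, n)
```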
Examples
Example 1
A problem is given to three students whose chances of solving it are ½, ¾, and ¼
respectively.
a) What is the chance that the problem is solved?
b) What is the chance that exactly one of them solves it?
a) P(solved) = 1 − P(not solved) = 1 − (1/2)(1/4)(3/4) = 1 − 3/32 = 29/32
b) P(exactly one solves) = (1/2)(1/4)(3/4) + (1/2)(3/4)(3/4) + (1/2)(1/4)(1/4) = 3/32 + 9/32 + 1/32 = 13/32
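The answers can be verified by enumerating all 2³ solve/fail outcomes with exact arithmetic (a brute-force sketch):

```python
# Verifying Example 1 by summing over the 8 solve/fail outcomes exactly.
from fractions import Fraction
from itertools import product

p = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)]   # chances of solving

solved = Fraction(0)
exactly_one = Fraction(0)
for outcome in product([0, 1], repeat=3):              # 1 = student solves it
    prob = Fraction(1)
    for pi, s in zip(p, outcome):
        prob *= pi if s else (1 - pi)
    if any(outcome):
        solved += prob                                 # at least one solves
    if sum(outcome) == 1:
        exactly_one += prob                            # exactly one solves

assert solved == Fraction(29, 32)
assert exactly_one == Fraction(13, 32)
```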
Example 2
An urn contains 4 Red, 3 White and 2 Blue balls. A person draws 4 balls without replacement. What is the probability that amongst the balls drawn there is at least one ball of each colour?
The favourable compositions are {2R, 1W, 1B}, {1R, 2W, 1B} and {1R, 1W, 2B}, so the required probability is
(4C2 × 3C1 × 2C1 + 4C1 × 3C2 × 2C1 + 4C1 × 3C1 × 2C2) / 9C4 = (36 + 24 + 12)/126 = 4/7.
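The answer can be checked by brute-force enumeration of all 9C4 equally likely draws:

```python
# Brute-force check of Example 2: draw 4 balls from 4R + 3W + 2B without
# replacement and count draws containing all three colours.
from fractions import Fraction
from itertools import combinations

balls = ["R"] * 4 + ["W"] * 3 + ["B"] * 2
draws = list(combinations(range(9), 4))        # all 9C4 = 126 equally likely draws
favourable = sum(1 for d in draws if {balls[i] for i in d} == {"R", "W", "B"})

assert Fraction(favourable, len(draws)) == Fraction(4, 7)
```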
1.6 Conditional Probability
The conditional probability refers to the probability of an event given that another event
has occurred. The conditional probability of A given B is denoted by P(A|B). A
conditional probability is computed as follows:
P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0
P(B | A) = P(B ∩ A) / P(A), if P(A) ≠ 0
Note P( A ∩ B) = P( B ∩ A)
P( A | B) P ( B) = P( B | A) P( A)
Note also that, for any sequence {Ai} of events,
P(∪_{i=1}^∞ Ai | B) = P(∪_{i=1}^∞ (Ai ∩ B)) / P(B),
and, if B1, B2, …, Bn are mutually exclusive and exhaustive events (the law of total probability),
P(A) = ∑_{i=1}^n P(A | Bi) P(Bi).
1.7.1 Bayes’ Theorem
If A is any event, such that P(A)>0, and B1,B2,…,Bn are any finite set of mutually
exclusive and exhaustive events such that P(Bi) >0 for all i= 1,2,…,n, then
P(Bi | A) = P(A | Bi) P(Bi) / ∑_{j=1}^n P(A | Bj) P(Bj), for i = 1, 2, …, n.
Proof: By the definition of conditional probability, for each i = 1, 2, …, n we can write P(A ∩ Bi) = P(A | Bi) P(Bi) and also P(A ∩ Bi) = P(Bi | A) P(A). Therefore
P(Bi | A) = P(A | Bi) P(Bi) / P(A) = P(A | Bi) P(Bi) / ∑_{j=1}^n P(A | Bj) P(Bj),
where the last step uses the law of total probability.
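A hedged numerical illustration of the theorem (the scenario and all numbers below are invented for the example, not taken from the text): two machines B1 and B2 produce 60% and 40% of a factory's output with defect rates 1% and 5%, and we want the probability that a defective item came from each machine.

```python
# Bayes' theorem with exact arithmetic on an invented two-machine example.
from fractions import Fraction

prior = {"B1": Fraction(6, 10), "B2": Fraction(4, 10)}    # P(Bi)
lik = {"B1": Fraction(1, 100), "B2": Fraction(5, 100)}    # P(defective | Bi)

# Law of total probability: P(A) = sum_i P(A|Bi) P(Bi)
p_defect = sum(lik[b] * prior[b] for b in prior)

# Bayes: P(Bi | A) = P(A|Bi) P(Bi) / P(A)
posterior = {b: lik[b] * prior[b] / p_defect for b in prior}

assert p_defect == Fraction(26, 1000)
assert posterior["B2"] == Fraction(20, 26)   # most defectives come from B2
assert sum(posterior.values()) == 1          # posteriors form a distribution
```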
Note that mutual independence implies pair-wise independence (see the first condition
above). But mere pair-wise independence need not imply mutual independence as the
following example shows.
Example:
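A standard such example uses two tosses of a fair coin: the events "first toss is heads", "second toss is heads" and "both tosses agree" are pairwise independent but not mutually independent. A small enumeration sketch in Python:

```python
# Pairwise but not mutual independence, via two fair coin tosses.
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))        # 4 equally likely outcomes

def P(event):
    """Classical probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"                    # first toss is heads
B = lambda w: w[1] == "H"                    # second toss is heads
C = lambda w: w[0] == w[1]                   # both tosses agree

AB = lambda w: A(w) and B(w)
AC = lambda w: A(w) and C(w)
BC = lambda w: B(w) and C(w)
ABC = lambda w: A(w) and B(w) and C(w)

# Pairwise independence holds:
assert P(AB) == P(A) * P(B)
assert P(AC) == P(A) * P(C)
assert P(BC) == P(B) * P(C)
# ...but mutual independence fails: P(A∩B∩C) = 1/4, not (1/2)^3 = 1/8.
assert P(ABC) != P(A) * P(B) * P(C)
```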
Let Ω be the given sample space and F be the event space. A random variable X is a real
valued function on the sample space Ω such that the inverse image of any interval of the
type (-∞,x] lies in F for all values of x.
Let us denote by F(x) the value P(X ≤ x). This function is called the distribution function of the random variable. Note, therefore, that every random variable has a distribution function, which is determined through the probability P on the event space.
If a random variable can take only a finite or countably infinite set of measurable outcomes and no other, the random variable X which measures the outcome is said to be discrete.
For example, if a random sample of n items is taken from a production line, the random
variable X, representing the number of defectives in the sample, is discrete as it can take
the set of values x = 0, 1, 2, ..., n but no other, i.e. the variable cannot take values like ½,
0.3, 0.0001 etc. Discrete variables need not be integers; e.g. the proportion of defective items in the sample would take the values 0, 1/n, 2/n, …, 1 and would also be a discrete variable.
A random variable that can take any value in a continuous range (an interval of the real line) is said to be continuous. For example, the arrival time of students for a 9:00am lecture may be a random variable X taking values in the range 8:30 to 9:30. Here any value within the range can occur.
Note the notational convention: capital letters, like X, Y, Z, refer to random variables in a
general sense; small letters, like x, y, z, refer to possible values taken by random variables
(collectively); small letters with a subscript, like x1 , x 2 , x3 , refer to specific values of a
random variable.
1.10 Probability Distributions
Discrete Variables
Example 1 Tabulated distribution
x      0    1    2    3    4    5
f(x)   0.1  0.2  0.3  0.2  0.1  0.1     (∑ f(x) = 1)
Example 2 Theoretical distribution (Algebraic expression)
If a random sample of n items is taken from a production line with probability of a
defective being equal to P, the probability distribution of ‘the number of defectives in the
sample’ (X) is given by
f(x) = nCx P^x (1 − P)^(n−x), for x = 0, 1, 2, …, n, where nCx = n!/(x!(n − x)!).
Continuous Variables
For a continuous random variable, the probability distribution is a function (in the
mathematical sense) that describes a curve, the probability density function (pdf), so that
areas under the curve give probabilities associated with corresponding intervals of the
variable. Using general notation, a probability distribution of a continuous random
variable can be defined via its pdf f(x). Here the probability density f(x) always takes the form of an algebraic expression, and it must satisfy:
(a) f(x) ≥ 0 for all x;
(b) ∫_{−∞}^{+∞} f(x) dx = 1, i.e. probabilities sum to 1;
(c) P(a ≤ X ≤ b) = ∫_a^b f(x) dx, i.e. the area under the pdf between a and b.
Note also that P(X = x) = 0, as f(x) is the ordinate of the pdf and a single ordinate has no area.
Example
Daily sales of petrol (X, measured in '000 litres) at a garage have the following pdf:
f(x) = x/2, 0 < x < 2
[Figure: graph of the pdf, a straight line from (0, 0) to (2, 1).]
(1) ∫_0^2 f(x) dx = ∫_0^2 (x/2) dx = (1/2)[x²/2]_0^2 = (1/4)(4 − 0) = 1, as required, i.e. f(x) is correctly specified.
(2) P(1/2 ≤ X ≤ 3/2) = ∫_{1/2}^{3/2} (x/2) dx = (1/2)[x²/2]_{1/2}^{3/2} = (1/4)[(3/2)² − (1/2)²] = (1/4)(9/4 − 1/4) = 1/2, i.e. the probability that daily sales will be between 500 and 1500 litres is 0.5.
Note that if the pdf were specified as, for example, f(x) = x/k, where k is an unknown constant, the value of k could be found by using the relation ∫_0^2 f(x) dx = 1.
In our example,
∫_0^2 f(x) dx = ∫_0^2 (x/k) dx = (1/k)[x²/2]_0^2 = (1/k)(1/2)(4 − 0) = 2/k = 1,
and, therefore, k = 2.
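The same normalising-constant calculation can be sketched numerically, with a midpoint Riemann sum standing in for the integral (the step count is an arbitrary choice):

```python
# Finding k in f(x) = x/k on (0, 2) from the condition that the pdf
# integrates to 1, via a midpoint Riemann sum.
def integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Requiring (1/k) * ∫_0^2 x dx = 1 gives k = ∫_0^2 x dx = 2.
k = integral(lambda x: x, 0.0, 2.0)
assert abs(k - 2.0) < 1e-6

# With k = 2 the pdf f(x) = x/2 integrates to 1, as required.
assert abs(integral(lambda x: x / k, 0.0, 2.0) - 1.0) < 1e-6
```

The midpoint rule is exact for linear integrands up to floating-point rounding, which is why such a tight tolerance works here.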
Discrete Variables
For a discrete random variable, the cumulative probability distribution (cpd) (or
distribution function) is a function that gives the probability that X does not exceed a
specific value x. Using general notation,
F(x) = P(X ≤ x) = ∑_{t ≤ x} f(t), for x = …, x_{i−1}, x_i, x_{i+1}, …
Note that f ( x ) has been replaced by f (t ) whose form is the same as f ( x ) . The change
of x into t is simply to distinguish the variable (t) from the upper limit of the summation
(x).
(a) F (− ∞ ) = 0 and F (+ ∞ ) = 1 ;
(b) If a < b , then F (a) ≤ F (b) , for any real numbers a and b;
(c) For consecutive values x₁ < x₂ < x₃ of the variable, f(x₃) = F(x₃) − F(x₂) and P(X = x₂ or x₃) = F(x₃) − F(x₁).
x 0 1 2 3 4 5
f (x ) 0.1 0.2 0.3 0.2 0.1 0.1
F (x ) 0.1 0.3 0.6 0.8 0.9 1.0
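The F(x) row of the table is just the running total of the f(x) row, which `itertools.accumulate` computes directly (exact fractions avoid floating-point noise):

```python
# Building the cumulative distribution F from the tabulated f.
from fractions import Fraction
from itertools import accumulate

f = [Fraction(n, 10) for n in (1, 2, 3, 2, 1, 1)]      # f(x) for x = 0..5
F = list(accumulate(f))                                 # running totals

assert F == [Fraction(n, 10) for n in (1, 3, 6, 8, 9, 10)]
assert F[-1] == 1                                       # F(+inf) = 1
```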
[Figure: bar chart of f(x) and step graph of F(x) for the tabulated distribution above.]
For the binomial distribution,
f(x) = nCx P^x (1 − P)^(n−x), x = 0, 1, 2, …, n,
F(x) = ∑_{t=0}^{x} nCt P^t (1 − P)^(n−t) = nC0 P^0 (1 − P)^n + nC1 P^1 (1 − P)^(n−1) + … + nCx P^x (1 − P)^(n−x).
Continuous Variables
The cumulative probability distribution (cpd) (or distribution function) for a continuous
random variable is a function that gives the probability that X does not exceed a specific
value x. Using general notation,
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt, for −∞ < x < +∞
Note that t replaces x in f ( x ) , in order to distinguish the variable (t) from the upper limit
of integration (x).
(a) F (− ∞ ) = 0 and F (+ ∞ ) = 1 ;
(b) If a < b , then F (a) ≤ F (b) , for any real numbers a and b;
(c) P(a < X ≤ b) = F(b) − F(a);
(d) f(x) = dF(x)/dx.
Example
For the petrol-sales pdf f(x) = x/2 on 0 < x < 2,
F(x) = ∫_0^x f(t) dt = ∫_0^x (t/2) dt = (1/2)[t²/2]_0^x = x²/4, for 0 < x < 2.
For example:
P(X ≤ 3/4) = F(3/4) = (1/4)(3/4)² = (1/4)(9/16) = 9/64
gives the probability that daily sales of petrol will not exceed 750 litres.
P(1/2 ≤ X ≤ 3/2) = F(3/2) − F(1/2) = (1/4)(3/2)² − (1/4)(1/2)² = (1/4)(9/4) − (1/4)(1/4) = 9/16 − 1/16 = 1/2
gives the probability that daily sales of petrol will be between 500 and 1500 litres. Note that the answer is the same as that obtained earlier by integrating the pdf directly.
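Both probabilities can be confirmed with exact arithmetic, using the distribution function F(x) = x²/4 on 0 < x < 2 (extended by 0 below and 1 above the range):

```python
# Exact checks of the petrol-sales probabilities via the cdf F(x) = x^2/4.
from fractions import Fraction

def F(x):
    """Distribution function of daily petrol sales ('000 litres)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 2:
        return Fraction(1)
    return x * x / 4

assert F(Fraction(3, 4)) == Fraction(9, 64)                     # P(X <= 3/4)
assert F(Fraction(3, 2)) - F(Fraction(1, 2)) == Fraction(1, 2)  # P(1/2 <= X <= 3/2)
```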
Given F(x) = x²/4, f(x) = dF(x)/dx = 2x/4 = x/2, for 0 < x < 2, as required.
[Figure: graphs of the pdf f(x) = x/2 and the cdf F(x) = x²/4 on 0 < x < 2.]