Probability Theory
Our discussion of probability starts with set theory. We first state the basic concepts, then the set operations (with the laws of operation), and finally define a function. We then discuss the different approaches to the theory of probability, the laws of probability, conditional probability and Bayes' theorem, illustrating them with examples.
• The sample space for an experiment is the set of all experimental outcomes.
• A sample point is an element of the sample space, i.e. any one particular experimental outcome.
• The outcomes in a sample space are mutually exclusive (no two can occur at the same time) and exhaustive (one of them must occur).
Sets are unordered collections of elements. Elements are usually named with lower case
letters. Sets are usually named with capital letters.
(2) Describing a set by a characteristic property (a predicate which holds for members of this set). General form: {x | P(x)}, where P is some predicate (condition, property).
(3) Defining a set by rules which generate (define) its members (recursive rules). For example:
a) 4 ∈ E
b) if x ∈ E, then x + 2 ∈ E
c) nothing else belongs to E.
The first rule is the basis of recursion, the second one generates new elements from the
elements defined before and the third rule restricts the defined set to the elements
generated by rules a) and b).
Two sets A and B are equal if and only if all the elements of A belong to B and vice versa. Set A is a subset of B if every element of A is also an element of B.
If two sets have no common elements, they are called disjoint sets (mutually exclusive sets).
Union Set: Set obtained by combining all the elements of two or more sets.
A ∪ B = {x : x ∈ A or x ∈ B}
Intersection Set: Set obtained by combining two or more sets using only common
elements.
A ∩ B = {x : x ∈ A and x ∈ B}
Universal Set (Ω): The totality of all elements under consideration in a given problem.
Set Difference: A − B is the set of all elements that are in A but not in B.
a) Commutative Law: A ∪ B = B ∪ A; A ∩ B = B ∩ A
b) Associative Law: A ∪ ( B ∪ C ) = ( A ∪ B) ∪ C ; A ∩ ( B ∩ C ) = ( A ∩ B) ∩ C
c) Distributive Law:
A ∪ ( B ∩ C ) = ( A ∪ B) ∩ ( A ∪ C ); A ∩ ( B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C )
Note that, given the above laws:
(A^C)^C = A
A ∩ A^C = Φ
A ∩ Ω = A
A ∪ A^C = Ω
De Morgan's laws:
(i) (A ∪ B)^C = A^C ∩ B^C
(ii) (A ∩ B)^C = A^C ∪ B^C
Proof:
Method 1 (by contradiction)
Suppose (A ∪ B)^C ≠ A^C ∩ B^C. Then there is some x with
x ∈ (A ∪ B)^C and x ∉ A^C ∩ B^C.
From the second statement,
x ∉ (Ω − A) ∩ (Ω − B)
x ∉ Ω − (A ∪ B)
x ∉ (A ∪ B)^C,
which contradicts x ∈ (A ∪ B)^C. Hence
(A ∪ B)^C = A^C ∩ B^C
Method 2
Let x ∈ (A ∪ B)^C ⇒ x ∉ A and x ∉ B ⇒ x ∈ A^C and x ∈ B^C
⇒ x ∈ A^C ∩ B^C ⇒ (A ∪ B)^C ⊂ A^C ∩ B^C.
Conversely, let x ∈ A^C ∩ B^C ⇒ x ∈ A^C and x ∈ B^C ⇒ x ∉ A and x ∉ B
⇒ x ∉ A ∪ B ⇒ x ∈ (A ∪ B)^C ⇒ A^C ∩ B^C ⊂ (A ∪ B)^C.
The two inclusions together give (A ∪ B)^C = A^C ∩ B^C.
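These set identities are easy to sanity-check on small finite sets. The sketch below (with an arbitrary illustrative universe Ω = {0, …, 9} and sets A, B, C of my own choosing) verifies De Morgan's laws and a distributive law in Python:

```python
# Quick check of De Morgan's laws and a distributive law on finite sets.
# The universe and the sets A, B, C are illustrative choices, not from the text.
universe = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {0, 4, 7}

def complement(s):
    """Complement relative to the universal set Omega."""
    return universe - s

# (A ∪ B)^C = A^C ∩ B^C
assert complement(A | B) == complement(A) & complement(B)
# (A ∩ B)^C = A^C ∪ B^C
assert complement(A & B) == complement(A) | complement(B)
# Distributive law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert A | (B & C) == (A | B) & (A | C)
```

The `assert` statements raise nothing, confirming each identity for this choice of sets.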
1.3 Function
A function is a rule that associates each element of one set with one and only one element of another set.
Let a₁ ∈ A and a₂ ∈ B with a₂ = f(a₁). Then
f(·) = {a₂ ∈ B | a₁ ∈ A, a₂ = f(a₁)}
Let A ⊂ Ω and x ∈ Ω. We can define the indicator function of A as
I_A(x) = 1 if x ∈ A
I_A(x) = 0 if x ∉ A
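A minimal sketch of an indicator function in Python (the set A here is an arbitrary illustrative choice):

```python
# Indicator function: I_A(x) = 1 if x is in A, else 0.
def indicator(A):
    """Return the indicator function I_A of the set A."""
    return lambda x: 1 if x in A else 0

A = {2, 4, 6}          # an illustrative event in Omega = {1, ..., 6}
I_A = indicator(A)
assert I_A(4) == 1     # 4 is in A
assert I_A(3) == 0     # 3 is not in A
```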
1.4 Sample Space and Event
The sample space for an experiment is the set of all experimental outcomes. The outcomes in a sample space are mutually exclusive (no two can occur at the same time) and exhaustive (one of them must occur).
An event is any collection of sample points. A simple event consists of a single sample point (an outcome with one characteristic), whereas a compound event consists of two or more sample points.
1.5 Different approaches to the theory of Probability
There are three major approaches to the theory of Probability: (1) Classical approach (2)
Frequency approach and (3) Axiomatic approach.
1.5.1 Classical approach
Two assumptions are made:
(a) The sample space is finite.
(b) All elementary events are equally likely to occur in a single trial of the experiment.
Under these assumptions, the probability of any event E is given by the ratio
P(E) = (number of sample points in E) / (total number of sample points in the sample space).
This formula lets us compute the probabilities of many events under the classical (finite sample space, equally likely elementary events) set-up. Note that the probability of any event lies between 0 and 1.
For instance, a typical calculation under this set-up yields a ratio of counts such as
(15C1 × 14C1) / 29C2.
Note that the definition will give P(S) = 1 and P(φ) = 0. The certain event has probability
1 and the impossible event has probability 0!
In the classical (finite sample space, equally likely elementary events) set-up, two events A and B are said to occur independently if P(A ∩ B) = P(A) P(B).
One can easily establish the following results using the definition:
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
2. P(A^C) = 1 − P(A).
3. P(A − B) = P(A) − P(B) if A ⊃ B.
4. If A1, A2, …, Ak are k mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + … + P(Ak).
5. If A1, A2, …, Ak are k mutually exclusive and exhaustive events (i.e. they add up to S), then for any event B,
P(B) = P(B ∩ A1) + P(B ∩ A2) + … + P(B ∩ Ak).
1.5.2 Frequency approach
If the elementary events are not equally likely, or even if the sample space is infinite, one can adopt this approach for any event. Let E be the event whose probability of occurrence we want to compute. Let the experiment be repeated a large number of times, say N times, and let M denote the number of times E has occurred. Then the probability of E is defined by
P(E) = lim_{N→∞} (M/N).
Note that this definition is based on a limiting concept, and the limit has been shown, mathematically, to exist. But one cannot conduct the experiment an infinite number of times, so in practice one approximates P(E) by the ratio M/N for sufficiently large N. The drawback of this approach is that it requires a large number of replications of the experiment.
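The frequency approach is easy to illustrate by simulation. In the sketch below (a fair die and the event "roll a six" are illustrative choices, as is the seed), the relative frequency M/N settles near the true probability 1/6 for large N:

```python
# Frequency-approach sketch: approximate P(E) by M/N for a fair die,
# with E = {roll shows a six}.
import random

random.seed(0)                 # arbitrary seed for reproducibility
N = 200_000                    # number of repetitions of the experiment
M = sum(1 for _ in range(N) if random.randint(1, 6) == 6)
estimate = M / N

# For large N the estimate should be close to the true value 1/6.
assert abs(estimate - 1/6) < 0.01
```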
1.5.3 Axiomatic approach
The sample space, the set of all possible outcomes of the random experiment can be
uncountably infinite like the Real line R=(-∞, ∞). Note that this includes the finite sample
space in the classical approach. Let us denote this by Ω. We do not assign probabilities to
all subsets of Ω as that would be impossible if Ω is infinite. Instead we concentrate on a
class of interesting events that would be sufficient for our inference. Such a class, say F,
should satisfy the following three requirements:
1. Ω ∈ F.
2. If A ∈ F, then A^C ∈ F.
3. If {An} is any sequence of events in F, then ∪_{n=1}^∞ An also belongs to F.
Given the sample space Ω and the event space F, the probability is a non-negative real
valued function on the event space F satisfying the following three axioms:
Axiom 1: P(A) ≥ 0 for every A ∈ F
Axiom 2: P (Ω ) = 1
Axiom 3: For any sequence {An} of mutually disjoint (exclusive) events in F,
P(∪_{n=1}^∞ An) = ∑_{n=1}^∞ P(An).
The triplet (Ω, F, P) is called the probability space.
Theorem 1: P(Φ) = 0.
Proof: Choose the sequence of events Ai = Φ for all i = 1, 2, 3, …. By definition these are mutually exclusive. Using Axiom 3,
P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai), i.e.
P(Φ) = ∑_{i=1}^∞ P(Φ).
A non-negative number can equal the sum of infinitely many copies of itself only if the number is zero. This implies P(Φ) = 0.
Theorem 2: P(A^C) = 1 − P(A).
Proof: A^C = Ω − A and A^C ∩ A = Φ, so by Axioms 2 and 3,
1 = P(Ω) = P(A ∪ A^C) = P(A) + P(A^C),
which gives P(A^C) = 1 − P(A).
Theorem 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: Write A ∪ B = A ∪ (A^C ∩ B), where A ∩ (A^C ∩ B) = Φ. By Axiom 3,
P(A ∪ B) = P(A) + P(A^C ∩ B).
Similarly, B = (A ∩ B) ∪ (A^C ∩ B) with the two parts disjoint, so P(A^C ∩ B) = P(B) − P(A ∩ B). Substituting,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
General Result (inclusion–exclusion):
If A1, A2, …, An are events in F (not necessarily mutually exclusive), then
P(A1 ∪ A2 ∪ … ∪ An) = ∑_{j=1}^n P(Aj) − ∑∑_{i<j} P(Ai ∩ Aj) + ∑∑∑_{i<j<k} P(Ai ∩ Aj ∩ Ak) − … + (−1)^{n+1} P(A1 ∩ A2 ∩ … ∩ An).
Theorem 4: Let A, B ∈ F with A ⊂ B. Then P(A) ≤ P(B).
Proof: B = (B ∩ A) ∪ (B ∩ A^C) = A ∪ (B ∩ A^C), since A ⊂ B implies B ∩ A = A. The two parts are disjoint: A ∩ (B ∩ A^C) = Φ. Hence
P(B) = P(A) + P(B ∩ A^C).
Since P(B ∩ A^C) ≥ 0, it follows that P(B) ≥ P(A), i.e. P(A) ≤ P(B).
Boole’s inequality (Proof omitted)
P( A1 ∪ A2 ∪ ... ∪ An ) ≤ P( A1 ) + P( A2 ) + ... + P( An )
A useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects and the order of selection is important. The number of permutations of N objects taken n at a time is:
NPn = n! × NCn = n! × N!/(n!(N − n)!) = N!/(N − n)!
Another useful counting rule enables us to count the number of experimental outcomes
when n objects are to be selected from a set of N objects. Here the order of selection is
not important. The number of combinations of N objects taken n at a time is:
NCn = N!/(n!(N − n)!)
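Both counting rules are available directly in Python's standard library, which makes the relation NPn = n! × NCn easy to confirm (N = 15, n = 3 below are arbitrary illustrative values):

```python
# Permutations and combinations via math.perm and math.comb.
import math

N, n = 15, 3
# Permutations: order matters, N!/(N-n)!
assert math.perm(N, n) == math.factorial(N) // math.factorial(N - n)
# Combinations: order does not matter, N!/(n!(N-n)!)
assert math.comb(N, n) == math.factorial(N) // (math.factorial(n) * math.factorial(N - n))
# Each combination of n objects can be ordered in n! ways:
assert math.perm(N, n) == math.factorial(n) * math.comb(N, n)
```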
Examples
Example 1
A problem is given to three students whose chances of solving it are ½, ¾, and ¼
respectively.
a) What is the chance that the problem is solved?
b) What is the chance that exactly one of them solves it?
a) P(solved) = 1 − P(not solved) = 1 − (1/2)(1/4)(3/4) = 1 − 3/32 = 29/32
b) P(exactly one solves) = (1/2)(1/4)(3/4) + (1/2)(3/4)(3/4) + (1/2)(1/4)(1/4) = 3/32 + 9/32 + 1/32 = 13/32
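The answers can be verified by enumerating all 2³ solve/fail outcomes with exact arithmetic (a brute-force sketch):

```python
# Verifying Example 1 by summing over the 8 solve/fail outcomes exactly.
from fractions import Fraction
from itertools import product

p = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)]   # chances of solving

solved = Fraction(0)
exactly_one = Fraction(0)
for outcome in product([0, 1], repeat=3):              # 1 = student solves it
    prob = Fraction(1)
    for pi, s in zip(p, outcome):
        prob *= pi if s else (1 - pi)
    if any(outcome):
        solved += prob                                 # at least one solves
    if sum(outcome) == 1:
        exactly_one += prob                            # exactly one solves

assert solved == Fraction(29, 32)
assert exactly_one == Fraction(13, 32)
```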
Example 2
An urn contains 4 Red, 3 White and 2 Blue balls. A person draws 4 balls without replacement. What is the probability that amongst the balls drawn there is at least one ball of each colour?
The favourable compositions are {2R, 1W, 1B}, {1R, 2W, 1B} and {1R, 1W, 2B}, so the required probability is
(4C2 × 3C1 × 2C1 + 4C1 × 3C2 × 2C1 + 4C1 × 3C1 × 2C2) / 9C4 = (36 + 24 + 12)/126 = 4/7.
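The answer can be checked by brute-force enumeration of all 9C4 equally likely draws:

```python
# Brute-force check of Example 2: draw 4 balls from 4R + 3W + 2B without
# replacement and count draws containing all three colours.
from fractions import Fraction
from itertools import combinations

balls = ["R"] * 4 + ["W"] * 3 + ["B"] * 2
draws = list(combinations(range(9), 4))        # all 9C4 = 126 equally likely draws
favourable = sum(1 for d in draws if {balls[i] for i in d} == {"R", "W", "B"})

assert Fraction(favourable, len(draws)) == Fraction(4, 7)
```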
1.6 Conditional Probability
The conditional probability refers to the probability of an event given that another event
has occurred. The conditional probability of A given B is denoted by P(A|B). A
conditional probability is computed as follows:
P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0
P(B | A) = P(B ∩ A) / P(A), if P(A) ≠ 0
Note P( A ∩ B) = P( B ∩ A)
P( A | B) P ( B) = P( B | A) P( A)
Note also that, for any sequence {Ai} of events,
P(∪_{i=1}^∞ Ai | B) = P(∪_{i=1}^∞ (Ai ∩ B)) / P(B),
and, if B1, B2, …, Bn are mutually exclusive and exhaustive events (the law of total probability),
P(A) = ∑_{i=1}^n P(A | Bi) P(Bi).
1.7.1 Bayes’ Theorem
If A is any event, such that P(A)>0, and B1,B2,…,Bn are any finite set of mutually
exclusive and exhaustive events such that P(Bi) >0 for all i= 1,2,…,n, then
P(Bi | A) = P(A | Bi) P(Bi) / ∑_{j=1}^n P(A | Bj) P(Bj), for i = 1, 2, …, n.
Proof: By the definition of conditional probability, for each i = 1, 2, …, n we can write P(A ∩ Bi) = P(A | Bi) P(Bi) and also P(A ∩ Bi) = P(Bi | A) P(A). Therefore
P(Bi | A) = P(A | Bi) P(Bi) / P(A) = P(A | Bi) P(Bi) / ∑_{j=1}^n P(A | Bj) P(Bj),
where the last step uses the law of total probability.
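A hedged numerical illustration of the theorem (the scenario and all numbers below are invented for the example, not taken from the text): two machines B1 and B2 produce 60% and 40% of a factory's output with defect rates 1% and 5%, and we want the probability that a defective item came from each machine.

```python
# Bayes' theorem with exact arithmetic on an invented two-machine example.
from fractions import Fraction

prior = {"B1": Fraction(6, 10), "B2": Fraction(4, 10)}    # P(Bi)
lik = {"B1": Fraction(1, 100), "B2": Fraction(5, 100)}    # P(defective | Bi)

# Law of total probability: P(A) = sum_i P(A|Bi) P(Bi)
p_defect = sum(lik[b] * prior[b] for b in prior)

# Bayes: P(Bi | A) = P(A|Bi) P(Bi) / P(A)
posterior = {b: lik[b] * prior[b] / p_defect for b in prior}

assert p_defect == Fraction(26, 1000)
assert posterior["B2"] == Fraction(20, 26)   # most defectives come from B2
assert sum(posterior.values()) == 1          # posteriors form a distribution
```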
Note that mutual independence implies pair-wise independence (see the first condition
above). But mere pair-wise independence need not imply mutual independence as the
following example shows.
Example:
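A standard such example uses two tosses of a fair coin: the events "first toss is heads", "second toss is heads" and "both tosses agree" are pairwise independent but not mutually independent. A small enumeration sketch in Python:

```python
# Pairwise but not mutual independence, via two fair coin tosses.
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))        # 4 equally likely outcomes

def P(event):
    """Classical probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"                    # first toss is heads
B = lambda w: w[1] == "H"                    # second toss is heads
C = lambda w: w[0] == w[1]                   # both tosses agree

AB = lambda w: A(w) and B(w)
AC = lambda w: A(w) and C(w)
BC = lambda w: B(w) and C(w)
ABC = lambda w: A(w) and B(w) and C(w)

# Pairwise independence holds:
assert P(AB) == P(A) * P(B)
assert P(AC) == P(A) * P(C)
assert P(BC) == P(B) * P(C)
# ...but mutual independence fails: P(A∩B∩C) = 1/4, not (1/2)^3 = 1/8.
assert P(ABC) != P(A) * P(B) * P(C)
```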
Let Ω be the given sample space and F be the event space. A random variable X is a real
valued function on the sample space Ω such that the inverse image of any interval of the
type (-∞,x] lies in F for all values of x.
Let us denote by F(x) the value P(X ≤ x). This function is called the distribution function of the random variable. Note, therefore, that every random variable has a distribution function, which is determined through the probability P on the event space.
If a random variable can take only a finite or countably infinite set of measurable outcomes and no other, the random variable X which measures the outcome is said to be discrete.
For example, if a random sample of n items is taken from a production line, the random
variable X, representing the number of defectives in the sample, is discrete as it can take
the set of values x = 0, 1, 2, ..., n but no other, i.e. the variable cannot take values like ½,
0.3, 0.0001 etc. Discrete variables need not be integers; e.g. the proportion of defective items in the sample would take the values 0, 1/n, 2/n, …, 1 and would also be a discrete variable.
A random variable that can take any value in a continuous range (an interval of the real line) is said to be continuous. For example, the arrival time of students for a 9:00am lecture may be a random variable X taking values in the range 8:30 to 9:30. Here any value within the range can occur.
Note the notational convention: capital letters, like X, Y, Z, refer to random variables in a
general sense; small letters, like x, y, z, refer to possible values taken by random variables
(collectively); small letters with a subscript, like x1 , x 2 , x3 , refer to specific values of a
random variable.
1.10 Probability Distributions
Discrete Variables
Example 1 Tabulated distribution
x      0    1    2    3    4    5
f(x)   0.1  0.2  0.3  0.2  0.1  0.1     (∑ f(x) = 1)
Example 2 Theoretical distribution (Algebraic expression)
If a random sample of n items is taken from a production line with probability of a
defective being equal to P, the probability distribution of ‘the number of defectives in the
sample’ (X) is given by
f(x) = nCx P^x (1 − P)^(n−x), for x = 0, 1, 2, …, n, where nCx = n!/(x!(n − x)!).
Continuous Variables
For a continuous random variable, the probability distribution is a function (in the
mathematical sense) that describes a curve, the probability density function (pdf), so that
areas under the curve give probabilities associated with corresponding intervals of the
variable. Using general notation, a probability distribution of a continuous random
variable can be defined via its pdf f(x). Here the probability density f(x) always takes the form of an algebraic expression, and it must satisfy:
(a) f(x) ≥ 0 for all x;
(b) ∫_{−∞}^{+∞} f(x) dx = 1, i.e. probabilities sum to 1;
(c) P(a ≤ X ≤ b) = ∫_a^b f(x) dx, i.e. the area under the pdf between a and b.
Note also that P(X = x) = 0, as f(x) is the ordinate of the pdf and a single ordinate has no area.
Example
Daily sales of petrol (X, measured in '000 litres) at a garage have the following pdf:
f(x) = x/2, 0 < x < 2
[Figure: graph of the pdf, a straight line from (0, 0) to (2, 1).]
(1) ∫_0^2 f(x) dx = ∫_0^2 (x/2) dx = (1/2)[x²/2]_0^2 = (1/4)(4 − 0) = 1, as required, i.e. f(x) is correctly specified.
(2) P(1/2 ≤ X ≤ 3/2) = ∫_{1/2}^{3/2} (x/2) dx = (1/2)[x²/2]_{1/2}^{3/2} = (1/4)[(3/2)² − (1/2)²] = (1/4)(9/4 − 1/4) = 1/2, i.e. the probability that daily sales will be between 500 and 1500 litres is 0.5.
Note that if the pdf were specified as, for example, f(x) = x/k, where k is an unknown constant, the value of k could be found by using the relation ∫_0^2 f(x) dx = 1.
In our example,
∫_0^2 f(x) dx = ∫_0^2 (x/k) dx = (1/k)[x²/2]_0^2 = (1/k)(1/2)(4 − 0) = 2/k = 1,
and, therefore, k = 2.
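The same normalising-constant calculation can be sketched numerically, with a midpoint Riemann sum standing in for the integral (the step count is an arbitrary choice):

```python
# Finding k in f(x) = x/k on (0, 2) from the condition that the pdf
# integrates to 1, via a midpoint Riemann sum.
def integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Requiring (1/k) * ∫_0^2 x dx = 1 gives k = ∫_0^2 x dx = 2.
k = integral(lambda x: x, 0.0, 2.0)
assert abs(k - 2.0) < 1e-6

# With k = 2 the pdf f(x) = x/2 integrates to 1, as required.
assert abs(integral(lambda x: x / k, 0.0, 2.0) - 1.0) < 1e-6
```

The midpoint rule is exact for linear integrands up to floating-point rounding, which is why such a tight tolerance works here.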
Discrete Variables
For a discrete random variable, the cumulative probability distribution (cpd) (or
distribution function) is a function that gives the probability that X does not exceed a
specific value x. Using general notation,
F(x) = P(X ≤ x) = ∑_{t ≤ x} f(t), for x = …, x_{i−1}, x_i, x_{i+1}, …
Note that f ( x ) has been replaced by f (t ) whose form is the same as f ( x ) . The change
of x into t is simply to distinguish the variable (t) from the upper limit of the summation
(x).
(a) F (− ∞ ) = 0 and F (+ ∞ ) = 1 ;
(b) If a < b , then F (a) ≤ F (b) , for any real numbers a and b;
(c) For consecutive values x₁ < x₂ < x₃ of the variable, f(x₃) = F(x₃) − F(x₂) and P(X = x₂ or x₃) = F(x₃) − F(x₁).
x 0 1 2 3 4 5
f (x ) 0.1 0.2 0.3 0.2 0.1 0.1
F (x ) 0.1 0.3 0.6 0.8 0.9 1.0
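The F(x) row of the table is just the running total of the f(x) row, which `itertools.accumulate` computes directly (exact fractions avoid floating-point noise):

```python
# Building the cumulative distribution F from the tabulated f.
from fractions import Fraction
from itertools import accumulate

f = [Fraction(n, 10) for n in (1, 2, 3, 2, 1, 1)]      # f(x) for x = 0..5
F = list(accumulate(f))                                 # running totals

assert F == [Fraction(n, 10) for n in (1, 3, 6, 8, 9, 10)]
assert F[-1] == 1                                       # F(+inf) = 1
```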
[Figure: bar chart of f(x) and step graph of F(x) for the tabulated distribution above.]
For the binomial distribution,
f(x) = nCx P^x (1 − P)^(n−x), x = 0, 1, 2, …, n,
F(x) = ∑_{t=0}^{x} nCt P^t (1 − P)^(n−t) = nC0 P^0 (1 − P)^n + nC1 P^1 (1 − P)^(n−1) + … + nCx P^x (1 − P)^(n−x).
Continuous Variables
The cumulative probability distribution (cpd) (or distribution function) for a continuous
random variable is a function that gives the probability that X does not exceed a specific
value x. Using general notation,
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt, for −∞ < x < +∞
Note that t replaces x in f ( x ) , in order to distinguish the variable (t) from the upper limit
of integration (x).
(a) F (− ∞ ) = 0 and F (+ ∞ ) = 1 ;
(b) If a < b , then F (a) ≤ F (b) , for any real numbers a and b;
(c) P(a < X ≤ b) = F(b) − F(a);
(d) f(x) = dF(x)/dx.
Example
For the petrol-sales pdf f(x) = x/2 on 0 < x < 2,
F(x) = ∫_0^x f(t) dt = ∫_0^x (t/2) dt = (1/2)[t²/2]_0^x = x²/4, for 0 < x < 2.
For example:
P(X ≤ 3/4) = F(3/4) = (1/4)(3/4)² = (1/4)(9/16) = 9/64
gives the probability that daily sales of petrol will not exceed 750 litres.
P(1/2 ≤ X ≤ 3/2) = F(3/2) − F(1/2) = (1/4)(3/2)² − (1/4)(1/2)² = (1/4)(9/4) − (1/4)(1/4) = 9/16 − 1/16 = 1/2
gives the probability that daily sales of petrol will be between 500 and 1500 litres. Note that the answer is the same as that obtained earlier by integrating the pdf directly.
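Both probabilities can be confirmed with exact arithmetic, using the distribution function F(x) = x²/4 on 0 < x < 2 (extended by 0 below and 1 above the range):

```python
# Exact checks of the petrol-sales probabilities via the cdf F(x) = x^2/4.
from fractions import Fraction

def F(x):
    """Distribution function of daily petrol sales ('000 litres)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 2:
        return Fraction(1)
    return x * x / 4

assert F(Fraction(3, 4)) == Fraction(9, 64)                     # P(X <= 3/4)
assert F(Fraction(3, 2)) - F(Fraction(1, 2)) == Fraction(1, 2)  # P(1/2 <= X <= 3/2)
```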
Given F(x) = x²/4, f(x) = dF(x)/dx = 2x/4 = x/2, for 0 < x < 2, as required.
[Figure: graphs of the pdf f(x) = x/2 and the cdf F(x) = x²/4 on 0 < x < 2.]