
Theory of Probability

Dr. Amitava Mukherjee


Why Should We Teach and Learn
Theory of Probability
in a Premier B-School?
Uncertainty and Inductive Reasoning

An element of uncertainty is unavoidably present regarding the truth, or correspondence with facts, of a conclusion reached through inductive reasoning.

The Pure Mathematics and Applied Mechanics you have studied so far are broadly based on a prolongation of deductive reasoning.

In deduction, given the premises, the conclusion necessarily follows


from them.

If a piece of deductive reasoning is free from fallacy, its conclusion


is formally valid.

3
Uncertainty and Inductive Reasoning

If the premises are materially valid so is the conclusion, but


deduction as such is not concerned with material validity.

In induction the premises only lend some support to the conclusion; both premises and conclusion relate to the contingent (i.e. situated in space and time) world.

Apart from formal validity, the question of the material


validity or validity as the basis of practical action of the
conclusion (over and above the same of the premises)
naturally arises.

4
Degree of Uncertainty

In the case of every exercise at induction, the question of


assessing the degree of uncertainty, or in other words, the
extent of support given to the conclusion by the evidence, is
relevant.

Until the operations generating the observations (including


additional observations if extensibility is assumed) are
performed, the evidence (extended evidence) that would be
realized remains uncertain and this uncertainty can be
assessed in terms of probability.

5
Degree of Uncertainty

In the case of every exercise at induction, the question of


assessing the degree of uncertainty, or in other words, the
extent of support given to the conclusion by the evidence, is
relevant.

Until the operations generating the observations (including


additional observations if extensibility is assumed) are
performed, the evidence (extended evidence) that would be
realized remains uncertain and this uncertainty can be
assessed in terms of probability.

6
Uncertainty: Objective Approach

In controlled experiments where a set of units are subjected


to different treatments before observation, allocation of the
units to the treatments is similarly randomized.

Example:

1. Inspection of diameters of cork stoppers in a production line
2. Controlled clinical trials

7
Uncertainty: Objective Approach

Uncertainty in the evidence may arise due to one or more of the following causes: natural variation, errors of measurement, sampling variation (incorporating any randomness deliberately introduced), unforeseen contingencies, etc.

8
Uncertainty: Subjective Approach
In the subjective approach there is no question of repetition
of observations.
Here uncertainty only means absence of knowledge about
the evidence and extended evidence, before the generating
operations are performed.
Because of this, the scope for induction is somewhat wider
in the subjective than in the objective approach.

Example:
1. Number of working hours that might be wasted due to a contract labour strike in the next six months.
2. The exchange rate the next morning
9
Meaning of Probability
Various Aspects
Meaning of probability

As regards probability which expresses the uncertainty about the


observables, it is given radically different interpretations in the
objective and subjective approaches.

In the former, roughly speaking, we assume in effect that the


unpredictable variation of the evidence is such that the relative
frequency with which it would belong to any meaningful set in the
evidential domain would tend to stabilize around some idealized
value, if the number of repetitions were made indefinitely large.

The meaningful sets, technically called measurable sets, are those


which are of practical interest and are theoretically easy to handle

11
Meaning of probability

The basis of this assumption, which we call frequential regularity


is our experience with diverse types of particular repetitive
experiments.

This is commonly called statistical regularity [A misnomer]

For any set of interest the probability that the uncertain evidence
will belong to it is identified with the corresponding idealized
long-term relative frequency.

Probabilities, so defined, of all meaningful sets in the evidential


domain determine a probability distribution over the domain and
this gives an objective representation of the evidential uncertainty.

12
Meaning of probability

In the subjective approach probability exists only in one's mind and


may vary from person to person.

For a particular person the probability of any set of interest


represents the person's degree of belief in the materialization of the
event that the evidence (extended evidence) generated through the
operations when they are performed would belong to that set.

In practice this degree of belief can be quantified introspectively,


e.g. by ascertaining the maximum price one is prepared to pay
outright for a unit gain contingent on the actual realization of the
event.

13
Meaning of probability

Ideally one should attach numerical degrees of belief to


different sets of interest in a consistent or coherent manner.

Coherent probabilities for different meaningful sets in the


domain define a probability distribution over it.

Since uncertainty here means absence of knowledge, such a


probability distribution may cover evidence extended
backward or collaterally to involve unobserved characters
belonging to the past or the present.

14
Gambling and Games of Chances
A Fascinating History of Development of
Theory of Probability
Cardano: an unrecognized pioneer

Gerolamo Cardano

24 September 1501 – 21 September 1576

A renowned physician, mathematician,


astrologer, and an inveterate gambler

16
Cardano: an unrecognized pioneer
He wrote a book entitled Liber de Ludo Aleae (The Book on Games
of Chance) around 1564

The book remained unpublished possibly because of various


misfortunes and tragedies that befell the author towards the end of
his life and saw the light of day only in 1663.

Cardano suffered a number of other tragedies as well. Cardano's son


Giambatista poisoned his wife.

Cardano was jailed briefly for heresy (in part for casting the
horoscope of Jesus).

Cardano supposedly predicted the date of his own death, a


prediction that he perhaps ensured by suicide.

17
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano

1. The chance of an event in a random trial represents its long-


run relative frequency.

2. If a die is honest its different faces have equal chance of


appearing.
In fact Cardano makes the statement, "I am as able to throw 1, 3, or 5 as 2, 4, or 6", which suggests that he had something like propensity in mind. From this he identifies the set of equally likely cases (the set of all 36 or 216 permutations) when two or three honest dice are thrown.
He uses the term "circuit" for such a set.

18
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano

3. When the circuit for a trial is well-identified, the chance of an


event is represented by the portion of the whole circuit favourable
to it.
Cardano gives the rule that to obtain the odds we have to consider
in how many ways the favourable result can occur and compare
that number to the remainder of the circuit

4. Cardano correctly uses the rule for addition of probabilities in terms


of disjoint events.
In throwing two dice of 36 equally likely cases, 11 are favourable to
the event at least one ace, 9 additional cases become favourable if
we take the larger event at least one ace or deuce, 7 further cases
come if we consider the still larger at least one ace, deuce, or trey
and so on.
Similar computations are made for three dice.

19
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano
5. Cardano also correctly formulates the product rule for computing
the chance of the simultaneous occurrence of events defined for
independent trials Details will be discussed later.

6. In the case of throwing two dice the odds on getting at least one ace,
deuce, or trey are 3:1.

Cardano states that if the player who wants an ace, deuce, or trey
wagers three ducats [a standard unit of currency at that time] and
the other player one, then the former would win three times and
would gain three ducats and the other once and would win three
ducats; therefore in the circuit of four throws [impliedly in the long
run] they would always be equal.

20
Galileo Galilei: sought to resolve a puzzle about a dice game
Galileo Galilei
15 February 1564 – 8 January 1642

One of the pioneers in introducing


experimental methods in science

21
Basic Ideas and Rules of Probability Theory:
Conceptualized by Galileo

In throwing three dice, the numbers of unordered partitions producing the


total scores 9 ({1, 2, 6}, {1, 3, 5}, {1, 4, 4}, {2, 2, 5}, {2, 3, 4}, {3, 3, 3})
and 10 ({1,3, 6}, {1, 4, 5}, {2, 2, 6}, {2, 3, 5}, {2, 4, 4}, {3, 3, 4}) are
both equal to 6. Yet, why is it that long observation has made dice-players
consider 10 to be more advantageous than 9?

Galileo pointed out that there is a very simple explanation, namely that
some numbers are more easily and more frequently made than others,
which depends on their being able to be made up with more variety of
numbers.

A variety of numbers making up a score here represents an ordered


partition. There being 27 such ordered partitions for the score 10 and 25
for the score 9 and all ordered partitions or permutations being equally
likely, the chance of getting a 10 is higher.

22
Probability is officially born: Pascal and Fermat
Blaise Pascal
19 June 1623 – 19 August 1662

A French (Parisian) mathematician,


physicist, inventor, writer and Christian
philosopher.

Pascal solved some problems on games of chance, including those Cardano had attempted but failed to solve, through correspondence with his friend Pierre de Fermat (1601–1665), stationed at Toulouse.

23
Probability is officially born: Pascal and Fermat
Pierre de Fermat
17 August 1601 (or 1607) – 12 January 1665

Although a jurist by profession, Fermat had


become famous for his contributions to
mathematics and the other branches of
knowledge

24
First Published Book on Probability

Christiaan Huygens
14 April 1629 – 8 July 1695

A prominent Dutch mathematician and scientist, known particularly as an astronomer, physicist, probabilist and horologist.

Wrote the book entitled De Ratiociniis in


Ludo Aleae (Computations in Games of
Chance), published in 1657 - The first
published book on probability

25
Applications: Probability in Finance
Consider a game with only two players: they alternate moves, each is immediately informed of the other's moves, and one or the other wins.

In such a game, one player has a winning strategy, and so we


do not need the subtle solution concepts now at the center of
game theory in economics and the other social sciences.

Reference: Probability and Finance: It's Only a Game!, by Glenn Shafer and Vladimir Vovk, 2001, John Wiley & Sons, Inc.

26
Probability in Finance
Consider a straightforward but rigorous framework for elaboration, with no
extraneous mathematical or philosophical baggage, of two ideas that are
fundamental to both probability and finance:

The Principle of Pricing by Dynamic Hedging : [Can be discerned in the


letters of Blaise Pascal to Pierre de Fermat in 1654] When simple gambles
can be combined over time to produce more complex gambles, prices for
the simple gambles determine prices for the more complex gambles.

The Hypothesis of the Impossibility of a Gambling System: Sometimes


we hypothesize that no system for selecting gambles from those offered to
us can both (1) be certain to avoid bankruptcy and (2) have a reasonable
chance of making us rich.

27
Probability in Marketing

A company might like to estimate the probability that the volume of sales increases by Rupees ten million given a particular marketing campaign.

Probability models are used to measure consumer lifetime


value.

28
Elementary Calculus
Probability:
Classical and Frequentist
Approaches
Calculus of Probability :
Connections with Set Theory

Set Theory | Probability Theory | Notation
Element | Outcome / elementary event | ω ∈ S
Set | (Compound) event: a collection of elementary events | A, B, …
Universal set | Sample space, or the sure event | S
Null set | Impossible event | ∅
Complement of a set A | Complementary event of A | A^c
A is a subset of B (B is a superset of A) | Occurrence of event A implies occurrence of event B | A ⊂ B
Union of sets A and B | Occurrence of event A or B (or both) | A ∪ B
30
Calculus of Probability :
Connections with Set Theory
Set Theory | Probability Theory | Notation
Intersection of sets A and B | Joint occurrence of events A and B | A ∩ B
A and B are disjoint sets | A and B are mutually exclusive events | A ∩ B = ∅
A and B are exhaustive sets | A and B are exhaustive events | A ∪ B = S
Power set: the set of all subsets of S, including the empty set and S itself | (Countable) sigma-field | B

31
Classical Definition of Probability :
As in Théorie analytique des probabilités
by Pierre-Simon Laplace

"The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible, when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible."

The probability of an event A is defined a priori, without actual experimentation, as

P(A) = (number of cases favourable to A) / (total number of equally possible cases),

provided all these outcomes are equally likely.

32
Simple Examples

Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball.

Probability of selecting a white ball is n / (n + m).

We can use the classical definition to determine the probability that a given number is divisible by a prime p. If p is a prime number, then every p-th number (starting with p) is divisible by p. Thus among p consecutive integers there is one favourable outcome, and hence

P = 1/p.
33
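The classical definition lends itself to direct computation by counting. A minimal Python sketch of the two examples above (the specific numbers n = 3, m = 7 and p = 7 are illustrative choices, not from the slides):

```python
from fractions import Fraction

def classical_probability(favourable, total):
    """Classical definition: ratio of favourable to total equally likely cases."""
    return Fraction(favourable, total)

# Box with n white and m red balls: P(white) = n / (n + m)
n_white, m_red = 3, 7
print(classical_probability(n_white, n_white + m_red))   # 3/10

# Divisibility by a prime p: among p consecutive integers exactly one is divisible by p
p = 7
favourable = sum(1 for k in range(1, p + 1) if k % p == 0)
print(classical_probability(favourable, p))               # 1/7
```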
Frequentist Definition of Probability :

The frequentist view may have been foreshadowed by


Aristotle, in Rhetoric, when he wrote:

the probable is that which for the most part happens

34
Frequentist Definition of Probability :

In the frequentist interpretation, probabilities are discussed


only when dealing with well-defined random experiments (or
random samples).

The set of all possible outcomes of a random experiment is


called the sample space of the experiment.

An event is defined as a particular subset of the sample space


to be considered.

35
Frequentist Definition of Probability :

For any given event, only one of two possibilities may hold: it
occurs or it does not.

The relative frequency of occurrence of an event, observed in


a number of repetitions of the experiment, is a measure of the
probability of that event.

This is the core conception of probability in the frequentist


interpretation.

36
Frequentist Definition of Probability :

Thus, if n is the total number of trials and n_A is the number of trials in which the event A occurred, the probability P(A) of the event A occurring will be approximated by the relative frequency as follows:

P(A) ≈ n_A / n.

Clearly, as the number of trials is increased, one might expect the relative frequency to become a better approximation of a "true frequency".

37
Frequentist Definition of Probability :

A claim of the frequentist approach is that in the "long run,"


as the number of trials approaches infinity, the relative
frequency will converge exactly to the true probability:


( ) = lim .

38
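The claimed long-run stabilisation of relative frequencies can be illustrated by simulation. A small sketch (the event "a die shows an ace", the seed and the trial counts are arbitrary illustrations):

```python
import random

# Monte Carlo sketch of frequential regularity: the relative frequency of the
# event "die shows an ace" stabilises around its probability 1/6 as the number
# of trials grows.
random.seed(42)
for n in (100, 10_000, 1_000_000):
    n_A = sum(1 for _ in range(n) if random.randint(1, 6) == 1)
    print(n, n_A / n)
# The printed relative frequencies approach 1/6 ≈ 0.1667 as n increases.
```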
Traveler's Choices

There are three major possible options available to a travel


agency for its customer who wants to travel to New Delhi
from Jamshedpur

Direct train to New Delhi


By Road/Rail to Ranchi and Flight from Ranchi
By Rail/road to Kolkata and Flight from Kolkata

The agency has records of previous bookings on the same route over the last few years that will help it assess the probable choices of customers.

39
Other Applications

Proportion of loan applications from SMEs (micro, small and medium-sized enterprises) rejected by a major bank

Proportion of defective items produced by a manufacturing unit

In estimating the lifetime of a product: the proportion of electric bulbs surviving after 1000 hours in operation.
In fact, in estimating the probability of survival of an electric/electronic device after a certain number of hours, we can actually use a lifetime distribution that we will study later.
Not only for consumer durables; we may think of consumer lifetime as well.

40
Combinatorics :
Arrangement of r balls in n cells

Four possible cases according to:
Whether the balls are distinguishable or not
Whether the exclusion principle is followed (a cell cannot hold more than one ball) or not

41
Combinatorics :
Arrangement of r balls in n cells
Exclusion principle followed:
  Balls distinguishable: n!/(n − r)! if r ≤ n, and 0 otherwise
  Balls indistinguishable: C(n, r) (Fermi–Dirac statistics)
Exclusion principle not followed:
  Balls distinguishable: n^r (Maxwell–Boltzmann statistics)
  Balls indistinguishable: C(n + r − 1, r) (Bose–Einstein statistics)
  Special case, no cell remains empty: C(r − 1, n − 1)
42
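The four counts in the table can be checked with Python's standard combinatorial functions; a sketch with the illustrative values n = 4 cells and r = 2 balls:

```python
from math import comb, perm

n, r = 4, 2

print(perm(n, r))            # distinguishable, exclusion: n!/(n-r)! = 12
print(n ** r)                # distinguishable, no exclusion (Maxwell-Boltzmann): 16
print(comb(n, r))            # indistinguishable, exclusion (Fermi-Dirac): 6
print(comb(n + r - 1, r))    # indistinguishable, no exclusion (Bose-Einstein): 10
print(comb(r - 1, n - 1))    # no cell empty (needs r >= n); here r < n, so 0
```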
Application

A random sample of size r with replacement is taken from a population of n elements. What is the probability that in the sample no element appears twice, that is, that the sample could have been realized also by sampling without replacement?

We see that there are n^r possible samples in all, of which n(n − 1)⋯(n − r + 1) satisfy the stipulated condition. Assuming that all arrangements have equal probability, we conclude that the probability of no repetition in our sample is

n(n − 1)⋯(n − r + 1) / n^r = (1 − 1/n)(1 − 2/n)⋯(1 − (r − 1)/n).
43
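The same product formula underlies the familiar birthday problem; a brief sketch (the helper name and the values n = 365, r = 23 are illustrative):

```python
from math import perm

def p_no_repetition(n, r):
    """Probability that a sample of size r drawn with replacement from n
    elements shows no repetition: n(n-1)...(n-r+1) / n**r."""
    return perm(n, r) / n ** r

# Birthday-style check: with n = 365 "cells" and r = 23 people,
# the probability of no shared birthday is already below one half.
print(p_no_repetition(365, 23))   # ~0.4927
```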
Industrial Implications
If in a coal mine 12 accidents occur each year, then practically every year will contain months with two or more accidents. The probability that all twelve months will have one accident each is only 12!/12^12 ≈ 0.0000537.

On average, only one year out of about 18,614 years will show a uniform distribution of one accident per month.

This example reveals an unexpected characteristic of pure randomness.
This type of argument is often used for fraud detection.

44
Extensions

The number of ways to deposit r distinct objects into k cells, with r_i objects in cell no. i (the r_i being non-negative integers summing to r), is

r! / (r_1! r_2! ⋯ r_k!)

(the ordering of the bins is important, but within each bin the ordering is not important).

45
More Example

A throw of twelve dice can result in 6^12 different outcomes, which we consider equally likely. The event that each face appears twice can occur in as many ways as twelve dice can be arranged in six groups of two each. The probability of that event is therefore

12! / (2^6 · 6^12) ≈ 0.0034.

46
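Both small probabilities quoted above can be verified in one line each:

```python
from math import factorial

# Coal-mine example: all 12 accidents fall in different months.
print(factorial(12) / 12 ** 12)            # ~5.37e-05, i.e. about 1 year in 18614

# Twelve dice: each face appears exactly twice.
print(factorial(12) / (2 ** 6 * 6 ** 12))  # ~0.003438
```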
Application In Industrial Quality Control
Items are sampled from a collection of N items and inspected for defects. Assume that there are n defective items in the lot of N items. What is the probability of getting k defective items in a sample of m items?
Problems of this type lead to the genesis of the hypergeometric distribution.

In practice, the total population size N as well as m and k are known, but the number n of defective items in the population is unknown.
o The latter may be estimated by maximizing the likelihood of the sample, and may be given a confidence interval using standard statistical estimation.

47
Estimating Population Size of Fish in a
Lake [capture-recapture ]
Consider the following experiment in an attempt to estimate the number of fish in a lake. First, m fish are captured, marked, and released. At a later time, c fish are caught, with k of them bearing the mark of the original capture. Assuming the size of the population of fish is N, the probability of getting k marked fish in the second capture is

P(k) = C(m, k) · C(N − m, c − k) / C(N, c).

In this case (m, c, k) are known but N is unknown. We can estimate N or construct confidence intervals using the likelihood (the probability of the observed data as a function of the unknown parameter N).
For example, if k = 100 and m = c = 1000, we have an approximately 93% confidence interval that N belongs to (8500, 12000).

48
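A sketch of the likelihood-based estimate of the population size N using the slide's numbers; the grid of candidate values and the function name are illustrative choices:

```python
from math import comb

def likelihood(N, m, c, k):
    """Hypergeometric probability of seeing k marked fish when c fish are
    recaptured from a population of size N containing m marked fish."""
    return comb(m, k) * comb(N - m, c - k) / comb(N, c)

# Numbers quoted on the slide: m = c = 1000 and k = 100.
m, c, k = 1000, 1000, 100
# Coarse grid search for the maximum-likelihood population size.
candidates = range(2000, 20001, 100)
N_hat = max(candidates, key=lambda N: likelihood(N, m, c, k))
print(N_hat)   # about m*c/k = 10000
```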
Criticism of
Classical Definition of Probability :
Mathematicians find the definition to be circular.
The probability for a "fair" coin is... A "fair" coin is defined by a
probability of...

The definition is very limited. It says nothing about cases where no


physical symmetry exists.
Insurance premiums, for example, can only be rationally priced by
measured rates of loss.

It is not trivial to justify the principle of indifference except in the


simplest and most idealized of cases. Coins are not truly symmetric.
Can we assign equal probabilities to each side? Can we assign equal
probabilities to any real world experience?
49
σ-algebra
A non-empty collection B of subsets of the sample space S is called a sigma algebra (or a Borel field, for events over the real line) if it satisfies the following two properties:
a. If A ∈ B, then A^c ∈ B
(B is closed under complementation).
b. If A_1, A_2, … ∈ B, then ∪_{i=1}^∞ A_i ∈ B
(B is closed under countable unions).

To show: B is closed under finite unions.
To show: ∅ ∈ B (the empty set is an element of B) and S ∈ B (the sample space is an element of B).
50
σ-algebra

Easy to realize that the power set generated by a countable sample space is always a σ-field.

Let S = {a, b, c, d} and B = {∅, {a, b}, {c, d}, S}. Can we consider B as a σ-field?

Probability is a measure (set function) defined on (S, B).
(S, B) is known as a probabilizable space.
51
More on σ-algebra

Example-1 (Sigma algebra-I). If S is finite or countable, we define, for a given sample space S, B = {all subsets of S, including S itself}. If S has n elements, there are 2^n sets in B. For example, if S = {1, 2, 3}, then B is the following collection of 2^3 = 8 sets: ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}.
Example-2 (Sigma algebra-II). Let S = (−∞, ∞), the real line. Then B is chosen to contain all sets of the form [a, b], (a, b], (a, b) and [a, b) for all real numbers a and b. Also, from the properties of B, it follows that B contains all sets that can be formed by taking (possibly countably infinite) unions and intersections of sets of the above varieties.

52
Axiomatic Definition
of Probability
and
Probability Laws
Axiomatic Definition of Probability
[By Andrey Kolmogorov]
Probability of an event A, denoted by P(A), is a set function (also called a measure) defined on a sample space S with σ-field B (also called the event space), satisfying the following axioms:
Axiom of non-negativity: the probability of an event is a non-negative real number:
P(A) ≥ 0 for every A ∈ B.
Axiom of unity: the probability that at least one of the elementary events in the entire sample space will occur is 1; more specifically, there are no elementary events outside the sample space:
P(S) = 1.
Axiom of countable additivity: for any countable sequence of disjoint (synonymous with mutually exclusive) events A_1, A_2, …,
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

54
Andrey Kolmogorov

Andrey Nikolaevich Kolmogorov


25 April 1903 – 20 October 1987
A 20th-century Russian mathematician
who made significant contributions to the
mathematics of probability theory,
topology, intuitionistic logic, turbulence,
classical mechanics, algorithmic
information theory and computational
complexity.

55
Important Results That Follow From the Probability Axioms
Result-1: For the impossible event ∅, we necessarily have P(∅) = 0.

Result-2: The probability function P is finitely additive; that is, if A_i ∈ B (for i = 1, 2, …, n) and these events are disjoint, then
P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).

Result-2.A: If A_i, i = 1, 2, …, n, are exhaustive and mutually exclusive events in B, then
Σ_{i=1}^n P(A_i) = 1.

Result-2.B: Rule for the complementary probability of any event A: for any event A ∈ B, P(A^c) = 1 − P(A).
Important Results - Continued

Result-3. The probability function P is monotone; that is, if A and B are events such that A ⊂ B, then P(A) ≤ P(B).

The numeric bound: it immediately follows from the monotonicity property that for any event A, 0 ≤ P(A) ≤ 1.

Result-4. The probability function P is subtractive; that is, if A and B are events such that A ⊂ B, then
P(B − A) = P(B) − P(A).
Important Results - Continued

Result-5. Rule for the probability of a union of any events A, B, not necessarily mutually exclusive: if A and B are any two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Imagine the corresponding rule for three or more events.

Example: Selection and Allocation Dilemma

A company recruits 10 students for summer internship from a B-


School for four functional areas, namely, Analytics, Finance,
Marketing and Operations. The students are nearly equally
efficient in terms of their expertise in each of these four
functional areas. On the first day of their reporting, the HR manager, in a hurry, allotted them almost at random to the four
functional areas, without taking much care about the requirement
of various areas. What is the probability that Analytics area will
receive exactly 4 of the students?
Hint Answer

Note: You may consider the students indistinguishable based on their skillset.

That is, the total number of equally likely arrangements is the same as that of arranging 10 indistinguishable balls in 4 cells: C(10 + 4 − 1, 4 − 1) = C(13, 3).

The number of arrangements favourable to the desired event is the same as that of arranging 6 (= 10 − 4) balls in 3 (= 4 − 1) cells: C(6 + 3 − 1, 3 − 1) = C(8, 2).

Check: Required probability = C(8, 2) / C(13, 3) = (8 · 7 · 3)/(11 · 12 · 13) = 28/286 = 14/143 ≈ 0.0979.
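A quick check of the hint, treating the equally likely arrangements as "stars and bars" compositions as stated above:

```python
from math import comb
from fractions import Fraction

# Arrangements of 10 indistinguishable students in 4 areas, assumed equally likely.
total = comb(10 + 4 - 1, 4 - 1)          # C(13, 3) = 286
favourable = comb(6 + 3 - 1, 3 - 1)      # C(8, 2)  = 28, Analytics gets exactly 4
print(Fraction(favourable, total))       # 14/143 ≈ 0.0979
```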
Example
A large shopping complex has 15 entry gates. Usually, one security person is deployed at each of these gates. Security personnel can usually chat with the colleagues deployed at the adjacent (both right and left) gates. The personnel at the 1st and 15th gates will be able to talk with only one colleague each. It was observed from past CCTV footage that two personnel, say A and B, whenever deployed at adjacent gates, gossip more and do not take the job seriously! Management has ordered the chief security officer that A and B should be deployed so that there are 10 other personnel between them. One day the chief security officer was absent, and another person who had no idea about the order allotted the 15 personnel to the 15 gates at random. What is the probability that the requirement will be met even in that case?
Hint Answer

Required probability = (2! × 4 × 13!) / 15! = 4/105 ≈ 0.038
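Two ways to check the hinted answer 4/105: an exact count of admissible gate positions for A and B, and a Monte Carlo simulation (seed and trial count are arbitrary choices):

```python
import random
from fractions import Fraction

# Exact count: A and B must occupy gate positions differing by exactly 11,
# so that 10 gates lie between them.
pairs = [(i, j) for i in range(1, 16) for j in range(1, 16) if j - i == 11]
print(Fraction(2 * len(pairs), 15 * 14))       # ordered placements of A and B: 4/105

# Monte Carlo check of the same probability.
random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    gates = random.sample(range(1, 16), 15)    # random assignment of personnel to gates
    a, b = gates.index(1), gates.index(2)      # personnel 1 and 2 play the roles of A and B
    hits += abs(a - b) == 11
print(hits / trials)                           # close to 4/105 ≈ 0.0381
```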
When Exact Probability is Untraceable

Result-6 (Boole's inequality). If A_i, i = 1, 2, …, n, are any events, then

P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).

Result-7 (Bonferroni's inequality). If A_i, i = 1, 2, …, n, are any events, then

P(∩_{i=1}^n A_i) ≥ 1 − Σ_{i=1}^n P(A_i^c),

or equivalently

P(∩_{i=1}^n A_i) ≥ Σ_{i=1}^n P(A_i) − (n − 1).
Example

Over the years, the culture of binge drinking has spread in premier B-schools across the globe despite the honest efforts of various managements to curb irresponsible drinking behaviour of students. After a booze night in a hostel, it was reported that 80% of the students of one hostel consumed Beer, 70% enjoyed Whisky and 60% relished Vodka. What is the proportion of stalwarts who tried all three that night?
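With only the three marginal percentages given, Bonferroni's inequality yields a lower bound; a one-line check:

```python
# Bonferroni's inequality applied to the booze-night figures above:
# P(Beer ∩ Whisky ∩ Vodka) >= P(Beer) + P(Whisky) + P(Vodka) - (3 - 1).
p_beer, p_whisky, p_vodka = 0.80, 0.70, 0.60
lower_bound = p_beer + p_whisky + p_vodka - 2
print(max(lower_bound, 0.0))   # 0.10 -> at least 10% tried all three
```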
Union of More than Two
Events
Poincaré's Theorem

For any sequence of events A_i, i = 1, 2, …, n (not necessarily mutually exclusive),

P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ⋯ + (−1)^{n−1} P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n).

For a proof, use the method of induction starting from Result-5.

Example of Booze Night - continued

It was further reported that, because of the popular perception that "Whisky after Beer, there is no fear; Beer after Whisky, it is risky", 60% of the students tried Whisky after Beer but no one tried Beer after Whisky. Moreover, 50% tried both Beer and Vodka and 40% actually tried Baseball Pleasure (a cocktail of Vodka and Whisky). About 35% tried all three in some way or other. What is the probability that a student did not drink at all on the booze night?

Is this information consistent?

Conditional Probability
Perception of Conditioning in Random
Experiment: Relative Frequency Context
In N independent trials, suppose N_A, N_B and N_{AB} denote the number of times the events A, B and A∩B occur, respectively. According to the frequency interpretation of probability, for large N

P(A) ≈ N_A/N, P(B) ≈ N_B/N and P(A∩B) ≈ N_{AB}/N.

Among the N_B occurrences of B, only N_{AB} are also occurrences of A. Thus the ratio N_{AB}/N_B may be looked upon as a measure of the event A given that B has already occurred. Now,

N_{AB}/N_B = (N_{AB}/N) / (N_B/N) ≈ P(A∩B)/P(B).

69
Definition of Conditional Probability
Consider the probability space (S, B, P). The conditional probability of an event A given that another event B (also belonging to the same B) has occurred, denoted by P(A|B), is defined as

P(A|B) = P(A∩B)/P(B), provided P(B) ≠ 0.

If P(B) = 0, we set P(A|B) = 0.
The above definition satisfies all the probability axioms discussed earlier [please check by yourself].

70
Justification of the Definition of Conditional
Probability in the Light of Three Axioms
(i) P(B) > 0 by definition and P(A∩B) ≥ 0 by the axiom of non-negativity. Therefore, P(A|B) = P(A∩B)/P(B) ≥ 0.

(ii) Note that S∩B = B. Therefore,
P(S|B) = P(S∩B)/P(B) = P(B)/P(B) = 1.

(iii) Suppose A_1, A_2, … are mutually disjoint. Then, for any B and i ≠ j,
(A_i∩B) ∩ (A_j∩B) = (A_i∩A_j) ∩ B = ∅ ∩ B = ∅.
Therefore
P(∪_{i=1}^∞ A_i | B) = P((∪_{i=1}^∞ A_i) ∩ B)/P(B) = Σ_{i=1}^∞ P(A_i∩B)/P(B) = Σ_{i=1}^∞ P(A_i|B).

Hence P(·|B) satisfies all the probability axioms and thus defines a legitimate probability measure.
71
Properties of Conditional Probability
(i) If B ⊂ A, then P(A|B) = 1.
If B ⊂ A, then A∩B = B; therefore P(A∩B) = P(B). Hence,
P(A|B) = P(A∩B)/P(B) = P(B)/P(B) = 1,
since the occurrence of B implies the automatic occurrence of the event A.
Example: the probability that a G20 member is selected given that it is a BRICS member.
A = all G20 members = {Argentina, Australia, Brazil, Canada, China, France, Germany, India, Indonesia, Italy, Japan, South Korea, Mexico, Russia, Saudi Arabia, South Africa, Turkey, UK, USA, EU}
B = all BRICS members = {Brazil, Russia, India, China, South Africa}
72
Properties of Conditional Probability

(ii) If A ⊂ B, then P(A|B) ≥ P(A).
If A ⊂ B, then A∩B = A; therefore P(A∩B) = P(A). Hence,
P(A|B) = P(A∩B)/P(B) = P(A)/P(B) ≥ P(A).

(iii) The Law of Compound Probability.
When expressed in product form we get
P(A∩B) = P(A|B) · P(B).
73
Theorem of Compound Probability

When we have three events A, B and C, we have
P(A∩B∩C) = P((A∩B)∩C) = P(A∩B) · P(C | A∩B) = P(A) · P(B|A) · P(C | A∩B).

By an easy induction we obtain, for n events A_1, A_2, …, A_n,

P(A_1∩A_2∩⋯∩A_n) = P(A_1) · P(A_2|A_1) · ⋯ · P(A_n | A_1∩A_2∩⋯∩A_{n−1}).
74
Probability of Complementary Event
with Conditioning Event

Suppose that, given B, either A or A^c (both belonging to B) can take place.

So we have
P(A^c | B) = 1 − P(A | B).

We complement the event that is conditioned, not the conditioning event, in computing such probabilities.

75
Law of independence
How should we interpret the equation
P(B|A) = P(B)?
It shows that A's occurrence has had no impact on B. We then say that B is independent of A.

We now ask the following: if B is independent of A, then is A also independent of B?

The answer is yes, as the equation P(A∩B) = P(A)·P(B) also implies the equation P(A|B) = P(A).

Thus the relationship of independence is symmetric. So from now on we shall say A and B are independent events whenever either one is independent of the other.
77
Testing Independence of Two Events

This will mean
P(A|B) = P(A),
P(B|A) = P(B),
and
P(A∩B) = P(A) · P(B).

To show that A and B are independent events, we may verify any one of the above three equations.

78
Testing Independence of Two Events

Example: Three coins are tossed.
A = first coin shows heads; B = second coin shows heads.
Then P(A) = 1/2 = P(B) and P(A∩B) = 1/4. So,
P(A∩B) = P(A) · P(B).

This verifies that A and B are independent events.

79
Testing Independence of Two Events

If A and B both belong to B and are mutually independent, then
A and B^c are independent,
A^c and B are independent,
A^c and B^c are independent.

80
Difference between Mutually Exclusive
Events and Independent Events
Note carefully that, if A and B are mutually exclusive, then P(A∩B) = 0. From the definition of conditional probability, we see that
P(A|B) = P(B|A) = 0.

From this we would have, respectively,
P(A|B) = 0 ≠ P(A) if P(A) ≠ 0, and P(B|A) = 0 ≠ P(B) if P(B) ≠ 0.
Thus, unless either A or B is the null event ∅, A and B are not independent when they are mutually exclusive. Alternatively, if A and B are mutually exclusive, the occurrence of B must depend on A, since if A occurs then B can never do so. If either A or B is equal to ∅, then A and ∅, or B and ∅, are independent.

81
Difference between Pairwise Independence
and Complete Independence of Events
If A, B and C are three events, they will be pairwise independent if
P(A∩B) = P(A)·P(B),
P(A∩C) = P(A)·P(C),
P(B∩C) = P(B)·P(C).

A, B and C will be completely independent if, along with the above three equations, the following also holds:
P(A∩B∩C) = P(A)·P(B)·P(C).

How many equations need to be satisfied if the events A_1, A_2, …, A_n have to be completely independent?

82
Some Final Remarks on
Achievements and Failures of
Gerolamo Cardano

83
Cardano: an unrecognized pioneer

5. Cardano also correctly formulates the product rule for computing


the chance of the simultaneous occurrence of events defined for
independent trials:

In terms of odds he says that if, out of n equally likely cases, just m are favourable to an event, then in r [independent] repetitions of the trial the odds that the event would occur every time are as m^r / (n^r − m^r), which, writing p for m/n, becomes p^r / (1 − p^r). In particular, in throwing three dice, 91 out of 216 cases are favourable to the event "at least one ace".

If the three dice are thrown thrice, Cardano correctly gets that the odds of getting the event every time are a little less than 1 to 12.

84
Two problems : Cardano discussed but
failed to solve correctly
1. Problem of minimum number of trials: What should be the
minimum value of r, the number of throws of two dice,
which would ensure at least an even chance for the
appearance of one or more double sixes?

2. Problem of division: Two players start playing a series of


independent identical games in each one of which one or
other would win, with the agreement that whoever wins a
pre-fixed number of games first would win the series and the
total stake. The series is interrupted when the two players
have respectively a and b games still to win. What division
of the total stake between the players would be fair?

For Brainstorming

85
Some Important Theorems
Theorem of Total Probability

There are three machines producing cork stoppers in a manufacturing unit. One machine is old and produces about 5% defective items. The other two machines produce 2% and 3% defective items respectively. The probability of finding a defective item at the time of inspection is actually connected with
I. the machine on which that cork stopper was produced, and
II. the proportional contribution of the three machines to the total production of cork stoppers under inspection.
We can use the conditional probability to express the probability
of a complicated event in terms of simpler related events. The
theorem of total probability helps us in achieving this.

87
Theorem of Total Probability

If a sequence of events B_n, n = 1, 2, 3, …, forms a finite or countably infinite partition of a sample space (in other words, the set of events is exhaustive as well as mutually exclusive), and provided P(B_n) > 0 for each n, then for any event A,

P(A) = Σ_n P(A|B_n) · P(B_n).

If P(B_n) = 0 for some n, we should take away those events and work with the rest.

88
Theorem of Total Probability

The summation can be interpreted as a weighted average, and consequently the marginal probability P(A) is sometimes called the average probability.

Special case: if 0 < P(B) < 1, so that P(A|B) and P(A|B^c) are both defined, then
P(A) = P(A|B) · P(B) + P(A|B^c) · P(B^c).

89
Example: Theorem of Total Probability

Suppose that two factories supply light bulbs to the market.


Factory X's bulbs work for over 5000 hours in 99% of cases,
whereas factory Y's bulbs work for over 5000 hours in 95% of
cases. It is known that factory X supplies 60% of the total
bulbs available. What is the chance that a purchased bulb will
work for longer than 5000 hours?

90
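A direct application of the special case of the theorem of total probability to the light-bulb question:

```python
# Theorem of total probability for the light-bulb example above.
p_x, p_y = 0.60, 0.40                      # market shares of factories X and Y
p_work_given_x, p_work_given_y = 0.99, 0.95
p_work = p_work_given_x * p_x + p_work_given_y * p_y
print(p_work)                              # 0.974
```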
Bayes' Theorem

91
Laplace Form of Bayes' Theorem

Let A_i, i = 1, 2, 3, …, be a sequence of exhaustive and mutually exclusive events such that P(A_i) > 0 for each i. Then for any event B,

P(A_i | B) = P(B | A_i) · P(A_i) / Σ_j P(B | A_j) · P(A_j),

provided, of course, P(B) > 0.

92
Rev. Thomas Bayes

Thomas Bayes (c. 1701 – 7 April 1761)

An English statistician, philosopher and Presbyterian minister

Bayes never published what would eventually become his


most famous accomplishment

His notes were edited and published after his death


by Richard Price.

93
Application of Bayes Theorem

The entire output of a factory is produced on three machines.


The three machines account for 20%, 30%, and 50% of the
output, respectively. The fraction of defective items
produced is this: for the first machine, 5%; for the second
machine, 3%; for the third machine, 1%. If an item is chosen
at random from the total output and is found to be defective,
what is the probability that it was produced by the third
machine?

94
Bayesian (or epistemological) interpretation
of the Theorem

Measure of a degree of belief: Bayes' theorem links the degree of


belief in a proposition before and after accounting for evidence.

Example: Suppose it is believed with 50% certainty that a coin is


twice as likely to land heads than tails. If the coin is flipped a number
of times and the outcomes observed, that degree of belief may rise,
fall or remain the same depending on the results.

For proposition A and evidence B,


P(A), the prior, is the initial degree of belief in A.
P(A|B), the posterior, is the degree of belief having accounted for B.
the quotient P(B|A)/P(B) represents the support B provides for A.
95
Miscellaneous Problems and Results
Pairwise Independent but not
Completely Independent -An Example
Suppose an investor may choose to invest in any of the three options available to him, namely Fixed Deposits (F), Mutual Funds (M) or Stocks (S), with
P(F only) = 0.16
P(M only) = 0.16
P(S only) = 0.16
P(F and M only) = 0.08
P(F and S only) = 0.08
P(M and S only) = 0.08
P(F, M and S) = 0.28
Then P(F) = P(M) = P(S) = 0.6 and P(F∩M) = P(F∩S) = P(M∩S) = 0.36 = 0.6 × 0.6, while P(F∩M∩S) = 0.28 ≠ 0.6³.
Here the events are pairwise independent but not completely independent.
Poincaré's Theorem Through Logical Reasoning
The probability P_1 of the realization of at least one among the events A_1, A_2, …, A_N is given by
P_1 = S_1 − S_2 + S_3 − S_4 + ⋯,
where
S_r = Σ_{1 ≤ i_1 < i_2 < ⋯ < i_r ≤ N} P(A_{i_1} ∩ A_{i_2} ∩ ⋯ ∩ A_{i_r}).

Earlier we noted that this can be proved easily by the method of induction. We shall now consider an elegant proof.
Poincaré's Theorem Through Logical Reasoning
Let us consider the so-called method of inclusion and exclusion. To compute P_1 we should add the probabilities of all sample points which are contained in at least one of the A_i, but each point should be taken only once.

To proceed systematically we first take the points which are contained in only one A_i, then those contained in exactly two events A_i, and so forth, and finally the points (if any) contained in all the A_i.

Now let E be any sample point contained in exactly n among our N events A_i.

Without loss of generality we may number the events so that E is contained in A_1, A_2, …, A_n, but not contained in A_{n+1}, A_{n+2}, …, A_N.
Poincaré's Theorem Through Logical Reasoning
Then P({E}) appears as a contribution to those p_i, p_{ij}, p_{ijk}, … whose subscripts range from 1 to n, where
p_i = P(A_i), p_{ij} = P(A_i ∩ A_j), p_{ijk} = P(A_i ∩ A_j ∩ A_k), ….

Hence P({E}) appears n times as a contribution to S_1, C(n, 2) times as a contribution to S_2, etc. In all, when the right-hand side is expressed in terms of the probabilities of sample points, we find P({E}) with the factor

n − C(n, 2) + C(n, 3) − ⋯ ± C(n, n).

It remains to show that the above series is equal to 1.

This follows at once on comparing it with the binomial expansion of (1 − 1)^n. The latter starts with 1, and then come the terms of the above series with reversed signs. Hence for every n ≥ 1 the expression equals 1.
Complex Application of Poincaré's Theorem

An NBFS has a presence in 25 cities in India, with one branch office in each of these 25 cities. A group of 25 managers was selected, one from each of those 25 city branches, for a Management Development Programme at XLRI. After the training these 25 managers were deputed at random to the 25 city branches so that each branch received exactly one manager. What is the probability that no manager was posted to the same branch where the person used to work before the training?
Complex Application of Poincaré's Theorem

Note that the probability of at least one match is:

C(25,1)·24!/25! − C(25,2)·23!/25! + C(25,3)·22!/25! − ⋯ − C(25,24)·1!/25! + C(25,25)·0!/25!
= 1/1! − 1/2! + 1/3! − ⋯ + 1/25!.

Therefore the required probability of no match, being the complement of "at least one match", is

1 − 1/1! + 1/2! − 1/3! + ⋯ − 1/25!.

When the number of persons is large, this probability actually tends to e^{−1} ≈ 0.3679.

To express e, remember to memorize a sentence to simplify this. (The number of letters in each word of that sentence gives the successive digits of e: 2.7182818284.)

An Extension of Classical Definition to
Geometrical Probability
Example-1: Courier person comes regularly once to the office to
pick up consignments at a random time between 12.30 pm and
1.30 pm and stays about 10 minutes. If you prepare an urgent
consignment at 12:55 pm, how likely are you to dispatch the
letter without any hassle on the same day from the office ?

Example-2: Both the bus and you get to the bus stop at random
times between 12noon and 1pm. When the bus arrives, it waits
for 5 minutes before leaving. When you arrive, you wait for 20
minutes before hiring a cab if the bus doesn't come. What is the
probability that you catch the bus?
An Extension of Classical Definition to
Geometrical Probability

To solve this type of problem we consider geometrical probability.

Suppose A is a smaller part of S. Then, under the equally-likely framework of the classical definition, the probability that a randomly chosen point of S falls inside the part A is
P = measure(A) / measure(S),
where the measure is length, area or volume as appropriate.

Some famous problems are Buffon's needle problem and the Bertrand paradox (see Wikipedia).
Solution to Example-2 using
Geometrical Probability
We have two continuous variables here: , the time in minutes
past 12 noon that the bus arrives, and , the time in minutes past
12 noon that you arrive. Since there are 2 independent variables,
we will convert this into a 2-dimensional geometry problem.
Specifically, we can think of the set of all outcomes as the points
in a square:
Solution to Example-2 using
Geometrical Probability
Then, we need to determine the region of "success"; that is, the
points where we catch the bus. Since the bus will wait for 5
minutes, you need to arrive within 5 minutes of the bus' arrival,
or + 5.
Solution to Example-2 using
Geometrical Probability

However, you only wait for 20 minutes, so you can't arrive more than 20 minutes before the bus; that is, x − y ≤ 20, or y ≥ x − 20.
Solution to Example-2 using
Geometrical Probability
Combining our two conditions, we have a region of success as
shown below
Solution to Example-2 using
Geometrical Probability
Now, we just need to find the area of this success region. A
simple method is to find the area of the non-success region, and
then subtract that from the total area:

Thus, the probability of catching the bus is:

P = (area of success region) / (total area) = (60² − 55²/2 − 40²/2) / 60² = 1287.5/3600 = 103/288 ≈ 0.358.
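A Monte Carlo check of the geometric answer (seed and number of trials are arbitrary choices):

```python
import random

# Monte Carlo check of the geometric-probability answer 103/288 ≈ 0.3576.
# x = bus arrival, y = your arrival, both uniform on [0, 60] minutes past noon.
random.seed(1)
trials = 500_000
caught = 0
for _ in range(trials):
    x, y = random.uniform(0, 60), random.uniform(0, 60)
    caught += (y <= x + 5) and (y >= x - 20)   # within 5 min after / 20 min before the bus
print(caught / trials)                          # close to 103/288 ≈ 0.3576
```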
An Example from the Book:
Bayesian Method in Finance
[Authors: S. T. Rachev; J. S. J. Hsu; B. S. Bagasheva and F. J. Fabozzi]

Investors often hunt for companies that have high or improving


free cash flow but low share prices. Low P/FCF ratios typically
mean the shares are undervalued and prices may soon increase.
Thus, the lower the ratio, the "cheaper" the stock is.

A manager in an event driven hedge fund (an offshore investment


fund, typically formed as a private limited partnership, that
engages in speculation using credit or borrowed capital ) is
testing a strategy that involves identifying potential acquisition
targets and examines the effectiveness of various company
screens, in particular the ratio of stock price to free cash flow per
share (PFCF).
Bayesian Method in Finance
Independently of the screen, the manager assesses the probability of company X being targeted at 40%. Suppose further that the manager's analysis suggests that the probability that a target company's PFCF has been more than three times lower than the sector average for the past three years is 75%, while the probability that a nontarget company has been having that low a PFCF for the past three years is 35%. If a bidder does appear on the scene, what is the probability that the targeted company had been detected by the manager's screen?
Bayesian Method in Finance

Let us consider the following two events:
E = Company X's PFCF has been more than three times lower than the sector average for the past three years
T = Company X becomes an acquisition target in the course of a given year.
To answer the question, the manager needs to update the prior probability P(T) and compute the posterior probability P(T|E).
Denoting by T^c the event that X does not become a target in the course of the year, we have
P(T) = 0.4 and P(T^c) = 0.6.
Also, P(E|T) = 0.75 and P(E|T^c) = 0.35.
Bayesian Method in Finance

Applying Bayes' theorem we obtain:

P(T|E) = P(T) · P(E|T) / [P(T) · P(E|T) + P(T^c) · P(E|T^c)]
= (0.75 × 0.4) / (0.75 × 0.4 + 0.35 × 0.6)
= 0.3 / (0.3 + 0.21) = 0.3/0.51 = 0.5882.

After taking into account the company's persistently low PFCF, the probability of a takeover increases from 40% to 58.8%.
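The same two-hypothesis Bayes update, written as a small reusable function (the function and argument names are illustrative):

```python
def posterior(prior, likelihood_true, likelihood_false):
    """Two-hypothesis Bayes rule: P(T|E) from P(T), P(E|T) and P(E|T^c)."""
    evidence = likelihood_true * prior + likelihood_false * (1 - prior)
    return likelihood_true * prior / evidence

# Figures from the hedge-fund screen above.
print(posterior(prior=0.40, likelihood_true=0.75, likelihood_false=0.35))  # ~0.5882
```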
Bayesian Method in Finance

In financial applications, continuous versions of Bayes' theorem are predominantly used. Nevertheless, the discrete form has some important uses, two of which are:

1. Model Selection
2. Bayes Classification
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Source: Russell and Norvig's AI book, section 14.4 (1st


edition), personal communication between Prof. Scott D.
Anderson of Department of Computer Science , Wellesley
College and David D. Lewis (http://daviddlewis.com).
Prof. Anderson rewrote the problem with help from Ethan Herdrick. The context of this problem is spam filters, the subject of an honors thesis conducted by Sara "Scout" Sinclair under Prof. Anderson's supervision.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

How to combine evidence using what's called naive Bayes:


the assumption of conditional independence - even though
we might know that the data aren't exactly conditionally
independent.

So, the probability we get won't be accurate, but it should at


least be a probability and should correlate with the
information we want, namely the probability that a message
is spam.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

We want to train a Bayesian classifier to classify email.
Let's start with an example:

                Ham messages   Spam messages   Total
With "Free"          100            300          400
With "Viagra"         10             90          100
All messages         400            600         1000
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
The basic application of Bayes' rule allows us to calculate the probability that a message is spam given that it contains any one token:

P(spam | free) = P(spam) · P(free | spam) / P(free)
= (600/1000 × 300/600) / (400/1000) = 300/400 = 0.75

P(spam | viagra) = P(spam) · P(viagra | spam) / P(viagra)
= (600/1000 × 90/600) / (100/1000) = 90/100 = 0.90
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Our prior probability of spam (given the training data) is 0.6, and if
we see a message containing the word free we bump that up to
0.75 and if we see Viagra we bump it up to 0.90.

The question is how to combine multiple pieces of evidence. That is,


if I see a message with both freeand
Viagra,what will be our
probability calculation?
Let us start with the following equation, which doesn't assume
conditional independence. This equation is a straightforward
application of Bayes' rule for two pieces of evidence:

. |
= (1)
( )
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

There are several problems with this equation. The first is the denominator: usually one is not going to record and train on all subsets of words (let's stipulate that), so the probability of "Viagra" co-occurring with "free" is unknown. The same problem arises in the numerator, where one would need to know the probability of that pair of terms co-occurring in a spam message.

One approach is to make the assumption of conditional


independence.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Conditional independence means that once you know one piece of


information, other features become independent. One classic
example is that spelling ability and shoe size are not independent:
people with larger feet spell better than people with smaller feet.
The missing piece of information is age: older kids have larger
feet and better spelling. Once you know a child's age, their
spelling ability and shoe size are unrelated (independent). When
two features are conditionally independent, we can calculate their
co-occurrence as a simple multiplication. The general statement is
as follows:
P(A, B | C) = P(A | C) · P(B | C)     (2)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

For the spam problem, our assumption is that the occurrence of


the words free and Viagra become independent once we
know whether the message is spam. (Again, this assumption is
probably wrong, but we make it anyhow, because we won't count
how many times the words co-occur.)

Now, we make our assumption of conditional independence.

Applying equation (2) to the numerator of equation (1), we get:


P(viagra, free | spam) = P(viagra | spam) · P(free | spam)     (3)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
In words, this means that for spam messages we expect "Viagra" and "free" to be independent, so the probability of their co-occurrence in a spam message is just the product of their conditional probabilities.

You may or may not agree with the assumption, but that's what
it means.

Thus equation (1) becomes

P(spam | viagra, free) = P(spam) · P(viagra | spam) · P(free | spam) / P(viagra, free)     (4)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
We have in our database everything except P(viagra, free).

Now, since the message is certainly either ham or spam,
P(spam | viagra, free) + P(ham | viagra, free) = 1.

Therefore,

P(spam) · P(viagra | spam) · P(free | spam) / P(viagra, free)
+ P(ham) · P(viagra | ham) · P(free | ham) / P(viagra, free) = 1.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
This gives:

P(viagra, free) = P(spam) · P(viagra | spam) · P(free | spam) + P(ham) · P(viagra | ham) · P(free | ham).

This replaces the calculation of the joint probability P(viagra, free).
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
This, then, is the desired denominator for our probability calculation. Note that the first term is the same as our numerator; the other term is the analogous calculation conditioned on ham rather than spam. The final formula, then, for two pieces of evidence is:

P(spam | viagra, free) =
P(spam) · P(viagra | spam) · P(free | spam)
/ { P(spam) · P(viagra | spam) · P(free | spam) + P(ham) · P(viagra | ham) · P(free | ham) }

Check: in the example, P(spam | viagra, free) ≈ 0.95.
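The 0.95 check can be reproduced directly from the training counts in the table; a sketch of the naive-Bayes combination:

```python
# Naive-Bayes combination of the two tokens, using the training counts above.
# Counts: 1000 messages, 600 spam / 400 ham; "free" in 300 spam, 100 ham;
# "viagra" in 90 spam, 10 ham.
p_spam, p_ham = 0.6, 0.4
p_free_spam, p_free_ham = 300 / 600, 100 / 400
p_viagra_spam, p_viagra_ham = 90 / 600, 10 / 400

num = p_spam * p_viagra_spam * p_free_spam
den = num + p_ham * p_viagra_ham * p_free_ham
print(num / den)   # ~0.947, the "0.95" check on the slide
```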


Application of Bayes Rule in Market Survey

In Orange County, 51% of the adults are males. (and assume that
the other 49% are females) One adult is randomly selected for a
survey involving credit card usage.

a. Find the prior probability that the selected person is a male.


b. It is later learned that the selected survey subject was smoking a
cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of
females smoke cigars (based on data from the Substance Abuse
and Mental Health Services Administration). Use this additional
information to find the probability that the selected subject is a
male.
Application of Bayes Rule in Market Survey

Let's use the following notation:
M = male; M^c = female (or not male);
C = cigar smoker; C^c = not a cigar smoker.

a. Before using the information given in part b, we know


only that 51% of the adults in Orange County are males, so
the probability of randomly selecting an adult and getting a
male is given by () = 0.51.
Application of Bayes Rule in Market Survey

b. Based on the additional given information, we have the


following:
P(M) = 0.51 because 51% of the adults are males
P(M^c) = 0.49 because 49% of the adults are females (not males)
P(C|M) = 0.095 because 9.5% of the males smoke cigars
(That is, the probability of getting someone who smokes cigars, given that the person is a male, is 0.095.)
P(C|M^c) = 0.017 because 1.7% of the females smoke cigars
(That is, the probability of getting someone who smokes cigars, given that the person is a female, is 0.017.)
Application of Bayes Rule in Market Survey

Let's now apply Bayes' theorem. We get the following result:

P(M | C) = P(M) · P(C|M) / [P(M) · P(C|M) + P(M^c) · P(C|M^c)]
= (0.51 × 0.095) / (0.51 × 0.095 + 0.49 × 0.017) = 0.85329.
Application of Bayes Rule in Market Survey

Before we knew that the survey subject smoked a cigar, there


is a 0.51 probability that the survey subject is male (because
51% of the adults in Orange County are males). However,
after learning that the subject smoked a cigar, we revised the
probability to 0.853. There is a 0.853 probability that the
cigar-smoking respondent is a male. This makes sense,
because the likelihood of a male increases dramatically with
the additional information that the subject smokes cigars
(because so many more males smoke cigars than females).
Application of Bayes Rule in
Engineering Management

An aircraft emergency locator transmitter (ELT) is a device


designed to transmit a signal in the case of a crash. The
Altigauge Manufacturing Company makes 80% of the ELTs, the
Bryant Company makes 15% of them, and the Chartair
Company makes the other 5%. The ELTs made by Altigauge
have a 4% rate of defects, the Bryant ELTs have a 6% rate of
defects, and the Chartair ELTs have a 9% rate of defects (which
helps to explain why Chartair has the lowest market share).
Application of Bayes Rule in
Engineering Management

a. If an ELT is randomly selected from the general population of


all ELTs, find the probability that it was made by the Altigauge
Manufacturing Company.

b. If a randomly selected ELT is then tested and is found to be


defective, find the probability that it was made by the Altigauge
Manufacturing Company.
Application of Bayes Rule in
Engineering Management
We use the following notation:
A = ELT manufactured by Altigauge;
B = ELT manufactured by Bryant;
C = ELT manufactured by Chartair;
D = ELT is defective;
D^c = ELT is not defective (i.e. it is good)

a) If an ELT is randomly selected from the general population


of all ELTs, the probability that it was made by Altigauge is
0.8 (because Altigauge manufactures 80% of them).
Application of Bayes Rule in
Engineering Management

b) If we now have the additional information that the ELT


was tested and was found to be defective, we want to revise
the probability from part (a) so that the new information can be
used. We want to find the value of (|), which is the
probability that the ELT was made by the Altigauge company
given that it is defective. Based on the given information, we
know these probabilities:
Application of Bayes Rule in
Engineering Management

P(A) = 0.80, as Altigauge makes 80% of the ELTs
P(B) = 0.15, as Bryant makes 15% of the ELTs
P(C) = 0.05, as Chartair makes 5% of the ELTs
P(D|A) = 0.04, as 4% of the Altigauge ELTs are defective
P(D|B) = 0.06, as 6% of the Bryant ELTs are defective
P(D|C) = 0.09, as 9% of the Chartair ELTs are defective
Application of Bayes Rule in
Engineering Management

Here Bayes' theorem is extended to include three events


corresponding to the selection of ELTs from the three
manufacturers (A, B, C):

P(A|D) = P(A) · P(D|A) / [P(A) · P(D|A) + P(B) · P(D|B) + P(C) · P(D|C)]
= (0.8 × 0.04) / (0.8 × 0.04 + 0.15 × 0.06 + 0.05 × 0.09)
= 0.032 / 0.0455 = 0.7033
Application of Bayes Rule in Traffic
Management and Crime Investigation
A certain town has two taxi companies: Blue Birds, whose cabs
are blue, and Night Owls, whose cabs are black. Blue Birds
has 125 taxis in its fleet, and Night Owls has 375. Late one
night, there is a hit-and-run accident involving a taxi. The
town's 500 taxis were all on the streets at the time of the
accident. A witness saw the accident and claims that a blue taxi
was involved. At the request of the police, the witness
undergoes a vision test under conditions similar to those on the
night in question. Presented repeatedly with a blue taxi and a
black taxi, in random order, he shows he can successfully
identify the color of the taxi 9 times out of 10. Which company
is more likely to have been involved in the accident?
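A sketch of the Bayes computation for the taxi problem, with the prior taken from the fleet sizes and the likelihood from the witness's tested accuracy:

```python
# Bayes' rule for the taxi problem: prior from fleet sizes, likelihood from
# the witness's 90% colour-identification accuracy.
p_blue, p_black = 125 / 500, 375 / 500
p_says_blue_given_blue = 0.9
p_says_blue_given_black = 0.1
p_blue_given_says_blue = (p_says_blue_given_blue * p_blue) / (
    p_says_blue_given_blue * p_blue + p_says_blue_given_black * p_black
)
print(p_blue_given_says_blue)   # 0.75, so a Blue Birds cab is the more likely culprit
```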
Two Problems of Theoretical Nature

Prove that for any three events A, B and C, the probability that exactly one of these events will occur can be expressed as:

P(A) + P(B) + P(C) − 2[P(A∩B) + P(A∩C) + P(B∩C)] + 3·P(A∩B∩C).

Prove that, for any two events and , one has


, 2 , .
Probability of Maximum and Minimum

(i) In a bidding process any bidder can bid in multiple of Rs.


1000/- between Rs. 1,00,000/- and Rs. 10,00,000/- both
inclusive. A bidder is equally likely to choose any of the
permissible amount. The highest bidder amongst the 25
participating in the bidding process will win. What is the
probability that one who bids Rs. 8,00,000/- will win?

(ii) Suppose n persons each choose a number at random from among the first N positive integers. All numbers are equally likely to be selected. What is the probability that the highest number chosen will be k?
Probability of Maximum and Minimum

iii. In a tender call, an organization is likely to receive


quotation price in multiple of Rs. 1000/- between Rs.
50,00,000/- and Rs. 60,00,000/- both inclusive from various
companies. A company is equally likely to offer a price in
the given range. A company that offers the lowest quote will
win the tender call. If 25 companies respond to the tender
call with a quote, what is the probability that one who
quoted Rs. 54,00,000/- will win?
iv. Suppose n persons each choose a number at random from among the first N positive integers. All numbers are equally likely to be selected. What is the probability that the lowest number chosen will be k?
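For parts (ii) and (iv), the distribution of the maximum and minimum of independent uniform choices has a closed form; a sketch under the stated uniform-and-independent assumptions (function names and the illustrative values of 3 persons and the first 10 integers are not from the slides):

```python
from fractions import Fraction

def p_max_equals(k, n_persons, N):
    """P(highest of n_persons independent uniform choices from {1,...,N} equals k)."""
    return Fraction(k ** n_persons - (k - 1) ** n_persons, N ** n_persons)

def p_min_equals(k, n_persons, N):
    """P(lowest of n_persons independent uniform choices from {1,...,N} equals k)."""
    return Fraction((N - k + 1) ** n_persons - (N - k) ** n_persons, N ** n_persons)

# Illustration with 3 persons choosing from the first 10 integers.
print(p_max_equals(7, 3, 10), p_min_equals(4, 3, 10))
print(sum(p_max_equals(k, 3, 10) for k in range(1, 11)))   # sanity check: 1
```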
More On The Classical Occupancy Problem

Recall the problem of a random distribution of r balls in n cells, assuming that each arrangement has probability n^(−r).

What is the probability p_m(r, n) of finding exactly m cells empty?

Let A_k be the event that cell number k is empty (k = 1, 2, …, n). In this event all r balls are placed in the remaining n − 1 cells, and this can be done in (n − 1)^r different ways. Similarly, there are (n − 2)^r arrangements leaving two preassigned cells empty, etc.
More On The Classical Occupancy Problem

Now, writing p_i for the probability that the i-th cell remains empty, p_{ij} for the probability that both the i-th and j-th cells remain empty, and so on, we have

p_i = (1 − 1/n)^r, p_{ij} = (1 − 2/n)^r, p_{ijk} = (1 − 3/n)^r, ….

Hence, for every ν,

S_ν = Σ_{1 ≤ i_1 < ⋯ < i_ν ≤ n} p_{i_1 ⋯ i_ν} = C(n, ν) · (1 − ν/n)^r.

By Poincaré's theorem, the probability that at least one cell is empty is given by S_1 − S_2 + S_3 − ⋯ ± S_n.
More On The Classical Occupancy Problem

The probability that all cells are occupied is

p_0(r, n) = 1 − S_1 + S_2 − ⋯ = Σ_{ν=0}^{n} (−1)^ν C(n, ν) (1 − ν/n)^r.

Consider now a distribution in which exactly m cells are empty.

These m cells can be chosen in C(n, m) ways.

The r balls are distributed among the remaining n − m cells so that each of these cells is occupied; the number of such distributions is (n − m)^r · p_0(r, n − m).
More On The Classical Occupancy Problem

Dividing by n^r we find, for the probability that exactly m cells remain empty,

p_m(r, n) = C(n, m) (1 − m/n)^r p_0(r, n − m)
          = C(n, m) Σ_{ν=0}^{n−m} (−1)^ν C(n − m, ν) (1 − (m + ν)/n)^r.
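This formula is easy to put to work; a minimal R sketch (the function name pm_empty is mine):

pm_empty <- function(m, r, n) {
  v <- 0:(n - m)
  choose(n, m) * sum((-1)^v * choose(n - m, v) * (1 - (m + v)/n)^r)
}
pm_empty(0, r = 10, n = 5)                     # probability that no cell is empty
sum(sapply(0:4, pm_empty, r = 10, n = 5))      # probabilities over m = 0,...,n-1 sum to 1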
Urn Models for Aftereffect.

An industrial plant accident might be pictured as the result of a


superhuman game of chance: Fate has in storage an urn
containing red and black balls; at regular time intervals a ball
is drawn at random, a red ball signifying an accident.

If the chance of an accident remains constant in time, the


composition of the urn is always the same. But it is
conceivable that each accident has an aftereffect in that it
either increases or decreases the chance of new accidents.
Urn Models for Aftereffect.

This corresponds to an urn whose composition changes


according to certain rules that depend on the outcome of the
successive drawings. It is easy to invent a variety of such rules
to cover various situations, but we shall be content with a
discussion of the popular Urn models.
Urn model: An urn contains b black and r red balls.
A ball is drawn at random.
It is replaced and, moreover, c balls of the color drawn and d
balls of the opposite color are added.
A new random drawing is made from the urn (now containing r
+ b + c + d balls), and this procedure is repeated.
Urn Models for Aftereffect.

A typical point of the sample space corresponding to n


drawings may be represented by a sequence of n letters B and
R.
The event "black at first drawing" (i.e., the aggregate of all
sequences starting with B) has probability

.
+
If the first ball is black, the (conditional) probability of a black
ball at the second drawing is
+
.
+++
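A minimal R simulation of this urn scheme (the function name and the particular values of b, r, c and d are mine, chosen only for illustration):

draw_urn <- function(n, b, r, c, d) {
  outcome <- character(n)
  for (i in seq_len(n)) {
    black <- runif(1) < b / (b + r)            # draw one ball at random
    outcome[i] <- if (black) "B" else "R"
    if (black) { b <- b + c; r <- r + d }      # add c of the drawn colour, d of the other
    else       { r <- r + c; b <- b + d }
  }
  outcome
}
draw_urn(10, b = 3, r = 2, c = 1, d = 0)       # c = 1, d = 0 gives the Polya scheme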
Random Variables and
Applications
Random Variables
We have seen earlier that uncertainty is omnipresent in the business world, which in turn induces variability.

To model variability probabilistically, we need the concept of a random variable.

A random variable is a numerical description of the outcome of a random experiment; it can take different values with given probabilities.

Suppose that we have a random experiment with sample space S. A function X from S into another set T is called a (T-valued) random variable.
Random Variables

Examples:
The return on an investment in a span (period) of one-year;
The closing price of a stock in NSE;
The number of customers entering a shopping complex
The sales volume of a store on a particular day
The turnover rate at your organization next year
Types of Random Variables

Discrete Random Variable:


One that takes on a countable number of possible values, e.g.
Total of face values (points) on a roll of two dice: 2, 3, ..., 12
Number of refrigerators sold: 0, 1, ...
Customer count: 0, 1, . . .
Continuous Random Variable:
one that takes on an uncountable number of possible values, e.g.
Interest rate: 3.25%, 6.125%, . . .
Task completion time: a nonnegative value
Price of a stock: a nonnegative value
Types of Random Variables

In general: random variables that take integer or rational values are discrete, while those that take values over a continuum of real numbers are continuous.
In some cases, numbers are not immediately associated with the outcomes of a random experiment.

For example,
You may win a bid or lose
After flipping, a coin may show a head or a tail
A customer can be male or female
We often assign numbers such as 0 and 1 to the possible outcomes in such cases.
Probability Distribution

Randomness of a random variable is described by a probability distribution.
Informally, the probability distribution specifies the probability or likelihood for a random variable to assume a particular value.
Formally, let X be a random variable and let x be a possible value of X. Then we have two cases.
Discrete: the probability mass function of X specifies P(x) ≡ P(X = x) for all possible values x of X.
Continuous: the probability density function of X is a function f(x) such that f(x)·h ≈ P(x < X ≤ x + h) for small positive h.
Probability Distribution

The probability mass function specifies the actual probability,


while the probability density function specifies the probability
rate; both can be viewed as a measure of likelihood.

A discrete probability distribution may have
A finite support (the sample space is finite)
For example: the number of successes in bidding, or the number of stocks in the list of 50 companies forming the NIFTY 50 Index whose closing prices were higher than their opening prices yesterday
An infinite support (the sample space is countably infinite)
For example: the number of trials required to get r successes
Discrete Probability Distribution

A probability mass function must satisfy the following two requirements:

i. 0 ≤ P(x) ≤ 1 for all x
ii. Σ_{x∈S} P(x) = 1, S being the set of all possible values of X.

Empirical data can be used to estimate the probability mass function.

Consider, for example, the number of TVs in a household.


Discrete Probability Distribution

No. of TVs No. of Households x P(x)


0 1,218 0 0.012
1 32,379 1 0.319
2 37,961 2 0.374
3 19,387 3 0.191
4 7,714 4 0.076
5 2,842 5 0.028
Total 101,501 1
For = 0, the probability 0.012 comes from 1,218/101,501.
Other probabilities are estimated similarly.
Properties Discrete (Probability) Distribution
Realized values of a discrete random variable can be viewed as
samples from a conceptual/theoretical population.
For example, suppose a household is randomly drawn, or
sampled, from the population governed by the probability mass
function specified in the previous table. What is the probability
for us to observe the event {X = 3}?
Answer: 0.191. That X turns out to be 3 in a random sample is
called a realization. Similarly, the realization X = 2 has
probability 0.374.
We can therefore compute the population mean, variance, and so
on. Results of such calculations are examples of population
parameters.
Details of estimation will be taken later.
Bernoulli Trials
Bernoulli Trials

A sequence of trials is said to be Bernoulli trials if they satisfy


the following three assumptions:

I. Each trial has two possible outcomes, in the language of


probability called success and failure.

II. The trials are independent. Intuitively, the outcome of one


trial has no influence over the outcome of another trial.

III. On each trial, the probability of success is p and the probability of failure is q = 1 − p, where p ∈ [0, 1] is the success parameter of the process.
Bernoulli Trials in In Real World

Randomly assign a patient a new drug or placebo according as


the outcome of a coin tossing is head or tail.

In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote "yes" in an upcoming referendum. In choosing multiple voters, one needs to ensure that the population is large enough compared to the sample, so that excluding already sampled voters does not appreciably alter the probability.

A customer can reinvest or liquidate a fixed deposit that will


mature in a day.
Jacob Bernoulli
[Not to be confused with Daniel Bernoulli Associated with the
famous Bernoulli Principle of Fluid Dynamics]
Lifespan: 6 January 1655 16 August 1705

One of the many prominent mathematicians in


the Bernoulli family.
Known for his numerous contributions
to calculus, and along with his brother Johann,
was one of the founders of the calculus of
variations.
His most important contribution was in the
field of probability, where he derived the first
version of the law of large numbers in his
work Ars Conjectandi.
Probability Models from Bernoulli Trials

The Binomial model can be looked upon as the number of successes in a sequence of n Bernoulli trials.

The Poisson model can be used as an approximation for the number of successes in a very long (in the limit, infinite) sequence of Bernoulli trials with a small success probability.

We shall also consider models for the number of Bernoulli trials required to achieve a specified number of successes.
Some Important
Discrete Distributions
Problem-1

Suppose, an organization had 10 senior managers and


15 junior managers. Out of those 25 managers, 5 left
the organization in the last quarter. Assuming that the
managers acted independently of each other and it is
equally likely for anyone to separate, what is the
probability that 2 of the 5 managers left, were senior
managers?
General Version of Problem-1

Suppose an organization had N managers, out of whom a proportion p are senior managers and the rest are junior managers. Out of those N managers, n left the organization in the last quarter. Assuming that the managers acted independently of each other and it is equally likely for anyone to separate, what is the probability that out of the n managers who left, x were senior managers?
The Hypergeometric Distribution

The p.m.f. of the distribution is given by

P(X = x) = C(Np, x) C(Nq, n − x) / C(N, n),   x = 0, 1, 2, ..., n;
0 ≤ p ≤ 1, q = 1 − p.

In practice, x ≤ min(n, Np) and x ≥ max(0, n − Nq).
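Problem-1 above is a direct application of this p.m.f.; a short R check (10 senior, 15 junior, 5 leavers, exactly 2 seniors among them):

choose(10, 2) * choose(15, 3) / choose(25, 5)   # direct computation, ≈ 0.385
dhyper(2, m = 10, n = 15, k = 5)                # same value from R's hypergeometric p.m.f.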
The Hypergeometric Distribution -
from Urn Problem

There are blue balls and white balls in an urn


which are otherwise identical. Further suppose,
( + ) balls are taken out of the urn at
random all at once. What is the probability that
out of the balls are taken out of the urn will be
blue?
Problem-2.A.

Suppose an organization has a large number of employees, of which 20% are rewarded with one additional increment based on their performance appraisal. There are 24 employees in the area of a city branch of the organization. Assuming that an employee's chance of getting the reward is independent of the others, what is the probability that exactly 7 of the 24 employees of the branch are rewarded?
General Version of Problem 2.A.

Suppose an organization has a large number of employees, of which a certain proportion p are rewarded with one additional increment based on performance appraisal. There are n employees in the area of a city branch of the organization. Assuming that an employee's chance of getting the reward is independent of the others, what is the probability that exactly x of the n employees of the branch are rewarded?
Problem-2.B.

Suppose an organization has a large number of


employees of which 65% are permanent employees and
rest are in fixed-term contractual appointment and the
proportion is more or less same across all its branches.
Suppose, there are 50 employees in a city branch of the
organization. What is the probability that exactly 20 of
the 50 employees of the branch are permanent
employees?
Problem-2.C.

Suppose an organization has a large number of


employees of which 55% are male and rest are
female. Further suppose that the gender ratios are
more or less same across all its branches. Suppose,
there are 100 employees in a city branch of the
organization. What is the probability that exactly
70 of the 100 employees of the branch are male?
The Binomial Distribution.

The p.m.f. of the distribution is given by

P(X = x) = C(n, x) p^x q^(n−x),   x = 0, 1, 2, ..., n;  0 ≤ p ≤ 1, q = 1 − p.

Special case: n = 1.

P(X = x) = p^x (1 − p)^(1−x),   x = 0, 1;  0 ≤ p ≤ 1.

This is known as the Bernoulli Distribution.
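Problem-2.A above is a direct application; a one-line R check (24 employees, reward probability 0.2, exactly 7 rewarded):

dbinom(7, size = 24, prob = 0.2)   # P(X = 7) for X ~ Binomial(24, 0.2), ≈ 0.10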


The Binomial Distribution -
from Urn Problem

There are blue balls and white balls in an urn


which are otherwise identical. Suppose one ball is
taken out of the urn at random, its colour is noted
and is subsequently returned to the urn. Let the
trial be repeated n times and each time one ball is
taken out of the urn at random, its colour is noted
and is subsequently returned to the urn before next
drawing. What is the probability that out of the
balls are taken out of the urn will be blue?
Binomial Probability Mass Function
(For varying sample size and fraction p fixed at 0.5)
Binomial Probability Mass Function
(For varying fraction p and fixed sample size =20)
Binomial Cumulative Distribution
Function
Binomial Model Vs. Hypergeometric Model

Hypergeometric Model is used for sampling without


replacement while Binomial Model is used for sampling
with replacement.

Does it matter if we sample a few buckets of water from a


vast ocean without returning them back before drawing the
next bucket?

When the population size is large, the Hypergeometric distribution tends to the Binomial distribution. This can be proved mathematically by letting the population size tend to infinity, but we can skip the proof in a Business Statistics course.
Binomial Model in Option Pricing

The binomial options pricing model (BOPM) is one of


the most commonly used option pricing models.

Though computationally more complex than the Black-Scholes option pricing model, it is widely used as it is able to handle a variety of conditions for which other models cannot easily be applied.

At each point, the model considers two scenarios, one


is called up (where the value of the underlying
increases) and the other one being down (where the
value of the underlying decreases).
Defining Rare Events

When we define a Binomial model as

P(X = x) = C(n, x) p^x q^(n−x),   x = 0, 1, 2, ..., n;  0 ≤ p ≤ 1, q = 1 − p,

we often say p is the probability of success and q is the probability of failure in a sequence of Bernoulli trials.

Imagine a situation when n is very large but p is small.

We can intuitively argue that the occurrence of a success in such a case will be a rare event.
Poisson Model in Real World

The number of bankruptcies that are filed in a month

The number of arrivals at a car wash in one hour

The number of network failures per day

The number of Airbus 330 aircraft engine shutdowns per


100,000 flight hours.

The number of hungry persons entering McDonald's restaurant.


Poisson Model in Real World

The number of work related accidents over a given production


time

The number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time

The number of customers who call to complain about a service


problem per month

The number of visitors to a Web site per minute

The number of calls to consumer hot line in a 5-minute period


Examples of such Rare Events in Real World

Number of Road Accidents/ Traffic fatalities

Number of misprints in a book

Number of employees absent from work on a particular day

Number of unresolved cases in call center in a day


Poisson Distribution

The p.m.f. of a Poisson distribution is given by:

P(X = x) = e^(−λ) λ^x / x!,   λ > 0,  x = 0, 1, 2, ....

This model can actually be derived from the p.m.f. of a Binomial distribution by taking limits over n and p.

We consider limits as n tends to infinity and p tends to zero such that the product np stays finite and is equal to, say, λ.
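The limiting behaviour is easy to see numerically; a small R sketch (the particular n and p are mine, chosen so that np = 4 throughout):

x <- 0:15
pois <- dpois(x, lambda = 4)
max(abs(dbinom(x, size = 40,   prob = 0.1)   - pois))   # rough agreement
max(abs(dbinom(x, size = 4000, prob = 0.001) - pois))   # practically identical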
Poisson Probability Mass Function

The horizontal axis is the index k, the number of occurrences. The


function is defined only at integer values of k. The connecting lines are
only guides for the eye.
Poisson Cumulative Distribution Function

The horizontal axis is the index k, the number of occurrences. The


CDF is discontinuous at the integers of k and flat everywhere else
because a variable that is Poisson distributed takes on only integer
values.
Problem on Poisson Distribution
(Problem-3)

In a given hour, a human resource manager receives job


applications over the internet. The number of job
applications she receives per hour varies from hour to
hour. Suppose the best distribution that models the hour-
to-hour fluctuations in the number of applicants received
is Poisson and the human resource manager receives
applications from the internet at an average (rate) of 6 per
hour. What is the probability that the human resource manager receives between 4 and 6 applications, both inclusive, in any given hour?
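A minimal R sketch of the Problem-3 calculation (λ = 6 applications per hour, as stated above):

sum(dpois(4:6, lambda = 6))    # P(4 <= X <= 6) ≈ 0.4551
ppois(6, 6) - ppois(3, 6)      # same value via the c.d.f.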
Problem-4

A Director wants to recruit a secretary (a fresher) to assist him in the delivery of human resource services, with specific responsibility for supporting department staff, providing information to applicants and employees, maintaining clerical and financial records, and completing assigned projects and tasks, and who will report to him on a regular basis. Considering his busy schedule, he decides to conduct telephonic interviews one by one until he finds someone deserving to be called for a personal interview in his office. He knows that the probability of getting a deserving candidate is p.

What is the probability that he will find the ideal candidate at the x-th trial?

What is the probability that y candidates would be rejected before he finds the right candidate?
The Geometric Distribution

In probability theory and statistics, the geometric distribution


is either of two discrete probability distributions:

The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...}

The probability distribution of the number Y = X − 1 of failures before the first success, supported on the set {0, 1, 2, 3, ...}

Which of these one calls "the" geometric distribution is a


matter of convention and convenience.
The Geometric Distribution

It is the probability that the first occurrence of success requires x independent trials, each with success probability p. If the probability of success on each trial is p, the probability that the x-th trial is the first success is:

P(X = x) = (1 − p)^(x−1) p,   for x = 1, 2, 3, ....

The above form of the geometric distribution is used for modelling the number of trials until the first success.
The Geometric Distribution

By contrast, the following form of the geometric distribution is used for modelling the number of failures preceding the first success:

P(Y = y) = (1 − p)^y p,   for y = 0, 1, 2, 3, ....

In either case, the sequence of probabilities is a geometric sequence.

Suppose a fair die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set {1, 2, 3, ...} and is a geometric distribution with p = 1/6.
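Note that R's dgeom uses the second convention (failures before the first success), so a small shift is needed for the first form; a sketch for the die example:

p <- 1/6
dgeom(0:2, prob = p)      # P(Y = 0), P(Y = 1), P(Y = 2)
dgeom(3 - 1, prob = p)    # P(X = 3): first "1" on the 3rd throw, via Y = X - 1
(1 - p)^2 * p             # same value directly from the formula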
Problem-5

A. Suppose a fair die is thrown repeatedly until a 6 appears for the 6th time. What is the probability that the sixth 6 is achieved on the 36th trial?

B. Suppose a fair die is thrown repeatedly until a 6 appears for the 5th time. What is the probability that the fifth 6 is achieved on the 25th trial?

In an HR context, this is like recruiting a number of candidates one by one, sequentially, till all the vacancies are filled.
Negative Binomial Distribution

Generalization of Problem 5 leads to the negative binomial distribution.

Suppose there is a sequence of Bernoulli trials with probability p of success per trial. We observe this sequence until a predefined number r of failures has occurred. Then the random number of successes we have seen, X, will have the negative binomial (or Pascal) distribution.

The probability mass function of the negative binomial distribution is

f(k; r, p) = P[X = k] = C(k + r − 1, k) p^k (1 − p)^r,   for k = 0, 1, 2, ....
Negative Binomial Distribution (Alternative
form)
Suppose there is a sequence of Bernoulli trials with probability p of success per trial. We observe this sequence until a predefined number r of successes has occurred. Then the random number of trials we have seen, X, will have the negative binomial distribution with p.m.f.:

f(x; r, p) = P[X = x] = C(x − 1, r − 1) p^r (1 − p)^(x−r),   for x = r, r + 1, r + 2, ....
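Problem-5.A can be checked against this form; a minimal R sketch (R's dnbinom counts failures, so the 36th trial giving the 6th six corresponds to 30 failures):

choose(36 - 1, 6 - 1) * (1/6)^6 * (5/6)^(36 - 6)   # trial-count form, ≈ 0.0293
dnbinom(30, size = 6, prob = 1/6)                  # same value: 30 failures before the 6th six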
Discrete Uniform Distribution

In probability theory and statistics, the discrete uniform


distribution is a symmetric probability distribution whereby a
finite number of values are equally likely to be observed; every one
of n values has equal probability 1/n.

Another way of saying "discrete uniform distribution" would be "a


known, finite number of outcomes equally likely to happen".

A simple example of the discrete uniform distribution is throwing a


fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the
die is thrown the probability of a given score is 1/6.
Application: Absenteeism in Call Center

Suppose the absenteeism on a particular day amongst Customer


Service Representatives (CSR) deployed for attending inbound calls
follows a binomial distribution. Further suppose that the probability
of a CSR to stay absent is 0.1. There are 25 CSR in the call center.

I. What is the probability that on a particular day, 3 CSR will remain


absent?
II. What is the probability that on a particular day, not more than 3
CSR will remain absent?
III. What is the probability that on a particular day 3 or more CSR
will remain absent?
IV. The company wants to know from the manager whether, in 95 percent of cases, absenteeism amongst CSR is less than 5 on a particular day or not. What will be the response?
Solution To Part - I and II.
Let X be the random variable denoting the number of CSR who remain absent on a particular day. In this context we have:
X ~ Binomial(25, 0.1)

I. The probability that on a particular day 3 CSR will remain absent:
P(X = 3) = C(25, 3) (0.1)^3 (0.9)^22 = 0.2264973.

II. The probability that on a particular day not more than 3 CSR will remain absent:
P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= C(25, 0) (0.1)^0 (0.9)^25 + C(25, 1) (0.1)^1 (0.9)^24 + C(25, 2) (0.1)^2 (0.9)^23 + C(25, 3) (0.1)^3 (0.9)^22
= 0.0717898 + 0.1994161 + 0.2658881 + 0.2264973 = 0.7635913.
Solution To Part III and IV.

iii. The probability that on a particular day 3 or more CSR will remain absent:
P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 − (0.0717898 + 0.1994161 + 0.2658881) = 0.462906.

iv. Here we first need to calculate P(X < 5) = P(X ≤ 4) and check whether it is at least 0.95.
Now, P(X ≤ 4) = P(X ≤ 3) + P(X = 4) = 0.7635913 + C(25, 4) (0.1)^4 (0.9)^21 = 0.7635913 + 0.138415 = 0.9020063 < 0.95.

So we cannot say that in 95 percent of cases absenteeism amongst CSR is less than 5. You may check that P(X ≤ 5) = 0.966600, so in 95-plus percent of cases (actually 96.66%) absenteeism amongst CSR is up to 5.
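All four parts can be verified in one go with R's binomial functions:

dbinom(3, 25, 0.1)       # Part I:   P(X = 3)  = 0.2264973
pbinom(3, 25, 0.1)       # Part II:  P(X <= 3) = 0.7635913
1 - pbinom(2, 25, 0.1)   # Part III: P(X >= 3) = 0.462906
pbinom(4, 25, 0.1)       # Part IV:  P(X <= 4) = 0.9020063 < 0.95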
Distribution Function , Survival Function
and Hazard Function
In the previous example, we observe that we are often interested in finding probabilities of the type P(X ≤ x), P(X ≥ x), P(X > x) or P(X < x) for a given x on the real line.

The most important tool to this end is the c.d.f. of the real-valued random variable X.

The cumulative distribution function (c.d.f.) [or, in short, just the distribution function] of a real-valued random variable X is the function given by F_X(x) = P(X ≤ x), which represents the probability that the random variable X takes on a value less than or equal to x.
Distribution Function , Survival Function
and Hazard Function
The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore P(a < X ≤ b) = F_X(b) − F_X(a).

There are four necessary and sufficient conditions for a function to be a distribution function, and vice versa. We shall consider them without proof.

1. Every cumulative distribution function F is monotone non-decreasing: for any x1 < x2 (x1, x2 real), F(x1) ≤ F(x2).
2. F(−∞) = 0, or in other words lim_{x→−∞} F(x) = 0.
3. F(+∞) = 1, or in other words lim_{x→+∞} F(x) = 1.
4. Every cumulative distribution function F is right-continuous, in the sense F(x + 0) = F(x), or in other words lim_{h→0+} F(x + h) = F(x).
Distribution Function , Survival Function
and Hazard Function
In the definition above, the "less than or equal to" sign, "≤", is a convention.

Much of the older Soviet literature uses "<", so that the fourth property becomes left continuity instead of right continuity.

This convention does not matter much in the case of absolutely continuous densities but is important for discrete distributions.

The CDF of a continuous random variable X can be expressed as the integral of its probability density function f_X as follows:

F_X(x) = ∫_{−∞}^{x} f_X(t) dt

Distribution Function , Survival Function
and Hazard Function
The survival function, also known as a reliability function or
complementary cumulative distribution function is a property of any
random variable that maps a set of events, usually associated with
mortality or failure of some system, onto time.
It captures the probability that the system will survive beyond a
specified time.

The term reliability function is common in engineering while the term


survival function is used in a broader range of applications, including
human mortality.

Let T be a random variable with CDF F(t). Its survival function or reliability function is: S(t) = P(T > t) = 1 − F(t).
Distribution Function , Survival Function
and Hazard Function
Failure rate is the frequency with which an engineered system or component fails, expressed in failures per unit of time. It is often denoted by λ and is widely used in reliability engineering and engineering management.

Calculating the failure rate over smaller and smaller intervals of time, in a limiting sense, gives the instantaneous failure rate, called the hazard function (or hazard rate), h(t).

By definition,

h(t) = f(t) / (1 − F(t)).
Inverse distribution function
(quantile function)
We shall see rigorous use of this when we shall study Testing of
Hypothesis in QT-II.

The quantile function specifies, for a given probability in the


probability distribution of a random variable, the value at which
the probability of the random variable being less than or equal to
this value is equal to the given probability.

It is also called the percent point function or inverse cumulative


distribution function.
Inverse distribution function
(quantile function)
If the CDF F is strictly increasing and continuous, then F^(−1)(p), for p ∈ [0, 1], is the unique real number x such that F(x) = p.

In such a case, this defines the inverse distribution function or quantile function.

Some distributions do not have a unique inverse (for example, when f_X(x) = 0 for all a < x < b, causing F to be constant there).

This problem can be solved by defining, for p ∈ [0, 1], the generalized inverse distribution function:

F^(−1)(p) = inf{ x ∈ R : F(x) ≥ p }.
Applications of Quantile Function
Recall Q-IV on call center absenteeism: the company wants to know from the manager whether, in 95 percent of cases, absenteeism amongst CSR is less than 5 on a particular day or not. What will be the response?

Any modern statistical software will give us the 95th percentile point of a Binomial distribution with parameters n = 25 and p = 0.1. It will be 5.

From that we can also respond to the question raised by the company management.

Further, certain location measures of a distribution are obtained by employing the quantile function, such as the median or the first and third quartiles.
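In R the percent point (quantile) function of the binomial is qbinom; a one-line check of the claim above:

qbinom(0.95, size = 25, prob = 0.1)   # 95th percentile of Binomial(25, 0.1); returns 5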
Median and Other Quartiles of
Absenteeism
For call center absenteeism, we are more often interested in the following questions:

V. What is the median number of absentees (that is, up to what number of CSRs will remain absent in about 50% of cases, or beyond what number will absenteeism lie in about 50% of cases)?

VI. What are the first and third quartiles of the distribution of call center absenteeism?

VII. What is the quartile deviation of the call center absenteeism?


Median and Other Quartiles of
Absenteeism
Note that Median = F^(−1)(0.50); the middlemost point of a distribution.

The first quartile is Q1 = F^(−1)(0.25); the point below which the probability is 25% and above which the probability is 75%.

The third quartile is Q3 = F^(−1)(0.75); the point below which the probability is 75% and above which the probability is 25%.

Quartiles are typical quantile measures; there are various quantile measures of location, such as percentiles, deciles, etc.

Given a set of raw (numerical) data arranged in order of magnitude, the median is usually taken as the middlemost observation when the number of data points is odd or, by convention, the average of the two middlemost observations when the number of observations is even.
Solution to Part V and VI
Recall that

P[X = 0] = 0.0717898 < 0.25 ≤ P[X = 0] + P[X = 1] = 0.0717898 + 0.1994161 = 0.2712059.

Clearly the first quartile is 1, as at this point the distribution function crosses the 0.25 mark.

P[X = 0] + P[X = 1] = 0.2712059 < 0.5 ≤ P[X = 0] + P[X = 1] + P[X = 2] = 0.2712059 + 0.2658881 = 0.537094.

Clearly the median is 2, as at this point the distribution function crosses the 0.50 mark.
Solution to Part V and VI
Further, F(2) = 0.537094 < 0.75 ≤ F(3) = 0.7635913.

Clearly the third quartile is 3, as at this point the distribution function crosses the 0.75 mark.

From these results we can answer Questions V and VI.

Note that we always observe jumps in the cumulative distribution function when we deal with discrete distributions.

Also note that we shall often use the 0.5th, 1st, 2.5th, 5th, 95th, 97.5th, 99th and 99.5th percentile points in Statistical Inference in QT-II.
Solution to Part VII
Quartile deviation is a measure of dispersion or variability in the probability distribution.

Writing Q3 = F^(−1)(0.75) and Q1 = F^(−1)(0.25), we define the quartile deviation (QD) as (Q3 − Q1)/2.

The difference Q3 − Q1 is often called the Inter-Quartile Range (IQR).

In the given problem of call center absenteeism,

QD = (3 − 1)/2 = 1.

These concepts can equally be used when we have a set of raw data and/or a frequency distribution.
Quartile Based Skewness Measure
Skewness measures the degree of asymmetry in the probability distribution or the data, as the case may be.

A quick and robust measure of skewness is Bowley's skewness (B). Writing Q2 for the second quartile (the median), we define

B = [(Q3 − Q2) − (Q2 − Q1)] / (Q3 − Q1) = (Q3 − 2Q2 + Q1) / (Q3 − Q1).

Question VIII: Find Bowley's skewness for the call center absenteeism and comment.

Clearly Bowley's skewness = (3 − 2×2 + 1)/(3 − 1) = 0.

But Binomial(25, 0.1) is not a symmetric distribution in general. Why so?
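The quartiles and Bowley's measure are easy to reproduce in R:

q <- qbinom(c(0.25, 0.50, 0.75), size = 25, prob = 0.1)   # Q1, median, Q3 = 1, 2, 3
(q[3] - q[1]) / 2                                         # quartile deviation = 1
(q[3] - 2 * q[2] + q[1]) / (q[3] - q[1])                  # Bowley's skewness = 0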
Note on Quartile Measures
These measures are not based on the entire probability distribution or the entire data, as the case may be.

As a result, if there are some outliers in the tails, these measures are highly robust and efficient in the sense that they are not influenced by the presence of the outliers.

However, since they are not based on the entire distribution or data, the results are sometimes a little surprising; for example, Bowley's skewness is 0 in our example. This is because skewness here is measured using just three locations of the distribution, and further tail information is not considered.

This reminds us that we must also study measures that are based on the entire probability distribution or the data, as the case may be.
How to find Average Absenteeism

Question IX: What is the average number of absentees per day?

The average of a random variable is usually computed using the notion called expectation.

For a discrete random variable X that takes values x1, x2, ..., respectively with probabilities p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_i x_i p_i.

For the binomial distribution it is:

E(X) = Σ_{x=1}^{n} x C(n, x) p^x q^(n−x) = np.  [See Board for Proof]

In the present problem the average is 25 × 0.1 = 2.5.


Rationale Behind Expectation

For a variable x that takes values x1, x2, ..., xk, respectively with frequencies f1, f2, ..., fk, the arithmetic mean of x is given by

x̄ = (1/N) Σ_{i=1}^{k} f_i x_i = Σ_{i=1}^{k} (f_i/N) x_i,   where N = Σ_{i=1}^{k} f_i.

Note that, according to the relative frequency approach to probability, f_i/N tends to the probability of x_i, which may be denoted by p_i.

That is, in the long run, x̄ tends to E(X), which is given by

E(X) = Σ_i x_i p_i.
Problems of Absenteeism (Contd.)

Suppose, there are 10 more call center agents (CSA) who handle the
responsibilities of sales promotion and marketing through outbound
calls. On a particular day their absenteeism follows a binomial
distribution with parameter 0.08.

X. Can we say in this case that in 95 percent situations, absenteeism


amongst CSA is less than 5?

XI. On an average how many CSA will remain absent on a particular


day?

XII. In this connection we may have an additional question: is there any significant difference in the average rate of absenteeism between CSRs and CSAs? For this we need raw data, and we discuss such issues in QT-II.
Solutions to Part X and XI

Let Y be the random variable denoting the number of CSA who remain absent on a particular day.

In this context, Y ~ Binomial(10, 0.08).

The probability that on a particular day less than 5 CSA will remain absent = P(Y < 5) = P(Y ≤ 4).

It is easy to see that P(Y ≤ 4) = 0.9994143.

So the answer to Question X is affirmative.

As regards Question XI, we further see that E(Y) = np = 10 × 0.08 = 0.8. That is, on an average there will be 0.8 absentees among the CSAs per day.
More Problems of Absenteeism (Contd.)

XIII. Suppose the manager of the call center adopts a strategy that if on a particular day more than 3 CSR remain absent, one CSA will be deputed as a CSR. What is the probability that on a particular day one CSA has to act as a CSR?

XIV. Further suppose that the manager adopts a strategy that he will assign one CSA to the CSR role per two absent CSRs (and may ignore the absence of any one CSR) on a particular day. What is the probability that on a particular day two CSAs have to act as CSRs?
Solutions to Part XIII
Under a certain assumption, the required probability for the first problem is the same as the probability that more than 3 CSR will remain absent. That is, P(X > 3) = 1 − P(X ≤ 3) = 0.2364086.

In this context we have assumed that at least one CSA will remain present.

Such an assumption is plausible because the probability that no CSA is present is P(Y = 10) = (0.08)^10 = 1.073742 × 10^(−11).

This probability is practically 0. That is why we can avoid the hassle of using joint/conditional probabilities.

Otherwise, in general, we would need to work with the joint distribution of the two variables. More on this will be discussed later.
Solutions to Part XIV

As per the policy, exactly 2 CSA have to be deputed if and only if 4 or 5 CSR remain absent on a particular day.

The probability that at least two CSA will be available, P(Y ≤ 8), is almost 1. (Note that the probability of the complementary event is 1.245541 × 10^(−9).)

So we can safely assume that 2 CSA will always be available to act as CSRs.

Therefore the required probability can be approximated by:

P(4 ≤ X ≤ 5) = P(X ≤ 5) − P(X ≤ 3) = 0.2030087.
Problems of Absenteeism (Contd.)

At some point of time the top management realizes that retaining the brand value and serving the existing customers better are more important than trying to win a few new customers.

They directed the manager that 25 CSR must be deployed 24x7, even at the cost of sales promotion if necessary.

XV. What is the probability that on a particular day there will be no one for sales promotion and marketing?
Solutions to Part XV

On a particular day there will be no one for sales promotion and marketing if and only if the sum of the numbers of absent CSR and CSA on that day is 10 or more.

That is, iff X + Y ≥ 10.

Note that if X ~ Binomial(n1, p) independently of Y, where Y ~ Binomial(n2, p), then X + Y ~ Binomial(n1 + n2, p). See Board for Proof.

In general, this is not true if the proportion of success p is not the same in the two cases. In such cases, as in the given problem, we have to evaluate the probabilities directly.
Solutions to Part XV

To evaluate: P(X + Y ≥ 10).

P(X + Y ≥ 10)
= P(X = 0, Y = 10) + P(X = 1, Y ≥ 9) + P(X = 2, Y ≥ 8) + ... + P(X ≥ 10, Y ≥ 0)
= P(X = 0) P(Y = 10) + P(X = 1) P(Y ≥ 9) + P(X = 2) P(Y ≥ 8) + ... + P(X ≥ 10)

(We assume that X and Y are independent random variables, so that we can apply the multiplication rule of probability.)

We can easily evaluate this using a calculator. You can check that the probability is 0.001103982.
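The same number can be obtained in R by convolving the two binomial distributions under the independence assumption:

px <- dbinom(0:25, size = 25, prob = 0.10)   # distribution of absent CSRs
py <- dbinom(0:10, size = 10, prob = 0.08)   # distribution of absent CSAs
joint <- outer(px, py)                       # joint probabilities under independence
total <- outer(0:25, 0:10, "+")              # corresponding values of X + Y
sum(joint[total >= 10])                      # P(X + Y >= 10) = 0.001103982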
More Problems with Call Center
Management
Further suppose that at any particular moment number of incoming
calls for the CSR follows a Poisson distribution with rate (average) 8.
Customers do not have to wait if at least one of the 25 (assuming no
absence) CSR is free as the call will automatically go to a free CSR.

What is the probability that at any point of time just one customer
has to wait?

What is the probability that at any point of time at least three


customers have to wait?

What is the probability that at any point of time more than half of the
CSR will remain free?
Average (Expectation) in Context of
Poisson Distribution

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_{i=0}^{∞} x_i p_i,

provided the sum is finite. In fact, the sum exists iff Σ_i |x_i| p_i < ∞.

For a Poisson random variable X, the condition holds and it can be shown that

E(X) = λ.

In the present problem the average is 8; therefore we can assign the Poisson parameter λ = 8.
Solution to Call Center Problems using
Poisson Model
Let U be the random variable denoting the number of customers who have to wait at a certain time point.

Note that U > 0 iff the number of customers wishing to avail CSR services (W) at a particular time is more than 25. In such cases, W = U + 25.

Here W follows Poisson(8).

Required probability: P(U = 1) = P(W = 26).

Using R:
> dpois(26,8)
[1] 2.513997e-07
Solution to Call Center Problems using
Poisson Model (contd.)
At any point of time at least three customers have to wait iff U ≥ 3.

Required probability: P(U ≥ 3) = P(W ≥ 28) = 1 − P(W ≤ 27).

Using R:
> 1-ppois(27,8)
[1] 2.925614e-08

Further, at any point of time more than half of the CSRs will remain free if the number of customers availing CSR services (W) at that time is not more than 12.

That is, the required probability is P(W ≤ 12).

Using R:
> ppois(12,8)
[1] 0.9362028
Dispersion of a Random Variable

The variance of a random variable X is given by

V(X) = σ² = E[(X − E(X))²] = E(X²) − [E(X)]².

Condition for existence of the variance: E(X²) < ∞.

The standard deviation of a random variable is the positive square root of the variance.

If X ~ Binomial(n, p), Var(X) = np(1 − p).
If X ~ Poisson(λ), Var(X) = λ.
Examples

The standard deviation (SD) of the random variable X, denoting the number of CSR who remain absent on a particular day, where X ~ Binomial(25, 0.1), is:
√(25 × 0.1 × 0.9) = 1.5.

The standard deviation (SD) of the random variable W, denoting the number of customers trying to avail CSR service at a certain time, where W ~ Poisson(8), is:
√8 = 2.828.
More Problems with Call Center
Management Continuation of Session-8
Suppose at any particular moment number of incoming calls for the
CSR follows a Poisson distribution with rate (average) 8. Customers
do not have to wait if at least one of the 25 (assuming no absence)
CSR is free as the call will automatically go to a free CSR.

XVI. What is the probability that at any point of time just one customer
has to wait?

XVII.What is the probability that at any point of time at least three


customers have to wait?

XVIII.What is the probability that at any point of time more than half of
the CSR will remain free?
Average (Expectation) in Context of
Poisson Distribution

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_{i=0}^{∞} x_i p_i,

provided the sum is finite. In fact, the sum exists iff Σ_i |x_i| p_i < ∞.

For a Poisson random variable X, the condition holds and it can be shown that

E(X) = λ.

In the present problem the average is 8; therefore we can assign the Poisson parameter λ = 8.
General Rules for Expectation

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of a regular function of X, say g(X), is given by

E[g(X)] = Σ_{i=0}^{∞} g(x_i) P[X = x_i] = Σ_{i=0}^{∞} g(x_i) p_i,

provided Σ_i |g(x_i)| p_i < ∞.

If the distribution of X is absolutely continuous with density f,

E[g(X)] = ∫ g(x) f(x) dx,

where the integral is over the support of X, provided of course E|g(X)| < ∞.
Solution to Part - XVI of Call Center
Problems using Poisson Model
Let U be the random variable denoting the number of customers who have to wait at a certain time point.

Note that U > 0 iff the number of customers wishing to avail CSR services (W) at a particular time is more than 25. In such cases, W = U + 25.

Here W follows Poisson(8).

Required probability:
P(U = 1) = P(W = 26) = 2.513997 × 10^(−7).
Solution to Parts XVII and XVIII
of Call Center Problems
At any point of time at least three customers have to wait iff U ≥ 3.

Required probability:
P(U ≥ 3) = P(W ≥ 28) = 1 − P(W ≤ 27) = 2.925614 × 10^(−8).

Further, at any point of time more than half of the CSRs will remain free if the number of customers availing CSR services (W) at that time is not more than 12.

That is, the required probability is P(W ≤ 12) = 0.9362028.

Later we shall see how to approximate such probabilities.
Dispersion of a Random Variable

The variance of a random variable X is given by

V(X) = σ² = E[(X − E(X))²] = E(X²) − [E(X)]².

Condition for existence of the variance: E(X²) < ∞.

The standard deviation of a random variable is the positive square root of the variance.

If X ~ Binomial(n, p), Var(X) = np(1 − p).
If X ~ Poisson(λ), Var(X) = λ.

See Board for proofs.
Examples- Problems XIX and XX

XIX. What are the standard deviation and variance of absenteeism among CSRs and CSAs?

XX. What are the standard deviation and variance of the variable denoting the number of customers who try to reach a CSR at a given time?

We can directly apply the formulae for the SD and variance of the Binomial and Poisson distributions derived above.
Solution to Problem XIX and XX

XIX. The standard deviation (SD) of the random variable X, denoting the number of CSR who remain absent on a particular day, where X ~ Binomial(25, 0.1), is √(25 × 0.1 × 0.9) = 1.5.
In this context the variance is 2.25.

Similarly, the standard deviation (SD) of the random variable Y, denoting the number of CSA who remain absent on a particular day, where Y ~ Binomial(10, 0.08), is √(10 × 0.08 × 0.92) = 0.8579.
In this context the variance is 0.736.

XX. The standard deviation (SD) of the random variable W, denoting the number of customers trying to avail CSR service at a certain time, where W ~ Poisson(8), is √8 = 2.828.
More Problems: XXI to XXIII

The company is often interested in the following problems, which help them relook at various strategies:

XXI. What is the median number of customers who try to reach a CSR at a given time?

XXII. What is the most likely value of the number of absentees among CSRs and CSAs?

XXIII. What is the most likely value of the number of customers who try to reach a CSR at a given time?
Solution to Problem XXI and XXII

Here direct enumeration is of course a possibility, as we did in the case of Problems V and VI.

But it is better if we know a formula for finding the median or mode of a distribution.

By now we have realized the utility of the mean, median, mode, standard deviation, variance, skewness, etc.

We have learnt how to find expressions for the mean and variance, so it will be good if we also learn approaches for computing the median or mode of some discrete distributions.
Median of Binomial Model

Suppose X ~ Binomial(n, p). It can be shown that the c.d.f. of X can be expressed as:

F(k) = P(X ≤ k) = Σ_{x=0}^{[k]} C(n, x) p^x (1 − p)^(n−x) = (n − k) C(n, k) ∫_0^(1−p) t^(n−k−1) (1 − t)^k dt.

This may be used to find the median, but that is not very straightforward.

It is known that if np is an integer, then the mean, median, and mode coincide and equal np.

In general, any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.
Mode of Binomial Model

Suppose X ~ Binomial(n, p).

Usually the mode of X is equal to ⌊(n + 1)p⌋, where ⌊.⌋ is the floor function (the largest integer less than or equal to the argument).

However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. --- See Board for Proof.

Degenerate cases: when p is equal to 0 or 1, the mode will be 0 or n correspondingly.
Median and Mode of Poisson Model

We know how the Poisson model arises as a limiting case of the Binomial; using the same limiting argument, we can say:

Usually the mode of Poisson(λ) is equal to ⌊λ⌋, where ⌊.⌋ is the floor function (the largest integer less than or equal to the argument).

However, when λ is a positive integer the distribution has two modes: λ and λ − 1. --- Try yourself.

Degenerate case: when λ is equal to 0 the mode will be 0.


More Problems on Call Center Management

XXIV. Suppose that after implementation of some stringent rules for casual leave, the probability of absenteeism comes down to 0.05 for both CSA and CSR. We assume that the two groups behave independently. Under the same policy as in Problem No. XV, how will you compute the probability that on a particular day no one will be left for sales promotion and marketing?

XXV. On a given day it was found that the total number of absentees in the two groups taken together is 5. What is the probability that 4 of them are CSRs?

Imagine what happens under conditioning when two independent binomial random variables are on the plate. (See Board for Proof.)
Expectation and Variance of
Geometric Model
X: the number of Bernoulli trials required to get the first success.

E(X) = Σ_{x=1}^{∞} x p(1 − p)^(x−1) = p Σ_{x=1}^{∞} x (1 − p)^(x−1) = 1/p.

Check that V(X) = (1 − p)/p².

Y: the number of failures preceding the first success in a sequence of Bernoulli trials; Y = X − 1.

E(Y) = E(X) − 1 = 1/p − 1 = (1 − p)/p.

V(Y) = V(X) = (1 − p)/p².
Some General rules for
Expectation and Variance
The expectation of any constant c is the same constant: E(c) = c.

The variance of any constant c is always 0: V(c) = 0.

For any two random variables X and Y, the sum law of expectation states:
E(X + Y) = E(X) + E(Y), whether or not X and Y are independent.

Always E[X − E(X)] = 0. Similarly E[Y − E(Y)] = 0.

For any two random variables X and Y:
V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y),
where Cov(X, Y) = E[(X − E(X))(Y − E(Y))].

If X and Y are independent, Cov(X, Y) = 0 [the converse is not true] and
V(X + Y) = V(X) + V(Y).
A simple use of sum law of expectation

Suppose a manager is conducting online interviews to fill up r vacancies one by one till all the positions are filled. He knows that the probability that a candidate will be selected is p. On an average, how many candidates does he have to interview, and what will be the variance of that number?

One way is to compute the mean and variance of the negative binomial model directly. But we can use the sum rule to obtain them more elegantly.
A simple use of sum law of expectation

Let X_i denote the number of candidates that need to be interviewed to fill the i-th vacancy (after the (i − 1)-th has been filled).

Therefore, if X denotes the number of candidates that need to be interviewed to fill the r vacancies, we have X = Σ_{i=1}^{r} X_i.

As a consequence:

E(X) = E(Σ_{i=1}^{r} X_i) = Σ_{i=1}^{r} E(X_i) = r/p.

It can also be seen that the X_i's are independent, and therefore

Var(X) = Var(Σ_{i=1}^{r} X_i) = Σ_{i=1}^{r} Var(X_i) = r(1 − p)/p².
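A quick Monte Carlo check of these two formulas in R (the values of r and p are mine, for illustration; rnbinom counts failures, so r is added back to get the number of interviews):

set.seed(1)
r <- 3; p <- 0.4
interviews <- rnbinom(100000, size = r, prob = p) + r
c(mean(interviews), r / p)                 # both close to 7.5
c(var(interviews),  r * (1 - p) / p^2)     # both close to 11.25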
Application of Laws of
Expectation and Variance
Example: Sales versus Profit:

The monthly sales, X, of a company have a mean of Rs. 25 million and a standard deviation of Rs. 4 million. Profits, Y, are calculated by multiplying sales by 0.3 and subtracting fixed costs of Rs. 6 million. What are the mean profit and the standard deviation of profit?
Solution

Throughout the computation, we consider the unit of currency as Rs.


in Million. We know that:

= 25 () = 42 = 16
Further; = 0.3 6

Therefore,
= 0.3 6 = 0.3 25 6 = 1.5
and
= 0.32 = 0.09 16 = 1.44
Consequently, = = 1.2

Applications of Laws of Expectation


in Decision Making under Uncertainty. . .
Many of the concepts we have introduced can be used effectively in
analyzing decision problems that involve uncertainty.
The basic features of such problems are:
We need to make a choice from a set of possible alternatives. Each
alternative may involve a sequence of actions.
The consequences of our actions, usually given in the form of a
payoff table, may depend on possible states of nature, which are
governed by a probability distribution (possibly subjective).
The true state of nature is not known at the time of decision.
Our objective is to maximize the expected payoff and/or to
minimize risk.
We could acquire additional information regarding the true state of
nature at a cost.
Example: Investment Decision

We shall consider only one example here. For more


complicated problems, a decision tree can be used. Those will
be discussed later.

An individual has Rupees 1 million and wishes to make a


one-year investment.

Suppose his/her possible actions are:

a1: buy a guaranteed income certificate paying 10%
a2: buy a bond with a coupon value of 8%
a3: buy a well-diversified portfolio of stocks
Example: Investment Decision

A coupon payment on a bond is a periodic interest payment


that the bondholder receives during the time between when
the bond is issued and when it matures.

Coupons are normally described in terms of the coupon rate,


which is calculated by adding the total amount of coupons
paid per year and dividing by the bond's face value. For
example, if a bond has a face value of Rs. 1,000 and a coupon
rate of 5%, then it pays total coupons of Rs. 50 per year.
Example: Investment Decision

Return on investment in the diversified portfolio depends on


the behavior of the interest rate next year. Suppose there are
three possible states of nature:

s1: interest rate increases
s2: interest rate stays the same
s3: interest rate decreases

Suppose further that the subjective probabilities for these states are 0.2, 0.5, and 0.3, respectively.
Example: Investment Decision

Based on historical data, the payoff table is:

State of Nature    a1         a2         a3
s1                 100,000    −50,000    150,000
s2                 100,000     80,000     90,000
s3                 100,000    180,000     40,000

Which action should he/she take?


Solution

The expected payoffs for the actions are:

a1: 0.2 × 100,000 + 0.5 × 100,000 + 0.3 × 100,000 = 100,000
a2: 0.2 × (−50,000) + 0.5 × 80,000 + 0.3 × 180,000 = 84,000
a3: 0.2 × 150,000 + 0.5 × 90,000 + 0.3 × 40,000 = 87,000

Hence, if one wishes to maximize expected payoff, then action a1 should be taken.
Solution based on an equivalent concept
To minimize expected opportunity loss (EOL):

Consider any given state. For each possible action, the opportunity loss is defined as the difference between what the payoff could have been had the best action been taken and the payoff for that particular action. Thus,

State of Nature    a1        a2         a3
s1                 50,000    200,000         0
s2                      0     20,000    10,000
s3                 80,000          0   140,000
EOL                34,000     50,000    47,000

Indeed, a1 is again optimal.
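A compact R sketch of both criteria for this payoff table:

payoff <- matrix(c(100000, -50000, 150000,
                   100000,  80000,  90000,
                   100000, 180000,  40000),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("s1", "s2", "s3"), c("a1", "a2", "a3")))
prob <- c(0.2, 0.5, 0.3)
colSums(prob * payoff)                   # expected payoffs: 100000, 84000, 87000
loss <- apply(payoff, 1, max) - payoff   # opportunity loss for each state and action
colSums(prob * loss)                     # EOL: 34000, 50000, 47000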


Application in Fair Betting

For actual investments, we expect to get a net positive return.

In order to establish a baseline for any wager, investment, or


even an insurance premium (which is a form of wager), we will
study the concept of a fair bet in this application.

Wager: An agreement in which people try to guess what will


happen and the person who guesses wrong has to give something
(such as money)
Application in Fair Betting
What does it mean to say "I'll give you 4 to 1 odds that the GST bill passes the floor test in the upper house"?

The most common interpretation is that you are willing to risk Rs. 4 against someone else's Rs. 1 on the outcome. Specifically, if the GST bill passes the floor test in the upper house you get Rs. 1, and if not you pay Rs. 4.

When you give odds on something happening, we will call this the odds for, O(f).

How does this relate to your belief about the underlying probability of the event?

If we let p be the probability of the event happening, then the situation can be diagrammed as in the next slide:
Application in Fair Betting

Event             Probability    Gain (Loss)
Happens           p              +1
Doesn't happen    (1 − p)        −O(f)

Therefore, for the situation to be "fair" one must have:
p × 1 + (1 − p) × (−O(f)) = 0.

This implies that the relationships between p and O(f) are:
p = O(f) / (O(f) + 1);   O(f) = p / (1 − p).
Application in Fair Betting

Probability as a Function of Odds For: p = O(f)/(O(f) + 1) plotted against the odds for (O(f) to 1); the probability rises from 0.5 towards 1 as the odds increase from 0 to 20.
Note on odd against

If you think that the probability of something happening is less than 0.5, then you would have to offer odds like 50 paise to Rs. 1. This is not usual, since odds are usually quoted in whole-rupee amounts.

A bet of 50 paise to Rs. 1 would be converted to a bet of 1:2. From the other person's point of view, he/she is now betting against the event happening.

In other words, if it happens he/she will now lose Rs. 2, and if it doesn't happen they will gain a rupee. In other words, they are giving you odds against the event happening.
Note on odd against

Let O(a) equal the amount the person will lose if the event happens; then the table becomes

Event             Probability    Gain (Loss)
Happens           p              −O(a)
Doesn't happen    (1 − p)        +1

Probability as a Function of Odds Against: p plotted against the odds against (O(a) to 1); the probability falls from 0.5 towards 0 as the odds against increase from 0 to 20.
Application in determination of
Insurance Premium

Assume that a person is 35 years old and has a probability 0.001 of dying in the next year. Suppose the person wants to purchase a Rs. 100,000 insurance policy. Also known:
Overhead cost per Rs. 100 of sales = Rs. 75
Desired profit = 10% of revenues

What should be the premium (Pr)?

Outcome    Probability    Insurance Company Gain
Live       0.999          Pr
Die        0.001          Pr − Rs. 100,000
Application in determination of
Insurance Premium
Under fair betting,

E(Gain) = 0.999 × Pr + 0.001 × (Pr − 100,000) = 0;

solving, we get Pr = Rs. 100.

Overhead cost per Rs. 100 of sales = Rs. 75, and desired profit = 10% of revenues. Therefore,

Target profit (in Rs.) = (100 + 75) × 0.10 = 17.50

Pr = Rs. 100 + Rs. 75 + Rs. 17.50 = Rs. 192.50.


Insurance Company's Perspective

Outcome    Subjective Probability    Company Gain       Probability × Gain
Live       0.999                     Rs. 192.50          Rs. 192.31
Die        0.001                     −Rs. 99,807.50      −Rs. 99.81

Expected Gain   = Rs. 92.50
Overhead        = Rs. 75.00
Expected Profit = Rs. 17.50
Persons Perspective

Outcome    Probability           Person's Gain    Probability × Gain
Live       (1 − p) = 0.998075    −192.50          −192.129
Die        p = 0.001925           99,807.50        192.129

The fair value of p solves (1 − p)(−192.5) + p(99,807.5) = 0, giving p = 0.001925.
Joint Probability Distribution

Consider an example where, in a small township, houses are sold by two agents, say THC and GPL. Let X and Y be the respective numbers of houses sold by them in a month. Based on past sales, we estimated the following joint probabilities for X and Y.

Y \ X    0       1       2       3
0        0.10    0.30    0.05    0.04
1        0.20    0.05    0.05    0.02
2        0.06    0.03    0.02    0.01
3        0.04    0.02    0.01    0.00
Joint Probability Distribution

Broadly, we have looked at univariate distributions, i.e.,


probability distributions in one variable or multiple independent
variables.

Bivariate distributions, also called joint distributions, are


probabilities of combinations of two variables.

For discrete variables X and Y, the joint probability distribution or joint probability mass function of X and Y is defined as

P(x, y) = P(X = x and Y = y)

for all pairs of values x and y.
Joint Probability Distribution

As in the univariate case, we require:

0 ≤ P(x, y) ≤ 1 for all pairs of values x and y;
Σ_x Σ_y P(x, y) = 1.

Thus, in our example, P(0, 1) = 0.20, meaning that the joint probability for X and Y to equal 0 and 1, respectively, is 0.20.

Other entries in the table are interpreted similarly.

Note that the sum of all entries must equal 1.
Marginal Probabilities

The marginal probabilities are calculated by summing across rows and down columns. In our example:

Y \ X            0       1       2       3       Marginal P(y)
0                0.10    0.30    0.05    0.04    0.49
1                0.20    0.05    0.05    0.02    0.32
2                0.06    0.03    0.02    0.01    0.12
3                0.04    0.02    0.01    0.00    0.07
Marginal P(x)    0.40    0.40    0.13    0.07    1.00
Marginal Probabilities

This gives us the probability mass functions for X and Y individually. For example, the marginal probability for THC to sell 1 house is 0.4.

x       Marginal P(x)        y       Marginal P(y)
0       0.40                 0       0.49
1       0.40                 1       0.32
2       0.13                 2       0.12
3       0.07                 3       0.07
Total   1.00                 Total   1.00
Independence of Random Variables

Two variables X and Y are said to be independent if

P(X = x and Y = y) = P(X = x) P(Y = y)

for all x and y.

That is, the joint probabilities equal the product of the marginal probabilities. This is similar to the definition of independent events.

In the houses-sold example, we have

P(X = 0 and Y = 2) = 0.06, while P(X = 0) = 0.4 and P(Y = 2) = 0.12, so that P(X = 0) P(Y = 2) = 0.048 ≠ 0.06.

Hence, X and Y are not independent.
Properties of Bivariate Distributions. . .

Expected values, variances, standard deviations, etc. can be computed from the joint distribution.

Please check yourself that:

E(X) = μX = 0.87;  E(Y) = μY = 0.77
V(X) = σX² = 0.7931;  V(Y) = σY² = 0.8371

These marginal parameters are computed via the earlier formulas.


Properties of Bivariate Distributions. . .
Covariance and Correlation
Covariance: the covariance between two discrete variables is defined as:

Cov(X, Y) = Σ_x Σ_y (x − μX)(y − μY) P(x, y).

This is equivalent to:

Cov(X, Y) = Σ_x Σ_y x y P(x, y) − μX μY = E(XY) − μX μY.

Example: Houses Sold
Cov(X, Y) = 0.53 − 0.87 × 0.77 = −0.1399.
Properties of Bivariate Distributions. . .
Covariance and Correlation
Coefficient of correlation: the LINEAR association between two variables is given by Pearson's coefficient of correlation, or product-moment correlation, defined as:

ρ(X, Y) = Corr(X, Y) = Cov(X, Y) / (σX σY).

Note that ρ(X, Y) only measures the linear association between the two variables X and Y.

If X and Y are linearly uncorrelated, ρ(X, Y) = 0; but ρ(X, Y) = 0 does not imply that X and Y are independent.
Properties of Bivariate Distributions. . .
Covariance and Correlation
Example: Houses Sold

ρ(X, Y) = −0.1399 / (√0.7931 × √0.8371) = −0.1716979.

This indicates that there is a mildly negative relationship between the numbers of houses sold by THC and GPL.

Is this surprising?

For absolutely continuous random variables, the sums are replaced by integrals and the p.m.f. by the corresponding p.d.f.
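All of these quantities can be reproduced from the joint table in a few lines of R:

P <- matrix(c(0.10, 0.30, 0.05, 0.04,
              0.20, 0.05, 0.05, 0.02,
              0.06, 0.03, 0.02, 0.01,
              0.04, 0.02, 0.01, 0.00),
            nrow = 4, byrow = TRUE)          # rows: y = 0..3, columns: x = 0..3
x <- 0:3; y <- 0:3
px <- colSums(P); py <- rowSums(P)           # marginal distributions
mx <- sum(x * px); my <- sum(y * py)         # means 0.87 and 0.77
cov_xy <- sum(outer(y, x) * P) - mx * my     # covariance = -0.1399
vx <- sum(x^2 * px) - mx^2; vy <- sum(y^2 * py) - my^2
cov_xy / sqrt(vx * vy)                       # correlation = -0.1716979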
Conditional Probability Distribution
Formally, let X and Y be two random variables. Then the conditional probability distribution of Y, for all values y, given X = x, is defined by:

P(y | x) = P(Y = y | X = x) = P(Y = y and X = x) / P(X = x).

Given X = x, we can also calculate the conditional expected value of Y via:

E(Y | X = x) = Σ_y y P(y | x).

E(Y | X = x) is known as the true regression of Y on X. Similarly, E(X | Y = y) is the true regression of X on Y.
Conditional Probabilities of Y

The conditional probabilities of Y are calculated for the various given values of X. In our example:

y          Given X = 0    Given X = 1    Given X = 2    Given X = 3
0          0.25           0.750          0.3846         0.5714
1          0.50           0.125          0.3846         0.2857
2          0.15           0.075          0.1538         0.1429
3          0.10           0.050          0.0770         0.0000
Total      1.00           1.000          1.0000         1.0000
E(Y | x)   1.10           0.425          0.9229         0.5714
True Regression Line of on

The plot of E(Y | x) against x = 0, 1, 2, 3 traces the true regression line of Y on X.
Sum of Two Variables...

The bivariate distribution allows us to develop the probability


distribution of the sum of two variables, which is of interest in
many applications.

In the houses-sold example, we could be interested in the


probability for having two houses sold (by either THC or GPL)
in a month.

This can be computed by adding the probabilities for all combinations of (x, y) pairs that result in a sum of 2:

P(X + Y = 2) = P(0, 2) + P(1, 1) + P(2, 0) = 0.16.

( + = 2) = (0, 2) + (1, 1) + (2, 0) = 0.16 .


Sum of Two Variables...

Using this method, we can derive the probability mass function for the variable X + Y:

X + Y    P(X + Y)
0        0.10
1        0.50
2        0.16
3        0.16
4        0.06
5        0.02
6        0.00
Total    1.00
Sum of Two Variables...

The expected value and variance of X + Y obey the following basic laws:

I.  E(X + Y) = E(X) + E(Y)
II. V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y)

If X and Y happen to be independent, then Cov(X, Y) = 0 and thus V(X + Y) = V(X) + V(Y).
Sum of Two Variables...

Example: Houses Sold

E(X + Y) = 0.87 + 0.77 = 1.64 ,

V(X + Y) = 0.7931 + 0.8371 + 2 × (−0.1399) = 1.3504 ,

SD(X + Y) = √1.3504 ≈ 1.162 .

Note that the negative correlation between X and Y has a
variance-reduction effect on X + Y. This is an important concept.
One application is that investing in both stocks and bonds can result
in reduced variability, or risk.
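To see that the two routes agree, one can compute E(X + Y) and V(X + Y) both directly from the joint p.m.f. and via laws I and II above. A minimal Python sketch with the joint table reconstructed from these slides (assumed values):

from math import sqrt

# Reconstructed joint pmf of the houses-sold example (assumed values).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.06, (0, 3): 0.04,
    (1, 0): 0.30, (1, 1): 0.05, (1, 2): 0.03, (1, 3): 0.02,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.02, (2, 3): 0.01,
    (3, 0): 0.04, (3, 1): 0.02, (3, 2): 0.01, (3, 3): 0.00,
}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y

# Direct computation on the sum S = X + Y.
mu_s = sum((x + y) * p for (x, y), p in joint.items())
var_s = sum((x + y - mu_s) ** 2 * p for (x, y), p in joint.items())

print(round(mu_s, 4), round(mu_x + mu_y, 4))                # 1.64 both ways
print(round(var_s, 4), round(var_x + var_y + 2 * cov, 4))   # 1.3504 both ways
print(round(sqrt(var_s), 3))                                # about 1.162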
Application: Mutual Fund Sales

Suppose a mutual fund salesperson has a 50% (perhaps too high, but we
will revisit this) chance of closing a sale on each call she makes.
Suppose further that she made four calls in the last hour.

Consider closing a sale a success and not closing a sale a failure.
Then, we will study the variables:

X = total number of successes
Y = number of successes before the first failure

An interesting question is: How would the distribution of Y vary for
different values of X?
Application: Mutual Fund Sales

Let
X = total number of successes (a success is denoted by S) out of 4
sales calls
Y = number of successes before the first failure (a failure is
denoted by F) in the same 4 sales calls

Assumptions:
I.  The success probability for a call is 0.5 (or 1/2).
II. The outcomes of different calls are independent.

The Sample Space: There are 2^4 = 16 possible outcomes, listed on the
next slide.
Application: Mutual Fund Sales

SSSS    FFFF
SSSF    FFFS
SSFS    FFSF
SFSS    FSFF
FSSS    SFFF
SSFF    FFSS
SFSF    FSFS
SFFS    FSSF

Since the success probability is 0.5, each possible outcome has a
probability of 1/16 = 0.0625.
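The whole exercise can also be reproduced by brute-force enumeration, since all 16 sequences are equally likely. The Python sketch below is simply an illustration of that counting argument:

from itertools import product
from collections import Counter

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]   # 16 call sequences

def total_successes(seq):
    return seq.count("S")                                      # the variable X

def successes_before_first_failure(seq):
    return len(seq) if "F" not in seq else seq.index("F")      # the variable Y

pmf_x = Counter(total_successes(s) for s in outcomes)
pmf_y = Counter(successes_before_first_failure(s) for s in outcomes)

# Each sequence has probability 1/16 = 0.0625, so divide the counts by 16.
print({x: n / 16 for x, n in sorted(pmf_x.items())})
# {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print({y: n / 16 for y, n in sorted(pmf_y.items())})
# {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}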
Application: Mutual Fund Sales

The Variable X: By simple counting, we have:

Outcome   X        Outcome   X
SSSS      4        FFFF      0
SSSF      3        FFFS      1
SSFS      3        FFSF      1
SFSS      3        FSFF      1
FSSS      3        SFFF      1
SSFF      2        FFSS      2
SFSF      2        FSFS      2
SFFS      2        FSSF      2
Application: Mutual Fund Sales

By counting the number of times each value of X occurs, we obtain the
probability mass function (or distribution) of X:

x P(x)
0 0.0625
1 0.25
2 0.375
3 0.25
4 0.0625
Recall that X follows a Binomial distribution with parameters n = 4
and p = 0.5.
Please check that E(X) = 2 and V(X) = 1.
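The binomial claim is easy to check against the formula P(X = x) = C(n, x) p^x (1 − p)^(n−x). A minimal Python sketch, illustrative only:

from math import comb

n, p = 4, 0.5
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
mean = sum(x * q for x, q in pmf.items())                 # equals n p = 2
var = sum((x - mean) ** 2 * q for x, q in pmf.items())    # equals n p (1 - p) = 1
print(pmf)          # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print(mean, var)    # 2.0 1.0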
Application: Mutual Fund Sales

[Figure: bar chart of the probability mass function of X, with probabilities
0.0625, 0.25, 0.375, 0.25 and 0.0625 at x = 0, 1, 2, 3, 4.]
Application: Mutual Fund Sales

The Variable Y: For each possible outcome, we can also determine the
value of Y:

Outcome   X   Y    Outcome   X   Y
SSSS      4   4    FFFF      0   0
SSSF      3   3    FFFS      1   0
SSFS      3   2    FFSF      1   0
SFSS      3   1    FSFF      1   0
FSSS      3   0    SFFF      1   1
SSFF      2   2    FFSS      2   0
SFSF      2   1    FSFS      2   0
SFFS      2   1    FSSF      2   0
Application: Mutual Fund Sales

By counting the number of times each value of Y occurs, we obtain the
probability mass function (or distribution) of Y:

y P(y)
0 0.5
1 0.25
2 0.125
3 0.0625
4 0.0625
Y actually follows a right-truncated Geometric distribution,
truncated at 4, with p = 0.5.
Please check that E(Y) = 0.9375 and V(Y) = 1.433594.
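A right-truncated geometric p.m.f. with p = 0.5 and truncation point 4 puts probability p^k (1 − p) on k = 0, 1, 2, 3 and the remaining mass p^4 on k = 4. A minimal Python sketch checking the quoted mean and variance:

n, p = 4, 0.5
pmf = {k: p**k * (1 - p) for k in range(n)}   # P(Y = k) = p^k (1 - p) for k < n
pmf[n] = p**n                                 # all remaining mass at the truncation point
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(pmf)                    # {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}
print(mean, round(var, 6))    # 0.9375 1.433594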
Application: Mutual Fund Sales

[Figure: bar chart of the probability mass function of Y, with probabilities
0.5, 0.25, 0.125, 0.0625 and 0.0625 at y = 0, 1, 2, 3, 4.]
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

For each of the outcomes in our sample space, we have both an X value
and a Y value.

We can therefore develop the joint probability distribution of X and
Y.

The table on the next slide gives the bivariate probabilities P(x, y)
for all possible combinations of x and y values:
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

x \ y        0        1        2        3        4        Row Sum
0            0.0625   0        0        0        0        0.0625
1            0.1875   0.0625   0        0        0        0.25
2            0.1875   0.125    0.0625   0        0        0.375
3            0.0625   0.0625   0.0625   0.0625   0        0.25
4            0        0        0        0        0.0625   0.0625
Column Sum   0.5      0.25     0.125    0.0625   0.0625   1

The row sums give the marginal probabilities of X and the column sums
give the marginal probabilities of Y. These are consistent with what
we obtained earlier.
Application: Mutual Fund Sales:
Chart of Bivariate Distribution of X and Y:

[Figure: 3-D bar chart of the joint probabilities P(x, y) over the X and Y
values 0 to 4.]
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

Please check that E(XY) = 2.6875, so that

Cov(X, Y) = E(XY) − E(X) E(Y) = 2.6875 − 2 × 0.9375 = 0.8125 .

The correlation coefficient between X and Y is:

ρ_XY = 0.678594476 .

This is a relatively large value, indicating a fairly strong positive
linear relationship between X and Y.
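These values follow from the same enumeration of the 16 equally likely call sequences. A minimal Python sketch, illustrative only:

from itertools import product
from math import sqrt

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]
pairs = [(s.count("S"),                                        # X
          len(s) if "F" not in s else s.index("F"))            # Y
         for s in outcomes]

expect = lambda f: sum(f(x, y) for x, y in pairs) / 16         # each sequence has prob 1/16

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)    # 2 and 0.9375
e_xy = expect(lambda x, y: x * y)                              # 2.6875
cov = e_xy - mu_x * mu_y                                       # 0.8125
var_x = expect(lambda x, y: (x - mu_x) ** 2)                   # 1.0
var_y = expect(lambda x, y: (y - mu_y) ** 2)                   # about 1.4336
print(e_xy, cov, round(cov / sqrt(var_x * var_y), 6))          # 2.6875 0.8125 0.678594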
Application: Mutual Fund Sales:
Conditional Distribution of Y given X:

The joint distribution of X and Y allows us to also determine the
distribution of Y for any given value of X. We noted before that this
is called a conditional distribution, and it is a very important
concept.

As an example, we could ask: what is the probability that Y = 2,
given that X = 3?

This probability is denoted by P(Y = 2 | X = 3) and can be computed
as follows:

P(Y = 2 | X = 3) = P(Y = 2 and X = 3) / P(X = 3) = 0.0625 / 0.25 = 0.25 .

Repeating this then yields the conditional distribution of Y, given
X = 3:
Application: Mutual Fund Sales:
Conditional Distribution of Y given X:

Given X = 3, we have:

y               0      1         2         3       4
P(y | X = 3)    0.25   0.25      0.25      0.25    0

Similarly, for the other given values of X, we have:

y               0      1         2         3       4
P(y | X = 0)    1      0         0         0       0
P(y | X = 1)    0.75   0.25      0         0       0
P(y | X = 2)    0.5    0.33333   0.16667   0       0
P(y | X = 4)    0      0         0         0       1
Application: Mutual Fund Sales:
Conditional Expectation of Y given X:

For each given value of X, we can now compute the expected value of
Y.

We noted earlier that this is called a conditional expectation.

The conditional expected values of Y for different given values of X
allow us to better understand the nature of the relationship between
X and Y. This concept will be important in regression.

For our example here, this results in:

E(Y | X = 0) = 0 ;             E(Y | X = 1) = 0.25 ;
E(Y | X = 2) = 0.666666667 ;   E(Y | X = 3) = 1.5 ;
E(Y | X = 4) = 4 .
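Since the call probability is 0.5, all sequences with the same value of X are equally likely, so each conditional expectation is just an average of Y over those sequences. A minimal Python sketch reproducing the values above, illustrative only:

from itertools import product
from collections import defaultdict

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]
x_of = lambda s: s.count("S")                                  # X
y_of = lambda s: len(s) if "F" not in s else s.index("F")      # Y

by_x = defaultdict(list)
for s in outcomes:
    by_x[x_of(s)].append(y_of(s))

for x in sorted(by_x):
    ys = by_x[x]
    print(x, round(sum(ys) / len(ys), 4))   # simple average, valid because p = 0.5
# 0 -> 0.0, 1 -> 0.25, 2 -> 0.6667, 3 -> 1.5, 4 -> 4.0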
Application: Mutual Fund Sales:
True Regression of Y given X:

A plot of these shows that:

[Figure: plot of the conditional expected values E(Y | X = x) against the
given values x = 0, 1, 2, 3, 4, showing an increasing trend.]
Application: Mutual Fund Sales:
Conditional Expectation of Y given X:

Observe that these conditional expected values vary depending on the
given value of X.

The diagram clearly indicates a positive relationship between X and
Y, which is consistent with our calculation of the correlation
coefficient.

In the context of our problem, this means that the greater the total
number of successes, the longer the run of successes before the first
failure tends to be.

This is rather intuitive.
Application: Mutual Fund Sales:
True Regression of Y given X:

In practice, the nature (or mathematical form) of the true regression
is often very complex, or even intractable.

Thus, we work with a suitable working model, guided by a scatter
diagram and/or a matrix plot.

More on regression will be discussed later.

Thank You for Your Patience

The probability that we may fail in the struggle ought not to deter
us from the support of a cause we believe to be just.
~ Abraham Lincoln

The 50-50-90 rule: anytime you have a 50-50 chance of getting
something right, there's a 90% probability you'll get it wrong.
~ Andy Rooney
