
Theory of Probability

Dr. Amitava Mukherjee


Why Should We Teach and Learn
Theory of Probability
in a Premier B-School?
Uncertainty and Inductive Reasoning

An element of uncertainty is unavoidably present regarding the truth, or correspondence with facts, of a conclusion reached through inductive reasoning.

The Pure Mathematics and Applied Mechanics you have studied so far are broadly based on a prolongation of deductive reasoning.

In deduction, given the premises, the conclusion necessarily follows


from them.

If a piece of deductive reasoning is free from fallacy, its conclusion


is formally valid.

3
Uncertainty and Inductive Reasoning

If the premises are materially valid so is the conclusion, but


deduction as such is not concerned with material validity.

In induction the premises only lend some support to the conclusion; both premises and conclusion relate to the contingent (i.e. situated in space and time) world.

Apart from formal validity, the question of the material


validity or validity as the basis of practical action of the
conclusion (over and above the same of the premises)
naturally arises.

4
Degree of Uncertainty

In the case of every exercise at induction, the question of


assessing the degree of uncertainty, or in other words, the
extent of support given to the conclusion by the evidence, is
relevant.

Until the operations generating the observations (including


additional observations if extensibility is assumed) are
performed, the evidence (extended evidence) that would be
realized remains uncertain and this uncertainty can be
assessed in terms of probability.

5
Degree of Uncertainty

In the case of every exercise at induction, the question of


assessing the degree of uncertainty, or in other words, the
extent of support given to the conclusion by the evidence, is
relevant.

Until the operations generating the observations (including


additional observations if extensibility is assumed) are
performed, the evidence (extended evidence) that would be
realized remains uncertain and this uncertainty can be
assessed in terms of probability.

6
Uncertainty: Objective Approach

In controlled experiments where a set of units are subjected


to different treatments before observation, allocation of the
units to the treatments is similarly randomized.

Example:

1. Inspection of diameters of cork stoppers in a production line
2. Controlled clinical trials

7
Uncertainty: Objective Approach

Uncertainty in the evidence may arise due to one or more of the following causes: natural variation, errors of measurement, sampling variation (incorporating any randomness deliberately introduced), unforeseen contingencies, etc.

8
Uncertainty: Subjective Approach
In the subjective approach there is no question of repetition
of observations.
Here uncertainty only means absence of knowledge about
the evidence and extended evidence, before the generating
operations are performed.
Because of this, the scope for induction is somewhat wider
in the subjective than in the objective approach.

Example:
1. Number of working hours that might be wasted due to a contract labour strike in the next six months.
2. The exchange rate the next morning
9
Meaning of Probability
Various Aspects
Meaning of probability

As regards probability which expresses the uncertainty about the


observables, it is given radically different interpretations in the
objective and subjective approaches.

In the former, roughly speaking, we assume in effect that the


unpredictable variation of the evidence is such that the relative
frequency with which it would belong to any meaningful set in the
evidential domain would tend to stabilize around some idealized
value, if the number of repetitions were made indefinitely large.

The meaningful sets, technically called measurable sets, are those


which are of practical interest and are theoretically easy to handle

11
Meaning of probability

The basis of this assumption, which we call frequential regularity


is our experience with diverse types of particular repetitive
experiments.

This is commonly called statistical regularity [A misnomer]

For any set of interest the probability that the uncertain evidence
will belong to it is identified with the corresponding idealized
long-term relative frequency.

Probabilities, so defined, of all meaningful sets in the evidential


domain determine a probability distribution over the domain and
this gives an objective representation of the evidential uncertainty.

12
Meaning of probability

In the subjective approach probability exists only in one's mind and


may vary from person to person.

For a particular person the probability of any set of interest


represents the person's degree of belief in the materialization of the
event that the evidence (extended evidence) generated through the
operations when they are performed would belong to that set.

In practice this degree of belief can be quantified introspectively,


e.g. by ascertaining the maximum price one is prepared to pay
outright for a unit gain contingent on the actual realization of the
event.

13
Meaning of probability

Ideally one should attach numerical degrees of belief to


different sets of interest in a consistent or coherent manner.

Coherent probabilities for different meaningful sets in the


domain define a probability distribution over it.

Since uncertainty here means absence of knowledge, such a


probability distribution may cover evidence extended
backward or collaterally to involve unobserved characters
belonging to the past or the present.

14
Gambling and Games of Chances
A Fascinating History of Development of
Theory of Probability
Cardano: an unrecognized pioneer

Gerolamo Cardano

24 September 1501 – 21 September 1576

A renowned physician, mathematician,


astrologer, and an inveterate gambler

16
Cardano: an unrecognized pioneer
He wrote a book entitled Liber de Ludo Aleae (The Book on Games
of Chance) around 1564

The book remained unpublished possibly because of various


misfortunes and tragedies that befell the author towards the end of
his life and saw the light of day only in 1663.

Cardano suffered a number of other tragedies as well. Cardano's son


Giambatista poisoned his wife.

Cardano was jailed briefly for heresy (in part for casting the
horoscope of Jesus).

Cardano supposedly predicted the date of his own death, a


prediction that he perhaps ensured by suicide.

17
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano

1. The chance of an event in a random trial represents its long-


run relative frequency.

2. If a die is honest its different faces have equal chance of


appearing.
In fact Cardano makes the statement, "I am as able to throw 1, 3, or 5 as 2, 4, or 6", which suggests that he had something like propensity in mind. From this he identifies the set of equally likely cases (the set of all 36 or 216 permutations) when two or three honest dice are thrown.
He uses the term "circuit" for such a set.

18
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano

3. When the circuit for a trial is well-identified, the chance of an


event is represented by the portion of the whole circuit favourable
to it.
Cardano gives the rule that to obtain the odds we have to consider
in how many ways the favourable result can occur and compare
that number to the remainder of the circuit

4. Cardano correctly uses the rule for addition of probabilities in terms


of disjoint events.
In throwing two dice of 36 equally likely cases, 11 are favourable to
the event at least one ace, 9 additional cases become favourable if
we take the larger event at least one ace or deuce, 7 further cases
come if we consider the still larger at least one ace, deuce, or trey
and so on.
Similar computations are made for three dice.

19
Basic Ideas and Rules of Probability Theory:
Conceptualized by Cardano
5. Cardano also correctly formulates the product rule for computing
the chance of the simultaneous occurrence of events defined for
independent trials Details will be discussed later.

6. In the case of throwing two dice the odds on getting at least one ace,
deuce, or trey are 3:1.

Cardano states that if the player who wants an ace, deuce, or trey
wagers three ducats [a standard unit of currency at that time] and
the other player one, then the former would win three times and
would gain three ducats and the other once and would win three
ducats; therefore in the circuit of four throws [impliedly in the long
run] they would always be equal.

20
Galileo Galilei: sought to resolve a puzzle about a dice game
Galileo Galilei
15 February 1564 – 8 January 1642

One of the pioneers in introducing


experimental methods in science

21
Basic Ideas and Rules of Probability Theory:
Conceptualized by Galileo

In throwing three dice, the numbers of unordered partitions producing the


total scores 9 ({1, 2, 6}, {1, 3, 5}, {1, 4, 4}, {2, 2, 5}, {2, 3, 4}, {3, 3, 3})
and 10 ({1,3, 6}, {1, 4, 5}, {2, 2, 6}, {2, 3, 5}, {2, 4, 4}, {3, 3, 4}) are
both equal to 6. Yet, why is it that long observation has made dice-players
consider 10 to be more advantageous than 9?

Galileo pointed out that there is a very simple explanation, namely that
some numbers are more easily and more frequently made than others,
which depends on their being able to be made up with more variety of
numbers.

A variety of numbers making up a score here represents an ordered


partition. There being 27 such ordered partitions for the score 10 and 25
for the score 9 and all ordered partitions or permutations being equally
likely, the chance of getting a 10 is higher.

22
Probability is officially born: Pascal and Fermat
Blaise Pascal
19 June 1623 – 19 August 1662

A French (Parisian) mathematician,


physicist, inventor, writer and Christian
philosopher.

Pascal solved some problems on games of chance, including those Cardano had attempted but failed to solve, through correspondence with his friend Pierre de Fermat (1601–1665), stationed at Toulouse.

23
Probability is officially born: Pascal and Fermat
Pierre de Fermat
17 August 1601 (or 1607) – 12 January 1665

Although a jurist by profession, Fermat had


become famous for his contributions to
mathematics and the other branches of
knowledge

24
First Published Book on Probability

Christiaan Huygens
14 April 1629 – 8 July 1695

A prominent Dutch mathematician and scientist, known particularly as an astronomer, physicist, probabilist and horologist.

Wrote the book entitled De Ratiociniis in


Ludo Aleae (Computations in Games of
Chance), published in 1657 - The first
published book on probability

25
Applications: Probability in Finance
Consider a game with only two players: they alternate moves, each is immediately informed of the other's moves, and one or the other wins.

In such a game, one player has a winning strategy, and so we


do not need the subtle solution concepts now at the center of
game theory in economics and the other social sciences.

Reference: Probability and Finance: It's Only a Game!, by Glenn Shafer and Vladimir Vovk, 2001, John Wiley & Sons, Inc.

26
Probability in Finance
Consider a straightforward but rigorous framework for elaboration, with no
extraneous mathematical or philosophical baggage, of two ideas that are
fundamental to both probability and finance:

The Principle of Pricing by Dynamic Hedging : [Can be discerned in the


letters of Blaise Pascal to Pierre de Fermat in 1654] When simple gambles
can be combined over time to produce more complex gambles, prices for
the simple gambles determine prices for the more complex gambles.

The Hypothesis of the Impossibility of a Gambling System: Sometimes


we hypothesize that no system for selecting gambles from those offered to
us can both (1) be certain to avoid bankruptcy and (2) have a reasonable
chance of making us rich.

27
Probability in Marketing

A company might like to estimate the probability that the volume of sales increases by Rupees ten million given a particular marketing campaign.

Probability models are used to measure consumer lifetime


value.

28
Elementary Calculus
Probability:
Classical and Frequentist
Approaches
Calculus of Probability :
Connections with Set Theory

Set Theory | Probability Theory | Notation
Element | Outcome / elementary event | ω ∈ S
Set | (Compound) event: a collection of elementary events | A, B, …
Universal set | Sample space, or the sure event | S
Null set | Impossible event | ∅
Complement of a set A | Complementary event of A | A^c
A is a subset of B (B is a superset of A) | Occurrence of event A implies occurrence of event B | A ⊂ B
Union of sets A and B | Occurrence of event A or B (or both) | A ∪ B
30
Calculus of Probability :
Connections with Set Theory
Set Theory | Probability Theory | Notation
Intersection of sets A and B | Joint occurrence of events A and B | A ∩ B
A and B are disjoint sets | A and B are mutually exclusive events | A ∩ B = ∅
A and B are exhaustive sets | A and B are exhaustive events | A ∪ B = S
Power set: the set of all subsets of S, including the empty set and S itself | (Countable) sigma-field | B

31
Classical Definition of Probability :
As in Théorie analytique des probabilités
by Pierre-Simon Laplace

"The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible, when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible."

The probability of an event A is defined a priori, without actual experimentation, as

P(A) = (number of cases favourable to A) / (total number of equally possible cases),

provided all these outcomes are equally likely.

32
Simple Examples

Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball.

Probability of selecting a white ball is n / (n + m).

We can use the classical definition to determine the probability that a given number is divisible by a prime p. If p is a prime number, then every p-th number (starting with p) is divisible by p. Thus among p consecutive integers there is one favourable outcome, and hence

P = 1/p.
33
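The classical definition lends itself to direct computation by counting. A minimal Python sketch of the two examples above (the specific numbers n = 3, m = 7 and p = 7 are illustrative choices, not from the slides):

```python
from fractions import Fraction

def classical_probability(favourable, total):
    """Classical definition: ratio of favourable to total equally likely cases."""
    return Fraction(favourable, total)

# Box with n white and m red balls: P(white) = n / (n + m)
n_white, m_red = 3, 7
print(classical_probability(n_white, n_white + m_red))   # 3/10

# Divisibility by a prime p: among p consecutive integers exactly one is divisible by p
p = 7
favourable = sum(1 for k in range(1, p + 1) if k % p == 0)
print(classical_probability(favourable, p))               # 1/7
```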
Frequentist Definition of Probability :

The frequentist view may have been foreshadowed by


Aristotle, in Rhetoric, when he wrote:

the probable is that which for the most part happens

34
Frequentist Definition of Probability :

In the frequentist interpretation, probabilities are discussed


only when dealing with well-defined random experiments (or
random samples).

The set of all possible outcomes of a random experiment is


called the sample space of the experiment.

An event is defined as a particular subset of the sample space


to be considered.

35
Frequentist Definition of Probability :

For any given event, only one of two possibilities may hold: it
occurs or it does not.

The relative frequency of occurrence of an event, observed in


a number of repetitions of the experiment, is a measure of the
probability of that event.

This is the core conception of probability in the frequentist


interpretation.

36
Frequentist Definition of Probability :

Thus, if n is the total number of trials and n_A is the number of trials in which the event A occurred, the probability P(A) of the event A occurring will be approximated by the relative frequency as follows:

P(A) ≈ n_A / n.

Clearly, as the number of trials is increased, one might expect the relative frequency to become a better approximation of a "true frequency".

37
Frequentist Definition of Probability :

A claim of the frequentist approach is that in the "long run,"


as the number of trials approaches infinity, the relative
frequency will converge exactly to the true probability:


( ) = lim .

38
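The claimed long-run stabilisation of relative frequencies can be illustrated by simulation. A small sketch (the event "a die shows an ace", the seed and the trial counts are arbitrary illustrations):

```python
import random

# Monte Carlo sketch of frequential regularity: the relative frequency of the
# event "die shows an ace" stabilises around its probability 1/6 as the number
# of trials grows.
random.seed(42)
for n in (100, 10_000, 1_000_000):
    n_A = sum(1 for _ in range(n) if random.randint(1, 6) == 1)
    print(n, n_A / n)
# The printed relative frequencies approach 1/6 ≈ 0.1667 as n increases.
```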
Traveler's Choices

There are three major possible options available to a travel


agency for its customer who wants to travel to New Delhi
from Jamshedpur

Direct train to New Delhi


By Road/Rail to Ranchi and Flight from Ranchi
By Rail/road to Kolkata and Flight from Kolkata

The agency has records of previous bookings on the same route over the last few years that will help it assess the probable choices of customers.

39
Other Applications

Proportion of loan applications from SMEs (micro, small and medium-sized enterprises) rejected by a major bank

Proportion of defective items produced by a manufacturing unit

In estimating the lifetime of a product: the proportion of electric bulbs surviving after 1000 hours in operation.
In fact, in estimating the probability of survival of an electric/electronic device after a certain number of hours, we can actually use a lifetime distribution that we will study later.
Not only for consumer durables; we may think of consumer lifetime as well.

40
Combinatorics :
Arrangement of r balls in n cells

Four possible cases according to:
Whether the balls are distinguishable or not
Whether the exclusion principle is followed (a cell cannot hold more than one ball) or not

41
Combinatorics :
Arrangement of r balls in n cells
Exclusion principle followed:
  Balls distinguishable: n!/(n − r)! if r ≤ n, and 0 otherwise
  Balls indistinguishable: C(n, r) (Fermi–Dirac statistics)
Exclusion principle not followed:
  Balls distinguishable: n^r (Maxwell–Boltzmann statistics)
  Balls indistinguishable: C(n + r − 1, r) (Bose–Einstein statistics)
  Special case, no cell remains empty: C(r − 1, n − 1)
42
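The four counts in the table can be checked with Python's standard combinatorial functions; a sketch with the illustrative values n = 4 cells and r = 2 balls:

```python
from math import comb, perm

n, r = 4, 2

print(perm(n, r))            # distinguishable, exclusion: n!/(n-r)! = 12
print(n ** r)                # distinguishable, no exclusion (Maxwell-Boltzmann): 16
print(comb(n, r))            # indistinguishable, exclusion (Fermi-Dirac): 6
print(comb(n + r - 1, r))    # indistinguishable, no exclusion (Bose-Einstein): 10
print(comb(r - 1, n - 1))    # no cell empty (needs r >= n); here r < n, so 0
```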
Application

A random sample of size r with replacement is taken from a population of n elements. What is the probability that in the sample no element appears twice, that is, that the sample could have been realized also by sampling without replacement?

We see that there are n^r possible samples in all, of which n(n − 1)⋯(n − r + 1) satisfy the stipulated condition. Assuming that all arrangements have equal probability, we conclude that the probability of no repetition in our sample is

n(n − 1)⋯(n − r + 1) / n^r = (1 − 1/n)(1 − 2/n)⋯(1 − (r − 1)/n).
43
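The same product formula underlies the familiar birthday problem; a brief sketch (the helper name and the values n = 365, r = 23 are illustrative):

```python
from math import perm

def p_no_repetition(n, r):
    """Probability that a sample of size r drawn with replacement from n
    elements shows no repetition: n(n-1)...(n-r+1) / n**r."""
    return perm(n, r) / n ** r

# Birthday-style check: with n = 365 "cells" and r = 23 people,
# the probability of no shared birthday is already below one half.
print(p_no_repetition(365, 23))   # ~0.4927
```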
Industrial Implications
If in a coal mine 12 accidents occur each year, then practically every year will contain months with two or more accidents. The probability that all twelve months will have one accident each is only 12!/12^12 ≈ 0.0000537.

On average, only one year out of about 18,614 years will show a uniform distribution of one accident per month.

This example reveals an unexpected characteristic of pure randomness.
This type of argument is often used for fraud detection.

44
Extensions

The number of ways to deposit r distinct objects into k cells, with r_i objects in cell no. i (the r_i being non-negative integers summing to r), is

r! / (r_1! r_2! ⋯ r_k!)

(the ordering of the bins is important, but within each bin the ordering is not important).

45
More Example

A throw of twelve dice can result in 6^12 different outcomes, which we consider equally likely. The event that each face appears twice can occur in as many ways as twelve dice can be arranged in six groups of two each. The probability of that event is therefore

12! / (2^6 · 6^12) ≈ 0.0034.

46
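Both small probabilities quoted above can be verified in one line each:

```python
from math import factorial

# Coal-mine example: all 12 accidents fall in different months.
print(factorial(12) / 12 ** 12)            # ~5.37e-05, i.e. about 1 year in 18614

# Twelve dice: each face appears exactly twice.
print(factorial(12) / (2 ** 6 * 6 ** 12))  # ~0.003438
```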
Application In Industrial Quality Control
Items are sampled from a collection of N items and inspected for defects. Assume that there are n defective items in the lot of N items. What is the probability of getting k defective items in a sample of m items?
Problems of this type lead to the genesis of the hypergeometric distribution.

In practice, the total population size N as well as m and k are known, but the number n of defective items in the population is unknown.
o The latter may be estimated by maximizing the likelihood of the sample, and may be given a confidence interval using standard statistical estimation.

47
Estimating Population Size of Fish in a
Lake [capture-recapture ]
Consider the following experiment in an attempt to estimate the number of fish in a lake. First, m fish are captured, marked, and released. At a later time, c fish are caught, with k of them bearing the mark of the original capture. Assuming the size of the population of fish is N, the probability of getting k marked fish in the second capture is

P(k) = C(m, k) · C(N − m, c − k) / C(N, c).

In this case (m, c, k) are known but N is unknown. We can estimate N or construct confidence intervals using the likelihood (the probability of the observed data as a function of the unknown parameter N).
For example, if k = 100 and m = c = 1000, we have an approximately 93% confidence interval that N belongs to (8500, 12000).

48
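A sketch of the likelihood-based estimate of the population size N using the slide's numbers; the grid of candidate values and the function name are illustrative choices:

```python
from math import comb

def likelihood(N, m, c, k):
    """Hypergeometric probability of seeing k marked fish when c fish are
    recaptured from a population of size N containing m marked fish."""
    return comb(m, k) * comb(N - m, c - k) / comb(N, c)

# Numbers quoted on the slide: m = c = 1000 and k = 100.
m, c, k = 1000, 1000, 100
# Coarse grid search for the maximum-likelihood population size.
candidates = range(2000, 20001, 100)
N_hat = max(candidates, key=lambda N: likelihood(N, m, c, k))
print(N_hat)   # about m*c/k = 10000
```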
Criticism of
Classical Definition of Probability :
Mathematicians find the definition to be circular.
The probability for a "fair" coin is... A "fair" coin is defined by a
probability of...

The definition is very limited. It says nothing about cases where no


physical symmetry exists.
Insurance premiums, for example, can only be rationally priced by
measured rates of loss.

It is not trivial to justify the principle of indifference except in the


simplest and most idealized of cases. Coins are not truly symmetric.
Can we assign equal probabilities to each side? Can we assign equal
probabilities to any real world experience?
49
σ-algebra
A non-empty collection B of subsets of the sample space S is called a sigma algebra (or a Borel field, for events over the real line) if it satisfies the following two properties:
a. If A ∈ B, then A^c ∈ B
(B is closed under complementation).
b. If A_1, A_2, … ∈ B, then ∪_{i=1}^∞ A_i ∈ B
(B is closed under countable unions).

To show: B is closed under finite unions.
To show: ∅ ∈ B (the empty set is an element of B) and S ∈ B (the sample space is an element of B).
50
σ-algebra

Easy to realize that the power set generated by a countable sample space is always a σ-field.

Let S = {a, b, c, d} and B = {∅, {a, b}, {c, d}, S}. Can we consider B as a σ-field?

Probability is a measure (set function) defined on (S, B).
(S, B) is known as a probabilizable space.
51
More on σ-algebra

Example-1 (Sigma algebra-I). If S is finite or countable, we define, for a given sample space S, B = {all subsets of S, including S itself}. If S has n elements, there are 2^n sets in B. For example, if S = {1, 2, 3}, then B is the following collection of 2^3 = 8 sets: ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}.
Example-2 (Sigma algebra-II). Let S = (−∞, ∞), the real line. Then B is chosen to contain all sets of the form [a, b], (a, b], (a, b) and [a, b) for all real numbers a and b. Also, from the properties of B, it follows that B contains all sets that can be formed by taking (possibly countably infinite) unions and intersections of sets of the above varieties.

52
Axiomatic Definition
of Probability
and
Probability Laws
Axiomatic Definition of Probability
[By Andrey Kolmogorov]
Probability of an event A, denoted by P(A), is a set function (also called a measure) defined on a sample space S with σ-field B (also called the event space), satisfying the following axioms:
Axiom of non-negativity: the probability of an event is a non-negative real number:
P(A) ≥ 0 for every A ∈ B.
Axiom of unity: the probability that at least one of the elementary events in the entire sample space will occur is 1; more specifically, there are no elementary events outside the sample space:
P(S) = 1.
Axiom of countable additivity: for any countable sequence of disjoint (synonymous with mutually exclusive) events A_1, A_2, …,
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

54
Andrey Kolmogorov

Andrey Nikolaevich Kolmogorov


25 April 1903 – 20 October 1987
A 20th-century Russian mathematician
who made significant contributions to the
mathematics of probability theory,
topology, intuitionistic logic, turbulence,
classical mechanics, algorithmic
information theory and computational
complexity.

55
Important Results That Follow From the Probability Axioms
Result-1: For the impossible event ∅, we necessarily have P(∅) = 0.

Result-2: The probability function P is finitely additive; that is, if A_i ∈ B (for i = 1, 2, …, n) and these events are disjoint, then
P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).

Result-2.A: If A_i, i = 1, 2, …, n, are exhaustive and mutually exclusive events in B, then
Σ_{i=1}^n P(A_i) = 1.

Result-2.B: Rule for the complementary probability of any event A: for any event A ∈ B, P(A^c) = 1 − P(A).
Important Results - Continued

Result-3. The probability function P is monotone; that is, if A and B are events such that A ⊂ B, then P(A) ≤ P(B).

The numeric bound: it immediately follows from the monotonicity property that for any event A, 0 ≤ P(A) ≤ 1.

Result-4. The probability function P is subtractive; that is, if A and B are events such that A ⊂ B, then
P(B − A) = P(B) − P(A).
Important Results - Continued

Result-5. Rule for the probability of a union of any events A, B, not necessarily mutually exclusive: if A and B are any two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Imagine the corresponding rule for three or more events.

Example: Selection and Allocation Dilemma

A company recruits 10 students for summer internship from a B-


School for four functional areas, namely, Analytics, Finance,
Marketing and Operations. The students are nearly equally
efficient in terms of their expertise in each of these four
functional areas. On the first day of their reporting, the HR manager, in a hurry, allotted them almost at random to the four
functional areas, without taking much care about the requirement
of various areas. What is the probability that Analytics area will
receive exactly 4 of the students?
Hint Answer

Note: You may consider the students indistinguishable based on their skillset.

That is, the total number of equally likely arrangements is the same as that of arranging 10 indistinguishable balls in 4 cells: C(10 + 4 − 1, 4 − 1) = C(13, 3).

The number of arrangements favourable to the desired event is the same as that of arranging 6 (= 10 − 4) balls in 3 (= 4 − 1) cells: C(6 + 3 − 1, 3 − 1) = C(8, 2).

Check: Required probability = C(8, 2) / C(13, 3) = (8 · 7 · 3)/(11 · 12 · 13) = 28/286 = 14/143 ≈ 0.0979.
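A quick check of the hint, treating the equally likely arrangements as "stars and bars" compositions as stated above:

```python
from math import comb
from fractions import Fraction

# Arrangements of 10 indistinguishable students in 4 areas, assumed equally likely.
total = comb(10 + 4 - 1, 4 - 1)          # C(13, 3) = 286
favourable = comb(6 + 3 - 1, 3 - 1)      # C(8, 2)  = 28, Analytics gets exactly 4
print(Fraction(favourable, total))       # 14/143 ≈ 0.0979
```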
Example
A large shopping complex has 15 entry gates. Usually, one security person is deployed at each of these gates. Security personnel can usually chat with the colleagues deployed at the adjacent (both right and left) gates. The personnel at the 1st and 15th gates will be able to talk with only one colleague each. It was observed from past CCTV footage that two personnel, say A and B, whenever deployed at adjacent gates, gossip more and do not take the job seriously! Management has ordered the chief security officer that A and B should be deployed so that there are 10 other personnel between them. One day the chief security officer was absent, and another person who had no idea about the order allotted the 15 personnel to the 15 gates at random. What is the probability that the requirement will be met even in that case?
Hint Answer

Required probability = (2! × 4 × 13!) / 15! = 4/105 ≈ 0.038
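Two ways to check the hinted answer 4/105: an exact count of admissible gate positions for A and B, and a Monte Carlo simulation (seed and trial count are arbitrary choices):

```python
import random
from fractions import Fraction

# Exact count: A and B must occupy gate positions differing by exactly 11,
# so that 10 gates lie between them.
pairs = [(i, j) for i in range(1, 16) for j in range(1, 16) if j - i == 11]
print(Fraction(2 * len(pairs), 15 * 14))       # ordered placements of A and B: 4/105

# Monte Carlo check of the same probability.
random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    gates = random.sample(range(1, 16), 15)    # random assignment of personnel to gates
    a, b = gates.index(1), gates.index(2)      # personnel 1 and 2 play the roles of A and B
    hits += abs(a - b) == 11
print(hits / trials)                           # close to 4/105 ≈ 0.0381
```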
When Exact Probability is Untraceable

Result-6 (Boole's inequality). If A_i, i = 1, 2, …, n, are any events, then

P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).

Result-7 (Bonferroni's inequality). If A_i, i = 1, 2, …, n, are any events, then

P(∩_{i=1}^n A_i) ≥ 1 − Σ_{i=1}^n P(A_i^c),

or equivalently

P(∩_{i=1}^n A_i) ≥ Σ_{i=1}^n P(A_i) − (n − 1).
Example

Over the years, the culture of binge drinking has spread in premier B-schools across the globe despite the honest efforts of various managements to curb irresponsible drinking behaviour of students. After a booze night in a hostel, it was reported that 80% of the students of one hostel consumed Beer, 70% enjoyed Whisky and 60% relished Vodka. What is the proportion of stalwarts who tried all three that night?
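With only the three marginal percentages given, Bonferroni's inequality yields a lower bound; a one-line check:

```python
# Bonferroni's inequality applied to the booze-night figures above:
# P(Beer ∩ Whisky ∩ Vodka) >= P(Beer) + P(Whisky) + P(Vodka) - (3 - 1).
p_beer, p_whisky, p_vodka = 0.80, 0.70, 0.60
lower_bound = p_beer + p_whisky + p_vodka - 2
print(max(lower_bound, 0.0))   # 0.10 -> at least 10% tried all three
```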
Union of More than Two
Events
Poincaré's Theorem

For any sequence of events A_i, i = 1, 2, …, n (not necessarily mutually exclusive),

P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ⋯ + (−1)^{n−1} P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n).

For a proof, use the method of induction starting from Result-5.

Example of Booze Night - continued

It was further reported that, because of the popular perception that "Whisky after Beer, there is no fear; Beer after Whisky, it is risky", 60% of the students tried Whisky after Beer but no one tried Beer after Whisky. Moreover, 50% tried both Beer and Vodka and 40% actually tried Baseball Pleasure (a cocktail of Vodka and Whisky). About 35% tried all three in some way or other. What is the probability that a student did not drink at all on the booze night?

Is this information consistent?

Conditional Probability
Perception of Conditioning in Random
Experiment: Relative Frequency Context
In N independent trials, suppose N_A, N_B and N_{AB} denote the number of times the events A, B and A∩B occur, respectively. According to the frequency interpretation of probability, for large N

P(A) ≈ N_A/N, P(B) ≈ N_B/N and P(A∩B) ≈ N_{AB}/N.

Among the N_B occurrences of B, only N_{AB} are also occurrences of A. Thus the ratio N_{AB}/N_B may be looked upon as a measure of the event A given that B has already occurred. Now,

N_{AB}/N_B = (N_{AB}/N) / (N_B/N) ≈ P(A∩B)/P(B).

69
Definition of Conditional Probability
Consider the probability space (S, B, P). The conditional probability of an event A given that another event B (also belonging to the same B) has occurred, denoted by P(A|B), is defined as

P(A|B) = P(A∩B)/P(B), provided P(B) ≠ 0.

If P(B) = 0, we set P(A|B) = 0.
The above definition satisfies all the probability axioms discussed earlier [please check by yourself].

70
Justification of the Definition of Conditional
Probability in the Light of Three Axioms
(i) P(B) > 0 by definition and P(A∩B) ≥ 0 by the axiom of non-negativity. Therefore, P(A|B) = P(A∩B)/P(B) ≥ 0.

(ii) Note that S∩B = B. Therefore,
P(S|B) = P(S∩B)/P(B) = P(B)/P(B) = 1.

(iii) Suppose A_1, A_2, … are mutually disjoint. Then, for any B and i ≠ j,
(A_i∩B) ∩ (A_j∩B) = (A_i∩A_j) ∩ B = ∅ ∩ B = ∅.
Therefore
P(∪_{i=1}^∞ A_i | B) = P((∪_{i=1}^∞ A_i) ∩ B)/P(B) = Σ_{i=1}^∞ P(A_i∩B)/P(B) = Σ_{i=1}^∞ P(A_i|B).

Hence P(·|B) satisfies all the probability axioms and thus defines a legitimate probability measure.
71
Properties of Conditional Probability
(i) If B ⊂ A, then P(A|B) = 1.
If B ⊂ A, then A∩B = B; therefore P(A∩B) = P(B). Hence,
P(A|B) = P(A∩B)/P(B) = P(B)/P(B) = 1,
since the occurrence of B implies the automatic occurrence of the event A.
Example: the probability that a G20 member is selected given that it is a BRICS member.
A = all G20 members = {Argentina, Australia, Brazil, Canada, China, France, Germany, India, Indonesia, Italy, Japan, South Korea, Mexico, Russia, Saudi Arabia, South Africa, Turkey, UK, USA, EU}
B = all BRICS members = {Brazil, Russia, India, China, South Africa}
72
Properties of Conditional Probability

(ii) If A ⊂ B, then P(A|B) ≥ P(A).
If A ⊂ B, then A∩B = A; therefore P(A∩B) = P(A). Hence,
P(A|B) = P(A∩B)/P(B) = P(A)/P(B) ≥ P(A).

(iii) The Law of Compound Probability.
When expressed in product form we get
P(A∩B) = P(A|B) · P(B).
73
Theorem of Compound Probability

When we have three events A, B and C, we have
P(A∩B∩C) = P((A∩B)∩C) = P(A∩B) · P(C | A∩B) = P(A) · P(B|A) · P(C | A∩B).

By an easy induction we obtain, for n events A_1, A_2, …, A_n,

P(A_1∩A_2∩⋯∩A_n) = P(A_1) · P(A_2|A_1) · ⋯ · P(A_n | A_1∩A_2∩⋯∩A_{n−1}).
74
Probability of Complementary Event
with Conditioning Event

Suppose that, given B, either A or A^c (both belonging to B) can take place.

So we have
P(A^c | B) = 1 − P(A | B).

We complement the event that is conditioned, not the conditioning event, in computing such probabilities.

75
Law of independence
How should we interpret the equation
P(B|A) = P(B)?
It shows that A's occurrence has had no impact on B. We then say that B is independent of A.

We now ask the following: if B is independent of A, then is A also independent of B?

The answer is yes, as the equation P(A∩B) = P(A)·P(B) also implies the equation P(A|B) = P(A).

Thus the relationship of independence is symmetric. So from now on we shall say A and B are independent events whenever either one is independent of the other.
77
Testing Independence of Two Events

This will mean
P(A|B) = P(A),
P(B|A) = P(B),
and
P(A∩B) = P(A) · P(B).

To show that A and B are independent events, we may verify any one of the above three equations.

78
Testing Independence of Two Events

Example: Three coins are tossed.
A = first coin shows heads; B = second coin shows heads.
Then P(A) = 1/2 = P(B) and P(A∩B) = 1/4. So,
P(A∩B) = P(A) · P(B).

This verifies that A and B are independent events.

79
Testing Independence of Two Events

If A and B both belong to B and are mutually independent, then
A and B^c are independent,
A^c and B are independent,
A^c and B^c are independent.

80
Difference between Mutually Exclusive
Events and Independent Events
Note carefully that, if A and B are mutually exclusive, then P(A∩B) = 0. From the definition of conditional probability, we see that
P(A|B) = P(B|A) = 0.

From this we would have, respectively,
P(A|B) = 0 ≠ P(A) if P(A) ≠ 0, and P(B|A) = 0 ≠ P(B) if P(B) ≠ 0.
Thus, unless either A or B is the null event ∅, A and B are not independent when they are mutually exclusive. Alternatively, if A and B are mutually exclusive, the occurrence of B must depend on A, since if A occurs then B can never do so. If either A or B is equal to ∅, then A and ∅, or B and ∅, are independent.

81
Difference between Pairwise Independence
and Complete Independence of Events
If A, B and C are three events, they will be pairwise independent if
P(A∩B) = P(A)·P(B),
P(A∩C) = P(A)·P(C),
P(B∩C) = P(B)·P(C).

A, B and C will be completely independent if, along with the above three equations, the following also holds:
P(A∩B∩C) = P(A)·P(B)·P(C).

How many equations need to be satisfied if the events A_1, A_2, …, A_n have to be completely independent?

82
Some Final Remarks on
Achievements and Failures of
Gerolamo Cardano

83
Cardano: an unrecognized pioneer

5. Cardano also correctly formulates the product rule for computing


the chance of the simultaneous occurrence of events defined for
independent trials:

In terms of odds he says that if, out of n equally likely cases, just m are favourable to an event, then in r [independent] repetitions of the trial the odds that the event would occur every time are as m^r / (n^r − m^r), which, writing p for m/n, becomes p^r / (1 − p^r). In particular, in throwing three dice, 91 out of 216 cases are favourable to the event "at least one ace".

If the three dice are thrown thrice, Cardano correctly gets that the odds of getting the event every time are a little less than 1 to 12.

84
Two problems : Cardano discussed but
failed to solve correctly
1. Problem of minimum number of trials: What should be the
minimum value of r, the number of throws of two dice,
which would ensure at least an even chance for the
appearance of one or more double sixes?

2. Problem of division: Two players start playing a series of


independent identical games in each one of which one or
other would win, with the agreement that whoever wins a
pre-fixed number of games first would win the series and the
total stake. The series is interrupted when the two players
have respectively a and b games still to win. What division
of the total stake between the players would be fair?

For Brainstorming

85
Some Important Theorems
Theorem of Total Probability

There are three machines producing cork stoppers in a manufacturing unit. One machine is old and produces about 5% defective items. The other two machines produce 2% and 3% defective items respectively. The probability of finding a defective item at the time of inspection is actually connected with
I. the machine on which that cork stopper was produced, and
II. the proportional contribution of the three machines to the total production of cork stoppers under inspection.
We can use the conditional probability to express the probability
of a complicated event in terms of simpler related events. The
theorem of total probability helps us in achieving this.

87
Theorem of Total Probability

If a sequence of events B_n, n = 1, 2, 3, …, forms a finite or countably infinite partition of a sample space (in other words, the set of events is exhaustive as well as mutually exclusive), and provided P(B_n) > 0 for each n, then for any event A,

P(A) = Σ_n P(A|B_n) · P(B_n).

If P(B_n) = 0 for some n, we should take away those events and work with the rest.

88
Theorem of Total Probability

The summation can be interpreted as a weighted average, and consequently the marginal probability P(A) is sometimes called the average probability.

Special case: if 0 < P(B) < 1, so that P(A|B) and P(A|B^c) are both defined, then
P(A) = P(A|B) · P(B) + P(A|B^c) · P(B^c).

89
Example: Theorem of Total Probability

Suppose that two factories supply light bulbs to the market.


Factory X's bulbs work for over 5000 hours in 99% of cases,
whereas factory Y's bulbs work for over 5000 hours in 95% of
cases. It is known that factory X supplies 60% of the total
bulbs available. What is the chance that a purchased bulb will
work for longer than 5000 hours?

90
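A direct application of the special case of the theorem of total probability to the light-bulb question:

```python
# Theorem of total probability for the light-bulb example above.
p_x, p_y = 0.60, 0.40                      # market shares of factories X and Y
p_work_given_x, p_work_given_y = 0.99, 0.95
p_work = p_work_given_x * p_x + p_work_given_y * p_y
print(p_work)                              # 0.974
```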
Bayes' Theorem

91
Laplace Form of Bayes' Theorem

Let A_i, i = 1, 2, 3, …, be a sequence of exhaustive and mutually exclusive events such that P(A_i) > 0 for each i. Then for any event B,

P(A_i | B) = P(B | A_i) · P(A_i) / Σ_j P(B | A_j) · P(A_j),

provided, of course, P(B) > 0.

92
Rev. Thomas Bayes

Thomas Bayes (c. 1701 – 7 April 1761)

An English statistician, philosopher and Presbyterian minister

Bayes never published what would eventually become his


most famous accomplishment

His notes were edited and published after his death


by Richard Price.

93
Application of Bayes Theorem

The entire output of a factory is produced on three machines.


The three machines account for 20%, 30%, and 50% of the
output, respectively. The fraction of defective items
produced is this: for the first machine, 5%; for the second
machine, 3%; for the third machine, 1%. If an item is chosen
at random from the total output and is found to be defective,
what is the probability that it was produced by the third
machine?

94
Bayesian (or epistemological) interpretation
of the Theorem

Measure of a degree of belief: Bayes' theorem links the degree of


belief in a proposition before and after accounting for evidence.

Example: Suppose it is believed with 50% certainty that a coin is


twice as likely to land heads than tails. If the coin is flipped a number
of times and the outcomes observed, that degree of belief may rise,
fall or remain the same depending on the results.

For proposition A and evidence B,


P(A), the prior, is the initial degree of belief in A.
P(A|B), the posterior, is the degree of belief having accounted for B.
the quotient P(B|A)/P(B) represents the support B provides for A.
95
Miscellaneous Problems and Results
Pairwise Independent but not
Completely Independent -An Example
Suppose an investor may choose to invest in any of the three options available to him, namely Fixed Deposits (F), Mutual Funds (M) or Stocks (S), with
P(F only) = 0.16
P(M only) = 0.16
P(S only) = 0.16
P(F and M only) = 0.08
P(F and S only) = 0.08
P(M and S only) = 0.08
P(F, M and S) = 0.28
Then P(F) = P(M) = P(S) = 0.6 and P(F∩M) = P(F∩S) = P(M∩S) = 0.36 = 0.6 × 0.6, while P(F∩M∩S) = 0.28 ≠ 0.6³.
Here the events are pairwise independent but not completely independent.
Poincaré's Theorem Through Logical Reasoning
The probability P_1 of the realization of at least one among the events A_1, A_2, …, A_N is given by
P_1 = S_1 − S_2 + S_3 − S_4 + ⋯,
where
S_r = Σ_{1 ≤ i_1 < i_2 < ⋯ < i_r ≤ N} P(A_{i_1} ∩ A_{i_2} ∩ ⋯ ∩ A_{i_r}).

Earlier we noted that this can be proved easily by the method of induction. We shall now consider an elegant proof.
Poincaré's Theorem Through Logical Reasoning
Let us consider the so-called method of inclusion and exclusion. To compute P_1 we should add the probabilities of all sample points which are contained in at least one of the A_i, but each point should be taken only once.

To proceed systematically we first take the points which are contained in only one A_i, then those contained in exactly two events A_i, and so forth, and finally the points (if any) contained in all the A_i.

Now let E be any sample point contained in exactly n among our N events A_i.

Without loss of generality we may number the events so that E is contained in A_1, A_2, …, A_n, but not contained in A_{n+1}, A_{n+2}, …, A_N.
Poincaré's Theorem Through Logical Reasoning
Then P({E}) appears as a contribution to those p_i, p_{ij}, p_{ijk}, … whose subscripts range from 1 to n, where
p_i = P(A_i), p_{ij} = P(A_i ∩ A_j), p_{ijk} = P(A_i ∩ A_j ∩ A_k), ….

Hence P({E}) appears n times as a contribution to S_1, C(n, 2) times as a contribution to S_2, etc. In all, when the right-hand side is expressed in terms of the probabilities of sample points, we find P({E}) with the factor

n − C(n, 2) + C(n, 3) − ⋯ ± C(n, n).

It remains to show that the above series is equal to 1.

This follows at once on comparing it with the binomial expansion of (1 − 1)^n. The latter starts with 1, and then come the terms of the above series with reversed signs. Hence for every n ≥ 1 the expression equals 1.
Complex Application of Poincaré's Theorem

An NBFS has a presence in 25 cities in India, with one branch office in each of these 25 cities. A group of 25 managers was selected, one from each of those 25 city branches, for a Management Development Programme at XLRI. After the training these 25 managers were deputed at random to the 25 city branches so that each branch received exactly one manager. What is the probability that no manager was posted to the same branch where the person used to work before the training?
Complex Application of Poincaré's Theorem

Note that the probability of at least one match is:

C(25,1)·24!/25! − C(25,2)·23!/25! + C(25,3)·22!/25! − ⋯ − C(25,24)·1!/25! + C(25,25)·0!/25!
= 1/1! − 1/2! + 1/3! − ⋯ + 1/25!.

Therefore the required probability of no match, being the complement of "at least one match", is

1 − 1/1! + 1/2! − 1/3! + ⋯ − 1/25!.

When the number of persons is large, this probability actually tends to e^{−1} ≈ 0.3679.

To express e, remember to memorize a sentence to simplify this. (The number of letters in each word of that sentence gives the successive digits of e: 2.7182818284.)

An Extension of Classical Definition to
Geometrical Probability
Example-1: Courier person comes regularly once to the office to
pick up consignments at a random time between 12.30 pm and
1.30 pm and stays about 10 minutes. If you prepare an urgent
consignment at 12:55 pm, how likely are you to dispatch the
letter without any hassle on the same day from the office ?

Example-2: Both the bus and you get to the bus stop at random
times between 12noon and 1pm. When the bus arrives, it waits
for 5 minutes before leaving. When you arrive, you wait for 20
minutes before hiring a cab if the bus doesn't come. What is the
probability that you catch the bus?
An Extension of Classical Definition to
Geometrical Probability

To solve this type of problem we consider geometrical probability.

Suppose A is a smaller part of S. Then, under the equally-likely framework of the classical definition, the probability that a randomly chosen point of S falls inside the part A is
P = measure(A) / measure(S),
where the measure is length, area or volume as appropriate.

Some famous problems are Buffon's needle problem and the Bertrand paradox (see Wikipedia).
Solution to Example-2 using
Geometrical Probability
We have two continuous variables here: , the time in minutes
past 12 noon that the bus arrives, and , the time in minutes past
12 noon that you arrive. Since there are 2 independent variables,
we will convert this into a 2-dimensional geometry problem.
Specifically, we can think of the set of all outcomes as the points
in a square:
Solution to Example-2 using
Geometrical Probability
Then, we need to determine the region of "success"; that is, the
points where we catch the bus. Since the bus will wait for 5
minutes, you need to arrive within 5 minutes of the bus' arrival,
or + 5.
Solution to Example-2 using
Geometrical Probability

However, you only wait for 20 minutes, so you can't arrive more than 20 minutes before the bus; that is, x − y ≤ 20, or y ≥ x − 20.
Solution to Example-2 using
Geometrical Probability
Combining our two conditions, we have a region of success as
shown below
Solution to Example-2 using
Geometrical Probability
Now, we just need to find the area of this success region. A
simple method is to find the area of the non-success region, and
then subtract that from the total area:

Thus, the probability of catching the bus is:

P = (area of success region) / (total area) = (60² − 55²/2 − 40²/2) / 60² = 1287.5/3600 = 103/288 ≈ 0.358.
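A Monte Carlo check of the geometric answer (seed and number of trials are arbitrary choices):

```python
import random

# Monte Carlo check of the geometric-probability answer 103/288 ≈ 0.3576.
# x = bus arrival, y = your arrival, both uniform on [0, 60] minutes past noon.
random.seed(1)
trials = 500_000
caught = 0
for _ in range(trials):
    x, y = random.uniform(0, 60), random.uniform(0, 60)
    caught += (y <= x + 5) and (y >= x - 20)   # within 5 min after / 20 min before the bus
print(caught / trials)                          # close to 103/288 ≈ 0.3576
```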
An Example from the Book:
Bayesian Method in Finance
[Authors: S. T. Rachev; J. S. J. Hsu; B. S. Bagasheva and F. J. Fabozzi]

Investors often hunt for companies that have high or improving


free cash flow but low share prices. Low P/FCF ratios typically
mean the shares are undervalued and prices may soon increase.
Thus, the lower the ratio, the "cheaper" the stock is.

A manager in an event driven hedge fund (an offshore investment


fund, typically formed as a private limited partnership, that
engages in speculation using credit or borrowed capital ) is
testing a strategy that involves identifying potential acquisition
targets and examines the effectiveness of various company
screens, in particular the ratio of stock price to free cash flow per
share (PFCF).
Bayesian Method in Finance
Independently of the screen, the manager assesses the probability of company X being targeted at 40%. Suppose further that the manager's analysis suggests that the probability that a target company's PFCF has been more than three times lower than the sector average for the past three years is 75%, while the probability that a nontarget company has been having that low a PFCF for the past three years is 35%. If a bidder does appear on the scene, what is the probability that the targeted company had been detected by the manager's screen?
Bayesian Method in Finance

Let us consider the following two events:
E = Company X's PFCF has been more than three times lower than the sector average for the past three years
T = Company X becomes an acquisition target in the course of a given year.
To answer the question, the manager needs to update the prior probability P(T) and compute the posterior probability P(T|E).
Denoting by T^c the event that X does not become a target in the course of the year, we have
P(T) = 0.4 and P(T^c) = 0.6.
Also, P(E|T) = 0.75 and P(E|T^c) = 0.35.
Bayesian Method in Finance

Applying Bayes' theorem we obtain:

P(T|E) = P(T) · P(E|T) / [P(T) · P(E|T) + P(T^c) · P(E|T^c)]
= (0.75 × 0.4) / (0.75 × 0.4 + 0.35 × 0.6)
= 0.3 / (0.3 + 0.21) = 0.3/0.51 = 0.5882.

After taking into account the company's persistently low PFCF, the probability of a takeover increases from 40% to 58.8%.
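The same two-hypothesis Bayes update, written as a small reusable function (the function and argument names are illustrative):

```python
def posterior(prior, likelihood_true, likelihood_false):
    """Two-hypothesis Bayes rule: P(T|E) from P(T), P(E|T) and P(E|T^c)."""
    evidence = likelihood_true * prior + likelihood_false * (1 - prior)
    return likelihood_true * prior / evidence

# Figures from the hedge-fund screen above.
print(posterior(prior=0.40, likelihood_true=0.75, likelihood_false=0.35))  # ~0.5882
```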
Bayesian Method in Finance

In financial applications, continuous versions of Bayes' theorem are predominantly used. Nevertheless, the discrete form has some important uses, two of which are:

1. Model Selection
2. Bayes Classification
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Source: Russell and Norvig's AI book, section 14.4 (1st


edition), personal communication between Prof. Scott D.
Anderson of Department of Computer Science , Wellesley
College and David D. Lewis (http://daviddlewis.com).
Prof. Anderson rewrote the problem with help from Ethan Herdrick. The context of this problem is spam filters, the subject of an honors thesis conducted by Sara "Scout" Sinclair under Prof. Anderson's supervision.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

How to combine evidence using what's called naive Bayes:


the assumption of conditional independence - even though
we might know that the data aren't exactly conditionally
independent.

So, the probability we get won't be accurate, but it should at


least be a probability and should correlate with the
information we want, namely the probability that a message
is spam.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

We want to train a Bayesian classifier to classify email.
Let's start with an example:

                Ham messages   Spam messages   Total
With "Free"          100            300          400
With "Viagra"         10             90          100
All messages         400            600         1000
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
The basic application of Bayes' rule allows us to calculate the probability that a message is spam given that it contains any one token:

P(spam | free) = P(spam) · P(free | spam) / P(free)
= (600/1000 × 300/600) / (400/1000) = 300/400 = 0.75

P(spam | viagra) = P(spam) · P(viagra | spam) / P(viagra)
= (600/1000 × 90/600) / (100/1000) = 90/100 = 0.90
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Our prior probability of spam (given the training data) is 0.6, and if
we see a message containing the word free we bump that up to
0.75 and if we see Viagra we bump it up to 0.90.

The question is how to combine multiple pieces of evidence. That is,


if I see a message with both freeand
Viagra,what will be our
probability calculation?
Let us start with the following equation, which doesn't assume
conditional independence. This equation is a straightforward
application of Bayes' rule for two pieces of evidence:

. |
= (1)
( )
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

There are several problems with this equation. The first is the denominator: usually one is not going to record and train on all subsets of words (let's stipulate that), so the probability of "Viagra" co-occurring with "free" is unknown. The same problem arises in the numerator, where one would need to know the probability of that pair of terms co-occurring in a spam message.

One approach is to make the assumption of conditional


independence.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

Conditional independence means that once you know one piece of


information, other features become independent. One classic
example is that spelling ability and shoe size are not independent:
people with larger feet spell better than people with smaller feet.
The missing piece of information is age: older kids have larger
feet and better spelling. Once you know a child's age, their
spelling ability and shoe size are unrelated (independent). When
two features are conditionally independent, we can calculate their
co-occurrence as a simple multiplication. The general statement is
as follows:
P(A, B | C) = P(A | C) · P(B | C)     (2)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)

For the spam problem, our assumption is that the occurrence of


the words free and Viagra become independent once we
know whether the message is spam. (Again, this assumption is
probably wrong, but we make it anyhow, because we won't count
how many times the words co-occur.)

Now, we make our assumption of conditional independence.

Applying equation (2) to the numerator of equation (1), we get:


P(viagra, free | spam) = P(viagra | spam) · P(free | spam)     (3)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
In words, this means that for spam messages we expect "Viagra" and "free" to be independent, so the probability of their co-occurrence in a spam message is just the product of their conditional probabilities.

You may or may not agree with the assumption, but that's what
it means.

Thus equation (1) becomes

P(spam | viagra, free) = P(spam) · P(viagra | spam) · P(free | spam) / P(viagra, free)     (4)
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
We have in our database everything except P(viagra, free).

Now, since the message is certainly either ham or spam,
P(spam | viagra, free) + P(ham | viagra, free) = 1.

Therefore,

P(spam) · P(viagra | spam) · P(free | spam) / P(viagra, free)
+ P(ham) · P(viagra | ham) · P(free | ham) / P(viagra, free) = 1.
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
This gives:

P(viagra, free) = P(spam) · P(viagra | spam) · P(free | spam) + P(ham) · P(viagra | ham) · P(free | ham).

This replaces the calculation of the joint probability P(viagra, free).
Bayesian Method in Information Systems
(For Network Security and Spam Filtering)
This, then, is the desired denominator for our probability calculation. Note that the first term is the same as our numerator; the other term is the analogous calculation conditioned on ham rather than spam. The final formula, then, for two pieces of evidence is:

P(spam | viagra, free) =
P(spam) · P(viagra | spam) · P(free | spam)
/ { P(spam) · P(viagra | spam) · P(free | spam) + P(ham) · P(viagra | ham) · P(free | ham) }

Check: in the example, P(spam | viagra, free) ≈ 0.95.
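The 0.95 check can be reproduced directly from the training counts in the table; a sketch of the naive-Bayes combination:

```python
# Naive-Bayes combination of the two tokens, using the training counts above.
# Counts: 1000 messages, 600 spam / 400 ham; "free" in 300 spam, 100 ham;
# "viagra" in 90 spam, 10 ham.
p_spam, p_ham = 0.6, 0.4
p_free_spam, p_free_ham = 300 / 600, 100 / 400
p_viagra_spam, p_viagra_ham = 90 / 600, 10 / 400

num = p_spam * p_viagra_spam * p_free_spam
den = num + p_ham * p_viagra_ham * p_free_ham
print(num / den)   # ~0.947, the "0.95" check on the slide
```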


Application of Bayes Rule in Market Survey

In Orange County, 51% of the adults are males. (and assume that
the other 49% are females) One adult is randomly selected for a
survey involving credit card usage.

a. Find the prior probability that the selected person is a male.


b. It is later learned that the selected survey subject was smoking a
cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of
females smoke cigars (based on data from the Substance Abuse
and Mental Health Services Administration). Use this additional
information to find the probability that the selected subject is a
male.
Application of Bayes Rule in Market Survey

Let's use the following notation:
M = male; M^c = female (or not male);
C = cigar smoker; C^c = not a cigar smoker.

a. Before using the information given in part b, we know


only that 51% of the adults in Orange County are males, so
the probability of randomly selecting an adult and getting a
male is given by () = 0.51.
Application of Bayes Rule in Market Survey

b. Based on the additional given information, we have the


following:
P(M) = 0.51 because 51% of the adults are males
P(M^c) = 0.49 because 49% of the adults are females (not males)
P(C|M) = 0.095 because 9.5% of the males smoke cigars
(That is, the probability of getting someone who smokes cigars, given that the person is a male, is 0.095.)
P(C|M^c) = 0.017 because 1.7% of the females smoke cigars
(That is, the probability of getting someone who smokes cigars, given that the person is a female, is 0.017.)
Application of Bayes Rule in Market Survey

Let's now apply Bayes' theorem. We get the following result:

P(M | C) = P(M) · P(C|M) / [P(M) · P(C|M) + P(M^c) · P(C|M^c)]
= (0.51 × 0.095) / (0.51 × 0.095 + 0.49 × 0.017) = 0.85329.
Application of Bayes Rule in Market Survey

Before we knew that the survey subject smoked a cigar, there


is a 0.51 probability that the survey subject is male (because
51% of the adults in Orange County are males). However,
after learning that the subject smoked a cigar, we revised the
probability to 0.853. There is a 0.853 probability that the
cigar-smoking respondent is a male. This makes sense,
because the likelihood of a male increases dramatically with
the additional information that the subject smokes cigars
(because so many more males smoke cigars than females).
Application of Bayes Rule in
Engineering Management

An aircraft emergency locator transmitter (ELT) is a device


designed to transmit a signal in the case of a crash. The
Altigauge Manufacturing Company makes 80% of the ELTs, the
Bryant Company makes 15% of them, and the Chartair
Company makes the other 5%. The ELTs made by Altigauge
have a 4% rate of defects, the Bryant ELTs have a 6% rate of
defects, and the Chartair ELTs have a 9% rate of defects (which
helps to explain why Chartair has the lowest market share).
Application of Bayes Rule in
Engineering Management

a. If an ELT is randomly selected from the general population of


all ELTs, find the probability that it was made by the Altigauge
Manufacturing Company.

b. If a randomly selected ELT is then tested and is found to be


defective, find the probability that it was made by the Altigauge
Manufacturing Company.
Application of Bayes Rule in
Engineering Management
We use the following notation:
A = ELT manufactured by Altigauge;
B = ELT manufactured by Bryant;
C = ELT manufactured by Chartair;
D = ELT is defective;
D^c = ELT is not defective (i.e. it is good)

a) If an ELT is randomly selected from the general population


of all ELTs, the probability that it was made by Altigauge is
0.8 (because Altigauge manufactures 80% of them).
Application of Bayes Rule in
Engineering Management

b) If we now have the additional information that the ELT


was tested and was found to be defective, we want to revise
the probability from part (a) so that the new information can be
used. We want to find the value of (|), which is the
probability that the ELT was made by the Altigauge company
given that it is defective. Based on the given information, we
know these probabilities:
Application of Bayes Rule in
Engineering Management

P(A) = 0.80, as Altigauge makes 80% of the ELTs
P(B) = 0.15, as Bryant makes 15% of the ELTs
P(C) = 0.05, as Chartair makes 5% of the ELTs
P(D|A) = 0.04, as 4% of the Altigauge ELTs are defective
P(D|B) = 0.06, as 6% of the Bryant ELTs are defective
P(D|C) = 0.09, as 9% of the Chartair ELTs are defective
Application of Bayes Rule in
Engineering Management

Here Bayes' theorem is extended to include three events


corresponding to the selection of ELTs from the three
manufacturers (A, B, C):

P(A|D) = P(A) · P(D|A) / [P(A) · P(D|A) + P(B) · P(D|B) + P(C) · P(D|C)]
= (0.8 × 0.04) / (0.8 × 0.04 + 0.15 × 0.06 + 0.05 × 0.09)
= 0.032 / 0.0455 = 0.7033
Application of Bayes Rule in Traffic
Management and Crime Investigation
A certain town has two taxi companies: Blue Birds, whose cabs
are blue, and Night Owls, whose cabs are black. Blue Birds
has 125 taxis in its fleet, and Night Owls has 375. Late one
night, there is a hit-and-run accident involving a taxi. The
town's 500 taxis were all on the streets at the time of the
accident. A witness saw the accident and claims that a blue taxi
was involved. At the request of the police, the witness
undergoes a vision test under conditions similar to those on the
night in question. Presented repeatedly with a blue taxi and a
black taxi, in random order, he shows he can successfully
identify the color of the taxi 9 times out of 10. Which company
is more likely to have been involved in the accident?
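A sketch of the Bayes computation for the taxi problem, with the prior taken from the fleet sizes and the likelihood from the witness's tested accuracy:

```python
# Bayes' rule for the taxi problem: prior from fleet sizes, likelihood from
# the witness's 90% colour-identification accuracy.
p_blue, p_black = 125 / 500, 375 / 500
p_says_blue_given_blue = 0.9
p_says_blue_given_black = 0.1
p_blue_given_says_blue = (p_says_blue_given_blue * p_blue) / (
    p_says_blue_given_blue * p_blue + p_says_blue_given_black * p_black
)
print(p_blue_given_says_blue)   # 0.75, so a Blue Birds cab is the more likely culprit
```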
Two Problems of Theoretical Nature

Prove that for any three events A, B and C, the probability that exactly one of these events will occur can be expressed as:

P(A) + P(B) + P(C) − 2[P(A∩B) + P(A∩C) + P(B∩C)] + 3·P(A∩B∩C).

Prove that, for any two events and , one has


, 2 , .
Probability of Maximum and Minimum

(i) In a bidding process any bidder can bid in multiple of Rs.


1000/- between Rs. 1,00,000/- and Rs. 10,00,000/- both
inclusive. A bidder is equally likely to choose any of the
permissible amount. The highest bidder amongst the 25
participating in the bidding process will win. What is the
probability that one who bids Rs. 8,00,000/- will win?

(ii) Suppose n persons each choose a number at random from among the first N positive integers. All numbers are equally likely to be selected. What is the probability that the highest number chosen will be k?
Probability of Maximum and Minimum

iii. In a tender call, an organization is likely to receive


quotation price in multiple of Rs. 1000/- between Rs.
50,00,000/- and Rs. 60,00,000/- both inclusive from various
companies. A company is equally likely to offer a price in
the given range. A company that offers the lowest quote will
win the tender call. If 25 companies respond to the tender
call with a quote, what is the probability that one who
quoted Rs. 54,00,000/- will win?
iv. Suppose n persons each choose a number at random from among the first N positive integers. All numbers are equally likely to be selected. What is the probability that the lowest number chosen will be k?
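For parts (ii) and (iv), the distribution of the maximum and minimum of independent uniform choices has a closed form; a sketch under the stated uniform-and-independent assumptions (function names and the illustrative values of 3 persons and the first 10 integers are not from the slides):

```python
from fractions import Fraction

def p_max_equals(k, n_persons, N):
    """P(highest of n_persons independent uniform choices from {1,...,N} equals k)."""
    return Fraction(k ** n_persons - (k - 1) ** n_persons, N ** n_persons)

def p_min_equals(k, n_persons, N):
    """P(lowest of n_persons independent uniform choices from {1,...,N} equals k)."""
    return Fraction((N - k + 1) ** n_persons - (N - k) ** n_persons, N ** n_persons)

# Illustration with 3 persons choosing from the first 10 integers.
print(p_max_equals(7, 3, 10), p_min_equals(4, 3, 10))
print(sum(p_max_equals(k, 3, 10) for k in range(1, 11)))   # sanity check: 1
```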
More On The Classical Occupancy Problem

Recall the problem of a random distribution of r balls in n cells, assuming that each arrangement has probability n^(−r).

What is the probability p_m(r, n) of finding exactly m cells empty?

Let A_k be the event that cell number k is empty (k = 1, 2, …, n). In this event all r balls are placed in the remaining n − 1 cells, and this can be done in (n − 1)^r different ways. Similarly, there are (n − 2)^r arrangements leaving two preassigned cells empty, etc.
More On The Classical Occupancy Problem

Now, writing p_i for the probability that the i-th cell remains empty, p_{ij} for the probability that both the i-th and j-th cells remain empty, and so on, we have

p_i = (1 − 1/n)^r, p_{ij} = (1 − 2/n)^r, p_{ijk} = (1 − 3/n)^r, ….

Hence, for every ν,

S_ν = Σ_{1 ≤ i_1 < ⋯ < i_ν ≤ n} p_{i_1 ⋯ i_ν} = C(n, ν) · (1 − ν/n)^r.

By Poincaré's theorem, the probability that at least one cell is empty is given by S_1 − S_2 + S_3 − ⋯ ± S_n.
More On The Classical Occupancy Problem

The probability that all cells are occupied is

p_0(r, n) = 1 − S_1 + S_2 − ⋯ = Σ_{ν=0}^{n} (−1)^ν C(n, ν) (1 − ν/n)^r.

Consider now a distribution in which exactly m cells are empty.

These m cells can be chosen in C(n, m) ways.

The r balls are distributed among the remaining n − m cells so that each of these cells is occupied; the number of such distributions is (n − m)^r · p_0(r, n − m).
More On The Classical Occupancy Problem

Dividing by n^r we find, for the probability that exactly m cells remain empty,

p_m(r, n) = C(n, m) (1 − m/n)^r p_0(r, n − m)
          = C(n, m) Σ_{ν=0}^{n−m} (−1)^ν C(n − m, ν) (1 − (m + ν)/n)^r.
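This formula is easy to put to work; a minimal R sketch (the function name pm_empty is mine):

pm_empty <- function(m, r, n) {
  v <- 0:(n - m)
  choose(n, m) * sum((-1)^v * choose(n - m, v) * (1 - (m + v)/n)^r)
}
pm_empty(0, r = 10, n = 5)                     # probability that no cell is empty
sum(sapply(0:4, pm_empty, r = 10, n = 5))      # probabilities over m = 0,...,n-1 sum to 1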
Urn Models for Aftereffect.

An industrial plant accident might be pictured as the result of a


superhuman game of chance: Fate has in storage an urn
containing red and black balls; at regular time intervals a ball
is drawn at random, a red ball signifying an accident.

If the chance of an accident remains constant in time, the


composition of the urn is always the same. But it is
conceivable that each accident has an aftereffect in that it
either increases or decreases the chance of new accidents.
Urn Models for Aftereffect.

This corresponds to an urn whose composition changes


according to certain rules that depend on the outcome of the
successive drawings. It is easy to invent a variety of such rules
to cover various situations, but we shall be content with a
discussion of the popular Urn models.
Urn model: An urn contains b black and r red balls.
A ball is drawn at random.
It is replaced and, moreover, c balls of the color drawn and d
balls of the opposite color are added.
A new random drawing is made from the urn (now containing r
+ b + c + d balls), and this procedure is repeated.
Urn Models for Aftereffect.

A typical point of the sample space corresponding to n


drawings may be represented by a sequence of n letters B and
R.
The event "black at first drawing" (i.e., the aggregate of all
sequences starting with B) has probability

.
+
If the first ball is black, the (conditional) probability of a black
ball at the second drawing is
+
.
+++
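A minimal R simulation of this urn scheme (the function name and the particular values of b, r, c and d are mine, chosen only for illustration):

draw_urn <- function(n, b, r, c, d) {
  outcome <- character(n)
  for (i in seq_len(n)) {
    black <- runif(1) < b / (b + r)            # draw one ball at random
    outcome[i] <- if (black) "B" else "R"
    if (black) { b <- b + c; r <- r + d }      # add c of the drawn colour, d of the other
    else       { r <- r + c; b <- b + d }
  }
  outcome
}
draw_urn(10, b = 3, r = 2, c = 1, d = 0)       # c = 1, d = 0 gives the Polya scheme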
Random Variables and
Applications
Random Variables
We have seen earlier that uncertainty is omnipresent in the business world, which in turn induces variability.

To model variability probabilistically, we need the concept of a random variable.

A random variable is a numerical description of the outcome of a random experiment; it can take different values with given probabilities.

Suppose that we have a random experiment with sample space S. A function X from S into another set T is called a (T-valued) random variable.
Random Variables

Examples:
The return on an investment in a span (period) of one-year;
The closing price of a stock in NSE;
The number of customers entering a shopping complex
The sales volume of a store on a particular day
The turnover rate at your organization next year
Types of Random Variables

Discrete Random Variable:


One that takes on a countable number of possible values, e.g.
Total of face values (points) on a roll of two dice: 2, 3, ..., 12
Number of refrigerators sold: 0, 1, ...
Customer count: 0, 1, . . .
Continuous Random Variable:
one that takes on an uncountable number of possible values, e.g.
Interest rate: 3.25%, 6.125%, . . .
Task completion time: a nonnegative value
Price of a stock: a nonnegative value
Types of Random Variables

In general: random variables that take integer or rational values are discrete, while those that take values over a continuum of real numbers are continuous.
In some cases, numbers are not immediately associated with the outcomes of a random experiment.

For example,
You may win a bid or lose
After flipping, a coin may show a head or a tail
A customer can be male or female
We often assign numbers such as 0 and 1 to the possible outcomes in such cases.
Probability Distribution

Randomness of a random variable is described by a probability distribution.
Informally, the probability distribution specifies the probability or likelihood for a random variable to assume a particular value.
Formally, let X be a random variable and let x be a possible value of X. Then we have two cases.
Discrete: the probability mass function of X specifies P(x) ≡ P(X = x) for all possible values x of X.
Continuous: the probability density function of X is a function f(x) such that f(x)·h ≈ P(x < X ≤ x + h) for small positive h.
Probability Distribution

The probability mass function specifies the actual probability,


while the probability density function specifies the probability
rate; both can be viewed as a measure of likelihood.

A discrete probability distribution may have
A finite support (the sample space is finite)
For example: the number of successes in bidding, or the number of stocks in the list of 50 companies forming the NIFTY 50 Index whose closing prices were higher than their opening prices yesterday
An infinite support (the sample space is countably infinite)
For example: the number of trials required to get r successes
Discrete Probability Distribution

A probability mass function must satisfy the following two requirements:

i. 0 ≤ P(x) ≤ 1 for all x
ii. Σ_{x∈S} P(x) = 1, S being the set of all possible values of X.

Empirical data can be used to estimate the probability mass function.

Consider, for example, the number of TVs in a household.


Discrete Probability Distribution

No. of TVs No. of Households x P(x)


0 1,218 0 0.012
1 32,379 1 0.319
2 37,961 2 0.374
3 19,387 3 0.191
4 7,714 4 0.076
5 2,842 5 0.028
Total 101,501 1
For = 0, the probability 0.012 comes from 1,218/101,501.
Other probabilities are estimated similarly.
Properties Discrete (Probability) Distribution
Realized values of a discrete random variable can be viewed as
samples from a conceptual/theoretical population.
For example, suppose a household is randomly drawn, or
sampled, from the population governed by the probability mass
function specified in the previous table. What is the probability
for us to observe the event {X = 3}?
Answer: 0.191. That X turns out to be 3 in a random sample is
called a realization. Similarly, the realization X = 2 has
probability 0.374.
We can therefore compute the population mean, variance, and so
on. Results of such calculations are examples of population
parameters.
Details of estimation will be taken later.
Bernoulli Trials
Bernoulli Trials

A sequence of trials is said to be Bernoulli trials if they satisfy


the following three assumptions:

I. Each trial has two possible outcomes, in the language of


probability called success and failure.

II. The trials are independent. Intuitively, the outcome of one


trial has no influence over the outcome of another trial.

III. On each trial, the probability of success is p and the probability of failure is q = 1 − p, where p ∈ [0, 1] is the success parameter of the process.
Bernoulli Trials in In Real World

Randomly assign a patient a new drug or placebo according as


the outcome of a coin tossing is head or tail.

In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote "yes" in an upcoming referendum. In choosing multiple voters, one needs to ensure that the population is large enough compared to the sample, so that excluding already sampled voters does not appreciably alter the probability.

A customer can reinvest or liquidate a fixed deposit that will


mature in a day.
Jacob Bernoulli
[Not to be confused with Daniel Bernoulli Associated with the
famous Bernoulli Principle of Fluid Dynamics]
Lifespan: 6 January 1655 16 August 1705

One of the many prominent mathematicians in


the Bernoulli family.
Known for his numerous contributions
to calculus, and along with his brother Johann,
was one of the founders of the calculus of
variations.
His most important contribution was in the
field of probability, where he derived the first
version of the law of large numbers in his
work Ars Conjectandi.
Probability Models from Bernoulli Trials

The Binomial model can be looked upon as the number of successes in a sequence of n Bernoulli trials.

The Poisson model can be used as an approximation for the number of successes in a very long (in the limit, infinite) sequence of Bernoulli trials with a small success probability.

We shall also consider models for the number of Bernoulli trials required to achieve a specified number of successes.
Some Important
Discrete Distributions
Problem-1

Suppose, an organization had 10 senior managers and


15 junior managers. Out of those 25 managers, 5 left
the organization in the last quarter. Assuming that the
managers acted independently of each other and it is
equally likely for anyone to separate, what is the
probability that 2 of the 5 managers left, were senior
managers?
General Version of Problem-1

Suppose an organization had N managers, out of whom a proportion p are senior managers and the rest are junior managers. Out of those N managers, n left the organization in the last quarter. Assuming that the managers acted independently of each other and it is equally likely for anyone to separate, what is the probability that out of the n managers who left, x were senior managers?
The Hypergeometric Distribution

The p.m.f. of the distribution is given by

P(X = x) = C(Np, x) C(Nq, n − x) / C(N, n),   x = 0, 1, 2, ..., n;
0 ≤ p ≤ 1, q = 1 − p.

In practice, x ≤ min(n, Np) and x ≥ max(0, n − Nq).
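Problem-1 above is a direct application of this p.m.f.; a short R check (10 senior, 15 junior, 5 leavers, exactly 2 seniors among them):

choose(10, 2) * choose(15, 3) / choose(25, 5)   # direct computation, ≈ 0.385
dhyper(2, m = 10, n = 15, k = 5)                # same value from R's hypergeometric p.m.f.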
The Hypergeometric Distribution -
from Urn Problem

There are blue balls and white balls in an urn


which are otherwise identical. Further suppose,
( + ) balls are taken out of the urn at
random all at once. What is the probability that
out of the balls are taken out of the urn will be
blue?
Problem-2.A.

Suppose an organization has a large number of employees, of which 20% are rewarded with one additional increment based on their performance appraisal. There are 24 employees in the area of a city branch of the organization. Assuming that an employee's chance of getting the reward is independent of the others, what is the probability that exactly 7 of the 24 employees of the branch are rewarded?
General Version of Problem 2.A.

Suppose an organization has a large number of employees, of which a certain proportion p are rewarded with one additional increment based on performance appraisal. There are n employees in the area of a city branch of the organization. Assuming that an employee's chance of getting the reward is independent of the others, what is the probability that exactly x of the n employees of the branch are rewarded?
Problem-2.B.

Suppose an organization has a large number of


employees of which 65% are permanent employees and
rest are in fixed-term contractual appointment and the
proportion is more or less same across all its branches.
Suppose, there are 50 employees in a city branch of the
organization. What is the probability that exactly 20 of
the 50 employees of the branch are permanent
employees?
Problem-2.C.

Suppose an organization has a large number of


employees of which 55% are male and rest are
female. Further suppose that the gender ratios are
more or less same across all its branches. Suppose,
there are 100 employees in a city branch of the
organization. What is the probability that exactly
70 of the 100 employees of the branch are male?
The Binomial Distribution.

The p.m.f. of the distribution is given by

P(X = x) = C(n, x) p^x q^(n−x),   x = 0, 1, 2, ..., n;  0 ≤ p ≤ 1, q = 1 − p.

Special case: n = 1.

P(X = x) = p^x (1 − p)^(1−x),   x = 0, 1;  0 ≤ p ≤ 1.

This is known as the Bernoulli Distribution.
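Problem-2.A above is a direct application; a one-line R check (24 employees, reward probability 0.2, exactly 7 rewarded):

dbinom(7, size = 24, prob = 0.2)   # P(X = 7) for X ~ Binomial(24, 0.2), ≈ 0.10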


The Binomial Distribution -
from Urn Problem

There are blue balls and white balls in an urn


which are otherwise identical. Suppose one ball is
taken out of the urn at random, its colour is noted
and is subsequently returned to the urn. Let the
trial be repeated n times and each time one ball is
taken out of the urn at random, its colour is noted
and is subsequently returned to the urn before next
drawing. What is the probability that out of the
balls are taken out of the urn will be blue?
Binomial Probability Mass Function
(For varying sample size and fraction p fixed at 0.5)
Binomial Probability Mass Function
(For varying fraction p and fixed sample size =20)
Binomial Cumulative Distribution
Function
Binomial Model Vs. Hypergeometric Model

Hypergeometric Model is used for sampling without


replacement while Binomial Model is used for sampling
with replacement.

Does it matter if we sample a few buckets of water from a


vast ocean without returning them back before drawing the
next bucket?

When the population size is large, the Hypergeometric distribution tends to the Binomial distribution. This can be proved mathematically by letting the population size tend to infinity, but we can skip the proof in a Business Statistics course.
Binomial Model in Option Pricing

The binomial options pricing model (BOPM) is one of


the most commonly used option pricing models.

Though computationally more complex than the Black-Scholes option pricing model, it is widely used as it is able to handle a variety of conditions for which other models cannot easily be applied.

At each point, the model considers two scenarios, one


is called up (where the value of the underlying
increases) and the other one being down (where the
value of the underlying decreases).
Defining Rare Events

When we define a Binomial model as

P(X = x) = C(n, x) p^x q^(n−x),   x = 0, 1, 2, ..., n;  0 ≤ p ≤ 1, q = 1 − p,

we often say p is the probability of success and q is the probability of failure in a sequence of Bernoulli trials.

Imagine a situation when n is very large but p is small.

We can intuitively argue that the occurrence of a success in such a case will be a rare event.
Poisson Model in Real World

The number of bankruptcies that are filed in a month

The number of arrivals at a car wash in one hour

The number of network failures per day

The number of Airbus 330 aircraft engine shutdowns per


100,000 flight hours.

The number of hungry persons entering McDonald's restaurant.


Poisson Model in Real World

The number of work related accidents over a given production


time

The number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time

The number of customers who call to complain about a service


problem per month

The number of visitors to a Web site per minute

The number of calls to consumer hot line in a 5-minute period


Examples of such Rare Events in Real World

Number of Road Accidents/ Traffic fatalities

Number of misprints in a book

Number of employees absent from work on a particular day

Number of unresolved cases in call center in a day


Poisson Distribution

The p.m.f. of a Poisson distribution is given by:

P(X = x) = e^(−λ) λ^x / x!,   λ > 0,  x = 0, 1, 2, ....

This model can actually be derived from the p.m.f. of a Binomial distribution by taking limits over n and p.

We consider limits as n tends to infinity and p tends to zero such that the product np stays finite and is equal to, say, λ.
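The limiting behaviour is easy to see numerically; a small R sketch (the particular n and p are mine, chosen so that np = 4 throughout):

x <- 0:15
pois <- dpois(x, lambda = 4)
max(abs(dbinom(x, size = 40,   prob = 0.1)   - pois))   # rough agreement
max(abs(dbinom(x, size = 4000, prob = 0.001) - pois))   # practically identical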
Poisson Probability Mass Function

The horizontal axis is the index k, the number of occurrences. The


function is defined only at integer values of k. The connecting lines are
only guides for the eye.
Poisson Cumulative Distribution Function

The horizontal axis is the index k, the number of occurrences. The


CDF is discontinuous at the integers of k and flat everywhere else
because a variable that is Poisson distributed takes on only integer
values.
Problem on Poisson Distribution
(Problem-3)

In a given hour, a human resource manager receives job


applications over the internet. The number of job
applications she receives per hour varies from hour to
hour. Suppose the best distribution that models the hour-
to-hour fluctuations in the number of applicants received
is Poisson and the human resource manager receives
applications from the internet at an average (rate) of 6 per
hour. What is the probability that the human resource manager receives between 4 and 6 applications, both inclusive, in any given hour?
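A minimal R sketch of the Problem-3 calculation (λ = 6 applications per hour, as stated above):

sum(dpois(4:6, lambda = 6))    # P(4 <= X <= 6) ≈ 0.4551
ppois(6, 6) - ppois(3, 6)      # same value via the c.d.f.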
Problem-4

A Director wants to recruit a secretary (a fresher) to assist him in the delivery of human resource services, with specific responsibility for supporting department staff, providing information to applicants and employees, maintaining clerical and financial records, and completing assigned projects and tasks, and who will report to him on a regular basis. Considering his busy schedule, he decides to conduct telephonic interviews one by one until he finds someone deserving to be called for a personal interview in his office. He knows that the probability of getting a deserving candidate is p.

What is the probability that he will find the ideal candidate at the x-th trial?

What is the probability that y candidates would be rejected before he finds the right candidate?
The Geometric Distribution

In probability theory and statistics, the geometric distribution


is either of two discrete probability distributions:

The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...}

The probability distribution of the number Y = X − 1 of failures before the first success, supported on the set {0, 1, 2, 3, ...}

Which of these one calls "the" geometric distribution is a


matter of convention and convenience.
The Geometric Distribution

It is the probability that the first occurrence of success requires x independent trials, each with success probability p. If the probability of success on each trial is p, the probability that the x-th trial is the first success is:

P(X = x) = (1 − p)^(x−1) p,   for x = 1, 2, 3, ....

The above form of the geometric distribution is used for modelling the number of trials until the first success.
The Geometric Distribution

By contrast, the following form of the geometric distribution is used for modelling the number of failures preceding the first success:

P(Y = y) = (1 − p)^y p,   for y = 0, 1, 2, 3, ....

In either case, the sequence of probabilities is a geometric sequence.

Suppose a fair die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set {1, 2, 3, ...} and is a geometric distribution with p = 1/6.
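Note that R's dgeom uses the second convention (failures before the first success), so a small shift is needed for the first form; a sketch for the die example:

p <- 1/6
dgeom(0:2, prob = p)      # P(Y = 0), P(Y = 1), P(Y = 2)
dgeom(3 - 1, prob = p)    # P(X = 3): first "1" on the 3rd throw, via Y = X - 1
(1 - p)^2 * p             # same value directly from the formula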
Problem-5

A. Suppose a fair die is thrown repeatedly until a 6 appears for the 6th time. What is the probability that the sixth 6 is achieved on the 36th trial?

B. Suppose a fair die is thrown repeatedly until a 6 appears for the 5th time. What is the probability that the fifth 6 is achieved on the 25th trial?

In an HR context, this is like recruiting a number of candidates one by one, sequentially, till all the vacancies are filled.
Negative Binomial Distribution

Generalization of Problem 5 leads to the negative binomial distribution.

Suppose there is a sequence of Bernoulli trials with probability p of success per trial. We observe this sequence until a predefined number r of failures has occurred. Then the random number of successes we have seen, X, will have the negative binomial (or Pascal) distribution.

The probability mass function of the negative binomial distribution is

f(k; r, p) = P[X = k] = C(k + r − 1, k) p^k (1 − p)^r,   for k = 0, 1, 2, ....
Negative Binomial Distribution (Alternative
form)
Suppose there is a sequence of Bernoulli trials with probability p of success per trial. We observe this sequence until a predefined number r of successes has occurred. Then the random number of trials we have seen, X, will have the negative binomial distribution with p.m.f.:

f(x; r, p) = P[X = x] = C(x − 1, r − 1) p^r (1 − p)^(x−r),   for x = r, r + 1, r + 2, ....
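Problem-5.A can be checked against this form; a minimal R sketch (R's dnbinom counts failures, so the 36th trial giving the 6th six corresponds to 30 failures):

choose(36 - 1, 6 - 1) * (1/6)^6 * (5/6)^(36 - 6)   # trial-count form, ≈ 0.0293
dnbinom(30, size = 6, prob = 1/6)                  # same value: 30 failures before the 6th six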
Discrete Uniform Distribution

In probability theory and statistics, the discrete uniform


distribution is a symmetric probability distribution whereby a
finite number of values are equally likely to be observed; every one
of n values has equal probability 1/n.

Another way of saying "discrete uniform distribution" would be "a


known, finite number of outcomes equally likely to happen".

A simple example of the discrete uniform distribution is throwing a


fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the
die is thrown the probability of a given score is 1/6.
Application: Absenteeism in Call Center

Suppose the absenteeism on a particular day amongst Customer


Service Representatives (CSR) deployed for attending inbound calls
follows a binomial distribution. Further suppose that the probability
of a CSR to stay absent is 0.1. There are 25 CSR in the call center.

I. What is the probability that on a particular day, 3 CSR will remain


absent?
II. What is the probability that on a particular day, not more than 3
CSR will remain absent?
III. What is the probability that on a particular day 3 or more CSR
will remain absent?
IV. The company wants to know from the manager whether, in 95 percent of cases, absenteeism amongst CSR is less than 5 on a particular day or not. What will be the response?
Solution To Part - I and II.
Let X be the random variable denoting the number of CSR who remain absent on a particular day. In this context we have:
X ~ Binomial(25, 0.1)

I. The probability that on a particular day 3 CSR will remain absent:
P(X = 3) = C(25, 3) (0.1)^3 (0.9)^22 = 0.2264973.

II. The probability that on a particular day not more than 3 CSR will remain absent:
P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= C(25, 0) (0.1)^0 (0.9)^25 + C(25, 1) (0.1)^1 (0.9)^24 + C(25, 2) (0.1)^2 (0.9)^23 + C(25, 3) (0.1)^3 (0.9)^22
= 0.0717898 + 0.1994161 + 0.2658881 + 0.2264973 = 0.7635913.
Solution To Part III and IV.

iii. The probability that on a particular day 3 or more CSR will remain absent:
P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 − (0.0717898 + 0.1994161 + 0.2658881) = 0.462906.

iv. Here we first need to calculate P(X < 5) = P(X ≤ 4) and check whether it is at least 0.95.
Now, P(X ≤ 4) = P(X ≤ 3) + P(X = 4) = 0.7635913 + C(25, 4) (0.1)^4 (0.9)^21 = 0.7635913 + 0.138415 = 0.9020063 < 0.95.

So we cannot say that in 95 percent of cases absenteeism amongst CSR is less than 5. You may check that P(X ≤ 5) = 0.966600, so in 95-plus percent of cases (actually 96.66%) absenteeism amongst CSR is up to 5.
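All four parts can be verified in one go with R's binomial functions:

dbinom(3, 25, 0.1)       # Part I:   P(X = 3)  = 0.2264973
pbinom(3, 25, 0.1)       # Part II:  P(X <= 3) = 0.7635913
1 - pbinom(2, 25, 0.1)   # Part III: P(X >= 3) = 0.462906
pbinom(4, 25, 0.1)       # Part IV:  P(X <= 4) = 0.9020063 < 0.95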
Distribution Function , Survival Function
and Hazard Function
In the previous example, we observe that we are often interested in finding probabilities of the type P(X ≤ x), P(X ≥ x), P(X > x) or P(X < x) for a given x on the real line.

The most important tool to this end is the c.d.f. of the real-valued random variable X.

The cumulative distribution function (c.d.f.) [or, in short, just the distribution function] of a real-valued random variable X is the function given by F_X(x) = P(X ≤ x), which represents the probability that the random variable X takes on a value less than or equal to x.
Distribution Function , Survival Function
and Hazard Function
The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore P(a < X ≤ b) = F_X(b) − F_X(a).

There are four necessary and sufficient conditions for a function to be a distribution function, and vice versa. We shall consider them without proof.

1. Every cumulative distribution function F is monotone non-decreasing: for any x1 < x2 (x1, x2 real), F(x1) ≤ F(x2).
2. F(−∞) = 0, or in other words lim_{x→−∞} F(x) = 0.
3. F(+∞) = 1, or in other words lim_{x→+∞} F(x) = 1.
4. Every cumulative distribution function F is right-continuous, in the sense F(x + 0) = F(x), or in other words lim_{h→0+} F(x + h) = F(x).
Distribution Function , Survival Function
and Hazard Function
In the definition above, the "less than or equal to" sign, "≤", is a convention.

Much of the older Soviet literature uses "<", so that the fourth property becomes left continuity instead of right continuity.

This convention does not matter much in the case of absolutely continuous densities but is important for discrete distributions.

The CDF of a continuous random variable X can be expressed as the integral of its probability density function f_X as follows:

F_X(x) = ∫_{−∞}^{x} f_X(t) dt

Distribution Function , Survival Function
and Hazard Function
The survival function, also known as a reliability function or
complementary cumulative distribution function is a property of any
random variable that maps a set of events, usually associated with
mortality or failure of some system, onto time.
It captures the probability that the system will survive beyond a
specified time.

The term reliability function is common in engineering while the term


survival function is used in a broader range of applications, including
human mortality.

Let T be a random variable with CDF F(t). Its survival function or reliability function is: S(t) = P(T > t) = 1 − F(t).
Distribution Function , Survival Function
and Hazard Function
Failure rate is the frequency with which an engineered system or component fails, expressed in failures per unit of time. It is often denoted by λ and is widely used in reliability engineering and engineering management.

Calculating the failure rate over smaller and smaller intervals of time, in a limiting sense, gives the instantaneous failure rate, called the hazard function (or hazard rate), h(t).

By definition,

h(t) = f(t) / (1 − F(t)).
Inverse distribution function
(quantile function)
We shall see rigorous use of this when we shall study Testing of
Hypothesis in QT-II.

The quantile function specifies, for a given probability in the


probability distribution of a random variable, the value at which
the probability of the random variable being less than or equal to
this value is equal to the given probability.

It is also called the percent point function or inverse cumulative


distribution function.
Inverse distribution function
(quantile function)
If the CDF F is strictly increasing and continuous, then F^(−1)(p), for p ∈ [0, 1], is the unique real number x such that F(x) = p.

In such a case, this defines the inverse distribution function or quantile function.

Some distributions do not have a unique inverse (for example, when f_X(x) = 0 for all a < x < b, causing F to be constant there).

This problem can be solved by defining, for p ∈ [0, 1], the generalized inverse distribution function:

F^(−1)(p) = inf{ x ∈ R : F(x) ≥ p }.
Applications of Quantile Function
Recall Q-IV on call center absenteeism: the company wants to know from the manager whether, in 95 percent of cases, absenteeism amongst CSR is less than 5 on a particular day or not. What will be the response?

Any modern statistical software will give us the 95th percentile point of a Binomial distribution with parameters n = 25 and p = 0.1. It will be 5.

From that we can also respond to the question raised by the company management.

Further, certain location measures of a distribution are obtained by employing the quantile function, such as the median or the first and third quartiles.
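In R the percent point (quantile) function of the binomial is qbinom; a one-line check of the claim above:

qbinom(0.95, size = 25, prob = 0.1)   # 95th percentile of Binomial(25, 0.1); returns 5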
Median and Other Quartiles of
Absenteeism
For call center absenteeism, we are more often interested in the following questions:

V. What is the median number of absentees (that is, up to what number of CSRs will remain absent in about 50% of cases, or beyond what number will absenteeism lie in about 50% of cases)?

VI. What are the first and third quartiles of the distribution of call center absenteeism?

VII. What is the quartile deviation of the call center absenteeism?


Median and Other Quartiles of
Absenteeism
Note that Median = F^(−1)(0.50); the middlemost point of a distribution.

The first quartile is Q1 = F^(−1)(0.25); the point below which the probability is 25% and above which the probability is 75%.

The third quartile is Q3 = F^(−1)(0.75); the point below which the probability is 75% and above which the probability is 25%.

Quartiles are typical quantile measures; there are various quantile measures of location, such as percentiles, deciles, etc.

Given a set of raw (numerical) data arranged in order of magnitude, the median is usually taken as the middlemost observation when the number of data points is odd or, by convention, the average of the two middlemost observations when the number of observations is even.
Solution to Part V and VI
Recall that

P[X = 0] = 0.0717898 < 0.25 ≤ P[X = 0] + P[X = 1] = 0.0717898 + 0.1994161 = 0.2712059.

Clearly the first quartile is 1, as at this point the distribution function crosses the 0.25 mark.

P[X = 0] + P[X = 1] = 0.2712059 < 0.5 ≤ P[X = 0] + P[X = 1] + P[X = 2] = 0.2712059 + 0.2658881 = 0.537094.

Clearly the median is 2, as at this point the distribution function crosses the 0.50 mark.
Solution to Part V and VI
Further, F(2) = 0.537094 < 0.75 ≤ F(3) = 0.7635913.

Clearly the third quartile is 3, as at this point the distribution function crosses the 0.75 mark.

From these results we can answer Questions V and VI.

Note that we always observe jumps in the cumulative distribution function when we deal with discrete distributions.

Also note that we shall often use the 0.5th, 1st, 2.5th, 5th, 95th, 97.5th, 99th and 99.5th percentile points in Statistical Inference in QT-II.
Solution to Part VII
Quartile deviation is a measure of dispersion or variability in the probability distribution.

Writing Q3 = F^(−1)(0.75) and Q1 = F^(−1)(0.25), we define the quartile deviation (QD) as (Q3 − Q1)/2.

The difference Q3 − Q1 is often called the Inter-Quartile Range (IQR).

In the given problem of call center absenteeism,

QD = (3 − 1)/2 = 1.

These concepts can equally be used when we have a set of raw data and/or a frequency distribution.
Quartile Based Skewness Measure
Skewness measures the degree of asymmetry in the probability distribution or the data, as the case may be.

A quick and robust measure of skewness is Bowley's skewness (B). Writing Q2 for the second quartile (the median), we define

B = [(Q3 − Q2) − (Q2 − Q1)] / (Q3 − Q1) = (Q3 − 2Q2 + Q1) / (Q3 − Q1).

Question VIII: Find Bowley's skewness for the call center absenteeism and comment.

Clearly Bowley's skewness = (3 − 2×2 + 1)/(3 − 1) = 0.

But Binomial(25, 0.1) is not a symmetric distribution in general. Why so?
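The quartiles and Bowley's measure are easy to reproduce in R:

q <- qbinom(c(0.25, 0.50, 0.75), size = 25, prob = 0.1)   # Q1, median, Q3 = 1, 2, 3
(q[3] - q[1]) / 2                                         # quartile deviation = 1
(q[3] - 2 * q[2] + q[1]) / (q[3] - q[1])                  # Bowley's skewness = 0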
Note on Quartile Measures
These measures are not based on the entire probability distribution or the entire data, as the case may be.

As a result, if there are some outliers in the tails, these measures are highly robust and efficient in the sense that they are not influenced by the presence of the outliers.

However, since they are not based on the entire distribution or data, the results are sometimes a little surprising; for example, Bowley's skewness is 0 in our example. This is because skewness here is measured using just three locations of the distribution, and further tail information is not considered.

This reminds us that we must also study measures that are based on the entire probability distribution or the data, as the case may be.
How to find Average Absenteeism

Question IX: What is the average number of absentees per day?

The average of a random variable is usually computed using the notion called expectation.

For a discrete random variable X that takes values x1, x2, ..., respectively with probabilities p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_i x_i p_i.

For the binomial distribution it is:

E(X) = Σ_{x=1}^{n} x C(n, x) p^x q^(n−x) = np.  [See Board for Proof]

In the present problem the average is 25 × 0.1 = 2.5.


Rationale Behind Expectation

For a variable x that takes values x1, x2, ..., xk, respectively with frequencies f1, f2, ..., fk, the arithmetic mean of x is given by

x̄ = (1/N) Σ_{i=1}^{k} f_i x_i = Σ_{i=1}^{k} (f_i/N) x_i,   where N = Σ_{i=1}^{k} f_i.

Note that, according to the relative frequency approach to probability, f_i/N tends to the probability of x_i, which may be denoted by p_i.

That is, in the long run, x̄ tends to E(X), which is given by

E(X) = Σ_i x_i p_i.
Problems of Absenteeism (Contd.)

Suppose, there are 10 more call center agents (CSA) who handle the
responsibilities of sales promotion and marketing through outbound
calls. On a particular day their absenteeism follows a binomial
distribution with parameter 0.08.

X. Can we say in this case that in 95 percent situations, absenteeism


amongst CSA is less than 5?

XI. On an average how many CSA will remain absent on a particular


day?

XII. In this connection we may have an additional question: is there any significant difference in the average rate of absenteeism between CSRs and CSAs? For this we need raw data, and we discuss such issues in QT-II.
Solutions to Part X and XI

Let Y be the random variable denoting the number of CSA who remain absent on a particular day.

In this context, Y ~ Binomial(10, 0.08).

The probability that on a particular day less than 5 CSA will remain absent = P(Y < 5) = P(Y ≤ 4).

It is easy to see that P(Y ≤ 4) = 0.9994143.

So the answer to Question X is affirmative.

As regards Question XI, we further see that E(Y) = np = 10 × 0.08 = 0.8. That is, on an average there will be 0.8 absentees among the CSAs per day.
More Problems of Absenteeism (Contd.)

XIII. Suppose the manager of the call center adopts a strategy that if on a particular day more than 3 CSR remain absent, one CSA will be deputed as a CSR. What is the probability that on a particular day one CSA has to act as a CSR?

XIV. Further suppose that the manager adopts a strategy that he will assign one CSA to the CSR role per two absent CSRs (and may ignore the absence of any one CSR) on a particular day. What is the probability that on a particular day two CSAs have to act as CSRs?
Solutions to Part XIII
Under a certain assumption, the required probability for the first problem is the same as the probability that more than 3 CSR will remain absent. That is, P(X > 3) = 1 − P(X ≤ 3) = 0.2364086.

In this context we have assumed that at least one CSA will remain present.

Such an assumption is plausible because the probability that no CSA is present is P(Y = 10) = (0.08)^10 = 1.073742 × 10^(−11).

This probability is practically 0. That is why we can avoid the hassle of using joint/conditional probabilities.

Otherwise, in general, we would need to work with the joint distribution of the two variables. More on this will be discussed later.
Solutions to Part XIV

As per the policy, exactly 2 CSA have to be deputed if and only if 4 or 5 CSR remain absent on a particular day.

The probability that at least two CSA will be available, P(Y ≤ 8), is almost 1. (Note that the probability of the complementary event is 1.245541 × 10^(−9).)

So we can safely assume that 2 CSA will always be available to act as CSRs.

Therefore the required probability can be approximated by:

P(4 ≤ X ≤ 5) = P(X ≤ 5) − P(X ≤ 3) = 0.2030087.
Problems of Absenteeism (Contd.)

At some point of time the top management realizes that retaining the brand value and serving the existing customers better are more important than trying to win a few new customers.

They directed the manager that 25 CSR must be deployed 24x7, even at the cost of sales promotion if necessary.

XV. What is the probability that on a particular day there will be no one for sales promotion and marketing?
Solutions to Part XV

On a particular day there will be no one for sales promotion and marketing if and only if the sum of the numbers of absent CSR and CSA on that day is 10 or more.

That is, iff X + Y ≥ 10.

Note that if X ~ Binomial(n1, p) independently of Y, where Y ~ Binomial(n2, p), then X + Y ~ Binomial(n1 + n2, p). See Board for Proof.

In general, this is not true if the proportion of success p is not the same in the two cases. In such cases, as in the given problem, we have to evaluate the probabilities directly.
Solutions to Part XV

To evaluate: P(X + Y ≥ 10).

P(X + Y ≥ 10)
= P(X = 0, Y = 10) + P(X = 1, Y ≥ 9) + P(X = 2, Y ≥ 8) + ... + P(X ≥ 10, Y ≥ 0)
= P(X = 0) P(Y = 10) + P(X = 1) P(Y ≥ 9) + P(X = 2) P(Y ≥ 8) + ... + P(X ≥ 10)

(We assume that X and Y are independent random variables, so that we can apply the multiplication rule of probability.)

We can easily evaluate this using a calculator. You can check that the probability is 0.001103982.
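The same number can be obtained in R by convolving the two binomial distributions under the independence assumption:

px <- dbinom(0:25, size = 25, prob = 0.10)   # distribution of absent CSRs
py <- dbinom(0:10, size = 10, prob = 0.08)   # distribution of absent CSAs
joint <- outer(px, py)                       # joint probabilities under independence
total <- outer(0:25, 0:10, "+")              # corresponding values of X + Y
sum(joint[total >= 10])                      # P(X + Y >= 10) = 0.001103982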
More Problems with Call Center
Management
Further suppose that at any particular moment number of incoming
calls for the CSR follows a Poisson distribution with rate (average) 8.
Customers do not have to wait if at least one of the 25 (assuming no
absence) CSR is free as the call will automatically go to a free CSR.

What is the probability that at any point of time just one customer
has to wait?

What is the probability that at any point of time at least three


customers have to wait?

What is the probability that at any point of time more than half of the
CSR will remain free?
Average (Expectation) in Context of
Poisson Distribution

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_{i=0}^{∞} x_i p_i,

provided the sum is finite. In fact, the sum exists iff Σ_i |x_i| p_i < ∞.

For a Poisson random variable X, the condition holds and it can be shown that

E(X) = λ.

In the present problem the average is 8; therefore we can assign the Poisson parameter λ = 8.
Solution to Call Center Problems using
Poisson Model
Let U be the random variable denoting the number of customers who have to wait at a certain time point.

Note that U > 0 iff the number of customers wishing to avail CSR services (W) at a particular time is more than 25. In such cases, W = U + 25.

Here W follows Poisson(8).

Required probability: P(U = 1) = P(W = 26).

Using R:
> dpois(26,8)
[1] 2.513997e-07
Solution to Call Center Problems using
Poisson Model (contd.)
At any point of time at least three customers have to wait iff U ≥ 3.

Required probability: P(U ≥ 3) = P(W ≥ 28) = 1 − P(W ≤ 27).

Using R:
> 1-ppois(27,8)
[1] 2.925614e-08

Further, at any point of time more than half of the CSRs will remain free if the number of customers availing CSR services (W) at that time is not more than 12.

That is, the required probability is P(W ≤ 12).

Using R:
> ppois(12,8)
[1] 0.9362028
Dispersion of a Random Variable

The variance of a random variable X is given by

V(X) = σ² = E[(X − E(X))²] = E(X²) − [E(X)]².

Condition for existence of the variance: E(X²) < ∞.

The standard deviation of a random variable is the positive square root of the variance.

If X ~ Binomial(n, p), Var(X) = np(1 − p).
If X ~ Poisson(λ), Var(X) = λ.
Examples

The standard deviation (SD) of the random variable X, denoting the number of CSR who remain absent on a particular day, where X ~ Binomial(25, 0.1), is:
√(25 × 0.1 × 0.9) = 1.5.

The standard deviation (SD) of the random variable W, denoting the number of customers trying to avail CSR service at a certain time, where W ~ Poisson(8), is:
√8 = 2.828.
More Problems with Call Center
Management Continuation of Session-8
Suppose at any particular moment number of incoming calls for the
CSR follows a Poisson distribution with rate (average) 8. Customers
do not have to wait if at least one of the 25 (assuming no absence)
CSR is free as the call will automatically go to a free CSR.

XVI. What is the probability that at any point of time just one customer
has to wait?

XVII.What is the probability that at any point of time at least three


customers have to wait?

XVIII.What is the probability that at any point of time more than half of
the CSR will remain free?
Average (Expectation) in Context of
Poisson Distribution

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of X, denoted by E(X), is given by

E(X) = Σ_{i=0}^{∞} x_i p_i,

provided the sum is finite. In fact, the sum exists iff Σ_i |x_i| p_i < ∞.

For a Poisson random variable X, the condition holds and it can be shown that

E(X) = λ.

In the present problem the average is 8; therefore we can assign the Poisson parameter λ = 8.
General Rules for Expectation

For a discrete random variable X that takes values x0, x1, x2, ..., respectively with probabilities p0, p1, p2, ..., the expectation of a regular function of X, say g(X), is given by

E[g(X)] = Σ_{i=0}^{∞} g(x_i) P[X = x_i] = Σ_{i=0}^{∞} g(x_i) p_i,

provided Σ_i |g(x_i)| p_i < ∞.

If the distribution of X is absolutely continuous with density f,

E[g(X)] = ∫ g(x) f(x) dx,

where the integral is over the support of X, provided of course E|g(X)| < ∞.
Solution to Part - XVI of Call Center
Problems using Poisson Model
Let U be the random variable denoting the number of customers who have to wait at a certain time point.

Note that U > 0 iff the number of customers wishing to avail CSR services (W) at a particular time is more than 25. In such cases, W = U + 25.

Here W follows Poisson(8).

Required probability:
P(U = 1) = P(W = 26) = 2.513997 × 10^(−7).
Solution to Parts XVII and XVIII
of Call Center Problems
At any point of time at least three customers have to wait iff U ≥ 3.

Required probability:
P(U ≥ 3) = P(W ≥ 28) = 1 − P(W ≤ 27) = 2.925614 × 10^(−8).

Further, at any point of time more than half of the CSRs will remain free if the number of customers availing CSR services (W) at that time is not more than 12.

That is, the required probability is P(W ≤ 12) = 0.9362028.

Later we shall see how to approximate such probabilities.
Dispersion of a Random Variable

The variance of a random variable X is given by

V(X) = σ² = E[(X − E(X))²] = E(X²) − [E(X)]².

Condition for existence of the variance: E(X²) < ∞.

The standard deviation of a random variable is the positive square root of the variance.

If X ~ Binomial(n, p), Var(X) = np(1 − p).
If X ~ Poisson(λ), Var(X) = λ.

See Board for proofs.
Examples- Problems XIX and XX

XIX. What are the standard deviation and variance of absenteeism among CSRs and CSAs?

XX. What are the standard deviation and variance of the variable denoting the number of customers who try to reach a CSR at a given time?

We can directly apply the formulae for the SD and variance of the Binomial and Poisson distributions derived above.
Solution to Problem XIX and XX

XIX. The standard deviation (SD) of the random variable X, denoting the number of CSR who remain absent on a particular day, where X ~ Binomial(25, 0.1), is √(25 × 0.1 × 0.9) = 1.5.
In this context the variance is 2.25.

Similarly, the standard deviation (SD) of the random variable Y, denoting the number of CSA who remain absent on a particular day, where Y ~ Binomial(10, 0.08), is √(10 × 0.08 × 0.92) = 0.8579.
In this context the variance is 0.736.

XX. The standard deviation (SD) of the random variable W, denoting the number of customers trying to avail CSR service at a certain time, where W ~ Poisson(8), is √8 = 2.828.
More Problems: XXI to XXIII

The company is often interested in the following problems, which help them relook at various strategies:

XXI. What is the median number of customers who try to reach a CSR at a given time?

XXII. What is the most likely value of the number of absentees among CSRs and CSAs?

XXIII. What is the most likely value of the number of customers who try to reach a CSR at a given time?
Solution to Problem XXI and XXII

Here direct enumeration is of course a possibility, as we did in the case of Problems V and VI.

But it is better if we know a formula for finding the median or mode of a distribution.

By now we have realized the utility of the mean, median, mode, standard deviation, variance, skewness, etc.

We have learnt how to find expressions for the mean and variance, so it will be good if we also learn approaches for computing the median or mode of some discrete distributions.
Median of Binomial Model

Suppose X ~ Binomial(n, p). It can be shown that the c.d.f. of X can be expressed as:

F(k) = P(X ≤ k) = Σ_{x=0}^{[k]} C(n, x) p^x (1 − p)^(n−x) = (n − k) C(n, k) ∫_0^(1−p) t^(n−k−1) (1 − t)^k dt.

This may be used to find the median, but that is not very straightforward.

It is known that if np is an integer, then the mean, median, and mode coincide and equal np.

In general, any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.
Mode of Binomial Model

Suppose X ~ Binomial(n, p).

Usually the mode of X is equal to ⌊(n + 1)p⌋, where ⌊.⌋ is the floor function (the largest integer less than or equal to the argument).

However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. --- See Board for Proof.

Degenerate cases: when p is equal to 0 or 1, the mode will be 0 or n correspondingly.
Median and Mode of Poisson Model

We know how the Poisson model arises as a limiting case of the Binomial; using the same limiting argument, we can say:

Usually the mode of Poisson(λ) is equal to ⌊λ⌋, where ⌊.⌋ is the floor function (the largest integer less than or equal to the argument).

However, when λ is a positive integer the distribution has two modes: λ and λ − 1. --- Try yourself.

Degenerate case: when λ is equal to 0 the mode will be 0.


More Problems on Call Center Management

XXIV. Suppose that after implementation of some stringent rules for casual leave, the probability of absenteeism comes down to 0.05 for both CSA and CSR. We assume that the two groups behave independently. Under the same policy as in Problem No. XV, how will you compute the probability that on a particular day no one will be left for sales promotion and marketing?

XXV. On a given day it was found that the total number of absentees in the two groups taken together is 5. What is the probability that 4 of them are CSRs?

Imagine what happens under conditioning when two independent binomial random variables are on the plate. (See Board for Proof.)
Expectation and Variance of
Geometric Model
X: the number of Bernoulli trials required to get the first success.

E(X) = Σ_{x=1}^{∞} x p(1 − p)^(x−1) = p Σ_{x=1}^{∞} x (1 − p)^(x−1) = 1/p.

Check that V(X) = (1 − p)/p².

Y: the number of failures preceding the first success in a sequence of Bernoulli trials; Y = X − 1.

E(Y) = E(X) − 1 = 1/p − 1 = (1 − p)/p.

V(Y) = V(X) = (1 − p)/p².
Some General rules for
Expectation and Variance
The expectation of any constant c is the same constant: E(c) = c.

The variance of any constant c is always 0: V(c) = 0.

For any two random variables X and Y, the sum law of expectation states:
E(X + Y) = E(X) + E(Y), whether or not X and Y are independent.

Always E[X − E(X)] = 0. Similarly E[Y − E(Y)] = 0.

For any two random variables X and Y:
V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y),
where Cov(X, Y) = E[(X − E(X))(Y − E(Y))].

If X and Y are independent, Cov(X, Y) = 0 [the converse is not true] and
V(X + Y) = V(X) + V(Y).
A simple use of sum law of expectation

Suppose a manager is conducting online interviews to fill up r vacancies one by one till all the positions are filled. He knows that the probability that a candidate will be selected is p. On an average, how many candidates does he have to interview, and what will be the variance of that number?

One way is to compute the mean and variance of the negative binomial model directly. But we can use the sum rule to obtain them more elegantly.
A simple use of sum law of expectation

Let X_i denote the number of candidates that need to be interviewed to fill the i-th vacancy (after the (i − 1)-th has been filled).

Therefore, if X denotes the number of candidates that need to be interviewed to fill the r vacancies, we have X = Σ_{i=1}^{r} X_i.

As a consequence:

E(X) = E(Σ_{i=1}^{r} X_i) = Σ_{i=1}^{r} E(X_i) = r/p.

It can also be seen that the X_i's are independent, and therefore

Var(X) = Var(Σ_{i=1}^{r} X_i) = Σ_{i=1}^{r} Var(X_i) = r(1 − p)/p².
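A quick Monte Carlo check of these two formulas in R (the values of r and p are mine, for illustration; rnbinom counts failures, so r is added back to get the number of interviews):

set.seed(1)
r <- 3; p <- 0.4
interviews <- rnbinom(100000, size = r, prob = p) + r
c(mean(interviews), r / p)                 # both close to 7.5
c(var(interviews),  r * (1 - p) / p^2)     # both close to 11.25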
Application of Laws of
Expectation and Variance
Example: Sales versus Profit:

The monthly sales, X, of a company have a mean of Rs. 25 million and a standard deviation of Rs. 4 million. Profits, Y, are calculated by multiplying sales by 0.3 and subtracting fixed costs of Rs. 6 million. What are the mean profit and the standard deviation of profit?
Solution

Throughout the computation, we consider the unit of currency as Rs.


in Million. We know that:

= 25 () = 42 = 16
Further; = 0.3 6

Therefore,
= 0.3 6 = 0.3 25 6 = 1.5
and
= 0.32 = 0.09 16 = 1.44
Consequently, = = 1.2

Applications of Laws of Expectation


in Decision Making under Uncertainty. . .
Many of the concepts we have introduced can be used effectively in
analyzing decision problems that involve uncertainty.
The basic features of such problems are:
We need to make a choice from a set of possible alternatives. Each
alternative may involve a sequence of actions.
The consequences of our actions, usually given in the form of a
payoff table, may depend on possible states of nature, which are
governed by a probability distribution (possibly subjective).
The true state of nature is not known at the time of decision.
Our objective is to maximize the expected payoff and/or to
minimize risk.
We could acquire additional information regarding the true state of
nature at a cost.
Example: Investment Decision

We shall consider only one example here. For more


complicated problems, a decision tree can be used. Those will
be discussed later.

An individual has Rupees 1 million and wishes to make a


one-year investment.

Suppose his/her possible actions are:

a1: buy a guaranteed income certificate paying 10%
a2: buy a bond with a coupon value of 8%
a3: buy a well-diversified portfolio of stocks
Example: Investment Decision

A coupon payment on a bond is a periodic interest payment


that the bondholder receives during the time between when
the bond is issued and when it matures.

Coupons are normally described in terms of the coupon rate,


which is calculated by adding the total amount of coupons
paid per year and dividing by the bond's face value. For
example, if a bond has a face value of Rs. 1,000 and a coupon
rate of 5%, then it pays total coupons of Rs. 50 per year.
Example: Investment Decision

Return on investment in the diversified portfolio depends on


the behavior of the interest rate next year. Suppose there are
three possible states of nature:

s1: interest rate increases
s2: interest rate stays the same
s3: interest rate decreases

Suppose further that the subjective probabilities for these states are 0.2, 0.5, and 0.3, respectively.
Example: Investment Decision

Based on historical data, the payoff table is:

State of Nature    a1         a2         a3
s1                 100,000    −50,000    150,000
s2                 100,000     80,000     90,000
s3                 100,000    180,000     40,000

Which action should he/she take?


Solution

The expected payoffs for the actions are:

a1: 0.2 × 100,000 + 0.5 × 100,000 + 0.3 × 100,000 = 100,000
a2: 0.2 × (−50,000) + 0.5 × 80,000 + 0.3 × 180,000 = 84,000
a3: 0.2 × 150,000 + 0.5 × 90,000 + 0.3 × 40,000 = 87,000

Hence, if one wishes to maximize expected payoff, then action a1 should be taken.
Solution based on an equivalent concept
To minimize expected opportunity loss (EOL):

Consider any given state. For each possible action, the opportunity loss is defined as the difference between what the payoff could have been had the best action been taken and the payoff for that particular action. Thus,

State of Nature    a1        a2         a3
s1                 50,000    200,000         0
s2                      0     20,000    10,000
s3                 80,000          0   140,000
EOL                34,000     50,000    47,000

Indeed, a1 is again optimal.
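A compact R sketch of both criteria for this payoff table:

payoff <- matrix(c(100000, -50000, 150000,
                   100000,  80000,  90000,
                   100000, 180000,  40000),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("s1", "s2", "s3"), c("a1", "a2", "a3")))
prob <- c(0.2, 0.5, 0.3)
colSums(prob * payoff)                   # expected payoffs: 100000, 84000, 87000
loss <- apply(payoff, 1, max) - payoff   # opportunity loss for each state and action
colSums(prob * loss)                     # EOL: 34000, 50000, 47000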


Application in Fair Betting

For actual investments, we expect to get a net positive return.

In order to establish a baseline for any wager, investment, or


even an insurance premium (which is a form of wager), we will
study the concept of a fair bet in this application.

Wager: An agreement in which people try to guess what will


happen and the person who guesses wrong has to give something
(such as money)
Application in Fair Betting
What does it mean to say "I'll give you 4 to 1 odds that the GST bill passes the floor test in the upper house"?

The most common interpretation is that you are willing to risk Rs. 4 against someone else's Rs. 1 on the outcome. Specifically, if the GST bill passes the floor test in the upper house you get Rs. 1, and if not you pay Rs. 4.

When you give odds on something happening, we will call this the odds for, O(f).

How does this relate to your belief about the underlying probability of the event?

If we let p be the probability of the event happening, then the situation can be diagrammed as in the next slide:
Application in Fair Betting

Event             Probability    Gain (Loss)
Happens           p              +1
Doesn't happen    (1 − p)        −O(f)

Therefore, for the situation to be "fair" one must have:
p × 1 + (1 − p) × (−O(f)) = 0.

This implies that the relationships between p and O(f) are:
p = O(f) / (O(f) + 1);   O(f) = p / (1 − p).
Application in Fair Betting

Probability as a Function of Odds For: p = O(f)/(O(f) + 1) plotted against the odds for (O(f) to 1); the probability rises from 0.5 towards 1 as the odds increase from 0 to 20.
Note on odd against

If you think that the probability of something happening is less than 0.5, then you would have to offer odds like 50 paise to Rs. 1. This is not usual, since odds are usually quoted in whole-rupee amounts.

A bet of 50 paise to Rs. 1 would be converted to a bet of 1:2. From the other person's point of view, he/she is now betting against the event happening.

In other words, if it happens he/she will now lose Rs. 2, and if it doesn't happen they will gain a rupee. In other words, they are giving you odds against the event happening.
Note on odd against

Let O(a) equal the amount the person will lose if the event happens; then the table becomes

Event             Probability    Gain (Loss)
Happens           p              −O(a)
Doesn't happen    (1 − p)        +1

Probability as a Function of Odds Against: p plotted against the odds against (O(a) to 1); the probability falls from 0.5 towards 0 as the odds against increase from 0 to 20.
Application in determination of
Insurance Premium

Assume that a person is 35 years old and has a probability 0.001 of dying in the next year. Suppose the person wants to purchase a Rs. 100,000 insurance policy. Also known:
Overhead cost per Rs. 100 of sales = Rs. 75
Desired profit = 10% of revenues

What should be the premium (Pr)?

Outcome    Probability    Insurance Company Gain
Live       0.999          Pr
Die        0.001          Pr − Rs. 100,000
Application in determination of
Insurance Premium
Under fair betting,

E(Gain) = 0.999 × Pr + 0.001 × (Pr − 100,000) = 0;

solving, we get Pr = Rs. 100.

Overhead cost per Rs. 100 of sales = Rs. 75, and desired profit = 10% of revenues. Therefore,

Target profit (in Rs.) = (100 + 75) × 0.10 = 17.50

Pr = Rs. 100 + Rs. 75 + Rs. 17.50 = Rs. 192.50.


Insurance Company's Perspective

Outcome    Subjective Probability    Company Gain       Probability × Gain
Live       0.999                     Rs. 192.50          Rs. 192.31
Die        0.001                     −Rs. 99,807.50      −Rs. 99.81

Expected Gain   = Rs. 92.50
Overhead        = Rs. 75.00
Expected Profit = Rs. 17.50
Persons Perspective

Outcome    Probability           Person's Gain    Probability × Gain
Live       (1 − p) = 0.998075    −192.50          −192.129
Die        p = 0.001925           99,807.50        192.129

The fair value of p solves (1 − p)(−192.5) + p(99,807.5) = 0, giving p = 0.001925.
Joint Probability Distribution

Consider an example where, in a small township, houses are sold by two agents, say THC and GPL. Let X and Y be the respective numbers of houses sold by them in a month. Based on past sales, we estimated the following joint probabilities for X and Y.

Y \ X    0       1       2       3
0        0.10    0.30    0.05    0.04
1        0.20    0.05    0.05    0.02
2        0.06    0.03    0.02    0.01
3        0.04    0.02    0.01    0.00
Joint Probability Distribution

Broadly, we have looked at univariate distributions, i.e.,


probability distributions in one variable or multiple independent
variables.

Bivariate distributions, also called joint distributions, are


probabilities of combinations of two variables.

For discrete variables X and Y, the joint probability distribution or joint probability mass function of X and Y is defined as

P(x, y) = P(X = x and Y = y)

for all pairs of values x and y.
Joint Probability Distribution

As in the univariate case, we require:

0 ≤ P(x, y) ≤ 1 for all pairs of values x and y;
Σ_x Σ_y P(x, y) = 1.

Thus, in our example, P(0, 1) = 0.20, meaning that the joint probability for X and Y to equal 0 and 1, respectively, is 0.20.

Other entries in the table are interpreted similarly.

Note that the sum of all entries must equal 1.
Marginal Probabilities

The marginal probabilities are calculated by summing across rows and down columns. In our example:

Y \ X            0       1       2       3       Marginal P(y)
0                0.10    0.30    0.05    0.04    0.49
1                0.20    0.05    0.05    0.02    0.32
2                0.06    0.03    0.02    0.01    0.12
3                0.04    0.02    0.01    0.00    0.07
Marginal P(x)    0.40    0.40    0.13    0.07    1.00
Marginal Probabilities

This gives us the probability mass functions for X and Y individually. For example, the marginal probability for THC to sell 1 house is 0.4.

x       Marginal P(x)        y       Marginal P(y)
0       0.40                 0       0.49
1       0.40                 1       0.32
2       0.13                 2       0.12
3       0.07                 3       0.07
Total   1.00                 Total   1.00
Independence of Random Variables

Two variables X and Y are said to be independent if

P(X = x and Y = y) = P(X = x) P(Y = y)

for all x and y.

That is, the joint probabilities equal the product of the marginal probabilities. This is similar to the definition of independent events.

In the houses-sold example, we have

P(X = 0 and Y = 2) = 0.06, while P(X = 0) = 0.4 and P(Y = 2) = 0.12, so that P(X = 0) P(Y = 2) = 0.048 ≠ 0.06.

Hence, X and Y are not independent.
Properties of Bivariate Distributions. . .

Expected values, variances, standard deviations, etc. can be computed from the joint distribution.

Please check yourself that:

E(X) = μX = 0.87;  E(Y) = μY = 0.77
V(X) = σX² = 0.7931;  V(Y) = σY² = 0.8371

These marginal parameters are computed via the earlier formulas.


Properties of Bivariate Distributions. . .
Covariance and Correlation
Covariance: the covariance between two discrete variables is defined as:

Cov(X, Y) = Σ_x Σ_y (x − μX)(y − μY) P(x, y).

This is equivalent to:

Cov(X, Y) = Σ_x Σ_y x y P(x, y) − μX μY = E(XY) − μX μY.

Example: Houses Sold
Cov(X, Y) = 0.53 − 0.87 × 0.77 = −0.1399.
Properties of Bivariate Distributions. . .
Covariance and Correlation
Coefficient of correlation: the LINEAR association between two variables is given by Pearson's coefficient of correlation, or product-moment correlation, defined as:

ρ(X, Y) = Corr(X, Y) = Cov(X, Y) / (σX σY).

Note that ρ(X, Y) only measures the linear association between the two variables X and Y.

If X and Y are linearly uncorrelated, ρ(X, Y) = 0; but ρ(X, Y) = 0 does not imply that X and Y are independent.
Properties of Bivariate Distributions. . .
Covariance and Correlation
Example: Houses Sold

ρ(X, Y) = −0.1399 / (√0.7931 × √0.8371) = −0.1716979.

This indicates that there is a mildly negative relationship between the numbers of houses sold by THC and GPL.

Is this surprising?

For absolutely continuous random variables, the sums are replaced by integrals and the p.m.f. by the corresponding p.d.f.
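All of these quantities can be reproduced from the joint table in a few lines of R:

P <- matrix(c(0.10, 0.30, 0.05, 0.04,
              0.20, 0.05, 0.05, 0.02,
              0.06, 0.03, 0.02, 0.01,
              0.04, 0.02, 0.01, 0.00),
            nrow = 4, byrow = TRUE)          # rows: y = 0..3, columns: x = 0..3
x <- 0:3; y <- 0:3
px <- colSums(P); py <- rowSums(P)           # marginal distributions
mx <- sum(x * px); my <- sum(y * py)         # means 0.87 and 0.77
cov_xy <- sum(outer(y, x) * P) - mx * my     # covariance = -0.1399
vx <- sum(x^2 * px) - mx^2; vy <- sum(y^2 * py) - my^2
cov_xy / sqrt(vx * vy)                       # correlation = -0.1716979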
Conditional Probability Distribution
Formally, let X and Y be two random variables. Then the conditional probability distribution of Y, for all values y, given X = x, is defined by:

P(y | x) = P(Y = y | X = x) = P(Y = y and X = x) / P(X = x).

Given X = x, we can also calculate the conditional expected value of Y via:

E(Y | X = x) = Σ_y y P(y | x).

E(Y | X = x) is known as the true regression of Y on X. Similarly, E(X | Y = y) is the true regression of X on Y.
Conditional Probabilities of Y

The conditional probabilities of Y are calculated for the various given values of X. In our example:

y          Given X = 0    Given X = 1    Given X = 2    Given X = 3
0          0.25           0.750          0.3846         0.5714
1          0.50           0.125          0.3846         0.2857
2          0.15           0.075          0.1538         0.1429
3          0.10           0.050          0.0770         0.0000
Total      1.00           1.000          1.0000         1.0000
E(Y | x)   1.10           0.425          0.9229         0.5714
True Regression Line of on

The plot of E(Y | x) against x = 0, 1, 2, 3 traces the true regression line of Y on X.
Sum of Two Variables...

The bivariate distribution allows us to develop the probability


distribution of the sum of two variables, which is of interest in
many applications.

In the houses-sold example, we could be interested in the


probability for having two houses sold (by either THC or GPL)
in a month.

This can be computed by adding the probabilities for all combinations of (x, y) pairs that result in a sum of 2:

P(X + Y = 2) = P(0, 2) + P(1, 1) + P(2, 0) = 0.16.

( + = 2) = (0, 2) + (1, 1) + (2, 0) = 0.16 .


Sum of Two Variables...

Using this method, we can derive the probability mass function for the variable X + Y:

X + Y    P(X + Y)
0        0.10
1        0.50
2        0.16
3        0.16
4        0.06
5        0.02
6        0.00
Total    1.00
Sum of Two Variables...

The expected value and variance of X + Y obey the following basic laws:

I.  E(X + Y) = E(X) + E(Y)
II. V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y)

If X and Y happen to be independent, then Cov(X, Y) = 0 and thus V(X + Y) = V(X) + V(Y).
Sum of Two Variables...

Example: Houses Sold

E(X + Y) = 0.87 + 0.77 = 1.64 ,

V(X + Y) = 0.7931 + 0.8371 + 2 × (−0.1399) = 1.3504 ,

SD(X + Y) = √1.3504 ≈ 1.162 .

Note that the negative correlation between X and Y has a
variance-reduction effect on X + Y. This is an important concept.
One application is that investing in both stocks and bonds can result
in reduced variability, or risk.
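To see that the two routes agree, one can compute E(X + Y) and V(X + Y) both directly from the joint p.m.f. and via laws I and II above. A minimal Python sketch with the joint table reconstructed from these slides (assumed values):

from math import sqrt

# Reconstructed joint pmf of the houses-sold example (assumed values).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.06, (0, 3): 0.04,
    (1, 0): 0.30, (1, 1): 0.05, (1, 2): 0.03, (1, 3): 0.02,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.02, (2, 3): 0.01,
    (3, 0): 0.04, (3, 1): 0.02, (3, 2): 0.01, (3, 3): 0.00,
}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y

# Direct computation on the sum S = X + Y.
mu_s = sum((x + y) * p for (x, y), p in joint.items())
var_s = sum((x + y - mu_s) ** 2 * p for (x, y), p in joint.items())

print(round(mu_s, 4), round(mu_x + mu_y, 4))                # 1.64 both ways
print(round(var_s, 4), round(var_x + var_y + 2 * cov, 4))   # 1.3504 both ways
print(round(sqrt(var_s), 3))                                # about 1.162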
Application: Mutual Fund Sales

Suppose a mutual fund salesperson has a 50% (perhaps too high, but we
will revisit this) chance of closing a sale on each call she makes.
Suppose further that she made four calls in the last hour.

Consider closing a sale a success and not closing a sale a failure.
Then, we will study the variables:

X = total number of successes
Y = number of successes before the first failure

An interesting question is: How would the distribution of Y vary for
different values of X?
Application: Mutual Fund Sales

Let
X = total number of successes (a success is denoted by S) out of 4
sales calls
Y = number of successes before the first failure (a failure is
denoted by F) in the same 4 sales calls

Assumptions:
I.  The success probability for a call is 0.5 (or 1/2).
II. The outcomes of different calls are independent.

The Sample Space: There are 2^4 = 16 possible outcomes, listed on the
next slide.
Application: Mutual Fund Sales

SSSS    FFFF
SSSF    FFFS
SSFS    FFSF
SFSS    FSFF
FSSS    SFFF
SSFF    FFSS
SFSF    FSFS
SFFS    FSSF

Since the success probability is 0.5, each possible outcome has a
probability of 1/16 = 0.0625.
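The whole exercise can also be reproduced by brute-force enumeration, since all 16 sequences are equally likely. The Python sketch below is simply an illustration of that counting argument:

from itertools import product
from collections import Counter

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]   # 16 call sequences

def total_successes(seq):
    return seq.count("S")                                      # the variable X

def successes_before_first_failure(seq):
    return len(seq) if "F" not in seq else seq.index("F")      # the variable Y

pmf_x = Counter(total_successes(s) for s in outcomes)
pmf_y = Counter(successes_before_first_failure(s) for s in outcomes)

# Each sequence has probability 1/16 = 0.0625, so divide the counts by 16.
print({x: n / 16 for x, n in sorted(pmf_x.items())})
# {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print({y: n / 16 for y, n in sorted(pmf_y.items())})
# {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}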
Application: Mutual Fund Sales

The Variable X: By simple counting, we have:

Outcome   X        Outcome   X
SSSS      4        FFFF      0
SSSF      3        FFFS      1
SSFS      3        FFSF      1
SFSS      3        FSFF      1
FSSS      3        SFFF      1
SSFF      2        FFSS      2
SFSF      2        FSFS      2
SFFS      2        FSSF      2
Application: Mutual Fund Sales

By counting the number of times each value of X occurs, we obtain the
probability mass function (or distribution) of X:

x P(x)
0 0.0625
1 0.25
2 0.375
3 0.25
4 0.0625
Recall that X follows a Binomial distribution with parameters n = 4
and p = 0.5.
Please check that E(X) = 2 and V(X) = 1.
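The binomial claim is easy to check against the formula P(X = x) = C(n, x) p^x (1 − p)^(n−x). A minimal Python sketch, illustrative only:

from math import comb

n, p = 4, 0.5
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
mean = sum(x * q for x, q in pmf.items())                 # equals n p = 2
var = sum((x - mean) ** 2 * q for x, q in pmf.items())    # equals n p (1 - p) = 1
print(pmf)          # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print(mean, var)    # 2.0 1.0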
Application: Mutual Fund Sales

[Figure: bar chart of the probability mass function of X, with probabilities
0.0625, 0.25, 0.375, 0.25 and 0.0625 at x = 0, 1, 2, 3, 4.]
Application: Mutual Fund Sales

The Variable Y: For each possible outcome, we can also determine the
value of Y:

Outcome   X   Y    Outcome   X   Y
SSSS      4   4    FFFF      0   0
SSSF      3   3    FFFS      1   0
SSFS      3   2    FFSF      1   0
SFSS      3   1    FSFF      1   0
FSSS      3   0    SFFF      1   1
SSFF      2   2    FFSS      2   0
SFSF      2   1    FSFS      2   0
SFFS      2   1    FSSF      2   0
Application: Mutual Fund Sales

By counting the number of times each value of Y occurs, we obtain the
probability mass function (or distribution) of Y:

y P(y)
0 0.5
1 0.25
2 0.125
3 0.0625
4 0.0625
Y actually follows a right-truncated Geometric distribution,
truncated at 4, with p = 0.5.
Please check that E(Y) = 0.9375 and V(Y) = 1.433594.
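A right-truncated geometric p.m.f. with p = 0.5 and truncation point 4 puts probability p^k (1 − p) on k = 0, 1, 2, 3 and the remaining mass p^4 on k = 4. A minimal Python sketch checking the quoted mean and variance:

n, p = 4, 0.5
pmf = {k: p**k * (1 - p) for k in range(n)}   # P(Y = k) = p^k (1 - p) for k < n
pmf[n] = p**n                                 # all remaining mass at the truncation point
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(pmf)                    # {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}
print(mean, round(var, 6))    # 0.9375 1.433594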
Application: Mutual Fund Sales

[Figure: bar chart of the probability mass function of Y, with probabilities
0.5, 0.25, 0.125, 0.0625 and 0.0625 at y = 0, 1, 2, 3, 4.]
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

For each of the outcomes in our sample space, we have both an X value
and a Y value.

We can therefore develop the joint probability distribution of X and
Y.

The table on the next slide gives the bivariate probabilities P(x, y)
for all possible combinations of x and y values:
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

x \ y        0        1        2        3        4        Row Sum
0            0.0625   0        0        0        0        0.0625
1            0.1875   0.0625   0        0        0        0.25
2            0.1875   0.125    0.0625   0        0        0.375
3            0.0625   0.0625   0.0625   0.0625   0        0.25
4            0        0        0        0        0.0625   0.0625
Column Sum   0.5      0.25     0.125    0.0625   0.0625   1

The row sums give the marginal probabilities of X and the column sums
give the marginal probabilities of Y. These are consistent with what
we obtained earlier.
Application: Mutual Fund Sales:
Chart of Bivariate Distribution of X and Y:

[Figure: 3-D bar chart of the joint probabilities P(x, y) over the X and Y
values 0 to 4.]
Application: Mutual Fund Sales:
Bivariate Distribution of X and Y:

Please check that E(XY) = 2.6875, so that

Cov(X, Y) = E(XY) − E(X) E(Y) = 2.6875 − 2 × 0.9375 = 0.8125 .

The correlation coefficient between X and Y is:

ρ_XY = 0.678594476 .

This is a relatively large value, indicating a fairly strong positive
linear relationship between X and Y.
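These values follow from the same enumeration of the 16 equally likely call sequences. A minimal Python sketch, illustrative only:

from itertools import product
from math import sqrt

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]
pairs = [(s.count("S"),                                        # X
          len(s) if "F" not in s else s.index("F"))            # Y
         for s in outcomes]

expect = lambda f: sum(f(x, y) for x, y in pairs) / 16         # each sequence has prob 1/16

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)    # 2 and 0.9375
e_xy = expect(lambda x, y: x * y)                              # 2.6875
cov = e_xy - mu_x * mu_y                                       # 0.8125
var_x = expect(lambda x, y: (x - mu_x) ** 2)                   # 1.0
var_y = expect(lambda x, y: (y - mu_y) ** 2)                   # about 1.4336
print(e_xy, cov, round(cov / sqrt(var_x * var_y), 6))          # 2.6875 0.8125 0.678594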
Application: Mutual Fund Sales:
Conditional Distribution of Y given X:

The joint distribution of X and Y allows us to also determine the
distribution of Y for any given value of X. We noted before that this
is called a conditional distribution, and it is a very important
concept.

As an example, we could ask: what is the probability that Y = 2,
given that X = 3?

This probability is denoted by P(Y = 2 | X = 3) and can be computed
as follows:

P(Y = 2 | X = 3) = P(Y = 2 and X = 3) / P(X = 3) = 0.0625 / 0.25 = 0.25 .

Repeating this then yields the conditional distribution of Y, given
X = 3:
Application: Mutual Fund Sales:
Conditional Distribution of Y given X:

Given X = 3, we have:

y               0      1         2         3       4
P(y | X = 3)    0.25   0.25      0.25      0.25    0

Similarly, for the other given values of X, we have:

y               0      1         2         3       4
P(y | X = 0)    1      0         0         0       0
P(y | X = 1)    0.75   0.25      0         0       0
P(y | X = 2)    0.5    0.33333   0.16667   0       0
P(y | X = 4)    0      0         0         0       1
Application: Mutual Fund Sales:
Conditional Expectation of Y given X:

For each given value of X, we can now compute the expected value of
Y.

We noted earlier that this is called a conditional expectation.

The conditional expected values of Y for different given values of X
allow us to better understand the nature of the relationship between
X and Y. This concept will be important in regression.

For our example here, this results in:

E(Y | X = 0) = 0 ;             E(Y | X = 1) = 0.25 ;
E(Y | X = 2) = 0.666666667 ;   E(Y | X = 3) = 1.5 ;
E(Y | X = 4) = 4 .
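Since the call probability is 0.5, all sequences with the same value of X are equally likely, so each conditional expectation is just an average of Y over those sequences. A minimal Python sketch reproducing the values above, illustrative only:

from itertools import product
from collections import defaultdict

outcomes = ["".join(seq) for seq in product("SF", repeat=4)]
x_of = lambda s: s.count("S")                                  # X
y_of = lambda s: len(s) if "F" not in s else s.index("F")      # Y

by_x = defaultdict(list)
for s in outcomes:
    by_x[x_of(s)].append(y_of(s))

for x in sorted(by_x):
    ys = by_x[x]
    print(x, round(sum(ys) / len(ys), 4))   # simple average, valid because p = 0.5
# 0 -> 0.0, 1 -> 0.25, 2 -> 0.6667, 3 -> 1.5, 4 -> 4.0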
Application: Mutual Fund Sales:
True Regression of Y given X:

A plot of these shows that:

[Figure: plot of the conditional expected values E(Y | X = x) against the
given values x = 0, 1, 2, 3, 4, showing an increasing trend.]
Application: Mutual Fund Sales:
Conditional Expectation of Y given X:

Observe that these conditional expected values vary depending on the
given value of X.

The diagram clearly indicates a positive relationship between X and
Y, which is consistent with our calculation of the correlation
coefficient.

In the context of our problem, this means that the greater the total
number of successes, the longer the run of successes before the first
failure tends to be.

This is rather intuitive.
Application: Mutual Fund Sales:
True Regression of Y given X:

In practice, the nature (or mathematical form) of the true regression
is often very complex, or even intractable.

Thus, we work with a suitable working model, guided by a scatter
diagram and/or a matrix plot.

More on regression will be discussed later.

Thank You for Your Patience

The probability that we may fail in the struggle ought not to deter
us from the support of a cause we believe to be just.
~ Abraham Lincoln

The 50-50-90 rule: anytime you have a 50-50 chance of getting
something right, there's a 90% probability you'll get it wrong.
~ Andy Rooney
