
GEOMETRIC AND NEGATIVE BINOMIAL DISTRIBUTIONS

Applied Statistics and Computing Lab, Indian School of Business


Learning goals
To study the Geometric distribution and the Negative Binomial distribution using simple examples
To understand the relation between them
To understand their importance in real-life situations


Examples

Imagine someone proof-reading a book. Suppose the probability of finding a typographical mistake (a typo) on any page is p, and this probability is independent of finding a typo on any other page of the book. How many pages will he/she have to read before finding the first typographical mistake?

Imagine a door-to-door salesperson who has a target of selling 7 products per day. Suppose the probability of making a sale at any given house is p, and this probability is independent of making a sale at any other house. How many doorbells would he/she have to ring on any day to achieve the sales target?

Examples (contd.)
In both the above scenarios, it is easy to see that the underlying trials are Bernoulli trials.
We term finding a typographical error a success in the first example; in the second example, the sale of a product to a particular household is a success.
In the Binomial distribution, we studied the probability of a given number of successes in a fixed number of trials.
Now we are also interested in studying the pattern of occurrence of the success(es), i.e. how many trials it takes to obtain them.
For the sake of explanation, we assume that every page can have at most 1 typo and that every household can buy at most 1 product.

Calculating the probability (1)


Consider P(Finding a typo on a particular page) = p, so P(Not finding a typo on a particular page) = 1 - p = q.
Suppose the variable X denotes the number of pages that will be turned until the 1st typo is found.
Then X = j indicates that no typo was found on the first (j - 1) pages and the first typo was found on the j-th page:
$P(X = j) = P(\text{No typo found on the first } (j-1) \text{ pages}) \times P(\text{Typo found on the } j\text{-th page}) = q^{j-1} p$

Page # where 1st typo found:                  1        2        3        ...   j              (j+1)
# of pages before 1st typo found:             0        1        2        ...   (j-1)          j
P(1st typo found on the corresponding page):  q^0 p    q^1 p    q^2 p    ...   q^(j-1) p      q^j p
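A minimal Python sketch of the table above, assuming a hypothetical typo probability p = 0.1 (the slides leave p unspecified); the function name `geometric_pmf_trials` is illustrative and not from the source. It evaluates P(X = j) = q^(j-1) p for the first few pages.

```python
def geometric_pmf_trials(j: int, p: float) -> float:
    """P(X = j) = q**(j - 1) * p, where X is the page (trial) on which the first typo appears."""
    if j < 1:
        return 0.0  # the first typo cannot appear before page 1
    q = 1.0 - p
    return q ** (j - 1) * p


# Hypothetical typo probability per page (an assumption for illustration).
p = 0.1
print("page  pages_before  P(first typo here)")
for page in range(1, 6):
    print(f"{page:>4}  {page - 1:>12}  {geometric_pmf_trials(page, p):.5f}")
```

Each row multiplies the previous probability by q, which is exactly the pattern in the table.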


Calculating the probability (1) (contd.)


The probabilities of the above X, i.e. P(X = 1), P(X = 2), P(X = 3), ..., are $q^0 p$, $q^1 p$, $q^2 p$ and so on. These are terms of a geometric progression with first term p and common ratio q. Therefore such a variable is said to follow the Geometric distribution.
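Since the progression has first term p and common ratio q with 0 < q < 1, its terms sum to p / (1 - q) = 1, so they form a valid probability distribution. A quick numerical check, assuming a hypothetical p = 0.3 and illustrative function names: the sum of the first n terms should match the closed form p(1 - q^n)/(1 - q) = 1 - q^n and approach 1 as n grows.

```python
def geometric_partial_sum(n: int, p: float) -> float:
    """Sum of the first n terms p, q*p, q**2*p, ...; this equals P(X <= n)."""
    q = 1.0 - p
    return sum(q ** k * p for k in range(n))


def closed_form_partial_sum(n: int, p: float) -> float:
    """Geometric-series formula p * (1 - q**n) / (1 - q), which simplifies to 1 - q**n."""
    q = 1.0 - p
    return 1.0 - q ** n


p = 0.3  # hypothetical success probability
for n in (1, 5, 20, 100):
    print(n, geometric_partial_sum(n, p), closed_form_partial_sum(n, p))
```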


Calculating the probability (2)


Suppose the proof-reader is new at the job and has to report to the editor every time he/she finds r typos in the book. How many pages without a typo would have to be read before finding the r-th typo?
We know that P(Finding a typo on a particular page) = p and P(Not finding a typo on a particular page) = 1 - p = q.
The variable Y denotes the number of pages without a typo until the r-th typo is found. Then Y = j indicates that there are j pages without a typo before the page with the r-th typo, i.e. (j + r) pages are read up to and including the page with the r-th typo.
We must remember that the (j + r)-th page will always have the r-th typo, as the experiment terminates when the r-th typo is found.

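$P(Y = j) = P(r \text{ pages with typos}) \times P(j \text{ pages with no typos}) \times (\#\text{ of ways } j \text{ pages without typos can occur within the first } j + r - 1 \text{ pages}) = \binom{j+r-1}{j} p^r q^j$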
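To check the counting argument, the sketch below compares the closed form with a brute-force sum over every arrangement of the first j + r - 1 pages, with page j + r forced to hold the r-th typo. The values j = 3, r = 2, p = 0.4 are hypothetical, and `neg_binom_pmf` and `brute_force_pmf` are illustrative names.

```python
from itertools import product
from math import comb


def neg_binom_pmf(j: int, r: int, p: float) -> float:
    """P(Y = j) = C(j + r - 1, j) * p**r * q**j: j typo-free pages before the r-th typo."""
    q = 1.0 - p
    return comb(j + r - 1, j) * p ** r * q ** j


def brute_force_pmf(j: int, r: int, p: float) -> float:
    """Sum over all outcomes of the first j + r - 1 pages (1 = typo, 0 = no typo)
    that contain exactly j typo-free pages; page j + r must then hold the r-th typo."""
    q = 1.0 - p
    total = 0.0
    for pages in product((0, 1), repeat=j + r - 1):
        if pages.count(0) == j:
            # r - 1 typos and j clean pages so far, then a typo on page j + r
            total += p ** (r - 1) * q ** j * p
    return total


j, r, p = 3, 2, 0.4  # hypothetical values
print(neg_binom_pmf(j, r, p), brute_force_pmf(j, r, p))
```

The binomial coefficient simply counts the arrangements that the brute-force loop enumerates one by one.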

Calculating the probability (2) (contd.)


This is similar to finding the number of houses without a sale that the salesperson would have to approach before he/she can sell r products. In this experiment, there would be at least r trials; exactly r if all of the first r trials are successes.

The term $\binom{j+r-1}{j}$ above is the Binomial coefficient in the probability we evaluated.
This coefficient is equal to $(-1)^j \binom{-r}{j}$, which is the binomial coefficient with a negative integer. Therefore such a variable is said to follow the Negative Binomial distribution.
The Negative Binomial distribution is also referred to as the Pascal distribution.
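The identity behind the name can be checked directly. Python's `math.comb` raises `ValueError` for negative arguments, so this sketch defines a small hypothetical helper, `generalized_binom`, using the falling-factorial definition n(n - 1)...(n - k + 1) / k!, which is valid for negative integers n.

```python
from fractions import Fraction
from math import comb, factorial


def generalized_binom(n: int, k: int) -> Fraction:
    """n(n - 1)...(n - k + 1) / k!, defined for any integer n, including negative n."""
    numerator = 1
    for i in range(k):
        numerator *= n - i
    return Fraction(numerator, factorial(k))


# Check C(j + r - 1, j) == (-1)**j * C(-r, j) for a few small r and j.
for r in range(1, 5):
    for j in range(6):
        lhs = comb(j + r - 1, j)
        rhs = (-1) ** j * generalized_binom(-r, j)
        print(r, j, lhs, rhs, lhs == rhs)
```

Exact `Fraction` arithmetic is used so the comparison is not affected by floating-point rounding.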

Formal definitions
Common underlying experiment:
1. An unlimited sequence of Bernoulli trials is undertaken (trials continue until the required success occurs)
2. The probability of success (p) is known and is constant at each trial
3. Trials are independent, i.e. the outcome of one trial does not influence the outcome of any other trial

A variable X that denotes the number of trials undertaken to achieve the first success is said to follow the Geometric distribution with the parameter p.
Example: a sharpshooter would enter the next round the moment he/she first hits the target to a certain degree of accuracy.
Example: we are all looking forward to the day when the value of the Rupee will rise for the first time after the fall.

A variable X that denotes the number of failures before achieving the r-th success is said to follow the Negative Binomial distribution with parameters r and p.
Example: an investor may consider it a success when the price of a share falls, and wait for r such falls, as some of his/her investment decisions may be based on that.
Example: an associate professor would get tenure once he/she has r publications.
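The common experiment can be simulated directly. This sketch uses Python's `random.Random`; the salesperson's target r = 7 comes from the earlier example, while p = 0.2, p = 0.1 and the seed are hypothetical. The illustrative function `trials_until_first_success` follows the Geometric definition (trials), and `failures_before_rth_success` follows the Negative Binomial one (failures).

```python
import random


def trials_until_first_success(p: float, rng: random.Random) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # this trial is a failure (probability q = 1 - p)
        trials += 1
    return trials


def failures_before_rth_success(r: int, p: float, rng: random.Random) -> int:
    """Number of failures observed before the r-th success."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:  # success with probability p
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(2024)  # fixed seed so runs are reproducible
r, p = 7, 0.2              # target of 7 sales; p = 0.2 is a hypothetical value
n = 10_000
avg_failures = sum(failures_before_rth_success(r, p, rng) for _ in range(n)) / n
print("average houses without a sale:", avg_failures)
print("average doorbells rung:", avg_failures + r)  # every doorbell is one trial

# Proof-reading example with a hypothetical p = 0.1 per page.
avg_pages = sum(trials_until_first_success(0.1, rng) for _ in range(n)) / n
print("average pages read until the first typo:", avg_pages)
```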


PMF and statistics


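The parameter of the Geometric distribution is p (the probability of success). We say that X ~ Geometric(p). Counting the number of failures before the first success, for this variable,
$P(X = j) = q^j p$, where j = 0, 1, 2, ..., 0 < p < 1 and p + q = 1
$E(X) = \frac{q}{p}$, $V(X) = \frac{q}{p^2}$

The parameters of the Negative Binomial distribution are r (the number of successes to achieve) and p (the probability of success). We say that Y ~ Negative Binomial(r, p). For this variable,
$P(Y = j) = \binom{j+r-1}{r-1} p^r q^j$, where j = 0, 1, 2, ..., 0 < p < 1, p + q = 1 and r > 0
$E(Y) = \frac{rq}{p}$, $V(Y) = \frac{rq}{p^2}$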

We see here that the Geometric distribution is a special case of the Negative Binomial distribution with r = 1.
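The mean and variance formulas, and the r = 1 special case, can be checked numerically by summing the PMF over enough terms that the tail is negligible. In this sketch the parameters r = 3, p = 0.4 are hypothetical, `nb_pmf` and `truncated_moments` are illustrative names, and the cut-off of 1000 terms is an assumption that works for moderate p, not a general rule.

```python
from math import comb, isclose


def nb_pmf(j: int, r: int, p: float) -> float:
    """P(Y = j) = C(j + r - 1, r - 1) * p**r * q**j, j = 0, 1, 2, ..."""
    q = 1.0 - p
    return comb(j + r - 1, r - 1) * p ** r * q ** j


def truncated_moments(r: int, p: float, terms: int = 1000) -> tuple[float, float]:
    """Mean and variance from the PMF, truncating the infinite sum after `terms` values."""
    probs = [nb_pmf(j, r, p) for j in range(terms)]
    mean = sum(j * pj for j, pj in enumerate(probs))
    var = sum((j - mean) ** 2 * pj for j, pj in enumerate(probs))
    return mean, var


r, p = 3, 0.4  # hypothetical parameters
q = 1 - p
print(truncated_moments(r, p), (r * q / p, r * q / p ** 2))

# With r = 1 the Negative Binomial PMF reduces to the Geometric PMF q**j * p.
print(all(isclose(nb_pmf(j, 1, p), q ** j * p) for j in range(50)))
```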

Alternate definition of Negative Binomial distribution


If X denotes the number of trials that have to be undertaken to achieve the r-th success (including the trial on which it occurs), X still follows a Negative Binomial distribution, and the PMF is given by:
$P(X = j) = \binom{j-1}{r-1} p^r q^{j-r}$, where j = r, r + 1, ..., 0 < p < 1, p + q = 1 and r > 0
For this distribution, $E(X) = \frac{r}{p}$, $V(X) = \frac{rq}{p^2}$
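The two definitions differ only by a shift of r, since X = Y + r (trials = failures + successes). This sketch, with illustrative function names and a hypothetical p = 0.2 for the salesperson's target of r = 7, checks that P(X = j) = P(Y = j - r) and recovers E(X) = r/p and V(X) = rq/p^2 from truncated sums (the 1500-term cut-off is an assumption chosen so the tail is negligible here).

```python
from math import comb, isclose


def nb_pmf_failures(j: int, r: int, p: float) -> float:
    """Failures before the r-th success: C(j + r - 1, r - 1) p**r q**j, j = 0, 1, ..."""
    q = 1.0 - p
    return comb(j + r - 1, r - 1) * p ** r * q ** j


def nb_pmf_trials(j: int, r: int, p: float) -> float:
    """Trials up to and including the r-th success: C(j - 1, r - 1) p**r q**(j - r), j = r, r + 1, ..."""
    if j < r:
        return 0.0
    q = 1.0 - p
    return comb(j - 1, r - 1) * p ** r * q ** (j - r)


r, p = 7, 0.2  # sales target from the example; p is hypothetical
# Shifting by r links the two definitions: P(X = j) = P(Y = j - r).
print(all(isclose(nb_pmf_trials(j, r, p), nb_pmf_failures(j - r, r, p)) for j in range(r, 200)))

# Truncated sums for the mean and variance of X, compared with r/p and r*q/p**2.
support = range(r, r + 1500)
mean = sum(j * nb_pmf_trials(j, r, p) for j in support)
var = sum((j - mean) ** 2 * nb_pmf_trials(j, r, p) for j in support)
print(mean, r / p, var, r * (1 - p) / p ** 2)
```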



Importance
The most important property of the Geometric distribution is the lack of memory, or memoryless property, stated as below:
$P(X > s + t \mid X > s) = P(X > t)$ for any $s \geq 0$, $t \geq 0$
This implies: if an event hasn't occurred up to time point s, the probability that it will occur within the next t time units, i.e. between time units s and s + t, is equal to the probability that it occurs between time points 0 and t. Here, one time point refers to one trial.
For example, suppose an investor is observing the price of share A. If the price has not fallen during the first s trading days under observation, P(the share price will fall within the next t trading days) = P(the share price will fall within any t trading days).
The Negative Binomial distribution is a discrete waiting-time distribution, as it describes the probability distribution of waiting for the r-th success, for any r ≥ 1.
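The memoryless property can be verified numerically. This sketch computes P(X > n) by summing the Geometric PMF q^(j-1) p (the exact value is q^n) and compares the conditional probability with the unconditional one. The daily fall probability p = 0.1, the values s = 5 and t = 3, and the helper name `prob_greater` are all hypothetical.

```python
from math import isclose


def prob_greater(n: int, p: float, terms: int = 5000) -> float:
    """P(X > n) for X = trial of the first success, by summing the PMF q**(j - 1) * p
    over j = n + 1, ..., n + terms (the remaining tail is negligible for moderate p)."""
    q = 1.0 - p
    return sum(q ** (j - 1) * p for j in range(n + 1, n + terms + 1))


p = 0.1      # hypothetical daily probability that the share price falls
s, t = 5, 3  # no fall in the first s days; look at the next t days
conditional = prob_greater(s + t, p) / prob_greater(s, p)  # P(X > s + t | X > s)
print(conditional, prob_greater(t, p), (1 - p) ** t)
print("P(fall within next t days | none in first s):", 1 - conditional)
```

The first line of output should show three (nearly) equal numbers: the ratio q^(s+t) / q^s cancels to q^t regardless of s.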


More examples
Suppose that in a particular region, the probability of finding oil is equal within each area of the same size. A company can evaluate the number of trials it may take before hitting an oil well, in order to decide whether or not to dig in that region.
For a particular type of insurance policy, suppose that the probability of death of any insured person is equal and known. Given the policy structure, we can calculate whether the insurance company could go bankrupt if a given number of claims are made within a year.
Suppose a graduate student keeps applying for jobs until she gets three offers. If her probability of cracking every interview is equal and known, the Negative Binomial distribution can help us determine the number of interviews she may have to appear for before she stops applying.
The memoryless property, for the same example as above: suppose that the student does not get any offer in the first s interviews. The probability that she will get her first offer in the next t interviews is equal to the probability that she gets an offer within t interviews.
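For the graduate-student example, a short sketch using the trials form of the Negative Binomial PMF. The three offers come from the example; the per-interview offer probability of 0.3 is a hypothetical value, and the function names are illustrative.

```python
from math import comb

R_OFFERS = 3   # from the example: she stops after three offers
P_OFFER = 0.3  # hypothetical probability of an offer from any single interview


def prob_rth_offer_at(n: int, r: int = R_OFFERS, p: float = P_OFFER) -> float:
    """P(the r-th offer comes at interview n) = C(n - 1, r - 1) p**r q**(n - r), n >= r."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)


def prob_done_within(n: int, r: int = R_OFFERS, p: float = P_OFFER) -> float:
    """P(she has her r offers after at most n interviews)."""
    return sum(prob_rth_offer_at(k, r, p) for k in range(r, n + 1))


print("expected number of interviews:", R_OFFERS / P_OFFER)
for n in (3, 5, 10, 15):
    print(n, round(prob_done_within(n), 4))
```

As a cross-check, needing at most n interviews is the same event as getting at least r offers in n interviews, so `prob_done_within(n)` should match the Binomial tail probability P(at least r successes in n trials).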


Thank you
