Binomial distribution
From Wikipedia, the free encyclopedia

"Binomial model" redirects here. For the binomial model in options pricing, see Binomial options pricing model.
See also: Negative binomial distribution

[Figure: probability mass function and cumulative distribution function of the binomial distribution. Binomial distribution for p = 0.5, with n and k as in Pascal's triangle: the probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is 70/256.]

Notation: B(n, p)
Parameters: n ∈ N0, the number of trials; p ∈ [0, 1], the success probability in each trial
Support: k ∈ {0, 1, ..., n}, the number of successes
pmf: Pr(X = k) = C(n, k) p^k (1 − p)^(n − k)
CDF: Pr(X ≤ k) = Σ_{i=0}^{k} C(n, i) p^i (1 − p)^(n − i)
Mean: np
Median: ⌊np⌋ or ⌈np⌉
Mode: ⌊(n + 1)p⌋ or ⌈(n + 1)p⌉ − 1
Variance: np(1 − p)
In probability theory and statistics,


the binomial distribution with
parameters n and p is the discrete probability
distribution of the number of successes in a
sequence of n independent yes/no
experiments, each of which yields success
with probability p. A success/failure
experiment is also called a Bernoulli
experiment or Bernoulli trial; when n = 1, the
binomial distribution is a Bernoulli
distribution. The binomial distribution is the
basis for the popular binomial
test of statistical significance.
The binomial distribution is frequently used
to model the number of successes in a
sample of size n drawn with
replacement from a population of size N. If
the sampling is carried out without
replacement, the draws are not independent
and so the resulting distribution is
a hypergeometric distribution, not a binomial
one. However, for N much larger than n, the
binomial distribution is a good
approximation, and widely used.
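
This approximation is easy to check numerically. Below is a minimal Python sketch comparing sampling without replacement (hypergeometric) against sampling with replacement (binomial); the population size N = 10,000, success count K = 3,000, and sample size n = 10 are illustrative numbers, not from the text:

    from math import comb

    # Illustrative numbers: population of N = 10,000 items,
    # K = 3,000 of which count as "successes"; we draw n = 10.
    N, K, n = 10_000, 3_000, 10
    p = K / N  # success probability for the binomial approximation

    for k in range(n + 1):
        hyper = comb(K, k) * comb(N - K, n - k) / comb(N, n)  # without replacement
        binom = comb(n, k) * p**k * (1 - p)**(n - k)          # with replacement
        print(f"k={k}: hypergeometric={hyper:.6f}  binomial={binom:.6f}")

The two columns agree closely, as expected when N is much larger than n.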


Definition: A polynomial with two terms, usually joined by a plus or minus sign, is called a binomial. Binomials are used in algebra. A polynomial with one term is called a monomial and could look like 7x. A polynomial with two terms is called a binomial; it could look like 3x + 9. It is easy to remember binomials, as "bi" means 2 and a binomial has 2 terms.
Examples: 3x + 4 is an example of a binomial.
A binomial is a polynomial.

Examples of Polynomials

3x - 2
Monomials: 3x, -2

5xy² + 36xy + 52x + 6
Monomials: 5xy², 36xy, 52x, 6

r² + 2h
Monomials: r², 2h

When to Multiply Polynomials

The instructions will ask you to multiply or simplify exercises that look like this:

(Polynomial)(Polynomial)
(Polynomial) × (Polynomial)
(Polynomial) * (Polynomial)

Note: When there is no multiplication symbol between 2 sets of parentheses, realize that you are being asked to multiply.

When Not to Multiply Polynomials

(Polynomial) + (Polynomial)
(Polynomial) - (Polynomial)
Yes, I understand that parentheses encompass the polynomials, but pay attention to what the exercise is asking you to do.
(3x + 5y) + (2x + -6y) does not equal (3x + 5y)(2x + -6y).

Practice with Constants

Multiply: (8 + 6)(-2 + 5)
Use order of operations:
1. Parentheses
(8 + 6) = 14
(-2 + 5) = 3
2. Multiply
14 * 3 = 42

Introducing FOIL
Here's another way of looking at it:

FOIL is a method to multiply polynomials. It is an acronym for First, Outer, Inner, Last.
1. First: (8 + 6)(-2 + 5)
Multiply the first terms: 8 * -2 = -16
2. Outer: (8 + 6)(-2 + 5)
Multiply the outer terms: 8 * 5 = 40
3. Inner: (8 + 6)(-2 + 5)
Multiply the inner terms: 6 * -2 = -12
4. Last: (8 + 6)(-2 + 5)
Multiply the last terms: 6 * 5 = 30
Next, add the results:
-16 + 40 + -12 + 30
Simplify:
-16 + 40 + -12 + 30 = 42
Now, let's practice multiplying polynomials with variables.

Practice with Positives

Simplify the following polynomials.
(x + 5)(x + 4)
1. First. Outer. Inner. Last.
x * x + x * 4 + 5 * x + 5 * 4
2. Multiply.
x² + 4x + 5x + 20
3. Simplify.
x² + 9x + 20
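
If you want to check a FOIL result mechanically, a computer algebra system will expand the product for you. A minimal Python sketch, assuming the sympy package is installed:

    from sympy import symbols, expand

    x = symbols('x')
    print(expand((x + 5) * (x + 4)))  # x**2 + 9*x + 20, matching the FOIL result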

Practice with Negatives

Simplify the following polynomials.
(x - 5)(x - 4)
1. Before you start FOILing, change the negative signs:
(x + -5)(x + -4)
2. Now FOIL: First. Outer. Inner. Last.
x * x + x * -4 + -5 * x + -5 * -4
3. Multiply.
x² + -4x + -5x + 20
4. Simplify.
x² + -9x + 20

Practice Exercises
1) (x + 3)(x - 3) =

2) (x - 6)(x + 4) =

3) (x - 8)(x - 9) =

4) (5j + 11)(j + 1) =

5) (5p - 7)(4p + 3) =

Word Origin and History for binomial

1550s (n.); 1560s (adj.), from Late Latin binomius "having two personal names," a hybrid from bi- (see bi-) + nomius, from nomen (see name (n.)). Taken up 16c. in the algebraic sense "consisting of two terms."

binomial distribution

Definition

Frequency distribution where only two (mutually exclusive) outcomes are possible, such as better or worse, gain or loss, head or tail, rise or fall, success or failure, yes or no. Therefore, if the probability of success in any given trial is known, binomial distributions can be employed to compute the probability of a given number of successes in a given number of trials, and it can be determined whether an empirical distribution deviates significantly from a typical outcome. Also called Bernoulli distribution after its discoverer, the Swiss mathematician Jacques Bernoulli (1654-1705). See also Poisson distribution.

The binomial probability distribution is useful when a total of n independent trials are conducted and we want to find out the probability of r successes, where each success has probability p of occurring. There are several things stated and implied in this brief description. The definition boils down to these four conditions:

1. Fixed number of trials
2. Independent trials
3. Two different classifications
4. Probability of success stays the same for all trials

All of these must be present in the process under investigation in order to use the binomial probability formula or tables. A brief description of each of these follows.

Fixed Trials
The process being investigated must have a clearly defined number of trials that does not vary. We cannot alter this number midway through our analysis. Each trial must be performed the same way as all of the others, although the outcomes may vary. The number of trials is indicated by an n in the formula.
An example of this is studying the outcomes from rolling a die 11 times. The total number of times that each trial (roll) is conducted is defined from the outset.

Independent Trials
Each of the trials has to be independent. Each trial should have absolutely no effect on any of the others. The classical examples of rolling two dice or flipping several coins illustrate independent events. Since the events are independent, we are able to use the multiplication rule to multiply the probabilities together.
In practice, especially due to some sampling techniques, there can be times when trials are not technically independent. A binomial distribution can sometimes be used in these situations as long as the population is large relative to the sample.
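
When all four conditions hold, the probability of exactly r successes in n trials is C(n, r) p^r (1 - p)^(n - r). Here is a minimal Python sketch of that formula, taking "success" to mean rolling a six (an assumption, since the die example above does not say) with the 11 rolls from the Fixed Trials example:

    from math import comb

    def binomial_pmf(r, n, p):
        """Probability of exactly r successes in n independent trials."""
        return comb(n, r) * p**r * (1 - p)**(n - r)

    # 11 rolls of a die, success = rolling a six (p = 1/6)
    print(binomial_pmf(2, 11, 1/6))  # chance of exactly two sixes, about 0.296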


Confidence intervals
Main article: Binomial proportion confidence interval
Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.[12] Because of this problem several methods to estimate confidence intervals have been proposed.
Let n1 be the number of successes out of n, the total number of trials, and let p̂ = n1/n be the proportion of successes. Let z_{α/2} be the 100(1 − α/2)th percentile of the standard normal distribution.

Wald method
p̂ ± z_{α/2} √(p̂(1 − p̂)/n)
A continuity correction of 0.5/n may be added.

Agresti-Coull method[13]
p̃ ± z_{α/2} √(p̃(1 − p̃)/(n + z_{α/2}²))
Here the estimate of p is modified to p̃ = (n1 + z_{α/2}²/2) / (n + z_{α/2}²).

ArcSine method[14]
sin²( arcsin(√p̂) ± z_{α/2}/(2√n) )

Wilson (score) method[15]
( p̂ + z²/(2n) ± z √( p̂(1 − p̂)/n + z²/(4n²) ) ) / (1 + z²/n), writing z for z_{α/2}

The exact (Clopper-Pearson) method is the most conservative.[12] The Wald method, although commonly recommended in textbooks, is the most biased.
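
As a rough illustration, here is a minimal Python sketch of the Wald and Wilson intervals as reconstructed above (the inputs n1 = 40 successes out of n = 100 trials and the 95% level are illustrative, not from the text):

    from math import sqrt

    n1, n = 40, 100          # illustrative: 40 successes in 100 trials
    p_hat = n1 / n
    z = 1.96                 # z_{alpha/2} for a 95% interval

    # Wald interval
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    print("Wald:  ", p_hat - half, p_hat + half)

    # Wilson (score) interval
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    print("Wilson:", center - half, center + half)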

Normal approximation

[Figure: Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5.]

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution N(np, np(1 − p)), and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[8] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n until n is very large (ex: x = 11, n = 7752).

A second rule[8] is that for n > 5 the normal approximation is adequate if |(1/√n)(√((1 − p)/p) − √(p/(1 − p)))| < 0.3.

Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if np ± 3√(np(1 − p)) lies in (0, n).

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[9]

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation (p(1 − p)/n)^{1/2}.
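
The continuity-corrected calculation above is easy to reproduce. A minimal Python sketch (the parameters n = 20, p = 0.5 are an assumption, chosen only to make Pr(X ≤ 8) concrete):

    from math import comb, erf, sqrt

    n, p = 20, 0.5                     # illustrative parameters
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9))  # Pr(X <= 8)

    def normal_cdf(x):                 # standard normal CDF via the error function
        return 0.5 * (1 + erf(x / sqrt(2)))

    corrected = normal_cdf((8.5 - mu) / sigma)    # with continuity correction
    uncorrected = normal_cdf((8 - mu) / sigma)    # without
    print(exact, corrected, uncorrected)          # the corrected value is far closer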

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to B(n, p) of the binomial distribution if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[10]
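
A quick numerical check of this rule of thumb in Python (the values n = 100 and p = 0.02, giving np = 2, are illustrative):

    from math import comb, exp, factorial

    n, p = 100, 0.02
    lam = n * p   # the Poisson parameter λ = np = 2

    for k in range(6):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)
        poisson = exp(-lam) * lam**k / factorial(k)
        print(f"k={k}: binomial={binom:.5f}  Poisson={poisson:.5f}")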

Limiting distributions

Poisson limit theorem: As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0, or at least np approaches λ > 0, the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.[10]

de Moivre–Laplace theorem: As n approaches ∞ while p remains fixed, the distribution of (X − np)/√(np(1 − p)) approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.

Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:[11]

f(p; α, β) = p^(α−1) (1 − p)^(β−1) / B(α, β)

Poisson Distribution
A Poisson distribution is the probability distribution that results from a Poisson experiment.

Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

The experiment results in outcomes that can be classified as successes or failures.

The average number of successes (μ) that occurs in a specified region is known.

The probability that a success will occur is proportional to the size of the region.

The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.

Notation
The following notation is helpful, when we talk about the Poisson distribution.

e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural logarithm system.)

μ: The mean number of successes that occur in a specified region.

x: The actual number of successes that occur in a specified region.

P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when the mean number of successes is μ.

Poisson Distribution
A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution.

Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson probability based on the following formula:

Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of successes within a given region is μ. Then, the Poisson probability is:
P(x; μ) = (e^−μ)(μ^x) / x!
where x is the actual number of successes that result from the experiment, and e is approximately equal to 2.71828.

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.

The variance is also equal to μ.

Example 1
The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:

μ = 2; since 2 homes are sold per day, on average.

x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.

e = 2.71828; since e is a constant equal to approximately 2.71828.

We plug these values into the Poisson formula as follows:

P(x; μ) = (e^−μ)(μ^x) / x!
P(3; 2) = (2.71828^−2)(2^3) / 3!
P(3; 2) = (0.13534)(8) / 6
P(3; 2) = 0.180

Thus, the probability of selling 3 homes tomorrow is 0.180.


Cumulative Poisson Probability

A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than some specified lower limit and less than some specified upper limit.

Example 2
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:

μ = 5; since 5 lions are seen per safari, on average.

x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.

e = 2.71828; since e is a constant equal to approximately 2.71828.

To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use the Poisson formula:

P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(x ≤ 3; 5) = [ (e^−5)(5^0) / 0! ] + [ (e^−5)(5^1) / 1! ] + [ (e^−5)(5^2) / 2! ] + [ (e^−5)(5^3) / 3! ]
P(x ≤ 3; 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ] + [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x ≤ 3; 5) = [ 0.006738 ] + [ 0.033690 ] + [ 0.084224 ] + [ 0.140375 ]
P(x ≤ 3; 5) = 0.2650

Thus, the probability of seeing no more than 3 lions is 0.2650.
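
The same sum takes only a couple of lines of Python; a minimal sketch:

    from math import exp, factorial

    mu = 5  # average lions seen per 1-day safari
    prob = sum(exp(-mu) * mu**x / factorial(x) for x in range(4))  # x = 0, 1, 2, 3
    print(round(prob, 4))  # 0.265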

Statistics

The Poisson Distribution

[Figure: several Poisson distributions, for rates ranging from small to large.]

In the picture above are simultaneously portrayed several Poisson distributions. Where the rate of occurrence of some event, r (in this chart called lambda, or λ), is small, the range of likely possibilities will lie near the zero line, meaning that when the rate r is small, zero is a very likely number to get. As the rate becomes higher (as the occurrence of the thing we are watching becomes commoner), the center of the curve moves toward the right, and eventually, somewhere around r = 7, zero occurrences actually become unlikely. This is how the Poisson world looks graphically. All of it is intuitively obvious. Now we will back up a little and begin over, with you and your mailbox.
Suppose you typically get 4 pieces of mail per day. That becomes your expectation, but there will be a
certain spread: sometimes a little more, sometimes a little less, once in a while nothing at all. Given only
the average rate, for a certain period of observation (pieces of mail per day, phonecalls per hour,
whatever), and assuming that the process, or mix of processes, that produce the event flow are
essentially random, the Poisson Distribution will tell you how likely it is that you will get 3, or 5, or 11, or
any other number, during one period of observation. That is, it predicts the degree of spread around a
known average rate of occurrence. (The average or likeliest actual occurrence is the hump on each of the
Poisson curves shown above). For small values of p, the Poisson Distribution can simulate the Binomial
Distribution (the pattern of Heads and Tails in coin tosses), and it is much easier to compute.


Application
The Poisson distribution applies when: (1) the event is something that can be counted in whole
numbers; (2) occurrences are independent, so that one occurrence neither diminishes nor increases the
chance of another; (3) the average frequency of occurrence for the time period in question is known;
and (4) it is possible to count how many events have occurred, such as the number of times a firefly lights
up in my garden in a given 5 seconds, some evening, but meaningless to ask how many such events
have not occurred. This last point sums up the contrast with the Binomial situation, where the probability
of each of two mutually exclusive events (p and q) is known. The Poisson Distribution, so to speak, is the
Binomial Distribution Without Q. In those circumstances, and they are surprisingly common, the Poisson
Distribution gives the expected frequency profile for events. It may be used in reverse, to test whether a
given data set was generated by a random process. If the data fit the Poisson Expectation closely, then
there is no strong reason to believe that something other than random occurrence is at work. On the other
hand, if the data are lumpy, we look for what might be causing the lump.
The Poisson situation is most often invoked for rare events, and it is only with rare events that it can
successfully mimic the Binomial Distribution (for larger values of p, the Normal Distribution gives a better
approximation to the Binomial). But the Poisson rate may actually be any number. The real contrast is that
the Poisson Distribution is asymmetrical: given a rate r = 3, the range of variation ends with zero on one
side (you will never find "minus one" letter in your mailbox), but is unlimited on the other side (if the label
machine gets stuck, you may find yourself some Tuesday with 4,573 copies of some magazine spilling all
over your front yard - it's not likely, but you can't call it impossible). The Poisson Distribution, as a data set
or as the corresponding curve, is always skewed toward the right, but it is inhibited by the Zero
occurrence barrier on the left. The degree of skew diminishes as r becomes larger, and at some point the
Poisson Distribution becomes, to the eye, about as symmetrical as the Normal Distribution. But much
though it may come to resemble the Normal Distribution, to the eye of the person who is looking at a
graph for, say, r = 35, the Poisson is really coming from a different kind of world event.

History. The Poisson Distribution is named for its discoverer, who first applied it to the deliberations
of juries; in that form it did not attract wide attention. More suggestive was Poisson's application to the
science of artillery. The distribution was later and independently discovered by von Bortkiewicz,
Rutherford, and Gosset. It was von Bortkiewicz who called it The Law of Small Numbers, but as noted
above, though it has a special usefulness at the small end of the range, a Poisson Distribution may also
be computed for larger r. The fundamental trait of the Poisson is its asymmetry, and this trait it preserves
at any value of r.

Derivation. The Poisson Distribution has a close connection with the Binomial, Hypergeometric, and Exponential Distributions, and can be derived as an extreme case of any of them. The Poisson can also be derived from first principles, which involve the growth constant e. That derivation is given on a separate page, for those who like to see the inner workings of the universe up close. Other readers may proceed directly to the how-to-do-it instructions in the next section.

Computing Poisson Probabilities

We found, on the Derivation page, that when the average rate of occurrence of some event per module of observation is r, we can calculate the probability of any given number of actually observed occurrences, k, by substituting in the formula

p(k) = r^k / (k!)(e^r)    (5)

Before going on, consider the following:

It will be noticed that in our formula, the only variable quantity is the rate r. That number is the only way in which one Poisson situation differs from another, and it is the only determining variable (parameter) of the Poisson equation. Nothing else enters in.
Each number r defines a different Poisson distribution. We cannot multiply by 10 the values for the distribution whose rate is r = 1 and get the values for r = 10. The latter must be calculated separately, and will be found to have a different shape. Specifically, the larger the r for any given unit of occurrence, the more symmetrical is the resulting frequency profile. This we already noticed in the picture at the top of this page.
What we here call rate of occurrence, or r, is conventionally called lambda (λ). Remember to make that adjustment when consulting other textbooks or tables.
Calculating Poisson probabilities ideally requires a statistical calculator, with x^y and e^x keys (remember that e is the constant 2.71828). Absent such a calculator, certain individual probabilities may be computed with the aid of the e^x Table. For selected simple values of r, problems may be solved using the Tables here provided.
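
For readers with a programming language handy, formula (5) is also a one-liner; a minimal Python sketch:

    from math import exp, factorial

    def poisson(k, r):
        """Formula (5): p(k) = r^k / (k! e^r)."""
        return r**k / (factorial(k) * exp(r))

    print(round(poisson(3, 2), 4))  # 0.1804, matching the example below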

Example. Let us suppose that some event, say the arrival of a weird particle from outer space at a
counter on some farm outside Topeka, occurs on average 2 times per hour. But there are variations from
that average. What is the probability that in a given hour three weird particles will be recorded?
Substituting in formula (5) the empirical rate r = 2 and the expectation k = 3, we get:
p(3) = r^3 / (3!)(e^r) = 2^3 / (3!)(e^2) = 8 / (6)(7.3891) = 8 / 44.3346 = 0.1804
This answer may be checked with the one given in the Poisson Table, and will be found to match. This
sort of calculation was in fact how the Table was constructed.
In rough terms, then, if our weird particles average 2 per hour but vary randomly around that average, and
thus fit the random Poisson model, we would expect to get 3 rather than 2 weird particles per hour, at the
counter over by the silo, in about 0.1804 of the hours observed. If we only watch for one hour, our reading
will most likely be 2 particles. But there are 24 hours in a day, and in an average day, there should thus be
(24)(0.1804) = approximately 4 hours during which 3 particles are registered. Of course, things can vary
from that most likely expectation; that is the way the universe works. But now we know what the most
likely expectation is. It is such likeliest expectations that the Poisson formula gives us.
Just to show how the whole situation looks, here, from the Table, is the frequency profile for r = 2, omitting
the extremely rare possibilities:
r = 2.0
p(0) 0.1353
p(1) 0.2707
p(2) 0.2707
p(3) 0.1804
p(4) 0.0902
p(5) 0.0361
p(6) 0.0120
p(7) 0.0034
p(8) 0.0009
p(9) 0.0002

It will be seen that the realistic possibilities for occurrence per hour go no lower than zero (which would be
physically impossible), and that they reach as high as 9 per hour before becoming so minuscule that they
do not show up in four decimal places. If we add these probabilities, we get 0.9999, or 1 (the total
probability in the system) plus an effect of rounding error. This, then, is a virtually complete picture of the
possibilities. So also with every other column of the Table.
Browsing one of those Tables will illustrate the fact that the Poisson is cramped on the zero side, but
spreads out on the infinity side. The list of possible values is thus asymmetrical (the statistical term is
"skew"). Such situations, where variation from an average is easier in one direction than another, are
very common in real life, and this is one thing that accounts for the fact that so many situations are well
described by the Poisson distribution. (For the Normal Distribution, the assumption is that variation is
equally likely in either direction from the average).
For the set of probabilities (frequency profiles) for selected average rates r, consult the Poisson Table. To
calculate individual probabilities, use formula (5) above. Rough probabilities may be obtained by the use
of Poisson Paper. This, and clear thinking, are all that are required to work with the Poisson distribution.
The clear thinking is the hardest part, as the Problem set will presently demonstrate.

The Classic Example

The classic Poisson example is the data set of von Bortkiewicz (1898), for the chance of a Prussian
cavalryman being killed by the kick of a horse. Ten army corps were observed over 20 years, giving a
total of 200 observations of one corps for a one year period. The period or module of observation is thus
one year. The total deaths from horse kicks were 122, and the average number of deaths per year per
corps was thus 122/200 = 0.61. This is a rate of less than 1. It is also obvious that it is meaningless to ask
how many times per year a cavalryman was not killed by the kick of a horse. In any given year, we expect
to observe, well, not exactly 0.61 deaths in one corps (that is not possible; deaths occur in modules of 1),
but sometimes none, sometimes one, occasionally two, perhaps once in a while three, and (we might
intuitively expect) very rarely any more. Here, then, is the classic Poisson situation: a rare event, whose
average rate is small, with observations made over many small intervals of time.
Let us see if our formula gives a close fit for the actual Prussian data, where r = 0.61 is the average
number expected per year for the whole sample, and the successive terms of the Poisson formula are the
successive probabilities. Remember that our formula for each term in the distribution is:
p(k) = r^k / (k!)(e^r)    (5)

We may start by asking, given r = 0.61, what is the probability of no deaths by horse kick in a given year
(module of observation)? For k = 0, we get by substitution

p(0) = (0.61)^0 / (0!)(e^0.61) = 1 / (1)(1.8404) = 0.5434


Given that probability, then over the 200 years observed we should expect to find a total of 108.68 = 109
years with zero deaths. It turns out that 109 is exactly the number of years in which the Prussian data
recorded no deaths from horse kicks. The match between expected and actual values is not merely good,
it is perfect.
If we had used instead, as an approximation, the value of e^0.6 from our table, we would have gotten p(0) = 0.5488, so that the expected number of such years over 200 years would be 109.76 = 110, or 1 too high. Not bad.
For the entire set of Prussian data, where p = the predicted Poisson frequency for a given number of deaths per year, E is the corresponding number of years in which that number of deaths is expected to occur in our 200 samples (that is, our p value times 200), and A is the actual number of years in which that many deaths were observed, we have:
Deaths  p        E       A
0       0.54335  108.67  109
1       0.33145  66.29   65
2       0.10110  20.22   22
3       0.02055  4.11    3
4       0.00315  0.63    1
5       0.00040  0.08    0
6       0.00005  0.01    0

and the match seems very good throughout. (Not perfect. But it is intuitively obvious that another trial,
over another 200 years' worth of data, would give slightly different results, and this is a perfectly plausible
example of one such result).
In sum, then, we assume that the Poisson frequency profile gives the expectation (E) when the events in
question are indeed random. Comparing that expectation with our actual results (A), we judge that the
Prussian data set appears to be the result of random causes. There is no reason to suspect any
systematic cause, or any connection between separate events. These deaths, then, just happened. (If ill-trained horses were supplied to all corps in one year, for instance, the pattern of deaths should be more
clustered, and we would have a nonrandom factor). It is the ability of the Poisson Distribution to give a
model for stuff that "just happens," that accounts for its power in statistics. Statistics is about stuff that
"just happens."

Features
The Poisson distribution has several unique features. Most distinctively, as noted above, it has only one parameter, namely the average frequency of the event. That figure is conventionally called lambda (λ); we here use instead the abbreviation r (for rate).
The Poisson distribution is not symmetrical; it is skewed toward the infinity end.
The mean of any Poisson distribution is equal to its variance, that is

m=v
which is a unique property of this distribution. (Note that "mean" here is the average of all values, and
defines the center of gravity of the distribution; it is not a point from which values diverge symmetrically;
the Poisson Distribution is not symmetrical). It is sometimes said that the Poisson mean is an
"expectation." It is true that the commonest frequency in any Poisson set is the one corresponding
to r itself. But it is also true that if r is a whole number, the expectation for (r-1) is identical to that for r, so
that where r > 1, the "expectation" is a pair of outcomes, not one single outcome.
For fractional r, where the likeliest or equally likeliest frequency is 0, the histogram of a Poisson set of frequencies is high on the left and skewed toward the right. For the Prussian horse data, above, where r = 0.61, it looks like this:

[Histogram for r = 0.61: p(k) ≈ 0.543, 0.331, 0.101, 0.021, 0.003, 0.000 for k = 0, 1, 2, 3, 4, 5.]

As the average frequency (r) increases, the histogram becomes a little humpier in the middle (see
the Poisson Table for an overview of the pattern up to r = 20), but it never becomes perfectly symmetrical,
and thus it never loses its distinctive character as a distribution. That character, however,
does weaken with increasing r.

Poisson Paper
Poisson paper is specially printed for the easy analysis of raw data. If you plot data points on Poisson
paper, they will lie on a vertical line if the set is random in the sense assumed by the Poisson formula. If
the resulting line is not vertical, then to that degree, the data set is non-Poisson.

Types of Problem
The situations to which Poisson distributions apply are diverse, and it is not always easy to see at first
glance that they are specimens of one underlying type. We give here examples of three common types of
Poisson problem. These sample problems will be repeated on the Practice page, along with other
problems of the same general type.

Keep in mind that all we have to work with are (1) a rate of occurrence, r, which may be any number; (2) a window of observation: a timespan or a space within which occurrences are observed; and (3) the number of times the event, as seen through that given window, is repeated.

Isolated Events
It has been observed that the average number of traffic accidents on the Hollywood Freeway between 7 and 8 AM on Wednesday mornings is 1 per hour. What is the chance that there will be 2 accidents on the Freeway, on some specified Wednesday morning?

Answer. The basic rate is r = 1 (in hour units), and our window is 1 hour. We wish to know the
chance of observing 2 events in that window. The rate r = 1 is included in the Poisson Table, so we don't
have to calculate anything. Reading down the r = 1 column, we come to the p(2) row, and there we find
that the probability of 2 accidents is 0.1839, or a little less than 1 chance in 5. It's not unlikely. You might
get that situation about once a week.

Proportions
Coliform bacteria are randomly distributed in a certain Arizona river at an average concentration of 1 per
20cc of water. If we draw from the river a test tube containing 10cc of water, what is the chance that the
sample contains exactly 2 coliform bacteria?

Answer. Our window of observation is 10cc. If the concentration is 1 per 20cc, it is also 0.5 per 10cc; that is just another way of saying the same thing. So r = 0.5 is the rate relevant to our chosen window (if we used a 20cc test tube, or window, the rate would be different, and the resulting frequency profile would also be different). We can then read off any desired probability from the r = 0.5 column of the Poisson Table. For the specific value of p(2), the table supplies the answer 0.0758, or about 1 chance in 13. Not common, but not out of the question either. About once in 13 tries with that unit of observation.

Arrivals
The switchboard in a small Denver law office gets an average of 2.5 incoming phonecalls during the noon
hour on Thursdays. Staffing is reduced accordingly; people are allowed to go out for lunch in rotation.
Experience shows that the assigned levels are adequate to handle a high of 5 calls during that hour. What
is the chance that 6 calls will be received in the noon hour, some particular Thursday, in which case the
firm might miss an important call?

Answer. The rate is 2.5, and the window of observation is 1 hour. The desired result is easily read off
the Poisson Table, from the p(6) row of the r = 2.5 column. The answer is p(6) = 0.0278, or about 1
chance in 36, or a little more than 1 missed phonecall per month. How acceptable that is will depend on
how cranky the firm's clients are, and the firm itself is in the best position to make that judgement.

Approximation to Binomial
Besides handling Poisson problems proper, the Poisson Distribution can give a useful simulation of the
Binomial Distribution when p is small (one rule of thumb is that it should be no greater than 0.1). In these
cases, q is known (as in true Poisson problems it is not), but it is simply discarded; we pay no attention to
it. In the range where the Poisson approximation is reasonably close, it is much less difficult to calculate,
and is often preferred in practice.

Sample Binomial Problem


Rick has a crooked quarter, which comes up Heads 80% of the time. He tells Jimmie he will get 7 or more
Heads in 10 tosses. Jimmie bets the family horse that there will be 6 or fewer Heads. What is Jimmie's
chance of riding home from the wager?
This can be worked out by Binomial methods, which are the ones strictly proper to it. To adjust to Poisson
perspectives, we take as r the rate of the rarer event (T = Tails), with an average rate of 2 per 10 tosses
or 0.2. This exceeds the above recommended level, but we will go ahead anyway, just to see what
happens.
We are making a trial observation for another 10 tosses. The expectation for those 10 tosses is ten times
the rate for one toss; hence we expect T = 2, and the rate also becomes 2 (per set of 10). Rick bet on
between 3 and 0 Tails, so Jimmie wins only if the 10 tosses yield 4 or more Tails. From the Poisson
Table for r = 2, we find that the sum of probabilities p(4) through p(9) gives 0.1428, or about 1 in 7, as
Jimmie's chance of winning. (There is no value for p(10), that frequency being so small that it does not
show up in four place decimals, so it is not included in the table).
If we go back and do this over as a Binomial problem, we would have n = 10 (there are 10
tosses), and p(T) = 0.2000 (the coin comes up Tails, on average, 2 times out of 10). The exact Binomial
answer for Jimmie's chance of winning (to four places) is 0.1209. The Poisson approximation was 0.1428.
The Poisson approximation in this case is 18% high; that is, it is only roughly right. This is the
consequence of our having exceeded the recommended figure of p = 0.1. This may remind us that
Poisson is not an easier way of getting any and all Binomial results. It is a different animal, one which
under certain conditions leaves similar tracks as it lopes on its own errands through the statistical woods.
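
Both numbers in this example are easy to verify; a minimal Python sketch:

    from math import comb, exp, factorial

    n, p_tails = 10, 0.2
    # Exact Binomial: Jimmie wins on 4 or more Tails in 10 tosses
    exact = sum(comb(n, k) * p_tails**k * (1 - p_tails)**(n - k)
                for k in range(4, n + 1))
    # Poisson approximation with rate r = n * p_tails = 2
    r = n * p_tails
    approx = sum(r**k / (factorial(k) * exp(r)) for k in range(4, n + 1))
    print(round(exact, 4), round(approx, 4))  # 0.1209 and roughly 0.1428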

Summary

The Poisson distribution deals with mutually independent events, occurring at a known
and constant rate r per unit (of time or space), and observed over a certain unit of time or
space.

The probability of k occurrences in that unit can be calculated from p(k) = r^k / (k!)(e^r).

The rate r is the expected or most likely outcome (for whole number r greater than 1, the
outcome corresponding to r-1 is equally likely).

The frequency profile of Poisson outcomes for a given r is not symmetrical; it is skewed
more or less toward the high end.

For Binomial situations with p < 0.1 and reasonably many trials, the Poisson Distribution
can acceptably mimic the Binomial Distribution, and is easier to calculate.

Problems
The first of these items is mandatory, for practice with the above Poisson explanation (you don't know it until you can do it). The second and third are problems dealing with the number e, which plays a fundamental role in questions of this type. The rest are a mix of Sinological and standard puzzlements, some of which have turned up elsewhere in this site, and are gathered here for their recreational value.

Poisson Distribution
Definition 1: The Poisson distribution has pdf given by

f(x) = e^−μ μ^x / x!

The parameter μ is often replaced by λ.

[Figure 1: Poisson Distribution]

Observation: Some key statistical properties of the Poisson distribution are:

Mean = μ
Median ≈ ⌊μ + 1/3 − 0.02/μ⌋
Skewness = 1/√μ
Kurtosis = 1/μ

Excel Function: Excel provides the following function for the Poisson distribution:

POISSON(x, μ, cum) where μ = the mean of the distribution and cum takes the values TRUE and FALSE

POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ.

POISSON(x, μ, TRUE) = cumulative probability distribution function F(x) at the value x for the Poisson distribution with mean μ.

Excel 2010/2013 provide the additional function POISSON.DIST, which is equivalent to POISSON.

Real Statistics Function: Excel doesn't provide a worksheet function for the inverse of the Poisson distribution. Instead you can use the following function provided by the Real Statistics Resource Pack.

POISSON_INV(p, μ) = smallest integer x such that POISSON(x, μ, TRUE) ≥ p

Note that the maximum value of x is 1,024,000,000. A value higher than this indicates an error.
Theorem 1: If the probability p of success of a single trial approaches 0 while the number of trials n approaches infinity and the value μ = np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ.

Click here for the proof of this theorem.

Observation: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.
Example 1: A company produces high precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units what is the probability of having more than 3 defects?

We can solve this problem using the distribution B(4000, .0005); namely, the desired probability is

1 − BINOMDIST(3, 4000, .0005, TRUE) = 1 − 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

μ = np = 4000(.0005) = 2

1 − POISSON(3, 2, TRUE) = 1 − 0.857123 = 0.142877

As you can see the approximation is quite accurate.
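
The same computation can be done outside Excel; a minimal Python sketch:

    from math import comb, exp, factorial

    n, p = 4000, 0.0005
    binom_tail = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))
    mu = n * p  # = 2
    poisson_tail = 1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(4))
    print(round(binom_tail, 6), round(poisson_tail, 6))  # 0.142831 and 0.142877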

Observation: If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ and the arrival times are random without any tendency to bunch up, then the probability of x events occurring in an hour is given by

P(x) = e^−μ μ^x / x!

Example 2: A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will run out of MP3 players in a week if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week's demand?

The probability that they will sell at most 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Thus, the answer to the first problem is 1 − 0.977331 = 0.022669, or about 2.3%. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson is 0.998293, which is too high. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is a little too high, and so we try 123. This yields 0.988756, which is a little too low, and so we finally arrive at 124, which has cumulative Poisson of 0.991226.
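
Rather than guessing by hand, the same search is easy to automate; a minimal Python sketch mirroring what POISSON_INV does:

    from math import exp, factorial

    def poisson_cdf(x, mu):
        return sum(exp(-mu) * mu**k / factorial(k) for k in range(x + 1))

    mu, target = 100, 0.99
    x = 0
    while poisson_cdf(x, mu) < target:  # smallest x with cumulative prob >= 99%
        x += 1
    print(x, round(poisson_cdf(x, mu), 6))  # 124 and about 0.991226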
Observation: We have observed that under the appropriate conditions the binomial distribution can be approximated by either the Poisson or normal distribution. We conclude this section by stating that the Poisson distribution can be approximated by the normal distribution.

Theorem 2: For n sufficiently large (usually n ≥ 20), if x has a Poisson distribution with mean μ, then x ~ N(μ, √μ).
14 Responses to Poisson Distribution

Anson says:
August 10, 2015 at 11:30 am

Hi. May I ask a question? When n (from 5 to 10 and 20) increases, what happens on the probability distribution graph? (binomial, Poisson and normal)

Charles says:
August 10, 2015 at 9:48 pm

It really depends on what happens with the other parameters.

Binomial: If the other parameters in the BINOMDIST function are held constant, then the cumulative distribution values decrease (e.g. compare BINOMDIST(4, n, .7, TRUE) for n = 5, 10, 20).
Poisson: If you assume that the mean of the distribution μ = np, then the cumulative distribution values decrease (e.g. compare POISSON(2, np, TRUE) where p = .5 for n = 5, 10, 20).
Normal: It really depends on how you are going to use n since NORMDIST doesn't directly use n.

Normal Distribution
Data can be "distributed" (spread out) in different ways.
It can be spread out
more on the left

Or more on the right

Or it can be all jumbled up

But there are many cases where the data tends to be around a central value
with no bias left or right, and it gets close to a "Normal Distribution" like this:

A Normal Distribution
The "Bell Curve" is a Normal Distribution. And the yellow histogram shows some data that follows it closely, but not perfectly (which is usual).
It is often called a "Bell Curve" because it looks like a bell.

Many things closely follow a Normal Distribution:

heights of people

size of things produced by machines

errors in measurements

blood pressure

marks on a test

We say the data is "normally distributed":

The Normal Distribution has:

mean = median = mode

symmetry about the center

50% of values less than the mean and 50% greater than the mean

Quincunx
You can see a normal distribution being created by random
chance!
It is called the Quincunx and it is an amazing machine.
Have a play with it!

Standard Deviations
The Standard Deviation is a measure of how spread out numbers are (read
that page for details on how to calculate it).
When we calculate the standard deviation we find that (generally):

68% of values are within 1 standard deviation of the mean

95% of values are within 2 standard deviations of the mean

99.7% of values are within 3 standard deviations of the mean

Example: 95% of students at school are between 1.1m and 1.7m tall.

Assuming this data is normally distributed can you calculate the mean and
standard deviation?
The mean is halfway between 1.1m and 1.7m:

Mean = (1.1m + 1.7m) / 2 = 1.4m


95% is 2 standard deviations either side of the mean (a total of 4 standard
deviations) so:

1 standard deviation

= (1.7m-1.1m) / 4
= 0.6m / 4
= 0.15m
And this is the result:

It is good to know the standard deviation, because we can say that any value is:

likely to be within 1 standard deviation (68 out of 100 should be)

very likely to be within 2 standard deviations (95 out of 100 should be)

almost certainly within 3 standard deviations (997 out of 1000 should be)

Standard Scores
The number of standard deviations from the mean is also called the
"Standard Score", "sigma" or "z-score". Get used to those words!

Example: In that same school one of your friends is 1.85m tall

You can see on the bell curve that 1.85m is 3 standard deviations from the mean of 1.4m, so:

Your friend's height has a "z-score" of 3.0

It is also possible to calculate how many standard deviations 1.85 is from the
mean
How far is 1.85 from the mean?
It is 1.85 - 1.4 = 0.45m from the mean
How many standard deviations is that? The standard deviation is 0.15m, so:
0.45m / 0.15m = 3 standard deviations
So to convert a value to a Standard Score ("z-score"):

first subtract the mean,

then divide by the Standard Deviation

And doing that is called "Standardizing":

We can take any Normal Distribution and convert it to The Standard Normal
Distribution.
Example: Travel Time

A survey of daily travel time had these results (in minutes):


26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
The Mean is 38.8 minutes, and the Standard Deviation is 11.4
minutes (you can copy and paste the values into the Standard Deviation
Calculator if you want).
Convert the values to z-scores ("standard scores").

To convert 26:

first subtract the mean: 26 - 38.8 = -12.8,


then divide by the Standard Deviation: -12.8/11.4 = -1.12
So 26 is -1.12 Standard Deviations from the Mean

Here are the first three conversions:

Original Value    Calculation            Standard Score (z-score)
26                (26 - 38.8) / 11.4 =   -1.12
33                (33 - 38.8) / 11.4 =   -0.51
65                (65 - 38.8) / 11.4 =   +2.30
...               ...                    ...

And here they are graphically:

You can calculate the rest of the z-scores yourself!
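
Or let Python do them all; a short sketch using the mean of 38.8 and standard deviation of 11.4 from above:

    times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
             26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
    mean, sd = 38.8, 11.4

    for x in times:
        z = (x - mean) / sd  # subtract the mean, then divide by the SD
        print(f"{x} -> {z:+.2f}")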

Here is the formula for z-score that we have been using:

z = (x − μ) / σ

z is the "z-score" (Standard Score)

x is the value to be standardized

μ is the mean

σ is the standard deviation

Why Standardize ... ?


It can help us make decisions about our data.
Example: Professor Willoughby is marking a test.

Here are the students' results (out of 60 points):


20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17

Most students didn't even get 30 out of 60, and most will fail.
The test must have been really hard, so the Prof decides to Standardize all the
scores and only fail people 1 standard deviation below the mean.
The Mean is 23, and the Standard Deviation is 6.6, and these are the
Standard Scores:
-0.45, -1.21 , 0.45, 1.36, -0.76, 0.76, 1.82, -1.36 , 0.45, -0.15, -0.91
Only 2 students will fail (the ones who scored 15 and 14 on the test)
It also makes life easier because we only need one table (the Standard Normal
Distribution Table ), rather than doing calculations individually for each value of
mean and standard deviation.

In More Detail
Here is the Standard Normal Distribution with percentages for every half of a
standard deviation, and cumulative percentages:

Example: Your score in a recent test was 0.5 standard deviations above the
average, how many people scored lower than you did?

Between 0 and 0.5 is 19.1%

Less than 0 is 50% (left half of the curve)

So the total less than you is:

50% + 19.1% = 69.1%


In theory 69.1% scored less than you did (but with real data the percentage
may be different)

A Practical Example: Your company packages sugar in 1 kg bags.
When you weigh a sample of bags you get these results:

1007g, 1032g, 1002g, 983g, 1004g, ... (a hundred measurements)

Mean = 1010g

Standard Deviation = 20g

Some values are less than 1000g ... can you fix that?
The normal distribution of your measurements looks like this:

31% of the bags are less than 1000g,


which is cheating the customer!

It is a random thing, so we can't stop bags having less than 1000g, but we can
try to reduce it a lot.
Let's adjust the machine so that 1000g is:

at −3 standard deviations:
From the big bell curve above we see that 0.1% are less. But maybe that is too small.

at −2.5 standard deviations:
Below −3 is 0.1%, and between −3 and −2.5 standard deviations is 0.5%; together that is 0.1% + 0.5% = 0.6% (a good choice I think)

So let us adjust the machine to have 1000g at −2.5 standard deviations from the mean.
Now, we can adjust it to:

increase the amount of sugar in each bag (which changes the mean), or

make it more accurate (which reduces the standard deviation)

Let us try both.

ADJUST THE MEAN AMOUNT IN EACH BAG

The standard deviation is 20g, and we need 2.5 of them:

2.5 × 20g = 50g

So the machine should average 1050g, like this:
ADJUST THE ACCURACY OF THE MACHINE

Or we can keep the same mean (of 1010g), but then we need 2.5 standard deviations to be
equal to 10g:

10g / 2.5 = 4g
So the standard deviation should be 4g, like this:
(We hope the machine is that accurate!)
Or perhaps we could have some combination of the two: a slightly higher mean and a slightly smaller standard deviation.
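
The whole adjustment can be reproduced with the Python standard library's NormalDist; a minimal sketch:

    from statistics import NormalDist

    # Current machine: mean 1010g, standard deviation 20g
    print(NormalDist(1010, 20).cdf(1000))  # about 0.31 of bags are under 1000g

    # Option 1: raise the mean so 1000g sits 2.5 SDs below it
    print(NormalDist(1050, 20).cdf(1000))  # about 0.006

    # Option 2: keep the mean at 1010g, tighten the SD to 4g
    print(NormalDist(1010, 4).cdf(1000))   # about 0.006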