
PROBABILITY DISTRIBUTIONS

Dr. Onoja Matthew AKPA


Department of Epidemiology and Medical Statistics,
Faculty of Public Health, College of Medicine,
University of Ibadan

1
Probability Distributions
A probability distribution describes how likely
each possible value of a variable (an event or
indicator) is to occur in a sample of the
population.

Knowledge of the probability distribution of a
variable allows us to come to conclusions
about a population based on data taken from
a sample of that population.
2
Discrete Probability Distributions
i. Geometric distribution
ii. Binomial distribution
iii. Poisson distribution
iv. Hypergeometric distribution
v. Negative binomial distribution

3
Binomial Distribution
Properties:
Each trial has only two possible outcomes (dichotomous)
The trial is repeated n times (at least twice)
Successive trials are independent
The probabilities of success (p) and failure (1 - p)
are the same in each trial
The random variable X is the number of successes
in the n trials.
4
Binomial Formula

$$\Pr(r \text{ successes}) = \frac{n!}{r!\,(n-r)!}\, p^{r} q^{n-r}$$
where $q = 1 - p$ and
$n!$ (n factorial) $= n(n-1)(n-2)(n-3)\cdots 2 \times 1$

5
Factorial !
0! =1
1! =1
2! = 2x1
3! = 3x2x1
4! = 4x3x2x1
..

6
Example
Suppose that 24% of a certain population have
blood group O. For a sample of 20 drawn from
this population, find the probability that:
i. Exactly three persons have blood group O
ii. Three or more persons have blood group O
iii. Exactly five persons have blood group O
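
The three probabilities in this example can be computed directly from the binomial formula. The sketch below is one possible way to do so in Python; the function name `binomial_pmf` and the variable names are illustrative choices, not part of the original slides.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.24  # sample size and P(blood group O) from the example

p_exactly_3 = binomial_pmf(3, n, p)
# "Three or more" is the complement of "two or fewer".
p_3_or_more = 1 - sum(binomial_pmf(r, n, p) for r in range(3))
p_exactly_5 = binomial_pmf(5, n, p)

print(f"(i)   P(X = 3)  = {p_exactly_3:.4f}")
print(f"(ii)  P(X >= 3) = {p_3_or_more:.4f}")
print(f"(iii) P(X = 5)  = {p_exactly_5:.4f}")
```

Part (ii) uses the complement rule, which needs only three terms of the formula instead of eighteen.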

7
Exercise
Suppose it is known that the probability of
recovery from a certain disease is 0.4. If 15
randomly selected people are affected by the
disease, what is the probability that:
i. At least 1 will recover
ii. Four or more will recover
iii. Fewer than three will recover

8
Poisson Distribution
Distribution of a discrete random variable
X with parameter $\lambda$, the average number
of occurrences of an event in a given
space, time or volume.

9
Poisson Process
- An experiment in which discrete events are
observed in a continuous interval of
time, space or volume such that
(i) The probability of exactly one occurrence in
any short interval t is very small
(ii) The trial size is large and events are rare

10
POISSON DISTRIBUTION
Useful for rare events
Trial size is large
Events occur at random (in space or time)
Probability of occurrence is very small (in a
short interval it approaches zero)

11
Poisson Formula
$$\Pr(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

where $x = 0, 1, 2, 3, \ldots$ (number of occurrences of some
random event)

$e = 2.7183$, a constant

$\lambda$ is the mean number of occurrences


12
Example
Suppose that over a period of several years,
the average number of deaths per year from a certain
disease has been 10. If the number of deaths
from this disease follows the Poisson
distribution, what is the probability that
during the current year:
a) Exactly seven people will die from the disease
b) There will be no deaths from the disease
c) At least one person will die from the disease

13
Example

$$\Pr(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \quad \lambda = 10$$

(a) $P(X = 7) = \frac{2.718^{-10} \times 10^{7}}{7!} \approx 0.09$
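
Parts (b) and (c) follow from the same formula. As a minimal sketch, assuming a mean of 10 deaths per year as stated in the example, all three parts could be checked in Python as follows; the name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean number is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 10  # mean number of deaths per year from the example

p_seven = poisson_pmf(7, lam)   # (a) exactly seven deaths
p_none = poisson_pmf(0, lam)    # (b) no deaths
p_at_least_one = 1 - p_none     # (c) complement of "no deaths"

print(f"(a) P(X = 7)  = {p_seven:.4f}")
print(f"(b) P(X = 0)  = {p_none:.6f}")
print(f"(c) P(X >= 1) = {p_at_least_one:.6f}")
```

Part (c) again uses the complement rule, since "at least one" covers every outcome except zero.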

14
Exercise
If the mean number of serious accidents per
year in a drug manufacturing company is 5,
find the probability that in the current year
there will be:
a) Fewer than 2 accidents
b) No accidents
c) Exactly seven accidents
d) Two or more accidents

15
Continuous probability distributions
Normal distribution
Exponential distribution
Weibull distribution
Gamma distribution

16
The Normal Distribution
Normal distributions are a family of
distributions that have the same general shape.
They are symmetric, with scores more
concentrated in the middle than in the tails.
The normal distribution is a bell-shaped curve
with both the mean and the median at the
center of the curve.

17
The Normal Distribution
The standard normal distribution is a
distribution of data with a mean of zero and a
standard deviation of one.
It allows different populations to be compared to
each other.
Formula: The formula below is used to
calculate the standard score, or z score,
when comparing normally distributed
populations: $z = (x - \mu)/\sigma$
18
Normal Distribution is bell-shaped

19
Parameters of the normal distribution
The height of a normal distribution can be
specified mathematically in terms of two
parameters:
the mean ($\mu$) and
the standard deviation ($\sigma$)

20
The General Normal distribution
Equation

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

Hence we write $x \sim N(\mu, \sigma)$.
$\pi$ is the constant 3.14159, and $e$ is the base of
natural logarithms and is equal to 2.718282.
$x$ can take on any value from -infinity to
+infinity.
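
To see how the mean and standard deviation determine the height of the curve, the density can be evaluated numerically. The sketch below is illustrative only: the function name `normal_pdf` and the choice of a mean of 0 and a standard deviation of 1 are assumptions made for the demonstration. It also approximates the total area under the curve with a simple trapezoidal sum.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 0.0, 1.0  # illustrative values (assumption), not from the slides

# The curve is highest at the mean and symmetric around it.
peak = normal_pdf(mu, mu, sigma)
left = normal_pdf(mu - 1.5, mu, sigma)
right = normal_pdf(mu + 1.5, mu, sigma)

# Cross-check against the standard library implementation.
reference_peak = NormalDist(mu, sigma).pdf(mu)

# Trapezoidal approximation of the total area over mu +/- 8 sigma.
steps = 10_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
h = (hi - lo) / steps
inner = sum(normal_pdf(lo + i * h, mu, sigma) for i in range(1, steps))
area = h * (inner + 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma)))

print(f"peak height = {peak:.4f}")
print(f"total area  = {area:.6f}")
```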
21
Properties of the normal curve
It is bell-shaped
It is specified by two parameters: the population mean
and the standard deviation.
It is symmetrical around the mean, bell-shaped, and
unimodal. This is why the normal curve is frequently
referred to as the bell curve.
The mean, median, and mode, are all in the middle of
the curve.
The total area under the curve above the x-axis is one
square unit with 50% of the area to the right of the
mean and 50% to the left of the mean.
22
Properties of the normal curve
The area bounded by one standard deviation to the right
and one standard deviation to the left of the mean
represents approximately 68% of the values.
The area bounded by two standard deviations to the
right and two to the left represents approximately
95% of the values.
Approximately 99.7% of the values will be within three standard
deviations of the mean. This is demonstrated in the
graph on the next page:
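
These three percentages can also be checked numerically. For any normal distribution, the area within k standard deviations of the mean equals erf(k/√2), where erf is the error function available in Python's `math` module. The sketch below uses that identity; the function name `area_within` is an illustrative choice.

```python
from math import erf, sqrt

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```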

23
The Normal Curve

24
Simple applications
Given that the mean height of men in this class is
171.5 cm ($\mu$ = 171.5 cm) and the standard
deviation is 6.5 cm ($\sigma$ = 6.5 cm), using our
knowledge of the normal curve, the following
information will be true:
68.3% of men are within $\mu \pm 1\sigma$, which implies that
68.3% of men are between 165 cm and 178 cm
($\mu \pm 1\sigma = 171.5 \pm 6.5$)

25
Simple applications
Also,
95.5% of men are within $\mu \pm 2\sigma$, which implies that
95.5% of men are between 158.5 cm and 184.5 cm
($\mu \pm 2\sigma = 171.5 \pm 2 \times 6.5$)
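
The two ranges above come from the same arithmetic, mean plus or minus k standard deviations. A minimal sketch, using the class mean and standard deviation from the slides and a hypothetical helper named `range_within`:

```python
mu, sigma = 171.5, 6.5  # mean and standard deviation of heights in cm

def range_within(k: int) -> tuple:
    """Lower and upper limits of the interval mu +/- k * sigma."""
    return mu - k * sigma, mu + k * sigma

for k in (1, 2):
    low, high = range_within(k)
    print(f"mean +/- {k} SD: {low} cm to {high} cm")
```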

Exercise:
What is the range containing 99.7% of the men?

26
Standard Normal Distribution
This is a normal distribution with a mean of 0
and a standard deviation of 1.
The standard normal random variable z is given
as
$$z = \frac{X - \mu}{\sigma}$$
where:
X is a score from the original normal distribution,
$\mu$ is the mean of the original normal distribution,
and
$\sigma$ is the standard deviation of the original normal
distribution.
Hence we write $z \sim N(0, 1)$
27
Standard Normal distribution
Sometimes called the z distribution, it is
used to answer specific questions (in
the above example) such as:
1. What proportion of men are over
180 cm?
2. What is the probability of selecting
a person whose height is between
160 cm and 178 cm?
28
Example
Given a normal distribution of male heights
with $\mu$ = 171.5 cm and $\sigma$ = 6.5 cm, what is the
proportion of men taller than 180 cm?
$$z = \frac{x - \mu}{\sigma} = \frac{180 - 171.5}{6.5} = \frac{8.5}{6.5} \approx 1.31$$
Now that we know the z score, we must find
the area of the standard normal curve above
1.31.
29
Example

30
Step by step approach
In order to find the area of the curve that is
represented by the z score, 1.31, we must refer
to the standard normal z distribution table.
On the Standard Normal z Table, locate the z
score 1.31. Under the column labeled z, find
the value 1.3.
The row labeled z gives the
hundredths place of your z score, so follow it
across to 0.01.
31
Step by step approach
If you place one finger on 1.3 and one
finger on 0.01 and follow those paths until
your two fingers meet, you find the value
0.9049.
Use the excerpt from the Standard Normal z
Table on the following page to help you locate
the z score.

32
The Normal distribution Table

33
Interpretation
This table will give us the area of the curve
located to the LEFT of the z score.
As you can see by the diagram, we want to
find the area of the curve located to the
RIGHT of the z score.
To find the area to the right of the z score,
we subtract 0.9049 from 1.
i.e. 1 - 0.9049 = 0.0951
34
Interpretation
Therefore, approximately 9.5% (0.0951 x
100%) of the area under the curve is above 180 cm (or
more than 1.31 SD above the mean).
We can also say that a man who is 180 cm tall
is taller than about 90.5% of men in
this class.
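
The table lookup can be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8) in place of the printed z table. The z score is rounded to two decimal places to match the table. The last lines apply the same idea to the earlier question about heights between 160 cm and 178 cm, which the slides pose but do not work through, so that part is an added illustration.

```python
from statistics import NormalDist

mu, sigma = 171.5, 6.5  # mean and standard deviation of heights in cm
x = 180

z = round((x - mu) / sigma, 2)   # 1.31, rounded as for a z table
std_normal = NormalDist()        # standard normal: mean 0, SD 1

area_left = std_normal.cdf(z)    # area to the LEFT of z (the table value)
area_right = 1 - area_left       # area to the RIGHT of z

print(f"z = {z}")
print(f"area to the left  = {area_left:.4f}")
print(f"area to the right = {area_right:.4f}")

# Added illustration: probability of a height between 160 cm and 178 cm.
heights = NormalDist(mu, sigma)
p_between = heights.cdf(178) - heights.cdf(160)
print(f"P(160 < X < 178) = {p_between:.4f}")
```

Using the unrounded z score gives a slightly different tail area, which is why software results can differ from table results in the third or fourth decimal place.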

35
Importance of the Normal distribution
Fits many practical data distributions in
Medicine
Binomial distribution can be approximated by
a normal when n is large
It is the cornerstone of all parametric tests of
statistical significance
It is the foundation of other distributions (e.g.
chi-square, F-distribution, t-distribution, etc.)

36
Thank you
for listening

37
