
Debre Tabor University

College of Health Sciences


Social and Public Health Unit

Probability and
Probability Distributions
By: Marelign Tilahun

(Assistant Professor, MPH /Epidemiology & Biostatistics)


Probability is a measure of how likely it is for an event to happen.

Probability is the language of chance.

Probability theory was developed out of attempts to solve problems related to games of chance, such as tossing a coin or rolling a die, i.e. attempts to quantify personal beliefs regarding degrees of uncertainty.
Definition:

1. Classical Probability: An event's probability is the ratio of the number of favorable outcomes to the number of possible outcomes in an experiment whose outcomes are equally likely (m/n).

m = number of favorable outcomes (successes) & n = number of possible outcomes

E.g. the probability of getting a head when tossing a fair coin is 1/2 = 0.5.


Examples:
1. Roll a fair die
2. Select a simple random sample (SRS) of size 2 from a population
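
The m/n definition can be sketched in a few lines of Python. This is only an illustration under the stated assumption that all outcomes are equally likely; the function name `classical_probability` and the "even number" event are hypothetical choices, not part of the lecture.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return m/n, assuming every outcome in sample_space is equally likely."""
    # m = outcomes of the sample space that belong to the event
    favorable = [outcome for outcome in sample_space if outcome in event]
    # n = all possible outcomes
    return Fraction(len(favorable), len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = classical_probability({"H"}, coin)      # head on one toss of a fair coin
p_even = classical_probability({2, 4, 6}, die)   # hypothetical event: even face on a fair die

print(p_head, float(p_head))
print(p_even)
```

Using `Fraction` keeps the result as an exact ratio, which mirrors the m/n notation used above.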

2. Relative frequency probability: The probability of an event is the proportion of times it occurs when exactly the same experiment is repeated a very large number of times in independent trials.

Relative Frequency = (number of times that the event occurs) / (number of trials)

Example:

No.     Freq    Relative freq
1       25      25/250 = 0.1
2       34      34/250 = 0.136
3       32      32/250 = 0.128
4       30      30/250 = 0.12
5       34      34/250 = 0.136
6       95      95/250 = 0.38
Total   250 trials
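
As a rough sketch, the relative frequencies in the example can be recomputed from the counts. The dictionary below assumes the first column is the outcome number and the second is its observed frequency; the variable names are illustrative only.

```python
# Observed frequencies from the example (outcome number -> count).
counts = {1: 25, 2: 34, 3: 32, 4: 30, 5: 34, 6: 95}

# The number of trials is the sum of all counts.
n_trials = sum(counts.values())

# Relative frequency = number of times the outcome occurs / number of trials.
rel_freq = {outcome: count / n_trials for outcome, count in counts.items()}

for outcome, p in rel_freq.items():
    print(f"{outcome}: {counts[outcome]}/{n_trials} = {p:.3f}")

print("sum of relative frequencies:", round(sum(rel_freq.values()), 10))
```

The relative frequencies always add up to 1, because every trial produces exactly one of the outcomes.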
3. Subjective probability:
A subjective probability is an individual's degree of belief in the occurrence of an event.

Example:
If someone says that he is 95% certain that a cure for AIDS will be discovered within 5 years, then he means that Pr(discovery of a cure for AIDS within 5 years) = 95%.
Definition of terms

Experiment: Any activity from which results are obtained.


A random experiment is one in which the outcomes cannot be predicted
with certainty.

Examples:
1. Flip a coin, Flip a coin 3 times, Roll a die
2. Draw a SRS of size 50 from a population
Sample Space: The set of all possible outcomes of an experiment

Events: Collections of outcomes from the sample space.


Venn Diagram: Graphical representation of sample space and events
Mutually exclusive events and the additive law of probability:

Two events A and B are mutually exclusive if they have no elements in common.

Thus, if A and B are mutually exclusive events, Pr(A or B) = Pr(A) + Pr(B).

E.g. One die is rolled. Sample space = S = {1, 2, 3, 4, 5, 6}

Let A = the event an odd number turns up, A = {1, 3, 5}

Let B = the event 1, 2 or 3 turns up, B = {1, 2, 3}

Let C = the event 2 turns up, C = {2}


i. Find Pr (A), Pr (B) and Pr (C)

ii. Are A and B; A and C; B and C mutually exclusive?

Answers:

i. Pr(A) = Pr(1) + Pr(3) + Pr(5) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2

Pr(B) = Pr(1) + Pr(2) + Pr(3) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2

Pr(C) = Pr(2) = 1/6

ii. A and B are not mutually exclusive, because they have the elements 1 and 3 in common.
Similarly, B and C are not mutually exclusive; they have the element 2 in common.
A and C are mutually exclusive; they don't have any element in common.

When A and B are not mutually exclusive, Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B). For the events A and B in the die example, Pr(A or B) = 1/2 + 1/2 - 2/6 = 2/3.
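
The die example can be checked with Python sets. This is a minimal sketch assuming a fair die (each face has probability 1/6); the helper names `pr` and `mutually_exclusive` are made up for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {1, 3, 5}            # an odd number turns up
B = {1, 2, 3}            # 1, 2 or 3 turns up
C = {2}                  # 2 turns up

def pr(event):
    """Probability of an event, assuming all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive if they share no outcomes."""
    return e1.isdisjoint(e2)

print(pr(A), pr(B), pr(C))
print(mutually_exclusive(A, B), mutually_exclusive(A, C), mutually_exclusive(B, C))

# Mutually exclusive events: probabilities simply add.
p_a_or_c = pr(A) + pr(C)

# Events that overlap: subtract the probability of the overlap (A and B).
p_a_or_b = pr(A) + pr(B) - pr(A & B)

print(p_a_or_c, p_a_or_b)
```

In both cases the result agrees with counting the outcomes of the union directly, i.e. `pr(A | C)` and `pr(A | B)`.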
Properties of probability
1. Probabilities are real numbers on the interval from 0 to 1.

2. If two events A and B are mutually exclusive, then Pr(A or B) = Pr(A) + Pr(B).

3. If A and B are two events that are not mutually exclusive, then Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B).

4. The sum of the probabilities that an event will occur and that it will not occur is equal to 1; hence, P(A) = 1 - P(not A).


Probability distribution

The term probability distribution refers to the collection of all possible outcomes along with their probabilities.

Every random variable has a corresponding probability distribution.

A probability distribution of a random variable can be displayed by a table, a graph or a mathematical formula.
I. Probability distribution of a categorical variable
Specifies all possible outcomes of the categorical variable along
with the probability that each will occur.

E.g. Consider the value on the face showing up from tossing a die.
The probability distribution of this variable is

Value on Face    1     2     3     4     5     6

Probability      1/6   1/6   1/6   1/6   1/6   1/6

Notice that the total probability is 1.
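
A small sketch of this table in Python, assuming a fair die; the name `die_dist` is illustrative. It checks the two things a probability distribution must satisfy: every probability lies between 0 and 1, and the probabilities sum to 1.

```python
from fractions import Fraction

# Probability distribution of the face value of a fair die (value -> probability).
die_dist = {face: Fraction(1, 6) for face in range(1, 7)}

# Every probability must lie on the interval from 0 to 1.
all_valid = all(0 <= p <= 1 for p in die_dist.values())

# The probabilities of all possible outcomes must add up to 1.
total = sum(die_dist.values())

print(all_valid, total)
```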


II. Probability distribution of a discrete variable
Probability distributions can be estimated from relative
frequencies. Consider the number of televisions per household (X)
from US survey data.

1,218 / 101,501 = 0.012

e.g. P(X=4) = P(4) = 0.076 = 7.6%


III. Probability distribution of continuous variables
E.g. Suppose X represents the continuous variable height; an individual is rarely exactly 170 cm tall.

X can assume an infinite number of intermediate values: 170.1, 170.2, 170.3, etc.

Because a continuous random variable X can take on an infinite number of values, the probability associated with any one particular value is effectively zero.

However, the probability that X will assume some value in the interval between two values, say x1 and x2, can be determined: it is the area under the curve between x1 and x2.
The Normal Distribution
It is the probability distribution of continuous variables and has
an especially important role in statistics.

Characteristics of the normal distribution:

1) It extends from minus infinity (-∞) to plus infinity (+∞).

2) It is unimodal, bell-shaped and symmetrical about x = μ.

3) It is determined by two quantities: its mean (μ) and SD (σ).

4) The height of the frequency curve cannot be taken as the probability of a particular value.
The Normal Distribution
The formula that generates the normal probability distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \quad \text{for } -\infty < x < \infty$$

where e ≈ 2.7183 and π ≈ 3.1416; μ and σ are the population mean and standard deviation.

The shape and location of the normal curve change as the mean and standard deviation change.
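
A minimal sketch of the normal density, written directly from the formula f(x) = exp(-0.5 ((x - μ)/σ)²) / (σ √(2π)) and compared with `statistics.NormalDist` from the Python standard library. The parameter values (means of 70 and 90, SDs of 10 and 20) are hypothetical and chosen only to show how the curve changes.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters, for illustration only.
narrow_peak = normal_pdf(70, mu=70, sigma=10)    # peak height with SD 10
wide_peak = normal_pdf(70, mu=70, sigma=20)      # larger SD: lower, flatter curve
shifted_peak = normal_pdf(90, mu=90, sigma=10)   # different mean: same shape, new location

# Cross-check one value against the standard library implementation.
library_value = NormalDist(mu=70, sigma=10).pdf(75)

print(narrow_peak, wide_peak, shifted_peak)
print(normal_pdf(75, 70, 10), library_value)
```

Changing the mean slides the curve along the x-axis without changing its shape, while increasing the SD spreads the curve out and lowers its peak.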
The standard normal distribution

To find P(c < X < d), we need to find the area under the appropriate normal curve.

$$P(c < X < d) = \int_{c}^{d} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx$$

To simplify the tabulation of these areas, we standardize each value of x by expressing it as a z-score, the number of standard deviations it lies from the mean.

$$z = \frac{x - \mu}{\sigma}$$

The standard normal distribution
Since a normal distribution can have any of an infinite number of possible values for its mean and SD, it is impossible to tabulate the areas for each and every normal curve.

Instead, only a single curve, for which μ = 0 and σ = 1, is tabulated.

The curve is called the standard normal distribution (SND).


The Standard Normal (z) Distribution

Mean = 0; Standard deviation = 1

When x = mean, z = 0

Symmetric about z = 0

Values of z to the left of center are negative

Values of z to the right of center are positive

Total area under the curve is 1.


The standard normal distribution
Assume a distribution has a mean of 70 and a standard deviation
of 10.
How many standard deviation units above the mean is a score of
80?

Answer: Z = (80 - 70) / 10 = 1
How many standard deviation units above the mean is a score of
83?

Answer: Z = (83 - 70) / 10 = 1.3
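
The two answers above can be reproduced with a one-line function. This is just a sketch; the name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_80 = z_score(80, mean=70, sd=10)
z_83 = z_score(83, mean=70, sd=10)

print(z_80, z_83)
```

A score below the mean gives a negative z; for instance, a score of 60 in the same distribution lies one standard deviation below the mean.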


Using normal table

The four-digit probability in a particular row and column of Table 1 gives the area under the z curve to the left of that particular value of z.

Area for z = 1.36


Example

Use Table 1 to calculate these probabilities:

P(z ≤ 1.36) = .9131

P(z > 1.36) = 1 - .9131 = .0869

P(-1.20 ≤ z ≤ 1.36) = .9131 - .1151 = .7980
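
If a printed table is not at hand, the same areas can be obtained from `statistics.NormalDist` in the Python standard library (Python 3.8 or later), whose `cdf` method gives the area to the left of a value. This is a sketch of an alternative to the table lookup, not the method used in the lecture.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area to the left of z = 1.36 (what the table reports directly).
p_below = Z.cdf(1.36)

# Area to the right is the complement of the area to the left.
p_above = 1 - p_below

# Area between two z values is the difference of the two left-hand areas.
p_between = Z.cdf(1.36) - Z.cdf(-1.20)

print(f"{p_below:.4f} {p_above:.4f} {p_between:.4f}")
```

Rounded to four decimal places, these values agree with the table-based results.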
The standard normal distribution
From the symmetry properties of the standard normal distribution,
P(Z ≤ -x) = P(Z ≥ x) = 1 - P(Z ≤ x)
Hence, P(-1 < Z < +1) = 0.6826

P(-1.96< Z < + 1.96) = 0.95 and

P(-2.576 < Z < + 2.576) = 0.99
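
The symmetry property and the three central areas can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `central_area` and the test value x = 1.5 are illustrative. Note that software gives about 0.6827 for the first area, while subtracting four-digit table values gives 0.6826.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return Z.cdf(z) - Z.cdf(-z)

for z in (1.0, 1.96, 2.576):
    print(f"P(-{z} < Z < +{z}) = {central_area(z):.4f}")

# Symmetry: the area below -x equals the area above +x.
x = 1.5
left_tail = Z.cdf(-x)
right_tail = 1 - Z.cdf(x)
print(left_tail, right_tail)
```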


Example 1: Suppose a borderline hypertensive is defined as a person whose diastolic blood pressure (DBP) is between 90 and 95 mm Hg, and the subjects are 35-44-year-old males whose DBP is normally distributed with mean 80 and variance 144. What is the probability that a randomly selected person from this population will be a borderline hypertensive?

Solution: Let X be DBP, X ~ N(μ = 80, σ² = 144), so σ = 12.

P(90 < X < 95) = P((90 - 80)/12 < (X - μ)/σ < (95 - 80)/12) = P(0.83 < Z < 1.25)
= P(Z < 1.25) - P(Z < 0.83) = 0.8944 - 0.7967 = 0.0977 ≈ 0.098

Thus, approximately 9.8% of this population will be borderline hypertensive.
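
The same calculation can be sketched in Python with `statistics.NormalDist`, working directly on the DBP scale (mean 80, SD 12) instead of standardizing by hand. The variable names are illustrative. The direct result is slightly smaller than the table-based answer, because the table method rounds the lower z value 10/12 = 0.8333 down to 0.83.

```python
from statistics import NormalDist

# DBP in 35-44-year-old males: mean 80, variance 144, so SD = 12.
dbp = NormalDist(mu=80, sigma=12)

# Direct calculation on the original scale, without rounding z.
p_exact = dbp.cdf(95) - dbp.cdf(90)

# Table-style calculation with z values rounded to two decimal places.
Z = NormalDist()
p_table = Z.cdf(1.25) - Z.cdf(0.83)

print(f"unrounded z: {p_exact:.4f}")
print(f"rounded z:   {p_table:.4f}")
```

Either way, a little under 10% of this population falls in the borderline range.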
Exercise:
Data collected on systolic blood pressure in normal healthy individuals are normally distributed with μ = 120 and σ = 10 mm Hg.
1) What proportion of normal healthy individuals have a systolic blood pressure above 130 mm Hg?
2) What proportion of normal healthy individuals have a systolic blood pressure between 100 and 140 mm Hg?
3) What level of systolic blood pressure cuts off the lower 95% of normal healthy individuals?

Exercises
Find the probability of the following under the SND

Above 1.96?

Below 1.96?

Between -1.28 and 1.28?

Between -1.65 and 1.08? (Answer: 0.8104)

What level cuts the upper 25%?

What level cuts the lower 10%?

What level cuts the middle 99%?


Exercises
Let X be systolic blood pressure (SBP); for US males aged 18-74, μ = 129 mmHg and σ = 19.8 mmHg.

What level encompasses the middle 95%?

What proportion of men in the population have SBP greater than 150 mmHg?

What level cuts the lower 10% of SBP?
