
THE NORMAL DISTRIBUTION: A BASIC TOOL

DIFFERENT TYPES OF DISTRIBUTION

[Figure: three frequency curves: skewed to the right, skewed to the left, and symmetrical and bell-shaped]

SKEWED TO THE RIGHT
- Reflects a frequency distribution
which has lower frequencies as the
measurements take on higher
values
- E.g. income of people in most
developing countries


SKEWED TO THE LEFT
- Larger frequencies as the
measurements take on higher
values of the variables being
measured
- E.g. age of onset of diabetes (more
people developing at the later
ages)


SYMMETRICAL, BELL-SHAPED
DISTRIBUTION
- E.g. Normal curve in public health
and medicine




A GLIMPSE OF HISTORY

The normal distribution was one of the earliest
contributions to statistics.

Its equation was first derived by Abraham de Moivre
(1667-1754), a French-born mathematician who
worked in England.

It was first applied in astronomy by the German
mathematician, physicist and astronomer
Carl Friedrich Gauss (1777-1855).


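The formula for the normal curve:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

where π ≈ 3.14
      e ≈ 2.718 (the base of the exponential, exp)
      μ = mean of the distribution
      σ = standard deviation of the distribution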
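The formula for the normal curve can be evaluated directly. The following is a minimal sketch in Python; the function name `normal_pdf` and the example values are illustrative assumptions, not part of the original notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Illustrative values only: a curve with mean 0 and standard deviation 1.
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989, the height of the curve at its mean

# The curve is symmetric: points equally far from the mean have equal height.
print(normal_pdf(-1.5, 0.0, 1.0) == normal_pdf(1.5, 0.0, 1.0))  # True
```

The curve is highest at x = mu, and a larger sigma lowers and widens it, because sigma appears in the denominator of the leading coefficient.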


IMPORTANT CHARACTERISTICS OF THE
NORMAL DISTRIBUTION

1.) It is bell shaped and symmetrical about the mean.








- the left side is a mirror image of the right side

- the bell shape indicates that most of
the measurements tend to cluster
around the mean
- frequencies decrease gradually at first, then
rapidly, and finally level off as the
values move away from the mean

2.) The mean, median and mode of the normal
distribution are all equal. All three coincide at
the same point along the x-axis.


3.) The total area under the curve (AUC) and
above the x-axis equals 1.
(This can be proven by calculus.)
- We can calculate the area bounded by
the normal curve, the x-axis and any
two points along the x-axis
- Since the total area is equal to 1 or 100%,
these areas can be thought of as
proportions or percentages of a whole.

4.) It has long tapering tails that extend
infinitely in either direction but never touch
the x-axis. The range of values of a
normal distribution is from −∞ to +∞.
The tapering tails show the leveling off of
frequencies as the measurements move farther
from the mean.

5.) It is completely determined by two
parameters: its mean, μ, and its standard deviation,
σ. This means that for every distinct pair of
values of μ and σ, there is a corresponding
unique normal curve.

The value of μ indicates the position of the
normal curve along the x-axis.

The value of σ determines how flat or peaked
the curve is at its center.







[Figure: THE NORMAL DISTRIBUTION. Two panels comparing normal curves with parameters μ1, σ1 and μ2, σ2]








6.) In relation to the normal distribution,
the standard deviation becomes a more
meaningful quantity than merely a
measure of dispersion.

If we draw lines perpendicular to the x-axis at the
points 1 standard deviation to the left
and 1 standard deviation to the right of the mean,
then the area bounded by the normal curve, the
perpendicular lines and the x-axis is approximately
68.3% of the total distribution.

Example: If the lengths of Filipino babies at birth
approximate a normal distribution with a mean of 41 cm and
a standard deviation of 4 cm, then approximately 68.3% of
Filipino babies measure between 37 (41 − 4) and 45 (41 + 4)
cm at birth.


[Figure: AREAS UNDER THE NORMAL CURVE BOUNDED BY DIFFERENT STANDARD DEVIATIONS. About 68.3% of the area lies between −1s and +1s, 95% between −2s and +2s, and 99.7% between −3s and +3s]

If the perpendicular lines are drawn two standard
deviations away from the mean, then the area covered is
approximately 95%.
Around 95% of the babies are between 33 and 49 cm
long at birth (41 − (2 × 4) and 41 + (2 × 4)).

If the lines are drawn three standard deviations away, the
area covered is 99.7%. Beyond three standard deviations,
the areas are almost negligible.

In summary:

Within 1 SD of the mean = area covered is 68.3%
Within 2 SD of the mean = area covered is 95%
Within 3 SD of the mean = area covered is 99.7%
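The areas within 1, 2 and 3 standard deviations can be checked numerically. The sketch below uses the error function from Python's standard library; the helper name `area_within` is an assumption introduced for illustration. The computed values can be compared with the rounded percentages quoted above.

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, "SD:", round(100 * area_within(k), 1), "%")
# 1 SD: 68.3 %
# 2 SD: 95.4 %
# 3 SD: 99.7 %

# Applied to the lengths of Filipino babies (mean 41 cm, SD 4 cm):
mean_cm, sd_cm = 41.0, 4.0
for k in (1, 2, 3):
    low, high = mean_cm - k * sd_cm, mean_cm + k * sd_cm
    print(f"{low:.0f} to {high:.0f} cm covers about {100 * area_within(k):.1f}%")
```

Note that the area within exactly 2 SD is about 95.4%; the figure of 95% is the usual rounded value (the interval covering exactly 95% is about 1.96 SD on either side of the mean).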


THE IMPORTANCE OF THE NORMAL
DISTRIBUTION

Why is the normal distribution important?

The answer lies in the usefulness of the normal
distribution in explaining many biological phenomena
and in the role it plays in statistical inference .

Examples of biological variables with an approximately
normal distribution:

Height, weight, blood pressure, serum uric acid
(continuous numerical)
Discrete variables, e.g. heart rate after exercise

Note that these variables assume positive values
only, whereas the values of a normal distribution
range from −∞ to +∞, which no real measurement
can attain. The normal curve is therefore an
approximation for such variables.

What if variables are not normally distributed?

Non-normal data can often be transformed to approximately
normal data, for example by square root or logarithmic
transformations.

E.g. the minimum dose that will produce an effect in an
animal may be converted to its natural logarithm
to yield normally distributed data.
The method of data transformation depends on the type
of distribution of the data. (e.g. Z-transformation)

Other types of distributions in biostatistics:
Binomial
Poisson
Student's t

These can be approximated by the normal
distribution in large-sample studies.
The normal distribution is also easy to work with.
The most important reason for the use of the
normal distribution stems from the so-called central
limit theorem.

CENTRAL LIMIT THEOREM

A general law about commonly computed
statistics (measures that characterize a sample).
One of its implications is that if we repeatedly draw
samples of large size n from a population that has
mean μ and standard deviation σ, then the
distribution of the sample means will approximate
a normal distribution.
The resulting distribution will have mean μ and
standard deviation σ/√n. This holds true
even if the population from which the samples are
taken is not normally distributed.
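The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the original notes: it uses a strongly right-skewed population (exponential, with mean 1 and standard deviation 1), a sample size of n = 50, and 5000 repetitions, all chosen only for illustration.

```python
import random
import statistics

def simulate_sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from a skewed (exponential)
    population with mean 1 and SD 1, and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 50
means = simulate_sample_means(n, repetitions=5000)

mean_of_means = statistics.fmean(means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
expected_sd = 1.0 / n ** 0.5

# Share of sample means within 1 SD of their mean; near 0.68 if roughly normal.
share_within_1sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
print(round(share_within_1sd, 2))
```

Although the population itself is far from bell-shaped, the sample means cluster symmetrically around the population mean, with a spread close to σ/√n.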


THE STANDARD NORMAL DISTRIBUTION

According to the fifth property, the normal
distribution is actually a collection or family of
distributions, with each member having its own
population mean μ and standard deviation σ.

The most important member of this family is the one
having μ = 0 and σ = 1 (unity).
This member of the family is called the standard
normal distribution.

Let's say we take a sample of values from a
population that is normally distributed with mean
μ and standard deviation σ, and denote these
values as x's. To convert this particular normal
distribution into a standard normal distribution,
we subtract the value of μ from each x value and
divide the result by σ to get a new variable z:

$$z = \frac{x - \mu}{\sigma}$$

Z has a normal distribution with mean = 0 and
SD = 1. The variable Z is called the standard
normal deviate.

In the example of the lengths of Filipino babies,
which are normally distributed with mean = 41 cm and SD = 4 cm:

The value of 45 cm in this distribution is
equivalent to z = 1 in the standard normal
distribution, since (45 − 41)/4 = 1.

The proportion or percentage of Filipino babies
who measure more than 45 cm at birth is
equal to the area under the standard normal
curve to the right of z = 1.

The value of 33 cm in the original distribution is
transformed to z = −2, since (33 − 41)/4 = −2. The
percentage of Filipino babies who have lengths less
than 33 cm is equal to the area under the curve to
the left of z = −2.

The proportion of Filipino babies who measure
between 33 cm and 45 cm at birth is the same as
the area between z = −2 and z = 1 under the standard
normal curve.
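The conversion to z values and the corresponding areas for the baby-length example can be sketched as follows. This uses `NormalDist` from Python's standard library in place of a printed z-table; the helper name `z_score` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard normal deviate: how many SDs x lies from the mean."""
    return (x - mu) / sigma

mean_cm, sd_cm = 41.0, 4.0
standard_normal = NormalDist(mu=0.0, sigma=1.0)

z_45 = z_score(45.0, mean_cm, sd_cm)  # 1.0
z_33 = z_score(33.0, mean_cm, sd_cm)  # -2.0

above_45 = 1.0 - standard_normal.cdf(z_45)                    # area right of z = 1
below_33 = standard_normal.cdf(z_33)                          # area left of z = -2
between = standard_normal.cdf(z_45) - standard_normal.cdf(z_33)

print(z_45, z_33)
print(round(above_45, 4), round(below_33, 4), round(between, 4))
# 0.1587 0.0228 0.8186
```

Under these assumptions, about 15.9% of babies measure more than 45 cm, about 2.3% measure less than 33 cm, and about 81.9% fall in between.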








APPLICATIONS OF THE NORMAL
DISTRIBUTION

IT ALLOWS US TO SOLVE PROBLEMS OF TWO TYPES:

1.) IT ALLOWS US TO COMPUTE PROPORTIONS
OR PERCENTAGES OF VALUES THAT BELONG TO
DIFFERENT CATEGORIES OF THE VARIABLE OF
INTEREST.
- The proportion belonging to a certain category can
also be interpreted as the probability that a member
drawn from the population will belong to this category.
- These categories may be represented by ranges of
values or intervals.

Example:

If the distribution of systolic BP of non-hypertensive
men is known to be normal with a mean of 110
mmHg and an SD of 15 mmHg, then the proportion
of non-hypertensive men who have systolic BP
between 100 and 120 mmHg can be determined.

This proportion is computed by solving for the
area under the curve with μ = 110 and σ = 15,
bounded by the x values x1 = 100 and x2 = 120.

2.) IT ALLOWS US TO DETERMINE THE X VALUES
THAT BOUND A SPECIFIED AREA OF THE
DISTRIBUTION OF THIS VARIABLE.
FOR EXAMPLE, WE MAY BE INTERESTED IN DETERMINING
THE SBP OF THE HIGHEST 10% OF
NON-HYPERTENSIVE MEN. (Use the z-table.)
PROBLEM 1:

What is the proportion of non-hypertensive men
who have systolic blood pressure above 120
mmHg?

Solution:

Here x = 120. We compute Z using the formula

$$Z = \frac{x - \mu}{\sigma} = \frac{120 - 110}{15} = 0.67$$

Look up the area under the standard normal curve
corresponding to z = 0.67. The table gives us A = 0.2514.
This is the area under the curve to the right of z = 0.67. In the
normal distribution with μ = 110 and σ = 15, 0.2514 is also the
area to the right of x = 120.
Therefore, the proportion of non-hypertensive men
with systolic BP above 120 mmHg is 25.14%.


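[Figure: the area A = 0.2514 to the right of x = 120 on the X scale (μ = 110, σ = 15 mmHg), and the same area to the right of z = 0.67 on the Z scale (μ = 0, σ = 1)]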
PROBLEM 2

What is the proportion of non-hypertensive men
with systolic BP less than 90 mmHg ?


$$Z = \frac{x - \mu}{\sigma} = \frac{90 - 110}{15} = -1.33$$

The value of x = 90 corresponds to z = −1.33.
Note that there are no negative z-values in the table.
However, we know that the standard normal curve is
symmetric about its mean (0).
The area to the left of Z = −1.33 is the same as the
area to the right of Z = +1.33. This area in the table is
0.0918 or 9.18%.
Therefore, the proportion of non-hypertensive men
with systolic BP less than 90 mmHg is 9.18%.
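The table lookups in Problems 1 and 2 can be reproduced numerically. The sketch below assumes the z-table gives the area to the right of a positive z, as the solutions describe; the function name `upper_tail_area` is introduced for illustration, and z is rounded to two decimals to mimic the table.

```python
import math

def upper_tail_area(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 110.0, 15.0  # systolic BP of non-hypertensive men, mmHg

# Problem 1: proportion above 120 mmHg.
z_120 = round((120.0 - mu) / sigma, 2)      # 0.67
area_above_120 = upper_tail_area(z_120)

# Problem 2: proportion below 90 mmHg. By symmetry, the area to the
# left of z = -1.33 equals the area to the right of z = +1.33.
z_90 = round((90.0 - mu) / sigma, 2)        # -1.33
area_below_90 = upper_tail_area(-z_90)

print(z_120, round(area_above_120, 4))  # 0.67 0.2514
print(z_90, round(area_below_90, 4))    # -1.33 0.0918
```

Using unrounded z values would change the areas slightly in the third decimal place, which is the usual small difference between table-based and software-based answers.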


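[Figure: the area A = 0.0918 to the left of x = 90 on the X scale (μ = 110, σ = 15 mmHg), and the same area to the left of z = −1.33 on the Z scale (μ = 0, σ = 1)]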
PROBLEM 3

Calculate the area under the curve for systolic BP > 90
and < 120 mmHg.





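[Figure: the area between x = 90 and x = 120 on the X scale (μ = 110, σ = 15), corresponding to the area between z = −1.33 and z = 0.67 on the Z scale (μ = 0, σ = 1)]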
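One way to obtain this area, assuming the table values found in Problems 1 and 2 (0.2514 to the right of z = 0.67 and 0.0918 to the left of z = −1.33), is to subtract both tails from the total area of 1:

```latex
P(90 < X < 120) = P(-1.33 < Z < 0.67)
                = 1 - 0.2514 - 0.0918
                = 0.6568
```

Under these assumptions, about 65.7% of non-hypertensive men have systolic BP between 90 and 120 mmHg.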

PROBLEM 4:

What is the 90th percentile of the systolic BP levels of
non-hypertensive men?

Solution:

The 90th percentile is the x value with 10% of the area to its right.
From the table, the corresponding z value is 1.28.

$$x = z\sigma + \mu = (1.28)(15) + 110 = 129.2$$

Therefore, 129.2 mmHg is the 90th percentile of the SBP of
non-hypertensive men. Ninety percent of non-hypertensive
men have SBP less than or equal to 129.2 mmHg.
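The percentile calculation runs the z-table in reverse: start from an area and recover the x value. A minimal sketch follows, using `NormalDist.inv_cdf` from Python's standard library in place of the table; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 110.0, 15.0  # systolic BP of non-hypertensive men, mmHg

# z value with 90% of the area to its left (10% to its right).
z_90th = NormalDist().inv_cdf(0.90)
print(round(z_90th, 2))  # 1.28

# Convert back to the original scale: x = z * sigma + mu.
x_from_table_z = 1.28 * sigma + mu   # using the table value of z
x_from_exact_z = z_90th * sigma + mu

print(round(x_from_table_z, 1), round(x_from_exact_z, 1))  # 129.2 129.2

# The same percentile obtained directly from the original distribution.
x_direct = NormalDist(mu, sigma).inv_cdf(0.90)
```

The same approach answers the earlier question about the highest 10% of non-hypertensive men: they are those with SBP above this 90th percentile.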

EXERCISES TO DO:

1.) The hemoglobin level of household heads in
Magallanes, Cavite has a mean of 12.63 gm% and an
SD of 2.45 gm%. Assuming that hemoglobin levels
follow a normal distribution:

1.1 What is the probability that a household head would
be classified as having severe anemia if the cut-off point
used is < gm%?


1.2 What percentage of the population of household
heads would be classified as normal if the cut-off
point used is a hemoglobin level of > 12 gm%?


2.) Suppose that the height of 6-year-old Filipino boys
is normally distributed with a mean of 110 cm and
a variance of 40 cm².

2.1 What proportion of 6-year-old Filipino boys are
taller than 103 cm?

2.2 Between what two values do the middle 80% of the
heights fall?
