
Proof of Normal Probability Distribution Function

Ben Cuttriss

22 July 2011
Proof. An interesting question was posed in a Statistics assignment, which was essentially to show that the standard normal distribution is valid, i.e. that its integral from negative infinity to infinity equals one, and in doing so to show the derivation of the normalising part of the normal pdf,

$$\frac{1}{\sqrt{2\pi}}.$$
A friend of mine and I decided to try to derive the normal pdf, and our thinking ran along the lines of the central limit theorem, which states that the distribution of the sample mean of any probability distribution approaches a normal distribution as the number of trials increases.
The proof of this is well known, but we asked ourselves how the normal distribution was first arrived at. There is another route to the normal distribution, the approximation to the binomial, and it is in this direction that we wondered how to derive the normal distribution from the binomial as n gets large. So the general approach we will take is to start with a binomial distribution and then increase the number of samples n.
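Before diving into the algebra, a quick numerical sketch shows the convergence we are after (the values of $n$ and $p$ here are assumed for illustration, not taken from the assignment): near its peak, the Binomial$(n, p)$ pmf approaches the normal density with matching mean and variance as $n$ grows.

```python
# Sketch with assumed parameters: as n grows, the Binomial(n, p) pmf near its
# peak approaches the normal density with the same mean and variance.
import math

def binom_pmf(n, p, r):
    # probability of exactly r successes in n trials
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

p = 0.4
errors = []
for n in (10, 100, 1000):
    mu, var = n * p, n * p * (1 - p)
    r = round(mu)                        # evaluate near the peak
    rel = abs(binom_pmf(n, p, r) - normal_pdf(r, mu, var)) / binom_pmf(n, p, r)
    errors.append(rel)
    print(n, rel)                        # the relative gap shrinks with n
```

The relative gap shrinks roughly like $1/n$, which is the convergence the rest of this derivation formalises.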
[Figure: histogram of the probability mass function P(X) against X for X = 1, ..., 5]
Once we have done this, instead of using the horizontal tops of the histogram bars (which give the usual probability mass function of the binomial), we are going to draw a line through the central point of each bar.
[Figure: the same histogram of P(X) against X, with a line drawn through the central point of each bar]
Notice how the probability mass function shown in blue now extends from the point X = 0 through to the point X = 5. The probability mass function represented by the blue line now looks more like a probability density function. Instead of labelling the histogram bars 1, 2, 3, 4, 5, we are instead going to label the intervals 0k, 1k, 2k, ..., nk.
[Figure: the same curve, relabelled as f(X) against X with intervals 0k, 1k, 2k, 3k, 4k, 5k]
So we begin by stating our distribution as $P(y)$, where $y$ is the probability of an occurrence of $rk$. From the original binomial distribution, we can immediately see that the mean is $\mu = npk$ and the variance is $\sigma^2 = npqk^2$ (where $p$ is the probability of success and $q$ is the probability of failure).
$$y = P(Y = rk) = \binom{n}{r} p^r q^{n-r} = \frac{n!}{r!(n-r)!}\, p^r q^{n-r}$$
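As a sanity check on the stated moments, here is a small numerical sketch (the values of $n$, $p$ and $k$ are assumed): summing directly over the pmf of $Y = kR$, with $R \sim \mathrm{Binomial}(n, p)$, reproduces $\mu = npk$ and $\sigma^2 = npqk^2$.

```python
# Sketch with assumed parameters: verify mu = npk and sigma^2 = npq k^2
# by direct summation over the binomial pmf of Y = k * R.
import math

n, p, k = 20, 0.3, 0.5
q = 1 - p

pmf = [math.comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
mean = sum(r * k * pr for r, pr in enumerate(pmf))              # E[Y]
var = sum((r * k - mean)**2 * pr for r, pr in enumerate(pmf))   # Var(Y)

print(mean, n * p * k)        # mean vs npk
print(var, n * p * q * k**2)  # variance vs npq k^2
```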
Now, since we are going to derive the gradient function, we will need the formula for the gradient from first principles.
$$y' = P(Y = (r+1)k) = \binom{n}{r+1} p^{r+1} q^{n-r-1} = \frac{n!}{(r+1)!(n-r-1)!}\, p^{r+1} q^{n-r-1}$$
(As a hint of where we are going, we eventually want to arrive at $(y' - y)/y$...)
$$y' - y = \frac{n!}{(r+1)!(n-r-1)!}\, p^{r+1} q^{n-r-1} - \frac{n!}{r!(n-r)!}\, p^r q^{n-r}$$

$$= \frac{n!(n-r)}{(r+1)!(n-r)!}\, p^{r+1} q^{n-r-1} - \frac{n!(r+1)}{(r+1)!(n-r)!}\, p^r q^{n-r}$$

$$= \frac{n!}{(r+1)!(n-r)!} \left[ (n-r)\, p^{r+1} q^{n-r-1} - (r+1)\, p^r q^{n-r} \right]$$

$$= \frac{n!\, p^r q^{n-r-1}}{(r+1)!(n-r)!} \left[ p(n-r) - (r+1)q \right]$$
Now we are ready to divide by $y$:

$$\frac{y' - y}{y} = \frac{\dfrac{n!\, p^r q^{n-r-1}}{(r+1)!(n-r)!} \left[ p(n-r) - (r+1)q \right]}{\dfrac{n!}{r!(n-r)!}\, p^r q^{n-r}}$$
which can be simplified, after a lot of cancellation, to:

$$= \frac{p(n-r) - (r+1)q}{(r+1)q} = \frac{1}{(r+1)q}\,(np - pr - qr - q) = \frac{1}{(r+1)q}\,\bigl(np - r(p+q) - q\bigr)$$

And since $p + q = 1$:

$$\frac{y' - y}{y} = \frac{1}{(r+1)q}\,(np - r - q)$$

Call this Equation A.
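Equation A is easy to check numerically. A minimal sketch, with $n$ and $p$ chosen arbitrarily (note that the step size $k$ cancels out of the ratio, so it does not appear):

```python
# Sketch (assumed n, p): verify (y' - y)/y = (np - r - q) / ((r+1) q)
# for every r, where y = P(Y = rk) is the binomial pmf.
import math

n, p = 12, 0.35
q = 1 - p

def y(r):
    # binomial probability of exactly r successes
    return math.comb(n, r) * p**r * q**(n - r)

for r in range(n):                    # r + 1 must stay within 0..n
    lhs = (y(r + 1) - y(r)) / y(r)
    rhs = (n * p - r - q) / ((r + 1) * q)
    assert math.isclose(lhs, rhs)

print("Equation A holds for r = 0 ..", n - 1)
```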
Instead of the variate $Y$, which represents the binomial distribution, let's consider a new variate $X$ which represents the same binomial distribution but centred around 0. To achieve this we subtract the mean from each value:

$$x = rk - npk = k(r - np)$$

so that

$$r = \frac{x}{k} + np, \qquad rk = x + npk$$
And:

$$r + 1 = \frac{x}{k} + np + 1, \qquad k(r+1) = x + knp + k$$

Multiplying both sides by $kq$:

$$k^2 q(r+1) = kqx + k^2 npq + k^2 q = (x + knp + k)kq$$
Now taking Equation A and multiplying top and bottom by $k^2$:

$$\frac{y' - y}{y} = \frac{k^2(np - r - q)}{k^2(r+1)q} = \frac{k\bigl(k(np - r) - kq\bigr)}{(x + knp + k)kq} = \frac{-k(x + kq)}{(x + knp + k)kq} = \frac{-k(x + kq)}{k^2 npq + (x + k)kq}$$

(using $k(np - r) = npk - rk = -x$ from above). Since $k$ is the change in $x$, we have $k = dx$ and $\sigma^2 = k^2 npq$, so

$$\frac{dy}{y} = \frac{-(x + q\,dx)\,dx}{\sigma^2 + (x + dx)\,q\,dx}$$
Now, as we increase $n$ to infinity and $dx$ tends to 0:

$$\frac{1}{y}\,dy = -\frac{x}{\sigma^2}\,dx$$
Integrating both sides:

$$\ln(y) = -\frac{x^2}{2\sigma^2} + c$$

$$y = e^{-\frac{x^2}{2\sigma^2} + c} = e^{-\frac{x^2}{2\sigma^2}}\, e^c = A \exp\left( -\frac{x^2}{2\sigma^2} \right)$$
We have written $e^c$ as a constant $A$, and now have to work out its value for the formula to be a valid pdf. For this to be a valid probability density function, the integral from $-\infty$ to $\infty$ must equal 1.
$$1 = \int_{-\infty}^{\infty} A \exp\left( -\frac{x^2}{2\sigma^2} \right) dx$$

$$\frac{1}{A} = \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx$$

$$I = \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx$$
Now consider the following transformation:

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) \exp\left( -\frac{y^2}{2\sigma^2} \right) dx\, dy$$

$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right) dx\, dy$$
Now consider:

$$r^2 = x^2 + y^2$$

Therefore:

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left( -\frac{r^2}{2\sigma^2} \right) dx\, dy$$
But to convert into polar co-ordinates, we now have to apply the transformation $dx\, dy = r\, dr\, d\theta$:

$$I^2 = \int_0^{2\pi} \int_0^{\infty} r \exp\left( -\frac{r^2}{2\sigma^2} \right) dr\, d\theta$$
Solving with the substitution $u = r^2$, so that $r\,dr = \tfrac{1}{2}\,du$:

$$I^2 = \int_0^{2\pi} \int_0^{\infty} \frac{1}{2}\, e^{-\frac{u}{2\sigma^2}}\, du\, d\theta$$

$$= \int_0^{2\pi} \left[ -\sigma^2 e^{-\frac{u}{2\sigma^2}} \right]_0^{\infty} d\theta$$

$$= \int_0^{2\pi} -\sigma^2 (0 - 1)\, d\theta$$

$$= \int_0^{2\pi} \sigma^2\, d\theta$$

$$= \sigma^2 \int_0^{2\pi} 1\, d\theta$$

$$= 2\pi\sigma^2$$

$$I = \sqrt{2\pi\sigma^2}$$
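The closed form for $I$ can also be checked numerically. A minimal sketch, with $\sigma$ and the integration grid assumed: a midpoint Riemann sum over $[-10\sigma, 10\sigma]$ (beyond which the tails are negligible) should reproduce $\sqrt{2\pi\sigma^2}$.

```python
# Sketch (assumed sigma and grid): approximate I = integral of exp(-x^2/(2 sigma^2))
# by a midpoint Riemann sum and compare with the closed form sqrt(2 pi sigma^2).
import math

sigma = 1.7
h = 1e-3                 # grid step
L = 10 * sigma           # truncation limit; the tails beyond are negligible
steps = int(2 * L / h)

total = h * sum(
    math.exp(-(-L + (i + 0.5) * h) ** 2 / (2 * sigma ** 2))
    for i in range(steps)
)

print(total, math.sqrt(2 * math.pi * sigma ** 2))  # the two agree closely
```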
Since:

$$A = \frac{1}{I} = \frac{1}{\sqrt{2\pi\sigma^2}}$$
We can now write the full pdf for the normal distribution as:

$$\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} \right)$$
However, careful consideration shows that this is not quite finished. Remember that we said we would normalise $Y$ and consider a new variate $X$ centred about the mean. So, recalling that we essentially set $X = Y - \mu$, the full pdf for the normal distribution (for $Y$) follows in its most common form:

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^2}$$
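Finally, the whole derivation predicts that for large $n$ the pmf of $Y = kR$, with $R \sim \mathrm{Binomial}(n, p)$, divided by the bar width $k$, approaches this density with $\mu = npk$ and $\sigma^2 = npqk^2$. A sketch with assumed parameters (logs are used so that $n!$ does not overflow a float):

```python
# Sketch (assumed n, p, k): for large n, the pmf of Y = k * R divided by the
# bar width k approaches (1/(sigma sqrt(2 pi))) exp(-(y - mu)^2 / (2 sigma^2)).
import math

n, p, k = 2000, 0.25, 0.1
q = 1 - p
mu, var = n * p * k, n * p * q * k ** 2

r = round(n * p)                        # a point near the mean
y_val = r * k
log_pmf = (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
           + r * math.log(p) + (n - r) * math.log(q))   # log of binomial pmf
pmf_over_k = math.exp(log_pmf) / k
density = math.exp(-(y_val - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(pmf_over_k, density)              # close for this large n
```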