
Bayesian Inference

by Hoai Nam Nguyen


September 9, 2017

The setting is the same. Given a population that follows a distribution P, where P contains 1 or more unknown parameters, we want to construct an estimator for each of them. In this course, I consider the simple case, where there is only 1 unknown parameter θ. To do this, we proceed by collecting an i.i.d. sample X_1, ..., X_n ∼ P.

Similar to Maximum Likelihood Estimation, we first find the Likelihood Function L(θ):

L(\theta) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)

In Bayesian inference, we treat the parameter θ as a random variable. That is, θ follows a probability distribution with pdf π(θ). We call π(θ) the prior distribution of θ.

By Bayes's formula, we have

\pi(\theta \mid x_1, \ldots, x_n)
= \frac{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}
\propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)

where π(θ | x_1, ..., x_n) is the pdf of θ given the sample data. This is called the posterior distribution of θ.

Let me clarify the last step further. The symbol ∝ means "proportional to". Since the left-hand side is the distribution of θ conditional on the sample data {x_1, ..., x_n}, all the x_i are assumed to be known and the denominator f_{X_1, ..., X_n}(x_1, ..., x_n) is, therefore, no more than a constant.

In this setting, we are given the population distribution P and the prior distribution π(θ). We have to find the posterior distribution π(θ | x_1, ..., x_n). We then use the posterior mean E[θ | x_1, ..., x_n] to estimate the unknown parameter θ. That is,

\hat{\theta} = E[\theta \mid x_1, \ldots, x_n]

NOTE: when calculating π(θ | x_1, ..., x_n), always use proportionality by removing constants, because this will simplify the calculation a lot.
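
To make the recipe concrete, here is a minimal Python sketch of the same idea: evaluate likelihood × prior on a grid of parameter values, normalise, and read off the posterior mean. The Bernoulli population and Uniform(0, 1) prior anticipate Example 1 below; the true parameter, sample size, grid, and seed are arbitrary choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=50)         # i.i.d. sample X1, ..., Xn

theta = np.linspace(0.001, 0.999, 1_000)  # grid over the parameter
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
prior = np.ones_like(theta)               # pi(theta) = 1 on (0, 1)

unnormalised = likelihood * prior         # proportional to the posterior
posterior = unnormalised / np.trapz(unnormalised, theta)

posterior_mean = np.trapz(theta * posterior, theta)
print(posterior_mean)                     # estimate of theta
```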

Example 1

The population distribution is Bernoulli(p), where p ∼ Uniform(0, 1). Use Bayesian inference to construct an estimator p̂.

The likelihood function is given by:

L(p) = \prod_{i=1}^{n} f_{X_i}(x_i \mid p) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}

The pdf of the prior distribution is π(p) = 1, for 0 < p < 1.

Therefore, the posterior distribution is given by:

\pi(p \mid x_1, \ldots, x_n) \propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid p)\,\pi(p) = p^{\sum x_i} (1 - p)^{n - \sum x_i}, \quad \text{for } 0 < p < 1
Recall the pdf of Beta(α, β):

f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad \text{for } 0 < x < 1
By comparing, we can see that the posterior distribution of p is Beta(Σ x_i + 1, n − Σ x_i + 1).


We know that the expectation of Beta(α, β) is α/(α + β). Therefore, the posterior mean is given by:

E[p \mid x_1, \ldots, x_n] = \frac{\sum x_i + 1}{n + 2}
Thus, p̂ = (Σ X_i + 1)/(n + 2) is the Bayesian estimator for p.

Note that we used proportionality when calculating the posterior distribution. By comparing with the pdf of Beta(α, β), we can easily recover the missing constant:

c = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\Gamma(n + 2)}{\Gamma(\sum x_i + 1)\,\Gamma(n - \sum x_i + 1)}
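
As a quick numerical check of this example, the closed form (Σ x_i + 1)/(n + 2) can be compared against the mean of the Beta posterior computed with scipy; the sample below is hypothetical.

```python
import numpy as np
from scipy import stats

# Posterior for Example 1: Beta(sum(x) + 1, n - sum(x) + 1).
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)   # hypothetical Bernoulli(0.3) sample

n, s = len(x), x.sum()
posterior = stats.beta(s + 1, n - s + 1)

print(posterior.mean())              # mean of the Beta posterior
print((s + 1) / (n + 2))             # closed form from the notes: identical
```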

Example 2

Same as Example 1, except that p ∼ Beta(a, b), where both a and b are given constants.

The likelihood function stays unchanged:

L(p) = p^{\sum x_i} (1 - p)^{n - \sum x_i}

The pdf of the prior distribution is given by:

\pi(p) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} p^{a - 1} (1 - p)^{b - 1}, \quad \text{for } 0 < p < 1

Therefore, the pdf of the posterior distribution is given by:

\begin{aligned}
\pi(p \mid x_1, \ldots, x_n) &\propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid p)\,\pi(p) \\
&\propto p^{\sum x_i} (1 - p)^{n - \sum x_i} \cdot p^{a - 1} (1 - p)^{b - 1} \\
&= p^{\sum x_i + a - 1} (1 - p)^{n - \sum x_i + b - 1}, \quad \text{for } 0 < p < 1
\end{aligned}
We recognise this is Beta(Σ x_i + a, n − Σ x_i + b).
The posterior mean is E[p | x_1, ..., x_n] = (Σ x_i + a)/(n + a + b). The Bayesian estimator for p is given by:

\hat{p} = \frac{\sum X_i + a}{n + a + b}

Again, you can recover the normalising constant in the pdf of the posterior distribution:

c = \frac{\Gamma(n + a + b)}{\Gamma(\sum x_i + a)\,\Gamma(n - \sum x_i + b)}
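
Because Beta(a, b) is conjugate to the Bernoulli likelihood, the whole update reduces to two additions on the Beta parameters. A minimal Python sketch; the function name and the sample are illustrative, not from the notes.

```python
# Conjugate update for Example 2: a Beta(a, b) prior plus Bernoulli data
# gives a Beta(a + sum(x), b + n - sum(x)) posterior.

def update_beta_prior(a, b, x):
    """Return the parameters of the Beta posterior."""
    n, s = len(x), sum(x)
    return a + s, b + n - s

a_post, b_post = update_beta_prior(2.0, 2.0, [1, 0, 1, 1, 0, 1])
print(a_post, b_post)              # Beta(6.0, 4.0)
print(a_post / (a_post + b_post))  # posterior mean (sum(x) + a)/(n + a + b) = 0.6
```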

Example 3

The population distribution is N(μ, σ²), where μ is unknown and σ² is known. The parameter μ follows a prior distribution N(ν, τ²), where both ν and τ² are given constants. Use Bayesian inference to construct an estimator μ̂.

The likelihood function is given by:

\begin{aligned}
L(\mu) &= \prod_{i=1}^{n} f_{X_i}(x_i \mid \mu) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big) \\
&\propto \prod_{i=1}^{n} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big), \quad \text{because } \sigma^2 \text{ is known}
\end{aligned}

Also, the pdf of the prior distribution is given by:

\begin{aligned}
\pi(\mu) &= \frac{1}{\sqrt{2\pi\tau^2}} \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big) \\
&\propto \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big), \quad \text{because } \tau^2 \text{ is known}
\end{aligned}
Then, calculate the pdf of the posterior distribution:

\begin{aligned}
\pi(\mu \mid x_1, \ldots, x_n)
&\propto \prod_{i=1}^{n} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big) \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big) \\
&= \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu + \nu^2) \Big) \\
&\propto \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big), \quad \text{by removing } \exp\Big( -\frac{\nu^2}{2\tau^2} \Big) \\
&= \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i^2 - 2x_i\mu + \mu^2) \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big) \\
&\propto \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (-2x_i\mu + \mu^2) \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big), \quad \text{by removing } \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 \Big) \\
&= \exp\Big( \frac{\mu}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2} \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big) \\
&= \exp\Big[ -\frac{1}{2}\Big( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \Big) \mu^2 + \Big( \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\nu}{\tau^2} \Big) \mu \Big] \\
&= \exp(A\mu^2 + B\mu), \quad \text{where } A = -\frac{1}{2}\Big( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \Big) \text{ and } B = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\nu}{\tau^2} \\
&= \exp\Big( \frac{\mu^2 + (B/A)\mu}{1/A} \Big) \\
&\propto \exp\Big( \frac{\mu^2 + (B/A)\mu + B^2/4A^2}{1/A} \Big) \\
&= \exp\Big[ \frac{(\mu + B/2A)^2}{1/A} \Big]
\end{aligned}

Comparing with the pdf of a Normal distribution, we deduce that the posterior distribution of μ is given by:

\mu \mid x_1, \ldots, x_n \sim N\Big( -\frac{B}{2A}, \; -\frac{1}{2A} \Big)

Clearly, E[μ | x_1, ..., x_n] = −B/(2A). Therefore,

\hat{\mu} = -\frac{B}{2A} = \frac{\frac{1}{\sigma^2} \sum x_i + \frac{\nu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}

is the Bayesian estimator for μ.
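
Since everything above is in closed form, the posterior is easy to compute numerically. A minimal Python sketch using the A and B from the derivation; the data, σ², ν, and τ² below are hypothetical.

```python
import numpy as np

# Posterior for Example 3: mu | x ~ N(-B/(2A), -1/(2A)), where
# A = -(n/sigma2 + 1/tau2)/2 and B = sum(x)/sigma2 + nu/tau2.

def normal_posterior(x, sigma2, nu, tau2):
    """Posterior mean and variance of mu under a N(nu, tau2) prior."""
    n = len(x)
    A = -0.5 * (n / sigma2 + 1 / tau2)
    B = np.sum(x) / sigma2 + nu / tau2
    return -B / (2 * A), -1 / (2 * A)   # mean, variance

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=100)      # sigma2 = 4 known, mu = 5 unknown
mean, var = normal_posterior(x, sigma2=4.0, nu=0.0, tau2=10.0)
print(mean, var)  # posterior mean shrinks the sample mean toward nu
```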

Example 4

Consider the following types of treatment:

Treatment 1: 100% of the patients are cured (3 out of 3)

Treatment 2: 95% of the patients are cured (19 out of 20)

Treatment 3: 90% of the patients are cured (90,000 out of 100,000)

Which one is the best???

Treatment 1 cured 100% of the patients, but the sample was so small that we should cast doubt on the result. On the other hand, Treatment 3's large sample was very reassuring, but its percentage was a bit lower.

Let p be the probability that a patient is cured. Then, the probability that a patient is not cured is 1 − p.

Therefore, the population follows Bernoulli(p), where p is an unknown parameter.
In Example 1, we found that p̂ = (Σ x_i + 1)/(n + 2) provided an estimate for p.
Treatment 1: p̂ = (3 + 1)/(3 + 2) = 4/5 = 0.8
Treatment 2: p̂ = (19 + 1)/(20 + 2) = 20/22 ≈ 0.909
Treatment 3: p̂ = (90000 + 1)/(100000 + 2) = 90001/100002 ≈ 0.9

We can see that p̂ for Treatment 2 is the highest. Therefore, we predict that Treatment 2 is the best one. Treatment 1, despite curing everyone in the sample, is predicted to be the worst due to its small sample size.
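
The three estimates can be reproduced in a few lines of Python; the cured/total counts are taken from the statements above.

```python
# Bayesian estimator from Example 1: p_hat = (sum(x) + 1) / (n + 2),
# where sum(x) is the number of cured patients and n the sample size.

treatments = {
    "Treatment 1": (3, 3),
    "Treatment 2": (19, 20),
    "Treatment 3": (90_000, 100_000),
}

for name, (cured, n) in treatments.items():
    p_hat = (cured + 1) / (n + 2)
    print(f"{name}: p_hat = {p_hat:.4f}")

# Output: 0.8000, 0.9091, 0.9000 -- Treatment 2 ranks best,
# matching the conclusion above.
```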
