
Bayesian Inference

by Hoai Nam Nguyen


September 9, 2017

The setting is the same. Given a population that follows a distribution P, where P contains 1 or more unknown parameters, we want to construct an estimator for each of them. In this course, I consider the simple case, where there is only 1 unknown parameter θ. To do this, we proceed by collecting an i.i.d. sample X_1, ..., X_n ∼ P.

Similar to Maximum Likelihood Estimation, we first find the Likelihood Function L(θ):

L(\theta) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)

In Bayesian inference, we treat the parameter θ as a random variable. That is, θ follows a probability distribution with pdf π(θ). We call π(θ) the prior distribution of θ.

By Bayes's formula, we have

\pi(\theta \mid x_1, \ldots, x_n)
= \frac{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}
\propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)

where π(θ | x_1, ..., x_n) is the pdf of θ given the sample data. This is called the posterior distribution of θ.

Let me clarify the last step further. The symbol ∝ means "proportional to". Since the left-hand side is the distribution of θ conditional on the sample data {x_1, ..., x_n}, all the x_i are assumed to be known and the denominator f_{X_1, ..., X_n}(x_1, ..., x_n) is, therefore, no more than a constant.

In this setting, we are given the population distribution P and the prior distribution π(θ). We have to find the posterior distribution π(θ | x_1, ..., x_n). We then use the posterior mean E[θ | x_1, ..., x_n] to estimate the unknown parameter θ. That is,

\hat{\theta} = E[\theta \mid x_1, \ldots, x_n]

NOTE: when calculating π(θ | x_1, ..., x_n), always use proportionality by removing constants, because this will simplify the calculation a lot.
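
To make the recipe concrete, here is a minimal Python sketch of the same idea: evaluate likelihood × prior on a grid of parameter values, normalise, and read off the posterior mean. The Bernoulli population and Uniform(0, 1) prior anticipate Example 1 below; the true parameter, sample size, grid, and seed are arbitrary choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=50)         # i.i.d. sample X1, ..., Xn

theta = np.linspace(0.001, 0.999, 1_000)  # grid over the parameter
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
prior = np.ones_like(theta)               # pi(theta) = 1 on (0, 1)

unnormalised = likelihood * prior         # proportional to the posterior
posterior = unnormalised / np.trapz(unnormalised, theta)

posterior_mean = np.trapz(theta * posterior, theta)
print(posterior_mean)                     # estimate of theta
```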

Example 1

The population distribution is Bernoulli(p), where p ∼ Uniform(0, 1). Use Bayesian inference to construct an estimator p̂.

The likelihood function is given by:

L(p) = \prod_{i=1}^{n} f_{X_i}(x_i \mid p) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}

The pdf of the prior distribution is π(p) = 1, for 0 < p < 1.

Therefore, the posterior distribution is given by:

\pi(p \mid x_1, \ldots, x_n) \propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid p)\,\pi(p) = p^{\sum x_i} (1 - p)^{n - \sum x_i}, \quad \text{for } 0 < p < 1
Recall the pdf of Beta(α, β):

f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad \text{for } 0 < x < 1
By comparing, we can see that the posterior distribution of p is Beta(Σ x_i + 1, n − Σ x_i + 1).


We know that the expectation of Beta(α, β) is α/(α + β). Therefore, the posterior mean is given by:

E[p \mid x_1, \ldots, x_n] = \frac{\sum x_i + 1}{n + 2}
Thus, p̂ = (Σ X_i + 1)/(n + 2) is the Bayesian estimator for p.

Note that we used proportionality when calculating the posterior distribution. By comparing with the pdf of Beta(α, β), we can easily recover the missing constant:

c = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\Gamma(n + 2)}{\Gamma(\sum x_i + 1)\,\Gamma(n - \sum x_i + 1)}
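
As a quick numerical check of this example, the closed form (Σ x_i + 1)/(n + 2) can be compared against the mean of the Beta posterior computed with scipy; the sample below is hypothetical.

```python
import numpy as np
from scipy import stats

# Posterior for Example 1: Beta(sum(x) + 1, n - sum(x) + 1).
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)   # hypothetical Bernoulli(0.3) sample

n, s = len(x), x.sum()
posterior = stats.beta(s + 1, n - s + 1)

print(posterior.mean())              # mean of the Beta posterior
print((s + 1) / (n + 2))             # closed form from the notes: identical
```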

Example 2

Same as Example 1, except that p ∼ Beta(a, b), where both a and b are given constants.

The likelihood function stays unchanged:

L(p) = p^{\sum x_i} (1 - p)^{n - \sum x_i}

The pdf of the prior distribution is given by:

\pi(p) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} p^{a - 1} (1 - p)^{b - 1}, \quad \text{for } 0 < p < 1

Therefore, the pdf of the posterior distribution is given by:

\begin{aligned}
\pi(p \mid x_1, \ldots, x_n) &\propto f_{X_1, \ldots, X_n}(x_1, \ldots, x_n \mid p)\,\pi(p) \\
&\propto p^{\sum x_i} (1 - p)^{n - \sum x_i} \cdot p^{a - 1} (1 - p)^{b - 1} \\
&= p^{\sum x_i + a - 1} (1 - p)^{n - \sum x_i + b - 1}, \quad \text{for } 0 < p < 1
\end{aligned}
We recognise this is Beta(Σ x_i + a, n − Σ x_i + b).
The posterior mean is E[p | x_1, ..., x_n] = (Σ x_i + a)/(n + a + b). The Bayesian estimator for p is given by:

\hat{p} = \frac{\sum X_i + a}{n + a + b}

Again, you can recover the normalising constant in the pdf of the posterior distribution:

c = \frac{\Gamma(n + a + b)}{\Gamma(\sum x_i + a)\,\Gamma(n - \sum x_i + b)}
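
Because Beta(a, b) is conjugate to the Bernoulli likelihood, the whole update reduces to two additions on the Beta parameters. A minimal Python sketch; the function name and the sample are illustrative, not from the notes.

```python
# Conjugate update for Example 2: a Beta(a, b) prior plus Bernoulli data
# gives a Beta(a + sum(x), b + n - sum(x)) posterior.

def update_beta_prior(a, b, x):
    """Return the parameters of the Beta posterior."""
    n, s = len(x), sum(x)
    return a + s, b + n - s

a_post, b_post = update_beta_prior(2.0, 2.0, [1, 0, 1, 1, 0, 1])
print(a_post, b_post)              # Beta(6.0, 4.0)
print(a_post / (a_post + b_post))  # posterior mean (sum(x) + a)/(n + a + b) = 0.6
```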

Example 3

The population distribution is N(μ, σ²), where μ is unknown and σ² is known. The parameter μ follows a prior distribution N(ν, τ²), where both ν and τ² are given constants. Use Bayesian inference to construct an estimator μ̂.

The likelihood function is given by:

\begin{aligned}
L(\mu) &= \prod_{i=1}^{n} f_{X_i}(x_i \mid \mu) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big) \\
&\propto \prod_{i=1}^{n} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big), \quad \text{because } \sigma^2 \text{ is known}
\end{aligned}

Also, the pdf of the prior distribution is given by:

\begin{aligned}
\pi(\mu) &= \frac{1}{\sqrt{2\pi\tau^2}} \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big) \\
&\propto \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big), \quad \text{because } \tau^2 \text{ is known}
\end{aligned}
Then, calculate the pdf of the posterior distribution:

\begin{aligned}
\pi(\mu \mid x_1, \ldots, x_n)
&\propto \prod_{i=1}^{n} \exp\Big( -\frac{(x_i - \mu)^2}{2\sigma^2} \Big) \exp\Big( -\frac{(\mu - \nu)^2}{2\tau^2} \Big) \\
&= \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu + \nu^2) \Big) \\
&\propto \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big), \quad \text{by removing } \exp\Big( -\frac{\nu^2}{2\tau^2} \Big) \\
&= \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i^2 - 2x_i\mu + \mu^2) \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big) \\
&\propto \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (-2x_i\mu + \mu^2) \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big), \quad \text{by removing } \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 \Big) \\
&= \exp\Big( \frac{\mu}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2} \Big) \exp\Big( -\frac{1}{2\tau^2} (\mu^2 - 2\mu\nu) \Big) \\
&= \exp\Big[ -\frac{1}{2}\Big( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \Big) \mu^2 + \Big( \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\nu}{\tau^2} \Big) \mu \Big] \\
&= \exp(A\mu^2 + B\mu), \quad \text{where } A = -\frac{1}{2}\Big( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \Big) \text{ and } B = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\nu}{\tau^2} \\
&= \exp\Big( \frac{\mu^2 + (B/A)\mu}{1/A} \Big) \\
&\propto \exp\Big( \frac{\mu^2 + (B/A)\mu + B^2/4A^2}{1/A} \Big) \\
&= \exp\Big[ \frac{(\mu + B/2A)^2}{1/A} \Big]
\end{aligned}

Comparing with the pdf of a Normal distribution, we deduce that the posterior distribution of μ is given by:

\mu \mid x_1, \ldots, x_n \sim N\Big( -\frac{B}{2A}, \; -\frac{1}{2A} \Big)

Clearly, E[μ | x_1, ..., x_n] = −B/(2A). Therefore,

\hat{\mu} = -\frac{B}{2A} = \frac{\frac{1}{\sigma^2} \sum x_i + \frac{\nu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}

is the Bayesian estimator for μ.
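
Since everything above is in closed form, the posterior is easy to compute numerically. A minimal Python sketch using the A and B from the derivation; the data, σ², ν, and τ² below are hypothetical.

```python
import numpy as np

# Posterior for Example 3: mu | x ~ N(-B/(2A), -1/(2A)), where
# A = -(n/sigma2 + 1/tau2)/2 and B = sum(x)/sigma2 + nu/tau2.

def normal_posterior(x, sigma2, nu, tau2):
    """Posterior mean and variance of mu under a N(nu, tau2) prior."""
    n = len(x)
    A = -0.5 * (n / sigma2 + 1 / tau2)
    B = np.sum(x) / sigma2 + nu / tau2
    return -B / (2 * A), -1 / (2 * A)   # mean, variance

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=100)      # sigma2 = 4 known, mu = 5 unknown
mean, var = normal_posterior(x, sigma2=4.0, nu=0.0, tau2=10.0)
print(mean, var)  # posterior mean shrinks the sample mean toward nu
```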

Example 4

Consider the following types of treatment:

Treatment 1: 100% of the patients are cured (3 out of 3)

Treatment 2: 95% of the patients are cured (19 out of 20)

Treatment 3: 90% of the patients are cured (90,000 out of 100,000)

Which one is the best???

Treatment 1 cured 100% of the patients, but the sample was so small that we should cast doubt on the result. On the other hand, Treatment 3's large sample was very reassuring, but its percentage was a bit lower.

Let p be the probability that a patient is cured. Then, the probability that a patient is not cured is 1 − p.

Therefore, the population follows Bernoulli(p), where p is an unknown parameter.
In Example 1, we found that p̂ = (Σ x_i + 1)/(n + 2) provided an estimate for p.
Treatment 1: p̂ = (3 + 1)/(3 + 2) = 4/5 = 0.8
Treatment 2: p̂ = (19 + 1)/(20 + 2) = 20/22 ≈ 0.909
Treatment 3: p̂ = (90000 + 1)/(100000 + 2) = 90001/100002 ≈ 0.9

We can see that p̂ for Treatment 2 is the highest. Therefore, we predict that Treatment 2 is the best one. Treatment 1, despite curing everyone in the sample, is predicted to be the worst due to its small sample size.
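
The three estimates can be reproduced in a few lines of Python; the cured/total counts are taken from the statements above.

```python
# Bayesian estimator from Example 1: p_hat = (sum(x) + 1) / (n + 2),
# where sum(x) is the number of cured patients and n the sample size.

treatments = {
    "Treatment 1": (3, 3),
    "Treatment 2": (19, 20),
    "Treatment 3": (90_000, 100_000),
}

for name, (cured, n) in treatments.items():
    p_hat = (cured + 1) / (n + 2)
    print(f"{name}: p_hat = {p_hat:.4f}")

# Output: 0.8000, 0.9091, 0.9000 -- Treatment 2 ranks best,
# matching the conclusion above.
```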
