1. Introduction
1.1. A Math 526 Exercise. Suppose B1 , B2 , B3 give a partition of
a sample space Ω, so that B1 , B2 , B3 are mutually exclusive, and their
union is all of Ω. Given any event A, clearly it is given by the disjoint
union,
A = (A ∩ B1 ) ∪ (A ∩ B2 ) ∪ (A ∩ B3 ),
thus
P(A) = P(A ∩ B1 ) + P(A ∩ B2 ) + P(A ∩ B3 ).
We also know, from the definition of conditional probability, that if
each of the Bi's has non-zero probability, then
P(A ∩ Bi) = P(A|Bi)P(Bi)
for each i = 1, 2, 3. Thus we obtain
P(A) = P(A|B1 )P(B1 ) + P(A|B2 )P(B2 ) + P(A|B3 )P(B3 ).
Recall that this is referred to as the rule of total probability.
Exercise 1 (Two Face). The DC comic book villain Two-Face often
uses a coin to decide the fate of his victims. If the result of the flip is
tails, then the victim is spared, otherwise the victim is killed. It turns
out he actually randomly selects from three coins: a fair one, one that
comes up tails 1/3 of the time, and another that comes up tails 1/10
of the time. What is the probability that a victim is spared?
Solution. Let Sp denote the event that the victim is spared, let C1 be
the event that the fair coin is used, C2 the event that the coin that
comes up tails 1/3 of the time is used, and C3 the event that the coin
that comes up tails 1/10 of the time is used. Then, since each coin is
selected with probability 1/3,
P(Sp) = P(Sp|C1 )P(C1 ) + P(Sp|C2 )P(C2 ) + P(Sp|C3 )P(C3 )
= (1/2)(1/3) + (1/3)(1/3) + (1/10)(1/3) = 14/45.
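As a sanity check, this can be verified by simulation. The following Python sketch is mine, not part of the notes; it assumes, as in the exercise, that one of the three coins is chosen uniformly at random, and it estimates P(Sp) by Monte Carlo.

import random

def estimate_spared(trials=100_000):
    """Estimate P(Sp) by simulating Two-Face's procedure."""
    tails_probs = [1/2, 1/3, 1/10]  # P(tails) for the fair, 1/3-, and 1/10-coins
    spared = 0
    for _ in range(trials):
        p = random.choice(tails_probs)   # pick one of the three coins uniformly
        if random.random() < p:          # flip it; tails means the victim is spared
            spared += 1
    return spared / trials

print(estimate_spared())  # ≈ 14/45 ≈ 0.311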
Sometimes we also want to compute P(Bi |A), and a bit of algebra gives
the following formula, in the case i = 3:
P(B3|A) = P(B3 ∩ A)/P(A)
        = P(A|B3)P(B3) / [P(A|B1)P(B1) + P(A|B2)P(B2) + P(A|B3)P(B3)].
Recall that this is referred to as Bayes’ theorem.
Exercise 2. Referring to Exercise 1, suppose that the victim was
spared; what is the probability that the fair coin was used?
Solution. Bayes’ theorem gives
P(C1|Sp) = P(C1 ∩ Sp)/P(Sp)
         = P(Sp|C1)P(C1) / [P(Sp|C1)P(C1) + P(Sp|C2)P(C2) + P(Sp|C3)P(C3)];
these are all numbers we know.
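Indeed, the arithmetic can be carried out exactly; here is a minimal Python sketch of my own, using the numbers from Exercise 1.

from fractions import Fraction

prior = Fraction(1, 3)  # each coin Ci is chosen with probability 1/3
tails = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)]  # P(Sp|Ci)

p_sp = sum(t * prior for t in tails)   # rule of total probability: P(Sp) = 14/45
post_c1 = (tails[0] * prior) / p_sp    # Bayes' theorem: P(C1|Sp) = 15/28

print(p_sp, post_c1)  # 14/45 15/28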
1.2. Bayesian statistics. In classical statistics, the parameter θ ∈ ∆ is
unknown; we take a random sample from fθ and then make an inference
about θ.
In Bayesian statistics, rather than thinking of the parameter as un-
known, we think of it as a random variable having some unknown
distribution. Let (fθ )θ∈∆ be a family of pdfs. Let Θ be a random
variable with pdf r taking values in ∆. Here r is called the prior pdf for
Θ; we do not really know the true pdf for Θ, and this is a subjective
assignment or guess based on our present knowledge or ignorance. We
think of f (x1 ; θ) = f (x1 |θ) as the conditional pdf of a random variable
X1 that can be generated by the following two-step procedure: First,
we generate Θ = θ, then we generate X1 with pdf fθ . In other words,
we let the joint pdf of X1 and Θ be given by
f (x1 |θ)r(θ).
In shorthand, we will denote this model by writing
X1 |θ ∼ f (x|θ)
Θ ∼ r(θ).
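To make the two-step procedure concrete, here is a small Python sketch. The particular choice of a Beta(2, 2) prior r and Bernoulli pdfs fθ is my own illustration, not something fixed by the notes.

import random

def draw_x1():
    """Two-step generation: Theta ~ r, then X1 | theta ~ f(.|theta)."""
    theta = random.betavariate(2, 2)          # step 1: draw Theta = theta from the prior
    x1 = 1 if random.random() < theta else 0  # step 2: draw X1 ~ Bernoulli(theta)
    return x1, theta

x1, theta = draw_x1()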
Similarly, we say that X = (X1 , . . . , Xn ) is a random sample from
the conditional distribution of X1 given Θ = θ if X1 |θ ∼ f (x1 |θ) and
L(x; θ) = L(x|θ) = ∏_{i=1}^{n} f(xi|θ);
in which case the joint pdf of X and Θ is given by
j(x, θ) = L(x|θ)r(θ).
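Continuing the same hypothetical Beta–Bernoulli illustration, the joint pdf j(x, θ) = L(x|θ)r(θ) of a sample x = (x1, . . . , xn) and Θ can be evaluated pointwise:

import math

def bernoulli_pdf(x, theta):
    # f(x|theta) for a single Bernoulli observation x in {0, 1}
    return theta**x * (1 - theta)**(1 - x)

def beta_pdf(theta, a=2, b=2):
    # density of the assumed Beta(a, b) prior r
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * theta**(a - 1) * (1 - theta)**(b - 1)

def joint_pdf(xs, theta):
    """j(x, theta) = L(x|theta) r(theta), with L(x|theta) the product of the f(xi|theta)."""
    likelihood = math.prod(bernoulli_pdf(x, theta) for x in xs)
    return likelihood * beta_pdf(theta)

print(joint_pdf([1, 0, 1], 0.5))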