Tang Yin-cai
yctang@stat.ecnu.edu.cn
Estimating a probability from binomial data (Section 2.1–2.5)

Outline: Problem; The binomial model; Example; Discrete prior; Uniform prior; Beta distribution; Posterior Beta; A compromise; Choice of prior; Conjugate prior; Estimating the posterior distribution; Using simulation; Example.

Problem: Estimate an unknown population proportion θ from the results of a sequence of "Bernoulli trials"; that is, data y1, …, yn, each equal to either 0 or 1.
This problem provides a relatively simple but important starting point for the discussion of Bayesian inference.
The binomial distribution provides a natural model for a sequence of n exchangeable trials, each giving rise to one of two possible outcomes, conventionally labeled "success" and "failure."

Ans: exchangeability.
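As a concrete sketch of this model, the binomial likelihood p(y|θ) = C(n, y) θ^y (1 − θ)^(n−y) can be evaluated directly with the standard library; the trial counts below are hypothetical illustration values, not data from the text:

```python
import math

def binom_pmf(y, n, theta):
    """Binomial probability of y successes in n exchangeable trials."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

# Hypothetical data: n = 20 trials, y = 12 successes.
n, y = 20, 12

# Viewed as a function of theta, the likelihood peaks at the
# sample proportion y/n = 0.6.
grid = [i / 100 for i in range(1, 100)]
like = [binom_pmf(y, n, t) for t in grid]
mle = grid[like.index(max(like))]
print(mle)  # 0.6
```

The grid search is only for illustration; the maximizing value y/n can of course be read off analytically.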
[Figure: posterior densities of θ for (n = 5, y = 3), (n = 20, y = 12), (n = 100, y = 60), and (n = 1000, y = 600), each plotted over θ ∈ [0, 1]. As n grows, the posterior concentrates around the sample proportion, y/n = 0.6.]
This is a general feature of Bayesian inference: the posterior distribution is centered at a compromise between the prior information and the data, and the compromise is increasingly controlled by the data as the sample size increases. (This can be investigated more formally using conditional expectation formulae.)
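The compromise can be made concrete with the conjugate Beta prior: if θ ∼ Beta(α, β) and y successes are observed in n trials, the posterior is Beta(α + y, β + n − y), whose mean is a weighted average of the prior mean α/(α + β) and the sample proportion y/n. A minimal sketch (the prior parameters below are hypothetical):

```python
def posterior_mean(alpha, beta, y, n):
    """Mean of the Beta(alpha + y, beta + n - y) posterior."""
    return (alpha + y) / (alpha + beta + n)

# Hypothetical prior Beta(4, 6): prior mean 0.4.
alpha, beta = 4.0, 6.0
prior_mean = alpha / (alpha + beta)

# Fixed sample proportion y/n = 0.6 at increasing sample sizes:
for n, y in [(5, 3), (20, 12), (100, 60), (1000, 600)]:
    m = posterior_mean(alpha, beta, y, n)
    # The weight on the data grows with n: n / (alpha + beta + n).
    w = n / (alpha + beta + n)
    assert abs(m - ((1 - w) * prior_mean + w * (y / n))) < 1e-12
    print(n, round(m, 4))  # drifts from 0.4 toward 0.6 as n grows
```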
SCHOOL OF FINANCE AND STATISTICS

Not necessarily a compelling argument: a "true" Bayesian analysis must use a subjectively assessed prior distribution.
Example (placenta previa). As a specific example of a factor that may influence the sex ratio, we consider the maternal condition placenta previa, an unusual condition of pregnancy in which the placenta is implanted very low in the uterus, obstructing the fetus from a normal vaginal delivery. An early study concerning the sex of placenta previa births in Germany found that of a total of 980 births, 437 were female. How much evidence does this provide for the claim that the proportion of female births in the population of placenta previa births is less than 0.485, the proportion of female births in the general population?
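A sketch of a simulation-based answer, assuming a uniform Beta(1, 1) prior (the choice of prior is an assumption here, one of several the slides consider): the posterior is then Beta(438, 544), and Pr(θ < 0.485 | y) can be estimated from posterior draws, which can also be transformed to the logit and ratio scales.

```python
import math
import random

random.seed(1)

# Data: 437 female births out of 980 placenta previa births.
y, n = 437, 980

# Under a uniform Beta(1, 1) prior the posterior is Beta(y + 1, n - y + 1).
a_post, b_post = y + 1, n - y + 1  # Beta(438, 544)

draws = [random.betavariate(a_post, b_post) for _ in range(100_000)]

# Posterior probability that the female-birth proportion is below 0.485.
p_less = sum(t < 0.485 for t in draws) / len(draws)
print(round(p_less, 3))  # strong evidence that theta < 0.485

# Transformed scales: logit(theta) and the ratio (1 - theta)/theta.
logit_draws = [math.log(t / (1 - t)) for t in draws]
ratio_draws = [(1 - t) / t for t in draws]
```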
[Figure: histograms of posterior draws for the placenta previa example on three scales: θ; logit(θ) = log(θ/(1 − θ)); and the ratio φ = (1 − θ)/θ.]
Analysis for the normal distribution (Section 2.6)

Outline: Unknown mean, known variance; One data point; Conjugate prior; Posterior density; Precisions; Interpreting µ1; More on µ1; Posterior prediction; Multiple observations; Known mean, unknown variance; Normal likelihood; Conjugate prior; Posterior distribution.

From the conjugate form of the prior density, the posterior distribution for θ is also normal:

    p(θ|y) ∝ exp( −(1/2) [ (y − θ)²/σ² + (θ − µ0)²/τ0² ] ).   (2.3)

Some algebra is required, however, to reveal its form.

Remark: recall that in the posterior density everything except θ is regarded as constant.
The posterior is N(θ|µ1, τ1²), with

    µ1 = (µ0/τ0² + y/σ²) / (1/τ0² + 1/σ²)   (2.5)

and

    1/τ1² = 1/τ0² + 1/σ².   (2.6)

In words: posterior precision = prior precision + data precision.
There are several ways of interpreting the form of the posterior mean, µ1. In equation (2.5),

    µ1 = (µ0/τ0² + y/σ²) / (1/τ0² + 1/σ²),

the posterior mean is a weighted average of the prior mean and the observed value y, with weights proportional to the precisions.
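The single-observation update translates directly into a few lines of code; a minimal sketch, with hypothetical prior and data values:

```python
def normal_update(mu0, tau0_sq, y, sigma_sq):
    """Posterior N(mu1, tau1_sq) for a normal mean with known variance,
    from a N(mu0, tau0_sq) prior and one observation y ~ N(theta, sigma_sq)."""
    prior_prec = 1.0 / tau0_sq
    data_prec = 1.0 / sigma_sq
    post_prec = prior_prec + data_prec                     # precisions add
    mu1 = (prior_prec * mu0 + data_prec * y) / post_prec   # weighted average
    return mu1, 1.0 / post_prec

# Hypothetical values: prior N(0, 4), observation y = 3 with sigma^2 = 1.
mu1, tau1_sq = normal_update(0.0, 4.0, 3.0, 1.0)
print(mu1, tau1_sq)  # 2.4 0.8
```

The data precision (1) is four times the prior precision (0.25), so the posterior mean sits four-fifths of the way from the prior mean toward y.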
The normal model with a single observation can easily be extended to the more realistic situation where we have a sample of independent and identically distributed observations y = (y1, …, yn). We can proceed formally, using steps similar to those in the single-observation case:
    p(θ|y1, …, yn) ∝ exp( −(1/2) [ (θ − µ0)²/τ0² + (1/σ²) Σᵢ₌₁ⁿ (yᵢ − θ)² ] ).

The posterior distribution depends on y only through the sample mean ȳ.
In fact, since ȳ | θ, σ² ∼ N(θ, σ²/n), we can apply the results for a single normal observation:

    p(θ|y1, …, yn) = p(θ|ȳ) = N(θ|µn, τn²),

where

    µn = (µ0/τ0² + nȳ/σ²) / (1/τ0² + n/σ²)

and

    1/τn² = 1/τ0² + n/σ².
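Since the data enter only through ȳ, the n-observation posterior is just the single-observation update applied to ȳ with variance σ²/n. A sketch with simulated data (the true mean, variances, and sample size below are hypothetical):

```python
import random
import statistics

random.seed(0)

def normal_posterior(mu0, tau0_sq, ybar, sigma_sq, n):
    """Posterior N(mu_n, tau_n_sq) for theta given n iid N(theta, sigma_sq)
    observations with sample mean ybar, under a N(mu0, tau0_sq) prior."""
    post_prec = 1.0 / tau0_sq + n / sigma_sq
    mu_n = (mu0 / tau0_sq + n * ybar / sigma_sq) / post_prec
    return mu_n, 1.0 / post_prec

# Hypothetical setup: true theta = 1.5, sigma^2 = 4, prior N(0, 1), n = 200.
sigma_sq = 4.0
ys = [random.gauss(1.5, sigma_sq**0.5) for _ in range(200)]
ybar = statistics.fmean(ys)

mu_n, tau_n_sq = normal_posterior(0.0, 1.0, ybar, sigma_sq, len(ys))
print(round(mu_n, 3), round(tau_n_sq, 4))  # mu_n near ybar; tau_n_sq ~ 1/51
```

With n = 200 the data precision n/σ² = 50 dwarfs the prior precision 1, so µn shrinks ȳ toward the prior mean only slightly.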
The prior precision, 1/τ0², and the data precision, n/σ², play equivalent roles; if n is large, the posterior distribution is largely determined by σ² and the sample value ȳ.

As τ0² → ∞ with n fixed, or as n → ∞ with τ0² fixed, we have

    p(θ|y) ≈ N(θ|ȳ, σ²/n).   (2.7)

A prior distribution with large τ0², and thus low precision, captures prior beliefs that are diffuse over the range of θ where the likelihood is substantial.
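The limiting result (2.7) can be checked numerically: with a very diffuse prior (huge τ0²), µn and τn² become indistinguishable from ȳ and σ²/n. A small sketch with hypothetical values:

```python
def normal_posterior(mu0, tau0_sq, ybar, sigma_sq, n):
    """Posterior N(mu_n, tau_n_sq) under a N(mu0, tau0_sq) prior."""
    post_prec = 1.0 / tau0_sq + n / sigma_sq
    return (mu0 / tau0_sq + n * ybar / sigma_sq) / post_prec, 1.0 / post_prec

# Hypothetical values: ybar = 2.7, sigma^2 = 1, n = 50.
ybar, sigma_sq, n = 2.7, 1.0, 50

# Diffuse prior: tau0^2 = 1e8 contributes essentially zero precision.
mu_n, tau_n_sq = normal_posterior(0.0, 1e8, ybar, sigma_sq, n)
print(mu_n, tau_n_sq)  # approximately ybar = 2.7 and sigma^2/n = 0.02
```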
The standard single-parameter models (Section 2.7)

Outline: The exponential family; Poisson model; Exponential model.

■ The Poisson model:

    p(y|θ) = θ^y e^(−θ) / y!,   y = 0, 1, 2, …
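The conjugate prior for the Poisson rate is the Gamma distribution, a standard result for this family (not derived on this slide): with θ ∼ Gamma(α, β) and counts y1, …, yn, the posterior is Gamma(α + Σyᵢ, β + n). A minimal sketch with hypothetical prior parameters and counts:

```python
def gamma_poisson_update(alpha, beta, counts):
    """Posterior Gamma(alpha + sum(y), beta + n) for a Poisson rate theta,
    under a Gamma(alpha, beta) prior (beta = rate parameter)."""
    return alpha + sum(counts), beta + len(counts)

# Hypothetical prior Gamma(2, 1) and observed counts:
counts = [3, 0, 2, 4, 1]
a_post, b_post = gamma_poisson_update(2.0, 1.0, counts)

post_mean = a_post / b_post  # Gamma mean alpha/beta
print(a_post, b_post, post_mean)  # 12.0 6.0 2.0
```

The posterior mean (α + Σyᵢ)/(β + n) is again a compromise between the prior mean α/β and the sample mean ȳ.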
■ Intuitive explanation: ?
■ For counts y1, …, yn with exposures x1, …, xn (yᵢ ∼ Poisson(xᵢθ)), the likelihood is

    p(y|θ) ∝ θ^(Σᵢ yᵢ) e^(−(Σᵢ xᵢ)θ).
■ Memoryless property:

    Pr(y > t + s | y > s, θ) = Pr(y > t | θ)   for all s, t.
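The memoryless property of the exponential model can be checked by simulation with the standard library's random.expovariate; the rate and thresholds below are arbitrary choices:

```python
import random

random.seed(0)

theta = 1.0       # exponential rate (arbitrary choice)
s, t = 0.5, 1.0   # arbitrary thresholds
draws = [random.expovariate(theta) for _ in range(200_000)]

# Unconditional tail probability Pr(y > t).
p_t = sum(y > t for y in draws) / len(draws)

# Conditional tail probability Pr(y > t + s | y > s): having survived past s
# does not change the distribution of the remaining waiting time.
survivors = [y for y in draws if y > s]
p_cond = sum(y > t + s for y in survivors) / len(survivors)

print(round(p_t, 3), round(p_cond, 3))  # both near exp(-1) ~ 0.368
```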