Bayesian Statistical Analysis

Chapter 3: Introduction to Multiparameter Models

Tang Yin-cai
yctang@stat.ecnu.edu.cn

SCHOOL OF FINANCE AND STATISTICS

March 30, 2009


Introduction to Multiparameter Models (I)

— Noninformative prior distributions
(Reference: Gelman et al. 2.9)




Noninformative prior distributions

■ Prior distributions may be hard to construct if there is no population basis.

■ Statisticians have long sought prior distributions guaranteed to play a minimal role in determining the posterior distribution.

■ The rationale is that we should "let the data speak for themselves".
  Such an "objective" Bayesian analysis would use a reference or noninformative prior with a density described as vague, flat, or diffuse.




Proper and improper priors

■ Recall: In estimating the mean θ of a normal model with known variance σ², if the prior precision, 1/τ₀², is small relative to the data precision, n/σ², then the posterior distribution is approximately as if τ₀² = ∞:

  p(θ|y) ≈ N(θ|ȳ, σ²/n)

■ Conclusion: the posterior distribution is approximately what would result from assuming

  p(θ) ∝ constant for θ ∈ (−∞, ∞).
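As a numerical check of this limit (a sketch with made-up data values, not from the slides), the conjugate normal posterior with a very diffuse prior collapses to N(ȳ, σ²/n):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                               # known sampling variance
y = rng.normal(10.0, np.sqrt(sigma2), size=50)
n, ybar = len(y), y.mean()

# Conjugate prior theta ~ N(mu0, tau0^2) with a very diffuse tau0^2.
mu0, tau02 = 0.0, 1e8
post_prec = 1 / tau02 + n / sigma2         # posterior precision
post_mean = (mu0 / tau02 + n * ybar / sigma2) / post_prec
post_var = 1 / post_prec

# Flat-prior limit: p(theta | y) ≈ N(ybar, sigma^2 / n).
print(post_mean, ybar)                     # nearly identical
print(post_var, sigma2 / n)                # nearly identical
```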




■ The integral of the "flat" distribution p(θ) ∝ 1 for θ ∈ (−∞, ∞) is not finite. In this case the distribution is referred to as improper.

A prior density p(θ) is proper if it does not depend on the data and integrates to 1.
(Provided the integral is finite, the density can always be normalized to integrate to 1.)

Despite the impropriety of the prior distribution in the example with the normal sampling model, the posterior distribution is proper, given at least one data point.




Second example (Section 2.7)

■ Normal model with known mean θ but unknown variance, with the conjugate scaled inverse-χ² prior distribution σ² ∼ Inv-χ²(ν₀, σ₀²) (that is, σ₀²ν₀/σ² ∼ χ²_ν₀).

■ If the prior degrees of freedom, ν₀, are small relative to the data degrees of freedom, n, then the posterior distribution (with v = (1/n) Σᵢ₌₁ⁿ (yᵢ − θ)²)

  σ²|y ∼ Inv-χ²(ν₀ + n, (ν₀σ₀² + nv)/(ν₀ + n))

  is approximately as if ν₀ = 0:

  p(σ²|y) ≈ Inv-χ²(n, v).
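The limiting posterior can be sampled directly from its definition (a sketch with simulated data; the data values are made up): σ²|y ∼ Inv-χ²(n, v) means nv/σ² ∼ χ²_n.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                                # known mean
y = rng.normal(theta, 3.0, size=200)
n = len(y)
v = np.mean((y - theta) ** 2)              # sufficient statistic

# sigma^2 | y ~ Inv-chi^2(n, v) means n*v / sigma^2 ~ chi^2_n,
# so posterior draws are n*v divided by chi^2_n variates.
draws = n * v / rng.chisquare(df=n, size=100_000)

# Exact posterior mean of sigma^2 is n*v / (n - 2); compare with Monte Carlo.
print(draws.mean(), n * v / (n - 2))
```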




Improper prior can lead to proper posterior

Note:

  p(σ²) ∝ 1/σ² (improper) ⇒ σ²|y ∼ Inv-χ²(n, v).

■ The combination of an improper prior density and a normal likelihood does not define a proper joint probability model, p(y, θ).

■ But applying

  p(θ|y) ∝ p(y|θ)p(θ),

  still gives a proper posterior density, since ∫ p(θ|y)dθ is finite for all y.

This is not always true! Posterior distributions obtained from improper priors must be treated with great care.




Noninformative prior distributions for the binomial parameter

Binomial sampling model y|θ ∼ Bin(n, θ)

Various noninformative prior distributions for θ (see Section 2.9, page 63):

1. Bayes–Laplace uniform prior density θ ∼ Beta(1, 1), that is p(θ) ∝ 1, θ ∈ (0, 1).

2. Jeffreys' rule suggests Beta(1/2, 1/2): p(θ) ∝ θ^(−1/2)(1 − θ)^(−1/2).

3. From the exponential family representation of the binomial distribution:
   p(logit(θ)) ∝ constant.
   That is, p(θ) ∝ θ^(−1)(1 − θ)^(−1): the improper Beta(0, 0).
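Under any Beta(α, β) prior the posterior is Beta(α + y, β + n − y), so the three choices can be compared directly (a sketch; the data values 7 successes in 20 trials are illustrative, not from the slides):

```python
n, y = 20, 7                                # illustrative data

priors = {
    "Bayes-Laplace Beta(1,1)": (1.0, 1.0),
    "Jeffreys Beta(1/2,1/2)":  (0.5, 0.5),
    "logit-flat Beta(0,0)":    (0.0, 0.0),  # improper prior
}

for name, (a, b) in priors.items():
    a_post, b_post = a + y, b + n - y       # conjugate Beta update
    mean = a_post / (a_post + b_post)       # posterior mean of theta
    print(f"{name}: Beta({a_post}, {b_post}), mean = {mean:.4f}")
```

The three posterior means differ only slightly, which previews the point made on the next slide.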




Difference is small

Note that:

1. For the binomial and other single-parameter models, different principles give (slightly) different noninformative prior distributions.

2. The difference between these alternatives is generally small (getting from θ ∼ Beta(0, 0) to θ ∼ Beta(1, 1) requires just one more success and one more failure).

3. If y = 0 or n, the Beta(0, 0) prior leads to an improper posterior!

4. But for two cases—location parameters and scale parameters—all principles seem to agree.




1. θ is a location parameter, that is, p(y − θ|θ) is free of θ and y.

   Noninformative prior:

   p(θ) ∝ constant, θ ∈ (−∞, ∞).

2. θ is a scale parameter, that is, p(y/θ|θ) is free of θ and y.

   Noninformative prior:

   p(θ) ∝ 1/θ, θ ∈ (0, ∞).

   Equivalently, p(log(θ)) ∝ 1, and p(θ²) ∝ 1/θ².
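The equivalence of the three scale-prior statements follows from the change-of-variables formula (a short derivation, not on the slides). With φ = log θ, so θ = e^φ:

```latex
p_{\phi}(\phi) = p_{\theta}(e^{\phi})\left|\frac{d\theta}{d\phi}\right|
  \propto \frac{1}{e^{\phi}}\, e^{\phi} = 1,
\qquad \phi = \log\theta,
```

so log θ is uniform on (−∞, ∞); and with ψ = θ², so θ = √ψ:

```latex
p_{\psi}(\psi) = p_{\theta}(\sqrt{\psi})\left|\frac{d\theta}{d\psi}\right|
  \propto \frac{1}{\sqrt{\psi}}\cdot\frac{1}{2\sqrt{\psi}}
  \propto \frac{1}{\psi},
\qquad \psi = \theta^{2},
```

recovering p(θ²) ∝ 1/θ².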




General principles?

But beware that even these principles can be misleading in some problems:

(Noninformative/improper) prior distributions can lead to improper and thus uninterpretable posterior distributions.

All noninformative prior specifications are arbitrary, and if the results are sensitive to the particular choice, then more effort in specifying genuine prior information is required to justify any particular inference.


Multiparameter models: Introduction

(Reference: Gelman et al. 3.1)




Introduction

■ The reality of applied statistics: there are always several (maybe many) unknown parameters!

■ BUT the interest usually lies in only a few of these (parameters of interest), while the others are regarded as nuisance parameters, for which we have no interest in making inferences but which are required in order to construct a realistic model.

■ At this point the simple conceptual framework of the Bayesian approach reveals its principal advantage over other forms of inference.




The Bayesian approach

The Bayesian approach is clear:

1. Obtain the joint posterior distribution of all unknowns.

2. Integrate over the nuisance parameters to leave the marginal posterior distribution for the parameters of interest.

Alternatively, using simulation:

1. Draw samples from the entire joint posterior distribution (difficult?).

2. Look at the parameters of interest and ignore the rest.




Averaging over nuisance parameters

■ Suppose that θ has two parts: θ = (θ₁, θ₂)
  θ₁ — parameter of interest;
  θ₂ — nuisance parameter.

■ For example:

  y|µ, σ² ∼ N(µ, σ²),

  with both µ (= θ₁) and σ² (= θ₂) unknown.

■ Parameter of interest: µ.




Averaging over nuisance parameters

■ AIM: To obtain the marginal posterior distribution p(θ₁|y).

■ Joint posterior density:

  p(θ₁, θ₂|y) ∝ p(y|θ₁, θ₂)p(θ₁, θ₂),

■ Averaging or integrating over θ₂:

  p(θ₁|y) = ∫ p(θ₁, θ₂|y)dθ₂.




Factoring the joint posterior

Alternatively,

  p(θ₁|y) = ∫ p(θ₁|θ₂, y)p(θ₂|y)dθ₂.    (3.1)

This shows p(θ₁|y) as a mixture of the conditional posterior distributions given the nuisance parameter, θ₂, where p(θ₂|y) is a weighting function for the different possible values of θ₂.




Mixtures of conditionals

■ The weights depend on the posterior density of θ₂—so on a combination of evidence from the data and the prior model.

■ What if θ₂ is known to have a particular value?

The averaging over nuisance parameters can be interpreted very generally: θ₂ can be categorical (discrete) and may take only a few possible values representing, for example, different sub-models.




A strategy for computation

We rarely evaluate integral (3.1) explicitly, but it suggests an important strategy for constructing and computing with multiparameter models, using simulation:

■ Draw θ₂ from its marginal posterior distribution p(θ₂|y).

■ Draw θ₁ from the conditional posterior distribution p(θ₁|θ₂, y), given the drawn value of θ₂.


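For the normal model treated later in this chapter (Gelman et al., Section 3.2), this two-step strategy is exact: σ²|y ∼ Inv-χ²(n−1, s²) and µ|σ², y ∼ N(ȳ, σ²/n). A sketch with simulated data (the data values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=100)          # simulated data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

m = 50_000
# Step 1: draw sigma^2 from its marginal posterior Inv-chi^2(n-1, s^2),
# i.e. (n-1)*s^2 divided by chi^2_{n-1} variates.
sigma2 = (n - 1) * s2 / rng.chisquare(df=n - 1, size=m)
# Step 2: draw mu from its conditional posterior N(ybar, sigma^2/n).
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# The resulting draws of mu follow its marginal posterior, centered at ybar.
print(mu.mean(), ybar)
```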




Conditional simulation

Conditional simulation is performed indirectly. Altering step 1 to draw θ₂ from its conditional posterior distribution given θ₁ leads to the Gibbs sampler, often used in Bayesian analysis.

1. Give an initial value of θ₁.

2. Draw θ₂ from p(θ₂|θ₁, y), given the drawn value of θ₁.

3. Draw θ₁ from p(θ₁|θ₂, y), given θ₂.

4. Go back to Step 2, and iterate for a certain number of loops.

The procedure will ultimately generate samples from the marginal posterior distributions of both θ₁ and θ₂—the Gibbs sampler.


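A minimal Gibbs sampler for the normal model sketches these four steps, alternating the full conditionals µ|σ², y ∼ N(ȳ, σ²/n) and σ²|µ, y ∼ Inv-χ²(n, v(µ)) with v(µ) = (1/n)Σ(yᵢ − µ)², which hold under the p(µ, σ²) ∝ 1/σ² prior (the data are simulated; values are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(5.0, 2.0, size=100)          # simulated data
n, ybar = len(y), y.mean()

iters, burn = 5_000, 500
mu, sigma2 = 0.0, 1.0                       # step 1: initial values
mu_draws, s2_draws = [], []

for t in range(iters):
    # Step 2: sigma^2 | mu, y ~ Inv-chi^2(n, v(mu)), v(mu) = mean((y - mu)^2)
    v = np.mean((y - mu) ** 2)
    sigma2 = n * v / rng.chisquare(df=n)
    # Step 3: mu | sigma^2, y ~ N(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Step 4: iterate; keep draws after a burn-in period
    if t >= burn:
        mu_draws.append(mu)
        s2_draws.append(sigma2)

print(np.mean(mu_draws), np.mean(s2_draws))
```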


Multiparameter models: Normal mean & variance

(Reference: Gelman et al. 3.2)




Normal data with noninformative prior

■ The data: yᵢ iid ∼ N(µ, σ²).

■ Prior distribution: noninformative (which is easily extended to informative priors).

  We assume prior independence of the location and scale parameters and take p(µ, σ²) to be uniform on (µ, log(σ²)):

  p(µ, σ²) ∝ (σ²)⁻¹.


Normal data with noninformative prior

■ The data: y1 , . . . , yn ∼ N (µ, σ 2 ), iid.

■ Prior distribution: noninformative (which is easily extended to informative priors).

  We assume prior independence of location and scale parameters and take p(µ, σ 2 ) to be uniform on (µ, log(σ 2 )):

      p(µ, σ 2 ) ∝ (σ 2 )−1 .



The joint posterior distribution p(µ, σ 2 |y)

Under the improper prior distribution,

      joint posterior distribution ∝ likelihood × 1/σ 2 :

      p(µ, σ 2 |y) ∝ σ −n−2 exp( −[1/(2σ 2 )] Σ_{i=1}^n (yi − µ)2 )
                   = σ −n−2 exp( −[1/(2σ 2 )] [ Σ_{i=1}^n (yi − ȳ)2 + n(ȳ − µ)2 ] )
                   = σ −n−2 exp( −[1/(2σ 2 )] [ (n − 1)s2 + n(ȳ − µ)2 ] ),     (3.2)

where s2 = [1/(n − 1)] Σ_{i=1}^n (yi − ȳ)2 .

Sufficient statistics: ȳ and s2 .
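The key algebraic step behind (3.2) is the sum-of-squares decomposition Σ(yi − µ)2 = (n − 1)s2 + n(ȳ − µ)2 . A quick numerical check of this identity (the data and the value of µ below are made up, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=20)   # illustrative data
mu = 4.2                            # any fixed value of mu
n, ybar = len(y), y.mean()
s2 = y.var(ddof=1)                  # s^2 = [1/(n-1)] * sum((y - ybar)^2)

lhs = np.sum((y - mu) ** 2)                  # sum((y_i - mu)^2)
rhs = (n - 1) * s2 + n * (ybar - mu) ** 2    # (n-1)s^2 + n(ybar - mu)^2
print(lhs, rhs)                              # identical up to rounding
```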



The conditional posterior dist’n p(µ|σ 2 , y)

■ We now factor the joint posterior density as the product of the conditional p(µ|σ 2 , y) and the marginal p(σ 2 |y):

      p(µ, σ 2 |y) = p(µ|σ 2 , y) × p(σ 2 |y)

■ For the conditional, we can use a previous result for the mean of a normal distribution with known variance:

      µ|σ 2 , y ∼ N (ȳ, σ 2 /n).     (3.3)



The marginal posterior dist’n p(σ 2 |y)

This requires averaging the joint distribution (3.2) over µ, that is, evaluating the simple normal integral

      ∫ exp( −[1/(2σ 2 )] n(ȳ − µ)2 ) dµ = √(2πσ 2 /n).

Thus,

      p(σ 2 |y) ∝ (σ 2 )−[(n−1)/2+1] exp( −(n − 1)s2 /(2σ 2 ) ),     (3.4)

which is a scaled inverse-χ2 density:

      σ 2 |y ∼ Inv-χ2 (n − 1, s2 ).     (3.5)
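A draw from the scaled inverse-χ2 in (3.5) is conventionally obtained by drawing X ∼ χ2 with n − 1 degrees of freedom and setting σ 2 = (n − 1)s2 /X. A hedged Python sketch (the values of n and s2 below are illustrative):

```python
import numpy as np

def draw_sigma2(n, s2, size, rng):
    """Draw from Inv-chi^2(n-1, s^2): sigma^2 = (n-1)*s^2 / X with X ~ chi^2_{n-1}."""
    x = rng.chisquare(df=n - 1, size=size)
    return (n - 1) * s2 / x

rng = np.random.default_rng(2)
sigma2 = draw_sigma2(n=25, s2=4.0, size=100_000, rng=rng)
# Sanity check against the exact mean of Inv-chi^2(nu, s^2), nu*s^2/(nu-2) for nu > 2:
print(sigma2.mean(), 24 * 4.0 / 22)
```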



Parallel between Bayes & frequentist results

As with one-parameter normal results, there is a remarkable parallel with sampling theory:

Bayes:
      (n − 1)s2 /σ 2 | y ∼ χ2n−1 .

Frequentist:
      (n − 1)s2 /σ 2 | µ, σ 2 ∼ χ2n−1 .



Marginal posterior distribution of µ

■ µ is typically the estimand of interest, so the ultimate objective of the Bayesian analysis is the marginal posterior distribution of µ.

■ Analytically: this can be obtained by integrating σ 2 out of the joint posterior distribution.

■ Easily done by simulation: first draw σ 2 from (3.5), then draw µ from (3.3).

The posterior distribution of µ can be thought of as a mixture of normal distributions, mixed over the scaled inverse-χ2 distribution for the variance.
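The simulation recipe just described takes only a few lines. The slides use R; the following is a hedged Python sketch (the data vector and the number of draws are illustrative assumptions):

```python
import numpy as np

def sample_mu_sigma2(y, n_draws, seed=0):
    """Simulate from p(mu, sigma^2 | y) under the prior p(mu, sigma^2) ∝ 1/sigma^2:
    first draw sigma^2 ~ Inv-chi^2(n-1, s^2) as in (3.5),
    then draw mu | sigma^2 ~ N(ybar, sigma^2/n) as in (3.3)."""
    rng = np.random.default_rng(seed)
    n, ybar, s2 = len(y), np.mean(y), np.var(y, ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    return mu, sigma2

y = np.array([9.8, 10.4, 10.1, 9.5, 10.3, 9.9, 10.0, 10.2])  # made-up data
mu, sigma2 = sample_mu_sigma2(y, n_draws=50_000)
print(mu.mean())  # centers on ybar = 10.025
```

Because each µ is drawn conditionally on its own σ 2 draw, the resulting µ values are exactly the normal mixture described above.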



Analytic form?

We start by integrating the joint posterior density (3.2) over σ 2 :

      p(µ|y) = ∫_0^∞ p(µ, σ 2 |y) dσ 2 .

This can be evaluated using the substitution

      z = A/(2σ 2 ),   where A = (n − 1)s2 + n(µ − ȳ)2 .



Marginal posterior distribution of µ

We recognize (!) that the result is an unnormalized gamma integral:

      p(µ|y) ∝ A−n/2 ∫_0^∞ z (n−2)/2 exp(−z) dz
             ∝ [ (n − 1)s2 + n(µ − ȳ)2 ]−n/2
             ∝ [ 1 + n(µ − ȳ)2 /((n − 1)s2 ) ]−n/2 .

That is the tn−1 (ȳ, s2 /n) density (see Appendix A).



Equivalently, under the noninformative uniform prior distribution on (µ, log(σ)), the posterior distribution of µ is

      (µ − ȳ)/(s/√n) | y ∼ tn−1 ,

where tn−1 is the standard Student-t density (location 0, scale 1) with n − 1 degrees of freedom.

Comparing with sampling theory:

      (µ − ȳ)/(s/√n) | µ, σ 2 ∼ tn−1 ,

■ the sampling distribution does not depend on the nuisance parameter σ 2 ;
■ the posterior distribution of the pivot does not depend on the data.
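The claim that the pivot (µ − ȳ)/(s/√n) has a tn−1 posterior distribution can be checked by simulation: draw (σ 2 , µ) from the joint posterior and compare a quantile of the pivot with the corresponding Student-t quantile. The data below are illustrative; t0.975 with 7 degrees of freedom ≈ 2.365 is a standard table value.

```python
import numpy as np

rng = np.random.default_rng(3)
y = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.0])  # made-up data, n = 8
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
s = np.sqrt(s2)

# Draw (sigma^2, mu) from the joint posterior, then form the pivot.
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=200_000)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
pivot = (mu - ybar) / (s / np.sqrt(n))

# For t with 7 degrees of freedom, the 97.5% quantile is about 2.365.
print(np.quantile(pivot, 0.975))
```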
Exercises

Nov. 14: Ex 3.1, 3.2, 3.5




Introduction to Multiparameter Models (II)

— Example: Bioassay experiment

References:
(1) Gelman et. al. 3.7
(2) Jim Albert, Chapter 4



Multiparameter models

■ Few multiparameter sampling models allow explicit calculation of the posterior distribution.

■ Data analysis for such models is usually achieved with simulation (especially MCMC methods).

■ We will illustrate with a nonconjugate model for a bioassay experiment, using a two-parameter generalized linear model.



Bioassay example

■ In drug development, acute toxicity tests are performed in animals.

■ Various dose levels of the compound are administered to batches of animals.

■ Animals’ responses are typically characterized by a binary outcome: alive or dead, tumor or no tumor, response or no response, etc.



Data Structure

■ Such an experiment gives rise to data of the form

      (xi , ni , yi ),   i = 1, 2, · · · , k,

  where
  ◆ xi is the ith dose level (often measured on a logarithmic scale);
  ◆ ni is the number of animals given the ith dose level;
  ◆ yi is the number of animals with a positive outcome (tumor, death, response).



Example Data

For the example data, 20 animals were tested, 5 at each of 4 dose levels.

      Dose, xi      Number of      Number of
      (log g/ml)    animals, ni    deaths, yi
      -0.86         5              0
      -0.30         5              1
      -0.05         5              3
       0.73         5              5

Source: Racine A, Grieve AP, Fluhler H, Smith AFM. (1986). Bayesian methods in practice: experiences in the pharmaceutical industry (with discussion). Applied Statistics 35, 93-150.



Sampling model at each dose level

Within dosage level i:

■ The animals are assumed to be exchangeable (there is no information to distinguish among them).

■ We model the outcomes as independent given the same probability of death θi , which leads to the familiar binomial sampling model:

      yi |θi ∼ Bin(ni , θi ).



Setting up a model across dose levels

■ Modeling the response at several dosage levels requires a relationship between the θi ’s and the xi ’s.

■ Note: we start by assuming that each θi is an independent parameter. We relax this assumption when we develop hierarchical models (see Chapter 5).

■ There are many possibilities for relating the θi ’s to the xi ’s, but a popular and reasonable choice is a logistic regression model:

      logit(θi ) = log( θi /(1 − θi ) ) = α + βxi .



Prior distribution for (α, β):

■ We assume that p(α, β) is independent and locally uniform in the two parameters, that is,

      p(α, β) ∝ 1.

  This is an improper “noninformative” distribution.

■ We need to check that the posterior distribution is proper (details not shown).

Noninformative/improper prior distributions can lead to improper, and thus uninterpretable, posterior distributions.



Describing the posterior distribution

The form of the posterior distribution:

      p(α, β|y) ∝ p(α, β) p(y|α, β)
                ∝ ∏_{i=1}^k [ e^{α+βxi} /(1 + e^{α+βxi} ) ]^{yi} [ 1/(1 + e^{α+βxi} ) ]^{ni −yi}

Parameter Estimation:

■ One approach would be to use a normal approximation (see Chapter 4) centered at the posterior mode (α̃ = 0.87, β̃ = 7.91).

■ A second approach (similar to the above) is to obtain maximum likelihood estimates (e.g. by running glm in R). Asymptotic standard errors can be obtained via ML theory.
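With the flat prior, the unnormalized log posterior is just the binomial log likelihood evaluated along the logistic curve. A hedged Python sketch for the example data, which also checks that the stated mode (0.87, 7.91) scores higher than some arbitrary alternative points (the comparison points are illustrative):

```python
import numpy as np

x = np.array([-0.86, -0.30, -0.05, 0.73])  # log dose
n = np.array([5, 5, 5, 5])                  # animals per dose
y = np.array([0, 1, 3, 5])                  # deaths per dose

def log_post(alpha, beta):
    """Unnormalized log posterior under p(alpha, beta) ∝ 1:
    sum_i [ y_i*eta_i - n_i*log(1 + e^{eta_i}) ] with eta_i = alpha + beta*x_i,
    which equals sum_i [ y_i*log(theta_i) + (n_i - y_i)*log(1 - theta_i) ]."""
    eta = alpha + beta * x
    return float(np.sum(y * eta - n * np.log1p(np.exp(eta))))

# The stated posterior mode should score higher than arbitrary alternatives.
print(log_post(0.87, 7.91), log_post(0.0, 0.0), log_post(2.0, 2.0))
```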


Figure 1: (a) Contour plot of the posterior density of the parameters α and β. (b) Scatterplot of 1000 draws from the posterior distribution.



Discrete approx. to the posterior density

We illustrate computing the joint posterior distribution for (α, β) at a grid of points in two dimensions:

1. We begin with a rough estimate of the parameters.

   ■ Since
         logit(E[(yi /ni )|α, β]) = α + βxi ,
     we obtain rough estimates of α and β using a linear regression of logit(yi /ni ) on xi .
   ■ Set y1 = 0.5, y4 = 4.5 to enable the calculation.
   ■ α̂ = 0.1, β̂ = 2.9 (standard errors 0.3 and 0.5).



Discrete approx. to the posterior density

2. Evaluate the posterior on a 200 × 200 grid; use Introduction to


Multiparameter
range [−5, 10] × [−10, 40]. Models (II)

— Example:
3. Use R to produce a contour plot (lines of equal Bioassay
experiment
posterior density). References:
P P (1) Gelman
4. Renormalize on grid so that α β p(α, β|y) = 1 et. al. 3.7
(2) Jim
(i.e., create discrete approx to posterior) Albert, Chapter 4
Multiparameter
5. Sample from marginal Pdistribution of one models
Bioassay example
parameter: p(α|y) = β p(α, β|y) Example Data
Sampling Model
6. Sample from conditional distribution of second Model across dose
levels
parameter: p(β|α, y) Approximation
Approximation
7. We can improve sampling slightly by drawing Posterior inference
from linear interpolation between grid points. Results

SCHOOL OF FINANCE AND S TAT I S T I C S

March 30, 2009 Chapter 2 - p. 43/55


Discrete approx. to the posterior density

(Example: Bioassay experiment. References: (1) Gelman et. al. 3.7; (2) Jim Albert, Chapter 4)

2. Evaluate the posterior on a 200 × 200 grid; use range [−5, 10] × [−10, 40].

3. Use R to produce a contour plot (lines of equal posterior density).

4. Renormalize on the grid so that Σα Σβ p(α, β|y) = 1 (i.e., create a discrete approximation to the posterior).

5. Sample from the marginal distribution of one parameter: p(α|y) = Σβ p(α, β|y).

6. Sample from the conditional distribution of the second parameter: p(β|α, y).

7. We can improve sampling slightly by drawing from a linear interpolation between grid points.

Alternative: exact posterior using advanced computation (methods covered later)

SCHOOL OF FINANCE AND STATISTICS

March 30, 2009 Chapter 2 - p. 43/55
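Steps 2–7 above can be sketched in code (Python here, rather than the R mentioned in step 3). The dose–death data below are the standard bioassay values from Gelman et al., and a flat prior p(α, β) ∝ 1 is assumed.

```python
import numpy as np

# Bioassay data (dose x_i, animals n_i, deaths y_i) -- the standard
# values from Gelman et al.
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

# Step 2: evaluate the log posterior (flat prior) on a 200 x 200 grid
# over [-5, 10] x [-10, 40].
alpha = np.linspace(-5, 10, 200)
beta = np.linspace(-10, 40, 200)
A, B = np.meshgrid(alpha, beta, indexing="ij")  # A[i, j] = alpha_i, B[i, j] = beta_j

log_post = np.zeros_like(A)
for xi, ni, yi in zip(x, n, y):
    t = A + B * xi
    # y*log(theta) + (n-y)*log(1-theta) with theta = logit^{-1}(t),
    # written via logaddexp for numerical stability
    log_post += -yi * np.logaddexp(0, -t) - (ni - yi) * np.logaddexp(0, t)

# Step 4: renormalize so the grid probabilities sum to 1.
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Steps 5-6: sample alpha from its marginal, then beta given alpha.
rng = np.random.default_rng(0)
marg_alpha = post.sum(axis=1)
draws = np.empty((1000, 2))
for s in range(1000):
    i = rng.choice(200, p=marg_alpha)
    j = rng.choice(200, p=post[i] / post[i].sum())
    draws[s] = alpha[i], beta[j]

print(draws.mean(axis=0))  # rough posterior means of (alpha, beta)
```

Step 7 (interpolation between grid points) is omitted here; the discrete draws are usually adequate for a grid this fine.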


Posterior inference

Quantities of interest: (α, β)

LD50 = dose at which Pr(death) is 0.5 = −α/β

Why?

Because E(yi /ni) = θi = logit⁻¹(α + βxi) = 0.5 gives α + βxi = logit(0.5) = 0, thus xi = −α/β.

■ This is meaningless if β ≤ 0 (substance not harmful).
■ We perform inference in two steps:
  ◆ Pr(β > 0|y)
  ◆ posterior distribution of LD50 conditional on β > 0
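The two-step inference can be sketched as follows. The (α, β) draws below are illustrative values generated from a rough normal approximation (in practice they would come from the grid sampler), so the printed numbers are not the book's results.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative (alpha, beta) posterior draws; the mean and covariance
# below are invented to loosely mimic the bioassay posterior, purely
# so the example is self-contained.
draws = rng.multivariate_normal([1.3, 11.0], [[1.2, 2.0], [2.0, 30.0]], size=1000)
a, b = draws[:, 0], draws[:, 1]

# Step 1: posterior probability that the substance is harmful.
p_harmful = (b > 0).mean()

# Step 2: LD50 = -alpha/beta, conditional on beta > 0.
ld50 = -a[b > 0] / b[b > 0]

print(f"Pr(beta > 0 | y) approx {p_harmful:.3f}")
print("LD50 posterior quantiles (2.5%, 50%, 97.5%):",
      np.percentile(ld50, [2.5, 50, 97.5]).round(3))
```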


Results

We take 1000 simulation draws of (α, β) from the grid
(a different posterior sample from the results in the book).

Note that β > 0 for all 1000 draws.

Summary of the posterior distribution:

                posterior quantiles
         2.5%     25%     50%     75%   97.5%
α        -0.6     0.6     1.3     2.0     4.1
β         3.5     7.5    11.0    15.2    26.0
LD50    -0.28   -0.16   -0.11   -0.06    0.12


Lessons from simple examples

(Reference: Gelman et. al. 3.8)


Lessons from simple examples

The lack of multiparameter models with explicit posterior distributions is not necessarily a barrier to analysis.

We can use simulation, perhaps after replacing sophisticated models with hierarchical or conditional models (possibly invoking a normal approximation in some cases).


■ Inference from the posterior distribution
  ◆ one parameter at a time
  ◆ simple graphical methods
  ◆ analytic methods
  ◆ simulation

■ No need to rely on asymptotics for inference

■ The real advantage of the Bayesian approach is to be seen in more complex settings (but computational strategies will need to be extended)...


The five steps of Bayesian inference

1. Write the likelihood p(y|θ).

2. Generate the posterior as p(θ|y) ∝ p(θ)p(y|θ), either by including well-formulated information in p(θ) or else by using p(θ) = constant.

3. Get crude estimates for θ as a starting point or for comparison.

4. Draw simulations θ^(1), · · · , θ^(L) from the posterior distribution and compute the posterior density of any function of θ that may be of interest.

5. Simulate ỹ^(1), · · · , ỹ^(L) by drawing each ỹ^(l) from the predictive distribution p(ỹ|θ^(l)) for any predictive quantities ỹ of interest.
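As a minimal illustration of the five steps, take a Binomial likelihood with a flat prior, so the posterior is Beta in closed form; the data (y = 14 successes in n = 20 trials) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, y = 20, 14  # invented data: y successes in n Bernoulli trials

# Steps 1-2: Binomial likelihood with flat prior p(theta) = constant
# gives the posterior theta | y ~ Beta(y + 1, n - y + 1).
a_post, b_post = y + 1, n - y + 1

# Step 3: crude estimate for comparison (here the MLE y/n).
theta_hat = y / n

# Step 4: draw theta^(1), ..., theta^(L) and summarize any function of
# theta, e.g. the odds theta / (1 - theta).
L = 5000
theta = rng.beta(a_post, b_post, size=L)
odds_interval = np.percentile(theta / (1 - theta), [2.5, 97.5])

# Step 5: posterior predictive draws ytilde^(l) ~ Binomial(n, theta^(l)).
y_rep = rng.binomial(n, theta)

print(theta_hat, theta.mean().round(3), odds_interval.round(2), y_rep.mean().round(2))
```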


APPENDIX: Large sample Bayesian inference

(Reference: Gelman et. al. Chapter 4)


Bayesian and noninformative priors

■ Many simple Bayesian analyses using noninformative priors give results similar to standard non-Bayesian approaches; e.g. the posterior t interval for the normal distribution with unknown mean and variance.

■ The extent to which a "noninformative" prior can be justified as objective depends on a judgement that the "data dominate the prior".

■ As the sample size increases, the amount of information available in the data increases, and the influence of the prior distribution on posterior inference decreases.


Normal approximations to the joint posterior distribution

■ If p(θ|y) is unimodal and roughly symmetric, it is often convenient to approximate it by a normal distribution centred at the mode; that is, we approximate the logarithm of the posterior density by a quadratic function.

■ Let θ̂ denote the posterior mode. Then a Taylor series expansion of log p(θ|y) centered at θ̂ (θ may be a vector, and θ̂ is assumed to be in the interior of the parameter space) gives

    p(θ|y) ≈ N(θ̂, [I(θ̂)]⁻¹),

where I(θ) is the observed information,

    I(θ) = −(d²/dθ²) log p(θ|y).
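A sketch of the approximation in one dimension: take a Beta(15, 7) density as the "posterior" (an arbitrary choice for illustration), find the mode, and estimate I(θ̂) by a finite-difference second derivative of the log density.

```python
import numpy as np

# Example "posterior": a Beta(15, 7) density (arbitrary choice).
a, b = 15, 7
log_post = lambda t: (a - 1) * np.log(t) + (b - 1) * np.log(1 - t)

# Posterior mode: for a Beta(a, b) density this is (a-1)/(a+b-2).
theta_hat = (a - 1) / (a + b - 2)

# Observed information: minus the second derivative of the LOG posterior
# at the mode, here estimated with a central finite difference.
h = 1e-4
info = -(log_post(theta_hat + h) - 2 * log_post(theta_hat)
         + log_post(theta_hat - h)) / h**2

# Normal approximation: p(theta|y) is approximated by N(theta_hat, 1/info).
approx_sd = 1.0 / np.sqrt(info)
print(theta_hat, approx_sd)
```

For the Beta case the information is also available analytically, I(θ̂) = (a−1)/θ̂² + (b−1)/(1−θ̂)², which the finite difference should match closely.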


Note:

■ If θ̂ is in the interior of the parameter space, then I(θ̂) > 0.
■ If θ is a vector, then I(θ) is a matrix.

Example 1: Normal distribution with unknown mean and variance; see Gelman et al., page 102.
Example 2: Bioassay experiment; see Gelman et al., page 104.


Transformations

Transformations: in many cases, convergence to normality of the posterior distribution of θ can be dramatically improved by transformation.

If φ is a continuous transformation of θ, then both p(φ|x) and p(θ|x) converge to normal distributions, but the closeness of the approximation for finite n may be very different.

Most commonly used transformations:
■ logarithmic: transforms (0, ∞) to (−∞, ∞)
■ logistic: transforms (0, 1) to (−∞, ∞)
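A small sketch of why the logistic transformation can help: draws from a skewed Beta posterior (parameters chosen arbitrarily for illustration) are noticeably less skewed after mapping through the logit.

```python
import numpy as np

rng = np.random.default_rng(3)
# A right-skewed posterior on (0, 1): Beta(2, 20), chosen for illustration.
theta = rng.beta(2, 20, size=20000)

# Logistic (logit) transformation maps (0, 1) onto (-inf, inf).
phi = np.log(theta / (1 - theta))

# Sample skewness as a crude measure of non-normality.
skew = lambda z: float(np.mean(((z - z.mean()) / z.std()) ** 3))
print(f"skewness of theta: {skew(theta):.2f}; of logit(theta): {skew(phi):.2f}")
```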


Exercises

Nov. 21:
Ex 3.10, 3.11
