Tang Yin-cai
yctang@stat.ecnu.edu.cn
Hierarchical Models — Setting-up and Examples

Hierarchical models (I) — introduction
(Ref.: Gelman et al., 5.1-5.3; Berger, 4.6; J. Albert, 7)
Outline:
• Review 1
• Review 2
• Empirical Bayes
• Why Hierarchical?
• Hierarchical Model
• Hierarchical approach
• Exchangeability
• Basic exchangeable model
• General exchangeable model
• Typical structure
• Posterior distribution
• Predictive distribution
Strategy of Computation (for all Bayesian analysis!)

Simulation — Draw samples from univariate distributions:
draw θ2, then given θ2 draw θ1.
■ If θ2 is given, then the model degenerates into a one-parameter model.
■ If direct sampling is not possible, then sample from a discretized (grid) approximation of the distribution.
■ For more complex models, advanced computational methods can be used (see Part III, Gelman et al.).
■ ......
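The two-stage strategy (draw θ2, then θ1 given θ2) can be sketched in a few lines; the particular gamma-normal joint used here is a hypothetical illustration, not a model from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint model: theta2 ~ Gamma(2, 1) (a precision),
# theta1 | theta2 ~ N(0, 1/theta2).  Drawing theta2 first and then
# theta1 given theta2 yields draws from the joint p(theta1, theta2).
n_draws = 10_000
theta2 = rng.gamma(shape=2.0, scale=1.0, size=n_draws)
theta1 = rng.normal(loc=0.0, scale=1.0 / np.sqrt(theta2))

# Each (theta1[i], theta2[i]) pair is a joint draw.
print(theta1.shape, theta2.mean())
```

The same factorization, p(θ1, θ2) = p(θ2) p(θ1 | θ2), is what the more advanced samplers in Part III of Gelman et al. exploit repeatedly.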
Empirical Bayes

… method as above. This method is called Empirical Bayes.

How?

■ Example. Estimating the risk of tumor in rats—θ:
◆ Current experiment: y = 4 of n = 14 rats developed tumors.
◆ Bayesian model:
  y|θ ∼ Bin(n, θ)
  θ ∼ Beta(α, β)
◆ Posterior: θ|y ∼ Beta(α + 4, β + 10).
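The conjugate update in this example is easy to verify numerically; the uniform Beta(1, 1) prior below is a hypothetical choice, since the slide leaves (α, β) unspecified:

```python
import numpy as np

# Rat-tumor example from the slide: y = 4 of n = 14 rats developed
# tumors, y|theta ~ Bin(n, theta), theta ~ Beta(alpha, beta), so by
# conjugacy theta|y ~ Beta(alpha + 4, beta + 10).
alpha, beta = 1.0, 1.0          # hypothetical prior choice (uniform)
y, n = 4, 14
a_post, b_post = alpha + y, beta + (n - y)

# Closed-form posterior mean, plus a Monte Carlo check by sampling.
post_mean = a_post / (a_post + b_post)
draws = np.random.default_rng(1).beta(a_post, b_post, size=100_000)
print(post_mean, draws.mean())
```

Empirical Bayes would estimate (α, β) from the data of many such experiments before applying this update.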
SCHOOL OF FINANCE AND STATISTICS
Why Hierarchical?

■ Many problems have multiple parameters that are related.
■ Use a joint probability model to reflect this dependence.
■ It is useful to think hierarchically:
Let's generalize our simple bioassay example:
■ Imagine repeated bioassays with the same compound, where (αj, βj) are the parameters from different bioassays.
■ A single (α, β) may be inadequate to fit a combined data set (several experiments) (⇒ pooled estimate).
■ Separate, unrelated (αj, βj) are likely to overfit the data (only 4 points in each data set).
■ Think: Is there a compromise?
■ — Hierarchical Model: a compromise between the single-data-set estimates and the pooled estimate.
Hierarchical approach:
■ We'd be better off estimating the parameters, say φ, governing the population distribution of (αj, βj), rather than each (αj, βj) separately.
■ This introduces new parameters that govern this population distribution, called hyperparameters.

Hierarchical models use many parameters, but imposing a population distribution induces enough structure (dependence) to avoid overfitting.
Exchangeability — Examples (see pages 122-123):
1. The simplest form: i.i.d. given some unknown parameter.
2. Seemingly non-exchangeable random variables may become exchangeable if we condition on all available information (e.g., covariates in regression analysis).
3. Hierarchical models often use exchangeable models for the prior distribution of model parameters.
This mixture of i.i.d.'s is usually all we need to capture exchangeability in practice.

Bruno de Finetti's Theorem: As J → ∞, any suitably well-behaved exchangeable distribution on θ1, · · · , θJ can be written as an i.i.d. mixture.
The general exchangeable model, with covariates:

  p(θ1, . . . , θJ | x1, . . . , xJ) = ∫ [ ∏_{j=1}^{J} p(θj | φ, xj) ] p(φ | x) dφ,

where x = (x1, . . . , xJ) represents the available information.

■ In this way, exchangeable models become almost universally applicable.
Typical structure of a hierarchical model:
1. p(y|θ) = the sampling distribution of the data.
2. p(θ|φ) = the prior distribution for θ = (θ1, . . . , θJ) given φ — called the population distribution.
3. p(φ) = the prior distribution for φ — called the hyperprior distribution.
4. More levels are possible!
5. The hyperprior at the highest level is often diffuse.
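The three-level structure can be sketched generatively; every distributional choice below (normal hyperprior, normal population, normal sampling) is a hypothetical illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Generative sketch of the three-level structure:
#   phi     ~ p(phi)            hyperprior
#   theta_j ~ p(theta | phi)    population distribution, j = 1..J
#   y_j     ~ p(y | theta_j)    sampling distribution
J, n = 8, 20
phi = rng.normal(0.0, 5.0)                        # hyperparameter
theta = rng.normal(phi, 1.0, size=J)              # group-level parameters
y = rng.normal(theta[:, None], 2.0, size=(J, n))  # n observations per group
print(y.shape)
```

Adding a fourth level would simply mean drawing the parameters of p(φ) from yet another prior.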
The joint posterior distribution:

  p(θ, φ|y) ∝ p(θ, φ) p(y|θ, φ)
            ∝ p(θ, φ) p(y|θ)   (y ind. of φ given θ)
            ∝ p(φ) p(θ|φ) p(y|θ).

♣ Inference (and computation) is often carried out in two steps:
1. Inference for θ as if we knew φ, using the conditional posterior distribution p(θ|y, φ);
2. Inference for φ based on the marginal posterior distribution p(φ|y).
3. Treat θ as the vector parameter of interest and φ as nuisance parameter(s), though they are both of interest.

SCHOOL OF FINANCE AND STATISTICS
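The two-step scheme can be sketched for a normal-normal model with known σ, fixed population sd τ, and a flat hyperprior on the population mean φ (all modeling choices and numbers below are hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate from p(theta, phi | y) in two steps: draw phi from the
# marginal posterior p(phi|y), then theta from the conditional
# posterior p(theta|y, phi).  Model: y_j|theta_j ~ N(theta_j, sigma^2),
# theta_j|phi ~ N(phi, tau^2), flat prior on phi.
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma, tau, J = 10.0, 5.0, len(y)

n_draws = 5000
# Marginally y_j|phi ~ N(phi, sigma^2 + tau^2), so
# phi|y ~ N(ybar, (sigma^2 + tau^2)/J) under the flat hyperprior.
phi = rng.normal(y.mean(), np.sqrt((sigma**2 + tau**2) / J), size=n_draws)

# theta_j|y, phi ~ N(precision-weighted mean, 1/total precision).
prec = 1.0 / sigma**2 + 1.0 / tau**2
theta_mean = (y[None, :] / sigma**2 + phi[:, None] / tau**2) / prec
theta = rng.normal(theta_mean, np.sqrt(1.0 / prec))
print(theta.shape)  # one row of (theta_1..theta_J) per phi draw
```

Pairing each row of `theta` with the corresponding entry of `phi` gives joint draws from p(θ, φ|y).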
Hierarchical models are characterized both by hyperparameters φ and parameters θ.

Two posterior predictive distributions:
■ the distribution of future observations ỹ corresponding to an existing θj (experiment), based on the posterior draws of θj (and/or φ).
■ the distribution of observations ỹ corresponding to a future (experiment) θj drawn from p(θj|φ):
  ◆ draw θ̃ from p(θj|φ);
  ◆ draw ỹ from p(y|θ̃).
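Both predictive distributions can be sketched from posterior draws; the draws below are synthetic stand-ins for the output of a fitted model, and the values τ = 5 and σ = 10 are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical posterior draws (in practice these come from the
# fitted hierarchical model, e.g. via the two-step simulation above).
n_draws, J, tau, sigma = 4000, 8, 5.0, 10.0
phi_draws = rng.normal(8.0, 4.0, size=n_draws)                        # phi | y
theta_draws = rng.normal(phi_draws[:, None], tau, size=(n_draws, J))  # theta | y

# (1) Future observation from an EXISTING experiment j (here j = 0):
y_tilde_existing = rng.normal(theta_draws[:, 0], sigma)

# (2) Observation from a FUTURE experiment: draw theta~ from
#     p(theta|phi), then y~ from p(y|theta~).
theta_new = rng.normal(phi_draws, tau)
y_tilde_future = rng.normal(theta_new, sigma)
print(y_tilde_existing.shape, y_tilde_future.shape)
```

The second distribution is wider, since it also carries the between-experiment uncertainty in θ.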
(Ref.: Gelman et al., 5.4-5.5)

Outline:
• A special case
• SAT coaching example
• Non-hierarchical approaches
• Model specification
• Joint posterior
• Computation
• Normal-Normal summary
• SAT example results
• HM summary
• Computation overview
• Exercises
■ Sample mean: ȳ.j = (1/nj) Σ_{i=1}^{nj} yij, with sampling variance σj² = σ²/nj.
■ Likelihood for θj in terms of the sufficient statistic ȳ.j:
  ȳ.j | θj ∼ N(θj, σj²).
■ This model ("normal with known variance") is appropriate when nj is large enough.
■ Purpose: estimating θj. How?
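A minimal numerical check of the sufficient-statistic claim, with hypothetical values for σ, θj, and nj:

```python
import numpy as np

rng = np.random.default_rng(5)

# For group j with n_j observations and known sigma,
# ybar_j | theta_j ~ N(theta_j, sigma^2 / n_j).
sigma, theta_j, n_j = 4.0, 2.0, 50
y_ij = rng.normal(theta_j, sigma, size=n_j)   # hypothetical group-j data

ybar_j = y_ij.mean()
sampling_var_j = sigma**2 / n_j  # variance of the MEAN, not of the data
print(ybar_j, sampling_var_j)
```

Note that σj² shrinks with nj: the group mean is a far more precise summary than any single observation.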
Hierarchical models (II) — A N-N HM with SAT coaching example
(Ref.: Gelman et al., 5.4-5.5)

ANOVA table:

Source    df        SS                     MS             E(MS | σ², τ)
Between   J − 1     Σi Σj (ȳ.j − ȳ..)²    SS/(J − 1)     nτ² + σ²
Within    J(n − 1)  Σi Σj (yij − ȳ.j)²    SS/(J(n − 1))  σ²
Total     Jn − 1    Σi Σj (yij − ȳ..)²    SS/(Jn − 1)    σ²

■ Conclusions:
◆ If the ratio of the "between" to the "within" mean square is significantly greater than 1, then θ̂j = ȳ.j;
◆ If the ratio of the mean squares is not statistically significant, then the F test cannot reject H0: τ = 0, and θ̂j = ȳ.. .
■ Method 3: weighted combination:
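The slide's formula for the weighted combination is cut off; a standard candidate is the precision-weighted (normal-normal posterior-mean) form sketched below, where the hyperparameter values µ and τ² are assumptions for illustration:

```python
# Precision-weighted compromise between the separate estimate ybar_j
# and a pooled center mu: the standard normal-normal posterior-mean
# weights (an assumed form -- the slide's own formula is not shown).
def weighted_estimate(ybar_j, sigma2_j, mu, tau2):
    w = (1.0 / sigma2_j) / (1.0 / sigma2_j + 1.0 / tau2)
    return w * ybar_j + (1.0 - w) * mu

# tau2 -> 0 recovers the pooled estimate; tau2 -> inf the separate one.
print(weighted_estimate(10.0, 4.0, 2.0, 1e-9))   # ~ pooled center (2.0)
print(weighted_estimate(10.0, 4.0, 2.0, 1e9))    # ~ separate estimate (10.0)
```

This interpolation between the two extremes of the F-test conclusions is exactly the compromise the hierarchical model formalizes.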
SAT: Scholastic Aptitude Test — a standardized test used by American universities in admissions decisions.
■ Purpose: Analyze the effects of special coaching programs on test scores.
■ All students in the experiments had taken the …
■ Parameters θj: the average "true effects" of the coaching programs in each school.
■ Data yj: separately estimated treatment effects for each school.
■ The standard errors σj are assumed known (large samples).
■ This is a randomized experiment with large samples (over 32 students in each school) and no outliers, so we appeal to the central limit theorem:
  yj | θj ∼ N(θj, σj²).
Non-hierarchical approaches:

■ Separate estimates: estimate each θj by its own yj (no pooling).
■ Pooled estimate: ȳ = [ Σ_{j=1}^J yj /σj² ] / [ Σ_{j=1}^J 1/σj² ] applies to each school.

Separate and pooled estimates are both unreasonable! See further on pages 139–141.
■ Prior model for the θj 's is a normal population distribution (conjugate):

    p(θ1 , . . . , θJ |µ, τ ) = ∏_{j=1}^J N(θj |µ, τ²).
■ Marginally (averaging over θj ): yj |µ, τ ∼ N(µ, σj² + τ²), j = 1, . . . , J.
■ Result: µ|τ, y ∼ N(µ̂, Vµ ) with

    µ̂ = [ Σ_{j=1}^J yj /(σj² + τ²) ] / [ Σ_{j=1}^J 1/(σj² + τ²) ]   and   Vµ = 1 / [ Σ_{j=1}^J 1/(σj² + τ²) ].
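As a sanity check on these formulas, a minimal sketch in Python. It uses the eight-school estimates yj from the results table later in these slides; the standard errors σj = (15, 10, 16, 11, 9, 11, 10, 18) are not listed here and are taken from Gelman et al., Table 5.2:

```python
import numpy as np

# Eight-school SAT coaching data (y_j from these slides; sigma_j from
# Gelman et al., Table 5.2).
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def mu_posterior(tau):
    """Conditional posterior mu | tau, y ~ N(mu_hat, V_mu)."""
    w = 1.0 / (sigma**2 + tau**2)        # precision of each y_j given tau
    mu_hat = np.sum(w * y) / np.sum(w)   # precision-weighted mean
    V_mu = 1.0 / np.sum(w)
    return mu_hat, V_mu

# tau = 0 reproduces the classical pooled estimate (about 7.7 with sd
# about 4.1); very large tau approaches the simple average of the y_j.
print(mu_posterior(0.0))
print(mu_posterior(1e6)[0])
```

At τ = 0 the formula collapses to the pooled estimate of the non-hierarchical approach, which is a useful consistency check.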
[Figure: marginal posterior density p(τ |y), plotted for 0 ≤ τ ≤ 30.]
[Figure: conditional posterior mean E(θj |τ, y) (left panel) and conditional posterior SD sd(θj |τ, y) (right panel) for schools A–H, plotted against τ for 0 ≤ τ ≤ 30.]
Posterior quantiles of the treatment effects θj (and of µ), with the raw estimates yj for comparison:

School   2.5%   25%   50%   75%   97.5%   yj
A         -2      7    10    16     31    28
B         -5      3     8    12     23     8
C        -11      2     7    11     19    -3
D         -7      4     8    11     21     7
E         -9      1     5    10     18    -1
F         -7      2     6    10     28     1
G         -1      7    10    15     26    18
H         -6      3     8    13     33    12
µ         -2      5     8    11     18
Hierarchical models (III) — Meta-analysis of clinical trials
(Ref.: Gelman et al., 5.6, 19.4)
■ We'll reinforce some of the concepts of hierarchical modelling in a meta-analysis of clinical-trials data.

Meta-analysis is a statistical method, widely used in medicine and other fields, for quantitatively combining and analyzing the results of multiple independent studies of the same research question.
■ Data: mortality in two groups of heart attack (myocardial infarction) patients, receiving (or not) beta-blockers, in each of 22 clinical trials (sample sizes from 100 to almost 2000; mortality from 3% to 21%; showing a modest, though not 'statistically significant,' benefit from use of beta-blockers).

■ Aim: use a combined analysis of the studies to measure the strength of evidence for (and magnitude of) any beneficial effect of the treatment under study.

■ Note: any formal analysis must be preceded by the application of rigorous inclusion criteria.
Study j   Control (deaths/total)   Treated (deaths/total)   log-odds yj   sd σj
6         6/52                     4/59                     -0.584        0.676
···       ···                      ···                      ···           ···
21        43/364                   27/391                   -0.591        0.257
22        39/674                   22/680                   -0.608        0.272
■ In trial j there are n0j control subjects and n1j treatment subjects, with y0j and y1j deaths respectively.

■ Sampling model: y0j and y1j have independent binomial sampling distributions with probabilities of death p0j and p1j respectively.
We'll use the natural logarithm of the odds ratio,

    θj = log ρj , where ρj = [ p1j /(1 − p1j ) ] / [ p0j /(1 − p0j ) ],

as a measure of effect size comparing treatment to control groups. The reasons are:

■ Interpretability in a range of study designs (cohort studies, case-control studies and clinical trials).
■ The posterior distribution of θj = log ρj is close to normality even for small sample sizes.
■ It is the canonical (natural) parameter for logistic regression.
Normal approximation: the empirical log-odds for trial j,

    yj = log( y1j /(n1j − y1j ) ) − log( y0j /(n0j − y0j ) ),

has approximate sampling variance

    σj² = 1/y1j + 1/(n1j − y1j ) + 1/y0j + 1/(n0j − y0j ).
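These two formulas can be checked against the data table above. A quick sketch, using only the three trials shown there (the helper function name is my own, not from the slides):

```python
import math

def log_odds_and_sd(y1, n1, y0, n0):
    """Empirical log odds ratio and its approximate sd for one trial."""
    y = math.log(y1 / (n1 - y1)) - math.log(y0 / (n0 - y0))
    var = 1 / y1 + 1 / (n1 - y1) + 1 / y0 + 1 / (n0 - y0)
    return y, math.sqrt(var)

# (treated deaths, treated n, control deaths, control n) for studies 6, 21, 22
trials = {6: (4, 59, 6, 52), 21: (27, 391, 43, 364), 22: (22, 680, 39, 674)}
for j, (y1, n1, y0, n0) in trials.items():
    y, sd = log_odds_and_sd(y1, n1, y0, n0)
    print(j, round(y, 3), round(sd, 3))  # reproduces the y_j, sigma_j columns
```

The rounded values agree with the yj and σj columns of the table.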
■ The assumptions imply that the pooled estimate µ̂ is normal with variance 1/ Σ_{j=1}^J (1/σj²).
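A minimal illustration of the classical pooled (fixed-effect) estimate, assuming only the three studies visible in the data table above rather than the full data set:

```python
import math

# (y_j, sigma_j) for studies 6, 21, 22 from the data table
data = [(-0.584, 0.676), (-0.591, 0.257), (-0.608, 0.272)]

w = [1 / s**2 for _, s in data]  # precision weights 1/sigma_j^2
mu_pooled = sum(wj * yj for (yj, _), wj in zip(data, w)) / sum(w)
se_pooled = math.sqrt(1 / sum(w))  # sd of the pooled estimate
print(round(mu_pooled, 3), round(se_pooled, 3))
```

With all 22 studies the pooled estimate would of course differ; this only demonstrates the precision-weighting mechanics.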
■ (An alternative would be methods of multivariate analysis that use the true binomial sampling distribution.)
■ Note that the posterior mean of θj is a precision-weighted average of the prior population mean µ and the observed yj representing the treatment effect in the jth group.
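This precision weighting can be sketched directly. The function name and the illustrative values of µ are my own; the symbols follow the normal-normal model above:

```python
# Conditional posterior mean of theta_j given mu and tau:
# a precision-weighted average of y_j and mu.
def theta_post_mean(y_j, sigma_j, mu, tau):
    w_data, w_prior = 1 / sigma_j**2, 1 / tau**2
    return (w_data * y_j + w_prior * mu) / (w_data + w_prior)

# With y_j = -0.584, sigma_j = 0.676 (study 6) and a hypothetical mu = -0.25:
# small tau shrinks the estimate toward mu, large tau leaves it near y_j.
print(theta_post_mean(-0.584, 0.676, -0.25, 0.01))   # close to mu
print(theta_post_mean(-0.584, 0.676, -0.25, 100.0))  # close to y_j
```

The two limiting cases correspond to complete pooling (τ² → 0) and separate estimates (τ² → ∞).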
■ The classical pooled result is obtained in the limit τ² → 0, that is, Bj = 1.
■ The hierarchical model is completed by specifying a prior distribution for τ ; we'll use the noninformative prior p(τ ) ∝ 1.

■ Nevertheless, p(τ |y) is a complicated function of τ :

    p(τ |y) ∝ [ ∏_{j=1}^J N(yj |µ̂, σj² + τ²) ] / N(µ̂|µ̂, Vµ )
            ∝ Vµ^{1/2} ∏_{j=1}^J (σj² + τ²)^{−1/2} exp( −(yj − µ̂)² / (2(σj² + τ²)) ),

where µ̂ = µ̂(τ ) and Vµ = Vµ (τ ).
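The computation strategy described earlier (grid approximation for τ, then µ|τ, then the θj) can be sketched end to end. This is an illustrative sketch run on the eight-school SAT data from the previous section, since the full 22-trial data set is not listed in these slides; the σj = (15, 10, 16, 11, 9, 11, 10, 18) are taken from Gelman et al., Table 5.2:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def mu_hat_V(tau):
    w = 1.0 / (sigma**2 + tau**2)
    return np.sum(w * y) / np.sum(w), 1.0 / np.sum(w)

def log_p_tau(tau):
    """Unnormalized log p(tau | y) under p(tau) propto 1 (formula above)."""
    mu_hat, V = mu_hat_V(tau)
    s2 = sigma**2 + tau**2
    return (0.5 * np.log(V)
            - 0.5 * np.sum(np.log(s2))
            - 0.5 * np.sum((y - mu_hat)**2 / s2))

# 1. Discretize tau on a grid and normalize.
grid = np.linspace(0.01, 30.0, 1000)
logp = np.array([log_p_tau(t) for t in grid])
p = np.exp(logp - logp.max())
p /= p.sum()

# 2. Draw tau from the grid, then mu | tau, then theta_j | mu, tau.
n = 5000
tau_s = rng.choice(grid, size=n, p=p)
mu_s = np.empty(n)
theta_s = np.empty((n, len(y)))
for i, t in enumerate(tau_s):
    mu_hat, V = mu_hat_V(t)
    mu_s[i] = rng.normal(mu_hat, np.sqrt(V))
    w_d, w_p = 1.0 / sigma**2, 1.0 / t**2
    mean = (w_d * y + w_p * mu_s[i]) / (w_d + w_p)
    theta_s[i] = rng.normal(mean, np.sqrt(1.0 / (w_d + w_p)))

print(np.round(np.percentile(mu_s, [2.5, 50, 97.5])))
```

The simulated quantiles of µ and the θj can be compared with the results table of the SAT example (e.g., the posterior median of µ should be near 8).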
■ For estimates of µ and τ , and the predicted θj , see Table 5.5 on page 149.
Another example (Ex 5.10):

Trial     Treated (deaths/total)   Control (deaths/total)   yj      σj
LIMIT-2   90/1159                  118/1157                 -0.30   0.15

Please analyze the data using a normal-normal hierarchical Bayesian model.