
Bayesian Statistical Analysis

Chapter 4: Hierarchical Models

Tang Yin-cai
yctang@stat.ecnu.edu.cn

SCHOOL OF FINANCE AND STATISTICS

April 28, 2009 Chapter 2 - p. 1/83


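References

■ James O. Berger, Statistical Decision Theory and Bayesian Analysis (Second Edition), Springer-Verlag, 1985. Chinese translation: China Statistics Press, 1998, 714 pp.
■ Samuel Kotz and Wu Xizhi, Modern Bayesian Statistics (in Chinese), China Statistics Press, 2000.
■ Zhang Yaoting and Chen Hanfeng, Bayesian Statistical Inference (in Chinese), Science Press, 1994.
■ Mao Shisong, Bayesian Statistics (in Chinese), China Statistics Press, 1999.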



Hierarchical Models: Setting-up and examples


Hierarchical models (I): introduction

(Ref.: Gelman et al., 5.1-5.3; Berger, 4.6; J. Albert, 7)

•Review 1
•Review 2
•Empirical Bayes
•Why Hierarchical?
•Hierarchical Model
•Hierarchical approach
•Exchangeability
•Basic ex. model
•General ex. model
•Typical structure
•Posterior dist.
•Predictive dist.


Review: 1-parameter model

■ Likelihood (sampling distribution): p(y|θ)
■ Prior distribution: p(θ).
  Note: If no information is available, use diffuse/noninformative prior(s). However, you need to make sure that the posterior is proper.
■ Posterior distribution: p(θ|y) ∝ p(θ)p(y|θ)

Inference:
■ Estimate of θ, e.g. posterior mode
■ Bayesian interval
■ Predictive distribution: p(ỹ|y) = ∫ p(θ|y)p(ỹ|θ)dθ
■ ......
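The steps of the one-parameter review can be sketched numerically. The example below is only an illustration under stated assumptions: a binomial likelihood, a flat prior on θ, and a grid approximation on [0, 1]. The function names and the data (y = 4 successes in n = 14 trials) are chosen for illustration and are not part of the slides' derivation.

```python
def grid_posterior(y, n, num_points=1001):
    """Grid approximation to p(theta | y) for y ~ Bin(n, theta).

    Assumption: flat prior p(theta) = 1 on [0, 1].
    """
    grid = [i / (num_points - 1) for i in range(num_points)]
    unnormalized = []
    for theta in grid:
        prior = 1.0                                   # p(theta)
        likelihood = theta ** y * (1.0 - theta) ** (n - y)  # p(y | theta), up to a constant
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]     # normalize over the grid
    return grid, posterior


def posterior_mode(grid, posterior):
    """Grid point with the largest posterior mass."""
    best_index = max(range(len(grid)), key=posterior.__getitem__)
    return grid[best_index]


def central_interval(grid, posterior, level=0.95):
    """Equal-tailed Bayesian interval read off the cumulative grid masses."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = upper = None
    for theta, mass in zip(grid, posterior):
        cumulative += mass
        if lower is None and cumulative >= tail:
            lower = theta
        if upper is None and cumulative >= 1.0 - tail:
            upper = theta
            break
    return lower, upper


def predictive_success(grid, posterior):
    """P(next trial succeeds | y): the predictive integral as a sum over the grid."""
    return sum(theta * mass for theta, mass in zip(grid, posterior))


if __name__ == "__main__":
    grid, posterior = grid_posterior(y=4, n=14)
    print("mode:", posterior_mode(grid, posterior))
    print("95% interval:", central_interval(grid, posterior))
    print("P(next success | y):", predictive_success(grid, posterior))
```

With a conjugate prior the same quantities are available in closed form; the grid version is shown because it follows the formula p(θ|y) ∝ p(θ)p(y|θ) term by term.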


Review: Multiparameter model

Consider a 2-parameter model:
■ Parameters θ = (θ1, θ2):
  θ1: parameter of interest
  θ2: nuisance parameter
■ Likelihood (sampling distribution): p(y|θ1, θ2)
■ Prior distribution: p(θ1, θ2)
■ Posterior distribution of θ1:

  p(θ1|y) = ∫ p(θ1|θ2, y) p(θ2|y) dθ2


Review: Multiparameter model

Strategy of computation (for all Bayesian analysis!): simulation. Draw samples from univariate distributions: draw θ2, then, given θ2, draw θ1.

■ If θ2 is given, then it degenerates into a 1-parameter model.
■ If direct sampling is not possible, then sample from its discretized distribution/grid approximation.
■ For more complex models, advanced computation methods can be used (see Part III, Gelman et al.)
■ ......
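The "draw θ2, then draw θ1 given θ2" strategy can be sketched on a standard case that the slides do not spell out: a normal model with unknown mean μ (playing the role of θ1) and variance σ² (playing the role of θ2), under the noninformative prior p(μ, σ²) ∝ 1/σ². Under that assumption σ²|y follows a scaled inverse-χ² distribution with n − 1 degrees of freedom and scale s², and μ|σ², y ∼ N(ȳ, σ²/n). The data values and function name below are hypothetical.

```python
import random
import statistics


def sample_mu_sigma2(data, num_draws=5000, seed=0):
    """Draw (mu, sigma2) pairs from the joint posterior of a normal model.

    Assumption: noninformative prior p(mu, sigma2) proportional to 1 / sigma2.
    Step 1 draws the nuisance parameter sigma2 from its marginal posterior;
    step 2 draws mu from its conditional posterior given sigma2.
    """
    rng = random.Random(seed)
    n = len(data)
    ybar = statistics.mean(data)
    s2 = statistics.variance(data)  # sample variance with divisor n - 1
    draws = []
    for _ in range(num_draws):
        # chi-square with n - 1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2                 # scaled inverse-chi-square draw
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)    # mu | sigma2, y
        draws.append((mu, sigma2))
    return draws


if __name__ == "__main__":
    hypothetical_data = [9.8, 10.2, 10.1, 9.9, 10.0]
    draws = sample_mu_sigma2(hypothetical_data)
    mu_draws = [mu for mu, _ in draws]
    print("posterior mean of mu (Monte Carlo):", statistics.mean(mu_draws))
```

Keeping only the μ component of each pair gives draws from the marginal posterior p(μ|y), which is the simulation counterpart of integrating out the nuisance parameter.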


Empirical Bayes

■ Often we would like the parameters of a prior/population distribution to be estimated from historical data. Then we follow the Bayesian method as above. This method is called Empirical Bayes.

How?

■ Example. Estimating the risk of tumor in rats, θ:
  ◆ Current experiment: y = 4 of n = 14 rats developed tumors.
  ◆ Bayesian model:
      y|θ ∼ Bin(n, θ)
      θ ∼ Beta(α, β)
  ◆ Posterior: θ|y ∼ Beta(α + 4, β + 10).


■ α = ?, β = ? From historical data: yi/ni, i = 1, ..., 70

  0/20   0/20   0/20   0/20   0/20
  0/20   0/20   0/19   0/19   0/19
  0/19   0/18   0/18   1/17   1/20
  1/20   1/20   1/19   1/19   1/19
  1/18   1/18   2/25   2/24   2/23
  2/20   2/20   2/20   2/20   2/20
  2/20   1/10   5/49   2/19   5/46
  3/27   2/17   7/49   7/47   3/20
  3/20   2/13   9/48   10/50  4/20
  4/20   4/20   4/20   4/20   4/20
  4/20   10/48  4/19   4/19   4/19
  5/22   11/46  12/49  5/20   5/20
  6/23   5/19   6/22   6/20   6/20
  6/20   16/52  15/47  15/46  9/24


■ Sample mean of yi/ni = 0.136; sample standard deviation = 0.103.
■ Solve for α and β:

    α/(α + β) = E(θ) = 0.136
    αβ/[(α + β)²(α + β + 1)] = Var(θ) = 0.103².

  Or

    α + β = E(θ)(1 − E(θ))/Var(θ) − 1
    α = (α + β)E(θ)
    β = (α + β)(1 − E(θ)).

■ α = 1.37, β = 8.71.
  → Posterior: Beta(5.37, 18.71).
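The moment-matching step above can be checked with a few lines of code. This is a minimal sketch: the function names are illustrative, and the inputs are the sample mean 0.136 and standard deviation 0.103 quoted on the slide together with the current experiment y = 4, n = 14.

```python
def beta_from_moments(mean, sd):
    """Match a Beta(alpha, beta) distribution to a given mean and standard deviation."""
    variance = sd ** 2
    total = mean * (1.0 - mean) / variance - 1.0   # alpha + beta
    if total <= 0:
        raise ValueError("mean and sd are not attainable by a Beta distribution")
    return total * mean, total * (1.0 - mean)


def beta_binomial_update(alpha, beta, y, n):
    """Conjugate update: Beta(alpha, beta) prior with y successes in n binomial trials."""
    return alpha + y, beta + (n - y)


if __name__ == "__main__":
    alpha, beta = beta_from_moments(mean=0.136, sd=0.103)
    post_alpha, post_beta = beta_binomial_update(alpha, beta, y=4, n=14)
    post_mean = post_alpha / (post_alpha + post_beta)
    print("alpha, beta:", round(alpha, 2), round(beta, 2))
    print("posterior parameters:", round(post_alpha, 2), round(post_beta, 2))
    print("posterior mean:", post_mean)
```

The posterior mean falls between the historical mean 0.136 and the raw rate 4/14 of the current experiment, which is the kind of compromise the later slides build on.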


■ Is Empirical Bayes reasonable (Bayesianly)?
■ The Empirical Bayes analysis above requires the assumption that the current tumor risk, θ71, and the historical tumor risks θ1, ..., θ70 are a random sample from a common distribution.
■ This assumption (independence) is invalid, for example,
  ◆ if the historical experiments were done in lab A, but the current data were gathered in lab B;
  ◆ if there were a time trend/structure/dependence among θ1, ..., θ71.


■ Questions/problems may arise when estimating a prior from existing data:
  ◆ the point estimate for α and β seems arbitrary, and it may ignore some posterior uncertainty;
  ◆ does it make sense to estimate α and β? α and β are part of the "prior": should they be known before the data are gathered, according to the logic of Bayesian inference?


Why use Hierarchical models?

■ Many problems have multiple parameters that are related.
■ Use a joint probability model to reflect this dependence.
■ It is useful to think hierarchically:


Hierarchical Model

■ Consider a set of experiments j = 1, 2, ..., J. Experiment j has data (vector) yj = (yj1, ..., yjnj) and parameter (vector) θj. The distributions of yj, j = 1, 2, ..., J, are conditional on the parameters θj:

    yj|θj ∼ p(y|θj)

■ which themselves have a probability specification:

    θj|φ ∼ p(θ|φ), j = 1, 2, ..., J

■ in terms of further parameters φ ∼ p(φ), known as hyperparameters.
■ If θ1, ..., θJ are iid, then a single-parameter model is enough (?)
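One way to read the three levels is as a recipe for generating data from the top down. The sketch below assumes a concrete instance that the slide leaves open: φ = (α, β) is passed in as a fixed value rather than drawn from p(φ), θj|φ ∼ Beta(α, β), and yj|θj ∼ Bin(nj, θj). The group sizes, hyperparameter values and function name are hypothetical.

```python
import random


def simulate_hierarchy(group_sizes, alpha, beta, seed=0):
    """Generate (theta_j, y_j) for each experiment j from a beta-binomial hierarchy.

    Assumption: phi = (alpha, beta) is given; a full hierarchical model would
    first draw phi from a hyperprior p(phi).
    """
    rng = random.Random(seed)
    thetas, ys = [], []
    for n_j in group_sizes:
        theta_j = rng.betavariate(alpha, beta)                       # theta_j | phi
        y_j = sum(1 for _ in range(n_j) if rng.random() < theta_j)   # y_j | theta_j
        thetas.append(theta_j)
        ys.append(y_j)
    return thetas, ys


if __name__ == "__main__":
    thetas, ys = simulate_hierarchy(group_sizes=[20, 20, 19, 14], alpha=1.4, beta=8.6)
    for n_j, theta_j, y_j in zip([20, 20, 19, 14], thetas, ys):
        print(f"n={n_j:2d}  theta={theta_j:.3f}  y={y_j}")
```

Inference runs in the opposite direction: given the observed yj, the posterior is over all θj and φ jointly.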


Bioassay example (Chapter 3)

Bioassay example revisited:

  Dose, xi      Number of     Number of     Death rate
  (log g/ml)    animals, ni   deaths, yi    θi
  -0.86         5             0             θ1
  -0.30         5             1             θ2
  -0.05         5             3             θ3
   0.73         5             5             θ4

Model:
■ yi|θi ∼ Bin(ni, θi)
■ θi, i = 1, ..., 4 are independent
■ logit(θi) = log(θi/(1 − θi)) = α + βxi
■ parameters: α, β with noninformative prior.
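The bioassay model above can be evaluated directly on a grid. The following is a sketch under assumptions: a flat prior p(α, β) ∝ 1 stands in for "noninformative", and the grid ranges and resolution are illustrative choices rather than values given on the slide.

```python
import math

DOSES = [-0.86, -0.30, -0.05, 0.73]   # x_i, log g/ml
ANIMALS = [5, 5, 5, 5]                # n_i
DEATHS = [0, 1, 3, 5]                 # y_i


def log_posterior(alpha, beta):
    """Unnormalized log posterior of (alpha, beta) under a flat prior."""
    total = 0.0
    for x, n, y in zip(DOSES, ANIMALS, DEATHS):
        z = alpha + beta * x                        # logit(theta_i)
        log_theta = -math.log1p(math.exp(-z))       # log theta_i
        log_one_minus = -math.log1p(math.exp(z))    # log (1 - theta_i)
        total += y * log_theta + (n - y) * log_one_minus
    return total


def grid_mode(alpha_range=(-5.0, 10.0), beta_range=(-10.0, 40.0), steps=151):
    """Return the grid point (alpha, beta) with the highest log posterior."""
    best = None
    for i in range(steps):
        a = alpha_range[0] + (alpha_range[1] - alpha_range[0]) * i / (steps - 1)
        for j in range(steps):
            b = beta_range[0] + (beta_range[1] - beta_range[0]) * j / (steps - 1)
            lp = log_posterior(a, b)
            if best is None or lp > best[0]:
                best = (lp, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    alpha_hat, beta_hat = grid_mode()
    print("approximate posterior mode:", alpha_hat, beta_hat)
```

Exponentiating and normalizing the same grid of log-posterior values gives the discretized distribution from which (α, β) can be sampled, as in the computation strategy reviewed earlier.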


Bioassay example (cont'd)

Let's generalize our simple bioassay example:

■ Imagine repeated bioassays with the same compound, where (αj, βj) are the parameters from the different bioassays.
■ A single (α, β) may be inadequate to fit a combined data set (several experiments) (⇒ pooled estimate).
■ Separate unrelated (αj, βj) are likely to overfit the data (only 4 points in each data set).
■ Think: Is there a compromise?
■ Hierarchical Model: a compromise between the single-data-set estimates and the pooled estimate.
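The idea of a compromise can be made concrete with a simpler stand-in than the logistic (αj, βj) model: binomial rates with a common Beta population distribution. Everything in this sketch is an assumption for illustration, including the data, the function name, and the fixed `prior_strength` (a real hierarchical analysis would infer the population distribution rather than fix it).

```python
def partial_pooling(deaths, animals, prior_strength):
    """Compare separate, pooled, and compromise estimates of binomial rates.

    Assumption: the population distribution is Beta(a, b) centered on the
    pooled rate, with a + b = prior_strength acting as a prior sample size.
    """
    pooled = sum(deaths) / sum(animals)
    a = prior_strength * pooled
    b = prior_strength * (1.0 - pooled)
    results = []
    for y, n in zip(deaths, animals):
        separate = y / n                       # uses only experiment j
        compromise = (a + y) / (a + b + n)     # posterior mean under Beta(a, b)
        results.append((separate, compromise))
    return pooled, results


if __name__ == "__main__":
    pooled, results = partial_pooling(
        deaths=[1, 3, 5, 2], animals=[10, 10, 10, 10], prior_strength=10.0
    )
    print("pooled estimate:", pooled)
    for separate, compromise in results:
        print(f"separate={separate:.3f}  compromise={compromise:.3f}")
```

Each compromise estimate lies between the experiment's own rate and the pooled rate; a larger `prior_strength` pulls it closer to the pooled value, a smaller one closer to the separate value.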


The hierarchical approach

■ A natural prior distribution arises by assuming the (αj, βj)'s are a sample from a common population distribution.
■ We'd be better off estimating the parameters, say φ, governing the population distribution of (αj, βj) rather than each (αj, βj) separately.
■ This introduces new parameters that govern this population distribution, called hyperparameters.

Hierarchical models use many parameters, but imposing a population distribution induces enough structure (dependence) to avoid overfitting.


Hierarchical models and empirical Bayes

■ Hierarchical models retain the advantages of using the data to estimate the prior parameters, but eliminate the disadvantages (of dealing with many parameters) by putting a joint probability model on the entire set of parameters and the data.
■ We can then do a Bayesian analysis of the joint distribution of all model parameters $\theta = (\theta_1, \dots, \theta_J)$ and $\phi$.

Note:
◆ Empirical Bayes (using data to estimate $\phi$) is an approximation to the complete hierarchical Bayesian analysis.
◆ The key part of a hierarchical model: $\phi$ is not known and has its own prior $p(\phi)$.


Exchangeability: review and extension

■ Exchangeability is essential for Bayesian inference. It is related to, but different from, i.i.d.
■ Consider a set of experiments $j = 1, 2, \dots, J$, in which experiment $j$ has data (vector) $y_j$, parameter (vector) $\theta_j$ and likelihood $p(y_j \mid \theta_j)$.
■ Definition: A set of random variables $(\theta_1, \dots, \theta_J)$ is said to be exchangeable if their joint distribution is invariant to permutations of the indexes $(1, \dots, J)$; that is, the indexes contain no information about the values of the variables.


In practice: ignorance implies exchangeability

■ If no information (other than the $y_j$'s) is available to distinguish the $\theta_j$'s from each other, and no ordering or grouping of the parameters can be made, then we can assume symmetry among the parameters in the prior distribution.
■ This symmetry is represented probabilistically by exchangeability.


Examples of exchangeability

Examples (see pages 122-123):
1. The simplest form: i.i.d. given some unknown parameter.
2. Seemingly non-exchangeable random variables may become exchangeable if we condition on all available information (e.g. covariates in regression analysis).
3. Hierarchical models often use exchangeable models for the prior distribution of model parameters.


Basic exchangeable model

Note that:
■ The basic form of an exchangeable model has the parameters $\theta_j$ as an independent sample from a prior distribution governed by some unknown parameter $\phi$.
■ $\theta = (\theta_1, \dots, \theta_J)$ are independent conditional on additional parameters $\phi$ (the hyperparameters):

$$p(\theta \mid \phi) = \prod_{j=1}^{J} p(\theta_j \mid \phi)$$

■ In general $\phi$ is unknown, so our distribution for $\theta$ must average over uncertainty in $\phi$:

$$p(\theta) = \int \left[ \prod_{j=1}^{J} p(\theta_j \mid \phi) \right] p(\phi)\, d\phi$$

This mixture of i.i.d.'s is usually all we need to capture exchangeability in practice.

De Finetti's theorem (Bruno de Finetti): As $J \to \infty$, any suitably well-behaved exchangeable distribution on $\theta_1, \dots, \theta_J$ can be written as an i.i.d. mixture.
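
To make the "mixture of i.i.d.'s" concrete, here is a minimal simulation sketch. The specific distributions ($\phi \sim N(0,1)$ and $\theta_j \mid \phi \sim N(\phi, 1)$) and the function name `sample_exchangeable` are assumptions chosen for illustration, not part of the lecture. The point is that the $\theta_j$ are exchangeable (same marginals, symmetric joint distribution) yet not independent once $\phi$ is averaged out.

```python
import numpy as np

def sample_exchangeable(n_draws, J, rng):
    """Draw theta = (theta_1, ..., theta_J) from a mixture of i.i.d. normals.

    Illustrative assumptions: phi ~ N(0, 1) and theta_j | phi ~ N(phi, 1).
    """
    # One hyperparameter draw per replicate, shape (n_draws, 1)
    phi = rng.normal(0.0, 1.0, size=(n_draws, 1))
    # Given phi, the J components are i.i.d.; phi broadcasts across columns
    return rng.normal(phi, 1.0, size=(n_draws, J))

rng = np.random.default_rng(0)
theta = sample_exchangeable(200_000, 3, rng)

# Marginally the components share a distribution, but they are correlated
# because they share the same phi (theoretical correlation here is 1/2).
corr = np.corrcoef(theta, rowvar=False)
marginal_vars = theta.var(axis=0)
```

Under these assumed distributions each $\theta_j$ has marginal variance 2 and any pair has covariance 1, so the dependence comes entirely from the shared $\phi$.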


General exchangeable model with covariates

■ The usual way to model exchangeability with covariates is through conditional independence:

$$p(\theta_1, \dots, \theta_J \mid x_1, \dots, x_J) = \int \left[ \prod_{j=1}^{J} p(\theta_j \mid \phi, x_j) \right] p(\phi \mid x)\, d\phi,$$

where $x = (x_1, \dots, x_J)$ represents the available information.
■ In this way, exchangeable models become almost universally applicable.


Setting up hierarchical models: typical structure

The hierarchical model is specified in nested stages:

1. $p(y \mid \theta)$ = the sampling distribution of the data.
2. $p(\theta \mid \phi)$ = the prior distribution for $\theta = (\theta_1, \dots, \theta_J)$ given $\phi$, called the population distribution.
3. $p(\phi)$ = the prior distribution for $\phi$, called the hyperprior distribution.
4. More levels are possible!
5. The hyperprior at the highest level is often diffuse.
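
The nested stages can be read top-down as a recipe for generating data. The sketch below simulates one data set in that order. Everything concrete in it is an assumption made for illustration: the normal forms at each stage, the hyperprior scales, the known `sigma`, and the name `simulate_hierarchy`.

```python
import numpy as np

def simulate_hierarchy(J, n_per_group, rng, sigma=1.0):
    """Simulate (phi, theta, y) by sampling the stages from the top down.

    Illustrative assumptions: phi = (mu, tau), normal population
    distribution, normal sampling distribution with known sigma.
    """
    # Stage 3: hyperprior p(phi); a wide (fairly diffuse) choice
    mu = rng.normal(0.0, 10.0)
    tau = abs(rng.normal(0.0, 5.0))  # half-normal keeps tau >= 0
    # Stage 2: population distribution p(theta | phi)
    theta = rng.normal(mu, tau, size=J)
    # Stage 1: sampling distribution p(y | theta), one row per experiment
    y = rng.normal(theta[:, None], sigma, size=(J, n_per_group))
    return (mu, tau), theta, y

rng = np.random.default_rng(1)
phi, theta, y = simulate_hierarchy(J=8, n_per_group=5, rng=rng)
```

Inference runs in the opposite direction: given `y`, we want the posterior of `theta` and `phi`.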


Posterior distribution and Inference/Computation

Inference is based on the posterior distribution of the unknowns:

$$
\begin{aligned}
p(\theta, \phi \mid y) &\propto p(\theta, \phi)\, p(y \mid \theta, \phi) \\
&= p(\theta, \phi)\, p(y \mid \theta) \quad \text{($y$ is independent of $\phi$ given $\theta$)} \\
&= p(\phi)\, p(\theta \mid \phi)\, p(y \mid \theta).
\end{aligned}
$$

♣ Inference (and computation) is often carried out in two steps (compare the multi-parameter model):
1. Inference for $\theta$ as if we knew $\phi$, using the conditional posterior distribution $p(\theta \mid y, \phi)$;
2. Inference for $\phi$ based on the marginal posterior distribution $p(\phi \mid y)$.

Here $\theta$ is treated as the vector parameter of interest and $\phi$ as nuisance parameter(s) (though they are both of interest in some problems).
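
The two-step scheme rests on a standard factorization of the joint posterior. The identity below is a short worked sketch added for illustration; it uses only the symbols already defined on this slide.

```latex
\begin{aligned}
p(\theta, \phi \mid y) &= p(\theta \mid \phi, y)\, p(\phi \mid y), \\
p(\phi \mid y) &= \int p(\theta, \phi \mid y)\, d\theta
  \;\propto\; p(\phi) \int p(\theta \mid \phi)\, p(y \mid \theta)\, d\theta .
\end{aligned}
```

A joint posterior draw can therefore be obtained by first drawing $\phi$ from $p(\phi \mid y)$ and then drawing $\theta$ from $p(\theta \mid \phi, y)$.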

Posterior predictive distributions

Hierarchical models are characterized both by hyperparameters, $\phi$, and parameters, $\theta$.

Two posterior predictive distributions:
■ the distribution of future observations $\tilde{y}$ corresponding to an existing $\theta_j$ (experiment), based on the posterior draws of $\theta_j$ (and/or $\phi$).
■ the distribution of observations $\tilde{y}$ corresponding to future (experiment) $\theta_j$'s drawn from $p(\theta_j \mid \phi)$, given the posterior draws of $\phi$:
◆ draw $\tilde{\theta}$ from $p(\theta_j \mid \phi)$;
◆ draw $\tilde{y}$ from $p(y \mid \tilde{\theta})$.
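
The two predictive distributions differ only in where $\theta$ comes from, which a small sketch makes explicit. The code assumes a normal population distribution with $\phi = (\mu, \tau)$ and a normal sampling distribution with known `sigma`. The "posterior draws" are made-up stand-in numbers rather than results from any real analysis, and the function names are hypothetical.

```python
import numpy as np

def predict_existing(theta_j_draws, sigma, rng):
    """y~ for an existing experiment j: one draw per posterior draw of theta_j."""
    return rng.normal(theta_j_draws, sigma)

def predict_new(mu_draws, tau_draws, sigma, rng):
    """y~ for a future experiment: draw theta~ from p(theta | phi), then y~."""
    theta_new = rng.normal(mu_draws, tau_draws)  # theta~ | phi ~ N(mu, tau^2)
    return rng.normal(theta_new, sigma)          # y~ | theta~ ~ N(theta~, sigma^2)

rng = np.random.default_rng(2)
S = 50_000
# Stand-in posterior draws (illustrative only, not fitted to any data)
mu_draws = rng.normal(5.0, 1.0, size=S)
tau_draws = np.abs(rng.normal(2.0, 0.5, size=S))
theta_j_draws = rng.normal(6.0, 1.5, size=S)

y_existing = predict_existing(theta_j_draws, sigma=1.0, rng=rng)
y_new = predict_new(mu_draws, tau_draws, sigma=1.0, rng=rng)
```

With these stand-in draws the prediction for a new experiment is more dispersed, because it also carries the between-experiment variation $\tau^2$.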


Hierarchical models (II): a normal-normal hierarchical model (N-N HM) with SAT coaching example

(Ref.: Gelman et al., 5.4-5.5)

•A special case
•SAT coaching ex.
•Non-H approach
•Model specification
•Joint posterior
•Computation
•N-N summary
•SAT ex. result
•HM summary
•Computation overview
•Exercises


A special case: One-way normal random-effects model

■ Sampling distribution of the data: observed data are normally distributed with a different mean for each group/experiment, with known observation variance.
■ Prior/population distribution for group means: normal.
■ Widely applicable; a special case of the hierarchical normal linear model (see Chapter 15, Gelman et al.).


Data structure

■ Consider $J$ independent experiments.
■ $y_{ij} \overset{\text{iid}}{\sim} N(\theta_j, \sigma^2)$, $i = 1, 2, \ldots, n_j$ ($j = 1, 2, \ldots, J$), with known error variance $\sigma^2$.
■ Sample mean: $\bar{y}_{\cdot j} = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}$, with sampling variance $\sigma_j^2 = \sigma^2 / n_j$.
■ Likelihood for $\theta_j$ in terms of the sufficient statistic $\bar{y}_{\cdot j}$:

    $\bar{y}_{\cdot j} \mid \theta_j \sim N(\theta_j, \sigma_j^2)$.

■ This model ("normal with known variance") is appropriate for $n_j$ large enough.
■ Purpose: estimating $\theta_j$. How?
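A minimal sketch of how the sufficient statistics above could be computed from raw data. The function name `group_summaries` and the toy observations are hypothetical, chosen only for illustration; the error standard deviation is taken as known, as on the slide.

```python
from statistics import mean

def group_summaries(groups, sigma):
    """Return (ybar_j, sigma_j^2) for each group, given the known error sd sigma."""
    summaries = []
    for obs in groups:
        n_j = len(obs)
        ybar_j = mean(obs)            # sample mean of group j
        var_j = sigma ** 2 / n_j      # sampling variance of that mean
        summaries.append((ybar_j, var_j))
    return summaries

# Hypothetical data: two experiments of different sizes, known sigma = 2
groups = [[4.0, 6.0], [1.0, 2.0, 3.0, 6.0]]
summaries = group_summaries(groups, sigma=2.0)
```

The larger group gets the smaller $\sigma_j^2$, which is what later drives the precision weights.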


■ Method 1 (single-group estimate): $\hat{\theta}_j = \bar{y}_{\cdot j}$.
■ Reasonable? Not always: with, say, $J = 20$ experiments and only $n_j = 2$ observations each, every $\bar{y}_{\cdot j}$ is an imprecise estimate.
■ Method 2 (pooled estimate):

    $\hat{\theta}_j = \bar{y}_{\cdot\cdot} = \dfrac{\sum_{j=1}^{J} \frac{1}{\sigma_j^2}\,\bar{y}_{\cdot j}}{\sum_{j=1}^{J} \frac{1}{\sigma_j^2}}$.

■ The pooled estimate is reasonable if the $J$ groups are not significantly different.
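The pooled estimate is a precision-weighted average of the group means. A small sketch, where `pooled_estimate` and the numbers are hypothetical illustrations rather than anything from the lecture:

```python
def pooled_estimate(ybar, var):
    """Precision-weighted mean of group means ybar with sampling variances var."""
    weights = [1.0 / v for v in var]          # precision 1 / sigma_j^2
    return sum(w * m for w, m in zip(weights, ybar)) / sum(weights)

ybar = [1.0, 3.0]
equal = pooled_estimate(ybar, [1.0, 1.0])     # equal precision: plain average
unequal = pooled_estimate(ybar, [1.0, 3.0])   # first group is more precise
```

With unequal variances the result is pulled toward the more precisely measured group.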


Statistical Testing

■ ANOVA F test to decide which estimate to use:
  ◆ if the $J$ group means appear significantly variable, choose the sample means $\bar{y}_{\cdot j}$;
  ◆ if the variance between the group means is not significantly greater than the variance within groups, use $\bar{y}_{\cdot\cdot}$.
■ Theoretical ANOVA table, where $\tau^2$ is the variance of $\theta_1, \ldots, \theta_J$. For simplicity, let $n_j = n$ and $\sigma_j^2 = \sigma^2/n$, $j = 1, 2, \ldots, J$ (balanced design).


Statistical Testing

           df         SS                                                            MS              $E(\mathrm{MS} \mid \sigma^2, \tau)$
Between    $J-1$      $\sum_i \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$    $SS/(J-1)$      $n\tau^2 + \sigma^2$
Within     $J(n-1)$   $\sum_i \sum_j (y_{ij} - \bar{y}_{\cdot j})^2$                  $SS/(J(n-1))$   $\sigma^2$
Total      $Jn-1$     $\sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2$               $SS/(Jn-1)$     $\sigma^2 + \frac{n(J-1)}{Jn-1}\tau^2$

■ Conclusions:
  ◆ if the ratio of "between" to "within" mean squares is significantly greater than 1, then $\hat{\theta}_j = \bar{y}_{\cdot j}$;
  ◆ if the ratio of the mean squares is not statistically significant, then the F test cannot reject $H_0: \tau = 0$, and $\hat{\theta}_j = \bar{y}_{\cdot\cdot}$.
■ Method 3 (weighted combination):

    $\hat{\theta}_j = \lambda_j \bar{y}_{\cdot j} + (1 - \lambda_j)\,\bar{y}_{\cdot\cdot}$,

  where $0 \le \lambda_j \le 1$.
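A sketch of the mean squares in the balanced ANOVA table. The function `balanced_anova` and the toy data are hypothetical. The last quantity, a moment estimate of $\tau^2$ obtained by equating the two mean squares to their expectations, is not on the slide; it is a standard consequence of the $E(\mathrm{MS})$ column and is included only as an illustration.

```python
from statistics import mean

def balanced_anova(groups):
    """Between/within mean squares, F ratio, and a moment estimate of tau^2."""
    J = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required: all groups need n observations")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)   # equals the overall mean when the design is balanced
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (J - 1)
    ms_within = ss_within / (J * (n - 1))
    f_ratio = ms_between / ms_within
    # E(MS_between) = n tau^2 + sigma^2 and E(MS_within) = sigma^2,
    # so (MS_between - MS_within) / n estimates tau^2 (truncated at zero).
    tau2_hat = max(0.0, (ms_between - ms_within) / n)
    return ms_between, ms_within, f_ratio, tau2_hat

# Hypothetical balanced data: J = 3 groups, n = 3 observations each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
ms_b, ms_w, f_ratio, tau2_hat = balanced_anova(groups)
```

A large F ratio points toward the separate estimates, a ratio near 1 toward the pooled one; the weighted combination avoids having to choose.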


■ What priors produce these posterior estimates?
  1. The unpooled estimate $\hat{\theta}_j = \bar{y}_{\cdot j}$ is the posterior mean if the $\theta_j$ have independent uniform priors on $(-\infty, \infty)$. ($\lambda_j \equiv 1$, $\tau^2 = \infty$.)
  2. The pooled estimate $\hat{\theta}_j = \bar{y}_{\cdot\cdot}$ is the posterior mean if $\theta_1 = \cdots = \theta_J = \theta$ with a uniform prior for the common $\theta$. ($\lambda_j \equiv 0$, $\tau^2 = 0$.)
  3. The weighted combination is the posterior mean if the $J$ values $\theta_j$ have iid normal prior densities.
■ All three options are exchangeable in the $\theta_j$'s, and options 1 and 2 are special cases of option 3.
■ See (below) the hierarchical model with: 1) a normal likelihood with known variance; 2) a conjugate normal iid prior for the means.
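To see why options 1 and 2 are limiting cases of option 3, a short worked sketch. It assumes the prior $\theta_j \sim N(\mu, \tau^2)$ with $\mu$ and $\tau$ treated as known, so that $\mu$ plays the role of the overall mean:

```latex
\begin{align*}
E(\theta_j \mid \bar{y}_{\cdot j}, \mu, \tau)
  &= \lambda_j\, \bar{y}_{\cdot j} + (1-\lambda_j)\,\mu,
  \qquad
  \lambda_j = \frac{1/\sigma_j^2}{1/\sigma_j^2 + 1/\tau^2}
            = \frac{\tau^2}{\tau^2 + \sigma_j^2}, \\
\tau^2 \to \infty &\;\Rightarrow\; \lambda_j \to 1
  \quad \text{(option 1: no pooling)}, \\
\tau^2 \to 0 &\;\Rightarrow\; \lambda_j \to 0
  \quad \text{(option 2: complete pooling)}.
\end{align*}
```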


SAT coaching example

SAT: Scholastic Aptitude Test. Three tests are involved:

· PSAT: Preliminary SAT
· SAT-V: SAT Verbal
· SAT-M: SAT Mathematics


SAT coaching example: Data description

■ Purpose: analyze the effects of special coaching programs on test scores.
■ All students in the experiments had taken the PSAT, and allowance was made for differences in the PSAT-M and PSAT-V test scores between coached and uncoached students.
■ Separate randomized experiments were conducted in 8 high schools. In each school the estimated coaching effect ($y_j$) and its standard error ($\sigma_j$) were obtained by an analysis of covariance adjustment: a linear regression of SAT-V on treatment group, using PSAT-M and PSAT-V as control variables.
■ Data: $y_j$, $j = 1, 2, \ldots, J = 8$, with known $\sigma_j^2$.


SAT coaching example: The data

The data (Rubin, 1981):

          Estimated treatment   Standard error of effect   True treatment
School    effect, $y_j$         estimate, $\sigma_j$       effect, $\theta_j$
A         28                    15                         ?
B          8                    10                         ?
C         -3                    16                         ?
D          7                    11                         ?
E         -1                     9                         ?
F          1                    11                         ?
G         18                    10                         ?
H         12                    18                         ?


SAT coaching example: The model

The model:

■ The quantities of interest are the $\theta_j$, $j = 1, \ldots, 8$: the average true effects of the coaching programs.
■ Data $y_j$: separate estimated treatment effects for each school.
■ The standard errors $\sigma_j$ are assumed known (large samples).
■ This is a randomized experiment with large samples (over 32 students in each school) and no outliers, so we appeal to the central limit theorem:

    $y_j \mid \theta_j \sim N(\theta_j, \sigma_j^2)$.


Nonhierarchical approach 1

Consider the 8 programs separately:

■ Two programs appear to work (18-28 points).
■ Four programs appear to have a small effect.
■ Two programs appear to have negative effects.
■ The large standard errors imply overlapping CIs (95% posterior intervals).
■ Thus, considering each program separately is NOT reasonable: the data cannot statistically distinguish the eight effects.
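The overlap of the separate intervals can be checked directly from the data table. The sketch below assumes a flat prior on each $\theta_j$, so that the 95% posterior interval is $y_j \pm 1.96\,\sigma_j$; the variable names are illustrative.

```python
# Estimated effects and standard errors for schools A..H (from the data table)
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
schools = "ABCDEFGH"

Z95 = 1.96  # standard normal quantile for a central 95% interval

# Separate 95% interval for each school: y_j +/- 1.96 * sigma_j
intervals = {
    s: (yj - Z95 * sj, yj + Z95 * sj)
    for s, yj, sj in zip(schools, y, sigma)
}

# The intervals share a common region if the largest lower bound
# lies below the smallest upper bound.
common_low = max(lo for lo, hi in intervals.values())
common_high = min(hi for lo, hi in intervals.values())
all_overlap = common_low < common_high
```

Every interval contains the region between `common_low` and `common_high`, so no pair of schools is separated by the data.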


Nonhierarchical approach 2

Use a pooled estimate of the coaching effect:

■ Pooled estimate:

    $\hat{\mu} = \dfrac{\sum_{j=1}^{8} y_j/\sigma_j^2}{\sum_{j=1}^{8} 1/\sigma_j^2} = 7.9$ (s.e.$(\hat{\mu}) = 4.2$).

■ The pooled estimate applies to each school.

Separate and pooled estimates are both unreasonable! See further on pages 139-141.

■ A "classical" test fails to reject the hypothesis that all $\theta_j$'s are equal.

A hierarchical model provides a compromise.
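A sketch of the pooled calculation using the rounded values in the data table. With these rounded inputs the result comes out near 7.7 with standard error near 4.1, slightly below the 7.9 (4.2) quoted on the slide; the quoted figures may have been computed from unrounded data (an assumption). The chi-square statistic is one possible reading of the "classical" test and is likewise an assumption, not something stated in the lecture.

```python
import math

# Estimated effects and standard errors for schools A..H (from the data table)
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

weights = [1.0 / s ** 2 for s in sigma]                     # precisions 1 / sigma_j^2
mu_hat = sum(w * yj for w, yj in zip(weights, y)) / sum(weights)
se_mu_hat = math.sqrt(1.0 / sum(weights))                   # s.e. of the pooled estimate

# Chi-square statistic for H0: theta_1 = ... = theta_8 (7 degrees of freedom)
q_stat = sum(w * (yj - mu_hat) ** 2 for w, yj in zip(weights, y))
CHI2_7_95 = 14.07   # tabulated 95th percentile of chi-square with 7 df
rejects_equality = q_stat > CHI2_7_95
```

The statistic is well below the critical value, in line with the slide's remark that equality of the $\theta_j$ is not rejected.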


Model specification

■ Observed data are normally distributed with a different mean in each group:

    $y_j \mid \theta_j \sim N(\theta_j, \sigma_j^2), \quad j = 1, 2, \ldots, J,$

  where $y_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}$ and $\sigma_j^2 = \sigma^2/n_j$ (assumed known).
■ The prior model for the $\theta_j$'s is based on a normal population distribution (conjugate):

    $p(\theta_1, \ldots, \theta_J \mid \mu, \tau) = \prod_{j=1}^{J} N(\theta_j \mid \mu, \tau^2)$.
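One way to read the two-level specification is as a recipe for generating data: first draw the group effects from the population distribution, then draw each observed estimate around its effect. A sketch, where `simulate_hierarchy` and the hyperparameter values are hypothetical:

```python
import random

def simulate_hierarchy(mu, tau, sigma, seed=0):
    """Draw theta_j ~ N(mu, tau^2), then y_j ~ N(theta_j, sigma_j^2)."""
    rng = random.Random(seed)                         # local generator for reproducibility
    theta = [rng.gauss(mu, tau) for _ in sigma]       # population level
    y = [rng.gauss(t, s) for t, s in zip(theta, sigma)]  # data level
    return theta, y

# Standard errors from the SAT example; mu and tau are made-up values
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
theta, y = simulate_hierarchy(mu=8.0, tau=5.0, sigma=sigma, seed=1)
```

Note that `random.gauss` takes a standard deviation, so $\tau$ and $\sigma_j$ are passed unsquared.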


■ Hyperprior distribution: $p(\mu, \tau) = p(\tau)\,p(\mu \mid \tau)$
  ◆ $p(\mu \mid \tau) \propto 1$ (noninformative; this won't matter much because the data supply a great deal of information about $\mu$)
  ◆ $p(\tau) \propto 1$ (must be sure the posterior is proper)

  Note: $p(\log \tau) \propto 1$ yields an improper posterior distribution!
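A brief sketch of why the log-uniform choice fails near $\tau = 0$. It relies on the standard marginal $y_j \mid \mu, \tau \sim N(\mu, \sigma_j^2 + \tau^2)$, which is assumed here rather than derived, and $c$ is just a name for the limiting value:

```latex
\begin{align*}
p(\log\tau) \propto 1 \;&\Longleftrightarrow\; p(\tau) \propto \tau^{-1}, \\
p(\tau \mid y) \;&\propto\; p(\tau)\, p(y \mid \tau), \qquad
p(y \mid \tau) = \int \prod_{j=1}^{J} N(y_j \mid \mu, \sigma_j^2 + \tau^2)\, d\mu
  \;\longrightarrow\; c > 0 \quad (\tau \to 0), \\
\text{so near zero}\quad
\int_0^{\epsilon} \tau^{-1}\, p(y \mid \tau)\, d\tau \;&=\; \infty ,
\qquad\text{whereas}\qquad
\int_0^{\epsilon} p(y \mid \tau)\, d\tau < \infty
\ \text{ under } p(\tau) \propto 1 .
\end{align*}
```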


Joint posterior distribution

The joint posterior distribution:

$$
\begin{aligned}
p(\theta, \mu, \tau \mid y) &\propto p(\mu, \tau)\, p(\theta \mid \mu, \tau)\, p(y \mid \theta) \\
&\propto \prod_{j=1}^{J} N(\theta_j \mid \mu, \tau^2) \prod_{j=1}^{J} N(y_j \mid \theta_j, \sigma_j^2) \\
&\propto \tau^{-J} \exp\left[-\frac{1}{2}\sum_j \frac{1}{\tau^2}(\theta_j - \mu)^2\right]
 \times \exp\left[-\frac{1}{2}\sum_j \frac{1}{\sigma_j^2}(y_j - \theta_j)^2\right]
\end{aligned}
$$

Note: The second line uses $p(\mu, \tau) \propto 1$. Factors that depend only on $y$ and $\{\sigma_j\}$ are treated as constants.
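The last line of the joint posterior translates directly into a log-density that can be evaluated numerically. A sketch under the uniform hyperprior $p(\mu, \tau) \propto 1$; the function name `log_joint_posterior` and the test values are hypothetical, and the result is defined only up to an additive constant.

```python
import math

def log_joint_posterior(theta, mu, tau, y, sigma):
    """Unnormalized log p(theta, mu, tau | y) with p(mu, tau) proportional to 1."""
    if tau <= 0:
        return float("-inf")                  # tau is a scale parameter, must be positive
    J = len(y)
    # sum_j (theta_j - mu)^2 / tau^2 : population (prior) term
    prior_term = sum((t - mu) ** 2 for t in theta) / tau ** 2
    # sum_j (y_j - theta_j)^2 / sigma_j^2 : likelihood term
    lik_term = sum((yj - t) ** 2 / s ** 2 for yj, t, s in zip(y, theta, sigma))
    return -J * math.log(tau) - 0.5 * prior_term - 0.5 * lik_term

# Single-group check: -log(2) - 0.5 * (4 / 4) - 0.5 * (4 / 1)
value = log_joint_posterior(theta=[2.0], mu=0.0, tau=2.0, y=[0.0], sigma=[1.0])
```

Working on the log scale avoids underflow when such a function is handed to a grid evaluation or a sampler.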


Conditional posterior dist .n of θ given µ, τ, y

■ Treat (µ, τ ) as fixed in the previous expressions. Hierarchical


models(II)
— A N-N HM
■ Given (µ, τ ), the J separate parameters θj are with SAT coaching
example
independent in their posterior distribution since
(Ref.: Gelman et
they appear in different factors in the likelihood al., 5.4-5.5)
A special case
(which factors into J components). •SAT coaching ex
•Non-H approach
•Model
■ Thus, θj |µ, τ, y ∼ N (θ̂j , Vj ), with ? specification
•Joint posterior
1 1
y
σj2 j
+ τ2
µ 1
•Computation
•N-N Summary
θ̂j = 1 1 and Vj = 1 1 •SAT ex. Result
σj2
+ τ2 σj2
+ τ2 •HM summary
•Computation
Overview
•Exercises
■ Large standard errors imply overlapping CIs

SCHOOL OF FINANCE AND S TAT I S T I C S

April 28, 2009 Chapter 2 - p. 44/83
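The precision-weighted form of θ̂j and Vj can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `conditional_theta` and the values µ = 8, τ = 5 are illustrative assumptions, and the standard errors σj are taken from Gelman et al., Table 5.2, because they are not listed in this part of the slides.

```python
import numpy as np

# SAT coaching data: y_j as in the quantile table of these slides;
# sigma_j as reported in Gelman et al., Table 5.2 (assumed here).
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def conditional_theta(y, sigma, mu, tau):
    """Mean and variance of theta_j | mu, tau, y (requires tau > 0)."""
    prec_data = 1.0 / sigma**2    # 1 / sigma_j^2
    prec_prior = 1.0 / tau**2     # 1 / tau^2
    V = 1.0 / (prec_data + prec_prior)
    theta_hat = V * (prec_data * y + prec_prior * mu)
    return theta_hat, V

# Illustrative (hypothetical) hyperparameter values
theta_hat, V = conditional_theta(y, sigma, mu=8.0, tau=5.0)
```

For school A (yj = 28, σj = 15) with µ = 8 and τ = 5 this gives θ̂ = 10 and V = 22.5: the noisy estimate is pulled most of the way toward µ because its sampling variance is much larger than τ².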


Marginal posterior distribution of (µ, τ) given y

■ To derive p(µ, τ|y), think of inference about (µ, τ)
directly:

$$p(\mu,\tau\mid y) \propto p(\mu,\tau)\,p(y\mid\mu,\tau).$$

■ Prior distribution: p(µ, τ) ∝ 1

■ Data distribution:

$$p(y\mid\mu,\tau) \propto \prod_{j=1}^{J} N(y_j\mid\mu,\ \sigma_j^2+\tau^2)$$


■ Thus,

$$p(\mu,\tau\mid y) \propto \prod_{j=1}^{J} N(y_j\mid\mu,\ \sigma_j^2+\tau^2) \propto \prod_{j=1}^{J} (\sigma_j^2+\tau^2)^{-1/2} \exp\left(-\frac{(y_j-\mu)^2}{2(\sigma_j^2+\tau^2)}\right)$$
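As a rough illustration of this density, the sketch below evaluates the unnormalised log of p(µ, τ|y) on a rectangular grid and normalises it to sum to one. The grid limits and the function name `log_post_mu_tau` are assumptions made for this example, and the σj values are those of Gelman et al., Table 5.2.

```python
import numpy as np

# SAT coaching data (sigma_j from Gelman et al., Table 5.2; assumed here)
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def log_post_mu_tau(mu, tau, y, sigma):
    """Unnormalised log p(mu, tau | y) under the flat prior p(mu, tau) ∝ 1."""
    var = sigma**2 + tau**2
    return np.sum(-0.5 * np.log(var) - (y - mu)**2 / (2.0 * var))

# Hypothetical grid limits, chosen only for illustration
mu_grid = np.linspace(-20.0, 40.0, 121)
tau_grid = np.linspace(0.0, 30.0, 61)

log_p = np.array([[log_post_mu_tau(m, t, y, sigma) for t in tau_grid]
                  for m in mu_grid])

# Subtract the maximum before exponentiating to avoid underflow
post = np.exp(log_p - log_p.max())
post /= post.sum()
```

Working on the log scale and subtracting the maximum is a common precaution; the products of J normal densities are small enough that direct multiplication can lose precision.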


Note:

■ We can derive this analytically by integrating
p(θ, µ, τ|y) over θ, but in non-normal models it is
generally not possible to integrate over θ in closed
form, and more elaborate computational methods are
needed.

■ We can compute the marginal posterior density over
a grid in (µ, τ), as in the bioassay example, but it is
better to consider a further simplification...


Posterior distribution of µ given τ, y

■ Instead of sampling (µ, τ) on a grid, factor the
distribution:

$$p(\mu,\tau\mid y) \propto p(\tau\mid y)\,p(\mu\mid\tau,y).$$

■ p(µ|τ, y) is obtained by looking at p(µ, τ|y) and
thinking of τ as known. With a uniform prior for
µ|τ, the log posterior is quadratic in µ and
therefore normal:

$$p(\mu\mid\tau,y) \propto \prod_{j=1}^{J} N(\mu\mid y_j,\ \sigma_j^2+\tau^2).$$

■ This is a normal sampling distribution with a
noninformative prior density on µ.


■ The mean and variance are obtained by
considering the group means yj as J independent
estimates of µ with variances σj² + τ².

■ Result: µ | τ, y ∼ N(µ̂, Vµ) with

$$\hat\mu = \frac{\sum_{j=1}^{J} \frac{1}{\sigma_j^2+\tau^2}\,y_j}{\sum_{j=1}^{J} \frac{1}{\sigma_j^2+\tau^2}} \quad\text{and}\quad V_\mu = \frac{1}{\sum_{j=1}^{J} \frac{1}{\sigma_j^2+\tau^2}}$$
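The result above is an inverse-variance weighted average. A minimal sketch follows; the function name `mu_given_tau` is an assumption for this example, and the σj values come from Gelman et al., Table 5.2.

```python
import numpy as np

# SAT coaching data (sigma_j from Gelman et al., Table 5.2; assumed here)
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def mu_given_tau(y, sigma, tau):
    """Mean and variance of mu | tau, y under a uniform prior on mu."""
    w = 1.0 / (sigma**2 + tau**2)    # precision of y_j as an estimate of mu
    V_mu = 1.0 / np.sum(w)
    mu_hat = V_mu * np.sum(w * y)
    return mu_hat, V_mu

mu_hat_0, V_mu_0 = mu_given_tau(y, sigma, tau=0.0)      # complete pooling
mu_hat_inf, V_mu_inf = mu_given_tau(y, sigma, tau=1e6)  # nearly equal weights
```

At τ = 0 this reproduces the complete-pooling estimate (about 7.7 with standard deviation about 4.1); as τ grows the weights become equal and µ̂ approaches the unweighted mean of the yj, 8.75.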


Posterior distribution of τ given y

■ Directly: We could integrate p(µ, τ|y) over µ.

■ Indirectly: It is easier to use the identity

$$p(\tau\mid y) = \frac{p(\mu,\tau\mid y)}{p(\mu\mid\tau,y)},$$

which holds for all µ. Evaluating at µ = µ̂ gives:

$$p(\tau\mid y) \propto \frac{\prod_{j=1}^{J} N(y_j\mid\hat\mu,\ \sigma_j^2+\tau^2)}{N(\hat\mu\mid\hat\mu,\ V_\mu)} \propto V_\mu^{1/2} \prod_{j=1}^{J} (\sigma_j^2+\tau^2)^{-1/2} \exp\left(-\frac{(y_j-\hat\mu)^2}{2(\sigma_j^2+\tau^2)}\right).$$
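One way to evaluate this expression is on a grid of τ values, recomputing µ̂ and Vµ at each grid point. The sketch below assumes a grid from 0 to 30 (matching the range plotted in Figure 1) and the σj values from Gelman et al., Table 5.2; the function name `log_p_tau` is hypothetical.

```python
import numpy as np

# SAT coaching data (sigma_j from Gelman et al., Table 5.2; assumed here)
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def log_p_tau(tau, y, sigma):
    """Unnormalised log p(tau | y) with the flat prior p(mu, tau) ∝ 1."""
    var = sigma**2 + tau**2
    w = 1.0 / var
    V_mu = 1.0 / w.sum()              # V_mu depends on tau
    mu_hat = V_mu * (w * y).sum()     # mu_hat depends on tau
    return (0.5 * np.log(V_mu)
            - 0.5 * np.log(var).sum()
            - ((y - mu_hat)**2 / (2.0 * var)).sum())

tau_grid = np.linspace(0.0, 30.0, 301)   # assumed grid
log_p = np.array([log_p_tau(t, y, sigma) for t in tau_grid])

# Normalise to probabilities over the grid points
p_tau = np.exp(log_p - log_p.max())
p_tau /= p_tau.sum()
```

The vector `p_tau` is a discrete approximation to p(τ|y); truncating the grid at 30 ignores the upper tail, which is a choice made here for illustration.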


Note:

■ Vµ and µ̂ are both functions of τ, so they must be
recomputed at each value of τ when evaluating
p(τ|y); we therefore compute p(τ|y) on a grid of
values of τ.

■ The numerator of the first expression for p(τ|y) is
the profile likelihood for τ given the maximum
likelihood estimate of µ given τ (more details
later).


N-N model computation: Summary

To simulate from the joint posterior distribution
p(θ, µ, τ|y):

1. Draw τ from p(τ|y) (grid approximation)

2. Draw µ from p(µ|τ, y) (normal distribution)

3. Draw θ = (θ1, . . . , θJ) from p(θ|µ, τ, y)
(independent normal distribution for each θj)

Back to the coaching example: Apply these ideas
to the SAT coaching data; repeat 1000 times to obtain
1000 simulations.
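The three steps can be sketched as follows. This is a minimal illustration rather than the code behind the results reported in these slides: the seed, the τ grid (0.01 to 40, which keeps 1/τ² finite), and the helper names are assumptions, and the σj values are taken from Gelman et al., Table 5.2.

```python
import numpy as np

# SAT coaching data (sigma_j from Gelman et al., Table 5.2; assumed here)
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def mu_hat_and_V(tau):
    """Mean and variance of mu | tau, y."""
    w = 1.0 / (sigma**2 + tau**2)
    V_mu = 1.0 / w.sum()
    return V_mu * (w * y).sum(), V_mu

def log_p_tau(tau):
    """Unnormalised log p(tau | y)."""
    mu_hat, V_mu = mu_hat_and_V(tau)
    var = sigma**2 + tau**2
    return (0.5 * np.log(V_mu) - 0.5 * np.log(var).sum()
            - ((y - mu_hat)**2 / (2.0 * var)).sum())

rng = np.random.default_rng(2009)          # arbitrary seed
n_sims = 1000
tau_grid = np.linspace(0.01, 40.0, 2000)   # assumed grid, tau > 0

# Step 1: draw tau from the grid approximation to p(tau | y)
log_p = np.array([log_p_tau(t) for t in tau_grid])
p_tau = np.exp(log_p - log_p.max())
p_tau /= p_tau.sum()
tau_draws = rng.choice(tau_grid, size=n_sims, p=p_tau)

mu_draws = np.empty(n_sims)
theta_draws = np.empty((n_sims, y.size))
for s, tau in enumerate(tau_draws):
    # Step 2: draw mu from p(mu | tau, y)
    mu_hat, V_mu = mu_hat_and_V(tau)
    mu_draws[s] = rng.normal(mu_hat, np.sqrt(V_mu))
    # Step 3: draw each theta_j independently from p(theta_j | mu, tau, y)
    V = 1.0 / (1.0 / sigma**2 + 1.0 / tau**2)
    theta_hat = V * (y / sigma**2 + mu_draws[s] / tau**2)
    theta_draws[s] = rng.normal(theta_hat, np.sqrt(V))
```

Each row of `theta_draws`, together with the matching entries of `mu_draws` and `tau_draws`, is one draw from the joint posterior; quantiles can then be read off with `np.percentile`.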


p(τ|y)

Figure 1: Marginal density of τ, p(τ|y). (Plot omitted; horizontal axis: τ from 0 to 30.)


E(θj|τ, y) and Sd(θj|τ, y)

Figure 2: Conditional posterior means and standard deviations, E(θj|τ, y) and Sd(θj|τ, y). (Plot omitted; left panel "Conditional posterior mean", vertical axis "Estimated Treatment Effects"; right panel "Conditional posterior SD", vertical axis "Posterior Standard Deviations"; one curve per school A to H; horizontal axis: τ from 0 to 30.)


SAT coaching example: posterior quantiles

School    2.5%    25%    50%    75%   97.5%    yj
A          -2      7     10     16     31      28
B          -5      3      8     12     23       8
C         -11      2      7     11     19      -3
D          -7      4      8     11     21       7
E          -9      1      5     10     18      -1
F          -7      2      6     10     28       1
G          -1      7     10     15     26      18
H          -6      3      8     13     33      12
µ          -2      5      8     11     18
τ          0.3    2.3    5.1    8.8   21.0


SAT coaching example: Results

We can address more complicated questions:

Pr(school A's effect is the max) = 0.25
Pr(school B's effect is the max) = 0.10
Pr(school C's effect is the max) = 0.10
Pr(school A's effect is the min) = 0.07
Pr(school B's effect is the min) = 0.09
Pr(school C's effect is the min) = 0.17
Pr(school A's effect > school C's effect) = 0.67
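Probabilities of this kind are estimated as simple frequencies over the posterior simulations. The sketch below shows the mechanics on a tiny made-up matrix of draws (4 simulations by 3 schools); the numbers in `toy` are invented for illustration and are not posterior draws from the SAT example, and `rank_probabilities` is a hypothetical helper name.

```python
import numpy as np

def rank_probabilities(theta_draws, labels):
    """Estimate Pr(school is max) and Pr(school is min) from draws.

    theta_draws has one row per simulation and one column per school.
    """
    n_sims, J = theta_draws.shape
    p_max = np.bincount(theta_draws.argmax(axis=1), minlength=J) / n_sims
    p_min = np.bincount(theta_draws.argmin(axis=1), minlength=J) / n_sims
    return dict(zip(labels, p_max)), dict(zip(labels, p_min))

# Made-up draws, used only to show how the frequencies are computed
toy = np.array([[10.0, 8.0, 7.0],
                [ 5.0, 9.0, 6.0],
                [12.0, 3.0, 4.0],
                [ 6.0, 7.0, 9.0]])

p_max, p_min = rank_probabilities(toy, ["A", "B", "C"])

# Pr(A's effect > C's effect): fraction of draws where column A exceeds column C
p_A_gt_C = np.mean(toy[:, 0] > toy[:, 2])
```

With real output, `toy` would be replaced by the 1000 × 8 matrix of simulated θ values, and the same three lines give the probabilities listed on the slide up to simulation error.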


Hierarchical models: Summary

1. They account for multiple levels of variability.

2. There is a data-determined degree of pooling
across studies.

3. Classical estimates (no pooling, complete
pooling) provide a starting point for analysis.

We can draw inference about the population of
schools.


Computation with hierarchical models: Overview

Conjugate case (p(θ|φ) is a conjugate prior for p(y|θ)):

1. Write p(θ, φ|y) = p(φ|y) p(θ|φ, y)

2. Identify the conditional posterior density p(θ|φ, y)
(easy for conjugate models)

3. Obtain the marginal posterior distribution p(φ|y)
(more about this step on the next slide)

4. Draw from p(φ|y) and then from p(θ|φ, y)


Approaches for obtaining p(φ|y)

Four ways to obtain p(φ|y):

1. Integration: p(φ|y) = ∫ p(θ, φ|y) dθ

2. Algebra: Use p(φ|y) = p(θ, φ|y)/p(θ|φ, y) for a
convenient value of θ.

3. Sampling from p(φ|y):

■ easy if it is a common distribution
■ grid if φ is low-dimensional
■ more sophisticated methods (later)

4. "Empirical Bayes" methods replace p(φ|y) by
its mode.
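Approaches 1 and 2 can be compared numerically in the normal-normal model, where µ plays the role of the parameter being integrated out and τ the role of φ. The sketch below integrates p(µ, τ|y) over µ with a Riemann sum and compares the result with the algebraic identity evaluated at µ = µ̂. The grid limits, function names, and the σj values (Gelman et al., Table 5.2) are assumptions for this example.

```python
import numpy as np

# SAT coaching data (sigma_j from Gelman et al., Table 5.2; assumed here)
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def log_joint(mu, tau):
    """Unnormalised log p(mu, tau | y); mu may be a scalar or a 1-D array."""
    var = sigma**2 + tau**2                  # shape (J,)
    mu = np.atleast_1d(mu)[:, None]          # shape (M, 1)
    return np.sum(-0.5 * np.log(var) - (y - mu)**2 / (2.0 * var), axis=1)

def p_tau_by_integration(tau, mu_grid):
    """Approach 1: Riemann sum of p(mu, tau | y) over an equally spaced mu grid."""
    d_mu = mu_grid[1] - mu_grid[0]
    return np.sum(np.exp(log_joint(mu_grid, tau))) * d_mu

def p_tau_by_algebra(tau):
    """Approach 2: p(mu_hat, tau | y) / N(mu_hat | mu_hat, V_mu)."""
    w = 1.0 / (sigma**2 + tau**2)
    V_mu = 1.0 / w.sum()
    mu_hat = V_mu * (w * y).sum()
    # N(mu_hat | mu_hat, V_mu) = (2 * pi * V_mu) ** (-1/2)
    return np.exp(log_joint(mu_hat, tau)[0]) * np.sqrt(2.0 * np.pi * V_mu)

mu_grid = np.linspace(-100.0, 120.0, 4401)   # assumed wide, fine grid
taus = [0.5, 5.0, 20.0]
by_integration = np.array([p_tau_by_integration(t, mu_grid) for t in taus])
by_algebra = np.array([p_tau_by_algebra(t) for t in taus])
```

Both functions drop the same constant factors, so the two arrays should agree up to the quadrature error of the Riemann sum. The algebraic route needs a single density evaluation per τ, which is why it is preferred when p(θ|φ, y) is available in closed form.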


Exercises

Ex 5.1, 5.3


Hierarchical models (III): Meta-analysis of clinical trials

(Ref.: Gelman et al., 5.6, 19.4)

•Meta-analysis
•The data
•Parameters
•Normal Approx.
•Possible Assumptions
•Classical model
•Classical test
•HM–stage 1
•HM–stage 2
•Posterior distributions
•Result
•Another example
•Exercises


Meta-analysis

■ Meta-analysis aims to summarize and integrate
findings from research studies in a particular area.

■ It involves combining information from several
parallel data sources, so it is closely connected to
hierarchical modelling (but there are well-known
frequentist methods as well).

■ We'll reinforce some of the concepts of
hierarchical modelling in a meta-analysis of
clinical trials data.


The Data and Aim of study

■ Meta-analysis data: 22 clinical trials, each with
two groups of heart attack (myocardial infarction)
patients receiving (or not receiving) beta-blockers
(sample sizes from 100 to almost 2000, mortality
from 3% to 21%, showing a modest, though not
"statistically significant", benefit from the use of
beta-blockers).

■ Aim: Use a combined analysis of the studies to
measure the strength of evidence for (and
magnitude of) any beneficial effect of the
treatment under study.

■ Note: Any formal analysis must be preceded by
the application of rigorous inclusion criteria.


Meta-analysis: The data

The data (Yusuf et al., 1985):

Trial   Raw data: control   Raw data: treated   Estimated      Estimated
j       (deaths/total)      (deaths/total)      log(OR) y_j    SD σ_j
1       3/39                3/38                 0.028         0.850
2       14/116              7/114               -0.741         0.483
3       11/93               5/69                -0.541         0.565
4       127/1520            102/1533            -0.246         0.138
5       27/365              28/355               0.069         0.281
6       6/52                4/59                -0.584         0.676
...     ...                 ...                  ...           ...
21      43/364              27/391              -0.591         0.257
22      39/674              22/680              -0.608         0.272


Parameters for each clinical trial

Meta-analysis involves data in the form of several 2 × 2 tables.

■ In trial $j$ there are $n_{0j}$ control subjects and $n_{1j}$ treatment subjects, with $y_{0j}$ and $y_{1j}$ deaths respectively.

■ Sampling model: $y_{0j}$ and $y_{1j}$ have independent binomial sampling distributions with probabilities of death $p_{0j}$ and $p_{1j}$ respectively.


Odds ratios as a measure of effectiveness

Estimands of interest:
1. difference in probability: $p_{1j} - p_{0j}$
2. risk ratio: $p_{1j}/p_{0j}$
3. odds ratio: $\rho_j = \dfrac{p_{1j}/(1-p_{1j})}{p_{0j}/(1-p_{0j})}$

We'll use the natural logarithm of the odds ratio, $\theta_j = \log\rho_j$, as a measure of effect size comparing treatment to control groups. The reasons are:

■ Interpretability in a range of study designs (cohorts, case-control and clinical trials).

■ Posterior distribution of $\theta_j = \log\rho_j$ close to normality even for small sample sizes.

■ Canonical (natural) parameter for logistic regression.
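
As a rough illustration of the three estimands, the sketch below evaluates plug-in versions of them for trial 2 of the data table (control 14/116, treated 7/114), using observed proportions in place of the unknown probabilities. The function name `effect_measures` and its dictionary keys are assumptions made for this example only.

```python
import math

def effect_measures(y0, n0, y1, n1):
    """Plug-in estimates of the estimands for one trial.

    y0/n0: deaths/total in the control group; y1/n1: deaths/total in the treated group.
    """
    p0, p1 = y0 / n0, y1 / n1
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return {
        "risk_difference": p1 - p0,
        "risk_ratio": p1 / p0,
        "odds_ratio": odds_ratio,
        "log_odds_ratio": math.log(odds_ratio),
    }

# Trial 2 of the table: control 14/116, treated 7/114
m = effect_measures(14, 116, 7, 114)
print({k: round(v, 3) for k, v in m.items()})
```

The log odds ratio computed this way agrees with the tabulated value -0.741 for trial 2.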


Normal approx. to the likelihood

Summarize the results of each trial with an approximate normal likelihood for $\theta_j$.

Let $y_j$ represent the empirical logit, a point estimate of the effect $\theta_j$ in the $j$th study, where $j = 1, 2, \ldots, J$:

$$y_j = \log\left(\frac{y_{1j}}{n_{1j}-y_{1j}}\right) - \log\left(\frac{y_{0j}}{n_{0j}-y_{0j}}\right),$$

with approximate sampling variance:

$$\sigma_j^2 = \frac{1}{y_{1j}} + \frac{1}{n_{1j}-y_{1j}} + \frac{1}{y_{0j}} + \frac{1}{n_{0j}-y_{0j}}.$$
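
The two formulas above can be checked against the data table with a short sketch. The function name `empirical_logit` is an assumption for this example; it uses no continuity correction, so it fails if any cell count is zero.

```python
import math

def empirical_logit(y0, n0, y1, n1):
    """Return (y_j, sigma_j): the estimated log odds ratio and its approximate SD.

    y0/n0: deaths/total in the control group; y1/n1: deaths/total in the treated group.
    """
    y = math.log(y1 / (n1 - y1)) - math.log(y0 / (n0 - y0))
    var = 1 / y1 + 1 / (n1 - y1) + 1 / y0 + 1 / (n0 - y0)
    return y, math.sqrt(var)

# Trial 1 of the table: control 3/39, treated 3/38
y_est, sd_est = empirical_logit(3, 39, 3, 38)
print(round(y_est, 3), round(sd_est, 3))
```

For trial 1 this reproduces the tabulated values 0.028 and 0.850 to three decimals.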


Remarks

1. Here we use the results of one analytic approach to produce a point estimate and standard error that can be regarded as approximately a normal mean and standard deviation.

2. We use the notation $y_j$ and $\sigma_j^2$ to be consistent with the earlier expressions for the hierarchical normal model.

3. We do not use the continuity correction of adding a fraction such as 1/2 to the four counts of the contingency table to improve the asymptotic normality of the sampling distributions.


Possible assumptions about the $\theta_j$'s

1. Studies are identical replications, so $\theta_j = \mu$ for all $j$ (no heterogeneity), or

2. No comparability between studies, so that each study provides no information about the others (complete heterogeneity), or

3. Studies are exchangeable but not identical or completely unrelated (compromise between 1 and 2).


Classical model for a fixed treatment effect

■ If all of the $\theta_j$ are identical and equal to a common treatment effect $\mu$, then

$$y_j \mid \mu, \sigma_j^2 \sim N(\mu, \sigma_j^2).$$

■ The classical pooled estimate $\hat\mu$ of $\mu$ weights each trial estimate inversely by its variance:

$$\hat\mu = \frac{\sum_{j=1}^J y_j/\sigma_j^2}{\sum_{j=1}^J 1/\sigma_j^2}.$$

■ These assumptions imply that $\hat\mu$ is normal with variance $\left(\sum_{j=1}^J 1/\sigma_j^2\right)^{-1}$.
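
A minimal sketch of the inverse-variance pooled estimate follows. It uses only the eight rows printed in the data table (trials 1 to 6, 21 and 22), not all 22 trials, so its output is illustrative rather than the full analysis. The function name `pooled_estimate` is an assumption for this example.

```python
def pooled_estimate(y, sigma):
    """Inverse-variance weighted mean of the y_j and the variance of that mean."""
    w = [1 / s**2 for s in sigma]  # precision of each trial estimate
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu_hat, 1 / sum(w)

# Only the rows shown in the table (trials 1-6, 21, 22), not the full data set
y = [0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.591, -0.608]
sigma = [0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.257, 0.272]

mu_hat, var_mu = pooled_estimate(y, sigma)
print(mu_hat, var_mu ** 0.5)  # pooled log odds ratio and its standard error
```

Trial 4 has by far the smallest $\sigma_j$, so it dominates the weighted mean.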


Classical test of heterogeneity

■ A classical test for heterogeneity, that is, of whether it is reasonable to assume all the trials are measuring the same quantity, is provided by

$$Q = \sum_{j=1}^J \frac{(y_j - \hat\mu)^2}{\sigma_j^2},$$

which has a $\chi^2_{J-1}$ distribution under the null hypothesis of homogeneity.

■ It is well known that this is not a very powerful test (Whitehead (2002)).
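
The statistic $Q$ can be computed as in the sketch below, again using only the eight printed rows of the data table as illustrative input. The function name `heterogeneity_q` is an assumption; the returned degrees of freedom can be used to look up a $\chi^2$ quantile from a table or a statistics library.

```python
def heterogeneity_q(y, sigma):
    """Return (Q, degrees of freedom) for the classical heterogeneity test."""
    w = [1 / s**2 for s in sigma]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # pooled estimate
    q = sum(wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y))
    return q, len(y) - 1

# Only the rows shown in the table (trials 1-6, 21, 22), not the full data set
y = [0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.591, -0.608]
sigma = [0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.257, 0.272]

q, df = heterogeneity_q(y, sigma)
print(q, df)
```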


A hierarchical model: Stage 1

■ The first stage of the hierarchical model assumes that:

$$y_j \mid \theta_j, \sigma_j^2 \sim N(\theta_j, \sigma_j^2).$$

■ The simplification of known variances is reasonable with large sample sizes (but see the multivariate analysis, which uses the true binomial sampling distribution).


A hierarchical model: Stage 2

The second stage of the hierarchical model assumes that the trial means $\theta_j$ are exchangeable with a normal distribution

$$\theta_j \mid \mu, \tau \sim N(\mu, \tau^2).$$


■ If $\mu$ and $\tau^2$ are fixed and known, then the conditional posterior distributions of the $\theta_j$'s are independent, and

$$\theta_j \mid \mu, \tau, y \sim N(\hat\theta_j, V_j),$$

where

$$\hat\theta_j = \frac{\frac{1}{\sigma_j^2}\, y_j + \frac{1}{\tau^2}\, \mu}{\frac{1}{\sigma_j^2} + \frac{1}{\tau^2}} \quad\text{and}\quad V_j = \frac{1}{\frac{1}{\sigma_j^2} + \frac{1}{\tau^2}}.$$

■ Note that the posterior mean is a precision-weighted average of the prior population mean and the observed $y_j$ representing the treatment effect in the $j$th group.


Posterior distn's for the $\theta_j$'s given $y$, $\mu$, $\tau$

■ The expression for the posterior distribution of $\theta_j$ can be rearranged as

$$\theta_j \mid y_j, \mu, \tau \sim N\left(B_j \mu + (1 - B_j) y_j,\; (1 - B_j)\sigma_j^2\right),$$

where $B_j = \sigma_j^2/(\sigma_j^2 + \tau^2)$ is the weight given to the prior mean.

■ Ignoring data from the other trials is equivalent to setting $\tau^2 = \infty$, that is, $B_j = 0$.

■ The classical pooled result is obtained as $\tau^2 \to 0$, that is, $B_j = 1$.
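
The shrinkage form can be sketched as follows. The values of $\mu$ and $\tau$ used in the call are arbitrary placeholders chosen only to exercise the formula; they are not estimates from the data. The function name `conditional_posterior` is likewise an assumption.

```python
def conditional_posterior(y_j, sigma_j, mu, tau):
    """Mean, variance and shrinkage weight of theta_j given y_j, mu and tau."""
    b = sigma_j**2 / (sigma_j**2 + tau**2)  # B_j: weight given to the prior mean
    mean = b * mu + (1 - b) * y_j
    var = (1 - b) * sigma_j**2
    return mean, var, b

# Trial 1 of the table (y_1 = 0.028, sigma_1 = 0.850);
# mu = -0.25 and tau = 0.13 are placeholder values, not fitted quantities
mean, var, b = conditional_posterior(0.028, 0.850, -0.25, 0.13)
print(mean, var, b)
```

Because trial 1 has a large $\sigma_j$ relative to the placeholder $\tau$, its $B_j$ is close to 1 and its posterior mean sits near $\mu$ rather than near $y_1$.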


Posterior distn for $\mu$ given $y$, $\tau$

■ A uniform conditional prior distribution $p(\mu \mid \tau) \propto 1$ for $\mu$ leads to the following posterior distribution:

$$\mu \mid \tau, y \sim N(\hat\mu, V_\mu),$$

where $\hat\mu$ is the precision-weighted average of the $y_j$ values, and $V_\mu^{-1}$ is the total precision:

$$\hat\mu = \frac{\sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2}\, y_j}{\sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2}} \quad\text{and}\quad V_\mu^{-1} = \sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2}.$$

■ $\tau^2 \to 0$ gives the classical result where $B_j = 1$.
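
The dependence of $\hat\mu$ and $V_\mu$ on $\tau$ can be explored with the sketch below, which again uses only the eight printed rows of the data table. The function name `mu_posterior` and the trial values of $\tau$ are assumptions for illustration.

```python
def mu_posterior(y, sigma, tau):
    """Mean and variance of mu given tau and y under a uniform prior on mu."""
    w = [1 / (s**2 + tau**2) for s in sigma]  # marginal precision of each y_j
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu_hat, 1 / sum(w)

# Only the rows shown in the table (trials 1-6, 21, 22), not the full data set
y = [0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.591, -0.608]
sigma = [0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.257, 0.272]

for tau in (0.0, 0.2, 0.5):  # illustrative values of tau
    print(tau, mu_posterior(y, sigma, tau))
```

At $\tau = 0$ the function returns the classical pooled estimate; as $\tau$ grows the weights become more equal and $V_\mu$ increases.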


The exchangeable model and shrinkage

■ The exchangeable model therefore leads to narrower posterior intervals for the $\theta_j$'s than the "independence" model, but they are shrunk towards the prior mean response.

■ The degree of shrinkage depends on the variability between studies, measured by $\tau^2$, and the precision of the estimate of the treatment effect from the individual trial, measured by $\sigma_j^2$ (a smaller $\sigma_j^2$ means a more precise estimate).


The full hierarchical model

■ The hierarchical model is completed by specifying a prior distribution for $\tau$; we'll use the noninformative prior $p(\tau) \propto 1$.

■ Nevertheless, $p(\tau \mid y)$ is a complicated function of $\tau$:

$$p(\tau \mid y) \propto \frac{\prod_{j=1}^J N(y_j \mid \hat\mu, \sigma_j^2 + \tau^2)}{N(\hat\mu \mid \hat\mu, V_\mu)} \propto V_\mu^{1/2} \prod_{j=1}^J (\sigma_j^2 + \tau^2)^{-1/2} \exp\left(-\frac{(y_j - \hat\mu)^2}{2(\sigma_j^2 + \tau^2)}\right).$$
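
Although the expression has no simple closed form, it is cheap to evaluate numerically. The sketch below computes the unnormalized log density on a grid of $\tau$ values and normalizes it by a Riemann sum. The grid range (0 to 1 in steps of 0.01), the function name `log_marginal_posterior_tau`, and the use of only the eight printed table rows are all assumptions for illustration.

```python
import math

def log_marginal_posterior_tau(tau, y, sigma):
    """Unnormalized log p(tau | y) under the flat prior p(tau) proportional to 1."""
    v = [s**2 + tau**2 for s in sigma]           # marginal variance of each y_j
    total_precision = sum(1 / vj for vj in v)
    mu_hat = sum(yj / vj for yj, vj in zip(y, v)) / total_precision
    log_p = 0.5 * math.log(1 / total_precision)  # the V_mu^{1/2} factor
    for yj, vj in zip(y, v):
        log_p += -0.5 * math.log(vj) - (yj - mu_hat) ** 2 / (2 * vj)
    return log_p

# Only the rows shown in the table (trials 1-6, 21, 22), not the full data set
y = [0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.591, -0.608]
sigma = [0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.257, 0.272]

step = 0.01
grid = [step * k for k in range(101)]            # tau from 0 to 1
log_p = [log_marginal_posterior_tau(t, y, sigma) for t in grid]
peak = max(log_p)
unnorm = [math.exp(lp - peak) for lp in log_p]   # subtract the peak for stability
norm = sum(unnorm) * step
density = [u / norm for u in unnorm]             # approximate p(tau | y) on the grid
```

Truncating the grid at 1 is a convenience of the sketch; in practice the upper limit should be checked against where the density becomes negligible.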


The profile likelihood for $\tau$

■ A tractable alternative to the marginal posterior distribution is the profile likelihood for $\tau$, derived by replacing $\mu$ in the joint likelihood for $\mu$ and $\tau$ by its conditional maximum likelihood estimate $\hat\mu(\tau)$ given the value of $\tau$.

■ This summarizes the support for different values of $\tau$ and is easily evaluated as

$$\prod_{j=1}^J N(y_j \mid \hat\mu(\tau), \sigma_j^2 + \tau^2).$$


Estimates of $\tau$

■ The maximum likelihood estimate is $\tau = 0$, although values of $\tau$ with a profile log(likelihood), measured relative to its maximum, above $-1.96^2/2 \approx -2$ might be considered as being reasonably supported by the data.

■ $\tau = 0$ would not appear to be a robust choice as an estimate, since non-zero values of $\tau$, which are well supported by the data, can have a strong influence on the conclusions. We shall assume, for illustration, the method-of-moments estimator $\hat\tau = 0.41$.
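
The profile log-likelihood and the support threshold can be sketched as follows. This uses only the eight rows printed in the data table, so the supported range it prints need not match the result for all 22 trials. The grid, the function name `profile_loglik`, and the convention of measuring the log-likelihood relative to its maximum on the grid are assumptions of this example.

```python
import math

def profile_loglik(tau, y, sigma):
    """Log-likelihood at tau with mu replaced by its conditional MLE mu_hat(tau)."""
    v = [s**2 + tau**2 for s in sigma]
    mu_hat = sum(yj / vj for yj, vj in zip(y, v)) / sum(1 / vj for vj in v)
    return sum(
        -0.5 * math.log(2 * math.pi * vj) - (yj - mu_hat) ** 2 / (2 * vj)
        for yj, vj in zip(y, v)
    )

# Only the rows shown in the table (trials 1-6, 21, 22), not the full data set
y = [0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.591, -0.608]
sigma = [0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.257, 0.272]

taus = [0.01 * k for k in range(101)]        # tau from 0 to 1
ll = [profile_loglik(t, y, sigma) for t in taus]
best = max(ll)
rel = [value - best for value in ll]         # relative to the maximum on the grid
cutoff = -(1.96 ** 2) / 2                    # about -1.92
supported = [t for t, r in zip(taus, rel) if r > cutoff]
print(min(supported), max(supported))
```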


Results of the meta-analysis

■ Posterior quantiles of the $\theta_j$'s: see Table 5.4 on page 146 (Gelman et al.).

■ Estimates of $\mu$ and $\tau$, and the predicted $\theta_j$: see Table 5.5 on page 149.


Another meta-analysis example

8 clinical trials, each with two groups of heart attack patients receiving (or not) IV magnesium sulfate (rates 1% to 17%, sample sizes from < 50 to more than 2300)

Trial          Magnesium group      Control group        Estimated      Estimated
               deaths   patients    deaths   patients    log(OR) y_j    SD σ_j
Morton              1         40         2         36          -0.83         1.25
Rasmussen           9        135        23        135          -1.06         0.41
Smith               2        200         7        200          -1.28         0.81
Abraham             1         48         1         46          -0.04         1.43
Feldstedt          10        150         8        148           0.22         0.49
Schechter           1         59         9         56          -2.41         1.07
Ceremuzynski        1         25         3         23          -1.28         1.19
LIMIT-2            90       1159       118       1157          -0.30         0.15

Please analyze the data using a normal-normal hierarchical Bayesian model.


Exercises

Ex 5.10