
Biometrics DOI: 10.1111/j.1541-0420.2007.01039.

Bayesian Distributed Lag Models: Estimating Effects of Particulate
Matter Air Pollution on Daily Mortality

L. J. Welty,1,∗ R. D. Peng,2 S. L. Zeger,2 and F. Dominici2


1 Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine,
680 North Lake Shore Drive, Suite 1102, Chicago, Illinois 60611, U.S.A.
2 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health,
615 North Wolfe Street, Baltimore, Maryland 21205, U.S.A.

email: lwelty@northwestern.edu

Summary. A distributed lag model (DLagM) is a regression model that includes lagged exposure
variables as covariates; its corresponding distributed lag (DL) function describes the relationship
between the lag and the coefficient of the lagged exposure variable. DLagMs have recently been used
in environmental epidemiology for quantifying the cumulative effects of weather and air pollution
on mortality and morbidity. Standard methods for formulating DLagMs include unconstrained,
polynomial, and penalized spline DLagMs. These methods may fail to take full advantage of prior
information about the shape of the DL function for environmental exposures, or for any other
exposure with effects that are believed to smoothly approach zero as lag increases, and are
therefore at risk of producing suboptimal estimates. In this article, we propose a Bayesian DLagM
(BDLagM) that incorporates prior knowledge about the shape of the DL function and also allows the
degree of smoothness of the DL function to be estimated from the data. We apply our BDLagM to its
motivating data from the National Morbidity, Mortality, and Air Pollution Study to estimate the
short-term health effects of particulate matter air pollution on mortality from 1987 to 2000 for
Chicago, Illinois. In a simulation study, we compare our Bayesian approach with alternative methods
that use unconstrained, polynomial, and penalized spline DLagMs. We also illustrate the connection
between BDLagMs and penalized spline DLagMs. Software for fitting BDLagMs and the data used in
this article are available online.
Key words: Air pollution; Bayes; Distributed lag; Mortality; NMMAPS; Penalized splines; Smoothing;
Time series.

1. Introduction

Distributed lag models (DLagMs; Almon, 1965) are regression models that include lagged exposure
variables, or distributed lags (DLs), as covariates. They have recently been employed in
environmental epidemiology for estimating short-term cumulative effects of environmental exposures
on daily mortality or morbidity (e.g., Pope et al., 1991; Pope and Schwartz, 1996; Braga et al.,
2001; Zanobetti et al., 2002; Kim, Kim, and Hong, 2003; Bell, McDermott, Zeger, Samet, and
Dominici, 2004; Goodman, Dockery, and Clancy, 2004; Welty and Zeger, 2005). DLagMs are specialized
types of varying-coefficient models (Hastie and Tibshirani, 1993) and dynamic linear models
(Ravines, Schmidt, and Migon, 2006).

For Poisson log-linear DLagMs that estimate the effects of lagged air pollution levels on daily
mortality counts, the sum of the DL coefficients is interpreted as the percentage increase in daily
mortality associated with a one unit increase in air pollution on each of the previous days.
Because the time from exposure to event will almost certainly vary in a population, this sum is a
more appropriate measure of the effect of short-term exposure than a single day's coefficient.
Results from previous time series studies suggest that compared to DLagMs, models with single day
pollution exposures might underestimate the risk of mortality associated with air pollution
(Schwartz, 2000; Zanobetti et al., 2003; Goodman et al., 2004; Roberts, 2005).

Exposure variables, such as ambient air pollution levels, may be highly correlated over time,
making DL coefficients difficult to estimate. A general solution is to constrain the coefficients
as a function of lag. Common constraints include a polynomial (Almon, 1965) or a spline (Corradi,
1977). Estimating DLagMs as varying-coefficient models constrains the coefficients to follow a
natural cubic spline (Hastie and Tibshirani, 1993). The DL function for air pollution and mortality
has been estimated with polynomial constraints (e.g., Schwartz, 2000; Braga et al., 2001; Kim
et al., 2003; Bell, Samet, and Dominici, 2004; Goodman et al., 2004), spline constraints (Zanobetti
et al., 2000), and without constraints (Zanobetti et al., 2003).

Each type of constraint on the DL coefficients is an application of prior knowledge to model
specification. In the context of air pollution and mortality, prior knowledge suggests that
short-term risk of mortality varies smoothly as a function of lag and decreases to zero. Prior
knowledge about the effects of air pollution on mortality at early lags is limited. There may be
short delays in health effects after exposure,


© 2008, The International Biometric Society

as suggested by studies of single day pollution exposures that find the largest effect on mortality
at lag day 1 (Zmirou et al., 1988; Katsouyanni et al., 2001; Dominici et al., 2003). In the
scenario of mortality displacement (Schimmel and Murawsky, 1978), in which high air pollution
levels may advance by several days the deaths of frail individuals, the DL function may be zero or
positive at early lags, then decrease and become negative (Zanobetti et al., 2000, 2002). If there
were both a delay in health effect and mortality displacement, hypotheses concerning the sign or
smoothness of the DL function at early lags would be tenuous at best.

For more appropriate model specification and improved estimation, it may be advisable to formulate
DLagMs so that (i) coefficients are constrained to approach zero smoothly with increasing lag and
(ii) early coefficients are relatively unconstrained. Neither polynomial nor spline constraints,
the most common methods for specifying DLagMs, include this prior information in estimation. In
this article, we develop Bayesian DLagMs (BDLagMs) that incorporate our understanding of the
relationship between short-term fluctuations of particulate matter (PM) air pollution and daily
fluctuations in mortality counts. Our prior distribution specifies that as lag increases, the DL
function will have increasing smoothness and approach zero. An advantage of our approach is that
the degree of smoothness of the DL function is estimated from the data. We note that BDLagMs have
been explored in economics (e.g., Leamer, 1972; Schiller, 1973; Ravines et al., 2006), and
autoregressive priors have been used generally to smooth time-dependent coefficients in generalized
linear models (e.g., Fahrmeir and Knorr-Held, 1997; Manda and Meyer, 2005). However, our prior is
quite different from those using a constant degree of smoothness (Schiller, 1973), a particular
parametric form (Leamer, 1972; Ravines et al., 2006), or an autoregressive structure (e.g.,
Fahrmeir and Knorr-Held, 1997; Manda and Meyer, 2005).

We apply our BDLagM to data from the National Morbidity, Mortality, and Air Pollution Study
(NMMAPS) to estimate the shape of the DL function between daily PM and daily deaths for Chicago,
Illinois from 1987 to 2000. We examine the sensitivity of the estimated DL function to the
specification of the BDLagM prior. We compare the air pollution effect estimated with the BDLagM to
that estimated using unconstrained maximum likelihood (ML). We also compare air pollution effects
estimated under the full formulation of the BDLagM, computed using a Gibbs sampler, to those
estimated under an approximate formulation, computed using a closed form expression.

We also conduct a simulation study comparing BDLagMs to unconstrained, polynomial, and penalized
spline DLagMs. For penalized spline DLagMs, we compare estimates obtained using generalized cross
validation (GCV) and restricted maximum likelihood estimation (REML; Ruppert, Wand, and Carroll,
2003). We include DLagMs that are consistent with biological knowledge along with DLagMs for which
our BDLagMs may be misspecified.

Because constraining DL coefficients is a way of smoothing, we consider how our Bayesian approach
relates to penalized spline DLagMs. We demonstrate that BDLagMs are analogous to penalized spline
DLagMs with a specific penalty matrix derived from the BDLagM prior.

Though our BDLagM formulation was motivated by a desire to model flexibly the DL function between
lagged PM levels and daily mortality counts, it is relevant to situations in which the lagged
effects of an exposure on an outcome are unknown for the first few lags but are believed to
dissipate with lag. Using BDLagMs with repeated measures data would require extensions to our
approach. For documentation and to encourage implementation, our BDLagM software is available
online at http://www.ihapss.jhsph.edu/software/BayesDLM/.

2. Bayesian DLagMs

Let $y_t$ and $x_t$ be the outcome and exposure time series. We consider a generalized linear
DLagM

$$g(E[y_t \mid x_1, \ldots, x_t]) = \sum_{\ell=0}^{L} \theta_\ell x_{t-\ell},$$

where $L$ is the maximum lag and $\theta = (\theta_0, \ldots, \theta_L)^T$ is the vector of the DL
coefficients to be estimated. Initially we will consider the normal linear model
$E[y_t \mid x_1, \ldots, x_t] = \sum_{\ell=0}^{L} \theta_\ell x_{t-\ell}$, with $Y_t$ independent
normal with constant variance.

The goal is to specify a prior on $\theta = (\theta_0, \theta_1, \ldots, \theta_L)^T$ that is
uninformative on the DL coefficients for small $\ell$ but that constrains the coefficients with
larger $\ell$ to be smoother and approach zero. We assume $\theta \sim N(0, \Omega)$, where
$\Omega$ is constructed so that for increasing lag the diagonal elements decrease to zero
($\mathrm{Var}(\theta_\ell) \to 0$) and the off-diagonal elements in its correlation matrix
increase to one ($\mathrm{Cor}(\theta_{\ell-1}, \theta_\ell) \to 1$). Care must be taken to
construct $\Omega$ so that it remains positive definite. A natural approach is to define
$\Omega = ABA$, where $AA^T$ is the diagonal matrix of the individual variances of the
$\theta_\ell$s, and $B$ is the correlation matrix for $\theta$. Specifying an appropriate $\Omega$
may then be achieved by setting $A$ equal to the Cholesky decomposition of a diagonal matrix with
the desired prior variances and setting $B$ equal to the correlation matrix for increasingly
correlated normal random variables.

To define $A$, let the parameter $\sigma^2$ be the prior variance of $\theta_0$, and set
$\mathrm{Var}(\theta_1) = v_1\sigma^2, \ldots, \mathrm{Var}(\theta_L) = v_L\sigma^2$ where the
$v_\ell$s are a decreasing sequence of weights such that $1 \ge v_1 \ge \cdots \ge v_L > 0$. We
parameterize them by $v_\ell(\eta_1) = \exp(\eta_1 \ell)$, $\eta_1 \le 0$, so that the
hyperparameter $\eta_1$ governs how quickly the prior variances of the $\theta_\ell$s approach
zero. Choosing the exponential function is convenient but not required. Let $V(\eta_1)$ be the
diagonal matrix with entries $1, v_1(\eta_1)^{1/2}, \ldots, v_L(\eta_1)^{1/2}$. We set
$A = \sigma V(\eta_1)$.

To specify the correlation matrix $B$, we similarly define $w_\ell(\eta_2) = \exp(\eta_2 \ell)$,
$\eta_2 \le 0$, to be a decreasing sequence of weights, and $M(\eta_2)$ to be the
$(L + 1) \times (L + 1)$ diagonal matrix with entries $1, w_1(\eta_2), \ldots, w_L(\eta_2)$. We let
$B = W(\eta_2)$, where $W(\eta_2)$ is the correlation matrix derived from the covariance matrix

$$M(\eta_2)M(\eta_2)^T + \{I_{L+1} - M(\eta_2)\}\,1_{L+1}1_{L+1}^T\,\{I_{L+1} - M(\eta_2)\}^T,$$

where by $1_{L+1}$ we mean a $(L + 1) \times 1$ vector of ones and by $I_{L+1}$ we mean the
$(L + 1) \times (L + 1)$ identity matrix. Then $W(\eta_2)$ is the correlation matrix for the
mixture of normal random variables $M(\eta_2)X_1 + \{I_{L+1} - M(\eta_2)\}1_{L+1}X_2$ where
$X_1 \sim N(0, I_{L+1})$ and $X_2 \sim N(0, 1)$. The first few elements of the independent $X_1$
are weighted more heavily than the corresponding first few elements of the dependent
$1_{L+1}X_2$, and the latter elements of the dependent $1_{L+1}X_2$ are weighted more heavily than
the latter elements of the independent $X_1$. The parameter $\eta_2$ controls how quickly the
mixture moves from independent to dependent. The final

form for the prior on $\theta$ is then $N(0, \sigma^2\Omega(\eta))$, where
$\Omega(\eta) = V(\eta_1)W(\eta_2)V(\eta_1)$ and $\eta = (\eta_1, \eta_2)^T$.

Let $\hat\theta$ be the ML estimate of the unconstrained DL coefficients and let $\Sigma$ be the
sample covariance matrix. For a normal linear DLagM, $\hat\theta$ is $N(\theta, \Sigma)$, so the
posterior for $\theta$ conditional on $\eta$ and $\sigma$ is

$$\theta \mid \hat\theta, \eta, \sigma^2 \sim N\left(
\left(\tfrac{1}{\sigma^2}\Omega(\eta)^{-1} + \Sigma^{-1}\right)^{-1}\Sigma^{-1}\hat\theta,\;
\left(\tfrac{1}{\sigma^2}\Omega(\eta)^{-1} + \Sigma^{-1}\right)^{-1}\right). \qquad (1)$$

For a generalized linear DLagM, the posterior distribution for $\theta$ may not be available in
closed form, but it may be computed through Gibbs sampling or other Markov chain Monte Carlo
methods (e.g., Carlin and Louis, 2000). We discuss such an approach for our PM air pollution and
mortality example, in which the $Y_t$ are Poisson distributed daily mortality counts,
$\log(E[y_t \mid x_1, \ldots, x_t]) = \sum_{\ell=0}^{L} \theta_\ell x_{t-\ell}$, and the likelihood
for $\hat\theta$ is Poisson.

The influence of the prior distribution in estimating $\theta$ depends on the values of
hyperparameters $\sigma^2$ and $\eta = (\eta_1, \eta_2)^T$. The hyperparameter $\sigma^2$, the
prior variance of $\theta_0$, can be viewed as a tuning parameter determining the starting point of
the DL function. In practice there is little information in the data to jointly estimate
$\sigma^2$ and $\eta$. We therefore assume $\sigma^2$ is ten times the estimated statistical
variance of $\theta_0$ so that even for relatively large values of $\eta$, the prior has little to
no influence on the first few DL coefficients. We examine sensitivity of BDLagM estimates to choice
of $\sigma$ in Section 5.

Rather than setting values for $\eta = (\eta_1, \eta_2)^T$ and directly determining the influence
of the prior, we let $\eta$ have a discrete uniform prior on $N_1 \times N_2$, where $N_1$ and
$N_2$ are finite sets of possible values for $\eta_1$ and $\eta_2$. Then the posterior distribution
for $\theta$ can be defined as the weighted sum
$p(\theta \mid \hat\theta) = \sum_{\eta} p(\theta \mid \hat\theta, \eta)\,p(\eta \mid \hat\theta)$,
where $p$ denotes a general probability density. Under the assumption that
$\hat\theta \sim N(\theta, \Sigma)$, the marginal posterior density of the hyperparameter $\eta$ is
available in closed form. For a given $\eta^*$:

$$p(\eta^* \mid \hat\theta) =
\frac{\left|\sigma^2\Omega(\eta^*)\Sigma^{-1} + I\right|^{-1/2}
\exp\left\{-\tfrac{1}{2}\hat\theta^T\left[\Sigma^{-1} - \Sigma^{-1}\left(\Sigma^{-1} +
\tfrac{1}{\sigma^2}\Omega(\eta^*)^{-1}\right)^{-1}\Sigma^{-1}\right]\hat\theta\right\}}
{\sum_{\eta}\left|\sigma^2\Omega(\eta)\Sigma^{-1} + I\right|^{-1/2}
\exp\left\{-\tfrac{1}{2}\hat\theta^T\left[\Sigma^{-1} - \Sigma^{-1}\left(\Sigma^{-1} +
\tfrac{1}{\sigma^2}\Omega(\eta)^{-1}\right)^{-1}\Sigma^{-1}\right]\hat\theta\right\}}. \qquad (2)$$

Sufficiently large ranges for $N_1$ and $N_2$ ensure that the data drive the strength or weakness
of the prior distribution and therefore the eventual smoothness of the estimated DL function.

3. Bayesian DLagMs and Penalized Splines

Following the well-established connection between nonparametric smoothing and Bayesian modeling
(e.g., Silverman, 1985), we illustrate the relationship between normal linear BDLagMs and p-spline
DLagMs. We show that estimating the normal linear DL function under model (1) is analogous to
fitting a p-spline to DL coefficients with penalty derived from our prior. An advantage of this
connection is that our method of putting a prior directly on the coefficients may be viewed as a
transparent means for eliciting p-spline penalties, which are otherwise difficult to relate to
biological or other prior knowledge.

Let $\theta = U\gamma$, where $U$ is a spline basis matrix and $\gamma$ is a vector of spline
coefficients. Let $\hat\theta$ be the ML estimate of $\theta$, and assume that
$\hat\theta = U\gamma + \nu$, $\nu \sim N(0, \Sigma)$, where $\Sigma$ is the estimated covariance
matrix for $\hat\theta$. Under a p-spline approach, we estimate $\gamma$ by minimizing the
criterion $(\hat\theta - U\gamma)^T\Sigma^{-1}(\hat\theta - U\gamma) + \lambda\gamma^T D\gamma$,
where $\lambda$ is a penalty parameter and $D$ a positive semidefinite matrix (Eilers and Marx,
1996; Ruppert et al., 2003).

To show the connection between minimizing this criterion and estimating the BDLagM, (1), we
reformulate the p-spline in its Bayesian form $\hat\theta \mid \gamma \sim N(U\gamma, \Sigma)$ and
$\gamma \sim N(0, \Gamma)$, where $\Gamma$ is the prior covariance matrix of $\gamma$. Because
$\theta = U\gamma$, the prior on $\gamma$ translates to prior $\theta \sim N(0, U\Gamma U^T)$. In
(1) we assume $\theta \sim N(0, \sigma^2\Omega(\eta))$, so we need $\Gamma$ such that
$U\Gamma U^T = \sigma^2\Omega(\eta)$, or
$\Gamma(\eta) = R^{-1}Q^T\sigma^2\Omega(\eta)Q(R^{-1})^T$ where $QR$ is $U$'s QR decomposition.

Under this formulation the log posterior for $\gamma$ is, up to a constant,
$-\tfrac{1}{2}(\hat\theta - U\gamma)^T\Sigma^{-1}(\hat\theta - U\gamma)
- \tfrac{1}{2}\gamma^T U^T(U\Gamma(\eta)U^T)^{-1}U\gamma$, and maximizing the log posterior for
$\gamma$ is equivalent to minimizing the above criterion with $\lambda = 1$ and
$D = U^T(U\Gamma(\eta)U^T)^{-1}U$ (Silverman, 1985; Green and Silverman, 1994). For a given value
of the hyperparameter $\eta$, the estimated DL coefficients are given by the posterior mean
$U(U^T\Sigma^{-1}U + U^T(U\Gamma(\eta)U^T)^{-1}U)^{-1}U^T\Sigma^{-1}\hat\theta$, and the equivalent
degrees of freedom equal the trace of the smoother matrix
$U(U^T\Sigma^{-1}U + U^T(U\Gamma(\eta)U^T)^{-1}U)^{-1}U^T\Sigma^{-1}$ (Ruppert et al., 2003).

Though a prior on DL coefficients may be translated to a specific p-spline penalty, the spline
approach requires that the DL function follow a specific form, $\theta = U\gamma$. For our air
pollution mortality example, we found that using a b-spline basis with $L + 1$ degrees of freedom
produced estimates of $\theta$ identical to those from the BDLagM. In the following simulation
study, we compare BDLagMs to p-splines with penalties unrelated to the prior.

4. Simulation Study

We conducted a simulation study to compare BDLagMs with four methods for estimating DL functions:
unconstrained, polynomial, p-splines with penalty parameter chosen by GCV, and p-splines estimated
with REML. We generated data under 25 different sets of true DL coefficients, including examples
for which coefficients do not decrease to zero and smoothness does not increase with lag. We
categorize the DL functions by four characteristics: (1) shape: decaying exponential (E), step
function (St), or gamma distribution (G); (2) latency: 0 or 2, the number of initial coefficients
equal to zero; (3) oscillation, as described by $(-1)^{\ell \bmod 2}$, to mimic mortality
displacement; and (4) maximum nonzero lag: 7 or 14, the lag

by which the coefficients are less than 0.01. We also considered a null DL function with all zero
coefficients. All DL functions included current day ($\ell = 0$). We set $L = 14$ as in the
subsequent air pollution mortality example. Except for the null model, all the DL functions were
normalized so the sum of squares of the DL coefficients is 1. We refer to the nonnull functions by
[Shape]^o([latency], [max lag]), where the superscript indicates oscillation.

Under each of the 25 scenarios, we generated 500 outcome series $y_t$ from the model
$y_t = \delta\sum_{\ell=0}^{14}\theta_\ell x_{t-\ell} + \epsilon_t$ where $\epsilon_t \sim$ i.i.d.
$N(0, 1)$, and $\delta$ is a constant to balance signal and noise. For the exposure series $x_t$ we
used mean centered PM$_{10}$ for 1996 from Chicago, Illinois because there were no missing
observations and the autocorrelation is similar to what we experience when estimating the
association between PM$_{10}$ and mortality for Chicago for 1987–2000. For simplicity we take the
$\epsilon_t$ to be independent $N(0, 1)$, noting that our simulations still apply to situations in
which the $\epsilon_t$ are autocorrelated because application of an appropriate linear filter will
result in a new DLagM with independent normal errors. We set $\delta = 0.25$ to generate moderate
evidence for a total effect, $\sum_\ell \theta_\ell$, in nonnull models (we empirically determined
that $\delta = 0.25$ generates $y_t$ such that the t-statistic for the ML estimate for
$\sum_\ell \theta_\ell$ is approximately two). Similarly we set $\delta = 0.475$ to generate strong
evidence for total effect (we empirically determined that $\delta = 0.475$ generates $y_t$ such
that the t-statistic for the ML estimate for $\sum_\ell \theta_\ell$ is approximately four). For
each simulated data set we compared the DL functions under five methods: (1) unconstrained ML; (2)
the proposed Bayes' method (Bayes) using the normal posterior as in (1); (3) ML with a polynomial
of degree four (Poly); (4) a penalized spline with penalty chosen by GCV (GCV); and (5) a penalized
spline estimated with REML (REML). We also considered estimating the DL function using an AR-1
model. With the exception of the null model and St0 (2, 14), the AR-1 model was not competitive,
and was substantially worse when the DL function oscillates then goes to zero.

Figure 1 shows the estimated DL functions (white) averaged across the 500 simulations with the 95%
confidence bands (gray) for 24 of the true DL functions (black) (results not pictured for null
model). Results are reported for $\delta = 0.25$. Visual inspection of this figure indicates that
the BDLagM performs consistently well and estimates the true DL function with narrower confidence
bands than other methods.

To quantify the comparison, we summarize the mean squared errors of the estimated total effect
($\sum_\ell \theta_\ell$) and DL coefficients at lags 0, 7, and 14 under the five estimation
methods and for the 25 scenarios. Table 1 summarizes the results for $\delta = 0.25$. Results for
$\delta = 0.475$ are available in Web Table 1. Mean squared errors are expressed as percentages of
the mean squared error of the corresponding unconstrained ML estimates. Values smaller than 100
favor the proposed estimation methods with respect to unconstrained ML.

When the DL function decreases to zero, BDLagM is 10 to 15% better at estimating the total effect
than ML, whereas Poly, GCV, and REML perform comparably to ML. Results are similar for
$\delta = 0.25$ and $\delta = 0.475$. The better performance of the Bayesian method with respect
to its competitors is mainly due to its greater flexibility in estimating the DL coefficients at
the longer lags. Bayes is consistently 20–30% better than ML for lag 0; GCV and REML may be
substantially better or substantially worse. However, Bayes consistently outperforms the others in
estimating the lag 7 and the lag 14 coefficients for scenarios in which the coefficients go to zero
by lag 7 or 14. When the BDLagM is misspecified and the DL coefficients do not decrease smoothly to
zero, performance of the BDLagM is less predictable. Bayes may estimate the total effect only 5%
worse than ML (and Poly and REML), or nearly 15% better (superior to Poly, GCV, REML).

Mortality counts are often modeled with Poisson log-linear regression, so we also examine how our
results extend to the Poisson case. We simulated data from $Y_t \sim \mathrm{Poisson}(\mu_t)$,
$\log(\mu_t) = \log(100) + \sum_{\ell=0}^{14} x_{t-\ell}\theta_\ell/100$. The offset and division
by 100 were determined empirically to approximate Chicago mortality levels in 1996. For each set of
DL coefficients, we generated 1000 mortality series. We estimated the posterior distribution for
$\theta$ two ways: using (1) (approximating $\hat\theta$ as normal) or a Gibbs sampler. Web Table 2
compares the mean squared errors of the total effects. The errors are comparable, suggesting that
the simulation results for normal outcomes are not necessarily misleading for Poisson outcomes.

5. Application to Particulate Matter Air Pollution and Mortality

In this section, we apply BDLagMs to daily time series of PM with aerodynamic diameter less than 10
microns (PM$_{10}$) and nonaccidental deaths for Chicago, Illinois for the period 1987–2000. The
data were collected from publicly available sources as part of the NMMAPS. NMMAPS contains daily
time series of age classified mortality, temperature, dew point, and PM$_{10}$ for 109 U.S. cities
from 1987 to 2000. We analyzed the time series for Chicago because it is the largest U.S. city in
NMMAPS with few missing PM$_{10}$ values. Additional details regarding NMMAPS data assembly are
available at http://www.ihapss.jhsph.edu/ and are discussed in previous NMMAPS analyses (Samet,
Zeger, Dominici, Curriero, Dockery, Schwartz, and Zanobetti, 2000; Samet, Zeger, Dominici,
Schwartz, and Dockery, 2000; Dominici et al., 2003).

Poisson log-linear regression is frequently used to estimate the association between day-to-day
variations in mortality counts and day-to-day variations in ambient air pollution levels. We
accordingly assume that the mortality in Chicago on day $t$, $t = 1, \ldots, 5114$, is a Poisson
random variable $Y_t$ with expectation $E[Y_t] = \mu_t$. As above, we let
$\theta = (\theta_0, \ldots, \theta_L)^T$ be the unknown DL coefficients we wish to estimate. We
let $x_t$ denote the PM$_{10}$ time series and for $t > L$ we let $\mathbf{x}_t$ denote the length
$L + 1$ vector of lagged PM$_{10}$ values $(x_t, \ldots, x_{t-L})^T$.

Multisite time series studies of single day exposure PM$_{10}$ and mortality have found strong
evidence of an association between PM$_{10}$ at lags $\ell = 0$, 1, and 2 and daily mortality
(e.g., Zmirou et al., 1988; Burnett, Cakmak, and Brook, 1998; Katsouyanni et al., 2001; Dominici
et al., 2003); single city studies with DLagMs have similarly found the largest effects in the
first seven lags (e.g., Schwartz, 2000; Zanobetti et al., 2003; Goodman et al., 2004). Though lags
beyond two weeks may have some influence on daily mortality (e.g., mortality displacement), it is
unlikely that lags beyond 2 weeks have substantial influence on mortality compared to lags less
than 2 weeks (Zanobetti et al., 2003). Models containing lags beyond 2 weeks are additionally
difficult to estimate because long-term averages of PM$_{10}$ have strong seasonal variation.

Figure 1. Mean estimated DL functions (white) and 95% posterior bands (gray) under five estimation methods—
unconstrained ML, the proposed Bayesian method (Bayes), ML with a polynomial of degree four (Poly), a penalized spline
with penalty chosen by GCV (GCV), and a penalized spline estimated with REML (REML). Outcome series were simulated
under moderately strong evidence for the sum of the DL coefficients (δ = 0.25).
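
The outcome series behind Figure 1 come from the normal model of Section 4, in which each outcome is the scaled, lag-weighted sum of exposures plus standard normal noise. The following is a minimal Python sketch of that data-generating step, not the authors' software. The function names (`normalize`, `lagged_rows`, `simulate_outcomes`), the decay rate 0.5, the `noise_sd` argument, and the synthetic Gaussian exposure series standing in for the 1996 Chicago PM10 data are all assumptions made for illustration.

```python
import math
import random


def normalize(theta):
    """Scale DL coefficients so their sum of squares is 1 (a null function is left unchanged)."""
    norm = math.sqrt(sum(c * c for c in theta))
    return list(theta) if norm == 0 else [c / norm for c in theta]


def lagged_rows(x, L):
    """Row for day t (t >= L) holds (x[t], x[t-1], ..., x[t-L])."""
    return [[x[t - lag] for lag in range(L + 1)] for t in range(L, len(x))]


def simulate_outcomes(x, theta, delta, rng, noise_sd=1.0):
    """Draw y_t = delta * sum_l theta_l * x_{t-l} + eps_t with eps_t ~ N(0, noise_sd^2)."""
    L = len(theta) - 1
    return [
        delta * sum(c * v for c, v in zip(theta, row)) + rng.gauss(0.0, noise_sd)
        for row in lagged_rows(x, L)
    ]


# Hypothetical inputs: an exponentially decaying DL function and a synthetic exposure series.
L = 14
theta = normalize([math.exp(-0.5 * lag) for lag in range(L + 1)])
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(366)]
mean_x = sum(x) / len(x)
x = [v - mean_x for v in x]  # mean-center the exposure, as the text does for PM10
y = simulate_outcomes(x, theta, delta=0.25, rng=rng)
```

Only days with a full set of L + 1 lagged exposures contribute an outcome, so the simulated series is L observations shorter than the exposure series.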

We set $L = 14$ to capture the majority of short-term effects of PM$_{10}$ on mortality without
confounding estimation of DL coefficients with seasonal trends in mortality.

When estimating air pollution health effects from time series studies it is important to account
for potential time-varying confounders such as weather, seasonality, and influenza epidemics (e.g.,
Schwartz, 1993; Samet et al., 1998; Braga, Zanobetti, and Schwartz, 2000; Samoli et al., 2001;
Bell, Samet, and Dominici, 2004; Dominici, McDermott, and Hastie, 2004; Peng, Dominici, and Louis,
2005; Welty and Zeger, 2005). We let $\mathbf{z}_t$ denote the vector of time-varying covariates to
include in the model, and we specify $\mathbf{z}_t$ as in previous NMMAPS analyses (Dominici
et al., 2003). The exact specification is documented in the associated R code, available at
http://www.ihapss.jhsph.edu/software/BayesDLM/.

Our goal is to estimate the DL coefficients $\theta$ as part of the generalized linear model

$$\log(\mu_t) = \mathbf{x}_t^T\theta + \mathbf{z}_t^T\beta. \qquad (3)$$

The estimate for $1000 \times \theta_\ell$ corresponds to the percentage increase in daily
mortality associated with a 10 µg/m³ increase in PM$_{10}$ at lag $\ell$, and
$1000 \times \sum_{\ell=0}^{14}\theta_\ell$ corresponds to the percentage increase in daily
mortality associated with a 10 µg/m³ increase in PM$_{10}$ at lags $\ell = 0, \ldots, 14$.

Bayesian estimation of the generalized linear model in (3) with our proposed prior for the DL
coefficients $\theta$ requires two

Table 1
Mean squared errors of the estimates of the total effect and of the DL coefficients at lags 0, 7, and 14 obtained under four
estimation methods (Bayesian method (B), a polynomial with four degrees of freedom (P), a p-spline with penalty parameter
chosen by GCV (G), and a p-spline estimated with REML (R)) and for the 25 true DL functions. These results are reported
under the assumption of moderately strong evidence of a total effect (δ = 0.25). Mean squared errors are expressed as
percentages of the mean squared error of the corresponding ML estimates.

            |  Total effect   |      Lag 0      |      Lag 7      |     Lag 14
DL function |   B   P   G   R |   B   P   G   R |   B   P   G   R |   B   P   G   R
E(0,7)      |  89  99 102  99 |  84  56 175 129 |   6  14  36   6 |   2 100  83  62
E(2,7)      |  91  99 100  99 |  78  47  59  77 |   9  11  31  16 |   2 135  94 102
E(0,14)     |  91  99 103  99 |  81  47 161  57 |   6  13  36   8 |   3  96  89  67
E(2,14)     |  95  99  99  99 |  78  70  56  62 |   8  11  22  15 |   6 108  95  98
E^o(0,7)    |  89  99 100  99 |  81  58 119 167 |   6  22  42  10 |   1 129  92  78
E^o(2,7)    |  89  99 100  99 |  77  43  70  76 |   7  16  47  12 |   2 141  96  89
E^o(0,14)   |  89  99 100  99 |  80  48  74 162 |  15  50  37  26 |   2 134  96  70
E^o(2,14)   |  88  99 100  99 |  74  44  81  49 |  11  50  58  18 |   3 124 102  83
St(0,7)     |  97  99 102  99 |  75  55  76  29 |  40  29  27  40 |   9 130 103  69
St(2,7)     |  99  99  98  99 |  74  88  40  49 |  50  38  23  38 |  10 126  86  75
St(0,14)    | 106  99 102  99 |  73  47  58  10 |   7  13  19   3 |  28  95  96  37
St(2,14)    | 105  99  96  99 |  72  59  29  24 |   7  13  25   6 |  30  95  76  61
St^o(0,7)   |  87 100 100  99 |  82  67  68 113 |  98 206  41 187 |   4  96  99  50
St^o(2,7)   |  87 100 100 100 |  73  61  72  24 |  46 179  51 220 |   5  97 102  37
St^o(0,14)  |  86  99 100  99 |  81  52  65  84 |  72 183  70 135 | 180 355  99 248
St^o(2,14)  |  86  99  99  99 |  73  43  65  15 |  33 133  31 142 | 188 316  93 339
G(0,7)      |  92  99  99 100 |  73  70  64 149 |  11  11  19  22 |   3 131  93 106
G(2,7)      |  92 100  99 100 |  75 187  55  94 |  16  28  27  33 |   4  96  86  84
G(0,14)     |  99  99  97  99 |  75  57  27  40 |   8  15  23  10 |  14  96  82  84
G(2,14)     |  99  99 100  99 |  75  89  25  71 |  18  18  27  11 |  20 143  93  71
G^o(0,7)    |  88 100 100  99 |  71  73  86  63 |   7  27  60   9 |   2 134 106  42
G^o(2,7)    |  87  99  99  99 |  74  42  85  13 |  10  15  69   5 |   3 103 108  38
G^o(0,14)   |  87  99 100  99 |  76  50  80  41 |  63 180  59 109 |   3 100  96  40
G^o(2,14)   |  86 100  99  99 |  71  48  74  20 |  47 205  48 259 |   5 115  92  35
Null        |  89  99  96  99 |  74  47  21  10 |   5  13  24   3 |   1  95  83  37
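
Each entry of Table 1 is a method's mean squared error divided by the mean squared error of unconstrained ML for the same quantity, times 100, so values below 100 favor the method. A minimal Python sketch of that calculation follows; the function names and the example estimates are hypothetical and are not taken from the study.

```python
def mse(estimates, truth):
    """Mean squared error of repeated estimates of a single true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def relative_mse_percent(method_estimates, ml_estimates, truth):
    """MSE of a method as a percentage of the unconstrained ML MSE (100 means equal)."""
    return 100.0 * mse(method_estimates, truth) / mse(ml_estimates, truth)


# Hypothetical total-effect estimates from four simulated data sets; the true total is 1.0.
bayes_total = [0.95, 1.10, 1.02, 0.90]
ml_total = [0.80, 1.30, 1.10, 0.75]
print(relative_mse_percent(bayes_total, ml_total, truth=1.0))
```

In the study the averages run over the 500 simulated series per scenario rather than the four values used here.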

extensions from the general approach outlined in Section 2. First, the likelihood for
$(Y_t \mid \mathbf{x}_t, \mathbf{z}_t)$ is Poisson, so that $\hat\theta$, the ML estimates of
$\theta$, will not be normal and the posterior distribution for $\theta \mid \hat\theta$ will not
have a closed form expression. Second, usual Bayesian estimation requires specifying a joint prior
for $\theta$ and $\beta$, an untenable approach given the size of the nonpollutant covariate matrix
and its potential relationship with the pollutant covariate matrix.

We propose two approaches. The first is to fit (3) and treat the ML estimates $\hat\theta$ as
$N(\theta, \Sigma)$, where $\Sigma$ is the sample covariance matrix. This approach ignores the
uncertainty introduced by estimating $\beta$ and relies on the asymptotic normality of the Poisson
likelihood, but allows us to estimate $\theta$ directly using its closed form posterior (1). The
second approach is to fit the Poisson log-linear model using a Gibbs sampler; details and code are
available at http://www.ihapss.jhsph.edu/software/BayesDLM/.

For both computational methods, we set the hyperprior on $\eta = (\eta_1, \eta_2)^T$ to be a
discrete uniform distribution over $N_1 \times N_2$, where $N_1$ is a length 10 sequence ranging
from −0.35 to −0.05 in equal intervals, and $N_2$ is a length 10 sequence ranging from −0.37 to 0
in equal intervals. We selected the interval for $N_1$ so that the ratio of the prior standard
deviation of $\theta_0$ to $\theta_L$ is bounded between 2 and 100. We selected the values for
$N_2$ so that the prior correlation of $\theta_{L-1}$ and $\theta_L$ is bounded approximately by 0
and 0.99. We also set $\sigma = 0.004$, slightly larger than the square root of ten times the
estimated variance in the ML estimate of $\theta_0$. The sensitivity of the estimated BDLagMs to
choices of $\sigma$ and $N_1 \times N_2$ is considered below. We ran the Gibbs sampler for
$K = 5000$ iterations, discarding the first 1000 as burn-in. Diagnostic checks suggested that the
algorithm converged.

Figure 2 shows the posterior mean and the 95% posterior region of the DL function for the
association between PM$_{10}$

Figure 2. Posterior mean (white) of the DL function for the effect of PM 10 on mortality for Chicago, Illinois from 1987 to
2000, using the last 4000 of 5000 iterations of the Gibbs sampler. The gray shaded region denotes the 95% posterior region.
Black dots indicate ML estimates for the unconstrained DL coefficients.
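
Figure 2 is based on the Gibbs sampler, but the way the prior pulls the unconstrained ML estimates toward a smooth curve can be illustrated with the closed form posterior (1) for one fixed value of the hyperparameters. The sketch below is an illustration under stated assumptions, not the authors' R implementation. It builds Ω(η) = V(η1)W(η2)V(η1) from the Section 2 definitions, taking v_ℓ = exp(η1 ℓ) and w_ℓ = exp(η2 ℓ), and computes the posterior mean in the algebraically equivalent form P(P + Σ)^{-1}θ̂ with P = σ²Ω(η), which avoids inverting Ω. The function names, the small linear solver, and the example inputs are hypothetical.

```python
import math


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def prior_omega(eta1, eta2, L):
    """Omega(eta) = V(eta1) W(eta2) V(eta1) for lags 0..L."""
    sd = [math.exp(eta1 * lag / 2.0) for lag in range(L + 1)]  # square roots of v_l
    w = [math.exp(eta2 * lag) for lag in range(L + 1)]         # mixture weights w_l
    # Covariance of M X1 + (I - M) 1 X2: w_i^2 + (1 - w_i)^2 on the diagonal,
    # (1 - w_i)(1 - w_j) off the diagonal.
    cov = [[(1 - w[i]) * (1 - w[j]) + (w[i] ** 2 if i == j else 0.0)
            for j in range(L + 1)] for i in range(L + 1)]
    # Rescale the covariance to a correlation matrix W, then apply the prior standard deviations.
    return [[sd[i] * sd[j] * cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
             for j in range(L + 1)] for i in range(L + 1)]


def posterior_mean(theta_hat, sigma_hat, omega, sigma2):
    """Posterior mean in (1), computed as P (P + Sigma)^{-1} theta_hat with P = sigma2 * Omega."""
    n = len(theta_hat)
    p = [[sigma2 * omega[i][j] for j in range(n)] for i in range(n)]
    total = [[p[i][j] + sigma_hat[i][j] for j in range(n)] for i in range(n)]
    z = solve(total, theta_hat)
    return [sum(p[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical inputs: five ML estimates with a diagonal covariance matrix.
L = 4
theta_hat = [0.8, 0.5, -0.3, 0.4, -0.2]
sigma_hat = [[0.04 if i == j else 0.0 for j in range(L + 1)] for i in range(L + 1)]
omega = prior_omega(eta1=-0.35, eta2=-0.37, L=L)
print(posterior_mean(theta_hat, sigma_hat, omega, sigma2=0.4))  # sigma2 = 10 * Var(theta_hat_0)
```

Averaging such conditional posteriors over the grid of η values with the weights in (2) would give the normal-approximation estimate; that step is omitted here.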

and mortality in Chicago from 1987 to 2000. The black dots indicate the unconstrained ML estimates
of the DL coefficients. The strongest association between PM and mortality occurs at lag 3: a
10 µg/m³ increase in PM$_{10}$ at lag 3 is associated with a 0.17% increase in mortality (95%
posterior interval [PI] 0.01%, 0.34%), all other lagged PM$_{10}$ levels remaining constant. The
drop in relative risk from lag 3 to lag 5 suggests the possibility of mortality displacement. We
estimate a total effect of −0.24% (95% PI −0.73%, 0.23%). The estimated total effect using
unconstrained ML, −0.19%, is similar, but has a wider 95% confidence interval (−0.86%, 0.48%). The
joint posterior distribution of $\eta = (\eta_1, \eta_2)^T$ (see Web Figure 1) favored models for
which $\mathrm{Var}(\theta_\ell) \to 0$ quickly and
$\mathrm{Cor}(\theta_\ell, \theta_{\ell+1}) \to 1$ moderately or quickly.

Figure 3 compares posterior distributions of DL coefficients from the Gibbs sampler (black) and the
normal approximation (gray). The estimates from the two methods differ for more moderate lags but
are similar for early and later lags and for the overall sum of DL coefficients. This pattern of
agreement and discrepancy is not surprising, given that we expect the normal approximation and the
true posterior distribution to be most similar where the prior is weakest and the data drive
estimation (early lags) and where the prior is strongest and drives estimation (later lags). The
normal approximation was computationally faster than the Gibbs sampler (on an AMD Opteron 848
system with a 2.2 GHz processor, 8.6 seconds versus 15.5 hours for 5000 iterations).

value for $\sigma$ (Web Figure 2). The value for $\sigma^2$ was initially set to 10 times the
estimated variance of $\theta_0$. Larger values of $\sigma$ resulted in BDLagMs that more closely
followed the unconstrained ML estimates at longer lags. Smaller values of $\sigma$ resulted in
BDLagMs with latter DL coefficients shrunk to zero. For $\sigma = 0.04, 0.004, 0.0004$, the initial
DL coefficient estimates were indistinguishable. The original discrete uniform prior on $\eta_1$
was set so that the ratio of the prior standard deviation of $\theta_0$ to $\theta_L$ ranged
approximately from 2 to 100. We considered two new priors for $\eta_1$ so that the ratio ranged
from approximately 2 to 50 (more restrictive) or from 2 to 200 (less restrictive). We did not
consider alternate priors on $\eta_2$ because the prior was already constructed to be as broad as
possible without creating numerical instability. The BDLagMs estimated across different prior
distributions for $\eta$ and $\sigma = 0.004$ were remarkably similar. We concluded that the
estimated BDLagM is not driven strongly by the range of values for $\eta$.

6. Discussion

We introduce a Bayesian approach to estimate DL functions in time series models of air pollution
and mortality. This formulation uses prior knowledge about the shape of the DL function, and allows
the degree of smoothness of the DL function to be estimated from the data. We illustrate in a
simulation study that when prior assumptions are valid, BDLagMs estimate DL coefficients with
smaller mean squared
We examined the sensitivity of the BDLagM estimates to errors than three common methods—polynomial, spline, and
the specification of the prior on η and the selection of the unconstrained DLagMs.
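
As a toy illustration of the shrinkage argument above (not the simulation study reported in this
article), the sketch below compares the mean squared error of unconstrained estimates with that of
posterior means under an independent normal prior whose variance decays with lag. The exponential
decay form, the values of `sigma`, `eta1`, and `noise_sd`, the number of lags, and the function
name `simulate_mse` are all assumptions made for illustration; the BDLagM prior also models
correlation between adjacent coefficients, which is omitted here.

```python
import math
import random


def simulate_mse(n_rep=2000, n_lags=15, noise_sd=5e-4, sigma=1e-3, eta1=0.6, seed=1):
    """Compare MSE of unconstrained estimates and normal-prior posterior means.

    All settings are hypothetical. The prior is independent across lags with
    Var(theta_lag) = sigma^2 * exp(-eta1 * lag), an assumed decay form.
    """
    rng = random.Random(seed)
    # Hypothetical true DL coefficients: decay smoothly toward zero with lag,
    # on the same scale as the assumed prior standard deviations.
    theta = [sigma * math.exp(-0.5 * eta1 * lag) for lag in range(n_lags)]
    prior_var = [sigma ** 2 * math.exp(-eta1 * lag) for lag in range(n_lags)]
    # With estimate ~ N(theta, noise_sd^2) and theta ~ N(0, prior_var), the
    # posterior mean is weight * estimate with the weights below.
    weights = [v / (v + noise_sd ** 2) for v in prior_var]

    sse_unconstrained = 0.0
    sse_shrunk = 0.0
    for _ in range(n_rep):
        for true_value, weight in zip(theta, weights):
            estimate = rng.gauss(true_value, noise_sd)  # unconstrained estimate
            sse_unconstrained += (estimate - true_value) ** 2
            sse_shrunk += (weight * estimate - true_value) ** 2

    n_total = n_rep * n_lags
    return sse_unconstrained / n_total, sse_shrunk / n_total


mse_ml, mse_bayes = simulate_mse()
print(f"unconstrained MSE: {mse_ml:.3e}")
print(f"shrunken MSE:      {mse_bayes:.3e}")
```

Because the hypothetical true coefficients are chosen to match the prior scale, the shrunken
estimates have the smaller error in this setup; when the prior is badly misspecified, the
comparison can reverse.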

Figure 3. Comparison of estimation methods for DL coefficients of the effect of PM10 on mortality for Chicago, Illinois from
1987 to 2000. Distributions of DL coefficient estimates, by lag, and sum of DL coefficients (all in units
of 10^{-4}) are shown for (i) the DL coefficient vector simulated from the normal approximate posterior distribution (gray) and
(ii) the estimates of DL coefficients from the last 4000 iterations of the Gibbs sampler (black).
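
A minimal sketch of how two sets of draws like those compared in Figure 3 might be summarized by
lag and for the sum of DL coefficients. The draws here are synthetic stand-ins generated from
hypothetical values (`centre`, `spread`), not output from the Gibbs sampler or the normal
approximation used in this article, and the function names are illustrative. The conversion to a
percent change assumes a log-linear model with coefficients expressed per 1 µg/m³.

```python
import math
import random
import statistics


def interval_95(values):
    """Return the 2.5% and 97.5% sample quantiles of a sequence of numbers."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]


def summarize_draws(draws, burn_in=0):
    """Summarize draws (one row per iteration, one column per lag)."""
    kept = draws[burn_in:]                  # discard burn-in iterations
    by_lag = list(zip(*kept))               # transpose to one tuple per lag
    totals = [sum(row) for row in kept]     # sum of DL coefficients per draw
    return {
        "n_kept": len(kept),
        "mean": [statistics.fmean(col) for col in by_lag],
        "interval": [interval_95(col) for col in by_lag],
        "total_mean": statistics.fmean(totals),
        "total_interval": interval_95(totals),
    }


def percent_change(coef, delta=10.0):
    """Percent change in the outcome for a `delta`-unit exposure increase."""
    return 100.0 * (math.exp(delta * coef) - 1.0)


# Synthetic stand-ins for the two sets of draws (hypothetical values).
rng = random.Random(0)
centre = [1.0e-4, 0.5e-4, 1.7e-4, 0.2e-4]
spread = 0.8e-4
gibbs_like = [[rng.gauss(m, spread) for m in centre] for _ in range(5000)]
normal_approx = [[rng.gauss(m, spread) for m in centre] for _ in range(4000)]

gibbs_summary = summarize_draws(gibbs_like, burn_in=1000)  # keep last 4000
approx_summary = summarize_draws(normal_approx)

for lag, (g, a) in enumerate(zip(gibbs_summary["mean"], approx_summary["mean"])):
    print(f"lag {lag}: Gibbs-like {g / 1e-4:.2f}, normal approx {a / 1e-4:.2f} (x 10^-4)")
print(f"total effect (Gibbs-like): {percent_change(gibbs_summary['total_mean']):.2f}%")
```

Summing the coefficients within each draw before taking quantiles, rather than summing the
per-lag interval endpoints, keeps whatever dependence exists between lags in the interval for the
total effect.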

We also show that our approach relates to using penalized splines to estimate DL functions.
Specifically, fitting a penalized spline DLagM with a specific penalty matrix is analogous to using
a BDLagM with a normal prior on the DL coefficients. An advantage of using the Bayesian approach is
the simplicity of formulating a prior distribution on DL coefficients rather than specifying a
penalty matrix.

Using the proposed BDLagM we estimated the association between lagged exposures of PM10 and
mortality for Chicago, Illinois from 1987 to 2000. We found that the largest effect of PM10 on
mortality occurs at lag 3 and that the total effect is equal to −0.21% (95% PI −0.86%, 0.41%). The
shape of the DL function is consistent with mortality displacement.

For the Chicago data we found that the BDLagM estimated using the normal approximation to the
likelihood (with a posterior distribution for θ available in closed form) and the Poisson
likelihood (with a Gibbs sampler) yielded similar estimates for the total effect and for early and
later DL coefficients. The relatively large number of daily deaths in Chicago (on average, 116) as
well as the length of the time series may account for the agreement between the two methods. For
applications with outcome distributions that are not approximately normal, we anticipate less
agreement between the two estimation methods and that the normal approximate posterior would be a
less efficient proposal distribution.

The BDLagM formulated for a single city time series study may be naturally extended to a multicity
framework. Multicity studies of mortality and air pollution use hierarchical models to pool
individual city relative risks across multiple cities or counties, and have provided strong
evidence for the association between air pollution and mortality (Zmirou et al., 1988; Burnett et
al., 1998; Schwartz, 2000; Katsouyanni et al., 2001; Samoli et al., 2001; Zanobetti et al., 2002,
2003; Dominici et al., 2003). The hierarchical models used to date have estimated risk for single
lag PM exposures or the total effect, which may not fully describe the relationship between
short-term health risk and air pollution exposure. Estimating our BDLagM for multiple cities in a
hierarchical model of an overall DL function between air pollution and mortality would provide
additional understanding of the relationship between air pollution and health (Peng, Dominici, and
Welty, 2007).

A challenge to estimating our BDLagMs for multiple cities is missing data. For many U.S. cities, PM
air pollution is measured 1 in every 6 days. Before estimating the outlined
BDLagMs for multicity studies, it will be necessary to develop a version that estimates the DL
coefficients in the presence of missing data. Accounting for missingness in the exposure series
would expand the applicability of the proposed BDLagMs.

Given the equivalence between estimating DL functions using a penalized spline and putting a prior
directly on the DL coefficients, our Bayesian method may be viewed as a means for eliciting a
penalty matrix. P-spline penalties can be interpreted as the size of jumps of a smooth's third or
higher derivatives, which may be difficult to relate to biological or other prior knowledge. Our
method may be viewed as a transparent or intuitive means for eliciting penalties that are
consistent with prior knowledge of the objective function. Our approach is not limited to functions
that increase in smoothness as they approach zero; it could also be applied, for instance, to
monotonic functions. However, given the necessity of choosing a value for σ^2 = Var(θ_0), it could
be imprudent to use our approach to estimate DL functions for which there is no prior knowledge
about the range of θ_0.

7. Supplementary Materials

Web Tables and Figures referenced in Sections 4 and 5 are available under the Paper Information
link on the Biometrics website at http://www.biometrics.tibs.org.

Acknowledgements

Funding for the authors was provided by NIEHS RO1 grant (ES012054-01), and by NIEHS Center in Urban
Environmental Health (P30 ES 03819).

References

Almon, S. (1965). The distributed lag between capital appropriations and expenditures. Econometrica 33, 178–196.
Bell, M. L., Samet, J. M., and Dominici, F. (2004). Time-series studies of particulate matter. Annual Review of Public Health 25, 247–280.
Bell, M. L., McDermott, A., Zeger, S. L., Samet, J. M., and Dominici, F. (2004). Ozone and short-term mortality in 95 US urban communities, 1987–2000. Journal of the American Medical Association 19, 2372–2378.
Braga, F., Zanobetti, A., and Schwartz, J. (2000). Do respiratory epidemics confound the association between air pollution and daily deaths? European Respiratory Journal 16, 723–728.
Braga, F., Luis, A., Zanobetti, A., and Schwartz, J. (2001). The time course of weather-related deaths. Epidemiology 12, 662–667.
Burnett, R., Cakmak, S., and Brook, J. (1998). The effect of the urban ambient air pollution mix on daily mortality rates in 11 Canadian cities. Canadian Journal of Public Health 89, 152–156.
Carlin, B. P. and Louis, T. A. (2000). Bayes and Empirical Bayes Methods for Data Analysis. Boca Raton, Florida: Chapman and Hall.
Corradi, C. (1977). Smooth distributed lag estimators and smoothing spline functions in Hilbert spaces. Journal of Econometrics 5, 211–220.
Dominici, F., McDermott, A., Daniels, M. J., Zeger, S. L., and Samet, J. M. (2003). Revised Analysis of the National Morbidity Mortality Air Pollution Study: Part II. Cambridge, Massachusetts: The Health Effects Institute.
Dominici, F., McDermott, A., and Hastie, T. (2004). Improved semi-parametric time series models of air pollution and mortality. Journal of the American Statistical Association 468, 938–948.
Eilers, P. and Marx, B. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 1, 89–121.
Fahrmeir, L. and Knorr-Held, L. (1997). Dynamic discrete-time duration models: Estimation via Markov chain Monte Carlo. Sociological Methodology 27, 417–452.
Goodman, P. G., Dockery, D. W., and Clancy, L. (2004). Cause-specific mortality and the extended effects of particulate pollution and temperature exposure. Environmental Health Perspectives 112, 179–185.
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Boca Raton, Florida: Chapman and Hall.
Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society, Series B 4, 757–796.
Katsouyanni, K., Toulomi, G., Samoli, E., Gryparis, A., Le Tertre, A., Monopolis, Y., Rossi, G., Zmirou, D., Ballester, F., Boumghar, A., Anderson, H. R., Wojtyniak, B., Paldy, A., Braunstein, R., Pekkanen, J., Schindler, C., and Schwartz, J. (2001). Confounding and effect modification in the short-term effects of ambient particles on total mortality: Results from 29 European cities within the APHEA2 project. Epidemiology 12, 521–531.
Kim, H., Kim, Y., and Hong, Y. (2003). The lag-effect pattern in the relationship of particulate air pollution to daily mortality in Seoul, Korea. International Journal of Biometeorology 48, 25–30.
Leamer, E. E. (1972). A class of informative priors and distributed lag analysis. Econometrica 40, 1059–1081.
Manda, S. and Meyer, R. (2005). Age at first marriage in Malawi: A Bayesian multilevel analysis using a discrete time-to-event model. Journal of the Royal Statistical Society, Series A 168, 439–455.
Peng, R. D., Dominici, F., and Louis, T. A. (2005). Model choice in time series studies of air pollution and mortality. Journal of the Royal Statistical Society, Series A 169, 179–203.
Peng, R. D., Dominici, F., and Welty, L. J. (2007). A Bayesian hierarchical model for constrained distributed lag functions: Estimating the time course of hospitalization associated with air pollution exposure. Technical Report 128, Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland.
Pope, C. A. and Schwartz, J. (1996). Time series for the analysis of pulmonary health data. American Journal of Respiratory and Critical Care Medicine 154, S229–S233.
Pope, C. A. R., Dockery, D. W., Spengler, J. D., and Raizenne, M. E. (1991). Respiratory health and PM10 pollution. A daily time series analysis. American Review of Respiratory Diseases 144, 668–674.
Ravines, R. R., Schmidt, A. M., and Migon, H. S. (2006). Revisiting distributed lag models through a Bayesian perspective. Applied Stochastic Models in Business and Industry 22, 193–210.
Roberts, S. (2005). An investigation of distributed lag models in the context of air pollution and mortality time series analysis. Journal of the Air and Waste Management Association 55, 273–282.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge, U.K.: Cambridge University Press.
Samet, J., Zeger, S., Kelsall, J., Xu, J., and Kalkstein, L. (1998). Does weather confound or modify the association of particulate air pollution with mortality. Environmental Research 77, 9–19.
Samet, J. M., Zeger, S. L., Dominici, F., Curriero, F., Dockery, D. W., Schwartz, J., and Zanobetti, A. (2000). The National Morbidity Mortality Air Pollution Study: Part II. Cambridge, Massachusetts: The Health Effects Institute.
Samet, J. M., Zeger, S. L., Dominici, F., Schwartz, J., and Dockery, D. W. (2000). The National Morbidity Mortality Air Pollution Study: Part I. Cambridge, Massachusetts: The Health Effects Institute.
Samoli, E., Schwartz, J., Wojtyniak, B., et al. (2001). Investigating regional differences in short-term effects of air pollution on daily mortality in the APHEA project: A sensitivity analysis for controlling long-term trends and seasonality. Environmental Health Perspectives 109, 349–353.
Schiller, R. J. (1973). A distributed lag estimator derived from smoothness priors. Econometrica 41, 775–788.
Schimmel, B. and Murawsky, T. (1978). The relation of air pollution to mortality. Journal of Occupational Medicine 18, 316–333.
Schwartz, J. (1993). Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions. American Journal of Epidemiology 137, 1136–1147.
Schwartz, J. (2000). The distributed lag between air pollution and daily deaths. Epidemiology 11, 320–326.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. Journal of the Royal Statistical Society, Series B 47, 1–52.
Welty, L. J. and Zeger, S. L. (2005). Are the acute effects of particulate matter on mortality in the National Morbidity, Mortality, and Air Pollution study the result of inadequate control for weather and season? A sensitivity analysis using flexible distributed lag models. American Journal of Epidemiology 162, 80–88.
Zanobetti, A., Wand, M. P., Schwartz, J., et al. (2000). Generalized additive distributed lag models: Quantifying mortality displacement. Biostatistics 1, 279–292.
Zanobetti, A., Schwartz, J., Samoli, E., Gryparis, A., Touloumi, G., Atkinson, R., Le Tertre, A., Bobros, J., Celko, M., Goren, A., Forsberg, B., Michelozzi, P., Rabczenko, D., Aranguez Ruiz, E., and Katsouyanni, K. (2002). The temporal pattern of mortality responses to air pollution: A multicity assessment of mortality displacement. Epidemiology 13, 87–93.
Zanobetti, A., Schwartz, J., Samoli, E., Gryparis, A., Touloumi, G., Peacock, J., Anderson, R. H., LeTertre, A., Bobros, J., Celko, M., Goren, A., Forsberg, B., Michelozzi, P., Rabczenko, D., Hoyos, S. P., Wichmann, H. E., and Katsouyanni, K. (2003). The temporal pattern of respiratory and heart disease mortality in response to air pollution. Environmental Health Perspectives 111, 1188–1193.
Zmirou, D., Schwartz, J., Saez, M., Zanobetti, A., Wojtyniak, B., Touloumi, G., Spix, C., Ponce de León, A., Le Moullec, Y., Bacharova, L., Schouten, J., Pönkä, A., and Katsouyanni, K. (1988). Time-series analysis of air pollution and cause-specific mortality. Epidemiology 9, 495–503.

Received December 2005. Revised January 2008. Accepted January 2008.
