
Point process modelling of the Afghan War Diary

Andrew Zammit-Mangion$^{a,b}$, Michael Dewar$^{c}$, Visakan Kadirkamanathan$^{d}$, and Guido Sanguinetti$^{a,e,1}$

$^{a}$School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom; $^{b}$University/British Heart Foundation Centre for Cardiovascular Science, Queen's Medical Research Institute, Edinburgh EH16 4TJ, United Kingdom; $^{c}$Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027; $^{d}$Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, United Kingdom; and $^{e}$SynthSys, Systems and Synthetic Biology, University of Edinburgh, Edinburgh EH9 3JD, United Kingdom
Edited by Stephen E. Fienberg, Carnegie Mellon University, Pittsburgh, PA, and approved June 8, 2012 (received for review February 25, 2012)
Modern conflicts are characterized by an ever-increasing use of information and sensing technology, resulting in vast amounts of high-resolution data. Modelling and prediction of conflict, however, remain challenging tasks due to the heterogeneous and dynamic nature of the data typically available. Here we propose the use of dynamic spatiotemporal modelling tools for the identification of complex underlying processes in conflict, such as diffusion, relocation, heterogeneous escalation, and volatility. Using ideas from statistics, signal processing, and ecology, we provide a predictive framework able to assimilate data and give confidence estimates on the predictions. We demonstrate our methods on the WikiLeaks Afghan War Diary. Our results show that the approach yields deeper insights into conflict dynamics and enables a forward prediction of armed opposition group activity in 2010 that is strikingly accurate in a statistical sense, based solely on data from previous years.
conflict prediction | point processes | variational Bayes
The last decade has witnessed a tremendous increase in the availability of data relating to conflicts. For example, the collection of media reports in the Armed Conflict Location and Event Dataset (1) provides a small-scale but highly curated record of conflict events. More prominently, the release of confidential documents by the WikiLeaks whistleblower website in July 2010 has provided for the first time a large-scale (but uncurated) description of the current Afghan conflict. However, most analyses of these and similar data sources do not go beyond visualization and descriptive statistical methods (2–5), for good reasons. First, conflict data is highly heterogeneous and often poorly annotated. For example, the WikiLeaks Afghan War Diary (AWD) data used in this study (Dataset S1) consists of event entries as diverse as elaborate preplanned military activity and spontaneous stop-and-search events. Any plausible attempt to model these data will need to be statistical in nature in order to handle the high levels of noise. Second, it is very difficult to define simple mechanisms that would allow the bottom-up construction of a plausible model.
Here, we develop statistical dynamical modelling methodologies to provide a predictive framework that may be used in policy making. We show that the temporal and spatial dependencies (6, 7) as well as the diffusion and advection effects (8, 9) inherent in conflict data make such data suitable for a broad class of models, widely employed in ecology and epidemiology, that describe the dynamics of disaggregated data. We then develop tools based on ideas from point process statistics (10) to constrain the models. The approach enables us to leverage powerful techniques from point process filtering theory and spatiotemporal statistics (11–14) to carry out inference of the underlying system dynamics and to predict the future behavior of the system.
We test the performance of our methods on the AWD, a WikiLeaks release which contains over 75,000 military logs by the US military, describing events which occurred between the beginning of 2004 and the end of 2009 and providing a high temporal and spatial resolution description of the Afghan war in that period. We show that our approach allows deeper insights into the conflict dynamics than simple descriptive methods by providing a spatially resolved map of the growth and volatility of the conflict. Most remarkably, we show that a model trained on the AWD can predict with surprising statistical accuracy the progression of the conflict in 2010, i.e., in the year after the AWD data end. We conclude the paper by discussing the importance and potential of statistical modelling of conflict data, as well as offering some considerations as to its wide applicability.
Statistical Methods
Spatial Point Processes and the Stochastic Integro-Difference Equation (SIDE). Conflict data typically consists of a set of incidents labeled through spatiotemporal coordinates which, when visualized as event markers, are highly spatiotemporally correlated, generally clustered, and representative of some underlying structure. In this regard, these data sets are very similar to others encountered in a variety of fields, such as epidemiology (15) and agricultural sciences (16). Poisson point processes provide a convenient and frequently used mathematical framework to model event-based data; in this framework, the probability of observing a certain number of events within a region of interest $\mathcal{O}$ is given by a Poisson distribution whose mean is the integral over $\mathcal{O}$ of an intensity function $\lambda(s)$, $s \in \mathcal{O}$. In order to accommodate phenomena such as event clustering, the intensity itself is often modeled as a random function, giving rise to doubly stochastic or Cox processes. A popular class of Cox processes, which will also be considered here, is the log-Gaussian Cox process (LGCP), where the logarithm of the event intensity is assumed to be a Gaussian process (GP). We recall that a GP is wholly defined by (i) a mean function $\mu(s)$ describing a global trend and (ii) a covariance function $k(s, r)$ indicating how the field at distinct points in space ($s$ and $r$) covary (17).
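To make the LGCP construction concrete, the sketch below discretizes a one-dimensional stand-in for $\mathcal{O}$ into cells, draws the latent Gaussian field, exponentiates it, and draws an independent Poisson count per cell, so that the total count is Poisson with mean equal to the discretized integral of $\lambda(s)$. The squared-exponential covariance, the grid, and all parameter values are illustrative assumptions, not choices made in this paper.

```python
import numpy as np


def squared_exponential(s, length_scale, variance):
    """Illustrative stationary covariance k(s, r) for 1-D locations s."""
    d = s[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def simulate_lgcp_counts(mu, cov, cell_area, rng):
    """Draw one LGCP realization on a grid of cells.

    mu:        (m,) mean of the latent GP at the cell centres
    cov:       (m, m) covariance of the latent GP at the cell centres
    cell_area: size of each cell, so sum(lam * cell_area) approximates the integral over O
    """
    z = rng.multivariate_normal(mu, cov)   # latent Gaussian field z(s)
    lam = np.exp(z)                        # intensity lambda(s) = exp(z(s))
    counts = rng.poisson(lam * cell_area)  # independent Poisson count in each cell
    return z, lam, counts


rng = np.random.default_rng(0)
s = np.linspace(0.0, 10.0, 50)             # 1-D stand-in for the domain O
cov = squared_exponential(s, length_scale=1.0, variance=0.5) + 1e-9 * np.eye(s.size)
z, lam, counts = simulate_lgcp_counts(np.ones(s.size), cov, s[1] - s[0], rng)
expected_total = np.sum(lam * (s[1] - s[0]))  # Poisson mean of counts.sum(), given lam
```

Conditioning on the drawn field gives a Poisson process; averaging over many fields produces the extra clustering that distinguishes a Cox process from a plain Poisson process.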
Because conflict data is often logged in a discrete-time format (e.g., the day of an event as opposed to the precise time), we will consider a discrete-time series of continuous-space LGCPs. Formally, let $k \in \mathcal{K}$, $\mathcal{K} = \{1, 2, \dots, K\}$ denote a discrete-time index set and $\{z_k(s)\}$, $z_k(s) \sim \mathcal{GP}(\mu_k(s), \sigma_k^2 \rho_k(s, r))$, a set of temporally correlated spatial GPs, each with mean $\mu_k(s)$ and covariance function $\sigma_k^2 \rho_k(s, r)$. For each $k$, we then define the point process intensity function as $\lambda_k(s) = \exp(z_k(s))$. Frequently, the mean function of $z_k(s)$, $k \in \mathcal{K}$, can be related to explanatory variables, such as population density, which help to reduce prediction uncertainty. We hence let $d(s)$ be a vector of spatially referenced covariates and $b$ the corresponding vector of regression parameters; the LGCP at time $k$ then has intensity $\lambda_k(s) = \exp(b^{\mathrm{T}} d(s) + z_k(s))$.
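A minimal sketch of the covariate-driven intensity $\lambda_k(s) = \exp(b^{\mathrm{T}} d(s) + z_k(s))$, assuming the covariates and the latent fields have already been evaluated on a grid of cells; the function names and array layout are hypothetical.

```python
import numpy as np


def lgcp_intensity(b, D, Z):
    """lambda_k(s) = exp(b^T d(s) + z_k(s)) on a grid of m cells.

    b: (p,) regression parameters
    D: (m, p) covariates d(s), one row per cell
    Z: (K, m) latent fields z_k(s), one row per time step k
    Returns a (K, m) array of intensities.
    """
    return np.exp(D @ b + Z)  # D @ b has shape (m,) and broadcasts over the K rows


def expected_counts(lam, cell_area):
    """Expected number of events at each time step: the integral of lambda_k over O."""
    return lam.sum(axis=1) * cell_area


b = np.array([0.5, -1.0])
D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 cells, 2 covariates
Z = np.zeros((2, 3))                                 # 2 time steps, no latent variation
lam = lgcp_intensity(b, D, Z)
```

The covariate term is shared by all time steps, so only $z_k(s)$ carries the temporal dynamics.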
Author contributions: A.Z.M., V.K., and G.S. designed research; A.Z.M. and G.S. performed research; A.Z.M., M.D., and G.S. analyzed data; and A.Z.M. and G.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
$^{1}$To whom correspondence should be addressed. E-mail: G.Sanguinetti@ed.ac.uk.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1203177109/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1203177109 PNAS Early Edition 1 of 6
STATISTICS; SOCIAL SCIENCES

Naturally, the key question is how to specify the temporal dynamics of the intensity functions through $z_k(s)$; we need a sufficiently flexible modelling approach to incorporate the complexity of conflict dynamics. One such representation is the stochastic integro-difference equation (SIDE), a model originally introduced in ecology (18) which has rapidly gained popularity in spatiotemporal statistics (19). The SIDE relates the spatiotemporal dependent variable $z_k(s)$ to $z_{k+1}(s)$ through the following integral equation

$$ z_{k+1}(s) = \int_{\mathcal{O}} k_I(s, r)\, f_1\big(z_k(r)\big)\, \mathrm{d}r + e_k(s), \tag{1} $$
where $k_I(s, r)$ is the mixing kernel in the integral and $e_k(s)$ is an added disturbance, modeled as a Gaussian field with mean $\mu_Q(s)$ and covariance function $k_Q(s, r)$, $e_k(s) \sim \mathcal{GP}(\mu_Q(s), k_Q(s, r))$, and $\mathcal{O}$ is the spatial domain under investigation. The nonlinear mapping $f_1(\cdot)$ distorts the field in the sedentary stage; in this work we will employ the identity map $f_1(z_k(r)) = z_k(r)$, an assumption usually adopted in the absence of a priori knowledge (20). The SIDE is, in its original form, a very flexible modelling tool, capable of representing a number of dynamic effects such as diffusion and dispersal (or both simultaneously) even under considerably restrictive conditions (19). Although the AWD will suggest the use of only a special case of the SIDE, the two-pronged methodological approach we present here to estimate unknown components is in principle applicable to the more general case.
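The sketch below discretizes Eq. 1 on a regular one-dimensional grid with $f_1$ the identity, approximating the integral by a Riemann sum. The Gaussian kernel, the grid, and the widths are illustrative assumptions. For a smooth field such as $\sin(s)$, convolution with a Gaussian kernel of width $w$ scales it by $e^{-w^2/2}$, so a narrow kernel leaves the field almost unchanged while a wide kernel visibly smooths it; the narrow-kernel limit is the case the AWD analysis later reduces to a Dirac delta.

```python
import numpy as np


def gaussian_kernel_matrix(s, width):
    """Discretized homogeneous kernel k_I(s, r) = N(s - r; 0, width^2), scaled by
    the grid spacing so that K @ z is a Riemann-sum approximation of the integral."""
    ds = s[1] - s[0]
    d = s[:, None] - s[None, :]
    return np.exp(-0.5 * (d / width) ** 2) / (np.sqrt(2.0 * np.pi) * width) * ds


def side_step(z, K, mu_q, chol_q, rng):
    """One step of Eq. 1 with f_1 the identity:
    z_{k+1} = K z_k + e_k, with e_k ~ N(mu_q, chol_q chol_q^T) on the grid."""
    return K @ z + mu_q + chol_q @ rng.standard_normal(z.size)


s = np.linspace(-20.0, 20.0, 801)   # grid spacing 0.05
z = np.sin(s)
interior = np.abs(s) <= 10.0        # stay away from the truncated domain edges
no_noise = np.zeros((s.size, s.size))
rng = np.random.default_rng(1)
narrow = side_step(z, gaussian_kernel_matrix(s, 0.1), 0.0, no_noise, rng)
wide = side_step(z, gaussian_kernel_matrix(s, 1.0), 0.0, no_noise, rng)
```

Passing a nonzero `chol_q` (a Cholesky factor of $k_Q$ on the grid) and `mu_q` turns this into a stochastic simulation of the full SIDE.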
Nonparametric Analysis. We start by studying the correlation between the conflict events within the same and across subsequent time frames. We are interested in the probabilities of finding a conflict event at $r$ given that an event has occurred at $s$ within the same time frame $k$ or at the previous time frame $k-1$. In point process statistics these are quantified through the pair auto-correlation function (PACF) $g_{k,k}(s, r)$ and what we term the pair cross-correlation function (PCCF) $g_{k,k-1}(s, r)$, defined as

$$ g_{k,k}(s, r) = \frac{\rho^{(2)}_{k,k}(s, r)}{\rho^{(1)}_k(s)\,\rho^{(1)}_k(r)}, \tag{2} $$

$$ g_{k,k-1}(s, r) = \frac{\rho^{(2)}_{k,k-1}(s, r)}{\rho^{(1)}_k(s)\,\rho^{(1)}_{k-1}(r)}, \tag{3} $$

where $\rho^{(1)}_k(s) = \mathbb{E}[\lambda_k(s)]$ and $\rho^{(2)}_{k,k}(s, r) = \mathbb{E}[\lambda_k(s)\lambda_k(r)]$ (with $\rho^{(2)}_{k,k-1}(s, r) = \mathbb{E}[\lambda_k(s)\lambda_{k-1}(r)]$ defined analogously) are real and positive, and $\mathbb{E}[\cdot]$ denotes the expectation operator.

The PACF may be used to determine qualitative characteristics of the conflict; for instance, if $g_{k,k}(s, r) = 1$, then no spatial pattern can be extracted from the data, while $g_{k,k}(s, r) > 1$ and $g_{k,k}(s, r) < 1$ can be used to indicate conflict aggregation and repulsion, respectively. The PACF can also be used as a preprocessing tool for dimensionality reduction. Direct use of the PACF and PCCF for nonparametric field estimation is also possible (SI Text), but our preliminary investigation showed that this is only a reliable proposition for homogeneous datasets with a very large number of events (SI Text).
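As an illustration of Eq. 2, the sketch below estimates the PACF at a handful of locations from Monte Carlo draws of a log-Gaussian intensity and labels each pair with the thresholds above. It computes the moments from simulated intensities, not from observed event locations as a nonparametric estimator applied to data would; the covariance matrix, the sample size, and the tolerance are assumptions. For an LGCP the estimate should approach $\exp(\mathrm{cov}[z(s), z(r)])$, the closed form stated as Eq. 4 in the next subsection.

```python
import numpy as np


def pacf_from_intensity_samples(lam_samples):
    """Eq. 2 from S Monte Carlo draws of lambda_k at m locations.

    lam_samples: (S, m) array. Returns the (m, m) matrix
    g[s, r] = E[lam(s) lam(r)] / (E[lam(s)] E[lam(r)]).
    """
    rho1 = lam_samples.mean(axis=0)                            # first moments
    rho2 = lam_samples.T @ lam_samples / lam_samples.shape[0]  # second moments
    return rho2 / np.outer(rho1, rho1)


def classify_pairs(g, tol=0.05):
    """Label pairs as aggregation (g > 1), repulsion (g < 1) or no pattern (g ~ 1)."""
    return np.where(g > 1 + tol, "aggregation",
                    np.where(g < 1 - tol, "repulsion", "none"))


C = np.array([[0.5, 0.3, -0.2],
              [0.3, 0.5, 0.0],
              [-0.2, 0.0, 0.5]])          # covariance of z at three locations
rng = np.random.default_rng(2)
lam = np.exp(rng.multivariate_normal(np.zeros(3), C, size=200_000))
g = pacf_from_intensity_samples(lam)
labels = classify_pairs(g)
```

Negative field covariance yields $g < 1$ (repulsion), zero covariance yields $g \approx 1$, and positive covariance yields aggregation.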
Dimensionality Reduction and Bayesian Inference. In order to develop an inferential approach for SIDE-driven LGCPs, we adopt a basis function representation of the spatiotemporal field, which we then truncate at a level that gives sufficient accuracy (21). This representation, frequently employed in spatiotemporal modelling [e.g., process convolution models (22, 23)], in turn facilitates the implementation of computationally efficient inference algorithms.

The choice of basis functions is a problem that deserves attention; as far as we are aware, there are no standard solutions for LGCPs. We propose here a general approach to selecting basis functions based on the nonparametric estimation of the PACF. Specifically, we capitalize on (i) a fundamental lemma of LGCPs,

$$ g_{k,k}(s, r) = \exp\big(\sigma_k^2 \rho_k(s, r)\big), \tag{4} $$

which states that the log PACF is proportional to the field auto-correlation function, and (ii) the auto-correlation theorem (24), which states that the Fourier transform of the auto-correlation function is the power spectrum of the signal. Hence, a relationship between the frequency content of the point process and the PACF is found, which in turn may be used to select a set of sufficiently representative basis functions, much along the lines of refs. 21 and 25. We then obtain a decomposition of the field, the disturbance mean, the kernel, and the disturbance covariance as

$$ z_k(s) \approx \phi(s)^{\mathrm{T}} x_k, \tag{5} $$

$$ \mu_Q(s) \approx \phi(s)^{\mathrm{T}} \vartheta, \tag{6} $$

$$ k_I(s, r) \approx \phi(s)^{\mathrm{T}} \Psi_I\, \phi(r), \tag{7} $$

$$ k_Q(s, r) \approx \phi(s)^{\mathrm{T}} \Sigma_Q\, \phi(r), \tag{8} $$

where $\phi(s) \in \mathbb{R}^n$ is the vector of basis functions, $x_k \in \mathbb{R}^n$ and $\vartheta \in \mathbb{R}^n$ are weights which reconstruct the spatiotemporal field and the disturbance mean, respectively, and $\Psi_I \in \mathbb{R}^{n \times n}$ and $\Sigma_Q \in \mathbb{R}^{n \times n}$ reconstruct the kernel and the disturbance covariance function, respectively.
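A minimal sketch of the decomposition in Eqs. 5–8 on a one-dimensional grid, assuming Gaussian radial basis functions with hand-picked centres and a common width (in the paper these are chosen with the PACF-based frequency analysis, which is not reproduced here). Weights are obtained by least squares, and a covariance function is reconstructed from an $n \times n$ weight matrix.

```python
import numpy as np


def gaussian_rbf_basis(s, centres, width):
    """Matrix Phi with Phi[i, j] = phi_j(s_i) for Gaussian radial basis functions."""
    return np.exp(-0.5 * ((s[:, None] - centres[None, :]) / width) ** 2)


def project_field(Phi, z):
    """Least-squares weights x such that z(s) ~= phi(s)^T x (Eq. 5)."""
    x, *_ = np.linalg.lstsq(Phi, z, rcond=None)
    return x


def covariance_from_weights(Phi, Sigma):
    """k(s, r) ~= phi(s)^T Sigma phi(r) on the grid (form of Eqs. 7 and 8)."""
    return Phi @ Sigma @ Phi.T


s = np.linspace(0.0, 10.0, 200)
centres = np.linspace(0.0, 10.0, 6)   # n = 6 basis functions (hypothetical choice)
Phi = gaussian_rbf_basis(s, centres, width=1.5)
x_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5, -1.0])
z = Phi @ x_true                      # a field lying exactly in the span of the basis
x_hat = project_field(Phi, z)
K_q = covariance_from_weights(Phi, 0.1 * np.eye(centres.size))
```

Any positive semidefinite $\Sigma_Q$ yields a valid (positive semidefinite) covariance on the grid, which is one reason to parametrize $k_Q$ through its weights.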
It can be shown (SI Text) that under this decomposition the SIDE of Eq. 1 can be represented in the compact form

$$ x_{k+1} = A_I x_k + w_k(\vartheta, \Sigma_Q), \tag{9} $$

where $A_I \in \mathbb{R}^{n \times n}$ (which depends on $\Psi_I$; SI Text) and $w_k \in \mathbb{R}^n$ is a Gaussian colored noise term with mean $\mathbb{E}[w_k] = \vartheta$ and covariance $\mathrm{cov}[w_k] = \Sigma_Q$. Eq. 9 is a standard linear dynamical system where both the states $X_K = x_{0:K} = \{x_k\}_{k=0}^{K}$ and the unknown parameters $\{\vartheta, \Psi_I, \Sigma_Q^{-1}\}$ need to be estimated from the data $Y_K = \{y_k\}_{k=1}^{K}$, where we define each $y_k$ to be the set of coordinates of the logged events at the $k$th time point.
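Eq. 9 can be simulated directly once $A_I$, $\vartheta$, and $\Sigma_Q$ are given. The sketch below is a plain forward simulation that assumes these quantities are known (in the paper they are estimated); setting $A_I$ to the identity gives the random walk that the AWD analysis arrives at later (Eq. 15).

```python
import numpy as np


def simulate_lds(A, vartheta, Sigma_q, x0, K, rng):
    """Forward-simulate Eq. 9: x_{k+1} = A x_k + w_k with w_k ~ N(vartheta, Sigma_q).

    Returns an array of shape (K + 1, n) holding x_0, ..., x_K.
    """
    n = x0.size
    L = np.linalg.cholesky(Sigma_q)   # Sigma_q = L L^T
    X = np.empty((K + 1, n))
    X[0] = x0
    for k in range(K):
        w = vartheta + L @ rng.standard_normal(n)
        X[k + 1] = A @ X[k] + w
    return X


rng = np.random.default_rng(3)
vartheta = np.array([0.1, -0.2])
# A = I gives a random walk with drift; a tiny Sigma_q makes the drift visible
X = simulate_lds(np.eye(2), vartheta, 1e-12 * np.eye(2), np.zeros(2), 10, rng)
```

With $A_I = I$ the weights accumulate the drift $\vartheta$ at every step, which is why $\vartheta$ can be read as a growth rate.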
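For inference, we make use of the likelihood function

$$ p\big(y_k \mid \lambda_k(s)\big) = \prod_{s_j \in y_k} \lambda_k(s_j)\, \exp\left(-\int_{\mathcal{O}} \lambda_k(s)\, \mathrm{d}s\right), \tag{10} $$

and approximate each $\lambda_k(s)$ using the same basis representation:

$$ \lambda_k(s) = \exp\big(b^{\mathrm{T}} d(s) + z_k(s)\big) \approx \exp\big(b^{\mathrm{T}} d(s) + \phi(s)^{\mathrm{T}} x_k\big). \tag{11} $$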
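The log of Eq. 10 under the basis approximation of Eq. 11 can be evaluated with a quadrature rule for the integral. The sketch below uses a midpoint rule on equal-area cells; the callables `phi` and `d`, the grid, and the constant-intensity check are hypothetical stand-ins for the basis functions and covariates.

```python
import numpy as np


def log_likelihood(events, grid, cell_area, x_k, b, phi, d):
    """log p(y_k | lambda_k) from Eq. 10, with lambda_k from Eq. 11.

    events:    (J, 2) coordinates s_j of the events logged at time k
    grid:      (m, 2) midpoints of equal-area cells covering O
    cell_area: area of each cell
    phi, d:    callables mapping (N, 2) locations to (N, n) basis values
               and (N, p) covariate values, respectively
    """
    log_lam_events = d(events) @ b + phi(events) @ x_k   # log lambda_k(s_j)
    lam_grid = np.exp(d(grid) @ b + phi(grid) @ x_k)      # lambda_k at the cell midpoints
    return log_lam_events.sum() - lam_grid.sum() * cell_area


def constant_basis(S):
    return np.ones((len(S), 1))


def no_covariates(S):
    return np.zeros((len(S), 1))


# Check with a constant intensity of 2 on the unit square: expect 3 ln 2 - 2
mid = (np.arange(10) + 0.5) / 10
grid = np.array([(u, v) for u in mid for v in mid])
events = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.3]])
ll = log_likelihood(events, grid, 0.01, np.array([np.log(2.0)]), np.zeros(1),
                    constant_basis, no_covariates)
```

The integral term is what makes the posterior over $x_k$ non-Gaussian, since it involves an exponential of the weights.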
We proceed with a computationally efficient variational Bayes (VB) method, approximating the full posterior distribution as

$$ p(X_K, \theta, b \mid Y_K) = p(X_K, \vartheta, \Psi_I, \Sigma_Q^{-1}, b \mid Y_K) \approx \tilde{p}(X_K)\, \tilde{p}(\vartheta)\, \tilde{p}(\Psi_I)\, \tilde{p}(\Sigma_Q^{-1})\, \tilde{p}(b), \tag{12} $$

where $\theta = \{\vartheta, \Psi_I, \Sigma_Q^{-1}\}$ and the $\tilde{p}(\cdot)$ are the variational marginals (26, 27).

The variational marginals are able to reveal important properties of the conflict progression: $X_K$ is used to reconstruct the spatiotemporal field at every time point, $\vartheta$ reveals the spatially varying escalation in conflict, $\Psi_I$ the extent of spatial dynamics, if any, and $\Sigma_Q$ the volatility of the conflict, which can either be localized or dependent on events happening at remote geographical locations. The number of unknown parameters in the reduced model scales as $O(n^2)$, where $n$ is the number of basis functions retained. However, as we will see later, nonparametric data analysis can suggest further simplifications which can considerably lower the complexity of the model.
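For reference, a factorization such as Eq. 12 is typically optimized by coordinate ascent using the standard mean-field result below, where $\theta_j$ ranges over the factors $X_K, \vartheta, \Psi_I, \Sigma_Q^{-1}, b$ and each update does not decrease the lower bound on the log evidence given in the second line. This is the generic VB identity rather than the specific update equations of this paper (given in SI Text); in particular, with the Poisson likelihood of Eq. 10 the expectation needed for $X_K$ is generally not available in closed form and requires a further approximation.

```latex
\begin{aligned}
\ln \tilde{p}(\theta_j) &= \mathbb{E}_{\prod_{i \neq j} \tilde{p}(\theta_i)}
  \big[ \ln p(Y_K, X_K, \vartheta, \Psi_I, \Sigma_Q^{-1}, b) \big] + \text{const}, \\
\ln p(Y_K) &\geq \mathbb{E}_{\tilde{p}}\big[ \ln p(Y_K, X_K, \vartheta, \Psi_I, \Sigma_Q^{-1}, b) \big]
  - \mathbb{E}_{\tilde{p}}\big[ \ln \tilde{p}(X_K)\,\tilde{p}(\vartheta)\,\tilde{p}(\Psi_I)\,\tilde{p}(\Sigma_Q^{-1})\,\tilde{p}(b) \big].
\end{aligned}
```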
The Afghan War Diary
On July 25, 2010, WikiLeaks publicly made available a compendium of US military war logs in Afghanistan dating between 2004 and 2009. The so-called Afghan War Diary contains a detailed insider's description of the military machinery of the world's largest power; it consists of roughly 77,000 logs, and the entries detail the time and position of an event, which could be anything from a stop-and-search episode to a gunfight. The dataset is considered a reliable description of the Afghan war, and systematic verification efforts carried out by several organizations such as the New York Times* have found little reason to dispute its authenticity. SI Text reports some of our own tests, which show significant correlations between the logged event rate in the AWD and that in other datasets. In what follows we adopt the spatiotemporal point process approach to infer a model from the data in the AWD and use it to analyze the heterogeneous growth (through $\vartheta$) and volatility (through $\Sigma_Q$) of the conflict in Afghanistan, and also to predict violence of armed opposition groups in 2010, a year after the end of the WikiLeaks dataset.
We start with a nonparametric analysis (SI Text) of the data by splitting the data into weekly intervals ($\Delta t = 1$ week) and looking at the temporally averaged PACF and PCCF fitted to Gaussian radial basis functions. It is found that, on average, the log PACF is nearly identical to the log PCCF and that a nonparametric estimate of a homogeneous kernel $k_I(\lVert s - r \rVert)$, computed with the direct inverse filter, is very narrow in relation to the extent of the spatial correlations in the field (SI Text). This observation suggests that $k_I$ in the SIDE may be safely approximated by $\gamma(s)\,\delta(s - r)$, corresponding to negligible spatial interactions across adjacent time frames. Note that if $e_k(s)$ is restricted to be homogeneous and $\gamma(s) = \gamma$, the spatiotemporal covariance function is separable, a common assumption in several fields such as epidemiology (15). However, given the data characteristics, we chose to maintain the spatial heterogeneity in $e_k(s)$. We also set $\gamma(s) = 1$, as we found no evidence of mean reversion at either a national or a provincial level; additionally, we found that a spatially dependent $\gamma(s)$ did not contribute to increased prediction accuracy.
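The weekly binning can be sketched as follows, assuming event times are available as days elapsed since the start of the record. The fractional increment is taken here as $(N_{k+1} - N_k)/N_k$, computed only where $N_k > 0$; this exact definition is an assumption. The resulting increments could then be passed to `scipy.stats.shapiro` and `scipy.stats.levene` for normality and equal-variance tests of the kind reported in the next paragraph.

```python
import numpy as np


def weekly_counts(event_days, n_weeks):
    """Weekly report counts N_k (Delta t = 1 week) from event times given in
    days since the start of the record; events past n_weeks are ignored."""
    weeks = np.floor(np.asarray(event_days, dtype=float) / 7.0).astype(int)
    return np.bincount(weeks, minlength=n_weeks)[:n_weeks]


def fractional_increments(N):
    """(N_{k+1} - N_k) / N_k for consecutive weeks with N_k > 0."""
    N = np.asarray(N, dtype=float)
    prev, nxt = N[:-1], N[1:]
    keep = prev > 0
    return (nxt[keep] - prev[keep]) / prev[keep]


N = weekly_counts([0, 1, 6.9, 7, 8, 14, 20, 21.5], n_weeks=4)
inc = fractional_increments(N)
```

Event times must be non-negative for `np.bincount`; in practice they would be measured from the first day of the record.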
The resulting formulation is validated by studying the temporal dynamics of the AWD (Fig. 1A). A quantitative analysis reveals that the fractional increments of the event incidence nationwide are normally distributed (with a one-tailed Shapiro–Wilk test and a Levene's test with $\alpha = 0.1$, $n = 312$ weeks; see also Fig. 1 B and C)†. Normally distributed fractional increments are characteristic of systems following a geometric Brownian motion, given by

$$ \mathrm{d}\lambda(s, t) = \tilde{R}(s)\,\lambda(s, t)\,\mathrm{d}t + \lambda(s, t)\,\mathrm{d}W(s, t), \tag{13} $$

where the increment $\mathrm{d}W(s, t)$ is a Gaussian process with zero mean and covariance function $k_Q(s, r)\,\mathrm{d}t$, and $\tilde{R}(s)$ is a spatially varying percentage drift. Applying Itô's lemma (28) to $\ln \lambda(s, t)$ and noting that the continuous-time intensity satisfies $\ln \lambda(s, t) = b^{\mathrm{T}} d(s) + z(s, t)$, we obtain the following form for $z(s, t)$:

$$ \mathrm{d}z(s, t) = R(s)\,\mathrm{d}t + \mathrm{d}W(s, t), \tag{14} $$

where $R(s) = \tilde{R}(s) - \tfrac{1}{2}\sigma(s)^2$ is a heterogeneous, temporally independent spatial growth rate and $\sigma(s)^2 = k_Q(s, s)$ is the variance field. Applying an explicit Euler discretization scheme to Eq. 14, one obtains the model $z_{k+1}(s) = z_k(s) + e_k(s)$, where $e_k(s)$ has mean $\mu_Q(s) = R(s)\Delta t$ and covariance function $k_Q(s, r)\Delta t$. This model is, as expected, the SIDE with the Dirac-delta kernel.
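A quick numerical check of the Itô correction in Eq. 14, assuming a single location with constant $\tilde{R}$ and $\sigma^2$ (values chosen for illustration only): an Euler–Maruyama simulation of Eq. 13 yields a mean log-growth rate close to $\tilde{R} - \sigma^2/2$ rather than $\tilde{R}$. With weekly steps, $\mu_Q(s) = R(s)\Delta t$ is then the drift of the resulting random walk in $z$.

```python
import numpy as np


def mean_log_growth_gbm(R_tilde, sigma2, T, n_steps, n_paths, rng):
    """Euler-Maruyama simulation of Eq. 13 at one location:
    d lambda = R_tilde * lambda dt + lambda dW, with Var[dW] = sigma2 * dt.
    Returns the Monte Carlo mean of ln(lambda_T / lambda_0) / T."""
    dt = T / n_steps
    lam = np.ones(n_paths)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(sigma2 * dt), size=n_paths)
        lam *= 1.0 + R_tilde * dt + dW
    return np.log(lam).mean() / T


rng = np.random.default_rng(4)
R_tilde, sigma2 = 0.1, 0.04
growth = mean_log_growth_gbm(R_tilde, sigma2, T=1.0, n_steps=1000, n_paths=20_000, rng=rng)
# Eq. 14 predicts growth close to R = R_tilde - sigma2 / 2 = 0.08, not R_tilde = 0.1
```

The gap between the two rates grows with $\sigma^2$, so highly volatile regions can show a smaller log-growth rate than their percentage drift suggests.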
The field is next decomposed and Eqs. 6 and 8 are applied to finally obtain the random walk model occasionally employed in spatiotemporal studies (29)

$$ x_{k+1} = x_k + w_k(\vartheta, \Sigma_Q). \tag{15} $$

For basis function selection we employed the aforementioned frequency-based approach (see SI Text for complete details). Finally, we chose population density and the distance to the nearest major city as covariates (see SI Text for details on how this choice was made). Inference was carried out using the VB algorithm described above. Full derivations, algorithmic details, and configuration parameters (priors and stopping conditions), as well as indicative run times, are given in SI Text, which also contains a detailed simulation study showing the identifiability of the model under flat priors and a comparison with kernel-based estimators (30).
Fig. 1. Temporal analysis of the AWD. (A) Weekly number of activity reports in Afghanistan between January 2004 and December 2009 (bin size 1 week). (B) Distribution of weekly fractional increments in report count in the AWD, where $N_k$ denotes the number of reports in week $k$. (C) Corresponding normal probability plot. Fourteen points (4.5% of data) were marked as clear outliers as a result of low report count and were not used in this analysis.
*http://www.nytimes.com/2010/07/26/world/26editors-note.html?_r=1
†Levene's test failed to reject the null hypothesis of constant variance for the years 2006 to 2009, but rejected it when 2004 and 2005 were included. The rejection when including the earlier two years can be safely attributed to the relatively low report counts in those years, which yield noisy fractional increments.

Results
Conflict Intensity and Regression Parameters. State inference leads to broad conclusions as to where and how the conflict intensity has increased, decreased, or shifted in time. We show the posterior mean intensity at regular intervals in SI Text and also in Movie S1, together with the underlying AWD events at a weekly resolution. The progression of the intensity captures important geographical features of the war scenario. Regions of high intensity in 2009 include Sangin in northern Helmand (see SI Text for a provincial map), one of the most dangerous places in Afghanistan, notorious for thousands of improvised explosive devices and frequent suicide bombings (2). Other regions, such as Kabul, Nangarhar, and Paktya provinces, on the other hand, have witnessed high activity throughout the six-year interval. Also very apparent is the emergence in later years of a high-intensity ring starting from Kabul, extending southwards towards Kandahar, up through Herat, through Balkh, and back to Kabul. This roughly elliptical shape corresponds to the country's ring road, commonly targeted by insurgent activity and the placement of improvised explosive devices (2). We note that a representative spatiotemporal intensity map may also be obtained with the use of standard nonparametric kernel estimators (30), as seen in Movie S2.
The regression parameters corresponding to population density and distance to the closest major city were estimated to be $1.97 \times 10^{-4} \pm 6.2 \times 10^{-6}$ ($\pm 2\sigma$) and $-0.037 \pm 2.1 \times 10^{-4}$, respectively. This result reflects the fact that the vast majority of logs in the AWD, as with typical conflict datasets, are located in urban and highly populated areas (7).
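Because covariates enter the intensity log-linearly, a coefficient translates into a multiplicative effect $\exp(b_j \Delta d_j)$ on $\lambda_k(s)$ when covariate $j$ changes by $\Delta d_j$ with everything else held fixed. The covariate units are not stated in this section, so the change of 1000 units used below is purely illustrative; with the population-density estimate it gives a factor of about 1.22.

```python
import math


def intensity_ratio(coef, delta):
    """Multiplicative change in lambda_k(s) = exp(b^T d(s) + z_k(s)) when one
    covariate changes by `delta` with all other terms held fixed."""
    return math.exp(coef * delta)


# Population-density coefficient from the text; the 1000-unit change is illustrative
ratio = intensity_ratio(1.97e-4, 1000.0)
```

The same function applied to the lower and upper ends of the reported interval gives the corresponding range of multiplicative effects.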
Conflict Escalation and Volatility. A major advantage of the adopted model-based approach is the ability to draw quantitative conclusions on aspects of the conflict scenario other than the intensity. For instance, in the AWD we have modeled the spatially varying escalation of conflict in Afghanistan between 2004 and 2009 through $\vartheta$ (Fig. 2) and the volatility of the conflict progression in the same period through the diagonal elements of $\Sigma_Q$ (Fig. 3).
Escalation (or deescalation) may be used to distinguish between event hot spots and growth hot spots. This feature is, in itself, a major advantage over conflict clustering analysis, which cannot discern whether a cluster was a one-off or a sign of a deteriorating situation. In the AWD it is very evident, for instance, that while some of the high-growth areas such as Helmand also had an overall high count of events, this was not generally the case; for example, Sar-e Pul and Balkh in the north and Badghis province in the west all witnessed a modest total event count but are seen to have had a significant overall growth in activity throughout the years.
The volatility/predictability of the conflict is also of considerable interest. In our case, a small diagonal value in $\Sigma_Q$ indicates that, based on the data so far, the future intensity may be predicted with reasonable accuracy. On the other hand, a large value is a sign of considerable volatility; little can be said about the future. Such inferences are vital for decision purposes: simply stated, it might prove a better option to admit a large uncertainty about the future than to base a policy decision on a highly uncertain prediction. Consider, for instance, the high volatility in the eastern part of Farah province in western Afghanistan (see SI Text). A subsequent analysis of the video (Movie S1) shows spurious clusters emerging in April 2005 and towards the end of 2006, an indication that the conflict dynamics in this part of Afghanistan are relatively hard to predict, even more so than in Sangin, which had seen a drastic, but relatively smooth, increase in events in the latter years.
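To see how a volatility value translates into forecast uncertainty, consider the random walk $z_{k+1}(s) = z_k(s) + e_k(s)$: the $h$-week-ahead variance of $z$ at $s$ grows as $h\sigma^2(s)$, so the central 90% interval of the intensity spans a factor $\exp(1.645\sqrt{h\sigma^2(s)})$ either side of the median. The sketch below assumes that $\sigma^2$ is a per-week variance of the log-intensity, as in the volatility threshold of Fig. 3 (0.055), and ignores parameter uncertainty; under those assumptions a 52-week horizon gives a factor of roughly 16.

```python
import math

Z90 = 1.6448536269514722  # standard normal quantile for a central 90% interval


def predictive_log_sd(sigma2, horizon_weeks):
    """Std. dev. of z_{k+h}(s) - z_k(s) under z_{k+1} = z_k + e_k when e_k(s)
    has variance sigma2 per week (parameter uncertainty ignored)."""
    return math.sqrt(sigma2 * horizon_weeks)


def interval_factor(sigma2, horizon_weeks):
    """Factor f such that lambda_{k+h}(s) lies in [m / f, m * f] with ~90%
    probability, where m is the median prediction."""
    return math.exp(Z90 * predictive_log_sd(sigma2, horizon_weeks))


f_month = interval_factor(0.055, 4)    # about 2.2
f_year = interval_factor(0.055, 52)    # about 16
```

This square-root growth of the log-scale spread is why a high-volatility region quickly becomes unpredictable at yearly horizons.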
Fig. 2. AWD activity growth in Afghanistan. (A) Posterior mean fractional increase in logs per week in the AWD between 2004 and 2009. Only regions with positive overall growth are shown. (B–F, left) Spatial map of all events occurring in a square of side 100 km centered on the city under study. (B–F, right) Number of weekly events $N_k$ in these regions (line) together with the estimated 90% confidence intervals (green shading).
Fig. 3. Volatility in conflict events between 2004 and 2009 in the WikiLeaks AWD. Only regions with a high volatility ($\sigma^2 > 0.055$) are shown.
‡Reports are freely available from the official ANSO website http://www.afgnso.org.

Prediction. The key advantage of dynamic point process modelling is the ability to make statistical predictions of the system's behavior for decision making. To illustrate this feature we considered the frequency of incidents by armed opposition groups (AOG) and predicted it in 2010, a year after the termination of the WikiLeaks dataset. AOG activity on a provincial scale was obtained from the Afghanistan NGO Safety Office (ANSO) safety reports‡. Prediction was carried out by (i) sampling a trajectory $z^{*}_k$ through $\tilde{p}(X_K)$ in 2009; (ii) forward simulating each trajectory for 52 weeks (2010) using the generative model with the parameters $\vartheta$, $\Sigma_Q$, and $b$ set to $\mathbb{E}_{\tilde{p}(\vartheta)}[\vartheta]$, $\big(\mathbb{E}_{\tilde{p}(\Sigma_Q^{-1})}[\Sigma_Q^{-1}]\big)^{-1}$, and $\mathbb{E}_{\tilde{p}(b)}[b]$, respectively; (iii) integrating the interpolated sample over the $i$th province to give $\hat{z}_{k,i}$; (iv) finding the corresponding intensity $\hat{\lambda}_{k,i}$; (v) averaging the intensity over 52-week intervals to obtain $\hat{\lambda}_{2009,i}$ and $\hat{\lambda}_{2010,i}$; (vi) generating two samples $N_{i,2009}$ and $N_{i,2010}$ from Poisson random variables with intensities $\hat{\lambda}_{2009,i}$ and $\hat{\lambda}_{2010,i}$; (vii) predicting a provincial AOG count in 2010, $\mathrm{AOG}_{i,2010}$, from that in 2009, $\mathrm{AOG}_{i,2009}$, through the formula

$$ \mathrm{AOG}_{i,2010} = \frac{N_{i,2010}}{N_{i,2009}}\, \mathrm{AOG}_{i,2009}; \tag{16} $$

and (viii) repeating (i)–(vii) $N = 2{,}000$ times. Note that although Eq. 16 is a very simple predictor, one which assumes a linear relationship (without offset), it reflects the fact that the frequency of the logs in the WikiLeaks dataset is significantly correlated with the saliency of AOG-initiated attacks in Afghanistan, particularly in 2009 (SI Text).
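Steps (vi) to (viii) can be sketched as below for a single province, assuming the 52-week averaged intensities from steps (i) to (v) are already available as one value per Monte Carlo trajectory. How draws with $N_{i,2009} = 0$ were handled is not stated here; this sketch simply drops them, which is an assumption, as are the constant intensities used in the example.

```python
import numpy as np


def predict_aog(lam_2009, lam_2010, aog_2009, rng):
    """Steps (vi)-(vii) for one province, vectorized over the N trajectories.

    lam_2009, lam_2010: (N,) averaged intensities, one per MC trajectory
    aog_2009:           observed AOG count for the province in 2009
    Returns the predicted 2010 AOG counts (Eq. 16) for the usable draws.
    """
    n_2009 = rng.poisson(lam_2009)
    n_2010 = rng.poisson(lam_2010)
    ok = n_2009 > 0                      # assumption: drop draws with N_{i,2009} = 0
    return n_2010[ok] / n_2009[ok] * aog_2009


rng = np.random.default_rng(5)
N = 2000                                 # number of repetitions, as in step (viii)
lam_2009 = np.full(N, 1.0e6)             # hypothetical intensities (a doubling)
lam_2010 = np.full(N, 2.0e6)
pred = predict_aog(lam_2009, lam_2010, aog_2009=100, rng=rng)
q1, median, q3 = np.percentile(pred, [25, 50, 75])
```

The quartiles and median computed at the end are the quantities summarized per province in the box plots of Fig. 4A.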
As seen from Fig. 4 A and B, the prediction medians from the model closely match the observed values. In Baghlan, for instance, AOG activity rose by 120% (17.3% using log counts) from 100 incidents in 2009 to 222 in 2010; the model predicted a median 2010 increase of 128% (17.9%) to a count of 228. Badakhshan saw a 19% (5.5%) growth in 2010; our model predicted a median growth of 23% (7.0%). Further, a correlation test between the predicted medians and actual incident counts for all 32 provinces gave a Pearson's correlation coefficient of 0.81 on a linear scale and 0.89 under a log transform (Fig. 4B), showing strong support for the prediction capability.
Despite this, for some provinces (such as Badghis), the median remains substantially offset from the true value. The disparities are, however, consistent with the predictive distributions. From Fig. 4A it is seen that the counts in 62.5% of the predicted provinces lie between the lower and upper quartiles and, more importantly, all of them lie within the 99% confidence intervals. The same holds for the predicted change in AOG activity in 2010, the distributions of which are given in SI Text. Here, too, the model is seen to be well tuned, supplying confidence intervals that consistently capture the true activity growth (Fig. 4C).
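The calibration checks in this paragraph and the correlation test in the previous one can be computed from the Monte Carlo output as sketched below. The probability integral transform (PIT) value of each province is the fraction of predictive samples at or below the observed value; for a well-tuned model these values are roughly uniform, which is the property displayed by the cumulative plot in Fig. 4C. Array shapes, function names, and the synthetic check are hypothetical.

```python
import numpy as np


def calibration_summary(pred_samples, observed):
    """pred_samples: (P, N) MC predictions for P provinces; observed: (P,).

    Returns the fraction of provinces whose observation lies between the
    predictive quartiles, and each province's PIT value (the fraction of
    predictive samples at or below the observation)."""
    q1, q3 = np.percentile(pred_samples, [25, 75], axis=1)
    within_iqr = np.mean((observed >= q1) & (observed <= q3))
    pit = np.mean(pred_samples <= observed[:, None], axis=1)
    return within_iqr, pit


def median_correlation(pred_samples, observed, log=False):
    """Pearson correlation between predictive medians and observations."""
    med = np.median(pred_samples, axis=1)
    if log:
        med, observed = np.log(med), np.log(observed)
    return np.corrcoef(med, observed)[0, 1]


# Synthetic check: observations drawn from the predictive distribution itself
rng = np.random.default_rng(6)
samples = rng.normal(size=(4000, 500))
obs = rng.normal(size=4000)
frac, pit = calibration_summary(samples, obs)
```

With only 32 provinces, as in the AWD study, the within-quartile fraction is itself noisy, so a value such as 62.5% is compatible with the nominal 50%.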
Thus, although the true count is not always close to the pointwise median predictions, the predictions are accurate in a statistical sense; i.e., the predicted and observed distributions of AOG growth across provinces match closely. Further, the above results were obtained merely from the AWD up to 2009 and did not include any knowledge of events in 2010, such as military plans or deployments/withdrawal of troops. Incorporation of domain knowledge to reduce the predictive variance would, in principle, be straightforward in our model through manipulation of the prior distributions or inclusion of further relevant exogenous inputs.
Discussion and Conclusions
Our results demonstrate that statistical spatiotemporal modelling can be an extremely valuable tool in the analysis of conflict. The analysis of the AWD data shows that data modelling can yield insights that cannot be achieved by simple visualization or by the use of descriptive statistics. This claim is borne out by the availability of a spatially resolved map of the growth of conflict intensity, as well as of the volatility/predictability of the conflict. Furthermore, the availability of statistical confidence intervals associated with all model predictions is an important feature of our modelling framework and a potentially crucial feature for decision making.
The most striking result of our analysis is the ability to accurately predict (in a statistical sense) conflict dynamics for a whole year after the end of the AWD data on which the model was trained. While we do not have a simple mechanism underlying our model, the fact that a latent Gaussian model can produce predictions of this quality is unlikely to be a matter of chance. Intuitively, we believe that the type of conflict we are modelling may be the main reason why our method works. The Afghan conflict is characterized by insurgent movements and qualifies as a case of irregular warfare, where activity is only loosely interdependent and carried out by a myriad of disparate groups. Some averaging effects may be leading to the Gaussian behavior of the conflict's intensity, which in turn may be exploited for modelling purposes.

Fig. 4. Prediction of AOG growth in 2010. (A) Box-and-whisker plots of the predicted log AOG activity in 2010 using 2,000 MC runs. For each province, the box marks the first and third quartiles; the median (red line), mean (black circle), and true reported count (green circle) are also given. The whiskers extend to the furthest MC points that are within 1.5 times the interquartile range (99% coverage), and the outliers are plotted individually (red cross). (B) Comparison between the median log model prediction and the log AOG count in 2010, where the mark number corresponds to the province number denoted in A. (diagonal line) Ideal prediction. (C) Cumulative distribution of growth prediction on a province-by-province basis. The graph shows correct tuning of the model, with approximately x% of provinces lying within the xth percentile of the predictive distribution. (solid line) True cumulative score. (dotted line) Ideal cumulative score.
Naturally, as with all modelling techniques, our approach comes with limitations as well as benefits. From the technical point of view, reliable parameter estimation in point processes requires a sufficiently large number of events within the region of interest. While it is difficult to put a precise figure on this number, we found that parameter estimation in provinces with fewer than a few dozen events a year was extremely difficult. Another limitation may be the suitability of the modelling approach to generic conflict scenarios. Our approach appears to be more suitable for fragmented scenarios such as Afghanistan than for conventional wars between well-organized armies. Finally, we have assumed temporal invariance of the parameters. Sequential implementations allowing continuous estimation of slowly varying governing parameters are in principle straightforward (11) and offer an attractive way forward for the study of conflict.
In conclusion, the analysis presented in this paper has been made possible by the development of statistical methodologies to handle large-scale spatiotemporal datasets. Given the increased availability of such datasets from remote sensing or social networking sources, we envisage that methods such as those used here will become increasingly useful in a number of disciplines.
ACKNOWLEDGMENTS. This work was supported in part by the Pattern Analysis, Statistical Modelling, and Computational Learning 2 (PASCAL) FP7 Network of Excellence, and by a studentship from the University of Sheffield to A.Z.-M. G.S. is funded by the Scottish Government through the SICSA initiative. V.K. is part-funded by the EPSRC platform grant EP/H00453X/1.
1. Raleigh C, Linke A, Hegre H, Karlsen J (2010) Introducing ACLED: an armed conflict location and event dataset. J Peace Res 47:651–660.
2. O'Loughlin J, Witmer FDW, Linke AM, Thorwardson N (2010) Peering into the fog of war: the geography of the WikiLeaks Afghanistan war logs, 2004–2009. Eurasian Geogr Econ 51:472–495.
3. O'Loughlin J, Witmer FDW, Linke AM (2010) The Afghanistan-Pakistan wars, 2008–2009: micro-geographies, conflict diffusion, and clusters of violence. Eurasian Geogr Econ 51:437–471.
4. Gleditsch KS, Weidmann NB (2012) Richardson in the information age: GIS and spatial data in international studies. Annu Rev Polit Sci 15:461–481.
5. Bohannon J (2011) Counting the dead in Afghanistan. Science 331:1256–1260.
6. Haushofer J, Biletzki A, Kanwisher N (2010) Both sides retaliate in the Israeli–Palestinian conflict. Proc Natl Acad Sci USA 107:17927–17932.
7. Weidmann NB, Ward MD (2010) Predicting conflict in space and time. J Conflict Resolut 54:883–901.
8. Schutte S, Weidmann NB (2011) Diffusion patterns of violence in civil wars. Polit Geogr 30:143–152.
9. Zhukov YM (2012) Roads and the diffusion of insurgent violence: the logistics of conflict in Russia's North Caucasus. Polit Geogr 31:144–156.
10. Moeller J, Waagepetersen R (2004) Statistical Inference and Simulation for Spatial Point Processes (CRC Press, Boca Raton).
11. Zammit Mangion A, Yuan K, Kadirkamanathan V, Niranjan M, Sanguinetti G (2011) Online variational inference for state-space models with point-process observations. Neural Comput 23:1967–1999.
12. Zammit Mangion A, Sanguinetti G, Kadirkamanathan V (2012) Variational estimation in spatiotemporal systems from continuous and point-process observations. IEEE T Signal Process 60:3449–3459.
13. Wikle CK, Holan SH (2011) Polynomial nonlinear spatio-temporal integro-difference equation models. J Time Ser Anal 32:339–350.
14. Cressie NAC, Wikle CK (2011) Statistics for Spatio-temporal Data (Wiley, New Jersey).
15. Diggle P, Rowlingson B, Su T (2005) Point process methodology for online spatio-temporal disease surveillance. Environmetrics 16:423–434.
16. Brix A, Moeller J (2001) Space-time multi type log Gaussian Cox processes with a view to modelling weeds. Scand J Stat 28:471–488.
17. Rasmussen CE, Williams CKI (2006) Gaussian Processes for Machine Learning (The MIT Press, Cambridge, MA).
18. Kot M, Lewis MA, van den Driessche P (1996) Dispersal data and the spread of invading organisms. Ecology 77:2027–2042.
19. Wikle CK (2002) A kernel-based spectral model for non-Gaussian spatiotemporal processes. Stat Model 2:299–314.
20. Dewar M, Scerri K, Kadirkamanathan V (2009) Data-driven spatiotemporal modeling using the integro-difference equation. IEEE T Signal Process 57:83–91.
21. Scerri K, Dewar M, Kadirkamanathan V (2009) Estimation and model selection for an
IDE-based spatio-temporal model. IEEE T Signal Process 57:482492.
22. Rodrigues A, Diggle P (2010) A class of convolution-based models for spatiotemporal
processes with non-separable covariance structure. Scandinavian Journal of Statistics
37:553567.
23. Higdon D (1998) A process convolution approach to modelling temperatures in the
North Atlantic ocean (with Discussion). Environ Ecol Stat 5:173190.
24. Bracewell R (2000) in The Fourier Transform & its Applications (McGraw-Hill,
Singapore), 3rd Ed, p 122.
25. Freestone DR, et al. (2011) A data-driven framework for neural field modeling. Neuro-
Image 56:10431058.
26. Beal MJ (2003) Variational Algorithms for Approximate Bayesian Inference. PhD thesis
(Gatsby Computational Neuroscience Unit, University College London, United
Kingdom).
27. Smidl V, Quinn A (2005) The Variational Bayes Method in Signal Processing (Springer
Verlag, New York).
28. Jazwinski AH (1970) Stochastic Processes and Filtering Theory (Academic Press,
London).
29. Stroud JR, Mueller P, Sanso B (2001) Dynamic models for spatiotemporal data. J R Stat
Soc B Stat Met 63:673689.
30. Diggle P (1985) A kernel method for smoothing point process data. Appl Stat
34:138147.
6 of 6 www.pnas.org/cgi/doi/10.1073/pnas.1203177109 Zammit-Mangion et al.
Supporting Information
Zammit-Mangion et al. 10.1073/pnas.1203177109
SI Text
Nonparametric Estimation of the Stochastic Integro-Difference Equation (SIDE) from Point Process Observations. Standard nonparametric methods may be easily extended to estimate unknown quantities in the SIDE. First, for $\tau = \|s - r\|$, it can be shown that

$$\ln g_{k,k+1}(\tau) = (k_I * \ln g_{k,k})(\tau), \tag{S1}$$
where $*$ is the convolution operator. Nonparametric estimators for the pair auto-correlation function (PACF) and the pair cross-correlation function (PCCF) are well known (SI Text), suggesting that the kernel $k_I$ can be obtained through the deconvolution of Eq. S1. The problem may be seen as one of image restoration, where the task is to recover an original image $k_I(\cdot)$ from a degraded image $\ln g_{k,k+1}(\cdot)$, for which standard image processing techniques such as direct inverse filtering can be used. Further, one can show that

$$k_Q(\tau) = \ln g_{k+1,k+1}(\tau) - (k_I * k_I * \ln g_{k,k})(\tau). \tag{S2}$$
The proofs for Eqs. S1 and S2 are given below. For ease of exposition we consider a one-dimensional domain $s, r \in \mathbb{R}$, $\tau = |s - r|$, with the intensity defined as $\lambda_k(s) = \exp(z_k(s))$, where $z_k(s)$ is a homogeneous Gaussian process with zero mean and covariance function $\sigma_k^2\kappa_k(\cdot)$. The results hold for a homogeneous and isotropic field in any dimension.
Proof for Eq. S1: Because $z_k(s)$ has zero mean, the following relationships hold:

$$\mathbb{E}[z_k(s) + (k_I * z_k)(r)] = 0, \tag{S3}$$

$$\mathbb{E}[z_k(s)^2] = \sigma_k^2, \tag{S4}$$

$$\mathbb{E}[z_k(s)\,(k_I * z_k)(r)] = \sigma_k^2 (k_I * \kappa_k)(\tau), \tag{S5}$$

$$\mathbb{E}[(k_I * z_k)(s)^2] = \sigma_k^2 (k_I * k_I * \kappa_k)(0). \tag{S6}$$
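By Eq. 1 and the assumption that $e_k(s)$ is uncorrelated with $z_k(s)$, the intensity cross second moment is given by

$$\begin{aligned}
\rho^{(2)}_{k,k+1}(\tau) &= \mathbb{E}[\lambda_k(s)\,\lambda_{k+1}(r)] = \mathbb{E}[\exp(z_k(s) + z_{k+1}(r))] \\
&= \mathbb{E}[\exp(z_k(s) + (k_I * z_k)(r) + e_k(r))] \\
&= \exp\left(\frac{\sigma_k^2}{2} + \frac{\sigma_k^2 (k_I * k_I * \kappa_k)(0)}{2} + \sigma_k^2 (k_I * \kappa_k)(\tau) + \frac{k_Q(0)}{2}\right).
\end{aligned} \tag{S7}$$

Because $\rho^{(1)}_k = \mathbb{E}[\exp(z_k(s))]$, the quantities $\rho^{(1)}_k$ and $\rho^{(1)}_{k+1}$ are given (through Eqs. S4 and S6) as

$$\rho^{(1)}_k = \exp\left(\frac{\sigma_k^2}{2}\right), \tag{S8}$$

$$\rho^{(1)}_{k+1} = \exp\left(\frac{\sigma_k^2 (k_I * k_I * \kappa_k)(0)}{2} + \frac{k_Q(0)}{2}\right), \tag{S9}$$

so that the log PCCF, $\ln g_{k,k+1}(\tau) = \ln \rho^{(2)}_{k,k+1}(\tau) - \ln \rho^{(1)}_k - \ln \rho^{(1)}_{k+1}$, is given by

$$\ln g_{k,k+1}(\tau) = \sigma_k^2 (k_I * \kappa_k)(\tau). \tag{S10}$$

But by Eq. 4, $\ln g_{k,k}(\tau) = \sigma_k^2\kappa_k(\tau)$, which gives Eq. S1.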
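Proof for Eq. S2: To obtain an expression for $k_Q(\cdot)$, the PACF at subsequent time steps is considered. Again, considering the zero-mean stationarity of $z_k(s)$,

$$\rho^{(2)}_{k+1,k+1}(\tau) = \mathbb{E}\left[\exp\left((k_I * z_k)(s) + (k_I * z_k)(r) + e_k(s) + e_k(r)\right)\right], \tag{S11}$$

which, after some lengthy algebraic manipulation, can be shown to be

$$\rho^{(2)}_{k+1,k+1}(\tau) = \exp\left(k_Q(0) + k_Q(\tau) + \sigma_k^2 (k_I * k_I * \kappa_k)(0) + \sigma_k^2 (k_I * k_I * \kappa_k)(\tau)\right). \tag{S12}$$

Eq. S2 subsequently follows from Eqs. 4 and S9.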
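The step that carries Eqs. S7–S10 is the log-normal identity $\ln \mathbb{E}[e^{X+Y}] - \ln \mathbb{E}[e^X] - \ln \mathbb{E}[e^Y] = \mathrm{cov}(X, Y)$ for zero-mean, jointly Gaussian $X$ and $Y$. The following is a minimal Monte Carlo check of that identity using only the Python standard library. The variances and covariance are illustrative values standing in for $\sigma_k^2$ and $\sigma_k^2(k_I * \kappa_k)(\tau)$; they are not quantities from the study.

```python
import math
import random


def log_pair_correlation_mc(var_x, var_y, cov_xy, n=200_000, seed=1):
    """Monte Carlo estimate of ln(E[e^{X+Y}] / (E[e^X] E[e^Y])) for zero-mean jointly
    Gaussian (X, Y). The identity used in Eqs. S7-S10 says this equals cov_xy."""
    rng = random.Random(seed)
    # Construct Y = a X + c N with N independent, matching Var(Y) and Cov(X, Y).
    a = cov_xy / var_x
    c = math.sqrt(var_y - a * a * var_x)
    sum_ex = sum_ey = sum_exy = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, math.sqrt(var_x))
        y = a * x + c * rng.gauss(0.0, 1.0)
        ex, ey = math.exp(x), math.exp(y)
        sum_ex += ex
        sum_ey += ey
        sum_exy += ex * ey
    return math.log((sum_exy / n) / ((sum_ex / n) * (sum_ey / n)))


# Illustrative roles: var_x ~ sigma_k^2, var_y ~ Var z_{k+1}(r),
# cov_xy ~ sigma_k^2 (k_I * kappa_k)(tau).
estimate = log_pair_correlation_mc(var_x=0.5, var_y=0.6, cov_xy=0.3)
print(round(estimate, 2))  # close to 0.3, up to Monte Carlo error of order 0.01
```

Because the log PCCF of a log-Gaussian Cox process equals the covariance of the latent field, linear relations between latent covariances, such as convolution with $k_I$, carry over directly to $\ln g$.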
Nonparametric Estimation of First-Order and Second-Order Point Process Statistics. Denote the spatial point process at time $k$ as $P_k$. If $P_k$ is first-order stationary, then an estimator for $\rho^{(1)}_k$ is given as (1)

$$\hat{\rho}^{(1)}_k = \frac{N_k}{|\Omega|}, \tag{S13}$$

where $N_k$ is the cardinality of $P_k$. In some cases this assumption does not hold, and one may instead employ either standard linear regression methods to capture clear intensity trends (2) or a standard nonparametric kernel estimator (3)

$$\hat{\rho}^{(1)}_k(s) = \sum_{s_i \in P_k} \frac{k_b(\|s - s_i\|)}{c_{\Omega,b}(s_i)}. \tag{S14}$$

Here, $c_{\Omega,b}(s_i)$ is an edge-correction factor given as $c_{\Omega,b}(s_i) = \int_\Omega k_b(\|s - s_i\|)\,ds$, and $k_b(s)$ is the Epanechnikov kernel, which in one dimension is given as

$$k_b(s) = \frac{3}{4b}\left(1 - \frac{s^2}{b^2}\right)\mathbb{1}(|s| \le b). \tag{S15}$$
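As a one-dimensional sketch of Eqs. S14 and S15, the code below computes the edge-corrected kernel intensity estimate on an interval $\Omega = [lo, hi]$ and approximates $c_{\Omega,b}(s_i)$ with a midpoint rule. It assumes the kernel is supported on $|s| \le b$. The function names, the interval and the example events are all hypothetical.

```python
def epanechnikov(u, b):
    """One-dimensional Epanechnikov kernel k_b(u) (Eq. S15), supported on |u| <= b."""
    if abs(u) > b:
        return 0.0
    return 0.75 / b * (1.0 - (u / b) ** 2)


def edge_correction(s_i, b, lo, hi, n_grid=2000):
    """c_{Omega,b}(s_i): mass of the kernel centred at s_i inside [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n_grid
    return sum(epanechnikov(lo + (j + 0.5) * h - s_i, b) for j in range(n_grid)) * h


def intensity_estimate(s, events, corrections, b):
    """Edge-corrected kernel estimate of the first-order intensity at s (Eq. S14)."""
    return sum(epanechnikov(s - s_i, b) / c_i for s_i, c_i in zip(events, corrections))


# Hypothetical events on Omega = [0, 10], bandwidth b = 0.5.
lo, hi, b = 0.0, 10.0, 0.5
events = [0.1, 0.5, 2.0, 4.8, 9.95]
corrections = [edge_correction(s_i, b, lo, hi) for s_i in events]
rho_at_2 = intensity_estimate(2.0, events, corrections, b)
```

Each kernel is renormalized by its own mass inside $\Omega$, so the estimate integrates to the number of events over $\Omega$. Preserving this total is what the edge correction is for. Without the correction, events near the boundary would be under-weighted.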
A nonparametric estimator for the PACF is given by (2)

$$\hat{g}_{k,k}(\tau) = \frac{1}{2\pi\tau|\Omega|} \sum_{s_i, s_j \in P_k,\ s_i \neq s_j} \frac{k_b(\tau - \|s_i - s_j\|)}{\rho^{(1)}_k(s_i)\,\rho^{(1)}_k(s_j)\,w(s_i, s_j)}, \tag{S16}$$

where $w(s_i, s_j)$ is the fraction of the circumference of the circle (in two dimensions) with center $s_i$ and radius $\|s_i - s_j\|$ lying in $\Omega$. Similarly, an estimate of the PCCF is given by
$$\hat{g}_{k,k+1}(\tau) = \frac{1}{2\pi\tau|\Omega|} \sum_{s_i \in P_k} \sum_{s_j \in P_{k+1}} \frac{k_b(\tau - \|s_i - s_j\|)}{\rho^{(1)}_k(s_i)\,\rho^{(1)}_{k+1}(s_j)\,w(s_i, s_j)}. \tag{S17}$$
If the processes are also taken to be second-order stationary in time, the nonparametric estimates may be smoothed by averaging over all $K$ time steps, so that

$$\bar{g}_{k,k} = \frac{1}{K}\sum_{k=1}^{K} \hat{g}_{k,k}, \tag{S18}$$

$$\bar{g}_{k,k+1} = \frac{1}{K-1}\sum_{k=1}^{K-1} \hat{g}_{k,k+1}. \tag{S19}$$

The nonparametric estimators of Eqs. S16–S17 and the averaged estimates of Eqs. S18–S19 may be used to estimate the quantities in Eqs. S1 and S2. Note that if temporally averaged PACF/PCCFs are used, the inverse filter is given as

$$\hat{k}_I = \mathcal{F}^{-1}\left[\frac{\mathcal{F}(\ln \bar{g}_{k,k+1})}{\mathcal{F}(\ln \bar{g}_{k,k})}\right]. \tag{S20}$$
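Eq. S20 is a plain inverse filter. The sketch below applies it on a one-dimensional periodic grid, using a naive discrete Fourier transform from the Python standard library. It first builds $\ln g_{k,k+1}$ from a known $k_I$ and $\ln g_{k,k}$ via Eq. S1, then recovers $k_I$. The kernel $k_I$ is a one-dimensional analogue of the first case study below. The grid, the width of $\kappa_k$ and the periodic treatment of the lag axis are assumptions made for illustration. With noisy empirical PACF/PCCFs, dividing by small Fourier coefficients of $\ln \bar{g}_{k,k}$ amplifies noise, so some regularization may be needed in practice.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short 1-D profiles."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) for m in range(n)]


def idft(spectrum):
    """Inverse of dft(), keeping the real part."""
    n = len(spectrum)
    return [
        (sum(spectrum[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n)) / n).real
        for j in range(n)
    ]


def circular_convolve(a, b, dx):
    """Riemann-sum approximation of (a * b) on a periodic grid with spacing dx."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) * dx for i in range(n)]


def inverse_filter(ln_g_cross, ln_g_auto, dx):
    """Eq. S20: F^{-1}[F(ln g_{k,k+1}) / F(ln g_{k,k})], divided by dx to undo the
    dx factor carried by circular_convolve()."""
    ratio = [num / den for num, den in zip(dft(ln_g_cross), dft(ln_g_auto))]
    return [v / dx for v in idft(ratio)]


n, dx = 64, 0.25
lags = [min(j, n - j) * dx for j in range(n)]                    # |tau| on a periodic grid
k_I_true = [0.05 * math.exp(-t * t) for t in lags]
ln_g_auto = [math.exp(-t * t / (2 * 0.3 ** 2)) for t in lags]    # sigma_k^2 kappa_k, sigma_k^2 = 1
ln_g_cross = circular_convolve(k_I_true, ln_g_auto, dx)          # Eq. S1
k_I_hat = inverse_filter(ln_g_cross, ln_g_auto, dx)
```

With noise-free inputs the recovered kernel matches the true one up to rounding error. A naive DFT keeps the example dependency-free; an FFT library would be the natural choice on real grids.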
Case Studies: Nonparametric Estimation from Point Process Observations. High frequency (small-scale) spatial interactions. Here we consider the SIDE of Eq. 1 with $f_1(z_k(s)) = z_k(s)$, $\Delta_t = 1$, $k_I(\cdot) = 0.05\exp(-\|\cdot\|^2)$ and $k_Q(\cdot) = 0.8\exp(-\|\cdot\|^2/5)$ on a 36 × 36 domain $\Omega$, with $s = (s_1, s_2) \in \Omega$. Synthetic data were generated by discretizing Eq. 1 on a 50 × 50 grid and carrying out the recursion for $K = 100$ time points. This involved discretizing $k_Q(\cdot)$ to obtain a 2,500 × 2,500 covariance matrix, which was then sampled from. The initial field $z_0(s)$ was assumed to be drawn from the distribution of the disturbance. Point process observations were then generated from the exponentiated underlying field using the method of thinning (4). The sequence of time frames shown in Fig. S5 is representative of the whole set. Here the point process observations are superimposed on the instantiation of the spatiotemporal field, which in this case is known but not used in the analysis.
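The thinning step can be sketched as follows: Lewis-Shedler thinning of an inhomogeneous Poisson process, using only the Python standard library. The intensity function, its upper bound `lam_max` and the rectangular window are hypothetical inputs. In the case study the intensity would be the exponentiated simulated field.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw: the number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def thin_poisson_process(intensity, lam_max, width, height, rng):
    """Inhomogeneous Poisson process on [0, width] x [0, height] by thinning.

    Proposals come from a homogeneous process with rate lam_max, and each proposal is
    kept with probability intensity(s) / lam_max, so intensity(s) <= lam_max is required."""
    n = poisson_count(lam_max * width * height, rng)
    points = []
    for _ in range(n):
        s = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        if rng.random() * lam_max < intensity(s):
            points.append(s)
    return points


def example_intensity(s):
    """Hypothetical intensity with quadratic log-intensity, exp(1 - |s - (5, 5)|^2 / 8) <= e."""
    return math.exp(1.0 - ((s[0] - 5.0) ** 2 + (s[1] - 5.0) ** 2) / 8.0)


points = thin_poisson_process(example_intensity, math.e, 10.0, 10.0, random.Random(0))
```

For this example intensity, the expected number of points is its integral over the window, about 66.6. A tight bound `lam_max` keeps the rejection rate low; a loose bound is still correct, only slower.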
In the analysis stage, the intensities governing the point process at each time instant were first assumed to constitute a series of correlated log-Gaussian Cox processes (LGCPs) with first-order stationarity, the estimate of which is given through Eq. S13. Algorithm S1 was then carried out on the dataset to give nonparametric estimates of the mixing kernel $\hat{k}_I$ and the noise kernel $\hat{k}_Q$, using the exact inverse filter of Eq. S20. Fig. S6A shows that the estimates (red) agree closely with the true kernels (blue). The figure also compares the bandwidth of the estimated kernels with that of the true ones. It shows that the spatial dynamics are very localized (high frequency) when compared with the in-time correlations.
Low frequency (large-scale) spatial interactions. A second synthetic dataset was generated under the same conditions as above, but with a significantly wider mixing kernel characteristic of a diffusion process, $k_I(\cdot) = 0.01\exp(-\|\cdot\|^2/15)$. The data were subsequently analyzed with Algorithm S1. As seen in Fig. S6B, the kernel estimates agree well with the true kernels. Note that in this case the dynamics exhibit interactions over a wide range.
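Finite-Dimensional Reduction of the SIDE. For conciseness, define $\mathcal{A} z_k(s) = \int_\Omega k_I(s, r)\, z_k(r)\,dr$. We now employ a standard Galerkin-type finite-dimensional reduction on $z_k(s)$ by first expanding

$$z_k(s) \approx \sum_{i=1}^{n_z} \phi_{z,i}(s)\, x_{k,i} = \phi_z(s)^{\mathsf{T}} x_k, \qquad \phi_z(s) \in \mathbb{R}^{n_z},$$

to obtain

$$\phi_z(s)^{\mathsf{T}} x_{k+1} = \mathcal{A}\phi_z(s)^{\mathsf{T}} x_k + e_k(s), \tag{S21}$$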
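and then, projecting through the inner product with $\phi_z(s)$, $\langle \phi_z(s), \cdot \rangle$, we obtain

$$\langle \phi_z(s), \phi_z(s)^{\mathsf{T}} \rangle\, x_{k+1} = \langle \phi_z(s), \mathcal{A}\phi_z(s)^{\mathsf{T}} \rangle\, x_k + \langle \phi_z(s), e_k(s) \rangle. \tag{S22}$$

Let $\Psi_x = \langle \phi_z(s), \phi_z(s)^{\mathsf{T}} \rangle$ and $\Psi_{\mathcal{A}} = \langle \phi_z(s), \mathcal{A}\phi_z(s)^{\mathsf{T}} \rangle$. Then

$$x_{k+1} = \Psi_x^{-1}\Psi_{\mathcal{A}}\, x_k + \Psi_x^{-1}\langle \phi_z(s), e_k(s) \rangle. \tag{S23}$$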
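We next decompose the heterogeneous effects in $\mu_Q(s)$, $k_I(s, r)$ and $k_Q(s, r)$ using vectors of basis functions $\phi_{\mu_Q} \in \mathbb{R}^{n_{\mu_Q}}$, $\phi_{k_I} \in \mathbb{R}^{n_{k_I}}$ and $\phi_{k_Q} \in \mathbb{R}^{n_{k_Q}}$, respectively, to obtain

$$\mu_Q(s) = \phi_{\mu_Q}(s)^{\mathsf{T}}\vartheta, \tag{S24}$$

$$k_I(s, r) = \phi_{k_I}(s)^{\mathsf{T}}\,\Sigma_I\,\phi_{k_I}(r), \tag{S25}$$

$$k_Q(s, r) = \phi_{k_Q}(s)^{\mathsf{T}}\,\Sigma_Q\,\phi_{k_Q}(r), \tag{S26}$$

where $\vartheta \in \mathbb{R}^{n_{\mu_Q}}$, $\Sigma_I \in \mathbb{R}^{n_{k_I} \times n_{k_I}}$ and $\Sigma_Q \in \mathbb{R}^{n_{k_Q} \times n_{k_Q}}$ are unknown and, in addition to each $x_k$, need to be estimated from the data.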
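Under this decomposition, $\Psi_{\mathcal{A}}$ is given by

$$\begin{aligned}
\Psi_{\mathcal{A}} &= \langle \phi_z(s), \mathcal{A}\phi_z(s)^{\mathsf{T}} \rangle \qquad \text{(S27)}\\
&= \left\langle \phi_z(s), \int_\Omega \phi_{k_I}(s)^{\mathsf{T}}\,\Sigma_I\,\phi_{k_I}(r)\,\phi_z(r)^{\mathsf{T}}\,dr \right\rangle \qquad \text{(S28)}\\
&= \iint_\Omega \phi_z(s)\,\phi_{k_I}(s)^{\mathsf{T}}\,\Sigma_I\,\phi_{k_I}(r)\,\phi_z(r)^{\mathsf{T}}\,dr\,ds \qquad \text{(S29)}\\
&= \langle \phi_z(s), \phi_{k_I}(s)^{\mathsf{T}} \rangle\,\Sigma_I\,\langle \phi_{k_I}(s), \phi_z(s)^{\mathsf{T}} \rangle, \qquad \text{(S30)}
\end{aligned}$$

to give

$$\Psi_{\mathcal{A}} = \Psi_{z,k_I}\,\Sigma_I\,\Psi_{z,k_I}^{\mathsf{T}}, \tag{S31}$$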
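where $\Psi_{z,k_I} = \langle \phi_z(s), \phi_{k_I}(s)^{\mathsf{T}} \rangle$. Similarly, the mean of the added disturbance $w_k = \Psi_x^{-1}\langle \phi_z(s), e_k(s) \rangle$ is given by

$$\begin{aligned}
\mathbb{E}[w_k] &= \mathbb{E}\left[\Psi_x^{-1}\langle \phi_z(s), e_k(s) \rangle\right] \qquad \text{(S32)}\\
&= \Psi_x^{-1}\langle \phi_z(s), \phi_{\mu_Q}(s)^{\mathsf{T}} \rangle\,\vartheta, \qquad \text{(S33)}
\end{aligned}$$

to give

$$\mathbb{E}[w_k] = \Psi_x^{-1}\Psi_{z,\mu_Q}\,\vartheta. \tag{S34}$$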
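Further, the second moment is given by

$$\begin{aligned}
\mathbb{E}[w_k w_k^{\mathsf{T}}] &= \Psi_x^{-1}\,\mathbb{E}\left[\langle \phi_z(s), e_k(s) \rangle\langle e_k(r), \phi_z(r)^{\mathsf{T}} \rangle\right]\Psi_x^{-1} \\
&= \Psi_x^{-1}\left(\iint_\Omega \phi_z(s)\,\mathbb{E}[e_k(s)e_k(r)]\,\phi_z(r)^{\mathsf{T}}\,ds\,dr\right)\Psi_x^{-1} \\
&= \Psi_x^{-1}\left(\iint_\Omega \phi_z(s)\,\phi_{\mu_Q}(s)^{\mathsf{T}}\vartheta\vartheta^{\mathsf{T}}\phi_{\mu_Q}(r)\,\phi_z(r)^{\mathsf{T}}\,ds\,dr + \iint_\Omega \phi_z(s)\,\phi_{k_Q}(s)^{\mathsf{T}}\,\Sigma_Q\,\phi_{k_Q}(r)\,\phi_z(r)^{\mathsf{T}}\,ds\,dr\right)\Psi_x^{-1},
\end{aligned} \tag{S35}$$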
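so that

$$\operatorname{cov}(w_k) = \Psi_x^{-1}\langle \phi_z(s), \phi_{k_Q}(s)^{\mathsf{T}} \rangle\,\Sigma_Q\,\langle \phi_{k_Q}(s), \phi_z(s)^{\mathsf{T}} \rangle\,\Psi_x^{-1}, \tag{S36}$$

to give

$$\operatorname{cov}(w_k) = \Psi_x^{-1}\Psi_{z,k_Q}\,\Sigma_Q\,\Psi_{z,k_Q}^{\mathsf{T}}\Psi_x^{-1}, \tag{S37}$$

where the matrices $\Psi_{z,\mu_Q} = \langle \phi_z(s), \phi_{\mu_Q}(s)^{\mathsf{T}} \rangle$ and $\Psi_{z,k_Q} = \langle \phi_z(s), \phi_{k_Q}(s)^{\mathsf{T}} \rangle$.
As shown in the main text, considerable simplifications are obtained when choosing the basis $\phi_z(s) = \phi_{\mu_Q}(s) = \phi_{k_I}(s) = \phi_{k_Q}(s)$.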
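The projection in Eqs. S21–S31 can be checked numerically. The following one-dimensional sketch uses only the Python standard library. It assumes a hypothetical three-function Gaussian basis shared by $\phi_z$ and $\phi_{k_I}$, an illustrative diagonal $\Sigma_I$, and midpoint quadrature for the inner products. It forms $\Psi_x$, $\Psi_{\mathcal{A}} = \Psi_{z,k_I}\Sigma_I\Psi_{z,k_I}^{\mathsf{T}}$ and the transition matrix $\Psi_x^{-1}\Psi_{\mathcal{A}}$ of Eq. S23.

```python
import math


def gaussian_basis(centres, width):
    """Hypothetical 1-D Gaussian basis phi_i(s) = exp(-(s - c_i)^2 / (2 width^2))."""
    return [lambda s, c=c: math.exp(-((s - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def gram(f_list, g_list, grid, h):
    """Matrix of inner products <f_i, g_j> = int f_i(s) g_j(s) ds, by the midpoint rule."""
    return [[sum(f(s) * g(s) for s in grid) * h for g in g_list] for f in f_list]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def solve(a, b):
    """Solve a X = b for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


lo, hi, n_grid = 0.0, 10.0, 200
h = (hi - lo) / n_grid
grid = [lo + (j + 0.5) * h for j in range(n_grid)]
phi = gaussian_basis([2.0, 5.0, 8.0], 1.5)                         # shared basis: phi_z = phi_kI
sigma_I = [[0.05, 0.0, 0.0], [0.0, 0.10, 0.0], [0.0, 0.0, 0.05]]  # illustrative Sigma_I

psi_x = gram(phi, phi, grid, h)                                    # Psi_x
psi_z_kI = gram(phi, phi, grid, h)                                 # Psi_{z,kI}
psi_A = matmul(matmul(psi_z_kI, sigma_I), transpose(psi_z_kI))     # Eq. S31
transition = solve(psi_x, psi_A)                                   # Psi_x^{-1} Psi_A (Eq. S23)
```

With a shared basis, $\Psi_{z,k_I} = \Psi_x$, so the transition matrix reduces algebraically to $\Sigma_I\Psi_x$. This illustrates the kind of simplification a common basis affords. The factorized $\Psi_{\mathcal{A}}$ also agrees with the direct double integral of Eq. S29 on the same grid.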
Consistency in Event Rate Between the Afghan War Diary (AWD), Afghanistan NGO Safety Office (ANSO) Armed Opposition Groups (AOG) Reports, Armed Conflict Location and Event Dataset (ACLED), and Global Terrorism Database (GTD) Datasets. The AWD is a large but highly heterogeneous collection of event logs. However, government officials have been reluctant to confirm the dataset as factually accurate or as a complete portrayal of the Afghan war (5), which has led some to doubt its use as a reflection of the ground truth. As a result, substantial verification efforts were carried out by several researchers and organisations alike. The New York Times Company, for instance, cross-validated a number of logged entries with its own media reports*. O’Loughlin et al., on the other hand, compared the spatial and temporal distribution of violent events in the AWD with that in the ACLED, finding significant correlation in support of consistency (6).
Whilst it is beyond the scope of this article to verify the AWD as a representative dataset, here we show some of our own correlation tests between the AWD and ACLED (7), the ANSO Q4 reports on AOG-initiated attacks†, and the GTD‡. We hope these results provide further evidence of consistency across different databases. Throughout, our basic assumption is that the incidence rate in the AWD, both violent and nonviolent, should agree with that in other datasets, both geographically and temporally. We note that the corroborating datasets are not free from sampling biases themselves. For instance, NGO reports from extremely dangerous regions such as Helmand may underestimate the event counts simply because NGOs may have a smaller presence there for security reasons. We stress, therefore, that the level of corroboration we expect to find need not be perfect; we seek only a confirmation of general trends.
ACLED. The ACLED dataset is an extensive one, constructed from a variety of local and international media sources and NGO reports. The majority of logs in the ACLED dataset denote instances of political violence, where force is exercised by one or more actors (governments, militias, rebel groups) for a political end. As such, one would expect the incidence rate of ACLED events to agree with that of violent events in the AWD. Indeed, (6) found a significant correlation in the respective geographical distributions in 2008/2009. Here we replicated the study with one difference: we included all of the AWD in the analysis, not just violent events. A geographical assessment revealed significant correlations in both years, with a Pearson’s correlation coefficient of r = 0.88 in 2008 and r = 0.92 in 2009. See Fig. S7 A and B.
ANSO AOG reports. ANSO monitors the activity of AOG nationwide, and its Q4 reports provide detailed overviews of the frequency of AOG incidents on a provincial basis. A geographical assessment similar to that carried out with ACLED revealed significant (although weaker) correlations between AOG activity and event log incidence in the AWD. The Pearson’s correlation coefficient was r = 0.73 in 2008 and r = 0.59 in 2009. Omission of Helmand, a clear outlier in this analysis, improved r to 0.84 and 0.89, respectively. See Fig. S7 C and D.
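Each comparison above reduces to a Pearson correlation between paired counts, for example per-province totals in the AWD and in a reference dataset. The correlation may optionally be recomputed after dropping an outlying unit, as was done for Helmand. The following is a minimal standard-library sketch. The counts and unit labels in the example are invented for illustration and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_without(x, y, labels, drop):
    """Recompute r after omitting the unit labelled `drop` (e.g. an outlying province)."""
    kept = [(a, b) for a, b, lab in zip(x, y, labels) if lab != drop]
    return pearson_r([a for a, _ in kept], [b for _, b in kept])


# Invented per-unit counts, for illustration only.
labels = ["P1", "P2", "P3", "P4", "P5", "P6"]
awd_counts = [120, 340, 80, 2900, 60, 410]
reference_counts = [15, 40, 12, 60, 9, 52]
r_all = pearson_r(awd_counts, reference_counts)
r_without_p4 = pearson_r_without(awd_counts, reference_counts, labels, "P4")
```

In this invented example, the unit P4 is heavily over-represented in one source relative to the other. Removing it raises r, which mirrors the effect reported for Helmand.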
GTD. The GTD is a collection of international terrorist incidents, 1,783 of which are located in Afghanistan between 2004 and 2009. Most of these attacks are reported as being perpetrated by the Taliban. Listed targets include private citizens, the telecommunication infrastructure, government buildings and personnel, and NGOs. As with the ANSO reports, we therefore expected the prevalence of logs in this database to agree with the AWD. A correlation test across all years strongly confirmed this, with a coefficient of r = 0.93. See Fig. S7E. In the GTD, geographical location is coded through city names, which are generally inconsistent with those in standard shapefiles, so a comparison on a provincial level was omitted. String similarity checks using, for instance, the Levenshtein distance did not produce reliable merges in this case.
Basis Function Selection. For basis selection we adopt a frequency-based approach. First, the Fourier transform of the average PACF is found, from which a cutoff frequency of $\nu_c$ cycles/unit is selected. Second, localized reconstruction kernels are placed at regular intervals throughout the spatial domain. The lattice on which the functions are placed must be fine enough to avoid aliasing by satisfying Shannon’s sampling criterion. In particular, suppose the centers of the basis functions, denoted $\{\mu_i\}_{i=1}^{n}$, are set equal to the sequence of vectors describing a regular lattice of edge length $\Delta_s$ in $\Omega$. It is then required that $\Delta_s < 1/(2\nu_c)$, which we satisfy by setting

$$\Delta_s = \frac{1}{2\rho_0\nu_c}, \tag{S38}$$

where $\rho_0 > 1$ is an oversampling parameter. Lastly, the width of the local functions, which determines the range of frequencies they can represent, is set to cater for the frequency content of the spatial field at each time point.
To demonstrate the last step, consider the case when the basis functions are Gaussian radial basis functions (GRBFs) with functional form $\phi(s) = \exp(-s^2/(2\sigma_b^2))$. The Fourier transform of the GRBF is yet another GRBF (in the frequency domain), given as

$$\mathcal{F}\{\phi(s)\}(\nu) = \sqrt{2\pi\sigma_b^2}\,\exp(-2\pi^2\sigma_b^2\nu^2), \tag{S39}$$

so that the variances in the spatial and frequency domains are related through the mappings (8)

$$\sigma_\nu^2 = \frac{1}{4\pi^2\sigma_b^2}, \qquad \sigma_b^2 = \frac{1}{4\pi^2\sigma_\nu^2}. \tag{S40}$$
The range of frequencies that can be represented by the basis functions has to exceed that of the field for adequate reconstruction. To this end, Sanner and Slotine (8) suggest the following relationship:

$$\sigma_\nu = \frac{\nu_c}{\sqrt{2}}. \tag{S41}$$

Given $\sigma_\nu$, Eq. S40 can then be used to find the width of the desired GRBF in $\Omega$. By substitution of Eq. S41 in Eq. S40, $\sigma_b = (2\pi^2\nu_c^2)^{-1/2}$. The resulting basis is hence a set of GRBFs with parameter $\sigma_b$ placed in the spatial domain, centered on the coordinates $\{\mu_i\}_{i=1}^{n}$.

*http://www.nytimes.com/2010/07/26/world/26editors-note.html?_r=1
†Reports are freely available from the official ANSO website http://www.afgnso.org.
‡National Consortium for the Study of Terrorism and Responses to Terrorism (2011) Retrieved from http://www.start.umd.edu/gtd
Because GRBFs are not of compact support (and hence have a diminishing yet far-reaching effect), we instead use a compact radial basis function, termed the compact GRBF (CGRBF), of the form

$$\phi(s) = \begin{cases} \dfrac{(2\pi - \beta\|s\|)\left(1 + \frac{1}{2}\cos(\beta\|s\|)\right) + \frac{3}{2}\sin(\beta\|s\|)}{3\pi}, & \|s\| < 2\pi/\beta, \\[1ex] 0, & \text{otherwise}, \end{cases} \tag{S42}$$

for $\beta > 0$, and where $\|\cdot\|$ denotes the usual Euclidean distance on $\Omega$. The CGRBF closely resembles the usual GRBF with $\phi(s) = \exp(-\beta^2\|s\|^2/(2\pi))$; however, it is of compact support. For a given GRBF parameter $\sigma_b$, or a cutoff frequency $\nu_c$, the CGRBF parameter is then given by

$$\beta = \frac{\sqrt{\pi}}{\sigma_b} = \sqrt{2\pi^3\nu_c^2}. \tag{S43}$$
For the AWD, a cutoff frequency of $\nu_c = 0.2$ cycles/unit was selected from the average PACF shown in Fig. S8A, corresponding to the basis parameter $\beta \approx 0.9\sqrt{\pi}$. CGRBFs were then equally spaced within the entire domain with an oversampling parameter of $\rho_0 = 1.3$. A cross-section of the chosen basis function is shown in Fig. S8A.
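The chain from cutoff frequency to basis parameters (Eqs. S38 and S40–S43) can be reproduced in a few lines of Python. With $\nu_c = 0.2$ and an oversampling parameter of 1.3, it gives a spacing of about 1.92 and $\beta \approx 0.89\sqrt{\pi}$, consistent with the values quoted in the text. The function names are hypothetical. The sketch assumes the relations $\sigma_\nu = \nu_c/\sqrt{2}$ and $\beta = \sqrt{\pi}/\sigma_b$ for Eqs. S41 and S43.

```python
import math


def basis_parameters(nu_c, rho_0):
    """Lattice spacing (Eq. S38), GRBF width sigma_b (Eqs. S40-S41) and CGRBF
    parameter beta (Eq. S43) from a cutoff frequency nu_c and oversampling rho_0."""
    delta_s = 1.0 / (2.0 * rho_0 * nu_c)
    sigma_nu = nu_c / math.sqrt(2.0)
    sigma_b = 1.0 / (2.0 * math.pi * sigma_nu)   # from sigma_b^2 = 1 / (4 pi^2 sigma_nu^2)
    beta = math.sqrt(math.pi) / sigma_b
    return delta_s, sigma_b, beta


def cgrbf(dist, beta):
    """Compact GRBF of Eq. S42: equals 1 at dist = 0 and 0 for dist >= 2 pi / beta."""
    u = beta * dist
    if u >= 2.0 * math.pi:
        return 0.0
    return ((2.0 * math.pi - u) * (1.0 + 0.5 * math.cos(u)) + 1.5 * math.sin(u)) / (3.0 * math.pi)


delta_s, sigma_b, beta = basis_parameters(nu_c=0.2, rho_0=1.3)
print(round(delta_s, 2), round(beta / math.sqrt(math.pi), 2))  # 1.92 0.89
```

Over its support, the CGRBF stays within about 0.05 of the Gaussian $\exp(-\beta^2 d^2/(2\pi))$. It vanishes smoothly at $d = 2\pi/\beta$, which is what makes it usable as a compact stand-in for the GRBF.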
Initially, basis functions were placed on a 16 × 16 grid with an intercentre spacing of $\Delta_s = 1.9$, covering the whole of Afghanistan. Many of these basis functions were, however, considered redundant, since they represented areas with no logged events or very few events. To avoid problems of identifiability in these regions (see also the study in SI Text), a constant background intensity baseline $b_1$ was used to represent activity in these areas.
Each basis function was analyzed separately. A basis function was omitted if its center lay more than 0.4 spatial units outside Afghanistan, or if it had on average fewer than eight logged events per year within 1.3 standard deviations of its center (corresponding to a background event rate $b_1 = -3.5$). The final arrangement of the basis functions and the spatial distribution of all the logged events of the AWD in Afghanistan are shown in Fig. S8 B and C, respectively. Note that basis functions are omitted in quiet areas. To cater for the inclusion of the background event rate, we set the first covariate $d_1(s) = 1$, so that

$$\lambda_k(s) = \exp\left(b_1 + b_2 d_2(s) + b_3 d_3(s) + z_k(s)\right). \tag{S44}$$

As can be seen from Eq. S44, in the absence of other covariates, the intensity where there are no basis functions simply reduces to $\exp(b_1)$.
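The omission rule just described can be sketched as a predicate over candidate basis functions. This is a hypothetical helper, not the authors' code. The region test (`distance_outside`), the per-year event lists and the basis standard deviation are caller-supplied assumptions, and events are counted within a Euclidean radius of 1.3 standard deviations.

```python
import math


def keep_basis_function(centre, sigma, events_by_year, distance_outside,
                        max_outside=0.4, radius_in_sd=1.3, min_events_per_year=8.0):
    """Return False (omit) if the centre lies more than `max_outside` units outside the
    region, or if on average fewer than `min_events_per_year` events per year fall within
    `radius_in_sd` standard deviations of the centre; otherwise return True (keep).

    events_by_year:   mapping from year to a list of (x, y) event locations.
    distance_outside: function returning 0 inside the region and the distance to it otherwise."""
    if distance_outside(centre) > max_outside or not events_by_year:
        return False
    radius = radius_in_sd * sigma
    counts = [
        sum(1 for (x, y) in events if math.hypot(x - centre[0], y - centre[1]) <= radius)
        for events in events_by_year.values()
    ]
    return sum(counts) / len(counts) >= min_events_per_year


def distance_outside_square(p, lo=0.0, hi=10.0):
    """Hypothetical region: the square [lo, hi]^2; returns 0 for points inside it."""
    dx = max(lo - p[0], 0.0, p[0] - hi)
    dy = max(lo - p[1], 0.0, p[1] - hi)
    return math.hypot(dx, dy)


events_by_year = {2008: [(5.0, 5.2)] * 10, 2009: [(4.9, 5.0)] * 9}
keep_centre = keep_basis_function((5.0, 5.0), 1.1, events_by_year, distance_outside_square)
```

Functions rejected in this way leave the corresponding areas to the background term $\exp(b_1)$ in Eq. S44.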
Placing basis functions only in regions that are well represented by the data is not new in either spatial (9) or spatiotemporal (10) systems. Nevertheless, care should be exercised when extending the problem to an online setting. Basis function omission induces a strong prior on the model, and unexpected changes in unrepresented regions would not be detected with the current setup (11). A realistic trade-off between computational complexity and level of representation therefore needs to be found. We note that this issue is virtually nonexistent in offline studies (such as this one), where it is very common to employ a basis that is not amenable to substantial changes in temporal behavior (12, 13).
Controlling for Purely Spatial Variation. As is typical in spatiotemporal point process applications, we considered adding deterministic components to the intensity model to control for certain demographic and topographic features:
Population density. It is generally accepted in conflict studies that more populous regions witness, on average, more conflict events than rural regions (14). Our study confirmed this association by comparing extrapolations from the 1979 census and a precensus survey in 2003/2004 with the spatial intensity of the AWD; see Fig. S10A. The data, available from the Central Statistics Organization, http://cod.humanitarianresponse.info/country-region/afghanistan, are aggregated at the district level.
Distance to major city. As with population density, locations far away from any large concentrations of population are likely to witness fewer conflict events. Our study confirmed this association by comparing a digital map of distances to the 33 major cities in Afghanistan (as of 1994) with the intensity map; see Fig. S10B. Data for settlement locations are available from the Afghanistan Information Management Service, http://www.aims.org.af/services/mapping/shape_files/afghanistan/point/.
Elevation and terrain type. In (6) it was seen that most violent events occurred in flat terrain. This observation does not imply that flat terrain is more susceptible to conflict, and in fact we found no evidence that this is the case; see Fig. S10C. We also found no simple relationship between elevation and overall event intensity; see Fig. S10D. Topographic data were obtained from the GTOPO30 dataset (U.S. Geological Survey 2007, 30-arc-second resolution).
Distance to Pakistan border. It is well known that proximity to borders increases the prospect of conflict. Afghanistan may be no exception, since insurgents are known to use regions across the Pakistan border for refuge. However, an analysis showed no direct association between the distance to the Pakistan border and the conflict intensity map. The lack of association is most probably because high intensities in the central border area (e.g., Paktika, Nangarhar, and Kunar) are offset by the relatively quiet regions in the south near Nimroz and in the north in Badakhshan; see Fig. S10E.
As a result of this study, the intensity model was augmented so that $d(s) = (1, d_2(s), d_3(s))^{\mathsf{T}}$. Here the first element corresponds to a background intensity rate (established in SI Text), $d_2(s)$ is the population density, and $d_3(s)$ is the distance to the closest major city. These last two variables were deemed to have strong associations with the overall spatial intensity of the AWD.
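Variational Bayes (VB) Update Equations. By finding a lower bound on the marginal likelihood, it can be shown (19) that the required VB marginals are as follows. The unknowns are the states $X_K$, the parameters $\Theta = \{\vartheta, \Sigma_Q^{-1}\}$, and $b = (b_1, b_2, \ldots, b_d)$, with $d$ denoting the number of covariates:

$$\tilde{p}(X_K) \propto \exp\left(\mathbb{E}_{\tilde{p}(\Theta)\tilde{p}(b)}\left[\ln p(Y_K, X_K, \Theta, b)\right]\right), \tag{S45}$$

$$\tilde{p}(\vartheta) \propto \exp\left(\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(\Sigma_Q^{-1})\tilde{p}(b)}\left[\ln p(Y_K, X_K, \Theta, b)\right]\right), \tag{S46}$$

$$\tilde{p}(\Sigma_Q^{-1}) \propto \exp\left(\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(\vartheta)\tilde{p}(b)}\left[\ln p(Y_K, X_K, \Theta, b)\right]\right), \tag{S47}$$

$$\tilde{p}(b_i) \propto \exp\left(\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(\Theta)\tilde{p}(b_{\setminus i})}\left[\ln p(Y_K, X_K, \Theta, b)\right]\right), \qquad i = 1, \ldots, d, \tag{S48}$$

where $\setminus i$ denotes the set of variables without $i$, and $\mathbb{E}_{\tilde{p}(\cdot)}[\cdot]$ specifies the distribution relative to which we are taking expectations. Next, we formulate the algorithm used to infer the unknown quantities. Throughout, the notation $i|j$ refers to the estimate at time $i$ conditioned on data up to time $j$. For ease of exposition the model is rewritten as

$$x_{k+1} = x_k + \vartheta + \tilde{w}_k, \tag{S49}$$

where $\tilde{w}_k$ now has zero mean.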
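State inference. For the computation of Eq. S45 we employ an approximate variational Kalman smoother (15). Let $p(x_0) = \mathcal{N}(x_0; \mu_0, \Sigma_0)$, and denote the variational forward message as $\tilde{\alpha}(x_k) = \tilde{p}(x_k \mid y_{1:k})$. This latter quantity is further approximated through the Laplace method as

$$\tilde{\alpha}(x_k) \propto \int \tilde{\alpha}(x_{k-1})\exp\left(\mathbb{E}_{\tilde{p}(\Theta)\tilde{p}(b)}\left[\ln\left(p(x_k \mid x_{k-1}, \Theta)\,p(y_k \mid x_k, b)\right)\right]\right)dx_{k-1} \overset{\text{Laplace}}{\approx} \mathcal{N}(x_k; \hat{x}_{k|k}, \Sigma_{k|k}), \tag{S50}$$

where $y_k$ is conditionally independent of $\Theta$. Similarly, the backward message $\tilde{\beta}(x_k) = \tilde{p}(y_{k+1:K} \mid x_k)$ is given by

$$\tilde{\beta}(x_k) \propto \int \tilde{\beta}(x_{k+1})\exp\left(\mathbb{E}_{\tilde{p}(\Theta)\tilde{p}(b)}\left[\ln\left(p(x_{k+1} \mid x_k, \Theta)\,p(y_{k+1} \mid x_{k+1}, b)\right)\right]\right)dx_{k+1} \overset{\text{Laplace}}{\approx} \mathcal{N}(x_k; \hat{x}_{k|k+1:K}, \Sigma_{k|k+1:K}). \tag{S51}$$

The two messages are then combined to give the smoothed estimate:

$$\tilde{p}(x_k \mid y_{1:K}) \propto \tilde{p}(x_k \mid y_{1:k})\,\tilde{p}(y_{k+1:K} \mid x_k) = \tilde{\alpha}(x_k)\,\tilde{\beta}(x_k) \approx \mathcal{N}(x_k; \hat{x}_{k|K}, \Sigma_{k|K}). \tag{S52}$$

The resulting equations of the forward-backward smoother are quite involved and are given in Algorithm S2.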
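Escalation inference. Let the prior be $p(\vartheta) = \mathcal{N}(\vartheta; \hat{\vartheta}_p, \Sigma_{\vartheta,p})$. Then the posterior $\tilde{p}(\vartheta)$ in Eq. S46 is given by

$$\tilde{p}(\vartheta) \propto p(\vartheta)\exp\left(-\frac{1}{2}\,\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(\Sigma_Q^{-1})}\left[\sum_{k=0}^{K-1}(x_{k+1} - x_k - \vartheta)^{\mathsf{T}}\Sigma_Q^{-1}(x_{k+1} - x_k - \vartheta)\right]\right), \tag{S53}$$

so that $\tilde{p}(\vartheta) = \mathcal{N}(\vartheta; \hat{\vartheta}, \Sigma_\vartheta)$, where

$$\hat{\vartheta} = \Sigma_\vartheta\left(\Sigma_{\vartheta,p}^{-1}\hat{\vartheta}_p + \mathbb{E}_{\tilde{p}(\Sigma_Q^{-1})}[\Sigma_Q^{-1}]\sum_{k=0}^{K-1}\mathbb{E}_{\tilde{p}(X_K)}[x_{k+1} - x_k]\right), \tag{S54}$$

$$\Sigma_\vartheta = \left(\Sigma_{\vartheta,p}^{-1} + K\,\mathbb{E}_{\tilde{p}(\Sigma_Q^{-1})}[\Sigma_Q^{-1}]\right)^{-1}. \tag{S55}$$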
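Volatility inference. Let the prior be $p(\Sigma_Q^{-1}) = \mathrm{Wi}(\Sigma_Q^{-1}; V_p, d_p)$, where $\mathrm{Wi}(\Sigma_Q^{-1}; V, d)$ denotes a Wishart distribution with $V$ a positive-definite, symmetric scale matrix and $d$ degrees of freedom. The variational posterior of Eq. S47 is given by

$$\tilde{p}(\Sigma_Q^{-1}) \propto p(\Sigma_Q^{-1})\exp\left(\frac{K}{2}\ln|\Sigma_Q^{-1}| - \frac{1}{2}\operatorname{tr}\left(\Sigma_Q^{-1}\Upsilon\right)\right), \tag{S56}$$

where

$$\Upsilon = \sum_{k=1}^{K}\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(\vartheta)}\left[(x_k - x_{k-1} - \vartheta)(x_k - x_{k-1} - \vartheta)^{\mathsf{T}}\right]. \tag{S57}$$

It can then be easily shown that $\tilde{p}(\Sigma_Q^{-1}) = \mathrm{Wi}(\Sigma_Q^{-1}; \hat{V}, \hat{d})$, where

$$\hat{V} = \left(V_p^{-1} + \Upsilon\right)^{-1}, \tag{S58}$$

$$\hat{d} = d_p + K. \tag{S59}$$

Evaluation of $\Upsilon$ requires the cross-covariance matrix in addition to the usual posterior covariance matrices. The computation of the cross-covariance, which also requires Laplace approximations (see also ref. 16), is given in the last for loop of Algorithm S2.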
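The following scalar sketch makes the coupling between Eqs. S54–S55 and S56–S59 concrete. It uses one basis function, so $\Sigma_Q^{-1}$ is a scalar precision and the Wishart reduces to a gamma distribution with mean $dV$. As a simplifying assumption for illustration, it treats the latent increments $x_{k+1} - x_k$ as known. The expectations over $\tilde{p}(X_K)$, and the cross-covariance terms of Algorithm S2, therefore drop out, so this is not the full algorithm. The default prior values follow the AWD settings of Eqs. S64–S65.

```python
def vb_escalation_volatility(increments, theta_p=0.0, var_theta_p=1000.0,
                             d_p=1000.0, v_p=0.025, n_iter=50):
    """Scalar sketch of the coupled VB updates of Eqs. S54-S59.

    The latent increments x_{k+1} - x_k are treated as known (hypothetical
    simplification), so expectations over p~(X_K) reduce to plug-in values."""
    K = len(increments)
    total = sum(increments)
    e_prec = d_p * v_p                      # start from the prior mean of Sigma_Q^{-1}
    for _ in range(n_iter):
        var_theta = 1.0 / (1.0 / var_theta_p + K * e_prec)                       # Eq. S55
        theta_hat = var_theta * (theta_p / var_theta_p + e_prec * total)          # Eq. S54
        upsilon = sum((dx - theta_hat) ** 2 for dx in increments) + K * var_theta  # Eq. S57
        v_hat = 1.0 / (1.0 / v_p + upsilon)                                       # Eq. S58
        d_hat = d_p + K                                                           # Eq. S59
        e_prec = d_hat * v_hat                                                    # E[Sigma_Q^{-1}]
    return theta_hat, var_theta, d_hat, v_hat


prior_mean_precision = 1000.0 * 0.025   # d_p V_p with the settings of Eq. S65 (about 25)
theta_hat, var_theta, d_hat, v_hat = vb_escalation_volatility([0.3, 0.1] * 100, d_p=1.0, v_p=100.0)
```

In the example, the increments alternate between 0.3 and 0.1 under a weak precision prior. The drift estimate settles near 0.2, and the posterior mean precision $\hat{d}\hat{V}$ settles just under $1/0.01 = 100$. The uncertainty in $\vartheta$ enters $\Upsilon$ through the $K\Sigma_\vartheta$ term.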
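Regression parameters. Under VB we let $\tilde{p}(b) = \prod_{i=1}^{d}\tilde{p}(b_i)$. Let the prior be $p(b_i) = \mathcal{N}(b_i; \hat{b}_{i,p}, \sigma^2_{b_i,p})$. Then the variational posterior $\tilde{p}(b_i)$ of Eq. S48, under a Laplace approximation, is given by

$$\begin{aligned}
\tilde{p}(b_i) &\propto p(b_i)\prod_{k=1}^{K}\left[\prod_{s_j \in y_k}\exp\left(\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(b_{\setminus i})}\left[b^{\mathsf{T}}d(s_j) + \phi(s_j)^{\mathsf{T}}x_k\right]\right)\exp\left(-\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(b_{\setminus i})}\left[\int_\Omega \exp\left(b^{\mathsf{T}}d(s) + \phi(s)^{\mathsf{T}}x_k\right)ds\right]\right)\right] \\
&\overset{\text{Laplace}}{\approx} \mathcal{N}(b_i; \hat{b}_i, \sigma^2_{b_i}), \qquad i = 1, \ldots, d,
\end{aligned} \tag{S60}$$

where it can be easily shown that

$$\hat{b}_i = \hat{b}_{i,p} + \sigma^2_{b_i,p}\left(\sum_{k=1}^{K}\sum_{s_j \in y_k} d_i(s_j) - \sum_{k=1}^{K}\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(b_{\setminus i})}\left[\int_\Omega d_i(s)\exp\left(b^{\mathsf{T}}d(s) + \phi(s)^{\mathsf{T}}x_k\right)ds\right]\right), \tag{S61}$$

$$\sigma^2_{b_i} = \left(\sigma^{-2}_{b_i,p} + \sum_{k=1}^{K}\mathbb{E}_{\tilde{p}(X_K)\tilde{p}(b_{\setminus i})}\left[\int_\Omega d_i^2(s)\exp\left(b^{\mathsf{T}}d(s) + \phi(s)^{\mathsf{T}}x_k\right)ds\right]\right)^{-1}, \qquad i = 1, \ldots, d. \tag{S62}$$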
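The following scalar sketch illustrates the Laplace step behind Eqs. S60–S62. It finds the mode of the log posterior of one coefficient by damped Newton iterations. It then returns the inverse negative curvature at the mode as the variance, which has the same form as Eq. S62. For illustration, the sketch makes three assumptions. First, the latent field and the remaining coefficients are held fixed, so the expectations collapse to plug-in values. Second, the other log-intensity terms do not change with $k$. Third, $\int_\Omega$ is approximated by a sum over quadrature cells. All names and example numbers are hypothetical.

```python
import math


def laplace_update_b(event_covariates, covariate_grid, cell_area, K, b_prior, var_prior,
                     other_terms=None, n_newton=50):
    """Laplace approximation for a single log-linear coefficient b_i (sketch of Eqs. S60-S62).

    event_covariates: d_i(s_j) for every observed event, pooled over the K time steps.
    covariate_grid:   d_i(s) at quadrature nodes covering Omega, each of area cell_area.
    other_terms:      remaining log-intensity terms at each node (other covariates and the
                      latent field), held fixed and identical for every k; zero if omitted."""
    if other_terms is None:
        other_terms = [0.0] * len(covariate_grid)
    b = b_prior
    hess = -1.0 / var_prior
    for _ in range(n_newton):
        weights = [math.exp(b * d + o) for d, o in zip(covariate_grid, other_terms)]
        int_d = sum(d * w for d, w in zip(covariate_grid, weights)) * cell_area
        int_d2 = sum(d * d * w for d, w in zip(covariate_grid, weights)) * cell_area
        grad = -(b - b_prior) / var_prior + sum(event_covariates) - K * int_d
        hess = -1.0 / var_prior - K * int_d2
        step = grad / hess
        b -= max(-1.0, min(1.0, step))      # damped Newton step, clamped to +/-1
    return b, -1.0 / hess                   # mode and variance (Eq. S62 form)


grid = [1.0] * (18 * 18)   # hypothetical constant covariate d_1(s) = 1 on 18 x 18 unit cells
b_hat, var_b = laplace_update_b([1.0] * 4000, grid, 1.0, 300, b_prior=0.0, var_prior=10.0)
```

In this example (4000 events over $K = 300$ steps on 324 unit cells, vague prior), the mode sits close to $\ln(4000/(300 \cdot 324)) \approx -3.19$. Clamping the Newton step keeps the exponential from overflowing when the starting value lies far below the mode.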
Inference for the AWD was completed in less than an hour on a standard PC. This included the approximation of integrals within the optimization routines for variational-Laplace on a 100 × 100 grid, a relatively low tolerance value for terminating the optimization routine (0.1% change in sequential function evaluations), and six state-parameter iterations for convergence.
Configuration Notes. State inference: The initial state $\hat{x}_{0|0}$ was set by first carrying out nonparametric estimation of the field in the first week ($k = 1$) using conventional methods (17) and then regressing this onto the chosen basis using ordinary least squares. $\Sigma_{0|0}$ was set to $30I$. In Algorithm S2, the prior from a Kalman filter running in parallel and assuming point estimates was used as an initial condition in the first for loop. In the second for loop, the mean of the forward message was used as initialization. In both cases, gradient descent was halted after a change of less than 0.1% in sequential function evaluations (typically 20–30 function evaluations were required). The integrals in Algorithm S2 were approximated on a 100 × 100 discrete grid using numerical quadrature.
Parameter inference: The parameter priors were configured as follows (recall that $b_1$ was fixed a priori):

$$\hat{b}_{i,p} = 0, \quad \sigma^2_{b_i,p} = 10, \quad i = 2, 3, \tag{S63}$$

$$\hat{\vartheta}_p = 0, \quad \Sigma_{\vartheta,p} = 1000I, \tag{S64}$$

$$d_p = 1000, \quad V_p = 0.025I. \tag{S65}$$

The prior scale matrix $V_p$ was chosen such that the prior mean of $\Sigma_Q^{-1}$, $d_pV_p$, is $25I$. Here $\sigma^{-2} = 25$ is the squared reciprocal of the standard deviation $\sigma$ of the logged increments in 2006, the largest among the four years 2006–2009 for which homoskedasticity was met.
Stopping conditions: The VB algorithm was assumed to have converged when two conditions held. First, the change in $\hat{\vartheta}$ and $\hat{b}_i$, $i = 2, 3$, between subsequent iterations was less than 0.005. Second, all diagonal elements in $\mathbb{E}[\Sigma_Q^{-1}] = \hat{d}\hat{V}$ changed by less than 1%.
Case Study: VB Estimation from Point Process Observations. Simulation setup. Here we consider the SIDE of Eq. 1 with $f_1(z_k(s)) = z_k(s)$, an autoregressive parameter of 0.1, $\Delta_t = 1$, a known mixing kernel $k_I$, and $k_Q(s, r) = \sigma(s)\sigma(r)\exp(-\|s - r\|^2/3)$ on a domain $s = (s_1, s_2) \in \Omega = [0, 18] \times [0, 18]$. Spatially varying volatility is modeled through

$$\sigma(s) = 0.5 + 2\exp\left(-\frac{(s_1 - 5)^2}{6} - \frac{(s_2 - 2.5)^2}{12}\right) + \exp\left(-\frac{(s_1 - 10)^2}{6} - \frac{(s_2 - 13)^2}{12}\right),$$

and the nonstationary mean $\mu_Q(s)$ is itself generated from a Gaussian process (GP), $\mu_Q \sim \mathcal{GP}(0, k_{\mu_Q})$, where $k_{\mu_Q}(s, r) = 3.2\exp(-\|s - r\|^2/3)$. The intensity function was modeled as in Eq. S44 with $b_1 = -2$ ($d_1 = 1$) and $b_2 = b_3 = \cdots = 0$. This simulation configuration typically gives a total event count of the same order of magnitude as that present in the AWD.
Synthetic data were generated by discretizing the SIDE on a 25 × 25 grid and carrying out the recursion for $K = 300$ time points. Point process observations were once again simulated from the true intensity using the method of thinning (4), with linear interpolation used to evaluate the latent field in between grid points. The complete set of points is shown in Fig. S9A, where the heterogeneity in the dynamics of the governing intensity is immediately apparent. The simulated growth map and the volatility surface are shown in Fig. S9 B and C, respectively. For a low-rank representation, an 8 × 8 grid of CGRBFs of the form of Eq. S42 was placed regularly inside the domain and on the boundary, truncated where appropriate; see Fig. S9D. At this stage, basis function omission could be carried out to selectively remove functions in poorly represented regions. Here, however, they are kept to explicitly show problems of nonidentifiability (of the parameters) in these regions.
As assumed for the AWD, the background rate $b_1$ and the autoregressive parameter were considered known in this study. We then considered two cases. In (VB1), $\Sigma_Q^{-1}$ is set to the identity matrix, which seemed a reasonable fit on visualization of the true precision matrix. In (VB2), $\Sigma_Q^{-1}$ is assumed to be fully unknown. The aim is to show that estimating the spatially varying volatility may indeed lead to better a posteriori inference of the intensity function. In both cases, the latent field and the growth (through $\vartheta$) were considered unknown, the latter being equipped with a Gaussian prior with parameters

$$\hat{\vartheta}_p = 0, \quad \Sigma_{\vartheta,p} = I. \tag{S66}$$

In VB2 the precision matrix was assigned a Wishart prior with degrees of freedom and scale matrix given by

$$d_p = 10, \quad V_p = 0.2I. \tag{S67}$$
For the purpose of this study, both priors can be considered largely uninformative. The VB expectation maximization (VBEM) algorithm was run for 20 iterations, by which point it was deemed to have converged. In addition to the VBEM algorithm, we also implemented a standard kernel estimator (KE) of the intensity function. The KE uses homogeneous but anisotropic spatiotemporal kernels of the form

$$k(s_1, s_2, t) = \exp\left(-\frac{(s_1 - \xi_1)^2}{2\sigma_s^2} - \frac{(s_2 - \xi_2)^2}{2\sigma_s^2} - \frac{(t - \xi_3)^2}{2\sigma_t^2}\right),$$

centered at $(\xi_1, \xi_2, \xi_3)$, with $\sigma_s \in \{0.1, 0.2, 0.3, 0.5, 0.8\}$ and $\sigma_t \in \{1, 1.5, 2, 2.5\}$. Volumetric correction was carried out when $\xi_1 \pm 3\sigma_s$, $\xi_2 \pm 3\sigma_s$ and/or $\xi_3 \pm 3\sigma_t$ lay outside the spatiotemporal domain $\Omega_{ST} = [0, 18] \times [0, 18] \times [0, 300]$.
SI results. The estimates of $\mu_Q(s)$ and of the volatility map $\sigma(s)$ obtained by VB2 are shown in Fig. S9 E and F, respectively. They agree relatively well with Fig. S9 B and C. Crucially, regions of high volatility are detected independently of the overall growth rate in the region.
A comment is due on regions where the algorithm does not perform as well, in particular areas with a very low event count (compare with Fig. S9A). In these regions, quantifying parameters such as growth and volatility clearly becomes a futile task, even more so given the exponential form of the intensity function, under which low-intensity fields correspond to arbitrarily negative latent fields. Spatially selective priors could be introduced to remedy this problem. However, in line with the existing spatiotemporal literature (e.g., refs. 9 and 10), and as with the AWD, we have opted simply to concentrate modelling effort in well-represented areas. This approach also yields obvious computational savings.
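To make the low-count argument concrete, consider a single grid cell of area $A$ in which no events are observed and whose intensity is $e^{z}$ for a scalar latent value $z$ (a deliberate simplification of the full model, assumed here only for illustration). The Poisson log-likelihood of the empty cell and its slope are

```latex
\ell(z) = -A\,e^{z},
\qquad
\frac{\mathrm{d}\ell}{\mathrm{d}z} = -A\,e^{z} \;\longrightarrow\; 0 \quad \text{as } z \to -\infty,
\qquad
\ell(-10) - \ell(-5) = A\left(e^{-5} - e^{-10}\right) \approx 0.0067\,A .
```

The data therefore barely distinguish $z = -5$ from $z = -10$, and any growth or volatility attached to such a cell is correspondingly poorly determined.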
A natural question is whether estimating the spatially varying volatility increases the accuracy of the latent-field and growth variational densities obtained on convergence. To answer this question we computed (i) the mean square error (MSE) between the true intensity and the a-posteriori median intensity, given by
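$$\mathrm{MSE}_\lambda = \frac{1}{K}\sum_{k=1}^{K}\left[\frac{1}{J^2}\sum_{s \in \mathcal{O}_J}\left(\lambda^*_k(s) - \exp\left(\phi(s)^\top \hat{x}_{k|K}\right)\right)^2\right], \qquad \text{[S68]}$$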
where $^*$ denotes the true value, $J^2$ is the number of spatial points over which the error is computed (10,000), and $\mathcal{O}_J$ is a correspondingly gridded spatial domain with side length $J$; (ii) the MSE between the exponentiated growth field and the a-posteriori median exponentiated growth (the exponential is applied here so as not to penalize excessively negative growth rates in regions with scarce events),
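$$\mathrm{MSE}_{\vartheta_Q} = \frac{1}{J^2}\sum_{s \in \mathcal{O}_J}\left(\vartheta^*_Q(s) - \exp\left(\phi(s)^\top \hat{\vartheta}\right)\right)^2; \qquad \text{[S69]}$$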
and finally (iii) the bias in the estimated probabilities of the intensity function, computed as

$$\mathrm{bias}_q = q - \frac{1}{KJ^2}\sum_{k=1}^{K}\sum_{s \in \mathcal{O}_J}\mathbb{I}\left(z^*_k(s) \le z^q_k(s)\right), \qquad \text{[S70]}$$
where $\mathbb{I}$ is the indicator function, $q$ is a quantile level, and $z^q_k(s)$ is defined by $\tilde{p}\left(\phi(s)^\top x_k \le z^q_k(s)\right) = q$, so that for each $k, s$, $z^q_k(s)$ denotes the $q$th quantile as supplied by the variational density (for complete details, refer to ref. 20). For a comparison between VB1 and VB2, we use the mean of $\mathrm{bias}_q$ evaluated at $q \in \{0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99\}$.
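The three diagnostics (Eqs. S68 to S70) can be computed from gridded arrays as in the NumPy sketch below. It rests on assumptions that are ours rather than the source's: the variational marginal of $\phi(s)^\top x_k$ at each grid point is taken to be Gaussian with mean `z_mean` and standard deviation `z_sd` (for example $\sqrt{\phi(s)^\top \Sigma_{k|K}\,\phi(s)}$), so that the posterior median intensity is the exponential of the mean and quantiles follow from the normal inverse CDF; and the true growth is passed in on the exponentiated scale, as the text describes. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

Q_LEVELS = (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)


def mse_intensity(lam_true, phi, x_hat):
    """Eq. S68. lam_true: (K, J*J) true intensity; phi: (J*J, n) basis matrix; x_hat: (K, n) means."""
    lam_median = np.exp(x_hat @ phi.T)    # exp is monotone, so exp(Gaussian mean) is the median
    return float(np.mean((lam_true - lam_median) ** 2))   # same J*J points per frame -> mean over k and s


def mse_growth(exp_growth_true, phi, theta_hat):
    """Eq. S69. exp_growth_true: (J*J,) true growth on the exponentiated scale; theta_hat: (n,)."""
    return float(np.mean((exp_growth_true - np.exp(phi @ theta_hat)) ** 2))


def bias_q(z_true, z_mean, z_sd, q):
    """Eq. S70 for one quantile level q; arrays of shape (K, J*J)."""
    z_q = z_mean + z_sd * NormalDist().inv_cdf(q)   # q-th quantile of each Gaussian marginal
    return q - float(np.mean(z_true <= z_q))         # sign convention as written in Eq. S70


def mean_bias(z_true, z_mean, z_sd, levels=Q_LEVELS):
    """Summary used to compare VB1 and VB2: bias_q averaged over the listed levels."""
    return float(np.mean([bias_q(z_true, z_mean, z_sd, q) for q in levels]))
```

For a well-calibrated variational density, the true latent values fall below the $q$th quantile a fraction $q$ of the time, so every $\mathrm{bias}_q$ is close to zero.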
Results of the analysis are shown in Table S1, where it is immediately evident that estimating the precision matrix considerably reduces the overall intensity MSE, the growth MSE, and the bias. Further, a frame-by-frame inspection showed that in 93% of frames the intensity MSE given by VB2 was lower than that given by VB1, firmly establishing the importance of correct variance estimation in the methodology. Finally, the nonparametric estimator gives reasonable estimates (Table S1 shows only the lowest MSE over all attempted kernels, obtained with $\sigma_s = 0.2$ and $\sigma_t = 1$). However, although the MSE could possibly be reduced further using adaptive methods (18), the key limitation of the KE method remains its inability (i) to capture uncertainty in a systematic way and (ii) to provide a mechanistic description of the underlying process, both limitations that modern spatiotemporal methods such as the one presented here overcome.
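As a worked reading of Table S1, the relative reductions implied by the tabulated values are as follows (plain arithmetic on the reported numbers; no additional results):

```latex
\underbrace{\frac{1.939 - 1.699}{1.939} \approx 12.4\%}_{\text{intensity MSE, VB2 vs. VB1}},
\quad
\underbrace{\frac{0.176 - 0.161}{0.176} \approx 8.5\%}_{\text{growth MSE, VB2 vs. VB1}},
\quad
\underbrace{\frac{0.099 - 0.063}{0.099} \approx 36.4\%}_{\text{mean bias, VB2 vs. VB1}},
\quad
\underbrace{\frac{2.324 - 1.699}{2.324} \approx 26.9\%}_{\text{intensity MSE, VB2 vs. KE}}
```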
1. Stoyan D, Stoyan H (1994) in Fractals, Random Shapes, and Point Fields: Methods of Geometrical Statistics (Wiley, New York), pp 275–305.
2. Brix A, Moeller J (2001) Space-time multi type log Gaussian Cox processes with a view to modelling weeds. Scand Stat Theory Appl 28:471–488.
3. Moeller J, Waagepetersen R (2004) in Statistical Inference and Simulation for Spatial Point Processes (CRC Press, Boca Raton), pp 29–55.
4. Lewis PAW, Shedler GS (1979) Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly 26:403–413.
5. Shachtman N (2010) What I saw at Moba Khan: the military reports highlighted by WikiLeaks don't provide a full picture of the war. Wall Street Journal, http://online.wsj.com/article/SB10001424052748703977004575393523349648264.html [online: last accessed 23 June 2012].
6. O'Loughlin J, Witmer FDW, Linke AM, Thorwardson N (2010) Peering into the fog of war: the geography of the WikiLeaks Afghanistan war logs, 2004–2009. Eurasian Geogr Econ 51:472–495.
7. Raleigh C, Linke A, Hegre H, Karlsen J (2010) Introducing ACLED: an armed conflict location and event dataset. J Peace Res 47:651–660.
8. Sanner R, Slotine J (1992) Gaussian networks for direct adaptive control. IEEE Trans Neural Netw 3:837–863.
9. Crainiceanu CM, Diggle P, Rowlingson B (2008) Bivariate binomial spatial modeling of Loa loa prevalence in tropical Africa. J Amer Statist Assoc 103:21–37.
10. Stroud JR, Mueller P, Sanso B (2001) Dynamic models for spatiotemporal data. J R Stat Soc Series B Stat Methodol 63:673–689.
11. Rodrigues A, Diggle P (2010) A class of convolution-based models for spatiotemporal processes with nonseparable covariance structure. Scand Stat Theory Appl 37:553–567.
12. Wikle CK, Cressie NAC (1999) A dimension-reduced approach to space-time Kalman filtering. Biometrika 86:815–829.
13. Berliner LM, Wikle CK, Cressie NAC (2000) Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. Journal of Climate 13:3953–3968.
14. Weidmann NB, Ward MD (2010) Predicting conflict in space and time. J Conflict Resolut 54:883–901.
15. Beal MJ (2003) in Variational Algorithms for Approximate Bayesian Inference. Ph.D. thesis (Gatsby Computational Neuroscience Unit, University College London, United Kingdom), pp 159–205.
16. Zammit Mangion A, Yuan K, Kadirkamanathan V, Niranjan M, Sanguinetti G (2011) Online variational inference for state-space models with point-process observations. Neural Comput 23:1967–1999.
17. Diggle P (1985) A kernel method for smoothing point process data. Appl Stat 34:138–147.
18. Diggle P, Rowlingson B, Su T (2005) Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics 16:423–434.
19. Smidl V, Quinn A (2005) The Variational Bayes Method in Signal Processing (Springer Verlag, New York).
20. Taylor BM, Diggle PJ (2012) INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. http://arxiv.org/pdf/1202.1738v2.pdf [online: last accessed 12 April 2012].
Fig. S1. (A) Average log PACF, $\ln g_{k,k}$, and average log PCCF, $\ln g_{k,k+1}$. (B) $\hat{k}_I$ as computed from Eq. S20.
Fig. S2. Estimated mean intensity $\mathbb{E}(\lambda_k(s))$ in the first week of the indicated month and year.
Fig. S3. Provincial map of Afghanistan (accurate as of 2010).
Fig. S4. Normalized histograms of log AOG activity count in 2010 per province as obtained from MC simulations, shown together with true growth
(circle, blue) and sample median (circle, red). The two stems overlap considerably in Baghlan, Kunar, Ghor, Nangarhar, and Khost.
Fig. S5. Three time frames from a single realization of a spatiotemporal point process whose intensity function exponentiates a field evolving according to the SIDE of Eq. S1, with $k_I(\mathbf{r}) = 0.05\exp(-\|\mathbf{r}\|^2)$ and $k_Q(\mathbf{r}) = 0.8\exp(-\|\mathbf{r}\|^2/5)$.
Fig. S6. Line plots: true (solid) and nonparametric (dashed) estimates of $k_I$ and $k_Q$ from point process observations with (A) $k_I(\mathbf{r}) = 0.05\exp(-\|\mathbf{r}\|^2)$, $k_Q(\mathbf{r}) = 0.8\exp(-\|\mathbf{r}\|^2/5)$ and (B) $k_I(\mathbf{r}) = 0.01\exp(-\|\mathbf{r}\|^2/15)$, $k_Q(\mathbf{r}) = 0.8\exp(-\|\mathbf{r}\|^2/5)$. Upper surface plots: spatial frequency response of the true kernels. Lower surface plots: spatial frequency response of the estimated kernels.
Fig. S7. Correlation between the event incidence in the AWD and other datasets. (A) ACLED 2008 on a province-by-province basis. (B) ACLED 2009 on a province-by-province basis. (C) ANSO AOG-initiated attacks 2008 on a province-by-province basis. (D) ANSO AOG-initiated attacks 2009 on a province-by-province basis. (E) GTD 2004–2009 on a year-by-year basis.
Fig. S8. (A) Average log PACF, $\ln g_{k,k}(r)$, as a function of radial distance $r$ between 2004 and 2009, together with a cross-section of the isotropic basis function employed in this study. Here one unit in $r$ corresponds to approximately 0.4 degrees (latitude/longitude). (B) Spatial location of all logged events between 2004 and 2009. Of the roughly 77,000 logs constituting the AWD, the 75,676 located within Afghanistan's borders were considered in the analysis. (C) Basis function placement in the spatial domain, with the red contours denoting the $1\sigma$ mark. Functions were omitted in regions within the country (such as the extreme northeast and southwest) that contain few, sparse events; these events were instead captured with a background activity baseline.
Fig. S9. (A) Superposition of the first 50 spatial point patterns in the simulation study. (B) True growth function $\vartheta_Q(s)$. (C) True variance map $\sigma(s)^2$. (D) Basis function placement in the simulation study, with the red contours denoting the $1\sigma$ mark. (E) Mean a-posteriori estimate of $\vartheta_Q(s)$. (F) Mean a-posteriori estimate of $\sigma(s)^2$. Note how mismatches occur in regions of very low event count, as expected.
Fig. S10. Spatial maps (left) and empirical relationships (right) between each independent variable and the log spatial intensity of the AWD (solid line), together with the $1\sigma$ interval (red dashed lines) and the global mean (dashed line). (A) Population density. (B) Distance to closest major city. (C) Terrain slope. (D) Elevation. (E) Distance to Pakistan border. All studies were carried out on a 200 × 200 spatial grid.
Movie S1. A-posteriori intensity of the AWD between 2004 and 2009 at regular intervals of one week. The animation shows strong evidence of a consistent trend of increasing activity in Afghanistan, with particular growth in the southern provinces from 2006 onwards (MPG; 13.7 MB).
Movie S2. Kernel intensity estimate of the AWD between 2004 and 2009, shown at regular intervals of one week. For this study, the homogeneous kernel chosen was of the form $\exp\left(-s_1^2/0.5 - s_2^2/0.5 - t^2/2\right)$ (MPG; 14.7 MB).
Table S1. Results of the simulation study of SI Text S10

Method | $\mathrm{MSE}_\lambda$ | $\mathrm{MSE}_{\vartheta_Q}$ | mean($\mathrm{bias}_q$)
VB1 | 1.939 | 0.176 | 0.099
VB2 | 1.699 | 0.161 | 0.063
KE | 2.324 | N.A. | N.A.
Algorithm S1: Analysis for dynamic, homogeneous, isotropic spatiotemporal point processes
1. Estimate $\{\lambda_k(s)\}_k$ using Eq. S13 for stationary systems, or simple regression where clear trends (linear or otherwise) are evident.
2. Estimate $\{\hat{g}_{k,k}, \hat{g}_{k,k+1}\}_k$ from Eq. S16 and Eq. S17, respectively.
3. Estimate $\hat{k}_I$ from Eq. S20 using $g_{k,k}$ and $g_{k,k+1}$ from Eqs. S18–S19.
4. Estimate $\hat{k}_Q$ from Eq. S2 using $\hat{k}_I$ and $g_{k,k}$ from Eq. S18.
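Algorithm S2: VB-Laplace smoother for the AWD model. (Note: Integrations are carried out by numerical quadrature. A time interval $\Delta t = 1$ is assumed throughout. Expectations are taken with respect to the relevant distributions.)
Input: Data set $Y_K$, parameters $b$, $\mu_0$, $\Sigma_0$, and parameter distributions $\tilde{p}(\vartheta)$, $\tilde{p}(\Sigma_Q^{-1}) \equiv \tilde{p}(Q)$.
Forward message
Set $\hat{x}_{0|0} = \mu_0$ and $\Sigma_{0|0} = \Sigma_0$.
for $k = 1$ to $K$
  $\Lambda_{k-1} = \left(\Sigma^{-1}_{k-1|k-1} + \mathbb{E}(Q)\right)^{-1}$
  $\tilde{\Sigma}_k = \left(\mathbb{E}(Q) - \mathbb{E}(Q)\,\Lambda_{k-1}\,\mathbb{E}(Q)\right)^{-1}$
  $\tilde{x}_k = \tilde{\Sigma}_k\left[\mathbb{E}(Q)\,\Lambda_{k-1}\left(\Sigma^{-1}_{k-1|k-1}\hat{x}_{k-1|k-1} - \mathbb{E}(Q)\,\mathbb{E}(\vartheta)\right) + \mathbb{E}(Q)\,\mathbb{E}(\vartheta)\right]$
  $\hat{x}_{k|k} = \arg\max_{x_k}\Big\{\sum_{s_j \in y_k}\left[\mathbb{E}(b)^\top d(s_j) + \phi(s_j)^\top x_k\right] - \int_{\mathcal{O}}\mathbb{E}\left[\exp\left(b^\top d(s)\right)\right]\exp\left(\phi(s)^\top x_k\right)\mathrm{d}s - \frac{1}{2}\left(x_k - \tilde{x}_k\right)^\top\tilde{\Sigma}_k^{-1}\left(x_k - \tilde{x}_k\right)\Big\}$
  $\Sigma_{k|k} = \left(\tilde{\Sigma}_k^{-1} + \int_{\mathcal{O}}\phi(s)\phi(s)^\top\exp\left(\phi(s)^\top\hat{x}_{k|k}\right)\mathbb{E}\left[\exp\left(b^\top d(s)\right)\right]\mathrm{d}s\right)^{-1}$
end for
Backward message
Set $\Sigma^{-1}_{K|K+1:K} = 0$ (ignore estimate of end condition).
for $k = K-1$ down to $0$
  $x'_{k+1} = \arg\max_{x_{k+1}}\Big\{\sum_{s_j \in y_{k+1}}\left[\mathbb{E}(b)^\top d(s_j) + \phi(s_j)^\top x_{k+1}\right] - \int_{\mathcal{O}}\mathbb{E}\left[\exp\left(b^\top d(s)\right)\right]\exp\left(\phi(s)^\top x_{k+1}\right)\mathrm{d}s - \frac{1}{2}\left(x_{k+1} - \hat{x}_{k+1|k+2:K}\right)^\top\Sigma^{-1}_{k+1|k+2:K}\left(x_{k+1} - \hat{x}_{k+1|k+2:K}\right)\Big\}$
  $\Sigma'_{k+1} = \left(\Sigma^{-1}_{k+1|k+2:K} + \int_{\mathcal{O}}\phi(s)\phi(s)^\top\exp\left(\phi(s)^\top x'_{k+1}\right)\mathbb{E}\left[\exp\left(b^\top d(s)\right)\right]\mathrm{d}s\right)^{-1}$
  $\Sigma_{k|k+1:K} = \left(\mathbb{E}(Q) - \mathbb{E}(Q)\left(\Sigma'^{-1}_{k+1} + \mathbb{E}(Q)\right)^{-1}\mathbb{E}(Q)\right)^{-1}$
  $\hat{x}_{k|k+1:K} = \Sigma_{k|k+1:K}\left[-\mathbb{E}(Q)\,\mathbb{E}(\vartheta) + \mathbb{E}(Q)\left(\Sigma'^{-1}_{k+1} + \mathbb{E}(Q)\right)^{-1}\left(\Sigma'^{-1}_{k+1}x'_{k+1} + \mathbb{E}(Q)\,\mathbb{E}(\vartheta)\right)\right]$
end for
Smoothed estimate
for $k = 0$ to $K$
  $\Sigma_{k|K} = \left(\Sigma^{-1}_{k|k} + \Sigma^{-1}_{k|k+1:K}\right)^{-1}$
  $\hat{x}_{k|K} = \Sigma_{k|K}\left(\Sigma^{-1}_{k|k}\hat{x}_{k|k} + \Sigma^{-1}_{k|k+1:K}\hat{x}_{k|k+1:K}\right)$
end for
Computation of cross-covariance $\{M_k\}_{k=1}^{K}$
for $k = K$ down to $1$
  $M_{k|K} = \Lambda_{k-1}\,\mathbb{E}(Q)\left(\Sigma^{-1}_{k|k+1:K} + \mathbb{E}(Q) + \int_{\mathcal{O}}\phi(s)\phi(s)^\top\exp\left(\phi(s)^\top\hat{x}_{k|K}\right)\mathbb{E}\left[\exp\left(b^\top d(s)\right)\right]\mathrm{d}s - \mathbb{E}(Q)\,\Lambda_{k-1}\,\mathbb{E}(Q)\right)^{-1}$
end for
Output: $\{\hat{x}_{k|K}, \Sigma_{k|K}\}_{k=0}^{K}$, $\{M_{k|K}\}_{k=1}^{K}$.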
Other Supporting Information
Dataset S1 (CSV)