
Advances in Water Resources 77 (2015) 69–81

Direct forecasting of subsurface flow response from non-linear dynamic data by linear least-squares in canonical functional principal component space

Aaditya Satija, Jef Caers
Energy Resources Engineering, Stanford University, USA

Article history:
Received 24 August 2014
Received in revised form 2 January 2015
Accepted 4 January 2015
Available online 12 January 2015
Keywords:
Inverse modeling
Functional data analysis
Canonical correlation analysis
Groundwater
Reservoir modeling

Abstract
Inverse modeling is widely used to assist with forecasting problems in the subsurface. However, full inverse modeling can be time-consuming, requiring iteration over a high-dimensional parameter space with computationally expensive forward models and complex spatial priors. In this paper, we investigate a prediction-focused approach (PFA) that aims at building a statistical relationship between data variables and forecast variables, avoiding the inversion of model parameters altogether. The statistical relationship is built by first applying the forward model related to the data variables and the forward model related to the prediction variables on a limited set of spatial prior model realizations, typically generated through geostatistical methods. The relationship observed between data and prediction is highly non-linear for many forecasting problems in the subsurface. In this paper we propose a Canonical Functional Component Analysis (CFCA) to map the data and forecast variables into a low-dimensional space where, if successful, the relationship is linear. CFCA consists of (1) functional principal component analysis (FPCA) for dimension reduction of time-series data and (2) canonical correlation analysis (CCA), the latter aiming to establish a linear relationship between data and forecast components. If such mapping is successful, we illustrate with several cases that (1) simple regression techniques within a multi-Gaussian framework can be used to directly quantify uncertainty on the forecast without any model inversion, and that (2) such uncertainty is a good approximation of the uncertainty obtained from full posterior sampling with rejection sampling.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Inverse modeling with dynamic data such as piezometric head (pressure), tracer or transport data has been an active area of research, as evidenced by several review papers in both oil/gas and groundwater research [1–4]. Essentially two main factors make this a difficult problem to solve. First, the relationship between model parameters and the data is complex and non-linear, and often requires the numerical solution of partial differential equations that are CPU-demanding, limiting in practice the number of forward model runs that can be done. Secondly, the ill-posed nature of the problem, due to non-linearity and incompleteness of the dynamic data, requires the formulation of a 3D spatial prior model based on other knowledge of the subsurface geological heterogeneity. The latter may include elements of structural uncertainty (layering and faults), lithofacies and petrophysical properties (porosity, hydraulic conductivity) from geological and geophysical data

sources. Ignoring such prior information may lead to inverse solutions that are geologically unrealistic and have limited forecasting ability. In addition, modelers need to be mindful of generating inverse solutions that span a realistic range of uncertainty [5,6]. The latter requires formulating the inverse problem as a sampling problem by adhering to certain theories that combine likelihood and prior, such as Bayes' theory or, perhaps more generally, Tarantola's conjunction of information models.
Regardless of the methodologies being used, the end-goal of inverse modeling is not necessarily the inverted subsurface models or parameters themselves. Instead, one often requires performing forecasts using the inverse solutions. Such forecasts could be the future evolution of an existing plume of contaminants or the arrival time at certain wells. The forecasting model itself may require a forward numerical model different from the forward model establishing the data variables. In a recent contribution [7], it was recognized that time-consuming full-fledged inverse modeling may not always be required. In their work, a direct, statistical relationship was built between the data variables and the forecast

(prediction) variables using non-linear principal component analysis (NLPCA). The latter required the generation of only a few subsurface models and accompanying forward model runs (200 in their case) and did not require any iterative inverse modeling. For the case studied, their method provided the same posterior uncertainty in the forecast as the full sampling technique (rejection sampling). However, their approach, relying on non-linear principal component analysis, required reducing the dimension of both data and prediction variables to a very low-dimensional space (1–2D). This may not be realistically feasible in all practical cases. In addition, their method called upon sampling the non-linear low-dimensional relationship between data and prediction variables using a Metropolis sampler, which may be difficult to tune to convergence.
In this paper we extend the idea in [7] by relating data variables and prediction variables using functional data analysis (FDA). Techniques of FDA have been recently developed in the field of statistical science [8] and rely essentially on modeling a phenomenon by means of a linear combination of basis functions whose coefficients are statistically estimated from a set of observations. For example, a series of weather stations with measured temperature, moisture or atmospheric pressure constitute time-series that vary systematically (functionally with time), but in addition have station-specific fluctuations that can be modeled using a stochastic process (statistical fluctuations). We recognize that most dynamic data in the subsurface are essentially time-series of dynamic variations observed at well locations (stations). We will use functional data analysis to build a functional principal component space of both the data variables and the prediction variables based on a few forward model runs. Next, we will attempt to linearize the relationship between the functional components of data and prediction variables using canonical analysis. In case this linearization is statistically successful, we propose to directly quantify uncertainty in the forecast using linear least-squares (Gaussian process regression) based on the observed data mapped into the canonical functional principal component space. If this is not successful, full non-linear inverse modeling is required. We first illustrate our proposed approach on the same case study as presented in [7], allowing a comparison with NLPCA and full inverse modeling with rejection sampling. Then we apply the same methodology to an extension of the same case.
2. Acronyms

Because of the extensive use of existing acronyms, the following summary is provided:

- PFA: Prediction Focused Analysis [7]
- CCA: canonical correlation analysis [9,10]
- PCA: principal component analysis [10]
- NLPCA: non-linear principal component analysis [11]
- MPS: Multiple Point Statistics [12]
- FCA: Functional Component Analysis [13]
- FDA: functional data analysis [14]
- CFCA: Canonical Functional Component Analysis (this paper)
- IMPALA: Improved Parallel Multiple-point Algorithm using List Approach [15]
- MaFloT: Matlab Flow and Transport [16]











3. Brief review of multivariate analysis

To develop the methodology for direct forecasting, multiple existing statistical techniques for multivariate analysis are used. Some are possibly more specialized in nature, hence this short review section; a summary is provided in Table 1. In particular, we use Functional Component Analysis (FCA) [13] for dimension reduction and canonical correlation analysis for linearizing the relationship between two multivariate quantities.
3.1. Functional Component Analysis

To perform Functional Component Analysis (FCA), functional data analysis (FDA) [8] and principal component analysis (PCA) are performed in succession. FDA assumes that changes in any measurement of a physical variable over space or time are based on an underlying smooth physical process [14], that this process can in turn be represented mathematically by a continuous and differentiable function, and that this function need not always be known for analyzing the measurements. This assumption allows for the decomposition of any time-series measurement x(t) into a linear combination of underlying continuous functions called basis functions, forming a functional basis. Multiple functional bases are available, such as a sinusoidal basis, a Fourier basis, a polynomial basis, an exponential basis and a spline basis, and the choice between them is application-driven. The spline basis has an advantage over the others in its versatility, in terms of the computational ease of evaluating the basis functions and their derivatives, as well as in its flexibility. When a time series x(t) is analyzed using a spline basis of L spline functions {ξ₁(t), ξ₂(t), ..., ξ_L(t)}, it can be represented as the linear combination

x(t) = \sum_{i=1}^{L} \lambda_{\xi,i}\, \xi_i(t)

where the FDA components λ_{ξ,i} are the scalar linear-combination coefficients of the spline functions ξ_i(t).
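As a concrete illustration of this decomposition, the following minimal sketch fits a 4th-order (cubic) B-spline basis of L = 6 functions to a sampled curve with SciPy and recovers the coefficients λ_{ξ,i}. The test curve and all names are stand-ins chosen for this example, not the paper's data or code.

```python
# A minimal FDA sketch using SciPy: decompose a sampled time series into a
# cubic (4th-order) B-spline basis of L = 6 functions and recover the
# coefficients lambda_{xi,i}. The curve below is an arbitrary stand-in.
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

t = np.linspace(0.0, 3.5, 30)            # observation times (days)
x = np.exp(-((t - 2.0) ** 2))            # stand-in concentration curve

k = 3                                    # cubic splines => order k + 1 = 4
L = 6                                    # number of basis functions
# Uniform interior knots; endpoints repeated k + 1 times.
interior = np.linspace(t[0], t[-1], L - k + 1)[1:-1]
knots = np.concatenate([[t[0]] * (k + 1), interior, [t[-1]] * (k + 1)])

spline = make_lsq_spline(t, x, knots, k) # least-squares fit of the basis
lam = spline.c                           # the L coefficients lambda_{xi,i}
x_hat = BSpline(knots, lam, k)(t)        # x(t) = sum_i lambda_i xi_i(t)
print(lam.shape, np.abs(x - x_hat).max())
```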
PCA is a classical multivariate analysis procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components [9], such that the first principal component accounts for as much of the variability in the data as possible, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. When applied to vector data, PCA identifies the principal modes of variation from the eigen-vectors of the covariance matrix. Analogously, when applied to the FDA component data λ_{ξ,i}, PCA identifies the dominant functional modes of variation, or eigen-functions φ_{x,i}(t). Performing PCA on FDA component data allows a time series x(t) to be represented as a linear combination of K orthonormal eigen-functions {φ_{x,1}(t), φ_{x,2}(t), ..., φ_{x,K}(t)} with coefficients x^f such that

x(t) = \sum_{i=1}^{K} x_i^f\, \varphi_{x,i}(t)

Conventionally K ≪ L, so the PCA step of FCA achieves a dimension reduction. The FCA components, or functional components, x^f can be interpreted as projections of the physical variable x(t) into a K-dimensional functional space.
3.2. Canonical correlation analysis

Canonical correlation analysis [17] is a multivariate analysis procedure used to transform the relationship between a pair of vector variables into a set of independent linearized relationships between pairs of scalar variables. For datasets containing n pairs of simulated or observed values {x₁, ..., x_n} of a p-dimensional column vector variable x and {y₁, ..., y_n} of a q-dimensional vector variable y, CCA finds the column vectors a₁ and b₁ (p-dimensional and q-dimensional respectively) for which the linear combinations a₁ᵀx and b₁ᵀy are as highly correlated as possible.


Table 1
The different multivariate analysis techniques.

- FDA (functional data analysis): represents a set of measurements as a linear combination of continuous and differentiable mathematical functions in space or time.
- PCA (principal component analysis): represents a set of vectors as a linear combination of principal vector modes of variation.
- FCA (Functional Component Analysis): FDA + PCA; represents a set of curves as a linear combination of principal functional modes of variation.
- CCA (canonical correlation analysis): identifies linear combinations of two vector variables such that the linear combinations are maximally correlated.
- CFCA (Canonical Functional Component Analysis): FDA + PCA + CCA; described in Section 4.
The coefficient of correlation between the n scalar values of these linear combinations is termed the canonical correlation between x and y. The scalar values a₁ᵀx and b₁ᵀy are termed the first canonical variates of x and y respectively.

The jth pair of vectors a_j and b_j is obtained by maximizing the correlation between the pair (a_jᵀx, b_jᵀy) of linear combinations, such that all the pairs (a_jᵀx, a_kᵀx), (b_jᵀy, b_kᵀy) and (a_jᵀx, b_kᵀy), for all k < j, are orthogonal. At most m such orthogonal canonical variate pairs can be evaluated, where m = min(rank[x₁ ... x_n], rank[y₁ ... y_n]) [14].

A simple solution to this maximization problem is described in [9]. The sample covariance matrix Σ_ZZ of the samples {z₁, ..., z_n}, where z_i = (x_iᵀ, y_iᵀ)ᵀ, is calculated such that

\Sigma_{ZZ} = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}

The matrices Σ_XX⁻¹ Σ_XY Σ_YY⁻¹ Σ_YX and Σ_YY⁻¹ Σ_YX Σ_XX⁻¹ Σ_XY have the same positive eigenvalues. The eigen-vectors of Σ_XX⁻¹ Σ_XY Σ_YY⁻¹ Σ_YX and Σ_YY⁻¹ Σ_YX Σ_XX⁻¹ Σ_XY corresponding to the jth largest eigenvalue represent a_j and b_j respectively. Taken together in the p × m matrix A = [a₁ ... a_m] and the q × m matrix B = [b₁ ... b_m], these represent the solutions of the canonical correlation analysis problem. The jth largest eigenvalue of these matrices represents the squared correlation between the jth canonical variates of x and y.

Fig. 1. Canonical Functional Component Analysis: comparison with other multivariate data analysis techniques.
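The following sketch implements this eigen-decomposition directly from the sample covariance blocks; it assumes centered, full-rank data and is illustrative rather than numerically hardened. All names are chosen for this example.

```python
# CCA sketch following the eigen-problem above: eigen-vectors of
# Sxx^-1 Sxy Syy^-1 Syx give A, those of Syy^-1 Syx Sxx^-1 Sxy give B, and
# the eigenvalues are the squared canonical correlations.
import numpy as np

def cca(X, Y):
    """X: n x p samples, Y: n x q samples. Returns A, B and correlations."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
    Mx = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)  # p x p
    My = np.linalg.solve(Syy, Sxy.T) @ np.linalg.solve(Sxx, Sxy)  # q x q
    wx, A = np.linalg.eig(Mx)
    wy, B = np.linalg.eig(My)
    ix, iy = np.argsort(-wx.real), np.argsort(-wy.real)
    m = min(np.linalg.matrix_rank(Xc), np.linalg.matrix_rank(Yc))
    rho = np.sqrt(np.clip(wx.real[ix[:m]], 0.0, 1.0))
    return A[:, ix[:m]].real, B[:, iy[:m]].real, rho

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
Y = X @ rng.standard_normal((5, 3)) + 0.1 * rng.standard_normal((200, 3))
A, B, rho = cca(X, Y)
print(rho)   # canonical correlations, close to 1 for this nearly linear pair
```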


Combinations of these multivariate analysis techniques have been developed before. In [14,17], FDA and CCA were combined to form a Canonical Functional Analysis. In [18], FDA and PCA were combined into Functional Component Analysis (FCA). In this paper, however, all three techniques, FDA, PCA and CCA, are combined, and the result is therefore termed Canonical Functional Component Analysis (CFCA). Fig. 1 provides a guide to these combinations.
4. Methodology

4.1. Notation

We formulate the forecasting problem using the notation of [19]. The vector m represents the parameters of a subsurface model with prior distribution ρ_M(m). m can be a gridded earth model containing several modeling parameters, such as the lithology, porosity and hydraulic conductivity per grid cell, but may also include modeling choices such as variogram parameters, Boolean model parameters or the choice of training image. The prior model then constitutes the potentially infinite set of realizations that can be generated (sampled) using, for example, geostatistical methods such as MPS. The dynamic data are represented by specific values d_obs, often representing measurements over time (pump-tests, tracer tests, etc.). The data variables/parameters are represented by a vector d, which for any given m is usually simulated using a numerical forward model g: d = g(m). Because observable variables, just like the dynamic data, are time-varying, we will sometimes write explicitly d(t). In traditional inverse modeling, the goal is to obtain the posterior distribution σ_M(m), which can be expressed as [19]

\sigma_M(m) = k\, \rho_M(m)\, L(m)    (4)

where k is a proportionality constant and L(m) is a likelihood function. Tarantola provides a general expression of this likelihood function as a function of model and data error distributions. In a more limited representation, this likelihood is typically made dependent on the misfit between d_obs and the simulated g(m). The posterior distribution itself is not necessarily desired; rather, samples {m₁, m₂, ..., m_L} are required, which, with a non-Gaussian prior and non-linear forward models, calls for iterative Markov chain type sampling. Such sampling is not feasible when the dimension of m is large and/or running the forward models takes several hours of CPU time.

As mentioned before, the goal in many applications is not uncertainty on the model itself, but uncertainty on forecasts generated from models. The forecast seeks uncertainty in a variable h that is computed from any model m using another forward model, denoted as h = r(m). In similar fashion, the posterior distribution on the predicted variable is σ_H(h), with posterior samples represented as {h₁, h₂, ..., h_L}, which can be simulated as h_i = r(m_i).
4.2. Illustrative forecasting problem

As an illustration of a forecasting problem, the same example presented in [7] is used. This illustrative two-dimensional earth system is derived from a German aquifer [20,21]. To parameterize the prior information available about the system, MPS [12] is used. The MPS training image contains the binary spatial distribution of the depositional feature with high hydraulic conductivity. A contaminant (or tracer) is injected on the left-hand edge of the earth model. The observed data describe the contaminant concentration at three different depths in the center of the system for 3.5 days after contamination. The desired forecast is the contaminant concentration at the drinking well on the right-hand edge over 12 days after contamination on the left-hand edge.
For the illustrative problem, an ensemble of N = 200 models {m₁, m₂, ..., m₂₀₀} was created on a 100 × 25 grid using the MPS code IMPALA [15] and the training image in Fig. 2. For each of these models m_i, using MaFloT, a flow and transport multi-scale finite volume model [16], the contaminant concentration at the three locations w₁, w₂, w₃ in the center of the grid during the 3.5 days after contamination was calculated as

d_i(t) = \begin{pmatrix} d_{w1,i}(t) \\ d_{w2,i}(t) \\ d_{w3,i}(t) \end{pmatrix} = g(m_i)

and the contaminant concentration on the right-hand edge over the 12 days after contamination was calculated as

h_i(t) = r(m_i)

with one measurement every 2.88 h, resulting in a 3 × 30-dimensional variable for d and a 100-dimensional variable for h.

To analyze different situations, three different synthetic observed data-sets d_obs,1, d_obs,2 and d_obs,3 were chosen by [7] to exhibit different possible situations of contaminant arrival; they were generated in a manner analogous to the other d_i(t) to ensure that the observed data are consistent with, and fully contained in, the prior. A reference posterior distribution on h(t) is obtained using the NLPCA-based PFA [7] in each case for comparison; in [7], posterior samples for each case were generated using NLPCA, see Fig. 3. In this paper, posterior uncertainty on forecasts will be represented by quantiles, specifically the P10, P50 and P90 quantiles, which are calculated at each time step from a limited set of posterior concentration curves. In this fashion it becomes easy to compare posterior uncertainty obtained with various methods, as well as with prior uncertainty.

In Data-case One, d_obs,1 shows a late arrival of the contaminant at each of the observation points (Fig. 3). Intuition suggests that the contaminant arrival at the right-hand edge would be late as well. When PFA is applied in 3 dimensions, with 2 NLPCA components for the prediction variable and 1 NLPCA component for the data variable, uncertainty in the forecast is reduced and more centered around a late arrival of the contaminant, as seen in Fig. 3. In Data-case Two, d_obs,2 shows late contaminant arrival at w₁ and w₂ but early arrival at w₃. PFA based on NLPCA suggests that the prior and posterior will overlap; as a result, there would not be any significant uncertainty reduction through explicit inverse modeling, as seen in the green lines in Fig. 3. In Data-case Three, d_obs,3 shows an early arrival of the contaminant at each of the observation points, as seen in the blue lines in Fig. 3.

Fig. 2. Illustration case set-up. Top: training image containing the binary spatial distribution of the higher-conductivity spatial feature (source: [7]). Bottom: aquifer modeled using a two-dimensional grid with 100 × 25 cells. A contaminant is injected on the left-hand edge, the contaminant concentration is measured in the center for 3.5 days after contamination, and the forecast seeks the contaminant concentration on the right-hand edge over 12 days after contamination.

Fig. 3. The top row shows the contaminant arrival at the three observation wells. The red lines indicate Data-case One, the green lines Data-case Two and the blue lines Data-case Three. The grey lines are the data responses evaluated on a set of prior model realizations. The bottom row shows predictions of posterior uncertainty reduction obtained using NLPCA-based PFA using the P10–P50–P90 statistics for each data case. The grey lines indicate the P10–P50–P90 statistics of the prior. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.3. Functional data analysis of a limited prior sample of d and h

In Prediction Focused Analysis (PFA), introduced by [7], a statistical relationship is established between the ensembles {h₁, h₂, ..., h_N} and {d₁, d₂, ..., d_N} obtained from forward simulations, r and g respectively, of an ensemble of prior models {m₁, m₂, ..., m_N} sampled from the prior ρ_M(m). Directly establishing such a relationship is difficult because the dimensions of d and h may be large, although typically much smaller than that of m. Lower-dimensional representations of d and h are therefore required; if a significant dimension reduction can be achieved without much loss of variability in d and h, then the joint distribution of the reduced variables approximates the joint relationship between the data and the forecast variable.

In this paper, we use Functional Component Analysis (FCA) [13] and canonical correlation analysis (CCA) [9] to establish this joint relationship in relatively small dimension (less than 100). When FCA is performed upon a time series h(t), the series can be represented as a linear combination of K orthonormal eigen-functions {φ_{h,1}(t), φ_{h,2}(t), ..., φ_{h,K}(t)} with coefficients contained in a vector h^f of length K such that

h(t) = \sum_{i=1}^{K} h_i^f\, \varphi_{h,i}(t)

Thus, FCA allows for a representation of the time series as a single point h^f in a functional space, as seen in Fig. 4 (see Chapter 8 of [14]). Such a functional space transformation preserves proximity relationships, as seen in Fig. 4, and importantly, given a point in the functional space, the back-transformation yields a unique function in physical space (see Chapter 9 of [14] for proof), rendering a unique time-series h(t). This is important because exact reconstruction of a time series is required when inverse solutions found in functional space need to be back-transformed to responses in actual physical space.

Fig. 4. Functional Component Analysis preserves proximity relationships: contaminant concentration curves simulated using MaFloT (left), FCA components of each curve in a two-dimensional Functional Component Space (center) and contamination curves reconstructed using only 2 FCA components containing 88% of the variability (right).

4.4. Linearizing with canonical analysis

Applying FCA to both the data and forecast variables obtained from N samples, we obtain

d(t) = \sum_{i=1}^{K} d_i^f\, \varphi_{d,i}(t) \qquad \text{and} \qquad h(t) = \sum_{i=1}^{K} h_i^f\, \varphi_{h,i}(t)

where φ_{d,i}(t) are the eigen-functions of d(t) and φ_{h,i}(t) are the eigen-functions of h(t). This results in N points (d^f, h^f) in functional space whose joint variation can be said to represent a sample of the joint variation between the data and predicted variables. While such a transformation may result in a significant dimension reduction, dim(d^f) ≪ dim(g(m)) and dim(h^f) ≪ dim(r(m)), the relationship may remain complex and non-linear, and therefore difficult to model using simple statistical regression approaches. Therefore, we rely on a further transformation of (d^f, h^f). For this purpose we use canonical correlation analysis, or CCA [9]. CCA aims to establish a linear relationship between two random vectors that are non-linearly related, using the transformations d^c = Aᵀd^f and h^c = Bᵀh^f, where A and B are the solutions to the canonical correlation analysis problem


described in the earlier review section. This maximizes the correlations between the pairwise components of h^c and d^c (see Fig. 5), while constraining all the inter-component correlations between h_i^c and h_{j≠i}^c, between d_i^c and d_{j≠i}^c, and between h_i^c and d_{j≠i}^c to 0. d^c and h^c are each of dimension min(rank(H), rank(D)). Moreover, as long as rank(H) < rank(D), Bᵀ is non-singular and square with rank(H) rows and columns. This makes the back-transformation from h^c to h^f a linear operation. To ensure this condition is met, more functional components d^f of the data variable must be considered than components h^f of the prediction variable.

So far, we have focused on transforming the joint variation of a set of N samples of (d, h) to a set of N samples (d^c, h^c) that are linearly related and of lower dimension. Evidently, this transformation may not always be successful, i.e. the correlation between d^c and h^c might not be high enough. This would be an indication that full inverse modeling (involving sampling the posterior on m) is required. In case such a transformation is viable, a linear least-squares framework (see Chapter 3 of [19]) can be used to directly forecast h from d, as discussed in the next section.

Fig. 5. Comparison of relationships between data and prediction in functional space and in Canonical Functional Space.

4.5. Least-squares modeling of posterior uncertainty

In [19], it is assumed that the prior ρ_{H^c}(h^c) is multivariate Gaussian with mean h̄^c and covariance C_H. For our purpose, h̄^c can be estimated as the sample mean, and C_H can be estimated using the sample covariance Σ_HH of the N samples. In cases where ρ_{H^c}(h^c) is not multivariate Gaussian, because the individual components of h^c are not univariate Gaussian, a simple histogram transformation can be used. By definition of CCA, the relationship between d^c and h^c is linear with

d^c = G\, h^c    (10)

Because d_obs possibly has observational error, modeled in a least-squares framework with error covariance matrix C_d, d^c_obs likewise has observational error, now with covariance C_{d^c}. The relationship between C_d and C_{d^c} is derived in Appendix B. The likelihood is then assumed to be multivariate Gaussian with the following expression:

L(h^c) = \exp\left( -\tfrac{1}{2}\, (G h^c - d_{obs}^c)^T\, C_{d^c}^{-1}\, (G h^c - d_{obs}^c) \right)    (11)

With likelihood and prior multivariate Gaussian and a linear model, the posterior distribution σ_{H^c}(h^c) is also multivariate Gaussian, with mean h̃^c and covariance model C̃_H such that

\tilde{h}^c = \left( G^T C_{d^c}^{-1} G + C_H^{-1} \right)^{-1} \left( G^T C_{d^c}^{-1} d_{obs}^c + C_H^{-1} \bar{h}^c \right)    (12)

\tilde{C}_H = \left( G^T C_{d^c}^{-1} G + C_H^{-1} \right)^{-1}    (13)

However, in physical systems the relationship between d^c and h^c will rarely be perfectly linear along a line of regression G h^c; hence an error model

d^c = G\, h^c + \varepsilon    (14)

is more appropriate. Because this error can be observed in the N transformed samples (d^c, h^c), it is modeled using a multivariate Gaussian distribution with mean ε̄ and covariance model C_T, which can be calculated as follows:

\bar{\varepsilon} = \frac{1}{N} \sum_{i=1}^{N} \left( d_i^c - G\, h_i^c \right)    (15)

C_T = \frac{1}{N}\, D_{diff}\, D_{diff}^T    (16)

where each of the N columns d_diff of D_diff is calculated as

d_{diff} = d_i^c - G\, h_i^c - \bar{\varepsilon}    (17)

to represent the deviation of each observation. The same method was employed in [22] to model the error of using an approximate forward model instead of an exact one. This model error can be accounted for by replacing C_{d^c} by C_{d^c} + C_T and d^c_obs by d^c_obs − ε̄ when calculating h̃^c and C̃_H. If the linearization of the relationship between (d^f, h^f) using canonical correlation analysis is unsuccessful, the error covariance C_T will exceed the C_{d^c} covariance. In that case, instead of linear Gaussian regression, explicit inverse modeling of m might be required.

Since a Gaussian distribution is completely defined by its mean and covariance, an ensemble {h_1^c, h_2^c, ..., h_M^c} of M samples of the posterior σ_{H^c}(h^c) can be sampled using h̃^c and C̃_H. Each of these samples can be back-transformed into a time series using the following steps:

1. Calculate functional posterior samples from canonical posterior samples as h^f = (Bᵀ)⁻¹ h^c.
2. Calculate time-series posterior samples from functional posterior samples using h(t) = Σ_{i=1}^{K} h_i^f φ_{h,i}(t).

This allows for the conversion of the ensemble of M sampled points {h_1^c, h_2^c, ..., h_M^c} in a low-dimensional space to an ensemble of M sampled time-series {h₁(t), h₂(t), ..., h_M(t)}. This ensemble can be used to estimate the statistics of the posterior distribution.

In a broader view, the technique can be seen as linearly modeling the relationship between the data variable d and the forecast variable h in a lower-dimensional Canonical Functional Component Space. This requires performing functional data analysis (FDA), principal component analysis (PCA) and canonical correlation analysis (CCA) in succession, as Canonical Functional Component Analysis (CFCA). An overview of the entire workflow is given as a flowchart in Appendix A.
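To make the update concrete, the following minimal sketch implements Eqs. (10)–(17) with NumPy. It is an illustration under the assumptions stated above, not the authors' implementation; in particular, estimating G by ordinary least squares on the prior samples is an assumption of this sketch, and all names are hypothetical.

```python
# Sketch of the linear-Gaussian update, Eqs. (10)-(17). Dc and Hc hold the
# N canonical components of data and forecast; dc_obs is the observed data
# mapped into canonical space; C_dc is its error covariance.
import numpy as np

def cfca_posterior_samples(Dc, Hc, dc_obs, C_dc, n_samples=100, seed=0):
    N = Dc.shape[0]
    h_bar = Hc.mean(axis=0)                       # prior mean of h^c
    C_H = np.cov(Hc, rowvar=False)                # prior covariance of h^c
    G = np.linalg.lstsq(Hc, Dc, rcond=None)[0].T  # d^c ~ G h^c, Eq. (10)
    eps = Dc - Hc @ G.T                           # regression errors, Eq. (14)
    eps_bar = eps.mean(axis=0)                    # Eq. (15)
    D_diff = (eps - eps_bar).T                    # Eq. (17), one column each
    C_T = D_diff @ D_diff.T / N                   # Eq. (16)
    iC_d = np.linalg.inv(C_dc + C_T)              # C_dc replaced by C_dc + C_T
    iC_H = np.linalg.inv(C_H)
    d_adj = dc_obs - eps_bar                      # d_obs^c shifted by -eps_bar
    C_post = np.linalg.inv(G.T @ iC_d @ G + iC_H)           # Eq. (13)
    h_tilde = C_post @ (G.T @ iC_d @ d_adj + iC_H @ h_bar)  # Eq. (12)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(h_tilde, C_post, size=n_samples)

# Each sample h^c is back-transformed as h^f = inv(B.T) @ h^c, followed by
# h(t) = sum_i h^f_i * phi_{h,i}(t), as in steps 1-2 above.
```

If C_T computed this way dominates C_{d^c}, the linearization has failed and, as noted above, explicit inverse modeling of m is warranted.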
5. Results

5.1. Illustration case

When CFCA-based PFA is applied to the illustrative case, the results closely mirror those obtained from the NLPCA-based PFA of [7]. The data variable d(t) and the forecast variable h(t) were decomposed using spline bases. The three time series d_{w1,i}(t), d_{w2,i}(t) and d_{w3,i}(t) composing each d_i(t) were each functionally decomposed using a 4th-order 6-spline basis, resulting in four eigen-function components for each time series containing 99.99% of the variability. As a result, each d_i(t) was projected into a 12-dimensional functional space (three d(t) time series, each with four d^f components). Each h_i(t) was decomposed using a 2nd-order 5-spline basis, resulting in five eigen-function components containing 99.99% of the variability. The choice of basis (4th-order 6-spline/2nd-order 5-spline) was guided by minimizing the maximum RMS error between simulated and reconstructed curves in an ensemble. This minimization provides the least number of splines needed to achieve the maximum dimension reduction. Using canonical correlation analysis, the functional data–forecast relationships were linearized into a 10-dimensional (d^c, h^c) Canonical Functional Component Space from a 17-dimensional (d^f, h^f) space (12-dimensional d^f + 5-dimensional h^f). All the relevant relationship information, however, could be effectively separated into five independent Canonical Functional Component Planes. Three of these planes are shown in Fig. 6. In each of these planes, the data–forecast relationship is highly linear. Moreover, the linear correlation went from 0.60 in functional space to 0.94 in canonical functional space (Fig. 5). The successfully achieved linear relationship permits the use of linear Gaussian regression to estimate the posterior σ_{H^c}(h^c) on the forecast components as a Gaussian distribution. Since sampling a Gaussian posterior distribution and back-transforming those samples are computationally inexpensive linear operations, as many samples can be generated as needed to effectively quantify an estimate of the forecast uncertainty. In this case, 100 posterior h^c points were sampled and, in turn, back-transformed into the 100 h(t) curves seen in Fig. 7.

As seen in Fig. 8, the P10–P50–P90 quantile statistics of the posterior ensemble estimated using CFCA-based PFA agree very well with the statistics of the NLPCA-based PFA estimate in Data-cases One and Three in terms of uncertainty reduction. In Data-case Two, CFCA-based PFA agrees with NLPCA-based PFA in suggesting that the observed data do not contain information about the system that is relevant to the targeted forecast. As a result, inverse modeling would not provide much uncertainty reduction.

The latter result (Data-case Two) suggests that CFCA-based PFA can be used, just like NLPCA-based PFA, as a diagnostic technique to ascertain the need for explicit inverse modeling for a forecasting problem. However, CFCA is computationally cheaper (no need for Monte Carlo iterations) and more straightforward (involving linear models) than NLPCA, especially for higher-dimensional data, because it employs linear methods.

Fig. 6. Canonical Functional Space for d_obs,1: red dots indicate the Canonical Functional Component projections (d^c, h^c), and the blue line indicates where, in each of the Canonical Functional Component Planes, d^c_obs,1 (the projection of the data observation into the low-dimensional Canonical Functional Component Space) lies. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Posterior sampling in Data-case One. Histogram of 100 samples of the posterior of the first component h_1^c (left). Reconstructed posterior h(t) curves in blue, showing uncertainty reduction compared to the grey prior h(t) curves (right). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Comparison of P10–P50–P90 quantile statistics of posterior estimates using CFCA- and NLPCA-based PFA for Data-cases One (left), Two (center) and Three (right). The grey lines indicate the P10–P50–P90 statistics of the prior. The dotted blue lines indicate the P10–P50–P90 statistics obtained from NLPCA-based PFA as described in [7]. The dotted red lines indicate the P10–P50–P90 statistics obtained from CFCA-based PFA. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5.2. Extended case

To test the robustness of CFCA-based PFA in the presence of more complex geological uncertainty, the illustrative forecasting problem was extended, now involving the following spatial prior:

1. There are three depositional scenarios, as illustrated in Fig. 9. The two scenarios with thicker stacked conductive bodies are each considered half as likely as the scenario with thin conductive bodies.
2. The marginal distribution by volume of the highly-conductive feature is uncertain. This uncertainty is modeled using a continuous uniform distribution between 25% and 50%.
3. The horizontal extent of the conductive body is uncertain. This length is modeled as a discrete uniform distribution between 26 and 34 grid units.

The parameter uncertainty of such an earth system is represented by a joint distribution of continuous, discrete and categorical random variables. An ensemble of N = 300 earth models {m₁, m₂, ..., m₃₀₀} was modeled on a 100 × 25 grid using the Boolean-model generator Tetris [23] to sample this joint distribution. It is important that this initial prior ensemble of scoping runs reliably represents the extent of prior uncertainty, meaning that the number of initial scoping runs required is application-dependent. Using MaFloT, the contaminant concentration at the three locations w₁, w₂, w₃ in the middle of the grid during the 3.5 days after contamination and the contaminant concentration at the forecast well on the right-hand edge over the 12 days after contamination were calculated in the same way as in the illustrative case. A minimal sketch of sampling this joint prior is given below.
As in the illustrative case, a reference model was generated using a randomly sampled parameter set and simulated to obtain a synthetic observed data-set. This ascertains that the synthetic observed data are consistent with the prior, a requirement for any Bayesian modeling to be valid. To generate a posterior forecast to test against, rejection sampling [24,25] was used (Fig. 10). Rejection sampling entails sampling the posterior distribution σ_M(m) using the Bayesian expression σ_M(m) = k ρ_M(m) L(m), where the likelihood L(m) is calculated as a function of the objective function O(m): L(m) = k_L e^{−O(m)/2}. The iterative workflow is:

1. Generate an earth model m from the prior ρ_M(m). In this case, a parameter set was randomly sampled from the prior joint parameter distribution; this parameter set is used to build the earth model using Tetris.
2. Calculate the objective function O(m) for the model. In this case, a positive-valued function of the mismatch d_obs − g(m) was used.
3. Calculate the likelihood function from the objective function, L(m) = k_L e^{−O(m)/2}, where the constant k_L is chosen such that the maximum L(m) value is 1.
4. Sample a random number u from a continuous uniform distribution on [0, 1].
5. Accept m as a sample of the posterior σ_M(m) if u ≤ L(m).

Rejection sampling required 4607 forward model evaluations to obtain a posterior ensemble of 45 forecast time-series curves; a minimal sketch of this accept/reject loop is given below.
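In the sketch that follows, `sample_prior` and `forward_g` are hypothetical stand-ins for the Tetris prior sampler and the MaFloT forward run; they are assumptions of this example, not the authors' code.

```python
# Rejection sampling sketch for steps 1-5 above.
import numpy as np

def rejection_sample(sample_prior, forward_g, d_obs, n_accept, seed=0):
    rng = np.random.default_rng(seed)
    accepted, n_runs = [], 0
    while len(accepted) < n_accept:
        m = sample_prior(rng)                    # step 1: draw from prior
        O = np.sum((d_obs - forward_g(m)) ** 2)  # step 2: misfit objective
        L = np.exp(-O / 2.0)                     # step 3: with k_L = 1 a
                                                 # perfect match gives L = 1
        n_runs += 1
        if rng.uniform() <= L:                   # steps 4-5: accept/reject
            accepted.append(m)
    return accepted, n_runs                      # e.g. 45 accepts, 4607 runs
```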


Fig. 9. Three depositional scenarios. The white cells indicate the presence of a depositional feature with high hydraulic conductivity. The bottom scenario is considered a priori twice as likely as the upper two scenarios.

Fig. 10. Extended case: the top row shows the d_obs(t) curves (center) used as reference for rejection sampling and PFA. The middle row shows the projections in Canonical Functional Space. The bottom row shows that the P10–P50–P90 statistics of the rejection sampling forecast and the CFCA-based PFA forecast estimate agree well.


Fig. 11. Robustness check: d_obs defined in Canonical Functional Space such that it lies near the edge of the prior (top). Reconstructed d_obs(t) curves (center) used as reference for rejection sampling and PFA. The P10–P50–P90 statistics of the rejection sampling forecast and the CFCA-based PFA forecast estimate do not agree (bottom).

For CFCA-based PFA, this time each di t simulated using the initial ensemble of 300 models was functionally decomposed using a
basis composed of 27 tri-variate fourth-order spline functions.
The choice of a large basis (more spline functions), as compared to
the illustration case, ensures as many simulated measurements as
possible can be used for calibrating the basis. In this case, the simulation provides 27 data points for each di t. However, this leads
to 81 kn;i coefcients for each di t if 27 uni-variate splines are used.
Using 27 tri-variate splines instead reduces the dimensionality
without any change in accuracy of reconstruction because only 27
kn;i coefcients are required for each di t: Additionally, using a
tri-variate basis rather than three uni-variate bases better accounts
for the statistical-relationships between the three time series
dw1;i t, dw2;i t and dw3;i t. This resulted in the rst 13 tri-variate
eigen-function components containing 99.99% of the variability.
This allows for maximum granularity in the FDA thereby ensuring,
rstly, a minimum loss of accuracy in reconstruction of the timeseries at the time of back-transformation and, secondly, preservation of the relationships between the constituent time series of dt.
In case of ht, the functional basis contained 102 fourth order
splines. In this case, the simulation provides 102 data points for
each hi t and all these simulated measurements can be used for
calibrating the basis. This yielded ve eigen-functions containing
99.99% of the variability. Using canonical correlation analysis, the
data-forecast relationships were linearized into independent

Canonical Functional Component Planes three of them shown


in Fig. 10. Linear Gaussian regression was used to generate 75 posterior samples that, in turn, were back-transformed into 75 ht
curves. The P10P50P90 quantile statistics of these curves agree
well with those of the posterior ensemble curves obtained by rejection sampling as seen in Fig. 10. However CFCA-based PFA uses
only 300 forward models that can be simultaneously evaluated
(embarrassingly parallel computational problem) and involves
linear methods that scale well with the dimensions of the problem.
One could have generated many more posterior samples without
needing to run any forward simulators. Compare this to rejection
sampling which requires, in this case approximately 100 forward
simulator runs (4607/45) per each posterior sample.
PFA (whether NLPCA- or CFCA-based) is not applicable in cases where the observed data fall at the edge of the prior data space. In Fig. 6, as well as in Fig. 10, we notice how the observed data (blue line) lie within the bulk of the scatter-plot; hence, meaningful statistical modeling can take place. However, this approach is unlikely to be successful when the observed data lie either outside the scatter or at its edge. Consider such a case in Fig. 11: a synthetic reference case was generated such that the Canonical Functional Projection of the data observation, d^c_obs, lies at the edge of the prior d^c. As seen in Fig. 11, CFCA-based PFA cannot provide a reliable forecast estimate. To solve such cases successfully, full posterior modeling of m may be required.


Fig. 12. Trade-off between linearity and dimension-reduction: canonical correlation of the data-forecast pair when dimensions are reduced using NLPCA.

Another case where PFA using CFCA is not applicable is where the canonical correlation analysis does not successfully linearize the relationship between (d^c, h^c). This means there is a large difference ε between d^c and its regression estimate G h^c. When this error, modeled as a Gaussian-distributed random variable, has a covariance C_T that exceeds the desired C_{d^c} covariance, the error correction used in [22] does not apply. Since the error ε can be calculated independently from the samples in each (d^c, h^c) plane, C_T can be calculated and compared with the desired C_{d^c}. In a case where the error ε is large and varying, instead of linear Gaussian regression, explicit inverse modeling of m might be required.
6. Trade-off between linearity and dimensionality

When data–forecast relationships are studied in low dimensions, the dimension reduction itself trades off against linearity. In the illustrative case, for example, there is a significant loss of linearity as dimensions are reduced using NLPCA, as seen in Fig. 12. This figure plots the number of NLPCA components versus the correlation of the first pair of canonical components. This first pair serves as a measure of the linearity of the relationship between the projections of the data and prediction in lower dimensions.

As the linearity of the relationship decreases, computationally complex methods such as kernel smoothing need to be employed to generate a forecast. The reason is that even though a neural network can be designed such that the bottleneck layer has just one node, a single eigen-function can very rarely describe the entire variability of a physical system. When canonical correlation analysis is added to FCA, in addition to linearizing the data–forecast relationship, it also splits the low-dimensional (d^f, h^f) space into independent (d^c, h^c) planes. Within each of these (d^c, h^c) planes, the data–forecast relationship can be approximated with a linear model. Linear model regression can then replace kernel smoothing as the estimation technique. This provides further computational gains, since a kernel-smoothed distribution needs to be sampled using a Monte Carlo technique, whereas a multi-Gaussian distribution can be sampled efficiently.
NLPCA needs to be modeled using a neural network whose configuration needs to be calibrated from the data, and this is often a non-robust feature of neural networks. In particular, the number of hidden layers (the number of weights in the network) needs to be estimated from calibration sets. In comparison, the CFCA model is linear (and hence more robustly estimated), and the complexity/dimensionality of the model is determined directly from the analysis. Additionally, as the dimensions of the physical data increase, for example when many forecasts need to be updated as more data measurements become available for a physical system over time, the computational complexity of NLPCA increases cubically, as O(n³). This can become computationally prohibitive. Since FDA uses continuous functions in the basis, as more data measurements become available, the functional basis can simply be recalibrated to provide a new forecast estimate; the computational complexity of this operation increases linearly, as O(n). Thus, functional analysis is inherently scalable in comparison to NLPCA. These scalability gains can accumulate over time.
7. Conclusions

Since many problems in the Earth Sciences tend to involve highly multivariate earth models with computationally expensive forward models, using PFA as a diagnostic technique or a quick estimate may be useful, since it avoids any iterative workflow. Our results demonstrate that CFCA can be used as an effective method to perform PFA for a forecasting problem, and that it is computationally cheaper and more scalable than NLPCA because it can work in higher dimensions and thus with higher linearity.

An interpretation of CFCA is that it relates data features of an underlying physical system to its forecast features. Observed data measurements capture information about some features of the underlying earth system; the canonical functional components of the observed data represent these data features. These features of the system do not always inform the targeted forecast. The features of the earth system that do inform the targeted forecast, the forecast features, are represented by the canonical functional components of the forecast. CFCA checks whether the data and forecast features of a system overlap for a particular forecasting problem. If not, then explicit inverse modeling is needed. When multiple measurement sources are available for a system, CFCA can potentially be used to select a data variable, or a combination of data variables, that contains the most information about the targeted forecast features. As a result, CFCA can potentially be used for application-specific sensor (or well) location problems.
A question that is left unaddressed in this paper is how many models N need to be generated to create a meaningful statistical relationship between the data response and the forecast response. The question of N is case-specific and depends on where the observed data lie with respect to the prior uncertainty of the data responses. This issue was raised in Fig. 11: if the observed data lie at the extremes of the prior data responses, then many more samples will be required. We are currently working on a bootstrap procedure that can determine whether any given number N of samples provides sufficient confidence in the generated posterior uncertainty.

An important application of CFCA can potentially be in the integration of unreliable or noisy data. Time-series measurements in earth systems tend to reflect smooth underlying processes. In practice, however, these measurements are susceptible to errors and data-gaps due to physical agents, which can propagate unpredictably through numerical workflows. Functional analysis provides a framework for noise removal in the case of data measurements from smooth processes [26].

Another application of functional analysis lies in the ability to quantify the extremity/centrality (or outlyingness) of an observation with respect to a highly multivariate prior [27]. Bayesian techniques depend upon the assumption that the prior information is complete and data-agnostic. In practice, however, prior distributions for earth-modeling parameters are rarely data-agnostic. Using extremity metrics based on CFCA, it may be possible in the future to define a criterion for the adequateness of a prior distribution for a given forecasting problem.
Appendix A
Flowchart of CFCA-based PFA

Appendix B

The relationship between the physical-space data covariance C_d and the canonical-functional-space data component covariance C_{d^c} can be modeled using the linear relationships between d and d^c. In general, if Y = X Aᵀ, the covariance between the columns of Y is

\Sigma_{YY} = \mathrm{cov}(Y, Y) = A\, \Sigma_{XX}\, A^T

where Σ_XX = cov(X, X) is the covariance between the columns of X.

In our case, C_{d^c} is the covariance matrix between the components in the Canonical Functional Component Space. Due to the nature of the relationships between canonical components, it is a diagonal matrix. If d^c = d^f Aᵀ, then

C_{d^c} = A\, C_{d^f}\, A^T

relates the covariance of the functional components, C_{d^f}, to C_{d^c}. C_{d^f} is a K × K matrix, where K is the number of eigen-functions in the functional basis such that

d(t) = \sum_{i=1}^{K} d_i^f\, \varphi_{d,i}(t)

Assuming that functions of t are evaluated at M values {t₁, ..., t_M}, the above expression can be written as d = d^f Φᵀ, where Φ is an M × K matrix.

Note that the M values of measurement of t do not necessarily need to be unique. If, for example, 10 measurements are taken independently at three locations, as in the illustrative example, the eigen-functions φ_{d,i}(t) can be evaluated either by treating d(t) as tri-variate on 10 values of t, or as uni-variate on 30 values of t by treating d(t) as a concatenation of d_{w1,i}(t), d_{w2,i}(t) and d_{w3,i}(t).

The M × M covariance matrix C_d is expressed as

C_d = \Phi\, C_{d^f}\, \Phi^T

or

C_d = \Phi A^{-1} \left( A\, C_{d^f}\, A^T \right) A^{-T} \Phi^T

in terms of the desired canonical functional data component covariance. Thus the expression

C_d = \Phi A^{-1} C_{d^c} A^{-T} \Phi^T

(the matrix A⁻¹ represents the Moore–Penrose inverse of A if A is not a square matrix) can be used to express the relationship between the physical-space data covariance C_d and the canonical-functional-space data component covariance C_{d^c}.

In an Earth Sciences forecasting problem, the error covariance on the observable data, C_d, often models the measurement error of an instrument. In a 2q-dimensional Canonical Functional Component Space, the corresponding error covariance can be modeled as

C_{d^c} = \underset{X = \mathrm{diag}(b),\ b \in \mathbb{R}_{+}^{q}}{\arg\min} \left\| C_d - \Phi A^{-1} X A^{-T} \Phi^T \right\|

where ℝ₊^q is the set of all q-dimensional vectors with positive real-valued elements.
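The propagation rule C_{d^c} = A C_{d^f} Aᵀ is easy to verify numerically; the following sketch checks it on synthetic values (all numbers illustrative).

```python
# Numerical check of the propagation rule used above: if d^c = d^f A^T,
# then C_dc = A C_df A^T. All values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, K = 500, 4
Df = rng.standard_normal((N, K)) @ rng.standard_normal((K, K))  # d^f samples
A = rng.standard_normal((K, K))                                 # transform
Dc = Df @ A.T                                                   # d^c = d^f A^T

C_df = np.cov(Df, rowvar=False)
C_dc = np.cov(Dc, rowvar=False)
print(np.allclose(C_dc, A @ C_df @ A.T))   # True
```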
References

[1] Mosegaard K, Tarantola A. Probabilistic approach to inverse problems. In: Lee W, Jennings P, Kisslinger C, Kanamori H, editors. International handbook of earthquake and engineering seismology. London: Academic Press; 2002. p. 237–65. http://dx.doi.org/10.1016/S0074-6142(02)80219-4.
[2] Carrera J, Alcolea A, Medina A, Hidalgo J, Slooten L. Inverse problem in hydrogeology. Hydrogeol J 2005;13:206–22. http://dx.doi.org/10.1007/s10040-004-0404-7.
[3] Oliver D, Chen Y. Recent progress on reservoir history matching: a review. Comput Geosci 2011;15:185–221. http://dx.doi.org/10.1007/s10596-010-9194-2.
[4] Zhou H, Gómez-Hernández JJ, Li L. Inverse methods in hydrogeology: evolution and recent trends. Adv Water Resour 2014;63:22–37. http://dx.doi.org/10.1016/j.advwatres.2013.10.014.
[5] Park H, Scheidt C, Fenwick D, Boucher A, Caers J. History matching and uncertainty quantification of facies models with multiple geological interpretations. Comput Geosci 2013;17:609–21. http://dx.doi.org/10.1007/s10596-013-9343-5.
[6] Caers J. On internal consistency, conditioning and models of uncertainty. In: Ninth international geostatistics congress, Oslo, Norway; 2012. p. 1–15. http://dx.doi.org/10.1007/978-94-007-4153-9.
[7] Scheidt C, Renard P, Caers J. Prediction-focused subsurface modeling: investigating the need for accuracy in flow-based inverse modeling. Math Geosci 2015;47:173–91. http://dx.doi.org/10.1007/s11004-014-9521-6.
[8] Ramsay JO. Functional data analysis. In: Encyclopedia of statistical sciences. John Wiley & Sons Inc; 2004. http://dx.doi.org/10.1002/0471667196.ess3138.
[9] Krzanowski WJ. Principles of multivariate analysis: a user's perspective. Rev. ed. Oxford; New York: Oxford University Press; 2000.
[10] Ramsay JO, Dalzell CJ. Some tools for functional data analysis. J R Stat Soc Ser B (Methodol) 1991;53:539–72. http://dx.doi.org/10.2307/2345586.
[11] Kramer MA. Nonlinear principal component analysis using autoassociative neural networks. AIChE J 1991;37:233–43. http://dx.doi.org/10.1002/aic.690370209.
[12] Caers J. Petroleum geostatistics. Richardson, TX: Society of Petroleum Engineers; 2005.
[13] Silverman BW. Smoothed functional principal components analysis by choice of norm. Ann Stat 1996;24:1–24. http://dx.doi.org/10.1214/aos/1033066196.
[14] Ramsay JO, Silverman BW. Functional data analysis. 2nd ed. New York: Springer; 2005.
[15] Straubhaar J, Renard P, Mariethoz G, Froidevaux R, Besson O. An improved parallel multiple-point algorithm using a list approach. Math Geosci 2011;43:305–28. http://dx.doi.org/10.1007/s11004-011-9328-7.
[16] Künze R, Lunati I. An adaptive multiscale method for density-driven instabilities. J Comput Phys 2012;231:5557–70. http://dx.doi.org/10.1016/j.jcp.2012.02.025.
[17] Leurgans SE, Moyeed RA, Silverman BW. Canonical correlation analysis when the data are curves. J R Stat Soc Ser B (Methodol) 1993;55:725–40. http://dx.doi.org/10.2307/2345883.
[18] Silverman BW. Smoothed functional principal components analysis by choice of norm. Ann Stat 1996;24:1–24. http://dx.doi.org/10.1214/aos/1033066196.
[19] Tarantola A. Inverse problem theory and methods for model parameter estimation. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2005.
[20] Bayer P, Huggenberger P, Renard P, Comunian A. Three-dimensional high resolution fluvio-glacial aquifer analog: Part 1: field study. J Hydrol 2011;405:1–9. http://dx.doi.org/10.1016/j.jhydrol.2011.03.038.
[21] Comunian A, Renard P, Straubhaar J, Bayer P. Three-dimensional high resolution fluvio-glacial aquifer analog – Part 2: geostatistical modeling. J Hydrol 2011;405:10–23. http://dx.doi.org/10.1016/j.jhydrol.2011.03.037.
[22] Hansen T, Cordua K, Jacobsen B, Mosegaard K. Accounting for imperfect forward modeling in geophysical inverse problems – exemplified for crosshole tomography. Geophysics 2014;79:H1–H21. http://dx.doi.org/10.1190/geo2013-0215.1.
[23] Maharaja A. TiGenerator: object-based training image generator. Comput Geosci 2008;34:1753–61. http://dx.doi.org/10.1016/j.cageo.2007.08.012.
[24] von Neumann J. Various techniques used in connection with random digits. Monte Carlo Method 1951;12:36–8.
[25] Ripley BD. Random variables. In: Stochastic simulation. John Wiley & Sons Inc; 2008. p. 53–95.
[26] Ramsay JO, Silverman BW. Applied functional data analysis: methods and case studies. New York: Springer; 2002.
[27] Ferraty F, Franco-Pereira A, Lillo R, Romo J. Extremality for functional data. In: Recent advances in functional data analysis and related topics. Physica-Verlag HD; 2011. p. 131–4.
