Philip J. Brown
University of Kent at Canterbury, UK
Inverse prediction concerns predicting one set of measurements from another set when the direction of causation, and possibly error, operates in the opposite direction. An example is that of calibration, in which it is desired to use a cheap and quick but error-prone measurement (Y) to predict the true amount of a constituent (x), which in itself can be measured accurately in laboratory conditions but at much greater effort or cost. After taking data on samples using both measurements (the training or learning data) it is desired to use the cheap measurement in future.
The paradigm extends to multivariate response Y and composition vector x. Relationships may be linear or non-linear. Applications range from simple instrument calibration, to the use of spectroscopic (multivariate) responses for prediction of composition, to image reconstruction from noisy data. An example that will serve to focus ideas is given by Karlsson et al (1995) on the determination of nitrate in municipal waste water by UV spectroscopy, and discussed in the review of multivariate calibration by Sundberg (1999).
If parametric statistical analysis is employed then there are well developed estimation procedures, be they classical sampling, likelihood based, or the increasingly influential Bayesian approach. In this article we will review these. We will also comment on the use of the prediction of x directly and the symmetric approach to x, y embodied in latent variable methods.
Whatever statistical approach is adopted there is a need to keep the particular features of the application clearly in focus. Some preprocessing of the data may be necessary to reduce unwanted confounding. It may be that special features need to be modelled. Graphical diagnostics are invaluable in detecting important departures from assumptions. For an earlier discussion of some of these issues see Martens and Naes (1989).
In simple controlled calibration the training data consist of n fixed x values with responses following the linear model

    Y_i = α + βx_i + ε_i,   i = 1, …, n.   (1)

Here the ε_i are independent and identically normally distributed with zero mean and variance σ². In random calibration the set of x values will be a sample of those generally encountered. Hybrid designs might result from purposely sub-sampling such a random selection, perhaps to increase the variance by thinning out central values so as to increase the accuracy of estimation in the conditional distribution of Y given x. Some more extreme x values might even be added to the sample to extend the range of validity.
The statistician's standard approach to controlled calibration has a long pedigree stemming from Eisenhart (1939), and is to regress Y on x and then invert the estimated relationship to predict x in future. If x is random, or the x values can be safely assumed to be typical of those likely to arise in future, then to predict x it is sensible to regress X, random, on y, treated as conditionally fixed. Under an extension of the Gauss-Markov theorem for quadratic prediction loss (Brown, 1993, section 3.3), this is the best linear unbiased predictor. It loses this optimality when there is one future x to predict but m replicates in Y at this future value, as is explored below.
Let us complete the model for the training data with a similar one for the prediction data in the linear model for controlled x. Imagine a single unknown x to predict. We denote it by ξ to emphasize that it is unknown and needs to be predicted. At this value of ξ we have observed m replicates of Y. To further emphasize that these Y values are for prediction and may just be different from the training data Y carefully obtained under laboratory conditions, we use Z in place of Y. The prediction model is then

    Z_j = α + βξ + ε_j,   j = 1, …, m.   (2)
Here it is generally assumed that the variance of the normally distributed error ε_j is σ², identically distributed to the errors in (1). Departures from these assumptions would include the possibility of a larger error variance than σ², and correlated replicates. We have already indicated that such departures are worth checking. The classical estimator from the regression of Y or Z on x or ξ is

    ξ̂_C = (Z̄ − α̂)/β̂,   (3)

where Z̄ = Σ_j Z_j/m and the least squares estimates are α̂ = ȳ − β̂x̄ and β̂ = s_xy/s_xx, with s_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) and s_xx = Σ_{i=1}^n (x_i − x̄)². In terms of departures from training means eqn (3) becomes (ξ̂_C − x̄) = (Z̄ − ȳ)/β̂. The inverse predictor for m = 1 from the regression of X on y is simply

    ξ̂_I = γ̂ + δ̂Z,   (4)

or in terms of departures from training means, (ξ̂_I − x̄) = δ̂(Z − ȳ), with least squares estimates γ̂ = x̄ − δ̂ȳ, δ̂ = s_xy/s_yy and s_yy = Σ_{i=1}^n (y_i − ȳ)².
The two regression lines are related by

    (ξ̂_I − x̄) = r²(ξ̂_C − x̄),   (5)

where r² = s_xy²/(s_xx s_yy) is the squared correlation. Since r² ≤ 1, the inverse estimator is closer to the mean of x than the classical estimator. With a perfect correlation of unity the estimators will coincide; otherwise the difference between the two depends on the strength of the relationship through r².
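The relationship (5) between the two estimators is easy to check numerically. The following sketch (simulated data with illustrative parameter values; numpy assumed) computes the classical and inverse predictors from a training set and a single future observation Z:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated training data under model (1); all values are illustrative.
n = 30
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
syy = np.sum((y - ybar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))

beta_hat = sxy / sxx          # slope of the regression of Y on x
delta_hat = sxy / syy         # slope of the regression of X on y
r2 = sxy ** 2 / (sxx * syy)   # squared correlation

# A single future observation Z at an unknown x (m = 1).
Z = 2.0 + 0.8 * 7.0 + rng.normal(scale=0.5)

xi_C = xbar + (Z - ybar) / beta_hat   # classical estimator, eqn (3)
xi_I = xbar + delta_hat * (Z - ybar)  # inverse estimator, eqn (4)
```

Because δ̂β̂ = r², the identity (ξ̂_I − x̄) = r²(ξ̂_C − x̄) holds exactly in the computed quantities, whatever the simulated data.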
The shrinkage of the inverse estimator towards the mean relative to the classical estimator is reflected in the assumption of a common normal distribution for the x's and ξ in alternative derivations of the inverse estimator. All approaches to inference which properly account for the uncertainty use the two sources of uncertainty: in the training data through model (1), with both x and y data, and through the future observation(s), with just Z data in model (2). The Bayes approach to this amalgamation stems from Hoadley (1970) and was extended by Brown (1982) to the multivariate general linear model. Broadly it uses the training data to derive the posterior distribution of the regression parameters and then integrates these out from the prediction model for Z to form a predictive density of the only remaining unknown, ξ, which combines with the prior for ξ to form the posterior. When there are prediction replicates these will add information (residual sum of squares s₀²) about σ², provided this can be safely assumed to be common to both prediction and training. In this case the typical form of posterior is

    π(ξ | Z, Y, X) ∝ f(Z | Y, X, s₀², ξ) π(ξ | X),

where for detailed assumptions see Brown (1993), chapter 5.
Here however we will tie together approaches to univariate, multivariate and non-linear models through the profile likelihood of ξ. This uses maximisation of likelihood rather than the integration of the Bayes approach. It is formed by maximising over the parameters α, β, σ² the likelihood formed by combining the n observations (x_i, Y_i), i = 1, …, n, and the m `observations' (ξ, Z_j), j = 1, …, m, for fixed ξ. The profile or relative likelihood (Kalbfleisch and Sprott, 1970) gives the maximum likelihood estimate of ξ if it is further maximized over ξ, and the profile likelihood is often standardized to have an overall maximum of unity at the MLE. The profile likelihood itself gives the relative likelihood of different values of ξ. By taking all values of ξ such that this profile likelihood is greater than some constant we generate likelihood ratio confidence intervals. For the controlled calibration situation the profile likelihood is proportional to

    max_{α,β,σ²} [ ∏_{i=1}^{n} f(y_i | x_i; α, β, σ²) ∏_{j=1}^{m} f(z_j | ξ; α, β, σ²) ].
The explicit form is given later for the more general multivariate case in formula (10). This profile likelihood as a function of ξ has a maximum at ξ̂_C given by (3), verifying its maximum likelihood credentials. However if one assumes some of the x values and ξ are random normal, which one is free to do without affecting the distribution of Y | x, where all have the same unknown mean and variance, then a vital link between the unknown ξ and the x sample is established. The profile likelihood augmented by the normal x_i, ξ data now shifts its peak towards the mean of the random ξ. In the case where all the x values and ξ form a N(μ, τ²) random sample, the profile likelihood is the controlled case relative likelihood multiplied by

    max_{μ,τ²} ∏_{i=1}^{n+1} f(x_i | μ, τ²),

where x_{n+1} = ξ. This extra term is like the kernel of a Student distribution,

    {c²_{1,n}(ξ)}^{−(n+1)/2} = [1 + 1/n + (ξ − x̄)²/s_xx]^{−(n+1)/2},

where x̄ = Σ_{i=1}^n x_i/n. Without prediction replicates, m = 1, the combined term is maximized by ξ̂_I given by (4) (Brown, 1993, Section 2.5). When there are prediction replicates the maximum of the profile likelihood, let us call it ξ̃_I, is not equal to ξ̂_I extended by Z̄ replacing Z. The predictor γ̂ + δ̂Z̄ is biased towards x̄, as noted for example by Srivastava (1995) in simulations comparing classical and inverse predictors in multi-univariate linear calibration. A series expansion of the log-profile likelihood as a quadratic in ξ gives a maximising value and approximate predictor

    (ξ̃_I − x̄) = k δ̂(Z̄ − ȳ),   (6)

where the factor k involves s² and s²₊, the residual sums of squares from the training data, and from the training data augmented with the mean-corrected sums of squares from the replicates, respectively. We reiterate that it is important here to check for departures from assumptions, especially differences between error variances in training and prediction, and correlated replicates.
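The profile likelihood machinery above can be reproduced numerically. The sketch below (simulated data, m = 1, an illustrative grid; numpy assumed) computes the controlled profile log likelihood by refitting α and β at each fixed ξ, then adds the Student-like kernel from the random-x augmentation; the two maxima fall at ξ̂_C and ξ̂_I respectively:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated training data under model (1); all values are illustrative.
n = 25
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 0.6 * x + rng.normal(scale=0.4, size=n)

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
syy = np.sum((y - ybar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))
beta_hat, delta_hat = sxy / sxx, sxy / syy

# A single future observation (m = 1) taken at an unknown xi.
z = np.array([1.0 + 0.6 * 8.0 + rng.normal(scale=0.4)])
m = z.size

def rss(xi):
    """Residual SS after refitting alpha, beta to the combined
    training and prediction data with xi held fixed."""
    xs = np.concatenate([x, np.full(m, xi)])
    ys = np.concatenate([y, z])
    A = np.column_stack([np.ones(n + m), xs])
    res = ys - A @ np.linalg.lstsq(A, ys, rcond=None)[0]
    return float(np.sum(res ** 2))

grid = np.linspace(-5.0, 15.0, 4001)
# Controlled-calibration profile log likelihood (sigma^2 profiled out).
controlled = np.array([-(n + m) / 2 * np.log(rss(xi)) for xi in grid])
# Random-x augmentation: the Student-like kernel linking xi to the x sample.
augmented = controlled - (n + 1) / 2 * np.log(1 + 1 / n + (grid - xbar) ** 2 / sxx)

xi_C = xbar + (z.mean() - ybar) / beta_hat   # classical estimator (3)
xi_I = xbar + delta_hat * (z.mean() - ybar)  # inverse estimator (4)
```

For m = 1 the augmented profile likelihood is exactly maximized at ξ̂_I, so the grid maximum agrees with (4) up to the grid resolution.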
The factor k in (6) is unity with no prediction replicates and is monotone increasing in m, with a limiting value as m → ∞ of 1 + s²/(β̂²s_xx) = 1/r², when as a consequence of (5) the estimator reverts to ξ̂_C. Similar relationships between the estimators may be explored by adopting a Bayes approach; see Brown (1982) for the multivariate general linear model case. In the published discussion of this paper (i) T. Fearn emphasizes that the design for calibration will typically choose or sub-select x's that span the range of likely future values of ξ; (ii) J. Copas emphasises the essential inferential need for a link between ξ and the training x-sample which underpins inverse estimation.
Likelihood ratio confidence intervals may be readily specified. For controlled calibration the profile likelihood is a monotone function of the pivotal statistic

    t(ξ) = (Z̄ − α̂ − β̂ξ)/{σ̂ c_{m,n}(ξ)},   (7)

with c²_{m,n}(ξ) = 1/m + 1/n + (ξ − x̄)²/s_xx and σ̂ the pooled residual standard deviation; t(ξ) has a Student t distribution, so that the likelihood ratio intervals coincide with those obtained by inverting t(ξ).
2 Multivariate Calibration

Simple single variable versus single variable calibration as featured above provides insights into the essential differences between classical and inverse calibration. It is however the case that most modern instruments are multivariate in Y and often multiple in x. So-called second-order or hyphenated instruments go one step further and deal with a matrix Y for each x (Bro, 1996; Smilde, 1997; Linder and Sundberg, 1998).
In the multivariate general linear model, model (1) with prediction model (2) may be extended to

    Y = 1α′ + XB + E,   (8)

    Z = 1α′ + 1ξ′B + E*,   (9)

where Y (n × q), E (n × q), Z (m × q), E* (m × q) are random matrices, X (n × p) is a matrix of fixed constants, as are the vectors of units, 1, which are respectively n × 1 and m × 1, and ξ is a p × 1 vector of p unknowns. Rows of the errors E and E* are independent multivariate normal with mean the zero vector and covariance matrix the q × q unknown matrix Γ. This model was developed in Brown (1982). In the context of the municipal waste water example we have n = 125 observations, q = 316 UV-absorbances, and there is just one component (nitrate), p = 1, of waste water of interest.
The profile likelihood for ξ from (8), (9) is given as

    [ c²_{m,n}(ξ) / { c²_{m,n}(ξ) + [Z̄ − ȳ − B̂′(ξ − x̄)]′ S₊⁻¹ [Z̄ − ȳ − B̂′(ξ − x̄)] } ]^{(n+m)/2}.   (10)

Here S₊ is the q × q residual sum of products matrix augmented by the m − 1 replicate information in (9), and

    c²_{m,n}(ξ) = 1/m + 1/n + (ξ − x̄)′ S_XX⁻¹ (ξ − x̄),   (11)

with S_XX the p × p sum of mean-corrected products matrix. The simple estimator analogous to (3) is the generalized least squares estimator

    ξ̂_GLS − x̄ = (B̂S₊⁻¹B̂′)⁻¹ B̂S₊⁻¹ (Z̄ − ȳ).
This does not maximize (10) unless p = q (Brown and Sundberg, 1987). When there are relatively few observations, S₊ is not of full rank; generalized inverse versions are possible when n < p + q but are somewhat unstable, see Sundberg (1996), who compares the precision of a range of estimators and discusses putting reduced parametric factor structure into Γ, or Bayesian forms of regularization or shrinkage.
(See also the EofE entry on SHRINKAGE REGRESSION.)
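As an illustration of the multivariate formulas, the following sketch (simulated data with assumed dimensions n = 40, q = 4, p = 1 and illustrative parameter values; numpy assumed) forms B̂ and the augmented residual product matrix S₊, then evaluates ξ̂_GLS:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dimensions: n training samples, q responses, p = 1 component,
# m replicates at the unknown xi. All parameter values are assumptions.
n, m, q, p = 40, 3, 4, 1
alpha = np.array([1.0, 0.5, -0.3, 2.0])   # q-vector of intercepts
B = np.array([[0.9, -0.5, 0.3, 1.2]])     # p x q coefficient matrix
X = rng.uniform(0.0, 10.0, size=(n, p))
Y = np.ones((n, 1)) * alpha + X @ B + rng.normal(scale=0.2, size=(n, q))

xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - xbar, Y - ybar

B_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)   # p x q least squares fit
S = (Yc - Xc @ B_hat).T @ (Yc - Xc @ B_hat)     # q x q residual products

# m replicate observations Z at the unknown xi, model (9).
xi_true = np.array([6.5])
Z = np.ones((m, 1)) * (alpha + xi_true @ B) + rng.normal(scale=0.2, size=(m, q))
zbar = Z.mean(axis=0)

# Augment S with the mean-corrected replicate sum of products (m - 1 df).
S_plus = S + (Z - zbar).T @ (Z - zbar)
S_inv = np.linalg.inv(S_plus)

# Generalized least squares estimator, the multivariate analogue of (3).
xi_gls = xbar + np.linalg.solve(B_hat @ S_inv @ B_hat.T,
                                B_hat @ S_inv @ (zbar - ybar))
```

With n comfortably larger than p + q here, S₊ is of full rank and the plain inverse is safe; in the rank-deficient regime the regularized or factor-structured alternatives discussed above would be needed instead.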
Another strategy revolves around response selection; see for example Spiegelman et al. (1996). Variable selection is usually easiest to implement in the regression of X on Y; see Brown et al. (1998) for fully multivariate Bayesian stochastic search variable selection. A conflict can arise whereby some components of the future Y suggest a quite different value of ξ to other components: there is no sufficiency reduction to iron this out, due to the curved nature of the exponential family, and likelihood based confidence regions will reflect this conflict by becoming flatter.
Prediction from the regression of X on Y may be formulated in a similar way to the univariate profile likelihood approach. The profile likelihood formed by augmenting models (8), (9) with the assumption of some random p-vectors X_i, taken to the limit of all n random, modifies (10) by a factor of {c²_{1,n}(ξ)}^{−(n+1)/2}, a special case of (11), and reverts to the predictive likelihood from the regression of X on Y. Confidence procedures are just obtained from multivariate predictive Student distributions.
The regressions of X on Y and Y on X lead to predictors that, after a suitable transformation, are simply proportional, with the squared canonical correlations as proportionality factors (Brown, 1982), extending the simple univariate relationship (5). Chemometricians have often preferred to formulate models symmetrically in X and Y, and have eschewed the statistician's focus on random and controlled calibration in favour of algorithmic methods such as partial least squares; see particularly the discussion of Sundberg's paper (1999) by H. Martens and S. Wold, whose arguments are persuasive, but also see Sundberg's response. For a general latent variable approach see Burnham et al. (1999).
3 Non-linear Calibration

New features arise if the relationship between Y and X is non-linear even in the univariate case. Let us focus on controlled calibration. In the case of multivariate Y it may be possible to select a subset of Y variables for which the relationship is linear without discarding too much information (Brown, 1993, chapter 7). If we do not have a surplus of variables and want to retain all, then two situations may be distinguished.

(i) The regression of Y on x is linear in the parameters but non-linear in x.

(ii) The regression of Y on x is non-linear in the parameters as well as x.

In the first case, although the linear model applies to the calibrating data, prediction is still non-linear. An example is polynomial regression. In the second case standard inferential methods still apply (likelihood and Bayes) but linearity in the model parameters cannot be exploited. For case (i), the profile likelihood is still given by (10), (11) if we replace ξ by h(ξ), a p × 1 vector of functions of the p* < p components in ξ. Lundberg and de Mare (1980) develop the univariate polynomial x case and note that if the mean is monotone in x over the range of x and possible ξ, then a single unique consistent estimate ξ̂ exists, which is indeed the MLE. Their asymptotic confidence region is just obtained by linearising the mean function about this estimate. In general, provided the x relationship is not too curved, linearization about the maximum likelihood value of ξ can provide straightforward confidence regions; see also Oman (1999).
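Case (i) can be illustrated with a quadratic mean function that is monotone over the design range: the training fit is ordinary least squares in the basis (1, x, x²), while prediction inverts the fitted mean numerically. The sketch below uses simulated data and a simple bisection; all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Case (i): mean linear in the parameters but quadratic in x, and
# monotone increasing over the design range so inversion is unique.
n = 50
x = np.linspace(0.0, 4.0, n)
y = 1.0 + 2.0 * x + 0.25 * x ** 2 + rng.normal(scale=0.1, size=n)

# Training fit is ordinary least squares in the basis h(x) = (1, x, x^2).
H = np.column_stack([np.ones(n), x, x ** 2])
theta = np.linalg.lstsq(H, y, rcond=None)[0]

def mean_fn(xi):
    """Fitted mean response at xi."""
    return theta[0] + theta[1] * xi + theta[2] * xi ** 2

def predict(z, lo=0.0, hi=4.0, tol=1e-10):
    """Invert the fitted monotone mean by bisection: prediction is
    non-linear even though the training fit was linear in theta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_fn(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z = 1.0 + 2.0 * 2.5 + 0.25 * 2.5 ** 2   # noise-free future response at xi = 2.5
xi_hat = predict(z)
```

Monotonicity over [0, 4] is what makes the bisection deliver the single consistent estimate noted by Lundberg and de Mare (1980); outside such a range the inversion could have multiple solutions.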
Clarke (1992) provides a profile likelihood solution for case (ii) with a single unknown component of x, and mean functions monotone in x. As with the earlier discussion this just involves writing down the likelihood function from both training and prediction for given ξ and maximising over the other unknown model parameters. Simple confidence procedures are available when approximating by linearization of the mean function, both in ξ and the model parameters, about their maximum likelihood estimates. There remains the possibility of conflict between the estimates of ξ on different response axes (Sundberg, 1999). The example given of rhinoceros data with 2 responses (anterior and posterior horn lengths) and a single x (age) could benefit from injecting the exchangeability assumption of ξ and the x-sample. It is analysed by Bayesian simulation techniques by du Plessis and van der Merwe (1996).
References
Brown, P. J., Vannucci, M. and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B, 60, 627–641.

Burnham, A. J., MacGregor, J. F. and Viveros, R. (1999). A statistical framework for multivariate latent variable methods based on maximum likelihood. Journal of Chemometrics, 13, 49–65.
Kalbfleisch, J. D. and Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. Journal of the Royal Statistical Society, Series B, 32, 175–208.

Karlsson, M., Karlberg, B. and Olsson, R. J. O. (1995). Determination of nitrate in municipal waste water by UV spectroscopy. Analytica Chimica Acta, 312, 107–113.

Linder, M. and Sundberg, R. (1998). Second order calibration: bilinear least squares regression and a simple alternative. Chemometrics and Intelligent Laboratory Systems, 42, 159–178.

Lundberg, E. and de Mare, J. (1980). Interval estimates in the spectroscopic calibration problem. Scandinavian Journal of Statistics, 7, 40–42.

Martens, H. and Naes, T. (1989). Multivariate Calibration. Wiley, Chichester.

Oman, S. D. (1999). Multivariate calibration. In Multivariate Analysis, Design of Experiments, and Survey Sampling (ed. S. Ghosh), pp. 265–299. Marcel Dekker: New York, Basel.

Smilde, A. K. (1997). Comments on multilinear PLS. Journal of Chemometrics, 11, 367–377.

Spiegelman, C., Wang, S. and Denham, M. (1996). Asymptotic minimax calibration estimates. Chemometrics and Intelligent Laboratory Systems, 32, 257–263.

Srivastava, M. S. (1995). Comparison of the inverse and classical estimators in multi-univariate linear calibration. Communications in Statistics, 24, 2753–2767.

Sundberg, R. (1996). The precision of the estimated generalized least squares estimator in multivariate calibration. Scandinavian Journal of Statistics, 23, 257–274.

Sundberg, R. (1999). Multivariate calibration – direct and indirect regression methodology. Scandinavian Journal of Statistics, 26, 161–207.