
INVERSE PREDICTION

Philip J. Brown
University of Kent at Canterbury, UK

Inverse prediction concerns predicting one set of measurements from another set when the direction of causation, and possibly of error, operates in the opposite direction. An example is that of calibration, in which it is desired to use a cheap and quick but error-prone measurement ($Y$) to predict the true amount of a constituent ($x$), which can itself be measured accurately under laboratory conditions but at much greater effort or cost. After taking data on samples using both measurements (the training or learning data) it is desired to use the cheap measurement in future.

The paradigm extends to a multivariate response $Y$ and composition vector $x$. Relationships may be linear or non-linear. Applications range from simple instrument calibration, to the use of spectroscopic (multivariate) responses for prediction of composition, to image reconstruction from noisy data. An example that will serve to focus ideas is given by Karlsson et al. (1995) on the determination of nitrate in municipal waste water by UV spectroscopy, and discussed in the review of multivariate calibration by Sundberg (1999).

If parametric statistical analysis is employed then there are well developed estimation procedures, be they classical sampling, likelihood based, or the increasingly influential Bayesian approach. In this article we will review these. We will also comment on the use of the prediction of $x$ directly and the symmetric approach to $x, y$ embodied in latent variable methods.

Whatever statistical approach is adopted there is a need to keep the particular features of the application clearly in focus. Some preprocessing of the data may be necessary to reduce unwanted confounding. It may be that special features need to be modelled. Graphical diagnostics are invaluable in detecting important departures from assumptions. For an earlier discussion of some of these issues see Martens and Næs (1989).

1 Simple linear calibration

Suppose we have a training sample of $n$ observations $(x_i, y_i)$, $i = 1, \dots, n$. In controlled calibration the data would have arisen from a designed set of $x$ values or standards:

$$Y_i = \alpha + \beta x_i + \epsilon_i. \qquad (1)$$

Here the $\epsilon_i$ are independent and identically normally distributed with zero mean and variance $\sigma^2$. In random calibration the set of $x$ values will be a sample of those generally encountered. Hybrid designs might result from purposely sub-sampling such a random selection, perhaps to increase the variance by thinning out central values so as to increase the accuracy of estimation in the conditional distribution of $Y$ given $x$. Some more extreme $x$ values might even be added to the sample to extend the range of validity.

The statistician's standard approach to controlled calibration has a long pedigree stemming from Eisenhart (1939), and is to regress $Y$ on $x$ and then invert the estimated relationship to predict $x$ in future. If $x$ is random, or the $x$ values can be safely assumed to be typical of those likely to arise in future, then to predict $x$ it is sensible to regress $X$, random, on $y$, treated as conditionally fixed. Under an extension of the Gauss-Markov theorem for quadratic prediction loss (Brown, 1993, section 3.3), this is the best linear unbiased predictor. It loses this optimality when there is one future $x$ to predict but $m$ replicates in $Y$ at this future value, as is explored below.

Let us complete the model for the training data with a similar one for the prediction data in the linear model for controlled $x$. Imagine a single unknown $x$ to predict. We denote it by $\xi$ to emphasize that it is unknown and needs to be predicted. At this value of $\xi$ we have observed $m$ replicates of $Y$. To further emphasize that these $Y$ values are for prediction and may just be different from the training data $Y$ carefully obtained under laboratory conditions, we use $Z$ in place of $Y$. The prediction model is then

$$Z_j = \alpha + \beta\xi + \epsilon_j, \quad j = 1, \dots, m. \qquad (2)$$

Here it is generally assumed that the variance of the normally distributed error $\epsilon_j$ is $\sigma^2$, identically distributed to the errors in (1). Departures from these assumptions would include the possibility of a larger error variance than $\sigma^2$, and correlated replicates. We have already indicated that such departures are worth checking. The classical estimator from the regression of $Y$ or $Z$ on $x$ or $\xi$ is

$$\hat\xi_C = (\bar Z - \hat\alpha)/\hat\beta \qquad (3)$$

where $\bar Z = \sum_1^m Z_j/m$ and the least squares estimates are $\hat\alpha = \bar y - \hat\beta \bar x$, $\hat\beta = s_{xy}/s_{xx}$, with $s_{xy} = \sum_1^n (x_i - \bar x)(y_i - \bar y)$, $s_{xx} = \sum_1^n (x_i - \bar x)^2$. In terms of departures from training means eqn (3) becomes $(\hat\xi_C - \bar x) = (\bar Z - \bar y)/\hat\beta$. The inverse predictor for $m = 1$ from the regression of $X$ on $y$ is simply

$$\hat\xi_I = \hat\alpha^* + \hat\beta^* Z \qquad (4)$$

or, in terms of departures from training means, $(\hat\xi_I - \bar x) = \hat\beta^*(Z - \bar y)$, with least squares estimates $\hat\alpha^* = \bar x - \hat\beta^* \bar y$, $\hat\beta^* = s_{xy}/s_{yy}$ and $s_{yy} = \sum_1^n (y_i - \bar y)^2$.
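As a concrete illustration, the following minimal Python sketch computes both estimators on simulated data; the parameter values and variable names are illustrative assumptions, not taken from the article.

```python
# Minimal sketch of classical vs inverse prediction in simple linear
# calibration, on simulated training data (illustrative values throughout).
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = np.linspace(0.0, 10.0, n)                    # controlled standards
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.3, n)      # responses from model (1)

xbar, ybar = x.mean(), y.mean()
s_xx = np.sum((x - xbar) ** 2)
s_xy = np.sum((x - xbar) * (y - ybar))
s_yy = np.sum((y - ybar) ** 2)
beta_hat = s_xy / s_xx                           # slope of Y on x
beta_star = s_xy / s_yy                          # slope of X on y

z = 6.2                                          # one future response, m = 1
xi_C = xbar + (z - ybar) / beta_hat              # classical estimator (3)
xi_I = xbar + beta_star * (z - ybar)             # inverse estimator (4)

r2 = s_xy ** 2 / (s_xx * s_yy)                   # squared correlation
assert np.isclose(xi_I - xbar, r2 * (xi_C - xbar))   # relationship (5) below
print(xi_C, xi_I, r2)
```

The closing assertion checks numerically the shrinkage relationship (5) given next.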
The two regression lines are related by

$$(\hat\xi_I - \bar x) = r^2 (\hat\xi_C - \bar x) \qquad (5)$$

where $r^2 = s_{xy}^2/(s_{xx} s_{yy})$ is the squared correlation. Since $r^2 \le 1$, the inverse estimator is closer to the mean of $x$ than the classical estimator. With a perfect correlation of unity the estimators will coincide; otherwise the difference between the two depends on the strength of the relationship through $r^2$.
The shrinkage of the inverse estimator towards the mean relative to the classical estimator is reflected in the assumption of a common normal distribution for the $x$'s and $\xi$ in alternative derivations of the inverse estimator. All approaches to inference which properly account for the uncertainty use the two sources of uncertainty: in the training data through model (1) with both $x$ and $y$ data, and through the future observation(s) with just $Z$ data in model (2). The Bayes approach to this amalgamation stems from Hoadley (1970) and was extended by Brown (1982) to the multivariate general linear model. Broadly it uses the training data to derive the posterior distribution of the regression parameters and then integrates these out from the prediction model for $Z$ to form a predictive density of the only remaining unknown, $\xi$, which combines with the prior for $\xi$ to form the posterior. When there are prediction replicates these will add information (residual sum of squares $s_0^2$) about $\sigma^2$, provided this can be safely assumed to be common to both prediction and training. In this case the typical form of posterior is

$$\pi(\xi \mid Z, Y, X) \propto f(Z \mid Y, X, s_0^2, \xi)\,\pi(\xi \mid X),$$

where for detailed assumptions see Brown (1993), chapter 5.
Here however we will tie together approaches to univariate, multivariate and non-linear models through the profile likelihood of $\xi$. This uses maximisation of likelihood rather than the integration of the Bayes approach. It is formed by maximising over the parameters $\alpha, \beta, \sigma^2$ the likelihood formed by combining the $n$ observations $(x_i, Y_i)$, $i = 1, \dots, n$ and $m$ `observations' $(\xi, Z_j)$, $j = 1, \dots, m$ for fixed $\xi$. The profile or relative likelihood (Kalbfleisch and Sprott, 1970) gives the maximum likelihood estimate of $\xi$ if it is further maximized over $\xi$, and the profile likelihood is often standardized to have an overall maximum of unity at the MLE. The profile likelihood itself gives the relative likelihood of different values of $\xi$. By taking all values of $\xi$ such that this profile likelihood is greater than some constant we generate likelihood ratio confidence intervals. For the controlled calibration situation the profile likelihood is proportional to

$$\max_{\alpha, \beta, \sigma^2} \Big[ \prod_{i=1}^{n} f(y_i \mid x_i, \alpha, \beta, \sigma^2) \prod_{j=1}^{m} f(z_j \mid \xi, \alpha, \beta, \sigma^2) \Big].$$

The explicit form is given later for the more general multivariate case in formula (10). This profile likelihood as a function of $\xi$ has a maximum at $\hat\xi_C$ given by (3), verifying its maximum likelihood credentials. However, if one assumes some of the $x$ values and $\xi$ are random normal, which one is free to do without affecting the distribution of $Y \mid x$, where all have the same unknown mean and variance, then a vital link between the unknown $\xi$ and the $x$ sample is established. The profile likelihood augmented by the normal $x_i, \xi$ data now shifts its peak towards the mean of the random $\xi$. In the case where all the $x$ values and $\xi$ form a $N(\mu, \tau^2)$ random sample, the profile likelihood is the controlled case relative likelihood multiplied by

$$\max_{\mu, \tau^2} \Big[ \prod_{i=1}^{n+1} f(x_i \mid \mu, \tau^2) \Big]$$

where $x_{n+1} = \xi$. This extra term is like the kernel of a Student distribution, $[c^2_{1,n}(\xi)]^{-(n+1)/2} = [1 + 1/n + (\xi - \bar x)^2/s_{xx}]^{-(n+1)/2}$, where $\bar x = \sum_1^n x_i/n$. Without prediction replicates, $m = 1$,
the combined term is maximized by $\hat\xi_I$ given by (4) (Brown, 1993, Section 2.5). When there are prediction replicates the maximum of the profile likelihood, let us call it $\hat\xi_I^*$, is not equal to $\hat\xi_I$ extended by $\bar Z$ replacing $Z$. The predictor $\hat\alpha^* + \hat\beta^* \bar Z$ is biased towards $\bar x$, as noted for example by Srivastava (1995) in simulations comparing classical and inverse predictors in multi-univariate linear calibration. A series expansion of the log-profile likelihood as a quadratic in $\xi$ gives a maximising value and approximate predictor $(\hat\xi_I^* - \bar x) = k\hat\beta^*(\bar Z - \bar y)$, with expansion factor $k$ given as

$$k = \frac{m\hat\beta^2 + m\,s^2/s_{xx}}{m\hat\beta^2 + \dfrac{(n+m)}{(n+1)}\,s_+^2/s_{xx}}, \qquad (6)$$

with $s^2$, $s_+^2$ residual sums of squares from the training data, and from the training data augmented with mean corrected sums of squares from the replicates, respectively. We reiterate that it is important here to check for departures from assumptions, especially differences between error variances in training and prediction, and correlated replicates.
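The following self-contained Python sketch, again on simulated data with illustrative values, computes $k$ from (6) and the corresponding replicate-adjusted inverse predictor.

```python
# Sketch of the expansion factor k in (6) and the replicate-adjusted
# inverse predictor; simulated, illustrative data throughout.
import numpy as np

rng = np.random.default_rng(0)
n, m = 25, 5
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.3, n)       # training data, model (1)
Z = 2.0 + 1.5 * 4.0 + rng.normal(0.0, 0.3, m)     # m replicates at true xi = 4

xbar, ybar, Zbar = x.mean(), y.mean(), Z.mean()
s_xx = np.sum((x - xbar) ** 2)
s_xy = np.sum((x - xbar) * (y - ybar))
s_yy = np.sum((y - ybar) ** 2)
beta_hat, beta_star = s_xy / s_xx, s_xy / s_yy

s2 = s_yy - s_xy ** 2 / s_xx                      # training residual sum of squares
s2_plus = s2 + np.sum((Z - Zbar) ** 2)            # augmented by replicate scatter

k = (m * beta_hat ** 2 + m * s2 / s_xx) / (
    m * beta_hat ** 2 + (n + m) / (n + 1) * s2_plus / s_xx)

xi_I_star = xbar + k * beta_star * (Zbar - ybar)  # shrinks less as m grows
print(k, xi_I_star)
```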
The factor $k$ given by (6) is unity with no prediction replicates and is monotone increasing in $m$, with a limiting value as $m \to \infty$ of $1 + s^2/(\hat\beta^2 s_{xx}) = 1/r^2$, when as a consequence of (5) the estimator reverts to $\hat\xi_C$. Similar relationships between the estimators may be explored by adopting a Bayes approach; see Brown (1982) for the multivariate general linear model case. In the published discussion of this paper (i) T. Fearn emphasizes that the design for calibration will typically choose or sub-select $x$'s that span the range of likely future values of $\xi$; (ii) J. Copas emphasizes the essential inferential need for a link between $\xi$ and the training $x$-sample which underpins inverse estimation.
Likelihood ratio confidence intervals may be readily specified. For controlled calibration the profile likelihood is a monotone function of the pivotal statistic

$$[\bar Z - \bar y - \hat\beta(\xi - \bar x)]\,/\,[\hat\sigma(1/m + 1/n + (\xi - \bar x)^2/s_{xx})^{1/2}] \qquad (7)$$

with $(n + m - 3)\hat\sigma^2 = s_+^2$, and this has a Student $t$-distribution on $(n + m - 3)$ degrees of freedom. Notice here that uncertainty in $\xi$ persists even if $n \to \infty$, when the regression parameters become effectively known: the $\hat\sigma/\sqrt{m}$ term in the denominator of (7) remains and only reduces to zero as $m \to \infty$.
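A likelihood ratio confidence interval can be sketched in Python by inverting (7) over a grid of candidate $\xi$ values, accepting those where the Student statistic is not significant; the simulated data and names below are illustrative assumptions.

```python
# Sketch of a calibration confidence interval obtained by inverting the
# pivotal Student statistic (7); simulated, illustrative data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m = 25, 5
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.3, n)
Z = 2.0 + 1.5 * 4.0 + rng.normal(0.0, 0.3, m)     # replicates at true xi = 4

xbar, ybar, Zbar = x.mean(), y.mean(), Z.mean()
s_xx = np.sum((x - xbar) ** 2)
beta_hat = np.sum((x - xbar) * (y - ybar)) / s_xx
resid = y - ybar - beta_hat * (x - xbar)
s2_plus = np.sum(resid ** 2) + np.sum((Z - Zbar) ** 2)
sigma2_hat = s2_plus / (n + m - 3)

xi = np.linspace(x.min() - 2.0, x.max() + 2.0, 2001)   # candidate values
t = (Zbar - ybar - beta_hat * (xi - xbar)) / np.sqrt(
    sigma2_hat * (1.0 / m + 1.0 / n + (xi - xbar) ** 2 / s_xx))

accepted = xi[np.abs(t) <= stats.t.ppf(0.975, df=n + m - 3)]
print(accepted.min(), accepted.max())   # 95% limits when the accepted set is an interval
```

When the calibration is weak the accepted set need not be a finite interval, which is one reason the grid formulation is safer than solving the quadratic boundary equation directly.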
In the inverse estimation context, a corresponding Student sampling or Bayes predictive distribution from the regression of $X$ on $y$ is readily available. It forms the basis for confidence intervals when there are no prediction replicates ($m = 1$), and may be extended through the profile likelihood (or Bayes posterior distribution) for prediction with replicates, $m > 1$.

2 Multivariate Calibration

Simple single variable versus single variable calibration as featured above provides insights into the essential differences between classical and inverse calibration. It is however the case that most modern instruments are multivariate in $Y$ and often multiple in $x$. So-called second-order or hyphenated instruments go one step further and deal with a matrix $Y$ for each $x$ (Bro, 1996; Smilde, 1997; Linder and Sundberg, 1998).

In the multivariate general linear model, model (1) with prediction model (2) may be extended to

$$Y = \mathbf{1}\alpha' + XB + E, \qquad (8)$$

$$Z = \mathbf{1}\alpha' + \mathbf{1}\xi' B + E^*, \qquad (9)$$

where $Y\,(n \times q)$, $E\,(n \times q)$, $Z\,(m \times q)$, $E^*\,(m \times q)$ are random matrices, $X\,(n \times p)$ is a matrix of fixed constants, as are the vectors of units, $\mathbf{1}$, which are respectively $n \times 1$ and $m \times 1$; $\xi$ is a $p \times 1$ vector of $p$ unknowns. Rows of the errors $E$ and $E^*$ are independent multivariate normal with mean the zero vector and covariance matrix the $q \times q$ unknown matrix $\Sigma$. This model was developed in Brown (1982). In the context of the municipal waste water example we have $n = 125$ observations, $q = 316$ UV-absorbances, and there is just one component (nitrate), $p = 1$, of waste water of interest.
The profile likelihood for $\xi$ from (8), (9) is given as

$$\Big[ c^2_{m,n}(\xi) \,/\, \big\{ c^2_{m,n}(\xi) + [\bar Z - \bar y - \hat B'(\xi - \bar x)]' S_+^{-1} [\bar Z - \bar y - \hat B'(\xi - \bar x)] \big\} \Big]^{\frac{n+m}{2}}. \qquad (10)$$

Here $S_+$ is the $q \times q$ residual sum of products matrix augmented by the $m - 1$ replicate information in (9), and

$$c^2_{m,n}(\xi) = [1/m + 1/n + (\xi - \bar x)' S_{XX}^{-1} (\xi - \bar x)] \qquad (11)$$

with $S_{XX}$ the $p \times p$ sum of mean corrected products matrix. The simple estimator analogous to (3) is the generalized least squares estimator

$$\hat\xi_{GLS} - \bar x = (\hat B S_+^{-1} \hat B')^{-1} \hat B S_+^{-1} (\bar Z - \bar y).$$

This does not maximize (10) unless $p = q$ (Brown and Sundberg, 1987). When there are relatively few observations, $S_+$ is not of full rank, and generalized inverse versions are possible if $n \ge p + q$ but are somewhat unstable; see Sundberg (1996), who compares the precision of a range of estimators and discusses putting reduced parametric factor structure into $\Sigma$, or Bayesian forms of regularization or shrinkage.
(See also the EofE entry on SHRINKAGE REGRESSION.)
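The pieces above can be assembled in a short Python sketch for a single component ($p = 1$): the generalized least squares predictor and a grid evaluation of the profile likelihood (10). The dimensions, parameter values and names are illustrative assumptions, not those of the waste water example.

```python
# Sketch of the GLS predictor and profile likelihood (10) for p = 1;
# simulated, illustrative data (small q for readability).
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 40, 3, 3
alpha = np.array([1.0, 0.5, -0.2])
B = np.array([[1.2, 0.8, 0.4]])                   # p x q coefficients, p = 1

x = rng.uniform(0.0, 10.0, (n, 1))
Y = alpha + x @ B + rng.normal(0.0, 0.3, (n, q))  # training data, model (8)
xi_true = 4.0
Z = alpha + xi_true * B + rng.normal(0.0, 0.3, (m, q))  # replicate rows, model (9)

xbar, ybar, Zbar = x.mean(), Y.mean(axis=0), Z.mean(axis=0)
Xc, Yc = x - xbar, Y - ybar
S_XX = Xc.T @ Xc                                  # p x p corrected products
B_hat = np.linalg.solve(S_XX, Xc.T @ Yc)          # p x q least squares estimate
resid = Yc - Xc @ B_hat
S_plus = resid.T @ resid + (Z - Zbar).T @ (Z - Zbar)  # augmented residual products
S_inv = np.linalg.inv(S_plus)

# GLS predictor: xi - xbar = (B S+^-1 B')^-1 B S+^-1 (Zbar - ybar)
xi_gls = xbar + np.linalg.solve(B_hat @ S_inv @ B_hat.T,
                                B_hat @ S_inv @ (Zbar - ybar)).item()

def profile(xi):
    """Profile likelihood (10) at scalar xi, up to proportionality."""
    d = Zbar - ybar - (B_hat.T @ np.atleast_1d(xi - xbar)).ravel()
    c2 = 1.0 / m + 1.0 / n + (xi - xbar) ** 2 / S_XX[0, 0]
    return (c2 / (c2 + d @ S_inv @ d)) ** ((n + m) / 2.0)

grid = np.linspace(0.0, 10.0, 1001)
vals = np.array([profile(g) for g in grid])
print(xi_gls, grid[vals.argmax()])                # GLS vs profile maximiser
```

With $p = 1 < q$ the two printed values generally differ slightly, illustrating that the GLS estimator does not maximize (10).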
Another strategy revolves around response selection; see for example Spiegelman et al. (1996). Variable selection is usually easiest to implement in the regression of $X$ on $Y$; see Brown et al. (1998) for fully multivariate Bayesian stochastic search variable selection employing Markov chain Monte Carlo.


For a review of confidence regions in controlled calibration see Oman (1999). As with the univariate case, uncertainty in $\xi$ can only go to zero as the training sample size $n$ and the number of prediction replicates $m$ both become large. When, as is usual, the number of responses $q$ is much greater than the number of components $p$ of $\xi$ to be predicted, then conflict can arise whereby some components of the future $Y$ suggest a quite different value of $\xi$ to other components: there is no sufficiency reduction to iron this out, due to the curved nature of the exponential family, and likelihood based confidence regions will reflect this conflict by becoming flatter.
Prediction from the regression of $X$ on $Y$ may be formulated in a similar way to the univariate profile likelihood approach. The profile likelihood formed by augmenting models (8), (9) with the assumption of some random $p$-vectors $X_i$, taken to the limit of all $n$ random, modifies (10) by a factor of $[c^2_{1,n}(\xi)]^{-\frac{n+1}{2}}$, a special case of (11), and reverts to the predictive likelihood from the regression of $X$ on $Y$. Confidence procedures are obtained directly from multivariate predictive Student distributions.

The regressions of $X$ on $Y$ and of $Y$ on $X$ lead to $\xi$ predictors that after a suitable transformation are simply proportional, with the squared canonical correlations as proportionality factors (Brown, 1982), extending the simple univariate relationship (5). Chemometricians have often preferred to formulate models symmetrically in $X$ and $Y$, and have eschewed statisticians' focus on random and controlled calibration in favour of algorithmic methods such as partial least squares; see particularly the discussion of Sundberg's paper (1999) by H. Martens and S. Wold, whose arguments are persuasive, but also see Sundberg's response. For a general latent variable approach see Burnham et al. (1999).

3 Non-linear calibration

New features arise if the relationship between $Y$ and $X$ is non-linear, even in the univariate case. Let us focus on controlled calibration. In the case of multivariate $Y$ it may be possible to select a subset of $Y$ variables for which the relationship is linear without discarding too much information (Brown, 1993, chapter 7). If we do not have a surplus of variables and want to retain all, then two situations may be distinguished.

(i) The regression of $Y$ on $x$ is linear in the parameters but non-linear in $x$.

(ii) The regression of $Y$ on $x$ is non-linear in the parameters as well as in $x$.

In the first case, although the linear model applies to the calibrating data, prediction is still non-linear. An example is polynomial regression. In the second case standard inferential methods still apply (likelihood and Bayes) but linearity in the model parameters cannot be exploited. For case (i), the profile likelihood is still given by (10), (11) if we replace $\xi$ by $h(\xi)$, a $p \times 1$ vector of functions of the $p^* < p$ components in $\xi$. Lundberg and de Maré (1980) develop the univariate polynomial $x$ case and note that if the mean is monotone in $x$ over the range of $x$ and possible $\xi$, then a single unique consistent estimate $\hat\xi$ exists, which is indeed the MLE. Their asymptotic confidence region is obtained by linearising the mean function about this estimate. In general, provided the $x$ relationship is not too curved, linearization about the maximum likelihood value of $\xi$ can provide straightforward confidence regions; see also Oman (1999).
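To make case (i) concrete, here is a minimal Python sketch for a quadratic mean: with normal errors and common variance, maximising the profile likelihood for fixed $\xi$ is equivalent to minimising the residual sum of squares when the future pair $(\xi, z)$ is appended to the training fit. Data and names are illustrative assumptions.

```python
# Sketch of non-linear case (i): quadratic mean, linear in the parameters,
# profiled over candidate xi; simulated, illustrative data.
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = np.linspace(0.0, 5.0, n)
y = 1.0 + 0.8 * x + 0.3 * x ** 2 + rng.normal(0.0, 0.2, n)
z = 6.5                                           # one future response

H = np.column_stack([np.ones(n), x, x ** 2])      # design in h(x) = (1, x, x^2)

def rss(xi):
    """Residual SS of the fit augmented by the future 'observation' (xi, z)."""
    Ha = np.vstack([H, [1.0, xi, xi ** 2]])
    ya = np.append(y, z)
    theta = np.linalg.lstsq(Ha, ya, rcond=None)[0]
    return np.sum((ya - Ha @ theta) ** 2)

grid = np.linspace(0.0, 5.0, 501)                 # range where the mean is monotone
xi_hat = grid[np.argmin([rss(g) for g in grid])]
print(xi_hat)
```

Restricting the grid to a range over which the mean is monotone reflects the uniqueness condition of Lundberg and de Maré (1980).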
Clarke (1992) provides a profile likelihood solution for case (ii) with a single unknown component of $x$, and mean functions monotone in $x$. As with the earlier discussion, this just involves writing down the likelihood function from both training and prediction data for given $\xi$ and maximising over the other unknown model parameters. Simple confidence procedures are available when approximating by linearization of the mean function, both in $\xi$ and in the model parameters, about their maximum likelihood estimates. There remains the possibility of conflict between the estimates of $\xi$ on different response axes (Sundberg, 1999). The example given of rhinoceros data, with two responses (anterior and posterior horn lengths) and a single $x$ (age), could benefit from injecting the exchangeability assumption of $\xi$ and the $x$-sample. It is analysed by Bayesian simulation techniques by du Plessis and van der Merwe (1996).

References

Bro, R. (1996). Multiway calibration. Multilinear PLS. Journal of Chemometrics, 10, 47–61.

Brown, P. J. (1982). Multivariate calibration (with discussion). Journal of the Royal Statistical Society, B, 44, 287–321.

Brown, P. J. (1993). Measurement, Regression, and Calibration. Oxford: Clarendon Press.

Brown, P. J. and Sundberg, R. (1987). Confidence and conflict in multivariate calibration. Journal of the Royal Statistical Society, B, 49, 46–57.

Brown, P. J., Vannucci, M. and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, B, 60, 627–641.

Burnham, A. J., MacGregor, J. F. and Viveros, R. (1999). A statistical framework for multivariate latent variable methods based on maximum likelihood. Journal of Chemometrics, 13, 49–65.

Clarke, G. P. Y. (1992). Inverse estimates from a multiresponse model. Biometrics, 48, 1081–1094.

du Plessis, J. L. and van der Merwe, A. J. (1996). Bayesian calibration in the estimation of the age of rhinoceros. Annals of the Institute of Statistical Mathematics, 48, 17–28.

Eisenhart, C. (1939). The interpretation of certain regression methods and their use in biological and industrial research. Annals of Mathematical Statistics, 10, 162–186.

Hoadley, B. (1970). A Bayesian look at inverse regression. Journal of the American Statistical Association, 65, 356–369.

Kalbfleisch, J. D. and Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. Journal of the Royal Statistical Society, B, 32, 175–208.

Karlsson, M., Karlberg, B. and Olsson, R. J. O. (1995). Determination of nitrate in municipal waste water by UV spectroscopy. Analytica Chimica Acta, 312, 107–113.

Linder, M. and Sundberg, R. (1998). Second order calibration: bilinear least squares regression and a simple alternative. Chemometrics and Intelligent Laboratory Systems, 42, 159–178.

Lundberg, E. and de Maré, J. (1980). Interval estimates in the spectroscopic calibration problem. Scandinavian Journal of Statistics, 7, 40–42.

Martens, H. and Næs, T. (1989). Multivariate Calibration. Chichester: Wiley.

Oman, S. D. (1999). Multivariate calibration. In Multivariate Analysis, Design of Experiments, and Survey Sampling (ed. S. Ghosh), pp. 265–299. New York, Basel: Marcel Dekker.

Smilde, A. K. (1997). Comments on multilinear PLS. Journal of Chemometrics, 11, 367–377.

Spiegelman, C., Wang, S. and Denham, M. (1996). Asymptotic minimax calibration estimates. Chemometrics and Intelligent Laboratory Systems, 32, 257–263.

Srivastava, M. S. (1995). Comparison of the inverse and classical estimators in multi-univariate linear calibration. Communications in Statistics, 24, 2753–2767.

Sundberg, R. (1996). The precision of the estimated generalized least squares estimator in multivariate calibration. Scandinavian Journal of Statistics, 23, 257–274.

Sundberg, R. (1999). Multivariate calibration – direct and indirect regression methodology. Scandinavian Journal of Statistics, 26, 161–207.
