finite-dimensional distributions have a multivariate normal distribution. That is, for any choice of distinct values $t_1, \dots, t_k \in T$, the random vector $X = (X_{t_1}, \dots, X_{t_k})'$ has a multivariate normal distribution, which we write as $X \sim N(m, \Sigma)$.

[...]

... and call it the autocovariance function of the process. For stationary Gaussian processes $\{X_t\}$, we have

3. $X_t \sim N(\mu, \gamma(0))$ for all $t$.
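To make the definition concrete, here is a minimal numpy sketch of a finite-dimensional distribution: pick time points, build the mean vector and covariance matrix from a mean and an autocovariance function, and draw $X \sim N(m, \Sigma)$. The exponentially decaying autocovariance used here is purely an illustrative assumption, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(h, sigma2=1.0, rho=0.5):
    # Illustrative stationary autocovariance (assumed): gamma(h) = sigma2 * rho^|h|
    return sigma2 * rho ** abs(h)

mu = 2.0                                   # constant mean of the stationary process
t = np.array([1, 3, 4, 7, 10])             # distinct time points t_1, ..., t_k
m = np.full(t.size, mu)                    # mean vector m
Sigma = np.array([[gamma(ti - tj) for tj in t] for ti in t])  # covariance matrix

# The defining property: X = (X_{t_1}, ..., X_{t_k})' ~ N(m, Sigma).
X = rng.multivariate_normal(m, Sigma)
print(X)    # one realization; each margin satisfies X_t ~ N(mu, gamma(0))
```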
An autocovariance function $\gamma(\cdot)$ has the properties:

1. $\gamma(0) \ge 0$,
2. $|\gamma(h)| \le \gamma(0)$ for all $h$,
3. $\gamma(h) = \gamma(-h)$, i.e. $\gamma(\cdot)$ is an even function.

Autocovariances have another fundamental property, namely that of non-negative definiteness:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i\, \gamma(t_i - t_j)\, a_j \ge 0 \qquad (4)$$

for all positive integers $n$, real numbers $a_1, \dots, a_n$, and $t_1, \dots, t_n \in T$.

[...]

$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} \qquad (6)$$

where $\{Z_t\}$ is a sequence of iid $N(0, \sigma^2)$ random variables. The autocovariance of $\{X_t\}$ has the form

$$\gamma(h) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+h} \qquad (7)$$
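As a numerical check on (7), the following sketch simulates a truncated version of the linear process (6) and compares sample autocovariances with the formula; the geometric weights $\psi_j = 0.6^j$ and the truncation at 50 terms are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0
psi = 0.6 ** np.arange(50)       # illustrative weights psi_j; truncates (6) at 50 terms
n = 100_000

# X_t = sum_j psi_j Z_{t-j} with Z_t iid N(0, sigma2)  -- equation (6), truncated
Z = rng.normal(0.0, np.sqrt(sigma2), n + psi.size)
X = np.convolve(Z, psi, mode="valid")

def gamma_theory(h):
    # gamma(h) = sigma2 * sum_j psi_j psi_{j+h}  -- equation (7)
    return sigma2 * (psi[: psi.size - h] @ psi[h:])

for h in range(4):
    sample = np.mean((X[: X.size - h] - X.mean()) * (X[h:] - X.mean()))
    print(h, round(gamma_theory(h), 4), round(sample, 4))
```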
The key observation here is that the best mean square error predictor of $X_1$ in terms of $X_2$ (i.e. the multivariate function $g(X_2)$ that minimizes $\mathrm{E}\|X_1 - g(X_2)\|^2$, where $\|\cdot\|$ is Euclidean distance) is $\mathrm{E}(X_1 \mid X_2) = m_{X_1|X_2}$, which is a linear function of $X_2$. Also, the covariance matrix of the prediction error, $\Sigma_{X_1|X_2}$, does not depend on the value of $X_2$. These results extend directly to the prediction problem for Gaussian processes.

Suppose $\{X_t, t = 1, 2, \dots\}$ is a stationary Gaussian process with mean $\mu$ and autocovariance function $\gamma(\cdot)$, and that based on the random vector consisting of the first $n$ observations, $X_n = (X_1, \dots, X_n)'$, we wish to predict the next observation $X_{n+1}$. The best one-step predictor is

$$\hat{X}_{n+1} = \mathrm{E}(X_{n+1} \mid X_1, \dots, X_n) = \mu + \phi_{n1}(X_n - \mu) + \cdots + \phi_{nn}(X_1 - \mu) \qquad (11)$$

where

$$\phi_n = (\phi_{n1}, \dots, \phi_{nn})' = \Gamma_n^{-1} \gamma_n \qquad (12)$$

$\Gamma_n$ is the covariance matrix of $X_n$, and $\gamma_n = (\gamma(1), \dots, \gamma(n))'$. The mean square error of prediction is given by

$$v_n = \gamma(0) - \gamma_n' \Gamma_n^{-1} \gamma_n \qquad (13)$$

These formulas assume that $\Gamma_n$ is nonsingular. If $\Gamma_n$ is singular, then there is a linear relationship among $X_1, \dots, X_n$ and the prediction problem can then be recast by choosing a generating subset of the observations whose covariance matrix is nonsingular.
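In code, (11)–(13) amount to a single linear solve. A sketch, again under an assumed AR(1)-type autocovariance (chosen so the answer can be eyeballed: only the first coefficient should be appreciably nonzero):

```python
import numpy as np

def gamma(h, rho=0.7):
    return rho ** abs(h)                     # illustrative autocovariance, gamma(0) = 1

n, mu = 5, 0.0
Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
gamma_n = np.array([gamma(h) for h in range(1, n + 1)])

phi_n = np.linalg.solve(Gamma_n, gamma_n)    # equation (12): phi_n = Gamma_n^{-1} gamma_n

X = np.array([0.3, -0.1, 0.5, 0.2, -0.4])    # hypothetical observed values of X_1..X_5
# Equation (11): phi_n1 multiplies X_n, ..., phi_nn multiplies X_1.
X_hat = mu + phi_n @ (X[::-1] - mu)

v_n = gamma(0) - gamma_n @ phi_n             # equation (13): mean square error
print(phi_n, X_hat, v_n)
```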
The coefficients $\phi_{nj}$ in the calculation of the one-step predictor (11) and the mean square error of prediction (13) can be computed recursively from the equations

$$\phi_{nn} = \left[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1,j}\, \gamma(n-j)\right] v_{n-1}^{-1}$$

$$\begin{pmatrix} \phi_{n,1} \\ \vdots \\ \phi_{n,n-1} \end{pmatrix} = \begin{pmatrix} \phi_{n-1,1} \\ \vdots \\ \phi_{n-1,n-1} \end{pmatrix} - \phi_{nn} \begin{pmatrix} \phi_{n-1,n-1} \\ \vdots \\ \phi_{n-1,1} \end{pmatrix}$$

$$v_n = v_{n-1} \left(1 - \phi_{nn}^2\right) \qquad (14)$$

with $v_0 = \gamma(0)$. The coefficients $\phi_{nn}$, $n = 1, 2, \dots$, are known as the partial autocorrelations and are a useful tool for model identification. The partial autocorrelation at lag $j$ is interpreted as the correlation between $X_1$ and $X_{j+1}$ after correcting for the intervening observations $X_2, \dots, X_j$. Specifically, $\phi_{jj}$ is the correlation of the two residuals obtained by regression of $X_1$ and $X_{j+1}$ on the intermediate observations $X_2, \dots, X_j$. Of particular interest is the relationship between $\phi_{nn}$ and the reduction in the one-step mean square error of prediction as the number of predictors is increased from $n - 1$ to $n$.
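The recursions (14) are the Durbin–Levinson algorithm and translate directly into code. This sketch returns all coefficient vectors $\phi_n$ and mean square errors $v_n$, so the partial autocorrelations can be read off as the last entries $\phi_{nn}$:

```python
import numpy as np

def durbin_levinson(g):
    """Recursions (14): g = [gamma(0), ..., gamma(N)].
    Returns (phis, v): phis[n-1] = (phi_n1, ..., phi_nn); v[n] is the
    one-step mean square prediction error, with v[0] = gamma(0)."""
    g = np.asarray(g, dtype=float)
    N = g.size - 1
    v = np.empty(N + 1)
    v[0] = g[0]
    phis, prev = [], np.empty(0)
    for n in range(1, N + 1):
        # phi_nn = [gamma(n) - sum_{j=1}^{n-1} phi_{n-1,j} gamma(n-j)] / v_{n-1}
        phinn = (g[n] - prev @ g[n - 1:0:-1]) / v[n - 1]
        # (phi_n1, ..., phi_n,n-1) = previous coefficients minus phi_nn * their reversal
        cur = np.concatenate([prev - phinn * prev[::-1], [phinn]])
        v[n] = v[n - 1] * (1.0 - phinn ** 2)     # v_n = v_{n-1}(1 - phi_nn^2)
        phis.append(cur)
        prev = cur
    return phis, v

# Illustrative AR(1)-type autocovariances: the PACF should vanish after lag 1.
phis, v = durbin_levinson([0.7 ** h for h in range(6)])
print([round(p[-1], 4) for p in phis])   # partial autocorrelations phi_nn
print(v)                                 # consistent with decomposition (16) below
```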
The one-step prediction error has the following decomposition in terms of the partial autocorrelation function:

$$v_n = \gamma(0)\,(1 - \phi_{11}^2) \cdots (1 - \phi_{nn}^2) \qquad (16)$$

For a Gaussian process, $X_{n+1} - \hat{X}_{n+1}$ is normally distributed with mean 0 and variance $v_n$. Thus,

$$\hat{X}_{n+1} \pm z_{1-\alpha/2}\, v_n^{1/2}$$

are $(1-\alpha)$ prediction bounds for $X_{n+1}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution. In what follows, $\hat{X}_1 = \mu$ and $\hat{X}_j = \mathrm{E}(X_j \mid X_1, \dots, X_{j-1})$ for $j \ge 2$ denote the one-step predictors.

If $\Gamma_n$ denotes the covariance matrix of $X_n$, which we assume is nonsingular, then the likelihood of $X_n$ is

$$L(\mu, \Gamma_n) = (2\pi)^{-n/2} (\det \Gamma_n)^{-1/2} \exp\left\{-\tfrac{1}{2}\,(X_n - \mu\mathbf{1})'\, \Gamma_n^{-1}\, (X_n - \mu\mathbf{1})\right\} \qquad (17)$$

where $\mathbf{1} = (1, \dots, 1)'$. Typically, $\Gamma_n$ will be expressible in terms of a finite number of unknown parameters, $\beta_1, \dots, \beta_r$, so that the maximum likelihood estimators of these parameters and $\mu$ are those values that maximize $L$ for the given dataset. Under mild regularity assumptions, the resulting maximum likelihood estimators are approximately normally distributed with covariance matrix given by the inverse of the Fisher information.
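For concreteness, here is a direct (and, for large $n$, expensive) evaluation of the log of (17) with numpy, using a log-determinant and a linear solve rather than an explicit inverse; the AR(1)-type autocovariance in the demo is an assumption for illustration only.

```python
import numpy as np

def gaussian_loglik(x, mu, gamma_fn):
    """log L(mu, Gamma_n) from (17):
    -n/2 log(2 pi) - 1/2 log det Gamma_n - 1/2 (x - mu 1)' Gamma_n^{-1} (x - mu 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    Gamma = np.array([[gamma_fn(i - j) for j in range(n)] for i in range(n)])
    dev = x - mu                                  # x - mu * 1
    _, logdet = np.linalg.slogdet(Gamma)          # stable log det Gamma_n
    quad = dev @ np.linalg.solve(Gamma, dev)      # quadratic form, no explicit inverse
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(2)
x = rng.normal(size=8)                            # stand-in data
print(gaussian_loglik(x, 0.0, lambda h: 0.5 ** abs(h)))
```

Both the determinant and the solve cost $O(n^3)$ in general, which is exactly the difficulty the next paragraph addresses.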
In most settings, direct closed-form maximization of $L$ with respect to the parameter set is not achievable. In order to maximize $L$ using numerical methods, either derivatives or repeated calculation of the function are required. For moderate to large sample sizes $n$, calculation of both the determinant of $\Gamma_n$ and the quadratic form in the exponential of $L$ can be difficult and time consuming. On the other hand, there is a useful representation of the likelihood in terms of the one-step prediction errors and their mean square errors. By the form of $\hat{X}_n$, we can write

$$X_n - \hat{X}_n = A_n (X_n - \mu\mathbf{1}) \qquad (18)$$

where $\hat{X}_n = (\hat{X}_1, \dots, \hat{X}_n)'$ and $A_n$ is a lower triangular square matrix with ones on the diagonal. Inverting this expression, we have

$$X_n - \mu\mathbf{1} = C_n (X_n - \hat{X}_n) \qquad (19)$$

with $C_n = A_n^{-1}$. It follows that $\det \Gamma_n = v_0 v_1 \cdots v_{n-1}$, so that the likelihood reduces to

$$L(\mu, \Gamma_n) = (2\pi)^{-n/2} (v_0 v_1 \cdots v_{n-1})^{-1/2} \exp\left\{-\frac{1}{2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{v_{j-1}}\right\} \qquad (22)$$
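A sketch of (22): compute each one-step predictor and its mean square error from (12)–(13) and accumulate the log-likelihood term by term. Solving a fresh system at every step is the slow-but-transparent route; the fast recursive schemes mentioned next are the practical choice.

```python
import numpy as np

def innovations_loglik(x, mu, gamma_fn):
    """log of (22), built from one-step prediction errors x_j - xhat_j
    and their mean square errors v_{j-1}."""
    x = np.asarray(x, dtype=float) - mu
    n = x.size
    g = np.array([gamma_fn(h) for h in range(n + 1)])
    loglik = 0.0
    for j in range(n):                   # predict x_{j+1} from x_1, ..., x_j
        if j == 0:
            xhat, v = 0.0, g[0]          # xhat_1 = mu (0 after centering), v_0 = gamma(0)
        else:
            Gam = np.array([[g[abs(a - b)] for b in range(j)] for a in range(j)])
            phi = np.linalg.solve(Gam, g[1:j + 1])      # equation (12)
            xhat = phi @ x[j - 1::-1]                   # equation (11), centered
            v = g[0] - g[1:j + 1] @ phi                 # equation (13)
        loglik += -0.5 * (np.log(2 * np.pi * v) + (x[j] - xhat) ** 2 / v)
    return loglik

rng = np.random.default_rng(2)
x = rng.normal(size=8)
print(innovations_loglik(x, 0.0, lambda h: 0.5 ** abs(h)))
```

On the same stand-in data this agrees with the direct evaluation of (17) above, while exposing the $v_j$ that appear in (22).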
The calculation of the one-step prediction errors and their mean square errors required in the computation of $L$ based on (22) can be simplified further for a variety of time series models such as ARMA processes. We illustrate this for an AR process.

Gaussian Likelihood for an AR(p) Process

If $\{X_t\}$ is the AR(p) process specified in (8), with mean $\mu$, then one can take advantage of the simple form for the one-step predictors and associated mean square errors. The likelihood becomes

$$L(\phi_1, \dots, \phi_p, \mu, \sigma^2) = (2\pi)^{-(n-p)/2} (\sigma^2)^{-(n-p)/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{j=p+1}^{n} (X_j - \hat{X}_j)^2\right\} \times (2\pi)^{-p/2} (v_0 v_1 \cdots v_{p-1})^{-1/2} \exp\left\{-\frac{1}{2} \sum_{j=1}^{p} \frac{(X_j - \hat{X}_j)^2}{v_{j-1}}\right\} \qquad (23)$$

where, for $j > p$, $\hat{X}_j = \mu + \phi_1 (X_{j-1} - \mu) + \cdots + \phi_p (X_{j-p} - \mu)$ are the one-step predictors. The likelihood is a product of two terms, the conditional density of $X_n$ given $X_p$ and the density of $X_p$. Often, just the conditional maximum likelihood estimator is computed, which is found by maximizing the first term. For the AR process, the conditional maximum likelihood estimator can be computed in closed form.
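Maximizing the first factor of (23) over $\phi_1, \dots, \phi_p$ and $\sigma^2$ is an ordinary least squares regression of each observation on its $p$ predecessors. A sketch (centering at the sample mean, which is a simplification rather than part of (23)):

```python
import numpy as np

def ar_conditional_mle(x, p):
    """Conditional MLE for an AR(p): maximize the first term of (23).
    Returns (phi_hat, sigma2_hat)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()                       # center at the sample mean (simplification)
    n = d.size
    # Row j of the design holds the p lagged values (d_{j-1}, ..., d_{j-p}).
    lags = np.column_stack([d[p - k:n - k] for k in range(1, p + 1)])
    y = d[p:]
    phi, *_ = np.linalg.lstsq(lags, y, rcond=None)   # least squares = conditional MLE
    resid = y - lags @ phi
    sigma2 = resid @ resid / (n - p)       # conditional MLE of sigma^2
    return phi, sigma2

# Check on a simulated AR(1) with phi_1 = 0.6:
rng = np.random.default_rng(3)
w = np.zeros(500)
for t in range(1, 500):
    w[t] = 0.6 * w[t - 1] + rng.normal()
print(ar_conditional_mle(w, 1))            # phi_hat near 0.6
```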
[Figure 1 Average maximum temperature, 1895–1993. Regression line is $16.83 + 0.00845t$.]
[Figure 2 QQ plot of the estimated innovations (September temperature) against quantiles of the standard normal.]
Example. This example consists of the average maximum temperature over the month of September for the years 1895–1993 in an area of the US whose vegetation is characterized as tundra. The time series $x_1, \dots, x_{99}$ is plotted in Figure 1. Here we investigate the possibility of the data exhibiting a slight linear trend. After inspecting the residuals from fitting a least squares regression line to the data, we entertain a time series model of the form

$$X_t = \beta_0 + \beta_1 t + W_t \qquad (24)$$

where $\{W_t\}$ is the Gaussian AR(1) process

$$W_t = \phi_1 W_{t-1} + Z_t \qquad (25)$$

and $\{Z_t\}$ is a sequence of iid $N(0, \sigma^2)$ random variables. After maximizing the Gaussian likelihood over the parameters $\beta_0$, $\beta_1$, $\phi_1$, and $\sigma^2$, we find that the maximum likelihood estimate of the mean function is $16.83 + 0.00845t$. The maximum likelihood estimates of $\phi_1$ and $\sigma^2$ are 0.1536 and 1.3061, respectively. The maximum likelihood estimates of $\beta_0$ and $\beta_1$ can be viewed as generalized least squares estimates assuming that the residual process follows the estimated AR(1) model. The resulting standard errors of these estimates are 0.27781 and 0.00482, respectively, which casts some doubt on the significance of a nonzero slope of the line. Without modeling the dependence in the residuals, the slope would have been deemed significant using classical inference procedures. By modeling the dependence in the residuals, the evidence in favor of a nonzero slope has diminished somewhat. The QQ plot of the estimated innovations is displayed in Figure 2. This plot shows that the AR(1) model is not far from being Gaussian. Further details about inference procedures for regression models with time series errors can be found in [2, Chapter 6].
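The fit described above can be reproduced in outline as follows: evaluate the exact Gaussian likelihood of the model (24)–(25) and maximize it numerically. Since the temperature series itself is not reproduced here, the sketch runs on simulated stand-in data (an assumption), so its estimates will differ from the values quoted above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 99
t = np.arange(1.0, n + 1)

# Simulated stand-in for the September temperature series x_1, ..., x_99.
w = np.zeros(n)
for i in range(1, n):
    w[i] = 0.15 * w[i - 1] + rng.normal(0.0, np.sqrt(1.3))
x = 16.8 + 0.008 * t + w

def negloglik(theta):
    """Exact -log likelihood of (24)-(25): X_t = b0 + b1 t + W_t with
    W_t = phi1 W_{t-1} + Z_t, Z_t iid N(0, s2)."""
    b0, b1, phi1, log_s2 = theta
    if abs(phi1) >= 1:
        return np.inf                        # enforce stationarity
    s2 = np.exp(log_s2)
    w = x - b0 - b1 * t                      # residuals from the trend line
    g0 = s2 / (1 - phi1 ** 2)                # var(W_1) under stationarity
    ll = -0.5 * (np.log(2 * np.pi * g0) + w[0] ** 2 / g0)
    e = w[1:] - phi1 * w[:-1]                # one-step innovations, variance s2
    ll += -0.5 * np.sum(np.log(2 * np.pi * s2) + e ** 2 / s2)
    return -ll

res = minimize(negloglik, x0=np.array([x.mean(), 0.0, 0.1, 0.0]),
               method="Nelder-Mead")
b0, b1, phi1, log_s2 = res.x
print(b0, b1, phi1, np.exp(log_s2))   # compare with 16.83, 0.00845, 0.1536, 1.3061
```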
References

[1] Brockwell, P.J. & Davis, R.A. (1991). Time Series: Theory and Methods, 2nd Edition, Springer-Verlag, New York.
[2] Brockwell, P.J. & Davis, R.A. (1996). Introduction to Time Series and Forecasting, Springer-Verlag, New York.
[3] Diggle, P.J., Liang, K.-Y. & Zeger, S.L. (1996). Analysis of Longitudinal Data, Clarendon Press, Oxford.
[4] Fahrmeir, L. & Tutz, G. (1994). Multivariate Statistical Modeling Based on Generalized Linear Models, Springer-Verlag, New York.
[5] Rosenblatt, M. (2000). Gaussian and Non-Gaussian Linear Time Series and Random Fields, Springer-Verlag, New York.

RICHARD A. DAVIS