
Automatica 42 (2006) 303–308

www.elsevier.com/locate/automatica

Brief paper

A new autocovariance least-squares method for estimating noise covariances

Brian J. Odelson^a, Murali R. Rajamani^b, James B. Rawlings^b

a BP Research and Technology, 150 W. Warrenville Rd., Naperville, IL 60563, USA
b Department of Chemical and Biological Engineering, University of Wisconsin–Madison, 1415 Engineering Drive, Madison, WI 53706, USA

Received 25 August 2003; received in revised form 18 April 2005; accepted 7 September 2005
Available online 23 November 2005

Abstract

Industrial implementation of model-based control methods, such as model predictive control, is often complicated by the lack of knowledge about the disturbances entering the system. In this paper, we present a new method (constrained ALS) to estimate the variances of the disturbances entering the process using routine operating data. A variety of methods have been proposed to solve this problem. Of note, we compare ALS to the classic approach presented in Mehra [(1970). On the identification of variances and adaptive Kalman filtering. IEEE Transactions on Automatic Control, 15(2), 175–184]. This classic method, and those based on it, use a three-step procedure to compute the covariances. The method presented in this paper is a one-step procedure, which yields covariance estimates with lower variance on all examples tested. The formulation used in this paper provides necessary and sufficient conditions for uniqueness of the estimated covariances, previously not available in the literature. We show that the estimated covariances are unbiased and converge to the true values with increasing sample size. The proposed method also guarantees positive semidefinite covariance estimates by adding constraints to the ALS problem. The resulting convex program can be solved efficiently.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Adaptive Kalman filter; Covariance estimation; Optimal estimation; Semidefinite programming; State estimation

1. Introduction

Model-based control methods, such as model predictive control (MPC), have become popular choices for solving difficult control problems. Higher performance, however, comes at a cost of greater required knowledge about the process being controlled. Expert knowledge is often required to properly commission and maintain the regulator, target calculator, and state estimator of MPC, for example. This paper addresses the required knowledge for the state estimator, and describes a technique with which ordinary closed-loop data may be used to remove some of the information burden from the user. Consider the usual linear, time-invariant, discrete-time model

$$x_{k+1} = Ax_k + Bu_k + Gw_k, \qquad y_k = Cx_k + v_k$$

in which $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $G \in \mathbb{R}^{n \times g}$, $C \in \mathbb{R}^{p \times n}$, and $\{w_k\}_{k=0}^{N_d}$ and $\{v_k\}_{k=0}^{N_d}$ are uncorrelated zero-mean Gaussian noise sequences with covariances $Q_w$ and $R_v$, respectively. The sequence $\{u_k\}_{k=0}^{N_d}$ is assumed to be a known input resulting from the actions of a controller. State estimates of the system are considered using a linear, time-invariant state estimator

$$\hat{x}_{k+1|k} = A\hat{x}_{k|k} + Bu_k, \qquad \hat{x}_{k|k} = \hat{x}_{k|k-1} + L[y_k - C\hat{x}_{k|k-1}]$$

in which $L$ is the estimator gain, which is not necessarily the optimal gain. We denote the residuals of the output equations $(y_k - C\hat{x}_{k|k-1})$ as the L-innovations when calculated using a state estimator with gain $L$.

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Marco Campi under the direction of Editor I. Petersen.
Corresponding author. Tel.: +1 608 263 5859; fax: +1 608 265 8794. E-mail addresses: odelbj@bp.com (B.J. Odelson), rajamani@bevo.che.wisc.edu (M.R. Rajamani), rawlings@engr.wisc.edu (J.B. Rawlings).

0005-1098/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.automatica.2005.09.006
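The model and L-innovations described above can be simulated directly. Below is a minimal numpy sketch; the matrices, gain, and input signal are all illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative system (all values assumed): 2 states, 1 input, 1 output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
G = np.eye(2)
C = np.array([[1.0, 0.0]])
Qw = 0.2 * np.eye(2)              # true process-noise covariance
Rv = np.array([[0.1]])            # true measurement-noise covariance
L = np.array([[0.3], [0.1]])      # some stable, not necessarily optimal, gain

Nd = 500
x = np.zeros((2, 1))
xhat = np.zeros((2, 1))           # xhat_{k|k-1}
Y = np.zeros(Nd)                  # L-innovations y_k - C xhat_{k|k-1}
for k in range(Nd):
    u = np.array([[np.sin(0.1 * k)]])                  # known controller input
    v = rng.multivariate_normal(np.zeros(1), Rv).reshape(1, 1)
    w = rng.multivariate_normal(np.zeros(2), Qw).reshape(2, 1)
    y = C @ x + v
    Y[k] = (y - C @ xhat).item()                       # L-innovation
    xhat_kk = xhat + L @ (y - C @ xhat)                # filtering update
    xhat = A @ xhat_kk + B @ u                         # one-step prediction
    x = A @ x + B @ u + G @ w
```

Note that only the routine data $\{u_k\}$, $\{y_k\}$ and the chosen gain $L$ are needed to form the L-innovations; the true covariances appear here only to generate synthetic data.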

In order to use the optimal filter, we need to know the covariances of the disturbances, $Q_w$ and $R_v$, from which we can calculate the optimal estimator's error covariance and the optimal Kalman filter gain. In most industrial process control applications, however, the covariances of the disturbances entering the process are not known. To address this requirement, estimation of the covariances from open-loop data has long been a subject in the field of adaptive filtering, and can be divided into four general categories: Bayesian (Alspach, 1974; Hilborn & Lainiotis, 1969), maximum likelihood (Bohlin, 1976; Kashyap, 1970), covariance matching (Myers & Tapley, 1976), and correlation techniques. Bayesian and maximum likelihood methods have fallen out of favor because of their sometimes excessive computation times. They may be well suited to a multi-model approach as in Averbuch, Itzikowitz, and Kapon (1991). Covariance matching is the computation of the covariances from the residuals of the state estimation problem. Covariance matching techniques have been shown to give biased estimates of the true covariances. The fourth category is correlation techniques, largely pioneered by Mehra (1970, 1972), Bélanger (1974) and Carew and Bélanger (1973), which we consider further in this paper.

2. Innovations based correlation techniques and the ALS estimator

With the standard linear state estimator, the state estimation error, $\varepsilon_k = x_k - \hat{x}_{k|k-1}$, evolves according to

$$\varepsilon_{k+1} = \underbrace{(A - ALC)}_{\bar{A}}\,\varepsilon_k + \underbrace{[G \quad -AL]}_{\bar{G}}\,\underbrace{\begin{bmatrix} w_k \\ v_k \end{bmatrix}}_{\bar{w}_k}. \tag{1}$$

We define the state-space model of the L-innovations as

$$\varepsilon_{k+1} = \bar{A}\varepsilon_k + \bar{G}\bar{w}_k, \qquad \mathcal{Y}_k = C\varepsilon_k + v_k$$

in which $\mathcal{Y}_k = y_k - C\hat{x}_{k|k-1}$, and we require subsequently that the system is detectable and the chosen filter is stable.

Assumption 1. $(A, C)$ is detectable.

Assumption 2. $\bar{A} = A - ALC$ is stable.

A stable filter gain $L$ exists because of Assumption 1. In this formulation, the state and sensor noises are correlated:

$$E[\bar{w}_k(\bar{w}_k)^T] \equiv \bar{Q}_w = \begin{bmatrix} Q_w & 0 \\ 0 & R_v \end{bmatrix}, \qquad E[\bar{w}_k v_k^T] = \begin{bmatrix} 0 \\ R_v \end{bmatrix}.$$

Effect of initial condition: Assume the initial estimate error is distributed with mean $m_0$ and covariance $P_0$,

$$E(\varepsilon_0) = m_0, \qquad \operatorname{cov}(\varepsilon_0) = P_0.$$

Propagating the estimate error through the state evolution equation gives an explicit formula for the mean

$$E(\varepsilon_k) = \bar{A}^k m_0$$

and the recursion for the covariance

$$\operatorname{cov}(\varepsilon_k) = P_k, \qquad P_j = \bar{A}P_{j-1}\bar{A}^T + \bar{G}\bar{Q}_w\bar{G}^T, \quad j = 1, \ldots, k.$$

Because the filter is stable (Assumption 2), as $k$ increases, the mean converges to zero and the covariance approaches a steady state given by the solution to the following Lyapunov equation

$$E(\varepsilon_k) \to 0, \qquad \operatorname{cov}(\varepsilon_k) \to P, \qquad P = \bar{A}P\bar{A}^T + \bar{G}\bar{Q}_w\bar{G}^T. \tag{2}$$

We therefore assume that we have chosen $k$ sufficiently large so that the effects of the initial conditions can be neglected, or, equivalently, we choose the steady-state distribution as the initial condition.

Assumption 3. $E(\varepsilon_0) = 0$, $\operatorname{cov}(\varepsilon_0) = P$.

Now consider the autocovariance, defined as the expectation of the data with some lagged version of itself (Jenkins & Watts, 1968),

$$C_j = E[\mathcal{Y}_k \mathcal{Y}_{k+j}^T]. \tag{3}$$

Using Eq. (1) and the steady-state initial condition (Assumption 3) gives for the autocovariance

$$E(\mathcal{Y}_k\mathcal{Y}_k^T) = CPC^T + R_v, \tag{4}$$

$$E(\mathcal{Y}_{k+j}\mathcal{Y}_k^T) = C\bar{A}^j P C^T - C\bar{A}^{j-1}ALR_v, \quad j \geq 1, \tag{5}$$

which are independent of $k$ because of our assumption about the initial conditions. The autocovariance matrix (ACM) is then defined as

$$\mathcal{R}(N) = \begin{bmatrix} C_0 & \cdots & C_{N-1} \\ \vdots & \ddots & \vdots \\ C_{N-1}^T & \cdots & C_0 \end{bmatrix}. \tag{6}$$

The number of lags used in the ACM is a user-defined parameter, $N$. The off-diagonal autocovariances are not assumed zero, because we do not process the data with the optimal filter, which is unknown. The ACM of the L-innovations can be written as follows:

$$\mathcal{R}(N) = \mathcal{O}P\mathcal{O}^T + \Gamma\left[\bigoplus_{i=1}^{N}\bar{G}\bar{Q}_w\bar{G}^T\right]\Gamma^T + \Psi\left[\bigoplus_{i=1}^{N}R_v\right] + \left[\bigoplus_{i=1}^{N}R_v\right]\Psi^T + \bigoplus_{i=1}^{N}R_v \tag{7}$$

in which

$$\mathcal{O} = \begin{bmatrix} C \\ C\bar{A} \\ \vdots \\ C\bar{A}^{N-1} \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ C & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ C\bar{A}^{N-2} & \cdots & C & 0 \end{bmatrix}, \qquad \Psi = \Gamma\left[\bigoplus_{j=1}^{N}(-AL)\right].$$
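The steady-state covariance $P$ appearing in Eqs. (2) and (7) can be computed with the same vec/Kronecker identity the paper invokes for Eq. (8). A small numpy sketch, with an assumed stable $\bar{A}$ and an assumed symmetric $\bar{G}\bar{Q}_w\bar{G}^T$ (values illustrative, not from the paper):

```python
import numpy as np

# Assumed stable closed-loop matrix Abar = A - ALC and an assumed
# symmetric Gbar Qbar_w Gbar^T (illustrative values).
Abar = np.array([[0.62, 0.10], [-0.08, 0.80]])
GQG = np.array([[0.25, 0.05], [0.05, 0.30]])

# vec(P) = (I - Abar (x) Abar)^{-1} vec(Gbar Qbar_w Gbar^T), cf. Eq. (8);
# numpy's order="F" gives the columnwise (vec) stacking used in the paper.
n = Abar.shape[0]
Ps = np.linalg.solve(np.eye(n * n) - np.kron(Abar, Abar),
                     GQG.flatten(order="F"))
P = Ps.reshape((n, n), order="F")

# P solves the steady-state Lyapunov equation (2)
assert np.allclose(P, Abar @ P @ Abar.T + GQG)
```

This dense solve is fine for small $n$; dedicated Lyapunov solvers scale better for large state dimensions.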

In this result and those to follow, we employ the standard definitions of the Kronecker product, Kronecker sum and the direct sum (Brewer, 1978; Searle, 1982). In order to use the ACM relationship in a standard least-squares problem, we apply the vec operator, which is the columnwise stacking of a matrix into a vector (Brewer, 1978). If $z_k$ is the $k$th column of an arbitrary matrix $Z$,

$$\operatorname{vec}(Z) = Z_s = [z_1^T \;\cdots\; z_k^T]^T.$$

Throughout this paper, we use the $s$ subscript to denote the outcome of applying the vec operator. Applying the vec operator to Eq. (7) and using the result of applying the vec operator on Eq. (2),

$$P_s = (\bar{A} \otimes \bar{A})P_s + (\bar{G}\bar{Q}_w\bar{G}^T)_s, \tag{8}$$

yields

$$[\mathcal{R}(N)]_s = [(\mathcal{O} \otimes \mathcal{O})(I_{n^2} - \bar{A} \otimes \bar{A})^{-1} + (\Gamma \otimes \Gamma)I_{n,N}](G \otimes G)(Q_w)_s + \{[(\mathcal{O} \otimes \mathcal{O})(I_{n^2} - \bar{A} \otimes \bar{A})^{-1} + (\Gamma \otimes \Gamma)I_{n,N}](AL \otimes AL) + [\Psi \oplus \Psi + I_{p^2N^2}]I_{p,N}\}(R_v)_s \tag{9}$$

in which $I_{p,N}$ is a permutation matrix to convert the direct sum to a vector, i.e. $I_{p,N}$ is the $(pN)^2 \times p^2$ matrix of zeros and ones satisfying

$$\left[\bigoplus_{i=1}^{N} R_v\right]_s = I_{p,N}(R_v)_s.$$

Ideally, we would like to compute the autocovariance as the expectation of the product $\mathcal{Y}_k\mathcal{Y}_{k+j}^T$. Practically, we approximate the expectation from the data using the time average, a valid procedure since the process is ergodic (Jenkins & Watts, 1968). The estimate of the autocovariance is computed as

$$\hat{C}_j = \frac{1}{N_d - j}\sum_{i=1}^{N_d - j} \mathcal{Y}_i\mathcal{Y}_{i+j}^T, \tag{10}$$

which is the so-called unbiased autocovariance estimator. The estimated ACM, $\hat{\mathcal{R}}(N)$, is analogously defined using the computed $\hat{C}_j$. At this point we can define a least-squares problem to estimate $Q_w$, $R_v$. We summarize Eq. (9) as

$$\mathcal{A}x = b$$

in which

$$\mathcal{A} = \left[\, \mathcal{D}(G \otimes G) \quad \mathcal{D}(AL \otimes AL) + [\Psi \oplus \Psi + I_{p^2N^2}]I_{p,N} \,\right], \tag{11}$$

$$\mathcal{D} = [(\mathcal{O} \otimes \mathcal{O})(I_{n^2} - \bar{A} \otimes \bar{A})^{-1} + (\Gamma \otimes \Gamma)I_{n,N}],$$

$$x = [(Q_w)_s^T \; (R_v)_s^T]^T, \qquad b = [\mathcal{R}(N)]_s.$$

We define the ALS estimate as

$$\hat{x} = \arg\min_x \|\mathcal{A}x - \hat{b}\|_2^2 \tag{12}$$

in which

$$\hat{x} = [(\hat{Q}_w)_s^T \; (\hat{R}_v)_s^T]^T, \qquad \hat{b} = [\hat{\mathcal{R}}(N)]_s.$$

The solution for the ALS estimate is the well-known

$$\hat{x} = \mathcal{A}^\dagger \hat{b}, \qquad \mathcal{A}^\dagger = (\mathcal{A}^T\mathcal{A})^{-1}\mathcal{A}^T.$$

The uniqueness of the estimate is a standard result of least-squares estimation (Lawson & Hanson, 1995). The estimated covariances are symmetric due to the structure of the least-squares problem.

Lemma 4. The ALS estimate (Eq. (12)) exists and is unique if and only if $\mathcal{A}$ has full column rank.

We note that the ACM could be written in the output form as well, in which the ACM is computed from the outputs instead of the L-innovations. There are a number of reasons why the output covariance estimator is inadequate in practical applications. In the output-based formulation, the control law would have to be specified in order to consider closed-loop data. Additionally, output autocovariance methods are not suitable for estimating integrated white noise disturbances, which are used widely in industrial MPC implementations to remove steady offset.

Comments on the initial filter gain: In principle, any stable filter gain, $L$, may be used to calculate the L-innovations. This initial gain simply parameterizes the L-innovations. The covariances of the underlying noise sequences are contained in the outputs of the process. While the choice of the initial filter may impact the number of data points required to find reliable estimates of the covariances, we show in Section 3 that the initial choice of the filter is irrelevant for large data sets. The autocovariance matrix in Eq. (6) has nonzero off-diagonal elements for a suboptimal choice of $L$. Only when the true covariances (and optimal filter) are employed are the off-diagonal terms zero.
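The structure of Eqs. (4)–(11) can be exercised in the scalar case ($n = p = g = 1$), where the direct sums reduce to diagonal matrices and $I_{1,N}$ is simply $\operatorname{vec}(I_N)$. The sketch below uses illustrative values (all assumed) and checks that $\mathcal{A}[Q_w \; R_v]^T$ reproduces the stacked ACM built directly from Eqs. (4)–(6):

```python
import numpy as np

# Scalar model (illustrative values): x+ = a x + g w,  y = c x + v.
a, g, c = 0.9, 1.0, 1.0
l = 0.4                            # arbitrary stable filter gain
Qw, Rv = 0.5, 0.1                  # true covariances
N = 5                              # lags in the ACM

Abar = a - a * l * c               # Abar = A - ALC (here 0.54, stable)
GQG = g**2 * Qw + (a * l)**2 * Rv  # Gbar Qbar_w Gbar^T for scalars
P = GQG / (1 - Abar**2)            # steady-state covariance, Eq. (2)

# O, Gamma, Psi of Eq. (7), with scalar blocks
O = np.array([[c * Abar**i] for i in range(N)])          # N x 1
Gamma = np.zeros((N, N))
for i in range(N):
    for j in range(i):
        Gamma[i, j] = c * Abar**(i - 1 - j)
Psi = -a * l * Gamma               # Gamma times the direct sum of (-AL)

# script-A of Eq. (11); vec(I_N) plays the role of I_{1,N}
vecI = np.eye(N).flatten(order="F").reshape(-1, 1)
D = np.kron(O, O) / (1 - Abar**2) + np.kron(Gamma, Gamma) @ vecI
kron_sum = np.kron(Psi, np.eye(N)) + np.kron(np.eye(N), Psi)  # Psi (+) Psi
A_als = np.hstack([D * g**2,
                   D * (a * l)**2 + (kron_sum + np.eye(N * N)) @ vecI])

# Autocovariances directly from Eqs. (4)-(5), stacked as in Eq. (6)
Cj = [c * c * P + Rv] + [c * c * Abar**j * P - c * Abar**(j - 1) * a * l * Rv
                         for j in range(1, N)]
R_acm = np.array([[Cj[abs(i - j)] for j in range(N)] for i in range(N)])
b = R_acm.flatten(order="F")

assert np.allclose(A_als @ np.array([Qw, Rv]), b)
```

In practice $b$ would be replaced by the data-based $\hat{b}$ from Eq. (10), and the unconstrained estimate follows from ordinary least squares on $\mathcal{A}$.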

3. Properties of the ALS covariance estimates

In this section we evaluate the mean and variance of the ALS estimator. To this end, first we require the properties of the estimated autocovariance.

Lemma 5. The expectation of the estimated autocovariance ($\hat{C}_j$) is equal to the autocovariance ($C_j$) for all $j$, and the variance goes to zero inversely with sample size, $N_d$:

$$E[\hat{C}_j] = C_j, \quad j = 0, \ldots, N, \qquad \operatorname{cov}(\hat{C}_j) = O\!\left(\frac{1}{N_d - j}\right).$$

Proof. The expectation result follows from taking expectation of Eq. (10) and the definition of the autocovariance, Eq. (3). For brevity, the proof of the variance result is omitted. A derivation can be found in Bartlett (1946). □

Remark 1. The unbiased result in finite sample size is due to the strong assumption we have made on the initial conditions, Assumption 3. If we weaken this assumption and allow nonzero expectation of initial error or covariance of initial error not equal to $P$, then the bias is nonzero with finite sample size, but decreases exponentially to zero with increasing sample size.

Note that for the types of problems to be solved with this method, we choose $N_d \gg N$, and therefore $\operatorname{cov}(\hat{C}_j) \to 0$ as $N_d \to \infty$, for all $j$. We can choose large $N_d$ because we require only routine operating data, not identification testing data with input excitation. The properties of the autocovariance estimate then imply the ALS estimates of the covariances are unbiased for all sample sizes, and converge to the true values with increasing sample size.

Theorem 6. Given $\mathcal{A}$ (Eq. (11)) has full column rank, the ALS noise covariance estimates $(\hat{Q}_w, \hat{R}_v)$ (Eq. (12)) are unbiased for all sample sizes and converge asymptotically to the true covariances $(Q_w, R_v)$ as $N_d \to \infty$.

Proof. For compactness, we use the notation of the least-squares problem of Eq. (12) in which $\mathcal{A}x = b$, and

$$\hat{x} = \arg\min_x \|\mathcal{A}x - \hat{b}\|_2^2.$$

The expected value of the estimate is

$$E[\hat{x}] = \mathcal{A}^\dagger E[\hat{b}] = \mathcal{A}^\dagger b \;\; \text{(by Lemma 5)} = \mathcal{A}^\dagger \mathcal{A}x = x.$$

The covariance of the estimate is

$$\operatorname{cov}(\hat{x}) = \mathcal{A}^\dagger \operatorname{cov}(\hat{b})(\mathcal{A}^\dagger)^T.$$

From Lemma 5, $\operatorname{cov}(\hat{b}) \to 0$ as $N_d \to \infty$. Therefore $\operatorname{cov}(\hat{x}) \to 0$ as $N_d \to \infty$. □

Remark 2. Again, as in Lemma 5, the unbiased result in finite sample size is due to Assumption 3. If we remove this assumption, the bias is nonzero with finite sample size, but decreases exponentially to zero with increasing sample size.

4. Discussion and comparison to previous approaches

4.1. Comparison to correlation based methods

The pioneering work of Mehra (1970, 1972) and Carew and Bélanger (1973) has seen successful application using open-loop data and remains highly cited. Mehra employs a three-step procedure to estimate $(Q_w, R_v)$: (i) Solve a least-squares problem to estimate $PC^T$ from the estimated autocovariances using Eqs. (4) and (5). (ii) Use Eq. (4) and the estimated $PC^T$ to solve for $R_v$. (iii) Solve a least-squares problem to estimate $Q_w$ from the estimated $PC^T$ and $R_v$ using Eq. (8). We offer two criticisms of the classic Mehra approach. Our first comment concerns the conditions for uniqueness of $(\hat{Q}_w, \hat{R}_v)$ in Mehra's approach. These conditions were stated (without proof) as

(1) $(A, C)$ observable.
(2) $A$ full rank.
(3) The number of unknown elements in the $Q_w$ matrix, $g(g+1)/2$, is less than or equal to $np$.

These conditions were also cited by Bélanger (1974). As a counterexample, consider

$$A = \begin{bmatrix} 0.9 & 0 & 0 \\ 1 & 0.9 & 0 \\ 0 & 0 & 0.9 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad G = I.$$

The Mehra conditions predict that unique covariances exist, but the $\mathcal{A}$ matrix in Eq. (11) does not have full column rank for this case. Thus these conditions are not sufficient. The problem here is that although $PC^T$ and $R_v$ are uniquely estimatable from the data, $Q_w$ is not. Examining the null space of the stacked version of the $PC^T$ equation shows that any multiple of the following matrix can be added to an estimate of $\hat{Q}_w$ without changing the fit to the autocovariance data

$$Q = \begin{bmatrix} 0.117 & 0.552 & 0 \\ 0.552 & 0.613 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Consider a second counterexample,

$$A = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.2 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad C = [1 \;\; \alpha].$$

When $\alpha = 0$, this system is not observable, and thus does not meet Mehra's conditions. But $\mathcal{A}$ has full column rank for $\alpha = 0$, the ALS method estimates unique covariances, and thus Mehra's conditions are also not necessary. In this example, one can use just state $x_1$ to uniquely distinguish the process disturbance from the output disturbance. It makes no difference whether or not the second state is observable.

Our second comment concerns the large variance associated with Mehra's method. This point was first made by Neethling and Young (1974), and seems to have been largely overlooked. First, step (ii) above is inappropriate because the zero-order lag autocorrelation estimate in Eq. (4) is not known perfectly. Second, breaking a single-stage estimation of $Q_w$ and $R_v$ into two stages by first finding $PC^T$ and $R_v$ and then using these estimates to estimate $Q_w$ in steps (i) and (iii) also increases the variance in the estimated $Q_w$.
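The observability claim in the second counterexample is easy to verify numerically. A short sketch (the nonzero value chosen for $\alpha$ is arbitrary):

```python
import numpy as np

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^(n-1)]."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

A = np.array([[0.1, 0.0], [0.0, 0.2]])

C0 = np.array([[1.0, 0.0]])        # alpha = 0: second state unobservable
assert np.linalg.matrix_rank(obsv(A, C0)) == 1

C1 = np.array([[1.0, 0.5]])        # any alpha != 0 restores observability
assert np.linalg.matrix_rank(obsv(A, C1)) == 2
```

The rank deficiency at $\alpha = 0$ is exactly the failure of Mehra's condition (1), even though, as argued above, it does not prevent unique ALS covariance estimates.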

Fig. 1. Estimates of $Q_w$ and $R_v$ using Mehra's method. (Scatter of estimates against the true value; the $Q_w$ axis spans roughly $-30$ to $30$ and the $R_v$ axis roughly $-4$ to $4$.)

Fig. 2. Estimates of $Q_w$ and $R_v$ using proposed ALS method. Notice the axes have been greatly expanded compared to Fig. 1. (The $Q_w$ axis spans roughly $0.1$ to $0.9$ and the $R_v$ axis roughly $0.02$ to $0.18$.)

To quantify the size of the variance inflation associated with Mehra's method, consider a third example, which has a well-conditioned observability matrix

$$A = \begin{bmatrix} 0.1 & 0 & 0.1 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.3 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad C = [0.1 \;\; 0.2 \;\; 0].$$

Data are generated using noise sequences with covariance $Q_w = 0.5$, $R_v = 0.1$. The L-innovations are calculated with a filter gain corresponding to incorrect noise variances $Q_w = 0.2$ and $R_v = 0.4$. Mehra's method and the ALS method are run using $N_d = 1000$ data points, $N = 15$. The simulation is repeated 200 times to illustrate the mean and variances of the estimators. In Fig. 1, the estimates of $(Q_w, R_v)$ using Mehra's method are plotted. The variance of the estimates is large, and many of the estimates are negative, which is unphysical. In Fig. 2, the ALS estimates of $(Q_w, R_v)$ are plotted, on much tighter axes. The variance of the ALS estimates is much smaller than in Mehra's method, and none of the estimates are negative. Note that Neethling and Young (1974) discuss other examples with behavior similar to this one.

4.2. Enforcing semidefinite constraints

When dealing with a small sample of measurements or significant plant/model error, the ALS estimate of the covariances from Eq. (12) may not be positive semidefinite, even though the variance of the estimate may be smaller than the two-step procedure. Such estimates are physically meaningless. Most of the literature for estimating covariances does not address this issue. A recent ad hoc method of imposing positive semidefiniteness on the estimates of only $R_v$ is given in Noriega and Pasupathy (1997).

Adding the semidefinite constraint directly to the estimation problem gives a constrained ALS estimation problem

$$\Phi = \min_{Q_w, R_v} \left\| \mathcal{A}\begin{bmatrix} (Q_w)_s \\ (R_v)_s \end{bmatrix} - \hat{b} \right\|_2^2 \quad \text{s.t.} \quad Q_w \geq 0, \;\; R_v \geq 0. \tag{13}$$

The constraints in Eq. (13) are convex in $Q_w$, $R_v$ and the optimization is in the form of a semidefinite programming (SDP) problem (Vandenberghe & Boyd, 1996). The matrix inequalities $Q_w \geq 0$, $R_v \geq 0$ can then be handled by adding a logarithmic barrier function to the objective. The optimization in Eq. (13) becomes:

$$\Phi = \min_{Q_w, R_v} \left\| \mathcal{A}\begin{bmatrix} (Q_w)_s \\ (R_v)_s \end{bmatrix} - \hat{b} \right\|_2^2 - \mu \log\left| \begin{bmatrix} Q_w & 0 \\ 0 & R_v \end{bmatrix} \right| \tag{14}$$

in which $\mu$ is the barrier parameter and $|\cdot|$ denotes the determinant of the matrix (Nocedal & Wright, 1999). The optimization in Eq. (14) is convex and the gradient can be evaluated analytically. A simple path-following algorithm based on Newton steps provides a simple and efficient method to find the global optimum. The details of this type of algorithm can be found in Wolkowicz, Saigal, and Vandenberghe (2000, Chapter 10). The convexity of the optimization in Eq. (13) and Lemma 4 ensure uniqueness of the covariance estimate. The algorithm generalizes efficiently for large dimensional problems.
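For scalar $Q_w$ and $R_v$, as in the examples above, the semidefinite constraints of Eq. (13) reduce to simple nonnegativity, and the constrained problem can be solved exactly by enumerating the possible active sets rather than by a barrier method. A minimal sketch (the function name and the toy data are illustrative assumptions):

```python
import numpy as np

def constrained_als_scalar(A, b):
    """Solve min ||A x - b||^2 s.t. x >= 0 for x = (Qw, Rv) scalars,
    by enumerating the active sets of the constraints in Eq. (13)."""
    candidates = [np.zeros(2)]                     # both constraints active
    x_free = np.linalg.lstsq(A, b, rcond=None)[0]  # unconstrained optimum
    if np.all(x_free >= 0):
        candidates.append(x_free)
    for i in (0, 1):                               # one constraint active
        x = np.zeros(2)
        col = A[:, 1 - i]
        x[1 - i] = max(0.0, float(col @ b) / float(col @ col))
        candidates.append(x)
    # the exact constrained optimum is the feasible candidate of least residual
    return min(candidates, key=lambda x: np.linalg.norm(A @ x - b))

# With an identity A, an infeasible unconstrained optimum gets clipped:
A = np.eye(2)
x_hat = constrained_als_scalar(A, np.array([-0.3, 0.5]))   # -> [0.0, 0.5]
```

For matrix-valued $Q_w$ and $R_v$, this enumeration no longer applies and the SDP/barrier formulation of Eqs. (13)–(14) is the appropriate tool.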

5. Conclusions

In this paper we have developed a new method (ALS) for using the autocovariance of the L-innovations to estimate the covariances of the system disturbances, which are required for optimal state estimation. We have shown that the ALS covariance estimates are unbiased and converge asymptotically to the true system covariances with increasing sample size. Using the standard properties of least-squares estimation, necessary and sufficient conditions for unique covariance estimates have been provided. Examples are provided to show that conditions previously stated in the literature are neither necessary nor sufficient for uniqueness of these estimates. Previously reported methods for this estimation have been shown to have unnecessarily large variance. The approach also guarantees semidefinite covariance estimates by solving a convex semidefinite programming problem.

Acknowledgments

Financial support came from the industrial members of the Texas-Wisconsin Modeling and Control Consortium and NSF through Grant #CTS-0105360. The authors thank the anonymous reviewers for many helpful comments, and Professors Stephen P. Boyd and John F. MacGregor for helpful discussions.

References

Alspach, D. L. (1974). A parallel filtering algorithm for linear systems with unknown time-varying noise statistics. IEEE Transactions on Automatic Control, 19(5), 552–556.
Averbuch, A., Itzikowitz, S., & Kapon, T. (1991). Radar target tracking—Viterbi versus IMM. IEEE Transactions on Aerospace and Electronic Systems, 27(3), 550–563.
Bartlett, M. S. (1946). On the theoretical specification of sampling properties of autocorrelated time series. Journal of the Royal Statistical Society Supplement, 8, 27–41.
Bélanger, P. R. (1974). Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica, 10, 267–275.
Bohlin, T. (1976). Four cases of identification of changing systems. In R. K. Mehra, & D. G. Lainiotis (Eds.), System identification: Advances and case studies (1st ed.). New York: Academic Press.
Brewer, J. (1978). Kronecker products and matrix calculus in system theory. IEEE Transactions on Circuits and Systems, 25(9), 772–781.
Carew, B., & Bélanger, P. R. (1973). Identification of optimum filter steady-state gain for systems with unknown noise covariances. IEEE Transactions on Automatic Control, 18(6), 582–587.
Hilborn, C. G., & Lainiotis, D. G. (1969). Optimal estimation in the presence of unknown parameters. IEEE Transactions on Systems, Science, and Cybernetics, 5(1), 38–43.
Jenkins, G. M., & Watts, D. G. (1968). Spectral analysis and its applications. San Francisco, CA: Holden-Day.
Kashyap, R. L. (1970). Maximum likelihood identification of stochastic linear systems. IEEE Transactions on Automatic Control, 15(1), 25–34.
Lawson, C. L., & Hanson, R. J. (1995). Solving least squares problems. Philadelphia, PA: SIAM.
Mehra, R. K. (1970). On the identification of variances and adaptive Kalman filtering. IEEE Transactions on Automatic Control, 15(2), 175–184.
Mehra, R. K. (1972). Approaches to adaptive filtering. IEEE Transactions on Automatic Control, 17, 903–908.
Myers, K. A., & Tapley, B. D. (1976). Adaptive sequential estimation with unknown noise statistics. IEEE Transactions on Automatic Control, 21, 520–523.
Neethling, C., & Young, P. (1974). Comments on "Identification of optimum filter steady-state gain for systems with unknown noise covariances". IEEE Transactions on Automatic Control, 19(5), 623–625.
Nocedal, J., & Wright, S. J. (1999). Numerical optimization. New York: Springer.
Noriega, G., & Pasupathy, S. (1997). Adaptive estimation of noise covariance matrices in real-time preprocessing of geophysical data. IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1146–1159.
Searle, S. R. (1982). Matrix algebra useful for statistics. New York: Wiley.
Vandenberghe, L., & Boyd, S. (1996). Semidefinite programming. SIAM Review, 38(1), 49–95.
Wolkowicz, H., Saigal, R., & Vandenberghe, L. (Eds.). (2000). Handbook of semidefinite programming: Theory, algorithms and applications. MA: Kluwer Academic Publishers.

Brian J. Odelson received his B.S. from Purdue University in 1998 and his Ph.D. from the University of Wisconsin in 2003, both in Chemical Engineering. His Ph.D. research focused on improved state estimation in the presence of unknown noise, model mismatch, and various disturbance models. In 2003, he moved to Naperville, Illinois as a Senior Research Engineer for BP Research and Technology.

Murali R. Rajamani was born in Coimbatore, India, in 1980. He received his Bachelors in Chemical Engineering (B. Chem. Engg.) degree with a gold medal from University of Mumbai, Institute of Chemical Technology, Mumbai, India in 2002. Since then, he has been working towards a Ph.D. degree in the Department of Chemical and Biological Engineering, University of Wisconsin–Madison, USA. His research interests include model predictive control, state estimation, estimation of noise statistics, nonlinear control and applications of semidefinite optimization in control and estimation.

James B. Rawlings received his B.S. from the University of Texas in 1979 and his Ph.D. from the University of Wisconsin in 1985, both in Chemical Engineering. He spent one year at the University of Stuttgart as a NATO postdoctoral fellow and then joined the faculty at the University of Texas. He moved to the University of Wisconsin in 1995 and is currently the Paul A. Elfers Professor of Chemical and Biological Engineering and the co-director of the Texas-Wisconsin Modeling and Control Consortium (TWMCC). His research interests are in the areas of chemical process modeling, monitoring and control, nonlinear model predictive control, moving horizon state estimation, particulate systems modeling, and crystal engineering.
