You are on page 1of 61

Updating Techniques in Recursive Least-Squares

Estimation.
Anders Sjo
October 23, 1992

Contents
1 Introduction
2 Algorithms

2.1 Recursive Algorithms and the Regression Model : : : : : :


2.2 Algorithms Using Modi cation of the Covariance Matrix :
2.2.1 The RLS Algorithm : : : : : : : : : : : : : : : : :
2.2.2 Peterka's and Bierman's Algorithms : : : : : : : :
2.2.3 Algorithms Using Orthogonal Transformations : :
2.3 Algorithms Using Modi cation of the Information Matrix
2.4 Algorithms Using Modi cation of the QR Decomposition
2.4.1 Updating of the QR Decomposition : : : : : : : :
2.4.2 Downdating of the QR Decomposition : : : : : : :
2.4.3 Modi cation of the Inverse of the R-matrix : : : :

:
:
:
:
:
:
:
:
:
:

:
:
:
:
:
:
:
:
:
:

:
:
:
:
:
:
:
:
:
:

:
:
:
:
:
:
:
:
:
:

2
4

4
5
5
7
10
12
13
14
14
18

3 Error Analysis

22

4 Time-Varying Parameter Estimation


5 Test Results

28
31

A Alternative Algorithms Using Orthogonal Transformations

56

3.1 Updating of the Covariance and Information Matrix : : : : : : : 22


3.2 Updating of the QR Decomposition : : : : : : : : : : : : : : : : : 25

5.1 Identi cation of Ship Steering Dynamics : : : : : : : : : : : : : : 31


5.2 Hilbert Matrix : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49

Chapter 1

Introduction
The need for recursive algorithms for parameter estimation has arisen since online parameter estimation is frequently used in scienti c areas like for instance
adaptive control and signal processing.
In this survey we have only considered algorithms based on the least-squares
problem, i.e.
min
kAx ; bk2;
(1.1)
x
where A 2 IRmn , x 2 IRn and b 2 IRm , with m > n. The solution to the
least-squares problem is given by the normal equations

AT Ax = AT b:
When a new linear equation aT x = is added to the system, we get a modi ed
least-squares problem

 
min
x kAx ; bk2 ;

;

;

where A = AT a T and b = bT T . This procedure is called updating.
The classical recursive algorithms are based on the normal equations, which
has the drawback that we will lose accuracy when the matrix AT A is formed.
One alternative is to update the Cholesky factorization of this matrix or its
inverse, e.g. AT A = LLT , where L is a lower-triangular (n  n)-matrix, but still
use the same idea as in the classical algorithms. The numerical disadvantages
of the normal equations have however led to a new set of algorithms that in
principle update the QR decomposition of; the matrix
A = QR0 , where Q is

T
T
0
an orthogonal (m  m)-matrix and R = R 0 , where R is an uppertriangular (n  n)-matrix. Although di erent in approach, these two groups of
alternative algorithms are closely related since
AT A = AT A + aaT = LLT + aaT = RT R + aaT ;
2

i.e. L = RT .
In Chapter 2 we present the di erent alternatives to perform recursive leastsquares estimation. To make the algorithms comparable we have aimed for
making the taxonomy as uniform as possible, implying that it often will di er
from the presentations in the references. In Chapter 3 a brief error analysis
is presented to form the basis of the test examples presented in Chapter 5.
In Chapter 4 we discuss the problem of time-varying parameters. There are
mainly two strategies to deal with this problem: downdating and forgetting. By
downdating we mean the case when a linear equation aT x = is discarded
from (1.1), giving for example
R T R = RT R ; aaT :
Downdating is, however, numerically a more complicated problem than updating mainly because the singular values of R are less than or equal to the
corresponding singular values of R, i.e. R may become singular. In forgetting
we put time-dependent weights on the old data, giving a weighted least-squares
problem
min
kD(Ax ; b)k2 ;
x
where D = diag((m;i)=2 ) and the forgetting factor  is limited by 0 <  < 1.
This type of forgetting is often referred to as exponential forgetting. This is also
connected to some minor problems, which will be further discussed in Chapter 4.
An alternative to exponential forgetting is directional forgetting, i.e. we only
forget in the direction of the new vector a.

Chapter 2

Algorithms
2.1 Recursive Algorithms and the Regression
Model
In this chapter we will derive a number of algorithms for the recursive leastsquares problem. These algorithms must have certain qualities that to some
extent is connected to their applications. The most important qualities are that
 the algorithm should give a solution that is accurate up to the limitations
of the data and conditioning of the problem, i.e. the algorithm should not
aggravate the problem's sensitivity to perturbations,
 the algorithm should use as few operations (and particularly as few squareroots) as possible and
 it should be possible to use a computer with short word-length,
cf. Bjorck et al. [5] and Alexander et al. [1].
The least-squares method has many applications, among which we will concentrate on the regression problem. Assume that we have a mathematical model
for our system that can be represented as a linear regression model

y(t) = '1 (t)1 + '2 (t)2 +    + 'n (t)n + "(t) = 'T (t) + "(t);
(2.1)
where y is the observed variable,  is a vector containing the parameters to
be estimated, ' is a regression vector of known functions 'i , that may depend
on other known variables, and " is additive errors.; Now, let
 us say that in a
discrete-time problem we have observed pairs of y(i); '(i) for i = 1; 2; :::; t.
These observations give an overdetermined linear system
t  + Et = Yt ;
4

with

0 'T (1) 1
BB 'T (2) CC
t = B .. C ;
@ . A
'T (t)

;

;

Yt = y(1) y(2)    y(t) T and Et = "(1) "(2)    "(t) T . The

least-squares solution to this overdetermined equation system is obtained by


minimizing the square of the residual vector
min
kYt ; t k22 ;
(2.2)

leading to the normal equations
Tt t  = Tt Yt :
(2.3)
If the matrix Tt t is non-singular, we have found the  that solves (2.2):
^t = (Tt t );1 Tt Yt :
(2.4)
With the assumptions that E[Et ] = 0 and E[Et EtT ] = 2 I we are able to de ne
the matrix
; 
Pt = 2 Tt t ;1
as the covariance matrix and its inverse Pt;1 as the information matrix.

2.2 Algorithms Using Modi cation of the Covariance Matrix


In the following subsections a number of recursive algorithms for updating of the
covariance matrix Pt will be derived. First we will outline the derivation of the
classical recursive least-squares algorithm, here referred to as the RLS algorithm,
that forms the basis of the succeeding algorithms that work with LDLT or
Cholesky factorization of the covariance matrix. The reason for using factiorized
forms is that it reduces the loss of precision when the covariance matrix is
calculated. Most of the algorithms are based on already known algorithms for
rank-one modi cation of Cholesky factorizations, cf. Bierman [4] and Gill et
al. [12].

2.2.1 The RLS Algorithm

To derive the RLS algorithm we note that when adding a new data vector,
the inverse of the covariance matrix is updated as
Pt;1 = Pt;;11 + ;2 't 'Tt
(2.5)
5

and furthermore,

;

^t = ;2 Pt Tt Yt = ;2 Pt Tt;1 Yt;1 + 't yt :

(2.6)

By using Equations (2.3) and (2.5) we obtain


Tt;1 Yt;1 = 2 Pt;;11 ^t;1 = 2 Pt;1 ^t;1 ; 't 'Tt ^t;1 ;
which together with Equation (2.6) gives the parameter estimate at time t
^t = ^t;1 ; ;2 Pt 't 'Tt ^t;1 + ;2Pt 't yt
= ^t;1 + ;2 Pt 't yt ; 'Tt ^t;1
(2.7)
^
= t;1 + Kt"t ;
where the Kalman gain is de ned by

Kt = Pt '2 t ;

and the prediction error is given by


"t = yt ; 'Tt ^t;1 :
(2.8)
The updating of the covariance matrix is still to be dealt with, since the appearance of the Kalman gain implies that we do not want to handle the inverse
of the matrix. From Equation (2.5) we note that the covariance matrix can be
expressed as follows,
;

Pt = Pt;;11 + ;2 't 'Tt ;1 ;
and by using Sherman-Morrison's formula, cf. Andersson et al. [2], we obtain
with

T
Pt = Pt;1 ; P2t;+1 ''tT'Pt Pt;'1 = Pt;1 ; ;t 1 (Pt;1 't ) (Pt;1 't )T ;
t;1 t
t

t = 2 + 'Tt Pt;1 't :

(2.9)
(2.10)

The Kalman gain can now be expressed with the old covariance matrix
;P ; ;1(P ' )(P ' )T  ' P '
t = t;1 t :
Kt = t;1 t t;12 t t;1 t
(2.11)

t

We see that the covariance matrix is updated by a subtraction of a rank-one


matrix containing the old covariance matrix and the new regression vector. The
subtraction sign comes out as a result when we use Sherman-Morrison's formula
and makes this RLS problem equivalent to downdating, which numerically is a
more dicult problem with di erent stability properties. An alternative is to
update a factorization of P ;1 directly. However, the price to be paid for this is
that the Kalman gain is not obtained explicitly, but only after the solution of a
(triangular) linear system.
We can nally summarize the RLS algorithm with Equations (2.7{11):
6

The RLS algorithm


"t
t
Kt
^t
Pt

:=
:=
:=
:=
:=

yt ; 'Tt ^t;1
2 + 'Tt Pt;1 't
Pt;1 't = t
^t;1 + Kt"t
Pt;1 ; ;t 1 (Pt;1 't ) (Pt;1 't )T :

2
Due to the fact that P is symmetric the RLS algorithm only requires n =2+O(n)
multiplications and n divisions.
2

Exponential forgetting can be applied to the RLS algorithm by simply scale


the covariance matrix as Pt := ;1 Pt , where 2 = . This will not a ect the
algorithm appart from that the covariance matrix is updated as


P := 1 P ; ;1 (P ' ) (P ' )T
t

t;1

t;1 t

t;1 t

and the factor t is calculated as


t :=  + 'Tt Pt;1 't :
This also shows that we do not need to know the variance of the additive errors
2 .
Note that whenever the vector Pt;1 't = 0 the exponential forgetting leads
to an exponential increase of the matrix Pt , since 0 <  < 1. An alternative
is therefore directional forgetting, where we only forget in the 't -direction. See
further Section 4 and Hagglund [15].

2.2.2 Peterka's and Bierman's Algorithms

Peterka's and Bierman's algorithms are two algorithms that are based on the
covariance matrix updating formula (2.9), but now the covariance matrix is
written in LDLT factorized form, cf. Peterka [22] and Bierman [4] respectively.
Although the lines of derivation di ers, the two algorithms can be shown to be
theoretically equal.
To make the derivations of the algorithms easier to follow we will in the
remainder of this section use the following notations P = Pt , P = Pt;1 and so
forth.
Peterka's algorithm is based on a transformation of a joint covariance
matrix P 2 IR(n+1)(n+1) . Consider the system of (2.1), the joint covariance
matrix of y and  is given by
 T
2
T 
P = ' P' +  ' P :

P'

If P is written in LDLT form, P may be factorized as

 1 'T L  
P=
0 L

 1

LT ' LT

By using a decomposition method called dyadic decomposition (see Peterka [22]


or 
Astrom and Wittenmark [27]), Peterka's algorithm transforms this to

P=

1

0
K L



0
0 D

 1 K T 
0 L T ;

where is the same parameter as in (2.10) and K is the Kalman gain.


Bierman's algorithm on the other hand works more directly on the covariance matrix updating formula (2.9). We write (2.9) with P in LDLT form
as
T
;1
T T LDLT
L D L T = LDL
 ; ;1 ;LDLT ''
;

= L D ; DL 't DLT ' T LT
;

= L D ; ;1 wwT LT :
and form the vector v = LT '. Our goal is now to decompose
D ; ;1 wwT = L~ D~ L~ T
and then form

LL~ D~ L~ T LT = L D L T ;

i.e. L = LL~ and D = D~ . This is equivalent to the derivation of the standard


rank-one update of Cholesky factorization. Its main drawback is that it may
become unstable. In the derivation of Bierman's algorithm, however, the calculations are rearranged, using the special appearance of the scalar , to improve
the numerical behaviour.
The two algorithms may be summerized as follows (note that dj is the j :th
diagonal element of the matrix D, Lj is the j :th column of the matrix L, vj is
the j :th element of the vector v and K j is the j :th iterate of the vector K ),

Peterka's algorithm
1. De ne vectors v := LT ' and w := Dv.
2. Set n+1 := 2 and K n := 0.
3. Use backward recurrence
for j = n to 1

j := j+1 + vj wj
dj := j+1 dj = j
L j := Lj ; vj K j
j
K ;1 := K j + wj L j = j
4. Use the Kalman gain K := K 0 .


5. Update the parameters ^ := ^ + K y ; 'T ^ .

Bierman's algorithm
1. De ne vectors v := LT ' and w := Dv.
2. Set n+1 := 2 and K n := 0.
3. Use backward recurrence
for j = n to 1

j := j+1 + vj wj
dj := j+1 dj = j
L j := Lj ; vj K j = j+1
K j;1 := K j + wj Lj
4. Compute the Kalman gain K := K 0 = 1 .


5. Update the parameters ^ := ^ + K y ; 'T ^ .

As we can see there is very little that di ers between the two algorithms, and
we can thus expect them to perform equally well. The complexity of the algorithms is also comparable, i.e. Peterka's algorithm requires 3n2=2 + O(n) multiplications and 2n divisions, while Bierman's algorithm requires 3n2=2 + O(n)
multiplications and 2n + 1 divisions.
Like all algorithms that update the covariance matrix, Peterka's and Bierman's algorithms have the advantage that the Kalman gain appears directly.
Furthermore, the usage of the LDLT factorization rather than the Cholesky
factorization avoids the involvement of square-roots, which in some applications
is preferable.
Here we can also easily use exponential forgetting by setting n+1 :=  and
 .
scaling D := D=

2.2.3 Algorithms Using Orthogonal Transformations

The good numerical qualities of orthogonal transformations can of course also


be utilized when deriving recursive algorithms for the updating of matrix factorizations. In Gill et al. [12] four algorithms for updating of LDLT or Cholesky
factorizations using orthogonal transformations are derived. Among those there
is however only one that is well suited for our purposes. Hence, only this algorithm will be presented here.
For the Givens 1 algorithm we write the covariance matrix in Cholesky
factorized form, i.e.
P = L L T = LLT ; ;1 LvvT LT ;
(2.12)
where v = LT ', and form the matrix Rn 2 IR(n+1)(n+1) as


T 
Rn = vn L0 ;

with n2 = 2 . Then we apply n Givens rotations U = U1 U2    Un from the left


to zero the vector v, giving a new matrix

T 
R0 = 00 (KL0 )T ;
with 02 = 2 + vT v = . By writing
RT0 R0 = RTn U T U Rn = RTn Rn
we nd that the covariance matrix updating formula (2.12) comes out as a result
and that the Kalman gain is calculated as

K = K :
0

We are now ready to summarize the algorithm.


10

Givens 1 algorithm
1. De ne the vector v := LT '.
2. Set n := 2 , n := p n and K n := 0.
3. Use backward recurrence
for j = n to 1

j;1
j;1
L j
j
K ;1

:=
:=
:=
:=

j + vj2
p j;1
(j Lj ; vj K j )=j;1
(j K j + vj Lj )=j;1

4. Compute the Kalman gain K := K 0 =0.




5. Update the parameters ^ := ^ + K y ; 'T ^ .

2
This algorithm requires 5n2=2 + O(n) multiplications, 2n + 1 divisions and
n + 1 square-roots. When comparing this algorithm to the RLS algorithm and
Peterka's and Bierman's algorithms, it is clearly seen that it cannot hold it's
own when it comes to the number of operations. The involvement of squareroots also proves to make this algorithm slower than the others. On the other
hand one could think that the orthogonal transformations should make the
algorithm more stable, but according to van Dooren and Verhaegen [8] this is
not obvious. In particular, the usage of Sherman-Morrison's formula changes the
numerical behaviour of the updating, and it is not certain that the orthogonal
transformations will restore the e ects of this.
Both exponential forgetting and downdating can be applied to this algorithm. Exponential forgetting is achieved by using p
the same rescaling as in the
RLS algorithm, leading to n :=  and L := L =  . The aspects of using
this type of algorithm for downdating are discussed in Alexander et al. [1] and
Gill et al. [12]. An alternative to this orthogonal downdating is an algorithm
presented by Ferng et al. [10], using hyperbolic rotations. A hyperbolic rotation
H is a semi-orthogonal matrix with the following appearance,

H=

cosh ; sinh
; sinh cosh

and is \orthogonal" in the meaning that

HSH T = S;
11

where the signature matrix S = diag(1). The hyperbolic rotations should


however be used with the utmost care, since they do not have the same good
numerical properties as the orthogonal rotations.
There are other ways to apply the orthogonal transformations when updating
Cholesky factorizations. In Appendix A, the derivations for three other algorithms (Givens 2, Givens 3 and Householder algorithm) are outlined in brevity.
These algorithms can however be shown to be less well suited to our purposes,
cf. Sjo [25].

2.3 Algorithms Using Modi cation of the Information Matrix


As mentioned before, numerical diculties similar to those of downdating arise
when updating the covariance matrix. One important alternative is thus to
update the inverse of the covariance matrix or the information matrix instead,
i.e. to use Equation (2.5). The algorithm that will be derived can be found
in van Dooren and Verhaegen [8] and Bierman [4] and is referred to as the
Square-Root Information Filter (SRIF) algorithm.
To derive the SRIF algorithm we factorize Pt;1 as

Pt;1 = Lt LTt

and form the (n + 1)  (n + 1) matrix

T
^
Rt; = L;t;'T ;t;yt :
t
1

Then we apply n Givens rotations to zero the vector ;1 'Tt , giving
 LT ^ 
Rt = U Rt;1 = 0t rt :
The parameter estimation is then given by
^t = L;t T ^t :
The algorithm is summarized as follows:

SRIF algorithm
1. De ne the vector v1 := ;1 't .
2. Set r1 := ;1 yt .
12

(2.13)

3. Use forward recurrence


for j = 1 to n

qj

 := vj 2 + ljj2
c := ljj =
s := vjj =
rj+1 := crj ; s^j
^j+1 := c^j + srj
vj+1 := cvj ; sLj
Lj := cLj + svj
4. Update the parameters ^t := L;T ^t .

2.4 Algorithms Using Modi cation of the QR


Decomposition
Another important group of algorithms that need a section of its own, is algorithms that update the QR factorization of the matrix . These are important
because an error analysis can be made rather straightforward and, furthermore,
shows that the algorithms have good numerical properties. On the other hand,
these algorithms tend to be less ecient and require more storage of data.
Recall the least-squares problem
; (2.2) and
 assume that we know the QR
decomposition of the data matrix t Yt . Then we can write

  Rt Y~t  ;
0 t

QTt t Yt =

(2.14)

where t = kt  ; Yt k2 and the parameter estimate is given by


^t = Rt;1 Y~t :
Note that the matrix Qt is not the Q-matrix of the QR decomposition of t .
When new observations are added, we wish to update the QR decomposition
directly rather than the matrix  itself. If our algorithms work on the QR
factorization of the whole matrix , this will lead to an enormous amount of
data storing. Therefore we have to regard the problem of downdating, i.e. old
observations will successively be discarded, giving a window through which we
\see" the data. Hence, this is sometimes called the sliding window method. As
13

mentioned before, downdating is a more dicult problem than updating due to


the fact that when subtracting a rank-one matrix,
RtT Rt = RtT;1 Rt;1 ; 't 'Tt ;
the singular values will relate to each other by
1 (Rt;1 )  1 (Rt )      n (Rt;1 )  n (Rt )  0;
i.e. the matrix Rt may be singular although Rt;1 has full column rank, cf.
Bjorck et al. [5]. Although equivalent to downdating, this is not so crucial in
the covariance matrix updating algorithms, since no matrix is inverted.

2.4.1 Updating of the QR Decomposition

First of all we will derive the LINPACK updating algorithm. Assume that
we know the factors Rt , Y~t and t of Equation (2.14) and that we have a new
pair of observations (yt+1 ; 't+1 ). The updating is now simply done by
0 R Y~ 1
0 'T y 1
t+1
t+1 t+1
@ 0 t+1 A := QTt @ Rt+1t Y~t A ;
(2.15)
0
0
0
t
giving the new parameter estimate
^t+1 := Rt;+11 Y~t+1 :
The updating method is equivalent to the sequential row orthogonalization
method for computing the QR decomposition, which is known to be stable.
Note the similarities to the SRIF algorithm.

2.4.2 Downdating of the QR Decomposition

Downdating is a more sensitive problem, and more care has to be taken when
deriving and implementing the algorithms. We will present two algorithms for
least-squares downdating: the LINPACK algorithm and the CSNE algorithm
(Corrected Semi-Normal Equations).
Assume that we have the QR factorization of
0
1
;  Y  =  'T1 y1  = Q @ R0t Y~t A
(2.16)
t t
t
t
t+1 Yt
0 0

and want to delete the rst row 'T1 . By forming the matrix e1 t Yt we
obtain
 1 'T y  0 q Rt Y~t 1
T
Qt 0  1 Y 1 = @
0 t A ;
t+1 t+1
q 0 0
14

where q;tT = qT  qT is the rst row of Qt , Tt = '1 Tt+1 and
YtT = y1 YtT+1 . We can now apply a matrix U , formed by a number of
Givens rotations, such that
0 q R Y~ 1 0 1 '^T y^ 1
t t
B 0 Rt+1 Y~t+1 CC :
UT @
(2.17)
0 t A = B
@ 0 0 t+1 A
q 0 0
0 0
0
We have thus performed the transformation
0
1
 1 'T y  B 10 R'^T Y~y^ C
+1 C
U T 0  1 Y 1 = B
@ 0 t0+1 tt+1
A;
t+1 t+1
0 0
0
where U = QtU . The rst columns on each side read U T e1 = e1, saying that U
must have the following appearance


U = 10 Q0
t+1
and, hence, '^ = '1 and y^ = y1 . By removing the rst row we have thus obtained
the new downdated QR decomposition.
The sliding window computations require a large amount of data storing,
which easily becomes too costly. One way to avoid this is to downdate only the
R-matrix of the QR factorization. This is done in Saunder's LINPACK algorithm, cf. Saunders [24]. Another way is to use the Gram-Schmidt factorization
which reduces the storage requirement for the Q-matrix, cf. Bjorck et al. [5] and
Daniel et al. [7].
To derive the LINPACK algorithm we observe that only the rst row of
Qt is used in the downdating above. In Bjorck et al. [5] the derivations of how
to compute the elements of this vector are given in detail, here we will however
only present the results of these derivations in the LINPACK algorithm given
below.
By leaving the Q-matrix out of the computations, Saunder's LINPACK algorithm improves the eciency. On the other hand it can be shown, cf. Bjorck
et al. [5], that we will lose in accuracy compared to algorithms that downdate
the whole QR factorization, due to the fact that information is lost when the
Q-matrix is left unchanged.
An alternative algorithm is Bjorck's CSNE algorithm, where again the
Q-matrix is left out of the computations, but instead the data matrix t is
included in iterative re nement. The derivations are again given in detail in [5].
The semi-normal equations are given by
RtT Rt  = Tt Yt :
15

By using iterative re nement we obtain the corrected semi-normal equations as


RtT Rt  = Tt Yt
r = Yt ; t 
(2.18)
RtT Rt  = Tt r
 =  + :
In principle the CSNE algorithm is done as the LINPACK algorithm but in each
step we use iterative re nement to improve the numerical accuracy.
The two algorithms are given below.

LINPACK algorithm

1. Compute the vector q := Rt;T '1 and the scalars :=


^ := (y1 ; 'T1 ^t )= .
2. QR factorize the matrix
 q R Y~ 
t t
0 ^ ;
giving the new R-matrix
 1 'T y 
1
1
0 Rt+1 Y~t+1 :

p1 ; kqk

2
2

and

(2.19)

3. Calculate the scalar t+1 := 2t ; ^2 .


4. Update the parameters ^t+1 := Rt;+11 Y~t+1 .

CSNE algorithm

1. Compute the vector q := Rt;T '1 and form the residual


r := e1 ; t Rt;1 q:
2. Update the soultion by iterative re nement
q := Rt;T Tt r;
q := q + q
and form a new residual
r := r ; t R;1 q;
with the norm
:= krk2 :
16

3. De ne the normalized residual vector as e := Yt ; t ^t =t .


4. If t = 0 the system is consistent and is treated as a special case with
:= 0;
else perform the following calculations:
:= (y1 ; 'T1 ^t )=t ;
r := r ; e;
 := eT r;
:= + 
and nally

r := r ;  e:

5. Calculate the norm 0 := krk2 , i.e. in the consistent case we have 0 = .


6. QR factorize the matrix
 q R Y~ 
t
t
0 t = ;
giving the new R-matrix

 1 'T
1

0 Rt+1

y1
Y~t+1 :

7. Calculate t+1 := t .
0

8. Update the parameters ^t+1 := Rt;+11 Y~t+1 .

2
Obviously the complexity of the algorithms is rather high. The LINPACK algorithm uses two triangular solves and the iteritive re nement of the CSNE
algorithm adds another three. Furthermore, one QR decomposition is carried
out in each algorithm, giving a total cost of about 3n2 operations for the LINPACK algorithm and about 4pn + 4:5n2 operations, where p is the length of
the window, for the CSNE algorithm. If one can accept the complexity of the
LINPACK algorithm, then the CSNE algorithm can be used whenever the conditioning of the downdating problem is poor. This Hybrid algorithm was
suggested by Elden, cf. Bjorck et al. [5].
17

Note that the scalar is not used in the downdating in the LINPACK
algorithm, but may still be calculated. This is because when we want to use the
Hybrid algorithm we use the factor

1 ; kqk22 ;

> TOL;

where TOL is a suitably chosen factor in the interval (0; 1), e.g. TOL = 0:25, to
check whether the system is too ill-conditioned to use the LINPACK algorithm.
Another alternative to the LINPACK algorithm is an algorithm presented by
Alexander et al. [1], using hyperbolic rotations (see Section 2.2.3). This algorithm has been shown to have almost as good numerical qualities as the LINPACK algorithm, if the problem is not too poorly conditioned, and furthermore
that it uses n2 less operations than the LINPACK algorithm.

2.4.3 Modi cation of the Inverse of the R-matrix

One drawback of the QR modi cation methods is that a number of triangular


solves have to be carried out. An alternative strategy is therefore to modify the
inverse of R instead. The algorithms that will be derived are directly related to
the algorithms in the two preceding subsections, cf. Bjorck et al. [5].
To derive the LINPACK inverse updating algorithm we recall the
transformation (2.15) from the LINPACK algorithm in Section 2.4.1. By adding
the unit vector e1 to the rst part of the equation we obtain

QT

 1 'T   q R 
t
t
;
0 Rt =  0
+1

where qT  is the rst row of Q. Now, we simply invert these matrices


and de ne the matrix St = Rt;1

 1 ;'T S   0

t t Q=

0 St
St ;  St q :
1

+1

+1

The updating of the last column of Equation (2.15) is achieved by taking the
transpose of the columns
; y Y~ T   Q = ; Y~ T    :
t
t
t+1 t
t+1
Note that Q, as distinguished from Q in Equation (2.15), does not a ect t .
Finally, we calculate
q
t+1 = 2 + 2t
and update our parameters as
^t+1 = St+1 Y~t+1 :
18

LINPACK inverse updating algorithm


1. Perform the orthogonal transformation

0 0
@ Y~tT


+1
St+1 ; 1 St+1 q

1 0 1 ;'T S 1
t t
A := @ yt
Y~tT A Q:
+1

St

2. Calculate t+1 := 2 + 2t .


3. Update the parameters ^t+1 := St+1 Y~t+1 .

2
For the downdating of R;1 we will present two algorithms directly related
to the ones in Section 2.4.2. To derive the LINPACK inverse downdating
algorithm we recall the QR decomposition (2.19), which shortly can be written

U^ T

q R  1 ' 
t
:
0 = 0 Rt
1

+1

To downdate the inverse of R we simply invert the matrices:

and, fourthermore,


St ; 1 St q

U^ =

 1 ;'T S 
t
;
0 St
+1

+1

; Y~ T ^  U^ = ; y Y~ T  :
t
t
1

The downdated parameters are given by


^t+1 = St+1 Y~t+1 :

+1

An alternative expression for the parameter estimation can be obtained if the


derivations are slightly rearranged, cf. Bjorck et al. [5], giving
^ = ^ ; ^ S q:
t+1

This algorithm is presented below together with the CSNE inverse downdating algorithm (Corrected Semi-Normal Equations), that uses the seminormal equations with one step of iterative re nement.

19

LINPACK inverse downdating algorithm


1. Compute vectors q := St '1 and v := St q and scalars

:= 1 ; kqk22
and

T^

^ := y1 ; '1 t :

2. Use n Givens rotations for the transformation


0 y Y~ T 1 0 Y~ T ^
@ 11 ;'T1t+1St+1 A := @ 0t 1
0
St+1
St ; 1 v

1
A Q:^

3. Calculate t+1 := 2t ; ^2 .


4. Downdate the parameters ^t+1 := St+1 Y~t+1 = ^t ; ^ St v.

CSNE inverse downdating algorithm


1. Compute vectors q := St '1 and v := St q.
2. Use one step of iterative re nement to compute the following vectors and
scalars:

r
q
q
v
v
r
and

:=
:=
:=
:=
:=
:=

e1 ; t St q;
StT Tt r;
q + q;
St q;
v + v;
r ; t v

:= krk2 :

3. If t = 0 the system is consistent and is treated as a special case with


:= 0
and

t+1 := ^ = t
20

else compute the normalized residual and use another step of iterative
re nement to calculate
e := (Yt ; t ^t )=t ;
:= (y1 ; 'T1 ^t )=t ;
r := r ; e;
 := eT r;
:= +  ;
r := r ;  e;
t+1 := krk2 t =
and

^ := t =

4. Use n Givens rotations for the transformation


0 y Y~ T 1 0 Y~ T ^
@ 11 ;'T1t+1St+1 A := @ 0t 1
0
St+1
St ; 1 v

1
A Q:^

5. Downdate the parameters ^t+1 := St+1 Y~t+1 = ^t ; ^ St v.

2
To modify R;1 rather than R has the advantage that the triangular solves
are avoided, and the number of operations is thus decreased. And, furtermore,
in our application we have no need of knowing the QR decomposition of 
explicitly, so in this sense it is not advantageous to use the methods of Sections 2.4.1 and 2.4.2. But then we have forgotten the most important quality
of the algorithms: the numerical stability. Whereas the error analysis of the
algorithms using modi cation of the QR factorization is rather straightforward
and well established, the ditto for the inverse modi cation methods have not
been fully examined (for further discussion, see Chapter 3).

21

Chapter 3

Error Analysis
When analysing the e ect of round-o errors, there are three aspects that have
to be studied:
 What bounds can be obtained for the errors performed in step t?
 How do errors performed in step t propagate in subsequent steps?
 How do errors performed in di erent steps interact and accumulate?
cf. van Dooren and Verhaegen [8]. In the literature, however, mostly the second
item has been studied so far.
Error analysis in numerical linear algebra can be very complicated, usually
requiring tedious studies to fully examine how errors emerge and propagate for
a certain algorithm. An exampel is Gau elimination, which up to this date
still is subject of intensive research. Therefore, we do not hope to fully examine
the numerical properties of the algorithms presented in Chapter 2, but only to
present some results of the error analyses made by e.g. van Dooren and Verhaegen, Bjorck et al. and Alexander et al., and then by numerical experiments
make certain conclusions about the algorithms.

3.1 Updating of the Covariance and Information Matrix


Consider the RLS Algorithm in Section 2.2.1. Since the algorithms from Section 2.2 all are inherently equivalent to this updating formula we will rstly only
analyse this algorithm, cf. van Dooren and Verhaegen [8]. The rst step is to
analyse the second item above: How do errors performed in step t propagate in
subsequent steps? Let Pt , ^t and so forth denote the accumulated errors in
step t. The calculated factor t may then be expressed as
t = 2 + 'Tt (Pt + Pt )'t = t + 'Tt Pt 't :
22

By the assumption that

 = j'Tt Pt 't j  j t j

the inverse of this factor is given by

;t 1 = ( t +  t );1 = ;t 1 + ;t 1 'Tt Pt 't ;t 1 + O(2 ):


The error in the Kalman gain can now be written

Kt = ;t 1 Ft Pt 't + O(2 );


where

Ft = I ; ;t 1 Pt 't 'Tt = I ; Kt'Tt :


This gives the propagated errors Pt+1 and ^t+1 as
Pt+1 = Ft Pt FtT + O(2 )
and

^t+1 = Ft (^t + ;t 1 Pt 't "t ) + O(2 );


where "t is the prediction error. The matrix Ft is an important factor in the
error propagation, and the error propagation for P can roughly be expressed as

kPt k < kFt k kPt k = t kPt k :


(3.1)
Unfortunately, we see that t = kI ;Kt 'Tt k  1, implying that the errors might
+1 2

2
2

increase. This is of course alarming, but on the other hand we know that the
RLS algorithm works in practice if the problem is not too poorly conditioned,
which implies that the bounds of (3.1) are too pessimistic.
Although the algorithms of Sections 2.2.2 and 2.2.3 di er from the RLS
algorithm, the algorithms are inherently equivalent, and the equations above
hold. For the SRIF algorithm we do not calculate ^t+1 from ^t directly, and we
would expect no propagation of errors between them. Yet, such a propagation
is present via the vectors ^t+1 and ^t and the relation
^t+1 = (Lt+1 + Lt+1 );T ^t+1 :
In fact, the SRIF algorithm is inherently equivalent to an update of P and ^,
and the equations above hold for this algorithm as well, cf. van Dooren and
Verhaegen [8].
The main reason for using factorized forms of the covariance matrix (or the
information matrix) is that one is bound to lose accuracy whenever P (or P ;1 )
is calculated explicitly. In [8] it is shown that this is valid, as far as the error
analysis is concerned, only for P (or P ;1 ) but not for K or ^. And clearly, since
we are mainly interested in a good parameter estimate, this implies that we do
not necessarily gain accuracy by using the algorithms that involve factorized
23

forms instead of the RLS algorithm. What we know, however, is that we will
lose eciency when these algorithms are used.
The error analysis concerning items one and three above, i.e. What bounds
can be obtained for the errors performed in step t? and How do errors performed
in di erent steps interact and accumulate?, are more dicult to derive and will
only be presented here with approximate bounds for the RLS algorithm, the
Givens 1 algorithm and the SRIF algorithm, cf. van Dooren and Verhaegen [8]
(k  k denotes the 2-norm throughout).
1. RLS algorithm

kPt k <   kPt k


kKtk <   kKt k

k^t k <   kFt k  k^t k + kKtk  kytk


+ kKtk  k't k  k^tk + kyt k
+1

+1

+1

2. Givens 1 algorithm

kLt k <   2  kLt k


kKtk <   j t j; =  (2  kLt k + j tj  kKt k)
k^t k <   kFt k  k^t k + kKtk  kytk


+ kKt k  k't k  k^t k + kyt k
+1

+1

1 2

+1

+1

3. SRIF algorithm

kLt k <   2  kLt k



;

k^t k < kLt k  1 +  (Lt )  jrj + (Lt )  k^t k
+1

+1

+1

+1

+1

+1

+1

where i are constants close to the machine precision u, () is the condition
number and r is the factor from Equation (2.13).
The di erences are quite clear and can be left to the reader to observe. We
will, however, make the following comments:
 The expression for the error in the parameters is almost the same for
the RLS and Givens 1 algorithms, although the errors in the covariance
matrix and the Kalman gain di er. This implies that accuracy might not
necessarily be gained by using factorized forms of the covariance matrix
as far as the parameters are concerned.
 Regarding the errors in the parameters, there are factors that are not
multiplied by an i in all three algorithms.
24

 The appearance of the parameter error in the SRIF algorithm di ers sig-

ni cantly from the other two, implying that other stability properties can
be expected.

3.2 Updating of the QR Decomposition


If the algorithms in the preceding section are the ones of control theorists, the
algorithms of this section are those of the numerical analysts. Therefore the
error analysis for these algorithms involves terms like forward-, backward- and
mixed stability, terms often used in numerical analysis.
Assume that we have a problem F and an algorithm A, and say that we
have an exact solution
F (x) = y:
(3.2)
If we perturb x by a small perturbation x, we obtain

F (x + x) = y + y:
Provided that jxj   ) jyj < , we say that we have forward stability.

A more powerful tool for the stability analysis of problems and algorithms
is backward stabilty. For the problem (3.2) it can be de ned as follows: If we
perturb the data y by y, we obtain a perturbed solution x + x satisfying (3.2).
If jyj   ) jxj < , the problem is backward stable. Note that this is a
condition on the inverse F ;1 ; the solution must depend continuously on the
data y.
This technique is also used in the analysis of a computational algorithm A for
the solution of (3.2). The algorithm can be thought of as an approximate inverse
to F . Given the data (F ; y) of the problem (3.2), it produces an approximate
solution x~:
x~ = A(F ; y):
This approximate solution satis es some equation
F~(~x) = y~;
and the backward stability approach tries to ensure that this perturbed problem
is \nearby" the original problem. In other words, the algorithm is amenable to a
backward stability analysis, if it produces the exact solution of a nearby problem.
Backward stability can then be expressed as
jF~ ; Fj < 1 ^ jy ; y~j < 2 ) jx ; x~j < :
It is common to ask for uniform continuity, i.e.  should be proportional to the
i . In addition, the typical error analysis is often carried out in such a way that
we require  and i to be expressed in terms of problem parameters (i.e. the
25

\data" F and y, ususally in addition F ;1 ) and some unit round-o u related


to the working precision of the computer.
In the least-squares problem, Equation (3.2) is represented by the overdetermined system
Ax = b;
or, in the regression problem, by
t  = Yt :
The term mixed stable
; will be used when we talk about the QR decomposition
of the data matrix t Yt . We say that the algorithm is mixed stable if the
calculated factors; Rt , Y~t and t are close
 to the corresponding quantities in the
exact factor of t + t Yt + Yt , where t and Yt are small.
Backward stability is the strongest type of stability we can require of an
algorithm. Amongst the algorithms in Section 2.4 it is however only the LINPACK updating algorithm that is backward stable, cf. Paige [19] and Bjorck et
al. [5].
As mentioned in Chapters 1 and 2, the downdating is numerically a more
dicult problem than updating, and we can not hope to obtain backward stability for downdating algorithms. When discarding an observation the singular
values of the downdated R-matrix, R~ relates to the ones of the old R as
1  ~1  2  ~2      n  ~n  0;
i.e. R~ may become singular although R has full column rank. Stewart [26] has
shown that the vector q = R;T ' (see Section 2.4) plays an important role in
identifying possible ill-conditioning in the downdating problem. In fact, Stewart
shows that
1  p 1 ;
~n
1 ; kqk22
i.e. whenever kqk2 is close to 1, the problem must be poorly conditioned.
If kqk2 > 1, then R~ is singular. (Note the resemblance to the conditioning
test in the Hybrid algorithm in Section 2.4.2.) This shows that downdating can
be a very sensitive problem. In many applications, however, the downdating
is always preceded by an updating, which lessens the ill-conditioning e ect of
the downdating, since in updating the singular values relate to each other by
~i  i , cf. Alexander et al. [1].
So, what can be said about the stability of the downdating algorithms in
Section 2.4.2? Paige [19] has shown that the method where both the Q- and
R-matrices are downdated is mixed stable. For the LINPACK downdating algorithm Pan [20] has given an error bound for the R-matrix: Let > 0 be small
enough so that the factorization
R~ T ()R~ () = (R + E )T (R + E ) ; (' + f )(' + f )T
exists for all  2 (; ; ), where E is an upper triangular matrix. Then we have
26

kR~ () ; R~ k 
kR~ k
2

 (R)C
2

2n(n =

1 2


k
f
k
k
E
k
=
C + 1) k' k + (2n C + 2n + 1) kRk jj + O( );
2

3 2

1 2

where () is the condition number and

C = p kqk2 2 :
1 ; kqk2
Once again we see that the closeness of kqk to 1 is important via the factor C
2

to indicate the ill-conditioning of the downdating problem.


For the CSNE downdating algorithm Bjorck et al. [5] has derived an error
bound for the calculated parameters ^. Recall the corrected semi-normal equations (2.18) and assume that the computed R-matrix R is such that there exists
an exactly orthogonal matrix Q^ such that

 + E = Q^ R;
with kE kF uc1 kkF , where the index F indicates the Frobenius norm and u
is the round-o unit. Then, if uc1 n1=2  < 1, we have
k ; ^k2 





k
r
k
k
Y
k
=
=
u  c kk + n p kk + n  nkk + p kk + n kk ;
where  = c  , c = 2n = (c + n) and c  2n = (c +2n + p=2). According to
 

1 2

1 2

1 2

1 2

1 2

Bjorck et al., if  < 1, the forward error will not be worse than for a backward
stable method.

27

Chapter 4

Time-Varying Parameter
Estimation
In almost any system where recursive least-squares algorithms are used, the parameters are assumed to vary with time. If this were not the case, the parameter
estimation would simply be a tuning process, which is not really a demanding
problem.
If the parameters vary with time, we have to discount old observations in
some way, or otherwise the result will be poor estimates. The two major categories of discarding old observations are forgetting and downdating. In this
chapter we will discuss di erent ways of dealing with the problem of timevarying parameters.
Of course we have to make certain assumptions about how the parameters
can vary. To be able to produce good parameter estimates, we can only allow
the parameters to vary slowly and/or seldom compared with the time constants
of the system, cf. Hagglund [15]. If we for instance allow the parameters to vary
quickly, noise can be interpreted as a parameter change and we will obtain poor
estimates. The parameter variations can now be divided into two types: the
slow parameter changes, where the problem is to determine \the right amount
of forgetting", and the large parameter changes, where the problem is to detect
when a parameter change has occured.
The algorithms based on downdating are automatically adapted to timevarying parameters, and therefore in this chapter we will mainly discuss forgetting. The most common type of forgetting in control theory is exponential
forgetting, where the data are subject to time-dependent weights, and the leastsquares problem becomes
min


Xt
i=1

t;i yi ; 'Ti  2 ;
28

where 0 <  < 1. The exponential forgetting is appealing due to its obvious
interpretation and that it easily can be applied to the recursive algorithms
that use updating of the covariance matrix, see Section 2.2. The exponential
forgetting does however have disadvantages. If the forgetting factor  is chosen
too large, it will lead to slow convergence of the estimates, and if  is chosen
too small, it will lead to \noisy" estimates. Another problem that may occur
during times of low excitation is estimator wind-up, i.e. whenever the vector
Pt;1 't = 0 the covariance matrix will increase exponentially.
These problems have led to a number of more or less successful alternative
types of forgetting. Fortesque et al. [11] and Sano and Wellstead [23] have made
attempts to use a time-variable forgetting factor that depends on the prediction
error, i.e. (t)  "(t);1 . Irving [16] proposed the use of a forgetting factor that
keeps the trace of the covariance matrix constant to eliminate the estimator
wind-up. The estimation can also be restarted repeatedly, instead of using a
forgetting factor. This method is successfully practised by Betz and Evans [3],
where the covariance matrix is reset to a large matrix repeatedly. Another way
to deal with the trade-o between fast adaption and parameter uctuations
has been suggested by Bobrow and Murray [6], where a desire to minimize the
parameter uctuations is added to the cost function. This is achieved by a stable
polynomial in the forward shift operator, which determines how the parameters
should decay. The method does however have the drawback that an inertia is
added to the system.
We will now present an important alternative suggested by Hagglund [15]
called directional forgetting. One problem with exponential forgetting is that
although new information only arrives in the '-direction, old information is removed in all directions. In directional forgetting, however, information will only
be removed in the '-direction. The algorithm given in [15] gives a covariance
matrix proportional to the identity matrix, i.e.
lim P = a  I;
(4.1)
t!1 t
where the diagonal elements of Pt may be interpreted as approximations of the
variances of the corresponding parameters.
To derive the new algorithm a more general de nition of the covariance
matrix, or rather the information matrix, is needed. Let the updating of the
information matrix be given by
Pt;1 = Tt Vt;1 t = Tt;1 Vt;;11jt t;1 + 't vt;1 'Tt ;
where
 V ;1 0 
;
1
Vt = t;01jt v;1 :
t
If information only shall be discarded in the '-direction, Vt;;11jt is chosen so that
Tt;1 Vt;;11jt t;1 = Tt;1 Vt;;11 t;1 ; (t)'t 'Tt ;
29

where (t) is a scalar. The P-matrix is then updated as


T
Pt := Pt;1 ; ; ;1 Pt;1 ';t '1t Pt;T1
vt ; (t) + 't Pt;1 't

(4.2)

or equivalently, the information matrix is updated as

;

Pt;1 := Pt;;11 + vt;1 ; (t) 't 'Tt :

The information at time t can be expressed as a sum of three terms: the information at time t ; 1, the old information which is removed and the new
information. The corresponding expression for the exponential forgetting is

Pt;1 := Pt;;11 ; (1 ; )Pt;;11 + vt;1 't 'Tt :


Here, we can see what we stated before, that in directional forgetting we only
forget in the '-direction, represented by the factor (t)'t 'Tt . In exponential
forgetting the corresponding factor is (1 ; )Pt;;11 , which leads to a forgetting
in all directions.
The directional forgetting will, appart from Equation (4.2), not a ect the
RLS algorithm of Section 2.2 more than that the Kalman gain now is given by

Kt := Pvt 't :
t

The remaining issue is how to choose the factor (t). In [15] this is derived
by using (4.1) and the fact that (t) should be non-negative.
In the case of large and seldom occuring parameter changes there are mainly
two methods that have been put into practice. Both use the fact that whenever a
large parameter change is detected, the covariance matrix should be increased.
The rst method is to decrease the forgetting factor . The growth of Pt is
then nearly exponential. The second method is to add a constant times the
unit matrix to the P -matrix in which case Pt is increased instantaneously. The
updating of the covariance matrix is then given by

;1 't 'Tt Pt;1 + (t)  I;


Pt := 1 Pt;1 ; Pt+
'Tt Pt;1 't

where (t) = 0 until a parameter change has been detected when it becomes a
non-negative number. How this number is chosen and, the most important issue,
how the parameter change is detected is discussed in detail in Hagglund [15].

30

Chapter 5

Test Results
In this chapter we will apply some of the algorithms presented in the previous
chapters to ve test examples. As we could see in e.g. Chapter 3, there are some
di erences between the way a numerical analyst and a control theorist tackles
the recursive least-squares problem. The test examples will therefore be divided
into two sections, of which the rst deals with an example taken from control
theory while the second deals with an example taken from numerical analysis.

5.1 Identi cation of Ship Steering Dynamics


The fundamental example to the test examples in this section is taken from a
report by Kallstrom and 
Astrom [17], which deals with identi cation of ship
steering dynamics. We will not put any weight at the real steering dynamics of
the boat (which in this case is a freighter called the Atlantic Song), but simply
nd reliance in the fact that the discrete time model that we will use later on
has a connection to reality.
In [17] one nds that in discrete time the ship's steering dynamics can be
described by a second order system (the noise terms have been omitted)

yt + a1 yt;1 + a2 yt;2 = b1 ut;1 + b2 ut;2 ;

(5.1)

where ut is the input signal (the rudder angle) and yt is the output signal (the
heading angle). In [17] the parameters are estimated by using free steering
experiments on the freighter itself, and are given by

a1
a2
b1
b2

=
=
=
=

;1:60

0:61
;0:161
;0:285:
31

The regression model to this problem is then given by


yt = 'Tt ;
where
'Tt = ( ;yt;1 ;yt;2 ut;1 ut;2 );
and
T = ( a1 a2 b1 b2 ):
The goal is now to produce test examples that make it dicult for the
algorithms to estimate the parameters . This can be done in many ways,
among which we have chosen four. This will be discribed in Examples 5.1{5.4
below.

Example 5.1

Poorly conditioned problems can be generated if the columns of the matrix 


in Equation (2.3) are linearly dependent. This can be obtained for example
if we generate data with the second order system (5.1), but try to estimate
parameters of a third order system, i.e.
yt + a1 yt;1 + a2 yt;2 + a3 yt;3 = b1 ut;1 + b2 ut;2 + b3 ut;3 :
The problem with this type of examples is that we have no \true" parameter
values, so there may be some diculties in telling how well an algorithm can
cope with this type of problems.
The input signal in this example will be a Pseudo-Random Binary Sequence
(PRBS), i.e. a sequence with quasi white noise properties where the rudder angle
is switched between ;10 and 10 degrees, see Figure 5.1.

Example 5.2

Another possibility to obtain a poorly conditioned problem is to use an input


signal that is not Persistently Exciting (PE).
A square summable signal u is PE of order n i
1
tlim
!1 t

Xt

k=1

P (q)u(k)

> 0;

where q is the forward shift operator, for all nonzero polynomials P of degree
n ; 1 or less, cf. Wittenmark and 
Astrom [27]. In our case we have a second
order system, so to get an input signal that is not PE we simply let ut be a
step, since a step is PE of order 1.

32

Example 5.3
The problem with particularly the RLS algorithm is that the elements of the
-matrix are squared when the covariance matrix is calculated. This means
that if we for example have an input signal containing one large constant term
and one small time-varying term, problems may occur when the RLS algorithm
is used.
In this example we will use the same input signal as in Example 5.1 but will
add a constant term = 108 . Since we are using MATLAB's double precission
calculations in all test examples, this is bound to cause problems, at least when
the RLS algorithm is used. The input and output signals are thus transformed
as
ut := ut +
(1) ;
yt := yt + B
A(1)
where A(q) = q2 + a1 q + a2 and B (q) = b1 q + b2 .

Example 5.4

In this example we will examine what happens when we have a time-varying


parameter. We let the input signal be a PRBS as in Example 5.1 and let the
parameter a2 vary as

a2 (t) = a2 (0) + Tt (1 ; a2 (0));

where a2 (0) is the value of a2 in the previous examples and t 2 (0; T ). This
represents a slow variation in the system. A fast variation in the system is
obtained if we add some noise to the model (5.1). The discrete time model will
now have the following appearance:
yt + a1 yt;1 + a2 (t)yt;2 = b1 ut;1 + b2 ut;2 + et ;
where ei 2 N (0; 2) is white noise.

We have chosen ve of the algorithms given in Chapter 2: The RLS algorithm, Peterka's algorithm, the Givens algorithm, the Hybrid algorithm and
the Inverse Hybrid algorithm. For each of the rst three test examples we have
plotted the input- and output signals followed by plots of the factor1
q = kR;T 'k2 = kL'k2;
1 For Peterka's algorithm L here stands for LD 1=2 .
33

u (rudder angle)

10

-10
0

20

40

60

80

100

120

140

160

180

200

20

40

60

80

100

120

140

160

180

200

y (heading angle)

100
50
0
-50
-100

Figure 5.1: The input signal u and output signal y in Example 5.1.
cf. Section 3.2, the parameter estimates ^ and the prediction error " for each
algorithm. (For some algorithms the results are indistinguishable, and we have
in those cases only presented a single plot.)
The initial values for the parameters and matrices in the algorithms are given
in di erent ways: for the algorithms that use updating of the covariance matrix
the parameters are simply set to 0 and the covariance matrix is initialized as
P0 =   I , where  is chosen to be a large number to minimize the in uence on
Pt , since according to Equation (2.5), Pt depends on P0 as

;

Pt = Tt t + P0;1 ;1 :

For the Hybrid algorithms the initial values are given by solving an overdetermined system of the size of the window. To choose window size can sometimes
be a problem, as we will see later on. In the examples where nothing else is
mentioned, a window of length 20 will be used.

Example 5.1 (cont.) Figures 5.1{5.11


As mentioned before, we have no \true" parameter values in this example,
so the obtained parameter estimates are not important as such. (The ones
achieved by Peterka's algorithm and the Givens and RLS algorithms are however
close to those obtained in the experiments in [17] when a third order system
is estimated). We may however observe how the factor q and the prediction
34

error " behave. And furthermore, it would be preferable if a realization with


the obtained estimates gives an output signal ym that is close to the real output
y.
In the parameter estimate plots the di erent estimates are shown as follows:
^1 (a1 ) solid
^2 (a2 ) dashed
^3 (a3 ) dotted
^4 (b1 ) dashdot
^5 (b2 ) point
^6 (b3 ) dotpoint:
The results for Peterka's algorithm and the Givens algorithm are identical, see
Figures 5.2{5.3: the factor q is, except in the initialization period, well below
1 and the estimates nd values rather quickly that they hold on to throughout
the experiment. The prediction error is small and is dependent on the initial
value of the covariance matrix, see Figure 5.5. In this experiment we have used
 = 1016. For the RLS algorithm the covariance becomes non-positive de nite,
i.e. 'Tt Pt 't  0, and consequently the factor q cannot be calculated, since there
is no matrix Lt . Despite this fact the parameter estimates are good and the
prediction error is of the same magnitude as for Peterka's algorithm and the
Givens algorithm, see Figure 5.4.
For the Hybrid algorithms we nd that q is closer to 1 and that the estimates
are very unsteady, see Figures 5.6{5.9. Despite this fact the prediction error is
small (and for the non-inverse Hybrid algorithm very small). One reason for
this is an e ect that we can refer to as equivalent systems, i.e. if the real system
is given by
(z ) u ;
yt = B
A(z ) t
where zi are the poles and zeros of the system, then we have an equivalent
system given by
(z + a)
yt = B
A(z + a) ut :

Since we are estimating a third order system, this will in our example show up
as two constant poles and one constant zero, which can be seen in Figure 5.10{
5.11. The consequence of this is unsteady estimates. It may be interesting to
know that a realization of the the nal estimates of the Hybrid algorithm gives
an output signal ym that is close to y. This is however not the case for the
Inverse Hybrid algorithm.
So, we cannot say if the estimates obtained by the Hybrid algorithm are
worse than e.g. the ones obtained by Peterka's algorithm. But if we have to
choose it would certainly be preferable to have moderately varying estimates.

35

1
0.9
0.8
0.7

0.6
0.5
0.4
0.3
0.2
0.1
0

20

40

60

80

100

120

140

160

180

200

Figure 5.2: The factor q = kL'k2 for Peterka's algorithm and the Givens algorithm in Example 5.1.
Parameter estimates

0.5

0
...............................................................................................................................................................................................
.....
. ...............................................................................................................................................................................................

-0.5

-1

Prediction error

1 x10

20

40

60

80

100

120

140

160

180

200

20

40

60

80

100

120

140

160

180

200

-12

0.5
0
-0.5
-1

Figure 5.3: Parameter estimates ^ and prediction error " for Peterka's algorithm
and the Givens algorithm in Example 5.1.
36

Parameter estimates

0.5

0
...............................................................................................................................................................................................
.....
. ...............................................................................................................................................................................................

-0.5

-1

Prediction error

1 x10

20

40

60

80

100

120

140

160

180

200

20

40

60

80

100

120

140

160

180

200

-12

0.5
0
-0.5
-1

Figure 5.4: Parameter estimates ^ and prediction error " for the RLS algorithm
in Example 5.1.
0
+

-2
+

-4

log(||eps||)

-6
+

-8
+

-10
+

-12
+

-14

10

12

14

16

18

20

log(rho)

Figure 5.5: log(k"k) vs. log(), where  is the factor in P0 =   I , for Peterka's
algorithm in Example 5.1.
37

1
0.9
0.8
0.7

0.6
0.5
0.4
0.3
0.2
0.1
0

20

40

60

80

100

120

140

160

180

200

Figure 5.6: The factor q = kR;T 'k2 for the Hybrid algorithm in Example 5.1.

Parameter estimates

10
5
0

.. .......
................................................................
..........
...
.....
.................................................................................................................................................................
.
.

-5
-10

Prediction error

1 x10

20

40

60

80

100

120

140

160

180

200

20

40

60

80

100

120

140

160

180

200

-12

0.5
0
-0.5
-1

Figure 5.7: Parameter estimates ^ and prediction error " for the Hybrid algorithm in Example 5.1.
38

1
0.9
0.8
0.7

0.6
0.5
0.4
0.3
0.2
0.1
0

20

40

60

80

100

120

140

160

180

200

Figure 5.8: The factor q = kS T 'k2 for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.9: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.10: The zeros of the polynomials $A(z)$ and $B(z)$ respectively for the Hybrid algorithm in Example 5.1.

Figure 5.11: The zeros of the polynomials $A(z)$ and $B(z)$ respectively for the Inverse Hybrid algorithm in Example 5.1.
Figure 5.12: The input signal $u$ (rudder angle) and output signal $y$ (heading angle) in Example 5.2.

Example 5.2 (cont.) Figures 5.12–5.18


As distinguished from the preceding example, we do here have true parameter values, but due to the non-persistently exciting input signal, see Figure 5.12, the algorithms have problems in finding the correct estimates.
In the parameter estimate plots, the different estimates in this and the succeeding examples are shown as follows:
$\hat\theta_1$ ($a_1$): solid,
$\hat\theta_2$ ($a_2$): dashed,
$\hat\theta_3$ ($b_1$): dotted,
$\hat\theta_4$ ($b_2$): dash-dotted.
The Inverse Hybrid algorithm is clearly unstable in this case. The Hybrid algorithm, on the other hand, finds the true values despite a rather noisy behaviour before the step intervenes, see Figures 5.13–5.16. In this example the choice of window length is a crucial item. We have chosen a window of length 14, which gives good parameter estimates and a small prediction error for the Hybrid algorithm. A window of length 18 would also give acceptable results, but windows with for example lengths 13, 15, 16 and 17 lead to $q > 1$, and the estimation breaks down. For the Inverse Hybrid algorithm we cannot find a window length that removes the instability. This indicates that these algorithms are not very robust.
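The breakdown at $q > 1$ can be guarded against explicitly before a row is removed from the window. Below is a minimal sketch of such a check, assuming a sliding-window estimator that keeps the upper-triangular factor $R$ of the current data window; the function and variable names are illustrative and not taken from the report.

```python
import numpy as np
from scipy.linalg import solve_triangular

def can_downdate(R, phi, margin=1e-12):
    """Check the downdating condition before removing the row phi.

    R is upper triangular with R^T R = A^T A for the current window.
    Downdating is only well posed if q = ||R^{-T} phi||_2 < 1; values of q
    close to 1 signal ill-conditioning, and q >= 1 means the downdated
    matrix would no longer be positive definite.
    """
    u = solve_triangular(R, phi, trans='T', lower=False)  # solves R^T u = phi
    q = np.linalg.norm(u)
    return q < 1.0 - margin, q
```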
Figure 5.13: The factor $q = \|R^{-T}\varphi\|_2$ for the Hybrid algorithm in Example 5.2.

Figure 5.14: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.2.
Figure 5.15: The factor $q = \|S^T\varphi\|_2$ for the Inverse Hybrid algorithm in Example 5.2.
Figure 5.16: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.2.
Figure 5.17: The factor $q = \|L\varphi\|_2$ for Peterka's algorithm and the Givens algorithm in Example 5.2.
Figure 5.18: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm, the RLS algorithm and the Givens algorithm in Example 5.2.
Figure 5.19: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the RLS algorithm in Example 5.3.
For Peterka's algorithm and the Givens algorithm we can again observe a more stable behaviour. The estimates converge quickly to some values ($\hat\theta_3$ ($b_1$) and $\hat\theta_4$ ($b_2$) converge to erroneous values). As the step intervenes, the prediction error increases and $q$ immediately goes to 1, but as the estimates instantaneously converge to new (and correct) values, $\varepsilon$ and $q$ decrease again. See Figures 5.17–5.18. For the RLS algorithm the covariance matrix once again becomes non-positive definite, but the estimates and the prediction error are still close to the results of Peterka's algorithm and the Givens algorithm.

Example 5.3 (cont.) Figures 5.19–5.21


The interesting feature of this test example is the difference between the RLS algorithm and Peterka's and Givens' algorithms. The elements of the matrix $\Phi_t$ are approximately $p = 10^8 + \delta$, where $\delta$ is of the magnitude 10. In the RLS algorithm these factors are squared as the covariance matrix is formed, giving $p^2 = 10^{16} + 2\delta \cdot 10^8 + \delta^2$. Since we are working with MATLAB double precision calculations, the factor $\delta^2$ will be seriously truncated. The consequence of this, apart from a non-positive definite covariance matrix, is unsteady estimates. The estimates are shown in Figure 5.19. For Peterka's algorithm and the Givens algorithm this effect will not occur, which can be seen in Figures 5.20–5.21.
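The truncation can be illustrated directly in double precision. The sketch below uses numbers of the same magnitudes as above and compares the resolution of a double close to $p$ with the resolution close to $p^2$; the variable names are ours, not the report's.

```python
import numpy as np

p_large, delta = 1.0e8, 10.0
p = p_large + delta      # typical element of Phi_t: 1e8 plus a part of size ~10

# Resolution (spacing between adjacent doubles) near p and near p*p:
print(np.spacing(p))     # ~1.5e-8 -> delta is carried with about 9 correct digits
print(np.spacing(p * p)) # ~2.0    -> delta**2 = 100 keeps only about 2 digits

# Squaring p, as happens implicitly when the covariance/information matrix is
# formed, therefore pushes the delta**2 term down to the last digits, and
# differences of such squared quantities can lose the information in delta entirely.
```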
Figure 5.20: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm in Example 5.3.
Figure 5.21: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Givens algorithm in Example 5.3.
Figure 5.22: The input signal $u$ (rudder angle) and output signal $y$ (heading angle) in Example 5.4.
The Hybrid algorithm tackles this test example successfully, while the Inverse
Hybrid algorithm fails again.

Example 5.4 (cont.) Figures 5.22–5.31


When we are dealing with a time-varying system we have to make a trade-off between fast adaptation and stable estimates. In the covariance matrix updating algorithms this is represented by the choice of the forgetting factor $\lambda$, and in the QR decomposition algorithms it is represented by the length of the window. If we choose a $\lambda$ that is too large or a window that is too long, the result will be slow convergence because the algorithm is not forgetting fast enough. On the other hand, if we choose a $\lambda$ that is too small or a window that is too short, the result will be noisy estimates since the algorithm interprets the white noise as parameter changes. To illustrate this we have plotted the square-sum of the prediction errors vs. the forgetting factor and the window length respectively, see Figures 5.23–5.24. In this example a forgetting factor of about 0.9 and a window length between 20 and 30 would be appropriate. We have used $\lambda = 0.9$ and a window of length 20.
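To make the role of $\lambda$ concrete, here is a minimal sketch of a textbook covariance-form RLS update with exponential forgetting; it is not the report's exact formulation, and the initial values and noise levels are illustrative only.

```python
import numpy as np

def rls_forgetting_step(theta, P, phi, y, lam=0.9):
    """One standard RLS update with exponential forgetting factor lam."""
    eps = y - phi @ theta                  # prediction error
    denom = lam + phi @ P @ phi            # lam + phi^T P phi
    K = P @ phi / denom                    # gain vector
    theta = theta + K * eps
    P = (P - np.outer(K, phi @ P)) / lam   # dividing by lam "forgets" old data
    return theta, P, eps

# Small lam adapts quickly but amplifies noise; lam close to 1 is smooth but slow.
rng = np.random.default_rng(0)
theta, P = np.zeros(2), 1e4 * np.eye(2)
true_theta = np.array([1.0, -0.5])
for t in range(200):
    phi = rng.normal(size=2)
    y = phi @ true_theta + 0.1 * rng.normal()
    theta, P, _ = rls_forgetting_step(theta, P, phi, y, lam=0.9)
print(theta)                               # close to true_theta
```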

Figure 5.23: $\|\varepsilon\|_2$ vs. $\ln(\lambda)$, where $\lambda \in \{0.5, 0.7, 0.9, 0.95, 0.99, 0.995, 0.999, 1\}$, for Peterka's algorithm in Example 5.4.
Figure 5.24: $\|\varepsilon\|_2$ vs. the window length for the Hybrid algorithm in Example 5.4.
Figure 5.25: The factor $q = \|L\varphi\|_2$ for Peterka's algorithm and the Givens algorithm in Example 5.4.
The results for the different algorithms are shown in Figures 5.25–5.31, and will not be commented upon further.

5.2 Hilbert Matrix


Example 5.5 Figures 5.32–5.35

This test example is taken from a report by Björck et al. [5] and is constructed in the following way: generate an overdetermined system
\[ \Phi \theta = Y, \]
where $\theta$ is a $5 \times 1$ vector with ones as its elements,
\[ \Phi = \begin{pmatrix} H \\ \bar H \end{pmatrix} + \Delta, \]
where $H$ is a $25 \times 5$ Hilbert matrix, $\bar H$ is the same matrix but with the rows in reversed order and $\Delta$ is a $50 \times 5$ matrix with elements uniformly distributed in $(0, 10^{-9})$, and
\[ Y = \Phi \theta + \Delta Y, \]
where $\Delta Y$ is a $50 \times 1$ vector with elements uniformly distributed in $(0, 10^{-6})$.
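Under the assumption that the construction is exactly as described above, it can be reproduced with a few lines of NumPy; the variable names mirror the symbols in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

i = np.arange(1, 26).reshape(-1, 1)            # row indices 1..25
j = np.arange(1, 6).reshape(1, -1)             # column indices 1..5
H = 1.0 / (i + j - 1.0)                        # 25 x 5 Hilbert matrix
H_bar = H[::-1, :]                             # same matrix with rows reversed

Delta = rng.uniform(0.0, 1e-9, size=(50, 5))   # perturbation of the data matrix
Phi = np.vstack([H, H_bar]) + Delta            # the 50 x 5 regressor matrix

theta = np.ones(5)                             # true parameters: all ones
dY = rng.uniform(0.0, 1e-6, size=50)           # perturbation of the right-hand side
Y = Phi @ theta + dY
```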

Figure 5.26: The factor $q = \varphi_t^T P_t \varphi_t$ for the RLS algorithm in Example 5.4.

Figure 5.27: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm, the RLS algorithm and the Givens algorithm in Example 5.4.
Figure 5.28: The factor $q = \|R^{-T}\varphi\|_2$ for the Hybrid algorithm in Example 5.4.

Figure 5.29: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.4.
Figure 5.30: The factor $q = \|S^T\varphi\|_2$ for the Inverse Hybrid algorithm in Example 5.4.
Figure 5.31: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Inverse Hybrid algorithm in Example 5.4.
Figure 5.32: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for the Hybrid algorithm in Example 5.5.
In this test example we choose only to compare the Hybrid algorithm and Peterka's algorithm. For the Hybrid algorithm we have chosen a window of length 8, and for Peterka's algorithm we have chosen a forgetting factor $\lambda = 1$ and initial values $\hat\theta_0 = 0$ and $P_0 = 10^{16} \cdot I$. We will present plots of the parameter estimates $\hat\theta$, the prediction error $\varepsilon$, the factor $q$ and the relative error norm of the parameter estimates, i.e.
\[ \varepsilon_\theta = \frac{\|\hat\theta - \hat\theta_{ls}\|_2}{\|\hat\theta_{ls}\|_2}, \]
where $\hat\theta_{ls}$ is the non-recursive solution to the windowed least-squares problem when the Hybrid algorithm is used, and the solution to the overdetermined linear system containing the equations up to time $t$ when Peterka's algorithm is used, see Figures 5.32–5.35.
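For reference, the relative error norm defined above can be computed against a non-recursive solution with a single call to a dense least-squares solver. This is only a sketch of the bookkeeping, with illustrative names; pass the rows of the current window for the Hybrid algorithm, or all rows up to time $t$ for Peterka's algorithm.

```python
import numpy as np

def relative_error_norm(theta_hat, Phi_rows, Y_rows):
    """Relative error of a recursive estimate against the non-recursive
    least-squares solution of the rows currently taken into account."""
    theta_ls, *_ = np.linalg.lstsq(Phi_rows, Y_rows, rcond=None)
    return np.linalg.norm(theta_hat - theta_ls) / np.linalg.norm(theta_ls)
```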
We see that the results coincide with the results from the previous section. The Hybrid algorithm gives noisier estimates than Peterka's algorithm. If we observe the relative error norm, we see that Peterka's algorithm gives estimates that, except for the initial period, are very close to the solution of the corresponding overdetermined linear system. The Hybrid algorithm does not give as good results (it should, however, be pointed out that the results obtained here differ from the results in [5]). It is, however, doubtful whether the relative error norm for the two algorithms is comparable, since it is calculated in different ways.
Figure 5.33: The factor $q = \|R^{-T}\varphi\|_2$ and relative error norm of the parameter estimates $\varepsilon_\theta$ for the Hybrid algorithm in Example 5.5.
Figure 5.34: Parameter estimates $\hat\theta$ and prediction error $\varepsilon$ for Peterka's algorithm in Example 5.5.
Figure 5.35: The factor $q = \|L\varphi\|_2$ and relative error norm of the parameter estimates $\varepsilon_\theta$ for Peterka's algorithm in Example 5.5.


Appendix A

Alternative Algorithms
Using Orthogonal
Transformations
As mentioned in Chapter 2 there are other ways to apply the orthogonal transformations when updating Cholesky factorizations. In the Givens 2 Algorithm we apply an orthogonal matrix $Q$, formed by $n - 1$ Givens rotations, to the updating formula (2.12) in the following way,
\[ \bar P = L Q^T Q \left( I - \beta^{-1} v v^T \right) Q^T Q L^T, \]
so that $Qv = \alpha_1 e_1$. This gives
\[ \bar P = H^T J J H = \bar H^T \bar H, \]
where $H = Q L^T$ and $\bar H = J H$ are upper Hessenberg matrices and
\[ J = \begin{pmatrix} \sqrt{1 - \beta^{-1} \alpha_1^2} & 0 \\ 0 & I_{n-1} \end{pmatrix}. \]
Then we apply a new orthogonal matrix $\bar Q$, formed by $n - 1$ Givens matrices, to give the final result
\[ \bar P = \bar H^T \bar Q^T \bar Q \bar H = \bar L \bar L^T. \]
To derive the Givens 3 Algorithm we will once again modify the updating formula (2.12), i.e.
\[ \bar P = L \left( I - \beta^{-1} v v^T \right) L^T = L \left( I + \sigma v v^T \right) \left( I + \sigma v v^T \right) L^T, \]
where $v = L^T \varphi$ and
\[ \sigma = \frac{-\beta^{-1}}{1 + \sqrt{1 - \beta^{-1} v^T v}}. \]
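This rank-one factorization is easy to check numerically. The sketch below verifies $(I + \sigma v v^T)^2 = I - \beta^{-1} v v^T$ for random data; the symbol names follow the formulas above, but the numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
v = rng.normal(size=n)
beta = 2.0 * (v @ v)                      # any beta with v^T v / beta < 1 works

sigma = -(1.0 / beta) / (1.0 + np.sqrt(1.0 - (v @ v) / beta))
M = np.eye(n) + sigma * np.outer(v, v)

lhs = M @ M
rhs = np.eye(n) - np.outer(v, v) / beta
print(np.max(np.abs(lhs - rhs)))          # ~1e-16, confirming the factorization
```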

Then we perform orthogonal transformations so that
\[ \bar P = L \left( I + \sigma v v^T \right) Q^T Q \left( I + \sigma v v^T \right) L^T = L H^T H L^T, \]
where $Q$ is a product of $n - 1$ Givens matrices which reduces the vector $v$ to a multiple of $e_1$, i.e. $Qv = \alpha_1 e_1$, giving
\[ Q \left( I + \sigma v v^T \right) = Q + \sigma \alpha_1 e_1 v^T = H, \]
where $H$ is an upper Hessenberg matrix. As in the Givens 2 Algorithm we apply a new orthogonal transformation $\bar Q$ to reduce $H$ to an upper triangular matrix $\tilde L^T$, i.e.
\[ \bar P = L H^T H L^T = L H^T \bar Q^T \bar Q H L^T = L \tilde L \tilde L^T L^T. \]
Since the product of two lower triangular matrices is lower triangular, the derivation is completed with the new Cholesky factor $\bar L = L \tilde L$.
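For illustration, here is a compact numerical sketch of the Givens 3 modification. It follows the derivation above, except that the second reduction is done with a general QR factorization instead of $n - 1$ explicit rotations, and the sign convention of the resulting factor may differ from the canonical Cholesky factor; the test data are arbitrary.

```python
import numpy as np

def givens3_downdate(L, phi, beta):
    """Rank-one modification P_bar = L (I - v v^T / beta) L^T with v = L^T phi,
    returning a lower-triangular L_bar such that L_bar L_bar^T = P_bar."""
    n = L.shape[0]
    v = L.T @ phi
    x = (v @ v) / beta
    if x >= 1.0:
        raise ValueError("modification would destroy positive definiteness")
    sigma = -(1.0 / beta) / (1.0 + np.sqrt(1.0 - x))

    # Build Q, a product of n-1 Givens rotations, so that Q v = alpha_1 e_1.
    Q = np.eye(n)
    w = v.copy()
    for i in range(n - 1, 0, -1):          # zero w[i] against w[i-1], bottom up
        r = np.hypot(w[i - 1], w[i])
        c, s = (1.0, 0.0) if r == 0.0 else (w[i - 1] / r, w[i] / r)
        G = np.eye(n)
        G[i - 1, i - 1] = c; G[i - 1, i] = s
        G[i, i - 1] = -s;    G[i, i] = c
        w = G @ w
        Q = G @ Q

    H = Q @ (np.eye(n) + sigma * np.outer(v, v))  # upper Hessenberg in exact arithmetic
    # Second reduction: find Qbar with Qbar H upper triangular (= Ltilde^T).
    Qr, R = np.linalg.qr(H)                # H = Qr R, i.e. Qbar = Qr^T, Ltilde = R^T
    return L @ R.T                         # L_bar = L Ltilde

# Illustrative check against the explicit covariance update
rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
L = np.linalg.cholesky(A @ A.T + n * np.eye(n))
phi = rng.normal(size=n)
beta = 1.0 + phi @ (L @ L.T) @ phi         # guarantees v^T v / beta < 1
L_bar = givens3_downdate(L, phi, beta)
v = L.T @ phi
P_bar = L @ (np.eye(n) - np.outer(v, v) / beta) @ L.T
print(np.max(np.abs(L_bar @ L_bar.T - P_bar)))   # ~1e-15
```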
Finally, we have the Householder Algorithm, where the covariance matrix is written in $LDL^T$ form, giving the updating formula
\[ \bar P = \bar L \bar D \bar L^T = L D L^T - \beta^{-1} L D L^T \varphi \varphi^T L D L^T = L D^{1/2} \left( I - \beta^{-1} v v^T \right) D^{1/2} L^T, \]
with $v = D^{1/2} L^T \varphi$. Now, we factorize the matrix
\[ I - \beta^{-1} v v^T = \left( I + \sigma v v^T \right) \left( I + \sigma v v^T \right), \]
where $\sigma$ is the same parameter as in the Givens 3 Algorithm. The idea is now, by repeated multiplication of Householder matrices $Q_i$, to reduce the matrix $I + \sigma v v^T$ to a lower triangular matrix
\[ \hat L = \left( I + \sigma v v^T \right) Q_1 Q_2 \cdots Q_{n-1}, \]
giving
\[ \bar P = L D^{1/2} \hat L \hat L^T D^{1/2} L^T. \]
Note that $\hat L$, as distinguished from $L$, does not have a unit diagonal. Finally, we have to rewrite the previous equation as follows,
\[ \bar P = L D^{1/2} \hat L \hat L^T D^{1/2} L^T = L \tilde L \, \Gamma D \Gamma \, \tilde L^T L^T = \bar L \bar D \bar L^T, \]
where $\tilde L$ is a unit lower triangular matrix and $\Gamma$ is a diagonal matrix with the same diagonal elements as $\hat L$. Thus we have $\bar L = L \tilde L$ and $\bar D = \Gamma D \Gamma$.
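The final rewriting step can be checked numerically. The sketch below takes an arbitrary lower triangular $\hat L$ as a stand-in for the matrix produced by the Householder reduction, forms $\Gamma = \mathrm{diag}(\hat L)$, uses one admissible choice $\tilde L = D^{1/2} \hat L \, \Gamma^{-1} D^{-1/2}$ (our construction, not spelled out in the report), and verifies that $\bar L \bar D \bar L^T$ equals $L D^{1/2} \hat L \hat L^T D^{1/2} L^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
D = np.diag(rng.uniform(0.5, 2.0, size=n))            # positive diagonal D
L = np.tril(rng.normal(size=(n, n)), -1) + np.eye(n)  # unit lower triangular L
Lhat = np.tril(rng.normal(size=(n, n)), -1) + np.diag(rng.uniform(0.5, 1.5, size=n))

Gamma = np.diag(np.diag(Lhat))             # diagonal of Lhat
Dh = np.sqrt(D)                            # D^{1/2}
Ltilde = Dh @ Lhat @ np.linalg.inv(Gamma) @ np.linalg.inv(Dh)  # unit lower triangular
D_bar = Gamma @ D @ Gamma
L_bar = L @ Ltilde

lhs = L @ Dh @ Lhat @ Lhat.T @ Dh @ L.T    # L D^{1/2} Lhat Lhat^T D^{1/2} L^T
rhs = L_bar @ D_bar @ L_bar.T              # L_bar D_bar L_bar^T
print(np.max(np.abs(lhs - rhs)))           # ~1e-15
print(np.allclose(np.diag(Ltilde), 1.0))   # True: Ltilde has a unit diagonal
```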


Bibliography
[1] S. T. Alexander, C. T. Pan and R. J. Plemmons. Analysis of a Recursive Least-Squares Hyperbolic Rotation Algorithm for Signal Processing. Lin. Alg. Appl., vol. 98, pp. 3–40, 1988.
[2] Andersson, Björck and Dahlquist. Numerical Methods. Prentice-Hall, 1974.
[3] R. E. Betz and R. J. Evans. New Results and Applications of Adaptive Control to Classes of Nonlinear Systems. Proc. Workshop on Adaptive Control, Florence, Italy, 1982.
[4] G. J. Bierman. Factorization Methods for Discrete Sequential Estimation. Mathematics in Science and Engineering, vol. 128, Academic Press, 1977.
[5] Å. Björck, L. Eldén and H. Park. Accurate Downdating of Least Squares Solutions. IMA Preprint Series #947, April 1992.
[6] J. E. Bobrow and W. Murray. An Algorithm for RLS Identification of Parameters that Vary Quickly with Time. Report, April 1991.
[7] J. Daniel, W. B. Gragg, L. Kaufman and G. W. Stewart. Reorthogonalization and Stable Algorithms for Updating the Gram-Schmidt QR Factorization. Math. Comp., vol. 30, pp. 772–795, 1976.
[8] P. van Dooren and M. Verhaegen. Numerical Aspects of Different Kalman Filter Implementations. IEEE Trans. Aut. Control, vol. 31, no. 10, pp. 907–917, 1986.
[9] L. Eldén and H. Park. Block Downdating of Least Squares Solutions. Department of Mathematics, Linköping University, August 1992.
[10] W. R. Ferng, G. H. Golub and R. J. Plemmons. Adaptive Lanczos Methods for Recursive Condition Estimation. Numerical Algorithms, pp. 11–20, 1991.
[11] T. R. Fortescue, L. S. Kershenbaum and B. E. Ydstie. Implementation of Self-Tuning Regulators with Variable Forgetting Factors. Automatica, vol. 17, pp. 831–835, 1981.

[12] P. E. Gill, G. H. Golub, W. Murray and M. A. Saunders. Methods for Modifying Matrix Factorizations. Mathematics of Computation, vol. 28, no. 126, pp. 505–535, April 1974.
[13] J. Götze and U. Schwiegelshohn. A Square Root and Division Free Givens Rotation for Solving Least Squares Problems on Systolic Arrays. SIAM J. Sci. Stat. Comput., vol. 12, no. 4, pp. 800–807, July 1991.
[14] C. S. Henkel and R. J. Plemmons. Recursive Least Squares on a Hypercube Multiprocessor Using the Covariance Factorization. SIAM J. Sci. Stat. Comput., vol. 12, no. 1, pp. 95–106, January 1991.
[15] T. Hägglund. New Estimation Techniques for Adaptive Control. Department of Automatic Control, Lund, December 1983.
[16] E. Irving. New Developments in Improving Power Network Stability with Adaptive Control. Proc. Workshop on Appl. of Adaptive Control, Yale University, New Haven, 1979.
[17] C. G. Källström and K. J. Åström. Identification of Ship Steering Dynamics. Automatica, vol. 12, pp. 9–22, Pergamon Press, 1976.
[18] M. Morf and T. Kailath. Square-Root Algorithms for Least-Squares Estimation. IEEE Trans. Aut. Control, vol. 20, no. 4, pp. 487–497, August 1975.
[19] C. C. Paige. Error Analysis of Some Techniques for Updating Orthogonal Decompositions. Math. Comp., vol. 34, pp. 465–471, 1980.
[20] C. T. Pan. A Perturbation Analysis of the Problem of Downdating a Cholesky Factorization. Department of Mathematics, Northern Illinois University, October 1990.
[21] C. T. Pan and R. J. Plemmons. Least-Squares Modifications with Inverse Factorizations: Parallel Implementations. Comp. Appl. Math., vol. 27, pp. 109–127, 1989.
[22] V. Peterka. Algorithms for LQG Self-Tuning Control Based on Input-Output Delta Models. Institute of Information Theory and Automation, Prague.
[23] S. P. Sanoff and P. E. Wellstead. Extended Self-Tuning Algorithm. Int. J. Control, vol. 34, pp. 433–455, 1981.
[24] M. A. Saunders. Large-Scale Linear Programming Using the Cholesky Factorization. Technical Report CS252, Comp. Sci. Dept., Stanford University, 1972.

[25] A. Sjö. Updating Techniques in Recursive Least-Squares Estimation. Algorithms Using Rank-1 Modification of Covariance Matrix Factorization. Department of Computer Science, Lund, March 1992.
[26] G. W. Stewart. The Effects of Rounding Error on an Algorithm for Downdating a Cholesky Factorization. Journal of the Institute for Mathematics and Applications, vol. 23, pp. 203–213, 1979.
[27] B. Wittenmark and K. J. Åström. Adaptive Control. Addison-Wesley, 1989.
