Professional Documents
Culture Documents
ARTHUR E. ALBERT
LELAND A. GARDER, JR.
To Margie
Foreword
HOWARD W. JOHNSON
Preface
where {an} is a suitably chosen sequence of " s m oot hing " vectors. The
ix
X PREFACE
ARTHUR E. ALBERT
LELAND A. GARDNER, JR.
Cambridge, Massachusetts
October 1966
Contents
1. Introduction 1
xiii
xiv CONTENTS
5. Asymptotic Efficiency 60
5.1 Asymptotic Linearity 61
5.2 Increased Efficiency via Transformation of the
Parameter Space 61
5.3 Asymptotic Efficiency and Summary
Theorem 65
5.4 Increased Efficiency 72
5.5 Large-Sa m pl e Confidence Intervals 72
5.6 Choice of Indexing Sequence 73
5.7 A Single-Parameter Estimation Problem 74
8. Applications 146
8.1 Vector Observations and Time-Homogeneous Re-
gression 148
8.2 Estimating the Initial State of a Linear System via
Noisy Nonlinear Observations 153
8.3 Estimating Input Amplitude Through an Un-
known Saturating Amplifier 156
8.4 Estimating the Parameters of a Time-Invariant
Linear System 161
8.5 Elliptical Trajectory Parameter Estimation 172
References 200
Index 203
1. Introduction
{Y,,:n = 1,2,}
if Y" = F,,(6),
(1.1)
where tn + 1 is the estimate of 6 based upon the first n observations and
{an} is a suitably chosen sequence of" smoothing vectors."
Without question, estimators of the type of Equation I.I are compu
tationally appealing, provided the smoothing sequence is chosen
reasonably. After each observation, we compute the prediction error
Yn - Fn(tn) and correct tn by adding to it the vector [Yn - Fn(tn)]an
Such recursions are sometimes called " differential correction"
procedures.
In contrast, maximum-likelihood and least-squares estimation
methods, although often efficient in the purely statistical sense, require
the solution of systems of simultaneous nonlinear normal equations.
If we want "running" values of these estimates, the computational
problems are often great.
Of course, the choice of the weights an critically affects the computa
tional simplicity and statistical properties of the recursive estimate
(Equation 1.1). The main purpose of this monograph is to relate the
large-sample statistical behavior of the estimates to the properties of the
regression function and the choice of smoothing vectors.
Estimation schemes of the type of Equation 1.1 find their origins in
Newton's method for finding the root of a nonlinear function. Suppose
that G() is a monotone differentiable function of a real variable, and
we wish to find the root e of the equation
G(x) = O.
(1.7)
4 INTRODUCTION
But
1 - a- < 1 -
d
a--
G(uJ) < b
- a-
0< 1 < 1
- b - G(IJ) - d '
so that
as n oo.
Let us now complicate matters by letting 0 vary with n. There is a
sequence of monotone differentiable functions, 0", all having a common
root 8:
0,,(6) = 0 (n = 1,2, . ) .
Then we have
If we let
Gn(x) = Fn(x) - Yn ,
the desired parameter value is that value of x which makes Gn(x) vanish
identically in n.
Now let noise be introduced, so that the sequence of observations, Yn,
are corrupted versions of Fn(8):
Yn = Fn(8) + Wn (n = 1,2,),
where Wn is (zero mean) noise. Motivated by the previous discussion,
we consider estimation schemes of the form
tn+l = t" + a"[Y,, - F,,(t,,)], (1.8)
wh ic h can be rewritten as
derivative of F".
"
A3. B
,,
2
= 2: b,,2 -+ 00 with n, where bit = inf
xeJ
1 F,,(x) .I
k=1
d"
A4. sUP- < 00
.
" b"
d,,2 1
AS. IImsup- 2 < .
"B "
AS'. limsup
"
(:)(::) < 1.
AS". I1m
b,,2 0
"
=
B" 2 .
(bn2)2
AS"'.
f 2 < ,,B
00
.
and
any of the Assumptions A2 through AS'" that hold for Fn() over
J = (1o 2) will also hold for Fn *(.) over J* = ( - 00, )
00 . Hence, the
results of this chapter (as well as the next) will apply to the untruncated
estimators tn* whe nevcr they apply to the truncated ones, tn. In most
applications, however, common sense seems to dictate that we should
use truncated procedures whenever we can.
The first theorem demonstrates the strong consistency of the estima
tion sequence, Equation 2.2, for a wide class of gain sequences. [For
J = ( - 00, 00), independent observations and gains which do not depend
on the iterates, the result becomes Burkholder's (1956) Theorem 1 after
an appropriate interpretation of the symbols.]
THEOREM 2.1
(t1 arbitrary),
1. For each n, the sign of on(x) is constant over J ( n ) and equal to the
sign of Fn(),
1
2. sup IOn(X) I < d for all suitably large values of n,
xeJ(n) n
and
3. 2: bn
n
inf
xe/(n)
IOn(x)I = 00,
or
5. L sup lan(x)12 < 00 and Assumption AI' holds.
n xeJ(ft)
Let
and
(2.4)
Then we obtain
and,consequently,
It"+l - 81 A".
THEOREMS FOR GENERAL GAINS 13
which implies that the sign of (T. - 9 + Z.) is equal to the sign of
(T. - 9):
IT. - 9 + Z.I = (T. - 9 + Z.)sgn(T. - 9 + Z.)
= (T. - 9 + Z.)sgn(T. - 9)
= IT. - 91 + Z.sgn(T. - 9).
Setting
X. = Z.sgn(T. - 9), (2.9)
we have, in this case,
It.+l - 91 IT. - 91 + X
In either event, therefore,
It.+1 - 91 max {A., IT. - 91 + XJ (2.10)
if n N, and this is the key relationship for our subsequent analysis.
To establish Equation 2.6, we choose a positive null sequence {AJ
so that
'" (sup la"j)
2
< 00.
L.
n
A
n
2
This combines with Equation 2.10 to give, for all such indices,
I tn +1 -
ill ::::;; max
(I
[ max (-AipPn
NSlsn 1
+ L.,
"=1+1
-)
PnPX,,
,
"
max
\ A ,Pn \ ::::;; max I A, I -+O
NSlsn p, NSlsn
THEOREMS FOR GENERAL GAINS 15
and
l 1 1N - 810 as noo,
PN-
the desired conclusion will follow from Equation 2.12.
To establish Equation 2.13, we can use either Condition 4 or 5. Under
Condition 4,
are themselves functions of 110 12, 1k, Wlo' " Wk, where Zk
" Ok Wk =
(see Equations 2.3, 2.4, and 2 9). In turn, lb ' . " tk are functions of
.
we see that
THEOREM 2.2
Let {Yn:n I, 2, } be an observable process satisfying Assump
= . .
(II arbitrary),
16 PROBABILITY-ONE AND MEAN-SQUARE CONVERGENCE
where
n
Pn = I1 (I - bJ inf laJI) -+ O.
J-l
(2.21)
since sup lakl is summable. But since 0 < Pn/Pk < I for all N ::;; k ::;;n,
it follows that Pn 2/Pk 2 < Pn/Pk. This and Equation 2.21, together with
Equation 2.20, give en 2 -+ O.
Under Condition 5, the Wn are independent, so that for every n, Wn
is independent of an(lh .." In) and Tn(lh'." In). Thus,
Since Fn( . ) is monotone for each 17, the sign of Fn(x) is independent of x
and Equation 2.26 does not depend on the arguments. In instances where
speed of realtime computation is an important factor,these determin
istic gains possess the virtue of being computable in advance of the data
acquisition (although there is the possibility of a storage problem).
Since
sup a. (x) = inf a. (x) = an,
xeJ(n) xeJ(n)
IS
{diVergent when r ::s;
0
if B.2 co. (2.27)
k= 1Bk2 +r convergent when r > 0
(This theorem will be used repeatedly.) If is an independent {Yn}
process,Condition S also holds under Assumption A3,because
(Note the slightly different usage of b" here.) We take J = ( -co, co) ,
and the recursion Equation 2.2 becomes
GAINS THAT USE PRIOR KNOWLEDGE 19
In + 1 = In + an[Y.. - bn1n]
= (1 - anbn)ln + an Yn
B-1 bn v
= -B 2 In + B 2 .1. ..
n n
In+l
[ n (-BBr-2l)] 11 + k2:.. ( nn
n Br-1 bk
)-
Bk2 Yk
=l I=k+l B1
= 2
1=1 1
If the initial (no data) estimate is 11 = 0, then
which is precisely the least-squares estimator for (J based upon the first n
observations. In other words, the gain sequence, Equation 2.26, yields a
recursively defined estimator sequence which is identical to the corre
sponding sequence of least-squares estimators.
The variance of the least-squares estimate in the case of independent
identically distributed residuals is easily computed. Since
Yk = bk8 + Wk
with &Wk2 = 0'2, it foIlows that
0'2
&(In+1 - (J)2 = B 2'
n
Thus, In converges in quadratic mean to (J if and only if Bn2 00. But
we have already shown in the preceding paragraph that this condition
implies Conditions 3 and 5. Since in the present case supx I Fn(x)1 = bn,
Conditions 1 and 2 also hold. In short, Conditions 1, 2, 3, and 5 of
Theorem 2.2 are necessary as weIl as sufficient conditions for the
quadratic-mean (and, we might add, almost sure) convergence of the
recursively defined least-squares estimator,
"
In+l = I" + ; 2 (Y" - b"I,,) (11 = 0),
"
in the case of a linear regression function.
where
Yn* = Yn - [Fn(80) - 80f1\(80)].
The parameter 8 now occurs (approximately) linearly in the mean value
of the observable Yn*, so that the recursive version of the linear least
squares estimator discussed earlier seems appropriate. Accordingly, the
gain sequence would be
(2.28)
From the theoretical point of view, the gain sequences, Equations 2.26
and 2.28, are deterministic special cases of those of the form
(2.29)
RANDOM GAINS 21
with
The last is referred to as the adaptive" gain and is quite often used in
practice.
The convergence properties of estimates based on gains of the type of
Equation 2.29 are determined by considerations of the following sort. If
Assumptions A3, A4, and AS' hold,then Conditions 2, 3, and S of
Theorems 2.1 and 2.2 hold. Indeed,we see that
ILnbn2 _ 00
f Bn2 -
00
(This is always possible since 2.n bn2/Bn2 = under Assumption A3.)
Then, set
b 2
L:n bn inf 10n *1 >
-
K' L:n
Bn2
= ex> ,
and
an ()I
bn bn
a a, 2 sgn an sgn Pn
Bn2 ::::; I X ::::; for all x E J(n>, =
Bn
dn sup lanl--+- O.
The same arguments used in the previous paragraph apply to the respec
tive nonsummability and summability of
THEOREM 2.3
Let {Yn: n = I,2, } be a stochastic process satisfying Assumptions
. . .
or
an2(Xh , Xn) sgn Fn
an(xb, Xn) ,
.
n
2.
=
fJn(Xb , Xn) 2
i-I
'Yl(Xl'' Xi)
n ), and Assump
where b n S an(x) , fJn x), 'Yn (x ) ( S dn for all x E J(
tions A4 and A5' hold,
or
then Assumptions A5, A5', and A5" can be dispensed with in Conditions
1, 2, and 3, respectively.
For the special case of polynomial regression, most of the conditions
are automatically satisfied and the independence assumption can be
dropped.
THEOREM 2.4
and
sgn a,,(x) = sgn/p,
then the estimator
inf
xel
I t(x)1 by inf I tl.
If fp is nondecreasing,
inf F,,(x)
xeJ
nPinf I/pl[l + 0(1)]
For
F,,(8) = cos n8 (0 < 8 < )
'IT ,
EXPONENTIAL REGRESSION 25
The function
Fn(6) = enD
violates Conditions 2 and 3 of Theorems 2.1 and 2.2 in an essential way.
For, if an{x) is any gain satisfying Condition 2,it follows that
bn inf I anex) I exp [n,d exp [ - n'2] = exp [-n{'2 - '1)],
xeJ(n)
Yn* =
{log Yn if Yn> A,
log A if Yn A,
where A is a positive constant chosen to save us the embarrassment of
having to take logarithms of numbers near and less than zero. Then
Yn* = n6 + Vn,
where
if Yn> A,
if Yn A.
26 PROBABILITY-ONE AND MEAN-SQUARE CONVERGENCE
where
Here Vn* has a second moment which goes to zero at least as fast as
e-2nO This suggests that we estimate 8 using weighted least squares. The
weights should ideally be chosen equal to the variance of Vn(e-2nO).
Since 8 is not known,we settle for a conservative (over-) estimate of the
variance, e-2n1, and estimate 8 by
tn+! = tn + (ne2n1/"lk2e2k1)[Yn* - nt 1
n (11 = 0).
for some positive S, then it can be rigorously shown that S(tn - 8)2 -+ 0
(exponentially fast) as n -+ co.
3. Moment Convergence Rates
Our first result tells us that the mean-square error tends to zero as
IIB,,'.1. whenever there is such a constant a which exceeds t,that is,when
B"2 la,,(x) I
lim inf inf > t.
" SE/(n) b"
The conditions of Theorem 3.1 are the same as those that ensured strong
and mean-square consistency in Theorem 2.3, Condition 3.
where {a,,} is any restricted gain sequence. Then,if a > t,it follows that
8(/" - 8)2Q = 0
(B21/)
as noo.
Proof. We let
Thus,
(T" - 8)21' = [1 - a"F"(u,,)]2p(t,, - 8)21'
+ 2p[1 - a"F"(u,,)]2P-l(t,, - 8)2p-la"W"
+ 1=2 (2) [1 I
_ a"F"(u,,)]2p-l(f,, - 8)2p-l(a"W,,)I.
110 Wlo, W" l. Since the zero-mean W's are presumed independent,
_
the second term on the right-hand side has zero conditional expectation,
giving
C{(T" - 8)21'1 flo , I,,}
+ en [1 - a"F"(u,,)]2p-l(t,, - 8)2p-la"ICW,,1
1
2
the reason being that, because f" and 8 belong to J, u" must also.
The Inequality 3.5 will be valid only for all n exceeding some finite index,
which generally depends on a and a'. However, without loss of gener
ality, we may proceed as though the gain restriction is met for n =
1,2,, and thereby obviate continual rewriting of the qualification.
With this understanding, we now majorize the right-hand side of Equa
tion 3.3 by bounding 2pa"F,,(u,,) from below and everything else from
above with the deterministic quantities in Equation 3.5. Following this,
we take expectations and use the sure inequality, Equation 3.1 . The
result is
C(t"+l - 8)21' (l - 2pa{3" + K{3"2)O(t,, - 8)21' 1
+ K'
1=
( )
b
2 E"2
"Cit" - 8121'-1
for all n. Here K and K' are some finite constants depending on p, but
not on n, and the latter contains the hypothesized uniform bounds on
the observational error moments.
Inequality 3.6 is the starting point in the derivation of moment
convergence rates.
For the presently hypothesized case a > -1. we introduce the working
definition
X",'= B"'_l(t", 0) -
+ K'
1=2 B"-l
1
( )
21'- f3,,1/2 CIX",121'-I. (3.7)
(J1' =
1
(1 _ f3n)1' = 1 + pf3n + 0(f3n2),
where all order relations will be as n -+ 00. Thus, for some c > 0 and all
lare enough n, we have
( B ) 21' (1 - 2paf3", Kf3",2) = 1 1 )f3n 0(f3",2) cf3""
B",: 1
+ - p(2a - + =:;; 1 -
(3.8 )
because 2a 1 > O. Let N be fixed large enough so that Equation 3.8
-
holds with cf3", < 1 for all n N. Introduce the inductive hypothesis
(3.9)
It is afartjar; true that the expectations in Equation 3.7 remain bounded
as n -+ 00 for each index i. Since f3n -+ 0, the summands for i > 2 are
evidently each of smaller order than the (i = 2)-term. Thus, after
substituting Equation 3.8 into Equation 3.7, we have
GX; 1 =:;; (1 - cf3",)C Xn 21' + K"f3" (all n ;:::: N).
Iterating back in n to N, we obtain
",
", ",
X; 1 =:;; n (1
J=N
- cf3j)C XN21' + K" L: n (1
k=N J=k+l
- Cf3J)f3k'
From the scalar version of the identity which is Lemma 1, it follows that
the right-hand side is equal to
K" (1
Q",GXN21' + - Qn),
C
THEOREMS FOR MOMENT CONVERGENCE RATES 31
where
..
TI (1 - Cf3f)
Q.. =
f=N
Since Lf3 .. = 00 (Equations 3.4 and 2.27 with r = 0), Q" tends to zero as
n 00. This shows that .;It'p -1 implies .Yf?p' Since .;It'1 holds trivially and
B,,2/B_1 1, the asserted conclusion follows by induction on p =
1 , 2, .. " q. Q.E.D.
For gains with a -!- our technique of estimating convergence rates
requires that we strengthen our assumption 13" 0 to Lf3"2 < 00.
THEOREM 3.2
{
strengthened to A5'". Then
O "2
- 8)2q is at most the order of e r
if a = t,
tff(t" _1_"
if 0 < a < t.
B 4q ..
"
Proof. We first iterate Inequality 3.6 back to the index N for which
Z" = 2paf3" - Kf3,,2 E (0, 1) and log B,,2 > 1
for all n N, which can be done since 13" 0 and B"2 00. This gives
"
tff(t" + 1 - 8)21'
(l - ZJ)tff(tN - 8)21'
JTI
=N
..
L: TI+ K'
.. 21' 1
( 1 - zJ) L: BI<2 tffl!1<
(b ) - 8 1 21'-1.
I<=N J=I<+1 1=2 I<
We apply Lemma 4 with Z = 2pa to get
K' "
+ -- B L.
21'
" BI<41' .. -113I<1J2CIII<
'" DI< L. - 8121'-1
" k=N 41'''
1=2
(a > 0), (3.10)
where the DI<'s are uniformly bounded in accordance with the lemma.
Consider first the case a = h and set
-
IX11+ DN-1(lOg B-l"IXNIP
1 S
(log B.I.,
= 0(1) +
K'
1'
"
L D k
21'
L
(
Bk2 log B_l 1'-1/2
B2 l Iog B 2
)
(I og B"2) k=N 1=2 k- k
X (log Bk2) 1/2,8k'/2CIXkI21'-1
1'- (3.11)
K H "
0(1) + I- LN ,8k,
og B"2 k=
because log Bk2 increases with k. But the last written sum is 0(1) as
n -+ 00 by Equation L4.3 of the Appendix. In fact, we have (Knopp,
1947, p. 292),
" b 2
k log B ,,2,
L
k=l Bk2
which, incidentally, makes explicit the Abel-Dini Theorem (2.27) for
r = O. (The symbol will always mean the ratio of the two sides tends
to unity.) Therefore, .Yt'1'_l implies .Yt'p, and the proof for a -} is =
and multiply Equation 3.10 through by B,,41'''. In place of the final bound
in Equation 3.11, we find
"
eX:!l DN_ltffXN21' + K' L Dk
k=N
x
21'
'" Bk2
L- -B2
( )
"<21'-0 ( B 2k-4"
2 ) 112
,8
1!>IX 12
Ii> k
1'-1
1=2 k-l k
THEOREMS FOR MOMENT CONVERGENCE RATES 33
.:It'P-l we have
c" =
. f B,,2 l a n(x)1 '
In
xeJ(n) bn
we see that there always exists an a > -i when L limn inf c" > -1. and
=
an a < -} when L < -to Generally speaking, the case a -} occurs only=
tion 3.1. Then, if we use Equation 3.S in Equation 3.3 with p I, we get =
K =O. Thus, after further weakening our lower bound by dropping the
positive term involving eN2, we have
a2u2 " b2
e+l L k 4
B 4" k=N C B "-4a"
.. k
const " bk2 .
>- '"
- B "4a" k=N
L.
B k2+2(1-2a")
34 MOMENT CONVERGENCE RATES
The strictly positive" const" involves a uniform lower bound on the C's,
which exists according to Lemma 4. Using Equation 2.27 once again, we
see that
"
t,
.. .. 0 if a < t.
Thus, if the assumption a > 1- of Theorem 3.1 fails, the mean-square
error cannot generally be 0( I/B,,2). Indeed,
"
for all a < t, that is, all cases in which
d" 1
1 sup- < -.
" b" 2a
1"+1
I"
=
1 + 0 (!).n
In fact, for any such sequence (see, for example, Lemma 4 in Sacks,
1958), we have
We should not infer from this that nb"2 -+ 00 is necessary to meet our
conditions B"2 -+ 00 and 'L-{3"2 < 00. Indeed, if
1
b2
" = -- , (3.13)
n log n
it is true that Bn2 ;;;; log log 11, and hence {3" o(I/n). We retain this {3..
=
behavior, and make B,,2 increase even more slowly, when we replace
RELEVANCE TO STOCHASTIC APPROXIMATION 35
equivalently, the conditioning can be on the values of t10 Z1o" ', Zn-l ) '
This is Burkholder's (1956) type-Ao process specialized to the case
where the regression functions all have the same zero. The significance
of our results lies in their validity for a much larger class of Kn(x)'s than
heretofore considered.
To apply Theorems 3.1 and 3.2, it clearly suffices, in accordance with
Equations 1.8 and 1.9, to make the following symbolic identifications:
Kn(x) = IFn(un)l,
Zn(x) = sgn Fn[Fn(x) - Yn], (3.16)
an =
lanl.
Independence of the Yn's is essentially the stated property of the zn's
36 MOMENT CONVERGENCE RATES
Thus, if a" is chosen as any restricted gain sequence, we have for the
{
mean-square estimation error of the successive approximations
(Equation 3.15)
1/B"2 for -t < a < 00 ,
I
dG,,(x) =
K and
dx x =6
for all n, the deviations vii (tn - 8) of the Robbins-Monro process tend
to be normally distributed about the origin in large samples. The
variance (Sacks, 1958, Theorem 1) is
a2
where V(a) = -- ,
2a - 1
3.5 Generalization
8" - 8 = 0 (,.)
as n 00.
Theorem 3.1 so generalized is Burkholder's (1956) Theorem 2 (after
we ignore the continuous convergence portion of his conditions which
are imposed to show that B"2QC(t,, - 8)2Q is not only 0(1) but tends to
the 2qth moment of a certain normal variate). However, Assumptions
A3 and A5" permit a much larger class of K,,(x)'s than does his corre
sponding assumption that b,,2 is of the form of Equation 3.1 2 without
the i,,'s and the exponent restricted to -1 < {:J o.
4. Asymptotic Distribution Theory
38
ASYMPTOTIC NORMALITY FOR GENERAL GAINS 39
probability.
5. Xn "" Yn meansXn and Yn have the same limiting distribution.In
particular, if Yn "" Y and Y is normal with mean 0 and variance
I{J2, we write Xn "" N(O, I{J2).
THEOREM 4.1
Suppose that {Y,,: n 1,2,...} satisfies Assumptions AI', Al, Al,
and AS with J
==
g ,,(x) ==
I lb) 1.
Suppose the functions g10 g,lt' . are continuously convergent at the
point 9; that is to say, for every sequence {xJ tending to 9 as n-+oo,
{g,,(xJ} has a limit. Furthermore, suppose that
Then
y,,
' = n2Ia,,(t1>""
71
tn)1 gn(un) (4.1)
where Un is the point, with the indi cat ed property, which arose in
Equation 3.2 from the law of the mean. A ssumption concerning the
(bounded) random variableYn is mad<; underConditionI in the state
ment of the theorem to be proved. Letting .\ be as it is postulated
there, we rewrite our untruncated difference equation
as
In+l - 8 = (I - Yn'fJn)(t" - 8) + anW"
= (I - .\fJn)(ln - 8) - fJn[(Yn' - Yn) + (Yn - .\)]
x (In - 8) + anWn, (4.2)
wherean and Wn are the same as in Equation 3.2 and fJn is stilI given
by Equation 3.4. After iterating this back to an arbitraryfixed integer
N, we obtain
"
tn+l - 8 = n (1 - A{3;){tN - I)
i=N
We are going to show, under the conditions of the theorem, thatY, II,
andIII go to zero in first mean faster than 1/Bn, while Bn timesIV has
the asserted limitingnormal distribution.We fixN, as usual, sufficiently
large so that )..f3, < 1 for allj N.
With regard toI: From Lemma 4, we have
..
n (l
'k+l
- )..f3,) ::;
DkBk2i\
B 2"
-
"
= 0 ( 1)
-B
"
' (4.4)
inf
xeJ(n)
B ,,
b"
2
la,,(x)I L - e
1
- +
2
e.
ek = tB'%(tk - fJ)2 = 0 (J
as k -+ 00, independent of the value of a. Next, let " denote the
hypothesized limiting value ofgk(X,J when Xk tends to 8. Then from
Equation 4.1 and the gain restriction,
(say). (4.6)
and therefore the t'k'S are bounded random variables and tB't'k2 -+ O.
-+
The sequence following the center dot in Equation 4.5 is thus
0(1) as k 00. By Lemma 4, SUPkN Dk < 00. Therefore,
0(1)0(1)
- 1 > 0, the bound inEquation 4.5 must go to zero asn -+ 00
=
since )"
2
byLemma 5.
42 ASYMPTOTIC DISTRIBUTION THEORY
{:. Ii
for k = 1, 2, , N - 1, (4.8)
a"k =
( 1 - Afl,)a. for k=N,N + l, , n.
j=k+1
The multipliers a"k are random variables via
ak = ak(tlo., tk).
From the form of the iteration, it is clear that
110 t2, " tk
(4.10)
"
c. lim L1 a" ,p2 < 00.
" "=
=
Thus, we obtain
t/12 = ....!
2 !....!!:..
2 >' - 1
by the conclusion of Lemma 6. Q.E.D.
THEOREM 4. 1'
Let the hypotheses of Theorem 4.1 hold over J [Et> E2 ], with at =
least one of the end points finite, and suppose we choose the interval so
that 0 is an interior point. In addition, assume there exists an integer
p, 2 p < co, for which
where {an} is any restricted gain sequence having L > t. Then the con
clusion of Theorem 4. 1 holds under Conditions 1 and 2.
(4. 15)
In what foIlows we proceed as though both end points are finite. If one
is not, the appropriate term in Un is to be deleted and the ensuing
arguments accordingly modified.
In this notation the truncated recursion is
The meaning of all symbols is the same as before, the only difference
being that Ih, In, Un and 8 now belong to a finite interval. We thus
have
In+1 - 8 = (right-hand side of Equation 4.3) + Un,
(4.17)
because {3" tends to zero. Since we are assuming, without loss of gener
ality, that [1 < 8 < [2,
we can write
8 - [1 28 > 0,
for some such 8. For the right-hand end point, we therefore have
tffX"2 = P{T" - 8
[2 - 8} P{lT" - 81 28}
tff(t" - 8)21' (a'b,,)2 2p 0"1W:"121'
+
821' eB,,
ALTERNATIVE ASSUMPTION 47
ClUnl =
() )
0 B P-l -
But
1 1 f1n f1n
Bn2P-l = (BnP-2bn)2 . Bn = 0 Bn ( )
by hypothesis. This establishes Equation 4.17 which, as already argued,
is sufficient. Q.E.D.
Remark. The assumption that BnP-2bn 00 .for some finite p is
directed at those situations in which the sequence of derivative functions
tends to zero as n 00, and it places a limitation on the way in which
we allow this to happen. It excludes Equation 3.13, and also infimums
like bn2 = log nln, since then we would have En2 log2 n. However, the
assumption is satisfied by Equation 3.12, with p 2 the first integer
exceeding 2 - f11(f3 + 1). This makes quantitative the required relation
ship between the rate at which the derivatives are approaching zero and
the number of existing noise moments.
(-2 -)
+ +
o
lim rp(t) 0, rp()
1 rp() !P(11) rp(t2) rp
11 12 ,
1--0
=
12 , 2
(all t2 11 0),
48 AS YMPTOTIC DISTRIBUTION THEOR Y
such that
sup Ign (X 1 ) - gn(X2)I <pet) (4.19)
IXI-X2IS!
(Xl' X2)eJ
THEOREM 4.2
Suppose that{Yn : n = 1, 2 , "'} satisfies Assumptions AI', A2 , A3,
A4, and A S" with J = (-00,00). For a llx in J, set
gn(X) = Fh:) I,
I C = "
dn
IIm sup- '
bn
Gain a,,(xb,XJ
Al sgnF" ;:1
F,,(xJ
2 AI
F"I(X,J
"
"-1
F1I(60)
3 Aa n (60 EJ).
kL
Fk2(60)
=l
Gain QI
3 ( :a),
V Aa provided Aa > c2j2, and gn(60) -+ Yo
as n -+ co.
In every case, the same limiting distribution obtains if, in the norming
sequence, Fi6) is replaced by Fk(tk) for k 1, 2" " , n. =
n -+
This constitutes a normalized Toeplitz matrix because each column
tends to zero as co and the row sums are identically one. Thus,
n
limfn = f implies lim L bndk = f (4.20)
n 11 k=l
50 ASYMPTOTIC DISTRIBUTION THEORY
by the Toeplitz Lemma (Knopp, 1947, p. 75). This fact will be used
repeatedly.
To apply Theorem 4.1, we must first verify that the number L, defined
in its hypothesis, exceeds t for each of the three gains under considera
tion. For Gain I this is immediate because L is A1> and the latter is
presumed chosen larger than t. For gain 2 we have (and the same clearly
will be the case for Gain 3)
L
2
lim inf inf ,, la,,(2l(x)I = lim inf inf A2 "
g,,(x,,)
2 b"kgk2(Xk)
=
"=1
-
> "
A2
(d)2 - 2 2
A2 > !. >
lim sup 2 bn" "
C
if 0 < inf"/" sup"l" < K < 00. Indeed, if we set/ = lim sup"I", there
corresponds, to any e > 0, a finite index no such that/" < /+ e for all
n > no. For such indices, we have
y"
" "0
= 2 b"d"
"=1
< K 2 b"" + /+ e.
"=1
The first term tends to zero as n 00; hence, there is an nl > no such
that it remains less than e for all n > nl' Thus, for all sufficiently
large n,
y" < /+ 2e,
from which the asserted conclusion follows because e was arbitrary.
The problem is to prove that Conditions 1 and 2 in the statement of
Theorem 4.1 are satisfied with values of (, JL) which yield the asserted
formulas for Q2. To do this we set
1 "
B 2 2 Pi(t,,)
"
S,,2 = = 2 b""g,,2(t,,),
" "=1 "=1
(4.21)
for x in J. Let y" and z" have the meanings respectively given in Equa
tions 4.1 and 4.11 as functions of t1> " ', t,,:
y" =
B2
T" la,,1 g,,(t,,) and Z" =
B,,41
b,,2 a"12
LARGE-SAMPLE VARIANCES FOR PARTICULAR GAINS 51
The first two_ columns of the following table are proportional to these
sequences for the listed gains.
1 gn(tn) 1 " 1
2 gn2(tn) gn2(tn)
1 (4.22)
s;::- Sn4 ,,2
3 gn(80)gn(tn) gn2(80) .!
n2(80) n4(80) Yo Y02
as n -,)- 00 for the corresponding ('\, p.) given in the third and fourth
columns.
First of all, however, we note that each of the asserted ,\ values
exceeds t. Indeed, since
1 :$; lim inf gn(x) :$; lim sup gn(x) :$; c
n n
for alI x E J, any limiting values of gnex) must belong to the interval
[I, c]; in particular, yand Yo. Thus, in the case of Gain 3, '\ A3(ylyo) > =
(c 2/2)(Ilc)
= c/22:: 1- with equality only when c = 1, in which case we
say the problem is asymptotically linear.
With regard to Gain 1: The hypothesized continuous convergence of
the gn's at 8 to y immediately allows us to infer gn(tn) y from tn 8.
But the gn's are bounded, so gn(tn) -,)- " in mean square.
With regard to Gain 2: We consider the identity
By the same argument used for Gain 1, the third term goes to zero in
mean square. For the first, from Equation 4.21,
,.
lim ,.2(8) = ,,2. (4.26)
( )
Thus,
.! _1_ y,. + _1_ _ .!
,
_
1
A22 S,.2 A2 S,.2
_
,,2 ,,2
=
It follows from the results of the previous paragraph that this bound
goes to zero in mean square as n -+ co.
With regard to Gain 3: If we use the additional assumption that
{g,.} is convergent at the selected point (Jo, the same type of argument used
in the preceding paragraphs establishes Equations 4.23a and 4.23b for
the asserted .\ and JL in Equation 4.22.
We have thus verified all the (unassumed) hypotheses of Theorem 4.1.
In view of Equations 4.25 and 4.26, we have
SrI =
1
0 ,.
J Fk2(tk)
k=l
1+"
THEOREM 4.2'
Let the hypotheses of Theorem 4.2 hold over an interval J = [eh e2],
with at least one of the end points finite, where the interval is so chosen
that 8 is an interior point. In addition, assume there is a finite integer
p 2 such that
If 11 is arbitrary and
1,,+1 = [I" + a,,(/h, )
"I [Y,, - F,,(t,,)]] (n = 1, 2" ,,),
where {a,,} is one of the three gains listed in Theorem 4.2, then the
conclusion of Theorem 4.2, under its provisos, holds for these truncated
estimates.
r"
2
after using weak convergence and boundedness. The variance is thus
(4.28)
Equation 4.28, is the same as that for Gain 2, although the gains are
algebraically different. Finally, the same is true forgn = 80 andGain 3.
The fact that both Gain 2 and Gain 3 are easier to compute than
Equation4.27 is reflected in the strongerlimitationA > c2/2 .
2.0 ,
\
v
1.5
i.-
\ '-... --
l..----
1.0
Figure 4.1 The stochastic approximation variance (unction defined in Theorem 4.2.
For Gain 1, Al must be chosen in the open interval (t,00). For Gain 2,
A2 must be chosen in (c2/2,00), while for Gain 3, 00 can be chosen
by the experimenter (this determines Yo), and then A3 must be
chosen in (c2/2,00). For any particular choices of the A" it is not
hard to exhibit regressions such that each gain is, in turn,"optimal"
(has minimal Q2) for some value of the parameter O. Thus, the question
of" which gain to use" has no quick answer.
As a possible guideline for comparing the three types of gains for a
particular regression when 0 (hence y) is not known, we might adopt a
"minimax" criterion for choosing the AI and then compare the variance
56 AS YMPTOTIC DISTRIBUTION THEORY
The values of the variance resulting from the choices of Equations 4.29,
4.31, and 4.32 are
GAIN COMPARISON AND CHOICE OF GAIN CONSTANTS 57
if 1 c < V2,
(4.33)
if c V2,
c3
c3 < c(c + 1)
+ l'
if and Yo >
c
otherwise.
where 1 :S ","0 :s c. We see that every Q,I 1 with equality when and
only when c ==1.
The same is true for the simpler choices
and (4.34)
which meet the provisos in all problems. The corresponding variances
are
(4.35)
The same is true in Equation 4.35 for every c. Thus, a fortuitous choice
for 80 will make the estimates based on the more easily computed Gain 3
asymptotically more efficient than those based on Gain 2.
In the next chapter we limit our consideration to sequences gh g2 . . .
,
that converge uniformly on J to a continuous limit g. We then, at an
increased computational cost, iterate in a certain transformed parameter
space defined only by g and invert back to J at each step. The result, as
might already be anticipated, is that Q2 V(I) I for all three gains,
= =
THEOREM 4.3
For every real number x and positive integer n, let Z.(x) be an
observable random variable. Corresponding to a given sequence of
constants "It "lit' " recursively define
G.(x) - IZ.(x)
have a zero which converges to a finite number Bas" -+ 00:
G.(BJ - 0, lim B. - B
{:.(X
1. There exists b. > 0 and a number" such that
G.(x)
if x:l= B
,.(x) - - BJ
if x - B.
satisfies 1 g,,{x) Co < 00 for all x and all n > no. (We can
always redefine b" so that any strictly positive value ofinf,,> "0. x g,,(x)
is unity. )
2. g,,{x) is continuously convergent at 8 to y.
3. B,,2 = b12 + ... + b,,2-+00 with n and b,,4/B,,4 < 00.
4. 8" 8 + o(I/B,,).
sUPn.x O"I Z,,{x) - G,,{X)12+6 < 00 for some 8 > 0 and Var Z,,(x) is
=
5.
continuously convergent at 8 to a number a2
IIm- >
B"2
Y \
"2".
i
ex" = 1\
" b"
the random variables
(n = 1, 2, . . )
a
GENERAL STOCHASTIC APPROXIMATION THEOREM 59
<
e
1
b _ __ (0 !) (4.36)
" n%-
as n -+ 00. (This is not obvious until the symbols in the two statements
are properly related.) As already noted at the end of Chapter 3, our
Condition 3 is much less restrictive than Equation 4.36. Furthermore,
Burkholder assumes that all moments ofZ,,(x) - G,,(x) are finite, albeit
only throughout some neighborhood of 8. Condition 5, at least from
the point of view of application, is in most instances weaker. Indeed, the
distribution of the "noise", Zn(x) - G,,(x}, usually depends on x in a
rather trivial fashion and is often independent of the adjustable param
eter. On the other hand, high-order absolute moments are infinite in
some problems. Finally, Burkholder's assumption that
Gn I
sup I (x) <
00 '
n.x 1 + Ixl
2: Fk2(80)
k=1
is appropriate in many applications. As we have noted, it is computa
tionally cheaper than the .. adaptive" second gain, and it can lead to
estimates that are more efficient in large samples. However, the existence
of a stable limiting distribution for these estimates should not depend on
the value of our initial guess, 11 80 Hence, the Gain-3 proviso (that
=
the assumption that the sequence possess a limit, say g(x), at every
x inJ. If, in addition, we require that this convergence be uniform on J
and that the limit function be continuous, there will be continuous
convergence at every point of J (in particular, at 8, as also required in
Theorem 4.2). Indeed, if , is arbitrary in J and {xn} is any sequence
tending to " then
(5.1)
cp f8 geg) d
J l
=
and invert back at each step to obtain the (J-estimate. This is, in fact, t he
method analyzed in the following t heorem.
In some rather simple problems, Equation 5.1 is an equality for every
n (and the major portion of the proof of Theorem 5.1 is o bviated). For
instance, if Fn(J)= kn(J3, and J is any finite interval that docs not include
the origin, then gn{x) =
(X/1)2 for all 11. In such a situation, we would
estimate (J3 by linear least squares and t hen take the cube root.
THEOREM 5.1
Let Assumptions AI', A2, A3, A4, and A5"' hold, w here J = [h 2]
is any finite interval containing (J as an interior point. For n I, let
62 ASYMPTOTIC EFFICIENCY
For x in J, define
which takes values in J* - [0, ta>], and let 'I" - cz,-I be the inverse
function (which exists because g is positive and bounded), For y in J*,
define
F.*(y) - FII('I"(Y,
b.* - inf I I .*(Y)I,
".1'
-h bll* I.*(tll*)
Sprll*
B*1'
k-I
Ik*I(tk*)
proof of t he t heorem falls into two parts. We first s how t hat t he starred
INCREASED EFFICIENCY VIA TRANSFORMATION 63
in all three cases. The second part of the proof will yield the desired
conclusion by the "delta method."
The initial step, then, is to show that our assumption that {Fn} obeys
Assumptions AI' through AS'" on J implies that {Fn *} does on J *. The
basic relation for doing this is
('l"(yd'l"(y) F,,('l"(y)
* (y)
t. - t.
=
" - n dy d!l>(x)
... g,,('Y(y)
dx x='P(II) I
= s n r"b" 'F . (5.3)
g( (y
bn infgn(x),
gn(x).
bn * = dn* = bn sup
J g (x) J g (x)
Since the range of the limit function cannot be larger than that shared
by every member of the sequence, Equation 4.7 yields
bn
bn* dn* bnco (n = 1 ,2" . , ) .
Co
-,
Thus, not only are Assumptions AI' through A5'" satisfied by the starred
infimums and supremums, but also
lim Bn*I>-2bn* = + 00
"
The ratio
[gn(x)/g(x)]
d..*
s p
=
b..* inf [g..(x)/g(x)]
J
/ sup gg(x)
J
n(x)
1 \ :::; sup l gn - 11:::; sup Ign(x) - g(x )1 = 0(1)
_
J
(X)
g(x) 1
is continuous and nonzero at every yin J*. From Equation 5.2, we see
p p .
that In* - 9'; hence 'Y(t'n) 'Y(9'). Thus, after we multiply through by
.
the appropriate norming sequence and use Equation 5.2 as written, it
follows that
Jk=li Fk*2(9') (t" - 8) = 'Y(9') Jk=li Fk*2(9') (t,,* - 9') + 0,,(1) 0,,(1 )
,.., N(O, 'Y2 (9') a2).
But according to the leading equality in Equation 5.3,
Fk*(9') = Fk (8)'Y(9'),
so that
(5.4)
CW = o,
( Certain higher-order moments are presumed finite when we consider
our methods of estimation.) We further suppose that
hew) =
_ dl o:!(w)
ex ists (on all set s wi t h po sit iv e prob ability) and that
Ch(W) = 0, (5.5)
8 En.
(5.6)
(5.7)
where A = A, is independent of 8.
Now let In = In( Yl> ... , Yn) denote a 8-estimate based on the first n
o bservations (rather than Il - 1 as previously). Under regularity condi
tions, the celebrated Cramer-Rao inequality states that
8) 2 > b n2(8)+ {
I + [dbn(8)/d8]}2
(In (5.8)
In2(8) '
_
-
where bn(8) is here the estimate's bias. The usual form in which the
regularity conditions are written is (see, for example, Hodges and
Lehmann, 1951) as follows:
i. n is open.
ii. a log Ln/o8 exists for all 8 En and almost all points
Y = (Yl>, Yn).
iii. co(a log Ln/(8) 2 > 0 for all 8 En.
iv. f Ln dy = 1 and f (tn - 8)Ln dy = b n( 8) may be differentiated
under the (multiple) integral signs.
Our Equation 5.5 ensures Conditions ii and iii, and Condition iv holds
because/does not depend on 8. We note that Conditions ii and iv imply
I!e a log Ln/88 = O.
The ratio of the right-hand side of Equation 5.8 to the left-hand side
is called the (fixed sample size) efficiency of In when 8 is the true
parameter point in n. As is known, a necessary and sufficient condition
for an estimate In to be such that this ratio is unity for all 8 E n is that In
ASYMPTOTIC EFFICIENCY AND SUMMARY THEOREM 67
that a log gn/e8 = K,,(8)(t - 8), where g" is the density of t". The right
hand side of Equation 5.8 is only a lower bound on the mean-square
estimation error; there exist problems where the uniform minimum
variance of regular unbiased estimates exceeds 1/1,,2(8) at every 8.
Let us restrict our attention to Consistent Asymptotically Normal
(a bbreviated CAN) estimates of the value of 8 specifying {F,,(8)}, that is,
those for which
,,(8)8
t -
,..., N(O, 1)
(5.9)
exists (possibly as +00). Here "10 is called the asymptotic efficiency of{t,,}
when 8 is the true parameter value. IfVar t" 0',,2(8) and db,,(8)/d8-+ 0
as n -+ 00 for all 8 E n, then it follows from Equation 5.8 that "10 1 for
all 8 E n.
If a CAN estimate is such that "18 1 for all 8 E n, it is called
=
THEOREM 5.2
random variables with common variance u2. Let t9' Yn = Fn(8) be pre
scribed up to a parameter value 8 which is known to be an interior point
of a finite interval J = [elf e ]. We impose the following conditions:
2
1. The derivative Fn exists and is one-signed on J for each n.
2. Bn2 = L=l b1<2 -+ 00 as n-+ 00, where bl< inf",el IFIx)l. =
(n = 1,2,),
where
where
Q12 = V(Alg(8 provided that Al > t,
Q 22 V(A ) provided that A2 > c2/ 2,
2
(A3 gf8j )
=
and let 'Y = <l> -1 be the inverse function. For y in J* = [0, <l> ( )]'
2
define
F"*(y) = F,,('Y(Y.
Let 11* = be arbitraryin J*, and let
[ ]
CPo
'(2)
t:+1 = /,,* + "F,,*(f{!o) [Y" _
F"*(/,,*)]
L
k=l
Fk*2(f{!o)
0
t. = 'Y(t,,*) (n = 1,2, . . ).
Then, as 11-+ co,
(i = 1,2, 3),
with equality when and only when/is the N(O, 0'2) density.
d
b" = sup g,,(x) sup g(x)
" I I
(72,\2 =
Ol
therefore (assumingf is absolutely continuous),
f _ 00 wf'(w)dw =
"'few)
1+00Ol foo
_ - _ 00 few) dw = -1 .
This proves that (72,\2 1 . The necessary and sufficient condition for
equality in Equation 5. 1 0, that is, (72,\2 1 , is that the integrands w(w)
=
(-2-1)
V +
)
1_ r
( W
2 -<V+l)/2
(2 ) 1 ( -00 < (0 ) .
__
v
few) - j- + < w (5. 1 1 )
VV1T r
_
variance is (72 = v/(v - 2) and, as v-+- 00, Equation 5.11 approaches the
N(O, 1) density function. After a somewhat lengthy but straightforward
calculation, we obtain the formula
(v - 2)2(V+ 3)
'TJ = 2(v + l)(v+2}'
ASYMPTOTIC EFFICIENCY AND SUMMARY THEOREM 71
For large v, TJ = 1
-
(4/v)+ O(l/v2). As the following table shows, the
approach is not too rapid.
, 0.34
6 0.43
7 0.'0
8 0."
10 0.63
U 0.7'
20 0.81
2' 0.84
0.89
100 0.94
400 0.99
u,
and hence
TJ = -l .
I-ao dw
=
0.
B
(5.13)
.. I
tk(tn)h( Yk - Fk(tk
k l
On = In+ ).2 ::..:=-
:..:: --:
- n:--
- ----
2: tk2(tk)
k=l
will then have the same optimum large-sample statistical properties as
the ML estimates. The fact that these require infinite memory, via the
quantities tk(tn) (k = 1, 2" ", n) , violates our ground rule that we
restrict consideration to computationally feasible estimation schemes.
Under the conditions of Theorem 5.2, we have shown, for the trans
formed estimates {In}, that
I X
lim P{Sn(tn - 0)
n
< O'x} = . /_
y 27T I
-co
exp (-!e) dg
(5.14)
of mean values
(8 EJ;k = 1,2" ,,)
through 5 of Theorem 5.2. The most o bvious criterion for choosing such
a sequence is maximization of each of the summands in Equation 5.14.
That is, we define Tk T(lk) by =
(5. 16)
T(8) = { if 0 8 =::; t,
<
if t =::; 8 < 1.
F(x) =
{(I - X)3 ' o =::; x =::; t,
x3, t =::; x =::; 1,
with b1l = i for all n.
74 ASYMPTOTIC EFFICIENCY
r __________-L-- 0
The polar coordinates (r, cp) of the parabola with focus at the origin
shown in Figure 5. 1 are related by
2a '
= (5.17)
r 1 +cos cp
exerted by the origin on the point P, with (reduced) mass 111, then
[2
a = 2111k'
wherein
I = mr2 ; = const (5.18)
as the one to be estimated and presume the others given. We assume that
at time t = 0 the coordinates of P are (a, 0), that is, that the turning
angle, which orients the axis of the parabola to the observational co
ordinate system, is also known. Integration of Equation 5.18 with r
gi ven by Equation 5.17 then yields the cubic equation
SINGLE-PARAMETER ESTIMATION PROBLEM 75
z+tZ3
t
K 2 (5.19)
8
=
for
z = tan 111'. (5.20)
There is a single positive root, namely.
(5.21)
(t 0).
F(Tk' 8) by selecting appropriate observation times 0 < 'T1 < 'T2 < " . ,
but for the time being we can continue to work with the continuous time
variable. Furthermore, rather than introduce more symbols, we use 8 as
the dummy variable, where
so that
t z
rlr(t, 8) 1 +Z2 - 4K - . --.
82 1+z2
=
( x
2
+ 6x - 3
t(t, 8) - H (Z2) , H x) (5.23)
3(1+x) ,
= =
This expression, together with Equation 5.21, is the basis for all further
considerations.
76 ASYMPTOTIC EFFICIENCY
"'lim
... '" x =
3
Consequently,
11(1, 8)1
1
3 82
3K *
( ) t* A(fJ)t* = (5.26)
(t
g , 8) IFCt, fJ)1 A(fJ) g(fJ)
=
bet) A('2) =
SINGLE-PARAMETER ESTIMATION PROBLEM 77
increasing to infinity.
There are many such sequences for which Conditions 2 and 3 are met.
For instance, we can take slowly increasing times such as
(any > 0, k = 1,2,).
Then, by Equation 5.26, we will have for n -+ 00
n n
P2(Tk' 8) A2(8) loga k. (5.27)
k-l k=l
According to Sacks' ( 1958) Lemma 4,
n
toga k n loga n. (5.28)
k=l
Thus, as n -+ 00,
bn2 '" log n, Bn2 '" n toga n,
and both Conditions 2 and 3 hold. In addition, Condition 5 is true if the
additive noise in the range measurements has finite fourth-order
moments.
PART II
where
u" lies on the line segment joining tIl and e,
F,,(u,, is the gradient of F" evaluated at and
F,,'(u,,)) is the (row vector) transpose of the (column vector) F,,(u,,)_
U",
tn+l - e J-l =
I=J +
1*1
(6.3)
where, both now and later, T17 =m A, means the matrix product
A"An-1 Am (i.e., the product is to be read "backward").
al
where and b/ are p-dimensional column and row vectors, respectively.
{ aJ
We begin by studying conditions on deterministic sequences of p-vectors,
and {hj}, which are sufficient to guarantee that P" converges to zero
(that is, the null matrix) as n -+ 00.
In the one-dimensional case, this problem is trivial: P" converges to
zero if the positive a/I/s are such that 2, a/I, 00 and a/lj > 1 only
=
finitely often. (This was so because of the scalar inequality 1 - x :::; e-x. )
In higher dimensions, life is not so simple, and we must think in
terms of matrix eigenvalues. In what follows, we make use of the
following statement.
in order to find conditions on the vector sequences {aj} and {hj} which
will ensure liP,,11 -+ O. This approach proves to be fruitless. In fact, it can
be shown that
111- ah' lI 1
for any "elementary matrix" of the form I - ah', where a and hare
column vectors, so the above-cited inequality tells us nothing about the
convergence of liP"II.
The successful approach involves grouping successive products
together and exploring an inequality of the form
where lie is a set of consecutive integers. This idea is the basis of the
following theorem.
-
VI < V2 < V3 such
that, with Pk Vk+l
= Vb we have
(k = 1, 2, )
and
11m 1
. f-
10
k Pk
\
"min
( ""
Jell<
bjb/
IIbJ 112
) 2 0
L. T > ,
=
sum of the squared distances from the u/s to that hyperplane. Since
d2Cx) is continuous in x, it actually achieves its minimum on the (com
pact) surface of the unit sphere. Thus, the value of
D ISCUSSION OF ASSUMPTIONS AND PROOF 85
is the sum of the squared distances from the u/s to the particular
(p - 1 )-dimensional hyperplane that best fits" the vector set U1> . " Ur.
.
sequences {an} and {bn}. (We note 7"2 =::; 1 is always the case, so 0 =::; ex; <
1, as should be.) Moreover, the lower bound in Assumption B5 is an
essential one. This is graphically demonstrated by the following example
in which Assumptions BI through B4 hold,
{ [l
if n is odd,
-l [COS cp]
a" - n . ,
Sin cp
if n
is even,
h. ]
where 0 < cp < 'TT/2. Assumptions B I and B2 are immediate because
Ilanll = I/n and Ilbnll 1. The limit inferior in Assumption B5 is simply
=
lim infa,,'bn/llanll Ilbnll = min (cos cp, sin cp) =::; I/V2,
"
with equality only at cp = 'TT/4. With regard to Assumption B4, we have
]'
have
[1 0 .
J=2k-l
b.b/
J
=
0 1
and therefore
rp
-cos
=
rp
rp
=
1;,
integer such that "K :S; n, so that "K :S; n :S; "K+l - 1 . Then we have
n.
Pn Il (I - a,h/)pvx-1
i=VK
=
(6.5)
(6.6)
where
(6.7)
where, unless otherwise noted, k runs over all positive integers. It is not
difficult to see that
IIQkl12 =
IIQk'Qkll S III - k(Tk + Tk')11 + O(k2)
= Amax [I k(Tk + Tk')] + O(k2)
-
for some such number c, we are done. For then, since k 0, from
Equations 6.9 and 6.10 we have
Os IIQkl12 S 1 - 2Ck
(say) for all large enough k. But I - x S e-X is always true, so that
(6.11)
Since the square root of the sum of squares is never smaller than the
sum of the absolute values,
Tk + Tk' = 1k 2 (ajh/
Je/,.
+ hja/)
= } 2 rlvju/
Uk Je/,.
+ ujv/), (6.13)
where
(6.14)
DISCUSSION OF ASSUMPTIONS AND PROOF 89
where
min :- L '/[aAX'U/)2 + VI
IlxlI-1 Uk Jeilc
- a/2 (x'OJ)(x'uJ
1IX11-1Uk [
min A2 flkak L (X'UJ)2
lei"
- )'k yl-.:"'l---a"""'k2 L
lei"
]
Ix'OJ! IX'uA , (6.15)
where
Thus, the set of all unit length vectors x is a subset of those for which
L el(x) Ak
je/k
In turn, the set of all real numbers ej in the unit interval which satisfy
2: el Ak
J e/k
contains the set of those of the form of Eq uation 6. I 6 which satisfy the
inequality. Consequently, the lower bound in Equation 6. I 7 can be
weakened to
(6.19)
After applying the Schwarz Inequality to the second term on the right
hand side and setting
we obtain
Inequality 6.10 will thus follow if the lowcr bound in Inequality 6.20 has
a strictly positive limit inferior as k co. We now complete the proof
by showing, as a consequence of Assumptions 83 through 85, that this
is indeed the casco
In the original notation, the numerator quantities in Equation 6.19
are
lim i nf PleA Ie lim inf 2VPie ,B1e ale > 2Vp > 0 (6.2 1 )
Ie Ie )'Ie P
and fA:c
J>./;:
PIc
I'1m I. nf
"
JAk- - e = ., - e
Pic
hold simultaneously for k k(8). For all such indices, we can, therefore,
write
gk(Z) min g(z), (6.23)
t-BS2S1
where
.,-3
=.,.
Zo = - la,
"(1 - + [T - 3r
the last by the definition of Equation 6.22. Therefore, g(.) must be
strictly positive over [., - e, I ], because Zo < ., - e. This, together with
Equations 6.23 and 6.2 1 , implies the desired conclusion for Equation
6.20. Q.E.D.
Let us now return to the sequence of estimates (Equation 6.1), and
focus our attention on the resulting difference equation (Equation 6.3).
We allow the gain vector a, to depend on the firstj iterates, so that the
leading product is writte n
n
Pn(tl> .. " tn) TI [I - altl> ' . " tj)h/(tj)].
= j=1
92 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
of the following theorem. The sixth takes care of the additional term in
Equation 6.3 arising from the stochastic residuals Wh W2, '
-
and
THEOREM 6.2
Let {Y,,:n = 1, 2, } be a real-valued stochastic process of the
form Y" = F,,(O) + W", where F,,( ) is known up to the p-dimensional
parameter e, and Wh W2,'" have uniformly bounded variances. For
each n, let a,,() be a Borel measurable mapping of the product space
X RP into RP (Euclidean p-space), and let
(n = I, 2, ... ; tl arbitrary).
Denote the gradient vector of F" by F" and suppose the following
assumptions hold:
p;S;Pk;S;q < oo (k = 1 2 . . . ),
"
and
= p < 00.
. f a,,'(Xl> x,,)F"(y) > a,
. f
C5 I ImIn
"
In
.
where
1 - 1'2
a =
J 1 _
1'2 + ('T/p)2'
Then Cl lt" - 61120 as noo if either
C6. L: sup l I a,,(xh"',x,,)11 < 00 or
n Xl_ .x"
Xl. 0. Xn
(6.25 )
as k 00 over the integers. This is immediate, if we can prove that
(6.26 )
holds for all large enough k, say k N, and some sequences having the
properties
Mk > 0 , lim t:.k = 0, t:.k = 00, Bk < 00. (6.27)
Indeed, after iterating Equation 6.26 back to N, we obtain
94 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
It follows from Equation 6.27 and Lemma 2 that this upper bound goes
to zero as k te nds to infinity. The sought-after conclusion will thus be
at hand.
The second and third parts of the proof establish Equations 6.26
and 6.27 under Assumptions C6 and C6', respectively. In the former
case, the argument is relatively straightforward. Under Assumption C6',
however, the details are a bit more complicated, but we are finally able
to use the independence to establish the desired inequality with some
(other) sequences which obey Equation 6.27.
Proofof Equation 6.24. Ite rate Equation 6.2 back from n + 1 to VK,
where K = K(n) is as before. We obtain
" " "
t"+l - 8 J=VK
=
J=VK I=J+l (I - a,h,') aJWJ,
TI (I - ajh/)(tvK - 8) + 2: TI
(6.28)
where it will be necessary to remember that aJ and hJ are now vector
valued ra ndom v ari ables :
(6.29)
We "square" both sides of Equation 6.28, take e x pectations, and then
bound from above (in the obvious way) the two squared norms and the
inner product. The result is
(6.30)
J=m
TI III - ajh/ II < Mq (6.31)
Jel"
max IlaJ11
Jel"
max sup
Xl. .XI
Ilaj(xh, xj)11 = Ale -'Jo- 0 (6.32)
ASSUMPTIONS Cl THROUGH C6' AND Dl THROUGH DS 95
(6.33)
Qk =
TI (I - ajh/) =
Qk(th"',tYk+l-1) (k = 1,2,,, . ) (6.34)
Jeh
is stochastic. The deterministic quantity to be used here in place of
Equation 6.7 is
Ak
= C k xl i'XJ ) aj(xh' XJ)112II Fj(y)II2r
(6.35)
We formally define Tk the way we did in Equation 6.6, but with the
summands given by Equation 6.29 and Ak by Equation 6.35. Using
Assumption C4 in addition to CI, we see that Equation 6.9 remains true
for the matrix Qk of Equation 6.34. Furthermore, by virtue of the uniform
nature of Assumptions C3 through C5, the same ( long) type of argu
ment which led to Equation 6.10 proves, for the present situation, that
(k N) (6.37)
holds for some (deterministic) c > 0 and N < 00. We now apply the
96 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
!5:
(say) for all large enough k. It remains to be seen (for the same reason
that Equation 6.12 followed from Assumption B2) that Assumption C2
implies
l1k 00 = (6.41 )
1:7" '-1+1
(6.42)
6.39,
C5;
paragraph, with the exception of were derived from Assumptions
in particular, the balance of the second bound on e+l
6.38
Cl through
6.42, -+
in Equation remains true as written. Thus, given the val,idity of
Equation we wiII have, because Ak 0,
(6.44)
Equation 6.43 is the desired inequality of Equation 6.26, while Equa.
tions 6.36, 6.41, and 6.44 are collectively the statement of Equation 6.27.
It remains, therefore, to establish Equation 6.42. We begin by carry
E
in view of Assumption C4. After much manipulation, it turns out that
the norm of the matrix i k is also uniformly bounded:
i-I, Wi is independent of
through time i, and hence on the observational errors up through time
and
98 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
{ [ I k aiht']a,wil wlo"" Wi -l }
C (tvk - 6)' 1 +
III I ::; 6' Iltvk - 61111k IITk + Tk'il L Ilailil Wil ::; M311kAkek, (6.47)
ielk
where M3 involves All (with the last given meaning) and the uniform
bound on residual variances. Similarly. we have
IIIII ::; 6"lltvk - 61111k2 L II Eikll lla,111 Wil ::; M411k2Akek. (6.48)
ielk
where M4 involves M2 and so on. Since llk -+- 0 as k -+- 00, Equations
6.45 through 6.48 combine to give
THEOREM 6.3
where
max I laj(xh,x/)I I I F
I iy/)11
D4 rISUp
.:.... I;:..
k
/E:..,;. -::-..,.
- ..- _---,,....,,..:------,,-
p
_
jE < .
h
=
1 - .,.2
D5. lim inf
n
an'(xlo,xn)Fn(Yn)
I lan(Xh, Xn)I I n(Yn)1
I IF > II:
=
J1 - .,.2 + (.,./pi
The v,,'s, .,.2 ,and p can depend on the sequences {xn} and {Yn}.
t
Il n+l - 611 TI III
/=VK
-
ajh/ll lltvK -
611
n n
-
+ 2: TI 1 III ajh/ll llaJ WJII , (6.48)
j=VK 1=/+
where K = K(n) is the (now, possibly random) integer defined at the
outset of the proof of Theorem 6.2. The random vectors a, and hj in
Equation 6.29 are seen to satisfy Assumptions Dl through D5 with
probability one, when we set xJ = tj and Yj = Uj = uitj). Let
n
sn+1 = 2: ajWj (6.49)
"=1
The same two arguments used in the final paragraph of the proof of
Theorem 2.1, to show that Equation 2.13 follows from Condition 4 or
5, apply here to show (component-wise) that
Sn
&.s.
S (6.50)
100 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
(6.51)
as n-+oo. Under Assumption 01, there is a scalar-valued random
variable M = M(t!> t2,) such that Equation 6.31 holds with prob
ability one. Thus, from Equation 6.48,
R, {0 (I - aM for j= 1,2,,n,
for j;::: n + I,
"
+
"
L R 1+1ajh/s, Sn+1
1=1
= - +
obtain, with S1
incorporate the established limit of Equation 6.50 into the result and
0,
=
ASSUMPT[ONS CI THROUGH C6' AND DI THROUGH D5 101
n n
(6.55)
llk
A
-+
&.sO
,
(6.56)
where void products are to be read as unity. This, plus Equation 6.55,
gives
m m
::; Mq L TI II Qdl kdk'
k=ll=k+l
(6.59)
after setting
. ' x - 60
[xlao = mID (r, Ilx - (011) x + 60
Il _ 0011
If fJ' is the rectangular parallelepiped {x: al XI {3h i = 1,2, . , p},
where XI is the ith component of x, then the ith component of the vector
[xlj> is simply [xa!:, which was introduced in Equation 2.1.
We conjecture that the following proposition is a valid extension of
Theorems 6.2 and 6.3, but the methods used to establish those results do
not seem to work for the present situation.
(k = 1 , 2, . ),
and
1 . l
1m Inf -
.
Inf
\
"min L.
(
" F;(YJF/ (Y;
...
) -T
-
2
> 0,
" PIe (y.k..).k+ l-l)e(l'kl l ;(Yj)II 2
ieJk Ir
where
J" = {Vb V" + 1,, VIe+1 -
I}.
maxIl a;(xlo' " , X;)II IIF;(Yi) II
E4. lim sup sup
eJ..;.:.k-c-
_i_ ---,-;- ____ __
1 -
J I-T2 + (T/p)2
T2
=
(k = 1,2,),
where the regression vector Fk(8) and the residual vector Wk each have
Pk components and are defined in the obvious way.
The recursion considered is of the form
observations.
The mean-square consistency of truncated recursive estimators
SUbjected to batch processing is the substance of the next theorem.
THEOREM 6.4
Let { Y..}, {anC )}, and 9 be as defined in the previous statement of the
Conjectured Theorem, and suppose that the Assumptions E 1 through E5
and either E6 or E6' hold. Let S1 be arbitrary in 9, and let
(k = 1,2" , . ),
whe re
1 06 MEAN-SQUARE AND PROBABILITY-ONE CONVERGENCE
Thus, we have
and
Ilsk+1 - Oil 11(1 - Akllk')(s" - 0) + Z"II
From this and Equation 6.62, we deduce the inequality
Ils"+1 - 011 2 III - A"Hd 21Is" - 011 2
+ 2W,,'Ak'(I - Akllk')(s" - 0) + IIA"WkI1 2. (6.63)
If we can show that
(6.64)
BATCH PROCESSING 1 07
holds for some (deterministic) c > 0 and number sequence {L1k} such
that
I1k-+ 0, (6.65)
then we will be finished.
Indeed, if we set
ek2 =
Cllsk - 8112,
it follows from Equation 6.63, after first majorizing the middle term
with norms, that
e+l (I - cl1k)2ek2 + M1C2(IIAkIl Il WkID2ek + 6( IIAkIl Il WkID2.
(6.66)
Since
IIAkll2 tr (Ak' Ak) =
2: lIajll2
je/k
it follows that
+ M4ak(I + ek)'
According to Lemma 3, Equations 6.65 and 6.69 imply SUPk ek2 < 00;
therefore,
e+l (1 Cl1k) ek2
- + MSak'
UJ E . Then we have
and
II - J
/c
(ajb/ + bja/) I + IIAkHk'I12
II - 2: (ajb/
J el"
+ hja/) II + 2: Ila,1121IbJI12.
iel"
(6.70)
We set
where
Corollary. If the sequence of gain vectors {an()} satisfies El, E2, E4,
E5, and either E6 or E6', then so does
an ( . )
* = cp"an(')
for any sequence {CPn} of scalars bounded from above and below away
from zero.
7. Complements and Details
I n this cha pter we wi ll exa mi ne vari ous rati ona les for choosi ng gai n
se que nces for vect or -para me ter re cursi ve esti mati on s chemes . M otivate d
by considerati on of the li near case , two types of gai ns will be dis cusse d i n
detai l. T he firs t cate gory of gai ns possesses a n optima l pr oper ty whe n
a pplied t o li near regressi on . T he ot her has the virt ue o fex tre me compu
ta ti ona l simp licity. T he res ults of T he ore m 6.4 are s pe cialize d a nd
a pplie d dire ct ly to these par ticular gai ns i n T he ore m 7 .1 . We be gi n our
dis cuss ion wit h a look a t li near re gressi on fr om the re cursi ve poi nt of
view.
where {hn} is a k nown se quence of p-di me nsi ona l ve ct ors, a is not k nown,
a nd the W,,'s are ass ume d to be i nde pe nde nt ra nd om varia bles wit h
commori u nk nown varia nces a2 S uppose furt her tha t s o meone prese nts
us wit h a n esti mat or til tha t is base d upon (is a meas urable functi on of)
the firs t n - 1 obser vati ons . We constr uct a n esti mat or tn+l tha t
i ncor p orates the nt h obser vati on YII i n the foll owing wa y:
Since
t9'lI tn+1 - 8112 = tr t9'( tn+l - 8)( t"+l - 8r, (7.3)
it i s cl ea r tha t ansh oul d be ch osen to mi n i mize th e tra ce of Bn. Su bst itu t
i ng Equa ti o n 7.1 i nto Equ ation 7 .2, ex ploiti ng the i ndependence of the
W' s, a nd completing th e squa re, we find th at
Bn - Bn-l
_ (Bn-In
h )(Bn_lh..)'
1 + hn'Bn-1hn
+ ( 1 + hn'Bn-lhn) an - ( Bn-1bn
1 + b"'Bn-lhn an )( -
Bn_lh,,
h '
1 + hn'B"-1 n ),
(7.4)
T hus,
tr Bn = tr Bn-l
h n'B_lh"
1 + hnB' n-l"
-
h ) an
+ ( 1 + hn'Bn-1n
I -
1 + Bh:lnh r 7 )
( .5
Thu s, if th e estima tor tn i s gi ven ( with second- order moment ma trix
B"-I,) th e appropr ia te valu e of an ( wh ich mi ni mizes tr Bn) i s given by
(7.6)
Wh en an is so ch osen,
Bn - B n-l _ \?-.. _I"o,,}\?-n- "0,,)
1
(7.7)
-
1 + n h B' n_l"h '
)
and
B"h" = Bn-lhn 1 ( -
hn'B .. _1h,,
1 + hnB' n-1hn
1 Bn- hn
:...,..::-:- = an
1 + hn'Bn-1hn
Thus, the same end is achieved by choosing
an = Bnbn, (7.8)
wh ere Bni s defined i n terms of Bn-1 by Equation 7 .7. Th is result leads
u s to gi ve seri ous consi deration to gai n sequences defined i teratively by
In order to "get the recursion started," initial conditions for $t_n$ and $B_{n-1}$ must be specified for some $n = n_0 + 1$. If this is done, it is easy to verify that
$$t_{n_0+k+1} = B_{n_0+k}\Big(B_{n_0}^{-1} t_{n_0+1} + \sum_{j=n_0+1}^{n_0+k} h_j Y_j\Big) \qquad (k = 0, 1, 2, \ldots) \tag{7.10}$$
and
$$B_n^{-1} = B_{n_0}^{-1} + \sum_{j=n_0+1}^{n} h_j h_j'. \tag{7.11}$$
In particular, if we take $n_0 = 0$ and $B_0 = R_0$, then
$$B_n = \Big(R_0^{-1} + \sum_{j=1}^{n} h_j h_j'\Big)^{-1} \tag{7.13}$$
and
$$t_{n+1} = B_n\Big(R_0^{-1} t_1 + \sum_{j=1}^{n} h_j Y_j\Big). \tag{7.14}$$
This is exactly the expression for the conditional expectation of $\theta$, given $Y_1, \ldots, Y_n$, in the case where the residuals have a spherically symmetric Gaussian distribution.
Suppose, on the other hand, that we wait for $p$ observations to accumulate before attempting to estimate the $p$-dimensional parameter $\theta$. If we assume that $h_1, h_2, \ldots, h_p$ are linearly independent and take, as our "first" estimate, the least-squares estimate based on the first $p$ observations, then
$$(t_{p+1} - \theta) = \Big(\sum_{j=1}^{p} h_j h_j'\Big)^{-1}\Big(\sum_{j=1}^{p} h_j W_j\Big).$$
Thus, if we take
$$n_0 = p \qquad\text{and}\qquad B_p = \Big(\sum_{j=1}^{p} h_j h_j'\Big)^{-1},$$
we deduce from Equation 7.10 that
$$t_{n+1} = \Big(\sum_{j=1}^{n} h_j h_j'\Big)^{-1} \sum_{j=1}^{n} h_j Y_j \qquad (n \ge p), \tag{7.15}$$
which is precisely the least-squares estimator for $\theta$ based upon $Y_1, Y_2, \ldots, Y_n$.
In more conventional matrix notation, the Bayesian and least-squares estimators, Equations 7.14 and 7.15, can be written as
$$t_{n+1} = (R_0^{-1} + H_n H_n')^{-1}\big(R_0^{-1} t_1 + H_n \mathbf{Y}_n\big)$$
and
$$t_{n+1} = (H_n H_n')^{-1} H_n \mathbf{Y}_n,$$
respectively, where $H_n'$ is the $n \times p$ matrix whose rows are $h_j'$ $(j = 1, 2, \ldots, n)$, and $\mathbf{Y}_n$ is the $n$-vector whose $j$th component is the scalar observation $Y_j$. Thus, depending upon the initial conditions, the recursion of Equation 7.9 can yield the Bayesian estimator of $\theta$ (conditional expectation) in a Gaussian formulation, or the least-squares estimator for $\theta$ (no assumptions concerning distribution theory of residuals being necessary).
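The recursion of Equations 7.7 through 7.9 is easy to program. The following minimal sketch (ours, not the text's; Python, with synthetic Gaussian data) runs the recursion from the least-squares initialization of Equation 7.15 and checks it against the direct least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 200
theta = np.array([1.0, -2.0, 0.5])             # true parameter (hypothetical)
H = rng.normal(size=(N, p))                    # regression vectors h_n' as rows
Y = H @ theta + rng.normal(scale=0.3, size=N)  # Y_n = h_n' theta + W_n

# Initialize at n0 = p with the least-squares estimate on the first p rows
B = np.linalg.inv(H[:p].T @ H[:p])             # B_p = (sum h_j h_j')^{-1}
t = B @ H[:p].T @ Y[:p]                        # t_{p+1}

for n in range(p, N):
    h, y = H[n], Y[n]
    Bh = B @ h
    B -= np.outer(Bh, Bh) / (1.0 + h @ Bh)     # Equation 7.7
    a = B @ h                                  # gain a_n = B_n h_n (Equation 7.8)
    t = t + a * (y - h @ t)                    # Equation 7.9

t_ls = np.linalg.lstsq(H, Y, rcond=None)[0]    # direct least squares (Equation 7.15)
print(np.allclose(t, t_ls))                    # the recursion reproduces it exactly
```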
Therefore,
$$\mathbb{E}\,\|t_{n+1} - \theta\|^2 = \operatorname{tr} B_n\big[\Lambda_{n_0} + \sigma^2 B_n^{-1}\big] B_n = \operatorname{tr} B_n \Lambda_{n_0} B_n + \sigma^2 \operatorname{tr} B_n$$
approaches zero if and only if $\operatorname{tr} B_n \to 0$. Since
$$\lambda_{\max}(B_n) \le \operatorname{tr} B_n \le p\,\lambda_{\max}(B_n),$$
this reduces the question of $t_n$'s mean-square consistency to the study of $B_n$'s largest eigenvalue. (We could resort to Theorem 6.2, but we will see that the special features of this linear problem make the hypotheses of Theorem 6.2 unnecessarily strong.) Since $\lambda_{\max}(B_n) = 1/\lambda_{\min}(B_n^{-1})$ and, by Equation 7.10, $\lambda_{\min}(B_n^{-1}) \to \infty$ if and only if $x' B_n^{-1} x \to \infty$ for every unit vector $x$, we must find conditions which ensure that
$$\lim_{n} \lambda_{\min}\Big(\sum_{j=1}^{n} h_j h_j'\Big) = \infty. \tag{7.16}$$
Equation 7.16 will hold if there is a sequence of integers $1 \le \nu_1 < \nu_2 < \cdots$, with
$$p \le p_k = \nu_{k+1} - \nu_k \le q < \infty \quad\text{and}\quad J_k = \{\nu_k, \nu_k + 1, \ldots, \nu_{k+1} - 1\},$$
such that
$$\lambda_{\min}\Big(\sum_{j \in J_k} \frac{h_j h_j'}{\|h_j\|^2}\Big) \ge \tau^2 > 0 \tag{7.17a}$$
and
$$\sum_{k=1}^{\infty} \min_{j \in J_k} \|h_j\|^2 = \infty. \tag{7.17b}$$
For then
$$\sum_{k=1}^{K} \lambda_{\min}\Big(\sum_{j \in J_k} h_j h_j'\Big) \ge \sum_{k=1}^{K} \min_{j \in J_k} \|h_j\|^2\, \lambda_{\min}\Big(\sum_{j \in J_k} \frac{h_j h_j'}{\|h_j\|^2}\Big) \ge \tau^2 \sum_{k=1}^{K} \min_{j \in J_k} \|h_j\|^2 \to \infty.$$
Since
$$\frac{1}{pq} \sum_{j \in J_k} \|B_j h_j\|\,\|h_j\| \;\cdots$$
In situations where data pour in at a very high rate and "real-time" estimates are acutely desired, we may be willing to trade off statistical efficiency for computational speed, so long as consistency is preserved. The gain sequence
$$a_n = \frac{h_n}{\sum_{j=1}^{n} \|h_j\|^2} \tag{7.18}$$
furnishes a very handy estimation scheme. The gains used in Equation 7.9 allowed us to find a closed-form expression for $t_n$ and to study its asymptotic properties directly (without recourse to Theorem 6.2). This is not possible in the present case. However, if we assume that Equation 7.17 holds and, in addition, that
$$0 < \liminf_n \frac{\|h_{n+1}\|}{\|h_n\|} \le \limsup_n \frac{\|h_{n+1}\|}{\|h_n\|} < \infty \tag{7.19a}$$
and
$$\lim_n \frac{\|h_n\|^2}{\sum_{j=1}^{n} \|h_j\|^2} = 0, \tag{7.19b}$$
then Assumptions C1 through C5 and C6' are satisfied. To see why this
is so, we begin by pointing out that in the present case
(7.20)
therefore, if $i, j \in J_n$,
Consequently
while
Therefore,
Since the hypotheses of Theorems 6.2 and 6.3 are identical in the linear case, we infer: The recursion defined by the gain of Equation 7.18 generates an estimator sequence that is consistent in both the mean-square and almost-sure senses, provided that the conditions of Equations 7.17 and 7.19 are satisfied by the regression vectors $\{h_n\}$.
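As a concrete sketch (ours, not the text's; Python, with synthetic data), the gain of Equation 7.18 needs only one running scalar in place of the matrix recursion for $B_n$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 2, 5000
theta = np.array([3.0, -1.0])                   # true parameter (hypothetical)
H = rng.normal(size=(N, p)) + 1.0               # regression vectors, kept away from zero
Y = H @ theta + rng.normal(scale=0.5, size=N)

t = np.zeros(p)      # arbitrary starting estimate t_1
denom = 0.0          # running value of sum_{j<=n} ||h_j||^2

for n in range(N):
    h, y = H[n], Y[n]
    denom += h @ h
    a = h / denom                               # "quick and dirty" gain (Equation 7.18)
    t = t + a * (y - h @ t)

print(t)   # close to theta for large N, as the consistency statement predicts
```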
If $1 \le \nu_1 < \nu_2 < \cdots$, and if we write
$$\mathbf{Y}_k = \big(Y_{\nu_k}, Y_{\nu_k+1}, \ldots, Y_{\nu_{k+1}-1}\big)' \qquad (k = 1, 2, \ldots)$$
for the $k$th batch of observations, then
$$\mathbf{Y}_k = H_k'\theta + \mathbf{W}_k \qquad (k = 1, 2, \ldots).$$
Now consider the same question that was posed in the early part of Section 7.1. If the observations $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_{k-1}$ have been used to form the estimator $s_k$, and if
$$R_{k-1} = \frac{1}{\sigma^2}\,\mathbb{E}\,(s_k - \theta)(s_k - \theta)',$$
what choice of $A_k$ minimizes $\mathbb{E}\,\|s_{k+1} - \theta\|^2$, where
$$s_{k+1} = s_k + A_k\big[\mathbf{Y}_k - H_k' s_k\big]\,? \tag{7.21}$$
Arguing as in Section 7.1, we set
$$T_k = (H_k' R_{k-1} H_k + I) \tag{7.22}$$
and find that the minimizing gain is
$$A_k = R_{k-1} H_k T_k^{-1}. \tag{7.23}$$
When $A_k$ is so chosen, the gain can also be written as
$$A_k = R_k H_k, \tag{7.25}$$
where
$$R_k = R_{k-1} - R_{k-1} H_k T_k^{-1} H_k' R_{k-1}. \tag{7.26}$$
This, in turn, strongly suggests that we give serious consideration to recursions of the form
$$s_{k+1} = s_k + R_k H_k\big[\mathbf{Y}_k - H_k' s_k\big] \qquad (k = 1, 2, \ldots), \tag{7.27}$$
where $R_k$ satisfies Equation 7.26 for $k = 1, 2, \ldots$ and $R_0$ is arbitrary.
In closed form,
$$s_{k+1} = \Big[\prod_{j=k_0+1}^{k} (I - R_j H_j H_j')\Big] s_{k_0+1} + \sum_{j=k_0+1}^{k} \Big[\prod_{i=j+1}^{k} (I - R_i H_i H_i')\Big] R_j H_j \mathbf{Y}_j. \tag{7.28}$$
It follows (from Equation 7.26) that
$$R_k = \Big[R_{k_0}^{-1} + \sum_{j=k_0+1}^{k} H_j H_j'\Big]^{-1} \qquad (k \ge k_0 + 1), \tag{7.29}$$
and hence
$$s_{k+1} = R_k\Big(R_{k_0}^{-1} s_{k_0+1} + \sum_{j=k_0+1}^{k} H_j \mathbf{Y}_j\Big) = \Big[R_{k_0}^{-1} + \sum_{j=k_0+1}^{k} H_j H_j'\Big]^{-1}\Big(R_{k_0}^{-1} s_{k_0+1} + \sum_{j=k_0+1}^{k} H_j \mathbf{Y}_j\Big). \tag{7.31}$$
You will recall that $H_j'$ is the matrix whose rows are $h_{\nu_j}', h_{\nu_j+1}', \ldots, h_{\nu_{j+1}-1}'$; therefore,
$$H_j H_j' = \sum_{i \in J_j} h_i h_i', \tag{7.36}$$
$$\cdots \tag{7.38}$$
Thus, with this initialization,
$$s_{k+1} = \Big(\sum_{j=1}^{\nu_{k+1}-1} h_j h_j'\Big)^{-1}\Big(\sum_{j=1}^{\nu_{k+1}-1} h_j Y_j\Big),$$
which is identical to Equation 7.15, the least-squares estimator based upon the first $\nu_{k+1} - 1$ observations.
We now recall that the $k$th element of the recursively defined sequence of Equation 7.26 is identical with the $(\nu_{k+1} - 1)$th element of the recursively defined sequence of Equation 7.7, if $k \ge k_0$; the batch recursion, with $B_n$ satisfying the recursion (Equation 7.7) for $n \ge \nu_{k_0}$, generates a sequence of estimators which (depending upon initial conditions) is a subsequence of those generated by the recursion Equation 7.9, and,
"QUICK AND DIRTY" BATCH PROCESSING 121
for $\nu_k \le n < \nu_{k+1}$, then the arguments of Section 7.2 can be applied to the present gain vectors, and Assumptions E1 through E6' can again be established. The inequality is indeed true. Under Equations 7.17 and 7.19, we have
$$\sum_{j=1}^{\nu_{k+1}-1} \|h_j\|^2 \le (1 + c_1) \sum_{j=1}^{n} \|h_j\|^2$$
if $\nu_k \le n < \nu_{k+1}$. Thus, Theorem 6.4 applies to the untruncated (as well as truncated) batch-processing recursion; therefore, $s_k$ converges to $\theta$ in the mean square.
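A batch version is a direct transcription of Equations 7.26 and 7.27. The sketch below (ours; Python, with made-up data and batch sizes) processes observations in fixed-size batches and illustrates the subsequence relationship noted above.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, K = 2, 4, 300                     # parameter dim, batch size, number of batches
theta = np.array([0.7, 2.0])            # true parameter (hypothetical)
H = rng.normal(size=(q * K, p))
Y = H @ theta + rng.normal(scale=0.4, size=q * K)

R = np.eye(p) * 100.0                   # R_0: an arbitrary positive definite start
s = np.zeros(p)                         # s_1: arbitrary start

for k in range(K):
    Hk = H[q * k : q * (k + 1)].T       # p x q matrix whose columns are h_j, j in J_k
    Yk = Y[q * k : q * (k + 1)]
    Tk = Hk.T @ R @ Hk + np.eye(q)      # T_k (Equation 7.22)
    R = R - R @ Hk @ np.linalg.solve(Tk, Hk.T @ R)   # Equation 7.26
    s = s + R @ Hk @ (Yk - Hk.T @ s)    # Equation 7.27 with gain A_k = R_k H_k

print(s)   # approaches theta; a diffuse R_0 makes it close to batched least squares
```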
We now turn our attention to the question of gain sequences for truly
nonlinear regression problems.
Letting
$$\tilde{Y}_n = Y_n - \big[F_n(\theta_0) - \nabla F_n'(\theta_0)\,\theta_0\big],$$
we deduce from Equation 7.41 that
$$\tilde{Y}_n \simeq \nabla F_n'(\theta_0)\,\theta + W_n \tag{7.42}$$
(where $\simeq$ means "approximately equal"). In turn, Equation 7.42 suggests that it would be worthwhile trying the recursive linear-regression schemes developed in Sections 7.1 through 7.4 on the transformed observations $\tilde{Y}_n$. That is to say, we "pretend" that
$$\tilde{Y}_n = h_n'\theta + W_n,$$
where
$$h_n = \nabla F_n(\theta_0).$$
We estimate $\theta$ by a recursive scheme of the form
$$t_{n+1} = t_n + a_n\big[\tilde{Y}_n - h_n' t_n\big].$$
One candidate gain sequence is the linearized least-squares gain
$$a_n = B_n h_n, \qquad B_n = B_{n-1} - \frac{(B_{n-1}h_n)(B_{n-1}h_n)'}{1 + h_n' B_{n-1} h_n}. \tag{7.45}$$
The $B_n$ recursion is initialized at $n_0$, where $B_{n_0}$ can be any positive definite matrix. In this case, in closed form, we can write
The other sequence is the nonlinear version of the "quick and dirty" gain:
$$a_n = \frac{h_n}{\sum_{j=1}^{n} \|h_j\|^2}. \tag{7.47}$$
Gains computed with $h_n = \nabla F_n(t_n)$ are adaptive, whereas those with $h_n = \nabla F_n(\theta_0)$ are deterministic. The reader will recall from Chapter 4 that adaptive gains may or may not be more efficient in the scalar-parameter case (compare Theorem 4.2), and we feel it is safe to conjecture that a similar situation exists in the vector case.
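The adaptive variant, in which the gradient is re-evaluated at the current estimate, is a recursive Gauss-Newton-type scheme. The following sketch (ours, under the assumption of a simple made-up regression function; Python) illustrates the recursion with the $B_n$ update of Equation 7.45, correcting by the raw prediction error as the nonlinear examples of Chapter 8 do.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
theta = np.array([2.0, 0.5])                  # true parameter (hypothetical model)
x = rng.uniform(0.5, 3.0, size=N)             # design points (assumed)

def F(th, xn):                                # assumed regression function
    return th[0] * (1.0 - np.exp(-th[1] * xn))

def gradF(th, xn):                            # its gradient in theta
    e = np.exp(-th[1] * xn)
    return np.array([1.0 - e, th[0] * xn * e])

Y = F(theta, x) + rng.normal(scale=0.1, size=N)

t = np.array([1.0, 1.0])                      # starting guess in the parameter set
B = np.eye(2) * 10.0                          # B_{n0}: any positive definite matrix

for n in range(N):
    h = gradF(t, x[n])                        # adaptive: gradient at current estimate
    Bh = B @ h
    B -= np.outer(Bh, Bh) / (1.0 + h @ Bh)    # B_n recursion (as in Equation 7.45)
    t = t + (B @ h) * (Y[n] - F(t, x[n]))     # correct by the prediction error

print(t)   # near theta when the starting guess is adequate
```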
To summarize, the single-observation gains are
$$a_n = \frac{\nabla F_n(\xi_n)}{\sum_{j=1}^{n} \|\nabla F_j(\xi_j)\|^2} \tag{7.48a}$$
and
$$a_n = B_n\,\nabla F_n(\xi_n), \tag{7.48b}$$
where $B_n$ is generated as in Equation 7.45 and, for each $j$, $\xi_j$ maps $(t_1, t_2, \ldots, t_j)$ into $\mathscr{P}$. The gains can be classified as deterministic if $\xi_j(t_1, t_2, \ldots, t_j) = \theta_0 \in \mathscr{P}$ for all $j$, and adaptive if $\xi_j = t_j$. The corresponding batch-processing gains (Equations 7.48c and 7.48d) are defined analogously, where now $\xi_{\nu_k}$ maps $(s_1, s_2, \ldots, s_k)$ into $\mathscr{P}$. These gains can be classified as
deterministic if $\xi_{\nu_k} = \theta_0 \in \mathscr{P}$,
adaptive if $\xi_{\nu_k} = s_k$ $(k = 1, 2, \ldots)$,
quasi-adaptive if $\xi_{\nu_k} = s_{n(k)}$,
where $n(k)$ is a nondecreasing integer sequence with $n(k) \le k$.
THEOREM 7.1
Let $\{F_n(\cdot)\}$ be a sequence of real-valued functions defined over a $p$-dimensional closed convex set $\mathscr{P}$. We assume that each $F_n(\cdot)$ has bounded second-order mixed partial derivatives over $\mathscr{P}$ and that, for some $x \in \mathscr{P}$, the following conditions hold true.
where $G_n(y)$ is the matrix whose $i$th column is
$$\frac{\partial^2 F_n(\xi)}{\partial \theta\,\partial \theta_i}\bigg|_{\xi = y} \qquad (i = 1, 2, \ldots, p),$$
and the sequence of gradient vectors $h_n^* = \nabla F_n(x)$ satisfies the conditions:

F1. $\displaystyle \lim_n \frac{\|h_n^*\|^2}{\sum_{j=1}^{n} \|h_j^*\|^2} = 0$.

F2. $\displaystyle 0 < \liminf_n \frac{\|h_{n+1}^*\|}{\|h_n^*\|} \le \limsup_n \frac{\|h_{n+1}^*\|}{\|h_n^*\|} < \infty$.

F3. $\displaystyle \sum_{n=1}^{\infty} \|h_n^*\|^2 = \infty$.

F4. $\|G_n(y)\| \le C_1\,\|h_n^*\|$ for all $y \in \mathscr{P}$ and all $n$.

F5. There is a sequence of integers $1 \le \nu_1 < \nu_2 < \cdots$, with $p \le p_k = \nu_{k+1} - \nu_k \le q < \infty$, such that
$$\liminf_{k \to \infty} \frac{1}{p_k}\,\lambda_{\min}\Big(\sum_{j \in J_k} \frac{h_j^* h_j^{*\prime}}{\|h_j^*\|^2}\Big) = \tau^{*2} > 0,$$
where $J_k = \{\nu_k, \nu_k + 1, \ldots, \nu_{k+1} - 1\}$.

F6. $\cdots$

Let
$$\sigma^{*2} = \limsup_{k \to \infty} \frac{1}{p_k}\,\lambda_{\max}\Big(\sum_{j \in J_k} \frac{h_j^* h_j^{*\prime}}{\|h_j^*\|^2}\Big),$$
and let $r(\mathscr{P})$ be the radius of the smallest closed sphere containing $\mathscr{P}$.
for all $y \in \mathscr{P}$ if $r(\mathscr{P})$ is chosen small enough to ensure the leftmost
inequality. By Assumption F4, we have
(7 . 5 1 )
uniformly in $a_n$'s argument for all $n$. Assumptions E1, E2, E4, and E6' now follow when Equations 7.49 through 7.52, F2, F3, and the Abel-Dini theorem (2.27) are combined in what is, by now, routine fashion.

To prove Assumption E5 for the gains of Equations 7.48a, c, we notice that
$$\frac{a_j'(x)\,\nabla F_j(y)}{\|a_j(x)\|\,\|\nabla F_j(y)\|} \ge \frac{\|h_j^*\|}{\|h_j^* + r_j(\cdot)\|}\,\big(1 - C_4 r(\mathscr{P})\big)$$
for all $y \in \mathscr{P}$, provided that $r(\mathscr{P})$ is suitably small. Thus, the left-hand side of Equation 7.53 is bounded below (uniformly) by $(1 - C_3 r(\mathscr{P}))\,\cdots$,
where
$$\frac{\|h_j^*\|}{\|h_j^* + r_j\|} \ge \frac{1}{1 + \eta_j}, \tag{7.55}$$
and
$$\cdots \ge p\,(\tau^{*2} - \varepsilon) - q\,C_5 r(\mathscr{P})$$
by Assumption F5 if $k$ is suitably large and $r(\mathscr{P})$ is suitably small. Here $\Sigma$ is a nonnegative definite matrix. From the Courant-Fischer characterization of eigenvalues, Schwarz's Inequality, and Equations 7.49b and 7.55,
$$\lambda_{\min}(\Sigma) \ge \tau^{*2} - \frac{2}{p_k}\sum_{j \in J_k} \frac{\|r_j\|\,\|h_j^*\|}{\|r_j + h_j^*\|^2} \ge \tau^{*2} - C_5 r(\mathscr{P}). \tag{7.56}$$
for $m = n, \ldots, \nu_{k+1} - 1$, where $k$ is chosen so that $\nu_k \le n < \nu_{k+1}$. Furthermore, since $R$ is assumed to be nonnegative definite and since
$$\lambda_{\min}(A + B) \ge \lambda_{\min}(A) + \lambda_{\min}(B)$$
for symmetric matrices, it follows that $\cdots$ if $n$ is large.
Consequently,
$$\|a_n\| \ge \big[\cdots - 2C_1 r(\mathscr{P})\big]\,\frac{\|h_n^*\|}{\sum_{j=1}^{\nu_{k+1}-1} \|h_j^*\|^2} \tag{7.61}$$
and
$$\|a_n\| \le \frac{\operatorname{tr} R + q\{1 + C_1 r(\mathscr{P})\}^2 K^{2q}\,\|h_n^*\|^2}{\sum_{j=1}^{\nu_{k+1}-1} \|h_j^*\|^2}$$
for large n. Combining Equations 7.58, 7.59, and 7.62, we find that
(7.64)
Assumption F2 implies
$$\frac{\max_{j \in J_k} \|h_j^*\|^2}{\min_{j \in J_k} \|h_j^*\|^2} \le K^{2q}, \qquad \max_{j \in J_k} \|h_j^*\|^2 \le \|h_{\nu_k}^*\|^2\, K^{2q},$$
and
$$\frac{\|h_{\nu_k}^*\|^2}{\sum_{j=1}^{\nu_{k+1}-1} \|h_j^*\|^2} \le \frac{qK^{2q}}{\,\cdots\,}$$
if $k$ is large (that is, if $n$ is large). Thus, $\rho$ (as defined by Assumption E4) is bounded above by
$$\frac{(1 + C_1 r(\mathscr{P}))^4\,(pK^{4q})}{(1 - C_1 r(\mathscr{P}))^2\,\big(\tau^{*2} - 2pK^{2q}\,C_1 r(\mathscr{P}) - \varepsilon\big)}.$$
Since $\varepsilon$ is arbitrary,
$$\rho \le \frac{(1 + C_1 r(\mathscr{P}))^4\,(pK^{4q})}{(1 - C_1 r(\mathscr{P}))^2\,\big(\tau^{*2} - 2pK^{2q}\,C_1 r(\mathscr{P})\big)}. \tag{7.65}$$
It remains to treat the gains of Equations 7.48b, d. By Equations 7.50 and 7.66,
$$\|\nabla F_n(\xi_n)\| \ge \big(1 - C_1 r(\mathscr{P})\big)\,\|h_n^*\|.$$
Letting $\kappa_m$ denote the ratio of the smallest to the largest eigenvalue of $B_m$, Lemma 7b gives
$$\frac{\nabla F_n'(\xi_n)\, B_m\, \nabla F_n(\xi_n)}{\|B_m \nabla F_n(\xi_n)\|\,\|\nabla F_n(\xi_n)\|} \ge \frac{2\kappa_m^{1/2}}{1 + \kappa_m}.$$
Thus, we see that
$$\liminf_{n}\; \inf_{x \in \mathscr{P},\, y \in \mathscr{P}} \frac{a_n'(x)\,\nabla F_n(y)}{\|a_n(x)\|\,\|\nabla F_n(y)\|} \ge \liminf_{m} \frac{2\kappa_m^{1/2}}{1 + \kappa_m}\cdot\frac{1 - 2C_1 r(\mathscr{P})}{1 + 2C_1 r(\mathscr{P})} - 3C_1 r(\mathscr{P}) \ge \liminf_{m} \frac{2\kappa_m^{1/2}}{1 + \kappa_m} - C_3 r(\mathscr{P}) \tag{7.66}$$
if $r(\mathscr{P})$ is small.
$$\cdots \ge 1 - \tau^2 + \tau^2/p^2 \cdots$$
if $r(\mathscr{P})$ is small. On the other hand, if Equation 7.67 holds,
$$\liminf_{m} \frac{2\kappa_m^{1/2}}{1 + \kappa_m} \ge \frac{2(\tau^*/\sigma^*)}{1 + (\tau^*/\sigma^*)^2} - C_{13}\, r(\mathscr{P}),$$
Also,
$$B_m^{-1} = \sum_{j} (h_j^* + r_j)(h_j^* + r_j)' \quad\text{if } m \in J_k,$$
and
$$\lambda_{\min}(B_m^{-1}) \ge \sum_{n=1}^{k-1} \min_{j \in J_n} \|h_j^*\|^2\,\Big\{\lambda_{\min}\Big[\sum_{j \in J_n} \frac{h_j^* h_j^{*\prime}}{\|h_j^*\|^2}\Big] - 2q\,C_1 r(\mathscr{P})\Big\} - q \min_{j \in J_k} \|h_j^*\|^2 \quad\text{if } m \in J_k.$$
By virtue of Assumptions F3, F4, and F5, the sums in the denominator and numerator approach $+\infty$, while the ratio of the second term in the numerator to the sum in the denominator approaches zero by Assumption F2. Using the discrete version of L'Hospital's rule, we find that
$$\liminf_{m \to \infty} \kappa_m \ge \cdots$$
and by Assumptions F4, F5, and F6, we see that the last is greater than or equal to
$$\Big(\frac{C_1 \tau}{q}\Big)^2 - C_9\, r(\mathscr{P}).$$

We shall say that the regression vectors $\{h_n\}$ are ill conditioned if
$$\limsup_{n \to \infty} \frac{\lambda_{\max}\big(\sum_{j=1}^{n} h_j h_j'\big)}{\lambda_{\min}\big(\sum_{j=1}^{n} h_j h_j'\big)} = \infty. \tag{7.68}$$
(n = 1 " 2 ...) ,
and if we attempt to estimate I) recursively (by means of Equation 7.9), it
is necessary to compute BII = (Li 1 hjb/) -1 at each step of the recur
=
THEOREM 7.2
If $\sum_n \|h_n\|^2 = \infty$ and $\lim_n h_n/\|h_n\| = h$, then $\{h_n\}$ is ill conditioned.

(We defer the proof till the end of this section.) For instance, if $h_n = (1, n)'$, it is clear that
$$\frac{h_n}{\|h_n\|} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix};$$
therefore, Theorem 7.2 applies. At the same time, we have
The first factor on the right-hand side is less than a constant times $n^{-4}$; therefore, $\operatorname{tr} B_n = O(1/n) \to 0$. In cases such as these, we can only advise the practitioner to exercise extreme caution in designing his computational program.
In light of Lemma 7b, ill-conditioned linear-regression functions must necessarily violate at least one of the hypotheses of Theorem 7.1. If, in particular, the regression is ill conditioned owing to the fact that
$$\sum_n \|h_n\|^2 = \infty \quad\text{and}\quad \frac{h_n}{\|h_n\|} \to h,$$
it follows that
$$\lim_{n \to \infty} \lambda_{\min}\Big(\sum_{j=n}^{n+k} \frac{h_j h_j'}{\|h_j\|^2}\Big) = k\,\lambda_{\min}(hh') = 0$$
for any $k$, which means that Assumptions C3, D3, and E3 of Chapter 6 are violated. This of itself does not preclude consistency (for example, least-squares polynomial regression). However, the theorems of Chapters 6 and 7 don't apply. In particular, the "quick and dirty" recursion applied to polynomial regression cannot be shown to be consistent.
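A quick numerical check (ours; Python) of the condition in Equation 7.68 for the vectors $h_n = (1, n)'$ shows the eigenvalue ratio diverging even while $\operatorname{tr} B_n \to 0$, exactly as claimed above:

```python
import numpy as np

S = np.zeros((2, 2))
for n in range(1, 10001):
    h = np.array([1.0, float(n)])
    S += np.outer(h, h)                     # running sum of h_j h_j'
    if n in (10, 100, 1000, 10000):
        lam = np.linalg.eigvalsh(S)         # ascending eigenvalues
        print(n, lam[1] / lam[0],           # condition ratio grows without bound (7.68)
              np.trace(np.linalg.inv(S)))   # yet tr B_n = O(1/n) -> 0
```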
These observations apply even more strongly to the case of nonlinear
regression. A nonlinear regression function exhibits the pathology of ill
conditioning if Equation 7.68 holds when $h_n$ is the gradient of the
regression function evaluated at the true parameter value.
Proof of Theorem 7.2. Since $\det A$ is equal to the product of $A$'s eigenvalues, it must be that
$$\det A \ge \lambda_{\max}(A)\,[\lambda_{\min}(A)]^{p-1}$$
if $A$ is $p \times p$ and nonnegative definite. On the other hand, we see that
$$\frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \ge \Big[\frac{[\lambda_{\max}(A)]^p}{\det(A)}\Big]^{1/(p-1)} \ge \Big[\Big(\frac{\operatorname{tr} A}{p}\Big)^p \frac{1}{\det(A)}\Big]^{1/(p-1)}. \tag{7.69a}$$
Since
$$\operatorname{tr}\Big(\sum_{j=1}^{n} h_j h_j'\Big) = \sum_{j=1}^{n} \|h_j\|^2 \to \infty,$$
it follows that
$$\cdots \tag{7.69b}$$
But $\cdots$ Since $\cdots$

In time-series regression problems the functions $F_n(\theta)$ have the form $F(t_n; \theta)$, where
$$t_1 < t_2 < \cdots \tag{7.70}$$
are the sampling instants. The large-sample properties of recursive estimation sequences are determined by the analytic properties of $F(t; \theta)$ for large values of $t$.
However, the scope of regression analysis also embraces experimental
situations where the regression function is of the form F(t; e), t now
denoting a (possibly abstract) variable that the experimenter can choose
more or less at will (with replication if so desired) from a certain set of
values. In particular, the constraint of 7.70 is not present. In fact, the
values of the independent variable t are usually chosen from a set that is
bounded (in an appropriate metric) or compact (in an appropriate
topology). For example, F(t; e) might be the mean yield of a chemical
process when the control variables (temperature, pressure, input
quantities, and so on) are represented by the vector t and the external
From the theoretical point of view, the most appealing feature of the
recursive method, applied to the determination of response surfaces, is
the wide class of regressions (apparently much larger than those in
time-series applications) that satisfy the hypotheses of Theorem 7. 1 .
The following theorem demonstrates the great simplifications that ob
tain when the independent variable t is constrained to a compact set.
THEOREM 7.3
Let $\mathscr{T}$ be a compact set, let $\mathscr{P}$ be a convex, compact subset of $p$-dimensional Euclidean space, and suppose that $F(\cdot\,;\,\cdot)$ is a real-valued function defined over $\mathscr{T} \times \mathscr{P}$, having the following properties:
G1. $\partial^2 F/\partial\theta_i\,\partial\theta_j$ exists and is continuous over $\mathscr{T} \times \mathscr{P}$, and
G2. $\|\mathbf{F}\|$ is continuous and positive on $\mathscr{T} \times \mathscr{P}$,
where $\mathbf{F}$ is the column vector whose components are
$$\frac{\partial F}{\partial \theta_i} \qquad (i = 1, 2, \ldots, p).$$
Let $t_1, t_2, \ldots$ be a sequence of points from $\mathscr{T}$, chosen so that, for index sets $J_k$ of bounded cardinality and some $x \in \mathscr{P}$,
$$\liminf_{k \to \infty} \det\Big(\sum_{j \in J_k} \mathbf{F}_j(x)\,\mathbf{F}_j'(x)\Big) = D^2 > 0.$$
By compactness and G2 there are constants $0 < K_1 \le K_2 < \infty$ with
$$K_1 \le \|\mathbf{F}_n(x)\| \le K_2$$
for all $n$ and all $x \in \mathscr{P}$. These facts establish Assumptions F1 through F4.
If $B$ is $p \times p$ and nonnegative definite, we find that $\cdots$ Since $\cdots$ and since $\cdots$ the induced product topology. However, in most (but not all) applications, $\mathscr{T}$ will be a closed bounded subset of some finite-dimensional Euclidean space.
We close this chapter by exhibiting examples of regression functions of the form $F_n(\theta) = F(t_n; \theta)$ which violate the conditions that justify the recursive method if $t_n \to \infty$, but which satisfy the conditions of Theorem 7.3 if the $t_n$ are chosen appropriately from a finite interval.
$\le \beta_2$ and $\{t_n\}$ is a suitably chosen sequence, the difficulty disappears. In fact, to make the problem more interesting, consider
$$F(t_n; \theta) = \theta_1 \sin \theta_2 t_n,$$
where $\cdots$ and $\cdots$ The function
$$F(t, \theta) = \theta_1 \sin \theta_2 t$$
satisfies Assumptions G1 and G2 over $\mathscr{T} \times \mathscr{P}$ if we take $\mathscr{T} = [T_1, T_2]$.
On the other hand, a little algebra shows that
$$\det\big[\mathbf{F}_{2n+1}(\theta)\,\mathbf{F}_{2n+1}'(\theta) + \mathbf{F}_{2n}(\theta)\,\mathbf{F}_{2n}'(\theta)\big] = \Big[\frac{\theta_1^2 \theta_2^2}{4}\,(t_{2n+1}^2 - t_{2n}^2)\Big]\Big[\frac{\sin \theta_2(t_{2n+1} - t_{2n})}{\theta_2(t_{2n+1} - t_{2n})} - \frac{\sin \theta_2(t_{2n+1} + t_{2n})}{\theta_2(t_{2n+1} + t_{2n})}\Big]^2,$$
and we see that
$$0 < \alpha_2 T_3 \le \mu_n \le \omega_n - 2\alpha_2 T_1 < \omega_n \le 2T_2\beta_2 < \pi.$$
Since $\sin \xi/\xi$ has a negative derivative which is bounded away from zero in the interval $[\alpha_2 T_3,\, 2T_2\beta_2]$, we find that
$$\min_n \Big(\frac{\sin \mu_n}{\mu_n} - \frac{\sin \omega_n}{\omega_n}\Big)^2 > 0.$$
This establishes Assumption G3 if we take $\nu_k = 2k - 1$ $(k = 1, 2, \ldots)$.
Furthermore,
$$\det\big(\mathbf{F}_{2n+1}\mathbf{F}_{2n+1}' + \mathbf{F}_{2n}\mathbf{F}_{2n}'\big) = (t_{2n+1} - t_{2n})^2\,\big[\exp 2(\theta_2 t_{2n} + \theta_1 t_{2n+1})\big]\big[1 - \exp(\theta_2 - \theta_1)(t_{2n+1} - t_{2n})\big]^2,$$
where the $\alpha$'s and $\beta$'s can be positive or negative, and if the sampling instants are chosen so that
$$T_1 \equiv \inf_n t_n < \sup_n t_n \equiv T_2 \quad\text{and}\quad 0 < T_3 \equiv \inf_n (t_{2n+1} - t_{2n}),$$
then
$$F(t; \theta) = \theta_1 e^{\theta_2 t}$$
satisfies Assumptions G1 and G2 on $[T_1, T_2] \times \mathscr{P}$. Moreover, we see
that the polynomial regression functions
$$F(t_n; \theta) = \sum_{i} \theta_i t_n^i$$
fail to satisfy Assumptions C3, D3, and E3 of Chapter 6 if $t_n \to \infty$. However, if the sampling instants are suitably chosen from a compact set, this difficulty also evaporates. To illustrate this, consider the case of a first-degree polynomial
$$F(t; \theta) = \theta_0 + \theta_1 t$$
sampled at times $\{t_n\}$ in the interval $[T_1, T_2]$ and having the property that
Letting
$$F_n(\theta) = F(t_n; \theta),$$
we find, as usual, that $\cdots$ One such scheme (defined over the interval $[0, 1]$, with $T_3 = \tfrac{1}{2}$) chooses
$$t_{2j-1} = \cdots \qquad (j = 1, 2, \ldots, 2^{k-1};\; k = 1, 2, \ldots).$$
8. Applications
Alternatively, one might use the "quick and dirty" gain initially to get things started and then switch to the other type of gain, under the supposition that the linearized version of the problem is, by then, an adequate approximation. This approach can be investigated analytically in the spirit of the present work, but we will not pursue it further.

If the "linearized least-squares" gains of Equations 7.48b, d are to be used, the results of the scalar-parameter case presented in Theorem 4.2 for Gains 2 and 3 show that we cannot state a priori that the adaptive version will be more efficient than the deterministic version (as one might expect). At this time, we can offer little in the way of guidelines for choosing between adaptive and deterministic linearized least-squares gains. However, adaptive gains must be computed after each cycle of the recursion and so, if pressed for time, we may be compelled to resort to the quasi-adaptive or deterministic versions. On the other hand, if "quick and dirty" gains are being used because of time considerations, the sensible thing to do is to use the deterministic versions. These can be stored in memory and need not be computed in real time. If the "quick and dirty" gains are being used because Assumption F6 of Theorem 7.1 cannot be established, the adaptive version might conceivably speed up the convergence rate somewhat.
We will now display some examples and show, in each case, how to go
about verifying the conditions which will guarantee consistency of the
recursive-estimation procedure used.
and solving (by least squares perhaps) for the value of $\theta_n$ that "comes closest" (in some sense) to making the equations
$$f(\theta_n) = e_n$$
work.
and
$$f(\theta) = \begin{bmatrix} f_1(\theta) \\ \vdots \\ f_r(\theta) \end{bmatrix},$$
so that
$$Y_n = F_n(\theta) + Z_n \qquad (n = 1, 2, \ldots),$$
where
$$F_{kr+i}(\theta) = f_i(\theta) \qquad (i = 1, 2, \ldots, r;\; k = 0, 1, \ldots). \tag{8.1}$$
In this case, we could justifiably consider the single-observation recursion. However, for the purposes of this example, we confine our attention to the batch-processing recursion.
We will assume the following:

$\theta$ is known to lie inside a prescribed $p$-dimensional sphere $\mathscr{P}$. (8.2)

The components of each of the vector-valued functions
$$\mathbf{f}_i(\cdot) = \operatorname{grad} f_i(\cdot) \qquad (i = 1, 2, \ldots, r)$$
are continuously differentiable over $\mathscr{P}$. (8.3)

For each $x \in \mathscr{P}$, the set of vectors $\mathbf{f}_1(x), \mathbf{f}_2(x), \ldots, \mathbf{f}_r(x)$ has rank $p$ and all have positive lengths. (8.4)

We also assume that either

$\{Z_n\}$ is an independent (scalar) process with mean zero (8.5)

or

$\cdots$ for some $\delta > 0$. (8.6)

We will consider the (truncated, batch-processed) recursion
$$s_{k+1} = \big[s_k + A_k\big(\mathbf{Y}_k - f(s_k)\big)\big]_{\mathscr{P}}, \qquad s_1 = \theta_0 \in \mathscr{P}, \tag{8.7}$$
with, for instance,
$$A_k = \frac{1}{k}\Big(\sum_{j=1}^{r} \|\mathbf{f}_j(\theta_0)\|^2\Big)^{-1}\big(\mathbf{f}_1(\theta_0), \ldots, \mathbf{f}_r(\theta_0)\big) \qquad \text{(deterministic, "quick and dirty")}, \tag{8.8a}$$
$$A_k = \frac{1}{k}\Big(\sum_{j=1}^{r} \mathbf{f}_j(\theta_0)\,\mathbf{f}_j'(\theta_0)\Big)^{-1}\big(\mathbf{f}_1(\theta_0), \ldots, \mathbf{f}_r(\theta_0)\big) \qquad \text{(deterministic, "linearized least-squares")}, \tag{8.8c}$$
and the corresponding adaptive versions, with $\theta_0$ replaced by $s_k$, for example
$$A_k = \frac{1}{k}\Big(\sum_{j=1}^{r} \mathbf{f}_j(s_k)\,\mathbf{f}_j'(s_k)\Big)^{-1}\big(\mathbf{f}_1(s_k), \ldots, \mathbf{f}_r(s_k)\big),$$
and
$$a_n^* = \Big(\sum_{j=1}^{\nu_{k+1}-1} \|h_j^*\|^2\Big)^{-1} h_n^* \qquad (\nu_k \le n < \nu_{k+1}).$$
Since
$$\varphi_k\, A_k^*\big(\mathbf{Y}_k^* - \mathbf{F}_k^*(s_k)\big) = A_k\big(\mathbf{Y}_k - \mathbf{F}_k(s_k)\big)$$
for every $k$, the recursions of Equations 8.12 and 8.7 (hence $\tilde{s}_k$ and $s_k$) are identical, which immediately establishes the mean-square convergence of $s_k$ under Equation 8.6.
and
$$\cdots \tag{8.14}$$
where $G_n^*(x_1, \ldots, x_p)$ is the matrix whose columns are given by Equation 8.9, with $F_n$ replaced by $k^{\delta} F_n$ $(\nu_k \le n < \nu_{k+1})$. Thus, $F_k^*$ satisfies Assumption F1 of Theorem 7.1; therefore, by Equation 8.13,
$$K_1 n^{\delta} \le \|h_n^*\| \le K_2 n^{\delta},$$
which establishes Assumption F2'. Since
$$\sum_{j \in J_k} \frac{h_j^* h_j^{*\prime}}{\|h_j^*\|^2} = \sum_{j \in J_k} \frac{h_j h_j'}{\|h_j\|^2},$$
the remaining assumptions carry over as well. For the gain
$$a_n = \Big[\sum_{j=1}^{\nu_{k+1}-1} \|\nabla F_j(s_k)\|^2\Big]^{-1} \nabla F_n(s_k) \qquad (\nu_k \le n < \nu_{k+1}),$$
$s_k$ depends on the observations up through time $\nu_k - 1$. Nonetheless, the proof of Theorem 7.1 goes over word for word, and the same arguments used to establish the convergence of the recursion under the gain 8.8a can be applied verbatim to 8.8b.
The" linearized least-squares" gains of Equations 8.8c, d are treated
similarly except that an additional assumption concerning the condi
tioning number of LI= 11;(60)1;'(60) is called for in order to meet
=
Assumption F6 of Theorem 7.1.
In the very special case where $Z_n \equiv 0$ for every $n$ and $r = p$, the regression problem reduces to that of finding the root of the equations
$$f_1(\theta) = Y_1, \quad f_2(\theta) = Y_2, \quad \ldots, \quad f_p(\theta) = Y_p. \tag{8.15}$$
In the absence of noise, the vector "observations" are all the same:
(Figure: the observer measures the bearing angle $\arctan(\cdot)$ at time $k\tau$.)
where the gradients $\mathbf{F}_i$ can be evaluated either at some nominal value $\theta_0$ (deterministic version) or at the then-most-recent estimate $s_j$ (adaptive case).
In general, $\cdots$; therefore, $\cdots$ and $\cdots$ if $n$ is even, $\cdots$ if $n$ is odd. If we let $\cdots$ and $\cdots$, then, uniformly in $a_n$'s argument,
it is now an easy matter to verify Assumptions E1 through E6' of the conjectured theorem in Chapter 6. Assumptions E1, E2, and E6' hold because
$$0 < C_3/n \le \|a_n\|\,\|\nabla F_n\| \le C_2/n$$
uniformly in $a_n$'s and $F_n$'s argument. Assumptions E3 and E4 hold with $\tau^2 = \tfrac{1}{2}$ and $\rho \le C_2/C_3$ if we choose $\nu_k = 2k - 1$ $(k = 1, 2, \ldots)$. For
the input amplitude increases, the amplifier saturates. (See Figures 8.2a and 8.2b.) A model that is frequently used to describe the input-output relationship of a saturating amplifier states that
$$Y_{\text{out}}(t) = \frac{2S}{\pi} \arctan\Big(\frac{\pi A}{2S}\, Y_{\text{in}}(t)\Big).$$
The sampling instants are taken to be
$$t_k = \begin{cases} \cdots & \text{if } k \text{ is odd}, \\ \cdots & \text{if } k \text{ is even}. \end{cases}$$
If we set
$$\cdots \quad (k \text{ odd}), \qquad \cdots \quad (k \text{ even}),$$
we assume that $\cdots$ and $\cdots$, and estimate $\theta$ via the batch-processing recursion
$$\cdots \tag{8.18a}$$
or
$$\cdots \tag{8.18b}$$
with
$$\cdots \tag{8.19}$$
and the matrix of $F_k$'s mixed partials, the first column evaluated at $\cdots$. Given the existing assumptions, the norm of $\nabla F_k(\cdot)$ is uniformly (in $k$ and $\theta$) bounded above and away from zero, and the columns of $G_k$ are uniformly bounded. This establishes Assumptions F1 through F4. To establish Assumption F5, we choose
$$\nu_k = 2k - 1 \qquad (k = 1, 2, \ldots).$$
The norms $\|\nabla F_k(x)\|$ are uniformly bounded (in $x$ and $k$); therefore, it suffices to show that for some $x \in \mathscr{P}$ and some $\delta > 0$,
$$\cdots \tag{8.20}$$
for all $k$. Equation 8.20 will follow if we show that $\nabla F_{2k}$ and $\nabla F_{2k-1}$ are linearly independent for every $k$. Since $\nabla F_{2k-1} = \nabla F_{2k+3}$ and $\nabla F_{2k} = \nabla F_{2k+4}$, only the conditions for
$$k = 1, 2$$
must be satisfied.
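A simulation sketch of this example (ours, not the text's; Python, with made-up input values and a scalar unknown amplitude $\theta$): the output passes through the arctan saturation model above, and $\theta$ is recovered by an adaptive "quick and dirty" style recursion.

```python
import numpy as np

rng = np.random.default_rng(5)
S, theta = 1.0, 0.8                 # saturation level and true amplitude (hypothetical)
N = 4000
u = np.sin(0.7 * np.arange(N))      # known input samples (assumed design)

def F(th, uk):                      # saturating-amplifier model from the text
    return (2 * S / np.pi) * np.arctan(np.pi * th * uk / (2 * S))

def dF(th, uk):                     # derivative of F with respect to theta
    z = np.pi * th * uk / (2 * S)
    return uk / (1.0 + z * z)

Y = F(theta, u) + rng.normal(scale=0.05, size=N)

t, denom = 0.3, 1e-6                # starting guess; running sum of squared gradients
for k in range(N):
    g = dF(t, u[k])                 # adaptive linearization at the current estimate
    denom += g * g
    t += (g / denom) * (Y[k] - F(t, u[k]))   # correct by the prediction error

print(t)   # close to theta = 0.8 when the starting guess is adequate
```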
In this treatment of the example, we chose batches of two observations each. If we choose batches of size four, each "observation" is of the form
$$\cdots \qquad (k = 1, 2, \ldots),$$
where now $\cdots$

The system to be identified is governed by
$$\sum_{j=0}^{p} \theta_j\, x^{(j)}(t) = g(t) \qquad (t = \ldots, -1, 0, +1, \ldots). \tag{8.22D}$$
In either case, if
$$g(t) = \cos \omega t, \tag{8.23}$$
the steady-state response has the form $x(t) = A \sin \omega t + B \cos \omega t$, and substitution yields the terms
$$\big(B \cos \omega t\big)\,\cdots + \big(A \sin \omega t\big) \sum_{j=0}^{[(p-1)/2]} \theta_{2j+1}(-1)^{j+1} \omega^{2j+1}.$$
If this is to equal Equation 8.23 for all $t > 0$, the coefficient of $\cos \omega t$ must be unity and that of $\sin \omega t$ must be zero. As a result, we have
$$A = \cdots \tag{8.25}$$
We note that Equation 8.25 holds reciprocally, that is, with $A$ interchanged with $\alpha$ and $B$ with $\beta$, thereby making explicit the nonlinear dependence of $A$ and $B$ on the $\theta$'s.

For the sake of convenience we are going to restrict attention to the case where the number of unknown parameters is even, that is, where
$$p = 2q + 1, \tag{8.27}$$
where the $2(q+1)$ coefficients $A_k$ and $B_k$ are related to the $\theta$'s via Equations 8.25 and 8.26C after setting $\omega = \lambda_k$ and affixing the subscript $k = 0, 1, \ldots, q$ to each of $A$, $B$, $\alpha$, $\beta$, and $\lambda$. In view of Equation 8.27,
$[p/2]$ and $[(p-1)/2]$ are both equal to $q$; therefore (with "e" for even and "o" for odd), we have
$$\cdots \tag{8.30}$$
where
$$\Lambda_e = \begin{bmatrix} 1 & -\lambda_0^2 & \cdots & (-1)^q \lambda_0^{2q} \\ 1 & -\lambda_1^2 & \cdots & (-1)^q \lambda_1^{2q} \\ \vdots & & & \vdots \\ 1 & -\lambda_q^2 & \cdots & (-1)^q \lambda_q^{2q} \end{bmatrix} \tag{8.31a}$$
and
$$\Lambda_o = \begin{bmatrix} \lambda_0 & -\lambda_0^3 & \lambda_0^5 & \cdots & (-1)^q \lambda_0^{2q+1} \\ \lambda_1 & -\lambda_1^3 & \lambda_1^5 & \cdots & (-1)^q \lambda_1^{2q+1} \\ \vdots & & & & \vdots \\ \lambda_q & -\lambda_q^3 & \lambda_q^5 & \cdots & (-1)^q \lambda_q^{2q+1} \end{bmatrix}. \tag{8.31b}$$
$$\cdots \tag{8.32}$$
$$g(t) = \frac{1}{\sqrt{2}} + \sum_{k=1}^{q} \cos \omega_k t + \frac{(-1)^t}{\sqrt{2}} \qquad (t = \ldots, -1, 0, +1, \ldots),$$
where
$$F(t; \theta) = \frac{A_0}{\sqrt{2}} + \sum_{k=1}^{q} \big(A_k \cos \omega_k t + B_k \sin \omega_k t\big) + \frac{A_{q+1}}{\sqrt{2}}\,(-1)^t \qquad (t = 1, 2, \ldots), \tag{8.33}$$
where $A_0, B_0, A_1, \ldots, A_q, B_q, A_{q+1}$ are related to $\theta_0, \theta_1, \ldots, \theta_{2q+1}$ via Equations 8.25 and 8.26D, after we affix $k = 0, 1, \ldots, q+1$ to each of them. We also set
$$\gamma = \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \\ \alpha_q \\ \beta_q \\ \alpha_{q+1} \end{bmatrix} = \Omega \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_{2q+1} \end{bmatrix} = \Omega\,\theta, \tag{8.34}$$
where
$$\Omega = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \cos\omega_1 & \cos 2\omega_1 & \cdots & \cos(2q+1)\omega_1 \\ 0 & -\sin\omega_1 & -\sin 2\omega_1 & \cdots & -\sin(2q+1)\omega_1 \\ \vdots & & & & \vdots \end{bmatrix}. \tag{8.35}$$
$$\cdots \qquad (t = 1, 2, \ldots), \tag{8.36}$$
$$\zeta = \begin{bmatrix} A_0 \\ B_0 \\ A_1 \\ B_1 \\ \vdots \\ A_q \\ B_q \end{bmatrix}, \quad\text{which are related to } \theta \text{ via the linear equations 8.30 with } k = 0, 1, \ldots, q, \tag{8.37C}$$
or
$$\zeta = \begin{bmatrix} A_0 \\ A_1 \\ B_1 \\ \vdots \\ A_q \\ B_q \\ A_{q+1} \end{bmatrix}, \quad\text{which are related to } \theta \text{ via the linear equations 8.34 with } k = 0, 1, \ldots, q+1. \tag{8.37D}$$
and $\cdots$ or $\cdots$ as $h \to \infty$ for some $\varepsilon$, $0 < \varepsilon < 1$ (and thus the noise process need not possess a spectral density).
We are now in a position to write down the procedure for estimating $\theta$. We affix an $n$ to a parameter symbol to denote an estimate of the parameter based on the first $n$ observations. Thus, $\zeta_n$ is a vector-valued function of $Y_1, Y_2, \ldots, Y_n$ which estimates $\zeta$, and $A_{k,n}$ estimates $A_k$. (See the discussion leading to Equation 7.15; here we have decreased the iteration index $n$ by one.)
Step 2. Assume that a finite number $\Delta$ is known for which
$$\max_{j = 0, 1, \ldots, 2q+1} |\theta_j| < \Delta.$$
Compute the required trigonometric sums from the identities
$$\sum_{t=1}^{n} \cos 2\lambda t = \frac{\sin n\lambda}{\sin \lambda}\,\cos(n+1)\lambda, \qquad \sum_{t=1}^{n} \sin 2\lambda t = \frac{\sin n\lambda}{\sin \lambda}\,\sin(n+1)\lambda \qquad (\lambda \text{ not a multiple of } \pi), \tag{8.41}$$
$$\frac{1}{n}\sum_{t=1}^{n} \cos^2 \lambda t = \frac{1}{2} + O\Big(\frac{1}{n}\Big), \qquad \frac{1}{n}\sum_{t=1}^{n} \sin^2 \lambda t = \frac{1}{2} + O\Big(\frac{1}{n}\Big) \qquad (\lambda \text{ not a multiple of } \pi), \tag{8.42}$$
$$\cdots \tag{8.43}$$
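These closed forms are easy to sanity-check numerically; a quick sketch (ours; Python):

```python
import numpy as np

lam, n = 0.9, 57                      # any lambda that is not a multiple of pi
t = np.arange(1, n + 1)

lhs_c = np.cos(2 * lam * t).sum()
rhs_c = np.sin(n * lam) * np.cos((n + 1) * lam) / np.sin(lam)   # Equation 8.41
lhs_s = np.sin(2 * lam * t).sum()
rhs_s = np.sin(n * lam) * np.sin((n + 1) * lam) / np.sin(lam)

print(np.isclose(lhs_c, rhs_c), np.isclose(lhs_s, rhs_s))       # True True
print((np.cos(lam * t) ** 2).mean())  # near 1/2, as Equation 8.42 asserts
```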
Vn(" - ; ) H"z"
2 2 "
"'"
. r
vn
= . r
-vnt=1
2: htZt
J
0(110' 0:10
II" flo, flb
"
fl,>
"
_
r;:J
If we set
$$\cdots \tag{8.46}$$
it follows that $\sqrt{n}\,(\alpha_n - \alpha)$ and $\sqrt{n}\,(\beta_n - \beta)$ are asymptotically independent and identically distributed as a $(q+1)$-dimensional normal random variable with $0$ mean and covariance matrix $2\sigma^2 P^2$, where
$$P = \operatorname{diag}\Big[\frac{\Lambda_0}{\rho_0}, \frac{\Lambda_1}{\rho_1}, \ldots, \frac{\Lambda_q}{\rho_q}\Big]\cdots,$$
and these two $(q+1)$-vectors become independent in large samples. The formula for the covariances of the odd components results from the fact that in Equation 8.31 we have
is given by Equation 8.46 with the a's and f3's now computed from
Equation 8.26C rat her than Equation 8.26D. Consequently, for the
estimate in Step 3, we have
(8.47 D)
$$\rho_k = \sum_{j=0}^{q} \theta_{2j}(-1)^j \lambda_k^{2j} + \sum_{j=0}^{q} \theta_{2j+1}(-1)^j \lambda_k^{2j+1} \qquad (k = 0, 1, \ldots, q), \tag{8.48C}$$
and
$$\cdots \qquad (k = 1, 2, \ldots, q). \tag{8.49}$$
$$\cdots \qquad (j = 0, 1, \ldots, 2q + 1), \tag{8.50}$$
and the estimate of $\theta_j$ is obtained by merely substituting for $\alpha_k$ and $\beta_k$ the quantities $\alpha_{k,n}$ and $\beta_{k,n}$ which result from Step 2. The limiting covariance matrix in 8.47D reduces to a Toeplitz matrix with entries $\cdots$ (this gives the row vectors of the inverse of $\Lambda_e'$). An analysis of the method would show that certain choices for the $\lambda$'s make the inversion numerically difficult.
On the other hand, we would like to pick these input frequencies to make our estimate statistically accurate, which we measure by the determinant of the limiting covariance (called the generalized variance). In this regard, it is unimportant how we label the parameters; therefore, the determinant of the limiting covariance matrix of $\sqrt{n}\,(\theta_n - \theta)$ is simply the product of the determinants of the two matrices in 8.47C. The square root of this generalized variance is proportional to
$$\cdots \tag{8.51}$$
where $r(t)$ is the distance from the Earth's center of mass to the satellite at time $t$, $\Psi(t)$ is the angle between a radius vector from the Earth's center of mass to the satellite and the reference direction of the coordinate system, $a$ is the length of the ellipse's major semiaxis, $e$ is the eccentricity of the ellipse, and $\alpha$ is the angle between the ellipse's major axis and the reference direction. (See Figure 8.3.)

Noisy observations $y_1(t)$, $y_2(t)$, and $y_3(t)$ are made on $r(t)$, $\Psi(t)$, and $\dot{r}(t) = dr/dt$, respectively. Thus we have
$$y_3(t) = \dot{r}(t) + Z_3(t). \tag{8.53}$$
We wish to reconstruct $r(t)$ and $\Psi(t)$ from the noisy data, so that the position of the satellite can be predicted at any instant of time. We begin our analysis by deriving parametric representations of $r$ and $\Psi$. The functional forms of $r(t)$ and $\Psi(t)$, which depend upon the parameters $a$, $e$, $\alpha$, and $\Psi(0)$, can be deduced from Newton's laws.
In polar coordinates, the" F = ma " equations become
$$\cdots \tag{8.58}$$
We substitute 8.56 through 8.58 into Equation 8.54. Thus we obtain
$$\cdots \tag{8.59}$$
Finally, we substitute 8.52 into Equation 8.59 and solve for $M^2$. We find that
$$M^2 = a\mu(1 - e^2); \tag{8.60}$$
therefore, Equation 8.56 becomes
$$\cdots \tag{8.61}$$
Substituting 8.52 into Equation 8.61, we can integrate the differential equation:
$$\int_{\Psi(0)-\alpha}^{\Psi(t)-\alpha} \frac{d\Psi}{(1 + e\cos\Psi)^2} = \Big[\frac{\mu}{a^3(1 - e^2)^3}\Big]^{1/2}\, t. \tag{8.62}$$
Equation 8.62 expresses $\Psi(t)$ as an implicit function of four parameters ($\Psi(0)$, $\alpha$, $e$, and $a$). If $\Psi(t)$ could be solved for explicitly, the resulting expression could be substituted into Equation 8.52, thereby causing $r(t)$ [hence $\dot{r}(t)$] to be represented as a function of these parameters. Unfortunately, the integral 8.62 cannot be represented in terms of elementary functions. We must consequently resort to a clever change of variable.
Before proceeding, let us point out that we have greatly simplified the
problem by assuming that the plane of the orbit is known exactly,
thereby reducing the number of unknown parameters by two. We will
now add one more simplifying assumption, namely, that a (the length of
the major semiaxis) is known. Under this assumption, we can choose
the unit of length so that
a =1.
Since $\mu$ has the dimensionality of cubed length over squared time, we can also choose the unit of time so that
$$\mu = 1.$$
Fundamental Equations 8.52 and 8.61 become
$$E(t) = \Big[\frac{\Psi(t) - \alpha}{2\pi}\Big]\,2\pi + \begin{cases} \arccos\Big(\dfrac{e + \cos(\Psi(t) - \alpha)}{1 + e\cos(\Psi(t) - \alpha)}\Big) & \text{if } \sin(\Psi - \alpha) \ge 0, \\[2ex] 2\pi - \arccos\Big(\dfrac{e + \cos(\Psi(t) - \alpha)}{1 + e\cos(\Psi(t) - \alpha)}\Big) & \text{if } \sin(\Psi - \alpha) < 0. \end{cases} \tag{8.65}$$
(Here, $[(\Psi - \alpha)/2\pi]$ is the greatest integer in $(\Psi - \alpha)/2\pi$.)
As $(\Psi - \alpha)$ varies from $0$ to $\infty$, so does $E$ (and in a monotone fashion). Furthermore, if $k\pi \le \Psi - \alpha \le (k+1)\pi$, the same holds for $E$: $k\pi \le E \le (k+1)\pi$. In fact,
$$\cdots \quad\text{if } \sin E \ge 0,$$
and
$$\cos(\Psi - \alpha) = \frac{e - \cos E}{e\cos E - 1}, \qquad 0 \le E < \infty. \tag{8.68}$$
Here $E(t)$ is called the eccentric anomaly at time $t$. As a consequence of Equations 8.68 and 8.63,
$$r(t) = 1 - e\cos E(t). \tag{8.69}$$
Since $\dot\Psi(t) = (d\Psi/dE)(dE/dt)$, we can write Equation 8.64 as
$$\frac{d\Psi}{dE}\,\frac{dE}{dt} = \frac{(1 - e^2)^{1/2}}{r^2(t)}. \tag{8.70}$$
Differentiating Equation 8.68 and using 8.69, we find that
$$\sin(\Psi - \alpha)\,\frac{d\Psi}{dE} = \frac{(1 - e^2)}{r^2}\,\sin E. \tag{8.71}$$
Computing $\sin(\Psi - \alpha)$ from Equation 8.68, we obtain
$$\sin(\Psi - \alpha) = \frac{(1 - e^2)^{1/2}}{r}\,\sin E. \tag{8.72}$$
Combining Equations 8.71 and 8.72, we have
$$\frac{d\Psi}{dE} = \frac{(1 - e^2)^{1/2}}{r}. \tag{8.73}$$
After substituting 8.73 into Equation 8.70, we obtain
$$r\,\frac{dE}{dt} = 1. \tag{8.74}$$
Integrating Equation 8.74, with the constant of integration fixed by $E(0)$, yields
$$E(t) - e\sin E(t) = t + \big(E(0) - e\sin E(0)\big). \tag{8.75}$$
The quantity $E(t) - e\sin E(t)$ is called the mean anomaly at time $t$. We will parametrize the unknowns as follows:
$$\theta_1 = e, \qquad \theta_2 = E(0) - e\sin E(0), \qquad \theta_3 = \alpha. \tag{8.76}$$
$$\Psi(t; \theta) = \theta_3 + \begin{cases} \arccos\Big(\dfrac{\theta_1 - \cos E(t)}{\theta_1 \cos E(t) - 1}\Big) & \text{if } \sin E \ge 0, \\[1ex] \cdots \end{cases} \qquad (n = 1, 2, \ldots),$$
and
$$h_{3n-2} = \operatorname{grad} r(t_n; \theta)\big|_{\theta = s_n}, \qquad h_{3n-1} = \operatorname{grad} \Psi(t_n; \theta)\big|_{\theta = s_n}, \qquad h_{3n} = \operatorname{grad} \dot r(t_n; \theta)\big|_{\theta = s_n} \qquad (n = 1, 2, \ldots), \tag{8.81}$$
with
$$\dot r(t; \theta) = \theta_1 (1 - \theta_1^2)^{-1/2}\,\sin\big(\Psi(t; \theta) - \theta_3\big), \tag{8.82}$$
which, together with Equations 8.78 and 8.79, define the components of $F_n$. Furthermore, we have $\cdots$ (Equations 8.83 through 8.85 will be derived at the end of this example.)
A typical computation cycle might go like this, where $\theta_{1n}$, $\theta_{2n}$, and $\theta_{3n}$ denote the components of the (column) vector $s_n$.
1. Substitute $s_n$ for $\theta$ in Equation 8.77 and solve for $E(t_n; s_n)$ (a sketch of this step appears after the list).
2. Compute $r(t_n; s_n)$ from Equation 8.78.
3. Compute $\Psi(t_n; s_n)$ from Equation 8.79.
4. Compute $\cos[\Psi(t_n; s_n) - \theta_{3n}]$ and $\sin[\Psi(t_n; s_n) - \theta_{3n}]$ from Equation 8.68.
5. Compute $\dot r(t_n; s_n)$ from Equation 8.82.
6. Compute $h_{3n-2}$, $h_{3n-1}$, $h_{3n}$, using Equations 8.81 and 8.83 through 8.85.
7. Update $\sum \|h_j\|^2$ and form $A_n$.
8. Form the column vector $F_n(s_n)$ of quantities in (2), (3), and (5).
9. Observe $Y_n$ and compute $\big[s_n + A_n(Y_n - F_n(s_n))\big]_{\mathscr{P}} = s_{n+1}$.
10. Begin the next cycle.
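Step 1 amounts to solving Kepler's equation $E - \theta_1 \sin E = t + \theta_2$ (the mean-anomaly form of Equations 8.75 and 8.76) for $E$. A standard Newton iteration does this reliably for $0 \le \theta_1 < 1$; the sketch below is ours, not the text's (Python):

```python
import math

def eccentric_anomaly(t, th1, th2, tol=1e-12):
    """Solve E - th1*sin(E) = t + th2 for E by Newton's method.

    th1 plays the role of the eccentricity e (0 <= th1 < 1), and
    t + th2 is the mean anomaly at time t (Equations 8.75-8.76).
    """
    M = t + th2
    E = M if th1 < 0.8 else math.pi          # customary starting values
    for _ in range(100):
        f = E - th1 * math.sin(E) - M
        E -= f / (1.0 - th1 * math.cos(E))   # derivative is 1 - th1*cos(E) > 0
        if abs(f) < tol:
            break
    return E

E = eccentric_anomaly(t=2.0, th1=0.3, th2=0.5)
print(E, E - 0.3 * math.sin(E))              # second value equals the mean anomaly 2.5
```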
It follows from Equations 8.89 and 8.90 that
$$\lambda_{\min}(H_n H_n') \ge \Big\{\frac{|\det H_n|}{\sum_{i=1}^{3} \|h_{3n+i}\|^2}\Big\}^2\cdots$$
In the light of this and Equation 8.87, we find that Equation 8.88 holds if $|\det H_n|$ is bounded away from zero. If $\det H_n$ is expanded by cofactors of its last row, we see that
$$|\det H_n| = \frac{\theta_{10}}{r^2}\Big\{\cos^2(\Psi - \theta_{30}) + \sin^2(\Psi - \theta_{30})\,\cdots\Big\} = \frac{\theta_{10}}{r^2}\,\cdots$$
Since $r(t_n)$ is uniformly bounded, it follows that $\liminf_n |\det H_n| > 0$,
and thus
$$\cos E - e = (1 - e\cos E)\cos(\Psi - \alpha),$$
or
$$\cos E - e = r\cos(\Psi - \alpha), \tag{8.91}$$
after using Equation 8.69 (which we restate for convenience):
$$r = 1 - e\cos E. \tag{8.92}$$
Differentiating Equation 8.92 with respect to time, we obtain
$$\dot r = \dot E\, e \sin E.$$
Thus, we have
$$\cdots \tag{8.96}$$
and finally,
$$\frac{\partial E}{\partial \theta_3} = 0. \tag{8.97}$$
By Equation 8.91,
or
$$\frac{\partial\,\cdots}{\partial \theta_1} = -\cos(\Psi - \theta_3)\,\cdots.$$
Similarly, we have
$$\cdots \tag{8.99}$$
By Equation 8.93, $\cdots = 0$.
By 8.93, we have
$$\frac{\partial \Psi}{\partial \theta_1} = \Big(\cdots + \frac{1}{1 - \theta_1^2}\Big)\sin(\Psi - \theta_3)\,\cdots.$$
Using Equations 8.99, 8.96, and 8.93, we obtain
$$\frac{\partial \Psi}{\partial \theta_2} = \frac{(1 - \theta_1^2)^{1/2}}{r^2}, \tag{8.101}$$
and finally
$$\frac{\partial \Psi}{\partial \theta_3} = 1. \tag{8.102}$$
Similarly,
$$\frac{\partial \dot r}{\partial \theta_1} = \frac{\sin E}{r^3}\big(r + r\theta_1\cos E - \theta_1^2 + \theta_1\cos E\big)\cdots = \frac{(1 - 2\theta_1^2)\cdots}{r^2}\,\sin(\Psi - \alpha),$$
$$\frac{\partial \dot r}{\partial \theta_2} = \theta_1\Big(\frac{\cos E}{r}\,\frac{\partial E}{\partial \theta_2} - \frac{\sin E}{r^2}\,\frac{\partial r}{\partial \theta_2}\Big)\cdots,$$
and finally, we obtain
$$\frac{\partial \dot r}{\partial \theta_3} = 0.$$
9. Open Problems
$$\theta_{n+1} = \Phi_n(\theta_n) + V_n, \tag{9.1}$$
$$Y_n = F_n(\theta_n) + W_n. \tag{9.2}$$
(When $\Phi_n(\cdot)$ is the identity transformation and $V_n$ is zero for each $n$, the problem reduces to ordinary regression.)
When $F_n(\cdot)$ and $\Phi_n(\cdot)$ are linear functions of their arguments and the vector processes $\{V_n\}$ and $\{W_n\}$ are mutually and temporally independent, Kalman has developed a recursive theory of smoothing and prediction which generates estimates for $\theta_n$ that are optimal in a number of statistical senses. For example, if $\hat\theta_{n|n}$ denotes the estimate of $\theta_n$ based upon the observations $Y_1, Y_2, \ldots, Y_n$, then
$$\cdots \tag{9.4}$$
When this is done, and if all terms of nonlinear order are ignored, we find that $\cdots$, where $\Phi_n(\theta)$ and $\mathbf{F}_n(\theta)$ are, respectively, the matrices of $\Phi_n$'s and $F_n$'s first partial derivatives evaluated at $\theta$.

If the Kalman filtering theory is applied to the linear approximation Equations 9.5 and 9.6, we find that
$$\hat\theta_{n+1|n+1} = \Phi_n(\hat\theta_{n|n}) + A_n\big[Y_{n+1} - F_{n+1}(\hat\theta_{n|n})\big], \tag{9.7}$$
where now $A_n$ is defined recursively in terms of the second-order noise statistics for $\{W_n\}$ and $\{V_n\}$ and in terms of the matrices $\mathbf{F}_n(\theta)$ and $\Phi_n(\theta)$.
Although this technique meets with wide acceptance in applications,
little if any work (to the best of our knowledge) has been directed toward
the analysis of the "steady-state" operating characteristics of such
schemes. Of particular interest are such questions as: What is the large-
sample (large $n$) mean-square estimation error of $\hat\theta_{n|n}$? What is the
quantitative nature of the tradeoff between computational convenience
and accuracy that one experiences with various choices of the gains An?
The estimation recursion 9.7 looks so much like the recursions for
regression-parameter estimation that there is every reason to hope that
the analytic approaches developed in this monograph can be carried
over and extended to the more general case. Indeed, when the state and
observation Equations 9.1 and 9.2 are scalar relations, our previous
methods can be applied and furnish a bound on the limiting mean
square prediction error.
From the first $n$ observations, we recursively predict $\theta_{n+1}$ by $t_{n+1}$ ($= \hat\theta_{n+1|n}$ in the previous notation):
$$t_{n+1} = \Phi_n(t_n) + a_n\big[Y_n - F_n(t_n)\big] \qquad (n = 1, 2, \ldots). \tag{9.8}$$
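A simulation sketch of this scalar scheme (ours, not the text's; Python, with made-up linear $\Phi_n$, $F_n$, and a fixed illustrative gain) shows the recursion and its steady-state prediction error:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20000
phi, b = 0.9, 1.0        # assumed scalar state and observation coefficients
sv, sw = 0.2, 0.5        # noise standard deviations for V_n and W_n

theta = 0.0              # state theta_n
t = 0.0                  # predictor t_n of theta_n
a = 0.1                  # a fixed gain, for illustration only
err2 = []

for n in range(N):
    y = b * theta + rng.normal(scale=sw)          # observation (Equation 9.2)
    t = phi * t + a * (y - b * t)                 # prediction recursion (Equation 9.8)
    theta = phi * theta + rng.normal(scale=sv)    # state transition (Equation 9.1)
    err2.append((t - theta) ** 2)

print(np.mean(err2[N // 2:]))   # sample steady-state mean-square prediction error
```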
H3. For all $x$, $\beta_n' \le |\dot\Phi_n(x)| \le \beta_n$ and $b_n \le |\dot F_n(x)| \le b_n'$, where $\beta_n \le c_1\beta_n'$ and $b_n' \le c_2 b_n$ for some $1 \le c_1, c_2 < \infty$.

H4. $\beta \equiv \limsup_n \beta_n < \dfrac{1}{c}$, where $c = c_1 c_2$.

H5. $b_n \to \infty$ as $n \to \infty$.
THEOREM 9.1
Let $\{\theta_n\}$ and $\{Y_n\}$ be scalar-valued processes defined by Equations 9.1 and 9.2 which satisfy Assumptions H1 through H5. Let $\{t_n\}$ be generated by Equation 9.8 with
$$a_n = \frac{\beta_n}{c\, b_n}\,\operatorname{sgn}(\dot\Phi_n \dot F_n).$$
Then
$$\cdots \tag{9.11}$$
Since
$$\beta_j - |a_j| b_j = \beta_j\Big(1 - \cdots\Big) \tag{9.12}$$
is bounded away from unity for all large enough $j$, the leading product in Equation 9.11 goes to zero as $n$ tends to infinity. According to Lemma 1,
$$\sum_{k=1}^{n} B_{nk} = 1 - \prod_{j=1}^{n} B_j \to 1.$$
limit of indefinitely large $c_1$ or $c_2$. At the other end of the spectrum, when $\beta_n' = \beta_n = \beta$ and $b_n = b_n'$ (the linear case), Assumption H4 allows
$$\cdots \tag{9.14}$$
By filling in the details of the proof, it is not difficult to see that under Assumptions H1 through H3 the first line in Equation 9.13 holds true, namely,
$$\limsup_n e_n^2 \le \limsup_n Q_n(|a_n|), \tag{9.15}$$
provided that
$$\cdots \tag{9.16a}$$
and
$$\cdots \tag{9.16b}$$
as $n \to \infty$. For given sequences $\{\beta_n\}$ and $\{b_n\}$, the function 9.14 has a unique minimum at the point $\cdots$ as $n \to \infty$, assuming always that $\beta_n$ has a finite limit superior. Consequently, $\cdots$ and
$$\cdots \tag{9.19}$$
H5'. $b_n \to 0$,
assuming that $\beta_n$ does not tend to unity. In the same way as in the previous paragraph, we find for Equation 9.17 that
$$\frac{\sigma_v^2}{\sigma_w^2}\,\frac{\beta_n^2}{1 - \beta_n^2}\,b_n\,\cdots \to 0$$
as $n$ tends to infinity. Since $\sigma_v^2/\sigma_w^2$ is not known, this suggests using gains with
$$\cdots \qquad (a > 0), \tag{9.20}$$
together with
H4'. $\beta < 1$.
Then Equation 9.16a holds, at least for all large enough $n$ (which is enough), as does Equation 9.16b. Thus, for the gains $a_n = \cdots$,
$$\cdots \tag{9.21}$$
and there is equality when $\beta_n' = \beta_n = \beta$ (and $\limsup$ should be replaced by $\lim$). But this is precisely the mean-square error resulting from not using the observations at all (that is, by setting $a = 0$). When $\beta_n' = \beta_n = \beta$ $(0 < \beta < 1)$, the same approach will lead to the time-invariant gain
$$a = \frac{\beta\, Q_0}{Q_0 + \sigma_w^2}\,\cdots \tag{9.22}$$
and
$$\cdots \tag{9.23}$$
These two equations combine to give a quadratic (Kalman's "variance equation") whose positive square-root solution is the minimum mean-square linear prediction error $Q_0$. The optimum gain now depends on $\sigma_v^2$ and $\sigma_w^2$ as well as $\beta$. This result is a special case of Kalman's linear theory (see his Example 1), and we include it only as a point of comparison.
Appendix. Lemmas 1 Through 8
Lemma 1
If $1 \le k \le n$ and $n \ge 1$, then
$$\sum_{j=k}^{n} \Big[\prod_{i=j+1}^{n} (1 - A_i)\Big] A_j = 1 - \prod_{j=k}^{n} (1 - A_j),$$
where products are to be read backwards and void ones defined as the identity.
Proof. We have
$$\Big[\prod_{i=j+1}^{n} (1 - A_i)\Big] A_j = \prod_{i=j+1}^{n} (1 - A_i) - \prod_{i=j}^{n} (1 - A_i).$$
The sum over $j$ from $k$ to $n$ of the right-hand side collapses to yield the asserted result. Q.E.D.
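A short numerical check of the identity (ours; Python):

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.uniform(0.0, 1.0, size=12)      # arbitrary A_1, ..., A_n in (0, 1)
k, n = 3, 12                            # 1-indexed as in the lemma

lhs = sum(np.prod(1 - A[j:n]) * A[j - 1] for j in range(k, n + 1))
# A[j:n] covers indices j+1, ..., n in the lemma's 1-indexed notation
rhs = 1 - np.prod(1 - A[k - 1:n])
print(np.isclose(lhs, rhs))             # True
```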
Lemma 2
Let $P_n = \prod_{j=1}^{n} (1 - a_j)$, where $a_j \in (0, 1)$ for all $j \ge N$ and $P_n \to 0$ as $n \to \infty$. Then, if $\sum_k x_k < \infty$,
$$\max_{1 \le k \le n} \sum_{j=k}^{n} \frac{P_n}{P_j}\,x_j \to 0.$$
$$\max_{N \le k \le n} \Big|\frac{P_n}{P_{k-1}}(s_{k-1} - s)\Big| = \max\Big\{\max_{N \le k \le n_1} \prod_{i=k}^{n} (1 - a_i)\,|s_k - s|,\; \max_{n_1 \le k \le n} \prod_{i=k}^{n} (1 - a_i)\,|s_k - s|\Big\} \le \max\Big\{\max_{N \le k \le n_1} |s_k - s|\,\varepsilon,\; (1 - a_n)\varepsilon\Big\} \le \text{const}\cdot\varepsilon.$$
Setting
$$a_{nj} = a_j \prod_{i=j+1}^{n} (1 - a_i),$$
we have
$$\max_{N \le k \le n} \Big|\sum_{j=k}^{n} a_{nj}(s_j - s)\Big| \le \sum_{j=N}^{n} a_{nj}\,|s_j - s| \tag{L2.2}$$
as $n \to \infty$. From $|s_j - s| \to 0$ as $j \to \infty$ and the Toeplitz Lemma (Knopp, 1947, p. 75) we infer that the bound in Equation L2.2 must go to zero as $n \to \infty$. Q.E.D.
Lemma 3
Let $\{a_n\}$ and $\{\alpha_n\}$ be positive number sequences such that $a_n \le 2$ for all $n \ge N$ and $\sum_n \alpha_n < \infty$. If
$$e_{n+1}^2 \le (1 - a_n)^2 e_n^2 + \alpha_n(1 + e_n),$$
then
$$\sup_n e_n^2 < \infty.$$
Proof. If $e_n^2 \le 1$, then clearly $e_{n+1}^2 \le M < \infty$ for any $n$. If $e_n^2 > 1$, then $e_n < e_n^2$ and
$$e_{n+1}^2 \le (1 + \alpha_n)e_n^2 + \alpha_n$$
if $n \ge N$, because $(1 - a_n)^2 + \alpha_n \le 1 + \alpha_n$ when $a_n \le 2$. In every case, therefore,
$$e_{n+1}^2 \le \max\big\{M,\; (1 + \alpha_n)e_n^2 + \alpha_n\big\} \qquad (n \ge N).$$
If we iterate this back to $N$, we find that
$$e_{n+1}^2 \le \max\Big\{M\,\frac{P_n}{P_k}\,\cdots,\; \frac{P_n}{P_{N-1}}\,e_N^2 + \sum_{j=N}^{n} \frac{P_n}{P_j}\,\alpha_j\Big\}, \tag{L3.1}$$
where
$$P_n = \prod_{j=1}^{n} (1 + \alpha_j).$$
Since $1 + x \le e^x$ is valid for all real numbers $x$, we have
$$\frac{P_n}{P_k} \le \exp \sum_{j=k+1}^{n} \alpha_j \le \exp \sum_{j=1}^{\infty} \alpha_j < \infty,$$
Lemma 4
Let $\{b_n\}$ be any real-number sequence such that $\sum_n \rho_n^2 < \infty$, where $\rho_n = b_n^2/B_n^2$ and $B_n^2 = b_1^2 + \cdots + b_n^2$ (that is, Assumption A5'''). Suppose $z > 0$ and $K \ge 0$. Define $\cdots$ for $k = N, N+1, \ldots, n$, for some $0 < C_{k-1} \le \cdots \le D_{k-1} < \infty$ which do not depend on $n$ and have the property
$$\lim_{k \to \infty} C_k = \lim_{k \to \infty} D_k = 1.$$
Proof. The inequality
$$\exp\Big\{-\frac{x}{1 - x}\Big\} \le 1 - x \le \exp\{-x\} \tag{L4.1}$$
is valid for all $x < 1$. (See Knopp, 1947, p. 198, for this as well as Equation L4.4.) We set $x = z_j$ and form the product on $j$ from $k$ to $n$.
Since
$$\sum_{j=k}^{n} \frac{z_j}{1 - z_j} = \sum_{j=k}^{n} z_j + \sum_{j=k}^{n} \frac{z_j^2}{1 - z_j},$$
this gives
$$C_{k-1}\exp\Big\{-\sum_{j=k}^{n} z_j\Big\} \le \prod_{j=k}^{n} (1 - z_j) \le \exp\Big\{-\sum_{j=k}^{n} z_j\Big\}, \tag{L4.2}$$
with
$$C_{k-1} = \exp\Big\{-\sum_{j=k}^{\infty} \frac{z_j^2}{1 - z_j}\Big\}$$
tending to $1$ as $k \to \infty$ because $\{z_j^2\}$ is summable. From the right-hand inequality in Equation L4.1, we have
$$\frac{B_{j-1}^2}{B_j^2} = 1 - \rho_j < \exp\{-\rho_j\};$$
therefore,
$$\frac{B_{k-1}^2}{B_n^2} = \prod_{j=k}^{n} \frac{B_{j-1}^2}{B_j^2} \le \exp\Big\{-\sum_{j=k}^{n} \rho_j\Big\}. \tag{L4.3}$$
But $z_j \le z\rho_j$, and therefore this combines with Equation L4.2 to give the asserted lower bound.
To prove the upper bound, we use
$$\exp\{x\} \le \cdots(1 + y)\cdots, \tag{L4.4}$$
which is valid for all positive numbers $x$ and $y$. For the choices $x = z_j$ and $y = \cdots$, we find that
$$\exp\{z_j\} \le \Big(\frac{B_j^2}{B_{j-1}^2}\Big)^{(1-\rho_j)(z - K\rho_j)}\cdots = \Big(\frac{B_j^2}{B_{j-1}^2}\Big)^{z}\Big(\frac{1}{1 - \rho_j}\Big)^{(z+K)\rho_j - K\rho_j^2},$$
because $1/(1 - \rho_j)$ exceeds $1$. Consequently, after inverting and forming the product over $j$, we have
$$\exp\Big\{-\sum_{j=k}^{n} z_j\Big\} \ge \Big(\frac{B_k^2}{B_n^2}\Big)^{z} S^{-1}, \tag{L4.5}$$
where
$$0 < \log S = (z + K)\sum_{j=k}^{n} \rho_j \log\frac{1}{1 - \rho_j} \le (z + K)\sum_{j=k}^{\infty} \rho_j \log\frac{1}{1 - \rho_j}. \tag{L4.6}$$
Equations L4.5 and L4.6 combine with Equation L4.2 to give the asserted upper bound with
$$D_{k-1} = \exp\Big\{(z + K)\sum_{j=k}^{\infty} \rho_j \log\frac{1}{1 - \rho_j}\Big\};$$
the last written sum is majorized by $\sum_{j=k}^{\infty} \rho_j^2/(1 - \rho_j)$. This goes to $0$ as $k \to \infty$ since it is the tail of a convergent series, and therefore $D_k \to 1$. Q.E.D.
Lemma 5
Let $\{b_n\}$ be any real-number sequence such that
$$B_n^2 = b_1^2 + \cdots + b_n^2 \to \infty \quad\text{and}\quad \beta_n = b_n^2/B_n^2 \to 0 \text{ as } n \to \infty$$
(that is, Assumptions A3 and A5''). Define
$$\beta_{nk}(z) = \frac{B_k^{2z}\,\beta_k}{B_n^{2z}} \qquad (k = 1, 2, \ldots, n)$$
for $z > 0$. Then
$$\lim_{n} \sum_{k=1}^{n} \beta_{nk}(z)\,\varphi_k = \frac{\varphi}{z}$$
if $\lim_n \varphi_n = \varphi$, finite or not.

Proof. For every fixed $k$, $\beta_{nk} \to 0$ as $n \to \infty$. The conclusion follows immediately from the Toeplitz Lemma (Knopp, 1947, p. 75) if we can show that the row sums
$$\sum_{k=1}^{n} \beta_{nk}(z) \to \frac{1}{z}.$$
Lemma 6
Let $\cdots$, where $\beta_n = b_n^2/B_n^2$ (that is, Assumptions A3 and A5'''). Then, for any $z > \tfrac{1}{2}$, $\cdots$,
where $\beta_{nk}(\cdot)$ was defined in the hypothesis of Lemma 5 for all positive arguments. After we take limits on both sides of this inequality, we find that the desired conclusion follows from that of Lemma 5. Q.E.D.
Lemma 7
$$\limsup_{n \to \infty} \frac{\lambda_{\max}\Big(\sum_{i=1}^{n} h_i h_i'\Big)}{\lambda_{\min}\Big(\sum_{i=1}^{n} h_i h_i'\Big)} \le \frac{K^{2q}}{\tau^2}\,\cdots,$$
where $\cdots$ and $\cdots$
Proof of b.
$$\lambda_{\min}\Big(\sum_{i=1}^{\nu_{k+1}-1} h_i h_i'\Big) \ge \sum_{j=1}^{k} \lambda_{\min}\Big(\sum_{i \in J_j} h_i h_i'\Big), \qquad \lambda_{\min}\Big(\sum_{i \in J_j} h_i h_i'\Big) \ge \min_{i \in J_j} \|h_i\|^2\;\frac{\tau_j^2}{p_j},$$
where $\cdots$ and, of course, $p_j = \nu_{j+1} - \nu_j$, the number of elements in $J_j$. Let $\cdots$. Then $\cdots$, and therefore $\cdots$. Thus, we have
$$\cdots \le K^{2(q+1)}\,\|h_{\nu_{k+1}-1}\|^{2q}.$$
Since $\liminf_{j \to \infty} \tau_j^2/\cdots \ge \tau^2/K^{2q}$, Assumption F2 implies that the second term approaches zero as $k \to \infty$. The first term on the right-hand side is indeterminate. The discrete version of L'Hospital's rule can be applied (Hobson, 1957, p. 7, Section 6), and we find that
$$\limsup_{k} \cdots$$
Lemma 8
Let $r_1, r_2, \ldots, r_n$ be any distinct real numbers, and let $R = [\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n]$ be the (Vandermonde) matrix whose $j$th column is
$$\mathbf{r}_j = \big(1, r_j, r_j^2, \ldots, r_j^{n-1}\big)',$$
where
$$g_k(x) = x^k + a_1 x^{k-1} + \cdots + a_k \qquad (k = 1, 2, \ldots, n),$$
and $a_1, \ldots, a_n$ are such that
$$\prod_{k=1}^{n} (x - r_k) = g_n(x).$$
Then the matrix $L$ whose $i$th column is $\mathbf{l}_i = \big(g_{n-1}(r_i), g_{n-2}(r_i), \ldots, g_1(r_i), 1\big)'$ satisfies $L'R = D$, where $D$ is diagonal with entries $D_{ii} = \prod_{j \ne i} (r_i - r_j)$. Set
$$A = \begin{bmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ -a_n & -a_{n-1} & \cdots & -a_1 \end{bmatrix}$$
and notice that $A\mathbf{r}_j = r_j \mathbf{r}_j$, because $x = r_j$ is a root of $g_n(x) = 0$. In other words, the $\mathbf{r}_j$ are right eigenvectors of $A$. Setting
$$E(x) = \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \end{bmatrix}\cdots,$$
we find that
$$(A - xI)\,E(x) = \cdots$$
By hypothesis, the $r$'s are distinct numbers, and therefore the two sets of eigenvectors are biorthogonal. In other words, we have
$$L'R = D$$
for some diagonal matrix $D$.
We complete the proof by showing that the $i$th entry of $D$ is indeed the one given in the statement of the lemma. To do this, we multiply $g_k(x)$ by $x^{n-k-1}$ and sum on $k$ up to $n - 1$, giving
$$\sum_{k=1}^{n-1} g_k(x)\,x^{n-k-1} = (n-1)\,x^{n-1} + \sum_{k=1}^{n-1} \sum_{j=1}^{k} a_j\, x^{n-j-1}.$$
After collecting the coefficients of the $a$'s, and setting the arbitrary constant equal to $-a_n$, the right-hand side is the derivative of $g_n(x)$. After differentiating the product form of the characteristic polynomial, we obtain the identity
$$x^{n-1} + \sum_{k=1}^{n-1} g_k(x)\,x^{n-k-1} = \sum_{k=1}^{n} \prod_{j \ne k} (x - r_j).$$
In particular, at $x = r_i$ the left-hand side becomes the inner product $\mathbf{l}_i'\mathbf{r}_i$, and the right-hand side becomes $D_{ii} = \prod_{j \ne i} (r_i - r_j)$. Q.E.D.
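The identity, and the resulting diagonal matrix, are easy to confirm numerically (our sketch; Python):

```python
import numpy as np

r = np.array([0.5, -1.3, 2.0, 3.7])          # distinct characteristic roots
n = len(r)
# coefficients a_1..a_n of g_n(x) = prod(x - r_k) = x^n + a_1 x^{n-1} + ... + a_n
a = np.poly(r)[1:]

def g(k, x):                                  # g_k(x) = x^k + a_1 x^{k-1} + ... + a_k
    return x**k + sum(a[j] * x**(k - 1 - j) for j in range(k))

R = np.vander(r, n, increasing=True).T        # j-th column is (1, r_j, ..., r_j^{n-1})'
L = np.array([[g(n - 1 - m, ri) for ri in r] for m in range(n)])  # columns l_i

D = L.T @ R                                   # should be diagonal, with entries
d = [np.prod([ri - rj for rj in r if rj != ri]) for ri in r]      # prod_{j != i}(r_i - r_j)
print(np.allclose(D, np.diag(d)))             # True
```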
The reader will note that the preceding is precisely the sort of analysis used in deriving the solution of an $n$th-order difference equation with coefficients $a_1, \ldots, a_n$. The proof immediately suggests itself if one knows that the right-sided eigenvectors of the matrix defining the corresponding first-order vector difference equation form a Vandermonde matrix of characteristic roots. Only the point of view is different; we start with the characteristic roots and construct the coefficients.