
JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION
Website: http://AIMsciences.org
Volume 1, Number 2, May 2005, pp. 193–200

CONVERGENCE PROPERTY OF THE FLETCHER-REEVES
CONJUGATE GRADIENT METHOD WITH ERRORS

C.Y. Wang^{1,2} and M.X. Li^{1,3}


1. Dept. of Appl. Math., Dalian University of Technology, Dalian, Liaoning, 116024, China;
2. Inst. of Operations Research, Qufu Normal University, Rizhao, Shandong, 276826, China;
3. Dept. of Math., Weifang University, Weifang, Shandong, 261041, China

Abstract. In this paper, we consider a new kind of Fletcher-Reeves (abbr. FR)
conjugate gradient method with errors, which is widely applied in neural network
training. Its iterate formula is x_{k+1} = x_k + α_k(s_k + ω_k), where the main
direction s_k is obtained by the FR conjugate gradient method and ω_k is an
accumulated error. The global convergence of the method is proved under mild
assumptions.

1. Introduction. We consider the unconstrained optimization problem


min{f(x) : x ∈ R^n}, (1.1)
where f : R^n → R is continuously differentiable.
Conjugate gradient methods are very important for solving (1.1). They have the
following forms:
x_{k+1} = x_k + α_k d_k, (1.2)
and
d_k = −g_k if k = 1;  d_k = −g_k + β_k d_{k−1} if k ≥ 2, (1.3)
where g_k = ∇f(x_k), α_k > 0 is a stepsize obtained by a one-dimensional line
search, and β_k is a scalar.
Since Fletcher and Reeves introduced the nonlinear conjugate gradient method
in 1964, many formulae have been proposed to compute the scalar βk . Among
them, two well-known formulae for βk are called the FR and PRP formulae, which
have the following forms:
β_k^{FR} = ‖g_k‖²/‖g_{k−1}‖²   (Fletcher–Reeves), (1.4)
and
β_k^{PRP} = g_k^T(g_k − g_{k−1})/‖g_{k−1}‖²   (Polak–Ribière–Polyak), (1.5)
where ‖·‖ denotes the Euclidean norm.
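For concreteness, the two formulae can be sketched in a few lines of Python; this is our illustration, not part of the original paper, with `numpy` arrays standing in for vectors in R^n:

```python
import numpy as np

def beta_fr(g_k: np.ndarray, g_prev: np.ndarray) -> float:
    # Fletcher-Reeves formula (1.4): ||g_k||^2 / ||g_{k-1}||^2
    return float(g_k @ g_k) / float(g_prev @ g_prev)

def beta_prp(g_k: np.ndarray, g_prev: np.ndarray) -> float:
    # Polak-Ribiere-Polyak formula (1.5): g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    return float(g_k @ (g_k - g_prev)) / float(g_prev @ g_prev)
```

Note that when g_k = g_{k−1}, β_k^{PRP} = 0 (the iteration restarts along −g_k) while β_k^{FR} = 1, which is one well-known difference between the two formulae.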
Many efforts have been devoted to the investigation of the global convergence
properties of the FR conjugate gradient method. Powell and Zoutendijk (see [14]
and [20]) proved that the FR conjugate gradient method is globally convergent un-
der the exact line search conditions. In [1], Al-Baali obtained the global convergence
result of the FR method under the strong Wolfe line search conditions. Dai (see [6])

2000 Mathematics Subject Classification. 90C30, 49D27.


Key words and phrases. Conjugate gradient method, global convergence, error.
Research partially supported by NSF grant 10171055.


discussed the global convergence of the FR method under the generalized Wolfe line
search. In [7], Dai and Yuan also discussed the global convergence of the FR method
under the Armijo line search conditions.
In practical computation, errors may arise because of inexact computation of
the gradient of f(·) or approximate solution of subproblems. Therefore, the
analysis of methods with errors has become an important subject in the field of
numerical algorithms. Many results have been obtained on gradient-type methods
with errors. Some related work is as follows: for neural network training, the
convergence of the standard incremental gradient/backpropagation method has been
the object of much recent analysis (see [2, 8, 9, 10, 11, 17]). These methods
have been recently analysed in [11, 12] and [13] with the following stepsize rule

Σ_{k=1}^∞ α_k = ∞,   Σ_{k=1}^∞ α_k² < ∞, (1.6)
which is typical for incremental methods. This stepsize rule implies, among other
things, that {xk+1 − xk } → 0; the latter condition need not hold in the setting
of this paper, that is, the situation here is fundamentally different from that in
the literature. Incremental algorithms with stepsizes bounded away from zero are
considered in [16] and [17]. Other recent work on this class of methods can be
found in [3, 4] and [19]. In [18], Solodov and Zavriev studied the gradient-type
methods in the presence of noise and the following stepsize rule
Σ_{k=1}^∞ α_k = ∞,   α_k → 0 (1.7)

in the general case (nonsmooth and nonconvex). In [15], Solodov presented a fairly
broad feasible descent framework in the presence of errors. However, the problems
of stepsize choice and stopping conditions were not studied. Bertsekas and
Tsitsiklis (see [5]) also studied the gradient method with errors in the presence of
stepsize rule (1.6) and the main direction s_k satisfying
⟨g_k, s_k⟩ ≤ −c1‖g_k‖², c1 > 0 (1.8)
and
‖s_k‖ ≤ c2‖g_k‖, c2 > 0. (1.9)
At the same time, the error term ω_k satisfies
‖ω_k‖ ≤ α_k(q + p‖g_k‖), p, q > 0 (1.10)
and g(x) is Lipschitz continuous, that is,
‖g(x) − g(x̃)‖ ≤ L‖x − x̃‖,   ∀x, x̃ ∈ R^n, (1.11)
where L is the Lipschitz constant. Under the above conditions, they prove that either
lim_{k→∞} f(x_k) = −∞ (1.12)
or
lim_{k→∞} ‖g_k‖ = 0. (1.13)
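As an illustration of this setting (ours, not taken from [5]), the following sketch runs a plain gradient method, s_k = −g_k, on f(x) = ½‖x‖² with a random error term scaled so that (1.10) holds with equality, under the stepsize rule α_k = 1/k, which satisfies (1.6). Since f is bounded below, the gradient is driven to zero, as in (1.13):

```python
import numpy as np

def grad(x):
    # gradient of f(x) = 0.5 * ||x||^2
    return x

rng = np.random.default_rng(0)
x = np.array([4.0, -3.0])
p, q = 0.5, 0.1

for k in range(1, 5001):
    alpha = 1.0 / k                        # diminishing stepsize, satisfies (1.6)
    g = grad(x)
    w = rng.standard_normal(x.size)        # random error direction
    w *= alpha * (q + p * np.linalg.norm(g)) / np.linalg.norm(w)  # enforce (1.10)
    x = x + alpha * (-g + w)               # x_{k+1} = x_k + alpha_k (s_k + w_k)
```

Because the error bound (1.10) is itself proportional to α_k, the perturbation vanishes faster than the step, which is what makes convergence possible here.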
To the best of our knowledge, there is no published convergence analysis for the
conjugate gradient methods with errors.
In this paper, we discuss the FR conjugate gradient method with errors under
the condition that sk only satisfies (1.8). An important feature in our theoret-
ical analysis is that the stepsize is determined by a one-dimensional line search.

Therefore it does not necessarily tend to zero in the limit. Furthermore, we prove
that (1.12) or lim inf_{k→∞} ‖g_k‖ = 0 holds in the presence of (1.10). In doing so,
we remove various boundedness conditions, such as boundedness from below of f(·),
boundedness of {x_k}, etc.
The following two kinds of inexact line searches are used in this paper:
Search A (Wolfe line search): Choose λ_k > 0 such that
f(x_k) − f(x_k + λ_k d_k) ≥ −δλ_k⟨g_k, d_k⟩, (1.14)
and
σ⟨g_k, d_k⟩ ≤ ⟨g(x_k + λ_k d_k), d_k⟩, (1.15)
where 0 < δ < σ < 1 are constants.
Search B (Armijo line search): Choose λ_k > 0 as follows: λ_k = λ^m, where m
is the smallest nonnegative integer satisfying (1.14) and λ ∈ (0, 1).
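A backtracking implementation of Search B might look as follows; this is a sketch with our own parameter defaults and an iteration cap, neither of which is specified in the paper:

```python
import numpy as np

def armijo_search(f, x, g, d, delta=0.1, lam=0.5, max_m=60):
    """Search B: return lambda_k = lam**m for the smallest nonnegative
    integer m satisfying condition (1.14); here f is the objective,
    g = grad f(x), and d is a descent direction with <g, d> < 0."""
    gd = float(g @ d)
    fx = f(x)
    step = 1.0                                           # lam**0
    for _ in range(max_m):
        if fx - f(x + step * d) >= -delta * step * gd:   # condition (1.14)
            return step
        step *= lam                                      # try lam**(m+1)
    return step                                          # cap reached (our addition)
```

For example, for f(x) = ½‖x‖² with d = −g, the full step λ_k = 1 already satisfies (1.14) whenever δ < 1/2.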
In the next section, we propose the FR conjugate gradient method with errors
and prove the global convergence property of the method.
One more word about our notation. The usual inner product of two vectors
x ∈ R^n and y ∈ R^n is denoted by ⟨x, y⟩. The Euclidean 2-norm of x ∈ R^n is
given by ‖x‖² = ⟨x, x⟩. For two nonnegative scalar functions s1 : R+ → R+ and
s2 : R+ → R+, we say that s1 = O(s2) if there exists a positive constant c such
that lim_{t→∞} s1(t)/s2(t) = c.

2. FR Conjugate Gradient Method with Errors. In the conjugate gradient
method with errors, we let
x_{k+1} = x_k + α_k(s_k + ω_k), (2.1)
where α_k is the stepsize and the main direction s_k is determined by
s_k = −g_k if k = 1;  s_k = −g_k + β_k d_{k−1} if k ≥ 2, (2.2)
and
d_k = s_k + ω_k. (2.3)
Here s_k and ω_k satisfy the following assumptions:
(A) ⟨g_k, s_k⟩ ≤ −c1‖g_k‖², c1 > 0.
(B) ‖ω_k‖ ≤ γ_k(q + p‖g_k‖), p, q > 0.
(C) γ_k = O(1/k), γ_k > 0.
Now we describe the algorithm formally as follows:
Algorithm 2.1. Set 0 < δ < σ < 1, λ ∈ (0, 1), α > 0, x_1 ∈ R^n, k := 1.
Step 1. Compute g_k. If g_k = 0, then stop: x_k is a stationary point. Otherwise,
go to Step 2.
Step 2. Let x_{k+1} = x_k + α_k d_k, where d_k is determined by (2.3) and α_k is
determined as follows: when k ∈ I = {k ∈ N : ⟨g_k, d_k⟩ ≥ 0}, let α_k = α, set
k := k + 1, and return to Step 1; otherwise, go to Step 3.
Step 3. When k ∈ J = N \ I, let α_k = λ_k, where λ_k is produced by Search A or
Search B. Set k := k + 1 and return to Step 1.

Lemma 2.2. [6] Suppose that f(x) is bounded below and its gradient g(x) is
Lipschitz continuous with Lipschitz constant L. Consider any iteration of the
form x_{k+1} = x_k + α_k d_k, where d_k satisfies ⟨g_k, d_k⟩ < 0 and α_k is obtained by
Search A or Search B. Then there exist subsets N1 and N2 of N such that
N1 ∩ N2 = ∅, N1 ∪ N2 = N, N = {1, 2, ...}, and
Σ_{k∈N1} ⟨g_k, d_k⟩² + Σ_{k∈N2} ⟨g_k, d_k⟩²/‖d_k‖² < +∞. (2.4)

Lemma 2.3. Let {x_k} be the infinite sequence produced by Algorithm 2.1. If J = N
and there exists an open convex set D containing the sequence {x_k} such that g(x)
is Lipschitz continuous on D, then at least one of the following is true:
(1) lim_{k→∞} f(x_k) = −∞;
(2) Σ_{k∈N1} ⟨g_k, d_k⟩² + Σ_{k∈N2} ⟨g_k, d_k⟩²/‖d_k‖² < +∞, where N1 ∩ N2 = ∅, N1 ∪ N2 = J, N = {1, 2, ...}.

Proof: If (1) is not true, then there exists an infinite subset K ⊆ N such that
lim_{k∈K,k→∞} f(x_k) = f* > −∞. It follows from J = N that ⟨g_k, d_k⟩ < 0 for all k ∈ N.
On the other hand, from (1.14) we get that {f(x_k)} is monotonically decreasing, and hence
lim_{k→∞} f(x_k) = f* > −∞.
Therefore {f(x_k)} is bounded below, and (2) is true by Lemma 2.2.
Lemma 2.4. Suppose that (A), (B) and (C) hold. If {x_k} is an infinite sequence
generated by Algorithm 2.1, then, when I is an infinite set, we have
lim_{k∈I,k→∞} ‖g_k‖ = 0. (2.5)

Proof: It follows from the definition of I that
−⟨g_k, s_k⟩ ≤ ⟨g_k, ω_k⟩.
From (A) and (B), we get
c1‖g_k‖² ≤ γ_k‖g_k‖(q + p‖g_k‖),
that is,
(c1 − γ_k p)‖g_k‖ ≤ γ_k q. (2.6)
Together with (C), this gives (2.5).
To simplify the narration, we introduce the notation
t_k = ‖d_k‖²/‖g_k‖⁴,   h_k = |⟨g_k, d_k⟩|/‖g_k‖². (2.7)

Lemma 2.5. Suppose that (B) and (C) hold. If there exists a constant ε0 > 0 such
that
‖g_k‖ ≥ ε0, ∀k ∈ N, (2.8)
then there exists a constant c such that
t_k ≤ 2 Σ_{i=1}^k h_i/‖g_i‖² + ck. (2.9)

Proof: It follows from d_k = −g_k + β_k d_{k−1} + ω_k that
⟨g_k, d_k⟩ = −‖g_k‖² + β_k⟨g_k, d_{k−1}⟩ + ⟨g_k, ω_k⟩.
Hence
β_k⟨g_k, d_{k−1}⟩ = ‖g_k‖² + ⟨g_k, d_k⟩ − ⟨g_k, ω_k⟩. (2.10)
By (B) and (2.8), we have
‖ω_k‖²/‖g_k‖⁴ ≤ γ_k²(q/‖g_k‖ + p)²/‖g_k‖² ≤ γ_k²(1/ε0²)(q/ε0 + p)². (2.11)
We know from (C) that γ_k → 0 (k → ∞). Therefore there exists a constant c2 > 0
such that, when k is large enough,
‖ω_k‖²/‖g_k‖⁴ ≤ c2. (2.12)
On the other hand,
‖d_k‖² = ‖s_k + ω_k‖²
       = ‖s_k‖² + ‖ω_k‖² + 2⟨s_k, ω_k⟩
       = ‖−g_k + β_k d_{k−1}‖² + 2⟨s_k, ω_k⟩ + ‖ω_k‖²
       = ‖g_k‖² + β_k²‖d_{k−1}‖² − 2β_k⟨g_k, d_{k−1}⟩ + 2⟨s_k, ω_k⟩ + ‖ω_k‖², (2.13)
which, together with (2.10), implies that
‖d_k‖² = β_k²‖d_{k−1}‖² + ‖g_k‖² − 2‖g_k‖² − 2⟨g_k, d_k⟩ + 2⟨g_k, ω_k⟩ + 2⟨s_k, ω_k⟩ + ‖ω_k‖²
       = β_k²‖d_{k−1}‖² − ‖g_k‖² − 2⟨g_k, d_k⟩ + 2⟨g_k, ω_k⟩ + 2⟨s_k, ω_k⟩ + ‖ω_k‖².
Dividing both sides of the above equation by ‖g_k‖⁴ and applying (1.4), we obtain
‖d_k‖²/‖g_k‖⁴ ≤ ‖d_{k−1}‖²/‖g_{k−1}‖⁴ − 2⟨g_k, d_k⟩/‖g_k‖⁴ + 2⟨g_k, ω_k⟩/‖g_k‖⁴ + 2⟨s_k, ω_k⟩/‖g_k‖⁴ + ‖ω_k‖²/‖g_k‖⁴. (2.14)
By (2.7) and (2.14), we get
t_k ≤ t_{k−1} + 2h_k/‖g_k‖² + 2⟨g_k, ω_k⟩/‖g_k‖⁴ + 2⟨s_k, ω_k⟩/‖g_k‖⁴ + ‖ω_k‖²/‖g_k‖⁴. (2.15)
From (2.2) and (2.3), we have
⟨s_k, ω_k⟩ = −⟨g_k, ω_k⟩ + β_k⟨d_{k−1}, ω_k⟩
          = −⟨g_k, ω_k⟩ + β_k⟨s_{k−1} + ω_{k−1}, ω_k⟩
          = −⟨g_k, ω_k⟩ + β_k⟨s_{k−1}, ω_k⟩ + β_k⟨ω_{k−1}, ω_k⟩
          = −⟨g_k, ω_k⟩ + β_k[−⟨g_{k−1}, ω_k⟩ + β_{k−1}⟨s_{k−2}, ω_k⟩ + β_{k−1}⟨ω_{k−2}, ω_k⟩]
            + β_k⟨ω_{k−1}, ω_k⟩
          = −⟨g_k, ω_k⟩ − β_k⟨g_{k−1}, ω_k⟩ + β_kβ_{k−1}⟨s_{k−2}, ω_k⟩ + β_kβ_{k−1}⟨ω_{k−2}, ω_k⟩
            + β_k⟨ω_{k−1}, ω_k⟩
          = −⟨g_k, ω_k⟩ − [β_k⟨g_{k−1}, ω_k⟩ + β_kβ_{k−1}⟨g_{k−2}, ω_k⟩ + ...
            + β_kβ_{k−1}···β_2⟨g_1, ω_k⟩] + [β_k⟨ω_{k−1}, ω_k⟩ + β_kβ_{k−1}⟨ω_{k−2}, ω_k⟩ + ...
            + β_kβ_{k−1}···β_2⟨ω_1, ω_k⟩] + β_k···β_2⟨s_1, ω_k⟩. (2.16)

It follows from (2.16) and (1.4) that
⟨s_k, ω_k⟩/‖g_k‖⁴ ≤ −⟨g_k, ω_k⟩/‖g_k‖⁴ + (‖ω_k‖/‖g_k‖²) Σ_{i=1}^{k−1} 1/‖g_i‖
                    + (‖ω_k‖/‖g_k‖²) Σ_{i=1}^{k−1} ‖ω_i‖/‖g_i‖² + ‖ω_k‖/(‖g_k‖²‖g_1‖)
                  ≤ −⟨g_k, ω_k⟩/‖g_k‖⁴ + (γ_k(q/‖g_k‖ + p)/‖g_k‖) Σ_{i=1}^k (1 + γ_i(q/‖g_i‖ + p))/‖g_i‖
                    + γ_k(q/‖g_k‖ + p)/(‖g_k‖‖g_1‖)
                  ≤ −⟨g_k, ω_k⟩/‖g_k‖⁴ + (γ_k(q/ε0 + p)/‖g_k‖) Σ_{i=1}^k (1 + γ_i(q/ε0 + p))/‖g_i‖
                    + γ_k(q/ε0 + p)/(‖g_k‖‖g_1‖). (2.17)
By (2.17), (B) and (C), we know that there exists a constant c3 > 0 such that
⟨s_k, ω_k⟩/‖g_k‖⁴ ≤ −⟨g_k, ω_k⟩/‖g_k‖⁴ + c3, (2.18)
which, together with (2.12), implies that
t_k − t_{k−1} ≤ 2h_k/‖g_k‖² + c4, (2.19)
where c4 = c2 + c3. Applying (2.19) repeatedly, we get
t_k ≤ t_1 + 2 Σ_{i=1}^k h_i/‖g_i‖² + kc4. (2.20)
Thus there exists a constant c such that (2.9) is true.
Theorem 2.6. Suppose that (A), (B) and (C) hold. If {x_k} generated by Algorithm
2.1 is infinite and there exists an open convex set D containing the sequence {x_k}
such that g(x) is Lipschitz continuous on D, then at least one of the following is true:
(1) lim_{k→∞} f(x_k) = −∞;
(2) lim inf_{k→∞} ‖g_k‖ = 0.

Proof: If (1) is not true, then there exists an infinite subset K ⊆ N such that
lim_{k∈K,k→∞} f(x_k) = f* > −∞. (2.21)
(i) If I is an infinite set, then (2) is true by Lemma 2.4.
(ii) If I is a finite set, then there exists an integer k0 ≥ 1 such that k ∈ J for all k ≥ k0.
Without loss of generality, suppose that J = N.
If we assume that (2) is not true, then there exists a constant ε0 > 0 such
that (2.8) holds. By (A), (B) and (2.8), we have
−⟨g_k, d_k⟩ = −⟨g_k, s_k⟩ − ⟨g_k, ω_k⟩
           ≥ c1‖g_k‖² − ‖g_k‖γ_k(q + p‖g_k‖)
           ≥ ‖g_k‖²(c1 − γ_k q/ε0 − γ_k p).
It follows from (C) that there exists a constant c5 > 0 such that, when k is
large enough,
−⟨g_k, d_k⟩ ≥ c5‖g_k‖². (2.22)
By (2.22), we have
‖d_k‖ ≥ c5 ε0. (2.23)

Moreover, by Lemma 2.3 and (2.23), we have
c5²ε0² Σ_{k∈N1} ⟨g_k, d_k⟩²/‖d_k‖² ≤ Σ_{k∈N1} ⟨g_k, d_k⟩² < +∞.
Thus
Σ_{k∈N1} ⟨g_k, d_k⟩²/‖d_k‖² < +∞. (2.24)
From (2.4) and (2.24), we obtain
Σ_{k=1}^∞ ⟨g_k, d_k⟩²/‖d_k‖² < +∞. (2.25)

It follows from (2.22) that
⟨g_k, d_k⟩²/‖d_k‖² ≥ c5²‖g_k‖⁴/‖d_k‖².
Therefore, by (2.25), we have
Σ_{k=1}^∞ ‖g_k‖⁴/‖d_k‖² < +∞. (2.26)

By Lemma 2.5 and (2.8), when k is large enough, there exist constants η > 0 and
c6 > 0 such that
t_k ≤ η Σ_{i=1}^k (2h_i + c6). (2.27)
From (2.7) and (2.26), we have Σ_{k=1}^∞ 1/t_k = Σ_{k=1}^∞ ‖g_k‖⁴/‖d_k‖² < +∞; hence
1/t_k → 0, that is,
lim_{k→∞} t_k = +∞. (2.28)

Hence, by (2.27),
Σ_{k=1}^∞ η(2h_k + c6) = +∞. (2.29)
It follows from the Abel–Dini theorem that
+∞ = Σ_{k=1}^∞ η(2h_k + c6) / (η Σ_{i=1}^k (2h_i + c6))
   ≤ η Σ_{k=1}^∞ 2h_k/t_k + ηc6 Σ_{k=1}^∞ 1/t_k
   ≤ η Σ_{k=1}^∞ h_k²/t_k + η(c6 + 1) Σ_{k=1}^∞ 1/t_k
   < +∞,
where the first inequality uses (2.27), the second uses 2h_k ≤ h_k² + 1, and the
finiteness follows from Σ h_k²/t_k = Σ ⟨g_k, d_k⟩²/‖d_k‖² < +∞ (by (2.25) and (2.7))
and Σ 1/t_k < +∞ (by (2.26)). This is a contradiction. Therefore (2) is true.
Acknowledgement. The authors express their sincere thanks to the referees for
their helpful suggestions, which improved the presentation of this paper.

REFERENCES
[1] Al-Baali, M. (1985): Descent property and global convergence of the Fletcher-Reeves method
with inexact line search. IMA J. Numer. Anal. 5, 121-124
[2] Bertsekas, D.P. (1995): Nonlinear Programming. Athena Scientific, Belmont, MA
[3] Bertsekas, D.P. (1996): Incremental least squares methods and the extended Kalman filter.
SIAM J. Optim. 6, 807-822
[4] Bertsekas, D.P. (1997): A new class of incremental gradient methods for least squares prob-
lems. SIAM J. on Optim. 7, 913-926
[5] Bertsekas, D.P.; Tsitsiklis, J.N. (2000): Gradient convergence in gradient methods with errors.
SIAM J. Optim. 3, 627-642
[6] Dai Y.H. (1999): Further insight into the convergence of the Fletcher-Reeves method. Science
in China, 9, 905-916
[7] Dai Y.H.; Yuan Y.X. (1996): Convergence properties of the Fletcher-Reeves method. IMA
Journal of Numerical Analysis, 16, 155-164
[8] Gaivoronski A.A.(1994): Convergence analysis of parallel backpropagation algorithm for neu-
ral networks. Optim. Methods Software, 4, 117-134
[9] Grippo L. (1994): A class of unconstrained minimization methods for neural network training.
Optim. Methods Software, 4, 135-150
[10] Luo Z.Q. (1991): On the convergence of the LMS algorithm with adaptive learning rate for
linear feedforward networks. Neural Comput. 3, 226-245
[11] Luo Z.Q.; Tseng P. (1994): Analysis of an approximate gradient projection method with
applications to the backpropagation algorithm. Optim. Methods Software, 4, 85-101
[12] Mangasarian O.L.; Solodov M.V. (1994a): Serial and parallel backpropagation convergence
via nonmonotone perturbed minimization. Optim. Methods Software, 4, 103-116
[13] Mangasarian O.L.; Solodov M.V.(1994b): Backpropagation convergence via deterministic
nonmonotone perturbed minimization. in: G. Tesauro, J.D. Cowan, J. Alspector(Eds.), Ad-
vances in Neural Information Processing Systems 6, Morgan Kaufmann, San Francisco, CA,
383-390
[14] Powell M.J.D. (1984): Nonconvex minimization calculations and the conjugate gradient
method, in: Lecture Notes in Math. 1066, 122-141, Springer
[15] Solodov M.V. (1997): Convergence analysis of perturbed feasible descent methods. J. Optim.
Theory Appl. 2, 337-353
[16] Solodov M.V. (1998): Incremental gradient algorithms with stepsizes bounded away from
zero. Computational Optimization and Applications, 11, 23-35
[17] Solodov M.V.; Svaiter B.F. (1997): Descent methods with linesearch in the presence of
perturbations. Journal of Computational and Applied Mathematics, 80, 265-275
[18] Solodov M.V.; Zavriev S.K. (1998): Error-stability properties of generalized gradient-type
algorithms. Journal of Optimization Theory and Applications, 3, 663-680
[19] Tseng P. (1998): Incremental gradient(-projection) method with momentum term and adap-
tive stepsize rule. SIAM J. on Optim. 8, 506-531
[20] Zoutendijk G. (1970): Nonlinear programming, computational methods. in: Integer and
Nonlinear Programming (J. Abadie, ed.), 37-86. North-Holland

Received May 2004; revised October 2004.


E-mail address: wcy0537@eyou.com
