Dubrulle
Moler and Morrison have described an iterative algorithm for the computation of the Pythagorean sum $(a^2 + b^2)^{1/2}$ of two real numbers $a$ and $b$. This algorithm is immune to unwarranted floating-point overflows, has a cubic rate of convergence, and is easily transportable. This paper, which shows that the algorithm is essentially Halley's method applied to the computation of square roots, provides a generalization to any order of convergence. Formulas of orders 2 through 9 are illustrated with numerical examples. The generalization keeps the number of floating-point divisions constant and should be particularly useful for computation in high-precision floating-point arithmetic.
1. Introduction
As in [1], the Pythagorean sum $h$ of two real numbers $a$ and $b$ is defined by

$$h = (a^2 + b^2)^{1/2}.$$

Pythagorean sums are a frequent occurrence in numerical computations, and their evaluation in floating-point arithmetic requires some precautions, often overlooked, to prevent unwarranted overflows or underflows. In their paper, Moler and Morrison [1] describe an elegant iterative algorithm for the computation of Pythagorean sums that is immune to unwarranted overflows, has a cubic rate of convergence, and is easily transportable to any machine.

This paper presents a summary description of the Moler-Morrison algorithm and shows that it is essentially the application of Halley's method [2] to the computation of square roots. A generalization to any convergence order higher than linear is then proposed that preserves the main properties of the Moler-Morrison algorithm. Algorithms for orders 2 through 9 are illustrated with numerical examples.

Since the general algorithm is based on a rational iteration, the number of divisions required per iteration is constant (two), while the number of multiplications is proportional to the order of convergence. High-order algorithms should thus be particularly interesting for multiple-precision floating-point computations.

2. The Moler-Morrison algorithm
In a rectangular coordinate system $\{x, y\}$ of origin $O$, we consider a sequence of points $\{A_n;\ n = 0, 1, 2, \ldots\}$, $A_0$ being in the first quadrant, at a distance $h$ from the origin. $A_{n+1}$ is derived from $A_n$ as follows. Let $H_n$ be the projection of $A_n$ on the $x$ axis and $M_n$ the midpoint of $A_nH_n$. $A_{n+1}$ is defined as the reflection of $A_n$ in $OM_n$ (Fig. 1). Elementary geometric considerations show that $A_{n+1}$ and $A_n$ are in the same quadrant, at the same distance from the origin, and that $A_{n+1}$ is closer than $A_n$ to the $x$ axis. Thus, from the definition of $A_0$, the set $\{A_n\}$ is on the quarter circle of radius $h$ in the first quadrant, with $A_{n+1}$ between $A_n$ and $A$, the point of abscissa $h$ on the $x$ axis. The sequence $\{A_n\}$ converges monotonically towards $A$.

Let $x_n$ and $y_n$ be the coordinates of $A_n$. From the above considerations, the sequence $\{x_n\}$ converges monotonically to $h$ from below, with

$$h = (x_n^2 + y_n^2)^{1/2}, \quad n = 0, 1, 2, \ldots, \tag{1}$$

while the sequence $\{y_n\}$ converges monotonically to zero from above. Thus, $\{x_n\}$ can be considered as a sequence of approximants to the Pythagorean sum of two arbitrary numbers $x_0$ and $y_0$. From the relationship between $A_n$ and $A_{n+1}$, we now derive the iteration formulas providing the pair $(x_{n+1}, y_{n+1})$ from $(x_n, y_n)$.
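The geometric construction described above (reflect $A_n$ in the line through the origin and the midpoint $M_n$ of $A_nH_n$) can be tried out directly. The following is a minimal sketch under stated assumptions: the function name `reflect_step` and the sample point (120, 119), whose distance from the origin is 169, are illustrative choices, and the standard vector formula for reflection in a line through the origin is used in place of the paper's own derivation.

```python
def reflect_step(x, y):
    """Reflect A = (x, y) in the line through the origin O and M = (x, y/2).

    M is the midpoint of A and its projection H = (x, 0) on the x axis.
    The reflection of A in the line OM is 2 * proj_M(A) - A.
    """
    mx, my = x, y / 2.0
    t = 2.0 * (x * mx + y * my) / (mx * mx + my * my)  # 2 (A.M) / (M.M)
    return t * mx - x, t * my - y


# Hypothetical starting point with x_0 >= y_0 > 0; here h = 169.
x, y = 120.0, 119.0
for n in range(4):
    x, y = reflect_step(x, y)
    print(n + 1, x, y)  # x increases towards h, y decreases towards 0
```

This form is meant only to illustrate the geometry: the subtraction `t * my - y` cancels heavily once `y` is small, so the computed `y` loses accuracy near convergence even though `x` still settles on the Pythagorean sum.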
© Copyright 1983 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
$$x_0 \ge y_0 > 0,$$

considering that the Pythagorean addition is commutative and that the case $y_0 = 0$ is trivial. This assumption guarantees that

$$x_n \ge y_n > 0 \tag{2}$$

for any finite value of $n$.

[...]

$$x_0 = \max(|a|, |b|),$$
$$y_0 = \min(|a|, |b|),$$

[...]

where $f_n$, $f'_n$, and $f''_n$ are the values of $f(x)$ and its first two derivatives at $x_n$. The asymptotic rate of convergence of the method is cubic for simple roots and linear for multiple roots. More details on this iteration can be found in [2].
[...]

$$y_{n+1} = \frac{y_n^3}{4x_n^2 + y_n^2},$$

which are precisely the equations (5) derived for the Moler-Morrison algorithm.

4. A generalization
While the Moler-Morrison algorithm is likely to be of optimal efficiency for the working precision of current computers, higher-order algorithms may be desirable (under certain conditions of computational cost) for calculations requiring higher levels of precision. In this section, we propose such algorithms for which the number of divisions is constant and the number of additions and multiplications varies linearly with the rate of convergence (rational iteration formulas).

The cubic rate of convergence of the Moler-Morrison algorithm is revealed by forming

[...]

from Eq. (11).

[...]

where $F_k(x)$ is a polynomial to be defined. The requirements we choose to impose upon $F_k(x)$ relate to computational considerations and the properties of the Moler-Morrison method; they are

Monotonic convergence to $h$ from below: $0 \le x_n \le h$ is a sufficient condition for

[...]

$$x_{n+1} = h\,\frac{(h + x_n)^k - (h - x_n)^k}{(h + x_n)^k + (h - x_n)^k}, \qquad y_{n+1} = \frac{2h\,y_n^k}{(h + x_n)^k + (h - x_n)^k}. \tag{17}$$

An analysis of Eqs. (17) shows that two cases must be considered, based on the parity of $k$:

• $k$ is even; then
The numerator of $x_{n+1}$ is an odd polynomial in $x_n$ and an even polynomial in $h$. The denominator is an even polynomial in $x_n$ and $h$. Thus, $x_{n+1}$ is the product of $x_n$ and a rational function of $x_n^2$ and $h^2$; using a similar approach, we find $y_{n+1}$ to be the product of $h$ and a rational function of $x_n^2$ and $h^2$;

• $k$ is odd; then
$x_{n+1}$ is again the product of $x_n$ and a rational function of $x_n^2$ and $h^2$, while $y_{n+1}$ is the product of $y_n$ and a rational function of $x_n^2$ and $h^2$.

[...]

$$r_{n+1} = (y_{n+1}/x_{n+1})^2$$

always has a rational expression in terms of $r_n$ and $h^2$. When $k$ is even, an iteration can thus be derived that maps the pair $(x_n, r_n)$ into $(x_{n+1}, r_{n+1})$. These two classes of iterations, together with Eqs. (17), constitute the basis for our generalization of the Moler-Morrison algorithm. We derive in the next sections the corresponding iteration formulas. As much as possible, the outline of the general algorithms follows that of the cubic algorithm defined by Eqs. (6).
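To make the order-$k$ formulas of Eqs. (17) concrete, the sketch below applies them once for several values of $k$. It is only an illustration: Eqs. (17) involve $h$ itself, so the code assumes $h$ is already known (here the hypothetical data $x_0 = 120$, $y_0 = 119$, $h = 169$), which a practical algorithm cannot do. The function name `step_order_k` is an assumption. The printed columns also check the identity $(h - x_{n+1})/(h + x_{n+1}) = [(h - x_n)/(h + x_n)]^k$, which follows algebraically from the first of Eqs. (17).

```python
def step_order_k(x, y, h, k):
    """One step of Eqs. (17), with h supplied explicitly for illustration."""
    plus = (h + x) ** k
    minus = (h - x) ** k
    x_next = h * (plus - minus) / (plus + minus)
    y_next = 2.0 * h * y ** k / (plus + minus)
    return x_next, y_next


h, x0, y0 = 169.0, 120.0, 119.0   # hypothetical data, 120^2 + 119^2 = 169^2
u0 = (h - x0) / (h + x0)
for k in (2, 3, 4, 5):
    x1, y1 = step_order_k(x0, y0, h, k)
    u1 = (h - x1) / (h + x1)
    print(k, x1, u1, u0 ** k)     # the last two columns should agree closely
```

The first iterates printed for $k = 2$ and $k = 3$ can be compared with the second entries listed for those orders in the numerical examples of the paper.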
$$k = 2m,$$

let us first look at the derivation of the $x$ iterate from the first of Eqs. (17). Using the formula for binomial expansion, we have

$$(h + x)^{2m} + (h - x)^{2m} = 2\sum_{i=0}^{m}\binom{2m}{2i}\,h^{2i}\,x^{2m-2i},$$

[...]

$$r_n = (y_n/x_n)^2$$

and

[...]

which with a transformation of Eq. (18) gives

$$h^2 = x_n^2\,(1 + r_n).$$

Combining Eqs. (19), (6), and (20), we get

$$(h + x_n)^{2m} + (h - x_n)^{2m} = 2x_n^{2m}\sum_{i=0}^{m}\binom{2m}{2i}\,(1 + r_n)^i \tag{21}$$

or

[...]

Note that the numerator of this rational function has a null constant term and that the denominator is monic. Similarly, starting with the equation

$$r_{n+1} = \left[\frac{2y_n^{2m}}{(h + x_n)^{2m} - (h - x_n)^{2m}}\right]^2$$

obtained from Eqs. (17), we derive the iteration formula

[...]

are available, the numerical algorithm for the computation of $(a^2 + b^2)^{1/2}$ can be expressed as follows:

$$x_0 = \max(|a|, |b|),$$

[...]

The iteration is stopped when the equality of machine representations

$$\mathrm{fl}(1) = \mathrm{fl}(1 + r_n)$$

is satisfied.
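An even-order iteration on the pair $(x_n, r_n)$ can be sketched as follows for $k = 2$ and $k = 4$. The polynomials $P$ and $Q$ are the ones tabulated for these orders ($P = r$, $Q = 2 + r$ and $P = 4r + 3r^2$, $Q = 8 + 8r + r^2$), and the stopping test is the criterion $\mathrm{fl}(1) = \mathrm{fl}(1 + r_n)$ quoted above. Two parts are assumptions rather than formulas taken from the text: the update $x_{n+1} = x_n + (P/Q)\,x_n$, and the update $r_{n+1} = r_n^k (1 + r_n)/(P + Q)^2$, both obtained here by substituting $h^2 = x_n^2(1 + r_n)$ into Eqs. (17). The names `EVEN` and `pythag_even` are hypothetical.

```python
# P and Q as functions of r for even orders k = 2 and k = 4.
EVEN = {
    2: (lambda r: r, lambda r: 2.0 + r),
    4: (lambda r: r * (4.0 + 3.0 * r), lambda r: 8.0 + r * (8.0 + r)),
}


def pythag_even(a, b, k=2, max_iter=60):
    """Sketch of an even-order iteration on (x, r); not a tuned routine."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    if y == 0.0:
        return x                       # trivial case, also covers a = b = 0
    P, Q = EVEN[k]
    r = (y / x) ** 2
    for _ in range(max_iter):
        if 1.0 + r == 1.0:             # fl(1) = fl(1 + r): stop
            break
        p, q = P(r), Q(r)
        x = x + (p / q) * x            # first division of the step
        # Assumed r update, derived from Eqs. (17); second division.
        r = r ** k * (1.0 + r) / (p + q) ** 2
    return x


print(pythag_even(119.0, 120.0, k=2))
print(pythag_even(3.0, 4.0, k=4))
```

Each pass through the loop uses two divisions, in line with the constant division count claimed for the generalization.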
The analysis of Section 4 showed that when $k$ is odd,

$$k = 2m + 1,$$

we have an iteration on the pair $(x_n, y_n)$ that we formulate below as a straight generalization of the Moler-Morrison algorithm. Using an approach similar to that of the previous section, we derive our iteration formulas from Eqs. (17). Defining first the coefficients

[...]

and

[...]

we express as follows the algorithm for the Pythagorean sum $(a^2 + b^2)^{1/2}$:

[...]

The iteration is stopped when the equality of machine representations

$$\mathrm{fl}(1) = \mathrm{fl}(1 + r_n)$$

is satisfied, except for the case of the cubic iteration ($m = 1$), where the criterion

$$\mathrm{fl}(4) = \mathrm{fl}(4 + r)$$

[...]

The formula of order 1 is of no interest at all. It is a limit case of linear convergence for which

[...]

from Eqs. (17).

7. An estimate of the maximum number of iterations
Let $\varepsilon_n$ denote the relative distance of the $n$th iterate to the Pythagorean sum for the formula of order $k$:

$$\varepsilon_n = 1 - \frac{x_n}{h}.$$

Defining

$$u_n = \frac{\varepsilon_n/2}{1 - \varepsilon_n/2},$$

we obtain, from Eqs. (17) and (27),

$$u_{n+1} = u_n^k,$$

that is,

$$u_n = u_0^{k^n} \quad \text{and} \quad \frac{\varepsilon_n}{2} = \frac{u_0^{k^n}}{1 + u_0^{k^n}}.$$

[...]

$$\varepsilon = \gamma\,\beta^{\cdots}, \tag{30}$$

where $\gamma$ is a rounding parameter taking the values 1 (chopped representation) or 1/2 (rounded representation). Ignoring rounding errors, the iteration of order $k$ ceases to be effective as soon as

$$\varepsilon_n < \varepsilon$$

or, from Eq. (29),

$$\frac{u_0^{k^n}}{1 + u_0^{k^n}} < \frac{\varepsilon}{2}$$

[...]

where $\mathrm{fl}(\cdot)$ denotes floating-point number representation in our machine.
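The stopping condition $u_0^{k^n}/(1 + u_0^{k^n}) < \varepsilon/2$ can be turned into a small iteration counter. The sketch below is illustrative only: the function name `iterations_needed` is an assumption, and the sample values $u_0 = 49/289$ (that is, $x_0 = 120$, $h = 169$) and $\varepsilon = 2^{-52}$ (IEEE double precision, not the machine of the paper) are hypothetical choices.

```python
def iterations_needed(k, u0, eps):
    """Smallest n such that u0**(k**n) / (1 + u0**(k**n)) < eps / 2."""
    if k < 2 or not (0.0 <= u0 < 1.0):
        raise ValueError("need k >= 2 and 0 <= u0 < 1 for the loop to end")
    n, u = 0, u0
    while u / (1.0 + u) >= eps / 2.0:
        u = u ** k          # u_{n+1} = u_n ** k
        n += 1
    return n


u0 = (169.0 - 120.0) / (169.0 + 120.0)   # hypothetical starting point
eps = 2.0 ** -52                          # assumed machine precision
for k in range(2, 10):
    print(k, iterations_needed(k, u0, eps))
```

Rounding errors are ignored here, as in the estimate itself, so the count is a model of the exact iteration rather than a guarantee for a particular machine.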
$$\varepsilon_0 = 1 - \frac{\sqrt{2}}{2}$$

defines the maximum of $u_0$ as

$$u_0 = \frac{\sqrt{2} - 1}{\sqrt{2} + 1}$$

from Eq. (28).

Combining Eqs. (30), (31), and (32), $N$ is the smallest value of $n$ for which

[...] (33)

where ceiling($w$) denotes the smallest integer not less than $w$.

[...]

$$h = (u^2 + u^2)^{1/2} = u\sqrt{2}$$

is computed with the cubic iteration (6), $u$ being the machine underflow threshold, that is, the smallest positive machine floating-point number.

We have

$$r_0 = 1 \quad \text{and} \quad s_0 = 1/5.$$

The computation of the first iterate reduces to

$$x_1 = u + 2s_0u.$$

Since

[...]

we have

$$\mathrm{fl}(2s_0u) = 0$$

and the first $x$ iterate is $u$. For the same reason, the first $y$ iterate is zero, and the iteration terminates with the incorrect result $u$. The same phenomenon occurs for all our formulas where the iteration takes the form

[...]
Table 1 Formulas of even orders, iteration (25).

k   P and Q
2   $P = r$
    $Q = 2 + r$
4   $P = 4r + 3r^2$
    $Q = 8 + 8r + r^2$
6   $P = 16r + 20r^2 + 5r^3$
    $Q = 32 + 48r + 18r^2 + r^3$
8   $P = 64r + 112r^2 + 56r^3 + 7r^4$
    $Q = 128 + 256r + 160r^2 + 32r^3 + r^4$

Table 2 Formulas of odd orders, iteration (26).

k   P and S
3   $P = 2$
    $S = r/(4 + r)$
5   $P = 8 + 4r$
    $S = r/(16 + 12r + r^2)$
7   $P = 32 + 32r + 6r^2$
    $S = r/(64 + 80r + 24r^2 + r^3)$
9   $P = 128 + 192r + 80r^2 + 8r^3$
    $S = r/(256 + 448r + 240r^2 + 40r^3 + r^4)$

k   Iterates $x_n$
2   120.0000000000000
    159.5549451828402
    168.7209057465608
    168.9997691646582
    168.9999999998423
    169.0000000000000
3   120.0000000000000
    167.3605440280932
    168.9999608618056
    169.0000000000000
4   120.0000000000000
    168.7209057465608
    168.9999999998424
    169.0000000000000
5   120.0000000000000
    168.9526470501203
    169.0000000000000
6   120.0000000000000
    168.9919703649560
    169.0000000000000
7   120.0000000000000
    168.9986385471298
    169.0000000000000
8   120.0000000000000
    168.9997691646582
    169.0000000000000
9   120.0000000000000
    168.9999608618056
    169.0000000000000
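The odd-order formulas of Table 2 can be exercised with a short sketch. The functions $P$ and the denominators of $S$ are copied from the table; the stopping tests are the ones quoted in the text ($\mathrm{fl}(4) = \mathrm{fl}(4 + r)$ for the cubic case, $\mathrm{fl}(1) = \mathrm{fl}(1 + r)$ otherwise). The updates $x \leftarrow x + P\,S\,x$ and $y \leftarrow r^{m-1} S\, y$ with $k = 2m + 1$ are assumptions: they reduce to the Moler-Morrison step for $k = 3$ and were obtained here from Eqs. (17), but the exact form of iteration (26) is not reproduced in this text. The names `ODD`, `odd_step`, and `pythag_odd` are hypothetical.

```python
# k: (m, P(r), D(r)) with k = 2m + 1 and S = r / D(r), from Table 2.
ODD = {
    3: (1, lambda r: 2.0, lambda r: 4.0 + r),
    5: (2, lambda r: 8.0 + 4.0 * r, lambda r: 16.0 + r * (12.0 + r)),
    7: (3, lambda r: 32.0 + r * (32.0 + 6.0 * r),
        lambda r: 64.0 + r * (80.0 + r * (24.0 + r))),
    9: (4, lambda r: 128.0 + r * (192.0 + r * (80.0 + 8.0 * r)),
        lambda r: 256.0 + r * (448.0 + r * (240.0 + r * (40.0 + r)))),
}


def odd_step(x, y, r, k):
    """One assumed odd-order step; r must equal (y / x) ** 2."""
    m, P, D = ODD[k]
    s = r / D(r)
    return x + P(r) * s * x, r ** (m - 1) * s * y


def pythag_odd(a, b, k=3, max_iter=60):
    """Sketch of the odd-order iteration on (x, y); not a tuned routine."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    if x == 0.0:
        return 0.0
    c = 4.0 if k == 3 else 1.0         # stopping constant quoted in the text
    for _ in range(max_iter):
        r = (y / x) ** 2
        if c + r == c:
            break
        x, y = odd_step(x, y, r, k)
    return x


for k in (3, 5, 7, 9):
    print(k, odd_step(120.0, 119.0, (119.0 / 120.0) ** 2, k)[0],
          pythag_odd(119.0, 120.0, k))
```

The first value printed for each order can be compared with the second entry listed for that order in the table of iterates starting from 120.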
A preferred alternative, requiring knowledge of the underflow threshold $u$ and machine precision $\varepsilon$, consists of applying some scaling to the data only when warranted.

In Eq. (34), we observe that, in the absence of underflow, $x_{n+1}$ is computed accurately to machine precision if

$$\varphi_n \ge \varepsilon.$$

Thus, the computation terminates correctly if

$$\varphi_n x_n \ge \varepsilon x_n \ge u,$$

that is, when

$$x_n \ge u\varepsilon^{-1}.$$

Because $x_n$ is an increasing function of $n$, it will be sufficient to satisfy the condition

$$x_0 \ge u\varepsilon^{-1}.$$

These considerations define the following scaling rule:

If $\max(|a|, |b|) < u\varepsilon^{-1}$, multiply $a$ and $b$ by $\varepsilon^{-1}$;
Accordingly, multiply the result by $\varepsilon$.

For example, long-precision computation with an IBM System/370 defines

$$\varepsilon = 16^{-13} \quad \text{and} \quad u = 16^{-65}$$

and the scaling condition

$$\max(|a|, |b|) < 16^{-52}$$

for a factor

$$\varepsilon^{-1} = 16^{13}.$$

It must be noted that any reasonable integer power of $\beta$ not less than $\varepsilon^{-1}$ can be used as a scaling factor and that values higher than $u\varepsilon^{-1}$ can replace the threshold of applicability with little loss of efficiency in order to achieve portability across a wide range of computers.

9. Examples
The iteration formulas of orders 2 through 9 are represented in Tables 1 and 2 by the functions $P$, $Q$, and $S$ that appear in iterations (25) and (26) (the iteration subscripts and the order superscripts are dropped for simplicity of notation).
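As a further example, the scaling rule of the preceding section (multiply $a$ and $b$ by $\varepsilon^{-1}$ when $\max(|a|, |b|) < u\varepsilon^{-1}$, then multiply the result by $\varepsilon$) can be sketched around the cubic iteration. The values of $u$ and $\varepsilon$ are assumptions taken from Python's `sys.float_info` for IEEE double precision, not the System/370 values of the text, and the function names are hypothetical. Because $\varepsilon$ is a power of the radix here, the scaling itself introduces no rounding error.

```python
import sys


def pythag_cubic(a, b):
    """Cubic iteration with P = 2 and S = r / (4 + r) (order 3 formulas)."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    while y != 0.0:
        r = (y / x) ** 2
        if 4.0 + r == 4.0:             # fl(4) = fl(4 + r): stop
            break
        s = r / (4.0 + r)
        x += 2.0 * s * x
        y *= s
    return x


def pythag_scaled(a, b, u=sys.float_info.min, eps=sys.float_info.epsilon):
    """Apply the scaling rule only when max(|a|, |b|) < u / eps."""
    if max(abs(a), abs(b)) < u / eps:
        return pythag_cubic(a / eps, b / eps) * eps
    return pythag_cubic(a, b)


print(pythag_scaled(3e-300, 4e-300))   # small data, scaled before iterating
print(pythag_scaled(3.0, 4.0))         # ordinary data, no scaling applied
```

On machines with gradual underflow the failure mode described in the text is softened into a loss of precision, so this sketch illustrates the rule rather than reproducing the original behavior.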
k   Iterates $x_n$
2   180.0000000000000
    180.9972222648517
    180.9999999786853
    181.0000000000000
3   180.0000000000000
    180.9999923053839
    181.0000000000000
4   180.0000000000000
    180.9999999786853
    181.0000000000000
5   180.0000000000000
    180.9999999999410
    181.0000000000000
6   180.0000000000000
    180.9999999999998
    181.0000000000000

10. Conclusion
While the efficiency of the Moler-Morrison algorithm may be optimum for most current computers, higher-order formulas can be useful for systems with slow division. They find an obvious use in computations involving multiple-precision software, where division can be particularly expensive. By reducing the number of iterations, they are also advantageous for implementation with interpretive high-level languages (e.g., APL).

11. Acknowledgment
The author is indebted to Cleve Moler for the communication of the cubic algorithm and its geometric derivation.
Augustin A. Dubrulle IBM Scientific Center, 1530 Page Mill Road, Palo Alto, California 94304. Mr. Dubrulle is currently working on numerical problems of seismic oil exploration. His education includes degrees in mathematics (MS, 1958) and physics (MS, 1959) from the University of Lille, France. Before joining the Palo Alto Scientific Center in 1977, he worked as a numerical analyst for IBM France (1962-1967), in the Data Processing Division (1968-1974), and in the General Products Division (1975-1976). His main interests are in numerical linear algebra, functional approximation, floating point arithmetic, and mathematical software. Mr. Dubrulle is a member of the Association for Computing Machinery, the Society for Industrial and Applied Mathematics, and the Special Interest Group on Numerical Mathematics.