
MTH375

Numerical Computation
Prof. Dr. Tahira Haroon
Chairperson, Department of
Mathematics,
COMSATS Institute of Information
Technology,
Park Road, Chak Shahzad,
Islamabad

NUMERICAL COMPUTATION
A numerical process that reduces the original problem to a repetition of the same
step or series of steps, so that the computations become automatic, is called a
numerical method, and a numerical method that can be used to solve a problem will be called an algorithm. An algorithm is a complete
and unambiguous set of procedures leading to the solution of a mathematical problem. The selection or construction of appropriate algorithms properly falls within
the discipline of numerical analysis. Numerical analysts should consider all the
sources of error that may affect the results. They must consider how much accuracy is required, estimate the magnitude of the round-off and discretization errors,
determine an appropriate step size or the number of iterations required, provide for
adequate checks on accuracy, and make allowance for corrective action in case of
non-convergence.
Representation of numbers on computers and the errors introduced by these
representations are as follows:
The number 257, for example, is expressible as

257 = 2 \cdot 10^2 + 5 \cdot 10^1 + 7 \cdot 10^0

We call 10 the base of this system. Any integer is expressible as a polynomial in the
base 10 with integral coefficients between 0 and 9 as

N = (a_n a_{n-1} a_{n-2} \cdots a_0)_{10} = a_n 10^n + a_{n-1} 10^{n-1} + a_{n-2} 10^{n-2} + \cdots + a_0 10^0
Modern computers read pulses sent by electrical components. The state of an
electrical impulse is either on or off. It is therefore, convenient to represent numbers
in computers in the binary system, the base 2, and the integer coefficient may take
the values 0 and 1. A nonnegative integer N will be represented in the binary system
as
N = (a_n a_{n-1} a_{n-2} \cdots a_0)_2 = a_n 2^n + a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \cdots + a_0 2^0

where the coefficients a_k are either 0 or 1. Note that N is again represented as a
polynomial, but now in the base 2.
Users of computers prefer to work in the more familiar decimal system. The computer converts their inputs to base 2 (or perhaps base 16), then performs base-2
arithmetic, and finally translates the answer into base 10 before printing it out to
them.
Conversion of the binary number to decimal may be accomplished as
(11)_2 = 1 \cdot 2^1 + 1 \cdot 2^0 = 3
(1101)_2 = 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 13
and decimal number to binary as
187 = (187)10 = (10111011)2
However, if we look into the machine languages, we soon realize that other number
systems, particularly the octal and hexadecimal systems, are also used. Hexadecimal
provides more efficient use of memory space for real numbers.
It is easy to convert from octal to binary and back since three binary digits make
one octal digit. To convert from octal to binary one merely replaces all octal digits
by their binary equivalent; thus
187 = (187)_{10} = (1)(10)^2 + (8)(10)^1 + (7)(10)^0
    = (1)_8 (12)_8^2 + (10)_8 (12)_8^1 + (7)_8 (12)_8^0
    = (12)_8 ((12)_8 + (10)_8) + (7)_8
    = (12)_8 (22)_8 + (7)_8
    = (264)_8 + (7)_8
    = (273)_8
    = (2\ 7\ 3)_8
    = (010\ 111\ 011)_2
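As an illustration of these conversions, the following short Python sketch (not part of the original notes) converts a nonnegative integer to its digit string in a given base and checks the worked example 187 = (10111011)_2 = (273)_8; the helper name to_base is purely illustrative.

def to_base(n, base):
    """Convert a nonnegative integer to a digit string in the given base (2-10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))   # least significant digit first
        n //= base
    return "".join(reversed(digits))

print(to_base(187, 2))                    # 10111011
print(to_base(187, 8))                    # 273
print(int("10111011", 2), int("273", 8))  # both print 187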

The Three Number Systems


Besides the various bases for representing numbers - decimal, binary, and octal - there are also three distinct number systems that are used in computing machines.
First, there are the integers, or counting numbers, for example 0, 1, 2, 3, \ldots,
which are used to index and count and have limited usage for numerical analysis.
Second there are the fixed-point numbers. For example
367.143 258 765
593, 245.678 953
0.001 236 754 56
The fixed-point number system is the one that the programmer has implicitly used
during much of his own calculation. Perhaps the only feature that is different in
hand and machine calculations is that the machine always carries the same number
of digits, whereas in hand calculation the user often changes the number of figures
he carries to fit the current needs of the problem.

Third, there is the floating-point number system, which is the one used in
almost all practical scientific and engineering computations. This number system
differs in significant ways from the fixed-point number system. Typically, the computer word length includes both the mantissa and the exponent; thus the number of
digits in the mantissa of a floating-point number is less than in that of a fixed-point
number.

Floating-Point Arithmetic
Scientific and engineering calculations are, in nearly all cases, carried out in floating-point arithmetic. The computer has a number of values it chooses from to store as
an approximation to the real number. The term real numbers refers to the continuous
(and infinite) set of numbers on the number line. When printed as a number with
a decimal point, it is either fixed-point or floating-point, in contrast to integers.
Floating-point numbers have three parts:
1. the sign (which requires one bit);
2. the fraction part, often called the mantissa but better characterized by the
name significand;
3. the exponent part, often called the characteristic.
The significand bits (digits) constitute the fractional part of the number. In almost
all cases, numbers are normalized, meaning that the fraction digits are shifted and the
exponent adjusted so that the first fraction digit a_1 is nonzero, e.g.,

27.39 \rightarrow +.2739 \times 10^2;
-0.00124 \rightarrow -.1240 \times 10^{-2};
+37000 \rightarrow +.3700 \times 10^5.
Observe that we have normalized the fractions: the first fraction digit is nonzero. Zero
is a special case; it usually has a fraction part with all zeros and a zero exponent.
This kind of zero is not normalized and never can be.
Most computers permit two or even three types of numbers:
1. single precision, which uses the letter E in the exponent and is usually equivalent
to seven to nine significant decimal digits;
2. double precision, which uses the letter D in the exponent instead of E and varies from
14 to 29 significant decimal digits, but is typically about 16 or 17;
3. extended precision, which may be equivalent to 19 to 20 significant decimal
digits.

Calculation in double precision usually doubles the storage requirements and
more than doubles the running time as compared with single precision.
The finite range of the exponent is also a source of trouble, namely, what are
called overflow and underflow, which refer respectively to numbers exceeding
the largest-sized and falling below the smallest-sized (non-zero) numbers that can be represented
within the system.
Numerical methods provide estimates that are very close to the exact analytical solutions; obviously, an error is introduced into the computation. This error is
not a human error, such as a blunder, mistake, or oversight, but rather a discrepancy between the exact and approximate (computed) values. In fact, numerical
analysis is a vehicle to study errors in computations. It is not a static discipline. The continuous change in this field is to devise algorithms that are both
fast and accurate. These algorithms may become obsolete and may be replaced
by algorithms that are more powerful. In the practice of numerical analysis it is
important to be aware that computed solutions are not exact mathematical solutions, but numerical methods should be sufficiently accurate(1) (or unbiased) to meet
the requirements of a particular scientific problem, and they also should be precise(2)
enough. The precision of a numerical solution can be diminished in several subtle
ways. Understanding these difficulties can often guide the practitioner in the proper
implementation and/or development of numerical algorithms.

Error Analysis
Error analysis is the study and evaluation of error. An error in a numerical
computation is simply the difference between the actual (true) value of a quantity
and its computed (approximate) value. There are three common ways to express
the size of the error in a computed result: Absolute error, Relative error and
Percentage Error.
Suppose that x^* is an approximation (computed value) to x. The error is

\epsilon = x - x^*.

Absolute Error: The absolute error of a given result is defined as

absolute error = |true value - approximate value|
E_a = |x - x^*|
(1) Accuracy is the number of digits to which an answer is correct.
(2) Precision is the number of digits in which a number is expressed or given, irrespective of the correctness of the digits.

Relative Error: The relative error is

relative error = |true value - approximate value| / |true value|

E_r = \frac{E_a}{|x|},    x \neq 0.

If the actual value is not known, then

E_r = \frac{E_a}{|x^*|},    x^* \neq 0,

is often a better indicator of the accuracy.

Percentage error: Relative error expressed as a percentage is called the percentage error, defined by

PE = 100 \times E_r
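The three error measures can be computed directly; the following minimal Python sketch (illustrative only; the function name error_measures is not standard) evaluates them for the familiar approximation 22/7 to pi.

def error_measures(true_value, approx_value):
    absolute = abs(true_value - approx_value)
    relative = absolute / abs(true_value)        # requires true_value != 0
    percentage = 100.0 * relative
    return absolute, relative, percentage

# e.g. approximating pi by 22/7
ea, er, pe = error_measures(3.141592653589793, 22 / 7)
print(ea, er, pe)   # roughly 0.00126, 0.000402, 0.0402 (percent)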

Significant Digits
In considering rounding errors, it is necessary to be precise in the usage of approximate digits. A significant digit in an approximate number is a digit, which gives
reliable information about the size of the number. In other words, a significant digit
is used to express accuracy, i.e., how many digits in the number have meaning.

Errors
The main sources of error are
Gross errors, can be avoided by taking enough care,
Errors in original data, nothing can be done to overcome such errors by any
choice of method, but we need to be aware of such uncertainties; in particular,
we may need to perform tests to see how sensitive the results are to changes
in the input information,
Truncation errors, due to the finite representation of processes,
Round-off errors, due to the finite representation of numbers in the machine.

They all cause the same effect: deviation from the exact answer. Some errors
are small and may be neglected, while others may be devastating if overlooked.
Rounding error is the most basic source of error in a computer. It occurs
when a calculator or computer is used to perform real-number calculations. This
error arises because the arithmetic performed in a machine involves numbers with
only a finite number of digits, say n significant digits, obtained by rounding off the (n + 1)th
place and dropping all digits after the nth, with the result that calculations are
performed with approximate representations of the actual numbers. The error that
results from replacing a number with its floating-point form is called round-off
error (regardless of whether the rounding or chopping method is used).
When the machine drops digits without rounding, this is called chopping; it can cause
serious trouble.
Round-off causes trouble mainly when two numbers of about the same size are
subtracted.
A second, more insidious trouble with round-off, especially with chopping, is
the presence of internal correlations between numbers in the computation so that,
step after step, the small error is always in the same direction.

Error Accumulation in Computations


1. Error Accumulation in Addition,
2. Error Accumulation in Subtraction,
3. Error Accumulation in Multiplication,
4. Error Accumulation in Division,
5. Errors of Powers and Roots,
6. Error in Function Evaluation,

Propagated Error
The local error at any stage of the calculation is propagated throughout the remaining part of the computation, i.e., the error in the succeeding steps of the process due
to the occurrence of an earlier error. Propagated error is more subtle than the
other errors; such errors are in addition to the local errors. Propagated error is of
critical importance. If errors are magnified continuously as the method continues,
eventually they will overshadow the true value, destroying its validity; we call such
a method unstable. For a stable method-the desirable kind- errors made at early
points die out as the method continues.

Numerical Cancellation
Accuracy is lost when two nearly equal numbers are subtracted. Thus care should
be taken to avoid such subtraction where possible, because this is the major source
of error in floating point operations.
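A small Python sketch of this effect, assuming IEEE double precision: the quantity 1 - cos(x) for tiny x loses all of its significant digits to cancellation, while the algebraically equivalent form 2 sin^2(x/2) does not.

import math

x = 1.0e-8
naive = 1.0 - math.cos(x)              # subtraction of nearly equal numbers
stable = 2.0 * math.sin(x / 2.0) ** 2  # equivalent form, no cancellation

print(naive)    # 0.0 -- all significant digits lost
print(stable)   # about 5.0e-17, the correct value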

Errors in Converting Values


The numbers that are input to a computer are ordinarily base-10 values. Thus the
input must be converted to the computer's internal number base, normally base 2.
This conversion itself causes some errors.

Machine eps
One important measure in computer arithmetic is how small a difference between
two values the computer can recognize. This quantity is termed the computer eps.
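A common way to estimate this quantity is to halve a trial value until adding it to 1.0 no longer changes the stored result; the short Python sketch below does this and compares with the value Python itself reports.

eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0

print(eps)                       # about 2.22e-16 for IEEE double precision
import sys
print(sys.float_info.epsilon)    # the epsilon Python reports for its floats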

The Solution of Nonlinear Equations


Bisection Method
In the bisection method, to solve the equation f(x) = 0, we first must know an
interval in which a root lies. Suppose an interval x_1 \le x \le x_2 within the interval
[a, b] has been found such that

f(x_1) f(x_2) < 0

and the method undertakes to decrease the size of the interval. This decrease is
accomplished by evaluating the function f(x) at the midpoint, (x_1 + x_2)/2, of the interval,
then using the condition

f(x_1) f((x_1 + x_2)/2) = 0 :  a zero at (x_1 + x_2)/2
f(x_1) f((x_1 + x_2)/2) < 0 :  new interval (x_1, (x_1 + x_2)/2)
f(x_1) f((x_1 + x_2)/2) > 0 :  new interval ((x_1 + x_2)/2, x_2)
The magnitude of this error estimate (which is the interval size after n iteration
steps) is precisely

error = \frac{b - a}{2^n}

This error estimate does not take machine errors into account, which are handled separately.
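A minimal Python sketch of the bisection procedure just described (the function name bisect and the tolerance handling are illustrative choices, not taken from the notes):

def bisect(f, x1, x2, tol=1e-10, max_iter=100):
    if f(x1) * f(x2) >= 0:
        raise ValueError("f(x1) and f(x2) must have opposite signs")
    for _ in range(max_iter):
        mid = (x1 + x2) / 2.0
        if f(x1) * f(mid) < 0:
            x2 = mid            # root lies in the left half
        else:
            x1 = mid            # root lies in the right half (or mid is a root)
        if abs(x2 - x1) < tol:
            break
    return (x1 + x2) / 2.0

print(bisect(lambda x: x**2 - 2.0, 1.0, 2.0))   # approximately 1.41421356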

Advantages

The main advantage of the bisection method is that it is guaranteed to work if
f(x) is continuous in [a, b] and if the values x = a and x = b actually bracket a
root.
Another important advantage, which few other root-finding methods share, is
that the number of iterations needed to achieve a specified accuracy is known in advance.
Each repetition halves the length of the interval, and 10 repetitions, for example, reduce the length of the original interval by a factor of 2^{10} = 1024 > 1000 = 10^3.
Thus 10 or, at most, 20 repetitions are all that are likely to be required.

Objection
This method is slow to converge.
The possibilities to end the cycle of repetitions are
1) |x_1 - x_2| \le \epsilon                absolute accuracy in x,
2) |(x_1 - x_2)/x_1| \le \epsilon          relative accuracy in x (except for x_1 = 0),
3) |f(x_1) - f(x_2)| \le \epsilon          function values small,
4) repeat n times                          a good method.
Warnings on the Bisection Method
The function must be continuous. For example, if

f(x) = 1/x

then the bisection process will come down to a small interval about x = 0, and
probably none of our tests will have caught the error, because the bisection method
does not recognize the difference between a root and a singularity.

The Secant Method
Almost every function can be approximated by a straight line over a small interval. Let x_0 be near the root r, and assume that f(x) is linear in the vicinity of the
root r. Choose another point, x_1, which is near to x_0 and also near to r (which we
don't know yet); then from the obvious similar triangles we get

x_2 = x_0 - \frac{f(x_0)(x_0 - x_1)}{f(x_0) - f(x_1)}

Since f(x) is not exactly linear, x_2 is not equal to r, but it should be closer than
either of the two points we began with. We can continue to get better estimates of
the root if we do this repeatedly, always choosing the two x-values nearest to r for
drawing the straight line.
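A short Python sketch of this repeated secant step (illustrative; the stopping test on |x1 - x0| is one of several reasonable choices):

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            break                                   # avoid division by zero
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)        # the secant formula
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

print(secant(lambda x: x**3 - x - 1.0, 1.0, 2.0))   # approximately 1.3247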

Linear Interpolation (False-Position or Regula Falsi Method)


This method is a combination of the bisection method and the secant method. In this
method the assumption f(a)f(b) < 0 is taken from the bisection method, while the concept
of using similar triangles is taken from the secant method.


In the false position method, we cannot be sure of the number of steps required
to decrease the interval by a preassigned amount.
The Regula Falsi (False-Position) method is the same as the secant method,
except that it retains the condition

f(x_n) f(x_{n-1}) < 0,

which is not used in the secant method.

Modified False-Position Method


A simple modification to the false-position method greatly improves it. The main
weakness of the original method is its slow, one-sided approach to the zero. To
remedy this, at each step we arbitrarily divide by 2 the function value that we keep.

Newton-Raphson Method
One of the most widely used methods of solving equations is the Newton-Raphson
method. Starting from an initial estimate x_1 which is not too far from a root, we
extrapolate along the tangent to its intersection with the x-axis, and take that as
the next approximation. This is continued until either the successive x-values are
sufficiently close or the value of the function is sufficiently near zero.
The general formula is

x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.

This formula provides a method of going from one guess x_k to the next guess x_{k+1}.
Newton's method, when it works, is fine. The method does not always converge;
it may jump to another root or oscillate around the desired root.
Thus, in practice, unless the local structure of the function is well understood,
Newton's method is to be avoided.
Newton's method also works for complex roots.
Newton's method is widely used because, at least in the near neighborhood of
a root, it is quadratically convergent. However, offsetting this is the need for two
function evaluations at each step, f(x_n) and f'(x_n).
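A minimal Python sketch of the Newton-Raphson iteration above; the derivative is supplied explicitly, and the guard against a zero derivative is an added illustrative detail.

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        dfx = fprime(x)
        if dfx == 0.0:
            break                 # tangent is horizontal; give up
        x_new = x - fx / dfx
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# root of x^2 - 2 starting from x = 1.5
print(newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.5))   # 1.41421356...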

Muller's Method
Muller's method is based on approximating the function in the neighborhood of
the root by a quadratic polynomial. A second-degree polynomial is made to fit three
points near a root,

(x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)),

and the proper zero of this quadratic, using the quadratic formula, is used as the
improved estimate of the root. The process is then repeated using the three points
nearest the root being evaluated.
The procedure for Muller's method is developed by writing a quadratic equation
that fits through three points in the vicinity of the root. Then we get
x = x_0 - \frac{2c}{b \pm \sqrt{b^2 - 4ac}}

where

a = \frac{f_1 h_2 + f_2 h_1 - f_0 (h_2 + h_1)}{h_1^2 h_2 + h_1 h_2^2},
b = \frac{f_1 - f_0 - a h_1^2}{h_1},
c = f_0,
h_1 = x_1 - x_0,    h_2 = x_0 - x_2,

with the sign in the denominator taken to give the largest absolute value of the
denominator (i.e., if b > 0, choose plus; if b < 0, choose minus; if b = 0, choose
either).
We take the root of this quadratic as one of the set of three points for the next
approximation, choosing the three points that are most closely spaced (i.e., if the root is
to the right of x_0, take x_0, x_1, and the root; if to the left, take x_0, x_2, and the root).
Always reset the subscripts to make x_0 the middle of the three values.

Fixed-Point Iteration
Fixed-point iteration is a possible method for obtaining a root of the equation

f(x) = 0.        (1)

In this method, we rearrange equation (1) into the form

g(x) - x = 0,    or    x = g(x),        (2)

so that any solution of (2), i.e., any fixed point of g(x), is a solution of (1).
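A small Python sketch of fixed-point iteration for x = g(x); the particular rearrangement g used in the example is only one of many possible choices.

def fixed_point(g, x0, tol=1e-12, max_iter=200):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Solve x^2 - 2 = 0 rewritten as x = (x + 2/x)/2 (one of many possible g's)
print(fixed_point(lambda x: (x + 2.0 / x) / 2.0, 1.0))   # 1.41421356...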

Methods to Solve a System of Linear Equations


A matrix is a rectangular array of numbers in which not only the value of the
number is important but also its position in the array. The number of its rows and
columns describes the size of the matrix. A matrix of n rows and m columns is said
to be n \times m.
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3m} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm} \end{pmatrix} = [a_{ij}],    i = 1, 2, 3, \ldots, n,    j = 1, 2, 3, \ldots, m

Two matrices of the same size may be added or subtracted. The sum (difference) of A =
[a_{ij}] and B = [b_{ij}] is the matrix whose elements are the sums (differences) of the corresponding
elements of A and B:

C = A \pm B = [a_{ij} \pm b_{ij}] = [c_{ij}].

Multiplication of two matrices is defined when the number of columns of the first matrix
is equal to the number of rows of the second matrix, i.e., when A is n \times m and B is m \times r:

[c_{ij}] = [a_{ij}][b_{ij}]    or    c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj},    i = 1, 2, 3, \ldots, n,    j = 1, 2, 3, \ldots, r.

If A is n \times m, B must have m rows, or else they are said to be nonconformable
for multiplication and their product is undefined. In general, AB \neq BA, so the
order of factors must be preserved in matrix multiplication.
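The row-by-column rule c_ij = sum_k a_ik b_kj can be spelled out directly; the following sketch (illustrative only, assuming NumPy is available) compares the explicit loops with NumPy's built-in product.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # 2 x 3
B = np.array([[7.0, 8.0],
              [9.0, 10.0],
              [11.0, 12.0]])             # 3 x 2

n, m = A.shape
m2, r = B.shape
C = np.zeros((n, r))
for i in range(n):
    for j in range(r):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))

print(C)
print(A @ B)        # NumPy's own product agrees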
If k is a scalar, then

kA = C,    or    c_{ij} = k a_{ij}

A matrix with only one column, n \times 1 in size, is termed a column vector, and one
with only one row, 1 \times m in size, is called a row vector. Normally, the term vector is
used for a column vector. If A is n \times n, it is called a square matrix.
The set of n linear equations in m unknowns

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1m} x_m = b_1
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2m} x_m = b_2
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + \cdots + a_{3m} x_m = b_3        (3)
\vdots
a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{nm} x_m = b_n

can be written much more simply in matrix notation as

Ax = b,
where

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3m} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm} \end{pmatrix},
\quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_m \end{pmatrix},
\quad
b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}

A very important special case is the multiplication of two vectors. When this product gives
a matrix of one row and one column, a pure number, a scalar, the product is
called the scalar product of the vectors, or inner product.
If we reverse the order of multiplication of these two vectors, we get a matrix; this
product is called the outer product.
If all the elements above the diagonal are zero, a matrix is called lower-triangular;
it is called upper-triangular when all the elements below the diagonal are zero. If
only the diagonal terms are nonzero, the matrix is called a diagonal matrix. When
the diagonal elements are each equal to unity while all off-diagonal elements are zero,
the matrix is said to be the identity matrix. Tridiagonal matrices are those
that have nonzero elements only on the diagonal and in the positions adjacent to the
diagonal. When a matrix is square, a quantity called its trace is defined, which is
the sum of the elements on the main diagonal. All the elements of the null matrix
are zero. For a matrix defined by A = [a_{ij}], its transpose is defined by A^T = [a_{ji}].
The inverse of a matrix A is written as A^{-1} and satisfies AA^{-1} = A^{-1}A = I. A
matrix that has orthonormal columns is called an orthogonal matrix. A vector
whose length is one is called a unit vector. A vector that has all its elements equal
to zero except one element, which has a value of unity, is called a unit basis vector.
There are three distinct unit basis vectors for order-3 vectors. Null vectors are
defined as the vectors with all elements zero. The transpose of the vector
u = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} is given by u^T = [x_1\ x_2\ x_3].

Sigma Notation

\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n

\sum_{i=1}^{n} c = c \sum_{i=1}^{n} 1 = c(1 + 1 + 1 + \cdots + 1) = cn

\sum_{i=1}^{1} x_i = x_1

\sum_{i=1}^{n} x_i = \sum_{j=1}^{n} x_j = \sum_{k=1}^{n} x_k,    where i, j, k are dummy indices.

System (3) can be written as

\sum_{j=1}^{m} a_{1j} x_j = b_1
\sum_{j=1}^{m} a_{2j} x_j = b_2
\sum_{j=1}^{m} a_{3j} x_j = b_3
\vdots
\sum_{j=1}^{m} a_{nj} x_j = b_n

or

\sum_{j=1}^{m} a_{ij} x_j = b_i,    i = 1, 2, 3, \ldots, n.

Product Notation

\prod_{i=1}^{n} x_i = x_1 x_2 x_3 \cdots x_n

The Gaussian Elimination Method


A method in which the unknowns from the set of equations are eliminated by combining equations is known as an elimination method. It is called Gaussian elimination if a particular systematic scheme, attributed to Gauss, is used in the elimination process. This method is classified as a direct method. Using Gauss's
method, a set of n equations in n unknowns is reduced to an equivalent triangular
set, which is then easily solved by back substitution.
To program Gauss's elimination method efficiently for the computer, we write a general
procedure for reducing the matrices as

a_{ij}^{k} = a_{ij}^{k-1} - \frac{a_{ik}^{k-1}}{a_{kk}^{k-1}} a_{kj}^{k-1},    k+1 \le j \le m,    k+1 \le i \le n        (4)

where the i's, j's, k's, and so on, are as previously defined. The superscripts shown
merely correspond to the primes used in identifying successive reduced matrices,
and are not needed in a computer program.
The back-substitution procedure may be generalized in the form of the following
set of equations:

x_n = \frac{a_{nm}}{a_{nn}}        (5)

x_i = \frac{a_{im} - \sum_{j=i+1}^{n} a_{ij} x_j}{a_{ii}},    i = n-1, n-2, \ldots, 1        (6)

There are two points, yet to be considered. First, we must guard against dividing by zero. Observe that zeros may be created in the diagonal positions even
if they are not present in the original matrix of coefficients. A useful strategy to
avoid (if possible) such zero divisors is to rearrange the equations so as to put the
coefficient of large magnitude on the diagonal at each step. This is called pivoting.
Complete pivoting may require both row and column interchanges. This is not
frequently done. Partial pivoting which places a coefficient of larger magnitude
on the diagonal by row interchanges only, will guarantee a nonzero divisor if there
is a solution to the set of equations, and will have the added advantage of improved
arithmetic precision. The diagonal elements that result are called pivot elements.
The second important point is the effect of the magnitude of the pivot elements
on the accuracy of the solution. If the magnitude of the pivot element is appreciably
smaller than the magnitude (absolute), in general, of the other elements in the
matrix, the use of the small pivot element will cause a decrease in the solution
accuracy. Therefore, for overall accuracy, using as a pivot row the row having the
largest pivot element should make each reduction. Such a provision should always
be incorporated in a computer program that is to solve fairly large numbers of
simultaneous equations.
When only a small number of equations are to be solved, the round-off error is
small and usually does not substantially affect the accuracy of the results, but if
many equations are to be solved simultaneously, the cumulative effect of round-off
error can introduce relatively large solution errors. For this reason, the number of
simultaneous equations, which can be satisfactorily solved by Gauss's elimination
method, using seven to ten significant digits in the arithmetic operations, is generally
limited to 15 to 20 when most or all of the unknowns are present in all of the
equations (the coefficient matrix is dense). On the other hand if only a few unknowns
are present in each equation (the coefficient matrix is sparse), many more equations
may be satisfactorily handled.
The number of equations which can be accurately solved also depends to a great
extent on the condition of the system of equations. If a small relative change in
one or more of the coefficients of a system of equations results in a small relative
change in the solution, the system of equations is called a well-conditioned system.
If, however, a small relative change in one or more of the coefficient values results
in a large relative change in solution values, the system of equations is said to be
ill conditioned. Since small changes in the coefficients of an equation may result

from round-off error, the use of double-precision arithmetic and partial pivoting
or complete pivoting becomes very important in obtaining meaningful solutions
of such sets of equations.
There exists the possibility that the set of equations has no solution or that
the prior procedure will fail to find it. During the triangularization step, if a
zero is encountered on the diagonal, we cannot use that row to eliminate coefficients
below that zero element. However, in that case, we will continue by interchanging
rows and eventually achieve an upper triangular matrix of coefficients. The real
stumbling block is finding a zero on the diagonal after we have triangularized. If
that occurs, the back substitution fails, for we cannot divide by zero. It also means
that the determinant is zero: there is no solution.
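A compact Python/NumPy sketch of Gaussian elimination with partial pivoting and back substitution, along the lines described above (the function name gauss_solve and the details of the pivot search are illustrative choices):

import numpy as np

def gauss_solve(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # partial pivoting: bring the largest |a_ik| (i >= k) onto the diagonal
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gauss_solve(A, b))          # [ 2.  3. -1.]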

Gauss-Jordan Elimination Methods


This procedure varies from the Gaussian method in that, when an unknown is
eliminated, it is eliminated from all the other equations, i.e., from those preceding
the pivot equation as well as those following it. This eliminates the necessity of
using the back-substitution process employed in Gauss's method.
The elements of new matrix B can be evaluated from the elements of old matrix
A by using the following formulas:

b_{i-1,j-1} = a_{ij} - \frac{a_{1j} a_{i1}}{a_{11}},    1 < i \le n,    1 < j \le m,    a_{11} \neq 0        (7)

b_{n,j-1} = \frac{a_{1j}}{a_{11}},    1 < j \le m,    a_{11} \neq 0        (8)

Equation (7) is used to find all elements of the new matrix B except those making
up the last row of that matrix. For determining the elements of the last row of the
new matrix, equation (8) is used. In these equations,
i = row number of old matrix A
j = column number of old matrix A
n = maximum number of rows
m = maximum number of columns
a = an element of old matrix A
b = an element of new matrix B

Cholesky's Method
Cholesky's method is also known as Crout's method. Crout's method transforms the
coefficient matrix, A, into the product of two matrices, L (a lower triangular matrix)
and U (an upper triangular matrix), where U has ones on its main diagonal (the method
in which L has the ones on its diagonal is known as Doolittle's method).
The general formulas for getting the elements of L and U corresponding to the coefficient matrix for n simultaneous equations can be written as

l_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj},    j \le i,    i = 1, 2, 3, \ldots, n        (9)

u_{ij} = \frac{a_{ij} - \sum_{k=1}^{i-1} l_{ik} u_{kj}}{l_{ii}},    i < j,    j = 2, 3, \ldots, n+1        (10)
If we make sure that a_{11} in the original matrix is nonzero, then the divisions of
equation (10) will always be defined, since the l_{ii} values will be nonzero. This may
be seen by noting that

LU = A

and therefore the determinant of L times the determinant of U equals the determinant of A; that is,

|L| |U| = |A|.

We are assuming independent equations, so the determinant of A is nonzero.

Norm
Discussing multicomponent entities like matrices and vectors, we frequently need
a way to express their magnitude - some measure of bigness or smallness. For
ordinary numbers, the absolute value tells us how large the number is, but a
matrix has many components, each of which may be large or small in
magnitude. (We are not talking about the size of a matrix, meaning the number of
elements it contains.)
Any good measure of the magnitude of a matrix (the technical term is norm)
must have four properties that are intuitively essential:
1. The norm must always have a value greater than or equal to zero, and must be
zero only when the matrix is the zero matrix, i.e.,
\|A\| \ge 0 and \|A\| = 0 if and only if A = 0.
2. The norm must be multiplied by |k| if the matrix is multiplied by the scalar k,
i.e.,
\|kA\| = |k| \|A\|.
3. The norm of the sum of two matrices must not exceed the sum of the norms,
i.e.,
\|A + B\| \le \|A\| + \|B\|.
4. The norm of the product of two matrices must not exceed the product of the
norms, i.e.,
\|AB\| \le \|A\| \|B\|.
The third relationship is called the triangular inequality. The fourth is important
when we deal with the product of matrices.
For vectors in two- or three-space, the length satisfies all four requirements and
is a good value to use for the norm of a vector. This norm is called the Euclidean
norm, and is computed by

\sqrt{x_1^2 + x_2^2 + x_3^2}.

Its generalized form is

\|x\|_e = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}

This is not the only way to compute a vector norm, however. The sum of the absolute
values of the x_i can be used as a norm; the maximum value of the magnitudes of
the x_i will also serve. These three norms can be interrelated by defining the p-norm
as

\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.

From this it is readily seen that

\|x\|_1 = \sum_{i=1}^{n} |x_i| = sum of magnitudes;

\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} = Euclidean norm;

\|x\|_\infty = \max_{1 \le i \le n} |x_i| = maximum-magnitude norm.

Which of these vector norms is best to use may depend on the problem.

The norms of a matrix are developed by correspondence to vector norms. Matrix
norms that correspond to the above, for a matrix A, are

\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| = maximum column sum;

\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| = maximum row sum.

The matrix norm \|A\|_2 that corresponds to the 2-norm of a vector is not readily
computed. It is related to the eigenvalues of the matrix. This norm is also called
the spectral norm.
For an m \times n matrix, we can paraphrase the Euclidean (also called Frobenius)
norm as

\|A\|_e = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}

Why are norms important? For one thing, they let us express the accuracy of the
solution to a set of equations in quantitative terms by stating the norm of the error
vector (the true solution minus the approximate solution vector). Norms are also
used to study quantitatively the convergence of iterative methods for solving linear
systems.
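The vector and matrix norms above can be checked against NumPy's implementations; the following sketch is illustrative only and assumes NumPy is available.

import numpy as np

x = np.array([3.0, -4.0, 12.0])
print(np.sum(np.abs(x)), np.linalg.norm(x, 1))        # 1-norm: 19
print(np.sqrt(np.sum(x**2)), np.linalg.norm(x, 2))    # Euclidean norm: 13
print(np.max(np.abs(x)), np.linalg.norm(x, np.inf))   # max-magnitude norm: 12

A = np.array([[1.0, -2.0], [3.0, 4.0]])
print(np.max(np.sum(np.abs(A), axis=0)), np.linalg.norm(A, 1))       # max column sum: 6
print(np.max(np.sum(np.abs(A), axis=1)), np.linalg.norm(A, np.inf))  # max row sum: 7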

Iterative Methods
Iterative methods, as opposed to the direct methods of solving a set of linear equations
by elimination, are in certain cases preferred. Iterative techniques are seldom used
for solving linear systems of small dimension, since the time required for sufficient
accuracy exceeds that required by direct techniques. When the coefficient matrix
is sparse (has many zeros), they may be more rapid.

Jacobi and Gauss-Seidel Methods


Suppose we are given the system of linear equations A x = b, where A is nonsingular. It
can always be rearranged so that the diagonal elements are nonzero and

x = b' + C x,

where

b' = \begin{pmatrix} b_1/a_{11} \\ b_2/a_{22} \\ b_3/a_{33} \\ \vdots \\ b_n/a_{nn} \end{pmatrix},
\quad
C = \begin{pmatrix}
0 & -a_{12}/a_{11} & -a_{13}/a_{11} & -a_{14}/a_{11} & \cdots & -a_{1n}/a_{11} \\
-a_{21}/a_{22} & 0 & -a_{23}/a_{22} & -a_{24}/a_{22} & \cdots & -a_{2n}/a_{22} \\
-a_{31}/a_{33} & -a_{32}/a_{33} & 0 & -a_{34}/a_{33} & \cdots & -a_{3n}/a_{33} \\
\vdots & & & & & \vdots \\
-a_{n1}/a_{nn} & -a_{n2}/a_{nn} & -a_{n3}/a_{nn} & -a_{n4}/a_{nn} & \cdots & 0
\end{pmatrix}

Assuming we have an initial estimate x^{(0)} for x, the next estimate x^{(1)} is obtained
by substituting x^{(0)} on the right side of the above equation, and the next estimate
is obtained by substituting x^{(1)} on the right side of the equation to give x^{(2)}. In this
iterative technique, we obtain

x^{(n+1)} = b' + C x^{(n)}

If the elements of C are small in comparison to 1, A is said to be diagonally
dominant. The smaller the elements of C are in relation to 1, the more likely is
the sequence x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(n)} to converge. Nonetheless, convergence also may
depend on how good the initial approximation is. This method is called the Jacobi
Iterative Method. It consists of solving the ith equation in A x = b for x_i to
obtain (provided a_{ii} \neq 0)

x_i = \sum_{j=1, j \neq i}^{n} \left( -\frac{a_{ij} x_j}{a_{ii}} \right) + \frac{b_i}{a_{ii}},    for i = 1, 2, \ldots, n

and generating each x_i^{(k)} from the components of x^{(k-1)}, for k \ge 1, by

x_i^{(k)} = \frac{ \sum_{j=1, j \neq i}^{n} \left( -a_{ij} x_j^{(k-1)} \right) + b_i }{ a_{ii} },    for i = 1, 2, \ldots, n

Note that this method is exactly the same as the method of fixed-point iteration for
a single equation, but it is now applied to a set of equations. This method is written
in the form

x^{(k+1)} = G(x^{(k)}) = b' + C x^{(k)}

which is identical to the form

x^{(k+1)} = g(x^{(k)}).

The Jacobi method is also known as the method of simultaneous displacements, because all components of the estimate are changed simultaneously using the most
recently completed set of x-values. Actually, the x-values of the next trial (new x) are not used,
even in part, until we have first found all of its components; even though we have computed the new x_1, we still do not use this value in computing the new x_2. In nearly
all cases the new values are better than the old, and should be used in preference
to the poorer values. When this is done, that is, when computing x_i^{(k)}, the components
x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)} are used along with x_{i+1}^{(k-1)}, \ldots, x_n^{(k-1)}. Since, for i > 1, x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)} have already been computed
and are likely to be better approximations to the actual solutions x_1, x_2, \ldots, x_{i-1}
than x_1^{(k-1)}, x_2^{(k-1)}, \ldots, x_{i-1}^{(k-1)}, it seems reasonable to compute x_i^{(k)} using these most
recently calculated values; that is,

x_i^{(k)} = \frac{ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} }{ a_{ii} },    for each i = 1, 2, \ldots, n        (11)

This procedure is called the Gauss-Seidel Iterative Method. In this method our
first step is to rearrange the set of equations by solving each equation for one of
the variables in terms of the others, exactly as we have done in the Jacobi method.
We then proceed to improve each x-value in turn, always using the most recent
approximations to the values of the other variables. The rate of convergence is more
rapid than that of the Jacobi method.
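A short Python/NumPy sketch of both iterations for a small diagonally dominant system; the fixed iteration counts and the function names are illustrative choices.

import numpy as np

def jacobi(A, b, x0, iters=50):
    x = x0.copy()
    n = len(b)
    for _ in range(iters):
        x_new = np.empty(n)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        x = x_new
    return x

def gauss_seidel(A, b, x0, iters=50):
    x = x0.copy()
    n = len(b)
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i, i]      # newest values used immediately
    return x

A = np.array([[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])
x0 = np.zeros(3)
print(jacobi(A, b, x0))
print(gauss_seidel(A, b, x0))
print(np.linalg.solve(A, b))      # reference solution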

Interpolation and Polynomial Approximation


Suppose that we have some tabulated values of a function over a certain range of
its independent variables. The problem of interpolation is to find the value of the
function for some intermediate value of x not included in the table. This x value
is assumed to lie within the range of the tabulated abscissas. If it does not, the
problem of finding the corresponding function value is called extrapolation.
Tabular values may be plotted to give us a graphical picture of points through
which the function must pass. Connecting the points with a smooth curve gives
us a graphical representation of the function. If only a rough approximation of the
function value is required for some intermediate x value, it can be obtained by reading the function value directly from the graph. This procedure is called graphical
interpolation. If the given values of the independent variable are close together,
a sufficiently good graphical approximation to the function might be obtained by
connecting the points with straight line segments. An intermediate function value
can also be obtained analytically by a method based on this piecewise linear approximation of the function. From similar triangles
\frac{f(x) - f(x_0)}{x - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}

or

f(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x - x_0)        (12)

for x values between x_0 and x_1. The use of this equation is called linear interpolation, familiar to everyone who has used log tables.
If we are given the value of a function f(x) and are asked to find the corresponding value of
x, then the process is called inverse interpolation. For inverse interpolation the
same straight line is used, but the equation is rearranged into the more convenient
form

x = \frac{x_0 f(x_1) - x_1 f(x_0)}{f(x_1) - f(x_0)} + \frac{(x_1 - x_0) f(x)}{f(x_1) - f(x_0)}        (13)

Here we will discuss polynomial interpolation, the simplest and certainly the
most widely used technique for obtaining polynomial approximations. A best polynomial approximation does not give appreciably better results than an appropriate
scheme of polynomial interpolation.

Polynomial Interpolation
Several methods have been developed for finding polynomials, some of which make
use of special properties such as uniform spacing of the points of abscissa. Also, many
of these methods are useful for particular analytical or computational purposes. It
is important to remember that there is only one polynomial of degree at most n
which fits a given set of (n + 1) points. Hence, the polynomials obtained by the
different methods must be the same.

Polynomial Forms
Definition: A polynomial p(x) of degree n is a function of the form

p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n        (14)

with certain coefficients a_0, a_1, a_2, a_3, \ldots, a_n. This polynomial has (exact) degree
n in case its leading coefficient a_n is nonzero.
The power form (14) is the standard way to specify a polynomial in mathematical discussions. It is a very convenient form for differentiating or integrating a
polynomial. But, in various specific contexts, other forms are more convenient.
The power form may lead to loss of significance; a remedy to this loss is the use
of the shifted power form

p(x) = a_0 + a_1 (x - c) + a_2 (x - c)^2 + a_3 (x - c)^3 + \cdots + a_n (x - c)^n        (15)

where c is known as the center.

Derivation of Taylor's Series

Consider the nth-degree polynomial

p(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + \cdots + b_n x^n

Suppose p(x) is known at some point x = c and we are interested in the value of p(x)
in the neighborhood of this point. Then we can write

p(x) = b_0 + b_1 (x - c) + b_2 (x - c)^2 + b_3 (x - c)^3 + b_4 (x - c)^4 + \cdots + b_n (x - c)^n

where c is called the center. Since this equation is true for x = c, we have p(c) = b_0;
differentiating up to n times, with each derivative evaluated at x = c, we get

p(x) = p(c) + (x - c) p'(c) + \frac{(x - c)^2}{2!} p''(c) + \frac{(x - c)^3}{3!} p'''(c) + \cdots + \frac{(x - c)^n}{n!} p^{(n)}(c)        (16)

which is the Taylor polynomial of degree n about x = c.


A further generalization of the shifted power form, when n data points are given,
is the Newton form

p(x) = a_0 + a_1 (x - c_1) + a_2 (x - c_1)(x - c_2)
     + a_3 (x - c_1)(x - c_2)(x - c_3) + \cdots
     + a_n (x - c_1)(x - c_2) \cdots (x - c_n)        (17)

This form plays a major role in the construction of an interpolating polynomial. It
reduces to the shifted power form if the centers c_1, c_2, c_3, c_4, \ldots, c_n all equal c, and
to the power form if the centers c_1, c_2, c_3, c_4, \ldots, c_n all equal zero.
Evaluating equation (17) directly takes n + n(n+1)/2 additions and n(n+1)/2 multiplications; instead,

p(x) = a_0 + (x - c_1)\{a_1 + (x - c_2)\{a_2
     + (x - c_3)\{a_3 + \cdots + a_n (x - c_n)\}\} \cdots \}        (18)

is the nested form, whose evaluation for any particular value of x takes 2n additions
and n multiplications.
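The nested evaluation of equation (18) can be written as a short loop; the following Python sketch (with illustrative names, and centers indexed from zero) shows it for a small Newton-form polynomial.

def newton_form_eval(a, c, x):
    """Evaluate a[0] + a[1](x-c[0]) + a[2](x-c[0])(x-c[1]) + ... by nesting."""
    n = len(a) - 1
    p = a[n]
    for k in range(n - 1, -1, -1):
        p = a[k] + (x - c[k]) * p
    return p

# p(x) = 1 + 2(x-1) + 3(x-1)(x-2), evaluated at x = 3
print(newton_form_eval([1.0, 2.0, 3.0], [1.0, 2.0], 3.0))   # 1 + 4 + 6 = 11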

Lagrangian Polynomials
Data where the x-values are not equispaced often occur as the result of experimental
observations or when historical data are examined.
Consider a linear polynomial (the equation of a straight line passing through two
distinct data points (x_0, y_0) and (x_1, y_1)):

p(x) = \frac{(x - x_1)}{(x_0 - x_1)} y_0 + \frac{(x - x_0)}{(x_1 - x_0)} y_1

To generalize the concept of linear interpolation, consider the construction of a polynomial of degree n. For this, let (x_0, f_0), (x_1, f_1), (x_2, f_2), (x_3, f_3), \ldots, (x_n, f_n) be (n + 1)
data points. Here we don't assume uniform spacing between the x-values, nor do
we need the x-values arranged in a particular order. The x-values must be distinct,
however. The linear polynomial passing through (x_0, f(x_0)) and (x_1, f(x_1)) is
constructed by using the quotients

l_0(x) = \frac{(x - x_1)}{(x_0 - x_1)},        l_1(x) = \frac{(x - x_0)}{(x_1 - x_0)}

When x = x_0, l_0(x_0) = 1 and l_1(x_0) = 0. When x = x_1, l_0(x_1) = 0 and l_1(x_1) = 1.


Now consider three data points (x_0, f_0), (x_1, f_1), (x_2, f_2). Through these points
we need to construct l_0(x), l_1(x) and l_2(x) with the property that

l_0(x_0) = 1, l_0(x_1) = 0, l_0(x_2) = 0,
l_1(x_0) = 0, l_1(x_1) = 1, l_1(x_2) = 0,
l_2(x_0) = 0, l_2(x_1) = 0, l_2(x_2) = 1,

i.e., l_i(x_j) = 0 when i \neq j and l_i(x_j) = 1 when i = j. Following the pattern of the quotients
of linear interpolation we obtain

l_{2,0}(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)},
l_{2,1}(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)},
l_{2,2}(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}.

These quotients satisfy the required conditions. Therefore, the polynomial
passing through these three data points becomes

p_2(x) = f(x_0) l_{2,0}(x) + f(x_1) l_{2,1}(x) + f(x_2) l_{2,2}(x)
For (n + 1) data values we need to construct, for each k = 0, 1, 2, \ldots, n, a quotient
l_{n,k}(x) with the property that l_{n,k}(x_i) = 0 when i \neq k and l_{n,k}(x_k) = 1. To satisfy
l_{n,k}(x_i) = 0 for each i \neq k requires that the numerator of l_{n,k}(x) contain the term

(x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)

To satisfy l_{n,k}(x_k) = 1, the denominator and numerator of l_{n,k}(x) must be equal when
evaluated at x = x_k. Thus,

l_{n,k}(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}
           = \prod_{i=0, i \neq k}^{n} \frac{x - x_i}{x_k - x_i},    for each k = 0, 1, 2, 3, \ldots, n        (19)

The interpolating polynomial is easily described now that the form of l_{n,k} is
known. This polynomial, called the nth Lagrangian interpolating polynomial,
is defined as

p(x) = f(x_0) l_{n,0}(x) + f(x_1) l_{n,1}(x) + f(x_2) l_{n,2}(x) + \cdots + f(x_n) l_{n,n}(x)
     = \sum_{k=0}^{n} f(x_k) l_{n,k}(x)        (20)

and

l_{n,k}(x_j) = 1 if j = k, and 0 if j \neq k,    j = 0, 1, 2, 3, \ldots, n        (21)
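A direct Python sketch of equations (19) and (20); since three points of f(x) = x^2 determine the quadratic exactly, the interpolated value reproduces x^2 (the function name is illustrative).

def lagrange_interpolate(xs, fs, x):
    n = len(xs)
    total = 0.0
    for k in range(n):
        lk = 1.0
        for i in range(n):
            if i != k:
                lk *= (x - xs[i]) / (xs[k] - xs[i])   # build l_{n,k}(x)
        total += fs[k] * lk
    return total

# three points on f(x) = x^2, interpolated at x = 2.5
xs = [1.0, 2.0, 4.0]
fs = [1.0, 4.0, 16.0]
print(lagrange_interpolate(xs, fs, 2.5))   # 6.25, exact since f is quadratic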

Forward Differences
The first forward difference of a function f(x) with respect to an increment h of
the independent variable x is

\Delta f(x) = f(x + h) - f(x)

The operator \Delta always implies this operation on any function of x on which it
operates. Since \Delta f(x) is a function of x, it can be operated on by the operator \Delta,
giving

\Delta^2 f(x) = f(x + 2h) - 2f(x + h) + f(x)

The function \Delta^2 f(x) is called the second forward difference of f(x). Third,
fourth, and higher differences are similarly obtained. In summary, the forward
difference expressions are

\Delta f(x_i) = f(x_i + h) - f(x_i)
\Delta^2 f(x_i) = f(x_i + 2h) - 2f(x_i + h) + f(x_i)
\Delta^3 f(x_i) = f(x_i + 3h) - 3f(x_i + 2h) + 3f(x_i + h) - f(x_i)
\Delta^4 f(x_i) = f(x_i + 4h) - 4f(x_i + 3h) + 6f(x_i + 2h) - 4f(x_i + h) + f(x_i)
\vdots
\Delta^n f(x_i) = f(x_i + nh) - n f(x_i + (n-1)h) + \frac{n(n-1)}{2!} f(x_i + (n-2)h)
             - \frac{n(n-1)(n-2)}{3!} f(x_i + (n-3)h) + \cdots + (-1)^n f(x_i)

in which x_i denotes any specific value of x, such as x_0, x_1, x_2, and so forth.



The nth forward difference is often written as

\Delta^n f(x_i) = f(x_i + nh) - \binom{n}{1} f(x_i + (n-1)h) + \binom{n}{2} f(x_i + (n-2)h)
             - \binom{n}{3} f(x_i + (n-3)h) + \cdots + (-1)^k \binom{n}{k} f(x_i + (n-k)h)
             + \cdots + (-1)^n f(x_i)

where

\binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k!}

is the familiar symbol used for binomial coefficients.

Backward Differences
The first backward difference of f(x) with respect to the increment h is defined as

\nabla f(x) = f(x) - f(x - h)

The operator \nabla always implies this operation on the function of x on which it
operates, so that

\nabla[\nabla f(x_i)] = \nabla^2 f(x_i) = f(x_i) - 2f(x_i - h) + f(x_i - 2h)

where x_i denotes any specific value of x, such as x_0, x_1, and so forth. In general,
\nabla^n f(x_i) = \nabla[\nabla^{n-1} f(x_i)], and the backward differences of f(x_i) are

\nabla f(x_i) = f(x_i) - f(x_i - h)
\nabla^2 f(x_i) = f(x_i) - 2f(x_i - h) + f(x_i - 2h)
\nabla^3 f(x_i) = f(x_i) - 3f(x_i - h) + 3f(x_i - 2h) - f(x_i - 3h)
\nabla^4 f(x_i) = f(x_i) - 4f(x_i - h) + 6f(x_i - 2h) - 4f(x_i - 3h) + f(x_i - 4h)
\vdots
\nabla^n f(x_i) = f(x_i) - n f(x_i - h) + \frac{n(n-1)}{2!} f(x_i - 2h) - \frac{n(n-1)(n-2)}{3!} f(x_i - 3h)
             + \cdots + (-1)^k \frac{n(n-1) \cdots (n-k+1)}{k!} f(x_i - kh) + \cdots
             + (-1)^n \frac{n(n-1) \cdots 3 \cdot 2 \cdot 1}{n!} f(x_i - nh)

The nth backward difference, in terms of binomial coefficients, is written as

\nabla^n f(x_i) = f(x_i) - \binom{n}{1} f(x_i - h) + \binom{n}{2} f(x_i - 2h)
             - \binom{n}{3} f(x_i - 3h) + \cdots + (-1)^k \binom{n}{k} f(x_i - kh) + \cdots
             + (-1)^n f(x_i - nh)

We may also note that

\nabla f(x_i) = \Delta f(x_i - h)
\nabla^2 f(x_i) = \Delta^2 f(x_i - 2h)
\nabla^3 f(x_i) = \Delta^3 f(x_i - 3h)
\nabla^4 f(x_i) = \Delta^4 f(x_i - 4h)
\nabla^n f(x_i) = \Delta^n f(x_i - nh)

In general,

\nabla^k f_s = \Delta^k f_{s-k},    (k = 1, 2, 3, \ldots)

Central Differences
The first central difference of f(x) with respect to the increment h is defined, by
introducing the central difference operator \delta, as

\delta f(x) = f(x + h/2) - f(x - h/2)

The operator \delta always implies this operation on the function of x on which it operates, so that

\delta^2 f(x_i) = \delta[\delta f(x_i)] = \delta[f(x_i + h/2) - f(x_i - h/2)] = \Delta^2 f(x_i - h)

where x_i denotes any specific value of x, such as x_0, x_1, and so forth. In general,
the central differences of f(x_i) are

\delta f(x_i) = \Delta f(x_i - h/2)
\delta^2 f(x_i) = \Delta^2 f(x_i - h)
\delta^3 f(x_i) = \Delta^3 f(x_i - 3h/2)
\delta^4 f(x_i) = \Delta^4 f(x_i - 2h)
\vdots
\delta^n f(x_i) = \Delta^n f(x_i - nh/2)

Divided Differences
There are three disadvantages of using the Lagrangian polynomial method for interpolation.
1. It involves more arithmetic operations than does the divided-difference
method.
2. If we desire to add or subtract a point from the set used to construct the
polynomial, we essentially have to start over in the computations.
3. Lagrangian polynomials must repeat all of the arithmetic if we interpolate at
a new x-value. The divided-difference method avoids all of this computation.
Consider a function f(x) which is known at several values of x. We do not assume
that the x's are evenly spaced or even that the values are arranged in any particular
order. The divided difference of f(x) for the two arguments x_k and x_j is
written f[x_k, x_j] and is defined as

f[x_k, x_j] = \frac{f(x_k) - f(x_j)}{x_k - x_j} = f[x_j, x_k]

and is called the first divided difference between x_j and x_k.
The divided difference of three arguments is defined as follows:

f[x_k, x_j, x_l] = \frac{f[x_k, x_j] - f[x_j, x_l]}{x_k - x_l} = f[x_l, x_j, x_k] = f[x_j, x_k, x_l]

and is called the second-order divided difference, or divided difference of three arguments x_k, x_j and x_l.
Similarly,

f[x_k, x_j, x_l, x_n] = \frac{f[x_k, x_j, x_l] - f[x_j, x_l, x_n]}{x_k - x_n} = f[x_l, x_j, x_k, x_n]

is the third-order divided difference.
Thus, for (n + 1) arguments, the divided difference is defined as

f[x_0, x_1, x_2, x_3, \ldots, x_n] = \frac{f[x_1, x_2, x_3, \ldots, x_n] - f[x_0, x_1, x_2, \ldots, x_{n-1}]}{x_n - x_0}

which is known as the nth-order divided difference, or divided difference of (n + 1)
arguments.
A special standard notation used for divided differences is

f[x_0, x_1] = f_0^{[1]}
f[x_0, x_1, x_2] = f_0^{[2]}
f[x_0, x_1, x_2, \ldots, x_n] = f_0^{[n]}

The concept is even extended to a zero-order difference:

f[x_s] = f_s = f_s^{[0]}

Newton's General Interpolating Polynomial

Consider the nth-degree polynomial written in a special way:

p(x) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)(x - x_1)
     + a_3 (x - x_0)(x - x_1)(x - x_2) + \cdots
     + a_n (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{n-1})        (22)

which is known as the Newton form. If (x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)), (x_3, f(x_3)),
\ldots, (x_n, f(x_n)) are (n + 1) data points, and p_n(x) is an interpolating polynomial, it
must match at the (n + 1) data points, i.e., p_n(x_i) = f(x_i) for i = 0, 1, 2, \ldots, n. Thus:

when x = x_0:  p(x_0) = a_0, and p(x_0) = f(x_0) = f_0, which implies a_0 = f_0;

when x = x_1:  p(x_1) = a_0 + a_1 (x_1 - x_0), and p(x_1) = f(x_1) = f_1, which implies
a_1 = \frac{f_1 - f_0}{x_1 - x_0} = f[x_0, x_1];

when x = x_2:  p(x_2) = a_0 + a_1 (x_2 - x_0) + a_2 (x_2 - x_0)(x_2 - x_1), and p(x_2) = f_2 implies
a_2 = \frac{f[x_2, x_1] - f[x_1, x_0]}{x_2 - x_0} = f[x_0, x_1, x_2];

similarly, when x = x_3:  p(x_3) = a_0 + a_1 (x_3 - x_0) + a_2 (x_3 - x_0)(x_3 - x_1)
+ a_3 (x_3 - x_0)(x_3 - x_1)(x_3 - x_2), and p(x_3) = f_3 implies
a_3 = \frac{f[x_3, x_2, x_1] - f[x_2, x_1, x_0]}{x_3 - x_0} = f[x_0, x_1, x_2, x_3];

and so on. Substituting these values of a_0, a_1, a_2, a_3, \ldots, a_n in (22), we get

p(x) = f_0 + \sum_{k=0}^{n-1} f[x_0, x_1, x_2, x_3, \ldots, x_{k+1}] (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k)

or

p(x) = f_0 + \sum_{k=0}^{n-1} f_0^{[k+1]} (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k)

which is known as Newton's general interpolating polynomial, or Newton's
form in terms of divided differences.
Now solving for f(x) yields

f(x) = f_0 + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1)
     + f[x_0, x_1, x_2, x_3](x - x_0)(x - x_1)(x - x_2) + \cdots
     + f[x_0, x_1, x_2, x_3, \ldots, x_n](x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{n-1}) + R(x)        (23)

where the remainder term, R(x), is given by

R(x) = f[x, x_0, x_1, x_2, x_3, \ldots, x_n](x - x_0)(x - x_1)(x - x_2) \cdots (x - x_n)        (24)

If f(x) is a polynomial of degree n, the remainder term is zero for all x, i.e., R(x) = 0,
i.e., f[x, x_0, x_1, x_2, x_3, \ldots, x_n] = 0, because f[x_0, x_1, \ldots, x_n] is of degree zero (is a constant).
Omitting the remainder term in equation (23) gives

p_n(x) = f_0 + \sum_{k=0}^{n-1} f[x_0, x_1, x_2, x_3, \ldots, x_{k+1}] (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k)        (25)

which is Newton's general interpolating polynomial.
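A short Python sketch that builds the divided-difference coefficients and evaluates the Newton form (25); the in-place table update and the function names are illustrative choices.

def divided_differences(xs, fs):
    n = len(xs)
    coef = list(fs)
    for order in range(1, n):
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef          # coef[k] = f[x0, x1, ..., xk]

def newton_eval(xs, coef, x):
    n = len(coef)
    p = coef[n - 1]
    for k in range(n - 2, -1, -1):
        p = coef[k] + (x - xs[k]) * p      # nested evaluation
    return p

xs = [1.0, 2.0, 4.0]
fs = [1.0, 4.0, 16.0]                 # again f(x) = x^2
coef = divided_differences(xs, fs)
print(coef)                           # [1.0, 3.0, 1.0]
print(newton_eval(xs, coef, 2.5))     # 6.25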

Error Estimation When the Function Is Unknown - The Next-Term Rule

When dealing with experimental data, the function f(x) is almost always unknown.
But we can still estimate the error of interpolation: since the nth-order divided difference
is itself an approximation for f^{(n)}(x)/n!, the error of interpolation is given
approximately by the value of the next term that would be added.
This most important rule for estimating the error of interpolation is known as
the next-term rule.

Newton-Gregory Forward-Interpolating Polynomial

If the values of the function are given at evenly spaced intervals of the independent
variable, i.e.,

x_1 = x_0 + h
x_2 = x_0 + 2h
\vdots
x_n = x_0 + nh

and for any general value of x we can write

x = x_0 + s h

where s is a real number, then

(x - x_0)/h = s
(x - x_1)/h = s - 1
(x - x_2)/h = s - 2
\vdots
(x - x_{n-1})/h = s - (n - 1)

and similarly

f[x_0, x_1] = \frac{\Delta f_0}{h}
f[x_0, x_1, x_2] = \frac{\Delta^2 f_0}{2!\, h^2}
\vdots
f[x_0, x_1, x_2, x_3, \ldots, x_n] = \frac{\Delta^n f_0}{n!\, h^n}

so Newton's general interpolating polynomial is transformed into

p(x) = f_0 + s \Delta f_0 + \frac{s(s-1)}{2!} \Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!} \Delta^3 f_0
     + \cdots + \frac{s(s-1)(s-2) \cdots (s-(n-1))}{n!} \Delta^n f_0

which gives the Newton-Gregory forward-interpolating polynomial. It may be
expressed more compactly by using the binomial coefficient notation:

p(x) = f_0 + \binom{s}{1} \Delta f_0 + \binom{s}{2} \Delta^2 f_0 + \binom{s}{3} \Delta^3 f_0 + \cdots + \binom{s}{n} \Delta^n f_0

Newton-Gregory Backward-Interpolating Polynomial

If the values of the function are given at evenly spaced intervals of the independent
variable, i.e.,

x_{n-1} = x_n - h
x_{n-2} = x_n - 2h
\vdots
x_0 = x_n - nh

and for any general value of x we can write

x = x_n - s h

where s is a real number, then

(x - x_n)/h = -s
(x - x_{n-1})/h = 1 - s
(x - x_{n-2})/h = 2 - s
\vdots
(x - x_1)/h = (n - 1) - s

and similarly

f[x_n, x_{n-1}] = \frac{\nabla f_n}{h}
f[x_n, x_{n-1}, x_{n-2}] = \frac{\nabla^2 f_n}{2!\, h^2}
\vdots
f[x_0, x_1, x_2, x_3, \ldots, x_n] = \frac{\nabla^n f_n}{n!\, h^n}

so Newton's general interpolating polynomial is transformed into

p(x) = f_n - s \nabla f_n + \frac{s(s-1)}{2!} \nabla^2 f_n - \frac{s(s-1)(s-2)}{3!} \nabla^3 f_n
     + \cdots + (-1)^n \frac{s(s-1)(s-2) \cdots (s-(n-1))}{n!} \nabla^n f_n

which gives the Newton-Gregory backward-interpolating polynomial. It may be
expressed more compactly by using the binomial coefficient notation:

p(x) = f_n - \binom{s}{1} \nabla f_n + \binom{s}{2} \nabla^2 f_n - \binom{s}{3} \nabla^3 f_n + \cdots + (-1)^n \binom{s}{n} \nabla^n f_n

Differences Versus Divided Differences

We can relate divided differences of functional values to function differences when
the x-values are evenly spaced, i.e.,

f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{\Delta f(x_0)}{h}

f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} = \frac{\Delta^2 f(x_0)}{2 h^2}

f[x_0, x_1, x_2, x_3] = \frac{f[x_1, x_2, x_3] - f[x_0, x_1, x_2]}{x_3 - x_0} = \frac{\Delta^3 f(x_0)}{3!\, h^3}

\vdots

f[x_0, x_1, x_2, \ldots, x_n] = \frac{f[x_1, x_2, \ldots, x_n] - f[x_0, x_1, \ldots, x_{n-1}]}{x_n - x_0} = \frac{\Delta^n f(x_0)}{n!\, h^n}

Interpolating with a Cubic Spline


Often a large number of data points have to be fitted by a single smooth curve, but
a Lagrangian interpolation or Newton interpolation polynomial of high order
is not suitable for this purpose, because the errors of a single polynomial tend to
increase drastically as its order becomes large; i.e., the oscillatory nature of high-degree polynomials, and the property that a fluctuation over a small portion of the
interval can induce large fluctuations over the entire range, restrict their use when
approximating functions that arise in many situations. One remedy to the problem
is to fit different polynomials to subregions of f(x): divide
the interval into a collection of subintervals and construct a (generally) different
approximating polynomial on each subinterval. Approximation by functions of this
type is called piecewise polynomial approximation.
The simplest piecewise polynomial approximation is piecewise linear interpolation, which consists of joining a set of data points

(x_0, f_0), (x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n)

by a series of straight lines. This is the method of linear interpolation. The problem
with this linear function approximation is that at each of the endpoints of the subintervals there is no assurance of differentiability, which means that the interpolating
function is not smooth at these points, i.e., the slope is discontinuous at these points.
Often it is clear from the physical conditions that such a smoothness condition is
required and that the approximating function must be continuously differentiable.

The most common piecewise polynomial approximation uses a cubic polynomial
between each successive pair of data points; this is cubic spline interpolation, designed to suit
this purpose.
The drafting spline bends according to the laws of beam flexure so that both
the slope and curvature are continuous. Our mathematical spline curve must use
polynomials of degree three (or more) to match this behavior. While splines can be
of any degree, cubic splines are by far the most popular.
In cubic spline interpolation, a cubic polynomial is used in each interval between
two consecutive data points. One cubic polynomial has four free coefficients, so it
needs four conditions. Two of them come from the requirements that the polynomial
must pass through the data points at the two end points of the interval. The other
two are the requirements that the first and second derivatives of the polynomial
be continuous across each data point.
Start with the data points

(x_0, f_0), (x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n)

and write the equation for a cubic in the ith interval, which lies between the points
(x_i, y_i) and (x_{i+1}, y_{i+1}), in the form

g_i(x) = a_i (x - x_i)^3 + b_i (x - x_i)^2 + c_i (x - x_i) + d_i.        (26)

Thus the cubic spline function we want is of the form

g(x) = g_i(x)    on the interval [x_i, x_{i+1}],    for i = 0, 1, 2, 3, \ldots, n-1

and meets the conditions:

g_i(x_i) = y_i,    i = 0, 1, 2, 3, \ldots, n-1, and g_{n-1}(x_n) = y_n;        (27)
g_i(x_{i+1}) = g_{i+1}(x_{i+1}),    i = 0, 1, 2, 3, \ldots, n-2;        (28)
g_i'(x_{i+1}) = g_{i+1}'(x_{i+1}),    i = 0, 1, 2, 3, \ldots, n-2;        (29)
g_i''(x_{i+1}) = g_{i+1}''(x_{i+1}),    i = 0, 1, 2, 3, \ldots, n-2;        (30)

Equations (27)-(30) say that the cubic spline fits each of the points (27), is continuous (28), and is continuous in slope and curvature, (29) and (30), throughout the
region spanned by the points.
Using equation (27) in equation (26) immediately gives

d_i = y_i,    i = 0, 1, 2, 3, \ldots, n-1.        (31)

Equation (27) then gives

y_{i+1} = a_i (x_{i+1} - x_i)^3 + b_i (x_{i+1} - x_i)^2 + c_i (x_{i+1} - x_i) + y_i
        = a_i h_i^3 + b_i h_i^2 + c_i h_i + y_i,    i = 0, 1, 2, 3, \ldots, n-1        (32)

where h_i = x_{i+1} - x_i is the width of the ith interval.


To relate the slopes and curvatures of the joining splines, we differentiate equation (26):

g_i'(x) = 3a_i (x - x_i)^2 + 2b_i (x - x_i) + c_i,        (33)
g_i''(x) = 6a_i (x - x_i) + 2b_i,        (34)
for i = 0, 1, 2, 3, \ldots, n-1.

Let S_i = g_i''(x_i) for i = 0, 1, 2, 3, \ldots, n-1 and S_n = g_{n-1}''(x_n); then

S_i = 6a_i (x_i - x_i) + 2b_i = 2b_i;
S_{i+1} = 6a_i (x_{i+1} - x_i) + 2b_i = 6a_i h_i + 2b_i.

Hence we can write

b_i = \frac{S_i}{2},        (35)

a_i = \frac{S_{i+1} - S_i}{6 h_i}.        (36)

Substituting the relations for a_i, b_i, d_i given by equations (31), (35) and (36) into
equation (26), we get

g_i(x_{i+1}) = \frac{S_{i+1} - S_i}{6 h_i} h_i^3 + \frac{S_i}{2} h_i^2 + c_i h_i + y_i = y_{i+1};

which implies

c_i = \frac{y_{i+1} - y_i}{h_i} - \frac{h_i S_{i+1} + 2 h_i S_i}{6}.        (37)

Now equation (33) gives

y_i' = g_i'(x_i) = 3a_i (x_i - x_i)^2 + 2b_i (x_i - x_i) + c_i = c_i.        (38)

In the previous interval, from x_{i-1} to x_i, the equation for the cubic spline is

g_{i-1}(x) = a_{i-1}(x - x_{i-1})^3 + b_{i-1}(x - x_{i-1})^2 + c_{i-1}(x - x_{i-1}) + d_{i-1};
g_{i-1}'(x) = 3a_{i-1}(x - x_{i-1})^2 + 2b_{i-1}(x - x_{i-1}) + c_{i-1};
g_{i-1}'(x_i) = 3a_{i-1}(x_i - x_{i-1})^2 + 2b_{i-1}(x_i - x_{i-1}) + c_{i-1}.        (39)

Using equation (29), we obtain

y_i' = 3a_{i-1} h_{i-1}^2 + 2b_{i-1} h_{i-1} + c_{i-1}.        (40)

Equating equations (38) and (40) and using (35), (36) and (37), we get

\frac{y_{i+1} - y_i}{h_i} - \frac{h_i S_{i+1} + 2 h_i S_i}{6}
 = 3 \frac{S_i - S_{i-1}}{6 h_{i-1}} h_{i-1}^2 + 2 \frac{S_{i-1}}{2} h_{i-1} + \frac{y_i - y_{i-1}}{h_{i-1}} - \frac{h_{i-1} S_i + 2 h_{i-1} S_{i-1}}{6}        (41)

Simplifying this equation, we get

h_{i-1} S_{i-1} + 2(h_{i-1} + h_i) S_i + h_i S_{i+1} = 6 \left( \frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}} \right)
 = 6 \left( f[x_{i+1}, x_i] - f[x_i, x_{i-1}] \right).        (42)

If all of the intervals are equal in length, this simplifies to a linear difference
equation with constant coefficients:

S_{i-1} + 4 S_i + S_{i+1} = 6 \frac{\Delta^2 f(x_{i-1})}{h^2}.        (43)

Equations (42) and (43) represent n-1 relations in n+1 unknowns, so we need two
values of the second derivative, or two more equations involving the second derivatives
at some of the points x_i. Often the end values S_0 and S_n are chosen. The conditions
frequently used are

1. S_0 = 0 and S_n = 0. This is called a natural spline; it makes the end cubics approach
linearity at their extremities.

2. Another frequently used condition is normally called a clamped spline. If f'(x_0) =
A and f'(x_n) = B, we get

At left end:   2 h_0 S_0 + h_0 S_1 = 6 (f[x_0, x_1] - A)
At right end:  h_{n-1} S_{n-1} + 2 h_{n-1} S_n = 6 (B - f[x_{n-1}, x_n])

3. S_0 = S_1 and S_n = S_{n-1}; this is called a parabolically terminated spline.

4. Take S_0 as a linear extrapolation from S_1 and S_2, and S_n as a linear extrapolation from S_{n-1} and S_{n-2}. Only this condition gives cubic spline curves that
match exactly to f(x) when f(x) is itself a cubic. We get

At left end:   \frac{S_1 - S_0}{h_0} = \frac{S_2 - S_1}{h_1},
which implies  S_0 = \frac{(h_0 + h_1) S_1 - h_0 S_2}{h_1}.

At right end:  \frac{S_n - S_{n-1}}{h_{n-1}} = \frac{S_{n-1} - S_{n-2}}{h_{n-2}},
which implies  S_n = \frac{(h_{n-1} + h_{n-2}) S_{n-1} - h_{n-1} S_{n-2}}{h_{n-2}}.

This gives too much curvature in the end intervals.


For each end condition, the coefficient matrices become:
Condition 1 ($S_0 = 0$, $S_n = 0$):
$$\begin{bmatrix}
2(h_0+h_1) & h_1 & & & & \\
h_1 & 2(h_1+h_2) & h_2 & & & \\
 & h_2 & 2(h_2+h_3) & h_3 & & \\
 & & h_3 & 2(h_3+h_4) & h_4 & \\
 & & & \ddots & \ddots & \ddots \\
 & & & & h_{n-2} & 2(h_{n-2}+h_{n-1})
\end{bmatrix}$$
Condition 2 ($f'_0 = A$, $f'_n = B$):
$$\begin{bmatrix}
2h_0 & h_0 & & & & \\
h_0 & 2(h_0+h_1) & h_1 & & & \\
 & h_1 & 2(h_1+h_2) & h_2 & & \\
 & & h_2 & 2(h_2+h_3) & h_3 & \\
 & & & \ddots & \ddots & \ddots \\
 & & & & h_{n-1} & 2h_{n-1}
\end{bmatrix}$$
Condition 3 ($S_0 = S_1$, $S_n = S_{n-1}$):
$$\begin{bmatrix}
3h_0+2h_1 & h_1 & & & & \\
h_1 & 2(h_1+h_2) & h_2 & & & \\
 & h_2 & 2(h_2+h_3) & h_3 & & \\
 & & h_3 & 2(h_3+h_4) & h_4 & \\
 & & & \ddots & \ddots & \ddots \\
 & & & & h_{n-2} & 2h_{n-2}+3h_{n-1}
\end{bmatrix}$$
Condition 4 ($S_0$ and $S_n$ are linear extrapolations):
$$\begin{bmatrix}
\dfrac{(h_0+h_1)(h_0+2h_1)}{h_1} & \dfrac{h_1^2-h_0^2}{h_1} & & & \\
h_1 & 2(h_1+h_2) & h_2 & & \\
 & h_2 & 2(h_2+h_3) & h_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & \dfrac{h_{n-2}^2-h_{n-1}^2}{h_{n-2}} & \dfrac{(h_{n-1}+h_{n-2})(h_{n-1}+2h_{n-2})}{h_{n-2}}
\end{bmatrix}$$

After the $S_i$ values are obtained, we can compute $a_i$, $b_i$, $c_i$, and $d_i$ for the cubic
in each interval, using
$$a_i = \frac{S_{i+1} - S_i}{6h_i}, \qquad
b_i = \frac{S_i}{2}, \qquad
c_i = f[x_i, x_{i+1}] - \frac{h_i S_{i+1} + 2h_i S_i}{6}, \qquad
d_i = y_i.$$
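As an illustration, the following is a minimal Python sketch (not part of the notes; the function names are illustrative) that assumes the natural end condition $S_0 = S_n = 0$ of Condition 1, builds the tridiagonal system of equation (42), solves it for the interior $S_i$, and forms the coefficients $a_i$, $b_i$, $c_i$, $d_i$ given above.

    # Hedged sketch: natural cubic spline coefficients from knots (x, y).
    import numpy as np

    def natural_cubic_spline(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x) - 1                      # number of intervals
        h = np.diff(x)                      # h_i = x_{i+1} - x_i
        fd = np.diff(y) / h                 # divided differences f[x_i, x_{i+1}]
        # Tridiagonal system (42) for the interior curvatures S_1, ..., S_{n-1}
        A = np.zeros((n - 1, n - 1))
        rhs = 6.0 * (fd[1:] - fd[:-1])
        for i in range(n - 1):
            A[i, i] = 2.0 * (h[i] + h[i + 1])
            if i > 0:
                A[i, i - 1] = h[i]
            if i < n - 2:
                A[i, i + 1] = h[i + 1]
        S = np.zeros(n + 1)                 # natural end condition: S_0 = S_n = 0
        S[1:n] = np.linalg.solve(A, rhs)
        a = (S[1:] - S[:-1]) / (6.0 * h)
        b = S[:-1] / 2.0
        c = fd - (h * S[1:] + 2.0 * h * S[:-1]) / 6.0
        d = y[:-1]
        return a, b, c, d

    def eval_spline(x, coeffs, xq):
        # Evaluate g_i(xq) = a_i t^3 + b_i t^2 + c_i t + d_i with t = xq - x_i.
        a, b, c, d = coeffs
        i = min(max(np.searchsorted(x, xq) - 1, 0), len(a) - 1)
        t = xq - x[i]
        return ((a[i] * t + b[i]) * t + c[i]) * t + d[i]

    if __name__ == "__main__":
        xs = np.array([0.0, 1.0, 2.0, 3.0])
        coeffs = natural_cubic_spline(xs, np.sin(xs))
        print(eval_spline(xs, coeffs, 1.5), np.sin(1.5))

The same structure works for the other end conditions; only the first and last rows of the coefficient matrix and the treatment of $S_0$, $S_n$ change.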

Numerical Differentiation and Numerical Integration


Numerical integration, or numerical quadrature as it is often called, consists
essentially of finding a close approximation to the area under a curve of a function
f (x) which has been determined either from experimental data or from a mathematical expression.
Before discussing the general situation of quadrature formulas, we recall definitions and formulas, which are commonly introduced in calculus courses.
Definition # 1:- Let $f$ be defined on a closed interval $[a, b]$, and let $P$ be a
partition of $[a, b]$. A Riemann sum of $f$ (or $f(x)$) for $P$ is any expression $R_P$ of
the form
$$R_P = \sum_{k=1}^{n} f(\xi_k)\,\Delta x_k,$$
where $\xi_k$ is in $[x_{k-1}, x_k]$ and $k = 1, 2, 3, \ldots, n$.
Definition # 2:- Let $f$ be defined on a closed interval $[a, b]$, and let $L$ be a
real number. The statement
$$\lim_{\|P\| \to 0} \sum_k f(\xi_k)\,\Delta x_k = L$$
means that for every $\varepsilon > 0$ there is a $\delta > 0$ such that if $P$ is a partition of $[a, b]$ with
$\|P\| < \delta$, then
$$\Bigl|\sum_k f(\xi_k)\,\Delta x_k - L\Bigr| < \varepsilon$$
for any choice of numbers $\xi_k$ in the subintervals $[x_{k-1}, x_k]$ of $P$. The number $L$ is a
limit of (Riemann) sums.
Definition # 3:- Let $f$ be defined on a closed interval $[a, b]$. The definite
integral of $f$ from $a$ to $b$, denoted by $\int_a^b f(x)\,dx$, is
$$\int_a^b f(x)\,dx = \lim_{\|P\| \to 0} \sum_k f(\xi_k)\,\Delta x_k,$$
provided the limit exists.

Rectangle Rules
Using the above definitions, we approximate the value of $\int_a^b f(x)\,dx$ as a sum of
areas of rectangles. In particular, if we use a regular partition (evenly spaced data
points) with $h = \Delta x = \frac{b-a}{n}$, then $x_k = x_0 + kh$ for $k = 0, 1, 2, 3, \ldots, n$, $a = x_0$,
$b = x_n$, and
$$\int_a^b f(x)\,dx \approx \sum_{k=1}^{n} f(\xi_k)\,h,$$
where $\xi_k$ is any number in the $k$th subinterval $[x_{k-1}, x_k]$. Each term $f(\xi_k)h$ in the
sum is the area of a rectangle of width $h$ and height $f(\xi_k)$. The accuracy of such
an approximation to $\int_a^b f(x)\,dx$ by rectangles is affected by both the location of $\xi_k$
within each subinterval and the width $h$ of the rectangles.
By locating each $\xi_k$ at the left-hand endpoint $x_{k-1}$, we obtain a left endpoint approximation. Alternately, by locating each $\xi_k$ at the right-hand endpoint $x_k$, we obtain
a right endpoint approximation. A third possibility is to let $\xi_k$ be the midpoint of
each subinterval: $x_{k-1/2} = \frac{x_{k-1} + x_k}{2}$. This choice of location for $\xi_k$ gives a
midpoint approximation.
Rectangle Rules. For a regular partition of an interval $[a, b]$ with $n$ subintervals, each of width $h = \frac{b-a}{n}$, the definite integral $\int_a^b f(x)\,dx$ is approximated by
1. The left rectangle rule:
$$\int_a^b f(x)\,dx \approx A_l = \sum_{k=1}^{n} f(x_{k-1})\,h = \frac{b-a}{n}\sum_{k=1}^{n} f(x_{k-1})$$
2. The right rectangle rule:
$$\int_a^b f(x)\,dx \approx A_r = \sum_{k=1}^{n} f(x_k)\,h = \frac{b-a}{n}\sum_{k=1}^{n} f(x_k)$$
3. The midpoint rule:
$$\int_a^b f(x)\,dx \approx A_m = \sum_{k=1}^{n} f(x_{k-1/2})\,h = \frac{b-a}{n}\sum_{k=1}^{n} f(x_{k-1/2})$$
If a function is strictly increasing or strictly decreasing over the interval, then the
endpoint rules give the areas of the inscribed and circumscribed rectangles.
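A minimal sketch (not part of the notes) of the three rectangle rules for a Python callable f; the function names are illustrative only.

    def left_rule(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + k * h) for k in range(0, n))        # x_0 .. x_{n-1}

    def right_rule(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + k * h) for k in range(1, n + 1))    # x_1 .. x_n

    def midpoint_rule(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + (k + 0.5) * h) for k in range(0, n))

    if __name__ == "__main__":
        f = lambda x: x * x                                      # integral on [0, 1] is 1/3
        print(left_rule(f, 0.0, 1.0, 100), right_rule(f, 0.0, 1.0, 100), midpoint_rule(f, 0.0, 1.0, 100))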

The Trapezoidal Rule - A Composite Formula
To evaluate $\int_a^b f(x)\,dx$, subdivide the interval from $a$ to $b$ into $n$ subintervals of
width $\Delta x$. The area under the curve in each subinterval is approximated by the trapezoid formed by replacing the curve by its secant line drawn between the endpoints
$x_0 < x_1 < x_2 < \cdots < x_n$ of the curve. The integral is then approximated by the sum
of all the trapezoid areas. Let $h$ be the constant $\Delta x$. Since the area of a trapezoid
is the sum of the area of a rectangle and the area of a triangle, for each subinterval
$$\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \Delta x\, f(x_i) + \frac{\Delta x}{2}\bigl(f(x_{i+1}) - f(x_i)\bigr) = \frac{h}{2}\bigl(f(x_i) + f(x_{i+1})\bigr),$$
which is known as the trapezoidal rule (it can also be obtained from the midpoint rule).
If $f(x) \geq 0$ on the interval $[a, b]$, then to find the area under the curve $f(x)$ over $[a, b]$,
represented by $\int_a^b f(x)\,dx$, subdivide $[a, b]$ into $n$ subintervals of size $h$, so that
$$A_0 = \int_{x_0}^{x_1} f(x)\,dx \approx \frac{h}{2}\bigl(f(x_0) + f(x_1)\bigr),$$
$$A_1 = \int_{x_1}^{x_2} f(x)\,dx \approx \frac{h}{2}\bigl(f(x_1) + f(x_2)\bigr),$$
$$\vdots$$
$$A_{n-1} = \int_{x_{n-1}}^{x_n} f(x)\,dx \approx \frac{h}{2}\bigl(f(x_{n-1}) + f(x_n)\bigr).$$
The total area lying between $x = a$ and $x = b$ is given by
$$A = \int_a^b f(x)\,dx \approx A_0 + A_1 + A_2 + A_3 + \cdots + A_{n-1}. \qquad (44)$$
Substituting the above values in equation (44), we get
$$A = \int_a^b f(x)\,dx \approx \frac{h}{2}\bigl(f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\bigr)$$
or
$$A = \int_a^b f(x)\,dx \approx \frac{h}{2}\Bigl(f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\Bigr). \qquad (45)$$
Equation (45) is called the composite trapezoidal rule. This method of replacing a curve by straight lines is hardly accurate unless the subintervals are very
small.
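A small sketch (not from the notes) of the composite trapezoidal rule, equation (45), for a callable f on [a, b] with n equal subintervals.

    def composite_trapezoid(f, a, b, n):
        h = (b - a) / n
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return h * total

    if __name__ == "__main__":
        import math
        print(composite_trapezoid(math.sin, 0.0, math.pi, 200))   # exact value is 2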

Simpson's Rules
The trapezoidal rule approximates the area under a curve by summing the areas of
uniform-width trapezoids formed by connecting successive points on the curve by
straight lines. Simpson's rules give a more accurate approximation by connecting
successive groups of three points on the curve by second-degree parabolas, known
as Simpson's 1/3 rule, and summing the areas under the parabolas to obtain the
approximate area under the curve, or by connecting successive groups of four points
on the curve by third-degree polynomials, known as Simpson's 3/8 rule, and summing
those areas to obtain the approximate area under the curve.

Simpson's 1/3 Rule
Consider the area contained in the two strips under the curve of $f(x)$ comprising
three data points $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$. Approximate this area with
the area under a parabola passing through these three points.
The general form of the equation of the second-degree parabola connecting the
three points is
$$f(x) = ax^2 + bx + c. \qquad (46)$$
The integration of equation (46) from $-\Delta x$ to $\Delta x$ gives the area contained in the
two strips under the parabola. Hence,
$$A_{2\,\text{strips}} = \int_{-\Delta x}^{\Delta x} (ax^2 + bx + c)\,dx
= \left[\frac{ax^3}{3} + \frac{bx^2}{2} + cx\right]_{-\Delta x}^{\Delta x}
= \frac{2}{3}a(\Delta x)^3 + 2c(\Delta x). \qquad (47)$$
The constants $a$ and $c$ can be determined from the fact that the points $(-\Delta x, y_0)$, $(0, y_1)$
and $(\Delta x, y_2)$ must all satisfy equation (46). The substitution of these three sets of
coordinates into equation (46) yields
$$a = \frac{y_0 - 2y_1 + y_2}{2(\Delta x)^2}, \qquad
b = \frac{y_2 - y_0}{2\Delta x}, \qquad
c = y_1. \qquad (48)$$
The substitution of the first and the third parts of equation (48) into equation (47)
yields
$$A_{2\,\text{strips}} = \frac{\Delta x}{3}(y_0 + 4y_1 + y_2), \qquad (49)$$
which gives the area in terms of the three ordinates $y_0$, $y_1$, and $y_2$ and the width $\Delta x$
of a single strip. This constitutes Simpson's 1/3 rule for obtaining the approximate
area contained in two equal-width strips under a curve.
If the area under a curve between two values of $x$ is divided into $n$ uniform strips
($n$ even), the application of equation (49) shows that
$$A_0 = \frac{\Delta x}{3}(y_0 + 4y_1 + y_2),$$
$$A_2 = \frac{\Delta x}{3}(y_2 + 4y_3 + y_4),$$
$$A_4 = \frac{\Delta x}{3}(y_4 + 4y_5 + y_6),$$
$$\vdots$$
$$A_{n-2} = \frac{\Delta x}{3}(y_{n-2} + 4y_{n-1} + y_n). \qquad (50)$$
Summing these areas, we can write
$$\int_{x_0}^{x_n} f(x)\,dx \approx A_0 + A_2 + A_4 + \cdots + A_{n-2}
= \frac{\Delta x}{3}\bigl(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + 4y_5 + 2y_6 + \cdots + 4y_{n-1} + y_n\bigr)$$
$$= \frac{\Delta x}{3}\Bigl(y_0 + 4\sum_{i=1,3,5}^{n-1} y_i + 2\sum_{i=2,4,6}^{n-2} y_i + y_n\Bigr), \qquad (51)$$
where $n$ must be an even number. Equation (51) is called the composite/extended
Simpson's 1/3 rule for obtaining the approximate area under a curve. It may be
used when the area is divided into an even number of strips of width $\Delta x$.
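A brief sketch (not from the notes) of composite Simpson's 1/3 rule, equation (51), assuming n is even and f is a callable.

    def composite_simpson_13(f, a, b, n):
        if n % 2 != 0:
            raise ValueError("Simpson's 1/3 rule needs an even number of strips")
        h = (b - a) / n
        total = f(a) + f(b)
        total += 4.0 * sum(f(a + i * h) for i in range(1, n, 2))   # odd-index ordinates
        total += 2.0 * sum(f(a + i * h) for i in range(2, n, 2))   # even-index ordinates
        return h / 3.0 * total

    if __name__ == "__main__":
        import math
        print(composite_simpson_13(math.sin, 0.0, math.pi, 100))   # exact value is 2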

Simpson's 3/8 Rule
If an odd number of strips is used, Simpson's three-eighths rule for obtaining
the area contained in 3 strips under a curve (with four data points) can be used.
The derivation of the three-eighths rule determines the area under a third-degree
polynomial connecting four points on the given curve. The general form of the
third-degree polynomial is
$$y = ax^3 + bx^2 + cx + d.$$
For convenience take the points $\bigl(-\tfrac{3}{2}\Delta x, y_0\bigr)$, $\bigl(-\tfrac{1}{2}\Delta x, y_1\bigr)$, $\bigl(\tfrac{1}{2}\Delta x, y_2\bigr)$, $\bigl(\tfrac{3}{2}\Delta x, y_3\bigr)$. Therefore, the
range of integration is from $-\tfrac{3}{2}\Delta x$ to $\tfrac{3}{2}\Delta x$, i.e.,
$$A_{3\,\text{strips}} = \int_{-\frac{3}{2}\Delta x}^{\frac{3}{2}\Delta x} (ax^3 + bx^2 + cx + d)\,dx
= \left[\frac{ax^4}{4} + \frac{bx^3}{3} + \frac{cx^2}{2} + dx\right]_{-\frac{3}{2}\Delta x}^{\frac{3}{2}\Delta x}
= \frac{9}{4}b(\Delta x)^3 + 3d(\Delta x). \qquad (52)$$
The constants $b$ and $d$ can be determined from the fact that the points $\bigl(-\tfrac{3}{2}\Delta x, y_0\bigr)$,
$\bigl(-\tfrac{1}{2}\Delta x, y_1\bigr)$, $\bigl(\tfrac{1}{2}\Delta x, y_2\bigr)$, and $\bigl(\tfrac{3}{2}\Delta x, y_3\bigr)$ must all satisfy the cubic above. Using these
four sets of coordinates, we obtain
$$b = \frac{y_0 + y_3 - y_1 - y_2}{4\Delta x^2}, \qquad (53)$$
$$d = \frac{1}{16}(9y_1 + 9y_2 - y_0 - y_3). \qquad (54)$$
The substitution of $b$ and $d$ from equations (53) and (54) in equation (52)
yields
$$A_{3\,\text{strips}} = \frac{3}{8}\Delta x\,(y_0 + 3y_1 + 3y_2 + y_3), \qquad (55)$$
which is Simpson's three-eighths rule for obtaining the approximate area contained in three equal-width strips under a curve. It gives the area in terms of the
four ordinates $y_0$, $y_1$, $y_2$, and $y_3$ and the width $\Delta x$ of a single strip.
If the area under a curve is divided into $n$ uniform strips, with $n$ a multiple of 3, then
the application of equation (55) shows that
$$A_0 = \frac{3\Delta x}{8}(y_0 + 3y_1 + 3y_2 + y_3),$$
$$A_3 = \frac{3\Delta x}{8}(y_3 + 3y_4 + 3y_5 + y_6),$$
$$A_6 = \frac{3\Delta x}{8}(y_6 + 3y_7 + 3y_8 + y_9),$$
$$\vdots$$
$$A_{n-3} = \frac{3\Delta x}{8}(y_{n-3} + 3y_{n-2} + 3y_{n-1} + y_n). \qquad (56)$$
Summing these areas, we can write
$$\int_{x_0}^{x_n} f(x)\,dx \approx A_0 + A_3 + A_6 + \cdots + A_{n-3}
= \frac{3\Delta x}{8}\bigl(y_0 + 3y_1 + 3y_2 + 2y_3 + 3y_4 + 3y_5 + 2y_6 + \cdots + 2y_{n-3} + 3y_{n-2} + 3y_{n-1} + y_n\bigr). \qquad (57)$$
Equation (57) is called the composite/extended Simpson's 3/8 rule for obtaining
the approximate area under a curve. This rule is applicable when the number of
intervals is a multiple of three.
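A short sketch (not from the notes) of composite Simpson's 3/8 rule, equation (57), assuming the number of strips n is a multiple of three.

    def composite_simpson_38(f, a, b, n):
        if n % 3 != 0:
            raise ValueError("Simpson's 3/8 rule needs n to be a multiple of 3")
        h = (b - a) / n
        total = f(a) + f(b)
        for i in range(1, n):
            total += (2.0 if i % 3 == 0 else 3.0) * f(a + i * h)
        return 3.0 * h / 8.0 * total

    if __name__ == "__main__":
        import math
        print(composite_simpson_38(math.exp, 0.0, 1.0, 99))   # exact value is e - 1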

Newton-Cotes Integration Formulas


In developing formulas for numerical integration when the data points are
equispaced, the Newton-Gregory forward polynomial is a convenient starting point.
The numerical integration methods that are derived by integrating the Newton-Gregory interpolating polynomials are the Newton-Cotes integration formulas;
then
$$\int_a^b f(x)\,dx \approx \int_a^b P_n(x_s)\,dx.$$
This formula will not give us the exact answer because the polynomial is not identical
with $f(x)$. We get an expression for the error by integrating the error term of $P_n(x_s)$.
So we start with the Newton-Gregory forward polynomial of degree $n$:
$$P_n(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0 + \cdots
+ \frac{s(s-1)(s-2)\cdots(s-(n-1))}{n!}\Delta^n f_0.$$

For $n = 1$ (i.e., two data points $(x_0, f_0)$, $(x_1, f_1)$),
$$P_1(x_s) = f_0 + s\,\Delta f_0$$
and
$$\int_a^b f(x)\,dx \approx \int_{x_0}^{x_1} P_1(x_s)\,dx = \int_{x_0}^{x_1} (f_0 + s\,\Delta f_0)\,dx.$$
As $x = x_0 + sh$ we get $dx = h\,ds$, and at $x = x_0$, $s = 0$, while at $x = x_1$, $s = 1$. This gives
$$\int_{x_0}^{x_1} (f_0 + s\,\Delta f_0)\,dx = \int_0^1 (f_0 + s\,\Delta f_0)\,h\,ds
= f_0 h\,\Bigl[s\Bigr]_0^1 + \Delta f_0 h\,\Bigl[\frac{s^2}{2}\Bigr]_0^1
= \frac{h}{2}(f_0 + f_1),$$
using $\Delta f_0 = f_1 - f_0$. For the error estimation we use the next-term rule, i.e.,
Error = (approximately) the value of the next term that would be added to $P_1(x_s)$. This implies
$$\text{Error} = \int_{x_0}^{x_1} \frac{s(s-1)}{2!}\Delta^2 f_0\,dx
= \frac{\Delta^2 f_0\, h}{2}\int_0^1 (s^2 - s)\,ds
= \frac{\Delta^2 f_0\, h}{2}\left[\frac{s^3}{3} - \frac{s^2}{2}\right]_0^1
= -\frac{h}{12}\Delta^2 f_0 = -\frac{h^3}{12} f_0''.$$
Since $\dfrac{\Delta^2 f_0}{h^2} \approx f_0''$, taking $\xi \in (x_0, x_1)$ such that $f''(\xi)$ is the maximum value in
$(x_0, x_1)$, then
$$\text{Error} = -\frac{h^3}{12} f''(\xi).$$
Therefore, the Newton-Cotes formula for $n = 1$ becomes
$$\int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12} f''(\xi), \qquad \text{where } \xi \in (x_0, x_1).$$

For $n = 2$ (i.e., three data points $(x_0, f_0)$, $(x_1, f_1)$, $(x_2, f_2)$),
$$P_2(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0$$
and
$$\int_{x_0}^{x_2} f(x)\,dx \approx \int_{x_0}^{x_2} P_2(x_s)\,dx
= \int_{x_0}^{x_2} \Bigl(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Bigr)\,dx.$$
At $x = x_0$, $s = 0$, and at $x = x_2$, $s = 2$. This gives
$$\int_{x_0}^{x_2} \Bigl(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Bigr)\,dx
= \int_0^2 \Bigl(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Bigr)\,h\,ds$$
$$= f_0 h\,\Bigl[s\Bigr]_0^2 + \Delta f_0 h\,\Bigl[\frac{s^2}{2}\Bigr]_0^2 + \frac{\Delta^2 f_0\, h}{2}\left[\frac{s^3}{3} - \frac{s^2}{2}\right]_0^2
= 2f_0 h + 2h\,\Delta f_0 + \frac{\Delta^2 f_0\, h}{2}\cdot\frac{2}{3}
= \frac{h}{3}(f_0 + 4f_1 + f_2),$$
using $\Delta f_0 = f_1 - f_0$ and $\Delta^2 f_0 = f_2 - 2f_1 + f_0$.
Using the next-term rule for the error evaluation, we get
$$\text{Error} = \int_{x_0}^{x_2} \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\,dx
= \int_0^2 \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\,h\,ds
= \frac{\Delta^3 f_0\, h}{6}\left[\frac{s^4}{4} - s^3 + s^2\right]_0^2 = 0.$$
Considering the next term for the error, we get
$$\text{Error} = \int_{x_0}^{x_2} \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,dx
= \int_0^2 \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,h\,ds$$
$$= \frac{\Delta^4 f_0\, h}{24}\left[\frac{s^5}{5} - \frac{6s^4}{4} + \frac{11s^3}{3} - \frac{6s^2}{2}\right]_0^2
= \frac{\Delta^4 f_0\, h}{24}\left(-\frac{4}{15}\right) = -\frac{\Delta^4 f_0\, h}{90}.$$
Since $\dfrac{\Delta^4 f_0}{h^4} \approx f_0^{iv}$, taking $\xi \in (x_0, x_2)$ such that $f^{iv}(\xi)$ is the maximum value in
$(x_0, x_2)$, then
$$\text{Error} = -\frac{h^5}{90} f^{iv}(\xi).$$
Therefore, the Newton-Cotes formula for $n = 2$ becomes
$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90} f^{iv}(\xi), \qquad \text{where } \xi \in (x_0, x_2).$$
For $n = 3$ (four data points $(x_0, f_0)$, $(x_1, f_1)$, $(x_2, f_2)$, $(x_3, f_3)$),
$$P_3(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0$$
and
$$\int_a^b f(x)\,dx \approx \int_{x_0}^{x_3} P_3(x_s)\,dx
= \int_0^3 \Bigl(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\Bigr)\,h\,ds$$
$$= f_0 h\,\Bigl[s\Bigr]_0^3 + \Delta f_0 h\,\Bigl[\frac{s^2}{2}\Bigr]_0^3 + \frac{\Delta^2 f_0\, h}{2}\left[\frac{s^3}{3} - \frac{s^2}{2}\right]_0^3 + \frac{\Delta^3 f_0\, h}{6}\left[\frac{s^4}{4} - s^3 + s^2\right]_0^3$$
$$= 3h\Bigl(f_0 + \frac{3}{2}(f_1 - f_0) + \frac{3}{4}(f_2 - 2f_1 + f_0) + \frac{1}{8}(f_3 - 3f_2 + 3f_1 - f_0)\Bigr)
= \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3),$$
where $\Delta f_0 = f_1 - f_0$, $\Delta^2 f_0 = f_2 - 2f_1 + f_0$ and $\Delta^3 f_0 = f_3 - 3f_2 + 3f_1 - f_0$.
Considering the next term for the error, we obtain
$$\text{Error} = \int_{x_0}^{x_3} \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,dx
= \int_0^3 \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,h\,ds
= \frac{\Delta^4 f_0\, h}{24}\left[\frac{s^5}{5} - \frac{6s^4}{4} + \frac{11s^3}{3} - \frac{6s^2}{2}\right]_0^3
= -\frac{3}{80}\Delta^4 f_0\, h.$$
Taking $\xi \in (x_0, x_3)$ such that $f^{iv}(\xi)$ is the maximum value in $(x_0, x_3)$, then
$$\text{Error} = -\frac{3}{80}h^5 f^{iv}(\xi).$$
Therefore, the Newton-Cotes formula for $n = 3$ is derived as
$$\int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) - \frac{3}{80}h^5 f^{iv}(\xi), \qquad \text{where } \xi \in (x_0, x_3).$$
Since the order of the error of the 3/8 rule is the same as that of the 1/3 rule, there is no
gain in accuracy over the 1/3 rule when one has a free choice between the two rules.

Gaussian Quadrature
Gauss observed that if we are not required to evaluate the function at predetermined $x$-values,
a three-term formula will contain six parameters and should correspond to an
interpolating polynomial of degree five. Formulas based on this principle are called
Gaussian quadrature formulas. They can be applied only when $f(x)$ is known
explicitly, so that it can be evaluated at any desired value of $x$. Consider the simple
case of a two-term formula containing four unknown parameters:
$$\int_{-1}^{1} f(t)\,dt \approx a\,f(t_1) + b\,f(t_2);$$
a symmetrical interval of integration is used to simplify the arithmetic. This formula
is valid for any polynomial of degree three; hence it will hold if $f(t) = t^3$, $f(t) = t^2$,
$f(t) = t$, and $f(t) = 1$:
$$f(t) = t^3:\quad \int_{-1}^{1} t^3\,dt = 0 = a t_1^3 + b t_2^3;$$
$$f(t) = t^2:\quad \int_{-1}^{1} t^2\,dt = \frac{2}{3} = a t_1^2 + b t_2^2;$$
$$f(t) = t:\quad \int_{-1}^{1} t\,dt = 0 = a t_1 + b t_2; \qquad (58)$$
$$f(t) = 1:\quad \int_{-1}^{1} dt = 2 = a + b.$$
Multiplying the third equation by $t_1^2$ and subtracting from the first, we get
$$0 = 0 + b(t_2^3 - t_2 t_1^2) = b\,t_2\,(t_2 - t_1)(t_2 + t_1).$$
This implies that either $b = 0$, $t_2 = 0$, $t_2 = t_1$, or $t_1 = -t_2$. Only the last of these
possibilities is satisfactory; the others are invalid or reduce our formula to only
a single term, so we choose $t_1 = -t_2$. Then
$$a = b = 1, \qquad t_2 = -t_1 = \frac{1}{\sqrt{3}} = 0.5773, \qquad (59)$$
$$\int_{-1}^{1} f(t)\,dt \approx f(-0.5773) + f(0.5773),$$
i.e., adding these two values of the function gives the exact value for the integral of
any cubic polynomial over the interval from $-1$ to $1$.
Consider a problem with limits of integration from $a$ to $b$, not from $-1$ to $1$ for
which we derived this formula. We must change the interval of integration to $(-1, 1)$
by a change of variable, by the following scheme:
Let $x = At + B$. The lower limit $x = a$ at $t = -1$ gives $a = -A + B$, and the upper
limit $x = b$ at $t = 1$ gives $b = A + B$. Combining these two equations we get
$A = \frac{b-a}{2}$ and $B = \frac{b+a}{2}$, i.e.,
$$x = \frac{b-a}{2}\,t + \frac{b+a}{2}, \qquad \text{so that } dx = \frac{b-a}{2}\,dt,$$
then
$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{1} f\!\left(\frac{b-a}{2}\,t + \frac{b+a}{2}\right) dt.$$
Gaussian quadrature can be extended beyond two terms. The formula is then
given by
$$\int_{-1}^{1} f(t)\,dt \approx \sum_{i=1}^{n} w_i f(t_i), \qquad \text{for } n \text{ points}.$$
This formula is exact for functions $f(t)$ that are polynomials of degree $2n-1$ or
less.
Moreover, by extending the method we used previously for the 2-point formula,
for each $n$ we obtain a system of $2n$ equations:
$$w_1 t_1^k + w_2 t_2^k + w_3 t_3^k + \cdots + w_n t_n^k =
\begin{cases} 0, & \text{for } k = 1, 3, 5, \ldots, 2n-1; \\[4pt] \dfrac{2}{k+1}, & \text{for } k = 0, 2, 4, \ldots, 2n-2. \end{cases}$$
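A compact sketch (not from the notes) of the two-point Gauss-Legendre rule on [a, b], using the nodes plus and minus 1/sqrt(3) with weights 1 and the change of variable above; the function name is illustrative.

    import math

    def gauss_two_point(f, a, b):
        half = 0.5 * (b - a)
        mid = 0.5 * (b + a)
        t = 1.0 / math.sqrt(3.0)
        # weights on [-1, 1] are both 1; the factor (b-a)/2 comes from dx = (b-a)/2 dt
        return half * (f(mid - half * t) + f(mid + half * t))

    if __name__ == "__main__":
        # exact for cubics: the integral of x^3 on [0, 2] is 4
        print(gauss_two_point(lambda x: x ** 3, 0.0, 2.0))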

Adaptive Quadrature
The composite quadrature rules necessitate the use of equally spaced points. It is
useful to introduce a method that adjusts the step size to be smaller over portions
of the curve where a larger functional variation occurs. This technique is called
adaptive quadrature. The method is based on Simpson's rule.
Simpson's rule uses two subintervals over $[a_k, b_k]$:
$$\int_{a_k}^{b_k} f(x)\,dx \approx \frac{h}{3}\bigl[f(a_k) + 4f(c_k) + f(b_k)\bigr] = S(a_k, b_k), \text{ say}, \qquad (60)$$
where $c_k = \frac{1}{2}(a_k + b_k)$ is the center of $[a_k, b_k]$ and $h = \frac{b_k - a_k}{2}$. Furthermore, if
$f \in C^4[a_k, b_k]$, then there exists a value $\xi_1 \in [a_k, b_k]$ so that
$$\int_{a_k}^{b_k} f(x)\,dx = S(a_k, b_k) - \frac{h^5}{90} f^{(4)}(\xi_1). \qquad (61)$$
A composite Simpson's rule using four subintervals of $[a_k, b_k]$ can be performed by
bisecting this interval into two equal subintervals $[a_{k1}, b_{k1}]$ and $[a_{k2}, b_{k2}]$ and applying
formula (60) recursively over each piece. Only two additional evaluations of $f(x)$
are needed, and the result is
$$S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) = \frac{h}{6}\bigl[f(a_{k1}) + 4f(c_{k1}) + f(b_{k1})\bigr] + \frac{h}{6}\bigl[f(a_{k2}) + 4f(c_{k2}) + f(b_{k2})\bigr], \qquad (62)$$
where $a_{k1} = a_k$, $b_{k1} = a_{k2} = c_k$, $b_{k2} = b_k$, $c_{k1}$ is the midpoint of $[a_{k1}, b_{k1}]$ and $c_{k2}$ is
the midpoint of $[a_{k2}, b_{k2}]$. In formula (62) the step size is $\frac{h}{2}$, which accounts for the factors
$\frac{h}{6}$ on the right side of the equation. Furthermore, if $f \in C^4[a_k, b_k]$, there exists a value
$\xi_2 \in [a_k, b_k]$ so that
$$\int_{a_k}^{b_k} f(x)\,dx = S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16}\,\frac{f^{(4)}(\xi_2)}{90}. \qquad (63)$$
Assume that $f^{(4)}(\xi_1) \approx f^{(4)}(\xi_2)$; then the right sides of equations (61) and (63) are
used to obtain the relation
$$S(a_k, b_k) - \frac{h^5}{90} f^{(4)}(\xi_2) \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16}\,\frac{f^{(4)}(\xi_2)}{90}, \qquad (64)$$
which can be written as
$$\frac{h^5}{90} f^{(4)}(\xi_2) \approx \frac{16}{15}\bigl[S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\bigr]. \qquad (65)$$
Then (65) is substituted in (63) to obtain the error estimate
$$\Bigl|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\Bigr| \approx \frac{1}{15}\bigl|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\bigr|. \qquad (66)$$
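A sketch (not from the notes) of recursive adaptive Simpson quadrature built on the error estimate (66); the way the tolerance is split here is one common choice, not necessarily the one intended by the notes.

    def simpson(f, a, b):
        c = 0.5 * (a + b)
        h = 0.5 * (b - a)
        return h / 3.0 * (f(a) + 4.0 * f(c) + f(b))

    def adaptive_simpson(f, a, b, tol=1e-8):
        c = 0.5 * (a + b)
        whole = simpson(f, a, b)
        left, right = simpson(f, a, c), simpson(f, c, b)
        # eq. (66): the error of left+right is about 1/15 of the difference
        if abs(left + right - whole) < 15.0 * tol:
            return left + right + (left + right - whole) / 15.0
        return adaptive_simpson(f, a, c, tol / 2.0) + adaptive_simpson(f, c, b, tol / 2.0)

    if __name__ == "__main__":
        import math
        print(adaptive_simpson(math.sin, 0.0, math.pi))   # exact value is 2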

Numerical Integration using Cubic Splines


Cubic splines can be used for finding derivatives and integrals of functions, even when
the function is known only as a table of values. The smoothness of splines, because
of the requirements that each portion have the same first and second derivatives as
its neighbor where they join, can give improved accuracy in some cases.
For the cubic spline that approximates $f(x)$, we can write, for the interval $x_i \leq x \leq x_{i+1}$,
$$f(x) \approx a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i,$$
where the coefficients, determined as in the section on cubic splines, are
$$a_i = \frac{S_{i+1} - S_i}{6(x_{i+1} - x_i)}, \qquad b_i = \frac{S_i}{2},$$
$$c_i = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i) S_{i+1} + 2(x_{i+1} - x_i) S_i}{6}, \qquad d_i = f(x_i).$$
Approximating the integral of $f(x)$ over the $n$ intervals where $f(x)$ is approximated by the spline is straightforward:
$$\int_{x_1}^{x_{n+1}} f(x)\,dx \approx \sum_{i=1}^{n} \int_{x_i}^{x_{i+1}} g_i(x)\,dx
= \sum_{i=1}^{n} \left[\frac{a_i}{4}(x - x_i)^4 + \frac{b_i}{3}(x - x_i)^3 + \frac{c_i}{2}(x - x_i)^2 + d_i(x - x_i)\right]_{x_i}^{x_{i+1}}$$
$$= \sum_{i=1}^{n} \left(\frac{a_i}{4}(x_{i+1} - x_i)^4 + \frac{b_i}{3}(x_{i+1} - x_i)^3 + \frac{c_i}{2}(x_{i+1} - x_i)^2 + d_i(x_{i+1} - x_i)\right).$$
If the intervals are all of the same size ($h = x_{i+1} - x_i$), this equation becomes
$$\int_{x_1}^{x_{n+1}} f(x)\,dx \approx \frac{h^4}{4}\sum_{i=1}^{n} a_i + \frac{h^3}{3}\sum_{i=1}^{n} b_i + \frac{h^2}{2}\sum_{i=1}^{n} c_i + h\sum_{i=1}^{n} d_i.$$
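A tiny sketch (not from the notes): summing the last formula above, given per-interval coefficient lists a, b, c, d and a common interval width h.

    def spline_integral_equal_h(a, b, c, d, h):
        return (h**4 / 4.0) * sum(a) + (h**3 / 3.0) * sum(b) + (h**2 / 2.0) * sum(c) + h * sum(d)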

Numerical Differentiation
For numerical differentiation we shall use the Taylor series for a function $y = f(x)$ at
$(x_i + h)$ expanded about $x_i$:
$$f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots, \qquad (67)$$
where $h = \Delta x$, $f(x_i)$ is the ordinate corresponding to $x_i$, and $(x_i + h)$ is in the
region of convergence. The function at $(x_i - h)$ is similarly given by
$$f(x_i - h) = f(x_i) - h f'(x_i) + \frac{h^2}{2!} f''(x_i) - \frac{h^3}{3!} f'''(x_i) + \cdots. \qquad (68)$$
Subtracting equation (68) from equation (67), we obtain
$$f'(x_i) = \frac{f(x_i + h) - f(x_i - h)}{2h} - \left[\frac{h^2}{6} f'''(x_i) + \cdots\right]. \qquad (69)$$
If we designate equally spaced points to the right of $x_i$ as $x_{i+1}, x_{i+2}$, and so on,
and those to the left of $x_i$ as $x_{i-1}, x_{i-2}$, and identify the corresponding ordinates as
$f_{i+1}, f_{i+2}, f_{i-1}, f_{i-2}$, respectively, equation (69) can be written in the form
$$f_i' = \frac{f_{i+1} - f_{i-1}}{2h}, \qquad (70)$$
with error of order $h^2$. Equation (70) is called the central-difference approximation
of $f'(x)$ at $x_i$, with error of order $h^2$.
If we add equations (67) and (68), we may write the second derivative as
$$f_i'' = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} - \left[\frac{1}{12} f_i^{iv} h^2 + \cdots\right]. \qquad (71)$$
The approximate expression using only the first term to the right of the equal sign
is the central-difference approximation of the second derivative of the function
at $x_i$, with error of order $h^2$. As with the first derivative, approximations with
error of higher order can also be derived.
To obtain an expression for the third derivative, we expand the function at $x_{i-2}$ and
at $x_{i+2}$ such that
$$f_{i+2} = f_i + 2h f_i' + \frac{(2h)^2}{2!} f_i'' + \frac{(2h)^3}{3!} f_i''' + \cdots, \qquad (72)$$
$$f_{i-2} = f_i - 2h f_i' + \frac{(2h)^2}{2!} f_i'' - \frac{(2h)^3}{3!} f_i''' + \cdots. \qquad (73)$$
With the help of equations (67), (68), (72), and (73) we derive
$$f_i''' = \frac{f_{i+2} - 2f_{i+1} + 2f_{i-1} - f_{i-2}}{2h^3}. \qquad (74)$$
Equation (74) gives the central-difference approximation for the third derivative
of the function $f(x)$ at $x_i$, with error of order $h^2$.
Successively higher derivatives can be obtained by this method, but since they
require the solution of an increasingly larger number of simultaneous equations, the
process becomes quite tedious. The same technique may also be used to find more
accurate expressions for the derivatives, by using additional terms in the Taylor-series expansion.
It can be noted that the central-difference expressions for the various derivatives
involve values of the function on both sides of the $x$ value at which the derivative of the function is desired. Using Taylor-series expansions we can also obtain
expressions for the derivatives which are entirely in terms of values of the function
at $x_i$ and points to the right of $x_i$. These are known as forward-finite-difference
expressions. In a similar manner, derivative expressions which are entirely in terms
of values of the function at $x_i$ and points to the left of $x_i$ can be found. These are
known as backward-finite-difference expressions. In numerical differentiation,
forward-finite-difference expressions are used when data to the left of a point at
which a derivative is desired are not available, and backward-difference expressions
are used when data to the right of the desired point are not available. Central-difference
expressions, however, are more accurate than either forward- or backward-difference
expressions.

Central-Difference Expressions with Error of Order $h^2$
$$f_i' = \frac{f_{i+1} - f_{i-1}}{2h} \qquad (75)$$
$$f_i'' = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} \qquad (76)$$
$$f_i''' = \frac{f_{i+2} - 2f_{i+1} + 2f_{i-1} - f_{i-2}}{2h^3} \qquad (77)$$
$$f_i^{iv} = \frac{f_{i+2} - 4f_{i+1} + 6f_i - 4f_{i-1} + f_{i-2}}{h^4} \qquad (78)$$
Central-Difference Expressions with Error of Order $h^4$
$$f_i' = \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h} \qquad (79)$$
$$f_i'' = \frac{-f_{i+2} + 16f_{i+1} - 30f_i + 16f_{i-1} - f_{i-2}}{12h^2} \qquad (80)$$
$$f_i''' = \frac{-f_{i+3} + 8f_{i+2} - 13f_{i+1} + 13f_{i-1} - 8f_{i-2} + f_{i-3}}{8h^3} \qquad (81)$$
$$f_i^{iv} = \frac{-f_{i+3} + 12f_{i+2} - 39f_{i+1} + 56f_i - 39f_{i-1} + 12f_{i-2} - f_{i-3}}{6h^4} \qquad (82)$$
Forward-Difference Expressions with Error of Order $h$
$$f_i' = \frac{f_{i+1} - f_i}{h} \qquad (83)$$
$$f_i'' = \frac{f_{i+2} - 2f_{i+1} + f_i}{h^2} \qquad (84)$$
$$f_i''' = \frac{f_{i+3} - 3f_{i+2} + 3f_{i+1} - f_i}{h^3} \qquad (85)$$
$$f_i^{iv} = \frac{f_{i+4} - 4f_{i+3} + 6f_{i+2} - 4f_{i+1} + f_i}{h^4} \qquad (86)$$
Forward-Difference Expressions with Error of Order $h^2$
$$f_i' = \frac{-f_{i+2} + 4f_{i+1} - 3f_i}{2h} \qquad (87)$$
$$f_i'' = \frac{-f_{i+3} + 4f_{i+2} - 5f_{i+1} + 2f_i}{h^2} \qquad (88)$$
$$f_i''' = \frac{-3f_{i+4} + 14f_{i+3} - 24f_{i+2} + 18f_{i+1} - 5f_i}{2h^3} \qquad (89)$$
$$f_i^{iv} = \frac{-2f_{i+5} + 11f_{i+4} - 24f_{i+3} + 26f_{i+2} - 14f_{i+1} + 3f_i}{h^4} \qquad (90)$$
Backward-Difference Expressions with Error of Order $h$
$$f_i' = \frac{f_i - f_{i-1}}{h} \qquad (91)$$
$$f_i'' = \frac{f_i - 2f_{i-1} + f_{i-2}}{h^2} \qquad (92)$$
$$f_i''' = \frac{f_i - 3f_{i-1} + 3f_{i-2} - f_{i-3}}{h^3} \qquad (93)$$
$$f_i^{iv} = \frac{f_i - 4f_{i-1} + 6f_{i-2} - 4f_{i-3} + f_{i-4}}{h^4} \qquad (94)$$
Backward-Difference Expressions with Error of Order $h^2$
$$f_i' = \frac{f_{i-2} - 4f_{i-1} + 3f_i}{2h} \qquad (95)$$
$$f_i'' = \frac{-f_{i-3} + 4f_{i-2} - 5f_{i-1} + 2f_i}{h^2} \qquad (96)$$
$$f_i''' = \frac{3f_{i-4} - 14f_{i-3} + 24f_{i-2} - 18f_{i-1} + 5f_i}{2h^3} \qquad (97)$$
$$f_i^{iv} = \frac{-2f_{i-5} + 11f_{i-4} - 24f_{i-3} + 26f_{i-2} - 14f_{i-1} + 3f_i}{h^4} \qquad (98)$$
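A quick sketch (not from the notes) of the order-$h^2$ central-difference formulas (75) and (76) applied to a callable f at a point x; the step sizes shown are illustrative defaults.

    def central_first(f, x, h=1e-5):
        return (f(x + h) - f(x - h)) / (2.0 * h)

    def central_second(f, x, h=1e-4):
        return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

    if __name__ == "__main__":
        import math
        # derivatives of sin at 0.5 are cos(0.5) and -sin(0.5)
        print(central_first(math.sin, 0.5), central_second(math.sin, 0.5))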

Derivatives from Difference Tables


$$f[x_i, x_{i+1}] = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} \approx f'(x_i),$$
$$f[x_i, x_{i+1}, x_{i+2}] = \frac{f[x_{i+1}, x_{i+2}] - f[x_i, x_{i+1}]}{x_{i+2} - x_i} \approx \frac{f'(x_{i+1}) - f'(x_i)}{2h} \approx \frac{f''(x_i)}{2},$$
$$f[x_i, x_{i+1}, x_{i+2}, x_{i+3}] = \frac{f[x_{i+1}, x_{i+2}, x_{i+3}] - f[x_i, x_{i+1}, x_{i+2}]}{x_{i+3} - x_i} \approx \frac{f''(x_{i+1}) - f''(x_i)}{2\cdot 3h} \approx \frac{f'''(x_i)}{3!},$$
$$\vdots$$
$$f[x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+n}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+n}] - f[x_i, x_{i+1}, \ldots, x_{i+n-1}]}{x_{i+n} - x_i} \approx \frac{f^{(n)}(x_i)}{n!}.$$
Similarly, for evenly spaced data points,
$$f'(x_i) \approx \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} = \frac{\Delta f(x_i)}{h},$$
$$f''(x_i) \approx \frac{\Delta^2 f(x_i)}{h^2} = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{h^2},$$
$$f'''(x_i) \approx \frac{\Delta^3 f(x_i)}{h^3} = \frac{f(x_{i+3}) - 3f(x_{i+2}) + 3f(x_{i+1}) - f(x_i)}{h^3},$$
$$\vdots$$
$$f^{(n)}(x_i) \approx \frac{\Delta^n f(x_i)}{h^n} = \frac{f(x_{i+n}) - \binom{n}{1} f(x_{i+n-1}) + \binom{n}{2} f(x_{i+n-2}) - \cdots + (-1)^n f(x_i)}{h^n}.$$

Multi-Dimensional Finite Difference Formulas


Multidimensional finite difference formulas can be derived using the results of one-dimensional formulas. For two dimensions, we consider
$$x_i = x_0 + i\,\Delta x, \qquad y_j = y_0 + j\,\Delta y.$$
The first partial derivatives in the $x$- and $y$-directions are
$$\left(\frac{\partial f}{\partial x}\right)_{ij} = \frac{f_{i+1,j} - f_{i,j}}{\Delta x} + O(\Delta x),$$
$$\left(\frac{\partial f}{\partial y}\right)_{ij} = \frac{f_{i,j+1} - f_{i,j}}{\Delta y} + O(\Delta y).$$
Similarly, the second-order central difference formulas for the second-order derivatives are of the form
$$\left(\frac{\partial^2 f}{\partial x^2}\right)_{ij} \approx \frac{f_{i+1,j} - 2f_{i,j} + f_{i-1,j}}{\Delta x^2} - \frac{\Delta x^2}{12}\frac{\partial^4 f}{\partial x^4},$$
$$\left(\frac{\partial^2 f}{\partial y^2}\right)_{ij} \approx \frac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{\Delta y^2} - \frac{\Delta y^2}{12}\frac{\partial^4 f}{\partial y^4},$$
$$\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)_{ij} = \frac{f_{i+1,j+1} - f_{i+1,j} - f_{i,j+1} + f_{i,j}}{\Delta x\,\Delta y} + O(\Delta x) + O(\Delta y),$$
$$\left(\frac{\partial^2 f}{\partial y\,\partial x}\right)_{ij} = \frac{f_{i+1,j+1} - f_{i+1,j-1} - f_{i-1,j+1} + f_{i-1,j-1}}{4\,\Delta x\,\Delta y} + O(\Delta x^2) + O(\Delta y^2).$$
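A brief sketch (not from the notes): the second-order central differences above combined to approximate the Laplacian of a callable f at a point (x, y); the step sizes are illustrative.

    def laplacian_2d(f, x, y, hx=1e-3, hy=1e-3):
        d2x = (f(x + hx, y) - 2.0 * f(x, y) + f(x - hx, y)) / (hx * hx)
        d2y = (f(x, y + hy) - 2.0 * f(x, y) + f(x, y - hy)) / (hy * hy)
        return d2x + d2y

    if __name__ == "__main__":
        # f(x, y) = x^2 + y^2 has Laplacian 4 everywhere
        print(laplacian_2d(lambda x, y: x * x + y * y, 1.0, 2.0))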

Numerical Solution of Ordinary Differential Equations

Definition:- A solution to the initial value problem (I.V.P.)
$$y' = f(t, y) \qquad \text{with } y(a) = y_0$$
on an interval $[a, b]$ is a differentiable function $y = y(t)$ such that
$$y(a) = y_0 \qquad \text{and} \qquad y'(t) = f(t, y(t)) \text{ for all } t \in [a, b].$$
Definition:- Given a rectangle $R = \{(t, y) : a \leq t \leq b,\ c \leq y \leq d\}$, assume that $f(t, y)$ is
continuous on $R$. The function $f$ is said to satisfy a Lipschitz condition in the
variable $y$ on $R$ provided that a constant $L > 0$ exists with the property that
$$|f(t, y_1) - f(t, y_2)| \leq L\,|y_1 - y_2|$$
whenever $(t, y_1), (t, y_2) \in R$. The constant $L$ is called a Lipschitz constant for $f$.
Theorem:- Existence and Uniqueness.
Assume that $f(t, y)$ is continuous on $R = \{(t, y) : t_0 \leq t \leq b,\ c \leq y \leq d\}$. If
$f$ satisfies a Lipschitz condition on $R$ in the variable $y$ and $(t_0, y_0) \in R$, then the
initial value problem
$$y' = f(t, y) \qquad \text{with } y(t_0) = y_0$$
has a unique solution $y = y(t)$ on some subinterval $t_0 \leq t \leq t_0 + \delta$.
Definition:- Assume that $\{(t_k, y_k),\ k = 0, 1, 2, \ldots, m\}$ is the set of discrete approximations
and that $y = y(t)$ is the unique solution to the initial value problem.
The global discretization error $e_k$ is defined by
$$e_k = y(t_k) - y_k \qquad \text{for } k = 0, 1, 2, \ldots, m.$$
It is the difference between the unique solution and the solution obtained by the
discrete variable method.
The local discretization error $\epsilon_{k+1}$ is defined by
$$\epsilon_{k+1} = y(t_{k+1}) - y_k - h\,\Phi(t_k, y_k) \qquad \text{for } k = 0, 1, 2, \ldots, m-1,$$
where $\Phi(t_k, y_k)$ denotes the increment function of the one-step method.
It is the error committed in the single step from $t_k$ to $t_{k+1}$.

Euler Methods
Euler methods consist of three versions
1. forward Euler method,
2. modified Euler method,
3. backward Euler method.

Forward Euler Method


Let $[a, b]$ be the interval over which we want to find the solution to the well-posed
initial value problem (I.V.P.)
$$y' = f(t, y) \qquad \text{with } y(a) = y_0.$$
Subdivide the interval $[a, b]$ into $m$ equal subintervals and select the mesh points
$$t_k = a + kh \qquad \text{for } k = 0, 1, 2, 3, \ldots, m, \text{ where } h = \frac{b-a}{m}. \qquad (99)$$
The value $h$ is called the step size. We now proceed to solve approximately
$$y' = f(t, y) \qquad \text{over } [t_0, t_m] \text{ with } y(t_0) = y_0. \qquad (100)$$
Assume that $y(t)$, $y'(t)$, and $y''(t)$ are continuous and use Taylor's theorem to
expand $y(t)$ about $t = t_0$. For each value $t$, there exists a value $\tau$ that lies between
$t_0$ and $t$ so that
$$y(t) = y(t_0) + y'(t_0)(t - t_0) + \frac{y''(\tau)(t - t_0)^2}{2}. \qquad (101)$$
When $y'(t_0) = f(t_0, y(t_0))$ and $h = t_1 - t_0$ are substituted in equation (101), the
result is an expression for $y(t_1)$:
$$y(t_1) = y(t_0) + h f(t_0, y(t_0)) + y''(\tau)\frac{h^2}{2}. \qquad (102)$$
If the step size $h$ is chosen small enough, then we may neglect the second-order term
(involving $h^2$) and get
$$y_1 = y_0 + h f(t_0, y_0), \qquad (103)$$
which is the forward Euler approximation. The process is repeated and generates
a sequence of points that approximates the solution curve $y = y(t)$. The general step
for the forward Euler method is
$$t_{k+1} = t_k + h, \qquad y_{k+1} = y_k + h f(t_k, y_k) \qquad \text{for } k = 0, 1, 2, \ldots, m-1. \qquad (104)$$
The trouble with this very simple method is its lack of accuracy, requiring an
extremely small step size. We might improve this method with just a little additional
effort.
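A minimal sketch (not from the notes) of the forward Euler step (104) for y' = f(t, y), y(a) = y0, over [a, b] with m equal steps.

    def forward_euler(f, a, b, y0, m):
        h = (b - a) / m
        t, y = a, y0
        out = [(t, y)]
        for _ in range(m):
            y = y + h * f(t, y)
            t = t + h
            out.append((t, y))
        return out

    if __name__ == "__main__":
        import math
        # y' = y, y(0) = 1, so y(1) = e
        pts = forward_euler(lambda t, y: y, 0.0, 1.0, 1.0, 1000)
        print(pts[-1][1], math.e)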

Higher-Order Taylor-Series Method


Since Euler's method was derived using Taylor's theorem with $n = 1$ to approximate the solution of the differential equation, our first attempt to find methods
for improving the convergence properties of difference methods is to extend this
technique of derivation to larger values of $n$.
Suppose the solution $y(t)$ to the I.V.P. (100) has $(n+1)$ continuous derivatives.
If we expand the solution $y(t)$ in terms of its $n$th Taylor polynomial about $t_k$ and
evaluate it at $t_{k+1}$, we obtain
$$y(t_{k+1}) = y(t_k) + h y'(t_k) + \frac{h^2}{2} y''(t_k) + \cdots + \frac{h^n}{n!} y^{(n)}(t_k) + \frac{h^{n+1}}{(n+1)!} y^{(n+1)}(\xi_k), \qquad (105)$$
for some $\xi_k \in (t_k, t_{k+1})$. Successive differentiation of the solution $y(t)$ gives
$$y'(t) = f(t, y(t)), \qquad y''(t) = f'(t, y(t)),$$
and in general,
$$y^{(k)}(t) = f^{(k-1)}(t, y(t)).$$
Substituting these results into equation (105) gives
$$y(t_{k+1}) = y(t_k) + h f(t_k, y(t_k)) + \frac{h^2}{2} f'(t_k, y(t_k)) + \cdots
+ \frac{h^n}{n!} f^{(n-1)}(t_k, y(t_k)) + \frac{h^{n+1}}{(n+1)!} f^{(n)}(\xi_k, y(\xi_k)). \qquad (106)$$
The difference-equation method corresponding to equation (106) is obtained by
deleting the remainder term involving $\xi_k$. This method is called the Taylor method
of order $n$.
Note that Euler's method is Taylor's method of order one.

Heun's Method
Heun's method introduces a new idea for constructing an algorithm to solve the
I.V.P.
$$y'(t) = f(t, y(t)) \qquad \text{over } [a, b] \text{ with } y(t_0) = y_0. \qquad (107)$$
To obtain the solution point $(t_1, y_1)$ we can use the fundamental theorem of calculus
and integrate $y'(t)$ over $[t_0, t_1]$ to get
$$\int_{t_0}^{t_1} f(t, y(t))\,dt = \int_{t_0}^{t_1} y'(t)\,dt = y(t_1) - y(t_0), \qquad (108)$$
where the antiderivative of $y'(t)$ is the desired function $y(t)$. When equation (108)
is solved for $y(t_1)$, the result is
$$y(t_1) = y(t_0) + \int_{t_0}^{t_1} f(t, y(t))\,dt. \qquad (109)$$
Now a numerical integration method can be used to approximate the definite
integral in (109). If the trapezoidal rule is used with step size $h = t_1 - t_0$, then the
result is
$$y(t_1) \approx y(t_0) + \frac{h}{2}\bigl[f(t_0, y(t_0)) + f(t_1, y(t_1))\bigr]. \qquad (110)$$
Notice that the formula on the right-hand side of (110) involves the yet to be
determined value $y(t_1)$. To proceed, we use an estimate for $y(t_1)$. Euler's solution
will suffice for this purpose. After it is substituted into (110), the resulting formula
for finding $(t_1, y_1)$ is called Heun's method (or the modified Euler method):
$$y(t_1) = y(t_0) + \frac{h}{2}\bigl[f(t_0, y_0) + f(t_1, y_0 + h f(t_0, y_0))\bigr]. \qquad (111)$$
The process is repeated and generates a sequence of points that approximates the
solution curve $y = y(t)$. At each step, Euler's method is used as a predictor, and
then the trapezoidal rule is used to make a correction to obtain the final value. The
general step for Heun's method is
$$p_{k+1} = y_k + h f(t_k, y_k), \qquad t_{k+1} = t_k + h, \qquad
y_{k+1} = y_k + \frac{h}{2}\bigl[f(t_k, y_k) + f(t_{k+1}, p_{k+1})\bigr]. \qquad (112)$$
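A short sketch (not from the notes) of Heun's predictor-corrector step (112) for a callable f; the function name is illustrative.

    def heun(f, a, b, y0, m):
        h = (b - a) / m
        t, y = a, y0
        for _ in range(m):
            p = y + h * f(t, y)                        # Euler predictor
            y = y + 0.5 * h * (f(t, y) + f(t + h, p))  # trapezoidal corrector
            t = t + h
        return y

    if __name__ == "__main__":
        import math
        print(heun(lambda t, y: y, 0.0, 1.0, 1.0, 100), math.e)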

Error for Heun's Method
The error term for the trapezoidal rule used to approximate the integral in (109) is
$$-\frac{h^3}{12} y''(\xi_k). \qquad (113)$$
If the only error at each step is that given in (113), after $m$ steps the accumulated
error for Heun's method would be
$$-\sum_{k=1}^{m} \frac{h^3}{12} y''(\xi_k) \approx -\frac{b-a}{12} y''(\xi)\,h^2 = O(h^2). \qquad (114)$$
The Taylor methods have the desirable property of higher-order local truncation
error, but the disadvantage of requiring the computation and evaluation of the
derivatives of $f(t, y)$. This is a complicated and time-consuming procedure for most
problems, so the Taylor methods are seldom used in practice.

Runge-Kutta Methods (R-K Methods)


Runge-Kutta methods have the high-order local truncation error of the Taylor methods
while eliminating the need to compute and evaluate the derivatives of $f(t, y)$. The
idea behind the derivation of Runge-Kutta methods needs Taylor's theorem in
two variables.
Theorem:- Suppose that $f(t, y)$ and all its partial derivatives of order less than or equal to
$(n+1)$ are continuous on $D = \{(t, y) :\ a \leq t \leq b,\ c \leq y \leq d\}$, and let $(t_0, y_0) \in D$.
For every $(t, y) \in D$, there exist $\xi$ between $t$ and $t_0$ and $\mu$ between $y$ and $y_0$ with
$$f(t, y) = P_n(t, y) + R_n(t, y),$$
where
$$P_n(t, y) = f(t_0, y_0) + (t - t_0)\frac{\partial f}{\partial t}(t_0, y_0) + (y - y_0)\frac{\partial f}{\partial y}(t_0, y_0)$$
$$+ \frac{(t - t_0)^2}{2}\frac{\partial^2 f}{\partial t^2}(t_0, y_0) + (t - t_0)(y - y_0)\frac{\partial^2 f}{\partial t\,\partial y}(t_0, y_0) + \frac{(y - y_0)^2}{2}\frac{\partial^2 f}{\partial y^2}(t_0, y_0)$$
$$+ \cdots + \frac{1}{n!}\sum_{j=0}^{n} \binom{n}{j}(t - t_0)^{n-j}(y - y_0)^{j}\,\frac{\partial^n f}{\partial t^{n-j}\,\partial y^{j}}(t_0, y_0) \qquad (115)$$
and
$$R_n(t, y) = \frac{1}{(n+1)!}\sum_{j=0}^{n+1} \binom{n+1}{j}(t - t_0)^{n+1-j}(y - y_0)^{j}\,\frac{\partial^{n+1} f}{\partial t^{n+1-j}\,\partial y^{j}}(\xi, \mu). \qquad (116)$$
The function $P_n(t, y)$ is called the $n$th Taylor polynomial in two variables for the
function $f$ about $(t_0, y_0)$, and $R_n(t, y)$ is the remainder associated with $P_n(t, y)$.
To develop the Runge-Kutta method of order 2, let us start with the Taylor polynomial
$$y(t_{i+1}) = y(t_i) + h y'(t_i) + \frac{h^2}{2} y''(t_i) + \frac{h^3}{3!} y'''(\xi)
= y(t_i) + h f(t_i, y(t_i)) + \frac{h^2}{2} f'(t_i, y(t_i)) + \frac{h^3}{3!} y'''(\xi).$$
Since
$$f'(t_i, y(t_i)) = \frac{\partial f}{\partial t}(t_i, y(t_i)) + \frac{\partial f}{\partial y}(t_i, y(t_i))\cdot y'(t_i)
\qquad \text{and} \qquad y'(t_i) = f(t_i, y(t_i)),$$
this implies
$$y(t_{i+1}) = y(t_i) + h\left\{f(t_i, y(t_i)) + \frac{h}{2}\frac{\partial f}{\partial t}(t_i, y(t_i))
+ \frac{h}{2}\frac{\partial f}{\partial y}(t_i, y(t_i))\cdot f(t_i, y(t_i))\right\} + \frac{h^3}{3!} y'''(\xi).$$
Expanding
$$a_1 f(t_i + \alpha_1,\ y_i + \beta_1)$$
in its Taylor polynomial of degree one about $(t_i, y_i)$ gives
$$a_1 f(t_i + \alpha_1,\ y_i + \beta_1) \approx a_1 f(t_i, y(t_i)) + a_1\alpha_1\frac{\partial f}{\partial t}(t_i, y(t_i))
+ a_1\beta_1\frac{\partial f}{\partial y}(t_i, y(t_i)).$$
Matching the coefficients of $f$ and its derivatives enclosed in the braces in the preceding equation gives
$$f(t, y):\ a_1 = 1; \qquad \frac{\partial f}{\partial t}(t, y):\ a_1\alpha_1 = \frac{h}{2}; \qquad \frac{\partial f}{\partial y}(t, y):\ a_1\beta_1 = \frac{h}{2} f(t, y).$$
The parameters $a_1$, $\alpha_1$, and $\beta_1$ are uniquely determined to be
$$a_1 = 1, \qquad \alpha_1 = \frac{h}{2}, \qquad \beta_1 = \frac{h}{2} f(t, y);$$
so
$$y(t_{i+1}) \approx y(t_i) + h\,f\!\left(t_i + \frac{h}{2},\ y(t_i) + \frac{h}{2} f(t_i, y(t_i))\right).$$
The error introduced by replacing the term in the Taylor method with its approximation has the same order as the error term for the method. The Runge-Kutta
method produced in this way, called the Midpoint method, is also a second-order method. As a consequence, the local error of the method is proportional to
$h^3$, and the global error is proportional to $h^2$.

Midpoint Method
$$w_0 = \alpha, \qquad w_{i+1} = w_i + h\,f\!\left(t_i + \frac{h}{2},\ w_i + \frac{h}{2} f(t_i, w_i)\right),$$
where $i = 0, 1, \ldots, N-1$, with local error $O(h^3)$ and global error $O(h^2)$.
Using $a_1 f(t + \alpha_1,\ y + \beta_1)$ to replace the term in the Taylor method is the easiest
choice, but it is not the only one. If we instead use a term of the form
$$a_1 f(t, y) + a_2 f\bigl(t + \alpha_2,\ y + \delta_2 f(t, y)\bigr),$$
the extra parameter in this formula provides an infinite number of second-order
Runge-Kutta formulas. When $a_1 = a_2 = \tfrac{1}{2}$ and $\alpha_2 = \delta_2 = h$, we have the
Modified Euler method.

Modified Euler Method
$$w_0 = \alpha, \qquad w_{i+1} = w_i + \frac{h}{2}\bigl[f(t_i, w_i) + f(t_{i+1},\ w_i + h f(t_i, w_i))\bigr],$$
where $i = 0, 1, \ldots, N-1$, with local error $O(h^3)$ and global error $O(h^2)$.
Higher-order Taylor formulas can be converted into Runge-Kutta techniques in
a similar way, but the algebra becomes tedious. The most common Runge-Kutta
method is of order 4 and is obtained by expanding an expression that involves
only four function evaluations. Deriving this expression requires solving a system of
equations involving 12 unknowns. Once the algebra has been performed, the method
has the following simple representation.

Runge-Kutta Method of Order 4


$$w_0 = \alpha,$$
$$k_1 = h\,f(t_i, w_i),$$
$$k_2 = h\,f\!\left(t_i + \tfrac{1}{2}h,\ w_i + \tfrac{1}{2}k_1\right),$$
$$k_3 = h\,f\!\left(t_i + \tfrac{1}{2}h,\ w_i + \tfrac{1}{2}k_2\right),$$
$$k_4 = h\,f(t_{i+1},\ w_i + k_3),$$
$$w_{i+1} = w_i + \frac{1}{6}\bigl(k_1 + 2k_2 + 2k_3 + k_4\bigr),$$
where $i = 0, 1, \ldots, N-1$, with local error $O(h^5)$ and global error $O(h^4)$.
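A compact sketch (not from the notes) of the classical fourth-order Runge-Kutta step above for a callable f.

    def rk4(f, a, b, y0, n):
        h = (b - a) / n
        t, w = a, y0
        for _ in range(n):
            k1 = h * f(t, w)
            k2 = h * f(t + 0.5 * h, w + 0.5 * k1)
            k3 = h * f(t + 0.5 * h, w + 0.5 * k2)
            k4 = h * f(t + h, w + k3)
            w = w + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
            t = t + h
        return w

    if __name__ == "__main__":
        import math
        print(rk4(lambda t, y: y, 0.0, 1.0, 1.0, 20), math.e)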

Predictor-Corrector Methods
The Taylor and Runge-Kutta methods are examples of one-step methods for approximating the solution to initial-value problems. These methods use $w_i$ in the
approximation $w_{i+1}$ to $y(t_{i+1})$ but do not involve any of the prior approximations
$w_0, w_1, \ldots, w_{i-1}$. Generally some functional evaluations of $f$ are required at intermediate points, but these are discarded as soon as $w_{i+1}$ is obtained.
Since the accuracy of the approximation $w_j$ to $y(t_j)$ decreases as $j$ increases, better approximation
methods can be derived if, when approximating $y(t_{i+1})$, we include in the method
some of the approximations prior to $w_i$. Methods developed using this philosophy
are called multistep methods.
In brief, one-step methods consider what occurred at only one previous step;
multistep methods consider what happened at more than one previous step. To
derive a multistep method, suppose that the solution to the initial-value problem
$$\frac{dy}{dt} = f(t, y), \qquad \text{for } a \leq t \leq b \text{ with } y(a) = \alpha,$$
is integrated over the interval $[t_i, t_{i+1}]$. Then
$$y(t_{i+1}) - y(t_i) = \int_{t_i}^{t_{i+1}} y'(t)\,dt = \int_{t_i}^{t_{i+1}} f(t, y(t))\,dt,$$
and
$$y(t_{i+1}) = y(t_i) + \int_{t_i}^{t_{i+1}} f(t, y(t))\,dt.$$
Since we cannot integrate $f(t, y(t))$ without knowing $y(t)$, which is the solution to
the problem, we instead integrate an interpolating polynomial, $P(t)$, determined by
some of the previously obtained data points
$$(t_0, w_0), (t_1, w_1), \ldots, (t_i, w_i).$$
When we assume, in addition, that $y(t_i) \approx w_i$, we have
$$y(t_{i+1}) \approx w_i + \int_{t_i}^{t_{i+1}} P(t)\,dt.$$
If $w_{m+1}$ is the first approximation generated by the multistep method, then
we need to supply starting values $w_0, w_1, \ldots, w_m$ for the method. These starting
values are generated using a one-step Runge-Kutta method with the same error
characteristics as the multistep method.
There are two distinct classes of multistep methods. In an explicit method,
$w_{i+1}$ does not involve the function evaluation $f(t_{i+1}, w_{i+1})$. A method that does
depend in part on $f(t_{i+1}, w_{i+1})$ is implicit.
Some of the explicit multistep methods, together with their required starting
values and local error terms, are given as follows.

Adams-Bashforth Two-Step Explicit Method

$$w_0 = \alpha, \qquad w_1 = \alpha_1,$$
$$w_{i+1} = w_i + \frac{h}{2}\bigl(3f(t_i, w_i) - f(t_{i-1}, w_{i-1})\bigr),$$
where $i = 1, \ldots, N-1$, with local error $\frac{5}{12} y'''(\mu_i) h^3$ for some $\mu_i$ in $(t_{i-1}, t_{i+1})$.

Adams-Bashforth Three-Step Explicit Method


$$w_0 = \alpha, \qquad w_1 = \alpha_1, \qquad w_2 = \alpha_2,$$
$$w_{i+1} = w_i + \frac{h}{12}\bigl(23f(t_i, w_i) - 16f(t_{i-1}, w_{i-1}) + 5f(t_{i-2}, w_{i-2})\bigr),$$
where $i = 2, 3, \ldots, N-1$, with local error $\frac{3}{8} y^{(4)}(\mu_i) h^4$ for some $\mu_i \in (t_{i-2}, t_{i+1})$.

Adams-Bashforth Four-Step Explicit Method

$$w_0 = \alpha, \qquad w_1 = \alpha_1, \qquad w_2 = \alpha_2, \qquad w_3 = \alpha_3,$$
$$w_{i+1} = w_i + \frac{h}{24}\bigl(55f(t_i, w_i) - 59f(t_{i-1}, w_{i-1}) + 37f(t_{i-2}, w_{i-2}) - 9f(t_{i-3}, w_{i-3})\bigr),$$
where $i = 3, 4, \ldots, N-1$, with local error $\frac{251}{720} y^{(5)}(\mu_i) h^5$ for some $\mu_i \in (t_{i-3}, t_{i+1})$.

Adams-Bashforth Five-Step Explicit Method


$$w_0 = \alpha, \qquad w_1 = \alpha_1, \qquad w_2 = \alpha_2, \qquad w_3 = \alpha_3, \qquad w_4 = \alpha_4,$$
$$w_{i+1} = w_i + \frac{h}{720}\bigl(1901f(t_i, w_i) - 2774f(t_{i-1}, w_{i-1}) + 2616f(t_{i-2}, w_{i-2}) - 1274f(t_{i-3}, w_{i-3}) + 251f(t_{i-4}, w_{i-4})\bigr),$$
where $i = 4, 5, \ldots, N-1$, with local error $\frac{95}{288} y^{(6)}(\mu_i) h^6$ for some $\mu_i \in (t_{i-4}, t_{i+1})$.
Implicit methods use $(t_{i+1}, f(t_{i+1}, y(t_{i+1})))$ as an additional interpolation node
in the approximation of the integral
$$\int_{t_i}^{t_{i+1}} f(t, y(t))\,dt.$$
Some of the more common implicit methods are listed here. Notice that the local
error of an $(m-1)$-step implicit method is $O(h^{m+1})$, the same as that of an $m$-step
explicit method. They both use $m$ function evaluations, however, since the implicit
methods use $f(t_{i+1}, w_{i+1})$, but the explicit methods do not.

Adams-Moulton Two-Step Implicit Method

$$w_0 = \alpha, \qquad w_1 = \alpha_1,$$
$$w_{i+1} = w_i + \frac{h}{12}\bigl(5f(t_{i+1}, w_{i+1}) + 8f(t_i, w_i) - f(t_{i-1}, w_{i-1})\bigr),$$
where $i = 1, 2, \ldots, N-1$, with local error $-\frac{1}{24} y^{(4)}(\mu_i) h^4$ for some $\mu_i$ in $(t_{i-1}, t_{i+1})$.

Adams-Moulton Three-Step Implicit Method


$$w_0 = \alpha, \qquad w_1 = \alpha_1, \qquad w_2 = \alpha_2,$$
$$w_{i+1} = w_i + \frac{h}{24}\bigl(9f(t_{i+1}, w_{i+1}) + 19f(t_i, w_i) - 5f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})\bigr),$$
where $i = 2, 3, \ldots, N-1$, with local error $-\frac{19}{720} y^{(5)}(\mu_i) h^5$ for some $\mu_i$ in $(t_{i-2}, t_{i+1})$.

Adams-Moulton Four-Step Implicit Method


$$w_0 = \alpha, \qquad w_1 = \alpha_1, \qquad w_2 = \alpha_2, \qquad w_3 = \alpha_3,$$
$$w_{i+1} = w_i + \frac{h}{720}\bigl(251f(t_{i+1}, w_{i+1}) + 646f(t_i, w_i) - 264f(t_{i-1}, w_{i-1}) + 106f(t_{i-2}, w_{i-2}) - 19f(t_{i-3}, w_{i-3})\bigr),$$
where $i = 3, 4, \ldots, N-1$, with local error $-\frac{3}{160} y^{(6)}(\mu_i) h^6$ for some $\mu_i$ in $(t_{i-3}, t_{i+1})$.
Comparing an $m$-step Adams-Bashforth explicit method to an $(m-1)$-step
Adams-Moulton implicit method, we see that both require $m$ evaluations of $f$ per
step, and both have terms of the form $y^{(m+1)}(\mu_i)h^{m+1}$ in their local errors. In general, the
coefficients of the terms involving $f$ in the approximation and those in the local
error are smaller for the implicit methods than for the explicit methods. This leads
to smaller truncation and round-off errors for the implicit methods.
In practice, implicit multistep methods are not used alone. Rather, they are
used to improve approximations obtained by explicit methods. The combination of
an explicit and an implicit technique is called a predictor-corrector method. The
explicit method predicts an approximation, and the implicit method corrects this
prediction.
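A sketch (not from the notes) of one fourth-order Adams-Bashforth-Moulton predictor-corrector: RK4 supplies the starting values, the four-step Adams-Bashforth formula predicts, and the three-step Adams-Moulton formula corrects; the function name and tolerance-free loop are illustrative choices.

    def abm4(f, a, b, y0, n):
        h = (b - a) / n
        t = [a + i * h for i in range(n + 1)]
        w = [y0]
        # starting values w1, w2, w3 from classical RK4
        for i in range(3):
            k1 = h * f(t[i], w[i])
            k2 = h * f(t[i] + 0.5 * h, w[i] + 0.5 * k1)
            k3 = h * f(t[i] + 0.5 * h, w[i] + 0.5 * k2)
            k4 = h * f(t[i] + h, w[i] + k3)
            w.append(w[i] + (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
        for i in range(3, n):
            fi, fi1, fi2, fi3 = f(t[i], w[i]), f(t[i-1], w[i-1]), f(t[i-2], w[i-2]), f(t[i-3], w[i-3])
            p = w[i] + h / 24.0 * (55 * fi - 59 * fi1 + 37 * fi2 - 9 * fi3)           # predictor
            w.append(w[i] + h / 24.0 * (9 * f(t[i+1], p) + 19 * fi - 5 * fi1 + fi2))  # corrector
        return t, w

    if __name__ == "__main__":
        import math
        ts, ws = abm4(lambda t, y: y, 0.0, 1.0, 1.0, 100)
        print(ws[-1], math.e)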

Milne's Method
$$w_{i+1} = w_{i-3} + \frac{4h}{3}\bigl(2f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2f(t_{i-2}, w_{i-2})\bigr),$$
where $i = 3, 4, \ldots, N-1$, with local error $\frac{14}{45} y^{(5)}(\mu_i) h^5$ for some $\mu_i$ in $(t_{i-3}, t_{i+1})$.
This method is used as a predictor for an implicit method called Simpson's
method. Its name comes from the fact that it can be derived using Simpson's rule
for approximating integrals.


Simpson's Method
$$w_{i+1} = w_{i-1} + \frac{h}{3}\bigl(f(t_{i+1}, w_{i+1}) + 4f(t_i, w_i) + f(t_{i-1}, w_{i-1})\bigr),$$
where $i = 1, 2, \ldots, N-1$, with local error $-\frac{1}{90} y^{(5)}(\mu_i) h^5$ for some $\mu_i$ in $(t_{i-1}, t_{i+1})$.
Although the local error involved with a predictor-corrector method of the Milne-Simpson type is generally smaller than that of the Adams-Bashforth-Moulton method,
the technique has limited use because of round-off error problems, which do not occur with the Adams procedure.
