Numerical Computation
Prof. Dr. Tahira Haroon
Chairperson, Department of Mathematics,
COMSATS Institute of Information Technology,
Park Road, Chak Shahzad, Islamabad
NUMERICAL COMPUTATION
A numerical process of obtaining a solution that reduces the original problem to a repetition of the same step or series of steps, so that computations become automatic, is called a numerical method; a numerical method that can be used to solve a problem will be called an algorithm. An algorithm is a complete and unambiguous set of procedures leading to the solution of a mathematical problem. The selection or construction of appropriate algorithms properly falls within the discipline of numerical analysis. Numerical analysts should consider all the sources of error that may affect the results. They must consider how much accuracy is required, estimate the magnitude of the round-off and discretization errors, determine an appropriate step size or the number of iterations required, provide for adequate checks on accuracy, and make allowance for corrective action in case of non-convergence.
Representation of numbers on computers, and the errors introduced by these representations, are as follows. The number 257, for example, is expressible as

257 = 2·10^2 + 5·10^1 + 7·10^0

and we call 10 the base of this system. Any integer is expressible as a polynomial in the base 10 with integral coefficients between 0 and 9:

N = (a_n a_{n-1} a_{n-2} ... a_0)_10
  = a_n·10^n + a_{n-1}·10^{n-1} + a_{n-2}·10^{n-2} + ... + a_0·10^0
Modern computers read pulses sent by electrical components. The state of an electrical impulse is either on or off. It is therefore convenient to represent numbers in computers in the binary system, with base 2, where the integer coefficients may take the values 0 and 1. A nonnegative integer N will be represented in the binary system as

N = (a_n a_{n-1} a_{n-2} ... a_0)_2
  = a_n·2^n + a_{n-1}·2^{n-1} + a_{n-2}·2^{n-2} + ... + a_0·2^0

where the coefficients a_k are either 0 or 1. Note that N is again represented as a polynomial, but now in the base 2.
Users of computers prefer to work in the more familiar decimal system. The computer converts their inputs to base 2 (or perhaps base 16), performs base 2 arithmetic, and finally translates the answer into base 10 before printing it out.
Conversion of a binary number to decimal may be accomplished as

(11)_2 = 1·2^1 + 1·2^0 = 3
(1101)_2 = 1·2^3 + 1·2^2 + 0·2^1 + 1·2^0 = 13
and decimal number to binary as
187 = (187)10 = (10111011)2
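The two conversions above can be sketched in code; this is a minimal illustration (not from the notes; the function names are my own), mirroring the worked examples (1101)_2 = 13 and (187)_10 = (10111011)_2.

```python
# Sketch (not from the notes): converting between base 10 and base 2.

def binary_to_decimal(bits: str) -> int:
    """Evaluate the polynomial a_n*2^n + ... + a_0*2^0."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)   # Horner-style accumulation
    return value

def decimal_to_binary(n: int) -> str:
    """Repeatedly divide by 2, collecting remainders."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))

print(binary_to_decimal("1101"))   # 13
print(decimal_to_binary(187))      # 10111011
```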
However, if we look into the machine languages, we soon realize that other number
systems, particularly the octal and hexadecimal systems, are also used. Hexadecimal
provides more efficient use of memory space for real numbers.
It is easy to convert from octal to binary and back since three binary digits make
one octal digit. To convert from octal to binary one merely replaces all octal digits
by their binary equivalent; thus
187 = (187)_10 = (273)_8
               = (010 111 011)_2
               = (10111011)_2
Third, there is the floating-point number system, which is the one used in
almost all practical scientific and engineering computations. This number system
differs in significant ways from the fixed-point number system. Typically, the computer word length includes both the mantissa and exponent; thus the number of
digits in the mantissa of a floating point number is less than in that of a fixed
point.
Floating-Point Arithmetic
Scientific and engineering calculations are, in nearly all cases, carried out in floating-point arithmetic. The computer has a number of values it chooses from to store as an approximation to the real number. The term real numbers refers to the continuous (and infinite) set of numbers on the number line. When printed as a number with a decimal point, it is either fixed-point or floating-point, in contrast to integers.
Floating-point numbers have three parts:
1. the sign (which requires one bit);
2. the fraction part, often called the mantissa but better characterized by the name significand;
3. the exponent part, often called the characteristic.
The significand bits (digits) constitute the fractional part of the number. In almost all cases, numbers are normalized, meaning that the fraction digits are shifted and the exponent adjusted so that the first significand digit a_1 is nonzero, e.g.,

27.39 → +.2739 × 10^2;
0.00124 → +.1240 × 10^{-2};
37000 → +.3700 × 10^5.
Observe that we have normalized the fractions: the first fraction digit is nonzero. Zero is a special case; it usually has a fraction part with all zeros and a zero exponent. This kind of zero is not normalized and never can be.
Most computers permit two or even three types of numbers:
1. single precision, which uses the letter E in the exponent and is usually equivalent to seven to nine significant decimal digits;
2. double precision, which uses the letter D in the exponent instead of E and varies from 14 to 29 significant decimal digits, but is typically about 16 or 17;
3. extended precision, which may be equivalent to 19 to 20 significant decimal digits.
Error Analysis
Error analysis is the study and evaluation of error. An error in a numerical
computation is simply the difference between the actual (true) value of a quantity
and its computed (approximate) value. There are three common ways to express
the size of the error in a computed result: Absolute error, Relative error and
Percentage Error.
Suppose that x* is an approximation (computed value) to x. The absolute error is

E_a = x − x*,

and the relative error is

E_r = E_a / |x*|,  x* ≠ 0,   or   E_r = E_a / |x|,  x ≠ 0.

Percentage error: relative error expressed in percentage is called the percentage error, defined by

PE = 100 · E_r
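The three error measures can be sketched directly from their definitions; this is an illustration (not from the notes), with an arbitrary sample value approximating pi.

```python
# Sketch: absolute, relative, and percentage error.
# x is the true value, x_star the computed approximation.

def absolute_error(x, x_star):
    return x - x_star

def relative_error(x, x_star):
    return (x - x_star) / abs(x)        # requires x != 0

def percentage_error(x, x_star):
    return 100 * relative_error(x, x_star)

x, x_star = 3.141592653589793, 3.14     # approximating pi by 3.14
print(absolute_error(x, x_star))
print(percentage_error(x, x_star))      # about 0.05 percent
```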
Significant Digits
In considering rounding errors, it is necessary to be precise in the usage of approximate digits. A significant digit in an approximate number is a digit, which gives
reliable information about the size of the number. In other words, a significant digit
is used to express accuracy, i.e., how many digits in the number have meaning.
Errors
The main sources of error are:
1. Gross errors, which can be avoided by taking enough care;
2. Errors in original data: nothing can be done to overcome such errors by any choice of method, but we need to be aware of such uncertainties; in particular, we may need to perform tests to see how sensitive the results are to changes in the input information;
3. Truncation errors, due to the finite representation of processes;
4. Round-off errors, due to the finite representation of numbers in the machine.
They all cause the same effect: diversion from the exact answer. Some errors
are small and may be neglected while others may be devastating if overlooked.
Rounding error is the most basic source of error in a computer. It occurs when a calculator or computer is used to perform real-number calculations. This error arises because the arithmetic performed in a machine involves numbers with only a finite number of digits, say n significant digits, obtained by rounding off the (n+1)th place and dropping all digits after the nth, with the result that calculations are performed with approximate representations of the actual numbers. The error that results from replacing a number with its floating-point form is called round-off error (regardless of whether the rounding or chopping method is used).
When the machine drops digits without rounding, this is called chopping; it can cause serious trouble.
Round-off causes trouble mainly when two numbers of about the same size are subtracted.
A second, more insidious trouble with round-off, especially with chopping, is the presence of internal correlations between numbers in the computation, so that, step after step, the small error is always in the same direction.
Propagated Error
The local error at any stage of the calculation is propagated throughout the remaining part of the computation; that is, the error in the succeeding steps of the process is due to the occurrence of an earlier error. Propagated error is more subtle than the other errors; such errors are in addition to the local errors. Propagated error is of critical importance. If errors are magnified continuously as the method continues, eventually they will overshadow the true value, destroying its validity; we call such a method unstable. For a stable method (the desirable kind), errors made at early points die out as the method continues.
Numerical Cancellation
Accuracy is lost when two nearly equal numbers are subtracted. Thus care should
be taken to avoid such subtraction where possible, because this is the major source
of error in floating point operations.
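A classic illustration of this loss of accuracy (a sketch, not from the notes; the sample quadratic is my own) is the smaller root of a quadratic: the textbook formula subtracts two nearly equal numbers, while an algebraically equivalent form avoids the subtraction.

```python
import math

# Sketch: cancellation in the quadratic formula for x^2 - 200x + 1 = 0.
# The small root computed naively subtracts two nearly equal quantities.

a, b, c = 1.0, -200.0, 1.0
d = math.sqrt(b * b - 4 * a * c)

naive_small = (-b - d) / (2 * a)    # 200 - 199.98999... : cancellation
stable_small = (2 * c) / (-b + d)   # algebraically equal, no subtraction

print(naive_small, stable_small)    # both near 0.0050001250
```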
Machine eps
One important measure in computer arithmetic is how small a difference between
two values the computer can recognize. This quantity is termed the computer eps.
The Bisection Method
If f(x_0) f(x_1) < 0, the function changes sign and therefore has a zero in the interval (x_0, x_1). Evaluate f at the midpoint (x_0 + x_1)/2. If

f(x_1) f((x_0 + x_1)/2) < 0,

the new interval is (x_1, (x_0 + x_1)/2); otherwise the zero lies in the other half, (x_0, (x_0 + x_1)/2). Repeating this halves the interval containing the root at every step.
Advantages
Objection
This method is slow to converge.
The possibilities to end the cycle of repetitions are
1) |x_1 − x_2| small enough: absolute accuracy in x;
2) |(x_1 − x_2)/x_1| small enough: relative accuracy in x (except for x_1 = 0);
3) |f(x_1) − f(x_2)| small enough: function values small;
4) repeat n times: a good method.
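The stopping tests above can be combined in a minimal bisection sketch (not from the notes; tolerance and iteration cap are illustrative choices):

```python
# Sketch of bisection with an absolute tolerance on the interval (test 1)
# plus a cap on the number of repetitions (test 4).

def bisect(f, x0, x1, tol=1e-10, max_iter=100):
    if f(x0) * f(x1) >= 0:
        raise ValueError("f(x0) and f(x1) must have opposite signs")
    for _ in range(max_iter):
        mid = (x0 + x1) / 2
        if f(x0) * f(mid) <= 0:
            x1 = mid             # zero lies in (x0, mid)
        else:
            x0 = mid             # zero lies in (mid, x1)
        if abs(x1 - x0) < tol:   # test 1: absolute accuracy in x
            break
    return (x0 + x1) / 2

root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)   # close to 1.41421356...
```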
Warnings on the Bisection Method
The function must be continuous. For example, if

f(x) = 1/x

then the bisection process will come down to a small interval about x = 0, and probably none of our tests will have caught the error, because the bisection method does not recognize the difference between a root and a singularity.

The Secant Method
Almost every function can be approximated by a straight line over a small interval. Let x_0 be near the root r, and assume that f(x) is linear in the vicinity of r. Choose another point, x_1, which is near to x_0 and also near to r (which we don't know yet); then from the obvious similar triangles we get

x_2 = x_0 − f(x_0)(x_0 − x_1) / (f(x_0) − f(x_1))

Since f(x) is not exactly linear, x_2 is not equal to r, but it should be closer than either of the two points we began with. We can continue to get better estimates of the root if we do this repeatedly, always choosing the two x-values nearest to r for drawing the straight line.
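The repeated use of this formula can be sketched as follows (not from the notes; the test function x^3 − x − 1 and the tolerances are illustrative):

```python
# Sketch of the secant iteration x2 = x0 - f(x0)(x0 - x1)/(f(x0) - f(x1)),
# always keeping the two most recent estimates.

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f0 == f1:                    # flat chord: cannot proceed
            break
        x2 = x0 - f0 * (x0 - x1) / (f0 - f1)
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

print(secant(lambda x: x**3 - x - 1, 1.0, 2.0))  # root near 1.3247
```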
Newton-Raphson Method
One of the most widely used methods of solving equations is Newton-Raphson
Method. Starting from an initial estimate which is not too far from a root, x1 , we
extrapolate along the tangent to its intersection with the xaxis, and take that as
the next approximation. This is continued until either the successive xvalues are
sufficiently close, or the value of the function is sufficiently near zero.
The general formula is

x_{k+1} = x_k − f(x_k) / f'(x_k).

This formula provides a method of going from one guess x_k to the next guess x_{k+1}.
Newton's method, when it works, is fine. The method does not always converge; it may jump to another root or oscillate around the desired root.
Thus, in practice, unless the local structure of the function is well understood, Newton's method is to be avoided.
Newton's method also works for complex roots.
Newton's method is widely used because, at least in the near neighborhood of a root, it is quadratically convergent. However, offsetting this is the need for two function evaluations at each step, f(x_n) and f'(x_n).
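A minimal sketch of the iteration (not from the notes; sqrt(2) as the root of x^2 − 2 is an illustrative choice), showing the two function evaluations per step:

```python
# Sketch of Newton-Raphson: x_{k+1} = x_k - f(x_k)/f'(x_k).
# Each step evaluates both f and its derivative.

def newton(f, fprime, x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Square root of 2 as the root of f(x) = x^2 - 2.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
```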
Mullers Method
Muller's method is based on approximating the function in the neighborhood of the root by a quadratic polynomial. A second-degree polynomial is made to fit three points near a root,

(x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)),

and the proper zero of this quadratic, found using the quadratic formula, is used as the improved estimate of the root. The process is then repeated using the three points nearest the root being evaluated.
The procedure for Muller's method is developed by writing a quadratic equation that fits through the three points in the vicinity of the root. Then we get

x = x_0 − 2c / (b ± √(b^2 − 4ac))

where

a = [f_1 h_2 + f_2 h_1 − f_0 (h_2 + h_1)] / (h_1^2 h_2 + h_1 h_2^2),
b = (f_1 − f_0 − a h_1^2) / h_1,
c = f_0,
h_1 = x_1 − x_0,   h_2 = x_0 − x_2,
with the sign in the denominator taken to give the largest absolute value of the
denominator (i.e., if b > 0, choose plus; if b < 0, choose minus; if b = 0, choose
either).
Take the root of the quadratic as one of the set of three points for the next approximation, keeping the three points that are most closely spaced (i.e., if the root is to the right of x_0, take x_0, x_1, and the root; if to the left, take x_0, x_2, and the root). Always reset the subscripts to make x_0 the middle of the three values.
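One step of the method, using the a, b, c and h_1, h_2 above, can be sketched as follows (not from the notes; since the test function x^2 − 2 is itself quadratic, a single step recovers its root):

```python
import cmath

# Sketch of one Muller step; x0 is the middle point, x1 the right point,
# x2 the left point. cmath.sqrt allows complex roots as the text notes.

def muller_step(f, x2, x0, x1):
    f0, f1, f2 = f(x0), f(x1), f(x2)
    h1, h2 = x1 - x0, x0 - x2
    a = (f1 * h2 + f2 * h1 - f0 * (h2 + h1)) / (h1**2 * h2 + h1 * h2**2)
    b = (f1 - f0 - a * h1**2) / h1
    c = f0
    disc = cmath.sqrt(b * b - 4 * a * c)
    # Sign chosen to give the largest magnitude denominator.
    denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
    return x0 - 2 * c / denom

# One step on the quadratic x^2 - 2 recovers sqrt(2).
print(muller_step(lambda x: x * x - 2, 1.0, 1.5, 2.0))
```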
Fixed-Point Iteration
Fixed-point iteration is a possible method for obtaining a root of the equation

f(x) = 0,    (1)

rewritten in the form

x = g(x),    (2)

so that any solution of (2), i.e., any fixed point of g(x), is a solution of (1).
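A sketch of the iteration x_{k+1} = g(x_k) (not from the notes; the rearrangement g(x) = sqrt(2x + 3) for f(x) = x^2 − 2x − 3, whose fixed point is the root x = 3, is an illustrative choice):

```python
import math

# Sketch of fixed-point iteration: repeat x <- g(x) until x stops changing.

def fixed_point(g, x, tol=1e-12, max_iter=200):
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# g(x) = sqrt(2x + 3) has the fixed point x = 3, a root of x^2 - 2x - 3.
print(fixed_point(lambda x: math.sqrt(2 * x + 3), 4.0))   # approaches 3
```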
A =
[ a_11  a_12  a_13  ...  a_1m ]
[ a_21  a_22  a_23  ...  a_2m ]
[ a_31  a_32  a_33  ...  a_3m ]
[  ...                        ]
[ a_n1  a_n2  a_n3  ...  a_nm ]
= [a_ij],   i = 1, 2, 3, ..., n,   j = 1, 2, 3, ..., m
Two matrices of the same size may be added or subtracted. The sum (or difference) of A = [a_ij] and B = [b_ij] is the matrix whose elements are the sums (or differences) of the corresponding elements of A and B:

C = A ± B = [a_ij ± b_ij] = [c_ij].

Multiplication of two matrices is defined when the number of columns of the first matrix is equal to the number of rows of the second matrix, i.e., when A is n × m and B is m × r:

c_ij = Σ_{k=1}^{m} a_ik b_kj,   i = 1, 2, 3, ..., n,   j = 1, 2, 3, ..., r,

while multiplication of a matrix by a scalar k gives

c_ij = k a_ij.

A matrix with only one column, n × 1 in size, is termed a column vector, and one with only one row, 1 × m in size, is called a row vector. Normally, the term vector is used for a column vector. If A is n × n, it is called a square matrix.
The set of n linear equations in m unknowns can be written as

A x = b,

where A is the n × m coefficient matrix and

x = (x_1, x_2, x_3, ..., x_m)^T,   b = (b_1, b_2, b_3, ..., b_n)^T.
A very important special case is the multiplication of two vectors. When it gives a matrix of one row and one column, a pure number, a scalar, this product is called the scalar product of the vectors, or inner product. Reverse the order of multiplication of these two vectors and we get a matrix; this product is called the outer product.
If all the elements above the diagonal are zero, a matrix is called lower-triangular;
it is called upper-triangular when all the elements below the diagonal are zero. If
only the diagonal terms are nonzero, the matrix is called a diagonal matrix. When
the diagonal elements are each equal to unity while all off-diagonal elements are zero,
the matrix is said to be the identity matrix. Tridiagonal matrices are those
that have nonzero elements only on the diagonal and in the positions adjacent to the
diagonal. When a matrix is square, a quantity called its trace is defined, which is
the sum of the elements on the main diagonal. All the elements of the null matrix
are zero. For a matrix defined by A = [aij ], its transpose is defined by AT = [aji ].
The inverse of a matrix A is written as A1 and satisfies AA1 = A1 A = I. A
matrix that has orthonormal columns is called an orthogonal matrix. A vector
whose length is one is called a unit vector. A vector that has all its elements equal
to zero except one element, which has a value of unity, is called a unit basis vector.
There are three distinct unit basis vectors for order-3 vectors. Null vectors are defined as the vectors with all elements zero. The transpose of the column vector u with elements x_1, x_2, x_3 is given by u^T = [x_1  x_2  x_3].
Sigma Notation

Σ_{i=1}^{n} x_i = x_1 + x_2 + x_3 + ... + x_n

Σ_{i=1}^{n} c = c(1 + 1 + 1 + ... + 1) = cn

Σ_{i=1}^{1} x_i = x_1

The summation index is a dummy index:

Σ_{i=1}^{n} x_i = Σ_{j=1}^{n} x_j = Σ_{k=1}^{n} x_k.

In this notation the set of linear equations becomes

Σ_{j=1}^{m} a_1j x_j = b_1
Σ_{j=1}^{m} a_2j x_j = b_2
Σ_{j=1}^{m} a_3j x_j = b_3
...
Σ_{j=1}^{m} a_nj x_j = b_n

or, compactly,

Σ_{j=1}^{m} a_ij x_j = b_i,   i = 1, 2, 3, ..., n
Product Notation

Π_{i=1}^{n} x_i = x_1 x_2 x_3 ... x_n
a_ij^(k) = a_ij^(k−1) − (a_ik^(k−1) / a_kk^(k−1)) a_kj^(k−1),   k+1 ≤ i ≤ n,   k+1 ≤ j ≤ m   (4)

where the i's, j's, k's, and so on, are as previously defined. The superscripts shown merely correspond to the primes used in identifying successive reduced matrices, and are not needed in a computer program.
The back substitution procedure may be generalized in the form of the following set of equations:

x_n = a_{n,m} / a_{n,n}   (5)

x_i = (a_{i,m} − Σ_{j=i+1}^{n} a_ij x_j) / a_ii,   i = n−1, n−2, ..., 1   (6)
There are two points, yet to be considered. First, we must guard against dividing by zero. Observe that zeros may be created in the diagonal positions even
if they are not present in the original matrix of coefficients. A useful strategy to
avoid (if possible) such zero divisors is to rearrange the equations so as to put the
coefficient of large magnitude on the diagonal at each step. This is called pivoting.
Complete pivoting may require both row and column interchanges. This is not
frequently done. Partial pivoting which places a coefficient of larger magnitude
on the diagonal by row interchanges only, will guarantee a nonzero divisor if there
is a solution to the set of equations, and will have the added advantage of improved
arithmetic precision. The diagonal elements that result are called pivot elements.
The second important point is the effect of the magnitude of the pivot elements on the accuracy of the solution. If the magnitude of the pivot element is appreciably smaller than the magnitudes (absolute), in general, of the other elements in the matrix, the use of the small pivot element will cause a decrease in the solution accuracy. Therefore, for overall accuracy, each reduction should use as the pivot row the row having the largest available pivot element. Such a provision should always be incorporated in a computer program that is to solve fairly large numbers of simultaneous equations.
When only a small number of equations are to be solved, the round-off error is small and usually does not substantially affect the accuracy of the results, but if many equations are to be solved simultaneously, the cumulative effect of round-off error can introduce relatively large solution errors. For this reason, the number of simultaneous equations which can be satisfactorily solved by Gauss's elimination method, using seven to ten significant digits in the arithmetic operations, is generally limited to 15 to 20 when most or all of the unknowns are present in all of the equations (the coefficient matrix is dense). On the other hand, if only a few unknowns are present in each equation (the coefficient matrix is sparse), many more equations may be satisfactorily handled.
The number of equations which can be accurately solved also depends to a great
extent on the condition of the system of equations. If a small relative change in
one or more of the coefficients of a system of equations results in a small relative
change in the solution, the system of equations is called a well-conditioned system.
If, however, a small relative change in one or more of the coefficient values results
in a large relative change in solution values, the system of equations is said to be
ill conditioned. Since small changes in the coefficients of an equation may result
from round-off error, the use of double-precision arithmetic and partial pivoting
or complete pivoting becomes very important in obtaining meaningful solutions
of such sets of equations.
There exists the possibility that the set of equations has no solution or that the prior procedure will fail to find it. During the triangularization step, if a zero is encountered on the diagonal, we cannot use that row to eliminate coefficients below that zero element. However, in that case, we can continue by interchanging rows and eventually achieve an upper triangular matrix of coefficients. The real stumbling block is finding a zero on the diagonal after we have triangularized. If that occurs, the back substitution fails, for we cannot divide by zero. It also means that the determinant is zero: there is no solution.
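The elimination with partial pivoting and the back substitution of equations (5)-(6) can be sketched together (not from the notes; the 2x2 sample system is illustrative):

```python
# Sketch of Gauss elimination with partial pivoting on the augmented
# matrix [A | b], followed by back substitution.

def gauss_solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n - 1):
        # Partial pivoting: put the largest-magnitude coefficient on the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# 2x + y = 3, x + 3y = 5 has the solution x = 0.8, y = 1.4.
print(gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```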
b_{i−1, j−1} = a_ij − (a_1j a_i1) / a_11,   1 < i ≤ n,   1 < j ≤ m,   a_11 ≠ 0   (7)

b_{n, j−1} = a_1j / a_11,   1 < j ≤ m,   a_11 ≠ 0   (8)

Equation (7) is used to find all elements of the new matrix B except those making up the last row of that matrix. For determining the elements of the last row of the new matrix, equation (8) is used. In these equations, i and j are the row and column indices, n and m are the numbers of rows and columns, and a and b denote the elements of the current matrix A and of the new matrix B, respectively.
Cholesky's Method
Cholesky's method is also known as Crout's method. Crout's method transforms the coefficient matrix, A, into the product of two matrices, L (a lower triangular matrix) and U (an upper triangular matrix), where U has ones on its main diagonal (the method in which L has the ones on its diagonal is known as Doolittle's method).
The general formulas for getting the elements of L and U corresponding to the coefficient matrix for n simultaneous equations can be written as

l_ij = a_ij − Σ_{k=1}^{j−1} l_ik u_kj,   j ≤ i,   i = 1, 2, 3, ..., n   (9)

u_ij = (a_ij − Σ_{k=1}^{i−1} l_ik u_kj) / l_ii,   i < j,   j = 2, 3, ..., n+1   (10)
If we make sure that a_11 in the original matrix is nonzero, then the divisions of equation (10) will always be defined, since the l_ii values will be nonzero. This may be seen by noting that

LU = A

and therefore the determinant of L times the determinant of U equals the determinant of A; that is,

|L| |U| = |A|.

We are assuming independent equations, so the determinant of A is nonzero.
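Equations (9)-(10) translate directly into a Crout reduction sketch (not from the notes; the 2x2 sample matrix is illustrative):

```python
# Sketch of Crout reduction: L is lower triangular, U is unit upper
# triangular (ones on its diagonal), and L U = A.

def crout(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        U[j][j] = 1.0
        for i in range(j, n):        # column j of L, equation (9)
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):    # row j of U, equation (10)
            U[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U

L, U = crout([[4.0, 2.0], [2.0, 3.0]])
print(L)   # [[4.0, 0.0], [2.0, 2.0]]
print(U)   # [[1.0, 0.5], [0.0, 1.0]]
```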
Norm
Discussing multicomponent entities like matrices and vectors, we frequently need a way to express their magnitude: some measure of bigness or smallness. For ordinary numbers, the absolute value tells us how large the number is, but a matrix has many components, each of which may be large or small in magnitude. (We are not talking about the size of a matrix, meaning the number of elements it contains.)
Any good measure of the magnitude of a matrix (the technical term is norm) must have four properties that are intuitively essential:
1. The norm must always have a value greater than or equal to zero, and must be zero only when the matrix is the zero matrix, i.e.,
||A|| ≥ 0 and ||A|| = 0 if and only if A = 0.
2. The norm must be multiplied by |k| if the matrix is multiplied by the scalar k, i.e.,
||kA|| = |k| ||A||.
3. The norm of the sum of two matrices must not exceed the sum of the norms, i.e.,
||A + B|| ≤ ||A|| + ||B||.
4. The norm of the product of two matrices must not exceed the product of the norms, i.e.,
||AB|| ≤ ||A|| ||B||.
The third relationship is called the triangle inequality. The fourth is important when we deal with the product of matrices.
For vectors in two- or three-space, the length satisfies all four requirements and is a good value to use for the norm of a vector. This norm is called the Euclidean norm, and is computed by

√(x_1^2 + x_2^2 + x_3^2).

Its generalized form is

||x||_e = √(x_1^2 + x_2^2 + x_3^2 + ... + x_n^2) = (Σ_{i=1}^{n} x_i^2)^{1/2}
This is not the only way to compute a vector norm, however. The sum of the absolute values of the x_i can be used as a norm; the maximum value of the magnitudes of the x_i will also serve. These three norms can be interrelated by defining the p-norm as

||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p},

giving

||x||_1 = Σ_{i=1}^{n} |x_i| : sum of magnitudes;

||x||_2 = (Σ_{i=1}^{n} x_i^2)^{1/2} : Euclidean norm;

||x||_∞ = max_{1≤i≤n} |x_i| : maximum-magnitude norm.

Which of these vector norms is best to use may depend on the problem.
The corresponding matrix norms are

||A||_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij| : maximum column sum;

||A||_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij| : maximum row sum.
The matrix norm kAk2 that corresponds to the 2-norm of a vector is not readily
computed. It is related to the eigenvalues of the matrix. This norm is also called
the spectral norm.
For an m × n matrix, we can paraphrase the Euclidean (also called Frobenius) norm as

||A||_e = (Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij^2)^{1/2}
Why are norms important? For one thing, they let us express the accuracy of the solution to a set of equations in quantitative terms, by stating the norm of the error vector (the true solution minus the approximate solution vector). Norms are also used to study quantitatively the convergence of iterative methods for solving linear systems.
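The vector norms and the Frobenius matrix norm above can be computed directly from their definitions; a sketch (not from the notes; sample values are illustrative):

```python
# Sketch: the 1-, 2-, and infinity-norms of a vector, plus the
# Frobenius norm of a matrix, straight from the definitions.

def norm_1(x):                       # sum of magnitudes
    return sum(abs(v) for v in x)

def norm_2(x):                       # Euclidean norm
    return sum(v * v for v in x) ** 0.5

def norm_inf(x):                     # maximum-magnitude norm
    return max(abs(v) for v in x)

def frobenius(A):
    return sum(v * v for row in A for v in row) ** 0.5

x = [3.0, -4.0]
print(norm_1(x), norm_2(x), norm_inf(x))    # 7.0 5.0 4.0
print(frobenius([[1.0, 2.0], [2.0, 0.0]]))  # 3.0
```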
Iterative Methods
Iterative methods, as opposed to the direct method of solving a set of linear equations by elimination, are preferred in certain cases. Iterative techniques are seldom used for solving linear systems of small dimension, since the time required for sufficient accuracy exceeds that required for direct techniques. When the coefficient matrix is sparse (has many zeros), however, they may be more rapid.
x = b' + C x,

where b' is the vector with elements

b'_i = b_i / a_ii,   i = 1, 2, ..., n,

and C = [c_ij] is the matrix with zeros on its diagonal and off-diagonal elements

c_ij = −a_ij / a_ii,   i ≠ j.
Assuming we have an initial estimate x^(0) for x, the next estimate x^(1) is obtained by substituting x^(0) on the right side of the above equation, and the estimate after that is obtained by substituting x^(1) on the right side to give x^(2). In this iterative technique we obtain

x^(n+1) = b' + C x^(n).

If the elements of C are small in comparison to 1, A is said to be diagonally dominant. The smaller the elements of C are in relation to 1, the more likely the sequence x^(0), x^(1), x^(2), ..., x^(n) is to converge. Nonetheless, convergence may also depend on how good the initial approximation is. This method is called the Jacobi Iterative Method. It consists of solving the ith equation in A x = b for x_i to
obtain (provided a_ii ≠ 0)

x_i = (b_i − Σ_{j=1, j≠i}^{n} a_ij x_j) / a_ii,   for i = 1, 2, ..., n,

and then generating each new iterate from the previous one by

x_i^(k) = (b_i − Σ_{j=1, j≠i}^{n} a_ij x_j^(k−1)) / a_ii,   for i = 1, 2, ..., n.
Note that this method is exactly the same as the method of fixed-point iteration for a single equation, but it is now applied to a set of equations. This method can be written in the form

x^(k+1) = G(x^(k)) = b' + C x^(k)

which is identical to the form

x^(k+1) = g(x^(k)).
The Jacobi method is also known as the method of simultaneous displacements, because each of the equations is changed simultaneously using the most recent complete set of x-values. Actually, the x-values of the next trial (the new x) are not used, even in part, until we have first found all of its components: even though we have computed the new x_1, we still do not use this value in computing the new x_2. In nearly all cases the new values are better than the old, and should be used in preference to the poorer values. When this is done, that is, to compute x_i^(k), the most recently available components are used. Since, for i > 1, x_1^(k), x_2^(k), ..., x_{i−1}^(k) have already been computed and are likely to be better approximations to the actual solutions x_1, x_2, ..., x_{i−1} than x_1^(k−1), x_2^(k−1), ..., x_{i−1}^(k−1), it seems reasonable to compute x_i^(k) using these most recently calculated values; that is,

x_i^(k) = (b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1)) / a_ii,   for each i = 1, 2, ..., n   (11)
This procedure is called the Gauss-Seidel Iterative method. In this method our first step is to rearrange the set of equations by solving each equation for one of the variables in terms of the others, exactly as we have done in the Jacobi method. We then proceed to improve each x-value in turn, always using the most recent approximations to the values of the other variables. The rate of convergence is more rapid than that of the Jacobi method.
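The difference between the two iterations can be sketched side by side (not from the notes; the small diagonally dominant sample system is illustrative): Jacobi builds the whole new vector from the old one, while Gauss-Seidel overwrites components in place as in equation (11).

```python
# Sketch: one Jacobi sweep versus one Gauss-Seidel sweep for A x = b.

def jacobi_sweep(A, b, x):
    n = len(A)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_sweep(A, b, x):
    n = len(A)
    x = x[:]
    for i in range(n):   # new components are used as soon as computed
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

A = [[2.0, 1.0], [1.0, 2.0]]   # diagonally dominant; solution x = y = 1
b = [3.0, 3.0]
x = [0.0, 0.0]
for _ in range(50):
    x = gauss_seidel_sweep(A, b, x)
print(x)   # close to [1.0, 1.0]
```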
f(x) = f(x_0) + (f(x_1) − f(x_0)) / (x_1 − x_0) · (x − x_0)   (12)

for x-values between x_0 and x_1. The use of this equation is called linear interpolation, familiar to everyone who has used log tables.
If we are given the value of a function f(x) and are asked to find the corresponding value of x, the process is called inverse interpolation. For inverse interpolation the same straight line is used, but the equation is rearranged into the more convenient form

x = x_0 + (f(x) − f(x_0)) (x_1 − x_0) / (f(x_1) − f(x_0))   (13)
Here we will discuss polynomial interpolation, the simplest and certainly the most widely used technique for obtaining polynomial approximations. A best polynomial approximation does not give appreciably better results than an appropriate scheme of polynomial interpolation.
Polynomial Interpolation
Several methods have been developed for finding interpolating polynomials, some of which make use of special properties such as uniform spacing of the abscissas. Also, many of these methods are useful for particular analytical or computational purposes. It is important to remember that there is one and only one polynomial of degree n which fits a given set of (n + 1) points. Hence, the polynomials obtained by the different methods must be the same.
Polynomial Forms
Definition: A polynomial p(x) of degree n is a function of the form

p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + ... + a_n x^n.   (14)

This is the power form. The polynomial can also be written about a center c as the shifted power form

p(x) = p(c) + (x − c) p'(c) + (x − c)^2/2! p''(c) + (x − c)^3/3! p'''(c) + ... + (x − c)^n/n! p^(n)(c),   (16)

and, with a sequence of centers c_1, c_2, ..., c_n,

p(x) = a_0 + (x − c_1)(a_1 + (x − c_2)(a_2 + ... + (x − c_n) a_n) ... )

is the nested form, whose evaluation for any particular value of x takes 2n additions and n multiplications.
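Nested evaluation can be sketched for the simplest case, the power form of equation (14), where all centers are zero (a sketch, not from the notes; with centers present there would be one extra subtraction per step, giving the 2n additions mentioned above):

```python
# Sketch of nested (Horner) evaluation of
# p(x) = a0 + x(a1 + x(a2 + ... + x*an)).

def horner(coeffs, x):
    """coeffs = [a0, a1, ..., an]; n multiplications, n additions."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# p(x) = 1 + 2x + 3x^2 at x = 2 gives 1 + 4 + 12 = 17.
print(horner([1.0, 2.0, 3.0], 2.0))   # 17.0
```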
Lagrangian Polynomials
Data where the x-values are not equispaced often occur as the result of experimental observations or when historical data are examined.
Consider a linear polynomial (the equation of a straight line passing through two distinct data points (x_0, y_0) and (x_1, y_1)):

p(x) = (x − x_1)/(x_0 − x_1) · y_0 + (x − x_0)/(x_1 − x_0) · y_1
To generalize the concept of linear interpolation, consider the construction of a polynomial of degree n. For this let (x_0, f_0), (x_1, f_1), (x_2, f_2), (x_3, f_3), ..., (x_n, f_n) be (n + 1) data points. Here we don't assume uniform spacing between the x-values, nor do we need the x-values arranged in a particular order. The x-values must be distinct, however. The linear polynomial passing through (x_0, f(x_0)) and (x_1, f(x_1)) is constructed by using the quotients

l_0(x) = (x − x_1)/(x_0 − x_1),   l_1(x) = (x − x_0)/(x_1 − x_0).

These quotients satisfy the required conditions. In the same way, the polynomial passing through three data points becomes

p_2(x) = f(x_0) l_{2,0}(x) + f(x_1) l_{2,1}(x) + f(x_2) l_{2,2}(x)
For (n + 1) data values we need to construct, for each k = 0, 1, 2, ..., n, a quotient l_{n,k}(x) with the property that l_{n,k}(x_i) = 0 when i ≠ k and l_{n,k}(x_k) = 1. To satisfy l_{n,k}(x_i) = 0 for each i ≠ k requires that the numerator of l_{n,k}(x) contain the term

(x − x_0)(x − x_1)(x − x_2) ... (x − x_{k−1})(x − x_{k+1}) ... (x − x_n).

To satisfy l_{n,k}(x_k) = 1, the denominator and numerator of l_{n,k}(x) must be equal when evaluated at x = x_k. Thus,

l_{n,k}(x) = [(x − x_0)(x − x_1) ... (x − x_{k−1})(x − x_{k+1}) ... (x − x_n)] / [(x_k − x_0)(x_k − x_1) ... (x_k − x_{k−1})(x_k − x_{k+1}) ... (x_k − x_n)]

           = Π_{i=0, i≠k}^{n} (x − x_i)/(x_k − x_i),   for each k = 0, 1, 2, 3, ..., n   (19)
The interpolating polynomial is easily described now that the form of l_{n,k} is known. This polynomial, called the nth Lagrangian interpolating polynomial, is defined as

p(x) = f(x_0) l_{n,0}(x) + f(x_1) l_{n,1}(x) + f(x_2) l_{n,2}(x) + ... + f(x_n) l_{n,n}(x)
     = Σ_{k=0}^{n} f(x_k) l_{n,k}(x)   (20)

with

l_{n,k}(x_j) = 1 if j = k,  0 if j ≠ k,   j = 0, 1, 2, 3, ..., n   (21)
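Equations (19)-(20) can be sketched directly (not from the notes; the sample points lie on f(x) = x^2, so the quadratic is recovered):

```python
# Sketch of the Lagrangian interpolating polynomial of equation (20),
# built from the quotients l_{n,k}(x) of equation (19).

def lagrange(xs, fs, x):
    total = 0.0
    n = len(xs)
    for k in range(n):
        lk = 1.0
        for i in range(n):
            if i != k:
                lk *= (x - xs[i]) / (xs[k] - xs[i])  # l_{n,k}(x)
        total += fs[k] * lk
    return total

# Three points on f(x) = x^2; the interpolant reproduces it exactly.
xs, fs = [0.0, 1.0, 3.0], [0.0, 1.0, 9.0]
print(lagrange(xs, fs, 2.0))   # close to 4.0
```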
Forward Differences
The first forward difference of some function f(x) with respect to an increment h of the independent variable x is

Δf(x) = f(x + h) − f(x).

The operator Δ always implies this operation on any function of x on which it operates. Since Δf(x) is a function of x, it can be operated on by the operator Δ, giving

Δ^2 f(x) = f(x + 2h) − 2f(x + h) + f(x).

The function Δ^2 f(x) is called the second forward difference of f(x). Third, fourth, and higher differences are similarly obtained. In summary, the forward difference expressions are

Δf(x_i)    = f(x_i + h) − f(x_i)
Δ^2 f(x_i) = f(x_i + 2h) − 2f(x_i + h) + f(x_i)
Δ^3 f(x_i) = f(x_i + 3h) − 3f(x_i + 2h) + 3f(x_i + h) − f(x_i)
Δ^4 f(x_i) = f(x_i + 4h) − 4f(x_i + 3h) + 6f(x_i + 2h) − 4f(x_i + h) + f(x_i)
...
Δ^n f(x_i) = f(x_i + nh) − n f(x_i + (n−1)h) + n(n−1)/2! f(x_i + (n−2)h)
             − n(n−1)(n−2)/3! f(x_i + (n−3)h) + ... + (−1)^n f(x_i)

where

n(n−1)(n−2) ... (n−k+1) / k!

is the familiar symbol used for binomial coefficients.
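Building the successive difference columns from tabulated values can be sketched as follows (not from the notes; the sample table uses f(x) = x^2 at x = 0, 1, 2, 3, for which second differences are constant and third differences vanish):

```python
# Sketch: a forward difference table. Each column is the difference of
# successive entries of the previous column.

def forward_differences(fs):
    table = [list(fs)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

for row in forward_differences([0.0, 1.0, 4.0, 9.0]):
    print(row)
# [0.0, 1.0, 4.0, 9.0]
# [1.0, 3.0, 5.0]
# [2.0, 2.0]
# [0.0]
```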
Backward Differences
The first backward difference of f(x) with respect to the increment h is defined as

∇f(x) = f(x) − f(x − h).

The operator ∇ always implies this operation on the function of x on which it operates, so that

∇[∇f(x_i)] = ∇^2 f(x_i) = f(x_i) − 2f(x_i − h) + f(x_i − 2h)

where x_i denotes any specific value of x such as x_0, x_1, and so forth. In general, ∇^n f(x_i) = ∇[∇^{n−1} f(x_i)], and the backward differences of f(x_i) are

∇f(x_i)    = f(x_i) − f(x_i − h)
∇^2 f(x_i) = f(x_i) − 2f(x_i − h) + f(x_i − 2h)
∇^3 f(x_i) = f(x_i) − 3f(x_i − h) + 3f(x_i − 2h) − f(x_i − 3h)
∇^4 f(x_i) = f(x_i) − 4f(x_i − h) + 6f(x_i − 2h) − 4f(x_i − 3h) + f(x_i − 4h)
...
∇^n f(x_i) = f(x_i) − n f(x_i − h) + n(n−1)/2! f(x_i − 2h) − n(n−1)(n−2)/3! f(x_i − 3h)
             + ... + (−1)^k n(n−1) ... (n−k+1)/k! f(x_i − kh) + ...
             + (−1)^n n(n−1) ... 3·2·1/n! f(x_i − nh)

Comparing backward with forward differences,

∇f(x_i)    = Δf(x_i − h)
∇^2 f(x_i) = Δ^2 f(x_i − 2h)
∇^3 f(x_i) = Δ^3 f(x_i − 3h)
∇^4 f(x_i) = Δ^4 f(x_i − 4h)
∇^n f(x_i) = Δ^n f(x_i − nh)

In general,

∇^k f_s = Δ^k f_{s−k},   k = 1, 2, 3, ...
Central Differences
Introducing the central difference operator δ, the first central difference of f(x) with respect to the increment h is defined as

δf(x) = f(x + h/2) − f(x − h/2).

The operator δ always implies this operation on the function of x on which it operates, so that

δ^2 f(x_i) = δ[δf(x_i)] = δ[f(x_i + h/2) − f(x_i − h/2)]
           = Δ^2 f(x_i − h)

where x_i denotes any specific value of x such as x_0, x_1, and so forth. In general, the central differences of f(x_i) are

δf(x_i)    = Δf(x_i − h/2)
δ^2 f(x_i) = Δ^2 f(x_i − h)
δ^3 f(x_i) = Δ^3 f(x_i − 3h/2)
δ^4 f(x_i) = Δ^4 f(x_i − 2h)
...
δ^n f(x_i) = Δ^n f(x_i − nh/2)
Divided Differences
There are three disadvantages of using the Lagrangian polynomial method for
interpolation; the method of divided differences avoids them. The first order
divided difference of two arguments xₖ, xⱼ is defined as
    f[xₖ, xⱼ] = (f(xₖ) − f(xⱼ))/(xₖ − xⱼ),
and
    f[xₖ, xⱼ, xₗ] = (f[xₖ, xⱼ] − f[xⱼ, xₗ])/(xₖ − xₗ)
is called the second order divided difference, or divided difference of three
arguments xₖ, xⱼ and xₗ. Similarly
    f[xₖ, xⱼ, xₗ, xₙ] = (f[xₖ, xⱼ, xₗ] − f[xⱼ, xₗ, xₙ])/(xₖ − xₙ) = f[xₗ, xⱼ, xₖ, xₙ],
the value being independent of the order of the arguments, and in general
    f[x₀, x₁, x₂, …, xₙ] = (f[x₁, x₂, x₃, …, xₙ] − f[x₀, x₁, x₂, …, xₙ₋₁])/(xₙ − x₀)   (22)
which is known as the Newton form. If (x₀, f(x₀)), (x₁, f(x₁)), (x₂, f(x₂)), (x₃, f(x₃)),
…, (xₙ, f(xₙ)) are (n + 1) data points, and
    pₙ(x) = a₀ + a₁(x − x₀) + a₂(x − x₀)(x − x₁) + ⋯ + aₙ(x − x₀)(x − x₁) ⋯ (x − xₙ₋₁)
is an interpolating polynomial, it must match at the (n + 1) data points, i.e.,
pₙ(xᵢ) = f(xᵢ) for i = 0, 1, 2, …, n. Thus:
when x = x₀, p(x₀) = a₀, and p(x₀) = f(x₀) = f₀, which implies
    a₀ = f₀;
when x = x₁, p(x₁) = a₀ + a₁(x₁ − x₀), and p(x₁) = f(x₁) = f₁, which implies
    a₁ = (f₁ − f₀)/(x₁ − x₀) = f[x₀, x₁];
when x = x₂, p(x₂) = a₀ + a₁(x₂ − x₀) + a₂(x₂ − x₀)(x₂ − x₁), and p(x₂) = f₂,
which implies
    a₂ = (f[x₂, x₁] − f[x₁, x₀])/(x₂ − x₀) = f[x₀, x₁, x₂].
Similarly, when x = x₃, p(x₃) = a₀ + a₁(x₃ − x₀) + a₂(x₃ − x₀)(x₃ − x₁)
+ a₃(x₃ − x₀)(x₃ − x₁)(x₃ − x₂), and p(x₃) = f₃, which implies
    a₃ = (f[x₃, x₂, x₁] − f[x₂, x₁, x₀])/(x₃ − x₀) = f[x₀, x₁, x₂, x₃].
In general, aₖ = f[x₀, x₁, …, xₖ], so that, allowing for an error term,
    f(x) = f₀ + Σₖ₌₀ⁿ⁻¹ f[x₀, x₁, …, xₖ₊₁] Πⱼ₌₀ᵏ (x − xⱼ) + R(x),   (23)
with remainder
    R(x) = f[x, x₀, x₁, …, xₙ] Πⱼ₌₀ⁿ (x − xⱼ).   (24)
If f(x) is a polynomial of degree n, the remainder term is zero for all x, i.e.,
R(x) = 0, i.e., f[x, x₀, x₁, x₂, x₃, …, xₙ] = 0, since the nth divided difference of a
polynomial of degree n is of degree zero (is a constant) and the next divided
difference vanishes. Omitting the remainder term in equation (23) gives
    pₙ(x) = f₀ + Σₖ₌₀ⁿ⁻¹ f[x₀, x₁, …, xₖ₊₁] Πⱼ₌₀ᵏ (x − xⱼ).
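The coefficient recurrence aₖ = f[x₀, …, xₖ] and the nested evaluation of the Newton form can be sketched as follows (a minimal illustration; the helper names are my own, not from the text):

```python
def newton_coefficients(xs, ys):
    # In-place divided-difference table; coef[k] ends up as f[x0, ..., xk].
    coef = list(ys)
    n = len(xs)
    for k in range(1, n):
        for j in range(n - 1, k - 1, -1):
            coef[j] = (coef[j] - coef[j - 1]) / (xs[j] - xs[j - k])
    return coef

def newton_eval(xs, coef, x):
    # Nested (Horner-like) evaluation of
    # p(x) = a0 + a1(x - x0) + a2(x - x0)(x - x1) + ...
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

xs = [1.0, 2.0, 4.0, 5.0]          # unequally spaced points are fine
ys = [x * x for x in xs]           # f(x) = x², so p must reproduce it exactly
coef = newton_coefficients(xs, ys)
print(newton_eval(xs, coef, 3.0))  # → 9.0
```

Because the data come from a quadratic, the third coefficient is zero and the polynomial reproduces x² everywhere, not just at the nodes.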
For equally spaced points, write x₁ = x₀ + h, x₂ = x₀ + 2h, …, and for any general
value of x write x = x₀ + sh, where s is a real number. Then
    (x − x₀)/h = s,
    (x − x₁)/h = s − 1,
    (x − x₂)/h = s − 2,
    ⋮
    (x − xₙ₋₁)/h = s − (n − 1),
and
    f[x₀, x₁] = Δf₀/h,
    f[x₀, x₁, x₂] = Δ²f₀/(2! h²),
    ⋮
    f[x₀, x₁, x₂, x₃, …, xₙ] = Δⁿf₀/(n! hⁿ),
so that
    p(x) = f₀ + s Δf₀ + [s(s − 1)/2!] Δ²f₀ + ⋯ + [s(s − 1) ⋯ (s − (n − 1))/n!] Δⁿf₀,
which is the Newton-Gregory forward polynomial.
Similarly, counting backward from xₙ,
    xₙ₋₁ = xₙ − h,
    xₙ₋₂ = xₙ − 2h,
    ⋮
    x₀ = xₙ − nh,
and for any general value of x we can write x = xₙ − sh, where s is a real number.
Then
    (x − xₙ)/h = −s,
    (x − xₙ₋₁)/h = 1 − s,
    (x − xₙ₋₂)/h = 2 − s,
    ⋮
    (x − x₁)/h = (n − 1) − s,
and
    f[xₙ, xₙ₋₁] = ∇fₙ/h,
    f[xₙ, xₙ₋₁, xₙ₋₂] = ∇²fₙ/(2! h²),
    ⋮
    f[xₙ, xₙ₋₁, …, x₀] = ∇ⁿfₙ/(n! hⁿ),
so that
    p(x) = fₙ − s ∇fₙ + [s(s − 1)/2!] ∇²fₙ − [s(s − 1)(s − 2)/3!] ∇³fₙ + ⋯,
which is the Newton-Gregory backward polynomial.
For equally spaced points, then, the general divided difference reduces to
    f[x₀, x₁, x₂, …, xₙ] = (f[x₁, x₂, …, xₙ] − f[x₀, x₁, …, xₙ₋₁])/(xₙ − x₀) = Δⁿf(x₀)/(n! hⁿ).

Cubic Splines
Given the points (x₀, y₀), (x₁, y₁), …, (xₙ, yₙ), a cubic spline consists of the cubics
    gᵢ(x) = aᵢ(x − xᵢ)³ + bᵢ(x − xᵢ)² + cᵢ(x − xᵢ) + dᵢ   (26)
on the interval [xᵢ, xᵢ₊₁], for i = 0, 1, 2, 3, …, n − 1.
The spline must satisfy
    gᵢ(xᵢ) = yᵢ,   i = 0, 1, 2, 3, …, n − 1, and gₙ₋₁(xₙ) = yₙ;   (27)
    gᵢ(xᵢ₊₁) = gᵢ₊₁(xᵢ₊₁),   i = 0, 1, 2, 3, …, n − 2;   (28)
    g′ᵢ(xᵢ₊₁) = g′ᵢ₊₁(xᵢ₊₁),   i = 0, 1, 2, 3, …, n − 2;   (29)
    g″ᵢ(xᵢ₊₁) = g″ᵢ₊₁(xᵢ₊₁),   i = 0, 1, 2, 3, …, n − 2.   (30)
Equations (27-30) say that the cubic spline fits each of the points (27), is continuous
(28), and is continuous in slope and curvature, (29) and (30), throughout the
region spanned by the points.
Using equation (27) in equation (26) immediately gives
    dᵢ = yᵢ,   i = 0, 1, 2, 3, …, n − 1.   (31)
Let Sᵢ denote the second derivative of the spline at (xᵢ, yᵢ), i.e., Sᵢ = g″ᵢ(xᵢ) for
i = 0, 1, 2, 3, …, n − 1 and Sₙ = g″ₙ₋₁(xₙ), and write hᵢ = xᵢ₊₁ − xᵢ. Then
    Sᵢ = g″ᵢ(xᵢ) = 2bᵢ,   (32)
    Sᵢ₊₁ = g″ᵢ(xᵢ₊₁) = 6aᵢhᵢ + 2bᵢ,   (33)
which imply
    bᵢ = Sᵢ/2,   (35)
    aᵢ = (Sᵢ₊₁ − Sᵢ)/(6hᵢ).   (36)
Substituting the relations for aᵢ, bᵢ, dᵢ given by equations (31), (35) and (36) into
equation (26) and requiring gᵢ(xᵢ₊₁) = yᵢ₊₁, we get
    [(Sᵢ₊₁ − Sᵢ)/(6hᵢ)] hᵢ³ + (Sᵢ/2) hᵢ² + cᵢhᵢ + yᵢ = yᵢ₊₁,   (37)
which implies
    cᵢ = (yᵢ₊₁ − yᵢ)/hᵢ − (hᵢSᵢ₊₁ + 2hᵢSᵢ)/6.   (38)
In the previous interval, from xᵢ₋₁ to xᵢ, the equation for the cubic spline is
    gᵢ₋₁(x) = aᵢ₋₁(x − xᵢ₋₁)³ + bᵢ₋₁(x − xᵢ₋₁)² + cᵢ₋₁(x − xᵢ₋₁) + dᵢ₋₁,
so that
    g′ᵢ₋₁(x) = 3aᵢ₋₁(x − xᵢ₋₁)² + 2bᵢ₋₁(x − xᵢ₋₁) + cᵢ₋₁,
    g′ᵢ₋₁(xᵢ) = 3aᵢ₋₁(xᵢ − xᵢ₋₁)² + 2bᵢ₋₁(xᵢ − xᵢ₋₁) + cᵢ₋₁.   (39)
Using equation (29), we obtain
    y′ᵢ = 3aᵢ₋₁h²ᵢ₋₁ + 2bᵢ₋₁hᵢ₋₁ + cᵢ₋₁.   (40)
Equating equations (38) and (40) and using (35), (36) and (37), we get
    y′ᵢ = (yᵢ₊₁ − yᵢ)/hᵢ − (hᵢSᵢ₊₁ + 2hᵢSᵢ)/6
        = 3 [(Sᵢ − Sᵢ₋₁)/(6hᵢ₋₁)] hᵢ₋₁² + 2 (Sᵢ₋₁/2) hᵢ₋₁
          + (yᵢ − yᵢ₋₁)/hᵢ₋₁ − (hᵢ₋₁Sᵢ + 2hᵢ₋₁Sᵢ₋₁)/6,   (41)
which simplifies to
    hᵢ₋₁Sᵢ₋₁ + 2(hᵢ₋₁ + hᵢ)Sᵢ + hᵢSᵢ₊₁ = 6 [(yᵢ₊₁ − yᵢ)/hᵢ − (yᵢ − yᵢ₋₁)/hᵢ₋₁]
        = 6 (f[xᵢ₊₁, xᵢ] − f[xᵢ, xᵢ₋₁]).   (42)
If all of the intervals are equal in length, this simplifies to a linear difference
equation with constant coefficients:
    Sᵢ₋₁ + 4Sᵢ + Sᵢ₊₁ = 6 Δ²f(xᵢ₋₁)/h².   (43)
The n − 1 equations (42), written at the internal points i = 1, 2, …, n − 1, involve
the n + 1 unknowns S₀, S₁, …, Sₙ, so two additional end conditions are needed.
Four choices are in common use; the right-hand side of each interior equation is
6(f[xᵢ, xᵢ₊₁] − f[xᵢ₋₁, xᵢ]).

Condition 1: S₀ = 0, Sₙ = 0 (the "natural" spline). The system for S₁, …, Sₙ₋₁
is tridiagonal with coefficient matrix

    | 2(h₀+h₁)   h₁                                     |
    | h₁         2(h₁+h₂)   h₂                          |
    |            h₂         2(h₂+h₃)   h₃               |
    |                       h₃         2(h₃+h₄)   h₄    |
    |                            ⋱                      |
    |                       hₙ₋₂       2(hₙ₋₂+hₙ₋₁)     |

Condition 2: f′₀ = A, f′ₙ = B (end slopes specified). The two extra equations
    2h₀S₀ + h₀S₁ = 6(f[x₀, x₁] − A),
    hₙ₋₁Sₙ₋₁ + 2hₙ₋₁Sₙ = 6(B − f[xₙ₋₁, xₙ]),
are appended, giving a tridiagonal system in S₀, …, Sₙ with coefficient matrix

    | 2h₀    h₀                                     |
    | h₀     2(h₀+h₁)   h₁                          |
    |        h₁         2(h₁+h₂)   h₂               |
    |                   h₂         2(h₂+h₃)   h₃    |
    |                        ⋱                      |
    |                   hₙ₋₁       2hₙ₋₁            |

Condition 3: S₀ = S₁, Sₙ = Sₙ₋₁. Eliminating S₀ and Sₙ from the first and last
of equations (42) gives the system in S₁, …, Sₙ₋₁ with coefficient matrix

    | (3h₀+2h₁)   h₁                                    |
    | h₁          2(h₁+h₂)   h₂                         |
    |             h₂         2(h₂+h₃)   h₃              |
    |                        h₃         2(h₃+h₄)  h₄    |
    |                             ⋱                     |
    |                        hₙ₋₂       (2hₙ₋₂+3hₙ₋₁)   |

Condition 4: S₀ and Sₙ are linear extrapolations of the adjacent values.
At the left end:
    (S₁ − S₀)/h₀ = (S₂ − S₁)/h₁   implies   S₀ = ((h₀ + h₁)S₁ − h₀S₂)/h₁;
at the right end:
    (Sₙ − Sₙ₋₁)/hₙ₋₁ = (Sₙ₋₁ − Sₙ₋₂)/hₙ₋₂   implies
    Sₙ = ((hₙ₋₂ + hₙ₋₁)Sₙ₋₁ − hₙ₋₁Sₙ₋₂)/hₙ₋₂.
Eliminating S₀ and Sₙ as before gives the system in S₁, …, Sₙ₋₁ with coefficient
matrix

    | (h₀+h₁)(h₀+2h₁)/h₁   (h₁²−h₀²)/h₁                                  |
    | h₁                   2(h₁+h₂)        h₂                            |
    |                      h₂              2(h₂+h₃)   h₃                 |
    |                           ⋱                                        |
    |                      (hₙ₋₂²−hₙ₋₁²)/hₙ₋₂   (hₙ₋₁+hₙ₋₂)(hₙ₋₁+2hₙ₋₂)/hₙ₋₂ |
After the Sᵢ values are obtained, we can compute aᵢ, bᵢ, cᵢ, and dᵢ for the cubics
in each interval, using
    aᵢ = (Sᵢ₊₁ − Sᵢ)/(6hᵢ),
    bᵢ = Sᵢ/2,
    cᵢ = f[xᵢ, xᵢ₊₁] − (hᵢSᵢ₊₁ + 2hᵢSᵢ)/6,
    dᵢ = yᵢ.
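As an illustration, the sketch below assembles the tridiagonal system for the Sᵢ under the natural end condition S₀ = Sₙ = 0, solves it with the Thomas algorithm, and recovers the coefficients aᵢ, bᵢ, cᵢ, dᵢ; the function names and test data are my own, not from the text.

```python
def natural_spline(xs, ys):
    # Solve for the second derivatives S_i with S_0 = S_n = 0,
    # then recover the per-interval coefficients (a_i, b_i, c_i, d_i).
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    fd = [(ys[i + 1] - ys[i]) / h[i] for i in range(n)]   # f[x_i, x_{i+1}]
    # Tridiagonal system for S_1 .. S_{n-1}:
    # h_{i-1} S_{i-1} + 2(h_{i-1}+h_i) S_i + h_i S_{i+1} = 6(fd_i - fd_{i-1})
    diag = [2 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6 * (fd[k + 1] - fd[k]) for k in range(n - 1)]
    for k in range(1, n - 1):                  # Thomas algorithm: eliminate
        m = h[k] / diag[k - 1]
        diag[k] -= m * h[k]
        rhs[k] -= m * rhs[k - 1]
    S = [0.0] * (n + 1)
    for k in range(n - 2, -1, -1):             # back substitution
        S[k + 1] = (rhs[k] - h[k + 1] * S[k + 2]) / diag[k]
    return [((S[i + 1] - S[i]) / (6 * h[i]),   # a_i
             S[i] / 2,                         # b_i
             fd[i] - (h[i] * S[i + 1] + 2 * h[i] * S[i]) / 6,  # c_i
             ys[i])                            # d_i
            for i in range(n)]

def spline_eval(xs, coeffs, x):
    i = len(coeffs) - 1
    for j in range(len(coeffs)):
        if x <= xs[j + 1]:
            i = j
            break
    a, b, c, d = coeffs[i]
    t = x - xs[i]
    return ((a * t + b) * t + c) * t + d

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]
coeffs = natural_spline(xs, ys)
print(spline_eval(xs, coeffs, 1.0), spline_eval(xs, coeffs, 1.5))
```

The spline passes through every knot exactly; between knots it only approximates x², since the natural end condition forces zero curvature at the ends.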
means that for every ε > 0, there is a δ > 0 such that if P is a partition of [a, b]
with ‖P‖ < δ, then
    |Σₖ f(ξₖ) Δxₖ − I| < ε,
where I denotes the value of the integral.
Rectangle Rules
Using the above definitions, we approximate the value of ∫ₐᵇ f(x)dx as a sum of
areas of rectangles. In particular, if we use a regular partition (evenly spaced data
points) with h = Δx = (b − a)/n, then xₖ = x₀ + kh for k = 0, 1, 2, 3, …, n, a = x₀,
b = xₙ, and
    ∫ₐᵇ f(x)dx ≈ Σₖ₌₁ⁿ f(ξₖ)h,
where ξₖ is any point in the kth subinterval. Choosing ξₖ to be the left or right
endpoint of the subinterval gives the endpoint rules; choosing the midpoint
ξₖ = xₖ₋₁/₂ gives the midpoint rule
    ∫ₐᵇ f(x)dx ≈ Aₘ = Σₖ₌₁ⁿ f(xₖ₋₁/₂)h = [(b − a)/n] Σₖ₌₁ⁿ f(xₖ₋₁/₂).
If a function is strictly increasing or strictly decreasing over the interval, then the
endpoint rules give the areas of the inscribed and circumscribed rectangles.
Trapezoidal Rule
If f(x) ≥ 0 on the interval [a, b], then the area under the curve f(x) over [a, b],
represented by ∫ₐᵇ f(x)dx, can be found by subdividing [a, b] into n subintervals of
equal width h = Δx. The area under the curve in each subinterval is approximated
by the trapezoid formed by replacing the curve by its secant line drawn between
the endpoints x₀ < x₁ < x₂ < ⋯ < xₙ of the subintervals. The integral is then
approximated by the sum of all the trapezoid areas. Since the area of a trapezoid
is the sum of the area of a rectangle and the area of a triangle, for each subinterval,
    ∫[xᵢ, xᵢ₊₁] f(x)dx ≈ Δx f(xᵢ) + (Δx/2)(f(xᵢ₊₁) − f(xᵢ)) = (h/2)(f(xᵢ) + f(xᵢ₊₁)),
which is known as the trapezoidal rule; it can also be obtained from the midpoint
rule. Applying it on each subinterval,
    A₀ = ∫[x₀, x₁] f(x)dx ≈ (h/2)(f(x₀) + f(x₁)),
    A₁ = ∫[x₁, x₂] f(x)dx ≈ (h/2)(f(x₁) + f(x₂)),
    ⋮
    Aₙ₋₁ = ∫[xₙ₋₁, xₙ] f(x)dx ≈ (h/2)(f(xₙ₋₁) + f(xₙ)),
and
    A = ∫ₐᵇ f(x)dx ≈ A₀ + A₁ + A₂ + A₃ + ⋯ + Aₙ₋₁   (44)
      = (h/2)(f(x₀) + 2f(x₁) + 2f(x₂) + ⋯ + 2f(xₙ₋₁) + f(xₙ)),
or
    A = ∫ₐᵇ f(x)dx ≈ (h/2)(f(x₀) + 2 Σᵢ₌₁ⁿ⁻¹ f(xᵢ) + f(xₙ)).   (45)
Equation (45) is called the composite trapezoidal rule. This method of replacing
a curve by straight lines is hardly accurate unless the subintervals are very small.
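A minimal sketch of equation (45) in code (the function name is my own):

```python
def composite_trapezoid(f, a, b, n):
    # h/2 * (f0 + 2 f1 + ... + 2 f_{n-1} + fn) over n equal subintervals
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total

# ∫₀¹ x² dx = 1/3; the trapezoid error shrinks like h²
approx = composite_trapezoid(lambda x: x * x, 0.0, 1.0, 100)
print(approx)
```

With n = 100 the error is about 1.7 × 10⁻⁵, consistent with the O(h²) behavior noted above.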
Simpson's Rules
The trapezoidal rule approximates the area under a curve by summing the areas of
uniform-width trapezoids formed by connecting successive points on the curve by
straight lines. Simpson's rules give a more accurate approximation by connecting
successive groups of three points on the curve by second-degree parabolas, known
as Simpson's 1/3 rule, and summing the areas under the parabolas to obtain the
approximate area under the curve, or by connecting successive groups of four points
on the curve by third-degree polynomials, known as Simpson's 3/8 rule, and
summing the areas, to obtain the approximate area under the curve.
Simpson's 1/3 Rule
Consider the area contained in the two strips under the curve of f(x) comprising
the three data points (x₀, y₀), (x₁, y₁) and (x₂, y₂). Approximate this area with the
area under a parabola passing through these three points.
The general form of the equation of the second-degree parabola connecting the
three points is
    f(x) = ax² + bx + c.   (46)
Taking the origin at the middle point, the integration of equation (46) from −Δx
to Δx gives the area contained in the two strips under the parabola. Hence,
    A₂ strips = ∫[−Δx, Δx] (ax² + bx + c)dx = [ax³/3 + bx²/2 + cx] from −Δx to Δx
             = (2/3) a(Δx)³ + 2c(Δx).   (47)
The constants a and c can be determined from the fact that the points (−Δx, y₀),
(0, y₁) and (Δx, y₂) must all satisfy equation (46). The substitution of these three
sets of coordinates into equation (46) yields
    a = (y₀ − 2y₁ + y₂)/(2(Δx)²),   b = (y₂ − y₀)/(2Δx),   c = y₁.   (48)
The substitution of the first and the third parts of equation (48) into equation (47)
yields
    A₂ strips = (Δx/3)(y₀ + 4y₁ + y₂),   (49)
which gives the area in terms of the three ordinates y₀, y₁, and y₂ and the width Δx
of a single strip. This constitutes Simpson's 1/3 rule for obtaining the approximate
area contained in two equal-width strips under a curve.
If the area under a curve between two values of x is divided into n uniform strips
(n even), the application of equation (49) shows that
    A₀ = (Δx/3)(y₀ + 4y₁ + y₂),
    A₂ = (Δx/3)(y₂ + 4y₃ + y₄),
    A₄ = (Δx/3)(y₄ + 4y₅ + y₆),
    ⋮
    Aₙ₋₂ = (Δx/3)(yₙ₋₂ + 4yₙ₋₁ + yₙ).   (50)
Summing these areas, we obtain
    ∫[x₀, xₙ] f(x)dx ≈ A₀ + A₂ + A₄ + ⋯ + Aₙ₋₂
        = (Δx/3)(y₀ + 4y₁ + 2y₂ + 4y₃ + 2y₄ + 4y₅ + 2y₆ + ⋯ + 4yₙ₋₁ + yₙ)
        = (Δx/3)(y₀ + 4 Σᵢ₌₁,₃,₅ⁿ⁻¹ yᵢ + 2 Σᵢ₌₂,₄,₆ⁿ⁻² yᵢ + yₙ).   (51)
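Equation (51) in code (a sketch; the function name is my own):

```python
def composite_simpson13(f, a, b, n):
    # (h/3)(y0 + 4 y1 + 2 y2 + 4 y3 + ... + 4 y_{n-1} + yn); n must be even
    if n % 2 != 0:
        raise ValueError("n must be even for Simpson's 1/3 rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return h * total / 3

# Exact for cubics: ∫₀¹ x³ dx = 1/4 even with only n = 2 strips
approx = composite_simpson13(lambda x: x**3, 0.0, 1.0, 2)
print(approx)  # → 0.25
```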
Simpson's 3/8 Rule
Consider the area contained in three strips under the curve of f(x), with four data
points (x₀, y₀), (x₁, y₁), (x₂, y₂) and (x₃, y₃). Approximate f(x) by the third-degree
polynomial
    f(x) = ax³ + bx² + cx + d,   (52)
and, taking the origin midway between x₀ and x₃, integrate from −(3/2)Δx to
(3/2)Δx to obtain the area contained in the three strips under the cubic:
    A₃ strips = ∫[−(3/2)Δx, (3/2)Δx] (ax³ + bx² + cx + d)dx = (9/4) b(Δx)³ + 3d(Δx).
The constants b and d can be determined from the fact that the points (−(3/2)Δx, y₀),
(−(1/2)Δx, y₁), ((1/2)Δx, y₂), and ((3/2)Δx, y₃) must all satisfy equation (52).
Using these four sets of coordinates in equation (52), we obtain
    b = (y₀ + y₃ − y₁ − y₂)/(4(Δx)²),   (53)
    d = (9y₁ + 9y₂ − y₀ − y₃)/16.   (54)
The substitution of b and d from equations (53) and (54) into the expression above
yields
    A₃ strips = (3/8)(Δx)(y₀ + 3y₁ + 3y₂ + y₃),   (55)
which is Simpson's three-eighths rule for obtaining the approximate area contained
in three equal-width strips under a curve. It gives the area in terms of the four
ordinates y₀, y₁, y₂, and y₃ and the width Δx of a single strip.
If the area under a curve is divided into n uniform strips, with n a multiple of 3,
the application of equation (55) shows that
    A₀ = (3Δx/8)(y₀ + 3y₁ + 3y₂ + y₃),
    A₃ = (3Δx/8)(y₃ + 3y₄ + 3y₅ + y₆),
    A₆ = (3Δx/8)(y₆ + 3y₇ + 3y₈ + y₉),
    ⋮
    Aₙ₋₃ = (3Δx/8)(yₙ₋₃ + 3yₙ₋₂ + 3yₙ₋₁ + yₙ).   (56)
Summing these areas gives the composite Simpson's 3/8 rule:
    ∫[x₀, xₙ] f(x)dx ≈ (3Δx/8)(y₀ + 3y₁ + 3y₂ + 2y₃ + 3y₄ + 3y₅
        + 2y₆ + ⋯ + 2yₙ₋₃ + 3yₙ₋₂ + 3yₙ₋₁ + yₙ).   (57)
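Equation (57) in code (a sketch; the function name is my own):

```python
def composite_simpson38(f, a, b, n):
    # (3h/8)(y0 + 3y1 + 3y2 + 2y3 + ...); n must be a multiple of 3
    if n % 3 != 0:
        raise ValueError("n must be a multiple of 3 for Simpson's 3/8 rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (2 if i % 3 == 0 else 3) * f(a + i * h)
    return 3 * h * total / 8

# Also exact for cubics: ∫₀¹ x³ dx = 1/4 with n = 3 strips
approx = composite_simpson38(lambda x: x**3, 0.0, 1.0, 3)
print(approx)
```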
This formula will not give us the exact answer because the polynomial is not identical
with f(x). We get an expression for the error by integrating the error term of Pₙ(xₛ).
So, starting with the Newton-Gregory forward polynomial of degree n,
    Pₙ(xₛ) = f₀ + s Δf₀ + [s(s − 1)/2!] Δ²f₀ + [s(s − 1)(s − 2)/3!] Δ³f₀ + ⋯
             + [s(s − 1)(s − 2) ⋯ (s − (n − 1))/n!] Δⁿf₀.
For n = 1,
    P₁(xₛ) = f₀ + s Δf₀,
and
    ∫ₐᵇ f(x)dx ≈ ∫[x₀, x₁] P₁(xₛ)dx = ∫₀¹ (f₀ + s Δf₀) h ds
        = f₀ h [s]₀¹ + Δf₀ h [s²/2]₀¹ = (h/2)(f₀ + f₁),
using Δf₀ = f₁ − f₀. For the error estimation, we use the next-term rule, i.e.,
Error ≈ the value of the next term that would be added to P₁(xₛ), which implies
    Error = ∫[x₀, x₁] [s(s − 1)/2!] Δ²f₀ dx = (Δ²f₀ h/2) ∫₀¹ (s² − s)ds
          = (Δ²f₀ h/2) [s³/3 − s²/2]₀¹ = −(h/12) Δ²f₀ = −(h³/12) f″₀,
since Δ²f₀/h² ≈ f″₀. Taking ξ ∈ (x₀, x₁) such that f″(ξ) is the maximum value in
(x₀, x₁), then
    Error = −(h³/12) f″(ξ).
Therefore, the Newton-Cotes formula for n = 1 becomes
    ∫[x₀, x₁] f(x)dx = (h/2)(f₀ + f₁) − (h³/12) f″(ξ),   where ξ ∈ (x₀, x₁).
For n = 2,
    ∫[x₀, x₂] f(x)dx ≈ ∫[x₀, x₂] P₂(xₛ)dx = ∫₀² (f₀ + s Δf₀ + [s(s − 1)/2!] Δ²f₀) h ds
        = (h/3)(f₀ + 4f₁ + f₂),
which is Simpson's 1/3 rule. The next-term estimate with Δ³f₀ vanishes here,
because ∫₀² s(s − 1)(s − 2)ds = 0, so we go one term further:
    Error = ∫₀² [s(s − 1)(s − 2)(s − 3)/4!] Δ⁴f₀ h ds
          = (Δ⁴f₀ h/24) [s⁵/5 − 6s⁴/4 + 11s³/3 − 6s²/2]₀²
          = (Δ⁴f₀ h/24)(−4/15) = −(Δ⁴f₀ h)/90 = −(h⁵/90) f⁗₀,
since Δ⁴f₀/h⁴ ≈ f⁗₀. Taking ξ ∈ (x₀, x₂) such that f⁗(ξ) is the maximum value in
(x₀, x₂), then
    Error = −(h⁵/90) f⁗(ξ),
and the Newton-Cotes formula for n = 2 becomes
    ∫[x₀, x₂] f(x)dx = (h/3)(f₀ + 4f₁ + f₂) − (h⁵/90) f⁗(ξ),   where ξ ∈ (x₀, x₂).
For n = 3,
    ∫[x₀, x₃] f(x)dx ≈ ∫₀³ (f₀ + s Δf₀ + [s(s − 1)/2!] Δ²f₀ + [s(s − 1)(s − 2)/3!] Δ³f₀) h ds
        = f₀ h [s]₀³ + Δf₀ h [s²/2]₀³ + (Δ²f₀ h/2) [s³/3 − s²/2]₀³
          + (Δ³f₀ h/6) [s⁴/4 − s³ + s²]₀³
        = 3h (f₀ + (3/2)(f₁ − f₀) + (3/4)(f₂ − 2f₁ + f₀) + (1/8)(f₃ − 3f₂ + 3f₁ − f₀))
        = (3h/8)(f₀ + 3f₁ + 3f₂ + f₃),
where Δf₀ = f₁ − f₀, Δ²f₀ = f₂ − 2f₁ + f₀ and Δ³f₀ = f₃ − 3f₂ + 3f₁ − f₀.
Considering the next term for the error, we obtain
    Error = ∫[x₀, x₃] [s(s − 1)(s − 2)(s − 3)/4!] Δ⁴f₀ dx
          = ∫₀³ [s(s − 1)(s − 2)(s − 3)/4!] Δ⁴f₀ h ds
          = (Δ⁴f₀ h/24) [s⁵/5 − 6s⁴/4 + 11s³/3 − 6s²/2]₀³ = −(3/80) Δ⁴f₀ h.
Taking ξ ∈ (x₀, x₃) such that f⁗(ξ) is the maximum value in (x₀, x₃), then
    Error = −(3/80) h⁵ f⁗(ξ).
Therefore, the Newton-Cotes formula for n = 3 is derived as
    ∫[x₀, x₃] f(x)dx = (3h/8)(f₀ + 3f₁ + 3f₂ + f₃) − (3/80) h⁵ f⁗(ξ),   where ξ ∈ (x₀, x₃).
Since the order of the error of the 3/8 rule is the same as that of the 1/3 rule, there
is no gain in accuracy over the 1/3 rule when one has a free choice between the two
rules.
Gaussian Quadrature
Gauss observed that if we don't restrict ourselves to evaluating the function at
predetermined x-values, a three-term formula will contain six parameters and should
correspond to an interpolating polynomial of degree five. Formulas based on this
principle are called Gaussian quadrature formulas. They can be applied only
when f(x) is known explicitly, so that it can be evaluated at any desired value of x.
Consider the simple case of a two-term formula containing four unknown parameters:
    ∫₋₁¹ f(t)dt ≈ a f(t₁) + b f(t₂).
Requiring the formula to be exact for f(t) = 1, t, t², and t³ in turn gives the four
equations
    a + b = 2,   a t₁ + b t₂ = 0,   a t₁² + b t₂² = 2/3,   a t₁³ + b t₂³ = 0.
Multiplying the second equation by t₁² and subtracting it from the fourth, we get
    0 = b(t₂³ − t₂t₁²) = b t₂(t₂ − t₁)(t₂ + t₁).
This implies that either b = 0, t₂ = 0, t₁ = t₂, or t₁ = −t₂. Only the last of these
possibilities is satisfactory; the others are invalid or else reduce our formula to only
a single term, so we choose t₁ = −t₂. Then
    a = b = 1,   t₂ = −t₁ = 1/√3 = 0.5773…,   (59)
i.e.,
    ∫₋₁¹ f(t)dt ≈ f(−1/√3) + f(1/√3),
and adding these two values of the function gives the exact value for the integral of
any cubic polynomial over the interval from −1 to 1.
Consider a problem with limits of integration from a to b, not from −1 to 1 for
which we derived this formula. We must change the interval of integration to (−1, 1)
by the substitution
    x = [(b − a)/2] t + (b + a)/2,   so that   dx = [(b − a)/2] dt;
then
    ∫ₐᵇ f(x)dx = [(b − a)/2] ∫₋₁¹ f([(b − a)/2] t + (b + a)/2) dt.
Gaussian quadrature can be extended beyond two terms. The formula is then
given by
    ∫₋₁¹ f(t)dt ≈ Σᵢ₌₁ⁿ wᵢ f(tᵢ),   for n points.
This formula is exact for functions f(t) that are polynomials of degree 2n − 1 or
less.
Moreover, by extending the method we used previously for the 2-point formula,
for each n we obtain a system of 2n equations:
    w₁t₁ᵏ + w₂t₂ᵏ + w₃t₃ᵏ + ⋯ + wₙtₙᵏ = 0,          for k = 1, 3, 5, …, 2n − 1;
    w₁t₁ᵏ + w₂t₂ᵏ + w₃t₃ᵏ + ⋯ + wₙtₙᵏ = 2/(k + 1),  for k = 0, 2, 4, …, 2n − 2.
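The two-point result (59), combined with the interval transformation, gives a compact routine (a sketch; the function name is my own):

```python
from math import sqrt

def gauss2(f, a, b):
    # Two-point Gauss-Legendre rule mapped from (-1, 1) to (a, b):
    # nodes ±1/√3 with unit weights, exact for polynomials up to degree 3.
    half = (b - a) / 2
    mid = (b + a) / 2
    t = 1 / sqrt(3)
    return half * (f(mid - half * t) + f(mid + half * t))

# Exact for cubics: ∫₀² x³ dx = 4 from just two function evaluations
approx = gauss2(lambda x: x**3, 0.0, 2.0)
print(approx)
```

Two evaluations match what the composite rules need many strips to achieve on smooth integrands, which is the point of choosing the nodes freely.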
Adaptive Quadrature
The composite quadrature rules necessitate the use of equally spaced points. It is
useful to introduce a method that adjusts the step size to be smaller over portions
of the curve where a larger functional variation occurs. This technique is called
adaptive quadrature. The method is based on Simpson's rule.
Simpson's rule uses two subintervals over [aₖ, bₖ]:
    ∫[aₖ, bₖ] f(x)dx ≈ (h/3)[f(aₖ) + 4f(cₖ) + f(bₖ)] = S(aₖ, bₖ), say,   (60)
where cₖ = (aₖ + bₖ)/2 is the midpoint of [aₖ, bₖ] and h = (bₖ − aₖ)/2. Furthermore,
    ∫[aₖ, bₖ] f(x)dx = S(aₖ, bₖ) − (h⁵/90) f⁽⁴⁾(ξ₁),   ξ₁ ∈ (aₖ, bₖ),   (61)
and, halving the interval into [aₖ₁, bₖ₁] and [aₖ₂, bₖ₂] and applying (60) on each
half,
    ∫[aₖ, bₖ] f(x)dx = S(aₖ₁, bₖ₁) + S(aₖ₂, bₖ₂) − (h⁵/(16·90)) f⁽⁴⁾(ξ₂),   ξ₂ ∈ (aₖ, bₖ).   (63)
Assume that f⁽⁴⁾(ξ₁) ≈ f⁽⁴⁾(ξ₂); then the right sides of equations (61) and (63) are
used to obtain the relation
    S(aₖ, bₖ) − (h⁵/90) f⁽⁴⁾(ξ₂) ≈ S(aₖ₁, bₖ₁) + S(aₖ₂, bₖ₂) − (h⁵/(16·90)) f⁽⁴⁾(ξ₂),   (64)
which implies
    (h⁵/90) f⁽⁴⁾(ξ₂) ≈ (16/15)[S(aₖ, bₖ) − S(aₖ₁, bₖ₁) − S(aₖ₂, bₖ₂)].   (65)
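Equation (65) suggests a recursive scheme: accept the halved estimate when the bracketed difference is small, otherwise subdivide further. A minimal sketch (the acceptance test and the way the tolerance is split are one common choice, my assumption rather than the text's prescription):

```python
def simpson(f, a, b):
    # S(a, b): Simpson's rule over [a, b] using the midpoint
    c = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(c) + f(b))

def adaptive_simpson(f, a, b, tol):
    # Accept the halved estimate when the error estimate (the bracketed
    # difference divided by 15) is below tol; otherwise subdivide.
    c = (a + b) / 2
    whole = simpson(f, a, b)
    halves = simpson(f, a, c) + simpson(f, c, b)
    if abs(halves - whole) < 15 * tol:
        return halves + (halves - whole) / 15
    return adaptive_simpson(f, a, c, tol / 2) + adaptive_simpson(f, c, b, tol / 2)

# √x varies fastest near 0, so the recursion refines there; ∫₀¹ √x dx = 2/3
approx = adaptive_simpson(lambda x: x**0.5, 0.0, 1.0, 1e-8)
print(approx)
```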
After the Sᵢ values are known, the spline coefficients on each interval [xᵢ, xᵢ₊₁] are
    aᵢ = (Sᵢ₊₁ − Sᵢ)/(6(xᵢ₊₁ − xᵢ)),
    bᵢ = Sᵢ/2,
    cᵢ = (f(xᵢ₊₁) − f(xᵢ))/(xᵢ₊₁ − xᵢ) − ((xᵢ₊₁ − xᵢ)Sᵢ₊₁ + 2(xᵢ₊₁ − xᵢ)Sᵢ)/6,
    dᵢ = f(xᵢ).
Approximating the integral of f(x) over the n intervals where f(x) is approximated
by the spline is straightforward:
    ∫[x₀, xₙ] f(x)dx = Σᵢ₌₀ⁿ⁻¹ ∫[xᵢ, xᵢ₊₁] f(x)dx
        ≈ Σᵢ₌₀ⁿ⁻¹ [(aᵢ/4)(x − xᵢ)⁴ + (bᵢ/3)(x − xᵢ)³ + (cᵢ/2)(x − xᵢ)² + dᵢ(x − xᵢ)]
          evaluated from xᵢ to xᵢ₊₁
        = Σᵢ₌₀ⁿ⁻¹ [(aᵢ/4)(xᵢ₊₁ − xᵢ)⁴ + (bᵢ/3)(xᵢ₊₁ − xᵢ)³ + (cᵢ/2)(xᵢ₊₁ − xᵢ)² + dᵢ(xᵢ₊₁ − xᵢ)].
If the intervals are all of the same size (h = xᵢ₊₁ − xᵢ), this equation becomes
    ∫[x₀, xₙ] f(x)dx ≈ (h⁴/4) Σᵢ₌₀ⁿ⁻¹ aᵢ + (h³/3) Σᵢ₌₀ⁿ⁻¹ bᵢ + (h²/2) Σᵢ₌₀ⁿ⁻¹ cᵢ + h Σᵢ₌₀ⁿ⁻¹ dᵢ.
Numerical Differentiation
For numerical differentiation we shall use the Taylor series for a function y = f(x)
at (xᵢ + h), expanded about xᵢ:
    f(xᵢ + h) = f(xᵢ) + h f′(xᵢ) + (h²/2!) f″(xᵢ) + (h³/3!) f‴(xᵢ) + ⋯   (67)
where h = Δx, f(xᵢ) is the ordinate corresponding to xᵢ, and (xᵢ + h) is in the
region of convergence. The function at (xᵢ − h) is similarly given by
    f(xᵢ − h) = f(xᵢ) − h f′(xᵢ) + (h²/2!) f″(xᵢ) − (h³/3!) f‴(xᵢ) + ⋯   (68)
Subtracting equation (68) from equation (67), we obtain
    f′(xᵢ) = [f(xᵢ + h) − f(xᵢ − h)]/(2h) − (h²/6) f‴(xᵢ) + ⋯   (69)
If we designate equally spaced points to the right of xᵢ as xᵢ₊₁, xᵢ₊₂, and so on,
and those to the left of xᵢ as xᵢ₋₁, xᵢ₋₂, and identify the corresponding ordinates as
fᵢ₊₁, fᵢ₊₂, fᵢ₋₁, fᵢ₋₂, respectively, equation (69) can be written in the form
    f′ᵢ = (fᵢ₊₁ − fᵢ₋₁)/(2h) − (h²/6) f‴ᵢ + ⋯   (70)
The approximate expression using only the first term to the right of the equal sign
is the central-difference approximation of the first derivative of the function at xᵢ,
with error of order h². Adding equations (67) and (68) gives, similarly,
    f″ᵢ = (fᵢ₊₁ − 2fᵢ + fᵢ₋₁)/h² − (h²/12) f⁗ᵢ + ⋯   (71)
The approximate expression using only the first term to the right of the equal sign
is the central-difference approximation of the second derivative of the function
at xᵢ, with error of order h². As with the first derivative, the approximations with
error of higher order can also be derived.
To obtain an expression for the third derivative, we expand the function at xᵢ₋₂
and at xᵢ₊₂, such that
    fᵢ₊₂ = fᵢ + 2h f′ᵢ + ((2h)²/2!) f″ᵢ + ((2h)³/3!) f‴ᵢ + ⋯   (72)
    fᵢ₋₂ = fᵢ − 2h f′ᵢ + ((2h)²/2!) f″ᵢ − ((2h)³/3!) f‴ᵢ + ⋯   (73)
With the help of equations (67), (68), (72), and (73) we derive
    f‴ᵢ = (fᵢ₊₂ − 2fᵢ₊₁ + 2fᵢ₋₁ − fᵢ₋₂)/(2h³) + O(h²).   (74)
Equation (74) gives the central-difference approximation for the third derivative
of the function f(x) at xᵢ, with error of order h².
Successively higher derivatives can be obtained by this method, but since they
require the solution of an increasingly large number of simultaneous equations, the
process becomes quite tedious. The same technique may also be used to find more
accurate expressions for the derivatives, by using additional terms in the Taylor-series
expansion.
It can be noted that the central-difference expressions for the various derivatives
involve values of the function on both sides of the x value at which the derivative
of the function is desired. Using the Taylor-series expansion we can also obtain
expressions for the derivatives which are entirely in terms of values of the function
at xᵢ and points to the right of xᵢ. These are known as forward-finite-difference
expressions. In a similar manner, derivative expressions which are entirely in terms
of values of the function at xᵢ and points to the left of xᵢ can be found. These are
known as backward-finite-difference expressions. In numerical differentiation,
forward-finite-difference expressions are used when data to the left of a point at
which a derivative is desired are not available, and backward-difference expressions
are used when data to the right of the desired point are not available. Central-difference
expressions, however, are more accurate than either forward- or backward-difference
expressions.
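The central-difference approximations for the first and second derivatives can be exercised on f(x) = sin x, whose derivatives are known exactly (a sketch; the function names are my own):

```python
from math import sin, cos

def d1_central(f, x, h):
    # first derivative: (f(x+h) - f(x-h)) / (2h), error O(h²)
    return (f(x + h) - f(x - h)) / (2 * h)

def d2_central(f, x, h):
    # second derivative: (f(x+h) - 2 f(x) + f(x-h)) / h², error O(h²)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x, h = 1.0, 1e-4
d1_err = d1_central(sin, x, h) - cos(x)   # exact f' = cos x
d2_err = d2_central(sin, x, h) + sin(x)   # exact f'' = -sin x
print(d1_err, d2_err)                     # both tiny: O(h²) truncation plus rounding
```

Note the trade-off visible in the second-derivative error: shrinking h reduces truncation error but amplifies cancellation in the numerator, so very small h does not keep improving the result.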
Forward-finite-difference expressions with error of order h:
    f′ᵢ = (fᵢ₊₁ − fᵢ)/h,   (75)
    f″ᵢ = (fᵢ₊₂ − 2fᵢ₊₁ + fᵢ)/h²,   (76)
    f‴ᵢ = (fᵢ₊₃ − 3fᵢ₊₂ + 3fᵢ₊₁ − fᵢ)/h³,   (77)
    f⁗ᵢ = (fᵢ₊₄ − 4fᵢ₊₃ + 6fᵢ₊₂ − 4fᵢ₊₁ + fᵢ)/h⁴.   (78)
Forward-finite-difference expressions with error of order h²:
    f′ᵢ = (−fᵢ₊₂ + 4fᵢ₊₁ − 3fᵢ)/(2h),   (79)
    f″ᵢ = (−fᵢ₊₃ + 4fᵢ₊₂ − 5fᵢ₊₁ + 2fᵢ)/h²,   (80)
    f‴ᵢ = (−3fᵢ₊₄ + 14fᵢ₊₃ − 24fᵢ₊₂ + 18fᵢ₊₁ − 5fᵢ)/(2h³),   (81)
    f⁗ᵢ = (−2fᵢ₊₅ + 11fᵢ₊₄ − 24fᵢ₊₃ + 26fᵢ₊₂ − 14fᵢ₊₁ + 3fᵢ)/h⁴.   (82)
Backward-finite-difference expressions with error of order h:
    f′ᵢ = (fᵢ − fᵢ₋₁)/h,   (83)
    f″ᵢ = (fᵢ − 2fᵢ₋₁ + fᵢ₋₂)/h²,   (84)
    f‴ᵢ = (fᵢ − 3fᵢ₋₁ + 3fᵢ₋₂ − fᵢ₋₃)/h³,   (85)
    f⁗ᵢ = (fᵢ − 4fᵢ₋₁ + 6fᵢ₋₂ − 4fᵢ₋₃ + fᵢ₋₄)/h⁴.   (86)
Backward-finite-difference expressions with error of order h²:
    f′ᵢ = (3fᵢ − 4fᵢ₋₁ + fᵢ₋₂)/(2h),   (87)
    f″ᵢ = (2fᵢ − 5fᵢ₋₁ + 4fᵢ₋₂ − fᵢ₋₃)/h²,   (88)
    f‴ᵢ = (5fᵢ − 18fᵢ₋₁ + 24fᵢ₋₂ − 14fᵢ₋₃ + 3fᵢ₋₄)/(2h³),   (89)
    f⁗ᵢ = (3fᵢ − 14fᵢ₋₁ + 26fᵢ₋₂ − 24fᵢ₋₃ + 11fᵢ₋₄ − 2fᵢ₋₅)/h⁴.   (90)
Central-difference expressions with error of order h²:
    f′ᵢ = (fᵢ₊₁ − fᵢ₋₁)/(2h),   (91)
    f″ᵢ = (fᵢ₊₁ − 2fᵢ + fᵢ₋₁)/h²,   (92)
    f‴ᵢ = (fᵢ₊₂ − 2fᵢ₊₁ + 2fᵢ₋₁ − fᵢ₋₂)/(2h³),   (93)
    f⁗ᵢ = (fᵢ₊₂ − 4fᵢ₊₁ + 6fᵢ − 4fᵢ₋₁ + fᵢ₋₂)/h⁴.   (94)
Central-difference expressions with error of order h⁴:
    f′ᵢ = (−fᵢ₊₂ + 8fᵢ₊₁ − 8fᵢ₋₁ + fᵢ₋₂)/(12h),   (95)
    f″ᵢ = (−fᵢ₊₂ + 16fᵢ₊₁ − 30fᵢ + 16fᵢ₋₁ − fᵢ₋₂)/(12h²),   (96)
    f‴ᵢ = (−fᵢ₊₃ + 8fᵢ₊₂ − 13fᵢ₊₁ + 13fᵢ₋₁ − 8fᵢ₋₂ + fᵢ₋₃)/(8h³),   (97)
    f⁗ᵢ = (−fᵢ₊₃ + 12fᵢ₊₂ − 39fᵢ₊₁ + 56fᵢ − 39fᵢ₋₁ + 12fᵢ₋₂ − fᵢ₋₃)/(6h⁴).   (98)
In terms of divided differences,
    f[xᵢ, xᵢ₊₁] = (f(xᵢ₊₁) − f(xᵢ))/(xᵢ₊₁ − xᵢ) ≈ f′(xᵢ),
    f[xᵢ, xᵢ₊₁, xᵢ₊₂] = (f[xᵢ₊₁, xᵢ₊₂] − f[xᵢ, xᵢ₊₁])/(xᵢ₊₂ − xᵢ)
        ≈ (f′(xᵢ₊₁) − f′(xᵢ))/(2h) ≈ f″(xᵢ)/2!,
    f[xᵢ, xᵢ₊₁, xᵢ₊₂, xᵢ₊₃] = (f[xᵢ₊₁, xᵢ₊₂, xᵢ₊₃] − f[xᵢ, xᵢ₊₁, xᵢ₊₂])/(xᵢ₊₃ − xᵢ)
        ≈ (f″(xᵢ₊₁) − f″(xᵢ))/(2 · 3h) ≈ f‴(xᵢ)/3!,
    ⋮
    f[xᵢ, xᵢ₊₁, …, xᵢ₊ₙ] = (f[xᵢ₊₁, …, xᵢ₊ₙ] − f[xᵢ, …, xᵢ₊ₙ₋₁])/(xᵢ₊ₙ − xᵢ) ≈ f⁽ⁿ⁾(xᵢ)/n!,
so that
    f⁽ⁿ⁾(xᵢ) ≈ n! f[xᵢ, xᵢ₊₁, …, xᵢ₊ₙ].
For equally spaced points these become forward differences:
    f′(xᵢ) ≈ Δf(xᵢ)/h = (f(xᵢ₊₁) − f(xᵢ))/h,
    f″(xᵢ) ≈ Δ²f(xᵢ)/h² = (f(xᵢ₊₂) − 2f(xᵢ₊₁) + f(xᵢ))/h²,
    f‴(xᵢ) ≈ Δ³f(xᵢ)/h³ = (f(xᵢ₊₃) − 3f(xᵢ₊₂) + 3f(xᵢ₊₁) − f(xᵢ))/h³,
    ⋮
    f⁽ⁿ⁾(xᵢ) ≈ Δⁿf(xᵢ)/hⁿ
        = (f(xᵢ₊ₙ) − n f(xᵢ₊ₙ₋₁) + [n(n − 1)/2!] f(xᵢ₊ₙ₋₂) − ⋯ + (−1)ⁿ f(xᵢ))/hⁿ.
For a function f(x, y) of two variables tabulated at grid points (xᵢ, yⱼ) with spacings
Δx and Δy, the first-order forward difference formulas for the first derivatives are
    ∂f/∂x|ᵢⱼ = (fᵢ₊₁,ⱼ − fᵢ,ⱼ)/Δx + O(Δx),
    ∂f/∂y|ᵢⱼ = (fᵢ,ⱼ₊₁ − fᵢ,ⱼ)/Δy + O(Δy).
Similarly, the second-order central difference formulas for the second order
derivatives are of the form
    ∂²f/∂x²|ᵢⱼ = (fᵢ₊₁,ⱼ − 2fᵢ,ⱼ + fᵢ₋₁,ⱼ)/(Δx)² + O(Δx²),
    ∂²f/∂y²|ᵢⱼ = (fᵢ,ⱼ₊₁ − 2fᵢ,ⱼ + fᵢ,ⱼ₋₁)/(Δy)² + O(Δy²),
and for the mixed derivative
    ∂²f/∂x∂y|ᵢⱼ = ∂²f/∂y∂x|ᵢⱼ
        = (fᵢ₊₁,ⱼ₊₁ − fᵢ₊₁,ⱼ₋₁ − fᵢ₋₁,ⱼ₊₁ + fᵢ₋₁,ⱼ₋₁)/(4ΔxΔy) + O(Δx²) + O(Δy²).
Numerical Solution of Ordinary Differential Equations
Consider the initial-value problem
    y′ = f(t, y),   a ≤ t ≤ b,   with y(a) = y₀,
and let y(t) denote its unique solution. A discrete variable method produces
approximations yₖ ≈ y(tₖ) at mesh points tₖ, with y(t₀) = y₀, for k = 0, 1, 2, …, m.
The global discretization error eₖ = y(tₖ) − yₖ is the difference between the unique
solution and the solution obtained by the discrete variable method.
The local discretization error εₖ₊₁ is defined by
    εₖ₊₁ = y(tₖ₊₁) − yₖ − h Φ(tₖ, yₖ)   for k = 0, 1, 2, …, m − 1,
where Φ denotes the increment function of the method being used.
Euler Methods
Euler methods consist of three versions:
1. forward Euler method,
2. modified Euler method,
3. backward Euler method.
Consider again the initial-value problem y′ = f(t, y) over [a, b] with y(a) = y₀.
Subdivide the interval [a, b] into m equal subintervals and select the mesh points
    tₖ = a + kh   for k = 0, 1, 2, 3, …, m, where h = (b − a)/m.   (99)
The value h is called the step size. We now proceed to solve approximately
    y′ = f(t, y).   (100)
Assume that y(t), y′(t), and y″(t) are continuous and use Taylor's theorem to
expand y(t) about t = t₀. For each value t, there exists a value ξ that lies between
t₀ and t so that
    y(t) = y(t₀) + y′(t₀)(t − t₀) + y″(ξ)(t − t₀)²/2.   (101)
Replacing y′(t₀) = f(t₀, y(t₀)) and setting t = t₁ = t₀ + h gives
    y(t₁) = y(t₀) + h f(t₀, y(t₀)) + y″(ξ) h²/2.   (102)
If the step size h is chosen small enough, then we may neglect the second-order
term (involving h²) and get
    y₁ = y₀ + h f(t₀, y₀),   (103)
which is the forward Euler approximation. The process is repeated and generates
a sequence of points that approximates the solution curve y = y(t). The general
step for forward Euler's method is
    tₖ₊₁ = tₖ + h,   yₖ₊₁ = yₖ + h f(tₖ, yₖ).   (104)
The trouble with this simplest method is its lack of accuracy, requiring an
extremely small step size. We might improve this method with just a little additional
effort.
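The general step (104) in code (a sketch; names are my own):

```python
def euler(f, a, b, y0, m):
    # forward Euler: t_{k+1} = t_k + h, y_{k+1} = y_k + h f(t_k, y_k)
    h = (b - a) / m
    t, y = a, y0
    ts, ys = [t], [y]
    for _ in range(m):
        y = y + h * f(t, y)
        t = t + h
        ts.append(t)
        ys.append(y)
    return ts, ys

# y' = y, y(0) = 1 has exact solution e^t; the O(h) error is visible even at h = 0.001
ts, ys = euler(lambda t, y: y, 0.0, 1.0, 1.0, 1000)
print(ys[-1])   # ≈ 2.71692, versus e = 2.71828...
```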
Taylor Methods
Expanding the solution in a Taylor series of order n about tₖ gives
    y(tₖ₊₁) = y(tₖ) + h y′(tₖ) + (h²/2!) y″(tₖ) + ⋯ + (hⁿ/n!) y⁽ⁿ⁾(tₖ)
              + (hⁿ⁺¹/(n + 1)!) y⁽ⁿ⁺¹⁾(ξₖ)   (105)
for some ξₖ ∈ (tₖ, tₖ₊₁). Successive differentiation of the solution, y(t), gives
    y′(t) = f(t, y(t)),   y″(t) = f′(t, y(t)),
and in general,
    y⁽ᵏ⁾(t) = f⁽ᵏ⁻¹⁾(t, y(t)).
Substituting these results into equation (105) gives
    y(tₖ₊₁) = y(tₖ) + h f(tₖ, y(tₖ)) + (h²/2) f′(tₖ, y(tₖ)) + ⋯
              + (hⁿ/n!) f⁽ⁿ⁻¹⁾(tₖ, y(tₖ)) + (hⁿ⁺¹/(n + 1)!) f⁽ⁿ⁾(ξₖ, y(ξₖ)).   (106)
Heun's Method
Heun's method introduces a new idea for constructing an algorithm to solve the
I. V. P.
    y′(t) = f(t, y(t))   with y(t₀) = y₀.   (107)
To obtain the solution point (t₁, y₁) we can use the fundamental theorem of calculus
and integrate y′(t) over [t₀, t₁] to get
    ∫[t₀, t₁] f(t, y(t))dt = ∫[t₀, t₁] y′(t)dt = y(t₁) − y(t₀),   (108)
where the antiderivative of y′(t) is the desired function y(t). When equation (108)
is solved for y(t₁), the result is
    y(t₁) = y(t₀) + ∫[t₀, t₁] f(t, y(t))dt.   (109)
Approximating the integral on the right by the trapezoidal rule, and using Euler's
method to supply the value y₀ + h f(t₀, y₀) ≈ y(t₁) needed at the right endpoint,
gives
    y₁ = y₀ + (h/2)[f(t₀, y₀) + f(t₁, y₀ + h f(t₀, y₀))].   (111)
The process is repeated and generates a sequence of points that approximates the
solution curve y = y(t). At each step, Euler's method is used as a prediction, and
then the trapezoidal rule is used to make a correction to obtain the final value. The
general step for Heun's method is
    pₖ₊₁ = yₖ + h f(tₖ, yₖ),   tₖ₊₁ = tₖ + h,
    yₖ₊₁ = yₖ + (h/2)[f(tₖ, yₖ) + f(tₖ₊₁, pₖ₊₁)].   (112)
The local error of the trapezoidal correction at each step is
    −(h³/12) y″(ξₖ).   (113)
If the only error at each step were that given in (113), after m steps the accumulated
error for Heun's method would be
    −Σₖ₌₁ᵐ (h³/12) y″(ξₖ) ≈ −[(b − a)/12] y″(ξ) h² = O(h²).   (114)
The Taylor methods have the desirable property of higher-order local truncation
error, but the disadvantage of requiring the computation and evaluation of the
derivatives of f(t, y). This is a complicated and time-consuming procedure for most
problems, so the Taylor methods are seldom used in practice.
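Heun's general step (112) in code (a sketch; names are my own):

```python
def heun(f, a, b, y0, m):
    # Euler predictor followed by a trapezoidal-rule corrector at each step
    h = (b - a) / m
    t, y = a, y0
    for _ in range(m):
        p = y + h * f(t, y)                        # predictor
        y = y + h / 2 * (f(t, y) + f(t + h, p))    # corrector
        t += h
    return y

# y' = y, y(0) = 1: the O(h²) global error is far smaller than Euler's at the same h
approx = heun(lambda t, y: y, 0.0, 1.0, 1.0, 1000)
print(approx)
```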
If f(t, y) and all its partial derivatives through order n + 1 are continuous on a
region containing (t₀, y₀), then f can be written f(t, y) = Pₙ(t, y) + Rₙ(t, y), where
    Pₙ(t, y) = f(t₀, y₀) + [(t − t₀) ∂f/∂t (t₀, y₀) + (y − y₀) ∂f/∂y (t₀, y₀)]
        + [((t − t₀)²/2) ∂²f/∂t² (t₀, y₀) + (t − t₀)(y − y₀) ∂²f/∂t∂y (t₀, y₀)
        + ((y − y₀)²/2) ∂²f/∂y² (t₀, y₀)] + ⋯
        + (1/n!) Σⱼ₌₀ⁿ C(n, j)(t − t₀)ⁿ⁻ʲ(y − y₀)ʲ ∂ⁿf/∂tⁿ⁻ʲ∂yʲ (t₀, y₀)   (115)
and
    Rₙ(t, y) = (1/(n + 1)!) Σⱼ₌₀ⁿ⁺¹ C(n + 1, j)(t − t₀)ⁿ⁺¹⁻ʲ(y − y₀)ʲ
               ∂ⁿ⁺¹f/∂tⁿ⁺¹⁻ʲ∂yʲ (ξ, μ)   (116)
for some point (ξ, μ) between (t₀, y₀) and (t, y). The function Pₙ(t, y) is called the
nth Taylor polynomial in two variables for the function f about (t₀, y₀), and
Rₙ(t, y) is the remainder associated with Pₙ(t, y).
To develop the Runge-Kutta method of order 2, let us start with the Taylor
polynomial
    y(tᵢ₊₁) = y(tᵢ) + h y′(tᵢ) + (h²/2) y″(tᵢ) + (h³/3!) y‴(ξ)
            = y(tᵢ) + h {f(tᵢ, y(tᵢ)) + (h/2) f′(tᵢ, y(tᵢ))} + (h³/3!) y‴(ξ),
since y′(t) = f(t, y(t)) and
    f′(tᵢ, y(tᵢ)) = ∂f/∂t (tᵢ, y(tᵢ)) + ∂f/∂y (tᵢ, y(tᵢ)) · y′(tᵢ).
The idea is to replace the braced combination by a single function evaluation of the
form a₁ f(tᵢ + α₁, yᵢ + β₁). Expanding f(tᵢ + α₁, yᵢ + β₁) in its Taylor polynomial
of degree one about (tᵢ, yᵢ) gives
    a₁ f(tᵢ + α₁, yᵢ + β₁) ≈ a₁ f(tᵢ, y(tᵢ)) + a₁α₁ ∂f/∂t (tᵢ, y(tᵢ))
                             + a₁β₁ ∂f/∂y (tᵢ, y(tᵢ)).
Matching the coefficients of f and its derivatives enclosed in the braces in the
preceding equation gives
    f(t, y): a₁ = 1;   ∂f/∂t (t, y): a₁α₁ = h/2;   ∂f/∂y (t, y): a₁β₁ = (h/2) f(t, y);
so
    a₁ = 1,   α₁ = h/2,   and β₁ = (h/2) f(t, y),
and therefore
    y(tᵢ₊₁) ≈ y(tᵢ) + h f(tᵢ + h/2, y(tᵢ) + (h/2) f(tᵢ, y(tᵢ))).
The error introduced by replacing the term in the Taylor method with its
approximation has the same order as the error term for the method. The Runge-Kutta
method produced in this way, called the Midpoint method, is also a second-order
method. As a consequence, the local error of the method is proportional to h³, and
the global error is proportional to h².
Midpoint Method
    w₀ = α,
    wᵢ₊₁ = wᵢ + h f(tᵢ + h/2, wᵢ + (h/2) f(tᵢ, wᵢ)),
where i = 0, 1, …, N − 1, with local error O(h³) and global error O(h²).
Using a₁ f(t + α₁, y + β₁) to replace the term in the Taylor method is the easiest
choice, but it is not the only one. If we instead use a term of the form
    a₁ f(t, y) + a₂ f(t + α₂, y + δ₂ f(t, y)),
the extra parameter in this formula provides an infinite number of second-order
Runge-Kutta formulas. When a₁ = a₂ = 1/2 and α₂ = δ₂ = h, we have the
Modified Euler method.
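The Midpoint method as a sketch (names are my own):

```python
def midpoint_rk2(f, a, b, y0, m):
    # w_{i+1} = w_i + h f(t_i + h/2, w_i + (h/2) f(t_i, w_i))
    h = (b - a) / m
    t, w = a, y0
    for _ in range(m):
        w = w + h * f(t + h / 2, w + h / 2 * f(t, w))
        t += h
    return w

# y' = y, y(0) = 1: second order, comparable to Heun / modified Euler
approx = midpoint_rk2(lambda t, y: y, 0.0, 1.0, 1.0, 1000)
print(approx)
```

For the linear test problem y′ = y the midpoint, Heun, and modified Euler updates all reduce to multiplying by 1 + h + h²/2 per step, so they give identical results here; they differ on nonlinear problems.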
Predictor-Corrector Methods
The Taylor and Runge-Kutta methods are examples of one-step methods for
approximating the solution to initial-value problems. These methods use wᵢ in the
approximation wᵢ₊₁ to y(tᵢ₊₁) but do not involve any of the prior approximations
w₀, w₁, …, wᵢ₋₁. Generally some functional evaluations of f are required at
intermediate points, but these are discarded as soon as wᵢ₊₁ is obtained.
Since the approximations wⱼ generally decrease in accuracy as j increases, better
approximation methods can be derived if, when approximating y(tᵢ₊₁), we include
in the method some of the approximations prior to wᵢ. Methods developed using
this philosophy are called multistep methods.
In brief, one-step methods consider what occurred at only one previous step;
multistep methods consider what happened at more than one previous step. To
derive a multistep method, suppose that y(t) is the solution to the initial-value
problem
    dy/dt = f(t, y),   a ≤ t ≤ b,   y(a) = α.
Then
    y(tᵢ₊₁) − y(tᵢ) = ∫[tᵢ, tᵢ₊₁] y′(t)dt = ∫[tᵢ, tᵢ₊₁] f(t, y(t))dt,
and
    y(tᵢ₊₁) = y(tᵢ) + ∫[tᵢ, tᵢ₊₁] f(t, y(t))dt.
Since we cannot integrate f(t, y(t)) without knowing y(t), which is the solution to
the problem, we instead integrate an interpolating polynomial, P(t), determined by
some of the previously obtained data points
    (t₀, w₀), (t₁, w₁), …, (tᵢ, wᵢ).
When we assume, in addition, that y(tᵢ) ≈ wᵢ, we have
    y(tᵢ₊₁) ≈ wᵢ + ∫[tᵢ, tᵢ₊₁] P(t)dt.
Adams-Bashforth Two-Step Explicit Method
    w₀ = α,   w₁ = α₁,
    wᵢ₊₁ = wᵢ + (h/2)[3f(tᵢ, wᵢ) − f(tᵢ₋₁, wᵢ₋₁)],
where i = 1, 2, …, N − 1, with local error (5/12) y‴(μᵢ)h³ for some μᵢ ∈ (tᵢ₋₁, tᵢ₊₁).

Adams-Bashforth Three-Step Explicit Method
    w₀ = α,   w₁ = α₁,   w₂ = α₂,
    wᵢ₊₁ = wᵢ + (h/12)[23f(tᵢ, wᵢ) − 16f(tᵢ₋₁, wᵢ₋₁) + 5f(tᵢ₋₂, wᵢ₋₂)],
where i = 2, 3, …, N − 1, with local error (3/8) y⁽⁴⁾(μᵢ)h⁴ for some μᵢ ∈ (tᵢ₋₂, tᵢ₊₁).
Adams-Bashforth Four-Step Explicit Method
    w₀ = α,   w₁ = α₁,   w₂ = α₂,   w₃ = α₃,
    wᵢ₊₁ = wᵢ + (h/24)[55f(tᵢ, wᵢ) − 59f(tᵢ₋₁, wᵢ₋₁) + 37f(tᵢ₋₂, wᵢ₋₂) − 9f(tᵢ₋₃, wᵢ₋₃)],
where i = 3, 4, …, N − 1, with local error (251/720) y⁽⁵⁾(μᵢ)h⁵ for some μᵢ ∈ (tᵢ₋₃, tᵢ₊₁).

Adams-Bashforth Five-Step Explicit Method
    w₀ = α,   w₁ = α₁,   w₂ = α₂,   w₃ = α₃,   w₄ = α₄,
    wᵢ₊₁ = wᵢ + (h/720)[1901f(tᵢ, wᵢ) − 2774f(tᵢ₋₁, wᵢ₋₁) + 2616f(tᵢ₋₂, wᵢ₋₂)
           − 1274f(tᵢ₋₃, wᵢ₋₃) + 251f(tᵢ₋₄, wᵢ₋₄)],
where i = 4, 5, …, N − 1, with local error (95/288) y⁽⁶⁾(μᵢ)h⁶ for some μᵢ ∈ (tᵢ₋₄, tᵢ₊₁).
Implicit methods use (tᵢ₊₁, f(tᵢ₊₁, y(tᵢ₊₁))) as an additional interpolation node
in the approximation of the integral
    ∫[tᵢ, tᵢ₊₁] f(t, y(t))dt.
Some of the more common implicit methods are listed here. Notice that the local
error of an (m − 1)-step implicit method is O(hᵐ⁺¹), the same as that of an m-step
explicit method. They both use m function evaluations, however, since the implicit
methods use f(tᵢ₊₁, wᵢ₊₁), but the explicit methods do not.

Adams-Moulton Two-Step Implicit Method
    w₀ = α,   w₁ = α₁,
    wᵢ₊₁ = wᵢ + (h/12)[5f(tᵢ₊₁, wᵢ₊₁) + 8f(tᵢ, wᵢ) − f(tᵢ₋₁, wᵢ₋₁)],
where i = 1, 2, …, N − 1, with local error −(1/24) y⁽⁴⁾(μᵢ)h⁴ for some μᵢ in (tᵢ₋₁, tᵢ₊₁).
Adams-Moulton Three-Step Implicit Method
    w₀ = α,   w₁ = α₁,   w₂ = α₂,
    wᵢ₊₁ = wᵢ + (h/24)[9f(tᵢ₊₁, wᵢ₊₁) + 19f(tᵢ, wᵢ) − 5f(tᵢ₋₁, wᵢ₋₁) + f(tᵢ₋₂, wᵢ₋₂)],
where i = 2, 3, …, N − 1, with local error −(19/720) y⁽⁵⁾(μᵢ)h⁵ for some μᵢ in (tᵢ₋₂, tᵢ₊₁).

Adams-Moulton Four-Step Implicit Method
    w₀ = α,   w₁ = α₁,   w₂ = α₂,   w₃ = α₃,
    wᵢ₊₁ = wᵢ + (h/720)[251f(tᵢ₊₁, wᵢ₊₁) + 646f(tᵢ, wᵢ) − 264f(tᵢ₋₁, wᵢ₋₁)
           + 106f(tᵢ₋₂, wᵢ₋₂) − 19f(tᵢ₋₃, wᵢ₋₃)],
where i = 3, 4, …, N − 1, with local error −(3/160) y⁽⁶⁾(μᵢ)h⁶ for some μᵢ in (tᵢ₋₃, tᵢ₊₁).
Comparing an m-step Adams-Bashforth explicit method to an (m − 1)-step
Adams-Moulton implicit method, we see that both require m evaluations of f per
step, and both have terms of the form y⁽ᵐ⁺¹⁾(μᵢ)hᵐ⁺¹ in their local errors. In general,
the coefficients of the terms involving f in the approximation and those in the local
error are smaller for the implicit methods than for the explicit methods. This leads
to smaller truncation and round-off errors for the implicit methods.
In practice, implicit multistep methods are not used alone. Rather, they are
used to improve approximations obtained by explicit methods. The combination of
an explicit and an implicit technique is called a predictor-corrector method. The
explicit method predicts an approximation, and the implicit method corrects this
prediction.
Milne's Method
    wᵢ₊₁ = wᵢ₋₃ + (4h/3)[2f(tᵢ, wᵢ) − f(tᵢ₋₁, wᵢ₋₁) + 2f(tᵢ₋₂, wᵢ₋₂)],
where i = 3, 4, …, N − 1, with local error (14/45) y⁽⁵⁾(μᵢ)h⁵ for some μᵢ in (tᵢ₋₃, tᵢ₊₁).
This method is used as a predictor for an implicit method called Simpson's
method. Its name comes from the fact that it can be derived using Simpson's rule
for approximating integrals.

Simpson's Method
    wᵢ₊₁ = wᵢ₋₁ + (h/3)[f(tᵢ₊₁, wᵢ₊₁) + 4f(tᵢ, wᵢ) + f(tᵢ₋₁, wᵢ₋₁)],
where i = 1, 2, …, N − 1, with local error −(1/90) y⁽⁵⁾(μᵢ)h⁵ for some μᵢ in (tᵢ₋₁, tᵢ₊₁).
Although the local error involved with a predictor-corrector method of the
Milne-Simpson type is generally smaller than that of the Adams-Bashforth-Moulton
method, the technique has limited use because of round-off error problems, which
do not occur with the Adams procedure.
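A sketch of one Milne-Simpson predictor-corrector pass; the starting values w₁, w₂, w₃ are generated here with Heun steps, which is my choice of starter, not something the text specifies:

```python
def milne_simpson(f, a, b, y0, m):
    h = (b - a) / m
    ts = [a + k * h for k in range(m + 1)]
    ws = [y0]
    for k in range(3):                      # starting values via Heun steps
        t, w = ts[k], ws[k]
        p = w + h * f(t, w)
        ws.append(w + h / 2 * (f(t, w) + f(t + h, p)))
    for i in range(3, m):
        # Milne predictor
        p = ws[i - 3] + 4 * h / 3 * (2 * f(ts[i], ws[i]) - f(ts[i - 1], ws[i - 1])
                                     + 2 * f(ts[i - 2], ws[i - 2]))
        # Simpson corrector, evaluating f(t_{i+1}, .) at the predicted value
        ws.append(ws[i - 1] + h / 3 * (f(ts[i + 1], p) + 4 * f(ts[i], ws[i])
                                       + f(ts[i - 1], ws[i - 1])))
    return ws[-1]

# y' = y, y(0) = 1: the predictor-corrector pair reaches near machine-level
# agreement with e at a modest step count
approx = milne_simpson(lambda t, y: y, 0.0, 1.0, 1.0, 100)
print(approx)
```

The predicted value feeds the corrector's f(tᵢ₊₁, ·) evaluation rather than solving the implicit equation exactly; iterating the corrector, or pairing Adams-Bashforth with Adams-Moulton instead, follows the same pattern.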