Partial Differential Equations in HPC, I

HPC-I, Fall 2012
M. D. Jones, Ph.D.
Center for Computational Research, University at Buffalo, State University of New York
Types of PDEs
Recapitulate - types of partial differential equations (2nd order, 2D):

    A φ_xx + 2B φ_xy + C φ_yy + D φ_x + E φ_y + F = 0

Three major classes:
- Hyperbolic (often time dependent, e.g., the wave equation)
- Elliptic (steady state, e.g., Poisson's equation)
- Parabolic (often time dependent, e.g., the diffusion equation)

(nonlinear when the coefficients depend on φ)
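As a quick sketch of this classification (the coefficient assignments in the comments are illustrative, not from the slides):

    # Minimal sketch: classify a 2nd-order PDE by the sign of the
    # discriminant B^2 - AC of its principal part.
    def classify(A, B, C):
        disc = B * B - A * C
        if disc > 0:
            return "hyperbolic"
        if disc < 0:
            return "elliptic"
        return "parabolic"

    print(classify(-1.0, 0.0, 1.0))  # wave eq. phi_tt - phi_xx = 0 (y -> t): A=-1, C=1
    print(classify(1.0, 0.0, 1.0))   # Laplace eq. phi_xx + phi_yy = 0: A=C=1, B=0
    print(classify(-1.0, 0.0, 0.0))  # heat eq. phi_t - phi_xx = 0: A=-1, B=C=0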
Hyperbolic Problems
Hyperbolic problems:

    det \begin{pmatrix} A & B \\ B & C \end{pmatrix} = AC - B² < 0

Examples: the wave equation
- Explicit time steps
- The solution at a point depends on neighboring points at the prior time step
Elliptic Problems
Elliptic problems: AC - B² > 0. Examples: the Laplace equation, Poisson's equation
- Steady-state problems
- The solution at each point depends on all other points
- Locality is harder to exploit than in hyperbolic problems
Parabolic Problems
Parabolic problems: AC - B² = 0. Examples: the heat equation, the diffusion equation
- Typically needs an elliptic solve at each time step
There are many forms of discretization that can be used on these systems:
- Finite elements
- Finite differences
- Adaptive combinations of the above

For our purposes, though, we can illustrate many of the basic ideas using a simple one-dimensional example on a regular mesh.
Our model problem is the two-point boundary value problem -d²φ/dx² = f(x) on [0, 1], with φ'(0) = 0 and φ(1) = 0. The plus side of this example is that we have an integral general solution:

    φ(x) = ∫_x^1 dt ∫_0^t f(s) ds
Mesh Approach
Now we apply a regular mesh, i.e. a uniform one defined by N points uniformly distributed over the domain: x_i = (i - 1)Δx, Δx = h = 1/N.
Using a mid-point formula for the derivative,

    h dφ/dx(x) ≈ φ(x + h/2) - φ(x - h/2),

applying this central difference formula twice yields

    -φ(x - h) + 2φ(x) - φ(x + h) = -h² d²φ/dx²(x),

or, in another form,

    -φ_{n-1} + 2φ_n - φ_{n+1} = -h² φ''_n,
So now our original boundary value problem can be expressed in finite differences as

    -φ_{n-1} + 2φ_n - φ_{n+1} = h² f_n,

and this is now easily reduced to a matrix problem, of course. But before we go there, we can generalize the above to use an arbitrary mesh, rather than a uniform one ...
Let h_n = x_n - x_{n-1}, and scale by (h_{n+1} + h_n)/2:

    (h_{n+1} + h_n)/2 f(x_n) = -(φ_{n+1} - φ_n)/h_{n+1} + (φ_n - φ_{n-1})/h_n
                             = (1/h_{n+1} + 1/h_n) φ_n - φ_{n+1}/h_{n+1} - φ_{n-1}/h_n.   (1)
Eq. 1 defines a symmetric tridiagonal matrix, which can be shown to be positive definite. The i-th row (1 < i < N) has entries:

    a_{i,i-1} = -1/h_i,   a_{i,i} = 1/h_i + 1/h_{i+1},   a_{i,i+1} = -1/h_{i+1}
and the resulting matrix equation looks like Aφ = F, so we can subject it to our full arsenal of matrix solution methods.
An equivalent finite element (Galerkin) formulation weights the equation by basis functions u_i(x),

    -∫_0^1 φ''(x) u_i(x) dx = ∫_0^1 f(x) u_i(x) dx,

and uses an approximation scheme for the general integral, e.g. the trapezoidal rule:

    ∫_{x_i}^{x_{i+1}} f(x) dx ≈ (Δx/2) (f(x_i) + f(x_{i+1})).
Matrix Factorization
This formulation reduces to the same form as our finite difference one, and can also be recast for an arbitrary mesh. Regardless of whether we use finite elements or finite differences, our simple 1D example reduces to the simple tridiagonal form Aφ = F, where ...
the tridiagonal matrix A is given by:

    A = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{pmatrix}

and F_1 = h² f(x_1)/2, F_i = h² f(x_i).
Direct Solution
The tridiagonal matrix A can be factored very simply into a product of lower (L) and upper (L^T) bidiagonal matrices:

    L = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}
    L^T = \begin{pmatrix} 1 & -1 & & \\ & 1 & \ddots & \\ & & \ddots & -1 \\ & & & 1 \end{pmatrix}
So this starts to look quite familiar: our old friend, LU decomposition. We introduce the auxiliary vector Y, which is found by forward substitution from

    LY = F,

and then the solution is given by the backward solve

    L^T φ = Y,
The pseudo-code for this forward and backward solution is given by the following:
    y(1) = f(1)
    do i=2,N
       y(i) = y(i-1) + f(i)      ! forward substitution: L y = F
    end do
    phi(N) = y(N)
    do i=N-1,1,-1
       phi(i) = phi(i+1) + y(i)  ! backward substitution: L^T phi = y
    end do
Let us consider the solution of a general tridiagonal system given by:

    A = \begin{pmatrix} a_1 & b_1 & & \\ c_1 & a_2 & b_2 & \\ & c_2 & a_3 & b_3 \\ & & \ddots & \ddots \end{pmatrix}

which we factor: A = LU
and the factors L and U are lower and upper bidiagonal matrices, respectively,

    L = \begin{pmatrix} 1 & & & \\ l_1 & 1 & & \\ & l_2 & 1 & \\ & & \ddots & \ddots \end{pmatrix},
    U = \begin{pmatrix} d_1 & e_1 & & \\ & d_2 & e_2 & \\ & & d_3 & \ddots \\ & & & \ddots \end{pmatrix}
Matching up terms in A_{ij} = (LU)_{ij} yields the following three equations:

    c_{i-1} = l_{i-1} d_{i-1}     (i, i-1)   (2)
    a_i = l_{i-1} e_{i-1} + d_i   (i, i)     (3)
    b_i = e_i                     (i, i+1)   (4)
From Eq. 2 we obtain l_{i-1} = c_{i-1}/d_{i-1}, and substituting into Eq. 3 yields a recurrence relation for d_i:

    d_i = a_i - c_{i-1} e_{i-1}/d_{i-1} = a_i + f_i/d_{i-1},

where f_i = -c_{i-1} e_{i-1}.
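As a concrete (serial) illustration of Eqs. 2-4, here is a minimal Python sketch of this factorization and the accompanying solve (array names and the test problem are assumptions):

    import numpy as np

    def tridiag_lu_solve(a, b, c, rhs):
        # factor A = LU for tridiagonal A (diagonal a, superdiagonal b,
        # subdiagonal c) via d_i = a_i - c_{i-1} e_{i-1} / d_{i-1},
        # then solve A x = rhs by forward/backward substitution
        n = len(a)
        d = np.empty(n)        # diagonal of U (note e_i = b_i)
        l = np.empty(n - 1)    # subdiagonal of L
        d[0] = a[0]
        for i in range(1, n):
            l[i - 1] = c[i - 1] / d[i - 1]
            d[i] = a[i] - l[i - 1] * b[i - 1]
        y = np.empty(n)        # forward substitution: L y = rhs
        y[0] = rhs[0]
        for i in range(1, n):
            y[i] = rhs[i] - l[i - 1] * y[i - 1]
        x = np.empty(n)        # backward substitution: U x = y
        x[-1] = y[-1] / d[-1]
        for i in range(n - 2, -1, -1):
            x[i] = (y[i] - b[i] * x[i + 1]) / d[i]
        return x

    # example: [[2,-1],[-1,2]] x = [1,1] has solution [1,1]
    print(tridiag_lu_solve(np.array([2.0, 2.0]), np.array([-1.0]),
                           np.array([-1.0]), np.array([1.0, 1.0])))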
The parallel prefix method is then used for solving the recurrence relation for the d_i by defining d_i = p_i/q_i, so that

    d_i = a_i + f_i q_{i-1}/p_{i-1},

and expressing this in matrix form:

    (p_i, q_i)^T = M_i (p_{i-1}, q_{i-1})^T,   M_i = \begin{pmatrix} a_i & f_i \\ 1 & 0 \end{pmatrix},

    (p_i, q_i)^T = M_i M_{i-1} ... M_1 (p_0, q_0)^T,   (5)

where f_1 = 0, p_0 = 1, and q_0 = 0. In other words, we can write the recurrence relation as the product of a sequence of 2×2 matrix multiplications.
Parallel Prefix

The parallel prefix method is then used (in parallel) to quickly evaluate the product in Eq. 5, by using a binary tree structure to evaluate the individual products. The following figure illustrates this process (albeit for only eight terms):

[Figure: binary tree combining M_1 ... M_8 pairwise: M_1 M_2, M_3 M_4, M_5 M_6, M_7 M_8 at the first level; then M_1 ... M_4 and M_5 ... M_8; then the full product M_1 ... M_8.]

Note that this procedure is quite generally valid for any associative operation (i.e. (A ∘ B) ∘ C = A ∘ (B ∘ C)), and can be done in only O(log_2 N) steps.
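A minimal serial sketch of the idea (matrix values illustrative): an inclusive scan over the 2×2 matrices yields all of the prefix products, and every product within a sweep is independent of the others, which is where the parallelism lives.

    import numpy as np

    def prefix_products(Ms):
        # all prefix products P_i = M_i @ ... @ M_1 in ceil(log2 N) sweeps;
        # the updates inside each sweep are mutually independent
        P, n, step = list(Ms), len(Ms), 1
        while step < n:
            P = [P[i] if i < step else P[i] @ P[i - step] for i in range(n)]
            step *= 2
        return P

    a = [4.0, 4.0, 4.0, 4.0]             # illustrative diagonal entries
    f = [0.0, -1.0, -1.0, -1.0]          # f_i = -c_{i-1} e_{i-1}, f_1 = 0
    Ms = [np.array([[a[i], f[i]], [1.0, 0.0]]) for i in range(4)]
    for i, P in enumerate(prefix_products(Ms)):
        p, q = P @ np.array([1.0, 0.0])  # (p_0, q_0) = (1, 0)
        print(f"d_{i+1} = {p/q:.6f}")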
Parallel Efficiency

The time taken for this parallel method is then given by:

    τ(P) ≈ 2c_1 n/P + 2c_2 log_2 P,

where c_1 is the time taken for one flop, and c_2 that for one step of parallel prefix using 2×2 matrices. The parallel efficiency is then just

    E(P) ≈ 1/(2 + 2β P log_2 P / n),

where β = c_2/c_1; the factor-of-two redundant work caps the efficiency, so E(P) → 1/2 when n ≫ P log_2 P.
Back to Jacobi
Recapitulate
Recall that, in solving the matrix equation Ax = b, Jacobi's iterative method consists of repeatedly updating the values of x based on their values in the previous iteration:

    x_i^{(k)} = (1/A_{i,i}) [ b_i - Σ_{j≠i} A_{i,j} x_j^{(k-1)} ],
We will come back to the convergence properties of this relation a bit later. Now, however, we have a matrix A that has been determined by our choice of a simple stencil for the derivatives in our PDE; it is very sparse, making for a large reduction in the workload.
Iterative Techniques
Now we consider, for our simple 1D boundary value problem used in the last section, iterative methods for solution. The simplest scheme is Jacobi iteration. For the general mesh in 1D,

    φ_n^{k+1} = [h_n φ_{n+1}^k + h_{n+1} φ_{n-1}^k]/(h_{n+1} + h_n) + (h_{n+1} h_n/2) f(x_n),

whereas for our even simpler uniform mesh:

    φ_1^{k+1} = φ_2^k + (h²/2) f(x_1),
    φ_n^{k+1} = [φ_{n+1}^k + φ_{n-1}^k + h² f(x_n)]/2,   n ∈ [2, N].

Recall that x_{N+1} = 1, and φ(1) = 0, so φ_{N+1} = 0.
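A minimal NumPy sketch of this uniform-mesh Jacobi sweep (mesh size and forcing are illustrative assumptions):

    import numpy as np

    N = 64
    h = 1.0 / N
    x = (np.arange(1, N + 2) - 1) * h   # x_1 ... x_{N+1}, with x_{N+1} = 1
    f = np.sin(np.pi * x)               # illustrative forcing
    phi = np.zeros(N + 1)               # phi[N] holds phi_{N+1} = 0, fixed

    for k in range(20000):
        new = phi.copy()
        new[0] = phi[1] + 0.5 * h * h * f[0]   # Neumann end: phi'(0) = 0
        new[1:N] = 0.5 * (phi[2:N + 1] + phi[0:N - 1] + h * h * f[1:N])
        if np.max(np.abs(new - phi)) < 1e-10:  # simple convergence test
            break
        phi = new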
First, in the classical Jacobi method, each iteration involves a matrix multiplication plus a vector addition. As long as the solution vectors φ^k and φ^{k+1} are kept distinct, there is a high degree of data parallelism (i.e. well suited to OpenMP or a similar shared-memory API). Each iteration takes O(N) work, but typically O(N² log N) iterations are required to obtain the same accuracy as the original discrete equations (in fact we can solve such a tridiagonal system in O(N) work in the serial case).
Gauss-Seidel
The Gauss-Seidel method is a variation on Jacobi that makes use of the updated solution values as they become available, i.e.

    φ_1^{k+1} = φ_2^k + (h²/2) f(x_1),
    φ_n^{k+1} = [φ_{n+1}^k + φ_{n-1}^{k+1} + h² f(x_n)]/2,   n ∈ [2, N]
(where n increases as we proceed). Gauss-Seidel has its advantages: it converges faster than Jacobi. But we have introduced some rather severe data dependencies at the same time.
Parallel Gauss-Seidel
Gauss-Seidel is made more difficult to parallelize by the data dependencies introduced by the use of updated solution values at prior grid points. We can break the dependencies by introducing tiles of size r, with N = rP:

    φ_1^{k+1} = φ_2^k + (h²/2) f(x_1),
    φ_{ri+1}^{k+1} = [φ_{ri+2}^k + φ_{ri}^k + h² f(x_{ri+1})]/2,
    φ_{ri+n}^{k+1} = [φ_{ri+n+1}^k + φ_{ri+n-1}^{k+1} + h² f(x_{ri+n})]/2,   n ∈ [2, r],   i ∈ [0, P-1],

i.e. each tile starts from old data at its left edge and then proceeds with Gauss-Seidel internally. In this way we get P parallel tasks' worth of work. Pseudo-code for an MPI-based version of this approach follows ...
    do iter=1,N_iterations
       if (myID == 0)         phi(0) = phi(1)        ! lower BC: phi'(0) = 0
       if (myID == N_procs-1) phi(1+N/N_procs) = 0   ! upper BC
       do i=1,N/N_procs
          phi(i) = 0.5*( phi(i+1) + phi(i-1) + f(i) )   ! note h = 1
       end do
       SENDRECV( phi(0)           to/from myID-1 )
       SENDRECV( phi(1+N/N_procs) to/from myID+1 )
    end do
In practice this method is never used, though (well, OK: as a smoother in multigrid, sometimes).
Reordering Gauss-Seidel
So what is done, in practice, for Gauss-Seidel is to reorder the equations, for example using odd/even combinations:

    φ_1^{k+1} = φ_2^k + (h²/2) f(x_1),
    φ_{2n-1}^{k+1} = [φ_{2n-2}^k + φ_{2n}^k + h² f(x_{2n-1})]/2,
    φ_{2n}^{k+1} = [φ_{2n-1}^{k+1} + φ_{2n+1}^{k+1} + h² f(x_{2n})]/2,
and this looks very much like (it is) a pair of Jacobi updates. This is referred to as coloring the index sets, and the above is an example of the red-black coloring scheme (it looks like a chessboard). Generally you can use more than two colors, albeit with increasing complexity.
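A NumPy sketch of the red-black sweeps (mesh size and forcing again illustrative); each color's updates are mutually independent, which is exactly what vectorization, or a parallel loop, exploits:

    import numpy as np

    N = 64
    h = 1.0 / N
    x = (np.arange(1, N + 2) - 1) * h
    f = np.sin(np.pi * x)           # illustrative forcing
    phi = np.zeros(N + 1)           # phi[N] holds phi_{N+1} = 0, fixed

    for k in range(10000):
        phi[0] = phi[1] + 0.5 * h * h * f[0]
        # "red" sweep (odd-numbered interior phi): all independent
        phi[2:N:2] = 0.5 * (phi[1:N-1:2] + phi[3:N+1:2] + h * h * f[2:N:2])
        # "black" sweep (even-numbered phi): uses the fresh red values
        phi[1:N:2] = 0.5 * (phi[0:N-1:2] + phi[2:N+1:2] + h * h * f[1:N:2])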
The red-black scheme is illustrated in the following examples (this is in 2D, where it looks a bit prettier). Gray squares have been updated; the left figure is simple Gauss-Seidel, while the right is classic red-black. Note the 5-point stencil illustrating the data dependence at a particular grid point.
Parallel Example
Updating our example: a two-color scheme is used, but in a slightly unorthodox way, in which the values phi(k) are one color, and the remaining values phi(1) ... phi(k-1) the other. This is done to facilitate message-passing parallelism, and to better overlap computation with communication.
    k = N/P
    tag_left = 1
    if (myID < Nprocs-1) SEND( phi(k) to myID+1, tag_left )
    do iter=1,Niters
       if (myID > 0)  RECV( phi(0) from myID-1, tag_left )
       if (myID == 0) phi(0) = phi(1)
       phi(1) = 0.5*( phi(0) + phi(2) + f(1) )      ! note h = 1
       tag_right = 2*iter
       if (myID > 0)  SEND( phi(1) to myID-1, tag_right )
       do i=2,k-1
          phi(i) = 0.5*( phi(i-1) + phi(i+1) + f(i) )
       end do
       if (myID < Nprocs-1)  RECV( phi(k+1) from myID+1, tag_right )
       if (myID == Nprocs-1) phi(k+1) = 0
       phi(k) = 0.5*( phi(k+1) + phi(k-1) + f(k) )
       tag_left = 2*iter+1
       if (myID < Nprocs-1)  SEND( phi(k) to myID+1, tag_left )
    end do
Multigrid Methods

Multigrid Basics
Multigrid is easy to conceptualize, but considerably more tedious in its details. In a nutshell:
- Jacobi or Gauss-Seidel forms the basic building block
- Iterations are halted while the problem is transferred to a coarser mesh
- The scheme is applied recursively
Multigrid, in a Nutshell
For simplicity, consider once again our linear elliptic problem, Lφ = f; if we discretize using a uniform mesh length h,

    L_h φ_h = f_h.

To this point we have addressed several methods of solving for the exact solution to the discretized equations (directly solving the matrix equations, as well as relaxation methods like Jacobi/Gauss-Seidel).
Now, let φ̃_h be an approximate solution to the discretized equation, and v_h = φ_h - φ̃_h. Then we can define a defect (or residual) as

    d_h = L_h φ̃_h - f_h,

using which our error can be written as

    L_h v_h = L_h φ_h - L_h φ̃_h = -d_h
We have been finding v_h by approximating L_h with a simpler operator L̂_h:
- Jacobi: L̂_h = the diagonal part of L_h
- Gauss-Seidel: L̂_h = the lower triangular part of L_h:

    φ_i = [ f_i - Σ_{j=1, j≠i}^N L_{ij} φ_j ] / L_{ii}
Using this two-grid approximation,

    L_H v_H = -d_H,

we need to handle both the fine->coarse transition (sometimes called restriction or injection),

    d_H = R d_h,

and the coarse->fine transition (prolongation or interpolation),

    v_h = P v_H,

and our new solution is just φ̃_h^new = φ̃_h + v_h.
In summary, our two-grid cycle looks like:
- Compute the defect on the fine grid: d_h = L_h φ̃_h - f_h
- Restrict the defect: d_H = R d_h
- Solve on the coarse grid for the correction: L_H v_H = -d_H
- Interpolate the correction to the fine grid: v_h = P v_H
- Compute the next approximation: φ̃_h^new = φ̃_h + v_h
The next level of refinement is that we smooth out the higher-frequency errors, for which relaxation schemes (Jacobi/Gauss-Seidel) are well suited. In the context of our two-grid scheme, we pre-smooth on the initial fine grid, and post-smooth the corrected solution.
Putting the pieces together, the next idea is to develop a full multigrid method by applying the scheme recursively, down to some coarse grid that is trivially simple to solve (by either direct or relaxation methods). This is known as a cycle (a sketch of the recursion follows below), but let us first develop a bit more mathematical formalism.
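To make the recursion concrete, here is a minimal Python sketch (with homogeneous Dirichlet conditions at both ends for simplicity, rather than the mixed conditions of our model problem; the smoother, transfer operators, and sweep counts are illustrative choices):

    import numpy as np

    def smooth(phi, f, h, sweeps=3, w=2.0/3.0):
        # damped-Jacobi sweeps for -phi'' = f with phi = 0 at both ends
        for _ in range(sweeps):
            phi[1:-1] += w * (0.5*(phi[:-2] + phi[2:] + h*h*f[1:-1]) - phi[1:-1])
        return phi

    def v_cycle(phi, f, h):
        n = len(phi)                  # n = 2^m + 1 points, boundaries included
        if n == 3:                    # coarsest grid: one unknown, solve exactly
            phi[1] = 0.5*(phi[0] + phi[2] + h*h*f[1])
            return phi
        phi = smooth(phi, f, h)       # pre-smooth
        r = np.zeros(n)               # residual r = f - A phi
        r[1:-1] = f[1:-1] + (phi[:-2] - 2.0*phi[1:-1] + phi[2:])/(h*h)
        nc = (n - 1)//2 + 1
        rc = np.zeros(nc)             # fine -> coarse (full weighting)
        rc[1:-1] = 0.25*(r[1:-2:2] + 2.0*r[2:-1:2] + r[3::2])
        ec = v_cycle(np.zeros(nc), rc, 2.0*h)   # recurse for the correction
        e = np.zeros(n)               # coarse -> fine (linear interpolation)
        e[::2] = ec
        e[1::2] = 0.5*(ec[:-1] + ec[1:])
        phi += e
        return smooth(phi, f, h)      # post-smooth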
Multigrid Notation
Some definitions are in order:
- Let K > 1 define the multigrid levels
- The level-zero grid is our starting coarse grid:

    x_i = (i - 1)Δx,   Δx = 1/N,   1 ≤ i ≤ N + 1

- The next level, the level-one grid, comes from subdividing level zero:

    x_{N+2i} = x_i,   i ∈ [1, N + 1]
    x_{N+2i+1} = (x_i + x_{i+1})/2,   i ∈ [1, N]
For the level-k grid (ℓ_k is the number of points in level k, ℓ_0 = N + 1, and N_k is the number of points in all previous levels, N_0 = 0, N_1 = N + 1):

    x_{N_k+2i-1} = x_{N_{k-1}+i},   i ∈ [1, ℓ_{k-1}]
    x_{N_k+2i} = (x_{N_{k-1}+i} + x_{N_{k-1}+i+1})/2,   i ∈ [1, ℓ_{k-1} - 1]

and induction leads to

    ℓ_k = 2ℓ_{k-1} - 1,   N_{k+1} = N_k + ℓ_k
Multigrid Solution
Let φ_{N_{k-1}+1}, ..., φ_{N_k} denote the solution values at level k - 1. These values define a piecewise linear function on the mesh for the (k-1)-th level,

    φ^{k-1}(x) = Σ_{i=1}^{ℓ_{k-1}} φ_{N_{k-1}+i} u_i^{k-1}(x),

where the u_i^{k-1}(x) basis functions are defined as in the finite element case: unity only at the i-th node of level k - 1.
Since the grids are nested, φ^{k-1}(x) is also piecewise linear at level k, for which we can interpolate:

    φ_{N_k+2i-1} = φ_{N_{k-1}+i},   i ∈ [1, ℓ_{k-1}]
    φ_{N_k+2i} = (φ_{N_{k-1}+i} + φ_{N_{k-1}+i+1})/2,   i ∈ [1, ℓ_{k-1} - 1]

If φ_{N_k} = 0, then φ_{N_{k+1}} = 0. It is also necessary to coarsen, i.e. to transfer a solution to a coarser grid. Inverting the coarse-fine refinement is a reasonable choice:

    φ_{N_{k-1}+i} = (1/2) φ_{N_k+2i-1} + (1/4) (φ_{N_k+2i-2} + φ_{N_k+2i}),   i ∈ [1, ℓ_{k-1}]
V-Cycle
Applying Jacobi/Gauss-Seidel
The Jacobi or Gauss-Seidel method can be applied on any level of multigrid. Previously we had (for Gauss-Seidel):

    φ_1^{k+1} = φ_2^k + (h²/2) f(x_1),
    φ_n^{k+1} = [φ_{n+1}^k + φ_{n-1}^{k+1} + h² f_n]/2,   n ∈ [2, N]
The Gauss-Seidel/Jacobi iterations are not used here for the sake of convergence, but rather for smoothing, in the sense of smoothing out the error between the approximate and full solutions.
For the k-th level iteration we can write the solution using the shorthand notation (note this is the converged solution)

    A^k φ^k = F^k;

after doing, say, m_k smoothing steps we write the residual error as R^k = F^k - A^k φ^k, where

    R_{N_k+1} = F_{N_k+1} - φ_{N_k+1} + φ_{N_k+2},
    R_{N_k+n} = F_{N_k+n} - 2φ_{N_k+n} + φ_{N_k+n+1} + φ_{N_k+n-1},   n ∈ [2, ℓ_k]
The error E = φ - φ̃, where φ is the exact solution of Aφ = F, satisfies the residual equation,

    A^k E^k = R^k.

The important result is φ^k = φ̃^k + E^k. The recursive part comes in solving the residual equation, which is done on a coarser mesh:

    R_{N_{k-1}+i} = (1/2) R_{N_k+2i-1} + (1/4) (R_{N_k+2i-2} + R_{N_k+2i}),   i ∈ [1, ℓ_{k-1}]
Now, see why it is called the V-cycle? You go down the grids (fine-to-coarse), then back up again (the residual corrections are added back using coarse-to-fine steps), resulting in a classic V if you plot the levels against execution time.

[Figure: grid level (5 down to 1 and back up) versus execution time, tracing out a V shape.]
Full Multigrid
- Repeating the V-cycle on a given level will converge to O(Δx²) in O(|log Δx|) iterations
- Now that we have all of the pieces, we can outline the full multigrid V-cycle:
- Solve A^0 φ^0 = F^0 at level 0, φ^0 = (φ_1, ..., φ_{N+1}); φ^1 is initially approximated by a coarse-to-fine transfer of φ^0
- Level one proceeds (smoothing, solving the level-zero residual equation, coarse-to-fine interpolation again, more smoothing) r times
- Level two proceeds as in level one; by doing r repetitions at each level, we get the Full Multigrid V-cycle
[Figure: the full multigrid schedule, where S denotes smoothing and E the exact solution on the coarsest grid (illustrated for a 4-level, 2-cycle schedule, i.e. K = 4, r = 2).]
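Building on the v_cycle sketch above (same illustrative all-Dirichlet model problem), here is a minimal reading of this schedule, with r V-cycles per level:

    import numpy as np   # assumes smooth() and v_cycle() from the earlier sketch

    def full_multigrid(f_fine, h_fine, r=2):
        # build right-hand sides for the grid hierarchy by full weighting
        fs, hs = [f_fine], [h_fine]
        while len(fs[-1]) > 3:
            f = fs[-1]
            fc = np.zeros((len(f) - 1)//2 + 1)
            fc[1:-1] = 0.25*(f[1:-2:2] + 2.0*f[2:-1:2] + f[3::2])
            fs.append(fc)
            hs.append(2.0*hs[-1])
        phi = np.zeros(3)                       # exact solve on coarsest grid
        phi[1] = 0.5 * hs[-1]**2 * fs[-1][1]
        for lev in range(len(fs) - 2, -1, -1):  # work back up to the fine grid
            fine = np.zeros(len(fs[lev]))
            fine[::2] = phi                     # coarse -> fine interpolation
            fine[1::2] = 0.5*(phi[:-1] + phi[1:])
            phi = fine
            for _ in range(r):                  # r V-cycles at each level
                phi = v_cycle(phi, fs[lev], hs[lev])
        return phi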
Without proof: it can be shown that full multigrid converges to O(Δx²) in O(1) iterations. For large enough r, each k-th level iteration will solve to an accuracy of O(Δx_k²). In practice, r = 10 is typically sufficient.
Multigrid in Parallel
The good news is that multigrid parallelizes easily, in basically the same way as Jacobi/Gauss-Seidel. We have basically two parts:

1. Smoothing, which is indeed Jacobi/Gauss-Seidel, and can be parallelized in exactly the same fashion, with little modification
2. Inter-grid transfers (refinement and un-refinement is one way of looking at these interpolations); these are relatively trivial to deal with, as only elements at the ends of the arrays need to be transferred
Multigrid Scaling
Now let's examine the scaling behavior of multigrid. For the basic V-cycle, we estimate the time as

    τ_V ≈ Σ_{k=0}^{K-1} c_1 N 2^{-k}/P,

where c_1 is the time spent doing basic floating-point operations during smoothing. Communication is involved just in exchanging data at the ends of each interval at each step in the V-cycle,

    τ_comm ≈ τ_lat K,
Solving on the coarse grid at level K, for which there are N 2^{-K} unknowns, requires collecting the right-hand sides at each processor, which is proportional to N 2^{-K}. Calling the constant of proportionality c_2,

    τ_K ≈ c_2 N 2^{-K}
If we equate the first and last terms, P = 2^{K+1}/β, where β = c_2/c_1. In that case,

    τ(P) ≈ 4c_1 N/P + τ_lat log_2(βP/2),

and the efficiency is

    E^{-1}(P) ≈ 4 + (τ_lat/c_1) P log_2(βP/2)/N,   i.e.   E(P) ≈ [4 + (τ_lat/c_1) P log_2(βP/2)/N]^{-1},
which implies (finally!) that full multigrid is scalable if P log_2 P ∝ N, with

    E(P) ≈ 1/(4 + τ_lat/c_1)
MG Illustrated
Flow through a square duct with a 90° bend. LR = line relaxation (Gauss-Seidel), PR = point relaxation (Gauss-Seidel), DADI = (diagonalized) alternating direction implicit, SG/MG = single-grid/multigrid. Work units normalized to DADI-SG iterations.
L. Yuan, "Comparison of Implicit Multigrid Schemes for Three-Dimensional Incompressible Flows," J. Comp. Phys. 177, 134-155 (2002). M. D. Jones, Ph.D. (CCR/UB) Partial Differential Equations in HPC, I HPC-I Fall 2012 65 / 84
Multidimensional Problems
Yes, in fact, there are problems requiring more than one dimension ... However, what we have done thus far in treating a one-dimensional elliptic problem with mesh methods is easily transferable to 2D and 3D. Let's look at our elliptic problem in 2D ...
2D Example
Our 2D elliptic problem is -∇²φ = f(x, y) on the unit square, and for simplicity we take strictly Dirichlet boundary conditions, φ = 0 on the boundary. Again define a regular mesh, x_i = (i - 1)Δx, y_j = (j - 1)Δy; again for simplicity we take N mesh points in each direction. Using finite differences or finite elements (and we could easily extend to an arbitrary mesh) results in a simple system of linear equations ...
For a regular tensor-product mesh (a tensor product of one-dimensional bases),

    A = (1/h²) \begin{pmatrix} B_N & -I_N & & \\ -I_N & B_N & -I_N & \\ & \ddots & \ddots & \ddots \\ & & -I_N & B_N \end{pmatrix}

where I_N is just the N×N identity matrix, and ...
    B_N = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 4 \end{pmatrix}

and note that the numerical entries in B_N are for 2D, using two-point central differences.
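A sketch of assembling this operator with sparse Kronecker products (SciPy assumed; N is illustrative); the 1D building block T is the (-1, 2, -1) tridiagonal matrix:

    import numpy as np
    import scipy.sparse as sp

    N = 4                                  # illustrative interior grid size
    h = 1.0 / (N + 1)
    # 1D second-difference operator (Dirichlet): tridiagonal (-1, 2, -1)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N))
    I = sp.identity(N)
    # 2D 5-point Laplacian: diagonal blocks B_N = T + 2I (diagonal 4),
    # off-diagonal blocks -I_N, all scaled by 1/h^2
    A = (sp.kron(I, T) + sp.kron(T, I)) / h**2
    print(A.toarray()[:N, :N] * h**2)      # first diagonal block: B_N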
A is an N²×N² matrix (in 2D). The bandwidth is N, meaning that we will need more sophisticated sparse matrix algorithms than we have used thus far (which have largely been limited to tridiagonal systems). Iterative techniques are easily adapted to the multidimensional case, regardless of the index ordering. Multigrid in particular can be applied directly.
Initial Value Problems: Basics
Initial value problems can be a bit more difficult to handle with scalable parallel algorithms. In a similar vein to our simple example used to illustrate boundary value problems, we can illustrate initial value problems using a simple physical problem, the massless pendulum, which obeys

    d²u/dt² = f(u),

with initial conditions u(0) = a_0, u'(0) = a_1. In this context u is the angle that the pendulum makes with respect to the force of gravity, and f(u) = -mg sin(u).
Applying our usual central difference scheme, using a two-point formula, yields:

    u_{n+1} = 2u_n - u_{n-1} + κ f(u_n),

where κ = Δt². We can make this a linear problem by taking the small angle approximation

    sin(u) ≈ u + ...,   u ≪ 1
Linear Form
In the linear problem (small oscillations, in the case of our pendulum),

    d²u/dt² = -mgu,

and the difference equations become a simple recursion relation,

    u_{n+1} = (2 - κmg) u_n - u_{n-1},

with starting values

    u_0 = a_0,   u_1 = a_0 + a_1 Δt,
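A direct sketch of marching this recursion (all parameter values are illustrative assumptions; the slide's f(u) = -mg sin(u) notation is kept for the linearized force):

    import numpy as np

    m, g = 1.0, 9.81
    dt = 1.0e-3
    kappa = dt * dt                     # kappa = dt^2
    a0, a1 = 0.1, 0.0                   # initial angle and angular velocity
    n_steps = 5000

    u = np.empty(n_steps)
    u[0] = a0
    u[1] = a0 + a1 * dt                 # simple starting value
    for n in range(1, n_steps - 1):     # u_{n+1} = (2 - kappa m g) u_n - u_{n-1}
        u[n + 1] = (2.0 - kappa * m * g) * u[n] - u[n - 1]

    # compare against the exact small-angle solution a0 cos(sqrt(mg) t)
    t = dt * np.arange(n_steps)
    print(np.max(np.abs(u - a0 * np.cos(np.sqrt(m * g) * t))))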
Matrix/Linear Form
We can write the linear form as a very simple matrix-vector equation, LU = b, or

    \begin{pmatrix} 1 & & & & \\ T & 1 & & & \\ 1 & T & 1 & & \\ & 1 & T & 1 & \\ & & \ddots & \ddots & \ddots \end{pmatrix}
    \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ \vdots \end{pmatrix}
    =
    \begin{pmatrix} (1 - κmg) a_0 + a_1 Δt \\ -a_0 \\ 0 \\ 0 \\ \vdots \end{pmatrix}

where T = (κmg - 2). For this set of linear equations we can bring our usual bag of tricks in terms of scalable parallel algorithms.
Nonlinearity
If we do not make the linear approximation in our simple example, we do not have such an easy (or at least well-traveled) path to follow. Using a similar matrix-vector notation,

    F(U) = L_0 U + κmg Φ(U) - b = 0,   (6)

where L_0 is of a similar form as before, but with the κmg terms removed (i.e. those matrix elements are just -2). The Φ(U) term can also be written as a matrix,

    Φ(U)_{ij} = δ_{i-1,j} sin(u_{i-1}),

where δ_{ij} is the Kronecker delta, and this term contains all of our nonlinearities.
Newton-Raphson Method
The Newton-Raphson method is one of the most powerful in terms of solving nonlinear problems. Basically we want to solve Eq. 6, F(U) = 0, using a root-finding method, i.e. find where F(U) crosses the U axis. So for a given guess U^n we drop a tangent line to F and find where that line crosses the axis:

    [0 - F(U^n)] / [U^{n+1} - U^n] = F'(U^n),

or

    U^{n+1} = U^n - F(U^n)/F'(U^n).
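A minimal sketch of the scalar iteration (the example function is illustrative, not the pendulum system):

    import math

    def newton(F, dF, u, tol=1e-12, max_iter=50):
        for _ in range(max_iter):
            du = F(u) / dF(u)        # tangent-line step
            u -= du
            if abs(du) < tol:        # converges quadratically near a root
                break
        return u

    # example: root of F(u) = u - cos(u), near u ~ 0.739
    print(newton(lambda u: u - math.cos(u), lambda u: 1.0 + math.sin(u), 1.0))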
Convergence of Newton-Raphson
Recall that Newton-Raphson has some very nice convergence properties, provided that you start close to a solution:

    |U^{n+1} - U^n| ≤ C |U^n - U^{n-1}|²
Multidimensional Newton-Raphson
Extending the previous relations to multiple dimensions is straightforward: we replace F'(U^n) by the Jacobian matrix of all partial derivatives of F with respect to the vector U,

    J_F(U)_{ij} = ∂F_i(U)/∂u_j,

and then solve

    J_F(U^n) V = -F(U^n),   U^{n+1} = U^n + V,

and repeat until convergence. In our simple pendulum case, we have J_Φ(U)_{ij} = δ_{i-1,j} cos(u_{i-1}).
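A minimal sketch of the multidimensional iteration (a dense solve stands in for whatever, possibly parallel, linear solver is appropriate; the 2×2 example system is illustrative):

    import numpy as np

    def newton_system(F, J, U, tol=1e-10, max_iter=50):
        for _ in range(max_iter):
            V = np.linalg.solve(J(U), -F(U))   # J_F(U^n) V = -F(U^n)
            U = U + V
            if np.linalg.norm(V) < tol:
                break
        return U

    # example: x^2 + y^2 = 1 and x = y, converging to (1/sqrt(2), 1/sqrt(2))
    F = lambda U: np.array([U[0]**2 + U[1]**2 - 1.0, U[0] - U[1]])
    J = lambda U: np.array([[2.0*U[0], 2.0*U[1]], [1.0, -1.0]])
    print(newton_system(F, J, np.array([1.0, 0.5])))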
Starting Newton-Raphson
So if Newton-Raphson converges quickly given a good starting approximation, how do we ensure that we have a good starting point? Generally there is not a good answer to that question, but one approach is to use a simplified approximation f̃ to the nonlinear function f, and then solve

    d²u/dt² = f̃(u),

along with the original initial conditions. In our pendulum example, we could use f̃(u) = u instead of f(u) = sin(u). This starting point gives us a residual,

    R(u) = d²u/dt² - f(u) = f̃(u) - f(u).
and if we view v as a correction to u, we can write

    d²v/dt² = v f'(u) - R(u),

using a simple Taylor expansion of

    f(u + v) ≈ f(u) + v f'(u) + O(v²).

Or in other terms:

    d²v/dt² = v f'(u) - [f̃(u) - f(u)].

The size of v is zero at the start (thanks to the initial conditions), so its size can be tracked as the calculation of u proceeds, and it provides a way of monitoring the calculation error.
Other Methods
It is worth mentioning other techniques for initial value problems as well:
- Multigrid in the time variable: start coarse, and then improve. See, for example: L. Baffico et al., "Parallel-in-time Molecular-Dynamics Simulations," Phys. Rev. E 66, 057701 (2002)
- Wave-form relaxation for a system of ODEs, using domain decomposition rather than parallelizing over the time domain. See, for example: J. A. McCammon et al., "Ordinary Differential Equations of Molecular Dynamics," Comput. Math. Appl. 28, 319 (1994)
And Then?
Of course, this has been a mere sampling of parallel methods for partial differential equations. Hopefully it has given you a feel for at least the general principles involved. In the next discussion of PDEs, we will focus more on available HPC solution libraries and specialized methods.