
Introduction to Simulation - Lecture 1

Example Problems and Basic Equations


Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Luca Daniel, Shihhsien Kuo and Karen Veroy

Outline
Uses For Simulation
Engineering Design
Virtual Environments
Model Verification

Course Philosophy
Example Problems
Power distribution on an Integrated Circuit
Load bearing on a space frame
Temperature distribution in a package

Circuit Analysis
Equations
Current-voltage relations for circuit elements (resistors, capacitors,
transistors, inductors), current balance equations

Recent Developments
Matrix-Implicit Krylov Subspace methods

Electromagnetic
Analysis of Packages
Equations
Maxwell's Partial Differential Equations

Recent Developments
Fast Solvers for Integral Formulations

Structural Analysis of
Automobiles
Equations
Force-displacement relationships for mechanical elements (plates,
beams, shells) and sum of forces = 0.
Partial Differential Equations of Continuum Mechanics

Recent Developments
Meshless Methods, Iterative methods, Automatic Error Control

Drag Force Analysis of Aircraft
Equations
Navier-Stokes Partial Differential Equations.

Recent Developments
Multigrid Methods for Unstructured Grids

Engine Thermal
Analysis
Equations
The Poisson Partial Differential Equation.

Recent Developments
Fast Integral Equation Solvers, Monte-Carlo Methods

Micromachine Device
Performance Analysis
Equations
Elastomechanics, Electrostatics, Stokes Flow.

Recent Developments
Fast Integral Equation Solvers, Matrix-Implicit Multi-level Newton
Methods for coupled domain problems.

Stock Option Pricing for Hedge Funds

[Figure: option price as a function of stock price and time t.]
Equations
Black-Scholes Partial Differential Equation

Recent Developments
Financial Service Companies are hiring engineers, mathematicians and
physicists.

Virtual Environments
for Computer Games
Equations
Multibody Dynamics, elastic collision equations.

Recent Developments
Multirate integration methods, parallel simulation


Virtual Surgery
Equations
Partial Differential Equations of Elastomechanics

Recent Developments
Parallel Computing, Fast methods

Biomolecule Electrostatic Optimization

[Figure: a ligand (drug molecule) docking with a receptor (protein molecule), with plus and minus surface charges on both, and an ECM protein.]

Equations
The Poisson Partial Differential Equation.

Recent Developments
Matrix-Implicit Iterative Methods, Fast Integral Equation Solvers

The Computer Simulation Scenario

[Flowchart] The problem is too complicated for hand analysis, so one tosses out some terms (a macromodel) and solves a simplified problem. If the result makes no sense: anxiety. If it makes sense, simulate using a canned routine, a friend's advice, or a recipe book. If that works: happiness. If it is way too slow, develop an understanding of computational complexity, leading to a faster method. If it only works sometimes, develop an understanding of convergence issues, leading to a robust method. In the flowchart, the path through that understanding is labeled CLASS and the alternative is labeled DROP; the right algorithms bring happiness, and new algorithms bring fame.

Course Philosophy
Examine several modern techniques.
Understand, practically and theoretically, how the techniques perform on representative, but real, applications.
Why prove theorems?
A theorem guarantees, given its assumptions, that the method will always work.
Theorems can help debug programs.
A theorem's proof can tell you what to do in practice.

Power Distribution for a VLSI Circuit

[Layout and simplified diagram: a 3.3 V power supply feeding the Cache, ALU and Decoder through the main power wires.]

Is there at least 3 V across the ALU?

One application problem which generates large systems of equations is the problem
of distributing power to the various parts of a Very Large Scale Integrated (VLSI)
circuit processor.
The picture on the left of the slide shows a layout for a typical processor, with different functional blocks noted. The processor pictured has nearly a million transistors, and millions of wires which transport signals and power. All one can really see by eye are the larger wires that carry power and the patterns of wires that carry signals for boolean operations such as "and" and "or".
A typical processor can be divided into a number of functional blocks, as diagrammed on the layout on the left. There are caches, which store copies of data and instructions from main memory for faster access. There are execution units which perform boolean and numerical operations on data, such as "and", "or", addition and multiplication. These execution units are often grouped together and referred to as an Arithmetic Logic Unit (ALU). Another main block of the processor is the instruction decoder, which translates instructions fetched from the cache into actions performed by the ALU.
On the right is a vastly simplified diagram of the processor, showing a typical 3.3-volt power supply, the 3 main functional blocks, and the wires (in red) carrying power from the supply to the 3 main functional blocks. The wires, which are part of the integrated circuit, are typically a micron thick, ten microns wide and thousands of microns long (a micron is a millionth of a meter). The resistance of these thin wires is significant, and therefore even though the supply is 3.3 volts, there may not be 3.3 volts across each of the functional blocks.
The main problem we address is whether or not each functional block has sufficient voltage to operate properly.

Load Bearing Space Frame

[Figure: a space frame of beams and joints attached to the ground, holding cargo over a vehicle; the frame droops under load.]

Does the space frame droop too much under load?

In the diagram is a picture of a space frame used to hold cargo (in red) to be lowered into a vehicle. The space frame is made using steel beams (in yellow) that are bolted together at the purple joints. When cargo is hanging off the end of the space frame, the frame droops.
The main problem we will address is how much the space frame droops under load.

Thermal Analysis

Does the engine get too hot?

Above is a picture of an engine block, which is typically solid steel or aluminum. The heat generated by the gas burning in the cylinders must be conducted through the engine block to a wide enough surface area that the heat can be dissipated. If not, the engine block temperature will rise too high and the block will melt.

Design Objectives for the VLSI Problem

[Same power-distribution diagram as above.]

Select topology and metal widths & lengths so that
a) Voltage across every functional block > 3 volts
b) Minimize the area used for the metal wires

Design Objectives for the Space Frame

Select topology and strut widths and lengths so that
a) Droop is small enough
b) Minimize the metal used.

Design Objectives for Thermal Analysis

Select the shape so that
a) The temperature does not get too high
b) Minimize the metal used.

First Step - Analysis Tools

[Diagrams: the space frame with its droop, and the power-distribution circuit.]

Given the topology and metal widths & lengths, determine
a) The voltage across the ALU, Cache and Decoder.
b) The droop of the space frame under load.

Who uses VLSI Tools?

Several big companies
IBM, Motorola, TI, Intel, Compaq, Sony, Hitachi

Non-functional prototype costs
- Increases time-to-market
- Design rework costs millions

Once a VLSI circuit is designed, it is fabricated using a sequence of sophisticated deposition and etching processes which convert a wafer of silicon into millions of
transistors and wires. This processing can take more than a month. If the circuit
does not function, the design flaw must be found and the fabrication process
restarted from the beginning. For this reason, just a few design errors can delay a
product for months. In a competitive market, this delay can cost millions in lost
revenue in addition to the cost of redesigning the circuit.
In order to avoid fabricating designs with flaws, companies make extensive use of
simulation tools to verify design functionality and performance.

Who uses VLSI Tools?

1000s of small companies

Small companies make application circuits: disk drives, graphics accelerators, CD players, cell phones.
What is the cost of non-functional prototypes?
- Out of business.

Thousands of small companies design VLSI circuits for applications as diverse as peripherals for personal computers and signal processors for audio, video and automotive applications. These small companies cannot afford the cost of
fabricating prototype designs that do not function. The very survival of these
companies depends on using simulation tools to verify designs before fabrication.

Who makes VLSI Tools?

Company           Employees   Sales         Market cap.
Cadence           4,000       1.3 billion   3.8 billion
Synopsys/Avanti   5,000       1.5 billion   6.9 billion
Mentor Graphics   2,600       0.6 billion   1.4 billion

Companies compete by improving analysis efficiency.

Modeling VLSI circuit Power Distribution

+
3.3 v

Cache

ALU

Decoder

The power supply provides current at a certain voltage.
The functional blocks draw current.
The wire resistance generates losses.

Each of the elements in the simplified layout, the supply, the wires and the
functional blocks, can be modeled by relating the voltage across that element to the
current that passes through the element. Using these element constitutive relations,
we can construct a circuit from which we can determine the voltages across the
functional blocks and decide if the VLSI circuit will function.


Modeling the Circuit: the Supply becomes a Voltage Source

Physical symbol: the power supply providing current at voltage Vs.
Circuit element: a voltage source with + and - terminals.
Constitutive equation: V = Vs

The power supply provides whatever current is necessary to ensure that the voltage across the supply is maintained at a set value. Note that the constitutive equation (in the figure), which is supposed to relate element voltage (V) to element current (I), does not include current as a variable. This should not be surprising: the voltage is always maintained regardless of how much current is supplied, and therefore knowing the voltage tells one nothing about the supplied current.

Modeling the Circuit: Functional Blocks become Current Sources

Physical symbol: a functional block (e.g., the ALU) connected to the supply.
Circuit element: a current source Is.
Constitutive equation: I = Is

The functional blocks, the ALU, the cache and the decoder are complicated circuits
containing thousands of transistors. In order to determine whether the functional
block will always have a sufficient voltage to operate, a simple model must be
developed that abstracts many of the operating details. A simple worst-case model
is to assume that each functional block is always drawing its maximum current.
Each block is therefore modeled as a current source, although one must assume that
the associated currents have been determined by analyzing each functional block in
more detail. Note that once again the constitutive equation is missing a variable, this
time it is voltage. Since a current source passes the same current independent of the
voltage across the source, that V is missing should be expected.


Modeling the Circuit: Metal Lines become Resistors

Physical symbol: a metal wire carrying current I.
Circuit model: a resistor.
Constitutive equation (Ohm's law): I R - V = 0

    R = resistivity x Length / Area

Length and Area are design parameters; resistivity is a material property.

[Figure: a short, wide wire has low resistance; a long, narrow wire has high resistance.]

The model for the wires connecting the supply to the functional blocks is a resistor, where the resistance is proportional to the length of the wire (the current has further to travel) and inversely proportional to the wire cross-sectional area (the current has more paths to choose from).
That the current through a resistor is proportional to the voltage across the resistor is Ohm's law.

Modeling VLSI Power Distribution: Putting it all together

[Schematic: a voltage source for the power supply; current sources IC, IALU, ID for the Cache, ALU and Decoder; resistors for the wires.]

Power supply becomes a voltage source.
Functional blocks become current sources.
Wires become resistors.
The result is a schematic.

To generate a representation which can be used to determine the voltages across each of the functional units, consider each of the models previously described.
First, replace the supply with a voltage source.
Second, replace each functional block with an associated current source.
Third, replace each section of wire with a resistor.
Note that each resistor representing a wire replaces a single section with no branches, though the section can have turns.
The resulting connection of resistors, current sources and voltage sources is called a circuit schematic. Formulating equations from schematics will be discussed later.

Modeling the Space Frame

[Schematic: struts bolted together at joints, attached to the ground, with a load.]

The example is simplified for illustration.

In order to examine the space frame, we will consider a simplified example with only four steel beams and a load. Recall that the purple dots represent the points where steel beams are bolted together. Each of the elements in the simplified layout, the beams and the load, can be modeled by relating the relative positions of the element's terminals to the force produced by the element. Using these element constitutive relations, we can construct a schematic from which we can determine the frame's droop.

Modeling the Frame: the Load becomes a Force Source

Physical symbol: a hanging mass.
Schematic symbol: a force source.
Constitutive equation: Fx = 0, Fy = -Fload, where Fload = Mass x Gravity.

The load is modeled as a force pulling in the negative Y direction (Y being vertical, X being horizontal).
Note that the constitutive equation does not include a variable for the load's position, following from the fact that the load's force is independent of position.

Modeling the Frame: a Beam becomes a Strut

Physical symbol: a beam between terminals (x1, y1) and (x2, y2).
Constitutive equation (Hooke's law):

    L = sqrt( (x1 - x2)^2 + (y1 - y2)^2 )
    f = E Ac (L0 - L) / L0,  acting along the strut

L0 = unstretched length
Ac = cross-sectional area   (design parameters)
E  = Young's modulus        (material property)

In order to model the steel beams in a space frame, it is necessary to develop a relation between the beam deformation and the restoring force generated by the beam. To derive a formula we will make several assumptions.
1) The beam is perfectly elastic.
This means that if one deforms the beam by applying a force, the beam always returns to its original shape after the force is removed.
[Figure: applying a force stretches the beam from L0 to L1 > L0; removing the force returns it to L0.]
2) The beam does not buckle.
[Figure: applying a force stretches the beam from L0 to L1 > L0 without buckling.]
Buckling is an important phenomenon, and ignoring it limits the domain of applicability of this model.

3) The beam is materially linear.
For a beam to be materially linear, the force which acts along the beam is directly proportional to the change in length:

    f = K dL    (f = 0 when the beam is at its unstretched length L0)

To determine K, consider that the force required to stretch a beam an amount dL is
(I) Inversely proportional to its unstretched length (it is easier to stretch a 10-inch rubber band 1 inch than to stretch a 1-inch rubber band 1 inch),
(II) Directly proportional to its cross-sectional area (imagine 10 rubber bands in parallel),
(III) Dependent on the material (rubber stretches more easily than steel).
Combining (I), (II) and (III) gives K = E Ac / L0, the formula at the bottom of the slide.
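To make the constitutive relation concrete, here is a minimal Python sketch (NumPy assumed; the function name and the example values are illustrative, not from the lecture) that evaluates the strut force components defined above:

    import numpy as np

    def strut_force(x1, y1, x2, y2, E, Ac, L0):
        """Restoring force exerted on joint (x1, y1) by a strut to (x2, y2).

        Uses f = E*Ac*(L0 - L)/L0 along the strut, as in the slides: a
        stretched strut (L > L0) pulls the joint back toward the other end.
        """
        dx, dy = x1 - x2, y1 - y2
        L = np.hypot(dx, dy)                 # current length
        f = E * Ac * (L0 - L) / L0           # scalar force along the strut
        return f * dx / L, f * dy / L        # project onto global X, Y axes

    # Example: a unit strut stretched 1% along the x axis
    fx, fy = strut_force(1.01, 0.0, 0.0, 0.0, E=200e9, Ac=1e-4, L0=1.0)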

Modeling the Frame: Putting it all together

[Schematic: the four struts and the load force.]

How much does the load droop?

To generate a representation which can be used to determine the displacements of the beam joints, consider the models previously described.
First, replace the loads with forces.
Second, replace each beam with a strut.

Formulating Equations
from Schematics
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces
Two Types of Equations
Conservation Law Equation
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Constitutive Equation
Circuit - element current is related to voltage
across the element
Struts - element force is related to the change
in element length

Conservation Laws and Constitutive Equations

Heat Flow: 1-D Example

[Figure: a unit-length rod with incoming heat along its length; near-end temperature T(0), far-end temperature T(1).]

Question: What is the temperature distribution T(x) along the bar?

[Figure: sketch of T versus x between T(0) and T(1).]

Conservation Laws and Constitutive Equations

Heat Flow: Discrete Representation

1) Cut the bar into short sections.
2) Assign each cut a temperature.

[Figure: temperatures T(0), T1, T2, ..., T(N-1), TN, T(1) along the bar.]

Conservation Laws and Constitutive Equations

Heat Flow: Constitutive Relation

Heat flow through one section:

    h(i+1,i) = heat flow = (Ti - Ti+1) / dx

In the limit as the sections become vanishingly small:

    lim (dx -> 0):  h(x) = -dT(x)/dx

Conservation Laws and Constitutive Equations

Heat Flow: Conservation Law

Consider two adjacent sections, forming a control volume around node i.

[Figure: control volume with incoming heat hs, flow h(i,i-1) from the node at Ti-1, node temperature Ti, and flow h(i+1,i) toward the node at Ti+1.]

Net heat flow into the control volume = 0:

    h(i+1,i) - h(i,i-1) = hs dx

Conservation Laws and Constitutive Equations

Heat Flow: Conservation Law

Net heat flow into the control volume = 0:

    h(i+1,i) - h(i,i-1) = hs dx

where h(i,i-1) is the heat in from the left, h(i+1,i) is the heat out to the right, and hs is the incoming heat per unit length.

In the limit as the sections become vanishingly small:

    lim (dx -> 0):  hs(x) = dh(x)/dx = -d2T(x)/dx2

Conservation Laws and Constitutive Equations

Heat Flow: Circuit Analogy

Temperature is analogous to voltage; heat flow is analogous to current.

[Schematic: a chain of resistors with R = dx between nodes T1, ..., TN; a current source is = hs dx injects at each node; voltage sources vs = T(0) and vs = T(1) hold the two end temperatures.]
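The discrete heat-flow equations above form a tridiagonal linear system. The sketch below (Python with NumPy assumed; the grid size and source values are made up for illustration) assembles and solves it for a bar with fixed end temperatures:

    import numpy as np

    # Discretize the unit bar: N interior nodes, spacing dx
    N = 9
    dx = 1.0 / (N + 1)
    T0, T1 = 0.0, 1.0              # end temperatures T(0), T(1)
    hs = np.ones(N)                # incoming heat per unit length

    # Conservation + constitutive: -(T[i-1] - 2 T[i] + T[i+1]) / dx^2 = hs[i]
    A = (np.diag(2.0 * np.ones(N)) -
         np.diag(np.ones(N - 1), 1) -
         np.diag(np.ones(N - 1), -1)) / dx**2
    b = hs.copy()
    b[0] += T0 / dx**2             # fold the known end temperatures
    b[-1] += T1 / dx**2            # into the right-hand side

    T = np.linalg.solve(A, b)      # interior temperature distribution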

Formulating Equations
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces
Conducting Bar - Temperatures, section heat flows
Two Types of Equations
Conservation Law Equation
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Bar - Sum of heat flows into control volume = 0
Constitutive Equations
Circuit - element current related to voltage across the element
Struts - strut force related to the change in element length
Bar - section temperature drop related to heat flow

Formulating Equations from Schematics: Circuit Example

Identifying Unknowns

[Schematic: the power-distribution circuit with its nodes numbered; node 0 is the reference, and vs is the voltage source.]

Assign each node a voltage, with one node as 0.

Given a circuit schematic, the problem is to determine the node voltages and element currents. In order to begin, one needs labels for the node voltages, and therefore the nodes are numbered zero, one, two, ..., N, where N+1 is the total number of nodes.
The node numbered zero has a special meaning: it is the reference node. Voltages are not absolute quantities, but must be measured against a reference.
To understand this point better, consider the simple example of a one-amp current source driving a one-ohm resistor connected between nodes 1 and 0.
In order for one amp to flow through the resistor, V1 - V0 must equal one volt. But does V1 = 11 volts and V0 = 10 volts? Or is V1 = 101 volts and V0 = 100 volts? It really does not matter; what is important is that V1 is one volt higher than V0. So, let V0 define a reference and set its value to a convenient number, V0 = 0.

Formulating Equations from Schematics: Circuit Example

Identifying Unknowns

[Schematic: branch currents i1, i2, i3, i4, i5 labeled on the resistors; node 0 is the reference.]

Assign each element except current sources a current.

The second set of unknowns are the element currents. Obviously, the currents passing through current sources are already known, so one need only label the currents through resistors and voltage sources. The currents are denoted i1, i2, ..., ib, where b is the total number of unknown element currents. Since elements connect nodes, in an analogy with graphs, element currents are often referred to as branch currents.

Formulating Equations from Schematics: Circuit Example

Conservation Law

Sum of currents = 0 at every node (Kirchhoff's current law):

    node 0:  i1 + i5 - i4 = 0
    node 1:  -is1 - i1 + i2 = 0
    node 2:  is2 + is3 - i2 - i5 = 0
    node 3:  i3 - is3 = 0
    node 4:  i4 + is1 - is2 - i3 = 0

The conservation law for a circuit is that the sum of currents at each node equals zero. This is often referred to as Kirchhoff's current law. Another way to state this law, which more clearly indicates its conservation nature, is to say: any current entering a node must leave the node.
The conservation is that no current is lost; what comes in goes out. This statement also makes it clear that the direction of the current determines its sign when summing the currents. Currents leaving the node are positive terms in the sum and currents entering the node are negative terms (one can reverse this convention, but one must be consistent).

Formulating Equations from Schematics: Circuit Example

Constitutive Equations

    R1 i1 = 0 - V1
    R2 i2 = V1 - V2
    R3 i3 = V3 - V4
    R4 i4 = V4 - 0
    R5 i5 = 0 - V2

Use the constitutive equations to relate branch currents to node voltages (currents flow from the plus node to the minus node).
Each element with an unknown branch current has an associated constitutive equation which relates the voltage across the element to the current through the element. For example, consider R2 in the figure, carrying current i2 between nodes with voltages V2 and V3.
The constitutive relation for a resistor is Ohm's law,

    I = (1/R) V

and in this case V = V2 - V3 and I = i2.
One should again take note of the direction of the current. If current travels from the left node through the resistor to the right node, then the left node voltage will be higher than the right node voltage by an amount R I.

Formulating Equations Circuit Example


from Schematics
Summary
Unknowns for the Circuit example
Node voltages ( except for a reference)
Element currents ( except for current sources)
Equations for the Circuit example
One conservation equation (KCL) for each node
(except for the reference node)
One constitutive equation for each element
(except for current sources)
Note that # of equations = # of unknowns


Summary of key points

Many applications of simulation; picked three representative examples:
Circuits, Struts and Joints, Heat Flow in a Bar
Two Types of Equations
Conservation Laws
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Bar - Sum of heat flows into control volume = 0
Constitutive Equations
Circuit - current-voltage relationship
Struts - force-displacement relationship
Bar - temperature drop-heat flow relationship

Introduction to Simulation - Lecture 2


Equation Formulation Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Formulating Equations from Schematics
Struts and Joints Example

Matrix Construction from Schematics
Stamping Procedure

Two Formulation Approaches
Node-Branch - More general but less efficient
Nodal - Derivable from Node-Branch

Formulating Equations from Schematics: Struts Example

Identifying Unknowns

[Schematic: joints at (x1, y1) and (x2, y2); fixed joints at (0, 0) and (1, 0); the ground attachments are hinged.]

Assign each joint an X, Y position, with one joint as zero.

Given a schematic for the struts, the problem is to determine the joint positions and the strut forces.
Recall that the joints in the struts problem correspond physically to the locations where steel beams are bolted together. The joints are also analogous to the nodes in the circuit, but there is an important difference: the joint position is a vector, because one needs two (X, Y) coordinates (three (X, Y, Z) in three dimensions) to specify a joint position.
The joint positions are labeled x1, y1, x2, y2, ..., xJ, yJ, where J is the number of joints whose positions are unknown. As in circuits, in struts and joints there is also an issue about a position reference: the position of a joint is usually specified with respect to a reference joint.
Note also the hatched ground symbol in the schematic. This symbol is used to denote a fixed structure (like a concrete wall, for example). Joints on such a wall have their positions fixed, and usually one such joint is selected as the reference joint. The reference joint has the position (0, 0) ((0, 0, 0) in three dimensions).

Formulating Equations from Schematics: Struts Example

Identifying Unknowns

[Schematic: each strut labeled with its force components fx, fy; the load force fload at the loaded joint.]

Assign each strut an X and Y force component.

The second set of unknowns are the strut forces. Like the currents in the circuit examples, these forces can be considered branch quantities. There is again a complication due to the two-dimensional nature of the problem: there is an x and a y component to each force. The strut forces are labeled f1x, f1y, ..., fSx, fSy, where S is the number of struts.

Formulating Equations from Schematics: Struts Example

Aside on Strut Forces

[Figure: a strut from the origin (0, 0) to (x1, y1); the force f acts along the strut and is resolved into components fx and fy in the global X, Y coordinate system.]

    f  = E Ac (L0 - L) / L0
    L  = sqrt( x1^2 + y1^2 )
    fx = (x1 / L) f
    fy = (y1 / L) f

The force, f, in a stretched strut always acts along the direction of the strut, as shown in the figure. However, it will be necessary to sum the forces at a joint, and the individual struts connected to a joint will not all be in the same direction. So, to sum such forces, it is necessary to compute the components of the forces in the X and Y directions. Since one must have selected the directions for the X and Y axes once for a given problem, such axes are referred to as the global coordinate system. Then one can think of the process of computing fx, fy shown in the figure as mapping from a local to a global coordinate system.
The formulas for determining fx and fy from f follow easily from the geometry depicted in the figure; one is simply projecting the vector force onto the coordinate axes.

Formulating Equations from Schematics: Struts Example

Conservation Law

[Schematic: joints at (x1, y1) and (x2, y2); struts f1, f2, f3, f4; fixed joints at (0, 0) and (1, 0); load fload.]

    At joint 1:  fx1 + fx2 + fx3 = 0  and  fy1 + fy2 + fy3 = 0
    At joint 2:  fx4 - fx3 + floadx = 0  and  fy4 - fy3 + floady = 0

Force equilibrium:
Sum of X-directed forces at a joint = 0
Sum of Y-directed forces at a joint = 0

The conservation law for struts is usually referred to as requiring force equilibrium. There are some subtleties about signs, however. To begin, consider that the X-directed forces at a joint must sum to zero, otherwise the joint will accelerate in the X direction. The Y-directed forces must also sum to zero to avoid joint acceleration in the Y direction.
To see the subtlety about signs, consider a single strut aligned with the X axis, with ends at (x1, 0) and (x2, 0). If the strut is stretched by moving the right end to (x2 + d, 0), then the strut will exert forces fa (on the left joint) and fb (on the right joint) in an attempt to contract.
The forces fa and fb are equal in magnitude but opposite in sign. This is because fa points in the positive X direction and fb in the negative X direction.
If one examines the force equilibrium equation for the left-hand joint, then that equation will be of the form

    other forces + fa = 0

whereas the equilibrium equation for the right-hand joint will be

    other forces + fb = other forces - fa = 0

In setting up a system of equations for the strut, one need not include both fa and fb as separate variables. Instead, one can select either force and implicitly exploit the relationship between the forces on opposite sides of the strut.
As an example, consider that for strut 3 between joint 1 and joint 2 on the slide, we have selected to represent the force on the joint 1 side of the strut and labeled that force f3. Therefore, for the conservation law associated with joint 1, force f3 appears with a positive sign, but for the conservation law associated with joint 2, we need the opposite-side force, -f3. Although the physical mechanism seems quite different, this trick of representing the equations using only the force on one side of the strut as a variable makes an algebraic analogy with the circuit sum-of-currents law. That is, it appears as if a strut's force leaves one joint and enters another.

Formulating Equations from Schematics: Struts Example

Constitutive Equations

    f1x = Fx(x1 - 0, y1 - 0)     f1y = Fy(x1 - 0, y1 - 0)
    f2x = Fx(x1 - 1, y1 - 0)     f2y = Fy(x1 - 1, y1 - 0)
    f3x = Fx(x1 - x2, y1 - y2)   f3y = Fy(x1 - x2, y1 - y2)
    f4x = Fx(x2 - 1, y2 - 0)     f4y = Fy(x2 - 1, y2 - 0)

Use the constitutive equations to relate strut forces to joint positions.

It is worth examining how the signs of the force are determined. Again consider a single strut aligned with the X axis, with ends at (x1, 0) and (x2, 0).
The X-axis alignment can be used to simplify the relation between the force on the x1 side and x1 and x2 to

    fx = ( (x1 - x2) / |x1 - x2| ) ( E Ac / L0 ) ( L0 - |x1 - x2| )

Note that there are two ways to make fx negative and point in the negative x direction: either x1 - x2 > 0, which corresponds to flipping the strut, or |x2 - x1| < L0, which corresponds to compressing the strut.

Formulating Equations
from Schematics

Struts Example
Summary

Unknowns for the Strut Example


Joint positions (except for a reference or
fixed joints)
Strut forces
Equations for the Strut Example
One set of conservation equations for each
joint.
One set of constitutive equations for each
strut.
Note that the # equations = # unknowns

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

[Schematic: strut f1 from a fixed wall to joint 1 at (x1, y1 = 0), strut f2 from joint 1 to joint 2 at (x2, y2 = 0), load force fL at joint 2.]

Conservation law:

    At node 1:  f1x + f2x = 0
    At node 2:  -f2x + fL = 0

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Constitutive equations (with the stiffness E Ac / L0 normalized to one):

    f1x = ( (x1 - 0) / |x1 - 0| ) ( L0 - |x1 - 0| )
    f2x = ( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| )

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Reduced (nodal) equations, obtained by substituting the constitutive relations into the conservation law:

    ( x1 / |x1| ) ( L0 - |x1| ) + ( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| ) = 0
    -( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| ) + fL = 0

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Solution of the nodal equations with fL = 10 (force in the positive x direction):

    x1 = L0 + 10
    x2 = x1 + L0 + 10

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Notice the signs of the forces:

    f2x = 10   (force in the positive x direction)
    f1x = -10  (force in the negative x direction)

Formulating Equations from Schematics

Examples from last time:

Circuit - modeling VLSI power distribution
Struts and Joints - modeling a space frame

Formulating Equations from Schematics
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces

Two Types of Equations
Conservation Law
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Constitutive Relations
Circuit - branch (element) current proportional to branch (element) voltage
Struts - branch (strut) force proportional to branch (strut) displacement

Generating Matrices from Schematics

Assume linear constitutive equations.

Circuit example:
One matrix column for each unknown
N columns for the node voltages
B columns for the branch currents
One matrix row for each equation
N rows for KCL
B rows for element constitutive equations (linear!)

Generating Matrices from Schematics

Assume linear constitutive equations.

Struts example in 2-D:
One pair of matrix columns for each unknown
J pairs of columns for the joint positions
S pairs of columns for the strut forces
One pair of matrix rows for each equation
J pairs of rows for the force equilibrium equations
S pairs of rows for element constitutive equations (linear!)

Generating Matrices from Schematics: Circuit Example

Conservation Equation

[Schematic: the example circuit with reference node 0 and nodes V1 ... V4; resistors R1 ... R5 carrying branch currents i1 ... i5; current sources is1, is2, is3.]

To generate a matrix equation for the circuit, we begin by writing the KCL equation at each node in terms of the branch currents and the source currents. In particular, we write

    sum of signed branch currents = sum of signed source currents

where the sign of a branch current in the equation is positive if the current is leaving the node and negative otherwise. The sign of a source current in the equation is positive if the current is entering the node and negative otherwise.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

[Schematic: the same example circuit.]

    node 1:  -i1 + i2 = is1
    node 2:  -i2 - i5 = -is2 - is3
    node 3:  i3 = is3
    node 4:  -i3 + i4 = -is1 + is2

Generating Matrices from Schematics: Circuit Example

Conservation Equation: Matrix Form for the Equations

One row for each KCL equation, one column for each branch current:

    [ -1   1   0   0   0 ] [ i1 ]   [ is1        ]
    [  0  -1   0   0  -1 ] [ i2 ]   [ -is2 - is3 ]
    [  0   0   1   0   0 ] [ i3 ] = [ is3        ]
    [  0   0  -1   1   0 ] [ i4 ]   [ -is1 + is2 ]
                           [ i5 ]

The right-hand side collects the source currents. The matrix A is usually not square.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

How each resistor contributes to the matrix: for a resistor Rk carrying current ik from node n1 to node n2,

    KCL at n1:  (other currents) + ik = is
    KCL at n2:  (other currents) - ik = is

so column k of A gets +1 in row n1 and -1 in row n2. A has no more than two nonzeros per column.

What happens to the matrix when one end of a resistor is connected to the reference (the zero node)? In that case there is only one contribution to the kth column of the matrix: the single entry in the row of the non-reference node.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

How each current source contributes to the right-hand side: for a source isb flowing from node n1 to node n2,

    KCL at n1:  sum of ib's = (other sources) - isb
    KCL at n2:  sum of ib's = (other sources) + isb

Generating Matrices from Schematics: Circuit Example

Conservation Matrix Equation Generation Algorithm

For each resistor (branch b, current from node n1 to node n2):
    if (n1 > 0)  A(n1, b) = 1
    if (n2 > 0)  A(n2, b) = -1
Set Is = zero vector
For each current source isb (flowing from node n1 to node n2):
    if (n1 > 0)  Is(n1) = Is(n1) - isb
    if (n2 > 0)  Is(n2) = Is(n2) + isb
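A minimal Python sketch of this stamping procedure (NumPy assumed; the element-list format and node numbering are illustrative choices, not the lecture's code):

    import numpy as np

    def stamp_conservation(num_nodes, resistors, sources):
        """Build the KCL incidence matrix A and source vector Is.

        resistors: list of (n1, n2) node pairs, current assumed n1 -> n2
        sources:   list of (n1, n2, isb), source current flows n1 -> n2
        Node 0 is the reference and gets no row.
        """
        A = np.zeros((num_nodes, len(resistors)))
        Is = np.zeros(num_nodes)
        for b, (n1, n2) in enumerate(resistors):
            if n1 > 0:
                A[n1 - 1, b] = 1.0      # current leaves n1
            if n2 > 0:
                A[n2 - 1, b] = -1.0     # current enters n2
        for (n1, n2, isb) in sources:
            if n1 > 0:
                Is[n1 - 1] -= isb
            if n2 > 0:
                Is[n2 - 1] += isb
        return A, Is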

Generating Matrices from Schematics: Circuit Example

Conservation Equation

Stamping every resistor and current source of the example circuit gives

    [ -1   1   0   0   0 ] [ i1 ]   [ is1        ]
    [  0  -1   0   0  -1 ] [ i2 ]   [ -is2 - is3 ]
    [  0   0   1   0   0 ] [ i3 ] = [ is3        ]
    [  0   0  -1   1   0 ] [ i4 ]   [ -is1 + is2 ]
                           [ i5 ]

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

First determine the voltages across the resistors (the branch voltages); second, relate the branch currents to the branch voltages:

    i1 = (1/R1) Vb1 = (1/R1) (0 - V1)
    i2 = (1/R2) Vb2 = (1/R2) (V1 - V2)
    i3 = (1/R3) Vb3 = (1/R3) (V3 - V4)
    i4 = (1/R4) Vb4 = (1/R4) (V4 - 0)
    i5 = (1/R5) Vb5 = (1/R5) (0 - V2)

The current through a resistor is related to the voltage across the resistor, which in turn is related to the node voltages. Consider a resistor R1 between nodes with voltages V1 and V2, carrying current i1 from the first node to the second. The voltage across the resistor is V1 - V2 and the current through the resistor is

    i1 = (1/R1) (V1 - V2)

Notice the sign: i1 is positive if V1 > V2.
In order to construct a matrix representation of the constitutive equations, the first step is to relate the node voltages to the voltages across the resistors, the branch voltages.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

Examine the matrix construction: each branch voltage is a signed difference of node voltages,

    [ Vb1 ]   [ -1   0   0   0 ] [ V1 ]
    [ Vb2 ]   [  1  -1   0   0 ] [ V2 ]
    [ Vb3 ] = [  0   0   1  -1 ] [ V3 ]
    [ Vb4 ]   [  0   0   0   1 ] [ V4 ]
    [ Vb5 ]   [  0  -1   0   0 ]

To generate a matrix equation that relates the node voltages to the branch voltages,
one notes that the voltage across a branch is just the difference between the node
voltages at the ends of the branch. The sign is determined by the direction of the
current, which points from the positive node to the negative node.
Since there are B branch voltages and N node voltages, the matrix relating the two
has B rows and N columns.


Generating Matrices from Schematics: Circuit Example

Constitutive Equation: Node-to-Branch Relation

The KCL equations are A [i1 ... i5]^T = Is, and the relation above is Vb = A^T V: the node-to-branch matrix is the transpose of the KCL matrix.

A relation exists between the matrix associated with the conservation law (KCL) and the matrix associated with the node-to-branch relation. To see this, examine a single resistor Rk carrying current from node l to node m, with node voltages Vl and Vm.
For the conservation law, branch k contributes two non-zeros to the kth column of A: +1 in row l and -1 in row m.
The voltage across branch k is Vl - Vm, so the kth branch contributes the same +1 and -1 to the kth row of the node-to-branch relation.
It is easy to see that each branch element contributes a column to the incidence matrix A, and contributes the transpose of that column, a row, to the node-to-branch relation.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

    [ i1 ]   [ 1/R1                       ] [ Vb1 ]
    [ i2 ]   [       1/R2                 ] [ Vb2 ]
    [ i3 ] = [             1/R3           ] [ Vb3 ]
    [ i4 ]   [                  1/R4      ] [ Vb4 ]
    [ i5 ]   [                       1/R5 ] [ Vb5 ]

The kth resistor contributes 1/Rk to entry (k, k).

The matrix relates branch voltages to branch currents:
- one row for each unknown current,
- one column for each associated branch voltage.
The matrix is square and diagonal; call it alpha.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

Combining the two relations expresses the branch currents in terms of the node voltages:

    Ib - alpha A^T VN = 0

- A^T relates node voltages to branch voltages.
- alpha relates branch voltages to branch currents.

Generating Matrices from Schematics: Circuit Example

Node-Branch Form

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ VN ] = [ Is ]

N = number of nodes with unknown voltages
B = number of branches with unknown currents

    Ib - alpha A^T VN = 0    (constitutive relation)
    A Ib = Is                (conservation law)
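As a concrete illustration, here is a small Python sketch (NumPy assumed; the incidence matrix and element values are made-up stand-ins, not the lecture's example) that assembles and solves the node-branch block system above:

    import numpy as np

    # A: KCL incidence matrix (N x B); alpha: diagonal branch conductances
    A = np.array([[ 1.0,  1.0,  0.0],
                  [ 0.0, -1.0,  1.0]])       # 2 nodes, 3 branches
    R = np.array([1.0, 2.0, 0.5])            # branch resistances
    alpha = np.diag(1.0 / R)
    Is = np.array([1.0, 0.0])                # source currents into nodes

    N, B = A.shape
    # Block system: [ I  -alpha A^T ] [Ib]   [ 0  ]
    #               [ A       0     ] [Vn] = [ Is ]
    M = np.block([[np.eye(B), -alpha @ A.T],
                  [A,          np.zeros((N, N))]])
    rhs = np.concatenate([np.zeros(B), Is])
    sol = np.linalg.solve(M, rhs)
    Ib, Vn = sol[:B], sol[B:]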

Generating Matrices from Schematics: Struts Example

In 2-D:
One pair of columns for each unknown
- J pairs of columns for the joint positions
- S pairs of columns for the strut forces
One pair of matrix rows for each equation
- J pairs of rows for the force equilibrium equations
- S pairs of rows for the linearized constitutive relations

Generating Matrices from Schematics: Struts Example

Follow an approach parallel to circuits:
1) Form an incidence matrix, A, from the conservation law.
2) Determine strut deformations using A^T.
3) Use linearized constitutive equations to relate strut deformations to strut forces.
4) Combine (1), (2), and (3) to generate a node-branch form.

Generating Matrices from Schematics: Struts Example

Conservation Equation

[Schematic: joints at (x1, y1) and (x2, y2); struts f1 ... f4; fixed joints at (0, 0) and (1, 0); load fl.]

    f1x + f2x + f3x = 0
    f1y + f2y + f3y = 0
    -f3x + f4x = -flx
    -f3y + f4y = -fly

As a reminder, the conservation equation for struts is naturally divided in pairs: at each joint, the sum of X-directed forces = 0 and the sum of Y-directed forces = 0. Note that the load force is known, so it appears on the right-hand side of the equation.

Generating Matrices from Schematics: Struts Example

Conservation Equation: Stamping Approach

Load one pair of columns per strut; load the right-hand side for each load:

          f1x f1y f2x f2y f3x f3y f4x f4y
    x1 [   1        1        1            ]  [ f1x ]   [ 0    ]
    y1 [      1        1        1         ]  [ f1y ]   [ 0    ]
    x2 [                  -1        1     ]  [ ...  ] = [ -flx ]
    y2 [                     -1        1  ]  [ f4y ]   [ -fly ]

(A is the 2J x 2S incidence matrix; the right-hand side FL collects the load forces.)

Note that the incidence matrix, A, for the strut problem is very similar to the incidence matrix for the circuit problem, except that the two-dimensional forces and positions generate 2x2 blocks in the incidence matrix. Consider a single strut s between joints j1 and j2, with force fs defined on the j1 side. The force equilibrium equations for the two joints at the ends of the strut are

    at joint j1:  (other x forces) + fsx = FL(j1x),  (other y forces) + fsy = FL(j1y)
    at joint j2:  (other x forces) - fsx = FL(j2x),  (other y forces) - fsy = FL(j2y)

Examining what goes into the matrix: strut s contributes +1 entries in its column pair at rows j1x, j1y, and -1 entries at rows j2x, j2y. Note that the matrix entries are 2x2 blocks. Therefore, the individual entries in the block for strut s's contribution to j1's conservation equation need specific indices, and we use j1x, j1y to indicate the two rows and sx, sy to indicate the two columns.

Generating Matrices from Schematics: Struts Example

Conservation Matrix Generation Algorithm

For each strut b between joints j1 and j2:
    If (j1 is not fixed):  A(j1x, bx) = 1,  A(j1y, by) = 1
    If (j2 is not fixed):  A(j2x, bx) = -1, A(j2y, by) = -1
For each load fload at joint j1:
    If (j1 is not fixed):
        FL(j1x) = FL(j1x) - fload_x
        FL(j1y) = FL(j1y) - fload_y

A has at most 2 non-zeros per column.

Generating Matrices from Schematics: Struts Example

Constitutive Equation

First linearize the constitutive relation. If x1, y1 are close to some x0, y0 with x0^2 + y0^2 = L0^2, then

    [ fx ]   [ dFx/dx (x0, y0)   dFx/dy (x0, y0) ] [ ux ]
    [ fy ] = [ dFy/dx (x0, y0)   dFy/dy (x0, y0) ] [ uy ]

where ux = x1 - x0 and uy = y1 - y0.

As shown before, the force through a strut is

    fx = Fx(x, y) = (x / L) (L0 - L)
    fy = Fy(x, y) = (y / L) (L0 - L)

where L = sqrt(x^2 + y^2) and x, y are the coordinates of the joint relative to the strut's other end. If x and y are perturbed a small amount from some x0, y0 such that x0^2 + y0^2 = L0^2, then since Fx(x0, y0) = 0,

    fx ~ dFx/dx (x0, y0) (x1 - x0) + dFx/dy (x0, y0) (y1 - y0)

and a similar expression holds for fy.
One should note that rotating the strut, even without stretching it, will violate the small-perturbation conditions. The Taylor series expression will not give good approximate forces, because they will point in an incorrect direction.
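A small Python sketch of this linearization (NumPy assumed; the function and variable names are illustrative), building the 2x2 stiffness block by differentiating F analytically:

    import numpy as np

    def strut_jacobian(x0, y0, L0):
        """2x2 block dF/d(x, y) at (x0, y0) for F = (L0 - L) [x, y] / L,
        with L = sqrt(x^2 + y^2). On the unstretched circle (L = L0) this
        reduces to the rank-one block -u u^T."""
        L = np.hypot(x0, y0)
        u = np.array([x0, y0]) / L          # unit vector along the strut
        I = np.eye(2)
        # dF/dp = (L0 - L)/L (I - u u^T) - u u^T,  p = [x, y]
        return (L0 - L) / L * (I - np.outer(u, u)) - np.outer(u, u)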

Generating Matrices from Schematics: Struts Example

Constitutive Equation

Collecting the linearized relations for all four struts:

    fs - alpha A^T u = 0

where fs = [f1x f1y ... f4x f4y]^T stacks the strut forces, u = [ux1 uy1 ux2 uy2]^T stacks the joint displacements, alpha is block diagonal with the 2x2 blocks (1,1), (2,2), (3,3), (4,4) from the linearization, and A^T maps joint displacements to strut end-displacement differences.

Generating Matrices from Schematics: Struts Example

Constitutive Equation: the (s, s) block

For strut s between joints with initial positions (x1_0, y1_0) and (x2_0, y2_0), evaluate the Jacobian at the initial relative position:

    (s, s) = [ dFx/dx (x2_0 - x1_0, y2_0 - y1_0)   dFx/dy (x2_0 - x1_0, y2_0 - y1_0) ]
             [ dFy/dx (x2_0 - x1_0, y2_0 - y1_0)   dFy/dy (x2_0 - x1_0, y2_0 - y1_0) ]

Generating Matrices from Schematics: Struts Example

Node-Branch Form

    [ I   -alpha A^T ] [ fs ]   [ 0  ]
    [ A       0      ] [ u  ] = [ FL ]

S = number of struts, J = number of unfixed joints; the system is (2S + 2J) x (2S + 2J).

    fs - alpha A^T u = 0    (constitutive equation)
    A fs = FL               (conservation law)

Generating Matrices from Schematics: Struts Example

Comparison

    Struts:   [ I   -alpha A^T ] [ fs ]   [ 0  ]
              [ A       0      ] [ u  ] = [ FL ]

    Circuit:  [ I   -alpha A^T ] [ Ib ]   [ 0  ]
              [ A       0      ] [ VN ] = [ Is ]

Generating Matrices: Nodal Formulation, Circuit Example

[Schematic: the same example circuit.]

    At node 1:  -is1 + (1/R1) V1 + (1/R2) (V1 - V2) = 0
    At node 2:  is2 + is3 + (1/R2) (V2 - V1) + (1/R5) V2 = 0
    At node 4:  is1 - is2 + (1/R4) V4 + (1/R3) (V4 - V3) = 0

1) Number the nodes, with one node as 0.
2) Write a conservation law at each node except (0), in terms of the node voltages!

Generating Matrices: Nodal Formulation, Circuit Example

One row per node, one column per node. Stamping each resistor gives

    [ 1/R1 + 1/R2   -1/R2          0       0           ] [ v1 ]   [ is1        ]
    [ -1/R2         1/R2 + 1/R5    0       0           ] [ v2 ]   [ -is2 - is3 ]
    [ 0             0              1/R3    -1/R3       ] [ v3 ] = [ is3        ]
    [ 0             0              -1/R3   1/R3 + 1/R4 ] [ v4 ]   [ -is1 + is2 ]

i.e., G v = Is.

Examining the nodal equations, one sees that a resistor contributes a current to two equations, and its current depends on two voltages. Consider a resistor Rk between nodes n1 and n2 carrying current ik:

    KCL at node n1:  (other currents) + (1/Rk) (Vn1 - Vn2) = is
    KCL at node n2:  (other currents) - (1/Rk) (Vn1 - Vn2) = is

So the matrix entries associated with Rk are

    +1/Rk at (n1, n1),  -1/Rk at (n1, n2)
    -1/Rk at (n2, n1),  +1/Rk at (n2, n2)

Generating Matrices: Nodal Formulation, Circuit Example

Nodal Matrix Generation Algorithm

For each resistor R between nodes n1 and n2:
    if (n1 > 0) and (n2 > 0):
        G(n1, n2) = G(n1, n2) - 1/R,  G(n2, n1) = G(n2, n1) - 1/R
        G(n1, n1) = G(n1, n1) + 1/R,  G(n2, n2) = G(n2, n2) + 1/R
    else if (n1 > 0):
        G(n1, n1) = G(n1, n1) + 1/R
    else:
        G(n2, n2) = G(n2, n2) + 1/R
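Here is a minimal Python sketch of this stamping loop (NumPy assumed; the element-list format and the small example circuit are illustrative):

    import numpy as np

    def stamp_nodal(num_nodes, resistors, current_sources):
        """Nodal matrix G and RHS Is; node 0 is the reference (no row/column).

        resistors:        list of (n1, n2, R)
        current_sources:  list of (n1, n2, isb), current flows n1 -> n2
        """
        G = np.zeros((num_nodes, num_nodes))
        Is = np.zeros(num_nodes)
        for n1, n2, R in resistors:
            g = 1.0 / R
            if n1 > 0:
                G[n1 - 1, n1 - 1] += g
            if n2 > 0:
                G[n2 - 1, n2 - 1] += g
            if n1 > 0 and n2 > 0:
                G[n1 - 1, n2 - 1] -= g
                G[n2 - 1, n1 - 1] -= g
        for n1, n2, isb in current_sources:
            if n1 > 0:
                Is[n1 - 1] -= isb
            if n2 > 0:
                Is[n2 - 1] += isb
        return G, Is

    # Example: 1 A pushed into node 1, three 1-ohm resistors
    G, Is = stamp_nodal(2, [(1, 0, 1.0), (1, 2, 1.0), (2, 0, 1.0)],
                        [(0, 1, 1.0)])
    V = np.linalg.solve(G, Is)    # node voltages [2/3, 1/3]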

Generating Matrices: Nodal Formulation

    G VN = Is    (resistor networks; G is N x N)
    G u  = FL    (struts and joints; G is 2J x 2J)

Nodal Formulation: Comparing to Node-Branch Form

Node-branch matrix (constitutive relation stacked on the conservation law):

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ VN ] = [ Is ]

Nodal matrix:

    [ G ] [ VN ] = [ Is ]

Nodal Formulation: G Matrix Properties

Diagonally dominant:  |Gii| >= sum over j != i of |Gij|
Symmetric:            Gij = Gji
Smaller:              N x N << (N + B) x (N + B)
                      2J x 2J << (2J + 2S) x (2J + 2S)

Nodal Formulation: Node-Branch Form

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ Vn ] = [ Is ]

Not symmetric or diagonally dominant; the matrix is (n+b) x (n+b).

Nodal Formulation: Deriving the Nodal Form from Node-Branch

    Ib - alpha A^T VN = 0                (constitutive equation)
    A (Ib - alpha A^T VN) = A 0 = 0      (multiply by A)
    A Ib = Is                            (conservation law)
    =>  (A alpha A^T) VN = Is,  so  G = A alpha A^T
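A two-line NumPy check of this identity on the toy circuit stamped above (names carried over from the earlier sketches; purely illustrative):

    import numpy as np

    # Same toy circuit: branches 1-0, 1-2, 2-0, all 1 ohm
    A = np.array([[ 1.0,  1.0,  0.0],     # KCL rows for nodes 1, 2:
                  [ 0.0, -1.0,  1.0]])    # +1 leaving, -1 entering
    alpha = np.diag([1.0, 1.0, 1.0])      # branch conductances 1/R
    G = A @ alpha @ A.T                   # nodal matrix from the derivation
    # G == [[2, -1], [-1, 2]]: matches stamping the resistors directly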

Nodal Formulation: Problem Element, Voltage Source

A voltage source between nodes n1 and n2 carries whatever current is is needed; its constitutive equation involves no current:

    0 is + Vn1 - Vn2 = Vs

Nodal Formulation: Problem Element, Voltage Source

A node-branch constitutive equation CAN still be formed when voltage sources are present. For the example circuit with a voltage source Vs carrying current i6, the constitutive rows are

    Rk ik - (kth row of A^T) V = 0     for each resistor k = 1 ... 5
    0 i6 + (6th row of A^T) V = Vs     for the voltage source

The voltage source contributes a zero on the diagonal of the constitutive block, and Vs on the right-hand side.

Nodal Formulation: Problem Element, Voltage Source

One cannot derive the nodal formulation. The constitutive equations determine only the resistor currents from the node voltages; the voltage-source currents are missing. Multiplying the constitutive equation by A therefore no longer reproduces A Ib = Is (the conservation law), and the branch currents cannot be eliminated.

The nodal formulation requires constitutive relations in the form

    Conserved quantity = F(node voltages)!

Nodal Formulation: Problem Element, Rigid Rod

A rigid rod between (x1, y1) and (x2, y2):

    0 fx + 0 fy + sqrt( (x1 - x2)^2 + (y1 - y2)^2 ) - L = 0    (length constraint)
    -(y1 - y2) fx + (x1 - x2) fy + 0 = 0                       (force acts along the rod)

As with the voltage source, the element force cannot be written as a function of the joint positions, so the nodal form cannot be derived.

Nodal Formulation: Comparing Matrix Sparsity

Example problem: a resistor grid with nodes V1, V2, ..., V100 along each row and rows starting at V101, V201, ..., V901 (a 100 x 10 grid, nodes V1 through V1000).

Nodal Formulation: Comparing Matrix Sparsity

[Figure: matrix non-zero locations for the 100 x 10 resistor grid, node-branch form versus nodal form.]

Summary of key points

Developed algorithms for automatically constructing matrix equations from schematics, using conservation laws + constitutive equations.
Looked at two forms: node-branch and nodal.

Summary of key points

Node-branch
General constitutive equations
Larger, sparser system
No diagonal dominance

Nodal
Conserved quantity must be a function of node variables
Smaller, denser system
Diagonally dominant & symmetric

Introduction to Simulation - Lecture 3


Basics of Solving Linear Systems
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Karen Veroy and Jacob White

Outline
Solution Existence and Uniqueness
Gaussian Elimination Basics
LU factorization
Pivoting and Growth
Hard to solve problems
Conditioning

Application Problems

    G Vn = Is

With no voltage sources or rigid struts, the matrix is symmetric and diagonally dominant, and is n x n.

Systems of Linear Equations

    [ M1  M2  ...  MN ] [ x1 ]   [ b1 ]
                        [ x2 ] = [ b2 ]
                        [ .. ]   [ .. ]
                        [ xN ]   [ bN ]

    x1 M1 + x2 M2 + ... + xN MN = b

Find a set of weights, x, so that the weighted sum of the columns of the matrix M is equal to the right-hand side b.
Systems of Linear Equations: Key Questions

Given M x = b:
Is there a solution?
Is the solution unique?

Is there a solution? There exist weights x1, ..., xN such that

    x1 M1 + x2 M2 + ... + xN MN = b

A solution exists when b is in the span of the columns of M.

Systems of Linear Equations: Key Questions (Continued)

Is the solution unique? Suppose there exist weights y1, ..., yN, not all zero, such that

    y1 M1 + y2 M2 + ... + yN MN = 0

Then if M x = b, also M (x + y) = b.

A solution is unique only if the columns of M are linearly independent.
Systems of Linear Equations: Key Questions, Square Matrices

Given M x = b, where M is square:

If a solution exists for any b, then the solution for a specific b is unique.
For a solution to exist for any b, the columns of M must span all N-length vectors. Since there are only N columns of the matrix M to span this space, these vectors must be linearly independent.

A square matrix with linearly independent columns is said to be nonsingular.

Gaussian Elimination Basics: Important Properties

Gaussian Elimination method for solving M x = b:
- A direct method
- Finite termination for an exact result (ignoring roundoff)
- Produces accurate results for a broad range of matrices
- Computationally expensive

Reminder by example, 3 x 3:

    [ M11  M12  M13 ] [ x1 ]   [ b1 ]
    [ M21  M22  M23 ] [ x2 ] = [ b2 ]
    [ M31  M32  M33 ] [ x3 ]   [ b3 ]

    M11 x1 + M12 x2 + M13 x3 = b1
    M21 x1 + M22 x2 + M23 x3 = b2
    M31 x1 + M32 x2 + M33 x3 = b3

Gaussian Elimination Basics: Reminder by Example

Key idea: use Eqn 1 to eliminate x1 from Eqns 2 and 3. Subtract (M21/M11) times Eqn 1 from Eqn 2, and (M31/M11) times Eqn 1 from Eqn 3:

    M11 x1 + M12 x2 + M13 x3 = b1
    (M22 - (M21/M11) M12) x2 + (M23 - (M21/M11) M13) x3 = b2 - (M21/M11) b1
    (M32 - (M31/M11) M12) x2 + (M33 - (M31/M11) M13) x3 = b3 - (M31/M11) b1

In the matrix picture, M11 is the pivot and M21/M11, M31/M11 are the multipliers:

    [ M11   M12                   M13                  ] [ x1 ]   [ b1                ]
    [ 0     M22 - (M21/M11) M12   M23 - (M21/M11) M13  ] [ x2 ] = [ b2 - (M21/M11) b1 ]
    [ 0     M32 - (M31/M11) M12   M33 - (M31/M11) M13  ] [ x3 ]   [ b3 - (M31/M11) b1 ]

Gaussian Elimination Basics: Reminder by Example

Remove x2 from Eqn 3. The updated entry M22 - (M21/M11) M12 is the new pivot, and the multiplier is (M32 - (M31/M11) M12) / (M22 - (M21/M11) M12). Subtracting the multiplier times Eqn 2 from Eqn 3 zeros the x2 term in Eqn 3 and updates the last right-hand side entry to

    b3 - (M31/M11) b1 - [ (M32 - (M31/M11) M12) / (M22 - (M21/M11) M12) ] (b2 - (M21/M11) b1)

Gaussian Elimination Basics: Reminder by Example

Simplify the notation: write the updated entries after the first step as M'jk = Mjk - (Mj1/M11) M1k and b'j = bj - (Mj1/M11) b1, so the system is

    [ M11   M12    M13  ] [ x1 ]   [ b1  ]
    [ 0     M'22   M'23 ] [ x2 ] = [ b'2 ]
    [ 0     M'32   M'33 ] [ x3 ]   [ b'3 ]

Then remove x2 from Eqn 3, with M'22 as the pivot and M'32/M'22 as the multiplier:

    [ M11   M12    M13   ] [ x1 ]   [ b1                    ]
    [ 0     M'22   M'23  ] [ x2 ] = [ b'2                   ]
    [ 0     0      M''33 ] [ x3 ]   [ b'3 - (M'32/M'22) b'2 ]

Gaussian Elimination Basics: Reminder by Example

GE yields a triangular system (the U entries and the y entries were altered during GE):

    [ U11  U12  U13 ] [ x1 ]   [ y1 ]
    [ 0    U22  U23 ] [ x2 ] = [ y2 ]
    [ 0    0    U33 ] [ x3 ]   [ y3 ]

Backward substitution:

    x3 = y3 / U33
    x2 = (y2 - U23 x3) / U22
    x1 = (y1 - U12 x2 - U13 x3) / U11

Gaussian Elimination Basics: Reminder by Example

The right-hand side updates:

    y1 = b1
    y2 = b2 - (M21/M11) b1
    y3 = b3 - (M31/M11) b1 - (M'32/M'22) (b2 - (M21/M11) b1)

Equivalently, y solves a lower triangular system built from the multipliers:

    [ 1          0           0 ] [ y1 ]   [ b1 ]
    [ M21/M11    1           0 ] [ y2 ] = [ b2 ]
    [ M31/M11    M'32/M'22   1 ] [ y3 ]   [ b3 ]

Gaussian Elimination Basics: Reminder by Example

Fitting the pieces together: the multipliers are stored in the locations they zeroed, so a single array holds both factors, L below the diagonal and U on and above it:

    [ M11        M12         M13   ]
    [ M21/M11    M'22        M'23  ]
    [ M31/M11    M'32/M'22   M''33 ]

Basics of LU Factorization

Solve M x = b:
Step 1: factor M = L U
Step 2: forward elimination - solve L y = b
Step 3: backward substitution - solve U x = y

Recall from basic linear algebra that a matrix M can be factored into the product of a lower and an upper triangular matrix using Gaussian elimination. The basic idea of Gaussian elimination is to use equation one to eliminate x1 from all but the first equation. Then equation two is used to eliminate x2 from all but the second equation. This procedure continues, reducing the system to upper triangular form as well as modifying the right-hand side.

Basics of LU Factorization: Solving Triangular Systems

    [ l11   0     ...   0   ] [ y1 ]   [ b1 ]
    [ l21   l22         ... ] [ y2 ] = [ b2 ]
    [ ...               0   ] [ .. ]   [ .. ]
    [ lN1   ...         lNN ] [ yN ]   [ bN ]

The first equation has only y1 as an unknown.
The second equation has only y1 and y2 as unknowns.

Basics of LU Factorization: Solving Triangular Systems, Algorithm

    y1 = (1/l11) b1
    y2 = (1/l22) (b2 - l21 y1)
    y3 = (1/l33) (b3 - l31 y1 - l32 y2)
    ...
    yN = (1/lNN) (bN - sum over i = 1 ... N-1 of lNi yi)

Solving a triangular system of equations is straightforward but expensive. y1 can be computed with one divide; y2 can be computed with a multiply, a subtraction and a divide. Once y1 through y(k-1) have been computed, yk can be computed with k-1 multiplies, k-2 adds, a subtraction and a divide. Roughly, the number of arithmetic operations is

    N divides + (0 + 1 + ... + (N-1)) add/subs + (0 + 1 + ... + (N-1)) mults
    = N(N-1)/2 add/subs + N(N-1)/2 mults + N divides

Order N^2 operations.
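A direct Python transcription of this forward-substitution loop (NumPy assumed; a minimal sketch without error checking):

    import numpy as np

    def forward_substitute(L, b):
        """Solve L y = b for lower-triangular L; O(N^2) work."""
        N = L.shape[0]
        y = np.zeros(N)
        for k in range(N):
            # subtract the already-computed terms, then one divide
            y[k] = (b[k] - L[k, :k] @ y[:k]) / L[k, k]
        return y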

Basics of LU Factorization: Factoring Picture

[Animation: a 4 x 4 matrix M. Step 1 replaces M21, M31, M41 with the multipliers M21/M11, M31/M11, M41/M11 and updates the remaining 3 x 3 block; step 2 stores multipliers in the second column and updates the trailing 2 x 2 block; step 3 stores the last multiplier and updates M44.]

The above is an animation of LU factorization. In the first step, the first equation is used to eliminate x1 from the 2nd through 4th equations. This involves multiplying row 1 by a multiplier and then subtracting the scaled row 1 from each of the target rows. Since such an operation would zero out the a21, a31 and a41 entries, we can replace those zeroed entries with the scaling factors, also called the multipliers. For row 2, the scale factor is a21/a11, because if one multiplies row 1 by a21/a11 and then subtracts the result from row 2, the resulting a21 entry would be zero. Entries a22, a23 and a24 would also be modified during the subtraction, and this is noted by changing the color of these matrix entries to blue. As row 1 is used to zero a31 and a41, a31 and a41 are replaced by multipliers. The remaining entries in rows 3 and 4 will be modified during this process, so they are recolored blue.
This factorization process continues with row 2. Multipliers are generated so that row 2 can be used to eliminate x2 from rows 3 and 4, and these multipliers are stored in the zeroed locations. Note that as entries in rows 3 and 4 are modified during this process, they are converted to green. The final step is to use row 3 to eliminate x3 from row 4, modifying row 4's entry, which is denoted by converting a44 to pink.
It is interesting to note that as the multipliers are standing in for zeroed matrix entries, they are not modified during the factorization.

LU Basics: Factoring Algorithm

For i = 1 to n-1 {                      (for each source row)
    For j = i+1 to n {                  (for each target row below the source)
        Mji = Mji / Mii                 (Mii is the pivot; Mji becomes the multiplier)
        For k = i+1 to n {              (for each row element beyond the pivot)
            Mjk = Mjk - Mji Mik
        }
    }
}
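The same algorithm in Python (NumPy assumed; in-place, no pivoting, a sketch for exposition rather than production use):

    import numpy as np

    def lu_factor_in_place(M):
        """Overwrite M with its LU factors: multipliers below the diagonal,
        U on and above it. Assumes no zero pivots are encountered."""
        n = M.shape[0]
        for i in range(n - 1):                 # pivot M[i, i]
            M[i+1:, i] /= M[i, i]              # multipliers stored in column i
            M[i+1:, i+1:] -= np.outer(M[i+1:, i], M[i, i+1:])  # rank-1 update
        return M

    # Example
    A = np.array([[4.0, 2.0], [2.0, 3.0]])
    LU = lu_factor_in_place(A.copy())          # LU == [[4, 2], [0.5, 2]]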

LU Basics: Factoring, Zero Pivots

At step i, the factored portion (the multipliers, L) occupies the first i-1 columns below the diagonal, and row i is the current pivot row.

What if Mii = 0? The multiplier Mji/Mii cannot be formed.
Simple fix (partial pivoting): if Mii = 0, find Mji != 0 for some j > i and swap row j with row i.

Swapping row j with row i at step i of LU factorization is identical to applying LU to a matrix with its rows swapped a priori. To see this, consider swapping rows before beginning LU: rows ri and rj of M trade places, and entries bi and bj of b trade places with them. Swapping rows corresponds to reordering only the equations; notice that the vector of unknowns is NOT reordered.

LU Basics: Factoring, Zero Pivots

Two important theorems:
1) Partial pivoting (swapping rows) always succeeds if M is nonsingular.
2) LU factorization applied to a strictly diagonally dominant matrix will never produce a zero pivot.

Theorem: Gaussian elimination applied to strictly diagonally dominant matrices will never produce a zero pivot.
Proof sketch:
1) Show the first step succeeds.
2) Show the (n-1) x (n-1) submatrix produced by the first step is still strictly diagonally dominant; the result then follows by induction.
For the first step, strict diagonal dominance gives |a11| > sum over j = 2 ... n of |a1j| >= 0, so a11 != 0. After the first step, the second row becomes

    ( 0,  a22 - (a21/a11) a12,  a23 - (a21/a11) a13,  ...,  a2n - (a21/a11) a1n )

and the question is whether

    |a22 - (a21/a11) a12| > sum over j = 3 ... n of |a2j - (a21/a11) a1j| ?

which follows from the triangle inequality together with the dominance of rows 1 and 2.
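A quick numerical check of the theorem (Python with NumPy assumed; the random test matrix is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n))
    A += np.diag(np.abs(A).sum(axis=1) + 1.0)  # force strict diagonal dominance

    M = A.copy()
    for i in range(n - 1):                     # unpivoted LU
        M[i+1:, i] /= M[i, i]
        M[i+1:, i+1:] -= np.outer(M[i+1:, i], M[i, i+1:])
    print(np.diag(M))                          # every pivot stays nonzero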

LU Basics: Numerical Problems, Small Pivots

Contrived example:

    [ 10^-17   1 ] [ x1 ]   [ 1 ]
    [ 2        1 ] [ x2 ] = [ 3 ]

    L = [ 1         0 ]      U = [ 10^-17   1            ]
        [ 2x10^17   1 ]          [ 0        1 - 2x10^17  ]

Can we represent 1 - 2x10^17 in floating point?

In order to compute the exact solution, first forward eliminate:

    [ 1         0 ] [ y1 ]   [ 1 ]
    [ 2x10^17   1 ] [ y2 ] = [ 3 ]

and therefore y1 = 1, y2 = 3 - 2x10^17. Backward substitution yields

    [ 10^-17   1           ] [ x1 ]   [ 1           ]
    [ 0        1 - 2x10^17 ] [ x2 ] = [ 3 - 2x10^17 ]

and therefore x2 = (3 - 2x10^17) / (1 - 2x10^17), which is approximately 1, and x1 = 10^17 (1 - x2), which is also approximately 1.

In the rounded case, both 3 - 2x10^17 and 1 - 2x10^17 round to -2x10^17, so

    y1 = 1,  y2 = -2x10^17,  x2 = 1,  x1 = 10^17 (1 - x2) = 0.

LU Basics: Numerical Problems, Small Pivots

An aside on floating-point arithmetic. A double-precision number is stored in 64 bits:

    sign (1 bit) | exponent (11 bits) | mantissa (52 bits)

Basic problem: only about 16 significant digits survive, so for example

    1.0 + 0.000000000000001 evaluates to 1.0   (the small term is rounded away)

Key issue: avoid small differences between large numbers!

LU Basics

Numerical Problems
Small Pivots

Back to the contrived example


LU Exact
LU Rounded

1
= 17
10
1
= 17
10

x1
1
=
x
1

2 Exact

0 1017
1 0
0 1017

1 0

x1 1
=
2 1017 x2 3
1

1 x1 1
=
1017 x2 3

x1
0
=
x

2 Rounded 1

SMA-HPC 2003 MIT

27

Numerical Problems

LU Basics

Small Pivots

Partial Pivoting for Roundoff reduction

If | M ii | < max | M ji |
j >i

Swap row i with arg (max | M ij |)


j >i

1
LU reordered = 17
10

0
1

2
1

0 1 + 2 1017

This multiplier
is small

This term gets


rounded

SMA-HPC 2003 MIT

To see why pivoting helped notice that

0 y1 3
=
1 y2 1
yields y1 = 3, y 2 = 1 1017 1
Notice that without partial pivoting y2 was 3-1017 or -1017 with rounding.
The right hand side value 3 in the unpivoted case was rounded away, where as now
it is preserved. Continuing with the back substitution.
1
1017

2
1

0 1 + 2 1017

x2 1 x1 1

x1 3
=
x2 1

28

LU Basics

Numerical Problems
Small Pivots

If the matrix is diagonally dominant or partial pivoting


for round-off reduction is during
LU Factorization:
1) The multipliers will always be smaller than one in
magnitude.
2) The maximum magnitude entry in the LU factors
will never be larger than 2(n-1) times the maximum
magnitude entry in the original matrix.
SMA-HPC 2003 MIT

To see why pivoting helped notice that

0 y1 3
=
1 y2 1
yields y1 = 3, y 2 = 1 1017 1
Notice that without partial pivoting y2 was 3-1017 or -1017 with rounding.
The right hand side value 3 in the unpivoted case was rounded away, where as now
it is preserved. Continuing with the back substitution.
1
1017

2
1

0 1 + 2 1017

x2 1 x1 1

x1 3
=
x2 1

29

Hard to Solve
Systems

Fitting Example

Polynomial Interpolation
Table of Data
f
t0
f (t0)
t1 f (t1)
f (t0)

tN f (tN)

t0 t1

t2

tN

Problem fit data with an Nth order polynomial


f (t ) = 0 + 1 t + 2 t 2 + + N t N
SMA-HPC 2003 MIT

30

Example Problem

Hard to Solve
Systems

Matrix Form
1

t0

t1

t0

t0

t1

t1

tN

tN

tN

0 f (t0 )

1 f (t1 )

N f (t N )

M interp
SMA-HPC 2003 MIT

The kth row in the system of equations on the slide corresponds to insisting that the
Nth order polynomial match the data exactly at point tk. Notice that we selected the
order of the polynomial to match the number of data points so that a square system
is generated. This would not generally be the best approach to fitting data, as we
will see in the next slides.

31

Hard to Solve
Systems

Fitting Example
Fitting f(t) = t

Coefficient
Value

Coefficient number
SMA-HPC 2003 MIT

Notice what happens when we try to fit a high order polynomial to a function that is
nearly t. Instead of getting only one coefficient to be one and the rest zero, instead
when 100th order polynomial is fit to the data, extremely large coefficients are
generated for the higher order terms. This is due to the extreme sensitivity of the
problem, as we shall see shortly.

32

Perturbation Analysis

Hard to Solve
Systems

Measuring Error Norms

Vector Norms
L2 (Euclidean) norm :

i=1

L1 norm :

Unit circle
n

xi

< 1

xi

i=1

< 1

L norm :

= max
i

xi

Square

< 1

SMA-HPC 2003 MIT

33

Perturbation Analysis

Hard to Solve
Systems

Measuring Error Norms

Matrix Norms
Vector induced norm :

A = max
x

Ax
x

= max

x =1

Ax

Induced norm of A is the maximum magnification of x by A


Easy norms to compute:
n

A
A

= max

1
0
j
i=1
Why? Let x =

n
= max
A ij = max abs row sum 1 0
i

j =1

Why? Let x =

= not so easy to compute!!


1

A ij = max abs column sum

SMA-HPC 2003 MIT

34

Hard to Solve
Systems

Perturbation Analysis

Perturbation Equation

(M + M ) ( x + x) = M x + M x + M x + M x =
Models LU
Roundoff

b
Unperturbed
RightHandSide

Models Solution
Perturbation

Since M x - b = 0

M x = M ( x + x ) x = M 1 M ( x + x )
1
Taking Norms
x M 1 MM x + xM x + x
M
x
M
1

M
M
Relative Error Relation
M
x + x
"Condition
Number "

SMA-HPC 2003 MIT

As the algebra on the slide shows the relative changes in the solution x is bounded
by an M
- dependent factor times the relative changes in M. The factor

|| M 1 || || M ||
was historically referred to as the condition number of M, but that definition has
been abandoned as then the condition number is norm
- dependent. Instead the
condition number of M is the ratio of singular values of M.
cond ( M ) =

max( M )
min( M )

Singular values are outside the scope of this course, consider consulting Trefethen
& Bau.

35

Perturbation Analysis

Hard to Solve
Systems

Geometric Approach is clearer


M = [M1 M 2 ], Solving M x = b is finding x1 M1 + x2 M 2 = b
x2

M2
|| M 2 ||

x2
M1
|| M 1 ||

x1

0
Case
1 1 orthogonal
Columns
0 106

M2
|| M 2 ||

M1
|| M 1 ||

x1

1
1 106
Case
1 nearly
Columns
aligned

6
1 10

When vectors are nearly aligned, difficult to determine


how much of M 1 versus how much of M 2
SMA-HPC 2003 MIT

36

Geometric Analysis

Hard to Solve
Systems

Polynomial Interpolation

log(cond(M))
~1020

1020

1010

~1013

1015
~314

~106

t2
16

32

The power series polynomials


are nearly linearly dependent

t1
t2

t12
t 22

tN

t N2

SMA-HPC 2003 MIT

Question Does row


- scaling reduce growth ?

0
d 11 0
0

0 dNN
0

a11

aN 1

a1N d 11 a11

=


aNN dNN aN 1

d 11 a1N

dNN aNN

Does row
- scaling reduce condition number ?

|| M || || M 1 || condition number of M
Theorem If floating point arithmetic is used, then row scaling (D M x = D b) will
not reduce growth in a meaningful way.

If
LU x = b
M
and
D M LU x ' = D b
then
x = x ' No roundoff reduction

37

Summary
Solution Existence and Uniqueness
Gaussian Elimination Basics
LU factorization
Pivoting and Growth
Hard to solve problems
Conditioning

38

Introduction to Simulation - Lecture 4


Direct Methods for Sparse Linear Systems
Luca Daniel

Thanks to Deepak Ramaswamy, Michal


Rewienski, Karen Veroy and Jacob White

Outline
LU Factorization Reminder.
Sparse Matrices
Struts and joints, resistor grids, 3-d heat flow

Tridiagonal Matrix Factorization


General Sparse Factorization
Fill-in and Reordering
Graph Based Approach

Sparse Matrix Data Structures


Scattering

Factoring

LU Basics

Picture

M 11
M
M
M 21
M
M
M 31
M
M
M 41
21

11

31

11

41

11

M 12 M 13 M 14

M 2222 M
M 23
23 M 24
24

M
M 3333 M 34

M
M
32
34
M

M
M
M
M
M
M444444
M
43
42
43
43
42
M
M
32

22

42

43

22

33

SMA-HPC 2003 MIT

The above is an animation of LU factorization. In the first step, the first equation is
used to eliminate x1 from the 2nd through 4th equation. This involves multiplying
row 1 by a multiplier and then subtracting the scaled row 1 from each of the target
rows. Since such an operation would zero out the a21, a31 and a41 entries, we can
replace those zerod entries with the scaling factors, also called the multipliers. For
row 2, the scale factor is a21/a11 because if one multiplies row 1 by a21/a11 and
then subtracts the result from row 2, the resulting a21 entry would be zero. Entries
a22, a23 and a24 would also be modified during the subtraction and this is noted by
changing the color of these matrix entries to blue. As row 1 is used to zero a31 and
a41, a31 and a41 are replaced by multipliers. The remaining entries in rows 3 and 4
will be modified during this process, so they are recolored blue.
This factorization process continues with row 2. Multipliers are generated so that
row 2 can be used to eliminate x2 from rows 3 and 4, and these multipliers are
stored in the zerod locations. Note that as entries in rows 3 and 4 are modified
during this process, they are converted to gr een. The final step is to used row 3 to
eliminate x3 from row 4, modifying row 4s entry, which is denoted by converting
a44 to pink.
It is interesting to note that as the multipliers are standing in for zerod matrix
entries, they are not modified during the factorization.

Factoring

LU Basics

Algorithm

For i = 1 to n-1 {
For j = i+1 to n {

M ji =

M ji
M ii

For each Row


For each target Row below the source

Pivot

n 1

(n i ) =
i =1

n2
2

multipliers

For k = i+1 to n { For each Row element beyond Pivot

M jk M jk M ji M ik
}

Multiplier

n 1

(n i)
i =1

2 3
n
3

Multiply-adds

}
}
SMA-HPC 2003 MIT

Factoring

LU Basics

Theorem about Diagonally


Dominant Matrices

A) LU factorization applied to a strictly


diagonally dominant matrix will never produce
a zero pivot
B) The matrix entries produced by LU
factorization applied to a strictly diagonally
dominant matrix will never increase by more
than a factor 2(n-1)
SMA-HPC 2003 MIT

Theorem Gaussian Elimination applied to strictly diagonally dominant matrices


will never produce a zero pivot.
Proof
1) Show the first step succeeds.
2) Show the (n - 1)x (n - 1) sub matrix

n-1
n

n-1
n

is still strictly diagonally dominant.


First Step

as
a11 0

| a11 |> | aij |


j =2

Second row after first step

0, a 22

Is
a 22

a 21
a 21
a 21
a12, a 23
a13,, a 2n
a1n
a11
a11
a11

a 21
a 21
a12 > a 2 j
a1 j ?
a11
a11

Sparse Matrices

Applications
Space Frame

Nodal Matrix
Space Frame
5

3
4

2
1

X X
X X

X X

Unknowns : Joint positions


Equations : forces = 0

X
X X
X X X
X X X
X X X
X X
X

X
X X
X X
X X
X X
X

i
X =
i

X
X
X

X
X

i
i

2 x 2 b lo c k

SMA-HPC 2003 MIT

Applications

Sparse Matrices
1

m +1

Resistor Grid

m+2

m 1

m+3

2m

m2

(m 1) (m + 1)

Unknowns : Node Voltages


Equations : currents = 0

SMA-HPC 2003 MIT

The resistive grid is an important special case, as it is a model for discretized partial
differential equations (we will see this later).
Lets consider the nodal matrix and examine the locations and number of non zeros.
The matrix has a special form which is easy to discern from a 4 x 4 example. I n the
4 x 4 case the nodal matrix is

x
x

x
x
x

x
x
x

x
x

x
x
x
x

x
x

x
x

x
x
x
x

x
x
x

x
x
x
x

x
x

x
x
x

x
x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

The tridiagonal blocks are due to the interaction between contiguously numbered
nodes along a single row in the grid. The non zeros, a distance 4 from the diagonals
are due to the inter row coupling between the diagonals.

Sparse Matrices
Nodal Formulation

Applications
Resistor Grid

Matrix non-zero locations for 100 x 10 Resistor Grid

Sparse Matrices
Nodal Formulation

Applications
Temperature in a cube

Temperature known on surface, determine interior temperature

m2 + 1

m2 + 2

Circuit
Model

m +1

m+2

Sparse Matrices
Nodal Formulation
1

Tridiagonal Example

X X
X X X

X X X

X X

Matrix Form

m 1

X
X

X
X

X
X

X X
X X

10

Sparse Matrices
For i = 1 to n-1 {
For j = i+1 to n {

M ji =

M ji
M ii

Tridiagonal Example
GE Algorithm

For each Row


For each target Row below the source

Pivot

For k = i+1 to n { For each Row element beyond Pivot

M jk M jk M ji M ik
}

Multiplier

Order N Operations!

}
}
SMA-HPC 2003 MIT

11

Fill-In

Sparse Matrices
R4

V3

Resistor Example

V2

V1

R1

R2

Example

R5

iS 1

R3

Nodal Matrix

R1 + R1
1
R
R1
1

1
R4

1
R2
1
1
+
R2 R3

1
R4

1
1
+
R4 R5

V1 0
V = 0 Symmetric
2 Diagonally Dominant
V3 iS1

SMA-HPC 2003 MIT

Recalling from lecture 2, the entries in the nodal matrix can be derived by noting
that a resistor, as

V n1

ik

Vn2
Rk

contributes to four locations in the nodal matrix as shown below.

n1

n2

n1

n2

1
1

Rk
Rk

1
1

Rk
Rk

It is also resisting to note that Gii is equal to the sum of the conductances (one over
resistance) incident at node i.

12

Sparse Matrices
Matrix Non zero structure

X X X
X X 0

X 0 X

Fill-In
Example

Matrix after one LU step

X X X
X X X
0

0 X
X X

X= Non zero
SMA-HPC 2003 MIT

During a step of LU factorization a multiple of a source row will be subtracted from


a row below it. Since these two rows will not necessarily have non zeros in the same
columns, the result of the subtraction might be to introduce additional non zeros into
the target row.
As a simple example, consider LU factoring

a11 a12

a21 0
The result is

a11
a
21
a11

a12

a a

21 12

a11

Notice that the factored matrix has a non zero entry in the bottom right corner,
where as the original matrix did not. This changing of a zero entry to a non zero
entry is referred to as a fill-in.

13

Fill-In

Sparse Matrices

Second Example

Fill-ins Propagate

X
X

X
0

X
0

X
0

X
0

X
0

Fill-ins from Step 1 result in Fill-ins in step 2


SMA-HPC 2003 MIT

In the example, the 4 x 4 mesh begins with 7 zeros. During the LU factorization, 5
of the zeros become non zero. What is of additional concern is the problem of fill-ins.
The first step of LU factorization where a multiple of the first row is subtracted
from the second row, generates fill-ins in the third and fourth column of row two.
When multiples of row 2 are subtracted from row 3 and row 4, the fill-ins generated
in row 2 generate second- level fill-ins in rows 3 and 4.

14

Sparse Matrices
V3
V1

V2

0
V3
V2

V1

Fill-In
Reordering

x
x

x
x

x
x
x
x

x
x Fill-ins

x
0
x No Fill-ins

Node Reordering Can Reduce Fill-in


- Preserves Properties (Symmetry, Diagonal Dominance)
- Equivalent to swapping rows and columns
SMA-HPC 2003 MIT

In the context of the nodal equation formulation, renumbering the nodes seems like
a simple operation to reduce fill-in, as selecting the node numbers was arbitrary to
begin with. Keep in mind, however, that such a renumbering of nodes in the nodal
equation formulation corresponds to swapping both rows and columns in the matrix.

15

Fill-In

Sparse Matrices

Reordering

Where can fill-in occur ?

Possible Fill-in

x Locations

x
x

Already Factored
Multipliers

x
x
x

x
x
x

Fill-in Estimate = (Non zeros in unfactored part of Row -1)


(Non zeros in unfactored part of Col -1)
Markowitz product
SMA-HPC 2003 MIT

16

Sparse Matrices

Fill-In
Reordering

Markowitz Reordering

For i = 1 to n
Find diagonal j i with min Markowitz Product

Swap rows j i and columns j i

Factor the new row i and determine fill-ins

End
Greedy Algorithm !
SMA-HPC 2003 MIT

In order to understand the Markowitz reordering algorithm, it is helpful to consider


the cost of the algorithm. The first step is to determine the diagonal with the
minimum Markowitz product. The cost of this step is

K i N operations
where K is the average number of non zeros per row.
The second step of the algorithm is to swap rows and columns in the factorization.
A good data structure will make the swap inexpensively.
The third step is to factor the reordered matrix and insert the fill-ins. If the matrix is
very sparse, this third step will also be inexpensive.
Since one must then find the diagonal in the updated matrix with the minimum
Markowitz product, the products must be computed at a cost of

K i ( N 1) operations

1
KN 2 operations will be needed just to compute
2
the Markowitz products in a reordering algorithm.
It is possible to improve the situation by noting that very few Markowitz products
will change during a single step of the factorization. The mechanics of such an
optimization are easiest to see by examining the graphs of a matrix.
Continuing, it is clear that

17

Fill-In

Sparse Matrices

Reordering

Why only try diagonals ?


Corresponds to node reordering in Nodal formulation
1

0
3

0
2

Reduces search cost


Preserves Matrix Properties
- Diagonal Dominance
- Symmetry
SMA-HPC 2003 MIT

18

Fill-In

Sparse Matrices

Pattern of a Filled-in Matrix

Very Sparse

Very Sparse

Dense

SMA-HPC 2003 MIT

19

Sparse Matrices

Fill-In
Unfactored Random Matrix

SMA-HPC 2003 MIT

20

Sparse Matrices

Fill-In
Factored Random Matrix

SMA-HPC 2003 MIT

21

Matrix Graphs

Sparse Matrices

Construction

Structurally Symmetric Matrices and Graphs


X

X X

X X

X X

1
2

X X X
X

X X

4
5

One Node Per Matrix Row


One Edge Per Off-diagonal Pair
SMA-HPC 2003 MIT

In the case where the matrix is structurally symmetric ( aij 0 if and only if
a ji 0), an undirected graph can be associated with the matrix.
The graph has

1 node per matrix row


1 edge between node i and node j if a ij 0
The graph has two important properties
1) The node degree squared yields the Markowitz product.
2) The graph can easily be updated after one step of factorization.
The graph makes efficient a two -step approach to factoring a structurally
symmetric matrix. First one determines an ordering which produces little fill by
using the graph. Then, one numerically factors the matrix in the graph-determined
order.

22

Sparse Matrices
X

X
X

Matrix Graphs
Markowitz Products

1
2

X
X
X

X
X

4
5

Markowitz Products = (Node Degree)2

M 11 3 i 3 = 9
M 22 2 i 2 = 4
M 33 3 i 3 = 9
M 44 2 i 2 = 4
M 55 2 i 2 = 4

(degree 1) 2 = 9
(deg ree 2) 2 = 4
(deg ree 3) 2 = 9
(degree 4) 2 = 4
(degree 5) 2 = 4

SMA-HPC 2003 MIT

That the ith node degree squared is equal to the Markowitz product associated with
the ith diagonal is easy to see. The node degree is the number of edges emanating
from the node, and each edge represents both an off-diagonal row entry and an offdiagonal column entry. Therefore, the number of off-diagonal row entries multiplied
by the number of off-diagonal column entries is equal to the node degree squared.

23

Matrix Graphs

Sparse Matrices

Factorization

One Step of LU Factorization


X

X X

X X

X X

1
2

X X X
X

X X

3
4
5

Delete the node associated with pivot row


Tie together the graph edges
SMA-HPC 2003 MIT

One step of LU factorization requires a number of floating point operations and


produces a reduced matrix, as below

factore d

f

unfactored act unfactored


or (includes fill-in)
de

After step i in the factorization, the unfactored portion of the matrix is smaller of
size (i - 1)x ( i - 1 ) , and may be denser if there are fill-ins. The graph can be used to
represent the location of non zeros in the unfactored portion of the matrix, but two
things must change.
1) A node must be removed as the unfactored portion has one fewer row.
2) The edges associated with fill-ins must be added.
In the animation, we show by example how the graph is updated during a step of LU
factorization. We can state the manipulation precisely by noting that if row i is
eliminated in the matrix, the node i must be eliminated from the graph. In addition,
all nodes adjacent to node i ( adjacent nodes are ones connected by an edge) will be
made adjacent to each other by adding the necessary edges. The added edges
represent fill- in.

24

Matrix Graphs

Sparse Matrices
x
x

x
x

x
x
x

x
x
x

x
x

x
x

x
x

Markowitz products
( = Node degree)

SMA-HPC 2003 MIT

Example

Graph
1
2

Col Row
3
3
= 9
2
2
= 4

3
4

3
3

3
3

= 9
= 9

= 9

25

Matrix Graphs

Sparse Matrices

Example

Swap 2 with 1
x
x

x
x

x
x

x
x

x
x

Graph

SMA-HPC 2003 MIT

Examples that factor with no fill-in


Tridiagonal

1
A

Another ordering for the tridiagonal matrix that is more parallel

1
A

3
E

4
A

E
A

G
E

26

Graphs

Sparse Matrices
1

m +1

Resistor Grid Example

m+2

m 1

m+3

2m

m2

(m 1) (m + 1)

Unknowns : Node Voltages


Equations : currents = 0

SMA-HPC 2003 MIT

The resistive grid is an important special case, as it is a model for discretized partial
differential equations (we will see this later).
Lets consider the nodal matrix and examine the locations and number of non zeros.
The matrix has a special form which is easy to discern from a 4 x 4 example. I n the
4 x 4 case the nodal matrix is

x
x

x
x
x

x
x
x

x
x

x
x
x
x

x
x

x
x

x
x
x
x

x
x
x

x
x
x
x

x
x

x
x
x

x
x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

The tridiagonal blocks are due to the interaction between contiguously numbered
nodes along a single row in the grid. The non zeros, a distance 4 from the diagonals
are due to the inter row coupling between the diagonals.

27

Sparse Matrices

Matrix Graphs
Grid Example

How long does it take to factor an M x M grid

Suppose the center column is eliminated last?


SMA-HPC 2003 MIT

A quick way to get a rough idea of how long it takes to factor the M2 x M2 matrix
associated with an M x M grid like a resistor array is to examine the graph. If one
orders the center column of M nodes in the graph last then they will be completely
connected as shown in the animation. However, a completely connected graph
corresponds to a dense matrix.
Since the resulting dense matrix requires M3 operations to factor, this suggests that
factoring an M x M grid costs something M3 operations, though making such an
argument precise is beyond the scope of the course.

28

Sparse Factorization Approach


1) Assume matrix requires NO numerical pivoting.
Diagonally dominant or symmetric positive definite.

2) Use Graphs to Determine Matrix Ordering


Many graph manipulation tricks used.

3) Form Data Structure for Storing Filled-in Matrix


Lots of additional nonzeros added

4) Put numerical values in Data Structure and factor


Computation must be organized carefully!

29

Sparse Matrices
Vector of row
pointers
1

Sparse Data Structure

Arrays of Data in a Row

Val 11 Val 12

Val 1K

Col 11 Col 12

Col 1K

Matrix entries
Column index

Val 21 Val 22

Val 2L

Col 21 Col 22

Col 2L

Val N1 Val N2

Val Nj

Col N1 Col N2

Col Nj

SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

30

Sparse Matrices

Sparse Data Structure


Problem of Misses

Eliminating Source Row i from Target row j


Row i

Row j

M i ,i +1

M i ,i + 7

M i ,i +15

i +1

i+7

i + 15

M i ,i +1

M i ,i + 4

M i ,i + 5

M i ,i + 7

M i ,i + 9

M i ,i +12

M i ,i +15

i +1

i+4

i+5

i+7

i+9

i + 12

i + 15

Must read all the row j entries to find the 3 that match row i

SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

31

Sparse Matrices

Sparse Data Structure


Data on Misses

Rows

Ops

Misses

Res

300

904387 248967

RAM

2806

1017289 3817587

Grid

4356

3180726 3597746

More
misses
than
ops!

Every Miss is an unneeded Memory Reference!


SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

32

Sparse Matrices

Sparse Data Structure


Scattering for Miss Avoidance

Row j

M i ,i +1

M i ,i +1
i +1

M i ,i + 4
i+4

M i , i + 4 M i ,i + 5

M i ,i + 5
i+5

M i ,i + 7

M i ,i + 7
i+7

M i ,i + 9

M i ,i + 9
i+9

M i ,i +12
i + 12

M i ,i +12

M i ,i +15
i + 15

M i ,i +15

1) Read all the elements in Row j, and scatter them in an n-length vector
2) Access only the needed elements using array indexing!
SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

33

Summary
LU Factorization and Diagonal Dominance.
Factor without numerical pivoting

Sparse Matrices
Struts, resistor grids, 3-d heat flow -> O(N) nonzeros

Tridiagonal Matrix Factorization


Factor in O(N) operations

General Sparse Factorization


Markowitz Reordering to minize fill

Graph Based Approach


Factorization and Fill-in
Useful for estimating Sparse GE complexity

34

Introduction to Simulation - Lecture 5


QR Factorization
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

QR Factorization

Singular Example
LU Factorization Fails

Strut

Joint
Load force

The resulting nodal matrix is SINGULAR, but a solution exists!


SMA-HPC 2003 MIT

QR Factorization

Singular Example
LU Factorization Fails

v1

v2

1 v3

v4

The resulting nodal matrix is SINGULAR, but a solution exists!


SMA-HPC 2003 MIT

QR Factorization

Singular Example

Recall weighted sum of columns view of


systems of equations

M1

M2

x1 b1

x2 b2
MN
=


xN bN

x1M 1 + x2 M 2 +

+ xN M N = b

M is singular but b is in the span of the columns of M


SMA-HPC 2003 MIT

QR Factorization

Orthogonalization
If M has orthogonal columns

Orthogonal columns implies:

Mi M j = 0

i j

Multiplying the weighted columns equation by ith column:

M i x1M 1 + x2 M 2 +

+ xN M N = M i b

Simplifying using orthogonality:

xi M i M i = M i b xi =
SMA-HPC 2003 MIT

Mi b

(M

Mi

QR Factorization

Orthogonalization
Orthonormal M - Picture

M is orthonormal if:

Mi M j = 0

i j and

Mi Mi = 1

Picture for the two-dimensional case

M1

M1

M2
Non-orthogonal Case
SMA-HPC 2003 MIT

x1

x2
Orthogonal Case

M2

QR Factorization

Orthogonalization
QR Algorithm Key Idea

x1 b1

x2 b2
M
M
M
=
2
N
1

xN bN
Original Matrix

y
b

1 1

y2
b2

=
QN
Q1 Q2

yN bN
Matrix with
Orthonormal
Columns

Qy = b y = Q b
T

How to perform the conversion?


SMA-HPC 2003 MIT

Orthogonalization

QR Factorization

Projection Formula

Given M 1 , M 2 , find Q2 =M 2 r12 M 1 so that

M 1 Q2 = M 1 M 2 r12 M 1

) =0

M1 M 2
r12 =
M1 M1

M2

Q2
SMA-HPC 2003 MIT

r12

M1

Orthogonalization

QR Factorization

Normalization

Formulas simplify if we normalize


1
1
Q1 =
M 1 = M 1 Q1 Q1 = 1
r11
M1 M1

Now find Q2 =M 2 r12Q1 so that Q2 Q1 = 0


r12 = Q1 M 2
1

1
Finally Q2 =
Q2 = Q2
r
22
Q2 Q2

SMA-HPC 2003 MIT

QR Factorization

Orthogonalization
How was a 2x2 matrix
converted?

Since Mx should equal Qy, we can relate x to y


y1

x1
M 1 M 2 x = x1M 1 + x2 M 2 = Q1 Q2 y = y1Q1 + y2Q2
2

M 1 = r11Q1

M 2 = r22 Q2 + r12Q1
r11
0

SMA-HPC 2003 MIT

r12 x1 y1
=

r22 x2 y2

QR Factorization

M1


x1
M2 =
x2

Orthogonalization
The 2x2 QR Factorization

r11 r12 x1 b1
=
Q1 Q2

0 r22 x2 b2

Upper
Triangular
Orthonormal

Two Step Solve Given QR

Step 1) QRx = b Rx = QT b = b

Step 2) Backsolve Rx = b
SMA-HPC 2003 MIT

Orthogonalization

QR Factorization

The General Case

3x3 Case

M1

M2



M 3 M1

M 2 r12 M 1

To Insure the third column is orthogonal

(
(M

)
M )= 0

M 1 M 3 r13 M 1 r23 M 2 = 0
M2

SMA-HPC 2003 MIT

r13 M 1 r23

M 3 r13 M 1 r23 M 2

QR Factorization

(
(M

Orthogonalization
Must Solve Equations for
Coefficients in 3x3 Case

)
M )= 0

M 1 M 3 r13 M 1 r23 M 2 = 0
M2

M1 M1

M 2 M1
SMA-HPC 2003 MIT

r13 M 1 r23

M 1 M 2 r13 M 1 M 3

=
M 2 M 2 r23 M 2 M 3

QR Factorization

Orthogonalization
Must Solve Equations for
Coefficients

To Orthogonalize the Nth Vector

M1 M1

M N 1 M 1

M 1 M N 1 r1, N M 1 M N


M N 1 M N 1 rN 1, N M N 1 M N
3

N inner products requires N work


SMA-HPC 2003 MIT

QR Factorization

M1

Orthogonalization

3x3 Case

M2



M 3 M1

Use previously
orthogonalized vectors

M 2 r12Q1

M 3 r13Q1 r23Q2

To Insure the third column is orthogonal

Q1 M 3 Q1r13 Q2 r23 = 0 r13 = Q1 M 3


Q2 M 3 Q1r13 Q2 r23 = 0 r23 = Q2 M 3
SMA-HPC 2003 MIT

Basic Algorithm

QR Factorization
For i = 1 to N

rii = M i M i
1
Qi = M i
rii

For each Source Column


N

2N 2N

Normalize

For j = i+1 to N {

rij M j Qi
M j M j rij Qi
SMA-HPC 2003 MIT

Modified Gram-Schmidt

operations

i =1

For each target Column right of source


N

( N i)2 N N
i =1

operations

QR Factorization

Basic Algorithm
By Picture

Q1

Q2

Q3

SMA-HPC 2003 MIT

QN

r11
0

r12

r13

r22
0

r23
r33

r1N

r2 N
r3 N

rNN

QR Factorization

Basic Algorithm
By Picture

M1 1
Q

M
Q22

SMA-HPC 2003 MIT

Q33
M

Q
M44

r11 r12

r13

r14

r22

r23

r24

r33

r34
r44

QR Factorization

Basic Algorithm
Zero Column

What if a Column becomes Zero?

Q1


r11 r12 r13

0 0 0
MN
0 M3

0 0 0

0

Matrix MUST BE Singular!


0

r1N
0
0

1) Do not try to normalize the column.


2) Do not use the column as a source for orthogonalization.
3) Perform backward substitution as well as possible
SMA-HPC 2003 MIT

Basic Algorithm

QR Factorization

Zero Column Continued

Resulting QR Factorization

Q1

0 Q3
0

SMA-HPC 2003 MIT

QN

r11
0

0
0

r12
0

r13
0

r33

r1N
0
r3 N

rNN

QR Factorization

Singular Example

Recall weighted sum of columns view of


systems of equations

M1

M2

x b
1 1
x2 b2
=
MN


xN bN

x1M 1 + x2 M 2 +

+ xN M N = b

Two Cases when M is singular

Case 1) b span{M 1 ,.., M N } b span{Q1 ,.., QN }


Case 2) b span{M 1 ,.., M N }, How accurate is x ?
SMA-HPC 2003 MIT

QR Factorization

Minimization View
Alternative Formulations

Definition of the Residual R: R ( x ) b Mx

Find x which satisfies


Mx = b

Minimize over all x


N

R ( x ) R ( x ) = ( Ri ( x ) )
T

i =1

Equivalent if b span {cols ( M )}


T
Mx = b and min x R ( x ) R ( x ) = 0
Minimization extends to non-singular or nonsquare case!
SMA-HPC 2003 MIT

Minimization View

QR Factorization

One-dimensional
Minimization

Suppose x = x1e1 and therefore Mx = x1Me1 = x1M 1


One dimensional Minimization
R ( x ) R ( x ) = ( b x1Me1 ) ( b x1Me1 )
T

= b b 2 x1b Me1 + x ( Me1 )


T

2
1

( Me1 )

d
T
T
T
R ( x ) R ( x ) = 2b Me1 + 2 x1 ( Me1 ) ( Me1 ) = 0
dx
T
b Me1
x1 = T T
e1 M Me1 Normalization
SMA-HPC 2003 MIT

Minimization View

QR Factorization

One-dimensional
Minimization, Picture

Me1 = M 1
b

x1

b Me1
x1 = T T
e1 M Me1

e1
One dimensional minimization yields same result as
projection on the column!
SMA-HPC 2003 MIT

Minimization View

QR Factorization

Two-dimensional
Minimization

Now x = x1e1 + x2 e2 and Mx = x1Me1 + x2 Me2


Residual Minimization
R ( x ) R ( x ) = ( b x1Me1 x2 Me2 ) ( b x1Me1 x2 Me2 )
T

= b b 2 x1b Me1 + x ( Me1 )

( Me1 )
T
T
2
2 x2b Me2 + x2 ( Me2 ) ( Me2 )

Coupling
Term
SMA-HPC 2003 MIT

2
1

+2 x1 x2 ( Me1 )

( Me2 )

Minimization View

QR Factorization

Two-dimensional
Minimization Continued

More General Search Directions

x = v1 p1 + v2 p2 and Mx = v1Mp1 + v2 Mp2


span { p1 , p2 } = span {e1 , e2 }
R ( x ) R ( x ) = b b 2v1b Mp1 + v ( Mp1 )
T

2
1

( Mp1 )

2v2b Mp2 + v ( Mp2 )


T

Coupling
Term
T
1

( Mp2 )
T
+2v1v2 ( Mp1 ) ( Mp2 )
2
2

If p M T Mp2 = 0 Minimizations Decouple!!


SMA-HPC 2003 MIT

Minimization View

QR Factorization

Forming MTM orthogonal


Minimization Directions

ith search direction equals MTM orthogonalized unit vector


i 1

pi = ei rji p j

pi M T Mp j = 0

j =1

Use previous orthogonalized


Search directions

Mp ) ( Me )
(
=
( Mp ) ( Mp )
T

rji

SMA-HPC 2003 MIT

Minimization View

QR Factorization

Minimizing in the Search


Direction

Decoupled minimizations done individually


Minimize: v ( Mpi )
2
i

Differentiating:

T
Mp

2
v
b
( i ) i Mpi

2vi ( Mpi )

vi =

SMA-HPC 2003 MIT

( Mpi ) 2bT Mpi = 0

bT Mpi

( Mpi ) ( Mpi )
T

QR Factorization
For i = 1 to N

pi = ei

For each Source Column left of target

rij pTj M T Mpi


pi pi rij p j

rii = Mpi Mpi


1
pi
pi
rii
SMA-HPC 2003 MIT

Minimization Algorithm

For each Target Column

For j = 1 to i-1

x = x + vi pi

Minimization View

Orthogonalize Search Direction

Normalize search direction

QR Factorization
Q1

Minimization and QR
Comparison

QN

Q2

1
e1
r11
p1

1
( e2 r12e1 )
r22

SMA-HPC 2003 MIT

p2

Orthonormal

1
e2 riN ei )
(
MTM
rNN
Orthonormal
pN

QR Factorization

Search Direction

Orthogonalized unit vectors search directions


{ p1 , , pN }
{e1 , e2 , , eN }

Unit Vectors

MTM
Orthogonalization

Search Directions

Could use other sets of starting vectors


2
b
,
Mb
,
M
b,}
{

MTM
Krylov-Subspace Orthogonalization

Why?
SMA-HPC 2003 MIT

{ p1 , , pN }

Search Directions

Summary
QR Algorithm
Projection Formulas
Orthonormalizing the columns as you go
Modified Gram-Schmidt Algorithm

QR and Singular Matrices


Matrix is singular, column of Q is zero.

Minimization View of QR
Basic Minimization approach
Orthogonalized Search Directions
QR and Length minimization produce identical results

Mentioned changing the search directions


SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 6


Krylov-Subspace Matrix Solution Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
General Subspace Minimization Algorithm
Review orthogonalization and projection formulas

Generalized Conjugate Residual Algorithm


Krylov-subspace
Simplification in the symmetric case.
Convergence properties

Eigenvalue and Eigenvector Review


Norms and Spectral Radius
Spectral Mapping Theorem

Arbitrary Subspace
methods

Approach to Approximately Solving Mx=b


w01
wk 11

,...,
{w0 ,..., wk 1}
w

w
k 1N
0N

Pick a kdimensional
Subspace

Approximate x as a weighted sum of {w0 ,..., wk 1}


k

x =
k

k 1

w
i =0

SMA-HPC 2003 MIT

Residual Minimization

Arbitrary Subspace
methods

The residual is defined as r b Mx


k

If x =
k

k 1

w
i

i =0

k 1

r = b Mx = b i Mwi
k

i =0

Residual Minimizing idea: pick i ' s to minimize


r

k 2
2

( ) (r )

SMA-HPC 2003 MIT

k 1

= b i Mwi b i Mwi
i =0
i =0

k 1

Arbitrary Subspace
methods

Minimizing r

k 2
2

= b

Residual Minimization
Computational Approach
k 1

Mw
i =0

is easy if

i
2

( Mw ) = 0, i j or ( Mw ) orthogonal to ( Mw )
Create a set of vectors { p0 ,..., pk 1} such that
( Mwi )

span { p0 ,..., pk 1} = span {w0 ,..., wk 1}

and ( Mpi ) ( Mp j ) = 0, i j
T

SMA-HPC 2003 MIT

Residual Minimization

Arbitrary Subspace
methods

Algorithm Steps

Given M , b and a set of search directions {w0 ,..., wk 1}


1) Generate p j 's by orthogonalizing Mw j ' s
j 1

For j = 0 to k 1 p j = w j
i =0

( Mw ) ( Mp ) p
T

( Mpi ) ( Mpi )
T

2) compute the r minimizing solution x k


k 1

( r ) ( Mp )

i =0

( Mpi ) ( Mpi )

x =
k

SMA-HPC 2003 MIT

0 T

k 1

( r ) ( Mp )

i =0

( Mpi ) ( Mpi )

pi =

i T

pi

Arbitrary Subspace
methods
1) orthogonalize the Mwi ' s

w
p00

w11
p

Residual Minimization
Algorithm Steps by Picture

w
p22

w
p33

2) compute the r minimizing solution x k


M p1
r0

SMA-HPC 2003 MIT

M p0

Minimization Algorithm

Arbitrary Subspace
Solution Algorithm

r 0 = b Ax 0
For j = 0 to k-1
p j = wj
For i = 0 to j-1
T
p j p j ( Mp j ) ( Mpi ) pi
pj

( Mp ) ( Mp )
T

j +1

j +1

Normalize

= x + (r
j

=r

pj

) ( Mp ) p
( r ) ( Mp ) M p

SMA-HPC 2003 MIT

Orthogonalize
Search Direction

j T

Update Solution

j T

Update Residual

Arbitrary Subspace
methods

Subspace Selection
Criteria

Criteria for selecting w0 ,..., wk 1


All that matters is the span {w0 ,..., wk 1}
i ' s such that b Mx k = b

k 1

Mw
i =0

A b in the span {w0 ,..., wk 1} for k


1

is small

One choice, unit vectors, x k span {e1 ,..., ek }


Generates the QR algorithm if k=N
Can be terrible if k < N
SMA-HPC 2003 MIT

Arbitrary Subspace
methods

Subspace Selection
Historical Development

1 T
T
Consider minimizing f ( x ) = x Mx x b
2
T
T

Assume M = M (symmetric) and x Mx > 0 (pos. def)

x f ( x ) = Mx b x = M 1b minimizes f

( )

Pick span {w0 ,..., wk 1} = span x f x ,..., x f x


0

Steepest descent directions for f, but f is not residual


Does not extend to nonsymmetric, non pos def case
SMA-HPC 2003 MIT

k 1

)}

Arbitrary Subspace
methods

Subspace Selection
Krylov Subspace

Note: span x f x 0 ,..., x f x k 1

)}

( )

= span r 0 ,..., r k 1

If: span {w0 ,..., wk 1} = span r ,..., r


then r k = r 0

k 1

Mr
i
i =0

k 1

and span r ,..., r

k 1

} = span {r , Mr ,..., M
0

k 1 0

Krylov Subspace
SMA-HPC 2003 MIT

The Generalized Conjugate


Residual Algorithm

Krylov Methods

The kth step of GCR

( r ) ( Mp )
k T

k =

( Mpk ) ( Mpk )
T

Determine optimal stepsize in


kth search direction

x k +1 = x k + k pk

k +1

Update the solution


and the residual

= r k Mpk

pk +1 = r

k +1

Mr ) ( Mp )
(

p
( Mp ) ( Mp )

SMA-HPC 2003 MIT

j =0

k +1 T

Compute the new


orthogonalized
search direction

The Generalized Conjugate


Residual Algorithm

Krylov Methods

Algorithm Cost for iter k

( r ) ( Mp )
k T

k =

Vector inner products, O(n)


Matrix-vector product, O(n) if sparse

( Mpk ) ( Mpk )
T

x k +1 = x k + k pk

k +1

Vector Adds, O(n)

= r k Mpk

pk +1 = r

k +1

Mr ) ( Mp )
(

p
( Mp ) ( Mp )
k

j =0

k +1 T

O(k) inner products,


total cost O(nk)

If M is sparse, as k (# of iters) approaches n,


3
total cost = O (n ) + O (2n ) + .... + O ( kn ) = O (n )
SMA-HPC 2003 MIT

Better Converge Fast!

The Generalized Conjugate


Residual Algorithm

Krylov Methods

Symmetric Case

An Amazing fact that will not be derived


T
k +1
j
If M = M then r Mp j < k
Mr ) ( Mp )
Mr ) ( Mp )
(
(
p = r
p p =r
p
( Mp ) ( Mp )
( Mp ) ( Mp )
Orthogonalization in one step
If k (# of iters ) n, then symmetric,
sparse, GCR is O(n2 )
Better Converge Fast!
k +1

k +1

j =0

SMA-HPC 2003 MIT

k +1 T

k +1 T

k +1

k +1

Krylov Methods
Nodal Formulation

No-leak Example
Insulated bar and Matrix

Incoming Heat

T (1)

T (0)
Near End
Temperature

Discretization

m
SMA-HPC 2003 MIT

2 1

1 2
Nodal

1 Equation
Form

1 2

Far End
Temperature

Krylov Methods
Nodal Formulation
1

m
SMA-HPC 2003 MIT

No-leak Example
Circuit and Matrix

2 1

1 2

1 2

m 1

Nodal
Equation
Form

Krylov Methods
Nodal Formulation

leaky Example
Conducting bar and Matrix

T (1)

T (0)
Near End
Temperature

Discretization

2.01 1

1 2.01
Nodal

Equation

1 Form

1 2.01

m
SMA-HPC 2003 MIT

Far End
Temperature

leaky Example

Krylov Methods
Nodal Formulation
1

m
SMA-HPC 2003 MIT

Circuit and Matrix

m 1

2.01 1

1 2.01
Nodal

Equation

1
Form

1 2.01

GCR Performance(Random Rhs)


10

R
E
S
I
D
U
A
L

10

10

10

10

Insulating
-1

Leaky
-2

-3

-4

10

20

Iteration

30

40

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

50

60

GCR Performance(Rhs = -1,+1,-1,+1.)


0

10

R
E
S
I
D
U
A
L

-1

10

-2

10

-3

10

Insulating

-4

10

Leaky
-5

10

10

15

20

25

Iteration

30

35

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

40

45

50

Convergence Analysis

Krylov Subspace
Methods

Polynomial Approach

If span {w0 ,..., wk } = span r 0 , Mr 0 ,..., M k r 0

k +1

M r
i =0

k +1

i 0

= k ( M ) r

kth order polynomial

= r i M r = ( I M k ( M )) r
0

i +1 0

i =0

Note: for any 0 0


0
1
0
0
span r , r = r 0 Mr

SMA-HPC 2003 MIT

}=

span r , Mr

Krylov Methods

Convergence Analysis
Basic Properties

If j 0 for all j k in GCR, then


0
0
k 0
1) span { p0 , p1 ,..., pk } = span r , Mr , ..., M r

2) x

= k ( M )r , k is the k order
k +1
polynomial which minimizes r

k +1

th

2
2

= b Mx = r M k ( M )r
0
0
= ( I M k ( M ) ) r k +1 ( M ) r
th
0
where k +1 ( M ) r is the ( k + 1) order poly
k +1 2
minimizing r
subject to k +1 ( 0 ) =1

3) r

k +1

k +1

SMA-HPC 2003 MIT

Convergence Analysis

Krylov Methods

Optimality of GCR poly

GCR Optimality Property


r

k +1 2
2

k+1 ( M )r

0 2
2

where k+1 is any k order


th

polynomial such that k+1 ( 0 ) =1

Therefore
Any polynomial which satisfies
the zero constraint can be used
to get an upper bound on
SMA-HPC 2003 MIT

k +1 2
2

Eigenvalues and
Vectors Review

Basic Definitions

Eigenvalues and eigenvectors of a matrix M satisfy


eigenvalue

Mui = i ui
eigenvector

Or, i is an eigenvalue of M if

M i I is singular

ui is an eigenvector of M if

( M i I ) ui

SMA-HPC 2003 MIT

=0

Eigenvalues and
Vectors Review

1.1 1
1 1.1

M 11
M
21

M N 1

Examples

1 1 0 0
1 1 0 0

0 0 1 1

0 0 1 2

0
M 22

SMA-HPC 2003 MIT

Basic Definitions

M NN 1

M NN

Eigenvalues?
Eigenvectors?

What about a lower


triangular matrix

A Simplifying
Assumption

Eigenvalues and
Vectors Review

Almost all NxN matrices have N linearly


independent Eigenvectors

u1

u2

u3

uN

= 1u1

2u2

3u3

N u N

The set of all eigenvalues of M is known as


the Spectrum of M
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

A Simplifying
Assumption Continued

Almost all NxN matrices have N linearly


independent Eigenvectors

MU =U

1
0

0
0

0
0
N

U MU = or M = U U
1

Does NOT imply distinct eigenvalues, i can equal j


Does NOT imply M is nonsingular
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review
Im ( )

Spectral Radius

Re ( )

The spectral Radius of M is the radius of the smallest


circle, centered at the origin, which encloses all of
Ms eigenvalues
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

Heat Flow Example

Incoming Heat

T (1)

T (0)
Unit Length Rod

T1
+
-

vs = T(0)

SMA-HPC 2003 MIT

TN
+
-

vs = T(1)

Eigenvalues and
Vectors Review

Heat Flow Example


Continued

2 1 0 0
1 2

0
1

0
0
1
2

Eigenvalues N=20

SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

Heat Flow Example


Continued

Four Eigenvectors Which ones?

SMA-HPC 2003 MIT

Useful
Eigenproperties

Spectral Mapping
Theorem

Given a polynomial

f ( x ) = a0 + a1 x + + a p x

Apply the polynomial to a matrix

f ( M ) = a0 + a1M + + a p M
Then

spectrum ( f ( M ) ) = f ( spectrum ( M ) )

SMA-HPC 2003 MIT

Useful
Eigenproperties

Spectral Mapping
Theorem Proof

Note a property of matrix powers

MM = U U U U = U U
p
p 1
M = U U
1

Apply to the polynomial of the matrix


1
1
p 1
f ( M ) = a0UU + a1U U + + a pU U
Factoring

f ( M ) = U ( a0 I + a1 + + a p
Diagonal

f ( M ) U = U ( a0 I + a1 + + a p
SMA-HPC 2003 MIT

)U

Useful
Eigenproperties

Spectral
Decomposition

Decompose arbitrary x in eigencomponents

x = 1u1 + 2u2 + + N u N

1
Compute by solving U = x = U 1 x

N
Applying M to x yeilds
Mx = M (1u1 + 2u2 + + N u N )
= 11u1 + 2 2u2 + + N N u N
SMA-HPC 2003 MIT

Krylov Methods

Convergence Analysis
Important Observations

1) The GCR Algorithm converges to the exact solution


in at most n steps

Proof: Let n ( x ) = ( x 1 )( x 2 ) ... ( x n )


where i ( M ) .

n ( M ) r 0 = 0 and therefore r n = 0
2) If M has only q distinct eigenvalues, the GCR
Algorithm converges in at most q steps

Proof: Let q ( x ) = ( x 1 )( x 2 ) ... ( x q )

SMA-HPC 2003 MIT

Summary
Arbitrary Subspace Algorithm
Orthogonalization of Search Directions

Generalized Conjugate Residual Algorithm


Krylov-subspace
Simplification in the symmetric case.
Leaky and insulating examples

Eigenvalue and Eigenvector Review


Spectral Mapping Theorem

GCR limiting Cases


Q-step guaranteed convergence

Introduction to Simulation - Lecture 7


Krylov-Subspace Matrix Solution Methods
Part II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Reminder about GCR
Residual minimizing solution
Krylov Subspace
Polynomial Connection

Review Eigenvalues and Norms


Induced Norms
Spectral mapping theorem

Estimating Convergence Rate


Chebychev Polynomials

Preconditioners
Diagonal Preconditioners
Approximate LU preconditioners

With Normalization

Generalized
Conjugate Residual
Algorithm

r 0 = b Ax 0

For j = 0 to k-1
pj = r j
Residual is next search direction
For i = 0 to j-1
Orthogonalize
T
p j p j ( Mp j ) ( Mpi ) pi Search Direction
pj

pj

( Mp ) ( Mp )
T

x j +1 = x j + ( r
r j +1 = r j

Normalize

) ( Mp ) p
( r ) ( Mp ) M p
j T

Update Solution

j T

Update Residual

SMA-HPC 2003 MIT

Generalized
Conjugate Residual
Algorithm
1) orthogonalize the Mr i ' s

rp00

rp11

With Normalization
Algorithm Steps by Picture

pr 22

rp33

2) compute the r minimizing solution x k


r k+1

SMA-HPC 2003 MIT

rk
M pk

First Few Steps

Generalized
Conjugate Residual
Algorithm

r0
First search direction r = b Mx = b, p0 =
Mr 0
0

Residual minimizing x1 =
solution
Second Search
Direction

( r 0 ) Mp0 p0
T

r1 = b Mx1 = r 0 1Mr 0
p1 =

r1 1,0 p0

M r1 1,0 p0

SMA-HPC 2003 MIT

Generalized
Conjugate Residual
Algorithm
Residual minimizing
solution

First few steps


Continued

x 2 = x1 + ( r1 ) Mp1 p1
T

Third Search Direction

r 2 = b Mx 2 = r 0 2,1Mr 0 2,0 M 2 r 0

p2 =

r1 2,0 p0 2,1 p1

M r1 2,0 p0 2,1 p1

SMA-HPC 2003 MIT

The kth step of GCR

Generalized
Conjugate Residual
Algorithm
k 1

pk = r ( Mr k )
k

j =0

pk =

pk
Mpk

k = ( r k ) ( Mpk )
T

x k +1 = x k + k pk

r k +1 = r k k Mpk

( Mp ) p
j

Orthogonalize and
normalize search
direction

Determine optimal stepsize in


kth search direction
Update the solution
and the residual

SMA-HPC 2003 MIT

Polynomial view

Generalized
Conjugate Residual
Algorithm

If j 0 for all j k in GCR, then


1) span { p0 , p1 ,..., pk } = span r 0 , Mr 0 ,..., Mr k

2) x k +1 = k ( M ) r 0 , k is the k th order poly


2
minimizing r k +1
2
k +1
k +1
0
3) r = b Mx = r M k ( M ) r 0
= ( I M k ( M ) ) r 0 k +1 ( M ) r 0
th
where k +1 ( M ) r 0 is the ( k + 1) order poly
2
minimizing r k +1 subject to k +1 ( 0 ) =1
SMA-HPC 2003 MIT

Residual Minimization

Krylov Methods

Polynomial View

If x k +1 span r 0 , Mr 0 ,..., Mr k

minimizes r k +1

2
2

= k ( M )r , k is the k order poly


2
minimizing r k +1
2
k +1
k +1
2) r = b Mx = ( I M k ( M ) ) r 0 =k +1 ( M ) r 0
th
where k +1 ( M ) r 0 is the ( k + 1) order poly
k +1 2
minimizing r
subject to k +1 ( 0 ) =1

1) x

k +1

th

Polynomial Property only a function of


solution space and residual minimization
SMA-HPC 2003 MIT

Krylov Methods
Nodal Formulation

No-leak Example
Insulated bar and Matrix

Incoming Heat

T (1)

T (0)
Near End
Temperature

Discretization

m
SMA-HPC 2003 MIT

Far End
Temperature

2 1

1 2
Nodal

1 Equation
Form

1
2

10

Krylov Methods
Nodal Formulation

leaky Example
Conducting bar and Matrix

T (1)

T (0)
Near End
Temperature

Discretization

Far End
Temperature

2.01 1

1 2.01
Nodal

Equation

1 Form

1 2.01

m
SMA-HPC 2003 MIT

11

GCR Performance(Random Rhs)


10

R
E
S
I
D
U
A
L

10

10

10

10

Insulating
-1

Leaky
-2

-3

-4

10

20

Iteration

30

40

50

60

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

12

GCR Performance(Rhs = -1,+1,-1,+1.)


0

10

R
E
S
I
D
U
A
L

-1

10

-2

10

-3

10

Insulating

-4

10

Leaky
-5

10

10

15

20

25

Iteration

30

35

40

45

50

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

13

Krylov Methods

Residual Minimization
Optimality of poly

Residual Minimizing Optimality Property


r k +1 k+1 ( M )r 0 k+1 ( M ) r 0
k+1 is any k th order poly such that k+1 ( 0 ) =1

Therefore
Any polynomial which satisfies
the constraints can be used to
get an upper bound on

r k +1
r0

SMA-HPC 2003 MIT

14

Induced Norms

Matrix Magnification
Question

Suppose y = Mx
How much larger is y than x?
OR

How much does M magnify x?


SMA-HPC 2003 MIT

15

Vector Norm
Review

Induced Norms
L2 (Euclidean) norm :
x

i=1

L1 norm :
x

xi

xi

i=1

< 1

< 1

L norm :
x

= max
i

xi

< 1

SMA-HPC 2003 MIT

16

Standard Induced
l-norms

Induced Matrix
Norms

Definition:
M l max

Mx
x

Examples

M
SMA-HPC 2003 MIT

max
1

= max

i
j =1

max

M ij

M
j
i =1

ij

x l =1

Mx

Max Column
Sum
Max Row
Sum

17

Standard Induced
l-norms continued

Induced Matrix
Norms

= m ax
j

i =1

Why? Let x =

= m ax
i

Why? Let

[1

j =1

x =

= max abs column sum

ij

0
ij

[ 1

= max abs column sum

1]

Not So easy to compute

SMA-HPC 2003 MIT

As the algebra on the slide shows the relative changes in the solution x is bounded
by an A-dependent factor times the relative changes in A. The factor

|| A1 || || A ||
was historically referred to as the condition number of A, but that definition has
been abandoned as then the condition number is norm-dependent. Instead the
condition number of A is the ratio of singular values of A.
cond ( A) =

max( A)
min( A)

Singular values are outside the scope of this course, consider consulting Trefethen
& Bau.

18

Useful
Eigenproperties

Spectral Mapping
Theorem

Given a polynomial

f ( x ) = a0 + a1 x + + a p x p

Apply the polynomial to a matrix

f ( M ) = a0 + a1M + + a p M p
Then

spectrum ( f ( M ) ) = f ( spectrum ( M ) )

SMA-HPC 2003 MIT

19

Krylov Methods

u N
=
u1

eigenvectors of M

k (M )

u
1

u N

u
1

Convergence Analysis
Norm of matrix polynomials

k ( 1 )

u N

k ( N )
1

Cond(U)

u
1

k ( 1 )

u N

k ( N )

condition number of
M's eigenspace
SMA-HPC 2003 MIT

20

Krylov Methods
k ( 1 )

Convergence Analysis
Norm of matrix polynomials

= max x =1
k ( N )
2

( ) x
k

= max i k ( i )
k ( M )

cond (V ) max i k ( i )

SMA-HPC 2003 MIT

21

Krylov Methods

Convergence Analysis
Important Observations

1) A residual minimizing Krylov subspace algorithm


converges to the exact solution in at most n steps

Proof: Let n ( x ) = ( x 1 )( x 2 ) ... ( x n )

where i ( M ) . Then, max i n ( i ) = 0,

n ( M ) = 0 and therefore r n = 0

2) If M has only q distinct e-values, the residual


minimizing Krylov subspace algorithm converges
in at most q steps

Proof: Let q ( x ) = ( x 1 )( x 2 ) ... ( x q )

SMA-HPC 2003 MIT

22

Convergence for M =MT

Krylov Methods

Residual Polynomial

If M = MT then
1) M has orthonormal eigenvectors

cond (V ) =

u
1

u N

u
1

u N

=1

k ( M ) = max i k ( i )

2) M has real eigenvalues

If M is postive definite, then ( M ) > 0

SMA-HPC 2003 MIT

23

Residual Poly Picture for Heat Conducting Bar Matrix


No loss to air (n=10)

* = evals(M)
- = 5th order poly
- = 8th order poly

SMA-HPC 2003 MIT

24

Residual Poly Picture for Heat Conducting Bar Matrix


No loss to air (n=10)

Keep k ( i ) as small as possible:


Strategically place zeros of the poly
SMA-HPC 2003 MIT

25

Convergence for M =MT

Krylov Methods

Polynomial Min-Max Problem

Consider ( M ) [ min , max ] , min > 0

Then a good polynomial ( pk ( M ) is small)


can be found by solving the min-max problem

min kth order max x [min ,max ] pk ( x )


polys s . t .
pk ( 0 ) =1

The min-max problem is exactly


solved by Chebyshev Polynomials
SMA-HPC 2003 MIT

26

Convergence for M =MT

Krylov Methods

Chebyshev Solves Min-Max

The Chebyshev Polynomial

Ck ( x ) cos ( k cos 1 ( x ) ) x [ 1,1]

min kth order max x [min ,max ] k ( x )


polys s .t .
k ( 0 ) =1

max x [min ,max ]

x
Ck 1 + 2 min
max min

min
Ck 1 + 2
max min

SMA-HPC 2003 MIT

27

Chebyshev Polynomials minimizing over [1,10]

SMA-HPC 2003 MIT

28

Convergence for M =MT

Krylov Methods

Chebychev Bounds

min kth order max x [min ,max ] k ( x )


polys s .t .
k ( 0 ) =1

max
Ck 1 2
max min

max

min

2
max

+ 1

min

SMA-HPC 2003 MIT

29

Convergence for M =MT

Krylov Methods

Chebychev Result

If ( M ) [ min , max ] , min > 0


k

rk

max

min

r0
2
max

+ 1

min

SMA-HPC 2003 MIT

30

Preconditioning

Krylov Methods

1
0

0
1
0

Diagonal Example

1
0

0
2
0

For which problem will GCR Converge Faster?


SMA-HPC 2003 MIT

31

Preconditioning

Krylov Methods

Diagonal Preconditioners

Let M = D + M nd
(

Apply GCR to D 1M x = I + D 1M nd x = D 1b
The Inverse of a diagonal is cheap to compute
Usually improves convergence
SMA-HPC 2003 MIT

32

Heat Conducting
Bar example
x
x1

x2

x
100

xi

xi +1

Discretized system

one small x
xn

2 + 1
u1 f (x1)

1 2 +


1 1+ +100
100


100 1+ +100 1


1
1

u f (x )
1
2

n n

max
> 100
min

SMA-HPC 2003 MIT

33

Which Convergence Curve is GCR?

rk
r0

Iteration
SMA-HPC 2003 MIT

34

Heat Conducting
Bar example

Preconditioned Matrix
Eigenvalues

Residual Minimizing
Krylov-subspace
Algorithm can
eliminate outlying
eigenvalues by
placing polynomial
zeros directly on
them.

SMA-HPC 2003 MIT

35

The World According


to Krylov

Heat Flow Comparison Example

Dimension Dense GE Sparse GE

GCR

O ( m3 )

O (m)

O ( m2 )

O ( m6 )

O ( m3 )

O ( m3 )

O ( m9 )

O ( m6 )

O ( m4 )

GCR faster than banded GE in 2 and 3 dimensions


Could be faster, 3-D matrix only m3 nonzeros.
GCR converges too slowly!
SMA-HPC 2003 MIT

36

Preconditioning

Krylov Methods

Approximate LU
Preconditioners

Let M L U
Applying GCR to

((

LU

( )

M x = LU

Use an Implicit matrix representation!


Forming y =

(( LU )

M x is equivalent to

solving LUy = Mx
SMA-HPC 2003 MIT

37

Preconditioning

Krylov Methods

Approximate LU
Preconditioners Continued

Nonzeros in an exact LU Factorization


Filled-in LU factorization
Too expensive.

Ignore the fillin!

SMA-HPC 2003 MIT

38

Factoring 2-D Grid Matrices

Generated Fill-in Makes Factorization Expensive

SMA-HPC 2003 MIT

39

Preconditioning

Krylov Methods

Approximate LU
Preconditioners Continued

THROW AWAY FILL-INS!


Throw away all fill-ins
Throw away only fill-ins with small values
Throw away fill-ins produced by other fill-ins
Throw away fill-ins produced by fill-ins of
other fill-ins, etc.

40

Summary
Reminder about GCR
Residual minimizing solution
Krylov Subspace
Polynomial Connection

Review Norms and Eigenvalues


Induced Norms
Spectral mapping theorem

Estimating Convergence Rate


Chebychev Polynomials

Preconditioners
Diagonal Preconditioners
Approximate LU preconditioners
SMA-HPC 2003 MIT

41

Introduction to Simulation - Lecture 8


1-D Nonlinear Solution Methods

Jacob White

Thanks to Deepak Ramaswamy Jaime Peraire, Michal


Rewienski, and Karen Veroy

Outline
Nonlinear Problems
Struts and Circuit Example

Richardson and Linear Convergence


Simple Linear Example

Newtons Method

Derivation of Newton
Quadratic Convergence
Examples
Global Convergence
Convergence Checks

Nonlinear
problems
( x0 , y0 )

( x2 , y2 )

Strut Example

( x1 , y1 )

Given: x0, y0, x1, y1, W


Find: x2, y2

Load force
W

Need to Solve f x

W

Struts Example

Nonlinear Problems

Reminder: Strut Forces

L0  L
EAc
L0

f
fx
f

(0,0)
L

x1 , y1

fy

fx
X

fy
L

SMA-HPC 2003 MIT

H L0  L

x1
f
L
y1
f
L
2
1

2
1

x y

Nonlinear
problems

Strut Example

( x1 , y1 )

( x0 , y0 )

x2  x0  y2  y0

L1

x2  x1  y2  y1

L2

f1
( x2 , y2 )

x2  x0
H ( Lo  L1 )
L1
x2  x1
H ( Lo  L2 )
L2

f1x

f2

f2 x

Load force
W

1x

 f2 x

1y

 f2 y  W

Nonlinear
problems

Strut Example

Why Nonlinear?

y2  y1
H ( Lo  L2 ) 
L2
y2  y0
H ( Lo  L1 )  W
L1

Pull Hard on the


Struts

The strut forces change


in both magnitude and
direction

Nonlinear
problems

v1
10v

Circuit Example

v2

10

1
I r  Vr
10

+
- Vd

Vd

I d  I s (e
Need to Solve

Id  Ir
I vsrc  I r

0
0

Vt

 1)

Nonlinear
problems

Solve Iteratively

Hard to find analytical solution for

f ( x)

Solve iteratively
0
guess at a solution x
x0
repeat for k = 0, 1, 2, .

k 1

W x

k 1
f
x
|0
until

Ask
Does the iteration converge to correct solution ?
How fast does the iteration converge?

Richardson
Iteration

Definition

Richardson Iteration Definition

k 1

x  f (x )

An iteration stationary point is a solution

k 1

f ( xk )

xk

x* ( Solution)

Richardson
Iteration

Example 1

f ( x)
Start with

0.7 x  10
0

x1

x 0  f ( x 0 ) 10

x2

x1  f ( x1 ) 13

x 6 14.27

x3

x 2  f ( x 2 ) 13.9

x7

x4

x3  f ( x3 ) 14.17

x8 14.28

14.25
14.28
Converged

Richardson
Iteration

Example 1

f ( x)

x x

0.7 x  10

Richardson
Iteration

Example 2

f ( x)
Start with

2 x  10

x0

x1

x0  f ( x0 ) 10

x2

x1  f ( x1 )

x3

x2  f ( x2 ) 130

x4

x3  f ( x3 )

40
400

No convergence !

Richardson
Iteration

Convergence

Setup

Iteration Equation
Exact Solution

k 1

x  f (x )
*

x N
f (x )
0

Computing Differences

k 1

x

x  x  f (x )  f (x )
Need to Estimate

Richardson
Iteration

f (v )  f y

Convergence

Mean Value Theorem

wf v
v  y
wx

v > v, y @

v

Richardson
Iteration

Convergence

Use MVT

Iteration Equation
Exact Solution

k 1

x  f (x )
*

x N
f (x )
0

Computing Differences

k 1

x

x  x  f (x )  f (x )
wf x k
*
1 
x x
wx

Richardson
Iteration

If

1

And
Then
Or

Convergence

Richardson Theorem

wf x
wx

d J  1 for all x s.t. x  x  G

x x G

k 1

x dJ x x

lim k of x

k 1

x

lim k of J x  x

Linear Convergence

Richardson
Iteration

Example 1

f ( x)

x x

0.7 x  10

Richardson
Iteration

Problems

Convergence is only linear


x, f(x) not in the same units:
x is a voltage, f(x) a current in circuits
x is a displacement, f(x) a force in struts
Adding 2 different physical quantities
But a Simple Algorithm
Just calculate f(x) and update

Newtons method

Another approach

From the Taylor series about solution

df k *
f ( x )  f ( x )  ( x ) ( x  xk )
dx
*

Define iteration
Do k = 0 to .
1
df k
k 1
k
x
x  ( x ) f ( xk )
dx

df k
if ( x )
dx

until convergence

1

exists

Newtons Method

Graphically

Newtons Method

Example

Newtons Method

x x

Example

Newtons Method
0

f ( x* )

Convergence

2
df
d
f
k
k
k
*
f ( x )  ( x )( x  x )  2 ( x )( x*  x k ) 2
dx
dx
k
*

some x [ x , x ]

Mean Value theorem


truncates Taylor series

But

df k k 1 k
f ( x )  ( x )( x  x )
dx
k

by Newton
definition

Newton's Method: Convergence, Cont.

Subtracting the two equations:

  (df/dx)(x^k)(x^{k+1} - x*) = (1/2)(d^2f/dx^2)(x~)(x* - x^k)^2

Dividing through by df/dx:

  x^{k+1} - x* = [(df/dx)(x^k)]^{-1} (1/2)(d^2f/dx^2)(x~)(x* - x^k)^2

Suppose  |[(df/dx)(x)]^{-1} (1/2)(d^2f/dx^2)(x)| <= L  for all x. Then

  |x^{k+1} - x*| <= L |x^k - x*|^2

Convergence is quadratic if L is bounded.

Newton's Method: Convergence, Example 1

  f(x) = x^2 - 1 = 0, find x*  (x* = 1);   (df/dx)(x^k) = 2 x^k

The Newton step gives  2 x^k (x^{k+1} - x^k) = -((x^k)^2 - 1), and therefore

  2 x^k (x^{k+1} - x*) = (x^k - x*)^2
  (x^{k+1} - x*) = (1 / (2 x^k)) (x^k - x*)^2

Convergence is quadratic.

Newton's Method: Convergence, Example 2

  f(x) = x^2 = 0,  x* = 0;   (df/dx)(x^k) = 2 x^k

Note: df/dx is not bounded away from zero near the root. For x^k != x* = 0,

  2 x^k (x^{k+1} - 0) = (x^k - 0)^2
  (x^{k+1} - x*) = (1/2)(x^k - x*)

Convergence is only linear.

[Figure: error versus iteration for Examples 1 and 2; quadratic convergence for f(x) = x^2 - 1, linear convergence (rate 1/2) for f(x) = x^2.]

Newton's Method: Convergence Theorem

Suppose  |[(df/dx)(x)]^{-1} (1/2)(d^2f/dx^2)(x)| <= L  for all x.
If  L |x^0 - x*| <= gamma < 1,  then x^k converges to x*.

Proof sketch:

  |x^1 - x*| <= L |x^0 - x*| |x^0 - x*| <= gamma |x^0 - x*|
  |x^2 - x*| <= L gamma |x^0 - x*| |x^1 - x*|,
      so |x^2 - x*| <= gamma^2 |x^1 - x*| <= gamma^3 |x^0 - x*|
  |x^3 - x*| <= gamma^4 |x^2 - x*| <= gamma^7 |x^0 - x*|

Theorem: If L is bounded (df/dx bounded away from zero and d^2f/dx^2 bounded), then Newton's method is guaranteed to converge given a "close enough" initial guess.

Does it always converge?

[Figure: convergence depends on a good initial guess; from a poor starting point x^1 the Newton tangents bounce between regions and never approach the root.]

Newton's Method: Convergence Checks

Need a "delta-x" check to avoid false convergence:

[Figure: an f(x) that is steep near the root; |f(x^{k+1})| < eps_fa while the iterate is still far from x*, i.e. |x^{k+1} - x^k| > eps_xa + eps_xr |x^{k+1}|.]

Also need an "f(x)" check to avoid false convergence:

[Figure: an f(x) that is flat near the root; |x^{k+1} - x^k| < eps_xa + eps_xr |x^{k+1}| while the residual is still large, |f(x^{k+1})| > eps_fa.]

In practice, declare convergence only when both tests pass:

  |x^{k+1} - x^k| <= eps_xa + eps_xr |x^{k+1}|   and   |f(x^{k+1})| <= eps_fa
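A small Python sketch of the combined test; the tolerance names eps_xa, eps_xr, eps_fa mirror the thresholds above and their default values are illustrative assumptions.

    def converged(x_new, x_old, f_new, eps_xa=1e-9, eps_xr=1e-6, eps_fa=1e-9):
        """Declare convergence only if the delta-x and |f| tests both pass."""
        dx_ok = abs(x_new - x_old) <= eps_xa + eps_xr * abs(x_new)
        f_ok = abs(f_new) <= eps_fa
        return dx_ok and f_ok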

Summary
  Nonlinear problems
    Struts and circuit example
  Richardson and linear convergence
    Simple linear example
  1-D Newton's method
    Derivation of Newton
    Quadratic convergence
    Examples
    Global convergence
    Convergence checks
Introduction to Simulation - Lecture 9

Multidimensional Newton Methods
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Quick review of 1-D Newton
    Convergence testing
  Multidimensional Newton method
    Basic algorithm
    Description of the Jacobian
    Equation formulation
  Multidimensional convergence properties
    Prove local convergence
    Improving convergence

Newton Idea: 1-D Reminder

Problem: find x* such that f(x*) = 0.

Use a Taylor series expansion:

  0 = f(x*) = f(x) + (df(x)/dx)(x* - x) + H.O.T.

If x is close to the exact solution, the higher-order terms are small and

  (df(x)/dx)(x* - x) = -f(x)   (approximately)

Newton Algorithm: 1-D Reminder

  x^0 = initial guess, k = 0
  Repeat {
      Solve (df(x^k)/dx)(x^{k+1} - x^k) = -f(x^k) for x^{k+1}
      k = k + 1
  } Until ?
      |x^{k+1} - x^k| < threshold?
      |f(x^{k+1})| < threshold?

[Figure: algorithm picture; each iterate is found where the tangent line at the previous iterate crosses zero.]

Newton Algorithm: 1-D Reminder, Convergence Checks

Need a "delta-x" check to avoid false convergence:

[Figure: steep f(x); |f(x^{k+1})| < eps_fa while |x^{k+1} - x^k| > eps_xa + eps_xr |x^{k+1}|.]

Also need an "f(x)" check to avoid false convergence:

[Figure: flat f(x); |x^{k+1} - x^k| < eps_xa + eps_xr |x^{k+1}| while |f(x^{k+1})| > eps_fa.]

Newton Algorithm: 1-D Reminder, Local Convergence

Convergence depends on a good initial guess.

[Figure: from a poor starting point x^1 the Newton iterates oscillate and fail to converge.]

Multidimensional Newton Method: Example Problem, Strut and Joint

One strut anchored at the origin with free end at (x, y), carrying an applied load (FLx, FLy):

  l = sqrt(x^2 + y^2)
  F = E Ac (lo - l) / lo            (axial force)
  f_x = (x / l) F,   f_y = (y / l) F

Force balance at the joint:

  F(x) = 0,  i.e.
  (x / l)(E Ac / lo)(lo - l) + FLx = 0
  (y / l)(E Ac / lo)(lo - l) + FLy = 0

Multidimensional Newton Method: Example Problem, Nonlinear Resistors

[Figure: resistor 1 from node 1 to ground (current i1), resistor 2 between nodes 1 and 2 (current i2), resistor 3 from node 2 to ground (current i3); each nonlinear resistor obeys i = g(v) of its branch voltage.]

Nodal analysis:

  At node 1:  i1 + i2 = 0   ->   g(v1) + g(v1 - v2) = 0
  At node 2:  i3 - i2 = 0   ->   g(v2) - g(v1 - v2) = 0

Two coupled nonlinear equations in two unknowns.

Multidimensional Newton Method: General Setting

Problem: find x* such that F(x*) = 0, where x* is in R^N and F : R^N -> R^N.

Use a Taylor series expansion:

  0 = F(x*) = F(x) + J_F(x)(x* - x) + H.O.T.

where J_F(x) is the Jacobian matrix. If x is close to the exact solution,

  J_F(x)(x* - x) = -F(x)   (approximately)

Multidimensional Newton Method: Strut and Joint

For x = (x, y) and the strut equations

  (x / l)(E Ac / lo)(lo - l) + FLx = 0
  (y / l)(E Ac / lo)(lo - l) + FLy = 0

what are the four entries of J_F(x) = [ ? ? ; ? ? ]?

Multidimensional Newton Method: Nodal Analysis, Nonlinear Resistor

  At node 1:  F1(v) = g(v1) + g(v1 - v2) = 0
  At node 2:  F2(v) = g(v2) - g(v1 - v2) = 0

What are the four entries of J_F(v) = [ ? ? ; ? ? ]?

Multidimensional Newton Method: Jacobian Matrix

  J_F(x) dx = F(x + dx) - F(x)   (to first order)

  J_F(x) = [ dF1(x)/dx1   ...   dF1(x)/dxN
             ...
             dFN(x)/dx1   ...   dFN(x)/dxN ]

Multidimensional Newton Method: Jacobian Matrix, Singular Case

Suppose J_F(x) is singular: there is a nonzero dx with J_F(x) dx = 0.
What does it mean? To first order, moving along dx does not change F, so the Newton step is not uniquely defined.

Multidimensional Newton Method: Newton Algorithm

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough
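A minimal numpy sketch of the algorithm above, applied to the two-node nonlinear-resistor circuit. The constitutive law g(v) = v + v^3, the 1 A current injected at node 1 (to make the solution nontrivial), and the tolerances are my assumptions for illustration, not the course's.

    import numpy as np

    def g(v):  return v + v**3
    def dg(v): return 1.0 + 3.0 * v**2

    def F(v):
        v1, v2 = v
        return np.array([g(v1) + g(v1 - v2) - 1.0,   # KCL at node 1
                         g(v2) - g(v1 - v2)])        # KCL at node 2

    def J(v):
        v1, v2 = v
        return np.array([[dg(v1) + dg(v1 - v2), -dg(v1 - v2)],
                         [-dg(v1 - v2),          dg(v2) + dg(v1 - v2)]])

    v = np.zeros(2)                        # initial guess
    for k in range(50):
        dv = np.linalg.solve(J(v), -F(v))  # Newton step
        v = v + dv
        if np.linalg.norm(dv) < 1e-10 and np.linalg.norm(F(v)) < 1e-10:
            break
    print(k, v)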

Multidimensional Newton Method: Computing the Jacobian and the Function

Consider the contribution of one nonlinear resistor connected between nodes n1 and n2, with branch voltage vb = v_{n1} - v_{n2} and branch current ib = g(vb).

  Summing currents at node n1:  F_{n1}(v) = g(v_{n1} - v_{n2}) + (other element terms)
  Summing currents at node n2:  F_{n2}(v) = -g(v_{n1} - v_{n2}) + (other element terms)

Differentiating at node n1:

  dF_{n1}(v)/dv_{n1} = +(dg/dv)(v_{n1} - v_{n2}) + ...
  dF_{n1}(v)/dv_{n2} = -(dg/dv)(v_{n1} - v_{n2}) + ...

Multidimensional Newton Method: Stamping a Resistor

Each element "stamps" a 2x2 pattern into the Jacobian and a pair of entries into F:

                   n1                           n2
  J_F(v):  n1 [ +(dg/dv)(v_{n1} - v_{n2})   -(dg/dv)(v_{n1} - v_{n2}) ]
           n2 [ -(dg/dv)(v_{n1} - v_{n2})   +(dg/dv)(v_{n1} - v_{n2}) ]

  F(v):    n1 [ +g(v_{n1} - v_{n2}) ]
           n2 [ -g(v_{n1} - v_{n2}) ]

Multidimensional Newton Method: More Complete Newton Algorithm

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k):
          Zero J_F and F
          For each element:
              Compute element currents and derivatives
              Sum currents into F, sum derivatives into J_F
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough
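A hedged Python sketch of the element-by-element assembly ("stamping") step in the algorithm above; the data layout (a list of node-index pairs, with -1 meaning ground) and the function names are mine.

    import numpy as np

    def assemble(v, elements, g, dg):
        """elements: list of (n1, n2) node-index pairs; index -1 means ground."""
        N = len(v)
        Jf, F = np.zeros((N, N)), np.zeros(N)
        for n1, n2 in elements:
            vb = (v[n1] if n1 >= 0 else 0.0) - (v[n2] if n2 >= 0 else 0.0)
            i, di = g(vb), dg(vb)                  # element current, derivative
            for n, sgn in ((n1, +1.0), (n2, -1.0)):
                if n < 0:
                    continue                       # ground row/column is dropped
                F[n] += sgn * i
                if n1 >= 0: Jf[n, n1] += sgn * di
                if n2 >= 0: Jf[n, n2] -= sgn * di
        return Jf, F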

Multidimensional Newton Method: Example, Heat Flow in a Leaky Bar

[Figure: a discretized bar with node temperatures T1, ..., TN; sources impose the boundary temperatures, vs = T(0) at one end and vs = T(1) at the other, and each node leaks heat to ambient through a nonlinear element with current ih = k1 T + k2 T^2.]

What is the Jacobian?

Multidimensional Newton Method: Convergence Theorem, Statement

Main Theorem. If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then Newton's method converges given a sufficiently close initial guess.

Multidimensional Newton Method: Convergence Theorem, Key Lemma

If  ||J_F(x) - J_F(y)|| <= A ||x - y||  then

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

This lemma substitutes for the mean value theorem: there is no multidimensional mean value theorem.

Multidimensional Newton Method: Convergence Theorem, Proof

By definition of the Newton iteration and the assumed bound on the inverse of the Jacobian:

  ||x^{k+1} - x^k|| = ||J_F^{-1}(x^k) F(x^k)|| <= beta ||F(x^k)||

Again applying the Newton iteration definition, now at step k - 1,

  F(x^{k-1}) + J_F(x^{k-1})(x^k - x^{k-1}) = 0

so

  ||x^{k+1} - x^k|| <= beta ||F(x^k) - F(x^{k-1}) - J_F(x^{k-1})(x^k - x^{k-1})||

Finally, using the lemma:

  ||x^{k+1} - x^k|| <= (beta A / 2) ||x^k - x^{k-1}||^2

Multidimensional Newton Method: Convergence Theorem, Proof Continued

Reorganizing the equation:

  (beta A / 2) ||x^{k+1} - x^k|| <= ( (beta A / 2) ||x^k - x^{k-1}|| )^2

If  (beta A / 2) ||x^1 - x^0|| = gamma < 1,  the step sizes shrink supergeometrically, so the sum over k of ||x^{k+1} - x^k|| is finite and x^{k+1} = sum of the steps + x^0 converges.

Non-converging Case: 1-D Picture

[Figure: an f(x) whose Newton iterates from x^1 overshoot back and forth.]

Must somehow limit the changes in x.

Newton Method with Limiting: Newton Algorithm

Newton algorithm for solving F(x) = 0:

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      x^{k+1} = x^k + limited(dx^{k+1})
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Limiting Methods

Direction corrupting (clip each component independently):

  limited(dx^{k+1})_i = dx^{k+1}_i                  if |dx^{k+1}_i| < delta
                        delta * sign(dx^{k+1}_i)    otherwise

Non-corrupting (scale the whole step, preserving its direction):

  limited(dx^{k+1}) = gamma dx^{k+1},   gamma = min(1, delta / ||dx^{k+1}||)

These are heuristics, with no guarantee of global convergence.

Newton Method with Limiting: Damped Newton Scheme

General damping scheme:

  Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
  x^{k+1} = x^k + alpha^k dx^{k+1}

Key idea, line search: pick alpha^k to minimize

  ||F(x^k + alpha dx^{k+1})||_2^2 = F(x^k + alpha dx^{k+1})^T F(x^k + alpha dx^{k+1})

The method performs a one-dimensional search in the Newton direction.
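A hedged Python sketch of damped Newton. In place of a true minimizing line search it uses a crude backtracking rule (halve alpha until ||F|| decreases), which is a common stand-in, not the course's prescription.

    import numpy as np

    def damped_newton(F, J, x0, tol=1e-10, max_iters=100):
        x = np.asarray(x0, dtype=float)
        for k in range(max_iters):
            Fx = F(x)
            if np.linalg.norm(Fx) < tol:
                return x, k
            dx = np.linalg.solve(J(x), -Fx)          # Newton direction
            alpha = 1.0
            while np.linalg.norm(F(x + alpha * dx)) >= np.linalg.norm(Fx):
                alpha *= 0.5                         # damp until F shrinks
                if alpha < 1e-12:
                    raise RuntimeError("line search failed (near-singular region?)")
            x = x + alpha * dx
        return x, max_iters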

Newton Method with Limiting: Damped Newton Convergence Theorem

If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then there exists a set of alpha^k in (0, 1] such that

  ||F(x^{k+1})|| = ||F(x^k + alpha^k dx^{k+1})|| <= gamma ||F(x^k)||   with gamma < 1

Every step reduces ||F||: global convergence!

Newton Method with Limiting: Damped Newton, Nested Iteration

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      Find alpha^k in (0, 1] such that ||F(x^k + alpha^k dx^{k+1})|| is minimized
      x^{k+1} = x^k + alpha^k dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

How can one find the damping coefficients alpha^k?

Newton Method with Limiting: Damped Newton, Singular Jacobian Problem

[Figure: an f(x) whose local minimum lies above zero; from x^2 the damped iterates slide into the minimum, where df/dx = 0.]

Damped Newton methods push iterates to local minimums of ||F||, and therefore find the points where the Jacobian is singular.

Summary
  Quick review of 1-D Newton
    Convergence testing
  Multidimensional Newton method
    Basic algorithm
    Description of the Jacobian
    Jacobian construction
    Local convergence theorem
  Damped Newton method
    Nested algorithm with line search
    Global convergence IF the Jacobian is nonsingular

Introduction to Simulation - Lecture 10

Modified Newton Methods
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Damped Newton schemes
    Globally convergent if the Jacobian is nonsingular
    Difficulty with singular Jacobians
  Introduce continuation schemes
    Problem with source/load stepping
    More general continuation scheme
  Improving continuation efficiency
    Better first guess for each continuation step
  Arc-length continuation

Multidimensional Newton Method: Newton Algorithm (Review)

Newton algorithm for solving F(x) = 0:

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough

Multidimensional Convergence Theorem (Review)

Main Theorem. If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then Newton's method converges given a sufficiently close initial guess.

Implications

If a function's first derivative never goes to zero, and its second derivative is never too large, then Newton's method can be used to find the zero of the function, provided you already know the answer. We need a way to develop Newton methods which converge regardless of the initial guess!

Non-converging Case: 1-D Picture

[Figure: Newton iterates from x^1 overshooting back and forth.]

Limiting the changes in x might improve convergence.

Newton Method with Limiting: Newton Algorithm (Review)

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      x^{k+1} = x^k + limited(dx^{k+1})
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Damped Newton Scheme (Review)

  Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
  x^{k+1} = x^k + alpha^k dx^{k+1}

Key idea, line search: pick alpha^k to minimize ||F(x^k + alpha dx^{k+1})||_2^2; the method performs a one-dimensional search in the Newton direction.

Newton Method with Limiting: Damped Newton Convergence Theorem (Review)

If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then there exists a set of alpha^k in (0, 1] such that

  ||F(x^{k+1})|| = ||F(x^k + alpha^k dx^{k+1})|| <= gamma ||F(x^k)||   with gamma < 1

Every step reduces ||F||: global convergence!

Newton Method with Limiting: Damped Newton, Nested Iteration (Review)

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      Find alpha^k in (0, 1] such that ||F(x^k + alpha^k dx^{k+1})|| is minimized
      x^{k+1} = x^k + alpha^k dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Damped Newton Example

[Figure: a 1 V source drives node v1; a 10 Ohm resistor connects v1 to v2; a diode connects v2 to ground.]

  I_r - V_r / 10 = 0
  I_d - I_s (e^{V_d / V_t} - 1) = 0

Nodal equation with numerical values (I_s = 10^-16, V_t = 0.025):

  f(v2) = (v2 - 1)/10 + 10^-16 (e^{v2 / 0.025} - 1) = 0

[Figure: plot of f(v2); the exponential makes f extremely steep once the diode turns on, so undamped Newton steps from a poor guess overshoot badly.]

Newton Method with Limiting: Damped Newton, Nested Iteration

(The same nested algorithm as above: solve for dx^{k+1}, find alpha^k in (0, 1] minimizing ||F(x^k + alpha^k dx^{k+1})||, update, and repeat until converged.)

How can one find the damping coefficients?

Newton Method with Limiting: Damped Newton, Theorem Proof

By definition of the Newton iteration,

  dx^{k+1} = -J_F^{-1}(x^k) F(x^k)      (the Newton direction)

Multidimensional mean value lemma:

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

Combining, with x = x^k + alpha^k dx^{k+1} and y = x^k:

  ||F(x^k + alpha^k dx^{k+1})|| <= ||F(x^k) + alpha^k J_F(x^k) dx^{k+1}||
                                   + (A/2) (alpha^k)^2 ||J_F^{-1}(x^k) F(x^k)||^2

Since J_F(x^k) dx^{k+1} = -F(x^k), combining terms and moving scalars out of norms gives

  ||F(x^{k+1})|| <= (1 - alpha^k) ||F(x^k)|| + (A/2)(alpha^k)^2 ||J_F^{-1}(x^k) F(x^k)||^2

Using the Jacobian bound ||J_F^{-1}(x^k)|| <= beta and splitting the norm:

  ||F(x^{k+1})|| <= (1 - alpha^k) ||F(x^k)|| + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)||^2

which yields a quadratic in the damping coefficient alpha^k.

Newton Method with Limiting: Damped Newton, Theorem Proof Cont.

Simplifying the quadratic from the previous slide:

  ||F(x^{k+1})|| <= [ 1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| ] ||F(x^k)||

Two cases:

  1) (beta^2 A / 2) ||F(x^k)|| < 1/2.  Pick alpha^k = 1 (standard Newton); then

       1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| < 1/2

  2) (beta^2 A / 2) ||F(x^k)|| >= 1/2.  Pick alpha^k = 1 / (beta^2 A ||F(x^k)||) <= 1; then

       1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| = 1 - 1 / (2 beta^2 A ||F(x^k)||) < 1

Newton Method with Limiting: Damped Newton, Theorem Proof Cont. II

Combining the results from the previous slide:

  ||F(x^{k+1})|| <= gamma^k ||F(x^k)||,   gamma^k < 1

This is not good enough: we need a gamma independent of k. The result does imply ||F(x^{k+1})|| <= ||F(x^0)||, but that is not yet a convergence theorem.

For the case (beta^2 A / 2) ||F(x^k)|| >= 1/2, since the iterates never increase ||F||,

  gamma^k = 1 - 1 / (2 beta^2 A ||F(x^k)||) <= 1 - 1 / (2 beta^2 A ||F(x^0)||) < 1

Note the proof technique:
  First, show that the iterates do not increase ||F||.
  Second, use the non-increasing fact to prove convergence.

Newton Method with Limiting: Damped Newton, Nested Iteration

(Same nested algorithm as above; the proof guarantees acceptable alpha^k exist.)

There are many approaches to finding alpha.

Newton Method with Limiting: Damped Newton, Singular Jacobian Problem

[Figure: an f(x) whose local minimum lies above zero; the damped iterates slide into the minimum, where df/dx = 0.]

Damped Newton methods push iterates to local minimums, and therefore find the points where the Jacobian is singular.

Continuation Schemes: Source or Load Stepping, Basic Concepts

Newton converges given a close initial guess, so:
  Generate a sequence of problems.
  Make sure the previous problem generates a guess for the next problem.

Heat-conducting bar example:
  1. Start with the heat off; T = 0 is a very close initial guess.
  2. Increase the heat slightly; T = 0 is still a good initial guess.
  3. Increase the heat again, starting from the previous solution.

Continuation Schemes: Basic Concepts, General Setting

Solve F(x(lambda), lambda) = 0, where:
  a) F(x(0), 0) = 0 is easy to solve: starts the continuation.
  b) F(x(1), 1) = F(x): ends the continuation.
  c) x(lambda) is sufficiently smooth: hard to ensure!

[Figure: x(lambda) versus lambda; a curve that jumps discontinuously is disallowed.]

Continuation Schemes: Basic Concepts, Template Algorithm

  Solve F(x(0), 0) = 0, set x(prev) = x(0)
  dlambda = 0.01, lambda = dlambda
  While lambda < 1 {
      x^0(lambda) = x(prev)                            (initial guess)
      Try to solve F(x(lambda), lambda) = 0 with Newton
      If Newton converged
          x(prev) = x(lambda), lambda = lambda + dlambda, dlambda = 2 dlambda
      Else
          dlambda = dlambda / 2, lambda = lambda(prev) + dlambda
  }
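A hedged Python sketch of the template loop above; newton() is assumed to return a (solution, converged_flag) pair, and the underflow guard is my addition.

    def continuation(F, newton, x_start):
        x_prev, lam, dlam = x_start, 0.0, 0.01
        while lam < 1.0:
            lam = min(lam + dlam, 1.0)
            x, ok = newton(lambda x: F(x, lam), x0=x_prev)
            if ok:
                x_prev, dlam = x, 2.0 * dlam       # accept, grow the step
            else:
                lam, dlam = lam - dlam, dlam / 2   # retreat, shrink the step
                if dlam < 1e-12:
                    raise RuntimeError("continuation step underflow")
        return x_prev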

Continuation Schemes: Basic Concepts, Source/Load Stepping Examples

Source stepping (diode circuit with source Vs and resistor R):

  f(v(lambda), lambda) = i_diode(v) + (v - lambda Vs) / R = 0

  df(v, lambda)/dv = d i_diode(v)/dv + 1/R      Not lambda-dependent!

Load stepping (struts with load f_l):

  f_x(x, y) = 0
  f_y(x, y) + lambda f_l = 0

Source/load stepping does not alter the Jacobian.

Continuation Schemes: Jacobian Altering Scheme, Description

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

Observations:
  lambda = 0:  F(x(0), 0) = x(0) = 0, and dF(x(0), 0)/dx = I.
    The problem is easy to solve and the Jacobian is definitely nonsingular.
  lambda = 1:  F(x(1), 1) = F(x(1)), and dF(x(1), 1)/dx = dF(x(1))/dx.
    Back to the original problem and original Jacobian.

Continuation Schemes: Jacobian Altering Scheme, Basic Algorithm

Same template algorithm as before, except the initial guess for each step is improved:

  x^0(lambda) = x(prev) + ?

Continuation Schemes: Jacobian Altering Scheme, Initial Guess for Each Step

[Figure: the solution curve x(lambda); using x^0(lambda + dlambda) = x(lambda) leaves an initial-guess error equal to the distance the curve moves over dlambda.]

Continuation Schemes: Jacobian Altering Scheme, Update Improvement

Expanding along the solution curve:

  0 = F(x(lambda + dlambda), lambda + dlambda)
    = F(x(lambda), lambda)
      + [dF(x(lambda), lambda)/dx] (x(lambda + dlambda) - x(lambda))
      + [dF(x(lambda), lambda)/dlambda] dlambda   (approximately)

Since F(x(lambda), lambda) = 0, a better guess for the next step's Newton is

  x^0(lambda + dlambda) = x(lambda)
      - [dF(x(lambda), lambda)/dx]^{-1} [dF(x(lambda), lambda)/dlambda] dlambda

where dF/dx is available from the last step's Newton. Moreover, if

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

then

  dF(x, lambda)/dlambda = F(x) - x(lambda)

which is easily computed.

[Figure: the improved guess x^0(lambda + dlambda) follows the tangent of the solution curve rather than staying at x(lambda).]

Continuation Schemes: Jacobian Altering Scheme, Still Can Have Problems

[Figure: a folded solution curve x(lambda); pure lambda stepping must switch from increasing to decreasing lambda around a fold, and then back to increasing lambda, while arc-length steps follow the curve through the folds.]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps?

Parametrize by arc length along the curve instead of by lambda, and solve for lambda too. Each step must satisfy

  F(x, lambda) = 0
  ||x - x(prev)||_2^2 + (lambda - lambda(prev))^2 = (arc)^2

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps by Newton

Apply Newton to the augmented system; the iteration matrix is bordered:

  [ dF(x^k, lambda^k)/dx           dF(x^k, lambda^k)/dlambda ] [ x^{k+1} - x^k           ]
  [ 2 (x^k - x(prev))^T            2 (lambda^k - lambda(prev)) ] [ lambda^{k+1} - lambda^k ]

    = - [ F(x^k, lambda^k)
          ||x^k - x(prev)||_2^2 + (lambda^k - lambda(prev))^2 - (arc)^2 ]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Turning Point

[Figure: a fold (turning point) of x(lambda).] What happens there? The upper-left block dF/dx of the bordered matrix is singular, yet the bordered matrix as a whole can remain nonsingular, so arc-length Newton can step through the fold.

Summary
  Damped Newton schemes
    Globally convergent if the Jacobian is nonsingular
    Difficulty with singular Jacobians
  Introduce continuation schemes
    Problem with source/load stepping
    More general continuation scheme
  Improving efficiency
    Better first guess for each continuation step
  Arc-length continuation

Introduction to Simulation - Lecture 11

Newton-Method Case Study: Simulating an Image Smoother
Jacob White

Thanks to Deepak Ramaswamy, Andrew Lumsdaine, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Image segmentation example
    Large nonlinear system of equations
    Formulation? Continuation? Linear solver?
  Newton iterative methods
    Accuracy theorem
    Matrix-free idea
  Gershgorin circle theorem
    Lends insight on iterative method convergence
  Arc-length continuation

Simple Smoother

[Figure: an input image passes through the nonlinear smoother and emerges as a smoothed output image.]

Nonlinear Smoother: Circuit Diagram

[Figure: a grid of nodes, one per pixel, with nonlinear resistors connecting neighboring nodes; each resistor has an i-v constitutive relation.]

[Figure: the nonlinear resistor constitutive equation i(v), plotted for several values of the parameter beta.]

Questions
  What equation formulation? Node-branch or nodal?
  What Newton method? Standard, damped, or continuation?
    What kind of continuation?
  What linear solver? Sparse Gaussian elimination or Krylov?
    Will Krylov converge rapidly?
  How will the formulation and Newton choices interact?

Newton-Iterative Method: Basic Algorithm, Nested Iteration

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}   (using GCR)
      x^{k+1} = x^k + dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

How accurately should we solve with GCR?

Newton-Iterative Method: Basic Algorithm, Solve Accuracy Required

After l steps of GCR, the computed Newton delta satisfies

  J_F(x^k) dx^{k+1,l} = -F(x^k) + r^{k,l}

where r^{k,l} is the GCR residual.

If
  a)  ||J_F^{-1}(x^k)|| <= beta              (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||     (derivative is Lipschitz continuous)
  c)  ||r^{k,l}|| <= C ||F(x^k)||^2          (more accurate near convergence)
then the Newton-iterative method converges quadratically.

Newton-Iterative Method: Basic Algorithm, Convergence Proof

By definition of the Newton-iterative method, the approximate Newton direction is

  x^{k+1} - x^k = -J_F^{-1}(x^k)(F(x^k) - r^{k,l})

Multidimensional mean value lemma:

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

Combining:

  ||F(x^{k+1}) - F(x^k) - J_F(x^k)(x^{k+1} - x^k)||
      <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2

Cancelling the Jacobian against its inverse (J_F(x^k)(x^{k+1} - x^k) = -F(x^k) + r^{k,l}):

  ||F(x^{k+1}) - r^{k,l}|| <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2

Combining terms with the triangle inequality:

  ||F(x^{k+1})|| <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2 + ||r^{k,l}||

Using the Jacobian bound and the triangle inequality again:

  ||F(x^{k+1})|| <= (beta^2 A / 2)(||F(x^k)|| + ||r^{k,l}||)^2 + ||r^{k,l}||
Newton-Iterative Method: Basic Algorithm, Convergence Proof Cont.

Using the bound ||r^{k,l}|| <= C ||F(x^k)||^2 on the iterative solver error:

  ||F(x^{k+1})|| <= (beta^2 A / 2)(||F(x^k)|| + C ||F(x^k)||^2)^2 + C ||F(x^k)||^2

and combining terms yields

  ||F(x^{k+1})|| <= [ (beta^2 A / 2)(1 + C ||F(x^k)||)^2 + C ] ||F(x^k)||^2

The bracketed factor is easily bounded near the solution, so convergence is quadratic.

Newton-Iterative Method: Matrix-Free Idea

Consider applying GCR to the Newton iterate equation

  J_F(x^k) dx^{k+1} = -F(x^k)

At each iteration GCR forms a matrix-vector product, which can be approximated by a finite difference of F:

  J_F(x^k) p^l = [ F(x^k + eps p^l) - F(x^k) ] / eps   (approximately)

It is possible to use Newton-GCR without ever forming a Jacobian! One needs to select a good eps.
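A hedged Python sketch of the matrix-free Jacobian-vector product; the step-size rule shown is a common heuristic, not the course's prescription.

    import numpy as np

    def jacvec(F, x, p, eps_base=1e-7):
        """Approximate J_F(x) p by a forward difference of F."""
        eps = eps_base * max(1.0, np.linalg.norm(x)) / max(np.linalg.norm(p), 1e-30)
        return (F(x + eps * p) - F(x)) / eps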

Gershgorin Circle Theorem: Theorem Statement

Given an N x N matrix M with entries m_{i,j}: for each eigenvalue lambda of M there exists an i, 1 <= i <= N, such that

  |lambda - m_{i,i}| <= sum_{j != i} |m_{i,j}|

We say that the eigenvalues are contained in the union of the Gershgorin circles.

Gershgorin Circle Theorem: Picture

[Figure: the complex plane (Re lambda, Im lambda) with the i-th circle centered at m_{i,i} and of radius sum_{j != i} |m_{i,j}|; the eigenvalues lie in the union of all the disks.]

Gershgorin Circle Theorem: Examples

Grounded resistor line, nodal matrix (nodal equation form): each node also has a resistor to ground, so the diagonal exceeds the off-diagonal sums:

  M = [ 2.1  -1
        -1   2.1  -1
                ...
                   -1   2.1 ]     (N x N)

Every disk is centered at 2.1 with radius at most 2, so all eigenvalues lie in [0.1, 4.1]: bounded away from zero.

Resistor line without grounding, nodal matrix:

  M = [ 2  -1
        -1  2  -1
              ...
                 -1  2 ]

Here the disks are centered at 2 with radius at most 2, giving only [0, 4]: the bound no longer keeps the eigenvalues away from zero.
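A small Python sketch computing the Gershgorin disk centers and radii for the grounded resistor-line matrix above, and checking that every eigenvalue lands inside some disk; the function names are mine.

    import numpy as np

    def gershgorin(M):
        centers = np.diag(M)
        radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
        return centers, radii

    N = 5
    M = 2.1 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # grounded line
    c, r = gershgorin(M)
    lam = np.linalg.eigvals(M)
    # every eigenvalue is inside at least one disk
    print(all(np.min(np.abs(l - c) - r) <= 1e-12 for l in lam))   # True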


Continuation Schemes: Basic Concepts, General Setting (Review)

Solve F(x(lambda), lambda) = 0, where:
  a) F(x(0), 0) = 0 is easy to solve: starts the continuation.
  b) F(x(1), 1) = F(x): ends the continuation.
  c) x(lambda) is sufficiently smooth: hard to ensure!

[Figure: a discontinuous x(lambda) is disallowed.]

Continuation Schemes: Basic Concepts, Template Algorithm (Review)

  Solve F(x(0), 0) = 0, set x(prev) = x(0)
  dlambda = 0.01, lambda = dlambda
  While lambda < 1 {
      x^0(lambda) = x(prev)
      Try to solve F(x(lambda), lambda) = 0 with Newton
      If Newton converged
          x(prev) = x(lambda), lambda = lambda + dlambda, dlambda = 2 dlambda
      Else
          dlambda = dlambda / 2, lambda = lambda(prev) + dlambda
  }

Continuation Schemes: Jacobian Altering Scheme, Description (Review)

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

  lambda = 0:  F(x(0), 0) = x(0) = 0, dF/dx = I: easy to solve, Jacobian definitely nonsingular.
  lambda = 1:  F(x(1), 1) = F(x(1)), dF/dx = dF/dx: back to the original problem and original Jacobian.

The basic algorithm is the same template, with the improved initial guess x^0(lambda) = x(lambda(prev)) + ? for each step.

Continuation Schemes: Jacobian Altering Scheme, Still Can Have Problems (Review)

[Figure: a folded x(lambda) curve; pure lambda stepping must switch from increasing to decreasing lambda and back, while arc-length steps follow the curve.]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps (Review)

Solve for x and lambda together, constraining the step length along the curve:

  F(x, lambda) = 0
  ||x - x(prev)||_2^2 + (lambda - lambda(prev))^2 = (arc)^2

Arc-length steps by Newton use the bordered system

  [ dF(x^k, lambda^k)/dx        dF(x^k, lambda^k)/dlambda   ] [ x^{k+1} - x^k           ]
  [ 2 (x^k - x(prev))^T         2 (lambda^k - lambda(prev)) ] [ lambda^{k+1} - lambda^k ]

    = - [ F(x^k, lambda^k)
          ||x^k - x(prev)||_2^2 + (lambda^k - lambda(prev))^2 - (arc)^2 ]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Turning Point (Review)

[Figure: at a fold of x(lambda) the upper-left block dF/dx of the bordered matrix is singular, but the bordered matrix as a whole can remain nonsingular.]

Summary
  Image segmentation example
    Large nonlinear system of equations
    Examined issues in selecting numerical methods
  Newton iterative methods
    Do not need to solve the iteration equations exactly
  Gershgorin circle theorem
    Sometimes gives useful bounds on eigenvalues
  Arc-length continuation

Introduction to Simulation - Lecture 12

Methods for Ordinary Differential Equations
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Initial value problem examples
    Signal propagation (circuits with capacitors)
    Space frame dynamics (struts and masses)
    Chemical reaction dynamics
  Investigate the simple finite-difference methods
    Forward-Euler, Backward-Euler, Trap Rule
    Look at the approximations and algorithms
    Examine properties experimentally
  Analyze convergence for Forward-Euler
Application Problems: Signal Transmission in an Integrated Circuit

[Figure: a signal wire runs from one logic gate to another over a ground plane; the wire has resistance, and the wire and ground plane form a capacitor.]

Metal wires carry signals from gate to gate. How long is the signal delayed?

Circuit model: constructing the model
  Cut the wire into sections.
  Model wire resistance with resistors.
  Model wire-plane capacitance with capacitors.

Application Problems: Oscillations in a Space Frame

[Figure: a space frame under load.] What is the oscillation amplitude?

Simplified structure (example simplified for illustration): struts bolted together, anchored to ground, carrying a load. Modeling with struts, joints and point masses:
  Replace the metal beams with struts.
  Replace the cargo with a point mass.

Application Problems: Chemical Reaction Dynamics

[Figure: a crucible containing a reagent and "strange green stuff".]
How fast is product produced? Does it explode?

Application Problems: Signal Transmission in an Integrated Circuit, A 2x2 Example

[Figure: two-node RC circuit; C1 and R1 from node v1 to ground, R2 between v1 and v2, C2 and R3 from node v2 to ground.]

Constitutive equations:

  i_c = C dv_c/dt,   i_R = v_R / R

Conservation laws:

  i_C1 + i_R1 + i_R2 = 0
  i_C2 + i_R3 - i_R2 = 0

Nodal equations yield a 2x2 system:

  [ C1  0  ] [ dv1/dt ]      [ 1/R1 + 1/R2     -1/R2        ] [ v1 ]
  [ 0   C2 ] [ dv2/dt ]  = - [ -1/R2           1/R3 + 1/R2  ] [ v2 ]

Let C1 = C2 = 1, R1 = R3 = 10, R2 = 1. Then

  dx/dt = [ -1.1   1.0
             1.0  -1.1 ] x

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ -0.1    0   ] [ 1   1 ]^{-1}
      [ 1  -1 ] [  0    -2.1  ] [ 1  -1 ]

An Aside on Eigenanalysis

Consider an ODE:  dx(t)/dt = A x(t),  x(0) = x0.

Eigendecomposition:

  A = E Lambda E^{-1},   Lambda = diag(lambda_1, ..., lambda_n),   E = [E1 E2 ... En]

Change of variables:  E y(t) = x(t),  so y(t) = E^{-1} x(t).

Substituting:  d(E y(t))/dt = A E y(t),  E y(0) = x0.

Multiplying by E^{-1}:  dy(t)/dt = E^{-1} A E y(t) = Lambda y(t)

An Aside on Eigenanalysis, Continued

The equations decouple:

  dy_i(t)/dt = lambda_i y_i(t)   so   y_i(t) = e^{lambda_i t} y_i(0)

Steps for solving dx(t)/dt = A x(t), x(0) = x0:
  1) Determine E, Lambda
  2) Compute y(0) = E^{-1} x0
  3) Compute y(t) = diag(e^{lambda_1 t}, ..., e^{lambda_n t}) y(0)
  4) x(t) = E y(t)
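A short Python check of the four-step recipe above on the 2x2 circuit example (A and the initial condition are from the slides; the variable names are mine).

    import numpy as np

    A = np.array([[-1.1, 1.0], [1.0, -1.1]])
    x0 = np.array([1.0, 0.0])                  # v1(0) = 1, v2(0) = 0
    lam, E = np.linalg.eig(A)                  # step 1: eigenpairs
    y0 = np.linalg.solve(E, x0)                # step 2: y(0) = E^{-1} x0

    def x_of_t(t):
        y = np.exp(lam * t) * y0               # step 3: decoupled exponentials
        return E @ y                           # step 4: back to x coordinates

    print(x_of_t(1.0))   # fast mode mostly gone, slow mode still decaying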

Application Problems: Signal Transmission, A 2x2 Example Cont.

[Figure: v1(t) and v2(t) from v1(0) = 1, v2(0) = 0.]

Notice the two-time-scale behavior:
  v1 and v2 come together quickly (fast eigenmode).
  v1 and v2 decay to zero slowly (slow eigenmode).

Application Problems: Struts, Joints and Point Mass Example, A 2x2 Example

[Figure: a strut of rest length y0 carrying a point mass M; displacement u, so y = y0 + u; strut force fs and inertial force fm act on the mass.]

Constitutive equations:

  fm = M d^2u/dt^2,   fs = E Ac (y - y0)/y0 = (E Ac / y0) u

Conservation law:  fs + fm = 0.

Defining v as the velocity (du/dt) yields a 2x2 system:

  [ M  0 ] [ dv/dt ]   [ 0   -E Ac / y0 ] [ v ]
  [ 0  1 ] [ du/dt ] = [ 1       0      ] [ u ]

Let M = 1 and E Ac / y0 = 1:

  dx/dt = [ 0   -1.0
            1.0   0  ] x

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ i    0 ] [ 1   1 ]^{-1}
      [ -i  i ] [ 0   -i ] [ -i  i ]

[Figure: v(t) and u(t) from v(0) = 1, u(0) = 0, oscillating between -1 and 1 over t in [0, 15].]

Note the system has imaginary eigenvalues:
  Persistent oscillation.
  The velocity v peaks when the displacement u is zero.

Application Problems: Chemical Reaction Example, A 2x2 Example

Let the amount of reactant be R and the temperature be T:

  dT/dt = -T + R
  dR/dt = -R + 4T

More reactant causes the temperature to rise; higher temperature increases heat dissipation, causing the temperature to fall. Higher temperature raises the reaction rate; increased reactant interferes with the reaction and slows the rate.

In matrix form:

  d/dt [ T ]    [ -1   1 ] [ T ]
       [ R ]  = [  4  -1 ] [ R ]

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ 1    0 ] [ 1   1 ]^{-1}
      [ 2  -2 ] [ 0   -3 ] [ 2  -2 ]

[Figure: T(t) and R(t) from T(0) = 1, R(0) = 0, growing toward 12 over t in [0, 2.5].]

Note the system has a positive eigenvalue: solutions grow exponentially with time.

Finite Difference Methods: Basic Concepts

First: discretize time into t1, t2, ..., tL, with tL = T and timestep dt.

Second: represent x(t) using values at the tl, i.e. discrete approximations x^1, x^2, ..., x^L with x^l approximating x(tl).

[Figure: exact solution curve with approximate values x^1, ..., x^4 marked at t1, t2, t3, ...]

Third: approximate the derivative d x(tl)/dt using the x^l's, for example

  d x(tl)/dt = (x^l - x^{l-1}) / dt   or   (x^{l+1} - x^l) / dt   (approximately)

Finite Difference Methods: Basic Concepts, Forward Euler Approximation

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the exact slope d x(tl)/dt at the left endpoint.]

  (x(tl+1) - x(tl)) / dt = d x(tl)/dt = A x(tl)   (approximately)

or

  x(tl+1) = x(tl) + dt A x(tl)   (approximately)

Finite Difference Methods: Basic Concepts, Forward Euler Algorithm

  x(t1) approximated by x^1 = x(0) + dt A x(0)
  x(t2) approximated by x^2 = x^1 + dt A x^1
  ...
  x(tL) approximated by x^L = x^{L-1} + dt A x^{L-1}
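A minimal Python sketch of the Forward-Euler algorithm above for dx/dt = A x, run here on the reaction-example matrix; the function names are mine.

    import numpy as np

    def forward_euler(A, x0, dt, T):
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = x + dt * (A @ x)        # explicit update: no equation solve
        return x

    A = np.array([[-1.0, 1.0], [4.0, -1.0]])   # T, R reaction example
    print(forward_euler(A, [1.0, 0.0], dt=0.1, T=2.5))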

Finite Difference Methods: Basic Concepts, Backward Euler Approximation

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the exact slope d x(tl+1)/dt at the right endpoint.]

  (x(tl+1) - x(tl)) / dt = d x(tl+1)/dt = A x(tl+1)   (approximately)

or

  x(tl+1) = x(tl) + dt A x(tl+1)   (approximately)

Finite Difference Methods: Basic Concepts, Backward Euler Algorithm

Each step requires an equation solve (e.g., with Gaussian elimination):

  x(t1) approximated by x^1:  [I - dt A] x^1 = x(0)
  x(t2) approximated by x^2 = [I - dt A]^{-1} x^1
  ...
  x(tL) approximated by x^L = [I - dt A]^{-1} x^{L-1}
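A matching Python sketch of Backward Euler for the linear system dx/dt = A x; since A and dt are constant, the matrix I - dt*A could be factored once, but it is solved each step here for clarity.

    import numpy as np

    def backward_euler(A, x0, dt, T):
        n = len(x0)
        M = np.eye(n) - dt * np.asarray(A)     # I - dt*A
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = np.linalg.solve(M, x)          # implicit update
        return x

    A = np.array([[-1.1, 1.0], [1.0, -1.1]])   # two-time-scale circuit
    print(backward_euler(A, [1.0, 0.0], dt=0.1, T=3.0))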

Finite Difference Methods: Basic Concepts, Trapezoidal Rule

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the average of the exact slopes at the two endpoints.]

  (x(tl+1) - x(tl)) / dt = (1/2) [ d x(tl+1)/dt + d x(tl)/dt ]
                         = (1/2) [ A x(tl+1) + A x(tl) ]   (approximately)

or

  x(tl+1) = x(tl) + (dt/2) A (x(tl+1) + x(tl))   (approximately)

Finite Difference Methods: Basic Concepts, Trapezoidal Rule Algorithm

Each step requires an equation solve:

  x(t1) approximated by x^1:  [I - (dt/2) A] x^1 = [I + (dt/2) A] x(0)
  x(t2) approximated by x^2 = [I - (dt/2) A]^{-1} [I + (dt/2) A] x^1
  ...
  x(tL) approximated by x^L = [I - (dt/2) A]^{-1} [I + (dt/2) A] x^{L-1}
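And a matching Python sketch of the Trapezoidal Rule update for dx/dt = A x.

    import numpy as np

    def trap_rule(A, x0, dt, T):
        n = len(x0)
        A = np.asarray(A)
        Mm = np.eye(n) - 0.5 * dt * A          # I - (dt/2) A
        Mp = np.eye(n) + 0.5 * dt * A          # I + (dt/2) A
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = np.linalg.solve(Mm, Mp @ x)
        return x

    print(trap_rule([[0.0, -1.0], [1.0, 0.0]], [1.0, 0.0], dt=0.1, T=15.0))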

Finite Difference Methods: Basic Concepts, Numerical Integration View

  d x(t)/dt = A x(t)   implies   x(tl+1) = x(tl) + integral from tl to tl+1 of A x(tau) dtau

The three methods are three approximations to the integral:

  dt A x(tl)                           (FE: left boxcar)
  dt A x(tl+1)                         (BE: right boxcar)
  (dt/2) (A x(tl) + A x(tl+1))         (Trap: trapezoid)

Finite Difference Methods: Basic Concepts, Summary

Trap Rule, Forward-Euler, and Backward-Euler are all one-step methods: x^l is computed using only x^{l-1}, not x^{l-2}, x^{l-3}, etc.
  Forward-Euler is simplest: no equation solution, an explicit method; boxcar approximation to the integral.
  Backward-Euler is more expensive: an equation solution each step, an implicit method.
  Trapezoidal Rule might be more accurate: an equation solution each step, an implicit method; trapezoidal approximation to the integral.

Finite Difference Methods: Numerical Experiments, Unstable Reaction

[Figure: dt = 0.1 solutions of the unstable reaction; exact solution versus Backward-Euler, Trap Rule, and Forward-Euler.]

FE and BE results have larger errors than the Trap Rule, and the errors grow with time.

Finite Difference Methods: Numerical Experiments, Unstable Reaction Error Plots

[Figure: Backward-Euler and Forward-Euler errors grow to a few tenths in magnitude; the Trap Rule error stays on a 10^-3 scale.]

All methods have errors which grow exponentially with time.

Finite Difference Methods: Numerical Experiments, Unstable Reaction Convergence

[Figure: log-log plot of maximum error versus timestep for Backward-Euler, Trap Rule, and Forward-Euler.]

For FE and BE, Error is proportional to dt; for Trap, Error is proportional to (dt)^2.

Finite Difference Methods: Numerical Experiments, Oscillating Strut and Mass

[Figure: with dt = 0.1, over t in [0, 30] the Forward-Euler solution grows, the Backward-Euler solution decays, and the Trap Rule preserves the oscillation amplitude.]

Why does the FE result grow, the BE result decay, and the Trap Rule preserve oscillations?

Finite Difference Methods: Numerical Experiments, Two-Timescale RC Circuit

[Figure: Backward-Euler computed solution using a small dt through the fast transient and a large dt for the slow decay.]

With Backward-Euler it is easy to use small timesteps for the fast dynamics and then switch to large timesteps for the slow decay.

[Figure: the Forward-Euler computed solution is accurate for small timesteps but goes unstable when the timestep is enlarged.]

Finite Difference Methods: Numerical Experiments, Summary

Convergence:
  Did the computed solution approach the exact solution?
  Why did the Trap Rule approach faster than BE or FE?
Energy preservation:
  Why did BE produce a decaying oscillation?
  Why did FE produce a growing oscillation?
  Why did the Trap Rule maintain the oscillation amplitude?
Two-time-constant (stiff) problems:
  Why did FE go unstable when the timestep increased?

We will focus on convergence today.

Finite Difference Methods: Convergence Analysis, Convergence Definition

Definition: A finite-difference method for solving initial value problems on [0, T] is convergent if, given any A and any initial condition,

  max over l in [0, T/dt] of ||x^l - x(l dt)|| -> 0  as dt -> 0

[Figure: x^l computed with dt and with dt/2, both approaching x_exact.]

Finite Difference Methods: Convergence Analysis, Order-p Convergence

Definition: A finite-difference method for solving initial value problems on [0, T] is order-p convergent if, given any A and any initial condition,

  max over l in [0, T/dt] of ||x^l - x(l dt)|| <= C (dt)^p

for all dt less than a given dt0.

Forward- and Backward-Euler are order 1 convergent; the Trapezoidal Rule is order 2 convergent.

Finite Difference Methods: Convergence Analysis, Two Conditions for Convergence

1) Local condition: the one-step errors are small (consistency). Typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability). All one-step methods are stable in this sense.

Finite Difference Methods: Convergence Analysis, Consistency Definition

Definition: A one-step method for solving initial value problems on an interval [0, T] is consistent if for any A and any initial condition

  (1/dt) ||x^1 - x(dt)|| -> 0  as dt -> 0

Finite Difference Methods: Convergence Analysis, Consistency for Forward Euler

Forward-Euler definition:  x^1 = x(0) + dt A x(0).

Expanding x in dt about zero yields, for some tau in [0, dt],

  x(dt) = x(0) + dt dx(0)/dt + ((dt)^2 / 2) d^2 x(tau)/dt^2

Noting that dx(0)/dt = A x(0) and subtracting,

  (1/dt) ||x^1 - x(dt)|| = (dt/2) ||d^2 x(tau)/dt^2||

which proves consistency if the derivatives of x are bounded.

Finite Difference Methods: Convergence Analysis, Convergence Analysis for Forward Euler

Forward-Euler definition:  x^{l+1} = x^l + dt A x^l.

Expanding in dt about l dt yields

  x((l+1) dt) = x(l dt) + dt A x(l dt) + e^l

where e^l is the "one-step" error, bounded by

  ||e^l|| <= C (dt)^2,   C = 0.5 max over tau in [0, T] of ||d^2 x(tau)/dt^2||

Subtracting the two equations and defining the "global" error E^l = x^l - x(l dt):

  E^{l+1} = (I + dt A) E^l - e^l

Taking norms and using the bound on e^l:

  ||E^{l+1}|| <= ||I + dt A|| ||E^l|| + C (dt)^2 <= (1 + dt ||A||) ||E^l|| + C (dt)^2

Finite Difference Methods: Convergence Analysis, A Helpful Bound on Difference Equations

Lemma: if  u^{l+1} <= (1 + eps) u^l + b,  with u^0 = 0 and eps > 0,  then

  u^l <= (e^{l eps} - 1)(b / eps)

To prove, first write u^l as a power series and sum:

  u^l <= sum over j = 0 to l-1 of (1 + eps)^j b = [ (1 - (1 + eps)^l) / (1 - (1 + eps)) ] b

To finish, note (1 + eps) <= e^{eps}, so (1 + eps)^l <= e^{l eps}:

  u^l <= [ ((1 + eps)^l - 1) / eps ] b <= (e^{l eps} - 1)(b / eps)

Mapping the global error equation to the lemma:

  ||E^{l+1}|| <= (1 + dt ||A||) ||E^l|| + C (dt)^2
                      eps                    b


Finite Difference Methods: Convergence Analysis, Back to Forward Euler

Applying the lemma and cancelling terms:

  ||E^l|| <= (e^{l dt ||A||} - 1) C (dt)^2 / (dt ||A||)

Finally, noting that l dt <= T:

  max over l in [0, L] of ||E^l|| <= (e^{||A|| T} - 1) (C / ||A||) dt

Observations about the forward-Euler analysis:
  Forward-Euler is order 1 convergent.
  The bound grows exponentially with the time interval.
  C is related to the solution's second derivative.
  The bound grows exponentially fast with ||A||.

Finite Difference Methods: Convergence Analysis, Exact and Forward-Euler Plots for the Unstable Reaction

[Figure: R_exact, R_FE, T_exact, T_FE versus time over [0, 2.5]; the FE curves drift away from the exact ones.]

Forward-Euler errors appear to grow with time.

[Figure: R_exact - R_FE and T_exact - T_FE versus time, growing to about 1.2 over [0, 2.5].]

Note the error grows exponentially with time, as the bound predicts.

Finite Difference Methods: Convergence Analysis, Exact and Forward-Euler Plots for the Circuit

[Figure: v1_exact, v1_FE, v2_exact, v2_FE versus time over [0, 3.5]; the FE curves track the exact ones.]

Forward-Euler errors don't always grow with time.

[Figure: v1_exact - v1_FE and v2_exact - v2_FE versus time, peaking near 0.03 in magnitude and then decaying.]

The error does not always grow exponentially with time! The bound is conservative.

Summary
  Initial value problem examples
    Signal propagation (two time scales)
    Space frame dynamics (oscillator)
    Chemical reaction dynamics (unstable system)
  Looked at the simple finite-difference methods
    Forward-Euler, Backward-Euler, Trap Rule
    Looked at the approximations and algorithms
    Experiments generated many questions
  Analyzed convergence for Forward-Euler
  Many more questions to answer, some next time

Introduction to Simulation - Lecture 13

Convergence of Multistep Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline
  Small-timestep issues for multistep methods
    Local truncation error
    Selecting coefficients
    Nonconverging methods
    Stability + consistency implies convergence
  Next time: investigate large-timestep issues
    Absolute stability for two-time-scale examples
    Oscillators

Multistep Methods: Basic Equations, General Notation

Nonlinear differential equation:

  d x(t)/dt = f(x(t), u(t))

k-step multistep approach:

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

with multistep coefficients alpha_j, beta_j, and the solution represented at the discrete points x^{l-k}, ..., x^{l-1}, x^l on the time discretization t_{l-k}, ..., t_{l-1}, t_l.

Multistep Methods: Basic Equations, Common Algorithms

Forward-Euler approximation:  x(t_l) = x(t_{l-1}) + dt f(x(t_{l-1}), u(t_{l-1}))
FE discrete equation:  x^l - x^{l-1} = dt f(x^{l-1}, u(t_{l-1}))
FE coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 0, beta_1 = 1

BE discrete equation:  x^l - x^{l-1} = dt f(x^l, u(t_l))
BE coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 1, beta_1 = 0

Trap discrete equation:  x^l - x^{l-1} = (dt/2) [ f(x^l, u(t_l)) + f(x^{l-1}, u(t_{l-1})) ]
Trap coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 1/2, beta_1 = 1/2

Multistep Methods: Basic Equations, Definitions and Observations

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

1) If beta_0 != 0 the multistep method is implicit.
2) A k-step multistep method uses k previous x's and f's.
3) A normalization is needed; alpha_0 = 1 is common.
4) A k-step method has 2k + 1 free coefficients.

How does one pick good coefficients? Want the highest accuracy.
Multistep Methods: Simplified Problem for Analysis

Scalar ODE:  dv(t)/dt = lambda v(t),  v(0) = v0,  lambda a complex number

Why such a simple test problem?
  Nonlinear analysis has many unrevealing subtleties.
  Scalar is equivalent to vector for multistep methods: discretizing d x(t)/dt = A x(t),

    sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j A x^{l-j}

  and changing variables with E y(t) = x(t) gives

    sum_{j=0}^{k} alpha_j y^{l-j} = dt sum_{j=0}^{k} beta_j E^{-1} A E y^{l-j}

  which decouples into one scalar recursion per eigenvalue lambda_i of A.

Scalar multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} = dt lambda sum_{j=0}^{k} beta_j v^{l-j}

Must consider ALL lambda:

[Figure: the complex lambda plane; Re(lambda) < 0 gives decaying solutions, Re(lambda) > 0 gives growing solutions, and Im(lambda) != 0 gives oscillations.]

Multistep Methods: Convergence Analysis, Convergence Definition

Definition: A multistep method for solving initial value problems on [0, T] is convergent if, given any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| -> 0  as dt -> 0

[Figure: v^l computed with dt and with dt/2, both approaching v_exact.]

Multistep Methods: Convergence Analysis, Order-p Convergence

Definition: A multistep method for solving initial value problems on [0, T] is order-p convergent if, given any lambda and any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| <= C (dt)^p

for all dt less than a given dt0.

Forward- and Backward-Euler are order 1 convergent; the Trapezoidal Rule is order 2 convergent.

Multistep Methods: Convergence Analysis, Reaction Equation Example

[Figure: log-log maximum error versus timestep for Backward-Euler, Trap Rule, and Forward-Euler.]

For FE and BE, Error is proportional to dt; for Trap, Error is proportional to (dt)^2.

Multistep Methods: Convergence Analysis, Two Conditions for Convergence

1) Local condition: one-step errors are small (consistency). Typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability).
   All one-step (k = 1) methods are stable in this sense.
   Multistep (k > 1) methods require careful analysis.

Multistep Methods: Convergence Analysis, Global Error Equation

Multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} - dt lambda sum_{j=0}^{k} beta_j v^{l-j} = 0

The exact solution Almost satisfies the multistep formula:

  sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = e^l

where e^l is the local truncation error (LTE).

Defining the global error E^l = v(t_l) - v^l and subtracting, a difference equation relates the LTE to the global error:

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Forward-Euler: Convergence Analysis, Consistency

Forward-Euler definition:  v^{l+1} - v^l - dt lambda v^l = 0

Substituting the exact v(t), with dv/dt = lambda v, and expanding:

  v((l+1) dt) - v(l dt) - dt dv(l dt)/dt = ((dt)^2 / 2) d^2 v(tau)/dt^2 = e^l

for some tau in [l dt, (l+1) dt], where e^l is the LTE, bounded by

  |e^l| <= C (dt)^2,   C = 0.5 max over tau in [0, T] of |d^2 v(tau)/dt^2|

Forward-Euler: Convergence Analysis, Global Error Equation

Forward-Euler definition:  v^{l+1} = v^l + dt lambda v^l
Using the LTE definition:  v((l+1) dt) = v(l dt) + dt lambda v(l dt) + e^l

Subtracting yields the global error equation:

  E^{l+1} = (1 + dt lambda) E^l + e^l

Using magnitudes and the bound on e^l:

  |E^{l+1}| <= |1 + dt lambda| |E^l| + |e^l| <= (1 + dt |lambda|) |E^l| + C (dt)^2

Forward-Euler: Convergence Analysis, A Helpful Bound on Difference Equations

Lemma: if  u^{l+1} <= (1 + eps) u^l + b,  with u^0 = 0 and eps > 0,  then

  u^l <= (e^{l eps} - 1)(b / eps)

To prove, write u^l as a power series and sum:

  u^l <= sum over j = 0 to l-1 of (1 + eps)^j b = [ (1 - (1 + eps)^l) / (1 - (1 + eps)) ] b

then note (1 + eps) <= e^{eps}, so (1 + eps)^l <= e^{l eps} and

  u^l <= [ ((1 + eps)^l - 1) / eps ] b <= (e^{l eps} - 1)(b / eps)

Forward-Euler: Convergence Analysis, Back to the Convergence Analysis

Applying the lemma (eps = dt |lambda|, b = C (dt)^2) and cancelling terms:

  |E^l| <= (e^{l dt |lambda|} - 1) C dt / |lambda|

Finally, noting that l dt <= T:

  max over l in [0, L] of |E^l| <= (e^{|lambda| T} - 1) (C / |lambda|) dt

Observations: forward-Euler is order 1 convergent; the bound grows exponentially with the time interval; C is related to the exact solution's second derivative.
Forward-Euler: Convergence Analysis, Exact and Forward-Euler Plots for the Unstable Reaction

[Figure: R_exact, R_FE, T_exact, T_FE versus time; the FE errors appear to grow with time.]

[Figure: R_exact - R_FE and T_exact - T_FE versus time; the error grows exponentially with time, as the bound predicts.]

Forward-Euler: Convergence Analysis, Exact and Forward-Euler Plots for the Circuit

[Figure: v1 and v2, exact and FE, versus time; the errors peak near 0.03 and then decay.]

Forward-Euler errors don't always grow with time, and the error does not always grow exponentially: the bound is conservative.

Multistep Methods: Making LTE Small, Exactness Constraints

Local truncation error:

  e^l = sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j})

Here v can be any solution of dv/dt = lambda v, so consider making the LTE zero on polynomial test functions. If v(t) = t^p, then dv(t)/dt = p t^{p-1}, and (taking l = k, so t_{l-j} = (k - j) dt):

  e^k = sum_{j=0}^{k} alpha_j ((k-j) dt)^p - dt sum_{j=0}^{k} beta_j p ((k-j) dt)^{p-1}

Multistep Methods: Making LTE Small, Exactness Constraints Cont.

Factoring out (dt)^p:

  e^k = (dt)^p [ sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} ]

so if

  sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} = 0

then e^k = 0 for v(t) = t^p. Since any smooth v(t) has a locally accurate Taylor series in t: if the constraint holds for all p <= p0, then

  e^l = sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = C (dt)^{p0+1}

Multistep Methods: Making LTE Small, Exactness Constraint k=2 Example

Exactness constraints:  sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} = 0

For k = 2 this yields a 5x6 system of equations for the coefficients (alpha_0, alpha_1, alpha_2, beta_0, beta_1, beta_2):

  p = 0:    alpha_0 + alpha_1 + alpha_2                              = 0
  p = 1:   2 alpha_0 + alpha_1 - beta_0 - beta_1 - beta_2            = 0
  p = 2:   4 alpha_0 + alpha_1 - 4 beta_0 - 2 beta_1                 = 0
  p = 3:   8 alpha_0 + alpha_1 - 12 beta_0 - 3 beta_1                = 0
  p = 4:  16 alpha_0 + alpha_1 - 32 beta_0 - 4 beta_1                = 0

Note the p = 0 constraint forces sum_j alpha_j = 0, always.

Checking the common methods against the constraints:

  Forward-Euler:  alpha = (1, -1, 0), beta = (0, 1, 0).
    Satisfies p = 0 and p = 1 but not p = 2:  LTE = C (dt)^2.
  Backward-Euler:  alpha = (1, -1, 0), beta = (1, 0, 0).
    Satisfies p = 0 and p = 1 but not p = 2:  LTE = C (dt)^2.
  Trap Rule:  alpha = (1, -1, 0), beta = (0.5, 0.5, 0).
    Satisfies p = 0, 1, and 2 but not p = 3:  LTE = C (dt)^3.

Multistep Methods: Making LTE Small, Generating k=2 Methods

First introduce a normalization, for example alpha_0 = 1, and move its column to the right-hand side; the remaining 5x5 system determines the other coefficients.

Solve for the 2-step method with the lowest LTE:

  alpha_0 = 1, alpha_1 = 0, alpha_2 = -1, beta_0 = 1/3, beta_1 = 4/3, beta_2 = 1/3

Satisfies all five exactness constraints:  LTE = C (dt)^5.

Solve for the 2-step explicit method (beta_0 = 0) with the lowest LTE:

  alpha_0 = 1, alpha_1 = 4, alpha_2 = -5, beta_0 = 0, beta_1 = 4, beta_2 = 2

Can only satisfy four exactness constraints:  LTE = C (dt)^4.
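A hedged numpy sketch reproducing the two coefficient solves above: with alpha_0 = 1 fixed, the exactness constraints become a linear system in (alpha_1, alpha_2, beta_0, beta_1, beta_2); for the explicit method, beta_0 is forced to zero and the p = 4 constraint is dropped. The function name and layout are mine.

    import numpy as np

    def exactness_coeffs(explicit=False, k=2):
        A, rhs = [], []
        for p in range(5):
            # sum_j alpha_j (k-j)^p - p * sum_j beta_j (k-j)^(p-1) = 0
            a = [(k - j) ** p for j in range(k + 1)]
            b = [-p * (k - j) ** (p - 1) if p > 0 else 0.0 for j in range(k + 1)]
            A.append(a[1:] + b)          # unknowns: alpha1, alpha2, beta0..beta2
            rhs.append(-a[0])            # move alpha0 = 1 to the right-hand side
        A, rhs = np.array(A, float), np.array(rhs, float)
        if explicit:
            A, rhs = A[:4], rhs[:4]                  # drop the p = 4 constraint
            sol = np.linalg.solve(np.delete(A, 2, axis=1), rhs)
            return np.insert(sol, 2, 0.0)            # beta0 = 0
        return np.linalg.solve(A, rhs)

    print(exactness_coeffs())               # [0, -1, 1/3, 4/3, 1/3]: best implicit
    print(exactness_coeffs(explicit=True))  # [4, -5, 0, 4, 2]: best explicit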

Multistep Methods: Making LTE Small, LTE Plots for FE, Trap, and Best Explicit (BESTE)

[Figure: log-log LTE versus timestep for dv/dt = lambda v; BESTE sits below Trap, which sits below FE.]

The best explicit method has the highest one-step accuracy.

Multistep Methods: Making LTE Small, Global Error for FE, Trap, and BESTE

[Figure: log-log maximum global error versus timestep on t in [0, 1]; FE and Trap converge. Where's BESTE?]

Multistep Methods: Making LTE Small, Global Error for FE, Trap, and BESTE Cont.

[Figure (worrisome): on a log scale reaching 10^200, BESTE's maximum global error increases as the timestep decreases.]

The best explicit method has the lowest one-step error, but its global error increases as the timestep decreases.

Multistep Methods: Stability of the Method, Difference Equation

Why did the best 2-step explicit method fail to converge?

Multistep method difference equation:

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

We made the LTE so small; how come the global error is so large?

An Aside on Solving Difference Equations

Consider a general kth-order difference equation

  a0 x^l + a1 x^{l-1} + ... + ak x^{l-k} = u^l

which must have k initial conditions, x^0 = x_0, ..., x^{k-1} = x_{k-1}, as is clear when the equation is written in update form:

  x^l = (1/a0)(-a1 x^{l-1} - ... - ak x^{l-k} + u^l)

The most important difference-equation result: x can be related to u by a convolution sum

  x^l = sum_{j=0}^{l} h^{l-j} u^j

If  a0 z^k + a1 z^{k-1} + ... + ak = 0  has distinct roots zeta_1, zeta_2, ..., zeta_k, then

  h^l = sum_{j=1}^{k} gamma_j (zeta_j)^l

To see where h comes from, consider the simple case x^l = zeta x^{l-1} + u^l with x^0 = 0:

  x^1 = u^1,   x^2 = zeta u^1 + u^2,   ...,   x^l = sum_{j=0}^{l} zeta^{l-j} u^j

Three important observations:
  If |zeta_i| < 1 for all i, then |x^l| <= C max_j |u^j|, where C does not depend on l.
  If |zeta_i| > 1 for any i, then there exists a bounded u such that x^l grows without bound.
  If |zeta_i| <= 1 for all i, and each zeta_i with |zeta_i| = 1 is distinct, then |x^l| <= C l max_j |u^j|.

Multistep Methods: Stability of the Method, Stability Definition and Theorem

Multistep method difference equation:

  (alpha_0 - dt lambda beta_0) E^l + ... + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Definition: A multistep method is stable if and only if, as dt -> 0,

  max over l in [0, T/dt] of |E^l| <= C (T/dt) max over l in [0, T/dt] of |e^l|

for any e^l.

Theorem: A multistep method is stable if and only if the roots of

  alpha_0 z^k + alpha_1 z^{k-1} + ... + alpha_k = 0

are either less than one in magnitude, or equal to one in magnitude and distinct.
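A small Python sketch of the root condition in the theorem above, checked on Forward-Euler and on the "best explicit" 2-step method; the function name and tolerance are mine.

    import numpy as np

    def is_stable(alphas, tol=1e-9):
        """alphas = [a0, a1, ..., ak]; test the multistep root condition."""
        roots = np.roots(alphas)
        for i, z in enumerate(roots):
            if abs(z) > 1.0 + tol:
                return False                        # growing mode
            if abs(abs(z) - 1.0) <= tol:            # on the unit circle:
                others = np.delete(roots, i)        # the root must be simple
                if np.any(np.abs(others - z) <= tol):
                    return False
        return True

    print(is_stable([1.0, -1.0]))        # FE: root z = 1, simple -> True
    print(is_stable([1.0, 4.0, -5.0]))   # BESTE: roots 1 and -5 -> False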

Multistep Methods: Stability of the Method, Stability Theorem Proof

Given the multistep method difference equation

  (alpha_0 - dt lambda beta_0) E^l + ... + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

if the roots of  sum_{j=0}^{k} (alpha_j - dt lambda beta_j) z^{k-j} = 0  are either less than one in magnitude, or equal to one in magnitude but distinct, then from the aside on difference equations

  |E^l| <= C l max_l |e^l|

from which stability easily follows.

[Figure: the unit circle in the complex plane with the roots of sum_j alpha_j z^{k-j} = 0 marked; as dt -> 0, the roots of sum_j (alpha_j - dt lambda beta_j) z^{k-j} = 0 move inward to match the dt = 0 polynomial's roots.]

Multistep Methods: Stability of the Method, The BESTE Method

Best explicit 2-step method:

  alpha_0 = 1, alpha_1 = 4, alpha_2 = -5, beta_0 = 0, beta_1 = 4, beta_2 = 2

The roots of z^2 + 4z - 5 = 0 are z = 1 and z = -5.

[Figure: the root at -5 lies far outside the unit circle.]

The method is wildly unstable!

Multistep Methods: Stability of the Method, Dahlquist's First Stability Barrier

For a stable, explicit k-step multistep method, the maximum number of exactness constraints that can be satisfied is less than or equal to k (note there are 2k coefficients). For implicit methods, the number of constraints that can be satisfied is either k + 2 if k is even or k + 1 if k is odd.

Multistep Methods: Convergence Analysis, Conditions for Convergence: Stability and Consistency

1) Local condition: one-step errors are small (consistency).
   Exactness constraints up to p0 (p0 must be > 0):

     max over l in [0, T/dt] of |e^l| <= C1 (dt)^{p0+1}   for dt < dt0

2) Global condition: one-step errors grow slowly (stability).
   The roots of sum_{j=0}^{k} alpha_j z^{k-j} = 0 are inside the unit circle, or simple on it:

     max over l in [0, T/dt] of |E^l| <= C2 (T/dt) max over l in [0, T/dt] of |e^l|

Convergence result (combining the two bounds, since (T/dt)(dt)^{p0+1} = T (dt)^{p0}):

     max over l in [0, T/dt] of |E^l| <= C T (dt)^{p0}

Summary
  Small-timestep issues for multistep methods
    Local truncation error and exactness
    Difference equation stability
    Stability + consistency implies convergence
  Next time
    Absolute stability for two-time-scale examples
    Oscillators
    Maybe Runge-Kutta schemes

Introduction to Simulation - Lecture 14

Multistep Methods II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline
  Small-timestep issues for multistep methods
    Reminder about LTE minimization
    A nonconverging example
    Stability + consistency implies convergence
  Investigate large-timestep issues
    Absolute stability for two-time-scale examples
    Oscillators
Multistep Methods: Basic Equations, General Notation (Review)

Nonlinear differential equation:  d x(t)/dt = f(x(t), u(t))

k-step multistep approach:

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

with multistep coefficients alpha_j, beta_j and the solution at the discrete points t_{l-k}, ..., t_l.

Multistep Methods: Simplified Problem for Analysis (Review)

Scalar ODE:  dv(t)/dt = lambda v(t),  v(0) = v0,  lambda a complex number

Scalar multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} = dt lambda sum_{j=0}^{k} beta_j v^{l-j}

Must consider ALL lambda: decaying solutions (Re lambda < 0), growing solutions (Re lambda > 0), and oscillations (Im lambda != 0).

Multistep Methods: Convergence Analysis (Review)

Definition: A multistep method for solving initial value problems on [0, T] is convergent if, given any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| -> 0  as dt -> 0

Two conditions for convergence:
1) Local condition: one-step errors are small (consistency), typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability).
   Multistep (k > 1) methods require careful analysis.

Multistep Methods: Convergence Analysis, Global Error Equation (Review)

Multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} - dt lambda sum_{j=0}^{k} beta_j v^{l-j} = 0

The exact solution Almost satisfies the multistep formula:

  sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = e^l

with e^l the local truncation error (LTE). With the global error E^l = v(t_l) - v^l, the difference equation relating LTE to global error is

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Multistep Methods: Making LTE Small, Exactness Constraints (Review)

The LTE cannot be made zero for every solution of dv/dt = lambda v, so use polynomial test functions. If v(t) = t^p, then dv(t)/dt = p t^{p-1} and

  e^k = sum_{j=0}^{k} alpha_j ((k-j) dt)^p - dt sum_{j=0}^{k} beta_j p ((k-j) dt)^{p-1}

Multistep Methods

Making LTE Small


Exactness Constraint k=2
Example

k
k
p
p 1
Exactness Constraints: j ( k j ) j p ( k j ) = 0
j =0
j =0

For k=2, yields a 5x6 system of equations for Coefficients


p=0
p=1
p=2
p=3
p=4

1
2

8
16

1
1
1
1
1

1 0
0 1
0 4
0 12
0 32

0
1
2
3
4

0
0 0
1

1
0
2
0 = 0
0
0
0

1
0 0
2

Note
i = 0
Always

Multistep Methods

Making LTE Small

Exactness Constraint k=2 Example: Generating Methods

First introduce a normalization, for example $\alpha_0 = 1$, which turns the homogeneous 5x6 system into a solvable square system.

Solve for the 2-step method with lowest LTE:

$$\alpha_0 = 1,\ \alpha_1 = 0,\ \alpha_2 = -1, \qquad \beta_0 = 1/3,\ \beta_1 = 4/3,\ \beta_2 = 1/3$$

This satisfies all five exactness constraints: LTE $= C (\Delta t)^5$.

Solve for the 2-step explicit method ($\beta_0 = 0$) with lowest LTE:

$$\alpha_0 = 1,\ \alpha_1 = 4,\ \alpha_2 = -5, \qquad \beta_0 = 0,\ \beta_1 = 4,\ \beta_2 = 2$$

An explicit 2-step method can only satisfy four exactness constraints: LTE $= C (\Delta t)^4$.
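To make the constraint machinery concrete, here is a minimal numpy sketch (not from the lecture; names are illustrative) that assembles the k=2 exactness rows for the convention $\sum_j \alpha_j x^{l-j} = \Delta t \sum_j \beta_j f^{l-j}$ and recovers both methods above:

import numpy as np

def exactness_row(p, k=2):
    # Row multiplying (a0, a1, a2, b0, b1, b2) in the p-th constraint.
    nodes = np.array([k - j for j in range(k + 1)], dtype=float)  # (2, 1, 0)
    arow = nodes ** p                       # alpha terms (k-j)^p, with 0^0 = 1
    if p == 0:
        brow = np.zeros(k + 1)              # derivative of a constant is zero
    else:
        brow = -p * nodes ** (p - 1)        # beta terms  -p (k-j)^(p-1)
    return np.concatenate([arow, brow])

# Best implicit 2-step method: normalization a0 = 1 plus constraints p = 0..4.
rows, rhs = [np.array([1., 0, 0, 0, 0, 0])], [1.0]
for p in range(5):
    rows.append(exactness_row(p)); rhs.append(0.0)
print(np.linalg.solve(np.array(rows), np.array(rhs)))  # 1, 0, -1, 1/3, 4/3, 1/3

# Best explicit 2-step method: also force b0 = 0; only p = 0..3 can be met.
rows = [np.array([1., 0, 0, 0, 0, 0]), np.array([0., 0, 0, 1, 0, 0])]
rhs = [1.0, 0.0]
for p in range(4):
    rows.append(exactness_row(p)); rhs.append(0.0)
print(np.linalg.solve(np.array(rows), np.array(rhs)))  # 1, 4, -5, 0, 4, 2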

Making LTE Small

Multistep Methods

LTE Plots for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: log-log plot of LTE versus timestep for $\frac{d}{dt} v(t) = \lambda v(t)$. FE has the shallowest slope, Trap is steeper, and BESTE is steepest: the Best Explicit Method has the highest one-step accuracy.]

Making LTE Small

Multistep Methods

Global Error for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: log-log plot of maximum global error versus timestep for $\frac{d}{dt} v(t) = \lambda v(t)$, $t \in [0, 1]$. FE and Trap converge at their expected rates. Where's BESTE? It is off the top of the plot.]

Multistep Methods

Making LTE Small

Global Error for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: the same plot with the vertical axis extended to $10^{200}$. FE and Trap converge, but the BESTE error grows explosively as the timestep shrinks.]

Worrisome: the Best Explicit Method has the lowest one-step error, but its global error increases as the timestep decreases.

Multistep Methods

Stability of the Method

Difference Equation

Why did the best 2-step explicit method fail to converge?

Multistep Method Difference Equation:

$$(\alpha_0 - \Delta t \lambda \beta_0) E^l + (\alpha_1 - \Delta t \lambda \beta_1) E^{l-1} + \cdots + (\alpha_k - \Delta t \lambda \beta_k) E^{l-k} = e^l$$

with global error $E^l = v(l\Delta t) - v^l$ and LTE $e^l$. We made the LTE so small; how come the global error is so large?

Multistep Methods

Stability of the Method

Stability Definition

Definition: A multistep method is stable if, as $\Delta t \to 0$,

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C(T) \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|$$

where the constant $C(T)$ is interval dependent but independent of $\Delta t$. Stability means: the global error is bounded by a constant times the sum of the LTEs.

Aside on Difference Equations

Convolution Sum and Root Relation

Given a kth-order difference equation with zero initial conditions,

$$a_0 x^l + \cdots + a_k x^{l-k} = u^l, \qquad x^0 = \cdots = x^{k-1} = 0,$$

$x^l$ can be related to the input $u$ by the convolution sum

$$x^l = \sum_{j=0}^{l} h^{l-j} u^j, \qquad h^l = \sum_{q=1}^{Q} \sum_{m=0}^{M_q - 1} \gamma_{q,m}\, l^m (\zeta_q)^l,$$

where the $\zeta_q$ are the roots, with multiplicity $M_q$, of

$$a_0 z^k + a_1 z^{k-1} + \cdots + a_k = 0.$$

Aside on Difference Equations

Bounding the Convolution Sum Terms

$$x^l = \sum_{q=1}^{Q} \sum_{m=0}^{M_q - 1} \underbrace{\sum_{j=0}^{l} \gamma_{q,m} (l-j)^m (\zeta_q)^{l-j} u^j}_{R_{q,m}}$$

If $|\zeta_q| < 1$, then $|R_{q,m}| \le C \max_j |u^j|$, independent of $l$ (bounds distinct and repeated roots inside the circle).

If $\zeta_q$ is a simple (distinct) root with $|\zeta_q| \le 1 + \epsilon$, then $|R_{q,0}| \le C\, l\, (1+\epsilon)^l \max_j |u^j|$.

Multistep Methods

Stability of the Method

Stability Theorem

Theorem: A multistep method is stable if and only if the roots of

$$\alpha_0 z^k + \alpha_1 z^{k-1} + \cdots + \alpha_k = 0$$

either:

1. have magnitude less than one, or

2. have magnitude equal to one and are distinct.
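The root condition is easy to check numerically; the following is a minimal sketch (an assumed helper, not part of the lecture) using numpy.roots on the $\alpha$ polynomial:

import numpy as np

def is_stable(alphas, tol=1e-9):
    # alphas = (a0, ..., ak); roots of a0 z^k + a1 z^(k-1) + ... + ak = 0
    z = np.roots(alphas)
    inside = np.abs(z) < 1 - tol
    on_circle = np.abs(np.abs(z) - 1) <= tol
    on = z[on_circle]
    # roots on the unit circle must be simple (distinct)
    distinct = all(np.sum(np.abs(on - r) <= 1e-6) == 1 for r in on)
    return bool(np.all(inside | on_circle) and distinct)

print(is_stable([1, 0, -1]))   # best implicit alphas: roots +1, -1 -> stable
print(is_stable([1, 4, -5]))   # BESTE alphas: roots +1, -5 -> unstable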

Multistep Methods

Stability of the Method

Stability Theorem Proof

Given the Multistep Method Difference Equation

$$(\alpha_0 - \Delta t \lambda \beta_0) E^l + (\alpha_1 - \Delta t \lambda \beta_1) E^{l-1} + \cdots + (\alpha_k - \Delta t \lambda \beta_k) E^{l-k} = e^l,$$

if, as $\Delta t \to 0$, the roots of $(\alpha_0 - \Delta t \lambda \beta_0) z^k + \cdots + (\alpha_k - \Delta t \lambda \beta_k) = 0$ are either less than one in magnitude or distinct and bounded in magnitude by $1 + \epsilon \Delta t$, $\epsilon > 0$, then from the aside on difference equations,

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C \frac{T}{\Delta t} e^{\epsilon T} \max_{l \in [0,\, T/\Delta t]} |e^l| = C(T) \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|,$$

using $(1 + \epsilon \Delta t)^l \le e^{\epsilon l \Delta t} \le e^{\epsilon T}$.

Multistep Methods

Stability of the Method

Stability Theorem Picture

[Figure: roots of $\sum_{j=0}^{k} (\alpha_j - \Delta t \lambda \beta_j) z^{k-j} = 0$ for a nonzero $\Delta t$, plotted against the unit circle. As $\Delta t \to 0$, the roots move inward to match the roots of the polynomial $\sum_{j=0}^{k} \alpha_j z^{k-j} = 0$.]

Stability of the Method

Multistep Methods

The BESTE Method

Best explicit 2-step method:

$$\alpha_0 = 1,\ \alpha_1 = 4,\ \alpha_2 = -5, \qquad \beta_0 = 0,\ \beta_1 = 4,\ \beta_2 = 2$$

The roots of $z^2 + 4z - 5 = 0$ are $z = 1$ and $z = -5$.

[Figure: the two roots in the complex plane; $z = -5$ lies far outside the unit circle.]

The method is wildly unstable!

Multistep Methods

Stability of the Method

Dahlquist's First Stability Barrier

For a stable, explicit k-step multistep method, the maximum number of
exactness constraints that can be satisfied is less than or equal to k
(note an explicit k-step method has 2k free coefficients after
normalization). For implicit methods, the number of constraints that can
be satisfied is either k+2 if k is even or k+1 if k is odd.

Convergence Analysis

Multistep Methods

Conditions for Convergence: Stability and Consistency

1) Local Condition: One-step errors are small (consistency).

Exactness constraints up to $p_0$ ($p_0$ must be $> 0$) imply

$$\max_{l \in [0,\, T/\Delta t]} |e^l| \le C_1 (\Delta t)^{p_0 + 1} \quad \text{for } \Delta t < \Delta t_0$$

2) Global Condition: One-step errors grow slowly (stability).

Roots of $\sum_{j=0}^{k} \alpha_j z^{k-j} = 0$ inside the unit circle, or on the unit circle and distinct, imply

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C_2 \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|$$

Convergence Result:

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C\, T\, (\Delta t)^{p_0}$$

Multistep Methods

Large Timestep Stability

Two Time-Constant Circuit Example

$$\frac{d}{dt} x(t) = A x(t), \qquad \operatorname{eig}(A) = -2.1,\ -0.1$$

[Figure: Backward-Euler computed solution. Small $\Delta t$ resolves the fast initial transient; large $\Delta t$ tracks the slow decay.]

With Backward-Euler it is easy to use small timesteps for the fast dynamics and then switch to large timesteps for the slow decay.
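A minimal numerical experiment makes the contrast with Forward-Euler vivid. This sketch (not from the lecture) assumes a diagonal test matrix with the eigenvalues quoted above and a deliberately large timestep:

import numpy as np

A = np.diag([-2.1, -0.1])        # assumed diagonal test system, eig = -2.1, -0.1
I = np.eye(2)
dt = 1.0                          # "large" timestep: dt*2.1 > 2 breaks FE
x_fe = np.array([1.0, 1.0])
x_be = np.array([1.0, 1.0])
for _ in range(100):
    x_fe = x_fe + dt * (A @ x_fe)               # Forward Euler update
    x_be = np.linalg.solve(I - dt * A, x_be)    # Backward Euler update
print(x_fe)   # fast mode grows geometrically, |1 - 2.1|^100 is enormous
print(x_be)   # both modes decay, as the continuous system does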

Large Timestep Stability

Multistep Methods

FE on the Two Time-Constant Circuit?

[Figure: Forward-Euler computed solution. It tracks the exact decay while the timestep is small, then oscillates with growing amplitude once the timestep is enlarged.]

Forward-Euler is accurate for small timesteps, but goes unstable when the timestep is enlarged.

Large Timestep Stability

Multistep Methods

FE, BE and Trap on the Scalar ODE Problem

Scalar ODE: $\frac{d}{dt} v(t) = \lambda v(t), \quad v(0) = v_0$

Forward-Euler:

$$v^{l+1} = v^l + \Delta t \lambda v^l = (1 + \Delta t \lambda)\, v^l$$

If $|1 + \Delta t \lambda| > 1$, the solution grows even if $\operatorname{Re}(\lambda) < 0$.

Backward-Euler:

$$v^{l+1} = v^l + \Delta t \lambda v^{l+1} \;\Rightarrow\; v^{l+1} = \frac{1}{1 - \Delta t \lambda}\, v^l$$

If $\left| \frac{1}{1 - \Delta t \lambda} \right| < 1$, the solution decays even if $\operatorname{Re}(\lambda) > 0$.

Trap Rule:

$$v^{l+1} = v^l + 0.5 \Delta t \lambda \left( v^{l+1} + v^l \right) \;\Rightarrow\; v^{l+1} = \frac{1 + 0.5 \Delta t \lambda}{1 - 0.5 \Delta t \lambda}\, v^l$$
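The three amplification factors are one-liners to evaluate; a minimal sketch (illustrative values, not from the lecture):

import numpy as np

def growth_factors(dt, lam):
    # one-step amplification |z| = |v^{l+1} / v^l| for FE, BE, and trap rule
    fe = 1 + dt * lam
    be = 1 / (1 - dt * lam)
    tr = (1 + 0.5 * dt * lam) / (1 - 0.5 * dt * lam)
    return abs(fe), abs(be), abs(tr)

for dt in [0.1, 1.0, 10.0]:
    print(dt, growth_factors(dt, -2.1))   # fast circuit mode lambda = -2.1

At dt = 1.0 the FE factor exceeds 1 (growth) while BE and trap stay below 1, matching the pictures that follow.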

Large Timestep Stability

Multistep Methods

FE Large Timestep Region of Absolute Stability

Forward Euler: $z = 1 + \Delta t \lambda$

[Figure: the ODE stability region $\operatorname{Re}(\lambda) < 0$ in the $\lambda$-plane versus the difference-equation stability region $|z| < 1$ in the $z$-plane. The FE region of absolute stability is the disk $|1 + \Delta t \lambda| < 1$, centered at $\Delta t \lambda = -1$ and reaching $\lambda = -2/\Delta t$ on the real axis.]

Large Timestep Stability

Multistep Methods

FE Large Timestep Stability, Circuit Example

Circuit example with $\Delta t = 0.1$, $\lambda = -2.1, -0.1$:

[Figure: both values of $\Delta t \lambda$ land inside the FE region of absolute stability; the difference equation is stable.]

Circuit example with $\Delta t = 1.0$, $\lambda = -2.1, -0.1$:

[Figure: $\Delta t \lambda = -2.1$ lands outside the FE region of absolute stability; the difference equation is unstable even though the ODE is stable.]

Large Timestep Stability

Multistep Methods

BE Large Timestep Region of Absolute Stability

Backward Euler: $z = \frac{1}{1 - \Delta t \lambda}$

[Figure: the BE region of absolute stability in the $\lambda$-plane is everything outside the disk of radius $1/\Delta t$ centered at $\lambda = 1/\Delta t$; in particular it contains the entire left half plane.]

BE Large Timestep Stability, Circuit Example

Circuit example with $\Delta t = 0.1$, $\lambda = -2.1, -0.1$:

[Figure: both modes map inside the unit circle; the difference equation is stable.]

Circuit example with $\Delta t = 1.0$, $\lambda = -2.1, -0.1$:

[Figure: both modes still map inside the unit circle; the difference equation is stable regardless of timestep.]

Large Timestep Stability

Multistep Methods

Stability Definitions

Region of Absolute Stability for a k-step multistep method: the values of $\Delta t \lambda$ for which the roots of

$$\sum_{j=0}^{k} (\alpha_j - \Delta t \lambda \beta_j)\, z^{k-j} = 0$$

are inside the unit circle.

A-stable: A method is A-stable if its region of absolute stability includes the entire left half of the complex plane.

Dahlquist's Second Stability Barrier: There are no A-stable multistep methods of convergence order greater than 2, and the trap rule is the most accurate.

Multistep Methods

Numerical Experiments: Oscillating Strut and Mass

[Figure: displacement versus time with $\Delta t = 0.1$ over $t \in [0, 30]$. The Forward-Euler result grows, the Backward-Euler result decays, and the trap rule preserves the oscillation.]

Why does the FE result grow, the BE result decay, and the trap rule preserve oscillations?

Large Timestep Stability

Multistep Methods

FE Large Timestep Oscillator Example

Forward Euler: $z = 1 + \Delta t \lambda$

[Figure: for an oscillator $\lambda$ is purely imaginary, so $\Delta t \lambda$ lies outside the FE disk of absolute stability for every timestep; the difference equation is unstable.]

Large Timestep Stability

Multistep Methods

BE Large Timestep Oscillator Example

Backward Euler: $z = \frac{1}{1 - \Delta t \lambda}$

[Figure: a purely imaginary $\Delta t \lambda$ maps strictly inside the unit circle, so the computed oscillation decays.]

Large Timestep Stability

Multistep Methods

Trap Large Timestep Oscillator Example

Trap Rule: $z = \frac{1 + 0.5 \Delta t \lambda}{1 - 0.5 \Delta t \lambda}$

[Figure: the trap rule maps the imaginary axis onto the unit circle, so oscillations are preserved for any timestep.]

Multistep Methods

Large Timestep Issues

Two Time-Constant Stable Problem (Circuit)
FE: stability, not accuracy, limited the timestep size.
BE was A-stable; any timestep could be used.
The trap rule is the most accurate A-stable multistep method.
Oscillator Problem
Forward-Euler generated an unstable difference equation regardless of timestep size.
Backward-Euler generated a stable (decaying) difference equation regardless of timestep size.
The trapezoidal rule mapped the imaginary axis onto the unit circle, preserving oscillations.

Summary
Small timestep issues for multistep methods
Local truncation error and exactness.
Difference equation stability.
Stability + consistency implies convergence.
Investigated large timestep issues
Absolute stability for two time-scale examples.
Oscillators.
Didn't talk about
Runge-Kutta schemes, higher order A-stable methods.

Introduction to Simulation - Lecture 15


Methods for Computing Periodic
Steady-State
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Periodic Steady-state problems
Application examples and simple cases

Finite-difference methods
Formulating large matrices

Shooting Methods
State transition function
Sensitivity matrix

Matrix Free Approach

Periodic Steady-State Basics

Basic Definition

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \qquad (x: \text{state},\ u: \text{input})$$

Suppose the system has a periodic input, $u(t+T) = u(t)$.

[Figure: a periodic input waveform over the intervals $[0,T]$, $[T,2T]$, $[2T,3T]$.]

Many systems eventually respond periodically:

$$x(t+T) = x(t) \quad \text{for } t \gg 0$$

Periodic Steady-State Basics

Basic Definition: Interesting Property

If $x$ satisfies a differential equation which has a unique solution for any initial condition,

$$\frac{dx(t)}{dt} = F(x(t)) + u(t),$$

then if $u$ is periodic with period $T$ and $x(t_0 + T) = x(t_0)$ for some $t_0$, it follows that $x(t+T) = x(t)$ for all $t > t_0$.

Periodic Steady-State Basics

Application Example: Swaying Bridge
Periodic Input: Wind
Response: Oscillating Platform
Desired Info: Oscillation Amplitude

Application Example: Communication Integrated Circuit
Periodic Input: Received Signal at 900 MHz
Response: Filtered, demodulated signal
Desired Info: Distortion

Application Example: Automobile Vibration
Periodic Input: Regularly Spaced Road Bumps
Response: Car Shakes
Desired Info: Shake Amplitude

Periodic Steady-State Basics

Simple Example: RLC Filter, Spring+Mass+Dashpot

[Figure: an RLC circuit and a spring-mass-dashpot system driven by a force; both are described by a second-order ODE.]

$$M \frac{d^2 x}{dt^2} + D \frac{dx}{dt} + x = u(t)$$

With $u(t) = 0$ and light damping ($D \ll M$), the response is a slowly decaying oscillation,

$$x(t) \approx K e^{-\frac{D}{2M} t} \cos(\omega t + \phi).$$

[Figure: the oscillation inside its envelope $\pm K e^{-\frac{D}{2M} t}$.]

A lightly damped system oscillates many times before settling to a steady state.

Periodic Steady-State Basics

Computing Steady State: Frequency Domain Approach

Sinusoidally excited linear time-invariant system:

$$\frac{dx(t)}{dt} = A x(t) + e^{i \omega t} \quad (\text{input})$$

The steady-state solution is simple to determine:

$$x(t) = (i \omega I - A)^{-1} e^{i \omega t}$$

but this is not useful for nonlinear or time-varying systems.

Periodic Steady-State Basics

Computing Steady State: Time Integration Method

Time-integrate until steady state is achieved:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \;\Rightarrow\; x^l = x^{l-1} + \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

Need many timepoints for the lightly damped case!

Aside: Reviewing Integration Methods

Solve with Backward-Euler

Nonlinear System:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \qquad x(0) = x_0 \ (\text{initial condition})$$

Backward Euler equation for timestep $l$:

$$x^l - x^{l-1} = \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

How do we solve the backward-Euler equation?

Aside: Reviewing Integration Methods

Implicit Methods: Backward-Euler Example

Forward-Euler:

$$x^1 = x(0) + \Delta t\, f(x(0), u(0)), \quad x^2 = x^1 + \Delta t\, f(x^1, u(t_1)), \quad \ldots, \quad x^L = x^{L-1} + \Delta t\, f(x^{L-1}, u(t_{L-1}))$$

Requires just function evaluations.

Backward-Euler:

$$x^1 = x(0) + \Delta t\, f(x^1, u(t_1)), \quad x^2 = x^1 + \Delta t\, f(x^2, u(t_2)), \quad \ldots, \quad x^L = x^{L-1} + \Delta t\, f(x^L, u(t_L))$$

Requires a nonlinear equation solution at each step. A stepwise nonlinear equation solution is needed whenever $\beta_0 \ne 0$.

Aside: Reviewing Integration Methods

Implicit Methods: Solution with Newton

Rewrite the multistep equation:

$$\alpha_0 x^l - \Delta t \beta_0 f(x^l, u(t_l)) + \underbrace{\sum_{j=1}^{k} \left[ \alpha_j x^{l-j} - \Delta t \beta_j f(x^{l-j}, u(t_{l-j})) \right]}_{b,\ \text{independent of } x^l} = 0$$

Solve with Newton (here $j$ is the Newton iteration index):

$$\underbrace{\left[ \alpha_0 I - \Delta t \beta_0 \frac{\partial f(x^{l,j}, u(t_l))}{\partial x} \right]}_{\text{Jacobian}} \left( x^{l,j+1} - x^{l,j} \right) = -\underbrace{\left[ \alpha_0 x^{l,j} - \Delta t \beta_0 f(x^{l,j}, u(t_l)) + b \right]}_{F(x^{l,j})}$$

Aside: Reviewing Integration Methods

Implicit Methods: Solution with Newton, Cont.

Solution with Newton is very efficient:

[Figure: previous timepoints $t_{l-k}, \ldots, t_{l-1}$ with a polynomial predictor extrapolating to a good initial guess $x^{l,0}$, which Newton refines to the converged solution $x^l$.]

It is easy to generate a good initial guess using polynomial fitting, and since the Jacobian $\alpha_0 I - \Delta t \beta_0 \frac{\partial f}{\partial x} \to \alpha_0 I$ as $\Delta t \to 0$, it becomes easy to factor for small timesteps.

Boundary-Value Problem

Basic Formulation

[Figure: the differential equation solution over one period, closed by the periodicity constraint.]

N differential equations: $\frac{d}{dt} x_i(t) = F_i(x(t))$
N periodicity constraints: $x_i(T) = x_i(0)$

Finite Difference Methods

Boundary-Value Problem: Linear Example Problem

$$\frac{dx(t)}{dt} = A x(t) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler, $\Delta t = T/L$:

$$x^1 = x^0 + \Delta t \left( A x^1 + u(\Delta t) \right), \quad x^2 = x^1 + \Delta t \left( A x^2 + u(2\Delta t) \right), \quad \ldots, \quad x^L = x^{L-1} + \Delta t \left( A x^L + u(L \Delta t) \right)$$

Periodicity implies $x^0 = x^L$.

Finite Difference Methods

Boundary-Value Problem: Linear Example, Matrix Form

Collecting all L backward-Euler equations, with the periodicity condition filling the top-right corner block, gives an NL x NL system:

$$\begin{bmatrix} \frac{1}{\Delta t} I - A & & & -\frac{1}{\Delta t} I \\ -\frac{1}{\Delta t} I & \frac{1}{\Delta t} I - A & & \\ & \ddots & \ddots & \\ & & -\frac{1}{\Delta t} I & \frac{1}{\Delta t} I - A \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^L \end{bmatrix} = \begin{bmatrix} u(\Delta t) \\ u(2\Delta t) \\ \vdots \\ u(L \Delta t) \end{bmatrix}$$

The matrix is almost lower triangular.
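A minimal sketch of assembling and solving this system, assuming the diagonal two time-constant test matrix and a constant input (all names here are illustrative, not from the lecture):

import numpy as np

def periodic_be_matrix(A, L, dt):
    # Builds the "almost lower triangular" backward-Euler matrix above.
    N = A.shape[0]
    I = np.eye(N)
    M = np.zeros((N * L, N * L))
    for l in range(L):
        M[l*N:(l+1)*N, l*N:(l+1)*N] = I / dt - A        # diagonal block
        prev = (l - 1) % L                              # wraps to the corner
        M[l*N:(l+1)*N, prev*N:(prev+1)*N] += -I / dt    # subdiagonal block
    return M

A = np.diag([-2.1, -0.1])           # assumed test system
L, T = 50, 10.0
M = periodic_be_matrix(A, L, T / L)
u = np.ones(2 * L)                  # constant input samples u(l*dt)
x = np.linalg.solve(M, u)           # all L timepoints of the PSS at once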

Finite Difference Methods

Boundary-Value Problem: Nonlinear Problem

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler (with $x^0 = x^L$):

$$H_{FD}\!\left( \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^L \end{bmatrix} \right) = \begin{bmatrix} x^1 - x^L - \Delta t \left( F(x^1) + u(\Delta t) \right) \\ x^2 - x^1 - \Delta t \left( F(x^2) + u(2\Delta t) \right) \\ \vdots \\ x^L - x^{L-1} - \Delta t \left( F(x^L) + u(L \Delta t) \right) \end{bmatrix} = 0$$

Solve using Newton's method.

Boundary-Value Problem

Shooting Method: Basic Definitions

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and assume $x(t)$ is unique given $x(0)$. The D.E. defines a State-Transition Function

$$\Phi(y, t_0, t_1) \equiv x(t_1), \quad \text{where } x(t) \text{ is the D.E. solution given } x(t_0) = y.$$

Boundary-Value Problem

Shooting Method: State-Transition Function Example

$$\frac{dx(t)}{dt} = \lambda x(t) \;\Rightarrow\; \Phi(y, t_0, t_1) = e^{\lambda (t_1 - t_0)}\, y$$
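To preview how the state-transition function is used, here is a minimal shooting sketch for a scalar forced ODE. The forcing, parameter values, and names are illustrative assumptions, not from the lecture:

import numpy as np

# Seek x(0) with x(T) = x(0) for dx/dt = lam*x + cos(2*pi*t/T).
lam, T, L = -0.2, 1.0, 200
dt = T / L

def phi(x0):
    # state-transition function Phi(x0, 0, T) via backward Euler
    x = x0
    for l in range(1, L + 1):
        x = (x + dt * np.cos(2 * np.pi * l * dt / T)) / (1 - dt * lam)
    return x

x0 = 0.0
for _ in range(5):                            # Newton on H(x0) = phi(x0) - x0
    H = phi(x0) - x0
    dphi = (phi(x0 + 1e-6) - phi(x0)) / 1e-6  # sensitivity by perturbation
    x0 = x0 - H / (dphi - 1.0)
print(x0, phi(x0))                            # at the PSS, phi(x0) == x0

Because the problem is linear, Newton converges in essentially one iteration here; the nonlinear case follows the same pattern.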

Shooting Method

Boundary-Value Problem: Abstract Formulation

Solve

$$H(x(0)) = \underbrace{\Phi(x(0), 0, T)}_{x(T)} - x(0) = 0$$

using Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Boundary-Value Problem

Shooting Method: Computing the Newton Iterate

To compute $\Phi(x(0), 0, T)$, integrate $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ on $[0, T]$. What is $\frac{\partial \Phi(x, 0, T)}{\partial x}$? Since

$$\Phi(x(0) + \Delta x(0), 0, T) \approx x(T) + \frac{\partial x(T)}{\partial x(0)} \Delta x(0),$$

it indicates the sensitivity of $x(T)$ to changes in $x(0)$.

Boundary-Value Problem

Shooting Method: Sensitivity Matrix by Perturbation

$$\frac{\partial \Phi(x, 0, T)}{\partial x} = \begin{bmatrix} \frac{\partial x_1(T)}{\partial x_1(0)} & \cdots & \frac{\partial x_1(T)}{\partial x_N(0)} \\ \vdots & & \vdots \\ \frac{\partial x_N(T)}{\partial x_1(0)} & \cdots & \frac{\partial x_N(T)}{\partial x_N(0)} \end{bmatrix}$$

(each column can be estimated by perturbing one entry of $x(0)$ and re-integrating).

Boundary-Value Problem

Shooting Method: Efficient Sensitivity Evaluation

Differentiate the first step of Backward-Euler with respect to $x(0)$:

$$x^1 - x(0) - \Delta t \left( F(x^1) + u(\Delta t) \right) = 0 \;\Rightarrow\; \left[ I - \Delta t \frac{\partial F(x^1)}{\partial x} \right] \frac{\partial x^1}{\partial x(0)} = I$$

Shooting Method

Boundary-Value Problem: Efficient Sensitivity Matrix, Cont.

Applying the same trick on the l-th step:

$$\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right] \frac{\partial x^l}{\partial x(0)} = \frac{\partial x^{l-1}}{\partial x(0)} \;\Rightarrow\; \frac{\partial \Phi(x, 0, T)}{\partial x} = \prod_{l=1}^{L} \left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right]^{-1}$$

Shooting Method

Boundary-Value Problem: Observations on the Sensitivity Matrix

Newton at each timestep uses the same matrices: the factors in the product above are exactly the timestep Newton Jacobians. The formula simplifies in the linear case:

$$\frac{\partial \Phi(x, 0, T)}{\partial x} = (I - \Delta t A)^{-L}$$

Shooting Method

Matrix-Free Approach: Basic Setup

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and

$$H(x(0)) = \Phi(x(0), 0, T) - x(0) = 0.$$

Use Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Shooting Method

Matrix-Free Approach: Matrix-Vector Product

Solve the Newton equation with a Krylov-subspace method:

$$\underbrace{\left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right]}_{A} \underbrace{\left( x^{k+1} - x^k \right)}_{x} = \underbrace{x^k - \Phi(x^k, 0, T)}_{b}$$

Matrix-vector product computation, with $p^j$ the Krylov method search direction:

$$\frac{\Phi(x^k + \epsilon p^j, 0, T) - \Phi(x^k, 0, T)}{\epsilon} - p^j \;\approx\; \left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right] p^j$$
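A minimal sketch of that finite-difference matrix-vector product (the function phi is assumed to be any routine integrating the ODE over [0, T]; names are illustrative):

import numpy as np

def jacvec(phi, x, p, eps=1e-6):
    # (Phi(x + eps*p) - Phi(x)) / eps  approximates  (dPhi/dx) @ p
    return (phi(x + eps * p) - phi(x)) / eps

def shooting_jacobian_times(phi, x, p):
    # action of the shooting Newton Jacobian J_H = dPhi/dx - I on p
    return jacvec(phi, x, p) - p

Each Krylov iteration therefore costs one extra transient integration and never forms the dense sensitivity matrix.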

Shooting Method

Matrix-Free Approach: Convergence for GCR

Example:

$$\frac{dx}{dt} - A x = 0, \qquad \operatorname{eig}(A) \text{ real and negative}$$

Shooting-Newton Jacobian:

$$\frac{\partial \Phi(x, 0, T)}{\partial x} - I = e^{AT} - I = S \begin{bmatrix} e^{\lambda_1 T} - 1 & & \\ & \ddots & \\ & & e^{\lambda_N T} - 1 \end{bmatrix} S^{-1}$$

The many fast modes cluster at $-1$; only a few slow modes are larger than $-1$, so the Jacobian's eigenvalues are clustered and GCR converges quickly.

Summary
Periodic Steady-state problems
Application examples and simple cases

Finite-difference methods
Formulating large matrices

Shooting Methods
State transition function
Sensitivity matrix

Introduction to Simulation - Lecture 16


Methods for Computing Periodic
Steady-State - Part II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Three Methods so far
Time integration until steady-state achieved
Finite difference methods
Shooting Methods

Shooting Methods
State transition function
Sensitivity matrix
Matrix-Free Approach

Spectral Methods
Galerkin and Collocation Methods

Periodic Steady-State Basics

Basic Definition

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \qquad (x: \text{state},\ u: \text{input})$$

Suppose the system has a periodic input, $u(t+T) = u(t)$. Many systems eventually respond periodically:

$$x(t+T) = x(t) \quad \text{for } t \gg 0$$

Periodic Steady-State Basics

Computing Steady State: Time Integration Method

Time-integrate until steady state is achieved:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \;\Rightarrow\; x^l = x^{l-1} + \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

Need many timepoints for the lightly damped case!

Boundary-Value Problem

Basic Formulation

N differential equations: $\frac{d}{dt} x_i(t) = F_i(x(t))$
N periodicity constraints: $x_i(T) = x_i(0)$

Finite Difference Methods

Boundary-Value Problem: Nonlinear Problem

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler (with $x^0 = x^L$):

$$H_{FD}\!\left( \begin{bmatrix} x^1 \\ \vdots \\ x^L \end{bmatrix} \right) = \begin{bmatrix} x^1 - x^L - \Delta t \left( F(x^1) + u(\Delta t) \right) \\ x^2 - x^1 - \Delta t \left( F(x^2) + u(2\Delta t) \right) \\ \vdots \\ x^L - x^{L-1} - \Delta t \left( F(x^L) + u(L \Delta t) \right) \end{bmatrix} = 0$$

Solve using Newton's method.

Boundary-Value Problem

Shooting Method: Basic Definitions

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and assume $x(t)$ is unique given $x(0)$. The D.E. defines a State-Transition Function

$$\Phi(y, t_0, t_1) \equiv x(t_1), \quad \text{where } x(t) \text{ is the D.E. solution given } x(t_0) = y.$$

Shooting Method

Boundary-Value Problem: Abstract Formulation

Solve $H(x(0)) = \Phi(x(0), 0, T) - x(0) = 0$ using Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Boundary-Value Problem

Shooting Method: Computing the Newton Iterate

To compute $\Phi(x(0), 0, T)$, integrate the D.E. on $[0, T]$. The term $\frac{\partial \Phi(x, 0, T)}{\partial x}$ indicates the sensitivity of $x(T)$ to changes in $x(0)$: it is the N x N matrix of derivatives $\frac{\partial x_i(T)}{\partial x_j(0)}$, which can be estimated column by column by perturbation.

Boundary-Value Problem

Shooting Method: Efficient Sensitivity Evaluation

Differentiating the first backward-Euler step with respect to $x(0)$ gives

$$\left[ I - \Delta t \frac{\partial F(x^1)}{\partial x} \right] \frac{\partial x^1}{\partial x(0)} = I,$$

and applying the same trick on the l-th step,

$$\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right] \frac{\partial x^l}{\partial x(0)} = \frac{\partial x^{l-1}}{\partial x(0)} \;\Rightarrow\; \frac{\partial \Phi(x, 0, T)}{\partial x} = \prod_{l=1}^{L} \underbrace{\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right]^{-1}}_{\text{timestep Newton Jacobian}}$$

Newton at each timestep uses these same matrices, and the formula simplifies in the linear case: $\frac{\partial \Phi(x, 0, T)}{\partial x} = (I - \Delta t A)^{-L}$.

Shooting Method

Matrix-Free Approach: Basic Setup and Matrix-Vector Product

Solve the shooting Newton equation

$$\underbrace{\left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right]}_{A} \underbrace{\left( x^{k+1} - x^k \right)}_{x} = \underbrace{x^k - \Phi(x^k, 0, T)}_{b}$$

with a Krylov-subspace method, computing matrix-vector products by finite differences of the state-transition function; with $p^j$ the Krylov method search direction,

$$\frac{\Phi(x^k + \epsilon p^j, 0, T) - \Phi(x^k, 0, T)}{\epsilon} - p^j \;\approx\; \left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right] p^j$$

Shooting Method

Matrix-Free Approach: Convergence for GCR

Example: $\frac{dx}{dt} - A x = 0$ with $\operatorname{eig}(A)$ real and negative. The shooting-Newton Jacobian is

$$\frac{\partial \Phi(x, 0, T)}{\partial x} - I = e^{AT} - I = S \begin{bmatrix} e^{\lambda_1 T} - 1 & & \\ & \ddots & \\ & & e^{\lambda_N T} - 1 \end{bmatrix} S^{-1}$$

The many fast modes cluster at $-1$; only a few slow modes are larger than $-1$.

Spectral Methods

Fourier Representation: Truncation Approximation

A periodic function has a Fourier series

$$x(t) = \sum_{l=-\infty}^{\infty} X_l\, e^{i 2\pi l t / T}.$$

Approximate the function with a truncated series:

$$x(t) \approx \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T}$$

Spectral Methods

Fourier Representation: Square Wave Example

[Figure: truncated Fourier series approximations of a square wave for increasing numbers of terms. Copyright 1997 by Alan V. Oppenheim and Alan S. Willsky.]

Spectral Methods

Fourier Representation: An Annoyance for Real Functions

For real $x$, the Fourier coefficients come in complex-conjugate pairs:

$$X_{-l} = X_l^*$$

One can rewrite the series with fewer unknowns:

$$x(t) = \underbrace{X_0}_{\text{real}} + \sum_{l=1}^{L} \left( X_l\, e^{i 2\pi l t / T} + X_l^*\, e^{-i 2\pi l t / T} \right)$$

Spectral Methods

Fourier Representation: Orthogonality

Terms in the Fourier series are orthogonal:

$$\int_0^T e^{i 2\pi l t / T}\, e^{-i 2\pi m t / T}\, dt = 0, \qquad l \ne m$$

This gives a simple formula for computing coefficients:

$$\int_0^T e^{-i 2\pi m t / T}\, x(t)\, dt = \int_0^T e^{-i 2\pi m t / T} \sum_{l=-\infty}^{\infty} X_l\, e^{i 2\pi l t / T}\, dt = T X_m$$
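On uniform samples, that coefficient integral becomes a scaled DFT, which the FFT evaluates directly. A minimal sketch with an assumed test signal:

import numpy as np

T, L = 2.0, 8
n = 2 * L + 1
t = np.arange(n) * T / n
x = np.cos(2 * np.pi * t / T) + 0.25 * np.sin(6 * np.pi * t / T)
X = np.fft.fft(x) / n        # X[m] ~ (1/T) * integral of e^{-i2pi m t/T} x(t) dt
print(X[1], X[-1])           # cos term gives X_1 = X_{-1} = 0.5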

Spectral Methods

Fourier Representation: Advantages

For smooth functions (infinitely continuously differentiable), the Fourier coefficients decay exponentially fast:

$$X_m = \frac{1}{T} \int_0^T e^{-i 2\pi m t / T}\, x(t)\, dt = O(c^m) \text{ as } m \to \infty, \text{ for some } 0 < c < 1$$

The representation automatically satisfies periodicity:

$$x(t+T) = \sum_{l=-L}^{L} X_l\, e^{i 2\pi l (t+T) / T} = \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} = x(t)$$

Spectral Methods

Computing Coefficients: Residual

Plug the representation into the differential equation:

$$R(X, t) = \frac{d}{dt} \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) - u(t)$$

Simplify by differentiating the representation term by term:

$$R(X, t) = \sum_{l=-L}^{L} \frac{i 2\pi l}{T}\, X_l\, e^{i 2\pi l t / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) - u(t)$$

Spectral Methods

Computing Coefficients: Collocation and Galerkin

Collocation: residual = 0 at test points:

$$R(X, t_l) = 0, \qquad l = \{1, \ldots, 2L+1\}$$

Galerkin: residual orthogonal to the Fourier terms:

$$\int_0^T e^{-i 2\pi m t / T}\, R(X, t)\, dt = 0, \qquad m \in \{-L, \ldots, 0, \ldots, L\}$$

Spectral Methods

Computing Coefficients: Galerkin Equation

Forcing the residual orthogonal to the Fourier terms, and using orthogonality on the derivative term, gives, for each $m \in \{-L, \ldots, 0, \ldots, L\}$,

$$i 2\pi m\, X_m - \int_0^T e^{-i 2\pi m t / T}\, F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) dt - \int_0^T e^{-i 2\pi m t / T}\, u(t)\, dt = 0$$

Spectral Methods

Computing Coefficients: Linear Galerkin, F(x) = Ax

In the linear case the $F$ integral also collapses by orthogonality,

$$i 2\pi m\, X_m - T A X_m - U_m = 0, \qquad U_m \equiv \int_0^T e^{-i 2\pi m t / T}\, u(t)\, dt,$$

so the Galerkin system is block diagonal: one decoupled block

$$\left( \frac{i 2\pi m}{T} I - A \right) X_m = \frac{1}{T} U_m$$

for each harmonic $m = -L, \ldots, L$.

Spectral Methods

Computing Coefficients: Collocation Equations

Force the residual to zero at the test times $t_j$, $j = \{1, \ldots, 2L+1\}$:

$$R(X, t_j) = \sum_{l=-L}^{L} \frac{i 2\pi l}{T}\, X_l\, e^{i 2\pi l t_j / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t_j / T} \right) - u(t_j) = 0$$

Spectral Methods

Computing Coefficients: Discrete Fourier Transform

Evaluating the truncated series at the $2L+1$ timepoints relates time samples to Fourier coefficients through a matrix of complex exponentials:

$$\begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_{2L+1}) \end{bmatrix} = \begin{bmatrix} e^{-i 2\pi L t_1 / T} & \cdots & e^{i 2\pi L t_1 / T} \\ \vdots & & \vdots \\ e^{-i 2\pi L t_{2L+1} / T} & \cdots & e^{i 2\pi L t_{2L+1} / T} \end{bmatrix} \begin{bmatrix} X_{-L} \\ \vdots \\ X_L \end{bmatrix}$$

This is a Discrete Fourier Transform (DFT). If $t_l = \frac{l}{2L+1} T$, the DFT matrix has orthogonal columns.

Spectral Methods

Computing Coefficients: Collocation Using Timepoints

Changing unknowns from coefficients to time samples turns the collocation system into

$$\underbrace{(DFT)^{-1}\, \operatorname{diag}\!\left( \frac{i 2\pi l}{T} \right) (DFT)}_{\text{spectral differentiation}} \begin{bmatrix} x(t_1) \\ \vdots \\ x(t_{2L+1}) \end{bmatrix} - \begin{bmatrix} F(x(t_1)) \\ \vdots \\ F(x(t_{2L+1})) \end{bmatrix} = \begin{bmatrix} u(t_1) \\ \vdots \\ u(t_{2L+1}) \end{bmatrix}$$

Spectral differentiation: convert the timepoint values into Fourier coefficients, differentiate (multiply coefficient $l$ by $i 2\pi l / T$), and then return to time.
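A minimal sketch of spectral differentiation via the FFT, with assumed test values (an odd number of samples keeps the positive and negative modes paired):

import numpy as np

T, L = 2.0, 10
n = 2 * L + 1
t = np.arange(n) * T / n
x = np.sin(2 * np.pi * t / T)
k = np.fft.fftfreq(n, d=T / n)       # frequencies l/T, in FFT ordering
dx = np.fft.ifft(1j * 2 * np.pi * k * np.fft.fft(x)).real
exact = (2 * np.pi / T) * np.cos(2 * np.pi * t / T)
print(np.max(np.abs(dx - exact)))    # near machine precision for this smooth x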

Spectral Methods

Computing Coefficients: Spectral Differentiation Example

[Figure: rows of the spectral differentiation matrix; middle row shown, with T = 17 and 2L+1 = 17.]

Spectral Methods

Computing Coefficients: Spectral Collocation vs. Finite Difference

A backward-difference discretization replaces the dense spectral differentiation matrix with a sparse bidiagonal one ($1/\Delta t$ on the diagonal, $-1/\Delta t$ on the subdiagonal, and a corner entry enforcing periodicity):

$$\begin{bmatrix} \frac{1}{\Delta t} & & & -\frac{1}{\Delta t} \\ -\frac{1}{\Delta t} & \frac{1}{\Delta t} & & \\ & \ddots & \ddots & \\ & & -\frac{1}{\Delta t} & \frac{1}{\Delta t} \end{bmatrix} \begin{bmatrix} x^1 \\ \vdots \\ x^{2L+1} \end{bmatrix} - \begin{bmatrix} F(x(t_1)) \\ \vdots \\ F(x(t_{2L+1})) \end{bmatrix} = \begin{bmatrix} u(t_1) \\ \vdots \\ u(t_{2L+1}) \end{bmatrix}$$

Spectral collocation uses the dense DFT-based differentiation matrix

$$(DFT)^{-1}\, \operatorname{diag}\!\left( \frac{i 2\pi l}{T} \right) (DFT)$$

in place of the bidiagonal difference operator: a dense system, but with the exponentially accurate derivative of the Fourier representation for smooth periodic solutions.
Summary
Four Methods
Time integration until steady-state achieved
Finite difference methods
Shooting Methods
Spectral Methods

Shooting Methods
State transition function
Sensitivity matrix
Matrix-Free Approach

Spectral Methods
Galerkin and Collocation Methods
SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 19


Laplace's Equation - FEM Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy, Jaime Peraire and Tony Patera

Outline for Poisson Equation Section


Why study Poisson's equation
Heat Flow, Potential Flow, Electrostatics
Raises many issues common to solving PDEs.

Basic Numerical Techniques


basis functions (FEM) and finite-differences
Integral equation methods

Fast Methods for 3-D


Preconditioners for FEM and Finite-differences
Fast multipole techniques for integral equations

SMA-HPC 2003 MIT

Outline for Today


Why Poisson Equation
Reminder about heat conducting bar

Finite-Difference And Basis function methods


Key question of convergence

Convergence of Finite-Element methods


Key idea: solve Poisson by minimization
Demonstrate optimality in a carefully chosen norm

SMA-HPC 2003 MIT

Drag Force Analysis


of Aircraft

Potential Flow Equations


Poisson Partial Differential Equations.

SMA-HPC 2003 MIT

Engine Thermal
Analysis

Thermal Conduction Equations


The Poisson Partial Differential Equation.

SMA-HPC 2003 MIT

Capacitance on a microprocessor Signal Line

Electrostatic Analysis
The Laplace Partial Differential Equation.

SMA-HPC 2003 MIT

Heat Flow

1-D Example

[Figure: a unit-length rod with incoming heat, near-end temperature T(0), and far-end temperature T(1).]

Question: What is the temperature distribution along the bar?

[Figure: temperature T(x) plotted from T(0) to T(1) along the bar.]

Heat Flow

1-D Example: Discrete Representation

1) Cut the bar into short sections.
2) Assign each cut a temperature: $T(0), T_1, T_2, \ldots, T_{N-1}, T_N, T(1)$.

Heat Flow

1-D Example: Constitutive Relation

Heat flow through one section:

$$h_{i+1,i} = \text{heat flow} = \frac{T_{i+1} - T_i}{\Delta x}$$

In the limit as the sections become vanishingly small:

$$\lim_{\Delta x \to 0} h(x) = \frac{\partial T(x)}{\partial x}$$

Heat Flow

1-D Example: Conservation Law

Two adjacent sections, with a control volume around node $i$ and incoming heat $h_s$ per unit length. Heat flows into the control volume sum to zero:

$$h_{i+1,i} - h_{i,i-1} = -h_s \Delta x$$

(heat in from the left, heat out from the right, incoming heat per unit length). In the limit as the sections become vanishingly small:

$$\frac{\partial h(x)}{\partial x} = \frac{\partial^2 T(x)}{\partial x^2} = -h_s(x)$$

Heat Flow

1-D Example: Circuit Analogy

Temperature is analogous to voltage, and heat flow to current. The discretized bar becomes a resistor ladder: resistors $R = \Delta x$ between nodes, current sources $i_s = h_s \Delta x$ into each node, and voltage sources $v_s = T(0)$ and $v_s = T(1)$ pinning the two ends.

Heat Flow

1-D Example: Normalized 1-D Equation

Normalized Poisson equation:

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$
Using Basis Functions

Residual Equation

Partial differential equation form:

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Basis Function Representation:

$$u(x) \approx u_h(x) = \sum_{i=1}^{n} \omega_i\, \varphi_i(x) \qquad (\varphi_i: \text{basis functions})$$

Plug the basis function representation into the equation:

$$R(x) = \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x)$$

Using Basis Functions

Example Basis Functions

$u_h(x)$ is a weighted sum of basis functions, and the basis functions define a space:

$$X_h = \left\{ v \;\middle|\; v = \sum_{i=1}^{n} \alpha_i \varphi_i \text{ for some } \alpha_i\text{'s} \right\}$$

Example: hat basis functions.

[Figure: triangular "hat" functions $\varphi_3, \varphi_4, \varphi_5, \varphi_6$ on a 1-D mesh; their span is the piecewise linear space.]

Using Basis Functions

Basis Weights: Galerkin Scheme

Force the residual to be orthogonal to the basis functions:

$$\int_0^1 \varphi_l(x)\, R(x)\, dx = 0$$

This generates n equations in n unknowns:

$$\int_0^1 \varphi_l(x) \left( \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x) \right) dx = 0, \qquad l \in \{1, \ldots, n\}$$

Using Basis Functions

Basis Weights: Galerkin with Integration by Parts

Integration by parts leaves only first derivatives of the basis functions:

$$-\int_0^1 \frac{d \varphi_l(x)}{dx} \left( \sum_{i=1}^{n} \omega_i \frac{d \varphi_i(x)}{dx} \right) dx + \int_0^1 \varphi_l(x)\, f(x)\, dx = 0, \qquad l \in \{1, \ldots, n\}$$
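With hat basis functions on a uniform mesh, the integrated-by-parts equations assemble into a tridiagonal system. A minimal sketch, assuming the example load f = 1 (names are illustrative):

import numpy as np

n = 20                          # interior nodes
h = 1.0 / (n + 1)
# integral of phi_l' * phi_i' gives the (1/h) * tridiag(-1, 2, -1) stiffness
K = (np.diag(2 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
b = h * np.ones(n)              # integral of phi_l * f for f = 1
w = np.linalg.solve(K, b)       # basis weights; u_h(x_i) = w_i
x = np.linspace(h, 1 - h, n)
print(np.max(np.abs(w - 0.5 * x * (1 - x))))  # exact u = x(1-x)/2; 1-D FEM
                                              # with hats is nodally exact here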

Convergence Analysis

The question is: how does the error $u - u_h$ decrease with refinement?

This time: finite-element methods. Next time: finite-difference methods.

Heat Equation

Convergence Analysis: Overview of FEM

Partial differential equation form:

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Nearly equivalent weak form:

$$\underbrace{\int \frac{\partial u}{\partial x} \frac{\partial v}{\partial x}\, dx}_{a(u,v)} = \underbrace{\int f\, v\, dx}_{l(v)} \quad \text{for all } v$$

This introduces an abstract notation for the equation $u$ must satisfy:

$$a(u, v) = l(v) \quad \text{for all } v$$

Heat Equation

Convergence Analysis: Overview of FEM

Introduce the basis representation $u(x) \approx u_h(x) = \sum_{i=1}^n \omega_i \varphi_i(x)$: a weighted sum of basis functions, defining the space

$$X_h = \left\{ v \;\middle|\; v = \sum_{i=1}^{n} \alpha_i \varphi_i \text{ for some } \alpha_i\text{'s} \right\}$$

(example: hat basis functions spanning the piecewise linear space).

Heat Equation

Convergence Analysis: Overview of FEM, Key Idea

$a(u, u)$ defines a norm: $|||u||| \equiv \sqrt{a(u,u)}$ (here $u$ is restricted to be 0 at 0 and 1!). Using the norm properties, it is possible to show:

If $a(u_h, \varphi_i) = l(\varphi_i)$ for all $i \in \{1, \ldots, n\}$, then

$$\underbrace{|||u - u_h|||}_{\text{solution error}} = \min_{w_h \in X_h} \underbrace{|||u - w_h|||}_{\text{projection error}}$$

Heat Equation

Convergence Analysis: Overview of FEM

The question is only how well you can fit $u$ with a member of $X_h$, but you must measure the error in the $||| \cdot |||$ norm. For piecewise linear basis functions:

$$\underbrace{|||u - u_h|||}_{\text{error}} = O\!\left( \frac{1}{n} \right)$$


Summary
Why Poisson Equation
Reminder about heat conducting bar

Finite-Difference And Basis function methods


Key question of convergence

Convergence of Finite-Element methods


Key idea: solve Poisson by minimization
Demonstrate optimality in a carefully chosen norm

SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 20


Finite-Difference Methods for
Boundary Value Problems
Jacob White

Thanks to Jaime Peraire

Outline
Informal Finite Difference Methods
Heat Conducting Bar

More Formal Analysis of Finite-Difference Methods


Heat Equation
Consistency + Stability yields Convergence

SMA-HPC 2003 MIT

1-D Example

Heat Flow

Discrete Representation

1) Cut the bar into short sections


2) Assign each cut a temperature

T (1)

T (0)
T1

T2

TN 1 TN

1-D Example

Heat Flow

Equation Formulation

With incoming heat $h_s$ and a control volume around node $i$:

$$h_{i+1,i} = \text{heat flow} = \frac{T_{i+1} - T_i}{\Delta x}, \qquad h_{i+1,i} - h_{i,i-1} = -h_s \Delta x$$

(heat in from the left, heat out from the right, incoming heat per unit length). In the limit as the sections become vanishingly small:

$$\frac{\partial h(x)}{\partial x} = \frac{\partial^2 T(x)}{\partial x^2} = -h_s(x)$$

Heat Flow

1-D Example: Normalized 1-D Equation

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$


Summary
Informal Finite Difference Methods
Heat Conducting Bar

More Formal Analysis of Finite-Difference Methods


Heat Equation
Consistency + Stability yields Convergence

Introduction to Simulation - Lecture 21


Boundary Value Problems - Solving 3-D
Finite-Difference problems
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Reminder about FEM and F-D
1-D Example

Finite Difference Matrices in 1, 2 and 3D


Gaussian Elimination Costs

Krylov Method
Communication Lower bound
Preconditioners based on improving communication

Heat Flow

1-D Example: Normalized 1-D Equation

Normalized Poisson equation:

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$

FD Matrix Properties

1-D Poisson: Finite Differences

Discretize $-u_{xx} = f$ at points $x_0, x_1, x_2, \ldots, x_n, x_{n+1}$:

$$-\frac{u_{j+1} - 2u_j + u_{j-1}}{\Delta x^2} = f(x_j)$$

In matrix form:

$$\frac{1}{\Delta x^2} \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} f(x_1) \\ \vdots \\ \vdots \\ f(x_n) \end{bmatrix}$$
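A minimal check of the stencil, assuming the smooth test function sin(pi*x) with zero boundary values (illustrative, not from the lecture):

import numpy as np

m = 100
dx = 1.0 / (m + 1)
A = (np.diag(2 * np.ones(m))
     - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / dx**2
x = np.linspace(dx, 1 - dx, m)
u = np.sin(np.pi * x)                        # u(0) = u(1) = 0
print(np.max(np.abs(A @ u - np.pi**2 * u)))  # -u'' = pi^2 u; O(dx^2) error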

Using Basis Functions

Residual Equation

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Basis function representation: $u(x) \approx u_h(x) = \sum_{i=1}^n \omega_i \varphi_i(x)$, giving the residual

$$R(x) = \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x)$$

Using Basis Functions

Basis Weights: Galerkin Scheme

Force the residual to be orthogonal to the basis functions, $\int_0^1 \varphi_l(x) R(x)\, dx = 0$, generating n equations in n unknowns:

$$\int_0^1 \varphi_l(x) \left( \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x) \right) dx = 0, \qquad l \in \{1, \ldots, n\}$$

Galerkin with integration by parts (only first derivatives of the basis functions):

$$-\int_0^1 \frac{d \varphi_l(x)}{dx} \left( \sum_{i=1}^{n} \omega_i \frac{d \varphi_i(x)}{dx} \right) dx + \int_0^1 \varphi_l(x)\, f(x)\, dx = 0, \qquad l \in \{1, \ldots, n\}$$

Structural Analysis of
Automobiles

Equations
Force-displacement relationships for
mechanical elements (plates, beams, shells)
and sum of forces = 0.
Partial Differential Equations of Continuum
Mechanics

Drag Force Analysis of


Aircraft

Equations
Navier-Stokes Partial Differential Equations.

Engine Thermal
Analysis

Equations
The Poisson Partial Differential Equation.

FD Matrix Properties

2-D Discretized Problem: Discretized Poisson

On an $m \times m$ grid with unknowns numbered $x_1, \ldots, x_m, x_{m+1}, \ldots, x_{2m}, \ldots$:

$$-\underbrace{\frac{u_{j+1} - 2u_j + u_{j-1}}{(\Delta x)^2}}_{u_{xx}} - \underbrace{\frac{u_{j+m} - 2u_j + u_{j-m}}{(\Delta y)^2}}_{u_{yy}} = f(x_j)$$

[Figure: matrix nonzero pattern for a 5x5 grid example; five diagonals.]

FD Matrix Properties

3-D Discretization: Discretized Poisson

The stencil at $x_j$ now also couples to $x_{j \pm m^2}$:

$$-\underbrace{\frac{u_{j+1} - 2u_j + u_{j-1}}{(\Delta x)^2}}_{u_{xx}} - \underbrace{\frac{u_{j+m} - 2u_j + u_{j-m}}{(\Delta y)^2}}_{u_{yy}} - \underbrace{\frac{u_{j+m^2} - 2u_j + u_{j-m^2}}{(\Delta z)^2}}_{u_{zz}} = f(x_j)$$

[Figure: matrix nonzero pattern for an m = 4 example; seven diagonals.]

FD Matrix Properties

Summary: Numerical Properties

The matrix is irreducibly diagonally dominant:

$$|A_{ii}| \ge \sum_{j \ne i} |A_{ij}|$$

Each row is either strictly diagonally dominant, or path-connected to a strictly diagonally dominant row. The matrix is symmetric positive definite. Assuming a uniform discretization $\Delta$, the diagonal is

1-D: $2/\Delta^2$,  2-D: $4/\Delta^2$,  3-D: $6/\Delta^2$.

FD Matrix Properties

Summary: Structural Properties

Matrices in 3-D are LARGE:

1-D: $m \times m$,  2-D: $m^2 \times m^2$,  3-D: $m^3 \times m^3$

(a 100x100x100 grid in 3-D gives a 1 million x 1 million matrix). The matrices are very sparse, with nonzeros per row of 3 in 1-D, 5 in 2-D, and 7 in 3-D. The matrices are banded:

1-D: $A_{ij} = 0$ for $|i-j| > 1$,  2-D: $A_{ij} = 0$ for $|i-j| > m$,  3-D: $A_{ij} = 0$ for $|i-j| > m^2$.

Basics of GE

Triangularizing: Picture

[Figure: a 4x4 matrix; the first GE step zeros $A_{21}, A_{31}, A_{41}$ and updates the trailing 3x3 block, and the process repeats on the remaining block.]

GE Basics

Triangularizing: Algorithm

For i = 1 to n-1 {                  (for each row)
  For j = i+1 to n {                (for each row below the pivot)
    For k = i+1 to n {              (for each element beyond the pivot)
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
    }                               (multiplier A(j,i)/A(i,i), pivot A(i,i))
  }
}

Form n-1 reciprocals (pivots); form

$$\sum_{i=1}^{n-1} (n-i) \approx \frac{n^2}{2} \text{ multipliers};$$

perform

$$\sum_{i=1}^{n-1} (n-i)^2 \approx \frac{n^3}{3} \text{ multiply-adds}.$$
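A minimal runnable sketch of the triangularization loop just described (no pivoting, so it assumes nonzero pivots, e.g. a diagonally dominant matrix):

import numpy as np

def ge_triangularize(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n - 1):                # for each pivot row
        for j in range(i + 1, n):         # for each row below the pivot
            mult = A[j, i] / A[i, i]      # the multiplier A_ji / A_ii
            A[j, i:] -= mult * A[i, i:]   # eliminate; ~ (n-i) mult-adds per row
    return A                              # upper triangular factor

print(ge_triangularize(np.array([[2., -1, 0], [-1, 2, -1], [0, -1, 2]])))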

Complexity of GE

1-D: $O(n^3) = O(m^3)$, a 100-point grid costs $O(10^6)$ ops
2-D: $O(n^3) = O(m^6)$, a 100x100 grid costs $O(10^{12})$ ops
3-D: $O(n^3) = O(m^9)$, a 100x100x100 grid costs $O(10^{18})$ ops

For 2-D and 3-D problems we need a faster solver!

Banded GE

Triangularizing: Algorithm

For a matrix with bandwidth b, elimination never reaches outside the band, so the inner loops shrink:

For i = 1 to n-1 {
  For j = i+1 to min(i+b-1, n) {
    For k = i+1 to min(i+b-1, n) {
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
    }
  }
}

Perform

$$\sum_{i=1}^{n-1} \left( \min(b-1,\, n-i) \right)^2 = O(b^2 n) \text{ multiply-adds}.$$

Complexity of Banded GE

1-D (b = O(1), n = m): $O(b^2 n) = O(m)$, a 100-point grid costs $O(100)$ ops
2-D (b = m, n = m^2): $O(b^2 n) = O(m^4)$, a 100x100 grid costs $O(10^8)$ ops
3-D (b = m^2, n = m^3): $O(b^2 n) = O(m^7)$, a 100x100x100 grid costs $O(10^{14})$ ops

For 3-D problems we still need a faster solver!

The World According to Krylov

Preconditioning

Start with $Ax = b$; form $PAx = Pb$.

Determine the Krylov subspace from $r^0 = Pb - PAx^0$:

$$\text{Krylov subspace} = \operatorname{span}\left\{ r^0, PAr^0, \ldots, (PA)^k r^0 \right\}$$

Select the solution from the Krylov subspace:

$$x^{k+1} = x^0 + y^k, \qquad y^k \in \operatorname{span}\left\{ r^0, PAr^0, \ldots, (PA)^k r^0 \right\}$$

GCR picks a residual-minimizing $y^k$.

Krylov Methods

Preconditioning: Diagonal Preconditioners

Let $A = D + A_{nd}$ (diagonal plus nondiagonal part) and apply GCR to

$$D^{-1} A\, x = \left( I + D^{-1} A_{nd} \right) x = D^{-1} b$$

The inverse of a diagonal is cheap to compute, and this usually improves convergence.

Krylov Methods

Convergence Analysis: Optimality of the GCR Polynomial

GCR optimality property:

$$\| r^{k+1} \| \le \left\| \wp_{k+1}(PA)\, r^0 \right\| \quad \text{where } \wp_{k+1} \text{ is any } (k{+}1)\text{th-order polynomial with } \wp_{k+1}(0) = 1$$

Therefore any polynomial which satisfies the constraints can be used to get an upper bound on $\| r^{k+1} \| / \| r^0 \|$.

[Figure: residual polynomials for the heat-conducting-bar matrix with no loss to air (n = 10). Keep $\wp_k(\lambda_i)$ as small as possible: easier if the eigenvalues are clustered!]

The World According to Krylov

Krylov Vectors for the Diagonally Preconditioned A

For the 1-D discretized PDE, $D^{-1}A$ is the tridiagonal operator with 1 on the diagonal and $-0.5$ on the off-diagonals. If $b = r^0$ is nonzero only in its first entry, each multiplication by $D^{-1}A$ spreads the nonzero pattern by exactly one more grid point: $D^{-1}A\, r^0$ is nonzero in the first two entries, $(D^{-1}A)^2 r^0$ in the first three, and so on, while $x_{exact}$ is nonzero everywhere.

The World According to Krylov

Krylov Vectors: Communication Lower Bound

For m gridpoints, $(D^{-1}A)^k r^0$ first becomes nonzero in the mth entry at $k = m$ multiplications. Since

$$x^{k+1} = x^0 + \sum_{j=0}^{k} \gamma_j (D^{-1}A)^j r^0,$$

at least m iterations are needed before $x^{k+1}$ can be correct in every entry.
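A quick check of this spreading behavior, with assumed sizes (illustrative, not from the lecture):

import numpy as np

m = 8
DA = (np.eye(m)
      - 0.5 * np.diag(np.ones(m - 1), 1)
      - 0.5 * np.diag(np.ones(m - 1), -1))   # D^{-1} A for the 1-D Poisson
r = np.zeros(m); r[0] = 1.0
for k in range(4):
    print(k, np.flatnonzero(r))   # nonzero indices grow by one per multiply
    r = DA @ r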

The World According to Krylov

Two-Dimensional Case

For an $m \times m$ grid, if $r^0$ is nonzero only at one corner, information must cross the grid: it takes $2m = O(m)$ iterations for $x^{k+1}$ to be correct in the opposite corner (entry $m^2$).

The World According to Krylov

Convergence for GCR: Eigenanalysis

For the 1-D problem, $D^{-1}A$ is the tridiagonal operator with 1 on the diagonal and $-0.5$ off the diagonal; recall its eigenvalues are

$$\lambda_k = 1 - \cos\!\left( \frac{k \pi}{m+1} \right), \qquad k = 1, \ldots, m$$

The World According to Krylov

Convergence for GCR: Eigenanalysis

For $D^{-1}A$,

$$\frac{\lambda_{max}}{\lambda_{min}} = \frac{1 - \cos\!\left( \frac{m\pi}{m+1} \right)}{1 - \cos\!\left( \frac{\pi}{m+1} \right)} = O(m^2)$$

The number of GCR iterations needed to achieve $\| r^k \| / \| r^0 \| \le$ tolerance grows like

$$k = O\!\left( \sqrt{\frac{\lambda_{max}}{\lambda_{min}}} \right) \log\!\left( \frac{1}{\text{tolerance}} \right) = O(m)$$

GCR achieves the communication lower bound O(m)!

The World According to Krylov

Work for Banded Gaussian Elimination, Sparse GE and GCR

Dimension | Banded GE | Sparse GE | GCR
1-D       | O(m)      | O(m)      | O(m^2)
2-D       | O(m^4)    | O(m^3)    | O(m^3)
3-D       | O(m^7)    | O(m^6)    | O(m^4)

GCR is faster than banded GE in 2 and 3 dimensions, and it could be faster still: the 3-D matrix has only $m^3$ nonzeros. We must defeat the communication lower bound!

The World According to Krylov

How to Get Faster-Converging GCR

Preconditioning is the only hope: GCR already achieves the communication lower bound for a diagonally preconditioned A. The preconditioner must accelerate communication: multiplying by PA must move values by more than one grid point.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Physical View

1-D discretized PDE, one Gauss-Seidel sweep:

$$u_1^{(new)} \text{ from } u_0, u_2^{(old)}; \quad u_2^{(new)} \text{ from } u_1^{(new)}, u_3^{(old)}; \quad \ldots \quad u_n^{(new)} \text{ from } u_{n-1}^{(new)}, u_{n+1}$$

Because each update uses the already-updated left neighbor, each iteration of Gauss-Seidel relaxation moves data across the entire grid.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Krylov Vectors

[Figure: Krylov vectors of $(D+L)^{-1} A$ for the 1-D discretized PDE starting from $b = r^0$ nonzero only in the first entry: every vector is dense to the right of the first entry, but fills in only one grid point per iteration to the left.]

Gauss-Seidel communicates quickly in only one direction.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Symmetric Gauss-Seidel

Sweep forward ($u_1^{(new)}, u_2^{(new)}, \ldots, u_n^{(new)}$, each using the already-updated left neighbor), then sweep backward ($u_{n-1}^{(newer)}, \ldots, u_1^{(newer)}$, each using the already-updated right neighbor). This symmetric Gauss-Seidel preconditioner communicates in both directions.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Symmetric Gauss-Seidel

Derivation of the SGS iteration equations:

Forward sweep (half step): $(D + L)\, x^{k+1/2} + U x^k = b$
Backward sweep (half step): $(D + U)\, x^{k+1} + L x^{k+1/2} = b$

Eliminating $x^{k+1/2}$:

$$x^{k+1} = x^k - (D+U)^{-1} D (D+L)^{-1} A\, x^k + (D+U)^{-1} D (D+L)^{-1} b$$
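A minimal sketch of applying the SGS preconditioner $M = (D+L) D^{-1} (D+U)$ implied by the iteration above (assumed helper names, dense solves for clarity):

import numpy as np

def sgs_apply(A, r):
    # returns M^{-1} r with M = (D+L) D^{-1} (D+U)
    D = np.diag(np.diag(A))
    Lo = np.tril(A, -1)
    Up = np.triu(A, 1)
    y = np.linalg.solve(D + Lo, r)          # forward sweep
    return np.linalg.solve(D + Up, D @ y)   # backward sweep

A = (np.diag(2 * np.ones(5))
     - np.diag(np.ones(4), 1)
     - np.diag(np.ones(4), -1))
print(sgs_apply(A, np.ones(5)))

In practice the triangular solves are done with sparse forward/back substitution rather than dense solves.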

Preconditioning Approaches

Block Diagonal Preconditioners: Line Schemes

[Figure: a 2-D $m^2$-point grid with one line of unknowns grouped per block; the matrix becomes block diagonal with tridiagonal blocks.]

Tridiagonal matrices factor quickly.

Problem: line preconditioners communicate rapidly in only one direction.

Solution: do lines first in x, then in y. The preconditioner is then two tridiagonal solves, with variable reordering in between.

Preconditioning Approaches

Block Diagonal Preconditioners: Domain Decomposition

Approach: break the domain into small blocks, each with the same number of grid points.

The trade-off: fewer blocks means faster convergence, but more costly iterates. Cutting the $m \times m$ grid into $l^2$ blocks of $\frac{m}{l} \times \frac{m}{l}$ points each, the block factorizations are cheap sparse-GE solves, while the communication bound ties the iteration count to the number of blocks information must cross, suggesting insensitivity of the total work to $l$.

Do you have to refactor every GCR iteration?

Preconditioning Approaches

Seidelized Block Diagonal Preconditioners: Line Schemes

[Figure: the line blocks swept in order, Gauss-Seidel style, so each line solve uses already-updated neighboring lines.]

Preconditioning Approaches

Overlapping Domain Preconditioners: Line-Based Schemes

[Figure: line blocks that overlap their neighbors, so the matrix blocks share unknowns.]

Bigger systems to solve, but can have faster convergence on tougher problems (not just Poisson).

Preconditioning Approaches

Incomplete Factorization Schemes: Outline

Reminder about Gaussian elimination
Computational steps
Fill-in for sparse matrices: greatly increases factorization cost
Fill-in in a 2-D grid
Incomplete factorization idea

Sparse Matrices

Fill-In Example

Eliminating the first variable of a matrix with nonzero structure

$$\begin{bmatrix} X & X & X \\ X & X & 0 \\ X & 0 & X \end{bmatrix}$$

updates the trailing block with the outer product of the first column and first row, turning the zeros at positions (2,3) and (3,2) into nonzeros: fill-ins.

Sparse Matrices

Fill-In: Second Example

Fill-ins propagate: fill-ins from step 1 result in fill-ins in step 2.

Sparse Matrices

Fill-In: Pattern of a Filled-In Matrix

[Figures: a very sparse matrix whose factor acquires a dense trailing block; an unfactored random sparse matrix and the same matrix after factoring. The generated fill-in makes factorization expensive, and the same happens when factoring 2-D and 3-D finite-difference matrices.]

Preconditioning Approaches

Incomplete Factorization Schemes: Key Idea

THROW AWAY FILL-INS!
Throw away all fill-ins, or
throw away only fill-ins with small values, or
throw away fill-ins produced by other fill-ins, or
throw away fill-ins produced by fill-ins of other fill-ins, etc.

Summary
3-D BVP Examples
Aerodynamics, Continuum Mechanics, Heat-Flow

Finite Difference Matrices in 1, 2 and 3D


Gaussian Elimination Costs

Krylov Method
Communication Lower bound
Preconditioners based on improving communication

Introduction to Simulation - Lecture 22


Integral Equation Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Xin Wang and Karen Veroy

Outline
Integral Equation Methods
Exterior versus interior problems
Start with using point sources
Standard Solution Methods in 2-D
Galerkin Method
Collocation Method
Issues in 3-D
Panel Integration
SMA-HPC 2003 MIT

Interior Versus Exterior Problems

Interior: $\nabla^2 T = 0$ inside, temperature known on the surface. Example: temperature in a tank.
Exterior: $\nabla^2 T = 0$ outside, temperature known on the surface. Example: ice cube in a bath.

What is the heat flow?

$$\text{Heat Flow} = \kappa \int_{surface} \frac{\partial T}{\partial n}\, dS \qquad (\kappa = \text{thermal conductivity})$$

Exterior Problem in Electrostatics

$\nabla^2 \psi = 0$ outside; the potential $\psi$ is given on the surface.

What is the capacitance?

$$\text{Capacitance} = \varepsilon \int_{surface} \frac{\partial \psi}{\partial n}\, dS \qquad (\varepsilon = \text{dielectric permittivity})$$

Drag Force in a Microresonator

[Figures: resonator, discretized structure, and computed forces (bottom and top views). Courtesy of Werner Hemmert, Ph.D. Used with permission.]

What is common about these problems?

Exterior problems
Drag force in MEMS device - fluid (air) creates drag.
Coupling in a package - fields in the exterior create coupling.
Capacitance of a signal line - fields in the exterior.
Quantities of interest are on the surface
MEMS device - just want the surface traction force.
Package - just want the coupling between conductors.
Signal line - just want the surface charge.
Exterior problem is linear and space-invariant
MEMS - exterior Stokes flow equation (linear).
Package - Maxwell's equations in free space (linear).
Signal line - Laplace's equation in free space (linear).

But the problems are geometrically very complex!

Exterior Problems

Why Not Use Finite-Difference or FEM Methods?

2-D heat flow example: a finite-difference or FEM mesh must fill the entire exterior. Only $\frac{\partial T}{\partial n}$ on the surface is needed, but $T$ is computed everywhere. Worse, the mesh must be truncated: the condition $T(\infty) = 0$ becomes $T(R) = 0$ at some finite radius $R$.
Green's Function for Laplace's Equation

In 2-D: if

$$u = \log \sqrt{(x - x_0)^2 + (y - y_0)^2},$$

then $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$ for all $(x, y) \ne (x_0, y_0)$.

In 3-D: if

$$u = \frac{1}{\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}},$$

then $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$ for all $(x, y, z) \ne (x_0, y_0, z_0)$.

Proof: just differentiate and see!
Laplace's Equation in 2-D

Simple Idea

$u$ is given on the surface, and $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$ outside. Let

$$u = \log \sqrt{(x - x_0)^2 + (y - y_0)^2}$$

for a source point $(x_0, y_0)$ inside the surface; the equation holds everywhere outside. Problem solved? No: it does not match the boundary conditions!

Laplace's Equation in 2-D

Simple Idea: More Points

$$u = \sum_{i=1}^{n} \alpha_i \log \sqrt{(x - x_i)^2 + (y - y_i)^2} = \sum_{i=1}^{n} \alpha_i\, G(x - x_i, y - y_i)$$

for source points $(x_1, y_1), \ldots, (x_n, y_n)$ inside the surface. Pick the $\alpha_i$'s to match the boundary conditions!

Laplace's Equation in 2-D

Simple Idea: More Points Equations

Source strengths are selected to give the correct potential at test points $(x_{t_1}, y_{t_1}), \ldots, (x_{t_n}, y_{t_n})$ on the surface:

$$\begin{bmatrix} G(x_{t_1} - x_1, y_{t_1} - y_1) & \cdots & G(x_{t_1} - x_n, y_{t_1} - y_n) \\ \vdots & & \vdots \\ G(x_{t_n} - x_1, y_{t_n} - y_1) & \cdots & G(x_{t_n} - x_n, y_{t_n} - y_n) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} = \begin{bmatrix} \psi(x_{t_1}, y_{t_1}) \\ \vdots \\ \psi(x_{t_n}, y_{t_n}) \end{bmatrix}$$

Computational Results Using the Points Approach

[Figure: charges on a circle of radius r = 9.5 inside the boundary circle of radius R = 10; computed potentials on the circle for n = 20 and n = 40.]
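A minimal numerical sketch of this point-source approach, using the slide's radii; the angular offset of the test points and the unit boundary potential are assumptions for illustration:

import numpy as np

n = 40
ang = 2 * np.pi * np.arange(n) / n
src = 9.5 * np.exp(1j * ang)                 # source locations (complex plane)
tst = 10.0 * np.exp(1j * (ang + np.pi / n))  # boundary test points, offset
G = np.log(np.abs(tst[:, None] - src[None, :]))  # 2-D Green's function matrix
alpha = np.linalg.solve(G, np.ones(n))       # strengths matching psi = 1
p = 10.0 * np.exp(1j * 0.123)                # another boundary point
print(np.sum(alpha * np.log(np.abs(p - src))))   # should be close to 1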

Laplace's Equation in 2-D

Integral Formulation: Limiting Argument

We want to smear the point charges onto the surface. This results in an integral equation:

$$\psi(x) = \int_{surface} G(x, x')\, \sigma(x')\, dS'$$

How do we solve the integral equation?

Laplace's Equation in 2-D

Basis Function Approach: Basic Idea

Represent the density with basis functions:

$$\sigma(x) = \sum_{i=1}^{n} \sigma_i\, \varphi_i(x)$$

Example basis: represent the circle with straight lines and assume $\sigma$ is constant along each line. The basis functions live on the surface: they approximate the density, and may also approximate the geometry.

Geometric approximation is not new; triangles for 2-D FEM approximate the circle too! With the approximate surface,

$$\psi(x) = \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS'$$

Laplace's Equation in 2-D

Basis Function Approach: Piecewise Constant Straight Sections Example

1) Pick a set of n points on the surface.
2) Define a new surface by connecting the points with n lines $l_1, \ldots, l_n$.
3) Define $\varphi_i(x) = 1$ if $x$ is on line $l_i$; otherwise $\varphi_i(x) = 0$.

$$\psi(x) = \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS' = \sum_{i=1}^{n} \sigma_i \int_{line\ l_i} G(x, x')\, dS'$$

How do we determine the $\sigma_i$'s?

Laplace's Equation in 2-D

Basis Function Approach: Residual Definition and Minimization

$$R(x) \equiv \psi(x) - \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS'$$

We will pick the $\sigma_i$'s to make $R(x)$ small. General approach: pick a set of test functions $\phi_1, \ldots, \phi_n$ and force $R(x)$ to be orthogonal to the set:

$$\int \phi_i(x)\, R(x)\, dS = 0 \quad \text{for all } i.$$

Different methods come from choosing the $\phi_1, \ldots, \phi_n$:

Collocation: $\phi_i(x) = \delta(x - x_{t_i})$ (point-matching)
Galerkin Method: $\phi_i(x) = \varphi_i(x)$ (basis = test)

Laplace's Equation in 2-D

Basis Function Approach: Collocation

With $\phi_i(x) = \delta(x - x_{t_i})$ (point-matching),

$$\int \delta(x - x_{t_i})\, R(x)\, dS = R(x_{t_i}) = 0 \;\Rightarrow\; \psi(x_{t_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{\substack{approx \\ surface}} G(x_{t_i}, x')\, \varphi_j(x')\, dS'}_{A_{i,j}}$$

$$\begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{bmatrix} \begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_n \end{bmatrix} = \begin{bmatrix} \psi(x_{t_1}) \\ \vdots \\ \psi(x_{t_n}) \end{bmatrix}$$

Laplace's Equation in 2-D

Basis Function Approach: Centroid Collocation for Piecewise Constant Bases

Put the collocation point at each line's center:

$$\psi(x_{t_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{line\ j} G(x_{t_i}, x')\, dS'}_{A_{i,j}}$$

Laplace's Equation in 2-D

Basis Function Approach: Centroid Collocation Generates a Nonsymmetric A

$$A_{1,2} = \int_{line\ 2} G(x_{t_1}, x')\, dS' \;\ne\; \int_{line\ 1} G(x_{t_2}, x')\, dS' = A_{2,1}$$

Laplace's Equation in 2-D

Basis Function Approach: Galerkin

With test = basis, $\phi_i(x) = \varphi_i(x)$:

$$\underbrace{\int \varphi_i(x)\, \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int\! \int_{\substack{approx \\ surface}} G(x, x')\, \varphi_i(x)\, \varphi_j(x')\, dS'\, dS}_{A_{i,j}}, \qquad A \sigma = b$$

If $G(x, x') = G(x', x)$ then $A_{i,j} = A_{j,i}$: A is symmetric.

Laplace's Equation in 2-D

Basis Function Approach: Galerkin for Piecewise Constant Bases

$$\underbrace{\int_{line_i} \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{line_i} \int_{line_j} G(x, x')\, dS'\, dS}_{A_{i,j}}$$

3-D Laplace's Equation

Basis Function Approach: Piecewise Constant Basis

Integral equation:

$$\psi(x) = \int_{surface} \frac{1}{\| x - x' \|}\, \sigma(x')\, dS'$$

Discretize the surface into panels and represent

$$\sigma(x) \approx \sum_{i=1}^{n} \sigma_i \varphi_i(x), \qquad \varphi_j(x) = \begin{cases} 1 & x \text{ on panel } j \\ 0 & \text{otherwise} \end{cases}$$

3-D Laplace's Equation

Basis Function Approach: Centroid Collocation

Put the collocation points $x_{c_i}$ at the panel centroids:

$$\psi(x_{c_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{panel\ j} \frac{1}{\| x_{c_i} - x' \|}\, dS'}_{A_{i,j}}, \qquad A \sigma = \begin{bmatrix} \psi(x_{c_1}) \\ \vdots \\ \psi(x_{c_n}) \end{bmatrix}$$

3-D Laplace's Equation

Basis Function Approach: Calculating Matrix Elements

One-point quadrature approximation:

$$A_{i,j} \approx \frac{\text{Panel Area}}{\| x_{c_i} - x_{centroid_j} \|}$$

Four-point quadrature approximation:

$$A_{i,j} \approx \sum_{q=1}^{4} \frac{0.25 \cdot \text{Panel Area}}{\| x_{c_i} - x_{point_q} \|}$$

3-D Laplace's Equation

Basis Function Approach: Calculating the Self-Term

One-point quadrature fails for $A_{i,i}$: it gives Panel Area divided by $\| x_{c_i} - x_{c_i} \| = 0$. But

$$A_{i,i} = \int_{panel\ i} \frac{1}{\| x_{c_i} - x' \|}\, dS' \text{ is an integrable singularity.}$$

3-D Laplace's Equation

Basis Function Approach: Calculating the Self-Term, Tricks of the Trade

Integrate in two pieces, using a disk of radius R surrounding the collocation point:

$$A_{i,i} = \int_{disk} \frac{1}{\| x_{c_i} - x' \|}\, dS' + \int_{rest\ of\ panel} \frac{1}{\| x_{c_i} - x' \|}\, dS'$$

The disk integral contains the singularity but has an analytic formula:

$$\int_{disk} \frac{1}{\| x_{c_i} - x' \|}\, dS' = \int_0^{2\pi}\!\!\int_0^{R} \frac{1}{r}\, r\, dr\, d\theta = 2\pi R$$

Other tricks of the trade:
1) If the panel is a flat polygon, analytical formulas exist.
2) Curved panels can be handled with projection.

3-D Laplace's Equation

Basis Function Approach: Galerkin (test = basis)

$$\underbrace{\int \varphi_i(x)\, \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int\!\int \varphi_i(x)\, G(x, x')\, \varphi_j(x')\, dS'\, dS}_{A_{i,j}}$$

For the piecewise constant basis:

$$\underbrace{\int_{panel\ i} \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{panel\ i} \int_{panel\ j} \frac{1}{\| x - x' \|}\, dS'\, dS}_{A_{i,j}}, \qquad A \sigma = b$$

3-D Laplace's Equation

Basis Function Approach: Problem with the Dense Matrix

Integral equation methods generate huge dense matrices: every $A_{i,j}$ is nonzero. Gaussian elimination is much too slow!

Summary
Integral Equation Methods
Exterior versus interior problems
Start with using point sources
Standard Solution Methods
Collocation Method
Galerkin Method
Next Time Fast Solvers
Use a Krylov-Subspace Iterative Method
Compute MV products Approximately

Introduction to Simulation - Lecture 23


Fast Methods for Integral Equations
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Solving Discretized Integral Equations
Using Krylov Subspace Methods
Fast Matrix-Vector Products
Multipole Algorithms
Multipole Representation.
Basic Hierarchy
Algorithmic Improvements
Local Expansions
Adaptive Algorithms
Computational Results

Exterior Problem in Electrostatics

[Figure: a conductor held at a potential by a voltage source v.]

\nabla^2 \psi = 0 outside; \psi is given on the surface -- a Dirichlet problem.

First Kind Integral Equation for the charge:

\underbrace{\psi(x)}_{\text{potential}} = \int_{surface} \underbrace{\frac{1}{\|x - x'\|}}_{\text{Green's function}}\,\underbrace{\sigma(x')}_{\text{charge density}}\,dS'

Drag Force in a Microresonator

[Figures: resonator photograph, discretized structure, and computed forces
(bottom and top views). Courtesy of Werner Hemmert, Ph.D. Used with permission.]


Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
The kth step of GCR

Compute A p_k.   (For discretized integral equations, A is dense.)

\alpha_k = \frac{(r^k)^T (A p_k)}{(A p_k)^T (A p_k)}   -- determine the optimal stepsize in the kth search direction.

x^{k+1} = x^k + \alpha_k p_k, \quad r^{k+1} = r^k - \alpha_k A p_k   -- update the solution and the residual.

p_{k+1} = r^{k+1} - \sum_{j=0}^{k} \frac{(A r^{k+1})^T (A p_j)}{(A p_j)^T (A p_j)}\,p_j   -- compute the new orthogonalized search direction.
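A minimal Matlab sketch of this iteration is given below. It stores normalized search directions, which is an algebraically equivalent way to carry out the orthogonalization above; the function name and stopping rule are illustrative assumptions.

function x = gcr(A, b, tol, maxit)
% Minimal GCR sketch: minimizes ||b - A*x|| over the growing search space.
n = length(b); x = zeros(n,1); r = b;
P = zeros(n,0); AP = zeros(n,0);           % search directions and A*directions
for k = 1:maxit
  p = r; Ap = A*p;                         % candidate direction from residual
  for j = 1:size(AP,2)                     % orthogonalize Ap against prior Ap_j
    beta = AP(:,j)'*Ap;
    p = p - beta*P(:,j);
    Ap = Ap - beta*AP(:,j);
  end
  nrm = norm(Ap); p = p/nrm; Ap = Ap/nrm;  % so that (Ap_k)'*(Ap_k) = 1
  alpha = r'*Ap;                           % optimal step in this direction
  x = x + alpha*p;
  r = r - alpha*Ap;
  P(:,k) = p; AP(:,k) = Ap;
  if norm(r) <= tol*norm(b), break; end
end
end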

Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
Complexity of GCR

Compute A p_k:  the dense matrix-vector product costs O(n^2).
\alpha_k = (r^k)^T(A p_k) / (A p_k)^T(A p_k):  vector inner products, O(n).
x^{k+1} = x^k + \alpha_k p_k,  r^{k+1} = r^k - \alpha_k A p_k:  vector adds, O(n).
p_{k+1} = r^{k+1} - \sum_j \cdots p_j:  O(k) inner products, total cost O(nk).

The algorithm is O(n^2) for integral equations, even though the number of
iterations k is small!

Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
Fast Matrix-Vector Products

Exactly compute A p_k:  the dense matrix-vector product costs O(n^2).
Approximately compute A p_k:  reduces the matrix-vector product cost to
O(n) or O(n log n).


Summary
Solving Discretized Integral Equations
GCR plus Fast Matrix-Vector Products
Multipole Algorithms
Multipole Representation.
Basic Hierarchy
Algorithmic Improvements
Local Expansions
Adaptive Algorithms
Computational Results
Precorrected-FFT Algorithms

Introduction to Simulation - Lectures 17, 18

Molecular Dynamics
Nicolas Hadjiconstantinou

Molecular Dynamics
Molecular dynamics is a technique for computing the equilibrium and
non-equilibrium properties of classical* many-body systems.
* The nuclear motion of the constituent particles obeys the laws of
classical mechanics (Newton).
References:
1) Computer Simulation of Liquids, M.P. Allen & D.J. Tildesley,
   Clarendon, Oxford, 1987.
2) Understanding Molecular Simulation: From Algorithms to Applications,
   D. Frenkel and B. Smit, Academic Press, 1997.
3) Moldy manual

Moldy
A free and easy to use molecular dynamics simulation package can be found
at the CCP5 program library (http://www.ccp5.ac.uk/librar.shtml), under the
name Moldy. At this site a variety of very useful information as well as
molecular simulation tools can be found.
Moldy is easy to use and comes with a very well written manual which can
help as a reference. I expect the homework assignments to be completed using
Moldy.

Why Do We Need Molecular Dynamics?
Similar to real experiments:
1. Allows us to study and understand material behavior so that we can model it.
2. Tells us what the answer is when we do not have models.

Example: the diffusion equation. Consider the fluxes F|_x and F|_{x+dx} through
a small volume dx dy dz, where n = number density and F = flux.

Conservation of mass:
\frac{d}{dt}(n\,dx\,dy\,dz) = (F|_x - F|_{x+dx})\,dy\,dz

Expanding F|_{x+dx} in a Taylor series,
\frac{d}{dt}(n\,dx\,dy\,dz) \approx \left[F|_x - \left(F|_x + \frac{\partial F}{\partial x}\,dx + \frac{\partial^2 F}{\partial x^2}\,\frac{dx^2}{2}\right)\right]dy\,dz
= -\frac{\partial F}{\partial x}\,dx\,dy\,dz - \frac{\partial^2 F}{\partial x^2}\,\frac{dx^2}{2}\,dy\,dz

In the limit dx \to 0,
\frac{\partial n}{\partial t} + \frac{\partial F}{\partial x} = 0

This equation cannot be solved unless a relation between n and F is provided.
Experiments or consideration of molecular behavior show that, under a variety
of conditions,
F = -D\,\frac{\partial n}{\partial x}
so that
\frac{\partial n}{\partial t} = D\,\frac{\partial^2 n}{\partial x^2}
-- the diffusion equation!

Breakdown of the linear-gradient constitutive law
The law F = -D \partial n/\partial x fails for:
- Large gradients
- Far-from-equilibrium gaseous flows:
  - Shockwaves
  - Small-scale flows (high Knudsen number flow)
  - Rarefied flows (low density) (high Knudsen number flow)

High Knudsen number flows (gases):
Kn is defined as the ratio of the molecular mean-free path to a characteristic
lengthscale. The molecular mean-free path is the average distance traveled by
molecules between collisions; collisions tend to restore equilibrium.
- When Kn << 1, particle motion is diffusive (near equilibrium).
- When Kn >> 1, particle motion is ballistic (far from equilibrium).
- For 0.1 < Kn < 10, the physics is transitional (hard to model).

Example: re-entry vehicle aerobraking maneuver in the upper atmosphere.
In the upper atmosphere, density is low (the collision rate is low):
- Long mean-free path
- High Knudsen number flows are typical

Other high Knudsen number flows:
- Small-scale flows (the mean-free path of air molecules at atmospheric
  pressure is approximately 60 nanometers)
- Vacuum science (low pressure)

[Figure from Dr. M. Gallis of Sandia National Laboratories]

Brief Intro to Statistical Mechanics
Statistical mechanics provides the theoretical connection between the
microscopic description of matter (e.g. positions and velocities of molecules)
and the macroscopic description, which uses observables such as pressure,
density, and temperature.
This is achieved through a statistical approach and the concept of an ensemble
average. An ensemble is a collection of a large number (M) of identical systems
evolving in time under the same macroscopic conditions but different
microscopic initial conditions.
Let M(i) be the number of such systems in state i. Then \wp(i) = M(i)/M can be
interpreted as the probability of finding an ensemble member in state i.

Macroscopic properties (observables) are then calculated as weighted averages,
A = \sum_i \wp(i)\,A(i)
or, in the continuous limit,
A = \int \wp(\Gamma)\,A(\Gamma)\,d\Gamma.
One of the fundamental results of statistical mechanics is that the probability
of a state of a system with energy E, in equilibrium at a fixed temperature T,
is governed by
\wp(E) \propto \exp\!\left(-\frac{E}{kT}\right)
where k is Boltzmann's constant.
For non-equilibrium systems, solving a problem reduces to the task of
calculating \wp(\Gamma). Molecular methods are similar to experiments: rather
than solving for \wp(\Gamma), we measure A directly.

A = \sum_i \wp(i)\,A(i) implies that, given an ensemble of systems, any
observable A can be measured by averaging the value of this observable over all
systems.
However, in real life we do not use a large number of systems to do
experiments. We usually observe one system over some period of time.
This is because we use the ergodic hypothesis:
- Since there is a one-to-one correspondence between the initial
  conditions of a system and the state at some other time, averaging
  over a large number of initial conditions is equivalent to averaging
  over time-evolved states of the system.
- The ergodic hypothesis allows us to convert the averaging from
  ensemble members to time instances of the same system. THIS IS
  AN ASSUMPTION THAT SEEMS TO WORK WELL MOST OF
  THE TIME.

A Simplified MD Program Structure


Initialize:
-Read run parameters (initial temperature, number of timesteps,
density, number of particles, timestep)
-Generate or read initial configuration (positions and velocities
of particles)
Loop in timestep increments until t = tfinal
-Compute forces on particles
-Integrate equations of motion
-If t > tequilibrium, sample system
Output results

Equations of Motion
Newton's equations: for i = 1, \ldots, N,

m_i\,\frac{d^2 \vec{r}_i}{dt^2} = \vec{F}_i = -\nabla_{\vec{r}_i} U(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)

U(\vec{r}_1, \ldots, \vec{r}_N) = potential energy of the system
= \sum_i U_1(\vec{r}_i) + \sum_i \sum_{j>i} U_2(\vec{r}_i, \vec{r}_j) + \sum_i \sum_{j>i} \sum_{k>j>i} U_3(\vec{r}_i, \vec{r}_j, \vec{r}_k) + \ldots

U_1(\vec{r}_i) = external field
U_2(\vec{r}_i, \vec{r}_j) = pair interaction = U_2(r_{ij}), with r_{ij} = |\vec{r}_i - \vec{r}_j|
U_3(\vec{r}_i, \vec{r}_j, \vec{r}_k) = three-body interaction (expensive to calculate)

For this reason, typically
U \approx \sum_i U_1(\vec{r}_i) + \sum_i \sum_{j>i} U_2^{eff}(r_{ij})
where U_2^{eff} includes some of the effects of the three-body interactions.

The Lennard-Jones Potential
One of the simplest potentials used is the Lennard-Jones potential, typically
used to simulate simple liquids and solids:

U(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]

\epsilon is the well depth [energy]; \sigma is the interaction lengthscale.
- Very repulsive for r < \sigma
- Potential minimum at r = 2^{1/6}\,\sigma
- Weak attraction (~ 1/r^6) for r > 2^{1/6}\,\sigma

[Figure: the Lennard-Jones potential (U) and force (F) as a function of
separation (r), for \epsilon = \sigma = 1.]
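A small Matlab sketch that reproduces such a plot in reduced units (base MATLAB only; the plotting range is an arbitrary choice):

% Sketch: Lennard-Jones potential and force in reduced units (eps = sigma = 1).
r = linspace(0.9, 3.5, 500);
U = 4*(r.^-12 - r.^-6);              % potential
F = 24*(2*r.^-13 - r.^-7);           % force, F = -dU/dr
plot(r, U, r, F); grid on;
xlabel('r'); legend('U(r)', 'F(r)');
axis([0.5 3.5 -2 2]);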

Reduced Units
- What is the evaporation temperature of a Lennard-Jones liquid?
- What is an appropriate timestep for integration of the equations of motion
  of Lennard-Jones particles?
- What is the density of a liquid/gas made up of Lennard-Jones molecules?

Number density:  \rho^* = \rho\,\sigma^3
Temperature:     T^* = kT/\epsilon
Pressure:        P^* = P\,\sigma^3/\epsilon
Time:            t^* = t \left/ \sqrt{m\,\sigma^2/\epsilon}\right.

In these units, numbers have physical significance:
- Results are valid for all Lennard-Jones molecules.
- It is easier to spot errors (10^{-32} must be wrong!).
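For instance, the 120 K argon runs in the homework can be converted to reduced units using literature Lennard-Jones parameters for argon; the numerical constants below are quoted as assumptions, not taken from these notes:

% Sketch: argon quantities in reduced units.
sigma = 3.405e-10;                   % m (assumed literature value)
epsk  = 119.8;                       % epsilon/k in K (assumed literature value)
kB    = 1.380649e-23;                % J/K
mAr   = 6.63e-26;                    % kg, mass of one argon atom
Tstar = 120/epsk;                    % reduced temperature of a 120 K run
tau   = sigma*sqrt(mAr/(kB*epsk));   % LJ time unit, roughly 2e-12 s
fprintf('T* = %.3f, time unit = %.3g s\n', Tstar, tau);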

Integration Algorithms
An integration algorithm should:
a) Be fast and require little memory
b) Permit the use of a long timestep
c) Duplicate the classical trajectory as closely as possible
d) Satisfy conservation of momentum and energy, and be time-reversible
e) Be simple and easy to program

Discussion of Integration Algorithms
a) Not very important: by far the most expensive part of the simulation is
   calculating the forces.
b) Very important: for a given total simulation time, the longer the timestep,
   the smaller the number of force evaluation calls.
c) Not very important: no numerical algorithm will provide the exact solution
   for long times (nearby trajectories deviate exponentially in time). Recall
   that MD is a method for obtaining time averages over all initial conditions
   under prescribed macroscopic constraints. Thus conserving momentum and
   energy is more important.
d) Very important (see c).
e) Important: there is no need for complexity when no speed gains are possible.

The Verlet Algorithm
One of the most popular methods for at least the first few decades of MD:

\vec{r}(t + \Delta t) = 2\vec{r}(t) - \vec{r}(t - \Delta t) + \Delta t^2\,\vec{a}(t), \qquad \vec{a}(t) = \frac{\vec{F}(t)}{m}

Derivation: Taylor series expansion of \vec{r}(t) about time t:
\vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \frac{1}{2}\Delta t^2\,\vec{a}(t) + \ldots
\vec{r}(t - \Delta t) = \vec{r}(t) - \Delta t\,\vec{V}(t) + \frac{1}{2}\Delta t^2\,\vec{a}(t) + \ldots
(adding the two expansions eliminates the velocity terms)

ADVANTAGES
1) Very compact and simple to program
2) Excellent energy conservation properties (helped by time-reversibility)

3) Time reversible: \vec{r}(t + \Delta t) \leftrightarrow \vec{r}(t - \Delta t)
4) Local error O(\Delta t^4)

DISADVANTAGES
1) Awkward handling of velocities:
   \vec{V}(t) = \frac{\vec{r}(t + \Delta t) - \vec{r}(t - \Delta t)}{2\Delta t}
   a) Need the \vec{r}(t + \Delta t) solution before getting \vec{V}(t)
   b) Error for velocities is O(\Delta t^2)
2) May be sensitive to truncation error, because in
   \vec{r}(t + \Delta t) = 2\vec{r}(t) - \vec{r}(t - \Delta t) + \Delta t^2\,\vec{a}(t)
   a small number is added to the difference of two large numbers.
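To make the update concrete, here is a minimal Matlab sketch (hypothetical parameters, not Moldy) that advances a small one-dimensional Lennard-Jones chain with the Verlet formula above, including the central-difference velocity estimate:

function verlet_demo
% Sketch: plain Verlet for a 1-D Lennard-Jones chain in reduced units.
N = 10; dt = 0.005; nsteps = 2000;
x = (1:N)'*1.1;                        % lattice start avoids particle overlap
v = 0.05*randn(N,1);
xold = x - dt*v;                       % bootstrap r(t - dt)
for step = 1:nsteps
  xnew = 2*x - xold + dt^2*accel(x);   % r(t+dt) = 2r(t) - r(t-dt) + dt^2 a(t)
  v = (xnew - xold)/(2*dt);            % O(dt^2) velocity estimate
  xold = x; x = xnew;
end
fprintf('kinetic energy per particle = %g\n', mean(v.^2)/2);
end

function a = accel(x)
% Pairwise Lennard-Jones acceleration (unit masses), O(N^2) double loop.
N = numel(x); a = zeros(N,1);
for i = 1:N-1
  for j = i+1:N
    r = x(j) - x(i);                   % assumes the chain order is preserved
    f = 24*(2/r^13 - 1/r^7);           % force on j due to i
    a(j) = a(j) + f; a(i) = a(i) - f;
  end
end
end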

Improvements To The Verlet Algorithm
Beeman Algorithm:

\vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \Delta t^2\,\frac{4\vec{a}(t) - \vec{a}(t - \Delta t)}{6}

\vec{V}(t + \Delta t) = \vec{V}(t) + \Delta t\,\frac{2\vec{a}(t + \Delta t) + 5\vec{a}(t) - \vec{a}(t - \Delta t)}{6}

- The coordinates are equivalent to the Verlet algorithm.
- \vec{V} is more accurate than Verlet.

Predictor Corrector Algorithms
Basic structure:
a) Predict positions, velocities, and accelerations at t + \Delta t.
b) Evaluate accelerations from the new positions and velocities (if forces are
   velocity dependent).
c) Correct the predicted positions, velocities, and accelerations using the
   new accelerations.
d) Go to (a).
Although these methods can be very accurate, the nature of MD simulations is
not well suited to them. The reason is that any prediction and correction which
does not take into account the motion of the neighbors is unreliable.

The concept of such a method is demonstrated here by the modified Beeman
algorithm, which handles velocity-dependent forces (discussed later):

a) \vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \frac{\Delta t^2}{6}\left[4\vec{a}(t) - \vec{a}(t - \Delta t)\right]
b) \vec{V}^P(t + \Delta t) = \vec{V}(t) + \frac{\Delta t}{2}\left[3\vec{a}(t) - \vec{a}(t - \Delta t)\right]
c) \vec{a}^P(t + \Delta t) = \frac{1}{m}\,\vec{F}\!\left(\vec{r}(t + \Delta t), \vec{V}(t + \Delta t)\right)
d) \vec{V}^C(t + \Delta t) = \vec{V}(t) + \frac{\Delta t}{6}\left[2\vec{a}(t + \Delta t) + 5\vec{a}(t) - \vec{a}(t - \Delta t)\right]
e) Replace \vec{V}^P with \vec{V}^C and go to (c).

If there are no velocity-dependent forces, this reduces to the Beeman method
discussed above.

Periodic Boundary Conditions
Periodic boundary conditions are very popular: they reduce surface effects.

[Figure adapted from Computer Simulation of Liquids, by M.P. Allen &
D.J. Tildesley, Oxford Science Publications, 1987.]

- Today's computers can easily treat N > 1000, so artifacts from small systems
  with periodic boundary conditions are limited.
- Equilibrium properties are unaffected.
- Long wavelengths are not possible.
- In today's implementations, each particle interacts with the closest images
  of the other molecules.

Evaluating Macroscopic Quantities
Macroscopic quantities (observables) are defined as appropriate averages of
microscopic quantities:

Temperature:  T = \frac{2}{3Nk}\sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i}

Density:  \rho = \frac{1}{V}\sum_{i=1}^{N} m_i

Pressure:  \Pi = \frac{1}{V}\left[\sum_i m_i \vec{V}_i \vec{V}_i + \sum_i \sum_{j>i} \vec{r}_{ij}\,\vec{F}_{ij}\right]

Macroscopic velocity:  \vec{u} = \frac{\sum_i \vec{P}_i}{\sum_i m_i}

If the system is not in equilibrium, these properties can be defined as a
function of space and time by averaging over extents (space, time) over which
the change is small.

Starting The Simulation Up
- Need initial conditions for the positions and velocities of all molecules in
  the system. Typically the initial density, temperature, and number of
  particles are known.
- Because of the highly non-linear particle interaction, starting at completely
  arbitrary states is almost never successful. If particle positions are
  initialized randomly, with overwhelming probability at least one pair of
  particles will be very close and lead to energy divergence.
- Velocity degrees of freedom can be safely initialized using the equilibrium
  distribution
  P(E) \propto \exp\!\left(-\frac{E}{kT}\right)
  because the additive nature of the kinetic energy,
  E_k = \sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i},
  leads to independent probability distributions for each particle velocity.
- Liquids are typically started in:
  - a crystalline solid structure that melts during the equilibration part of
    the simulation, or
  - a quasi-random structure that avoids particle overlap (see the Moldy
    manual for an example).

Equilibration
Because systems are typically initialized in the incorrect state, potential
energy is usually either converted into thermal energy or thermal energy is
consumed as the system equilibrates. In order to stop the temperature from
drifting, one of the following methods is used:

1) Velocity rescaling:
   \vec{V}_i' = \sqrt{\frac{T_d}{T}}\,\vec{V}_i
   where T_d is the desired temperature and
   T = \frac{2}{3Nk}\sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i}
   is the instantaneous temperature. This is the simplest and most crude way
   of controlling the temperature of the simulation.
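In Matlab the rescaling step is essentially one line; a minimal sketch in reduced units (k = 1, unit masses; the variable names and values are assumptions):

N = 100; Td = 1.2;                 % desired reduced temperature (assumed)
v = randn(N,3);                    % velocities from an arbitrary start
T = mean(sum(v.^2, 2))/3;          % instantaneous T = (2/(3N)) * sum(v^2/2)
v = v*sqrt(Td/T);                  % after this line the instantaneous T is Td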

2) Thermostat. A thermostat is a method to keep the temperature


constant by introducing it as a constraint into the equations of motion.
Thermostats will be described under Constrained Dynamics.


Long Range Forces, Cutoffs, And Neighbor Lists
- The Lennard-Jones potential decays as r^{-6}, which is reasonably fast.
  However, the number of neighbors interacting with a particle grows as r^3.
- The interaction is thus cut off at some distance r_c to limit the
  computational cost.
- The quantities most sensitive to the long-range forces (e.g. surface
  tension) are usually calculated with r_c of approximately 10; typical
  calculations for hydrodynamics use r_c of approximately 2.5 (in units
  of \sigma).
- Electrostatic interactions require special methods (multipole expansions,
  Ewald sums) -- see the Moldy manual.

Although system behavior (properties: equation of state, transport
coefficients, latent heat, elastic constants) is affected by r_c, the new
values can be measured, if required.
The simplest cut-off approach is a truncation:

U_{tr}(r) = U(r) for r <= r_c,   U_{tr}(r) = 0 for r > r_c

This is not favored because U_{tr}(r) is discontinuous, so it does not
conserve energy. This is fixed by the truncated and shifted potential:

U_{tr-sh}(r) = U(r) - U(r_c) for r <= r_c,   U_{tr-sh}(r) = 0 for r > r_c
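As a sketch, the truncated and shifted Lennard-Jones potential is one anonymous function in Matlab (reduced units; the cutoff value is the hydrodynamics choice quoted above):

rc = 2.5;
Urc = 4*(rc^-12 - rc^-6);                           % potential at the cutoff
Utrsh = @(r) (4*(r.^-12 - r.^-6) - Urc).*(r <= rc);
Utrsh(rc)                                           % returns 0: continuous at rc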

Even with a small cut-off value, the calculation cost is proportional to N^2
because of the need to examine all pair separations. The cost is reduced by
neighbor lists:
- Verlet list
- Cell index method

Verlet Neighbor Lists
- Keep an expanded neighbor list with radius r_l > r_c, so that neighbor pairs
  need not be recalculated every timestep.
- r_l is chosen such that the list need only be rebuilt every 10-20 timesteps.
- Good for N < 1000 (otherwise too much storage is required).

[Figure: a particle with its cut-off radius r_c and list radius r_l; adapted
from Computer Simulation of Liquids, by M.P. Allen & D.J. Tildesley, Oxford
Science Publications, 1987.]

Cell-index Method
Divide the simulation domain into m subcells in each direction (here m = 5).
Search only the sub-cells within the cut-off (conservative).
Example: if the sub-cell size is larger than the cut-off, for cell 13 only
cells 7, 8, 9, 12, 13, 14, 17, 18, 19 need to be searched.

[Figure: a 5 x 5 grid of subcells numbered 1-25.]

In two dimensions the cost is 4.5\,N N_c, where N_c = N/m^2, instead of
\frac{1}{2}N(N-1). In three dimensions the cost is 13.5\,N N_c (with N_c
appropriately redefined) instead of \frac{1}{2}N(N-1).

Constraint Methods

Newtonian molecular dynamics conserves system energy, volume and


number of particles.

Engineering-physical systems typically operate at constant pressure,


temperature and exchange mass (i.e. they are open).

Methods to simulate these have been proposed*.

* These methods are capable of providing the correct statistical


description in equilibrium. Away from equilibrium there is no
proof that these methods provide the correct physics.
Therefore they need to be used with care, especially the crude
ones such as rescaling.


Constant Temperature Simulations

Newtons equations conserve energy and temperature is a variable.

In most calculations of practical interest we would like to prescribe temperature


(in reality, reservoirs, such as the atmosphere, interact with systems of interest
and keep temperature constant).

Similar considerations apply for pressure. We would like to perform constant


pressure calculations with variable system volume.

Three main types of approaches


-Velocity rescaling (crude)


- Extended system methods (one or more extra degrees of freedom are added to
  represent the reservoir). The equations of motion are
  \frac{d\vec{r}_i}{dt} = \frac{\vec{P}_i}{m}, \qquad \frac{d\vec{P}_i}{dt} = \vec{F}_i - f\,\vec{P}_i
  where f is a dynamical variable given by
  \frac{df}{dt} = \frac{kg}{Q}\,(T - T_d)
  g = number of degrees of freedom
  Q = reservoir mass (an adjustable parameter that controls the equilibration
  dynamics)
  This method is known as the Nose-Hoover thermostat.

- Constraint methods (the equations of motion are constrained to lie on a
  hyperplane in phase space). In this case the constraint is
  \frac{d}{dt}T \propto \frac{d}{dt}\sum_i \vec{P}_i^{\,2} = 0
  This leads to the following equations of motion:
  \frac{d\vec{r}_i}{dt} = \frac{\vec{P}_i}{m}, \qquad \frac{d\vec{P}_i}{dt} = \vec{F}_i - \lambda\,\vec{P}_i, \qquad \lambda = \frac{\sum_{i=1}^{N} \vec{P}_i \cdot \vec{F}_i}{\sum_{i=1}^{N} \vec{P}_i^{\,2}}
  where \lambda is a Lagrange multiplier (not a dynamical variable).
  A discussion of these and constant-pressure methods can be found in the
  Moldy manual. Note the velocity-dependent forces.
41

Homework Discussion

Use (and modify) control file control.argon. You can run short familiarization
runs with this file.
Control.argon calls the argon.in system specification file. Increase the number
of particles to at least 1000 (reduce small system effects and noise).
Note that density in control.argon is in units equivalent to
g/cm3 (= 1000 Kg/m3).
Reduce noise by increasing number of particles and/or averaging interval
(average-interval).
Make sure that temperature does not drift significantly from the original
temperature. You may want to rescale (scale-interval) more often and stop
rescaling (scale-end) later. Make sure you are not sampling while
rescaling!
By setting rdf-interval = 0 you disable the radial distribution function
calculation which makes the output file more readable.

Sample control file

#
# Sample control file. Different from the one provided with code.
#
sys-spec-file=argon.in
density=1.335
#density=1335 Kg/m3
temperature=120
#initial temperature=120K
scale-interval=2
#rescale velocities every 2 timesteps
scale-end=5000
#stop rescaling velocities at timestep 5000
nsteps=25000
#run a total of 25000 timesteps
print-interval=500 #print results every 500 timesteps
roll-interval=500
#keep rolling average results every 500 timesteps
begin-average=5001 #begin averaging at timestep 5001
average-interval=10000 #average for 10000 timesteps
step=0.01
#timestep 0.01ps
subcell=2
#subcell size for linked-cell method
strict-cutoff=1
#calculate forces using strict O(N^2) algorithm
cutoff=10.2
#3 * sigma
begin-rdf=2501 #begin radial distribution function calculation at timestep 2501
rdf-interval=0
rdf-out=2500

Introduction to Numerical Simulation (Fall 2003)

Problem Set #1 - due September 12
This problem set is mainly intended to familiarize you with the algorithms associated with formulating a system of equations from a given problem description, and also to remind you of eigenvalues
and eigenvectors. We have used circuits and heat flow examples, and will use struts and joints
later in the term when we discuss nonlinear systems.
In problems (2) and (3), you will be using Matlab and modifying scripts that we have written for
you. You can download the files from the course web page.

1) This problem is intended to reinforce your understanding of the nodal analysis and node-branch
equation formulation techniques.
Consider the following simple circuit.

[Circuit figure: a current source Is driving a network of resistors R1, R2,
and R3, with node 0 as ground.]

a) Apply nodal analysis to generate a linear system of equations which can be used to compute the
circuit node voltages. Where appropriate, please give matrix or vector entries as analytical formulas
in terms of R1, R2, R3 and Is.
b) Use the node-branch approach to form a linear system of equations which can be used to compute
the circuit node voltages and resistor currents. Where appropriate, please give matrix or vector
entries as analytical formulas in terms of R1, R2, R3 and Is.
2) In order to help you better understand how to create a program which reads in schematics and
generates systems of equations, we have written a set of matlab functions and scripts which can
be used to read in a file containing a circuit description and set up the associated nodal analysis
matrix and the appropriate right-hand side. In this problem, you will modify these functions and
scripts, so bear with us while we describe what they do.
The matlab script assumes that the circuit description is given as a list of statements in a file, and
that the element types are described in the following format.

resistors:
    rlabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the resistor's resistance (a floating point number).

current sources:
    ilabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the current flowing from node1 to node2.

voltage sources:
    vlabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the voltage between node1 and node2.

voltage-controlled current sources:
    clabel node1 node2 node3 node4 val
where label is an arbitrary label, node1, node2, node3, and node4 are integer circuit node
numbers, and val is the controlled source's transconductance (denoted gm), which means that the
current flowing from node1 to node2 is val times the voltage difference between node3 and node4
(i.e. i = gm (v3 - v4)). Note, ground is always node 0.
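For orientation, nodal analysis builds the conductance matrix by "stamping" each element; a hypothetical helper in the spirit of loadMatrix.m (the function name and calling convention are illustrative, not the course code) would stamp one resistor as follows:

function [G, b] = stamp_resistor(G, b, n1, n2, R)
% Sketch: nodal-analysis stamp for a resistor between nodes n1 and n2.
% Ground is node 0 and has no row or column in G.
g = 1/R;
if n1 > 0, G(n1,n1) = G(n1,n1) + g; end
if n2 > 0, G(n2,n2) = G(n2,n2) + g; end
if n1 > 0 && n2 > 0
  G(n1,n2) = G(n1,n2) - g;
  G(n2,n1) = G(n2,n1) - g;
end
end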
To use the supplied scripts to solve our supplied example circuit file, test.ckt, first start up matlab.
When in matlab, type file = 'test.ckt' to specify the name of the input file. Then type readckt to run
the script in the file readckt.m. This will read the file and put the information in arrays. Finally,
type loadMatrix to run the function in loadMatrix.m; this will create the conductance matrix and
the right-hand side associated with the circuit in test.ckt. To determine the vector of node voltages,
type v = G\b.
The scripts we have provided have an implementation of nodal analysis for resistors, current sources
and voltage-controlled current sources. Your job will be to extend the implementation to include
voltage sources for the special case where one terminal of the voltage source is connected to ground.
To accomplish this, you will need to modify only the file loadMatrix.m.
You could, of course, switch the simulator so that it uses node-branch analysis. However, for
networks of two-terminal positive resistors, the conductance matrix has a structure that is advantageous for numerical calculations (positive diagonals, negative off-diagonals, symmetry, diagonal
dominance). Assuming that one terminal of the voltage source is connected to ground, it is possible
to implement voltage sources in your Matlab simulator in such a way that you will still preserve the
above properties of the conductance matrix. Please implement such a scheme in your Matlab simulator. Only implement what is needed for resistors, current sources, and grounded voltage sources;
do not bother with voltage-controlled current sources. Be sure to test your modified simulator; you
will need a working version for problem (3).
Some Helpful Notes.
It will be necessary to make contributions to the right-hand side (RHS) vector for each resistor that
is connected to a voltage source, and the matrix will require modifications as well. In calculating
these modifications, you will find the array sourcenodes, generated by readckt, helpful.
To make the simulator easier to debug, the node numbers in input files correspond to the row
numbers in the generated conductance matrix. If the input file node numbers are not contiguous,
or if voltage source nodes are not last, there will be rows in the resulting G matrix with only a
single one on the diagonal.
3) In this problem we will examine the heat-conducting bar basic example, but will consider the
case of a "leaky" bar to give you practice developing a numerical technique for a new physical
problem.
With an appropriate input file, the simulator you developed in problem 2 can be used to solve
numerically the one-dimensional Poisson equation with arbitrary boundary conditions. The Poisson
equation can be used to determine the steady-state temperature distribution in a heat-conducting
bar, as in

\frac{\partial^2 T(x)}{\partial x^2} = -\frac{H(x)}{\kappa_m} + \frac{\kappa_a}{\kappa_m}\,(T(x) - T_0)    (1)

where T(x) is the temperature at a point in space x, H(x) is the heat generated at x, \kappa_m is the
thermal conductivity along the metal bar, and \kappa_a is the thermal conductivity from the bar to the
surrounding air. The temperature T_0 is the surrounding air temperature. The ratio \kappa_a/\kappa_m will be
small, as heat moves much more easily along the bar than dissipates from the bar into the
surrounding air.
Now suppose one is trying to decide if it is necessary to have heat sink (a heat sink is usually just
a large chunk of metal which dissipates heat rapidly enough to stay close to room temperature)
connections at the ends of an electronic package. You can use simulation to help you make this
decision.
a) Use your Matlab simulator to numerically solve the above Poisson equation for T(x), x in [0, 1],
given H(x) = 50 for x in [0, 1], \kappa_a = 0.001, and \kappa_m = 0.1. In addition, assume the ambient air
temperature is T_0 = 350, and T(0) = 300 and T(1) = 300. The boundary conditions at x = 0 and
x = 1 model heat sink connections to a cool metal cabinet at both ends of the package. That is,
it is assumed that the heat sink connections will ensure both ends of the package are fixed at near
room temperature. In your numerical calculation, how did you pick \Delta x? How do you know your
solution is accurate?
b) Now use your simulator to numerically solve the above equation for T(x), x in [0, 1], given
H(x) = 50 for x in [0, 1], \kappa_a = 0.001, and \kappa_m = 0.1. In addition, assume the ambient air temperature
is T_0 = 350, and that T(0) and T(1) are unknown but \partial T/\partial x(0) = 0 and \partial T/\partial x(1) = 0. The zero-heat-flow
boundary conditions at x = 0 and x = 1 imply that there are no heat sinks at either end of the
package. Compare your results to part (a).
c) For the case examined in part (b) above, what will happen if \kappa_a is identically zero? Can you solve
this problem? Can you find a reasonable solution by examining what happens as \kappa_a approaches
zero?
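As a sanity check on your choice of \Delta x, equation (1) can also be discretized directly with centered differences; the following self-contained Matlab sketch (an illustration independent of the circuit simulator) solves the fixed-temperature case of part (a):

% Sketch: centered-difference discretization of the leaky bar, part (a) data.
km = 0.1; ka = 0.001; T0 = 350; H = 50;
M = 99; dx = 1/(M+1);                        % M interior nodes on [0,1]
main = (2*km/dx^2 + ka)*ones(M,1);           % from -km*T'' + ka*T = H + ka*T0
A = spdiags([-km/dx^2*ones(M,1), main, -km/dx^2*ones(M,1)], -1:1, M, M);
rhs = (H + ka*T0)*ones(M,1);
rhs(1)   = rhs(1)   + km/dx^2*300;           % boundary condition T(0) = 300
rhs(end) = rhs(end) + km/dx^2*300;           % boundary condition T(1) = 300
T = A \ rhs;
plot(linspace(dx, 1-dx, M), T); xlabel('x'); ylabel('T(x)');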
4) This last problem uses the simulator you developed in the previous question to remind you about
some of the properties of eigenvalues and eigenvectors before we use them in lecture. Be sure to
familiarize yourself with MATLAB's eig command before you start.
a) Consider the leaky bar system in question 3, assume T_0 = T(0) = T(1) = 0 and use the thermal
conductivity values in part 3a). For this system, find a heat distribution, H(x), such that
max_x H(x) = 1 and, when you solve for the temperature,
T(x) = \lambda\,H(x)
where \lambda is a real number. You need only plot H(x) and T(x); you do not need an analytical
formula.
b) For the problem in part 4a, how many different H(x)'s and \lambda's are there?
c) Suppose \kappa_a = 0, but all the other settings in part 3a) hold. How do the H(x)'s and \lambda's which
satisfy
T(x) = \lambda\,H(x)
change? Please explain your results.
Introduction to Numerical Simulation (Fall 2003)

Problem Set #3 - due September 26th
Note: This problem set has only one problem, in order to give everyone a little time to catch up.
It does not cover the difference between modified and unmodified QR. We will cover that issue in
the next problem set.
1) In this problem, you will write your own factorization algorithm based on row orthogonalization.
Such an approach makes it simpler to do numerical pivoting and is more easily compared to LU
factorization.
a) Write a matlab program for solving Mx = b, where M is square, which is based on making the
rows of M into a set of orthonormal vectors. Please remember to normalize the rows so that they
correspond to vectors of unit length. You may find it helpful to look at our qr.m matlab program,
which is based on orthogonalizing columns.
b) Consider using your row orthogonalization algorithm to solve a tridiagonal matrix (a tridiagonal
matrix is one whose only nonzero entries are M_{i,i}, M_{i,i+1}, and M_{i,i-1}). Compare the operation
counts for sparse orthogonalization and sparse LU factorization of an n x n tridiagonal matrix.
c) If a row that is about to be normalized in your row orthogonalization algorithm corresponds
to a vector with a very short length, then the normalization will increase the size of those matrix
elements, contributing to matrix growth and worsening round-off errors. It might be better to
exchange the unnormalized row with one of the other rows that has yet to be normalized, preferably
the one with the largest associated length. Such a row exchange is analogous to the row exchange
used in partial pivoting for LU factorization. Please modify your matlab row orthogonalization
program to perform such a pivoting algorithm.
d) What will happen if you apply your row orthogonalization algorithm with pivoting to a problem
in which M is singular? Please test your code on a singular example.
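For orientation, here is a minimal unpivoted sketch of the idea (one reasonable design under stated assumptions, not the required solution): the Gram-Schmidt row operations applied to M are mirrored on b, after which the resulting orthonormal-row system Qx = c is solved with a transpose.

function x = rowqr_solve(M, b)
% Sketch: solve Mx = b by orthonormalizing the rows of M (no pivoting).
n = size(M,1);
for i = 1:n
  for j = 1:i-1                     % subtract projections on earlier rows
    proj = M(j,:)*M(i,:)';
    M(i,:) = M(i,:) - proj*M(j,:);
    b(i)   = b(i)   - proj*b(j);
  end
  len = norm(M(i,:));               % normalize the row to unit length
  M(i,:) = M(i,:)/len;
  b(i)   = b(i)/len;
end
x = M'*b;                           % rows orthonormal => inv(M) = M'
end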


a\c![ rt^X_N[FVXQ_N[FW*VX\Q\tzfqr^}qNqrtYz6\[>S*U!_NWF__Or`eYO^X_NWrQ!uQ!\tSiVXq_TU!\d`TrtQa\z#S*U!_`ertSi[FVo_NQ8Si[FVX_W
U+r_g_HOrqS*^}aSiUO_gWirt`e_ftr^Xc!_tx+"gW|r3z6c!QOqSiV}\QT\tz
A&UO\d`erQ8ac!Q!V}c!_Mtr^}c!_NWr[*_SiU!_[*_MVXQSiU!_M`TrtS*[*V}qN_NW
rWFW*\7qNVsrSi_NudfVoSiUSiU!_fc!QOV}z6\[F`e^oau!VXWFqN[*_HSiV}N_NuIW*c+rt[*_fY!^srSi_NW,#-B^}_rWF_hV}_frtQI_&Y!^srQ!rtSiV}\Qz6\[5a\c![[*_W*c!^oSiWNx
U /.r_Tr~WFY_qNVsrt^Z_N[*WFVX\Q\z-a\c![ r^X_[*VXQOVXN_u%qr^}qNqrtYSiU!rtS\QO^}aDd\[*Wkz6\[>c!QOV}z6\[F`e^oaDu!V}W*qN[F_S*VXN_u
/
W*c+rt[*_Y!^XrtSi_WNrQ!uDc!WF_kSiUO_k\]!WF_N[wrSiVX\Qa\c`TruO_kVXQ h-Si\T`e\[F_>_0IqNVX_Q8Si^}aq\`eYOcOSi_3SiU!_q\&_0IqNVX_Q8S
`ertSi[FVoez6\[#SiU!V}W#W*Y_NqNVXr^$WF_SM\z5YO[*\]!^}_N`IWNx
V{
7c!Y!Y\W*_Ba\c>Q!\dS*[FaMSi\fW*\^o_BSiUO_Z^}VXQ!_Nr[$Wwa&WwSi_`z6\[$SiU!_qU+rt[*h_B_qS*\[1|]ac!W*V}Q!h#SiU!_ '&r^Xh\[*VoSiU!`x
-rQDa\cAc!W*_>SiU!_\]!W*_[FtrtSiV}\QDV}Q h8fSi\TW*c!]!WwSrQ8S*Vsr^}^}a[*_u!c!q_SiU!_`I_N`I\[waQ!_N_u!_NuASi\TW*\^o_c!QOV}z6\[F`e^oa
u!V}W*qN[F_S*VXN_u~WFc+r[*_3Y!^XrtSi_W%#

Course 6.336 Introduction to Numerical Algorithms (Fall 2003)


Solutions to Problem Set #2

Problem 2.1
a) Assume, for the sake of simplicity, that all resistors in the line are of resistance R. The
structure of the N x N conductance matrix G is:

G = \begin{pmatrix}
2/R  & -1/R &      &      &      \\
-1/R & 2/R  & -1/R &      &      \\
     & \ddots & \ddots & \ddots & \\
     &      & -1/R & 2/R  & -1/R \\
     &      &      & -1/R & 2/R
\end{pmatrix}

(blank entries are zero). The matrix G is a tridiagonal matrix, i.e. a band matrix with one
subdiagonal and one superdiagonal. By inspection, the number of nonzero entries in G is
N + 2(N - 1) = 3N - 2.
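As a quick sanity check of this count, here is a small Matlab sketch (assuming R = 1, a value not
specified here) that builds G with the sparse-matrix tools and counts its nonzeros:

n = 1000; R = 1;
e = ones(n,1);
G = spdiags([-e/R, 2*e/R, -e/R], -1:1, n, n);  % tridiagonal conductance matrix
nnz(G)                                         % returns 3*n - 2 = 2998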

b) The matrix problem for the resistor line, written in terms of the resistance matrix G^{-1},
is G^{-1} i = v, where i is the vector of current-source currents flowing into each of the nodes,
and v is the vector of node voltages. For our original resistor line, i is a zero vector.
Suppose now that the jth entry of the vector i is nonzero. Physically, an injection of
current into node j will cause a change in all the node voltages. The jth entry of vector
i multiplies only the jth column of G^{-1}. So a change in all the node voltages in v will
be algebraically possible only if the jth column of G^{-1} consists of all nonzero entries, i.e.
(G^{-1})_{ij} \ne 0 for all i.

By extending this argument to all entries of the current source vector (and all columns of
the resistance matrix), we see that the N x N resistance matrix G^{-1} is full, i.e. will have
N^2 nonzero entries.
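This fill-in is easy to see numerically; a sketch (again assuming R = 1, with a small N = 5):

G = full(spdiags([-ones(5,1), 2*ones(5,1), -ones(5,1)], -1:1, 5, 5));
Rmat = inv(G);   % the 5-by-5 resistance matrix
nnz(Rmat)        % returns 25 = N^2: every entry is nonzero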
c) The factorization of the tridiagonal conductance matrix G produces two bidiagonal
factors L and U, such that LU = G. In order to see this, let's examine the first few elimination
steps for the matrix G.
After the first elimination step we get:
G^{(1)} = \begin{pmatrix}
2/R & -1/R   &      &        &      \\
0   & 3/(2R) & -1/R &        &      \\
    & -1/R   & 2/R  & \ddots &      \\
    &        & \ddots & \ddots & -1/R \\
    &        &      & -1/R   & 2/R
\end{pmatrix}
And after the second:

G^{(2)} = \begin{pmatrix}
2/R & -1/R   &        &        &        &      \\
0   & 3/(2R) & -1/R   &        &        &      \\
    & 0      & 4/(3R) & -1/R   &        &      \\
    &        & -1/R   & 2/R    & \ddots &      \\
    &        &        & \ddots & \ddots & -1/R \\
    &        &        &        & -1/R   & 2/R
\end{pmatrix}

Each elimination step targets only one row in the tridiagonal matrix G. In addition, the
triangular block of zeros in the upper-right corner of the matrix remains untouched. Thus,
after all N - 1 elimination steps, the L matrix will feature ones on the main diagonal and
the N - 1 multipliers on the subdiagonal. The U matrix will also be bidiagonal, with the
pivots on the main diagonal and the -1/R's on the superdiagonal. It follows that the number
of nonzero entries in L or U is N + (N - 1) = 2N - 1.

For N = 1000 the number of nonzero entries in G^{-1} is 1,000,000, while L and U will each
contain only 1999 nonzero entries. It is not a good idea to use the inverse of a matrix
for solving the matrix problem, because the number of required multiplications is
proportional to the number of nonzero entries, and for the inverse that number is excessive.
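A minimal sketch of this bidiagonal factorization (no pivoting is needed because G is diagonally
dominant; R = 1 and N = 5 are assumed values). The pivots follow the pattern 2/R, 3/(2R),
4/(3R), ... seen in the elimination steps above:

n = 5; R = 1;
a = (2/R)*ones(n,1);      % diagonal of G
b = (-1/R)*ones(n-1,1);   % sub/superdiagonal of G
p = zeros(n,1); m = zeros(n-1,1);
p(1) = a(1);
for i = 2:n
    m(i-1) = b(i-1)/p(i-1);        % multiplier (subdiagonal of L)
    p(i)   = a(i) - m(i-1)*b(i-1); % pivot (diagonal of U)
end
p'   % returns 2, 3/2, 4/3, 5/4, 6/5 for R = 1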
d) To determine the smallest entry in the resistance matrix, let's use the fact that the entry
r_{ij} of the resistance matrix is simply the voltage at node i caused by a unit current source
connected to node j. We are looking for the smallest entry, i.e. the case where that voltage is
minimal. You can easily figure out that we should put the current source at one end of the line
and examine the voltage at the other end of the line. In other words, the smallest element
of an N x N resistance matrix is always r_{1N} or r_{N1}; they are equal, since matrix inversion
preserves symmetry.
Now imagine our line with the current source connected to the first node. The voltage at node
N is evidently (taking R = 1)

r_{1N} = \frac{1}{N+1}    (1)

Note that an N x N matrix corresponds to a resistor line with N + 1 resistors.
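A quick numerical check of (1), assuming R = 1 and N = 10:

N = 10; e = ones(N,1);
G = full(spdiags([-e, 2*e, -e], -1:1, N, N));
Rmat = inv(G);
Rmat(1,N)      % returns 1/(N+1) = 0.0909...
min(Rmat(:))   % the same value: the corner entries are the smallest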

Problem 2.2
a) It is very easy to find a counterexample; just remember the formula for, say, the second
pivot (where \tilde{a}_{22} denotes the updated entry after one elimination step):

\tilde{a}_{22} = a_{22} - a_{21} \frac{a_{12}}{a_{11}}    (2)

Evidently, there is no guarantee that |\tilde{a}_{22}| \le |a_{22}|. For instance, with
a_{11} = 1, a_{12} = a_{21} = 2 and a_{22} = 1, equation (2) gives \tilde{a}_{22} = 1 - 4 = -3,
so |\tilde{a}_{22}| = 3 > |a_{22}| = 1.
b) The statement is true. For example, (2) implies that |\tilde{a}_{22}| \le |a_{22}|. Therefore
we have proven the statement for order 2. To prove the statement in the general case, we use
mathematical induction. Let's assume that we have proven our statement for order
N - 1. Now we need to show that, for a matrix of order N, after eliminating the first row
from all subsequent rows, the resulting (N-1) x (N-1) submatrix:
1. will have all positive diagonal entries, no larger than the original diagonal entries,
2. will have negative off-diagonals,
3. will be strictly diagonally dominant,
4. will be tridiagonal.
The first statement we have already shown, since all multipliers except for M_{21} are
zero. The second one is also trivial, and the same goes for the last one.
Now, let's show that the third statement holds. Since the initial matrix is strictly diagonally
dominant, we know that |a_{22}| > |a_{21}| + |a_{23}|, and |a_{11}| > |a_{12}|. The only thing
we need to show is that the number we subtract from a_{22} is less than |a_{21}|, which is also
evident. Therefore we have:

|\tilde{a}_{22}| > |a_{22}| - |a_{21}| > |a_{21}| + |a_{23}| - |a_{21}| = |\tilde{a}_{23}|    (3)

This proves the statement in the general case.
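A small numeric illustration of one induction step (my example matrix, not from the problem set):
eliminating the first row of a strictly diagonally dominant tridiagonal matrix with negative
off-diagonals shrinks the (2,2) pivot but preserves dominance:

A = [ 4 -1  0 ;
     -2  5 -1 ;
      0 -1  3 ];
m21 = A(2,1)/A(1,1);
A(2,:) = A(2,:) - m21*A(1,:);  % eliminate first column from row 2
A(2,2)                         % returns 4.5: smaller than 5, still > |A(2,3)|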

Problem 2.3
a) For N(y, k) = I + y e_k^T and given k, the matrix N structurally looks like (only column k
differs from the identity):

N(y, k) = \begin{pmatrix}
1 & \cdots & y_1     & \cdots & 0 \\
  & \ddots & \vdots  &        &   \\
0 & \cdots & 1 + y_k & \cdots & 0 \\
  &        & \vdots  & \ddots &   \\
0 & \cdots & y_n     & \cdots & 1
\end{pmatrix}

where y = [y_1 \cdots y_n]^T. A simple check will help you verify that N^{-1} is structurally
similar to N. Let

N^{-1} = \begin{pmatrix}
1 & \cdots & w_1    & \cdots & 0 \\
  & \ddots & \vdots &        &   \\
0 & \cdots & w_k    & \cdots & 0 \\
  &        & \vdots & \ddots &   \\
0 & \cdots & w_n    & \cdots & 1
\end{pmatrix}

Furthermore, since N N^{-1} = I we get the following system in n unknowns:

w_1 + y_1 w_k = 0
w_2 + y_2 w_k = 0
\vdots
(1 + y_k) w_k = 1
\vdots
w_n + y_n w_k = 0,

or equivalently N w = e_k. Solving this system for w = [w_1 \cdots w_n]^T gives a formula for
obtaining N^{-1}:

N^{-1} = \begin{pmatrix}
1 & \cdots & -y_1/(1 + y_k) & \cdots & 0 \\
  & \ddots & \vdots         &        &   \\
0 & \cdots & 1/(1 + y_k)    & \cdots & 0 \\
  &        & \vdots         & \ddots &   \\
0 & \cdots & -y_n/(1 + y_k) & \cdots & 1
\end{pmatrix}

b) Using the result in part a) we see that

N(y, k) x = e_k

or equivalently

(I + y e_k^T) x = e_k,

from which we get successively

x + x_k y = e_k

and

y = \frac{1}{x_k} (e_k - x).
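A short sketch verifying part (b) numerically (my example vector; any x with x(k) \ne 0 works):

n = 4; k = 2;
x = [3; 5; -2; 1];
ek = zeros(n,1); ek(k) = 1;
y = (ek - x)/x(k);         % the formula from part (b)
Nk = eye(n) + y*ek';       % N(y,k) as a rank-one update of the identity
Nk*x                       % returns e_k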

c) Say we wish to find the inverse of the matrix A,

A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}

Let us write A as

A = \begin{pmatrix} x_1^{(0)} & x_2^{(0)} & \cdots & x_n^{(0)} \end{pmatrix}

where x_j^{(0)} denotes the jth column of A at step 0 of our (yet-to-be derived) matrix inversion
algorithm.
In part (b) we showed how to find a matrix N such that N x = e_k for a given x. Let us
denote this matrix by N(x, k), so that N(x, k) x = e_k. In the first step of the algorithm we
compute N(x_1^{(0)}, 1) and multiply it into A, so that

N(x_1^{(0)}, 1) A = \begin{pmatrix} e_1 & x_2^{(1)} & \cdots & x_n^{(1)} \end{pmatrix}

where

x_j^{(1)} = N(x_1^{(0)}, 1) x_j^{(0)}

We now multiply N(x_1^{(0)}, 1) A by N(x_2^{(1)}, 2), and so forth, so that after k steps we have

N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) A =
\begin{pmatrix} e_1 & \cdots & e_k & x_{k+1}^{(k)} & \cdots & x_n^{(k)} \end{pmatrix}

where the column vectors at step k are

x_j^{(k)} = N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) x_j^{(0)}

It is fairly easy to convince yourself that multiplication by N(x_k^{(k-1)}, k) does not affect the
first k - 1 columns of the matrix, since the first k - 1 columns of both N(x_k^{(k-1)}, k) and
N(x_{k-1}^{(k-2)}, k-1) \cdots N(x_1^{(0)}, 1) A are the identity vectors e_j. Thus,

A^{-1} = N(x_n^{(n-1)}, n) \cdots N(x_1^{(0)}, 1)

What makes the algorithm in-place is that since x_j^{(k)} = e_j for j \le k, we no longer have to
store those vectors. Also, column l of N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) is e_l for l > k.
Thus, we can accumulate A^{-1} in the space occupied by the columns of A that have already been
reduced to unit vectors.

Note that if at some step k, x_k^{(k-1)}[k] = 0, then this algorithm will fail. The solution is to
pivot by swapping columns in the matrix. I haven't included code that performs pivoting,
because that is the subject of the next problem set.
Some of you noticed that this is essentially the procedure known as Jordan elimination.
Jordan elimination is in some sense an extension of Gaussian elimination to the extent that
at each point in the elimination the elements on previous pivotal rows are also eliminated.
The following code implements these ideas by explicitly forming all the required products.
%in-place invert a matrix A
n = size(A,1);
for i=1:n,
    % Form y so that N(y,i)*x_i = e_i, as in part (b): y = (e_i - x_i)/x_i(i)
    y = -A(:,i);
    y(i) = y(i) + 1.0;
    y = y / A(i,i);
    % Apply N(y,i) to the not-yet-reduced columns i+1:n
    for k = i+1:n;
        m = A(i,k);
        for j=1:n;
            A(j,k) = A(j,k) + m * y(j);
        end;
    end;
    % Store column i of N(y,i), i.e. y + e_i, in place of the reduced column i
    A(:,i) = y;
    A(i,i) = A(i,i) + 1.0;
    % Apply N(y,i) to the columns of the inverse accumulated so far
    for k = 1:i-1;
        m = A(i,k);
        for j=1:n;
            A(j,k) = A(j,k) + m * y(j);
        end;
    end;
end;

On a Sun Sparc10, this routine is about 500 times slower than Matlab's built-in inv()
function.
A more efficient code would operate on the matrix by columns:
%in-place invert a matrix A; vectorized
n = size(A,1);
for i=1:n,
    y = -A(:,i);
    y(i) = y(i) + 1.0;
    y = y / A(i,i);
    % the rank-one column updates, now expressed as whole-vector operations
    for k = i+1:n;
        A(:,k) = A(:,k) + A(i,k) * y;
    end;
    A(:,i) = y;
    A(i,i) = A(i,i) + 1.0;
    for k = 1:i-1;
        A(:,k) = A(:,k) + A(i,k) * y;
    end;
end;
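As a usage sketch, suppose the loop above is wrapped in a function file, say invplace.m (a
hypothetical name, not part of the distributed code); its output can then be checked against
the true inverse:

A = rand(100) + 100*eye(100);    % well conditioned, so no pivoting is needed
Ainv = invplace(A);              % hypothetical wrapper around the loop above
norm(Ainv*A - eye(100))          % should be near machine precision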

This routine is significantly faster; it is only a factor of 10-15 slower than Matlab. Regardless
of how fast the machine is, the fact that direct matrix inversion takes O(N^3) operations
limits the size of the problem we can solve in a reasonable time. Matlab took roughly 2.7
seconds to invert an N = 250 matrix. In a month, it could probably do N = 25,000; in a
year, about N = 50,000. The first algorithm above could probably handle only N = 3000
in a month, N = 7000 in a year.
