
Introduction to Simulation - Lecture 1

Example Problems and Basic Equations


Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Luca Daniel, Shihhsien Kuo and Karen Veroy

Outline
Uses For Simulation
Engineering Design
Virtual Environments
Model Verification

Course Philosophy
Example Problems
Power distribution on an Integrated Circuit
Load bearing on a space frame
Temperature distribution in a package

Circuit Analysis
Equations
Current-voltage relations for circuit elements (resistors, capacitors,
transistors, inductors), current balance equations

Recent Developments
Matrix-Implicit Krylov Subspace methods

Electromagnetic
Analysis of Packages
Equations
Maxwell's Partial Differential Equations

Recent Developments
Fast Solvers for Integral Formulations

Structural Analysis of
Automobiles
Equations
Force-displacement relationships for mechanical elements (plates,
beams, shells) and sum of forces = 0.
Partial Differential Equations of Continuum Mechanics

Recent Developments
Meshless Methods, Iterative methods, Automatic Error Control

Drag Force Analysis of Aircraft
Equations
Navier-Stokes Partial Differential Equations.

Recent Developments
Multigrid Methods for Unstructured Grids

Engine Thermal
Analysis
Equations
The Poisson Partial Differential Equation.

Recent Developments
Fast Integral Equation Solvers, Monte-Carlo Methods

Micromachine Device
Performance Analysis
Equations
Elastomechanics, Electrostatics, Stokes Flow.

Recent Developments
Fast Integral Equation Solvers, Matrix-Implicit Multi-level Newton
Methods for coupled domain problems.

Stock Option Pricing for Hedge Funds

[Figure: option price as a function of stock price and time t.]
Equations
Black-Scholes Partial Differential Equation

Recent Developments
Financial Service Companies are hiring engineers, mathematicians and
physicists.

Virtual Environments
for Computer Games
Equations
Multibody Dynamics, elastic collision equations.

Recent Developments
Multirate integration methods, parallel simulation


Virtual Surgery
Equations
Partial Differential Equations of Elastomechanics

Recent Developments
Parallel Computing, Fast methods

Biomolecule Electrostatic Optimization

[Figure: a ligand (drug molecule) docking with a receptor (protein molecule), with plus and minus surface charges on both, and an ECM protein.]

Equations
The Poisson Partial Differential Equation.

Recent Developments
Matrix-Implicit Iterative Methods, Fast Integral Equation Solvers

The Computer Simulation Scenario

[Flowchart] The problem is too complicated for hand analysis, so one tosses out some terms (a macromodel) and solves a simplified problem. If the result makes no sense: anxiety. If it makes sense, simulate using a canned routine, a friend's advice, or a recipe book. If that works: happiness. If it is way too slow, develop an understanding of computational complexity, leading to a faster method. If it only works sometimes, develop an understanding of convergence issues, leading to a robust method. In the flowchart, the path through that understanding is labeled CLASS and the alternative is labeled DROP; the right algorithms bring happiness, and new algorithms bring fame.

Course Philosophy
Examine several modern techniques.
Understand, practically and theoretically, how the techniques perform on representative, but real, applications.
Why prove theorems?
A theorem guarantees, given its assumptions, that the method will always work.
Theorems can help debug programs.
A theorem's proof can tell you what to do in practice.

Power Distribution for a VLSI Circuit

[Layout and simplified diagram: a 3.3 V power supply feeding the Cache, ALU and Decoder through the main power wires.]

Is there at least 3 V across the ALU?

One application problem which generates large systems of equations is the problem
of distributing power to the various parts of a Very Large Scale Integrated (VLSI)
circuit processor.
The picture on the left of the slide shows a layout for a typical processor, with different functional blocks noted. The processor pictured has nearly a million transistors, and millions of wires which transport signals and power. All one can really see by eye are the larger wires that carry power and the patterns of wires that carry signals for boolean operations such as "and" and "or".
A typical processor can be divided into a number of functional blocks, as diagrammed on the layout on the left. There are caches, which store copies of data and instructions from main memory for faster access. There are execution units which perform boolean and numerical operations on data, such as "and", "or", addition and multiplication. These execution units are often grouped together and referred to as an Arithmetic Logic Unit (ALU). Another main block of the processor is the instruction decoder, which translates instructions fetched from the cache into actions performed by the ALU.
On the right is a vastly simplified diagram of the processor, showing a typical 3.3-volt power supply, the 3 main functional blocks, and the wires (in red) carrying power from the supply to the 3 main functional blocks. The wires, which are part of the integrated circuit, are typically a micron thick, ten microns wide and thousands of microns long (a micron is a millionth of a meter). The resistance of these thin wires is significant, and therefore even though the supply is 3.3 volts, there may not be 3.3 volts across each of the functional blocks.
The main problem we address is whether or not each functional block has sufficient voltage to operate properly.

Load Bearing Space Frame

[Figure: a space frame of beams and joints attached to the ground, holding cargo over a vehicle; the frame droops under load.]

Does the space frame droop too much under load?

In the diagram is a picture of a space frame used to hold cargo (in red) to be lowered into a vehicle. The space frame is made using steel beams (in yellow) that are bolted together at the purple joints. When cargo is hanging off the end of the space frame, the frame droops.
The main problem we will address is how much the space frame droops under load.

Thermal Analysis

Does the engine get too hot?

Above is a picture of an engine block, which is typically solid steel or aluminum. The heat generated by the gas burning in the cylinders must be conducted through the engine block to a wide enough surface area that the heat can be dissipated. If not, the engine block temperature will rise too high and the block will melt.

Design Objectives for the VLSI Problem

[Same power-distribution diagram as above.]

Select topology and metal widths & lengths so that
a) Voltage across every functional block > 3 volts
b) Minimize the area used for the metal wires

Design Objectives for the Space Frame

Select topology and strut widths and lengths so that
a) Droop is small enough
b) Minimize the metal used.

Design Objectives for Thermal Analysis

Select the shape so that
a) The temperature does not get too high
b) Minimize the metal used.

First Step - Analysis Tools

[Diagrams: the space frame with its droop, and the power-distribution circuit.]

Given the topology and metal widths & lengths, determine
a) The voltage across the ALU, Cache and Decoder.
b) The droop of the space frame under load.

Who uses VLSI Tools?

Several big companies
IBM, Motorola, TI, Intel, Compaq, Sony, Hitachi

Non-functional prototype costs
- Increases time-to-market
- Design rework costs millions

Once a VLSI circuit is designed, it is fabricated using a sequence of sophisticated deposition and etching processes which convert a wafer of silicon into millions of
transistors and wires. This processing can take more than a month. If the circuit
does not function, the design flaw must be found and the fabrication process
restarted from the beginning. For this reason, just a few design errors can delay a
product for months. In a competitive market, this delay can cost millions in lost
revenue in addition to the cost of redesigning the circuit.
In order to avoid fabricating designs with flaws, companies make extensive use of
simulation tools to verify design functionality and performance.

Who uses VLSI Tools?

1000s of small companies

Small companies make application circuits: disk drives, graphics accelerators, CD players, cell phones.
What is the cost of non-functional prototypes?
- Out of business.

Thousands of small companies design VLSI circuits for applications as diverse as peripherals for personal computers and signal processors for audio, video and automotive applications. These small companies cannot afford the cost of
fabricating prototype designs that do not function. The very survival of these
companies depends on using simulation tools to verify designs before fabrication.

Who makes VLSI Tools?

Company           Employees   Sales         Market cap.
Cadence           4,000       1.3 billion   3.8 billion
Synopsys/Avanti   5,000       1.5 billion   6.9 billion
Mentor Graphics   2,600       0.6 billion   1.4 billion

Companies compete by improving analysis efficiency.

Modeling VLSI circuit Power Distribution

+
3.3 v

Cache

ALU

Decoder

The power supply provides current at a certain voltage.
The functional blocks draw current.
The wire resistance generates losses.

Each of the elements in the simplified layout, the supply, the wires and the
functional blocks, can be modeled by relating the voltage across that element to the
current that passes through the element. Using these element constitutive relations,
we can construct a circuit from which we can determine the voltages across the
functional blocks and decide if the VLSI circuit will function.


Modeling the Circuit: the Supply becomes a Voltage Source

Physical symbol: the power supply providing current at voltage Vs.
Circuit element: a voltage source with + and - terminals.
Constitutive equation: V = Vs

The power supply provides whatever current is necessary to ensure that the voltage across the supply is maintained at a set value. Note that the constitutive equation (in the figure), which is supposed to relate element voltage (V) to element current (I), does not include current as a variable. This should not be surprising: the voltage is always maintained regardless of how much current is supplied, and therefore knowing the voltage tells one nothing about the supplied current.

Modeling the Circuit: Functional Blocks become Current Sources

Physical symbol: a functional block (e.g., the ALU) connected to the supply.
Circuit element: a current source Is.
Constitutive equation: I = Is

The functional blocks, the ALU, the cache and the decoder are complicated circuits
containing thousands of transistors. In order to determine whether the functional
block will always have a sufficient voltage to operate, a simple model must be
developed that abstracts many of the operating details. A simple worst-case model
is to assume that each functional block is always drawing its maximum current.
Each block is therefore modeled as a current source, although one must assume that
the associated currents have been determined by analyzing each functional block in
more detail. Note that once again the constitutive equation is missing a variable, this
time it is voltage. Since a current source passes the same current independent of the
voltage across the source, that V is missing should be expected.


Modeling the Circuit: Metal Lines become Resistors

Physical symbol: a metal wire carrying current I.
Circuit model: a resistor.
Constitutive equation (Ohm's law): I R - V = 0

    R = resistivity x Length / Area

Length and Area are design parameters; resistivity is a material property.

[Figure: a short, wide wire has low resistance; a long, narrow wire has high resistance.]

The model for the wires connecting the supply to the functional blocks is a resistor, where the resistance is proportional to the length of the wire (the current has further to travel) and inversely proportional to the wire cross-sectional area (the current has more paths to choose from).
That the current through a resistor is proportional to the voltage across the resistor is Ohm's law.

Modeling VLSI Power Distribution: Putting it all together

[Schematic: a voltage source for the power supply; current sources IC, IALU, ID for the Cache, ALU and Decoder; resistors for the wires.]

Power supply becomes a voltage source.
Functional blocks become current sources.
Wires become resistors.
The result is a schematic.

To generate a representation which can be used to determine the voltages across each of the functional units, consider each of the models previously described.
First, replace the supply with a voltage source.
Second, replace each functional block with an associated current source.
Third, replace each section of wire with a resistor.
Note that each resistor representing a wire replaces a single section with no branches, though the section can have turns.
The resulting connection of resistors, current sources and voltage sources is called a circuit schematic. Formulating equations from schematics will be discussed later.

Modeling the Space Frame

[Schematic: struts bolted together at joints, attached to the ground, with a load.]

The example is simplified for illustration.

In order to examine the space frame, we will consider a simplified example with only four steel beams and a load. Recall that the purple dots represent the points where steel beams are bolted together. Each of the elements in the simplified layout, the beams and the load, can be modeled by relating the relative positions of the element's terminals to the force produced by the element. Using these element constitutive relations, we can construct a schematic from which we can determine the frame's droop.

Modeling the Frame: the Load becomes a Force Source

Physical symbol: a hanging mass.
Schematic symbol: a force source.
Constitutive equation: Fx = 0, Fy = -Fload, where Fload = Mass x Gravity.

The load is modeled as a force pulling in the negative Y direction (Y being vertical, X being horizontal).
Note that the constitutive equation does not include a variable for the load's position, following from the fact that the load's force is independent of position.

Modeling the Frame: a Beam becomes a Strut

Physical symbol: a beam between terminals (x1, y1) and (x2, y2).
Constitutive equation (Hooke's law):

    L = sqrt( (x1 - x2)^2 + (y1 - y2)^2 )
    f = E Ac (L0 - L) / L0,  acting along the strut

L0 = unstretched length
Ac = cross-sectional area   (design parameters)
E  = Young's modulus        (material property)

In order to model the steel beams in a space frame, it is necessary to develop a relation between the beam deformation and the restoring force generated by the beam. To derive a formula we will make several assumptions.
1) The beam is perfectly elastic.
This means that if one deforms the beam by applying a force, the beam always returns to its original shape after the force is removed.
[Figure: applying a force stretches the beam from L0 to L1 > L0; removing the force returns it to L0.]
2) The beam does not buckle.
[Figure: applying a force stretches the beam from L0 to L1 > L0 without buckling.]
Buckling is an important phenomenon, and ignoring it limits the domain of applicability of this model.

3) The beam is materially linear.
For a beam to be materially linear, the force which acts along the beam is directly proportional to the change in length:

    f = K dL    (f = 0 when the beam is at its unstretched length L0)

To determine K, consider that the force required to stretch a beam an amount dL is
(I) Inversely proportional to its unstretched length (it is easier to stretch a 10-inch rubber band 1 inch than to stretch a 1-inch rubber band 1 inch),
(II) Directly proportional to its cross-sectional area (imagine 10 rubber bands in parallel),
(III) Dependent on the material (rubber stretches more easily than steel).
Combining (I), (II) and (III) gives K = E Ac / L0, the formula at the bottom of the slide.
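To make the constitutive relation concrete, here is a minimal Python sketch (NumPy assumed; the function name and the example values are illustrative, not from the lecture) that evaluates the strut force components defined above:

    import numpy as np

    def strut_force(x1, y1, x2, y2, E, Ac, L0):
        """Restoring force exerted on joint (x1, y1) by a strut to (x2, y2).

        Uses f = E*Ac*(L0 - L)/L0 along the strut, as in the slides: a
        stretched strut (L > L0) pulls the joint back toward the other end.
        """
        dx, dy = x1 - x2, y1 - y2
        L = np.hypot(dx, dy)                 # current length
        f = E * Ac * (L0 - L) / L0           # scalar force along the strut
        return f * dx / L, f * dy / L        # project onto global X, Y axes

    # Example: a unit strut stretched 1% along the x axis
    fx, fy = strut_force(1.01, 0.0, 0.0, 0.0, E=200e9, Ac=1e-4, L0=1.0)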

Modeling the Frame: Putting it all together

[Schematic: the four struts and the load force.]

How much does the load droop?

To generate a representation which can be used to determine the displacements of the beam joints, consider the models previously described.
First, replace the loads with forces.
Second, replace each beam with a strut.

Formulating Equations
from Schematics
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces
Two Types of Equations
Conservation Law Equation
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Constitutive Equation
Circuit - element current is related to voltage
across the element
Struts - element force is related to the change
in element length

Conservation Laws and Constitutive Equations

Heat Flow: 1-D Example

[Figure: a unit-length rod with incoming heat along its length; near-end temperature T(0), far-end temperature T(1).]

Question: What is the temperature distribution T(x) along the bar?

[Figure: sketch of T versus x between T(0) and T(1).]

Conservation Laws and Constitutive Equations

Heat Flow: Discrete Representation

1) Cut the bar into short sections.
2) Assign each cut a temperature.

[Figure: temperatures T(0), T1, T2, ..., T(N-1), TN, T(1) along the bar.]

Conservation Laws and Constitutive Equations

Heat Flow: Constitutive Relation

Heat flow through one section:

    h(i+1,i) = heat flow = (Ti - Ti+1) / dx

In the limit as the sections become vanishingly small:

    lim (dx -> 0):  h(x) = -dT(x)/dx

Conservation Laws and Constitutive Equations

Heat Flow: Conservation Law

Consider two adjacent sections, forming a control volume around node i.

[Figure: control volume with incoming heat hs, flow h(i,i-1) from the node at Ti-1, node temperature Ti, and flow h(i+1,i) toward the node at Ti+1.]

Net heat flow into the control volume = 0:

    h(i+1,i) - h(i,i-1) = hs dx

Conservation Laws and Constitutive Equations

Heat Flow: Conservation Law

Net heat flow into the control volume = 0:

    h(i+1,i) - h(i,i-1) = hs dx

where h(i,i-1) is the heat in from the left, h(i+1,i) is the heat out to the right, and hs is the incoming heat per unit length.

In the limit as the sections become vanishingly small:

    lim (dx -> 0):  hs(x) = dh(x)/dx = -d2T(x)/dx2

Conservation Laws and Constitutive Equations

Heat Flow: Circuit Analogy

Temperature is analogous to voltage; heat flow is analogous to current.

[Schematic: a chain of resistors with R = dx between nodes T1, ..., TN; a current source is = hs dx injects at each node; voltage sources vs = T(0) and vs = T(1) hold the two end temperatures.]
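The discrete heat-flow equations above form a tridiagonal linear system. The sketch below (Python with NumPy assumed; the grid size and source values are made up for illustration) assembles and solves it for a bar with fixed end temperatures:

    import numpy as np

    # Discretize the unit bar: N interior nodes, spacing dx
    N = 9
    dx = 1.0 / (N + 1)
    T0, T1 = 0.0, 1.0              # end temperatures T(0), T(1)
    hs = np.ones(N)                # incoming heat per unit length

    # Conservation + constitutive: -(T[i-1] - 2 T[i] + T[i+1]) / dx^2 = hs[i]
    A = (np.diag(2.0 * np.ones(N)) -
         np.diag(np.ones(N - 1), 1) -
         np.diag(np.ones(N - 1), -1)) / dx**2
    b = hs.copy()
    b[0] += T0 / dx**2             # fold the known end temperatures
    b[-1] += T1 / dx**2            # into the right-hand side

    T = np.linalg.solve(A, b)      # interior temperature distribution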

Formulating Equations
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces
Conducting Bar - Temperatures, section heat flows
Two Types of Equations
Conservation Law Equation
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Bar - Sum of heat flows into control volume = 0
Constitutive Equations
Circuit - element current related to voltage across the element
Struts - strut force related to the change in element length
Bar - section temperature drop related to heat flow

Formulating Equations from Schematics: Circuit Example

Identifying Unknowns

[Schematic: the power-distribution circuit with its nodes numbered; node 0 is the reference, and vs is the voltage source.]

Assign each node a voltage, with one node as 0.

Given a circuit schematic, the problem is to determine the node voltages and element currents. In order to begin, one needs labels for the node voltages, and therefore the nodes are numbered zero, one, two, ..., N, where N+1 is the total number of nodes.
The node numbered zero has a special meaning: it is the reference node. Voltages are not absolute quantities, but must be measured against a reference.
To understand this point better, consider the simple example of a one-amp current source driving a one-ohm resistor connected between nodes 1 and 0.
In order for one amp to flow through the resistor, V1 - V0 must equal one volt. But does V1 = 11 volts and V0 = 10 volts? Or is V1 = 101 volts and V0 = 100 volts? It really does not matter; what is important is that V1 is one volt higher than V0. So, let V0 define a reference and set its value to a convenient number, V0 = 0.

Formulating Equations from Schematics: Circuit Example

Identifying Unknowns

[Schematic: branch currents i1, i2, i3, i4, i5 labeled on the resistors; node 0 is the reference.]

Assign each element except current sources a current.

The second set of unknowns are the element currents. Obviously, the currents passing through current sources are already known, so one need only label the currents through resistors and voltage sources. The currents are denoted i1, i2, ..., ib, where b is the total number of unknown element currents. Since elements connect nodes, in an analogy with graphs, element currents are often referred to as branch currents.

Formulating Equations from Schematics: Circuit Example

Conservation Law

Sum of currents = 0 at every node (Kirchhoff's current law):

    node 0:  i1 + i5 - i4 = 0
    node 1:  -is1 - i1 + i2 = 0
    node 2:  is2 + is3 - i2 - i5 = 0
    node 3:  i3 - is3 = 0
    node 4:  i4 + is1 - is2 - i3 = 0

The conservation law for a circuit is that the sum of currents at each node equals zero. This is often referred to as Kirchhoff's current law. Another way to state this law, which more clearly indicates its conservation nature, is to say: any current entering a node must leave the node.
The conservation is that no current is lost; what comes in goes out. This statement also makes it clear that the direction of the current determines its sign when summing the currents. Currents leaving the node are positive terms in the sum and currents entering the node are negative terms (one can reverse this convention, but one must be consistent).

Formulating Equations from Schematics: Circuit Example

Constitutive Equations

    R1 i1 = 0 - V1
    R2 i2 = V1 - V2
    R3 i3 = V3 - V4
    R4 i4 = V4 - 0
    R5 i5 = 0 - V2

Use the constitutive equations to relate branch currents to node voltages (currents flow from the plus node to the minus node).
Each element with an unknown branch current has an associated constitutive equation which relates the voltage across the element to the current through the element. For example, consider R2 in the figure, carrying current i2 between nodes with voltages V2 and V3.
The constitutive relation for a resistor is Ohm's law,

    I = (1/R) V

and in this case V = V2 - V3 and I = i2.
One should again take note of the direction of the current. If current travels from the left node through the resistor to the right node, then the left node voltage will be higher than the right node voltage by an amount R I.

Formulating Equations Circuit Example


from Schematics
Summary
Unknowns for the Circuit example
Node voltages ( except for a reference)
Element currents ( except for current sources)
Equations for the Circuit example
One conservation equation (KCL) for each node
(except for the reference node)
One constitutive equation for each element
(except for current sources)
Note that # of equations = # of unknowns


Summary of key points

Many applications of simulation; picked three representative examples:
Circuits, Struts and Joints, Heat Flow in a Bar
Two Types of Equations
Conservation Laws
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Bar - Sum of heat flows into control volume = 0
Constitutive Equations
Circuit - current-voltage relationship
Struts - force-displacement relationship
Bar - temperature drop-heat flow relationship

Introduction to Simulation - Lecture 2


Equation Formulation Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Formulating Equations from Schematics
Struts and Joints Example

Matrix Construction from Schematics
Stamping Procedure

Two Formulation Approaches
Node-Branch - More general but less efficient
Nodal - Derivable from Node-Branch

Formulating Equations from Schematics: Struts Example

Identifying Unknowns

[Schematic: joints at (x1, y1) and (x2, y2); fixed joints at (0, 0) and (1, 0); the ground attachments are hinged.]

Assign each joint an X, Y position, with one joint as zero.

Given a schematic for the struts, the problem is to determine the joint positions and the strut forces.
Recall that the joints in the struts problem correspond physically to the locations where steel beams are bolted together. The joints are also analogous to the nodes in the circuit, but there is an important difference: the joint position is a vector, because one needs two (X, Y) coordinates (three (X, Y, Z) in three dimensions) to specify a joint position.
The joint positions are labeled x1, y1, x2, y2, ..., xJ, yJ, where J is the number of joints whose positions are unknown. As in circuits, in struts and joints there is also an issue about a position reference: the position of a joint is usually specified with respect to a reference joint.
Note also the hatched ground symbol in the schematic. This symbol is used to denote a fixed structure (like a concrete wall, for example). Joints on such a wall have their positions fixed, and usually one such joint is selected as the reference joint. The reference joint has the position (0, 0) ((0, 0, 0) in three dimensions).

Formulating Equations from Schematics: Struts Example

Identifying Unknowns

[Schematic: each strut labeled with its force components fx, fy; the load force fload at the loaded joint.]

Assign each strut an X and Y force component.

The second set of unknowns are the strut forces. Like the currents in the circuit examples, these forces can be considered branch quantities. There is again a complication due to the two-dimensional nature of the problem: there is an x and a y component to each force. The strut forces are labeled f1x, f1y, ..., fSx, fSy, where S is the number of struts.

Formulating Equations from Schematics: Struts Example

Aside on Strut Forces

[Figure: a strut from the origin (0, 0) to (x1, y1); the force f acts along the strut and is resolved into components fx and fy in the global X, Y coordinate system.]

    f  = E Ac (L0 - L) / L0
    L  = sqrt( x1^2 + y1^2 )
    fx = (x1 / L) f
    fy = (y1 / L) f

The force, f, in a stretched strut always acts along the direction of the strut, as shown in the figure. However, it will be necessary to sum the forces at a joint, and the individual struts connected to a joint will not all be in the same direction. So, to sum such forces, it is necessary to compute the components of the forces in the X and Y directions. Since one must have selected the directions for the X and Y axes once for a given problem, such axes are referred to as the global coordinate system. Then one can think of the process of computing fx, fy shown in the figure as mapping from a local to a global coordinate system.
The formulas for determining fx and fy from f follow easily from the geometry depicted in the figure; one is simply projecting the vector force onto the coordinate axes.

Formulating Equations from Schematics: Struts Example

Conservation Law

[Schematic: joints at (x1, y1) and (x2, y2); struts f1, f2, f3, f4; fixed joints at (0, 0) and (1, 0); load fload.]

    At joint 1:  fx1 + fx2 + fx3 = 0  and  fy1 + fy2 + fy3 = 0
    At joint 2:  fx4 - fx3 + floadx = 0  and  fy4 - fy3 + floady = 0

Force equilibrium:
Sum of X-directed forces at a joint = 0
Sum of Y-directed forces at a joint = 0

The conservation law for struts is usually referred to as requiring force equilibrium. There are some subtleties about signs, however. To begin, consider that the X-directed forces at a joint must sum to zero, otherwise the joint will accelerate in the X direction. The Y-directed forces must also sum to zero to avoid joint acceleration in the Y direction.
To see the subtlety about signs, consider a single strut aligned with the X axis, with ends at (x1, 0) and (x2, 0). If the strut is stretched by moving the right end to (x2 + d, 0), then the strut will exert forces fa (on the left joint) and fb (on the right joint) in an attempt to contract.
The forces fa and fb are equal in magnitude but opposite in sign. This is because fa points in the positive X direction and fb in the negative X direction.
If one examines the force equilibrium equation for the left-hand joint, then that equation will be of the form

    other forces + fa = 0

whereas the equilibrium equation for the right-hand joint will be

    other forces + fb = other forces - fa = 0

In setting up a system of equations for the strut, one need not include both fa and fb as separate variables. Instead, one can select either force and implicitly exploit the relationship between the forces on opposite sides of the strut.
As an example, consider that for strut 3 between joint 1 and joint 2 on the slide, we have selected to represent the force on the joint 1 side of the strut and labeled that force f3. Therefore, for the conservation law associated with joint 1, force f3 appears with a positive sign, but for the conservation law associated with joint 2, we need the opposite-side force, -f3. Although the physical mechanism seems quite different, this trick of representing the equations using only the force on one side of the strut as a variable makes an algebraic analogy with the circuit sum-of-currents law. That is, it appears as if a strut's force leaves one joint and enters another.

Formulating Equations from Schematics: Struts Example

Constitutive Equations

    f1x = Fx(x1 - 0, y1 - 0)     f1y = Fy(x1 - 0, y1 - 0)
    f2x = Fx(x1 - 1, y1 - 0)     f2y = Fy(x1 - 1, y1 - 0)
    f3x = Fx(x1 - x2, y1 - y2)   f3y = Fy(x1 - x2, y1 - y2)
    f4x = Fx(x2 - 1, y2 - 0)     f4y = Fy(x2 - 1, y2 - 0)

Use the constitutive equations to relate strut forces to joint positions.

It is worth examining how the signs of the force are determined. Again consider a single strut aligned with the X axis, with ends at (x1, 0) and (x2, 0).
The X-axis alignment can be used to simplify the relation between the force on the x1 side and x1 and x2 to

    fx = ( (x1 - x2) / |x1 - x2| ) ( E Ac / L0 ) ( L0 - |x1 - x2| )

Note that there are two ways to make fx negative and point in the negative x direction: either x1 - x2 > 0, which corresponds to flipping the strut, or |x2 - x1| < L0, which corresponds to compressing the strut.

Formulating Equations
from Schematics

Struts Example
Summary

Unknowns for the Strut Example


Joint positions (except for a reference or
fixed joints)
Strut forces
Equations for the Strut Example
One set of conservation equations for each
joint.
One set of constitutive equations for each
strut.
Note that the # equations = # unknowns

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

[Schematic: strut f1 from a fixed wall to joint 1 at (x1, y1 = 0), strut f2 from joint 1 to joint 2 at (x2, y2 = 0), load force fL at joint 2.]

Conservation law:

    At node 1:  f1x + f2x = 0
    At node 2:  -f2x + fL = 0

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Constitutive equations (with the stiffness E Ac / L0 normalized to one):

    f1x = ( (x1 - 0) / |x1 - 0| ) ( L0 - |x1 - 0| )
    f2x = ( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| )

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Reduced (nodal) equations, obtained by substituting the constitutive relations into the conservation law:

    ( x1 / |x1| ) ( L0 - |x1| ) + ( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| ) = 0
    -( (x1 - x2) / |x1 - x2| ) ( L0 - |x1 - x2| ) + fL = 0

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Solution of the nodal equations with fL = 10 (force in the positive x direction):

    x1 = L0 + 10
    x2 = x1 + L0 + 10

Strut Example to Demonstrate the Sign Convention

Two struts aligned with the X axis

Notice the signs of the forces:

    f2x = 10   (force in the positive x direction)
    f1x = -10  (force in the negative x direction)

Formulating Equations from Schematics

Examples from last time:

Circuit - modeling VLSI power distribution
Struts and Joints - modeling a space frame

Formulating Equations from Schematics
Two Types of Unknowns
Circuit - Node voltages, element currents
Struts - Joint positions, strut forces

Two Types of Equations
Conservation Law
Circuit - Sum of Currents at each node = 0
Struts - Sum of Forces at each joint = 0
Constitutive Relations
Circuit - branch (element) current proportional to branch (element) voltage
Struts - branch (strut) force proportional to branch (strut) displacement

Generating Matrices from Schematics

Assume linear constitutive equations.

Circuit example:
One matrix column for each unknown
N columns for the node voltages
B columns for the branch currents
One matrix row for each equation
N rows for KCL
B rows for element constitutive equations (linear!)

Generating Matrices from Schematics

Assume linear constitutive equations.

Struts example in 2-D:
One pair of matrix columns for each unknown
J pairs of columns for the joint positions
S pairs of columns for the strut forces
One pair of matrix rows for each equation
J pairs of rows for the force equilibrium equations
S pairs of rows for element constitutive equations (linear!)

Generating Matrices from Schematics: Circuit Example

Conservation Equation

[Schematic: the example circuit with reference node 0 and nodes V1 ... V4; resistors R1 ... R5 carrying branch currents i1 ... i5; current sources is1, is2, is3.]

To generate a matrix equation for the circuit, we begin by writing the KCL equation at each node in terms of the branch currents and the source currents. In particular, we write

    sum of signed branch currents = sum of signed source currents

where the sign of a branch current in the equation is positive if the current is leaving the node and negative otherwise. The sign of a source current in the equation is positive if the current is entering the node and negative otherwise.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

[Schematic: the same example circuit.]

    node 1:  -i1 + i2 = is1
    node 2:  -i2 - i5 = -is2 - is3
    node 3:  i3 = is3
    node 4:  -i3 + i4 = -is1 + is2

Generating Matrices from Schematics: Circuit Example

Conservation Equation: Matrix Form for the Equations

One row for each KCL equation, one column for each branch current:

    [ -1   1   0   0   0 ] [ i1 ]   [ is1        ]
    [  0  -1   0   0  -1 ] [ i2 ]   [ -is2 - is3 ]
    [  0   0   1   0   0 ] [ i3 ] = [ is3        ]
    [  0   0  -1   1   0 ] [ i4 ]   [ -is1 + is2 ]
                           [ i5 ]

The right-hand side collects the source currents. The matrix A is usually not square.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

How each resistor contributes to the matrix: for a resistor Rk carrying current ik from node n1 to node n2,

    KCL at n1:  (other currents) + ik = is
    KCL at n2:  (other currents) - ik = is

so column k of A gets +1 in row n1 and -1 in row n2. A has no more than two nonzeros per column.

What happens to the matrix when one end of a resistor is connected to the reference (the zero node)? In that case there is only one contribution to the kth column of the matrix: the single entry in the row of the non-reference node.

Generating Matrices from Schematics: Circuit Example

Conservation Equation

How each current source contributes to the right-hand side: for a source isb flowing from node n1 to node n2,

    KCL at n1:  sum of ib's = (other sources) - isb
    KCL at n2:  sum of ib's = (other sources) + isb

Generating Matrices from Schematics: Circuit Example

Conservation Matrix Equation Generation Algorithm

For each resistor (branch b, current from node n1 to node n2):
    if (n1 > 0)  A(n1, b) = 1
    if (n2 > 0)  A(n2, b) = -1
Set Is = zero vector
For each current source isb (flowing from node n1 to node n2):
    if (n1 > 0)  Is(n1) = Is(n1) - isb
    if (n2 > 0)  Is(n2) = Is(n2) + isb
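A minimal Python sketch of this stamping procedure (NumPy assumed; the element-list format and node numbering are illustrative choices, not the lecture's code):

    import numpy as np

    def stamp_conservation(num_nodes, resistors, sources):
        """Build the KCL incidence matrix A and source vector Is.

        resistors: list of (n1, n2) node pairs, current assumed n1 -> n2
        sources:   list of (n1, n2, isb), source current flows n1 -> n2
        Node 0 is the reference and gets no row.
        """
        A = np.zeros((num_nodes, len(resistors)))
        Is = np.zeros(num_nodes)
        for b, (n1, n2) in enumerate(resistors):
            if n1 > 0:
                A[n1 - 1, b] = 1.0      # current leaves n1
            if n2 > 0:
                A[n2 - 1, b] = -1.0     # current enters n2
        for (n1, n2, isb) in sources:
            if n1 > 0:
                Is[n1 - 1] -= isb
            if n2 > 0:
                Is[n2 - 1] += isb
        return A, Is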

Generating Matrices from Schematics: Circuit Example

Conservation Equation

Stamping every resistor and current source of the example circuit gives

    [ -1   1   0   0   0 ] [ i1 ]   [ is1        ]
    [  0  -1   0   0  -1 ] [ i2 ]   [ -is2 - is3 ]
    [  0   0   1   0   0 ] [ i3 ] = [ is3        ]
    [  0   0  -1   1   0 ] [ i4 ]   [ -is1 + is2 ]
                           [ i5 ]

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

First determine the voltages across the resistors (the branch voltages); second, relate the branch currents to the branch voltages:

    i1 = (1/R1) Vb1 = (1/R1) (0 - V1)
    i2 = (1/R2) Vb2 = (1/R2) (V1 - V2)
    i3 = (1/R3) Vb3 = (1/R3) (V3 - V4)
    i4 = (1/R4) Vb4 = (1/R4) (V4 - 0)
    i5 = (1/R5) Vb5 = (1/R5) (0 - V2)

The current through a resistor is related to the voltage across the resistor, which in turn is related to the node voltages. Consider a resistor R1 between nodes with voltages V1 and V2, carrying current i1 from the first node to the second. The voltage across the resistor is V1 - V2 and the current through the resistor is

    i1 = (1/R1) (V1 - V2)

Notice the sign: i1 is positive if V1 > V2.
In order to construct a matrix representation of the constitutive equations, the first step is to relate the node voltages to the voltages across the resistors, the branch voltages.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

Examine the matrix construction: each branch voltage is a signed difference of node voltages,

    [ Vb1 ]   [ -1   0   0   0 ] [ V1 ]
    [ Vb2 ]   [  1  -1   0   0 ] [ V2 ]
    [ Vb3 ] = [  0   0   1  -1 ] [ V3 ]
    [ Vb4 ]   [  0   0   0   1 ] [ V4 ]
    [ Vb5 ]   [  0  -1   0   0 ]

To generate a matrix equation that relates the node voltages to the branch voltages,
one notes that the voltage across a branch is just the difference between the node
voltages at the ends of the branch. The sign is determined by the direction of the
current, which points from the positive node to the negative node.
Since there are B branch voltages and N node voltages, the matrix relating the two
has B rows and N columns.


Generating Matrices from Schematics: Circuit Example

Constitutive Equation: Node-to-Branch Relation

The KCL equations are A [i1 ... i5]^T = Is, and the relation above is Vb = A^T V: the node-to-branch matrix is the transpose of the KCL matrix.

A relation exists between the matrix associated with the conservation law (KCL) and the matrix associated with the node-to-branch relation. To see this, examine a single resistor Rk carrying current from node l to node m, with node voltages Vl and Vm.
For the conservation law, branch k contributes two non-zeros to the kth column of A: +1 in row l and -1 in row m.
The voltage across branch k is Vl - Vm, so the kth branch contributes the same +1 and -1 to the kth row of the node-to-branch relation.
It is easy to see that each branch element contributes a column to the incidence matrix A, and contributes the transpose of that column, a row, to the node-to-branch relation.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

    [ i1 ]   [ 1/R1                       ] [ Vb1 ]
    [ i2 ]   [       1/R2                 ] [ Vb2 ]
    [ i3 ] = [             1/R3           ] [ Vb3 ]
    [ i4 ]   [                  1/R4      ] [ Vb4 ]
    [ i5 ]   [                       1/R5 ] [ Vb5 ]

The kth resistor contributes 1/Rk to entry (k, k).

The matrix relates branch voltages to branch currents:
- one row for each unknown current,
- one column for each associated branch voltage.
The matrix is square and diagonal; call it alpha.

Generating Matrices from Schematics: Circuit Example

Constitutive Equation

Combining the two relations expresses the branch currents in terms of the node voltages:

    Ib - alpha A^T VN = 0

- A^T relates node voltages to branch voltages.
- alpha relates branch voltages to branch currents.

Generating Matrices from Schematics: Circuit Example

Node-Branch Form

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ VN ] = [ Is ]

N = number of nodes with unknown voltages
B = number of branches with unknown currents

    Ib - alpha A^T VN = 0    (constitutive relation)
    A Ib = Is                (conservation law)
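As a concrete illustration, here is a small Python sketch (NumPy assumed; the incidence matrix and element values are made-up stand-ins, not the lecture's example) that assembles and solves the node-branch block system above:

    import numpy as np

    # A: KCL incidence matrix (N x B); alpha: diagonal branch conductances
    A = np.array([[ 1.0,  1.0,  0.0],
                  [ 0.0, -1.0,  1.0]])       # 2 nodes, 3 branches
    R = np.array([1.0, 2.0, 0.5])            # branch resistances
    alpha = np.diag(1.0 / R)
    Is = np.array([1.0, 0.0])                # source currents into nodes

    N, B = A.shape
    # Block system: [ I  -alpha A^T ] [Ib]   [ 0  ]
    #               [ A       0     ] [Vn] = [ Is ]
    M = np.block([[np.eye(B), -alpha @ A.T],
                  [A,          np.zeros((N, N))]])
    rhs = np.concatenate([np.zeros(B), Is])
    sol = np.linalg.solve(M, rhs)
    Ib, Vn = sol[:B], sol[B:]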

Generating Matrices from Schematics: Struts Example

In 2-D:
One pair of columns for each unknown
- J pairs of columns for the joint positions
- S pairs of columns for the strut forces
One pair of matrix rows for each equation
- J pairs of rows for the force equilibrium equations
- S pairs of rows for the linearized constitutive relations

Generating Matrices from Schematics: Struts Example

Follow an approach parallel to circuits:
1) Form an incidence matrix, A, from the conservation law.
2) Determine strut deformations using A^T.
3) Use linearized constitutive equations to relate strut deformations to strut forces.
4) Combine (1), (2), and (3) to generate a node-branch form.

Generating Matrices from Schematics: Struts Example

Conservation Equation

[Schematic: joints at (x1, y1) and (x2, y2); struts f1 ... f4; fixed joints at (0, 0) and (1, 0); load fl.]

    f1x + f2x + f3x = 0
    f1y + f2y + f3y = 0
    -f3x + f4x = -flx
    -f3y + f4y = -fly

As a reminder, the conservation equation for struts is naturally divided in pairs: at each joint, the sum of X-directed forces = 0 and the sum of Y-directed forces = 0. Note that the load force is known, so it appears on the right-hand side of the equation.

Generating Matrices from Schematics: Struts Example

Conservation Equation: Stamping Approach

Load one pair of columns per strut; load the right-hand side for each load:

          f1x f1y f2x f2y f3x f3y f4x f4y
    x1 [   1        1        1            ]  [ f1x ]   [ 0    ]
    y1 [      1        1        1         ]  [ f1y ]   [ 0    ]
    x2 [                  -1        1     ]  [ ...  ] = [ -flx ]
    y2 [                     -1        1  ]  [ f4y ]   [ -fly ]

(A is the 2J x 2S incidence matrix; the right-hand side FL collects the load forces.)

Note that the incidence matrix, A, for the strut problem is very similar to the incidence matrix for the circuit problem, except that the two-dimensional forces and positions generate 2x2 blocks in the incidence matrix. Consider a single strut s between joints j1 and j2, with force fs defined on the j1 side. The force equilibrium equations for the two joints at the ends of the strut are

    at joint j1:  (other x forces) + fsx = FL(j1x),  (other y forces) + fsy = FL(j1y)
    at joint j2:  (other x forces) - fsx = FL(j2x),  (other y forces) - fsy = FL(j2y)

Examining what goes into the matrix: strut s contributes +1 entries in its column pair at rows j1x, j1y, and -1 entries at rows j2x, j2y. Note that the matrix entries are 2x2 blocks. Therefore, the individual entries in the block for strut s's contribution to j1's conservation equation need specific indices, and we use j1x, j1y to indicate the two rows and sx, sy to indicate the two columns.

Generating Matrices from Schematics: Struts Example

Conservation Matrix Generation Algorithm

For each strut b between joints j1 and j2:
    If (j1 is not fixed):  A(j1x, bx) = 1,  A(j1y, by) = 1
    If (j2 is not fixed):  A(j2x, bx) = -1, A(j2y, by) = -1
For each load fload at joint j1:
    If (j1 is not fixed):
        FL(j1x) = FL(j1x) - fload_x
        FL(j1y) = FL(j1y) - fload_y

A has at most 2 non-zeros per column.

Generating Matrices from Schematics: Struts Example

Constitutive Equation

First linearize the constitutive relation. If x1, y1 are close to some x0, y0 with x0^2 + y0^2 = L0^2, then

    [ fx ]   [ dFx/dx (x0, y0)   dFx/dy (x0, y0) ] [ ux ]
    [ fy ] = [ dFy/dx (x0, y0)   dFy/dy (x0, y0) ] [ uy ]

where ux = x1 - x0 and uy = y1 - y0.

As shown before, the force through a strut is

    fx = Fx(x, y) = (x / L) (L0 - L)
    fy = Fy(x, y) = (y / L) (L0 - L)

where L = sqrt(x^2 + y^2) and x, y are the coordinates of the joint relative to the strut's other end. If x and y are perturbed a small amount from some x0, y0 such that x0^2 + y0^2 = L0^2, then since Fx(x0, y0) = 0,

    fx ~ dFx/dx (x0, y0) (x1 - x0) + dFx/dy (x0, y0) (y1 - y0)

and a similar expression holds for fy.
One should note that rotating the strut, even without stretching it, will violate the small-perturbation conditions. The Taylor series expression will not give good approximate forces, because they will point in an incorrect direction.
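A small Python sketch of this linearization (NumPy assumed; the function and variable names are illustrative), building the 2x2 stiffness block by differentiating F analytically:

    import numpy as np

    def strut_jacobian(x0, y0, L0):
        """2x2 block dF/d(x, y) at (x0, y0) for F = (L0 - L) [x, y] / L,
        with L = sqrt(x^2 + y^2). On the unstretched circle (L = L0) this
        reduces to the rank-one block -u u^T."""
        L = np.hypot(x0, y0)
        u = np.array([x0, y0]) / L          # unit vector along the strut
        I = np.eye(2)
        # dF/dp = (L0 - L)/L (I - u u^T) - u u^T,  p = [x, y]
        return (L0 - L) / L * (I - np.outer(u, u)) - np.outer(u, u)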

Generating Matrices from Schematics: Struts Example

Constitutive Equation

Collecting the linearized relations for all four struts:

    fs - alpha A^T u = 0

where fs = [f1x f1y ... f4x f4y]^T stacks the strut forces, u = [ux1 uy1 ux2 uy2]^T stacks the joint displacements, alpha is block diagonal with the 2x2 blocks (1,1), (2,2), (3,3), (4,4) from the linearization, and A^T maps joint displacements to strut end-displacement differences.

Generating Matrices from Schematics: Struts Example

Constitutive Equation: the (s, s) block

For strut s between joints with initial positions (x1_0, y1_0) and (x2_0, y2_0), evaluate the Jacobian at the initial relative position:

    (s, s) = [ dFx/dx (x2_0 - x1_0, y2_0 - y1_0)   dFx/dy (x2_0 - x1_0, y2_0 - y1_0) ]
             [ dFy/dx (x2_0 - x1_0, y2_0 - y1_0)   dFy/dy (x2_0 - x1_0, y2_0 - y1_0) ]

Generating Matrices from Schematics: Struts Example

Node-Branch Form

    [ I   -alpha A^T ] [ fs ]   [ 0  ]
    [ A       0      ] [ u  ] = [ FL ]

S = number of struts, J = number of unfixed joints; the system is (2S + 2J) x (2S + 2J).

    fs - alpha A^T u = 0    (constitutive equation)
    A fs = FL               (conservation law)

Generating Matrices from Schematics: Struts Example

Comparison

    Struts:   [ I   -alpha A^T ] [ fs ]   [ 0  ]
              [ A       0      ] [ u  ] = [ FL ]

    Circuit:  [ I   -alpha A^T ] [ Ib ]   [ 0  ]
              [ A       0      ] [ VN ] = [ Is ]

Generating Matrices: Nodal Formulation, Circuit Example

[Schematic: the same example circuit.]

    At node 1:  -is1 + (1/R1) V1 + (1/R2) (V1 - V2) = 0
    At node 2:  is2 + is3 + (1/R2) (V2 - V1) + (1/R5) V2 = 0
    At node 4:  is1 - is2 + (1/R4) V4 + (1/R3) (V4 - V3) = 0

1) Number the nodes, with one node as 0.
2) Write a conservation law at each node except (0), in terms of the node voltages!

Generating Matrices: Nodal Formulation, Circuit Example

One row per node, one column per node. Stamping each resistor gives

    [ 1/R1 + 1/R2   -1/R2          0       0           ] [ v1 ]   [ is1        ]
    [ -1/R2         1/R2 + 1/R5    0       0           ] [ v2 ]   [ -is2 - is3 ]
    [ 0             0              1/R3    -1/R3       ] [ v3 ] = [ is3        ]
    [ 0             0              -1/R3   1/R3 + 1/R4 ] [ v4 ]   [ -is1 + is2 ]

i.e., G v = Is.

Examining the nodal equations, one sees that a resistor contributes a current to two equations, and its current depends on two voltages. Consider a resistor Rk between nodes n1 and n2 carrying current ik:

    KCL at node n1:  (other currents) + (1/Rk) (Vn1 - Vn2) = is
    KCL at node n2:  (other currents) - (1/Rk) (Vn1 - Vn2) = is

So the matrix entries associated with Rk are

    +1/Rk at (n1, n1),  -1/Rk at (n1, n2)
    -1/Rk at (n2, n1),  +1/Rk at (n2, n2)

Generating Matrices: Nodal Formulation, Circuit Example

Nodal Matrix Generation Algorithm

For each resistor R between nodes n1 and n2:
    if (n1 > 0) and (n2 > 0):
        G(n1, n2) = G(n1, n2) - 1/R,  G(n2, n1) = G(n2, n1) - 1/R
        G(n1, n1) = G(n1, n1) + 1/R,  G(n2, n2) = G(n2, n2) + 1/R
    else if (n1 > 0):
        G(n1, n1) = G(n1, n1) + 1/R
    else:
        G(n2, n2) = G(n2, n2) + 1/R
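Here is a minimal Python sketch of this stamping loop (NumPy assumed; the element-list format and the small example circuit are illustrative):

    import numpy as np

    def stamp_nodal(num_nodes, resistors, current_sources):
        """Nodal matrix G and RHS Is; node 0 is the reference (no row/column).

        resistors:        list of (n1, n2, R)
        current_sources:  list of (n1, n2, isb), current flows n1 -> n2
        """
        G = np.zeros((num_nodes, num_nodes))
        Is = np.zeros(num_nodes)
        for n1, n2, R in resistors:
            g = 1.0 / R
            if n1 > 0:
                G[n1 - 1, n1 - 1] += g
            if n2 > 0:
                G[n2 - 1, n2 - 1] += g
            if n1 > 0 and n2 > 0:
                G[n1 - 1, n2 - 1] -= g
                G[n2 - 1, n1 - 1] -= g
        for n1, n2, isb in current_sources:
            if n1 > 0:
                Is[n1 - 1] -= isb
            if n2 > 0:
                Is[n2 - 1] += isb
        return G, Is

    # Example: 1 A pushed into node 1, three 1-ohm resistors
    G, Is = stamp_nodal(2, [(1, 0, 1.0), (1, 2, 1.0), (2, 0, 1.0)],
                        [(0, 1, 1.0)])
    V = np.linalg.solve(G, Is)    # node voltages [2/3, 1/3]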

Generating Matrices: Nodal Formulation

    G VN = Is    (resistor networks; G is N x N)
    G u  = FL    (struts and joints; G is 2J x 2J)

Nodal Formulation: Comparing to Node-Branch Form

Node-branch matrix (constitutive relation stacked on the conservation law):

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ VN ] = [ Is ]

Nodal matrix:

    [ G ] [ VN ] = [ Is ]

Nodal Formulation: G Matrix Properties

Diagonally dominant:  |Gii| >= sum over j != i of |Gij|
Symmetric:            Gij = Gji
Smaller:              N x N << (N + B) x (N + B)
                      2J x 2J << (2J + 2S) x (2J + 2S)

Nodal Formulation: Node-Branch Form

    [ I   -alpha A^T ] [ Ib ]   [ 0  ]
    [ A       0      ] [ Vn ] = [ Is ]

Not symmetric or diagonally dominant; the matrix is (n+b) x (n+b).

Nodal Formulation: Deriving the Nodal Form from Node-Branch

    Ib - alpha A^T VN = 0                (constitutive equation)
    A (Ib - alpha A^T VN) = A 0 = 0      (multiply by A)
    A Ib = Is                            (conservation law)
    =>  (A alpha A^T) VN = Is,  so  G = A alpha A^T
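A two-line NumPy check of this identity on the toy circuit stamped above (names carried over from the earlier sketches; purely illustrative):

    import numpy as np

    # Same toy circuit: branches 1-0, 1-2, 2-0, all 1 ohm
    A = np.array([[ 1.0,  1.0,  0.0],     # KCL rows for nodes 1, 2:
                  [ 0.0, -1.0,  1.0]])    # +1 leaving, -1 entering
    alpha = np.diag([1.0, 1.0, 1.0])      # branch conductances 1/R
    G = A @ alpha @ A.T                   # nodal matrix from the derivation
    # G == [[2, -1], [-1, 2]]: matches stamping the resistors directly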

Nodal Formulation: Problem Element, Voltage Source

A voltage source between nodes n1 and n2 carries whatever current is is needed; its constitutive equation involves no current:

    0 is + Vn1 - Vn2 = Vs

Nodal Formulation: Problem Element, Voltage Source

A node-branch constitutive equation CAN still be formed when voltage sources are present. For the example circuit with a voltage source Vs carrying current i6, the constitutive rows are

    Rk ik - (kth row of A^T) V = 0     for each resistor k = 1 ... 5
    0 i6 + (6th row of A^T) V = Vs     for the voltage source

The voltage source contributes a zero on the diagonal of the constitutive block, and Vs on the right-hand side.

Nodal Formulation: Problem Element, Voltage Source

One cannot derive the nodal formulation. The constitutive equations determine only the resistor currents from the node voltages; the voltage-source currents are missing. Multiplying the constitutive equation by A therefore no longer reproduces A Ib = Is (the conservation law), and the branch currents cannot be eliminated.

The nodal formulation requires constitutive relations in the form

    Conserved quantity = F(node voltages)!

Nodal Formulation: Problem Element, Rigid Rod

A rigid rod between (x1, y1) and (x2, y2):

    0 fx + 0 fy + sqrt( (x1 - x2)^2 + (y1 - y2)^2 ) - L = 0    (length constraint)
    -(y1 - y2) fx + (x1 - x2) fy + 0 = 0                       (force acts along the rod)

As with the voltage source, the element force cannot be written as a function of the joint positions, so the nodal form cannot be derived.

Nodal Formulation: Comparing Matrix Sparsity

Example problem: a resistor grid with nodes V1, V2, ..., V100 along each row and rows starting at V101, V201, ..., V901 (a 100 x 10 grid, nodes V1 through V1000).

Nodal Formulation: Comparing Matrix Sparsity

[Figure: matrix non-zero locations for the 100 x 10 resistor grid, node-branch form versus nodal form.]

Summary of key points

Developed algorithms for automatically constructing matrix equations from schematics, using conservation laws + constitutive equations.
Looked at two forms: node-branch and nodal.

Summary of key points

Node-branch
General constitutive equations
Larger, sparser system
No diagonal dominance

Nodal
Conserved quantity must be a function of node variables
Smaller, denser system
Diagonally dominant & symmetric

Introduction to Simulation - Lecture 3


Basics of Solving Linear Systems
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Karen Veroy and Jacob White

Outline
Solution Existence and Uniqueness
Gaussian Elimination Basics
LU factorization
Pivoting and Growth
Hard to solve problems
Conditioning

Application Problems

    G Vn = Is

With no voltage sources or rigid struts, the matrix is symmetric and diagonally dominant, and is n x n.

Systems of Linear Equations

    [ M1  M2  ...  MN ] [ x1 ]   [ b1 ]
                        [ x2 ] = [ b2 ]
                        [ .. ]   [ .. ]
                        [ xN ]   [ bN ]

    x1 M1 + x2 M2 + ... + xN MN = b

Find a set of weights, x, so that the weighted sum of the columns of the matrix M is equal to the right-hand side b.
Systems of Linear Equations: Key Questions

Given M x = b:
Is there a solution?
Is the solution unique?

Is there a solution? There exist weights x1, ..., xN such that

    x1 M1 + x2 M2 + ... + xN MN = b

A solution exists when b is in the span of the columns of M.

Systems of Linear Equations: Key Questions (Continued)

Is the solution unique? Suppose there exist weights y1, ..., yN, not all zero, such that

    y1 M1 + y2 M2 + ... + yN MN = 0

Then if M x = b, also M (x + y) = b.

A solution is unique only if the columns of M are linearly independent.
Systems of Linear Equations: Key Questions, Square Matrices

Given M x = b, where M is square:

If a solution exists for any b, then the solution for a specific b is unique.
For a solution to exist for any b, the columns of M must span all N-length vectors. Since there are only N columns of the matrix M to span this space, these vectors must be linearly independent.

A square matrix with linearly independent columns is said to be nonsingular.

Gaussian Elimination Basics: Important Properties

Gaussian Elimination method for solving M x = b:
- A direct method
- Finite termination for an exact result (ignoring roundoff)
- Produces accurate results for a broad range of matrices
- Computationally expensive

Reminder by example, 3 x 3:

    [ M11  M12  M13 ] [ x1 ]   [ b1 ]
    [ M21  M22  M23 ] [ x2 ] = [ b2 ]
    [ M31  M32  M33 ] [ x3 ]   [ b3 ]

    M11 x1 + M12 x2 + M13 x3 = b1
    M21 x1 + M22 x2 + M23 x3 = b2
    M31 x1 + M32 x2 + M33 x3 = b3

Gaussian Elimination Basics: Reminder by Example

Key idea: use Eqn 1 to eliminate x1 from Eqns 2 and 3. Subtract (M21/M11) times Eqn 1 from Eqn 2, and (M31/M11) times Eqn 1 from Eqn 3:

    M11 x1 + M12 x2 + M13 x3 = b1
    (M22 - (M21/M11) M12) x2 + (M23 - (M21/M11) M13) x3 = b2 - (M21/M11) b1
    (M32 - (M31/M11) M12) x2 + (M33 - (M31/M11) M13) x3 = b3 - (M31/M11) b1

In the matrix picture, M11 is the pivot and M21/M11, M31/M11 are the multipliers:

    [ M11   M12                   M13                  ] [ x1 ]   [ b1                ]
    [ 0     M22 - (M21/M11) M12   M23 - (M21/M11) M13  ] [ x2 ] = [ b2 - (M21/M11) b1 ]
    [ 0     M32 - (M31/M11) M12   M33 - (M31/M11) M13  ] [ x3 ]   [ b3 - (M31/M11) b1 ]

Gaussian Elimination Basics: Reminder by Example

Remove x2 from Eqn 3. The updated entry M22 - (M21/M11) M12 is the new pivot, and the multiplier is (M32 - (M31/M11) M12) / (M22 - (M21/M11) M12). Subtracting the multiplier times Eqn 2 from Eqn 3 zeros the x2 term in Eqn 3 and updates the last right-hand side entry to

    b3 - (M31/M11) b1 - [ (M32 - (M31/M11) M12) / (M22 - (M21/M11) M12) ] (b2 - (M21/M11) b1)

Gaussian Elimination Basics: Reminder by Example

Simplify the notation: write the updated entries after the first step as M'jk = Mjk - (Mj1/M11) M1k and b'j = bj - (Mj1/M11) b1, so the system is

    [ M11   M12    M13  ] [ x1 ]   [ b1  ]
    [ 0     M'22   M'23 ] [ x2 ] = [ b'2 ]
    [ 0     M'32   M'33 ] [ x3 ]   [ b'3 ]

Then remove x2 from Eqn 3, with M'22 as the pivot and M'32/M'22 as the multiplier:

    [ M11   M12    M13   ] [ x1 ]   [ b1                    ]
    [ 0     M'22   M'23  ] [ x2 ] = [ b'2                   ]
    [ 0     0      M''33 ] [ x3 ]   [ b'3 - (M'32/M'22) b'2 ]

Gaussian Elimination Basics: Reminder by Example

GE yields a triangular system (the U entries and the y entries were altered during GE):

    [ U11  U12  U13 ] [ x1 ]   [ y1 ]
    [ 0    U22  U23 ] [ x2 ] = [ y2 ]
    [ 0    0    U33 ] [ x3 ]   [ y3 ]

Backward substitution:

    x3 = y3 / U33
    x2 = (y2 - U23 x3) / U22
    x1 = (y1 - U12 x2 - U13 x3) / U11

Gaussian Elimination Basics: Reminder by Example

The right-hand side updates:

    y1 = b1
    y2 = b2 - (M21/M11) b1
    y3 = b3 - (M31/M11) b1 - (M'32/M'22) (b2 - (M21/M11) b1)

Equivalently, y solves a lower triangular system built from the multipliers:

    [ 1          0           0 ] [ y1 ]   [ b1 ]
    [ M21/M11    1           0 ] [ y2 ] = [ b2 ]
    [ M31/M11    M'32/M'22   1 ] [ y3 ]   [ b3 ]

Gaussian Elimination Basics: Reminder by Example

Fitting the pieces together: the multipliers are stored in the locations they zeroed, so a single array holds both factors, L below the diagonal and U on and above it:

    [ M11        M12         M13   ]
    [ M21/M11    M'22        M'23  ]
    [ M31/M11    M'32/M'22   M''33 ]

Basics of LU Factorization

Solve M x = b:
Step 1: factor M = L U
Step 2: forward elimination - solve L y = b
Step 3: backward substitution - solve U x = y

Recall from basic linear algebra that a matrix M can be factored into the product of a lower and an upper triangular matrix using Gaussian elimination. The basic idea of Gaussian elimination is to use equation one to eliminate x1 from all but the first equation. Then equation two is used to eliminate x2 from all but the second equation. This procedure continues, reducing the system to upper triangular form as well as modifying the right-hand side.

Basics of LU Factorization: Solving Triangular Systems

    [ l11   0     ...   0   ] [ y1 ]   [ b1 ]
    [ l21   l22         ... ] [ y2 ] = [ b2 ]
    [ ...               0   ] [ .. ]   [ .. ]
    [ lN1   ...         lNN ] [ yN ]   [ bN ]

The first equation has only y1 as an unknown.
The second equation has only y1 and y2 as unknowns.

Basics of LU Factorization: Solving Triangular Systems, Algorithm

    y1 = (1/l11) b1
    y2 = (1/l22) (b2 - l21 y1)
    y3 = (1/l33) (b3 - l31 y1 - l32 y2)
    ...
    yN = (1/lNN) (bN - sum over i = 1 ... N-1 of lNi yi)

Solving a triangular system of equations is straightforward but expensive. y1 can be computed with one divide; y2 can be computed with a multiply, a subtraction and a divide. Once y1 through y(k-1) have been computed, yk can be computed with k-1 multiplies, k-2 adds, a subtraction and a divide. Roughly, the number of arithmetic operations is

    N divides + (0 + 1 + ... + (N-1)) add/subs + (0 + 1 + ... + (N-1)) mults
    = N(N-1)/2 add/subs + N(N-1)/2 mults + N divides

Order N^2 operations.
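A direct Python transcription of this forward-substitution loop (NumPy assumed; a minimal sketch without error checking):

    import numpy as np

    def forward_substitute(L, b):
        """Solve L y = b for lower-triangular L; O(N^2) work."""
        N = L.shape[0]
        y = np.zeros(N)
        for k in range(N):
            # subtract the already-computed terms, then one divide
            y[k] = (b[k] - L[k, :k] @ y[:k]) / L[k, k]
        return y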

Basics of LU Factorization: Factoring Picture

[Animation: a 4 x 4 matrix M. Step 1 replaces M21, M31, M41 with the multipliers M21/M11, M31/M11, M41/M11 and updates the remaining 3 x 3 block; step 2 stores multipliers in the second column and updates the trailing 2 x 2 block; step 3 stores the last multiplier and updates M44.]

The above is an animation of LU factorization. In the first step, the first equation is used to eliminate x1 from the 2nd through 4th equations. This involves multiplying row 1 by a multiplier and then subtracting the scaled row 1 from each of the target rows. Since such an operation would zero out the a21, a31 and a41 entries, we can replace those zeroed entries with the scaling factors, also called the multipliers. For row 2, the scale factor is a21/a11, because if one multiplies row 1 by a21/a11 and then subtracts the result from row 2, the resulting a21 entry would be zero. Entries a22, a23 and a24 would also be modified during the subtraction, and this is noted by changing the color of these matrix entries to blue. As row 1 is used to zero a31 and a41, a31 and a41 are replaced by multipliers. The remaining entries in rows 3 and 4 will be modified during this process, so they are recolored blue.
This factorization process continues with row 2. Multipliers are generated so that row 2 can be used to eliminate x2 from rows 3 and 4, and these multipliers are stored in the zeroed locations. Note that as entries in rows 3 and 4 are modified during this process, they are converted to green. The final step is to use row 3 to eliminate x3 from row 4, modifying row 4's entry, which is denoted by converting a44 to pink.
It is interesting to note that as the multipliers are standing in for zeroed matrix entries, they are not modified during the factorization.

LU Basics: Factoring Algorithm

For i = 1 to n-1 {                      (for each source row)
    For j = i+1 to n {                  (for each target row below the source)
        Mji = Mji / Mii                 (Mii is the pivot; Mji becomes the multiplier)
        For k = i+1 to n {              (for each row element beyond the pivot)
            Mjk = Mjk - Mji Mik
        }
    }
}
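The same algorithm in Python (NumPy assumed; in-place, no pivoting, a sketch for exposition rather than production use):

    import numpy as np

    def lu_factor_in_place(M):
        """Overwrite M with its LU factors: multipliers below the diagonal,
        U on and above it. Assumes no zero pivots are encountered."""
        n = M.shape[0]
        for i in range(n - 1):                 # pivot M[i, i]
            M[i+1:, i] /= M[i, i]              # multipliers stored in column i
            M[i+1:, i+1:] -= np.outer(M[i+1:, i], M[i, i+1:])  # rank-1 update
        return M

    # Example
    A = np.array([[4.0, 2.0], [2.0, 3.0]])
    LU = lu_factor_in_place(A.copy())          # LU == [[4, 2], [0.5, 2]]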

LU Basics: Factoring, Zero Pivots

At step i, the factored portion (the multipliers, L) occupies the first i-1 columns below the diagonal, and row i is the current pivot row.

What if Mii = 0? The multiplier Mji/Mii cannot be formed.
Simple fix (partial pivoting): if Mii = 0, find Mji != 0 for some j > i and swap row j with row i.

Swapping row j with row i at step i of LU factorization is identical to applying LU to a matrix with its rows swapped a priori. To see this, consider swapping rows before beginning LU: rows ri and rj of M trade places, and entries bi and bj of b trade places with them. Swapping rows corresponds to reordering only the equations; notice that the vector of unknowns is NOT reordered.

LU Basics: Factoring, Zero Pivots

Two important theorems:
1) Partial pivoting (swapping rows) always succeeds if M is nonsingular.
2) LU factorization applied to a strictly diagonally dominant matrix will never produce a zero pivot.

Theorem: Gaussian elimination applied to strictly diagonally dominant matrices will never produce a zero pivot.
Proof sketch:
1) Show the first step succeeds.
2) Show the (n-1) x (n-1) submatrix produced by the first step is still strictly diagonally dominant; the result then follows by induction.
For the first step, strict diagonal dominance gives |a11| > sum over j = 2 ... n of |a1j| >= 0, so a11 != 0. After the first step, the second row becomes

    ( 0,  a22 - (a21/a11) a12,  a23 - (a21/a11) a13,  ...,  a2n - (a21/a11) a1n )

and the question is whether

    |a22 - (a21/a11) a12| > sum over j = 3 ... n of |a2j - (a21/a11) a1j| ?

which follows from the triangle inequality together with the dominance of rows 1 and 2.
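A quick numerical check of the theorem (Python with NumPy assumed; the random test matrix is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n))
    A += np.diag(np.abs(A).sum(axis=1) + 1.0)  # force strict diagonal dominance

    M = A.copy()
    for i in range(n - 1):                     # unpivoted LU
        M[i+1:, i] /= M[i, i]
        M[i+1:, i+1:] -= np.outer(M[i+1:, i], M[i, i+1:])
    print(np.diag(M))                          # every pivot stays nonzero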

LU Basics: Numerical Problems, Small Pivots

Contrived example:

    [ 10^-17   1 ] [ x1 ]   [ 1 ]
    [ 2        1 ] [ x2 ] = [ 3 ]

    L = [ 1         0 ]      U = [ 10^-17   1            ]
        [ 2x10^17   1 ]          [ 0        1 - 2x10^17  ]

Can we represent 1 - 2x10^17 in floating point?

In order to compute the exact solution, first forward eliminate:

    [ 1         0 ] [ y1 ]   [ 1 ]
    [ 2x10^17   1 ] [ y2 ] = [ 3 ]

and therefore y1 = 1, y2 = 3 - 2x10^17. Backward substitution yields

    [ 10^-17   1           ] [ x1 ]   [ 1           ]
    [ 0        1 - 2x10^17 ] [ x2 ] = [ 3 - 2x10^17 ]

and therefore x2 = (3 - 2x10^17) / (1 - 2x10^17), which is approximately 1, and x1 = 10^17 (1 - x2), which is also approximately 1.

In the rounded case, both 3 - 2x10^17 and 1 - 2x10^17 round to -2x10^17, so

    y1 = 1,  y2 = -2x10^17,  x2 = 1,  x1 = 10^17 (1 - x2) = 0.

LU Basics: Numerical Problems, Small Pivots

An aside on floating-point arithmetic. A double-precision number is stored in 64 bits:

    sign (1 bit) | exponent (11 bits) | mantissa (52 bits)

Basic problem: only about 16 significant digits survive, so for example

    1.0 + 0.000000000000001 evaluates to 1.0   (the small term is rounded away)

Key issue: avoid small differences between large numbers!

LU Basics

Numerical Problems
Small Pivots

Back to the contrived example


LU Exact
LU Rounded

1
= 17
10
1
= 17
10

x1
1
=
x
1

2 Exact

0 1017
1 0
0 1017

1 0

x1 1
=
2 1017 x2 3
1

1 x1 1
=
1017 x2 3

x1
0
=
x

2 Rounded 1

SMA-HPC 2003 MIT

27

Numerical Problems

LU Basics

Small Pivots

Partial Pivoting for Roundoff reduction

If | M ii | < max | M ji |
j >i

Swap row i with arg (max | M ij |)


j >i

1
LU reordered = 17
10

0
1

2
1

0 1 + 2 1017

This multiplier
is small

This term gets


rounded

SMA-HPC 2003 MIT

To see why pivoting helped notice that

0 y1 3
=
1 y2 1
yields y1 = 3, y 2 = 1 1017 1
Notice that without partial pivoting y2 was 3-1017 or -1017 with rounding.
The right hand side value 3 in the unpivoted case was rounded away, where as now
it is preserved. Continuing with the back substitution.
1
1017

2
1

0 1 + 2 1017

x2 1 x1 1

x1 3
=
x2 1

28

LU Basics

Numerical Problems
Small Pivots

If the matrix is diagonally dominant or partial pivoting


for round-off reduction is during
LU Factorization:
1) The multipliers will always be smaller than one in
magnitude.
2) The maximum magnitude entry in the LU factors
will never be larger than 2(n-1) times the maximum
magnitude entry in the original matrix.
SMA-HPC 2003 MIT

To see why pivoting helped notice that

0 y1 3
=
1 y2 1
yields y1 = 3, y 2 = 1 1017 1
Notice that without partial pivoting y2 was 3-1017 or -1017 with rounding.
The right hand side value 3 in the unpivoted case was rounded away, where as now
it is preserved. Continuing with the back substitution.
1
1017

2
1

0 1 + 2 1017

x2 1 x1 1

x1 3
=
x2 1

29

Hard to Solve
Systems

Fitting Example

Polynomial Interpolation
Table of Data
f
t0
f (t0)
t1 f (t1)
f (t0)

tN f (tN)

t0 t1

t2

tN

Problem fit data with an Nth order polynomial


f (t ) = 0 + 1 t + 2 t 2 + + N t N
SMA-HPC 2003 MIT

30

Example Problem

Hard to Solve
Systems

Matrix Form
1

t0

t1

t0

t0

t1

t1

tN

tN

tN

0 f (t0 )

1 f (t1 )

N f (t N )

M interp
SMA-HPC 2003 MIT

The kth row in the system of equations on the slide corresponds to insisting that the
Nth order polynomial match the data exactly at point tk. Notice that we selected the
order of the polynomial to match the number of data points so that a square system
is generated. This would not generally be the best approach to fitting data, as we
will see in the next slides.

31

Hard to Solve
Systems

Fitting Example
Fitting f(t) = t

Coefficient
Value

Coefficient number
SMA-HPC 2003 MIT

Notice what happens when we try to fit a high order polynomial to a function that is
nearly t. Instead of getting only one coefficient to be one and the rest zero, instead
when 100th order polynomial is fit to the data, extremely large coefficients are
generated for the higher order terms. This is due to the extreme sensitivity of the
problem, as we shall see shortly.

32

Perturbation Analysis

Hard to Solve
Systems

Measuring Error Norms

Vector Norms
L2 (Euclidean) norm :

i=1

L1 norm :

Unit circle
n

xi

< 1

xi

i=1

< 1

L norm :

= max
i

xi

Square

< 1

SMA-HPC 2003 MIT

33

Perturbation Analysis

Hard to Solve
Systems

Measuring Error Norms

Matrix Norms
Vector induced norm :

A = max
x

Ax
x

= max

x =1

Ax

Induced norm of A is the maximum magnification of x by A


Easy norms to compute:
n

A
A

= max

1
0
j
i=1
Why? Let x =

n
= max
A ij = max abs row sum 1 0
i

j =1

Why? Let x =

= not so easy to compute!!


1

A ij = max abs column sum

SMA-HPC 2003 MIT

34

Hard to Solve
Systems

Perturbation Analysis

Perturbation Equation

(M + M ) ( x + x) = M x + M x + M x + M x =
Models LU
Roundoff

b
Unperturbed
RightHandSide

Models Solution
Perturbation

Since M x - b = 0

M x = M ( x + x ) x = M 1 M ( x + x )
1
Taking Norms
x M 1 MM x + xM x + x
M
x
M
1

M
M
Relative Error Relation
M
x + x
"Condition
Number "

SMA-HPC 2003 MIT

As the algebra on the slide shows the relative changes in the solution x is bounded
by an M
- dependent factor times the relative changes in M. The factor

|| M 1 || || M ||
was historically referred to as the condition number of M, but that definition has
been abandoned as then the condition number is norm
- dependent. Instead the
condition number of M is the ratio of singular values of M.
cond ( M ) =

max( M )
min( M )

Singular values are outside the scope of this course, consider consulting Trefethen
& Bau.

35

Perturbation Analysis

Hard to Solve
Systems

Geometric Approach is clearer


M = [M1 M 2 ], Solving M x = b is finding x1 M1 + x2 M 2 = b
x2

M2
|| M 2 ||

x2
M1
|| M 1 ||

x1

0
Case
1 1 orthogonal
Columns
0 106

M2
|| M 2 ||

M1
|| M 1 ||

x1

1
1 106
Case
1 nearly
Columns
aligned

6
1 10

When vectors are nearly aligned, difficult to determine


how much of M 1 versus how much of M 2
SMA-HPC 2003 MIT

36

Geometric Analysis

Hard to Solve
Systems

Polynomial Interpolation

log(cond(M))
~1020

1020

1010

~1013

1015
~314

~106

t2
16

32

The power series polynomials


are nearly linearly dependent

t1
t2

t12
t 22

tN

t N2

SMA-HPC 2003 MIT

Question Does row


- scaling reduce growth ?

0
d 11 0
0

0 dNN
0

a11

aN 1

a1N d 11 a11

=


aNN dNN aN 1

d 11 a1N

dNN aNN

Does row
- scaling reduce condition number ?

|| M || || M 1 || condition number of M
Theorem If floating point arithmetic is used, then row scaling (D M x = D b) will
not reduce growth in a meaningful way.

If
LU x = b
M
and
D M LU x ' = D b
then
x = x ' No roundoff reduction

37

Summary
Solution Existence and Uniqueness
Gaussian Elimination Basics
LU factorization
Pivoting and Growth
Hard to solve problems
Conditioning

38

Introduction to Simulation - Lecture 4


Direct Methods for Sparse Linear Systems
Luca Daniel

Thanks to Deepak Ramaswamy, Michal


Rewienski, Karen Veroy and Jacob White

Outline
LU Factorization Reminder.
Sparse Matrices
Struts and joints, resistor grids, 3-d heat flow

Tridiagonal Matrix Factorization


General Sparse Factorization
Fill-in and Reordering
Graph Based Approach

Sparse Matrix Data Structures


Scattering

Factoring

LU Basics

Picture

M 11
M
M
M 21
M
M
M 31
M
M
M 41
21

11

31

11

41

11

M 12 M 13 M 14

M 2222 M
M 23
23 M 24
24

M
M 3333 M 34

M
M
32
34
M

M
M
M
M
M
M444444
M
43
42
43
43
42
M
M
32

22

42

43

22

33

SMA-HPC 2003 MIT

The above is an animation of LU factorization. In the first step, the first equation is
used to eliminate x1 from the 2nd through 4th equation. This involves multiplying
row 1 by a multiplier and then subtracting the scaled row 1 from each of the target
rows. Since such an operation would zero out the a21, a31 and a41 entries, we can
replace those zerod entries with the scaling factors, also called the multipliers. For
row 2, the scale factor is a21/a11 because if one multiplies row 1 by a21/a11 and
then subtracts the result from row 2, the resulting a21 entry would be zero. Entries
a22, a23 and a24 would also be modified during the subtraction and this is noted by
changing the color of these matrix entries to blue. As row 1 is used to zero a31 and
a41, a31 and a41 are replaced by multipliers. The remaining entries in rows 3 and 4
will be modified during this process, so they are recolored blue.
This factorization process continues with row 2. Multipliers are generated so that
row 2 can be used to eliminate x2 from rows 3 and 4, and these multipliers are
stored in the zerod locations. Note that as entries in rows 3 and 4 are modified
during this process, they are converted to gr een. The final step is to used row 3 to
eliminate x3 from row 4, modifying row 4s entry, which is denoted by converting
a44 to pink.
It is interesting to note that as the multipliers are standing in for zerod matrix
entries, they are not modified during the factorization.

Factoring

LU Basics

Algorithm

For i = 1 to n-1 {
For j = i+1 to n {

M ji =

M ji
M ii

For each Row


For each target Row below the source

Pivot

n 1

(n i ) =
i =1

n2
2

multipliers

For k = i+1 to n { For each Row element beyond Pivot

M jk M jk M ji M ik
}

Multiplier

n 1

(n i)
i =1

2 3
n
3

Multiply-adds

}
}
SMA-HPC 2003 MIT

Factoring

LU Basics

Theorem about Diagonally


Dominant Matrices

A) LU factorization applied to a strictly


diagonally dominant matrix will never produce
a zero pivot
B) The matrix entries produced by LU
factorization applied to a strictly diagonally
dominant matrix will never increase by more
than a factor 2(n-1)
SMA-HPC 2003 MIT

Theorem Gaussian Elimination applied to strictly diagonally dominant matrices


will never produce a zero pivot.
Proof
1) Show the first step succeeds.
2) Show the (n - 1)x (n - 1) sub matrix

n-1
n

n-1
n

is still strictly diagonally dominant.


First Step

as
a11 0

| a11 |> | aij |


j =2

Second row after first step

0, a 22

Is
a 22

a 21
a 21
a 21
a12, a 23
a13,, a 2n
a1n
a11
a11
a11

a 21
a 21
a12 > a 2 j
a1 j ?
a11
a11

Sparse Matrices

Applications
Space Frame

Nodal Matrix
Space Frame
5

3
4

2
1

X X
X X

X X

Unknowns : Joint positions


Equations : forces = 0

X
X X
X X X
X X X
X X X
X X
X

X
X X
X X
X X
X X
X

i
X =
i

X
X
X

X
X

i
i

2 x 2 b lo c k

SMA-HPC 2003 MIT

Applications

Sparse Matrices
1

m +1

Resistor Grid

m+2

m 1

m+3

2m

m2

(m 1) (m + 1)

Unknowns : Node Voltages


Equations : currents = 0

SMA-HPC 2003 MIT

The resistive grid is an important special case, as it is a model for discretized partial
differential equations (we will see this later).
Lets consider the nodal matrix and examine the locations and number of non zeros.
The matrix has a special form which is easy to discern from a 4 x 4 example. I n the
4 x 4 case the nodal matrix is

x
x

x
x
x

x
x
x

x
x

x
x
x
x

x
x

x
x

x
x
x
x

x
x
x

x
x
x
x

x
x

x
x
x

x
x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

The tridiagonal blocks are due to the interaction between contiguously numbered
nodes along a single row in the grid. The non zeros, a distance 4 from the diagonals
are due to the inter row coupling between the diagonals.

Sparse Matrices
Nodal Formulation

Applications
Resistor Grid

Matrix non-zero locations for 100 x 10 Resistor Grid

Sparse Matrices
Nodal Formulation

Applications
Temperature in a cube

Temperature known on surface, determine interior temperature

m2 + 1

m2 + 2

Circuit
Model

m +1

m+2

Sparse Matrices
Nodal Formulation
1

Tridiagonal Example

X X
X X X

X X X

X X

Matrix Form

m 1

X
X

X
X

X
X

X X
X X

10

Sparse Matrices
For i = 1 to n-1 {
For j = i+1 to n {

M ji =

M ji
M ii

Tridiagonal Example
GE Algorithm

For each Row


For each target Row below the source

Pivot

For k = i+1 to n { For each Row element beyond Pivot

M jk M jk M ji M ik
}

Multiplier

Order N Operations!

}
}
SMA-HPC 2003 MIT

11

Fill-In

Sparse Matrices
R4

V3

Resistor Example

V2

V1

R1

R2

Example

R5

iS 1

R3

Nodal Matrix

R1 + R1
1
R
R1
1

1
R4

1
R2
1
1
+
R2 R3

1
R4

1
1
+
R4 R5

V1 0
V = 0 Symmetric
2 Diagonally Dominant
V3 iS1

SMA-HPC 2003 MIT

Recalling from lecture 2, the entries in the nodal matrix can be derived by noting
that a resistor, as

V n1

ik

Vn2
Rk

contributes to four locations in the nodal matrix as shown below.

n1

n2

n1

n2

1
1

Rk
Rk

1
1

Rk
Rk

It is also resisting to note that Gii is equal to the sum of the conductances (one over
resistance) incident at node i.

12

Sparse Matrices
Matrix Non zero structure

X X X
X X 0

X 0 X

Fill-In
Example

Matrix after one LU step

X X X
X X X
0

0 X
X X

X= Non zero
SMA-HPC 2003 MIT

During a step of LU factorization a multiple of a source row will be subtracted from


a row below it. Since these two rows will not necessarily have non zeros in the same
columns, the result of the subtraction might be to introduce additional non zeros into
the target row.
As a simple example, consider LU factoring

a11 a12

a21 0
The result is

a11
a
21
a11

a12

a a

21 12

a11

Notice that the factored matrix has a non zero entry in the bottom right corner,
where as the original matrix did not. This changing of a zero entry to a non zero
entry is referred to as a fill-in.

13

Fill-In

Sparse Matrices

Second Example

Fill-ins Propagate

X
X

X
0

X
0

X
0

X
0

X
0

Fill-ins from Step 1 result in Fill-ins in step 2


SMA-HPC 2003 MIT

In the example, the 4 x 4 mesh begins with 7 zeros. During the LU factorization, 5
of the zeros become non zero. What is of additional concern is the problem of fill-ins.
The first step of LU factorization where a multiple of the first row is subtracted
from the second row, generates fill-ins in the third and fourth column of row two.
When multiples of row 2 are subtracted from row 3 and row 4, the fill-ins generated
in row 2 generate second- level fill-ins in rows 3 and 4.

14

Sparse Matrices
V3
V1

V2

0
V3
V2

V1

Fill-In
Reordering

x
x

x
x

x
x
x
x

x
x Fill-ins

x
0
x No Fill-ins

Node Reordering Can Reduce Fill-in


- Preserves Properties (Symmetry, Diagonal Dominance)
- Equivalent to swapping rows and columns
SMA-HPC 2003 MIT

In the context of the nodal equation formulation, renumbering the nodes seems like
a simple operation to reduce fill-in, as selecting the node numbers was arbitrary to
begin with. Keep in mind, however, that such a renumbering of nodes in the nodal
equation formulation corresponds to swapping both rows and columns in the matrix.

15

Fill-In

Sparse Matrices

Reordering

Where can fill-in occur ?

Possible Fill-in

x Locations

x
x

Already Factored
Multipliers

x
x
x

x
x
x

Fill-in Estimate = (Non zeros in unfactored part of Row -1)


(Non zeros in unfactored part of Col -1)
Markowitz product
SMA-HPC 2003 MIT

16

Sparse Matrices

Fill-In
Reordering

Markowitz Reordering

For i = 1 to n
Find diagonal j i with min Markowitz Product

Swap rows j i and columns j i

Factor the new row i and determine fill-ins

End
Greedy Algorithm !
SMA-HPC 2003 MIT

In order to understand the Markowitz reordering algorithm, it is helpful to consider


the cost of the algorithm. The first step is to determine the diagonal with the
minimum Markowitz product. The cost of this step is

K i N operations
where K is the average number of non zeros per row.
The second step of the algorithm is to swap rows and columns in the factorization.
A good data structure will make the swap inexpensively.
The third step is to factor the reordered matrix and insert the fill-ins. If the matrix is
very sparse, this third step will also be inexpensive.
Since one must then find the diagonal in the updated matrix with the minimum
Markowitz product, the products must be computed at a cost of

K i ( N 1) operations

1
KN 2 operations will be needed just to compute
2
the Markowitz products in a reordering algorithm.
It is possible to improve the situation by noting that very few Markowitz products
will change during a single step of the factorization. The mechanics of such an
optimization are easiest to see by examining the graphs of a matrix.
Continuing, it is clear that

17

Fill-In

Sparse Matrices

Reordering

Why only try diagonals ?


Corresponds to node reordering in Nodal formulation
1

0
3

0
2

Reduces search cost


Preserves Matrix Properties
- Diagonal Dominance
- Symmetry
SMA-HPC 2003 MIT

18

Fill-In

Sparse Matrices

Pattern of a Filled-in Matrix

Very Sparse

Very Sparse

Dense

SMA-HPC 2003 MIT

19

Sparse Matrices

Fill-In
Unfactored Random Matrix

SMA-HPC 2003 MIT

20

Sparse Matrices

Fill-In
Factored Random Matrix

SMA-HPC 2003 MIT

21

Matrix Graphs

Sparse Matrices

Construction

Structurally Symmetric Matrices and Graphs


X

X X

X X

X X

1
2

X X X
X

X X

4
5

One Node Per Matrix Row


One Edge Per Off-diagonal Pair
SMA-HPC 2003 MIT

In the case where the matrix is structurally symmetric ( aij 0 if and only if
a ji 0), an undirected graph can be associated with the matrix.
The graph has

1 node per matrix row


1 edge between node i and node j if a ij 0
The graph has two important properties
1) The node degree squared yields the Markowitz product.
2) The graph can easily be updated after one step of factorization.
The graph makes efficient a two -step approach to factoring a structurally
symmetric matrix. First one determines an ordering which produces little fill by
using the graph. Then, one numerically factors the matrix in the graph-determined
order.

22

Sparse Matrices
X

X
X

Matrix Graphs
Markowitz Products

1
2

X
X
X

X
X

4
5

Markowitz Products = (Node Degree)2

M 11 3 i 3 = 9
M 22 2 i 2 = 4
M 33 3 i 3 = 9
M 44 2 i 2 = 4
M 55 2 i 2 = 4

(degree 1) 2 = 9
(deg ree 2) 2 = 4
(deg ree 3) 2 = 9
(degree 4) 2 = 4
(degree 5) 2 = 4

SMA-HPC 2003 MIT

That the ith node degree squared is equal to the Markowitz product associated with
the ith diagonal is easy to see. The node degree is the number of edges emanating
from the node, and each edge represents both an off-diagonal row entry and an offdiagonal column entry. Therefore, the number of off-diagonal row entries multiplied
by the number of off-diagonal column entries is equal to the node degree squared.

23

Matrix Graphs

Sparse Matrices

Factorization

One Step of LU Factorization


X

X X

X X

X X

1
2

X X X
X

X X

3
4
5

Delete the node associated with pivot row


Tie together the graph edges
SMA-HPC 2003 MIT

One step of LU factorization requires a number of floating point operations and


produces a reduced matrix, as below

factore d

f

unfactored act unfactored


or (includes fill-in)
de

After step i in the factorization, the unfactored portion of the matrix is smaller of
size (i - 1)x ( i - 1 ) , and may be denser if there are fill-ins. The graph can be used to
represent the location of non zeros in the unfactored portion of the matrix, but two
things must change.
1) A node must be removed as the unfactored portion has one fewer row.
2) The edges associated with fill-ins must be added.
In the animation, we show by example how the graph is updated during a step of LU
factorization. We can state the manipulation precisely by noting that if row i is
eliminated in the matrix, the node i must be eliminated from the graph. In addition,
all nodes adjacent to node i ( adjacent nodes are ones connected by an edge) will be
made adjacent to each other by adding the necessary edges. The added edges
represent fill- in.

24

Matrix Graphs

Sparse Matrices
x
x

x
x

x
x
x

x
x
x

x
x

x
x

x
x

Markowitz products
( = Node degree)

SMA-HPC 2003 MIT

Example

Graph
1
2

Col Row
3
3
= 9
2
2
= 4

3
4

3
3

3
3

= 9
= 9

= 9

25

Matrix Graphs

Sparse Matrices

Example

Swap 2 with 1
x
x

x
x

x
x

x
x

x
x

Graph

SMA-HPC 2003 MIT

Examples that factor with no fill-in


Tridiagonal

1
A

Another ordering for the tridiagonal matrix that is more parallel

1
A

3
E

4
A

E
A

G
E

26

Graphs

Sparse Matrices
1

m +1

Resistor Grid Example

m+2

m 1

m+3

2m

m2

(m 1) (m + 1)

Unknowns : Node Voltages


Equations : currents = 0

SMA-HPC 2003 MIT

The resistive grid is an important special case, as it is a model for discretized partial
differential equations (we will see this later).
Lets consider the nodal matrix and examine the locations and number of non zeros.
The matrix has a special form which is easy to discern from a 4 x 4 example. I n the
4 x 4 case the nodal matrix is

x
x

x
x
x

x
x
x

x
x

x
x
x
x

x
x

x
x

x
x
x
x

x
x
x

x
x
x
x

x
x

x
x
x

x
x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

The tridiagonal blocks are due to the interaction between contiguously numbered
nodes along a single row in the grid. The non zeros, a distance 4 from the diagonals
are due to the inter row coupling between the diagonals.

27

Sparse Matrices

Matrix Graphs
Grid Example

How long does it take to factor an M x M grid

Suppose the center column is eliminated last?


SMA-HPC 2003 MIT

A quick way to get a rough idea of how long it takes to factor the M2 x M2 matrix
associated with an M x M grid like a resistor array is to examine the graph. If one
orders the center column of M nodes in the graph last then they will be completely
connected as shown in the animation. However, a completely connected graph
corresponds to a dense matrix.
Since the resulting dense matrix requires M3 operations to factor, this suggests that
factoring an M x M grid costs something M3 operations, though making such an
argument precise is beyond the scope of the course.

28

Sparse Factorization Approach


1) Assume matrix requires NO numerical pivoting.
Diagonally dominant or symmetric positive definite.

2) Use Graphs to Determine Matrix Ordering


Many graph manipulation tricks used.

3) Form Data Structure for Storing Filled-in Matrix


Lots of additional nonzeros added

4) Put numerical values in Data Structure and factor


Computation must be organized carefully!

29

Sparse Matrices
Vector of row
pointers
1

Sparse Data Structure

Arrays of Data in a Row

Val 11 Val 12

Val 1K

Col 11 Col 12

Col 1K

Matrix entries
Column index

Val 21 Val 22

Val 2L

Col 21 Col 22

Col 2L

Val N1 Val N2

Val Nj

Col N1 Col N2

Col Nj

SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

30

Sparse Matrices

Sparse Data Structure


Problem of Misses

Eliminating Source Row i from Target row j


Row i

Row j

M i ,i +1

M i ,i + 7

M i ,i +15

i +1

i+7

i + 15

M i ,i +1

M i ,i + 4

M i ,i + 5

M i ,i + 7

M i ,i + 9

M i ,i +12

M i ,i +15

i +1

i+4

i+5

i+7

i+9

i + 12

i + 15

Must read all the row j entries to find the 3 that match row i

SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

31

Sparse Matrices

Sparse Data Structure


Data on Misses

Rows

Ops

Misses

Res

300

904387 248967

RAM

2806

1017289 3817587

Grid

4356

3180726 3597746

More
misses
than
ops!

Every Miss is an unneeded Memory Reference!


SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

32

Sparse Matrices

Sparse Data Structure


Scattering for Miss Avoidance

Row j

M i ,i +1

M i ,i +1
i +1

M i ,i + 4
i+4

M i , i + 4 M i ,i + 5

M i ,i + 5
i+5

M i ,i + 7

M i ,i + 7
i+7

M i ,i + 9

M i ,i + 9
i+9

M i ,i +12
i + 12

M i ,i +12

M i ,i +15
i + 15

M i ,i +15

1) Read all the elements in Row j, and scatter them in an n-length vector
2) Access only the needed elements using array indexing!
SMA-HPC 2003 MIT

In order to store a sparse matrix efficiently, one needs a data structure which can
represent only the matrix non zeros. One simple approach is based on the
observation that each row of a sparse matrix has at least one non zero entry. Then
one constructs one pair of arrays for each row, where the array part corresponds to
the matrix entry and the entrys column. As an example, consider the matrix

a11 0 a13

a
a
0
21 22

0 a33

The data structure for this example is


a11
1

a13
3

a 21
1

a 23
2

a 33
3

Note that there is no explicit storage for the zeros

33

Summary
LU Factorization and Diagonal Dominance.
Factor without numerical pivoting

Sparse Matrices
Struts, resistor grids, 3-d heat flow -> O(N) nonzeros

Tridiagonal Matrix Factorization


Factor in O(N) operations

General Sparse Factorization


Markowitz Reordering to minize fill

Graph Based Approach


Factorization and Fill-in
Useful for estimating Sparse GE complexity

34

Introduction to Simulation - Lecture 5


QR Factorization
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

QR Factorization

Singular Example
LU Factorization Fails

Strut

Joint
Load force

The resulting nodal matrix is SINGULAR, but a solution exists!


SMA-HPC 2003 MIT

QR Factorization

Singular Example
LU Factorization Fails

v1

v2

1 v3

v4

The resulting nodal matrix is SINGULAR, but a solution exists!


SMA-HPC 2003 MIT

QR Factorization

Singular Example

Recall weighted sum of columns view of


systems of equations

M1

M2

x1 b1

x2 b2
MN
=


xN bN

x1M 1 + x2 M 2 +

+ xN M N = b

M is singular but b is in the span of the columns of M


SMA-HPC 2003 MIT

QR Factorization

Orthogonalization
If M has orthogonal columns

Orthogonal columns implies:

Mi M j = 0

i j

Multiplying the weighted columns equation by ith column:

M i x1M 1 + x2 M 2 +

+ xN M N = M i b

Simplifying using orthogonality:

xi M i M i = M i b xi =
SMA-HPC 2003 MIT

Mi b

(M

Mi

QR Factorization

Orthogonalization
Orthonormal M - Picture

M is orthonormal if:

Mi M j = 0

i j and

Mi Mi = 1

Picture for the two-dimensional case

M1

M1

M2
Non-orthogonal Case
SMA-HPC 2003 MIT

x1

x2
Orthogonal Case

M2

QR Factorization

Orthogonalization
QR Algorithm Key Idea

x1 b1

x2 b2
M
M
M
=
2
N
1

xN bN
Original Matrix

y
b

1 1

y2
b2

=
QN
Q1 Q2

yN bN
Matrix with
Orthonormal
Columns

Qy = b y = Q b
T

How to perform the conversion?


SMA-HPC 2003 MIT

Orthogonalization

QR Factorization

Projection Formula

Given M 1 , M 2 , find Q2 =M 2 r12 M 1 so that

M 1 Q2 = M 1 M 2 r12 M 1

) =0

M1 M 2
r12 =
M1 M1

M2

Q2
SMA-HPC 2003 MIT

r12

M1

Orthogonalization

QR Factorization

Normalization

Formulas simplify if we normalize


1
1
Q1 =
M 1 = M 1 Q1 Q1 = 1
r11
M1 M1

Now find Q2 =M 2 r12Q1 so that Q2 Q1 = 0


r12 = Q1 M 2
1

1
Finally Q2 =
Q2 = Q2
r
22
Q2 Q2

SMA-HPC 2003 MIT

QR Factorization

Orthogonalization
How was a 2x2 matrix
converted?

Since Mx should equal Qy, we can relate x to y


y1

x1
M 1 M 2 x = x1M 1 + x2 M 2 = Q1 Q2 y = y1Q1 + y2Q2
2

M 1 = r11Q1

M 2 = r22 Q2 + r12Q1
r11
0

SMA-HPC 2003 MIT

r12 x1 y1
=

r22 x2 y2

QR Factorization

M1


x1
M2 =
x2

Orthogonalization
The 2x2 QR Factorization

r11 r12 x1 b1
=
Q1 Q2

0 r22 x2 b2

Upper
Triangular
Orthonormal

Two Step Solve Given QR

Step 1) QRx = b Rx = QT b = b

Step 2) Backsolve Rx = b
SMA-HPC 2003 MIT

Orthogonalization

QR Factorization

The General Case

3x3 Case

M1

M2



M 3 M1

M 2 r12 M 1

To Insure the third column is orthogonal

(
(M

)
M )= 0

M 1 M 3 r13 M 1 r23 M 2 = 0
M2

SMA-HPC 2003 MIT

r13 M 1 r23

M 3 r13 M 1 r23 M 2

QR Factorization

(
(M

Orthogonalization
Must Solve Equations for
Coefficients in 3x3 Case

)
M )= 0

M 1 M 3 r13 M 1 r23 M 2 = 0
M2

M1 M1

M 2 M1
SMA-HPC 2003 MIT

r13 M 1 r23

M 1 M 2 r13 M 1 M 3

=
M 2 M 2 r23 M 2 M 3

QR Factorization

Orthogonalization
Must Solve Equations for
Coefficients

To Orthogonalize the Nth Vector

M1 M1

M N 1 M 1

M 1 M N 1 r1, N M 1 M N


M N 1 M N 1 rN 1, N M N 1 M N
3

N inner products requires N work


SMA-HPC 2003 MIT

QR Factorization

M1

Orthogonalization

3x3 Case

M2



M 3 M1

Use previously
orthogonalized vectors

M 2 r12Q1

M 3 r13Q1 r23Q2

To Insure the third column is orthogonal

Q1 M 3 Q1r13 Q2 r23 = 0 r13 = Q1 M 3


Q2 M 3 Q1r13 Q2 r23 = 0 r23 = Q2 M 3
SMA-HPC 2003 MIT

Basic Algorithm

QR Factorization
For i = 1 to N

rii = M i M i
1
Qi = M i
rii

For each Source Column


N

2N 2N

Normalize

For j = i+1 to N {

rij M j Qi
M j M j rij Qi
SMA-HPC 2003 MIT

Modified Gram-Schmidt

operations

i =1

For each target Column right of source


N

( N i)2 N N
i =1

operations

QR Factorization

Basic Algorithm
By Picture

Q1

Q2

Q3

SMA-HPC 2003 MIT

QN

r11
0

r12

r13

r22
0

r23
r33

r1N

r2 N
r3 N

rNN

QR Factorization

Basic Algorithm
By Picture

M1 1
Q

M
Q22

SMA-HPC 2003 MIT

Q33
M

Q
M44

r11 r12

r13

r14

r22

r23

r24

r33

r34
r44

QR Factorization

Basic Algorithm
Zero Column

What if a Column becomes Zero?

Q1


r11 r12 r13

0 0 0
MN
0 M3

0 0 0

0

Matrix MUST BE Singular!


0

r1N
0
0

1) Do not try to normalize the column.


2) Do not use the column as a source for orthogonalization.
3) Perform backward substitution as well as possible
SMA-HPC 2003 MIT

Basic Algorithm

QR Factorization

Zero Column Continued

Resulting QR Factorization

Q1

0 Q3
0

SMA-HPC 2003 MIT

QN

r11
0

0
0

r12
0

r13
0

r33

r1N
0
r3 N

rNN

QR Factorization

Singular Example

Recall weighted sum of columns view of


systems of equations

M1

M2

x b
1 1
x2 b2
=
MN


xN bN

x1M 1 + x2 M 2 +

+ xN M N = b

Two Cases when M is singular

Case 1) b span{M 1 ,.., M N } b span{Q1 ,.., QN }


Case 2) b span{M 1 ,.., M N }, How accurate is x ?
SMA-HPC 2003 MIT

QR Factorization

Minimization View
Alternative Formulations

Definition of the Residual R: R ( x ) b Mx

Find x which satisfies


Mx = b

Minimize over all x


N

R ( x ) R ( x ) = ( Ri ( x ) )
T

i =1

Equivalent if b span {cols ( M )}


T
Mx = b and min x R ( x ) R ( x ) = 0
Minimization extends to non-singular or nonsquare case!
SMA-HPC 2003 MIT

Minimization View

QR Factorization

One-dimensional
Minimization

Suppose x = x1e1 and therefore Mx = x1Me1 = x1M 1


One dimensional Minimization
R ( x ) R ( x ) = ( b x1Me1 ) ( b x1Me1 )
T

= b b 2 x1b Me1 + x ( Me1 )


T

2
1

( Me1 )

d
T
T
T
R ( x ) R ( x ) = 2b Me1 + 2 x1 ( Me1 ) ( Me1 ) = 0
dx
T
b Me1
x1 = T T
e1 M Me1 Normalization
SMA-HPC 2003 MIT

Minimization View

QR Factorization

One-dimensional
Minimization, Picture

Me1 = M 1
b

x1

b Me1
x1 = T T
e1 M Me1

e1
One dimensional minimization yields same result as
projection on the column!
SMA-HPC 2003 MIT

Minimization View

QR Factorization

Two-dimensional
Minimization

Now x = x1e1 + x2 e2 and Mx = x1Me1 + x2 Me2


Residual Minimization
R ( x ) R ( x ) = ( b x1Me1 x2 Me2 ) ( b x1Me1 x2 Me2 )
T

= b b 2 x1b Me1 + x ( Me1 )

( Me1 )
T
T
2
2 x2b Me2 + x2 ( Me2 ) ( Me2 )

Coupling
Term
SMA-HPC 2003 MIT

2
1

+2 x1 x2 ( Me1 )

( Me2 )

Minimization View

QR Factorization

Two-dimensional
Minimization Continued

More General Search Directions

x = v1 p1 + v2 p2 and Mx = v1Mp1 + v2 Mp2


span { p1 , p2 } = span {e1 , e2 }
R ( x ) R ( x ) = b b 2v1b Mp1 + v ( Mp1 )
T

2
1

( Mp1 )

2v2b Mp2 + v ( Mp2 )


T

Coupling
Term
T
1

( Mp2 )
T
+2v1v2 ( Mp1 ) ( Mp2 )
2
2

If p M T Mp2 = 0 Minimizations Decouple!!


SMA-HPC 2003 MIT

Minimization View

QR Factorization

Forming MTM orthogonal


Minimization Directions

ith search direction equals MTM orthogonalized unit vector


i 1

pi = ei rji p j

pi M T Mp j = 0

j =1

Use previous orthogonalized


Search directions

Mp ) ( Me )
(
=
( Mp ) ( Mp )
T

rji

SMA-HPC 2003 MIT

Minimization View

QR Factorization

Minimizing in the Search


Direction

Decoupled minimizations done individually


Minimize: v ( Mpi )
2
i

Differentiating:

T
Mp

2
v
b
( i ) i Mpi

2vi ( Mpi )

vi =

SMA-HPC 2003 MIT

( Mpi ) 2bT Mpi = 0

bT Mpi

( Mpi ) ( Mpi )
T

QR Factorization
For i = 1 to N

pi = ei

For each Source Column left of target

rij pTj M T Mpi


pi pi rij p j

rii = Mpi Mpi


1
pi
pi
rii
SMA-HPC 2003 MIT

Minimization Algorithm

For each Target Column

For j = 1 to i-1

x = x + vi pi

Minimization View

Orthogonalize Search Direction

Normalize search direction

QR Factorization
Q1

Minimization and QR
Comparison

QN

Q2

1
e1
r11
p1

1
( e2 r12e1 )
r22

SMA-HPC 2003 MIT

p2

Orthonormal

1
e2 riN ei )
(
MTM
rNN
Orthonormal
pN

QR Factorization

Search Direction

Orthogonalized unit vectors search directions


{ p1 , , pN }
{e1 , e2 , , eN }

Unit Vectors

MTM
Orthogonalization

Search Directions

Could use other sets of starting vectors


2
b
,
Mb
,
M
b,}
{

MTM
Krylov-Subspace Orthogonalization

Why?
SMA-HPC 2003 MIT

{ p1 , , pN }

Search Directions

Summary
QR Algorithm
Projection Formulas
Orthonormalizing the columns as you go
Modified Gram-Schmidt Algorithm

QR and Singular Matrices


Matrix is singular, column of Q is zero.

Minimization View of QR
Basic Minimization approach
Orthogonalized Search Directions
QR and Length minimization produce identical results

Mentioned changing the search directions


SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 6


Krylov-Subspace Matrix Solution Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
General Subspace Minimization Algorithm
Review orthogonalization and projection formulas

Generalized Conjugate Residual Algorithm


Krylov-subspace
Simplification in the symmetric case.
Convergence properties

Eigenvalue and Eigenvector Review


Norms and Spectral Radius
Spectral Mapping Theorem

Arbitrary Subspace
methods

Approach to Approximately Solving Mx=b


w01
wk 11

,...,
{w0 ,..., wk 1}
w

w
k 1N
0N

Pick a kdimensional
Subspace

Approximate x as a weighted sum of {w0 ,..., wk 1}


k

x =
k

k 1

w
i =0

SMA-HPC 2003 MIT

Residual Minimization

Arbitrary Subspace
methods

The residual is defined as r b Mx


k

If x =
k

k 1

w
i

i =0

k 1

r = b Mx = b i Mwi
k

i =0

Residual Minimizing idea: pick i ' s to minimize


r

k 2
2

( ) (r )

SMA-HPC 2003 MIT

k 1

= b i Mwi b i Mwi
i =0
i =0

k 1

Arbitrary Subspace
methods

Minimizing r

k 2
2

= b

Residual Minimization
Computational Approach
k 1

Mw
i =0

is easy if

i
2

( Mw ) = 0, i j or ( Mw ) orthogonal to ( Mw )
Create a set of vectors { p0 ,..., pk 1} such that
( Mwi )

span { p0 ,..., pk 1} = span {w0 ,..., wk 1}

and ( Mpi ) ( Mp j ) = 0, i j
T

SMA-HPC 2003 MIT

Residual Minimization

Arbitrary Subspace
methods

Algorithm Steps

Given M , b and a set of search directions {w0 ,..., wk 1}


1) Generate p j 's by orthogonalizing Mw j ' s
j 1

For j = 0 to k 1 p j = w j
i =0

( Mw ) ( Mp ) p
T

( Mpi ) ( Mpi )
T

2) compute the r minimizing solution x k


k 1

( r ) ( Mp )

i =0

( Mpi ) ( Mpi )

x =
k

SMA-HPC 2003 MIT

0 T

k 1

( r ) ( Mp )

i =0

( Mpi ) ( Mpi )

pi =

i T

pi

Arbitrary Subspace
methods
1) orthogonalize the Mwi ' s

w
p00

w11
p

Residual Minimization
Algorithm Steps by Picture

w
p22

w
p33

2) compute the r minimizing solution x k


M p1
r0

SMA-HPC 2003 MIT

M p0

Minimization Algorithm

Arbitrary Subspace
Solution Algorithm

r 0 = b Ax 0
For j = 0 to k-1
p j = wj
For i = 0 to j-1
T
p j p j ( Mp j ) ( Mpi ) pi
pj

( Mp ) ( Mp )
T

j +1

j +1

Normalize

= x + (r
j

=r

pj

) ( Mp ) p
( r ) ( Mp ) M p

SMA-HPC 2003 MIT

Orthogonalize
Search Direction

j T

Update Solution

j T

Update Residual

Arbitrary Subspace
methods

Subspace Selection
Criteria

Criteria for selecting w0 ,..., wk 1


All that matters is the span {w0 ,..., wk 1}
i ' s such that b Mx k = b

k 1

Mw
i =0

A b in the span {w0 ,..., wk 1} for k


1

is small

One choice, unit vectors, x k span {e1 ,..., ek }


Generates the QR algorithm if k=N
Can be terrible if k < N
SMA-HPC 2003 MIT

Arbitrary Subspace
methods

Subspace Selection
Historical Development

1 T
T
Consider minimizing f ( x ) = x Mx x b
2
T
T

Assume M = M (symmetric) and x Mx > 0 (pos. def)

x f ( x ) = Mx b x = M 1b minimizes f

( )

Pick span {w0 ,..., wk 1} = span x f x ,..., x f x


0

Steepest descent directions for f, but f is not residual


Does not extend to nonsymmetric, non pos def case
SMA-HPC 2003 MIT

k 1

)}

Arbitrary Subspace
methods

Subspace Selection
Krylov Subspace

Note: span x f x 0 ,..., x f x k 1

)}

( )

= span r 0 ,..., r k 1

If: span {w0 ,..., wk 1} = span r ,..., r


then r k = r 0

k 1

Mr
i
i =0

k 1

and span r ,..., r

k 1

} = span {r , Mr ,..., M
0

k 1 0

Krylov Subspace
SMA-HPC 2003 MIT

The Generalized Conjugate


Residual Algorithm

Krylov Methods

The kth step of GCR

( r ) ( Mp )
k T

k =

( Mpk ) ( Mpk )
T

Determine optimal stepsize in


kth search direction

x k +1 = x k + k pk

k +1

Update the solution


and the residual

= r k Mpk

pk +1 = r

k +1

Mr ) ( Mp )
(

p
( Mp ) ( Mp )

SMA-HPC 2003 MIT

j =0

k +1 T

Compute the new


orthogonalized
search direction

The Generalized Conjugate


Residual Algorithm

Krylov Methods

Algorithm Cost for iter k

( r ) ( Mp )
k T

k =

Vector inner products, O(n)


Matrix-vector product, O(n) if sparse

( Mpk ) ( Mpk )
T

x k +1 = x k + k pk

k +1

Vector Adds, O(n)

= r k Mpk

pk +1 = r

k +1

Mr ) ( Mp )
(

p
( Mp ) ( Mp )
k

j =0

k +1 T

O(k) inner products,


total cost O(nk)

If M is sparse, as k (# of iters) approaches n,


3
total cost = O (n ) + O (2n ) + .... + O ( kn ) = O (n )
SMA-HPC 2003 MIT

Better Converge Fast!

The Generalized Conjugate


Residual Algorithm

Krylov Methods

Symmetric Case

An Amazing fact that will not be derived


T
k +1
j
If M = M then r Mp j < k
Mr ) ( Mp )
Mr ) ( Mp )
(
(
p = r
p p =r
p
( Mp ) ( Mp )
( Mp ) ( Mp )
Orthogonalization in one step
If k (# of iters ) n, then symmetric,
sparse, GCR is O(n2 )
Better Converge Fast!
k +1

k +1

j =0

SMA-HPC 2003 MIT

k +1 T

k +1 T

k +1

k +1

Krylov Methods
Nodal Formulation

No-leak Example
Insulated bar and Matrix

Incoming Heat

T (1)

T (0)
Near End
Temperature

Discretization

m
SMA-HPC 2003 MIT

2 1

1 2
Nodal

1 Equation
Form

1 2

Far End
Temperature

Krylov Methods
Nodal Formulation
1

m
SMA-HPC 2003 MIT

No-leak Example
Circuit and Matrix

2 1

1 2

1 2

m 1

Nodal
Equation
Form

Krylov Methods
Nodal Formulation

leaky Example
Conducting bar and Matrix

T (1)

T (0)
Near End
Temperature

Discretization

2.01 1

1 2.01
Nodal

Equation

1 Form

1 2.01

m
SMA-HPC 2003 MIT

Far End
Temperature

leaky Example

Krylov Methods
Nodal Formulation
1

m
SMA-HPC 2003 MIT

Circuit and Matrix

m 1

2.01 1

1 2.01
Nodal

Equation

1
Form

1 2.01

GCR Performance(Random Rhs)


10

R
E
S
I
D
U
A
L

10

10

10

10

Insulating
-1

Leaky
-2

-3

-4

10

20

Iteration

30

40

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

50

60

GCR Performance(Rhs = -1,+1,-1,+1.)


0

10

R
E
S
I
D
U
A
L

-1

10

-2

10

-3

10

Insulating

-4

10

Leaky
-5

10

10

15

20

25

Iteration

30

35

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

40

45

50

Convergence Analysis

Krylov Subspace
Methods

Polynomial Approach

If span {w0 ,..., wk } = span r 0 , Mr 0 ,..., M k r 0

k +1

M r
i =0

k +1

i 0

= k ( M ) r

kth order polynomial

= r i M r = ( I M k ( M )) r
0

i +1 0

i =0

Note: for any 0 0


0
1
0
0
span r , r = r 0 Mr

SMA-HPC 2003 MIT

}=

span r , Mr

Krylov Methods

Convergence Analysis
Basic Properties

If j 0 for all j k in GCR, then


0
0
k 0
1) span { p0 , p1 ,..., pk } = span r , Mr , ..., M r

2) x

= k ( M )r , k is the k order
k +1
polynomial which minimizes r

k +1

th

2
2

= b Mx = r M k ( M )r
0
0
= ( I M k ( M ) ) r k +1 ( M ) r
th
0
where k +1 ( M ) r is the ( k + 1) order poly
k +1 2
minimizing r
subject to k +1 ( 0 ) =1

3) r

k +1

k +1

SMA-HPC 2003 MIT

Convergence Analysis

Krylov Methods

Optimality of GCR poly

GCR Optimality Property


r

k +1 2
2

k+1 ( M )r

0 2
2

where k+1 is any k order


th

polynomial such that k+1 ( 0 ) =1

Therefore
Any polynomial which satisfies
the zero constraint can be used
to get an upper bound on
SMA-HPC 2003 MIT

k +1 2
2

Eigenvalues and
Vectors Review

Basic Definitions

Eigenvalues and eigenvectors of a matrix M satisfy


eigenvalue

Mui = i ui
eigenvector

Or, i is an eigenvalue of M if

M i I is singular

ui is an eigenvector of M if

( M i I ) ui

SMA-HPC 2003 MIT

=0

Eigenvalues and
Vectors Review

1.1 1
1 1.1

M 11
M
21

M N 1

Examples

1 1 0 0
1 1 0 0

0 0 1 1

0 0 1 2

0
M 22

SMA-HPC 2003 MIT

Basic Definitions

M NN 1

M NN

Eigenvalues?
Eigenvectors?

What about a lower


triangular matrix

A Simplifying
Assumption

Eigenvalues and
Vectors Review

Almost all NxN matrices have N linearly


independent Eigenvectors

u1

u2

u3

uN

= 1u1

2u2

3u3

N u N

The set of all eigenvalues of M is known as


the Spectrum of M
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

A Simplifying
Assumption Continued

Almost all NxN matrices have N linearly


independent Eigenvectors

MU =U

1
0

0
0

0
0
N

U MU = or M = U U
1

Does NOT imply distinct eigenvalues, i can equal j


Does NOT imply M is nonsingular
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review
Im ( )

Spectral Radius

Re ( )

The spectral Radius of M is the radius of the smallest


circle, centered at the origin, which encloses all of
Ms eigenvalues
SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

Heat Flow Example

Incoming Heat

T (1)

T (0)
Unit Length Rod

T1
+
-

vs = T(0)

SMA-HPC 2003 MIT

TN
+
-

vs = T(1)

Eigenvalues and
Vectors Review

Heat Flow Example


Continued

2 1 0 0
1 2

0
1

0
0
1
2

Eigenvalues N=20

SMA-HPC 2003 MIT

Eigenvalues and
Vectors Review

Heat Flow Example


Continued

Four Eigenvectors Which ones?

SMA-HPC 2003 MIT

Useful
Eigenproperties

Spectral Mapping
Theorem

Given a polynomial

f ( x ) = a0 + a1 x + + a p x

Apply the polynomial to a matrix

f ( M ) = a0 + a1M + + a p M
Then

spectrum ( f ( M ) ) = f ( spectrum ( M ) )

SMA-HPC 2003 MIT

Useful
Eigenproperties

Spectral Mapping
Theorem Proof

Note a property of matrix powers

MM = U U U U = U U
p
p 1
M = U U
1

Apply to the polynomial of the matrix


1
1
p 1
f ( M ) = a0UU + a1U U + + a pU U
Factoring

f ( M ) = U ( a0 I + a1 + + a p
Diagonal

f ( M ) U = U ( a0 I + a1 + + a p
SMA-HPC 2003 MIT

)U

Useful
Eigenproperties

Spectral
Decomposition

Decompose arbitrary x in eigencomponents

x = 1u1 + 2u2 + + N u N

1
Compute by solving U = x = U 1 x

N
Applying M to x yeilds
Mx = M (1u1 + 2u2 + + N u N )
= 11u1 + 2 2u2 + + N N u N
SMA-HPC 2003 MIT

Krylov Methods

Convergence Analysis
Important Observations

1) The GCR Algorithm converges to the exact solution


in at most n steps

Proof: Let n ( x ) = ( x 1 )( x 2 ) ... ( x n )


where i ( M ) .

n ( M ) r 0 = 0 and therefore r n = 0
2) If M has only q distinct eigenvalues, the GCR
Algorithm converges in at most q steps

Proof: Let q ( x ) = ( x 1 )( x 2 ) ... ( x q )

SMA-HPC 2003 MIT

Summary
Arbitrary Subspace Algorithm
Orthogonalization of Search Directions

Generalized Conjugate Residual Algorithm


Krylov-subspace
Simplification in the symmetric case.
Leaky and insulating examples

Eigenvalue and Eigenvector Review


Spectral Mapping Theorem

GCR limiting Cases


Q-step guaranteed convergence

Introduction to Simulation - Lecture 7


Krylov-Subspace Matrix Solution Methods
Part II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Reminder about GCR
Residual minimizing solution
Krylov Subspace
Polynomial Connection

Review Eigenvalues and Norms


Induced Norms
Spectral mapping theorem

Estimating Convergence Rate


Chebychev Polynomials

Preconditioners
Diagonal Preconditioners
Approximate LU preconditioners

With Normalization

Generalized
Conjugate Residual
Algorithm

r 0 = b Ax 0

For j = 0 to k-1
pj = r j
Residual is next search direction
For i = 0 to j-1
Orthogonalize
T
p j p j ( Mp j ) ( Mpi ) pi Search Direction
pj

pj

( Mp ) ( Mp )
T

x j +1 = x j + ( r
r j +1 = r j

Normalize

) ( Mp ) p
( r ) ( Mp ) M p
j T

Update Solution

j T

Update Residual

SMA-HPC 2003 MIT

Generalized
Conjugate Residual
Algorithm
1) orthogonalize the Mr i ' s

rp00

rp11

With Normalization
Algorithm Steps by Picture

pr 22

rp33

2) compute the r minimizing solution x k


r k+1

SMA-HPC 2003 MIT

rk
M pk

First Few Steps

Generalized
Conjugate Residual
Algorithm

r0
First search direction r = b Mx = b, p0 =
Mr 0
0

Residual minimizing x1 =
solution
Second Search
Direction

( r 0 ) Mp0 p0
T

r1 = b Mx1 = r 0 1Mr 0
p1 =

r1 1,0 p0

M r1 1,0 p0

SMA-HPC 2003 MIT

Generalized
Conjugate Residual
Algorithm
Residual minimizing
solution

First few steps


Continued

x 2 = x1 + ( r1 ) Mp1 p1
T

Third Search Direction

r 2 = b Mx 2 = r 0 2,1Mr 0 2,0 M 2 r 0

p2 =

r1 2,0 p0 2,1 p1

M r1 2,0 p0 2,1 p1

SMA-HPC 2003 MIT

The kth step of GCR

Generalized
Conjugate Residual
Algorithm
k 1

pk = r ( Mr k )
k

j =0

pk =

pk
Mpk

k = ( r k ) ( Mpk )
T

x k +1 = x k + k pk

r k +1 = r k k Mpk

( Mp ) p
j

Orthogonalize and
normalize search
direction

Determine optimal stepsize in


kth search direction
Update the solution
and the residual

SMA-HPC 2003 MIT

Polynomial view

Generalized
Conjugate Residual
Algorithm

If j 0 for all j k in GCR, then


1) span { p0 , p1 ,..., pk } = span r 0 , Mr 0 ,..., Mr k

2) x k +1 = k ( M ) r 0 , k is the k th order poly


2
minimizing r k +1
2
k +1
k +1
0
3) r = b Mx = r M k ( M ) r 0
= ( I M k ( M ) ) r 0 k +1 ( M ) r 0
th
where k +1 ( M ) r 0 is the ( k + 1) order poly
2
minimizing r k +1 subject to k +1 ( 0 ) =1
SMA-HPC 2003 MIT

Residual Minimization

Krylov Methods

Polynomial View

If x k +1 span r 0 , Mr 0 ,..., Mr k

minimizes r k +1

2
2

= k ( M )r , k is the k order poly


2
minimizing r k +1
2
k +1
k +1
2) r = b Mx = ( I M k ( M ) ) r 0 =k +1 ( M ) r 0
th
where k +1 ( M ) r 0 is the ( k + 1) order poly
k +1 2
minimizing r
subject to k +1 ( 0 ) =1

1) x

k +1

th

Polynomial Property only a function of


solution space and residual minimization
SMA-HPC 2003 MIT

Krylov Methods
Nodal Formulation

No-leak Example
Insulated bar and Matrix

Incoming Heat

T (1)

T (0)
Near End
Temperature

Discretization

m
SMA-HPC 2003 MIT

Far End
Temperature

2 1

1 2
Nodal

1 Equation
Form

1
2

10

Krylov Methods
Nodal Formulation

leaky Example
Conducting bar and Matrix

T (1)

T (0)
Near End
Temperature

Discretization

Far End
Temperature

2.01 1

1 2.01
Nodal

Equation

1 Form

1 2.01

m
SMA-HPC 2003 MIT

11

GCR Performance(Random Rhs)


10

R
E
S
I
D
U
A
L

10

10

10

10

Insulating
-1

Leaky
-2

-3

-4

10

20

Iteration

30

40

50

60

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

12

GCR Performance(Rhs = -1,+1,-1,+1.)


0

10

R
E
S
I
D
U
A
L

-1

10

-2

10

-3

10

Insulating

-4

10

Leaky
-5

10

10

15

20

25

Iteration

30

35

40

45

50

Plot of log(residual) versus Iteration


SMA-HPC 2003 MIT

13

Krylov Methods

Residual Minimization
Optimality of poly

Residual Minimizing Optimality Property


r k +1 k+1 ( M )r 0 k+1 ( M ) r 0
k+1 is any k th order poly such that k+1 ( 0 ) =1

Therefore
Any polynomial which satisfies
the constraints can be used to
get an upper bound on

r k +1
r0

SMA-HPC 2003 MIT

14

Induced Norms

Matrix Magnification
Question

Suppose y = Mx
How much larger is y than x?
OR

How much does M magnify x?


SMA-HPC 2003 MIT

15

Vector Norm
Review

Induced Norms
L2 (Euclidean) norm :
x

i=1

L1 norm :
x

xi

xi

i=1

< 1

< 1

L norm :
x

= max
i

xi

< 1

SMA-HPC 2003 MIT

16

Standard Induced
l-norms

Induced Matrix
Norms

Definition:
M l max

Mx
x

Examples

M
SMA-HPC 2003 MIT

max
1

= max

i
j =1

max

M ij

M
j
i =1

ij

x l =1

Mx

Max Column
Sum
Max Row
Sum

17

Standard Induced
l-norms continued

Induced Matrix
Norms

= m ax
j

i =1

Why? Let x =

= m ax
i

Why? Let

[1

j =1

x =

= max abs column sum

ij

0
ij

[ 1

= max abs column sum

1]

Not So easy to compute

SMA-HPC 2003 MIT

As the algebra on the slide shows the relative changes in the solution x is bounded
by an A-dependent factor times the relative changes in A. The factor

|| A1 || || A ||
was historically referred to as the condition number of A, but that definition has
been abandoned as then the condition number is norm-dependent. Instead the
condition number of A is the ratio of singular values of A.
cond ( A) =

max( A)
min( A)

Singular values are outside the scope of this course, consider consulting Trefethen
& Bau.

18

Useful
Eigenproperties

Spectral Mapping
Theorem

Given a polynomial

f ( x ) = a0 + a1 x + + a p x p

Apply the polynomial to a matrix

f ( M ) = a0 + a1M + + a p M p
Then

spectrum ( f ( M ) ) = f ( spectrum ( M ) )

SMA-HPC 2003 MIT

19

Krylov Methods

u N
=
u1

eigenvectors of M

k (M )

u
1

u N

u
1

Convergence Analysis
Norm of matrix polynomials

k ( 1 )

u N

k ( N )
1

Cond(U)

u
1

k ( 1 )

u N

k ( N )

condition number of
M's eigenspace
SMA-HPC 2003 MIT

20

Krylov Methods
k ( 1 )

Convergence Analysis
Norm of matrix polynomials

= max x =1
k ( N )
2

( ) x
k

= max i k ( i )
k ( M )

cond (V ) max i k ( i )

SMA-HPC 2003 MIT

21

Krylov Methods

Convergence Analysis
Important Observations

1) A residual minimizing Krylov subspace algorithm


converges to the exact solution in at most n steps

Proof: Let n ( x ) = ( x 1 )( x 2 ) ... ( x n )

where i ( M ) . Then, max i n ( i ) = 0,

n ( M ) = 0 and therefore r n = 0

2) If M has only q distinct e-values, the residual


minimizing Krylov subspace algorithm converges
in at most q steps

Proof: Let q ( x ) = ( x 1 )( x 2 ) ... ( x q )

SMA-HPC 2003 MIT

22

Convergence for M =MT

Krylov Methods

Residual Polynomial

If M = MT then
1) M has orthonormal eigenvectors

cond (V ) =

u
1

u N

u
1

u N

=1

k ( M ) = max i k ( i )

2) M has real eigenvalues

If M is postive definite, then ( M ) > 0

SMA-HPC 2003 MIT

23

Residual Poly Picture for Heat Conducting Bar Matrix


No loss to air (n=10)

* = evals(M)
- = 5th order poly
- = 8th order poly

SMA-HPC 2003 MIT

24

Residual Poly Picture for Heat Conducting Bar Matrix


No loss to air (n=10)

Keep k ( i ) as small as possible:


Strategically place zeros of the poly
SMA-HPC 2003 MIT

25

Convergence for M =MT

Krylov Methods

Polynomial Min-Max Problem

Consider ( M ) [ min , max ] , min > 0

Then a good polynomial ( pk ( M ) is small)


can be found by solving the min-max problem

min kth order max x [min ,max ] pk ( x )


polys s . t .
pk ( 0 ) =1

The min-max problem is exactly


solved by Chebyshev Polynomials
SMA-HPC 2003 MIT

26

Convergence for M =MT

Krylov Methods

Chebyshev Solves Min-Max

The Chebyshev Polynomial

Ck ( x ) cos ( k cos 1 ( x ) ) x [ 1,1]

min kth order max x [min ,max ] k ( x )


polys s .t .
k ( 0 ) =1

max x [min ,max ]

x
Ck 1 + 2 min
max min

min
Ck 1 + 2
max min

SMA-HPC 2003 MIT

27

Chebyshev Polynomials minimizing over [1,10]

SMA-HPC 2003 MIT

28

Convergence for M =MT

Krylov Methods

Chebychev Bounds

min kth order max x [min ,max ] k ( x )


polys s .t .
k ( 0 ) =1

max
Ck 1 2
max min

max

min

2
max

+ 1

min

SMA-HPC 2003 MIT

29

Convergence for M =MT

Krylov Methods

Chebychev Result

If ( M ) [ min , max ] , min > 0


k

rk

max

min

r0
2
max

+ 1

min

SMA-HPC 2003 MIT

30

Preconditioning

Krylov Methods

1
0

0
1
0

Diagonal Example

1
0

0
2
0

For which problem will GCR Converge Faster?


SMA-HPC 2003 MIT

31

Preconditioning

Krylov Methods

Diagonal Preconditioners

Let M = D + M nd
(

Apply GCR to D 1M x = I + D 1M nd x = D 1b
The Inverse of a diagonal is cheap to compute
Usually improves convergence
SMA-HPC 2003 MIT

32

Heat Conducting
Bar example
x
x1

x2

x
100

xi

xi +1

Discretized system

one small x
xn

2 + 1
u1 f (x1)

1 2 +


1 1+ +100
100


100 1+ +100 1


1
1

u f (x )
1
2

n n

max
> 100
min

SMA-HPC 2003 MIT

33

Which Convergence Curve is GCR?

rk
r0

Iteration
SMA-HPC 2003 MIT

34

Heat Conducting
Bar example

Preconditioned Matrix
Eigenvalues

Residual Minimizing
Krylov-subspace
Algorithm can
eliminate outlying
eigenvalues by
placing polynomial
zeros directly on
them.

SMA-HPC 2003 MIT

35

The World According


to Krylov

Heat Flow Comparison Example

Dimension Dense GE Sparse GE

GCR

O ( m3 )

O (m)

O ( m2 )

O ( m6 )

O ( m3 )

O ( m3 )

O ( m9 )

O ( m6 )

O ( m4 )

GCR faster than banded GE in 2 and 3 dimensions


Could be faster, 3-D matrix only m3 nonzeros.
GCR converges too slowly!
SMA-HPC 2003 MIT

36

Preconditioning

Krylov Methods

Approximate LU
Preconditioners

Let M L U
Applying GCR to

((

LU

( )

M x = LU

Use an Implicit matrix representation!


Forming y =

(( LU )

M x is equivalent to

solving LUy = Mx
SMA-HPC 2003 MIT

37

Preconditioning

Krylov Methods

Approximate LU
Preconditioners Continued

Nonzeros in an exact LU Factorization


Filled-in LU factorization
Too expensive.

Ignore the fillin!

SMA-HPC 2003 MIT

38

Factoring 2-D Grid Matrices

Generated Fill-in Makes Factorization Expensive

SMA-HPC 2003 MIT

39

Preconditioning

Krylov Methods

Approximate LU
Preconditioners Continued

THROW AWAY FILL-INS!


Throw away all fill-ins
Throw away only fill-ins with small values
Throw away fill-ins produced by other fill-ins
Throw away fill-ins produced by fill-ins of
other fill-ins, etc.

40

Summary
Reminder about GCR
Residual minimizing solution
Krylov Subspace
Polynomial Connection

Review Norms and Eigenvalues


Induced Norms
Spectral mapping theorem

Estimating Convergence Rate


Chebychev Polynomials

Preconditioners
Diagonal Preconditioners
Approximate LU preconditioners
SMA-HPC 2003 MIT

41

Introduction to Simulation - Lecture 8


1-D Nonlinear Solution Methods

Jacob White

Thanks to Deepak Ramaswamy Jaime Peraire, Michal


Rewienski, and Karen Veroy

Outline
Nonlinear Problems
Struts and Circuit Example

Richardson and Linear Convergence


Simple Linear Example

Newtons Method

Derivation of Newton
Quadratic Convergence
Examples
Global Convergence
Convergence Checks

Nonlinear
problems
( x0 , y0 )

( x2 , y2 )

Strut Example

( x1 , y1 )

Given: x0, y0, x1, y1, W


Find: x2, y2

Load force
W

Need to Solve f x

W

Struts Example

Nonlinear Problems

Reminder: Strut Forces

L0  L
EAc
L0

f
fx
f

(0,0)
L

x1 , y1

fy

fx
X

fy
L

SMA-HPC 2003 MIT

H L0  L

x1
f
L
y1
f
L
2
1

2
1

x y

Nonlinear
problems

Strut Example

( x1 , y1 )

( x0 , y0 )

x2  x0  y2  y0

L1

x2  x1  y2  y1

L2

f1
( x2 , y2 )

x2  x0
H ( Lo  L1 )
L1
x2  x1
H ( Lo  L2 )
L2

f1x

f2

f2 x

Load force
W

1x

 f2 x

1y

 f2 y  W

Nonlinear
problems

Strut Example

Why Nonlinear?

y2  y1
H ( Lo  L2 ) 
L2
y2  y0
H ( Lo  L1 )  W
L1

Pull Hard on the


Struts

The strut forces change


in both magnitude and
direction

Nonlinear
problems

v1
10v

Circuit Example

v2

10

1
I r  Vr
10

+
- Vd

Vd

I d  I s (e
Need to Solve

Id  Ir
I vsrc  I r

0
0

Vt

 1)

Nonlinear
problems

Solve Iteratively

Hard to find analytical solution for

f ( x)

Solve iteratively
0
guess at a solution x
x0
repeat for k = 0, 1, 2, .

k 1

W x

k 1
f
x
|0
until

Ask
Does the iteration converge to correct solution ?
How fast does the iteration converge?

Richardson
Iteration

Definition

Richardson Iteration Definition

k 1

x  f (x )

An iteration stationary point is a solution

k 1

f ( xk )

xk

x* ( Solution)

Richardson
Iteration

Example 1

f ( x)
Start with

0.7 x  10
0

x1

x 0  f ( x 0 ) 10

x2

x1  f ( x1 ) 13

x 6 14.27

x3

x 2  f ( x 2 ) 13.9

x7

x4

x3  f ( x3 ) 14.17

x8 14.28

14.25
14.28
Converged

Richardson
Iteration

Example 1

f ( x)

x x

0.7 x  10

Richardson
Iteration

Example 2

f ( x)
Start with

2 x  10

x0

x1

x0  f ( x0 ) 10

x2

x1  f ( x1 )

x3

x2  f ( x2 ) 130

x4

x3  f ( x3 )

40
400

No convergence !

Richardson
Iteration

Convergence

Setup

Iteration Equation
Exact Solution

k 1

x  f (x )
*

x N
f (x )
0

Computing Differences

k 1

x

x  x  f (x )  f (x )
Need to Estimate

Richardson
Iteration

f (v )  f y

Convergence

Mean Value Theorem

wf v
v  y
wx

v > v, y @

v

Richardson
Iteration

Convergence

Use MVT

Iteration Equation
Exact Solution

k 1

x  f (x )
*

x N
f (x )
0

Computing Differences

k 1

x

x  x  f (x )  f (x )
wf x k
*
1 
x x
wx

Richardson
Iteration

If

1

And
Then
Or

Convergence

Richardson Theorem

wf x
wx

d J  1 for all x s.t. x  x  G

x x G

k 1

x dJ x x

lim k of x

k 1

x

lim k of J x  x

Linear Convergence

Richardson
Iteration

Example 1

f ( x)

x x

0.7 x  10

Richardson
Iteration

Problems

Convergence is only linear


x, f(x) not in the same units:
x is a voltage, f(x) a current in circuits
x is a displacement, f(x) a force in struts
Adding 2 different physical quantities
But a Simple Algorithm
Just calculate f(x) and update

Newtons method

Another approach

From the Taylor series about solution

df k *
f ( x )  f ( x )  ( x ) ( x  xk )
dx
*

Define iteration
Do k = 0 to .
1
df k
k 1
k
x
x  ( x ) f ( xk )
dx

df k
if ( x )
dx

until convergence

1

exists

Newtons Method

Graphically

Newtons Method

Example

Newtons Method

x x

Example

Newtons Method
0

f ( x* )

Convergence

2
df
d
f
k
k
k
*
f ( x )  ( x )( x  x )  2 ( x )( x*  x k ) 2
dx
dx
k
*

some x [ x , x ]

Mean Value theorem


truncates Taylor series

But

df k k 1 k
f ( x )  ( x )( x  x )
dx
k

by Newton
definition

Newton's Method: Convergence, Cont.

Subtracting the two equations:

  (df/dx)(x^k)(x^{k+1} - x*) = (1/2)(d^2f/dx^2)(x~)(x* - x^k)^2

Dividing through by df/dx:

  x^{k+1} - x* = [(df/dx)(x^k)]^{-1} (1/2)(d^2f/dx^2)(x~)(x* - x^k)^2

Suppose  |[(df/dx)(x)]^{-1} (1/2)(d^2f/dx^2)(x)| <= L  for all x. Then

  |x^{k+1} - x*| <= L |x^k - x*|^2

Convergence is quadratic if L is bounded.

Newton's Method: Convergence, Example 1

  f(x) = x^2 - 1 = 0, find x*  (x* = 1);   (df/dx)(x^k) = 2 x^k

The Newton step gives  2 x^k (x^{k+1} - x^k) = -((x^k)^2 - 1), and therefore

  2 x^k (x^{k+1} - x*) = (x^k - x*)^2
  (x^{k+1} - x*) = (1 / (2 x^k)) (x^k - x*)^2

Convergence is quadratic.

Newton's Method: Convergence, Example 2

  f(x) = x^2 = 0,  x* = 0;   (df/dx)(x^k) = 2 x^k

Note: df/dx is not bounded away from zero near the root. For x^k != x* = 0,

  2 x^k (x^{k+1} - 0) = (x^k - 0)^2
  (x^{k+1} - x*) = (1/2)(x^k - x*)

Convergence is only linear.

[Figure: error versus iteration for Examples 1 and 2; quadratic convergence for f(x) = x^2 - 1, linear convergence (rate 1/2) for f(x) = x^2.]

Newton's Method: Convergence Theorem

Suppose  |[(df/dx)(x)]^{-1} (1/2)(d^2f/dx^2)(x)| <= L  for all x.
If  L |x^0 - x*| <= gamma < 1,  then x^k converges to x*.

Proof sketch:

  |x^1 - x*| <= L |x^0 - x*| |x^0 - x*| <= gamma |x^0 - x*|
  |x^2 - x*| <= L gamma |x^0 - x*| |x^1 - x*|,
      so |x^2 - x*| <= gamma^2 |x^1 - x*| <= gamma^3 |x^0 - x*|
  |x^3 - x*| <= gamma^4 |x^2 - x*| <= gamma^7 |x^0 - x*|

Theorem: If L is bounded (df/dx bounded away from zero and d^2f/dx^2 bounded), then Newton's method is guaranteed to converge given a "close enough" initial guess.

Does it always converge?

[Figure: convergence depends on a good initial guess; from a poor starting point x^1 the Newton tangents bounce between regions and never approach the root.]

Newton's Method: Convergence Checks

Need a "delta-x" check to avoid false convergence:

[Figure: an f(x) that is steep near the root; |f(x^{k+1})| < eps_fa while the iterate is still far from x*, i.e. |x^{k+1} - x^k| > eps_xa + eps_xr |x^{k+1}|.]

Also need an "f(x)" check to avoid false convergence:

[Figure: an f(x) that is flat near the root; |x^{k+1} - x^k| < eps_xa + eps_xr |x^{k+1}| while the residual is still large, |f(x^{k+1})| > eps_fa.]

In practice, declare convergence only when both tests pass:

  |x^{k+1} - x^k| <= eps_xa + eps_xr |x^{k+1}|   and   |f(x^{k+1})| <= eps_fa
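A small Python sketch of the combined test; the tolerance names eps_xa, eps_xr, eps_fa mirror the thresholds above and their default values are illustrative assumptions.

    def converged(x_new, x_old, f_new, eps_xa=1e-9, eps_xr=1e-6, eps_fa=1e-9):
        """Declare convergence only if the delta-x and |f| tests both pass."""
        dx_ok = abs(x_new - x_old) <= eps_xa + eps_xr * abs(x_new)
        f_ok = abs(f_new) <= eps_fa
        return dx_ok and f_ok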

Summary
  Nonlinear problems
    Struts and circuit example
  Richardson and linear convergence
    Simple linear example
  1-D Newton's method
    Derivation of Newton
    Quadratic convergence
    Examples
    Global convergence
    Convergence checks
Introduction to Simulation - Lecture 9

Multidimensional Newton Methods
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Quick review of 1-D Newton
    Convergence testing
  Multidimensional Newton method
    Basic algorithm
    Description of the Jacobian
    Equation formulation
  Multidimensional convergence properties
    Prove local convergence
    Improving convergence

Newton Idea: 1-D Reminder

Problem: find x* such that f(x*) = 0.

Use a Taylor series expansion:

  0 = f(x*) = f(x) + (df(x)/dx)(x* - x) + H.O.T.

If x is close to the exact solution, the higher-order terms are small and

  (df(x)/dx)(x* - x) = -f(x)   (approximately)

Newton Algorithm: 1-D Reminder

  x^0 = initial guess, k = 0
  Repeat {
      Solve (df(x^k)/dx)(x^{k+1} - x^k) = -f(x^k) for x^{k+1}
      k = k + 1
  } Until ?
      |x^{k+1} - x^k| < threshold?
      |f(x^{k+1})| < threshold?

[Figure: algorithm picture; each iterate is found where the tangent line at the previous iterate crosses zero.]

Newton Algorithm: 1-D Reminder, Convergence Checks

Need a "delta-x" check to avoid false convergence:

[Figure: steep f(x); |f(x^{k+1})| < eps_fa while |x^{k+1} - x^k| > eps_xa + eps_xr |x^{k+1}|.]

Also need an "f(x)" check to avoid false convergence:

[Figure: flat f(x); |x^{k+1} - x^k| < eps_xa + eps_xr |x^{k+1}| while |f(x^{k+1})| > eps_fa.]

Newton Algorithm: 1-D Reminder, Local Convergence

Convergence depends on a good initial guess.

[Figure: from a poor starting point x^1 the Newton iterates oscillate and fail to converge.]

Multidimensional Newton Method: Example Problem, Strut and Joint

One strut anchored at the origin with free end at (x, y), carrying an applied load (FLx, FLy):

  l = sqrt(x^2 + y^2)
  F = E Ac (lo - l) / lo            (axial force)
  f_x = (x / l) F,   f_y = (y / l) F

Force balance at the joint:

  F(x) = 0,  i.e.
  (x / l)(E Ac / lo)(lo - l) + FLx = 0
  (y / l)(E Ac / lo)(lo - l) + FLy = 0

Multidimensional Newton Method: Example Problem, Nonlinear Resistors

[Figure: resistor 1 from node 1 to ground (current i1), resistor 2 between nodes 1 and 2 (current i2), resistor 3 from node 2 to ground (current i3); each nonlinear resistor obeys i = g(v) of its branch voltage.]

Nodal analysis:

  At node 1:  i1 + i2 = 0   ->   g(v1) + g(v1 - v2) = 0
  At node 2:  i3 - i2 = 0   ->   g(v2) - g(v1 - v2) = 0

Two coupled nonlinear equations in two unknowns.

Multidimensional Newton Method: General Setting

Problem: find x* such that F(x*) = 0, where x* is in R^N and F : R^N -> R^N.

Use a Taylor series expansion:

  0 = F(x*) = F(x) + J_F(x)(x* - x) + H.O.T.

where J_F(x) is the Jacobian matrix. If x is close to the exact solution,

  J_F(x)(x* - x) = -F(x)   (approximately)

Multidimensional Newton Method: Strut and Joint

For x = (x, y) and the strut equations

  (x / l)(E Ac / lo)(lo - l) + FLx = 0
  (y / l)(E Ac / lo)(lo - l) + FLy = 0

what are the four entries of J_F(x) = [ ? ? ; ? ? ]?

Multidimensional Newton Method: Nodal Analysis, Nonlinear Resistor

  At node 1:  F1(v) = g(v1) + g(v1 - v2) = 0
  At node 2:  F2(v) = g(v2) - g(v1 - v2) = 0

What are the four entries of J_F(v) = [ ? ? ; ? ? ]?

Multidimensional Newton Method: Jacobian Matrix

  J_F(x) dx = F(x + dx) - F(x)   (to first order)

  J_F(x) = [ dF1(x)/dx1   ...   dF1(x)/dxN
             ...
             dFN(x)/dx1   ...   dFN(x)/dxN ]

Multidimensional Newton Method: Jacobian Matrix, Singular Case

Suppose J_F(x) is singular: there is a nonzero dx with J_F(x) dx = 0.
What does it mean? To first order, moving along dx does not change F, so the Newton step is not uniquely defined.

Multidimensional Newton Method: Newton Algorithm

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough
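A minimal numpy sketch of the algorithm above, applied to the two-node nonlinear-resistor circuit. The constitutive law g(v) = v + v^3, the 1 A current injected at node 1 (to make the solution nontrivial), and the tolerances are my assumptions for illustration, not the course's.

    import numpy as np

    def g(v):  return v + v**3
    def dg(v): return 1.0 + 3.0 * v**2

    def F(v):
        v1, v2 = v
        return np.array([g(v1) + g(v1 - v2) - 1.0,   # KCL at node 1
                         g(v2) - g(v1 - v2)])        # KCL at node 2

    def J(v):
        v1, v2 = v
        return np.array([[dg(v1) + dg(v1 - v2), -dg(v1 - v2)],
                         [-dg(v1 - v2),          dg(v2) + dg(v1 - v2)]])

    v = np.zeros(2)                        # initial guess
    for k in range(50):
        dv = np.linalg.solve(J(v), -F(v))  # Newton step
        v = v + dv
        if np.linalg.norm(dv) < 1e-10 and np.linalg.norm(F(v)) < 1e-10:
            break
    print(k, v)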

Multidimensional Newton Method: Computing the Jacobian and the Function

Consider the contribution of one nonlinear resistor connected between nodes n1 and n2, with branch voltage vb = v_{n1} - v_{n2} and branch current ib = g(vb).

  Summing currents at node n1:  F_{n1}(v) = g(v_{n1} - v_{n2}) + (other element terms)
  Summing currents at node n2:  F_{n2}(v) = -g(v_{n1} - v_{n2}) + (other element terms)

Differentiating at node n1:

  dF_{n1}(v)/dv_{n1} = +(dg/dv)(v_{n1} - v_{n2}) + ...
  dF_{n1}(v)/dv_{n2} = -(dg/dv)(v_{n1} - v_{n2}) + ...

Multidimensional Newton Method: Stamping a Resistor

Each element "stamps" a 2x2 pattern into the Jacobian and a pair of entries into F:

                   n1                           n2
  J_F(v):  n1 [ +(dg/dv)(v_{n1} - v_{n2})   -(dg/dv)(v_{n1} - v_{n2}) ]
           n2 [ -(dg/dv)(v_{n1} - v_{n2})   +(dg/dv)(v_{n1} - v_{n2}) ]

  F(v):    n1 [ +g(v_{n1} - v_{n2}) ]
           n2 [ -g(v_{n1} - v_{n2}) ]

Multidimensional Newton Method: More Complete Newton Algorithm

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k):
          Zero J_F and F
          For each element:
              Compute element currents and derivatives
              Sum currents into F, sum derivatives into J_F
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough
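A hedged Python sketch of the element-by-element assembly ("stamping") step in the algorithm above; the data layout (a list of node-index pairs, with -1 meaning ground) and the function names are mine.

    import numpy as np

    def assemble(v, elements, g, dg):
        """elements: list of (n1, n2) node-index pairs; index -1 means ground."""
        N = len(v)
        Jf, F = np.zeros((N, N)), np.zeros(N)
        for n1, n2 in elements:
            vb = (v[n1] if n1 >= 0 else 0.0) - (v[n2] if n2 >= 0 else 0.0)
            i, di = g(vb), dg(vb)                  # element current, derivative
            for n, sgn in ((n1, +1.0), (n2, -1.0)):
                if n < 0:
                    continue                       # ground row/column is dropped
                F[n] += sgn * i
                if n1 >= 0: Jf[n, n1] += sgn * di
                if n2 >= 0: Jf[n, n2] -= sgn * di
        return Jf, F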

Multidimensional Newton Method: Example, Heat Flow in a Leaky Bar

[Figure: a discretized bar with node temperatures T1, ..., TN; sources impose the boundary temperatures, vs = T(0) at one end and vs = T(1) at the other, and each node leaks heat to ambient through a nonlinear element with current ih = k1 T + k2 T^2.]

What is the Jacobian?

Multidimensional Newton Method: Convergence Theorem, Statement

Main Theorem. If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then Newton's method converges given a sufficiently close initial guess.

Multidimensional Newton Method: Convergence Theorem, Key Lemma

If  ||J_F(x) - J_F(y)|| <= A ||x - y||  then

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

This lemma substitutes for the mean value theorem: there is no multidimensional mean value theorem.

Multidimensional Newton Method: Convergence Theorem, Proof

By definition of the Newton iteration and the assumed bound on the inverse of the Jacobian:

  ||x^{k+1} - x^k|| = ||J_F^{-1}(x^k) F(x^k)|| <= beta ||F(x^k)||

Again applying the Newton iteration definition, now at step k - 1,

  F(x^{k-1}) + J_F(x^{k-1})(x^k - x^{k-1}) = 0

so

  ||x^{k+1} - x^k|| <= beta ||F(x^k) - F(x^{k-1}) - J_F(x^{k-1})(x^k - x^{k-1})||

Finally, using the lemma:

  ||x^{k+1} - x^k|| <= (beta A / 2) ||x^k - x^{k-1}||^2

Multidimensional Newton Method: Convergence Theorem, Proof Continued

Reorganizing the equation:

  (beta A / 2) ||x^{k+1} - x^k|| <= ( (beta A / 2) ||x^k - x^{k-1}|| )^2

If  (beta A / 2) ||x^1 - x^0|| = gamma < 1,  the step sizes shrink supergeometrically, so the sum over k of ||x^{k+1} - x^k|| is finite and x^{k+1} = sum of the steps + x^0 converges.

Non-converging Case: 1-D Picture

[Figure: an f(x) whose Newton iterates from x^1 overshoot back and forth.]

Must somehow limit the changes in x.

Newton Method with Limiting: Newton Algorithm

Newton algorithm for solving F(x) = 0:

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      x^{k+1} = x^k + limited(dx^{k+1})
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Limiting Methods

Direction corrupting (clip each component independently):

  limited(dx^{k+1})_i = dx^{k+1}_i                  if |dx^{k+1}_i| < delta
                        delta * sign(dx^{k+1}_i)    otherwise

Non-corrupting (scale the whole step, preserving its direction):

  limited(dx^{k+1}) = gamma dx^{k+1},   gamma = min(1, delta / ||dx^{k+1}||)

These are heuristics, with no guarantee of global convergence.

Newton Method with Limiting: Damped Newton Scheme

General damping scheme:

  Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
  x^{k+1} = x^k + alpha^k dx^{k+1}

Key idea, line search: pick alpha^k to minimize

  ||F(x^k + alpha dx^{k+1})||_2^2 = F(x^k + alpha dx^{k+1})^T F(x^k + alpha dx^{k+1})

The method performs a one-dimensional search in the Newton direction.
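A hedged Python sketch of damped Newton. In place of a true minimizing line search it uses a crude backtracking rule (halve alpha until ||F|| decreases), which is a common stand-in, not the course's prescription.

    import numpy as np

    def damped_newton(F, J, x0, tol=1e-10, max_iters=100):
        x = np.asarray(x0, dtype=float)
        for k in range(max_iters):
            Fx = F(x)
            if np.linalg.norm(Fx) < tol:
                return x, k
            dx = np.linalg.solve(J(x), -Fx)          # Newton direction
            alpha = 1.0
            while np.linalg.norm(F(x + alpha * dx)) >= np.linalg.norm(Fx):
                alpha *= 0.5                         # damp until F shrinks
                if alpha < 1e-12:
                    raise RuntimeError("line search failed (near-singular region?)")
            x = x + alpha * dx
        return x, max_iters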

Newton Method with Limiting: Damped Newton Convergence Theorem

If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then there exists a set of alpha^k in (0, 1] such that

  ||F(x^{k+1})|| = ||F(x^k + alpha^k dx^{k+1})|| <= gamma ||F(x^k)||   with gamma < 1

Every step reduces ||F||: global convergence!

Newton Method with Limiting: Damped Newton, Nested Iteration

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      Find alpha^k in (0, 1] such that ||F(x^k + alpha^k dx^{k+1})|| is minimized
      x^{k+1} = x^k + alpha^k dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

How can one find the damping coefficients alpha^k?

Newton Method with Limiting: Damped Newton, Singular Jacobian Problem

[Figure: an f(x) whose local minimum lies above zero; from x^2 the damped iterates slide into the minimum, where df/dx = 0.]

Damped Newton methods push iterates to local minimums of ||F||, and therefore find the points where the Jacobian is singular.

Summary
  Quick review of 1-D Newton
    Convergence testing
  Multidimensional Newton method
    Basic algorithm
    Description of the Jacobian
    Jacobian construction
    Local convergence theorem
  Damped Newton method
    Nested algorithm with line search
    Global convergence IF the Jacobian is nonsingular

Introduction to Simulation - Lecture 10

Modified Newton Methods
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Damped Newton schemes
    Globally convergent if the Jacobian is nonsingular
    Difficulty with singular Jacobians
  Introduce continuation schemes
    Problem with source/load stepping
    More general continuation scheme
  Improving continuation efficiency
    Better first guess for each continuation step
  Arc-length continuation

Multidimensional Newton Method: Newton Algorithm (Review)

Newton algorithm for solving F(x) = 0:

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k)(x^{k+1} - x^k) = -F(x^k) for x^{k+1}
      k = k + 1
  } Until ||x^{k+1} - x^k||, ||F(x^{k+1})|| small enough

Multidimensional Convergence Theorem (Review)

Main Theorem. If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then Newton's method converges given a sufficiently close initial guess.

Implications

If a function's first derivative never goes to zero, and its second derivative is never too large, then Newton's method can be used to find the zero of the function, provided you already know the answer. We need a way to develop Newton methods which converge regardless of the initial guess!

Non-converging Case: 1-D Picture

[Figure: Newton iterates from x^1 overshooting back and forth.]

Limiting the changes in x might improve convergence.

Newton Method with Limiting: Newton Algorithm (Review)

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      x^{k+1} = x^k + limited(dx^{k+1})
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Damped Newton Scheme (Review)

  Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
  x^{k+1} = x^k + alpha^k dx^{k+1}

Key idea, line search: pick alpha^k to minimize ||F(x^k + alpha dx^{k+1})||_2^2; the method performs a one-dimensional search in the Newton direction.

Newton Method with Limiting: Damped Newton Convergence Theorem (Review)

If
  a)  ||J_F^{-1}(x^k)|| <= beta                  (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||         (derivative is Lipschitz continuous)
then there exists a set of alpha^k in (0, 1] such that

  ||F(x^{k+1})|| = ||F(x^k + alpha^k dx^{k+1})|| <= gamma ||F(x^k)||   with gamma < 1

Every step reduces ||F||: global convergence!

Newton Method with Limiting: Damped Newton, Nested Iteration (Review)

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}
      Find alpha^k in (0, 1] such that ||F(x^k + alpha^k dx^{k+1})|| is minimized
      x^{k+1} = x^k + alpha^k dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

Newton Method with Limiting: Damped Newton Example

[Figure: a 1 V source drives node v1; a 10 Ohm resistor connects v1 to v2; a diode connects v2 to ground.]

  I_r - V_r / 10 = 0
  I_d - I_s (e^{V_d / V_t} - 1) = 0

Nodal equation with numerical values (I_s = 10^-16, V_t = 0.025):

  f(v2) = (v2 - 1)/10 + 10^-16 (e^{v2 / 0.025} - 1) = 0

[Figure: plot of f(v2); the exponential makes f extremely steep once the diode turns on, so undamped Newton steps from a poor guess overshoot badly.]

Newton Method with Limiting: Damped Newton, Nested Iteration

(The same nested algorithm as above: solve for dx^{k+1}, find alpha^k in (0, 1] minimizing ||F(x^k + alpha^k dx^{k+1})||, update, and repeat until converged.)

How can one find the damping coefficients?

Newton Method with Limiting: Damped Newton, Theorem Proof

By definition of the Newton iteration,

  dx^{k+1} = -J_F^{-1}(x^k) F(x^k)      (the Newton direction)

Multidimensional mean value lemma:

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

Combining, with x = x^k + alpha^k dx^{k+1} and y = x^k:

  ||F(x^k + alpha^k dx^{k+1})|| <= ||F(x^k) + alpha^k J_F(x^k) dx^{k+1}||
                                   + (A/2) (alpha^k)^2 ||J_F^{-1}(x^k) F(x^k)||^2

Since J_F(x^k) dx^{k+1} = -F(x^k), combining terms and moving scalars out of norms gives

  ||F(x^{k+1})|| <= (1 - alpha^k) ||F(x^k)|| + (A/2)(alpha^k)^2 ||J_F^{-1}(x^k) F(x^k)||^2

Using the Jacobian bound ||J_F^{-1}(x^k)|| <= beta and splitting the norm:

  ||F(x^{k+1})|| <= (1 - alpha^k) ||F(x^k)|| + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)||^2

which yields a quadratic in the damping coefficient alpha^k.

Newton Method with Limiting: Damped Newton, Theorem Proof Cont.

Simplifying the quadratic from the previous slide:

  ||F(x^{k+1})|| <= [ 1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| ] ||F(x^k)||

Two cases:

  1) (beta^2 A / 2) ||F(x^k)|| < 1/2.  Pick alpha^k = 1 (standard Newton); then

       1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| < 1/2

  2) (beta^2 A / 2) ||F(x^k)|| >= 1/2.  Pick alpha^k = 1 / (beta^2 A ||F(x^k)||) <= 1; then

       1 - alpha^k + (alpha^k)^2 (beta^2 A / 2) ||F(x^k)|| = 1 - 1 / (2 beta^2 A ||F(x^k)||) < 1

Newton Method with Limiting: Damped Newton, Theorem Proof Cont. II

Combining the results from the previous slide:

  ||F(x^{k+1})|| <= gamma^k ||F(x^k)||,   gamma^k < 1

This is not good enough: we need a gamma independent of k. The result does imply ||F(x^{k+1})|| <= ||F(x^0)||, but that is not yet a convergence theorem.

For the case (beta^2 A / 2) ||F(x^k)|| >= 1/2, since the iterates never increase ||F||,

  gamma^k = 1 - 1 / (2 beta^2 A ||F(x^k)||) <= 1 - 1 / (2 beta^2 A ||F(x^0)||) < 1

Note the proof technique:
  First, show that the iterates do not increase ||F||.
  Second, use the non-increasing fact to prove convergence.

Newton Method with Limiting: Damped Newton, Nested Iteration

(Same nested algorithm as above; the proof guarantees acceptable alpha^k exist.)

There are many approaches to finding alpha.

Newton Method with Limiting: Damped Newton, Singular Jacobian Problem

[Figure: an f(x) whose local minimum lies above zero; the damped iterates slide into the minimum, where df/dx = 0.]

Damped Newton methods push iterates to local minimums, and therefore find the points where the Jacobian is singular.

Continuation Schemes: Source or Load Stepping, Basic Concepts

Newton converges given a close initial guess, so:
  Generate a sequence of problems.
  Make sure the previous problem generates a guess for the next problem.

Heat-conducting bar example:
  1. Start with the heat off; T = 0 is a very close initial guess.
  2. Increase the heat slightly; T = 0 is still a good initial guess.
  3. Increase the heat again, starting from the previous solution.

Continuation Schemes: Basic Concepts, General Setting

Solve F(x(lambda), lambda) = 0, where:
  a) F(x(0), 0) = 0 is easy to solve: starts the continuation.
  b) F(x(1), 1) = F(x): ends the continuation.
  c) x(lambda) is sufficiently smooth: hard to ensure!

[Figure: x(lambda) versus lambda; a curve that jumps discontinuously is disallowed.]

Continuation Schemes: Basic Concepts, Template Algorithm

  Solve F(x(0), 0) = 0, set x(prev) = x(0)
  dlambda = 0.01, lambda = dlambda
  While lambda < 1 {
      x^0(lambda) = x(prev)                            (initial guess)
      Try to solve F(x(lambda), lambda) = 0 with Newton
      If Newton converged
          x(prev) = x(lambda), lambda = lambda + dlambda, dlambda = 2 dlambda
      Else
          dlambda = dlambda / 2, lambda = lambda(prev) + dlambda
  }
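A hedged Python sketch of the template loop above; newton() is assumed to return a (solution, converged_flag) pair, and the underflow guard is my addition.

    def continuation(F, newton, x_start):
        x_prev, lam, dlam = x_start, 0.0, 0.01
        while lam < 1.0:
            lam = min(lam + dlam, 1.0)
            x, ok = newton(lambda x: F(x, lam), x0=x_prev)
            if ok:
                x_prev, dlam = x, 2.0 * dlam       # accept, grow the step
            else:
                lam, dlam = lam - dlam, dlam / 2   # retreat, shrink the step
                if dlam < 1e-12:
                    raise RuntimeError("continuation step underflow")
        return x_prev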

Continuation Schemes: Basic Concepts, Source/Load Stepping Examples

Source stepping (diode circuit with source Vs and resistor R):

  f(v(lambda), lambda) = i_diode(v) + (v - lambda Vs) / R = 0

  df(v, lambda)/dv = d i_diode(v)/dv + 1/R      Not lambda-dependent!

Load stepping (struts with load f_l):

  f_x(x, y) = 0
  f_y(x, y) + lambda f_l = 0

Source/load stepping does not alter the Jacobian.

Continuation Schemes: Jacobian Altering Scheme, Description

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

Observations:
  lambda = 0:  F(x(0), 0) = x(0) = 0, and dF(x(0), 0)/dx = I.
    The problem is easy to solve and the Jacobian is definitely nonsingular.
  lambda = 1:  F(x(1), 1) = F(x(1)), and dF(x(1), 1)/dx = dF(x(1))/dx.
    Back to the original problem and original Jacobian.

Continuation Schemes: Jacobian Altering Scheme, Basic Algorithm

Same template algorithm as before, except the initial guess for each step is improved:

  x^0(lambda) = x(prev) + ?

Continuation Schemes: Jacobian Altering Scheme, Initial Guess for Each Step

[Figure: the solution curve x(lambda); using x^0(lambda + dlambda) = x(lambda) leaves an initial-guess error equal to the distance the curve moves over dlambda.]

Continuation Schemes: Jacobian Altering Scheme, Update Improvement

Expanding along the solution curve:

  0 = F(x(lambda + dlambda), lambda + dlambda)
    = F(x(lambda), lambda)
      + [dF(x(lambda), lambda)/dx] (x(lambda + dlambda) - x(lambda))
      + [dF(x(lambda), lambda)/dlambda] dlambda   (approximately)

Since F(x(lambda), lambda) = 0, a better guess for the next step's Newton is

  x^0(lambda + dlambda) = x(lambda)
      - [dF(x(lambda), lambda)/dx]^{-1} [dF(x(lambda), lambda)/dlambda] dlambda

where dF/dx is available from the last step's Newton. Moreover, if

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

then

  dF(x, lambda)/dlambda = F(x) - x(lambda)

which is easily computed.

[Figure: the improved guess x^0(lambda + dlambda) follows the tangent of the solution curve rather than staying at x(lambda).]

Continuation Schemes: Jacobian Altering Scheme, Still Can Have Problems

[Figure: a folded solution curve x(lambda); pure lambda stepping must switch from increasing to decreasing lambda around a fold, and then back to increasing lambda, while arc-length steps follow the curve through the folds.]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps?

Parametrize by arc length along the curve instead of by lambda, and solve for lambda too. Each step must satisfy

  F(x, lambda) = 0
  ||x - x(prev)||_2^2 + (lambda - lambda(prev))^2 = (arc)^2

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps by Newton

Apply Newton to the augmented system; the iteration matrix is bordered:

  [ dF(x^k, lambda^k)/dx           dF(x^k, lambda^k)/dlambda ] [ x^{k+1} - x^k           ]
  [ 2 (x^k - x(prev))^T            2 (lambda^k - lambda(prev)) ] [ lambda^{k+1} - lambda^k ]

    = - [ F(x^k, lambda^k)
          ||x^k - x(prev)||_2^2 + (lambda^k - lambda(prev))^2 - (arc)^2 ]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Turning Point

[Figure: a fold (turning point) of x(lambda).] What happens there? The upper-left block dF/dx of the bordered matrix is singular, yet the bordered matrix as a whole can remain nonsingular, so arc-length Newton can step through the fold.

Summary
  Damped Newton schemes
    Globally convergent if the Jacobian is nonsingular
    Difficulty with singular Jacobians
  Introduce continuation schemes
    Problem with source/load stepping
    More general continuation scheme
  Improving efficiency
    Better first guess for each continuation step
  Arc-length continuation

Introduction to Simulation - Lecture 11

Newton-Method Case Study: Simulating an Image Smoother
Jacob White

Thanks to Deepak Ramaswamy, Andrew Lumsdaine, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Image segmentation example
    Large nonlinear system of equations
    Formulation? Continuation? Linear solver?
  Newton iterative methods
    Accuracy theorem
    Matrix-free idea
  Gershgorin circle theorem
    Lends insight on iterative method convergence
  Arc-length continuation

Simple Smoother

[Figure: an input image passes through the nonlinear smoother and emerges as a smoothed output image.]

Nonlinear Smoother: Circuit Diagram

[Figure: a grid of nodes, one per pixel, with nonlinear resistors connecting neighboring nodes; each resistor has an i-v constitutive relation.]

[Figure: the nonlinear resistor constitutive equation i(v), plotted for several values of the parameter beta.]

Questions
  What equation formulation? Node-branch or nodal?
  What Newton method? Standard, damped, or continuation?
    What kind of continuation?
  What linear solver? Sparse Gaussian elimination or Krylov?
    Will Krylov converge rapidly?
  How will the formulation and Newton choices interact?

Newton-Iterative Method: Basic Algorithm, Nested Iteration

  x^0 = initial guess, k = 0
  Repeat {
      Compute F(x^k), J_F(x^k)
      Solve J_F(x^k) dx^{k+1} = -F(x^k) for dx^{k+1}   (using GCR)
      x^{k+1} = x^k + dx^{k+1}
      k = k + 1
  } Until ||dx^{k+1}||, ||F(x^{k+1})|| small enough

How accurately should we solve with GCR?

Newton-Iterative Method: Basic Algorithm, Solve Accuracy Required

After l steps of GCR, the computed Newton delta satisfies

  J_F(x^k) dx^{k+1,l} = -F(x^k) + r^{k,l}

where r^{k,l} is the GCR residual.

If
  a)  ||J_F^{-1}(x^k)|| <= beta              (inverse is bounded)
  b)  ||J_F(x) - J_F(y)|| <= A ||x - y||     (derivative is Lipschitz continuous)
  c)  ||r^{k,l}|| <= C ||F(x^k)||^2          (more accurate near convergence)
then the Newton-iterative method converges quadratically.

Newton-Iterative Method: Basic Algorithm, Convergence Proof

By definition of the Newton-iterative method, the approximate Newton direction is

  x^{k+1} - x^k = -J_F^{-1}(x^k)(F(x^k) - r^{k,l})

Multidimensional mean value lemma:

  ||F(x) - F(y) - J_F(y)(x - y)|| <= (A/2) ||x - y||^2

Combining:

  ||F(x^{k+1}) - F(x^k) - J_F(x^k)(x^{k+1} - x^k)||
      <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2

Cancelling the Jacobian against its inverse (J_F(x^k)(x^{k+1} - x^k) = -F(x^k) + r^{k,l}):

  ||F(x^{k+1}) - r^{k,l}|| <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2

Combining terms with the triangle inequality:

  ||F(x^{k+1})|| <= (A/2) ||J_F^{-1}(x^k)(F(x^k) - r^{k,l})||^2 + ||r^{k,l}||

Using the Jacobian bound and the triangle inequality again:

  ||F(x^{k+1})|| <= (beta^2 A / 2)(||F(x^k)|| + ||r^{k,l}||)^2 + ||r^{k,l}||
Newton-Iterative Method: Basic Algorithm, Convergence Proof Cont.

Using the bound ||r^{k,l}|| <= C ||F(x^k)||^2 on the iterative solver error:

  ||F(x^{k+1})|| <= (beta^2 A / 2)(||F(x^k)|| + C ||F(x^k)||^2)^2 + C ||F(x^k)||^2

and combining terms yields

  ||F(x^{k+1})|| <= [ (beta^2 A / 2)(1 + C ||F(x^k)||)^2 + C ] ||F(x^k)||^2

The bracketed factor is easily bounded near the solution, so convergence is quadratic.

Newton-Iterative Method: Matrix-Free Idea

Consider applying GCR to the Newton iterate equation

  J_F(x^k) dx^{k+1} = -F(x^k)

At each iteration GCR forms a matrix-vector product, which can be approximated by a finite difference of F:

  J_F(x^k) p^l = [ F(x^k + eps p^l) - F(x^k) ] / eps   (approximately)

It is possible to use Newton-GCR without ever forming a Jacobian! One needs to select a good eps.
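A hedged Python sketch of the matrix-free Jacobian-vector product; the step-size rule shown is a common heuristic, not the course's prescription.

    import numpy as np

    def jacvec(F, x, p, eps_base=1e-7):
        """Approximate J_F(x) p by a forward difference of F."""
        eps = eps_base * max(1.0, np.linalg.norm(x)) / max(np.linalg.norm(p), 1e-30)
        return (F(x + eps * p) - F(x)) / eps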

Gershgorin Circle Theorem: Theorem Statement

Given an N x N matrix M with entries m_{i,j}: for each eigenvalue lambda of M there exists an i, 1 <= i <= N, such that

  |lambda - m_{i,i}| <= sum_{j != i} |m_{i,j}|

We say that the eigenvalues are contained in the union of the Gershgorin circles.

Gershgorin Circle Theorem: Picture

[Figure: the complex plane (Re lambda, Im lambda) with the i-th circle centered at m_{i,i} and of radius sum_{j != i} |m_{i,j}|; the eigenvalues lie in the union of all the disks.]

Gershgorin Circle Theorem: Examples

Grounded resistor line, nodal matrix (nodal equation form): each node also has a resistor to ground, so the diagonal exceeds the off-diagonal sums:

  M = [ 2.1  -1
        -1   2.1  -1
                ...
                   -1   2.1 ]     (N x N)

Every disk is centered at 2.1 with radius at most 2, so all eigenvalues lie in [0.1, 4.1]: bounded away from zero.

Resistor line without grounding, nodal matrix:

  M = [ 2  -1
        -1  2  -1
              ...
                 -1  2 ]

Here the disks are centered at 2 with radius at most 2, giving only [0, 4]: the bound no longer keeps the eigenvalues away from zero.
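A small Python sketch computing the Gershgorin disk centers and radii for the grounded resistor-line matrix above, and checking that every eigenvalue lands inside some disk; the function names are mine.

    import numpy as np

    def gershgorin(M):
        centers = np.diag(M)
        radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
        return centers, radii

    N = 5
    M = 2.1 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # grounded line
    c, r = gershgorin(M)
    lam = np.linalg.eigvals(M)
    # every eigenvalue is inside at least one disk
    print(all(np.min(np.abs(l - c) - r) <= 1e-12 for l in lam))   # True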


Continuation Schemes: Basic Concepts, General Setting (Review)

Solve F(x(lambda), lambda) = 0, where:
  a) F(x(0), 0) = 0 is easy to solve: starts the continuation.
  b) F(x(1), 1) = F(x): ends the continuation.
  c) x(lambda) is sufficiently smooth: hard to ensure!

[Figure: a discontinuous x(lambda) is disallowed.]

Continuation Schemes: Basic Concepts, Template Algorithm (Review)

  Solve F(x(0), 0) = 0, set x(prev) = x(0)
  dlambda = 0.01, lambda = dlambda
  While lambda < 1 {
      x^0(lambda) = x(prev)
      Try to solve F(x(lambda), lambda) = 0 with Newton
      If Newton converged
          x(prev) = x(lambda), lambda = lambda + dlambda, dlambda = 2 dlambda
      Else
          dlambda = dlambda / 2, lambda = lambda(prev) + dlambda
  }

Continuation Schemes: Jacobian Altering Scheme, Description (Review)

  F(x(lambda), lambda) = lambda F(x(lambda)) + (1 - lambda) x(lambda)

  lambda = 0:  F(x(0), 0) = x(0) = 0, dF/dx = I: easy to solve, Jacobian definitely nonsingular.
  lambda = 1:  F(x(1), 1) = F(x(1)), dF/dx = dF/dx: back to the original problem and original Jacobian.

The basic algorithm is the same template, with the improved initial guess x^0(lambda) = x(lambda(prev)) + ? for each step.

Continuation Schemes: Jacobian Altering Scheme, Still Can Have Problems (Review)

[Figure: a folded x(lambda) curve; pure lambda stepping must switch from increasing to decreasing lambda and back, while arc-length steps follow the curve.]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Steps (Review)

Solve for x and lambda together, constraining the step length along the curve:

  F(x, lambda) = 0
  ||x - x(prev)||_2^2 + (lambda - lambda(prev))^2 = (arc)^2

Arc-length steps by Newton use the bordered system

  [ dF(x^k, lambda^k)/dx        dF(x^k, lambda^k)/dlambda   ] [ x^{k+1} - x^k           ]
  [ 2 (x^k - x(prev))^T         2 (lambda^k - lambda(prev)) ] [ lambda^{k+1} - lambda^k ]

    = - [ F(x^k, lambda^k)
          ||x^k - x(prev)||_2^2 + (lambda^k - lambda(prev))^2 - (arc)^2 ]

Continuation Schemes: Jacobian Altering Scheme, Arc-length Turning Point (Review)

[Figure: at a fold of x(lambda) the upper-left block dF/dx of the bordered matrix is singular, but the bordered matrix as a whole can remain nonsingular.]

Summary
  Image segmentation example
    Large nonlinear system of equations
    Examined issues in selecting numerical methods
  Newton iterative methods
    Do not need to solve the iteration equations exactly
  Gershgorin circle theorem
    Sometimes gives useful bounds on eigenvalues
  Arc-length continuation

Introduction to Simulation - Lecture 12

Methods for Ordinary Differential Equations
Jacob White

Thanks to Deepak Ramaswamy, Jaime Peraire, Michal Rewienski, and Karen Veroy

Outline
  Initial value problem examples
    Signal propagation (circuits with capacitors)
    Space frame dynamics (struts and masses)
    Chemical reaction dynamics
  Investigate the simple finite-difference methods
    Forward-Euler, Backward-Euler, Trap Rule
    Look at the approximations and algorithms
    Examine properties experimentally
  Analyze convergence for Forward-Euler
Application Problems: Signal Transmission in an Integrated Circuit

[Figure: a signal wire runs from one logic gate to another over a ground plane; the wire has resistance, and the wire and ground plane form a capacitor.]

Metal wires carry signals from gate to gate. How long is the signal delayed?

Circuit model: constructing the model
  Cut the wire into sections.
  Model wire resistance with resistors.
  Model wire-plane capacitance with capacitors.

Application Problems: Oscillations in a Space Frame

[Figure: a space frame under load.] What is the oscillation amplitude?

Simplified structure (example simplified for illustration): struts bolted together, anchored to ground, carrying a load. Modeling with struts, joints and point masses:
  Replace the metal beams with struts.
  Replace the cargo with a point mass.

Application Problems: Chemical Reaction Dynamics

[Figure: a crucible containing a reagent and "strange green stuff".]
How fast is product produced? Does it explode?

Application Problems: Signal Transmission in an Integrated Circuit, A 2x2 Example

[Figure: two-node RC circuit; C1 and R1 from node v1 to ground, R2 between v1 and v2, C2 and R3 from node v2 to ground.]

Constitutive equations:

  i_c = C dv_c/dt,   i_R = v_R / R

Conservation laws:

  i_C1 + i_R1 + i_R2 = 0
  i_C2 + i_R3 - i_R2 = 0

Nodal equations yield a 2x2 system:

  [ C1  0  ] [ dv1/dt ]      [ 1/R1 + 1/R2     -1/R2        ] [ v1 ]
  [ 0   C2 ] [ dv2/dt ]  = - [ -1/R2           1/R3 + 1/R2  ] [ v2 ]

Let C1 = C2 = 1, R1 = R3 = 10, R2 = 1. Then

  dx/dt = [ -1.1   1.0
             1.0  -1.1 ] x

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ -0.1    0   ] [ 1   1 ]^{-1}
      [ 1  -1 ] [  0    -2.1  ] [ 1  -1 ]

An Aside on Eigenanalysis

Consider an ODE:  dx(t)/dt = A x(t),  x(0) = x0.

Eigendecomposition:

  A = E Lambda E^{-1},   Lambda = diag(lambda_1, ..., lambda_n),   E = [E1 E2 ... En]

Change of variables:  E y(t) = x(t),  so y(t) = E^{-1} x(t).

Substituting:  d(E y(t))/dt = A E y(t),  E y(0) = x0.

Multiplying by E^{-1}:  dy(t)/dt = E^{-1} A E y(t) = Lambda y(t)

An Aside on Eigenanalysis, Continued

The equations decouple:

  dy_i(t)/dt = lambda_i y_i(t)   so   y_i(t) = e^{lambda_i t} y_i(0)

Steps for solving dx(t)/dt = A x(t), x(0) = x0:
  1) Determine E, Lambda
  2) Compute y(0) = E^{-1} x0
  3) Compute y(t) = diag(e^{lambda_1 t}, ..., e^{lambda_n t}) y(0)
  4) x(t) = E y(t)
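A short Python check of the four-step recipe above on the 2x2 circuit example (A and the initial condition are from the slides; the variable names are mine).

    import numpy as np

    A = np.array([[-1.1, 1.0], [1.0, -1.1]])
    x0 = np.array([1.0, 0.0])                  # v1(0) = 1, v2(0) = 0
    lam, E = np.linalg.eig(A)                  # step 1: eigenpairs
    y0 = np.linalg.solve(E, x0)                # step 2: y(0) = E^{-1} x0

    def x_of_t(t):
        y = np.exp(lam * t) * y0               # step 3: decoupled exponentials
        return E @ y                           # step 4: back to x coordinates

    print(x_of_t(1.0))   # fast mode mostly gone, slow mode still decaying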

Application Problems: Signal Transmission, A 2x2 Example Cont.

[Figure: v1(t) and v2(t) from v1(0) = 1, v2(0) = 0.]

Notice the two-time-scale behavior:
  v1 and v2 come together quickly (fast eigenmode).
  v1 and v2 decay to zero slowly (slow eigenmode).

Application Problems: Struts, Joints and Point Mass Example, A 2x2 Example

[Figure: a strut of rest length y0 carrying a point mass M; displacement u, so y = y0 + u; strut force fs and inertial force fm act on the mass.]

Constitutive equations:

  fm = M d^2u/dt^2,   fs = E Ac (y - y0)/y0 = (E Ac / y0) u

Conservation law:  fs + fm = 0.

Defining v as the velocity (du/dt) yields a 2x2 system:

  [ M  0 ] [ dv/dt ]   [ 0   -E Ac / y0 ] [ v ]
  [ 0  1 ] [ du/dt ] = [ 1       0      ] [ u ]

Let M = 1 and E Ac / y0 = 1:

  dx/dt = [ 0   -1.0
            1.0   0  ] x

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ i    0 ] [ 1   1 ]^{-1}
      [ -i  i ] [ 0   -i ] [ -i  i ]

[Figure: v(t) and u(t) from v(0) = 1, u(0) = 0, oscillating between -1 and 1 over t in [0, 15].]

Note the system has imaginary eigenvalues:
  Persistent oscillation.
  The velocity v peaks when the displacement u is zero.

Application Problems: Chemical Reaction Example, A 2x2 Example

Let the amount of reactant be R and the temperature be T:

  dT/dt = -T + R
  dR/dt = -R + 4T

More reactant causes the temperature to rise; higher temperature increases heat dissipation, causing the temperature to fall. Higher temperature raises the reaction rate; increased reactant interferes with the reaction and slows the rate.

In matrix form:

  d/dt [ T ]    [ -1   1 ] [ T ]
       [ R ]  = [  4  -1 ] [ R ]

Eigenvalues and eigenvectors:

  A = [ 1   1 ] [ 1    0 ] [ 1   1 ]^{-1}
      [ 2  -2 ] [ 0   -3 ] [ 2  -2 ]

[Figure: T(t) and R(t) from T(0) = 1, R(0) = 0, growing toward 12 over t in [0, 2.5].]

Note the system has a positive eigenvalue: solutions grow exponentially with time.

Finite Difference Methods: Basic Concepts

First: discretize time into t1, t2, ..., tL, with tL = T and timestep dt.

Second: represent x(t) using values at the tl, i.e. discrete approximations x^1, x^2, ..., x^L with x^l approximating x(tl).

[Figure: exact solution curve with approximate values x^1, ..., x^4 marked at t1, t2, t3, ...]

Third: approximate the derivative d x(tl)/dt using the x^l's, for example

  d x(tl)/dt = (x^l - x^{l-1}) / dt   or   (x^{l+1} - x^l) / dt   (approximately)

Finite Difference Methods: Basic Concepts, Forward Euler Approximation

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the exact slope d x(tl)/dt at the left endpoint.]

  (x(tl+1) - x(tl)) / dt = d x(tl)/dt = A x(tl)   (approximately)

or

  x(tl+1) = x(tl) + dt A x(tl)   (approximately)

Finite Difference Methods: Basic Concepts, Forward Euler Algorithm

  x(t1) approximated by x^1 = x(0) + dt A x(0)
  x(t2) approximated by x^2 = x^1 + dt A x^1
  ...
  x(tL) approximated by x^L = x^{L-1} + dt A x^{L-1}
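A minimal Python sketch of the Forward-Euler algorithm above for dx/dt = A x, run here on the reaction-example matrix; the function names are mine.

    import numpy as np

    def forward_euler(A, x0, dt, T):
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = x + dt * (A @ x)        # explicit update: no equation solve
        return x

    A = np.array([[-1.0, 1.0], [4.0, -1.0]])   # T, R reaction example
    print(forward_euler(A, [1.0, 0.0], dt=0.1, T=2.5))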

Finite Difference Methods: Basic Concepts, Backward Euler Approximation

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the exact slope d x(tl+1)/dt at the right endpoint.]

  (x(tl+1) - x(tl)) / dt = d x(tl+1)/dt = A x(tl+1)   (approximately)

or

  x(tl+1) = x(tl) + dt A x(tl+1)   (approximately)

Finite Difference Methods: Basic Concepts, Backward Euler Algorithm

Each step requires an equation solve (e.g., with Gaussian elimination):

  x(t1) approximated by x^1:  [I - dt A] x^1 = x(0)
  x(t2) approximated by x^2 = [I - dt A]^{-1} x^1
  ...
  x(tL) approximated by x^L = [I - dt A]^{-1} x^{L-1}
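A matching Python sketch of Backward Euler for the linear system dx/dt = A x; since A and dt are constant, the matrix I - dt*A could be factored once, but it is solved each step here for clarity.

    import numpy as np

    def backward_euler(A, x0, dt, T):
        n = len(x0)
        M = np.eye(n) - dt * np.asarray(A)     # I - dt*A
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = np.linalg.solve(M, x)          # implicit update
        return x

    A = np.array([[-1.1, 1.0], [1.0, -1.1]])   # two-time-scale circuit
    print(backward_euler(A, [1.0, 0.0], dt=0.1, T=3.0))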

Finite Difference Methods: Basic Concepts, Trapezoidal Rule

[Figure: the slope (x(tl+1) - x(tl))/dt compared with the average of the exact slopes at the two endpoints.]

  (x(tl+1) - x(tl)) / dt = (1/2) [ d x(tl+1)/dt + d x(tl)/dt ]
                         = (1/2) [ A x(tl+1) + A x(tl) ]   (approximately)

or

  x(tl+1) = x(tl) + (dt/2) A (x(tl+1) + x(tl))   (approximately)

Finite Difference Methods: Basic Concepts, Trapezoidal Rule Algorithm

Each step requires an equation solve:

  x(t1) approximated by x^1:  [I - (dt/2) A] x^1 = [I + (dt/2) A] x(0)
  x(t2) approximated by x^2 = [I - (dt/2) A]^{-1} [I + (dt/2) A] x^1
  ...
  x(tL) approximated by x^L = [I - (dt/2) A]^{-1} [I + (dt/2) A] x^{L-1}
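And a matching Python sketch of the Trapezoidal Rule update for dx/dt = A x.

    import numpy as np

    def trap_rule(A, x0, dt, T):
        n = len(x0)
        A = np.asarray(A)
        Mm = np.eye(n) - 0.5 * dt * A          # I - (dt/2) A
        Mp = np.eye(n) + 0.5 * dt * A          # I + (dt/2) A
        x = np.asarray(x0, float)
        for _ in range(int(round(T / dt))):
            x = np.linalg.solve(Mm, Mp @ x)
        return x

    print(trap_rule([[0.0, -1.0], [1.0, 0.0]], [1.0, 0.0], dt=0.1, T=15.0))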

Finite Difference Methods: Basic Concepts, Numerical Integration View

  d x(t)/dt = A x(t)   implies   x(tl+1) = x(tl) + integral from tl to tl+1 of A x(tau) dtau

The three methods are three approximations to the integral:

  dt A x(tl)                           (FE: left boxcar)
  dt A x(tl+1)                         (BE: right boxcar)
  (dt/2) (A x(tl) + A x(tl+1))         (Trap: trapezoid)

Finite Difference Methods: Basic Concepts, Summary

Trap Rule, Forward-Euler, and Backward-Euler are all one-step methods: x^l is computed using only x^{l-1}, not x^{l-2}, x^{l-3}, etc.
  Forward-Euler is simplest: no equation solution, an explicit method; boxcar approximation to the integral.
  Backward-Euler is more expensive: an equation solution each step, an implicit method.
  Trapezoidal Rule might be more accurate: an equation solution each step, an implicit method; trapezoidal approximation to the integral.

Finite Difference Methods: Numerical Experiments, Unstable Reaction

[Figure: dt = 0.1 solutions of the unstable reaction; exact solution versus Backward-Euler, Trap Rule, and Forward-Euler.]

FE and BE results have larger errors than the Trap Rule, and the errors grow with time.

Finite Difference Methods: Numerical Experiments, Unstable Reaction Error Plots

[Figure: Backward-Euler and Forward-Euler errors grow to a few tenths in magnitude; the Trap Rule error stays on a 10^-3 scale.]

All methods have errors which grow exponentially with time.

Finite Difference Methods: Numerical Experiments, Unstable Reaction Convergence

[Figure: log-log plot of maximum error versus timestep for Backward-Euler, Trap Rule, and Forward-Euler.]

For FE and BE, Error is proportional to dt; for Trap, Error is proportional to (dt)^2.

Finite Difference Methods: Numerical Experiments, Oscillating Strut and Mass

[Figure: with dt = 0.1, over t in [0, 30] the Forward-Euler solution grows, the Backward-Euler solution decays, and the Trap Rule preserves the oscillation amplitude.]

Why does the FE result grow, the BE result decay, and the Trap Rule preserve oscillations?

Finite Difference Methods: Numerical Experiments, Two-Timescale RC Circuit

[Figure: Backward-Euler computed solution using a small dt through the fast transient and a large dt for the slow decay.]

With Backward-Euler it is easy to use small timesteps for the fast dynamics and then switch to large timesteps for the slow decay.

[Figure: the Forward-Euler computed solution is accurate for small timesteps but goes unstable when the timestep is enlarged.]

Finite Difference Methods: Numerical Experiments, Summary

Convergence:
  Did the computed solution approach the exact solution?
  Why did the Trap Rule approach faster than BE or FE?
Energy preservation:
  Why did BE produce a decaying oscillation?
  Why did FE produce a growing oscillation?
  Why did the Trap Rule maintain the oscillation amplitude?
Two-time-constant (stiff) problems:
  Why did FE go unstable when the timestep increased?

We will focus on convergence today.

Finite Difference Methods: Convergence Analysis, Convergence Definition

Definition: A finite-difference method for solving initial value problems on [0, T] is convergent if, given any A and any initial condition,

  max over l in [0, T/dt] of ||x^l - x(l dt)|| -> 0  as dt -> 0

[Figure: x^l computed with dt and with dt/2, both approaching x_exact.]

Finite Difference Methods: Convergence Analysis, Order-p Convergence

Definition: A finite-difference method for solving initial value problems on [0, T] is order-p convergent if, given any A and any initial condition,

  max over l in [0, T/dt] of ||x^l - x(l dt)|| <= C (dt)^p

for all dt less than a given dt0.

Forward- and Backward-Euler are order 1 convergent; the Trapezoidal Rule is order 2 convergent.

Finite Difference Methods: Convergence Analysis, Two Conditions for Convergence

1) Local condition: the one-step errors are small (consistency). Typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability). All one-step methods are stable in this sense.

Finite Difference Methods: Convergence Analysis, Consistency Definition

Definition: A one-step method for solving initial value problems on an interval [0, T] is consistent if for any A and any initial condition

  (1/dt) ||x^1 - x(dt)|| -> 0  as dt -> 0

Finite Difference Methods: Convergence Analysis, Consistency for Forward Euler

Forward-Euler definition:  x^1 = x(0) + dt A x(0).

Expanding x in dt about zero yields, for some tau in [0, dt],

  x(dt) = x(0) + dt dx(0)/dt + ((dt)^2 / 2) d^2 x(tau)/dt^2

Noting that dx(0)/dt = A x(0) and subtracting,

  (1/dt) ||x^1 - x(dt)|| = (dt/2) ||d^2 x(tau)/dt^2||

which proves consistency if the derivatives of x are bounded.

Finite Difference Methods: Convergence Analysis, Convergence Analysis for Forward Euler

Forward-Euler definition:  x^{l+1} = x^l + dt A x^l.

Expanding in dt about l dt yields

  x((l+1) dt) = x(l dt) + dt A x(l dt) + e^l

where e^l is the "one-step" error, bounded by

  ||e^l|| <= C (dt)^2,   C = 0.5 max over tau in [0, T] of ||d^2 x(tau)/dt^2||

Subtracting the two equations and defining the "global" error E^l = x^l - x(l dt):

  E^{l+1} = (I + dt A) E^l - e^l

Taking norms and using the bound on e^l:

  ||E^{l+1}|| <= ||I + dt A|| ||E^l|| + C (dt)^2 <= (1 + dt ||A||) ||E^l|| + C (dt)^2

Finite Difference Methods: Convergence Analysis, A Helpful Bound on Difference Equations

Lemma: if  u^{l+1} <= (1 + eps) u^l + b,  with u^0 = 0 and eps > 0,  then

  u^l <= (e^{l eps} - 1)(b / eps)

To prove, first write u^l as a power series and sum:

  u^l <= sum over j = 0 to l-1 of (1 + eps)^j b = [ (1 - (1 + eps)^l) / (1 - (1 + eps)) ] b

To finish, note (1 + eps) <= e^{eps}, so (1 + eps)^l <= e^{l eps}:

  u^l <= [ ((1 + eps)^l - 1) / eps ] b <= (e^{l eps} - 1)(b / eps)

Mapping the global error equation to the lemma:

  ||E^{l+1}|| <= (1 + dt ||A||) ||E^l|| + C (dt)^2
                      eps                    b


Finite Difference Methods: Convergence Analysis, Back to Forward Euler

Applying the lemma and cancelling terms:

  ||E^l|| <= (e^{l dt ||A||} - 1) C (dt)^2 / (dt ||A||)

Finally, noting that l dt <= T:

  max over l in [0, L] of ||E^l|| <= (e^{||A|| T} - 1) (C / ||A||) dt

Observations about the forward-Euler analysis:
  Forward-Euler is order 1 convergent.
  The bound grows exponentially with the time interval.
  C is related to the solution's second derivative.
  The bound grows exponentially fast with ||A||.

Finite Difference Methods: Convergence Analysis, Exact and Forward-Euler Plots for the Unstable Reaction

[Figure: R_exact, R_FE, T_exact, T_FE versus time over [0, 2.5]; the FE curves drift away from the exact ones.]

Forward-Euler errors appear to grow with time.

[Figure: R_exact - R_FE and T_exact - T_FE versus time, growing to about 1.2 over [0, 2.5].]

Note the error grows exponentially with time, as the bound predicts.

Finite Difference Methods: Convergence Analysis, Exact and Forward-Euler Plots for the Circuit

[Figure: v1_exact, v1_FE, v2_exact, v2_FE versus time over [0, 3.5]; the FE curves track the exact ones.]

Forward-Euler errors don't always grow with time.

[Figure: v1_exact - v1_FE and v2_exact - v2_FE versus time, peaking near 0.03 in magnitude and then decaying.]

The error does not always grow exponentially with time! The bound is conservative.

Summary
  Initial value problem examples
    Signal propagation (two time scales)
    Space frame dynamics (oscillator)
    Chemical reaction dynamics (unstable system)
  Looked at the simple finite-difference methods
    Forward-Euler, Backward-Euler, Trap Rule
    Looked at the approximations and algorithms
    Experiments generated many questions
  Analyzed convergence for Forward-Euler
  Many more questions to answer, some next time

Introduction to Simulation - Lecture 13

Convergence of Multistep Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline
  Small-timestep issues for multistep methods
    Local truncation error
    Selecting coefficients
    Nonconverging methods
    Stability + consistency implies convergence
  Next time: investigate large-timestep issues
    Absolute stability for two-time-scale examples
    Oscillators

Multistep Methods: Basic Equations, General Notation

Nonlinear differential equation:

  d x(t)/dt = f(x(t), u(t))

k-step multistep approach:

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

with multistep coefficients alpha_j, beta_j, and the solution represented at the discrete points x^{l-k}, ..., x^{l-1}, x^l on the time discretization t_{l-k}, ..., t_{l-1}, t_l.

Multistep Methods: Basic Equations, Common Algorithms

Forward-Euler approximation:  x(t_l) = x(t_{l-1}) + dt f(x(t_{l-1}), u(t_{l-1}))
FE discrete equation:  x^l - x^{l-1} = dt f(x^{l-1}, u(t_{l-1}))
FE coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 0, beta_1 = 1

BE discrete equation:  x^l - x^{l-1} = dt f(x^l, u(t_l))
BE coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 1, beta_1 = 0

Trap discrete equation:  x^l - x^{l-1} = (dt/2) [ f(x^l, u(t_l)) + f(x^{l-1}, u(t_{l-1})) ]
Trap coefficients:  k = 1, alpha_0 = 1, alpha_1 = -1, beta_0 = 1/2, beta_1 = 1/2

Multistep Methods: Basic Equations, Definitions and Observations

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

1) If beta_0 != 0 the multistep method is implicit.
2) A k-step multistep method uses k previous x's and f's.
3) A normalization is needed; alpha_0 = 1 is common.
4) A k-step method has 2k + 1 free coefficients.

How does one pick good coefficients? Want the highest accuracy.
Multistep Methods: Simplified Problem for Analysis

Scalar ODE:  dv(t)/dt = lambda v(t),  v(0) = v0,  lambda a complex number

Why such a simple test problem?
  Nonlinear analysis has many unrevealing subtleties.
  Scalar is equivalent to vector for multistep methods: discretizing d x(t)/dt = A x(t),

    sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j A x^{l-j}

  and changing variables with E y(t) = x(t) gives

    sum_{j=0}^{k} alpha_j y^{l-j} = dt sum_{j=0}^{k} beta_j E^{-1} A E y^{l-j}

  which decouples into one scalar recursion per eigenvalue lambda_i of A.

Scalar multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} = dt lambda sum_{j=0}^{k} beta_j v^{l-j}

Must consider ALL lambda:

[Figure: the complex lambda plane; Re(lambda) < 0 gives decaying solutions, Re(lambda) > 0 gives growing solutions, and Im(lambda) != 0 gives oscillations.]

Multistep Methods: Convergence Analysis, Convergence Definition

Definition: A multistep method for solving initial value problems on [0, T] is convergent if, given any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| -> 0  as dt -> 0

[Figure: v^l computed with dt and with dt/2, both approaching v_exact.]

Multistep Methods: Convergence Analysis, Order-p Convergence

Definition: A multistep method for solving initial value problems on [0, T] is order-p convergent if, given any lambda and any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| <= C (dt)^p

for all dt less than a given dt0.

Forward- and Backward-Euler are order 1 convergent; the Trapezoidal Rule is order 2 convergent.

Multistep Methods: Convergence Analysis, Reaction Equation Example

[Figure: log-log maximum error versus timestep for Backward-Euler, Trap Rule, and Forward-Euler.]

For FE and BE, Error is proportional to dt; for Trap, Error is proportional to (dt)^2.

Multistep Methods: Convergence Analysis, Two Conditions for Convergence

1) Local condition: one-step errors are small (consistency). Typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability).
   All one-step (k = 1) methods are stable in this sense.
   Multistep (k > 1) methods require careful analysis.

Multistep Methods: Convergence Analysis, Global Error Equation

Multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} - dt lambda sum_{j=0}^{k} beta_j v^{l-j} = 0

The exact solution Almost satisfies the multistep formula:

  sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = e^l

where e^l is the local truncation error (LTE).

Defining the global error E^l = v(t_l) - v^l and subtracting, a difference equation relates the LTE to the global error:

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Forward-Euler: Convergence Analysis, Consistency

Forward-Euler definition:  v^{l+1} - v^l - dt lambda v^l = 0

Substituting the exact v(t), with dv/dt = lambda v, and expanding:

  v((l+1) dt) - v(l dt) - dt dv(l dt)/dt = ((dt)^2 / 2) d^2 v(tau)/dt^2 = e^l

for some tau in [l dt, (l+1) dt], where e^l is the LTE, bounded by

  |e^l| <= C (dt)^2,   C = 0.5 max over tau in [0, T] of |d^2 v(tau)/dt^2|

Forward-Euler: Convergence Analysis, Global Error Equation

Forward-Euler definition:  v^{l+1} = v^l + dt lambda v^l
Using the LTE definition:  v((l+1) dt) = v(l dt) + dt lambda v(l dt) + e^l

Subtracting yields the global error equation:

  E^{l+1} = (1 + dt lambda) E^l + e^l

Using magnitudes and the bound on e^l:

  |E^{l+1}| <= |1 + dt lambda| |E^l| + |e^l| <= (1 + dt |lambda|) |E^l| + C (dt)^2

Forward-Euler: Convergence Analysis, A Helpful Bound on Difference Equations

Lemma: if  u^{l+1} <= (1 + eps) u^l + b,  with u^0 = 0 and eps > 0,  then

  u^l <= (e^{l eps} - 1)(b / eps)

To prove, write u^l as a power series and sum:

  u^l <= sum over j = 0 to l-1 of (1 + eps)^j b = [ (1 - (1 + eps)^l) / (1 - (1 + eps)) ] b

then note (1 + eps) <= e^{eps}, so (1 + eps)^l <= e^{l eps} and

  u^l <= [ ((1 + eps)^l - 1) / eps ] b <= (e^{l eps} - 1)(b / eps)

Forward-Euler: Convergence Analysis, Back to the Convergence Analysis

Applying the lemma (eps = dt |lambda|, b = C (dt)^2) and cancelling terms:

  |E^l| <= (e^{l dt |lambda|} - 1) C dt / |lambda|

Finally, noting that l dt <= T:

  max over l in [0, L] of |E^l| <= (e^{|lambda| T} - 1) (C / |lambda|) dt

Observations: forward-Euler is order 1 convergent; the bound grows exponentially with the time interval; C is related to the exact solution's second derivative.
Forward-Euler: Convergence Analysis, Exact and Forward-Euler Plots for the Unstable Reaction

[Figure: R_exact, R_FE, T_exact, T_FE versus time; the FE errors appear to grow with time.]

[Figure: R_exact - R_FE and T_exact - T_FE versus time; the error grows exponentially with time, as the bound predicts.]

Forward-Euler: Convergence Analysis, Exact and Forward-Euler Plots for the Circuit

[Figure: v1 and v2, exact and FE, versus time; the errors peak near 0.03 and then decay.]

Forward-Euler errors don't always grow with time, and the error does not always grow exponentially: the bound is conservative.

Multistep Methods: Making LTE Small, Exactness Constraints

Local truncation error:

  e^l = sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j})

Here v can be any solution of dv/dt = lambda v, so consider making the LTE zero on polynomial test functions. If v(t) = t^p, then dv(t)/dt = p t^{p-1}, and (taking l = k, so t_{l-j} = (k - j) dt):

  e^k = sum_{j=0}^{k} alpha_j ((k-j) dt)^p - dt sum_{j=0}^{k} beta_j p ((k-j) dt)^{p-1}

Multistep Methods: Making LTE Small, Exactness Constraints Cont.

Factoring out (dt)^p:

  e^k = (dt)^p [ sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} ]

so if

  sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} = 0

then e^k = 0 for v(t) = t^p. Since any smooth v(t) has a locally accurate Taylor series in t: if the constraint holds for all p <= p0, then

  e^l = sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = C (dt)^{p0+1}

Multistep Methods: Making LTE Small, Exactness Constraint k=2 Example

Exactness constraints:  sum_{j=0}^{k} alpha_j (k-j)^p - sum_{j=0}^{k} beta_j p (k-j)^{p-1} = 0

For k = 2 this yields a 5x6 system of equations for the coefficients (alpha_0, alpha_1, alpha_2, beta_0, beta_1, beta_2):

  p = 0:    alpha_0 + alpha_1 + alpha_2                              = 0
  p = 1:   2 alpha_0 + alpha_1 - beta_0 - beta_1 - beta_2            = 0
  p = 2:   4 alpha_0 + alpha_1 - 4 beta_0 - 2 beta_1                 = 0
  p = 3:   8 alpha_0 + alpha_1 - 12 beta_0 - 3 beta_1                = 0
  p = 4:  16 alpha_0 + alpha_1 - 32 beta_0 - 4 beta_1                = 0

Note the p = 0 constraint forces sum_j alpha_j = 0, always.

Checking the common methods against the constraints:

  Forward-Euler:  alpha = (1, -1, 0), beta = (0, 1, 0).
    Satisfies p = 0 and p = 1 but not p = 2:  LTE = C (dt)^2.
  Backward-Euler:  alpha = (1, -1, 0), beta = (1, 0, 0).
    Satisfies p = 0 and p = 1 but not p = 2:  LTE = C (dt)^2.
  Trap Rule:  alpha = (1, -1, 0), beta = (0.5, 0.5, 0).
    Satisfies p = 0, 1, and 2 but not p = 3:  LTE = C (dt)^3.

Multistep Methods: Making LTE Small, Generating k=2 Methods

First introduce a normalization, for example alpha_0 = 1, and move its column to the right-hand side; the remaining 5x5 system determines the other coefficients.

Solve for the 2-step method with the lowest LTE:

  alpha_0 = 1, alpha_1 = 0, alpha_2 = -1, beta_0 = 1/3, beta_1 = 4/3, beta_2 = 1/3

Satisfies all five exactness constraints:  LTE = C (dt)^5.

Solve for the 2-step explicit method (beta_0 = 0) with the lowest LTE:

  alpha_0 = 1, alpha_1 = 4, alpha_2 = -5, beta_0 = 0, beta_1 = 4, beta_2 = 2

Can only satisfy four exactness constraints:  LTE = C (dt)^4.
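A hedged numpy sketch reproducing the two coefficient solves above: with alpha_0 = 1 fixed, the exactness constraints become a linear system in (alpha_1, alpha_2, beta_0, beta_1, beta_2); for the explicit method, beta_0 is forced to zero and the p = 4 constraint is dropped. The function name and layout are mine.

    import numpy as np

    def exactness_coeffs(explicit=False, k=2):
        A, rhs = [], []
        for p in range(5):
            # sum_j alpha_j (k-j)^p - p * sum_j beta_j (k-j)^(p-1) = 0
            a = [(k - j) ** p for j in range(k + 1)]
            b = [-p * (k - j) ** (p - 1) if p > 0 else 0.0 for j in range(k + 1)]
            A.append(a[1:] + b)          # unknowns: alpha1, alpha2, beta0..beta2
            rhs.append(-a[0])            # move alpha0 = 1 to the right-hand side
        A, rhs = np.array(A, float), np.array(rhs, float)
        if explicit:
            A, rhs = A[:4], rhs[:4]                  # drop the p = 4 constraint
            sol = np.linalg.solve(np.delete(A, 2, axis=1), rhs)
            return np.insert(sol, 2, 0.0)            # beta0 = 0
        return np.linalg.solve(A, rhs)

    print(exactness_coeffs())               # [0, -1, 1/3, 4/3, 1/3]: best implicit
    print(exactness_coeffs(explicit=True))  # [4, -5, 0, 4, 2]: best explicit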

Multistep Methods: Making LTE Small, LTE Plots for FE, Trap, and Best Explicit (BESTE)

[Figure: log-log LTE versus timestep for dv/dt = lambda v; BESTE sits below Trap, which sits below FE.]

The best explicit method has the highest one-step accuracy.

Multistep Methods: Making LTE Small, Global Error for FE, Trap, and BESTE

[Figure: log-log maximum global error versus timestep on t in [0, 1]; FE and Trap converge. Where's BESTE?]

Multistep Methods: Making LTE Small, Global Error for FE, Trap, and BESTE Cont.

[Figure (worrisome): on a log scale reaching 10^200, BESTE's maximum global error increases as the timestep decreases.]

The best explicit method has the lowest one-step error, but its global error increases as the timestep decreases.

Multistep Methods: Stability of the Method, Difference Equation

Why did the best 2-step explicit method fail to converge?

Multistep method difference equation:

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

We made the LTE so small; how come the global error is so large?

An Aside on Solving Difference Equations

Consider a general kth-order difference equation

  a0 x^l + a1 x^{l-1} + ... + ak x^{l-k} = u^l

which must have k initial conditions, x^0 = x_0, ..., x^{k-1} = x_{k-1}, as is clear when the equation is written in update form:

  x^l = (1/a0)(-a1 x^{l-1} - ... - ak x^{l-k} + u^l)

The most important difference-equation result: x can be related to u by a convolution sum

  x^l = sum_{j=0}^{l} h^{l-j} u^j

If  a0 z^k + a1 z^{k-1} + ... + ak = 0  has distinct roots zeta_1, zeta_2, ..., zeta_k, then

  h^l = sum_{j=1}^{k} gamma_j (zeta_j)^l

To see where h comes from, consider the simple case x^l = zeta x^{l-1} + u^l with x^0 = 0:

  x^1 = u^1,   x^2 = zeta u^1 + u^2,   ...,   x^l = sum_{j=0}^{l} zeta^{l-j} u^j

Three important observations:
  If |zeta_i| < 1 for all i, then |x^l| <= C max_j |u^j|, where C does not depend on l.
  If |zeta_i| > 1 for any i, then there exists a bounded u such that x^l grows without bound.
  If |zeta_i| <= 1 for all i, and each zeta_i with |zeta_i| = 1 is distinct, then |x^l| <= C l max_j |u^j|.

Multistep Methods: Stability of the Method, Stability Definition and Theorem

Multistep method difference equation:

  (alpha_0 - dt lambda beta_0) E^l + ... + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Definition: A multistep method is stable if and only if, as dt -> 0,

  max over l in [0, T/dt] of |E^l| <= C (T/dt) max over l in [0, T/dt] of |e^l|

for any e^l.

Theorem: A multistep method is stable if and only if the roots of

  alpha_0 z^k + alpha_1 z^{k-1} + ... + alpha_k = 0

are either less than one in magnitude, or equal to one in magnitude and distinct.
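A small Python sketch of the root condition in the theorem above, checked on Forward-Euler and on the "best explicit" 2-step method; the function name and tolerance are mine.

    import numpy as np

    def is_stable(alphas, tol=1e-9):
        """alphas = [a0, a1, ..., ak]; test the multistep root condition."""
        roots = np.roots(alphas)
        for i, z in enumerate(roots):
            if abs(z) > 1.0 + tol:
                return False                        # growing mode
            if abs(abs(z) - 1.0) <= tol:            # on the unit circle:
                others = np.delete(roots, i)        # the root must be simple
                if np.any(np.abs(others - z) <= tol):
                    return False
        return True

    print(is_stable([1.0, -1.0]))        # FE: root z = 1, simple -> True
    print(is_stable([1.0, 4.0, -5.0]))   # BESTE: roots 1 and -5 -> False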

Multistep Methods: Stability of the Method, Stability Theorem Proof

Given the multistep method difference equation

  (alpha_0 - dt lambda beta_0) E^l + ... + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

if the roots of  sum_{j=0}^{k} (alpha_j - dt lambda beta_j) z^{k-j} = 0  are either less than one in magnitude, or equal to one in magnitude but distinct, then from the aside on difference equations

  |E^l| <= C l max_l |e^l|

from which stability easily follows.

[Figure: the unit circle in the complex plane with the roots of sum_j alpha_j z^{k-j} = 0 marked; as dt -> 0, the roots of sum_j (alpha_j - dt lambda beta_j) z^{k-j} = 0 move inward to match the dt = 0 polynomial's roots.]

Multistep Methods: Stability of the Method, The BESTE Method

Best explicit 2-step method:

  alpha_0 = 1, alpha_1 = 4, alpha_2 = -5, beta_0 = 0, beta_1 = 4, beta_2 = 2

The roots of z^2 + 4z - 5 = 0 are z = 1 and z = -5.

[Figure: the root at -5 lies far outside the unit circle.]

The method is wildly unstable!

Multistep Methods: Stability of the Method, Dahlquist's First Stability Barrier

For a stable, explicit k-step multistep method, the maximum number of exactness constraints that can be satisfied is less than or equal to k (note there are 2k coefficients). For implicit methods, the number of constraints that can be satisfied is either k + 2 if k is even or k + 1 if k is odd.

Multistep Methods: Convergence Analysis, Conditions for Convergence: Stability and Consistency

1) Local condition: one-step errors are small (consistency).
   Exactness constraints up to p0 (p0 must be > 0):

     max over l in [0, T/dt] of |e^l| <= C1 (dt)^{p0+1}   for dt < dt0

2) Global condition: one-step errors grow slowly (stability).
   The roots of sum_{j=0}^{k} alpha_j z^{k-j} = 0 are inside the unit circle, or simple on it:

     max over l in [0, T/dt] of |E^l| <= C2 (T/dt) max over l in [0, T/dt] of |e^l|

Convergence result (combining the two bounds, since (T/dt)(dt)^{p0+1} = T (dt)^{p0}):

     max over l in [0, T/dt] of |E^l| <= C T (dt)^{p0}

Summary
  Small-timestep issues for multistep methods
    Local truncation error and exactness
    Difference equation stability
    Stability + consistency implies convergence
  Next time
    Absolute stability for two-time-scale examples
    Oscillators
    Maybe Runge-Kutta schemes

Introduction to Simulation - Lecture 14

Multistep Methods II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline
  Small-timestep issues for multistep methods
    Reminder about LTE minimization
    A nonconverging example
    Stability + consistency implies convergence
  Investigate large-timestep issues
    Absolute stability for two-time-scale examples
    Oscillators
Multistep Methods: Basic Equations, General Notation (Review)

Nonlinear differential equation:  d x(t)/dt = f(x(t), u(t))

k-step multistep approach:

  sum_{j=0}^{k} alpha_j x^{l-j} = dt sum_{j=0}^{k} beta_j f(x^{l-j}, u(t_{l-j}))

with multistep coefficients alpha_j, beta_j and the solution at the discrete points t_{l-k}, ..., t_l.

Multistep Methods: Simplified Problem for Analysis (Review)

Scalar ODE:  dv(t)/dt = lambda v(t),  v(0) = v0,  lambda a complex number

Scalar multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} = dt lambda sum_{j=0}^{k} beta_j v^{l-j}

Must consider ALL lambda: decaying solutions (Re lambda < 0), growing solutions (Re lambda > 0), and oscillations (Im lambda != 0).

Multistep Methods: Convergence Analysis (Review)

Definition: A multistep method for solving initial value problems on [0, T] is convergent if, given any initial condition,

  max over l in [0, T/dt] of |v^l - v(l dt)| -> 0  as dt -> 0

Two conditions for convergence:
1) Local condition: one-step errors are small (consistency), typically verified using Taylor series.
2) Global condition: the single-step errors do not grow too quickly (stability).
   Multistep (k > 1) methods require careful analysis.

Multistep Methods: Convergence Analysis, Global Error Equation (Review)

Multistep formula:

  sum_{j=0}^{k} alpha_j v^{l-j} - dt lambda sum_{j=0}^{k} beta_j v^{l-j} = 0

The exact solution Almost satisfies the multistep formula:

  sum_{j=0}^{k} alpha_j v(t_{l-j}) - dt sum_{j=0}^{k} beta_j (d/dt) v(t_{l-j}) = e^l

with e^l the local truncation error (LTE). With the global error E^l = v(t_l) - v^l, the difference equation relating LTE to global error is

  (alpha_0 - dt lambda beta_0) E^l + (alpha_1 - dt lambda beta_1) E^{l-1} + ...
      + (alpha_k - dt lambda beta_k) E^{l-k} = e^l

Multistep Methods: Making LTE Small, Exactness Constraints (Review)

The LTE cannot be made zero for every solution of dv/dt = lambda v, so use polynomial test functions. If v(t) = t^p, then dv(t)/dt = p t^{p-1} and

  e^k = sum_{j=0}^{k} alpha_j ((k-j) dt)^p - dt sum_{j=0}^{k} beta_j p ((k-j) dt)^{p-1}

Multistep Methods

Making LTE Small


Exactness Constraint k=2
Example

k
k
p
p 1
Exactness Constraints: j ( k j ) j p ( k j ) = 0
j =0
j =0

For k=2, yields a 5x6 system of equations for Coefficients


p=0
p=1
p=2
p=3
p=4

1
2

8
16

1
1
1
1
1

1 0
0 1
0 4
0 12
0 32

0
1
2
3
4

0
0 0
1

1
0
2
0 = 0
0
0
0

1
0 0
2

Note
i = 0
Always

Multistep Methods

Making LTE Small

Exactness Constraint k=2 Example: Generating Methods

First introduce a normalization, for example $\alpha_0 = 1$, which turns the homogeneous 5x6 system into a solvable square system.

Solve for the 2-step method with lowest LTE:

$$\alpha_0 = 1,\ \alpha_1 = 0,\ \alpha_2 = -1, \qquad \beta_0 = 1/3,\ \beta_1 = 4/3,\ \beta_2 = 1/3$$

This satisfies all five exactness constraints: LTE $= C (\Delta t)^5$.

Solve for the 2-step explicit method ($\beta_0 = 0$) with lowest LTE:

$$\alpha_0 = 1,\ \alpha_1 = 4,\ \alpha_2 = -5, \qquad \beta_0 = 0,\ \beta_1 = 4,\ \beta_2 = 2$$

An explicit 2-step method can only satisfy four exactness constraints: LTE $= C (\Delta t)^4$.
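To make the constraint machinery concrete, here is a minimal numpy sketch (not from the lecture; names are illustrative) that assembles the k=2 exactness rows for the convention $\sum_j \alpha_j x^{l-j} = \Delta t \sum_j \beta_j f^{l-j}$ and recovers both methods above:

import numpy as np

def exactness_row(p, k=2):
    # Row multiplying (a0, a1, a2, b0, b1, b2) in the p-th constraint.
    nodes = np.array([k - j for j in range(k + 1)], dtype=float)  # (2, 1, 0)
    arow = nodes ** p                       # alpha terms (k-j)^p, with 0^0 = 1
    if p == 0:
        brow = np.zeros(k + 1)              # derivative of a constant is zero
    else:
        brow = -p * nodes ** (p - 1)        # beta terms  -p (k-j)^(p-1)
    return np.concatenate([arow, brow])

# Best implicit 2-step method: normalization a0 = 1 plus constraints p = 0..4.
rows, rhs = [np.array([1., 0, 0, 0, 0, 0])], [1.0]
for p in range(5):
    rows.append(exactness_row(p)); rhs.append(0.0)
print(np.linalg.solve(np.array(rows), np.array(rhs)))  # 1, 0, -1, 1/3, 4/3, 1/3

# Best explicit 2-step method: also force b0 = 0; only p = 0..3 can be met.
rows = [np.array([1., 0, 0, 0, 0, 0]), np.array([0., 0, 0, 1, 0, 0])]
rhs = [1.0, 0.0]
for p in range(4):
    rows.append(exactness_row(p)); rhs.append(0.0)
print(np.linalg.solve(np.array(rows), np.array(rhs)))  # 1, 4, -5, 0, 4, 2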

Making LTE Small

Multistep Methods

LTE Plots for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: log-log plot of LTE versus timestep for $\frac{d}{dt} v(t) = \lambda v(t)$. FE has the shallowest slope, Trap is steeper, and BESTE is steepest: the Best Explicit Method has the highest one-step accuracy.]

Making LTE Small

Multistep Methods

Global Error for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: log-log plot of maximum global error versus timestep for $\frac{d}{dt} v(t) = \lambda v(t)$, $t \in [0, 1]$. FE and Trap converge at their expected rates. Where's BESTE? It is off the top of the plot.]

Multistep Methods

Making LTE Small

Global Error for FE, Trap, and the Best Explicit (BESTE) Methods

[Figure: the same plot with the vertical axis extended to $10^{200}$. FE and Trap converge, but the BESTE error grows explosively as the timestep shrinks.]

Worrisome: the Best Explicit Method has the lowest one-step error, but its global error increases as the timestep decreases.

Multistep Methods

Stability of the Method

Difference Equation

Why did the best 2-step explicit method fail to converge?

Multistep Method Difference Equation:

$$(\alpha_0 - \Delta t \lambda \beta_0) E^l + (\alpha_1 - \Delta t \lambda \beta_1) E^{l-1} + \cdots + (\alpha_k - \Delta t \lambda \beta_k) E^{l-k} = e^l$$

with global error $E^l = v(l\Delta t) - v^l$ and LTE $e^l$. We made the LTE so small; how come the global error is so large?

Multistep Methods

Stability of the Method

Stability Definition

Definition: A multistep method is stable if, as $\Delta t \to 0$,

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C(T) \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|$$

where the constant $C(T)$ is interval dependent but independent of $\Delta t$. Stability means: the global error is bounded by a constant times the sum of the LTEs.

Aside on Difference Equations

Convolution Sum and Root Relation

Given a kth-order difference equation with zero initial conditions,

$$a_0 x^l + \cdots + a_k x^{l-k} = u^l, \qquad x^0 = \cdots = x^{k-1} = 0,$$

$x^l$ can be related to the input $u$ by the convolution sum

$$x^l = \sum_{j=0}^{l} h^{l-j} u^j, \qquad h^l = \sum_{q=1}^{Q} \sum_{m=0}^{M_q - 1} \gamma_{q,m}\, l^m (\zeta_q)^l,$$

where the $\zeta_q$ are the roots, with multiplicity $M_q$, of

$$a_0 z^k + a_1 z^{k-1} + \cdots + a_k = 0.$$

Aside on Difference Equations

Bounding the Convolution Sum Terms

$$x^l = \sum_{q=1}^{Q} \sum_{m=0}^{M_q - 1} \underbrace{\sum_{j=0}^{l} \gamma_{q,m} (l-j)^m (\zeta_q)^{l-j} u^j}_{R_{q,m}}$$

If $|\zeta_q| < 1$, then $|R_{q,m}| \le C \max_j |u^j|$, independent of $l$ (bounds distinct and repeated roots inside the circle).

If $\zeta_q$ is a simple (distinct) root with $|\zeta_q| \le 1 + \epsilon$, then $|R_{q,0}| \le C\, l\, (1+\epsilon)^l \max_j |u^j|$.

Multistep Methods

Stability of the Method

Stability Theorem

Theorem: A multistep method is stable if and only if the roots of

$$\alpha_0 z^k + \alpha_1 z^{k-1} + \cdots + \alpha_k = 0$$

either:

1. have magnitude less than one, or

2. have magnitude equal to one and are distinct.
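The root condition is easy to check numerically; the following is a minimal sketch (an assumed helper, not part of the lecture) using numpy.roots on the $\alpha$ polynomial:

import numpy as np

def is_stable(alphas, tol=1e-9):
    # alphas = (a0, ..., ak); roots of a0 z^k + a1 z^(k-1) + ... + ak = 0
    z = np.roots(alphas)
    inside = np.abs(z) < 1 - tol
    on_circle = np.abs(np.abs(z) - 1) <= tol
    on = z[on_circle]
    # roots on the unit circle must be simple (distinct)
    distinct = all(np.sum(np.abs(on - r) <= 1e-6) == 1 for r in on)
    return bool(np.all(inside | on_circle) and distinct)

print(is_stable([1, 0, -1]))   # best implicit alphas: roots +1, -1 -> stable
print(is_stable([1, 4, -5]))   # BESTE alphas: roots +1, -5 -> unstable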

Multistep Methods

Stability of the Method

Stability Theorem Proof

Given the Multistep Method Difference Equation

$$(\alpha_0 - \Delta t \lambda \beta_0) E^l + (\alpha_1 - \Delta t \lambda \beta_1) E^{l-1} + \cdots + (\alpha_k - \Delta t \lambda \beta_k) E^{l-k} = e^l,$$

if, as $\Delta t \to 0$, the roots of $(\alpha_0 - \Delta t \lambda \beta_0) z^k + \cdots + (\alpha_k - \Delta t \lambda \beta_k) = 0$ are either less than one in magnitude or distinct and bounded in magnitude by $1 + \epsilon \Delta t$, $\epsilon > 0$, then from the aside on difference equations,

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C \frac{T}{\Delta t} e^{\epsilon T} \max_{l \in [0,\, T/\Delta t]} |e^l| = C(T) \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|,$$

using $(1 + \epsilon \Delta t)^l \le e^{\epsilon l \Delta t} \le e^{\epsilon T}$.

Multistep Methods

Stability of the Method

Stability Theorem Picture

[Figure: roots of $\sum_{j=0}^{k} (\alpha_j - \Delta t \lambda \beta_j) z^{k-j} = 0$ for a nonzero $\Delta t$, plotted against the unit circle. As $\Delta t \to 0$, the roots move inward to match the roots of the polynomial $\sum_{j=0}^{k} \alpha_j z^{k-j} = 0$.]

Stability of the Method

Multistep Methods

The BESTE Method

Best explicit 2-step method:

$$\alpha_0 = 1,\ \alpha_1 = 4,\ \alpha_2 = -5, \qquad \beta_0 = 0,\ \beta_1 = 4,\ \beta_2 = 2$$

The roots of $z^2 + 4z - 5 = 0$ are $z = 1$ and $z = -5$.

[Figure: the two roots in the complex plane; $z = -5$ lies far outside the unit circle.]

The method is wildly unstable!

Multistep Methods

Stability of the Method

Dahlquist's First Stability Barrier

For a stable, explicit k-step multistep method, the maximum number of
exactness constraints that can be satisfied is less than or equal to k
(note an explicit k-step method has 2k free coefficients after
normalization). For implicit methods, the number of constraints that can
be satisfied is either k+2 if k is even or k+1 if k is odd.

Convergence Analysis

Multistep Methods

Conditions for Convergence: Stability and Consistency

1) Local Condition: One-step errors are small (consistency).

Exactness constraints up to $p_0$ ($p_0$ must be $> 0$) imply

$$\max_{l \in [0,\, T/\Delta t]} |e^l| \le C_1 (\Delta t)^{p_0 + 1} \quad \text{for } \Delta t < \Delta t_0$$

2) Global Condition: One-step errors grow slowly (stability).

Roots of $\sum_{j=0}^{k} \alpha_j z^{k-j} = 0$ inside the unit circle, or on the unit circle and distinct, imply

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C_2 \frac{T}{\Delta t} \max_{l \in [0,\, T/\Delta t]} |e^l|$$

Convergence Result:

$$\max_{l \in [0,\, T/\Delta t]} |E^l| \le C\, T\, (\Delta t)^{p_0}$$

Multistep Methods

Large Timestep Stability

Two Time-Constant Circuit Example

$$\frac{d}{dt} x(t) = A x(t), \qquad \operatorname{eig}(A) = -2.1,\ -0.1$$

[Figure: Backward-Euler computed solution. Small $\Delta t$ resolves the fast initial transient; large $\Delta t$ tracks the slow decay.]

With Backward-Euler it is easy to use small timesteps for the fast dynamics and then switch to large timesteps for the slow decay.
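A minimal numerical experiment makes the contrast with Forward-Euler vivid. This sketch (not from the lecture) assumes a diagonal test matrix with the eigenvalues quoted above and a deliberately large timestep:

import numpy as np

A = np.diag([-2.1, -0.1])        # assumed diagonal test system, eig = -2.1, -0.1
I = np.eye(2)
dt = 1.0                          # "large" timestep: dt*2.1 > 2 breaks FE
x_fe = np.array([1.0, 1.0])
x_be = np.array([1.0, 1.0])
for _ in range(100):
    x_fe = x_fe + dt * (A @ x_fe)               # Forward Euler update
    x_be = np.linalg.solve(I - dt * A, x_be)    # Backward Euler update
print(x_fe)   # fast mode grows geometrically, |1 - 2.1|^100 is enormous
print(x_be)   # both modes decay, as the continuous system does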

Large Timestep Stability

Multistep Methods

FE on the Two Time-Constant Circuit?

[Figure: Forward-Euler computed solution. It tracks the exact decay while the timestep is small, then oscillates with growing amplitude once the timestep is enlarged.]

Forward-Euler is accurate for small timesteps, but goes unstable when the timestep is enlarged.

Large Timestep Stability

Multistep Methods

FE, BE and Trap on the Scalar ODE Problem

Scalar ODE: $\frac{d}{dt} v(t) = \lambda v(t), \quad v(0) = v_0$

Forward-Euler:

$$v^{l+1} = v^l + \Delta t \lambda v^l = (1 + \Delta t \lambda)\, v^l$$

If $|1 + \Delta t \lambda| > 1$, the solution grows even if $\operatorname{Re}(\lambda) < 0$.

Backward-Euler:

$$v^{l+1} = v^l + \Delta t \lambda v^{l+1} \;\Rightarrow\; v^{l+1} = \frac{1}{1 - \Delta t \lambda}\, v^l$$

If $\left| \frac{1}{1 - \Delta t \lambda} \right| < 1$, the solution decays even if $\operatorname{Re}(\lambda) > 0$.

Trap Rule:

$$v^{l+1} = v^l + 0.5 \Delta t \lambda \left( v^{l+1} + v^l \right) \;\Rightarrow\; v^{l+1} = \frac{1 + 0.5 \Delta t \lambda}{1 - 0.5 \Delta t \lambda}\, v^l$$
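The three amplification factors are one-liners to evaluate; a minimal sketch (illustrative values, not from the lecture):

import numpy as np

def growth_factors(dt, lam):
    # one-step amplification |z| = |v^{l+1} / v^l| for FE, BE, and trap rule
    fe = 1 + dt * lam
    be = 1 / (1 - dt * lam)
    tr = (1 + 0.5 * dt * lam) / (1 - 0.5 * dt * lam)
    return abs(fe), abs(be), abs(tr)

for dt in [0.1, 1.0, 10.0]:
    print(dt, growth_factors(dt, -2.1))   # fast circuit mode lambda = -2.1

At dt = 1.0 the FE factor exceeds 1 (growth) while BE and trap stay below 1, matching the pictures that follow.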

Large Timestep Stability

Multistep Methods

FE Large Timestep Region of Absolute Stability

Forward Euler: $z = 1 + \Delta t \lambda$

[Figure: the ODE stability region $\operatorname{Re}(\lambda) < 0$ in the $\lambda$-plane versus the difference-equation stability region $|z| < 1$ in the $z$-plane. The FE region of absolute stability is the disk $|1 + \Delta t \lambda| < 1$, centered at $\Delta t \lambda = -1$ and reaching $\lambda = -2/\Delta t$ on the real axis.]

Large Timestep Stability

Multistep Methods

FE Large Timestep Stability, Circuit Example

Circuit example with $\Delta t = 0.1$, $\lambda = -2.1, -0.1$:

[Figure: both values of $\Delta t \lambda$ land inside the FE region of absolute stability; the difference equation is stable.]

Circuit example with $\Delta t = 1.0$, $\lambda = -2.1, -0.1$:

[Figure: $\Delta t \lambda = -2.1$ lands outside the FE region of absolute stability; the difference equation is unstable even though the ODE is stable.]

Large Timestep Stability

Multistep Methods

BE Large Timestep Region of Absolute Stability

Backward Euler: $z = \frac{1}{1 - \Delta t \lambda}$

[Figure: the BE region of absolute stability in the $\lambda$-plane is everything outside the disk of radius $1/\Delta t$ centered at $\lambda = 1/\Delta t$; in particular it contains the entire left half plane.]

BE Large Timestep Stability, Circuit Example

Circuit example with $\Delta t = 0.1$, $\lambda = -2.1, -0.1$:

[Figure: both modes map inside the unit circle; the difference equation is stable.]

Circuit example with $\Delta t = 1.0$, $\lambda = -2.1, -0.1$:

[Figure: both modes still map inside the unit circle; the difference equation is stable regardless of timestep.]

Large Timestep Stability

Multistep Methods

Stability Definitions

Region of Absolute Stability for a k-step multistep method: the values of $\Delta t \lambda$ for which the roots of

$$\sum_{j=0}^{k} (\alpha_j - \Delta t \lambda \beta_j)\, z^{k-j} = 0$$

are inside the unit circle.

A-stable: A method is A-stable if its region of absolute stability includes the entire left half of the complex plane.

Dahlquist's Second Stability Barrier: There are no A-stable multistep methods of convergence order greater than 2, and the trap rule is the most accurate.

Multistep Methods

Numerical Experiments: Oscillating Strut and Mass

[Figure: displacement versus time with $\Delta t = 0.1$ over $t \in [0, 30]$. The Forward-Euler result grows, the Backward-Euler result decays, and the trap rule preserves the oscillation.]

Why does the FE result grow, the BE result decay, and the trap rule preserve oscillations?

Large Timestep Stability

Multistep Methods

FE Large Timestep Oscillator Example

Forward Euler: $z = 1 + \Delta t \lambda$

[Figure: for an oscillator $\lambda$ is purely imaginary, so $\Delta t \lambda$ lies outside the FE disk of absolute stability for every timestep; the difference equation is unstable.]

Large Timestep Stability

Multistep Methods

BE Large Timestep Oscillator Example

Backward Euler: $z = \frac{1}{1 - \Delta t \lambda}$

[Figure: a purely imaginary $\Delta t \lambda$ maps strictly inside the unit circle, so the computed oscillation decays.]

Large Timestep Stability

Multistep Methods

Trap Large Timestep Oscillator Example

Trap Rule: $z = \frac{1 + 0.5 \Delta t \lambda}{1 - 0.5 \Delta t \lambda}$

[Figure: the trap rule maps the imaginary axis onto the unit circle, so oscillations are preserved for any timestep.]

Multistep Methods

Large Timestep Issues

Two Time-Constant Stable Problem (Circuit)
FE: stability, not accuracy, limited the timestep size.
BE was A-stable; any timestep could be used.
The trap rule is the most accurate A-stable multistep method.
Oscillator Problem
Forward-Euler generated an unstable difference equation regardless of timestep size.
Backward-Euler generated a stable (decaying) difference equation regardless of timestep size.
The trapezoidal rule mapped the imaginary axis onto the unit circle, preserving oscillations.

Summary
Small timestep issues for multistep methods
Local truncation error and exactness.
Difference equation stability.
Stability + consistency implies convergence.
Investigated large timestep issues
Absolute stability for two time-scale examples.
Oscillators.
Didn't talk about
Runge-Kutta schemes, higher order A-stable methods.

Introduction to Simulation - Lecture 15


Methods for Computing Periodic
Steady-State
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Periodic Steady-state problems
Application examples and simple cases

Finite-difference methods
Formulating large matrices

Shooting Methods
State transition function
Sensitivity matrix

Matrix Free Approach

Periodic Steady-State Basics

Basic Definition

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \qquad (x: \text{state},\ u: \text{input})$$

Suppose the system has a periodic input, $u(t+T) = u(t)$.

[Figure: a periodic input waveform over the intervals $[0,T]$, $[T,2T]$, $[2T,3T]$.]

Many systems eventually respond periodically:

$$x(t+T) = x(t) \quad \text{for } t \gg 0$$

Periodic Steady-State Basics

Basic Definition: Interesting Property

If $x$ satisfies a differential equation which has a unique solution for any initial condition,

$$\frac{dx(t)}{dt} = F(x(t)) + u(t),$$

then if $u$ is periodic with period $T$ and $x(t_0 + T) = x(t_0)$ for some $t_0$, it follows that $x(t+T) = x(t)$ for all $t > t_0$.

Periodic Steady-State Basics

Application Example: Swaying Bridge
Periodic Input: Wind
Response: Oscillating Platform
Desired Info: Oscillation Amplitude

Application Example: Communication Integrated Circuit
Periodic Input: Received Signal at 900 MHz
Response: Filtered, demodulated signal
Desired Info: Distortion

Application Example: Automobile Vibration
Periodic Input: Regularly Spaced Road Bumps
Response: Car Shakes
Desired Info: Shake Amplitude

Periodic Steady-State Basics

Simple Example: RLC Filter, Spring+Mass+Dashpot

[Figure: an RLC circuit and a spring-mass-dashpot system driven by a force; both are described by a second-order ODE.]

$$M \frac{d^2 x}{dt^2} + D \frac{dx}{dt} + x = u(t)$$

With $u(t) = 0$ and light damping ($D \ll M$), the response is a slowly decaying oscillation,

$$x(t) \approx K e^{-\frac{D}{2M} t} \cos(\omega t + \phi).$$

[Figure: the oscillation inside its envelope $\pm K e^{-\frac{D}{2M} t}$.]

A lightly damped system oscillates many times before settling to a steady state.

Periodic Steady-State Basics

Computing Steady State: Frequency Domain Approach

Sinusoidally excited linear time-invariant system:

$$\frac{dx(t)}{dt} = A x(t) + e^{i \omega t} \quad (\text{input})$$

The steady-state solution is simple to determine:

$$x(t) = (i \omega I - A)^{-1} e^{i \omega t}$$

but this is not useful for nonlinear or time-varying systems.

Periodic Steady-State Basics

Computing Steady State: Time Integration Method

Time-integrate until steady state is achieved:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \;\Rightarrow\; x^l = x^{l-1} + \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

Need many timepoints for the lightly damped case!

Aside: Reviewing Integration Methods

Solve with Backward-Euler

Nonlinear System:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \qquad x(0) = x_0 \ (\text{initial condition})$$

Backward Euler equation for timestep $l$:

$$x^l - x^{l-1} = \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

How do we solve the backward-Euler equation?

Aside: Reviewing Integration Methods

Implicit Methods: Backward-Euler Example

Forward-Euler:

$$x^1 = x(0) + \Delta t\, f(x(0), u(0)), \quad x^2 = x^1 + \Delta t\, f(x^1, u(t_1)), \quad \ldots, \quad x^L = x^{L-1} + \Delta t\, f(x^{L-1}, u(t_{L-1}))$$

Requires just function evaluations.

Backward-Euler:

$$x^1 = x(0) + \Delta t\, f(x^1, u(t_1)), \quad x^2 = x^1 + \Delta t\, f(x^2, u(t_2)), \quad \ldots, \quad x^L = x^{L-1} + \Delta t\, f(x^L, u(t_L))$$

Requires a nonlinear equation solution at each step. A stepwise nonlinear equation solution is needed whenever $\beta_0 \ne 0$.

Aside: Reviewing Integration Methods

Implicit Methods: Solution with Newton

Rewrite the multistep equation:

$$\alpha_0 x^l - \Delta t \beta_0 f(x^l, u(t_l)) + \underbrace{\sum_{j=1}^{k} \left[ \alpha_j x^{l-j} - \Delta t \beta_j f(x^{l-j}, u(t_{l-j})) \right]}_{b,\ \text{independent of } x^l} = 0$$

Solve with Newton (here $j$ is the Newton iteration index):

$$\underbrace{\left[ \alpha_0 I - \Delta t \beta_0 \frac{\partial f(x^{l,j}, u(t_l))}{\partial x} \right]}_{\text{Jacobian}} \left( x^{l,j+1} - x^{l,j} \right) = -\underbrace{\left[ \alpha_0 x^{l,j} - \Delta t \beta_0 f(x^{l,j}, u(t_l)) + b \right]}_{F(x^{l,j})}$$

Aside: Reviewing Integration Methods

Implicit Methods: Solution with Newton, Cont.

Solution with Newton is very efficient:

[Figure: previous timepoints $t_{l-k}, \ldots, t_{l-1}$ with a polynomial predictor extrapolating to a good initial guess $x^{l,0}$, which Newton refines to the converged solution $x^l$.]

It is easy to generate a good initial guess using polynomial fitting, and since the Jacobian $\alpha_0 I - \Delta t \beta_0 \frac{\partial f}{\partial x} \to \alpha_0 I$ as $\Delta t \to 0$, it becomes easy to factor for small timesteps.

Boundary-Value Problem

Basic Formulation

[Figure: the differential equation solution over one period, closed by the periodicity constraint.]

N differential equations: $\frac{d}{dt} x_i(t) = F_i(x(t))$
N periodicity constraints: $x_i(T) = x_i(0)$

Finite Difference Methods

Boundary-Value Problem: Linear Example Problem

$$\frac{dx(t)}{dt} = A x(t) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler, $\Delta t = T/L$:

$$x^1 = x^0 + \Delta t \left( A x^1 + u(\Delta t) \right), \quad x^2 = x^1 + \Delta t \left( A x^2 + u(2\Delta t) \right), \quad \ldots, \quad x^L = x^{L-1} + \Delta t \left( A x^L + u(L \Delta t) \right)$$

Periodicity implies $x^0 = x^L$.

Finite Difference Methods

Boundary-Value Problem: Linear Example, Matrix Form

Collecting all L backward-Euler equations, with the periodicity condition filling the top-right corner block, gives an NL x NL system:

$$\begin{bmatrix} \frac{1}{\Delta t} I - A & & & -\frac{1}{\Delta t} I \\ -\frac{1}{\Delta t} I & \frac{1}{\Delta t} I - A & & \\ & \ddots & \ddots & \\ & & -\frac{1}{\Delta t} I & \frac{1}{\Delta t} I - A \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^L \end{bmatrix} = \begin{bmatrix} u(\Delta t) \\ u(2\Delta t) \\ \vdots \\ u(L \Delta t) \end{bmatrix}$$

The matrix is almost lower triangular.
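A minimal sketch of assembling and solving this system, assuming the diagonal two time-constant test matrix and a constant input (all names here are illustrative, not from the lecture):

import numpy as np

def periodic_be_matrix(A, L, dt):
    # Builds the "almost lower triangular" backward-Euler matrix above.
    N = A.shape[0]
    I = np.eye(N)
    M = np.zeros((N * L, N * L))
    for l in range(L):
        M[l*N:(l+1)*N, l*N:(l+1)*N] = I / dt - A        # diagonal block
        prev = (l - 1) % L                              # wraps to the corner
        M[l*N:(l+1)*N, prev*N:(prev+1)*N] += -I / dt    # subdiagonal block
    return M

A = np.diag([-2.1, -0.1])           # assumed test system
L, T = 50, 10.0
M = periodic_be_matrix(A, L, T / L)
u = np.ones(2 * L)                  # constant input samples u(l*dt)
x = np.linalg.solve(M, u)           # all L timepoints of the PSS at once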

Finite Difference Methods

Boundary-Value Problem: Nonlinear Problem

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler (with $x^0 = x^L$):

$$H_{FD}\!\left( \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^L \end{bmatrix} \right) = \begin{bmatrix} x^1 - x^L - \Delta t \left( F(x^1) + u(\Delta t) \right) \\ x^2 - x^1 - \Delta t \left( F(x^2) + u(2\Delta t) \right) \\ \vdots \\ x^L - x^{L-1} - \Delta t \left( F(x^L) + u(L \Delta t) \right) \end{bmatrix} = 0$$

Solve using Newton's method.

Boundary-Value Problem

Shooting Method: Basic Definitions

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and assume $x(t)$ is unique given $x(0)$. The D.E. defines a State-Transition Function

$$\Phi(y, t_0, t_1) \equiv x(t_1), \quad \text{where } x(t) \text{ is the D.E. solution given } x(t_0) = y.$$

Boundary-Value Problem

Shooting Method: State-Transition Function Example

$$\frac{dx(t)}{dt} = \lambda x(t) \;\Rightarrow\; \Phi(y, t_0, t_1) = e^{\lambda (t_1 - t_0)}\, y$$
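To preview how the state-transition function is used, here is a minimal shooting sketch for a scalar forced ODE. The forcing, parameter values, and names are illustrative assumptions, not from the lecture:

import numpy as np

# Seek x(0) with x(T) = x(0) for dx/dt = lam*x + cos(2*pi*t/T).
lam, T, L = -0.2, 1.0, 200
dt = T / L

def phi(x0):
    # state-transition function Phi(x0, 0, T) via backward Euler
    x = x0
    for l in range(1, L + 1):
        x = (x + dt * np.cos(2 * np.pi * l * dt / T)) / (1 - dt * lam)
    return x

x0 = 0.0
for _ in range(5):                            # Newton on H(x0) = phi(x0) - x0
    H = phi(x0) - x0
    dphi = (phi(x0 + 1e-6) - phi(x0)) / 1e-6  # sensitivity by perturbation
    x0 = x0 - H / (dphi - 1.0)
print(x0, phi(x0))                            # at the PSS, phi(x0) == x0

Because the problem is linear, Newton converges in essentially one iteration here; the nonlinear case follows the same pattern.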

Shooting Method

Boundary-Value Problem: Abstract Formulation

Solve

$$H(x(0)) = \underbrace{\Phi(x(0), 0, T)}_{x(T)} - x(0) = 0$$

using Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Boundary-Value Problem

Shooting Method: Computing the Newton Iterate

To compute $\Phi(x(0), 0, T)$, integrate $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ on $[0, T]$. What is $\frac{\partial \Phi(x, 0, T)}{\partial x}$? Since

$$\Phi(x(0) + \Delta x(0), 0, T) \approx x(T) + \frac{\partial x(T)}{\partial x(0)} \Delta x(0),$$

it indicates the sensitivity of $x(T)$ to changes in $x(0)$.

Boundary-Value Problem

Shooting Method: Sensitivity Matrix by Perturbation

$$\frac{\partial \Phi(x, 0, T)}{\partial x} = \begin{bmatrix} \frac{\partial x_1(T)}{\partial x_1(0)} & \cdots & \frac{\partial x_1(T)}{\partial x_N(0)} \\ \vdots & & \vdots \\ \frac{\partial x_N(T)}{\partial x_1(0)} & \cdots & \frac{\partial x_N(T)}{\partial x_N(0)} \end{bmatrix}$$

(each column can be estimated by perturbing one entry of $x(0)$ and re-integrating).

Boundary-Value Problem

Shooting Method: Efficient Sensitivity Evaluation

Differentiate the first step of Backward-Euler with respect to $x(0)$:

$$x^1 - x(0) - \Delta t \left( F(x^1) + u(\Delta t) \right) = 0 \;\Rightarrow\; \left[ I - \Delta t \frac{\partial F(x^1)}{\partial x} \right] \frac{\partial x^1}{\partial x(0)} = I$$

Shooting Method

Boundary-Value Problem: Efficient Sensitivity Matrix, Cont.

Applying the same trick on the l-th step:

$$\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right] \frac{\partial x^l}{\partial x(0)} = \frac{\partial x^{l-1}}{\partial x(0)} \;\Rightarrow\; \frac{\partial \Phi(x, 0, T)}{\partial x} = \prod_{l=1}^{L} \left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right]^{-1}$$

Shooting Method

Boundary-Value Problem: Observations on the Sensitivity Matrix

Newton at each timestep uses the same matrices: the factors in the product above are exactly the timestep Newton Jacobians. The formula simplifies in the linear case:

$$\frac{\partial \Phi(x, 0, T)}{\partial x} = (I - \Delta t A)^{-L}$$

Shooting Method

Matrix-Free Approach: Basic Setup

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and

$$H(x(0)) = \Phi(x(0), 0, T) - x(0) = 0.$$

Use Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Shooting Method

Matrix-Free Approach: Matrix-Vector Product

Solve the Newton equation with a Krylov-subspace method:

$$\underbrace{\left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right]}_{A} \underbrace{\left( x^{k+1} - x^k \right)}_{x} = \underbrace{x^k - \Phi(x^k, 0, T)}_{b}$$

Matrix-vector product computation, with $p^j$ the Krylov method search direction:

$$\frac{\Phi(x^k + \epsilon p^j, 0, T) - \Phi(x^k, 0, T)}{\epsilon} - p^j \;\approx\; \left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right] p^j$$
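A minimal sketch of that finite-difference matrix-vector product (the function phi is assumed to be any routine integrating the ODE over [0, T]; names are illustrative):

import numpy as np

def jacvec(phi, x, p, eps=1e-6):
    # (Phi(x + eps*p) - Phi(x)) / eps  approximates  (dPhi/dx) @ p
    return (phi(x + eps * p) - phi(x)) / eps

def shooting_jacobian_times(phi, x, p):
    # action of the shooting Newton Jacobian J_H = dPhi/dx - I on p
    return jacvec(phi, x, p) - p

Each Krylov iteration therefore costs one extra transient integration and never forms the dense sensitivity matrix.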

Shooting Method

Matrix-Free Approach: Convergence for GCR

Example:

$$\frac{dx}{dt} - A x = 0, \qquad \operatorname{eig}(A) \text{ real and negative}$$

Shooting-Newton Jacobian:

$$\frac{\partial \Phi(x, 0, T)}{\partial x} - I = e^{AT} - I = S \begin{bmatrix} e^{\lambda_1 T} - 1 & & \\ & \ddots & \\ & & e^{\lambda_N T} - 1 \end{bmatrix} S^{-1}$$

The many fast modes cluster at $-1$; only a few slow modes are larger than $-1$, so the Jacobian's eigenvalues are clustered and GCR converges quickly.

Summary
Periodic Steady-state problems
Application examples and simple cases

Finite-difference methods
Formulating large matrices

Shooting Methods
State transition function
Sensitivity matrix

Introduction to Simulation - Lecture 16


Methods for Computing Periodic
Steady-State - Part II
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Three Methods so far
Time integration until steady-state achieved
Finite difference methods
Shooting Methods

Shooting Methods
State transition function
Sensitivity matrix
Matrix-Free Approach

Spectral Methods
Galerkin and Collocation Methods

Periodic Steady-State Basics

Basic Definition

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \qquad (x: \text{state},\ u: \text{input})$$

Suppose the system has a periodic input, $u(t+T) = u(t)$. Many systems eventually respond periodically:

$$x(t+T) = x(t) \quad \text{for } t \gg 0$$

Periodic Steady-State Basics

Computing Steady State: Time Integration Method

Time-integrate until steady state is achieved:

$$\frac{dx(t)}{dt} = F(x(t)) + u(t) \;\Rightarrow\; x^l = x^{l-1} + \Delta t \left( F(x^l) + u(l \Delta t) \right)$$

Need many timepoints for the lightly damped case!

Boundary-Value Problem

Basic Formulation

N differential equations: $\frac{d}{dt} x_i(t) = F_i(x(t))$
N periodicity constraints: $x_i(T) = x_i(0)$

Finite Difference Methods

Boundary-Value Problem: Nonlinear Problem

$$\frac{dx(t)}{dt} = F(x(t)) + u(t), \quad t \in [0, T], \qquad x(T) = x(0) \ (\text{periodicity constraint})$$

Discretize with Backward-Euler (with $x^0 = x^L$):

$$H_{FD}\!\left( \begin{bmatrix} x^1 \\ \vdots \\ x^L \end{bmatrix} \right) = \begin{bmatrix} x^1 - x^L - \Delta t \left( F(x^1) + u(\Delta t) \right) \\ x^2 - x^1 - \Delta t \left( F(x^2) + u(2\Delta t) \right) \\ \vdots \\ x^L - x^{L-1} - \Delta t \left( F(x^L) + u(L \Delta t) \right) \end{bmatrix} = 0$$

Solve using Newton's method.

Boundary-Value Problem

Shooting Method: Basic Definitions

Start with $\frac{dx(t)}{dt} = F(x(t)) + u(t)$ and assume $x(t)$ is unique given $x(0)$. The D.E. defines a State-Transition Function

$$\Phi(y, t_0, t_1) \equiv x(t_1), \quad \text{where } x(t) \text{ is the D.E. solution given } x(t_0) = y.$$

Shooting Method

Boundary-Value Problem: Abstract Formulation

Solve $H(x(0)) = \Phi(x(0), 0, T) - x(0) = 0$ using Newton's method:

$$J_H(x) = \frac{\partial \Phi(x, 0, T)}{\partial x} - I, \qquad J_H(x^k)\left( x^{k+1} - x^k \right) = -H(x^k)$$

Boundary-Value Problem

Shooting Method: Computing the Newton Iterate

To compute $\Phi(x(0), 0, T)$, integrate the D.E. on $[0, T]$. The term $\frac{\partial \Phi(x, 0, T)}{\partial x}$ indicates the sensitivity of $x(T)$ to changes in $x(0)$: it is the N x N matrix of derivatives $\frac{\partial x_i(T)}{\partial x_j(0)}$, which can be estimated column by column by perturbation.

Boundary-Value Problem

Shooting Method: Efficient Sensitivity Evaluation

Differentiating the first backward-Euler step with respect to $x(0)$ gives

$$\left[ I - \Delta t \frac{\partial F(x^1)}{\partial x} \right] \frac{\partial x^1}{\partial x(0)} = I,$$

and applying the same trick on the l-th step,

$$\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right] \frac{\partial x^l}{\partial x(0)} = \frac{\partial x^{l-1}}{\partial x(0)} \;\Rightarrow\; \frac{\partial \Phi(x, 0, T)}{\partial x} = \prod_{l=1}^{L} \underbrace{\left[ I - \Delta t \frac{\partial F(x^l)}{\partial x} \right]^{-1}}_{\text{timestep Newton Jacobian}}$$

Newton at each timestep uses these same matrices, and the formula simplifies in the linear case: $\frac{\partial \Phi(x, 0, T)}{\partial x} = (I - \Delta t A)^{-L}$.

Shooting Method

Matrix-Free Approach: Basic Setup and Matrix-Vector Product

Solve the shooting Newton equation

$$\underbrace{\left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right]}_{A} \underbrace{\left( x^{k+1} - x^k \right)}_{x} = \underbrace{x^k - \Phi(x^k, 0, T)}_{b}$$

with a Krylov-subspace method, computing matrix-vector products by finite differences of the state-transition function; with $p^j$ the Krylov method search direction,

$$\frac{\Phi(x^k + \epsilon p^j, 0, T) - \Phi(x^k, 0, T)}{\epsilon} - p^j \;\approx\; \left[ \frac{\partial \Phi(x^k, 0, T)}{\partial x} - I \right] p^j$$

Shooting Method

Matrix-Free Approach: Convergence for GCR

Example: $\frac{dx}{dt} - A x = 0$ with $\operatorname{eig}(A)$ real and negative. The shooting-Newton Jacobian is

$$\frac{\partial \Phi(x, 0, T)}{\partial x} - I = e^{AT} - I = S \begin{bmatrix} e^{\lambda_1 T} - 1 & & \\ & \ddots & \\ & & e^{\lambda_N T} - 1 \end{bmatrix} S^{-1}$$

The many fast modes cluster at $-1$; only a few slow modes are larger than $-1$.

Spectral Methods

Fourier Representation: Truncation Approximation

A periodic function has a Fourier series

$$x(t) = \sum_{l=-\infty}^{\infty} X_l\, e^{i 2\pi l t / T}.$$

Approximate the function with a truncated series:

$$x(t) \approx \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T}$$

Spectral Methods

Fourier Representation: Square Wave Example

[Figure: truncated Fourier series approximations of a square wave for increasing numbers of terms. Copyright 1997 by Alan V. Oppenheim and Alan S. Willsky.]

Spectral Methods

Fourier Representation: An Annoyance for Real Functions

For real $x$, the Fourier coefficients come in complex-conjugate pairs:

$$X_{-l} = X_l^*$$

One can rewrite the series with fewer unknowns:

$$x(t) = \underbrace{X_0}_{\text{real}} + \sum_{l=1}^{L} \left( X_l\, e^{i 2\pi l t / T} + X_l^*\, e^{-i 2\pi l t / T} \right)$$

Spectral Methods

Fourier Representation: Orthogonality

Terms in the Fourier series are orthogonal:

$$\int_0^T e^{i 2\pi l t / T}\, e^{-i 2\pi m t / T}\, dt = 0, \qquad l \ne m$$

This gives a simple formula for computing coefficients:

$$\int_0^T e^{-i 2\pi m t / T}\, x(t)\, dt = \int_0^T e^{-i 2\pi m t / T} \sum_{l=-\infty}^{\infty} X_l\, e^{i 2\pi l t / T}\, dt = T X_m$$
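On uniform samples, that coefficient integral becomes a scaled DFT, which the FFT evaluates directly. A minimal sketch with an assumed test signal:

import numpy as np

T, L = 2.0, 8
n = 2 * L + 1
t = np.arange(n) * T / n
x = np.cos(2 * np.pi * t / T) + 0.25 * np.sin(6 * np.pi * t / T)
X = np.fft.fft(x) / n        # X[m] ~ (1/T) * integral of e^{-i2pi m t/T} x(t) dt
print(X[1], X[-1])           # cos term gives X_1 = X_{-1} = 0.5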

Spectral Methods

Fourier Representation: Advantages

For smooth functions (infinitely continuously differentiable), the Fourier coefficients decay exponentially fast:

$$X_m = \frac{1}{T} \int_0^T e^{-i 2\pi m t / T}\, x(t)\, dt = O(c^m) \text{ as } m \to \infty, \text{ for some } 0 < c < 1$$

The representation automatically satisfies periodicity:

$$x(t+T) = \sum_{l=-L}^{L} X_l\, e^{i 2\pi l (t+T) / T} = \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} = x(t)$$

Spectral Methods

Computing Coefficients: Residual

Plug the representation into the differential equation:

$$R(X, t) = \frac{d}{dt} \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) - u(t)$$

Simplify by differentiating the representation term by term:

$$R(X, t) = \sum_{l=-L}^{L} \frac{i 2\pi l}{T}\, X_l\, e^{i 2\pi l t / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) - u(t)$$

Spectral Methods

Computing Coefficients: Collocation and Galerkin

Collocation: residual = 0 at test points:

$$R(X, t_l) = 0, \qquad l = \{1, \ldots, 2L+1\}$$

Galerkin: residual orthogonal to the Fourier terms:

$$\int_0^T e^{-i 2\pi m t / T}\, R(X, t)\, dt = 0, \qquad m \in \{-L, \ldots, 0, \ldots, L\}$$

Spectral Methods

Computing Coefficients: Galerkin Equation

Forcing the residual orthogonal to the Fourier terms, and using orthogonality on the derivative term, gives, for each $m \in \{-L, \ldots, 0, \ldots, L\}$,

$$i 2\pi m\, X_m - \int_0^T e^{-i 2\pi m t / T}\, F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t / T} \right) dt - \int_0^T e^{-i 2\pi m t / T}\, u(t)\, dt = 0$$

Spectral Methods

Computing Coefficients: Linear Galerkin, F(x) = Ax

In the linear case the $F$ integral also collapses by orthogonality,

$$i 2\pi m\, X_m - T A X_m - U_m = 0, \qquad U_m \equiv \int_0^T e^{-i 2\pi m t / T}\, u(t)\, dt,$$

so the Galerkin system is block diagonal: one decoupled block

$$\left( \frac{i 2\pi m}{T} I - A \right) X_m = \frac{1}{T} U_m$$

for each harmonic $m = -L, \ldots, L$.

Spectral Methods

Computing Coefficients: Collocation Equations

Force the residual to zero at the test times $t_j$, $j = \{1, \ldots, 2L+1\}$:

$$R(X, t_j) = \sum_{l=-L}^{L} \frac{i 2\pi l}{T}\, X_l\, e^{i 2\pi l t_j / T} - F\!\left( \sum_{l=-L}^{L} X_l\, e^{i 2\pi l t_j / T} \right) - u(t_j) = 0$$

Spectral Methods

Computing Coefficients: Discrete Fourier Transform

Evaluating the truncated series at the $2L+1$ timepoints relates time samples to Fourier coefficients through a matrix of complex exponentials:

$$\begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_{2L+1}) \end{bmatrix} = \begin{bmatrix} e^{-i 2\pi L t_1 / T} & \cdots & e^{i 2\pi L t_1 / T} \\ \vdots & & \vdots \\ e^{-i 2\pi L t_{2L+1} / T} & \cdots & e^{i 2\pi L t_{2L+1} / T} \end{bmatrix} \begin{bmatrix} X_{-L} \\ \vdots \\ X_L \end{bmatrix}$$

This is a Discrete Fourier Transform (DFT). If $t_l = \frac{l}{2L+1} T$, the DFT matrix has orthogonal columns.

Spectral Methods

Computing Coefficients: Collocation Using Timepoints

Changing unknowns from coefficients to time samples turns the collocation system into

$$\underbrace{(DFT)^{-1}\, \operatorname{diag}\!\left( \frac{i 2\pi l}{T} \right) (DFT)}_{\text{spectral differentiation}} \begin{bmatrix} x(t_1) \\ \vdots \\ x(t_{2L+1}) \end{bmatrix} - \begin{bmatrix} F(x(t_1)) \\ \vdots \\ F(x(t_{2L+1})) \end{bmatrix} = \begin{bmatrix} u(t_1) \\ \vdots \\ u(t_{2L+1}) \end{bmatrix}$$

Spectral differentiation: convert the timepoint values into Fourier coefficients, differentiate (multiply coefficient $l$ by $i 2\pi l / T$), and then return to time.
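A minimal sketch of spectral differentiation via the FFT, with assumed test values (an odd number of samples keeps the positive and negative modes paired):

import numpy as np

T, L = 2.0, 10
n = 2 * L + 1
t = np.arange(n) * T / n
x = np.sin(2 * np.pi * t / T)
k = np.fft.fftfreq(n, d=T / n)       # frequencies l/T, in FFT ordering
dx = np.fft.ifft(1j * 2 * np.pi * k * np.fft.fft(x)).real
exact = (2 * np.pi / T) * np.cos(2 * np.pi * t / T)
print(np.max(np.abs(dx - exact)))    # near machine precision for this smooth x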

Spectral Methods

Computing Coefficients: Spectral Differentiation Example

[Figure: rows of the spectral differentiation matrix; middle row shown, with T = 17 and 2L+1 = 17.]

Spectral Methods

Computing Coefficients: Spectral Collocation vs. Finite Difference

A backward-difference discretization replaces the dense spectral differentiation matrix with a sparse bidiagonal one ($1/\Delta t$ on the diagonal, $-1/\Delta t$ on the subdiagonal, and a corner entry enforcing periodicity):

$$\begin{bmatrix} \frac{1}{\Delta t} & & & -\frac{1}{\Delta t} \\ -\frac{1}{\Delta t} & \frac{1}{\Delta t} & & \\ & \ddots & \ddots & \\ & & -\frac{1}{\Delta t} & \frac{1}{\Delta t} \end{bmatrix} \begin{bmatrix} x^1 \\ \vdots \\ x^{2L+1} \end{bmatrix} - \begin{bmatrix} F(x(t_1)) \\ \vdots \\ F(x(t_{2L+1})) \end{bmatrix} = \begin{bmatrix} u(t_1) \\ \vdots \\ u(t_{2L+1}) \end{bmatrix}$$

Spectral collocation uses the dense DFT-based differentiation matrix

$$(DFT)^{-1}\, \operatorname{diag}\!\left( \frac{i 2\pi l}{T} \right) (DFT)$$

in place of the bidiagonal difference operator: a dense system, but with the exponentially accurate derivative of the Fourier representation for smooth periodic solutions.
Summary
Four Methods
Time integration until steady-state achieved
Finite difference methods
Shooting Methods
Spectral Methods

Shooting Methods
State transition function
Sensitivity matrix
Matrix-Free Approach

Spectral Methods
Galerkin and Collocation Methods
SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 19


Laplace's Equation - FEM Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy, Jaime Peraire and Tony Patera

Outline for Poisson Equation Section


Why study Poisson's equation
Heat Flow, Potential Flow, Electrostatics
Raises many issues common to solving PDEs.

Basic Numerical Techniques


basis functions (FEM) and finite-differences
Integral equation methods

Fast Methods for 3-D


Preconditioners for FEM and Finite-differences
Fast multipole techniques for integral equations

SMA-HPC 2003 MIT

Outline for Today


Why Poisson Equation
Reminder about heat conducting bar

Finite-Difference And Basis function methods


Key question of convergence

Convergence of Finite-Element methods


Key idea: solve Poisson by minimization
Demonstrate optimality in a carefully chosen norm

SMA-HPC 2003 MIT

Drag Force Analysis


of Aircraft

Potential Flow Equations


Poisson Partial Differential Equations.

SMA-HPC 2003 MIT

Engine Thermal
Analysis

Thermal Conduction Equations


The Poisson Partial Differential Equation.

SMA-HPC 2003 MIT

Capacitance on a microprocessor Signal Line

Electrostatic Analysis
The Laplace Partial Differential Equation.

SMA-HPC 2003 MIT

Heat Flow

1-D Example

[Figure: a unit-length rod with incoming heat, near-end temperature T(0), and far-end temperature T(1).]

Question: What is the temperature distribution along the bar?

[Figure: temperature T(x) plotted from T(0) to T(1) along the bar.]

Heat Flow

1-D Example: Discrete Representation

1) Cut the bar into short sections.
2) Assign each cut a temperature: $T(0), T_1, T_2, \ldots, T_{N-1}, T_N, T(1)$.

Heat Flow

1-D Example: Constitutive Relation

Heat flow through one section:

$$h_{i+1,i} = \text{heat flow} = \frac{T_{i+1} - T_i}{\Delta x}$$

In the limit as the sections become vanishingly small:

$$\lim_{\Delta x \to 0} h(x) = \frac{\partial T(x)}{\partial x}$$

Heat Flow

1-D Example: Conservation Law

Two adjacent sections, with a control volume around node $i$ and incoming heat $h_s$ per unit length. Heat flows into the control volume sum to zero:

$$h_{i+1,i} - h_{i,i-1} = -h_s \Delta x$$

(heat in from the left, heat out from the right, incoming heat per unit length). In the limit as the sections become vanishingly small:

$$\frac{\partial h(x)}{\partial x} = \frac{\partial^2 T(x)}{\partial x^2} = -h_s(x)$$

Heat Flow

1-D Example: Circuit Analogy

Temperature is analogous to voltage, and heat flow to current. The discretized bar becomes a resistor ladder: resistors $R = \Delta x$ between nodes, current sources $i_s = h_s \Delta x$ into each node, and voltage sources $v_s = T(0)$ and $v_s = T(1)$ pinning the two ends.

Heat Flow

1-D Example: Normalized 1-D Equation

Normalized Poisson equation:

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$
Using Basis Functions

Residual Equation

Partial differential equation form:

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Basis Function Representation:

$$u(x) \approx u_h(x) = \sum_{i=1}^{n} \omega_i\, \varphi_i(x) \qquad (\varphi_i: \text{basis functions})$$

Plug the basis function representation into the equation:

$$R(x) = \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x)$$

Using Basis Functions

Example Basis Functions

$u_h(x)$ is a weighted sum of basis functions, and the basis functions define a space:

$$X_h = \left\{ v \;\middle|\; v = \sum_{i=1}^{n} \alpha_i \varphi_i \text{ for some } \alpha_i\text{'s} \right\}$$

Example: hat basis functions.

[Figure: triangular "hat" functions $\varphi_3, \varphi_4, \varphi_5, \varphi_6$ on a 1-D mesh; their span is the piecewise linear space.]

Using Basis Functions

Basis Weights: Galerkin Scheme

Force the residual to be orthogonal to the basis functions:

$$\int_0^1 \varphi_l(x)\, R(x)\, dx = 0$$

This generates n equations in n unknowns:

$$\int_0^1 \varphi_l(x) \left( \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x) \right) dx = 0, \qquad l \in \{1, \ldots, n\}$$

Using Basis Functions

Basis Weights: Galerkin with Integration by Parts

Integration by parts leaves only first derivatives of the basis functions:

$$-\int_0^1 \frac{d \varphi_l(x)}{dx} \left( \sum_{i=1}^{n} \omega_i \frac{d \varphi_i(x)}{dx} \right) dx + \int_0^1 \varphi_l(x)\, f(x)\, dx = 0, \qquad l \in \{1, \ldots, n\}$$
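With hat basis functions on a uniform mesh, the integrated-by-parts equations assemble into a tridiagonal system. A minimal sketch, assuming the example load f = 1 (names are illustrative):

import numpy as np

n = 20                          # interior nodes
h = 1.0 / (n + 1)
# integral of phi_l' * phi_i' gives the (1/h) * tridiag(-1, 2, -1) stiffness
K = (np.diag(2 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
b = h * np.ones(n)              # integral of phi_l * f for f = 1
w = np.linalg.solve(K, b)       # basis weights; u_h(x_i) = w_i
x = np.linspace(h, 1 - h, n)
print(np.max(np.abs(w - 0.5 * x * (1 - x))))  # exact u = x(1-x)/2; 1-D FEM
                                              # with hats is nodally exact here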

Convergence Analysis

The question is: how does the error $u - u_h$ decrease with refinement?

This time: finite-element methods. Next time: finite-difference methods.

Heat Equation

Convergence Analysis: Overview of FEM

Partial differential equation form:

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Nearly equivalent weak form:

$$\underbrace{\int \frac{\partial u}{\partial x} \frac{\partial v}{\partial x}\, dx}_{a(u,v)} = \underbrace{\int f\, v\, dx}_{l(v)} \quad \text{for all } v$$

This introduces an abstract notation for the equation $u$ must satisfy:

$$a(u, v) = l(v) \quad \text{for all } v$$

Heat Equation

Convergence Analysis: Overview of FEM

Introduce the basis representation $u(x) \approx u_h(x) = \sum_{i=1}^n \omega_i \varphi_i(x)$: a weighted sum of basis functions, defining the space

$$X_h = \left\{ v \;\middle|\; v = \sum_{i=1}^{n} \alpha_i \varphi_i \text{ for some } \alpha_i\text{'s} \right\}$$

(example: hat basis functions spanning the piecewise linear space).

Heat Equation

Convergence Analysis: Overview of FEM, Key Idea

$a(u, u)$ defines a norm: $|||u||| \equiv \sqrt{a(u,u)}$ (here $u$ is restricted to be 0 at 0 and 1!). Using the norm properties, it is possible to show:

If $a(u_h, \varphi_i) = l(\varphi_i)$ for all $i \in \{1, \ldots, n\}$, then

$$\underbrace{|||u - u_h|||}_{\text{solution error}} = \min_{w_h \in X_h} \underbrace{|||u - w_h|||}_{\text{projection error}}$$

Heat Equation

Convergence Analysis: Overview of FEM

The question is only how well you can fit $u$ with a member of $X_h$, but you must measure the error in the $||| \cdot |||$ norm. For piecewise linear basis functions:

$$\underbrace{|||u - u_h|||}_{\text{error}} = O\!\left( \frac{1}{n} \right)$$


Summary
Why Poisson Equation
Reminder about heat conducting bar

Finite-Difference And Basis function methods


Key question of convergence

Convergence of Finite-Element methods


Key idea: solve Poisson by minimization
Demonstrate optimality in a carefully chosen norm

SMA-HPC 2003 MIT

Introduction to Simulation - Lecture 20


Finite-Difference Methods for
Boundary Value Problems
Jacob White

Thanks to Jaime Peraire

Outline
Informal Finite Difference Methods
Heat Conducting Bar

More Formal Analysis of Finite-Difference Methods


Heat Equation
Consistency + Stability yields Convergence

SMA-HPC 2003 MIT

1-D Example

Heat Flow

Discrete Representation

1) Cut the bar into short sections


2) Assign each cut a temperature

T (1)

T (0)
T1

T2

TN 1 TN

1-D Example

Heat Flow

Equation Formulation

With incoming heat $h_s$ and a control volume around node $i$:

$$h_{i+1,i} = \text{heat flow} = \frac{T_{i+1} - T_i}{\Delta x}, \qquad h_{i+1,i} - h_{i,i-1} = -h_s \Delta x$$

(heat in from the left, heat out from the right, incoming heat per unit length). In the limit as the sections become vanishingly small:

$$\frac{\partial h(x)}{\partial x} = \frac{\partial^2 T(x)}{\partial x^2} = -h_s(x)$$

Heat Flow

1-D Example: Normalized 1-D Equation

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$


Summary
Informal Finite Difference Methods
Heat Conducting Bar

More Formal Analysis of Finite-Difference Methods


Heat Equation
Consistency + Stability yields Convergence

Introduction to Simulation - Lecture 21


Boundary Value Problems - Solving 3-D
Finite-Difference problems
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski, and


Karen Veroy

Outline
Reminder about FEM and F-D
1-D Example

Finite Difference Matrices in 1, 2 and 3D


Gaussian Elimination Costs

Krylov Method
Communication Lower bound
Preconditioners based on improving communication

Heat Flow

1-D Example: Normalized 1-D Equation

Normalized Poisson equation:

$$\frac{\partial^2 T(x)}{\partial x^2} = -h_s \quad \Rightarrow \quad \frac{\partial^2 u(x)}{\partial x^2} = -f(x), \quad \text{i.e. } u_{xx}(x) = -f(x)$$

FD Matrix Properties

1-D Poisson: Finite Differences

Discretize $-u_{xx} = f$ at points $x_0, x_1, x_2, \ldots, x_n, x_{n+1}$:

$$-\frac{u_{j+1} - 2u_j + u_{j-1}}{\Delta x^2} = f(x_j)$$

In matrix form:

$$\frac{1}{\Delta x^2} \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} f(x_1) \\ \vdots \\ \vdots \\ f(x_n) \end{bmatrix}$$
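A minimal check of the stencil, assuming the smooth test function sin(pi*x) with zero boundary values (illustrative, not from the lecture):

import numpy as np

m = 100
dx = 1.0 / (m + 1)
A = (np.diag(2 * np.ones(m))
     - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / dx**2
x = np.linspace(dx, 1 - dx, m)
u = np.sin(np.pi * x)                        # u(0) = u(1) = 0
print(np.max(np.abs(A @ u - np.pi**2 * u)))  # -u'' = pi^2 u; O(dx^2) error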

Using Basis Functions

Residual Equation

$$\frac{\partial^2 u}{\partial x^2} = -f, \qquad u(0) = 0,\ u(1) = 0$$

Basis function representation: $u(x) \approx u_h(x) = \sum_{i=1}^n \omega_i \varphi_i(x)$, giving the residual

$$R(x) = \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x)$$

Using Basis Functions

Basis Weights: Galerkin Scheme

Force the residual to be orthogonal to the basis functions, $\int_0^1 \varphi_l(x) R(x)\, dx = 0$, generating n equations in n unknowns:

$$\int_0^1 \varphi_l(x) \left( \sum_{i=1}^{n} \omega_i \frac{d^2 \varphi_i(x)}{dx^2} + f(x) \right) dx = 0, \qquad l \in \{1, \ldots, n\}$$

Galerkin with integration by parts (only first derivatives of the basis functions):

$$-\int_0^1 \frac{d \varphi_l(x)}{dx} \left( \sum_{i=1}^{n} \omega_i \frac{d \varphi_i(x)}{dx} \right) dx + \int_0^1 \varphi_l(x)\, f(x)\, dx = 0, \qquad l \in \{1, \ldots, n\}$$

Structural Analysis of
Automobiles

Equations
Force-displacement relationships for
mechanical elements (plates, beams, shells)
and sum of forces = 0.
Partial Differential Equations of Continuum
Mechanics

Drag Force Analysis of


Aircraft

Equations
Navier-Stokes Partial Differential Equations.

Engine Thermal
Analysis

Equations
The Poisson Partial Differential Equation.

FD Matrix Properties

2-D Discretized Problem: Discretized Poisson

On an $m \times m$ grid with unknowns numbered $x_1, \ldots, x_m, x_{m+1}, \ldots, x_{2m}, \ldots$:

$$-\underbrace{\frac{u_{j+1} - 2u_j + u_{j-1}}{(\Delta x)^2}}_{u_{xx}} - \underbrace{\frac{u_{j+m} - 2u_j + u_{j-m}}{(\Delta y)^2}}_{u_{yy}} = f(x_j)$$

[Figure: matrix nonzero pattern for a 5x5 grid example; five diagonals.]

FD Matrix Properties

3-D Discretization: Discretized Poisson

The stencil at $x_j$ now also couples to $x_{j \pm m^2}$:

$$-\underbrace{\frac{u_{j+1} - 2u_j + u_{j-1}}{(\Delta x)^2}}_{u_{xx}} - \underbrace{\frac{u_{j+m} - 2u_j + u_{j-m}}{(\Delta y)^2}}_{u_{yy}} - \underbrace{\frac{u_{j+m^2} - 2u_j + u_{j-m^2}}{(\Delta z)^2}}_{u_{zz}} = f(x_j)$$

[Figure: matrix nonzero pattern for an m = 4 example; seven diagonals.]

FD Matrix Properties

Summary: Numerical Properties

The matrix is irreducibly diagonally dominant:

$$|A_{ii}| \ge \sum_{j \ne i} |A_{ij}|$$

Each row is either strictly diagonally dominant, or path-connected to a strictly diagonally dominant row. The matrix is symmetric positive definite. Assuming a uniform discretization $\Delta$, the diagonal is

1-D: $2/\Delta^2$,  2-D: $4/\Delta^2$,  3-D: $6/\Delta^2$.

FD Matrix Properties

Summary: Structural Properties

Matrices in 3-D are LARGE:

1-D: $m \times m$,  2-D: $m^2 \times m^2$,  3-D: $m^3 \times m^3$

(a 100x100x100 grid in 3-D gives a 1 million x 1 million matrix). The matrices are very sparse, with nonzeros per row of 3 in 1-D, 5 in 2-D, and 7 in 3-D. The matrices are banded:

1-D: $A_{ij} = 0$ for $|i-j| > 1$,  2-D: $A_{ij} = 0$ for $|i-j| > m$,  3-D: $A_{ij} = 0$ for $|i-j| > m^2$.

Basics of GE

Triangularizing: Picture

[Figure: a 4x4 matrix; the first GE step zeros $A_{21}, A_{31}, A_{41}$ and updates the trailing 3x3 block, and the process repeats on the remaining block.]

GE Basics

Triangularizing: Algorithm

For i = 1 to n-1 {                  (for each row)
  For j = i+1 to n {                (for each row below the pivot)
    For k = i+1 to n {              (for each element beyond the pivot)
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
    }                               (multiplier A(j,i)/A(i,i), pivot A(i,i))
  }
}

Form n-1 reciprocals (pivots); form

$$\sum_{i=1}^{n-1} (n-i) \approx \frac{n^2}{2} \text{ multipliers};$$

perform

$$\sum_{i=1}^{n-1} (n-i)^2 \approx \frac{n^3}{3} \text{ multiply-adds}.$$
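A minimal runnable sketch of the triangularization loop just described (no pivoting, so it assumes nonzero pivots, e.g. a diagonally dominant matrix):

import numpy as np

def ge_triangularize(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n - 1):                # for each pivot row
        for j in range(i + 1, n):         # for each row below the pivot
            mult = A[j, i] / A[i, i]      # the multiplier A_ji / A_ii
            A[j, i:] -= mult * A[i, i:]   # eliminate; ~ (n-i) mult-adds per row
    return A                              # upper triangular factor

print(ge_triangularize(np.array([[2., -1, 0], [-1, 2, -1], [0, -1, 2]])))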

Complexity of GE

1-D: $O(n^3) = O(m^3)$, a 100-point grid costs $O(10^6)$ ops
2-D: $O(n^3) = O(m^6)$, a 100x100 grid costs $O(10^{12})$ ops
3-D: $O(n^3) = O(m^9)$, a 100x100x100 grid costs $O(10^{18})$ ops

For 2-D and 3-D problems we need a faster solver!

Banded GE

Triangularizing: Algorithm

For a matrix with bandwidth b, elimination never reaches outside the band, so the inner loops shrink:

For i = 1 to n-1 {
  For j = i+1 to min(i+b-1, n) {
    For k = i+1 to min(i+b-1, n) {
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
    }
  }
}

Perform

$$\sum_{i=1}^{n-1} \left( \min(b-1,\, n-i) \right)^2 = O(b^2 n) \text{ multiply-adds}.$$

Complexity of Banded GE

1-D (b = O(1), n = m): $O(b^2 n) = O(m)$, a 100-point grid costs $O(100)$ ops
2-D (b = m, n = m^2): $O(b^2 n) = O(m^4)$, a 100x100 grid costs $O(10^8)$ ops
3-D (b = m^2, n = m^3): $O(b^2 n) = O(m^7)$, a 100x100x100 grid costs $O(10^{14})$ ops

For 3-D problems we still need a faster solver!

The World According to Krylov

Preconditioning

Start with $Ax = b$; form $PAx = Pb$.

Determine the Krylov subspace from $r^0 = Pb - PAx^0$:

$$\text{Krylov subspace} = \operatorname{span}\left\{ r^0, PAr^0, \ldots, (PA)^k r^0 \right\}$$

Select the solution from the Krylov subspace:

$$x^{k+1} = x^0 + y^k, \qquad y^k \in \operatorname{span}\left\{ r^0, PAr^0, \ldots, (PA)^k r^0 \right\}$$

GCR picks a residual-minimizing $y^k$.

Krylov Methods

Preconditioning: Diagonal Preconditioners

Let $A = D + A_{nd}$ (diagonal plus nondiagonal part) and apply GCR to

$$D^{-1} A\, x = \left( I + D^{-1} A_{nd} \right) x = D^{-1} b$$

The inverse of a diagonal is cheap to compute, and this usually improves convergence.

Krylov Methods

Convergence Analysis: Optimality of the GCR Polynomial

GCR optimality property:

$$\| r^{k+1} \| \le \left\| \wp_{k+1}(PA)\, r^0 \right\| \quad \text{where } \wp_{k+1} \text{ is any } (k{+}1)\text{th-order polynomial with } \wp_{k+1}(0) = 1$$

Therefore any polynomial which satisfies the constraints can be used to get an upper bound on $\| r^{k+1} \| / \| r^0 \|$.

[Figure: residual polynomials for the heat-conducting-bar matrix with no loss to air (n = 10). Keep $\wp_k(\lambda_i)$ as small as possible: easier if the eigenvalues are clustered!]

The World According to Krylov

Krylov Vectors for the Diagonally Preconditioned A

For the 1-D discretized PDE, $D^{-1}A$ is the tridiagonal operator with 1 on the diagonal and $-0.5$ on the off-diagonals. If $b = r^0$ is nonzero only in its first entry, each multiplication by $D^{-1}A$ spreads the nonzero pattern by exactly one more grid point: $D^{-1}A\, r^0$ is nonzero in the first two entries, $(D^{-1}A)^2 r^0$ in the first three, and so on, while $x_{exact}$ is nonzero everywhere.

The World According to Krylov

Krylov Vectors: Communication Lower Bound

For m gridpoints, $(D^{-1}A)^k r^0$ first becomes nonzero in the mth entry at $k = m$ multiplications. Since

$$x^{k+1} = x^0 + \sum_{j=0}^{k} \gamma_j (D^{-1}A)^j r^0,$$

at least m iterations are needed before $x^{k+1}$ can be correct in every entry.
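A quick check of this spreading behavior, with assumed sizes (illustrative, not from the lecture):

import numpy as np

m = 8
DA = (np.eye(m)
      - 0.5 * np.diag(np.ones(m - 1), 1)
      - 0.5 * np.diag(np.ones(m - 1), -1))   # D^{-1} A for the 1-D Poisson
r = np.zeros(m); r[0] = 1.0
for k in range(4):
    print(k, np.flatnonzero(r))   # nonzero indices grow by one per multiply
    r = DA @ r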

The World According to Krylov

Two-Dimensional Case

For an $m \times m$ grid, if $r^0$ is nonzero only at one corner, information must cross the grid: it takes $2m = O(m)$ iterations for $x^{k+1}$ to be correct in the opposite corner (entry $m^2$).

The World According to Krylov

Convergence for GCR: Eigenanalysis

For the 1-D problem, $D^{-1}A$ is the tridiagonal operator with 1 on the diagonal and $-0.5$ off the diagonal; recall its eigenvalues are

$$\lambda_k = 1 - \cos\!\left( \frac{k \pi}{m+1} \right), \qquad k = 1, \ldots, m$$

The World According to Krylov

Convergence for GCR: Eigenanalysis

For $D^{-1}A$,

$$\frac{\lambda_{max}}{\lambda_{min}} = \frac{1 - \cos\!\left( \frac{m\pi}{m+1} \right)}{1 - \cos\!\left( \frac{\pi}{m+1} \right)} = O(m^2)$$

The number of GCR iterations needed to achieve $\| r^k \| / \| r^0 \| \le$ tolerance grows like

$$k = O\!\left( \sqrt{\frac{\lambda_{max}}{\lambda_{min}}} \right) \log\!\left( \frac{1}{\text{tolerance}} \right) = O(m)$$

GCR achieves the communication lower bound O(m)!

The World According to Krylov

Work for Banded Gaussian Elimination, Sparse GE and GCR

Dimension | Banded GE | Sparse GE | GCR
1-D       | O(m)      | O(m)      | O(m^2)
2-D       | O(m^4)    | O(m^3)    | O(m^3)
3-D       | O(m^7)    | O(m^6)    | O(m^4)

GCR is faster than banded GE in 2 and 3 dimensions, and it could be faster still: the 3-D matrix has only $m^3$ nonzeros. We must defeat the communication lower bound!

The World According to Krylov

How to Get Faster-Converging GCR

Preconditioning is the only hope: GCR already achieves the communication lower bound for a diagonally preconditioned A. The preconditioner must accelerate communication: multiplying by PA must move values by more than one grid point.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Physical View

1-D discretized PDE, one Gauss-Seidel sweep:

$$u_1^{(new)} \text{ from } u_0, u_2^{(old)}; \quad u_2^{(new)} \text{ from } u_1^{(new)}, u_3^{(old)}; \quad \ldots \quad u_n^{(new)} \text{ from } u_{n-1}^{(new)}, u_{n+1}$$

Because each update uses the already-updated left neighbor, each iteration of Gauss-Seidel relaxation moves data across the entire grid.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Krylov Vectors

[Figure: Krylov vectors of $(D+L)^{-1} A$ for the 1-D discretized PDE starting from $b = r^0$ nonzero only in the first entry: every vector is dense to the right of the first entry, but fills in only one grid point per iteration to the left.]

Gauss-Seidel communicates quickly in only one direction.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Symmetric Gauss-Seidel

Sweep forward ($u_1^{(new)}, u_2^{(new)}, \ldots, u_n^{(new)}$, each using the already-updated left neighbor), then sweep backward ($u_{n-1}^{(newer)}, \ldots, u_1^{(newer)}$, each using the already-updated right neighbor). This symmetric Gauss-Seidel preconditioner communicates in both directions.

Preconditioning Approaches

Gauss-Seidel Preconditioning: Symmetric Gauss-Seidel

Derivation of the SGS iteration equations:

Forward sweep (half step): $(D + L)\, x^{k+1/2} + U x^k = b$
Backward sweep (half step): $(D + U)\, x^{k+1} + L x^{k+1/2} = b$

Eliminating $x^{k+1/2}$:

$$x^{k+1} = x^k - (D+U)^{-1} D (D+L)^{-1} A\, x^k + (D+U)^{-1} D (D+L)^{-1} b$$
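A minimal sketch of applying the SGS preconditioner $M = (D+L) D^{-1} (D+U)$ implied by the iteration above (assumed helper names, dense solves for clarity):

import numpy as np

def sgs_apply(A, r):
    # returns M^{-1} r with M = (D+L) D^{-1} (D+U)
    D = np.diag(np.diag(A))
    Lo = np.tril(A, -1)
    Up = np.triu(A, 1)
    y = np.linalg.solve(D + Lo, r)          # forward sweep
    return np.linalg.solve(D + Up, D @ y)   # backward sweep

A = (np.diag(2 * np.ones(5))
     - np.diag(np.ones(4), 1)
     - np.diag(np.ones(4), -1))
print(sgs_apply(A, np.ones(5)))

In practice the triangular solves are done with sparse forward/back substitution rather than dense solves.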

Preconditioning Approaches

Block Diagonal Preconditioners: Line Schemes

[Figure: a 2-D $m^2$-point grid with one line of unknowns grouped per block; the matrix becomes block diagonal with tridiagonal blocks.]

Tridiagonal matrices factor quickly.

Problem: line preconditioners communicate rapidly in only one direction.

Solution: do lines first in x, then in y. The preconditioner is then two tridiagonal solves, with variable reordering in between.

Preconditioning Approaches

Block Diagonal Preconditioners: Domain Decomposition

Approach: break the domain into small blocks, each with the same number of grid points.

The trade-off: fewer blocks means faster convergence, but more costly iterates. Cutting the $m \times m$ grid into $l^2$ blocks of $\frac{m}{l} \times \frac{m}{l}$ points each, the block factorizations are cheap sparse-GE solves, while the communication bound ties the iteration count to the number of blocks information must cross, suggesting insensitivity of the total work to $l$.

Do you have to refactor every GCR iteration?

Preconditioning Approaches

Seidelized Block Diagonal Preconditioners: Line Schemes

[Figure: the line blocks swept in order, Gauss-Seidel style, so each line solve uses already-updated neighboring lines.]

Preconditioning Approaches

Overlapping Domain Preconditioners: Line-Based Schemes

[Figure: line blocks that overlap their neighbors, so the matrix blocks share unknowns.]

Bigger systems to solve, but can have faster convergence on tougher problems (not just Poisson).

Preconditioning Approaches

Incomplete Factorization Schemes: Outline

Reminder about Gaussian elimination
Computational steps
Fill-in for sparse matrices: greatly increases factorization cost
Fill-in in a 2-D grid
Incomplete factorization idea

Sparse Matrices

Fill-In Example

Eliminating the first variable of a matrix with nonzero structure

$$\begin{bmatrix} X & X & X \\ X & X & 0 \\ X & 0 & X \end{bmatrix}$$

updates the trailing block with the outer product of the first column and first row, turning the zeros at positions (2,3) and (3,2) into nonzeros: fill-ins.

Sparse Matrices

Fill-In: Second Example

Fill-ins propagate: fill-ins from step 1 result in fill-ins in step 2.

Sparse Matrices

Fill-In: Pattern of a Filled-In Matrix

[Figures: a very sparse matrix whose factor acquires a dense trailing block; an unfactored random sparse matrix and the same matrix after factoring. The generated fill-in makes factorization expensive, and the same happens when factoring 2-D and 3-D finite-difference matrices.]

Preconditioning Approaches

Incomplete Factorization Schemes: Key Idea

THROW AWAY FILL-INS!
Throw away all fill-ins, or
throw away only fill-ins with small values, or
throw away fill-ins produced by other fill-ins, or
throw away fill-ins produced by fill-ins of other fill-ins, etc.

Summary
3-D BVP Examples
Aerodynamics, Continuum Mechanics, Heat-Flow

Finite Difference Matrices in 1, 2 and 3D


Gaussian Elimination Costs

Krylov Method
Communication Lower bound
Preconditioners based on improving communication

Introduction to Simulation - Lecture 22


Integral Equation Methods
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


Xin Wang and Karen Veroy

Outline
Integral Equation Methods
Exterior versus interior problems
Start with using point sources
Standard Solution Methods in 2-D
Galerkin Method
Collocation Method
Issues in 3-D
Panel Integration
SMA-HPC 2003 MIT

Interior Versus Exterior Problems

Interior: $\nabla^2 T = 0$ inside, temperature known on the surface. Example: temperature in a tank.
Exterior: $\nabla^2 T = 0$ outside, temperature known on the surface. Example: ice cube in a bath.

What is the heat flow?

$$\text{Heat Flow} = \kappa \int_{surface} \frac{\partial T}{\partial n}\, dS \qquad (\kappa = \text{thermal conductivity})$$

Exterior Problem in Electrostatics

$\nabla^2 \psi = 0$ outside; the potential $\psi$ is given on the surface.

What is the capacitance?

$$\text{Capacitance} = \varepsilon \int_{surface} \frac{\partial \psi}{\partial n}\, dS \qquad (\varepsilon = \text{dielectric permittivity})$$

Drag Force in a Microresonator

[Figures: resonator, discretized structure, and computed forces (bottom and top views). Courtesy of Werner Hemmert, Ph.D. Used with permission.]

What is common about these problems?

Exterior problems
Drag force in MEMS device - fluid (air) creates drag.
Coupling in a package - fields in the exterior create coupling.
Capacitance of a signal line - fields in the exterior.
Quantities of interest are on the surface
MEMS device - just want the surface traction force.
Package - just want the coupling between conductors.
Signal line - just want the surface charge.
Exterior problem is linear and space-invariant
MEMS - exterior Stokes flow equation (linear).
Package - Maxwell's equations in free space (linear).
Signal line - Laplace's equation in free space (linear).

But the problems are geometrically very complex!

Exterior Problems

Why Not Use Finite-Difference or FEM Methods?

2-D heat flow example: a finite-difference or FEM mesh must fill the entire exterior. Only $\frac{\partial T}{\partial n}$ on the surface is needed, but $T$ is computed everywhere. Worse, the mesh must be truncated: the condition $T(\infty) = 0$ becomes $T(R) = 0$ at some finite radius $R$.
Green's Function for Laplace's Equation

In 2-D: if

$$u = \log \sqrt{(x - x_0)^2 + (y - y_0)^2},$$

then $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$ for all $(x, y) \ne (x_0, y_0)$.

In 3-D: if

$$u = \frac{1}{\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}},$$

then $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$ for all $(x, y, z) \ne (x_0, y_0, z_0)$.

Proof: just differentiate and see!
Laplace's Equation in 2-D

Simple Idea

$u$ is given on the surface, and $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$ outside. Let

$$u = \log \sqrt{(x - x_0)^2 + (y - y_0)^2}$$

for a source point $(x_0, y_0)$ inside the surface; the equation holds everywhere outside. Problem solved? No: it does not match the boundary conditions!

Laplace's Equation in 2-D

Simple Idea: More Points

$$u = \sum_{i=1}^{n} \alpha_i \log \sqrt{(x - x_i)^2 + (y - y_i)^2} = \sum_{i=1}^{n} \alpha_i\, G(x - x_i, y - y_i)$$

for source points $(x_1, y_1), \ldots, (x_n, y_n)$ inside the surface. Pick the $\alpha_i$'s to match the boundary conditions!

Laplace's Equation in 2-D

Simple Idea: More Points Equations

Source strengths are selected to give the correct potential at test points $(x_{t_1}, y_{t_1}), \ldots, (x_{t_n}, y_{t_n})$ on the surface:

$$\begin{bmatrix} G(x_{t_1} - x_1, y_{t_1} - y_1) & \cdots & G(x_{t_1} - x_n, y_{t_1} - y_n) \\ \vdots & & \vdots \\ G(x_{t_n} - x_1, y_{t_n} - y_1) & \cdots & G(x_{t_n} - x_n, y_{t_n} - y_n) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} = \begin{bmatrix} \psi(x_{t_1}, y_{t_1}) \\ \vdots \\ \psi(x_{t_n}, y_{t_n}) \end{bmatrix}$$

Computational Results Using the Points Approach

[Figure: charges on a circle of radius r = 9.5 inside the boundary circle of radius R = 10; computed potentials on the circle for n = 20 and n = 40.]
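A minimal numerical sketch of this point-source approach, using the slide's radii; the angular offset of the test points and the unit boundary potential are assumptions for illustration:

import numpy as np

n = 40
ang = 2 * np.pi * np.arange(n) / n
src = 9.5 * np.exp(1j * ang)                 # source locations (complex plane)
tst = 10.0 * np.exp(1j * (ang + np.pi / n))  # boundary test points, offset
G = np.log(np.abs(tst[:, None] - src[None, :]))  # 2-D Green's function matrix
alpha = np.linalg.solve(G, np.ones(n))       # strengths matching psi = 1
p = 10.0 * np.exp(1j * 0.123)                # another boundary point
print(np.sum(alpha * np.log(np.abs(p - src))))   # should be close to 1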

Laplace's Equation in 2-D

Integral Formulation: Limiting Argument

We want to smear the point charges onto the surface. This results in an integral equation:

$$\psi(x) = \int_{surface} G(x, x')\, \sigma(x')\, dS'$$

How do we solve the integral equation?

Laplace's Equation in 2-D

Basis Function Approach: Basic Idea

Represent the density with basis functions:

$$\sigma(x) = \sum_{i=1}^{n} \sigma_i\, \varphi_i(x)$$

Example basis: represent the circle with straight lines and assume $\sigma$ is constant along each line. The basis functions live on the surface: they approximate the density, and may also approximate the geometry.

Geometric approximation is not new; triangles for 2-D FEM approximate the circle too! With the approximate surface,

$$\psi(x) = \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS'$$

Laplace's Equation in 2-D

Basis Function Approach: Piecewise Constant Straight Sections Example

1) Pick a set of n points on the surface.
2) Define a new surface by connecting the points with n lines $l_1, \ldots, l_n$.
3) Define $\varphi_i(x) = 1$ if $x$ is on line $l_i$; otherwise $\varphi_i(x) = 0$.

$$\psi(x) = \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS' = \sum_{i=1}^{n} \sigma_i \int_{line\ l_i} G(x, x')\, dS'$$

How do we determine the $\sigma_i$'s?

Laplace's Equation in 2-D

Basis Function Approach: Residual Definition and Minimization

$$R(x) \equiv \psi(x) - \int_{\substack{approx \\ surface}} G(x, x') \sum_{i=1}^{n} \sigma_i \varphi_i(x')\, dS'$$

We will pick the $\sigma_i$'s to make $R(x)$ small. General approach: pick a set of test functions $\phi_1, \ldots, \phi_n$ and force $R(x)$ to be orthogonal to the set:

$$\int \phi_i(x)\, R(x)\, dS = 0 \quad \text{for all } i.$$

Different methods come from choosing the $\phi_1, \ldots, \phi_n$:

Collocation: $\phi_i(x) = \delta(x - x_{t_i})$ (point-matching)
Galerkin Method: $\phi_i(x) = \varphi_i(x)$ (basis = test)

Laplace's Equation in 2-D

Basis Function Approach: Collocation

With $\phi_i(x) = \delta(x - x_{t_i})$ (point-matching),

$$\int \delta(x - x_{t_i})\, R(x)\, dS = R(x_{t_i}) = 0 \;\Rightarrow\; \psi(x_{t_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{\substack{approx \\ surface}} G(x_{t_i}, x')\, \varphi_j(x')\, dS'}_{A_{i,j}}$$

$$\begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{bmatrix} \begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_n \end{bmatrix} = \begin{bmatrix} \psi(x_{t_1}) \\ \vdots \\ \psi(x_{t_n}) \end{bmatrix}$$

Laplace's Equation in 2-D

Basis Function Approach: Centroid Collocation for Piecewise Constant Bases

Put the collocation point at each line's center:

$$\psi(x_{t_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{line\ j} G(x_{t_i}, x')\, dS'}_{A_{i,j}}$$

Laplace's Equation in 2-D

Basis Function Approach: Centroid Collocation Generates a Nonsymmetric A

$$A_{1,2} = \int_{line\ 2} G(x_{t_1}, x')\, dS' \;\ne\; \int_{line\ 1} G(x_{t_2}, x')\, dS' = A_{2,1}$$

Laplace's Equation in 2-D

Basis Function Approach: Galerkin

With test = basis, $\phi_i(x) = \varphi_i(x)$:

$$\underbrace{\int \varphi_i(x)\, \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int\! \int_{\substack{approx \\ surface}} G(x, x')\, \varphi_i(x)\, \varphi_j(x')\, dS'\, dS}_{A_{i,j}}, \qquad A \sigma = b$$

If $G(x, x') = G(x', x)$ then $A_{i,j} = A_{j,i}$: A is symmetric.

Laplace's Equation in 2-D

Basis Function Approach: Galerkin for Piecewise Constant Bases

$$\underbrace{\int_{line_i} \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{line_i} \int_{line_j} G(x, x')\, dS'\, dS}_{A_{i,j}}$$

3-D Laplace's Equation

Basis Function Approach: Piecewise Constant Basis

Integral equation:

$$\psi(x) = \int_{surface} \frac{1}{\| x - x' \|}\, \sigma(x')\, dS'$$

Discretize the surface into panels and represent

$$\sigma(x) \approx \sum_{i=1}^{n} \sigma_i \varphi_i(x), \qquad \varphi_j(x) = \begin{cases} 1 & x \text{ on panel } j \\ 0 & \text{otherwise} \end{cases}$$

3-D Laplace's Equation

Basis Function Approach: Centroid Collocation

Put the collocation points $x_{c_i}$ at the panel centroids:

$$\psi(x_{c_i}) = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{panel\ j} \frac{1}{\| x_{c_i} - x' \|}\, dS'}_{A_{i,j}}, \qquad A \sigma = \begin{bmatrix} \psi(x_{c_1}) \\ \vdots \\ \psi(x_{c_n}) \end{bmatrix}$$

3-D Laplace's Equation

Basis Function Approach: Calculating Matrix Elements

One-point quadrature approximation:

$$A_{i,j} \approx \frac{\text{Panel Area}}{\| x_{c_i} - x_{centroid_j} \|}$$

Four-point quadrature approximation:

$$A_{i,j} \approx \sum_{q=1}^{4} \frac{0.25 \cdot \text{Panel Area}}{\| x_{c_i} - x_{point_q} \|}$$

3-D Laplace's Equation

Basis Function Approach: Calculating the Self-Term

One-point quadrature fails for $A_{i,i}$: it gives Panel Area divided by $\| x_{c_i} - x_{c_i} \| = 0$. But

$$A_{i,i} = \int_{panel\ i} \frac{1}{\| x_{c_i} - x' \|}\, dS' \text{ is an integrable singularity.}$$

3-D Laplace's Equation

Basis Function Approach: Calculating the Self-Term, Tricks of the Trade

Integrate in two pieces, using a disk of radius R surrounding the collocation point:

$$A_{i,i} = \int_{disk} \frac{1}{\| x_{c_i} - x' \|}\, dS' + \int_{rest\ of\ panel} \frac{1}{\| x_{c_i} - x' \|}\, dS'$$

The disk integral contains the singularity but has an analytic formula:

$$\int_{disk} \frac{1}{\| x_{c_i} - x' \|}\, dS' = \int_0^{2\pi}\!\!\int_0^{R} \frac{1}{r}\, r\, dr\, d\theta = 2\pi R$$

Other tricks of the trade:
1) If the panel is a flat polygon, analytical formulas exist.
2) Curved panels can be handled with projection.

3-D Laplace's Equation

Basis Function Approach: Galerkin (test = basis)

$$\underbrace{\int \varphi_i(x)\, \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int\!\int \varphi_i(x)\, G(x, x')\, \varphi_j(x')\, dS'\, dS}_{A_{i,j}}$$

For the piecewise constant basis:

$$\underbrace{\int_{panel\ i} \psi(x)\, dS}_{b_i} = \sum_{j=1}^{n} \sigma_j \underbrace{\int_{panel\ i} \int_{panel\ j} \frac{1}{\| x - x' \|}\, dS'\, dS}_{A_{i,j}}, \qquad A \sigma = b$$

3-D Laplace's Equation

Basis Function Approach: Problem with the Dense Matrix

Integral equation methods generate huge dense matrices: every $A_{i,j}$ is nonzero. Gaussian elimination is much too slow!

Summary
Integral Equation Methods
Exterior versus interior problems
Start with using point sources
Standard Solution Methods
Collocation Method
Galerkin Method
Next Time Fast Solvers
Use a Krylov-Subspace Iterative Method
Compute MV products Approximately

Introduction to Simulation - Lecture 23


Fast Methods for Integral Equations
Jacob White

Thanks to Deepak Ramaswamy, Michal Rewienski,


and Karen Veroy

Outline
Solving Discretized Integral Equations
Using Krylov Subspace Methods
Fast Matrix-Vector Products
Multipole Algorithms
Multipole Representation.
Basic Hierarchy
Algorithmic Improvements
Local Expansions
Adaptive Algorithms
Computational Results

Exterior Problem in Electrostatics

[Figure: a conductor held at a potential by a voltage source v.]

\nabla^2 \psi = 0 outside; \psi is given on the surface -- a Dirichlet problem.

First Kind Integral Equation for the charge:

\underbrace{\psi(x)}_{\text{potential}} = \int_{surface} \underbrace{\frac{1}{\|x - x'\|}}_{\text{Green's function}}\,\underbrace{\sigma(x')}_{\text{charge density}}\,dS'

Drag Force in a Microresonator

[Figures: resonator photograph, discretized structure, and computed forces
(bottom and top views). Courtesy of Werner Hemmert, Ph.D. Used with permission.]


Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
The kth step of GCR

Compute A p_k.   (For discretized integral equations, A is dense.)

\alpha_k = \frac{(r^k)^T (A p_k)}{(A p_k)^T (A p_k)}   -- determine the optimal stepsize in the kth search direction.

x^{k+1} = x^k + \alpha_k p_k, \quad r^{k+1} = r^k - \alpha_k A p_k   -- update the solution and the residual.

p_{k+1} = r^{k+1} - \sum_{j=0}^{k} \frac{(A r^{k+1})^T (A p_j)}{(A p_j)^T (A p_j)}\,p_j   -- compute the new orthogonalized search direction.
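A minimal Matlab sketch of this iteration is given below. It stores normalized search directions, which is an algebraically equivalent way to carry out the orthogonalization above; the function name and stopping rule are illustrative assumptions.

function x = gcr(A, b, tol, maxit)
% Minimal GCR sketch: minimizes ||b - A*x|| over the growing search space.
n = length(b); x = zeros(n,1); r = b;
P = zeros(n,0); AP = zeros(n,0);           % search directions and A*directions
for k = 1:maxit
  p = r; Ap = A*p;                         % candidate direction from residual
  for j = 1:size(AP,2)                     % orthogonalize Ap against prior Ap_j
    beta = AP(:,j)'*Ap;
    p = p - beta*P(:,j);
    Ap = Ap - beta*AP(:,j);
  end
  nrm = norm(Ap); p = p/nrm; Ap = Ap/nrm;  % so that (Ap_k)'*(Ap_k) = 1
  alpha = r'*Ap;                           % optimal step in this direction
  x = x + alpha*p;
  r = r - alpha*Ap;
  P(:,k) = p; AP(:,k) = Ap;
  if norm(r) <= tol*norm(b), break; end
end
end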

Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
Complexity of GCR

Compute A p_k:  the dense matrix-vector product costs O(n^2).
\alpha_k = (r^k)^T(A p_k) / (A p_k)^T(A p_k):  vector inner products, O(n).
x^{k+1} = x^k + \alpha_k p_k,  r^{k+1} = r^k - \alpha_k A p_k:  vector adds, O(n).
p_{k+1} = r^{k+1} - \sum_j \cdots p_j:  O(k) inner products, total cost O(nk).

The algorithm is O(n^2) for integral equations, even though the number of
iterations k is small!

Solving Discretized Integral Equations
The Generalized Conjugate Residual Algorithm
Fast Matrix-Vector Products

Exactly compute A p_k:  the dense matrix-vector product costs O(n^2).
Approximately compute A p_k:  reduces the matrix-vector product cost to
O(n) or O(n log n).


Summary
Solving Discretized Integral Equations
GCR plus Fast Matrix-Vector Products
Multipole Algorithms
Multipole Representation.
Basic Hierarchy
Algorithmic Improvements
Local Expansions
Adaptive Algorithms
Computational Results
Precorrected-FFT Algorithms

Introduction to Simulation - Lectures 17, 18

Molecular Dynamics
Nicolas Hadjiconstantinou

Molecular Dynamics
Molecular dynamics is a technique for computing the equilibrium and
non-equilibrium properties of classical* many-body systems.
* The nuclear motion of the constituent particles obeys the laws of
classical mechanics (Newton).
References:
1) Computer Simulation of Liquids, M.P. Allen & D.J. Tildesley,
   Clarendon, Oxford, 1987.
2) Understanding Molecular Simulation: From Algorithms to Applications,
   D. Frenkel and B. Smit, Academic Press, 1997.
3) Moldy manual

Moldy
A free and easy to use molecular dynamics simulation package can be found
at the CCP5 program library (http://www.ccp5.ac.uk/librar.shtml), under the
name Moldy. At this site a variety of very useful information as well as
molecular simulation tools can be found.
Moldy is easy to use and comes with a very well written manual which can
help as a reference. I expect the homework assignments to be completed using
Moldy.

Why Do We Need Molecular Dynamics?
Similar to real experiments:
1. Allows us to study and understand material behavior so that we can model it.
2. Tells us what the answer is when we do not have models.

Example: the diffusion equation. Consider the fluxes F|_x and F|_{x+dx} through
a small volume dx dy dz, where n = number density and F = flux.

Conservation of mass:
\frac{d}{dt}(n\,dx\,dy\,dz) = (F|_x - F|_{x+dx})\,dy\,dz

Expanding F|_{x+dx} in a Taylor series,
\frac{d}{dt}(n\,dx\,dy\,dz) \approx \left[F|_x - \left(F|_x + \frac{\partial F}{\partial x}\,dx + \frac{\partial^2 F}{\partial x^2}\,\frac{dx^2}{2}\right)\right]dy\,dz
= -\frac{\partial F}{\partial x}\,dx\,dy\,dz - \frac{\partial^2 F}{\partial x^2}\,\frac{dx^2}{2}\,dy\,dz

In the limit dx \to 0,
\frac{\partial n}{\partial t} + \frac{\partial F}{\partial x} = 0

This equation cannot be solved unless a relation between n and F is provided.
Experiments or consideration of molecular behavior show that, under a variety
of conditions,
F = -D\,\frac{\partial n}{\partial x}
so that
\frac{\partial n}{\partial t} = D\,\frac{\partial^2 n}{\partial x^2}
-- the diffusion equation!

Breakdown of the linear-gradient constitutive law
The law F = -D \partial n/\partial x fails for:
- Large gradients
- Far-from-equilibrium gaseous flows:
  - Shockwaves
  - Small-scale flows (high Knudsen number flow)
  - Rarefied flows (low density) (high Knudsen number flow)

High Knudsen number flows (gases):
Kn is defined as the ratio of the molecular mean-free path to a characteristic
lengthscale. The molecular mean-free path is the average distance traveled by
molecules between collisions; collisions tend to restore equilibrium.
- When Kn << 1, particle motion is diffusive (near equilibrium).
- When Kn >> 1, particle motion is ballistic (far from equilibrium).
- For 0.1 < Kn < 10, the physics is transitional (hard to model).

Example: re-entry vehicle aerobraking maneuver in the upper atmosphere.
In the upper atmosphere, density is low (the collision rate is low):
- Long mean-free path
- High Knudsen number flows are typical

Other high Knudsen number flows:
- Small-scale flows (the mean-free path of air molecules at atmospheric
  pressure is approximately 60 nanometers)
- Vacuum science (low pressure)

[Figure from Dr. M. Gallis of Sandia National Laboratories]

Brief Intro to Statistical Mechanics
Statistical mechanics provides the theoretical connection between the
microscopic description of matter (e.g. positions and velocities of molecules)
and the macroscopic description, which uses observables such as pressure,
density, and temperature.
This is achieved through a statistical approach and the concept of an ensemble
average. An ensemble is a collection of a large number (M) of identical systems
evolving in time under the same macroscopic conditions but different
microscopic initial conditions.
Let M(i) be the number of such systems in state i. Then \wp(i) = M(i)/M can be
interpreted as the probability of finding an ensemble member in state i.

Macroscopic properties (observables) are then calculated as weighted averages,
A = \sum_i \wp(i)\,A(i)
or, in the continuous limit,
A = \int \wp(\Gamma)\,A(\Gamma)\,d\Gamma.
One of the fundamental results of statistical mechanics is that the probability
of a state of a system with energy E, in equilibrium at a fixed temperature T,
is governed by
\wp(E) \propto \exp\!\left(-\frac{E}{kT}\right)
where k is Boltzmann's constant.
For non-equilibrium systems, solving a problem reduces to the task of
calculating \wp(\Gamma). Molecular methods are similar to experiments: rather
than solving for \wp(\Gamma), we measure A directly.

A = \sum_i \wp(i)\,A(i) implies that, given an ensemble of systems, any
observable A can be measured by averaging the value of this observable over all
systems.
However, in real life we do not use a large number of systems to do
experiments. We usually observe one system over some period of time.
This is because we use the ergodic hypothesis:
- Since there is a one-to-one correspondence between the initial
  conditions of a system and the state at some other time, averaging
  over a large number of initial conditions is equivalent to averaging
  over time-evolved states of the system.
- The ergodic hypothesis allows us to convert the averaging from
  ensemble members to time instances of the same system. THIS IS
  AN ASSUMPTION THAT SEEMS TO WORK WELL MOST OF
  THE TIME.

A Simplified MD Program Structure


Initialize:
-Read run parameters (initial temperature, number of timesteps,
density, number of particles, timestep)
-Generate or read initial configuration (positions and velocities
of particles)
Loop in timestep increments until t = tfinal
-Compute forces on particles
-Integrate equations of motion
-If t > tequilibrium, sample system
Output results

Equations of Motion
Newton's equations: for i = 1, \ldots, N,

m_i\,\frac{d^2 \vec{r}_i}{dt^2} = \vec{F}_i = -\nabla_{\vec{r}_i} U(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)

U(\vec{r}_1, \ldots, \vec{r}_N) = potential energy of the system
= \sum_i U_1(\vec{r}_i) + \sum_i \sum_{j>i} U_2(\vec{r}_i, \vec{r}_j) + \sum_i \sum_{j>i} \sum_{k>j>i} U_3(\vec{r}_i, \vec{r}_j, \vec{r}_k) + \ldots

U_1(\vec{r}_i) = external field
U_2(\vec{r}_i, \vec{r}_j) = pair interaction = U_2(r_{ij}), with r_{ij} = |\vec{r}_i - \vec{r}_j|
U_3(\vec{r}_i, \vec{r}_j, \vec{r}_k) = three-body interaction (expensive to calculate)

For this reason, typically
U \approx \sum_i U_1(\vec{r}_i) + \sum_i \sum_{j>i} U_2^{eff}(r_{ij})
where U_2^{eff} includes some of the effects of the three-body interactions.

The Lennard-Jones Potential
One of the simplest potentials used is the Lennard-Jones potential, typically
used to simulate simple liquids and solids:

U(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]

\epsilon is the well depth [energy]; \sigma is the interaction lengthscale.
- Very repulsive for r < \sigma
- Potential minimum at r = 2^{1/6}\,\sigma
- Weak attraction (~ 1/r^6) for r > 2^{1/6}\,\sigma

[Figure: the Lennard-Jones potential (U) and force (F) as a function of
separation (r), for \epsilon = \sigma = 1.]
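A small Matlab sketch that reproduces such a plot in reduced units (base MATLAB only; the plotting range is an arbitrary choice):

% Sketch: Lennard-Jones potential and force in reduced units (eps = sigma = 1).
r = linspace(0.9, 3.5, 500);
U = 4*(r.^-12 - r.^-6);              % potential
F = 24*(2*r.^-13 - r.^-7);           % force, F = -dU/dr
plot(r, U, r, F); grid on;
xlabel('r'); legend('U(r)', 'F(r)');
axis([0.5 3.5 -2 2]);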

Reduced Units
- What is the evaporation temperature of a Lennard-Jones liquid?
- What is an appropriate timestep for integration of the equations of motion
  of Lennard-Jones particles?
- What is the density of a liquid/gas made up of Lennard-Jones molecules?

Number density:  \rho^* = \rho\,\sigma^3
Temperature:     T^* = kT/\epsilon
Pressure:        P^* = P\,\sigma^3/\epsilon
Time:            t^* = t \left/ \sqrt{m\,\sigma^2/\epsilon}\right.

In these units, numbers have physical significance:
- Results are valid for all Lennard-Jones molecules.
- It is easier to spot errors (10^{-32} must be wrong!).
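For instance, the 120 K argon runs in the homework can be converted to reduced units using literature Lennard-Jones parameters for argon; the numerical constants below are quoted as assumptions, not taken from these notes:

% Sketch: argon quantities in reduced units.
sigma = 3.405e-10;                   % m (assumed literature value)
epsk  = 119.8;                       % epsilon/k in K (assumed literature value)
kB    = 1.380649e-23;                % J/K
mAr   = 6.63e-26;                    % kg, mass of one argon atom
Tstar = 120/epsk;                    % reduced temperature of a 120 K run
tau   = sigma*sqrt(mAr/(kB*epsk));   % LJ time unit, roughly 2e-12 s
fprintf('T* = %.3f, time unit = %.3g s\n', Tstar, tau);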

Integration Algorithms
An integration algorithm should:
a) Be fast and require little memory
b) Permit the use of a long timestep
c) Duplicate the classical trajectory as closely as possible
d) Satisfy conservation of momentum and energy, and be time-reversible
e) Be simple and easy to program

Discussion of Integration Algorithms
a) Not very important: by far the most expensive part of the simulation is
   calculating the forces.
b) Very important: for a given total simulation time, the longer the timestep,
   the smaller the number of force evaluation calls.
c) Not very important: no numerical algorithm will provide the exact solution
   for long times (nearby trajectories deviate exponentially in time). Recall
   that MD is a method for obtaining time averages over all initial conditions
   under prescribed macroscopic constraints. Thus conserving momentum and
   energy is more important.
d) Very important (see c).
e) Important: there is no need for complexity when no speed gains are possible.

The Verlet Algorithm
One of the most popular methods for at least the first few decades of MD:

\vec{r}(t + \Delta t) = 2\vec{r}(t) - \vec{r}(t - \Delta t) + \Delta t^2\,\vec{a}(t), \qquad \vec{a}(t) = \frac{\vec{F}(t)}{m}

Derivation: Taylor series expansion of \vec{r}(t) about time t:
\vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \frac{1}{2}\Delta t^2\,\vec{a}(t) + \ldots
\vec{r}(t - \Delta t) = \vec{r}(t) - \Delta t\,\vec{V}(t) + \frac{1}{2}\Delta t^2\,\vec{a}(t) + \ldots
(adding the two expansions eliminates the velocity terms)

ADVANTAGES
1) Very compact and simple to program
2) Excellent energy conservation properties (helped by time-reversibility)

3) Time reversible: \vec{r}(t + \Delta t) \leftrightarrow \vec{r}(t - \Delta t)
4) Local error O(\Delta t^4)

DISADVANTAGES
1) Awkward handling of velocities:
   \vec{V}(t) = \frac{\vec{r}(t + \Delta t) - \vec{r}(t - \Delta t)}{2\Delta t}
   a) Need the \vec{r}(t + \Delta t) solution before getting \vec{V}(t)
   b) Error for velocities is O(\Delta t^2)
2) May be sensitive to truncation error, because in
   \vec{r}(t + \Delta t) = 2\vec{r}(t) - \vec{r}(t - \Delta t) + \Delta t^2\,\vec{a}(t)
   a small number is added to the difference of two large numbers.
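To make the update concrete, here is a minimal Matlab sketch (hypothetical parameters, not Moldy) that advances a small one-dimensional Lennard-Jones chain with the Verlet formula above, including the central-difference velocity estimate:

function verlet_demo
% Sketch: plain Verlet for a 1-D Lennard-Jones chain in reduced units.
N = 10; dt = 0.005; nsteps = 2000;
x = (1:N)'*1.1;                        % lattice start avoids particle overlap
v = 0.05*randn(N,1);
xold = x - dt*v;                       % bootstrap r(t - dt)
for step = 1:nsteps
  xnew = 2*x - xold + dt^2*accel(x);   % r(t+dt) = 2r(t) - r(t-dt) + dt^2 a(t)
  v = (xnew - xold)/(2*dt);            % O(dt^2) velocity estimate
  xold = x; x = xnew;
end
fprintf('kinetic energy per particle = %g\n', mean(v.^2)/2);
end

function a = accel(x)
% Pairwise Lennard-Jones acceleration (unit masses), O(N^2) double loop.
N = numel(x); a = zeros(N,1);
for i = 1:N-1
  for j = i+1:N
    r = x(j) - x(i);                   % assumes the chain order is preserved
    f = 24*(2/r^13 - 1/r^7);           % force on j due to i
    a(j) = a(j) + f; a(i) = a(i) - f;
  end
end
end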

Improvements To The Verlet Algorithm
Beeman Algorithm:

\vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \Delta t^2\,\frac{4\vec{a}(t) - \vec{a}(t - \Delta t)}{6}

\vec{V}(t + \Delta t) = \vec{V}(t) + \Delta t\,\frac{2\vec{a}(t + \Delta t) + 5\vec{a}(t) - \vec{a}(t - \Delta t)}{6}

- The coordinates are equivalent to the Verlet algorithm.
- \vec{V} is more accurate than Verlet.

Predictor Corrector Algorithms
Basic structure:
a) Predict positions, velocities, and accelerations at t + \Delta t.
b) Evaluate accelerations from the new positions and velocities (if forces are
   velocity dependent).
c) Correct the predicted positions, velocities, and accelerations using the
   new accelerations.
d) Go to (a).
Although these methods can be very accurate, the nature of MD simulations is
not well suited to them. The reason is that any prediction and correction which
does not take into account the motion of the neighbors is unreliable.

The concept of such a method is demonstrated here by the modified Beeman
algorithm, which handles velocity-dependent forces (discussed later):

a) \vec{r}(t + \Delta t) = \vec{r}(t) + \Delta t\,\vec{V}(t) + \frac{\Delta t^2}{6}\left[4\vec{a}(t) - \vec{a}(t - \Delta t)\right]
b) \vec{V}^P(t + \Delta t) = \vec{V}(t) + \frac{\Delta t}{2}\left[3\vec{a}(t) - \vec{a}(t - \Delta t)\right]
c) \vec{a}^P(t + \Delta t) = \frac{1}{m}\,\vec{F}\!\left(\vec{r}(t + \Delta t), \vec{V}(t + \Delta t)\right)
d) \vec{V}^C(t + \Delta t) = \vec{V}(t) + \frac{\Delta t}{6}\left[2\vec{a}(t + \Delta t) + 5\vec{a}(t) - \vec{a}(t - \Delta t)\right]
e) Replace \vec{V}^P with \vec{V}^C and go to (c).

If there are no velocity-dependent forces, this reduces to the Beeman method
discussed above.

Periodic Boundary Conditions
Periodic boundary conditions are very popular: they reduce surface effects.

[Figure adapted from Computer Simulation of Liquids, by M.P. Allen &
D.J. Tildesley, Oxford Science Publications, 1987.]

- Today's computers can easily treat N > 1000, so artifacts from small systems
  with periodic boundary conditions are limited.
- Equilibrium properties are unaffected.
- Long wavelengths are not possible.
- In today's implementations, each particle interacts with the closest images
  of the other molecules.

Evaluating Macroscopic Quantities
Macroscopic quantities (observables) are defined as appropriate averages of
microscopic quantities:

Temperature:  T = \frac{2}{3Nk}\sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i}

Density:  \rho = \frac{1}{V}\sum_{i=1}^{N} m_i

Pressure:  \Pi = \frac{1}{V}\left[\sum_i m_i \vec{V}_i \vec{V}_i + \sum_i \sum_{j>i} \vec{r}_{ij}\,\vec{F}_{ij}\right]

Macroscopic velocity:  \vec{u} = \frac{\sum_i \vec{P}_i}{\sum_i m_i}

If the system is not in equilibrium, these properties can be defined as a
function of space and time by averaging over extents (space, time) over which
the change is small.

Starting The Simulation Up
- Need initial conditions for the positions and velocities of all molecules in
  the system. Typically the initial density, temperature, and number of
  particles are known.
- Because of the highly non-linear particle interaction, starting at completely
  arbitrary states is almost never successful. If particle positions are
  initialized randomly, with overwhelming probability at least one pair of
  particles will be very close and lead to energy divergence.
- Velocity degrees of freedom can be safely initialized using the equilibrium
  distribution
  P(E) \propto \exp\!\left(-\frac{E}{kT}\right)
  because the additive nature of the kinetic energy,
  E_k = \sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i},
  leads to independent probability distributions for each particle velocity.
- Liquids are typically started in:
  - a crystalline solid structure that melts during the equilibration part of
    the simulation, or
  - a quasi-random structure that avoids particle overlap (see the Moldy
    manual for an example).

Equilibration
Because systems are typically initialized in the incorrect state, potential
energy is usually either converted into thermal energy or thermal energy is
consumed as the system equilibrates. In order to stop the temperature from
drifting, one of the following methods is used:

1) Velocity rescaling:
   \vec{V}_i' = \sqrt{\frac{T_d}{T}}\,\vec{V}_i
   where T_d is the desired temperature and
   T = \frac{2}{3Nk}\sum_{i=1}^{N} \frac{\vec{P}_i^{\,2}}{2 m_i}
   is the instantaneous temperature. This is the simplest and most crude way
   of controlling the temperature of the simulation.
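In Matlab the rescaling step is essentially one line; a minimal sketch in reduced units (k = 1, unit masses; the variable names and values are assumptions):

N = 100; Td = 1.2;                 % desired reduced temperature (assumed)
v = randn(N,3);                    % velocities from an arbitrary start
T = mean(sum(v.^2, 2))/3;          % instantaneous T = (2/(3N)) * sum(v^2/2)
v = v*sqrt(Td/T);                  % after this line the instantaneous T is Td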

2) Thermostat. A thermostat is a method to keep the temperature


constant by introducing it as a constraint into the equations of motion.
Thermostats will be described under Constrained Dynamics.


Long Range Forces, Cutoffs, And Neighbor Lists
- The Lennard-Jones potential decays as r^{-6}, which is reasonably fast.
  However, the number of neighbors interacting with a particle grows as r^3.
- The interaction is thus cut off at some distance r_c to limit the
  computational cost.
- The quantities most sensitive to the long-range forces (e.g. surface
  tension) are usually calculated with r_c of approximately 10; typical
  calculations for hydrodynamics use r_c of approximately 2.5 (in units
  of \sigma).
- Electrostatic interactions require special methods (multipole expansions,
  Ewald sums) -- see the Moldy manual.

Although system behavior (properties: equation of state, transport
coefficients, latent heat, elastic constants) is affected by r_c, the new
values can be measured, if required.
The simplest cut-off approach is a truncation:

U_{tr}(r) = U(r) for r <= r_c,   U_{tr}(r) = 0 for r > r_c

This is not favored because U_{tr}(r) is discontinuous, so it does not
conserve energy. This is fixed by the truncated and shifted potential:

U_{tr-sh}(r) = U(r) - U(r_c) for r <= r_c,   U_{tr-sh}(r) = 0 for r > r_c
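As a sketch, the truncated and shifted Lennard-Jones potential is one anonymous function in Matlab (reduced units; the cutoff value is the hydrodynamics choice quoted above):

rc = 2.5;
Urc = 4*(rc^-12 - rc^-6);                           % potential at the cutoff
Utrsh = @(r) (4*(r.^-12 - r.^-6) - Urc).*(r <= rc);
Utrsh(rc)                                           % returns 0: continuous at rc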

Even with a small cut-off value, the calculation cost is proportional to N^2
because of the need to examine all pair separations. The cost is reduced by
neighbor lists:
- Verlet list
- Cell index method

Verlet Neighbor Lists
- Keep an expanded neighbor list with radius r_l > r_c, so that neighbor pairs
  need not be recalculated every timestep.
- r_l is chosen such that the list need only be rebuilt every 10-20 timesteps.
- Good for N < 1000 (otherwise too much storage is required).

[Figure: a particle with its cut-off radius r_c and list radius r_l; adapted
from Computer Simulation of Liquids, by M.P. Allen & D.J. Tildesley, Oxford
Science Publications, 1987.]

Cell-index Method
Divide the simulation domain into m subcells in each direction (here m = 5).
Search only the sub-cells within the cut-off (conservative).
Example: if the sub-cell size is larger than the cut-off, for cell 13 only
cells 7, 8, 9, 12, 13, 14, 17, 18, 19 need to be searched.

[Figure: a 5 x 5 grid of subcells numbered 1-25.]

In two dimensions the cost is 4.5\,N N_c, where N_c = N/m^2, instead of
\frac{1}{2}N(N-1). In three dimensions the cost is 13.5\,N N_c (with N_c
appropriately redefined) instead of \frac{1}{2}N(N-1).

Constraint Methods

Newtonian molecular dynamics conserves system energy, volume and


number of particles.

Engineering-physical systems typically operate at constant pressure,


temperature and exchange mass (i.e. they are open).

Methods to simulate these have been proposed*.

* These methods are capable of providing the correct statistical


description in equilibrium. Away from equilibrium there is no
proof that these methods provide the correct physics.
Therefore they need to be used with care, especially the crude
ones such as rescaling.


Constant Temperature Simulations

Newtons equations conserve energy and temperature is a variable.

In most calculations of practical interest we would like to prescribe temperature


(in reality, reservoirs, such as the atmosphere, interact with systems of interest
and keep temperature constant).

Similar considerations apply for pressure. We would like to perform constant


pressure calculations with variable system volume.

Three main types of approaches


-Velocity rescaling (crude)


- Extended system methods (one or more extra degrees of freedom are added to
  represent the reservoir). The equations of motion are
  \frac{d\vec{r}_i}{dt} = \frac{\vec{P}_i}{m}, \qquad \frac{d\vec{P}_i}{dt} = \vec{F}_i - f\,\vec{P}_i
  where f is a dynamical variable given by
  \frac{df}{dt} = \frac{kg}{Q}\,(T - T_d)
  g = number of degrees of freedom
  Q = reservoir mass (an adjustable parameter that controls the equilibration
  dynamics)
  This method is known as the Nose-Hoover thermostat.

- Constraint methods (the equations of motion are constrained to lie on a
  hyperplane in phase space). In this case the constraint is
  \frac{d}{dt}T \propto \frac{d}{dt}\sum_i \vec{P}_i^{\,2} = 0
  This leads to the following equations of motion:
  \frac{d\vec{r}_i}{dt} = \frac{\vec{P}_i}{m}, \qquad \frac{d\vec{P}_i}{dt} = \vec{F}_i - \lambda\,\vec{P}_i, \qquad \lambda = \frac{\sum_{i=1}^{N} \vec{P}_i \cdot \vec{F}_i}{\sum_{i=1}^{N} \vec{P}_i^{\,2}}
  where \lambda is a Lagrange multiplier (not a dynamical variable).
  A discussion of these and constant-pressure methods can be found in the
  Moldy manual. Note the velocity-dependent forces.
41

Homework Discussion

Use (and modify) control file control.argon. You can run short familiarization
runs with this file.
Control.argon calls the argon.in system specification file. Increase the number
of particles to at least 1000 (reduce small system effects and noise).
Note that density in control.argon is in units equivalent to
g/cm3 (= 1000 Kg/m3).
Reduce noise by increasing number of particles and/or averaging interval
(average-interval).
Make sure that temperature does not drift significantly from the original
temperature. You may want to rescale (scale-interval) more often and stop
rescaling (scale-end) later. Make sure you are not sampling while
rescaling!
By setting rdf-interval = 0 you disable the radial distribution function
calculation which makes the output file more readable.

Sample control file

#
# Sample control file. Different from the one provided with code.
#
sys-spec-file=argon.in
density=1.335
#density=1335 Kg/m3
temperature=120
#initial temperature=120K
scale-interval=2
#rescale velocities every 2 timesteps
scale-end=5000
#stop rescaling velocities at timestep 5000
nsteps=25000
#run a total of 25000 timesteps
print-interval=500 #print results every 500 timesteps
roll-interval=500
#keep rolling average results every 500 timesteps
begin-average=5001 #begin averaging at timestep 5001
average-interval=10000 #average for 10000 timesteps
step=0.01
#timestep 0.01ps
subcell=2
#subcell size for linked-cell method
strict-cutoff=1
#calculate forces using strict O(N^2) algorithm
cutoff=10.2
#3 * sigma
begin-rdf=2501 #begin radial distribution function calculation at timestep 2501
rdf-interval=0
rdf-out=2500

Introduction to Numerical Simulation (Fall 2003)

Problem Set #1 - due September 12
This problem set is mainly intended to familiarize you with the algorithms associated with formulating a system of equations from a given problem description, and also to remind you of eigenvalues
and eigenvectors. We have used circuits and heat flow examples, and will use struts and joints
later in the term when we discuss nonlinear systems.
In problems (2) and (3), you will be using Matlab and modifying scripts that we have written for
you. You can download the files from the course web page.

1) This problem is intended to reinforce your understanding of the nodal analysis and node-branch
equation formulation techniques.
Consider the following simple circuit.

[Circuit figure: a current source Is driving a network of resistors R1, R2,
and R3, with node 0 as ground.]

a) Apply nodal analysis to generate a linear system of equations which can be used to compute the
circuit node voltages. Where appropriate, please give matrix or vector entries as analytical formulas
in terms of R1, R2, R3 and Is.
b) Use the node-branch approach to form a linear system of equations which can be used to compute
the circuit node voltages and resistor currents. Where appropriate, please give matrix or vector
entries as analytical formulas in terms of R1, R2, R3 and Is.
2) In order to help you better understand how to create a program which reads in schematics and
generates systems of equations, we have written a set of matlab functions and scripts which can
be used to read in a file containing a circuit description and set up the associated nodal analysis
matrix and the appropriate right-hand side. In this problem, you will modify these functions and
scripts, so bear with us while we describe what they do.
The matlab script assumes that the circuit description is given as a list of statements in a file, and
that the element types are described in the following format.

resistors:
    rlabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the resistor's resistance (a floating point number).

current sources:
    ilabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the current flowing from node1 to node2.

voltage sources:
    vlabel node1 node2 val
where label is an arbitrary label, node1 and node2 are integer circuit node numbers, and val is
the voltage between node1 and node2.

voltage-controlled current sources:
    clabel node1 node2 node3 node4 val
where label is an arbitrary label, node1, node2, node3, and node4 are integer circuit node
numbers, and val is the controlled source's transconductance (denoted gm), which means that the
current flowing from node1 to node2 is val times the voltage difference between node3 and node4
(i.e. i = gm (v3 - v4)). Note, ground is always node 0.
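For orientation, nodal analysis builds the conductance matrix by "stamping" each element; a hypothetical helper in the spirit of loadMatrix.m (the function name and calling convention are illustrative, not the course code) would stamp one resistor as follows:

function [G, b] = stamp_resistor(G, b, n1, n2, R)
% Sketch: nodal-analysis stamp for a resistor between nodes n1 and n2.
% Ground is node 0 and has no row or column in G.
g = 1/R;
if n1 > 0, G(n1,n1) = G(n1,n1) + g; end
if n2 > 0, G(n2,n2) = G(n2,n2) + g; end
if n1 > 0 && n2 > 0
  G(n1,n2) = G(n1,n2) - g;
  G(n2,n1) = G(n2,n1) - g;
end
end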
To use the supplied scripts to solve our supplied example circuit file, test.ckt, first start up matlab.
When in matlab, type file = 'test.ckt' to specify the name of the input file. Then type readckt to run
the script in the file readckt.m. This will read the file and put the information in arrays. Finally,
type loadMatrix to run the function in loadMatrix.m; this will create the conductance matrix and
the right-hand side associated with the circuit in test.ckt. To determine the vector of node voltages,
type v = G\b.
The scripts we have provided have an implementation of nodal analysis for resistors, current sources
and voltage-controlled current sources. Your job will be to extend the implementation to include
voltage sources for the special case where one terminal of the voltage source is connected to ground.
To accomplish this, you will need to modify only the file loadMatrix.m.
You could, of course, switch the simulator so that it uses node-branch analysis. However, for
networks of two-terminal positive resistors, the conductance matrix has a structure that is advantageous for numerical calculations (positive diagonals, negative off-diagonals, symmetry, diagonal
dominance). Assuming that one terminal of the voltage source is connected to ground, it is possible
to implement voltage sources in your Matlab simulator in such a way that you will still preserve the
above properties of the conductance matrix. Please implement such a scheme in your Matlab simulator. Only implement what is needed for resistors, current sources, and grounded voltage sources;
do not bother with voltage-controlled current sources. Be sure to test your modified simulator; you
will need a working version for problem (3).
Some Helpful Notes.
It will be necessary to make contributions to the right-hand side (RHS) vector for each resistor that
is connected to a voltage source, and the matrix will require modifications as well. In calculating
these modifications, you will find the array sourcenodes, generated by readckt, helpful.
To make the simulator easier to debug, the node numbers in input files correspond to the row
numbers in the generated conductance matrix. If the input file node numbers are not contiguous,
or if voltage source nodes are not last, there will be rows in the resulting G matrix with only a
single one on the diagonal.
3) In this problem we will examine the heat-conducting bar basic example, but will consider the
case of a "leaky" bar to give you practice developing a numerical technique for a new physical
problem.
With an appropriate input file, the simulator you developed in problem 2 can be used to solve
numerically the one-dimensional Poisson equation with arbitrary boundary conditions. The Poisson
equation can be used to determine the steady-state temperature distribution in a heat-conducting
bar, as in

\frac{\partial^2 T(x)}{\partial x^2} = -\frac{H(x)}{\kappa_m} + \frac{\kappa_a}{\kappa_m}\,(T(x) - T_0)    (1)

where T(x) is the temperature at a point in space x, H(x) is the heat generated at x, \kappa_m is the
thermal conductivity along the metal bar, and \kappa_a is the thermal conductivity from the bar to the
surrounding air. The temperature T_0 is the surrounding air temperature. The ratio \kappa_a/\kappa_m will be
small, as heat moves much more easily along the bar than dissipates from the bar into the
surrounding air.
Now suppose one is trying to decide if it is necessary to have heat sink (a heat sink is usually just
a large chunk of metal which dissipates heat rapidly enough to stay close to room temperature)
connections at the ends of an electronic package. You can use simulation to help you make this
decision.
a) Use your Matlab simulator to numerically solve the above Poisson equation for T(x), x in [0, 1],
given H(x) = 50 for x in [0, 1], \kappa_a = 0.001, and \kappa_m = 0.1. In addition, assume the ambient air
temperature is T_0 = 350, and T(0) = 300 and T(1) = 300. The boundary conditions at x = 0 and
x = 1 model heat sink connections to a cool metal cabinet at both ends of the package. That is,
it is assumed that the heat sink connections will ensure both ends of the package are fixed at near
room temperature. In your numerical calculation, how did you pick \Delta x? How do you know your
solution is accurate?
b) Now use your simulator to numerically solve the above equation for T(x), x in [0, 1], given
H(x) = 50 for x in [0, 1], \kappa_a = 0.001, and \kappa_m = 0.1. In addition, assume the ambient air temperature
is T_0 = 350, and that T(0) and T(1) are unknown but \partial T/\partial x(0) = 0 and \partial T/\partial x(1) = 0. The zero-heat-flow
boundary conditions at x = 0 and x = 1 imply that there are no heat sinks at either end of the
package. Compare your results to part (a).
c) For the case examined in part (b) above, what will happen if \kappa_a is identically zero? Can you solve
this problem? Can you find a reasonable solution by examining what happens as \kappa_a approaches
zero?
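As a sanity check on your choice of \Delta x, equation (1) can also be discretized directly with centered differences; the following self-contained Matlab sketch (an illustration independent of the circuit simulator) solves the fixed-temperature case of part (a):

% Sketch: centered-difference discretization of the leaky bar, part (a) data.
km = 0.1; ka = 0.001; T0 = 350; H = 50;
M = 99; dx = 1/(M+1);                        % M interior nodes on [0,1]
main = (2*km/dx^2 + ka)*ones(M,1);           % from -km*T'' + ka*T = H + ka*T0
A = spdiags([-km/dx^2*ones(M,1), main, -km/dx^2*ones(M,1)], -1:1, M, M);
rhs = (H + ka*T0)*ones(M,1);
rhs(1)   = rhs(1)   + km/dx^2*300;           % boundary condition T(0) = 300
rhs(end) = rhs(end) + km/dx^2*300;           % boundary condition T(1) = 300
T = A \ rhs;
plot(linspace(dx, 1-dx, M), T); xlabel('x'); ylabel('T(x)');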
4) This last problem uses the simulator you developed in the previous question to remind you about
some of the properties of eigenvalues and eigenvectors before we use them in lecture. Be sure to
familiarize yourself with MATLAB's eig command before you start.
a) Consider the leaky bar system in question 3, assume T_0 = T(0) = T(1) = 0 and use the thermal
conductivity values in part 3a). For this system, find a heat distribution, H(x), such that
max_x H(x) = 1 and, when you solve for the temperature,
T(x) = \lambda\,H(x)
where \lambda is a real number. You need only plot H(x) and T(x); you do not need an analytical
formula.
b) For the problem in part 4a, how many different H(x)'s and \lambda's are there?
c) Suppose \kappa_a = 0, but all the other settings in part 3a) hold. How do the H(x)'s and \lambda's which
satisfy
T(x) = \lambda\,H(x)
change? Please explain your results.
Introduction to Numerical Simulation (Fall 2003)

Problem Set #3 - due September 26th
Note: This problem set has only one problem, in order to give everyone a little time to catch up.
It does not cover the difference between modified and unmodified QR. We will cover that issue in
the next problem set.
1) In this problem, you will write your own factorization algorithm based on row orthogonalization.
Such an approach makes it simpler to do numerical pivoting and is more easily compared to LU
factorization.
a) Write a matlab program for solving Mx = b, where M is square, which is based on making the
rows of M into a set of orthonormal vectors. Please remember to normalize the rows so that they
correspond to vectors of unit length. You may find it helpful to look at our qr.m matlab program,
which is based on orthogonalizing columns.
b) Consider using your row orthogonalization algorithm to solve a tridiagonal matrix (a tridiagonal
matrix is one whose only nonzero entries are M_{i,i}, M_{i,i+1}, and M_{i,i-1}). Compare the operation
counts for sparse orthogonalization and sparse LU factorization of an n x n tridiagonal matrix.
c) If a row that is about to be normalized in your row orthogonalization algorithm corresponds
to a vector with a very short length, then the normalization will increase the size of those matrix
elements, contributing to matrix growth and worsening round-off errors. It might be better to
exchange the unnormalized row with one of the other rows that has yet to be normalized, preferably
the one with the largest associated length. Such a row exchange is analogous to the row exchange
used in partial pivoting for LU factorization. Please modify your matlab row orthogonalization
program to perform such a pivoting algorithm.
d) What will happen if you apply your row orthogonalization algorithm with pivoting to a problem
in which M is singular? Please test your code on a singular example.
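For orientation, here is a minimal unpivoted sketch of the idea (one reasonable design under stated assumptions, not the required solution): the Gram-Schmidt row operations applied to M are mirrored on b, after which the resulting orthonormal-row system Qx = c is solved with a transpose.

function x = rowqr_solve(M, b)
% Sketch: solve Mx = b by orthonormalizing the rows of M (no pivoting).
n = size(M,1);
for i = 1:n
  for j = 1:i-1                     % subtract projections on earlier rows
    proj = M(j,:)*M(i,:)';
    M(i,:) = M(i,:) - proj*M(j,:);
    b(i)   = b(i)   - proj*b(j);
  end
  len = norm(M(i,:));               % normalize the row to unit length
  M(i,:) = M(i,:)/len;
  b(i)   = b(i)/len;
end
x = M'*b;                           % rows orthonormal => inv(M) = M'
end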


a\c![ rt^X_N[FVXQ_N[FW*VX\Q\tzfqr^}qNqrtYz6\[>S*U!_NWF__Or`eYO^X_NWrQ!uQ!\tSiVXq_TU!\d`TrtQa\z#S*U!_`ertSi[FVo_NQ8Si[FVX_W
U+r_g_HOrqS*^}aSiUO_gWirt`e_ftr^Xc!_tx+"gW|r3z6c!QOqSiV}\QT\tz
A&UO\d`erQ8ac!Q!V}c!_Mtr^}c!_NWr[*_SiU!_[*_MVXQSiU!_M`TrtS*[*V}qN_NW
rWFW*\7qNVsrSi_NudfVoSiUSiU!_fc!QOV}z6\[F`e^oau!VXWFqN[*_HSiV}N_NuIW*c+rt[*_fY!^srSi_NW,#-B^}_rWF_hV}_frtQI_&Y!^srQ!rtSiV}\Qz6\[5a\c![[*_W*c!^oSiWNx
U /.r_Tr~WFY_qNVsrt^Z_N[*WFVX\Q\z-a\c![ r^X_[*VXQOVXN_u%qr^}qNqrtYSiU!rtS\QO^}aDd\[*Wkz6\[>c!QOV}z6\[F`e^oaDu!V}W*qN[F_S*VXN_u
/
W*c+rt[*_Y!^XrtSi_WNrQ!uDc!WF_kSiUO_k\]!WF_N[wrSiVX\Qa\c`TruO_kVXQ h-Si\T`e\[F_>_0IqNVX_Q8Si^}aq\`eYOcOSi_3SiU!_q\&_0IqNVX_Q8S
`ertSi[FVoez6\[#SiU!V}W#W*Y_NqNVXr^$WF_SM\z5YO[*\]!^}_N`IWNx
V{
7c!Y!Y\W*_Ba\c>Q!\dS*[FaMSi\fW*\^o_BSiUO_Z^}VXQ!_Nr[$Wwa&WwSi_`z6\[$SiU!_qU+rt[*h_B_qS*\[1|]ac!W*V}Q!h#SiU!_ '&r^Xh\[*VoSiU!`x
-rQDa\cAc!W*_>SiU!_\]!W*_[FtrtSiV}\QDV}Q h8fSi\TW*c!]!WwSrQ8S*Vsr^}^}a[*_u!c!q_SiU!_`I_N`I\[waQ!_N_u!_NuASi\TW*\^o_c!QOV}z6\[F`e^oa
u!V}W*qN[F_S*VXN_u~WFc+r[*_3Y!^XrtSi_W%#

Course 6.336 Introduction to Numerical Algorithms (Fall 2003)


Solutions to Problem Set #2

Problem 2.1
a) Assume, for the sake of simplicity, that all resistors in the line are of resistance R. The
structure of the N x N conductance matrix G is:

G = \begin{pmatrix}
2/R  & -1/R &      &      &      \\
-1/R & 2/R  & -1/R &      &      \\
     & \ddots & \ddots & \ddots & \\
     &      & -1/R & 2/R  & -1/R \\
     &      &      & -1/R & 2/R
\end{pmatrix}

(blank entries are zero). The matrix G is a tridiagonal matrix, i.e. a band matrix with one
subdiagonal and one superdiagonal. By inspection, the number of nonzero entries in G is
N + 2(N - 1) = 3N - 2.
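As a quick sanity check of this count, here is a small Matlab sketch (assuming R = 1, a value not
specified here) that builds G with the sparse-matrix tools and counts its nonzeros:

n = 1000; R = 1;
e = ones(n,1);
G = spdiags([-e/R, 2*e/R, -e/R], -1:1, n, n);  % tridiagonal conductance matrix
nnz(G)                                         % returns 3*n - 2 = 2998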

b) The matrix problem for the resistor line, written in terms of the resistance matrix G^{-1},
is G^{-1} i = v, where i is the vector of current-source currents flowing into each of the nodes,
and v is the vector of node voltages. For our original resistor line, i is a zero vector.
Suppose now that the jth entry of the vector i is nonzero. Physically, an injection of
current into node j will cause a change in all the node voltages. The jth entry of vector
i multiplies only the jth column of G^{-1}. So a change in all the node voltages in v will
be algebraically possible only if the jth column of G^{-1} consists of all nonzero entries, i.e.
(G^{-1})_{ij} \ne 0 for all i.

By extending this argument to all entries of the current source vector (and all columns of
the resistance matrix), we see that the N x N resistance matrix G^{-1} is full, i.e. will have
N^2 nonzero entries.
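This fill-in is easy to see numerically; a sketch (again assuming R = 1, with a small N = 5):

G = full(spdiags([-ones(5,1), 2*ones(5,1), -ones(5,1)], -1:1, 5, 5));
Rmat = inv(G);   % the 5-by-5 resistance matrix
nnz(Rmat)        % returns 25 = N^2: every entry is nonzero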
c) The factorization of the tridiagonal conductance matrix G produces two bidiagonal
factors L and U, such that LU = G. In order to see this, let's examine the first few elimination
steps for the matrix G.
After the first elimination step we get:
G^{(1)} = \begin{pmatrix}
2/R & -1/R   &      &        &      \\
0   & 3/(2R) & -1/R &        &      \\
    & -1/R   & 2/R  & \ddots &      \\
    &        & \ddots & \ddots & -1/R \\
    &        &      & -1/R   & 2/R
\end{pmatrix}
And after the second:

G^{(2)} = \begin{pmatrix}
2/R & -1/R   &        &        &        &      \\
0   & 3/(2R) & -1/R   &        &        &      \\
    & 0      & 4/(3R) & -1/R   &        &      \\
    &        & -1/R   & 2/R    & \ddots &      \\
    &        &        & \ddots & \ddots & -1/R \\
    &        &        &        & -1/R   & 2/R
\end{pmatrix}

Each elimination step targets only one row in the tridiagonal matrix G. In addition, the
triangular block of zeros in the upper-right corner of the matrix remains untouched. Thus,
after all N - 1 elimination steps, the L matrix will feature ones on the main diagonal and
the N - 1 multipliers on the subdiagonal. The U matrix will also be bidiagonal, with the
pivots on the main diagonal and the -1/R's on the superdiagonal. It follows that the number
of nonzero entries in L or U is N + (N - 1) = 2N - 1.

For N = 1000 the number of nonzero entries in G^{-1} is 1,000,000, while L and U will each
contain only 1999 nonzero entries. It is not a good idea to use the inverse of a matrix
for solving the matrix problem, because the number of required multiplications is
proportional to the number of nonzero entries, and for the inverse that number is excessive.
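A minimal sketch of this bidiagonal factorization (no pivoting is needed because G is diagonally
dominant; R = 1 and N = 5 are assumed values). The pivots follow the pattern 2/R, 3/(2R),
4/(3R), ... seen in the elimination steps above:

n = 5; R = 1;
a = (2/R)*ones(n,1);      % diagonal of G
b = (-1/R)*ones(n-1,1);   % sub/superdiagonal of G
p = zeros(n,1); m = zeros(n-1,1);
p(1) = a(1);
for i = 2:n
    m(i-1) = b(i-1)/p(i-1);        % multiplier (subdiagonal of L)
    p(i)   = a(i) - m(i-1)*b(i-1); % pivot (diagonal of U)
end
p'   % returns 2, 3/2, 4/3, 5/4, 6/5 for R = 1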
d) To determine the smallest entry in the resistance matrix, let's use the fact that the entry
r_{ij} of the resistance matrix is simply the voltage at node i caused by a unit current source
connected to node j. We are looking for the smallest entry, i.e. the case where that voltage is
minimal. You can easily figure out that we should put the current source at one end of the line
and examine the voltage at the other end of the line. In other words, the smallest element
of an N x N resistance matrix is always r_{1N} or r_{N1}; they are equal, since matrix inversion
preserves symmetry.
Now imagine our line with the current source connected to the first node. The voltage at node
N is evidently (taking R = 1)

r_{1N} = \frac{1}{N+1}    (1)

Note that an N x N matrix corresponds to a resistor line with N + 1 resistors.
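A quick numerical check of (1), assuming R = 1 and N = 10:

N = 10; e = ones(N,1);
G = full(spdiags([-e, 2*e, -e], -1:1, N, N));
Rmat = inv(G);
Rmat(1,N)      % returns 1/(N+1) = 0.0909...
min(Rmat(:))   % the same value: the corner entries are the smallest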

Problem 2.2
a) It is very easy to find a counterexample; just remember the formula for, say, the second
pivot (where \tilde{a}_{22} denotes the updated entry after one elimination step):

\tilde{a}_{22} = a_{22} - a_{21} \frac{a_{12}}{a_{11}}    (2)

Evidently, there is no guarantee that |\tilde{a}_{22}| \le |a_{22}|. For instance, with
a_{11} = 1, a_{12} = a_{21} = 2 and a_{22} = 1, equation (2) gives \tilde{a}_{22} = 1 - 4 = -3,
so |\tilde{a}_{22}| = 3 > |a_{22}| = 1.
b) The statement is true. For example, (2) implies that |\tilde{a}_{22}| \le |a_{22}|. Therefore
we have proven the statement for order 2. To prove the statement in the general case, we use
mathematical induction. Let's assume that we have proven our statement for order
N - 1. Now we need to show that, for a matrix of order N, after eliminating the first row
from all subsequent rows, the resulting (N-1) x (N-1) submatrix:
1. will have all positive diagonal entries, no larger than the original diagonal entries,
2. will have negative off-diagonals,
3. will be strictly diagonally dominant,
4. will be tridiagonal.
The first statement we have already shown, since all multipliers except for M_{21} are
zero. The second one is also trivial, and the same goes for the last one.
Now, let's show that the third statement holds. Since the initial matrix is strictly diagonally
dominant, we know that |a_{22}| > |a_{21}| + |a_{23}|, and |a_{11}| > |a_{12}|. The only thing
we need to show is that the number we subtract from a_{22} is less than |a_{21}|, which is also
evident. Therefore we have:

|\tilde{a}_{22}| > |a_{22}| - |a_{21}| > |a_{21}| + |a_{23}| - |a_{21}| = |\tilde{a}_{23}|    (3)

This proves the statement in the general case.
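A small numeric illustration of one induction step (my example matrix, not from the problem set):
eliminating the first row of a strictly diagonally dominant tridiagonal matrix with negative
off-diagonals shrinks the (2,2) pivot but preserves dominance:

A = [ 4 -1  0 ;
     -2  5 -1 ;
      0 -1  3 ];
m21 = A(2,1)/A(1,1);
A(2,:) = A(2,:) - m21*A(1,:);  % eliminate first column from row 2
A(2,2)                         % returns 4.5: smaller than 5, still > |A(2,3)|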

Problem 2.3
a) For N(y, k) = I + y e_k^T and given k, the matrix N structurally looks like (only column k
differs from the identity):

N(y, k) = \begin{pmatrix}
1 & \cdots & y_1     & \cdots & 0 \\
  & \ddots & \vdots  &        &   \\
0 & \cdots & 1 + y_k & \cdots & 0 \\
  &        & \vdots  & \ddots &   \\
0 & \cdots & y_n     & \cdots & 1
\end{pmatrix}

where y = [y_1 \cdots y_n]^T. A simple check will help you verify that N^{-1} is structurally
similar to N. Let

N^{-1} = \begin{pmatrix}
1 & \cdots & w_1    & \cdots & 0 \\
  & \ddots & \vdots &        &   \\
0 & \cdots & w_k    & \cdots & 0 \\
  &        & \vdots & \ddots &   \\
0 & \cdots & w_n    & \cdots & 1
\end{pmatrix}

Furthermore, since N N^{-1} = I we get the following system in n unknowns:

w_1 + y_1 w_k = 0
w_2 + y_2 w_k = 0
\vdots
(1 + y_k) w_k = 1
\vdots
w_n + y_n w_k = 0,

or equivalently N w = e_k. Solving this system for w = [w_1 \cdots w_n]^T gives a formula for
obtaining N^{-1}:

N^{-1} = \begin{pmatrix}
1 & \cdots & -y_1/(1 + y_k) & \cdots & 0 \\
  & \ddots & \vdots         &        &   \\
0 & \cdots & 1/(1 + y_k)    & \cdots & 0 \\
  &        & \vdots         & \ddots &   \\
0 & \cdots & -y_n/(1 + y_k) & \cdots & 1
\end{pmatrix}

b) Using the result in part a) we see that

N(y, k) x = e_k

or equivalently

(I + y e_k^T) x = e_k,

from which we get successively

x + x_k y = e_k

and

y = \frac{1}{x_k} (e_k - x).
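A short sketch verifying part (b) numerically (my example vector; any x with x(k) \ne 0 works):

n = 4; k = 2;
x = [3; 5; -2; 1];
ek = zeros(n,1); ek(k) = 1;
y = (ek - x)/x(k);         % the formula from part (b)
Nk = eye(n) + y*ek';       % N(y,k) as a rank-one update of the identity
Nk*x                       % returns e_k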

c) Say we wish to find the inverse of the matrix A,

A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}

Let us write A as

A = \begin{pmatrix} x_1^{(0)} & x_2^{(0)} & \cdots & x_n^{(0)} \end{pmatrix}

where x_j^{(0)} denotes the jth column of A at step 0 of our (yet-to-be derived) matrix inversion
algorithm.
In part (b) we showed how to find a matrix N such that N x = e_k for a given x. Let us
denote this matrix by N(x, k), so that N(x, k) x = e_k. In the first step of the algorithm we
compute N(x_1^{(0)}, 1) and multiply it into A, so that

N(x_1^{(0)}, 1) A = \begin{pmatrix} e_1 & x_2^{(1)} & \cdots & x_n^{(1)} \end{pmatrix}

where

x_j^{(1)} = N(x_1^{(0)}, 1) x_j^{(0)}

We now multiply N(x_1^{(0)}, 1) A by N(x_2^{(1)}, 2), and so forth, so that after k steps we have

N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) A =
\begin{pmatrix} e_1 & \cdots & e_k & x_{k+1}^{(k)} & \cdots & x_n^{(k)} \end{pmatrix}

where the column vectors at step k are

x_j^{(k)} = N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) x_j^{(0)}

It is fairly easy to convince yourself that multiplication by N(x_k^{(k-1)}, k) does not affect the
first k - 1 columns of the matrix, since the first k - 1 columns of both N(x_k^{(k-1)}, k) and
N(x_{k-1}^{(k-2)}, k-1) \cdots N(x_1^{(0)}, 1) A are the identity vectors e_j. Thus,

A^{-1} = N(x_n^{(n-1)}, n) \cdots N(x_1^{(0)}, 1)

What makes the algorithm in-place is that since x_j^{(k)} = e_j for j \le k, we no longer have to
store those vectors. Also, column l of N(x_k^{(k-1)}, k) \cdots N(x_1^{(0)}, 1) is e_l for l > k.
Thus, we can accumulate A^{-1} in the space occupied by the columns of A that have already been
reduced to unit vectors.

Note that if at some step k, x_k^{(k-1)}[k] = 0, then this algorithm will fail. The solution is to
pivot by swapping columns in the matrix. I haven't included code that performs pivoting,
because that is the subject of the next problem set.
Some of you noticed that this is essentially the procedure known as Jordan elimination.
Jordan elimination is in some sense an extension of Gaussian elimination to the extent that
at each point in the elimination the elements on previous pivotal rows are also eliminated.
The following code implements these ideas by explicitly forming all the required products.
%in-place invert a matrix A
n = size(A,1);
for i=1:n,
    % Form y so that N(y,i)*x_i = e_i, as in part (b): y = (e_i - x_i)/x_i(i)
    y = -A(:,i);
    y(i) = y(i) + 1.0;
    y = y / A(i,i);
    % Apply N(y,i) to the not-yet-reduced columns i+1:n
    for k = i+1:n;
        m = A(i,k);
        for j=1:n;
            A(j,k) = A(j,k) + m * y(j);
        end;
    end;
    % Store column i of N(y,i), i.e. y + e_i, in place of the reduced column i
    A(:,i) = y;
    A(i,i) = A(i,i) + 1.0;
    % Apply N(y,i) to the columns of the inverse accumulated so far
    for k = 1:i-1;
        m = A(i,k);
        for j=1:n;
            A(j,k) = A(j,k) + m * y(j);
        end;
    end;
end;

On a Sun Sparc10, this routine is about 500 times slower than Matlab's built-in inv()
function.
A more efficient code would operate on the matrix by columns:
%in-place invert a matrix A; vectorized
n = size(A,1);
for i=1:n,
    y = -A(:,i);
    y(i) = y(i) + 1.0;
    y = y / A(i,i);
    % the rank-one column updates, now expressed as whole-vector operations
    for k = i+1:n;
        A(:,k) = A(:,k) + A(i,k) * y;
    end;
    A(:,i) = y;
    A(i,i) = A(i,i) + 1.0;
    for k = 1:i-1;
        A(:,k) = A(:,k) + A(i,k) * y;
    end;
end;
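As a usage sketch, suppose the loop above is wrapped in a function file, say invplace.m (a
hypothetical name, not part of the distributed code); its output can then be checked against
the true inverse:

A = rand(100) + 100*eye(100);    % well conditioned, so no pivoting is needed
Ainv = invplace(A);              % hypothetical wrapper around the loop above
norm(Ainv*A - eye(100))          % should be near machine precision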

This routine is significantly faster; it is only a factor of 10-15 slower than Matlab. Regardless
of how fast the machine is, the fact that direct matrix inversion takes O(N^3) operations
limits the size of the problem we can solve in a reasonable time. Matlab took roughly 2.7
seconds to invert an N = 250 matrix. In a month, it could probably do N = 25,000; in a
year, about N = 50,000. The first algorithm above could probably handle only N = 3000
in a month, N = 7000 in a year.
