Lecture Notes 02 PDF

Design Problem Formalization
Necessary Entities for Design Formalization

To formalize a parametric design problem, five design entities
must first be determined and then incorporated into the design
formulation through appropriate modeling:
1 Design Component (parameter/variable)
2 Design Attribute (must/wish)
3 Functional Requirement
4 Design Function
5 Design Availability (if any)
Given a design problem, you need to come up with these five
necessary entities in a 5-step procedure, i.e., (1) - (5).

© Dr. R. Perez, RMC ME503, Adv. Dsg. Eng. Sys.
(1) Design Component (parameter/variable)

A design component is a factor that needs to be configured in a
design problem.
Design Variable (defined as a design component that will
be determined during the design process, e.g., geometric
dimensions.)
A set of design variables forms the vector of design
variables: $x = \{x_1, x_2, \ldots, x_n\}^T$
Design Parameter (defined as a design component that
is pre-determined prior to the execution of a design
process, e.g., properties of selected materials.)
A set of design parameters forms the vector of design
parameters: $p = \{p_1, p_2, \ldots, p_n\}^T$
Design vector: {x, p}

(2) Design Attribute
Design attribute refers to any property or characteristic of a
design; denoted as $A_i$.
Wish Attribute (defined as a design attribute whose
functional requirement (FR, desired) would be achieved
whenever possible, thereby allowing for compromise in
design.)
Must Attribute (defined as a design attribute whose
functional requirement (FR, specified) must be achieved
under all circumstances, thereby not allowing for any
compromise in design.)

(3) Functional Requirement (FR)
FR refers to the composition of a design attribute and a permissible
value or range of values associated with the attribute.
For example: the normal stress must not exceed 50 MPa, the
wing span should not exceed 80 m, etc.

(4) Design Function

Design function refers to a functional mapping that relates a
state of the design components to a design attribute, e.g.,
$f_i(x, p) \to A_i$.
The mapping ($\to$) can be any calculation or procedure to
measure the performance of a design [12], including:
closed-form equations (analytic mapping)
a set of heuristic rules (experience-based mapping)
black box calculations (experience-based mapping)
iterative solutions
virtual prototyping or simulation (analytic mapping)

(5) Design Availability
Design availability refers to an admissible space in which each
value is physically available as a choice for a design variable.
It signifies the physical limitations imposed on the design
components:
For design parameters it represents the different
alternatives to choose from (e.g., different materials)
For design variables it represents realistic limitations on
the design variables

Note that design availability is not the same as the notion of a

feasible design space.
The latter lies in the space formed by considering the

behavioural bounds of design functions (i.e. design
constraints), in addition to the physical limits of the design
variables (i.e. design bounds).
By contrast design availability delimits the space resulting from

the physical limits to a suitable choice of design variables only.

Remarks
The geometric and topologic pattern of design variables is

determined through shape modeling, helping visualize and
understand the appearance of an artifact.
The behavioral and functional relation of a design
attribute is determined through function modeling, which
is nothing but a functional mapping process, resulting in a
corresponding design function.
The desired performance of design attributes is determined
through design modeling, by properly choosing the
parametric values or dimensions of design variables,
enabling the design conversion from non-functional to
functional.

Multi-attribute Design
In general, a design is a state of configuration, say
$y = \{y_i\}^T$ ($i \in [1, n]$), that purports to satisfy the functional
requirements prescribed for a collection of design attributes,
where $y_i$ denotes the $i$th design component of design vector $y$
that needs to be configured upon design availability in a design
problem.

Design Problem Statement
Given a set of design attributes and the associated functional
requirements, find a state of parametric configuration for the
design vector that will best satisfy all the functional
requirements subject to design availability and constraints.

Attribute Modeling
Wish attribute
Larger-the-better
Center-the-better
Smaller-the-better
Must attribute
At-most
No-bias
At-least

Wish Attribute Modeling
[Figure: degree of goodness plotted against the design attribute for smaller-the-better, center-the-better, and larger-the-better wish attributes]

A wish design attribute is viewed as the functional
requirement that is flexible for compromise.
smaller-the-better: attributes in which the performance
value becomes ideal if it can be minimized whenever
possible, that is: $\min f(x, p)$, (e.g. cost, stress, noise,
etc.)
center-the-better: attributes in which the performance
value becomes ideal if it is specified and used as a desired
target that needs to be achieved in design, that is:
$\min\,[f(x, p) - T]$, where the bracket is read as the deviation
from the target and $T \equiv$ desired target, (e.g.
thrust-to-weight ratio in aircraft design)

larger-the-better: attributes in which the performance
value becomes ideal if it can be maximized whenever
possible, that is: $\max f(x, p) \equiv \min -f(x, p)$, (e.g.
strength, reliability, service life, etc.)
In a generalized way a wish attribute is dealt with in the
form: $\min f(x, p)$

Must Attribute Modeling

A must design attribute is viewed as the functional
requirement with no room for compromise; it is
usually expressed in the form of inequalities and equalities.
at-most: attributes in which the performance value must
not be more than a certain prescribed value, that is:
$g_i(x, p) < g_0$, (e.g. maximum allowable stress)
no-bias: attributes in which the performance value
must consistently stay at a constant value,
that is: $g_i(x, p) = g_0 \Rightarrow [g_i(x, p) - g_0]^2 <$
numeric tolerance, (e.g. in automotive design the number of
seat belts must equal the maximum number of
passengers; temperature control)

at-least: attributes in which the performance value must
be more than a certain prescribed value, that is:
$g_i(x, p) > g_0 \equiv -g_i(x, p) < -g_0$, (e.g. minimum
strength of a structure)
In a generalized way a must attribute is dealt with in the
form: $g_i(x, p) \le g_0$

Welded Beam Design (1)

Welded Beam Design (2)

Design components
Parameters: applied force, beam's length,
Young's modulus, shear modulus
Variables: height / thickness of the beam,
depth / length of the weld
Design availability
Bounds on, for example, height
Design attributes
Shear stress, welding cost, beam deflection,
bending stress, buckling load on the bar, etc.
Functional requirements
Maximum allowable stress, maximum beam
deflection

Typical Design Paradigms
Alternate ways of integrating wish and must attributes to
formulate a design model depend on the design intent.
One-wish many-must design paradigm, representing the

design scenario in which there is one single wish design
attribute but multiple must design attributes.
Many-wish many-must design paradigm, representing

the design scenario in which there are multiple wish design
attributes and multiple must design attributes.

Solving the Design Paradigm: Optimization
Mathematical Foundation
Optimization not only allows us to formulate, formalize, and
solve design paradigms; it also yields an optimum design: it
chooses the best design in the set of feasible designs.
Optimization theory is used as an underlying

mathematical means to formulate the design models.
Optimization formalism is a mathematical framework to

describe a computational design problem.
Optimization modeling approaches facilitate
computerization of design modeling and support
automation of design computing.

Modeling and Formulation Strategy
Single-objective optimization formalism is used to

formulate one-wish many-must design paradigm.
Multi-objective optimization formalism is used to

formulate many-wish many-must design paradigm.

Optimization Formalism
Design components → design variables and parameters
Wish attribute → design objective
Must attribute → design constraint
Functional requirement → min/max objective function,
inequality/equality constraint function
Design availability → side constraints or lower and upper
bounds on the design variables
Feasible design space → delimited by the must attributes (i.e.,
design constraints) upon design availability

Mathematical statement of a general optimization problem:
$$
\begin{aligned}
\text{minimize} \quad & f(x, p) \\
\text{by varying} \quad & x \in \mathbb{R}^n \\
\text{subject to} \quad & g_e(x, p) = 0, \quad e = 1, 2, \ldots, m_e \\
& g_i(x, p) \le 0, \quad i = 1, 2, \ldots, m_i \\
& x_l \le x \le x_u
\end{aligned}
$$
$f$: objective function, output (e.g. structural weight)
$x$: vector of design variables, inputs (e.g. aerodynamic
shape); bounds can be set on these variables
$g_e$: vector of equality constraints (e.g. lift); in general
these are nonlinear functions of the design variables
$g_i$: vector of inequality constraints (e.g. structural
stresses), may also be nonlinear and implicit
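
As a minimal sketch of how the statement above maps onto code, the Python fragment below encodes a hypothetical two-variable problem and picks the best feasible point from a coarse grid. The objective, the single inequality constraint, and the bounds are illustrative assumptions, not taken from the lecture. Grid enumeration is used only to make the roles of objective, constraints, and bounds explicit; it is not one of the optimization algorithms discussed in the course.

```python
import itertools

# Hypothetical toy problem, chosen only for illustration.
LOWER = (0.0, 0.0)   # x_l: lower bounds (design availability)
UPPER = (3.0, 3.0)   # x_u: upper bounds (design availability)


def objective(x):
    """Wish attribute in 'minimize f(x)' form."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2


def inequality_constraints(x):
    """Must attributes in 'g_i(x) <= 0' form."""
    return [x[0] + x[1] - 2.0]


def equality_constraints(x):
    """Must attributes in 'g_e(x) = 0' form (none in this toy problem)."""
    return []


def is_feasible(x, tol=1e-9):
    """Check bounds, inequality constraints, and equality constraints."""
    in_bounds = all(lo - tol <= xi <= hi + tol
                    for xi, lo, hi in zip(x, LOWER, UPPER))
    ineq_ok = all(g <= tol for g in inequality_constraints(x))
    eq_ok = all(abs(g) <= tol for g in equality_constraints(x))
    return in_bounds and ineq_ok and eq_ok


def grid_search(points_per_axis=31):
    """Return the feasible grid point with the smallest objective value."""
    axes = [
        [lo + (hi - lo) * k / (points_per_axis - 1)
         for k in range(points_per_axis)]
        for lo, hi in zip(LOWER, UPPER)
    ]
    feasible = [x for x in itertools.product(*axes) if is_feasible(x)]
    return min(feasible, key=objective)


best = grid_search()
print("best feasible grid point:", best, "objective:", objective(best))
```

The unconstrained minimizer (1, 2) violates the inequality constraint, so the best feasible point lies on the constraint boundary, which illustrates how a must attribute shapes the feasible design space inside the bounds.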

Classification of Optimization Problems (1)
Design Variables
single-variable vs. multivariable optimization
continuous vs. integer vs. discrete optimization
Design Constraints
unconstrained vs. constrained optimization
Design Objective(s)
single-criterion vs. multi-criterion optimization

Classification of Optimization Problems (2)
Design Space
local (convex) vs. global optimization
unimodal vs. multimodal optimization
Linearity
linear objective function and constraints → linear programming (LP)
nonlinear objective function and/or constraints → nonlinear programming (NLP)
Time
dynamic vs. static optimization
Data
deterministic vs. robust vs. stochastic optimization

Sensitivity Analysis
What is sensitivity analysis?
Sensitivity analysis is formally defined as the task of computing
the $k$th order partial derivative of one or more quantities
(outputs) with respect to one or several independent variables
(inputs).
Let the outputs of interest of a given function $f$ be defined as
$y = f(x)$, where $x$ is the $n$-vector of independent inputs
$x = [x_1, x_2, \ldots, x_n]^T$.
If we consider first-order sensitivity analysis and a single output
variable, we are looking for the $n$-vector
$$\nabla y = \left[ \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n} \right]$$

Why sensitivity analysis?
Although there are various uses for sensitivity information,
including the analysis of trade-offs in design, our main
motivation is the use of this information in gradient-based
optimization.
By default, most gradient-based optimizers use sensitivity
analysis information. Since the calculation of gradients is often
the most costly step in the optimization cycle, using efficient
methods that accurately calculate sensitivities is extremely
important.
Accurate sensitivities are required for proper convergence.

As we will see later, a gradient-based optimization algorithm
usually requires at least:
The sensitivities of the objective function,
$\nabla f(x) = \partial f / \partial x_i$ ($n \times 1$)
The sensitivities of all the active constraints at the current
design point, $\nabla g(x) = \partial g_j / \partial x_i$ ($m \times n$)
When the cost of calculating the sensitivities is proportional to
the number of design variables, and this number is large,
sensitivity analysis is the bottleneck in the optimization cycle.

Methods for Sensitivity Analysis
Finite Differences Approximations: very popular; easy, but
lacks robustness and accuracy; run solver $n$ times
$$\frac{df}{dx_i} \approx \frac{f(x_i + h) - f(x)}{h} + O(h)$$
Complex-Step Method: relatively new; accurate and robust;
easy to implement and maintain; run solver $n$ times
$$\frac{df}{dx_i} \approx \frac{\operatorname{Im}[f(x_i + ih)]}{h} + O(h^2)$$

Symbolic Differentiation: accurate; restricted to explicit
functions of low dimensionality
Algorithmic/Automatic/Computational Differentiation:
accurate; ease of implementation and cost vary
(Semi)-Analytic Methods: efficient and accurate; long
development time; cost can be independent of $n$

Finite Differences Approximation
Finite-difference formulae are very commonly used to estimate
sensitivities. Although these approximations are neither
particularly accurate nor efficient, this method's biggest
advantage is that it is extremely easy to
implement.
All the finite-differencing formulae can be derived by truncating
a Taylor series expanded about a given point $x$.


Suppose we have a function $f$ of one variable $x$. A common
estimate for the first derivative is the forward difference, which
can be derived from the expansion of $f(x + h)$,
$$f(x + h) = f(x) + h \nabla f(x) + \frac{h^2}{2!} \nabla^2 f(x) + \frac{h^3}{3!} \nabla^3 f(x) + \ldots \tag{1}$$
Solving for $\nabla f(x) = \frac{df}{dx}$ we get the finite-difference formula,
$$\nabla f(x) = \frac{f(x + h) - f(x)}{h} + O(h) \tag{2}$$
where $h$ is called the finite-difference interval. The truncation
error is $O(h)$, and hence this is a first-order approximation.

[Figure: forward-difference approximation, showing f(x) and f(x+h) at the points x and x+h]


To reduce the truncation error, a second-order estimate can be
obtained using the expansion of $f(x - h)$,
$$f(x - h) = f(x) - h \nabla f(x) + \frac{h^2}{2!} \nabla^2 f(x) - \frac{h^3}{3!} \nabla^3 f(x) + \ldots \tag{3}$$
and subtracting it from the expansion (1). The resulting equation
can then be solved for the derivative of $f$ to obtain the
central-difference formula,
$$\nabla f(x) = \frac{f(x + h) - f(x - h)}{2h} + O(h^2) \tag{4}$$
Compared with the first-order approximation, a higher order of
truncation error accuracy is obtained at the expense of twice
the number of function evaluations.
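
The two formulas can be compared numerically. The sketch below is only an illustration: the test function sin(x), the evaluation point, and the step sizes are assumptions chosen because the exact derivative cos(x) is known.

```python
import math


def forward_difference(f, x, h):
    """First-order estimate: (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Second-order estimate: (f(x + h) - f(x - h)) / (2 h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)  # derivative of sin(x)

for h in (1e-1, 1e-2, 1e-3):
    err_forward = abs(forward_difference(math.sin, x0, h) - exact)
    err_central = abs(central_difference(math.sin, x0, h) - exact)
    print(f"h = {h:.0e}   forward error = {err_forward:.2e}   "
          f"central error = {err_central:.2e}")
```

Reducing h by a factor of ten should reduce the forward-difference error by roughly ten and the central-difference error by roughly one hundred, as long as h stays large enough for round-off to be negligible.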


More accurate estimates can also be derived by combining
different Taylor series expansions. Formulas for estimating
higher-order derivatives can be obtained by nesting
finite-difference formulas. We can use, for example, the central
difference (4) to estimate the second derivative instead of the
first,
$$\nabla^2 f(x) = \frac{\nabla f(x + h) - \nabla f(x - h)}{2h} + O(h^2) \tag{5}$$
and use the central difference again to estimate both $\nabla f(x + h)$
and $\nabla f(x - h)$ in the above equation to obtain,
$$\nabla^2 f(x) = \frac{f(x + 2h) - 2 f(x) + f(x - 2h)}{4h^2} + O(h) \tag{6}$$

In theory, the accuracy of the approximations should depend
on the truncation errors arising from the neglected terms in the
Taylor series expansion.
In practice, the accuracy of the approximations critically
depends on the step size used, due to finite precision arithmetic
(round-off errors).


When estimating sensitivities using finite-difference formulae we
are faced with the step-size dilemma, that is, the desire to
choose a small step size to minimize truncation error while
avoiding the use of a step so small that errors due to
subtractive cancellation become dominant.
Forward-difference approximation:
$$\frac{df(x)}{dx} = \frac{f(x + h) - f(x)}{h} + O(h)$$
With 16-digit arithmetic,
$f(x + h)$   +1.234567890123431
$f(x)$       +1.234567890123456
$\Delta f$   -0.000000000000025
Only two significant digits survive the subtraction.
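
The step-size dilemma can be observed directly in double precision. The following sketch is an illustration under assumed choices (test function exp(x), point x = 1, and the listed step sizes); it prints the relative error of the forward difference as h shrinks.

```python
import math


def forward_difference(f, x, h):
    """First-order forward-difference estimate of df/dx."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.exp(x0)  # d/dx exp(x) = exp(x)

relative_errors = {}
for exponent in (2, 4, 6, 8, 10, 12, 14, 16):
    h = 10.0 ** (-exponent)
    estimate = forward_difference(math.exp, x0, h)
    relative_errors[exponent] = abs(estimate - exact) / abs(exact)
    print(f"h = 1e-{exponent:02d}   "
          f"relative error = {relative_errors[exponent]:.3e}")
```

The error first decreases with h (truncation error dominates), then grows again as subtractive cancellation takes over. At h = 1e-16 the perturbed point is indistinguishable from x0 in double precision, so the estimate is zero and the relative error is one.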

For functions of several variables, that is, when $x$ is a vector,
we have to calculate each component of the gradient
$\nabla f(x)$ by perturbing the corresponding variable $x_i$.
The cost of calculating sensitivities with finite differences is
therefore proportional to the number of design variables, and $f$
must be calculated for each perturbation of $x_i$. This means
that if we use forward differences, for example, the cost would
be $n + 1$ times the cost of calculating $f$.
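
The n + 1 cost can be made visible by counting function evaluations. In this sketch the wrapper class, the test function, and the step size are hypothetical choices made for illustration.

```python
class CountedFunction:
    """Wrap a function and count how many times it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


def forward_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x by perturbing one variable at a time."""
    f0 = f(x)  # baseline evaluation, shared by all components
    gradient = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] += h
        gradient.append((f(perturbed) - f0) / h)
    return gradient


def example(x):
    # Hypothetical test function with gradient [2*x0 + 3*x1, 3*x0, 3*x2**2]
    return x[0] ** 2 + 3.0 * x[0] * x[1] + x[2] ** 3


counted = CountedFunction(example)
gradient = forward_difference_gradient(counted, [1.0, 2.0, 3.0])
print("gradient estimate:", gradient)
print("function evaluations:", counted.calls)  # n + 1 = 4 for n = 3
```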

Complex-Step Approximation
The use of complex variables to develop estimates of
derivatives originated with the work of Lyness and
Moler [13, 14]. Their work produced several methods that
made use of complex variables, including a reliable method for
calculating the $n$th derivative of an analytic function. However,
only recently has some of this theory been rediscovered by
Squire and Trapp [15] and used to obtain a very simple
expression for estimating the first derivative.
This estimate is suitable for use in modern numerical
computing and has been shown to be very accurate, extremely
robust and surprisingly easy to implement, while retaining a
reasonable computational cost [11, 16].

We will now see that a very simple formula for the first
derivative of real functions can be obtained using complex
calculus.
The complex-step derivative approximation can also be derived
using a Taylor series expansion. Rather than using a real step
$h$, we now use a pure imaginary step, $ih$. If $f$ is a real function
in real variables and it is also analytic, we can expand it in a
Taylor series about a real point $x$ as follows,
$$f(x + ih) = f(x) + ih \nabla f(x) - h^2 \frac{\nabla^2 f(x)}{2!} - ih^3 \frac{\nabla^3 f(x)}{3!} + \ldots \tag{7}$$

Taking the imaginary parts of both sides of (7) and dividing the
equation by $h$ yields
$$\nabla f(x) = \frac{\operatorname{Im}[f(x + ih)]}{h} + h^2 \frac{\nabla^3 f(x)}{3!} + \ldots \tag{8}$$
Hence the approximation is an $O(h^2)$ estimate of the derivative
of $f$.

An alternative way of deriving and understanding the complex
step is to consider a function, $f = u + iv$, of the complex
variable, $z = x + iy$. If $f$ is analytic the Cauchy-Riemann
equations apply, i.e.,
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \tag{9}$$
$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \tag{10}$$
These equations establish the exact relationship between the
real and imaginary parts of the function.

We can use the definition of a derivative in the right-hand side
of the first Cauchy-Riemann equation (9) to obtain,
$$\frac{\partial u}{\partial x} = \lim_{h \to 0} \frac{v(x + i(y + h)) - v(x + iy)}{h} \tag{11}$$
where $h$ is a small real number.

Since the functions that we are interested in are real functions
of a real variable, we restrict ourselves to the real axis, in which
case $y = 0$, $u(x) = f(x)$ and $v(x) = 0$.
Equation (11) can then be re-written as,
$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{\operatorname{Im}[f(x + ih)]}{h} \tag{12}$$

For a small discrete $h$, this can be approximated by,
$$\frac{\partial f}{\partial x} \approx \frac{\operatorname{Im}[f(x + ih)]}{h} \tag{13}$$
We will call this the complex-step derivative approximation.
This estimate is not subject to subtractive cancellation error,
since it does not involve a difference operation. This
constitutes a tremendous advantage over the finite-difference
approaches expressed in (2) and (4).
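
A minimal sketch of the complex-step approximation in Python follows. The helper name, the test function exp(x) sin(x), the evaluation point, and the default step h = 1e-20 are assumptions made for illustration; the only requirement is that the function accepts complex arguments.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Estimate df/dx as Im[f(x + ih)] / h; f must accept complex input."""
    return f(complex(x, h)).imag / h


def example(z):
    # Hypothetical analytic test function: f(x) = exp(x) * sin(x)
    return cmath.exp(z) * cmath.sin(z)


x0 = 0.7
estimate = complex_step_derivative(example, x0)
exact = math.exp(x0) * (math.sin(x0) + math.cos(x0))
print(f"complex-step: {estimate:.16f}")
print(f"analytic:     {exact:.16f}")
```

Because no difference of nearly equal numbers is formed, the step can be far smaller than any usable finite-difference interval without loss of accuracy.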

It is worth noting that, using both the real and imaginary parts
obtained from (7), an approximation for the second derivative
can be written as:
$$\frac{d^2 f}{dx^2} = \frac{2 \left( f(x) - \operatorname{Re}[f(x + ih)] \right)}{h^2} + O(h^2) \tag{14}$$
Unfortunately, the preceding equation is subject to
cancellation errors and hence the accuracy of the approximation
will be sensitive to the step size $h$.

To show how the complex-step method works, consider the
following analytic function:
$$f(x) = \frac{e^x}{\sqrt{\sin^3 x + \cos^3 x}} \tag{15}$$
The exact derivative at $x = 1.5$ was computed analytically to
16 digits and then compared to the results given by the
complex-step (13) and the forward and central finite-difference
approximations.
The relative error in the sensitivity estimates is computed with the
analytic result as the reference,
$$\varepsilon = \frac{|f' - f'_{ref}|}{|f'_{ref}|} \tag{16}$$


The forward-difference estimate initially converges to the exact
result at a linear rate since its truncation error is $O(h)$.
The central-difference converges quadratically, as expected.
However, as the step is reduced below a value of about $10^{-8}$
for the forward-difference and $10^{-5}$ for the central-difference,
subtractive cancellation errors become significant and the
estimates are unreliable.
When the interval $h$ is so small that no difference exists in the
output (for steps smaller than $10^{-16}$) the finite-difference
estimates eventually yield zero and then $\varepsilon = 1$.

The complex-step estimate converges quadratically with
decreasing step size, as predicted by the truncation error
estimate.
The estimate is practically insensitive to small step sizes and
below an $h$ of the order of $10^{-8}$ it achieves the accuracy of the
function evaluation.
Comparing the best accuracy of each of these approaches, we
can see that by using finite differences we only achieve a fraction
of the accuracy that is obtained by using the complex-step
approximation.
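
The comparison described above can be reproduced approximately with a short script. This is a sketch under stated assumptions: the test function is taken to be exp(x) / sqrt(sin^3 x + cos^3 x), the reference derivative is worked out by hand in `exact_derivative`, and the list of step sizes is arbitrary.

```python
import cmath
import math


def f(z):
    """Test function exp(x) / sqrt(sin(x)^3 + cos(x)^3), complex-capable."""
    s, c = cmath.sin(z), cmath.cos(z)
    return cmath.exp(z) / cmath.sqrt(s * s * s + c * c * c)


def exact_derivative(x):
    """Analytic derivative, derived by hand with the product and chain rules."""
    s, c = math.sin(x), math.cos(x)
    g = s ** 3 + c ** 3
    dg = 3.0 * s * s * c - 3.0 * c * c * s
    return math.exp(x) / math.sqrt(g) * (1.0 - dg / (2.0 * g))


def relative_error(estimate, reference):
    return abs(estimate - reference) / abs(reference)


x0 = 1.5
reference = exact_derivative(x0)
errors = {}
for exponent in (2, 4, 8, 12, 16, 20):
    h = 10.0 ** (-exponent)
    forward = (f(x0 + h).real - f(x0).real) / h
    central = (f(x0 + h).real - f(x0 - h).real) / (2.0 * h)
    complex_step = f(complex(x0, h)).imag / h
    errors[exponent] = (
        relative_error(forward, reference),
        relative_error(central, reference),
        relative_error(complex_step, reference),
    )
    print(f"h = 1e-{exponent:02d}   forward = {errors[exponent][0]:.2e}   "
          f"central = {errors[exponent][1]:.2e}   "
          f"complex-step = {errors[exponent][2]:.2e}")
```

At the smallest step both finite-difference estimates collapse to zero (relative error of one), while the complex-step estimate remains accurate to roughly machine precision.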

The complex-step size can be made extremely small. However,
there is a lower limit on the step size when using finite precision
arithmetic. The range of real numbers that can be handled in
numerical computing is dependent on the particular compiler
that is used. In this case, the smallest non-zero number that
can be represented is $10^{-308}$. If a number falls below this value,
underflow occurs and the number drops to zero. Note that the
estimate is still accurate down to a step of the order of $10^{-307}$.
Below this, underflow occurs and the estimate results in NaN.
Comparing the accuracy of complex and real computations,
there is an increased error in basic arithmetic operations when
using complex numbers, more specifically when dividing and
multiplying.

New Functions and Operators
To what extent can the complex-step method be used in an

arbitrary algorithm? To answer this question, we have to look
at each operator and function in the algorithm.
Relational operators
Used with if statements to direct the execution thread.
Complex algorithm must follow same thread.
Therefore, compare only the real parts.
Also, max, min, etc.

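Arithmetic functions and operators:
Most of these have a standard mathematical definition
that is analytic.
Some of them are implemented in Fortran.
Exception: abs, for which
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = \begin{cases} -1 & x < 0 \\ +1 & x > 0 \end{cases}$$
so that
$$\operatorname{abs}(x + iy) = \begin{cases} -x - iy & x < 0 \\ +x + iy & x \ge 0 \end{cases}$$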
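
The redefinition of abs can be sketched in Python as follows. The function name `complex_step_abs` and the test function |x| x are hypothetical; Python's built-in `abs` returns the modulus of a complex number, which is the behavior the redefinition is meant to avoid.

```python
def complex_step_abs(z):
    """abs() redefined for the complex-step method: branch on the real part."""
    return -z if z.real < 0.0 else z


def f(z, absolute):
    # Hypothetical test function f(x) = |x| * x, whose derivative is 2|x|
    return absolute(z) * z


h = 1e-20
x0 = -3.0
z = complex(x0, h)

with_redefined_abs = f(z, complex_step_abs).imag / h  # close to 2*|x0| = 6
with_builtin_abs = f(z, abs).imag / h                 # loses d|x|/dx, close to 3

print("redefined abs:", with_redefined_abs)
print("built-in abs: ", with_builtin_abs)
```

The branch is taken on the real part only, which is the same rule stated above for relational operators: the complex algorithm must follow the same execution thread as the real one.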


Can the Complex-Step Method be Improved?
Improvements are necessary because
$$\arcsin(z) = -i \log\left[ iz + \sqrt{1 - z^2} \right]$$
may yield a zero derivative.
How? If $z = x + ih$, where $x = O(1)$ and $h = O(10^{-20})$, then in
the addition $iz + z = (x - h) + i(x + h)$, $h$ vanishes when
using finite precision arithmetic.
We would like to keep the real and imaginary parts separate.

The complex definition of sine is also problematic,
$$\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}.$$
The complex trigonometric relation yields a better alternative,
$$\sin(x + ih) = \sin(x) \cosh(h) + i \cos(x) \sinh(h).$$
Note that linearizing this equation (that is, for small $h$)
simplifies it to $\sin(x + ih) \approx \sin(x) + ih \cos(x)$.
From the standard complex definition,
$$\arcsin(z) = -i \log\left[ iz + \sqrt{1 - z^2} \right],$$
we need the real and imaginary parts to be calculated separately.
Linearizing in $h$ about $h = 0$,
$$\arcsin(x + ih) \approx \arcsin(x) + i \frac{h}{\sqrt{1 - x^2}}.$$

Implementation Procedure
Cookbook procedure for any programming language:
Substitute all real type variable declarations with
complex declarations.
Define all functions and operators that are not defined for
complex arguments.
A complex step can then be added to the desired variable
and the derivative can be estimated by
$\partial f / \partial x \approx \operatorname{Im}[f(x + ih)] / h$.

Fortran 77: write new subroutines, substitute some of the
intrinsic function calls by the subroutine names, e.g. abs
by c_abs. But this requires knowing the variable types in the
original code.
Fortran 90: can overload intrinsic functions and operators,
including comparison operators. Compiler knows variable
types and chooses correct version of the function or
operator.
C/C++: also uses function and operator overloading.

Complex-Step Approximation Applications

3D Aero-Structural Design Optimization Framework [11]
[Figure: relative error $\varepsilon$ versus step size $h$ for the complex-step and finite-difference sensitivity estimates]


Design of natural laminar flow supersonic aircraft [17]
Transition prediction, Viscous and inviscid drag

Design optimization
Wing planform and airfoil design
Wing-Body intersection design


[Figure: Sensitivity estimate vs. step size; relative error $\varepsilon$ versus step size $h$ for the finite-difference and complex-step estimates]

Algorithmic Differentiation
Algorithmic differentiation, also known as computational
differentiation or automatic differentiation (AD), is a well
known method based on the systematic application of the chain
rule of differentiation to each operation in a computer program
flow.
Although this approach is as accurate as an analytic method, it
is potentially much easier to implement since this can be done
automatically.
The derivatives given by the chain rule can be propagated
forward (forward mode) or backward (reverse mode).
When using the forward mode, for each intermediate variable in
the algorithm, a variation due to one input variable is carried
through.

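Example: Using algorithmic differentiation in forward and
reverse modes, we want to compute the derivatives of
$$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$$
where the exact sensitivities are
$$\frac{\partial f}{\partial x} = \begin{bmatrix} x_2 + \cos(x_1) \\ x_1 \end{bmatrix}$$
We want this vector evaluated at $x = [0, 2]$, i.e.
$$\frac{\partial f}{\partial x} = \begin{bmatrix} 2 + \cos(0) \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$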

In general, we denote the independent variables as $t_1, t_2, \ldots, t_n$.
In this case $n = 2$. The calculation of the given function is
interpreted as a sequence of elementary operations, each
producing a dependent variable, which we write as
$t_{n+1}, t_{n+2}, \ldots, t_m$.
The sequence of operations in the algorithm is then given by
$$t_i = f_i(t_1, t_2, \ldots, t_{i-1}), \quad i = n + 1, n + 2, \ldots, m \tag{17}$$
Each of these functions is either a unary or a binary operation.
We can write the example function ($f = x_1 x_2 + \sin(x_1)$) as the
following series of computations:
$$
\begin{aligned}
t_1 &= x_1 \\
t_2 &= x_2 \\
t_3 &= t_1 t_2 \\
t_4 &= \sin t_1 \\
t_5 &= t_3 + t_4 \; (= f)
\end{aligned}
$$
Thus in this case, $m = 5$.

The graph of the algorithm provides information on the

interdependence of all the intermediate variables.

The chain rule can be applied to each of these operations and
is written as
$$\frac{\partial t_i}{\partial t_j} = \sum_{k=1}^{i-1} \frac{\partial f_i}{\partial t_k} \frac{\partial t_k}{\partial t_j}, \quad j = 1, 2, \ldots, n \tag{18}$$
Using the forward mode, we choose one $j$ and keep it fixed. We
then work our way forward in the index $i$ until we get the
desired derivative.
The reverse mode, on the other hand, works by fixing $i$, the
desired quantity we want to differentiate, and working our way
backward in the index $j$ all the way to the independent
variables.

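Using forward mode in our example we have:
$$
\begin{array}{llll}
\text{operation} & \text{derivative} & i = 1 & i = 2 \\
t_1 = x_1 & \frac{\partial t_1}{\partial x_i} & 1 & 0 \\
t_2 = x_2 & \frac{\partial t_2}{\partial x_i} & 0 & 1 \\
t_3 = t_1 t_2 & \frac{\partial t_3}{\partial x_i} = \frac{\partial t_1}{\partial x_i} t_2 + t_1 \frac{\partial t_2}{\partial x_i} & 1 \cdot x_2 + x_1 \cdot 0 = x_2 & 0 \cdot x_2 + x_1 \cdot 1 = x_1 \\
t_4 = \sin t_1 & \frac{\partial t_4}{\partial x_i} = \cos t_1 \frac{\partial t_1}{\partial x_i} & \cos x_1 \cdot 1 & 0 \\
t_5 = t_3 + t_4 & \frac{\partial t_5}{\partial x_i} = \frac{\partial t_3}{\partial x_i} + \frac{\partial t_4}{\partial x_i} & x_2 + \cos x_1 & x_1 + 0
\end{array}
$$
so evaluating at $x = [0, 2]$ we have
$$\frac{\partial f}{\partial x_1} = x_2 + \cos x_1 = 2 + 1 = 3$$
$$\frac{\partial f}{\partial x_2} = x_1 = 0$$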
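
The forward sweep for this example can be written out by hand in a few lines. The sketch below is a manual transcription of the operation sequence t1 to t5, not the output of an AD tool; the function name and the `seed` argument are hypothetical.

```python
import math


def forward_mode(x1, x2, seed):
    """Propagate one input variation through f = x1*x2 + sin(x1).

    seed = (dx1, dx2): (1, 0) gives df/dx1, (0, 1) gives df/dx2.
    Each line carries a value and its derivative side by side.
    """
    t1, dt1 = x1, seed[0]
    t2, dt2 = x2, seed[1]
    t3, dt3 = t1 * t2, dt1 * t2 + t1 * dt2
    t4, dt4 = math.sin(t1), math.cos(t1) * dt1
    t5, dt5 = t3 + t4, dt3 + dt4
    return t5, dt5


# One sweep per input variable at x = [0, 2].
f_value, df_dx1 = forward_mode(0.0, 2.0, (1.0, 0.0))
_, df_dx2 = forward_mode(0.0, 2.0, (0.0, 1.0))
print("f =", f_value, " df/dx1 =", df_dx1, " df/dx2 =", df_dx2)
```

Two sweeps are needed because there are two inputs; each sweep yields the derivative of every intermediate variable with respect to one input.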

Forward Mode Differentiation

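Using reverse mode we start from the output and work backward
through the operations; in our example we have:
$$
\begin{aligned}
\frac{\partial f}{\partial t_5} &= 1 \\
\frac{\partial f}{\partial t_4} &= \frac{\partial f}{\partial t_5} \frac{\partial t_5}{\partial t_4} = \frac{\partial f}{\partial t_5} \cdot 1 = 1 \\
\frac{\partial f}{\partial t_3} &= \frac{\partial f}{\partial t_5} \frac{\partial t_5}{\partial t_3} = \frac{\partial f}{\partial t_5} \cdot 1 = 1 \\
\frac{\partial f}{\partial t_2} &= \frac{\partial f}{\partial t_3} \frac{\partial t_3}{\partial t_2} = \frac{\partial f}{\partial t_3} t_1 = x_1 \\
\frac{\partial f}{\partial t_1} &= \frac{\partial f}{\partial t_4} \frac{\partial t_4}{\partial t_1} + \frac{\partial f}{\partial t_3} \frac{\partial t_3}{\partial t_1} = \frac{\partial f}{\partial t_4} \cos t_1 + \frac{\partial f}{\partial t_3} t_2 = \cos x_1 + x_2
\end{aligned}
$$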
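
The reverse sweep can be transcribed by hand in the same way. In this sketch the `bar` variables (a hypothetical naming choice) hold the derivative of the output with respect to each intermediate variable; a forward pass stores the intermediates first.

```python
import math


def reverse_mode(x1, x2):
    """Return f and its full gradient for f = x1*x2 + sin(x1) in one sweep."""
    # Forward pass: evaluate and store every intermediate variable.
    t1 = x1
    t2 = x2
    t3 = t1 * t2
    t4 = math.sin(t1)
    t5 = t3 + t4

    # Reverse pass: bar_k = df/dt_k, starting from the output.
    bar5 = 1.0
    bar4 = bar5 * 1.0                          # t5 = t3 + t4
    bar3 = bar5 * 1.0                          # t5 = t3 + t4
    bar2 = bar3 * t1                           # t3 = t1 * t2
    bar1 = bar4 * math.cos(t1) + bar3 * t2     # t4 = sin(t1), t3 = t1 * t2
    return t5, (bar1, bar2)


f_value, gradient = reverse_mode(0.0, 2.0)
print("f =", f_value, " gradient =", gradient)
```

A single reverse sweep yields the derivatives with respect to both inputs, at the price of keeping t1 to t5 in memory until the sweep is finished.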

Reverse Mode Differentiation

In the reverse mode, the cost of calculating the derivative of one
output with respect to many inputs is not proportional to the number
of inputs but to the number of outputs.
However, because the reverse mode needs to store all the
intermediate variables as well as the complete graph of the
algorithm, the amount of memory that is necessary increases
dramatically and the computational cost can become prohibitive.

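Recall the chain rule (18) that forms the basis for both the forward
and reverse modes,
$$\frac{\partial t_i}{\partial t_j} = \sum_{k=1}^{i-1} \frac{\partial f_i}{\partial t_k} \frac{\partial t_k}{\partial t_j}, \quad j = 1, 2, \ldots, n$$
This can be written as a matrix equation,
$$\underbrace{L}_{m \times m} \, \underbrace{D}_{m \times n} = \underbrace{E}_{m \times n} \tag{19}$$
where $L$ is the $m \times m$ lower-triangular matrix of partial
derivatives of each function with respect to the intermediate
variables involved in the function, i.e. these are explicit
derivatives.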

The unit diagonal multiplied by $D$ represents the left-hand side
of the chain rule, while the nonzero off-diagonal terms come
from $\partial f_i / \partial t_k$ in the chain rule, moved to the
left-hand side, and therefore $L_{ik} = -\partial f_i / \partial t_k$.
For each row of $L$, only one or two off-diagonal terms are
nonzero.
$D$ is the matrix of sensitivities we must solve for. Each column
of $D$ is the $m$-vector of derivatives $\partial t_k / \partial x_j$, $k = 1, 2, \ldots, m$.
We seek the bottom $n \times n$ block of this matrix.
Column $j$ of $E$ is the $j$th unit vector.

The solution can be obtained in two different ways.
Forward mode:
Pick one column $j$ of $D$ and the corresponding column of $E$
and solve for the derivative of all variables with respect to one
independent variable by forward substitution,
$$L D_j = E_j \;\Leftrightarrow\; L d_j = e_j \tag{20}$$
Reverse mode:
We solve for the chosen row of $D$. This can be done by writing
$$L^{-1} = L_m^{-1} L_{m-1}^{-1} \cdots L_{n+1}^{-1} \tag{21}$$
A row $k$ of $D$ can then be computed from
$$D_k = e_k^T L_m^{-1} L_{m-1}^{-1} \cdots L_{n+1}^{-1} E \tag{22}$$

Tools for Algorithmic Differentiation

There are two main methods for implementing algorithmic
differentiation
Source code transformation
Derived datatypes and operator overloading

Source Transformation
To implement algorithmic differentiation by source
transformation, the whole source code must be processed with
a parser and all the derivative calculations are introduced as
additional lines of code. The resulting source code is greatly
enlarged and it becomes practically unreadable. This fact might
constitute an implementation disadvantage as it becomes
impractical to debug this new extended code. One has to work
with the original source, and every time it is changed (or if
different derivatives are desired) one must rerun the parser
before compiling a new version. The advantage is that this
method tends to yield faster code.

Datatypes and Operator Overloading
To implement algorithmic differentiation using derived
datatypes and operator overloading we need languages that
support this feature, such as Fortran 90, C++, and Matlab. To
implement algorithmic differentiation using this feature, a new
type of structure is created that contains both the value and its
derivative. All the existing operators are then re-defined
(overloaded) for the new type. The new operator has exactly
the same behavior as before for the value part of the new type,
but uses the definition of the derivative of the operator to
calculate the derivative portion. This results in a very elegant
implementation since very few changes are required in the
original code.
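
The derived-datatype idea can be sketched in Python, which also supports operator overloading. The `Dual` class and the module-level `sin` below are hypothetical names for this illustration; only addition, multiplication, and sine are overloaded, which is just enough for the running example.

```python
import math


class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """Sine overloaded for Dual arguments; falls back to math.sin otherwise."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def f(x1, x2):
    # The original code is unchanged apart from calling the overloaded sin.
    return x1 * x2 + sin(x1)


df_dx1 = f(Dual(0.0, 1.0), Dual(2.0, 0.0)).deriv
df_dx2 = f(Dual(0.0, 0.0), Dual(2.0, 1.0)).deriv
print("df/dx1 =", df_dx1, " df/dx2 =", df_dx2)
```

The function `f` still works with ordinary floats, which reflects the point that very few changes to the original code are needed.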


Many tools for automatic algorithmic differentiation of
programs in different languages exist. Some have been
extensively developed and provide the user with great
functionality, including the calculation of higher-order
derivatives and reverse mode options.


Fortran: Tools that use the source transformation approach
include: ADIFOR, TAMC, DAFOR, GRESS, TAPENADE. The
necessary changes to the source code are made automatically.
The derived datatype approach is used in the following tools:
AD01, ADOL-F, IMAS and OPTIMA90. Although it is in
theory possible to have a script make the necessary changes in
the source code automatically, none of these tools have this
facility and the changes must be done manually.


C/C++: Established tools for automatic algorithmic
differentiation also exist for C/C++. These include
ADIC, an implementation mirroring ADIFOR, and ADOL-C, a
free package that uses operator overloading and can operate in
the forward or reverse modes and compute higher order
derivatives.
Matlab: The Matlab Automatic Differentiation (MAD)
commercial toolbox uses operator overloading and
operates in the forward mode.

Forward mode connection to the complex-step method

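Suppose we want to differentiate $f = x_1 x_2$ with respect to $x_1$.
$$
\begin{array}{ll}
\text{Algorithmic} & \text{Complex-Step} \\
\Delta x_1 = 1 & h_1 = 10^{-20} \\
\Delta x_2 = 0 & h_2 = 0 \\
f = x_1 x_2 & f = (x_1 + ih_1)(x_2 + ih_2) \\
\Delta f = x_1 \Delta x_2 + x_2 \Delta x_1 & f = x_1 x_2 - h_1 h_2 + i(x_1 h_2 + x_2 h_1) \\
df/dx_1 = \Delta f & df/dx_1 = \operatorname{Im} f / h
\end{array}
$$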

We can see that algorithmic differentiation stores the derivative
value in a separate set of variables, while the complex step
carries the derivative information in the imaginary part of the
variables. The complex-step method also performs one
additional operation, the calculation of the term $h_1 h_2$. This
additional operation is superfluous, since a very small $h$
has no effect on the real part of the result when using finite
precision arithmetic.
Both methods work for an algorithm involving an arbitrary
sequence of operations by propagating the variation of one
input forward throughout the code. This means that in order to
calculate $n$ derivatives, the differentiated code must be
executed $n$ times.
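
The parallel between the two methods can be checked on f = x1 x2. In this sketch the evaluation point [3, 4] and the helper names are hypothetical; each method is run once per input, so n = 2 runs are needed for the full gradient.

```python
def f(x1, x2):
    return x1 * x2


def forward_mode_gradient(x):
    """Propagate one seed (dx1, dx2) per input through f = x1 * x2."""
    gradient = []
    for j in range(len(x)):
        dx = [1.0 if i == j else 0.0 for i in range(len(x))]
        gradient.append(x[0] * dx[1] + x[1] * dx[0])  # df = x1*dx2 + x2*dx1
    return gradient


def complex_step_gradient(x, h=1e-20):
    """Perturb one input at a time along the imaginary axis."""
    gradient = []
    for j in range(len(x)):
        z = [complex(xi, h if i == j else 0.0) for i, xi in enumerate(x)]
        gradient.append(f(*z).imag / h)
    return gradient


point = [3.0, 4.0]  # hypothetical evaluation point
ad = forward_mode_gradient(point)
cs = complex_step_gradient(point)
print("forward-mode AD:", ad)
print("complex step:   ", cs)
```

The imaginary part plays the role of the derivative variable, and the complex-step version requires no hand-written derivative line.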


Source transformation (ADIFOR, ADIC): resulting code is
unmaintainable.
Derived datatype and operator overloading (ADOL-F,
ADOL-C): far fewer changes are necessary in source code,
requires object-oriented language.
Complex Step:
Even fewer changes are required.
Resulting code is maintainable.
Can be easily implemented in any programming language
that supports complex arithmetic.

Analytic Sensitivity Analysis
Analytic methods are the most accurate and efficient methods

available for sensitivity analysis.
They are, however, more involved than the other methods we

have seen so far since they require the knowledge of the
governing equations and the algorithm that is used to solve
those equations.
We will learn how to compute analytic sensitivities with direct

and adjoint methods. We will start with single discipline
systems and then generalize for the case of multiple systems.

Notation
$f$        function of interest/output (could be a vector)
$R_k$      residuals of the governing equations, $k = 1, \ldots, N_R$
$x_n$      design/independent/input variables, $n = 1, \ldots, N_x$
$y_i$      state variables, $i = 1, \ldots, N_R$
$\psi_k$   adjoint vector, $k = 1, \ldots, N_R$


Basic Equations
The main objective is to calculate the sensitivity of a function
of interest with respect to a number of design variables. The
function of interest can be either the objective function or any
of the constraints specified in the optimization problem.
In general, such functions depend not only on the design
variables, but also on the physical state of the system under
analysis. Thus we can write the function as
$$ f = f(x_n, y_i) \tag{23} $$
where $x_n$ represents the vector of design variables and $y_i$ is the
state variable vector.

For a given vector $x_n$, the solution of the governing equations
of the system yields a vector $y_i$, thus establishing the
dependence of the state of the system on the design variables.
We denote these governing equations by
$$ R_k(x_n, y_i(x_n)) = 0. \tag{24} $$
The first instance of $x_n$ in the above equation indicates the fact
that the residual of the governing equations may depend
explicitly on $x_n$.

In the case of a structural solver, for example, changing the size
of an element has a direct effect on the stiffness matrix.
By solving the governing equations we determine the state, $y_i$,
which depends implicitly on the design variables through the
solution of the system. These equations may be non-linear, in
which case the usual procedure is to drive residuals, $R_k$, to
zero using an iterative method.
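
As a minimal sketch of this procedure, the code below drives the residuals of a small nonlinear system to zero with Newton's method. The two residual equations, the starting guess and the names `residual`, `jacobian` and `solve_state` are hypothetical choices made for illustration, not taken from these notes.

```python
import numpy as np


def residual(x, y):
    """Hypothetical governing equations R_k(x_n, y_i) with two states."""
    return np.array([x[0] * y[0] + y[1] ** 2 - 3.0,
                     y[0] + x[1] * y[1] - 2.0])


def jacobian(x, y):
    """Partial derivatives dR_k/dy_i of the residuals above."""
    return np.array([[x[0], 2.0 * y[1]],
                     [1.0, x[1]]])


def solve_state(x, y_start, tol=1e-12, max_iter=50):
    """Newton iteration: drive the residuals to zero for a fixed design x."""
    y = np.array(y_start, dtype=float)
    for _ in range(max_iter):
        r = residual(x, y)
        if np.linalg.norm(r) < tol:
            break
        # Newton update: solve (dR/dy) dy = -R, then y <- y + dy.
        y = y - np.linalg.solve(jacobian(x, y), r)
    return y


x = np.array([1.0, 1.0])            # design variables (held fixed)
y = solve_state(x, y_start=[1.0, 1.0])
print("state:", y, "residual:", residual(x, y))
```

The design variables enter the residuals explicitly, while the converged state depends on them only implicitly, through the solve.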


Since the number of equations must equal the number of state
variables, the ranges of the indices $i$ and $k$ are the same, i.e.,
$i, k = 1, \ldots, N_R$.
In the case of a structural solver, for example, $N_R$ is the
number of degrees of freedom, while for a CFD solver, $N_R$ is
the number of mesh points multiplied by the number of state
variables at each point.


As a first step toward obtaining the derivatives that we
ultimately want to compute, we use the chain rule to write the
total sensitivity of $f$ as
$$ \frac{df}{dx_n} = \frac{\partial f}{\partial x_n} + \frac{\partial f}{\partial y_i} \frac{dy_i}{dx_n} \tag{25} $$
for $i = 1, \ldots, N_R$, $n = 1, \ldots, N_x$. Index notation is used to
denote the vector dot products.
It is important to distinguish the total and partial derivatives in
this equation. The partial derivatives can be directly evaluated
by varying the denominator and re-evaluating the function in
the numerator. The total derivatives, however, require the
solution of the problem. Thus, all the terms in the total
sensitivity equation (25) are easily computed except for
$dy_i/dx_n$.

Since the governing equations must always be satisfied, the
total derivative of the residuals (24) with respect to any design
variable must also be zero.
Expanding the total derivative of the governing equations with
respect to the design variables we can write,
$$ \frac{dR_k}{dx_n} = \frac{\partial R_k}{\partial x_n} + \frac{\partial R_k}{\partial y_i} \frac{dy_i}{dx_n} = 0 \tag{26} $$
for all $i, k = 1, \ldots, N_R$ and $n = 1, \ldots, N_x$.

Expression (26) provides the means for computing the total
sensitivity of the state variables with respect to the design
variables. Rewriting equation (26) gives
$$ \frac{\partial R_k}{\partial y_i} \frac{dy_i}{dx_n} = -\frac{\partial R_k}{\partial x_n} \tag{27} $$


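We can solve for $dy_i/dx_n$ and substitute this result into the
total derivative equation (25), to obtain
$$ \frac{df}{dx_n} = \frac{\partial f}{\partial x_n} - \frac{\partial f}{\partial y_i} \left[ \frac{\partial R_k}{\partial y_i} \right]^{-1} \frac{\partial R_k}{\partial x_n} \tag{28} $$
Here the product of the last two factors, $[\partial R_k/\partial y_i]^{-1} \, \partial R_k/\partial x_n$,
is $-dy_i/dx_n$, and the product of the two factors before it,
$(\partial f/\partial y_i) \, [\partial R_k/\partial y_i]^{-1}$, is $-\psi_k$.
The inverse of the Jacobian $\partial R_k/\partial y_i$ is not necessarily
explicitly calculated. In the case of large iterative problems
neither this matrix nor its factorization is usually stored due
to their prohibitive size.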


Direct Sensitivity Equations
The approach where we first calculate $dy_i/dx_n$ using
equation (27) and then substitute the result in the expression
for the total sensitivity (28) is called the direct method.
Note that solving for $dy_i/dx_n$ requires the solution of the
matrix equation (27) for each design variable $x_n$.
A change in the design variable affects only the right-hand side
of the equation, so for problems where the matrix $\partial R_k/\partial y_i$ can
be explicitly factorized and stored, solving for multiple
right-hand-side vectors by back substitution would be relatively
inexpensive.

However, for large iterative problems, such as the ones
encountered in CFD, the matrix $\partial R_k/\partial y_i$ is never factorized
explicitly and the system of equations requires an iterative
solution which is usually as costly as solving the governing
equations.
When we multiply this cost by the number of design variables,
the total cost for calculating the sensitivity vector may become
unacceptable.
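
A minimal sketch of the direct method on a small dense problem follows. The residuals, the function of interest `func` and all partial derivatives are hypothetical and hand-derived for illustration. The linear system (27) is solved for all design variables at once and the result is inserted into the chain rule (25); a central finite difference of the full analysis serves as a rough check.

```python
import numpy as np


def residual(x, y):
    # Hypothetical governing equations R_k(x_n, y_i) = 0.
    return np.array([x[0] * y[0] + y[1] ** 2 - 3.0,
                     y[0] + x[1] * y[1] - 2.0])


def dR_dy(x, y):
    return np.array([[x[0], 2.0 * y[1]],
                     [1.0, x[1]]])


def solve_state(x):
    # Newton iteration on the governing equations.
    y = np.array([1.0, 1.0])
    for _ in range(50):
        r = residual(x, y)
        if np.linalg.norm(r) < 1e-13:
            break
        y = y - np.linalg.solve(dR_dy(x, y), r)
    return y


def func(x, y):
    # Hypothetical function of interest f(x_n, y_i).
    return x[0] * y[0] ** 2 + y[1]


def direct_sensitivity(x):
    y = solve_state(x)
    dR_dx = np.array([[y[0], 0.0],
                      [0.0, y[1]]])            # partial R_k / partial x_n
    df_dx = np.array([y[0] ** 2, 0.0])         # partial f / partial x_n
    df_dy = np.array([2.0 * x[0] * y[0], 1.0])  # partial f / partial y_i
    # Equation (27): one right-hand side (column) per design variable.
    dy_dx = np.linalg.solve(dR_dy(x, y), -dR_dx)
    # Equation (25): total sensitivity.
    return df_dx + df_dy @ dy_dx


x = np.array([1.0, 1.0])
grad = direct_sensitivity(x)

# Rough check: central differences of the complete analysis.
h = 1e-6
grad_fd = np.array([
    (func(x + h * e, solve_state(x + h * e))
     - func(x - h * e, solve_state(x - h * e))) / (2.0 * h)
    for e in np.eye(2)
])
print(grad, grad_fd)
```

Here `np.linalg.solve` handles both right-hand sides with a single factorization, which corresponds to the inexpensive case described above. With an iterative linear solver, each column would cost roughly one additional analysis.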


Adjoint Sensitivity Equations
Returning to the total sensitivity equation (28), we observe
that there is an alternative option for computing the total
sensitivity $df/dx_n$. The auxiliary vector $\psi_k$ can be obtained by
solving the adjoint equations
$$ \frac{\partial R_k}{\partial y_i} \psi_k = -\frac{\partial f}{\partial y_i} \tag{29} $$
The vector $\psi_k$ is usually called the adjoint vector and is
substituted into equation (28) to find the total sensitivity,
which then reads $df/dx_n = \partial f/\partial x_n + \psi_k \, \partial R_k/\partial x_n$. In
contrast with the direct method, the adjoint vector does not
depend on the design variables, $x_n$, but instead depends on the
function of interest, $f$.
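
The adjoint solve can be sketched in a few lines, assuming the partial derivatives are available as dense arrays. The arrays below are random stand-ins with hypothetical sizes, not data from these notes. The sum over $k$ in (29) corresponds to a linear system with the transposed Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_x = 5, 3  # hypothetical numbers of states and design variables

# Random stand-ins for the partial derivatives at a converged state.
dR_dy = rng.standard_normal((n_r, n_r)) + n_r * np.eye(n_r)  # dR_k/dy_i
dR_dx = rng.standard_normal((n_r, n_x))                      # dR_k/dx_n
df_dy = rng.standard_normal(n_r)                             # df/dy_i
df_dx = rng.standard_normal(n_x)                             # df/dx_n

# Adjoint equations (29): transposed Jacobian, one right-hand side per function.
psi = np.linalg.solve(dR_dy.T, -df_dy)

# Total sensitivity with respect to all design variables at once.
df_total_adjoint = df_dx + psi @ dR_dx

# Reference: equation (28) with an explicit inverse (acceptable only for tiny cases).
df_total_ref = df_dx - df_dy @ np.linalg.inv(dR_dy) @ dR_dx

print(np.max(np.abs(df_total_adjoint - df_total_ref)))
```

A single linear solve yields the sensitivities with respect to all `n_x` design variables, because the right-hand side depends only on the function of interest.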


Direct vs. Adjoint Sensitivities
We can now see that the choice of the solution procedure
(direct vs. adjoint) to obtain the total sensitivity (28) has a
substantial impact on the cost of sensitivity analysis.
Although all the partial derivative terms are the same for both
the direct and adjoint methods, the order of the operations is
not. Notice that once $dy_i/dx_n$ is computed, it is valid for any
function $f$, but must be recomputed for each design variable
(direct method).
On the other hand, $\psi_k$ is valid for all design variables, but
must be recomputed for each function (adjoint method).


The cost involved in calculating sensitivities using the adjoint
method is therefore practically independent of the number of
design variables. After having solved the governing equations,
the adjoint equations are solved only once for each f . Moreover,
the cost of solution of the adjoint equations is similar to that of
the solution of the governing equations since they are of similar
complexity and the partial derivative terms are easily computed.


Therefore, if the number of design variables is greater than the
number of functions for which we seek sensitivity information,
the adjoint method is computationally more efficient.
Otherwise, if the number of functions to be dierentiated is
greater than the number of design variables, the direct method
would be a better choice.


A comparison of the cost of computing sensitivities with the
direct versus adjoint methods is shown in the table below. With
either method, we must factorize the same matrix, $\partial R_k/\partial y_i$.
The difference in the cost comes from the back-solve step for
solving equations (27) and (29) respectively. The direct
method requires that we perform this step for each design
variable (i.e. for each $n$) while the adjoint method requires this
to be done for each function of interest $f$. The
multiplication step is simply the calculation of the final
sensitivity in equation (28), using the solution of (27) or (29)
respectively. The cost involved in this step when computing the
same set of sensitivities is the same for both methods.

Step              Direct          Adjoint

Factorization     same            same
Back-solve        $N_x$ times     $N_f$ times
Multiplication    same            same
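
The back-solve counts in the table can be illustrated with a short sketch. The partial derivatives are random stand-ins and the sizes `n_r`, `n_x`, `n_f` are hypothetical. Both methods produce the same matrix of sensitivities, but the number of linear solves differs.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_x, n_f = 8, 6, 2  # hypothetical sizes: states, design variables, functions

dR_dy = rng.standard_normal((n_r, n_r)) + n_r * np.eye(n_r)
dR_dx = rng.standard_normal((n_r, n_x))
df_dy = rng.standard_normal((n_f, n_r))
df_dx = rng.standard_normal((n_f, n_x))

# Direct method: one linear solve per design variable.
direct_solves = 0
dy_dx = np.empty((n_r, n_x))
for n in range(n_x):
    dy_dx[:, n] = np.linalg.solve(dR_dy, -dR_dx[:, n])
    direct_solves += 1
jac_direct = df_dx + df_dy @ dy_dx

# Adjoint method: one linear solve per function of interest.
adjoint_solves = 0
psi = np.empty((n_f, n_r))
for m in range(n_f):
    psi[m] = np.linalg.solve(dR_dy.T, -df_dy[m])
    adjoint_solves += 1
jac_adjoint = df_dx + psi @ dR_dx

print(direct_solves, adjoint_solves, np.allclose(jac_direct, jac_adjoint))
```

For simplicity each call to `np.linalg.solve` refactorizes the matrix. An implementation that follows the table would factorize once and repeat only the back-solve.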

In this discussion, we have assumed that the governing
equations have been discretized. The same kind of procedure
can be applied to continuous governing equations. The
principle is the same, but the notation would have to be more
general. The equations, in the end, have to be discretized in
order to be solved numerically. The figure below shows the two
ways of arriving at the discrete sensitivity equations. We can
either differentiate the continuous governing equations first and
then discretize them, or discretize the governing equations and
differentiate them in the second step.


[Figure: The two ways of obtaining the discretized sensitivity equations.
Path 1: Continuous Governing Equations -> Continuous Sensitivity Equations -> Discrete Sensitivity Equations 1.
Path 2: Continuous Governing Equations -> Discrete Governing Equations -> Discrete Sensitivity Equations 2.]
The resulting sensitivity equations should be equivalent, but are
not necessarily the same. Differentiating the continuous
governing equations first is usually more involved. In addition,
applying boundary conditions to the differentiated equations
can be non-intuitive as some of these boundary conditions are
non-physical.

Analytic Sensitivity Analysis Application
Structural Sensitivity Analysis

The discretized governing equations for a finite-element
structural model are,
$$ R_k = K_{ki} u_i - F_k = 0 \tag{30} $$
where $K_{ki}$ is the stiffness matrix, $u_i$ is the vector of
displacement (the state) and $F_k$ is the vector of applied force
(not to be confused with the function of interest from the
previous section!).

We are interested in finding the sensitivities of the stress, which
is related to the displacements by the equation,
$$ \sigma_m = S_{mi} u_i \tag{31} $$
We will consider the design variables to be the cross-sectional
areas of the elements, $A_j$. We will now look at the terms that
we need to use the generalized total sensitivity equation (28).


For the matrix of sensitivities of the governing equations with
respect to the state variables we find that it is simply the
stiffness matrix, i.e.,
$$ \frac{\partial R_k}{\partial y_i} = \frac{\partial (K_{ki} u_i - F_k)}{\partial u_i} = K_{ki} \tag{32} $$
Let us consider the sensitivity of the residuals with respect to the
design variables (cross-sectional areas in our case). Neither the
displacements nor the applied forces vary explicitly with the
element sizes. The only term that depends on $A_j$ directly is the
stiffness matrix, so we get,
$$ \frac{\partial R_k}{\partial x_j} = \frac{\partial (K_{ki} u_i - F_k)}{\partial A_j} = \frac{\partial K_{ki}}{\partial A_j} u_i \tag{33} $$

The partial derivative of the stress with respect to the
displacements is simply given by the matrix in equation (31),
i.e.,
$$ \frac{\partial f_m}{\partial y_i} = \frac{\partial \sigma_m}{\partial u_i} = S_{mi} \tag{34} $$
Finally, the explicit variation of stress with respect to the
cross-sectional areas is zero, since the stresses depend only on
the displacement field,
$$ \frac{\partial f_m}{\partial x_j} = \frac{\partial \sigma_m}{\partial A_j} = 0. \tag{35} $$


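Substituting these into the generalized total sensitivity
equation (28) we get:
$$ \frac{d\sigma_m}{dA_j} = -\frac{\partial \sigma_m}{\partial u_i} K_{ki}^{-1} \frac{\partial K_{ki}}{\partial A_j} u_i \tag{36} $$
Referring to the theory presented previously, if we were to use
the direct method, we would solve,
$$ K_{ki} \frac{du_i}{dA_j} = -\frac{\partial K_{ki}}{\partial A_j} u_i \tag{37} $$
and then substitute the result in,
$$ \frac{d\sigma_m}{dA_j} = \frac{\partial \sigma_m}{\partial u_i} \frac{du_i}{dA_j} \tag{38} $$
to calculate the desired sensitivities.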

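The adjoint method could also be used, in which case we would
solve equation (29) for the structures case,
$$ K_{ki}^T \psi_k = -\frac{\partial \sigma_m}{\partial u_i} \tag{39} $$
Then we would substitute the adjoint vector into the equation,
$$ \frac{d\sigma_m}{dA_j} = \frac{\partial \sigma_m}{\partial A_j} + \psi_k^T \frac{\partial K_{ki}}{\partial A_j} u_i \tag{40} $$
to calculate the desired sensitivities.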
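
Both routes can be sketched on a toy problem. The example assumes a hypothetical two-element axial bar, fixed at the left end, with nondimensional values for E, L, the areas and the loads; none of these numbers come from the notes. Because the bar is statically determinate, the stresses are known in closed form, which gives an independent check.

```python
import numpy as np

# Hypothetical two-element axial bar, fixed at the left end (nondimensional).
E, L = 1.0, 1.0
A = np.array([2.0, 1.0])   # design variables: cross-sectional areas A_j
F = np.array([1.0, 3.0])   # applied forces at the two free nodes

# K = sum_j A_j * dK_dA[j], so each dK/dA_j is a constant matrix.
dK_dA = (E / L) * np.array([[[1.0, 0.0], [0.0, 0.0]],
                            [[1.0, -1.0], [-1.0, 1.0]]])
K = np.tensordot(A, dK_dA, axes=1)

# Stress-displacement matrix: sigma_m = S_mi u_i, equation (31).
S = (E / L) * np.array([[1.0, 0.0], [-1.0, 1.0]])

u = np.linalg.solve(K, F)  # governing equations (30)

# Direct method: one solve per design variable, equations (37) and (38).
dsigma_direct = np.empty((2, 2))
for j in range(2):
    du_dAj = np.linalg.solve(K, -dK_dA[j] @ u)
    dsigma_direct[:, j] = S @ du_dAj

# Adjoint method: one solve per stress, equations (39) and (40).
dsigma_adjoint = np.empty((2, 2))
for m in range(2):
    psi = np.linalg.solve(K.T, -S[m])
    for j in range(2):
        # The explicit term of (40) vanishes, see equation (35).
        dsigma_adjoint[m, j] = psi @ dK_dA[j] @ u

# Closed form for this bar: sigma_1 = (F_1 + F_2) / A_1, sigma_2 = F_2 / A_2.
dsigma_exact = np.array([[-(F[0] + F[1]) / A[0] ** 2, 0.0],
                         [0.0, -F[1] / A[1] ** 2]])

print(dsigma_direct)
print(dsigma_adjoint)
```

Rows index the stresses and columns the areas. With as many stresses as design variables, as here, the two methods require the same number of solves.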

Computational Accuracy and Cost
Method      Sample Sensitivity      Time    Memory

Complex     39.049760045804646      1.00    1.00
ADIFOR      39.049760045809059      2.33    8.09
Analytic    39.049760045805281      0.58    2.42
FD          39.049724352820375      0.88    0.72

All except finite-difference achieve the solver's precision.

Analytic: best, but not easy to implement.
ADIFOR: costly.
Complex-step: good compromise.
Caveat: ratios depend on problem.
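
The precision gap between finite differences and the complex step can be reproduced on a much smaller scale. The sketch below uses a hypothetical scalar test function with a known derivative, not the solver behind the table, and the step sizes are typical choices rather than tuned values.

```python
import numpy as np


def f(x):
    # Hypothetical smooth test function; valid for real and complex arguments.
    return np.exp(x) * np.sin(x)


x0 = 1.5
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))  # analytic derivative

# Forward finite difference: subtractive cancellation limits how small h can be.
h_fd = 1e-8
fd = (f(x0 + h_fd) - f(x0)) / h_fd

# Complex step: no subtraction, so the step can be made extremely small.
h_cs = 1e-20
cs = f(x0 + 1j * h_cs).imag / h_cs

err_fd = abs(fd - exact) / abs(exact)
err_cs = abs(cs - exact) / abs(exact)
print(f"finite difference: {fd:.15f}  relative error {err_fd:.1e}")
print(f"complex step:      {cs:.15f}  relative error {err_cs:.1e}")
```

As in the table, the finite-difference result agrees with the reference in only the leading digits, while the complex-step result is accurate to roughly machine precision.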

Minimize Drag on Rotating Airfoil




Self-Organizing Maps - Qualitative Sensitivity Analysis
A Kohonen Self-Organizing Map (SOM) is a type of
unsupervised learning algorithm where the goal is to discover
some underlying structure of the data [18].
A SOM imposes a topological structure that
preserves neighborhood relations in the data.
While the inner workings of the SOM are outside the scope
of this course, the results of SOMs can be useful in
visualizing sensitivities.

[Figure: Self-Organizing Map of Combustor Data. Component planes: U-matrix, Vpz, T3, P3, mdot air, T4, mNOX.]

