Multidisciplinary Design Optimization
2.1 A Gentle Introduction to MDO
2.1.1 Multidisciplinary Design
The goal of multidisciplinary design optimization, according to [Kroo, 1997],
is to ‘provide a more formalized method for complex system design than is
found in traditional design approaches’. This goal is to be achieved by
applying numerical optimization techniques to the multidisciplinary design
process.
For us to understand what this means, it is important to know what is
meant by ‘complex system design’, ‘traditional design approaches’, and of
course ‘numerical optimization.’
By design, we mean the process of defining a system that meets some set
of requirements. This process is iterative in nature: It is usually not possible
to design the system in one go. Generally, an initial design is analyzed
and based on the results of this analysis the design is adapted, until the
requirements are met. By analysis we refer to the process of determining
the response of a system to its environment [Vanderplaats, 2007].
Complex system design refers to the design of systems that are too complicated for a single person to grasp all the details and interactions associated with them. Examples of such systems are aircraft, cars, wind turbines, and so on. Inherent to the complexity of these systems is the multidisciplinary nature of the design process.
The multidisciplinary design approach offers a way to cope with the com-
plexity of the system. The division of a large complex system into smaller,
more manageable subsystems (disciplines, subdisciplines, etc.), enables peo-
ple to get a grip on the design task.
In multidisciplinary design, experts from many different disciplines (e.g.
structures, aerodynamics, electronics, etc.) have to work together to pro-
duce a single consistent end-product (design) that meets the specified re-
quirements.
These disciplines are often mutually dependent, which implies that large
amounts of information need to be exchanged between them. Some form of
coordination is required in order to guide this process in the right direction.
This information management task is complicated greatly by the iterative
nature of the design process.
One example of such an iterative process is fixed point iteration (FPI).
Fixed point iteration can be applied on different levels in design and analysis,
but the basic idea behind the concept is described mathematically in the gray
box below.
Fixed Point Iteration

x_{n+1} = x_n − g(x_n)/g′(x_n)    (2.2)
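As a sketch, the iteration of Equation 2.2 is straightforward to implement: each pass feeds the previous output back in as input, which is the defining feature of a fixed point iteration. The function g(x) = x² − 2 below is a hypothetical example (its positive root is √2), not one taken from the text.

```python
def fixed_point_newton(g, g_prime, x0, tol=1e-10, max_iter=100):
    """Iterate x_{n+1} = x_n - g(x_n)/g'(x_n) until the update is tiny."""
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / g_prime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical example: a root of g(x) = x^2 - 2, i.e. sqrt(2)
root = fixed_point_newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, x0=1.0)
```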
For example, a CFD (Computational Fluid Dynamics) model of a part usually has a different kind of grid than a FE
(Finite Element) model. Hence, if one is to be used as a basis for the other,
some translation between grids needs to be made.
As a result of such difficulties in information management, among other
things, it is often very hard even to produce a design that meets all the
requirements. Due to limited resources (e.g. time), a design often goes
through only a few (system level) iterations. This implies that it is difficult to
perform optimization in traditional complex system design. An interesting
discussion of these issues is given by [Kroo, 1997].
In the concurrent engineering approach, the traditional division into spe-
cialist design departments is replaced by integrated product development
teams with a more multidisciplinary nature [Bartholomew, 1998]. MDO is
meant to enable such an integrated approach.
As mentioned, the goal of MDO is to provide a more formalized method
for complex system design. The general idea is that, by formalizing the mul-
tidisciplinary design process, it can be made amenable to numerical opti-
mization. Among other things, the use of numerical optimization is expected
to provide a considerable improvement in efficiency and robustness of the
design process, and it is expected to help in avoiding difficulties associated
with traditional design methods such as sequential design [Kroo, 1997].
Thus, numerical optimization plays a very large role in MDO.
Figure 2.1: The basic optimization process (source: [de Weck et al., 2007])
Now, you may wonder how an optimizer actually solves a problem. Per-
haps an example can shed some light on this issue.
Suppose you would like to know the location of the deepest point on the
bottom of a murky lake. However, you only have one hour to do it, and you
have nothing but a small rowboat at your disposal with a long stick that
can be used as a depth gauge. Furthermore, you don’t have a clue what the
bottom of the lake looks like, let alone where to find the deepest point...
This pretty much sums up the basic problem of optimization: You want
to find the maximum depth (or minimum height), but the resources at your
disposal are limited, and you have no idea what the depth ‘function’ looks
like. Of course, in the world of aircraft and spacecraft the problem is usually
much more complex, but we will get to that soon enough. For now, let us
consider your options for the problem stated above. First, some terminology intended to make our discussion clearer.
The surface of the lake represents the domain of all possible locations
for your rowboat, and is referred to as the design space. Let’s assume that
the depth of the lake can be represented by a function of the location on
the surface. This function is called the objective function, or simply the
objective. Your goal is to find the deepest point, in other words you want to
find the location at which the objective is maximized (or minimized). This
point is the optimum.
In order to know the depth at location Xq , you need to use the depth
gauge to feel for the bottom. This is called an evaluation of the objective
function: f (Xq ). Each function evaluation has a cost associated with it.
The cost may be expressed in terms of various resources, but in this context
it will be expressed in terms of time.
An evaluation is considered expensive if it requires a lot of time (e.g.
computation time for numerical simulations). A search method that requires
many function evaluations is also deemed expensive (unless the evaluations
are very cheap).
Now, the most obvious way to find the deepest point of the lake would
perhaps be to divide the surface into a fine grid of many thousands of points,
and then to evaluate the depth at every point. This would yield a map of the
entire bottom, perhaps similar to figure 2.2, from which the deepest point
could easily be found.
If scarcity of resources were not an issue, this kind of approach would
be very interesting [Nievergelt, 2000]. Unfortunately, this exhaustive search
method is not applicable in our case because it would be much too expensive:
Gauging the depth of a large lake at many thousands of locations, using a
rowboat, in one hour? Go figure.
A second option would be a kind of random search: starting from some location, you choose a random direction, move a certain distance, and gauge the depth. If the new point is deeper, choose a new random
direction and repeat, otherwise go back to the previous location, choose
another direction and gauge the depth again. This sequence is then repeated
until a better location is found. If no improvement can be found in any
direction, it would be possible to change the move distance. If still no
improvement is found, you may conclude that you have reached an optimum.
However, not only does this kind of method still rely on luck, it also
has another problem. For example, looking at figure 2.2, if you start on the
left-hand-side of the lake, it is likely that you arrive at a local deep spot and
would mistake this for the deepest point of the lake. The rest of the lake
would probably remain unexplored.
You might also choose to adopt a kind of pattern search strategy, in
which you evaluate (explore) points around your current location in some
predefined pattern. If you find a better (i.e. deeper) point, move there and
explore the points surrounding the new location. If not, reduce the distance
between points in the pattern and explore again. This sequence is then
continued until the distance between points becomes very small. Another
related method is that of [Hooke and Jeeves, 1961].
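A minimal sketch of such a pattern (compass) search follows; the 'lake bottom' function and all names are hypothetical stand-ins, with depth written as a height to be minimized.

```python
def compass_search(f, x, step=1.0, tol=1e-6):
    """Explore +/- step along each coordinate axis; move on improvement,
    halve the step when no neighbouring point is better."""
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5
    return x, fx

# Hypothetical 'lake bottom' expressed as a height to minimize; its lowest
# point is at (1, 2)
bowl = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2
x_best, f_best = compass_search(bowl, [0.0, 0.0])
```

Like the rowboat strategy, this uses only function values, and on a non-convex bottom it can still get stuck at a local deep spot.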
Now, in order to make things more interesting, suppose that you can
also use the depth gauge to determine the slope (gradient) of the bottom.
This extra information allows you to determine in which direction you need
to move to find a deeper point. Once this direction is established (direction
of steepest descent), you only need to move in that direction, gauging the
depth every few meters until you no longer find a deeper point. Then you
determine the slope again in order to find a new direction in which to search.
This method is known as line search or one-dimensional search.
Although gradient based methods can also become trapped in local deep
spots, they are usually orders of magnitude faster than direct search. There-
fore, this text will focus primarily on gradient based search methods, with
only a few excursions to direct search type methods.
The solution procedures or algorithms described above are well suited
for numerical implementation. The performance of an algorithm can then
be measured by the number of numerical operations required to solve a
specific problem. This subject is studied in the field of computing science,
and specifically that of computational complexity.
Although complexity theory is of little practical significance for us, it is
important to know at least a little something about it.
The cost of an algorithm can be expressed in terms of time, in seconds, for example, but also in terms of computational steps. The amount
of computational effort required by different algorithms is studied in the
field of computational complexity.
Although this is not the place for an in-depth discussion of computational
complexity theory, some of the basic concepts from this field do need to
be considered. This has to do mainly with the performance assessment of
optimization methods. Indeed, in order to make a statement about the
performance of an optimization algorithm on a specific problem, it would
help if we knew the performance limits for the problem, i.e. the minimum
and maximum possible cost for solving the problem.
For example, a matrix multiplication of two n × n matrices (A and B) will at least require 2n^2 computational steps, because the algorithm will at least have to look at all the input, and the number of entries in the two matrices is 2n^2. It can be shown that an upper limit (worst case complexity estimate) for the matrix multiplication problem is in the order of n^3 computational steps. If C = AB then C is also n × n, and for every entry in C, n multiplications need to be performed, hence n × n × n = n^3.
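This operation count is easy to verify by instrumenting the naive triple-loop algorithm; the function below is an illustrative sketch, not taken from the text.

```python
import random

def matmul_multiplications(n):
    """Multiply two random n x n matrices with the naive triple loop,
    counting the multiplications performed (always n^3)."""
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return ops
```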
An algorithm that has complexity in the order of some polynomial n^k, for large n, where n is the amount of input data and k is some number, is said to exhibit polynomial time performance. Such algorithms are classified
as fast, irrespective of the magnitude of k (so the calculations may still take
a very long time). A slow algorithm on the other hand would require more
than polynomial time to come up with a solution; for example, exponential time algorithms require in the order of k^n computational steps (figure 2.3 shows the difference in growth¹). If a problem can be solved in polynomial
time it is said to be easy, if the solution cannot be found in polynomial time
the problem is considered hard.
In the literature one may encounter references to special classes of problems called P and NP (Polynomial and Non-deterministic Polynomial, respectively). These are just two of many (488!) complexity classes [Aaronson,
2008]. The class P contains the set of all easy problems, i.e. all the problems
that can be solved in polynomial time. The class NP contains the problems
for which the solution may or may not be found in polynomial time, but for
which a solution can at least be checked in polynomial time. In other words,
P contains the problems for which it is easy to find a solution, whereas NP
contains the problems for which it is easy to check a solution (i.e. to see
whether the solution is correct), even though finding the solution may be
hard.
Problems considered in complexity theory are usually decision problems.
A decision problem is one that has only a ‘Yes’ or ‘No’ answer. Optimization
problems can often be reduced to decision problems, and it can be shown
¹ An interesting view on the implications of exponential growth is provided in the video lecture by [Bartlett, 2004].
Figure 2.3: The difference in growth between some polynomial function (n^k) and some exponential function (k^n), here n^9 versus 2^n. Depending on the value of k, the polynomial function may grow faster initially, but as n increases the exponential always 'wins'.
that if there is a fast way to solve the decision problem then there is a fast
way to solve the optimization problem [Wilf, 1994].
A decision problem is NP -complete if it is in NP and if every problem
in NP can be quickly reduced to it. This implies that if a fast solution to
an NP -complete problem could be found, it would be fast for every problem
in NP. The only downside is that such a solution has not been found (yet).
There is even a one-million dollar prize waiting for the person that does find
this solution.
A decision problem is NP -hard if it is at least as hard as the hardest
problems in NP, but it does not necessarily have to be in NP. Probably
the best-known example of an NP -hard problem is the Travelling Salesman
Problem (TSP) depicted in figure 2.4.
Suppose a travelling salesman needs to visit a certain number of cities.
The cost of travelling between each pair of cities is known. The travelling
salesman problem, posed as an optimization problem, is to find the route
that minimizes the travelling cost while satisfying the constraints that every
city has to be visited at least once and that the starting point is also the
end point (round-trip) [Cook, 2008].
The TSP can also be stated as a decision problem. For example: Given
the costs and some number c, decide whether there is a round-trip route
that is cheaper than c.
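The decision version can be brute-forced by enumerating all round trips, which takes factorial time; checking any single proposed tour against c, by contrast, is fast. That asymmetry is exactly what places the problem in NP. The 4-city cost matrix below is a made-up example (its cheapest round trip costs 7).

```python
from itertools import permutations

def tour_cheaper_than(cost, c):
    """TSP decision problem: is there a round trip through every city
    with total cost below c? Brute force tries all (n-1)! tours."""
    n = len(cost)
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        total = sum(cost[a][b] for a, b in zip(tour, tour[1:]))
        if total < c:
            return True
    return False

# Made-up symmetric 4-city cost matrix; the cheapest round trip costs 7
cost = [[0, 1, 4, 2],
        [1, 0, 3, 5],
        [4, 3, 0, 1],
        [2, 5, 1, 0]]
```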
Although we have merely touched the tip of the iceberg on the subject
of complexity, this is as far as we go. For more information, please refer to
e.g. [Wilf, 1994], [Aaronson, 2008], or [Papadimitriou, 1994].
By now we are ready to start looking into numerical optimization in
more detail.
Table 2.1: Classification of optimization problems
Characteristic Variations
Objective Function Structure Single-objective or Multiobjective
Model Structure Linear or Nonlinear
Constraint Existence Unconstrained or Constrained
Design Space Convexity Convex or Non-convex
Design Variable Type Real Cont./Discr. or Integer/Binary
Design Variable Nature Static or Dynamic
Parameter Nature Deterministic or Stochastic
Adaptation of Table 2.1 in [Jilla, 2002]
The general optimization problem can be stated in the following standard form:

minimize F(X) (2.3)
subject to:
g_j(X) ≤ 0, j = 1, 2, · · · , m (2.4)
h_k(X) = 0, k = 1, 2, · · · , l (2.5)
x_i^(L) ≤ x_i ≤ x_i^(U), i = 1, 2, · · · , n (2.6)

In this description,
X = (x_1, x_2, · · · , x_n)^T    (2.7)
is the design vector consisting of all the design variables xi , F is the (scalar)
objective, gj represents the inequality constraints, hk represents the equal-
ity constraints, and the other constraints are called side constraints. Side
constraints are actually inequality constraints, but in many cases it is more
convenient to treat them separately because they impose direct limits on the
design space (i.e. the collection of all possible design vectors, a.k.a. decision
space).
Notice that it is possible to get rid of an equality constraint by replacing it with two equal but opposite inequality constraints: h_k(X) = 0 is equivalent to requiring both h_k(X) ≤ 0 and −h_k(X) ≤ 0. This is important because some optimization methods are better equipped to handle inequality constraints.
The search space or feasible region is a subset of the design space where
all constraints are satisfied [Miettinen, 1999]. A solution to the optimiza-
tion problem must be feasible (i.e. it must be part of the feasible region),
otherwise it is of no use.
Figure 2.6: Convexity in functions. Panels: Convex, Concave, Non-convex. The region above a concave function represents a non-convex set, hence a concave function may have several local minima on its domain.
If the objective function and all constraint functions are convex, the feasible design space defines a convex set, and there will be only one optimum, which is the global optimum. A non-convex design space will have
multiple optima and there need not be a single global optimum (since mul-
tiple optima may have the same value). Necessary and sufficient conditions
for a global optimum are discussed in more detail in Section 2.2.6.
Keep in mind throughout this text that, in general, we will not have a
clear view of what the design space actually looks like. This implies that we
are more or less working blindfolded. In practical engineering problems it is
rarely possible to ensure that an absolute optimum will be found.
Starting from an initial design³, numerical optimization methods iteratively perturb the design in search of improvement:

X_q = X_{q−1} + δX_q    (2.8)

where q is the iteration number. Note that a new solution could also be
worse than the original, in which case it would obviously be rejected.
Numerical optimization methods differ in the way the required pertur-
bation δX is determined. The perturbation may be completely random or it
may be based on information about the sensitivity of the objective to change
in the design variables, i.e. gradient information. This leads to the impor-
tant distinction between gradient based methods and non-gradient based
methods.
³ Note that establishing an initial design is usually far from trivial unless the problem is very simple.
Gradient based methods use first order or second order gradient infor-
mation of the objective in order to determine the required perturbation in
the design variables, δX, hence these methods are referred to as first order
methods or second order methods respectively.
Non-gradient based methods, also known as direct search methods [Hooke
and Jeeves, 1961] or zero-order methods, only make use of objective function
values. They are called combinatorial methods if the set of feasible solutions
is discrete.
Non-gradient based methods are usually one or two orders of magnitude
more expensive, in terms of computational cost, than gradient based meth-
ods [Whitney et al., 2005]. Thus, if a gradient based method can be applied
this is usually preferred over non-gradient based methods. Nevertheless the
nature of the problem may complicate, or even prevent, the use of gradient
based methods, e.g. in the case of discrete variables, non-smooth functions,
or a non-convex design space.
One disadvantage of gradient based methods is their inability to explore
the design space. In a non-convex design space (multiple local optima) a
gradient based method will converge to a local optimum and then stop,
whereas non-gradient based methods often explore the whole design space
and are thus better suited for finding a global optimum, although they usu-
ally exhibit slow convergence. Global optimization algorithms may use a
combination of gradient based methods and non-gradient methods in or-
der to achieve a balance between exploration and exploitation of available
information [Neumaier, 2004], [Eskandari and Geiger, 2008].
If gradient based methods are used, the technique used to obtain the
gradient information is very important, because this has a large influence
on the number of function evaluations required for the optimization.
⁴ The Hessian matrix is not to be confused with the Jacobian matrix, a matrix containing the gradient vectors of a vector valued function (e.g. F(X)) as its rows.
A scalar valued function F(X), with X = (x_1, x_2, . . . , x_n)^T, has the vector valued gradient

∇F(X) = (∂F/∂x_1, ∂F/∂x_2, . . . , ∂F/∂x_n)^T

and its Hessian matrix⁴ is the n × n matrix of second partial derivatives,

H_F(X) = ∇²F(X),  with entries [H_F(X)]_ij = ∂²F/(∂x_i ∂x_j)
There are several ways to obtain gradient information, i.e. perform sensi-
tivity analysis. If the objective and constraints are well-behaved analytical
functions, an expression for the gradient can be found directly. However,
this is rarely the case in real life problems. Often complicated numerical
analysis tools are used which, in many cases, do not readily yield gradient
information.
In those cases where the objective is analyzed with the help of ‘black-
box’ tools that yield only function values, the simplest approach to obtaining
gradient information is via the method of finite differences (FD). Because
this is an approximate method it will give rise to inaccuracies, and as such
the method is prone to numerical instability.
A finite difference approximation of the gradient of a function of multiple
variables requires one extra function evaluation per variable, as becomes
clear from the (forward) finite difference scheme for the ith partial derivative
in ∇F [Pauw and Vanrolleghem, 2006]:
∂F/∂x_i ≈ [F(X + ∆X_i) − F(X)] / h_i    (2.9)
Note that ∆Xi is a vector with a single non-zero entry: a small pertur-
bation hi on the ith component of X. This implies that, if the objective is
a function of n variables, a finite difference approximation of ∇F requires n
perturbed function evaluations for every optimization iteration.
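A sketch of the scheme of Equation 2.9: one baseline evaluation plus one perturbed evaluation per variable yields the whole gradient. The quadratic test objective is a hypothetical example.

```python
def fd_gradient(F, X, h=1e-6):
    """Forward finite-difference gradient (cf. Equation 2.9): one baseline
    evaluation of F plus one perturbed evaluation per design variable."""
    F0 = F(X)
    grad = []
    for i in range(len(X)):
        X_pert = list(X)
        X_pert[i] += h  # Delta X_i: perturb only the i-th component by h
        grad.append((F(X_pert) - F0) / h)
    return grad

# Hypothetical objective F(X) = x1^2 + 3*x2; exact gradient is (2*x1, 3)
F = lambda X: X[0] ** 2 + 3.0 * X[1]
grad = fd_gradient(F, [1.0, 2.0])
```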
If a function evaluation is cheap, i.e. low computing cost, this is not
a big problem, but in real engineering problems complicated analysis tools
are often used, such as CFD (Computational Fluid Dynamics) or FE (Finite
Element) solvers, which can be quite computationally intensive⁵.
⁵ These codes are themselves often iterative in nature.
Furthermore, selection of the correct perturbation factor h is often a
problem. If the factor is chosen too small the method will suffer from numer-
ical inaccuracies because two almost-equal numbers are subtracted, whereas
a factor that is too large may result in instability due to nonlinearity of the
function.
Fortunately more advanced techniques exist for obtaining gradient in-
formation that offer significant advantages in terms of numerical accuracy
and/or computational burden.
Important methods are the Complex-step Derivative Approximation technique [Pauw and Vanrolleghem, 2006], the use of Global Sensitivity Equations (for multidisciplinary system sensitivity) [Sobieszczanski-Sobieski, 1990], the use of adjoint methods [Giles and Pierce, 2000], [McNamara et al., 2004], [Kaminski et al., 2005], and automatic differentiation (AD) [Verma, 2000], [Griewank, 2000], [Rall, 1981], [Bücker et al., 2006].
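Of these, the complex-step technique is perhaps the easiest to sketch: perturb the input along the imaginary axis and read the derivative off the imaginary part, df/dx ≈ Im f(x + ih)/h. Because no nearly-equal numbers are subtracted, h can be made extremely small. The test function below is a hypothetical example.

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-30):
    """Complex-step approximation df/dx ~ Im(f(x + i*h)) / h. No
    subtraction of nearly equal numbers, hence no cancellation error."""
    return f(complex(x, h)).imag / h

# Hypothetical test function f(z) = exp(z)*sin(z); f'(x) = exp(x)*(sin x + cos x)
d = complex_step_derivative(lambda z: cmath.exp(z) * cmath.sin(z), 1.0)
exact = math.exp(1.0) * (math.sin(1.0) + math.cos(1.0))
```

Note the contrast with Equation 2.9: here the result is accurate to working precision even for a perturbation as small as 1e-30.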
Sensitivity information can be calculated analytically using the direct
method or using the adjoint method. The computational burden for the
adjoint method depends on the number of functions for which the gradients
need to be calculated, rather than on the number of design variables. For the
direct method it is the other way round. Therefore, the adjoint approach
can be very efficient when the number of design variables is larger than
the number of functions, and otherwise the direct method may prove more
interesting [Martins, 2002]. These analytical approaches are discussed in
some more detail in Appendix 2.A.
Automatic differentiation (a.k.a. algorithmic differentiation) is a tech-
nique that makes use of the chain rule from calculus. Numerical analysis
programs, for example FE codes or CFD codes, can be adapted using AD
techniques to yield gradient information to working precision at an effective
cost of at most five extra function evaluations, regardless of the number of
independent variables (compared to n extra evaluations for FD!) [Griewank,
2000].
Any numerical analysis program, no matter how complex, is composed of basic mathematical operations such as ∗, +, sin, and exp. AD operates by
decomposing the program into these basic elements, evaluating their deriva-
tives (using actual numerical values, not symbolic variables), and using the
chain rule to recombine those values to form the gradient. This process may
be performed in forward mode or reverse mode (adjoint mode). The latter
is the method of choice for functions of many variables.
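The forward mode can be sketched with a minimal 'dual number' class that carries a value together with its derivative and applies the chain rule at every elementary operation. This toy supports only +, ∗ and sin, and all names are hypothetical; real AD tools also provide the reverse mode described above.

```python
import math

class Dual:
    """Forward-mode AD value: carries f(x) in .val and df/dx in .dot,
    applying the chain rule at every elementary operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule: d sin(u) = cos(u) * du
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# Derivative of x*sin(x) + 2x at x = 1, seeded with dot = 1
x = Dual(1.0, 1.0)
y = x * sin(x) + 2 * x
```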
Although AD methods yield gradient information accurate to working
precision at relatively low cost, they require specific adaptation of the ana-
lyzer code, which complicates their use with ‘black-box’ analyzers.
Another very powerful option for reducing computational cost, which
has benefits throughout the optimization process, is the use of surrogate
models such as response surfaces [Jones, 2001]. With the help of techniques
from design of experiments, a field focused on efficiently gathering useful
information, a number of design points are selected and the objective is
evaluated for these points using the full analysis model (e.g. a CFD model).
Then (hyper-) surfaces are fit to the design points, and these so-called
response surfaces are used as approximate models in the optimization, in-
stead of the original (CFD) model. The evaluation of a surrogate model
is much cheaper than the evaluation of the original model. The applica-
tion of response surface methods may even allow the use of computationally
expensive global optimizers [Laurenceau and Meaux, 2008].
The use of response surfaces, design of experiments and global optimiza-
tion are discussed at a later stage.
But no matter how the gradient information is obtained, once it is avail-
able, it becomes possible to use gradient based optimization methods.
Gradient based methods typically update the design vector according to

X_q = X_{q−1} + α* S_q    (2.10)
where the vector Sq represents a search direction and the scalar α∗ rep-
resents the amount of change in that direction necessary to minimize the
objective function F (Xq ) subject to the constraints. The asterisk denotes
an optimum.
Note that Equation 2.10 effectively expresses the design vector Xq as a
function of the single variable α. As a result, the objective and constraints
are also reduced to functions of a single variable, i.e. F (Xq ) = F (Xq (α)).
This single variable optimization problem can now be solved to find the
optimum perturbation δXq .
Thus, in order to find X_q, two steps need to be taken:

1. Determine a search direction S_q.
2. Perform a one-dimensional search along S_q to find the optimum step size α*.

Common methods for the one-dimensional search in step 2 are the golden section method and the polynomial approximation method [Vanderplaats, 2007]. It is very important to understand that this one-dimensional optimization is performed in every iteration q of the main optimization: The n-dimensional optimization problem is effectively transformed into a sequence of one-dimensional searches.
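A sketch of one such one-dimensional search: golden section shrinks the bracket [a, b] by a constant factor per iteration using objective values only. The quadratic objective and the fixed bracket are hypothetical choices.

```python
import math

def golden_section(phi, a, b, tol=1e-8):
    """Shrink the bracket [a, b] around the minimizer of a unimodal phi,
    using only function values (no gradients)."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio ~ 0.618
    c = b - inv_gr * (b - a)
    d = a + inv_gr * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv_gr * (b - a)
        else:
            a, c = c, d
            d = a + inv_gr * (b - a)
    return 0.5 * (a + b)

# Hypothetical objective; along S = (1, 0) from X = (0, 0), the minimum of
# F(X + alpha*S) lies at alpha = 3
F = lambda X: (X[0] - 3.0) ** 2 + (X[1] + 1.0) ** 2
X, S = [0.0, 0.0], [1.0, 0.0]
alpha_star = golden_section(lambda a: F([x + a * s for x, s in zip(X, S)]),
                            0.0, 10.0)
```

The wrapping lambda is exactly the reduction to a single variable α described above.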
Step 1, the selection of a search direction Sq , is where the various gra-
dient based optimization methods are distinguished from each other. In
this direction finding problem, there are two basic approaches, based on the
distinction between constrained and unconstrained optimization problems.
Direct optimization methods approach the constrained problem directly,
so they are able to take into account constraints explicitly. Examples of
direct gradient based methods are: sequential linear programming (SLP)⁶,
the method of feasible directions, the modified feasible directions method,
the generalized reduced gradient method, and sequential quadratic program-
ming (SQP).
SQP is a very powerful method which is often used. A concise description
of the method can be found in appendix 2.B.
Examples of direct non-gradient based methods are: genetic algorithms,
particle swarm methods, simulated annealing.
Indirect methods first reduce the constrained problem to an equivalent
unconstrained problem by producing a pseudo-objective function with the
help of penalty functions and/or Lagrange multipliers. Subsequently they
proceed by solving the unconstrained problem using well established meth-
ods for unconstrained optimization. This will be discussed in more detail in
the next section.
Due to the use of penalty functions, numerical ill-conditioning may arise.
In order to prevent this problem, the pseudo-objective is often optimized se-
quentially, starting out with small values for the penalty parameters and
then sequentially increasing these values. After each increase a new un-
constrained optimization needs to be performed. Methods that work like
this are referred to as Sequential Unconstrained Minimization Techniques
(SUMT) [Vanderplaats, 2007]. Examples of SUMT methods are: the ex-
terior penalty function method, various interior penalty function methods,
and the augmented Lagrange multiplier method.
Examples of first-order techniques for unconstrained optimization are:
the steepest descent method, the conjugate direction method, and variable
metric methods. Newton’s method is an example of a second-order method.
In order to obtain a better understanding of optimization algorithms we
will now proceed by examining one of the indirect (sequential unconstrained)
optimization methods in more detail.
⁶ SLP does not strictly use line search as defined in Equation 2.10; instead it minimizes a first-order Taylor (i.e. linear) approximation of the objective about the qth design point using a linear programming method (e.g. the simplex method) in order to find the perturbation vector δX directly.
2.2.5 An Example: The Exterior Penalty Function Method in Combination with the Method of Steepest Descent
The exterior penalty function (EPF) method is treated here because it is
well suited for clarifying the kind of nested-loop structure⁷ that occurs in
optimization algorithms. In practical applications the EPF method has been
replaced by more advanced sequential methods that include Lagrange mul-
tipliers (e.g. the Augmented Lagrange Multiplier method) [Vanderplaats,
2007].
The complete EPF algorithm as described here is depicted in Figure 2.7.
The figure clearly shows the nested iterative-loop structure that is typical
for numerical optimization algorithms.
Indirect methods employ a pseudo-objective, which incorporates the ob-
jective as well as the constraints, to transform the constrained problem into
an unconstrained problem. This can be done using a kind of weighted sum
of the constraint functions, called the penalty function P (X):
Φ(X, r_p) = F(X) + r_p [ Σ_{j=1..m} (max[0, g_j(X)])^2 + Σ_{k=1..l} (h_k(X))^2 ]    (2.12)
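As a sketch, Equation 2.12 translates almost directly into code; the objective and constraint below are made-up examples. Note that the penalty vanishes at feasible points, so Φ equals F there.

```python
def pseudo_objective(F, inequality, equality, rp):
    """Build Phi(X, rp) of Equation 2.12 from an objective, lists of
    constraint functions g_j(X) <= 0 and h_k(X) = 0, and a penalty rp."""
    def phi(X):
        penalty = sum(max(0.0, g(X)) ** 2 for g in inequality) \
                + sum(h(X) ** 2 for h in equality)
        return F(X) + rp * penalty
    return phi

# Made-up problem: minimize x^2 subject to g(x) = 1 - x <= 0
F = lambda X: X[0] ** 2
g = lambda X: 1.0 - X[0]
phi = pseudo_objective(F, [g], [], rp=100.0)
# phi equals F at feasible points and adds rp*g^2 at infeasible ones
```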
Figure 2.7: The Exterior Penalty Function method, using the method of Steepest Descent for unconstrained optimization and the Golden Section method in combination with polynomial approximation for the one-dimensional minimization.
Once the unconstrained minimization has converged, the penalty parameter r_p is increased by multiplying with some factor γ, and the resulting pseudo-objective is minimized again, starting from the previous optimum solution.
The new solution will still be infeasible but it will be closer to the true
optimum. This process is continued until a satisfactory result is obtained.
The solution is assessed by what is called a converger, a piece of logic that
compares subsequent solutions in order to determine whether convergence
is reached, i.e. the convergence criteria have been met.
This sequential process is illustrated in figure 2.8, using a two-dimensional
objective function and a single inequality constraint.
Notice that the optimum is approached from the infeasible region, hence
the name exterior penalty method. Interior methods approach the opti-
mum from inside the feasible region, which is often preferable because then
intermediate solutions can also be used.
The minimization of the pseudo-objective function is an unconstrained
minimization problem which can be approached using one of various well
known techniques. We will use one of the simplest of those in our EPF
algorithm: the method of steepest descent. The method of steepest descent
is a first-order unconstrained optimization method based on one-dimensional
search. It is treated here only because it is easy to explain, not because of its
performance (it has the worst performance of all the first-order methods).
As the name implies, the method of steepest descent uses the direction
of steepest descent as the search direction in the one-dimensional search.
The gradient represents the direction of steepest ascent, so the method of
steepest descent employs the negative gradient as the search direction:

S_q = −∇Φ(X_q)    (2.13)

Figure 2.8: The original constrained problem and the pseudo-objective with r_p = 0.01.
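The nested-loop structure of the EPF algorithm can be sketched as follows. A simple backtracking step stands in for the golden section search of Figure 2.7, and the one-constraint test problem (minimize x² subject to 1 − x ≤ 0, optimum at x = 1) is a made-up example.

```python
def steepest_descent(phi, grad_phi, X, tol=1e-5, max_iter=2000):
    """Inner loop: unconstrained minimization with S = -grad(phi) and a
    crude backtracking step (a stand-in for the 1-D golden section search)."""
    for _ in range(max_iter):
        S = [-g for g in grad_phi(X)]
        if max(abs(s) for s in S) < tol:
            break
        alpha, slope = 1.0, sum(s * s for s in S)
        while (phi([x + alpha * s for x, s in zip(X, S)])
               > phi(X) - 1e-4 * alpha * slope):
            alpha *= 0.5
        X = [x + alpha * s for x, s in zip(X, S)]
    return X

def exterior_penalty(F, grad_F, g, grad_g, X, rp=1.0, gamma=10.0, n_outer=6):
    """Outer loop: minimize Phi = F + rp*max(0, g)^2, then rp *= gamma
    (single inequality constraint for brevity)."""
    for _ in range(n_outer):
        phi = lambda X, rp=rp: F(X) + rp * max(0.0, g(X)) ** 2
        def grad_phi(X, rp=rp):
            gv = max(0.0, g(X))
            return [dF + 2.0 * rp * gv * dg
                    for dF, dg in zip(grad_F(X), grad_g(X))]
        X = steepest_descent(phi, grad_phi, X)
        rp *= gamma
    return X

# Made-up problem: minimize x^2 subject to g = 1 - x <= 0 (optimum x = 1)
X_opt = exterior_penalty(lambda X: X[0] ** 2, lambda X: [2.0 * X[0]],
                         lambda X: 1.0 - X[0], lambda X: [-1.0], [0.0])
```

Each outer iteration leaves the solution slightly infeasible, so the iterates approach the optimum from the infeasible side, as described above.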
In the previous section, the pseudo-objective was introduced as a com-
bination of objective and constraints. The Lagrangian represents another
unconstrained formulation of the constrained optimization problem. The ob-
jective function and constraint functions are combined into a single function
using Lagrange multipliers (λ) as weighting parameters. The Lagrangian is
defined as [Vanderplaats, 2007]:
L(X, λ) = F(X) + Σ_{j=1}^{m} λj gj(X) + Σ_{k=1}^{l} λ_{m+k} hk(X)    (2.14)
The Kuhn-Tucker necessary conditions for optimality are:

X∗ is feasible
λj gj(X∗) = 0 for j = 1, 2, · · · , m, with λj ≥ 0
∇L(X∗, λ) = 0
The first condition is rather obvious: any optimum solution must be feasible.
The second condition may seem somewhat harder to understand. It is
important to remember the standard formulation of the optimization prob-
lem from section 2.2.1. This formulation dictates that an inactive constraint
has a negative (nonzero) value. In this light, we may interpret the second
condition as a requirement that the value of the Lagrangian is equal to the
value of the objective function. Again note that equality constraints must
always be active for a feasible solution, and hence always have zero value.
The third condition requires that, at the optimum, the gradient of the
Lagrangian must be equal to zero. This is equivalent to saying that the
function described by the combined active constraints (with multipliers) is
tangent to the objective function. If this were not the case, a move along the
constraint boundary would still improve the objective value. Alternatively
we may say that, at the optimum, the gradient of the objective can be
expressed as a linear combination of the active constraint gradients, with
the Lagrange multipliers as the coefficients. More formally, the gradient of
the objective is entirely contained in the subspace spanned by the gradients
of the constraints.
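The latter interpretation can be checked numerically. In this hypothetical sketch, the multipliers are recovered by solving ∇F + Σ λj ∇gj = 0 in a least-squares sense at a known optimum; the example problem is an assumption made purely for illustration.

```python
import numpy as np

# Hypothetical problem: minimize F = (x1 - 2)^2 + (x2 - 2)^2
# subject to g1 = x1 + x2 - 2 <= 0.  The optimum is X* = (1, 1), g1 active.
x_star = np.array([1.0, 1.0])
grad_F = np.array([2 * (x_star[0] - 2), 2 * (x_star[1] - 2)])  # = (-2, -2)
grad_g1 = np.array([1.0, 1.0])              # gradient of the active constraint

# Third condition: grad F + sum_j(lambda_j * grad g_j) = 0.
# Recover the multipliers by least squares over the active constraint gradients:
A = np.column_stack([grad_g1])
lam, *_ = np.linalg.lstsq(A, -grad_F, rcond=None)
residual = grad_F + A @ lam                 # gradient of the Lagrangian at X*
# lam >= 0 and residual = 0: the objective gradient lies in the
# subspace spanned by the active constraint gradients
```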
The latter interpretation may be clarified by figures 2.9, 2.10, and 2.11.
These figures show examples of a two dimensional constrained optimization
problem with linear inequality constraints. The active constraint gradients
at the optimum are depicted in figure 2.9. Figure 2.10 shows the vector
sum that is represented by the gradient of the Lagrangian for two active
constraints. Figure 2.11 shows the same thing for a problem with one con-
straint inactive at the optimum. From the latter it also becomes clear that
the Lagrange multiplier for the inactive constraint needs to be equal to zero.
As the name implies, the Kuhn-Tucker necessary conditions are necessary
for a solution to be an optimum, but they are not sufficient. Only if the
design space is convex is the optimum guaranteed to be global; otherwise the
Kuhn-Tucker conditions guarantee only local optimality.
Remember from the discussion on convexity that the design space is
convex if the objective as well as the constraints are convex functions. This
requires that their Hessian matrices be positive semi-definite (i.e. the matrices
have no negative eigenvalues) on the entire domain. This is the multi-variable
equivalent of the requirement in single-variable minimization that the second
derivative be non-negative.
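A practical, if partial, check is to sample the domain and test the Hessian's eigenvalues at each point. The functions below are illustrative assumptions, not examples from the text.

```python
import numpy as np

def convex_on_samples(hessian, points, tol=1e-12):
    """Heuristic convexity check: the Hessian must be positive semi-definite
    (all eigenvalues >= 0) at every sampled point of the domain."""
    return all(np.linalg.eigvalsh(hessian(p)).min() >= -tol for p in points)

# F(x) = x1^2 + x1*x2 + 2*x2^2 has the constant Hessian [[2, 1], [1, 4]],
# whose eigenvalues 3 +/- sqrt(2) are both positive, so F is convex.
hess_convex = lambda x: np.array([[2.0, 1.0], [1.0, 4.0]])
# [[2, 3], [3, 2]] has eigenvalues 5 and -1, so that function is not convex.
hess_saddle = lambda x: np.array([[2.0, 3.0], [3.0, 2.0]])

pts = [np.random.uniform(-5.0, 5.0, 2) for _ in range(100)]
```

Sampling can only disprove convexity, never prove it, which mirrors the remark above that demonstrating design space convexity is hardly ever possible in practice.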
Unfortunately, in practical design applications it is hardly ever possible
to demonstrate design space convexity. Therefore the usual approach is to
start the optimization process from several different initial points and to see
whether they all converge to the same final design.
Figure 2.9: A two dimensional problem with two active linear inequality constraints
at the optimum. The vector sum of the constraint gradients and the objective gra-
dient is not equal to zero. The shaded area represents the infeasible region.
Figure 2.10: A two dimensional problem with two active linear inequality constraints
at the optimum. The constraint gradient vectors are scaled using the Lagrange
multipliers, so that the vector sum of the constraints exactly opposes the gradient
of the objective function. Thus, the gradient of the Lagrangian vanishes at the optimum.
Figure 2.11: A two dimensional problem with one active linear inequality constraint
at the optimum. The Lagrange multiplier for the inactive constraint (λ2) must be
equal to zero at the optimum for the gradient of the Lagrangian to vanish.
The global optimization problem can also be tackled using zero-order meth-
ods.
Figure 2.12: An example of a non-convex problem: Langermann’s function. This
and other test functions for global optimization can be found in literature, e.g. in
[Molga and Smutnicki, 2005] (a somewhat obscure document which nevertheless
provides a clear and practical overview of test functions).
Figure 2.13: The difference between an interpolating fit (cubic spline interpolation)
and a non-interpolating fit (third-degree polynomial fit) through a set of observations
of F(X).
better as the number of samples (data points) increases. This is because the
surface passes through all the sample points, so the estimation error at the
sample points equals zero.
Non-interpolating functions have the advantage that they are less sensi-
tive to noise in the data, but the fit can only be as good as allowed by the
shape function that is used. The addition of data points does not necessarily
improve the fit. For example, a second-degree polynomial function cannot
accurately fit a third-degree polynomial at all points.
For data obtained from numerical simulations the amount of noise is
usually small, which implies that interpolating methods are the best choice
[Jones, 2001].
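The point about fitting capacity can be demonstrated with plain NumPy: a second-degree polynomial fit to samples of a cubic leaves a nonzero error at the sample points, while an interpolating polynomial of degree n − 1 reproduces them exactly. The sample locations and the cubic are arbitrary choices for illustration.

```python
import numpy as np

# Sample a third-degree polynomial at 6 points.
x = np.linspace(0.0, 5.0, 6)
y = x**3 - 4 * x**2 + x + 2

# Non-interpolating fit: a second-degree polynomial cannot match a cubic,
# so the error at the sample points is nonzero.
quad = np.polyval(np.polyfit(x, y, 2), x)
err_quad = np.abs(quad - y).max()

# Interpolating fit: a degree-(n-1) polynomial through all n points
# reproduces the samples exactly (zero error at the sample points).
interp = np.polyval(np.polyfit(x, y, 5), x)
err_interp = np.abs(interp - y).max()
# err_quad is substantial, while err_interp is at round-off level
```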
In the following example, let us assume we are dealing with a com-
putationally expensive CFD model (e.g. requiring one day per function
evaluation). Obviously a zero-order optimization method requiring several
hundreds or thousands of function evaluations cannot be used to find an
optimum, because of the amount of computing time that would be required.
Therefore a response surface approximation of the original CFD model is
constructed. This will allow the use of a zero order method. An illustra-
tion of the optimization process using response surface approximations is
depicted in figure 2.14.
In order to construct a response surface approximation of the CFD
model, response data are required. These data are generated by evaluat-
ing the CFD model at a number of sample points, which yields a number of
observations. Because the CFD evaluations are very expensive, it is impor-
Figure 2.14: Flowchart of the optimization process using response surface approximations:
construct the Response Surface Approximation (RSA), check for convergence, and if not
converged, set n = n + 1 and repeat; exit when converged.
Figure 2.15: A two-dimensional Latin Hypercube Sample of size n=8 (i.e. 8 points
in the two-dimensional design space). For each of the two variables, the domain is
divided into n intervals of equal probability (in this case for a uniform distribution)
represented by the horizontal and vertical lines, and a single value is randomly
selected from each interval. Then the variables are shuffled (called pairing) in order
to obtain a better spread of samples. Pairing may be random as in this example, or
it may be according to some scheme that ensures higher uniformity.
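The sampling procedure described in the caption can be sketched as follows. The unit hypercube, the uniform distribution, and random pairing are the assumptions here.

```python
import numpy as np

def latin_hypercube(n, dims, seed=None):
    """Latin Hypercube Sample of size n on the unit hypercube [0, 1]^dims.
    Each variable's domain is split into n intervals of equal probability,
    one value is drawn at random from each interval, and the columns are
    then independently shuffled (random pairing)."""
    rng = np.random.default_rng(seed)
    # one random value inside each interval [i/n, (i+1)/n)
    samples = (np.arange(n)[:, None] + rng.random((n, dims))) / n
    for d in range(dims):                       # random pairing
        samples[:, d] = rng.permutation(samples[:, d])
    return samples

pts = latin_hypercube(8, 2, seed=42)
# each column has exactly one value in each of the 8 intervals
```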
Once a number of observations have been obtained, it becomes possible
to construct a response surface approximation based on these points. A
powerful interpolating predictor that can be used here is the Kriging pre-
dictor.
Kriging is based on a statistical approach, using the mean and variance
of the available data together with the correlation between sample points
to predict the function value at locations that have not yet been sampled.
Kriging also provides a measure for estimating the error in the prediction.
This enables the selection of new sample points at locations where there is
high uncertainty about the predicted function value. Detailed descriptions
of the method can be found, for example, in [Jones, 2001], [Queipo et al.,
2005], [Simpson and Mistree, 2001], or [Swiler et al., 2006]. A disadvantage
of Kriging is that the predictor may become ill defined if the distance be-
tween sample points becomes very small, which may occur as the number
of samples increases.
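A minimal ordinary-Kriging sketch with a Gaussian correlation model may make this concrete. The correlation parameter θ, the tiny nugget term (added precisely to mitigate the ill-conditioning just mentioned), and the toy data are illustrative assumptions; practical implementations, as described in the cited references, also estimate θ and the prediction error.

```python
import numpy as np

def kriging_fit(X, y, theta=10.0):
    """Ordinary Kriging with a Gaussian correlation model,
    R_ij = exp(-theta * ||x_i - x_j||^2). Returns a predictor function."""
    X, y = np.atleast_2d(X), np.asarray(y, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(len(y))   # tiny nugget for stability
    Rinv = np.linalg.inv(R)
    ones = np.ones(len(y))
    mu = ones @ Rinv @ y / (ones @ Rinv @ ones)        # generalized mean
    w = Rinv @ (y - mu)

    def predict(x):
        r = np.exp(-theta * ((X - np.atleast_1d(x)) ** 2).sum(-1))
        return mu + r @ w                              # mu + r(x)' R^-1 (y - mu)
    return predict

# one-dimensional toy data
Xs = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
ys = np.sin(2 * np.pi * Xs[:, 0])
pred = kriging_fit(Xs, ys)
# the Kriging predictor interpolates: pred(x_i) equals y_i at every sample
```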
After constructing the response surface, it is used as a surrogate model in
the optimization routine: the (global) optimizer searches for the optimum
of the response surface model. The resulting optimum will be close to the
real optimum only if the response surface provides a good fit to the real
response. To check this, the CFD model is evaluated at the new
‘optimum’ design point. This yields a new observation, which is then used
to update the response surface approximation, and the process is repeated
until convergence.
Apart from the global optimization problem, there are other types of
problems that cannot be tackled by gradient based methods without special
precautions. Discrete variable optimization is one example.
search methods are used for routing problems, which occur in aerospace in
the form of wire routing or pipe routing, for example. Examples of these
methods are discussed in appendix 2.C, but here we will just focus on classic
branch-and-bound.
Branch and bound methods can be very expensive, requiring a multitude
of nonlinear optimization tasks to be performed, but they can be applied to
all kinds of discrete variable or mixed variable problems.
A branch and bound solution procedure may be visualized with the help
of a solution tree, such as the one depicted in figure 2.16. Such a tree consists
of nodes and branches.
The nodes represent states; optimum solutions to a continuous relaxation
of the problem. Continuous relaxation implies that the discrete variables are
treated as if they were continuous, which enables the use of a gradient based
optimizer.
At each node, one of the variables is fixed at an allowable discrete value,
after which a continuous relaxation of the problem is optimized with respect
to the remaining variables.
At node 1 the optimization needs to be initialized, so all of the variables
are treated as continuous, there is no fixed variable. The result of this
continuous optimization is used to determine the fixed variables for nodes 2
and 3, and so on. Nodes on the same branching level or tier, like nodes 2
and 3 in figure 2.16, have the same fixed variable.
The branches represent relationships between nodes and each branch
corresponds to a different allowable discrete value for the fixed variable at
that tier. For example, the branch from node 1 to node 2 in figure 2.16
corresponds to the fixed variable X1 = l1 .
The branching strategy determines how many branches are generated
(‘grown’) at each node. It is possible to have as many branches as there are
allowable discrete values for the variable under consideration. On the other
hand it is also possible to have only two branches per node, as depicted in
the figure.
In the latter case, the branches correspond to the nearest allowable dis-
crete neighbors of the continuous optimum value for the fixed variable under
consideration. For example, at the first tier X1 is the fixed variable. Let’s
say that the allowable discrete values for X1 are [0.12, 0.23, 0.45, 0.49], and
the continuous optimum value for X1 (from node 1) is 0.3401. The continu-
ous optimum value is then in the interval [0.23, 0.45], hence these values are
used as the bounds for the two branches: l1 = 0.23 and u1 = 0.45.
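The selection of the nearest allowable discrete neighbors can be expressed compactly with the standard library; the numbers below are the ones from the example in the text.

```python
import bisect

def nearest_discrete_bounds(value, allowed):
    """Return the two allowable discrete values bracketing a continuous
    optimum, used as the branch bounds (l_i, u_i) at a tier."""
    allowed = sorted(allowed)
    i = bisect.bisect_right(allowed, value)
    return allowed[max(i - 1, 0)], allowed[min(i, len(allowed) - 1)]

# continuous optimum 0.3401 falls in the interval [0.23, 0.45]
l1, u1 = nearest_discrete_bounds(0.3401, [0.12, 0.23, 0.45, 0.49])
# l1 = 0.23, u1 = 0.45
```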
Numerous branch and bound strategies exist, varying in the number
of branches per node and in the order in which the nodes are evaluated
(traversal order). Three popular traversal strategies are breadth-first, depth-
first, and best-first. Breadth-first and depth-first are depicted in figure 2.17.
The tree from figure 2.16 is an example of a two-branch, best-first strat-
egy. This example is now discussed in more detail, using a step by step
Figure 2.16: Example of a branch and bound solution tree with two branches and
‘best-node-first’ selection criterion. Each node represents a state, i.e. a solution
to a continuous relaxation of the discrete problem. The node numbers indicate the
traversal order. At each level of branching another variable is fixed.
Figure 2.17: Node traversal orders for the breadth-first (left) and depth-first (right)
strategies.
approach.
Let’s assume that neither of the two nodes can be pruned, so we need to
select one of them to continue branching. In this example we use
the best-first strategy, which implies that we select the node with the best
objective function value (obtained from the continuous relaxation problem).
Let’s say that node 2 has the best objective function value.
In the second tier, X2 becomes the fixed variable. The nearest discrete
bounds on the continuous optimum value for X2 (from node 2) are l2 and u2 .
These values are used to construct the two branches at node 2. However,
note that the upper bound on X1 still remains. The bounds on the free
variables are inherited by lower branches.
Now the new nodes (4 and 5) are evaluated by optimizing the continuous
relaxation of the problem with X2 = l2 and X2 = u2 respectively, both
subject to the constraint that X1 ≤ l1 . Node 4 turns out to be infeasible,
so this branch is pruned.
Again the new best node is selected, taking into account all nodes that
have not been fathomed yet (nodes 3 and 5). Now it appears that node 5 is
the most promising. Thus we start the third tier, where X3 is fixed. The
bounds are again inherited from the parent node, so that now X1 ≤ l1 and
X2 ≥ u2.
After optimization we find that node 7 is infeasible, and node 3 has a
better optimum than node 6, so we continue branching from there. Note
that we return to the second tier where X2 is the fixed variable and now X1
is bounded on the lower side by u1 .
Let’s say node 8 turns out to be infeasible, and the continuous optimum
at node 9 turns out to be a discrete solution. This represents the first
incumbent solution. However, the (continuous) objective value at node 6 is
lower than that for the incumbent, so it is still possible that something can
be gained there.
At the fourth tier we return to X1 as the fixed variable. The inherited
bound on X1 is now removed and new bounds are established. Bounds on
the other two variables are X2 ≥ u2 and X3 ≤ l3 . After optimization we
find another discrete solution at node 10.
The continuous optimum at node 11 has a worse value than the current
incumbent, so this branch can be pruned because it cannot offer any further
improvement. The new discrete solution (node 10) does have a better objective
value than the current incumbent (node 9), so node 10 finally represents
the optimum solution to the discrete variable optimization problem.
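The complete procedure can be sketched for a deliberately simple separable problem, minimize Σ(xi − ci)² with each xi restricted to a set of allowable levels. For this problem the continuous relaxation over box bounds has a closed-form solution (clamping), so no nonlinear optimizer is needed; the problem and data are illustrative assumptions, but the two-branch, best-first strategy is the one described above.

```python
import heapq
import itertools
import math

def branch_and_bound(c, levels):
    """Two-branch, best-first branch and bound for the separable problem
        minimize sum_i (x_i - c_i)^2  with  x_i restricted to levels[i].
    The continuous relaxation over box bounds [lo, hi] has the closed-form
    solution x_i = clamp(c_i, lo_i, hi_i)."""
    n = len(c)
    tie = itertools.count()                      # tie-breaker for the heap

    def relax(lo, hi):
        x = [min(max(c[i], lo[i]), hi[i]) for i in range(n)]
        return sum((x[i] - c[i]) ** 2 for i in range(n)), x

    lo, hi = [min(L) for L in levels], [max(L) for L in levels]
    f0, x0 = relax(lo, hi)
    heap = [(f0, next(tie), lo, hi, x0)]
    incumbent, best = None, math.inf

    while heap:
        f, _, lo, hi, x = heapq.heappop(heap)
        if f >= best:                            # bound: prune this node
            continue
        free = [i for i in range(n) if x[i] not in levels[i]]
        if not free:                             # all values discrete:
            incumbent, best = x, f               # new incumbent solution
            continue
        i = free[0]                              # fix this variable next
        inside = [v for v in levels[i] if lo[i] <= v <= hi[i]]
        below = [v for v in inside if v <= x[i]] # nearest discrete neighbors
        above = [v for v in inside if v >= x[i]]
        if below:                                # branch x_i <= l_i
            new_hi = hi[:]
            new_hi[i] = max(below)
            fb, xb = relax(lo, new_hi)
            heapq.heappush(heap, (fb, next(tie), lo[:], new_hi, xb))
        if above:                                # branch x_i >= u_i
            new_lo = lo[:]
            new_lo[i] = min(above)
            fa, xa = relax(new_lo, hi)
            heapq.heappush(heap, (fa, next(tie), new_lo, hi[:], xa))
    return incumbent, best

allowed = [0.12, 0.23, 0.45, 0.49]
x_best, f_best = branch_and_bound([0.3401, 0.7], [allowed, allowed])
# x_best places each coordinate at its nearest allowable level
```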
This concludes the introduction to optimization methods. One very im-
portant question remains, however: How do we choose between the different
optimization methods?
2.2.9 Selection of an Optimization Method
In selecting an optimization method, one rule is of paramount importance:
Any information about the nature of the problem should be taken into ac-
count in the selection of an optimization strategy.
No matter how good a method may be, if it is applied to a problem it
is not suited for, it may even perform worse than a random search. There
is no single optimization algorithm that performs equally well on all possible
problems; in other words, there is no such thing as a general-purpose,
universal optimization strategy.
These statements are derived from the No-Free-Lunch theorem of op-
timization. Another way of stating this theorem is that “there can be
no search algorithm that outperforms all others on all problems” [Ho and
Pepyne, 2002]. The essence of the theorem is not that all algorithms are
equally good, but rather that an algorithm cannot be expected to perform
better than any other if you do not take into account the nature of the
problem you are trying to solve. Although this theorem may seem of little
practical use, it provides a very clear warning: If you just choose an algo-
rithm blindly (or based on the wrong premises), chances are it will perform
even worse than random search.
When comparing optimization methods based on their performance on a
specific example problem, it is important to be careful. Such a performance
comparison cannot always be extrapolated to other kinds of problems. No-
free-lunch tells us that such a comparison can only provide certainty if the
problem you are trying to solve is similar to the example problem [Ho and
Pepyne, 2002].
A choice in which the nature of the problem is of obvious importance
is that between gradient based methods and non-gradient based methods.
Remember that even the best non-gradient based methods are one or two
orders of magnitude more expensive, computationally, than gradient based
methods. Therefore, if gradient based methods can be used, use them.
Information available about a problem will usually lead to a number
of methods that can be used. In many cases the only way to know which
specific optimization method is best for a given problem is by trial and
error. A useful discussion on selection of optimization methods is provided
by [Keane and Nair, 2005] (Chapter 16). In most cases, selection of an
optimization strategy is an optimization problem in itself [Miettinen, 1999].
This becomes even more apparent when looking at problems with mul-
tiple objectives.
2.3 Optimization for Multiple Objectives
2.3.1 Why Multiobjective Optimization is More Difficult than
Single Objective Optimization
In the previous section we dealt with single objective optimization. How-
ever, in real-life engineering problems there are often multiple aspects that
influence the desirability of the final solution. For example, an aircraft needs
to be lightweight, but it also needs to be as cheap as possible. Often (not
always!) a lighter structure will be more expensive to manufacture and to
maintain. In those cases lightness and low cost are conflicting requirements.
If an optimization problem consists of multiple objectives, it is called
(not surprisingly) a multiobjective optimization (MOO) problem. In mul-
tiobjective optimization, we aim to minimize all the objective functions si-
multaneously. The reason this is more complicated than single objective
optimization is that the objectives are often conflicting, as described in the
example above.
If there were no conflicts between objectives, then every objective could
be optimized independently and there would be no need for special multiob-
jective methods. Hence, in this text, a multiobjective optimization problem
has conflicting objectives by definition (although not all objectives need to
be conflicting).
Because of the conflicting nature of multiobjective optimization prob-
lems, no single solution exists that is optimal with respect to every objective
function [Miettinen, 1999]. This characteristic of MOO problems is reflected
in the concept of Pareto optimality, which is discussed in Section 2.3.3. But
first let’s have a look at a general representation of the multiobjective opti-
mization problem.
In its general form, the multiobjective optimization problem reads: minimize
F(X) = [F1(X), F2(X), . . . , Fk(X)]ᵀ subject to X ∈ S. The design vectors X
belong to the set S, which represents the feasible region as defined by the
constraints. The constraints are not explicitly defined here, in order to keep
things simple. Notice that F is a vector of k objective functions in the
objective space.
Multiobjective optimization deals with the objective space rather than
the design space, because it is much easier to compare objective values than
to compare design vectors. This will become clear in the following section
on Pareto optimality. Furthermore, the objective space is usually of lower
dimension than the design space (i.e. k < n). The image of the feasible
region, Z = F(S), is a subset of the objective space and is called the feasible
objective region.
MOO problems can be linear (MOLP) or nonlinear (MONLP), similar to
single objective problems. A multiobjective optimization problem is convex
if all objective functions and the feasible region (the constraint functions)
are convex [Miettinen, 1999], just like in single objective optimization.
Figure 2.18: Example of a space telescope design comparison. Source: [Jilla, 2002]
The Kuhn-Tucker necessary conditions for Pareto optimality (using only
inequality constraints) are:

X∗ is feasible
λj gj(X∗) = 0 for j = 1, 2, · · · , m
aspiration levels, i.e. values for each objective function in the objective
vector that he or she thinks are desirable (or satisfactory). A design solution
that meets all the aspiration levels is called a satisficing solution.
Alternatively, a decision maker is often assumed to act on the basis of
an underlying (implicit) function called a value function, which represents
his/her preferences. If this value function could be expressed mathemati-
cally it would provide an ideal selection tool because it would reduce the
multiobjective problem to a single objective problem. Unfortunately it is
rarely possible to express the value function in mathematical terms, due to
its subjective nature [Hazelrigg, 1997]. But even if this were possible the
function would probably be too complex to handle. Therefore the value
function (if it exists) is often assumed to be known only implicitly.
As mentioned, trade-offs can be used as a tool in the decision making
process. A trade-off reflects the ratio of change in objective function values
when moving from one design vector to the other. The (partial) trade-off
between objective functions Fi and Fj for a move from design vector X1 to
X2 is defined by [Miettinen, 1999]:
Λij = (Fi(X1) − Fi(X2)) / (Fj(X1) − Fj(X2))    (2.19)

The corresponding trade-off rate at a point X∗ is given by the partial
derivative:

λij = ∂Fi(X∗)/∂Fj    (2.20)
minimized. Depending on the way this is done, the solution can be guar-
anteed to be Pareto optimal. After the optimization, the solution is offered
to the decision maker, who can then decide whether to keep it or discard
it. Basic methods which rely more on the decision maker are the weighting
method and the ε-constraint method.
The weighting method represents a straightforward way of scalarizing the
multiobjective problem. The objective functions are combined in a weighted
sum which is then minimized using a single objective technique. The weight-
ing coefficients corresponding to the individual objectives should reflect the
value function of the decision maker. The solution to a weighting problem
can be shown to be Pareto optimal if the weighting coefficients are all pos-
itive or if the solution is unique. Some authors advise against the use of
the weighting method because weight allocation heavily influences the final
solution, which leads to unpredictable results [Vanderplaats, 2007].
In the ε-constraint method, one of the objectives is selected to be opti-
mized, and the others are converted into constraints by setting upper bounds
(εj ) to each of them. By changing the objective to be optimized and by
varying the upper bounds, theoretically, all Pareto optimal solutions can
be obtained. Hybrid methods, combining the weighting method with the
ε-constraint method also exist.
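Both scalarizations can be illustrated on a hypothetical one-variable biobjective problem chosen so that the scalarized optima have closed forms (an assumption made for illustration; real problems require a numerical optimizer).

```python
import numpy as np

# Biobjective toy problem: F1(x) = x^2, F2(x) = (x - 2)^2 on x in [0, 2].
# The two objectives conflict, and every x in [0, 2] is Pareto optimal.

# Weighting method: minimize w*F1 + (1-w)*F2.  Setting the derivative
# 2*w*x + 2*(1-w)*(x - 2) to zero gives the closed form x* = 2*(1 - w).
weights = np.linspace(0.05, 0.95, 10)
pareto_w = 2.0 * (1.0 - weights)

# epsilon-constraint method: minimize F1 subject to F2 <= eps.
# The constraint requires x >= 2 - sqrt(eps), and since F1 = x^2 is
# increasing on [0, 2], the solution is x* = max(0, 2 - sqrt(eps)).
epsilons = np.linspace(0.1, 4.0, 10)
pareto_e = np.maximum(0.0, 2.0 - np.sqrt(epsilons))
```

Sweeping the weights (or the bounds ε) traces out Pareto optimal points, which is exactly how these scalarizations are used in practice.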
Lexicographic ordering and Goal programming are examples of meth-
ods in which the decision maker is asked to express his/her expectations
beforehand.
In Lexicographic ordering, the decision maker has to order the individual
objectives according to their absolute importance. The most important ob-
jective is then optimized, subject to the original constraints. If this problem
has a unique solution, this is selected as the final solution to the problem.
If there is no unique solution, the second most important objective is op-
timized, subject to the original constraints and an extra constraint which
ensures that the first objective remains at its optimum value. If this prob-
lem has a unique solution this is selected as the final solution, if not, the
process is continued for the next most important objective, and so on and
so forth. This is a robust and simple method that always yields a Pareto
optimal solution. Lexicographic ordering may be used as a part of the goal
programming method.
In the generalized goal programming method a decision maker specifies
aspiration levels for the objectives, and any deviations from these aspiration
levels are minimized. An objective function combined with an aspiration
level forms a (flexible) goal. This can be interpreted as a constraint which
is not strictly imposed, but for which the constraint violation is minimized.
The original constraints, the ones that define the feasible region, are referred
to as rigid goals in this context. The goal deviations are represented by
deviational variables. As there is a deviational variable for each objective,
the introduction of goals only leads to a reformulation of the multiobjective
problem: Instead of having to minimize the objectives, we now have to
minimize the goal deviations. This new multiobjective problem can be solved
using techniques such as the weighting method or Lexicographic ordering.
Goal programming is widely used in practical applications. The selection
of aspiration levels determines whether the resulting solutions are Pareto
optimal.
Interactive methods, in which the decision maker is intimately involved
in the optimization process, are the most developed methods and usually
yield the best results. Numerous interactive methods exist, based on various
approaches, but their detailed description is beyond the scope of this text.
One of the most well-known interactive methods is the Geoffrion-Dyer-
Feinberg (GDF) method [Miettinen, 1999]. This method is based on max-
imization of the implicitly known value function. At each iteration in the
GDF approach, a local approximation to the value function is generated
and maximized. The approximation to the (implicit!) value function is
found by using the marginal rates of substitution specified by the decision
maker to approximate the gradient of the value function. The approximated
value function is then maximized using a gradient based method. The big
drawback of this method is the difficulty the decision maker may have in
determining the marginal rates of substitution in each iteration (because the
value function is only implicitly known).
Now that the basics of single objective optimization and multiobjective
optimization have been covered, we can proceed with the subject of multi-
disciplinary design optimization.
subsystems that coherently exploits the synergism of mutually interacting
phenomena’.
Both definitions indicate a systematic approach (i.e. methodology) to the
design of complex engineering systems with mutually interacting phenomena.
The second definition specifically refers to the synergetic effects 11 that may
result from the interactions between subsystems.
Traditional multidisciplinary design (MD) relies heavily on engineering
experience and heuristic procedures (common sense). In most cases however,
the multidisciplinary design problem is of such scale and complexity that it
is very hard even to meet the design requirements, let alone actively exploit
the synergetic potential that is present in the complex system.
In order to unlock this synergetic potential, numerical optimization tech-
niques can be used. MDO in general tries to formalize the multidisciplinary
design problem so as to make it amenable to solution by numerical opti-
mization techniques. This formalization of the MD process requires careful
analysis and knowledge of the technical aspects as well as the organizational
aspects of the system. For systems such as aircraft, with their immense
complexity, this is quite a challenge indeed, as is clearly explained by [Kroo,
1997].
There are many examples of applications of MDO in industry, but gen-
erally these concern only a small part of the total design problem (e.g.
[Gilmore et al., 2002], [de Weck et al., 2007]). MDO has not yet reached a
state of maturity that would allow it to be used to address the entire design
process. Moreover, it should be noted that MDO is not intended to provide
fully automatic design capability. MDO is a tool that can help guide the
design process in the right direction by allowing the engineers and managers
to focus on the creative aspects of design.
A key aspect of MDO is the decentralization of the multidisciplinary
design process, which decreases the reliance on a single leader to drive the
design. In order to achieve such decentralization, the complex design process
needs to be partitioned or decomposed into smaller subprocesses that can
operate separately as much as possible. Once a proper decomposition has
been found, some form of coordination needs to be imposed in order to guide
the MD process towards its goal.
This may sound pretty vague, so let us try to clarify things by having a
closer look at the subjects of complexity, decomposition, and coordination.
between those subsystems. This complexity may be a result of large scale
(e.g. many subsystems, very large numbers of variables), but also of inter-
actions between subsystems [Allison, 2004]. A system that exhibits such
interactions is said to be coupled.
In the context of computing science, coupling between subsystems (also
modules, disciplines, processes, and functions) is defined as the measure of
strength-of-association established by a connection from one subsystem to
another [Stevens et al., 1974]. This coupling may be loose (also weak and
low ) or tight (also strong and high). Loosely coupled systems have minimal
interdependence between individual subsystems.
The strength of a coupling can depend on various aspects of the in-
formation that is interchanged between subsystems. For example, a large
amount of information flowing from one subsystem to the other may con-
stitute a strong coupling (quantity), but a single piece of very important
information may have a similar effect (quality). Conversely, if a subsystem
is only slightly influenced by a large change in the information it obtains
from another subsystem (sensitivity), this coupling may be characterized as
weak.
In addition to strength, a coupling also has direction. The direction
of a coupling is indicated using the terms feed-forward and feedback. The
presence of feed-forward between two subsystems implies that they have
to operate sequentially, i.e. one subsystem has to finish before the other
can start. The presence of both feed-forward and feedback implies that
the subsystems have to operate iteratively, e.g. subsystem A depends on
subsystem B for input, but B also depends on A for input, thus B has
to wait for A to finish, but when B is finished A has to start again, and
so on and so forth, until some measure of convergence is satisfied. Iterative
processes are especially difficult to cope with, and they have to be initialized
using some kind of initial guess.
It is possible to characterize systems according to the presence of feed-forward
and feedback relations.
types of relations, whereas a decoupled system contains only feed-forward
relations. This type of system can operate much faster than a coupled one,
due to the absence of iterative loops. The subsystems can operate separately
but not simultaneously, because of the feed-forward relations [Kusiak and
Wang, 1993]. An uncoupled system is one that has neither feed-forward nor
feedback. The subsystems are fully independent and can operate separately
and simultaneously. This represents the ideal case, because it leads to the
greatest reduction in throughput time.
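These categories can be read off a dependency matrix directly. The sketch below assumes the tasks are listed in execution order, with below-diagonal entries taken as feed-forward and above-diagonal entries as feedback; this convention is an assumption for illustration.

```python
def classify_coupling(dsm):
    """Classify a system from a binary dependency matrix whose tasks are
    listed in execution order: dsm[i][j] == 1 means task i uses output
    of task j.  Entries with j < i are feed-forward; j > i are feedback."""
    n = len(dsm)
    feed_forward = any(dsm[i][j] for i in range(n) for j in range(i))
    feedback = any(dsm[i][j] for i in range(n) for j in range(i + 1, n))
    if feedback:
        return "coupled"        # iteration between subsystems required
    if feed_forward:
        return "decoupled"      # sequential execution, no iteration
    return "uncoupled"          # fully independent subsystems

# A depends on nothing; B uses A's output; C uses B's output: decoupled.
seq = [[0, 0, 0],
       [1, 0, 0],
       [0, 1, 0]]
# B also feeds back into A: coupled, so A and B must iterate.
loop = [[0, 1, 0],
        [1, 0, 0],
        [0, 1, 0]]
```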
Coupled systems may also be classified according to their coupling ar-
chitecture. In a non-hierarchic system there are no restrictions on the way
the subsystems are coupled, whereas in an hierarchic system there exists
a natural information hierarchy (information is transferred between ‘par-
ent’ subsystems and ‘children’ subsystems, but there is no coupling between
Figure 2.20: (a) Object-based and (b) aspect-based decomposition.
[Wagner and Papalambros, 1993], also [Bloebaum, 1991], [Kusiak and Wang,
1993].
There are different ways to subdivide a system. For example, a system
can be divided into subsystems, modules, and components (object-based), or
into disciplines according to physical aspects (aspect-based), as illustrated
in figure 2.20 [Tosserams, 2008]. Often combinations of the two are used.
For example, in the aircraft industry, design teams can be divided into
disciplinary groups (e.g. structures, aerodynamics, etc.), which are then
subdivided into subsystem groups (e.g. wing structure, fuselage structure)
or component groups. It is also possible to do it the other way round, first
dividing the teams according to subsystems (e.g. wing, fuselage, etc.), and
then dividing each subsystem according to discipline (e.g. wing aerodynam-
ics, wing structure).
The decomposition process can be facilitated using tools from the won-
derful world of systems engineering. Most importantly, the technical system
(or corresponding design process) can be represented in a compact matrix
form that makes it amenable to analytical (numerical) operations.
There are different kinds of these matrices, but the best known
is the Design Structure Matrix (DSM)¹² [Steward, 1981] or N² diagram
(although the latter term is also used to indicate a specific kind of DSM
[Browning, 2001]).
The basic DSM is a square binary matrix (i.e. entries can be true or
false), with the subsystems or design tasks arranged on the diagonal, as
depicted in figure 2.21. The order of execution of the subsystems or tasks
is reflected by their position on the diagonal, starting top-left and ending
bottom-right.

¹² In fact, [Steward, 1981] starts out with a precedence matrix, then partitions this
matrix into block triangular form, after which he determines where the feedback relations
need to be torn. He refers to the decomposed matrix as the Design Structure Matrix.

[Figure 2.21: A basic DSM with subsystems S1 through S8 arranged on the
diagonal]

After partitioning, feedback relations are confined to the diagonal blocks;
between the blocks there can be feed-forward relations (and there usually are many). Thus, a block
triangular DSM is empty above the diagonal of blocks but can have non-
empty entries below the diagonal of blocks. If the blocks are treated as single
subsystems, the DSM becomes a lower triangular matrix, which explains the
term block triangular [Warfield, 1973]. A lower triangular DSM represents
a decoupled system (i.e. without feedback).
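This classification can be read directly off a binary DSM. The following sketch (plain Python; the function name and example matrix are ours, not from the text) uses the convention that entries below the diagonal are feed-forward relations and entries above the diagonal are feedback relations:

```python
# Classify a system from its binary DSM, given as a list of rows of 0/1.
# Convention (matching the text): an entry below the diagonal is feed-forward,
# an entry above the diagonal is feedback, and the diagonal holds the
# subsystems themselves (ignored here).

def classify_dsm(dsm):
    n = len(dsm)
    feedforward = any(dsm[i][j] for i in range(n) for j in range(i))
    feedback = any(dsm[i][j] for i in range(n) for j in range(i + 1, n))
    if feedback:
        return "coupled"    # feedback present: iteration is required
    if feedforward:
        return "decoupled"  # only feed-forward: sequential, no iteration
    return "uncoupled"      # no couplings: subsystems fully independent

# Three subsystems where S2 needs output from S1: feed-forward only.
dsm_sequential = [[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1]]
print(classify_dsm(dsm_sequential))   # decoupled
```

A lower triangular DSM thus always classifies as decoupled, and a purely diagonal one as uncoupled.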
The block triangular DSM is decoupled with respect to the blocks, but
within blocks feedback still occurs. However, if some of these feedback rela-
tions can be broken, the system may be partitioned further. This breaking
or tearing of relations is not a trivial task. On the contrary, it involves ex-
tensive knowledge of technical aspects of the system that is to be designed,
as well as knowledge of the organization that has to produce the design.
It is important to understand that real-life design problems are often so
complex that it becomes very hard, if not impossible, to capture the entire
process in a nicely structured format such as the DSM. Often it is only
possible to represent parts of the process this way, and often this is done
once the process is already in place.
An example of a DSM for a real design problem is depicted in figure
2.22. This figure shows the partitioned system with subsystem blocks on the
diagonal. The figure clearly shows the tightly coupled nature of the blocks,
as well as some remaining high-level feedback relations between blocks. By
tearing some of these relations, the system could (theoretically) be decoupled
further.
As mentioned before, tearing a coupling relation (feed-forward and/or
feedback) implies that some form of coordination becomes necessary in order
to ensure that the resulting system remains consistent. Tearing a relation
may allow subsystems to operate sequentially or in parallel, but if too many
relations are broken the coordination task may become too difficult.
In the next section we will have a closer look at various coordination
strategies.
Figure 2.22: Example of a DSM, in block triangular form, for a semiconductor
design problem. This clearly illustrates the complexity of real-life design problems.
Note that not all feedback can be removed from the design process. (source: [Eppinger et al., 1994]).
Coordination is needed both between disciplines and within disciplines. Figure 2.23 depicts the general idea of coordination.
Figure 2.23: Coordination of data flow within a partitioned system (source: [Alli-
son, 2008]).
Assume that the multidisciplinary design process and its disciplinary pro-
cesses incorporate models that require iterative solution procedures. The
behavior of these models is described by means of state equations. The in-
ternal state of such a model, e.g. the deformation of a loaded structure, is
represented by state variables. Based on the state variables, residuals can
be computed. These residuals provide a measure for convergence.
A process is said to have converged if a consistent solution has been
found, which implies that the residuals must vanish (i.e. become equal to—
or close enough to—zero). Note that the internal state of a system is not
only a function of the state variables, but also of the input variables and
internal parameters. Convergence is taken care of by the coordinator.
We define a coordinator to be a subsystem that manages coupling re-
lations in order to ensure system consistency. A multidisciplinary coordi-
nator (MDC) takes care of interdisciplinary couplings (between disciplines),
and a disciplinary coordinator (DC) takes care of intradisciplinary couplings
(within disciplines). Note that a coordination block may have direct feed-
through. Furthermore, initialization is also considered a task that is part of
coordination.
In our discussion, interdisciplinary couplings are represented by system
state variables (usually called coupling variables in literature), and intradis-
ciplinary couplings are represented by discipline state variables (usually just
called state variables in literature).
By treating coupling variables as state variables, we can construct a
consistent description of a multidisciplinary design process for the purpose
of clarifying the differences between various MDO architectures.
Let us now define an evaluator to be a process that performs a sin-
gle calculation of the residuals, the (new) state, and the output, based on
internal parameters, input variables, and the (old) state variables. This def-
inition is valid on the system level (multidisciplinary evaluator, MDE) as
well as on the discipline level (disciplinary evaluator, DE), and possibly on
lower levels as well. Depending on the problem, the output can consist of
objective values, constraint values, gradient information, coupling function
values, internal state, residuals, etc.
The combination of an evaluator and a coordinator in an iterative loop
is referred to as an analyzer. The coordinator makes sure that the system
is driven to convergence by an iterative process such as fixed-point
iteration (FPI, described in appendix ??). It uses the residuals and
state calculated by the evaluator to guide the convergence process. Since
the analyzer delivers a consistent solution, there is no need to provide the
residuals and state as output. The definition of an analyzer also holds on
the system level (multidisciplinary analyzer, MDA) and on the discipline
level (disciplinary analyzer, DA). In general, we may say that evaluation is
cheap, but analysis is expensive because it involves many evaluations.
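As a minimal illustration of these terms, the sketch below (pure Python; the coupled state equations are hypothetical, chosen only so that fixed point iteration converges) implements an evaluator that returns the new state and the residuals, and a coordinator that iterates it until the residuals vanish; together they form an analyzer:

```python
# A toy "analyzer" = coordinator + evaluator in an iterative loop.
# Hypothetical coupled state equations (fixed point at s1 = s2 = 2):
#   s1 = 0.5 * s2 + 1,   s2 = 0.5 * s1 + 1

def evaluate(state):
    """Evaluator: a single calculation of the new state and the residuals."""
    s1, s2 = state
    new_state = (0.5 * s2 + 1.0, 0.5 * s1 + 1.0)
    residuals = tuple(abs(n - o) for n, o in zip(new_state, state))
    return new_state, residuals

def analyze(initial_state, tol=1e-10, max_iter=100):
    """Coordinator: drive the evaluator to convergence (fixed point iteration)."""
    state = initial_state
    for _ in range(max_iter):
        state, residuals = evaluate(state)
        if max(residuals) < tol:
            return state              # consistent solution: residuals vanish
    raise RuntimeError("FPI did not converge")

s1, s2 = analyze((0.0, 0.0))          # the initial guess the text mentions
print(round(s1, 6), round(s2, 6))     # 2.0 2.0
```

A single call to `evaluate` is cheap; `analyze` is expensive precisely because it makes many such calls.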
An analysis run is based on a given design, which is determined by the
design variables—in this context we consider design to be the process of
defining the properties of a system, whereas analysis is considered to be the
process of determining the behavior of that system [Vanderplaats, 2007]. In
MDO, the task of determining the values of the design variables (decision
authority), based on objective function values and constraint values, falls
in the hands of the optimizer. Again we can distinguish between a multi-
disciplinary optimizer (MO) and a disciplinary optimizer (DO). As we will
see later, it is also possible for an optimizer to take over the task of the
coordinator.
Summarizing, we now have the following basic building blocks:
• the evaluator, which performs a single calculation of the residuals, the
new state, and the output
• the coordinator, which manages coupling relations in order to ensure
system consistency
• the analyzer, which combines an evaluator and a coordinator in an
iterative loop to deliver a consistent solution
• the optimizer, which controls the design variables and can also take
over the coordination task
With the help of these building blocks, it is possible to describe some basic
strategies for MDO, starting with single-level strategies and moving on to
multilevel strategies.
Different single-level MDO strategies have different ways of coordinating
the state variables. By treating state variables as design variables, i.e. by
letting the optimizer control the state variables in addition to the design
variables, a compromise can be found between optimization problem complexity
and analysis complexity. This leads to three basic strategies: MDF,
IDF, and AAO, which are now discussed in more detail.
For clarity we will assume that all analyses are performed using some
form of fixed-point iteration. Based on this assumption, we can define a
residual as a measure for the change in state between two subsequent iterations:
r = |s − s∗|. This is a rather intuitive choice, but many other kinds
of residuals can be used. Furthermore, we assume that all the constraints
are posed as inequality constraints (g ≤ 0).
MDF
Perhaps the most straightforward implementation of MDO is the addition of
a system optimizer to an existing multidisciplinary analysis process (MDA).
The system optimizer controls the design variables (xS and all xi ), and
at every optimization step a fully coupled multidisciplinary analysis is per-
formed. Objective values, constraint values, and gradients are calculated as
part of the multidisciplinary analysis.
In this approach, disciplinary consistency and multidisciplinary consistency
are maintained throughout the optimization process (the optimizer
is not aware of the coupling relations within the system; these are managed
by separate coordinators). This method is often referred to as MDF
(MultiDiscipline Feasible, where the word ‘feasible’ refers to consistency)
[Cramer et al., 1994]. Other designations are All-In-One (AIO) [Kodiyalam
and Sobieszczanski-Sobieski, 2001] and Fully Integrated Optimization (FIO)
[Keane and Nair, 2005].
A schematic representation of a basic MDF architecture for two disci-
plines is depicted in figure 2.24. The figure shows a multidisciplinary analy-
sis process (MDA, enclosed by a dashed line) guided by a system optimizer
(SO). The MDA consists of two disciplinary analysis processes (DA 1 and
DA 2, also enclosed by dashed lines) that are coupled via a system coor-
dinator (SC). Each disciplinary analysis comprises a disciplinary evaluator
(DE) and a disciplinary coordinator (DC).
The system optimizer determines values for the system design variables
xS and the disciplinary design variables xi , based on some system objective
and constraints. For convenience we will assume that the system objective
is simply a combination of the disciplinary objectives fi and that the con-
straints to be satisfied are just the disciplinary constraints gi . Depending on
the type of optimization routine used, the optimizer may also require gradi-
ent information, but for the sake of clarity this is not taken into account in
our discussion.
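The structure of MDF can be sketched in a few lines of Python. Everything in the sketch is a hypothetical stand-in (the two disciplines, the objective, and the crude grid-scan "optimizer"); the point is the shape of the loop: every objective evaluation triggers a full, converged multidisciplinary analysis.

```python
# MDF in miniature. Hypothetical coupled disciplines:
#   Discipline 1:  s1 = x + 0.5 * s2
#   Discipline 2:  s2 = 0.25 * s1
# Objective: f = (s1 - 2)^2, minimized when s1 = 2, i.e. at x = 1.75.

def mda(x, tol=1e-12):
    """Multidisciplinary analysis: iterate the coupled state to consistency."""
    s1 = s2 = 0.0
    while True:
        s1_new = x + 0.5 * s2
        s2_new = 0.25 * s1_new
        if abs(s1_new - s1) < tol and abs(s2_new - s2) < tol:
            return s1_new, s2_new
        s1, s2 = s1_new, s2_new

def objective(x):
    s1, _ = mda(x)        # every evaluation pays for a full coupled analysis
    return (s1 - 2.0) ** 2

# Crude stand-in for the system optimizer: scan a grid of candidate designs.
best_x = min((0.01 * i for i in range(300)), key=objective)
print(round(best_x, 2))   # 1.75
```

Note that even if such a loop is stopped early, the best design found so far is consistent, because every candidate went through a full MDA.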
[Figure 2.24: The multidiscipline feasible (MDF) approach: a system optimizer
(SO) guiding a multidisciplinary analysis (MDA), in which two disciplinary
analyses (DA 1, DA 2), each consisting of a disciplinary coordinator (DC) and a
disciplinary evaluator (DE), are coupled via a system coordinator (SC)]
The MDF architecture can also be represented by the design structure
matrix in figure 2.25. The processes are located on
the diagonal and the (most important) information flow between processes
is specified. Again, entries above the diagonal represent feedback, whereas
entries below the diagonal represent feed-forward. In other words, a process
provides the entries in its column and it receives the entries in its row.
For example: disciplinary coordinator 1 (DC 1) receives r1 and s1 , and it
provides s∗1 .
[Figure 2.25: DSM representation of the MDF architecture, with SO, SC, DC 1,
DE 1, DC 2, and DE 2 on the diagonal]
IDF
If the multidisciplinary coordinator is removed from the multidisciplinary
analysis process (MDA), it essentially becomes a multidisciplinary evalua-
tion process (MDE). This implies that multidisciplinary consistency is no
longer maintained throughout the optimization process, even though disci-
plinary consistency is still maintained. In addition to the optimization task,
the system optimizer now becomes responsible for ensuring multidisciplinary
consistency.
This is achieved by treating the system input states as design variables.
Consistency is enforced by introducing an equality constraint for every system
state variable, stating that the system input states and corresponding
system output states are equal, or in other words, that the system residuals
vanish (rij = s∗ij − sij = 0 for all i, j). Thus, multidisciplinary consistency
is only attained when these equality constraints are satisfied.
This approach is usually referred to as IDF (Individual Discipline Feasi-
ble) [Cramer et al., 1994], because the individual disciplines are consistent
at every optimization step. The disciplines are uncoupled, allowing them to
operate separately and simultaneously. An alternative designation is Dis-
tributed Analysis Optimization (DAO) [Keane and Nair, 2005].
[Figure 2.26: The individual discipline feasible (IDF) approach: full disciplinary
analysis guided and coordinated by a system optimizer]
[Figure 2.27: DSM representation of the IDF architecture]
An individual discipline feasible (IDF) optimization problem for a two-
discipline system
Minimize:
f (f1 , f2 )
with respect to xS, x1, x2, s∗S (where s∗S = [s∗12; s∗21])
Subject to:
g1 ≤ 0
g2 ≤ 0
r12 = s∗12 − s12 = 0
r21 = s∗21 − s21 = 0
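This formulation can be sketched concretely (hypothetical disciplines, pure Python). The essential IDF ingredients are the optimizer-controlled copies of the coupling variables (the s∗ of the text, here `s12_t` and `s21_t`) and the residual equality constraints:

```python
# IDF in miniature. Hypothetical discipline evaluations:
#   Discipline 1 produces s12 = x + 0.5 * s21
#   Discipline 2 produces s21 = 0.25 * s12
# The optimizer controls guesses s12_t, s21_t alongside x, so the disciplines
# can be evaluated separately and simultaneously.

def discipline_1(x, s21_t):
    return x + 0.5 * s21_t        # coupling output s12

def discipline_2(s12_t):
    return 0.25 * s12_t           # coupling output s21

def idf_constraints(x, s12_t, s21_t):
    """Residual equality constraints r_ij, the difference between the guessed
    and the computed coupling values, which the optimizer must drive to zero."""
    r12 = s12_t - discipline_1(x, s21_t)
    r21 = s21_t - discipline_2(s12_t)
    return r12, r21

# At a consistent point (x = 1.75, s12 = 2.0, s21 = 0.5) both residuals vanish:
print(idf_constraints(1.75, 2.0, 0.5))   # (0.0, 0.0)
```

Away from such points the disciplines disagree with the guesses, so a design is only multidisciplinary-consistent where the optimizer has satisfied these constraints.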
AAO
The All-At-Once (AAO) approach goes one step further than IDF, removing
not only the multidisciplinary coordinator but also the disciplinary coordi-
nators [Cramer et al., 1994]. The system optimizer now has the responsibil-
ity of ensuring not only multidisciplinary consistency but also disciplinary
consistency. Usually this can only be done at the optimum. AAO could
therefore be interpreted as a ‘no-discipline feasible’ approach. Other des-
ignations are Simultaneous Analysis and Design (SAND) [Keane and Nair,
2005], and Optimization Based Decomposition [Kroo, 1997].
The system optimizer now controls the disciplinary input states as well.
Just like in IDF, multidisciplinary consistency is achieved by introducing
coupling constraints in the form of equality constraints for the system residu-
als (rij = 0). Now disciplinary consistency is also maintained by introducing
equality constraints for the discipline residuals: ri = 0.
Note that the number of variables and coupling constraints can easily be-
come very large, complicating the optimization problem. On the other hand,
a single optimization step requires only evaluations, which are much cheaper
than analyses. Thus, where MDF involves a relatively simple optimization
problem with an expensive coupled multidisciplinary analysis, AAO involves
a difficult optimization problem with relatively cheap uncoupled disciplinary
evaluations.
The AAO strategy for this simple system can also be formulated as an
optimization problem:

Minimize:
f (f1, f2)
with respect to xS, x1, x2, s∗S, s∗1, s∗2

Subject to:
g1 ≤ 0
g2 ≤ 0
r12 = s∗12 − s12 = 0
r21 = s∗21 − s21 = 0
r1 = 0
r2 = 0

[Figure 2.28: The all-at-once (AAO) approach: disciplinary evaluators guided
and coordinated by a system optimizer]

[Figure 2.29: DSM representation of the AAO architecture]
These single level MDO strategies provide a good starting point for ex-
plaining the fundamental aspects of MDO, but they are often hard to imple-
ment in real life design processes. For a large part this is due to the presence
of centralized decision authority (a single system optimizer that controls all
the design variables). As system scale increases, this problem becomes more
apparent.
Real organizations require distributed decision authority for many different
reasons. For example, different design departments are often located at
separate geographical locations (as are subcontractors). Also, disciplinary
experts often do not tolerate too much centralized decision authority. Fur-
thermore, as system complexity increases the amount of information that
has to be handled by the system optimizer simply becomes unmanageable
[Kroo, 1997]. For these reasons, multilevel MDO strategies have been developed.
CO
Collaborative Optimization (CO) is a bi-level MDO strategy, with a system
optimizer and separate disciplinary optimizers. The basic architecture is
depicted in figure 2.30, and the relations are again specified using a DSM
representation in figure 2.31.
The system optimizer sets a target for the disciplinary optimizers and the
disciplinary optimizers try to match this target as closely as possible. This
allows disciplines to operate separately and simultaneously: no information
is exchanged between disciplines.
[Figure 2.30: The collaborative optimization (CO) architecture: a system
optimizer (SO) above two disciplinary optimizers (DO 1, DO 2), each guiding its
own disciplinary analysis (DA 1, DA 2)]
The system optimizer controls the system design vector xS and the system
state vector sS (which contains the interdisciplinary couplings).
Figure 2.31: CO (Note that in this example f1 and f2 are communicated directly to
the system optimizer. This is just one possible implementation, chosen for consis-
tency with our descriptions of single-level architectures. This direct communication
is not depicted in figure 2.30, but we assume direct feed-through via the discipline
optimizers.)
practical in real problems.
Note that constraint gradient information for the system level optimiza-
tion problem is easily obtained in analytical form. Information from post-
optimality analysis of the disciplines can be used here [Braun and Kroo,
1993].
The CO approach can be described as a set of formal optimization problems,
one at the system level and one at the discipline level.
System level
Minimize:
fS (f1 , f2 )
with respect to z, where z = [xS ; sS ]
Subject to:
R1 ≤ 0
R2 ≤ 0
Discipline level (i = 1, 2)
Minimize:
Ri = |zi − z|²
with respect to zi , xi
Subject to:
gi ≤ 0
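The discipline-level objective Ri is just a squared discrepancy between the local copy zi and the system-level targets z, which can be sketched directly (illustrative Python, hypothetical numbers):

```python
# The CO discipline-level objective: R_i = |z_i - z|^2. Each discipline
# minimizes the squared distance between its local copy z_i of the shared
# variables and the targets z set by the system optimizer.

def discrepancy(z_local, z_target):
    return sum((a - b) ** 2 for a, b in zip(z_local, z_target))

# A discipline that can match the targets exactly contributes R_i = 0;
# otherwise R_i measures how far the targets are from being attainable.
print(discrepancy([1.0, 2.0], [1.0, 2.0]))   # 0.0
print(discrepancy([1.5, 2.0], [1.0, 2.0]))   # 0.25
```

At the system level, the constraints R1 ≤ 0 and R2 ≤ 0 can then only hold when every discipline matches its targets exactly, since Ri is a sum of squares and therefore never negative.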
If a multidisciplinary analysis process already exists for a
given problem, then it could be easiest to use MDF. An added advantage of
MDF is that sub-optimal solutions may still be used, because consistency
is always maintained. If MDF is too expensive one of the other strategies,
such as IDF or AAO, might yield better results.
Multilevel strategies are often better suited for use in large complex or-
ganizations, because they allow distributed decision authority. However,
multilevel strategies are usually more prone to convergence issues. For ex-
ample, the robustness of the CO method depends heavily on the way it is
implemented, as described by [Alexandrov and Lewis, 2002].
Several aspects have been identified that are important for the successful
implementation of MDO strategies in large-scale projects [Sobieszczanski-
Sobieski and Haftka, 1997], [Tosserams, 2008]. An ideal MDO strategy would
have the following characteristics:
• Disciplinary design autonomy: The strategy should allow the use of
available expertise and legacy design tools. Local decision authority
should be respected.
• Flexibility: The strategy should be easily adaptable to a specific or-
ganization.
• Mathematical rigor: The strategy should yield reliable and consistent
results, and the optimality of the results should be provable.
• Efficiency: The strategy should lead to an optimal solution in a min-
imum number of iterations and it should minimize design time (e.g.
by concurrency of tasks).
The CO strategy appears to possess most of these characteristics, except
for the mathematical rigor [?].
In our discussion of MDO strategies, the optimizer itself was treated as
a black box. However, the choice of optimization method also influences the
success of the MDO strategy. In this respect, what was said about the selec-
tion of single-objective and multiobjective optimization methods also holds
for MDO. For example, the use of SQP for the system level optimization
problem in CO may cause problems if not used correctly [Alexandrov and
Lewis, 2002].
This concludes our discussion of MDO. Obviously, a lot of ground (ac-
tually most of it) has been left uncovered in this introduction, and the
interested reader is urged to investigate further with the help of recent lit-
erature.
This chapter has given an introduction to the optimization of complex
design problems. By no means does it provide a comprehensive overview of
the subject; this much should be clear. For more in-depth information about
any of the topics touched upon here, the reader is referred to literature.
A very good reference work that covers all of the aforementioned topics
(and then some) in more detail, is provided by [Keane and Nair, 2005].
However, the field of optimization in general, and that of MDO in particular,
is very dynamic. New methods and improvements are developed all the time,
so it is important always to keep an eye on recent developments.
An interesting discussion of the current status and future of MDO is
given by [de Weck et al., 2007]. Other useful information sources are:
[MDOB, 2008], [de Weck and Willcox, 2004], [Venkataraman, 2002], [Anton-
sson and Cagan, 2001]. For a taxonomy of numerical optimization methods,
visit the NEOS Guide and have a look at the Optimization Tree [Optimiza-
tion Technology Center, 2008].
One very important issue in optimization is the fidelity of analysis mod-
els. Even the ‘best’ optimization methods can only optimize the mathemat-
ical problem description provided to them. Thus, an optimizer may come
up with a perfect optimum for a specific problem description which is math-
ematically correct, but if the mathematics do not accurately represent the
real physical problem, this optimum may be of no use whatsoever.
For example, if stall behavior were neglected in the aerodynamic model
of an aircraft, an optimizer might come up with a solution that requires
unrealistically large angles of attack.
In MDO, model fidelity is even more important due
to the scale of the problems and the number of interactions between
disciplines. There is always a chance that an MDO process would exploit some
mathematical characteristic of the problem description that has no physical
significance, leading to bogus synergetic effects, for example.
The applicability of a solution is also influenced by uncertainties in the
model. Special techniques exist for dealing with uncertainties in optimiza-
tion, but these are beyond the scope of this text. For more information on
this subject refer to e.g. [Beyer and Sendhoff, 2007].
Although the field of numerical optimization has been around for a long
time, the MDO discipline is still relatively young. As a result there is no
real consensus yet about what constitutes MDO and how best to represent
different MDO strategies. Also, be aware that some of the definitions and
classifications described in this text are not unique. Often different authors
use different labels for the same thing.
Again, it is important to keep an eye on recent developments, especially
in the field of MDO.
Bibliography
tic or Deterministic?, pages 125–137. Springer-Verlag Berlin Heidelberg,
2003.
Martin Bücker, George Corliss, Paul Hovland, Uwe Naumann, and Boyana
Norris. Automatic Differentiation: Applications, Theory, and Implemen-
tations, volume 50 of Lecture Notes in Computational Science and Engi-
neering. Springer-Verlag Berlin Heidelberg, 2006.
Trends in Multidisciplinary Design Optimization. In Proceedings of the
48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics,
and Materials Conference, number AIAA 2007-1905, Honolulu, Hawaii,
April 2007.
Fred Glover. Tabu Search - Part II. ORSA Journal on Computing, 2(1):
4–32, 1990.
Fred Glover, Eric Taillard, and Dominique de Werra. A user’s guide to tabu
search. Annals of Operations Research, 41(1):3–28, March 1993.
Y.C. Ho and D.L. Pepyne. Simple Explanation of the No-Free-Lunch The-
orem and Its Implications. Journal of Optimization Theory and Applica-
tions, 115(3):549–570, December 2002.
R. Hooke and T.A. Jeeves. ”Direct Search” Solution of Numerical and Sta-
tistical Problems. Journal of the Association for Computing Machinery,
8(2):212–229, April 1961.
C.Y. Lee. An Algorithm for Path Connections and its Applications. IRE
Transactions on Electronic Computers, EC-10(2):364–365, 1961.
Thomas W. Malone and Kevin Crowston. The interdisciplinary study of
coordination. ACM Computing Surveys, 26(1):87–119, 1994.
Antoine McNamara, Adrien Treuille, Zoran Popovic, and Jos Stam. Fluid
Control Using the Adjoint Method. In Proceedings International Con-
ference on Computer Graphics and Interactive Techniques (ACM SIG-
GRAPH 2004), pages 449–456, 2004.
D.J.W. De Pauw and P.A. Vanrolleghem. Avoiding the finite difference sen-
sitivity analysis deathtrap by using the complex-step derivative approx-
imation technique. In Proceedings Summit on Environmental Modelling
and Software (iEMSs2006), Burlington, Vermont, July 9-12 2006.
Nestor V. Queipo, Raphael T. Haftka, Wei Shyy, Tushar Goel, Rajkumar
Vaidyanathan, and P. Kevin Tucker. Surrogate-based analysis and opti-
mization. Progress in Aerospace Sciences, 41:1–28, 2005.
Timothy W. Simpson and Farrokh Mistree. Kriging Models for Global Ap-
proximation in Simulation-Based Multidisciplinary Design Optimization.
AIAA Journal, 39(12):2233–2241, December 2001.
W.P. Stevens, G.J. Myers, and L.L. Constantine. Structured Design. IBM
Systems Journal, 13(2):115–139, 1974.
Optimization Specialist Conference), number AIAA-2006-1827, Newport,
Rhode Island, 2006.
S. Tosserams. Distributed Optimization for Systems Design: An Augmented
Lagrangian Coordination Method. PhD thesis, Eindhoven University of
Technology, August 2008.
Christian A. van der Velden. Application of Knowledge Based Engineering
to Intelligent Design Systems. PhD thesis, RMIT School of Aerospace,
Mechanical and Manufacturing Engineering, Melbourne, Australia, July
2008.
Ed van Hinte and Michel van Tooren. First Read This - Systems Engineering
in Practice. 010 Publishers, Rotterdam, 2008.
Garret N. Vanderplaats. Multidiscipline Design Optimization. Vanderplaats
Research & Development, Inc., Colorado Springs, CO, 1st edition, 2007.
ISBN 0-944956-04-1.
P. Venkataraman. Applied Optimization with MATLAB® Programming. John
Wiley & Sons, New York, 2002.
Arun Verma. An introduction to automatic differentiation. Current Science,
78(7):804–807, April 2000.
Terrance C. Wagner and Panos Y. Papalambros. A General Framework for
Decomposition Analysis in Optimal Design. Advances in Design Automa-
tion, 65(2):315–325, September 1993.
John N. Warfield. Binary Matrices in System Modeling. IEEE Transactions
on Systems, Man, and Cybernetics, 3(5):441–449, September 1973.
Eric W. Weisstein. Directed Graph. From MathWorld–A Wolfram
Web Resource, December 22, 2008a. http://mathworld.wolfram.com/
DirectedGraph.html.
Eric W. Weisstein. Vector Norm. From MathWorld–A Wolfram Web
Resource., December 23, 2008b. http://mathworld.wolfram.com/
VectorNorm.html.
Eric J. Whitney, Luis F. Gonzalez, and Jacques Periaux. Multidisciplinary
Methods for Analysis Optimization and Control of Complex Systems,
volume 6 of Mathematics in Industry, chapter Distributed Multidisci-
plinary Design Optimisation in Aeronautics using Evolutionary Algo-
rithms, Game Theory and Hierarchy, pages 249–281. Springer Berlin
Heidelberg, 2005.
Wikipedia. A* search algorithm. Wikipedia, December 23, 2008. http:
//en.wikipedia.org/wiki/A*_search_algorithm.
Herbert S. Wilf. Algorithms and Complexity. A.K. Peters Ltd., 1st edition,
1994. http://www.cis.upenn.edu/~wilf.
Appendix 2.A
Analytical Sensitivity
Analysis
2.A.2 The Direct Approach
The goal is to optimize the system described by these governing equations.
This is done by minimizing some objective function, usually subject to con-
straint functions. Both are usually functions of the design variables and
the state variables. If we wish to use gradient-based optimization methods,
we require sensitivity information for these functions. We will denote the
general function for which we require gradient information by I (so I may
be an objective or a constraint function or some other function):
I = I(xn, si(xn))    (2.A.2)
The gradient of I can be found using the chain rule of differentiation:
dI/dxn = ∂I/∂xn + (∂I/∂si)(dsi/dxn)    (2.A.3)
The partial-derivative terms on the right-hand side of the equation can be
evaluated easily by varying the denominator, but the total-derivative term
is another matter. The total sensitivity of the state variables with respect
to the design variables, dsi/dxn, requires a full solution of the system. This
can be clarified as follows.
For the system to be consistent the governing equations must always be
satisfied, which implies that the total derivative of the residuals with respect
to the design variables must be zero. Again using the chain rule, we find:
dRk/dxn = ∂Rk/∂xn + (∂Rk/∂si)(dsi/dxn) = 0    (2.A.4)
Note that the term ∂Rk/∂si (in index notation) represents the Jacobian
matrix. By rewriting this equation we can determine dsi/dxn:
dsi/dxn = −(∂Rk/∂si)⁻¹ (∂Rk/∂xn)    (2.A.5)
Thus, by solving for dsi/dxn and substituting into equation 2.A.3, we can
determine the sensitivity of I. This method is called the direct approach.
However, note that in order to find the total sensitivity of the system
using the direct approach, the matrix equation needs to be solved for every
design variable, so equation 2.A.5 has to be evaluated Nx times.
2.A.3 The Adjoint Approach

Substituting equation 2.A.5 into equation 2.A.3 yields the total sensitivity:

dI/dxn = ∂I/∂xn − (∂I/∂si)(∂Rk/∂si)⁻¹ (∂Rk/∂xn)    (2.A.6)
Now we can introduce a vector Ψk (in index notation), defined as

Ψk = −(∂I/∂si)(∂Rk/∂si)⁻¹    (2.A.7)
The vector Ψk is often called the adjoint vector. With this definition, the
sensitivity equation (eq. 2.A.6) can be simplified to:
dI/dxn = ∂I/∂xn + Ψk (∂Rk/∂xn)    (2.A.8)
We can rewrite equation 2.A.7 to form the adjoint equation:
(∂Rk/∂si) Ψk = −∂I/∂si    (2.A.9)
Now, we can solve the adjoint equation for Ψk , and then substitute into
equation 2.A.8 in order to find the system sensitivity. This method is called
the adjoint approach.
Note that the adjoint depends on the function I instead of the design
variables. Thus in order to find the total sensitivity of the system using the
adjoint approach, the matrix equation 2.A.9 needs to be solved for every
function I. The computational burden for the adjoint approach is therefore
largely independent of the number of design variables.
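The accounting of the two approaches can be checked on a tiny linear system (hypothetical numbers, pure Python): with R(s, x) = As − b(x) and I = c·s, the direct route solves for ds/dx and the adjoint route solves for Ψ, but both must produce the same dI/dx.

```python
# Direct vs. adjoint sensitivities for a small hypothetical linear system:
#   R(s, x) = A s - b(x) = 0,  with b(x) = [x, 2x],  and  I = c . s,
# so dR/ds = A, dR/dx = -db/dx, dI/ds = c, and dI/dx has no explicit part.

def solve2(M, rhs):
    """Solve a 2x2 linear system M y = rhs by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

A = [[4.0, 1.0],
     [2.0, 3.0]]          # the Jacobian dR/ds
dbdx = [1.0, 2.0]         # db/dx, so dR/dx = [-1, -2]
c = [1.0, 1.0]            # dI/ds

# Direct approach (eq. 2.A.5): ds/dx = -(dR/ds)^-1 (dR/dx) = A^-1 db/dx,
# one linear solve per design variable.
dsdx = solve2(A, dbdx)
dIdx_direct = sum(ci * di for ci, di in zip(c, dsdx))

# Adjoint approach (eq. 2.A.9): solve (dR/ds)^T Psi = -(dI/ds)^T,
# one linear solve per function I, then apply eq. 2.A.8.
At = [[A[0][0], A[1][0]],
      [A[0][1], A[1][1]]]
psi = solve2(At, [-ci for ci in c])
dIdx_adjoint = sum(p * (-db) for p, db in zip(psi, dbdx))   # Psi . dR/dx

print(dIdx_direct, dIdx_adjoint)   # both 0.7
```

With many design variables and a single objective, the one adjoint solve replaces the per-variable direct solves, which is exactly the trade-off stated in the text.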
2.A.4 Conclusion

Thus, in the direct approach it is necessary to compute dsi/dxn once for every
design variable xn, whereas in the adjoint approach it is necessary to compute
Ψk once for every function I. The computational burden per evaluation
of dsi/dxn or Ψk is comparable.
The conclusion is that, if the number of design variables is greater than
the number of functions for which sensitivity information is required (I),
then the adjoint method is computationally more efficient than the direct
method, and vice versa [Martins, 2002].
It should be mentioned that the implementation of analytical sensitivity
methods in large-scale numerical codes often requires a large amount of
work. Once this has been done, however, the advantages are obvious.
Appendix 2.B
Sequential Quadratic
Programming
In each major iteration of SQP, the design vector is updated by a line
search along a search direction Sq:

Xq = Xq−1 + α∗Sq    (2.B.1)
[Figure: Flowchart of the SQP procedure: initialize (q = 0, B = I, X = X0);
in each iteration q, perform a one-dimensional unconstrained optimization of
the augmented objective in order to find α∗; exit when converged, otherwise
update B and repeat]
2.B.2 Direction Finding in SQP
The first step, finding Sq , is referred to as the direction finding problem. In
order to solve the direction finding problem in SQP, the original optimization
problem is approximated at the current design Xq−1 using a quadratic ob-
jective and linear constraints, thus forming a quadratic programming (QP)
problem (in terms of S). This allows the use of very efficient solution pro-
cedures that are available for QP problems.
The quadratic approximation to the objective at the point Xq−1 (in
terms of S) is
FQP(S) = F(Xq−1) + ∇F(Xq−1)ᵀS + ½ SᵀBS    (2.B.2)
where B is an approximation to the Hessian of the Lagrangian of the
original problem. The matrix B is initialized as the identity matrix I, after
which it is updated at every major iteration using a method like BFGS
(Broyden-Fletcher-Goldfarb-Shanno) [Vanderplaats, 2007].
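As a small illustration of such an update, one BFGS step can be sketched in Python with NumPy (the function and variable names here are our own, not taken from [Vanderplaats, 2007]):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B.

    s : step in the design variables, X_q - X_{q-1}
    y : change in the gradient of the Lagrangian over that step
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Example: one update starting from the identity matrix
B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 0.5])
B = bfgs_update(B, s, y)
```

The updated matrix remains symmetric and satisfies the secant condition B s = y, which is what makes B a useful approximation to the Hessian of the Lagrangian.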
The linear approximations to the constraints at the point Xq−1 (in terms
of S) are
gQPi (S) = gi (Xq−1 ) + ∇gi (Xq−1 )T S ≤ 0 (2.B.3)
for i = 1, . . . , m, where m is the number of constraints.
Note that the approximation to the constraints that is normally used is
slightly more complicated than the one described here. We use this simplified
version for clarity. For the details refer to e.g. [Vanderplaats, 2007].
Now, by optimizing this QP problem we find S∗ , which represents the
new direction for the search. This is because the vector S has the old design
point Xq−1 as its origin. Thus, the optimum of the QP problem is the new
search direction for the line search (equation 2.B.1): Sq = S∗ .
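When no constraints are active at the current design, the QP subproblem reduces to minimizing the quadratic of equation 2.B.2, whose stationarity condition is BS = −∇F (Xq−1 ). A minimal sketch of this special case (our own illustration, not a general constrained QP solver):

```python
import numpy as np

def qp_direction(grad_F, B):
    """Search direction of the QP subproblem when no constraints are
    active: minimize grad_F^T S + 0.5 S^T B S, i.e. solve B S = -grad_F."""
    return np.linalg.solve(B, -grad_F)

# With B = I (the initial choice) this reduces to steepest descent
grad_F = np.array([2.0, -1.0])
S = qp_direction(grad_F, np.eye(2))
# S == [-2.0, 1.0]
```

In the general case the linearized constraints of equation 2.B.3 must be taken into account, which requires a dedicated QP solver.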
This completes step 1, determining the search direction. Now, for step
2, the original problem needs to be optimized in this search direction. This
is done by minimizing the augmented objective
Φ(α) = F (X(α)) + ∑i ui max[0, gi (X(α))] (2.B.4)
where
X(α) = Xq−1 + αSq (2.B.5)
and ui is a penalty parameter. The Lagrange multipliers of the approximate
QP problem can be used as penalty parameters, as described in [Vander-
plaats, 2007], but the details are omitted here for clarity.
This optimization yields the line search optimum, α∗ , which is then
substituted into equation 2.B.1 in order to find the new estimate for the
design vector, i.e. Xq .
After the new estimate has been found, the matrix B is updated, and a
new (major) iteration is started (unless the optimum has been found).
Even though SQP is very efficient for well behaved problems, it has
problems handling non-smooth functions, just like other gradient based op-
timizers.
Appendix 2.C
Pathfinding
Figure 2.C.1: An example graph with vertices u, x, y, and z.
The edges can have weights, e.g. indicating cost. Not surprisingly, a
graph with weighted edges is called a weighted graph. The travelling sales-
man problem, for example, can be posed in the form of a weighted graph.
In that case the vertices represent cities and the edges represent the cost of
travelling between cities.
In many routing problems it is common practice to represent the search
space as a (multidimensional) grid. Such a grid can also be interpreted as a
graph, as depicted in figure 2.C.2 for a two-dimensional grid. A move from
one grid cell to another is equivalent to a move from one vertex to another.
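As a small illustration, such a weighted graph can be stored as an adjacency dictionary (a common representation; the vertex names and costs here are made up):

```python
# Adjacency-dictionary representation of a small weighted graph:
# each vertex maps to its neighbours and the cost of the connecting edge.
graph = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1},
    "C": {"A": 2, "B": 1},
}

# Cost of travelling A -> C -> B
cost = graph["A"]["C"] + graph["C"]["B"]  # 3
```

For a grid-based graph, the neighbours of a vertex are simply the adjacent grid cells (orthogonal only, or orthogonal plus diagonal).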
Figure 2.C.2: A search grid can be represented as an undirected graph. Vertex labels
represent grid location ij (row i, column j). Left-to-right: search grid, graph without
diagonal movement (taxicab), graph with diagonal movement.
Figure 2.C.3: Various graph search approaches applied to a grid-based graph. Left-
to-right: breadth-first, best-first (numbers indicate the order in which vertices are
examined), and depth-first.
During the first iteration, a breadth-first search routine examines all the
vertices that are directly connected to the source vertex. During the second
iteration, the routine examines all the vertices that are directly connected
to the new vertices that were examined in the first iteration, and that have
not been examined yet. This process continues until some goal is reached.
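The procedure above can be sketched as follows (a minimal Python version on an adjacency-list graph; all names are our own):

```python
from collections import deque

def breadth_first(graph, source, goal):
    """Examine vertices in waves of increasing distance from the source.

    graph maps each vertex to a list of directly connected vertices.
    Returns the vertices in the order they were examined.
    """
    queue = deque([source])
    order = [source]
    seen = {source}
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            break
        for neighbour in graph[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order

# Tiny example graph (our own): source "s", goal "g"
graph = {"s": ["a", "b"], "a": ["s", "g"], "b": ["s"], "g": ["a"]}
order = breadth_first(graph, "s", "g")
# "a" and "b" are examined before "g" is reached
```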
A best-first search routine also starts by examining all the vertices di-
rectly connected to the source vertex. Then it moves to the best of the
examined vertices (best according to some criterion). From there, the routine
again examines the directly connected vertices. Then the routine takes
into account all the vertices hitherto examined that still have unexplored
edges (also from previous iterations), and moves to the best of those ver-
tices. This is repeated until some goal is reached.
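A minimal sketch of this strategy, using a priority queue (the criterion `score` and the example graph are our own assumptions):

```python
import heapq

def best_first(graph, source, goal, score):
    """Repeatedly expand the best vertex seen so far.

    score(vertex) is the criterion (lower is better), e.g. an estimated
    distance to the goal. Returns vertices in the order they were expanded.
    """
    frontier = [(score(source), source)]
    seen = {source}
    expanded = []
    while frontier:
        _, vertex = heapq.heappop(frontier)
        expanded.append(vertex)
        if vertex == goal:
            break
        for neighbour in graph[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                heapq.heappush(frontier, (score(neighbour), neighbour))
    return expanded

# Example (our own): vertices 0..3 on a line, score = distance to goal 3
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
expanded = best_first(graph, 0, 3, score=lambda v: 3 - v)
```

The priority queue keeps track of all hitherto examined vertices with unexplored edges, so the routine can jump back to an earlier vertex if that one scores best.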
A depth-first search routine starts by examining the first vertex directly
connected to the source vertex, then proceeds by examining the first vertex
connected to that newly examined vertex, and so on until it reaches a vertex
with no remaining edges, after which it returns to the previous vertex to
examine the next directly connected vertex that has not been examined yet,
and so on.
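A compact recursive sketch of this strategy (our own illustration):

```python
def depth_first(graph, vertex, order=None):
    """Follow the first unexplored edge as far as possible, then backtrack."""
    if order is None:
        order = []
    order.append(vertex)
    for neighbour in graph[vertex]:
        if neighbour not in order:
            depth_first(graph, neighbour, order)
    return order

# Tiny example graph (our own): "c" is reached before "b"
graph = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s"], "c": ["a"]}
order = depth_first(graph, "s")
# order == ["s", "a", "c", "b"]
```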
The maze algorithm is an example of a breadth-first search algorithm.
One of the earliest approaches to solving such a problem is the maze
algorithm by [Lee, 1961]. The maze algorithm works by propagating a wave
from source to target. At every step the cells at a taxicab distance of k from
the source (as explained below in the gray box) are labeled k. Then k is
incremented by one, and the labeling is repeated. This goes on until the
target is reached, after which the algorithm backtracks to find a shortest
path. The result is depicted in figure 2.C.5.
Note that the Euclidean distance of x is often simply denoted |x|, without
the subscript. This norm represents the shortest possible distance between
two points.
Another distance measure of interest is the L1 norm, also known as taxicab
distance or Manhattan distance. For p = 1 we find
|x|1 = ∑i |xi | (2.C.3)
The taxicab distance is the shortest possible distance between two points if
only orthogonal movement is allowed.
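Both distance measures are straightforward to compute; a small sketch (the helper names are our own):

```python
def taxicab(p, q):
    """L1 (Manhattan) distance: shortest distance with orthogonal moves only."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """L2 (straight-line) distance: shortest possible distance."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

# taxicab((0, 0), (3, 4))   -> 7
# euclidean((0, 0), (3, 4)) -> 5.0
```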
Backtracking simply means working from the target to the source, each
time decreasing k to find the next cell. In the example, after the target has
been reached, the algorithm selects the cell with value 12 that borders the
target, then finds a cell valued 11, then one valued 10, and so forth
until the source is reached. This is indicated by the arrow in the figure. The
result is guaranteed to be a shortest path. Often multiple shortest paths are
possible, as can be seen in the example.
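The wave propagation and backtracking described above can be sketched as follows (a minimal Python version on a 2D grid; the names and details are our own, not from [Lee, 1961]):

```python
from collections import deque

def lee_maze(grid, source, target):
    """Lee's maze algorithm on a 2D grid.

    grid: 2D list, True for free cells, False for obstacles.
    Returns a shortest path from source to target as (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    queue = deque([source])
    # Wave propagation: label each free cell with its wavefront distance k
    while queue:
        cell = queue.popleft()
        if cell == target:
            break
        r, c = cell
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nb
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc]
                    and nb not in dist):
                dist[nb] = dist[cell] + 1
                queue.append(nb)
    if target not in dist:
        return None  # target unreachable
    # Backtracking: from the target, repeatedly step to a cell labelled k - 1
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if dist.get(nb) == dist[path[-1]] - 1:
                path.append(nb)
                break
    path.reverse()
    return path

# 3x3 grid with one obstacle in the centre
grid = [[True, True, True],
        [True, False, True],
        [True, True, True]]
path = lee_maze(grid, (0, 0), (2, 2))
```

Since several neighbours may carry the label k − 1, the backtracking step simply takes the first one it finds, which is why multiple shortest paths are possible.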
In terms of graph theory, Lee’s maze routing algorithm is essentially a
breadth-first search algorithm. All the vertices connected to the current
vertex are evaluated before moving on to the next iteration, during which
all the vertices connected to those new vertices are evaluated, and so on.
Note that the basic maze algorithm evaluates a very large part of the
search space. This leads to the question whether it is possible to refine the
method so as to produce a more efficient algorithm. One possible refinement
would be to include some notion of closeness to a solution. Methods like
those are called heuristic methods.
Figure 2.C.5: One possible solution of the two-dimensional routing problem using
the maze algorithm (source: [van der Velden, 2008]). The number in each cell
indicates the taxicab distance from that cell to the source, and the arrow represents
backtracking.
2.C.3 Heuristics
Heuristics are common sense rules. These are generally informal and based
on experimentation and trial-and-error techniques, as opposed to formal
mathematical rules. A heuristic algorithm should be able to judge whether
the problem is closer to a solution after each iteration [Soanes and Stevenson,
2008]. Heuristics could be used to improve the maze algorithm.
The maze algorithm evaluates a very large part of the search space. This
is largely due to the blind propagation mechanism that is employed, which
causes the algorithm to search in all directions regardless of where the target
is located.
A common sense approach to improve the efficiency of the algorithm
would be to incorporate some rule that says in which direction the algorithm
would most likely have to search in order to find a shortest path.
An example of such a rule is to search only in a direction that decreases
some estimate of distance to the target. Usually a shortest distance measure
is used as an estimate, not taking into account detours that are necessary
due to obstacles and other constraints. Examples of these distance measures
are the L1 norm and the L2 norm (as explained in the gray box).
A well-known algorithm that uses heuristics to efficiently find a best path
is the A∗ algorithm.
2.C.4 The A∗ Algorithm
The A∗ algorithm is often used in engineering applications as well as in the
game industry. In real-time strategy games such as Command & Conquer,
for example, units need to find a way to their target without too
many detours. Similar problems are encountered in wire harness routing in
the aerospace industry. The A∗ algorithm is a generalization of Dijkstra’s
algorithm1 [Dijkstra, 1959].
Dijkstra’s algorithm is a so-called best-first graph search algorithm. It is
a formal approach, in that it is guaranteed to yield a shortest path. At each
iteration the algorithm examines the closest vertices that have not been
examined yet. Thus it expands outward from the source until it reaches
the target, then backtracks. The algorithm is therefore similar to the maze
algorithm discussed earlier.
Dijkstra’s algorithm uses the actual distance to the source as a cost
function. A heuristic approach, on the other hand, would be to use an
estimate for the distance to the target as the cost function. This yields a
best-first approach, which is not guaranteed to find a shortest path, but is
generally much faster than Dijkstra’s algorithm.
As long as there are no obstacles a best-first algorithm works pretty well,
but in the presence of obstacles it is likely to yield suboptimal paths. This is
mainly due to the fact that it does not take into account the actual distance
that has been travelled from the source to the current vertex.
The A∗ algorithm uses a combination of Dijkstra’s algorithm and a
heuristic, combining the best of both worlds: It can guarantee a shortest
path and it is guided by a heuristic to improve performance.
For each vertex n that is visited, the A∗ algorithm stores three values: the
actual distance traveled from the source g(n), the estimated distance to the
target h(n) (some heuristic), and the sum of these two f (n) = g(n) + h(n).
The algorithm also maintains a priority queue. The vertex highest in the
priority queue is the first to be examined next. The lower the value of f (n)
for a vertex, the higher its priority.
At each step of the algorithm, the node with the lowest f (n) value (i.e.
the highest priority) is removed from the queue. Then the g(n) and f (n)
values of its neighbors are updated accordingly, and these neighbors are added
to the queue. The algorithm continues until a goal node has a lower f (n)
value than any node in the queue (or until the queue is empty) [Wikipedia,
2008].
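The bookkeeping described above can be sketched as follows (a minimal Python version with our own names; as a simplification, the search stops as soon as the target itself is removed from the queue, which yields a shortest path for an admissible heuristic):

```python
import heapq

def a_star(graph, source, target, h):
    """A* search: g = distance travelled, h = heuristic estimate, f = g + h.

    graph maps each vertex to {neighbour: edge_weight}.
    Returns a shortest path from source to target, or None.
    """
    g = {source: 0}
    parent = {source: None}
    queue = [(h(source), source)]          # priority = f(n) = g(n) + h(n)
    while queue:
        _, vertex = heapq.heappop(queue)
        if vertex == target:
            path = []                      # backtrack via parent pointers
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbour, weight in graph[vertex].items():
            tentative = g[vertex] + weight
            if tentative < g.get(neighbour, float("inf")):
                g[neighbour] = tentative
                parent[neighbour] = vertex
                heapq.heappush(queue, (tentative + h(neighbour), neighbour))
    return None

# Example (our own). With h = 0 the algorithm reduces to Dijkstra's:
graph = {"s": {"a": 1, "b": 4}, "a": {"t": 5}, "b": {"t": 1}, "t": {}}
path = a_star(graph, "s", "t", h=lambda v: 0)
# path == ["s", "b", "t"] (cost 5, cheaper than s-a-t at cost 6)
```

On a grid, a natural heuristic h(n) is the taxicab distance from n to the target when only orthogonal movement is allowed.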
An A∗ algorithm based on orthogonal movement would solve the example
problem as depicted in figure 2.C.6. When comparing this figure to the maze
solution, it becomes clear that the A∗ algorithm is much more efficient.
1. Be sure to check out Dijkstra’s manuscripts at the E.W. Dijkstra Archive [Richards, 2008].
Figure 2.C.6: The vertices evaluated by an A∗ algorithm (by [Patel, 2008]) in order
to solve the simple routing problem from [van der Velden, 2008].