
Chapter 2

Multidisciplinary Design Optimization

This chapter provides a brief introduction to the subject of numerical optimization and its application to the design of complex systems, as studied in
the field of Multidisciplinary Design Optimization (MDO).
The aim is not to provide a rigorous mathematical discussion of numeri-
cal optimization methods, but rather to create a foundation that will enable
the reader to find his/her way in the abundance of optimization literature.
The chapter consists of four parts. We will start by introducing some
general concepts that are important for understanding optimization.
After this introduction, we will have a look at numerical optimization
of single-objective problems, i.e. optimization for a single goal. Single-
objective optimization methods form the basis for MDO.
Next, the optimization of multiobjective problems (problems with multi-
ple conflicting goals) is discussed, albeit rather briefly. At this point people
start to play a role in the optimization process.
Finally we are ready to have a look at the actual field of multidisci-
plinary design optimization, which can incorporate both single-objective and
multiobjective optimization problems, and which requires a much broader
perspective than simply a mathematical one. The organizational aspect is
quite important here, which is reflected in the use of a systems engineering
approach.
Thus we will traverse the spectrum from the mathematically well de-
fined single-objective optimization problem, through the slightly less well
defined multi-objective optimization problem, to the relatively vague multi-
disciplinary design optimization problem. But let’s start gently.

2.1 A Gentle Introduction to MDO
2.1.1 Multidisciplinary Design
The goal of multidisciplinary design optimization, according to [Kroo, 1997],
is to ‘provide a more formalized method for complex system design than is
found in traditional design approaches’. This goal is to be achieved by
applying numerical optimization techniques to the multidisciplinary design
process.
For us to understand what this means, it is important to know what is
meant by ‘complex system design’, ‘traditional design approaches’, and of
course ‘numerical optimization.’
By design, we mean the process of defining a system that meets some set
of requirements. This process is iterative in nature: It is usually not possible
to design the system in one go. Generally, an initial design is analyzed
and based on the results of this analysis the design is adapted, until the
requirements are met. By analysis we refer to the process of determining
the response of a system to its environment [Vanderplaats, 2007].
Complex system design refers to the design of systems that are too complicated for a single person to be able to grasp all the details and interactions
associated with them. Examples of such systems are aircraft, cars, wind turbines, and so on. Inherent to the complexity of these systems
is the multidisciplinary nature of the design process.
The multidisciplinary design approach offers a way to cope with the com-
plexity of the system. The division of a large complex system into smaller,
more manageable subsystems (disciplines, subdisciplines, etc.), enables peo-
ple to get a grip on the design task.
In multidisciplinary design, experts from many different disciplines (e.g.
structures, aerodynamics, electronics, etc.) have to work together to pro-
duce a single consistent end-product (design) that meets the specified re-
quirements.
These disciplines are often mutually dependent, which implies that large
amounts of information need to be exchanged between them. Some form of
coordination is required in order to guide this process in the right direction.
This information management task is complicated greatly by the iterative
nature of the design process.
One example of such an iterative process is fixed point iteration (FPI).
Fixed point iteration can be applied on different levels in design and analysis,
but the basic idea behind the concept is described mathematically in the gray
box below.

Fixed Point Iteration

A point c is called a fixed point of the function f if it satisfies the
equation f(c) = c. In many cases, a fixed point can be found by starting with an initial guess x0 and calculating successive approximations
x1 = f(x0), x2 = f(x1), etc. [Adams, 1999]. This process is called fixed
point iteration (FPI).

In general, fixed point iteration is described by

x_{n+1} = f(x_n),    for n = 0, 1, 2, ...    (2.1)

If the change in function value between two successive iterations approaches
zero (to within some specified tolerance), the process is said to have converged.
The FPI process converges to the fixed point of f for any given starting
point x0 if, on an interval I = [a, b], f(x) belongs to I whenever x belongs
to I, and if there exists a constant K with 0 < K < 1 such that for every u
and v in I, |f(u) − f(v)| ≤ K|u − v| (i.e. f is a contraction on I).
Note that in real design problems, fixed point iteration does not always
lead to convergence.

Newton’s method, used for finding a root of an equation g(x) = 0,
can be interpreted as an example of fixed point iteration:

    x_{n+1} = x_n − g(x_n)/g'(x_n)    (2.2)
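As a minimal Python sketch of these two ideas (the function g(x) = x^2 − 2 and the starting point are arbitrary illustrations, not taken from the text), Newton's method can be run as a fixed point iteration as follows:

```python
def newton_fixed_point(g, dg, x0, tol=1e-10, max_iter=50):
    """Root of g via the fixed point iteration x_{n+1} = f(x_n), with f(x) = x - g(x)/g'(x)."""
    x = x0
    for n in range(max_iter):
        x_new = x - g(x) / dg(x)      # one Newton / fixed point step
        if abs(x_new - x) < tol:      # change between successive iterates small enough?
            return x_new, n + 1
        x = x_new
    return x, max_iter                # no convergence within max_iter steps

# Example: root of g(x) = x**2 - 2, i.e. the square root of 2
root, steps = newton_fixed_point(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root, steps)                    # ~1.414213562, typically in about 5 steps
```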

Traditionally, the dependence between disciplines is often handled using
a form of sequential design, in which first one discipline is ‘solved’, then
the results are passed on to the next, and so on, after which the whole cycle
may be repeated.
Another way to cope with the problem of information management in
these iterative design processes is by means of system reviews, during which
the results from separate (concurrent) disciplinary analyses are combined.
After a system review decisions can be made about the next iteration step.
On lower levels, however, information management is often handled much
more informally. For example, engineers from different design departments
may talk to each other near the coffee machine and exchange some information there. Obviously it is quite difficult to keep track of these kinds of
informal information exchanges.
Furthermore, even if information exchange is formalized, some kind of
translation is often necessary. For example, a CFD (Computational Fluid
Dynamics) model of a part usually has a different kind of grid than an FE
(Finite Element) model. Hence, if one is to be used as a basis for the other,
some translation between grids needs to be made.
As a result of such difficulties in information management, among other
things, it is often very hard even to produce a design that meets all the
requirements. Due to limited resources (e.g. time), a design often goes
through only a few (system level) iterations. This implies that it is difficult to
perform optimization in traditional complex system design. An interesting
discussion of these issues is given by [Kroo, 1997].
In the concurrent engineering approach, the traditional division into spe-
cialist design departments is replaced by integrated product development
teams with a more multidisciplinary nature [Bartholomew, 1998]. MDO is
meant to enable such an integrated approach.
As mentioned, the goal of MDO is to provide a more formalized method
for complex system design. The general idea is that, by formalizing the mul-
tidisciplinary design process, it can be made amenable to numerical opti-
mization. Among other things, the use of numerical optimization is expected
to provide a considerable improvement in efficiency and robustness of the
design process, and it is expected to help in avoiding difficulties associated
with traditional design methods such as sequential design [Kroo, 1997].
Thus, numerical optimization plays a very large role in MDO.

2.1.2 Numerical Optimization


So, what exactly is meant by optimization? Optimization can be described
as the search for the best possible solution to a given problem. The meaning
of ‘best possible’ depends on the problem and on the available resources.
In general there are four different approaches to optimization: analytical,
graphical, experimental, and numerical. This text focuses on numerical
optimization because this is the only approach that can efficiently handle
complex systems, as required for MDO.
In numerical optimization, the ‘goodness’ of a solution is expressed in
terms of one or more objective functions (goal functions). A solution is
usually represented by a design vector, often simply called a design. An
optimizer is a mathematical tool for searching the design space, i.e. the set
of all possible designs, in a way that is more efficient than simply evaluating
all possibilities.
A basic numerical optimization process is depicted in figure 2.1. This
figure shows the design vector Xq (where q is the iteration counter), the
objective function f (Xq ), and constraint functions g(Xq ) and h(Xq ). These
two will be discussed at a later stage. As implied by the arrows in the fig-
ure, numerical optimization is an iterative process. A convergence criterion
is used in order to assess whether the optimizer has reached an optimum
solution. Such an optimum solution is denoted by an asterisk (X∗ ).

Figure 2.1: The basic optimization process (source: [de Weck et al., 2007])

Now, you may wonder how an optimizer actually solves a problem. Per-
haps an example can shed some light on this issue.
Suppose you would like to know the location of the deepest point on the
bottom of a murky lake. However, you only have one hour to do it, and you
have nothing but a small rowboat at your disposal with a long stick that
can be used as a depth gauge. Furthermore, you don’t have a clue what the
bottom of the lake looks like, let alone where to find the deepest point...
This pretty much sums up the basic problem of optimization: You want
to find the maximum depth (or minimum height), but the resources at your
disposal are limited, and you have no idea what the depth ‘function’ looks
like. Of course, in the world of aircraft and spacecraft the problem is usually
much more complex, but we will get to that soon enough. For now, let us
consider your options for the problem stated above. First some terminology,
intended to make our discussion more clear.
The surface of the lake represents the domain of all possible locations
for your rowboat, and is referred to as the design space. Let’s assume that
the depth of the lake can be represented by a function of the location on
the surface. This function is called the objective function, or simply the
objective. Your goal is to find the deepest point, in other words you want to
find the location at which the objective is maximized (or minimized). This
point is the optimum.
In order to know the depth at location Xq , you need to use the depth
gauge to feel for the bottom. This is called an evaluation of the objective
function: f (Xq ). Each function evaluation has a cost associated with it.
The cost may be expressed in terms of various resources, but in this context
it will be expressed in terms of time.
An evaluation is considered expensive if it requires a lot of time (e.g.
computation time for numerical simulations). A search method that requires
many function evaluations is also deemed expensive (unless the evaluations
are very cheap).

Now, the most obvious way to find the deepest point of the lake would
perhaps be to divide the surface into a fine grid of many thousands of points,
and then to evaluate the depth at every point. This would yield a map of the
entire bottom, perhaps similar to figure 2.2, from which the deepest point
could easily be found.
If scarcity of resources were not an issue, this kind of approach would
be very interesting [Nievergelt, 2000]. Unfortunately, this exhaustive search
method is not applicable in our case because it would be much too expensive:
Gauging the depth of a large lake at many thousands of locations, using a
rowboat, in one hour? Go figure.

Figure 2.2: Contour lines of the bottom of the murky lake

Another possibility is to keep evaluating the objective at random points
on the lake until you run out of time. In this case there is some small
chance that one of the evaluations will be close to the actual deepest point.
The more locations you manage to evaluate, the better the chance that the
optimum of your evaluations corresponds to the actual deepest point, but
you will never be completely sure until you’ve covered the entire lake (i.e.
exhaustive search). Thus random search relies mostly on luck.
Countless variations on this method exist, all aimed at reducing the
amount of luck required, but the fact remains that (too) many evaluations
are necessary to obtain reasonable certainty about the supposed optimum.
For example, it would be possible to choose some starting point and
then to choose a random direction, move some distance in that direction
and gauge the depth. If the new point is deeper, choose a new random
direction and repeat, otherwise go back to the previous location, choose
another direction and gauge the depth again. This sequence is then repeated
until a better location is found. If no improvement can be found in any
direction, it would be possible to change the move distance. If still no
improvement is found, you may conclude that you have reached an optimum.
However, not only does this kind of method still rely on luck, it also
has another problem. For example, looking at figure 2.2, if you start on the
left-hand-side of the lake, it is likely that you arrive at a local deep spot and
would mistake this for the deepest point of the lake. The rest of the lake
would probably remain unexplored.
You might also choose to adopt a kind of pattern search strategy, in
which you evaluate (explore) points around your current location in some
predefined pattern. If you find a better (i.e. deeper) point, move there and
explore the points surrounding the new location. If not, reduce the distance
between points in the pattern and explore again. This sequence is then
continued until the distance between points becomes very small. Another
related method is that of [Hooke and Jeeves, 1961].
Now, in order to make things more interesting, suppose that you can
also use the depth gauge to determine the slope (gradient) of the bottom.
This extra information allows you to determine in which direction you need
to move to find a deeper point. Once this direction is established (direction
of steepest descent), you only need to move in that direction, gauging the
depth every few meters until you no longer find a deeper point. Then you
determine the slope again in order to find a new direction in which to search.
This method is known as line search or one dimensional search.
Although gradient based methods can also become trapped in local deep
spots, they are usually orders of magnitude faster than direct search. There-
fore, this text will focus primarily on gradient based search methods, with
only a few excursions to direct search type methods.
The solution procedures or algorithms described above are well suited
for numerical implementation. The performance of an algorithm can then
be measured by the number of numerical operations required to solve a
specific problem. This subject is studied in the field of computing science,
and specifically that of computational complexity.
Although complexity theory is of little practical significance for us, it is
important to know at least a little something about it.

2.1.3 Computational Complexity


The complexity of an algorithm is defined as the cost, or amount of resources
required to solve a problem using this algorithm [Wilf, 1994]. Cost may be
expressed in various types of unit, but in this context we consider calculation
time to be the most important one. Computation time can be expressed in

seconds, for example, but also in terms of computational steps. The amount
of computational effort required by different algorithms is studied in the
field of computational complexity.
Although this is not the place for an in depth discussion of computational
complexity theory, some of the basic concepts from this field do need to
be considered. This has to do mainly with the performance assessment of
optimization methods. Indeed, in order to make a statement about the
performance of an optimization algorithm on a specific problem, it would
help if we knew the performance limits for the problem, i.e. the minimum
and maximum possible cost for solving the problem.
For example, a matrix multiplication of two n × n matrices (A and B)
will at least require 2n^2 computational steps, because the algorithm will at
least have to look at all the input, and the number of entries in the two
matrices is 2n^2. It can be shown that an upper limit (worst case complexity
estimate) for the matrix multiplication problem is in the order of n^3 computational steps. If C = AB then C is also n × n, and for every entry in C,
n multiplications need to be performed, hence n × n × n = n^3.
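As a rough illustration of this counting argument, the textbook triple-loop implementation below performs exactly n·n·n multiply-add steps (a sketch for counting operations, not an efficient implementation):

```python
def matmul(A, B):
    """Naive multiplication of two n x n matrices given as nested lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):            # n rows of C
        for j in range(n):        # n columns of C
            for k in range(n):    # n multiply-add steps per entry of C
                C[i][j] += A[i][k] * B[k][j]
    return C                      # n * n * n = n^3 multiply-adds in total

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19.0, 22.0], [43.0, 50.0]]
```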
An algorithm that has complexity in the order of some polynomial n^k,
for large n, where n is the amount of input data and k is some number, is
said to exhibit polynomial time performance. Such algorithms are classified
as fast, irrespective of the magnitude of k (so the calculations may still take
a very long time). A slow algorithm, on the other hand, would require more
than polynomial time to come up with a solution; for example, exponential
time algorithms require in the order of k^n computational steps (figure 2.3
shows the difference in growth¹). If a problem can be solved in polynomial
time it is said to be easy; if the solution cannot be found in polynomial time
the problem is considered hard.
In the literature one may encounter references to special classes of problems
called P and NP (Polynomial and Non-deterministic Polynomial, respec-
tively). These are just two of many (488!) complexity classes [Aaronson,
2008]. The class P contains the set of all easy problems, i.e. all the problems
that can be solved in polynomial time. The class NP contains the problems
for which the solution may or may not be found in polynomial time, but for
which a solution can at least be checked in polynomial time. In other words,
P contains the problems for which it is easy to find a solution, whereas NP
contains the problems for which it is easy to check a solution (i.e. to see
whether the solution is correct), even though finding the solution may be
hard.
Problems considered in complexity theory are usually decision problems.
A decision problem is one that has only a ‘Yes’ or ‘No’ answer. Optimization
problems can often be reduced to decision problems, and it can be shown
¹ An interesting view on the implications of exponential growth is provided in the video lecture by [Bartlett, 2004].

Figure 2.3: The difference in growth between some polynomial function (n^k) and
some exponential function (k^n). Depending on the value of k, the polynomial function may grow faster initially, but as n increases the exponential always ‘wins’. (The figure plots computational steps against n for the example curves n^9 and 2^n.)

that if there is a fast way to solve the decision problem then there is a fast
way to solve the optimization problem [Wilf, 1994].
A decision problem is NP -complete if it is in NP and if every problem
in NP can be quickly reduced to it. This implies that if a fast solution to
an NP -complete problem could be found, it would be fast for every problem
in NP. The only downside is that such a solution has not been found (yet).
There is even a one-million dollar prize waiting for the person who finds
this solution.
A decision problem is NP -hard if it is at least as hard as the hardest
problems in NP, but it does not necessarily have to be in NP. Probably
the best-known example of an NP -hard problem is the Travelling Salesman
Problem (TSP) depicted in figure 2.4.

Figure 2.4: The travelling salesman problem

Suppose a travelling salesman needs to visit a certain number of cities.
The cost of travelling between each pair of cities is known. The travelling
salesman problem, posed as an optimization problem, is to find the route
that minimizes the travelling cost while satisfying the constraints that every
city has to be visited at least once and that the starting point is also the
end point (round-trip) [Cook, 2008].
The TSP can also be stated as a decision problem. For example: Given
the costs and some number c, decide whether there is a round-trip route
that is cheaper than c.
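A brute-force check of this decision version is easy to write, but it enumerates all (n - 1)! round trips and therefore takes far more than polynomial time, which is exactly what makes the problem hard in practice. The sketch below uses a small, made-up cost matrix:

```python
from itertools import permutations

def roundtrip_cheaper_than(cost, c):
    """TSP decision problem: is there a round trip, starting and ending at city 0
    and visiting every other city, with total cost below c?
    Brute force over all (n-1)! orderings of the remaining cities."""
    n = len(cost)
    for order in permutations(range(1, n)):
        route = (0,) + order + (0,)
        total = sum(cost[a][b] for a, b in zip(route, route[1:]))
        if total < c:
            return True
    return False

# Hypothetical symmetric cost matrix for 4 cities
cost = [[0, 3, 1, 4],
        [3, 0, 2, 5],
        [1, 2, 0, 6],
        [4, 5, 6, 0]]
print(roundtrip_cheaper_than(cost, 13))   # True: the cheapest round trip here costs 12
```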
Although we have merely touched the tip of the iceberg on the subject
of complexity, this is as far as we go. For more information, please refer to
e.g. [Wilf, 1994], [Aaronson, 2008], or [Papadimitriou, 1994].
By now we are ready to start looking into numerical optimization in
more detail.

2.2 Single Objective Optimization


2.2.1 The Single Objective Optimization Problem
In the context of this chapter, the term optimization refers to the minimization² of a function or set of functions. A function that is to be minimized is
referred to as an objective function or simply an objective. An objective is
a scalar valued function of one or more variables. In the latter case we are
talking about multivariate (or multi-variable) optimization. An objective
may be either linear or nonlinear. Optimization problems may have one or
more objectives (or even none at all, in the case of constraint satisfaction
problems).
The mass of an aircraft component is an example of an objective. It is
obvious that we would like to minimize its value. The mass of a component
can usually be expressed in terms of dimensions and material properties,
such as m = x1 x2 x3 ρ (e.g. length, width, height and density, respectively).
This is a multivariate objective function.
Normally in optimization problems there are additional requirements to
be taken into account, for example we may want to minimize an objective
function while requiring that all variables retain non-negative values, as in
the example of component mass. In such a case the problem is referred to as
a constrained optimization problem, and the extra requirements are called
constraints or constraint functions.
Constraints often take more complicated forms. For example, the com-
ponent for which we would like to minimize the mass will also have to fulfill
certain strength requirements, which can be expressed in terms of allowable
² A maximization problem can be transformed into a minimization problem by changing signs.

Table 2.1: Classification of optimization problems

  Characteristic                   Variations
  Objective Function Structure     Single-objective or Multiobjective
  Model Structure                  Linear or Nonlinear
  Constraint Existence             Unconstrained or Constrained
  Design Space Convexity           Convex or Non-convex
  Design Variable Type             Real Cont./Discr. or Integer/Binary
  Design Variable Nature           Static or Dynamic
  Parameter Nature                 Deterministic or Stochastic

  (Adaptation of Table 2.1 in [Jilla, 2002])

stress or strain. These can also be expressed in terms of the dimensions x1 to x3.
Note that unconstrained problems are usually easier to solve than con-
strained problems. Luckily it is often possible to transform a constrained
problem into an equivalent unconstrained problem, as will be demonstrated
later on.
If the objective and constraints are all linear equations, the problem is
referred to as a linear programming (LP) problem. Very effective techniques
are available for solving linear programming problems, e.g. the simplex
method. If the objective is quadratic and the constraints are linear, the
problem is a quadratic programming (QP) problem. If at least one of the
equations is nonlinear, we speak of a nonlinear programming (NLP) problem.
Of course LP and QP problems are special kinds of NLP problems, but they
are often considered separately because there are more efficient solution
techniques available for these types of problems.
Other characteristics by which to classify optimization problems can be
found in Table 2.1, and an interesting taxonomy of optimization methods
can be found in [Optimization Technology Center, 2008].
The general single-objective nonlinear constrained optimization problem
can be written in the following standard form [Vanderplaats, 2007], [Anto-
niou and Lu, 2007], [Snyman, 2005]:

The nonlinear constrained optimization problem


Minimize:
    F(X)                                           (2.3)
Subject to:
    gj(X) ≤ 0        for j = 1, 2, ..., m          (2.4)
    hk(X) = 0        for k = 1, 2, ..., l          (2.5)
    xi^L ≤ xi ≤ xi^U    for i = 1, 2, ..., n       (2.6)

In this description,
X = (x1, x2, ..., xn)^T    (2.7)
is the design vector consisting of all the design variables xi , F is the (scalar)
objective, gj represents the inequality constraints, hk represents the equal-
ity constraints, and the other constraints are called side constraints. Side
constraints are actually inequality constraints, but in many cases it is more
convenient to treat them separately because they impose direct limits on the
design space (i.e. the collection of all possible design vectors, a.k.a. decision
space).
Notice that it is possible to get rid of an equality constraint by replacing
it with two equal but opposite inequality constraints. This is important
because some optimization methods are better equipped to handle inequality
constraints.
The search space or feasible region is a subset of the design space where
all constraints are satisfied [Miettinen, 1999]. A solution to the optimiza-
tion problem must be feasible (i.e. it must be part of the feasible region),
otherwise it is of no use.
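As a hedged illustration of how a problem in this standard form might be handed to an off-the-shelf optimizer, the sketch below uses SciPy's minimize routine on a small made-up problem (not taken from the text). Note that SciPy expects inequality constraints in the form g(X) ≥ 0, so the g(X) ≤ 0 convention used here is negated when it is passed in:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up example: minimize F(X) = x1^2 + x2^2
# subject to g(X) = 1 - x1 - x2 <= 0 and side constraints 0 <= xi <= 10
F = lambda X: X[0]**2 + X[1]**2
g = lambda X: 1.0 - X[0] - X[1]                    # standard form: g(X) <= 0

result = minimize(
    F,
    x0=np.array([5.0, 5.0]),                       # initial design X0
    method="SLSQP",
    bounds=[(0.0, 10.0), (0.0, 10.0)],             # side constraints
    constraints=[{"type": "ineq", "fun": lambda X: -g(X)}],   # SciPy convention: >= 0
)
print(result.x)                                    # approximately [0.5, 0.5]
```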

An important concept in optimization is that of convexity. This concept
is used in establishing the existence and uniqueness of an optimum.
Imagine a straight line segment connecting two points, X1 and X2 , in
a set. If all the points on the line segment are also part of the set for any
combination of X1 and X2 , the set is convex. A two dimensional example
of this concept is found in Figure 2.5.

Figure 2.5: Convex and non-convex sets in two-dimensional space (one convex set and two non-convex sets are shown, each with two points X1 and X2 connected by a straight line segment)

A function is convex if the region above its graph is a convex set. If
a function is convex on its domain, it has only one minimum, whereas a
concave or non-convex function may have several minima (and maxima),
as becomes clear from Figure 2.6.
If the objective function as well as the constraints are convex functions,

Figure 2.6: Convexity in functions (the panels show a convex, a concave, and a non-convex function). The region above a concave function represents
a non-convex set, hence a concave function may have several local minima on its
domain.

the feasible design space defines a convex set, and there will be only one op-
timum, which is the global optimum. A non-convex design space will have
multiple optima and there need not be a single global optimum (since mul-
tiple optima may have the same value). Necessary and sufficient conditions
for a global optimum are discussed in more detail in Section 2.2.6.
Keep in mind throughout this text that, in general, we will not have a
clear view of what the design space actually looks like. This implies that we
are more or less working blindfolded. In practical engineering problems it is
rarely possible to ensure that an absolute optimum will be found.

2.2.2 Classification of Search Methods


The numerical optimization process is an iterative search process: Start-
ing with an initial guess, a progressively improving solution is found until
some convergence criterion is satisfied [Antoniou and Lu, 2007]. This search
process can be described mathematically as follows.
Let a design be represented by the design vector X in the design space.
The first goal in any design problem is to come up with one or more initial
designs. Once an initial design X0 is established³, it will generally not rep-
resent an optimal solution. This implies that the design could be improved
upon by changing the values of the individual design variables. Let this
change be represented in vector form by the perturbation δX. The iterative
search process is then described by:

Xq = Xq−1 + δXq (2.8)

where q is the iteration number. Note that a new solution could also be
worse than the original, in which case it would obviously be rejected.
Numerical optimization methods differ in the way the required pertur-
bation δX is determined. The perturbation may be completely random or it
may be based on information about the sensitivity of the objective to change
in the design variables, i.e. gradient information. This leads to the impor-
tant distinction between gradient based methods and non-gradient based
methods.
³ Note that establishing an initial design is usually far from trivial unless the problem is very simple.

Gradient based methods use first order or second order gradient infor-
mation of the objective in order to determine the required perturbation in
the design variables, δX, hence these methods are referred to as first order
methods or second order methods respectively.
Non-gradient based methods, also known as direct search methods [Hooke
and Jeeves, 1961] or zero-order methods, only make use of objective function
values. They are called combinatorial methods if the set of feasible solutions
is discrete.
Non-gradient based methods are usually one or two orders of magnitude
more expensive, in terms of computational cost, than gradient based meth-
ods [Whitney et al., 2005]. Thus, if a gradient based method can be applied
this is usually preferred over non-gradient based methods. Nevertheless the
nature of the problem may complicate, or even prevent, the use of gradient
based methods, e.g. in the case of discrete variables, non-smooth functions,
or a non-convex design space.
One disadvantage of gradient based methods is their inability to explore
the design space. In a non-convex design space (multiple local optima) a
gradient based method will converge to a local optimum and then stop,
whereas non-gradient based methods often explore the whole design space
and are thus better suited for finding a global optimum, although they usu-
ally exhibit slow convergence. Global optimization algorithms may use a
combination of gradient based methods and non-gradient methods in or-
der to achieve a balance between exploration and exploitation of available
information [Neumaier, 2004], [Eskandari and Geiger, 2008].
If gradient based methods are used, the technique used to obtain the
gradient information is very important, because this has a large influence
on the number of function evaluations required for the optimization.

2.2.3 Sensitivity Analysis


Sensitivity analysis is the process of obtaining gradient information, in the
form of gradient vectors and Hessian matrices. The gradient vector ∇F, a
vector of partial derivatives of the function F, defines the direction of maximum rate of change at a point [Adams, 1999]. The Hessian matrix⁴ is a
matrix of second-order partial derivatives of F.

⁴ The Hessian matrix is not to be confused with the Jacobian matrix, a matrix containing the gradient vectors of a vector valued function (e.g. F(X)) as its rows.

A scalar valued function F(X), with X = (x1, x2, ..., xn)^T, has a vector
valued gradient

$$\nabla F(X) = \left( \frac{\partial F}{\partial x_1}, \frac{\partial F}{\partial x_2}, \ldots, \frac{\partial F}{\partial x_n} \right)^T$$

and its Hessian matrix is defined as

$$H_F(X) = \nabla^2 F(X) =
\begin{pmatrix}
\frac{\partial^2 F}{\partial x_1^2} & \frac{\partial^2 F}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 F}{\partial x_1 \partial x_n} \\
\frac{\partial^2 F}{\partial x_2 \partial x_1} & \frac{\partial^2 F}{\partial x_2^2} & \cdots & \frac{\partial^2 F}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 F}{\partial x_n \partial x_1} & \frac{\partial^2 F}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 F}{\partial x_n^2}
\end{pmatrix}$$

Note: The Hessian matrix of F is equal to the Jacobian matrix of ∇F,
provided that F is sufficiently smooth to ensure equality of mixed partial
derivatives (Fij = Fji) [Adams, 1999].

There are several ways to obtain gradient information, i.e. perform sensi-
tivity analysis. If the objective and constraints are well-behaved analytical
functions, an expression for the gradient can be found directly. However,
this is rarely the case in real life problems. Often complicated numerical
analysis tools are used which, in many cases, do not readily yield gradient
information.
In those cases where the objective is analyzed with the help of ‘black-
box’ tools that yield only function values, the simplest approach to obtaining
gradient information is via the method of finite differences (FD). Because
this is an approximate method it will give rise to inaccuracies, and as such
the method is prone to numerical instability.
A finite difference approximation of the gradient of a function of multiple
variables requires one extra function evaluation per variable, as becomes
clear from the (forward) finite difference scheme for the ith partial derivative
in ∇F [Pauw and Vanrolleghem, 2006]:

$$\frac{\partial F}{\partial x_i} \approx \frac{F(X + \Delta X_i) - F(X)}{h_i} \qquad (2.9)$$
Note that ∆Xi is a vector with a single non-zero entry: a small pertur-
bation hi on the ith component of X. This implies that, if the objective is
a function of n variables, a finite difference approximation of ∇F requires n
perturbed function evaluations for every optimization iteration.
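A minimal sketch of such a forward finite difference gradient, following equation 2.9 (the test function is an arbitrary example):

```python
import numpy as np

def fd_gradient(F, X, h=1e-6):
    """Forward finite difference approximation of the gradient of F at X.
    Costs one baseline plus n perturbed function evaluations."""
    X = np.asarray(X, dtype=float)
    F0 = F(X)                          # baseline evaluation
    grad = np.zeros_like(X)
    for i in range(X.size):
        Xp = X.copy()
        Xp[i] += h                     # perturb only the i-th design variable
        grad[i] = (F(Xp) - F0) / h     # equation (2.9)
    return grad

# Example: F(X) = x1^2 + 3*x2; the exact gradient at (1, 2) is (2, 3)
print(fd_gradient(lambda X: X[0]**2 + 3 * X[1], [1.0, 2.0]))
```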
If a function evaluation is cheap, i.e. low computing cost, this is not
a big problem, but in real engineering problems complicated analysis tools
are often used, such as CFD (Computational Fluid Dynamics) or FE (Finite
Element) solvers, which can be quite computationally intensive⁵.

⁵ These codes are themselves often iterative in nature.

Furthermore, selection of the correct perturbation factor h is often a
problem. If the factor is chosen too small the method will suffer from numer-
ical inaccuracies because two almost-equal numbers are subtracted, whereas
a factor that is too large may result in instability due to nonlinearity of the
function.
Fortunately more advanced techniques exist for obtaining gradient in-
formation that offer significant advantages in terms of numerical accuracy
and/or computational burden.
Important methods are the Complex-step Derivative Approximation technique [Pauw and Vanrolleghem, 2006], the use of Global Sensitivity Equations (for multidisciplinary system sensitivity) [Sobieszczanski-Sobieski, 1990],
the use of adjoint methods [Giles and Pierce, 2000], [McNamara et al.,
2004], [Kaminski et al., 2005], and automatic differentiation (AD) [Verma,
2000], [Griewank, 2000], [Rall, 1981], [Bücker et al., 2006].
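To give a flavour of the complex-step idea, the short sketch below perturbs the input along the imaginary axis, which avoids the subtractive cancellation of finite differences and therefore allows an extremely small step (the test function is an arbitrary example):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-30):
    """Complex-step approximation f'(x) ~ Im(f(x + i*h)) / h.
    Valid for real-analytic f built from complex-safe operations."""
    return np.imag(f(x + 1j * h)) / h

# Example: f(x) = exp(x) * sin(x); exact derivative at x = 1 is exp(1) * (sin(1) + cos(1))
f = lambda x: np.exp(x) * np.sin(x)
print(complex_step_derivative(f, 1.0))
print(np.exp(1.0) * (np.sin(1.0) + np.cos(1.0)))    # agrees to machine precision
```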
Sensitivity information can be calculated analytically using the direct
method or using the adjoint method. The computational burden for the
adjoint method depends on the number of functions for which the gradients
need to be calculated, rather than on the number of design variables. For the
direct method it is the other way round. Therefore, the adjoint approach
can be very efficient when the number of design variables is larger than
the number of functions, and otherwise the direct method may prove more
interesting [Martins, 2002]. These analytical approaches are discussed in
some more detail in Appendix 2.A.
Automatic differentiation (a.k.a. algorithmic differentiation) is a tech-
nique that makes use of the chain rule from calculus. Numerical analysis
programs, for example FE codes or CFD codes, can be adapted using AD
techniques to yield gradient information to working precision at an effective
cost of at most five extra function evaluations, regardless of the number of
independent variables (compared to n extra evaluations for FD!) [Griewank,
2000].
Any numerical analysis program, no matter how complex, is composed of
basic mathematical operations such as e.g. ∗, +, sin, exp. AD operates by
decomposing the program into these basic elements, evaluating their deriva-
tives (using actual numerical values, not symbolic variables), and using the
chain rule to recombine those values to form the gradient. This process may
be performed in forward mode or reverse mode (adjoint mode). The latter
is the method of choice for functions of many variables.
Although AD methods yield gradient information accurate to working
precision at relatively low cost, they require specific adaptation of the ana-
lyzer code, which complicates their use with ‘black-box’ analyzers.
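To give a flavour of how forward-mode AD propagates derivatives through elementary operations with the chain rule, here is a toy sketch based on dual numbers (an illustration only, not a production AD tool):

```python
import math

class Dual:
    """A value together with its derivative; operators apply the chain rule."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)   # product rule

def sin(x):
    """Elementary function together with its derivative rule."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

# Differentiate F(x) = x*sin(x) + x at x = 2 by seeding the derivative with 1
x = Dual(2.0, 1.0)
F = x * sin(x) + x
print(F.val, F.der)    # F'(2) = sin(2) + 2*cos(2) + 1
```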
Another very powerful option for reducing computational cost, which
has benefits throughout the optimization process, is the use of surrogate
models such as response surfaces [Jones, 2001]. With the help of techniques
from design of experiments, a field focused on efficiently gathering useful

information, a number of design points are selected and the objective is
evaluated for these points using the full analysis model (e.g. a CFD model).
Then (hyper-) surfaces are fit to the design points, and these so-called
response surfaces are used as approximate models in the optimization, in-
stead of the original (CFD) model. The evaluation of a surrogate model
is much cheaper than the evaluation of the original model. The applica-
tion of response surface methods may even allow the use of computationally
expensive global optimizers [Laurenceau and Meaux, 2008].
The use of response surfaces, design of experiments and global optimiza-
tion are discussed at a later stage.
But no matter how the gradient information is obtained, once it is avail-
able, it becomes possible to use gradient based optimization methods.

2.2.4 Gradient Based Optimization Methods


By definition simple problems are easier to solve than hard problems. For
this reason, people often try to transform hard problems into simpler ones
for which they know how to obtain a solution.
Most gradient based methods are examples of this approach. They re-
duce an n-dimensional optimization problem to an optimization problem
in one dimension called one-dimensional search or line search. The one-
dimensional search approach is described as follows.
The iterative search process described by equation 2.8 is transformed
into the following one by assuming that the perturbation δXq is equal to
α∗ Sq [Vanderplaats, 2007]:

Xq = Xq−1 + α∗ Sq (2.10)
where the vector Sq represents a search direction and the scalar α∗ rep-
resents the amount of change in that direction necessary to minimize the
objective function F (Xq ) subject to the constraints. The asterisk denotes
an optimum.
Note that Equation 2.10 effectively expresses the design vector Xq as a
function of the single variable α. As a result, the objective and constraints
are also reduced to functions of a single variable, i.e. F (Xq ) = F (Xq (α)).
This single variable optimization problem can now be solved to find the
optimum perturbation δXq .
Thus, in order to find Xq , two steps need to be taken:

1. Choose a search direction Sq


2. Minimize the resulting one-dimensional problem in terms of α in order
to find δXq = α∗ Sq

The one-dimensional optimization in step 2 is usually performed using a combination of well established methods such as the iterative Golden
Section method and the polynomial approximation method [Vanderplaats,
2007]. It is very important to understand that this one-dimensional optimization is performed in every iteration q of the main optimization: The n-
dimensional optimization problem is effectively transformed into a sequence
of one-dimensional searches.
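As an illustration of step 2, here is a bare-bones Golden Section search for the one-dimensional minimization, assuming the minimum has already been bracketed in [a, b] (the test function is an arbitrary example):

```python
def golden_section(phi, a, b, tol=1e-6):
    """Narrow a bracket [a, b] around a minimum of the one-dimensional
    function phi(alpha) using the Golden Section ratio."""
    inv_gr = (5 ** 0.5 - 1) / 2           # 1/golden ratio, ~0.618
    c = b - inv_gr * (b - a)
    d = a + inv_gr * (b - a)
    while (b - a) > tol:
        if phi(c) < phi(d):               # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_gr * (b - a)
        else:                             # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_gr * (b - a)
    return 0.5 * (a + b)

# Example: minimum of phi(alpha) = (alpha - 1.3)^2 on the bracket [0, 4]
print(golden_section(lambda alpha: (alpha - 1.3) ** 2, 0.0, 4.0))   # ~1.3
```

In practice the resulting narrow bracket is often followed by a polynomial approximation of phi(alpha) to pin down the minimum more precisely, as described in the text.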
Step 1, the selection of a search direction Sq , is where the various gra-
dient based optimization methods are distinguished from each other. In
this direction finding problem, there are two basic approaches, based on the
distinction between constrained and unconstrained optimization problems.
Direct optimization methods approach the constrained problem directly,
so they are able to take into account constraints explicitly. Examples of
direct gradient based methods are: sequential linear programming (SLP)⁶,
the method of feasible directions, the modified feasible directions method,
the generalized reduced gradient method, and sequential quadratic program-
ming (SQP).
SQP is a very powerful method which is often used. A concise description
of the method can be found in appendix 2.B.
Examples of direct non-gradient based methods are: genetic algorithms,
particle swarm methods, simulated annealing.
Indirect methods first reduce the constrained problem to an equivalent
unconstrained problem by producing a pseudo-objective function with the
help of penalty functions and/or Lagrange multipliers. Subsequently they
proceed by solving the unconstrained problem using well established meth-
ods for unconstrained optimization. This will be discussed in more detail in
the next section.
Due to the use of penalty functions, numerical ill-conditioning may arise.
In order to prevent this problem, the pseudo-objective is often optimized se-
quentially, starting out with small values for the penalty parameters and
then sequentially increasing these values. After each increase a new un-
constrained optimization needs to be performed. Methods that work like
this are referred to as Sequential Unconstrained Minimization Techniques
(SUMT) [Vanderplaats, 2007]. Examples of SUMT methods are: the ex-
terior penalty function method, various interior penalty function methods,
and the augmented Lagrange multiplier method.
Examples of first-order techniques for unconstrained optimization are:
the steepest descent method, the conjugate direction method, and variable
metric methods. Newton’s method is an example of a second-order method.
In order to obtain a better understanding of optimization algorithms we
will now proceed by examining one of the indirect (sequential unconstrained)
optimization methods in more detail.
⁶ SLP does not strictly use line search as defined in Equation 2.10; instead it minimizes a first-order Taylor (i.e. linear) approximation of the objective about the qth design point using a linear programming method (e.g. the simplex method) in order to find the perturbation vector δX directly.

2.2.5 An Example: The Exterior Penalty Function Method
in Combination with the Method of Steepest Descent
The exterior penalty function (EPF) method is treated here because it is
well suited for clarifying the kind of nested-loop structure⁷ that occurs in
optimization algorithms. In practical applications the EPF method has been
replaced by more advanced sequential methods that include Lagrange mul-
tipliers (e.g. the Augmented Lagrange Multiplier method) [Vanderplaats,
2007].
The complete EPF algorithm as described here is depicted in Figure 2.7.
The figure clearly shows the nested iterative-loop structure that is typical
for numerical optimization algorithms.
Indirect methods employ a pseudo-objective, which incorporates the ob-
jective as well as the constraints, to transform the constrained problem into
an unconstrained problem. This can be done using a kind of weighted sum
of the constraint functions, called the penalty function P (X):

Φ(X, rp ) = F (X) + rp P (X) (2.11)


where the scalar rp is called a penalty parameter and subscript p rep-
resents the iteration number of the sequential unconstrained optimization.
For the EPF method the pseudo-objective Φ takes the following form:

 
$$\Phi(X, r_p) = F(X) + r_p \left[ \sum_{j=1}^{m} \big( \max[0, g_j(X)] \big)^2 + \sum_{k=1}^{l} \big( h_k(X) \big)^2 \right] \qquad (2.12)$$

Note that, if the constraints are satisfied, no penalty is applied (both


summations in the penalty function equate to zero). Any constraint violation
however is penalized by the factor rp . The constraint violations are squared
in order to ensure a continuous slope of the pseudo-objective.
For small values of rp the function Φ is easily minimized but may yield
large constraint violations, because constraint violations only have a rela-
tively small effect on the value of the pseudo-objective (note that an opti-
mum is only approached to within some predetermined accuracy, as defined
in the convergence criteria). On the other hand, if a large value is chosen
for rp , the constraints may be nearly satisfied but the optimization problem
becomes poorly-conditioned in a numerical sense.
This is why a sequential approach is used: The pseudo-objective is first
minimized, using an unconstrained minimization technique, for some small
value of rp . This will yield a solution that is infeasible, but it will be some-
where in the neighborhood of the real optimum. Then the penalty parameter
⁷ This is because complex problems are reduced to simpler problems by introducing extra iterative loops (e.g. constrained ⇒ unconstrained, multivariate ⇒ single-variable).

Figure 2.7: The Exterior Penalty Function method, using the method of Steepest Descent for unconstrained optimization and the Golden Section method in combination with polynomial approximation for the one-dimensional minimization. (Flowchart summary: the outer EPF loop starts from given X0, rp, and γ, minimizes Φ(X, rp), and if not converged sets rp = γ·rp and repeats; the unconstrained minimization is an inner steepest-descent loop with S = −∇Φ(X) and X = X + α*·S, where the step α* is found by bracketing the minimum of Φ(α), refining the bounds with the Golden Section method, and applying a polynomial approximation.)
is increased by multiplying with some factor γ, and the resulting pseudo-
objective is minimized again, starting from the previous optimum solution.
The new solution will still be infeasible but it will be closer to the true
optimum. This process is continued until a satisfactory result is obtained.
The solution is assessed by what is called a converger, a piece of logic that
compares subsequent solutions in order to determine whether convergence
is reached, i.e. the convergence criteria have been met.
This sequential process is illustrated in figure 2.8, using a two-dimensional
objective function and a single inequality constraint.
Notice that the optimum is approached from the infeasible region, hence
the name exterior penalty method. Interior methods approach the opti-
mum from inside the feasible region, which is often preferable because then
intermediate solutions can also be used.
The minimization of the pseudo-objective function is an unconstrained
minimization problem which can be approached using one of various well
known techniques. We will use one of the simplest of those in our EPF
algorithm: the method of steepest descent. The method of steepest descent
is a first-order unconstrained optimization method based on one-dimensional
search. It is treated here only because it is easy to explain, not because of its
performance (it has the worst performance of all the first-order methods).
As the name implies, the method of steepest descent uses the direction
of steepest descent as the search direction in the one-dimensional search.
The gradient represents the direction of steepest ascent, so the method of
steepest descent employs the negative gradient as the search direction:

Sq = −∇F (Xq−1 ) (2.13)


Given this search direction, the method of steepest descent performs a
one-dimensional minimization of Φ(Xq (α)) in order to find the optimum
step α∗ . This one dimensional minimization can be done with the help of
a combination of methods, for example using the Golden Section method,
which iteratively narrows the bounds on the minimum, followed by a poly-
nomial approximation, which allows easy calculation of the minimum. For
more detailed information on these methods please refer to [Vanderplaats,
2007].
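A compact sketch of the whole nested procedure of figure 2.7 is given below, applied to a small made-up problem (minimize F(X) = x1^2 + x2^2 subject to g(X) = 1 − x1 − x2 ≤ 0). For brevity the gradient of the pseudo-objective is obtained by finite differences and the inner one-dimensional search is delegated to SciPy's bounded scalar minimizer rather than an explicit Golden Section routine:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up problem: minimize F(X) = x1^2 + x2^2  subject to  g(X) = 1 - x1 - x2 <= 0
F = lambda X: X[0]**2 + X[1]**2
g = lambda X: 1.0 - X[0] - X[1]

def pseudo_objective(X, rp):
    """Exterior penalty pseudo-objective, equation (2.12), for one inequality constraint."""
    return F(X) + rp * max(0.0, g(X))**2

def fd_grad(f, X, h=1e-6):
    """Forward finite-difference gradient of f at X (see equation 2.9)."""
    f0 = f(X)
    return np.array([(f(X + h * np.eye(X.size)[i]) - f0) / h for i in range(X.size)])

def epf_steepest_descent(X0, rp=0.1, gamma=5.0, n_outer=8, n_inner=50):
    X = np.array(X0, dtype=float)
    for _ in range(n_outer):                 # sequential unconstrained minimizations
        for _ in range(n_inner):             # steepest descent on Phi(X, rp)
            S = -fd_grad(lambda Z: pseudo_objective(Z, rp), X)   # search direction
            if np.linalg.norm(S) < 1e-8:
                break
            # one-dimensional search for the step alpha along S
            line = minimize_scalar(lambda a: pseudo_objective(X + a * S, rp),
                                   bounds=(0.0, 1.0), method="bounded")
            X = X + line.x * S
        rp *= gamma                          # increase the penalty parameter
    return X

print(epf_steepest_descent([2.0, 0.0]))      # tends towards [0.5, 0.5]
```

As rp grows, the pseudo-optimum of each unconstrained stage moves closer to the constrained optimum (0.5, 0.5), approaching it from the infeasible side as in figure 2.8.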
The more advanced indirect methods that are used nowadays rely on
the use of Lagrange multipliers. These are treated next in the discussion of
existence of optima.

2.2.6 Lagrange Multipliers and the Existence of Optima


If a design solution is found using an optimization method, we would like to
know whether this is a local optimum or a global optimum. Here we define
some necessary conditions for an optimum to exist. This is done with the
help of the Lagrangian.

[Panels: original constrained problem; pseudo-objective with rp = 0.01, 0.10, 0.30, 0.60, and 1.00]

Figure 2.8: Example of an unconstrained optimization sequence using the exterior
penalty function method. The first picture shows the contour lines of the original
objective together with an inequality constraint. The shaded area represents the infeasible region. The small circle represents the actual constrained optimum. The
other pictures show the contour lines of the pseudo-objective function. The small
cross represents the (unconstrained) optimum of the pseudo-objective. For small
values of the penalty parameter rp the pseudo-objective resembles the original objective. As rp increases the contour lines are molded around the constraint boundary.
Clearly the pseudo-optimum approaches the real optimum from outside the feasible
region. The density of the contour lines for high values of rp is an indication of
poor conditioning far from the optimum.

In the previous section, the pseudo-objective was introduced as a com-
bination of objective and constraints. The Lagrangian represents another
unconstrained formulation of the constrained optimization problem. The ob-
jective function and constraint functions are combined into a single function
using Lagrange multipliers (λ) as weighting parameters. The Lagrangian is
defined as [Vanderplaats, 2007]:
$$L(X, \lambda) = F(X) + \sum_{j=1}^{m} \lambda_j g_j(X) + \sum_{k=1}^{l} \lambda_{m+k} h_k(X) \qquad (2.14)$$

The Lagrangian provides another means for optimization. For example, in the Augmented Lagrange Multiplier (ALM) method the objective
function is augmented using penalty parameters as well as the Lagrange
multipliers, which leads to a significant improvement over SUMT that do
not use Lagrange multipliers, such as the exterior penalty function method
described earlier [Vanderplaats, 2007]. However, the ALM method will not
be discussed in detail here, because we are interested in another use of the
Lagrangian.
The Lagrangian allows us to define conditions for optimality in con-
strained optimization problems. These conditions are called the Kuhn-
Tucker necessary conditions⁸ and they define a stationary point of the La-
grangian [Vanderplaats, 2007].

The Kuhn-Tucker conditions state that a vector X∗ defines an optimum
design if the following three conditions are satisfied:

1. The solution must satisfy all the constraints:

   X∗ is feasible

2. If an inequality constraint is inactive (i.e. gj(X∗) < 0, whereas for an
   active constraint gj(X∗) = 0), the corresponding Lagrange multiplier
   must be zero:

   λj gj(X∗) = 0    for j = 1, 2, ..., m

3. The gradient of the Lagrangian must vanish at X∗:

$$\nabla F(X^*) + \sum_{j=1}^{m} \lambda_j \nabla g_j(X^*) + \sum_{k=1}^{l} \lambda_{m+k} \nabla h_k(X^*) = 0 \qquad (2.15)$$

   where λj ≥ 0 (inequality constraints)
   and λm+k is unrestricted in sign (equality constraints)
⁸ Also known as Karush-Kuhn-Tucker or KKT conditions.

The first condition is rather obvious, any optimum solution must be fea-
sible.
The second condition may seem somewhat harder to understand. It is
important to remember the standard formulation of the optimization prob-
lem from section 2.2.1. This formulation dictates that an inactive constraint
has a negative (nonzero) value. In this light, we may interpret the second
condition as a requirement that the value of the Lagrangian is equal to the
value of the objective function. Again note that equality constraints must
always be active for a feasible solution, and hence always have zero value.
The third condition requires that, at the optimum, the gradient of the
Lagrangian must be equal to zero. This is equivalent to saying that the
function described by the combined active constraints (with multipliers) is
tangent to the objective function. If this were not the case, a move along the
constraint boundary would still improve the objective value. Alternatively
we may say that, at the optimum, the gradient of the objective can be
expressed as a linear combination of the active constraint gradients, with
the Lagrange multipliers as the coefficients. More formally, the gradient of
the objective is entirely contained in the subspace spanned by the gradients
of the constraints.
The latter interpretation may be clarified by figures 2.9, 2.10, and 2.11.
These figures show examples of a two dimensional constrained optimization
problem with linear inequality constraints. The active constraint gradients
at the optimum are depicted in figure 2.9. Figure 2.10 shows the vector
sum that is represented by the gradient of the Lagrangian for two active
constraints. Figure 2.11 shows the same thing for a problem with one con-
straint inactive at the optimum. From the latter it also becomes clear that
the Lagrange multiplier for the inactive constraint needs to be equal to zero.
As the name implies, the Kuhn-Tucker necessary conditions are necessary
for a solution to be a global optimum, but they are not sufficient. Only if
the design space is convex will the optimum be global; otherwise the Kuhn-
Tucker conditions guarantee only local optimality.
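To make the conditions concrete, consider the small convex example of minimizing F(X) = x1^2 + x2^2 subject to g(X) = 1 − x1 − x2 ≤ 0 (a made-up problem, not from the text). At X∗ = (0.5, 0.5) the constraint is active and the conditions hold with λ = 1, as the short numerical check below confirms:

```python
import numpy as np

g      = lambda X: 1.0 - X[0] - X[1]                    # inequality constraint, g(X) <= 0
grad_F = lambda X: np.array([2.0 * X[0], 2.0 * X[1]])   # gradient of F(X) = x1^2 + x2^2
grad_g = lambda X: np.array([-1.0, -1.0])               # gradient of g

X_star, lam = np.array([0.5, 0.5]), 1.0

print(g(X_star) <= 0.0)                        # condition 1: feasible (constraint active, g = 0)
print(abs(lam * g(X_star)) < 1e-12)            # condition 2: lambda * g(X*) = 0
print(grad_F(X_star) + lam * grad_g(X_star))   # condition 3: gradient of L vanishes -> [0, 0]
```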
Remember from the discussion on convexity that the design space is
convex if the objective as well as the constraints are convex functions. This
requires that their Hessian matrices be positive definite (i.e. the matrices
have all positive eigenvalues) on the entire domain. This is the multi-variable
equivalent to the requirement in single variable minimization that the second
derivative be strictly positive.
Unfortunately, in practical design applications it is hardly ever possible
to demonstrate design space convexity. Therefore the usual approach is to
start the optimization process from several different initial points and to see
whether they all converge to the same final design.
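A minimal multistart sketch along these lines, assuming SciPy is available (the non-convex test function is an arbitrary example):

```python
import numpy as np
from scipy.optimize import minimize

# Arbitrary non-convex test function with several local minima
F = lambda X: (X[0]**2 - 4.0)**2 + (X[1] - 1.0)**2 + np.sin(3.0 * X[0])

starts = [np.array(p) for p in [(-3.0, 0.0), (0.5, 0.5), (3.0, 2.0)]]
solutions = [minimize(F, x0, method="BFGS").x for x0 in starts]

for x0, x_opt in zip(starts, solutions):
    print(x0, "->", np.round(x_opt, 3), "F =", round(F(x_opt), 3))
# Different starting points may converge to different local optima,
# which is a strong hint that the design space is not convex.
```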


Figure 2.9: A two dimensional problem with two active linear inequality constraints
at the optimum. The vector sum of the constraint gradients and the objective gra-
dient is not equal to zero. The shaded area represents the infeasible region.


Figure 2.10: A two dimensional problem with two active linear inequality constraints
at the optimum. The constraint gradient vectors are scaled using the Lagrange
multipliers, so that the vector sum of the constraints exactly opposes the gradient
of the objective function. Thus, the Lagrangian vanishes at the optimum.


Figure 2.11: A two dimensional problem with one active linear inequality constraint
at the optimum. The Lagrange multiplier for the inactive constraint (λ2 ) must be
equal to zero, at the optimum, for the Lagrangian to vanish.

The global optimization problem can also be tackled using zero-order meth-
ods.

2.2.7 Global Optimization and Approximate Methods


Non-convex problems generally have multiple local minima (and maxima), as
illustrated in figure 2.12. Without special care, gradient based optimization
algorithms are likely to get trapped at such local optima. Global optimization
methods, on the other hand, are intended to overcome this problem and find
the global optimum (as the name suggests).
Local gradient based optimization methods can be extended with global
search capabilities using techniques such as, e.g. tabu search [Glover, 1989].
Basically, tabu search techniques temporarily exclude certain areas from
the design space. A tabu-list is maintained which contains forbidden search
directions. The gradient based optimizer is only allowed to search in di-
rections that are not in the tabu-list [Bartholomew-Biggs et al., 2003]. For
example, the list could contain a fixed number of the most recently visited
search locations, making sure that these are not revisited. Many other tabu
strategies are available, as described by e.g. [Glover, 1989], [Glover, 1990],
[Glover et al., 1993].
Zero-order methods such as genetic algorithms or simulated annealing
are naturally able to search for a global optimum, due to their random
element which allows them to jump out of any local optimum. One of the

Figure 2.12: An example of a non-convex problem: Langermann’s function. This
and other test functions for global optimization can be found in literature, e.g. in
[Molga and Smutnicki, 2005] (a somewhat obscure document which nevertheless
provides a clear and practical overview of test functions).

great disadvantages of zero-order methods, however, is the large number
of function evaluations required. This makes them ill-suited for use with
computationally expensive models (e.g. large FE or CFD models).
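As a rough sketch of how such a method works, and of why it needs many evaluations, here is a bare-bones simulated annealing loop (the one-dimensional test function and the cooling schedule are arbitrary choices):

```python
import math
import random

def simulated_annealing(F, x0, step=0.5, T0=1.0, cooling=0.995, n_iter=5000):
    """Bare-bones simulated annealing: random moves, worse moves accepted with
    probability exp(-dF / T), temperature T slowly decreased."""
    x, fx, T = x0, F(x0), T0
    best_x, best_f = x, fx
    for _ in range(n_iter):                     # every iteration costs one evaluation
        x_new = x + random.uniform(-step, step)
        f_new = F(x_new)
        if f_new < fx or random.random() < math.exp(-(f_new - fx) / T):
            x, fx = x_new, f_new                # accept the move
            if fx < best_f:
                best_x, best_f = x, fx
        T *= cooling
    return best_x, best_f

# Arbitrary multimodal test function; its global minimum lies near x = -0.5
F = lambda x: 0.1 * x**2 + math.sin(3.0 * x)
print(simulated_annealing(F, x0=4.0))
```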
However, it is possible to replace the computationally expensive model
with a much simpler one that tries to approximate its response. Such a
simple replacement model is called a response surface approximation (RSA)
model, approximate model, surrogate model or meta-model [Queipo et al.,
2005]. A response surface approximation model requires only a very small
fraction of the computational time of the original model, which makes it
suitable for use in a zero-order optimization method.
Note that it is also possible to use response surface approximations to
speed up gradient based optimization routines.
A response surface approximation model is constructed using results
obtained from the original model. This construction is nothing more than
a multi-dimensional curve fit on the available data. The method used for
fitting this curve, surface, or hyper-surface, determines to a great extent the
obtainable accuracy of the fit.
A fit is made with the help of a predictor function, and many different
types of predictors are available. Predictor functions may be either interpo-
lating or non-interpolating. The difference is illustrated in figure 2.13.
Interpolating functions have the advantage that the fit generally becomes
better as the number of samples (data points) increases. This is because the
surface passes through all the sample points, so the estimation error at the
sample points equals zero.

Figure 2.13: The difference between an interpolating fit (cubic spline interpolation)
and a non-interpolating fit (a third-degree polynomial fit) through a set of
observations of F(X).
Non-interpolating functions have the advantage that they are less sensi-
tive to noise in the data, but the fit can only be as good as allowed by the
shape function that is used. The addition of data points does not necessarily
improve the fit. For example, a second-degree polynomial function cannot
accurately fit a third-degree polynomial at all points.
For data obtained from numerical simulations the amount of noise is
usually small, which implies that interpolating methods are the best choice
[Jones, 2001].
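The two kinds of fit from figure 2.13 can be reproduced in a few lines. The sketch
below uses NumPy and SciPy with made-up observations: a least-squares polynomial
of low degree does not pass through the data, whereas a cubic spline interpolates
it exactly.

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Made-up observations of some response F(X).
    x_obs = np.array([0.0, 0.2, 0.45, 0.7, 1.0])
    f_obs = np.array([1.0, 0.3, 0.6, 0.2, 0.8])

    poly = np.poly1d(np.polyfit(x_obs, f_obs, deg=3))   # non-interpolating fit
    spline = CubicSpline(x_obs, f_obs)                  # interpolating fit

    # The spline reproduces the observations exactly, the polynomial does not.
    print(np.max(np.abs(spline(x_obs) - f_obs)))        # essentially zero
    print(np.max(np.abs(poly(x_obs) - f_obs)))          # a nonzero residual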
In the following example, let us assume we are dealing with a com-
putationally expensive CFD model (e.g. requiring one day per function
evaluation). Obviously a zero-order optimization method requiring several
hundreds or thousands of function evaluations cannot be used to find an
optimum, because of the amount of computing time that would be required.
Therefore a response surface approximation of the original CFD model is
constructed. This will allow the use of a zero order method. An illustra-
tion of the optimization process using response surface approximations is
depicted in figure 2.14.
In order to construct a response surface approximation of the CFD
model, response data are required. These data are generated by evaluat-
ing the CFD model at a number of sample points, which yields a number of
observations. Because the CFD evaluations are very expensive, it is important
to choose the sample points wisely. Therefore a design-of-experiments (DOE)
approach is usually adopted in order to produce an efficient initial data set.
Design-of-experiments aims to maximize the amount of information that is
obtained from a minimum number of sample points [Queipo et al., 2005],
[Swiler et al., 2006].

Figure 2.14: Optimization using a response surface approximation of a computationally
expensive numerical model (flowchart: design of experiments; numerical simulation(s)
at new sample location(s); construction of the response surface approximation (RSA);
optimization using the RSA model; repeat with n = n + 1 until converged). A one
dimensional example is shown for each process. The dotted line in the examples
represents the true response of the numerical model, which is unknown except at the
sample locations.

Figure 2.15: A two-dimensional Latin Hypercube Sample of size n=8 (i.e. 8 points in
the two-dimensional design space). For each of the two variables, the domain is divided
into n intervals of equal probability (in this case for a uniform distribution) represented
by the horizontal and vertical lines, and a single value is randomly selected from each
interval. Then the variables are shuffled (called pairing) in order to obtain a better
spread of samples. Pairing may be random as in this example, or it may be according
to some scheme that ensures higher uniformity.
Examples of sampling strategies from design-of-experiments are Monte
Carlo sampling and Latin hypercube sampling (LHS) [McKay et al., 1979].
A Latin hypercube is an extension of the Latin square to more than two
dimensions. In the statistical sampling context, a Latin square is a square
grid with only one sample in each row and each column, as depicted in figure
2.15.
In order to obtain a Latin hypercube sample of size n, i.e. n sample
points, regardless of the number of variables d, the domain of each variable
is subdivided into n intervals of equal probability (corresponding to n rows or
columns). A sample is taken randomly from each interval for each variable.
The samples for all variables are then combined according to some shuffling
scheme (e.g. at random). This yields n samples of dimension d, which
provide a reasonable estimate of the distribution of the objective function
values [Wyss and Jorgensen, 1998].
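For a uniform distribution on the unit hypercube, the construction described above
reduces to stratifying each variable into n equal intervals, drawing one value per
interval, and shuffling the columns independently (random pairing). A minimal
sketch:

    import numpy as np

    def latin_hypercube(n, d, seed=None):
        """Latin hypercube sample of n points in d dimensions on [0, 1]^d.

        Each variable's domain is split into n equal-probability intervals,
        one value is drawn at random from each interval, and the columns are
        then shuffled independently (random pairing)."""
        rng = np.random.default_rng(seed)
        # One point per interval: (i + U[0, 1)) / n for i = 0, ..., n-1.
        samples = (np.arange(n)[:, None] + rng.random((n, d))) / n
        for j in range(d):                               # random pairing
            samples[:, j] = samples[rng.permutation(n), j]
        return samples

    print(latin_hypercube(n=8, d=2, seed=0))             # cf. figure 2.15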

Once a number of observations have been obtained, it becomes possible
to construct a response surface approximation based on these points. A
powerful interpolating predictor that can be used here is the Kriging pre-
dictor.
Kriging is based on a statistical approach, using the mean and variance
of the available data together with the correlation between sample points
to predict the function value at locations that have not yet been sampled.
Kriging also provides a measure for estimating the error in the prediction.
This enables the selection of new sample points at locations where there is
high uncertainty about the predicted function value. Detailed descriptions
of the method can be found, for example, in [Jones, 2001], [Queipo et al.,
2005], [Simpson and Mistree, 2001], or [Swiler et al., 2006]. A disadvantage
of Kriging is that the predictor may become ill defined if the distance be-
tween sample points becomes very small, which may occur as the number
of samples increases.
After constructing the response surface it is used as a surrogate model in
the optimization routine. The (global) optimizer searches for the optimum
of the response surface model. The resulting optimum will be close to the
real optimum only if the response surface provides a good fit to the real
response. If this is not the case, the CFD model is evaluated at the new
‘optimum’ design point. This yields a new observation which is then used
to update the response surface approximation, and the process is repeated
until convergence.
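The loop of figure 2.14 can be mimicked with a deliberately simplified
one-dimensional sketch: the 'expensive' model below is just an ordinary function,
the surrogate is a cubic spline rather than a Kriging model, and the surrogate
optimum is found by dense sampling instead of a genetic algorithm. These choices
are assumptions made only to keep the example short.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def expensive_model(x):            # stand-in for a costly CFD analysis
        return np.sin(3.0 * x) + 0.5 * (x - 0.4) ** 2

    # Initial design of experiments: a handful of samples on [0, 2].
    x_s = np.linspace(0.0, 2.0, 5)
    f_s = expensive_model(x_s)

    for it in range(20):
        order = np.argsort(x_s)
        surrogate = CubicSpline(x_s[order], f_s[order])   # construct the RSA
        x_grid = np.linspace(0.0, 2.0, 2001)              # cheap zero-order search
        x_new = x_grid[np.argmin(surrogate(x_grid))]      # optimum of the RSA
        if np.min(np.abs(x_s - x_new)) < 1e-4:            # converged / repeated point
            break
        x_s = np.append(x_s, x_new)                       # one new expensive sample
        f_s = np.append(f_s, expensive_model(x_new))

    best = np.argmin(f_s)
    print(x_s[best], f_s[best])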
Apart from the global optimization problem, there are other types of
problems that cannot be tackled by gradient based methods without special
precautions. Discrete variable optimization is one example.

2.2.8 Discrete Variable Optimization


Many real design problems contain discrete variables. For example, the
number of stiffeners in a wing or fuselage, or the number of rivets used
to connect them, are discrete variables. Also, a design containing
commercial-off-the-shelf products is a discrete design problem (e.g. standard
beam sizes).
One way to solve a discrete variable optimization problem is to solve
the problem first in continuous space and then to round the solution to
the nearest allowable discrete or integer values [Vanderplaats, 2007]. The
solution, however, is often suboptimal or even infeasible.
Specific methods are available for discrete variable optimization. Zero
order methods like genetic algorithms are able to handle discrete variables,
but many discrete programming methods are variations on the classic branch
and bound method (e.g. [Clausen, 1999], [Vanderplaats, 2007], [Venkatara-
man, 2002]).
Branch and bound is a specific form of graph search method. Graph
search methods are used for routing problems, which occur in aerospace in
the form of wire routing or pipe routing, for example. Examples of these
methods are discussed in appendix 2.C, but here we will just focus on classic
branch-and-bound.
Branch and bound methods can be very expensive, requiring a multitude
of nonlinear optimization tasks to be performed, but they can be applied to
all kinds of discrete variable or mixed variable problems.
A branch and bound solution procedure may be visualized with the help
of a solution tree, such as the one depicted in figure 2.16. Such a tree consists
of nodes and branches.
The nodes represent states; optimum solutions to a continuous relaxation
of the problem. Continuous relaxation implies that the discrete variables are
treated as if they were continuous, which enables the use of a gradient based
optimizer.
At each node, one of the variables is fixed at an allowable discrete value,
after which a continuous relaxation of the problem is optimized with respect
to the remaining variables.
At node 1 the optimization needs to be initialized, so all of the variables
are treated as continuous, there is no fixed variable. The result of this
continuous optimization is used to determine the fixed variables for nodes 2
and 3, and so on. Nodes on the same branching level or tier, like nodes 2
and 3 in figure 2.16, have the same fixed variable.
The branches represent relationships between nodes and each branch
corresponds to a different allowable discrete value for the fixed variable at
that tier. For example, the branch from node 1 to node 2 in figure 2.16
corresponds to the fixed variable X1 = l1 .
The branching strategy determines how many branches are generated
(‘grown’) at each node. It is possible to have as many branches as there are
allowable discrete values for the variable under consideration. On the other
hand it is also possible to have only two branches per node, as depicted in
the figure.
In the latter case, the branches correspond to the nearest allowable dis-
crete neighbors of the continuous optimum value for the fixed variable under
consideration. For example, at the first tier X1 is the fixed variable. Let’s
say that the allowable discrete values for X1 are [0.12, 0.23, 0.45, 0.49], and
the continuous optimum value for X1 (from node 1) is 0.3401. The continu-
ous optimum value is then in the interval [0.23, 0.45], hence these values are
used as the bounds for the two branches: l1 = 0.23 and u1 = 0.45.
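This choice of bounds is a simple lookup; a sketch with the numbers from the
example above (values outside the allowable range are not handled here):

    import bisect

    allowed = [0.12, 0.23, 0.45, 0.49]     # allowable discrete values for X1
    x_star = 0.3401                        # continuous optimum from node 1

    i = bisect.bisect_left(allowed, x_star)
    l1, u1 = allowed[i - 1], allowed[i]    # nearest allowable neighbours
    print(l1, u1)                          # 0.23 0.45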
Numerous branch and bound strategies exist, varying in the number
of branches per node and in the order in which the nodes are evaluated
(traversal order). Three popular traversal strategies are breadth-first, depth-
first, and best-first. Breadth-first and depth-first are depicted in figure 2.17.
The tree from figure 2.16 is an example of a two-branch, best-first strat-
egy. This example is now discussed in more detail, using a step by step
approach.

Figure 2.16: Example of a branch and bound solution tree with two branches and
‘best-node-first’ selection criterion. Each node represents a state, i.e. a solution
to a continuous relaxation of the discrete problem. The node numbers indicate the
traversal order. At each level of branching another variable is fixed.

Figure 2.17: Alternative traversal order in branch-and-bound trees: breadth-first
(left-hand-side) and depth-first (right-hand-side). Although these example trees are
both the same shape, this is not necessarily the case when different traversal strate-
gies are used. The number of branches per node depends on the branching strategy.

Branch and bound works by sequentially optimizing a continuous relaxation
of the discrete variable problem. Assume we have a problem in three
discrete variables: X1 to X3 . First a lower bound for the objective value
is found by optimizing the problem with all variables treated as continuous
(i.e. continuous relaxation). Such an optimization may be performed with
any of the optimization methods discussed before. The continuous optimum
solution constitutes node 1 of the branch and bound tree.
From node one we start to construct the first tier. The first variable,
X1 , is chosen as the fixed variable. Assume that the nearest discrete bounds
on the continuous optimum value for X1 are l1 and u1 . Then the branch
on the left-hand-side will use l1 as an upper bound for X1 , and the branch
on the right-hand-side will use u1 as a lower bound on X1 . Effectively this
splits the design space into two subspaces based on X1 .
Now the nodes need to be evaluated. This is again achieved by optimizing
a continuous relaxation of the problem for each node. For the second node,
we set X1 = l1 , and the remaining variables are treated as continuous and are
free to change. For the third node we do the same, but now with X1 = u1 .
The two continuous optima constitute the new nodes. This completes the
first tier.
Now we need to determine from which node we continue the branching
procedure. First we need to check if one of the nodes can be pruned, i.e.
terminated. A node can be pruned if the continuous relaxation problem has
no feasible solution or if it represents a discrete solution. In the latter case,
we have found a possible solution to the problem. If this solution is better
than the previous solution (if there is one) it becomes the new incumbent
solution, i.e. the best discrete solution. A node that has been fully expanded
to the end has been fathomed.

Let’s assume that neither one of the nodes can be pruned, so we need to
select one of the two nodes to continue branching. In this example we use
the best-first strategy, which implies that we select the node with the best
objective function value (obtained from the continuous relaxation problem).
Let’s say that node 2 has the best objective function.
In the second tier, X2 becomes the fixed variable. The nearest discrete
bounds on the continuous optimum value for X2 (from node 2) are l2 and u2 .
These values are used to construct the two branches at node 2. However,
note that the upper bound on X1 still remains. The bounds on the free
variables are inherited by lower branches.
Now the new nodes (4 and 5) are evaluated by optimizing the continuous
relaxation of the problem with X2 = l2 and X2 = u2 respectively, both
subject to the constraint that X1 ≤ l1 . Node 4 turns out to be infeasible,
so this branch is pruned.
Again the new best node is selected, taking into account all nodes that
have not been fathomed yet (so 3 and 5). Now it appears that node five is
the most promising. Thus we start the third tier, where X3 is fixed. The
bounds are again inherited from the parent node, so that now X1 ≤ l1 and
X2 ≥ u2.
After optimization we find that node 7 is infeasible, and node 3 has a
better optimum than node 6, so we continue branching from there. Note
that we return to the second tier where X2 is the fixed variable and now X1
is bounded on the lower side by u1 .
Let’s say node 8 turns out to be infeasible, and the continuous optimum
at node 9 turns out to be a discrete solution. This represents the first
incumbent solution. However, the (continuous) objective value at node 6 is
lower than that for the incumbent, so it is still possible that something can
be gained there.
At the fourth tier we return to X1 as the fixed variable. The inherited
bound on X1 is now removed and new bounds are established. Bounds on
the other two variables are X2 ≥ u2 and X3 ≤ l3 . After optimization we
find another discrete solution at node 10.
The continuous optimum at node 11 has a worse value than the current
incumbent, so this branch can be pruned because it can not offer any more
improvement. The new discrete solution indeed has a better optimum value
than the current incumbent solution (node 9), so finally node 10 represents
the optimum solution to the discrete variable optimization problem.
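As an illustration of the mechanics, the following sketch solves a small made-up
problem (a convex quadratic with a few allowable values per variable) with a
best-first branch and bound in which each node fixes the next variable at one of
its allowable values (the 'one branch per allowable value' strategy mentioned
above, rather than the two-branch strategy of the figure), uses SciPy to optimize
the continuous relaxation at each node, and prunes nodes whose relaxation bound
cannot improve on the incumbent. It is a toy sketch, not a production
implementation.

    import heapq
    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical problem: a convex quadratic objective in three variables,
    # where each variable must end up at one of a few allowable discrete values.
    allowed = [np.array([0.12, 0.23, 0.45, 0.49])] * 3
    target = np.array([0.30, 0.18, 0.50])

    def objective(x):
        return float(np.sum((np.asarray(x) - target) ** 2))

    def relax(fixed):
        """Continuous relaxation: the first len(fixed) variables are frozen at
        discrete values, the remaining ones only have to stay inside the range
        spanned by their allowable values."""
        x0, bounds = [], []
        for i in range(len(allowed)):
            if i < len(fixed):
                x0.append(fixed[i])
                bounds.append((fixed[i], fixed[i]))
            else:
                lo, hi = allowed[i][0], allowed[i][-1]
                x0.append(0.5 * (lo + hi))
                bounds.append((lo, hi))
        res = minimize(objective, x0, bounds=bounds)
        return float(res.fun)

    incumbent_f, incumbent_x = np.inf, None
    heap = [(relax(()), ())]                     # node 1: nothing fixed yet
    while heap:
        bound, fixed = heapq.heappop(heap)       # best-first: lowest bound next
        if bound >= incumbent_f:                 # prune: cannot beat incumbent
            continue
        if len(fixed) == len(allowed):           # all variables discrete: a leaf
            incumbent_f, incumbent_x = bound, fixed
            continue
        i = len(fixed)                           # next tier fixes variable i
        for v in allowed[i]:                     # one branch per allowable value
            child = fixed + (float(v),)
            heapq.heappush(heap, (relax(child), child))

    print(incumbent_x, incumbent_f)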
This concludes the introduction to optimization methods. One very im-
portant question remains, however: How do we choose between the different
optimization methods?

2.2.9 Selection of an Optimization Method
In selecting an optimization method, one rule is of paramount importance:
Any information about the nature of the problem should be taken into ac-
count in the selection of an optimization strategy.
No matter how good a method may be, if it is applied to a problem it
is not suited for, it may even perform worse than a random search. There
is no single optimization algorithm that performs equally well on all possi-
ble problems, in other words, there is no such thing as a general-purpose
universal optimization strategy.
These statements are derived from the No-Free-Lunch theorem of op-
timization. Another way of stating this theorem is that “there can be
no search algorithm that outperforms all others on all problems” [Ho and
Pepyne, 2002]. The essence of the theorem is not that all algorithms are
equally good, but rather that an algorithm cannot be expected to perform
better than any other if you do not take into account the nature of the
problem you are trying to solve. Although this theorem may seem of little
practical use, it provides a very clear warning: If you just choose an algo-
rithm blindly (or based on the wrong premises), chances are it will perform
even worse than random search.
When comparing optimization methods based on their performance on a
specific example problem, it is important to be careful. Such a performance
comparison cannot always be extrapolated to other kinds of problems. No-
free-lunch tells us that such a comparison can only provide certainty if the
problem you are trying to solve is similar to the example problem [Ho and
Pepyne, 2002].
A choice in which the nature of the problem is of obvious importance
is that between gradient based methods and non-gradient based methods.
Remember that even the best non-gradient based methods are one or two
orders of magnitude more expensive, computationally, than gradient based
methods. Therefore, if gradient based methods can be used, use them.
Information available about a problem will usually lead to a number
of methods that can be used. In many cases the only way to know which
specific optimization method is best for a given problem is by trial and
error. A useful discussion on selection of optimization methods is provided
by [Keane and Nair, 2005] (Chapter 16). In most cases, selection of an
optimization strategy is an optimization problem in itself [Miettinen, 1999].
This becomes even more apparent when looking at problems with mul-
tiple objectives.

2.3 Optimization for Multiple Objectives
2.3.1 Why Multiobjective Optimization is More Difficult than
Single Objective Optimization
In the previous section we dealt with single objective optimization. How-
ever, in real-life engineering problems there are often multiple aspects that
influence the desirability of the final solution. For example, an aircraft needs
to be lightweight, but it also needs to be as cheap as possible. Often (not
always!) a lighter structure will be more expensive to manufacture and to
maintain. In those cases lightness and low cost are conflicting requirements.
If an optimization problem consists of multiple objectives, it is called
(not surprisingly) a multiobjective optimization (MOO) problem. In mul-
tiobjective optimization, we aim to minimize all the objective functions si-
multaneously. The reason this is more complicated than single objective
optimization is that the objectives are often conflicting, as described in the
example above.
If there were no conflicts between objectives, then every objective could
be optimized independently and there would be no need for special multiob-
jective methods. Hence, in this text, a multiobjective optimization problem
has conflicting objectives by definition (although not all objectives need to
be conflicting).
Because of the conflicting nature of multiobjective optimization prob-
lems, no single solution exists that is optimal with respect to every objective
function [Miettinen, 1999]. This characteristic of MOO problems is reflected
in the concept of Pareto optimality, which is discussed in Section 2.3.3. But
first let’s have a look at a general representation of the multiobjective opti-
mization problem.

2.3.2 The Multiobjective Optimization Problem


The multiobjective optimization problem can be stated as follows [Mietti-
nen, 1999], similar to the single objective problem:

The nonlinear constrained multiobjective optimization problem


Minimize:
    F(X)    (2.16)
Subject to:
    X ∈ S    (2.17)

The design vectors X belong to the set S which represents the feasible re-
gion as defined by the constraints. The constraints are not explicitly defined
here, in order to keep things simple. Notice that F is a vector of k objective
functions in the objective space.

Multiobjective optimization deals with the objective space rather than
the design space, because it is much easier to compare objective values than
to compare design vectors. This will become clear in the following section
on Pareto optimality. Furthermore, the objective space is usually of lower
dimension than the design space (i.e. k < n). The image of the feasible
region, Z = F(S), is a subset of the objective space and is called the feasible
objective region.
MOO problems can be linear (MOLP) or nonlinear (MONLP), similar to
single objective problems. A multiobjective optimization problem is convex
if all objective functions and the feasible region (the constraint functions)
are convex [Miettinen, 1999], just like in single objective optimization.

2.3.3 Pareto Optimality


Due to the fact that at least some of the objectives are conflicting, no single
optimum solution can exist. Instead multiobjective problems are associated
with a different notion of optimality: Pareto optimality.
An objective vector is Pareto optimal 9 if none of the components (i.e.
individual objective values) can be improved without deterioration to at
least one of the other components [Miettinen, 1999].
If an objective vector is Pareto optimal then the corresponding design
vector is also Pareto optimal. Usually many Pareto optimal vectors exist
(infinitely many), together forming a Pareto optimal set. The vectors in the
Pareto optimal set define a Pareto front. Perhaps an example can clarify
the concept of Pareto optimality.
Figure 2.18 shows several designs in the feasible objective space of a
space-telescope design problem in which lifecycle cost and performance are
to be optimized. Obviously the best solution would be in the lower right-
hand side of the figure where high performance is combined with low cost.
The figure shows that designs 1, 2, and 3 are closest to the lower right-hand
side, whereas designs 4 and 5 are farthest away. We say that designs 4 and 5
are suboptimal because they are dominated by 1, 2, and 3 (i.e. some of the
components of the objective vector can still be improved without deteriora-
tion to the other components). However, there is no obvious ‘best’ solution
among the optimal designs: Design 3 is best in terms of performance, but
at high lifecycle cost, whereas design 2 is cheaper, but at the expense of
performance. The same is true for design 1. Apparently, if one objective
value is improved the other will deteriorate. Therefore designs 1, 2 and 3
are part of the Pareto-optimal set.
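The dominance test that underlies this reasoning is straightforward to code. The
sketch below assumes all objectives are to be minimized (so the telescope example
would use lifecycle cost and the negative of performance) and uses made-up numbers
for five candidate designs:

    import numpy as np

    def pareto_filter(F):
        """Return a boolean mask of the non-dominated rows of F.

        Each row of F is an objective vector; all objectives are minimized.
        A point is dominated if another point is no worse in every objective
        and strictly better in at least one."""
        F = np.asarray(F, dtype=float)
        n = len(F)
        keep = np.ones(n, dtype=bool)
        for i in range(n):
            for j in range(n):
                if i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                    keep[i] = False
                    break
        return keep

    # Made-up (cost, -performance) pairs for five candidate designs.
    F = np.array([[2.0, -6.0],    # design 1
                  [1.5, -4.0],    # design 2
                  [3.0, -8.0],    # design 3
                  [2.5, -3.5],    # design 4 (dominated by design 2)
                  [3.5, -5.0]])   # design 5 (dominated by designs 1 and 3)
    print(pareto_filter(F))       # [ True  True  True False False]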
All Pareto optimal solutions are equally good in a mathematical sense.
Nevertheless, a single design will have to be selected. Apparently additional
information is needed which is not contained in the objectives. This is why
multiobjective optimization problems require the intervention of a decision
maker. The decision maker will be discussed in the next section, but first
we will establish some conditions for Pareto optimal solutions to exist.

9. Also called Edgeworth-Pareto optimal, non-inferior, non-dominated, or efficient.

Figure 2.18: Example of a space telescope design comparison. Source: [Jilla, 2002]
The Kuhn-Tucker conditions for optimality can be extended to form
necessary conditions for Pareto optimality. This is achieved with the help of
another set of multipliers µ [Miettinen, 1999] which allow the summation of
all the individual quasi-objectives to form a new single objective function.
The method of converting multiple objectives into a single objective function
(scalarizing function), or a family of independent single objective functions,
is called scalarization.
A design vector X∗ is Pareto optimal if there exist vectors of multipliers
λ and µ, with (λ, µ) ≠ (0, 0) and λ ≠ 0, so that the following conditions
are met:

Kuhn-Tucker necessary conditions for Pareto optimality (using only inequal-
ity constraints)

1. The design vector X∗ satisfies all the constraints:

   X∗ is feasible

2. If an inequality constraint is inactive (i.e. gj(X∗) < 0), the corre-
sponding Lagrange multiplier must be zero:

   λj gj(X∗) = 0   for j = 1, 2, · · · , m

3. The gradient of the scalarizing function must vanish in X∗:

   ∑_{i=1}^{k} µi ∇Fi(X∗) + ∑_{j=1}^{m} λj ∇gj(X∗) = 0    (2.18)

This statement of the Kuhn-Tucker conditions is not entirely complete, in


a strict mathematical sense, but it serves its purpose of demonstrating the
similarity with the single objective optimality conditions. Furthermore, note
that the conditions again become sufficient if the objectives and constraints
are convex.
In a more practical sense, tests are available to determine whether spe-
cific multiobjective optimization methods yield Pareto optimal solutions.
Please refer to the description of a specific method for more information on
this subject.
Once a Pareto optimal set of solutions is obtained, the problem of choos-
ing between equally good solutions still remains, thus a decision is required.

2.3.4 Decision Making


A move from one Pareto optimal solution to another, by definition, involves
a trade-off between objectives. This is because every point in the Pareto op-
timal set is equally good in a mathematical sense. In order to select a single
design from this set, a compromise needs to be made. Therefore information
is required that is not contained in the objectives10 . This information can
be provided by a decision maker.
A decision maker is a person (or a group of people) who is supposed to
have better insight into the problem and who can express preference relations
between different solutions [Miettinen, 1999]. In some MOO methods a
decision maker is asked to define a reference point, which is a vector of
aspiration levels, i.e. values for each objective function in the objective
vector that he thinks are desirable (or satisfactory). A design solution that
meets all the aspiration levels is called a satisficing solution.

10. Sometimes trade-offs can be avoided by reformulating the problem. This is a way of
including the required information in the objectives.
Alternatively, a decision maker is often assumed to act on the basis of
an underlying (implicit) function called a value function, which represents
his/her preferences. If this value function could be expressed mathemati-
cally it would provide an ideal selection tool because it would reduce the
multiobjective problem to a single objective problem. Unfortunately it is
rarely possible to express the value function in mathematical terms, due to
its subjective nature [Hazelrigg, 1997]. But even if this were possible the
function would probably be too complex to handle. Therefore the value
function (if it exists) is often assumed to be known only implicitly.
As mentioned, trade-offs can be used as a tool in the decision making
process. A trade-off reflects the ratio of change in objective function values
when moving from one design vector to the other. The (partial) trade-off
between objective functions Fi and Fj for a move from design vector X1 to
X2 is defined by [Miettinen, 1999]:

Λij = (Fi(X1) − Fi(X2)) / (Fj(X1) − Fj(X2))    (2.19)

A related concept is that of the trade-off rate:

λij = ∂Fi(X∗) / ∂Fj    (2.20)

Whereas trade-offs or trade-off rates provide a mathematical approach,


it is also possible to consider the opinions of the decision maker in subjective
terms using marginal rates of substitution. If two solutions are equally desir-
able to a decision maker, they are said to be on the same indifference curve.
A marginal rate of substitution represents the increase in objective Fj that
a decision maker is willing to tolerate for a certain decrease in objective Fi
while the preferences of the two solutions remain the same. The marginal
rate of substitution is also known as the indifference trade-off.
Trade-off rates and marginal rates of substitution are used in some MOO
methods. In order to clarify the general multiobjective optimization process,
we will now briefly discuss a selection of MOO methods.

2.3.5 Brief Overview of MOO Methods


The simplest methods for multiobjective optimization do not take into ac-
count the opinion of the decision maker during the optimization process.
For example in compromise programming, the analyst (the one running the
optimization) chooses some ideal reference point in objective space, and the
distance between this reference point and the feasible objective region is
minimized. Depending on the way this is done, the solution can be guar-
anteed to be Pareto optimal. After the optimization, the solution is offered
to the decision maker, who can then decide whether to keep it or discard
it. Basic methods which rely more on the decision maker are the weighting
method and the ε-constraint method.
The weighting method represents a straightforward way of scalarizing the
multiobjective problem. The objective functions are combined in a weighted
sum which is then minimized using a single objective technique. The weight-
ing coefficients corresponding to the individual objectives should reflect the
value function of the decision maker. The solution to a weighting problem
can be shown to be Pareto optimal if the weighting coefficients are all pos-
itive or if the solution is unique. Some authors advise against the use of
the weighting method because weight allocation heavily influences the final
solution, which leads to unpredictable results [Vanderplaats, 2007].
In the ε-constraint method, one of the objectives is selected to be opti-
mized, and the others are converted into constraints by setting upper bounds
(εj ) to each of them. By changing the objective to be optimized and by
varying the upper bounds, theoretically, all Pareto optimal solutions can
be obtained. Hybrid methods, combining the weighting method with the
ε-constraint method also exist.
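Both basic methods are easy to express with an off-the-shelf single objective
optimizer. The sketch below applies them to a made-up bi-objective problem in one
design variable; the objective functions, the weights, and the bound ε are
arbitrary choices made only for illustration.

    import numpy as np
    from scipy.optimize import minimize

    # Made-up bi-objective problem in one design variable x in [0, 1].
    f1 = lambda x: float((x[0] - 0.2) ** 2)      # first objective
    f2 = lambda x: float((x[0] - 0.9) ** 2)      # second, conflicting objective

    # Weighting method: minimize a positive weighted sum of the objectives.
    w1, w2 = 0.7, 0.3
    res_w = minimize(lambda x: w1 * f1(x) + w2 * f2(x), x0=[0.5], bounds=[(0, 1)])

    # Epsilon-constraint method: minimize f1 subject to f2 <= eps.
    eps = 0.2
    con = {"type": "ineq", "fun": lambda x: eps - f2(x)}
    res_e = minimize(f1, x0=[0.5], bounds=[(0, 1)], constraints=[con],
                     method="SLSQP")

    print(res_w.x, res_e.x)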
Lexicographic ordering and Goal programming are examples of meth-
ods in which the decision maker is asked to express his/her expectations
beforehand.
In Lexicographic ordering, the decision maker has to order the individual
objectives according to their absolute importance. The most important ob-
jective is then optimized, subject to the original constraints. If this problem
has a unique solution, this is selected as the final solution to the problem.
If there is no unique solution, the second most important objective is op-
timized, subject to the original constraints and an extra constraint which
ensures that the first objective remains at its optimum value. If this prob-
lem has a unique solution this is selected as the final solution, if not, the
process is continued for the next most important objective, and so on and
so forth. This is a robust and simple method that always yields a Pareto
optimal solution. Lexicographic ordering may be used as a part of the goal
programming method.
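A sketch of the procedure for two objectives follows. The objective functions and
the tolerance used to keep the first objective at its optimum are made up; in the
pure method the second step would only be carried out if the first problem had no
unique solution, but here it is done regardless in order to show the mechanics.

    from scipy.optimize import minimize

    # Made-up problem with two objectives of strictly decreasing importance.
    f1 = lambda x: float(x[0] ** 2 + x[1] ** 2)           # most important
    f2 = lambda x: float((x[0] - 1.0) ** 2 + x[1] ** 2)   # second in importance
    bounds = [(-2, 2), (-2, 2)]

    # Step 1: optimize the most important objective on its own.
    r1 = minimize(f1, x0=[1.0, 1.0], bounds=bounds)

    # Step 2: optimize the next objective, subject to the original bounds plus
    # the requirement that f1 stays at its optimum value (within a tolerance).
    keep_f1 = {"type": "ineq", "fun": lambda x: r1.fun + 1e-6 - f1(x)}
    r2 = minimize(f2, x0=r1.x, bounds=bounds, constraints=[keep_f1],
                  method="SLSQP")

    print(r1.x, r2.x)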
In the generalized goal programming method a decision maker specifies
aspiration levels for the objectives, and any deviations from these aspiration
levels are minimized. An objective function combined with an aspiration
level forms a (flexible) goal. This can be interpreted as a constraint which
is not strictly imposed, but for which the constraint violation is minimized.
The original constraints, the ones that define the feasible region, are referred
to as rigid goals in this context. The goal deviations are represented by
deviational variables. As there is a deviational variable for each objective,
the introduction of goals only leads to a reformulation of the multiobjective
problem: Instead of having to minimize the objectives, we now have to
minimize the goal deviations. This new multiobjective problem can be solved
using techniques such as the weighting method or Lexicographic ordering.
Goal programming is widely used in practical applications. The selection
of aspiration levels determines whether the resulting solutions are Pareto
optimal.
Interactive methods, in which the decision maker is intimately involved
in the optimization process, are the most developed methods and usually
yield the best results. Numerous interactive methods exist, based on various
approaches, but their detailed description is beyond the scope of this text.
One of the most well-known interactive methods is the Geoffrion-Dyer-
Feinberg (GDF) method [Miettinen, 1999]. This method is based on max-
imization of the implicitly known value function. At each iteration in the
GDF approach, a local approximation to the value function is generated
and maximized. The approximation to the (implicit!) value function is
found by using the marginal rates of substitution specified by the decision
maker to approximate the gradient of the value function. The approximated
value function is then maximized using a gradient based method. The big
drawback of this method is the difficulty the decision maker may have in
determining the marginal rates of substitution in each iteration (because the
value function is only implicitly known).
Now that the basics of single objective optimization and multiobjective
optimization have been covered, we can proceed with the subject of multi-
disciplinary design optimization.

2.4 Multidisciplinary Design Optimization


2.4.1 Multidisciplinary Design combined with Optimization
Multidisciplinary Design Optimization (MDO) is related to Systems Engi-
neering [van Hinte and van Tooren, 2008] as much as it is to optimization,
or perhaps even more. As a result, it is not as clearly defined as the opti-
mization problems that were discussed in the previous sections. Instead of
trying to give a comprehensive description of MDO—if such a thing is even
possible—we will focus on some of the most prominent aspects of MDO that
appear in literature. So let’s start by examining some definitions of MDO.
According to [Kroo, 1997], the goal of MDO is ‘to provide a more formal-
ized method for complex system design than is found in traditional design
approaches’.
Two other, very similar, definitions are due to [Alexandrov, 2005], who
refers to MDO as ‘an area of research concerned with developing systematic
approaches to the design of complex engineering systems governed by inter-
acting physical phenomena’, and [de Weck and Willcox, 2004], who define
MDO as ‘a methodology for the design of complex engineering systems and
subsystems that coherently exploits the synergism of mutually interacting
phenomena’.
Both definitions indicate a systematic approach (i.e. methodology) to the
design of complex engineering systems with mutually interacting phenomena.
The second definition specifically refers to the synergetic effects 11 that may
result from the interactions between subsystems.
Traditional multidisciplinary design (MD) relies heavily on engineering
experience and heuristic procedures (common sense). In most cases however,
the multidisciplinary design problem is of such scale and complexity that it
is very hard even to meet the design requirements, let alone actively exploit
the synergetic potential that is present in the complex system.
In order to unlock this synergetic potential, numerical optimization tech-
niques can be used. MDO in general tries to formalize the multidisciplinary
design problem so as to make it amenable to solution by numerical opti-
mization techniques. This formalization of the MD process requires careful
analysis and knowledge of the technical aspects as well as the organizational
aspects of the system. For systems such as aircraft, with their immense
complexity, this is quite a challenge indeed, as is clearly explained by [Kroo,
1997].
There are many examples of applications of MDO in industry, but gen-
erally these concern only a small part of the total design problem (e.g.
[Gilmore et al., 2002], [de Weck et al., 2007]). MDO has not yet reached a
state of maturity that would allow it to be used to address the entire design
process. Moreover, it should be noted that MDO is not intended to provide
fully automatic design capability. MDO is a tool that can help guide the
design process in the right direction by allowing the engineers and managers
to focus on the creative aspects of design.
A key aspect of MDO is the decentralization of the multidisciplinary
design process, which decreases the reliance on a single leader to drive the
design. In order to achieve such decentralization, the complex design process
needs to be partitioned or decomposed into smaller subprocesses that can
operate separately as much as possible. Once a proper decomposition has
been found, some form of coordination needs to be imposed in order to guide
the MD process towards its goal.
This may sound pretty vague, so let us try to clarify things by having a
closer look at the subjects of complexity, decomposition, and coordination.

2.4.2 Complex Systems and Coupling


First, let us see what exactly defines a complex system. Complex systems
are characterized by the fact that it is difficult, if not impossible, for one
person to understand all the details of its subsystems and all the interactions
between those subsystems. This complexity may be a result of large scale
(e.g. many subsystems, very large numbers of variables), but also of inter-
actions between subsystems [Allison, 2004]. A system that exhibits such
interactions is said to be coupled.

11. The concept of synergy is related to holism and reductionism. Some interesting views
on these subjects are found in the Stanford Encyclopedia of Philosophy [Healey, 2008].
In the context of computing science, coupling between subsystems (also
modules, disciplines, processes, and functions) is defined as the measure of
strength-of-association established by a connection from one subsystem to
another [Stevens et al., 1974]. This coupling may be loose (also weak and
low ) or tight (also strong and high). Loosely coupled systems have minimal
interdependence between individual subsystems.
The strength of a coupling can depend on various aspects of the in-
formation that is interchanged between subsystems. For example, a large
amount of information flowing from one subsystem to the other may con-
stitute a strong coupling (quantity), but a single piece of very important
information may have a similar effect (quality). Conversely, if a subsystem
is only slightly influenced by a large change in the information it obtains
from another subsystem (sensitivity), this coupling may be characterized as
weak.
In addition to strength, a coupling also has direction. The direction
of a coupling is indicated using the terms feed-forward and feedback. The
presence of feed-forward between two subsystems implies that they have
to operate sequentially, i.e. one subsystem has to finish before the other
can start. The presence of both feed-forward and feedback implies that
the subsystems have to operate iteratively, e.g. subsystem A depends on
subsystem B for input, but B also depends on A for input, thus B has
to wait for A to finish, but when B is finished A has to start again, and
so on and so forth, until some measure of convergence is satisfied. Iterative
processes are especially difficult to cope with, and they have to be initialized
using some kind of initial guess.
It is possible to characterize systems according to the presence of feed-
forward and feedback relations. In our context, a coupled system has both
types of relations, whereas a decoupled system contains only feed-forward
relations. This type of system can operate much faster than a coupled one,
due to the absence of iterative loops. The subsystems can operate separately
but not simultaneously, because of the feed-forward relations [Kusiak and
Wang, 1993]. An uncoupled system is one that has neither feed-forward nor
feedback. The subsystems are fully independent and can operate separately
and simultaneously. This represents the ideal case, because it leads to the
greatest reduction in throughput time.
Coupled systems may also be classified according to their coupling ar-
chitecture. In a non-hierarchic system there are no restrictions on the way
the subsystems are coupled, whereas in an hierarchic system there exists
a natural information hierarchy (information is transferred between ‘par-
ent’ subsystems and ‘children’ subsystems, but there is no coupling between
children subsystems) [Wagner and Papalambros, 1993]. In reality, complex
systems are usually hybrid systems, as may be clarified by Figure 2.19.

Figure 2.19: A non-hierarchic system, a hierarchic system, and a hybrid system.
System hierarchy can often be exploited in order to obtain a more effi-
cient (design) process. For example, the two children subsystems in figure
2.19 can operate in parallel, which can save time. This is not possible for
the subsystems in the non-hierarchic system.
By analyzing the coupling in a complex system, it may be possible to
find a way to rearrange subsystems so as to remove feedback and to achieve
a more hierarchic system structure. Moreover, coupling relations may be
identified that, when removed, allow for a considerable simplification of the
system structure. This is where system analysis and decomposition tools
come into play.

2.4.3 System Analysis and Decomposition


In order to get a firm grip on the design task, a complex system may be
subdivided into smaller subsystems, which in turn may be subdivided again,
and so on and so forth until a manageable level is reached. Design teams
can then be assigned to the resulting subsystem design tasks.
Once the design tasks are known, it becomes possible to rearrange them
so as to find a way to reduce feedback. This is called partitioning. After re-
arranging subsystems, it is often possible to simplify the system even further
by tearing (i.e. breaking) relations as described by [Steward, 1965].
A relation between two subsystems implies that output from one sub-
system is used directly as input for the other subsystem. By tearing the
relation, this direct link is broken. Instead, the output from the one subsys-
tem and the input to the other subsystem are now treated as if they were
independent. However, in order to maintain system consistency, equality of
output and input has to be enforced in some other way (indirectly). This is
discussed in the next section.
The removal of subsystem interdependence in order to simplify the prob-
lem, by means of partitioning and tearing, is referred to as decomposition

[Wagner and Papalambros, 1993], also [Bloebaum, 1991], [Kusiak and Wang,
1993].

Figure 2.20: Two examples of partial subdivisions (partitions) of an aircraft: (a) object-
based and (b) aspect-based. Again, combinations of the two occur as well (source:
[Tosserams, 2008]).
There are different ways to subdivide a system. For example, a system
can be divided into subsystems, modules, and components (object-based), or
into disciplines according to physical aspects (aspect-based), as illustrated
in figure 2.20 [Tosserams, 2008]. Often combinations of the two are used.
For example, in the aircraft industry, design teams can be divided into
disciplinary groups (e.g. structures, aerodynamics, etc.), which are then
subdivided into subsystem groups (e.g. wing structure, fuselage structure)
or component groups. It is also possible to do it the other way round, first
dividing the teams according to subsystems (e.g. wing, fuselage, etc.), and
then dividing each subsystem according to discipline (e.g. wing aerodynam-
ics, wing structure).
The decomposition process can be facilitated using tools from the won-
derful world of systems engineering. Most importantly, the technical system
(or corresponding design process) can be represented in a compact matrix
form that makes it amenable to analytical (numerical) operations.
There are different kinds of these matrices, but the most well known
is the Design Structure Matrix (DSM)12 [Steward, 1981] or N 2 diagram
(although the latter term is also used to indicate a specific kind of DSM
[Browning, 2001]).
The basic DSM is a square binary matrix (i.e. entries can be true or
false), with the subsystems or design tasks arranged on the diagonal, as
depicted in figure 2.21. The order of execution of the subsystems or tasks
is reflected by their position on the diagonal, starting top-left and ending
bottom-right. A true entry above the diagonal indicates a feedback relation,
whereas a true entry below the diagonal indicates a feed-forward relation,
although some authors prefer it the other way round.

12. In fact, [Steward, 1981] starts out with a precedence matrix, then partitions this
matrix into block triangular form, after which he determines where the feedback relations
need to be torn. He refers to the decomposed matrix as the Design Structure Matrix.

Figure 2.21: A small example of a (binary) Design Structure Matrix. Subsystem 4
provides input to subsystems 2 and 8, and it receives input from subsystems 5, 6,
and 8.
Many refinements of this basic form of the DSM are possible. For ex-
ample, numerical entries can be used (instead of binary ones) in order to
take into account the strength of couplings. Various measures are available
here, such as the number of variables that constitute the coupling. It is also
possible to use so called repeat probabilities, which represent the likelihood
of having to repeat a task if it proceeds without the required input, as de-
scribed by [Gebala and Eppinger, 1991]. But to keep things simple, we will
only consider the basic form in our discussion.
A DSM does more than simply give an overview of the relations between
subsystems. It also enables the use of partitioning algorithms [Steward,
1965] that rearrange the subsystems in order to remove as many feedback
relations as possible. This is achieved by simply rearranging the positions
of the subsystems on the diagonal while maintaining the existing relations.
For example, subsystem 7 in the figure provides feedback to subsystem 6,
but 7 does not receive any inputs from other subsystems. Thus by reversing
the positions on the diagonal of subsystems 6 and 7, this feedback can be
transformed into a feed-forward relation. This example is rather trivial, but
it illustrates the basic idea of rearranging subsystems.
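The bookkeeping involved can be shown with a small sketch. The matrix below is
hypothetical: it encodes only the relations mentioned in the caption of figure 2.21
and the subsystem 6/7 example above (the convention assumed here is that entry
(i, j) is true when subsystem i receives input from subsystem j, so entries above
the diagonal are feedback). Counting the above-diagonal entries before and after
swapping subsystems 6 and 7 shows one feedback relation turned into feed-forward.

    import numpy as np

    def feedback_count(dsm, order):
        """Number of feedback relations (entries above the diagonal) for a
        given execution order; dsm[i, j] = 1 means task i needs input from j."""
        m = dsm[np.ix_(order, order)]          # reorder rows and columns together
        return int(np.triu(m, k=1).sum())

    # Hypothetical 8-subsystem DSM (indices 0..7 stand for S1..S8).
    dsm = np.zeros((8, 8), dtype=int)
    dsm[1, 3] = dsm[7, 3] = 1                  # S2 and S8 receive input from S4
    dsm[3, 4] = dsm[3, 5] = dsm[3, 7] = 1      # S4 receives from S5, S6 and S8
    dsm[5, 6] = 1                              # S6 receives from S7 (feedback)

    original = list(range(8))
    swapped = [0, 1, 2, 3, 4, 6, 5, 7]         # reverse the positions of S6 and S7
    print(feedback_count(dsm, original), feedback_count(dsm, swapped))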
The goal of the partitioning algorithm is to rearrange the matrix into
block triangular form. A block contains the smallest possible number of
subsystems that are mutually dependent (they depend on each other via
feedback loops) and hence cannot be solved sequentially without making
assumptions.
Ideally, there are no feedback relations between these blocks, but there
can be feed-forward relations (and there usually are many). Thus, a block
triangular DSM is empty above the diagonal of blocks but can have non-
empty entries below the diagonal of blocks. If the blocks are treated as single
subsystems, the DSM becomes a lower triangular matrix, which explains the
term block triangular [Warfield, 1973]. A lower triangular DSM represents
a decoupled system (i.e. without feedback).
The block triangular DSM is decoupled with respect to the blocks, but
within blocks feedback still occurs. However, if some of these feedback rela-
tions can be broken, the system may be partitioned further. This breaking
or tearing of relations is not a trivial task. On the contrary, it involves ex-
tensive knowledge of technical aspects of the system that is to be designed,
as well as knowledge of the organization that has to produce the design.
It is important to understand that real life design problems are often so
complex that it becomes very hard, if not impossible, to capture the entire
process in a nicely structured format such as the DSM. Often it is only
possible to represent parts of the process this way, and often this is done
once the process is already in place.
An example of a DSM for a real design problem is depicted in figure
2.22. This figure shows the partitioned system with subsystem blocks on the
diagonal. The figure clearly shows the tightly coupled nature of the blocks,
as well as some remaining high-level feedback relations between blocks. By
tearing some of these relations, the system could (theoretically) be decoupled
further.
As mentioned before, tearing a coupling relation (feed-forward and/or
feedback) implies that some form of coordination becomes necessary in order
to ensure that the resulting system remains consistent. Tearing a relation
may allow subsystems to operate sequentially or in parallel, but if too many
relations are broken the coordination task may become too difficult.
In the next section we will have a closer look at various coordination
strategies.

2.4.4 Coordination and Optimization


For our discussion, we will adopt the following definition by [Malone and
Crowston, 1994] (p.90): “Coordination is managing dependencies between
activities.” Coordination is required in any organization in order to make
sure that the different parts of the organization work together towards a
common goal.
This is no different in a complex design process. In any design process,
be it based on an existing organizational structure or on formal decompo-
sition, some form of coordination is required in order to ensure consistency
of the resulting design. This coordination is necessary on the system level,
i.e. coordination between disciplines, but also on the subsystem levels, i.e.

coordination within disciplines. Figure 2.23 depicts the general idea of co-
ordination.

Figure 2.22: Example of a DSM, in block triangular form, for a semiconductor
design problem. This clearly illustrates the complexity of real life design problems.
Note that not all feedback can be removed from the design process. (source: [Ep-
pinger et al., 1994]).

Figure 2.23: Coordination of data flow within a partitioned system (source: [Alli-
son, 2008]).

Note that some form of coordination is always required in the presence


of feedback loops. This is because a feedback loop needs to be broken in
order to initialize the iteration process. This is done by specifying an initial
design (an initial guess).
In an MDO architecture, coordination can either be taken care of by a
dedicated coordinator, or it can be taken care of by an optimizer. In fact, the
distinction between a coordinator and an optimizer is often blurred. A co-
ordinator, in the numerical sense, may be interpreted as a kind of optimizer
for constrained optimization without an objective (constraint solver).
It is possible to distinguish between different MDO architectures by an-
swering the following two questions:

1. Is decision authority centralized or distributed?

2. Is the optimizer also responsible for coordination?

The first question refers to the distinction between single-level archi-


tectures and multilevel architectures. A single-level architecture employs
only a system optimizer, hence decision authority is centralized. A multi-
level architecture employs subsystem optimizers in addition to the system
optimizer, so decision authority is distributed.
The second question refers to the distinction between analysis and eval-
uation. Analysis implies that a dedicated coordinator is used to maintain
(sub)system consistency at every optimization step. Evaluation implies that
this dedicated coordinator is not present, so that the optimizer needs to take
on the added responsibility of ensuring (sub)system consistency. In this case,
(sub)system consistency can only be ensured at an optimum solution, as will
be explained later.
But before going any further, it might be a good idea to discuss this
terminology in more detail.

Assume that the multidisciplinary design process and its disciplinary pro-
cesses incorporate models that require iterative solution procedures. The
behavior of these models is described by means of state equations. The in-
ternal state of such a model, e.g. the deformation of a loaded structure, is
represented by state variables. Based on the state variables, residuals can
be computed. These residuals provide a measure for convergence.
A process is said to have converged if a consistent solution has been
found, which implies that the residuals must vanish (i.e. become equal to—
or close enough to—zero). Note that the internal state of a system is not
only a function of the state variables, but also of the input variables and
internal parameters. Convergence is taken care of by the coordinator.
We define a coordinator to be a subsystem that manages coupling re-
lations in order to ensure system consistency. A multidisciplinary coordi-
nator (MDC) takes care of interdisciplinary couplings (between disciplines),
and a disciplinary coordinator (DC) takes care of intradisciplinary couplings
(within disciplines). Note that a coordination block may have direct feed-
through. Furthermore, initialization is also considered a task that is part of
coordination.
In our discussion, interdisciplinary couplings are represented by system
state variables (usually called coupling variables in literature), and intradis-
ciplinary couplings are represented by discipline state variables (usually just
called state variables in literature).
By treating coupling variables as state variables, we can construct a
consistent description of a multidisciplinary design process for the purpose
of clarifying the differences between various MDO architectures.
Let us now define an evaluator to be a process that performs a sin-
gle calculation of the residuals, the (new) state, and the output, based on
internal parameters, input variables, and the (old) state variables. This def-
inition is valid on the system level (multidisciplinary evaluator, MDE) as
well as on the discipline level (disciplinary evaluator, DE), and possibly on
lower levels as well. Depending on the problem, the output can consist of
objective values, constraint values, gradient information, coupling function
values, internal state, residuals, etc.
The combination of an evaluator and a coordinator in an iterative loop
is referred to as an analyzer. The coordinator makes sure that the system
is driven to convergence in an iterative process such as for example fixed
point iteration (FPI, described in appendix ??). It uses the residuals and
state calculated by the evaluator to guide the convergence process. Since
the analyzer delivers a consistent solution, there is no need to provide the
residuals and state as output. The definition of an analyzer also holds on
the system level (multidisciplinary analyzer, MDA) and on the discipline
level (disciplinary analyzer DA). In general, we may say that evaluation is
cheap, but analysis is expensive because it involves many evaluations.
An analysis run is based on a given design, which is determined by the
design variables—in this context we consider design to be the process of
defining the properties of a system, whereas analysis is considered to be the
process of determining the behavior of that system [Vanderplaats, 2007]. In
MDO, the task of determining the values of the design variables (decision
authority), based on objective function values and constraint values, falls
in the hands of the optimizer. Again we can distinguish between a multi-
disciplinary optimizer (MO) and a disciplinary optimizer (DO). As we will
see later, it is also possible for an optimizer to take over the task of the
coordinator.
Summarizing, we now have the following basic building blocks:

• the evaluator, which determines the residuals

• the coordinator, which controls the state variables in order to drive
  the residuals to zero, thus ensuring consistency

• the analyzer, which can be seen as a combination of an evaluator and
  a coordinator

• the optimizer, which controls the design variables and can also take
  over the coordination task

With the help of these building blocks, it is possible to describe some basic
strategies for MDO, starting with single-level strategies and moving on to
multilevel strategies.

2.4.5 Single-level Optimization Strategies


Different MDO strategies distinguish themselves by the distribution of the
design tasks (determining the values of design variables) and the coordina-
tion tasks (determining the values of state variables) throughout the system
architecture.
The most basic MDO strategies employ a single system optimizer (mul-
tidisciplinary optimizer). These are called single-level strategies. More ad-
vanced MDO strategies employ disciplinary optimizers in addition to the
system optimizer. These are called multilevel strategies and are discussed
in the next section.
In all single-level strategies, the design task (decision authority) is in the
hands of the optimizer, that is, the optimizer controls the design variables.
The system design variables, collected in xS , are common to all disci-
plines (or at least, each system design variable is used in more than one
discipline). The discipline design variables xi are local to discipline i. These
are all disjoint sets of variables (i.e. they have no common elements).
Coordinators are responsible for the state variables. System state vari-
ables sS represent the coupling between disciplines, and discipline state vari-
ables si represent internal coupling in discipline i.

Different single-level MDO strategies have different ways of coordinating
the state variables. By treating state variables as design variables, thus by
letting the optimizer control the state variables in addition to the design
variables, a compromise can be found between optimization problem com-
plexity and analysis complexity. This leads to three basic strategies: MDF,
IDF, and AAO. These are now discussed in more detail.
For clarity we will assume that all analyses are performed using some
form of fixed-point iteration. Based on this assumption, we can define a
residual as a measure for the change in state between two subsequent iter-
ations: r = |s − s∗|. This is a rather intuitive choice, but many other kinds
of residuals can be used. Furthermore, we assume that all the constraints
are posed as inequality constraints (g ≤ 0).

MDF
Perhaps the most straightforward implementation of MDO is the addition of
a system optimizer to an existing multidisciplinary analysis process (MDA).
The system optimizer controls the design variables (xS and all xi ), and
at every optimization step a fully coupled multidisciplinary analysis is per-
formed. Objective values, constraint values, and gradients are calculated as
part of the multidisciplinary analysis.
In this approach, disciplinary consistency and multidisciplinary consis-
tency are maintained throughout the optimization process (the optimizer
is not aware of the coupling relations within the system, these are man-
aged by separate coordinators). This method is often referred to as MDF
(MultiDiscipline Feasible, where the word ‘feasible’ refers to consistency)
[Cramer et al., 1994]. Other designations are All-In-One (AIO) [Kodiyalam
and Sobieszczanski-Sobieski, 2001] and Fully Integrated Optimization (FIO)
[Keane and Nair, 2005].
A schematic representation of a basic MDF architecture for two disci-
plines is depicted in figure 2.24. The figure shows a multidisciplinary analy-
sis process (MDA, enclosed by a dashed line) guided by a system optimizer
(SO). The MDA consists of two disciplinary analysis processes (DA 1 and
DA 2, also enclosed by dashed lines) that are coupled via a system coor-
dinator (SC). Each disciplinary analysis comprises a disciplinary evaluator
(DE) and a disciplinary coordinator (DC).
The system optimizer determines values for the system design variables
xS and the disciplinary design variables xi , based on some system objective
and constraints. For convenience we will assume that the system objective
is simply a combination of the disciplinary objectives fi and that the con-
straints to be satisfied are just the disciplinary constraints gi . Depending on
the type of optimization routine used, the optimizer may also require gradi-
ent information, but for the sake of clarity this is not taken into account in
our discussion.

Figure 2.24: The multidiscipline feasible (MDF) approach: full multidisciplinary
analysis guided by a system optimizer

Disciplinary evaluator i computes the disciplinary output state si , disciplinary
residual ri , objective function values fi , constraint values gi , and system output
state sij (‘computed in i as input for j’), based on the design variables (xS and
xi ), the system input state s∗S (= [s∗12 ; s∗21 ] in this case) and disciplinary input
state s∗i . It is important to note that the disciplinary evaluator treats the design
variables and the system input state as fixed parameters.
The disciplinary input state is determined by a disciplinary coordinator
based on the previous disciplinary output state and residual. The purpose
of the disciplinary coordinator is to drive the disciplinary residual to zero.
In that case, the disciplinary input state is equal (or close enough) to the
disciplinary output state: s∗i = si (the discipline has converged).
The combination of disciplinary evaluator and disciplinary coordinator
is referred to as a disciplinary analyzer. A disciplinary analyzer outputs
only the objective and constraint values, the system output state, and the
system residual, based on the design variables and the system input state. The
disciplinary state and residuals are internal to the analyzer. Together, the
two disciplinary analyzers comprise the system evaluator (not explicitly in-
dicated in the figure).
The system coordinator handles the interdisciplinary relations so as to
make sure that the multidisciplinary system remains consistent at every
optimization step. Thus, the system coordinator drives the system residuals
rij to zero. This implies that the system state does not change from one
iteration to the other, in other words, a fixed point is reached: s∗ij = sij for
all i, j. Note that the internal structure of the system coordinator may also
contain direct feed-through.
This MDF architecture is depicted in a slightly more abstract way in
the design structure matrix in figure 2.25. The processes are located on
the diagonal and the (most important) information flow between processes
is specified. Again, entries above the diagonal represent feedback, whereas
entries below the diagonal represent feed-forward. In other words, a process
provides the entries in its column and it receives the entries in its row.
For example: disciplinary coordinator 1 (DC 1) receives r1 and s1 , and it
provides s∗1 .

Figure 2.25: MDF

Following the example of previous sections, we can formulate the MDF strategy
as a standard optimization problem (note that this is only one of many possible
formulations, for a simplified architecture):

A multidiscipline feasible (MDF) optimization problem for a two-discipline system
Minimize:
f (f1 , f2 )
with respect to xS , x1 , x2
Subject to:
g1 ≤ 0
g2 ≤ 0
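
As an illustration, the sketch below implements this MDF formulation for two
made-up disciplines. The coupled relations, objectives and constraint are invented
for the example, and scipy's SLSQP routine is used as the system optimizer (note
that SLSQP expects inequality constraint functions to be non-negative, so g ≤ 0
is passed as −g ≥ 0):

from scipy.optimize import minimize

def mda(x):
    # Full multidisciplinary analysis: fixed point iteration over the coupling
    # variables s12 and s21 of two made-up disciplines until they stop changing.
    s12, s21 = 0.0, 0.0
    for _ in range(100):
        s12_new = x[0] + 0.4 * s21          # discipline 1 output for discipline 2
        s21_new = x[1] + 0.3 * s12          # discipline 2 output for discipline 1
        if abs(s12_new - s12) + abs(s21_new - s21) < 1e-12:
            break
        s12, s21 = s12_new, s21_new
    return s12, s21

def objective(x):
    # System objective f(f1, f2), evaluated on a consistent (converged) state.
    s12, s21 = mda(x)
    f1 = (x[0] - 1.0)**2 + s21**2
    f2 = (x[1] - 2.0)**2 + s12**2
    return f1 + f2

def constraint(x):
    # Made-up constraint g = s12 - 4 <= 0; SLSQP expects 'ineq' functions >= 0.
    s12, _ = mda(x)
    return 4.0 - s12

result = minimize(objective, x0=[0.0, 0.0], method='SLSQP',
                  constraints=[{'type': 'ineq', 'fun': constraint}])
print(result.x)

Every objective or constraint evaluation triggers a complete coupled analysis,
which is exactly the cost structure of MDF described above.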

IDF
If the multidisciplinary coordinator is removed from the multidisciplinary
analysis process (MDA), it essentially becomes a multidisciplinary evalua-
tion process (MDE). This implies that multidisciplinary consistency is no
longer maintained throughout the optimization process, even though disci-
plinary consistency is still maintained. In addition to the optimization task,
the system optimizer now becomes responsible for ensuring multidisciplinary
consistency.
This is achieved by treating the system input states as design variables.
Consistency is enforced by introducing an equality constraint for every sys-
tem state variable, stating that the system input states and corresponding
system output states are equal, or in other words, that the system residuals
vanish (rij = s∗ij − sij = 0 for all i, j). Thus, multidisciplinary consistency
is only attained when these equality constraints are satisfied.
This approach is usually referred to as IDF (Individual Discipline Feasi-
ble) [Cramer et al., 1994], because the individual disciplines are consistent
at every optimization step. The disciplines are uncoupled, allowing them to
operate separately and simultaneously. An alternative designation is Dis-
tributed Analysis Optimization (DAO) [Keane and Nair, 2005].

Figure 2.26: The individual discipline feasible (IDF) approach: full disciplinary
analysis guided and coordinated by a system optimizer

Figure 2.27: IDF

The IDF strategy can also be formulated as an optimization problem:

An individual discipline feasible (IDF) optimization problem for a two-
discipline system
Minimize:
f (f1 , f2 )
with respect to xS , x1 , x2 , s∗S (where s∗S = [s∗12 ; s∗21 ])
Subject to:
g1 ≤ 0
g2 ≤ 0
r12 = s∗12 − s12 = 0
r21 = s∗21 − s21 = 0
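
A corresponding IDF sketch (again with made-up disciplines and scipy's SLSQP
as the system optimizer) shows how the coupling targets become optimization
variables and how consistency is imposed through the residual equality constraints:

from scipy.optimize import minimize

def discipline1(x1, s21_target):
    # Discipline 1 is evaluated with the coupling *target* instead of the true
    # coupling, so it can run independently of discipline 2.
    return x1 + 0.4 * s21_target            # returns its output coupling s12

def discipline2(x2, s12_target):
    return x2 + 0.3 * s12_target            # returns its output coupling s21

def objective(v):
    x1, x2, s12_t, s21_t = v                # v = [x1, x2, s12*, s21*]
    f1 = (x1 - 1.0)**2 + s21_t**2
    f2 = (x2 - 2.0)**2 + s12_t**2
    return f1 + f2

def r12(v):                                 # r12 = s12* - s12 = 0
    x1, x2, s12_t, s21_t = v
    return s12_t - discipline1(x1, s21_t)

def r21(v):                                 # r21 = s21* - s21 = 0
    x1, x2, s12_t, s21_t = v
    return s21_t - discipline2(x2, s12_t)

result = minimize(objective, x0=[0.0, 0.0, 0.0, 0.0], method='SLSQP',
                  constraints=[{'type': 'eq', 'fun': r12},
                               {'type': 'eq', 'fun': r21}])
print(result.x)
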

AAO
The All-At-Once (AAO) approach goes one step further than IDF, removing
not only the multidisciplinary coordinator but also the disciplinary coordi-
nators [Cramer et al., 1994]. The system optimizer now has the responsibil-
ity of ensuring not only multidisciplinary consistency but also disciplinary
consistency. Usually this can only be done at the optimum. AAO could
therefore be interpreted as a ‘no-discipline feasible’ approach. Other des-
ignations are Simultaneous Analysis and Design (SAND) [Keane and Nair,
2005], and Optimization Based Decomposition [Kroo, 1997].
The system optimizer now controls the disciplinary input states as well.
Just like in IDF, multidisciplinary consistency is achieved by introducing
coupling constraints in the form of equality constraints for the system residu-
als (rij = 0). Now disciplinary consistency is also maintained by introducing
equality constraints for the discipline residuals: ri = 0.
Note that the number of variables and coupling constraints can easily be-
come very large, complicating the optimization problem. On the other hand,
a single optimization step requires only evaluations, which are much cheaper
than analyses. Thus, where MDF involves a relatively simple optimization
problem with an expensive coupled multidisciplinary analysis, AAO involves
a difficult optimization problem with relatively cheap uncoupled disciplinary
evaluations.
The AAO strategy for this simple system can also be formulated as an
optimization problem:

Figure 2.28: The all-at-once (AAO) approach: separate disciplinary evaluations
guided and coordinated by a system optimizer

Figure 2.29: AAO

An all-at-once (AAO) optimization problem for a two-discipline system


Minimize:
f (f1 , f2 )
with respect to xS , x1 , x2 , s∗S , s∗1 , s∗2 , (where s∗S = [s∗12 ; s∗21 ])
Subject to:
g1 ≤ 0
g2 ≤ 0
r12 = s∗12 − s12 = 0
r21 = s∗21 − s21 = 0
r1 = 0
r2 = 0
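
The following AAO sketch (illustrative only) extends the previous examples with
made-up internal discipline states s1 and s2; the optimizer now controls these as
well, the discipline residuals become equality constraints, and no iterative analysis
loops remain:

from scipy.optimize import minimize

def evaluate1(x1, s21_target, s1):
    # Cheap evaluation only: return the discipline residual and the output coupling.
    r1 = s1 - 0.5 * (x1 + s21_target)       # made-up discipline 1 state equation
    s12 = s1 + x1                           # output coupling computed from the state
    return r1, s12

def evaluate2(x2, s12_target, s2):
    r2 = s2 - 0.25 * (x2 + s12_target)
    s21 = s2 + x2
    return r2, s21

def objective(v):
    x1, x2, s12_t, s21_t, s1, s2 = v        # v = [x1, x2, s12*, s21*, s1, s2]
    return (x1 - 1.0)**2 + (x2 - 2.0)**2 + s12_t**2 + s21_t**2

# All consistency (disciplinary and multidisciplinary) via equality constraints.
constraints = [
    {'type': 'eq', 'fun': lambda v: evaluate1(v[0], v[3], v[4])[0]},         # r1 = 0
    {'type': 'eq', 'fun': lambda v: evaluate2(v[1], v[2], v[5])[0]},         # r2 = 0
    {'type': 'eq', 'fun': lambda v: v[2] - evaluate1(v[0], v[3], v[4])[1]},  # r12 = 0
    {'type': 'eq', 'fun': lambda v: v[3] - evaluate2(v[1], v[2], v[5])[1]},  # r21 = 0
]
result = minimize(objective, x0=[0.0] * 6, method='SLSQP', constraints=constraints)
print(result.x)
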

These single-level MDO strategies provide a good starting point for ex-
plaining the fundamental aspects of MDO, but they are often hard to imple-
ment in real life design processes. For a large part this is due to the presence
of centralized decision authority (a single system optimizer that controls all
the design variables). As system scale increases, this problem becomes more
apparent.
Real organizations require distributed decision authority for many differ-
ent reasons. For example, different design departments are often located at
separate geographical locations (as are subcontractors). Also, disciplinary
experts often do not tolerate too much centralized decision authority. Fur-
thermore, as system complexity increases the amount of information that
has to be handled by the system optimizer simply becomes unmanageable
[Kroo, 1997]. For these reasons multilevel MDO strategies are developed.

2.4.6 Multilevel Optimization Strategies


A multilevel MDO strategy uses subsystem optimizers in addition to the
system optimizer, thus it incorporates distributed decision authority. Mul-
tilevel strategies use the same building blocks as single-level strategies, and
just like for single-level strategies, the possibilities are endless. Some exam-
ples of multilevel strategies are CSSO (Concurrent SubSpace Optimization),
BLISS (Bi-Level Integrated System Synthesis), and CO (Collaborative Op-
timization). We will look at the latter in some more detail.

CO
Collaborative Optimization (CO) is a bi-level MDO strategy, with a system
optimizer and separate disciplinary optimizers. The basic architecture is
depicted in figure 2.30, and the relations are again specified using a DSM
representation in figure 2.31.
The system optimizer sets a target for the disciplinary optimizers and the
disciplinary optimizers try to match this target as closely as possible. This
allows disciplines to operate separately and simultaneously: no information
is exchanged between disciplines.

Figure 2.30: The collaborative optimization (CO) approach: separate disciplinary
analyses with disciplinary optimizers, guided and coordinated by a system optimizer

Figure 2.31: CO (Note that in this example f1 and f2 are communicated directly to
the system optimizer. This is just one possible implementation, chosen for consis-
tency with our descriptions of single-level architectures. This direct communication
is not depicted in figure 2.30, but we assume direct feed-through via the discipline
optimizers.)

The system optimizer controls the system design vector xS and the system
state vector sS (which contains the interdisciplinary couplings). The combination
of the two is called the system target: z = [xS ; sS ].

The system optimizer tries to minimize the system objective fS , which
represents the design goal, subject to equality constraints (Ri ) that enforce
system consistency.
For ease of discussion we will assume that the system objective is some
function of the objective values f1 and f2 obtained from the disciplines, and
that it is calculated within the optimizer. It would also be possible to use
a separate discipline for the task of calculating the system objective value,
or to incorporate this calculation into one of the existing disciplines. These
choices depend on the problem at hand.
Each disciplinary optimizer controls the local design vector xi as well as a
local version of the system target, zi = [xS ; sS ]i . Note that the actual system
target values from z are treated as fixed parameters within the discipline,
whereas zi is treated as a design vector.
The task for the discipline optimizer is to match the local version of
the system target to the actual system target as closely as possible while
satisfying local constraints. According to [Kroo, 2004], this is usually done
by minimizing the sum of squared differences (i.e. the square of the L2-norm
or Euclidean norm), so the objective for discipline i is: Ri = |zi − z|2. This
can be interpreted as the minimization of a residual.
Think about what this means: Each discipline is allowed to deviate from
the system target in order to allow the local constraints to be satisfied, but
this deviation is to be minimized. The system is only consistent when all
these deviations (residuals) are equal to zero. Thus, the system optimizer
has to enforce system consistency by requiring that the disciplinary objective
values are all equal (or close enough) to zero at the optimum (Ri = 0 for
all i). For reasons of robustness, a more convenient approach would be to
require Ri ≤ 0.
Again, for ease of discussion, we assume that the disciplinary objectives
are calculated by the disciplinary optimizers, even though this may not be
practical in real problems.
Note that constraint gradient information for the system level optimiza-
tion problem is easily obtained in analytical form. Information from post-
optimality analysis of the disciplines can be used here [Braun and Kroo,
1993].
The CO approach can be described as a set of formal optimization problems:
one at the system level and one at each discipline level.

A collaborative optimization (CO) problem for a two-discipline system

System level
Minimize:
fS (f1 , f2 )
with respect to z, where z = [xS ; sS ]
Subject to:
R1 ≤ 0
R2 ≤ 0
Discipline level (i = 1, 2)
Minimize:
Ri = |zi − z|2
with respect to zi , xi
Subject to:
gi ≤ 0
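
The nested structure of CO can be sketched as follows (made-up system objective
and local constraints; scipy's SLSQP is used at both levels, and the local design
variables xi are omitted for brevity). Each call to the discipline function solves a
complete discipline-level problem and returns the residual Ri at its local optimum:

import numpy as np
from scipy.optimize import minimize

def discipline(z, g_local):
    # Discipline-level problem: minimize Ri = |zi - z|^2 over the local copy zi,
    # subject to the local constraint g_local(zi) <= 0.
    objective_i = lambda zi: np.sum((zi - z)**2)
    cons = [{'type': 'ineq', 'fun': lambda zi: -g_local(zi)}]   # g <= 0  ->  -g >= 0
    res = minimize(objective_i, x0=np.array(z, dtype=float),
                   method='SLSQP', constraints=cons)
    return res.fun                                              # Ri at the local optimum

g1 = lambda zi: zi[0] - 2.0       # made-up local constraint of discipline 1: z[0] <= 2
g2 = lambda zi: 1.0 - zi[1]       # made-up local constraint of discipline 2: z[1] >= 1

def system_objective(z):          # made-up design goal fS(z)
    return (z[0] - 3.0)**2 + z[1]**2

# System-level consistency constraints Ri <= 0 (effectively Ri = 0, since Ri >= 0).
cons = [{'type': 'ineq', 'fun': lambda z: -discipline(z, g1)},
        {'type': 'ineq', 'fun': lambda z: -discipline(z, g2)}]

result = minimize(system_objective, x0=[0.0, 0.0], method='SLSQP', constraints=cons)
print(result.x)                   # should settle near the compromise z = [2, 1]

Note how expensive the system-level constraint evaluations are: each one requires
a complete discipline-level optimization. The gradients of these Ri-based constraints
also degenerate near the solution, which is related to the robustness issues of CO
discussed in section 2.4.7 [Alexandrov and Lewis, 2002].
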

Collaborative Optimization is only one of the more promising approaches
to MDO. There are numerous other multilevel MDO strategies that also
have great potential. For more detail on these methods, please refer to e.g.
[Kodiyalam and Sobieszczanski-Sobieski, 2001], [Allison, 2004], [Sobieszczanski-
Sobieski et al., 1998].

2.4.7 Selection of an MDO Strategy


As with any optimization problem, the choice of MDO strategy depends
strongly on the problem that is to be solved. This makes it quite hard to
say which method should be used for which kind of problem beforehand.
Nevertheless, it is possible to give some guidelines.
For example, single-level strategies are often relatively easy to implement
and understand, and as long as the problem does not require distributed
decision authority, these strategies may offer a good solution. Within the
single-level methods, the choice depends on the degree of coupling and the
cost of analyses. If the multidisciplinary analysis is relatively cheap for a
given problem, then it could be easiest to use MDF. An added advantage of
MDF is that sub-optimal solutions may still be used, because consistency
is always maintained. If MDF is too expensive one of the other strategies,
such as IDF or AAO, might yield better results.
Multilevel strategies are often better suited for use in large complex or-
ganizations, because they allow distributed decision authority. However,
multilevel strategies are usually more prone to convergence issues. For ex-
ample, the robustness of the CO method depends heavily on the way it is
implemented, as described by [Alexandrov and Lewis, 2002].
Several aspects have been identified that are important for the successful
implementation of MDO strategies in large-scale projects [Sobieszczanski-
Sobieski and Haftka, 1997], [Tosserams, 2008]. An ideal MDO strategy would
have the following characteristics:
• Disciplinary design autonomy: The strategy should allow the use of
available expertise and legacy design tools. Local decision authority
should be respected.
• Flexibility: The strategy should be easily adaptable to a specific or-
ganization.
• Mathematical rigor: The strategy should yield reliable and consistent
results, and the optimality of the results should be provable.
• Efficiency: The strategy should lead to an optimal solution in a min-
imum number of iterations and it should minimize design time (e.g.
by concurrency of tasks).
The CO strategy appears to possess most of these characteristics, except
for the mathematical rigor [?].
In our discussion of MDO strategies, the optimizer itself was treated as
a black box. However, the choice of optimization method also influences the
success of the MDO strategy. In this respect, what was said about the selec-
tion of single-objective and multiobjective optimization methods also holds
for MDO. For example, the use of SQP for the system level optimization
problem in CO may cause problems if not used correctly [Alexandrov and
Lewis, 2002].
This concludes our discussion of MDO. Obviously, a lot of ground (ac-
tually most of it) has been left uncovered in this introduction, and the
interested reader is urged to investigate further with the help of recent lit-
erature.

2.5 Concluding remarks


This chapter about multidisciplinary design optimization was intended as
an introduction to numerical optimization and its application to large-scale
design problems. By no means does it provide a comprehensive overview of
the subject, this much should be clear. For more in-depth information about
any of the topics touched upon here, the reader is referred to literature.
A very good reference work that covers all of the aforementioned topics
(and then some) in more detail, is provided by [Keane and Nair, 2005].
However, the field of optimization in general, and that of MDO in particular,
is very dynamic. New methods and improvements are developed all the time,
so it is important always to keep an eye on recent developments.
An interesting discussion of the current status and future of MDO is
given by [de Weck et al., 2007]. Other useful information sources are:
[MDOB, 2008], [de Weck and Willcox, 2004], [Venkataraman, 2002], [Anton-
sson and Cagan, 2001]. For a taxonomy of numerical optimization methods,
visit the NEOS Guide and have a look at the Optimization Tree [Optimiza-
tion Technology Center, 2008].
One very important issue in optimization is the fidelity of analysis mod-
els. Even the ‘best’ optimization methods can only optimize the mathemat-
ical problem description provided to them. Thus, an optimizer may come
up with a perfect optimum for a specific problem description which is math-
ematically correct, but if the mathematics do not accurately represent the
real physical problem, this optimum may be of no use whatsoever.
For example, if stall behavior were neglected in the aerodynamic model
of an aircraft, an optimizer might come up with a solution that requires
unrealistically large angles of attack.
In MDO, model fidelity is even more important due
to the scale of the problems and the number of interactions between disci-
plines. There is always a chance that an MDO process would exploit some
mathematical characteristic of the problem description that has no physical
significance, leading to bogus synergetic effects, for example.
The applicability of a solution is also influenced by uncertainties in the
model. Special techniques exist for dealing with uncertainties in optimiza-
tion, but these are beyond the scope of this text. For more information on
this subject refer to e.g. [Beyer and Sendhoff, 2007].
Although the field of numerical optimization has been around for a long
time, the MDO discipline is still relatively young. As a result there is no
real consensus yet about what constitutes MDO and how best to represent
different MDO strategies. Also, be aware that some of the definitions and
classifications described in this text are not unique. Often different authors
use different labels for the same thing.
Again, it is important to keep an eye on recent developments, especially
in the field of MDO.

Bibliography

Scott Aaronson. Complexity Zoo. Website, August 25 2008. http://qwiki.


stanford.edu/wiki/Complexity_Zoo.

Robert A. Adams. Calculus: A Complete Course. Addison Wesley, Don


Mills, Ontario, fourth edition, 1999.

N. Alexandrov. Editorial–Multidisciplinary Design Optimization. Optimiza-


tion and Engineering, 6(1):5–7, 2005.

Natalia M. Alexandrov and Robert Michael Lewis. Analytical and Com-


putational Aspects of Collaborative Optimization for Multidisciplinary
Design. AIAA Journal, 40(2):301–309, February 2002.

James T. Allison. Complex System Optimization: A Review of Analytical


Target Cascading, Collaborative Optimization, and Other Formulations.
Master’s thesis, The University of Michigan, 2004.

James T. Allison. Optimal Partitioning and Coordination Decisions in Com-


plex System Design Optimization. Website University of Michigan, De-
cember 17, 2008. http://ode.engin.umich.edu/research/MDO.html.

Andreas Antoniou and Wu-Sheng Lu. Practical Optimization: Algorithms


and Engineering Applications. Springer Science+Business Media, New
York, NY, 2007.

Erik K. Antonsson and Jonathan Cagan, editors. Formal Engineering Design


Synthesis. Cambridge University Press, Cambridge, 2001. ISBN 0-521-
79247-9.

Peter Bartholomew. The Role of MDO within Aerospace Design and


Progress Towards an MDO Capability. AIAA White Paper on Industrial
Experience with MDO, (AIAA-98-4705), 1998. http://www.aiaa.org/
Participate/Uploads/AIAA_MDO_TC_1998_White_Paper_on_MDO.pdf.

Mike C. Bartholomew-Biggs, Steven C. Parkhurst, and Simon P. Wilson.


Stochastic Algorithms: Foundations and Applications, volume 2827 of Lec-
ture Notes in Computer Science, chapter Global Optimization Stochas-
tic or Deterministic?, pages 125–137. Springer-Verlag Berlin Heidelberg,
2003.

Albert Bartlett. Arithmetic, Population and Energy. Video Lec-


ture, 2004. http://globalpublicmedia.com/dr_albert_bartlett_
arithmetic_population_and_energy, December 18, 2008.

Hans-Georg Beyer and Bernhard Sendhoff. Robust optimization - A compre-


hensive survey. Computer Methods in Applied Mechanics and Engineering,
196:3190–3218, 2007.

Christina L. Bloebaum. Formal and heuristic system decomposition methods


in multidisciplinary synthesis. NASA Contractor Report NASA-CR-4413,
NASA Langley Research Center, 1991.

Robert D. Braun and Ilan M. Kroo. Post Optimality Analysis in Aerospace


Vehicle Design. In AIAA Aircraft Design, Systems and Operations Meet-
ing, Monterey, CA, August 11-13 1993.

Tyson R. Browning. Applying the Design Structure Matrix to System De-


composition and Integration Problems: A Review and New Directions.
IEEE Transactions on Engineering Management, 48(3):292–306, August
2001.

Martin Bücker, George Corliss, Paul Hovland, Uwe Naumann, and Boyana
Norris. Automatic Differentiation: Applications, Theory, and Implemen-
tations, volume 50 of Lecture Notes in Computational Science and Engi-
neering. Springer-Verlag Berlin Heidelberg, 2006.

Jens Clausen. Branch and Bound Algorithms - Principles and Examples.


Technical report, University of Copenhagen, Department of Computer
Science, March 1999.

William Cook. The traveling salesman problem. Website of the Georgia


Institute of Technology, December 17, 2008. http://www.tsp.gatech.
edu/index.html.

Evin J. Cramer, J. E. Dennis, Jr., Paul D. Frank, Robert Michael Lewis,


and Gregory R. Shubin. Problem Formulation for Multidisciplinary Opti-
mization. SIAM Journal on Optimization, 4(4):754–776, November 1994.

Olivier de Weck and Karen Willcox. Multidisciplinary System Design Op-


timization (MSDO). Lecture Notes MIT Open Course Ware, February
2004. http://ocw.mit.edu/OcwWeb/Aeronautics-and-Astronautics/
16-888Spring-2004/LectureNotes/index.htm.

Olivier de Weck, Jeremy Agte, Jaroslaw Sobieszczanski-Sobieski, Paul


Arendsen, Alan Morris, and Martin Spieck. State-of-the-Art and Future
Trends in Multidisciplinary Design Optimization. In Proceedings of the
48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics,
and Materials Conference, number AIAA 2007-1905, Honolulu, Hawaii,
April 2007.

E.W. Dijkstra. A Note on Two Problems in Connexion with Graphs. Nu-


merische Mathematik, 1:269–271, June 1959.

Steven D. Eppinger, Daniel E. Whitney, Robert P. Smith, and David A.


Gebala. A Model-Based Method for Organizing Tasks in Product Devel-
opment. Research in Engineering Design, 6:1–13, 1994.

Hamidreza Eskandari and Christopher D. Geiger. A fast Pareto genetic al-


gorithm approach for solving expensive multiobjective optimization prob-
lems. Journal of Heuristics, 14:203–241, 2008.

David A. Gebala and Steven D. Eppinger. Methods for Analyzing Design


Procedures. In Design Theory and Methodology, volume 31 of ASME
Design Technical Conferences, pages 227–233, Miami, Florida, September
1991.

Michael B. Giles and Niles A. Pierce. An Introduction to the Adjoint Ap-


proach to Design. Flow, Turbulence and Combustion, 65:393–415, 2000.

Richard Gilmore, Sean Wakayama, and Dino Roman. Optimization of High-


Subsonic Blended-Wing-Body Configurations. In Proceedings of the 9th
AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimiza-
tion, Atlanta, Georgia, September 2002.

Fred Glover. Tabu Search - Part I. ORSA Journal on Computing, 1(3):


190–206, 1989.

Fred Glover. Tabu Search - Part II. ORSA Journal on Computing, 2(1):
4–32, 1990.

Fred Glover, Eric Taillard, and Dominique de Werra. A user’s guide to tabu
search. Annals of Operations Research, 41(1):3–28, March 1993.

Andreas Griewank. Evaluating Derivatives: principles and techniques of


algorithmic differentiation. Frontiers in Applied Mathematics. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, 2000.

G.A. Hazelrigg. On Irrationality in Engineering Design. Journal of Mechan-


ical Design, 119:194–196, June 1997.

Richard Healey. Holism and Nonseparability in Physics. The Stanford


Encyclopedia of Philosophy, Fall 2008. http://plato.stanford.edu/
archives/fall2008/entries/physics-holism/.

Y.C. Ho and D.L. Pepyne. Simple Explanation of the No-Free-Lunch The-
orem and Its Implications. Journal of Optimization Theory and Applica-
tions, 115(3):549–570, December 2002.

R. Hooke and T.A. Jeeves. ”Direct Search” Solution of Numerical and Sta-
tistical Problems. Journal of the Association for Computing Machinery,
8(2):212–229, April 1961.

Cyrus D. Jilla. A Multiobjective, Multidisciplinary Design Optimization


Methodology for the Conceptual Design of Distributed Satellite Systems.
PhD thesis, Massachusetts Institute of Technology, May 2002.

Donald R. Jones. A Taxonomy of Global Optimization Methods Based


on Response Surfaces. Journal of Global Optimization, 21(4):345–383,
December 2001.

Thomas Kaminski, Ralf Giering, and Carsten Othmer. Topological design


based on highly efficient adjoints generated by automatic differentiation.
In Proceedings of the Design Optimization International Conference, Las
Palmas, Spain, April 5-7 2005.

Andy J. Keane and Prasanth B. Nair. Computational Approaches for


Aerospace Design: The Pursuit of Excellence. John Wiley & Sons, Ltd,
Chichester, West Sussex, 2005.

Srinivas Kodiyalam and Jaroslaw Sobieszczanski-Sobieski. Multidisciplinary


design optimization – some formal methods, framework requirements, and
application to vehicle design. International Journal of Vehicle Design, 25
(1/2):3–22, 2001. Special Issue.

I. Kroo. Distributed multidisciplinary design and collaborative optimization.


In VKI lecture series on Optimization Methods & Tools for Multicrite-
ria/Multidisciplinary Design. Von Karman Institute for Fluid Dynamics,
Sint-Genesius-Rode, Belgium, November 2004.

Ilan Kroo. MDO in Large-Scale Design. In Multidisciplinary Design Opti-


mization: State of the Art, pages 22–44. SIAM, 1997.

A. Kusiak and J. Wang. Efficient organizing of design activities. Interna-


tional Journal of Production Research, 31(4):753–769, 1993.

J. Laurenceau and M. Meaux. Comparison of gradient and response surface


based optimization frameworks using adjoint method. In Proceedings of
the 4th AIAA Multidisciplinary Design Optimization Specialist Confer-
ence, Schaumburg, Illinois, 2008.

C.Y. Lee. An Algorithm for Path Connections and its Applications. IRE
Transactions on Electronic Computers, EC-10(2):364–365, 1961.

Thomas W. Malone and Kevin Crowston. The interdisciplinary study of
coordination. ACM Computing Surveys, 26(1):87–119, 1994.

Joaquim R.R.A. Martins. A Coupled-Adjoint Method for High-Fidelity


Aero-Structural Optimization. PhD thesis, Stanford University, Stanford,
October 2002. http://aero-comlab.stanford.edu/Papers/martins.
thesis.pdf.

M. D. McKay, R. J. Beckman, and W. J. Conover. A Comparison of Three


Methods for Selecting Values of Input Variables in the Analysis of Output
from a Computer Code. Technometrics, 21(2):239–245, May 1979.

Antoine McNamara, Adrien Treuille, Zoran Popovic, and Jos Stam. Fluid
Control Using the Adjoint Method. In Proceedings International Con-
ference on Computer Graphics and Interactive Techniques (ACM SIG-
GRAPH 2004), pages 449–456, 2004.

NASA MDOB. NASA Multidisciplinary Optimization Branch Website.


http://mdob.larc.nasa.gov, May 2008.

Kaisa M. Miettinen. Nonlinear Multiobjective Optimization. Kluwer Aca-


demic Publishers, Norwell, Massachusetts, 1999.

Marcin Molga and Czeslaw Smutnicki. Test functions for optimization


needs. Website, August 2005. http://www.zsd.ict.pwr.wroc.pl/
files/docs/functions.pdf.

Arnold Neumaier. Acta Numerica, volume 13, chapter 4: Complete Search in


Continuous Global Optimization and Constraint Satisfaction. Cambridge
University Press, 2004.

Jürg Nievergelt. Exhaustive Search, Combinatorial Optimization and Enu-


meration: Exploring the Potential of Raw Computing Power. In SOFSEM
2000: Theory and Practice of Informatics, Lecture Notes in Computer
Science. Springer Berlin Heidelberg, 2000.

Optimization Technology Center. NEOS Guide. Website, December 12,


2008. http://www-fp.mcs.anl.gov/OTC/Guide/OptWeb/index.html.

C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.

Amit Patel. Amit’s A∗ Pages. Website Stanford University, December 12,


2008. http://theory.stanford.edu/~amitp/GameProgramming/.

D.J.W. De Pauw and P.A. Vanrolleghem. Avoiding the finite difference sen-
sitivity analysis deathtrap by using the complex-step derivative approx-
imation technique. In Proceedings Summit on Environmental Modelling
and Software (iEMSs2006), Burlington, Vermont, July 9-12 2006.

Nestor V. Queipo, Raphael T. Haftka, Wei Shyy, Tushar Goel, Rajkumar
Vaidyanathan, and P. Kevin Tucker. Surrogate-based analysis and opti-
mization. Progress in Aerospace Sciences, 41:1–28, 2005.

Louis B. Rall. Automatic Differentiation: Techniques and Applications, vol-


ume 120 of Lecture Notes in Computer Science. Springer-Verlag Berlin
Heidelberg, 1981.

H. Richards. The E.W. Dijkstra Archive. University of Texas Website,


December 23, 2008. http://www.cs.utexas.edu/~EWD/welcome.html.

Timothy W. Simpson and Farrokh Mistree. Kriging Models for Global Ap-
proximation in Simulation-Based Multidisciplinary Design Optimization.
AIAA Journal, 39(12):2233–2241, December 2001.

Jan A. Snyman. Practical Mathematical Optimization, volume 97 of Applied


Optimization. Springer Science+Business Media, New York, NY, 2005.

Catherine Soanes and Angus Stevenson, editors. Concise Oxford English


Dictionary. Oxford University Press, 11th revised edition, July 2008.

J. Sobieszczanski-Sobieski and R.T. Haftka. Multidisciplinary aerospace de-


sign optimization: survey of recent developments. Structural and Multi-
disciplinary Optimization, 14:1–23, 1997.

J. Sobieszczanski-Sobieski, J.S. Agte, and R.R. Sandusky, Jr. Bi-Level Inte-


grated System Synthesis (BLISS). Technical Report NASA/TM-1998-
208715, NASA Langley Research Center, Hampton, Virginia, August
1998.

Jaroslaw Sobieszczanski-Sobieski. Sensitivity of Complex, Internally Cou-


pled Systems. AIAA Journal, 28(1):153–160, January 1990.

W.P. Stevens, G.J. Myers, and L.L. Constantine. Structured Design. IBM
Systems Journal, 13(2):115–139, 1974.

Donald V. Steward. Partitioning and Tearing Systems of Equations. Journal


of the Society for Industrial and Applied Mathematics, Series B: Numer-
ical Analysis, 2(2):345–365, 1965.

Donald V. Steward. The Design Structure System: A Method for Manag-


ing the Design of Complex Systems. IEEE Transactions on Engineering
Management, 28(3):71–74, August 1981.

L. P. Swiler, R. Slepoy, and A. A. Giunta. Evaluation of Sampling Meth-


ods in Constructing Response Surface Approximations. In Proceedings
of the 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dy-
namics, and Materials Conference (2nd AIAA Multidisciplinary Design
Optimization Specialist Conference), number AIAA-2006-1827, Newport,
Rhode Island, 2006.
S. Tosserams. Distributed Optimization for Systems Design: An Augmented
Lagrangian Coordination Method. PhD thesis, Eindhoven University of
Technology, August 2008.
Christian A. van der Velden. Application of Knowledge Based Engineering
to Intelligent Design Systems. PhD thesis, RMIT School of Aerospace,
Mechanical and Manufacturing Engineering, Melbourne, Australia, July
2008.
Ed van Hinte and Michel van Tooren. First Read This - Systems Engineering
in Practice. 010 Publishers, Rotterdam, 2008.
Garret N. Vanderplaats. Multidiscipline Design Optimization. Vanderplaats
Research & Development, Inc., Colorado Springs, CO, 1st edition, 2007.
ISBN 0-944956-04-1.
P. Venkataraman. Applied Optimization with Matlab Programming. John
Wiley & Sons, New York, 2002.
Arun Verma. An introduction to automatic differentiation. Current Science,
78(7):804–807, April 2000.
Terrance C. Wagner and Panos Y. Papalambros. A General Framework for
Decomposition Analysis in Optimal Design. Advances in Design Automa-
tion, 65(2):315–325, September 1993.
John N. Warfield. Binary Matrices in System Modeling. IEEE Transactions
on Systems, Man, and Cybernetics, 3(5):441–449, September 1973.
Eric W. Weisstein. Directed Graph. From MathWorld–A Wolfram
Web Resource, December 22, 2008a. http://mathworld.wolfram.com/
DirectedGraph.html.
Eric W. Weisstein. Vector Norm. From MathWorld–A Wolfram Web
Resource., December 23, 2008b. http://mathworld.wolfram.com/
VectorNorm.html.
Eric J. Whitney, Luis F. Gonzalez, and Jacques Periaux. Multidisciplinary
Methods for Analysis Optimization and Control of Complex Systems,
volume 6 of Mathematics in Industry, chapter Distributed Multidisci-
plinary Design Optimisation in Aeronautics using Evolutionary Algo-
rithms, Game Theory and Hierarchy, pages 249–281. Springer Berlin
Heidelberg, 2005.
Wikipedia. A* search algorithm. Wikipedia, December 23, 2008. http:
//en.wikipedia.org/wiki/A*_search_algorithm.

Herbert S. Wilf. Algorithms and Complexity. A.K. Peters Ltd., 1st edition,
1994. http://www.cis.upenn.edu/~wilf.

Gregory D. Wyss and Kelly H. Jorgensen. A User’s Guide to LHS: Sandia’s


Latin Hypercube Sampling Software. Technical Report SAND98-0210,
Sandia National Laboratories, Albuquerque, NM, February 1998.

Appendix 2.A

Analytical Sensitivity
Analysis

Sensitivity analysis is a very important aspect of numerical optimization.


One of the most straightforward numerical approaches to obtaining sensi-
tivity information (i.e. gradient information) is by using finite-differencing
(FD) techniques. However, this approach has many drawbacks, such as poor
numerical stability and the presence of approximation errors. But more im-
portantly, the FD approach is very expensive in terms of computational
burden. It requires one extra function evaluation per design variable, at
every optimization step. Analytical methods for sensitivity analysis provide
a reliable and computationally cheap alternative.
This appendix provides a concise introduction to analytical sensitivity
analysis by means of the direct approach and the adjoint approach. The
text is largely based on chapter 4 of [Martins, 2002].

2.A.1 Governing Equations


We consider a system described by governing equations in residual form.
The solution of these governing equations yields the system state si . The
governing equations Rk depend on the design variables xn , but also on the
state variables si , which can also be implicit functions of the design variables
(they depend on the design variables through the solution of the system).
Using index notation we write:

Rk (xn , si (xn )) = 0 (2.A.1)


The number of governing equations must equal the number of state vari-
ables in order to have a fully defined system. Hence the range of indices i
and k is i, k = 1, 2, . . . , NR , where NR represents the number of degrees-of-
freedom of the system. The index n has the range n = 1, 2, . . . , Nx , where
Nx is the number of design variables.

2.A.2 The Direct Approach
The goal is to optimize the system described by these governing equations.
This is done by minimizing some objective function, usually subject to con-
straint functions. Both are usually functions of the design variables and
the state variables. If we wish to use gradient based optimization methods,
we require sensitivity information for these functions. We will denote the
general function for which we require gradient information by I (so I may
be an objective or a constraint function or some other function):
I = I(xn , si (xn )) (2.A.2)
The gradient of I can be found using the chain rule of differentiation:
\frac{dI}{dx_n} = \frac{\partial I}{\partial x_n} + \frac{\partial I}{\partial s_i}\frac{ds_i}{dx_n}    (2.A.3)
The partial-derivative terms on the right-hand-side of the equation can be
evaluated easily by varying the denominator, but the total-derivative term
is another matter. The total sensitivity of the state variables with respect
to the design variables (ds_i/dx_n) requires a full solution of the system. This
can be clarified as follows.
For the system to be consistent the governing equations must always be
satisfied, which implies that the total derivative of the residuals with respect
to the design variables must be zero. Again using the chain rule, we find:
\frac{dR_k}{dx_n} = \frac{\partial R_k}{\partial x_n} + \frac{\partial R_k}{\partial s_i}\frac{ds_i}{dx_n} = 0    (2.A.4)

Note that the term \partial R_k / \partial s_i (in index notation) represents the Jacobian matrix.

By rewriting this equation we can determine ds_i/dx_n:

\frac{ds_i}{dx_n} = -\left(\frac{\partial R_k}{\partial s_i}\right)^{-1}\frac{\partial R_k}{\partial x_n}    (2.A.5)

Thus, by solving for ds_i/dx_n and substituting into equation 2.A.3, we can
determine the sensitivity of I. This method is called the direct approach.
However, note that in order to find the total sensitivity of the system
using the direct approach, the matrix equation needs to be solved for every
design variable, so equation 2.A.5 has to be evaluated Nx times.

2.A.3 The Adjoint Approach


Fortunately it is also possible to find the sensitivity of I in a slightly different
way. This becomes clear when we substitute equation 2.A.5 into equation
2.A.3:
\frac{dI}{dx_n} = \frac{\partial I}{\partial x_n} - \frac{\partial I}{\partial s_i}\left(\frac{\partial R_k}{\partial s_i}\right)^{-1}\frac{\partial R_k}{\partial x_n}    (2.A.6)

Now we can introduce a vector Ψk (index notation), defined as

\Psi_k = -\frac{\partial I}{\partial s_i}\left(\frac{\partial R_k}{\partial s_i}\right)^{-1}    (2.A.7)

The vector Ψk is often called the adjoint vector. With this definition, the
sensitivity equation (eq. 2.A.6) can be simplified to:

\frac{dI}{dx_n} = \frac{\partial I}{\partial x_n} + \Psi_k \frac{\partial R_k}{\partial x_n}    (2.A.8)

We can rewrite equation 2.A.7 to form the adjoint equation:

\frac{\partial R_k}{\partial s_i} \Psi_k = -\frac{\partial I}{\partial s_i}    (2.A.9)
Now, we can solve the adjoint equation for Ψk , and then substitute into
equation 2.A.8 in order to find the system sensitivity. This method is called
the adjoint approach.
Note that the adjoint depends on the function I instead of the design
variables. Thus in order to find the total sensitivity of the system using the
adjoint approach, the matrix equation 2.A.9 needs to be solved for every
function I. The computational burden for the adjoint approach is therefore
largely independent of the number of design variables.

2.A.4 Conclusion
Thus, in the direct approach it is necessary to compute ds_i/dx_n once for every
design variable x_n, whereas in the adjoint approach it is necessary to compute
\Psi_k once for every function I. The computational burden per evaluation of
ds_i/dx_n or \Psi_k is comparable.
The conclusion is that, if the number of design variables is greater than
the number of functions for which sensitivity information is required (I),
then the adjoint method is computationally more efficient than the direct
method, and vice versa [Martins, 2002].
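
As a small numerical illustration (not taken from [Martins, 2002]), consider a
made-up linear system R(x, s) = A s − B x = 0 with a single function of interest
I = c^T s, so that \partial R/\partial s = A, \partial R/\partial x = −B and
\partial I/\partial s = c^T, while the explicit part \partial I/\partial x is zero. The
sketch below computes dI/dx with both approaches and checks that they agree:

import numpy as np

rng = np.random.default_rng(0)
N_s, N_x = 4, 3                                     # 4 state and 3 design variables
A = rng.normal(size=(N_s, N_s)) + 4.0 * np.eye(N_s) # dR/ds (made well conditioned)
B = rng.normal(size=(N_s, N_x))                     # so that dR/dx = -B
c = rng.normal(size=N_s)                            # dI/ds

# Direct approach: solve dR/ds * (ds/dx) = -dR/dx, i.e. once per design variable.
ds_dx = np.linalg.solve(A, B)                       # N_x right-hand sides
dI_dx_direct = c @ ds_dx

# Adjoint approach: solve the (transposed) adjoint equation once for the function I.
psi = np.linalg.solve(A.T, -c)                      # (dR/ds)^T psi = -(dI/ds)^T
dI_dx_adjoint = psi @ (-B)                          # dI/dx = 0 + psi^T (dR/dx)

print(np.allclose(dI_dx_direct, dI_dx_adjoint))     # True: both give the same gradient
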
It should be mentioned that the implementation of analytical sensitivity
methods in large scale numerical codes often requires a large amount of
work. Once this has been done, however, the advantages are obvious.

Appendix 2.B

Sequential Quadratic
Programming

Consider a general non-linear constrained optimization problem, represented
by objective F (X) and constraints gi (X) for i = 1, . . . , m, where X is the
design vector. For brevity we only consider inequality constraints.
One of the most powerful methods for gradient based, non-linear con-
strained optimization is the Sequential Quadratic Programming (SQP) method.
A flowchart for the SQP method is depicted in figure 2.B.1.
Like other gradient based methods, SQP reduces the multivariate opti-
mization problem in X to a sequence of optimization problems in a single
variable α, called line search or one-dimensional search.

2.B.1 Line Search


In the line search method, a new estimate for the design vector Xq , corre-
sponding to (major) iteration q, is determined as follows:

Xq = Xq−1 + α∗ Sq (2.B.1)

where Sq represents a search direction and α∗ is the optimum solution to
the one-dimensional optimization problem corresponding to that search di-
rection.
This one dimensional optimization problem in α is found by substituting
equation 2.B.1 into the original optimization problem (without the asterisk
∗ which indicates the optimum) which yields F (X(α)) and g(X(α)).
Thus, the line search approach consists of two steps that need to be
performed at every major iteration of the optimizer:
1. Determine a search direction Sq
2. Minimize the resulting one-dimensional problem in terms of α in order
to find α∗

Figure 2.B.1: A flowchart for the SQP method: initialize q = 0, B = I, X = X0; then,
for each major iteration q = q + 1, solve the direction finding problem FQP(S) subject
to gQP(S) in order to find Sq, perform a one-dimensional unconstrained optimization
of the augmented objective φ(α) in order to find α∗, and assemble the new estimate
Xq = Xq−1 + α∗Sq; if converged, exit, otherwise update B and start the next iteration.

2.B.2 Direction Finding in SQP
The first step, finding Sq , is referred to as the direction finding problem. In
order to solve the direction finding problem in SQP, the original optimization
problem is approximated at the current design Xq−1 using a quadratic ob-
jective and linear constraints, thus forming a quadratic programming (QP)
problem (in terms of S). This allows the use of very efficient solution pro-
cedures that are available for QP problems.
The quadratic approximation to the objective at the point Xq−1 (in
terms of S) is
F_{QP}(S) = F(X_{q-1}) + \nabla F(X_{q-1})^T S + \tfrac{1}{2} S^T B S    (2.B.2)
where B is an approximation to the Hessian of the Lagrangian of the
original problem. The matrix B is initialized as the identity matrix I, after
which it is updated at every major iteration using a method like BFGS
(Broyden-Fletcher-Goldfarb-Shanno) [Vanderplaats, 2007].
The linear approximations to the constraints at the point Xq−1 (in terms
of S) are
gQPi (S) = gi (Xq−1 ) + ∇gi (Xq−1 )T S ≤ 0 (2.B.3)
for i = 1, . . . , m, where m is the number of constraints.
Note that the approximation to the constraints that is normally used is
slightly more complicated than the one described here. We use this simplified
version for clarity. For the details refer to e.g. [Vanderplaats, 2007].
Now, by optimizing this QP problem we find S∗ , which represents the
new direction for the search. This is because the vector S has the old design
point Xq−1 as its origin. Thus, the optimum of the QP problem is the new
search direction for the line search (equation 2.B.1): Sq = S ∗ .
This completes step 1, determining the search direction. Now, for step
2, the original problem needs to be optimized in this search direction.

2.B.3 One-Dimensional Optimization in SQP


An augmented form of the original optimization problem is used in order
to allow unconstrained optimization [Vanderplaats, 2007]. The resulting
one-dimensional augmented objective in terms of α is
\phi(\alpha) = F(X(\alpha)) + \sum_{i=1}^{m} u_i \max[0,\, g_i(X(\alpha))]    (2.B.4)

where
X(α) = Xq−1 + αSq (2.B.5)

and ui is a penalty parameter. The Lagrange multipliers of the approximate
QP problem can be used as penalty parameters, as described in [Vander-
plaats, 2007], but the details are omitted here for clarity.
This optimization yields the line search optimum, α∗ , which is then
substituted into equation 2.B.1 in order to find the new estimate for the
design vector, i.e. Xq .
After the new estimate has been found, the matrix B is updated, and a
new (major) iteration is started (unless the optimum has been found).
Even though SQP is very efficient for well behaved problems, it has
problems handling non-smooth functions, just like other gradient based op-
timizers.
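
A compact didactic sketch of such a loop is given below (illustrative only: B is
kept equal to the identity instead of being updated with BFGS, a single fixed
penalty parameter u replaces the Lagrange multipliers, and scipy's general SLSQP
routine stands in for a dedicated QP solver in the direction finding step):

import numpy as np
from scipy.optimize import minimize, minimize_scalar

F     = lambda X: (X[0] - 2.0)**2 + (X[1] - 1.0)**2              # made-up objective
gradF = lambda X: np.array([2.0*(X[0] - 2.0), 2.0*(X[1] - 1.0)])
g     = lambda X: X[0]**2 + X[1]**2 - 1.0                        # constraint g(X) <= 0
gradg = lambda X: np.array([2.0*X[0], 2.0*X[1]])

X, B, u = np.array([0.5, 0.5]), np.eye(2), 10.0                  # B = I throughout
for q in range(50):
    # Step 1: direction finding, eqs. 2.B.2 and 2.B.3 (QP solved with a generic routine).
    F_QP = lambda S: gradF(X) @ S + 0.5 * S @ B @ S
    lin  = lambda S: -(g(X) + gradg(X) @ S)                      # 'ineq' means fun >= 0
    S = minimize(F_QP, np.zeros(2), method='SLSQP',
                 constraints=[{'type': 'ineq', 'fun': lin}]).x
    if np.linalg.norm(S) < 1e-8:                                 # no further progress
        break
    # Step 2: one-dimensional search on the augmented objective, eq. 2.B.4.
    phi = lambda a: F(X + a*S) + u * max(0.0, g(X + a*S))
    a_star = minimize_scalar(phi, bounds=(0.0, 1.0), method='bounded').x
    X = X + a_star * S                                           # eq. 2.B.1
print(X)   # should end up close to the constrained optimum (2, 1)/sqrt(5)
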

Appendix 2.C

Pathfinding

The pathfinding problem is a discrete optimization problem in which the
goal is to find a path from a starting point to an end point while avoiding
obstacles (i.e. satisfying constraints) and minimizing cost (e.g. distance or
travel time) [Patel, 2008].
Pathfinding problems are best known from their occurrence in games
and in GPS navigation, but they are also common in engineering design,
where they are often called routing problems.
Routing problems frequently occur in aerospace engineering (wire har-
nesses), maritime engineering (piping), electrical engineering (microproces-
sor circuits), and so on and so forth [van der Velden, 2008].

2.C.1 Graph Search


Pathfinding or routing algorithms are graph search algorithms. Tree search,
as used in the branch-and-bound algorithm, is also a special kind of graph
search (hierarchic).
A graph in the mathematical sense is a set of nodes or vertices connected
by lines or edges. If the edges of the graph have a direction (indicated by
an arrow) the graph is called a directed graph or digraph [Weisstein, 2008a],
otherwise it is an undirected graph. An example of a digraph is depicted in
figure 2.C.1.

Figure 2.C.1: An example of a directed graph

The edges can have weights, e.g. indicating cost. Not surprisingly, a
graph with weighted edges is called a weighted graph. The travelling sales-
man problem, for example, can be posed in the form of a weighted graph.
In that case the vertices represent cities and the edges represent the cost of
travelling between cities.
In many routing problems it is common practice to represent the search
space as a (multidimensional) grid. Such a grid can also be interpreted as a
graph, as depicted in figure 2.C.2 for a two-dimensional grid. A move from
one grid cell to another is equivalent to a move from one vertex to another.

Figure 2.C.2: A search grid can be represented as an undirected graph. Vertex labels
represent grid location ij (row i, column j). Left-to-right: search grid, graph without
diagonal movement (taxicab), graph with diagonal movement

A grid, or the equivalent graph, can be searched in different ways. Some


important approaches are breadth-first, best-first, and depth-first. The dif-
ference in search order is depicted in figure 2.C.3.

Figure 2.C.3: Various graph search approaches applied to a grid-based graph. Left-
to-right: breadth-first, best-first (order of best nodes, e.g. 3,2,6,8,5,12,10,14), and
depth-first.

During the first iteration, a breadth-first search routine examines all the
vertices that are directly connected to the source vertex. During the second
iteration, the routine examines all the vertices that are directly connected
to the new vertices that were examined in the first iteration, and that have
not been examined yet. This process continues until some goal is reached.

A best-first search routine also starts by examining all the vertices di-
rectly connected to the source vertex. Then it moves to the best of the
examined vertices (best according to some criterium). From there, the rou-
tine again examines the directly connected vertices. Then the routine takes
into account all the vertices hitherto examined that still have unexplored
edges (also from previous iterations), and moves to the best of those ver-
tices. This is repeated until some goal is reached.
A depth-first search routine starts by examining the first vertex directly
connected to the source vertex, then proceeds by examining the first vertex
connected to that newly examined vertex, and so on until it reaches a vertex
with no remaining edges, after which it returns to the previous vertex to
examine the next directly connected vertex that has not been examined yet,
and so on.
The maze algorithm is an example of a breadth-first search algorithm.

2.C.2 Maze Routing


An example of a routing problem in grid form, from [van der Velden, 2008],
is depicted in figure 2.C.4. The starting point or source is labeled S, the
target is labeled T, and obstacles (constraints) are represented by black
cells. Think of what the corresponding graph would look like: An obstacle
effectively removes a vertex and the associated edges.

Figure 2.C.4: A simple example of a two-dimensional routing problem (source:
[van der Velden, 2008])

One of the earliest approaches to solving such a problem is the maze
algorithm by [Lee, 1961]. The maze algorithm works by propagating a wave
from source to target. At every step the cells at a taxicab distance of k from
the source (as explained below in the gray box) are labeled k. Then k is
incremented by one, and the labeling is repeated. This goes on until the
target is reached, after which the algorithm backtracks to find a shortest
path. The result is depicted in figure 2.C.5.

Vector norms [Weisstein, 2008b]


A vector norm represents a (strictly positive) measure for the length of a
vector. The so called Lp norm of a vector x is defined as
|x|_p = \left( \sum_i |x_i|^p \right)^{1/p}    (2.C.1)

where p determines the type of norm.


Perhaps the most familiar type of vector norm is the L2 norm, better known
as the Euclidean norm or Euclidean distance. For p = 2 we find
|x|_2 = \sqrt{\sum_i |x_i|^2}    (2.C.2)

Note that the Euclidean distance of x is often simply denoted |x|, without
the subscript. This norm represents the shortest possible distance between
two points.
Another distance measure of interest is the L1 norm, also known as taxicab
distance or Manhattan distance. For p = 1 we find
|x|_1 = \sum_i |x_i|    (2.C.3)

The taxicab distance is the shortest possible distance between two points if
only orthogonal movement is allowed.
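
For instance (values made up for illustration), the two distance measures between
grid cells (1, 1) and (4, 3) can be compared as follows:

import numpy as np

a, b = np.array([1, 1]), np.array([4, 3])      # two grid locations (row, column)
print(np.sum(np.abs(a - b)))                   # L1 / taxicab distance: 3 + 2 = 5
print(np.sqrt(np.sum((a - b)**2)))             # L2 / Euclidean distance: sqrt(13), about 3.6
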

Backtracking simply means working from the target to the source, each
time decreasing k to find the next cell. In the example, after the target has
been reached, the algorithm selects the cell with value 12 that borders the
target, then finds a cell valued 11, then one valued 10, and so on and so forth
until the source is reached. This is indicated by the arrow in the figure. The
result is guaranteed to be a shortest path. Often multiple shortest paths are
possible, as can be seen in the example.
In terms of graph theory, Lee’s maze routing algorithm is essentially a
breadth-first search algorithm. All the vertices connected to the current
vertex are evaluated before moving on to the next iteration, during which
all the vertices connected to those new vertices are evaluated, and so on.
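
A minimal sketch of such a Lee-style breadth-first routing on a small made-up grid
is shown below; it stores parent pointers instead of decrementing wave labels during
backtracking, which is equivalent for recovering a shortest path:

from collections import deque

def maze_route(grid, source, target):
    # grid[r][c] == 1 marks an obstacle; assumes the target is reachable.
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}                     # predecessor of each labeled cell
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        if (r, c) == target:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # orthogonal moves only
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent):
                parent[nxt] = (r, c)            # wave front expands one taxicab step
                queue.append(nxt)
    path, node = [], target                     # backtracking: target -> source
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
print(maze_route(grid, source=(0, 0), target=(3, 3)))
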
Note that the basic maze algorithm evaluates a very large part of the
search space. This leads to the question whether it is possible to refine the
method so as to produce a more efficient algorithm. One possible refinement
would be to include some notion of closeness to a solution. Methods like
those are called heuristic methods.

Figure 2.C.5: One possible solution of the two-dimensional routing problem using
the maze algorithm (source: [van der Velden, 2008]). The number in each cell
indicates the taxicab distance from that cell to the source, and the arrow represents
backtracking.

2.C.3 Heuristics
Heuristics are common sense rules. These are generally informal and based
on experimentation and trial-and-error techniques, as opposed to formal
mathematical rules. A heuristic algorithm should be able to judge whether
the problem is closer to a solution after each iteration [Soanes and Stevenson,
2008]. Heuristics could be used to improve the maze algorithm.
The maze algorithm evaluates a very large part of the search space. This
is largely due to the blind propagation mechanism that is employed, which
causes the algorithm to search in all directions regardless of where the target
is located.
A common sense approach to improve the efficiency of the algorithm
would be to incorporate some rule that says in which direction the algorithm
would most likely have to search in order to find a shortest path.
An example of such a rule is to search only in a direction that decreases
some estimate of distance to the target. Usually a shortest distance measure
is used as an estimate, not taking into account detours that are necessary
due to obstacles and other constraints. Examples of these distance measures
are the L1 norm and the L2 norm (as explained in the gray box).
A well-known algorithm that uses heuristics to efficiently find a best path
is the A∗ algorithm.

2.C.4 The A∗ Algorithm
The A∗ algorithm is often used in engineering applications as well as in the
game industry. In real-time-strategy games such as Command and Con-
quer, for example, units need to find a way to their target without too
many detours. Similar problems are encountered in wire harness routing in
the aerospace industry. The A∗ algorithm is a generalization of Dijkstra’s
algorithm [Dijkstra, 1959] (be sure to check out Dijkstra's manuscripts at the
E.W. Dijkstra Archive [Richards, 2008]).
Dijkstra’s algorithm is a so called best-first graph search algorithm. It is
a formal approach, in that it is guaranteed to yield a shortest path. At each
iteration the algorithm examines the closest vertices that have not been
examined yet. Thus it expands outward from the source until it reaches
the target, then backtracks. The algorithm is therefore similar to the maze
algorithm discussed earlier.
Dijkstra’s algorithm uses the actual distance to the source as a cost
function. A heuristic approach, on the other hand, would be to use an
estimate for the distance to the target as the cost function. This yields a
best-first approach, which is not guaranteed to find a shortest path, but is
generally much faster than Dijkstra's algorithm.
As long as there are no obstacles a best-first algorithm works pretty well,
but in the presence of obstacles it is likely to yield suboptimal paths. This is
mainly due to the fact that it does not take into account the actual distance
that has been travelled from the source to the current vertex.
The A∗ algorithm uses a combination of Dijkstra’s algorithm and a
heuristic, combining the best of both worlds: It can guarantee a shortest
path and it is guided by a heuristic to improve performance.
For each vertex n that is visited, the A∗ algorithm stores three values: the
actual distance traveled from the source g(n), the estimated distance to the
target h(n) (some heuristic), and the sum of these two f (n) = g(n) + h(n).
The algorithm also maintains a priority queue. The vertex highest in the
priority queue is the first to be examined next. The lower the value of f (n)
for a vertex, the higher its priority.
At each step of the algorithm, the node with the lowest f (n) value (i.e.
the highest priority) is removed from the queue. Then the f (n) and h(n) val-
ues of its neighbors are updated accordingly, and these neighbors are added
to the queue. The algorithm continues until a goal node has a lower f (n)
value than any node in the queue (or until the queue is empty) [Wikipedia,
2008].
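
A minimal A∗ sketch for the same kind of grid problem (made-up grid, orthogonal
moves, taxicab distance as the heuristic h(n)) could look as follows:

import heapq

def a_star(grid, source, target):
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - target[0]) + abs(n[1] - target[1])   # heuristic h(n)
    g_cost = {source: 0}                                          # distance travelled g(n)
    parent = {source: None}
    queue = [(h(source), source)]                                 # priority = f = g + h
    while queue:
        f, node = heapq.heappop(queue)
        if node == target:                                        # goal reached: backtrack
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                g_new = g_cost[node] + 1
                if g_new < g_cost.get(nxt, float('inf')):         # better route to nxt
                    g_cost[nxt] = g_new
                    parent[nxt] = node
                    heapq.heappush(queue, (g_new + h(nxt), nxt))
    return None                                                   # target unreachable

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
print(a_star(grid, source=(0, 0), target=(3, 3)))
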
An A∗ algorithm based on orthogonal movement would solve the example
problem as depicted in figure 2.C.6. When comparing this figure to the maze
solution, it becomes clear that the A∗ algorithm is much more efficient.
Figure 2.C.6: The vertices evaluated by an A∗ algorithm (by [Patel, 2008]) in order
to solve the simple routing problem from [van der Velden, 2008].

Many variations on the A∗ and other search algorithms exist, but we
will not discuss them here. The reader is referred to literature for more
information about those.
