Optimization Problems
Panos Parpas∗, Berç Rustem∗
December 2006
Abstract
The Laplace method has found many applications in the theoretical
and applied study of optimization problems. It has been used to study:
the asymptotic behavior of stochastic algorithms, ‘phase transitions’
in combinatorial optimization, and as a smoothing technique for non–
differentiable minimax problems. This article describes the theoretical
foundation and practical applications of this useful technique.
1 Background
Laplace’s method is based on an ingenious trick used by Laplace in one of
his papers [19]. The technique is most frequently used to perform the asymptotic
evaluation of integrals that depend on a scalar parameter t, as t tends to
infinity. Its use can be theoretically justified for integrals of the following
form:
$$I(t) = \int_A \exp\Big\{ \frac{-f(x)}{T(t)} \Big\}\, d\Lambda(x),$$
where f : R^n → R and T : R → R are assumed to be smooth, and T(t) → 0 as t
tends to ∞. A is some compact set, and Λ is some measure on B (the σ−field
generated by A).
generated by A). We know that since A is compact, the continuous function
f will have a global minimum in A. For simplicity, assume that the global
minimum x∗ is unique, and that it occurs in the interior of A. Under these
conditions, as t tends to infinity, only points in the immediate
neighborhood of x∗ contribute to the asymptotic expansion of I(t).
The heuristic argument presented above can be made precise. The complete
argument can be found in [2], and in [4]. Instead we give a heuristic but
didactic argument that is usually used when introducing the method.
∗ Department of Computing, Imperial College, London SW7 2AZ
1.1 Heuristic foundations of the method
For the purpose of this subsection only, assume that f is a function of one
variable, and that A is given by some interval [a, b]. It will be instructive to
give a justification of the method based on the one dimensional integral:
$$K(t) = \int_a^b \exp\Big\{ \frac{-f(x)}{t} \Big\}\, dx.$$
Suppose that f has a unique global minimum, say c, such that c ∈ (a, b).
For small t, only points near c need to be taken into account
when evaluating K(t). We therefore approximate K(t) by K(t; ε), given by:
$$K(t; \varepsilon) = \int_{c-\varepsilon}^{c+\varepsilon} \exp\Big\{ \frac{-f(x)}{t} \Big\}\, dx.$$
The limits of the integral above can be extended to infinity; this is
justified by the fact that only points around c contribute to the asymptotic
evaluation of the integral. Expanding f around c, and using f′(c) = 0 so that
f(x) ≈ f(c) + ½f′′(c)(x − c)², we obtain:
$$K(t; \varepsilon) \approx \exp\Big\{ \frac{-f(c)}{t} \Big\} \int_{-\infty}^{+\infty} \exp\Big\{ \frac{-f''(c)(x-c)^2}{2t} \Big\}\, dx = \exp\Big\{ \frac{-f(c)}{t} \Big\} \sqrt{\frac{2\pi t}{f''(c)}}.$$
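As a quick numerical sanity check of this approximation, one can compare a direct quadrature of K(t) with the closed-form estimate for a small t. The quadratic f below, the interval, and the value of t are all illustrative choices, not taken from the paper:

```python
import math

# Illustrative example (not from the paper): f has its unique global
# minimum at c = 1 inside [a, b] = [0, 3], with f''(c) = 2.
def f(x):
    return (x - 1.0) ** 2 + 2.0

a, b, c, fpp = 0.0, 3.0, 1.0, 2.0
t = 0.01

# Midpoint-rule quadrature of K(t) = int_a^b exp(-f(x)/t) dx.
n = 100_000
h = (b - a) / n
K_num = sum(math.exp(-f(a + (i + 0.5) * h) / t) for i in range(n)) * h

# Laplace estimate: exp(-f(c)/t) * sqrt(2*pi*t / f''(c)).
K_lap = math.exp(-f(c) / t) * math.sqrt(2.0 * math.pi * t / fpp)

print(K_num / K_lap)  # ratio close to 1 for small t
```

Because this f is exactly quadratic, the estimate agrees with the quadrature up to the (negligible) mass outside [a, b]; for general smooth f the relative error vanishes as t ↓ 0.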
2 Applications
Consider the following problem:
$$F^* = \min f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, l. \qquad (2.1)$$
Let S denote the feasible region of the problem above, and assume that it is
nonempty and compact. Define
$$c(t) \triangleq \int_S \exp\Big\{ \frac{-f(x)}{t} \Big\}\, d\Lambda = \int_{\mathbb{R}^n} \exp\Big\{ \frac{-f(x)}{t} \Big\} I_x(S)\, d\Lambda, \qquad (2.3)$$
where I_x(S) denotes the indicator function of S, and let
$$S^* = \{x \in S \mid f(x) = F^*\}$$
denote the set of global minimizers.
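The concentration of the normalized measure on S∗ can be illustrated numerically. The one-dimensional f, the constraints defining S, and the grid below are illustrative choices, not taken from the paper:

```python
import math

# Illustrative instance (not from the paper): S = [0.5, 2.0], encoded by
# constraints g1(x) = 0.5 - x <= 0 and g2(x) = x - 2 <= 0, with
# f(x) = (x - 1)^2, so that S* = {1} and F* = 0.
def f(x):
    return (x - 1.0) ** 2

def in_S(x):
    return 0.5 - x <= 0.0 and x - 2.0 <= 0.0

def mean_under_density(t, n=40_000, lo=-1.0, hi=3.0):
    # Discretize the density proportional to exp(-f(x)/t) * I_x(S)
    # and return its mean, which should approach x* = 1 as t -> 0.
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    w = [math.exp(-f(x) / t) if in_S(x) else 0.0 for x in xs]
    Z = sum(w)
    return sum(wi * x for wi, x in zip(w, xs)) / Z

for t in (1.0, 0.1, 0.001):
    print(t, mean_under_density(t))  # tends to 1.0 as t shrinks
```

The indicator I_x(S) simply kills all weight outside the feasible region, which is exactly the role it plays in (2.3).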
Often the only way to solve such problems is by using a stochastic method.
Deterministic methods are also available but are usually applicable to low di-
mensional problems. When designing stochastic methods for global optimiza-
tion, it is often the case that the algorithm can be analyzed as a stochastic
process. Then in order to analyze the behavior of the algorithm we can ex-
amine the asymptotic behavior of the stochastic process. In order to perform
this analysis we need to define a probability measure that has its support in
S ∗ . This strategy has been implemented in [16, 6, 9, 10, 8, 3, 7].
A well known method for obtaining a solution to an unconstrained opti-
mization problem is to consider the following Ordinary Differential Equation
(ODE):
dX(t) = −∇f (X(t))dt. (2.4)
By studying the behavior of X(t) for large t, it can be shown that X(t)
will eventually converge to a stationary point of the unconstrained problem.
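The Euler discretization of (2.4) is just gradient descent. The sketch below, with an illustrative one-dimensional double-well f that is not from the paper, shows the trajectory settling at the nearest stationary point, which here is a local but not global minimizer:

```python
# Euler discretization of dX(t) = -grad f(X(t)) dt for an illustrative
# double well f(x) = x^4 - 4x^2 + x (not from the paper).
# Its global minimizer is near x = -1.47; a local one is near x = 1.35.
def grad_f(x):
    return 4.0 * x ** 3 - 8.0 * x + 1.0

x, h = 2.0, 0.01        # start in the basin of the local minimizer
for _ in range(20_000):
    x -= h * grad_f(x)

print(x)  # approximately 1.35: trapped in the local minimum
```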
A review of so–called continuous–path methods can be found in [22]. More
recently, the application of this method to large scale problems was considered by
Liao et al. [13]. A deficiency of using (2.4) to solve optimization problems
is that it will get trapped in local minima. In order to allow the trajectory
to escape from local minima, it has been proposed by various authors (e.g.
[1, 3, 7, 12, 8, 16]) to add a stochastic term that would allow the trajectory
to “climb” hills. One possible augmentation to (2.4) that would enable us to
escape from local minima is to add noise. One then considers the diffusion
process:
$$dX(t) = -\nabla f(X(t))\, dt + \sqrt{2T(t)}\, dB(t), \qquad (2.5)$$
where B(t) is the standard Brownian motion in R^n. It has been shown in
[3, 7, 8], under appropriate conditions on f, that if the annealing schedule is
chosen as follows:
$$T(t) \triangleq \frac{c}{\log(2+t)}, \quad \text{for some } c \ge c_0, \qquad (2.6)$$
then X(t) converges in probability to the set of global minimizers of f.
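A minimal Euler–Maruyama simulation of a diffusion of the form (2.5) with a logarithmic schedule like (2.6) might look as follows; the double-well f, the constant c, the step size, and the horizon are all illustrative choices, not values from the paper:

```python
import math
import random

# Illustrative double well (not from the paper): global minimum near
# x = -1.47, local minimum near x = 1.35.
def grad_f(x):
    return 4.0 * x ** 3 - 8.0 * x + 1.0

def temperature(t, c=1.0):
    # Annealing schedule of the form (2.6): T(t) = c / log(2 + t).
    return c / math.log(2.0 + t)

def simulate(x0=2.0, h=0.01, n_steps=20_000, c=1.0, seed=0):
    # Euler-Maruyama discretization of
    # dX = -grad f(X) dt + sqrt(2 T(t)) dB(t).
    rng = random.Random(seed)
    x = x0
    for k in range(n_steps):
        t = k * h
        noise = math.sqrt(2.0 * temperature(t, c) * h) * rng.gauss(0.0, 1.0)
        x += -h * grad_f(x) + noise
    return x

x_final = simulate()
print(x_final)  # hovers near one of the two wells; the slow logarithmic
                # cooling is what allows escapes from local minima
```

Unlike the noiseless discretization of (2.4), this trajectory can "climb" the barrier between the two wells, which is precisely the point of adding the stochastic term.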
2.2 Phase Transitions in Combinatorial Optimization
The aim in combinatorial optimization is to select from a finite set of configu-
rations of the system, the one that minimizes an objective function. The most
famous combinatorial problem is the Travelling Salesman Problem (TSP). A
large part of theoretical computer science is concerned with estimating the
complexity of combinatorial problems. Loosely speaking, the aim of com-
putational complexity theory is to classify problems in terms of their degree
of difficulty. One measure of complexity is time complexity, and worst case
time complexity has been the aspect that received most attention. We re-
fer the interested reader to [15] for results in this direction. We will briefly
summarize results that have to do with average time complexity, the Laplace
method, and phase transitions.
Most of complexity theory is concerned with worst case complexity. How-
ever, many useful methods (e.g. the simplex method) will require an expo-
nential amount of time to converge only in pathological cases. It is therefore
of great interest to estimate average case complexity. The physics community
has recently proposed the use of tools from statistical mechanics as one way
of estimating average case complexity. A review in the form of a tutorial can
be found in [14]. Here we briefly sketch the main ideas.
The first step in the statistical mechanics approach is to define a prob-
ability measure on the configuration of the system. This definition is done
with the Boltzmann density:
$$p_t(C) = \frac{\exp\{-\frac{1}{t} f(C)\}}{\sum_{C'} \exp\{-\frac{1}{t} f(C')\}}.$$
The preceding equation is of course the discrete version of (2.7). Using the
above definition, the average value of the objective function is given by:
$$\langle f \rangle_t = \sum_C p_t(C)\, f(C).$$
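For a toy configuration space (the configurations and objective values below are illustrative, not from the paper), the Boltzmann weights and the average ⟨f⟩_t can be computed directly; as t ↓ 0 the measure concentrates on the minimizing configuration and the average approaches min_C f(C):

```python
import math

# Illustrative finite configuration space (not from the paper):
# each configuration C has an objective value f(C).
f_vals = {"C1": 3.0, "C2": 1.0, "C3": 4.0, "C4": 1.5}

def boltzmann(t):
    # p_t(C) = exp(-f(C)/t) / sum_{C'} exp(-f(C')/t)
    w = {C: math.exp(-fc / t) for C, fc in f_vals.items()}
    Z = sum(w.values())
    return {C: wc / Z for C, wc in w.items()}

def avg_f(t):
    # <f>_t = sum_C p_t(C) f(C)
    p = boltzmann(t)
    return sum(p[C] * f_vals[C] for C in f_vals)

for t in (10.0, 1.0, 0.05):
    print(t, avg_f(t))  # decreases toward min f = 1.0 as t shrinks
```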
2.3 Worst Case Optimization
In many areas where optimization methods can be fruitfully applied, worst
case analysis can provide considerable insight into the decision process. The
fundamental tool for worst case analysis is the continuous minimax problem:
min Φ(x),
x∈X
where Φ(x) = maxy∈Y f (x, y). The continuous minimax problem arises in
numerous disciplines, including n–person games, finance, economics and policy
optimization (see [18] for a review). In general, it is used by the
decision maker to assess the worst–case strategy of the opponent and compute
the optimal response. The opponent can also be interpreted as nature
choosing the worst–case value of the uncertainty, and the solution would be
the strategy which ensures the optimal response to the worst–case. Neither
the robust decision maker nor the opponent would benefit by deviating
unilaterally from this strategy. The solution can be characterized as a saddle
point when f(·, y) is convex in x and f(x, ·) is concave in y. A survey of
algorithms for computing saddle points can be found in [5, 18].
Evaluating Φ(x) is extremely difficult, because global optimization
over Y is required. Moreover, this function will in general be non-differentiable.
For this reason, it has been suggested by many researchers
(e.g. [17, 21]) to approximate Φ(x) with the smooth function Φ(x; t) given by:
$$\Phi(x; t) = \int_Y \exp\Big\{ \frac{f(x, y)}{t} \Big\}\, dy.$$
This is of course another application of the Laplace method, and it can easily
be seen that:
$$\lim_{t \downarrow 0} t \ln \Phi(x; t) = \Phi(x).$$
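This smoothing can be checked numerically. The inner function f, the set Y, and the grid below are illustrative choices, not from the paper; the convention used is Φ(x; t) = ∫_Y exp{f(x, y)/t} dy, so that t ln Φ(x; t) recovers the maximum over y as t ↓ 0. A log-sum-exp shift keeps the exponentials from overflowing:

```python
import math

# Illustrative inner function (not from the paper): for fixed x,
# Phi(x) = max over y in [0, 2] of f(x, y).
def f(x, y):
    return math.sin(x + y) - 0.5 * (y - 1.0) ** 2

def smoothed_phi(x, t, n=2000, y_lo=0.0, y_hi=2.0):
    # t * ln Phi(x; t) with Phi(x; t) = int_Y exp{f(x, y)/t} dy,
    # evaluated on a midpoint grid with a log-sum-exp shift.
    h = (y_hi - y_lo) / n
    ys = [y_lo + (i + 0.5) * h for i in range(n)]
    m = max(f(x, y) for y in ys)
    s = sum(math.exp((f(x, y) - m) / t) for y in ys) * h
    return m + t * math.log(s)

x = 0.3
grid_max = max(f(x, 0.0 + (i + 0.5) * (2.0 / 2000)) for i in range(2000))
for t in (1.0, 0.1, 0.001):
    print(t, smoothed_phi(x, t))  # approaches the maximum over y as t -> 0
```

Note that the smoothed function is differentiable in x for every t > 0, which is what makes it attractive as a surrogate for the non-differentiable Φ(x).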
This idea has been implemented in [17, 21] with considerable success.
References
[1] F. Aluffi-Pentini, V. Parisi, and F. Zirilli. Global optimization and
stochastic differential equations. J. Optim. Theory Appl., 47(1):1–16,
1985.
[2] Carl M. Bender and Steven A. Orszag. Advanced mathematical meth-
ods for scientists and engineers. I. Springer-Verlag, New York, 1999.
Asymptotic methods and perturbation theory, Reprint of the 1978 orig-
inal.
[3] T.S. Chiang, C.R. Hwang, and S.J. Sheu. Diffusion for global optimiza-
tion in Rn . SIAM J. Control Optim., 25(3):737–753, 1987.
[6] S.B. Gelfand and S.K. Mitter. Recursive stochastic algorithms for global
optimization in Rd . SIAM J. Control Optim., 29(5):999–1018, 1991.
[7] S. Geman and C.R. Hwang. Diffusions for global optimization. SIAM
J. Control Optim., 24(5):1031–1043, 1986.
[13] L.-Z. Liao, L. Qi, and H. W. Tam. A gradient-based continuous method
for large-scale optimization problems. J. Global Optim., 31(2):271, 2005.
[15] Christos H. Papadimitriou. Computational complexity. Addison-Wesley
Publishing Company, Reading, MA, 1994.
[18] Berç Rustem and Melendres Howe. Algorithms for worst-case design and
applications to risk management. Princeton University Press, Princeton,
NJ, 2002.
[21] Song Xu. Smoothing method for minimax problems. Comput. Optim.
Appl., 20(3):267–279, 2001.