
Applications of the Laplace Method to

Optimization Problems
Panos Parpas∗, Berç Rustem∗
December 2006

Abstract
The Laplace method has found many applications in the theoretical
and applied study of optimization problems. It has been used to study the
asymptotic behavior of stochastic algorithms and ‘phase transitions’ in
combinatorial optimization, and as a smoothing technique for non–differentiable
minimax problems. This article describes the theoretical foundation and
practical applications of this useful technique.

1 Background
Laplace’s method is based on an ingenious trick used by Laplace in one of his
papers [19]. The technique is most frequently used to perform the asymptotic
evaluation of integrals that depend on a scalar parameter t, as t tends to
infinity. Its use can be theoretically justified for integrals of the following
form:
I(t) = ∫_A exp{−f(x)/T(t)} dΛ(x),
where f : Rⁿ → R and T : R → R are assumed to be smooth, with T(t) → 0 as t
tends to ∞. Here A is some compact set, and Λ is some measure on B (the σ-field
generated by A). Since A is compact, the continuous function f attains a global
minimum on A. For simplicity, assume that the global minimum x∗ is unique and
that it lies in the interior of A. Under these conditions, as t tends to
infinity, only points in the immediate neighborhood of x∗ contribute to the
asymptotic expansion of I(t).
The heuristic argument presented above can be made precise; the complete
argument can be found in [2] and in [4]. Instead of reproducing it here, we
give the heuristic but didactic argument that is usually used when introducing
the method.

∗Department of Computing, Imperial College, London SW7 2AZ

1.1 Heuristic foundations of the method
For the purpose of this subsection only, assume that f is a function of one
variable, and that A is given by some interval [a, b]. It will be instructive to
give a justification of the method based on the one dimensional integral:
K(t) = ∫_a^b exp{−f(x)/t} dx.

Suppose that f has a unique global minimum, say c, such that c ∈ (a, b). Here
the parameter t plays the role of T(t) above, so the regime of interest is
small t; as t ↓ 0, only points near c contribute appreciably to K(t). We
therefore approximate K(t) by K(t; ε), where the latter quantity is given by:

K(t; ε) = ∫_{c−ε}^{c+ε} exp{−f(x)/t} dx.

Expanding f to second order about c, and noting that f′(c) = 0, we obtain the
following approximation:

K(t; ε) ≈ ∫_{c−ε}^{c+ε} exp{−[f(c) + ½ f′′(c)(x − c)²]/t} dx

        = exp{−f(c)/t} ∫_{c−ε}^{c+ε} exp{−f′′(c)(x − c)²/(2t)} dx.

The limits of the integral above can be extended to infinity. This extension
is justified by the fact that only points around c contribute to the asymptotic
evaluation of the integral:

K(t; ε) ≈ exp{−f(c)/t} ∫_{−∞}^{+∞} exp{−f′′(c)(x − c)²/(2t)} dx

        = exp{−f(c)/t} √(2πt/f′′(c)).

In conclusion, we obtain the asymptotic relation:

K(t) ∼ exp{−f(c)/t} √(2πt/f′′(c))   as t ↓ 0.

Rigorous justifications of the above arguments can be found in [4]. These
types of results are standard in the field of asymptotic analysis. The same
ideas can be applied to optimization problems.
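As a quick numerical check of this asymptotic relation (an illustration added
here, not part of the original analysis), the following Python sketch compares
a quadrature evaluation of K(t) with the closed-form Laplace estimate for an
arbitrarily chosen test function with a unique interior minimum:

import numpy as np
from scipy.integrate import quad

# Test function on [a, b] = [0, 2] with unique interior minimum at c = 0.5,
# where f(c) = 0 and f''(c) = 2 (an arbitrary illustrative choice).
f = lambda x: (x - 0.5) ** 2
a, b, c, f_cc = 0.0, 2.0, 0.5, 2.0

for t in [0.5, 0.1, 0.02, 0.004]:
    exact, _ = quad(lambda x: np.exp(-f(x) / t), a, b, points=[c])  # K(t) by quadrature
    laplace = np.exp(-f(c) / t) * np.sqrt(2.0 * np.pi * t / f_cc)   # Laplace estimate
    print(t, exact, laplace, exact / laplace)

The printed ratio tends to 1 as t decreases, which is exactly the content of
the asymptotic relation above.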

2 Applications
Consider the following problem:

F∗ = min f(x)
     s.t. g_i(x) ≤ 0,   i = 1, . . . , l.        (2.1)

Let S denote the feasible region of the problem above, and assume that it is
nonempty and compact. Then:

lim_{t↓0} −t ln c(t) = F∗,        (2.2)

where

c(t) ≜ ∫_S exp{−f(x)/t} dΛ = ∫_{Rⁿ} exp{−f(x)/t} I_S(x) dΛ,        (2.3)

and I_S denotes the indicator function of S. Λ is any measure on (Rⁿ, B). A
proof of (2.2) can be found in [16].
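To illustrate (2.2)–(2.3) numerically (a sketch with an arbitrarily chosen
objective and feasible set, not an example taken from [16]), one can estimate
c(t) by Monte Carlo integration over a box containing S and observe −t ln c(t)
approaching F∗ as t decreases:

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative problem: minimize f over {x : g(x) <= 0}.
# The unconstrained minimizer (1, -0.5) is feasible, so F* = 0.
f = lambda x: (x[:, 0] - 1.0) ** 2 + (x[:, 1] + 0.5) ** 2
g = lambda x: x[:, 0] + x[:, 1] - 2.0

# Uniform samples from a box [-3, 3]^2 containing S; Lambda is Lebesgue measure.
N = 200_000
x = rng.uniform(-3.0, 3.0, size=(N, 2))
in_S = g(x) <= 0.0          # indicator I_S(x)
vol = 6.0 ** 2              # volume of the sampling box

for t in [1.0, 0.1, 0.01]:
    c_t = vol * np.mean(np.exp(-f(x) / t) * in_S)   # Monte Carlo estimate of (2.3)
    print(t, -t * np.log(c_t))                      # tends to F* = 0 as t decreases

The estimate becomes noisier for very small t, because the integrand then
concentrates on a shrinking neighborhood of the minimizer.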


The integral in (2.3) can be evaluated using the Laplace method. The
link between the Laplace method and optimization has been explored in:

• Stochastic methods for global optimization.

• Phase transitions in combinatorial optimization.

• Algorithms for worst case analysis.

These application areas will be explored next.

2.1 Stochastic Methods for Global Optimization


Global optimization is concerned with the computation of global solutions
of (2.1). In other words, one seeks to compute F∗ and, if possible, to obtain
points from the following set:

S ∗ = {x ∈ S | f (x) = F ∗ }.

Often the only way to solve such problems is by using a stochastic method.
Deterministic methods are also available but are usually applicable to low di-
mensional problems. When designing stochastic methods for global optimiza-
tion, it is often the case that the algorithm can be analyzed as a stochastic

process. The behavior of the algorithm can then be analyzed by examining the
asymptotic behavior of the stochastic process. To perform this analysis, we
need to define a probability measure that has its support in S∗. This strategy
has been implemented in [16, 6, 9, 10, 8, 3, 7].
A well known method for obtaining a solution to an unconstrained opti-
mization problem is to consider the following Ordinary Differential Equation
(ODE):
dX(t) = −∇f (X(t))dt. (2.4)
By studying the behavior of X(t) for large t, it can be shown that X(t)
will eventually converge to a stationary point of the unconstrained problem.
A review of so-called continuous-path methods can be found in [22]. More
recently, the application of this method to large-scale problems was considered
by Liao et al. [13]. A deficiency of using (2.4) to solve optimization problems
is that the trajectory will get trapped in local minima. To allow the
trajectory to escape from local minima, various authors (e.g. [1, 3, 7, 12, 8,
16]) have proposed adding a stochastic term that allows the trajectory to
“climb” hills. One then considers the diffusion process:

dX(t) = −∇f(X(t))dt + √(2T(t)) dB(t),        (2.5)
where B(t) denotes standard Brownian motion in Rⁿ. It has been shown in
[3, 7, 8], under appropriate conditions on f, that if the annealing schedule is
chosen as

T(t) ≜ c / log(2 + t),   for some c ≥ c₀,        (2.6)

where c₀ is a positive constant whose exact value is problem dependent, then
as t → ∞ the transition probability of X(t) converges weakly to a probability
measure Π. The latter has its support on the set of global minimizers. A
characterization of Π was given by Hwang in [11], who showed that Π is the weak
limit of the following, so-called Boltzmann density:

p(t, x) = exp{−f(x)/T(t)} [ ∫_{Rⁿ} exp{−f(x)/T(t)} dx ]^{−1}.        (2.7)

A discussion of the conditions for the existence of Π can be found in [11],
along with a description of Π in terms of the Hessian of f. Extensions of
these results to constrained optimization problems appear in [16].
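To make the construction concrete, the following Python sketch simulates a
simple Euler–Maruyama discretization of the diffusion (2.5) with the
logarithmic schedule (2.6). The objective, step size, and schedule constant are
arbitrary illustrative choices; in particular, the constant c below is not the
problem-dependent c₀ required by the theory.

import numpy as np

rng = np.random.default_rng(1)

# One-dimensional objective with a local minimum near x = -0.77 and the
# global minimum near x = 1.4 (coefficients chosen purely for illustration).
f      = lambda x: x**4 - x**3 - 2.0 * x**2 + 0.5 * x
grad_f = lambda x: 4.0 * x**3 - 3.0 * x**2 - 4.0 * x + 0.5

c, h = 2.0, 1e-3                      # schedule constant and step size
T = lambda t: c / np.log(2.0 + t)     # annealing schedule (2.6)

x, t = -1.0, 0.0                      # start in the basin of the local minimum
for _ in range(100_000):
    noise = np.sqrt(2.0 * T(t) * h) * rng.standard_normal()
    x = x - h * grad_f(x) + noise     # Euler-Maruyama step for (2.5)
    t += h

print(x)   # with high probability x has migrated to the global basin near 1.4

Running the same loop with the noise term removed reproduces a discretization
of the gradient flow (2.4), which simply converges to the local minimizer near
x = -0.77.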

2.2 Phase Transitions in Combinatorial Optimization
The aim in combinatorial optimization is to select, from a finite set of
configurations of the system, the one that minimizes an objective function. The most
famous combinatorial problem is the Travelling Salesman Problem (TSP). A
large part of theoretical computer science is concerned with estimating the
complexity of combinatorial problems. Loosely speaking, the aim of com-
putational complexity theory is to classify problems in terms of their degree
of difficulty. One measure of complexity is time complexity, and worst case
time complexity has been the aspect that received most attention. We re-
fer the interested reader to [15] for results in this direction. We will briefly
summarize results that have to do with average time complexity, the Laplace
method, and phase transitions.
Most of complexity theory is concerned with worst case complexity. How-
ever, many useful methods (e.g. the simplex method) will require an expo-
nential amount of time to converge only in pathological cases. It is therefore
of great interest to estimate average case complexity. The physics community
has recently proposed the use of tools from statistical mechanics as one way
of estimating average case complexity. A review in the form of a tutorial can
be found in [14]. Here we just briefly adumbrate the main ideas.
The first step in the statistical mechanics approach is to define a probability
measure on the configurations of the system. This is done with the Boltzmann
density:

p_t(C) = exp{−f(C)/t} / Σ_{C′} exp{−f(C′)/t}.

The preceding equation is of course the discrete version of (2.7). Using the
above definition, the average value of the objective function is given by:

⟨f_t⟩ = Σ_C p_t(C) f(C).
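As a toy illustration (an arbitrary five-configuration example, not drawn from
[14]), the Boltzmann probabilities and the average ⟨f_t⟩ can be computed
directly and seen to concentrate on the minimizing configuration as t decreases:

import numpy as np

# Objective values of five hypothetical configurations C_0, ..., C_4;
# the minimum is f(C_1) = 1.0.
f_C = np.array([3.0, 1.0, 4.0, 1.5, 2.0])

for t in [5.0, 1.0, 0.1, 0.01]:
    w = np.exp(-f_C / t)
    p = w / w.sum()              # Boltzmann probabilities p_t(C)
    avg_f = np.sum(p * f_C)      # <f_t> = sum over C of p_t(C) f(C)
    print(t, p.round(3), avg_f)  # p_t concentrates on C_1 and <f_t> -> 1.0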

Tools and techniques of statistical mechanics can be used to calculate
‘computational phase transitions’. A computational phase transition is an abrupt
change in the computational effort required to solve a combinatorial opti-
mization problem. It is beyond the scope of this article to elaborate on this
interesting area of optimization. We refer the interested reader to the review
in [14]. The book of Talagrand [20] presents some rigorous results on this
subject.

2.3 Worst Case Optimization
In many areas where optimization methods can be fruitfully applied, worst
case analysis can provide considerable insight into the decision process. The
fundamental tool for worst case analysis is the continuous minimax problem:
min_{x∈X} Φ(x),

where Φ(x) = max_{y∈Y} f(x, y). The continuous minimax problem arises in
numerous disciplines, including n–person games, finance, economics and pol-
icy optimization (see [18] for a review). In general, they are used by the
decision maker to assess the worst–case strategy of the opponent and com-
pute the optimal response. The opponent can also be interpreted as nature
choosing the worst–case value of the uncertainty, and the solution would be
the strategy which ensures the optimal response to the worst–case. Neither
the robust decision maker nor the opponent would benefit by deviating uni-
laterally from this strategy. The solution can be characterized as a saddle
point when f(·, y) is convex in x and f(x, ·) is concave in y. A survey of
algorithms for computing saddle points can be found in [5, 18].
Evaluating Φ(x) is extremely difficult due to the fact that global opti-
mization is required over Y . Moreover, this function will in general be non–
differentiable. For this reason, it has been suggested by many researchers
(e.g. [17, 21]) to approximate Φ(x) with Φ(x; t) given by:
Φ(x; t) = ∫_Y exp{f(x, y)/t} dy.

This is of course another application of the Laplace method, and it can easily
be seen that:

lim_{t↓0} t ln Φ(x; t) = Φ(x).

This idea has been implemented in [17, 21] with considerable success.
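As a small numerical sketch (with an arbitrary bivariate f and a simple grid
discretization of Y; this is not the adaptive-smoothing algorithm of [17] or
the method of [21]), one can check that t ln Φ(x; t) approaches
Φ(x) = max_{y∈Y} f(x, y) as t decreases:

import numpy as np

# Arbitrary illustrative function; Y = [-1, 1] discretized on a fine grid.
f = lambda x, y: -(x - y) ** 2 + np.sin(3.0 * y)
y_grid = np.linspace(-1.0, 1.0, 2001)
dy = y_grid[1] - y_grid[0]

x = 0.3
phi_x = np.max(f(x, y_grid))   # Phi(x) = max over y of f(x, y), on the grid

for t in [1.0, 0.1, 0.01, 0.001]:
    v = f(x, y_grid) / t
    # ln Phi(x; t), where Phi(x; t) = integral over Y of exp{f(x, y)/t} dy,
    # computed stably with the log-sum-exp trick.
    log_phi_t = v.max() + np.log(np.sum(np.exp(v - v.max())) * dy)
    print(t, t * log_phi_t, phi_x)   # t * ln Phi(x; t) -> Phi(x) as t -> 0

The smoothed value t ln Φ(x; t) is differentiable in x for t > 0, which is
precisely what makes this construction attractive for minimax algorithms.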

References
[1] F. Aluffi-Pentini, V. Parisi, and F. Zirilli. Global optimization and stochastic differential equations. J. Optim. Theory Appl., 47(1):1–16, 1985.

[2] Carl M. Bender and Steven A. Orszag. Advanced mathematical methods for scientists and engineers. I. Springer-Verlag, New York, 1999. Asymptotic methods and perturbation theory, Reprint of the 1978 original.

[3] T.S. Chiang, C.R. Hwang, and S.J. Sheu. Diffusion for global optimization in Rⁿ. SIAM J. Control Optim., 25(3):737–753, 1987.

[4] N. G. de Bruijn. Asymptotic methods in analysis. Dover Publications Inc., New York, third edition, 1981.

[5] V. F. Dem'yanov and V. N. Malozëmov. Introduction to minimax. Dover Publications Inc., New York, 1990. Translated from the Russian by D. Louvish, Reprint of the 1974 edition.

[6] S.B. Gelfand and S.K. Mitter. Recursive stochastic algorithms for global optimization in R^d. SIAM J. Control Optim., 29(5):999–1018, 1991.

[7] S. Geman and C.R. Hwang. Diffusions for global optimization. SIAM J. Control Optim., 24(5):1031–1043, 1986.

[8] B. Gidas. The Langevin equation as a global minimization algorithm. In Disordered systems and biological organization (Les Houches, 1985), volume 20 of NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci., pages 321–326. Springer, Berlin, 1986.

[9] B. Gidas. Simulations and global optimization. In Random media (Minneapolis, Minn., 1985), volume 7 of IMA Vol. Math. Appl., pages 129–145. Springer, New York, 1987.

[10] B. Gidas. Metropolis-type Monte Carlo simulation algorithms and simulated annealing. In Topics in contemporary probability and its applications, Probab. Stochastics Ser., pages 159–232. CRC, Boca Raton, FL, 1995.

[11] C.R. Hwang. Laplace's method revisited: weak convergence of probability measures. Ann. Probab., 8(6):1177–1182, 1980.

[12] H. J. Kushner. Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: global minimization via Monte Carlo. SIAM J. Appl. Math., 47(1):169–185, 1987.

[13] L.-Z. Liao, L. Qi, and H. W. Tam. A gradient-based continuous method for large-scale optimization problems. Journal of Global Optimization, 31(2):271, 2005.

[14] O.C. Martin, R. Monasson, and R. Zecchina. Statistical mechanics methods and phase transitions in optimization problems. Theoret. Comput. Sci., 265(1-2):3–67, 2001.

[15] Christos H. Papadimitriou. Computational complexity. Addison-Wesley Publishing Company, Reading, MA, 1994.

[16] P. Parpas, B. Rustem, and E. Pistikopoulos. Linearly constrained global optimization and stochastic differential equations. Journal of Global Optimization, 36(2):191–217, October 2006.

[17] E. Polak, J. O. Royset, and R. S. Womersley. Algorithms with adaptive smoothing for finite minimax problems. J. Optim. Theory Appl., 119(3):459–484, 2003.

[18] Berç Rustem and Melendres Howe. Algorithms for worst-case design and applications to risk management. Princeton University Press, Princeton, NJ, 2002.

[19] Stephen M. Stigler. Laplace's 1774 memoir on inverse probability. Statist. Sci., 1(3):359–378, 1986.

[20] Michel Talagrand. Spin glasses: a challenge for mathematicians, volume 46 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics. Springer-Verlag, Berlin, 2003. Cavity and mean field models.

[21] Song Xu. Smoothing method for minimax problems. Comput. Optim. Appl., 20(3):267–279, 2001.

[22] F. Zirilli. The use of ordinary differential equations in the solution of nonlinear systems of equations. In Nonlinear optimization, 1981 (Cambridge, 1981), NATO Conf. Ser. II: Systems Sci., pages 39–46. Academic Press, London, 1982.
