I. Introduction
Finding the global minimum of an objective function can be a hard optimization problem, since the function may have many local minima. A procedure for solving such problems should sample values of the objective function in such a way as to have a high probability of finding a near-optimal solution, and it should lend itself to efficient implementation. These criteria are met by the Simulated Annealing method, which was introduced by Kirkpatrick et al. and independently by Cerny in the early 1980s.
Simulated Annealing (SA) is a stochastic computational technique derived from statistical mechanics for finding
near globally-minimum-cost solutions to large optimization problems [1].
Statistical mechanics is the study of the behavior of large systems of interacting components, such as atoms in a fluid, in thermal equilibrium at a finite temperature. If the system is in thermal equilibrium, the probability that it moves from one energy state to another is given by the Boltzmann distribution, which depends on the system temperature and on the magnitude of the proposed energy change: higher temperatures allow random changes, while low temperatures tend to allow only changes that decrease the energy.
In order to reach a low-energy state, one must use an annealing process, which consists of elevating the system temperature and then lowering it gradually, spending enough time at each temperature to reach thermal equilibrium.
In contrast to many classical optimization methods, this one is not based on gradients and does not converge deterministically: the same seed and parameter configuration may make the algorithm converge to a different solution from one run to another. This is due to the random way in which it decides the steps it takes towards the final candidate solution.
Such behavior may not yield the most precise solution, but it has other exploitable advantages: the algorithm can get unstuck from local optima while it is in a high-energy state, it can deal with noisy objective functions, and it can be used for combinatorial/discrete optimization, among others. All of this with a small number of iterations/function evaluations compared to other optimization methods. When applicable, the SA algorithm can be alternated with other methods to increase the accuracy of the final solution.
In this document the basic theory of this algorithm is explained and some of its benefits are verified with practical
examples.
P(c, n) = 1 / (1 + e^(ΔE/T))    (1)
Where ΔE is the change in the objective function value from c to n, and T is the current value of the algorithm's temperature parameter. Notice that this function tends to 0.5 when T >> ΔE, and that it varies from ~0 to ~1 otherwise. Therefore, the higher the temperature, the more random the algorithm becomes (which matches its natural model), and every candidate gets the same chance of being chosen or rejected.
When T << 1, the behavior of P(c, n) becomes very similar to a time-inverted unit step function: its value goes rapidly to 0 when ΔE > 0 and to 1 when ΔE < 0. This means that, when the temperature is very low, only good candidates will be chosen and all bad candidates will be rejected. This behavior is illustrated in Figure 1 for different values of T.
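The acceptance rule in (1) can be sketched in a few lines of Python (a minimal illustration, not the MATLAB implementation used for the experiments):

```python
import math

def acceptance_probability(delta_e, temperature):
    """Probability of accepting a step with objective change delta_e
    at the given temperature, following equation (1):
    P = 1 / (1 + exp(delta_e / T))."""
    return 1.0 / (1.0 + math.exp(delta_e / temperature))

# High temperature: any candidate is accepted or rejected with ~50% chance.
print(acceptance_probability(delta_e=0.1, temperature=1000.0))   # ~0.5

# Low temperature: improvements (delta_e < 0) are almost surely accepted,
# deteriorations (delta_e > 0) almost surely rejected.
print(acceptance_probability(delta_e=-1.0, temperature=0.01))    # ~1.0
print(acceptance_probability(delta_e=1.0, temperature=0.01))     # ~0.0
```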
The values of T are determined by a custom function of time (or iterations) called the cooling function or temperature profile. Changing the temperature profile can lead to different properties of the SA. For example, a simple decreasing ramp may perform differently from a slower transition function, and a periodic temperature profile opens the possibility of abandoning a local optimum to find a better one.
The graphs in Figure 2 show different temperature profiles with the properties mentioned in the previous paragraph. The best choice depends on the characteristics of each problem and on the strategy that the engineer intends to use.
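As an illustration, the three kinds of profile mentioned above can be sketched as follows (the start/end temperatures and lengths are arbitrary choices, not the ones used in the experiments):

```python
import math

def ramp_profile(n_steps, t_start=10.0, t_end=0.01):
    """Simple decreasing ramp: temperature falls linearly with the iteration."""
    return [t_start + (t_end - t_start) * i / (n_steps - 1) for i in range(n_steps)]

def smooth_profile(n_steps, t_start=10.0, t_end=0.01):
    """Slower sigmoid-like transition between the start and end temperatures."""
    out = []
    for i in range(n_steps):
        x = 10.0 * i / (n_steps - 1) - 5.0          # map iteration index to [-5, 5]
        out.append(t_end + (t_start - t_end) / (1.0 + math.exp(x)))
    return out

def sawtooth_profile(n_steps, n_cycles=3, t_start=10.0):
    """Periodic profile: the ramp restarts n_cycles times, letting the search
    regain mobility and escape local optima after it has settled."""
    per_cycle = n_steps // n_cycles
    one_cycle = [t_start * (1.0 - i / per_cycle) for i in range(per_cycle)]
    return (one_cycle * n_cycles)[:n_steps]
```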
The pseudo-code can be consulted in TABLE 1 [2].
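A minimal Python rendering of that pseudo-code might look like this; the neighbor-generation rule (a uniform step scaled by the current temperature) is an assumption for illustration, not necessarily the one used in the experiments:

```python
import math
import random

def simulated_annealing(objective, seed, temperatures, step_size=1.0):
    """Minimal SA loop. `temperatures` is the cooling profile (all values > 0);
    each element produces one candidate and one objective evaluation.
    Acceptance follows equation (1): P = 1 / (1 + exp(delta_e / T))."""
    current = list(seed)
    current_value = objective(current)
    best, best_value = list(current), current_value
    for t in temperatures:
        # Propose a random neighbor; scaling the step by the temperature
        # makes the search coarse while hot and fine while cold.
        candidate = [x + random.uniform(-1.0, 1.0) * step_size * t for x in current]
        delta_e = objective(candidate) - current_value
        exponent = min(delta_e / t, 50.0)  # clamp to avoid math overflow
        if random.random() < 1.0 / (1.0 + math.exp(exponent)):
            current, current_value = candidate, current_value + delta_e
        if current_value < best_value:
            best, best_value = list(current), current_value
    return best, best_value
```

Returning the best point seen along the whole search, rather than the final one, is what allows a periodic profile to wander away from a good candidate without losing it.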
III. Practical examples
In this section the performance of the SA algorithm is studied and compared against the Conjugate Gradients (FR) and the Nelder-Mead (native to MATLAB) optimization algorithms. This is done across a collection of problems.
y = (x1 - 6)^2 + (x2 - 4.5)^4 / 25    (2)
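Reading (2) as y = (x1 - 6)^2 + (x2 - 4.5)^4 / 25, a quick numerical check confirms the analytic minimum of the Bowl function at [6, 4.5]:

```python
def bowl(x1, x2):
    """Bowl function, read from equation (2) as (x1-6)^2 + (x2-4.5)^4 / 25."""
    return (x1 - 6.0) ** 2 + (x2 - 4.5) ** 4 / 25.0

# The analytic minimum [6, 4.5] evaluates to 0; nearby points barely move y,
# which is why small errors in x2 cost almost nothing.
print(bowl(6.0, 4.5))    # 0.0
print(bowl(6.0, 4.43))   # ~9.6e-07
```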
In order to assess the Simulated Annealing algorithm's performance, it is compared with the Conjugate Gradients FR and the Nelder-Mead algorithms. The results of applying them to the Bowl function are shown in TABLE 3. Also, as a visual aid on how Simulated Annealing chooses its next steps, see Figure 3.
With the purpose of exploring the behavior far away from the optimum point, let us run the algorithm using the point [-20, 60] as the seed value. The results can be checked in TABLE 4, and the evolution of the Simulated Annealing algorithm can be observed in Figure 4.
y = 10 - sin(4x1/13) · cos(4x2/13) · e^(…)    (3)
Let us also limit the solution space to the range 0 to 12 for both x1 and x2. In this scenario the analytic solution occurs at [11.4285, 9.8035]. Such limits are incorporated into the problem with penalty functions, which increase the value of the objective as it leaves the ranges of interest.
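A penalty wrapper of this kind can be sketched as follows; the quadratic penalty shape and its weight are illustrative assumptions, since the exact punishment used in the experiments is not specified:

```python
def penalized(objective, lower=0.0, upper=12.0, weight=100.0):
    """Wrap an objective so that leaving the box [lower, upper] in any
    coordinate adds a quadratic penalty growing with the violation.
    The quadratic shape and the weight are illustrative choices."""
    def wrapped(point):
        penalty = 0.0
        for x in point:
            if x < lower:
                penalty += weight * (lower - x) ** 2
            elif x > upper:
                penalty += weight * (x - upper) ** 2
        return objective(point) + penalty
    return wrapped

# Inside the box the objective is untouched; outside it grows rapidly.
f = penalized(lambda p: sum(v * v for v in p))
print(f([1.0, 1.0]))    # 2.0
print(f([13.0, 1.0]))   # 169 + 1 + 100*(13-12)^2 = 270.0
```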
The results of these experiments, as well as the evolution of the Simulated Annealing algorithm, are shown in TABLE 6 and Figure 6 respectively.
IV. Conclusions
The SA algorithm is a cheap optimization method compared to gradient-based methods and the Nelder-Mead algorithm. Its capability of locating good-enough solutions in a very small number of iterations makes it a tool that can be used for initial objective function exploration. Once a set of good candidate solutions is obtained, the algorithm can be adapted to finer steps and less randomness in order to achieve more precise solutions. Alternatively, its inexact solutions can be used as seed values for other optimization algorithms that would otherwise get stuck in local minimum points of the objective function if they were started from the original seed value.
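This seeding strategy can be sketched as a two-stage search: a coarse stochastic pass followed by a deterministic local refinement (a plain pattern search here stands in for any gradient-based or simplex method):

```python
import random

def coarse_random_search(objective, seed, n_samples=30, spread=5.0):
    """Stage 1: a cheap stochastic exploration (standing in for SA):
    sample around the seed and keep the best point seen."""
    best = list(seed)
    best_value = objective(best)
    for _ in range(n_samples):
        candidate = [x + random.uniform(-spread, spread) for x in seed]
        value = objective(candidate)
        if value < best_value:
            best, best_value = candidate, value
    return best

def pattern_refine(objective, start, step=1.0, tol=1e-6):
    """Stage 2: deterministic local refinement of the stochastic result:
    try moving each coordinate by +/- step, and halve the step whenever
    no move improves the objective."""
    point = list(start)
    value = objective(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for direction in (step, -step):
                trial = list(point)
                trial[i] += direction
                trial_value = objective(trial)
                if trial_value < value:
                    point, value, improved = trial, trial_value, True
        if not improved:
            step *= 0.5
    return point, value
```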
In order to increase the capability of the SA algorithm to escape local minimum points, periodic temperature profiles can be used so that the search recovers its mobility after settling on a candidate solution. If the algorithm happens to walk away from the global optimum due to this recovered randomness, it will still report the best point found along the search.
Having a variable step size (determined by the current algorithm temperature) also allows the algorithm to gradually change its search style from coarse to fine, benefiting its global solution search. Configuring the step is also helpful when the objective function is noisy, since such functions tend to make gradient-based algorithms diverge: they use neighboring function values to determine the search direction and step magnitude.
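The effect of noise on gradient estimates can be illustrated with a finite-difference slope on a noisy one-dimensional bowl (a hypothetical example, not the filter problem from the experiments):

```python
import random

def noisy_bowl(x):
    """A smooth 1-D bowl, (x - 3)^2, plus additive measurement noise."""
    return (x - 3.0) ** 2 + random.uniform(-0.5, 0.5)

def finite_difference_slope(f, x, h=1e-6):
    """Finite-difference slope estimate. With noise of amplitude ~0.5 and a
    small h, the estimate is dominated by noise / h instead of the true
    slope, which is what sends gradient-based searches off course."""
    return (f(x + h) - f(x)) / h

# The true slope of (x - 3)^2 at x = 0 is -6, but the noisy estimate is
# typically off by several orders of magnitude.
slope = finite_difference_slope(noisy_bowl, 0.0)
```

SA is unaffected by this failure mode because it never differentiates: it only compares sampled function values through the acceptance rule (1).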
V. References
[1] J. Gall, "Simulated Annealing," in Computer Vision: A Reference Guide, Tübingen, Germany: Springer, 2014, p. 898.
[2] P. Rossmanith, "Simulated Annealing," in Algorithms Unplugged, Springer, 2011, p. 406.
[3] E. Aarts et al., "Simulated Annealing," in Metaheuristic Procedures for Training Neural Networks, Springer, 2006.
VI. Appendix
Tables
TABLE 1.
PSEUDO CODE FOR THE SIMULATED ANNEALING ALGORITHM. NOTICE THAT THE ACCEPTANCE OF A NEW POINT HAPPENS WITH A PROBABILITY P GIVEN BY THE PROBABILITY FUNCTION (1).
TABLE 3.
RESULTS OF MINIMIZING THE BOWL FUNCTION (2) USING [1, 1] AS THE SEED VALUE FOR EACH ALGORITHM.

| Algorithm | Seed value | Found solution | Function evaluations | Distance to analytic solution [6.0000 4.5000] |
| Conjugate Gradients FR | [1, 1] | [6.0000 4.4300] | 31 | |
| Nelder-Mead | [1, 1] | [6.0000 4.5000] | 174 | 3.6161e-06 |
| SA (70 elements) | [1, 1] | [6.3730 5.7405] | 70 | 0.0855 |
| SA (30 elements) | [1, 1] | [5.9731 3.5287] | 30 | 0.9717 |
| SA (10 elements) | [1, 1] | [5.8626 5.1233] | 10 | 0.6382 |
TABLE 4.
RESULTS OF MINIMIZING THE BOWL FUNCTION NOW USING [20, 60] AS THE SEED VALUE FOR EACH ALGORITHM. NOTICE HOW THE CONJUGATE GRADIENTS FR ALGORITHM REQUIRES MANY MORE FUNCTION EVALUATIONS BEFORE CONVERGING TO A SOLUTION, WHILE THE NELDER-MEAD AND SIMULATED ANNEALING ALGORITHMS CONSERVE THE NUMBER OF FUNCTION EVALUATIONS NEEDED TO GET TO A POINT NEAR THE SOLUTION.
| Algorithm | Seed value | Found solution | Function evaluations | Distance to analytic solution [6.0000 4.5000] |
| Conjugate Gradients FR | [20, 60] | [6.0000 4.4226] | 9688 | 0.0774 |
| Nelder-Mead | [20, 60] | [6.0000 4.5000] | 180 | 1.3098e-06 |
| SA (70 elements) | [20, 60] | [5.8363 4.8158] | 70 | 0.3557 |
| SA (30 elements) | [20, 60] | [6.0663 4.0117] | 30 | 0.4928 |
TABLE 5.
RESULTS FROM THE EXPERIMENTS MADE WITH THE NOISY BOWL FUNCTION USING THE CONJUGATE GRADIENTS FR, NELDER-MEAD AND SIMULATED ANNEALING (10 FUNCTION EVALUATIONS ONLY) ALGORITHMS. NOW IT BECOMES OBVIOUS THAT THE SIMULATED ANNEALING GOT A MUCH BETTER RESULT THAN THE OTHER ALGORITHMS, WHICH SIMPLY DIVERGE FROM THE ANALYTIC SOLUTION.
| Algorithm | Seed value | Found solution | Function evaluations | Distance to analytic solution |
| Conjugate Gradients FR | [1, 1] | [5.8477 -1.0434] | 37317 (maxed out iterations) | |
| Nelder-Mead | [1, 1] | [1.0590 0.9977] | 401 (maxed out function evaluations) | 6.0563 |
| SA (10 elements, downward ramp profile) | [1, 1] | [6.0581 4.3855] | 10 | 0.1284 |
TABLE 6.
RESULTS OF APPLYING THE CONJUGATE GRADIENTS FR, NELDER-MEAD AND SIMULATED ANNEALING ALGORITHMS TO MINIMIZE THE FUNCTION DESCRIBED BY (3). NOTICE THAT THE SIMULATED ANNEALING ALGORITHM IS THE ONE CAPABLE OF GETTING A MUCH BETTER RESULT THAN THE OTHER ALGORITHMS. IT ALSO NEEDED MORE TRIES IN ORDER TO GET SUCH SOLUTIONS (EACH TRY THROWS A DIFFERENT RESULT, AS IT IS A STOCHASTIC PROCESS).
| Algorithm | Found solution | Function evaluations | Distance to analytic solution [11.4285 9.8035] |
| Conjugate Gradients FR | [4.9285 3.3035] | 54 | 9.1924 |
| Nelder-Mead | [4.9285 3.3035] | 80 | 9.1924 |
| SA (10 elements, downward ramp profile) | [5.2441 7.0207] | 10 | 6.7817 |
| SA (30 elements, downward ramp profile), 5 tries | [11.0842 9.8603] | 30*5=150 | 0.3490 |
| SA (30 elements, 3-cycle sawtooth), 2 tries | [11.3353 10.0847] | 30*2=60 | 0.2963 |
| SA (90 elements, 3-cycle sawtooth) | [11.2248 9.8939] | 90 | 0.2229 |
| SA (90 elements, 3-cycle sawtooth) | [11.6230 9.8049] | 90 | 0.1945 |
TABLE 7.
RESULTS OF MINIMIZING THE ERROR WITH RESPECT TO THE DESIGN SPECIFICATIONS FOR THE FILTER DESCRIBED IN Case 4: Low-pass filter on micro-strip technology, USING THE CONJUGATE GRADIENTS FR, NELDER-MEAD AND SIMULATED ANNEALING ALGORITHMS. IN THIS CASE, THE NELDER-MEAD PERFORMED THE BEST, FOLLOWED BY THE SIMULATED ANNEALING ALGORITHM. THE CONJUGATE GRADIENTS FR METHOD DID NOT CONVERGE TO A SOLUTION EVEN AFTER MORE THAN 30000 FUNCTION EVALUATIONS.
| Algorithm | Found solution (mm) relative to seed | Function evaluations | Final error |
| Conjugate Gradients FR | [1.4002 6.1634 4.9724] | 37317 | |
| Nelder-Mead | [0.2730 1.1980 0.9154] | 163 | -0.0077 |
| SA | [0.5438 0.9741 1.6639] | 63 | 0.0837 |
TABLE 8.
RESULTS OF MINIMIZING THE MAXIMUM ERROR FOR THE DESIGN PROBLEM Case 5: Noisy filter optimization. THE SAME FILTER AS IN Case 4: Low-pass filter on micro-strip technology HAS BEEN USED, BUT THIS TIME A WHITE NOISE COMPONENT IS ADDED TO THE FILTER RESPONSE, MAKING THE PROBLEM MORE DIFFICULT. NOTICE HOW THE NELDER-MEAD ALGORITHM STILL REPORTS GOOD RESULTS (NOT NEGATIVE, BUT THE LOWEST IN THE TABLE) BY MAXING OUT ITS FUNCTION EVALUATIONS. THE SIMULATED ANNEALING ALGORITHM DOES NOT CONVERGE TO THE BEST SOLUTION BUT CAN BE USED TO GENERATE GOOD SEEDS NEAR THE REGION WHERE THE OPTIMUM POINT RESIDES.
| Algorithm | Found solution (mm) relative to seed | Function evaluations | Final error |
| Nelder-Mead | [0.2368 1.1248 0.8668] | 601 | 0.1593 |
| SA | [0.3401 0.8781 2.0473] | 100 | 0.1326 |
| SA | [0.5284 1.0444 1.3042] | 100 | 0.8610 |
| Nelder-Mead | [2.0137 1.9943 1.9928] | 601 | 0.3735 |
| SA | [1.4555 0.4623 2.4090] | 100 | 0.2072 |
| SA | [0.9258 0.8552 2.1638] | 100 | |
Figures
Figure 1.
Plotted transition probability function. This describes how probable it is to accept or reject a step given the energy difference between the current point and the proposed one (ΔE).
Figure 2. Different temperature profiles are used for this implementation of the Simulated Annealing algorithm. (a) shows a soft-transition
profile which is then used periodically in (c). (b) is a downwards ramp profile which is used periodically in (d).
Figure 3.
Evolution of the Simulated Annealing algorithm for the Bowl function starting from the point [1, 1]. The steps that were accepted are marked with a circle. Notice how they concentrate near the analytic minimum. In (a) the algorithm evaluates 70 points of the objective function, while in (b) it uses only 10. That number equals the number of points in the temperature profile function being used.
Figure 4.
Evolution of the Simulated Annealing algorithm for the Bowl function starting from the point [20, 60]. This time, 70 temperature profile points were used to produce (a) and 30 for (b).
Figure 5.
Evolution of the Simulated Annealing algorithm within the noisy Bowl function surface starting at point [1,1].
Figure 6.
Evolution of the Simulated Annealing algorithm applied to the function (3). All these scenarios use a sawtooth profile with a different number of elements: (a) shows the case with 30 elements, while (b) and (c) show the behavior using 90 elements. Each element in the temperature profile translates to one function evaluation.
Figure 7.
Dimensional description of the RF filter used for Case 4: Low-pass filter on micro-strip technology and Case 5:
Noisy filter optimization.
Figure 8.
(a) shows the evolution of the maximum design error for the problem in Case 4: Low-pass filter on micro-strip technology as the Simulated Annealing algorithm progresses through the temperature profile shown in (b). Notice that (b) has 64 elements, therefore the function is evaluated 64 times, and that the best error value happens between evaluations #20 and #30.