Simulated Annealing
Key Points
- Simulated annealing is a probabilistic search that sometimes accepts worse moves to escape local optima, mimicking how metals cool and crystallize.
- At temperature T, a worse move that increases cost by Δ is accepted with probability exp(-Δ/T), which decreases as T cools (see the sketch after this list).
- A good annealing schedule (initial temperature, cooling rate, and iterations per temperature) is critical for performance and solution quality.
- Neighborhood design matters: small, meaningful random changes give a smooth search landscape and efficient exploration.
- Exponential cooling (T ← β·T) is the most common schedule in practice; logarithmic cooling has convergence guarantees but is usually too slow.
- Multiple restarts and occasional reheating improve robustness and help avoid getting stuck.
- Time complexity is roughly the number of iterations times the cost to generate and evaluate a neighbor; careful delta updates are key.
- Simulated annealing is widely used in TSP, routing, placement, scheduling, and many NP-hard problems where exact optimization is impractical.
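As referenced above, a minimal sketch of the acceptance rule for a minimization problem. The helper name accept_move and the RNG choice are illustrative assumptions, not a fixed API:

#include <cmath>
#include <random>

// Metropolis acceptance rule (sketch): improvements are always accepted,
// uphill moves with probability exp(-delta / T).
bool accept_move(double delta, double T, std::mt19937_64& rng) {
    if (delta <= 0.0) return true;                    // better or equal: accept
    std::uniform_real_distribution<double> uni01(0.0, 1.0);
    return uni01(rng) < std::exp(-delta / T);         // worse: accept probabilistically
}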
Prerequisites
- Probability and random variables — Understanding acceptance probabilities, distributions, and randomness is essential for the Metropolis rule and tuning.
- Exponential and logarithmic functions — Cooling schedules and acceptance probabilities rely on exp and log behavior.
- Local search / hill climbing — SA extends local search by probabilistic uphill acceptance; knowing the baseline helps with neighborhood design.
- C++ random library — Implementing SA requires high-quality RNGs, distributions, and seeding.
- Time and space complexity — To design efficient neighborhoods and delta updates you must analyze per-iteration costs.
- Graph and geometry basics — Common SA applications like TSP and placement use distances and simple geometric reasoning.
- Markov chains (basic) — SA is a Markov process; the idea of stationary distributions clarifies why the Metropolis rule works.
Detailed Explanation
01 Overview
Hook: Imagine trying to find the lowest valley in a gigantic, foggy mountain range. If you only walk downhill, you might end up stuck in a small basin; to find the deepest valley, you sometimes need to climb a bit first. Concept: Simulated annealing (SA) is a stochastic optimization technique that formalizes this idea. It treats the objective function like an energy landscape and explores it using random moves that are sometimes accepted even when they make the solution worse. The chance to accept bad moves is controlled by a temperature parameter T that decreases over time: at high temperature, the algorithm explores widely; at low temperature, it settles into exploitation. Example: In the Traveling Salesman Problem (TSP), a random 2-opt move may increase tour length. Early on, SA might accept this to escape a poor tour. As the temperature cools, the algorithm becomes increasingly conservative, keeping only improvements or very small degradations.
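To make the effect of temperature concrete, the small program below (an illustrative sketch; the uphill cost Δ = 10 is chosen arbitrarily) prints exp(−Δ/T) at a few temperatures. At T = 100 such a worse move is accepted roughly 90% of the time, while at T = 1 it is accepted almost never:

#include <cstdio>
#include <cmath>

int main() {
    const double delta = 10.0;                        // hypothetical uphill cost increase
    const double temps[] = {100.0, 10.0, 1.0, 0.1};   // temperatures from hot to cold
    for (double T : temps)
        std::printf("T = %6.1f  ->  P(accept) = %.3g\n", T, std::exp(-delta / T));
    return 0;
}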
02 Intuition & Analogies
Hook: Think of cooling molten metal. When it is very hot, atoms move freely and can rearrange to reduce internal stress. As it cools slowly, the material settles into a strong crystalline structure, ideally near the minimum energy state. Concept: SA borrows this thermodynamic story. We view a candidate solution as a configuration of atoms. The objective value is its energy. Temperature measures how much randomness we allow. At high T, the system is allowed to jiggle a lot, making big changes that can hop over barriers. As T drops, the system calms down and only accepts changes that do not increase energy too much. Example: Picture building a schedule for classes. Initially, you shuffle many classes around (high T) and even accept worse schedules to escape bottlenecks. Later, you make only small swaps and mostly accept improvements (low T), polishing the schedule you already have.
03 Formal Definition
Concept: Let S be a set of candidate solutions (states), E : S → ℝ an energy (cost) function to minimize, and N(s) ⊆ S a neighborhood of each state s. Given a temperature schedule T_0 ≥ T_1 ≥ … > 0, simulated annealing starts from some state s_0 and at step k draws a random neighbor s' ∈ N(s_k); it sets s_{k+1} = s' with probability min(1, exp(−(E(s') − E(s_k)) / T_k)) and otherwise keeps s_{k+1} = s_k. Improvements are therefore always accepted, while a move that worsens the energy by ΔE is accepted with probability exp(−ΔE / T_k). As T_k → 0 the rule degenerates into ordinary hill climbing on the same neighborhood. Example: In TSP, S is the set of tours, E is total tour length, and N(s) contains all tours obtained from s by one 2-opt reversal.
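A minimal sketch of this loop, assuming the caller supplies energy, random_neighbor, and the schedule parameters (illustrative names and interface, not a fixed library API):

#include <cmath>
#include <random>
#include <functional>
#include <utility>

// Generic simulated-annealing loop over an arbitrary state type.
template <class State>
State anneal(State s,
             std::function<double(const State&)> energy,
             std::function<State(const State&, std::mt19937_64&)> random_neighbor,
             double T, double T_end, double beta, int iters_per_T,
             std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uni01(0.0, 1.0);
    double e = energy(s);
    State best = s;
    double best_e = e;
    while (T > T_end) {
        for (int it = 0; it < iters_per_T; ++it) {
            State cand = random_neighbor(s, rng);
            double delta = energy(cand) - e;
            // Metropolis rule: accept improvements, uphill moves with prob exp(-delta/T)
            if (delta <= 0.0 || uni01(rng) < std::exp(-delta / T)) {
                s = std::move(cand);
                e += delta;
                if (e < best_e) { best_e = e; best = s; }
            }
        }
        T *= beta;   // exponential cooling
    }
    return best;
}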
04 When to Use
Hook: SA shines when exact algorithms are too slow and greedy methods get trapped. Concept: Use SA for large, rugged landscapes with many local minima, where you can generate neighbors and compute objective differences quickly. It is especially useful when you can craft domain-specific neighborhoods and delta evaluations. Example use cases: (1) TSP and vehicle routing with 2‑opt/3‑opt moves; (2) VLSI placement, floorplanning, and timetabling where swaps or moves have efficient incremental cost updates; (3) Graph problems like Max-Cut or community detection with vertex flips; (4) Continuous geometric problems such as locating facilities or centers; (5) Any NP-hard optimization where a near‑optimal solution within a time limit is acceptable.
⚠️ Common Mistakes
Hook: Many SA failures come from parameter choices and poor move design, not from the idea itself. Concept: Common pitfalls include cooling too fast (algorithm freezes in a bad basin), too small or too large moves (either no exploration or random walk), and mis-scaled temperature (accepting almost everything or almost nothing). Not using delta evaluations leads to unnecessary O(n) recomputation per move. Using a poor RNG or seeding badly hurts reproducibility. Forgetting to clamp or validate neighbors can produce invalid states. Example: In TSP, recomputing full tour length at each move is O(n) and kills performance; instead, compute 2‑opt deltas in O(1). Another example: setting β = 0.5 cools far too fast for large instances; typical β in [0.99, 0.9999] balances exploration and exploitation. Always test and tune parameters per instance class and consider multiple restarts.
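One way to avoid cooling too fast is to derive β from the temperature range and the number of cooling steps you can afford, rather than guessing it. The sketch below uses illustrative numbers and solves T_end = T0 · β^steps for β:

#include <cstdio>
#include <cmath>

int main() {
    // Illustrative budget: cool from T0 to T_end over a chosen number of
    // cooling steps (each step would run iters_per_T proposals).
    const double T0 = 100.0, T_end = 1e-4;
    const double steps = 20000.0;
    // Solve T_end = T0 * beta^steps  =>  beta = (T_end / T0)^(1/steps)
    const double beta = std::pow(T_end / T0, 1.0 / steps);
    std::printf("beta = %.6f\n", beta);   // close to 1, roughly 0.99931 here
    return 0;
}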
Key Formulas
Metropolis Acceptance
P(accept) = 1 if ΔE ≤ 0, and exp(−ΔE / T) if ΔE > 0; equivalently P(accept) = min(1, exp(−ΔE / T)).
Explanation: If the proposed move improves or keeps the energy the same, always accept it. If it worsens by ΔE, accept with probability exp(−ΔE/T); larger ΔE or smaller T makes acceptance less likely.
Exponential Cooling
T_{k+1} = β · T_k (equivalently T_k = β^k · T_0), with 0 < β < 1.
Explanation: Temperature decays geometrically by factor β per cooling step. This is the most common practical schedule due to simplicity and a smooth exploration-to-exploitation transition.
Linear Cooling
T_{k+1} = T_k − η (equivalently T_k = T_0 − η · k) for a fixed decrement η > 0.
Explanation: Temperature decreases linearly. This is easy to implement but must stop before T becomes negative; it may cool too quickly near the end.
Logarithmic Cooling
T_k = c / ln(k + 1) for k ≥ 1, with a problem-dependent constant c > 0.
Explanation: Temperature decreases very slowly, which under broad conditions ensures convergence to a global optimum. It is often too slow for time-limited applications.
Boltzmann Stationary Distribution
π_T(s) ∝ exp(−E(s) / T), i.e. π_T(s) = exp(−E(s) / T) / Σ_{s'∈S} exp(−E(s') / T).
Explanation: At a fixed temperature T with the Metropolis rule, the Markov chain over states has a stationary distribution proportional to exp(−E/T). Lower-energy states are exponentially more likely.
2-opt Delta (TSP)
Δ = d(a, c) + d(b, d) − d(a, b) − d(c, d)
Explanation: Replacing edges (a,b) and (c,d) with (a,c) and (b,d) changes tour length by this amount. This allows O(1) evaluation of a 2‑opt move without recomputing the full tour length.
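A direct translation of this formula into a small helper might look like the sketch below. The names Pt, dist, and two_opt_delta are illustrative (the full TSP example later in this section computes the same quantity inline), and the indices are assumed not to wrap around the tour:

#include <vector>
#include <cmath>

struct Pt { double x, y; };

static double dist(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

// Change in tour length if segment tour[i..j] is reversed (0 < i <= j < n-1
// assumed, so no wrap-around): edges (i-1, i) and (j, j+1) are replaced by
// (i-1, j) and (i, j+1).
double two_opt_delta(const std::vector<Pt>& p, const std::vector<int>& tour,
                     int i, int j) {
    int A = tour[i - 1], B = tour[i], C = tour[j], D = tour[j + 1];
    return (dist(p[A], p[C]) + dist(p[B], p[D]))
         - (dist(p[A], p[B]) + dist(p[C], p[D]));
}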
Acceptance Rate Approximation
p_accept(T) ≈ ∫₀^∞ exp(−Δ / T) · f(Δ) dΔ
Explanation: If the distribution of uphill costs has density f(Δ), the expected acceptance rate can be approximated by this integral. It guides tuning T to achieve a target acceptance rate.
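One practical use of this relationship is picking an initial temperature from a sample of uphill deltas so that an average uphill move is accepted at a target rate. The sketch below uses the rule of thumb T0 ≈ mean(Δ) / (−ln p0); the sampled delta values are placeholders:

#include <cstdio>
#include <cmath>
#include <vector>
#include <numeric>

// Rule-of-thumb initial temperature: choose T so that an "average" uphill
// move of size mean(uphill_deltas) is accepted with probability target_rate.
double initial_temperature(const std::vector<double>& uphill_deltas,
                           double target_rate) {
    double mean = std::accumulate(uphill_deltas.begin(), uphill_deltas.end(), 0.0)
                  / uphill_deltas.size();
    return -mean / std::log(target_rate);   // from exp(-mean / T) = target_rate
}

int main() {
    std::vector<double> deltas = {3.2, 7.5, 1.1, 4.8, 9.0};  // hypothetical samples
    std::printf("T0 = %.3f\n", initial_temperature(deltas, 0.8));
    return 0;
}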
Sufficient Cooling for Convergence
T_k ≥ c / ln(k + 1), where the constant c must be at least the depth of the deepest local minimum that is not a global minimum.
Explanation: A classical result: sufficiently slow logarithmic cooling ensures convergence to a global optimum under mild conditions. In practice, this schedule is often too slow.
Complexity Analysis
Let K be the number of cooling steps and m the number of proposals per temperature, for K·m iterations in total. Each iteration costs the time to generate a neighbor plus the time to evaluate its cost change, so the total running time is O(K · m · (t_neighbor + t_delta)). With incremental (delta) evaluation the per-iteration cost is often O(1), as in the 2-opt example below, whereas recomputing the objective from scratch typically adds a factor of O(n). Memory usage is usually just the current state, the best state found, and any auxiliary structures needed for fast delta computation.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Rastrigin function in 2D: global minimum at (0,0) with value 0
static inline double rastrigin(double x, double y) {
    const double A = 10.0;
    return 2.0 * A + (x * x - A * cos(2.0 * M_PI * x)) + (y * y - A * cos(2.0 * M_PI * y));
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    normal_distribution<double> gauss(0.0, 1.0);       // standard normal for perturbations
    uniform_real_distribution<double> uni01(0.0, 1.0);

    // Initial solution, deliberately far from the optimum at (0,0)
    double x = 5.0, y = 5.0;
    double fx = rastrigin(x, y);

    // Best found so far
    double bx = x, by = y, bfx = fx;

    // Annealing parameters
    double T = 1e2;                  // initial temperature
    const double T_end = 1e-6;       // stopping temperature
    const double beta = 0.9995;      // exponential cooling multiplier
    const int iters_per_T = 500;     // proposals per temperature level

    // Typical step size; scaled by temperature for exploration
    const double base_step = 1.0;

    while (T > T_end) {
        for (int it = 0; it < iters_per_T; ++it) {
            // Propose a neighbor using Gaussian perturbation scaled by sqrt(T)
            double nx = x + gauss(rng) * base_step * sqrt(T);
            double ny = y + gauss(rng) * base_step * sqrt(T);

            // Optional: clamp to search bounds if any (here we allow any real value)
            double fn = rastrigin(nx, ny);
            double delta = fn - fx;   // we minimize

            if (delta <= 0.0) {
                // Always accept improvements
                x = nx; y = ny; fx = fn;
            } else {
                // Accept uphill move with probability exp(-delta / T)
                double p = exp(-delta / T);
                if (uni01(rng) < p) {
                    x = nx; y = ny; fx = fn;
                }
            }

            // Track best
            if (fx < bfx) {
                bfx = fx; bx = x; by = y;
            }
        }
        // Cool down
        T *= beta;
    }

    // Output best solution found: x y value
    cout << fixed << setprecision(6) << bx << ' ' << by << ' ' << bfx << '\n';
    return 0;
}
This program minimizes the Rastrigin function in 2D using simulated annealing with Gaussian neighbors. The acceptance rule always takes improvements and sometimes accepts worse moves depending on temperature. The step size scales with sqrt(T) to explore more at high temperatures and refine at low temperatures. The algorithm keeps track of the best point seen and prints it at the end.
#include <bits/stdc++.h>
using namespace std;

struct Pt { double x, y; };

static inline double dist(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    uniform_real_distribution<double> uni01(0.0, 1.0);

    // Generate synthetic instance
    int n = 300;                      // number of cities
    vector<Pt> p(n);
    for (int i = 0; i < n; ++i) {
        p[i].x = (double)(rng() % 100000) / 100.0;   // coordinates in [0, 1000)
        p[i].y = (double)(rng() % 100000) / 100.0;
    }

    // Initial tour: 0..n-1
    vector<int> tour(n);
    iota(tour.begin(), tour.end(), 0);

    auto tour_len = [&](const vector<int>& t) {
        double L = 0.0;
        for (int i = 0; i < n; ++i) {
            int a = t[i], b = t[(i + 1) % n];
            L += dist(p[a], p[b]);
        }
        return L;
    };

    double cur = tour_len(tour);
    double best = cur;
    vector<int> best_tour = tour;

    // Annealing parameters
    double T = 1000.0;
    const double T_end = 1e-3;
    const double beta = 0.9995;       // exponential cooling
    const int iters_per_T = 2000;
    const int KMAX = 50;              // cap segment length to keep reversal cost bounded

    while (T > T_end) {
        for (int iter = 0; iter < iters_per_T; ++iter) {
            // Choose 1 <= a < b <= n-1 with segment length b - a <= KMAX,
            // so the reversed range never wraps around the tour.
            int a = 1 + (int)(rng() % (n - 2));
            int maxLen = min(KMAX, n - 1 - a);
            int b = a + 1 + (int)(rng() % maxLen);

            // Edges (a-1, a) and (b, b+1) are replaced by (a-1, b) and (a, b+1)
            int A = tour[a - 1], B = tour[a], C = tour[b], D = tour[(b + 1) % n];

            double before = dist(p[A], p[B]) + dist(p[C], p[D]);
            double after  = dist(p[A], p[C]) + dist(p[B], p[D]);
            double delta  = after - before;   // change in tour length if we reverse [a..b]

            bool accept = false;
            if (delta <= 0.0) accept = true;
            else {
                double prob = exp(-delta / T);
                if (uni01(rng) < prob) accept = true;
            }

            if (accept) {
                // Reverse segment [a..b]
                reverse(tour.begin() + a, tour.begin() + b + 1);
                cur += delta;
                if (cur < best) {
                    best = cur;
                    best_tour = tour;
                }
            }
        }
        T *= beta;
    }

    // Output best length and first few cities
    cout << fixed << setprecision(6) << best << '\n';
    for (int i = 0; i < min(10, n); ++i)
        cout << best_tour[i] << (i + 1 < min(10, n) ? ' ' : '\n');
    return 0;
}
This program applies simulated annealing with 2‑opt moves to a synthetic TSP instance. The delta in tour length for a 2‑opt move is computed in O(1) using only four edges. To bound the cost of applying a move, the reversed segment length is capped by KMAX. Exponential cooling and a fixed number of iterations per temperature are used. The program prints the best tour length found and the first few city indices.