⚙️ Algorithm · Intermediate

Simulated Annealing

Key Points

  • Simulated annealing is a probabilistic search that sometimes accepts worse moves to escape local optima, mimicking how metals cool and crystallize.
  • At temperature T, a worse move that increases cost by Δ is accepted with probability exp(-Δ/T), which decreases as T cools.
  • A good annealing schedule (initial temperature, cooling rate, and iterations per temperature) is critical for performance and solution quality.
  • Neighborhood design matters: small, meaningful random changes give a smooth search landscape and efficient exploration.
  • Exponential cooling (T ← β·T) is the most common in practice; logarithmic cooling has convergence guarantees but is usually too slow.
  • Multiple restarts and occasional reheating improve robustness and help avoid getting stuck.
  • Time complexity is roughly the number of iterations times the cost to generate and evaluate a neighbor; careful delta updates are key.
  • Simulated annealing is widely used in TSP, routing, placement, scheduling, and many NP-hard problems where exact optimization is impractical.

Prerequisites

  • Probability and random variables: Understanding acceptance probabilities, distributions, and randomness is essential for the Metropolis rule and tuning.
  • Exponential and logarithmic functions: Cooling schedules and acceptance probabilities rely on exp and log behavior.
  • Local search / hill climbing: SA extends local search by probabilistic uphill acceptance; knowing the baseline helps with neighborhood design.
  • C++ random library: Implementing SA requires high-quality RNGs, distributions, and seeding.
  • Time and space complexity: To design efficient neighborhoods and delta updates you must analyze per-iteration costs.
  • Graph and geometry basics: Common SA applications like TSP and placement use distances and simple geometric reasoning.
  • Markov chains (basic): SA is a Markov process; the idea of stationary distributions clarifies why the Metropolis rule works.

Detailed Explanation


01 Overview

Hook: Imagine trying to find the lowest valley in a gigantic, foggy mountain range. If you only walk downhill, you might end up stuck in a small basin; to find the deepest valley, you sometimes need to climb a bit first. Concept: Simulated annealing (SA) is a stochastic optimization technique that formalizes this idea. It treats the objective function like an energy landscape and explores it using random moves that are sometimes accepted even when they make the solution worse. The chance to accept bad moves is controlled by a temperature parameter T that decreases over time: at high temperature, the algorithm explores widely; at low temperature, it settles into exploitation. Example: In the Traveling Salesman Problem (TSP), a random 2-opt move may increase tour length. Early on, SA might accept this to escape a poor tour. As the temperature cools, the algorithm becomes increasingly conservative, keeping only improvements or very small degradations.

02 Intuition & Analogies

Hook: Think of cooling molten metal. When it is very hot, atoms move freely and can rearrange to reduce internal stress. As it cools slowly, the material settles into a strong crystalline structure, ideally near the minimum energy state. Concept: SA borrows this thermodynamic story. We view a candidate solution as a configuration of atoms. The objective value is its energy. Temperature measures how much randomness we allow. At high T, the system is allowed to jiggle a lot, making big changes that can hop over barriers. As T drops, the system calms down and only accepts changes that do not increase energy too much. Example: Picture building a schedule for classes. Initially, you shuffle many classes around (high T) and even accept worse schedules to escape bottlenecks. Later, you make only small swaps and mostly accept improvements (low T), polishing the schedule you already have.

03 Formal Definition

Hook: To be precise, SA is a Markov chain on the space of solutions whose stationary distribution follows a Boltzmann law at each fixed temperature. Concept: Given a state space S, objective (energy) E: S → ℝ, and a neighborhood generator N(s) that proposes s' from s, SA iteratively applies the Metropolis rule at temperature T: accept s' with probability 1 if Δ = E(s') − E(s) ≤ 0; otherwise accept with probability exp(−Δ/T). After some iterations at T, reduce T according to a cooling schedule. Under logarithmic cooling and mild regularity, SA converges in probability to a global optimum. Example: For TSP with a 2‑opt neighborhood, define E as tour length. From tour s, choose indices i < j, reverse segment [i..j] to obtain s', compute Δ via a constant-time edge difference, and accept with the Metropolis rule. Repeat while decreasing T exponentially until stopping criteria are met.

04 When to Use

Hook: SA shines when exact algorithms are too slow and greedy methods get trapped. Concept: Use SA for large, rugged landscapes with many local minima, where you can generate neighbors and compute objective differences quickly. It is especially useful when you can craft domain-specific neighborhoods and delta evaluations. Example use cases: (1) TSP and vehicle routing with 2‑opt/3‑opt moves; (2) VLSI placement, floorplanning, and timetabling where swaps or moves have efficient incremental cost updates; (3) Graph problems like Max-Cut or community detection with vertex flips; (4) Continuous geometric problems such as locating facilities or centers; (5) Any NP-hard optimization where a near‑optimal solution within a time limit is acceptable.

⚠️ Common Mistakes

Hook: Many SA failures come from parameter choices and poor move design, not from the idea itself. Concept: Common pitfalls include cooling too fast (algorithm freezes in a bad basin), too small or too large moves (either no exploration or random walk), and mis-scaled temperature (accepting almost everything or almost nothing). Not using delta evaluations leads to unnecessary O(n) recomputation per move. Using a poor RNG or seeding badly hurts reproducibility. Forgetting to clamp or validate neighbors can produce invalid states. Example: In TSP, recomputing full tour length at each move is O(n) and kills performance; instead, compute 2‑opt deltas in O(1). Another example: setting β = 0.5 cools far too fast for large instances; typical β in [0.99, 0.9999] balances exploration and exploitation. Always test and tune parameters per instance class and consider multiple restarts.

Key Formulas

Metropolis Acceptance

P(accept) = 1 if ΔE ≤ 0; otherwise P(accept) = exp(−ΔE / T)

Explanation: If the proposed move improves or keeps the energy the same, always accept it. If it worsens by ΔE, accept with probability exp(−ΔE/T); larger ΔE or smaller T makes acceptance less likely.

Exponential Cooling

T_{k+1} = β · T_k, with β ∈ (0, 1)

Explanation: Temperature decays geometrically by factor β per cooling step. This is the most common practical schedule due to simplicity and a smooth exploration-to-exploitation transition.

Linear Cooling

T_k = T_0 − k · η, with step size η > 0

Explanation: Temperature decreases linearly. This is easy to implement but must stop before T becomes negative; it may cool too quickly near the end.

Logarithmic Cooling

T_k = c / ln(k + 1), with constant c > 0

Explanation: Temperature decreases very slowly, which under broad conditions ensures convergence to a global optimum. It is often too slow for time-limited applications.

Boltzmann Stationary Distribution

π(s) ∝ exp(−E(s) / T)

Explanation: At a fixed temperature T with the Metropolis rule, the Markov chain over states has a stationary distribution proportional to exp(−E/T). Lower-energy states are exponentially more likely.

2-opt Delta (TSP)

Δ = d(a,c) + d(b,d) − d(a,b) − d(c,d)

Explanation: Replacing edges (a,b) and (c,d) with (a,c) and (b,d) changes tour length by this amount. This allows O(1) evaluation of a 2‑opt move without recomputing the full tour length.

Acceptance Rate Approximation

acceptance_rate(T) ≈ ∫₀^∞ f(Δ) · exp(−Δ / T) dΔ

Explanation: If the distribution of uphill costs has density f(Δ), the expected acceptance rate can be approximated by this integral. It guides tuning T to achieve a target acceptance rate.

Sufficient Cooling for Convergence

T_k ≥ c / ln(k + 1), with c at least the depth of the deepest non-global local minimum

Explanation: A classical result: sufficiently slow logarithmic cooling ensures convergence to a global optimum under mild conditions. In practice, this schedule is often too slow.

Complexity Analysis

The time complexity of simulated annealing depends on three factors: the number of temperature levels L, the number of proposals per level M, and the cost of generating and evaluating a neighbor. If C_N is the cost to sample a neighbor and C_Δ is the cost to compute the objective change, the total expected time is O(L · M · (C_N + C_Δ)). With careful delta evaluations, C_Δ is often O(1). For example, in TSP a 2‑opt delta uses only four edge lengths, so C_Δ = O(1); if applying the move requires reversing a segment, the acceptance step costs O(k) where k is the segment length, so amortized time depends on the distribution of k (you can cap k to keep this near constant). In continuous spaces with simple functions, both neighbor generation (Gaussian perturbation) and Δ computation are O(1), yielding O(L · M) time.

Space complexity is typically small: you need to store the current solution and sometimes the best-so-far solution. This is O(n) for problems with n components (e.g., n cities in TSP), plus O(1) to O(n²) for any precomputed auxiliary data (e.g., a distance matrix). Many implementations avoid O(n²) storage by computing distances on the fly, trading a small constant-time overhead. If multiple restarts are run independently, memory scales linearly with the number of parallel runs.

Practically, performance hinges on constants: good neighborhoods, fast Δ computations, and tuned schedules. When a strict time limit is present (common in contests), it is better to budget iterations by elapsed time rather than fixed L and M, ensuring the algorithm uses all available time while cooling adaptively.

Code Examples

Simulated annealing for a continuous function (Rastrigin in 2D)
#include <bits/stdc++.h>
using namespace std;

// Rastrigin function in 2D: global minimum at (0,0) with value 0
static inline double rastrigin(double x, double y) {
    const double A = 10.0;
    return 2.0 * A + (x * x - A * cos(2.0 * M_PI * x)) + (y * y - A * cos(2.0 * M_PI * y));
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    normal_distribution<double> gauss(0.0, 1.0); // standard normal for perturbations
    uniform_real_distribution<double> uni01(0.0, 1.0);

    // Initial solution, deliberately far from the optimum at (0,0)
    double x = 5.0, y = 5.0;
    double fx = rastrigin(x, y);

    // Best found
    double bx = x, by = y, bfx = fx;

    // Annealing parameters
    double T = 1e2;              // initial temperature
    const double T_end = 1e-6;   // stopping temperature
    const double beta = 0.9995;  // exponential cooling multiplier
    const int iters_per_T = 500; // proposals per temperature level

    // Typical step size; scaled by temperature for exploration
    const double base_step = 1.0;

    while (T > T_end) {
        for (int it = 0; it < iters_per_T; ++it) {
            // Propose a neighbor using Gaussian perturbation scaled by sqrt(T)
            double nx = x + gauss(rng) * base_step * sqrt(T);
            double ny = y + gauss(rng) * base_step * sqrt(T);

            // Optional: clamp to search bounds if any (here we allow any real value)
            double fn = rastrigin(nx, ny);
            double delta = fn - fx; // we minimize

            if (delta <= 0.0) {
                // Always accept improvements
                x = nx; y = ny; fx = fn;
            } else {
                // Accept uphill move with probability exp(-delta / T)
                double p = exp(-delta / T);
                if (uni01(rng) < p) {
                    x = nx; y = ny; fx = fn;
                }
            }

            // Track best
            if (fx < bfx) {
                bfx = fx; bx = x; by = y;
            }
        }
        // Cool down
        T *= beta;
    }

    // Output best solution found: x y value
    cout << fixed << setprecision(6) << bx << ' ' << by << ' ' << bfx << '\n';
    return 0;
}

This program minimizes the Rastrigin function in 2D using simulated annealing with Gaussian neighbors. The acceptance rule always takes improvements and sometimes accepts worse moves depending on temperature. The step size scales with sqrt(T) to explore more at high temperatures and refine at low temperatures. The algorithm keeps track of the best point seen and prints it at the end.

Time: O(L · M) where L is the number of temperature levels and M is iterations per level; each neighbor proposal and delta evaluation is O(1).
Space: O(1) beyond the constant number of doubles used.
Simulated annealing for TSP using 2-opt moves (synthetic points)
#include <bits/stdc++.h>
using namespace std;

struct Pt { double x, y; };

static inline double dist(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    uniform_real_distribution<double> uni01(0.0, 1.0);

    // Generate synthetic instance
    int n = 300; // number of cities
    vector<Pt> p(n);
    for (int i = 0; i < n; ++i) {
        p[i].x = (double)(rng() % 100000) / 100.0; // coordinates in 0..1000
        p[i].y = (double)(rng() % 100000) / 100.0;
    }

    // Initial tour: 0..n-1
    vector<int> tour(n);
    iota(tour.begin(), tour.end(), 0);

    auto tour_len = [&](const vector<int>& t) {
        double L = 0.0;
        for (int i = 0; i < n; ++i) {
            int a = t[i], b = t[(i+1)%n];
            L += dist(p[a], p[b]);
        }
        return L;
    };

    double cur = tour_len(tour);
    double best = cur;
    vector<int> best_tour = tour;

    // Annealing parameters
    double T = 1000.0;
    const double T_end = 1e-3;
    const double beta = 0.9995; // exponential cooling
    const int iters_per_T = 2000;
    const int KMAX = 50; // cap segment length to keep reversal cost bounded

    while (T > T_end) {
        for (int iter = 0; iter < iters_per_T; ++iter) {
            // Choose a segment [a..b] of bounded length; skip proposals that
            // would wrap past the end of the array so the reversal stays a
            // simple linear operation (min(KMAX, n-2) also rules out the
            // degenerate whole-tour reversal)
            int a = (int)(rng() % n);
            int len = 1 + (int)(rng() % (unsigned)min(KMAX, n - 2));
            int b = a + len;
            if (b >= n) continue;

            // Edges (a-1,a) and (b,b+1) are replaced by (a-1,b) and (a,b+1)
            int a_prev = (a - 1 + n) % n;
            int b_next = (b + 1) % n;
            int A = tour[a_prev], B = tour[a], C = tour[b], D = tour[b_next];

            double before = dist(p[A], p[B]) + dist(p[C], p[D]);
            double after  = dist(p[A], p[C]) + dist(p[B], p[D]);
            double delta  = after - before; // change in tour length if we reverse [a..b]

            bool accept = false;
            if (delta <= 0.0) accept = true;
            else {
                double prob = exp(-delta / T);
                if (uni01(rng) < prob) accept = true;
            }

            if (accept) {
                // Reverse segment [a..b]
                reverse(tour.begin() + a, tour.begin() + b + 1);
                cur += delta;
                if (cur < best) {
                    best = cur;
                    best_tour = tour;
                }
            }
        }
        T *= beta;
    }

    // Output best length and first few cities
    cout << fixed << setprecision(6) << best << '\n';
    for (int i = 0; i < min(10, n); ++i) cout << best_tour[i] << (i+1 < min(10, n) ? ' ' : '\n');
    return 0;
}

This program applies simulated annealing with 2‑opt moves to a synthetic TSP instance. The delta in tour length for a 2‑opt move is computed in O(1) using only four edges. To bound the cost of applying a move, the reversed segment length is capped by KMAX. Exponential cooling and a fixed number of iterations per temperature are used. The program prints the best tour length found and the first few city indices.

Time: O(L · M · (1 + E[segment_length])) due to O(1) delta evaluation and O(k) segment reversal; capping the segment length keeps this near O(L · M).
Space: O(n) for the tour and points; no distance matrix is stored.