
Hill Climbing and Local Search

Key Points

  • Hill climbing is an iterative optimization method that repeatedly moves to a better neighboring solution until no improvement is possible, reaching a local optimum.
  • The neighborhood defines which solutions are considered ‘nearby’; designing it well is crucial for both speed and solution quality.
  • Steepest-ascent evaluates all neighbors and moves to the best improving one, while first-choice stops the scan at the first improvement it finds.
  • Random restarts help escape poor local optima by re-seeding the search from new starting points.
  • Tabu search avoids short cycles by temporarily forbidding recently visited moves or solutions (a short-term memory).
  • Iterated local search alternates between perturbation and local optimization to jump between basins of attraction.
  • Local search is widely used in scheduling, assignment, Max-Cut, and graph coloring where exact methods are too slow.
  • Incremental (delta) evaluation often turns an expensive neighbor scan into an efficient O(1) or O(degree) update per move.

Prerequisites

  • Greedy algorithms: Hill climbing uses local, greedy improvements at each step.
  • Graph basics (vertices, edges, degrees): Many applications like Max-Cut and graph coloring rely on graph structures and adjacency representations.
  • Asymptotic complexity (Big-O): To reason about per-iteration costs, neighborhood size, and total runtime.
  • Probability and expectation: For understanding expected steps to improvement and the effect of random restarts.
  • C++ STL and random utilities: Implementations rely on vectors, shuffling, and high-quality pseudo-random generators.
  • Incremental (delta) evaluation: Core technique to make neighbor evaluation fast and practical on large instances.
  • Constraint satisfaction problems (CSPs): N-Queens and graph coloring are CSPs commonly solved by local search heuristics.

Detailed Explanation


01 Overview

Hill climbing and local search are families of algorithms for optimizing an objective when the search space is too large to explore exhaustively. They work by starting from an initial solution, repeatedly probing nearby solutions (a neighborhood), and moving to a neighbor that improves the objective. This greedy, iterative process stops when no neighbor is better, yielding a local optimum. Variants differ in how they pick moves (steepest-ascent vs first-choice), how they escape local optima (random restarts, perturbations), and how they prevent cycling (tabu lists). Because the algorithms only need local comparisons and often support very fast incremental evaluations, they scale well to large, real-world instances where exact algorithms are impractical. Local search is prevalent in combinatorial optimization problems like Max-Cut, N-Queens, graph coloring, and complex scheduling, and also serves as a building block inside metaheuristics such as simulated annealing and genetic algorithms. Despite their simplicity, performance hinges on a good neighborhood design and efficient move evaluation. The trade-off is that they provide no worst-case guarantee of global optimality, but they often find high-quality solutions quickly in practice.
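To make the loop concrete, here is a minimal steepest-ascent sketch in C++. The State, objective, and neighbors callables are illustrative placeholders you would supply per problem (not a library interface), and the toy main climbs a one-dimensional integer objective:

#include <bits/stdc++.h>
using namespace std;

// Generic steepest-ascent skeleton (illustrative; State/f/neighbors are
// placeholders supplied per problem, not a library API).
template <class State, class Objective, class Neighbors>
State hillClimb(State x, Objective f, Neighbors neighbors) {
    while (true) {
        double fx = f(x);
        State best = x;
        double bestVal = fx;
        for (const State& y : neighbors(x)) { // full neighborhood scan
            double fy = f(y);
            if (fy > bestVal) { bestVal = fy; best = y; }
        }
        if (bestVal <= fx) return x; // no strict improvement: local optimum
        x = best;
    }
}

int main() {
    // Toy: maximize f(x) = -(x-3)^2 over the integers with moves {x-1, x+1}.
    auto f = [](int x) { return -double((x - 3) * (x - 3)); };
    auto nb = [](int x) { return vector<int>{x - 1, x + 1}; };
    cout << hillClimb(0, f, nb) << "\n"; // climbs 0 -> 1 -> 2 -> 3 and stops
    return 0;
}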

02 Intuition & Analogies

Imagine hiking in thick fog with only a headlamp. You cannot see the whole mountain, but you can tell whether a small step uphill or downhill makes you higher or lower. Hill climbing is like always taking the uphill step that makes you a bit higher. Eventually, you may reach a spot where every small step goes downhill: that is a local peak (local optimum). This might not be the tallest mountain (global optimum), but with only local visibility it’s a sensible strategy. Now, what about different variants? Steepest-ascent is like carefully testing all possible small steps around you and choosing the steepest upward slope; it’s thorough but slower. First-choice is like taking the first uphill step you find without checking them all; it moves faster but may miss a slightly better direction. Random restart is like teleporting to random trailheads and starting again if you get stuck on a small hill; do this enough times and you improve your chances of finding a taller peak. Tabu search is the hiker keeping a short memory of the last few steps and refusing to backtrack immediately, which prevents going in circles. Iterated local search is like occasionally taking a purposeful detour (a jolt) before resuming your upward walk, helping you hop from one hill’s basin to another. The core idea is simple: use local information to make progress quickly, and add lightweight strategies to escape getting stuck.

03 Formal Definition

Let S be a finite or countably infinite search space with an objective function f: S → ℝ to be maximized (for minimization, negate f). A neighborhood system N maps each solution x ∈ S to a set of neighbors N(x) ⊆ S. Hill climbing generates a sequence {x_t} with x_{t+1} ∈ N(x_t) such that f(x_{t+1}) > f(x_t), if such a neighbor exists; otherwise it terminates at x_T, which is a local optimum, i.e., ∀y ∈ N(x_T), f(y) ≤ f(x_T). Steepest-ascent chooses x_{t+1} = argmax_{y ∈ N(x_t)} f(y) subject to f(x_{t+1}) > f(x_t). First-choice samples neighbors (e.g., in random order) and moves to the first y with f(y) > f(x_t). Random-restart restarts the process from a new x_0 drawn from a distribution over S after termination, repeating until a stopping rule is met. Tabu search augments the state with a memory structure (tabu list) that forbids certain moves or attributes for a fixed tenure, allowing non-improving moves to escape local optima while preventing short cycles; aspiration criteria may override tabu status when a move yields a new best. Iterated local search alternates between a perturbation operator P and local optimization L, generating x_{t+1} = L(P(x_t)). Correctness (termination) holds under mild assumptions (finite S, strictly improving moves prevent cycles), but global optimality is not guaranteed without exhaustive restarts or specific problem structure.
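The tabu mechanics above can be sketched directly. The following is an illustrative (not canonical) tabu search over single-bit flips; the fixed tenure, the attribute-based tabu list (recently flipped positions), and the new-global-best aspiration rule are one common set of choices, assumed here for concreteness:

#include <bits/stdc++.h>
using namespace std;

// Tabu search over single-bit flips (illustrative sketch, maximization).
// tenure: how many iterations a just-flipped position stays forbidden.
vector<int> tabuSearch(vector<int> x, const function<double(const vector<int>&)>& f,
                       int tenure, int maxIters) {
    int n = (int)x.size();
    vector<int> tabuUntil(n, -1); // flipping i is forbidden while tabuUntil[i] > it
    vector<int> best = x;
    double bestVal = f(x);
    for (int it = 0; it < maxIters; ++it) {
        int bestMove = -1;
        double bestMoveVal = -numeric_limits<double>::infinity();
        for (int i = 0; i < n; ++i) {
            x[i] ^= 1;             // tentatively flip bit i
            double v = f(x);
            x[i] ^= 1;             // undo
            bool tabu = tabuUntil[i] > it;
            bool aspiration = v > bestVal; // tabu is overridden on a new global best
            if ((!tabu || aspiration) && v > bestMoveVal) {
                bestMoveVal = v;
                bestMove = i;
            }
        }
        if (bestMove < 0) break;   // every move is tabu and none aspirates
        x[bestMove] ^= 1;          // accept the best admissible move, improving or not
        tabuUntil[bestMove] = it + tenure;
        if (bestMoveVal > bestVal) { bestVal = bestMoveVal; best = x; }
    }
    return best;
}

int main() {
    // Toy objective: count of positions matching a hidden target pattern.
    vector<int> target = {1,0,1,1,0,1,0,1};
    auto f = [&](const vector<int>& x) {
        double s = 0;
        for (size_t i = 0; i < x.size(); ++i) s += (x[i] == target[i]);
        return s;
    };
    vector<int> x0(target.size(), 0);
    vector<int> best = tabuSearch(x0, f, /*tenure=*/3, /*maxIters=*/100);
    cout << "best value found: " << f(best) << "\n";
    return 0;
}

Note how the loop accepts the best admissible move even when it is non-improving; that, plus the tenure, is what lets tabu search walk off a local optimum without immediately walking back.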

04 When to Use

Use hill climbing and local search when the search space is large, neighbors are easy to enumerate or sample, and evaluating (or incrementally updating) the objective is fast. Classic applications include graph partitioning (Max-Cut), graph coloring, SAT and CSPs (e.g., N-Queens via min-conflicts), scheduling/rostering with soft penalties, and layout/route optimization (2-opt for TSP). They shine as anytime methods: you can stop early and still have a usable solution. Prefer steepest-ascent when the neighborhood is modest and you can afford full scans; prefer first-choice when the neighborhood is huge but you can quickly find an improvement by sampling. Introduce random restarts when landscapes are highly multi-modal and you can reinitialize cheaply; add tabu or iterated local search when there are many plateaus or shallow traps and you need systematic diversification. Local search is also valuable as a subroutine inside metaheuristics (e.g., simulated annealing’s proposals refined by a quick hill climb, or genetic algorithms using local search as a mutation/refinement step). Avoid local search when exact optimality is required or when good neighborhoods/delta evaluations are unavailable, making each move too expensive.
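As a concrete instance of the 2-opt neighborhood mentioned above, here is a hedged first-improvement sketch for TSP on a symmetric distance matrix; the delta compares only the four edges touched by a segment reversal, a small-scale example of incremental evaluation:

#include <bits/stdc++.h>
using namespace std;

// First-improvement 2-opt pass for TSP (illustrative sketch).
// tour: permutation of cities; dist: symmetric distance matrix.
bool twoOptPass(vector<int>& tour, const vector<vector<double>>& dist) {
    int n = (int)tour.size();
    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 2; j < n; ++j) {
            if (i == 0 && j == n - 1) continue;        // same edge pair on the cycle
            int a = tour[i], b = tour[i + 1];
            int c = tour[j], d = tour[(j + 1) % n];
            // Delta from replacing edges (a,b),(c,d) with (a,c),(b,d):
            double delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d];
            if (delta < -1e-12) {                      // strictly improving
                reverse(tour.begin() + i + 1, tour.begin() + j + 1);
                return true;                           // first-choice: accept, rescan
            }
        }
    }
    return false;                                      // 2-opt local optimum
}

int main() {
    // Toy instance: 5 points on a line; a good tour visits them in x order.
    vector<pair<double,double>> pts = {{0,0},{3,0},{1,0},{4,0},{2,0}};
    int n = (int)pts.size();
    vector<vector<double>> dist(n, vector<double>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            dist[i][j] = hypot(pts[i].first - pts[j].first,
                               pts[i].second - pts[j].second);
    vector<int> tour = {0, 1, 2, 3, 4};
    while (twoOptPass(tour, dist)) {}                  // climb to a 2-opt optimum
    for (int c : tour) cout << c << ' ';
    cout << '\n';
    return 0;
}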

⚠️Common Mistakes

  1. Poor neighborhood design: Too small leads to slow progress or premature local optima; too large makes each iteration expensive. Start with a simple, problem-aware move (e.g., single-vertex flip in Max-Cut) and add compound moves only if needed.
  2. Ignoring incremental (delta) evaluation: Recomputing the whole objective per neighbor can turn what should be an O(1) delta update into an O(n) scan, making the method impractical. Derive and maintain gains carefully and update them after each move.
  3. Confusing local and global optima: Stopping at a local optimum is expected; if global quality matters, plan random restarts or diversification mechanisms (tabu, perturbations).
  4. Deterministic tie-breaking and cycling: Always picking the same order can create cycles or bias; randomize neighbor order or include a tabu list/visited set for short-term memory.
  5. Wrong acceptance condition: Mixing up maximization vs minimization or accepting non-improving moves unintentionally will derail the search. Encode acceptance as strict f(y) > f(x) or use well-defined sideways rules.
  6. Weak stopping criteria: Relying only on ‘no improvement’ can cause infinite loops on plateaus with noisy evaluations; cap iterations, time, or non-improving steps.
  7. Bad initialization: Starting from an extremely poor or infeasible solution can waste time; use constructive heuristics when available.
  8. Randomness pitfalls: Not seeding PRNGs properly or using modulo bias leads to hidden bugs; use std::mt19937 with std::uniform_* distributions (see the snippet after this list).
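A minimal sketch of the PRNG hygiene that item 8 recommends: seed std::mt19937 explicitly and draw through distribution objects rather than rand() % n, which suffers from modulo bias:

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Explicit, reproducible seed; swap in random_device{}() for varied runs.
    mt19937 rng(12345);

    // Unbiased integer in [0, n-1] -- avoids the modulo bias of rand() % n.
    int n = 10;
    uniform_int_distribution<int> pick(0, n - 1);
    cout << "index: " << pick(rng) << "\n";

    // Uniform real in [0, 1), e.g., for acceptance tests.
    uniform_real_distribution<double> coin(0.0, 1.0);
    cout << "u: " << coin(rng) << "\n";

    // Random neighbor order for first-choice scans.
    vector<int> order(n);
    iota(order.begin(), order.end(), 0);
    shuffle(order.begin(), order.end(), rng);
    return 0;
}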

Key Formulas

Steepest-Ascent Update

x_{t+1} = argmax_{y ∈ N(x_t)} f(y), accepted only if f(x_{t+1}) > f(x_t)

Explanation: At each step, choose the neighbor with the largest objective value, but only move if it strictly improves the current value. This defines the canonical hill-climbing move for maximization.

Local Optimality Condition

∀y ∈ N(x*): f(y) ≤ f(x*)

Explanation: A solution x* is locally optimal if no single allowed move leads to a better solution. This property depends on the chosen neighborhood.

Expected Restarts

E[R] = 1/p

Explanation: If each random restart independently reaches a target-quality solution with probability p, the expected number of restarts needed is 1/p. This models the benefit of random restarts.
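This is the mean of a geometric distribution. A short derivation, assuming restarts succeed independently with probability p:

P[R = k] = (1 − p)^{k−1} · p,  so  E[R] = Σ_{k≥1} k (1 − p)^{k−1} p = 1/p.

Halving p therefore doubles the expected number of restarts, which is why cheap reinitialization matters on highly multi-modal landscapes.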

Runtime Decomposition

T_total ≈ Σ_restarts (T_init + I · c_eval)

Explanation: Total time is approximately the sum over restarts of initialization time T_init plus the number of iterations I times the average neighbor-evaluation cost c_eval. It highlights where optimization (delta evaluation, smaller neighborhoods) pays off.

Max-Cut Flip Gain

Δ_k = Σ_j w_{kj} s_k s_j, with s_i ∈ {+1, −1}

Explanation: For Max-Cut with partition signs s_i ∈ {+1, −1}, flipping vertex k increases the cut by the sum of weights to same-side neighbors minus opposite-side neighbors. A positive Δ_k indicates an improving flip.

Gain Update for Neighbors

g_j ← g_j − 2 w_{jk} s_j s_k^old (after flipping k)

Explanation: After flipping vertex k in Max-Cut, the gain of a neighbor j changes by this amount in O(1) per edge. Maintaining gains yields fast incremental evaluation.

Iterations’ Work

steepest-ascent: O(|N(x)|) per step;  first-choice: O(1/q) expected samples

Explanation: A steepest-ascent step costs time proportional to scanning the neighborhood, while a first-choice step costs the expected number of samples until the first improvement, 1/q, where q is the fraction of improving neighbors.

Plateau

∀y ∈ N(x): f(y) ≤ f(x), with f(y) = f(x) for many y

Explanation: On a plateau, many neighbors have equal value and none strictly improves. Sideways moves or diversification are needed to progress.

Complexity Analysis

Let n denote problem size and |N(x)| the neighborhood size. A generic steepest-ascent iteration evaluates all neighbors, costing O(|N(x)|) per step. First-choice reduces this to the expected number of samples until the first improvement, O(1/q), where q is the fraction of improving neighbors; if improving moves are common, first-choice can be dramatically faster. The total runtime is roughly I × per-iteration cost, where I is the number of accepted moves before reaching a local optimum (or meeting a stopping criterion). Delta (incremental) evaluation can turn an O(n) or O(m) neighbor evaluation into O(1) or O(degree), reducing the dominant per-iteration cost.

For Max-Cut on a graph with n vertices and m edges using single-vertex flips, we can maintain per-vertex gains. Initialization computes gains in O(m). Each accepted move requires scanning all vertices to select the best (O(n)) and then updating gains in O(degree(v)) over incident edges. If we instead keep a priority queue keyed by gains, selection drops to O(log n) amortized but needs careful lazy updates; either way, the per-move edge-update cost is O(degree(v)), totaling O(∑_v degree(v)) = O(m) across a batch of moves that touches each vertex a bounded number of times. Space is O(n + m) for adjacency and gains.

For N-Queens (n = number of queens), with row and diagonal counters, evaluating a move’s conflict count is O(1). A first-choice step selects a conflicting queen and samples rows until finding a strictly better one; in the worst case this is O(n), but typically a small constant. Each accepted move updates three counters in O(1). A random-restart wrapper multiplies expected cost by the expected number of restarts 1/p, where p is the success probability per restart, yielding expected time ≈ (I × per-step cost + initialization) / p. Memory stays O(n) for position arrays and counters. In all cases, careful neighborhood and delta design determine practical performance.
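The “careful lazy updates” mentioned above can be sketched as follows. This is one illustrative pattern (not the only one): stale heap entries are detected on pop by comparing against the live gain array g that the delta updates keep current:

#include <bits/stdc++.h>
using namespace std;

// Lazy max-heap selection over per-vertex gains (illustrative sketch).
// Stale heap entries are skipped by re-checking against the live gain array g.
int popBestImproving(priority_queue<pair<long long,int>>& pq,
                     const vector<long long>& g) {
    while (!pq.empty()) {
        auto [gain, v] = pq.top();
        if (gain != g[v]) { pq.pop(); continue; } // stale entry: discard lazily
        if (gain <= 0) return -1;                 // best live gain is not improving
        pq.pop();
        return v;                                 // best strictly improving vertex
    }
    return -1;
}

int main() {
    vector<long long> g = {3, -1, 5, 0};
    priority_queue<pair<long long,int>> pq;
    for (int v = 0; v < (int)g.size(); ++v) pq.push({g[v], v});
    g[2] = -2;                 // simulate a delta update invalidating vertex 2
    pq.push({g[2], 2});        // re-push with the fresh gain
    cout << popBestImproving(pq, g) << "\n"; // picks vertex 0 (live gain 3)
    return 0;
}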

Code Examples

Steepest-Ascent Hill Climbing for Max-Cut with Incremental Gains
#include <bits/stdc++.h>
using namespace std;

// Max-Cut via single-vertex flips (steepest-ascent)
// - Graph: undirected, nonnegative integer edge weights
// - State: s[i] in {+1, -1} indicating partition side of vertex i
// - Gain g[i]: change in cut weight if we flip vertex i
// Delta for flipping k: sum_j w_{kj} * s[k] * s[j]
// After flipping k, update:
//   g[k] = -g[k]
//   For each neighbor j: g[j] += -2 * w_{jk} * s[j] * s_k_old

struct Edge { int to; int w; };

struct Graph {
    int n;
    vector<vector<Edge>> adj; // adjacency list
    explicit Graph(int n): n(n), adj(n) {}
    void addEdge(int u, int v, int w = 1) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }
};

struct MaxCutHC {
    Graph &G;
    vector<int> s;           // partition signs (+1 / -1)
    vector<long long> g;     // gain for flipping each vertex
    long long cutWeight = 0; // current cut weight
    mt19937 rng{ (uint32_t)chrono::high_resolution_clock::now().time_since_epoch().count() };

    explicit MaxCutHC(Graph &graph) : G(graph), s(G.n, 1), g(G.n, 0) {}

    void randomInit() {
        uniform_int_distribution<int> bit(0, 1);
        for (int i = 0; i < G.n; ++i) s[i] = bit(rng) ? 1 : -1;
        // compute initial cut and gains
        cutWeight = 0;
        fill(g.begin(), g.end(), 0);
        for (int u = 0; u < G.n; ++u) {
            for (auto [v, w] : G.adj[u]) if (u < v) {
                if (s[u] != s[v]) cutWeight += w;
            }
        }
        for (int u = 0; u < G.n; ++u) {
            long long gu = 0;
            for (auto [v, w] : G.adj[u]) gu += 1LL * w * s[u] * s[v];
            g[u] = gu; // positive means flipping u increases cut weight
        }
    }

    // Perform steepest-ascent until local optimum
    void run() {
        while (true) {
            int best = -1;
            long long bestGain = 0; // only accept strictly positive
            for (int u = 0; u < G.n; ++u) {
                if (g[u] > bestGain) {
                    bestGain = g[u];
                    best = u;
                }
            }
            if (best == -1) break; // local optimum reached
            // Apply flip of 'best'
            int k = best;
            int s_k_old = s[k];
            s[k] = -s[k];
            cutWeight += bestGain; // flipping increases cut by bestGain
            g[k] = -g[k];          // exact after flip
            for (auto [j, w] : G.adj[k]) {
                // Update neighbor gains in O(1) per edge using old sign of k
                g[j] += -2LL * w * s[j] * s_k_old;
            }
        }
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example graph: a 6-vertex weighted graph
    Graph G(6);
    vector<tuple<int,int,int>> edges = {
        {0,1,3},{0,2,2},{1,2,1},{1,3,4},{2,3,2},{2,4,3},{3,4,5},{3,5,1},{4,5,2}
    };
    for (auto [u,v,w] : edges) G.addEdge(u, v, w);

    MaxCutHC solver(G);
    solver.randomInit();
    long long initCut = solver.cutWeight;
    solver.run();

    cout << "Initial cut: " << initCut << "\n";
    cout << "Final cut: " << solver.cutWeight << "\n";
    cout << "Partition s: ";
    for (int i = 0; i < G.n; ++i) cout << (solver.s[i] == 1 ? "+1" : "-1") << (i+1 == G.n ? '\n' : ' ');

    return 0;
}

We maximize the cut weight by flipping one vertex at a time. Using signs s[i] in {+1, -1}, the gain of flipping k is Δ_k = sum_j w_{kj} s_k s_j. We maintain all gains and update them in O(degree(k)) after each accepted flip. Each iteration picks the vertex with the largest positive gain (steepest-ascent). The algorithm stops at a 1-flip local optimum where no single flip improves the cut.

Time: Initialization O(m), each iteration O(n + degree(k)), total roughly O(m + I·n + ∑ degree(k)) where I is the number of accepted flips.
Space: O(n + m) for the graph, signs, and gains.
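A natural extension, not part of the listing above, is a random-restart wrapper. This hypothetical helper assumes the Graph and MaxCutHC types from the example and simply keeps the best 1-flip local optimum over R independent runs:

// Hypothetical wrapper around the MaxCutHC solver above: run R independent
// restarts and keep the best 1-flip local optimum found.
long long bestOfRestarts(Graph &G, int R, vector<int> &bestPartition) {
    MaxCutHC solver(G);
    long long bestCut = LLONG_MIN;
    for (int r = 0; r < R; ++r) {
        solver.randomInit();   // fresh random partition
        solver.run();          // climb to a 1-flip local optimum
        if (solver.cutWeight > bestCut) {
            bestCut = solver.cutWeight;
            bestPartition = solver.s;
        }
    }
    return bestCut;
}
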
First-Choice Hill Climbing with Random Restarts for N-Queens (Min-Conflicts)
#include <bits/stdc++.h>
using namespace std;

// Solve N-Queens using first-choice local search with random restarts.
// State: queenRow[c] = row index of queen in column c.
// Objective: minimize total conflicts (attacking pairs). Goal is 0 conflicts.
// We maintain row and diagonal counters for O(1) conflict evaluation.

struct NQueensFirstChoice {
    int N;
    vector<int> queenRow;     // row of queen in column c
    vector<int> rowCnt;       // number of queens in row r
    vector<int> d1Cnt, d2Cnt; // main diag r+c in [0,2N-2], anti diag r-c+N-1
    mt19937 rng{ (uint32_t)chrono::high_resolution_clock::now().time_since_epoch().count() };

    explicit NQueensFirstChoice(int n): N(n), queenRow(n), rowCnt(n,0), d1Cnt(2*n-1,0), d2Cnt(2*n-1,0) {}

    void clearCounts() {
        fill(rowCnt.begin(), rowCnt.end(), 0);
        fill(d1Cnt.begin(), d1Cnt.end(), 0);
        fill(d2Cnt.begin(), d2Cnt.end(), 0);
    }

    void randomInit() {
        clearCounts();
        uniform_int_distribution<int> rowDist(0, N-1);
        for (int c = 0; c < N; ++c) {
            int r = rowDist(rng);
            queenRow[c] = r;
            rowCnt[r]++;
            d1Cnt[r + c]++;
            d2Cnt[r - c + N - 1]++;
        }
    }

    inline int conflictsAt(int c, int r) const {
        // conflicts if queen placed at (r,c), including existing queens there
        return rowCnt[r] + d1Cnt[r + c] + d2Cnt[r - c + N - 1];
    }

    inline int currentQueenConflicts(int c) const {
        int r = queenRow[c];
        // subtract 3 to exclude the queen itself counted in each line
        return conflictsAt(c, r) - 3;
    }

    bool isSolved() const {
        for (int c = 0; c < N; ++c) if (currentQueenConflicts(c) != 0) return false;
        return true;
    }

    bool firstChoiceStep(bool &moved) {
        // Pick a conflicting queen uniformly at random
        vector<int> conflicted;
        conflicted.reserve(N);
        for (int c = 0; c < N; ++c) if (currentQueenConflicts(c) > 0) conflicted.push_back(c);
        if (conflicted.empty()) { moved = false; return true; } // solved
        uniform_int_distribution<size_t> pick(0, conflicted.size()-1);
        int c = conflicted[pick(rng)];

        int r0 = queenRow[c];
        int curConf = currentQueenConflicts(c);

        // Temporarily remove queen from counts to evaluate target rows cleanly
        rowCnt[r0]--; d1Cnt[r0 + c]--; d2Cnt[r0 - c + N - 1]--;

        // Try rows in a random order and accept the first strictly better
        vector<int> rows(N); iota(rows.begin(), rows.end(), 0);
        shuffle(rows.begin(), rows.end(), rng);
        moved = false;
        for (int r : rows) {
            if (r == r0) continue;
            int newConf = rowCnt[r] + d1Cnt[r + c] + d2Cnt[r - c + N - 1];
            if (newConf < curConf) {
                queenRow[c] = r;
                moved = true;
                break;
            }
        }
        if (!moved) {
            // No improving row found; revert to original position
            queenRow[c] = r0;
        }
        // Place queen back (at updated position if moved)
        int rNew = queenRow[c];
        rowCnt[rNew]++; d1Cnt[rNew + c]++; d2Cnt[rNew - c + N - 1]++;
        return false;
    }

    // Solve with up to maxRestarts random restarts and maxSteps per restart
    bool solve(int maxRestarts, int maxStepsPerRestart) {
        for (int rs = 0; rs < maxRestarts; ++rs) {
            randomInit();
            int stall = 0;
            for (int step = 0; step < maxStepsPerRestart; ++step) {
                if (isSolved()) return true;
                bool moved = false;
                bool done = firstChoiceStep(moved);
                if (done && isSolved()) return true;
                if (!moved) {
                    // Count stalls; if too many consecutive non-moves, restart early
                    if (++stall >= N) break;
                } else {
                    stall = 0;
                }
            }
            // loop continues with a fresh random restart
        }
        return isSolved();
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 50; // try larger like 200 as well
    NQueensFirstChoice solver(N);
    bool ok = solver.solve(/*maxRestarts=*/50, /*maxStepsPerRestart=*/20000);
    if (!ok) {
        cout << "Failed to solve N-Queens for N=" << N << "\n";
        return 0;
    }
    cout << "Solved N-Queens for N=" << N << "\n";
    // Print a compact representation (row indices)
    for (int c = 0; c < N; ++c) {
        cout << solver.queenRow[c] << (c+1 == N ? '\n' : ' ');
    }
    return 0;
}

This is a first-choice (min-conflicts) local search for N-Queens. We maintain row and diagonal counters to evaluate the number of conflicts in O(1). Each step picks a conflicting queen and tries candidate rows in random order, accepting the first strictly improving move it finds. If no improvement is found for many consecutive steps, we perform a random restart. This illustrates fast neighbor sampling, incremental evaluation, and a simple diversification mechanism.

Time: Per accepted move O(1) updates; per step expected O(1/q) row checks (q = fraction of improving rows). Worst-case per step O(n). With R restarts and at most S steps each, O(R·S·n) worst-case but typically much lower.
Space: O(n) for positions and counters.
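As a gentler alternative to the full restarts used above, an iterated-local-search-style perturbation re-randomizes only a few columns and resumes the search, preserving most of the structure already built. This hypothetical helper assumes the NQueensFirstChoice struct from the example:

// Hypothetical ILS-style perturbation for the NQueensFirstChoice struct above:
// re-randomize k columns instead of restarting from scratch.
void perturb(NQueensFirstChoice &s, int k) {
    uniform_int_distribution<int> colDist(0, s.N - 1), rowDist(0, s.N - 1);
    for (int i = 0; i < k; ++i) {
        int c = colDist(s.rng);
        int r0 = s.queenRow[c];
        // remove queen c from the counters, then re-place it on a random row
        s.rowCnt[r0]--; s.d1Cnt[r0 + c]--; s.d2Cnt[r0 - c + s.N - 1]--;
        int r = rowDist(s.rng);
        s.queenRow[c] = r;
        s.rowCnt[r]++; s.d1Cnt[r + c]++; s.d2Cnt[r - c + s.N - 1]++;
    }
}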