Hill Climbing and Local Search
Key Points
- Hill climbing is an iterative optimization method that repeatedly moves to a better neighboring solution until no improvement is possible, reaching a local optimum.
- The neighborhood defines which solutions are considered 'nearby'; designing it well is crucial for both speed and solution quality.
- Steepest-ascent evaluates all neighbors and moves to the best improving one, while first-choice stops the scan at the first improvement it finds.
- Random restarts help escape poor local optima by re-seeding the search from new starting points.
- Tabu search avoids short cycles by temporarily forbidding recently visited moves or solutions (a short-term memory).
- Iterated local search alternates between perturbation and local optimization to jump between basins of attraction.
- Local search is widely used in scheduling, assignment, Max-Cut, and graph coloring where exact methods are too slow.
- Incremental (delta) evaluation often turns an expensive neighbor scan into an efficient O(1) or O(degree) update per move.
Prerequisites
- Greedy algorithms — Hill climbing uses local, greedy improvements at each step.
- Graph basics (vertices, edges, degrees) — Many applications like Max-Cut and graph coloring rely on graph structures and adjacency representations.
- Asymptotic complexity (Big-O) — To reason about per-iteration costs, neighborhood size, and total runtime.
- Probability and expectation — For understanding expected steps to improvement and the effect of random restarts.
- C++ STL and random utilities — Implementations rely on vectors, shuffling, and high-quality pseudo-random generators.
- Incremental (delta) evaluation — Core technique to make neighbor evaluation fast and practical on large instances.
- Constraint satisfaction problems (CSPs) — N-Queens and graph coloring are CSPs commonly solved by local search heuristics.
Detailed Explanation
01 Overview
Hill climbing and local search are families of algorithms for optimizing an objective when the search space is too large to explore exhaustively. They work by starting from an initial solution, repeatedly probing nearby solutions (a neighborhood), and moving to a neighbor that improves the objective. This greedy, iterative process stops when no neighbor is better, yielding a local optimum. Variants differ in how they pick moves (steepest-ascent vs first-choice), how they escape local optima (random restarts, perturbations), and how they prevent cycling (tabu lists). Because the algorithms only need local comparisons and often support very fast incremental evaluations, they scale well to large, real-world instances where exact algorithms are impractical. Local search is prevalent in combinatorial optimization problems like Max-Cut, N-Queens, graph coloring, and complex scheduling, and also serves as a building block inside metaheuristics such as simulated annealing and genetic algorithms. Despite their simplicity, performance hinges on a good neighborhood design and efficient move evaluation. The trade-off is that they provide no worst-case guarantee of global optimality, but they often find high-quality solutions quickly in practice.
02 Intuition & Analogies
Imagine hiking in thick fog with only a headlamp. You cannot see the whole mountain, but you can tell whether a small step uphill or downhill makes you higher or lower. Hill climbing is like always taking the uphill step that makes you a bit higher. Eventually, you may reach a spot where every small step goes downhill: that is a local peak (local optimum). This might not be the tallest mountain (global optimum), but with only local visibility it’s a sensible strategy. Now, what about different variants? Steepest-ascent is like carefully testing all possible small steps around you and choosing the steepest upward slope; it’s thorough but slower. First-choice is like taking the first uphill step you find without checking them all; it moves faster but may miss a slightly better direction. Random restart is like teleporting to random trailheads and starting again if you get stuck on a small hill; do this enough times and you improve your chances of finding a taller peak. Tabu search is the hiker keeping a short memory of the last few steps and refusing to backtrack immediately, which prevents going in circles. Iterated local search is like occasionally taking a purposeful detour (a jolt) before resuming your upward walk, helping you hop from one hill’s basin to another. The core idea is simple: use local information to make progress quickly, and add lightweight strategies to escape getting stuck.
03 Formal Definition
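A standard way to formalize the loop (notation is one common convention) is:

```latex
% Local search for maximizing f over a finite search space S,
% with neighborhood function N : S -> 2^S.
\[
  x_{t+1} \in \{\, y \in N(x_t) : f(y) > f(x_t) \,\},
  \qquad
  \text{stop when } \{\, y \in N(x_t) : f(y) > f(x_t) \,\} = \emptyset .
\]
% Steepest ascent picks x_{t+1} = \arg\max_{y \in N(x_t)} f(y) when an
% improving neighbor exists; x^* is a local optimum iff
% f(y) \le f(x^*) for all y \in N(x^*).
```

First-choice differs only in the selection rule: it accepts any element of the improving set, typically the first one found by random sampling.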
04 When to Use
Use hill climbing and local search when the search space is large, neighbors are easy to enumerate or sample, and evaluating (or incrementally updating) the objective is fast. Classic applications include graph partitioning (Max-Cut), graph coloring, SAT and CSPs (e.g., N-Queens via min-conflicts), scheduling/rostering with soft penalties, and layout/route optimization (2-opt for TSP). They shine as anytime methods: you can stop early and still have a usable solution. Prefer steepest-ascent when the neighborhood is modest and you can afford full scans; prefer first-choice when the neighborhood is huge but you can quickly find an improvement by sampling. Introduce random restarts when landscapes are highly multi-modal and you can reinitialize cheaply; add tabu or iterated local search when there are many plateaus or shallow traps and you need systematic diversification. Local search is also valuable as a subroutine inside metaheuristics (e.g., simulated annealing’s proposals refined by a quick hill climb, or genetic algorithms using local search as a mutation/refinement step). Avoid local search when exact optimality is required or when good neighborhoods/delta evaluations are unavailable, making each move too expensive.
⚠️ Common Mistakes
1. Poor neighborhood design: too small leads to slow progress or premature local optima; too large makes each iteration expensive. Start with a simple, problem-aware move (e.g., single-vertex flip in Max-Cut) and add compound moves only if needed.
2. Ignoring incremental (delta) evaluation: recomputing the whole objective per neighbor can turn an O(1) update into O(n), making the method impractical. Derive and maintain gains carefully and update them after each move.
3. Confusing local and global optima: stopping at a local optimum is expected; if global quality matters, plan random restarts or diversification mechanisms (tabu, perturbations).
4. Deterministic tie-breaking and cycling: always picking the same order can create cycles or bias; randomize neighbor order or include a tabu list/visited set for short-term memory.
5. Wrong acceptance condition: mixing up maximization vs minimization, or accepting non-improving moves unintentionally, will derail the search. Encode acceptance as strict f(y) > f(x) or use well-defined sideways rules.
6. Weak stopping criteria: relying only on 'no improvement' can cause infinite loops on plateaus with noisy evaluations; cap iterations, time, or non-improving steps.
7. Bad initialization: starting from an extremely poor or infeasible solution can waste time; use constructive heuristics when available.
8. Randomness pitfalls: not seeding PRNGs properly or using modulo bias leads to hidden bugs; use std::mt19937 with std::uniform_* distributions.
Key Formulas
Steepest-Ascent Update
x ← argmax_{y ∈ N(x)} f(y), applied only if max_{y ∈ N(x)} f(y) > f(x)
Explanation: At each step, choose the neighbor with the largest objective value, but only move if it strictly improves the current value. This defines the canonical hill-climbing move for maximization.
Local Optimality Condition
f(y) ≤ f(x*) for all y ∈ N(x*)
Explanation: A solution x* is locally optimal if no single allowed move leads to a better solution. This property depends on the chosen neighborhood.
Expected Restarts
E[restarts] = 1/p
Explanation: If each random restart independently reaches a target-quality solution with probability p, the expected number of restarts needed is 1/p. This models the benefit of random restarts.
Runtime Decomposition
T_total ≈ Σ_restarts (T_init + iterations × c_eval)
Explanation: Total time is approximately the sum over restarts of initialization time plus the number of iterations times the average neighbor-evaluation cost c_eval. It highlights where optimization (delta evaluation, smaller neighborhoods) pays off.
Max-Cut Flip Gain
Δ_k = Σ_j w_{kj} s_k s_j
Explanation: For Max-Cut with partition signs s_i ∈ {+1, −1}, flipping vertex k increases the cut by the sum of weights to same-side neighbors minus the weights to opposite-side neighbors. A positive Δ_k indicates an improving flip.
Gain Update for Neighbors
g_j ← g_j − 2 w_{jk} s_j s_k_old
Explanation: After flipping vertex k in Max-Cut, the gain of a neighbor j changes by this amount (using k's sign before the flip), in O(1) per edge. Maintaining gains yields fast incremental evaluation.
Iterations’ Work
steepest-ascent: Θ(|N(x)|) evaluations per step; first-choice: expected 1/q samples per step
Explanation: A steepest-ascent step costs time proportional to scanning the neighborhood, while a first-choice step costs the expected number of samples until the first improvement, 1/q, where q is the fraction of improving neighbors.
Plateau
f(y) = f(x) for many y ∈ N(x), with no y ∈ N(x) such that f(y) > f(x)
Explanation: On a plateau, many neighbors have equal value and none strictly improves. Sideways moves or diversification are needed to progress.
Complexity Analysis
For the Max-Cut example below, one steepest-ascent iteration costs O(n) to scan all gains plus O(deg(k)) to update them after flipping vertex k; with integer weights, the number of accepted flips is bounded by the final cut weight, since each flip strictly increases the cut. For the N-Queens example, the row and diagonal counters make each candidate-row evaluation O(1), so one first-choice step costs at most O(N). Random restarts multiply these per-run costs by the expected number of restarts, 1/p.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Max-Cut via single-vertex flips (steepest-ascent)
// - Graph: undirected, nonnegative integer edge weights
// - State: s[i] in {+1, -1} indicating partition side of vertex i
// - Gain g[i]: change in cut weight if we flip vertex i
//   Delta for flipping k: sum_j w_{kj} * s[k] * s[j]
//   After flipping k, update:
//     g[k] = -g[k]
//     For each neighbor j: g[j] += -2 * w_{jk} * s[j] * s_k_old

struct Edge { int to; int w; };

struct Graph {
    int n;
    vector<vector<Edge>> adj; // adjacency list
    explicit Graph(int n): n(n), adj(n) {}
    void addEdge(int u, int v, int w = 1) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }
};

struct MaxCutHC {
    Graph &G;
    vector<int> s;           // partition signs (+1 / -1)
    vector<long long> g;     // gain for flipping each vertex
    long long cutWeight = 0; // current cut weight
    mt19937 rng{ (uint32_t)chrono::high_resolution_clock::now().time_since_epoch().count() };

    explicit MaxCutHC(Graph &graph) : G(graph), s(G.n, 1), g(G.n, 0) {}

    void randomInit() {
        uniform_int_distribution<int> bit(0, 1);
        for (int i = 0; i < G.n; ++i) s[i] = bit(rng) ? 1 : -1;
        // compute initial cut and gains
        cutWeight = 0;
        fill(g.begin(), g.end(), 0);
        for (int u = 0; u < G.n; ++u) {
            for (auto [v, w] : G.adj[u]) if (u < v) {
                if (s[u] != s[v]) cutWeight += w;
            }
        }
        for (int u = 0; u < G.n; ++u) {
            long long gu = 0;
            for (auto [v, w] : G.adj[u]) gu += 1LL * w * s[u] * s[v];
            g[u] = gu; // positive means flipping u increases cut weight
        }
    }

    // Perform steepest-ascent until local optimum
    void run() {
        while (true) {
            int best = -1;
            long long bestGain = 0; // only accept strictly positive
            for (int u = 0; u < G.n; ++u) {
                if (g[u] > bestGain) {
                    bestGain = g[u];
                    best = u;
                }
            }
            if (best == -1) break; // local optimum reached
            // Apply flip of 'best'
            int k = best;
            int s_k_old = s[k];
            s[k] = -s[k];
            cutWeight += bestGain; // flipping increases cut by bestGain
            g[k] = -g[k];          // exact after flip
            for (auto [j, w] : G.adj[k]) {
                // Update neighbor gains in O(1) per edge using old sign of k
                g[j] += -2LL * w * s[j] * s_k_old;
            }
        }
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example graph: a 6-vertex weighted graph
    Graph G(6);
    vector<tuple<int,int,int>> edges = {
        {0,1,3},{0,2,2},{1,2,1},{1,3,4},{2,3,2},{2,4,3},{3,4,5},{3,5,1},{4,5,2}
    };
    for (auto [u,v,w] : edges) G.addEdge(u, v, w);

    MaxCutHC solver(G);
    solver.randomInit();
    long long initCut = solver.cutWeight;
    solver.run();

    cout << "Initial cut: " << initCut << "\n";
    cout << "Final cut: " << solver.cutWeight << "\n";
    cout << "Partition s: ";
    for (int i = 0; i < G.n; ++i)
        cout << (solver.s[i] == 1 ? "+1" : "-1") << (i + 1 == G.n ? '\n' : ' ');

    return 0;
}
```
We maximize the cut weight by flipping one vertex at a time. Using signs s[i] in {+1, -1}, the gain of flipping k is Δ_k = sum_j w_{kj} s_k s_j. We maintain all gains and update them in O(degree(k)) after each accepted flip. Each iteration picks the vertex with the largest positive gain (steepest-ascent). The algorithm stops at a 1-flip local optimum where no single flip improves the cut.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Solve N-Queens using first-choice local search with random restarts.
// State: queenRow[c] = row index of queen in column c.
// Objective: minimize total conflicts (attacking pairs). Goal is 0 conflicts.
// We maintain row and diagonal counters for O(1) conflict evaluation.

struct NQueensFirstChoice {
    int N;
    vector<int> queenRow;     // row of queen in column c
    vector<int> rowCnt;       // number of queens in row r
    vector<int> d1Cnt, d2Cnt; // main diag r+c in [0,2N-2], anti diag r-c+N-1
    mt19937 rng{ (uint32_t)chrono::high_resolution_clock::now().time_since_epoch().count() };

    explicit NQueensFirstChoice(int n)
        : N(n), queenRow(n), rowCnt(n, 0), d1Cnt(2*n-1, 0), d2Cnt(2*n-1, 0) {}

    void clearCounts() {
        fill(rowCnt.begin(), rowCnt.end(), 0);
        fill(d1Cnt.begin(), d1Cnt.end(), 0);
        fill(d2Cnt.begin(), d2Cnt.end(), 0);
    }

    void randomInit() {
        clearCounts();
        uniform_int_distribution<int> rowDist(0, N - 1);
        for (int c = 0; c < N; ++c) {
            int r = rowDist(rng);
            queenRow[c] = r;
            rowCnt[r]++;
            d1Cnt[r + c]++;
            d2Cnt[r - c + N - 1]++;
        }
    }

    inline int conflictsAt(int c, int r) const {
        // queens sharing a line with (r,c), including any queen already there
        return rowCnt[r] + d1Cnt[r + c] + d2Cnt[r - c + N - 1];
    }

    inline int currentQueenConflicts(int c) const {
        int r = queenRow[c];
        // subtract 3 to exclude the queen itself counted once per line
        return conflictsAt(c, r) - 3;
    }

    bool isSolved() const {
        for (int c = 0; c < N; ++c)
            if (currentQueenConflicts(c) != 0) return false;
        return true;
    }

    bool firstChoiceStep(bool &moved) {
        // Pick a conflicting queen uniformly at random
        vector<int> conflicted;
        conflicted.reserve(N);
        for (int c = 0; c < N; ++c)
            if (currentQueenConflicts(c) > 0) conflicted.push_back(c);
        if (conflicted.empty()) { moved = false; return true; } // solved

        uniform_int_distribution<size_t> pick(0, conflicted.size() - 1);
        int c = conflicted[pick(rng)];

        int r0 = queenRow[c];
        int curConf = currentQueenConflicts(c);

        // Temporarily remove queen from counts to evaluate target rows cleanly
        rowCnt[r0]--; d1Cnt[r0 + c]--; d2Cnt[r0 - c + N - 1]--;

        // Try rows in a random order and accept the first strictly better one
        vector<int> rows(N);
        iota(rows.begin(), rows.end(), 0);
        shuffle(rows.begin(), rows.end(), rng);
        moved = false;
        for (int r : rows) {
            if (r == r0) continue;
            int newConf = rowCnt[r] + d1Cnt[r + c] + d2Cnt[r - c + N - 1];
            if (newConf < curConf) {
                queenRow[c] = r;
                moved = true;
                break;
            }
        }
        if (!moved) {
            // No improving row found; revert to original position
            queenRow[c] = r0;
        }
        // Place queen back (at updated position if moved)
        int rNew = queenRow[c];
        rowCnt[rNew]++; d1Cnt[rNew + c]++; d2Cnt[rNew - c + N - 1]++;
        return false;
    }

    // Solve with up to maxRestarts random restarts and maxSteps per restart
    bool solve(int maxRestarts, int maxStepsPerRestart) {
        for (int rs = 0; rs < maxRestarts; ++rs) {
            randomInit();
            int stall = 0;
            for (int step = 0; step < maxStepsPerRestart; ++step) {
                if (isSolved()) return true;
                bool moved = false;
                bool done = firstChoiceStep(moved);
                if (done && isSolved()) return true;
                if (!moved) {
                    // Too many consecutive non-moves: restart early
                    if (++stall >= N) break;
                } else {
                    stall = 0;
                }
            }
            // random restart on the next outer iteration
        }
        return isSolved();
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 50; // try larger like 200 as well
    NQueensFirstChoice solver(N);
    bool ok = solver.solve(/*maxRestarts=*/50, /*maxStepsPerRestart=*/20000);
    if (!ok) {
        cout << "Failed to solve N-Queens for N=" << N << "\n";
        return 0;
    }
    cout << "Solved N-Queens for N=" << N << "\n";
    // Print a compact representation (row indices)
    for (int c = 0; c < N; ++c) {
        cout << solver.queenRow[c] << (c + 1 == N ? '\n' : ' ');
    }
    return 0;
}
```
This is a first-choice (min-conflicts) local search for N-Queens. We maintain row and diagonal counters to evaluate the number of conflicts in O(1). Each step picks a conflicting queen and tries candidate rows in random order, accepting the first strictly improving move it finds. If no improvement is found for many consecutive steps, we perform a random restart. This illustrates fast neighbor sampling, incremental evaluation, and a simple diversification mechanism.