
NP-Completeness

Key Points

  • NP-completeness classifies decision problems that are both in NP and as hard as any problem in NP via polynomial-time reductions.
  • SAT was the first NP-complete problem (Cook–Levin), and many classic problems reduce to and from SAT.
  • To prove a new problem is NP-complete, show it is in NP and reduce a known NP-complete problem to it in polynomial time.
  • If any NP-complete problem has a polynomial-time algorithm, then P = NP; otherwise, none of them do.
  • In practice, we use heuristics, approximation algorithms, parameterized algorithms, or pseudo-polynomial methods for NP-hard problems.
  • Reductions must be directionally correct and preserve YES/NO answers without super-polynomial blowup.
  • AI/ML tasks like feature selection and architecture search are often NP-hard, guiding the use of approximate or heuristic methods.
  • Verification (checking a certificate) is polynomial-time for NP problems even when finding a solution may be exponential-time.

Prerequisites

  • Propositional Logic and CNF: SAT and 3-SAT are formulated in Boolean logic; understanding clauses, literals, and CNF is essential.
  • Asymptotic Analysis (Big-O): NP, polynomial time, and exponential growth are defined and compared using asymptotic notation.
  • Graph Theory Basics: Many NP-complete problems (Clique, Vertex Cover, Hamiltonian Path) are graph problems.
  • Algorithmic Reductions: Proving NP-completeness requires constructing and reasoning about polynomial-time reductions.
  • Sets, Hashing, and Graph Representations: Efficient verifiers and reductions rely on adjacency lists/sets and fast membership checks.

Detailed Explanation


01 Overview

NP-completeness is a framework to reason about computational difficulty. It focuses on decision problems (YES/NO questions) and compares them by how efficiently they can be transformed into each other. A problem is in NP if, when the answer is YES, there exists a short certificate (witness) that can be checked in polynomial time. Intuitively, even if solving might be hard, checking a proposed solution is easy.

A problem L is NP-complete if (1) L is in NP and (2) every problem in NP can be reduced to L using a polynomial-time transformation that preserves YES/NO answers. Cook–Levin showed SAT is NP-complete; many other problems (3-SAT, Clique, Vertex Cover, Hamiltonian Path, Subset Sum, Graph Coloring, TSP decision) were later shown NP-complete by reductions from SAT or from each other.

The practical significance is twofold: if any NP-complete problem admits a polynomial-time algorithm, then all of NP does (P = NP); otherwise, none of them do (assuming P ≠ NP). This insight guides algorithm design: when facing NP-completeness, we favor approximations, heuristics, fixed-parameter strategies, or special-case restrictions where polynomial algorithms exist.

02 Intuition & Analogies

Imagine puzzles. Some puzzles are quick to verify once someone shows you a completed solution: a Sudoku grid, a jigsaw puzzle, or a maze path. Checking a completed Sudoku is easy: scan rows, columns, and boxes. But discovering that solution from scratch can be painfully hard. That gap—easy to check but potentially hard to find—captures the spirit of NP.

Now, think of reductions as puzzle translations. Suppose every Sudoku can be turned into a jigsaw puzzle in a way that preserves solvability and the translation itself is quick. If you had a magical fast jigsaw solver, you could solve any Sudoku by first translating it. That would make jigsaw at least as hard as Sudoku. In NP-completeness, SAT is like the “universal puzzle” to which many others can be translated. A problem is NP-complete if all NP puzzles can be efficiently translated into it and it still has efficiently checkable solutions.

Why do we care? Because if you’re trying to design an algorithm for a hard puzzle and you learn that countless other hard puzzles can be translated into yours, it’s a strong signal that a general fast algorithm is unlikely (unless a famous open problem, P vs NP, resolves unexpectedly). In daily life and engineering, we then switch tactics: restrict the puzzle (special cases), accept near-optimal solutions (approximations), focus on small parts (parameterized algorithms), or rely on good guesses (heuristics), rather than chasing a universal fast solver that likely doesn’t exist.

03 Formal Definition

Let Σ be a finite alphabet and Σ^{*} the set of finite strings over Σ. A decision problem is a language L ⊆ Σ^{*}. We say L ∈ P if there exists a deterministic Turing machine (or RAM model) that decides membership in L in time polynomial in the input length n = |x|. We say L ∈ NP if there exists a polynomial p and a polynomial-time verifier V(x, y) such that x ∈ L if and only if there exists a certificate y with |y| ≤ p(|x|) and V(x, y) = 1. A polynomial-time many-one (Karp) reduction from language A to language B, written A ≤_{p} B, is a function f: Σ^{*} → Σ^{*} computable in polynomial time such that for all x, x ∈ A if and only if f(x) ∈ B. A language L is NP-hard if for all A ∈ NP, A ≤_{p} L. A language L is NP-complete if L ∈ NP and L is NP-hard. Cook–Levin’s theorem states that SAT, the set of satisfiable Boolean formulas in CNF, is NP-complete. From SAT, one can show other problems are NP-complete via chains of Karp reductions (e.g., SAT ≤_{p} 3-SAT ≤_{p} Clique ≤_{p} Vertex Cover, etc.). The central open question is whether P = NP; equivalently, whether there exists an NP-complete language L with L ∈ P.

04 When to Use

Use NP-completeness when you need to understand whether to keep searching for a polynomial-time algorithm or to shift strategy. If your problem resembles known NP-complete problems (e.g., set selection under constraints, path visiting all nodes, exact partitioning, exact coloring), try to prove NP-completeness: (1) show it’s in NP by giving a polynomial-time verifier; (2) reduce a known NP-complete problem to yours in polynomial time. If successful, this justifies adopting alternatives: approximation algorithms (like 2-approx for Vertex Cover), heuristics (local search, SAT solvers), fixed-parameter algorithms (FPT) that are efficient for small parameters, pseudo-polynomial algorithms for numeric instances (Subset Sum/Knapsack), or exploiting special structure (planarity, bounded treewidth, small alphabet, monotonicity). In AI/ML, invoke NP-hardness to set realistic expectations: feature selection, neural architecture search, optimal decision trees, and clustering variants are often NP-hard; practical systems rely on relaxations (e.g., L1 regularization), greedy surrogates, MILP solvers with cuts, or metaheuristics. In software engineering, classify tasks early: if a planning/scheduling formulation is NP-complete, plan for timeouts, approximate guarantees, or parameterized controls rather than insisting on worst-case polynomial runtime.

⚠️Common Mistakes

  • Confusing NP with “not polynomial.” NP means “nondeterministic polynomial time,” characterized by polynomial-time verifiability, not confirmed intractability.
  • Proving NP-hardness in the wrong direction. You must reduce a known NP-complete problem to your target (A ≤_{p} B means B is at least as hard as A); reducing your problem to SAT only shows your problem is no harder than SAT.
  • Ignoring certificate length or verifier time when claiming L ∈ NP. Certificates must be polynomially bounded and verifiable in polynomial time.
  • Letting reductions blow up super-polynomially. The mapping must be computable in polynomial time with output size polynomial in input size and preserve YES/NO answers exactly.
  • Mixing decision and optimization improperly. NP-completeness is defined for decision problems; show the decision variant is NP-complete, then relate it to the optimization version.
  • Using heuristics as proofs. Empirical difficulty or failing to find an algorithm is not a proof of NP-hardness.
  • Overgeneralizing worst-case hardness. NP-complete in the worst case doesn’t preclude easy average cases, special structures, or practical tractability with modern solvers.
  • Forgetting parameters. An instance may be fixed-parameter tractable for a small parameter k even if the general problem is NP-complete.

Key Formulas

Containment

P ⊆ NP

Explanation: Every problem solvable in polynomial time is also verifiable in polynomial time. Whether the containment is strict (P ⊊ NP) is open.

Verifier-based Definition of NP

L ∈ NP ⟺ ∃ polynomial p and polynomial-time verifier V such that x ∈ L ⟺ ∃y with |y| ≤ p(|x|) and V(x, y) = 1

Explanation: A language is in NP if YES instances have polynomial-size certificates that a polynomial-time verifier can check. This captures “easy to check” even if “hard to find.”

Karp Reduction

A ≤_{p} B: ∃ polynomial-time computable f such that x ∈ A ⟺ f(x) ∈ B for all x

Explanation: A many-one reduction maps instances of A to instances of B preserving YES/NO answers and is computable in polynomial time. It shows B is at least as hard as A.

NP-Complete Definition

L is NP-complete ⟺ L ∈ NP and ∀A ∈ NP, A ≤_{p} L

Explanation: An NP-complete problem is in NP and every NP problem reduces to it. It is a 'hardest' member of NP up to polynomial-time reductions.

Polynomial Time

T(n) = O(n^{k}) for some constant k

Explanation: This expresses that an algorithm runs in time bounded by a polynomial in the input size. Polynomial time is the gold standard for tractability.

Exponential/Factorial Growth

O(2^{n}), O(n!)

Explanation: These bounds characterize brute-force search spaces for many NP-complete problems. They grow too fast for large n, making naive exact solutions infeasible.

Cook–Levin Theorem

SAT is NP-complete

Explanation: Boolean satisfiability is the canonical NP-complete problem. It serves as a starting point for many reductions to show other problems are NP-complete.

Pseudo-Polynomial Complexity

O(nS) for Subset Sum with n items and target sum S

Explanation: Here S is the numeric target sum. Runtime depends on numeric values, not just bit-length, so it is not polynomial in input length.

Approximation Ratio

ALG(x) / OPT(x) ≤ ρ for all instances x (minimization)

Explanation: For minimization problems, the worst-case ratio between algorithm cost and optimal cost defines the quality guarantee. Smaller is better.

FPT Runtime

O(f(k) · n^{O(1)})

Explanation: Fixed-parameter algorithms isolate exponential dependence to a parameter k, leaving polynomial dependence on input size n. Effective when k is small.

3-SAT to Clique Construction

Vertices: one per literal occurrence (3m total); edges: {u, v} for literals u, v in different clauses with u ≠ ¬v; clique of size m ⟺ formula satisfiable

Explanation: Build a graph with one vertex per literal per clause and connect compatible literals from different clauses. A clique of size m exists iff the formula is satisfiable.

Complexity Analysis

For NP problems, verification runs in polynomial time: given a certificate y of size polynomial in the input length n, the verifier V(x, y) runs in time polynomial in n. This contrasts with naive search, which often explores exponential-size spaces (e.g., 2^{n} assignments for SAT, or O(n!) permutations for Hamiltonian Path). Proving NP-completeness relies on polynomial-time reductions: the mapping f must be computable in polynomial time and produce an output whose size is polynomial in n; composition of reductions preserves polynomial bounds. In practice, the overhead from reductions is typically modest, often linear or quadratic in the instance size. For example, the classic 3-SAT → Clique reduction creates 3m vertices and O(m^{2}) edges (at most 9 edges per clause pair, minus conflicts), which is quadratic in the number of clauses. The Hamiltonian Path verifier operates in O(n + m) time by checking adjacency along the proposed path and ensuring each vertex appears once with a hash/set; its space is O(n) for bookkeeping. A 2-approximation for Vertex Cover via a simple edge-picking strategy runs in O(n + m) time and O(n) space, making it practical on large sparse graphs despite the problem’s NP-completeness. Any exact algorithm for Clique, Vertex Cover, or 3-SAT is believed to require super-polynomial time in the worst case; common exact methods (branch-and-bound, DPLL/CDCL for SAT) often perform well empirically but have exponential worst-case bounds. Parameterized algorithms achieve runtimes like f(k)·poly(n), e.g., Vertex Cover in O(2^{k}·(n + m)) via a bounded search tree, which is efficient when k is small. Pseudo-polynomial algorithms, such as O(nS) for Subset Sum, are efficient when numeric targets are small, but not polynomial in input bit-length in general.

Code Examples

Polynomial-Time Verifier for Hamiltonian Path (NP Membership Demo)
#include <bits/stdc++.h>
using namespace std;

// Verify whether 'path' is a Hamiltonian path in an undirected graph G.
// G: adjacency list for vertices 0..n-1
// path: a permutation of 0..n-1 representing the claimed Hamiltonian path
// Returns true if path visits each vertex exactly once and each consecutive
// pair in 'path' is connected by an edge in G.
bool verifyHamiltonianPath(const vector<vector<int>>& G, const vector<int>& path) {
    int n = (int)G.size();
    if ((int)path.size() != n) return false;

    // Build adjacency sets for O(1) edge-existence queries
    vector<unordered_set<int>> adj(n);
    for (int u = 0; u < n; ++u) {
        for (int v : G[u]) adj[u].insert(v);
    }

    // Check that 'path' is a permutation of 0..n-1 (no repeats, all present)
    vector<int> seen(n, 0);
    for (int v : path) {
        if (v < 0 || v >= n) return false;  // invalid vertex index
        if (seen[v]) return false;          // duplicate vertex
        seen[v] = 1;
    }

    // Check adjacency along the path
    for (int i = 0; i + 1 < n; ++i) {
        int u = path[i], v = path[i + 1];
        if (!adj[u].count(v)) return false; // missing required edge
    }
    return true;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example graph: a simple path 0-1-2-3 plus an extra edge
    int n = 4;
    vector<vector<int>> G(n);
    auto add_edge = [&](int u, int v){ G[u].push_back(v); G[v].push_back(u); };
    add_edge(0,1); add_edge(1,2); add_edge(2,3); // path edges
    add_edge(0,2);                               // extra edge

    vector<int> certificate = {0,1,2,3};
    cout << (verifyHamiltonianPath(G, certificate) ? "YES" : "NO") << "\n";

    // A wrong certificate (a valid permutation, but the edge 1-3 is missing)
    vector<int> bad = {0,2,1,3};
    cout << (verifyHamiltonianPath(G, bad) ? "YES" : "NO") << "\n";

    return 0;
}

This program illustrates NP membership: given a claimed Hamiltonian path (certificate), we verify it in polynomial time. We ensure the certificate is a permutation of vertices and that each consecutive pair is an edge. The algorithm uses hash sets for O(1) adjacency queries.

Time: O(n + m). Space: O(n + m).
3-SAT to Clique Reduction and Small-Instance Checker
#include <bits/stdc++.h>
using namespace std;

// Literal encoding: for variable x in [1..n]
//   positive literal  x -> +x
//   negative literal ~x -> -x
// A CNF clause is a vector<int> of literals (here of size exactly 3).

struct Graph {
    int N;                   // number of vertices
    vector<vector<int>> adj; // adjacency list
};

// Build the standard graph for the 3-SAT -> Clique reduction.
// Given m clauses (each 3 literals), create 3m vertices: one per literal.
// Connect vertices from different clauses if they are NOT complementary literals.
// A clique of size m exists iff the formula is satisfiable.
Graph buildCliqueReduction(const vector<vector<int>>& clauses) {
    int m = (int)clauses.size();
    Graph G; G.N = 3 * m; G.adj.assign(G.N, {});

    auto conflict = [&](int a, int b){
        // Return true if literals a and b are complementary (x and ~x)
        return a == -b;
    };

    // Map (clause i, position j) -> vertex id v = 3*i + j
    auto vid = [&](int i, int j){ return 3 * i + j; };

    // Build edges between literals from different clauses when not in conflict
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < 3; ++j) {
            for (int i2 = i + 1; i2 < m; ++i2) {
                for (int j2 = 0; j2 < 3; ++j2) {
                    int a = clauses[i][j];
                    int b = clauses[i2][j2];
                    if (!conflict(a, b)) {
                        int u = vid(i, j), v = vid(i2, j2);
                        G.adj[u].push_back(v);
                        G.adj[v].push_back(u);
                    }
                }
            }
        }
    }
    return G;
}

// Check if there exists a clique of size m by selecting exactly one vertex per clause
// ensuring pairwise adjacency (this is tailored to the reduction structure).
bool existsCliqueSizeM(const Graph& G, const vector<vector<int>>& clauses) {
    int m = (int)clauses.size();
    // Precompute an adjacency matrix for O(1) lookups on small instances
    vector<vector<char>> adjmat(G.N, vector<char>(G.N, 0));
    for (int u = 0; u < G.N; ++u) for (int v : G.adj[u]) adjmat[u][v] = 1;

    vector<int> choice(m, -1); // chosen literal index j in clause i

    function<bool(int)> dfs = [&](int i) -> bool {
        if (i == m) return true; // chose one literal per clause, all consistent so far
        for (int j = 0; j < 3; ++j) {
            int u = 3 * i + j;
            bool ok = true;
            // check adjacency with all previously chosen vertices
            for (int ip = 0; ip < i && ok; ++ip) {
                int up = 3 * ip + choice[ip];
                if (!adjmat[u][up]) ok = false;
            }
            if (ok) {
                choice[i] = j;
                if (dfs(i + 1)) return true;
                choice[i] = -1;
            }
        }
        return false;
    };
    return dfs(0);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example formula: (x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ x4) ∧ (x2 ∨ x3 ∨ ¬x4)
    // Variables are 1-indexed in literals; negative for negation.
    vector<vector<int>> clauses = {
        {+1, -2, +3},
        {-1, +2, +4},
        {+2, +3, -4}
    };

    Graph G = buildCliqueReduction(clauses);
    bool sat_via_clique = existsCliqueSizeM(G, clauses);
    cout << (sat_via_clique ? "SAT" : "UNSAT") << "\n";

    // A clearly UNSAT small instance: (x1) ∧ (¬x1) duplicated to fit 3-SAT form
    vector<vector<int>> clauses2 = {
        {+1, +1, +1},
        {-1, -1, -1}
    };
    Graph G2 = buildCliqueReduction(clauses2);
    cout << (existsCliqueSizeM(G2, clauses2) ? "SAT" : "UNSAT") << "\n";

    return 0;
}

This program constructs the graph used in the classic 3-SAT → Clique reduction: one vertex per literal per clause; edges connect compatible literals from different clauses. It then checks, for small inputs, whether there is a clique of size m by picking one literal per clause with full pairwise adjacency. The existence of such a clique is equivalent to satisfiability of the original 3-CNF formula.

Time: graph construction O(m^2) (at most 9 comparisons per clause pair); backtracking check O(3^m) in the worst case (exponential). Space: O(m^2) for the adjacency matrix; O(m) recursion stack.
A Simple 2-Approximation for Vertex Cover
#include <bits/stdc++.h>
using namespace std;

// 2-approximation for Vertex Cover on an undirected graph.
// Repeatedly pick an arbitrary uncovered edge (u,v), add both endpoints to the cover,
// and remove all incident edges. This guarantees a factor-2 approximation.
vector<int> vertexCover2Approx(int n, const vector<pair<int,int>>& edges) {
    vector<char> removed(edges.size(), 0); // marks edges already covered
    vector<char> inCover(n, 0);

    // While there exists an uncovered edge, add both endpoints
    for (size_t i = 0; i < edges.size(); ++i) {
        if (removed[i]) continue;
        int u = edges[i].first, v = edges[i].second;
        // Add both endpoints to the cover
        inCover[u] = inCover[v] = 1;
        // Remove all edges incident to u or v
        for (size_t j = i; j < edges.size(); ++j) {
            if (removed[j]) continue;
            int a = edges[j].first, b = edges[j].second;
            if (a == u || a == v || b == u || b == v) removed[j] = 1;
        }
    }

    vector<int> cover;
    for (int v = 0; v < n; ++v) if (inCover[v]) cover.push_back(v);
    return cover;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 6;
    vector<pair<int,int>> edges = {
        {0,1},{1,2},{2,3},{3,4},{4,5},{0,5},{1,4}
    };

    vector<int> cover = vertexCover2Approx(n, edges);
    cout << "Vertex cover (2-approx) size = " << cover.size() << "\n";
    cout << "Vertices:";
    for (int v : cover) cout << ' ' << v;
    cout << "\n";

    return 0;
}

This algorithm repeatedly selects an uncovered edge and adds both endpoints to the cover. Every step matches at least one edge of any optimal solution, yielding a cover at most twice optimal. It illustrates how we cope with NP-completeness by using efficient approximation algorithms.

Time: O(m^2) in this simple implementation; with adjacency structures it can be reduced to O(m). Space: O(n + m).