Topological Sort
Key Points
- •Topological sort orders the nodes of a directed acyclic graph (DAG) so every edge points from left to right in the order.
- •Kahn’s algorithm repeatedly removes nodes with in-degree 0 using a queue (or priority queue) and detects cycles if nodes remain.
- •A DFS-based approach pushes a node to the front (or a stack) when its recursive exploration finishes; a back-edge signals a cycle.
- •Both Kahn’s and DFS-based topological sorts run in O(n + m) time and O(n + m) space for n nodes and m edges.
- •If the graph has a cycle, no topological order exists; both methods can detect this and report failure.
- •Using a min-heap instead of a queue yields the lexicographically smallest valid topological order.
- •Topological order enables efficient dynamic programming on DAGs, like longest path and counting paths.
- •Common pitfalls include mixing up in-degree vs out-degree, forgetting to handle multiple sources, or using DFS without cycle checks.
Prerequisites
- →Graphs and Directed Graphs — Understanding vertices, edges, and directed edges is essential to model the problem as a DAG.
- →Breadth-First Search (BFS) and Queues — Kahn’s algorithm is BFS-like and uses a queue to manage zero in-degree vertices.
- →Depth-First Search (DFS) and Recursion — The DFS-based approach relies on recursion/stack and finishing times.
- →Asymptotic Complexity (Big-O) — To analyze that both algorithms run in O(n + m) time and O(n + m) space.
- →Priority Queues / Heaps — Needed to produce lexicographically smallest topological orders efficiently.
- →Dynamic Programming Basics — Topological order is often used to structure DP on DAGs such as longest path.
Detailed Explanation
Tap terms for definitions01Overview
Topological sort is a way to line up the vertices of a directed acyclic graph (DAG) so that every directed edge (u, v) goes from an earlier vertex u to a later vertex v in the order. Think of it as arranging tasks with dependencies: a task must appear after all the tasks it depends on. Two classic algorithms compute this order. Kahn’s algorithm uses indegrees and a queue: it repeatedly takes any vertex with no incoming edges, outputs it, and removes its outgoing edges, possibly creating new vertices with indegree zero. The DFS-based algorithm performs a depth-first search; when a vertex finishes (all descendants explored), it is added to the front of the list (or pushed to a stack), yielding a reverse of finishing times. Both approaches also detect cycles: Kahn’s algorithm reports a cycle if not all vertices can be removed; DFS detects a cycle when encountering an edge to a vertex currently in the recursion stack. Topological sorting is fundamental for scheduling, build systems (compiling modules with dependencies), resolving package installs, and for dynamic programming on DAGs such as longest path or counting paths. While there can be many valid orders, any one suffices unless a specific tie-breaking rule (like lexicographically smallest) is required.
02Intuition & Analogies
Imagine you are organizing a cooking show. Some steps must happen before others: you must knead dough before baking, and preheat the oven before baking, but you can chop vegetables whenever. Your goal is to write a plan so that every dependency is respected. The steps are like vertices, and a directed edge from A to B means “do A before B.” A valid schedule that respects all such arrows is a topological order. Kahn’s algorithm is like repeatedly picking tasks that are currently unblocked. At the start, any task without prerequisites (in-degree 0) can be done. You pick one, do it, and by doing so, you remove it from everyone else’s prerequisites. That might unblock new tasks, which you then also pick. If at some point nothing is unblocked but you still have tasks left, you must have a circular dependency (e.g., A before B, B before C, and C before A) — a cycle makes planning impossible. The DFS method feels like writing a to-do list by first diving into a task’s prerequisites before listing the task itself. You recursively visit a task’s dependencies, and only when all of them are handled do you write down the current task. This ensures every task appears after its prerequisites. If along the way you revisit a task that’s currently being explored (in the recursion stack), you spotted a cycle — an impossible plan. Both views mirror real life: you can either keep doing whatever is currently unblocked (Kahn), or you can trace prerequisites back to the deepest requirement and come back up (DFS). The final list is any valid schedule that obeys all arrows.
03Formal Definition
04When to Use
Use topological sort whenever you have a set of items with precedence constraints and no cycles. Typical cases include scheduling tasks in project management, resolving compilation order in build systems (a file depends on others), ordering database migrations, or installing software packages with dependencies. In competitive programming, it commonly appears as: Course Schedule problems, reconstructing a valid order of letters or tasks from constraints, and tie-breaking to produce the lexicographically smallest valid order. It is also the backbone of dynamic programming on DAGs. If you process vertices in topological order, every edge (u, v) goes forward, ensuring that when you compute a value for v that depends on u, u has already been computed. This enables linear-time solutions for longest paths in DAGs, counting the number of paths from a source to all nodes, shortest paths in DAGs with possibly negative weights (since no cycles), and many DP formulations on directed acyclic state graphs. Use Kahn’s algorithm for easy cycle detection and control over tie-breaking via a queue or priority queue; use DFS when recursion is natural or when you also need to find cycles and strongly connected components adjacently (with modifications).
⚠️Common Mistakes
• Confusing in-degree with out-degree in Kahn’s algorithm: you must push vertices whose in-degree is zero, not out-degree. Double-check that you increment indegree[v] for each edge (u, v). • Forgetting to handle multiple starting vertices: many DAGs have several sources (in-degree 0). Initialize the queue with all of them, not just one. • Not detecting cycles: in Kahn’s algorithm, if the output size is less than n, report a cycle. In DFS, you must use a three-color scheme (unvisited, visiting, finished) or an explicit recursion stack; revisiting a visiting node signals a cycle. • Ignoring lexicographic requirements: if the problem asks for the smallest (or largest) valid order, use a priority queue/min-heap (or max-heap) instead of a simple queue in Kahn’s algorithm. • Recursion depth issues in DFS: deep graphs can overflow the stack in languages like C++. Consider iterative DFS or increase recursion limits when needed. • Mishandling 1-indexed vs 0-indexed nodes and off-by-one errors when reading input. • Modifying the graph while iterating without care: when you "remove" edges in Kahn’s algorithm, you should conceptually remove them by decrementing indegrees; do not erase from adjacency lists inside traversal loops carelessly. • Forgetting that multiple valid orders can exist: don’t overfit to a single expected output unless the problem constrains tie-breaking.
Key Formulas
Time Complexity of Topological Sort
Explanation: Processing each vertex and edge a constant number of times gives linear time in the size of the graph. Both Kahn’s and DFS-based algorithms achieve this bound.
Sum of In-degrees
Explanation: Each directed edge contributes exactly 1 to the in-degree of its head, so the total in-degrees equal the number of edges. This justifies O(m) total indegree decrements in Kahn’s algorithm.
Existence Condition
Explanation: A topological ordering exists if and only if the directed graph has no cycles. Any cycle would force a vertex to appear both before and after itself.
Order Constraint
Explanation: In a topological order \(\), every edge must point forward in the order. This is the defining constraint that the algorithm enforces.
Longest Path on DAG
Explanation: Processing vertices in topological order ensures when computing dp[v], all dp[u] for incoming neighbors are known. With unit weights, use w(u, v) = 1.
Counting Paths on DAG
Explanation: The number of paths to v is the sum of paths to all its predecessors. Topological order guarantees prerequisites are computed first.
Space Complexity
Explanation: Storing an adjacency list and auxiliary arrays (in-degree or color, and the output list) requires space proportional to vertices plus edges.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Reads a directed graph and prints a topological ordering if it exists. 5 // If the graph has a cycle, prints "IMPOSSIBLE". 6 // Input format (example): 7 // n m 8 // u1 v1 9 // u2 v2 10 // ... (1-indexed vertices) 11 int main() { 12 ios::sync_with_stdio(false); 13 cin.tie(nullptr); 14 15 int n, m; 16 if (!(cin >> n >> m)) return 0; 17 18 vector<vector<int>> g(n + 1); 19 vector<int> indeg(n + 1, 0); 20 21 for (int i = 0; i < m; ++i) { 22 int u, v; cin >> u >> v; 23 g[u].push_back(v); 24 indeg[v]++; // increment in-degree of v 25 } 26 27 queue<int> q; // vertices with in-degree 0 28 for (int v = 1; v <= n; ++v) { 29 if (indeg[v] == 0) q.push(v); 30 } 31 32 vector<int> order; 33 order.reserve(n); 34 35 while (!q.empty()) { 36 int u = q.front(); q.pop(); 37 order.push_back(u); 38 // "Remove" u's outgoing edges by decrementing indegrees 39 for (int v : g[u]) { 40 if (--indeg[v] == 0) { 41 q.push(v); 42 } 43 } 44 } 45 46 if ((int)order.size() != n) { 47 cout << "IMPOSSIBLE\n"; // Graph has a cycle 48 return 0; 49 } 50 51 for (int i = 0; i < n; ++i) { 52 if (i) cout << ' '; 53 cout << order[i]; 54 } 55 cout << '\n'; 56 return 0; 57 } 58
This is the classic Kahn’s algorithm. We compute in-degrees, initialize a queue with all sources (in-degree 0), and repeatedly pop a vertex, append it to the order, and decrement the in-degrees of its neighbors. If at the end we have not output all vertices, a cycle prevented us from finding a topological order.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // DFS-based topological sort. Detects cycles using colors: 5 // 0 = unvisited, 1 = visiting (in recursion stack), 2 = finished. 6 // On a cycle, prints "IMPOSSIBLE". 7 8 int n, m; 9 vector<vector<int>> g; 10 vector<int> color; 11 vector<int> order; // will hold vertices in reverse finishing time 12 bool has_cycle = false; 13 14 void dfs(int u) { 15 color[u] = 1; // visiting 16 for (int v : g[u]) { 17 if (color[v] == 0) { 18 dfs(v); 19 if (has_cycle) return; 20 } else if (color[v] == 1) { 21 // back edge found -> cycle 22 has_cycle = true; 23 return; 24 } 25 } 26 color[u] = 2; // finished 27 order.push_back(u); 28 } 29 30 int main() { 31 ios::sync_with_stdio(false); 32 cin.tie(nullptr); 33 34 if (!(cin >> n >> m)) return 0; 35 g.assign(n + 1, {}); 36 color.assign(n + 1, 0); 37 38 for (int i = 0; i < m; ++i) { 39 int u, v; cin >> u >> v; 40 g[u].push_back(v); 41 } 42 43 // Optional: for deterministic order, you can sort adjacency lists 44 // for (int u = 1; u <= n; ++u) sort(g[u].begin(), g[u].end()); 45 46 for (int u = 1; u <= n; ++u) { 47 if (color[u] == 0) { 48 dfs(u); 49 if (has_cycle) break; 50 } 51 } 52 53 if (has_cycle) { 54 cout << "IMPOSSIBLE\n"; 55 return 0; 56 } 57 58 reverse(order.begin(), order.end()); 59 for (int i = 0; i < (int)order.size(); ++i) { 60 if (i) cout << ' '; 61 cout << order[i]; 62 } 63 cout << '\n'; 64 return 0; 65 } 66
We run DFS from all unvisited nodes, marking nodes as visiting and then finished. If we see an edge to a visiting node, we found a back edge and hence a cycle. By pushing a node after exploring all its neighbors, the reversed list of finishing times is a valid topological order for a DAG.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Produces the lexicographically smallest topological ordering 5 // by always taking the smallest available zero in-degree vertex. 6 int main() { 7 ios::sync_with_stdio(false); 8 cin.tie(nullptr); 9 10 int n, m; if (!(cin >> n >> m)) return 0; 11 vector<vector<int>> g(n + 1); 12 vector<int> indeg(n + 1, 0); 13 for (int i = 0; i < m; ++i) { 14 int u, v; cin >> u >> v; 15 g[u].push_back(v); 16 indeg[v]++; 17 } 18 19 priority_queue<int, vector<int>, greater<int>> pq; // min-heap 20 for (int v = 1; v <= n; ++v) if (indeg[v] == 0) pq.push(v); 21 22 vector<int> order; order.reserve(n); 23 while (!pq.empty()) { 24 int u = pq.top(); pq.pop(); 25 order.push_back(u); 26 for (int v : g[u]) { 27 if (--indeg[v] == 0) pq.push(v); 28 } 29 } 30 31 if ((int)order.size() != n) { 32 cout << "IMPOSSIBLE\n"; 33 return 0; 34 } 35 36 for (int i = 0; i < n; ++i) { 37 if (i) cout << ' '; 38 cout << order[i]; 39 } 40 cout << '\n'; 41 return 0; 42 } 43
This variant replaces the queue with a min-heap so that when multiple sources are available, we always pick the smallest label. Many problems require the lexicographically smallest valid order under a given labeling.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Computes longest path lengths (by number of edges) from a given source s in a DAG. 5 // If the graph contains a cycle, we report IMPOSSIBLE. 6 // Input: 7 // n m s 8 // u1 v1 9 // u2 v2 10 // ... (1-indexed) 11 12 int main() { 13 ios::sync_with_stdio(false); 14 cin.tie(nullptr); 15 16 int n, m, s; if (!(cin >> n >> m >> s)) return 0; 17 vector<vector<int>> g(n + 1); 18 vector<int> indeg(n + 1, 0); 19 for (int i = 0; i < m; ++i) { 20 int u, v; cin >> u >> v; 21 g[u].push_back(v); 22 indeg[v]++; 23 } 24 25 // Kahn's algorithm to get topo order (and detect cycle) 26 queue<int> q; 27 for (int v = 1; v <= n; ++v) if (indeg[v] == 0) q.push(v); 28 vector<int> topo; topo.reserve(n); 29 while (!q.empty()) { 30 int u = q.front(); q.pop(); 31 topo.push_back(u); 32 for (int v : g[u]) if (--indeg[v] == 0) q.push(v); 33 } 34 if ((int)topo.size() != n) { 35 cout << "IMPOSSIBLE\n"; // not a DAG 36 return 0; 37 } 38 39 const int NEG_INF = -1e9; 40 vector<int> dp(n + 1, NEG_INF), parent(n + 1, -1); 41 dp[s] = 0; // longest distance from s to s is 0 42 43 // Relax edges in topological order 44 for (int u : topo) { 45 if (dp[u] == NEG_INF) continue; // unreachable from s 46 for (int v : g[u]) { 47 if (dp[v] < dp[u] + 1) { 48 dp[v] = dp[u] + 1; 49 parent[v] = u; 50 } 51 } 52 } 53 54 // Output distances from s to all vertices; unreachable as -INF 55 for (int v = 1; v <= n; ++v) { 56 if (v > 1) cout << ' '; 57 if (dp[v] == NEG_INF) cout << "-INF"; else cout << dp[v]; 58 } 59 cout << '\n'; 60 61 // Example: reconstruct one longest path to a vertex with maximum dp 62 int best = s; 63 for (int v = 1; v <= n; ++v) if (dp[v] > dp[best]) best = v; 64 if (dp[best] < 0) { 65 cout << s << '\n'; // only the source is reachable 66 } else { 67 vector<int> path; 68 for (int v = best; v != -1; v = parent[v]) path.push_back(v); 69 reverse(path.begin(), path.end()); 70 for (int i = 0; i < (int)path.size(); ++i) { 71 if (i) cout << ' '; 72 cout << path[i]; 73 } 74 cout << '\n'; 75 } 76 77 return 0; 78 } 79
We first compute a topological order using Kahn’s algorithm. Then we perform dynamic programming along the order: for each edge (u, v), dp[v] = max(dp[v], dp[u] + 1). This yields the length of the longest path from source s to every vertex. We also keep parents to reconstruct one longest path to the furthest reachable vertex.