⚙️AlgorithmIntermediate

Topological Sort

Key Points

  • Topological sort orders the nodes of a directed acyclic graph (DAG) so every edge points from left to right in the order.
  • Kahn’s algorithm repeatedly removes nodes with in-degree 0 using a queue (or priority queue) and detects cycles if nodes remain.
  • A DFS-based approach pushes a node to the front (or a stack) when its recursive exploration finishes; a back-edge signals a cycle.
  • Both Kahn’s and DFS-based topological sorts run in O(n + m) time and O(n + m) space for n nodes and m edges.
  • If the graph has a cycle, no topological order exists; both methods can detect this and report failure.
  • Using a min-heap instead of a queue yields the lexicographically smallest valid topological order.
  • Topological order enables efficient dynamic programming on DAGs, like longest path and counting paths.
  • Common pitfalls include mixing up in-degree vs out-degree, forgetting to handle multiple sources, or using DFS without cycle checks.

Prerequisites

  • Graphs and Directed GraphsUnderstanding vertices, edges, and directed edges is essential to model the problem as a DAG.
  • Breadth-First Search (BFS) and QueuesKahn’s algorithm is BFS-like and uses a queue to manage zero in-degree vertices.
  • Depth-First Search (DFS) and RecursionThe DFS-based approach relies on recursion/stack and finishing times.
  • Asymptotic Complexity (Big-O)To analyze that both algorithms run in O(n + m) time and O(n + m) space.
  • Priority Queues / HeapsNeeded to produce lexicographically smallest topological orders efficiently.
  • Dynamic Programming BasicsTopological order is often used to structure DP on DAGs such as longest path.

Detailed Explanation

Tap terms for definitions

01Overview

Topological sort is a way to line up the vertices of a directed acyclic graph (DAG) so that every directed edge (u, v) goes from an earlier vertex u to a later vertex v in the order. Think of it as arranging tasks with dependencies: a task must appear after all the tasks it depends on. Two classic algorithms compute this order. Kahn’s algorithm uses indegrees and a queue: it repeatedly takes any vertex with no incoming edges, outputs it, and removes its outgoing edges, possibly creating new vertices with indegree zero. The DFS-based algorithm performs a depth-first search; when a vertex finishes (all descendants explored), it is added to the front of the list (or pushed to a stack), yielding a reverse of finishing times. Both approaches also detect cycles: Kahn’s algorithm reports a cycle if not all vertices can be removed; DFS detects a cycle when encountering an edge to a vertex currently in the recursion stack. Topological sorting is fundamental for scheduling, build systems (compiling modules with dependencies), resolving package installs, and for dynamic programming on DAGs such as longest path or counting paths. While there can be many valid orders, any one suffices unless a specific tie-breaking rule (like lexicographically smallest) is required.

02Intuition & Analogies

Imagine you are organizing a cooking show. Some steps must happen before others: you must knead dough before baking, and preheat the oven before baking, but you can chop vegetables whenever. Your goal is to write a plan so that every dependency is respected. The steps are like vertices, and a directed edge from A to B means “do A before B.” A valid schedule that respects all such arrows is a topological order. Kahn’s algorithm is like repeatedly picking tasks that are currently unblocked. At the start, any task without prerequisites (in-degree 0) can be done. You pick one, do it, and by doing so, you remove it from everyone else’s prerequisites. That might unblock new tasks, which you then also pick. If at some point nothing is unblocked but you still have tasks left, you must have a circular dependency (e.g., A before B, B before C, and C before A) — a cycle makes planning impossible. The DFS method feels like writing a to-do list by first diving into a task’s prerequisites before listing the task itself. You recursively visit a task’s dependencies, and only when all of them are handled do you write down the current task. This ensures every task appears after its prerequisites. If along the way you revisit a task that’s currently being explored (in the recursion stack), you spotted a cycle — an impossible plan. Both views mirror real life: you can either keep doing whatever is currently unblocked (Kahn), or you can trace prerequisites back to the deepest requirement and come back up (DFS). The final list is any valid schedule that obeys all arrows.

03Formal Definition

Given a directed graph G = (V, E), a topological ordering is a bijection \(: V \{1, 2, , \}\) such that for every edge \((u, v) E\), we have \((u) < (v)\). In other words, the position of u precedes the position of v in the linear order. A topological ordering exists if and only if the graph is acyclic; i.e., G is a directed acyclic graph (DAG). Kahn’s algorithm maintains an in-degree count \((v)\) for each vertex v. It repeatedly selects any vertex with \((v) = 0\), appends it to the order, and deletes its outgoing edges, decrementing the indegrees of its neighbors. If after processing, the number of output vertices is less than \(\), then a directed cycle exists. The DFS-based algorithm performs a depth-first search on G. Let color[v] track states: 0 = unvisited, 1 = visiting, 2 = finished. When exploring an edge (u, v), if color[v] = 1, a back edge is found and hence a cycle exists. Otherwise, after fully exploring u’s outgoing edges, append u to a list. Reversing this list yields a valid topological order since every edge points from an earlier finish time to a later finish time.

04When to Use

Use topological sort whenever you have a set of items with precedence constraints and no cycles. Typical cases include scheduling tasks in project management, resolving compilation order in build systems (a file depends on others), ordering database migrations, or installing software packages with dependencies. In competitive programming, it commonly appears as: Course Schedule problems, reconstructing a valid order of letters or tasks from constraints, and tie-breaking to produce the lexicographically smallest valid order. It is also the backbone of dynamic programming on DAGs. If you process vertices in topological order, every edge (u, v) goes forward, ensuring that when you compute a value for v that depends on u, u has already been computed. This enables linear-time solutions for longest paths in DAGs, counting the number of paths from a source to all nodes, shortest paths in DAGs with possibly negative weights (since no cycles), and many DP formulations on directed acyclic state graphs. Use Kahn’s algorithm for easy cycle detection and control over tie-breaking via a queue or priority queue; use DFS when recursion is natural or when you also need to find cycles and strongly connected components adjacently (with modifications).

⚠️Common Mistakes

• Confusing in-degree with out-degree in Kahn’s algorithm: you must push vertices whose in-degree is zero, not out-degree. Double-check that you increment indegree[v] for each edge (u, v). • Forgetting to handle multiple starting vertices: many DAGs have several sources (in-degree 0). Initialize the queue with all of them, not just one. • Not detecting cycles: in Kahn’s algorithm, if the output size is less than n, report a cycle. In DFS, you must use a three-color scheme (unvisited, visiting, finished) or an explicit recursion stack; revisiting a visiting node signals a cycle. • Ignoring lexicographic requirements: if the problem asks for the smallest (or largest) valid order, use a priority queue/min-heap (or max-heap) instead of a simple queue in Kahn’s algorithm. • Recursion depth issues in DFS: deep graphs can overflow the stack in languages like C++. Consider iterative DFS or increase recursion limits when needed. • Mishandling 1-indexed vs 0-indexed nodes and off-by-one errors when reading input. • Modifying the graph while iterating without care: when you "remove" edges in Kahn’s algorithm, you should conceptually remove them by decrementing indegrees; do not erase from adjacency lists inside traversal loops carelessly. • Forgetting that multiple valid orders can exist: don’t overfit to a single expected output unless the problem constrains tie-breaking.

Key Formulas

Time Complexity of Topological Sort

Explanation: Processing each vertex and edge a constant number of times gives linear time in the size of the graph. Both Kahn’s and DFS-based algorithms achieve this bound.

Sum of In-degrees

Explanation: Each directed edge contributes exactly 1 to the in-degree of its head, so the total in-degrees equal the number of edges. This justifies O(m) total indegree decrements in Kahn’s algorithm.

Existence Condition

Explanation: A topological ordering exists if and only if the directed graph has no cycles. Any cycle would force a vertex to appear both before and after itself.

Order Constraint

Explanation: In a topological order \(\), every edge must point forward in the order. This is the defining constraint that the algorithm enforces.

Longest Path on DAG

Explanation: Processing vertices in topological order ensures when computing dp[v], all dp[u] for incoming neighbors are known. With unit weights, use w(u, v) = 1.

Counting Paths on DAG

Explanation: The number of paths to v is the sum of paths to all its predecessors. Topological order guarantees prerequisites are computed first.

Space Complexity

Explanation: Storing an adjacency list and auxiliary arrays (in-degree or color, and the output list) requires space proportional to vertices plus edges.

Complexity Analysis

Both Kahn’s algorithm and the DFS-based approach run in linear time with respect to the size of the graph. Let n = |V| and m = |E|. In Kahn’s algorithm, we first compute indegrees in O(m) by scanning all edges. Each vertex is pushed to and popped from the queue at most once, contributing O(n). For each edge (u, v), we decrement indegree[v] exactly once. Therefore the total time is O(n + m). If we use a simple queue, each push/pop is O(1). If we require a lexicographically smallest topological order and use a min-heap priority queue, each push/pop is O(log n), making the total O((n + m) log n). The DFS-based algorithm visits each vertex once and explores each outgoing edge once. Entry, exit, and color checks are O(1) per edge or vertex, so the total is O(n + m). Detecting cycles using a 3-color scheme does not add asymptotic overhead. The final reversal of the finishing-time list is O(n). Space-wise, an adjacency list representation occupies O(n + m). Kahn’s algorithm additionally stores an indegree array O(n), a queue that may contain up to O(n) vertices, and the output order O(n), yielding O(n + m) total. DFS requires a color array O(n), the recursion stack depth up to O(n) in the worst case (which can be a practical concern), and the output list O(n), again totaling O(n + m) auxiliary space. In sum, both algorithms are optimal for sparse and dense graphs in terms of asymptotic complexity when using adjacency lists.

Code Examples

Kahn's Algorithm: Standard Topological Sort with Cycle Detection
1#include <bits/stdc++.h>
2using namespace std;
3
4// Reads a directed graph and prints a topological ordering if it exists.
5// If the graph has a cycle, prints "IMPOSSIBLE".
6// Input format (example):
7// n m
8// u1 v1
9// u2 v2
10// ... (1-indexed vertices)
11int main() {
12 ios::sync_with_stdio(false);
13 cin.tie(nullptr);
14
15 int n, m;
16 if (!(cin >> n >> m)) return 0;
17
18 vector<vector<int>> g(n + 1);
19 vector<int> indeg(n + 1, 0);
20
21 for (int i = 0; i < m; ++i) {
22 int u, v; cin >> u >> v;
23 g[u].push_back(v);
24 indeg[v]++; // increment in-degree of v
25 }
26
27 queue<int> q; // vertices with in-degree 0
28 for (int v = 1; v <= n; ++v) {
29 if (indeg[v] == 0) q.push(v);
30 }
31
32 vector<int> order;
33 order.reserve(n);
34
35 while (!q.empty()) {
36 int u = q.front(); q.pop();
37 order.push_back(u);
38 // "Remove" u's outgoing edges by decrementing indegrees
39 for (int v : g[u]) {
40 if (--indeg[v] == 0) {
41 q.push(v);
42 }
43 }
44 }
45
46 if ((int)order.size() != n) {
47 cout << "IMPOSSIBLE\n"; // Graph has a cycle
48 return 0;
49 }
50
51 for (int i = 0; i < n; ++i) {
52 if (i) cout << ' ';
53 cout << order[i];
54 }
55 cout << '\n';
56 return 0;
57}
58

This is the classic Kahn’s algorithm. We compute in-degrees, initialize a queue with all sources (in-degree 0), and repeatedly pop a vertex, append it to the order, and decrement the in-degrees of its neighbors. If at the end we have not output all vertices, a cycle prevented us from finding a topological order.

Time: O(n + m)Space: O(n + m)
DFS-based Topological Sort with 3-Color Cycle Detection
1#include <bits/stdc++.h>
2using namespace std;
3
4// DFS-based topological sort. Detects cycles using colors:
5// 0 = unvisited, 1 = visiting (in recursion stack), 2 = finished.
6// On a cycle, prints "IMPOSSIBLE".
7
8int n, m;
9vector<vector<int>> g;
10vector<int> color;
11vector<int> order; // will hold vertices in reverse finishing time
12bool has_cycle = false;
13
14void dfs(int u) {
15 color[u] = 1; // visiting
16 for (int v : g[u]) {
17 if (color[v] == 0) {
18 dfs(v);
19 if (has_cycle) return;
20 } else if (color[v] == 1) {
21 // back edge found -> cycle
22 has_cycle = true;
23 return;
24 }
25 }
26 color[u] = 2; // finished
27 order.push_back(u);
28}
29
30int main() {
31 ios::sync_with_stdio(false);
32 cin.tie(nullptr);
33
34 if (!(cin >> n >> m)) return 0;
35 g.assign(n + 1, {});
36 color.assign(n + 1, 0);
37
38 for (int i = 0; i < m; ++i) {
39 int u, v; cin >> u >> v;
40 g[u].push_back(v);
41 }
42
43 // Optional: for deterministic order, you can sort adjacency lists
44 // for (int u = 1; u <= n; ++u) sort(g[u].begin(), g[u].end());
45
46 for (int u = 1; u <= n; ++u) {
47 if (color[u] == 0) {
48 dfs(u);
49 if (has_cycle) break;
50 }
51 }
52
53 if (has_cycle) {
54 cout << "IMPOSSIBLE\n";
55 return 0;
56 }
57
58 reverse(order.begin(), order.end());
59 for (int i = 0; i < (int)order.size(); ++i) {
60 if (i) cout << ' ';
61 cout << order[i];
62 }
63 cout << '\n';
64 return 0;
65}
66

We run DFS from all unvisited nodes, marking nodes as visiting and then finished. If we see an edge to a visiting node, we found a back edge and hence a cycle. By pushing a node after exploring all its neighbors, the reversed list of finishing times is a valid topological order for a DAG.

Time: O(n + m)Space: O(n + m) including recursion stack
Lexicographically Smallest Topological Order (Kahn with Min-Heap)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Produces the lexicographically smallest topological ordering
5// by always taking the smallest available zero in-degree vertex.
6int main() {
7 ios::sync_with_stdio(false);
8 cin.tie(nullptr);
9
10 int n, m; if (!(cin >> n >> m)) return 0;
11 vector<vector<int>> g(n + 1);
12 vector<int> indeg(n + 1, 0);
13 for (int i = 0; i < m; ++i) {
14 int u, v; cin >> u >> v;
15 g[u].push_back(v);
16 indeg[v]++;
17 }
18
19 priority_queue<int, vector<int>, greater<int>> pq; // min-heap
20 for (int v = 1; v <= n; ++v) if (indeg[v] == 0) pq.push(v);
21
22 vector<int> order; order.reserve(n);
23 while (!pq.empty()) {
24 int u = pq.top(); pq.pop();
25 order.push_back(u);
26 for (int v : g[u]) {
27 if (--indeg[v] == 0) pq.push(v);
28 }
29 }
30
31 if ((int)order.size() != n) {
32 cout << "IMPOSSIBLE\n";
33 return 0;
34 }
35
36 for (int i = 0; i < n; ++i) {
37 if (i) cout << ' ';
38 cout << order[i];
39 }
40 cout << '\n';
41 return 0;
42}
43

This variant replaces the queue with a min-heap so that when multiple sources are available, we always pick the smallest label. Many problems require the lexicographically smallest valid order under a given labeling.

Time: O((n + m) log n)Space: O(n + m)
Longest Path in a DAG from a Given Source using Topological Order
1#include <bits/stdc++.h>
2using namespace std;
3
4// Computes longest path lengths (by number of edges) from a given source s in a DAG.
5// If the graph contains a cycle, we report IMPOSSIBLE.
6// Input:
7// n m s
8// u1 v1
9// u2 v2
10// ... (1-indexed)
11
12int main() {
13 ios::sync_with_stdio(false);
14 cin.tie(nullptr);
15
16 int n, m, s; if (!(cin >> n >> m >> s)) return 0;
17 vector<vector<int>> g(n + 1);
18 vector<int> indeg(n + 1, 0);
19 for (int i = 0; i < m; ++i) {
20 int u, v; cin >> u >> v;
21 g[u].push_back(v);
22 indeg[v]++;
23 }
24
25 // Kahn's algorithm to get topo order (and detect cycle)
26 queue<int> q;
27 for (int v = 1; v <= n; ++v) if (indeg[v] == 0) q.push(v);
28 vector<int> topo; topo.reserve(n);
29 while (!q.empty()) {
30 int u = q.front(); q.pop();
31 topo.push_back(u);
32 for (int v : g[u]) if (--indeg[v] == 0) q.push(v);
33 }
34 if ((int)topo.size() != n) {
35 cout << "IMPOSSIBLE\n"; // not a DAG
36 return 0;
37 }
38
39 const int NEG_INF = -1e9;
40 vector<int> dp(n + 1, NEG_INF), parent(n + 1, -1);
41 dp[s] = 0; // longest distance from s to s is 0
42
43 // Relax edges in topological order
44 for (int u : topo) {
45 if (dp[u] == NEG_INF) continue; // unreachable from s
46 for (int v : g[u]) {
47 if (dp[v] < dp[u] + 1) {
48 dp[v] = dp[u] + 1;
49 parent[v] = u;
50 }
51 }
52 }
53
54 // Output distances from s to all vertices; unreachable as -INF
55 for (int v = 1; v <= n; ++v) {
56 if (v > 1) cout << ' ';
57 if (dp[v] == NEG_INF) cout << "-INF"; else cout << dp[v];
58 }
59 cout << '\n';
60
61 // Example: reconstruct one longest path to a vertex with maximum dp
62 int best = s;
63 for (int v = 1; v <= n; ++v) if (dp[v] > dp[best]) best = v;
64 if (dp[best] < 0) {
65 cout << s << '\n'; // only the source is reachable
66 } else {
67 vector<int> path;
68 for (int v = best; v != -1; v = parent[v]) path.push_back(v);
69 reverse(path.begin(), path.end());
70 for (int i = 0; i < (int)path.size(); ++i) {
71 if (i) cout << ' ';
72 cout << path[i];
73 }
74 cout << '\n';
75 }
76
77 return 0;
78}
79

We first compute a topological order using Kahn’s algorithm. Then we perform dynamic programming along the order: for each edge (u, v), dp[v] = max(dp[v], dp[u] + 1). This yields the length of the longest path from source s to every vertex. We also keep parents to reconstruct one longest path to the furthest reachable vertex.

Time: O(n + m)Space: O(n + m)