Tarjan's SCC Algorithm
Key Points
- •Tarjan’s algorithm finds all Strongly Connected Components (SCCs) of a directed graph in a single depth-first search using a stack.
- •Each vertex gets an index (discovery time) and a low-link value that captures the smallest index reachable by following zero or more edges and at most one back edge to a node still on the stack.
- •A vertex is the root of an SCC exactly when its low-link equals its index; popping the stack up to that root yields one SCC.
- •The algorithm runs in O(V + E) time and O(V) additional space, making it efficient for large sparse graphs.
- •The SCC condensation graph is a Directed Acyclic Graph (DAG), enabling topological ordering and DP on components.
- •The low-link idea adapts to undirected graphs to find articulation points and bridges by slightly changing the update rules.
- •Tarjan-based SCC is a core building block for 2-SAT, reachability compression, and cycle detection in directed graphs.
- •Implementation pitfalls include mismanaging the on-stack flag, mixing directed and undirected low-link updates, and recursion depth issues.
Prerequisites
- →Depth-First Search (DFS) — Tarjan’s algorithm is a single DFS traversal that relies on discovery times and backtracking.
- →Stacks and Recursion — The algorithm maintains an explicit stack of active vertices and uses recursion (or an equivalent manual stack) for traversal.
- →Graph Representations — Understanding adjacency lists and directed vs undirected edges is necessary to implement edges and iterate neighbors.
- →Topological Sorting — After SCC condensation, many tasks use topological order on the resulting DAG.
- →2-SAT and Implication Graphs — A key application of SCCs is solving 2-SAT via implications and SCC assignment rules.
- →Big-O Notation — Time and space complexities are expressed and reasoned about using asymptotic analysis.
Detailed Explanation
Tap terms for definitions01Overview
Tarjan’s SCC algorithm is a linear-time method to decompose a directed graph into its strongly connected components (SCCs). Intuitively, an SCC is a maximal subgraph where every vertex can reach every other vertex via directed paths. The algorithm performs exactly one depth-first search (DFS), assigning each vertex a discovery index and a low-link value. The low-link of a node summarizes the smallest discovery index you can reach from that node by traversing edges and, crucially, by using at most one back edge to a vertex that is still active in the current DFS recursion (represented by a stack). When the DFS backtracks to a node whose low-link equals its index, that node is an SCC root: everything on the stack from the top down to that root forms one SCC and is popped together. Because every edge and vertex is processed a constant number of times and the stack operations are O(1), the overall time is O(V + E) and the extra memory is O(V). The SCC condensation forms a DAG, which enables many follow-up computations, like topological ordering, counting source components, and dynamic programming over components. The same low-link framework, with slight rule adjustments, also detects articulation points and bridges in undirected graphs.
02Intuition & Analogies
Imagine exploring a maze of one-way hallways. You drop breadcrumbs with sequence numbers as you first visit rooms (indices). You also carry a rubber band that can stretch back to any earlier room you can still get back to without leaving your current chain of decisions (the DFS stack). The tightest you can pull that band from a room is its low-link value—the earliest breadcrumb number you can still reach while staying within the current active exploration. If from a room you cannot stretch the band to any earlier room except itself, that room marks a boundary: everything you visited since entering that boundary belongs to a tightly knit cluster where every room can reach every other (an SCC). You then bundle those rooms together and seal them off (pop from the stack). That bundle behaves like a single super-room in future reasoning, and importantly, these bundles connect in a one-way, no-cycle manner (a DAG). This bundling reduces complex cyclic structures into simpler blocks. A related idea in undirected graphs is to track whether the rubber band can reach an ancestor through a back edge; if not, cutting a hallway there would disconnect the maze (a bridge), or a room might act as a critical junction (an articulation point). The same mental model—indices for discovery and a band for earliest reachable—guides all these applications.
03Formal Definition
04When to Use
Use Tarjan’s SCC algorithm whenever you need to analyze cyclic structure in directed graphs efficiently: compressing the graph into SCCs, answering reachability queries between cyclic regions, or simplifying constraints. It is essential in solving 2-SAT by building an implication graph and checking that no variable and its negation lie in the same SCC; component topological order then yields a satisfying assignment. In competitive programming, SCC condensation enables dynamic programming on DAGs after compressing cycles, such as maximizing weights over reachable components or counting source/sink components to add minimum edges to make the graph strongly connected. It is also valuable for detecting feedback loops in dependency graphs (e.g., package dependencies, build systems), identifying strongly connected modules in call graphs, and reducing state graphs in model checking. For undirected graphs, a closely related low-link DFS finds articulation points and bridges, useful for network reliability, biconnected component decomposition, and minimum edge additions to increase connectivity. Choose Tarjan when you need a single-pass, stack-based approach (as opposed to Kosaraju’s two-pass method) or when recursion depth and iterative stack management are acceptable for your constraints.
⚠️Common Mistakes
Common implementation errors include: (1) forgetting to set and clear the on-stack flag, which leads to incorrect low-link updates and merging unrelated vertices into the same SCC; always mark a node on the stack immediately after discovery and clear it exactly when popping. (2) Mixing directed and undirected low-link rules: in directed SCC, you update low[u] with index[v] only if v is on the stack (a back edge to an active node), whereas in undirected articulation/bridge detection, you must ignore the immediate parent edge and use low[v] from children; conflating these leads to wrong results. (3) Not handling multiple edges and self-loops: self-loops form a singleton SCC and must be respected; parallel edges should not break invariants. (4) Using 1-based vs 0-based indices inconsistently, causing out-of-bounds or missing nodes; standardize your indexing. (5) Recursion depth overflow on deep graphs; in C++ either increase stack size, convert to iterative DFS, or ensure input constraints are safe. (6) Reusing global arrays between test cases without resetting index, low, and on-stack arrays; always reinitialize per test. (7) In 2-SAT with Tarjan, assigning variables using component IDs without ensuring the ordering assumption (component IDs follow reverse topological order); if unsure, explicitly topologically sort the condensation graph or rely on the pop order property carefully.
Key Formulas
Time Complexity
Explanation: Each vertex is discovered once and each edge is relaxed at most once. Stack push/pop and comparisons are constant time, so the total work is linear in the size of the graph.
Low-link Recurrence (Directed SCC)
Explanation: The low-link of u is the smallest discovery index reachable from u by DFS tree edges and at most one back edge to a vertex still on the stack. This captures whether u can reach an earlier ancestor in the current DFS.
Root Condition
Explanation: If u cannot reach any earlier vertex on the current stack, then u starts an SCC. Popping the stack until u recovers exactly the vertices of that SCC.
Condensation is a DAG
Explanation: By contracting SCCs, any remaining edges cannot form cycles between components; otherwise the two components would merge into a larger SCC. Hence the condensation graph is acyclic.
Low-link Update (Undirected Tree Edge)
Explanation: In undirected articulation/bridge detection, a DFS child v contributes its low-link upward. If low(v) > tin(u), (u,v) is a bridge; if low(v) tin(u) and u is not root, u is an articulation point.
2-SAT Unsatisfiability Criterion
Explanation: In the implication graph, if a variable and its negation are mutually reachable, they enforce each other to be both true and false, which is impossible.
2-SAT Assignment by SCC Order
Explanation: If SCC IDs follow reverse topological order of the condensation DAG (as in Tarjan pop order), assigning a variable true when its component appears after its negation is consistent with implications.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct TarjanSCC { 5 int n, timer, comp_cnt; 6 vector<vector<int>> g; 7 vector<int> idx, low, comp; 8 vector<char> on; 9 vector<int> st; 10 11 TarjanSCC(int n=0): n(n), timer(0), comp_cnt(0) { 12 g.assign(n, {}); 13 idx.assign(n, -1); 14 low.assign(n, 0); 15 comp.assign(n, -1); 16 on.assign(n, 0); 17 } 18 19 void add_edge(int u, int v) { g[u].push_back(v); } 20 21 void dfs(int u) { 22 idx[u] = low[u] = timer++; 23 st.push_back(u); 24 on[u] = 1; 25 for (int v : g[u]) { 26 if (idx[v] == -1) { 27 dfs(v); 28 low[u] = min(low[u], low[v]); 29 } else if (on[v]) { 30 // Back edge to a node on the stack 31 low[u] = min(low[u], idx[v]); 32 } 33 } 34 if (low[u] == idx[u]) { 35 while (true) { 36 int v = st.back(); st.pop_back(); 37 on[v] = 0; comp[v] = comp_cnt; 38 if (v == u) break; 39 } 40 comp_cnt++; 41 } 42 } 43 44 // Returns component IDs (0..comp_cnt-1) and list of components 45 vector<vector<int>> run() { 46 for (int i = 0; i < n; ++i) if (idx[i] == -1) dfs(i); 47 vector<vector<int>> sccs(comp_cnt); 48 for (int v = 0; v < n; ++v) sccs[comp[v]].push_back(v); 49 return sccs; 50 } 51 }; 52 53 int main() { 54 ios::sync_with_stdio(false); 55 cin.tie(nullptr); 56 57 int n, m; // Input: n nodes, m directed edges, nodes are 0..n-1 58 if (!(cin >> n >> m)) return 0; 59 TarjanSCC tj(n); 60 for (int i = 0; i < m; ++i) { 61 int u, v; cin >> u >> v; tj.add_edge(u, v); 62 } 63 auto sccs = tj.run(); 64 65 // Output: number of components, then each component's size and members 66 cout << sccs.size() << '\n'; 67 for (auto &c : sccs) { 68 cout << (int)c.size() << '\n'; 69 for (int i = 0; i < (int)c.size(); ++i) { 70 if (i) cout << ' '; 71 cout << c[i]; 72 } 73 cout << '\n'; 74 } 75 // Example: print component id per node 76 for (int v = 0; v < n; ++v) { 77 if (v) cout << ' '; 78 cout << tj.comp[v]; 79 } 80 cout << '\n'; 81 } 82
This program implements Tarjan’s SCC algorithm with arrays for index, low-link, on-stack flag, and component IDs. A node becomes the root of an SCC when low[u] equals idx[u]. At that point, vertices are popped from the stack to form an SCC. The code reads a directed graph, computes all SCCs, prints the components, and finally prints the component ID for each node.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct TarjanSCC { 5 int n, timer, comp_cnt; 6 vector<vector<int>> g; 7 vector<int> idx, low, comp; 8 vector<char> on; 9 vector<int> st; 10 TarjanSCC(int n=0): n(n), timer(0), comp_cnt(0) { 11 g.assign(n, {}); 12 idx.assign(n, -1); 13 low.assign(n, 0); 14 comp.assign(n, -1); 15 on.assign(n, 0); 16 } 17 void add_edge(int u, int v) { g[u].push_back(v); } 18 void dfs(int u) { 19 idx[u] = low[u] = timer++; 20 st.push_back(u); on[u] = 1; 21 for (int v : g[u]) { 22 if (idx[v] == -1) { dfs(v); low[u] = min(low[u], low[v]); } 23 else if (on[v]) low[u] = min(low[u], idx[v]); 24 } 25 if (low[u] == idx[u]) { 26 while (true) { 27 int v = st.back(); st.pop_back(); 28 on[v] = 0; comp[v] = comp_cnt; 29 if (v == u) break; 30 } 31 comp_cnt++; 32 } 33 } 34 void run() { for (int i = 0; i < n; ++i) if (idx[i] == -1) dfs(i); } 35 }; 36 37 int main(){ 38 ios::sync_with_stdio(false); 39 cin.tie(nullptr); 40 41 int n, m; if(!(cin >> n >> m)) return 0; 42 TarjanSCC tj(n); 43 for(int i=0;i<m;++i){int u,v;cin>>u>>v;tj.add_edge(u,v);} 44 tj.run(); 45 46 int C = tj.comp_cnt; 47 vector<vector<int>> dag(C); 48 vector<int> indeg(C,0); 49 vector<pair<int,int>> edges; 50 edges.reserve(m); 51 for(int u=0;u<n;++u){ 52 for(int v: tj.g[u]){ 53 int cu = tj.comp[u], cv = tj.comp[v]; 54 if(cu!=cv) edges.push_back({cu,cv}); 55 } 56 } 57 sort(edges.begin(), edges.end()); 58 edges.erase(unique(edges.begin(), edges.end()), edges.end()); 59 for(auto e: edges){ dag[e.first].push_back(e.second); indeg[e.second]++; } 60 61 // Kahn's algorithm for topological order on condensation DAG 62 queue<int> q; for(int c=0;c<C;++c) if(indeg[c]==0) q.push(c); 63 vector<int> topo; 64 while(!q.empty()){ 65 int c=q.front(); q.pop(); topo.push_back(c); 66 for(int w: dag[c]) if(--indeg[w]==0) q.push(w); 67 } 68 69 // Output: topological order of components 70 for(int i=0;i<(int)topo.size();++i){ if(i) cout<<' '; cout<<topo[i]; } 71 cout << '\n'; 72 73 // Output: component id for each original vertex 74 for(int v=0; v<n; ++v){ if(v) cout<<' '; cout<<tj.comp[v]; } 75 cout << '\n'; 76 } 77
After running Tarjan to get component IDs, this code constructs the condensation DAG by adding edges between different components and deduplicating them. It then performs a topological sort (Kahn’s algorithm) to produce a valid order over SCCs, enabling DP or other DAG algorithms. It prints the topological order of components and the component ID of each original vertex.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct APBridges { 5 int n, timer; 6 vector<vector<int>> g; 7 vector<int> tin, low; 8 vector<char> vis; 9 vector<int> isAP; 10 vector<pair<int,int>> bridges; 11 APBridges(int n=0): n(n), timer(0) { 12 g.assign(n, {}); 13 tin.assign(n, -1); 14 low.assign(n, 0); 15 vis.assign(n, 0); 16 isAP.assign(n, 0); 17 } 18 void add_edge(int u, int v){ g[u].push_back(v); g[v].push_back(u); } 19 void dfs(int u, int p){ 20 vis[u]=1; tin[u]=low[u]=timer++; 21 int child=0; 22 for(int v: g[u]){ 23 if(v==p) continue; 24 if(vis[v]){ 25 // back edge in undirected graph 26 low[u]=min(low[u], tin[v]); 27 }else{ 28 dfs(v,u); 29 low[u]=min(low[u], low[v]); 30 if(low[v] > tin[u]) bridges.push_back({u,v}); 31 if(low[v] >= tin[u] && p!=-1) isAP[u]=1; 32 child++; 33 } 34 } 35 if(p==-1 && child>1) isAP[u]=1; 36 } 37 void run(){ 38 for(int i=0;i<n;++i) if(!vis[i]) dfs(i,-1); 39 } 40 }; 41 42 int main(){ 43 ios::sync_with_stdio(false); 44 cin.tie(nullptr); 45 46 int n,m; if(!(cin>>n>>m)) return 0; 47 APBridges solver(n); 48 for(int i=0;i<m;++i){int u,v;cin>>u>>v;solver.add_edge(u,v);} 49 solver.run(); 50 51 // Output articulation points 52 vector<int> aps; 53 for(int i=0;i<n;++i) if(solver.isAP[i]) aps.push_back(i); 54 cout<<aps.size()<<'\n'; 55 for(int i=0;i<(int)aps.size();++i){ if(i) cout<<' '; cout<<aps[i]; } 56 cout<<'\n'; 57 58 // Output bridges 59 cout<<solver.bridges.size()<<'\n'; 60 for(auto &e: solver.bridges){ cout<<e.first<<' '<<e.second<<'\n'; } 61 } 62
This code applies the low-link idea to undirected graphs to find articulation points and bridges. For each tree edge (u, v), it checks whether low[v] > tin[u] (bridge) and whether low[v] >= tin[u] (articulation point for non-root u). The root is an articulation point if it has more than one DFS child.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Tarjan { 5 int n, timer, comp_cnt; 6 vector<vector<int>> g; 7 vector<int> idx, low, comp, st; 8 vector<char> on; 9 Tarjan(int n=0): n(n), timer(0), comp_cnt(0) { 10 g.assign(n, {}); 11 idx.assign(n, -1); low.assign(n, 0); comp.assign(n, -1); 12 on.assign(n, 0); 13 } 14 void add_edge(int u,int v){ g[u].push_back(v); } 15 void dfs(int u){ 16 idx[u]=low[u]=timer++; 17 st.push_back(u); on[u]=1; 18 for(int v: g[u]){ 19 if(idx[v]==-1){ dfs(v); low[u]=min(low[u], low[v]); } 20 else if(on[v]) low[u]=min(low[u], idx[v]); 21 } 22 if(low[u]==idx[u]){ 23 while(true){ int v=st.back(); st.pop_back(); on[v]=0; comp[v]=comp_cnt; if(v==u) break; } 24 comp_cnt++; 25 } 26 } 27 void run(){ for(int i=0;i<n;++i) if(idx[i]==-1) dfs(i); } 28 }; 29 30 // 2-SAT: variables x in [0..n-1]; literals: x for true, x^1 for false by mapping id(x, val) 31 struct TwoSAT { 32 int n; Tarjan tj; 33 TwoSAT(int n): n(n), tj(2*n) {} 34 int id(int x, int val){ return x<<1 | (val^1); } 35 int neg(int lit){ return lit^1; } 36 void add_imp(int a, int b){ tj.add_edge(a,b); } 37 void add_or(int x, int xv, int y, int yv){ 38 int A = id(x,xv), B = id(y,yv); 39 add_imp(neg(A), B); 40 add_imp(neg(B), A); 41 } 42 void add_true(int x){ // x must be true 43 int X = id(x,1); add_imp(neg(X), X); 44 } 45 void add_false(int x){ // x must be false 46 int X = id(x,0); add_imp(neg(X), X); 47 } 48 bool solve(vector<int>& assign){ 49 tj.run(); 50 assign.assign(n,0); 51 for(int x=0;x<n;++x){ 52 int t = id(x,1), f = id(x,0); 53 if(tj.comp[t]==tj.comp[f]) return false; 54 // Tarjan component ids increase in pop order (reverse topo). Larger comp id means earlier in topo of implications. 55 assign[x] = (tj.comp[t] > tj.comp[f]) ? 1 : 0; 56 } 57 return true; 58 } 59 }; 60 61 int main(){ 62 ios::sync_with_stdio(false); 63 cin.tie(nullptr); 64 65 int n, m; // n variables, m clauses of form (x^ax or y^ay), ax,ay in {0,1} meaning value required true(1)/false(0) 66 if(!(cin>>n>>m)) return 0; 67 TwoSAT sat(n); 68 for(int i=0;i<m;++i){ 69 int x, ax, y, ay; cin>>x>>ax>>y>>ay; // clause: (x==ax) or (y==ay) 70 sat.add_or(x, ax, y, ay); 71 } 72 vector<int> asg; 73 if(!sat.solve(asg)){ 74 cout << -1 << '\n'; 75 } else { 76 for(int i=0;i<n;++i){ if(i) cout<<' '; cout<<asg[i]; } 77 cout << '\n'; 78 } 79 } 80
This is a 2-SAT solver built on Tarjan’s SCC. Each variable x has two literals: x=true and x=false, mapped to two nodes in the implication graph. A clause (a or b) is encoded as (not a -> b) and (not b -> a). The formula is unsatisfiable if any variable and its negation lie in the same SCC. Otherwise, an assignment is produced using the Tarjan pop-order rule: set x to true if comp(x) > comp(not x).