DFS - Tree and Graph Properties
Key Points
- Depth-First Search (DFS) assigns each vertex a discovery time and a finish time that capture a neat nesting structure of recursion.
- Using entry (tin) and exit (tout) times, the entire subtree of a node maps to a contiguous segment in an Euler tour array.
- Back edges in directed graphs indicate cycles; detecting a gray-to-gray edge during DFS proves the graph is cyclic.
- A DFS spanning forest organizes edges into tree, back, forward, and cross edges, each with clear timestamp-based tests.
- Subtree queries such as sums or counts can be answered quickly by flattening the tree into an array via tin order.
- Topological order for a DAG can be obtained by sorting vertices by decreasing finish times from a DFS.
- Strongly Connected Components (SCCs) can be found in O(n + m) time using DFS-based algorithms like Kosaraju or Tarjan.
- DFS is linear in the size of the graph and uses stack space proportional to recursion depth, which can be O(n) in the worst case.
Prerequisites
- Graphs and representations (adjacency lists) — You must know how to store and traverse graphs efficiently to implement DFS.
- Recursion and call stack — DFS is naturally recursive; understanding the stack helps explain discovery/finish times.
- Big-O notation — To analyze DFS-based algorithms’ time and space complexity.
- Trees and rooted trees — Subtree queries and ancestor tests rely on tree structure induced by DFS.
- Prefix sums and range queries — Flattened subtrees map to contiguous ranges where prefix sums or Fenwick trees apply.
- Stacks/queues and basic data structures — For iterative DFS variants and efficient processing of queries.
- Directed vs undirected graphs — Edge classification and cycle detection differ between directed and undirected settings.
Detailed Explanation
01 Overview
Depth-First Search (DFS) is a foundational graph traversal technique that explores as far as possible along each branch before backtracking. While many learn DFS as a simple way to visit all nodes, its real power comes from the structure it induces: each vertex is stamped with a discovery time (when we first see it) and a finish time (when we leave it). These timestamps create a nested parenthesis structure that mirrors the recursion stack. From this structure, we derive useful properties: we can classify edges (tree, back, forward, cross), detect cycles, and flatten trees for fast subtree queries. In directed graphs, the presence of a back edge implies a cycle, while ordering by finish time yields a topological order if no cycles exist. In trees, DFS entry times map subtrees to contiguous ranges, enabling range-query tricks like prefix sums or Fenwick trees. Moreover, algorithms for strongly connected components (SCCs), like Kosaraju’s two-pass DFS, rely directly on finish times and graph reversal. All of this runs in linear time with respect to the number of vertices and edges, making DFS both elegant and efficient for competitive programming tasks in the 1200–1600 rating range.
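As a concrete picture of this nesting, here is a minimal sketch that prints each vertex's [tin, tout] interval on a small hard-coded tree (the five-node tree and every name in this snippet are purely illustrative); each child's interval lies strictly inside its parent's.

#include <bits/stdc++.h>
using namespace std;

// Minimal DFS that stamps discovery (tin) and finish (tout) times with one shared
// counter, then prints the resulting nested intervals.
int timer_cnt = 0;
vector<vector<int>> adj;
vector<int> tin, tout;

void dfs(int u, int parent) {
    tin[u] = ++timer_cnt;              // "open parenthesis" for u
    for (int v : adj[u])
        if (v != parent) dfs(v, u);
    tout[u] = ++timer_cnt;             // "close parenthesis": descendants nest strictly inside
}

int main() {
    int n = 5;
    adj.assign(n + 1, {});
    tin.assign(n + 1, 0);
    tout.assign(n + 1, 0);
    // Illustrative tree: edges 1-2, 1-3, 2-4, 2-5, rooted at 1
    vector<pair<int, int>> edges = {{1, 2}, {1, 3}, {2, 4}, {2, 5}};
    for (auto [u, v] : edges) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs(1, 0);
    for (int u = 1; u <= n; ++u)
        cout << "node " << u << ": [" << tin[u] << ", " << tout[u] << "]\n";
    return 0;
}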
02 Intuition & Analogies
Imagine exploring a cave system with a single rope. You tie the rope at the entrance, pick a tunnel, and go as deep as you can, marking each chamber when you first step in (discovery time). When you reach a dead end, you backtrack along the rope to the last unvisited tunnel, and when you finally leave a chamber for the last time, you note its finish time. If you step into Chamber B from Chamber A, then B and everything inside it is physically inside the loop of rope around A—like nested loops of string. In DFS terms, the discovery and finish times of B are enclosed by those of A, mirroring the nesting of function calls on the recursion stack. This nesting is the key: it lets us tell if a node is a descendant (it lies within your rope loop), or unrelated (their loops don’t overlap). When you find a passage from the current chamber back to one whose rope loop is still open (you haven’t finished it), that’s a back edge—it forms a cycle because you can walk from that ancestor down to you and then jump back up the shortcut. For trees, if you label chambers in the order you entered them, all rooms in a chamber’s subcave are in one continuous block in that list—perfect for summing up treasures in a whole subcave using a range sum. For directed graphs, finishing the deepest caves first and ordering chambers by when you left them reveals a direction-consistent schedule (a topological order) if no cycles exist. And by reversing tunnels and exploring in finish-time order (Kosaraju), you naturally walk exactly within each strongly connected blob before jumping to the next.
03 Formal Definition
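A standard way to state the timestamp structure formally, using the same notation as the formulas later in this section:

A DFS of a graph $G = (V, E)$ assigns every vertex $v$ two timestamps $t_{in}(v) < t_{out}(v)$: the discovery time, recorded when $v$ is first reached, and the finish time, recorded when the call for $v$ returns. The parenthesis (nesting) property says that for any two distinct vertices $u$ and $v$ exactly one of the following holds:

$[\, t_{in}(u), t_{out}(u) \,] \cap [\, t_{in}(v), t_{out}(v) \,] = \varnothing, \qquad [\, t_{in}(v), t_{out}(v) \,] \subset [\, t_{in}(u), t_{out}(u) \,], \qquad [\, t_{in}(u), t_{out}(u) \,] \subset [\, t_{in}(v), t_{out}(v) \,],$

and the containment $[\, t_{in}(v), t_{out}(v) \,] \subset [\, t_{in}(u), t_{out}(u) \,]$ holds if and only if $u$ is an ancestor of $v$ in the DFS forest. The ancestor test and the edge classification in the Key Formulas section follow directly from this property.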
04 When to Use
Use DFS when you need to explore connectivity or structure with minimal overhead. Typical cases include: (1) Tree queries: flatten a rooted tree with entry times so that subtree queries (sums, counts, min/max) reduce to range queries on an array. This is perfect for problems where updates are rare or can be handled with Fenwick/Segment trees on the Euler tour. (2) Cycle detection: in directed graphs, if you must determine whether a cycle exists or output one such cycle, DFS with a recursion stack (gray set) quickly finds it. In undirected graphs, DFS reveals cycles by encountering visited nodes that are not the parent. (3) Topological sorting: if you suspect the graph is a DAG, a single DFS that collects vertices by reverse finish time outputs a topological order. (4) Strongly connected components: to cluster vertices into SCCs for condensation DAGs, Kosaraju (two DFS passes) or Tarjan (single DFS with low-link values) are the go-to choices. (5) Edge classification: when reasoning about graph structure, such as proving properties or detecting back edges for correctness conditions, DFS timestamps provide a precise basis. DFS shines when memory is tight and you need linear-time traversals; it is less suitable when you need shortest paths in weighted graphs (use Dijkstra) or many dynamic updates to the structure (consider dynamic trees or BFS/DSU depending on needs).
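Point (3) is short enough to sketch directly. Here is a minimal DFS-based topological sort, assuming 1-indexed vertices and an input that is already known to be a DAG (combine it with the cycle detector shown in the code examples if that is not guaranteed); the I/O format is made up for illustration.

#include <bits/stdc++.h>
using namespace std;

// Topological order via DFS finish times: a vertex is appended only after every
// vertex reachable from it has finished, so reversing the finish order respects all edges.
int n, m;
vector<vector<int>> g;
vector<int> visited;
vector<int> topo; // filled in increasing finish time, reversed at the end

void dfs(int u) {
    visited[u] = 1;
    for (int v : g[u])
        if (!visited[v]) dfs(v);
    topo.push_back(u);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    // Input: n m, then m directed edges u v (1-indexed); assumes the graph is a DAG
    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
    }
    visited.assign(n + 1, 0);
    for (int u = 1; u <= n; ++u)
        if (!visited[u]) dfs(u);
    reverse(topo.begin(), topo.end()); // decreasing finish time = topological order
    for (int i = 0; i < n; ++i)
        cout << topo[i] << " \n"[i + 1 == n];
    return 0;
}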
⚠️Common Mistakes
Common pitfalls include: (1) Mixing up timestamp inequalities: the correct ancestor test with timestamps is t_in(u) < t_in(v) and t_out(v) < t_out(u). Using non-strict inequalities or comparing t_out incorrectly causes subtle bugs in edge classification and ancestor checks. (2) Forgetting to ignore the parent in undirected DFS when checking for cycles—this produces false cycle reports. Always ensure that the visited neighbor is not the direct parent before declaring a cycle. (3) Not resetting or properly sizing global arrays (tin, tout, color) across multiple test cases, which is frequent in contest settings. (4) Stack overflow due to deep recursion on large or skewed graphs. In C++, very deep trees can exceed the recursion limit; consider iterative DFS or increase stack size. (5) Misbuilding the Euler tour: the subtree interval is [tin[u], tin[u] + sz[u] - 1]; using tout[u] as the right boundary only works if you define tout to equal tin[u] + sz[u] - 1, which differs from the classic exit time. (6) Using DFS for shortest paths in weighted graphs—DFS does not account for weights and can fail badly; use Dijkstra or Bellman–Ford instead. (7) In Kosaraju, forgetting to clear the visited array between passes or pushing vertices before fully finishing them in the first pass. (8) Mutating the graph (like sorting adjacency lists) in the middle of a DFS without care can change traversal order and break assumptions about timestamps or test expectations.
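For pitfall (4), one workaround is an explicit stack that mimics the recursion frames. Here is a sketch of one such iterative variant on a rooted tree; the (vertex, next-neighbor-index) frame layout is just one reasonable choice, and the I/O format is made up for illustration.

#include <bits/stdc++.h>
using namespace std;

// Iterative DFS on a tree rooted at 1 that reproduces the classic interleaved
// tin/tout timestamps using an explicit stack of (vertex, next-neighbor-index) frames.
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    // Input: n, then n-1 undirected edges u v (1-indexed)
    int n;
    if (!(cin >> n)) return 0;
    vector<vector<int>> g(n + 1);
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    vector<int> tin(n + 1), tout(n + 1), parent(n + 1, 0);
    int timer_cnt = 0;
    stack<pair<int, size_t>> st; // frame = (vertex, index of next neighbor to try)
    tin[1] = ++timer_cnt;
    st.push({1, 0});
    while (!st.empty()) {
        auto &[u, idx] = st.top();
        if (idx < g[u].size()) {
            int v = g[u][idx++];
            if (v == parent[u]) continue;  // skip the edge back to the parent
            parent[v] = u;
            tin[v] = ++timer_cnt;          // "entering" v
            st.push({v, 0});
        } else {
            tout[u] = ++timer_cnt;         // all neighbors processed: "leaving" u
            st.pop();
        }
    }
    for (int u = 1; u <= n; ++u)
        cout << u << ": tin=" << tin[u] << " tout=" << tout[u] << "\n";
    return 0;
}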
Key Formulas
DFS Time Complexity
$O(n + m)$
Explanation: DFS visits each vertex and edge a constant number of times. Here n is the number of vertices and m is the number of edges.
Ancestor Test
$u \text{ is an ancestor of } v \iff t_{in}(u) < t_{in}(v) \text{ and } t_{out}(v) < t_{out}(u)$
Explanation: Vertex u is an ancestor of v in the DFS tree if and only if v’s discovery happens between u’s discovery and finish, and v finishes before u. This captures the nesting property of recursion.
Euler Tour Subtree Range
$\text{subtree}(u) = [\, t_{in}(u),\; t_{in}(u) + sz(u) - 1 \,]$
Explanation: When we assign positions by discovery time and record subtree sizes, all nodes in u’s subtree occupy a contiguous segment in the Euler array. This enables range operations for subtree queries.
Back Edge Cycle Criterion
$(u, v) \in E \text{ with } v \text{ gray (on the recursion stack) when } (u, v) \text{ is examined} \implies G \text{ contains a cycle}$
Explanation: If during DFS we see an edge to a vertex that is currently on the recursion stack (gray), then we have discovered a cycle in a directed graph.
Back Edge (Directed)
$(u, v) \text{ is a back edge} \iff t_{in}(v) \le t_{in}(u) < t_{out}(u) \le t_{out}(v)$
Explanation: Edge (u, v) is a back edge if v is an ancestor of u in the DFS tree (equality covers self-loops). A directed graph is cyclic if and only if its DFS produces at least one back edge.
Forward Edge (Directed)
$(u, v) \notin T \text{ and } t_{in}(u) < t_{in}(v) \text{ and } t_{out}(v) < t_{out}(u)$
Explanation: A forward edge goes from a node to a proper descendant in the DFS tree but is not the actual tree edge. T denotes the set of tree edges.
Cross Edge (Intervals Disjoint)
$[\, t_{in}(u), t_{out}(u) \,] \cap [\, t_{in}(v), t_{out}(v) \,] = \varnothing$
Explanation: If two vertices’ time intervals are disjoint, an edge between them is a cross edge; neither is an ancestor of the other. A sketch that applies the tree/back/forward/cross tests during a single DFS appears right after this formula list.
Topological Order via Finish Times
$\text{In a DAG: } (u, v) \in E \implies t_{out}(v) < t_{out}(u)$
Explanation: In a DAG, sorting vertices by decreasing finish time yields a valid topological order. Any edge goes from an earlier to a later position in this order.
DFS Forest Tree Edges
$\sum_{i=1}^{k} (n_i - 1) = n - k$
Explanation: A DFS forest over k components with vertex sets of sizes $n_1, \dots, n_k$ has exactly n - k tree edges. Each tree on $n_i$ vertices contributes $n_i - 1$ tree edges.
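The edge tests above can be applied on the fly while the DFS runs. Below is a sketch of such a classification pass over a directed graph, assuming classic interleaved timestamps where tin and tout share one counter, so tout[v] == 0 means v is still gray; the I/O format is made up for illustration.

#include <bits/stdc++.h>
using namespace std;

// Classify every directed edge as tree / back / forward / cross using DFS timestamps.
// Back: the target is still on the recursion stack (discovered but not finished).
// Forward: the target is already finished and was discovered after u (a proper descendant).
// Cross: the target is already finished and was discovered before u (disjoint intervals).
int n, m, timer_cnt = 0;
vector<vector<int>> g;
vector<int> tin, tout;

void dfs(int u) {
    tin[u] = ++timer_cnt;
    for (int v : g[u]) {
        if (tin[v] == 0) {
            cout << u << " -> " << v << " : tree\n";
            dfs(v);
        } else if (tout[v] == 0) {
            cout << u << " -> " << v << " : back\n";    // v is an unfinished ancestor
        } else if (tin[u] < tin[v]) {
            cout << u << " -> " << v << " : forward\n"; // v is a finished descendant of u
        } else {
            cout << u << " -> " << v << " : cross\n";   // the two time intervals are disjoint
        }
    }
    tout[u] = ++timer_cnt;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    // Input: n m, then m directed edges u v (1-indexed)
    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
    }
    tin.assign(n + 1, 0);
    tout.assign(n + 1, 0);
    for (int u = 1; u <= n; ++u)
        if (tin[u] == 0) dfs(u);
    return 0;
}

A single tin comparison separates the two finished cases: a target discovered later than u means forward, one discovered earlier means cross.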
Complexity Analysis
Every DFS-based routine in this section runs in O(n + m) time: each vertex is discovered once and each edge is examined a constant number of times (twice in undirected graphs, once per direction in directed ones). Memory is O(n + m) for the adjacency lists plus O(n) for the color/tin/tout arrays and for the recursion stack, whose depth can reach n on path-like inputs (the practical caveat raised under common mistakes). Kosaraju adds a second pass over the reversed graph, which keeps the total at O(n + m).
Code Examples
#include <bits/stdc++.h>
using namespace std;

// This example demonstrates:
// 1) Computing DFS entry times (tin), exit times (tout_exit), and subtree sizes (sz)
// 2) Flattening a tree so each subtree is a contiguous range in an array
// 3) Answering static subtree sum queries using a prefix sum over the Euler array
// Note: For dynamic updates, replace the prefix array with a Fenwick/Segment Tree.

const int MAXN = 200000;            // adjust as needed
vector<int> g[MAXN + 5];
int n;
vector<long long> val;              // node values (1-indexed)
vector<int> tin, tout_exit, sz;     // timestamps and subtree sizes
vector<long long> euler;            // euler[pos] = value of node at position pos
int timer_in = 0, timer_out = 0;    // separate counters: entry order and exit order

void dfs(int u, int p) {
    tin[u] = ++timer_in;            // discovery time = position in flattened array (1..n)
    sz[u] = 1;
    euler[tin[u]] = val[u];
    for (int v : g[u]) {
        if (v == p) continue;       // important for undirected trees: ignore parent
        dfs(v, u);
        sz[u] += sz[v];
    }
    tout_exit[u] = ++timer_out;     // exit order (not used as right boundary here)
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example input format:
    // n
    // val[1..n]
    // n-1 edges u v (1-indexed, undirected, forms a tree)
    // q
    // q lines: node u for subtree sum query
    // Sample:
    // 7
    // 1 2 3 4 5 6 7
    // 1 2
    // 1 3
    // 2 4
    // 2 5
    // 3 6
    // 3 7
    // 3
    // 1
    // 2
    // 3

    if (!(cin >> n)) return 0;
    val.assign(n + 1, 0);
    for (int i = 1; i <= n; ++i) cin >> val[i];
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }

    tin.assign(n + 1, 0);
    tout_exit.assign(n + 1, 0);
    sz.assign(n + 1, 0);
    euler.assign(n + 2, 0);         // positions 1..n used; +2 for safety
    timer_in = timer_out = 0;

    int root = 1;                   // assume 1 is the root
    dfs(root, -1);

    // Build prefix sums over the Euler array; subtree range is [tin[u], tin[u] + sz[u] - 1]
    vector<long long> pref(n + 2, 0);
    for (int i = 1; i <= n; ++i) pref[i] = pref[i - 1] + euler[i];

    auto subtree_sum = [&](int u) {
        int L = tin[u];
        int R = tin[u] + sz[u] - 1; // right boundary from subtree size
        return pref[R] - pref[L - 1];
    };

    // Optional: check whether u is an ancestor of v (strict inequalities on both counters)
    auto is_ancestor = [&](int u, int v) {
        return tin[u] < tin[v] && tout_exit[v] < tout_exit[u];
    };

    int q; cin >> q;
    while (q--) {
        int u; cin >> u;
        cout << subtree_sum(u) << "\n";
    }

    // Example: check ancestry (not part of required output)
    // cerr << (is_ancestor(1, 4) ? "1 is ancestor of 4\n" : "not ancestor\n");

    return 0;
}
We perform a DFS on a rooted tree to compute tin, tout_exit, and subtree sizes. Using tin as the position in a flattened array and sz as the subtree size, the subtree of u corresponds to the contiguous range [tin[u], tin[u] + sz[u] - 1]. We store val[u] at euler[tin[u]] and build a prefix sum so each subtree sum query is answered in O(1). We also show an is_ancestor test using the classic strict-inequality timestamp rule.
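If node values also need point updates, the static prefix array can be swapped for a Fenwick (binary indexed) tree over the same flattened positions. Here is a minimal sketch of that replacement; the tiny hard-coded tin/sz/val arrays stand in for what the DFS in the example above would compute.

#include <bits/stdc++.h>
using namespace std;

// Subtree sums with point updates: place val[u] at position tin[u] and query the
// range [tin[u], tin[u] + sz[u] - 1] with a Fenwick tree instead of a prefix array.
struct Fenwick {
    int n;
    vector<long long> bit; // 1-indexed
    Fenwick(int n) : n(n), bit(n + 1, 0) {}
    void add(int i, long long delta) {
        for (; i <= n; i += i & -i) bit[i] += delta;
    }
    long long prefix(int i) const {
        long long s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
    long long range(int l, int r) const { return prefix(r) - prefix(l - 1); }
};

int main() {
    // Toy data standing in for the DFS output: a path 1 - 2 - 3 rooted at 1,
    // with node values 10, 20, 30 (index 0 is unused).
    int n = 3;
    vector<int> tin = {0, 1, 2, 3}, sz = {0, 3, 2, 1};
    vector<long long> val = {0, 10, 20, 30};

    Fenwick fw(n);
    for (int u = 1; u <= n; ++u) fw.add(tin[u], val[u]);

    auto subtree_sum = [&](int u) { return fw.range(tin[u], tin[u] + sz[u] - 1); };

    cout << subtree_sum(1) << "\n"; // 60
    fw.add(tin[3], +5);             // point update: value of node 3 increases by 5
    cout << subtree_sum(2) << "\n"; // 55
    return 0;
}

Each update and each query costs O(log n), which is the usual trade-off against the O(1) queries of the static prefix-sum version.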
#include <bits/stdc++.h>
using namespace std;

// Detect a cycle in a directed graph using DFS colors:
// 0 = unvisited (white), 1 = in recursion stack (gray), 2 = finished (black).
// On seeing an edge to a gray node, we have a back edge and can reconstruct a cycle.

int n, m;
vector<vector<int>> g;
vector<int> color, parent;
int start_cycle = -1, end_cycle = -1;

bool dfs(int u) {
    color[u] = 1; // gray
    for (int v : g[u]) {
        if (color[v] == 0) {
            parent[v] = u;
            if (dfs(v)) return true;
        } else if (color[v] == 1) {
            // back edge u -> v found; cycle exists
            start_cycle = v; // cycle start
            end_cycle = u;   // current node
            return true;
        }
    }
    color[u] = 2; // black
    return false;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input:
    // n m
    // m edges: u v (directed)
    // Prints a cycle if one exists, else ACYCLIC

    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
    }

    color.assign(n + 1, 0);
    parent.assign(n + 1, -1);

    bool has_cycle = false;
    for (int u = 1; u <= n; ++u) {
        if (color[u] == 0 && dfs(u)) { has_cycle = true; break; }
    }

    if (!has_cycle) {
        cout << "ACYCLIC\n";
        return 0;
    }

    // Reconstruct the cycle: walk from end_cycle up to start_cycle using parents
    vector<int> cycle;
    cycle.push_back(start_cycle);
    for (int v = end_cycle; v != start_cycle; v = parent[v]) cycle.push_back(v);
    cycle.push_back(start_cycle);
    reverse(cycle.begin(), cycle.end());

    cout << "CYCLE FOUND (length = " << (int)cycle.size() - 1 << "):\n";
    for (size_t i = 0; i < cycle.size(); ++i) {
        cout << cycle[i] << (i + 1 == cycle.size() ? '\n' : ' ');
    }
    return 0;
}
We color nodes during DFS and detect a back edge when encountering a gray node in the recursion stack. This guarantees a directed cycle. We store parents to reconstruct the actual cycle by walking back from the current node to the gray ancestor.
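For comparison, the undirected variant mentioned in the common-mistakes list needs only a visited flag plus the parent check. Here is a brief sketch; note that parallel edges to the parent are not caught by this simple check, so track edge ids instead if the input may contain multi-edges.

#include <bits/stdc++.h>
using namespace std;

// Undirected cycle detection: a visited neighbor that is not the DFS parent closes a cycle.
int n, m;
vector<vector<int>> g;
vector<int> visited;

bool dfs(int u, int parent) {
    visited[u] = 1;
    for (int v : g[u]) {
        if (v == parent) continue;   // ignore the edge we came from
        if (visited[v]) return true; // already-visited non-parent neighbor => cycle
        if (dfs(v, u)) return true;
    }
    return false;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    // Input: n m, then m undirected edges u v (1-indexed)
    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    visited.assign(n + 1, 0);
    bool has_cycle = false;
    for (int u = 1; u <= n && !has_cycle; ++u)
        if (!visited[u]) has_cycle = dfs(u, 0);
    cout << (has_cycle ? "CYCLE" : "ACYCLIC") << "\n";
    return 0;
}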
#include <bits/stdc++.h>
using namespace std;

// Kosaraju's algorithm:
// 1) DFS on G to compute order by finish times.
// 2) DFS on reversed graph in reverse order to extract SCCs.

int n, m;
vector<vector<int>> g, gr;  // graph and reversed graph
vector<int> used;
vector<int> order, comp;    // order by finish time; comp[v] = component id

void dfs1(int u) {
    used[u] = 1;
    for (int v : g[u]) if (!used[v]) dfs1(v);
    order.push_back(u);     // push after exploring: this is by increasing finish time
}

void dfs2(int u, int cid) {
    comp[u] = cid;
    for (int v : gr[u]) if (comp[v] == -1) dfs2(v, cid);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input:
    // n m
    // m edges u v (directed)
    // Output: number of SCCs and the component id of each vertex (1..k)

    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    gr.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        gr[v].push_back(u); // reversed edge
    }

    used.assign(n + 1, 0);
    order.clear();

    for (int u = 1; u <= n; ++u) if (!used[u]) dfs1(u);

    comp.assign(n + 1, -1);
    int cid = 0;
    for (int i = (int)order.size() - 1; i >= 0; --i) {
        int u = order[i];
        if (comp[u] == -1) dfs2(u, ++cid);
    }

    cout << cid << "\n";    // number of SCCs
    for (int u = 1; u <= n; ++u) cout << comp[u] << (u == n ? '\n' : ' ');
    return 0;
}
The first DFS collects vertices in order of finishing times. Processing vertices in reverse finish-time order on the reversed graph ensures each DFS stays inside one SCC before moving to the next. The result is a partition of vertices into SCCs and their component ids.
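The component ids are exactly what the condensation DAG mentioned in the use cases needs: contract each SCC to a single node and keep deduplicated edges between different components. Here is a small sketch of that step; the condensation helper and the toy comp[] values below are illustrative and not part of the example above.

#include <bits/stdc++.h>
using namespace std;

// Build the condensation DAG from SCC component ids (such as those produced by the
// Kosaraju example): one node per component, deduplicated edges between distinct components.
vector<vector<int>> condensation(int n, const vector<vector<int>> &g,
                                 const vector<int> &comp, int k) {
    set<pair<int, int>> seen;        // deduplicates parallel edges between components
    vector<vector<int>> dag(k + 1);
    for (int u = 1; u <= n; ++u)
        for (int v : g[u])
            if (comp[u] != comp[v] && seen.insert({comp[u], comp[v]}).second)
                dag[comp[u]].push_back(comp[v]);
    return dag;
}

int main() {
    // Toy graph: 1 -> 2 -> 3 -> 1 forms one SCC, and 3 -> 4 leads to the SCC {4}.
    int n = 4;
    vector<vector<int>> g(n + 1);
    g[1] = {2}; g[2] = {3}; g[3] = {1, 4};
    vector<int> comp = {0, 1, 1, 1, 2}; // component ids, e.g. from the code above
    int k = 2;

    auto dag = condensation(n, g, comp, k);
    for (int c = 1; c <= k; ++c)
        for (int d : dag[c]) cout << c << " -> " << d << "\n"; // prints "1 -> 2"
    return 0;
}

The condensation is always acyclic, so it can be processed further with the topological-order and DAG techniques discussed earlier.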