⚙️ Algorithm · Intermediate

DFS - Tree and Graph Properties

Key Points

  • Depth-First Search (DFS) assigns each vertex a discovery time and a finish time that capture a neat nesting structure of recursion.
  • Using entry (tin) and exit (tout) times, the entire subtree of a node maps to a contiguous segment in an Euler tour array.
  • Back edges in directed graphs indicate cycles; detecting a gray-to-gray edge during DFS proves the graph is cyclic.
  • A DFS spanning forest organizes edges into tree, back, forward, and cross edges, each with clear timestamp-based tests.
  • Subtree queries such as sums or counts can be answered quickly by flattening the tree into an array via tin order.
  • Topological order for a DAG can be obtained by sorting vertices by decreasing finish times from a DFS.
  • Strongly Connected Components (SCCs) can be found in O(n + m) time using DFS-based algorithms like Kosaraju or Tarjan.
  • DFS is linear in the size of the graph and uses stack space proportional to recursion depth, which can be O(n) in worst cases.

Prerequisites

  • Graphs and representations (adjacency lists): You must know how to store and traverse graphs efficiently to implement DFS.
  • Recursion and call stack: DFS is naturally recursive; understanding the stack helps explain discovery/finish times.
  • Big-O notation: To analyze DFS-based algorithms’ time and space complexity.
  • Trees and rooted trees: Subtree queries and ancestor tests rely on tree structure induced by DFS.
  • Prefix sums and range queries: Flattened subtrees map to contiguous ranges where prefix sums or Fenwick trees apply.
  • Stacks/queues and basic data structures: For iterative DFS variants and efficient processing of queries.
  • Directed vs undirected graphs: Edge classification and cycle detection differ between directed and undirected settings.

Detailed Explanation


01 Overview

Depth-First Search (DFS) is a foundational graph traversal technique that explores as far as possible along each branch before backtracking. While many learn DFS as a simple way to visit all nodes, its real power comes from the structure it induces: each vertex is stamped with a discovery time (when we first see it) and a finish time (when we leave it). These timestamps create a nested parenthesis structure that mirrors the recursion stack. From this structure, we derive useful properties: we can classify edges (tree, back, forward, cross), detect cycles, and flatten trees for fast subtree queries. In directed graphs, the presence of a back edge implies a cycle, while ordering by finish time yields a topological order if no cycles exist. In trees, DFS entry times map subtrees to contiguous ranges, enabling range-query tricks like prefix sums or Fenwick trees. Moreover, algorithms for strongly connected components (SCCs), like Kosaraju’s two-pass DFS, rely directly on finish times and graph reversal. All of this runs in linear time with respect to the number of vertices and edges, making DFS both elegant and efficient for competitive programming tasks in the 1200–1600 rating range.

02 Intuition & Analogies

Imagine exploring a cave system with a single rope. You tie the rope at the entrance, pick a tunnel, and go as deep as you can, marking each chamber when you first step in (discovery time). When you reach a dead end, you backtrack along the rope to the last unvisited tunnel, and when you finally leave a chamber for the last time, you note its finish time. If you step into Chamber B from Chamber A, then B and everything inside it is physically inside the loop of rope around A—like nested loops of string. In DFS terms, the discovery and finish times of B are enclosed by those of A, mirroring the nesting of function calls on the recursion stack. This nesting is the key: it lets us tell if a node is a descendant (it lies within your rope loop), or unrelated (their loops don’t overlap). When you find a passage from the current chamber back to one whose rope loop is still open (you haven’t finished it), that’s a back edge—it forms a cycle because you can walk from that ancestor down to you and then jump back up the shortcut. For trees, if you label chambers in the order you entered them, all rooms in a chamber’s subcave are in one continuous block in that list—perfect for summing up treasures in a whole subcave using a range sum. For directed graphs, finishing the deepest caves first and ordering chambers by when you left them reveals a direction-consistent schedule (a topological order) if no cycles exist. And by reversing tunnels and exploring in finish-time order (Kosaraju), you naturally walk exactly within each strongly connected blob before jumping to the next.

03 Formal Definition

Given a graph G = (V, E), a DFS starts from an unvisited vertex u, marks it discovered at time tin(u), recursively visits each unvisited neighbor v, and upon finishing all neighbors of u, records tout(u). The DFS over all unvisited vertices produces a DFS forest whose trees are rooted at the start vertices of each connected component (undirected) or reachability region (directed). For directed graphs, each edge (u, v) can be classified using timestamps: tree edges are those first discovering v from u; back edges connect u to an ancestor v in the DFS tree (tin(v) < tin(u) and tout(u) < tout(v)); forward edges go from u to a descendant v (tin(u) < tin(v) and tout(v) < tout(u), but not a tree edge); cross edges connect nodes in different branches (their time intervals are disjoint). For trees, if we define an Euler tour index pos[u] = tin(u) and record subtree size sz[u], then the subtree of u corresponds to the contiguous range [pos[u], pos[u] + sz[u] - 1] in the Euler array. In directed graphs, the presence of any back edge implies a directed cycle. In DAGs, sorting vertices by decreasing tout yields a valid topological ordering. Strongly connected components (SCCs) are maximal subsets of vertices where every vertex reaches every other; Kosaraju’s algorithm computes them via two DFS passes: one to get an order by finish times, and another on the reversed graph following that order.

04 When to Use

Use DFS when you need to explore connectivity or structure with minimal overhead. Typical cases include: (1) Tree queries: flatten a rooted tree with entry times so that subtree queries (sums, counts, min/max) reduce to range queries on an array. This is perfect for problems where updates are rare or can be handled with Fenwick/Segment trees on the Euler tour. (2) Cycle detection: in directed graphs, if you must determine whether a cycle exists or output one such cycle, DFS with a recursion stack (gray set) quickly finds it. In undirected graphs, DFS reveals cycles by encountering visited nodes that are not the parent. (3) Topological sorting: if you suspect the graph is a DAG, a single DFS that collects vertices by reverse finish time outputs a topological order. (4) Strongly connected components: to cluster vertices into SCCs for condensation DAGs, Kosaraju (two DFS passes) or Tarjan (single DFS with low-link values) are the go-to choices. (5) Edge classification: when reasoning about graph structure, such as proving properties or detecting back edges for correctness conditions, DFS timestamps provide a precise basis. DFS shines when memory is tight and you need linear-time traversals; it is less suitable when you need shortest paths in weighted graphs (use Dijkstra) or many dynamic updates to the structure (consider dynamic trees or BFS/DSU depending on needs).

⚠️ Common Mistakes

Common pitfalls include: (1) Mixing up timestamp inequalities: the correct ancestor test with timestamps is t_in(u) < t_in(v) and t_out(v) < t_out(u). Using non-strict inequalities or comparing t_out incorrectly causes subtle bugs in edge classification and ancestor checks. (2) Forgetting to ignore the parent in undirected DFS when checking for cycles—this produces false cycle reports. Always ensure that the visited neighbor is not the direct parent before declaring a cycle. (3) Not resetting or properly sizing global arrays (tin, tout, color) across multiple test cases, which is frequent in contest settings. (4) Stack overflow due to deep recursion on large or skewed graphs. In C++, very deep trees can exceed the recursion limit; consider iterative DFS or increase stack size. (5) Misbuilding the Euler tour: the subtree interval is [tin[u], tin[u] + sz[u] - 1]; using tout[u] as the right boundary only works if you define tout to equal tin[u] + sz[u] - 1, which differs from the classic exit time. (6) Using DFS for shortest paths in weighted graphs—DFS does not account for weights and can fail badly; use Dijkstra or Bellman–Ford instead. (7) In Kosaraju, forgetting to clear the visited array between passes or pushing vertices before fully finishing them in the first pass. (8) Mutating the graph (like sorting adjacency lists) in the middle of a DFS without care can change traversal order and break assumptions about timestamps or test expectations.

Key Formulas

DFS Time Complexity

T(n, m) = O(n + m)

Explanation: DFS visits each vertex and edge a constant number of times. Here n is the number of vertices and m is the number of edges.

Ancestor Test

u is a proper ancestor of v ⇔ tin(u) < tin(v) and tout(v) < tout(u)

Explanation: Vertex u is an ancestor of v in the DFS tree if and only if v’s discovery happens between u’s discovery and finish, and v finishes before u. This captures the nesting property of recursion.

Euler Tour Subtree Range

subtree(u) = [tin[u], tin[u] + sz[u] - 1]

Explanation: When we assign positions by discovery time and record subtree sizes, all nodes in u’s subtree occupy a contiguous segment in the Euler array. This enables range operations for subtree queries.

Back Edge Cycle Criterion

edge (u, v) with color[v] = gray ⇒ G contains a directed cycle

Explanation: If during DFS we see an edge to a vertex that is currently on the recursion stack (gray), then we have discovered a cycle in a directed graph.

Back Edge (Directed)

(u, v) is a back edge ⇔ tin(v) < tin(u) and tout(u) < tout(v)

Explanation: Edge (u, v) is a back edge if v is an ancestor of u in the DFS tree. This strictly characterizes cycles in directed graphs.

Forward Edge (Directed)

(u, v) is a forward edge ⇔ tin(u) < tin(v) and tout(v) < tout(u) and (u, v) ∉ T

Explanation: A forward edge goes from a node to a proper descendant in the DFS tree but is not the actual tree edge. T denotes the set of tree edges.

Cross Edge (Intervals Disjoint)

(u, v) is a cross edge ⇔ [tin(u), tout(u)] ∩ [tin(v), tout(v)] = ∅

Explanation: If two vertices’ time intervals are disjoint, an edge between them is a cross edge; neither is an ancestor of the other.

Topological Order via Finish Times

(u, v) ∈ E ⇒ tout(u) > tout(v)   (in a DAG)

Explanation: In a DAG, sorting vertices by decreasing finish time yields a valid topological order. Any edge goes from an earlier to a later position in this order.

DFS Forest Tree Edges

|T| = (|V_1| - 1) + (|V_2| - 1) + ... + (|V_k| - 1) = n - k

Explanation: A DFS forest over k components with vertex sets V_1, ..., V_k has exactly n - k tree edges. Each tree on |V_i| vertices contributes |V_i| - 1 tree edges.

Complexity Analysis

DFS runs in O(n + m) time where n is the number of vertices and m is the number of edges, because each vertex is discovered once and each adjacency list is scanned once. Recording the timestamps tin and tout adds only O(1) work per event. Building an Euler tour array also takes O(n), and answering a static subtree sum query with a prefix array is O(1) per query after O(n) preprocessing; with a Fenwick or segment tree, point updates are O(log n) and subtree queries are O(log n). Cycle detection via a recursion stack maintains an O(1) per-edge overhead (checking colors), keeping overall time O(n + m). Edge classification using timestamps is O(m) after a single DFS populates the times. Kosaraju’s algorithm for SCCs performs two DFS passes and a graph reversal, all of which sum to O(n + m). Space-wise, adjacency lists require O(n + m). DFS recursion uses O(h) stack space where h is the recursion depth; in the worst case h = O(n), e.g., on a path graph. Auxiliary arrays (tin, tout, color, parent, sz, order) take O(n) space. For very deep or adversarial graphs, consider iterative DFS to avoid stack overflow. Overall, DFS-based solutions are among the most space- and time-efficient linear-time techniques in graph algorithms.

Code Examples

Tree Flattening with DFS Timestamps for Subtree Sum Queries (Static)
#include <bits/stdc++.h>
using namespace std;

// This example demonstrates:
// 1) Computing DFS entry times (tin), exit times (tout_exit), and subtree sizes (sz)
// 2) Flattening a tree so each subtree is a contiguous range in an array
// 3) Answering static subtree sum queries using a prefix sum over the Euler array
// Note: For dynamic updates, replace the prefix sum with a Fenwick/Segment tree.
// The clock here ticks only on entry, so tin values are the compact positions
// 1..n and tout_exit[u] equals tin[u] + sz[u] - 1.

const int MAXN = 200000; // adjust as needed
vector<int> g[MAXN + 5];
int n;
vector<long long> val;          // node values (1-indexed)
vector<int> tin, tout_exit, sz; // timestamps and subtree sizes
vector<long long> euler;        // euler[pos] = value of node at position pos
int timer_dfs = 0;

void dfs(int u, int p) {
    tin[u] = ++timer_dfs;       // entry time = position in flattened array
    sz[u] = 1;
    euler[tin[u]] = val[u];
    for (int v : g[u]) {
        if (v == p) continue;   // important for undirected trees: ignore parent
        dfs(v, u);
        sz[u] += sz[v];
    }
    tout_exit[u] = timer_dfs;   // exit value of the entry-only clock:
                                // equals tin[u] + sz[u] - 1
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example input format:
    // n
    // val[1..n]
    // n-1 edges u v (1-indexed, undirected, forms a tree)
    // q
    // q lines: node u for subtree sum query
    // Sample:
    // 7
    // 1 2 3 4 5 6 7
    // 1 2
    // 1 3
    // 2 4
    // 2 5
    // 3 6
    // 3 7
    // 3
    // 1
    // 2
    // 3

    if (!(cin >> n)) return 0;
    val.assign(n + 1, 0);
    for (int i = 1; i <= n; ++i) cin >> val[i];
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }

    tin.assign(n + 1, 0);
    tout_exit.assign(n + 1, 0);
    sz.assign(n + 1, 0);
    euler.assign(n + 2, 0); // positions 1..n used; +2 for safety
    timer_dfs = 0;

    int root = 1; // assume 1 is the root
    dfs(root, -1);

    // Build prefix sums over the Euler array; the subtree of u is the
    // contiguous range [tin[u], tin[u] + sz[u] - 1]
    vector<long long> pref(n + 2, 0);
    for (int i = 1; i <= n; ++i) pref[i] = pref[i - 1] + euler[i];

    auto subtree_sum = [&](int u) {
        int L = tin[u];
        int R = tin[u] + sz[u] - 1; // right boundary from subtree size
        return pref[R] - pref[L - 1];
    };

    // Optional: check if u is an ancestor of v (ancestor-or-self). With the
    // entry-only clock, tout_exit[u] = tin[u] + sz[u] - 1, so this is a range
    // check with non-strict comparisons; the strict form
    // tin[u] < tin[v] && tout[v] < tout[u] applies to the classic clock that
    // also ticks on exit.
    auto is_ancestor = [&](int u, int v) {
        return tin[u] <= tin[v] && tin[v] <= tout_exit[u];
    };

    int q; cin >> q;
    while (q--) {
        int u; cin >> u;
        cout << subtree_sum(u) << "\n";
    }

    // Example: check ancestry (not part of required output)
    // cerr << (is_ancestor(1, 4) ? "1 is ancestor of 4\n" : "not ancestor\n");

    return 0;
}

We perform a DFS on a rooted tree to compute tin, tout_exit, and subtree sizes. Because the clock ticks only on entry, tin[u] is the position of u in the flattened array and the subtree of u corresponds to the contiguous range [tin[u], tin[u] + sz[u] - 1], with tout_exit[u] equal to that right boundary. We store val[u] at euler[tin[u]] and build a prefix sum so each subtree sum query is answered in O(1). We also show an is_ancestor test expressed as a range check on these positions (the strict-inequality form applies when the clock also ticks on exit).

Time: O(n) for preprocessing; O(1) per query. Space: O(n)
Directed Cycle Detection and Printing a Cycle via Back Edge (Gray Stack)
#include <bits/stdc++.h>
using namespace std;

// Detect a cycle in a directed graph using DFS colors:
// 0 = unvisited (white), 1 = in recursion stack (gray), 2 = finished (black).
// On seeing an edge to a gray node, we have a back edge and can reconstruct a cycle.

int n, m;
vector<vector<int>> g;
vector<int> color, parent;
int start_cycle = -1, end_cycle = -1;

bool dfs(int u) {
    color[u] = 1; // gray
    for (int v : g[u]) {
        if (color[v] == 0) {
            parent[v] = u;
            if (dfs(v)) return true;
        } else if (color[v] == 1) {
            // back edge u -> v found; cycle exists
            start_cycle = v; // cycle start (the gray ancestor)
            end_cycle = u;   // current node
            return true;
        }
    }
    color[u] = 2; // black
    return false;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input:
    // n m
    // m edges: u v (directed)
    // Prints a cycle if one exists, else ACYCLIC

    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
    }

    color.assign(n + 1, 0);
    parent.assign(n + 1, -1);

    bool has_cycle = false;
    for (int u = 1; u <= n; ++u) {
        if (color[u] == 0 && dfs(u)) { has_cycle = true; break; }
    }

    if (!has_cycle) {
        cout << "ACYCLIC\n";
        return 0;
    }

    // Reconstruct the cycle: walk from end_cycle up to start_cycle using parents
    vector<int> cycle;
    cycle.push_back(start_cycle);
    for (int v = end_cycle; v != start_cycle; v = parent[v]) cycle.push_back(v);
    cycle.push_back(start_cycle);
    reverse(cycle.begin(), cycle.end());

    cout << "CYCLE FOUND (length = " << (int)cycle.size() - 1 << "):\n";
    for (size_t i = 0; i < cycle.size(); ++i) {
        cout << cycle[i] << (i + 1 == cycle.size() ? '\n' : ' ');
    }
    return 0;
}

We color nodes during DFS and detect a back edge when encountering a gray node in the recursion stack. This guarantees a directed cycle. We store parents to reconstruct the actual cycle by walking back from the current node to the gray ancestor.

Time: O(n + m). Space: O(n + m) for the graph and O(n) for DFS stacks/arrays
Strongly Connected Components (SCC) via Kosaraju’s Two-Pass DFS
#include <bits/stdc++.h>
using namespace std;

// Kosaraju's algorithm:
// 1) DFS on G to compute an order by finish times.
// 2) DFS on the reversed graph, in reverse of that order, to extract SCCs.

int n, m;
vector<vector<int>> g, gr;  // graph and reversed graph
vector<int> used;
vector<int> order, comp;    // order by finish time; comp[v] = component id

void dfs1(int u) {
    used[u] = 1;
    for (int v : g[u]) if (!used[v]) dfs1(v);
    order.push_back(u); // push after exploring: increasing finish time
}

void dfs2(int u, int cid) {
    comp[u] = cid;
    for (int v : gr[u]) if (comp[v] == -1) dfs2(v, cid);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input:
    // n m
    // m edges u v (directed)
    // Output: number of SCCs and the component id of each vertex (1..k)

    if (!(cin >> n >> m)) return 0;
    g.assign(n + 1, {});
    gr.assign(n + 1, {});
    for (int i = 0; i < m; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        gr[v].push_back(u); // reversed edge
    }

    used.assign(n + 1, 0);
    order.clear();

    for (int u = 1; u <= n; ++u) if (!used[u]) dfs1(u);

    comp.assign(n + 1, -1);
    int cid = 0;
    for (int i = (int)order.size() - 1; i >= 0; --i) {
        int u = order[i];
        if (comp[u] == -1) dfs2(u, ++cid);
    }

    cout << cid << "\n"; // number of SCCs
    for (int u = 1; u <= n; ++u) cout << comp[u] << (u == n ? '\n' : ' ');
    return 0;
}

The first DFS collects vertices in order of finishing times. Processing vertices in reverse finish-time order on the reversed graph ensures each DFS stays inside one SCC before moving to the next. The result is a partition of vertices into SCCs and their component ids.

Time: O(n + m). Space: O(n + m)