⚙️AlgorithmIntermediate

Depth-First Search (DFS)

Key Points

  • Depth-First Search (DFS) explores a graph by going as deep as possible along each path before backtracking.
  • It can be implemented with recursion (call stack) or an explicit stack and runs in O(V + E) time on adjacency lists.
  • DFS is used for connected components, cycle detection, topological sorting, and classifying edges in directed graphs.
  • In directed graphs, DFS classifies edges as tree, back, forward, or cross based on discovery and finish times.
  • Topological sort with DFS works only on DAGs and detects cycles via a 'gray' (on-stack) state.
  • Preorder (discover time) and postorder (finish time) sequences from DFS reveal structure and dependencies.
  • On undirected graphs, the presence of any back edge detected by DFS indicates a cycle.
  • Space usage is O(V) for the visited array plus up to O(V) for recursion or an explicit stack.

Prerequisites

  • Graphs and representations (adjacency list vs matrix)Understanding how graphs are stored is essential to implement DFS efficiently and reason about O(|V| + |E|) time.
  • Recursion and call stacksRecursive DFS relies on function calls to dive deeper and backtrack, mirroring DFS behavior.
  • Stacks (LIFO data structure)Iterative DFS uses an explicit stack to simulate recursion and avoid stack overflows.
  • Big-O notationAnalyzing DFS time and space complexity uses asymptotic notation like O(|V| + |E|).
  • Directed vs undirected graphsCycle detection and edge classification rules differ based on graph directionality.

Detailed Explanation

Tap terms for definitions

01Overview

Depth-First Search (DFS) is a fundamental graph traversal algorithm that systematically explores vertices and edges by following a path as far as possible, then backtracking when it hits a dead end. Think of walking through a maze by always choosing one direction and sticking with it until you can't go further, then retracing your steps to try another path. DFS can be implemented recursively or iteratively using an explicit stack; both approaches simulate the same process of diving deep and unwinding. The algorithm is versatile: it can find connected components, detect cycles, compute topological order in directed acyclic graphs (DAGs), and classify edges to reveal graph structure. Its behavior depends on the representation of the graph; with adjacency lists, it visits each vertex and each edge at most a constant number of times, giving a linear-time complexity in the size of the graph. DFS produces rich metadata like discovery and finishing times for each vertex, which help reason about ancestor–descendant relationships and edge types. This information is the backbone of many advanced graph techniques used in competitive programming and computer science, including identifying articulation points, bridges, and strongly connected components.

02Intuition & Analogies

Imagine exploring a vast library with interconnected rooms. You start at one room, pick the first unexplored doorway you see, and keep going deeper through doorways without turning back. Only when you reach a room where every doorway is either blocked or already visited do you backtrack to the previous room and try the next unexplored doorway. This is precisely the intuition of DFS: commit to a path, push forward, then unwind when stuck. In programming terms, recursion naturally models this behavior: the call to visit a neighbor "pauses" the current room until the deeper calls finish, and then you resume to consider the next doorway. If recursion feels magical, picture a stack of sticky notes: each time you enter a new room, you write "I came from room X and I was at neighbor index k" and push it onto the stack. When you're done with that room, pop the note and continue where you left off. DFS also leaves time stamps (discovery when you first enter a room and finishing when you leave) which let you infer relationships: if room A’s entire exploration interval wraps around room B’s, then B lies inside A’s region—B is a descendant of A in the DFS tree. In directed graphs, when you see a doorway that leads to a room still in the middle of exploration (on the stack), you’ve just found a back edge—proof of a cycle. These simple metaphors—going deep first, using a stack of context, and stamping times—explain why DFS is both simple to implement and powerful in what it reveals.

03Formal Definition

Given a graph G = (V, E), Depth-First Search systematically explores V by maintaining a set of visited vertices and recursively (or iteratively with a stack) visiting unvisited neighbors. Starting from a source s, DFS marks s as visited, then for each edge (s, v) explores v if it is not yet visited, recursively applying the same rule. The traversal defines a DFS forest F whose trees correspond to connected components (undirected) or reachability regions (directed). Each vertex u receives two time stamps during DFS: discovery time d[u] when u is first visited and finishing time f[u] when all edges from u have been processed. For directed graphs, every edge (u, v) is classified using these times and the color states: tree edge if v was undiscovered upon exploration; back edge if v is on the current recursion stack (gray); forward edge if v is a proper descendant (d[u] < d[v] and f[v] < f[u]); and cross edge otherwise. Using an adjacency list, DFS runs in O(|V| + |E|) time and O(|V|) space, where it touches each vertex and each edge at most a constant number of times while storing visitation metadata and the implicit/explicit stack.

04When to Use

Use DFS whenever you need to explore connectivity or depth-related structure in a graph. In undirected graphs, it quickly finds connected components, counts them, and extracts each component’s vertices. When you must detect cycles, DFS provides simple patterns: in undirected graphs, encountering an already visited neighbor that is not the parent reveals a cycle; in directed graphs, any edge to a gray vertex signals a cycle. For ordering tasks where dependencies are acyclic, DFS yields a topological sort by pushing vertices onto a list upon finishing; the reversed finishing order respects all edge directions. DFS also powers more advanced routines: computing articulation points and bridges (network reliability), finding strongly connected components (Kosaraju/Tarjan algorithms), and solving puzzles like path existence in mazes with constraints. Prefer DFS over BFS when the problem cares about ancestor–descendant relations, recursion-friendly backtracking, or needs finishing-time logic (e.g., topological sort). However, if you require shortest paths in unweighted graphs, BFS is the correct choice; if the graph is extremely deep and recursion depth limits are a concern, consider iterative DFS with an explicit stack.

⚠️Common Mistakes

Common pitfalls include forgetting to mark a node as visited before recursing, which can lead to infinite recursion on cycles. In undirected graphs, failing to pass and check the parent can cause false-positive cycle detection because every edge appears twice. In directed graphs, mixing up the states (white/gray/black) or relying only on a visited boolean may miss cycle detection or misclassify edges. Another error is using adjacency matrices on large graphs, which inflates time to O(V^2) and memory to O(V^2) unnecessarily; adjacency lists are preferred for sparse graphs. Many beginners also push onto the topological order when discovering vertices instead of when finishing them, producing an invalid order. Edge classification can be implemented incorrectly if you attempt to classify using only visited flags without discovery/finish times. On the implementation side, recursive DFS on very deep graphs can overflow the call stack; competitive programmers should consider iterative DFS or increase recursion limits (where allowed). Finally, not clearing or correctly sizing global arrays between test cases (common in contests) can cause subtle bugs due to stale state, and forgetting to make the graph 0- or 1-index consistent leads to out-of-bounds errors.

Key Formulas

DFS Time Complexity

Explanation: DFS visits each vertex and each edge a constant number of times when using adjacency lists. Thus the total running time grows linearly with the number of vertices plus edges.

DFS Space Complexity

Explanation: DFS stores visited state and recursion/explicit stack information proportional to the number of vertices. In the worst case, the recursion depth (or explicit stack size) can reach |V|.

Handshake Lemma (Undirected)

Explanation: In undirected graphs, each edge contributes 2 to the total degree count. This identity explains why processing all adjacency lists takes O(|V| + |E|) time.

Out-degree Sum (Directed)

Explanation: In directed graphs, each edge contributes exactly 1 to some vertex’s out-degree. Iterating over all adjacency lists touches each directed edge once.

DFS Interval Nesting

Explanation: Discovery and finishing times form nested intervals. A vertex’s interval strictly contains those of its descendants.

Cycle Witness (Directed)

Explanation: During DFS, an edge to a gray vertex indicates a path from v to u already on the stack, creating a directed cycle. This is the core of DFS-based cycle detection.

Forward Edge Criterion

Explanation: Forward edges point from a node to a proper descendant in the DFS tree. The time intervals confirm the ancestor–descendant relationship.

Cross Edge Criterion

Explanation: When v is finished and not a descendant of u, an explored edge (u,v) is cross. It typically connects distinct DFS subtrees or non-ancestor nodes.

DAG Characterization via DFS

Explanation: A directed graph has no cycles if and only if a DFS produces no back edges. This property underlies DFS-based topological sorting.

Tree Edge Count

Explanation: In a DFS forest with c trees (components/reachability roots), the number of tree edges equals the number of vertices minus the number of trees. Each vertex except the c roots has exactly one parent edge.

Complexity Analysis

Using adjacency lists, DFS inspects each vertex’s adjacency list exactly once. The sum of lengths of all adjacency lists equals the number of edges (|E|) in directed graphs and 2|E| in undirected graphs. Therefore, the total edge processing is linear in the number of edges. Visiting each vertex involves constant work beyond iterating its neighbors (marking visited, setting times), adding O(|V|). Combined, the running time is O(|V| + |E|). When an adjacency matrix is used, scanning neighbors of one vertex takes O(|V|), so the entire traversal may degrade to O(|V|^2), which is inefficient for sparse graphs. Space usage consists of: (1) the visited/color arrays O(|V|), (2) optional arrays for discovery and finish times O(|V|), parents O(|V|), and (3) the recursion stack in recursive implementations or an explicit stack in iterative ones. In the worst case (e.g., a long path), the maximum depth equals |V|, so auxiliary stack space is O(|V|). Edge classification and topological sorting reuse the same asymptotic space. Overall, DFS requires O(|V|) extra memory beyond the graph storage. Note that the graph itself stored as an adjacency list requires O(|V| + |E|) space, which dominates for large, sparse graphs. In practice, for competitive programming constraints (e.g., |V| up to 2e5), iterative DFS may be preferred in C++ to avoid stack overflow, although optimized compilers and careful recursion can also suffice with appropriate stack limits.

Code Examples

Basic recursive DFS: connected components in an undirected graph
1#include <bits/stdc++.h>
2using namespace std;
3
4// Find connected components using DFS (recursive)
5// Input format:
6// n m
7// m lines: u v (1-indexed undirected edges)
8// Output: number of components and listing of each component's nodes
9
10void dfs(int u, const vector<vector<int>>& g, vector<int>& comp, int cid) {
11 comp[u] = cid; // mark component id (also serves as visited)
12 for (int v : g[u]) {
13 if (comp[v] == -1) {
14 dfs(v, g, comp, cid);
15 }
16 }
17}
18
19int main() {
20 ios::sync_with_stdio(false);
21 cin.tie(nullptr);
22
23 int n, m;
24 if (!(cin >> n >> m)) return 0;
25 vector<vector<int>> g(n + 1);
26 for (int i = 0; i < m; ++i) {
27 int u, v; cin >> u >> v;
28 g[u].push_back(v);
29 g[v].push_back(u); // undirected
30 }
31
32 vector<int> comp(n + 1, -1);
33 int cid = 0; // component counter starts from 0
34
35 for (int u = 1; u <= n; ++u) {
36 if (comp[u] == -1) {
37 dfs(u, g, comp, cid);
38 ++cid;
39 }
40 }
41
42 cout << cid << "\n";
43 // Collect vertices per component for display
44 vector<vector<int>> groups(cid);
45 for (int u = 1; u <= n; ++u) groups[comp[u]].push_back(u);
46 for (int i = 0; i < cid; ++i) {
47 cout << "component " << i << ":";
48 for (int u : groups[i]) cout << ' ' << u;
49 cout << '\n';
50 }
51 return 0;
52}
53

We perform a DFS from every unvisited vertex to label all vertices reachable from it with the same component id. The comp array doubles as a visited flag (comp[u] != -1 means visited) and a component label. Each DFS call explores neighbors recursively until no new vertices are found, thereby forming one connected component.

Time: O(n + m) where n=|V|, m=|E|Space: O(n) extra for comp plus O(n) recursion stack in the worst case
Iterative DFS with an explicit stack: preorder and postorder in a directed graph
1#include <bits/stdc++.h>
2using namespace std;
3
4// Iterative DFS that records preorder (discovery) and postorder (finish) sequences.
5// Input:
6// n m
7// m lines: u v (1-indexed directed edges)
8
9int main() {
10 ios::sync_with_stdio(false);
11 cin.tie(nullptr);
12
13 int n, m;
14 if (!(cin >> n >> m)) return 0;
15 vector<vector<int>> g(n + 1);
16 for (int i = 0; i < m; ++i) {
17 int u, v; cin >> u >> v;
18 g[u].push_back(v); // directed
19 }
20
21 vector<int> color(n + 1, 0); // 0=white,1=gray,2=black
22 vector<int> preorder, postorder;
23
24 // Stack stores (node, next_child_index). Simulates recursion.
25 for (int s = 1; s <= n; ++s) if (color[s] == 0) {
26 stack<pair<int,int>> st;
27 st.push({s, 0});
28 color[s] = 1; // gray (discovered)
29 preorder.push_back(s);
30 while (!st.empty()) {
31 auto &[u, idx] = st.top();
32 if (idx < (int)g[u].size()) {
33 int v = g[u][idx++];
34 if (color[v] == 0) {
35 color[v] = 1; // discover v
36 preorder.push_back(v);
37 st.push({v, 0});
38 }
39 // if color[v] != 0, skip; cycle handling is not needed for orders here
40 } else {
41 // finished u
42 color[u] = 2; // black
43 postorder.push_back(u);
44 st.pop();
45 }
46 }
47 }
48
49 cout << "preorder:";
50 for (int x : preorder) cout << ' ' << x;
51 cout << '\n';
52
53 cout << "postorder:";
54 for (int x : postorder) cout << ' ' << x;
55 cout << '\n';
56 return 0;
57}
58

This example demonstrates DFS without recursion by managing an explicit stack that stores both the current node and the index of the next neighbor to explore. We record a node in preorder when it becomes gray (discovered) and in postorder when it becomes black (all neighbors processed). The approach is robust to deep graphs where recursion may overflow.

Time: O(n + m)Space: O(n) for colors and stack (worst-case depth n)
DFS-based topological sort with cycle detection (directed graph)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Topological sort using DFS finishing times. Detect cycles with colors.
5// Input:
6// n m
7// m lines: u v (edge u -> v)
8// Output: topological order or "IMPOSSIBLE" if a cycle exists.
9
10vector<vector<int>> g;
11vector<int> color; // 0=white, 1=gray, 2=black
12vector<int> order;
13bool hasCycle = false;
14
15void dfs(int u) {
16 color[u] = 1; // gray
17 for (int v : g[u]) {
18 if (color[v] == 0) {
19 dfs(v);
20 if (hasCycle) return; // early exit if desired
21 } else if (color[v] == 1) {
22 hasCycle = true; // back edge => cycle
23 return;
24 }
25 }
26 color[u] = 2; // black
27 order.push_back(u); // push on finish
28}
29
30int main() {
31 ios::sync_with_stdio(false);
32 cin.tie(nullptr);
33
34 int n, m;
35 if (!(cin >> n >> m)) return 0;
36 g.assign(n + 1, {});
37 for (int i = 0; i < m; ++i) {
38 int u, v; cin >> u >> v;
39 g[u].push_back(v);
40 }
41
42 color.assign(n + 1, 0);
43 for (int u = 1; u <= n; ++u) if (color[u] == 0) {
44 dfs(u);
45 if (hasCycle) break;
46 }
47
48 if (hasCycle) {
49 cout << "IMPOSSIBLE\n";
50 } else {
51 reverse(order.begin(), order.end());
52 for (int x : order) cout << x << ' ';
53 cout << '\n';
54 }
55 return 0;
56}
57

We use color states to detect back edges: encountering a gray neighbor implies a directed cycle. If no cycle is present, pushing vertices upon finishing (postorder) and then reversing yields a valid topological order that respects all edges.

Time: O(n + m)Space: O(n) for color and order plus O(n) recursion stack in the worst case
Edge classification in a directed graph using discovery/finish times
1#include <bits/stdc++.h>
2using namespace std;
3
4// Classify edges as tree, back, forward, or cross using DFS times.
5// Input:
6// n m
7// m lines: u v (directed edges). We will also print classification for each input edge.
8
9struct Edge { int u, v; };
10
11vector<vector<int>> g;
12vector<int> color, tin, tout;
13int timer = 0;
14
15void dfs(int u) {
16 color[u] = 1; // gray
17 tin[u] = ++timer; // discovery time
18 for (int v : g[u]) {
19 if (color[v] == 0) {
20 cout << u << " -> " << v << ": tree\n";
21 dfs(v);
22 } else if (color[v] == 1) {
23 cout << u << " -> " << v << ": back\n"; // to ancestor on stack
24 } else {
25 // color[v] == 2 (finished). Determine forward vs cross by interval nesting.
26 if (tin[u] < tin[v] && tout[v] < tout[u]) {
27 cout << u << " -> " << v << ": forward\n";
28 } else {
29 cout << u << " -> " << v << ": cross\n";
30 }
31 }
32 }
33 tout[u] = ++timer; // finish time set on exit
34 color[u] = 2; // black
35}
36
37int main() {
38 ios::sync_with_stdio(false);
39 cin.tie(nullptr);
40
41 int n, m;
42 if (!(cin >> n >> m)) return 0;
43 g.assign(n + 1, {});
44 vector<Edge> edges; edges.reserve(m);
45 for (int i = 0; i < m; ++i) {
46 int u, v; cin >> u >> v; edges.push_back({u,v}); g[u].push_back(v);
47 }
48
49 color.assign(n + 1, 0);
50 tin.assign(n + 1, 0);
51 tout.assign(n + 1, 0);
52
53 for (int u = 1; u <= n; ++u) if (color[u] == 0) dfs(u);
54
55 // Note: The classifications were printed during exploration when the relationship can be decided.
56 return 0;
57}
58

We track discovery (tin) and finish (tout) times alongside color states. When exploring an edge (u, v): if v is white, it is a tree edge; if v is gray, it is a back edge; if v is black, we check interval nesting tin[u] < tin[v] and tout[v] < tout[u] to decide forward vs cross. This mirrors the standard theoretical classification for DFS in directed graphs.

Time: O(n + m)Space: O(n) for color and time arrays plus O(n) recursion stack in the worst case