Depth-First Search (DFS)
Key Points
- •Depth-First Search (DFS) explores a graph by going as deep as possible along each path before backtracking.
- •It can be implemented with recursion (call stack) or an explicit stack and runs in O(V + E) time on adjacency lists.
- •DFS is used for connected components, cycle detection, topological sorting, and classifying edges in directed graphs.
- •In directed graphs, DFS classifies edges as tree, back, forward, or cross based on discovery and finish times.
- •Topological sort with DFS works only on DAGs and detects cycles via a 'gray' (on-stack) state.
- •Preorder (discover time) and postorder (finish time) sequences from DFS reveal structure and dependencies.
- •On undirected graphs, the presence of any back edge detected by DFS indicates a cycle.
- •Space usage is O(V) for the visited array plus up to O(V) for recursion or an explicit stack.
Prerequisites
- →Graphs and representations (adjacency list vs matrix) — Understanding how graphs are stored is essential to implement DFS efficiently and reason about O(|V| + |E|) time.
- →Recursion and call stacks — Recursive DFS relies on function calls to dive deeper and backtrack, mirroring DFS behavior.
- →Stacks (LIFO data structure) — Iterative DFS uses an explicit stack to simulate recursion and avoid stack overflows.
- →Big-O notation — Analyzing DFS time and space complexity uses asymptotic notation like O(|V| + |E|).
- →Directed vs undirected graphs — Cycle detection and edge classification rules differ based on graph directionality.
Detailed Explanation
Tap terms for definitions01Overview
Depth-First Search (DFS) is a fundamental graph traversal algorithm that systematically explores vertices and edges by following a path as far as possible, then backtracking when it hits a dead end. Think of walking through a maze by always choosing one direction and sticking with it until you can't go further, then retracing your steps to try another path. DFS can be implemented recursively or iteratively using an explicit stack; both approaches simulate the same process of diving deep and unwinding. The algorithm is versatile: it can find connected components, detect cycles, compute topological order in directed acyclic graphs (DAGs), and classify edges to reveal graph structure. Its behavior depends on the representation of the graph; with adjacency lists, it visits each vertex and each edge at most a constant number of times, giving a linear-time complexity in the size of the graph. DFS produces rich metadata like discovery and finishing times for each vertex, which help reason about ancestor–descendant relationships and edge types. This information is the backbone of many advanced graph techniques used in competitive programming and computer science, including identifying articulation points, bridges, and strongly connected components.
02Intuition & Analogies
Imagine exploring a vast library with interconnected rooms. You start at one room, pick the first unexplored doorway you see, and keep going deeper through doorways without turning back. Only when you reach a room where every doorway is either blocked or already visited do you backtrack to the previous room and try the next unexplored doorway. This is precisely the intuition of DFS: commit to a path, push forward, then unwind when stuck. In programming terms, recursion naturally models this behavior: the call to visit a neighbor "pauses" the current room until the deeper calls finish, and then you resume to consider the next doorway. If recursion feels magical, picture a stack of sticky notes: each time you enter a new room, you write "I came from room X and I was at neighbor index k" and push it onto the stack. When you're done with that room, pop the note and continue where you left off. DFS also leaves time stamps (discovery when you first enter a room and finishing when you leave) which let you infer relationships: if room A’s entire exploration interval wraps around room B’s, then B lies inside A’s region—B is a descendant of A in the DFS tree. In directed graphs, when you see a doorway that leads to a room still in the middle of exploration (on the stack), you’ve just found a back edge—proof of a cycle. These simple metaphors—going deep first, using a stack of context, and stamping times—explain why DFS is both simple to implement and powerful in what it reveals.
03Formal Definition
04When to Use
Use DFS whenever you need to explore connectivity or depth-related structure in a graph. In undirected graphs, it quickly finds connected components, counts them, and extracts each component’s vertices. When you must detect cycles, DFS provides simple patterns: in undirected graphs, encountering an already visited neighbor that is not the parent reveals a cycle; in directed graphs, any edge to a gray vertex signals a cycle. For ordering tasks where dependencies are acyclic, DFS yields a topological sort by pushing vertices onto a list upon finishing; the reversed finishing order respects all edge directions. DFS also powers more advanced routines: computing articulation points and bridges (network reliability), finding strongly connected components (Kosaraju/Tarjan algorithms), and solving puzzles like path existence in mazes with constraints. Prefer DFS over BFS when the problem cares about ancestor–descendant relations, recursion-friendly backtracking, or needs finishing-time logic (e.g., topological sort). However, if you require shortest paths in unweighted graphs, BFS is the correct choice; if the graph is extremely deep and recursion depth limits are a concern, consider iterative DFS with an explicit stack.
⚠️Common Mistakes
Common pitfalls include forgetting to mark a node as visited before recursing, which can lead to infinite recursion on cycles. In undirected graphs, failing to pass and check the parent can cause false-positive cycle detection because every edge appears twice. In directed graphs, mixing up the states (white/gray/black) or relying only on a visited boolean may miss cycle detection or misclassify edges. Another error is using adjacency matrices on large graphs, which inflates time to O(V^2) and memory to O(V^2) unnecessarily; adjacency lists are preferred for sparse graphs. Many beginners also push onto the topological order when discovering vertices instead of when finishing them, producing an invalid order. Edge classification can be implemented incorrectly if you attempt to classify using only visited flags without discovery/finish times. On the implementation side, recursive DFS on very deep graphs can overflow the call stack; competitive programmers should consider iterative DFS or increase recursion limits (where allowed). Finally, not clearing or correctly sizing global arrays between test cases (common in contests) can cause subtle bugs due to stale state, and forgetting to make the graph 0- or 1-index consistent leads to out-of-bounds errors.
Key Formulas
DFS Time Complexity
Explanation: DFS visits each vertex and each edge a constant number of times when using adjacency lists. Thus the total running time grows linearly with the number of vertices plus edges.
DFS Space Complexity
Explanation: DFS stores visited state and recursion/explicit stack information proportional to the number of vertices. In the worst case, the recursion depth (or explicit stack size) can reach |V|.
Handshake Lemma (Undirected)
Explanation: In undirected graphs, each edge contributes 2 to the total degree count. This identity explains why processing all adjacency lists takes O(|V| + |E|) time.
Out-degree Sum (Directed)
Explanation: In directed graphs, each edge contributes exactly 1 to some vertex’s out-degree. Iterating over all adjacency lists touches each directed edge once.
DFS Interval Nesting
Explanation: Discovery and finishing times form nested intervals. A vertex’s interval strictly contains those of its descendants.
Cycle Witness (Directed)
Explanation: During DFS, an edge to a gray vertex indicates a path from v to u already on the stack, creating a directed cycle. This is the core of DFS-based cycle detection.
Forward Edge Criterion
Explanation: Forward edges point from a node to a proper descendant in the DFS tree. The time intervals confirm the ancestor–descendant relationship.
Cross Edge Criterion
Explanation: When v is finished and not a descendant of u, an explored edge (u,v) is cross. It typically connects distinct DFS subtrees or non-ancestor nodes.
DAG Characterization via DFS
Explanation: A directed graph has no cycles if and only if a DFS produces no back edges. This property underlies DFS-based topological sorting.
Tree Edge Count
Explanation: In a DFS forest with c trees (components/reachability roots), the number of tree edges equals the number of vertices minus the number of trees. Each vertex except the c roots has exactly one parent edge.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Find connected components using DFS (recursive) 5 // Input format: 6 // n m 7 // m lines: u v (1-indexed undirected edges) 8 // Output: number of components and listing of each component's nodes 9 10 void dfs(int u, const vector<vector<int>>& g, vector<int>& comp, int cid) { 11 comp[u] = cid; // mark component id (also serves as visited) 12 for (int v : g[u]) { 13 if (comp[v] == -1) { 14 dfs(v, g, comp, cid); 15 } 16 } 17 } 18 19 int main() { 20 ios::sync_with_stdio(false); 21 cin.tie(nullptr); 22 23 int n, m; 24 if (!(cin >> n >> m)) return 0; 25 vector<vector<int>> g(n + 1); 26 for (int i = 0; i < m; ++i) { 27 int u, v; cin >> u >> v; 28 g[u].push_back(v); 29 g[v].push_back(u); // undirected 30 } 31 32 vector<int> comp(n + 1, -1); 33 int cid = 0; // component counter starts from 0 34 35 for (int u = 1; u <= n; ++u) { 36 if (comp[u] == -1) { 37 dfs(u, g, comp, cid); 38 ++cid; 39 } 40 } 41 42 cout << cid << "\n"; 43 // Collect vertices per component for display 44 vector<vector<int>> groups(cid); 45 for (int u = 1; u <= n; ++u) groups[comp[u]].push_back(u); 46 for (int i = 0; i < cid; ++i) { 47 cout << "component " << i << ":"; 48 for (int u : groups[i]) cout << ' ' << u; 49 cout << '\n'; 50 } 51 return 0; 52 } 53
We perform a DFS from every unvisited vertex to label all vertices reachable from it with the same component id. The comp array doubles as a visited flag (comp[u] != -1 means visited) and a component label. Each DFS call explores neighbors recursively until no new vertices are found, thereby forming one connected component.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Iterative DFS that records preorder (discovery) and postorder (finish) sequences. 5 // Input: 6 // n m 7 // m lines: u v (1-indexed directed edges) 8 9 int main() { 10 ios::sync_with_stdio(false); 11 cin.tie(nullptr); 12 13 int n, m; 14 if (!(cin >> n >> m)) return 0; 15 vector<vector<int>> g(n + 1); 16 for (int i = 0; i < m; ++i) { 17 int u, v; cin >> u >> v; 18 g[u].push_back(v); // directed 19 } 20 21 vector<int> color(n + 1, 0); // 0=white,1=gray,2=black 22 vector<int> preorder, postorder; 23 24 // Stack stores (node, next_child_index). Simulates recursion. 25 for (int s = 1; s <= n; ++s) if (color[s] == 0) { 26 stack<pair<int,int>> st; 27 st.push({s, 0}); 28 color[s] = 1; // gray (discovered) 29 preorder.push_back(s); 30 while (!st.empty()) { 31 auto &[u, idx] = st.top(); 32 if (idx < (int)g[u].size()) { 33 int v = g[u][idx++]; 34 if (color[v] == 0) { 35 color[v] = 1; // discover v 36 preorder.push_back(v); 37 st.push({v, 0}); 38 } 39 // if color[v] != 0, skip; cycle handling is not needed for orders here 40 } else { 41 // finished u 42 color[u] = 2; // black 43 postorder.push_back(u); 44 st.pop(); 45 } 46 } 47 } 48 49 cout << "preorder:"; 50 for (int x : preorder) cout << ' ' << x; 51 cout << '\n'; 52 53 cout << "postorder:"; 54 for (int x : postorder) cout << ' ' << x; 55 cout << '\n'; 56 return 0; 57 } 58
This example demonstrates DFS without recursion by managing an explicit stack that stores both the current node and the index of the next neighbor to explore. We record a node in preorder when it becomes gray (discovered) and in postorder when it becomes black (all neighbors processed). The approach is robust to deep graphs where recursion may overflow.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Topological sort using DFS finishing times. Detect cycles with colors. 5 // Input: 6 // n m 7 // m lines: u v (edge u -> v) 8 // Output: topological order or "IMPOSSIBLE" if a cycle exists. 9 10 vector<vector<int>> g; 11 vector<int> color; // 0=white, 1=gray, 2=black 12 vector<int> order; 13 bool hasCycle = false; 14 15 void dfs(int u) { 16 color[u] = 1; // gray 17 for (int v : g[u]) { 18 if (color[v] == 0) { 19 dfs(v); 20 if (hasCycle) return; // early exit if desired 21 } else if (color[v] == 1) { 22 hasCycle = true; // back edge => cycle 23 return; 24 } 25 } 26 color[u] = 2; // black 27 order.push_back(u); // push on finish 28 } 29 30 int main() { 31 ios::sync_with_stdio(false); 32 cin.tie(nullptr); 33 34 int n, m; 35 if (!(cin >> n >> m)) return 0; 36 g.assign(n + 1, {}); 37 for (int i = 0; i < m; ++i) { 38 int u, v; cin >> u >> v; 39 g[u].push_back(v); 40 } 41 42 color.assign(n + 1, 0); 43 for (int u = 1; u <= n; ++u) if (color[u] == 0) { 44 dfs(u); 45 if (hasCycle) break; 46 } 47 48 if (hasCycle) { 49 cout << "IMPOSSIBLE\n"; 50 } else { 51 reverse(order.begin(), order.end()); 52 for (int x : order) cout << x << ' '; 53 cout << '\n'; 54 } 55 return 0; 56 } 57
We use color states to detect back edges: encountering a gray neighbor implies a directed cycle. If no cycle is present, pushing vertices upon finishing (postorder) and then reversing yields a valid topological order that respects all edges.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Classify edges as tree, back, forward, or cross using DFS times. 5 // Input: 6 // n m 7 // m lines: u v (directed edges). We will also print classification for each input edge. 8 9 struct Edge { int u, v; }; 10 11 vector<vector<int>> g; 12 vector<int> color, tin, tout; 13 int timer = 0; 14 15 void dfs(int u) { 16 color[u] = 1; // gray 17 tin[u] = ++timer; // discovery time 18 for (int v : g[u]) { 19 if (color[v] == 0) { 20 cout << u << " -> " << v << ": tree\n"; 21 dfs(v); 22 } else if (color[v] == 1) { 23 cout << u << " -> " << v << ": back\n"; // to ancestor on stack 24 } else { 25 // color[v] == 2 (finished). Determine forward vs cross by interval nesting. 26 if (tin[u] < tin[v] && tout[v] < tout[u]) { 27 cout << u << " -> " << v << ": forward\n"; 28 } else { 29 cout << u << " -> " << v << ": cross\n"; 30 } 31 } 32 } 33 tout[u] = ++timer; // finish time set on exit 34 color[u] = 2; // black 35 } 36 37 int main() { 38 ios::sync_with_stdio(false); 39 cin.tie(nullptr); 40 41 int n, m; 42 if (!(cin >> n >> m)) return 0; 43 g.assign(n + 1, {}); 44 vector<Edge> edges; edges.reserve(m); 45 for (int i = 0; i < m; ++i) { 46 int u, v; cin >> u >> v; edges.push_back({u,v}); g[u].push_back(v); 47 } 48 49 color.assign(n + 1, 0); 50 tin.assign(n + 1, 0); 51 tout.assign(n + 1, 0); 52 53 for (int u = 1; u <= n; ++u) if (color[u] == 0) dfs(u); 54 55 // Note: The classifications were printed during exploration when the relationship can be decided. 56 return 0; 57 } 58
We track discovery (tin) and finish (tout) times alongside color states. When exploring an edge (u, v): if v is white, it is a tree edge; if v is gray, it is a back edge; if v is black, we check interval nesting tin[u] < tin[v] and tout[v] < tout[u] to decide forward vs cross. This mirrors the standard theoretical classification for DFS in directed graphs.