Graph Fundamentals
Key Points
- •A graph models relationships between items using vertices (nodes) and edges (links).
- •An adjacency matrix is an n-by-n table A where indicates whether vertex i is connected to vertex j; it uses O() space and is good for dense graphs.
- •The degree of a vertex is the number of edges touching it; the degree matrix D is a diagonal matrix with degrees on the diagonal.
- •For undirected simple graphs, the sum of all degrees equals twice the number of edges (Handshaking Lemma).
- •Adjacency lists store only existing neighbors, using O(n + m) space and enabling O(n + m) traversals like BFS/DFS on sparse graphs.
- •BFS on an adjacency matrix takes O(), while BFS on adjacency lists takes O(n + m).
- •The graph Laplacian L = D - A connects graph structure to linear algebra and is widely used in spectral clustering and network analysis.
- •Choose representations based on density and operations: matrices for quick edge queries and algebraic methods; lists for fast traversal on sparse graphs.
Prerequisites
- →Big-O notation and basic algorithm analysis — To compare adjacency matrix vs list performance and reason about traversal complexities.
- →Arrays, vectors, and 2D arrays in C++ — Graph matrices are stored as 2D arrays; adjacency lists use vectors of vectors.
- →Queues and BFS idea — BFS relies on queue operations to explore layers in unweighted graphs.
- →Basic linear algebra (matrices) — Adjacency and degree matrices, and the Laplacian L = D − A, are matrix concepts.
- →Set vs multiset semantics — To decide whether to allow multiple edges and how to count degrees.
Detailed Explanation
Tap terms for definitions01Overview
Hook: Think of a social network where people are nodes and friendships are links. If you can model who is connected to whom, you can answer questions like “Who can reach whom in two steps?” or “Who is the most connected?” Concept: Graphs give a clean language for such problems. A graph consists of vertices (also called nodes) and edges (connections). There are two common ways to store a graph on a computer: adjacency matrices and adjacency lists. In this lesson we focus on the matrix viewpoint and the closely related degree matrix. Example: In a class of 4 students, if Alice is friends with Bob and Carol, and Bob is also friends with Dave, we can write a 4×4 table where a 1 at row i, column j means student i is friends with student j. Counting 1s across a row tells you how many friends that student has—their degree—and those counts become the diagonal entries of the degree matrix. These basic ingredients power many algorithms, from simple traversals like BFS to advanced tools like spectral clustering via the Laplacian L = D − A. Understanding when to use each representation and how to compute degrees correctly will help you implement graph algorithms efficiently and accurately.
02Intuition & Analogies
Hook: Imagine a city map. Intersections are places you can stand (vertices), and streets are the paths you can take (edges). If you had a huge billboard-sized table listing all pairs of intersections and checked a box when a street connects them, that’s an adjacency matrix. Concept: The adjacency matrix is like a big seating chart for connections: each row is a “from” seat, each column a “to” seat, and a 1 (or a weight) says “these two are linked.” The degree of an intersection is simply how many streets flow out of it—the number of neighbors you can immediately reach. The degree matrix writes those counts on a diagonal, like labeling each row with the number of exits before reading the rest. Example: Consider 5 bus stops. Stop 0 connects to 1, 2; stop 1 connects to 2; stop 3 connects to 4. Draw a 5×5 table; put a 1 wherever two stops are directly connected. Row sums tell you how many direct rides leave each stop. If the roads are one-way, your row sums are out-degrees; column sums are in-degrees. If the roads are two-way, the table is symmetric, because if i connects to j, then j connects to i. Why two representations? If every stop connects to almost every other (a dense network), a full table is fine and makes “is there a road between i and j?” an instant lookup. But if connections are rare (a sparse network), keeping only neighbor lists saves space and speeds up traversals that only follow existing roads. The Laplacian L = D − A acts like a balance sheet: how much can leave a node (degree) minus what’s tied up in specific connections (adjacency), enabling powerful analyses like measuring how tightly knit different regions are.
03Formal Definition
04When to Use
Use an adjacency matrix when your graph is dense (m is close to n^{2}), when you need O(1) edge-existence queries A_{ij}, or when you plan to apply linear-algebraic techniques (e.g., computing L = D − A, eigenvalues for spectral clustering, counting k-length walks via A^{k}). It also simplifies some implementations when n is small to moderate, because memory is predictable and indexing is straightforward. Use an adjacency list when your graph is sparse (m ≪ n^{2}), or when your algorithms iterate over neighbors (BFS, DFS, Dijkstra on sparse graphs). Neighbor iteration from lists is proportional to the actual number of edges encountered, giving O(n + m) traversals. For streaming or incremental graph building, lists are usually preferable because you can append neighbors without touching O(n^{2}) memory. For weighted graphs with mostly absent edges, lists also avoid storing many zeros. For directed graphs, choose representations that support the queries you need: if you frequently need in-neighbors, maintain reverse adjacency lists or compute column sums of A. For educational or debugging purposes, matrices are nice to print and reason about; for performance in large, sparse real-world networks (social graphs, road networks, web graphs), lists are typically the practical choice.
⚠️Common Mistakes
- Off-by-one indexing: Many inputs are 1-based, but C++ vectors are 0-based. Always convert consistently; otherwise you may write outside bounds or misplace edges. Use assertions to catch u, v in [0, n).
- Forgetting symmetry in undirected graphs: If you set A[u][v] = 1 but forget A[v][u] = 1 (or push both ways in lists), degrees and traversals will be wrong.
- Misinterpreting degrees in digraphs: Row sums give out-degrees; column sums give in-degrees. Mixing them leads to incorrect analyses.
- Ignoring self-loops or multiedges: If your graph is simple, do not add u = v or duplicate edges. If multigraphs are allowed, degrees must count multiplicities; matrices may store counts or weights.
- Using an adjacency matrix for very sparse large graphs: O(n^{2}) memory can exceed limits quickly (e.g., n = 10^5 implies 10^{10} entries). Prefer lists for sparsity.
- Confusing 0/1 with weights: In weighted graphs, A_{ij} holds w_{ij}, not just 0/1. Degrees become weighted sums, not counts.
- BFS/DFS pitfalls: Forgetting to mark visited before enqueuing can add duplicates; in a matrix-based BFS, scanning all n columns per row changes complexity to O(n^{2}).
- Integer overflow in degree sums: With large m or weights, use 64-bit integers (long long) to store degrees and totals.
- Not clearing or reinitializing structures between test cases in competitive programming, causing leftover edges to pollute results.
Key Formulas
Handshaking Lemma (Undirected)
Explanation: In an undirected simple graph with n vertices and m edges, adding all vertex degrees counts each edge twice (once for each endpoint). This identity helps verify data structures and degree computations.
Directed Degree Sums
Explanation: In a directed graph, each edge contributes exactly one to an out-degree and one to an in-degree. Therefore, total out-degree equals total in-degree and equals the number of edges.
Edge Bound (Simple Undirected)
Explanation: A simple undirected graph has at most one edge per unordered pair, so the number of edges cannot exceed n choose 2. This sets the maximum density for such graphs.
Degree from Adjacency Matrix
Explanation: For unweighted undirected graphs, the degree of vertex i is the sum across row i of the adjacency matrix. In weighted graphs, this becomes the weighted degree (sum of weights).
Symmetry of Undirected Graphs
Explanation: The adjacency matrix of an undirected graph is symmetric because if i is connected to j, then j is connected to i. This property is frequently used to halve storage or checks.
Degree Matrix Definition
Explanation: The degree matrix places each vertex’s degree on the diagonal and zeros elsewhere. It summarizes local connectivity per vertex.
Graph Laplacian
Explanation: The combinatorial Laplacian subtracts adjacency from degree, capturing flow-like differences. Its spectrum is central in clustering, partitioning, and measuring connectivity.
Walk Counting via Powers
Explanation: Raising the adjacency matrix to power k and reading entry (i, j) counts the number of k-step walks between i and j in unweighted graphs. This connects combinatorics with linear algebra.
Storage Complexity
Explanation: Adjacency matrices store an entry for every potential pair of vertices, while lists store only actual edges. This guides representation choice based on sparsity.
BFS Complexity by Representation
Explanation: BFS using adjacency lists visits each vertex and edge once, giving O(n + m). Using an adjacency matrix forces scanning all n entries per row, leading to O().
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // This program reads an undirected, unweighted graph and builds: 5 // - Adjacency matrix A (n x n) 6 // - Degree matrix D (n x n, diagonal) 7 // - Laplacian L = D - A 8 // Input format: 9 // n m 10 // m lines of edges: u v (0-based indices) 11 // If your input is 1-based, subtract 1 from u and v when reading. 12 13 int main() { 14 ios::sync_with_stdio(false); 15 cin.tie(nullptr); 16 17 int n, m; 18 if (!(cin >> n >> m)) return 0; 19 20 // Initialize n x n matrix with zeros (O(n^2) space and time to zero it) 21 vector<vector<int>> A(n, vector<int>(n, 0)); 22 23 for (int e = 0; e < m; ++e) { 24 int u, v; cin >> u >> v; 25 if (u < 0 || u >= n || v < 0 || v >= n) { 26 cerr << "Edge out of range: (" << u << "," << v << ")\n"; 27 return 0; 28 } 29 if (u == v) { 30 // Simple graphs often forbid self-loops; skip or handle as needed 31 // Here we allow it: it adds 1 on the diagonal 32 A[u][v] = 1; // For multigraphs, consider incrementing instead of setting 33 } else { 34 A[u][v] = 1; 35 A[v][u] = 1; // ensure symmetry for undirected graphs 36 } 37 } 38 39 // Compute degrees as row sums 40 vector<long long> deg(n, 0); 41 for (int i = 0; i < n; ++i) { 42 long long s = 0; 43 for (int j = 0; j < n; ++j) s += A[i][j]; 44 deg[i] = s; 45 } 46 47 // Build Degree matrix D and Laplacian L = D - A 48 vector<vector<long long>> D(n, vector<long long>(n, 0)); 49 vector<vector<long long>> L(n, vector<long long>(n, 0)); 50 for (int i = 0; i < n; ++i) { 51 D[i][i] = deg[i]; 52 } 53 for (int i = 0; i < n; ++i) { 54 for (int j = 0; j < n; ++j) { 55 L[i][j] = (i == j ? D[i][i] : 0) - A[i][j]; 56 } 57 } 58 59 // Optional: verify Handshaking Lemma (for undirected without self-loops) 60 long long sum_deg = 0; 61 for (int i = 0; i < n; ++i) sum_deg += deg[i]; 62 cerr << "Sum of degrees = " << sum_deg << " (should be 2*m if no self-loops)\n"; 63 64 // Print matrices 65 cout << "Adjacency matrix A:\n"; 66 for (int i = 0; i < n; ++i) { 67 for (int j = 0; j < n; ++j) cout << A[i][j] << (j+1==n?'\n':' '); 68 } 69 70 cout << "Degree matrix D:\n"; 71 for (int i = 0; i < n; ++i) { 72 for (int j = 0; j < n; ++j) cout << D[i][j] << (j+1==n?'\n':' '); 73 } 74 75 cout << "Laplacian L = D - A:\n"; 76 for (int i = 0; i < n; ++i) { 77 for (int j = 0; j < n; ++j) cout << L[i][j] << (j+1==n?'\n':' '); 78 } 79 80 return 0; 81 } 82
We read an undirected graph, build its adjacency matrix A, compute vertex degrees as row sums, assemble the diagonal degree matrix D, and derive the Laplacian L = D − A. We also print a quick diagnostic (sum of degrees) to check against 2m when the graph has no self-loops. This demonstrates how matrix representations summarize connectivity and how D and L are constructed mechanically from A.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Input format: 5 // n m\n 6 // m lines: u v (0-based), undirected 7 // s (source vertex) 8 // Output: distance from s to every vertex (−1 if unreachable) 9 10 int main() { 11 ios::sync_with_stdio(false); 12 cin.tie(nullptr); 13 14 int n, m; if (!(cin >> n >> m)) return 0; 15 vector<vector<int>> adj(n); 16 for (int e = 0; e < m; ++e) { 17 int u, v; cin >> u >> v; 18 if (u < 0 || u >= n || v < 0 || v >= n) return 0; 19 adj[u].push_back(v); 20 adj[v].push_back(u); // undirected 21 } 22 23 int s; cin >> s; 24 if (s < 0 || s >= n) return 0; 25 26 const int INF = -1; 27 vector<int> dist(n, INF); 28 queue<int> q; 29 30 // Initialize 31 dist[s] = 0; 32 q.push(s); 33 34 while (!q.empty()) { 35 int u = q.front(); q.pop(); 36 for (int v : adj[u]) { 37 if (dist[v] == INF) { // not visited yet 38 dist[v] = dist[u] + 1; // one more edge 39 q.push(v); 40 } 41 } 42 } 43 44 // Output distances 45 for (int i = 0; i < n; ++i) { 46 cout << dist[i] << (i+1==n?'\n':' '); 47 } 48 return 0; 49 } 50
This is a standard BFS using adjacency lists. It computes, for an unweighted undirected graph, the shortest path distance (in number of edges) from a source vertex s to every vertex. Using lists ensures we only traverse existing edges, yielding O(n + m) time on sparse graphs.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Input format: 5 // n m\n 6 // m lines: u v (0-based), undirected 7 // s (source) 8 // Output: distance from s to every vertex (−1 if unreachable) 9 10 int main() { 11 ios::sync_with_stdio(false); 12 cin.tie(nullptr); 13 14 int n, m; if (!(cin >> n >> m)) return 0; 15 vector<vector<int>> A(n, vector<int>(n, 0)); 16 for (int e = 0; e < m; ++e) { 17 int u, v; cin >> u >> v; 18 if (u < 0 || u >= n || v < 0 || v >= n) return 0; 19 A[u][v] = 1; 20 A[v][u] = 1; // undirected 21 } 22 int s; cin >> s; if (s < 0 || s >= n) return 0; 23 24 const int INF = -1; 25 vector<int> dist(n, INF); 26 queue<int> q; 27 28 dist[s] = 0; q.push(s); 29 30 while (!q.empty()) { 31 int u = q.front(); q.pop(); 32 // Scan entire row u to find neighbors (costly when sparse) 33 for (int v = 0; v < n; ++v) { 34 if (A[u][v] && dist[v] == INF) { 35 dist[v] = dist[u] + 1; 36 q.push(v); 37 } 38 } 39 } 40 41 for (int i = 0; i < n; ++i) { 42 cout << dist[i] << (i+1==n?'\n':' '); 43 } 44 return 0; 45 } 46
This BFS uses the adjacency matrix. For each dequeued vertex u, it scans the whole row A[u][·] to find neighbors, which costs O(n) per vertex. This highlights the practical difference in performance between matrix and list representations on sparse graphs.