Spectral Graph Theory
Key Points
- •Spectral graph theory studies graphs through the eigenvalues and eigenvectors of associated matrices such as the adjacency matrix A, the combinatorial Laplacian L, and the normalized Laplacian L_norm.
- •The second smallest eigenvalue λ2 of the Laplacian (the Fiedler value) quantifies how well a graph is connected and guides graph partitioning.
- •Cheeger’s inequality links λ2 to the quality of the best graph cut, explaining why spectral methods find good clusters.
- •The normalized Laplacian L_norm = I − D^{-1/2} A D^{-1/2} removes degree scaling and is preferred for graphs with irregular degrees.
- •Spectral clustering embeds nodes using Laplacian eigenvectors and then applies k-means to get communities.
- •Random walks on graphs have stationary distribution proportional to node degrees; the spectral gap controls mixing time.
- •Graph neural networks often approximate spectral filters defined by Laplacian eigen-decompositions using fast polynomial filters.
- •In practice, iterative methods that multiply by sparse matrices yield scalable spectral algorithms with O(m) work per iteration.
Prerequisites
- →Linear algebra (eigenvalues, eigenvectors, orthogonality) — Spectral methods rely on eigen-decompositions, Rayleigh quotients, and orthogonalization.
- →Graph theory basics — Understanding adjacency, degree, paths, and cuts is necessary to interpret Laplacians and partitions.
- →Probability and Markov chains — Random walks, stationary distributions, and mixing times connect to spectral gaps.
- →Optimization and k-means clustering — Spectral clustering uses k-means on eigenvector embeddings.
- →Numerical methods — Iterative eigen-solvers and convergence criteria are key for scalable implementations.
Detailed Explanation
01 Overview
Hook → If you could listen to a graph, what would it sound like? Spectral graph theory answers this by turning a graph into a musical instrument whose notes are its eigenvalues and whose harmonics are eigenvectors. These frequencies reveal connectivity, bottlenecks, and cluster structure that are hard to see directly from edges.
Concept → We represent a graph with matrices: the adjacency matrix A, the degree matrix D, the combinatorial Laplacian L = D − A, and the normalized Laplacian L_norm = I − D^{-1/2} A D^{-1/2}. The eigenvalues and eigenvectors of these matrices encode structural properties. For instance, the second smallest eigenvalue of L (the Fiedler value) measures how well-connected the graph is; its eigenvector guides us to split the graph into two balanced pieces with few crossing edges.
Example → In spectral clustering, we compute a few of the smallest eigenvectors of L_norm (or equivalently the largest eigenvectors of D^{-1/2} A D^{-1/2}), embed nodes as points in low-dimensional space, and then run k-means. This approach underlies community detection in networks, image segmentation in vision, and feature learning in graph neural networks.
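To make these matrix definitions concrete, here is a minimal C++ sketch (not one of the worked examples below; the path graph and variable names are illustrative choices) that builds A, D, L = D − A, and L_norm = I − D^{-1/2} A D^{-1/2} for a 4-node path and prints both Laplacians.

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch: build A, D, L = D - A, and L_norm = I - D^{-1/2} A D^{-1/2}
// for the 4-node path graph 0-1-2-3 (dense matrices, illustration only).
int main() {
    const int n = 4;
    vector<pair<int,int>> edges = {{0,1},{1,2},{2,3}};
    vector<vector<double>> A(n, vector<double>(n, 0.0));
    vector<double> d(n, 0.0);
    for (auto [u, v] : edges) { A[u][v] = A[v][u] = 1.0; d[u] += 1.0; d[v] += 1.0; }

    vector<vector<double>> L(n, vector<double>(n, 0.0)), Lnorm(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            L[i][j] = (i == j ? d[i] : 0.0) - A[i][j];
            double scale = (d[i] > 0 && d[j] > 0) ? 1.0 / sqrt(d[i] * d[j]) : 0.0;
            Lnorm[i][j] = (i == j ? 1.0 : 0.0) - scale * A[i][j];
        }

    cout << fixed << setprecision(3);
    cout << "L = D - A:\n";
    for (auto &row : L) { for (double x : row) cout << setw(7) << x; cout << "\n"; }
    cout << "L_norm = I - D^{-1/2} A D^{-1/2}:\n";
    for (auto &row : Lnorm) { for (double x : row) cout << setw(7) << x; cout << "\n"; }
    return 0;
}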
02 Intuition & Analogies
Hook → Imagine a network of masses and springs: nodes are masses, edges are springs. When you pluck this system, it vibrates in certain modes at certain frequencies. Low-frequency modes deform the system smoothly, revealing large-scale structure; high-frequency modes wiggle locally.
Concept → Graph Laplacians are the mathematical version of this mass–spring system. The Laplacian energy x^T L x measures how much the signal x changes across edges. Eigenvectors minimize this energy subject to constraints; the smallest-energy nontrivial eigenvector (Fiedler vector) is the smoothest way to vary over the graph while not being constant. Where this vector changes sign is where the graph “wants to split,” revealing a natural cut.
Example → Suppose a social network has two friend groups with a few cross-group ties. If you color each node by the value of the Fiedler vector, nodes in the same community have similar colors, and there’s a sharp transition across the sparse bridge. Sorting nodes by this value and cutting at the transition yields a clean partition. For k communities, multiple low-frequency eigenvectors serve as coordinates; points from the same community cluster together, and k-means separates them.
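The claim that x^T L x measures how much a signal changes across edges can be checked directly. The short sketch below (an illustrative addition; the graph and the random signal are arbitrary) evaluates both sides of the identity x^T L x = Σ_{(u,v)∈E} (x_u − x_v)^2 and prints matching values.

#include <bits/stdc++.h>
using namespace std;

// Sketch: check the identity x^T L x = sum over edges (x_u - x_v)^2
// on a small arbitrary graph with a random signal x. Illustration only.
int main() {
    const int n = 5;
    vector<pair<int,int>> edges = {{0,1},{1,2},{2,3},{3,4},{4,0},{1,3}};

    // Random signal on the nodes
    mt19937 rng(1);
    normal_distribution<double> nd(0.0, 1.0);
    vector<double> x(n);
    for (double &xi : x) xi = nd(rng);

    // Right-hand side: sum of squared differences across edges
    double edge_energy = 0.0;
    vector<double> deg(n, 0.0);
    for (auto [u, v] : edges) {
        double diff = x[u] - x[v];
        edge_energy += diff * diff;
        deg[u] += 1.0; deg[v] += 1.0;
    }

    // Left-hand side: x^T L x with L = D - A, computed as
    // sum_i d_i x_i^2 - 2 * sum_{(u,v) in E} x_u x_v
    double quad = 0.0;
    for (int i = 0; i < n; ++i) quad += deg[i] * x[i] * x[i];
    for (auto [u, v] : edges) quad -= 2.0 * x[u] * x[v];

    cout << fixed << setprecision(6);
    cout << "x^T L x           = " << quad << "\n";
    cout << "sum (x_u - x_v)^2 = " << edge_energy << "\n"; // the two should match
    return 0;
}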
03 Formal Definition
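Concept → Let G = (V, E) be an undirected graph with n = |V| nodes, m = |E| edges, adjacency matrix A, degree d_i for node i, and degree matrix D = diag(d_1, …, d_n). The combinatorial Laplacian is L = D − A and the normalized Laplacian is L_norm = I − D^{-1/2} A D^{-1/2} = D^{-1/2} L D^{-1/2}. Because x^T L x = Σ_{(u,v)∈E} (x_u − x_v)^2 ≥ 0, both Laplacians are symmetric positive semi-definite; the eigenvalues of L satisfy 0 = λ1 ≤ λ2 ≤ … ≤ λn with L·1 = 0, the eigenvalues of L_norm lie in [0, 2], and the multiplicity of the eigenvalue 0 equals the number of connected components. The algebraic connectivity is λ2 = min over x ⊥ 1, x ≠ 0 of (x^T L x)/(x^T x), and its minimizer is the Fiedler vector. Example → For the path graph 1–2–3, the degrees are (1, 2, 1), so L has rows (1, −1, 0), (−1, 2, −1), (0, −1, 1); its eigenvalues are 0, 1, and 3, and the Fiedler vector (−1, 0, 1), with eigenvalue λ2 = 1, varies smoothly from one end of the path to the other.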
04 When to Use
Hook → If you need the big picture of a large, noisy graph—Who belongs with whom? Where are the bottlenecks?—spectral methods are a strong first choice.
Concept → Use spectral graph theory when you want global structure from local edges. It excels for: (1) clustering/community detection via spectral embeddings (e.g., normalized cut minimization), (2) graph partitioning for load balancing in parallel computing, (3) semi-supervised learning on graphs (label propagation smoothness priors), (4) analyzing random walks and Markov chains (mixing time, PageRank variants), and (5) graph neural networks where filters are functions of L (e.g., Chebyshev or GCN filters).
Example → In image segmentation, build a pixel graph with weights based on color/position similarity, compute the first few eigenvectors of L_norm, embed pixels into 2–10 dimensions, and run k-means to separate regions. In recommendation systems, user–item bipartite graphs can be embedded via spectral methods to reveal communities. For GNNs on citation networks, approximate spectral filters g(L) with polynomials in L to get localized convolutions that are efficient and stable.
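To make the image-segmentation recipe concrete, here is a hedged sketch of just the graph-construction step (the spectral embedding and k-means steps appear in the Code Examples section); the feature vectors, the kernel width sigma, and the sparsification threshold are illustrative assumptions rather than values from a specific dataset.

#include <bits/stdc++.h>
using namespace std;

// Hypothetical sketch of the similarity-graph construction used in spectral
// segmentation: connect items whose features are close and weight edges with
// a Gaussian kernel w_ij = exp(-||f_i - f_j||^2 / sigma^2).
struct WeightedEdge { int u, v; double w; };

vector<WeightedEdge> build_similarity_graph(const vector<vector<double>> &features,
                                             double sigma, double min_weight) {
    int n = (int)features.size();
    vector<WeightedEdge> edges;
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            double dist2 = 0.0;
            for (size_t t = 0; t < features[i].size(); ++t) {
                double diff = features[i][t] - features[j][t];
                dist2 += diff * diff;
            }
            double w = exp(-dist2 / (sigma * sigma));
            if (w >= min_weight) edges.push_back({i, j, w}); // sparsify weak ties
        }
    }
    return edges;
}

int main() {
    // Toy "pixels": (intensity, x, y) for two bright and two dark points
    vector<vector<double>> features = {
        {0.9, 0.0, 0.0}, {0.8, 0.1, 0.0}, {0.1, 1.0, 1.0}, {0.2, 1.1, 1.0}};
    auto edges = build_similarity_graph(features, /*sigma=*/0.5, /*min_weight=*/1e-3);
    cout << fixed << setprecision(4);
    for (auto &e : edges) cout << e.u << " -- " << e.v << " : " << e.w << "\n";
    return 0;
}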
⚠️ Common Mistakes
Hook → Spectral tools are powerful, but subtle choices can break them or hide their benefits.
Concept → Frequent pitfalls include: (1) using the unnormalized Laplacian on graphs with highly variable degrees, which biases partitions; (2) forgetting to remove isolated nodes or handling them incorrectly; (3) extracting the wrong eigenvector (confusing the top eigenvector of S with the second one, or not deflating the trivial eigenvector); (4) interpreting eigen-gaps without checking for near-multiplicities; (5) running k-means on unnormalized eigenvector rows, which can distort clusters; and (6) treating numerical eigen-solvers as black boxes without monitoring convergence.
Example → If you compute the largest eigenvector of S = D^{-1/2} A D^{-1/2} and cluster by its sign, you’ll often get a trivial split aligned with degree because the top eigenvector is proportional to D^{1/2} 1. The fix is to project out this trivial direction and use the next eigenvector(s). Similarly, when doing spectral clustering, always row-normalize the eigenvector matrix before k-means (Ng–Jordan–Weiss variant), and for graphs with isolated nodes, either remove them or assign them as singleton clusters to avoid numerical issues.
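Both fixes are used inside the full programs in the Code Examples section; the minimal versions below isolate them for emphasis (the function names and the toy inputs are illustrative).

#include <bits/stdc++.h>
using namespace std;

// Fix for pitfall (3): remove the trivial direction u = D^{1/2} 1 from x,
// so power iteration on S converges to the second eigenvector, not the first.
void project_out_trivial(vector<double> &x, const vector<double> &sqrt_deg) {
    double xu = 0.0, uu = 0.0;
    for (size_t i = 0; i < x.size(); ++i) { xu += x[i] * sqrt_deg[i]; uu += sqrt_deg[i] * sqrt_deg[i]; }
    if (uu <= 0.0) return;
    double alpha = xu / uu;
    for (size_t i = 0; i < x.size(); ++i) x[i] -= alpha * sqrt_deg[i];
}

// Fix for pitfall (5): row-normalize the n x k eigenvector matrix before
// k-means (Ng–Jordan–Weiss), so clusters live on the unit sphere rather than
// being stretched by per-node scaling.
void row_normalize_rows(vector<vector<double>> &rows) {
    for (auto &r : rows) {
        double nrm = 0.0;
        for (double v : r) nrm += v * v;
        nrm = sqrt(max(nrm, 1e-18));
        for (double &v : r) v /= nrm;
    }
}

int main() {
    vector<double> x = {1.0, 2.0, 3.0}, sqrt_deg = {1.0, 1.0, 1.0};
    project_out_trivial(x, sqrt_deg);              // x becomes (-1, 0, 1): orthogonal to u
    vector<vector<double>> rows = {{3.0, 4.0}, {0.0, 2.0}};
    row_normalize_rows(rows);                      // each row now has unit length
    cout << x[0] << " " << x[1] << " " << x[2] << "\n";
    cout << rows[0][0] << " " << rows[0][1] << "\n";
    return 0;
}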
Key Formulas
Combinatorial Laplacian: L = D − A
Explanation: Defines the Laplacian from degree and adjacency. It penalizes differences across edges and is central to spectral analysis.
Normalized Laplacian: L_norm = I − D^{-1/2} A D^{-1/2}
Explanation: Removes degree scaling, making spectral quantities comparable across nodes with different degrees. Preferred for clustering.
Laplacian eigenvalue ordering: 0 = λ1 ≤ λ2 ≤ … ≤ λn
Explanation: L is positive semi-definite, so eigenvalues are nonnegative and ordered. The first eigenvalue is zero, with the all-ones vector 1 as its eigenvector.
Rayleigh quotient: R(M, x) = (x^T M x) / (x^T x)
Explanation: Measures how x aligns with M’s eigenvectors. Minimizing or maximizing this over constraints yields eigenvalues.
Fiedler value (variational): λ2 = min over x ⊥ 1, x ≠ 0 of (x^T L x) / (x^T x)
Explanation: The algebraic connectivity is the smallest nontrivial Laplacian Rayleigh quotient. Its minimizer is the Fiedler vector.
Symmetric normalization: S = D^{-1/2} A D^{-1/2}, so L_norm = I − S
Explanation: Relates the normalized Laplacian to the symmetric, stochastic-like matrix S. Dominant eigenvectors of S correspond to the smallest eigenvectors of L_norm.
Cheeger inequality (graph): λ2 / 2 ≤ φ(G) ≤ sqrt(2 λ2)
Explanation: Links the best conductance (the Cheeger constant φ(G)) to the eigenvalue λ2 of the normalized Laplacian. A small λ2 implies a sparse cut exists, and vice versa up to constants.
Stationary distribution: π_i = d_i / (2m)
Explanation: For an undirected graph’s random walk, the stationary probability of node i is proportional to its degree (a numerical check on a toy graph appears after this list).
Mixing by spectral gap: ||p_t − π|| ≤ C (1 − γ)^t
Explanation: Convergence to stationarity is geometric with rate controlled by the spectral gap γ of P (or S); the constant C depends on the starting distribution. Larger gaps mean faster mixing.
Normalized cut objective: Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B)
Explanation: Measures the fraction of edges leaving each side relative to its volume. Minimization is relaxed by eigenvectors.
Spectral filtering: g(L) x = U g(Λ) U^T x
Explanation: Applying a graph filter g(L) corresponds to scaling eigencomponents by g on eigenvalues. Used in spectral GNNs.
Relaxed k-cut: min over Y ∈ R^{n×k} with Y^T Y = I of tr(Y^T L_norm Y)
Explanation: The optimal Y spans the k smallest eigenvectors of L_norm. Row-normalizing Y and running k-means yields spectral clustering.
Complexity Analysis
Each multiplication by A or S = D^{-1/2} A D^{-1/2} touches every edge once, so one power-iteration step costs O(n + m) time, and the adjacency-list representation needs O(n + m) memory. Block power iteration for k eigenvectors costs O(k(n + m)) per iteration for the sparse multiplies plus O(nk^2) for re-orthonormalization, with the number of iterations governed by the eigen-gap. The k-means step on the n × k embedding adds O(nk^2) per iteration. By contrast, a dense eigendecomposition costs O(n^3) time and O(n^2) memory, which is why iterative sparse methods are preferred at scale.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Undirected, unweighted graph stored as adjacency list
struct Graph {
    int n;                        // number of nodes
    vector<vector<int>> adj;      // adjacency list
    vector<double> deg;           // degrees
    Graph(int n_) : n(n_), adj(n_), deg(n_, 0.0) {}
    void add_edge(int u, int v) {
        if (u == v) return;       // ignore self-loops for simplicity
        adj[u].push_back(v);
        adj[v].push_back(u);
        deg[u] += 1.0;
        deg[v] += 1.0;
    }
};

// Multiply y = S x where S = D^{-1/2} A D^{-1/2} using the adjacency list
static inline void multiply_S(const Graph &G, const vector<double> &x, vector<double> &y) {
    int n = G.n;
    fill(y.begin(), y.end(), 0.0);
    for (int u = 0; u < n; ++u) {
        double du = G.deg[u];
        if (du == 0.0) continue;  // isolated node contributes nothing
        double scale_u = 1.0 / sqrt(du);
        for (int v : G.adj[u]) {
            double dv = G.deg[v];
            if (dv == 0.0) continue;
            double w = scale_u * (1.0 / sqrt(dv));  // 1/sqrt(d_u d_v)
            y[u] += w * x[v];
        }
    }
}

static inline double dot(const vector<double> &a, const vector<double> &b) {
    double s = 0.0; int n = (int)a.size();
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

static inline double norm2(const vector<double> &a) {
    return sqrt(max(0.0, dot(a, a)));
}

// Project x to be orthogonal to u: x <- x - (x.u / u.u) u
static inline void project_out(vector<double> &x, const vector<double> &u) {
    double uu = max(1e-18, dot(u, u));
    double alpha = dot(x, u) / uu;
    for (size_t i = 0; i < x.size(); ++i) x[i] -= alpha * u[i];
}

// Compute the second largest eigenpair (mu2, v2) of S by deflating the trivial eigenvector u = sqrt(deg).
// Returns {mu2, lambda2} where lambda2(L_norm) = 1 - mu2, and fills v2 (unit norm).
pair<double,double> second_eigen_S(const Graph &G, vector<double> &v2, int iters = 200, double tol = 1e-9) {
    int n = G.n;
    // Trivial eigenvector for S is u = sqrt(deg)
    vector<double> u(n);
    for (int i = 0; i < n; ++i) u[i] = sqrt(G.deg[i]);

    // Initialize v2 randomly and orthogonalize against u
    mt19937 rng(42);
    normal_distribution<double> nd(0.0, 1.0);
    v2.assign(n, 0.0);
    for (int i = 0; i < n; ++i) v2[i] = nd(rng);
    project_out(v2, u);
    double nrm = norm2(v2);
    if (nrm < 1e-12) {
        // If all nodes are isolated, return trivial values
        return {0.0, 1.0};
    }
    for (int i = 0; i < n; ++i) v2[i] /= nrm;

    vector<double> y(n, 0.0);
    double mu_old = 0.0;
    for (int it = 0; it < iters; ++it) {
        multiply_S(G, v2, y);
        // Deflate: ensure orthogonality to u (remove the trivial component)
        project_out(y, u);
        double nrm_y = norm2(y);
        if (nrm_y < 1e-14) break;
        for (int i = 0; i < n; ++i) v2[i] = y[i] / nrm_y;
        // Rayleigh quotient as the eigenvalue estimate
        multiply_S(G, v2, y);
        double mu = dot(v2, y);
        if (fabs(mu - mu_old) < tol * max(1.0, fabs(mu_old))) {
            mu_old = mu;
            break;
        }
        mu_old = mu;
    }
    double mu2 = mu_old;            // second largest eigenvalue of S
    double lambda2 = 1.0 - mu2;     // second smallest eigenvalue of L_norm
    return {mu2, lambda2};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: two 5-node cliques connected by a single bridge (barbell-like)
    int n = 10; Graph G(n);
    // Clique 0..4
    for (int i = 0; i < 5; ++i) for (int j = i+1; j < 5; ++j) G.add_edge(i, j);
    // Clique 5..9
    for (int i = 5; i < 10; ++i) for (int j = i+1; j < 10; ++j) G.add_edge(i, j);
    // Weak bridge
    G.add_edge(4, 5);

    vector<double> v2;
    auto [mu2, lambda2] = second_eigen_S(G, v2, 500, 1e-10);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Estimated mu2 (S): " << mu2 << "\n";
    cout << "Estimated lambda2 (L_norm): " << lambda2 << "\n";

    // 2-way spectral cut by sign (median threshold for robustness)
    vector<int> idx(n); iota(idx.begin(), idx.end(), 0);
    vector<double> v2_copy = v2;
    nth_element(idx.begin(), idx.begin()+n/2, idx.end(), [&](int a, int b){ return v2_copy[a] < v2_copy[b]; });
    double median = v2_copy[idx[n/2]];

    vector<int> part(n, 0);
    for (int i = 0; i < n; ++i) part[i] = (v2[i] >= median) ? 1 : 0;

    cout << "Partition (node:part):\n";
    for (int i = 0; i < n; ++i) cout << i << ":" << part[i] << (i+1==n ? '\n' : ' ');

    return 0;
}
We construct an undirected graph, implement a sparse multiply by S = D^{-1/2} A D^{-1/2}, and perform power iteration while projecting out the trivial eigenvector u = sqrt(deg). The iteration converges to the second largest eigenvector of S, which corresponds to the Fiedler vector of L_norm. We estimate the eigenvalue with the Rayleigh quotient and then cut the graph into two parts by thresholding the eigenvector at its median.
#include <bits/stdc++.h>
using namespace std;

struct Graph {
    int n; vector<vector<int>> adj; vector<double> deg;
    Graph(int n_) : n(n_), adj(n_), deg(n_, 0.0) {}
    void add_edge(int u, int v) {
        if (u == v) return;
        adj[u].push_back(v); adj[v].push_back(u);
        deg[u] += 1.0; deg[v] += 1.0;
    }
};

static inline void multiply_S(const Graph &G, const vector<double> &x, vector<double> &y) {
    int n = G.n; fill(y.begin(), y.end(), 0.0);
    for (int u = 0; u < n; ++u) {
        double du = G.deg[u]; if (du == 0.0) continue;
        double su = 1.0 / sqrt(du);
        for (int v : G.adj[u]) {
            double dv = G.deg[v]; if (dv == 0.0) continue;
            y[u] += su * (1.0 / sqrt(dv)) * x[v];
        }
    }
}

// Orthonormalize columns of X using modified Gram-Schmidt
static void mgs_orthonormalize(vector<vector<double>> &X) {
    int n = (int)X.size(); int k = (int)X[0].size();
    for (int j = 0; j < k; ++j) {
        // Subtract projections on previous columns
        for (int i = 0; i < j; ++i) {
            double dot_ij = 0.0;
            for (int r = 0; r < n; ++r) dot_ij += X[r][j] * X[r][i];
            for (int r = 0; r < n; ++r) X[r][j] -= dot_ij * X[r][i];
        }
        double nrm = 0.0;
        for (int r = 0; r < n; ++r) nrm += X[r][j] * X[r][j];
        nrm = sqrt(max(1e-18, nrm));
        for (int r = 0; r < n; ++r) X[r][j] /= nrm;
    }
}

// Project each column of X to be orthogonal to u
static void deflate_against(vector<vector<double>> &X, const vector<double> &u) {
    double uu = 0.0; for (double val : u) uu += val * val; uu = max(1e-18, uu);
    int n = (int)X.size(); int k = (int)X[0].size();
    for (int j = 0; j < k; ++j) {
        double alpha = 0.0;
        for (int r = 0; r < n; ++r) alpha += X[r][j] * u[r];
        alpha /= uu;
        for (int r = 0; r < n; ++r) X[r][j] -= alpha * u[r];
    }
}

// Block power iteration to get k dominant eigenvectors of S, deflating the trivial one
vector<vector<double>> topk_eigenvectors_S(const Graph &G, int k, int iters = 200) {
    int n = G.n; k = max(1, k);
    vector<double> u(n);
    for (int i = 0; i < n; ++i) u[i] = sqrt(G.deg[i]);
    // Initialize X with Gaussian random entries
    mt19937 rng(123);
    normal_distribution<double> nd(0.0, 1.0);
    vector<vector<double>> X(n, vector<double>(k));
    for (int i = 0; i < n; ++i) for (int j = 0; j < k; ++j) X[i][j] = nd(rng);
    deflate_against(X, u);
    mgs_orthonormalize(X);

    vector<vector<double>> Y(n, vector<double>(k, 0.0));
    vector<double> xcol(n), ycol(n);
    for (int it = 0; it < iters; ++it) {
        // Y = S X (column-by-column multiply)
        for (int j = 0; j < k; ++j) {
            for (int r = 0; r < n; ++r) xcol[r] = X[r][j];
            fill(ycol.begin(), ycol.end(), 0.0);
            multiply_S(G, xcol, ycol);
            for (int r = 0; r < n; ++r) Y[r][j] = ycol[r];
        }
        deflate_against(Y, u);
        mgs_orthonormalize(Y);
        X.swap(Y);
    }
    return X; // columns approximate the top-k eigenvectors (excluding the trivial one)
}

// Row-normalize an n x k matrix
static void row_normalize(vector<vector<double>> &M) {
    int n = (int)M.size(); int k = (int)M[0].size();
    for (int i = 0; i < n; ++i) {
        double nrm = 0.0;
        for (int j = 0; j < k; ++j) nrm += M[i][j] * M[i][j];
        nrm = sqrt(max(1e-18, nrm));
        for (int j = 0; j < k; ++j) M[i][j] /= nrm;
    }
}

// Simple k-means on rows of data (n x d)
vector<int> kmeans(const vector<vector<double>> &data, int k, int iters = 50, int seed = 7) {
    int n = (int)data.size(); int d = (int)data[0].size();
    mt19937 rng(seed); uniform_int_distribution<int> uni(0, n-1);
    vector<vector<double>> cent(k, vector<double>(d, 0.0));
    for (int j = 0; j < k; ++j) cent[j] = data[uni(rng)];
    vector<int> label(n, 0);
    for (int it = 0; it < iters; ++it) {
        bool changed = false;
        // Assign each point to its nearest centroid
        for (int i = 0; i < n; ++i) {
            int best = 0; double bestd = numeric_limits<double>::infinity();
            for (int j = 0; j < k; ++j) {
                double d2 = 0.0;
                for (int t = 0; t < d; ++t) { double diff = data[i][t] - cent[j][t]; d2 += diff * diff; }
                if (d2 < bestd) { bestd = d2; best = j; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        // Update centroids
        vector<vector<double>> newc(k, vector<double>(d, 0.0));
        vector<int> cnt(k, 0);
        for (int i = 0; i < n; ++i) {
            cnt[label[i]]++;
            for (int t = 0; t < d; ++t) newc[label[i]][t] += data[i][t];
        }
        for (int j = 0; j < k; ++j) {
            if (cnt[j] == 0) { newc[j] = data[uni(rng)]; cnt[j] = 1; }
            for (int t = 0; t < d; ++t) newc[j][t] /= cnt[j];
        }
        cent.swap(newc);
        if (!changed) break;
    }
    return label;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Build a toy graph with 3 clusters (two cliques and one ring)
    int n = 18; Graph G(n);
    // Clique 0..5
    for (int i = 0; i < 6; ++i) for (int j = i+1; j < 6; ++j) G.add_edge(i, j);
    // Clique 6..11
    for (int i = 6; i < 12; ++i) for (int j = i+1; j < 12; ++j) G.add_edge(i, j);
    // Ring 12..17
    for (int i = 12; i < 18; ++i) G.add_edge(i, 12 + (i-12+1)%6);
    // Sparse inter-cluster edges
    G.add_edge(2, 7); G.add_edge(10, 14); G.add_edge(4, 12);

    int k = 3;
    auto U = topk_eigenvectors_S(G, k, 200); // n x k; columns are dominant eigenvectors (excluding trivial)

    // Convert to a row-major embedding (n x k)
    int nrows = G.n; int kcols = k;
    vector<vector<double>> embed(nrows, vector<double>(kcols, 0.0));
    for (int i = 0; i < nrows; ++i) for (int j = 0; j < kcols; ++j) embed[i][j] = U[i][j];

    // Row-normalize (Ng–Jordan–Weiss)
    row_normalize(embed);

    // Cluster rows with k-means
    vector<int> labels = kmeans(embed, k, 100, 11);

    cout << "Cluster labels (node:cluster):\n";
    for (int i = 0; i < nrows; ++i) cout << i << ":" << labels[i] << (i+1==nrows ? '\n' : ' ');

    return 0;
}
This program finds k dominant eigenvectors of S = D^{-1/2} A D^{-1/2} via block power iteration while deflating the trivial eigenvector. It row-normalizes the eigenvector matrix to form a spectral embedding and then applies k-means to obtain k clusters. This is the normalized spectral clustering algorithm (Ng–Jordan–Weiss variant) implemented with sparse O(m) multiplies.