📚 Theory · Intermediate

Spectral Graph Theory

Key Points

  • Spectral graph theory studies graphs through the eigenvalues and eigenvectors of matrices such as the adjacency matrix A and the Laplacians L and L_norm.
  • The second smallest eigenvalue of the Laplacian, λ_2 (the Fiedler value), quantifies how well a graph is connected and guides graph partitioning.
  • Cheeger’s inequality links λ_2 to the quality of the best graph cut, explaining why spectral methods find good clusters.
  • The normalized Laplacian L_norm = I − D^{-1/2} A D^{-1/2} is scale-invariant and preferred for graphs with irregular degrees.
  • Spectral clustering embeds nodes using Laplacian eigenvectors and then applies k-means to get communities.
  • Random walks on graphs have stationary distribution proportional to node degrees; the spectral gap controls mixing time.
  • Graph neural networks often approximate spectral filters defined by Laplacian eigen-decompositions using fast polynomial filters.
  • In practice, iterative methods that multiply by sparse matrices yield scalable spectral algorithms with O(m) work per iteration.

Prerequisites

  • Linear algebra (eigenvalues, eigenvectors, orthogonality): Spectral methods rely on eigen-decompositions, Rayleigh quotients, and orthogonalization.
  • Graph theory basics: Understanding adjacency, degree, paths, and cuts is necessary to interpret Laplacians and partitions.
  • Probability and Markov chains: Random walks, stationary distributions, and mixing times connect to spectral gaps.
  • Optimization and k-means clustering: Spectral clustering uses k-means on eigenvector embeddings.
  • Numerical methods: Iterative eigen-solvers and convergence criteria are key for scalable implementations.

Detailed Explanation


01 Overview

Hook → If you could listen to a graph, what would it sound like? Spectral graph theory answers this by turning a graph into a musical instrument whose notes are its eigenvalues and whose harmonics are eigenvectors. These frequencies reveal connectivity, bottlenecks, and cluster structure that are hard to see directly from edges. Concept → We represent a graph with matrices: the adjacency matrix A, the degree matrix D, the combinatorial Laplacian L = D − A, and the normalized Laplacian L_norm = I − D^{-1/2} A D^{-1/2}. The eigenvalues and eigenvectors of these matrices encode structural properties. For instance, the second smallest eigenvalue of L (the Fiedler value) measures how well-connected the graph is; its eigenvector guides us to split the graph into two balanced pieces with few crossing edges. Example → In spectral clustering, we compute a few of the smallest eigenvectors of L_norm (or equivalently the largest eigenvectors of D^{-1/2} A D^{-1/2}), embed nodes as points in low-dimensional space, and then run k-means. This approach underlies community detection in networks, image segmentation in vision, and feature learning in graph neural networks.
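To make the matrix definitions concrete, here is a minimal standalone sketch (not part of the lesson's main examples) that builds A, D, L = D − A, and L_norm for a 3-node path graph and prints them; the tiny graph is chosen only so the matrices are easy to read.

#include <cstdio>
#include <vector>
#include <cmath>

int main() {
    // Tiny path graph 0 - 1 - 2.
    int n = 3;
    std::vector<std::vector<double>> A(n, std::vector<double>(n, 0.0));
    A[0][1] = A[1][0] = 1.0;
    A[1][2] = A[2][1] = 1.0;

    // Degrees d_i = row sums of A; D is the diagonal matrix of degrees.
    std::vector<double> d(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) d[i] += A[i][j];

    auto print = [&](const char *name, const std::vector<std::vector<double>> &M) {
        std::printf("%s:\n", name);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) std::printf("%7.3f ", M[i][j]);
            std::printf("\n");
        }
    };

    // L = D - A and L_norm = I - D^{-1/2} A D^{-1/2}.
    std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0)), Ln(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            L[i][j]  = (i == j ? d[i] : 0.0) - A[i][j];
            Ln[i][j] = (i == j ? 1.0  : 0.0) - A[i][j] / std::sqrt(d[i] * d[j]);
        }
    print("L = D - A", L);
    print("L_norm = I - D^{-1/2} A D^{-1/2}", Ln);
    return 0;
}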

02 Intuition & Analogies

Hook → Imagine a network of masses and springs: nodes are masses, edges are springs. When you pluck this system, it vibrates in certain modes at certain frequencies. Low-frequency modes deform the system smoothly, revealing large-scale structure; high-frequency modes wiggle locally. Concept → Graph Laplacians are the mathematical version of this mass–spring system. The Laplacian energy x^T L x measures how much the signal x changes across edges. Eigenvectors minimize this energy subject to constraints; the smallest-energy nontrivial eigenvector (Fiedler vector) is the smoothest way to vary over the graph while not being constant. Where this vector changes sign is where the graph “wants to split,” revealing a natural cut. Example → Suppose a social network has two friend groups with a few cross-group ties. If you color each node by the value of the Fiedler vector, nodes in the same community have similar colors, and there’s a sharp transition across the sparse bridge. Sorting nodes by this value and cutting at the transition yields a clean partition. For k communities, multiple low-frequency eigenvectors serve as coordinates; points from the same community cluster together, and k-means separates them.
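A tiny sketch of the energy view, assuming an unweighted toy graph of two triangles joined by one bridge edge: the signal that is constant within each group has low Laplacian energy x^T L x (only the bridge disagrees), while a rapidly alternating signal has high energy.

#include <cstdio>
#include <vector>
#include <utility>

// Laplacian energy x^T L x computed directly as the sum of squared
// differences across edges, without forming L explicitly.
double laplacian_energy(const std::vector<std::pair<int,int>> &edges, const std::vector<double> &x) {
    double e = 0.0;
    for (auto [u, v] : edges) {
        double diff = x[u] - x[v];
        e += diff * diff;
    }
    return e;
}

int main() {
    // Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2,3).
    std::vector<std::pair<int,int>> edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};

    std::vector<double> smooth = {-1,-1,-1, 1, 1, 1};   // constant within each group
    std::vector<double> wiggly = {-1, 1,-1, 1,-1, 1};   // alternates across many edges

    std::printf("energy(smooth) = %.1f\n", laplacian_energy(edges, smooth)); // only the bridge differs
    std::printf("energy(wiggly) = %.1f\n", laplacian_energy(edges, wiggly)); // many edges differ
    return 0;
}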

03 Formal Definition

Hook → To formalize the vibration picture, we translate edges into quadratic forms that penalize disagreement across links. Concept → For an undirected graph G = (V, E) with weights w_{uv}, the matrices are: the adjacency matrix A, the degree matrix D, L = D − A, and L_norm = I − D^{-1/2} A D^{-1/2}. L is symmetric positive semi-definite, so it has real eigenvalues 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n. The trivial eigenvector for L is the all-ones vector 1. For L_norm, the trivial eigenvector is D^{1/2} 1 with eigenvalue 0. The Rayleigh quotient R(x) = (x^T L x)/(x^T x) equals the sum of w_{uv} (x_u − x_v)^2 over edges divided by the sum of x_u^2, quantifying smoothness over edges. By the Courant–Fischer theorem, λ_2 = min over x ⊥ 1 of R(x), making the Fiedler vector any minimizer of this constrained problem. Example → The random-walk matrix P = D^{-1} A has largest eigenvalue 1 with stationary vector π proportional to degrees. The symmetric normalization S = D^{-1/2} A D^{-1/2} is similar to P and shares its eigenvalues; L_norm = I − S, so the smallest eigenvalues of L_norm correspond to the largest eigenvalues of S. This equivalence lets us find clusters by computing dominant eigenvectors of S and then deflating the trivial one.
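For reference, the same objects written out compactly in LaTeX (matching the definitions above):

\begin{aligned}
L &= D - A, \qquad L_{\mathrm{norm}} = I - D^{-1/2} A D^{-1/2} = I - S,\\
x^{\top} L x &= \sum_{(u,v)\in E} w_{uv}\,(x_u - x_v)^2, \qquad
\mathcal{R}(x) = \frac{x^{\top} L x}{x^{\top} x},\\
\lambda_2(L) &= \min_{x \perp \mathbf{1},\; x \neq 0} \mathcal{R}(x) \quad \text{(Courant--Fischer)},\\
P &= D^{-1} A, \qquad S = D^{-1/2} A D^{-1/2}, \qquad \pi_i = \frac{d_i}{\sum_j d_j}.
\end{aligned}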

04 When to Use

Hook → If you need the big picture of a large, noisy graph—Who belongs with whom? Where are the bottlenecks?—spectral methods are a strong first choice. Concept → Use spectral graph theory when you want global structure from local edges. It excels for: (1) clustering/community detection via spectral embeddings (e.g., normalized cut minimization), (2) graph partitioning for load balancing in parallel computing, (3) semi-supervised learning on graphs (label propagation smoothness priors), (4) analyzing random walks and Markov chains (mixing time, PageRank variants), and (5) graph neural networks where filters are functions of L (e.g., Chebyshev or GCN filters). Example → In image segmentation, build a pixel graph with weights based on color/position similarity, compute the first few eigenvectors of L_norm, embed pixels into 2–10 dimensions, and run k-means to separate regions. In recommendation systems, user–item bipartite graphs can be embedded via spectral methods to reveal communities. For GNNs on citation networks, approximate spectral filters g(L) with polynomials in L to get localized convolutions that are efficient and stable.
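As a hedged sketch of point (5): a spectral filter on L_norm approximated by a degree-K Chebyshev polynomial and applied with sparse multiplies only. The coefficients theta are illustrative placeholders, and lambda_max = 2 is used as the standard upper bound on the spectrum of L_norm.

#include <cstdio>
#include <cmath>
#include <vector>

// y = L_norm x for an unweighted graph stored as an adjacency list.
// Assumes no isolated nodes, purely to keep the sketch short.
void multiply_Lnorm(const std::vector<std::vector<int>> &adj,
                    const std::vector<double> &deg,
                    const std::vector<double> &x, std::vector<double> &y) {
    int n = (int)adj.size();
    for (int u = 0; u < n; ++u) {
        double s = 0.0;
        for (int v : adj[u]) s += x[v] / std::sqrt(deg[u] * deg[v]);
        y[u] = x[u] - s;   // (I - D^{-1/2} A D^{-1/2}) x
    }
}

// Apply sum_k theta_k T_k(L_scaled) x, where L_scaled = (2/lambda_max) L_norm - I
// maps the spectrum into [-1, 1] and T_k are Chebyshev polynomials.
std::vector<double> chebyshev_filter(const std::vector<std::vector<int>> &adj,
                                     const std::vector<double> &deg,
                                     const std::vector<double> &x,
                                     const std::vector<double> &theta,
                                     double lambda_max = 2.0) {
    int n = (int)x.size();
    auto scaled_mult = [&](const std::vector<double> &v, std::vector<double> &out) {
        multiply_Lnorm(adj, deg, v, out);
        for (int i = 0; i < n; ++i) out[i] = (2.0 / lambda_max) * out[i] - v[i];
    };
    std::vector<double> Tprev = x, Tcur(n), Tnext(n), out(n, 0.0);
    for (int i = 0; i < n; ++i) out[i] += theta[0] * Tprev[i];          // T_0 = x
    if (theta.size() == 1) return out;
    scaled_mult(Tprev, Tcur);                                           // T_1 = L_scaled x
    for (int i = 0; i < n; ++i) out[i] += theta[1] * Tcur[i];
    for (size_t k = 2; k < theta.size(); ++k) {
        scaled_mult(Tcur, Tnext);                                       // L_scaled T_{k-1}
        for (int i = 0; i < n; ++i) Tnext[i] = 2.0 * Tnext[i] - Tprev[i];   // T_k = 2 L_scaled T_{k-1} - T_{k-2}
        for (int i = 0; i < n; ++i) out[i] += theta[k] * Tnext[i];
        Tprev.swap(Tcur);
        Tcur.swap(Tnext);
    }
    return out;
}

int main() {
    // Toy graph: a triangle {0,1,2} with a pendant node 3 attached to node 2.
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
    std::vector<double> deg = {2.0, 2.0, 3.0, 1.0};

    std::vector<double> x = {1.0, 0.0, 0.0, 0.0};   // impulse signal at node 0
    std::vector<double> theta = {0.5, -0.3, 0.1};   // illustrative filter coefficients

    std::vector<double> y = chebyshev_filter(adj, deg, x, theta);
    for (size_t i = 0; i < y.size(); ++i) std::printf("y[%zu] = %.4f\n", i, y[i]);
    return 0;
}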

⚠️ Common Mistakes

Hook → Spectral tools are powerful, but subtle choices can break them or hide their benefits. Concept → Frequent pitfalls include: (1) using the unnormalized Laplacian on graphs with highly variable degrees, which biases partitions; (2) forgetting to remove isolated nodes or handling them incorrectly; (3) extracting the wrong eigenvector (confusing the top eigenvector of S with the second one, or not deflating the trivial eigenvector); (4) interpreting eigen-gaps without checking for near-multiplicities; (5) running k-means on unnormalized eigenvector rows, which can distort clusters; and (6) treating numerical eigen-solvers as black boxes without monitoring convergence. Example → If you compute the largest eigenvector of S = D^{-1/2} A D^{-1/2} and cluster by its sign, you’ll often get a trivial split aligned with degree because the top eigenvector is proportional to D^{1/2} 1. The fix is to project out this trivial direction and use the next eigenvector(s). Similarly, when doing spectral clustering, always row-normalize the eigenvector matrix before k-means (Ng–Jordan–Weiss variant), and for graphs with isolated nodes, either remove them or assign them as singleton clusters to avoid numerical issues.
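A quick numerical check of pitfall (3), using a small star graph: the top eigenvector of S is exactly sqrt(deg), so clustering on it alone reflects only degree; it must be projected out before looking for structure, as the full examples below do.

#include <cstdio>
#include <cmath>
#include <vector>

int main() {
    // Star graph: node 0 connected to nodes 1..4 (degrees 4, 1, 1, 1, 1).
    std::vector<std::vector<int>> adj = {{1, 2, 3, 4}, {0}, {0}, {0}, {0}};
    std::vector<double> deg = {4, 1, 1, 1, 1};
    int n = 5;

    // u = sqrt(deg) satisfies S u = u, i.e., it is the top eigenvector of S with eigenvalue 1.
    std::vector<double> u(n), Su(n, 0.0);
    for (int i = 0; i < n; ++i) u[i] = std::sqrt(deg[i]);
    for (int i = 0; i < n; ++i)
        for (int j : adj[i]) Su[i] += u[j] / std::sqrt(deg[i] * deg[j]);

    for (int i = 0; i < n; ++i)
        std::printf("node %d: (S u)_i = %.3f, u_i = %.3f\n", i, Su[i], u[i]);
    return 0;
}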

Key Formulas

Combinatorial Laplacian

L = D − A

Explanation: Defines the Laplacian from the degree matrix D and adjacency matrix A. It penalizes differences across edges and is central to spectral analysis.

Normalized Laplacian

L_norm = I − D^{-1/2} A D^{-1/2}

Explanation: Removes degree scaling, making spectral quantities comparable across nodes with different degrees. Preferred for clustering.

Laplacian eigenvalue ordering

0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n

Explanation: L is positive semi-definite, so eigenvalues are nonnegative and ordered. The first is zero with eigenvector 1.

Rayleigh quotient

R(x) = (x^T M x) / (x^T x)

Explanation: Measures how x aligns with M’s eigenvectors. Minimizing or maximizing it over constraints yields eigenvalues.

Fiedler value (variational)

λ_2 = min over x ⊥ 1, x ≠ 0 of (x^T L x) / (x^T x)

Explanation: The algebraic connectivity is the smallest nontrivial Laplacian Rayleigh quotient. Its minimizer is the Fiedler vector.

Symmetric normalization

S = D^{-1/2} A D^{-1/2} = I − L_norm

Explanation: Relates the normalized Laplacian to a symmetric stochastic-like matrix S. Dominant eigenvectors of S correspond to the smallest eigenvectors of L_norm.

Cheeger inequality (graph)

λ_2 / 2 ≤ h_G ≤ sqrt(2 λ_2)

Explanation: Links the best conductance (Cheeger constant h_G) to the normalized Laplacian eigenvalue λ_2. A small λ_2 implies a sparse cut exists, and vice versa up to constants.

Stationary distribution

π_i = d_i / Σ_j d_j

Explanation: For an undirected graph’s random walk, the stationary probability of node i is proportional to its degree.

Mixing by spectral gap

dist(t) ≤ C (1 − γ)^t

Explanation: Convergence to stationarity is geometric with rate controlled by the spectral gap γ of P (or S). Larger gaps mean faster mixing.

Normalized cut objective

Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B)

Explanation: Measures the fraction of edges leaving each side relative to its volume. Minimization is relaxed by eigenvectors.
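A minimal sketch (with illustrative names) that evaluates the Ncut objective for a given 2-way assignment directly from an edge list, showing that the split along a sparse bridge scores much lower than an arbitrary one.

#include <cstdio>
#include <vector>
#include <utility>
#include <limits>

// Ncut(A, B) = cut(A,B)/vol(A) + cut(A,B)/vol(B) for an unweighted, undirected edge list.
// side[i] is 0 or 1; vol(X) counts edge endpoints in X; cut counts edges crossing the split.
double ncut(const std::vector<std::pair<int,int>> &edges, const std::vector<int> &side) {
    double cut = 0.0, vol[2] = {0.0, 0.0};
    for (auto [u, v] : edges) {
        vol[side[u]] += 1.0;
        vol[side[v]] += 1.0;
        if (side[u] != side[v]) cut += 1.0;
    }
    if (vol[0] == 0.0 || vol[1] == 0.0) return std::numeric_limits<double>::infinity();
    return cut / vol[0] + cut / vol[1];
}

int main() {
    // Two triangles joined by one bridge edge; the natural split has a small Ncut.
    std::vector<std::pair<int,int>> edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
    std::vector<int> good = {0,0,0,1,1,1};   // split along the bridge
    std::vector<int> bad  = {0,1,0,1,0,1};   // arbitrary split
    std::printf("Ncut(good) = %.3f\n", ncut(edges, good));
    std::printf("Ncut(bad)  = %.3f\n", ncut(edges, bad));
    return 0;
}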

Spectral filtering

g(L) x = U g(Λ) U^T x

Explanation: Applying a graph filter g(L) corresponds to scaling each eigencomponent by g evaluated at its eigenvalue. Used in spectral GNNs.

Relaxed k-cut

min over Y ∈ R^{n×k}, Y^T Y = I of tr(Y^T L_norm Y)

Explanation: The optimal Y spans the k smallest eigenvectors of L_norm. Row-normalizing Y and running k-means yields spectral clustering.

Complexity Analysis

For sparse graphs with n nodes and m edges, the core cost in spectral methods is multiplying a vector by a graph-derived matrix. Multiplying by A, L, or the symmetric normalization S = D^{-1/2} A D^{-1/2} can be implemented in O(m) time using adjacency lists, since each undirected edge contributes a constant number of operations. Power iteration to obtain one dominant eigenpair takes O(T m) time, where T is the number of iterations needed to converge; T typically scales like O(log(1/ε)/gap), where gap is the eigenvalue separation and ε the desired accuracy. To get the second eigenvector of S, we deflate or orthogonalize against the trivial eigenvector, preserving the O(m) per-iteration cost. For k eigenvectors, block power or Lanczos methods cost O(T m k), with an additional O(n k^2) per iteration for orthonormalization. The k-means step in spectral clustering on an n×k embedding costs O(I n k^2), where I is the number of Lloyd iterations and each distance computation touches k coordinates.

Space-wise, storing a sparse graph is O(n + m). Storing k dense eigenvectors adds O(n k). During iterations we maintain a handful of length-n vectors, so peak memory is O(n + m + n k).

In practice, convergence is dominated by the spectral gap: well-separated clusters (large gaps) converge quickly, while near-degenerate clusters require more iterations or more sophisticated solvers (e.g., Lanczos with reorthogonalization). Dense representations (n×n) would cost O(n^2) memory and O(n^2) per multiply and are only suitable for small graphs; sparse implementations are therefore crucial for scalability.
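As an illustrative sketch of that O(m) multiply (separate from the examples below, with assumed field names), the same operation written against a CSR (compressed sparse row) layout, which is how the per-iteration cost is usually realized on large graphs:

#include <cstdio>
#include <cmath>
#include <vector>

// CSR layout: row_ptr has n+1 entries; col_idx lists neighbors of node u in
// positions [row_ptr[u], row_ptr[u+1]). For an unweighted graph, deg[u] is the row length.
struct CSRGraph {
    int n;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> deg;
};

// y = S x with S = D^{-1/2} A D^{-1/2}; each edge endpoint is touched once, so the cost is O(n + m).
void multiply_S_csr(const CSRGraph &g, const std::vector<double> &x, std::vector<double> &y) {
    for (int u = 0; u < g.n; ++u) {
        double acc = 0.0;
        if (g.deg[u] > 0.0) {
            double su = 1.0 / std::sqrt(g.deg[u]);
            for (int p = g.row_ptr[u]; p < g.row_ptr[u + 1]; ++p) {
                int v = g.col_idx[p];
                if (g.deg[v] > 0.0) acc += su * x[v] / std::sqrt(g.deg[v]);
            }
        }
        y[u] = acc;
    }
}

int main() {
    // Triangle graph 0-1-2 in CSR form.
    CSRGraph g;
    g.n = 3;
    g.row_ptr = {0, 2, 4, 6};
    g.col_idx = {1, 2, 0, 2, 0, 1};
    g.deg = {2.0, 2.0, 2.0};
    std::vector<double> x = {1.0, 0.0, 0.0}, y(3, 0.0);
    multiply_S_csr(g, x, y);
    for (int i = 0; i < 3; ++i) std::printf("y[%d] = %.3f\n", i, y[i]);
    return 0;
}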

Code Examples

Compute the Fiedler vector (2-way spectral cut) via power iteration on S = D^{-1/2} A D^{-1/2}
#include <bits/stdc++.h>
using namespace std;

// Undirected, unweighted graph stored as adjacency list
struct Graph {
    int n;                       // number of nodes
    vector<vector<int>> adj;     // adjacency list
    vector<double> deg;          // degrees
    Graph(int n_) : n(n_), adj(n_), deg(n_, 0.0) {}
    void add_edge(int u, int v) {
        if (u == v) return;      // ignore self-loops for simplicity
        adj[u].push_back(v);
        adj[v].push_back(u);
        deg[u] += 1.0;
        deg[v] += 1.0;
    }
};

// Multiply y = S x where S = D^{-1/2} A D^{-1/2} using adjacency list
static inline void multiply_S(const Graph &G, const vector<double> &x, vector<double> &y) {
    int n = G.n;
    fill(y.begin(), y.end(), 0.0);
    for (int u = 0; u < n; ++u) {
        double du = G.deg[u];
        if (du == 0.0) continue;                     // isolated node contributes nothing
        double scale_u = 1.0 / sqrt(du);
        for (int v : G.adj[u]) {
            double dv = G.deg[v];
            if (dv == 0.0) continue;
            double w = scale_u * (1.0 / sqrt(dv));   // 1/sqrt(d_u d_v)
            y[u] += w * x[v];
        }
    }
}

static inline double dot(const vector<double> &a, const vector<double> &b) {
    double s = 0.0; int n = (int)a.size();
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

static inline double norm2(const vector<double> &a) {
    return sqrt(max(0.0, dot(a, a)));
}

// Project x to be orthogonal to u: x <- x - (x.u / u.u) u
static inline void project_out(vector<double> &x, const vector<double> &u) {
    double uu = max(1e-18, dot(u, u));
    double alpha = dot(x, u) / uu;
    for (size_t i = 0; i < x.size(); ++i) x[i] -= alpha * u[i];
}

// Compute the second largest eigenpair (mu2, v2) of S by deflating the trivial eigenvector u = sqrt(deg)
// Returns estimated mu2 and fills v2 (unit norm). Also returns lambda2(L_norm) = 1 - mu2.
pair<double,double> second_eigen_S(const Graph &G, vector<double> &v2, int iters = 200, double tol = 1e-9) {
    int n = G.n;
    // Trivial eigenvector for S is u = sqrt(deg)
    vector<double> u(n);
    for (int i = 0; i < n; ++i) u[i] = sqrt(G.deg[i]);

    // Initialize v2 randomly and orthogonalize to u
    mt19937 rng(42);
    normal_distribution<double> nd(0.0, 1.0);
    v2.assign(n, 0.0);
    for (int i = 0; i < n; ++i) v2[i] = nd(rng);
    project_out(v2, u);
    double nrm = norm2(v2);
    if (nrm < 1e-12) {
        // If all isolated, return trivial
        return {0.0, 1.0};
    }
    for (int i = 0; i < n; ++i) v2[i] /= nrm;

    vector<double> y(n, 0.0);
    double mu_old = 0.0;
    for (int it = 0; it < iters; ++it) {
        multiply_S(G, v2, y);
        // Deflate: ensure orthogonal to u (remove trivial component)
        project_out(y, u);
        double nrm_y = norm2(y);
        if (nrm_y < 1e-14) break;
        for (int i = 0; i < n; ++i) v2[i] = y[i] / nrm_y;
        // Rayleigh quotient as eigenvalue estimate
        multiply_S(G, v2, y);
        double mu = dot(v2, y);
        if (fabs(mu - mu_old) < tol * max(1.0, fabs(mu_old))) {
            mu_old = mu;
            break;
        }
        mu_old = mu;
    }
    double mu2 = mu_old;          // second largest eigenvalue of S
    double lambda2 = 1.0 - mu2;   // second smallest eigenvalue of L_norm
    return {mu2, lambda2};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: two 5-node cliques connected by a single bridge (barbell-like)
    int n = 10; Graph G(n);
    // Clique 0..4
    for (int i = 0; i < 5; ++i) for (int j = i+1; j < 5; ++j) G.add_edge(i, j);
    // Clique 5..9
    for (int i = 5; i < 10; ++i) for (int j = i+1; j < 10; ++j) G.add_edge(i, j);
    // Weak bridge
    G.add_edge(4, 5);

    vector<double> v2;
    auto [mu2, lambda2] = second_eigen_S(G, v2, 500, 1e-10);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Estimated mu2 (S): " << mu2 << "\n";
    cout << "Estimated lambda2 (L_norm): " << lambda2 << "\n";

    // 2-way spectral cut by sign (median threshold for robustness)
    vector<int> idx(n); iota(idx.begin(), idx.end(), 0);
    vector<double> v2_copy = v2;
    nth_element(idx.begin(), idx.begin()+n/2, idx.end(), [&](int a, int b){ return v2_copy[a] < v2_copy[b]; });
    double median = v2_copy[idx[n/2]];

    vector<int> part(n, 0);
    for (int i = 0; i < n; ++i) part[i] = (v2[i] >= median) ? 1 : 0;

    cout << "Partition (node:part):\n";
    for (int i = 0; i < n; ++i) cout << i << ":" << part[i] << (i+1==n ? '\n' : ' ');

    return 0;
}

We construct an undirected graph, implement a sparse multiply by S = D^{-1/2} A D^{-1/2}, and perform power iteration while projecting out the trivial eigenvector u = sqrt(deg). The iteration converges to the second largest eigenvector of S, which corresponds to the Fiedler vector of L_norm. We estimate the eigenvalue with the Rayleigh quotient and then cut the graph into two parts by thresholding the eigenvector at its median.

Time: O(T m), where T is the number of iterations (typically tens to hundreds) and m is the number of edges.
Space: O(n + m) for the graph plus O(n) for working vectors.
Spectral clustering (k-way) using block power iteration and k-means
#include <bits/stdc++.h>
using namespace std;

struct Graph {
    int n; vector<vector<int>> adj; vector<double> deg;
    Graph(int n_) : n(n_), adj(n_), deg(n_, 0.0) {}
    void add_edge(int u, int v) {
        if (u == v) return;
        adj[u].push_back(v); adj[v].push_back(u);
        deg[u] += 1.0; deg[v] += 1.0;
    }
};

static inline void multiply_S(const Graph &G, const vector<double> &x, vector<double> &y) {
    int n = G.n; fill(y.begin(), y.end(), 0.0);
    for (int u = 0; u < n; ++u) {
        double du = G.deg[u]; if (du == 0.0) continue;
        double su = 1.0 / sqrt(du);
        for (int v : G.adj[u]) { double dv = G.deg[v]; if (dv == 0.0) continue; y[u] += su * (1.0 / sqrt(dv)) * x[v]; }
    }
}

// Orthonormalize columns of X using modified Gram-Schmidt
static void mgs_orthonormalize(vector<vector<double>> &X) {
    int n = (int)X.size(); int k = (int)X[0].size();
    for (int j = 0; j < k; ++j) {
        // subtract projections on previous columns
        for (int i = 0; i < j; ++i) {
            double dot_ij = 0.0; for (int r = 0; r < n; ++r) dot_ij += X[r][j] * X[r][i];
            for (int r = 0; r < n; ++r) X[r][j] -= dot_ij * X[r][i];
        }
        double nrm = 0.0; for (int r = 0; r < n; ++r) nrm += X[r][j] * X[r][j];
        nrm = sqrt(max(1e-18, nrm));
        for (int r = 0; r < n; ++r) X[r][j] /= nrm;
    }
}

// Project each column of X to be orthogonal to u
static void deflate_against(vector<vector<double>> &X, const vector<double> &u) {
    double uu = 0.0; for (double val : u) uu += val * val; uu = max(1e-18, uu);
    int n = (int)X.size(); int k = (int)X[0].size();
    for (int j = 0; j < k; ++j) {
        double alpha = 0.0; for (int r = 0; r < n; ++r) alpha += X[r][j] * u[r]; alpha /= uu;
        for (int r = 0; r < n; ++r) X[r][j] -= alpha * u[r];
    }
}

// Block power iteration to get k dominant eigenvectors of S, deflating the trivial one
vector<vector<double>> topk_eigenvectors_S(const Graph &G, int k, int iters = 200) {
    int n = G.n; k = max(1, k);
    vector<double> u(n); for (int i = 0; i < n; ++i) u[i] = sqrt(G.deg[i]);
    // Initialize X with Gaussian random entries
    mt19937 rng(123);
    normal_distribution<double> nd(0.0, 1.0);
    vector<vector<double>> X(n, vector<double>(k));
    for (int i = 0; i < n; ++i) for (int j = 0; j < k; ++j) X[i][j] = nd(rng);
    deflate_against(X, u);
    mgs_orthonormalize(X);

    vector<vector<double>> Y(n, vector<double>(k, 0.0));
    vector<double> xcol(n), ycol(n);
    for (int it = 0; it < iters; ++it) {
        // Y = S X (column by column multiply)
        for (int j = 0; j < k; ++j) {
            for (int r = 0; r < n; ++r) xcol[r] = X[r][j];
            fill(ycol.begin(), ycol.end(), 0.0);
            multiply_S(G, xcol, ycol);
            for (int r = 0; r < n; ++r) Y[r][j] = ycol[r];
        }
        deflate_against(Y, u);
        mgs_orthonormalize(Y);
        X.swap(Y);
    }
    return X; // columns approximate top-k (excluding trivial)
}

// Row-normalize an n x k matrix
static void row_normalize(vector<vector<double>> &M) {
    int n = (int)M.size(); int k = (int)M[0].size();
    for (int i = 0; i < n; ++i) {
        double nrm = 0.0; for (int j = 0; j < k; ++j) nrm += M[i][j] * M[i][j];
        nrm = sqrt(max(1e-18, nrm));
        for (int j = 0; j < k; ++j) M[i][j] /= nrm;
    }
}

// Simple k-means on rows of data (n x d)
vector<int> kmeans(const vector<vector<double>> &data, int k, int iters = 50, int seed = 7) {
    int n = (int)data.size(); int d = (int)data[0].size();
    mt19937 rng(seed); uniform_int_distribution<int> uni(0, n - 1);
    vector<vector<double>> cent(k, vector<double>(d, 0.0));
    for (int j = 0; j < k; ++j) cent[j] = data[uni(rng)];
    vector<int> label(n, 0);
    for (int it = 0; it < iters; ++it) {
        bool changed = false;
        // assign each point to the nearest centroid
        for (int i = 0; i < n; ++i) {
            int best = 0; double bestd = numeric_limits<double>::infinity();
            for (int j = 0; j < k; ++j) {
                double d2 = 0.0; for (int t = 0; t < d; ++t) { double diff = data[i][t] - cent[j][t]; d2 += diff * diff; }
                if (d2 < bestd) { bestd = d2; best = j; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        // update centroids
        vector<vector<double>> newc(k, vector<double>(d, 0.0));
        vector<int> cnt(k, 0);
        for (int i = 0; i < n; ++i) { cnt[label[i]]++; for (int t = 0; t < d; ++t) newc[label[i]][t] += data[i][t]; }
        for (int j = 0; j < k; ++j) {
            if (cnt[j] == 0) { newc[j] = data[uni(rng)]; cnt[j] = 1; }   // re-seed empty clusters
            for (int t = 0; t < d; ++t) newc[j][t] /= cnt[j];
        }
        cent.swap(newc);
        if (!changed) break;
    }
    return label;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Build a toy graph with 3 clusters (two cliques and one ring)
    int n = 18; Graph G(n);
    // Clique 0..5
    for (int i = 0; i < 6; ++i) for (int j = i+1; j < 6; ++j) G.add_edge(i, j);
    // Clique 6..11
    for (int i = 6; i < 12; ++i) for (int j = i+1; j < 12; ++j) G.add_edge(i, j);
    // Ring 12..17
    for (int i = 12; i < 18; ++i) G.add_edge(i, 12 + (i-12+1) % 6);
    // Sparse inter-cluster edges
    G.add_edge(2, 7); G.add_edge(10, 14); G.add_edge(4, 12);

    int k = 3;
    auto U = topk_eigenvectors_S(G, k, 200); // n x k columns are eigenvectors (dominant, excluding trivial)

    // Convert to row-major embedding (n x k)
    int nrows = G.n; int kcols = k;
    vector<vector<double>> embed(nrows, vector<double>(kcols, 0.0));
    for (int i = 0; i < nrows; ++i) for (int j = 0; j < kcols; ++j) embed[i][j] = U[i][j];

    // Row-normalize (Ng–Jordan–Weiss)
    row_normalize(embed);

    // Cluster rows with k-means
    vector<int> labels = kmeans(embed, k, 100, 11);

    cout << "Cluster labels (node:cluster):\n";
    for (int i = 0; i < nrows; ++i) cout << i << ":" << labels[i] << (i+1==nrows ? '\n' : ' ');

    return 0;
}

This program finds k dominant eigenvectors of S = D^{-1/2} A D^{-1/2} via block power iteration while deflating the trivial eigenvector. It row-normalizes the eigenvector matrix to form a spectral embedding and then applies k-means to obtain k clusters. This is the normalized spectral clustering algorithm (Ng–Jordan–Weiss variant) implemented with sparse O(m) multiplies.

Time: O(T m k + T n k^2) for eigenvectors (T iterations) plus O(I n k^2) for k-means with I iterations.
Space: O(n + m + n k) for the graph and embedding, plus O(n k) working memory.