📚 Theory · Intermediate

Eigenvalue Decomposition

Key Points

  • Eigenvalue decomposition rewrites a square matrix as a change of basis that reveals how it stretches and rotates space.
  • Eigenvalues are scaling factors and eigenvectors are directions that do not change direction under the matrix transformation.
  • The characteristic equation det(A - \lambda I) = 0 gives all eigenvalues, and the trace and determinant equal the sum and product of eigenvalues respectively.
  • A matrix is diagonalizable if it has enough independent eigenvectors to form a basis, letting A = P D P^{-1} with D diagonal.
  • Real symmetric (Hermitian) matrices have orthonormal eigenvectors and real eigenvalues, yielding A = Q \Lambda Q^{\top} (the spectral theorem).
  • Power iteration efficiently finds the dominant eigenvalue/eigenvector of large matrices using repeated matrix–vector multiplication.
  • Eigenvalues diagnose stability, clustering structure, and long-run behavior of Markov chains (e.g., PageRank).
  • Positive definiteness corresponds to all eigenvalues being positive, which guarantees convexity and stable energy-like behavior.

Prerequisites

  • →Matrix and vector operations — Understanding matrix–vector multiplication, norms, and basic linear algebra notation is foundational for eigenpairs.
  • →Determinants and trace — Characteristic equation, trace–sum, and determinant–product properties rely on these concepts.
  • →Orthogonality and inner products — Spectral theorem, Rayleigh quotient, and normalization steps depend on inner products and orthonormal bases.
  • →Numerical stability and floating-point — Iterative eigenvalue algorithms require careful normalization and tolerance settings to avoid numerical issues.
  • →Iterative methods and convergence — Power iteration and Jacobi are iterative; understanding convergence rates and stopping criteria is crucial.
  • →Graph basics and Markov chains — Applications like PageRank interpret eigenvectors of stochastic matrices as steady-state distributions.
  • →C++ programming fundamentals — To implement algorithms correctly, you need familiarity with vectors, loops, and numerical libraries or STL.

Detailed Explanation


01 Overview

Eigenvalue decomposition is a way to understand a square matrix by discovering its most natural directions of action. When a matrix A multiplies a vector, it usually rotates and scales it. But for special directions—eigenvectors—the matrix only scales without changing direction. The amount of scaling is the corresponding eigenvalue. If we can collect enough linearly independent eigenvectors to form a basis, we can rewrite A as A = P D P^{-1}, where D is diagonal and contains eigenvalues, and P holds eigenvectors as columns. For real symmetric (or complex Hermitian) matrices, the situation is even nicer: eigenvalues are real and eigenvectors can be chosen orthonormal, giving A = Q \Lambda Q^{\top}. This diagonal structure is powerful because diagonal matrices are easy to compute with—raising to powers, exponentials, solving systems, and understanding dynamics become straightforward. In applications, eigenvalues and eigenvectors provide insight into stability (do repeated applications of A blow up or decay?), structure (what are the principal directions of variation?), and long-term behavior (where does a Markov chain settle?). Algorithms like power iteration, QR iteration, and Jacobi rotations compute eigenpairs numerically. In machine learning, they underlie PCA, spectral clustering, graph embeddings, and the analysis of optimization dynamics. Even when full decomposition is costly, the top few eigenpairs often suffice and can be found in near-linear time on sparse data.
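
To make the payoff concrete, here is a small worked example (an illustrative 2×2 matrix chosen for this section, not one used elsewhere in the article). For

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},

the vector (1, 1)^{\top} is mapped to (3, 3)^{\top} and (1, -1)^{\top} is mapped to (1, -1)^{\top}, so the eigenpairs are \lambda_1 = 3 with v_1 = \tfrac{1}{\sqrt{2}}(1, 1)^{\top} and \lambda_2 = 1 with v_2 = \tfrac{1}{\sqrt{2}}(1, -1)^{\top}. Hence

A = Q \Lambda Q^{\top}, \quad Q = \tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad \Lambda = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A^{k} = Q \begin{pmatrix} 3^{k} & 0 \\ 0 & 1 \end{pmatrix} Q^{\top},

so computing A^{k} reduces to raising two scalars to the k-th power.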

02 Intuition & Analogies

Imagine you push on a soft rubber sheet stretched on a frame. Pushing in most directions bends and distorts it in complicated ways. But there are some special directions where the sheet stretches straight out without twisting—only longer or shorter. Those directions are eigenvectors, and how much they stretch is given by the eigenvalues. If you can label the sheet with a new set of axes aligned to all those special directions, then any push becomes simpler to describe: just independent stretching along each axis. That is what diagonalization does—changing coordinates to the eigenvector basis makes the matrix act as a pure scaling on each coordinate. Another picture: think of applying the same transformation to a vector over and over again. If one direction gets stretched a bit more than all others, any starting vector will gradually align with that most-stretched direction. This explains power iteration: multiply by A repeatedly and normalize; the vector converges to the dominant eigenvector. In Markov chains (like surfing the web randomly), the long-run probability of being on each page is the eigenvector for eigenvalue 1 of the transition matrix. In PCA, a cloud of points has directions where it varies the most; those are eigenvectors of the covariance matrix, and the eigenvalues tell you how much variance each direction holds. Finally, symmetry simplifies life. When A equals its transpose (A = A^{\top}), its special directions are guaranteed to be perpendicular and its stretch amounts are real numbers. That orthogonality means your new axes are nicely at right angles—great for numerical stability and interpretation.
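
The "repeated application" picture can be written out in one line (a standard derivation, assuming A is diagonalizable with |\lambda_1| > |\lambda_2| \ge \cdots). Expand the start vector in the eigenvector basis as x_0 = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n with c_1 \neq 0. Then

A^{k} x_0 = c_1 \lambda_1^{k} v_1 + \cdots + c_n \lambda_n^{k} v_n = \lambda_1^{k} \left( c_1 v_1 + \sum_{i \ge 2} c_i \left( \tfrac{\lambda_i}{\lambda_1} \right)^{k} v_i \right),

and every term in the sum shrinks geometrically, which is exactly why the normalized iterates line up with v_1 and why the gap |\lambda_2 / \lambda_1| sets the convergence speed.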

03 Formal Definition

Let A \in \mathbb{R}^{n \times n}. A nonzero vector v is an eigenvector and \lambda is an eigenvalue if Av = \lambda v. Nontrivial solutions exist when \det(A - \lambda I) = 0, the characteristic equation. The multiset of roots \{\lambda_i\}_{i=1}^n (counting algebraic multiplicity) is the spectrum of A. If there exist n linearly independent eigenvectors, stack them as columns of P to obtain A = P D P^{-1}, where D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) is diagonal. A is diagonalizable if and only if the sum of geometric multiplicities equals n. If A is real symmetric (A = A^{\top}), then all eigenvalues are real and there exists an orthogonal matrix Q (Q^{\top} Q = I) such that A = Q \Lambda Q^{\top}, with \Lambda a diagonal matrix of eigenvalues (spectral theorem). The Rayleigh quotient R_A(x) = \frac{x^{\top} A x}{x^{\top} x} satisfies \min_{x \neq 0} R_A(x) = \lambda_{\min} and \max_{x \neq 0} R_A(x) = \lambda_{\max} when A is symmetric, and eigenvectors are the optimizers. The spectral radius is \rho(A) = \max_i |\lambda_i|. For positive definiteness, A \succ 0 if and only if all eigenvalues are strictly positive. In Markov chains with column-stochastic P, the stationary distribution satisfies P\pi = \pi and corresponds to eigenvalue 1.
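
A quick worked instance of the characteristic equation (an illustrative 2×2 example): for A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix},

\det(A - \lambda I) = (4 - \lambda)(3 - \lambda) - 2 = \lambda^{2} - 7\lambda + 10 = (\lambda - 2)(\lambda - 5),

so the spectrum is \{2, 5\}. As a sanity check, \operatorname{tr}(A) = 7 = 2 + 5 and \det(A) = 10 = 2 \cdot 5, matching the trace and determinant relations listed under Key Formulas.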

04 When to Use

Use eigenvalue decomposition when you want to understand or simplify repeated linear transformations, compute matrix functions (e.g., A^k, e^{A}), or analyze stability. In numerical linear algebra, use it to solve symmetric problems robustly: PCA via eigen-decomposition of the covariance matrix, spectral clustering using the graph Laplacian’s eigenvectors, or modal analysis in mechanical systems. In Markov processes and PageRank, use the dominant eigenvector of a stochastic matrix to find steady-state distributions. In control and differential equations, eigenvalues of the system matrix determine whether solutions grow or decay over time. In optimization and deep learning, inspecting Hessian eigenvalues reveals curvature (convexity/conditioning), affecting step sizes and convergence of gradient methods. Algorithmically: choose power iteration (or Lanczos) to approximate the largest eigenpair of very large, sparse matrices; use deflation to get a few top components. Prefer the symmetric eigensolvers (Jacobi, QR with shifts, Lanczos) when A is symmetric/Hermitian—they are more stable and efficient. Use full QR iteration when you need the full spectrum of a moderate-size dense matrix. When only qualitative bounds are needed, apply Gershgorin’s theorem to localize eigenvalues cheaply.
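
As an example of the cheap qualitative bound mentioned above, here is a minimal sketch of Gershgorin's theorem in code (the helper name gershgorin_discs and the example matrix are our own choices for illustration): every eigenvalue of A lies in the union of the printed discs.

#include <bits/stdc++.h>
using namespace std;

// Print Gershgorin discs: for each row i, the disc is centered at A[i][i]
// with radius equal to the sum of |A[i][j]| over j != i.
void gershgorin_discs(const vector<vector<double>>& A) {
    size_t n = A.size();
    for (size_t i = 0; i < n; ++i) {
        double center = A[i][i];
        double radius = 0.0;
        for (size_t j = 0; j < n; ++j)
            if (j != i) radius += fabs(A[i][j]);
        cout << "disc " << i << ": center " << center << ", radius " << radius
             << "  -> [" << center - radius << ", " << center + radius << "] on the real line\n";
    }
}

int main() {
    // Symmetric example: eigenvalues are real, so the printed intervals bound the spectrum
    vector<vector<double>> A = {
        {4.0, 1.0, 0.0},
        {1.0, 3.0, 1.0},
        {0.0, 1.0, 2.0}
    };
    gershgorin_discs(A);
    return 0;
}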

⚠️ Common Mistakes

  • Assuming all matrices are diagonalizable. Defective matrices (insufficient eigenvectors) exist; they require Jordan forms, and algorithms may behave poorly near such cases.
  • Ignoring symmetry. Applying algorithms that assume symmetry (e.g., Jacobi for all eigenvalues) to nonsymmetric matrices can give wrong results or fail to converge. Always check A = A^{\top} (or Hermitian) before using symmetric solvers (a minimal check is sketched after this list).
  • Not normalizing in power iteration. Without normalization, vectors overflow/underflow; with a poor stopping criterion (e.g., absolute change in entries), you may terminate too early or never. Use norms and Rayleigh quotient changes with tolerances.
  • Initializing power iteration with a vector orthogonal to the dominant eigenvector. Although unlikely with random starts, it stalls convergence. Add a small random perturbation or restart.
  • Expecting power iteration to distinguish close eigenvalues quickly. Convergence rate depends on |\lambda_2 / \lambda_1|; if it is near 1, progress is slow. Use acceleration (shifts, Rayleigh quotient iteration) or Krylov methods (Lanczos/Arnoldi).
  • Misinterpreting eigenvalues of non-normal matrices. Large eigenvalues do not necessarily imply large transient growth; use singular values or pseudospectra when A is not normal.
  • Forgetting scaling and conditioning. Poorly scaled matrices and nearly multiple eigenvalues can cause large numerical errors; prefer orthogonal transformations (Householder, Givens) and double precision.
  • Treating stochastic matrices incorrectly. Use column- or row-stochastic convention consistently; handle dangling nodes in PageRank; ensure damping to guarantee a unique stationary distribution.
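
Below is a minimal sketch of the symmetry check referenced above (the helper name is_symmetric and the tolerance are our own choices): verify A ≈ A^{\top} up to a small relative tolerance before handing the matrix to a symmetric-only solver.

#include <bits/stdc++.h>
using namespace std;

// Return true if A is square and numerically symmetric:
// |A[i][j] - A[j][i]| <= tol relative to the entry magnitudes.
bool is_symmetric(const vector<vector<double>>& A, double tol = 1e-12) {
    size_t n = A.size();
    for (size_t i = 0; i < n; ++i) {
        if (A[i].size() != n) return false; // not square
        for (size_t j = i + 1; j < n; ++j) {
            double scale = max(1.0, max(fabs(A[i][j]), fabs(A[j][i])));
            if (fabs(A[i][j] - A[j][i]) > tol * scale) return false;
        }
    }
    return true;
}

int main() {
    vector<vector<double>> S = {{4.0, 1.0}, {1.0, 3.0}}; // symmetric
    vector<vector<double>> N = {{4.0, 1.0}, {2.0, 3.0}}; // not symmetric
    cout << boolalpha << is_symmetric(S) << ' ' << is_symmetric(N) << '\n'; // true false
    return 0;
}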

Key Formulas

Eigenpair definition

A v = \lambda v, \quad v \neq 0

Explanation: An eigenvector v is scaled by A without changing direction, by factor \lambda. This is the core property defining eigenvalues and eigenvectors.

Characteristic equation

\det(A - \lambda I) = 0

Explanation: Nontrivial solutions of Av = \lambda v exist only when this determinant vanishes. Solving it yields all eigenvalues (with multiplicity).

Diagonalization

A = P D P^{-1}, \quad D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)

Explanation: If A has a basis of eigenvectors, we can express it via a similarity transform to a diagonal matrix. Computations with A simplify in this basis.

Spectral theorem (real symmetric)

A = Q \Lambda Q^{\top}, \quad Q^{\top} Q = I

Explanation: Symmetric matrices decompose with an orthogonal matrix of eigenvectors and a real diagonal of eigenvalues. This ensures numerical stability and interpretability.

Trace–eigenvalues relation

\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i

Explanation: The trace equals the sum of eigenvalues (counting multiplicity). Useful to sanity-check computed spectra (a small numeric check appears after this list).

Determinant–eigenvalues relation

\det(A) = \prod_{i=1}^{n} \lambda_i

Explanation: The determinant equals the product of eigenvalues. This connects volume scaling to the eigen-spectrum.

Rayleigh quotient

R_A(x) = \frac{x^{\top} A x}{x^{\top} x}, \quad x \neq 0

Explanation: For symmetric A, this scalar lies between the smallest and largest eigenvalues and attains them at the corresponding eigenvectors. It is a key estimator in iterative methods.

Spectral radius

\rho(A) = \max_i |\lambda_i|

Explanation: This is the largest magnitude eigenvalue of A. It controls the asymptotic growth of \|A^k x\| and the convergence of power iteration.

Power iteration convergence

\text{error after } k \text{ steps} = O\!\left( \left| \tfrac{\lambda_2}{\lambda_1} \right|^{k} \right)

Explanation: When |\lambda_1| > |\lambda_2|, the error in the dominant direction decays geometrically with ratio |\lambda_2 / \lambda_1|. Close eigenvalues imply slow convergence.

QR iteration

A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k

Explanation: Repeated QR factorizations and similarity transforms drive A_k toward upper-triangular (Schur) form, revealing eigenvalues on the diagonal.

Gershgorin discs

\lambda \in \bigcup_{i=1}^{n} \left\{ z : |z - a_{ii}| \le \sum_{j \neq i} |a_{ij}| \right\}

Explanation: Every eigenvalue lies within at least one Gershgorin disc. These bounds are quick to compute and useful for checks and preconditioning.

Positive definiteness via eigenvalues

A = A^{\top} \succ 0 \iff \lambda_i > 0 \text{ for all } i

Explanation: A symmetric matrix is positive definite if and only if all eigenvalues are positive. This criterion is central in optimization and statistics.

Explained variance

\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i}

Explanation: In PCA, eigenvalues of the covariance matrix represent variances along principal components. This ratio measures how much variance the top-k components capture.

Stationary distribution

P \pi = \pi, \quad \sum_i \pi_i = 1

Explanation: For a column-stochastic transition matrix P, the steady-state distribution is an eigenvector with eigenvalue 1, normalized to sum to one.
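
To tie several of these formulas together numerically, here is a small self-contained check (the 2×2 matrix and variable names are our own illustrative choices): it computes the eigenvalues of a symmetric 2×2 matrix in closed form and confirms the trace–sum and determinant–product relations.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Symmetric 2x2 matrix [[a, b], [b, c]]
    double a = 2.0, b = 1.0, c = 2.0;

    // Closed-form eigenvalues: (a+c)/2 +- sqrt(((a-c)/2)^2 + b^2)
    double mean = 0.5 * (a + c);
    double disc = sqrt(0.25 * (a - c) * (a - c) + b * b);
    double l1 = mean + disc, l2 = mean - disc;

    double trace = a + c;
    double det = a * c - b * b;

    cout.setf(std::ios::fixed);
    cout << setprecision(6);
    cout << "eigenvalues: " << l1 << ' ' << l2 << '\n';
    cout << "trace " << trace << " vs lambda1 + lambda2 = " << (l1 + l2) << '\n';
    cout << "det   " << det   << " vs lambda1 * lambda2 = " << (l1 * l2) << '\n';
    return 0;
}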

Complexity Analysis

Computing eigenvalue decompositions has a wide range of complexities depending on matrix structure and what you need. For a dense n×n matrix, classical algorithms for the full spectrum (e.g., Householder reduction to tridiagonal/upper-Hessenberg form followed by shifted QR iteration) cost O(n^3) time and O(n^2) space. This is optimal up to constants for dense problems because just reading the matrix is O(n^2) and producing all n eigenvectors is O(n^2) output. For symmetric matrices, specialized pipelines (symmetric tridiagonalization + QR or divide-and-conquer) reduce constants and improve stability while maintaining O(n^3) time. When only the largest few eigenpairs are needed—common in ML—iterative methods are preferable. Power iteration costs O(k · T_mv), where T_mv is the cost of a matrix–vector multiply and k is the number of iterations to converge; for dense matrices T_mv = O(n^2), while for sparse matrices with m nonzeros T_mv = O(m). Convergence is geometric with rate |\lambda_2/\lambda_1|; if eigenvalues are clustered, more iterations are required. Krylov subspace methods (Lanczos/Arnoldi) also cost O(T_mv) per iteration but extract multiple Ritz approximations simultaneously and converge faster in practice. Jacobi's method for symmetric eigenproblems typically costs O(n^3) time (each sweep performs O(n^2) rotations with O(n) work each, and a few sweeps are needed) and O(n^2) space for storing A and the eigenvector matrix. PageRank power iteration on a web graph with m edges has per-iteration cost O(m) if implemented with sparse adjacency; handling dangling nodes and damping adds only O(n) overhead. Memory is dominated by storing the sparse graph O(m) and a few vectors O(n). Numerical stability favors orthogonal transformations and double precision; ill-conditioned problems may require more iterations or preconditioning.
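
As a rough back-of-the-envelope companion to the convergence rate above (a sketch under the stated assumptions, not a precise bound), the geometric rate |\lambda_2/\lambda_1| suggests roughly k ≈ log(1/ε) / log(|\lambda_1/\lambda_2|) iterations of power iteration to shrink the off-dominant error by a factor ε; the helper name estimated_iterations is our own.

#include <bits/stdc++.h>
using namespace std;

// Rough estimate of power-iteration count needed to reduce the error
// component orthogonal to the dominant eigenvector by a factor eps,
// given the eigenvalue ratio |lambda2 / lambda1| < 1.
int estimated_iterations(double ratio, double eps) {
    // error after k steps ~ ratio^k, so ratio^k <= eps  =>  k >= log(eps) / log(ratio)
    return (int)ceil(log(eps) / log(ratio));
}

int main() {
    cout << estimated_iterations(0.50, 1e-10) << '\n'; // well-separated spectrum: a few dozen iterations
    cout << estimated_iterations(0.99, 1e-10) << '\n'; // clustered spectrum: on the order of a few thousand
    return 0;
}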

Code Examples

Power Iteration for Dominant Eigenvalue and Eigenvector
#include <bits/stdc++.h>
using namespace std;

// Compute dot product of two vectors
double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Compute Euclidean norm
double norm2(const vector<double>& x) {
    return sqrt(max(0.0, dot(x, x)));
}

// Matrix-vector multiply y = A x (A is dense)
vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x) {
    size_t n = A.size();
    vector<double> y(n, 0.0);
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Power iteration: returns approximate dominant eigenvalue and eigenvector
pair<double, vector<double>> power_iteration(const vector<vector<double>>& A,
                                             int max_iters = 1000, double tol = 1e-10) {
    size_t n = A.size();
    vector<double> x(n, 0.0);
    // Random initial vector (fixed seed for reproducibility)
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (size_t i = 0; i < n; ++i) x[i] = dist(rng);

    // Ensure the start is not the zero vector, then normalize
    double nx = norm2(x);
    if (nx == 0.0) x[0] = 1.0; else for (size_t i = 0; i < n; ++i) x[i] /= nx;

    double lambda_old = 0.0;
    for (int k = 0; k < max_iters; ++k) {
        vector<double> y = matvec(A, x);
        double ny = norm2(y);
        if (ny == 0.0) break; // Ax = 0: x lies in the nullspace of A
        for (size_t i = 0; i < n; ++i) x[i] = y[i] / ny; // normalize
        // Rayleigh quotient as eigenvalue estimate (x is unit-length)
        double lambda = dot(x, matvec(A, x));
        if (fabs(lambda - lambda_old) <= tol * max(1.0, fabs(lambda))) {
            return {lambda, x};
        }
        lambda_old = lambda;
    }
    // Final estimate
    double lambda = dot(x, matvec(A, x));
    return {lambda, x};
}

int main() {
    // Example: symmetric matrix with a well-separated dominant eigenvalue
    vector<vector<double>> A = {
        {4.0, 1.0, 0.0},
        {1.0, 3.0, 1.0},
        {0.0, 1.0, 2.0}
    };

    auto res = power_iteration(A, 10000, 1e-12);
    double lambda = res.first;
    vector<double> v = res.second;

    // Print eigenvalue, then eigenvector entries (space-separated)
    cout.setf(std::ios::fixed);
    cout << setprecision(10);
    cout << lambda << '\n';
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) cout << ' ';
        cout << v[i];
    }
    cout << '\n';

    // Residual norm ||A v - lambda v|| (should be small)
    vector<double> Av = matvec(A, v);
    for (size_t i = 0; i < v.size(); ++i) Av[i] -= lambda * v[i];
    cout << norm2(Av) << '\n';
    return 0;
}

This program implements power iteration to approximate the dominant eigenvalue and its eigenvector of a real matrix using repeated matrix–vector multiplication and normalization. The Rayleigh quotient provides a robust eigenvalue estimate, and convergence is detected by the change in the estimate. The final line prints the residual norm to verify accuracy.

Time: O(k · n^2) for dense matrices, where k is the number of iterations. Space: O(n^2) for the matrix plus O(n) for vectors.
Jacobi Eigenvalue Algorithm for Symmetric Matrices (Full Decomposition A = QΛQ^T)
#include <bits/stdc++.h>
using namespace std;

// Result of the Jacobi rotation method for the symmetric eigenproblem
struct JacobiResult {
    vector<double> eigenvalues;          // diagonal of Lambda
    vector<vector<double>> eigenvectors; // columns are eigenvectors (Q)
};

// Classical Jacobi: repeatedly zero the largest off-diagonal entry with a plane rotation
JacobiResult jacobi_eigen(const vector<vector<double>>& A, double tol = 1e-10, int max_rotations = 100) {
    size_t n = A.size();
    vector<vector<double>> M = A; // working copy
    vector<vector<double>> Q(n, vector<double>(n, 0.0));
    for (size_t i = 0; i < n; ++i) Q[i][i] = 1.0; // initialize Q = I

    auto offdiag_norm = [&](const vector<vector<double>>& B) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                if (i != j) s += B[i][j] * B[i][j];
        return sqrt(s);
    };

    for (int iter = 0; iter < max_rotations; ++iter) {
        // Find largest off-diagonal element (by magnitude)
        size_t p = 0, q = 1;
        double maxval = 0.0;
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = i + 1; j < n; ++j) {
                double val = fabs(M[i][j]);
                if (val > maxval) {
                    maxval = val; p = i; q = j;
                }
            }
        }
        if (maxval < tol) break; // already (nearly) diagonal

        double app = M[p][p];
        double aqq = M[q][q];
        double apq = M[p][q];

        // Compute rotation parameters that zero M[p][q]
        double phi = 0.5 * atan2(2.0 * apq, (aqq - app));
        double c = cos(phi);
        double s = sin(phi);

        // Apply rotation to M: only rows/cols p and q change
        for (size_t k = 0; k < n; ++k) {
            if (k == p || k == q) continue;
            double mkp = M[k][p];
            double mkq = M[k][q];
            double mkp_new = c * mkp - s * mkq;
            double mkq_new = s * mkp + c * mkq;
            M[k][p] = M[p][k] = mkp_new;
            M[k][q] = M[q][k] = mkq_new;
        }
        double app_new = c * c * app - 2.0 * s * c * apq + s * s * aqq;
        double aqq_new = s * s * app + 2.0 * s * c * apq + c * c * aqq;
        M[p][p] = app_new;
        M[q][q] = aqq_new;
        M[p][q] = M[q][p] = 0.0;

        // Accumulate eigenvectors: Q = Q * R(p,q,phi)
        for (size_t k = 0; k < n; ++k) {
            double qkp = Q[k][p];
            double qkq = Q[k][q];
            Q[k][p] = c * qkp - s * qkq;
            Q[k][q] = s * qkp + c * qkq;
        }

        // Optional convergence check on the total off-diagonal norm
        if (offdiag_norm(M) < tol) break;
    }

    vector<double> evals(n);
    for (size_t i = 0; i < n; ++i) evals[i] = M[i][i];

    // Columns of Q are eigenvectors; make them unit-length (numerically they already are)
    for (size_t j = 0; j < n; ++j) {
        double nj = 0.0;
        for (size_t i = 0; i < n; ++i) nj += Q[i][j] * Q[i][j];
        nj = sqrt(max(0.0, nj));
        if (nj > 0) for (size_t i = 0; i < n; ++i) Q[i][j] /= nj;
    }

    return {evals, Q};
}

// Multiply Q * Lambda * Q^T to reconstruct A (for verification)
vector<vector<double>> reconstruct(const vector<double>& evals, const vector<vector<double>>& Q) {
    size_t n = evals.size();
    vector<vector<double>> R(n, vector<double>(n, 0.0));
    // R = Q * diag(evals) * Q^T
    for (size_t k = 0; k < n; ++k) {
        for (size_t i = 0; i < n; ++i) {
            double qik = Q[i][k] * evals[k];
            for (size_t j = 0; j < n; ++j) R[i][j] += qik * Q[j][k];
        }
    }
    return R;
}

int main() {
    // Symmetric example matrix
    vector<vector<double>> A = {
        {4.0, 1.0, 2.0},
        {1.0, 3.0, 0.5},
        {2.0, 0.5, 1.0}
    };

    JacobiResult J = jacobi_eigen(A, 1e-12, 200);

    // Print eigenvalues on one line
    cout.setf(std::ios::fixed);
    cout << setprecision(10);
    for (size_t i = 0; i < J.eigenvalues.size(); ++i) {
        if (i) cout << ' ';
        cout << J.eigenvalues[i];
    }
    cout << '\n';

    // Verify A ≈ Q * Lambda * Q^T via the Frobenius norm of the difference
    vector<vector<double>> Arec = reconstruct(J.eigenvalues, J.eigenvectors);
    double err = 0.0;
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < A.size(); ++j) {
            double d = A[i][j] - Arec[i][j];
            err += d * d;
        }
    cout << sqrt(err) << '\n';

    return 0;
}

This program implements the classic Jacobi rotation method for real symmetric matrices, producing all eigenvalues and an orthonormal eigenvector matrix Q such that A ≈ QΛQ^T. Each iteration zeros the currently largest off-diagonal entry via a plane rotation (the classical Jacobi strategy), and Q accumulates the rotations. The reconstruction error checks the spectral theorem numerically.

Time: typically O(n^3) overall (a few sweeps' worth of rotations); each rotation costs O(n) to apply plus O(n^2) to find the largest off-diagonal entry in this implementation. Space: O(n^2) to store the working copy of A and Q.
PageRank via Power Iteration on a Sparse Graph
#include <bits/stdc++.h>
using namespace std;

// Compute PageRank vector using power iteration with damping.
// Graph given as adjacency list of outgoing edges.
vector<double> pagerank(const vector<vector<int>>& out, double damping = 0.85,
                        int max_iters = 100, double tol = 1e-12) {
    int n = (int)out.size();
    vector<int> outdeg(n, 0);
    for (int u = 0; u < n; ++u) outdeg[u] = (int)out[u].size();

    vector<double> r(n, 1.0 / n); // initial rank (probabilities sum to 1)
    vector<double> rnext(n, 0.0);

    for (int it = 0; it < max_iters; ++it) {
        fill(rnext.begin(), rnext.end(), 0.0);
        double dangling_mass = 0.0;
        for (int j = 0; j < n; ++j) if (outdeg[j] == 0) dangling_mass += r[j];

        // Distribute rank mass along edges (implicitly a column-stochastic matvec)
        for (int j = 0; j < n; ++j) {
            if (outdeg[j] == 0) continue;
            double share = r[j] / outdeg[j];
            for (int v : out[j]) rnext[v] += damping * share;
        }

        // Add teleportation and the dangling-node contribution
        double teleport = (1.0 - damping) / n;
        double dangling_share = damping * dangling_mass / n;
        for (int i = 0; i < n; ++i) rnext[i] += teleport + dangling_share;

        // L1 normalization (should already sum to 1 numerically)
        double s = 0.0;
        for (double x : rnext) s += x;
        if (s > 0) for (double& x : rnext) x /= s;

        // Check convergence by L1 difference
        double diff = 0.0;
        for (int i = 0; i < n; ++i) diff += fabs(rnext[i] - r[i]);
        r.swap(rnext);
        if (diff < tol) break;
    }
    return r;
}

int main() {
    // Example small web: 0->1,2 ; 1->2 ; 2->0 ; 3 has no out-links (dangling)
    vector<vector<int>> out = {
        {1, 2}, // 0
        {2},    // 1
        {0},    // 2
        {}      // 3 (dangling)
    };

    vector<double> pr = pagerank(out, 0.85, 1000, 1e-14);

    cout.setf(std::ios::fixed);
    cout << setprecision(12);
    for (size_t i = 0; i < pr.size(); ++i) {
        if (i) cout << ' ';
        cout << pr[i];
    }
    cout << '\n';
    return 0;
}

This code computes PageRank as the dominant eigenvector (eigenvalue 1) of a damped, column-stochastic transition operator. It uses sparse matrix–vector multiplication via adjacency lists, accounts for dangling nodes by redistributing their rank uniformly, and normalizes to keep a valid probability vector.

Time: O(k · (n + m)), where m is the number of edges and k is the number of iterations; each iteration is O(m) in practice. Space: O(n + m) for the graph and rank vectors.