Eigenvalue Decomposition
Key Points
- Eigenvalue decomposition rewrites a square matrix as a change of basis that reveals how it stretches and rotates space.
- Eigenvalues are scaling factors and eigenvectors are directions that do not change direction under the matrix transformation.
- The characteristic equation det(A - \lambda I) = 0 gives all eigenvalues, and the trace and determinant equal the sum and product of eigenvalues respectively.
- A matrix is diagonalizable if it has enough independent eigenvectors to form a basis, letting A = P D P^{-1} with D diagonal.
- Real symmetric (Hermitian) matrices have orthonormal eigenvectors and real eigenvalues, yielding A = Q \Lambda Q^{\top} (the spectral theorem).
- Power iteration efficiently finds the dominant eigenvalue/eigenvector of large matrices using repeated matrix-vector multiplication.
- Eigenvalues diagnose stability, clustering structure, and long-run behavior of Markov chains (e.g., PageRank).
- Positive definiteness corresponds to all eigenvalues being positive, which guarantees convexity and stable energy-like behavior.
Prerequisites
- Matrix and vector operations: Understanding matrix-vector multiplication, norms, and basic linear algebra notation is foundational for eigenpairs.
- Determinants and trace: The characteristic equation and the trace-sum and determinant-product properties rely on these concepts.
- Orthogonality and inner products: The spectral theorem, Rayleigh quotient, and normalization steps depend on inner products and orthonormal bases.
- Numerical stability and floating-point: Iterative eigenvalue algorithms require careful normalization and tolerance settings to avoid numerical issues.
- Iterative methods and convergence: Power iteration and Jacobi are iterative; understanding convergence rates and stopping criteria is crucial.
- Graph basics and Markov chains: Applications like PageRank interpret eigenvectors of stochastic matrices as steady-state distributions.
- C++ programming fundamentals: To implement algorithms correctly, you need familiarity with vectors, loops, and numerical libraries or the STL.
Detailed Explanation
01 Overview
Eigenvalue decomposition is a way to understand a square matrix by discovering its most natural directions of action. When a matrix A multiplies a vector, it usually rotates and scales it. But for special directions, the eigenvectors, the matrix only scales without changing direction. The amount of scaling is the corresponding eigenvalue. If we can collect enough linearly independent eigenvectors to form a basis, we can rewrite A as A = P D P^{-1}, where D is diagonal and contains eigenvalues, and P holds eigenvectors as columns. For real symmetric (or complex Hermitian) matrices, the situation is even nicer: eigenvalues are real and eigenvectors can be chosen orthonormal, giving A = Q \Lambda Q^{\top}. This diagonal structure is powerful because diagonal matrices are easy to compute with: raising to powers, exponentials, solving systems, and understanding dynamics become straightforward. In applications, eigenvalues and eigenvectors provide insight into stability (do repeated applications of A blow up or decay?), structure (what are the principal directions of variation?), and long-term behavior (where does a Markov chain settle?). Algorithms like power iteration, QR iteration, and Jacobi rotations compute eigenpairs numerically. In machine learning, they underlie PCA, spectral clustering, graph embeddings, and the analysis of optimization dynamics. Even when full decomposition is costly, the top few eigenpairs often suffice and can be found in near-linear time on sparse data.
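To make the convenience of the diagonal form concrete, here is a minimal C++ sketch that computes A^k for a 2x2 symmetric matrix via A^k = Q \Lambda^k Q^{\top} (using the closed-form eigenpairs of a 2x2 symmetric matrix) and checks the result against direct repeated multiplication. The matrix entries and the exponent k are arbitrary illustrative choices.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Illustrative symmetric 2x2 matrix A = [[a, b], [b, d]] (values chosen arbitrarily)
    double a = 2.0, b = 1.0, d = 3.0;
    int k = 5; // power to compute

    // Closed-form eigenvalues of a symmetric 2x2 matrix
    double mean = 0.5 * (a + d);
    double rad  = sqrt(0.25 * (a - d) * (a - d) + b * b);
    double l1 = mean + rad, l2 = mean - rad;

    // Corresponding unit eigenvectors: (b, lambda - a), normalized (valid since b != 0)
    auto unit = [](double x, double y) {
        double n = sqrt(x * x + y * y);
        return array<double, 2>{x / n, y / n};
    };
    array<double, 2> q1 = unit(b, l1 - a);
    array<double, 2> q2 = unit(b, l2 - a);

    // A^k = Q * diag(l1^k, l2^k) * Q^T = l1^k q1 q1^T + l2^k q2 q2^T
    double p1 = pow(l1, k), p2 = pow(l2, k);
    double Ak[2][2];
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            Ak[i][j] = p1 * q1[i] * q1[j] + p2 * q2[i] * q2[j];

    // Check against direct repeated multiplication
    double P[2][2] = {{1, 0}, {0, 1}}, A[2][2] = {{a, b}, {b, d}};
    for (int t = 0; t < k; ++t) {
        double T[2][2] = {{0, 0}, {0, 0}};
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int m = 0; m < 2; ++m) T[i][j] += P[i][m] * A[m][j];
        memcpy(P, T, sizeof(T));
    }

    cout << setprecision(10)
         << "eigendecomposition: " << Ak[0][0] << ' ' << Ak[0][1] << ' ' << Ak[1][0] << ' ' << Ak[1][1] << '\n'
         << "direct product:     " << P[0][0] << ' ' << P[0][1] << ' ' << P[1][0] << ' ' << P[1][1] << '\n';
    return 0;
}

Both printouts should agree to within floating-point roundoff, which is the practical payoff of diagonalization: the k-th power is obtained by raising two scalars to the k-th power instead of multiplying matrices k times.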
02 Intuition & Analogies
Imagine you push on a soft rubber sheet stretched on a frame. Pushing in most directions bends and distorts it in complicated ways. But there are some special directions where the sheet stretches straight out without twisting, only getting longer or shorter. Those directions are eigenvectors, and how much they stretch are the eigenvalues. If you can label the sheet with a new set of axes aligned to all those special directions, then any push becomes simpler to describe: just independent stretching along each axis. That is what diagonalization does: changing coordinates to the eigenvector basis makes the matrix act like a pure scaling on each coordinate. Another picture: think of applying the same transformation to a vector over and over. If one direction gets stretched a bit more than all others, any starting vector will gradually align with that most-stretched direction. This explains power iteration: multiply by A repeatedly and normalize; the vector converges to the dominant eigenvector. In Markov chains (like surfing the web randomly), the long-run probability of being on each page is the eigenvector for eigenvalue 1 of the transition matrix. In PCA, a cloud of points has directions where it varies the most; those are eigenvectors of the covariance matrix, and the eigenvalues tell you how much variance each direction holds. Finally, symmetry simplifies life. When A equals its transpose (A = A^{\top}), its special directions are guaranteed to be perpendicular and its stretch amounts are real numbers. That orthogonality means your new axes are nicely at right angles, which is great for numerical stability and interpretation.
03 Formal Definition
Let A be an n \times n matrix. A nonzero vector v is an eigenvector of A with eigenvalue \lambda if A v = \lambda v. The eigenvalues are exactly the roots of the characteristic polynomial \det(A - \lambda I) = 0, counted with multiplicity. A is diagonalizable if there is an invertible matrix P whose columns are eigenvectors of A such that A = P D P^{-1} with D diagonal; this holds precisely when the eigenvectors span the whole space. For real symmetric A (A = A^{\top}), the spectral theorem guarantees real eigenvalues and an orthonormal eigenvector basis, so A = Q \Lambda Q^{\top} with Q orthogonal and \Lambda diagonal.
04 When to Use
Use eigenvalue decomposition when you want to understand or simplify repeated linear transformations, compute matrix functions (e.g., A^k, e^{A}), or analyze stability. In numerical linear algebra, use it to solve symmetric problems robustly: PCA via eigen-decomposition of the covariance matrix, spectral clustering using the graph Laplacian's eigenvectors, or modal analysis in mechanical systems. In Markov processes and PageRank, use the dominant eigenvector of a stochastic matrix to find steady-state distributions. In control and differential equations, eigenvalues of the system matrix determine whether solutions grow or decay over time. In optimization and deep learning, inspecting Hessian eigenvalues reveals curvature (convexity/conditioning), affecting step sizes and convergence of gradient methods. Algorithmically: choose power iteration (or Lanczos) to approximate the largest eigenpair of very large, sparse matrices; use deflation to get a few top components. Prefer the symmetric eigensolvers (Jacobi, QR with shifts, Lanczos) when A is symmetric/Hermitian; they are more stable and efficient. Use full QR iteration when you need the full spectrum of a moderate-size dense matrix. When only qualitative bounds are needed, apply Gershgorin's theorem to localize eigenvalues cheaply.
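As a quick illustration of the Gershgorin bound just mentioned, the following sketch prints the disc centers and radii for an arbitrary symmetric example matrix; every eigenvalue must lie in at least one of the printed intervals (real intervals suffice here only because the example is symmetric, so its eigenvalues are real).

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Arbitrary symmetric example matrix for illustration
    vector<vector<double>> A = {
        {4.0, 1.0, 0.5},
        {1.0, 3.0, 1.0},
        {0.5, 1.0, 2.0}
    };

    // Gershgorin: every eigenvalue lies in some disc centered at A[i][i]
    // with radius equal to the sum of |A[i][j]| over j != i.
    size_t n = A.size();
    for (size_t i = 0; i < n; ++i) {
        double center = A[i][i];
        double radius = 0.0;
        for (size_t j = 0; j < n; ++j)
            if (j != i) radius += fabs(A[i][j]);
        cout << "disc " << i << ": center " << center
             << ", radius " << radius
             << " -> [" << center - radius << ", " << center + radius << "]\n";
    }
    return 0;
}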
⚠️ Common Mistakes
- Assuming all matrices are diagonalizable. Defective matrices (insufficient eigenvectors) exist; they require Jordan forms, and algorithms may behave poorly near such cases.
- Ignoring symmetry. Applying algorithms that assume symmetry (e.g., Jacobi for all eigenvalues) to nonsymmetric matrices can give wrong results or fail to converge. Always check A = A^{\top} (or Hermitian) before using symmetric solvers.
- Not normalizing in power iteration. Without normalization, vectors overflow/underflow; with a poor stopping criterion (e.g., absolute change in entries), you may terminate too early or never. Use norms and Rayleigh quotient changes with tolerances (see the sketch after this list).
- Initializing power iteration with a vector orthogonal to the dominant eigenvector. Although unlikely with random starts, it stalls convergence. Add a small random perturbation or restart.
- Expecting power iteration to distinguish close eigenvalues quickly. Convergence rate depends on |\lambda_2 / \lambda_1|; if it is near 1, progress is slow. Use acceleration (shifts, Rayleigh quotient iteration) or Krylov methods (Lanczos/Arnoldi).
- Misinterpreting eigenvalues of non-normal matrices. Large eigenvalues do not necessarily imply large transient growth; use singular values or pseudospectra when A is not normal.
- Forgetting scaling and conditioning. Poorly scaled matrices and nearly multiple eigenvalues can cause large numerical errors; prefer orthogonal transformations (Householder, Givens) and double precision.
- Treating stochastic matrices incorrectly. Use column- or row-stochastic convention consistently; handle dangling nodes in PageRank; ensure damping to guarantee a unique stationary distribution.
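A minimal sketch of two of the safeguards above: checking A = A^{\top} before calling a symmetric-only solver, and a relative-tolerance stopping test on successive Rayleigh-quotient estimates. The helper names and tolerances are illustrative choices, not fixed conventions.

#include <bits/stdc++.h>
using namespace std;

// Check A = A^T up to a small tolerance before using a symmetric-only solver
bool is_symmetric(const vector<vector<double>>& A, double tol = 1e-12) {
    size_t n = A.size();
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (fabs(A[i][j] - A[j][i]) > tol) return false;
    return true;
}

// Relative stopping test on successive eigenvalue estimates (e.g., Rayleigh quotients)
bool converged(double lambda_new, double lambda_old, double tol = 1e-10) {
    return fabs(lambda_new - lambda_old) <= tol * max(1.0, fabs(lambda_new));
}

int main() {
    vector<vector<double>> A = {{4.0, 1.0}, {1.0, 3.0}};
    cout << boolalpha << is_symmetric(A) << '\n';              // expect true for this A
    cout << boolalpha << converged(2.0000000001, 2.0) << '\n'; // relative change below tolerance
    return 0;
}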
Key Formulas
Eigenpair definition: A v = \lambda v with v \neq 0
Explanation: An eigenvector v is scaled by A without changing direction, by factor \lambda. This is the core property defining eigenvalues and eigenvectors.
Characteristic equation: \det(A - \lambda I) = 0
Explanation: Nontrivial solutions of A v = \lambda v exist only when this determinant vanishes. Solving it yields all eigenvalues (with multiplicity).
Diagonalization: A = P D P^{-1}
Explanation: If A has a basis of eigenvectors, we can express it via a similarity transform to a diagonal matrix. Computations with A simplify in this basis.
Spectral theorem (real symmetric): A = Q \Lambda Q^{\top} with Q^{\top} Q = I
Explanation: Symmetric matrices decompose with an orthogonal matrix of eigenvectors and a real diagonal of eigenvalues. This ensures numerical stability and interpretability.
Trace-eigenvalues relation: \operatorname{tr}(A) = \sum_i \lambda_i
Explanation: The trace equals the sum of eigenvalues (counting multiplicity). Useful to sanity-check computed spectra.
Determinant-eigenvalues relation: \det(A) = \prod_i \lambda_i
Explanation: The determinant equals the product of eigenvalues. This connects volume scaling to the eigen-spectrum.
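As a quick sanity check of the last two identities on an arbitrary 2 \times 2 example: A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix} has characteristic equation \lambda^2 - 7\lambda + 10 = 0, so \lambda_1 = 5 and \lambda_2 = 2; indeed \operatorname{tr}(A) = 4 + 3 = 7 = \lambda_1 + \lambda_2 and \det(A) = 4 \cdot 3 - 1 \cdot 2 = 10 = \lambda_1 \lambda_2.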
Rayleigh quotient: R(x) = \frac{x^{\top} A x}{x^{\top} x}
Explanation: For symmetric A, this scalar lies between the smallest and largest eigenvalues and attains them at the corresponding eigenvectors. It is a key estimator in iterative methods.
Spectral radius: \rho(A) = \max_i |\lambda_i|
Explanation: This is the largest magnitude eigenvalue of A. It controls the asymptotic growth of \|A^k\| and the convergence of power iteration.
Power iteration convergence: error after k steps = O(|\lambda_2 / \lambda_1|^k)
Explanation: When |\lambda_1| > |\lambda_2|, the error in the eigenvector direction decays geometrically with ratio |\lambda_2 / \lambda_1|. Close eigenvalues imply slow convergence.
QR iteration: A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k
Explanation: Repeated QR factorizations and similarity transforms drive A_k toward upper-triangular (Schur) form, revealing eigenvalues on the diagonal (a minimal code sketch appears at the end of this Key Formulas section).
Gershgorin discs: every eigenvalue lies in some disc \{ z : |z - a_{ii}| \le \sum_{j \neq i} |a_{ij}| \}
Explanation: Every eigenvalue lies within at least one Gershgorin disk. These bounds are quick to compute and useful for checks and preconditioning.
Positive definiteness via eigenvalues: A = A^{\top} is positive definite \iff \lambda_i > 0 for all i
Explanation: A symmetric matrix is positive definite if and only if all eigenvalues are positive. This criterion is central in optimization and statistics.
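For instance, \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} has eigenvalues 3 and 1 (roots of \lambda^2 - 4\lambda + 3 = 0), both positive, so it is positive definite; changing the off-diagonal entries to 3 gives eigenvalues 5 and -1, so that matrix is not.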
Explained variance: \sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{n} \lambda_i (eigenvalues sorted in decreasing order)
Explanation: In PCA, eigenvalues of the covariance matrix represent variances along principal components. This ratio measures how much variance top-k components capture.
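For example, with covariance eigenvalues 4, 1, and 0.5 (total 5.5), the first principal component explains 4 / 5.5 \approx 72.7\% of the variance and the first two together explain 5 / 5.5 \approx 90.9\%.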
Stationary distribution: P \pi = \pi with \sum_i \pi_i = 1
Explanation: For a column-stochastic transition matrix P, the steady-state distribution is an eigenvector with eigenvalue 1, normalized to sum to one.
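Returning to the QR iteration formula above, here is a minimal sketch of unshifted QR iteration on a small symmetric example. It uses classical Gram-Schmidt for the QR step and omits the Hessenberg reduction and shifts of production solvers, so it is illustrative only; the example matrix and iteration count are arbitrary choices, and the diagonal of the iterate approaches the eigenvalues because their magnitudes are distinct.

#include <bits/stdc++.h>
using namespace std;
using Mat = vector<vector<double>>;

// Multiply two square matrices
Mat matmul(const Mat& X, const Mat& Y) {
    size_t n = X.size();
    Mat Z(n, vector<double>(n, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j)
                Z[i][j] += X[i][k] * Y[k][j];
    return Z;
}

// QR factorization by classical Gram-Schmidt (columns assumed independent)
void qr(const Mat& A, Mat& Q, Mat& R) {
    size_t n = A.size();
    Q.assign(n, vector<double>(n, 0.0));
    R.assign(n, vector<double>(n, 0.0));
    for (size_t j = 0; j < n; ++j) {
        vector<double> v(n);
        for (size_t i = 0; i < n; ++i) v[i] = A[i][j]; // j-th column of A
        for (size_t k = 0; k < j; ++k) {
            double proj = 0.0;
            for (size_t i = 0; i < n; ++i) proj += Q[i][k] * A[i][j];
            R[k][j] = proj;
            for (size_t i = 0; i < n; ++i) v[i] -= proj * Q[i][k];
        }
        double nv = 0.0;
        for (size_t i = 0; i < n; ++i) nv += v[i] * v[i];
        nv = sqrt(nv);
        R[j][j] = nv;
        for (size_t i = 0; i < n; ++i) Q[i][j] = v[i] / nv;
    }
}

int main() {
    // Symmetric example; unshifted QR iteration drives it toward a diagonal matrix
    Mat A = {{4.0, 1.0, 0.0},
             {1.0, 3.0, 1.0},
             {0.0, 1.0, 2.0}};
    for (int k = 0; k < 200; ++k) {
        Mat Q, R;
        qr(A, Q, R);
        A = matmul(R, Q); // A_{k+1} = R_k Q_k is similar to A_k
    }
    cout << setprecision(10);
    for (size_t i = 0; i < A.size(); ++i) cout << A[i][i] << (i + 1 < A.size() ? ' ' : '\n');
    return 0;
}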
Complexity Analysis
Power iteration costs one matrix-vector product per iteration: O(n^2) for a dense n \times n matrix or O(nnz) for a sparse one, with the number of iterations governed by the gap ratio |\lambda_2 / \lambda_1|. The classical Jacobi method below spends O(n^2) per rotation locating the largest off-diagonal entry plus O(n) to apply it; small dense symmetric matrices typically diagonalize after a few sweeps, for roughly between O(n^3) and O(n^4) total work. Dense QR iteration with a preliminary Hessenberg (or tridiagonal) reduction computes the full spectrum in O(n^3). The PageRank iteration costs O(V + E) per pass over the graph.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Compute dot product of two vectors
double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Compute Euclidean norm
double norm2(const vector<double>& x) {
    return sqrt(max(0.0, dot(x, x)));
}

// Matrix-vector multiply y = A x (A is dense)
vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x) {
    size_t n = A.size();
    vector<double> y(n, 0.0);
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Power iteration: returns approximate dominant eigenvalue and eigenvector
pair<double, vector<double>> power_iteration(const vector<vector<double>>& A, int max_iters = 1000, double tol = 1e-10) {
    size_t n = A.size();
    vector<double> x(n, 0.0);
    // Random initial vector with small values
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (size_t i = 0; i < n; ++i) x[i] = dist(rng);

    // Ensure not the zero vector
    double nx = norm2(x);
    if (nx == 0.0) x[0] = 1.0; else for (size_t i = 0; i < n; ++i) x[i] /= nx;

    double lambda_old = 0.0;
    for (int k = 0; k < max_iters; ++k) {
        vector<double> y = matvec(A, x);
        double ny = norm2(y);
        if (ny == 0.0) break; // A is singular and x in nullspace
        for (size_t i = 0; i < n; ++i) x[i] = y[i] / ny; // normalize
        // Rayleigh quotient as eigenvalue estimate
        double lambda = dot(x, matvec(A, x));
        if (fabs(lambda - lambda_old) <= tol * max(1.0, fabs(lambda))) {
            return {lambda, x};
        }
        lambda_old = lambda;
    }
    // Final estimate
    double lambda = dot(x, matvec(A, x));
    return {lambda, x};
}

int main() {
    // Example: symmetric matrix with known dominant eigenpair
    vector<vector<double>> A = {
        {4.0, 1.0, 0.0},
        {1.0, 3.0, 1.0},
        {0.0, 1.0, 2.0}
    };

    auto res = power_iteration(A, 10000, 1e-12);
    double lambda = res.first;
    vector<double> v = res.second;

    // Print eigenvalue and eigenvector (one per line, space-separated)
    cout.setf(std::ios::fixed);
    cout << setprecision(10);
    cout << lambda << '\n';
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) cout << ' ';
        cout << v[i];
    }
    cout << '\n';

    // Residual norm ||A v - lambda v|| (should be small)
    vector<double> Av = matvec(A, v);
    for (size_t i = 0; i < v.size(); ++i) Av[i] -= lambda * v[i];
    cout << norm2(Av) << '\n';
    return 0;
}
This program implements power iteration to approximate the dominant eigenvalue and its eigenvector of a real matrix using repeated matrix-vector multiplication and normalization. The Rayleigh quotient provides a robust eigenvalue estimate, and convergence is detected by the change in the estimate. The final line prints the residual norm to verify accuracy.
#include <bits/stdc++.h>
using namespace std;

// Jacobi rotation for symmetric eigenproblem
struct JacobiResult {
    vector<double> eigenvalues;          // diagonal of Lambda
    vector<vector<double>> eigenvectors; // columns are eigenvectors (Q)
};

JacobiResult jacobi_eigen(const vector<vector<double>>& A, double tol = 1e-10, int max_sweeps = 100) {
    size_t n = A.size();
    vector<vector<double>> M = A; // working copy
    vector<vector<double>> Q(n, vector<double>(n, 0.0));
    for (size_t i = 0; i < n; ++i) Q[i][i] = 1.0; // initialize Q = I

    auto offdiag_norm = [&](const vector<vector<double>>& B) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                if (i != j) s += B[i][j] * B[i][j];
        return sqrt(s);
    };

    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        // Find largest off-diagonal element (by magnitude)
        size_t p = 0, q = 1;
        double maxval = 0.0;
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = i + 1; j < n; ++j) {
                double val = fabs(M[i][j]);
                if (val > maxval) {
                    maxval = val; p = i; q = j;
                }
            }
        }
        if (maxval < tol) break; // already (nearly) diagonal

        double app = M[p][p];
        double aqq = M[q][q];
        double apq = M[p][q];

        // Compute rotation parameters
        double phi = 0.5 * atan2(2.0 * apq, (aqq - app));
        double c = cos(phi);
        double s = sin(phi);

        // Apply rotation to M: only rows/cols p and q change
        for (size_t k = 0; k < n; ++k) {
            if (k == p || k == q) continue;
            double mkp = M[k][p];
            double mkq = M[k][q];
            double mkp_new = c * mkp - s * mkq;
            double mkq_new = s * mkp + c * mkq;
            M[k][p] = M[p][k] = mkp_new;
            M[k][q] = M[q][k] = mkq_new;
        }
        double app_new = c * c * app - 2.0 * s * c * apq + s * s * aqq;
        double aqq_new = s * s * app + 2.0 * s * c * apq + c * c * aqq;
        M[p][p] = app_new;
        M[q][q] = aqq_new;
        M[p][q] = M[q][p] = 0.0;

        // Accumulate eigenvectors: Q = Q * R(p,q,phi)
        for (size_t k = 0; k < n; ++k) {
            double qkp = Q[k][p];
            double qkq = Q[k][q];
            Q[k][p] = c * qkp - s * qkq;
            Q[k][q] = s * qkp + c * qkq;
        }

        // Optional convergence check per sweep
        if (offdiag_norm(M) < tol) break;
    }

    vector<double> evals(n);
    for (size_t i = 0; i < n; ++i) evals[i] = M[i][i];

    // Columns of Q are eigenvectors; make them unit-length (numerically already are)
    for (size_t j = 0; j < n; ++j) {
        double nj = 0.0;
        for (size_t i = 0; i < n; ++i) nj += Q[i][j] * Q[i][j];
        nj = sqrt(max(0.0, nj));
        if (nj > 0) for (size_t i = 0; i < n; ++i) Q[i][j] /= nj;
    }

    return {evals, Q};
}

// Multiply Q * Lambda * Q^T to reconstruct A (for verification)
vector<vector<double>> reconstruct(const vector<double>& evals, const vector<vector<double>>& Q) {
    size_t n = evals.size();
    vector<vector<double>> R(n, vector<double>(n, 0.0));
    // R = Q * diag(evals) * Q^T
    for (size_t k = 0; k < n; ++k) {
        for (size_t i = 0; i < n; ++i) {
            double qik = Q[i][k] * evals[k];
            for (size_t j = 0; j < n; ++j) R[i][j] += qik * Q[j][k];
        }
    }
    return R;
}

int main() {
    // Symmetric example matrix
    vector<vector<double>> A = {
        {4.0, 1.0, 2.0},
        {1.0, 3.0, 0.5},
        {2.0, 0.5, 1.0}
    };

    JacobiResult J = jacobi_eigen(A, 1e-12, 200);

    // Print eigenvalues (one line)
    cout.setf(std::ios::fixed);
    cout << setprecision(10);
    for (size_t i = 0; i < J.eigenvalues.size(); ++i) {
        if (i) cout << ' ';
        cout << J.eigenvalues[i];
    }
    cout << '\n';

    // Verify A \approx Q * Lambda * Q^T by Frobenius norm of difference
    vector<vector<double>> Arec = reconstruct(J.eigenvalues, J.eigenvectors);
    double err = 0.0;
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < A.size(); ++j) {
            double d = A[i][j] - Arec[i][j];
            err += d * d;
        }
    cout << sqrt(err) << '\n';

    return 0;
}
This program implements the classic Jacobi rotation method for real symmetric matrices, producing all eigenvalues and an orthonormal eigenvector matrix Q such that A ≈ QΛQ^T. Each sweep zeros the largest off-diagonal entry via a plane rotation, and Q accumulates the rotations. The reconstruction error checks the spectral theorem numerically.
#include <bits/stdc++.h>
using namespace std;

// Compute PageRank vector using power iteration with damping.
// Graph given as adjacency list of outgoing edges.
vector<double> pagerank(const vector<vector<int>>& out, double damping = 0.85, int max_iters = 100, double tol = 1e-12) {
    int n = (int)out.size();
    vector<int> outdeg(n, 0);
    for (int u = 0; u < n; ++u) outdeg[u] = (int)out[u].size();

    vector<double> r(n, 1.0 / n); // initial rank (probabilities sum to 1)
    vector<double> rnext(n, 0.0);

    for (int it = 0; it < max_iters; ++it) {
        fill(rnext.begin(), rnext.end(), 0.0);
        double dangling_mass = 0.0;
        for (int j = 0; j < n; ++j) if (outdeg[j] == 0) dangling_mass += r[j];

        // Distribute rank mass along edges (column-stochastic implicit)
        for (int j = 0; j < n; ++j) {
            if (outdeg[j] == 0) continue;
            double share = r[j] / outdeg[j];
            for (int v : out[j]) rnext[v] += damping * share;
        }

        // Add damping teleport and dangling contribution
        double teleport = (1.0 - damping) / n;
        double dangling_share = damping * dangling_mass / n;
        for (int i = 0; i < n; ++i) rnext[i] += teleport + dangling_share;

        // L1 normalization (should already sum to 1 numerically)
        double s = 0.0;
        for (double x : rnext) s += x;
        if (s > 0) for (double& x : rnext) x /= s;

        // Check convergence by L1 difference
        double diff = 0.0;
        for (int i = 0; i < n; ++i) diff += fabs(rnext[i] - r[i]);
        r.swap(rnext);
        if (diff < tol) break;
    }
    return r;
}

int main() {
    // Example small web: 0->1,2 ; 1->2 ; 2->0 ; 3 has no outgoing links (dangling)
    vector<vector<int>> out = {
        {1, 2}, // 0
        {2},    // 1
        {0},    // 2
        {}      // 3 (dangling)
    };

    vector<double> pr = pagerank(out, 0.85, 1000, 1e-14);

    cout.setf(std::ios::fixed);
    cout << setprecision(12);
    for (size_t i = 0; i < pr.size(); ++i) {
        if (i) cout << ' ';
        cout << pr[i];
    }
    cout << '\n';
    return 0;
}
This code computes PageRank as the dominant left eigenvector (eigenvalue 1) of a stochastic transition operator with damping. It uses sparse matrix-vector multiplication via adjacency lists, accounts for dangling nodes by redistributing their rank uniformly, and normalizes to keep a valid probability vector.