Low-Rank Approximation
Key Points
- Low-rank approximation replaces a big matrix with one that has far fewer degrees of freedom while preserving most of its action.
- The Eckart–Young–Mirsky theorem says the truncated SVD gives the best rank-k approximation under both Frobenius and spectral norms.
- Keeping only the top k singular values/vectors captures most of the matrix's "energy," often enabling huge compression and denoising.
- Choosing k is guided by the singular value decay, target error tolerance, or a desired energy-retention threshold.
- Full SVD is expensive for large matrices, but randomized algorithms compute high-quality rank-k approximations much faster.
- Low-rank ideas underpin PCA, image compression, recommender systems, and fast linear algebra.
- Errors are predictable: in Frobenius norm, the squared error equals the sum of squares of discarded singular values.
- In C++, Eigen's SVD or a randomized range finder makes low-rank approximation straightforward and efficient.
Prerequisites
- Matrices and Linear Mappings – Understanding matrices as operators and notions like rank and subspaces is foundational.
- Matrix and Vector Norms – Eckart–Young's optimality depends on Frobenius and spectral norms and their properties.
- Singular Value Decomposition – Low-rank approximation is built directly from the SVD and its singular values/vectors.
- QR Factorization and Orthonormalization – Randomized SVD and many algorithms require computing orthonormal bases via QR.
- Floating-Point Stability – SVD computations and truncation involve numerical sensitivity and conditioning.
- Random Projections and Concentration – Randomized algorithms rely on properties of Gaussian (or subgaussian) test matrices.
- Principal Component Analysis (PCA) – PCA is an application of low-rank approximation on centered data matrices.
- Convex Optimization Basics – Nuclear norm regularization and related relaxations appear in low-rank modeling.
- Sparse Linear Algebra – Efficient methods for large sparse matrices avoid dense SVD and exploit sparsity.
Detailed Explanation
01 Overview
Low-rank approximation is the idea of summarizing a large matrix A by another matrix of lower rank that is simpler to store and compute with, yet remains close to A in a chosen norm. At the heart of this concept is the Singular Value Decomposition (SVD), which factors A into UΣV^T where Σ lists singular values in descending order. Truncating this factorization to the top k singular values and vectors produces a rank-k matrix A_k that balances accuracy and simplicity. The Eckart–Young–Mirsky theorem guarantees that this truncated SVD is the best possible rank-k approximation under both the Frobenius norm and the spectral (operator 2-) norm. Practically, this means we can compress data, filter noise, and accelerate computations by working with much smaller factors. Applications abound: reducing image storage while maintaining visual quality, performing Principal Component Analysis (PCA) for dimensionality reduction, computing latent factors in recommendation systems, and approximating large kernel matrices. While full SVD can be computationally heavy for huge matrices, modern randomized algorithms enable fast, high-quality approximations. The key tradeoff is between the approximation rank k (smaller means lighter and faster) and the approximation error (smaller k usually means larger error).
02 Intuition & Analogies
Imagine taking a high-resolution photograph (a big matrix of pixel intensities). Most of what makes the photo recognizable, the broad shapes and gradients, can be captured by a few dominant patterns, while tiny details and noise contribute less. Low-rank approximation keeps just those dominant patterns. Concretely, SVD discovers a set of "basis images" (columns of U and rows of V^T) and importance weights (singular values). Keeping the top k weights and their basis images yields a compressed photo that still looks good. Another analogy: think of a song represented as many instrument tracks. If only a few instruments carry the melody and harmony strongly, you can reconstruct a convincing version of the song using just those tracks, discarding faint background sounds. Likewise, in a spreadsheet of student grades across many assignments, overall performance might be explained by a few latent factors like mastery and consistency. The low-rank model captures these few underlying drivers, ignoring minor idiosyncrasies. Geometrically, a matrix maps input vectors to outputs. Many real datasets lie close to a low-dimensional subspace; the matrix mainly stretches and rotates within a small number of directions and does little in the remaining ones. The SVD orders these directions by strength. Truncation keeps the most influential directions and sets the rest to zero, which explains the predictable error: it's exactly what you lose by removing those weaker directions. This is why low-rank approximations both compress (fewer parameters) and denoise (discard weak, often noisy components).
03 Formal Definition
Given A \in \mathbb{R}^{m \times n} with SVD A = U \Sigma V^{\top} and singular values \sigma_1 \ge \sigma_2 \ge \cdots \ge 0, the rank-k approximation problem is to find B minimizing \|A - B\| over all matrices with \mathrm{rank}(B) \le k, for a chosen norm (typically Frobenius or spectral). The truncated SVD A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^{\top} solves this problem for both norms.
04When to Use
Use low-rank approximation when your matrix exhibits redundancy or when you can tolerate small errors for big gains in speed or storage. Typical cases include: (1) Compression: images, videos, or scientific data where singular values decay quickly; store U_k, \Sigma_k, V_k instead of A. (2) Denoising: signals or datasets with noise largely confined to small singular directions; truncation removes noisy components. (3) Dimensionality reduction: PCA is precisely low-rank approximation of a centered data matrix, enabling visualization, clustering, and regression in lower dimensions. (4) Recommender systems: implicit low-rank structure in user–item matrices yields latent-factor models. (5) Fast linear algebra: approximate preconditioners, fast matrix–vector products, and reduced storage in iterative solvers. (6) Kernel and graph methods: approximate large kernel/graph Laplacian matrices for scalable learning. Prefer truncated or randomized SVD when m,n are large but the target rank k is modest (k \ll \min(m,n)). Use full SVD for small to medium matrices or when exact singular values are needed. If the matrix is sparse or structured, choose algorithms that exploit sparsity (Lanczos/IRL, randomized methods) instead of dense SVD.
⚠️ Common Mistakes
- Confusing PCA with SVD without centering: PCA requires centering columns of the data matrix. Skipping centering changes the subspace and can mislead conclusions.
- Choosing k blindly: Picking k without inspecting the singular value decay or setting a target energy threshold can underfit (too small k) or waste resources (too large k). Plot or examine \sigma_i and cumulative energy.
- Ignoring the norm: Eckart–Young guarantees optimality for Frobenius and spectral norms (and more generally unitarily invariant norms). Minimizing entrywise absolute error or other norms may not be solved by truncated SVD.
- Recomputing full SVD unnecessarily: For very large matrices and small k, full SVD is overkill. Use truncated (Lanczos/IRL) or randomized SVD for big speedups.
- Numerical issues: Not sorting singular values, forgetting to request thin factors, or forming products like U \Sigma V^{\top} with poor scaling can degrade accuracy or performance.
- Mishandling sparsity: Converting a large sparse matrix to dense for SVD can exceed memory. Prefer sparse-aware methods.
- Overfitting noise in denoising tasks: Setting k too large can reintroduce noise. Use validation, cross-validation, or heuristics (e.g., scree plot) to select k.
- Misinterpreting uniqueness: If \sigma_k = \sigma_{k+1}, the optimal subspace is not unique; algorithms may return different but equally optimal bases.
Key Formulas
SVD
A = U \Sigma V^{\top}, \quad \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r), \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0
Explanation: Any real matrix factors into orthogonal U and V and a diagonal matrix of singular values in descending order. This decomposition reveals intrinsic directions and strengths.
Truncated SVD
A_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i u_i v_i^{\top}
Explanation: The best rank-k approximation keeps the top k singular triplets. It is compact and captures the dominant action of A.
Eckart–Young (Frobenius)
\min_{\mathrm{rank}(B) \le k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
Explanation: Among all rank-k matrices, the truncated SVD minimizes the Frobenius error (uniquely when \sigma_k > \sigma_{k+1}). The squared error equals the sum of squares of discarded singular values.
Eckart–Young (Spectral)
\min_{\mathrm{rank}(B) \le k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}
Explanation: The worst-case operator error of A_k is exactly the next singular value. No other rank-k matrix can do better in 2-norm.
Frobenius norm via singular values
\|A\|_F^2 = \sum_{i=1}^{r} \sigma_i^2
Explanation: The total energy of a matrix equals the sum of squared singular values. This connects SVD to energy retention criteria.
Best rank-k projector
A_k = U_k U_k^{\top} A = A V_k V_k^{\top}
Explanation: Conceptually, truncating the SVD is projecting A onto the manifold of rank-k matrices under the Frobenius norm; equivalently, it projects the columns of A onto the span of the top k left singular vectors.
Randomized SVD pipeline
Y = A\Omega, \quad Q = \mathrm{orth}(Y), \quad B = Q^{\top} A, \quad B = \tilde{U} \Sigma V^{\top}, \quad A \approx (Q\tilde{U}) \Sigma V^{\top}
Explanation: A random test matrix captures A's range. Orthonormalizing Y yields Q; SVD on the small B gives an approximate SVD of A.
Power iteration in randomized SVD
Y = (A A^{\top})^{q} A \Omega
Explanation: Applying q power iterations amplifies the spectral gap, improving approximation accuracy when singular values decay slowly.
Projection error (intuition)
\|A - Q Q^{\top} A\|_2 \approx \sigma_{k+1}
Explanation: If Q spans (nearly) the top-k left singular subspace, projecting A onto Q captures all but the next singular value's contribution.
Energy fraction
E(k) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}
Explanation: This fraction quantifies how much of the matrix's total energy (Frobenius norm squared) is preserved by rank k.
Eckart–Young–Mirsky (general)
\min_{\mathrm{rank}(B) \le k} \|A - B\| = \|A - A_k\| \quad \text{for every unitarily invariant norm } \|\cdot\|
Explanation: Truncated SVD minimizes all norms that depend only on singular values, not on singular vectors, e.g., Schatten p-norms.
Nuclear norm regularization
\min_{X} \tfrac{1}{2}\|A - X\|_F^2 + \lambda \|X\|_*, \quad \|X\|_* = \sum_i \sigma_i(X), \quad X^{\star} = U \,\mathrm{diag}\big((\sigma_i - \lambda)_+\big)\, V^{\top}
Explanation: Convex proxy for promoting low rank, useful when rank constraints are hard; solutions shrink singular values by soft-thresholding.
Complexity Analysis
A full dense SVD of an m x n matrix costs O(mn \min(m,n)) time and O(mn) memory. Truncated approaches (Lanczos-type or randomized) cost roughly O(mnk) for the matrix products plus O((m+n)k^2) for the small factorizations, a large saving when k \ll \min(m,n). Storing the rank-k factors takes k(m+n+1) numbers instead of mn.
Code Examples
// Compile with: g++ -O2 -std=c++17 -I /usr/include/eigen3 svd_rankk.cpp -o svd_rankk
#include <Eigen/Dense>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
using namespace Eigen;

// Utility: compute Frobenius norm of a matrix
double frobNorm(const MatrixXd &M) {
    return M.norm(); // Eigen's .norm() is Frobenius for matrices
}

int main() {
    // Example: build a matrix that is approximately rank-2 plus small noise
    int m = 8, n = 6, k = 2; // target rank k
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);

    MatrixXd U0 = MatrixXd::NullaryExpr(m, k, [&](){ return gauss(rng); });
    HouseholderQR<MatrixXd> qrU(U0);
    MatrixXd U = qrU.householderQ() * MatrixXd::Identity(m, k); // orthonormal m x k

    MatrixXd V0 = MatrixXd::NullaryExpr(n, k, [&](){ return gauss(rng); });
    HouseholderQR<MatrixXd> qrV(V0);
    MatrixXd V = qrV.householderQ() * MatrixXd::Identity(n, k); // orthonormal n x k

    Vector2d s; s << 10.0, 3.0; // singular values
    MatrixXd A = U * s.asDiagonal() * V.transpose();

    // Add small Gaussian noise
    MatrixXd Noise = MatrixXd::NullaryExpr(m, n, [&](){ return 0.05 * gauss(rng); });
    A += Noise;

    // Compute thin SVD of A
    JacobiSVD<MatrixXd> svd(A, ComputeThinU | ComputeThinV);
    VectorXd sigma = svd.singularValues();
    MatrixXd Ufull = svd.matrixU();
    MatrixXd Vfull = svd.matrixV();

    // Truncate to rank k
    MatrixXd Uk = Ufull.leftCols(k);
    MatrixXd Vk = Vfull.leftCols(k);
    MatrixXd Sk = sigma.head(k).asDiagonal();

    // Form A_k = U_k S_k V_k^T
    MatrixXd Ak = Uk * Sk * Vk.transpose();

    // Report errors and energy retention
    double errF = frobNorm(A - Ak);
    double totalEnergy = sigma.array().square().sum();
    double keptEnergy = sigma.head(k).array().square().sum();
    double energyFrac = keptEnergy / totalEnergy;

    cout << "Matrix size: " << m << "x" << n << "\n";
    cout << "Target rank k: " << k << "\n";
    cout << "Top singular values: " << sigma.head(min<int>(5, sigma.size())).transpose() << "\n";
    cout << "Frobenius error ||A - A_k||_F: " << errF << "\n";
    cout << "Energy retained: " << 100.0 * energyFrac << "%\n";

    // Optional: show A and Ak for inspection on small matrices
    // cout << "A:\n" << A << "\n";
    // cout << "A_k:\n" << Ak << "\n";

    return 0;
}
We synthesize a near rank-2 matrix A by multiplying random orthonormal U and V with chosen singular values, then add small noise. We compute the (thin) SVD of A using Eigen, keep only the top k singular values/vectors, and reconstruct the best rank-k approximation A_k. We then compute the Frobenius error and the fraction of energy retained to illustrate Eckart–Young's guarantees and the compression tradeoff.
// Compile with: g++ -O2 -std=c++17 -I /usr/include/eigen3 randomized_svd.cpp -o randomized_svd
#include <Eigen/Dense>
#include <iostream>
#include <random>
using namespace std;
using namespace Eigen;

// Generate a standard normal random matrix of size rows x cols
MatrixXd gaussianMatrix(int rows, int cols, std::mt19937 &rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    MatrixXd G(rows, cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            G(i, j) = gauss(rng);
    return G;
}

// Orthonormalize columns using QR and return thin Q
MatrixXd orth(const MatrixXd &Y) {
    HouseholderQR<MatrixXd> qr(Y);
    int k = min(Y.rows(), Y.cols());
    return qr.householderQ() * MatrixXd::Identity(Y.rows(), k);
}

int main() {
    int m = 100, n = 80;   // size of A
    int true_rank = 5;     // build a low-rank ground-truth matrix
    int k = 5;             // target rank
    int p = 10;            // oversampling
    int q = 1;             // power iterations

    std::mt19937 rng(123);

    // Construct A = U S V^T with decaying singular values plus small noise
    MatrixXd U0 = gaussianMatrix(m, true_rank, rng);
    MatrixXd U = orth(U0);
    MatrixXd V0 = gaussianMatrix(n, true_rank, rng);
    MatrixXd V = orth(V0);

    VectorXd s(true_rank);
    for (int i = 0; i < true_rank; ++i) s(i) = pow(0.6, i) * 20.0; // geometric decay
    MatrixXd A = U * s.asDiagonal() * V.transpose();
    A += 0.01 * gaussianMatrix(m, n, rng); // small noise

    // Randomized range finder
    MatrixXd Omega = gaussianMatrix(n, k + p, rng); // test matrix
    MatrixXd Y = A * Omega;                         // sample the range of A

    // Power iterations to improve separation
    for (int i = 0; i < q; ++i) {
        MatrixXd Z = A.transpose() * Y; // go to co-range
        Y = A * Z;                      // return to range
    }

    MatrixXd Q = orth(Y);           // orthonormal basis for range(A)
    MatrixXd B = Q.transpose() * A; // small (k+p) x n matrix

    // SVD on small B
    JacobiSVD<MatrixXd> svdB(B, ComputeThinU | ComputeThinV);
    VectorXd sigma = svdB.singularValues();
    MatrixXd Utilde = svdB.matrixU();
    MatrixXd Vhat = svdB.matrixV();

    // Form approximate rank-k SVD of A
    MatrixXd Uk = (Q * Utilde).leftCols(k);
    MatrixXd Vk = Vhat.leftCols(k);
    MatrixXd Sk = sigma.head(k).asDiagonal();

    MatrixXd Ak = Uk * Sk * Vk.transpose();

    double errF = (A - Ak).norm();
    double totalEnergy = A.squaredNorm();  // equals ||A||_F^2
    double approxEnergy = Ak.squaredNorm();

    cout << "Randomized SVD approximation (m=" << m << ", n=" << n << ")\n";
    cout << "Target rank k=" << k << ", oversampling p=" << p << ", power iters q=" << q << "\n";
    cout << "Frobenius error ||A - A_k||_F: " << errF << "\n";
    cout << "Energy retained (approx): " << 100.0 * (approxEnergy / totalEnergy) << "%\n";

    return 0;
}
This implements the standard randomized SVD pipeline: draw a random test matrix, sample A's range via Y = AΩ, optionally apply q power iterations to enhance separation, orthonormalize to get Q, compute a small SVD of B = Q^T A, and lift back to form an approximate rank-k SVD of A. It achieves high-quality approximations at a fraction of the cost of full SVD when k ≪ min(m,n).