
Low-Rank Approximation

Key Points

  • Low-rank approximation replaces a big matrix with one that has far fewer degrees of freedom while preserving most of its action.
  • The Eckart–Young–Mirsky theorem says the truncated SVD gives the best rank-k approximation under both Frobenius and spectral norms.
  • Keeping only the top k singular values/vectors captures most of the matrix’s “energy,” often enabling huge compression and denoising.
  • Choosing k is guided by the singular value decay, target error tolerance, or a desired energy-retention threshold.
  • Full SVD is expensive for large matrices, but randomized algorithms compute high-quality rank-k approximations much faster.
  • Low-rank ideas underpin PCA, image compression, recommender systems, and fast linear algebra.
  • Errors are predictable: in Frobenius norm, the squared error equals the sum of squares of discarded singular values.
  • In C++, Eigen’s SVD or a randomized range finder makes low-rank approximation straightforward and efficient.

Prerequisites

  • Matrices and Linear Mappings — Understanding matrices as operators and notions like rank and subspaces is foundational.
  • Matrix and Vector Norms — Eckart–Young’s optimality depends on Frobenius and spectral norms and their properties.
  • Singular Value Decomposition — Low-rank approximation is built directly from the SVD and its singular values/vectors.
  • QR Factorization and Orthonormalization — Randomized SVD and many algorithms require computing orthonormal bases via QR.
  • Floating-Point Stability — SVD computations and truncation involve numerical sensitivity and conditioning.
  • Random Projections and Concentration — Randomized algorithms rely on properties of Gaussian (or subgaussian) test matrices.
  • Principal Component Analysis (PCA) — PCA is an application of low-rank approximation on centered data matrices.
  • Convex Optimization Basics — Nuclear norm regularization and related relaxations appear in low-rank modeling.
  • Sparse Linear Algebra — Efficient methods for large sparse matrices avoid dense SVD and exploit sparsity.

Detailed Explanation


01 Overview

Low-rank approximation is the idea of summarizing a large matrix A by another matrix of lower rank that is simpler to store and compute with, yet remains close to A in a chosen norm. At the heart of this concept is the Singular Value Decomposition (SVD), which factors A into UΣV^T where Σ lists singular values in descending order. Truncating this factorization to the top k singular values and vectors produces a rank-k matrix A_k that balances accuracy and simplicity. The Eckart–Young–Mirsky theorem guarantees that this truncated SVD is the best possible rank-k approximation under both the Frobenius norm and the spectral (operator 2-) norm. Practically, this means we can compress data, filter noise, and accelerate computations by working with much smaller factors. Applications abound: reducing image storage while maintaining visual quality, performing Principal Component Analysis (PCA) for dimensionality reduction, computing latent factors in recommendation systems, and approximating large kernel matrices. While full SVD can be computationally heavy for huge matrices, modern randomized algorithms enable fast, high-quality approximations. The key tradeoff is between the approximation rank k (smaller means lighter and faster) and the approximation error (smaller k usually means larger error).

02 Intuition & Analogies

Imagine taking a high-resolution photograph (a big matrix of pixel intensities). Most of what makes the photo recognizable—broad shapes and gradients—can be captured by a few dominant patterns, while tiny details and noise contribute less. Low-rank approximation keeps just those dominant patterns. Concretely, SVD discovers a set of “basis images” (columns of U and rows of V^T) and importance weights (singular values). Keeping the top k weights and their basis images yields a compressed photo that still looks good. Another analogy: think of a song represented as many instrument tracks. If only a few instruments carry the melody and harmony strongly, you can reconstruct a convincing version of the song using just those tracks, discarding faint background sounds. Likewise, in a spreadsheet of student grades across many assignments, overall performance might be explained by a few latent factors like mastery and consistency. The low-rank model captures these few underlying drivers, ignoring minor idiosyncrasies. Geometrically, a matrix maps input vectors to outputs. Many real datasets lie close to a low-dimensional subspace; the matrix mainly stretches and rotates within a small number of directions and does little in the remaining ones. The SVD orders these directions by strength. Truncation keeps the most influential directions and sets the rest to zero, which explains the predictable error: it’s exactly what you lose by removing those weaker directions. This is why low-rank approximations both compress (fewer parameters) and denoise (discard weak, often noisy components).

03 Formal Definition

Given A ∈ R^{m×n} with singular value decomposition A = UΣV^T, where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal and Σ = diag(σ_1, σ_2, …, σ_r, 0, …) with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0 and r = rank(A), the best rank-k approximation of A is the truncated SVD A_k = U_{:,1:k} Σ_{1:k,1:k} V_{:,1:k}^T. The Eckart–Young–Mirsky theorem states that for any X with rank(X) ≤ k, we have ‖A − A_k‖_2 ≤ ‖A − X‖_2 and ‖A − A_k‖_F ≤ ‖A − X‖_F, where ‖·‖_2 is the spectral norm and ‖·‖_F is the Frobenius norm. Moreover, the minimal errors are ‖A − A_k‖_2 = σ_{k+1} and ‖A − A_k‖_F = (∑_{i=k+1}^{r} σ_i^2)^{1/2}. Uniqueness holds if σ_k > σ_{k+1} (up to signs within singular vectors). The low-rank approximation problem can also be posed as min_{rank(X) ≤ k} ‖A − X‖ where the norm is unitarily invariant (e.g., Schatten p-norms); the truncated SVD solves all such problems simultaneously. The approximation compresses A by storing only U_{:,1:k}, Σ_{1:k,1:k}, and V_{:,1:k}, requiring O((m+n)k) memory versus mn for the full matrix.

04 When to Use

Use low-rank approximation when your matrix exhibits redundancy or when you can tolerate small errors for big gains in speed or storage. Typical cases include: (1) Compression: images, videos, or scientific data where singular values decay quickly; store U_k, Σ_k, V_k instead of A. (2) Denoising: signals or datasets with noise largely confined to small singular directions; truncation removes noisy components. (3) Dimensionality reduction: PCA is precisely low-rank approximation of a centered data matrix, enabling visualization, clustering, and regression in lower dimensions. (4) Recommender systems: implicit low-rank structure in user–item matrices yields latent-factor models. (5) Fast linear algebra: approximate preconditioners, fast matrix–vector products, and reduced storage in iterative solvers. (6) Kernel and graph methods: approximate large kernel/graph Laplacian matrices for scalable learning. Prefer truncated or randomized SVD when m and n are large but the target rank k is modest (k ≪ min(m,n)). Use full SVD for small to medium matrices or when exact singular values are needed. If the matrix is sparse or structured, choose algorithms that exploit sparsity (Lanczos/IRL, randomized methods) instead of dense SVD.

⚠Common Mistakes

  • Confusing PCA with SVD without centering: PCA requires centering columns of the data matrix. Skipping centering changes the subspace and can mislead conclusions.
  • Choosing k blindly: Picking k without inspecting the singular value decay or setting a target energy threshold can underfit (too small k) or waste resources (too large k). Plot or examine σ_i and cumulative energy.
  • Ignoring the norm: Eckart–Young guarantees optimality for Frobenius and spectral norms (and more generally unitarily invariant norms). Minimizing entrywise absolute error or other norms may not be solved by truncated SVD.
  • Recomputing full SVD unnecessarily: For very large matrices and small k, full SVD is overkill. Use truncated (Lanczos/IRL) or randomized SVD for big speedups.
  • Numerical issues: Not sorting singular values, forgetting to request thin factors, or forming products like UΣV^T with poor scaling can degrade accuracy or performance.
  • Mishandling sparsity: Converting a large sparse matrix to dense for SVD can exceed memory. Prefer sparse-aware methods.
  • Overfitting noise in denoising tasks: Setting k too large can reintroduce noise. Use validation, cross-validation, or heuristics (e.g., scree plot) to select k.
  • Misinterpreting uniqueness: If σ_k = σ_{k+1}, the optimal subspace is not unique; algorithms may return different but equally optimal bases.

Key Formulas

SVD

A = UΣV^T,  U^T U = I,  V^T V = I,  Σ = diag(σ_1, …, σ_r)

Explanation: Any real matrix factors into orthogonal U and V and a diagonal matrix of singular values in descending order. This decomposition reveals intrinsic directions and strengths.

Truncated SVD

A_k = ∑_{i=1}^{k} σ_i u_i v_i^T = U_{:,1:k} Σ_{1:k,1:k} V_{:,1:k}^T

Explanation: The best rank-k approximation keeps the top k singular triplets. It is compact and captures the dominant action of A.

Eckart–Young (Frobenius)

min_{rank(X) ≤ k} ‖A − X‖_F = ‖A − A_k‖_F = (∑_{i=k+1}^{r} σ_i^2)^{1/2}

Explanation: Among all rank-k matrices, the truncated SVD minimizes the Frobenius error (uniquely when σ_k > σ_{k+1}). The squared error equals the sum of squares of the discarded singular values.

Eckart–Young (Spectral)

min_{rank(X) ≤ k} ‖A − X‖_2 = ‖A − A_k‖_2 = σ_{k+1}

Explanation: The worst-case operator error of A_k is exactly the next singular value. No other rank-k matrix can do better in the 2-norm.

Frobenius norm via singular values

‖A‖_F^2 = ∑_{i=1}^{r} σ_i^2

Explanation: The total energy of a matrix equals the sum of squared singular values. This connects SVD to energy retention criteria.

Best rank-k projector

proj_{rank k}(A) = argmin_{rank(X) ≤ k} ‖A − X‖_F

Explanation: Conceptually, truncating the SVD projects A onto the set of matrices of rank at most k under the Frobenius norm.

Randomized SVD pipeline

Y = AΩ,  Q = orth(Y),  B = Q^T A,  B = Ũ Σ V^T,  U = Q Ũ

Explanation: A random test matrix Ω captures A’s range. Orthonormalizing yields Q; SVD on the small B gives an approximate SVD of A.

Power iteration in randomized SVD

Y = (A A^T)^q A Ω

Explanation: Applying q power iterations amplifies the spectral gap, improving approximation accuracy when singular values decay slowly.

Projection error (intuition)

‖A − Q Q^T A‖_2 ≈ σ_{k+1}

Explanation: If Q spans (nearly) the top-k left singular subspace, projecting A onto Q captures all but the next singular value’s contribution.

Energy fraction

Energy retained(k) = (∑_{i=1}^{k} σ_i^2) / (∑_{i=1}^{r} σ_i^2)

Explanation: This fraction quantifies how much of the matrix’s total energy (Frobenius norm squared) is preserved by rank k.

Eckart–Young–Mirsky (general)

min_{rank(X) ≤ k} ‖A − X‖ = ‖A − A_k‖  for any unitarily invariant norm ‖·‖

Explanation: Truncated SVD minimizes all norms that depend only on singular values, not on singular vectors, e.g., Schatten p-norms.

Nuclear norm regularization

argmin_X (1/2)‖A − X‖_F^2 + λ‖X‖_*

Explanation: Convex proxy for promoting low rank, useful when rank constraints are hard; solutions shrink singular values by soft-thresholding.

Complexity Analysis

Let A be m×n with m ≥ n without loss of generality. Computing a full dense SVD via standard algorithms (e.g., bidiagonalization + QR) costs O(mn^2) time and O(mn) space to store A, plus O((m+n)·min(m,n)) for the factors. This is expensive when both dimensions are large. However, many applications only need a rank-k approximation with k ≪ min(m,n). Truncating after computing the full SVD still costs O(mn^2), but specialized methods reduce the cost dramatically.

Truncated Krylov methods (Lanczos and implicitly restarted Lanczos, IRL) for the top k singular triplets on sparse or structured matrices often run in O(nnz(A)·k) time plus orthogonalization overhead O(k^2(m+n)), with memory proportional to O(nnz(A) + (m+n)k). For dense matrices, randomized SVD achieves O(mn(k+p) + (m+n)(k+p)^2) time, where p is a small oversampling parameter (e.g., 5–20), and O((m+n)(k+p)) extra space. With q power iterations, the leading term scales like (2q+1)·mn(k+p), improving accuracy when the spectral gap is small. The SVD of the small matrix B ∈ R^{(k+p)×n} costs O(n(k+p)^2).

Forming the approximation A_k = U_k Σ_k V_k^T and multiplying by vectors thereafter costs O((m+n)k) storage and O(mk + nk) time per multiply, versus O(mn) storage and O(mn) time for dense A. Error guarantees are crisp: for exact truncated SVD, the spectral error equals σ_{k+1} and the Frobenius error equals (∑_{i>k} σ_i^2)^{1/2}. Randomized methods achieve comparable bounds up to small factors depending on p and q, with high probability. In practice, the dominant costs are the matrix–matrix multiplies AΩ and repeated orthonormalizations; careful BLAS-level implementations attain high performance.

Code Examples

Best rank-k approximation via truncated SVD (Eigen)
// Compile with: g++ -O2 -std=c++17 -I /usr/include/eigen3 svd_rankk.cpp -o svd_rankk
#include <Eigen/Dense>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
using namespace Eigen;

// Utility: compute Frobenius norm of a matrix
double frobNorm(const MatrixXd &M) {
    return M.norm(); // Eigen's .norm() is Frobenius for matrices
}

int main() {
    // Example: build a matrix that is approximately rank-2 plus small noise
    int m = 8, n = 6, k = 2; // target rank k
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);

    MatrixXd U0 = MatrixXd::NullaryExpr(m, k, [&]() { return gauss(rng); });
    HouseholderQR<MatrixXd> qrU(U0);
    MatrixXd U = qrU.householderQ() * MatrixXd::Identity(m, k); // orthonormal m x k

    MatrixXd V0 = MatrixXd::NullaryExpr(n, k, [&]() { return gauss(rng); });
    HouseholderQR<MatrixXd> qrV(V0);
    MatrixXd V = qrV.householderQ() * MatrixXd::Identity(n, k); // orthonormal n x k

    Vector2d s; s << 10.0, 3.0; // singular values
    MatrixXd A = U * s.asDiagonal() * V.transpose();

    // Add small Gaussian noise
    MatrixXd Noise = MatrixXd::NullaryExpr(m, n, [&]() { return 0.05 * gauss(rng); });
    A += Noise;

    // Compute thin SVD of A
    JacobiSVD<MatrixXd> svd(A, ComputeThinU | ComputeThinV);
    VectorXd sigma = svd.singularValues();
    MatrixXd Ufull = svd.matrixU();
    MatrixXd Vfull = svd.matrixV();

    // Truncate to rank k
    MatrixXd Uk = Ufull.leftCols(k);
    MatrixXd Vk = Vfull.leftCols(k);
    MatrixXd Sk = sigma.head(k).asDiagonal();

    // Form A_k = U_k S_k V_k^T
    MatrixXd Ak = Uk * Sk * Vk.transpose();

    // Report errors and energy retention
    double errF = frobNorm(A - Ak);
    double totalEnergy = sigma.array().square().sum();
    double keptEnergy = sigma.head(k).array().square().sum();
    double energyFrac = keptEnergy / totalEnergy;

    cout << "Matrix size: " << m << "x" << n << "\n";
    cout << "Target rank k: " << k << "\n";
    cout << "Top singular values: " << sigma.head(min<int>(5, (int)sigma.size())).transpose() << "\n";
    cout << "Frobenius error ||A - A_k||_F: " << errF << "\n";
    cout << "Energy retained: " << 100.0 * energyFrac << "%\n";

    // Optional: show A and Ak for inspection on small matrices
    // cout << "A:\n" << A << "\n";
    // cout << "A_k:\n" << Ak << "\n";

    return 0;
}

We synthesize a near rank-2 matrix A by multiplying random orthonormal U and V with chosen singular values, then add small noise. We compute the (thin) SVD of A using Eigen, keep only the top k singular values/vectors, and reconstruct the best rank-k approximation A_k. We then compute the Frobenius error and the fraction of energy retained to illustrate Eckart–Young’s guarantees and the compression tradeoff.

Time: O(m n min(m, n)) for full dense SVD; reconstruction and reporting are O(mn + (m+n)k).
Space: O(mn) to store A plus O((m+n)k) for truncated factors.
Randomized SVD (range finder with power iterations)
// Compile with: g++ -O2 -std=c++17 -I /usr/include/eigen3 randomized_svd.cpp -o randomized_svd
#include <Eigen/Dense>
#include <cmath>
#include <iostream>
#include <random>
using namespace std;
using namespace Eigen;

// Generate a standard normal random matrix of size rows x cols
MatrixXd gaussianMatrix(int rows, int cols, std::mt19937 &rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    MatrixXd G(rows, cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            G(i, j) = gauss(rng);
    return G;
}

// Orthonormalize columns using QR and return thin Q
MatrixXd orth(const MatrixXd &Y) {
    HouseholderQR<MatrixXd> qr(Y);
    int k = (int)min(Y.rows(), Y.cols());
    return qr.householderQ() * MatrixXd::Identity(Y.rows(), k);
}

int main() {
    int m = 100, n = 80;  // size of A
    int true_rank = 5;    // build a low-rank ground-truth matrix
    int k = 5;            // target rank
    int p = 10;           // oversampling
    int q = 1;            // power iterations

    std::mt19937 rng(123);

    // Construct A = U S V^T with decaying singular values plus small noise
    MatrixXd U0 = gaussianMatrix(m, true_rank, rng);
    MatrixXd U = orth(U0);
    MatrixXd V0 = gaussianMatrix(n, true_rank, rng);
    MatrixXd V = orth(V0);

    VectorXd s(true_rank);
    for (int i = 0; i < true_rank; ++i) s(i) = pow(0.6, i) * 20.0; // geometric decay
    MatrixXd A = U * s.asDiagonal() * V.transpose();
    A += 0.01 * gaussianMatrix(m, n, rng); // small noise

    // Randomized range finder
    MatrixXd Omega = gaussianMatrix(n, k + p, rng); // test matrix
    MatrixXd Y = A * Omega;                         // sample the range of A

    // Power iterations to improve separation
    for (int i = 0; i < q; ++i) {
        MatrixXd Z = A.transpose() * Y; // go to co-range
        Y = A * Z;                      // return to range
    }

    MatrixXd Q = orth(Y);           // orthonormal basis for range(A)
    MatrixXd B = Q.transpose() * A; // small (k+p) x n matrix

    // SVD on small B
    JacobiSVD<MatrixXd> svdB(B, ComputeThinU | ComputeThinV);
    VectorXd sigma = svdB.singularValues();
    MatrixXd Utilde = svdB.matrixU();
    MatrixXd Vhat = svdB.matrixV();

    // Form approximate rank-k SVD of A
    MatrixXd Uk = (Q * Utilde).leftCols(k);
    MatrixXd Vk = Vhat.leftCols(k);
    MatrixXd Sk = sigma.head(k).asDiagonal();

    MatrixXd Ak = Uk * Sk * Vk.transpose();

    double errF = (A - Ak).norm();
    double totalEnergy = A.squaredNorm();   // equals ||A||_F^2
    double approxEnergy = Ak.squaredNorm();

    cout << "Randomized SVD approximation (m=" << m << ", n=" << n << ")\n";
    cout << "Target rank k=" << k << ", oversampling p=" << p << ", power iters q=" << q << "\n";
    cout << "Frobenius error ||A - A_k||_F: " << errF << "\n";
    cout << "Energy retained (approx): " << 100.0 * (approxEnergy / totalEnergy) << "%\n";

    return 0;
}

This implements the standard randomized SVD pipeline: draw a random test matrix, sample A’s range via Y = AΩ, optionally apply q power iterations to enhance separation, orthonormalize to get Q, compute a small SVD of B = Q^T A, and lift back to form an approximate rank-k SVD of A. It achieves high-quality approximations at a fraction of the cost of full SVD when k â‰Ș min(m,n).

Time: O((2q+1) m n (k+p) + (m+n)(k+p)^2) for dense A; the small SVD on B is O(n (k+p)^2).
Space: O((m+n)(k+p)) extra beyond storing A.
Tags: low-rank approximation, Eckart–Young theorem, SVD, truncated SVD, Frobenius norm, spectral norm, PCA, randomized SVD, matrix compression, image compression, Ky Fan norm, nuclear norm, rank-k approximation, QR factorization, power iteration