Singular Value Decomposition (SVD)
Key Points
- •Singular Value Decomposition (SVD) factors any m×n matrix A into A = UΣV^{T}, where U and V are orthogonal and Σ is diagonal with nonnegative entries.
- •The columns of U and V are orthonormal bases of the output and input spaces, and the diagonal entries of Σ are the singular values, sorted from largest to smallest.
- •SVD connects to eigenvalues: the right and left singular vectors are eigenvectors of A^{T}A and AA^{T}, so singular values are square roots of those eigenvalues.
- •Truncated SVD keeps only the top k singular values/vectors and gives the best rank-k approximation in both Frobenius and spectral norms.
- •SVD powers PCA: principal components are the right singular vectors of centered data, and singular values relate to explained variance.
- •The Moore–Penrose pseudoinverse is A^{+} = VΣ^{+}U^{T}; it solves least squares and stabilizes ill-conditioned problems.
- •Computational complexity of full SVD is O(mn·min(m,n)); approximate methods (power iteration/randomized SVD) can be much faster for low-rank structure.
- •SVD is widely used in ML for dimensionality reduction, recommender systems, topic modeling (LSA), and matrix completion.
Prerequisites
- →Vectors and matrices — Understand matrix dimensions, multiplication, transposition, and norms to follow SVD definitions and code.
- →Orthogonality and dot products — SVD relies on orthonormal vectors and properties of orthogonal matrices.
- →Eigenvalues and eigenvectors — SVD is linked to eigen-decomposition via A^{T}A and AA^{T}.
- →Least squares optimization — SVD is used to solve min_x ||Ax − b||_2 and to define the pseudoinverse.
- →Numerical stability and conditioning — Thresholding small singular values and avoiding A^{T}A require understanding conditioning.
- →Basic C++ programming — Implementations require working with vectors, loops, and numerical computations in C++.
Detailed Explanation
01 Overview
Singular Value Decomposition (SVD) is a fundamental matrix factorization that expresses any real m×n matrix A as A = UΣV^{T}. Here U (m×m) and V (n×n) are orthogonal (their columns form orthonormal bases), and Σ (m×n) is diagonal with nonnegative entries called singular values. Intuitively, SVD rotates and stretches space: V rotates input coordinates, Σ stretches them along axis-aligned directions, and U rotates to the output coordinates. The singular values indicate how much A amplifies vectors in specific directions, while the singular vectors (columns of U and V) give those directions. SVD exists for every matrix, square or rectangular, and is numerically stable, making it a workhorse in scientific computing.
SVD bridges many concepts: it links to eigen-decomposition through A^{T}A = VΣ^{2}V^{T} and AA^{T} = UΣ^{2}U^{T}; it provides the Moore–Penrose pseudoinverse A^{+} = VΣ^{+}U^{T}; and it yields the best low-rank approximation via truncated SVD, which underlies PCA and data compression. In machine learning, SVD compresses data, removes noise, and exposes latent structure (e.g., topics in documents, user/item factors in recommender systems). Although exact SVD has cubic-like cost in the smaller dimension, approximate methods (power iteration, Lanczos, randomized SVD) efficiently recover the top k components when data are approximately low rank.
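The factorization and its eigen-link can be checked numerically. Below is a minimal sanity-check sketch (not a general SVD routine): it hard-codes a 2×2 matrix whose exact SVD is known and verifies both A = UΣV^{T} and A^{T}A = VΣ²V^{T} entrywise. The same pattern, comparing a claimed factorization against the definitions, makes a useful unit test for any SVD implementation.

#include <bits/stdc++.h>
using namespace std;

// Sanity-check sketch for A = [[4, 4], [-3, 3]], whose exact SVD is
// U = [[1,0],[0,-1]], Sigma = diag(4*sqrt(2), 3*sqrt(2)), V = (1/sqrt(2)) * [[1,1],[1,-1]].
int main() {
    double A[2][2] = {{4, 4}, {-3, 3}};
    double U[2][2] = {{1, 0}, {0, -1}};
    double S[2]    = {4 * sqrt(2.0), 3 * sqrt(2.0)};
    double r = 1.0 / sqrt(2.0);
    double V[2][2] = {{r, r}, {r, -r}};

    // Check A == U * Sigma * V^T entrywise.
    double maxErrA = 0.0;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double usvt = 0.0;
            for (int t = 0; t < 2; ++t) usvt += U[i][t] * S[t] * V[j][t]; // (V^T)_{tj} = V_{jt}
            maxErrA = max(maxErrA, fabs(A[i][j] - usvt));
        }

    // Check A^T A == V * Sigma^2 * V^T entrywise.
    double maxErrE = 0.0;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double ata = 0.0;
            for (int t = 0; t < 2; ++t) ata += A[t][i] * A[t][j];
            double vs2vt = 0.0;
            for (int t = 0; t < 2; ++t) vs2vt += V[i][t] * S[t] * S[t] * V[j][t];
            maxErrE = max(maxErrE, fabs(ata - vs2vt));
        }

    cout << "max |A - U Sigma V^T|       = " << maxErrA << "\n";
    cout << "max |A^T A - V Sigma^2 V^T| = " << maxErrE << "\n";
    return 0;
}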
02 Intuition & Analogies
Imagine A as a machine that takes an input arrow (vector) and outputs another arrow. Most linear machines twist and stretch arrows in complicated ways. SVD says: regardless of how messy that twist is, it’s always equivalent to three simple steps: (1) rotate the input arrows so they align with special directions, (2) stretch or shrink each aligned arrow independently, and (3) rotate again to the final orientation. The first rotation is V^{T}, the stretching is Σ, and the final rotation is U.
Think of a sheet of rubber with a printed grid. Applying A is like grabbing the sheet, rotating it, and pulling it so some grid lines get longer while others shorten, then rotating again. The grid directions that get purely stretched (no shearing) are the singular directions; the stretch amounts are the singular values. Large singular values correspond to directions where A keeps or amplifies information; small ones correspond to directions where A squashes information and potentially loses detail.
For data, picture points in high-dimensional space (rows are samples). PCA via SVD finds the most “energetic” directions along which the cloud of points varies. Keeping only the top few directions (truncated SVD) is like projecting the cloud onto a low-dimensional plane that preserves most variance, simplifying the data while discarding noise. In text analysis (LSA), the document–term matrix’s SVD uncovers latent topics: words and documents align along a few dominant directions that summarize co-occurrence patterns. In recommender systems, user–item ratings form a matrix whose dominant singular directions capture user tastes and item attributes, enabling predictions for missing entries.
03 Formal Definition
For any real m×n matrix A of rank r, there exist an orthogonal U ∈ R^{m×m}, an orthogonal V ∈ R^{n×n}, and a diagonal Σ ∈ R^{m×n} with entries σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0 (and zeros thereafter) such that A = UΣV^{T}. The columns u_i of U are the left singular vectors, the columns v_i of V are the right singular vectors, and they satisfy Av_i = σ_i u_i and A^{T}u_i = σ_i v_i. The reduced (thin) form keeps only the first r columns of U and V and the leading r×r block of Σ.
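As a small worked instance (a standard textbook-style example): for A = [[4, 4], [−3, 3]], the matrix A^{T}A = [[25, 7], [7, 25]] has eigenvalues 32 and 18 with eigenvectors (1, 1)/√2 and (1, −1)/√2, so σ_1 = √32 = 4√2, σ_2 = √18 = 3√2, and V = (1/√2)[[1, 1], [1, −1]]. The left singular vectors follow from u_i = Av_i/σ_i, giving u_1 = (1, 0) and u_2 = (0, −1), and multiplying back confirms UΣV^{T} = A.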
04 When to Use
- Dimensionality reduction: When data lie near a low-dimensional subspace (e.g., images, text embeddings), truncated SVD/PCA preserves most variance with fewer features.
- Noise reduction/denoising: When small singular values mostly reflect noise, truncation improves signal-to-noise ratio in imaging, genomics, and sensor data.
- Least squares and pseudoinverse: For overdetermined or ill-conditioned systems, SVD-based A^{+} yields the minimum-norm least-squares solution and allows stable regularization (thresholding small σ_i).
- Recommender systems and matrix completion: User–item matrices are often approximately low rank; SVD (or variants) extracts latent factors for prediction.
- Topic modeling (LSA/LSI): SVD on document–term matrices uncovers latent semantic dimensions.
- Compression: Store only U_k, Σ_k, and V_k^{T} for images/audio to reduce storage while controlling reconstruction error (see the storage arithmetic after this list).
- Understanding models: In deep learning, analyzing Jacobians/weight matrices via singular spectra reveals conditioning and training dynamics.
As a rule of thumb: use exact SVD for small/medium matrices or when the full spectrum is needed; use approximate/top-k methods (power iteration, Lanczos, randomized SVD) for large sparse or tall/skinny matrices when only the leading components matter.
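To make the compression bullet above concrete (hypothetical sizes): a dense 1000×1000 matrix holds 10^6 entries, while its rank-50 truncation needs only U_k (1000×50), Σ_k (50 values), and V_k (1000×50), about 100,050 numbers in total, roughly a 10× reduction, with the reconstruction error governed by the discarded singular values (Eckart–Young).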
⚠️Common Mistakes
- Confusing eigenvalues with singular values: For non-symmetric A, eigenvalues can be complex or negative; singular values are always nonnegative and come from A^{T}A or AA^{T}.
- Skipping centering in PCA: PCA on raw features without subtracting the mean biases principal components; always center columns (and optionally scale) before SVD for PCA.
- Not sorting/truncating properly: Ensure singular values are sorted and that U_k, \Sigma_k, V_k correspond to the same top-k indices; mixing indices leads to incorrect reconstructions.
- Ignoring numerical thresholds: Inverting very small σ_i explodes numerical errors; use a tolerance (e.g., τ = ε·max(m,n)·σ_1) to set small singular values to zero in Σ^{+} (see the sketch after this list).
- Forming A^{T}A explicitly: This squares the condition number and can lose accuracy; prefer algorithms that avoid explicit A^{T}A for numerical stability when possible.
- Mismanaging shapes: Remember U is m×m (or m×r), Σ is m×n, V is n×n (or n×r). Reconstruction uses UΣV^{T}; forgetting the transpose or using mismatched dimensions causes runtime errors.
- Overusing full SVD on huge data: Full SVD is O(mn·min(m,n)); choose top-k approximations or randomized methods when only leading components are needed.
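The thresholding rule above can be made concrete with a short sketch. The helper name pseudoInvertSigma and the sample singular values are illustrative choices (not part of any particular library); the tolerance follows the τ = ε·max(m,n)·σ_1 convention from the list.

#include <bits/stdc++.h>
using namespace std;

// Build the diagonal of Sigma^+ from singular values, zeroing any sigma_i below
// tau = eps * max(m, n) * sigma_max instead of inverting it.
vector<double> pseudoInvertSigma(const vector<double> &sigma, int m, int n) {
    double sigma_max = 0.0;
    for (double s : sigma) sigma_max = max(sigma_max, s);
    double tau = numeric_limits<double>::epsilon() * max(m, n) * sigma_max;
    vector<double> inv(sigma.size(), 0.0);
    for (size_t i = 0; i < sigma.size(); ++i)
        inv[i] = (sigma[i] > tau) ? 1.0 / sigma[i] : 0.0;  // drop near-zero directions
    return inv;
}

int main() {
    // Singular values of a hypothetical 1000 x 500 matrix; the last one is
    // numerically negligible and gets zeroed rather than inverted.
    vector<double> sigma = {12.0, 3.5, 0.8, 1e-14};
    vector<double> inv = pseudoInvertSigma(sigma, 1000, 500);
    for (size_t i = 0; i < sigma.size(); ++i)
        cout << "sigma = " << sigma[i] << "  sigma^+ = " << inv[i] << "\n";
    return 0;
}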
Key Formulas
SVD factorization: A = UΣV^{T}
Explanation: Any real matrix A can be factored into an orthogonal–diagonal–orthogonal product. U and V rotate, while Σ stretches along coordinate axes.
Link to eigendecomposition: A^{T}A = VΣ²V^{T} and AA^{T} = UΣ²U^{T}
Explanation: The squares of the singular values are eigenvalues of A^{T}A and AA^{T}. Right and left singular vectors are the corresponding eigenvectors.
Singular vector relations: Av_i = σ_i u_i and A^{T}u_i = σ_i v_i
Explanation: Each right singular vector maps to its left counterpart scaled by its singular value, and vice versa under A^{T}.
Truncated SVD (rank-k): A_k = U_kΣ_kV_k^{T} = σ_1u_1v_1^{T} + ⋯ + σ_ku_kv_k^{T}
Explanation: The best rank-k approximation keeps the top k singular triplets and discards the rest, yielding a sum of k rank-1 matrices.
Eckart–Young–Mirsky error: ||A − A_k||_2 = σ_{k+1} and ||A − A_k||_F = √(σ_{k+1}² + ⋯ + σ_r²)
Explanation: The spectral norm error equals the next singular value; the Frobenius error is the root-sum-of-squares of discarded singular values.
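For instance, with hypothetical singular values σ = (4, 2, 1, 0.5), the best rank-2 approximation satisfies ||A − A_2||_2 = σ_3 = 1 and ||A − A_2||_F = √(1² + 0.5²) ≈ 1.118, and no rank-2 matrix can achieve a smaller error in either norm.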
Moore–Penrose pseudoinverse: A^{+} = VΣ^{+}U^{T}, where Σ^{+} replaces each nonzero σ_i by 1/σ_i and transposes the shape
Explanation: Invert nonzero singular values (take reciprocals) and swap dimensions to construct the pseudoinverse, enabling least-squares solutions.
Least-squares via SVD: x* = A^{+}b = VΣ^{+}U^{T}b
Explanation: The minimum-norm solution to min_x ||Ax − b||_2 is obtained by applying U^{T} to b, scaling each component by 1/σ_i, then applying V.
Explained variance (PCA): EVR_k = (σ_1² + ⋯ + σ_k²) / (σ_1² + ⋯ + σ_r²)
Explanation: For centered data, the fraction of total variance captured by the first k principal components equals the sum of squared top singular values divided by the total.
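For example, with hypothetical singular values σ = (5, 2.5, 0.3) of a centered data matrix, keeping k = 2 components retains (5² + 2.5²) / (5² + 2.5² + 0.3²) = 31.25 / 31.34 ≈ 0.997 of the variance.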
2-norm condition number: κ_2(A) = σ_max / σ_min
Explanation: Measures sensitivity of linear solves; a large ratio indicates ill-conditioning and potential numerical instability.
SVD complexity: O(mn·min(m,n))
Explanation: Classical algorithms reduce A to bidiagonal form and then diagonalize, leading to cubic-like cost in the smaller dimension.
Complexity Analysis
A full dense SVD costs O(mn·min(m,n)) time and O(mn) memory: the matrix is first reduced to bidiagonal form and then iteratively diagonalized. For a 10,000×1,000 matrix that is on the order of 10^10 floating-point operations. The power-iteration approach in the code examples below costs O(mn) per matrix–vector product, so extracting the top k components with at most T iterations each is roughly O(k·T·mn); randomized SVD reaches a comparable O(mnk)-type cost. These approximate routes are the practical choice when k ≪ min(m,n) and the spectrum decays quickly.
Code Examples
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Utility: create m x n matrix filled with val
Matrix zeros(int m, int n, double val = 0.0) {
    return Matrix(m, vector<double>(n, val));
}

// Multiply A (m x n) by x (n)
Vector matVec(const Matrix &A, const Vector &x) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Multiply A^T (n x m) by y (m)
Vector matTVec(const Matrix &A, const Vector &y) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector x(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += A[i][j] * y[i];
        x[j] = s;
    }
    return x;
}

// y <- y - alpha * x (small BLAS-style helper; not used below)
void axpy(Vector &y, const Vector &x, double alpha) {
    for (size_t i = 0; i < y.size(); ++i) y[i] -= alpha * x[i];
}

// dot product
double dot(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// 2-norm
double norm2(const Vector &x) {
    return sqrt(max(0.0, dot(x, x)));
}

// Deflate A by rank-1 component: A <- A - sigma * u * v^T
void deflate(Matrix &A, const Vector &u, const Vector &v, double sigma) {
    int m = (int)A.size(), n = (int)A[0].size();
    for (int i = 0; i < m; ++i) {
        double ui = u[i] * sigma;
        for (int j = 0; j < n; ++j) {
            A[i][j] -= ui * v[j];
        }
    }
}

// Power iteration to approximate the dominant right singular vector of A.
// Avoids forming A^T A explicitly by applying v <- A^T(A v) repeatedly.
Vector dominantRightSingularVector(const Matrix &A, int iters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    std::mt19937 rng(42);
    std::normal_distribution<double> N(0.0, 1.0);
    Vector v(n);
    for (int j = 0; j < n; ++j) v[j] = N(rng);
    double nv = norm2(v);
    for (int j = 0; j < n; ++j) v[j] /= (nv > 0 ? nv : 1.0);

    double prev_lambda = 0.0;
    for (int t = 0; t < iters; ++t) {
        // w = A v
        Vector w = matVec(A, v);
        // z = A^T w = (A^T A) v
        Vector z = matTVec(A, w);
        double nz = norm2(z);
        if (nz > 0) for (int j = 0; j < n; ++j) v[j] = z[j] / nz;
        // Rayleigh quotient for the eigenvalue of A^T A (lambda ~ v^T (A^T A) v)
        double lambda = dot(v, matTVec(A, matVec(A, v)));
        if (fabs(lambda - prev_lambda) <= tol * max(1.0, fabs(prev_lambda))) break;
        prev_lambda = lambda;
    }
    return v;
}

// Compute top-k SVD via repeated power iteration + deflation (educational; not production-grade)
struct SVDResult { Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters = 200, double tol = 1e-8) {
    int m = (int)A.size(), n = (int)A[0].size();
    k = min(k, min(m, n));
    Matrix U = zeros(m, k);
    Matrix V = zeros(n, k);
    Vector S(k, 0.0);

    for (int idx = 0; idx < k; ++idx) {
        // Find the dominant right singular vector of the current (deflated) A
        Vector v = dominantRightSingularVector(A, iters, tol);
        // Compute u = A v / ||A v|| and sigma = ||A v||
        Vector Av = matVec(A, v);
        double sigma = norm2(Av);
        Vector u(m, 0.0);
        if (sigma > 0) {
            for (int i = 0; i < m; ++i) u[i] = Av[i] / sigma;
        } else {
            // If sigma ~ 0, there are no more components
            break;
        }
        // Store the triplet
        for (int i = 0; i < m; ++i) U[i][idx] = u[i];
        for (int j = 0; j < n; ++j) V[j][idx] = v[j];
        S[idx] = sigma;
        // Deflate A <- A - sigma * u v^T
        deflate(A, u, v, sigma);
    }
    return {U, S, V};
}

// Reconstruct A_k = U_k Sigma_k V_k^T
Matrix reconstruct(const SVDResult &res) {
    int m = (int)res.U.size();
    int k = (int)res.S.size();
    int n = (int)res.V.size();
    Matrix Ak = zeros(m, n);
    for (int t = 0; t < k; ++t) {
        double s = res.S[t];
        for (int i = 0; i < m; ++i) {
            double ui = res.U[i][t] * s;
            for (int j = 0; j < n; ++j) Ak[i][j] += ui * res.V[j][t];
        }
    }
    return Ak;
}

// Frobenius norm of the difference
double frobError(const Matrix &A, const Matrix &B) {
    int m = (int)A.size(), n = (int)A[0].size();
    double s = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double d = A[i][j] - B[i][j];
            s += d * d;
        }
    return sqrt(max(0.0, s));
}

int main() {
    // Build a synthetic low-rank matrix A = sum_{i=1}^2 sigma_i u_i v_i^T + noise
    int m = 8, n = 6;
    Matrix A = zeros(m, n);
    // Rank-2 structure
    Vector u1 = {1,0,0,0,1,0,0,0}; double s1 = 5.0; // sparse direction
    Vector v1 = {1,0,1,0,1,0};
    Vector u2 = {0,1,0,1,0,1,0,1}; double s2 = 2.5;
    Vector v2 = {0,1,0,1,0,1};
    auto addOuter = [&](const Vector &u, const Vector &v, double s) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j)
                A[i][j] += s * u[i] * v[j];
    };
    addOuter(u1, v1, s1);
    addOuter(u2, v2, s2);
    // Add small noise
    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 0.05);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] += N(rng);

    // Compute top-2 SVD
    SVDResult res = topkSVD(A, 2, 300, 1e-9);
    Matrix Ak = reconstruct(res);

    cout << fixed << setprecision(6);
    cout << "Top-2 singular values (approx): ";
    for (double s : res.S) cout << s << ' ';
    cout << "\n";
    cout << "Frobenius reconstruction error ||A - A2||_F: " << frobError(A, Ak) << "\n";
    return 0;
}
This educational implementation computes the top-k singular values/vectors using power iteration on A^{T}A and rank-1 deflation. For each component, it finds an approximate dominant right singular vector v, computes u = Av / ||Av|| and σ = ||Av||, stores (u, σ, v), and deflates A ← A − σuv^{T}. The result reconstructs A_k and illustrates that a low-rank matrix is well-approximated by its top components. It avoids explicitly forming A^{T}A but still multiplies by A and A^{T}. This is not production-grade (no reorthogonalization or advanced stopping), but it conveys the mechanics of truncated SVD.
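Two practical notes on running it (the file name below is a hypothetical choice): compiling with g++ -std=c++17 -O2 svd_topk.cpp -o svd_topk and running the binary should print leading singular values near 5√6 ≈ 12.25 and 5√3 ≈ 8.66 rather than 5.0 and 2.5, because the planted u_i and v_i are not unit vectors, so each planted value is scaled by ||u_i||·||v_i||. The Frobenius reconstruction error should be on the order of the injected noise.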
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Reuse minimal helpers
Matrix zeros(int m, int n, double val = 0.0) { return Matrix(m, vector<double>(n, val)); }

Vector matVec(const Matrix &A, const Vector &x) {
    int m = A.size(), n = A[0].size();
    Vector y(m);
    for (int i = 0; i < m; ++i) { double s = 0; for (int j = 0; j < n; ++j) s += A[i][j] * x[j]; y[i] = s; }
    return y;
}

Vector matTVec(const Matrix &A, const Vector &y) {
    int m = A.size(), n = A[0].size();
    Vector x(n);
    for (int j = 0; j < n; ++j) { double s = 0; for (int i = 0; i < m; ++i) s += A[i][j] * y[i]; x[j] = s; }
    return x;
}

double dot(const Vector &a, const Vector &b) { double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s; }
double norm2(const Vector &x) { return sqrt(max(0.0, dot(x, x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters = 200, double tol = 1e-8) {
    int n = A[0].size();
    std::mt19937 rng(7);
    std::normal_distribution<double> N(0.0, 1.0);
    Vector v(n);
    for (int j = 0; j < n; ++j) v[j] = N(rng);
    double nv = norm2(v);
    for (int j = 0; j < n; ++j) v[j] /= (nv > 0 ? nv : 1.0);
    double prev = 0;
    for (int t = 0; t < iters; ++t) {
        Vector w = matVec(A, v);
        Vector z = matTVec(A, w);
        double nz = norm2(z);
        if (nz > 0) for (int j = 0; j < n; ++j) v[j] = z[j] / nz;
        double lam = dot(v, matTVec(A, matVec(A, v)));
        if (fabs(lam - prev) <= tol * max(1.0, fabs(prev))) break;
        prev = lam;
    }
    return v;
}

struct SVDResult { Matrix U; Vector S; Matrix V; };

// Same top-k SVD routine as in Example 1 (repeated so this file is self-contained)
SVDResult topkSVD(Matrix A, int k, int iters = 200, double tol = 1e-8) {
    int m = A.size(), n = A[0].size();
    k = min(k, min(m, n));
    Matrix U(m, vector<double>(k, 0.0));
    Matrix V(n, vector<double>(k, 0.0));
    Vector S(k, 0.0);
    auto deflate = [&](const Vector &u, const Vector &v, double s) {
        for (int i = 0; i < m; ++i) { double ui = u[i] * s; for (int j = 0; j < n; ++j) A[i][j] -= ui * v[j]; }
    };
    for (int t = 0; t < k; ++t) {
        Vector v = dominantRightSingularVector(A, iters, tol);
        Vector Av = matVec(A, v);
        double s = norm2(Av);
        if (s <= 1e-14) break;
        Vector u(m);
        for (int i = 0; i < m; ++i) u[i] = Av[i] / s;
        for (int i = 0; i < m; ++i) U[i][t] = u[i];
        for (int j = 0; j < n; ++j) V[j][t] = v[j];
        S[t] = s;
        deflate(u, v, s);
    }
    return {U, S, V};
}

// Center columns of data matrix X (rows are samples)
void centerColumns(Matrix &X) {
    int m = X.size(), n = X[0].size();
    for (int j = 0; j < n; ++j) {
        double mu = 0;
        for (int i = 0; i < m; ++i) mu += X[i][j];
        mu /= m;
        for (int i = 0; i < m; ++i) X[i][j] -= mu;
    }
}

// Project data onto top-k principal components (right singular vectors)
Matrix projectOntoPCs(const Matrix &X, const Matrix &V_k) {
    int m = X.size(), k = V_k[0].size(), n = X[0].size();
    Matrix Z(m, vector<double>(k, 0.0));
    for (int i = 0; i < m; ++i)
        for (int t = 0; t < k; ++t) {
            double s = 0;
            for (int j = 0; j < n; ++j) s += X[i][j] * V_k[j][t];
            Z[i][t] = s;
        }
    return Z;
}

int main() {
    // Toy dataset: 10 samples, 5 features with strong correlation along two directions
    int m = 10, n = 5;
    Matrix X(m, vector<double>(n, 0.0));
    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 1.0);
    // Generate 2D latent factors, map to 5D via a fixed matrix W, add noise
    Matrix W = {{1, 0}, {0.8, 0.2}, {0.6, 0.4}, {0.2, 0.8}, {0, 1}}; // 5x2
    for (int i = 0; i < m; ++i) {
        double z1 = N(rng), z2 = 0.5 * N(rng);
        for (int j = 0; j < n; ++j) X[i][j] = W[j][0] * z1 + W[j][1] * z2 + 0.05 * N(rng);
    }

    // Center columns (critical for PCA)
    centerColumns(X);

    // Compute top-2 SVD on centered data to get principal axes (right singular vectors)
    int k = 2;
    SVDResult res = topkSVD(X, k, 300, 1e-9);

    // Project samples onto PCs (scores)
    Matrix Z = projectOntoPCs(X, res.V); // m x k

    // Explained variance ratio: top-k squared singular values over the total variance.
    // The total equals ||X||_F^2 for centered X (the sum of ALL squared singular values),
    // so it is computed from X itself rather than from only the k values extracted above.
    double total = 0, top = 0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) total += X[i][j] * X[i][j];
    for (int t = 0; t < k && t < (int)res.S.size(); ++t) top += res.S[t] * res.S[t];
    double evr = top / total;

    cout << fixed << setprecision(6);
    cout << "Top-" << k << " singular values (approx): ";
    for (int t = 0; t < k; ++t) cout << res.S[t] << " ";
    cout << "\n";
    cout << "Explained variance ratio (approx): " << evr << "\n";
    cout << "First 3 projected samples onto PCs:\n";
    for (int i = 0; i < min(3, m); ++i) {
        for (int t = 0; t < k; ++t) cout << Z[i][t] << " ";
        cout << "\n";
    }
    return 0;
}
This example performs PCA with SVD: it centers the data matrix (rows are samples), computes the top-k right singular vectors (principal axes), and projects each sample onto these directions to obtain low-dimensional representations (scores). The squared singular values quantify explained variance, so their ratios report how much variance is retained by k components.
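One useful identity to keep in mind: for centered X ≈ U_kΣ_kV_k^{T}, the score matrix satisfies Z = XV_k ≈ U_kΣ_k, so an approximate reconstruction in the original feature space is X̂ = ZV_k^{T} (with the subtracted column means added back if the original scale is needed).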
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

Matrix zeros(int m, int n, double val = 0.0) { return Matrix(m, vector<double>(n, val)); }

Vector matVec(const Matrix &A, const Vector &x) {
    int m = A.size(), n = A[0].size();
    Vector y(m);
    for (int i = 0; i < m; ++i) { double s = 0; for (int j = 0; j < n; ++j) s += A[i][j] * x[j]; y[i] = s; }
    return y;
}

Vector matTVec(const Matrix &A, const Vector &y) {
    int m = A.size(), n = A[0].size();
    Vector x(n);
    for (int j = 0; j < n; ++j) { double s = 0; for (int i = 0; i < m; ++i) s += A[i][j] * y[i]; x[j] = s; }
    return x;
}

double dot(const Vector &a, const Vector &b) { double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s; }
double norm2(const Vector &x) { return sqrt(max(0.0, dot(x, x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters = 200, double tol = 1e-8) {
    int n = A[0].size();
    mt19937 rng(99);
    normal_distribution<double> N(0, 1);
    Vector v(n);
    for (int j = 0; j < n; ++j) v[j] = N(rng);
    double nv = norm2(v);
    for (int j = 0; j < n; ++j) v[j] /= (nv > 0 ? nv : 1.0);
    double prev = 0;
    for (int t = 0; t < iters; ++t) {
        Vector w = matVec(A, v);
        Vector z = matTVec(A, w);
        double nz = norm2(z);
        if (nz > 0) for (int j = 0; j < n; ++j) v[j] = z[j] / nz;
        double lam = dot(v, matTVec(A, matVec(A, v)));
        if (fabs(lam - prev) <= tol * max(1.0, fabs(prev))) break;
        prev = lam;
    }
    return v;
}

struct SVDResult { Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters = 200, double tol = 1e-8) {
    int m = A.size(), n = A[0].size();
    k = min(k, min(m, n));
    Matrix U(m, vector<double>(k, 0.0));
    Matrix V(n, vector<double>(k, 0.0));
    Vector S(k, 0.0);
    auto deflate = [&](const Vector &u, const Vector &v, double s) {
        for (int i = 0; i < m; ++i) { double ui = u[i] * s; for (int j = 0; j < n; ++j) A[i][j] -= ui * v[j]; }
    };
    for (int t = 0; t < k; ++t) {
        Vector v = dominantRightSingularVector(A, iters, tol);
        Vector Av = matVec(A, v);
        double s = norm2(Av);
        if (s <= 1e-14) break;
        Vector u(m);
        for (int i = 0; i < m; ++i) u[i] = Av[i] / s;
        for (int i = 0; i < m; ++i) U[i][t] = u[i];
        for (int j = 0; j < n; ++j) V[j][t] = v[j];
        S[t] = s;
        deflate(u, v, s);
    }
    return {U, S, V};
}

// Compute x = A^+ b approximately using top-k SVD with thresholding of singular values
Vector leastSquaresSVD(const Matrix &A, const Vector &b, int k, double relTol = 1e-12) {
    int m = A.size(), n = A[0].size();
    SVDResult svd = topkSVD(A, k, 400, 1e-10);
    // Compute y = U^T b
    int r = svd.S.size();
    Vector y(r, 0.0);
    for (int t = 0; t < r; ++t) {
        double s = 0;
        for (int i = 0; i < m; ++i) s += svd.U[i][t] * b[i];
        y[t] = s;
    }
    // Threshold singular values to avoid division by near-zero
    double sigma_max = 0.0;
    for (double s : svd.S) sigma_max = max(sigma_max, s);
    double tau = relTol * max(m, n) * sigma_max;
    // z_t = y_t / sigma_t if sigma_t > tau else 0
    Vector z(r, 0.0);
    for (int t = 0; t < r; ++t) {
        double s = svd.S[t];
        z[t] = (s > tau) ? y[t] / s : 0.0;
    }
    // x = V z (since A^+ b = V Sigma^+ U^T b)
    Vector x(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0;
        for (int t = 0; t < r; ++t) s += svd.V[j][t] * z[t];
        x[j] = s;
    }
    return x;
}

int main() {
    // Overdetermined system Ax ≈ b with noise
    int m = 20, n = 3;
    Matrix A(m, vector<double>(n, 0.0));
    Vector true_x = {2.0, -1.0, 0.5};
    std::mt19937 rng(2024);
    std::uniform_real_distribution<double> U(-1.0, 1.0);
    std::normal_distribution<double> N(0.0, 0.02);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) A[i][j] = U(rng);
    Vector b(m, 0.0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) b[i] += A[i][j] * true_x[j];
        b[i] += N(rng);
    }

    // Solve with top-k SVD (k = n is fine here)
    Vector x = leastSquaresSVD(A, b, /*k=*/n, /*relTol=*/1e-12);

    cout << fixed << setprecision(6);
    cout << "Estimated x: ";
    for (double xi : x) cout << xi << " ";
    cout << "\n";
    return 0;
}
This program solves min_x ||Ax − b||_2 using a truncated SVD-based pseudoinverse. It computes (approximate) U, Σ, V, thresholds small singular values to avoid instability, and returns x = VΣ^{+}U^{T}b. On noisy overdetermined problems, this approach is stable and near-optimal; with stricter thresholding it behaves like truncated SVD regularization.
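A quick way to sanity-check the result is to print the residual at the end of main; the sketch below reuses the matVec and norm2 helpers already defined in this example, and the residual should come out on the order of the injected noise level:

// At the end of main(), after computing x:
Vector Ax = matVec(A, x);
Vector resid(m, 0.0);
for (int i = 0; i < m; ++i) resid[i] = Ax[i] - b[i];
cout << "Residual ||Ax - b||_2: " << norm2(resid) << "\n";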