📚 Theory · Intermediate

Singular Value Decomposition (SVD)

Key Points

  • Singular Value Decomposition (SVD) factors any m×n matrix A into A = UΣV^{T}, where U and V are orthogonal and Σ is diagonal with nonnegative entries.
  • The columns of U and V are orthonormal bases of the output and input spaces, and the diagonal entries of Σ are the singular values, sorted from largest to smallest.
  • SVD connects to eigenvalues: A^{T}A = VΣ^{2}V^{T} and AA^{T} = UΣ^{2}U^{T}, so singular values are square roots of the eigenvalues of those matrices.
  • Truncated SVD keeps only the top k singular values/vectors and gives the best rank-k approximation in both Frobenius and spectral norms.
  • SVD powers PCA: principal components are the right singular vectors of centered data, and singular values relate to explained variance.
  • The Moore–Penrose pseudoinverse is A^{+} = VΣ^{+}U^{T}; it solves least squares and stabilizes ill-conditioned problems.
  • Computational complexity of full SVD is O(mn·min(m,n)); approximate methods (power iteration/randomized SVD) can be much faster for low-rank structure.
  • SVD is widely used in ML for dimensionality reduction, recommender systems, topic modeling (LSA), and matrix completion.

Prerequisites

  • Vectors and matrices: Understand matrix dimensions, multiplication, transposition, and norms to follow SVD definitions and code.
  • Orthogonality and dot products: SVD relies on orthonormal vectors and properties of orthogonal matrices.
  • Eigenvalues and eigenvectors: SVD is linked to eigen-decomposition via A^{T}A and AA^{T}.
  • Least squares optimization: SVD is used to solve min_x ||Ax − b||_2 and to define the pseudoinverse.
  • Numerical stability and conditioning: Thresholding small singular values and avoiding explicit A^{T}A require an understanding of conditioning.
  • Basic C++ programming: Implementations require working with vectors, loops, and numerical computations in C++.

Detailed Explanation


01 Overview

Singular Value Decomposition (SVD) is a fundamental matrix factorization that expresses any real m×n matrix A as A = UΣV^{T}. Here U (m×m) and V (n×n) are orthogonal (their columns form orthonormal bases), and Σ (m×n) is diagonal with nonnegative entries called singular values. Intuitively, SVD rotates and stretches space: V rotates input coordinates, Σ stretches them along axis-aligned directions, and U rotates to the output coordinates. The singular values indicate how much A amplifies vectors in specific directions, while the singular vectors (columns of U and V) give those directions. SVD exists for every matrix, square or rectangular, and is numerically stable, making it a workhorse in scientific computing.

SVD bridges many concepts: it links to eigen-decomposition through A^{T}A = VΣ^{2}V^{T} and AA^{T} = UΣ^{2}U^{T}; it provides the Moore–Penrose pseudoinverse A^{+} = VΣ^{+}U^{T}; and it yields the best low-rank approximation via truncated SVD, which underlies PCA and data compression. In machine learning, SVD compresses data, removes noise, and exposes latent structure (e.g., topics in documents, user/item factors in recommender systems). Although exact SVD has cubic-like cost in the smaller dimension, approximate methods (power iteration, Lanczos, randomized SVD) efficiently recover the top k components when data are approximately low rank.
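
To make the eigenvalue connection concrete, the following minimal C++ sketch (added for illustration; it is not one of the worked examples below) hard-codes a 2×2 matrix, forms A^{T}A, solves the resulting 2×2 symmetric eigenproblem in closed form, and takes square roots to obtain the singular values. The product σ_1·σ_2 = |det A| serves as a sanity check.

#include <cmath>
#include <cstdio>

int main() {
    // Hard-coded example matrix A = [[3, 1], [0, 2]]
    double a = 3.0, b = 1.0, c = 0.0, d = 2.0;

    // B = A^T A is a symmetric 2x2 matrix [[p, q], [q, r]]
    double p = a * a + c * c;
    double q = a * b + c * d;
    double r = b * b + d * d;

    // Eigenvalues of B from the characteristic quadratic
    double mean = 0.5 * (p + r);
    double disc = std::sqrt(0.25 * (p - r) * (p - r) + q * q);
    double lambda1 = mean + disc;   // larger eigenvalue of A^T A
    double lambda2 = mean - disc;   // smaller eigenvalue of A^T A

    // Singular values of A are the square roots of these eigenvalues
    double sigma1 = std::sqrt(lambda1);
    double sigma2 = std::sqrt(lambda2);
    std::printf("sigma1 = %.6f, sigma2 = %.6f\n", sigma1, sigma2);

    // Sanity check: sigma1 * sigma2 should equal |det(A)|
    std::printf("sigma1*sigma2 = %.6f, |det(A)| = %.6f\n",
                sigma1 * sigma2, std::fabs(a * d - b * c));
    return 0;
}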

02 Intuition & Analogies

Imagine A as a machine that takes an input arrow (vector) and outputs another arrow. Most linear machines twist and stretch arrows in complicated ways. SVD says: regardless of how messy that twist is, it’s always equivalent to three simple steps: (1) rotate the input arrows so they align with special directions, (2) stretch or shrink each aligned arrow independently, and (3) rotate again to the final orientation. The first rotation is V^{T}, the stretching is Σ, and the final rotation is U.

Think of a sheet of rubber with a printed grid. Applying A is like grabbing the sheet, rotating it, and pulling it so some grid lines get longer while others shorten, then rotating again. The grid directions that get purely stretched (no shearing) are the singular directions; the stretch amounts are the singular values. Large singular values correspond to directions where A keeps or amplifies information; small ones correspond to directions where A squashes information and potentially loses detail.

For data, picture points in high-dimensional space (rows are samples). PCA via SVD finds the most “energetic” directions along which the cloud of points varies. Keeping only the top few directions (truncated SVD) is like projecting the cloud onto a low-dimensional plane that preserves most variance, simplifying the data while discarding noise. In text analysis (LSA), the document–term matrix’s SVD uncovers latent topics: words and documents align along a few dominant directions that summarize co-occurrence patterns. In recommender systems, user–item ratings form a matrix whose dominant singular directions capture user tastes and item attributes, enabling predictions for missing entries.

03 Formal Definition

For A ∈ ℝ^{m×n}, an SVD is A = UΣV^{T}, where U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal (U^{T}U = I_m, V^{T}V = I_n), and Σ ∈ ℝ^{m×n} is diagonal with σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0 and remaining zeros, where r = rank(A). The columns u_i of U are left singular vectors, the columns v_i of V are right singular vectors, and the σ_i are singular values. They satisfy Av_i = σ_i u_i and A^{T}u_i = σ_i v_i. Equivalently, A^{T}A = VΣ^{T}ΣV^{T} and AA^{T} = UΣΣ^{T}U^{T}; thus the nonzero singular values {σ_i} are the square roots of the nonzero eigenvalues of both A^{T}A and AA^{T}, and the v_i (resp. u_i) are their corresponding eigenvectors. The reduced (or thin) SVD keeps only the first r columns: A = U_rΣ_rV_r^{T} with U_r ∈ ℝ^{m×r}, Σ_r ∈ ℝ^{r×r}, V_r ∈ ℝ^{n×r}. The Moore–Penrose pseudoinverse is A^{+} = VΣ^{+}U^{T}, where Σ^{+} inverts the nonzero singular values and transposes the shape. The best rank-k approximation (Eckart–Young–Mirsky theorem) is A_k = U_kΣ_kV_k^{T} = σ_1u_1v_1^{T} + … + σ_ku_kv_k^{T}, minimizing the error in both spectral norm and Frobenius norm over all rank-k matrices.
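
A small worked instance (added here for concreteness) makes these relations tangible and also shows why singular values are not eigenvalues in general:

A = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}
  = \underbrace{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}_{U}
    \underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}}_{\Sigma}
    \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{V^{T}},
\qquad
A v_1 = \begin{pmatrix} 0 \\ -3 \end{pmatrix} = 3\begin{pmatrix} 0 \\ -1 \end{pmatrix} = \sigma_1 u_1 .

Here σ_1 = 3 and σ_2 = 2, while the eigenvalues of A are 2 and −3: singular values are the (always nonnegative) stretch factors, sorted in decreasing order.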

04 When to Use

  • Dimensionality reduction: When data lie near a low-dimensional subspace (e.g., images, text embeddings), truncated SVD/PCA preserves most variance with fewer features; one common way to choose k is sketched after this list.
  • Noise reduction/denoising: When small singular values mostly reflect noise, truncation improves signal-to-noise ratio in imaging, genomics, and sensor data.
  • Least squares and pseudoinverse: For overdetermined or ill-conditioned systems, the SVD-based A^{+} yields the minimum-norm least-squares solution and allows stable regularization (thresholding small σ_i).
  • Recommender systems and matrix completion: User–item matrices are often approximately low rank; SVD (or variants) extracts latent factors for prediction.
  • Topic modeling (LSA/LSI): SVD on document–term matrices uncovers latent semantic dimensions.
  • Compression: Store only U_k, Σ_k, V_k^{T} for images/audio to reduce storage while controlling reconstruction error.
  • Understanding models: In deep learning, analyzing Jacobians/weight matrices via singular spectra reveals conditioning and training dynamics.

Use exact SVD for small/medium matrices or when the full spectrum is needed; use approximate/top-k methods (power iteration, Lanczos, randomized SVD) for large sparse or tall/skinny matrices when only the leading components matter.
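
The sketch promised above (added here for illustration; the helper name chooseRankByEnergy and the target fraction are illustrative, not from the original text) picks the smallest k whose cumulative squared singular values reach a desired fraction of the total "energy":

#include <cstdio>
#include <vector>

// Assumes sigma holds singular values sorted in descending order.
// Returns the smallest k whose cumulative energy (sum of squared singular
// values) reaches targetFraction of the total.
int chooseRankByEnergy(const std::vector<double>& sigma, double targetFraction) {
    double total = 0.0;
    for (double s : sigma) total += s * s;
    if (total <= 0.0) return 0;
    double running = 0.0;
    for (std::size_t k = 0; k < sigma.size(); ++k) {
        running += sigma[k] * sigma[k];
        if (running / total >= targetFraction) return (int)k + 1;
    }
    return (int)sigma.size();
}

int main() {
    std::vector<double> sigma = {5.0, 2.5, 0.3, 0.1, 0.05};  // hypothetical spectrum
    int k = chooseRankByEnergy(sigma, 0.95);                  // keep 95% of the energy
    std::printf("chosen k = %d\n", k);                        // prints k = 2 for this spectrum
    return 0;
}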

⚠️ Common Mistakes

  • Confusing eigenvalues with singular values: For non-symmetric A, eigenvalues can be complex or negative; singular values are always nonnegative and come from A^{T}A or AA^{T}.
  • Skipping centering in PCA: PCA on raw features without subtracting the mean biases principal components; always center columns (and optionally scale) before SVD for PCA.
  • Not sorting/truncating properly: Ensure singular values are sorted and that U_k, Σ_k, V_k correspond to the same top-k indices; mixing indices leads to incorrect reconstructions.
  • Ignoring numerical thresholds: Inverting very small σ_i explodes numerical errors; use a tolerance (e.g., τ = ε·max(m,n)·σ_1) to set small singular values to zero in Σ^{+}.
  • Forming A^{T}A explicitly: This squares the condition number and can lose accuracy; prefer algorithms that avoid explicit A^{T}A for numerical stability when possible.
  • Mismanaging shapes: Remember U is m×m (or m×r), Σ is m×n, V is n×n (or n×r). Reconstruction uses UΣV^{T}; forgetting the transpose or using mismatched dimensions causes runtime errors.
  • Overusing full SVD on huge data: Full SVD is O(mn·min(m,n)); choose top-k approximations or randomized methods when only leading components are needed.

Key Formulas

SVD factorization

A = UΣV^{T}

Explanation: Any real matrix A can be factored into an orthogonal–diagonal–orthogonal product. U and V rotate, while Σ stretches along coordinate axes.

Link to eigendecomposition

A^{T}A = VΣ^{T}ΣV^{T},   AA^{T} = UΣΣ^{T}U^{T}

Explanation: The squares of the singular values are eigenvalues of A^{T}A and AA^{T}. Right and left singular vectors are the corresponding eigenvectors.

Singular vector relations

Av_i = σ_i u_i,   A^{T}u_i = σ_i v_i

Explanation: Each right singular vector maps to its left counterpart scaled by its singular value, and vice versa for A^{T}.

Truncated SVD (rank-k)

A_k = U_kΣ_kV_k^{T} = σ_1u_1v_1^{T} + … + σ_ku_kv_k^{T}

Explanation: The best rank-k approximation keeps the top k singular triplets and discards the rest, yielding a sum of k rank-1 matrices.

Eckart–Young–Mirsky error

||A − A_k||_2 = σ_{k+1},   ||A − A_k||_F = (σ_{k+1}^2 + … + σ_r^2)^{1/2}

Explanation: The spectral norm error equals the next singular value; the Frobenius error is the root-sum-of-squares of the discarded singular values.

Moore–Penrose pseudoinverse

A^{+} = VΣ^{+}U^{T}, where Σ^{+} has 1/σ_i on the diagonal for σ_i > 0

Explanation: Invert nonzero singular values (take reciprocals) and swap dimensions to construct the pseudoinverse, enabling least-squares solutions.
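
To tie this to the tolerance discussed under Common Mistakes, here is a small sketch (added for illustration; pseudoinverseDiagonal is a hypothetical helper name) that inverts only the singular values above a relative threshold τ = ε·max(m,n)·σ_1 and zeroes the rest instead of dividing by them:

#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

// Assumes sigma holds the singular values of an m x n matrix, sorted descending.
// Returns the diagonal of Sigma^+, inverting only values above the threshold tau.
std::vector<double> pseudoinverseDiagonal(const std::vector<double>& sigma, int m, int n) {
    std::vector<double> sigmaPlus(sigma.size(), 0.0);
    if (sigma.empty()) return sigmaPlus;
    double sigmaMax = *std::max_element(sigma.begin(), sigma.end());
    double tau = std::numeric_limits<double>::epsilon() * std::max(m, n) * sigmaMax;
    for (std::size_t i = 0; i < sigma.size(); ++i) {
        sigmaPlus[i] = (sigma[i] > tau) ? 1.0 / sigma[i] : 0.0;  // drop near-zero directions
    }
    return sigmaPlus;
}

int main() {
    std::vector<double> sigma = {4.0, 1.0, 1e-17};                  // last value is numerically zero
    std::vector<double> sp = pseudoinverseDiagonal(sigma, 100, 50);
    for (double v : sp) std::printf("%g ", v);                      // prints: 0.25 1 0
    std::printf("\n");
    return 0;
}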

Least-squares via SVD

x = A^{+}b = VΣ^{+}U^{T}b

Explanation: The minimum-norm solution to min_x ||Ax − b||_2 is obtained by applying U^{T} to b, scaling by 1/σ_i, then applying V.

Explained variance (PCA)

EVR_k = (σ_1^2 + … + σ_k^2) / (σ_1^2 + … + σ_r^2)

Explanation: For centered data, the fraction of total variance captured by the first k principal components equals the sum of the squared top singular values divided by the total.

2-norm condition number

κ_2(A) = σ_max / σ_min = σ_1 / σ_r

Explanation: Measures sensitivity of linear solves; a large ratio indicates ill-conditioning and potential numerical instability.
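
As a tiny illustration (added here; conditionNumber2 is an illustrative name, and the singular values are assumed sorted in descending order), the condition number falls straight out of the singular spectrum:

#include <cstdio>
#include <limits>
#include <vector>

// kappa_2(A) = sigma_max / sigma_min, computed from a descending spectrum.
double conditionNumber2(const std::vector<double>& sigma) {
    if (sigma.empty() || sigma.back() <= 0.0)
        return std::numeric_limits<double>::infinity();  // empty or rank-deficient
    return sigma.front() / sigma.back();
}

int main() {
    std::vector<double> wellConditioned = {10.0, 8.0, 5.0};
    std::vector<double> illConditioned  = {10.0, 1.0, 1e-10};
    std::printf("kappa = %g\n", conditionNumber2(wellConditioned));  // 2
    std::printf("kappa = %g\n", conditionNumber2(illConditioned));   // 1e+11
    return 0;
}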

SVD complexity

O(mn·min(m,n)) time for a full dense SVD

Explanation: Classical algorithms reduce A to bidiagonal form and then diagonalize, leading to cubic-like cost in the smaller dimension.

Complexity Analysis

Exact SVD of an m×n matrix typically costs O(mn·min(m,n)) time and O(mn) space to store U, Σ, and V in dense form. Practical algorithms (Golub–Kahan bidiagonalization followed by QR or divide-and-conquer) first reduce A to bidiagonal form in O(mn·min(m,n)), then compute the SVD of the bidiagonal matrix in O(min(m,n)^3). Memory usage is dominated by storing A and (optionally) U and V; forming the full factors requires O(m^2 + n^2) additional space if both orthogonal matrices are built explicitly.

When only the leading k singular values/vectors are needed (k ≪ min(m,n)), iterative methods reduce the cost dramatically. A simple power-iteration/deflation approach computes the dominant singular triplet in O(T·mn) time, where T is the number of iterations, and extends to k components in O(k·T·mn). Lanczos bidiagonalization and randomized SVD further improve the constants and, for sparse matrices, often converge in roughly O((m+n)·k·T) by exploiting cheap matrix–vector multiplies. Space usage for top-k methods is O((m+n)k) to store U_k, Σ_k, V_k, plus temporary vectors.

Numerically, forming A^{T}A explicitly can square the condition number and lose precision; algorithms that avoid explicit A^{T}A (Golub–Kahan, Lanczos bidiagonalization) maintain better stability. For pseudoinverses and least squares, truncating or damping small singular values (threshold τ) prevents division by near-zero values and improves robustness, at the expense of introducing bias that often reduces variance in noisy settings (akin to regularization).

Code Examples

Top-k SVD via power iteration with deflation (educational, dense)
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Utility: create m x n matrix filled with val
Matrix zeros(int m, int n, double val = 0.0) {
    return Matrix(m, vector<double>(n, val));
}

// Multiply A (m x n) by x (n)
Vector matVec(const Matrix &A, const Vector &x) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Multiply A^T (n x m) by y (m)
Vector matTVec(const Matrix &A, const Vector &y) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector x(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += A[i][j] * y[i];
        x[j] = s;
    }
    return x;
}

// y <- y - alpha * x
void axpy(Vector &y, const Vector &x, double alpha) {
    for (size_t i = 0; i < y.size(); ++i) y[i] -= alpha * x[i];
}

// dot product
double dot(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// 2-norm
double norm2(const Vector &x) {
    return sqrt(max(0.0, dot(x, x)));
}

// Deflate A by rank-1 component: A <- A - sigma * u * v^T
void deflate(Matrix &A, const Vector &u, const Vector &v, double sigma) {
    int m = (int)A.size(), n = (int)A[0].size();
    for (int i = 0; i < m; ++i) {
        double ui = u[i] * sigma;
        for (int j = 0; j < n; ++j) {
            A[i][j] -= ui * v[j];
        }
    }
}

// Power iteration to approximate dominant right singular vector of A
// Avoids forming A^T A by using v <- A^T(A v) repeatedly.
Vector dominantRightSingularVector(const Matrix &A, int iters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    std::mt19937 rng(42);
    std::normal_distribution<double> N(0.0, 1.0);
    Vector v(n);
    for (int j = 0; j < n; ++j) v[j] = N(rng);
    double nv = norm2(v);
    for (int j = 0; j < n; ++j) v[j] /= (nv > 0 ? nv : 1.0);

    double prev_lambda = 0.0;
    for (int t = 0; t < iters; ++t) {
        // w = A v
        Vector w = matVec(A, v);
        // z = A^T w = (A^T A) v
        Vector z = matTVec(A, w);
        double nz = norm2(z);
        if (nz > 0) for (int j = 0; j < n; ++j) v[j] = z[j] / nz;
        // Rayleigh quotient for eigenvalue of A^T A (approx lambda = v^T (A^T A) v)
        double lambda = dot(v, matTVec(A, matVec(A, v)));
        if (fabs(lambda - prev_lambda) <= tol * max(1.0, fabs(prev_lambda))) break;
        prev_lambda = lambda;
    }
    return v;
}

// Compute top-k SVD via repeated power iteration + deflation (educational; not production-grade)
struct SVDResult { Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters = 200, double tol = 1e-8) {
    int m = (int)A.size(), n = (int)A[0].size();
    k = min(k, min(m, n));
    Matrix U = zeros(m, k);
    Matrix V = zeros(n, k);
    Vector S(k, 0.0);

    for (int idx = 0; idx < k; ++idx) {
        // Find dominant right singular vector of current (deflated) A
        Vector v = dominantRightSingularVector(A, iters, tol);
        // Compute u = A v / ||A v|| and sigma = ||A v||
        Vector Av = matVec(A, v);
        double sigma = norm2(Av);
        Vector u(m, 0.0);
        if (sigma > 0) {
            for (int i = 0; i < m; ++i) u[i] = Av[i] / sigma;
        } else {
            // If sigma ~ 0, no more components
            break;
        }
        // Store
        for (int i = 0; i < m; ++i) U[i][idx] = u[i];
        for (int j = 0; j < n; ++j) V[j][idx] = v[j];
        S[idx] = sigma;
        // Deflate A <- A - sigma * u v^T
        deflate(A, u, v, sigma);
    }
    return {U, S, V};
}

// Reconstruct A_k = U_k Sigma_k V_k^T
Matrix reconstruct(const SVDResult &res) {
    int m = (int)res.U.size();
    int k = (int)res.S.size();
    int n = (int)res.V.size();
    Matrix Ak = zeros(m, n);
    for (int t = 0; t < k; ++t) {
        double s = res.S[t];
        for (int i = 0; i < m; ++i) {
            double ui = res.U[i][t] * s;
            for (int j = 0; j < n; ++j) Ak[i][j] += ui * res.V[j][t];
        }
    }
    return Ak;
}

// Frobenius norm of difference
double frobError(const Matrix &A, const Matrix &B) {
    int m = (int)A.size(), n = (int)A[0].size();
    double s = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double d = A[i][j] - B[i][j];
            s += d * d;
        }
    return sqrt(max(0.0, s));
}

int main() {
    // Build a synthetic low-rank matrix A = sum_{i=1}^2 sigma_i u_i v_i^T + noise
    int m = 8, n = 6;
    Matrix A = zeros(m, n);
    // Rank-2 structure
    Vector u1 = {1,0,0,0,1,0,0,0}; double s1 = 5.0; // sparse direction
    Vector v1 = {1,0,1,0,1,0};
    Vector u2 = {0,1,0,1,0,1,0,1}; double s2 = 2.5;
    Vector v2 = {0,1,0,1,0,1};
    auto addOuter = [&](const Vector &u, const Vector &v, double s) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j)
                A[i][j] += s * u[i] * v[j];
    };
    addOuter(u1, v1, s1);
    addOuter(u2, v2, s2);
    // Add small noise
    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 0.05);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] += N(rng);

    // Compute top-2 SVD
    SVDResult res = topkSVD(A, 2, 300, 1e-9);
    Matrix Ak = reconstruct(res);

    cout << fixed << setprecision(6);
    cout << "Top-2 singular values (approx): ";
    for (double s : res.S) cout << s << ' ';
    cout << "\n";
    cout << "Frobenius reconstruction error ||A - A2||_F: " << frobError(A, Ak) << "\n";
    return 0;
}

This educational implementation computes the top-k singular values/vectors using power iteration on A^{T}A and rank-1 deflation. For each component, it finds an approximate dominant right singular vector v, computes u = Av / ||Av|| and σ = ||Av||, stores (u, σ, v), and deflates A ← A − σuv^{T}. The result reconstructs A_k and illustrates that a low-rank matrix is well-approximated by its top components. It avoids explicitly forming A^{T}A but still multiplies by A and A^{T}. This is not production-grade (no reorthogonalization or advanced stopping), but it conveys the mechanics of truncated SVD.

Time: O(k · T · m · n), where T is the number of power iterations per component.
Space: O(mn) to store A plus O((m+n)k) for U_k, V_k, and σ.
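
One way to address the missing reorthogonalization noted above (a sketch, not part of the original listing) is to project each power-iteration candidate v against the right singular vectors already accepted, then renormalize; inside dominantRightSingularVector this would be called once per iteration. The helper below repeats the Vector alias and a dot product so the snippet stands alone; when merged into the listing, the existing helpers would be reused.

#include <cmath>
#include <vector>

using Vector = std::vector<double>;

static double dotProd(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// One pass of (modified) Gram-Schmidt: remove from v its components along each
// previously accepted right singular vector, then renormalize v.
void reorthogonalize(Vector &v, const std::vector<Vector> &previous) {
    for (const Vector &p : previous) {
        double coeff = dotProd(v, p);
        for (std::size_t i = 0; i < v.size(); ++i) v[i] -= coeff * p[i];
    }
    double nrm = std::sqrt(dotProd(v, v));
    if (nrm > 0.0)
        for (double &x : v) x /= nrm;
}
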
PCA via SVD: center data, compute top principal components, and project
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Reuse minimal helpers
Matrix zeros(int m, int n, double val=0.0){ return Matrix(m, vector<double>(n, val)); }
Vector matVec(const Matrix &A, const Vector &x){ int m=A.size(), n=A[0].size(); Vector y(m); for(int i=0;i<m;++i){ double s=0; for(int j=0;j<n;++j) s+=A[i][j]*x[j]; y[i]=s;} return y; }
Vector matTVec(const Matrix &A, const Vector &y){ int m=A.size(), n=A[0].size(); Vector x(n); for(int j=0;j<n;++j){ double s=0; for(int i=0;i<m;++i) s+=A[i][j]*y[i]; x[j]=s;} return x; }
double dot(const Vector &a, const Vector &b){ double s=0; for(size_t i=0;i<a.size();++i) s+=a[i]*b[i]; return s; }
double norm2(const Vector &x){ return sqrt(max(0.0, dot(x,x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters=200, double tol=1e-8){
    int n=A[0].size();
    std::mt19937 rng(7); std::normal_distribution<double> N(0.0,1.0);
    Vector v(n); for(int j=0;j<n;++j) v[j]=N(rng);
    double nv=norm2(v); for(int j=0;j<n;++j) v[j]/=(nv>0?nv:1.0);
    double prev=0;
    for(int t=0;t<iters;++t){
        Vector w=matVec(A,v);
        Vector z=matTVec(A,w);
        double nz=norm2(z);
        if(nz>0) for(int j=0;j<n;++j) v[j]=z[j]/nz;
        double lam=dot(v, matTVec(A, matVec(A,v)));
        if(fabs(lam-prev)<=tol*max(1.0,fabs(prev))) break;
        prev=lam;
    }
    return v;
}

struct SVDResult { Matrix U; Vector S; Matrix V; };

// Simple top-k SVD as in Example 1 (repeated here so this file is self-contained)
SVDResult topkSVD(Matrix A, int k, int iters=200, double tol=1e-8){
    int m=A.size(), n=A[0].size(); k=min(k, min(m,n));
    Matrix U(m, vector<double>(k,0.0)); Matrix V(n, vector<double>(k,0.0)); Vector S(k,0.0);
    auto deflate=[&](const Vector &u, const Vector &v, double s){ for(int i=0;i<m;++i){ double ui=u[i]*s; for(int j=0;j<n;++j) A[i][j]-=ui*v[j]; } };
    for(int t=0;t<k;++t){
        Vector v=dominantRightSingularVector(A, iters, tol);
        Vector Av=matVec(A,v);
        double s=norm2(Av);
        if(s<=1e-14) break;
        Vector u(m); for(int i=0;i<m;++i) u[i]=Av[i]/s;
        for(int i=0;i<m;++i) U[i][t]=u[i];
        for(int j=0;j<n;++j) V[j][t]=v[j];
        S[t]=s;
        deflate(u,v,s);
    }
    return {U,S,V};
}

// Center columns of data matrix X (rows are samples)
void centerColumns(Matrix &X){ int m=X.size(), n=X[0].size(); for(int j=0;j<n;++j){ double mu=0; for(int i=0;i<m;++i) mu+=X[i][j]; mu/=m; for(int i=0;i<m;++i) X[i][j]-=mu; } }

// Project data onto top-k principal components (right singular vectors)
Matrix projectOntoPCs(const Matrix &X, const Matrix &V_k){ int m=X.size(), k=V_k[0].size(), n=X[0].size(); Matrix Z(m, vector<double>(k,0.0)); for(int i=0;i<m;++i){ for(int t=0;t<k;++t){ double s=0; for(int j=0;j<n;++j) s+=X[i][j]*V_k[j][t]; Z[i][t]=s; } } return Z; }

int main(){
    // Toy dataset: 10 samples, 5 features with strong correlation along two directions
    int m=10, n=5; Matrix X(m, vector<double>(n,0.0));
    std::mt19937 rng(123); std::normal_distribution<double> N(0.0,1.0);
    // Generate 2D latent factors, map to 5D via fixed matrix W, add noise
    Matrix W = {{1,0}, {0.8,0.2}, {0.6,0.4}, {0.2,0.8}, {0,1}}; // 5x2
    for(int i=0;i<m;++i){ double z1=N(rng), z2=0.5*N(rng); for(int j=0;j<n;++j){ X[i][j] = W[j][0]*z1 + W[j][1]*z2 + 0.05*N(rng); } }

    // Center columns (critical for PCA)
    centerColumns(X);

    // Compute top-2 SVD on centered data to get principal axes (right singular vectors)
    int k=2; SVDResult res = topkSVD(X, k, 300, 1e-9);

    // Project samples onto PCs (scores)
    Matrix Z = projectOntoPCs(X, res.V); // m x k

    // Explained variance ratio: the total variance equals ||X||_F^2 of the centered
    // data (the sum of ALL squared singular values), so compute it from X directly.
    double total=0; for(int i=0;i<m;++i) for(int j=0;j<n;++j) total += X[i][j]*X[i][j];
    double top=0; for(int t=0;t<k && t<(int)res.S.size(); ++t) top += res.S[t]*res.S[t];
    double evr = (total>0) ? top/total : 0.0;

    cout<<fixed<<setprecision(6);
    cout<<"Top-"<<k<<" singular values (approx): "; for(int t=0;t<k;++t) cout<<res.S[t]<<" "; cout<<"\n";
    cout<<"Explained variance ratio (approx): "<<evr<<"\n";
    cout<<"First 3 projected samples onto PCs:\n";
    for(int i=0;i<min(3,m);++i){ for(int t=0;t<k;++t) cout<<Z[i][t]<<" "; cout<<"\n"; }
    return 0;
}

This example performs PCA with SVD: it centers the data matrix (rows are samples), computes the top-k right singular vectors (principal axes), and projects each sample onto these directions to obtain low-dimensional representations (scores). The squared singular values quantify explained variance, so their ratios report how much variance is retained by k components.

Time: O(k · T · m · n) for top-k SVD plus O(mn) for centering and projection.
Space: O(mn) for data plus O((m+n)k) for components and scores.
Least squares via SVD pseudoinverse with thresholding
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

Matrix zeros(int m, int n, double val=0.0){ return Matrix(m, vector<double>(n, val)); }
Vector matVec(const Matrix &A, const Vector &x){ int m=A.size(), n=A[0].size(); Vector y(m); for(int i=0;i<m;++i){ double s=0; for(int j=0;j<n;++j) s+=A[i][j]*x[j]; y[i]=s;} return y; }
Vector matTVec(const Matrix &A, const Vector &y){ int m=A.size(), n=A[0].size(); Vector x(n); for(int j=0;j<n;++j){ double s=0; for(int i=0;i<m;++i) s+=A[i][j]*y[i]; x[j]=s;} return x; }
double dot(const Vector &a, const Vector &b){ double s=0; for(size_t i=0;i<a.size();++i) s+=a[i]*b[i]; return s; }
double norm2(const Vector &x){ return sqrt(max(0.0, dot(x,x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters=200, double tol=1e-8){
    int n=A[0].size();
    mt19937 rng(99); normal_distribution<double> N(0,1);
    Vector v(n); for(int j=0;j<n;++j) v[j]=N(rng);
    double nv=norm2(v); for(int j=0;j<n;++j) v[j]/=(nv>0?nv:1.0);
    double prev=0;
    for(int t=0;t<iters;++t){
        Vector w=matVec(A,v);
        Vector z=matTVec(A,w);
        double nz=norm2(z);
        if(nz>0) for(int j=0;j<n;++j) v[j]=z[j]/nz;
        double lam=dot(v, matTVec(A, matVec(A,v)));
        if(fabs(lam-prev)<=tol*max(1.0,fabs(prev))) break;
        prev=lam;
    }
    return v;
}

struct SVDResult{ Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters=200, double tol=1e-8){
    int m=A.size(), n=A[0].size(); k=min(k, min(m,n));
    Matrix U(m, vector<double>(k,0.0)); Matrix V(n, vector<double>(k,0.0)); Vector S(k,0.0);
    auto deflate=[&](const Vector &u, const Vector &v, double s){ for(int i=0;i<m;++i){ double ui=u[i]*s; for(int j=0;j<n;++j) A[i][j]-=ui*v[j]; } };
    for(int t=0;t<k;++t){
        Vector v=dominantRightSingularVector(A, iters, tol);
        Vector Av=matVec(A,v);
        double s=norm2(Av);
        if(s<=1e-14) break;
        Vector u(m); for(int i=0;i<m;++i) u[i]=Av[i]/s;
        for(int i=0;i<m;++i) U[i][t]=u[i];
        for(int j=0;j<n;++j) V[j][t]=v[j];
        S[t]=s;
        deflate(u,v,s);
    }
    return {U,S,V};
}

// Compute x = A^+ b approximately using top-k SVD with thresholding on singular values
Vector leastSquaresSVD(const Matrix &A, const Vector &b, int k, double relTol=1e-12){
    int m=A.size(), n=A[0].size();
    SVDResult svd = topkSVD(A, k, 400, 1e-10);
    // Compute y = U^T b
    int r = svd.S.size();
    Vector y(r, 0.0);
    for(int t=0;t<r;++t){ double s=0; for(int i=0;i<m;++i) s += svd.U[i][t]*b[i]; y[t]=s; }
    // Threshold singular values to avoid division by near-zero
    double sigma_max = 0.0; for(double s: svd.S) sigma_max = max(sigma_max, s);
    double tau = relTol * max(m,n) * sigma_max;
    // z_t = y_t / sigma_t if sigma_t > tau else 0
    Vector z(r, 0.0);
    for(int t=0;t<r;++t){ double s = svd.S[t]; if(s > tau) z[t] = y[t] / s; else z[t] = 0.0; }
    // x = V z (since A^+ b = V Σ^+ U^T b)
    Vector x(n, 0.0);
    for(int j=0;j<n;++j){ double s=0; for(int t=0;t<r;++t) s += svd.V[j][t]*z[t]; x[j]=s; }
    return x;
}

int main(){
    // Overdetermined system Ax ≈ b with noise
    int m=20, n=3; Matrix A(m, vector<double>(n,0.0)); Vector true_x = {2.0, -1.0, 0.5};
    std::mt19937 rng(2024); std::uniform_real_distribution<double> U(-1.0, 1.0); std::normal_distribution<double> N(0.0, 0.02);
    for(int i=0;i<m;++i){ for(int j=0;j<n;++j) A[i][j] = U(rng); }
    Vector b(m, 0.0); for(int i=0;i<m;++i){ for(int j=0;j<n;++j) b[i]+=A[i][j]*true_x[j]; b[i]+=N(rng); }

    // Solve with top-k SVD (k = n is fine here)
    Vector x = leastSquaresSVD(A, b, /*k=*/n, /*relTol=*/1e-12);

    cout<<fixed<<setprecision(6);
    cout<<"Estimated x: "; for(double xi: x) cout<<xi<<" "; cout<<"\n";
    return 0;
}

This program solves min_x ||Ax − b||_2 using a truncated SVD-based pseudoinverse. It computes (approximate) U, Σ, V, thresholds small singular values to avoid instability, and returns x = VΣ^{+}U^{T}b. On noisy overdetermined problems, this approach is stable and near-optimal; with stricter thresholding it behaves like truncated SVD regularization.

Time: O(k · T · m · n) for the SVD plus O((m+n)k) for projections and reconstruction.
Space: O(mn) for A plus O((m+n)k) for factors and temporaries.