📚 Theory · Intermediate

Singular Value Decomposition (SVD)

Key Points

  • Singular Value Decomposition (SVD) factors any m×n matrix A into A = UΣV^{T}, where U and V are orthogonal and Σ is diagonal with nonnegative entries.
  • The columns of U and V are orthonormal bases of the output and input spaces, and the diagonal entries of Σ are the singular values, sorted from largest to smallest.
  • SVD connects to eigenvalues: A^{T}A = VΣ^{2}V^{T} and AA^{T} = UΣ^{2}U^{T}, so singular values are square roots of the eigenvalues of those matrices.
  • Truncated SVD keeps only the top k singular values/vectors and gives the best rank-k approximation in both Frobenius and spectral norms.
  • SVD powers PCA: principal components are the right singular vectors of centered data, and singular values relate to explained variance.
  • The Moore–Penrose pseudoinverse is A^{+} = VΣ^{+}U^{T}; it solves least squares and stabilizes ill-conditioned problems.
  • Computational complexity of full SVD is O(mn·min(m,n)); approximate methods (power iteration/randomized SVD) can be much faster for low-rank structure.
  • SVD is widely used in ML for dimensionality reduction, recommender systems, topic modeling (LSA), and matrix completion.

Prerequisites

  • Vectors and matrices: Understand matrix dimensions, multiplication, transposition, and norms to follow SVD definitions and code.
  • Orthogonality and dot products: SVD relies on orthonormal vectors and properties of orthogonal matrices.
  • Eigenvalues and eigenvectors: SVD is linked to eigen-decomposition via A^{T}A and AA^{T}.
  • Least squares optimization: SVD is used to solve min_x ||Ax − b||_2 and to define the pseudoinverse.
  • Numerical stability and conditioning: Thresholding small singular values and avoiding explicit A^{T}A require an understanding of conditioning.
  • Basic C++ programming: Implementations require working with vectors, loops, and numerical computations in C++.

Detailed Explanation


01 Overview

Singular Value Decomposition (SVD) is a fundamental matrix factorization that expresses any real m×n matrix A as A = UΣV^{T}. Here U (m×m) and V (n×n) are orthogonal (their columns form orthonormal bases), and Σ (m×n) is diagonal with nonnegative entries called singular values. Intuitively, SVD rotates and stretches space: V rotates input coordinates, Σ stretches them along axis-aligned directions, and U rotates to the output coordinates. The singular values indicate how much A amplifies vectors in specific directions, while the singular vectors (columns of U and V) give those directions. SVD exists for every matrix, square or rectangular, and is numerically stable, making it a workhorse in scientific computing.

SVD bridges many concepts: it links to eigen-decomposition through A^{T}A = VΣ^{2}V^{T} and AA^{T} = UΣ^{2}U^{T}; it provides the Moore–Penrose pseudoinverse A^{+} = VΣ^{+}U^{T}; and it yields the best low-rank approximation via truncated SVD, which underlies PCA and data compression. In machine learning, SVD compresses data, removes noise, and exposes latent structure (e.g., topics in documents, user/item factors in recommender systems). Although exact SVD has cubic-like cost in the smaller dimension, approximate methods (power iteration, Lanczos, randomized SVD) efficiently recover the top k components when data are approximately low rank.
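
To make the eigenvalue connection concrete, the following minimal C++ sketch (added for illustration; it is not one of the worked examples below) hard-codes a 2×2 matrix, forms A^{T}A, solves the resulting 2×2 symmetric eigenproblem in closed form, and takes square roots to obtain the singular values. The product σ_1·σ_2 = |det A| serves as a sanity check.

#include <cmath>
#include <cstdio>

int main() {
    // Hard-coded example matrix A = [[3, 1], [0, 2]]
    double a = 3.0, b = 1.0, c = 0.0, d = 2.0;

    // B = A^T A is a symmetric 2x2 matrix [[p, q], [q, r]]
    double p = a * a + c * c;
    double q = a * b + c * d;
    double r = b * b + d * d;

    // Eigenvalues of B from the characteristic quadratic
    double mean = 0.5 * (p + r);
    double disc = std::sqrt(0.25 * (p - r) * (p - r) + q * q);
    double lambda1 = mean + disc;   // larger eigenvalue of A^T A
    double lambda2 = mean - disc;   // smaller eigenvalue of A^T A

    // Singular values of A are the square roots of these eigenvalues
    double sigma1 = std::sqrt(lambda1);
    double sigma2 = std::sqrt(lambda2);
    std::printf("sigma1 = %.6f, sigma2 = %.6f\n", sigma1, sigma2);

    // Sanity check: sigma1 * sigma2 should equal |det(A)|
    std::printf("sigma1*sigma2 = %.6f, |det(A)| = %.6f\n",
                sigma1 * sigma2, std::fabs(a * d - b * c));
    return 0;
}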

02 Intuition & Analogies

Imagine A as a machine that takes an input arrow (vector) and outputs another arrow. Most linear machines twist and stretch arrows in complicated ways. SVD says: regardless of how messy that twist is, it’s always equivalent to three simple steps: (1) rotate the input arrows so they align with special directions, (2) stretch or shrink each aligned arrow independently, and (3) rotate again to the final orientation. The first rotation is V^{T}, the stretching is Σ, and the final rotation is U.

Think of a sheet of rubber with a printed grid. Applying A is like grabbing the sheet, rotating it, and pulling it so some grid lines get longer while others shorten, then rotating again. The grid directions that get purely stretched (no shearing) are the singular directions; the stretch amounts are the singular values. Large singular values correspond to directions where A keeps or amplifies information; small ones correspond to directions where A squashes information and potentially loses detail.

For data, picture points in high-dimensional space (rows are samples). PCA via SVD finds the most “energetic” directions along which the cloud of points varies. Keeping only the top few directions (truncated SVD) is like projecting the cloud onto a low-dimensional plane that preserves most variance, simplifying the data while discarding noise. In text analysis (LSA), the document–term matrix’s SVD uncovers latent topics: words and documents align along a few dominant directions that summarize co-occurrence patterns. In recommender systems, user–item ratings form a matrix whose dominant singular directions capture user tastes and item attributes, enabling predictions for missing entries.

03 Formal Definition

For A ∈ ℝ^{m×n}, an SVD is A = UΣV^{T}, where U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal (U^{T}U = I_m, V^{T}V = I_n), and Σ ∈ ℝ^{m×n} is diagonal with σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0 and remaining zeros, where r = rank(A). The columns u_i of U are left singular vectors, the columns v_i of V are right singular vectors, and the σ_i are singular values. They satisfy Av_i = σ_i u_i and A^{T}u_i = σ_i v_i. Equivalently, A^{T}A = VΣ^{T}ΣV^{T} and AA^{T} = UΣΣ^{T}U^{T}; thus the nonzero singular values {σ_i} are the square roots of the nonzero eigenvalues of both A^{T}A and AA^{T}, and the v_i (resp. u_i) are their corresponding eigenvectors. The reduced (or thin) SVD keeps only the first r columns: A = U_rΣ_rV_r^{T} with U_r ∈ ℝ^{m×r}, Σ_r ∈ ℝ^{r×r}, V_r ∈ ℝ^{n×r}. The Moore–Penrose pseudoinverse is A^{+} = VΣ^{+}U^{T}, where Σ^{+} inverts the nonzero singular values and transposes the shape. The best rank-k approximation (Eckart–Young–Mirsky theorem) is A_k = U_kΣ_kV_k^{T} = σ_1u_1v_1^{T} + … + σ_ku_kv_k^{T}, minimizing the error in both spectral norm and Frobenius norm over all rank-k matrices.
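
A small worked instance (added here for concreteness) makes these relations tangible and also shows why singular values are not eigenvalues in general:

A = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}
  = \underbrace{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}_{U}
    \underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}}_{\Sigma}
    \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{V^{T}},
\qquad
A v_1 = \begin{pmatrix} 0 \\ -3 \end{pmatrix} = 3\begin{pmatrix} 0 \\ -1 \end{pmatrix} = \sigma_1 u_1 .

Here σ_1 = 3 and σ_2 = 2, while the eigenvalues of A are 2 and −3: singular values are the (always nonnegative) stretch factors, sorted in decreasing order.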

04 When to Use

  • Dimensionality reduction: When data lie near a low-dimensional subspace (e.g., images, text embeddings), truncated SVD/PCA preserves most variance with fewer features; one common way to choose k is sketched after this list.
  • Noise reduction/denoising: When small singular values mostly reflect noise, truncation improves signal-to-noise ratio in imaging, genomics, and sensor data.
  • Least squares and pseudoinverse: For overdetermined or ill-conditioned systems, the SVD-based A^{+} yields the minimum-norm least-squares solution and allows stable regularization (thresholding small σ_i).
  • Recommender systems and matrix completion: User–item matrices are often approximately low rank; SVD (or variants) extracts latent factors for prediction.
  • Topic modeling (LSA/LSI): SVD on document–term matrices uncovers latent semantic dimensions.
  • Compression: Store only U_k, Σ_k, V_k^{T} for images/audio to reduce storage while controlling reconstruction error.
  • Understanding models: In deep learning, analyzing Jacobians/weight matrices via singular spectra reveals conditioning and training dynamics.

Use exact SVD for small/medium matrices or when the full spectrum is needed; use approximate/top-k methods (power iteration, Lanczos, randomized SVD) for large sparse or tall/skinny matrices when only the leading components matter.
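
The sketch promised above (added here for illustration; the helper name chooseRankByEnergy and the target fraction are illustrative, not from the original text) picks the smallest k whose cumulative squared singular values reach a desired fraction of the total "energy":

#include <cstdio>
#include <vector>

// Assumes sigma holds singular values sorted in descending order.
// Returns the smallest k whose cumulative energy (sum of squared singular
// values) reaches targetFraction of the total.
int chooseRankByEnergy(const std::vector<double>& sigma, double targetFraction) {
    double total = 0.0;
    for (double s : sigma) total += s * s;
    if (total <= 0.0) return 0;
    double running = 0.0;
    for (std::size_t k = 0; k < sigma.size(); ++k) {
        running += sigma[k] * sigma[k];
        if (running / total >= targetFraction) return (int)k + 1;
    }
    return (int)sigma.size();
}

int main() {
    std::vector<double> sigma = {5.0, 2.5, 0.3, 0.1, 0.05};  // hypothetical spectrum
    int k = chooseRankByEnergy(sigma, 0.95);                  // keep 95% of the energy
    std::printf("chosen k = %d\n", k);                        // prints k = 2 for this spectrum
    return 0;
}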

⚠️ Common Mistakes

  • Confusing eigenvalues with singular values: For non-symmetric A, eigenvalues can be complex or negative; singular values are always nonnegative and come from A^{T}A or AA^{T}.
  • Skipping centering in PCA: PCA on raw features without subtracting the mean biases principal components; always center columns (and optionally scale) before SVD for PCA.
  • Not sorting/truncating properly: Ensure singular values are sorted and that U_k, Σ_k, V_k correspond to the same top-k indices; mixing indices leads to incorrect reconstructions.
  • Ignoring numerical thresholds: Inverting very small σ_i explodes numerical errors; use a tolerance (e.g., τ = ε·max(m,n)·σ_1) to set small singular values to zero in Σ^{+}.
  • Forming A^{T}A explicitly: This squares the condition number and can lose accuracy; prefer algorithms that avoid explicit A^{T}A for numerical stability when possible.
  • Mismanaging shapes: Remember U is m×m (or m×r), Σ is m×n, V is n×n (or n×r). Reconstruction uses UΣV^{T}; forgetting the transpose or using mismatched dimensions causes runtime errors.
  • Overusing full SVD on huge data: Full SVD is O(mn·min(m,n)); choose top-k approximations or randomized methods when only leading components are needed.

Key Formulas

SVD factorization

A = UΣV^{T}

Explanation: Any real matrix A can be factored into an orthogonal–diagonal–orthogonal product. U and V rotate, while Σ stretches along coordinate axes.

Link to eigendecomposition

A^{T}A = VΣ^{T}ΣV^{T},   AA^{T} = UΣΣ^{T}U^{T}

Explanation: The squares of the singular values are eigenvalues of A^{T}A and AA^{T}. Right and left singular vectors are the corresponding eigenvectors.

Singular vector relations

Av_i = σ_i u_i,   A^{T}u_i = σ_i v_i

Explanation: Each right singular vector maps to its left counterpart scaled by its singular value, and vice versa for A^{T}.

Truncated SVD (rank-k)

A_k = U_kΣ_kV_k^{T} = σ_1u_1v_1^{T} + … + σ_ku_kv_k^{T}

Explanation: The best rank-k approximation keeps the top k singular triplets and discards the rest, yielding a sum of k rank-1 matrices.

Eckart–Young–Mirsky error

||A − A_k||_2 = σ_{k+1},   ||A − A_k||_F = (σ_{k+1}^2 + … + σ_r^2)^{1/2}

Explanation: The spectral norm error equals the next singular value; the Frobenius error is the root-sum-of-squares of the discarded singular values.

Moore–Penrose pseudoinverse

A^{+} = VΣ^{+}U^{T}, where Σ^{+} has 1/σ_i on the diagonal for σ_i > 0

Explanation: Invert nonzero singular values (take reciprocals) and swap dimensions to construct the pseudoinverse, enabling least-squares solutions.
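
To tie this to the tolerance discussed under Common Mistakes, here is a small sketch (added for illustration; pseudoinverseDiagonal is a hypothetical helper name) that inverts only the singular values above a relative threshold τ = ε·max(m,n)·σ_1 and zeroes the rest instead of dividing by them:

#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

// Assumes sigma holds the singular values of an m x n matrix, sorted descending.
// Returns the diagonal of Sigma^+, inverting only values above the threshold tau.
std::vector<double> pseudoinverseDiagonal(const std::vector<double>& sigma, int m, int n) {
    std::vector<double> sigmaPlus(sigma.size(), 0.0);
    if (sigma.empty()) return sigmaPlus;
    double sigmaMax = *std::max_element(sigma.begin(), sigma.end());
    double tau = std::numeric_limits<double>::epsilon() * std::max(m, n) * sigmaMax;
    for (std::size_t i = 0; i < sigma.size(); ++i) {
        sigmaPlus[i] = (sigma[i] > tau) ? 1.0 / sigma[i] : 0.0;  // drop near-zero directions
    }
    return sigmaPlus;
}

int main() {
    std::vector<double> sigma = {4.0, 1.0, 1e-17};                  // last value is numerically zero
    std::vector<double> sp = pseudoinverseDiagonal(sigma, 100, 50);
    for (double v : sp) std::printf("%g ", v);                      // prints: 0.25 1 0
    std::printf("\n");
    return 0;
}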

Least-squares via SVD

x = A^{+}b = VΣ^{+}U^{T}b

Explanation: The minimum-norm solution to min_x ||Ax − b||_2 is obtained by applying U^{T} to b, scaling by 1/σ_i, then applying V.

Explained variance (PCA)

EVR_k = (σ_1^2 + … + σ_k^2) / (σ_1^2 + … + σ_r^2)

Explanation: For centered data, the fraction of total variance captured by the first k principal components equals the sum of the squared top singular values divided by the total.

2-norm condition number

κ_2(A) = σ_max / σ_min = σ_1 / σ_r

Explanation: Measures sensitivity of linear solves; a large ratio indicates ill-conditioning and potential numerical instability.
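
As a tiny illustration (added here; conditionNumber2 is an illustrative name, and the singular values are assumed sorted in descending order), the condition number falls straight out of the singular spectrum:

#include <cstdio>
#include <limits>
#include <vector>

// kappa_2(A) = sigma_max / sigma_min, computed from a descending spectrum.
double conditionNumber2(const std::vector<double>& sigma) {
    if (sigma.empty() || sigma.back() <= 0.0)
        return std::numeric_limits<double>::infinity();  // empty or rank-deficient
    return sigma.front() / sigma.back();
}

int main() {
    std::vector<double> wellConditioned = {10.0, 8.0, 5.0};
    std::vector<double> illConditioned  = {10.0, 1.0, 1e-10};
    std::printf("kappa = %g\n", conditionNumber2(wellConditioned));  // 2
    std::printf("kappa = %g\n", conditionNumber2(illConditioned));   // 1e+11
    return 0;
}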

SVD complexity

O(mn·min(m,n)) time for a full dense SVD

Explanation: Classical algorithms reduce A to bidiagonal form and then diagonalize, leading to cubic-like cost in the smaller dimension.

Complexity Analysis

Exact SVD of an m×n matrix typically costs O(mn·min(m,n)) time and O(mn) space to store U, Σ, and V in dense form. Practical algorithms (Golub–Kahan bidiagonalization followed by QR or divide-and-conquer) first reduce A to bidiagonal form in O(mn·min(m,n)), then compute the SVD of the bidiagonal matrix in O(min(m,n)^3). Memory usage is dominated by storing A and (optionally) U and V; forming the full factors requires O(m^2 + n^2) additional space if both orthogonal matrices are built explicitly.

When only the leading k singular values/vectors are needed (k ≪ min(m,n)), iterative methods reduce the cost dramatically. A simple power-iteration/deflation approach computes the dominant singular triplet in O(T·mn) time, where T is the number of iterations, and extends to k components in O(k·T·mn). Lanczos bidiagonalization and randomized SVD further improve the constants and, for sparse matrices, often converge in roughly O((m+n)·k·T) by exploiting cheap matrix–vector multiplies. Space usage for top-k methods is O((m+n)k) to store U_k, Σ_k, V_k, plus temporary vectors.

Numerically, forming A^{T}A explicitly can square the condition number and lose precision; algorithms that avoid explicit A^{T}A (Golub–Kahan, Lanczos bidiagonalization) maintain better stability. For pseudoinverses and least squares, truncating or damping small singular values (threshold τ) prevents division by near-zero values and improves robustness, at the expense of introducing bias that often reduces variance in noisy settings (akin to regularization).

Code Examples

Top-k SVD via power iteration with deflation (educational, dense)
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Utility: create m x n matrix filled with val
Matrix zeros(int m, int n, double val = 0.0) {
    return Matrix(m, vector<double>(n, val));
}

// Multiply A (m x n) by x (n)
Vector matVec(const Matrix &A, const Vector &x) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Multiply A^T (n x m) by y (m)
Vector matTVec(const Matrix &A, const Vector &y) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vector x(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += A[i][j] * y[i];
        x[j] = s;
    }
    return x;
}

// y <- y - alpha * x
void axpy(Vector &y, const Vector &x, double alpha) {
    for (size_t i = 0; i < y.size(); ++i) y[i] -= alpha * x[i];
}

// dot product
double dot(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// 2-norm
double norm2(const Vector &x) {
    return sqrt(max(0.0, dot(x, x)));
}

// Deflate A by rank-1 component: A <- A - sigma * u * v^T
void deflate(Matrix &A, const Vector &u, const Vector &v, double sigma) {
    int m = (int)A.size(), n = (int)A[0].size();
    for (int i = 0; i < m; ++i) {
        double ui = u[i] * sigma;
        for (int j = 0; j < n; ++j) {
            A[i][j] -= ui * v[j];
        }
    }
}

// Power iteration to approximate dominant right singular vector of A
// Avoids forming A^T A by using v <- A^T(A v) repeatedly.
Vector dominantRightSingularVector(const Matrix &A, int iters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    std::mt19937 rng(42);
    std::normal_distribution<double> N(0.0, 1.0);
    Vector v(n);
    for (int j = 0; j < n; ++j) v[j] = N(rng);
    double nv = norm2(v);
    for (int j = 0; j < n; ++j) v[j] /= (nv > 0 ? nv : 1.0);

    double prev_lambda = 0.0;
    for (int t = 0; t < iters; ++t) {
        // w = A v
        Vector w = matVec(A, v);
        // z = A^T w = (A^T A) v
        Vector z = matTVec(A, w);
        double nz = norm2(z);
        if (nz > 0) for (int j = 0; j < n; ++j) v[j] = z[j] / nz;
        // Rayleigh quotient for eigenvalue of A^T A (approx lambda = v^T (A^T A) v)
        double lambda = dot(v, matTVec(A, matVec(A, v)));
        if (fabs(lambda - prev_lambda) <= tol * max(1.0, fabs(prev_lambda))) break;
        prev_lambda = lambda;
    }
    return v;
}

// Compute top-k SVD via repeated power iteration + deflation (educational; not production-grade)
struct SVDResult { Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters = 200, double tol = 1e-8) {
    int m = (int)A.size(), n = (int)A[0].size();
    k = min(k, min(m, n));
    Matrix U = zeros(m, k);
    Matrix V = zeros(n, k);
    Vector S(k, 0.0);

    for (int idx = 0; idx < k; ++idx) {
        // Find dominant right singular vector of current (deflated) A
        Vector v = dominantRightSingularVector(A, iters, tol);
        // Compute u = A v / ||A v|| and sigma = ||A v||
        Vector Av = matVec(A, v);
        double sigma = norm2(Av);
        Vector u(m, 0.0);
        if (sigma > 0) {
            for (int i = 0; i < m; ++i) u[i] = Av[i] / sigma;
        } else {
            // If sigma ~ 0, no more components
            break;
        }
        // Store
        for (int i = 0; i < m; ++i) U[i][idx] = u[i];
        for (int j = 0; j < n; ++j) V[j][idx] = v[j];
        S[idx] = sigma;
        // Deflate A <- A - sigma * u v^T
        deflate(A, u, v, sigma);
    }
    return {U, S, V};
}

// Reconstruct A_k = U_k Sigma_k V_k^T
Matrix reconstruct(const SVDResult &res) {
    int m = (int)res.U.size();
    int k = (int)res.S.size();
    int n = (int)res.V.size();
    Matrix Ak = zeros(m, n);
    for (int t = 0; t < k; ++t) {
        double s = res.S[t];
        for (int i = 0; i < m; ++i) {
            double ui = res.U[i][t] * s;
            for (int j = 0; j < n; ++j) Ak[i][j] += ui * res.V[j][t];
        }
    }
    return Ak;
}

// Frobenius norm of difference
double frobError(const Matrix &A, const Matrix &B) {
    int m = (int)A.size(), n = (int)A[0].size();
    double s = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double d = A[i][j] - B[i][j];
            s += d * d;
        }
    return sqrt(max(0.0, s));
}

int main() {
    // Build a synthetic low-rank matrix A = sum_{i=1}^2 sigma_i u_i v_i^T + noise
    int m = 8, n = 6;
    Matrix A = zeros(m, n);
    // Rank-2 structure
    Vector u1 = {1,0,0,0,1,0,0,0}; double s1 = 5.0; // sparse direction
    Vector v1 = {1,0,1,0,1,0};
    Vector u2 = {0,1,0,1,0,1,0,1}; double s2 = 2.5;
    Vector v2 = {0,1,0,1,0,1};
    auto addOuter = [&](const Vector &u, const Vector &v, double s) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j)
                A[i][j] += s * u[i] * v[j];
    };
    addOuter(u1, v1, s1);
    addOuter(u2, v2, s2);
    // Add small noise
    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 0.05);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] += N(rng);

    // Compute top-2 SVD
    SVDResult res = topkSVD(A, 2, 300, 1e-9);
    Matrix Ak = reconstruct(res);

    cout << fixed << setprecision(6);
    cout << "Top-2 singular values (approx): ";
    for (double s : res.S) cout << s << ' ';
    cout << "\n";
    cout << "Frobenius reconstruction error ||A - A2||_F: " << frobError(A, Ak) << "\n";
    return 0;
}

This educational implementation computes the top-k singular values/vectors using power iteration on A^{T}A and rank-1 deflation. For each component, it finds an approximate dominant right singular vector v, computes u = Av / ||Av|| and σ = ||Av||, stores (u, σ, v), and deflates A ← A − σuv^{T}. The result reconstructs A_k and illustrates that a low-rank matrix is well-approximated by its top components. It avoids explicitly forming A^{T}A but still multiplies by A and A^{T}. This is not production-grade (no reorthogonalization or advanced stopping), but it conveys the mechanics of truncated SVD.

Time: O(k · T · m · n), where T is the number of power iterations per component.
Space: O(mn) to store A plus O((m+n)k) for U_k, V_k, and σ.
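
One way to address the missing reorthogonalization noted above (a sketch, not part of the original listing) is to project each power-iteration candidate v against the right singular vectors already accepted, then renormalize; inside dominantRightSingularVector this would be called once per iteration. The helper below repeats the Vector alias and a dot product so the snippet stands alone; when merged into the listing, the existing helpers would be reused.

#include <cmath>
#include <vector>

using Vector = std::vector<double>;

static double dotProd(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// One pass of (modified) Gram-Schmidt: remove from v its components along each
// previously accepted right singular vector, then renormalize v.
void reorthogonalize(Vector &v, const std::vector<Vector> &previous) {
    for (const Vector &p : previous) {
        double coeff = dotProd(v, p);
        for (std::size_t i = 0; i < v.size(); ++i) v[i] -= coeff * p[i];
    }
    double nrm = std::sqrt(dotProd(v, v));
    if (nrm > 0.0)
        for (double &x : v) x /= nrm;
}
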
PCA via SVD: center data, compute top principal components, and project
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

// Reuse minimal helpers
Matrix zeros(int m, int n, double val=0.0){ return Matrix(m, vector<double>(n, val)); }
Vector matVec(const Matrix &A, const Vector &x){ int m=A.size(), n=A[0].size(); Vector y(m); for(int i=0;i<m;++i){ double s=0; for(int j=0;j<n;++j) s+=A[i][j]*x[j]; y[i]=s;} return y; }
Vector matTVec(const Matrix &A, const Vector &y){ int m=A.size(), n=A[0].size(); Vector x(n); for(int j=0;j<n;++j){ double s=0; for(int i=0;i<m;++i) s+=A[i][j]*y[i]; x[j]=s;} return x; }
double dot(const Vector &a, const Vector &b){ double s=0; for(size_t i=0;i<a.size();++i) s+=a[i]*b[i]; return s; }
double norm2(const Vector &x){ return sqrt(max(0.0, dot(x,x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters=200, double tol=1e-8){
    int n=A[0].size();
    std::mt19937 rng(7); std::normal_distribution<double> N(0.0,1.0);
    Vector v(n); for(int j=0;j<n;++j) v[j]=N(rng);
    double nv=norm2(v); for(int j=0;j<n;++j) v[j]/=(nv>0?nv:1.0);
    double prev=0;
    for(int t=0;t<iters;++t){
        Vector w=matVec(A,v);
        Vector z=matTVec(A,w);
        double nz=norm2(z);
        if(nz>0) for(int j=0;j<n;++j) v[j]=z[j]/nz;
        double lam=dot(v, matTVec(A, matVec(A,v)));
        if(fabs(lam-prev)<=tol*max(1.0,fabs(prev))) break;
        prev=lam;
    }
    return v;
}

struct SVDResult { Matrix U; Vector S; Matrix V; };

// Simple top-k SVD as in Example 1 (repeated here so this file is self-contained)
SVDResult topkSVD(Matrix A, int k, int iters=200, double tol=1e-8){
    int m=A.size(), n=A[0].size(); k=min(k, min(m,n));
    Matrix U(m, vector<double>(k,0.0)); Matrix V(n, vector<double>(k,0.0)); Vector S(k,0.0);
    auto deflate=[&](const Vector &u, const Vector &v, double s){ for(int i=0;i<m;++i){ double ui=u[i]*s; for(int j=0;j<n;++j) A[i][j]-=ui*v[j]; } };
    for(int t=0;t<k;++t){
        Vector v=dominantRightSingularVector(A, iters, tol);
        Vector Av=matVec(A,v);
        double s=norm2(Av);
        if(s<=1e-14) break;
        Vector u(m); for(int i=0;i<m;++i) u[i]=Av[i]/s;
        for(int i=0;i<m;++i) U[i][t]=u[i];
        for(int j=0;j<n;++j) V[j][t]=v[j];
        S[t]=s;
        deflate(u,v,s);
    }
    return {U,S,V};
}

// Center columns of data matrix X (rows are samples)
void centerColumns(Matrix &X){ int m=X.size(), n=X[0].size(); for(int j=0;j<n;++j){ double mu=0; for(int i=0;i<m;++i) mu+=X[i][j]; mu/=m; for(int i=0;i<m;++i) X[i][j]-=mu; } }

// Project data onto top-k principal components (right singular vectors)
Matrix projectOntoPCs(const Matrix &X, const Matrix &V_k){ int m=X.size(), k=V_k[0].size(), n=X[0].size(); Matrix Z(m, vector<double>(k,0.0)); for(int i=0;i<m;++i){ for(int t=0;t<k;++t){ double s=0; for(int j=0;j<n;++j) s+=X[i][j]*V_k[j][t]; Z[i][t]=s; } } return Z; }

int main(){
    // Toy dataset: 10 samples, 5 features with strong correlation along two directions
    int m=10, n=5; Matrix X(m, vector<double>(n,0.0));
    std::mt19937 rng(123); std::normal_distribution<double> N(0.0,1.0);
    // Generate 2D latent factors, map to 5D via fixed matrix W, add noise
    Matrix W = {{1,0}, {0.8,0.2}, {0.6,0.4}, {0.2,0.8}, {0,1}}; // 5x2
    for(int i=0;i<m;++i){ double z1=N(rng), z2=0.5*N(rng); for(int j=0;j<n;++j){ X[i][j] = W[j][0]*z1 + W[j][1]*z2 + 0.05*N(rng); } }

    // Center columns (critical for PCA)
    centerColumns(X);

    // Compute top-2 SVD on centered data to get principal axes (right singular vectors)
    int k=2; SVDResult res = topkSVD(X, k, 300, 1e-9);

    // Project samples onto PCs (scores)
    Matrix Z = projectOntoPCs(X, res.V); // m x k

    // Explained variance ratio: the total variance equals ||X||_F^2 of the centered
    // data (the sum of ALL squared singular values), so compute it from X directly.
    double total=0; for(int i=0;i<m;++i) for(int j=0;j<n;++j) total += X[i][j]*X[i][j];
    double top=0; for(int t=0;t<k && t<(int)res.S.size(); ++t) top += res.S[t]*res.S[t];
    double evr = (total>0) ? top/total : 0.0;

    cout<<fixed<<setprecision(6);
    cout<<"Top-"<<k<<" singular values (approx): "; for(int t=0;t<k;++t) cout<<res.S[t]<<" "; cout<<"\n";
    cout<<"Explained variance ratio (approx): "<<evr<<"\n";
    cout<<"First 3 projected samples onto PCs:\n";
    for(int i=0;i<min(3,m);++i){ for(int t=0;t<k;++t) cout<<Z[i][t]<<" "; cout<<"\n"; }
    return 0;
}

This example performs PCA with SVD: it centers the data matrix (rows are samples), computes the top-k right singular vectors (principal axes), and projects each sample onto these directions to obtain low-dimensional representations (scores). The squared singular values quantify explained variance, so their ratios report how much variance is retained by k components.

Time: O(k · T · m · n) for top-k SVD plus O(mn) for centering and projection.
Space: O(mn) for data plus O((m+n)k) for components and scores.
Least squares via SVD pseudoinverse with thresholding
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vector = vector<double>;

Matrix zeros(int m, int n, double val=0.0){ return Matrix(m, vector<double>(n, val)); }
Vector matVec(const Matrix &A, const Vector &x){ int m=A.size(), n=A[0].size(); Vector y(m); for(int i=0;i<m;++i){ double s=0; for(int j=0;j<n;++j) s+=A[i][j]*x[j]; y[i]=s;} return y; }
Vector matTVec(const Matrix &A, const Vector &y){ int m=A.size(), n=A[0].size(); Vector x(n); for(int j=0;j<n;++j){ double s=0; for(int i=0;i<m;++i) s+=A[i][j]*y[i]; x[j]=s;} return x; }
double dot(const Vector &a, const Vector &b){ double s=0; for(size_t i=0;i<a.size();++i) s+=a[i]*b[i]; return s; }
double norm2(const Vector &x){ return sqrt(max(0.0, dot(x,x))); }

Vector dominantRightSingularVector(const Matrix &A, int iters=200, double tol=1e-8){
    int n=A[0].size();
    mt19937 rng(99); normal_distribution<double> N(0,1);
    Vector v(n); for(int j=0;j<n;++j) v[j]=N(rng);
    double nv=norm2(v); for(int j=0;j<n;++j) v[j]/=(nv>0?nv:1.0);
    double prev=0;
    for(int t=0;t<iters;++t){
        Vector w=matVec(A,v);
        Vector z=matTVec(A,w);
        double nz=norm2(z);
        if(nz>0) for(int j=0;j<n;++j) v[j]=z[j]/nz;
        double lam=dot(v, matTVec(A, matVec(A,v)));
        if(fabs(lam-prev)<=tol*max(1.0,fabs(prev))) break;
        prev=lam;
    }
    return v;
}

struct SVDResult{ Matrix U; Vector S; Matrix V; };

SVDResult topkSVD(Matrix A, int k, int iters=200, double tol=1e-8){
    int m=A.size(), n=A[0].size(); k=min(k, min(m,n));
    Matrix U(m, vector<double>(k,0.0)); Matrix V(n, vector<double>(k,0.0)); Vector S(k,0.0);
    auto deflate=[&](const Vector &u, const Vector &v, double s){ for(int i=0;i<m;++i){ double ui=u[i]*s; for(int j=0;j<n;++j) A[i][j]-=ui*v[j]; } };
    for(int t=0;t<k;++t){
        Vector v=dominantRightSingularVector(A, iters, tol);
        Vector Av=matVec(A,v);
        double s=norm2(Av);
        if(s<=1e-14) break;
        Vector u(m); for(int i=0;i<m;++i) u[i]=Av[i]/s;
        for(int i=0;i<m;++i) U[i][t]=u[i];
        for(int j=0;j<n;++j) V[j][t]=v[j];
        S[t]=s;
        deflate(u,v,s);
    }
    return {U,S,V};
}

// Compute x = A^+ b approximately using top-k SVD with thresholding on singular values
Vector leastSquaresSVD(const Matrix &A, const Vector &b, int k, double relTol=1e-12){
    int m=A.size(), n=A[0].size();
    SVDResult svd = topkSVD(A, k, 400, 1e-10);
    // Compute y = U^T b
    int r = svd.S.size();
    Vector y(r, 0.0);
    for(int t=0;t<r;++t){ double s=0; for(int i=0;i<m;++i) s += svd.U[i][t]*b[i]; y[t]=s; }
    // Threshold singular values to avoid division by near-zero
    double sigma_max = 0.0; for(double s: svd.S) sigma_max = max(sigma_max, s);
    double tau = relTol * max(m,n) * sigma_max;
    // z_t = y_t / sigma_t if sigma_t > tau else 0
    Vector z(r, 0.0);
    for(int t=0;t<r;++t){ double s = svd.S[t]; if(s > tau) z[t] = y[t] / s; else z[t] = 0.0; }
    // x = V z (since A^+ b = V Σ^+ U^T b)
    Vector x(n, 0.0);
    for(int j=0;j<n;++j){ double s=0; for(int t=0;t<r;++t) s += svd.V[j][t]*z[t]; x[j]=s; }
    return x;
}

int main(){
    // Overdetermined system Ax ≈ b with noise
    int m=20, n=3; Matrix A(m, vector<double>(n,0.0)); Vector true_x = {2.0, -1.0, 0.5};
    std::mt19937 rng(2024); std::uniform_real_distribution<double> U(-1.0, 1.0); std::normal_distribution<double> N(0.0, 0.02);
    for(int i=0;i<m;++i){ for(int j=0;j<n;++j) A[i][j] = U(rng); }
    Vector b(m, 0.0); for(int i=0;i<m;++i){ for(int j=0;j<n;++j) b[i]+=A[i][j]*true_x[j]; b[i]+=N(rng); }

    // Solve with top-k SVD (k = n is fine here)
    Vector x = leastSquaresSVD(A, b, /*k=*/n, /*relTol=*/1e-12);

    cout<<fixed<<setprecision(6);
    cout<<"Estimated x: "; for(double xi: x) cout<<xi<<" "; cout<<"\n";
    return 0;
}

This program solves min_x ||Ax − b||_2 using a truncated SVD-based pseudoinverse. It computes (approximate) U, Σ, V, thresholds small singular values to avoid instability, and returns x = VΣ^{+}U^{T}b. On noisy overdetermined problems, this approach is stable and near-optimal; with stricter thresholding it behaves like truncated SVD regularization.

Time: O(k · T · m · n) for the SVD plus O((m+n)k) for projections and reconstruction.
Space: O(mn) for A plus O((m+n)k) for factors and temporaries.