📚 Theory · Intermediate

Linear Algebra Theory

Key Points

  • Linear algebra studies vectors, linear combinations, and transformations that preserve addition and scalar multiplication.
  • Matrices represent linear transformations; their columns tell you where basis vectors go.
  • Eigenvalue decomposition factors a matrix into directions that are only stretched, not rotated, by the matrix.
  • The spectral theorem says every real symmetric matrix has orthonormal eigenvectors and can be diagonalized by an orthogonal matrix.
  • Singular Value Decomposition (SVD) factors any matrix into rotations/reflections and axis-aligned scalings; it always exists.
  • Rank equals the number of non-zero singular values, and key norms relate to singular values: the spectral norm is the largest singular value, and the squared Frobenius norm is the sum of the squared singular values.
  • Low-rank approximation via truncated SVD gives the best rank-k approximation in Frobenius norm (Eckart–Young theorem).
  • In AI/ML, linear algebra powers PCA, embeddings, optimization steps, and attention; understanding norms and condition numbers guides stability and generalization.

Prerequisites

  • Basic linear algebra notation (vectors, matrices): to read and write formulas, index entries, and understand matrix–vector products.
  • Calculus (multivariable basics): to interpret gradients, Hessians, and how eigenvalues relate to curvature in optimization.
  • Numerical computing and floating-point arithmetic: to understand rounding errors, stability, and the need for pivoting/orthonormalization.
  • Probability and statistics: to connect SVD with PCA and covariance matrices in ML applications.

Detailed Explanation


01 Overview

Linear algebra is the language of vectors and the transformations that act on them. It studies vector spaces (sets of objects you can add and scale) and linear transformations that preserve these operations. In computation, we represent linear transformations as matrices, so manipulating matrices is central. Concepts like linear independence, basis, and dimension tell us how many directions of freedom a vector space has, and which minimal set of vectors can generate all others. Eigenvalues and eigenvectors reveal special directions where a transformation only scales vectors rather than rotating them. For real symmetric matrices, the spectral theorem guarantees an orthonormal basis of eigenvectors, making such matrices particularly well-behaved. The Singular Value Decomposition (SVD) extends these ideas to any rectangular matrix, factoring it into orthogonal matrices and a diagonal matrix of nonnegative singular values. These singular values connect directly to matrix properties like rank, operator norms, and conditioning. In modern AI/ML, these tools are indispensable: PCA (an SVD of the centered data matrix) finds principal components, neural network layers multiply matrices by activations, attention is batched matrix multiplication, and low-rank approximations compress and speed up models while controlling error via norms.

02 Intuition & Analogies

Imagine vectors as arrows in space and linear transformations as machines that stretch, shrink, rotate, and shear arrows but always preserve the origin and straight lines. If you draw a grid on paper and apply a linear transformation, the grid lines remain straight and parallel, though the grid might tilt and stretch. A basis is like choosing the two (in 2D) or three (in 3D) most convenient grid directions so that every point can be described by coordinates along those directions. Linear independence means your chosen directions aren’t redundant; you can’t recreate one arrow by mixing the others. Eigenvectors are the magical directions the machine doesn’t rotate—only stretches or compresses them—like rubber bands aligned with the machine’s main pull directions. For symmetric machines (imagine a perfectly even stretch in mirror-like ways), these special directions are perpendicular to each other and form a clean coordinate system. SVD says any messy transformation can be decomposed into three simpler steps: rotate (Vᵀ), stretch along coordinate axes (Σ), then rotate again (U). This is like turning a crumpled ellipsoid so that you can measure its principal radii and orientations. Rank measures how many independent directions survive the transformation; if rank is low, most directions are collapsed, which is why low-rank approximations capture the most important structure with fewer numbers. Norms measure how big the machine’s effect is: the spectral norm is the maximum stretch of any unit vector; the Frobenius norm measures total energy of all stretches. The condition number compares the largest and smallest stretches and warns how sensitive solutions are to small changes.

03 Formal Definition

A vector space V over a field F (e.g., F = ℝ or ℂ) is a set with addition and scalar multiplication satisfying the usual axioms (associativity, commutativity of addition, distributivity, etc.). A list of vectors v₁, …, vₖ ∈ V is linearly independent if α₁v₁ + ⋯ + αₖvₖ = 0 implies all αᵢ = 0. The span of a set S ⊆ V is the set of all finite linear combinations of elements of S. A basis is a linearly independent set whose span equals V. The dimension of V, dim(V), is the size of any basis (well-defined by the Steinitz exchange lemma). A linear transformation T: V → W satisfies T(αu + βv) = αT(u) + βT(v). Relative to chosen bases, T is represented by a matrix A such that [T(x)] = A[x]. An eigenvalue–eigenvector pair (λ, v ≠ 0) of A satisfies Av = λv. For diagonalizable A, there exists an invertible V with A = VΛV⁻¹, where Λ is diagonal with the eigenvalues of A. For real symmetric A = Aᵀ, there exists an orthogonal Q with A = QΛQᵀ. For any A ∈ ℝ^(m×n), the SVD is A = UΣVᵀ, with U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n) orthogonal, and Σ ∈ ℝ^(m×n) diagonal with nonnegative singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ 0. The rank of A equals the number of nonzero singular values. Matrix norms include the spectral norm ‖A‖₂ = σ_max(A) and the Frobenius norm ‖A‖_F = √(Σᵢ σᵢ²). The condition number in the 2-norm is κ₂(A) = σ_max(A) / σ_min(A).
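
As a concrete worked example of these definitions (the numbers are chosen here for illustration and do not come from the text above), the symmetric matrix below satisfies the spectral theorem, and because it is symmetric positive definite its singular values coincide with its eigenvalues:

\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
  = Q \Lambda Q^{\top}, \qquad
Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix},
\]
\[
\|A\|_2 = 3, \qquad \|A\|_F = \sqrt{3^2 + 1^2} = \sqrt{10}, \qquad \kappa_2(A) = \frac{3}{1} = 3.
\]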

04 When to Use

Use bases and dimension when you need minimal coordinate representations (e.g., compressing data or verifying feature redundancy). Apply linear independence tests via rank to detect multicollinearity in regression or to ensure a set of features provides unique information. Use eigenvalues/eigenvectors for analyzing stability (e.g., Jacobian linearization), graph clustering (Laplacian eigenvectors), and for iterative solvers and PageRank (power iteration). When matrices are symmetric (covariances, Laplacians, Hessians), use the spectral theorem to diagonalize with an orthogonal matrix—this simplifies computations, ensures real eigenvalues, and supports orthogonal projections. Use SVD for any rectangular data matrix: PCA (via SVD of centered data), least-squares with pseudoinverse, noise filtering, and low-rank compression of embeddings or weight matrices. The Eckart–Young theorem guides optimal rank-k truncation when you want the best approximation under the Frobenius norm. Rely on the spectral norm to bound worst-case amplification (robustness) and the Frobenius norm as an energy measure across all directions. Monitor the condition number to assess numerical stability of linear solves and to decide whether to regularize (ridge regression) or precondition.
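
To make the last point concrete, here is a minimal sketch (hypothetical, nearly collinear features and a 2×2 Gram matrix with closed-form eigenvalues; not part of the examples later in this article) showing how a ridge term λI caps the condition number of AᵀA:

#include <bits/stdc++.h>
using namespace std;

// Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]] via the quadratic formula.
pair<double, double> sym2x2_eigs(double a, double b, double c) {
    double tr = a + c, det = a * c - b * b;
    double disc = sqrt(max(0.0, tr * tr / 4.0 - det));
    return {tr / 2.0 + disc, tr / 2.0 - disc}; // {lambda_max, lambda_min}
}

int main() {
    // Hypothetical Gram matrix G = A^T A for two nearly collinear (standardized) features.
    double g11 = 1.0, g12 = 0.999, g22 = 1.0;
    auto [lmax, lmin] = sym2x2_eigs(g11, g12, g22);
    cout << "cond(A^T A)            ~ " << lmax / lmin << "\n";                        // ~ 1999

    // A ridge term shifts every eigenvalue up by lambda, which caps the condition number.
    double lambda = 1e-2;
    cout << "cond(A^T A + lambda I) ~ " << (lmax + lambda) / (lmin + lambda) << "\n";  // ~ 183
    return 0;
}

With these numbers the condition number drops from about 2·10³ to under 200, which is exactly the effect ridge regularization relies on.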

⚠️ Common Mistakes

  • Confusing linear independence with orthogonality: independent vectors need not be perpendicular; always check rank or attempt to express one as a combination of others.
  • Assuming every matrix is diagonalizable by eigenvectors: non-symmetric or defective matrices may not have a full eigenbasis; use SVD or Jordan form theory when needed.
  • Mixing up eigenvalues and singular values: for non-symmetric matrices, eigenvalues can be complex or negative, while singular values are always real and nonnegative; norms and rank are tied to singular values, not eigenvalues.
  • Ignoring scaling of eigenvectors: eigenvectors are not unique in length; normalize them when comparing or computing Rayleigh quotients to avoid misleading magnitudes.
  • Using plain Gaussian elimination without pivoting: this can be numerically unstable; prefer partial pivoting or QR for solving and rank.
  • Misinterpreting condition number: a large κ means small input errors can cause large output errors; it is not a direct error but a sensitivity bound.
  • Believing Frobenius and spectral norms behave the same: ‖A‖_F measures overall energy, while ‖A‖₂ captures the maximum stretch; choose the norm that matches your guarantee.
  • Assuming randomized low-rank projections always give the best rank-k approximation: they approximate the top singular subspace; only truncated SVD is provably optimal under Eckart–Young.

Key Formulas

Span

span(S) = { α₁v₁ + α₂v₂ + ⋯ + αₖvₖ : vᵢ ∈ S, αᵢ ∈ F }

Explanation: The span of a set S is all vectors you can reach by linear combinations of S. It defines the subspace generated by S.

Rank

rank(A) = dim(col(A))

Explanation: The rank equals the dimension of the column space (image) of A. It counts independent output directions of the linear transformation.

Eigenvalue equation

Av = λv, v ≠ 0

Explanation: An eigenvector v is only scaled (by λ) when multiplied by A. This reveals invariant directions of the transformation.

Eigen-decomposition

A = VΛV⁻¹, Λ = diag(λ₁, …, λₙ)

Explanation: A diagonalizable matrix can be factored into eigenvectors and eigenvalues. Computations simplify in this eigenbasis.

Spectral theorem

A = Aᵀ ⇒ A = QΛQᵀ with QᵀQ = I and Λ real diagonal

Explanation: Real symmetric matrices are orthogonally diagonalizable with real eigenvalues and orthonormal eigenvectors. This ensures numerical stability and simplicity.

SVD

A = UΣVᵀ, with U and V orthogonal and Σ diagonal, σ₁ ≥ σ₂ ≥ ⋯ ≥ 0

Explanation: Any matrix factors into left/right orthogonal matrices and nonnegative singular values. It generalizes eigen-decomposition to rectangular and non-normal matrices.

Norms via singular values

‖A‖₂ = σ₁,  ‖A‖_F = √(σ₁² + σ₂² + ⋯ + σᵣ²)

Explanation: The spectral norm is the largest singular value; the Frobenius norm is the square root of the sum of squares of all singular values. These connect geometry to computation.

Rank via SVD

rank(A) = #{ i : σᵢ > 0 }

Explanation: The number of nonzero singular values equals the rank. This is robust to choice of basis and works for rectangular matrices.

Truncated SVD

Aₖ = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + ⋯ + σₖuₖvₖᵀ

Explanation: The best rank-k approximation keeps only the top k singular triplets. It minimizes error among all rank-k matrices.

Eckart–Young (Frobenius)

‖A − Aₖ‖_F² = σₖ₊₁² + σₖ₊₂² + ⋯ + σᵣ²

Explanation: The squared Frobenius error of the best rank-k approximation equals the sum of squared discarded singular values. This quantifies information loss.
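
As a quick numeric illustration with made-up singular values (the same spectrum the randomized low-rank code example below constructs synthetically):

\[
\sigma = (10,\, 3,\, 1): \qquad
\|A - A_2\|_F = \sqrt{\sigma_3^2} = 1, \qquad
\frac{\|A - A_2\|_F}{\|A\|_F} = \frac{1}{\sqrt{10^2 + 3^2 + 1^2}} = \frac{1}{\sqrt{110}} \approx 0.095.
\]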

Condition number

κ₂(A) = σ_max(A) / σ_min(A)

Explanation: The 2-norm condition number measures sensitivity of solving Ax = b. Larger values imply solutions can change a lot under small perturbations.
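
The standard way to read κ₂ is through the perturbation bound for Ax = b; the diagonal matrix below is a hypothetical example chosen so the numbers are easy to check by hand:

\[
\frac{\|\delta x\|_2}{\|x\|_2} \;\le\; \kappa_2(A)\, \frac{\|\delta b\|_2}{\|b\|_2},
\qquad
A = \begin{pmatrix} 100 & 0 \\ 0 & 0.01 \end{pmatrix}
\;\Rightarrow\;
\kappa_2(A) = \frac{100}{0.01} = 10^{4},
\]
so a 0.1% relative change in b can, in the worst case, produce a 1000% relative change in the solution x.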

Rayleigh quotient

R(A, x) = (xᵀAx) / (xᵀx), x ≠ 0

Explanation: For symmetric A, the Rayleigh quotient of a nonzero x lies between the smallest and largest eigenvalues. It reaches extremes at eigenvectors.

Gaussian elimination cost

≈ (2/3)n³ floating-point operations for a dense n×n system

Explanation: The arithmetic cost for solving a dense n-by-n system via elimination is about (2/3)n³ operations. This dominates for large n.

Power iteration step

bₖ₊₁ = A bₖ / ‖A bₖ‖₂

Explanation: Repeatedly multiplying and normalizing converges to the dominant eigenvector if it is unique and the start vector has a component in that direction.

Complexity Analysis

Core dense linear algebra has cubic-time bottlenecks. Forming a row echelon form via Gaussian elimination with partial pivoting costs about (2/3)n³ arithmetic operations for an n-by-n matrix and O(n²) space to store the matrix; counting nonzero rows yields the rank. For rectangular m-by-n matrices (m ≥ n), elimination takes O(mn²). The eigenvalue problem in full generality requires O(n³) time using QR iterations or divide-and-conquer SVD methods, with O(n²) space. However, many ML tasks only need the top singular vectors or eigenpairs: power iteration or Lanczos reduces per-iteration cost to O(n²) (for dense n-by-n) and O(nnz(A)) for sparse matrices, with O(n) additional space; convergence typically requires O(log(1/ε)) iterations, with a rate governed by the spectral gap. Randomized low-rank approximation using a range finder has complexity O(mnk) for computing Y = AΩ, plus O(mk²) for orthonormalizing Y and O(kmn) for forming the projection Q(QᵀA); when k ≪ min(m,n), this is far cheaper than a full SVD, and memory drops to O((m+n)k). Norm computations vary: the Frobenius norm costs O(mn) time and O(1) extra space, while the spectral norm generally requires iterative methods. Condition number estimation in the 2-norm is hard to do exactly but can be bounded via iterative norm estimates or computed through the SVD when available. In practice, numerical stability affects effective complexity: partial pivoting adds negligible overhead yet improves robustness, while orthonormalizations (Modified Gram–Schmidt or Householder) maintain accuracy in low-rank methods at O(mk²) cost. Overall, careful algorithm choice (dense vs. sparse, full vs. truncated) and data structure selection dominate performance in ML pipelines.
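
To make the norm-cost remarks concrete, here is a minimal sketch (dense row-major storage as nested vectors, matching the examples below; the 2×3 matrix in main is hypothetical): the Frobenius norm is one O(mn) pass over the entries, while the spectral norm is estimated iteratively by running power iteration on AᵀA, whose dominant eigenvalue is σ₁².

#include <bits/stdc++.h>
using namespace std;

// O(mn): one pass over the entries.
double frobenius_norm(const vector<vector<double>>& A) {
    double s = 0.0;
    for (const auto& row : A) for (double x : row) s += x * x;
    return sqrt(s);
}

// Estimate the spectral norm sigma_1 via power iteration on A^T A:
// each iteration costs two matrix-vector products, i.e., O(mn).
double spectral_norm_estimate(const vector<vector<double>>& A, int iters = 100) {
    int m = (int)A.size(), n = (int)A[0].size();
    vector<double> v(n);
    mt19937 rng(123);
    uniform_real_distribution<double> dist(-1.0, 1.0);
    for (double& vi : v) vi = dist(rng);
    double sigma = 0.0;
    for (int it = 0; it < iters; ++it) {
        // u = A v (m-vector), then w = A^T u (n-vector)
        vector<double> u(m, 0.0), w(n, 0.0);
        for (int i = 0; i < m; ++i) for (int j = 0; j < n; ++j) u[i] += A[i][j] * v[j];
        for (int i = 0; i < m; ++i) for (int j = 0; j < n; ++j) w[j] += A[i][j] * u[i];
        double nw = sqrt(inner_product(w.begin(), w.end(), w.begin(), 0.0));
        if (nw == 0.0) break;
        for (int j = 0; j < n; ++j) v[j] = w[j] / nw;
        sigma = sqrt(nw); // ||A^T A v|| -> sigma_1^2 as v aligns with the top right singular vector
    }
    return sigma;
}

int main() {
    // Hypothetical 2x3 example; its singular values are sqrt(10), 2, 0.
    vector<vector<double>> A = {{3.0, 0.0, 1.0},
                                {0.0, 2.0, 0.0}};
    cout << fixed << setprecision(6);
    cout << "Frobenius norm  = " << frobenius_norm(A) << "\n";         // sqrt(9 + 4 + 1) = sqrt(14)
    cout << "Spectral norm  ~= " << spectral_norm_estimate(A) << "\n"; // sqrt(10) ~ 3.162278
    return 0;
}

For this small matrix the exact values are ‖A‖_F = √14 ≈ 3.741657 and ‖A‖₂ = √10 ≈ 3.162278, so the printed estimate should agree to the displayed precision after a handful of iterations.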

Code Examples

Gaussian elimination with partial pivoting: rank, linear independence, and pivot columns (basis)
#include <bits/stdc++.h>
using namespace std;

// Numerical threshold for treating values as zero
const double EPS = 1e-9;

// Perform Gaussian elimination with partial pivoting to row-echelon form.
// Returns {rank, pivot column indices} and transforms A in-place to an upper-triangular-like form.
pair<int, vector<int>> rowEchelon(vector<vector<double>>& A) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    int row = 0;
    vector<int> pivotCol;
    for (int col = 0; col < n && row < m; ++col) {
        // Find pivot in this column at or below the current row
        int sel = row;
        for (int i = row + 1; i < m; ++i)
            if (fabs(A[i][col]) > fabs(A[sel][col])) sel = i;
        if (fabs(A[sel][col]) < EPS) continue; // no pivot in this column
        swap(A[sel], A[row]);
        pivotCol.push_back(col);
        // Normalize pivot row (optional for rank, helpful for an RREF-like form)
        double piv = A[row][col];
        for (int j = col; j < n; ++j) A[row][j] /= piv;
        // Eliminate below the pivot
        for (int i = row + 1; i < m; ++i) {
            double factor = A[i][col];
            if (fabs(factor) < EPS) continue;
            for (int j = col; j < n; ++j) A[i][j] -= factor * A[row][j];
        }
        ++row;
    }
    int rank = (int)pivotCol.size();
    return {rank, pivotCol};
}

// Check if a set of column vectors is linearly independent: form a matrix with the vectors as columns.
bool isIndependent(const vector<vector<double>>& cols) {
    int n = (int)cols.size();
    int m = (int)cols[0].size();
    vector<vector<double>> A(m, vector<double>(n));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) A[i][j] = cols[j][i];
    auto res = rowEchelon(A);
    return res.first == n; // rank equals number of columns
}

int main() {
    // Example: matrix A with dependent columns
    vector<vector<double>> A = {
        {1, 2, 3, 6},
        {0, 1, 1, 2},
        {2, 5, 7, 14}
    }; // Note: col3 = col1 + col2 and col4 = 2*col3, so the rank should be 2

    auto A_copy = A;
    auto [rankA, pivots] = rowEchelon(A_copy);
    cout << fixed << setprecision(6);
    cout << "Rank(A) = " << rankA << "\n";
    cout << "Pivot columns (0-based): ";
    for (int c : pivots) cout << c << ' ';
    cout << "\n";

    // Build columns as vectors to test independence of the first 3 columns
    vector<vector<double>> cols(3, vector<double>(3));
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            cols[j][i] = A[i][j];

    cout << "First 3 columns independent? " << (isIndependent(cols) ? "yes" : "no") << "\n";

    return 0;
}

We implement Gaussian elimination with partial pivoting to transform a matrix to row echelon form. The number of pivot columns equals the rank, and these pivot columns identify a basis for the column space. We also test linear independence by checking whether the rank equals the number of vectors. Partial pivoting chooses the largest absolute value in a column to improve numerical stability.

Time: O(m n^2) for an m-by-n matrix (O(n^3) if square). Space: O(m n) to store the matrix; O(1) extra beyond input storage.
Power iteration for dominant eigenvalue and eigenvector (symmetric A)
#include <bits/stdc++.h>
using namespace std;

const double EPS = 1e-10;

vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x) {
    int n = (int)A.size();
    vector<double> y(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) y[i] += A[i][j] * x[j];
    return y;
}

double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

double norm2(const vector<double>& x) {
    return sqrt(max(0.0, dot(x, x)));
}

pair<double, vector<double>> power_iteration(const vector<vector<double>>& A, int max_iters = 1000, double tol = 1e-9) {
    int n = (int)A.size();
    // Random initial vector
    mt19937 rng(42);
    uniform_real_distribution<double> dist(-1.0, 1.0);
    vector<double> b(n);
    for (int i = 0; i < n; ++i) b[i] = dist(rng);
    double lambda_old = 0.0;

    for (int it = 0; it < max_iters; ++it) {
        // Multiply and normalize
        vector<double> Ab = matvec(A, b);
        double nb = norm2(Ab);
        if (nb < EPS) break; // A maps b to (nearly) zero
        for (int i = 0; i < n; ++i) b[i] = Ab[i] / nb;
        // Rayleigh quotient estimate
        vector<double> Ab_norm = matvec(A, b);
        double lambda = dot(b, Ab_norm);
        if (fabs(lambda - lambda_old) < tol * max(1.0, fabs(lambda_old))) {
            return {lambda, b};
        }
        lambda_old = lambda;
    }
    // Final estimate
    vector<double> Ab = matvec(A, b);
    double lambda = dot(b, Ab);
    return {lambda, b};
}

int main() {
    // Symmetric matrix with known eigenvalues
    vector<vector<double>> A = {
        {4, 1, 0},
        {1, 3, 0},
        {0, 0, 2}
    }; // eigenvalues: (7 + sqrt(5))/2 ~ 4.618, (7 - sqrt(5))/2 ~ 2.382, and 2 (dominant ~ 4.618)

    auto [lambda, v] = power_iteration(A, 10000, 1e-12);

    cout << fixed << setprecision(8);
    cout << "Dominant eigenvalue ~ " << lambda << "\n";
    cout << "Eigenvector (normalized): ";
    for (double vi : v) cout << vi << ' ';
    cout << "\n";

    // Residual norm ||Av - lambda v|| to check accuracy
    vector<double> Av = matvec(A, v);
    for (int i = 0; i < (int)v.size(); ++i) Av[i] -= lambda * v[i];
    double res = 0;
    for (double t : Av) res += t * t;
    res = sqrt(res);
    cout << "Residual norm = " << res << "\n";

    return 0;
}

Power iteration repeatedly multiplies a vector by A and normalizes it. For symmetric matrices with a unique largest eigenvalue in magnitude, this process converges to the dominant eigenvector, and the Rayleigh quotient gives the eigenvalue estimate. The small residual verifies Av ≈ λv.

Time: O(n^2 * iters) for dense n-by-n matrices (O(nnz(A) * iters) for sparse). Space: O(n^2) to store A (dense), O(n) extra for vectors.
Randomized low-rank approximation A ≈ Q(QᵀA) via range finder (Eckart–Young-inspired)
#include <bits/stdc++.h>
using namespace std;

// Utility: matrix multiply C = A * B (A: m x p, B: p x n)
vector<vector<double>> matmul(const vector<vector<double>>& A, const vector<vector<double>>& B) {
    int m = (int)A.size(), p = (int)A[0].size(), n = (int)B[0].size();
    vector<vector<double>> C(m, vector<double>(n, 0.0));
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < p; ++k)
            if (A[i][k] != 0.0)
                for (int j = 0; j < n; ++j)
                    C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Transpose
vector<vector<double>> transpose(const vector<vector<double>>& A) {
    int m = (int)A.size(), n = (int)A[0].size();
    vector<vector<double>> AT(n, vector<double>(m));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) AT[j][i] = A[i][j];
    return AT;
}

// Modified Gram-Schmidt to orthonormalize the columns of Y (m x k)
vector<vector<double>> mgs(const vector<vector<double>>& Y) {
    int m = (int)Y.size(), k = (int)Y[0].size();
    vector<vector<double>> Q = Y;
    for (int j = 0; j < k; ++j) {
        // Subtract projections onto previous q's
        for (int i2 = 0; i2 < j; ++i2) {
            double r = 0.0;
            for (int i = 0; i < m; ++i) r += Q[i][i2] * Q[i][j];
            for (int i = 0; i < m; ++i) Q[i][j] -= r * Q[i][i2];
        }
        // Normalize
        double nrm = 0.0;
        for (int i = 0; i < m; ++i) nrm += Q[i][j] * Q[i][j];
        nrm = sqrt(max(1e-18, nrm));
        for (int i = 0; i < m; ++i) Q[i][j] /= nrm;
    }
    return Q;
}

// Frobenius norm
double frob(const vector<vector<double>>& A) {
    double s = 0.0;
    for (const auto& r : A) for (double x : r) s += x * x;
    return sqrt(s);
}

// Randomized range finder: returns Q (m x k) with orthonormal columns approximating range(A)
vector<vector<double>> randomized_range_finder(const vector<vector<double>>& A, int k, int oversample = 5) {
    int m = (int)A.size(), n = (int)A[0].size();
    int l = min(n, k + oversample);
    // Omega: n x l random Gaussian test matrix
    mt19937 rng(42);
    normal_distribution<double> dist(0.0, 1.0);
    vector<vector<double>> Omega(n, vector<double>(l));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < l; ++j) Omega[i][j] = dist(rng);
    // Y = A * Omega (m x l)
    vector<vector<double>> Y = matmul(A, Omega);
    // Orthonormalize columns to get Q (m x l), then keep the first k columns
    vector<vector<double>> Q = mgs(Y);
    for (int i = 0; i < m; ++i) Q[i].resize(k);
    return Q;
}

int main() {
    // Build a low-rank-ish matrix A = U * S * V^T synthetically
    int m = 50, n = 40, r = 3; // true rank r
    mt19937 rng(0);
    normal_distribution<double> dist(0.0, 1.0);
    vector<vector<double>> U(m, vector<double>(r)), V(n, vector<double>(r));
    for (int i = 0; i < m; ++i) for (int j = 0; j < r; ++j) U[i][j] = dist(rng);
    for (int i = 0; i < n; ++i) for (int j = 0; j < r; ++j) V[i][j] = dist(rng);
    // Orthonormalize U, V columns quickly via MGS
    U = mgs(U);
    V = mgs(V);
    vector<double> s = {10.0, 3.0, 1.0};
    vector<vector<double>> S(r, vector<double>(r, 0.0));
    for (int i = 0; i < r; ++i) S[i][i] = s[i];
    vector<vector<double>> A = matmul(matmul(U, S), transpose(V));

    // Add small noise to make it full rank but still low-rank dominant
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) A[i][j] += 0.01 * dist(rng);

    // Randomized rank-k approximation
    int k = 3; // target rank
    vector<vector<double>> Q = randomized_range_finder(A, k, 5); // m x k
    vector<vector<double>> B = matmul(transpose(Q), A);          // k x n
    vector<vector<double>> A_approx = matmul(Q, B);              // m x n

    // Relative Frobenius error ||A - A_approx||_F / ||A||_F
    vector<vector<double>> Diff = A;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) Diff[i][j] -= A_approx[i][j];
    double relF = frob(Diff) / max(1e-18, frob(A));

    cout << fixed << setprecision(6);
    cout << "Relative Frobenius error (rank-" << k << ") = " << relF << "\n";

    return 0;
}

This code computes a randomized low-rank approximation by finding an orthonormal basis Q for the range of A via Y = AΩ and Modified Gram–Schmidt, then projecting A onto that subspace: Â = Q(QᵀA). When k matches the number of dominant singular directions, the approximation is close to the optimal truncated SVD (Eckart–Young). The example constructs a synthetic low-rank matrix with small noise and reports the relative Frobenius error.

Time: O(m n k) for forming Y and the projection, plus O(m k^2) for orthonormalization. Space: O((m+n)k) extra beyond storing A.