How I Study AI
Math · Intermediate

Matrix Norms & Condition Numbers

Key Points

  • Matrix norms measure the size of a matrix in different but related ways: Frobenius treats the entries like one long vector, spectral measures the strongest stretch, and nuclear sums all singular values.
  • The spectral norm equals the largest singular value and controls how much a matrix can amplify any vector in the 2-norm.
  • The Frobenius norm equals the square root of the sum of squared entries, which is also the square root of the trace of AᵀA.
  • The nuclear norm equals the sum of singular values and is a convex surrogate for matrix rank, useful in low-rank recovery and regularization.
  • The condition number in the 2-norm is the ratio of largest to smallest singular value and predicts how sensitive solutions of Ax=b are to perturbations.
  • Exact spectral and nuclear norms require an SVD, but fast approximations can be obtained with power iteration and randomized subspace methods.
  • Unitary (orthogonal) invariance means the Frobenius, spectral, and nuclear norms do not change under orthogonal row/column rotations.
  • Avoid explicitly forming AᵀA when possible for numerical stability; prefer multiplying by A and Aᵀ directly or using factorizations.

Prerequisites

  • Vector norms (1-, 2-, and ∞-norms) — Matrix operator norms are defined via induced vector norms; understanding vector norms clarifies their meaning.
  • Eigenvalues and eigenvectors — The spectral norm and condition numbers are defined via eigenvalues of AᵀA and their properties.
  • Singular Value Decomposition (SVD) — Spectral and nuclear norms are expressed in terms of singular values; many properties follow from the SVD.
  • Orthogonality and orthogonal matrices — Unitary invariance and stability arguments use orthogonal transformations and inner products.
  • Basic numerical linear algebra (LU/QR) — Stable computation of smallest singular values and solving linear systems relies on factorizations.

Detailed Explanation


01 Overview

Matrix norms generalize the idea of vector length to matrices and let us reason about the size, amplification, and stability of linear transformations. Three especially important norms are the Frobenius norm, spectral (operator 2-) norm, and nuclear (trace) norm. The Frobenius norm treats all entries as one long vector and measures their Euclidean length. The spectral norm measures the strongest stretching factor a matrix can apply to any vector when using the Euclidean vector norm. The nuclear norm adds up all singular values and is widely used because it is the convex envelope of matrix rank on the unit spectral-norm ball. Condition numbers combine a matrix norm with the norm of its inverse (when the inverse exists) to quantify sensitivity: a high condition number means small input changes can blow up into large output errors. These tools provide the language for error bounds, stability guarantees, and regularization in algorithms spanning numerical linear algebra, optimization, machine learning, signal processing, and control.

02 Intuition & Analogies

Imagine a matrix as a machine that takes in a vector and outputs another vector by stretching, squashing, and rotating it. If you put in a unit-length arrow and the machine sometimes stretches it a lot, the spectral norm is that maximum possible stretch. It’s like testing the machine with all possible unit arrows and recording the largest output length: this tells you the worst-case amplification. Now think of the Frobenius norm as weighing the machine’s overall “energy.” If you disassembled the machine into gears (the entries of the matrix) and measured the squared sizes of all gears, the Frobenius norm is the square root of their total. It does not care about direction; it cares about aggregate magnitude, much like the overall power consumption of a device. The nuclear norm views the machine through its singular values (its principal stretch factors in orthogonal directions). If you imagine the machine stretching space along several independent axes by amounts σ1, σ2, …, the nuclear norm is the total stretch budget: the sum of these amounts. This is why it relates to rank—the number of nonzero axes—and becomes a natural tool to encourage low-rank structure by penalizing total stretch. Finally, condition number is a fragility meter. If the machine barely squashes some direction (very small σ_min) but strongly stretches another (large σ_max), then tiny errors along the squashed direction can explode when you try to reverse the process. That’s a high condition number and a warning sign for solving Ax=b reliably.

03 Formal Definition

Let A ∈ ℝ^{m×n}. The Frobenius norm is ∥A∥_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}² ) = √( trace(AᵀA) ). The spectral (operator 2-) norm is ∥A∥₂ = max_{∥x∥₂=1} ∥Ax∥₂ = σ_max(A) = √( λ_max(AᵀA) ), where σ_max denotes the largest singular value and λ_max the largest eigenvalue. The nuclear (trace) norm is ∥A∥_* = ∑_{i=1}^{min(m,n)} σ_i(A) = trace( (AᵀA)^{1/2} ), the sum of singular values. All three are unitarily invariant: for orthogonal U and V, ∥UAV∥_F = ∥A∥_F, ∥UAV∥₂ = ∥A∥₂, and ∥UAV∥_* = ∥A∥_*. The spectral norm is submultiplicative, ∥AB∥₂ ≤ ∥A∥₂ ∥B∥₂, and all three satisfy the triangle inequality ∥A+B∥ ≤ ∥A∥ + ∥B∥. For an invertible square A, the 2-norm condition number is κ₂(A) = ∥A∥₂ ∥A⁻¹∥₂ = σ_max(A) / σ_min(A) ≥ 1, where σ_min is the smallest singular value.

04 When to Use

Use the Frobenius norm when you want an entrywise measure that is easy to compute and differentiable—common in loss functions (e.g., least squares) and when bounding total energy of noise. Use the spectral norm when you care about worst-case amplification, stability of iterative methods, Lipschitz constants of linear layers (e.g., in deep learning), or need tight operator-norm bounds in proofs. Use the nuclear norm when promoting or measuring low-rank structure—matrix completion, robust PCA, and regularization in recommendation systems—because it is a convex proxy for rank that is optimization-friendly. Use condition numbers to predict sensitivity of solutions to Ax=b, to choose tolerances in numerical solvers, to diagnose ill-conditioning, and to compare algorithmic stability across formulations. In implementation, compute Frobenius exactly in O(mn); estimate spectral norm with power iteration or Lanczos when SVD is too expensive; and estimate nuclear norm with randomized subspace methods when only the leading singular values matter. When top accuracy is required or matrices are small, compute exact SVD to get spectral and nuclear norms directly.

⚠️Common Mistakes

  • Confusing entrywise norms (like Frobenius) with operator norms (like spectral). The Frobenius norm does not control worst-case amplification; use the spectral norm for that. Avoid mixing them in bounds without proper inequalities.
  • Forming AᵀA explicitly for spectral computations can square the condition number and magnify round-off. Prefer multiplying by A and Aᵀ (as in power/Lanczos iterations) or use stable factorizations.
  • Assuming the 2-norm condition number applies to rectangular or singular matrices without care. For non-square matrices, use the ratio σ_max/σ_min of A's nonzero singular values; for rank-deficient matrices, κ₂ is infinite.
  • Stopping power iteration too early or with a poor initial vector can yield underestimates. Use a tolerance on successive singular value estimates and multiple restarts if needed.
  • Believing the nuclear norm equals rank or always recovers the minimal rank. It is a convex surrogate that often, but not always, promotes low rank; guarantees require incoherence and sampling conditions.
  • Ignoring unitary invariance. Applying orthogonal transforms (e.g., QR preconditioning) does not change these norms, which can simplify analysis.
  • Using the matrix inverse explicitly to estimate σ_min. Solve linear systems with LU/QR and inverse iteration instead of forming A⁻¹, to avoid numerical instability.

Key Formulas

Frobenius norm

∥A∥_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}² ) = √( trace(AᵀA) )

Explanation: Adds up the squares of all entries and takes the square root. Equivalently, it is the square root of the trace of AᵀA, highlighting its relation to singular values.

Spectral norm

∥A∥₂ = max_{∥x∥₂=1} ∥Ax∥₂ = σ_max(A) = √( λ_max(AᵀA) )

Explanation: The largest stretch a matrix can apply to a unit vector equals the largest singular value. Practically, this can be approximated with power iteration on A and Aᵀ.

Nuclear norm

∥A∥_* = ∑_{i=1}^{min(m,n)} σ_i(A) = trace( (AᵀA)^{1/2} )

Explanation: Sums all singular values and equals the trace of the matrix square root of AᵀA. It is convex and promotes low rank in optimization problems.

2-norm condition number

κ₂(A) = ∥A∥₂ ∥A⁻¹∥₂ = σ_max(A) / σ_min(A)

Explanation: Measures sensitivity of solving Ax=b under 2-norm. Large values indicate potential amplification of relative errors.

Submultiplicativity (2-norm)

∥AB∥₂ ≤ ∥A∥₂ ∥B∥₂

Explanation: The worst-case amplification of a product is at most the product of worst-case amplifications. Useful for bounding errors in multi-stage pipelines.

Operator inequality

∥Ax∥₂ ≤ ∥A∥₂ ∥x∥₂

Explanation: Applying A to any vector cannot increase its 2-norm by more than the spectral norm. This underpins Lipschitz bounds and stability of linear layers.

Norm equivalences (2 vs Frobenius)

∥A∥₂ ≤ ∥A∥_F ≤ √(rank(A)) ∥A∥₂

Explanation: The Frobenius norm lies between the spectral norm and its √rank(A) multiple. This helps translate bounds between entrywise and operator measures.

Nuclear–Frobenius inequality

∥A∥_* ≤ √(rank(A)) ∥A∥_F

Explanation: By Cauchy–Schwarz on the singular values, their sum is bounded by √r times their Euclidean length. It relates low-rank structure to total energy.
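For readers who want the one-line proof, the Cauchy–Schwarz step can be written out explicitly (with r = rank(A), so σ_i = 0 for i > r):

```latex
\|A\|_* \;=\; \sum_{i=1}^{r} \sigma_i \cdot 1
\;\le\; \Big(\sum_{i=1}^{r} \sigma_i^2\Big)^{1/2} \Big(\sum_{i=1}^{r} 1^2\Big)^{1/2}
\;=\; \sqrt{r}\, \|A\|_F .
```

Equality holds exactly when all nonzero singular values are equal, e.g., for orthogonal matrices.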

Frobenius–singular values relation

∥A∥_F² = ∑_{i=1}^{min(m,n)} σ_i(A)² = trace(AᵀA)

Explanation: The squared Frobenius norm equals the sum of squared singular values. It connects entrywise energy to principal stretches.

Sensitivity bound

If (A+ΔA)(x+Δx) = b+Δb, then ∥Δx∥₂ / ∥x∥₂ ≲ κ₂(A) ( ∥ΔA∥₂ / ∥A∥₂ + ∥Δb∥₂ / ∥b∥₂ )

Explanation: Relative solution error is bounded by the condition number times relative data perturbations. It motivates monitoring κ(A) in practice.

Complexity Analysis

Computing the Frobenius norm is linear in the number of entries: O(mn) time and O(1) extra space, since we sum squares of all elements. Exact spectral and nuclear norms require the singular value decomposition (SVD). A dense SVD costs O(mn²) when m ≥ n, or O(m²n) when n ≥ m, and O(mn) memory for storing factors; this is the gold standard for accuracy but can be prohibitive for large matrices. For large-scale problems, iterative approximations are preferred. Power iteration estimates the largest singular value using only matrix–vector products with A and Aᵀ; each iteration costs O(mn) time and O(m+n) space, and a modest number (10–200) of iterations typically suffices, depending on the spectral gap. Estimating the smallest singular value (for the condition number) is harder; inverse iteration on (AᵀA)⁻¹ uses linear solves that can be accelerated by an LU or Cholesky factorization computed once in O(n³) time and O(n²) space, followed by O(n²) per iteration. Randomized subspace iteration to approximate the nuclear norm with a rank-r basis requires O(qmnr) time for q power steps, plus O(mnr) to form the compressed matrix, and only O((m+n)r) memory; the final r×r eigendecomposition is negligible (O(r³)). Numerically, avoid explicitly forming AᵀA for spectral norm estimation to prevent loss of accuracy; prefer applying A and Aᵀ. When exactness is required or matrices are small, SVD remains the most reliable approach.

Code Examples

Exact Frobenius norm (and basic helpers)
#include <bits/stdc++.h>
using namespace std;

// Compute Frobenius norm: ||A||_F = sqrt(sum_{i,j} a_{ij}^2)
double frobeniusNorm(const vector<vector<double>>& A) {
    long long m = (long long)A.size();
    long long n = m ? (long long)A[0].size() : 0;
    long double sumsq = 0.0L;
    for (long long i = 0; i < m; ++i) {
        for (long long j = 0; j < n; ++j) {
            sumsq += (long double)A[i][j] * (long double)A[i][j];
        }
    }
    return sqrt((double)sumsq);
}

int main() {
    // Example matrix (3x3)
    vector<vector<double>> A = {
        {1.0, -2.0, 3.0},
        {4.0, 0.5, -1.0},
        {0.0, 2.0, 1.0}
    };

    double f = frobeniusNorm(A);
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Frobenius norm ||A||_F = " << f << "\n";

    // Sanity check: ||A||_F^2 equals trace(A^T A)
    // Compute A^T A and its trace
    int m = (int)A.size(), n = (int)A[0].size();
    vector<vector<double>> AtA(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < m; ++k)
            for (int j = 0; j < n; ++j)
                AtA[i][j] += A[k][i] * A[k][j];
    double tr = 0.0;
    for (int i = 0; i < n; ++i) tr += AtA[i][i];
    cout << "trace(A^T A) = " << tr << ", (||A||_F)^2 = " << (f * f) << "\n";
    return 0;
}

This program computes the Frobenius norm exactly by summing squares of all entries. It also verifies the identity ||A||_F^2 = trace(A^T A) by explicitly forming A^T A and comparing the diagonal sum with the squared Frobenius norm.

Time: O(mn) for the Frobenius norm; O(mn²) extra for the AᵀA sanity check in this demo.
Space: O(1) extra for the Frobenius norm; O(n²) to store AᵀA in the sanity check.
Estimate spectral norm and 2-norm condition number via power and inverse iteration
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vec = vector<double>;

// Multiply y = A * x (A: m x n, x: n)
Vec matVec(const Matrix& A, const Vec& x) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vec y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Multiply z = A^T * y (A: m x n, y: m)
Vec matTVec(const Matrix& A, const Vec& y) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vec z(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += A[i][j] * y[i];
        z[j] = s;
    }
    return z;
}

// Vector helpers
double norm2(const Vec& v) { double s = 0; for (double x : v) s += x * x; return sqrt(s); }
void normalize(Vec& v) { double n = norm2(v); if (n > 0) for (double& x : v) x /= n; }
double dot(const Vec& a, const Vec& b) { double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s; }

// Power iteration to estimate spectral norm (largest singular value)
double spectralNormPower(const Matrix& A, int maxIters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    std::mt19937 rng(42);
    std::normal_distribution<double> N(0.0, 1.0);
    Vec v(n);
    for (int i = 0; i < n; ++i) v[i] = N(rng);
    normalize(v);
    double prev = 0.0;
    for (int it = 0; it < maxIters; ++it) {
        // u = A v / ||A v||
        Vec Av = matVec(A, v);
        double nAv = norm2(Av);
        if (nAv == 0) return 0.0; // zero matrix
        for (double& x : Av) x /= nAv;
        // v = A^T u / ||A^T u||
        Vec ATu = matTVec(A, Av);
        normalize(ATu);
        v = ATu;
        // Rayleigh quotient estimate: sigma ~= ||A v||
        double sigma = norm2(matVec(A, v));
        if (fabs(sigma - prev) < tol * max(1.0, sigma)) return sigma;
        prev = sigma;
    }
    return prev;
}

// Build A^T A explicitly (n x n)
Matrix buildAtA(const Matrix& A) {
    int m = (int)A.size(), n = (int)A[0].size();
    Matrix AtA(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < m; ++k)
            for (int j = 0; j < n; ++j)
                AtA[i][j] += A[k][i] * A[k][j];
    return AtA;
}

// LU decomposition with partial pivoting for a square matrix
struct LU { Matrix LUmat; vector<int> piv; bool ok; };

LU luDecompose(Matrix A) {
    int n = (int)A.size();
    vector<int> piv(n);
    iota(piv.begin(), piv.end(), 0);
    for (int k = 0; k < n; ++k) {
        // pivot selection
        int p = k;
        double maxabs = fabs(A[k][k]);
        for (int i = k + 1; i < n; ++i)
            if (fabs(A[i][k]) > maxabs) { maxabs = fabs(A[i][k]); p = i; }
        if (maxabs == 0.0) return {A, piv, false}; // singular
        if (p != k) { swap(A[p], A[k]); swap(piv[p], piv[k]); }
        // elimination
        for (int i = k + 1; i < n; ++i) {
            A[i][k] /= A[k][k];
            double lik = A[i][k];
            for (int j = k + 1; j < n; ++j) A[i][j] -= lik * A[k][j];
        }
    }
    return {A, piv, true};
}

// Solve A x = b from LU with pivots
Vec luSolve(const LU& lu, const Vec& b) {
    int n = (int)lu.LUmat.size();
    // Apply row permutations to b
    Vec bp(n);
    for (int i = 0; i < n; ++i) bp[i] = b[lu.piv[i]];
    // Forward solve L y = bp (L has unit diagonal)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j) bp[i] -= lu.LUmat[i][j] * bp[j];
    // Backward solve U x = y
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) bp[i] -= lu.LUmat[i][j] * bp[j];
        bp[i] /= lu.LUmat[i][i];
    }
    return bp;
}

// Estimate smallest singular value via power iteration on (A^T A)^{-1}
// Requires LU factorization of M = A^T A (assumed nonsingular)
double smallestSigmaEstimate(const Matrix& A, int maxIters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    Matrix M = buildAtA(A); // n x n, SPD if A has full column rank
    LU lu = luDecompose(M);
    if (!lu.ok) return 0.0; // singular or rank-deficient -> sigma_min = 0

    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 1.0);
    Vec v(n);
    for (int i = 0; i < n; ++i) v[i] = N(rng);
    normalize(v);
    double prev = 0.0;
    for (int it = 0; it < maxIters; ++it) {
        // w = M^{-1} v (solve M w = v)
        Vec w = luSolve(lu, v);
        double nw = norm2(w);
        if (nw == 0.0) return 0.0;
        // Rayleigh quotient for M^{-1}
        double lambda = dot(v, w) / dot(v, v);
        // normalize for next iteration
        for (int i = 0; i < n; ++i) v[i] = w[i] / nw;
        if (fabs(lambda - prev) < tol * max(1.0, fabs(lambda))) {
            // sigma_min = 1 / sqrt(lambda_max(M^{-1}))
            return 1.0 / sqrt(lambda);
        }
        prev = lambda;
    }
    // Fallback after maxIters
    double lambda = prev > 0 ? prev : 0.0;
    return (lambda > 0) ? 1.0 / sqrt(lambda) : 0.0;
}

int main() {
    // Example: tall matrix (m x n) for spectral norm; square for condition number
    Matrix A = {
        {3.0, 1.0, 0.0},
        {0.0, 2.0, 2.0},
        {0.0, 0.0, 0.5},
        {1.0, 0.0, 0.0}
    }; // 4x3

    cout.setf(std::ios::fixed); cout << setprecision(8);
    double sigmaMax = spectralNormPower(A, 300, 1e-9);
    cout << "Estimated spectral norm ||A||_2 ≈ " << sigmaMax << "\n";

    // For the condition number, use a square invertible matrix
    Matrix B = {
        {4.0, 1.0, 0.0},
        {2.0, 3.0, 1.0},
        {0.0, 1.0, 2.0}
    }; // 3x3

    double sMaxB = spectralNormPower(B, 300, 1e-9);
    double sMinB = smallestSigmaEstimate(B, 300, 1e-9);
    double kappa2 = (sMinB > 0) ? (sMaxB / sMinB) : numeric_limits<double>::infinity();
    cout << "Estimated sigma_max(B) ≈ " << sMaxB << ", sigma_min(B) ≈ " << sMinB << "\n";
    cout << "Estimated condition number kappa_2(B) ≈ " << kappa2 << "\n";
    return 0;
}

This program estimates the spectral norm of a (possibly rectangular) matrix using power iteration without forming A^T A explicitly. For a square matrix, it also estimates the smallest singular value by running power iteration on (A^T A)^{-1} via repeated LU-based solves, then computes the 2-norm condition number as σ_max/σ_min. Using solves instead of forming an explicit inverse improves numerical stability.

Time: the spectral norm estimate is O(kmn) for k iterations; for the smallest singular value, building AᵀA is O(mn²), the LU factorization is O(n³) once, and each of k iterations costs O(n²).
Space: O(m+n) for vectors in the spectral norm estimate; O(n²) to store AᵀA and its LU factors for the smallest singular value.
Approximate nuclear norm via randomized subspace iteration (sum of top-k singular values)
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vec = vector<double>;

// Matrix utilities
int nrows(const Matrix& A) { return (int)A.size(); }
int ncols(const Matrix& A) { return (int)A[0].size(); }

Matrix transpose(const Matrix& A) {
    int m = nrows(A), n = ncols(A);
    Matrix AT(n, vector<double>(m));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            AT[j][i] = A[i][j];
    return AT;
}

Matrix matmul(const Matrix& A, const Matrix& B) {
    int m = nrows(A), p = ncols(A), n = ncols(B);
    Matrix C(m, vector<double>(n, 0.0));
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < p; ++k) {
            double aik = A[i][k];
            for (int j = 0; j < n; ++j) C[i][j] += aik * B[k][j];
        }
    return C;
}

// Modified Gram-Schmidt to orthonormalize columns of A -> Q
Matrix mgsOrthonormalize(const Matrix& A) {
    int m = nrows(A), r = ncols(A);
    Matrix Q = A;
    for (int j = 0; j < r; ++j) {
        for (int k = 0; k < j; ++k) {
            double dot = 0;
            for (int i = 0; i < m; ++i) dot += Q[i][j] * Q[i][k];
            for (int i = 0; i < m; ++i) Q[i][j] -= dot * Q[i][k];
        }
        double nrm = 0;
        for (int i = 0; i < m; ++i) nrm += Q[i][j] * Q[i][j];
        nrm = sqrt(max(nrm, 1e-30));
        for (int i = 0; i < m; ++i) Q[i][j] /= nrm;
    }
    return Q;
}

// Symmetric Jacobi eigenvalue algorithm for small r x r matrices
vector<double> jacobiEigenvalues(Matrix A, int maxSweeps = 100, double tol = 1e-10) {
    int n = nrows(A);
    for (int sweep = 0; sweep < maxSweeps; ++sweep) {
        double off = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) off += fabs(A[i][j]);
        if (off < tol) break;
        for (int p = 0; p < n; ++p) {
            for (int q = p + 1; q < n; ++q) {
                if (fabs(A[p][q]) < 1e-15) continue;
                double app = A[p][p], aqq = A[q][q], apq = A[p][q];
                double tau = (aqq - app) / (2.0 * apq);
                double t = (tau >= 0) ? 1.0 / (tau + sqrt(1 + tau * tau))
                                      : 1.0 / (tau - sqrt(1 + tau * tau));
                double c = 1.0 / sqrt(1 + t * t), s = t * c;
                // Apply rotation to rows p,q and columns p,q (symmetric update)
                for (int k = 0; k < n; ++k) {
                    double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk;
                    A[q][k] = s * apk + c * aqk;
                }
                for (int k = 0; k < n; ++k) {
                    double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq;
                    A[k][q] = s * akp + c * akq;
                }
            }
        }
    }
    vector<double> eval(n);
    for (int i = 0; i < n; ++i) eval[i] = A[i][i];
    return eval;
}

// Randomized subspace iteration to approximate the nuclear norm by the
// sum of the top-r singular values
double nuclearNormApprox(const Matrix& A, int r = 5, int q = 1) {
    int m = nrows(A), n = ncols(A);
    r = min({r, m, n});
    // 1) Random starting matrix Omega (n x r)
    std::mt19937 rng(7);
    std::normal_distribution<double> N(0.0, 1.0);
    Matrix Omega(n, vector<double>(r));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < r; ++j) Omega[i][j] = N(rng);

    // 2) Y = A * Omega; power steps: Y = (A A^T)^q * A * Omega
    Matrix Y = matmul(A, Omega); // m x r
    Y = mgsOrthonormalize(Y);
    Matrix AT = transpose(A);
    for (int it = 0; it < q; ++it) {
        Matrix Z = matmul(AT, Y); // n x r
        Z = mgsOrthonormalize(Z);
        Y = matmul(A, Z); // m x r
        Y = mgsOrthonormalize(Y);
    }

    // 3) Form compressed matrix B = Q^T A (r x n)
    Matrix Q = Y; // columns orthonormal
    Matrix B = matmul(transpose(Q), A); // r x n

    // 4) Eigenvalues of B B^T (r x r); their square roots are singular values of B
    Matrix BBt = matmul(B, transpose(B));
    vector<double> evals = jacobiEigenvalues(BBt, 100, 1e-12);
    double sumSigma = 0.0;
    for (double lam : evals) sumSigma += sqrt(max(0.0, lam));
    // This approximates the sum of the top-r singular values of A
    // (a lower bound on ||A||_*)
    return sumSigma;
}

int main() {
    // Example matrix with low-rank structure
    Matrix A = {
        {1.0, 2.0, 3.0, 4.0},
        {2.0, 4.1, 6.1, 8.2},
        {0.9, 1.8, 2.7, 3.6}
    }; // 3x4, approximately rank 1 or 2

    cout.setf(std::ios::fixed); cout << setprecision(6);
    double approxNuc = nuclearNormApprox(A, /*r=*/2, /*q=*/1);
    cout << "Approximate nuclear norm (sum of top-2 sigmas) ≈ " << approxNuc << "\n";
    // Note: this is a lower bound on the true nuclear norm when r < rank(A).
    return 0;
}

This program approximates the nuclear (trace) norm by summing the top-r singular values using randomized subspace iteration. It builds a small r-dimensional orthonormal basis capturing A’s dominant action, compresses A, and computes the singular values of the compressed matrix via eigenvalues of B B^T. The result is a lower bound on the true nuclear norm when r is smaller than the actual rank; increasing r and the number of power steps q improves accuracy.

Time: O(mnr) for the initial sample and compression plus O(qmnr) for q power steps; the final r×r eigendecomposition is O(r³).
Space: O((m+n)r) for the subspace and compressed matrix; O(r²) for the small eigenproblem.
#matrix norm#spectral norm#frobenius norm#nuclear norm#condition number#singular values#svd#power iteration#randomized algorithms#orthogonal invariance#operator norm#low-rank#stability#ill-conditioned#trace norm