How I Study AI
Math · Intermediate

Matrix Norms & Condition Numbers

Key Points

  • Matrix norms measure the size of a matrix in different but related ways: Frobenius treats the entries like one long vector, spectral measures the strongest stretch, and nuclear sums all singular values.
  • The spectral norm equals the largest singular value and controls how much a matrix can amplify any vector in the 2-norm.
  • The Frobenius norm equals the square root of the sum of squared entries, which is also the square root of the trace of AᵀA.
  • The nuclear norm equals the sum of singular values and is a convex surrogate for matrix rank, useful in low-rank recovery and regularization.
  • The condition number in the 2-norm is the ratio of largest to smallest singular value and predicts how sensitive solutions of Ax=b are to perturbations.
  • Exact spectral and nuclear norms require an SVD, but fast approximations can be obtained with power iteration and randomized subspace methods.
  • Unitary (orthogonal) invariance means the Frobenius, spectral, and nuclear norms do not change under orthogonal row/column rotations.
  • Avoid explicitly forming AᵀA when possible for numerical stability; prefer multiplying by A and Aᵀ directly or using factorizations.

Prerequisites

  • Vector norms (1-, 2-, and ∞-norms) — Matrix operator norms are defined via induced vector norms; understanding vector norms clarifies their meaning.
  • Eigenvalues and eigenvectors — The spectral norm and condition numbers are defined via eigenvalues of AᵀA and their properties.
  • Singular Value Decomposition (SVD) — Spectral and nuclear norms are expressed in terms of singular values; many properties follow from the SVD.
  • Orthogonality and orthogonal matrices — Unitary invariance and stability arguments use orthogonal transformations and inner products.
  • Basic numerical linear algebra (LU/QR) — Stable computation of smallest singular values and solving linear systems relies on factorizations.

Detailed Explanation


01 Overview

Matrix norms generalize the idea of vector length to matrices and let us reason about the size, amplification, and stability of linear transformations. Three especially important norms are the Frobenius norm, spectral (operator 2-) norm, and nuclear (trace) norm. The Frobenius norm treats all entries as one long vector and measures their Euclidean length. The spectral norm measures the strongest stretching factor a matrix can apply to any vector when using the Euclidean vector norm. The nuclear norm adds up all singular values and is widely used because it is the convex envelope of matrix rank on the unit spectral-norm ball. Condition numbers combine a matrix norm with the norm of its inverse (when the inverse exists) to quantify sensitivity: a high condition number means small input changes can blow up into large output errors. These tools provide the language for error bounds, stability guarantees, and regularization in algorithms spanning numerical linear algebra, optimization, machine learning, signal processing, and control.

02 Intuition & Analogies

Imagine a matrix as a machine that takes in a vector and outputs another vector by stretching, squashing, and rotating it. If you put in a unit-length arrow and the machine sometimes stretches it a lot, the spectral norm is that maximum possible stretch. It’s like testing the machine with all possible unit arrows and recording the largest output length: this tells you the worst-case amplification. Now think of the Frobenius norm as weighing the machine’s overall “energy.” If you disassembled the machine into gears (the entries of the matrix) and measured the squared sizes of all gears, the Frobenius norm is the square root of their total. It does not care about direction; it cares about aggregate magnitude, much like the overall power consumption of a device. The nuclear norm views the machine through its singular values (its principal stretch factors in orthogonal directions). If you imagine the machine stretching space along several independent axes by amounts σ1, σ2, …, the nuclear norm is the total stretch budget: the sum of these amounts. This is why it relates to rank—the number of nonzero axes—and becomes a natural tool to encourage low-rank structure by penalizing total stretch. Finally, condition number is a fragility meter. If the machine barely squashes some direction (very small σ_min) but strongly stretches another (large σ_max), then tiny errors along the squashed direction can explode when you try to reverse the process. That’s a high condition number and a warning sign for solving Ax=b reliably.

03 Formal Definition

Let A ∈ ℝ^{m×n}. The Frobenius norm is ∥A∥_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}² ) = √( trace(AᵀA) ). The spectral (operator 2-) norm is ∥A∥₂ = max_{∥x∥₂=1} ∥Ax∥₂ = σ_max(A) = √( λ_max(AᵀA) ), where σ_max denotes the largest singular value and λ_max the largest eigenvalue. The nuclear (trace) norm is ∥A∥_* = ∑_{i=1}^{min(m,n)} σ_i(A) = trace( (AᵀA)^{1/2} ), the sum of singular values. All three are unitarily invariant: for orthogonal U and V, ∥UAV∥_F = ∥A∥_F, ∥UAV∥₂ = ∥A∥₂, and ∥UAV∥_* = ∥A∥_*. The spectral norm is submultiplicative, ∥AB∥₂ ≤ ∥A∥₂ ∥B∥₂, and all three satisfy the triangle inequality ∥A+B∥ ≤ ∥A∥ + ∥B∥. For an invertible square A, the 2-norm condition number is κ₂(A) = ∥A∥₂ ∥A⁻¹∥₂ = σ_max(A) / σ_min(A) ≥ 1, where σ_min is the smallest singular value.

04 When to Use

Use the Frobenius norm when you want an entrywise measure that is easy to compute and differentiable—common in loss functions (e.g., least squares) and when bounding total energy of noise. Use the spectral norm when you care about worst-case amplification, stability of iterative methods, Lipschitz constants of linear layers (e.g., in deep learning), or need tight operator-norm bounds in proofs. Use the nuclear norm when promoting or measuring low-rank structure—matrix completion, robust PCA, and regularization in recommendation systems—because it is a convex proxy for rank that is optimization-friendly. Use condition numbers to predict sensitivity of solutions to Ax=b, to choose tolerances in numerical solvers, to diagnose ill-conditioning, and to compare algorithmic stability across formulations. In implementation, compute Frobenius exactly in O(mn); estimate spectral norm with power iteration or Lanczos when SVD is too expensive; and estimate nuclear norm with randomized subspace methods when only the leading singular values matter. When top accuracy is required or matrices are small, compute exact SVD to get spectral and nuclear norms directly.

⚠️Common Mistakes

  • Confusing entrywise norms (like Frobenius) with operator norms (like spectral). The Frobenius norm does not control worst-case amplification; use the spectral norm for that. Avoid mixing them in bounds without proper inequalities.
  • Forming AᵀA explicitly for spectral computations can square the condition number and magnify round-off. Prefer multiplying by A and Aᵀ (as in power/Lanczos iterations) or use stable factorizations.
  • Assuming the 2-norm condition number applies to rectangular or singular matrices without care. For non-square matrices, use the ratio σ_max/σ_min of A's nonzero singular values; for rank-deficient matrices, κ₂ is infinite.
  • Stopping power iteration too early or with a poor initial vector can yield underestimates. Use a tolerance on successive singular value estimates and multiple restarts if needed.
  • Believing the nuclear norm equals rank or always recovers the minimal rank. It is a convex surrogate that often, but not always, promotes low rank; guarantees require incoherence and sampling conditions.
  • Ignoring unitary invariance. Applying orthogonal transforms (e.g., QR preconditioning) does not change these norms, which can simplify analysis.
  • Using the matrix inverse explicitly to estimate σ_min. Solve linear systems with LU/QR and inverse iteration instead of forming A⁻¹, to avoid numerical instability.

Key Formulas

Frobenius norm

∥A∥_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}² ) = √( trace(AᵀA) )

Explanation: Adds up the squares of all entries and takes the square root. Equivalently, it is the square root of the trace of AᵀA, highlighting its relation to singular values.

Spectral norm

∥A∥₂ = max_{∥x∥₂=1} ∥Ax∥₂ = σ_max(A) = √( λ_max(AᵀA) )

Explanation: The largest stretch a matrix can apply to a unit vector equals the largest singular value. Practically, this can be approximated with power iteration on A and Aᵀ.

Nuclear norm

∥A∥_* = ∑_{i=1}^{min(m,n)} σ_i(A) = trace( (AᵀA)^{1/2} )

Explanation: Sums all singular values and equals the trace of the matrix square root of AᵀA. It is convex and promotes low rank in optimization problems.

2-norm condition number

κ₂(A) = ∥A∥₂ ∥A⁻¹∥₂ = σ_max(A) / σ_min(A)

Explanation: Measures sensitivity of solving Ax=b under 2-norm. Large values indicate potential amplification of relative errors.

Submultiplicativity (2-norm)

∥AB∥₂ ≤ ∥A∥₂ ∥B∥₂

Explanation: The worst-case amplification of a product is at most the product of worst-case amplifications. Useful for bounding errors in multi-stage pipelines.

Operator inequality

∥Ax∥₂ ≤ ∥A∥₂ ∥x∥₂

Explanation: Applying A to any vector cannot increase its 2-norm by more than the spectral norm. This underpins Lipschitz bounds and stability of linear layers.

Norm equivalences (2 vs Frobenius)

∥A∥₂ ≤ ∥A∥_F ≤ √(rank(A)) ∥A∥₂

Explanation: The Frobenius norm lies between the spectral norm and its √rank(A) multiple. This helps translate bounds between entrywise and operator measures.

Nuclear–Frobenius inequality

∥A∥_* ≤ √(rank(A)) ∥A∥_F

Explanation: By Cauchy–Schwarz on the singular values, their sum is bounded by √r times their Euclidean length. It relates low-rank structure to total energy.
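For readers who want the one-line proof, the Cauchy–Schwarz step can be written out explicitly (with r = rank(A), so σ_i = 0 for i > r):

```latex
\|A\|_* \;=\; \sum_{i=1}^{r} \sigma_i \cdot 1
\;\le\; \Big(\sum_{i=1}^{r} \sigma_i^2\Big)^{1/2} \Big(\sum_{i=1}^{r} 1^2\Big)^{1/2}
\;=\; \sqrt{r}\, \|A\|_F .
```

Equality holds exactly when all nonzero singular values are equal, e.g., for orthogonal matrices.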

Frobenius–singular values relation

∥A∥_F² = ∑_{i=1}^{min(m,n)} σ_i(A)² = trace(AᵀA)

Explanation: The squared Frobenius norm equals the sum of squared singular values. It connects entrywise energy to principal stretches.

Sensitivity bound

If (A+ΔA)(x+Δx) = b+Δb, then ∥Δx∥₂ / ∥x∥₂ ≲ κ₂(A) ( ∥ΔA∥₂ / ∥A∥₂ + ∥Δb∥₂ / ∥b∥₂ )

Explanation: Relative solution error is bounded by the condition number times relative data perturbations. It motivates monitoring κ(A) in practice.

Complexity Analysis

Computing the Frobenius norm is linear in the number of entries: O(mn) time and O(1) extra space, since we sum squares of all elements. Exact spectral and nuclear norms require the singular value decomposition (SVD). A dense SVD costs O(mn²) when m ≥ n, or O(m²n) when n ≥ m, and O(mn) memory for storing factors; this is the gold standard for accuracy but can be prohibitive for large matrices. For large-scale problems, iterative approximations are preferred. Power iteration estimates the largest singular value using only matrix–vector products with A and Aᵀ; each iteration costs O(mn) time and O(m+n) space, and a modest number (10–200) of iterations typically suffices, depending on the spectral gap. Estimating the smallest singular value (for the condition number) is harder; inverse iteration on (AᵀA)⁻¹ uses linear solves that can be accelerated by an LU or Cholesky factorization computed once in O(n³) time and O(n²) space, followed by O(n²) per iteration. Randomized subspace iteration to approximate the nuclear norm with a rank-r basis requires O(qmnr) time for q power steps, plus O(mnr) to form the compressed matrix, and only O((m+n)r) memory; the final r×r eigendecomposition is negligible (O(r³)). Numerically, avoid explicitly forming AᵀA for spectral norm estimation to prevent loss of accuracy; prefer applying A and Aᵀ. When exactness is required or matrices are small, SVD remains the most reliable approach.

Code Examples

Exact Frobenius norm (and basic helpers)
#include <bits/stdc++.h>
using namespace std;

// Compute Frobenius norm: ||A||_F = sqrt(sum_{i,j} a_{ij}^2)
double frobeniusNorm(const vector<vector<double>>& A) {
    long long m = (long long)A.size();
    long long n = m ? (long long)A[0].size() : 0;
    long double sumsq = 0.0L;
    for (long long i = 0; i < m; ++i) {
        for (long long j = 0; j < n; ++j) {
            sumsq += (long double)A[i][j] * (long double)A[i][j];
        }
    }
    return sqrt((double)sumsq);
}

int main() {
    // Example matrix (3x3)
    vector<vector<double>> A = {
        {1.0, -2.0, 3.0},
        {4.0, 0.5, -1.0},
        {0.0, 2.0, 1.0}
    };

    double f = frobeniusNorm(A);
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Frobenius norm ||A||_F = " << f << "\n";

    // Sanity check: ||A||_F^2 equals trace(A^T A)
    // Compute A^T A and its trace
    int m = (int)A.size(), n = (int)A[0].size();
    vector<vector<double>> AtA(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < m; ++k)
            for (int j = 0; j < n; ++j)
                AtA[i][j] += A[k][i] * A[k][j];
    double tr = 0.0;
    for (int i = 0; i < n; ++i) tr += AtA[i][i];
    cout << "trace(A^T A) = " << tr << ", (||A||_F)^2 = " << (f * f) << "\n";
    return 0;
}

This program computes the Frobenius norm exactly by summing squares of all entries. It also verifies the identity ||A||_F^2 = trace(A^T A) by explicitly forming A^T A and comparing the diagonal sum with the squared Frobenius norm.

Time: O(mn) for the Frobenius norm; O(mn²) extra for the AᵀA sanity check in this demo.
Space: O(1) extra for the Frobenius norm; O(n²) to store AᵀA in the sanity check.
Estimate spectral norm and 2-norm condition number via power and inverse iteration
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vec = vector<double>;

// Multiply y = A * x (A: m x n, x: n)
Vec matVec(const Matrix& A, const Vec& x) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vec y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

// Multiply z = A^T * y (A: m x n, y: m)
Vec matTVec(const Matrix& A, const Vec& y) {
    int m = (int)A.size(), n = (int)A[0].size();
    Vec z(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += A[i][j] * y[i];
        z[j] = s;
    }
    return z;
}

// Vector helpers
double norm2(const Vec& v) { double s = 0; for (double x : v) s += x * x; return sqrt(s); }
void normalize(Vec& v) { double n = norm2(v); if (n > 0) for (double& x : v) x /= n; }
double dot(const Vec& a, const Vec& b) { double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s; }

// Power iteration to estimate spectral norm (largest singular value)
double spectralNormPower(const Matrix& A, int maxIters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    std::mt19937 rng(42);
    std::normal_distribution<double> N(0.0, 1.0);
    Vec v(n);
    for (int i = 0; i < n; ++i) v[i] = N(rng);
    normalize(v);
    double prev = 0.0;
    for (int it = 0; it < maxIters; ++it) {
        // u = A v / ||A v||
        Vec Av = matVec(A, v);
        double nAv = norm2(Av);
        if (nAv == 0) return 0.0; // zero matrix
        for (double& x : Av) x /= nAv;
        // v = A^T u / ||A^T u||
        Vec ATu = matTVec(A, Av);
        normalize(ATu);
        v = ATu;
        // Rayleigh quotient estimate: sigma ~= ||A v||
        double sigma = norm2(matVec(A, v));
        if (fabs(sigma - prev) < tol * max(1.0, sigma)) return sigma;
        prev = sigma;
    }
    return prev;
}

// Build A^T A explicitly (n x n)
Matrix buildAtA(const Matrix& A) {
    int m = (int)A.size(), n = (int)A[0].size();
    Matrix AtA(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < m; ++k)
            for (int j = 0; j < n; ++j)
                AtA[i][j] += A[k][i] * A[k][j];
    return AtA;
}

// LU decomposition with partial pivoting for a square matrix
struct LU { Matrix LUmat; vector<int> piv; bool ok; };

LU luDecompose(Matrix A) {
    int n = (int)A.size();
    vector<int> piv(n);
    iota(piv.begin(), piv.end(), 0);
    for (int k = 0; k < n; ++k) {
        // pivot selection
        int p = k;
        double maxabs = fabs(A[k][k]);
        for (int i = k + 1; i < n; ++i)
            if (fabs(A[i][k]) > maxabs) { maxabs = fabs(A[i][k]); p = i; }
        if (maxabs == 0.0) return {A, piv, false}; // singular
        if (p != k) { swap(A[p], A[k]); swap(piv[p], piv[k]); }
        // elimination
        for (int i = k + 1; i < n; ++i) {
            A[i][k] /= A[k][k];
            double lik = A[i][k];
            for (int j = k + 1; j < n; ++j) A[i][j] -= lik * A[k][j];
        }
    }
    return {A, piv, true};
}

// Solve A x = b from LU with pivots
Vec luSolve(const LU& lu, const Vec& b) {
    int n = (int)lu.LUmat.size();
    // Apply row permutations to b
    Vec bp(n);
    for (int i = 0; i < n; ++i) bp[i] = b[lu.piv[i]];
    // Forward solve L y = bp (L has unit diagonal)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j) bp[i] -= lu.LUmat[i][j] * bp[j];
    // Backward solve U x = y
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) bp[i] -= lu.LUmat[i][j] * bp[j];
        bp[i] /= lu.LUmat[i][i];
    }
    return bp;
}

// Estimate smallest singular value via power iteration on (A^T A)^{-1}
// Requires LU factorization of M = A^T A (assumed nonsingular)
double smallestSigmaEstimate(const Matrix& A, int maxIters = 200, double tol = 1e-8) {
    int n = (int)A[0].size();
    Matrix M = buildAtA(A); // n x n, SPD if A has full column rank
    LU lu = luDecompose(M);
    if (!lu.ok) return 0.0; // singular or rank-deficient -> sigma_min = 0

    std::mt19937 rng(123);
    std::normal_distribution<double> N(0.0, 1.0);
    Vec v(n);
    for (int i = 0; i < n; ++i) v[i] = N(rng);
    normalize(v);
    double prev = 0.0;
    for (int it = 0; it < maxIters; ++it) {
        // w = M^{-1} v (solve M w = v)
        Vec w = luSolve(lu, v);
        double nw = norm2(w);
        if (nw == 0.0) return 0.0;
        // Rayleigh quotient for M^{-1}
        double lambda = dot(v, w) / dot(v, v);
        // normalize for next iteration
        for (int i = 0; i < n; ++i) v[i] = w[i] / nw;
        if (fabs(lambda - prev) < tol * max(1.0, fabs(lambda))) {
            // sigma_min = 1 / sqrt(lambda_max(M^{-1}))
            return 1.0 / sqrt(lambda);
        }
        prev = lambda;
    }
    // Fallback after maxIters
    double lambda = prev > 0 ? prev : 0.0;
    return (lambda > 0) ? 1.0 / sqrt(lambda) : 0.0;
}

int main() {
    // Example: tall matrix (m x n) for spectral norm; square for condition number
    Matrix A = {
        {3.0, 1.0, 0.0},
        {0.0, 2.0, 2.0},
        {0.0, 0.0, 0.5},
        {1.0, 0.0, 0.0}
    }; // 4x3

    cout.setf(std::ios::fixed); cout << setprecision(8);
    double sigmaMax = spectralNormPower(A, 300, 1e-9);
    cout << "Estimated spectral norm ||A||_2 ≈ " << sigmaMax << "\n";

    // For the condition number, use a square invertible matrix
    Matrix B = {
        {4.0, 1.0, 0.0},
        {2.0, 3.0, 1.0},
        {0.0, 1.0, 2.0}
    }; // 3x3

    double sMaxB = spectralNormPower(B, 300, 1e-9);
    double sMinB = smallestSigmaEstimate(B, 300, 1e-9);
    double kappa2 = (sMinB > 0) ? (sMaxB / sMinB) : numeric_limits<double>::infinity();
    cout << "Estimated sigma_max(B) ≈ " << sMaxB << ", sigma_min(B) ≈ " << sMinB << "\n";
    cout << "Estimated condition number kappa_2(B) ≈ " << kappa2 << "\n";
    return 0;
}

This program estimates the spectral norm of a (possibly rectangular) matrix using power iteration without forming A^T A explicitly. For a square matrix, it also estimates the smallest singular value by running power iteration on (A^T A)^{-1} via repeated LU-based solves, then computes the 2-norm condition number as σ_max/σ_min. Using solves instead of forming an explicit inverse improves numerical stability.

Time: the spectral norm estimate is O(kmn) for k iterations; for the smallest singular value, building AᵀA is O(mn²), the LU factorization is O(n³) once, and each of k iterations costs O(n²).
Space: O(m+n) for vectors in the spectral norm estimate; O(n²) to store AᵀA and its LU factors for the smallest singular value.
Approximate nuclear norm via randomized subspace iteration (sum of top-k singular values)
#include <bits/stdc++.h>
using namespace std;

using Matrix = vector<vector<double>>;
using Vec = vector<double>;

// Matrix utilities
int nrows(const Matrix& A) { return (int)A.size(); }
int ncols(const Matrix& A) { return (int)A[0].size(); }

Matrix transpose(const Matrix& A) {
    int m = nrows(A), n = ncols(A);
    Matrix AT(n, vector<double>(m));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            AT[j][i] = A[i][j];
    return AT;
}

Matrix matmul(const Matrix& A, const Matrix& B) {
    int m = nrows(A), p = ncols(A), n = ncols(B);
    Matrix C(m, vector<double>(n, 0.0));
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < p; ++k) {
            double aik = A[i][k];
            for (int j = 0; j < n; ++j) C[i][j] += aik * B[k][j];
        }
    return C;
}

// Modified Gram-Schmidt to orthonormalize columns of A -> Q
Matrix mgsOrthonormalize(const Matrix& A) {
    int m = nrows(A), r = ncols(A);
    Matrix Q = A;
    for (int j = 0; j < r; ++j) {
        for (int k = 0; k < j; ++k) {
            double dot = 0;
            for (int i = 0; i < m; ++i) dot += Q[i][j] * Q[i][k];
            for (int i = 0; i < m; ++i) Q[i][j] -= dot * Q[i][k];
        }
        double nrm = 0;
        for (int i = 0; i < m; ++i) nrm += Q[i][j] * Q[i][j];
        nrm = sqrt(max(nrm, 1e-30));
        for (int i = 0; i < m; ++i) Q[i][j] /= nrm;
    }
    return Q;
}

// Symmetric Jacobi eigenvalue algorithm for small r x r matrices
vector<double> jacobiEigenvalues(Matrix A, int maxSweeps = 100, double tol = 1e-10) {
    int n = nrows(A);
    for (int sweep = 0; sweep < maxSweeps; ++sweep) {
        double off = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) off += fabs(A[i][j]);
        if (off < tol) break;
        for (int p = 0; p < n; ++p) {
            for (int q = p + 1; q < n; ++q) {
                if (fabs(A[p][q]) < 1e-15) continue;
                double app = A[p][p], aqq = A[q][q], apq = A[p][q];
                double tau = (aqq - app) / (2.0 * apq);
                double t = (tau >= 0) ? 1.0 / (tau + sqrt(1 + tau * tau))
                                      : 1.0 / (tau - sqrt(1 + tau * tau));
                double c = 1.0 / sqrt(1 + t * t), s = t * c;
                // Apply rotation to rows p,q and columns p,q (symmetric update)
                for (int k = 0; k < n; ++k) {
                    double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk;
                    A[q][k] = s * apk + c * aqk;
                }
                for (int k = 0; k < n; ++k) {
                    double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq;
                    A[k][q] = s * akp + c * akq;
                }
            }
        }
    }
    vector<double> eval(n);
    for (int i = 0; i < n; ++i) eval[i] = A[i][i];
    return eval;
}

// Randomized subspace iteration to approximate the nuclear norm by the
// sum of the top-r singular values
double nuclearNormApprox(const Matrix& A, int r = 5, int q = 1) {
    int m = nrows(A), n = ncols(A);
    r = min({r, m, n});
    // 1) Random starting matrix Omega (n x r)
    std::mt19937 rng(7);
    std::normal_distribution<double> N(0.0, 1.0);
    Matrix Omega(n, vector<double>(r));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < r; ++j) Omega[i][j] = N(rng);

    // 2) Y = A * Omega; power steps: Y = (A A^T)^q * A * Omega
    Matrix Y = matmul(A, Omega); // m x r
    Y = mgsOrthonormalize(Y);
    Matrix AT = transpose(A);
    for (int it = 0; it < q; ++it) {
        Matrix Z = matmul(AT, Y); // n x r
        Z = mgsOrthonormalize(Z);
        Y = matmul(A, Z); // m x r
        Y = mgsOrthonormalize(Y);
    }

    // 3) Form compressed matrix B = Q^T A (r x n)
    Matrix Q = Y; // columns orthonormal
    Matrix B = matmul(transpose(Q), A); // r x n

    // 4) Eigenvalues of B B^T (r x r); their square roots are singular values of B
    Matrix BBt = matmul(B, transpose(B));
    vector<double> evals = jacobiEigenvalues(BBt, 100, 1e-12);
    double sumSigma = 0.0;
    for (double lam : evals) sumSigma += sqrt(max(0.0, lam));
    // This approximates the sum of the top-r singular values of A
    // (a lower bound on ||A||_*)
    return sumSigma;
}

int main() {
    // Example matrix with low-rank structure
    Matrix A = {
        {1.0, 2.0, 3.0, 4.0},
        {2.0, 4.1, 6.1, 8.2},
        {0.9, 1.8, 2.7, 3.6}
    }; // 3x4, approximately rank 1 or 2

    cout.setf(std::ios::fixed); cout << setprecision(6);
    double approxNuc = nuclearNormApprox(A, /*r=*/2, /*q=*/1);
    cout << "Approximate nuclear norm (sum of top-2 sigmas) ≈ " << approxNuc << "\n";
    // Note: this is a lower bound on the true nuclear norm when r < rank(A).
    return 0;
}

This program approximates the nuclear (trace) norm by summing the top-r singular values using randomized subspace iteration. It builds a small r-dimensional orthonormal basis capturing A’s dominant action, compresses A, and computes the singular values of the compressed matrix via eigenvalues of B B^T. The result is a lower bound on the true nuclear norm when r is smaller than the actual rank; increasing r and the number of power steps q improves accuracy.

Time: O(mnr) for the initial sample and compression plus O(qmnr) for q power steps; the final r×r eigendecomposition is O(r³).
Space: O((m+n)r) for the subspace and compressed matrix; O(r²) for the small eigenproblem.
#matrix norm#spectral norm#frobenius norm#nuclear norm#condition number#singular values#svd#power iteration#randomized algorithms#orthogonal invariance#operator norm#low-rank#stability#ill-conditioned#trace norm