Matrix Factorizations (Numerical)
Key Points
- Matrix factorizations rewrite a matrix into simpler building blocks (triangular or orthogonal) that make solving and analyzing linear systems much easier.
- LU with partial pivoting is the standard tool for solving square systems efficiently and stably by reducing A to lower and upper triangular factors.
- QR (preferably via Householder reflections) is the robust method for least-squares problems and for obtaining orthonormal bases.
- Cholesky is the fastest and most memory-friendly factorization for symmetric positive definite (SPD) matrices.
- Numerical stability matters: use partial pivoting for LU, Householder (or modified Gram–Schmidt) for QR, and check SPD before Cholesky.
- Operation counts are cubic in the matrix dimension for dense problems: roughly O(n^{3}) time and O(n^{2}) space.
- Determinants, ranks, and condition indicators can be extracted cheaply from factorizations (e.g., det(A) via LU diagonals).
- In practice, prefer library routines (BLAS/LAPACK/Eigen); the provided C++ code demonstrates the algorithms and their pitfalls on small inputs.
Prerequisites
- Gaussian elimination — Foundation for LU factorization and understanding pivoting and triangular solves.
- Vector and matrix norms — Needed to discuss orthogonality, conditioning, and least-squares residuals.
- Dot products and projections — Core to Gram–Schmidt and understanding Householder reflections.
- Properties of SPD matrices — Required to know when Cholesky applies and why it is stable and unique.
- Floating-point arithmetic and rounding error — Essential for understanding numerical stability and algorithm choices.
- Big-O notation and flop counts — To analyze algorithmic time and space complexity.
- Triangular systems (forward/back substitution) — Used after LU, QR, and Cholesky to complete solves.
- Basic C++ (vectors, loops, exceptions) — To read, write, and modify the provided implementations safely.
- Matrix multiplication basics — Helps understand how factorizations reconstruct A and how reflectors act on matrices.
- Least-squares problem formulation — Necessary context for when to choose QR and how to interpret solutions.
Detailed Explanation
01 Overview
Matrix factorizations are techniques that decompose a matrix into a product of simpler matrices that are easier to work with. Instead of directly manipulating a complicated matrix A, we rewrite it as a product such as A = LU (lower times upper triangular), A = QR (orthogonal times upper triangular), or A = LL^{T} (Cholesky for symmetric positive definite matrices). These forms let us solve linear systems, compute least-squares solutions, estimate determinants, and analyze numerical stability using simpler building blocks. For example, triangular systems can be solved quickly by forward or backward substitution, and orthogonal matrices preserve lengths, which improves numerical robustness. In numerical linear algebra, these factorizations are the workhorses behind many algorithms in scientific computing, machine learning, and data analysis. They convert expensive, error-prone manipulations into structured computations with well-understood costs and stability properties. While all three factorizations have O(n^3) time complexity for dense n×n matrices, they differ in constants, stability, and applicability: LU with pivoting for general square matrices, QR (preferably Householder) for least squares or when orthogonality is advantageous, and Cholesky for the special (but common) SPD case that yields the fastest, most stable solve.
02 Intuition & Analogies
Think of a messy tool cabinet (your matrix A) that makes it hard to find and use tools (solutions). Factoring the matrix is like reorganizing the cabinet into drawers by type: one drawer for screwdrivers (lower triangular L), another for wrenches (upper triangular U), or perhaps grouping tools by size and orientation (orthogonal Q and triangular R). Once organized, you can quickly pull the right tool to solve a problem: triangular systems are easy to handle step by step, and orthogonal matrices don’t distort distances, so computations remain well-behaved.
- LU: Imagine stacking blocks to build a wall. Lower-triangular L tells you how much of each “lower-level” block supports the current row, while U stores the resulting top edges you build on. Partial pivoting is like choosing the sturdiest block to stand on to avoid wobble (numerical instability).
- QR: Picture aligning a set of arrows (vectors) so that each new arrow is made perpendicular to the previously aligned ones and then scaled. Q stores the orthonormal directions (nice, perpendicular axes), and R stores the lengths and how to combine them. Reflecting a vector across a plane (Householder reflection) is a stable way to realign arrows without accumulating much error.
- Cholesky: If your cabinet is perfectly symmetric and sturdy (SPD), you can halve the work: the matrix equals L times its own transpose. It’s like using mirrored drawers—organize once, and the other side follows automatically—making the process faster and more memory efficient.
03 Formal Definition
- LU with partial pivoting: for any nonsingular A in R^{n×n}, there exist a permutation matrix P, a unit lower-triangular L, and an upper-triangular U such that PA = LU.
- QR: for any A in R^{m×n} with m >= n, there exist an orthogonal Q in R^{m×m} (Q^{T}Q = I) and an upper-triangular R in R^{m×n} such that A = QR.
- Cholesky: for any symmetric positive definite A, there is a unique lower-triangular L with positive diagonal entries such that A = LL^{T}.
04 When to Use
- LU with partial pivoting (PA = LU): Use for solving many linear systems Ax = b with the same square A and different right-hand sides b. It’s the default for general dense square matrices. Also useful for computing det(A) (product of U’s diagonal times pivot sign) and for inverting A in a structured way (though direct inversion is usually discouraged).
- QR (A = QR): Prefer for overdetermined least-squares problems (m > n), where you want a numerically stable solution that avoids forming A^{T}A. Also use when you need an orthonormal basis for the column space, to estimate rank, or in algorithms like eigenvalue methods (e.g., the QR algorithm for eigenvalues).
- Cholesky (A = LL^{T}): Choose when A is symmetric positive definite—common in covariance matrices, kernel matrices, and normal-equation matrices from least squares. It’s the fastest factorization for such A, with fewer flops and great stability. Use it to solve Ax = b (two triangular solves), compute log-determinants, and sample from Gaussian distributions. In practice: if you know A is SPD, use Cholesky; if you have general square A, use LU with pivoting; if you face least-squares or want orthogonality, use QR (preferably Householder).
⚠️ Common Mistakes
- Skipping pivoting in LU: Without partial pivoting, Gaussian elimination can be numerically unstable or even fail due to tiny pivots. Always use partial pivoting for general matrices.
- Using classical Gram–Schmidt for QR: It suffers loss of orthogonality in finite precision. Prefer Householder reflections (most stable) or modified Gram–Schmidt for teaching or moderate accuracy needs.
- Applying Cholesky to non-SPD matrices: This leads to taking square roots of negative numbers or division by zero. Always check symmetry (A = A^{T} within a tolerance) and positive definiteness (leading principal minors positive, or attempt the factorization and detect breakdown).
- Forming normal equations A^{T}A x = A^{T}b when a QR solve is available: This squares the condition number and can dramatically increase error. Use QR directly for least squares.
- Ignoring scaling/conditioning: Poorly scaled matrices amplify rounding errors. Consider scaling rows/columns or using algorithms that reveal conditioning (e.g., rank-revealing QR).
- Treating determinants as a stability check: A tiny determinant does not alone diagnose singularity; look at condition number or pivot sizes. Also, avoid computing det(A) via cofactors—use LU.
- Recomputing factorizations unnecessarily: If A is fixed and multiple right-hand sides are solved, reuse the factorization to save time.
Key Formulas
LU with Partial Pivoting
PA = LU
Explanation: Any nonsingular square matrix A can be factored into a permutation P, unit lower-triangular L, and upper-triangular U with PA = LU. This enables fast triangular solves and stable Gaussian elimination.
QR Factorization
A = QR, with Q^{T}Q = I and R upper-triangular
Explanation: A is written as an orthogonal matrix Q times an upper-triangular matrix R. Orthogonality preserves lengths and makes least-squares solves stable.
Cholesky Factorization
A = LL^{T}
Explanation: For symmetric positive definite A, there exists a unique lower-triangular L with positive diagonal such that A = LL^{T}. This halves the work compared to LU for dense matrices.
Determinant from LU
det(A) = sign(P) * u_{11} * u_{22} * ... * u_{nn}
Explanation: The determinant equals the product of diagonal entries of U times the sign of the permutation. It can be computed in O(n) after LU is available.
Least Squares via QR
R_{1} x = (Q^{T}b)_{1..n}
Explanation: For full-column-rank A, the minimizer is obtained by computing Q^{T}b and solving the top n×n triangular system R_{1} x = (Q^{T}b)_{1..n}. This avoids forming A^{T}A.
2-Norm Condition Number
κ_{2}(A) = ||A||_{2} ||A^{-1}||_{2} = σ_max(A) / σ_min(A)
Explanation: This measures sensitivity of solutions to perturbations in A or b. A large condition number means small errors in data can cause large errors in the solution.
Norm Preservation by Orthogonal Q
||Qx||_{2} = ||x||_{2}
Explanation: Orthogonal transformations do not change Euclidean lengths, which helps maintain numerical stability in QR algorithms.
Dense Flop Counts
Cholesky ≈ n^{3}/3, LU ≈ 2n^{3}/3, Householder QR ≈ 2mn^{2} - 2n^{3}/3 (≈ 4n^{3}/3 when m = n)
Explanation: For large dense n×n matrices, cubic time dominates. Cholesky is fastest, LU is intermediate, and QR via Householder is costliest among the three.
Normal Equations
A^{T}A x = A^{T}b
Explanation: This characterizes least-squares solutions but squares the condition number and is less stable. Prefer QR-based methods over forming A^{T}A explicitly.
Residual Norm
||r||_{2} = ||b - Ax||_{2}
Explanation: The residual measures how closely x satisfies Ax = b (or fits data). A small residual indicates a good fit, though the solution may still be sensitive if A is ill-conditioned.
Growth Factor in Gaussian Elimination
ρ = max_{i,j,k} |a_{ij}^{(k)}| / max_{i,j} |a_{ij}|
Explanation: This quantifies element growth during elimination (a_{ij}^{(k)} are the entries after step k). Large growth can cause numerical issues; partial pivoting controls growth in most practical cases.
Complexity Analysis
- LU with partial pivoting: about 2n^{3}/3 flops for an n×n matrix; the factors can overwrite A, so O(n^{2}) space.
- Householder QR: about 2mn^{2} - 2n^{3}/3 flops for m×n; storing Q explicitly costs O(m^{2}) space, while keeping it in factored (reflector) form costs O(mn).
- Cholesky: about n^{3}/3 flops; only the lower triangle is needed, roughly halving storage.
- Once a factorization is cached, each additional right-hand side costs only O(n^{2}) (two triangular solves).
Code Examples
```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
#include <stdexcept>
#include <limits>
#include <algorithm>

using Matrix = std::vector<std::vector<double>>;

struct LUResult {
    Matrix LU;          // Combined L (unit lower) and U (upper)
    std::vector<int> P; // Row permutation: (Pb)[i] = b[P[i]]
    int sign;           // Sign of the permutation (+1 or -1)
};

// Perform in-place LU with partial pivoting on a copy of A
LUResult lu_decompose(const Matrix& A) {
    int n = (int)A.size();
    if (n == 0 || (int)A[0].size() != n) throw std::invalid_argument("A must be square");
    Matrix LU = A; // will hold both L and U
    std::vector<int> P(n);
    for (int i = 0; i < n; ++i) P[i] = i;
    int sign = 1;
    const double eps = 1e-12;

    for (int k = 0; k < n; ++k) {
        // Pivot: find row with max |LU[i][k]| for i >= k
        int piv = k;
        double maxabs = std::abs(LU[k][k]);
        for (int i = k + 1; i < n; ++i) {
            double val = std::abs(LU[i][k]);
            if (val > maxabs) { maxabs = val; piv = i; }
        }
        if (maxabs < eps) throw std::runtime_error("Matrix is singular to working precision");
        if (piv != k) {
            std::swap(LU[piv], LU[k]);
            std::swap(P[piv], P[k]);
            sign = -sign;
        }
        // Elimination
        for (int i = k + 1; i < n; ++i) {
            LU[i][k] /= LU[k][k]; // multiplier stored in L part
            double lik = LU[i][k];
            for (int j = k + 1; j < n; ++j) {
                LU[i][j] -= lik * LU[k][j];
            }
        }
    }
    return {LU, P, sign};
}

// Solve Ly = Pb (L is unit-lower inside LU)
std::vector<double> forward_subst(const Matrix& LU, const std::vector<int>& P, const std::vector<double>& b) {
    int n = (int)LU.size();
    std::vector<double> y(n);
    for (int i = 0; i < n; ++i) {
        double sum = b[P[i]]; // apply permutation P to b
        for (int j = 0; j < i; ++j) sum -= LU[i][j] * y[j];
        y[i] = sum; // L has unit diagonal
    }
    return y;
}

// Solve Ux = y (U is upper-triangular part of LU)
std::vector<double> back_subst(const Matrix& LU, const std::vector<double>& y) {
    int n = (int)LU.size();
    std::vector<double> x(n);
    const double eps = 1e-14;
    for (int i = n - 1; i >= 0; --i) {
        double sum = y[i];
        for (int j = i + 1; j < n; ++j) sum -= LU[i][j] * x[j];
        double uii = LU[i][i];
        if (std::abs(uii) < eps) throw std::runtime_error("Singular U on back substitution");
        x[i] = sum / uii;
    }
    return x;
}

// Convenience: solve Ax = b via PA = LU
std::vector<double> lu_solve(const Matrix& A, const std::vector<double>& b) {
    auto lu = lu_decompose(A);
    auto y = forward_subst(lu.LU, lu.P, b);
    auto x = back_subst(lu.LU, y);
    return x;
}

// Extract determinant quickly from LU
double lu_determinant(const LUResult& lu) {
    double det = (double)lu.sign;
    for (int i = 0; i < (int)lu.LU.size(); ++i) det *= lu.LU[i][i];
    return det;
}

int main() {
    // Example: Solve Ax = b and compute det(A)
    Matrix A = {
        { 2.0,  1.0, 1.0},
        { 4.0, -6.0, 0.0},
        {-2.0,  7.0, 2.0}
    };
    std::vector<double> b = {5.0, -2.0, 9.0};

    try {
        auto lu = lu_decompose(A);
        auto y = forward_subst(lu.LU, lu.P, b);
        auto x = back_subst(lu.LU, y);

        std::cout << std::fixed << std::setprecision(6);
        std::cout << "Solution x:";
        for (double xi : x) std::cout << " " << xi;
        std::cout << "\n";

        double detA = lu_determinant(lu);
        std::cout << "det(A) = " << detA << "\n";
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << "\n";
    }
    return 0;
}
```
This program computes an LU factorization with partial pivoting (PA = LU) using an in-place scheme that stores L’s multipliers below the diagonal and U on and above the diagonal. It then solves Ax = b by applying the permutation to b, performing forward substitution with unit-lower L, and back substitution with U. The determinant is the product of U’s diagonal times the permutation sign.
```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
#include <stdexcept>
#include <algorithm>

using Matrix = std::vector<std::vector<double>>;

// Build an m-length vector w that embeds v starting at index k
std::vector<double> embed(const std::vector<double>& v, int m, int k) {
    std::vector<double> w(m, 0.0);
    for (int i = 0; i < (int)v.size(); ++i) w[k + i] = v[i];
    return w;
}

// Apply Householder reflector H = I - 2 w w^T on the left: A <- H A
void apply_left_reflector(Matrix& A, const std::vector<double>& w) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    // For each column j, col_j <- col_j - 2 * w * (w^T col_j)
    for (int j = 0; j < n; ++j) {
        double dot = 0.0;
        for (int i = 0; i < m; ++i) dot += w[i] * A[i][j];
        for (int i = 0; i < m; ++i) A[i][j] -= 2.0 * w[i] * dot;
    }
}

// Apply Householder reflector on the right: A <- A H
void apply_right_reflector(Matrix& A, const std::vector<double>& w) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    // For each row i, row_i <- row_i - 2 * (row_i . w) * w^T
    for (int i = 0; i < m; ++i) {
        double dot = 0.0;
        for (int j = 0; j < n; ++j) dot += A[i][j] * w[j];
        for (int j = 0; j < n; ++j) A[i][j] -= 2.0 * dot * w[j];
    }
}

// Compute QR using Householder reflections. Returns Q and R explicitly.
void householder_qr(const Matrix& Ain, Matrix& Q, Matrix& R) {
    int m = (int)Ain.size();
    int n = (int)Ain[0].size();
    R = Ain; // will become upper-triangular
    Q.assign(m, std::vector<double>(m, 0.0));
    for (int i = 0; i < m; ++i) Q[i][i] = 1.0; // start with identity

    int kmax = std::min(m, n);
    for (int k = 0; k < kmax; ++k) {
        // Form Householder vector v to zero out R[k+1..m-1][k]
        double normx = 0.0;
        for (int i = k; i < m; ++i) normx = std::hypot(normx, R[i][k]);
        if (normx == 0.0) continue; // nothing to do
        double alpha = (R[k][k] >= 0.0) ? -normx : normx; // avoid cancellation

        std::vector<double> v(m - k);
        v[0] = R[k][k] - alpha;
        for (int i = k + 1; i < m; ++i) v[i - k] = R[i][k];
        // Normalize v
        double vnorm = 0.0;
        for (double vi : v) vnorm = std::hypot(vnorm, vi);
        for (double& vi : v) vi /= vnorm;

        // Apply H_k on the left of R, and accumulate Q <- Q * H_k on the right,
        // so that Q = H_0 H_1 ... H_{kmax-1} and A = Q R with Q orthogonal.
        std::vector<double> w = embed(v, m, k);
        apply_left_reflector(R, w);
        apply_right_reflector(Q, w);

        // Ensure exact zeros below diagonal in column k (for cleanliness)
        for (int i = k + 1; i < m; ++i) R[i][k] = 0.0;
    }
}

// Solve least-squares min_x ||Ax - b||_2 using QR: x from R1 x = Q^T b (top n entries)
std::vector<double> qr_least_squares(const Matrix& A, const std::vector<double>& b) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    if ((int)b.size() != m) throw std::invalid_argument("Size mismatch");
    Matrix Q, R;
    householder_qr(A, Q, R);
    // Compute y = Q^T b
    std::vector<double> y(m, 0.0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < m; ++j) y[i] += Q[j][i] * b[j]; // (Q^T b)_i
    }
    // Back-solve R1 x = y[0..n-1]
    std::vector<double> x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        double sum = y[i];
        for (int j = i + 1; j < n; ++j) sum -= R[i][j] * x[j];
        if (std::abs(R[i][i]) < 1e-14) throw std::runtime_error("Rank deficiency detected");
        x[i] = sum / R[i][i];
    }
    return x;
}

int main() {
    // Overdetermined example: fit y = ax + b to 3 points via least squares
    // A = [x 1], b = y
    Matrix A = {
        {0.0, 1.0},
        {1.0, 1.0},
        {2.0, 1.0}
    }; // m=3, n=2
    std::vector<double> b = {1.0, 2.0, 2.0};

    try {
        std::vector<double> x = qr_least_squares(A, b);
        std::cout << std::fixed << std::setprecision(6);
        std::cout << "Least-squares solution [a, b]: ";
        for (double xi : x) std::cout << xi << ' ';
        std::cout << "\n";
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << "\n";
    }
    return 0;
}
```
This program computes a QR factorization using stable Householder reflections. It explicitly forms both Q and R, then solves an overdetermined least-squares problem by computing y = Q^T b and back-solving the leading triangular system. Householder reflections preserve norms and maintain numerical orthogonality better than classical Gram–Schmidt.
```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
#include <stdexcept>

using Matrix = std::vector<std::vector<double>>;

Matrix cholesky_decompose(const Matrix& A) {
    int n = (int)A.size();
    if (n == 0 || (int)A[0].size() != n) throw std::invalid_argument("A must be square");
    Matrix L(n, std::vector<double>(n, 0.0));
    const double eps = 1e-12;

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = A[i][j];
            for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= eps) throw std::runtime_error("Matrix is not SPD (non-positive pivot)");
                L[i][j] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return L;
}

std::vector<double> forward_subst_L(const Matrix& L, const std::vector<double>& b) {
    int n = (int)L.size();
    std::vector<double> y(n, 0.0);
    for (int i = 0; i < n; ++i) {
        double sum = b[i];
        for (int j = 0; j < i; ++j) sum -= L[i][j] * y[j];
        y[i] = sum / L[i][i];
    }
    return y;
}

std::vector<double> back_subst_LT(const Matrix& L, const std::vector<double>& y) {
    int n = (int)L.size();
    std::vector<double> x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        double sum = y[i];
        for (int j = i + 1; j < n; ++j) sum -= L[j][i] * x[j]; // use L^T
        x[i] = sum / L[i][i];
    }
    return x;
}

int main() {
    // SPD example matrix (symmetric and positive definite)
    Matrix A = {
        {25.0, 15.0, -5.0},
        {15.0, 18.0,  0.0},
        {-5.0,  0.0, 11.0}
    };
    std::vector<double> b = {35.0, 33.0, 6.0};

    try {
        Matrix L = cholesky_decompose(A);
        auto y = forward_subst_L(L, b);
        auto x = back_subst_LT(L, y);

        std::cout << std::fixed << std::setprecision(6);
        std::cout << "Solution x:";
        for (double xi : x) std::cout << ' ' << xi;
        std::cout << "\n";

        // Optional: log(det(A)) = 2 * sum(log(L_ii))
        double logdetA = 0.0;
        for (int i = 0; i < (int)L.size(); ++i) logdetA += 2.0 * std::log(L[i][i]);
        std::cout << "log(det(A)) = " << logdetA << "\n";
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << "\n";
    }
    return 0;
}
```
This program computes the Cholesky factor L of an SPD matrix A (A = L L^T) and solves Ax = b using two triangular solves. It also demonstrates how to compute log(det(A)) efficiently from the diagonal of L, which is useful for probabilistic models and numerical stability.