Sparse Matrices & Computation
Key Points
- A sparse matrix stores only its nonzero entries, saving huge amounts of memory when most entries are zero.
- The three core formats are COO (triplets), CSR (row-compressed), and CSC (column-compressed), each optimized for different operations.
- CSR is excellent for fast sparse matrix–vector multiplication (SpMV), while CSC is great for column-centric operations or column-oriented factorizations.
- Converting between COO, CSR, and CSC mostly involves counting nonzeros per row/column and computing prefix sums.
- Careful handling of indices, duplicates, and sorting within rows/columns is crucial for correctness and performance.
- SpMV with CSR runs in O(nnz) time and uses O(nnz + m) space, where m is the number of rows.
- Sparse iterative solvers like Conjugate Gradient require only SpMV, dot products, and axpy operations, making CSR ideal for large SPD systems.
- Common pitfalls include mixing 0-based/1-based indices, leaving explicit zeros, unsorted columns within rows, and integer overflow in index arrays.
Prerequisites
- Arrays and memory layouts — Understanding contiguous storage and indexing is essential to grasp CSR/CSC pointer and index arrays.
- Big-O notation — To analyze time and space complexity of sparse operations like SpMV and conversions.
- Basic C++ vectors and iterators — Implementing sparse formats relies on std::vector, iteration, and standard algorithms.
- Linear algebra fundamentals — Concepts like matrix–vector products, transpose, and norms are prerequisites for sparse computations.
- Prefix sums (scan) — Constructing CSR/CSC from counts requires computing cumulative sums.
- Sorting and stability — Merging duplicates and canonicalizing rows/columns depend on sorting subarrays.
- Floating-point arithmetic — Sparse computations involve summation order, cancellation, and tolerance handling in iterative solvers.
Detailed Explanation
01 Overview
A sparse matrix is a matrix in which most entries are zero. Instead of storing all m×n entries, sparse formats store only the nonzero values and their positions, dramatically reducing memory and often accelerating computations. Three widely used storage schemes are COO (Coordinate), CSR (Compressed Sparse Row), and CSC (Compressed Sparse Column). COO stores each nonzero as a triplet (row, column, value) and is simple to construct but inefficient for repeated arithmetic. CSR groups nonzeros by row with three arrays: row_ptr (size m+1), col_idx (size nnz), and val (size nnz). CSC is the column-analog: col_ptr (size n+1), row_idx, and val. With these compressed formats, we can perform key operations like matrix–vector products, transposes, and iterative solves much faster and using far less memory than dense matrices.
02 Intuition & Analogies
Imagine a huge seating chart for a stadium with a billion seats (cells). If only a few seats are reserved (nonzero entries), copying the entire chart is wasteful. Instead, you’d keep a short list of reservations: seat row, seat column, and the name (value). That list is the COO format. Now suppose you frequently need to find all reservations in a given row (e.g., print all people in section A). You’d store the list grouped by row, with a quick index to jump to each row’s block—this is CSR. Similarly, if most of your queries ask for reservations by column (e.g., aisle numbers), you’d group by column—this is CSC. The pointer arrays (row_ptr or col_ptr) are like a table-of-contents telling you where each row or column’s list starts and ends. When multiplying a sparse matrix by a vector, you only touch the reserved seats (nonzeros), skipping the empties, so the work scales with the number of reservations rather than the stadium size. This is the key advantage: avoid processing zeros you don’t care about. But to make this efficient, your reservations in each row/column should be sorted by seat number (column/row), duplicates should be merged, and indices must be consistent (all 0-based or all 1-based).
03 Formal Definition
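A minimal formal statement, consistent with the arrays defined in the Overview:

```latex
Let $A \in \mathbb{R}^{m \times n}$ have $\mathrm{nnz}$ nonzero entries. The COO
representation is the multiset of triplets
$$\{(i_k,\, j_k,\, v_k)\}_{k=1}^{\mathrm{nnz}}, \qquad a_{i_k j_k} = v_k.$$
CSR stores arrays $\mathrm{val}$ and $\mathrm{col\_idx}$ of length $\mathrm{nnz}$
and $\mathrm{row\_ptr}$ of length $m+1$ such that for each row $i$,
$$a_{i,\;\mathrm{col\_idx}[p]} = \mathrm{val}[p]
  \quad \text{for } \mathrm{row\_ptr}[i] \le p < \mathrm{row\_ptr}[i+1],$$
and all entries of $A$ not listed this way are zero. CSC is defined symmetrically
with $\mathrm{col\_ptr}$ of length $n+1$ and $\mathrm{row\_idx}$ of length $\mathrm{nnz}$.
```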
04 When to Use
Use sparse formats when your matrix has a small fraction of nonzeros (e.g., sparsity s = nnz/(mn) ≤ 0.1 or far less) and you repeatedly perform linear algebra like matrix–vector products, solves, or graph traversals. CSR is the go-to for iterative methods (Conjugate Gradient, GMRES) where fast SpMV dominates runtime. CSC is preferred when algorithms are column-oriented (e.g., some factorizations, column sampling, or when interacting with libraries expecting CSC). COO is ideal as a construction format: streaming in (i,j,v) data from files, assembling stiffness matrices in finite element methods, or graph edges, then converting to CSR/CSC for computation. In graph algorithms, CSR encodes the adjacency list compactly and supports fast neighbor iteration; in recommendation systems and NLP, sparse feature matrices benefit from CSR/CSC storage. If your matrix is moderately dense or you need frequent random access to arbitrary A[i,j], dense storage may be simpler and faster due to cache locality. Also consider the cost of conversions: if you only do one small operation, conversion overhead might dominate; if you do many SpMVs or iterations, converting once pays off.
⚠️ Common Mistakes
- Mixing 0-based and 1-based indexing leads to out-of-bounds errors or incorrect results. Pick one (C++ typically 0-based) and be consistent throughout.
- Leaving explicit zeros in COO or CSR wastes space and time; filter them out during assembly and merge duplicates (same (i,j)) by summing values.
- Not sorting columns within each CSR row (or rows within each CSC column) can degrade performance and break algorithms that assume sorted structure (e.g., binary search, merge-based ops).
- Forgetting that row_ptr has length m+1 (and col_ptr length n+1) causes off-by-one errors when iterating segments.
- Integer overflow in indices for very large matrices (nnz > 2^31 − 1). Use 64-bit indices (std::int64_t or size_t) when needed.
- Assuming SpMV or conversions are thread-safe without care; writing into shared arrays during parallel fill requires per-thread buffers or atomic operations.
- Using CSR for operations that inherently need fast column access (or CSC for row access) without conversion can lead to O(mn) scans; convert formats or transpose instead.
- Neglecting numerical issues: in iterative solvers, poor scaling or ill-conditioned matrices slow convergence; apply diagonal scaling or preconditioning.
Key Formulas
Sparsity
$$s = \frac{\mathrm{nnz}}{mn}$$
Explanation: This fraction measures how full the matrix is. Small s means the matrix is very sparse, which usually benefits from sparse formats.
CSR SpMV
$$y_i = \sum_{p=\mathrm{row\_ptr}[i]}^{\mathrm{row\_ptr}[i+1]-1} \mathrm{val}[p]\; x_{\mathrm{col\_idx}[p]}$$
Explanation: For each row i, iterate p from row_ptr[i] to row_ptr[i+1]-1 and accumulate products into y[i]. This touches each nonzero exactly once.
SpMV Time Complexity
$$O(\mathrm{nnz})$$
Explanation: The time scales linearly with the number of nonzeros. Each nonzero contributes one multiply and one add (plus memory access).
CSR Storage Count
$$(m+1) + \mathrm{nnz} + \mathrm{nnz} = m + 1 + 2\,\mathrm{nnz} \ \text{entries}$$
Explanation: CSR uses m+1 entries for row_ptr, nnz for col_idx, and nnz for values. In bytes, multiply by the size of each type.
Prefix Sum for CSR
$$\mathrm{row\_ptr}[i+1] = \mathrm{row\_ptr}[i] + \mathrm{count}[i]$$
Explanation: The next row’s start equals the current start plus the count of nonzeros in this row. This is how we build row_ptr from counts.
Frobenius Norm
$$\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$$
Explanation: A common way to measure matrix magnitude that’s easy to compute from sparse storage by summing squares of stored values, since zeros contribute nothing.
SpMV Operation Count
$$\mathrm{flops} \approx 2\,\mathrm{nnz} - m$$
Explanation: There is one multiplication per nonzero and one addition per nonzero minus one per row, since the first product in each row initializes the accumulator. This highlights SpMV’s low arithmetic intensity.
Condition Number (SPD)
$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$
Explanation: For SPD matrices, CG convergence depends on the condition number. A larger kappa implies slower convergence.
CG Convergence Bound
$$\|e_k\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|e_0\|_A$$
Explanation: The A-norm of the error after k iterations decays geometrically with a rate determined by the condition number. Preconditioning reduces kappa.
COO to CSR Complexity
$$O(m + \mathrm{nnz})$$
Explanation: Counting per-row nonzeros and computing a prefix sum is linear in rows plus nonzeros, followed by a single pass to place entries.
Complexity Analysis
- SpMV with CSR: O(nnz) time; the matrix occupies O(nnz + m) storage.
- COO → CSR assembly: O(m + nnz) for counting and prefix sums; sorting within rows for canonical form adds at most an extra log factor in the row lengths.
- CSR ↔ CSC conversion: O(m + n + nnz) via counting by column (or row) plus a prefix sum.
- Conjugate Gradient: O(nnz) per iteration, dominated by one SpMV plus a few O(n) vector operations.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

struct CSR {
    int nrows = 0, ncols = 0;
    vector<int> row_ptr;   // size = nrows + 1
    vector<int> col_idx;   // size = nnz
    vector<double> val;    // size = nnz

    // Multiply y = A x
    vector<double> matvec(const vector<double>& x) const {
        if ((int)x.size() != ncols) throw runtime_error("Dimension mismatch in matvec");
        vector<double> y(nrows, 0.0);
        for (int i = 0; i < nrows; ++i) {
            double sum = 0.0;
            for (int p = row_ptr[i]; p < row_ptr[i+1]; ++p) {
                sum += val[p] * x[col_idx[p]];
            }
            y[i] = sum;
        }
        return y;
    }

    // Construct CSR from COO triplets (0-based). Duplicates are summed; rows are sorted by column.
    static CSR fromCOO(int m, int n, const vector<int>& I, const vector<int>& J, const vector<double>& V) {
        if (I.size() != J.size() || I.size() != V.size()) throw runtime_error("COO arrays must be same length");
        int nnz = (int)I.size();
        CSR A; A.nrows = m; A.ncols = n; A.row_ptr.assign(m+1, 0);
        // 1) Count nonzeros per row
        for (int k = 0; k < nnz; ++k) {
            int r = I[k];
            if (r < 0 || r >= m) throw runtime_error("Row index out of range");
            int c = J[k];
            if (c < 0 || c >= n) throw runtime_error("Column index out of range");
            ++A.row_ptr[r];
        }
        // 2) Prefix sum to get row_ptr
        int cumsum = 0;
        for (int i = 0; i < m; ++i) {
            int cnt = A.row_ptr[i];
            A.row_ptr[i] = cumsum;
            cumsum += cnt;
        }
        A.row_ptr[m] = cumsum; // total nnz (may include duplicates)
        A.col_idx.assign(nnz, 0);
        A.val.assign(nnz, 0.0);
        // 3) Temp copy of row_ptr to track current insertion positions
        vector<int> next(A.row_ptr.begin(), A.row_ptr.end());
        // 4) Scatter COO entries into CSR buckets (unsorted within each row)
        for (int k = 0; k < nnz; ++k) {
            int r = I[k];
            int pos = next[r]++;
            A.col_idx[pos] = J[k];
            A.val[pos] = V[k];
        }
        // 5) For each row: sort by column and sum duplicates; compact in-place
        vector<int> new_row_ptr(m+1, 0);
        vector<int> new_col;
        vector<double> new_val;
        new_row_ptr[0] = 0;
        for (int i = 0; i < m; ++i) {
            int start = A.row_ptr[i], end = A.row_ptr[i+1];
            int len = end - start;
            vector<pair<int,double>> row; row.reserve(len);
            for (int p = start; p < end; ++p) row.emplace_back(A.col_idx[p], A.val[p]);
            sort(row.begin(), row.end(), [](auto& a, auto& b){ return a.first < b.first; });
            // merge duplicates
            for (int t = 0; t < (int)row.size(); ) {
                int cj = row[t].first;
                double acc = 0.0;
                int u = t;
                while (u < (int)row.size() && row[u].first == cj) { acc += row[u].second; ++u; }
                if (acc != 0.0) { // drop explicit zeros
                    new_col.push_back(cj);
                    new_val.push_back(acc);
                }
                t = u;
            }
            new_row_ptr[i+1] = (int)new_col.size();
        }
        A.row_ptr.swap(new_row_ptr);
        A.col_idx.swap(new_col);
        A.val.swap(new_val);
        return A;
    }
};

int main() {
    // Example COO for a 4x5 matrix
    int m = 4, n = 5;
    // Triplets (I, J, V), 0-based
    vector<int> I = {0,0,1,2,2,2,3,3};
    vector<int> J = {0,3,1,0,2,4,1,4};
    vector<double> V = {10.0, 2.0, 3.0, 4.0, 5.0, -1.0, 7.0, 1.5};

    CSR A = CSR::fromCOO(m, n, I, J, V);

    // Vector x of length n
    vector<double> x = {1, 2, 3, 4, 5};
    vector<double> y = A.matvec(x);

    cout << fixed << setprecision(2);
    cout << "y = A x = [";
    for (int i = 0; i < (int)y.size(); ++i) {
        if (i) cout << ", ";
        cout << y[i];
    }
    cout << "]\n";

    // Print CSR internals for verification
    cout << "row_ptr: ";
    for (int v : A.row_ptr) cout << v << ' ';
    cout << "\n";
    cout << "col_idx: ";
    for (int v : A.col_idx) cout << v << ' ';
    cout << "\n";
    cout << "val: ";
    for (double v : A.val) cout << v << ' ';
    cout << "\n";
    return 0;
}
```
We assemble a sparse matrix from COO triplets into CSR by counting per-row nonzeros, prefix-summing to get row_ptr, scattering triplets, then sorting and merging duplicates within each row. We then perform SpMV by iterating each row’s contiguous block and accumulating val[p] * x[col_idx[p]]. This program demonstrates the core data structures and a key kernel used in iterative methods.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct CSR { int m=0, n=0; vector<int> row_ptr, col_idx; vector<double> val; };
struct CSC { int m=0, n=0; vector<int> col_ptr, row_idx; vector<double> val; };

// CSR -> CSC using counting by column (stable). The CSC arrays of A are the CSR arrays of A^T.
CSC csr_to_csc(const CSR& A) {
    CSC B; B.m = A.m; B.n = A.n;
    int m = A.m, n = A.n;
    int nnz = (int)A.col_idx.size();
    B.col_ptr.assign(n+1, 0);
    // Count entries per column
    for (int p = 0; p < nnz; ++p) {
        int j = A.col_idx[p];
        if (j < 0 || j >= n) throw runtime_error("Bad column index");
        ++B.col_ptr[j];
    }
    // Prefix sum to starting offsets
    int cumsum = 0;
    for (int j = 0; j < n; ++j) {
        int cnt = B.col_ptr[j];
        B.col_ptr[j] = cumsum;
        cumsum += cnt;
    }
    B.col_ptr[n] = cumsum;
    B.row_idx.assign(nnz, 0);
    B.val.assign(nnz, 0.0);
    // Temp positions per column
    vector<int> next(B.col_ptr.begin(), B.col_ptr.end());
    // Fill CSC arrays by scanning CSR rows
    for (int i = 0; i < m; ++i) {
        for (int p = A.row_ptr[i]; p < A.row_ptr[i+1]; ++p) {
            int j = A.col_idx[p];
            int q = next[j]++;
            B.row_idx[q] = i;
            B.val[q] = A.val[p];
        }
    }
    return B;
}

// CSC -> CSR by the same logic (count by row)
CSR csc_to_csr(const CSC& B) {
    CSR A; A.m = B.m; A.n = B.n;
    int m = B.m, n = B.n;
    int nnz = (int)B.row_idx.size();
    A.row_ptr.assign(m+1, 0);
    for (int q = 0; q < nnz; ++q) {
        int i = B.row_idx[q];
        if (i < 0 || i >= m) throw runtime_error("Bad row index");
        ++A.row_ptr[i];
    }
    int cumsum = 0;
    for (int i = 0; i < m; ++i) {
        int cnt = A.row_ptr[i];
        A.row_ptr[i] = cumsum;
        cumsum += cnt;
    }
    A.row_ptr[m] = cumsum;
    A.col_idx.assign(nnz, 0);
    A.val.assign(nnz, 0.0);
    vector<int> next(A.row_ptr.begin(), A.row_ptr.end());
    for (int j = 0; j < n; ++j) {
        for (int q = B.col_ptr[j]; q < B.col_ptr[j+1]; ++q) {
            int i = B.row_idx[q];
            int p = next[i]++;
            A.col_idx[p] = j;
            A.val[p] = B.val[q];
        }
    }
    return A;
}

int main() {
    // A simple 3x4 CSR matrix
    CSR A; A.m = 3; A.n = 4;
    A.row_ptr = {0, 2, 3, 5};
    A.col_idx = {0, 2, 1, 1, 3};
    A.val = {5.0, 1.0, 2.0, 4.0, -3.0};

    CSC B = csr_to_csc(A);   // convert to CSC
    CSR A2 = csc_to_csr(B);  // back to CSR

    // Verify equality of A and A2 (same pattern and values; order within rows may change)
    auto canonicalize = [](CSR C){
        for (int i = 0; i < C.m; ++i) {
            int s = C.row_ptr[i], e = C.row_ptr[i+1];
            vector<pair<int,double>> row;
            for (int p = s; p < e; ++p) row.emplace_back(C.col_idx[p], C.val[p]);
            sort(row.begin(), row.end());
            for (int p = s, t = 0; p < e; ++p, ++t) {
                C.col_idx[p] = row[t].first;
                C.val[p] = row[t].second;
            }
        }
        return C;
    };

    CSR A_can = canonicalize(A), A2_can = canonicalize(A2);
    bool same = (A_can.m == A2_can.m) && (A_can.n == A2_can.n)
             && (A_can.row_ptr == A2_can.row_ptr)
             && (A_can.col_idx == A2_can.col_idx)
             && (A_can.val == A2_can.val);

    cout << (same ? "Conversion CSR->CSC->CSR preserved structure." : "Mismatch after round-trip conversion!") << "\n";

    // Print CSC pointers for inspection
    cout << "CSC col_ptr: "; for (int v : B.col_ptr) cout << v << ' '; cout << "\n";
    cout << "CSC row_idx: "; for (int v : B.row_idx) cout << v << ' '; cout << "\n";
    cout << "CSC val: "; for (double v : B.val) cout << v << ' '; cout << "\n";
    return 0;
}
```
This example converts CSR to CSC using a counting sort by column and then back from CSC to CSR by counting rows. Both conversions run in O(m + n + nnz) time and are the standard way to transpose or switch access orientation. We verify the round trip preserves the matrix (up to within-row order).
```cpp
#include <bits/stdc++.h>
using namespace std;

struct CSR {
    int nrows = 0, ncols = 0;
    vector<int> row_ptr, col_idx;
    vector<double> val;
    vector<double> matvec(const vector<double>& x) const {
        if ((int)x.size() != ncols) throw runtime_error("Dimension mismatch");
        vector<double> y(nrows, 0.0);
        for (int i = 0; i < nrows; ++i) {
            double s = 0.0;
            for (int p = row_ptr[i]; p < row_ptr[i+1]; ++p) s += val[p] * x[col_idx[p]];
            y[i] = s;
        }
        return y;
    }
};

// Conjugate Gradient for A x = b, A is SPD and stored in CSR. Returns x.
vector<double> cg_solve(const CSR& A, const vector<double>& b, int max_iter=1000, double tol=1e-8) {
    int n = A.nrows;
    if (A.nrows != A.ncols) throw runtime_error("CG requires square matrix");
    if ((int)b.size() != n) throw runtime_error("Dimension mismatch");

    vector<double> x(n, 0.0);  // initial guess x0 = 0
    vector<double> r = b;      // r0 = b - A x0 = b
    vector<double> p = r;      // p0 = r0

    auto dot = [](const vector<double>& u, const vector<double>& v){
        return inner_product(u.begin(), u.end(), v.begin(), 0.0);
    };

    double rsold = dot(r, r);
    if (sqrt(rsold) < tol) return x;

    for (int k = 0; k < max_iter; ++k) {
        vector<double> Ap = A.matvec(p);
        double alpha = rsold / max(1e-300, dot(p, Ap));
        // x_{k+1} = x_k + alpha p_k ; r_{k+1} = r_k - alpha A p_k
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rsnew = dot(r, r);
        double rnorm = sqrt(rsnew);
        if (rnorm < tol) {
            cerr << "CG converged in " << k+1 << " iterations. Residual = " << rnorm << "\n";
            break;
        }
        double beta = rsnew / max(1e-300, rsold);
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rsold = rsnew;
    }
    return x;
}

int main() {
    // Example SPD matrix (tridiagonal) 3x3:
    // [  4 -1  0 ]
    // [ -1  4 -1 ]
    // [  0 -1  3 ]
    CSR A; A.nrows = 3; A.ncols = 3;
    A.row_ptr = {0, 2, 5, 7};
    A.col_idx = {0, 1, 0, 1, 2, 1, 2};
    A.val = {4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 3.0};

    vector<double> b = {1.0, 2.0, 3.0};
    vector<double> x = cg_solve(A, b, 100, 1e-10);

    cout << fixed << setprecision(8);
    cout << "Solution x = [";
    for (int i = 0; i < (int)x.size(); ++i) { if (i) cout << ", "; cout << x[i]; }
    cout << "]\n";

    // Check residual norm ||Ax - b||
    vector<double> r = A.matvec(x);
    for (int i = 0; i < (int)r.size(); ++i) r[i] -= b[i];
    double rn = sqrt(inner_product(r.begin(), r.end(), r.begin(), 0.0));
    cout << "Residual norm = " << rn << "\n";
    return 0;
}
```
Conjugate Gradient is an iterative solver for symmetric positive definite systems that uses only SpMV, dot products, and vector updates. CSR’s fast SpMV makes CG efficient for large sparse problems. We demonstrate solving a small tridiagonal SPD system and report the residual norm to verify accuracy.