Marchenko–Pastur Distribution
Key Points
- The Marchenko–Pastur (MP) distribution describes the limiting eigenvalue distribution of sample covariance matrices S = (1/n) XX^{\top} when both the dimension p and the sample size n grow with p/n \to \gamma > 0.
- Its density has compact support on [a, b] with a = (1 − \sqrt{\gamma})^{2} and b = (1 + \sqrt{\gamma})^{2}, and an additional point mass at zero of weight 1 − 1/\gamma when \gamma > 1.
- The MP density inside [a, b] equals f_{\gamma}(x) = \sqrt{(b − x)(x − a)} / (2\pi\gamma x). Outside this interval the density is zero.
- If data are whitened (entries i.i.d. mean 0, variance 1), the empirical spectral distribution of S converges to MP(\gamma); with variance \sigma^{2} it scales by \sigma^{2}.
- The largest eigenvalue tends to b and the smallest nonzero eigenvalue tends to a (if \gamma \le 1); deviations at the edges are on the Tracy–Widom scale n^{-2/3}.
- MP theory underpins high-dimensional PCA, signal detection, and shrinkage: bulk eigenvalues follow MP, while true low-rank signals appear as outliers (spikes).
- Correct scaling by 1/n and the correct aspect ratio \gamma = p/n are essential; forgetting these leads to wrong comparisons to MP.
- Finite-sample spectra fluctuate around MP; simulations with moderate sizes already show good agreement, especially in the bulk.
Prerequisites
- Eigenvalues and eigenvectors — Understanding spectra of symmetric matrices is essential to interpret the MP law and PCA.
- Random variables and independence — The MP law assumes i.i.d. entries and relies on averaging properties across many samples.
- Law of Large Numbers and concentration — Convergence of empirical distributions and stability of spectra require averaging and concentration ideas.
- Covariance and sample covariance — MP describes the eigenvalues of the sample covariance matrix specifically.
- Stieltjes transform and complex analysis basics — The Stieltjes transform characterizes the MP distribution and enables numerical inversion.
- Numerical eigendecomposition methods — Computing empirical spectra requires practical algorithms like Jacobi or QR for symmetric matrices.
Detailed Explanation
01 Overview
The Marchenko–Pastur (MP) distribution is a cornerstone of random matrix theory for statistics and machine learning. It predicts the asymptotic behavior of eigenvalues of sample covariance matrices built from high-dimensional data. Suppose you collect n samples of p features and form the p × p sample covariance matrix S = (1/n) XX^{\top}, where X is the p × n centered data matrix. When both p and n grow large such that their ratio p/n approaches a constant \gamma > 0, the histogram of eigenvalues of S converges to a deterministic curve: the Marchenko–Pastur law with parameter \gamma. This curve has a simple closed-form density on a compact interval [a, b], where a = (1 − \sqrt{\gamma})^{2} and b = (1 + \sqrt{\gamma})^{2}, capturing the entire “bulk” of the spectrum. If p exceeds n (\gamma > 1), there is an extra point mass at zero because S becomes rank-deficient. This phenomenon appears widely: in PCA of high-dimensional, noise-only data, the cloud of eigenvalues forms the MP bulk. If the data actually contain low-rank signals (“spikes”), those signals show up as eigenvalues outside the MP support. Thus, MP provides both a baseline for what noise looks like and a diagnostic for detecting structure. Its tools (density, Stieltjes transform, edge behavior) guide thresholding, shrinkage, and performance predictions in modern data analysis.
02 Intuition & Analogies
Imagine recording the height of waves on a choppy sea using many sensors. Each sensor measures random noise of the same variance. When you compute the sample covariance across sensors, you’re mixing lots of small random contributions. In low dimensions (few sensors), the covariance eigenvalues bounce around without a neat pattern. But as the number of sensors and the number of recordings both grow large together, the randomness averages out into a stable shape—like how a large jar of mixed candies settles into a predictable color ratio. The aspect ratio \gamma = p/n is the recipe controlling this shape. If there are far more sensors than time points (\gamma > 1), many directions in sensor-space do not have enough data to be determined, so you get many zero eigenvalues—hence the point mass at zero. If sensors and samples are balanced, the energy spreads smoothly over a compact interval. The edges of this interval, a and b, act like guardrails: under pure noise, eigenvalues rarely go beyond them (up to finite-sample wiggles). From a PCA viewpoint, imagine your data are pure white noise. The MP law tells you exactly what spread of eigenvalues you should expect just from noise. If you see an eigenvalue poking above the top edge b, it’s like spotting a mountain above the clouds—it likely indicates a real underlying factor (a signal) rather than random fluctuation. This is why the MP distribution is so useful: it separates what is typical under randomness from what is exceptional and potentially meaningful.
03 Formal Definition
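The precise statement, consistent with the Overview and the Key Formulas elsewhere on this page, can be written as:

```latex
\textbf{Theorem (Marchenko--Pastur).}
Let $X$ be a $p \times n$ matrix with i.i.d.\ entries of mean $0$ and variance $1$,
and set $S = \tfrac{1}{n} X X^{\top}$. If $p, n \to \infty$ with
$p/n \to \gamma \in (0, \infty)$, then the empirical spectral distribution of $S$
converges weakly, almost surely, to the Marchenko--Pastur law $\mathrm{MP}(\gamma)$
with density
\[
  f_{\gamma}(x) = \frac{\sqrt{(b - x)(x - a)}}{2 \pi \gamma x},
  \qquad x \in [a, b],
\]
where $a = (1 - \sqrt{\gamma})^{2}$ and $b = (1 + \sqrt{\gamma})^{2}$,
plus an atom of mass $1 - 1/\gamma$ at $0$ when $\gamma > 1$.
```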
04 When to Use
- High-dimensional PCA benchmarking: Use MP as the null model for eigenvalues under white noise; eigenvalues above b suggest possible low-rank signals.
- Covariance shrinkage and regularization: MP informs how much bulk eigenvalues are inflated by sampling noise, guiding shrinkage targets.
- Model diagnostics in factor analysis: Compare the empirical spectrum to MP to check whether residuals behave like white noise.
- Randomized linear algebra and signal detection: Predict typical singular value scales for random projections and noise-only matrices.
- Wireless communications and information theory: Characterize capacity of MIMO channels via eigenvalue distributions akin to MP.
- Stress testing algorithms: Simulate matrices with MP bulk to evaluate robustness of spectral clustering, graph embeddings, and whitening steps.

Use MP when entries are approximately i.i.d., centered, and homoscedastic, and when both p and n are large with a stable aspect ratio \gamma. If data have strong correlations, heavy tails, or nonstationarities, MP may not apply directly without preprocessing (e.g., whitening, truncation).
⚠️ Common Mistakes
- Wrong scaling: Using S = XX^{\top} instead of S = (1/n) XX^{\top} shifts the entire spectrum by a factor of n. Always divide by n to compare to MP.
- Confusing shapes: MP for S = (1/n) XX^{\top} (p × p) and for (1/n) X^{\top}X (n × n) share nonzero spectra, but their aspect ratio parameters invert; keep track of whether you analyze p/n or n/p and the atom at zero.
- Ignoring the atom at zero: When \gamma > 1 (p > n), a mass 1 - 1/\gamma sits at zero; missing it skews density comparisons.
- Non-i.i.d. or unwhitened data: MP assumes independent, equal-variance entries (after centering). Strong correlations or heteroscedasticity invalidate the MP baseline; whiten or model the structure first.
- Finite-sample overinterpretation: Small deviations from [a, b] can occur for moderate sizes; edge fluctuations scale as n^{-2/3}. Use statistical thresholds (e.g., Tracy–Widom) rather than eyeballing.
- Misreading support scaling: If entries have variance \sigma^{2} \neq 1, the support is [\sigma^{2} a, \sigma^{2} b]; forgetting \sigma^{2} leads to mismatched overlays.
Key Formulas
MP support
a = (1 − \sqrt{\gamma})^{2}, \qquad b = (1 + \sqrt{\gamma})^{2}
Explanation: These are the lower and upper edges of the Marchenko–Pastur density for unit variance. All bulk eigenvalues lie between a and b in the large-dimensional limit.
MP density
f_{\gamma}(x) = \frac{\sqrt{(b − x)(x − a)}}{2\pi\gamma x}, \qquad x \in [a, b]
Explanation: This is the limiting density of eigenvalues for S = (1/n) XX^{\top} with variance-1 entries. It is zero outside [a, b] and integrates to 1 when \gamma \le 1 (otherwise the mass at 0 completes it).
Atom at zero
\mathrm{MP}(\gamma)(\{0\}) = 1 − 1/\gamma \quad \text{when } \gamma > 1
Explanation: When \gamma > 1 (more features than samples), the covariance matrix has a fraction 1 − 1/\gamma of zero eigenvalues. When \gamma \le 1 there is no atom at zero.
Stieltjes transform (closed form)
m(z) = \frac{1 − \gamma − z + \sqrt{(z − a)(z − b)}}{2\gamma z}
Explanation: This analytic formula holds for z in the complex upper half-plane with the square-root branch chosen so that \operatorname{Im} m(z) > 0. It characterizes MP and allows recovery of the density via its imaginary part.
Self-consistent equation
\gamma z \, m(z)^{2} + (z + \gamma − 1) \, m(z) + 1 = 0, \qquad \text{equivalently } m(z) = \frac{1}{1 − \gamma − z − \gamma z m(z)}
Explanation: The Stieltjes transform of MP solves this quadratic relationship. It is often used in proofs and numerical fixed-point schemes for related models.
Inversion of Stieltjes transform
f(x) = \lim_{\eta \to 0^{+}} \frac{1}{\pi} \operatorname{Im} m(x + i\eta)
Explanation: The density can be recovered from the imaginary part of the Stieltjes transform approached from the upper half-plane. This provides a robust way to approximate densities numerically.
Variance scaling
f_{\gamma, \sigma^{2}}(x) = \frac{1}{\sigma^{2}} f_{\gamma}(x / \sigma^{2}), \qquad \text{support } [\sigma^{2} a, \sigma^{2} b]
Explanation: If entries have variance \sigma^{2}, the MP distribution scales by \sigma^{2}. The support becomes [\sigma^{2} a, \sigma^{2} b] and eigenvalues are multiplied by \sigma^{2}.
Edge convergence
\lambda_{\max}(S) \to b \text{ a.s.}, \qquad \lambda_{\min}^{\neq 0}(S) \to a \text{ a.s. when } \gamma \le 1
Explanation: The largest eigenvalue converges almost surely to b, and the smallest nonzero eigenvalue converges to a when \gamma \le 1. This describes the limiting edges of the spectrum.
Mean of MP (unit variance)
\int x \, d\mathrm{MP}_{\gamma}(x) = 1
Explanation: The MP distribution has mean 1 when entries have unit variance. This matches the fact that E[\operatorname{tr}(S)/p] = 1 for whitened data.
ESD convergence
F_{S} \Rightarrow \mathrm{MP}(\gamma) \quad \text{a.s., as } p, n \to \infty \text{ with } p/n \to \gamma
Explanation: The empirical spectral distribution of the sample covariance converges weakly, almost surely, to the MP law as p, n \to \infty with p/n \to \gamma.
Complexity Analysis
- Forming S = (1/n) XX^{\top} from a p × n matrix costs O(p^{2} n) time and O(p^{2}) memory.
- One Jacobi sweep performs p(p − 1)/2 rotations at O(p) work each, i.e. O(p^{3}) per sweep; a modest number of sweeps typically suffices, so the eigenvalue step below is O(p^{3}) up to the sweep count.
- Evaluating the MP density or Stieltjes transform at a single point is O(1), so a grid of N evaluation points costs O(N).
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Index helper for row-major p x p matrices
inline size_t idx(size_t i, size_t j, size_t p) { return i * p + j; }

// Build S = (1/n) * X * X^T from X (p x n), with X stored row-major as vector<double>
vector<double> sample_covariance(const vector<double>& X, size_t p, size_t n) {
    vector<double> S(p * p, 0.0);
    // Accumulate S_ij = (1/n) sum_k X_{i,k} X_{j,k}
    for (size_t k = 0; k < n; ++k) {
        // For column k, access X_{i,k} at X[i*n + k]
        for (size_t i = 0; i < p; ++i) {
            double xi = X[i * n + k];
            for (size_t j = 0; j <= i; ++j) {  // compute lower triangle, mirror later
                S[idx(i, j, p)] += xi * X[j * n + k];
            }
        }
    }
    // Scale by 1/n and symmetrize
    double invn = 1.0 / static_cast<double>(n);
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            S[idx(i, j, p)] *= invn;
            S[idx(j, i, p)] = S[idx(i, j, p)];
        }
    }
    return S;
}

// Symmetric Jacobi eigenvalue algorithm to compute eigenvalues of S (no eigenvectors)
vector<double> jacobi_eigenvalues(vector<double> A, size_t p, int max_sweeps = 100, double tol = 1e-10) {
    // A is p x p symmetric, row-major; it is overwritten in place
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        double max_off = 0.0;  // largest off-diagonal magnitude seen this sweep
        for (size_t i = 0; i + 1 < p; ++i) {
            for (size_t j = i + 1; j < p; ++j) {
                double aii = A[idx(i, i, p)];
                double ajj = A[idx(j, j, p)];
                double aij = A[idx(i, j, p)];
                max_off = max(max_off, fabs(aij));
                if (fabs(aij) < tol) continue;
                // Jacobi rotation parameters (numerically stable formula for tan)
                double tau = (ajj - aii) / (2.0 * aij);
                double t = (tau >= 0.0) ? 1.0 / (tau + sqrt(1.0 + tau * tau))
                                        : 1.0 / (tau - sqrt(1.0 + tau * tau));
                double c = 1.0 / sqrt(1.0 + t * t);
                double s = t * c;
                // Apply rotation to rows and columns i and j
                for (size_t k = 0; k < p; ++k) {
                    if (k != i && k != j) {
                        double aik = A[idx(i, k, p)];
                        double ajk = A[idx(j, k, p)];
                        double tik = c * aik - s * ajk;
                        double tjk = s * aik + c * ajk;
                        A[idx(i, k, p)] = tik;
                        A[idx(k, i, p)] = tik;  // symmetry
                        A[idx(j, k, p)] = tjk;
                        A[idx(k, j, p)] = tjk;  // symmetry
                    }
                }
                // Update the 2x2 diagonal block; the (i,j) entry is annihilated
                double aii_new = c * c * aii - 2.0 * s * c * aij + s * s * ajj;
                double ajj_new = s * s * aii + 2.0 * s * c * aij + c * c * ajj;
                A[idx(i, i, p)] = aii_new;
                A[idx(j, j, p)] = ajj_new;
                A[idx(i, j, p)] = 0.0;
                A[idx(j, i, p)] = 0.0;
            }
        }
        if (max_off < tol) break;  // converged
    }
    vector<double> diag(p);
    for (size_t i = 0; i < p; ++i) diag[i] = A[idx(i, i, p)];
    return diag;
}

// Marchenko–Pastur PDF for unit variance
double mp_pdf(double x, double gamma) {
    double a = (1.0 - sqrt(gamma)) * (1.0 - sqrt(gamma));
    double b = (1.0 + sqrt(gamma)) * (1.0 + sqrt(gamma));
    if (x <= a || x >= b) return 0.0;
    return sqrt((b - x) * (x - a)) / (2.0 * M_PI * gamma * x);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Parameters
    size_t p = 200;  // number of features (rows)
    size_t n = 400;  // number of samples (columns)
    double gamma = static_cast<double>(p) / static_cast<double>(n);

    // 1) Generate X with i.i.d. N(0,1) entries
    std::mt19937_64 rng(12345);
    std::normal_distribution<double> nd(0.0, 1.0);
    vector<double> X(p * n);
    for (size_t i = 0; i < p; ++i)
        for (size_t k = 0; k < n; ++k)
            X[i * n + k] = nd(rng);

    // 2) Compute sample covariance S = (1/n) X X^T
    vector<double> S = sample_covariance(X, p, n);

    // 3) Compute eigenvalues via Jacobi (educational; for speed use LAPACK/Eigen)
    vector<double> evals = jacobi_eigenvalues(S, p, /*max_sweeps=*/60, /*tol=*/1e-9);

    // 4) Build an empirical density histogram and compare to the MP pdf
    double a = (1.0 - sqrt(gamma)) * (1.0 - sqrt(gamma));
    double b = (1.0 + sqrt(gamma)) * (1.0 + sqrt(gamma));

    // Histogram range: cover both the empirical spectrum and [a, b]
    double minv = *min_element(evals.begin(), evals.end());
    double maxv = *max_element(evals.begin(), evals.end());
    double lo = max(0.0, min(minv, a));
    double hi = max(maxv, b);

    int bins = 60;
    vector<int> counts(bins, 0);
    double width = (hi - lo) / bins;
    for (double lam : evals) {
        int bin = (lam <= lo) ? 0 : (lam >= hi ? bins - 1 : static_cast<int>((lam - lo) / width));
        counts[bin]++;
    }

    // Print: bin_center, empirical_density, mp_pdf
    cout << fixed << setprecision(6);
    cout << "# bin_center\tempirical_density\tmp_pdf\n";
    for (int i = 0; i < bins; ++i) {
        double center = lo + (i + 0.5) * width;
        double empirical_density = static_cast<double>(counts[i]) / (static_cast<double>(p) * width);
        double theory = mp_pdf(center, gamma);
        cout << center << "\t" << empirical_density << "\t" << theory << "\n";
    }

    // Optional: report mass near zero vs atom prediction when gamma > 1
    if (gamma > 1.0) {
        double atom_theory = 1.0 - 1.0 / gamma;
        int near_zero = 0;
        for (double lam : evals) if (lam < 1e-8) near_zero++;
        double atom_emp = static_cast<double>(near_zero) / static_cast<double>(p);
        cerr << "# Atom at 0 (theory vs empirical): " << atom_theory << " vs " << atom_emp << "\n";
    }

    return 0;
}
```
This program generates a p × n matrix X with i.i.d. standard normal entries, forms the sample covariance S = (1/n)XX^T, and computes all eigenvalues via a simple symmetric Jacobi method. It then constructs a histogram-based empirical density and prints it alongside the theoretical Marchenko–Pastur density at the same bin centers. For gamma > 1, it also reports the expected atom at zero and the observed fraction of (near-)zero eigenvalues. Use a plotting tool to visualize the two columns (empirical vs. theoretical) to see the MP bulk emerge.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Compute MP support endpoints for unit variance
inline pair<double, double> mp_support(double gamma) {
    double s = sqrt(gamma);
    return { (1.0 - s) * (1.0 - s), (1.0 + s) * (1.0 + s) };
}

// Closed-form MP Stieltjes transform m(z) = (1 - gamma - z + sqrt((z - a)(z - b))) / (2 gamma z)
// Branch chosen so that Im sqrt(...) >= 0 when Im z > 0.
complex<double> mp_stieltjes(complex<double> z, double gamma) {
    auto [a, b] = mp_support(gamma);
    complex<double> root = sqrt((z - a) * (z - b));
    if (imag(root) < 0) root = -root;  // ensure correct analytic branch in upper half-plane
    return (1.0 - gamma - z + root) / (2.0 * gamma * z);
}

// Closed-form MP density for unit variance
double mp_pdf(double x, double gamma) {
    auto [a, b] = mp_support(gamma);
    if (x <= a || x >= b) return 0.0;
    return sqrt((b - x) * (x - a)) / (2.0 * M_PI * gamma * x);
}

// Approximate density via Stieltjes inversion: f(x) ~= (1/pi) * Im m(x + i*eta)
// Smaller eta -> more accurate but more oscillatory; choose a moderate eta.
double mp_pdf_from_stieltjes(double x, double gamma, double eta) {
    complex<double> z(x, eta);
    complex<double> m = mp_stieltjes(z, gamma);
    return imag(m) / M_PI;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double gamma = 0.5;  // aspect ratio p/n
    auto [a, b] = mp_support(gamma);
    int N = 25;          // number of evaluation points across the support
    double eta = 1e-3;   // imaginary shift for Stieltjes inversion

    cout << fixed << setprecision(8);
    cout << "# x\tclosed_form_pdf\tstieltjes_pdf(eta=" << eta << ")\n";
    for (int i = 0; i <= N; ++i) {
        double x = a + (b - a) * (static_cast<double>(i) / N);
        double f_closed = mp_pdf(x, gamma);
        double f_st = mp_pdf_from_stieltjes(x, gamma, eta);
        cout << x << "\t" << f_closed << "\t" << f_st << "\n";
    }

    // Also show behavior slightly outside the support (should be near zero)
    double x_left = max(0.0, a - 0.1 * (b - a));
    double x_right = b + 0.1 * (b - a);
    cout << "# Outside support checks:\n";
    cout << x_left << "\t" << mp_pdf(x_left, gamma) << "\t" << mp_pdf_from_stieltjes(x_left, gamma, eta) << "\n";
    cout << x_right << "\t" << mp_pdf(x_right, gamma) << "\t" << mp_pdf_from_stieltjes(x_right, gamma, eta) << "\n";

    return 0;
}
```
This program implements the closed-form Stieltjes transform of the Marchenko–Pastur law and uses the inversion formula f(x) ≈ (1/π) Im m(x + iη) to recover the density numerically. It prints the closed-form density and the Stieltjes-based approximation at grid points across the support and a couple of points outside. Reducing η improves accuracy but may increase numerical oscillations near the edges; values around 1e−3 to 1e−2 are practical for double precision.