
Marchenko-Pastur Distribution

Key Points

  • The Marchenko–Pastur (MP) distribution describes the limiting eigenvalue distribution of sample covariance matrices S = (1/n) XX^T when both the dimension p and the sample size n grow with p/n → γ.
  • Its density has compact support on [a, b] with a = (1 − √γ)² and b = (1 + √γ)², plus a point mass at zero of weight 1 − 1/γ when γ > 1.
  • The MP density inside [a, b] equals f_γ(x) = √((b − x)(x − a)) / (2πγx). Outside this interval the density is zero.
  • If data are whitened (entries i.i.d., mean 0, variance 1), the empirical spectral distribution of S converges to MP(γ); with variance σ² it scales by σ².
  • The largest eigenvalue tends to b and the smallest nonzero eigenvalue tends to a (if γ ≤ 1); deviations at the edges are on the n^{−2/3} Tracy–Widom scale.
  • MP theory underpins high-dimensional PCA, signal detection, and shrinkage: bulk eigenvalues follow MP, while true low-rank signals appear as outliers (spikes).
  • Correct scaling by 1/n and the correct aspect ratio γ = p/n are essential; forgetting either leads to wrong comparisons with MP.
  • Finite-sample spectra fluctuate around MP; simulations at moderate sizes already show good agreement, especially in the bulk.

Prerequisites

  • →Eigenvalues and eigenvectors — Understanding spectra of symmetric matrices is essential to interpret the MP law and PCA.
  • →Random variables and independence — The MP law assumes i.i.d. entries and relies on averaging properties across many samples.
  • →Law of Large Numbers and concentration — Convergence of empirical distributions and stability of spectra require averaging and concentration ideas.
  • →Covariance and sample covariance — MP describes the eigenvalues of the sample covariance matrix specifically.
  • →Stieltjes transform and complex analysis basics — The Stieltjes transform characterizes the MP distribution and enables numerical inversion.
  • →Numerical eigendecomposition methods — Computing empirical spectra requires practical algorithms like Jacobi or QR for symmetric matrices.

Detailed Explanation


01Overview

The Marchenko–Pastur (MP) distribution is a cornerstone of random matrix theory for statistics and machine learning. It predicts the asymptotic behavior of eigenvalues of sample covariance matrices built from high-dimensional data. Suppose you collect n samples of p features and form the p × p sample covariance matrix S = (1/n) XX^T, where X is the p × n centered data matrix. When both p and n grow large such that their ratio p/n approaches a constant γ > 0, the histogram of eigenvalues of S converges to a deterministic curve: the Marchenko–Pastur law with parameter γ. This curve has a simple closed-form density on a compact interval [a, b], where a = (1 − √γ)² and b = (1 + √γ)², capturing the entire “bulk” of the spectrum. If p exceeds n (γ > 1), there is an extra point mass at zero because S becomes rank-deficient. This phenomenon appears widely: in PCA of high-dimensional, noise-only data, the cloud of eigenvalues forms the MP bulk. If the data actually contain low-rank signals (“spikes”), those signals show up as eigenvalues outside the MP support. Thus, MP provides both a baseline for what noise looks like and a diagnostic for detecting structure. Its tools (density, Stieltjes transform, edge behavior) guide thresholding, shrinkage, and performance predictions in modern data analysis.

02Intuition & Analogies

Imagine recording the height of waves on a choppy sea using many sensors. Each sensor measures random noise of the same variance. When you compute the sample covariance across sensors, you’re mixing lots of small random contributions. In low dimensions (few sensors), the covariance eigenvalues bounce around without a neat pattern. But as the number of sensors and the number of recordings both grow large together, the randomness averages out into a stable shape—like how a large jar of mixed candies settles into a predictable color ratio. The aspect ratio γ = p/n is the recipe controlling this shape. If there are far more sensors than time points (γ > 1), many directions in sensor-space do not have enough data to be determined, so you get many zero eigenvalues—hence the point mass at zero. If sensors and samples are balanced, the energy spreads smoothly over a compact interval. The edges of this interval, a and b, act like guardrails: under pure noise, eigenvalues rarely go beyond them (up to finite-sample wiggles). From a PCA viewpoint, imagine your data are pure white noise. The MP law tells you exactly what spread of eigenvalues you should expect just from noise. If you see an eigenvalue poking above the top edge b, it’s like spotting a mountain above the clouds—it likely indicates a real underlying factor (a signal) rather than random fluctuation. This is why the MP distribution is so useful: it separates what is typical under randomness from what is exceptional and potentially meaningful.

03Formal Definition

Let X be a p × n random matrix with entries X_ij that are i.i.d., centered, and have variance Var(X_ij) = σ², with suitable moment conditions (e.g., finite fourth moment). Define the sample covariance matrix S = (1/n) XX^T (p × p). Consider the empirical spectral distribution (ESD) μ_S = (1/p) ∑_{i=1}^{p} δ_{λ_i}, where {λ_i} are the eigenvalues of S. If p, n → ∞ such that p/n → γ ∈ (0, ∞), then μ_S converges weakly, almost surely, to the Marchenko–Pastur distribution MP_{γ,σ²}. For σ² = 1, the MP density is supported on [a, b], with a = (1 − √γ)² and b = (1 + √γ)², and has density f_γ(x) = √((b − x)(x − a)) / (2πγx) · 1_{[a,b]}(x). If γ > 1, there is additionally a point mass at zero of size 1 − 1/γ. For general σ², the distribution scales: the support is [σ²a, σ²b], and the density is f_{γ,σ²}(x) = (1/σ²) f_γ(x/σ²). The Stieltjes transform m(z) = ∫ 1/(x − z) dμ(x) of MP_{γ,1} satisfies an explicit formula and the self-consistent equation z = −1/m(z) + γ/(1 + m(z)) for z in the complex upper half-plane (with the appropriate branch).

04When to Use

  • High-dimensional PCA benchmarking: Use MP as the null model for eigenvalues under white noise; eigenvalues above b suggest possible low-rank signals.
  • Covariance shrinkage and regularization: MP informs how much bulk eigenvalues are inflated by sampling noise, guiding shrinkage targets.
  • Model diagnostics in factor analysis: Compare the empirical spectrum to MP to check whether residuals behave like white noise.
  • Randomized linear algebra and signal detection: Predict typical singular value scales for random projections and noise-only matrices.
  • Wireless communications and information theory: Characterize capacity of MIMO channels via eigenvalue distributions akin to MP.
  • Stress testing algorithms: Simulate matrices with an MP bulk to evaluate robustness of spectral clustering, graph embeddings, and whitening steps.

Use MP when entries are approximately i.i.d., centered, and homoscedastic, and when both p and n are large with a stable aspect ratio γ. If data have strong correlations, heavy tails, or nonstationarities, MP may not apply directly without preprocessing (e.g., whitening, truncation).

⚠️Common Mistakes

  • Wrong scaling: Using S = XX^T instead of S = (1/n) XX^T inflates the entire spectrum by a factor of n. Always divide by n before comparing to MP.
  • Confusing shapes: MP for S = (1/n) XX^T (p × p) and for (1/n) X^T X (n × n) share their nonzero spectra, but their aspect-ratio parameters are reciprocals; keep track of whether you analyze p/n or n/p, and of the atom at zero.
  • Ignoring the atom at zero: When γ > 1 (p > n), a mass 1 − 1/γ sits at zero; missing it skews density comparisons.
  • Non-i.i.d. or unwhitened data: MP assumes independent, equal-variance entries (after centering). Strong correlations or heteroscedasticity invalidate the MP baseline; whiten or model the structure first.
  • Finite-sample overinterpretation: Small excursions beyond [a, b] occur at moderate sizes; edge fluctuations scale as n^{−2/3}. Use statistical thresholds (e.g., Tracy–Widom) rather than eyeballing.
  • Misreading support scaling: If entries have variance σ² ≠ 1, the support is [σ²a, σ²b]; forgetting σ² leads to mismatched overlays.

Key Formulas

MP support

a = (1 − √γ)²,  b = (1 + √γ)²

Explanation: These are the lower and upper edges of the Marchenko–Pastur density for unit variance. All bulk eigenvalues lie between a and b in the large-dimensional limit.

MP density

f_γ(x) = √((b − x)(x − a)) / (2πγx) · 1_{[a,b]}(x)

Explanation: This is the limiting density of eigenvalues for S = (1/n) XX^T with unit-variance entries. It is zero outside [a, b] and integrates to 1 when γ ≤ 1 (otherwise the atom at 0 completes it).

Atom at zero

μ({0}) = max{0, 1 − 1/γ}

Explanation: When γ > 1 (more features than samples), the covariance matrix has a fraction 1 − 1/γ of zero eigenvalues. When γ ≤ 1 there is no atom at zero.

Stieltjes transform (closed form)

m_γ(z) = ∫ 1/(x − z) dμ_γ(x) = (1 − γ − z + √((z − a)(z − b))) / (2γz)

Explanation: This analytic formula holds for z in the complex upper half-plane with the square-root branch chosen so that Im m(z) > 0. It characterizes MP and allows recovery of the density via its imaginary part.

Self-consistent equation

z = −1/m(z) + γ/(1 + m(z))

Explanation: The Stieltjes transform of MP solves this quadratic relationship. It is often used in proofs and numerical fixed-point schemes for related models.

Inversion of Stieltjes transform

f(x) = lim_{η↓0} (1/π) Im m(x + iη)

Explanation: The density can be recovered from the imaginary part of the Stieltjes transform approached from the upper half-plane. This provides a robust way to approximate densities numerically.

Variance scaling

μ_{γ,σ²}(A) = μ_{γ,1}(A/σ²)

Explanation: If entries have variance σ², the MP distribution scales by σ². The support becomes [σ²a, σ²b] and eigenvalues are multiplied by σ².

Edge convergence

λ_max(S) → b a.s.,  λ_min⁺(S) → a a.s.

Explanation: The largest eigenvalue converges almost surely to b, and the smallest nonzero eigenvalue converges to a when γ ≤ 1. This describes the limiting edges of the spectrum.

Mean of MP (unit variance)

∫ x dμ_γ(x) = 1

Explanation: The MP distribution has mean 1 when entries are unit variance. This matches the fact that E[tr(S)/p] = 1 for whitened data.

ESD convergence

ESD(S) = (1/p) ∑_{i=1}^{p} δ_{λ_i(S)} → μ_γ a.s.

Explanation: The empirical spectral distribution of the sample covariance converges weakly, almost surely, to the MP law as p, n → ∞ with p/n → γ.

Complexity Analysis

Let X be p × n. Forming the sample covariance S = (1/n) XX^T naively costs O(p²n) time and O(p²) extra space if you stream columns, or O(pn) space if you store X to reuse it. In the provided code we materialize X and then accumulate S, so memory is O(pn + p²). For dense data with p ≈ n, this dominates computation. Computing all eigenvalues of the symmetric p × p matrix S with a simple Jacobi method costs roughly O(p³) per full sweep; a few sweeps (tens to hundreds) are often enough for moderate accuracy, leading to a practical time of c · O(p³). Professional libraries (LAPACK’s dsyev or Eigen) achieve O(p³) with better constants using tridiagonal reduction and QR. Thus, end-to-end, the simulation is O(p²n + p³) time and O(pn + p²) space. Evaluating the MP density or Stieltjes transform on a grid of N points is cheap: O(N) time and O(1) space per point (constant work per x). Using the Stieltjes transform to approximate the density via f(x) ≈ (1/π) Im m(x + iη) with fixed η adds only constant-factor costs due to complex arithmetic. In practice: when n is much larger than p, forming S (O(p²n)) is the bottleneck; when p is comparable to or larger than n, the eigendecomposition (O(p³)) dominates. For large-scale applications, randomized methods or iterative eigensolvers compute only the top k eigenvalues in roughly O(p²n + kp²) time, but here we deliberately compute the full spectrum for didactic purposes.

Code Examples

Simulate sample covariance eigenvalues and compare to Marchenko–Pastur density
#include <bits/stdc++.h>
using namespace std;

// Index helper for row-major p x p matrices
inline size_t idx(size_t i, size_t j, size_t p) { return i * p + j; }

// Build S = (1/n) * X * X^T from X (p x n), with X stored row-major as vector<double>
vector<double> sample_covariance(const vector<double>& X, size_t p, size_t n) {
    vector<double> S(p * p, 0.0);
    // Accumulate S_ij = (1/n) sum_k X_{i,k} X_{j,k}
    for (size_t k = 0; k < n; ++k) {
        // For column k, access X_{i,k} at X[i*n + k]
        for (size_t i = 0; i < p; ++i) {
            double xi = X[i * n + k];
            for (size_t j = 0; j <= i; ++j) { // compute lower triangle, mirror later
                S[idx(i, j, p)] += xi * X[j * n + k];
            }
        }
    }
    // Scale by 1/n and symmetrize
    double invn = 1.0 / static_cast<double>(n);
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            S[idx(i, j, p)] *= invn;
            S[idx(j, i, p)] = S[idx(i, j, p)];
        }
    }
    return S;
}

// Symmetric Jacobi eigenvalue algorithm to compute eigenvalues of S (no eigenvectors)
vector<double> jacobi_eigenvalues(vector<double> A, size_t p, int max_sweeps = 100, double tol = 1e-10) {
    // A is p x p symmetric, row-major; it is overwritten in place
    vector<double> diag(p);
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        // One sweep over all i < j pairs
        double max_off = 0.0;
        for (size_t i = 0; i + 1 < p; ++i) {
            for (size_t j = i + 1; j < p; ++j) {
                double aii = A[idx(i, i, p)];
                double ajj = A[idx(j, j, p)];
                double aij = A[idx(i, j, p)];
                max_off = max(max_off, fabs(aij));
                if (fabs(aij) < tol) continue;
                // Compute Jacobi rotation parameters
                double tau = (ajj - aii) / (2.0 * aij);
                double t = (tau >= 0.0) ? 1.0 / (tau + sqrt(1.0 + tau * tau))
                                        : 1.0 / (tau - sqrt(1.0 + tau * tau));
                double c = 1.0 / sqrt(1.0 + t * t);
                double s = t * c;
                // Apply rotation to rows and columns i and j
                for (size_t k = 0; k < p; ++k) {
                    if (k != i && k != j) {
                        double aik = A[idx(i, k, p)];
                        double ajk = A[idx(j, k, p)];
                        double tik = c * aik - s * ajk;
                        double tjk = s * aik + c * ajk;
                        A[idx(i, k, p)] = tik;
                        A[idx(k, i, p)] = tik; // symmetry
                        A[idx(j, k, p)] = tjk;
                        A[idx(k, j, p)] = tjk; // symmetry
                    }
                }
                // Update the 2x2 diagonal block; the (i, j) entry is annihilated
                double aii_new = c * c * aii - 2.0 * s * c * aij + s * s * ajj;
                double ajj_new = s * s * aii + 2.0 * s * c * aij + c * c * ajj;
                A[idx(i, i, p)] = aii_new;
                A[idx(j, j, p)] = ajj_new;
                A[idx(i, j, p)] = 0.0;
                A[idx(j, i, p)] = 0.0;
            }
        }
        if (max_off < tol) break; // Converged
    }
    for (size_t i = 0; i < p; ++i) diag[i] = A[idx(i, i, p)];
    return diag;
}

// Marchenko–Pastur PDF for unit variance
double mp_pdf(double x, double gamma) {
    double a = (1.0 - sqrt(gamma)) * (1.0 - sqrt(gamma));
    double b = (1.0 + sqrt(gamma)) * (1.0 + sqrt(gamma));
    if (x <= a || x >= b) return 0.0;
    double val = sqrt((b - x) * (x - a));
    return val / (2.0 * M_PI * gamma * x);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Parameters
    size_t p = 200; // number of features (rows)
    size_t n = 400; // number of samples (columns)
    double gamma = static_cast<double>(p) / static_cast<double>(n);

    // 1) Generate X with i.i.d. N(0,1) entries
    std::mt19937_64 rng(12345);
    std::normal_distribution<double> nd(0.0, 1.0);
    vector<double> X(p * n);
    for (size_t i = 0; i < p; ++i)
        for (size_t k = 0; k < n; ++k)
            X[i * n + k] = nd(rng);

    // 2) Compute sample covariance S = (1/n) X X^T
    vector<double> S = sample_covariance(X, p, n);

    // 3) Compute eigenvalues via Jacobi (educational; for speed use LAPACK/Eigen)
    vector<double> evals = jacobi_eigenvalues(S, p, /*max_sweeps=*/60, /*tol=*/1e-9);

    // 4) Build an empirical density histogram and compare to the MP pdf
    double a = (1.0 - sqrt(gamma)) * (1.0 - sqrt(gamma));
    double b = (1.0 + sqrt(gamma)) * (1.0 + sqrt(gamma));

    // Histogram range: wide enough to cover both the eigenvalues and [a, b]
    double minv = *min_element(evals.begin(), evals.end());
    double maxv = *max_element(evals.begin(), evals.end());
    double lo = max(0.0, min(minv, a));
    double hi = max(maxv, b);

    int bins = 60;
    vector<int> counts(bins, 0);
    double width = (hi - lo) / bins;
    for (double lam : evals) {
        int bin = (lam <= lo) ? 0 : (lam >= hi ? bins - 1 : static_cast<int>((lam - lo) / width));
        counts[bin]++;
    }

    // Print: bin_center, empirical_density, mp_pdf
    cout << fixed << setprecision(6);
    cout << "# bin_center\tempirical_density\tmp_pdf\n";
    for (int i = 0; i < bins; ++i) {
        double center = lo + (i + 0.5) * width;
        double empirical_density = static_cast<double>(counts[i]) / (static_cast<double>(p) * width);
        double theory = mp_pdf(center, gamma);
        cout << center << "\t" << empirical_density << "\t" << theory << "\n";
    }

    // Optional: report mass near zero vs. the predicted atom when gamma > 1
    if (gamma > 1.0) {
        double atom_theory = 1.0 - 1.0 / gamma;
        int near_zero = 0;
        for (double lam : evals) if (lam < 1e-8) near_zero++;
        double atom_emp = static_cast<double>(near_zero) / static_cast<double>(p);
        cerr << "# Atom at 0 (theory vs empirical): " << atom_theory << " vs " << atom_emp << "\n";
    }

    return 0;
}

This program generates a p × n matrix X with i.i.d. standard normal entries, forms the sample covariance S = (1/n)XX^T, and computes all eigenvalues via a simple symmetric Jacobi method. It then constructs a histogram-based empirical density and prints it alongside the theoretical Marchenko–Pastur density at the same bin centers. For gamma > 1, it also reports the expected atom at zero and the observed fraction of (near-)zero eigenvalues. Use a plotting tool to visualize the two columns (empirical vs. theoretical) to see the MP bulk emerge.

Time: O(p²n) to form S and approximately O(p³) for the Jacobi eigensolver (over a modest number of sweeps).
Space: O(pn + p²) to store X and S; O(p²) workspace during Jacobi.
Stieltjes transform of Marchenko–Pastur and density recovery
#include <bits/stdc++.h>
using namespace std;

// Compute MP support endpoints for unit variance
inline pair<double,double> mp_support(double gamma) {
    double s = sqrt(gamma);
    return { (1.0 - s) * (1.0 - s), (1.0 + s) * (1.0 + s) };
}

// Closed-form MP Stieltjes transform m(z) = (1 - gamma - z + sqrt((z - a)(z - b))) / (2 gamma z)
// Branch chosen so that Im sqrt(...) >= 0 when Im z > 0.
complex<double> mp_stieltjes(complex<double> z, double gamma) {
    auto [a, b] = mp_support(gamma);
    complex<double> root = sqrt((z - a) * (z - b));
    if (imag(root) < 0) root = -root; // ensure correct analytic branch in upper half-plane
    return (1.0 - gamma - z + root) / (2.0 * gamma * z);
}

// Closed-form MP density for unit variance
double mp_pdf(double x, double gamma) {
    auto [a, b] = mp_support(gamma);
    if (x <= a || x >= b) return 0.0;
    double val = sqrt((b - x) * (x - a));
    return val / (2.0 * M_PI * gamma * x);
}

// Approximate density via Stieltjes inversion: f(x) ~= (1/pi) * Im m(x + i*eta)
// Smaller eta -> more accurate but more oscillatory; choose a moderate eta.
double mp_pdf_from_stieltjes(double x, double gamma, double eta) {
    complex<double> z(x, eta);
    complex<double> m = mp_stieltjes(z, gamma);
    return imag(m) / M_PI;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double gamma = 0.5; // aspect ratio p/n
    auto [a, b] = mp_support(gamma);
    int N = 25;         // number of evaluation points across the support
    double eta = 1e-3;  // imaginary shift for Stieltjes inversion

    cout << fixed << setprecision(8);
    cout << "# x\tclosed_form_pdf\tstieltjes_pdf(eta=" << eta << ")\n";
    for (int i = 0; i <= N; ++i) {
        double x = a + (b - a) * (static_cast<double>(i) / N);
        double f_closed = mp_pdf(x, gamma);
        double f_st = mp_pdf_from_stieltjes(x, gamma, eta);
        cout << x << "\t" << f_closed << "\t" << f_st << "\n";
    }

    // Also show behavior slightly outside the support (should be near zero)
    double x_left = max(0.0, a - 0.1 * (b - a));
    double x_right = b + 0.1 * (b - a);
    cout << "# Outside support checks:\n";
    cout << x_left << "\t" << mp_pdf(x_left, gamma) << "\t" << mp_pdf_from_stieltjes(x_left, gamma, eta) << "\n";
    cout << x_right << "\t" << mp_pdf(x_right, gamma) << "\t" << mp_pdf_from_stieltjes(x_right, gamma, eta) << "\n";

    return 0;
}

This program implements the closed-form Stieltjes transform of the Marchenko–Pastur law and uses the inversion formula f(x) ≈ (1/π) Im m(x + iη) to recover the density numerically. It prints the closed-form density and the Stieltjes-based approximation at grid points across the support and a couple of points outside. Reducing η improves accuracy but may increase numerical oscillations near the edges; values around 1e−3 to 1e−2 are practical for double precision.

Time: O(N) for N evaluation points (constant-time complex arithmetic per point).
Space: O(1) additional space beyond outputs.
#marchenko-pastur#random matrix theory#sample covariance#eigenvalue distribution#wishart#pca#stieltjes transform#bulk spectrum#support interval#tracy-widom#resolvent#spectral density#high-dimensional statistics