Principal Component Analysis (PCA)
Key Points
- Principal Component Analysis (PCA) finds new orthogonal axes (principal components) that capture the maximum variance in your data.
- You must first center the data by subtracting the mean of each feature; otherwise PCA gives wrong directions.
- Computationally, PCA is an eigendecomposition of the covariance matrix or, equivalently, an SVD of the centered data matrix.
- Projecting data onto the top k principal components reduces dimensionality while preserving most variance and often denoises.
- Explained variance ratios from eigenvalues help you choose k to meet a variance retention target (e.g., 95%).
- For large datasets, avoid forming the covariance explicitly; use power iteration with matrix–vector products via X_c and X_c^{\top}.
- Store the fitted mean and components to transform future data consistently and to reconstruct approximations if needed.
- PCA is linear; for curved manifolds or nonlinear patterns consider kernel PCA or nonlinear methods.
Prerequisites
- Vectors and matrices — PCA manipulates matrices, performs matrix–vector products, and relies on orthogonality and norms.
- Variance and covariance — Understanding covariance clarifies why PCA maximizes variance along components.
- Eigenvalues and eigenvectors — PCA is defined by the eigendecomposition of the covariance matrix.
- Iterative eigenmethods (power iteration) — Efficient top-k PCA often uses power/orthogonal iteration instead of full eigendecomposition.
- Feature scaling and standardization — When features have different units or scales, standardization affects PCA results.
Detailed Explanation
01 Overview
Principal Component Analysis (PCA) is a linear technique for reducing the dimensionality of a dataset while preserving as much variability (information) as possible. It re-expresses the data in terms of new, orthogonal axes called principal components. Each component is a direction in feature space along which the data varies the most, with the first component capturing the maximum variance, the second capturing the next largest variance subject to being orthogonal to the first, and so on. Practically, PCA helps compress data, visualize high-dimensional patterns, remove noise, and decorrelate features. Mathematically, PCA is performed by computing the covariance matrix of mean-centered data and then finding its eigenvalues and eigenvectors. The eigenvectors define the principal directions, and the eigenvalues quantify how much variance each direction captures. Alternatively, PCA can be obtained via the Singular Value Decomposition (SVD) of the centered data matrix, which is often numerically robust. After fitting PCA, we can project original samples onto the top k components to obtain a lower-dimensional representation, and we can approximately reconstruct original data by reversing the projection. Choosing k usually involves examining cumulative explained variance. Because PCA is linear and sensitive to scale and outliers, thoughtful preprocessing (centering, optionally standardization) is essential.
02 Intuition & Analogies
Imagine a cloud of points representing your data as a flock of birds in 3D space. If you want to understand the flock’s main flight direction, you’d look for the longest axis of the cloud. That is the first principal component: the direction where the flock spreads out the most. Next, you’d look for the second longest axis, as long as it’s perpendicular to the first. Together, these axes describe the main shape of the cloud. If someone told you to summarize the flock with a simpler description—say, a 2D shadow—you would choose the best viewing plane to capture as much of the flock’s shape as possible. Projecting onto the first two principal components is exactly picking that best plane. In photography, this is like finding the angle where the object appears most detailed; in audio, it’s like separating loud, informative tones from faint background noise. PCA automates this by calculating the directions of maximum spread (variance) in your data. Each principal component is a recipe (a weighted mixture) of your original features that points along these informative directions. By keeping only the top few components, you keep the essence of the data while discarding directions that look mostly like noise. That’s why PCA is great for compression and denoising. But because it is a straight-line (linear) summary, it can miss curved patterns—like trying to approximate a spiral with straight rulers.
03 Formal Definition
Let X be an n × d data matrix with n samples as rows and d features as columns. PCA first centers the data, X_c = X - \mathbf{1}\mu^{\top}, where \mu = \frac{1}{n} X^{\top}\mathbf{1} is the vector of feature means. The sample covariance is C = \frac{1}{n-1} X_c^{\top} X_c. PCA seeks unit vectors that successively maximize variance: v_1 = \arg\max_{\|v\|=1} v^{\top} C v, and each subsequent v_i maximizes v^{\top} C v subject to \|v\|=1 and orthogonality to v_1, \ldots, v_{i-1}. By the spectral theorem, these maximizers are the eigenvectors of C with eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0, and the variance captured along v_i is \lambda_i. Stacking the top k eigenvectors as columns of V_k (d × k), the scores are Y = X_c V_k and the rank-k reconstruction is \hat{X} = Y V_k^{\top} + \mathbf{1}\mu^{\top}.
04 When to Use
Use PCA when you need to reduce dimensionality while retaining most variance: for visualization (projecting to 2D/3D), compressing features before downstream tasks (e.g., clustering, regression), denoising by discarding low-variance directions, and decorrelating features to simplify models. It is well-suited when relationships are approximately linear, noise is roughly isotropic, and features are commensurate (or standardized if on different scales). PCA is also helpful when d is large and you want to mitigate the curse of dimensionality by capturing structure with far fewer components. Consider PCA before k-means clustering to improve cluster separation, or before linear regression to reduce multicollinearity (principal component regression). Use SVD-based approaches or covariance-free matrix–vector products for large n or d to avoid materializing huge covariance matrices. Do not rely on PCA when patterns are strongly nonlinear, when interpretability in original feature space is critical, or when outliers dominate variance. For heterogeneous units (e.g., kilograms vs meters), standardize features first. In supervised settings, PCA ignores labels; confirm that variance correlates with predictive signal or prefer supervised dimensionality reduction.
⚠️Common Mistakes
- Skipping centering: Not subtracting feature means causes the first component to align with the mean offset instead of true variation. Always center; standardize if units differ.
- Forgetting to sort eigenpairs: Some libraries return unsorted eigenvalues. Ensure components are ordered by decreasing eigenvalue.
- Mixing up shapes: Confusing whether samples are rows or columns leads to wrong covariance. In our convention, samples are rows and features are columns.
- Data leakage: Fitting PCA (means, components) on the full dataset before splitting leaks test information. Fit on training data; apply to validation/test with the stored mean and components.
- Keeping too many/few components: Arbitrary k may underfit or include mostly noise. Use explained variance plots or thresholds (e.g., 95%).
- Numerical instability: Explicit covariance of very high-dimensional or ill-conditioned data can be unstable. Prefer SVD or covariance-free multiplies X_c^{\top}(X_c v).
- Outliers: PCA is variance-driven and can chase outliers. Consider robust scaling or outlier removal.
- Misinterpretation: Loadings (eigenvectors) are linear combinations of features; large coefficients do not always imply causality.
Key Formulas
Feature mean
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
Explanation: The mean vector averages each feature across all samples. Subtracting this vector centers the data for PCA.
Mean-centering
X_c = X - \mathbf{1}\mu^{\top}
Explanation: Center the data matrix by subtracting the feature means from every row. PCA requires centered data to measure variance about zero.
Sample covariance
C = \frac{1}{n-1} X_c^{\top} X_c
Explanation: This symmetric matrix captures pairwise covariances between features. PCA performs an eigendecomposition of C.
Eigendecomposition
C = V \Lambda V^{\top}, \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d), \quad \lambda_1 \ge \cdots \ge \lambda_d \ge 0
Explanation: The covariance decomposes into orthonormal eigenvectors and nonnegative eigenvalues. Eigenvectors define principal directions; eigenvalues give variances along them.
SVD relation
X_c = U \Sigma V^{\top}, \quad \lambda_i = \frac{\sigma_i^2}{n-1}
Explanation: The right singular vectors of centered data equal PCA loadings. Squared singular values (scaled by 1/(n−1)) are the eigenvalues of the covariance.
Projection (scores)
Y = X_c V_k
Explanation: Multiplying centered data by the top k eigenvectors yields k-dimensional coordinates that preserve maximal variance.
Reconstruction
\hat{X} = Y V_k^{\top} + \mathbf{1}\mu^{\top}
Explanation: Map low-dimensional scores back to the original feature space using the top components, then add back the mean.
Explained variance
r_i = \frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j}, \quad R_k = \sum_{i=1}^{k} r_i
Explanation: r_i is the fraction of variance captured by component i; R_k is the cumulative fraction for the first k components. Use these to choose k.
Optimal reconstruction error
\frac{1}{n-1} \lVert X_c - X_c V_k V_k^{\top} \rVert_F^2 = \sum_{i=k+1}^{d} \lambda_i
Explanation: The squared Frobenius error of the best rank-k PCA reconstruction (scaled by 1/(n−1)) equals the sum of discarded eigenvalues. PCA is the optimal linear rank-k approximation.
Power iteration
v^{(t+1)} = \frac{C v^{(t)}}{\lVert C v^{(t)} \rVert}
Explanation: Iteratively multiplying by C and normalizing converges to the dominant eigenvector. Orthogonalize to obtain subsequent components.
Whitening transform
Y_w = X_c V_k \Lambda_k^{-1/2}
Explanation: Scaling scores by inverse square roots of eigenvalues yields features with approximately identity covariance (unit variance and zero covariance).
Dual (n \ll d) trick
G = \frac{1}{n-1} X_c X_c^{\top}, \quad G u_i = \lambda_i u_i, \quad v_i = \frac{X_c^{\top} u_i}{\lVert X_c^{\top} u_i \rVert}
Explanation: When features far outnumber samples, compute eigenpairs of the n × n sample Gram matrix G. Map left singular vectors back to feature space to obtain principal directions.
Complexity Analysis
Forming the covariance explicitly costs O(n d^2) time and O(d^2) memory, and a full eigendecomposition adds O(d^3). A thin SVD of the centered data costs O(n d \min(n, d)). Power/orthogonal iteration for the top k components needs only matrix–vector products: O(n d k) per sweep, with O(n d + d k) memory and no d × d matrix, which is why it scales to large d. The dual trick replaces the d × d eigenproblem with an n × n one at cost O(n^2 d + n^3), a win when n \ll d.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Helper to index row-major matrices stored in a flat vector
inline size_t idx(size_t r, size_t c, size_t ncols) { return r * ncols + c; }

// Compute column means (size d) and center X in place (n x d)
vector<double> center_columns(vector<double>& X, size_t n, size_t d) {
    vector<double> mu(d, 0.0);
    for (size_t j = 0; j < d; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += X[idx(i, j, d)];
        mu[j] = s / static_cast<double>(n);
        for (size_t i = 0; i < n; ++i) X[idx(i, j, d)] -= mu[j];
    }
    return mu;
}

// y = X * v where X is (n x d), v is (d)
void matVec_X_v(const vector<double>& X, const vector<double>& v, vector<double>& y, size_t n, size_t d) {
    fill(y.begin(), y.end(), 0.0);
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        const double* row = &X[idx(i, 0, d)];
        for (size_t j = 0; j < d; ++j) s += row[j] * v[j];
        y[i] = s;
    }
}

// z = X^T * u where X is (n x d), u is (n)
void matTVec_XT_u(const vector<double>& X, const vector<double>& u, vector<double>& z, size_t n, size_t d) {
    fill(z.begin(), z.end(), 0.0);
    for (size_t j = 0; j < d; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += X[idx(i, j, d)] * u[i];
        z[j] = s;
    }
}

// w = (1/(n-1)) * X^T * (X * v)  (covariance times vector) without forming C
void covMatVec_via_data(const vector<double>& Xc, const vector<double>& v, vector<double>& w, size_t n, size_t d) {
    static thread_local vector<double> tmp; // reusable buffer for Xc * v
    tmp.assign(n, 0.0);
    matVec_X_v(Xc, v, tmp, n, d);    // tmp = Xc * v      (n)
    matTVec_XT_u(Xc, tmp, w, n, d);  // w   = Xc^T * tmp  (d)
    double scale = 1.0 / static_cast<double>(n > 1 ? (n - 1) : 1);
    for (size_t j = 0; j < d; ++j) w[j] *= scale;
}

// Dot product and L2 norm
inline double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0.0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s;
}
inline double norm2(const vector<double>& a) { return sqrt(max(0.0, dot(a, a))); }

// Orthogonalize v against the first m columns of W (d x k, row-major)
// using Modified Gram-Schmidt. Note column c of W has stride k, not m.
void orthogonalize(vector<double>& v, const vector<double>& W, size_t d, size_t m, size_t k) {
    for (size_t c = 0; c < m; ++c) {
        double proj = 0.0;
        for (size_t j = 0; j < d; ++j) proj += v[j] * W[idx(j, c, k)];
        for (size_t j = 0; j < d; ++j) v[j] -= proj * W[idx(j, c, k)];
    }
}

// Power iteration with orthogonalization to find top-k eigenvectors of covariance
struct PCAResult { vector<double> mean; vector<double> components; vector<double> eigenvalues; size_t n, d, k; };

PCAResult pca_power_iteration(const vector<double>& X_in, size_t n, size_t d, size_t k,
                              size_t max_iter = 500, double tol = 1e-6, uint64_t seed = 42) {
    vector<double> X = X_in; // copy to center in place
    vector<double> mean = center_columns(X, n, d);

    // Storage for components (d x k, one component per column) and eigenvalues
    vector<double> W(d * k, 0.0);
    vector<double> lambdas(k, 0.0);

    mt19937_64 rng(seed);
    normal_distribution<double> dist(0.0, 1.0);

    vector<double> v(d), w(d);

    for (size_t comp = 0; comp < k; ++comp) {
        // Random initialization
        for (size_t j = 0; j < d; ++j) v[j] = dist(rng);
        // Orthogonalize against previous components
        if (comp > 0) orthogonalize(v, W, d, comp, k);
        double nv = norm2(v); if (nv > 0) for (size_t j = 0; j < d; ++j) v[j] /= nv;

        double prev_val = 0.0;
        for (size_t it = 0; it < max_iter; ++it) {
            // Multiply by covariance via data and orthogonalize
            covMatVec_via_data(X, v, w, n, d);
            if (comp > 0) orthogonalize(w, W, d, comp, k);
            double nw = norm2(w);
            if (nw == 0.0) break; // degenerate
            for (size_t j = 0; j < d; ++j) v[j] = w[j] / nw;
            // Rayleigh quotient as eigenvalue estimate
            covMatVec_via_data(X, v, w, n, d);
            double val = dot(v, w);
            if (fabs(val - prev_val) < tol * max(1.0, fabs(prev_val))) { prev_val = val; break; }
            prev_val = val;
        }
        // Save component v into W(:, comp)
        for (size_t j = 0; j < d; ++j) W[idx(j, comp, k)] = v[j];
        // Final eigenvalue estimate
        covMatVec_via_data(X, v, w, n, d);
        lambdas[comp] = dot(v, w);
    }

    return {mean, W, lambdas, n, d, k};
}

// Project data onto components: Y = Xc * W_k  (n x k)
vector<double> project_scores(const vector<double>& X, const vector<double>& mean,
                              const vector<double>& W, size_t n, size_t d, size_t k) {
    vector<double> Y(n * k, 0.0);
    for (size_t i = 0; i < n; ++i) {
        // Center each row on the fly while projecting
        for (size_t c = 0; c < k; ++c) {
            double s = 0.0;
            for (size_t j = 0; j < d; ++j) s += (X[idx(i, j, d)] - mean[j]) * W[idx(j, c, k)];
            Y[idx(i, c, k)] = s;
        }
    }
    return Y;
}

// Reconstruct: X_hat = Y * W^T + 1 * mean^T
vector<double> reconstruct(const vector<double>& Y, const vector<double>& W, const vector<double>& mean,
                           size_t n, size_t d, size_t k) {
    vector<double> Xhat(n * d, 0.0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < d; ++j) {
            double s = 0.0;
            for (size_t c = 0; c < k; ++c) s += Y[idx(i, c, k)] * W[idx(j, c, k)];
            Xhat[idx(i, j, d)] = s + mean[j];
        }
    }
    return Xhat;
}

int main() {
    // Create synthetic data: a latent 2D signal with correlated features embedded in 3D
    size_t n = 600, d = 3, k = 2;
    vector<double> X(n * d);
    mt19937_64 rng(123);
    normal_distribution<double> g1(0.0, 3.0);   // main variance
    normal_distribution<double> g2(0.0, 1.0);   // secondary variance
    normal_distribution<double> noise(0.0, 0.2);

    // True latent 2D, then mix into 3D via a loading matrix A (3x2)
    array<array<double,2>,3> A = {{{0.8, 0.1}, {0.6, -0.2}, {0.0, 0.98}}};
    for (size_t i = 0; i < n; ++i) {
        double z1 = g1(rng), z2 = g2(rng);
        double x = A[0][0]*z1 + A[0][1]*z2 + noise(rng);
        double y = A[1][0]*z1 + A[1][1]*z2 + noise(rng);
        double z = A[2][0]*z1 + A[2][1]*z2 + noise(rng);
        X[idx(i,0,d)] = x + 5.0; // add mean shift to test centering
        X[idx(i,1,d)] = y - 2.0;
        X[idx(i,2,d)] = z + 1.0;
    }

    auto res = pca_power_iteration(X, n, d, k, 300, 1e-7);

    // Report eigenvalues; for the true total variance one would sum all d eigenvalues
    double topk_var = 0.0; for (double l : res.eigenvalues) topk_var += l;

    cout << fixed << setprecision(6);
    cout << "Top-" << k << " eigenvalues (variances):\n";
    for (size_t i = 0; i < k; ++i) cout << "  lambda_" << (i+1) << " = " << res.eigenvalues[i] << "\n";
    cout << "Sum of top-" << k << " eigenvalues = " << topk_var << "\n";

    // Project and reconstruct a few samples
    auto Y = project_scores(X, res.mean, res.components, n, d, k);
    auto Xhat = reconstruct(Y, res.components, res.mean, n, d, k);

    cout << "\nOriginal vs Reconstructed (first 3 samples):\n";
    for (size_t i = 0; i < 3; ++i) {
        cout << "Sample " << i << ":\n  orig:  ";
        for (size_t j = 0; j < d; ++j) cout << setw(9) << X[idx(i,j,d)] << ' ';
        cout << "\n  recon: ";
        for (size_t j = 0; j < d; ++j) cout << setw(9) << Xhat[idx(i,j,d)] << ' ';
        cout << "\n";
    }

    // Show first component loadings
    cout << "\nFirst component loadings (v1): ";
    for (size_t j = 0; j < d; ++j) cout << res.components[idx(j,0,k)] << ' ';
    cout << "\n";
    return 0;
}
```
This program implements PCA using orthogonal power iteration without explicitly forming the covariance matrix. It centers the data, repeatedly multiplies a vector by the covariance via X_c^T(X_c v)/(n−1), orthogonalizes against previously found components, and normalizes. The Rayleigh quotient estimates each eigenvalue (variance). It then projects samples onto the top k components and reconstructs approximate originals. This approach scales to larger problems by avoiding O(d^2) covariance storage and uses only matrix–vector products on the centered data.
```cpp
#include <bits/stdc++.h>
using namespace std;

inline size_t idx(size_t r, size_t c, size_t ncols) { return r * ncols + c; }

struct OnlineCov {
    size_t d = 0;         // dimensionality
    size_t n = 0;         // count
    vector<double> mean;  // running mean (d)
    vector<double> M2;    // sum of outer products of deviations (d x d)

    explicit OnlineCov(size_t dim) : d(dim), n(0), mean(d, 0.0), M2(d * d, 0.0) {}

    void update(const vector<double>& x) {
        if (x.size() != d) throw runtime_error("dimension mismatch");
        n++;
        vector<double> delta(d), delta2(d);
        for (size_t j = 0; j < d; ++j) delta[j] = x[j] - mean[j];
        // update mean
        for (size_t j = 0; j < d; ++j) mean[j] += delta[j] / static_cast<double>(n);
        for (size_t j = 0; j < d; ++j) delta2[j] = x[j] - mean[j];
        // M2 += outer(delta, delta2)
        for (size_t r = 0; r < d; ++r) {
            double dr = delta[r];
            for (size_t c = 0; c < d; ++c) M2[idx(r,c,d)] += dr * delta2[c];
        }
    }

    vector<double> covariance() const {
        vector<double> C = M2;
        double denom = (n > 1) ? (static_cast<double>(n) - 1.0) : 1.0;
        for (double& v : C) v /= denom;
        return C; // d x d
    }
};

inline double dot(const vector<double>& a, const vector<double>& b) { double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i]*b[i]; return s; }
inline double norm2(const vector<double>& a) { return sqrt(max(0.0, dot(a,a))); }

// w = C * v for symmetric C (d x d)
void symMatVec(const vector<double>& C, const vector<double>& v, vector<double>& w, size_t d) {
    fill(w.begin(), w.end(), 0.0);
    for (size_t r = 0; r < d; ++r) {
        const double* row = &C[idx(r,0,d)];
        double s = 0.0; for (size_t c = 0; c < d; ++c) s += row[c] * v[c];
        w[r] = s;
    }
}

// Orthogonalize v against the first m columns of W (d x k, row-major);
// column c of W has stride k
void orthogonalize(vector<double>& v, const vector<double>& W, size_t d, size_t m, size_t k) {
    for (size_t c = 0; c < m; ++c) {
        double proj = 0.0; for (size_t j = 0; j < d; ++j) proj += v[j] * W[idx(j, c, k)];
        for (size_t j = 0; j < d; ++j) v[j] -= proj * W[idx(j, c, k)];
    }
}

struct EigenResult { vector<double> vecs; vector<double> vals; size_t d, k; };

EigenResult topk_eigs_power(const vector<double>& C, size_t d, size_t k, size_t max_iter = 500, double tol = 1e-6, uint64_t seed = 7) {
    vector<double> W(d * k, 0.0), vals(k, 0.0);
    mt19937_64 rng(seed); normal_distribution<double> nd(0.0, 1.0);
    vector<double> v(d), w(d);
    for (size_t comp = 0; comp < k; ++comp) {
        for (size_t j = 0; j < d; ++j) v[j] = nd(rng);
        if (comp > 0) orthogonalize(v, W, d, comp, k);
        double nv = norm2(v); if (nv > 0) for (size_t j = 0; j < d; ++j) v[j] /= nv;
        double prev = 0.0;
        for (size_t it = 0; it < max_iter; ++it) {
            symMatVec(C, v, w, d);
            if (comp > 0) orthogonalize(w, W, d, comp, k);
            double nw = norm2(w); if (nw == 0) break;
            for (size_t j = 0; j < d; ++j) v[j] = w[j] / nw;
            symMatVec(C, v, w, d); double val = dot(v, w);
            if (fabs(val - prev) < tol * max(1.0, fabs(prev))) { prev = val; break; }
            prev = val;
        }
        for (size_t j = 0; j < d; ++j) W[idx(j, comp, k)] = v[j];
        symMatVec(C, v, w, d); vals[comp] = dot(v, w);
    }
    return {W, vals, d, k};
}

int main() {
    // Stream synthetic 5D data (two strong directions + noise)
    size_t d = 5; size_t n = 2000;
    OnlineCov oc(d);
    mt19937_64 rng(2024);
    normal_distribution<double> z1(0.0, 4.0), z2(0.0, 2.0), noise(0.0, 0.4);
    // Loading matrix A (5 x 2)
    array<array<double,2>,5> A = {{{0.7, 0.2},{0.5, -0.3},{0.3, 0.8},{0.0, 0.4},{0.1, -0.1}}};

    for (size_t i = 0; i < n; ++i) {
        double u1 = z1(rng), u2 = z2(rng);
        vector<double> x(d);
        for (size_t r = 0; r < d; ++r) x[r] = A[r][0]*u1 + A[r][1]*u2 + noise(rng);
        // Add drift to test centering
        if (i % 2 == 0) x[0] += 5.0;
        oc.update(x);
    }

    auto C = oc.covariance();
    size_t k = 2;
    auto er = topk_eigs_power(C, d, k, 300, 1e-7);

    // Report eigenvalues and explained variance (trace of C = total variance)
    double total_var = 0.0; for (size_t j = 0; j < d; ++j) total_var += C[idx(j,j,d)];
    cout << fixed << setprecision(6);
    cout << "Top-" << k << " eigenvalues:\n";
    for (size_t i = 0; i < k; ++i) cout << "  lambda_" << (i+1) << " = " << er.vals[i] << "\n";
    double cum = 0.0; for (size_t i = 0; i < k; ++i) cum += er.vals[i];
    cout << "Cumulative explained variance (top-" << k << ") = " << (cum / total_var) * 100.0 << "%\n";

    // Print first component loadings
    cout << "First component loadings: ";
    for (size_t j = 0; j < d; ++j) cout << er.vecs[idx(j,0,k)] << ' ';
    cout << "\n";
    return 0;
}
```
This program computes a covariance matrix online using Welford’s algorithm, which updates the mean and the sum of outer-product deviations per arriving sample. After streaming all samples, it forms the sample covariance, then finds the top k eigenpairs using power iteration on the explicit symmetric covariance. It reports eigenvalues and cumulative explained variance, demonstrating how to handle data that do not fit in memory at once while still performing PCA at the end.