Variance and Covariance
Key Points
- Variance measures how spread out a random variable is around its mean, while covariance measures how two variables move together.
- Key identities are Var(X) = E[X^2] - (E[X])^2 and Cov(X, Y) = E[XY] - E[X]E[Y].
- For independent variables, Var(X + Y) = Var(X) + Var(Y); in general add 2Cov(X, Y).
- Standard deviation is the square root of variance and has the same units as the data.
- Covariance can be negative, zero, or positive; zero covariance does not always imply independence.
- Efficient, numerically stable computation uses one-pass algorithms like Welford's method.
- Variance and covariance power concentration bounds such as Chebyshev's inequality for probabilistic guarantees.
- In competitive programming, these tools help analyze randomized algorithms, Monte Carlo estimates, and sums of indicators.
Prerequisites
- Basic probability (events, sample spaces, independence) — Variance and covariance are expectations over random variables and rely on independence concepts.
- Linearity of expectation — Key to deriving and manipulating Var and Cov formulas efficiently.
- Summations and algebraic manipulation — Needed to expand squares, handle sums of random variables, and verify identities.
- Floating-point arithmetic and numerical stability — Practical implementations must avoid catastrophic cancellation and overflow.
- Random number generation — Monte Carlo experiments and simulations require proper RNG usage.
Detailed Explanation
01 Overview
Variance and covariance are core tools for quantifying uncertainty. Variance tells you how much a random variable typically deviates from its mean; covariance tells you whether two variables tend to increase together, decrease together, or move independently. Practically, they let you bound errors, compare algorithm stability, and understand how randomness accumulates when summing many contributions. Two frequently used formulas are Var(X) = E[X^2] - (E[X])^2 and Cov(X, Y) = E[XY] - E[X]E[Y]. These convert spread questions into expectations, which are often easier to compute by linearity. For independent X and Y, covariance is zero, and variance of sums simply adds. This additivity under independence is the workhorse behind many concentration ideas and the analysis of randomized algorithms. Standard deviation, the square root of variance, puts spread back into the original units, making it intuitive for confidence bounds. In programming contests, even if direct probability theory is rare, you will encounter sampling, hashing, randomized choices, and indicator variables—settings where variance and covariance quickly yield clean correctness and performance guarantees.
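As a quick worked check of the identity above, here is a minimal sketch (the fair six-sided die is an illustrative example, not from the examples below) that computes the variance both from the definition E[(X - μ)^2] and from E[X^2] - (E[X])^2; the two agree at 35/12 ≈ 2.9167.

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch: compute Var(X) for a fair six-sided die two ways and
// confirm that E[(X - mu)^2] equals E[X^2] - (E[X])^2.
int main() {
    vector<double> faces = {1, 2, 3, 4, 5, 6};
    double EX = 0, EX2 = 0;
    for (double v : faces) { EX += v / 6.0; EX2 += v * v / 6.0; }

    double var_definition = 0;               // E[(X - mu)^2]
    for (double v : faces) var_definition += (v - EX) * (v - EX) / 6.0;
    double var_identity = EX2 - EX * EX;     // E[X^2] - (E[X])^2

    cout << fixed << setprecision(6);
    cout << "E[X] = " << EX << ", Var by definition = " << var_definition
         << ", Var by identity = " << var_identity << "\n"; // both ~2.916667
    return 0;
}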
02 Intuition & Analogies
Think of a dartboard. If you always hit near the bullseye, your throws have low variance; if your hits are scattered across the board, variance is high. The mean is your average hit location; variance measures how spread out your arrows land around that mean. Now imagine two friends throwing simultaneously: if whenever you throw far right, your friend also tends to throw far right, their throws are positively correlated—covariance is positive. If when you go right they go left, covariance is negative. If their throws seem unrelated to yours, covariance is near zero. Another analogy is daily commute times. Your commute varies because of traffic (variance). Your coworker’s commute might be affected by the same traffic jams; when yours is longer, theirs tends to be longer too—positive covariance due to shared causes. If you live in different cities with unrelated traffic, the covariance is near zero. Finally, consider adding noise. If you add multiple independent noise sources (like several small, unrelated delays), the total uncertainty grows additively in variance, not standard deviation—because spreads combine quadratically. This explains why many small independent errors can still produce a sizable total spread and motivates computing variance rather than just adding absolute errors.
03 Formal Definition
Let X be a random variable with finite mean μ = E[X]. The variance of X is Var(X) = E[(X - μ)^2] = E[X^2] - μ^2, and the standard deviation is σ_X = sqrt(Var(X)). For two random variables X and Y, the covariance is Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]; in particular, Cov(X, X) = Var(X). If X and Y are independent, then E[XY] = E[X]E[Y], so Cov(X, Y) = 0; the converse does not hold in general, since uncorrelated variables need not be independent. For a finite sample x_1, ..., x_n, the population variance is (1/n) * sum_i (x_i - x̄)^2, while the unbiased sample variance divides by n - 1.
04 When to Use
Use variance when you need a single-number summary of uncertainty, to compare stability of algorithms, or to derive probabilistic guarantees via concentration inequalities (e.g., Chebyshev). For sums of random contributions—like counts of successes across trials, random hashing collisions, or Monte Carlo estimators—variance tells you how the error scales with the number of samples. Use covariance when analyzing how two quantities interact: error propagation across stages, dependency between features, or whether two randomized subroutines amplify or cancel each other's variability. In coding contests, practical applications include: estimating probabilities or expectations by sampling and attaching error bars; proving that a randomized algorithm succeeds with high probability; bounding deviations of sums of indicators; diagnosing numerical stability in floating-point aggregates (choose numerically stable variance formulas and streaming updates); and exploiting independence in designs (pairwise independent hashing yields simple variance bounds). When dealing with multivariate data (e.g., geometric problems with random points), a covariance matrix captures spread and orientation—useful in PCA-like reasoning or bounding distances.
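As a small illustration of the sums-of-indicators setting mentioned above, the sketch below compares the simulated variance of a sum of independent Bernoulli indicators against the formula n·p·(1-p). The parameters n = 50 and p = 0.3 are illustrative choices.

#include <bits/stdc++.h>
using namespace std;

// Sketch: variance of a sum of independent indicators.
// For X = I_1 + ... + I_n with independent I_k ~ Bernoulli(p),
// Var(X) = sum of Var(I_k) = n * p * (1 - p). Compare against a simulation.
int main() {
    const int n = 50;             // indicators per trial (illustrative)
    const double p = 0.3;         // success probability (illustrative)
    const int trials = 200000;

    mt19937_64 rng(42);
    bernoulli_distribution coin(p);

    long double mean = 0, M2 = 0; // Welford accumulators over the trial sums
    for (int t = 1; t <= trials; ++t) {
        int sum = 0;
        for (int k = 0; k < n; ++k) sum += coin(rng) ? 1 : 0;
        long double d = sum - mean;
        mean += d / t;
        M2 += d * (sum - mean);
    }
    long double var_emp = M2 / trials;

    cout << fixed << setprecision(4);
    cout << "empirical Var ≈ " << (double)var_emp
         << ", theory n*p*(1-p) = " << n * p * (1 - p) << "\n"; // expected 10.5
    return 0;
}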
⚠️ Common Mistakes
Common pitfalls include:
1. Using the naive one-pass formula Var = E[X^2] - (E[X])^2 with floating-point sums, which can suffer catastrophic cancellation when the variance is small relative to the mean; prefer Welford's method or compensated summation.
2. Confusing population and sample variance: the unbiased estimator for i.i.d. samples uses 1/(n-1), not 1/n.
3. Assuming zero covariance implies independence; it does not, except under special distributions (e.g., jointly Gaussian).
4. Forgetting covariance terms when summing dependent variables; Var(X+Y) = Var(X) + Var(Y) holds only for independent or uncorrelated variables.
5. Mis-scaling: Var(aX + b) = a^2 Var(X), not a Var(X); standard deviation scales by |a|.
6. Interpreting covariance magnitude without units; covariance has units equal to the product of the variables' units, so compare via the correlation coefficient to get a dimensionless measure.
7. Overtrusting Chebyshev bounds; they are general but often loose, so use sharper, distribution-specific inequalities when available.
8. Accumulating sums of squares and cross-products in integer types, which can overflow; accumulate in double or long double instead.
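The following sketch illustrates pitfall 1 on synthetic data (a large offset of 1e8 and a repeating {-1, 0, +1} pattern, chosen only for illustration): the naive E[X^2] - (E[X])^2 computation in double precision is dominated by cancellation and typically lands far from the true variance of 2/3, while Welford's update stays close to it.

#include <bits/stdc++.h>
using namespace std;

// Sketch of catastrophic cancellation in the naive variance formula.
// Values cycle through {big - 1, big, big + 1}, so the true population
// variance is exactly 2/3 while the mean is huge (~1e8).
int main() {
    const int n = 99'999;        // multiple of 3, so the pattern is balanced
    const double big = 1e8;      // illustrative large offset

    double s = 0, s2 = 0;        // naive accumulators: sum and sum of squares
    double mean = 0, M2 = 0;     // Welford accumulators

    for (int i = 0; i < n; ++i) {
        double x = big + (double)(i % 3 - 1);
        s += x;
        s2 += x * x;
        double d = x - mean;     // Welford update
        mean += d / (i + 1);
        M2 += d * (x - mean);
    }

    double m = s / n;
    double var_naive = s2 / n - m * m; // subtracts two nearly equal ~1e16 numbers
    double var_welford = M2 / n;

    cout << setprecision(10);
    cout << "true variance       = " << 2.0 / 3.0 << "\n";
    cout << "naive E[X^2]-E[X]^2 = " << var_naive << "\n";   // typically far off
    cout << "Welford             = " << var_welford << "\n"; // close to 2/3
    return 0;
}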
Key Formulas
Variance Identities
Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2
Explanation: Variance is the expected squared distance from the mean. The alternative form using E[X^2] is often easier to compute by linearity of expectation.
Covariance Identities
Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
Explanation: Covariance measures joint deviations from means. The E[XY] form is convenient for algebra and proofs.
Scaling Rule
Var(aX + b) = a^2 Var(X)
Explanation: Adding a constant does not change variance; scaling by a multiplies variance by a^2. This is crucial for normalization.
Variance of a Sum (Two Variables)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Explanation: Variance of a sum includes individual variances and twice the covariance. If X and Y are independent, the covariance term is zero.
Variance of a Sum (General)
Var(X_1 + ... + X_n) = sum_i Var(X_i) + 2 sum_{i<j} Cov(X_i, X_j)
Explanation: The total variance equals the sum of individual variances plus all pairwise covariance contributions. For mutual independence, only the variance terms remain.
Correlation Coefficient
ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
Explanation: Correlation is a dimensionless measure in [-1, 1] that compares covariance to the product of standard deviations. It enables fair comparisons across scales.
Cauchy–Schwarz Bound
|Cov(X, Y)| ≤ σ_X σ_Y
Explanation: The magnitude of covariance cannot exceed the product of standard deviations. This bound implies |ρ(X, Y)| ≤ 1.
Chebyshev's Inequality
P(|X - E[X]| ≥ k σ_X) ≤ 1/k^2 for any k > 0
Explanation: This concentration bound uses only mean and variance to control tail probabilities. It is general but often loose compared to distribution-specific bounds.
Law of Total Variance
Var(X) = E[Var(X | Y)] + Var(E[X | Y])
Explanation: Decomposes overall uncertainty into average conditional variance plus variance of conditional means. Useful in hierarchical or mixture models.
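As a concrete check of this decomposition, the sketch below uses a small hand-built mixture (the values 0/2 and 10/14 are illustrative only): Y is a fair coin, and the conditional distributions of X are simple two-point distributions, so both sides can be computed exactly and agree at 32.75.

#include <bits/stdc++.h>
using namespace std;

// Sketch: check Var(X) = E[Var(X|Y)] + Var(E[X|Y]) on a tiny mixture.
// Y is a fair coin. Given Y=0, X is 0 or 2 (each 1/2); given Y=1, X is 10 or 14.
int main() {
    // Conditional moments, computed by hand for this illustrative example:
    double mean0 = 1.0,  var0 = 1.0;   // X | Y=0 takes values in {0, 2}
    double mean1 = 12.0, var1 = 4.0;   // X | Y=1 takes values in {10, 14}

    // Right-hand side: E[Var(X|Y)] + Var(E[X|Y]) with P(Y=0) = P(Y=1) = 1/2.
    double e_cond_var = 0.5 * var0 + 0.5 * var1;
    double mean_of_means = 0.5 * mean0 + 0.5 * mean1;
    double var_cond_mean = 0.5 * (mean0 - mean_of_means) * (mean0 - mean_of_means)
                         + 0.5 * (mean1 - mean_of_means) * (mean1 - mean_of_means);

    // Left-hand side: X takes each of {0, 2, 10, 14} with probability 1/4.
    vector<double> vals = {0, 2, 10, 14};
    double EX = 0, EX2 = 0;
    for (double v : vals) { EX += v / 4.0; EX2 += v * v / 4.0; }
    double var_total = EX2 - EX * EX;

    cout << "E[Var(X|Y)] + Var(E[X|Y]) = " << e_cond_var + var_cond_mean << "\n"; // 32.75
    cout << "Var(X) directly           = " << var_total << "\n";                  // 32.75
    return 0;
}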
Unbiased Sample Variance
s^2 = (1/(n-1)) * sum_{i=1}^{n} (x_i - x̄)^2
Explanation: For i.i.d. samples from a population, this estimator has expected value equal to the true variance. The 1/(n-1) factor corrects bias from estimating the mean.
Bilinearity of Covariance
Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z)
Explanation: Covariance distributes linearly over sums and scales with constants. This is often used to simplify complex covariance expressions.
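A minimal numerical check of bilinearity on tiny illustrative arrays, using population covariances; the coefficients a = 2 and b = -3 are arbitrary choices:

#include <bits/stdc++.h>
using namespace std;

// Sketch: numerically check Cov(aX + bY, Z) = a*Cov(X,Z) + b*Cov(Y,Z)
// on small illustrative data, treating the arrays as the full population.
double cov(const vector<double>& u, const vector<double>& v) {
    int n = (int)u.size();
    double mu = 0, mv = 0;
    for (int i = 0; i < n; ++i) { mu += u[i] / n; mv += v[i] / n; }
    double c = 0;
    for (int i = 0; i < n; ++i) c += (u[i] - mu) * (v[i] - mv) / n;
    return c;
}

int main() {
    vector<double> x = {1, 2, 3, 4}, y = {4, 1, 3, 2}, z = {1, 3, 2, 4};
    double a = 2.0, b = -3.0;

    vector<double> w(x.size());
    for (size_t i = 0; i < x.size(); ++i) w[i] = a * x[i] + b * y[i];

    cout << "Cov(aX + bY, Z)         = " << cov(w, z) << "\n";
    cout << "a*Cov(X,Z) + b*Cov(Y,Z) = " << a * cov(x, z) + b * cov(y, z) << "\n";
    return 0;
}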
Covariance Matrix PSD
a^T Σ a = Var(a^T X) ≥ 0 for every vector a
Explanation: The covariance matrix Σ is symmetric positive semidefinite, meaning quadratic forms are nonnegative. This property underlies PCA and Mahalanobis distance.
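The sketch below illustrates why the 2x2 covariance matrix of a point set is positive semidefinite: for any direction a, the quadratic form a^T Σ a equals the variance of the data projected onto a, which cannot be negative. The data points and random directions are illustrative only.

#include <bits/stdc++.h>
using namespace std;

// Sketch: the population covariance matrix of 2D data is PSD because
// a^T Sigma a equals the variance of the projection a·(x, y), which is >= 0.
int main() {
    vector<double> x = {1, 2, 3, 4, 5};   // illustrative 2D data
    vector<double> y = {2, 1, 4, 3, 6};
    int n = (int)x.size();

    double mx = 0, my = 0;
    for (int i = 0; i < n; ++i) { mx += x[i] / n; my += y[i] / n; }

    double Sxx = 0, Syy = 0, Sxy = 0;     // population covariance matrix entries
    for (int i = 0; i < n; ++i) {
        Sxx += (x[i] - mx) * (x[i] - mx) / n;
        Syy += (y[i] - my) * (y[i] - my) / n;
        Sxy += (x[i] - mx) * (y[i] - my) / n;
    }

    mt19937_64 rng(7);
    uniform_real_distribution<double> U(-1.0, 1.0);
    for (int t = 0; t < 5; ++t) {
        double a0 = U(rng), a1 = U(rng);  // arbitrary direction a = (a0, a1)
        double quad = a0 * a0 * Sxx + 2 * a0 * a1 * Sxy + a1 * a1 * Syy; // a^T Sigma a

        double mp = 0, vp = 0;            // variance of the projected samples
        for (int i = 0; i < n; ++i) mp += (a0 * x[i] + a1 * y[i]) / n;
        for (int i = 0; i < n; ++i) {
            double p = a0 * x[i] + a1 * y[i] - mp;
            vp += p * p / n;
        }
        cout << "a^T Sigma a = " << quad << ", Var(projection) = " << vp << "\n";
    }
    return 0;
}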
Complexity Analysis
All methods shown here run in linear time. Welford's streaming update costs O(1) time and O(1) extra memory per observation, so processing n samples takes O(n) time and O(1) memory. The two-pass approach performs two linear scans (O(n) time) but must store the data, using O(n) memory. Monte Carlo estimation with N samples costs O(N) time; because the variance of the sample mean scales as 1/N, the standard error shrinks like 1/sqrt(N), so halving the error requires roughly four times as many samples.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Online (streaming) computation of mean, variance, and covariance.
// Uses Welford's algorithm for numerical stability.
// Also provides unbiased sample variance/covariance on demand.

struct OnlineStats2 {
    long long n = 0;        // count of paired samples
    long double mean_x = 0; // running mean of X
    long double mean_y = 0; // running mean of Y
    long double M2_x = 0;   // sum of squared deviations for X
    long double M2_y = 0;   // sum of squared deviations for Y
    long double C_xy = 0;   // sum of product of deviations for covariance

    void add(long double x, long double y) {
        // Update both sequences consistently
        n++;
        long double dx = x - mean_x;
        long double dy = y - mean_y;
        // Update means
        mean_x += dx / n;
        mean_y += dy / n;
        // After updating means, deviations from updated means are:
        long double dx2 = x - mean_x;
        long double dy2 = y - mean_y;
        // Accumulate sums of squares and cross-products
        M2_x += dx * dx2; // equivalent to sum (x - mean)^2
        M2_y += dy * dy2; // equivalent to sum (y - mean)^2
        C_xy += dx * dy2; // equivalent to sum (x - mean_x)*(y - mean_y)
    }

    // Population variance (divide by n); valid for n > 0
    long double var_pop_x() const { return n > 0 ? M2_x / n : 0.0L; }
    long double var_pop_y() const { return n > 0 ? M2_y / n : 0.0L; }

    // Unbiased sample variance (divide by n-1); valid for n > 1
    long double var_samp_x() const { return n > 1 ? M2_x / (n - 1) : 0.0L; }
    long double var_samp_y() const { return n > 1 ? M2_y / (n - 1) : 0.0L; }

    // Population and sample covariance
    long double cov_pop() const { return n > 0 ? C_xy / n : 0.0L; }
    long double cov_samp() const { return n > 1 ? C_xy / (n - 1) : 0.0L; }

    long double corr() const {
        long double vx = var_pop_x();
        long double vy = var_pop_y();
        if (vx <= 0 || vy <= 0) return 0.0L;
        return cov_pop() / sqrt(vx * vy);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: compute statistics over paired data (x_i, y_i)
    vector<long double> x = {1, 2, 3, 4, 5};
    vector<long double> y = {2, 4, 6, 8, 10}; // perfectly correlated: y = 2x

    OnlineStats2 stats;
    for (size_t i = 0; i < x.size(); ++i) stats.add(x[i], y[i]);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "mean_x = " << (double)stats.mean_x << "\n";
    cout << "mean_y = " << (double)stats.mean_y << "\n";
    cout << "var_pop_x = " << (double)stats.var_pop_x() << "\n";
    cout << "var_pop_y = " << (double)stats.var_pop_y() << "\n";
    cout << "cov_pop = " << (double)stats.cov_pop() << "\n";
    cout << "corr = " << (double)stats.corr() << "\n";
    return 0;
}
Maintains running means and second-order accumulators to compute mean, variance, and covariance in one pass with numerical stability. Demonstrates population vs sample formulas and correlation.
#include <bits/stdc++.h>
using namespace std;

// Empirically verify Var(X+Y) = Var(X)+Var(Y) for independent variables
// and observe the extra 2Cov(X,Y) term when X and Y are dependent.

struct OnlineVar {
    long long n = 0; long double mean = 0, M2 = 0;
    void add(long double x) { n++; long double d = x - mean; mean += d / n; M2 += d * (x - mean); }
    long double var_pop() const { return n > 0 ? M2 / n : 0.0L; }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const int N = 2'000'000;
    mt19937_64 rng(123456);
    normal_distribution<long double> normX(0.0L, 1.0L); // Var=1

    // Case 1: Independent X,Y ~ N(0,1)
    OnlineVar vX, vY, vS1; // for X, Y, and S1 = X+Y
    for(int i=0;i<N;i++){
        long double x = normX(rng);
        long double y = normX(rng); // independent draw
        vX.add(x); vY.add(y); vS1.add(x+y);
    }

    // Case 2: Dependent: Y = X + Z with Z ~ N(0,1) independent of X
    OnlineVar vXd, vYd, vS2; // S2 = X + Y = 2X + Z
    for(int i=0;i<N;i++){
        long double x = normX(rng);
        long double z = normX(rng);
        long double y = x + z; // Cov(X,Y) = Var(X) = 1
        vXd.add(x); vYd.add(y); vS2.add(x+y);
    }

    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << "Independent case:\n";
    cout << "Var(X)≈" << (double)vX.var_pop() << ", Var(Y)≈" << (double)vY.var_pop()
         << ", Var(X+Y)≈" << (double)vS1.var_pop() << " (expected 2)\n\n";

    cout << "Dependent case (Y=X+Z):\n";
    cout << "Var(X)≈" << (double)vXd.var_pop() << ", Var(Y)≈" << (double)vYd.var_pop()
         << ", Var(X+Y)≈" << (double)vS2.var_pop() << " (expected Var(2X+Z)=4+1=5)\n";

    return 0;
}
Simulates Gaussian variables to confirm empirically that the variance of a sum is additive under independence and picks up an extra covariance term when the variables are dependent. The dependent construction Y = X + Z makes Cov(X, Y) = Var(X), increasing Var(X+Y).
#include <bits/stdc++.h>
using namespace std;

// Straightforward two-pass computation of sample mean, variance, and covariance.
// Safer numerically than naive one-pass E[X^2]-E[X]^2; still use double/long double.

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<long double> x(n), y(n);
    for(int i=0;i<n;i++) cin >> x[i];
    for(int i=0;i<n;i++) cin >> y[i];

    auto mean = [&](const vector<long double>& a){
        long double s = 0; for(auto v: a) s += v; return s / (long double)a.size();
    };

    long double mx = mean(x), my = mean(y);

    long double Sxx = 0, Syy = 0, Sxy = 0;
    for(int i=0;i<n;i++){
        long double dx = x[i] - mx;
        long double dy = y[i] - my;
        Sxx += dx*dx; Syy += dy*dy; Sxy += dx*dy;
    }

    long double var_sample_x = (n>1) ? Sxx/(n-1) : 0.0L;
    long double var_sample_y = (n>1) ? Syy/(n-1) : 0.0L;
    long double cov_sample = (n>1) ? Sxy/(n-1) : 0.0L;

    cout.setf(std::ios::fixed); cout<<setprecision(6);
    cout << "mean_x " << (double)mx << " mean_y " << (double)my << "\n";
    cout << "sample_var_x " << (double)var_sample_x << " sample_var_y " << (double)var_sample_y << "\n";
    cout << "sample_cov_xy " << (double)cov_sample << "\n";
    return 0;
}
Reads two arrays and computes unbiased sample variance and covariance using the classic two-pass approach: first compute means, then accumulate centered sums. Uses 1/(n-1) to remove bias.
#include <bits/stdc++.h>
using namespace std;

// Estimate p = P(X^2 + Y^2 <= 1) ~ pi/4 via Monte Carlo and attach an error bar.
// Uses sample variance to form a Chebyshev-style confidence band (distribution-free).

struct OnlineBernoulli {
    long long n = 0; long double mean = 0, M2 = 0; // for indicator values
    void add(int b){ n++; long double x = (long double)b; long double d = x - mean; mean += d/n; M2 += d*(x - mean); }
    long double p_hat() const { return mean; }
    long double var_mean_pop() const { // Var of sample mean under population variance
        if (n==0) return 0.0L; long double var_pop = (n>0 ? M2/n : 0.0L); return var_pop / n; }
    long double var_mean_unb() const { // Unbiased estimator of Var(mean)
        if (n<=1) return 0.0L; long double s2 = M2/(n-1); return s2 / n; }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 100'000; // 100k samples
    mt19937_64 rng(987654321);
    uniform_real_distribution<long double> U(-1.0L, 1.0L);

    OnlineBernoulli est;
    for(int i=0;i<N;i++){
        long double x = U(rng), y = U(rng);
        int inside = (x*x + y*y <= 1.0L) ? 1 : 0; // indicator
        est.add(inside);
    }

    long double p = est.p_hat();
    long double pi_est = 4.0L * p;

    // Chebyshev: P(|p_hat - p| >= k*sqrt(Var(p_hat))) <= 1/k^2.
    // Use unbiased variance of the mean.
    long double var_mean = est.var_mean_unb();
    long double se = sqrt(max((long double)0.0, var_mean));
    long double k = 3.0L; // ~88.9% bound without normality assumptions

    cout.setf(std::ios::fixed); cout<<setprecision(6);
    cout << "pi_estimate = " << (double)pi_est << "\n";
    cout << "p_hat = " << (double)p << ", std_error ≈ " << (double)se << "\n";
    cout << "Chebyshev band (k=" << (double)k << "): p in ["
         << (double)(p - k*se) << ", " << (double)(p + k*se) << "] with prob ≥ "
         << (double)(1.0L - 1.0L/(k*k)) << "\n";

    return 0;
}
Estimates π using random sampling and quantifies uncertainty. Uses the variance of the sample mean of Bernoulli indicators and Chebyshev’s inequality to produce a conservative probability bound without assuming normality.