Expected Value
Key Points
- Expected value is the long-run average outcome of a random variable if you could repeat the experiment many times.
- For discrete variables, it is computed as E[X] = \sum_x x P(X = x); for continuous variables, use an integral.
- Linearity of expectation always holds: E[aX + bY] = aE[X] + bE[Y], even if X and Y are dependent.
- Indicator variables turn counting problems into probability sums: E[count] = \sum_i P(A_i).
- The geometric distribution models trials until first success and has E[X] = 1/p.
- Expected values guide algorithm analysis, including expected time and expected number of events like collisions.
- Monte Carlo simulation can estimate expected values when exact computation is hard.
- Do not confuse E[f(X)] with f(E[X]); linearity applies only to linear functions.
Prerequisites
- Basic probability (events, sample space, probability axioms) — Expected value is defined in terms of probabilities over outcomes.
- Random variables (discrete and continuous) — Expectation is a property of random variables and depends on PMFs/PDFs.
- Summations and integrals — E[X] uses sums for discrete variables and integrals for continuous ones.
- Bernoulli and geometric distributions — Common building blocks for many expectation problems and simulations.
- Linearity of expectation — Key tool to simplify complex expected value calculations.
- Variance and standard error (basic) — Needed to understand Monte Carlo convergence and error bars.
- C++ basics (loops, functions, std::vector) — Required to implement simulations and computations.
- Random number generation in C++ (std::mt19937, distributions) — Necessary for Monte Carlo estimation and randomized experiments.
- Asymptotic notation — Expectation frequently appears in time complexity analysis like O(n \log n).
Detailed Explanation
01 Overview
Expected value is a way to summarize the average outcome you would anticipate from a random process if it were repeated many times under identical conditions. Think of it as the "center of mass" of the distribution of a random variable. For a discrete random variable, you multiply each possible value by its probability and add them up. For a continuous one, you integrate the value times its probability density over all possibilities. Expected value is fundamental in probability, statistics, and computer science because it captures what happens on average, even if individual outcomes vary widely.

It powers reasoning in randomized algorithms, risk assessments, and performance guarantees. A superpower of expectation is linearity: the expectation of a sum equals the sum of expectations, regardless of dependencies. This makes complex counting and performance analyses surprisingly simple: you can break a problem into manageable pieces, compute each piece’s expected contribution, and add them.

Famous examples include the expected number of coin tosses to see heads (1/p), the expected number of collisions in a hash table (about n^2/(2m) for n keys, m buckets), and the coupon collector problem’s expected time (nH_n). Even when exact distributions are messy, expectation frequently remains tractable, either analytically or via simulation.
02 Intuition & Analogies
Imagine you play a game where you roll a fair die and get paid the number of dots shown. If you play once, your payout could be 1, 6, or anything in between. But if you play thousands of times and average your earnings, the number stabilizes around 3.5. That stable long-run average is the expected value: not necessarily an achievable single outcome (you can’t roll 3.5), but the equilibrium of outcomes over many repeats.

Linearity is like splitting chores among teammates. If Alex and Blair each have jobs that take variable time, the average total time is just Alex’s average plus Blair’s average, even if sometimes they help or hinder each other. The average contribution adds up, no matter the coordination (dependence). This lets us convert complicated, intertwined processes into sums of simple averages.

Indicator variables act like light switches that are 1 if an event happens and 0 otherwise. Counting how many events occur is the same as summing their indicators. Since E[Indicator] = probability the light is on, the expected count is just the sum of those probabilities. This trick is magical for problems like counting collisions, matches, or appearances where directly computing the distribution is hard.

The geometric distribution models “try until you succeed.” If each independent attempt has success chance p, on average you need 1/p tries. It’s like shooting free throws with 60% success: you’ll hit on average in about 1/0.6 ≈ 1.67 shots. Individually, you might hit on the first or fourth attempt, but over many games, the average settles at 1/p.
03 Formal Definition
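For a discrete random variable X with PMF p(x), and a continuous one with PDF f(x), the standard definitions (matching the Key Formulas section below) are:

```latex
E[X] = \sum_{x} x \, p(x) \quad \text{(discrete)}
\qquad\qquad
E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx \quad \text{(continuous)}
```

Both require absolute convergence, \sum_x |x| p(x) < \infty or \int |x| f(x) dx < \infty; otherwise the expectation does not exist (as for the Cauchy distribution).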
04 When to Use
Use expected value when you need the average performance, cost, or count over randomness. In algorithm analysis, it yields expected running time (like randomized quicksort’s O(n \log n)), expected memory usage (expected collisions in hashing), or expected number of steps in processes (coupon collector). In probabilistic counting, it helps answer questions like: How many matches will we see? How many bins are empty after random allocations? How many unique items will appear? Indicator variables make these cases especially straightforward.

When exact distributions are messy or intractable, use Monte Carlo simulation to estimate expectations with controllable error. Expected value also supports decision-making under uncertainty (e.g., choose the option with highest expected utility) and parameter estimation (e.g., method of moments sets sample averages equal to expected values).

In competitive programming and technical interviews, expected value often simplifies problems that seem to require enumerating exponentially many outcomes. If you can decompose a complex outcome into a sum of contributions whose expectations you can find, linearity lets you add them. Apply conditional expectation when stages or conditioning information are natural (e.g., expected steps given current state, then average over states).
⚠️ Common Mistakes
- Confusing E[f(X)] with f(E[X]). Linearity holds only for linear functions; in general E[X^2] \neq (E[X])^2 and E[1/X] \neq 1/E[X]. Jensen’s inequality describes how convexity skews this relationship.
- Assuming independence is needed for linearity. It is not. E[X + Y] = E[X] + E[Y] always, even when X and Y are dependent. Independence is needed for some multiplicative properties (like E[XY] = E[X]E[Y]) but not for sums.
- Ignoring existence conditions. Some distributions have infinite or undefined expectation (e.g., Cauchy). Always check integrability (\sum |x|p(x) < \infty or \int |x| f(x) dx < \infty).
- Mixing sample averages with true expectations. A Monte Carlo estimate \hat{\mu} approximates E[X], but it has sampling error. Report confidence or error bounds and ensure enough trials.
- Misusing indicator variables. Ensure the indicator exactly matches the event of interest, and remember E[Indicator] = probability of that event, not necessarily 1/n or similar shortcuts.
- Dropping conditions prematurely. When using the law of total expectation, keep track of conditioning variables; compute E[X \mid Y=y] correctly before averaging over Y.
- Overlooking units and conventions. For the geometric distribution, clarify whether X counts trials until success (mean 1/p) or failures before success (mean (1-p)/p).
Key Formulas
Discrete Expectation
E[X] = \sum_{x} x \, p(x)
Explanation: To find the expected value of a discrete random variable, multiply each possible value by its probability and sum. This yields the long-run average outcome.
Continuous Expectation
E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
Explanation: For continuous variables with density f(x), integrate x times the density over all real numbers. This generalizes the weighted average idea to continuous outcomes.
Linearity of Expectation
E[aX + bY] = aE[X] + bE[Y]
Explanation: Expectation distributes over sums and scales by constants, regardless of dependence. This allows complex averages to be computed term-wise.
Finite Sum Linearity
E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]
Explanation: The expectation of a finite sum equals the sum of expectations. This is the backbone of indicator-based counting arguments.
Indicator Expectation
E[\mathbf{1}_A] = P(A)
Explanation: The expected value of the indicator of event A equals the probability that A occurs. This converts counting problems into probability sums.
Expected Count via Indicators
E[\mathbf{1}_{A_1} + \cdots + \mathbf{1}_{A_n}] = P(A_1) + \cdots + P(A_n)
Explanation: The expected number of events among A_1, ..., A_n that occur equals the sum of their individual probabilities, even if the events are dependent.
Law of Total Expectation
E[X] = E[\, E[X \mid Y] \,]
Explanation: Compute the conditional expectation of X given Y, then average over Y. Useful when analyzing multi-stage processes or Markov chains.
Geometric Expectation (trials until success)
E[X] = 1/p
Explanation: For the geometric distribution counting trials until the first success with success probability p, the expected number of trials is 1/p.
Coupon Collector Expectation
E[T] = n H_n = n \sum_{k=1}^{n} \frac{1}{k}
Explanation: The expected number of draws to collect all n coupons equals n times the n-th harmonic number. This follows from linearity over stages.
Expected Hash Collisions
E[\text{collisions}] = \binom{n}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m}
Explanation: Under uniform hashing of n keys into m buckets, each pair collides with probability 1/m. Summing over all pairs yields the expected number of colliding pairs.
Variance via Moments
\mathrm{Var}(X) = E[X^2] - (E[X])^2
Explanation: Variance measures spread around the mean using expectations. Though not required for expectation itself, it quantifies uncertainty of Monte Carlo estimates.
Harmonic Number Asymptotics
H_n = \ln n + \gamma + o(1)
Explanation: The harmonic number grows like log n plus the Euler–Mascheroni constant. This helps approximate coupon collector expectations for large n.
Complexity Analysis
Code Examples
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Example: Expected value of a fair six-sided die
    // Analytical computation using the PMF: p(k) = 1/6 for k=1..6
    vector<int> values = {1, 2, 3, 4, 5, 6};
    vector<double> probs(6, 1.0 / 6.0);

    double expected = 0.0;
    for (int i = 0; i < 6; ++i) expected += values[i] * probs[i];

    // Monte Carlo simulation to estimate E[X]
    const int64_t TRIALS = 2'000'000; // 2 million rolls
    mt19937 rng(42);                  // fixed seed for reproducibility
    uniform_int_distribution<int> die(1, 6);

    long double sum = 0.0L;
    for (int64_t t = 0; t < TRIALS; ++t) sum += die(rng);
    long double estimate = sum / TRIALS;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[die] = " << expected << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << fabs((double)estimate - expected) << ")\n";

    // Time Complexity: O(TRIALS)
    // Space Complexity: O(1)
    return 0;
}
We compute E[X] for a fair die by summing x * P(X=x) and verify it with a large Monte Carlo simulation. The estimate converges to 3.5 as the number of trials increases, illustrating the long-run average.
#include <bits/stdc++.h>
using namespace std;

// Count collisions as number of colliding pairs across buckets.
// If a bucket has c keys, it contributes c*(c-1)/2 colliding pairs.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 200;       // number of keys
    int m = 1000;      // number of buckets
    int TRIALS = 3000; // Monte Carlo trials

    // Analytical expectation via indicators:
    // For each pair (i,j), P(collision) = 1/m under uniform hashing.
    // E[collisions] = C(n,2)/m
    long double analytical = ((long double)n * (n - 1) / 2.0L) / (long double)m;

    mt19937 rng(123);
    uniform_int_distribution<int> bucket(0, m - 1);

    long double sum = 0.0L;
    vector<int> cnt(m);
    for (int t = 0; t < TRIALS; ++t) {
        fill(cnt.begin(), cnt.end(), 0);
        for (int i = 0; i < n; ++i) {
            int b = bucket(rng); // random bucket for key i
            ++cnt[b];
        }
        long long collisions = 0;
        for (int b = 0; b < m; ++b) {
            long long c = cnt[b];
            collisions += c * (c - 1) / 2; // number of colliding pairs in this bucket
        }
        sum += collisions;
    }
    long double estimate = sum / TRIALS;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[collisions] = " << (double)analytical << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << fabsl(estimate - analytical) << ")\n";

    // Time Complexity: O(TRIALS * (n + m)) because we assign n keys and scan m buckets each trial
    // Space Complexity: O(m) for bucket counts
    return 0;
}
We use indicator variables for each key pair to derive E[collisions] = C(n,2)/m. The simulation assigns keys to buckets uniformly and counts colliding pairs, confirming linearity of expectation.
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double p = 0.3;         // success probability per trial
    int TRIALS = 1'000'000; // number of experiments

    mt19937 rng(2024);
    uniform_real_distribution<double> U(0.0, 1.0);

    long double total_trials = 0.0L;
    for (int t = 0; t < TRIALS; ++t) {
        int count = 0;
        // Keep trying until success (U < p)
        while (true) {
            ++count;
            if (U(rng) < p) break;
        }
        total_trials += count;
    }
    long double estimate = total_trials / TRIALS;
    long double analytical = 1.0L / p;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[trials] = " << (double)analytical << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << fabsl(estimate - analytical) << ")\n";

    // Time Complexity: O(TRIALS * E[X]) = O(TRIALS / p) expected, since each loop averages 1/p iterations
    // Space Complexity: O(1)
    return 0;
}
This simulates the number of Bernoulli trials needed to see the first success. The sample mean approaches 1/p, matching the geometric distribution’s expectation under the trials-until-success convention.
#include <bits/stdc++.h>
using namespace std;

// Compute harmonic number H_n
long double harmonic(int n) {
    long double H = 0.0L;
    for (int k = 1; k <= n; ++k) H += 1.0L / k;
    return H;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 50; // number of coupon types

    // Analytical formula: E[T] = n * H_n
    long double analytical = (long double)n * harmonic(n);

    // DP-style stage expectation: from k collected to k+1 collected takes geometric steps
    // with success prob p_k = (n - k) / n, so expected steps = 1 / p_k = n / (n - k)
    long double staged = 0.0L;
    for (int k = 0; k < n; ++k) staged += (long double)n / (n - k);

    // Monte Carlo simulation for a smaller n to keep runtime reasonable
    int n_small = 20;
    int TRIALS = 2000;
    mt19937 rng(7);
    uniform_int_distribution<int> pick(0, n_small - 1);

    long double sum_steps = 0.0L;
    vector<char> seen(n_small);
    for (int t = 0; t < TRIALS; ++t) {
        fill(seen.begin(), seen.end(), 0);
        int distinct = 0;
        int steps = 0;
        while (distinct < n_small) {
            int c = pick(rng);
            if (!seen[c]) { seen[c] = 1; ++distinct; }
            ++steps;
        }
        sum_steps += steps;
    }
    long double sim_estimate = sum_steps / TRIALS;
    long double sim_analytical = (long double)n_small * harmonic(n_small);

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Coupon collector (n=" << n << ")\n";
    cout << "Formula: n*H_n = " << (double)analytical << "\n";
    cout << "Stage sum (DP-style) = " << (double)staged << "\n";
    cout << "\nSimulation (n=" << n_small << ", trials=" << TRIALS << ")\n";
    cout << "Monte Carlo estimate = " << (double)sim_estimate << "\n";
    cout << "Analytical for n_small = " << (double)sim_analytical
         << " (|error| = " << fabsl(sim_estimate - sim_analytical) << ")\n";

    // Time Complexity:
    // - Harmonic/stage sums: O(n)
    // - Simulation: O(TRIALS * expected steps) ≈ O(TRIALS * n_small log n_small)
    // Space Complexity: O(1) for formulas, O(n_small) for simulation state
    return 0;
}
We compute the coupon collector expectation via the closed form nH_n and via a DP-style sum over stages using geometric expectations. A small-n simulation confirms the formula. This demonstrates conditional expectation and linearity across stages.