
Expected Value

Key Points

  • Expected value is the long-run average outcome of a random variable if you could repeat the experiment many times.
  • For discrete variables, it is computed as E[X] = Σ x · P(X = x); for continuous variables, E[X] = ∫ x f(x) dx.
  • Linearity of expectation always holds: E[aX + bY] = aE[X] + bE[Y], even if X and Y are dependent.
  • Indicator variables turn counting problems into probability sums: E[count] = Σ P(A_i).
  • The geometric distribution models trials until first success and has E[X] = 1/p.
  • Expected values guide algorithm analysis, including expected time and expected number of events like collisions.
  • Monte Carlo simulation can estimate expected values when exact computation is hard.
  • Do not confuse E[f(X)] with f(E[X]); linearity applies only to linear functions.

Prerequisites

  • Basic probability (events, sample space, probability axioms): Expected value is defined in terms of probabilities over outcomes.
  • Random variables (discrete and continuous): Expectation is a property of random variables and depends on PMFs/PDFs.
  • Summations and integrals: E[X] uses sums for discrete variables and integrals for continuous ones.
  • Bernoulli and geometric distributions: Common building blocks for many expectation problems and simulations.
  • Linearity of expectation: Key tool to simplify complex expected value calculations.
  • Variance and standard error (basic): Needed to understand Monte Carlo convergence and error bars.
  • C++ basics (loops, functions, std::vector): Required to implement simulations and computations.
  • Random number generation in C++ (std::mt19937, distributions): Necessary for Monte Carlo estimation and randomized experiments.
  • Asymptotic notation: Expectation frequently appears in time complexity analysis like O(n log n).

Detailed Explanation


01 Overview

Expected value is a way to summarize the average outcome you would anticipate from a random process if it were repeated many times under identical conditions. Think of it as the "center of mass" of the distribution of a random variable. For a discrete random variable, you multiply each possible value by its probability and add them up. For a continuous one, you integrate the value times its probability density over all possibilities. Expected value is fundamental in probability, statistics, and computer science because it captures what happens on average, even if individual outcomes vary widely. It powers reasoning in randomized algorithms, risk assessments, and performance guarantees.

A superpower of expectation is linearity: the expectation of a sum equals the sum of expectations, regardless of dependencies. This makes complex counting and performance analyses surprisingly simple: you can break a problem into manageable pieces, compute each piece’s expected contribution, and add them.

Famous examples include the expected number of coin tosses to see heads (1/p), the expected number of collisions in a hash table (about n^2/(2m) for n keys, m buckets), and the coupon collector problem’s expected time (nH_n). Even when exact distributions are messy, expectation frequently remains tractable, either analytically or via simulation.

02 Intuition & Analogies

Imagine you play a game where you roll a fair die and get paid the number of dots shown. If you play once, your payout could be 1, 6, or anything in between. But if you play thousands of times and average your earnings, the number stabilizes around 3.5. That stable long-run average is the expected value: not necessarily an achievable single outcome (you can’t roll 3.5), but the equilibrium of outcomes over many repeats.

Linearity is like splitting chores among teammates. If Alex and Blair each have jobs that take variable time, the average total time is just Alex’s average plus Blair’s average, even if sometimes they help or hinder each other. The average contribution adds up, no matter the coordination (dependence). This lets us convert complicated, intertwined processes into sums of simple averages.

Indicator variables act like light switches that are 1 if an event happens and 0 otherwise. Counting how many events occur is the same as summing their indicators. Since E[Indicator] = probability the light is on, the expected count is just the sum of those probabilities. This trick is magical for problems like counting collisions, matches, or appearances where directly computing the distribution is hard.

The geometric distribution models “try until you succeed.” If each independent attempt has success chance p, on average you need 1/p tries. It’s like shooting free throws with 60% success: you’ll hit on average in about 1/0.6 ≈ 1.67 shots. Individually, you might hit on the first or fourth attempt, but over many games, the average settles at 1/p.

03 Formal Definition

Let X be a random variable on a probability space (Ω, F, P). For a discrete X with support {x_1, x_2, ...} and mass function p(x_i) = P(X = x_i), the expected value is E[X] = Σ_i x_i p(x_i), provided Σ_i |x_i| p(x_i) < ∞. For a continuous X with density f(x), the expected value is E[X] = ∫ x f(x) dx, provided ∫ |x| f(x) dx < ∞. Expectation is linear: for random variables X, Y and constants a, b, E[aX + bY] = aE[X] + bE[Y], which extends to finite or countable sums under absolute convergence. For an indicator random variable 1_A (equal to 1 if event A occurs, 0 otherwise), E[1_A] = P(A). If 1_{A_1}, ..., 1_{A_n} are indicators of events A_1, ..., A_n, then the expected number of events occurring is E[Σ_i 1_{A_i}] = Σ_i P(A_i), irrespective of dependencies among the A_i. The geometric distribution with parameter p in the “number of trials until first success” convention has support {1, 2, ...} with E[X] = 1/p. Expectation can also be computed via conditioning: the law of total expectation states E[X] = E[ E[X | Y] ], enabling stepwise analysis by first averaging within each condition on Y and then averaging across Y.

04 When to Use

Use expected value when you need the average performance, cost, or count over randomness. In algorithm analysis, it yields expected running time (like randomized quicksort’s O(n \log n)), expected memory usage (expected collisions in hashing), or expected number of steps in processes (coupon collector). In probabilistic counting, it helps answer questions like: How many matches will we see? How many bins are empty after random allocations? How many unique items will appear? Indicator variables make these cases especially straightforward. When exact distributions are messy or intractable, use Monte Carlo simulation to estimate expectations with controllable error. Expected value also supports decision-making under uncertainty (e.g., choose the option with highest expected utility) and parameter estimation (e.g., method of moments sets sample averages equal to expected values). In competitive programming and technical interviews, expected value often simplifies problems that seem to require enumerating exponentially many outcomes. If you can decompose a complex outcome into a sum of contributions whose expectations you can find, linearity lets you add them. Apply conditional expectation when stages or conditioning information are natural (e.g., expected steps given current state, then average over states).

⚠️ Common Mistakes

  1. Confusing E[f(X)] with f(E[X]). Linearity holds only for linear functions; in general E[X^2] ≠ (E[X])^2 and E[1/X] ≠ 1/E[X]. Jensen’s inequality describes how convexity skews this relationship.
  2. Assuming independence is needed for linearity. It is not. E[X + Y] = E[X] + E[Y] always, even when X and Y are dependent. Independence is needed for some multiplicative properties (like E[XY] = E[X]E[Y]) but not for sums.
  3. Ignoring existence conditions. Some distributions have infinite or undefined expectation (e.g., Cauchy). Always check integrability (Σ |x| p(x) < ∞ or ∫ |x| f(x) dx < ∞).
  4. Mixing sample averages with true expectations. A Monte Carlo estimate μ̂ approximates E[X], but it has sampling error. Report confidence or error bounds and ensure enough trials.
  5. Misusing indicator variables. Ensure the indicator exactly matches the event of interest, and remember E[Indicator] = probability of that event, not necessarily 1/n or similar shortcuts.
  6. Dropping conditions prematurely. When using the law of total expectation, keep track of conditioning variables; compute E[X | Y=y] correctly before averaging over Y.
  7. Overlooking units and conventions. For the geometric distribution, clarify whether X counts trials until success (mean 1/p) or failures before success (mean (1-p)/p).

Key Formulas

Discrete Expectation

E[X] = Σ_x x · P(X = x)

Explanation: To find the expected value of a discrete random variable, multiply each possible value by its probability and sum. This yields the long-run average outcome.

Continuous Expectation

E[X] = ∫ x f(x) dx (integrated over all real x)

Explanation: For continuous variables with density f(x), integrate x times the density over all real numbers. This generalizes the weighted average idea to continuous outcomes.

Linearity of Expectation

E[aX + bY] = aE[X] + bE[Y]

Explanation: Expectation distributes over sums and scales by constants, regardless of dependence. This allows complex averages to be computed term-wise.

Finite Sum Linearity

E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n]

Explanation: The expectation of a finite sum equals the sum of expectations. This is the backbone of indicator-based counting arguments.

Indicator Expectation

E[1_A] = P(A)

Explanation: The expected value of the indicator of event A equals the probability that A occurs. This converts counting problems into probability sums.

Expected Count via Indicators

E[1_{A_1} + ... + 1_{A_n}] = P(A_1) + ... + P(A_n)

Explanation: The expected number of events among A_1, ..., A_n that occur equals the sum of their individual probabilities, even if the events are dependent.

Law of Total Expectation

E[X] = E[ E[X | Y] ]

Explanation: Compute the conditional expectation of X given Y, then average over Y. Useful when analyzing multi-stage processes or Markov chains.

Geometric Expectation (trials until success)

E[X] = 1/p

Explanation: For the geometric distribution counting trials until the first success with success probability p, the expected number of trials is 1/p.

Coupon Collector Expectation

E[T] = n · H_n, where H_n = 1 + 1/2 + ... + 1/n

Explanation: The expected number of draws to collect all n coupons equals n times the n-th harmonic number. This follows from linearity over stages.

Expected Hash Collisions

E[collisions] = C(n, 2) / m = n(n - 1) / (2m)

Explanation: Under uniform hashing of n keys into m buckets, each pair collides with probability 1/m. Summing over all pairs yields the expected number of colliding pairs.

Variance via Moments

Var(X) = E[X^2] - (E[X])^2

Explanation: Variance measures spread around the mean using expectations. Though not required for expectation itself, it quantifies uncertainty of Monte Carlo estimates.

Harmonic Number Asymptotics

H_n = ln n + γ + O(1/n), with γ ≈ 0.5772

Explanation: The harmonic number grows like ln n plus the Euler–Mascheroni constant γ. This helps approximate coupon collector expectations for large n.

Complexity Analysis

Expected value appears in complexity analysis in two main ways: computing expectations analytically and estimating them via simulation.

Analytical computation often reduces to summations over n terms or pairs, leading to O(n) or O(n^2) time depending on structure. For example, using indicators to count expected collisions sums over all pairs, which is O(n^2), but sometimes symmetry or closed forms collapse this to O(1) computation after deriving the formula. Space is typically O(1) for closed-form evaluations.

Monte Carlo estimation computes the sample mean over T trials. Each trial’s cost depends on the process: rolling a die is O(1) per trial (total O(T)), simulating n balls into m bins is O(n) per trial (total O(Tn)), and geometric waiting time takes expected O(1/p) steps per trial (total O(T/p)). Space is usually O(1), or O(m) if a histogram is needed. The standard error of the sample mean decays as O(σ/√T), where σ^2 is the variance of the random variable, so to halve the error you need roughly 4× more trials. This variance-driven convergence is a key trade-off of simulation.

In algorithmic expected-time analyses, such as randomized quicksort (expected O(n log n)) or coupon collector-like processes (expected O(n log n)), expectation allows bounding average behavior even if worst-case time is worse. When combining multiple random components, linearity allows summing expected costs of independent or dependent subroutines without tracking their correlation, simplifying the analysis. Always confirm that the expectation is finite; some heavy-tailed processes can have infinite or undefined expectations, making expected-time claims invalid.

Code Examples

Computing and Simulating E[X] for a Fair Die
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Example: Expected value of a fair six-sided die
    // Analytical computation using the PMF: p(k) = 1/6 for k=1..6
    vector<int> values = {1,2,3,4,5,6};
    vector<double> probs(6, 1.0/6.0);

    double expected = 0.0;
    for (int i = 0; i < 6; ++i) expected += values[i] * probs[i];

    // Monte Carlo simulation to estimate E[X]
    const int64_t TRIALS = 2'000'000; // 2 million rolls
    mt19937 rng(42); // fixed seed for reproducibility
    uniform_int_distribution<int> die(1, 6);

    long double sum = 0.0L;
    for (int64_t t = 0; t < TRIALS; ++t) sum += die(rng);
    long double estimate = sum / TRIALS;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[die] = " << expected << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << fabs((double)estimate - expected) << ")\n";

    // Time Complexity: O(TRIALS)
    // Space Complexity: O(1)
    return 0;
}

We compute E[X] for a fair die by summing x * P(X=x) and verify it with a large Monte Carlo simulation. The estimate converges to 3.5 as the number of trials increases, illustrating the long-run average.

Time: O(TRIALS); Space: O(1)
Linearity with Indicators: Expected Hash Collisions (Theory + Simulation)
#include <bits/stdc++.h>
using namespace std;

// Count collisions as number of colliding pairs across buckets.
// If a bucket has c keys, it contributes c*(c-1)/2 colliding pairs.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 200;       // number of keys
    int m = 1000;      // number of buckets
    int TRIALS = 3000; // Monte Carlo trials

    // Analytical expectation via indicators:
    // For each pair (i,j), P(collision) = 1/m under uniform hashing.
    // E[collisions] = C(n,2)/m
    long double analytical = ( (long double)n * (n - 1) / 2.0L ) / (long double)m;

    mt19937 rng(123);
    uniform_int_distribution<int> bucket(0, m - 1);

    long double sum = 0.0L;
    vector<int> cnt(m);
    for (int t = 0; t < TRIALS; ++t) {
        fill(cnt.begin(), cnt.end(), 0);
        for (int i = 0; i < n; ++i) {
            int b = bucket(rng); // random bucket for key i
            ++cnt[b];
        }
        long long collisions = 0;
        for (int b = 0; b < m; ++b) {
            long long c = cnt[b];
            collisions += c * (c - 1) / 2; // number of colliding pairs in this bucket
        }
        sum += collisions;
    }
    long double estimate = sum / TRIALS;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[collisions] = " << (double)analytical << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << (double)fabsl(estimate - analytical) << ")\n";

    // Time Complexity: O(TRIALS * (n + m)) because we assign n keys and scan m buckets each trial
    // Space Complexity: O(m) for bucket counts
    return 0;
}

We use indicator variables for each key pair to derive E[collisions] = C(n,2)/m. The simulation assigns keys to buckets uniformly and counts colliding pairs, confirming linearity of expectation.

Time: O(TRIALS * (n + m)); Space: O(m)
Geometric Distribution: Expected Trials Until First Success (E[X] = 1/p)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double p = 0.3;         // success probability per trial
    int TRIALS = 1'000'000; // number of experiments

    mt19937 rng(2024);
    uniform_real_distribution<double> U(0.0, 1.0);

    long double total_trials = 0.0L;
    for (int t = 0; t < TRIALS; ++t) {
        int count = 0;
        // Keep trying until success (U < p)
        while (true) {
            ++count;
            if (U(rng) < p) break;
        }
        total_trials += count;
    }
    long double estimate = total_trials / TRIALS;
    long double analytical = 1.0L / p;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Analytical E[trials] = " << (double)analytical << "\n";
    cout << "Monte Carlo estimate = " << (double)estimate
         << " (|error| = " << (double)fabsl(estimate - analytical) << ")\n";

    // Time Complexity: O(TRIALS * E[X]) = O(TRIALS / p) expected, since each loop averages 1/p iterations
    // Space Complexity: O(1)
    return 0;
}

This simulates the number of Bernoulli trials needed to see the first success. The sample mean approaches 1/p, matching the geometric distribution’s expectation under the trials-until-success convention.

Time: O(TRIALS / p) expected; Space: O(1)
Coupon Collector: E[T] = n * H_n (Formula, DP-style sum, and Small Simulation)
#include <bits/stdc++.h>
using namespace std;

// Compute harmonic number H_n
long double harmonic(int n) {
    long double H = 0.0L;
    for (int k = 1; k <= n; ++k) H += 1.0L / k;
    return H;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 50; // number of coupon types

    // Analytical formula: E[T] = n * H_n
    long double analytical = (long double)n * harmonic(n);

    // DP-style stage expectation: from k collected to k+1 collected takes geometric steps
    // with success prob p_k = (n - k) / n, so expected steps = 1 / p_k = n / (n - k)
    long double staged = 0.0L;
    for (int k = 0; k < n; ++k) staged += (long double)n / (n - k);

    // Monte Carlo simulation for a smaller n to keep runtime reasonable
    int n_small = 20;
    int TRIALS = 2000;
    mt19937 rng(7);
    uniform_int_distribution<int> pick(0, n_small - 1);

    long double sum_steps = 0.0L;
    vector<char> seen(n_small);
    for (int t = 0; t < TRIALS; ++t) {
        fill(seen.begin(), seen.end(), 0);
        int distinct = 0;
        int steps = 0;
        while (distinct < n_small) {
            int c = pick(rng);
            if (!seen[c]) { seen[c] = 1; ++distinct; }
            ++steps;
        }
        sum_steps += steps;
    }
    long double sim_estimate = sum_steps / TRIALS;
    long double sim_analytical = (long double)n_small * harmonic(n_small);

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Coupon collector (n=" << n << ")\n";
    cout << "Formula: n*H_n = " << (double)analytical << "\n";
    cout << "Stage sum (DP-style) = " << (double)staged << "\n";
    cout << "\nSimulation (n=" << n_small << ", trials=" << TRIALS << ")\n";
    cout << "Monte Carlo estimate = " << (double)sim_estimate << "\n";
    cout << "Analytical for n_small = " << (double)sim_analytical
         << " (|error| = " << (double)fabsl(sim_estimate - sim_analytical) << ")\n";

    // Time Complexity:
    // - Harmonic/Stage sums: O(n)
    // - Simulation: O(TRIALS * expected steps) ≈ O(TRIALS * n_small log n_small)
    // Space Complexity: O(1) for formulas, O(n_small) for simulation state

    return 0;
}

We compute the coupon collector expectation via the closed form nH_n and via a DP-style sum over stages using geometric expectations. A small-n simulation confirms the formula. This demonstrates conditional expectation and linearity across stages.

Time: O(n) for formulas, O(TRIALS * n_small log n_small) expected for simulation; Space: O(1) for formulas, O(n_small) for simulation