Math Ā· Intermediate

Random Variables & Distributions

Key Points

  • A random variable maps uncertain outcomes to numbers and is described by a distribution that assigns likelihoods to values or ranges.
  • Discrete random variables use a probability mass function (PMF), while continuous random variables use a probability density function (PDF).
  • The cumulative distribution function (CDF) works for both types and gives P(X ≤ x): a running sum of the PMF, or the area under the PDF, up to x.
  • Expectation is the long-run average value (a weighted sum or integral), and variance measures spread around the mean.
  • You can sample discrete distributions with prefix sums and binary search, and continuous ones via inverse transform or Box–Muller for normals.
  • Monte Carlo simulation estimates expectations and probabilities by averaging many random samples.
  • Always ensure probabilities sum (or integrate) to 1, and remember that a PDF's value is not a probability by itself.
  • Careful seeding and numeric stability (floating-point issues) are essential for correct simulation results in C++.

Prerequisites

  • Basic Probability — Understanding events, sample spaces, and conditional probability is essential to define random variables and distributions.
  • Single-Variable Calculus — Integrals are needed to compute probabilities and expectations for continuous distributions.
  • Sequences and Limits — Convergence concepts underpin the Law of Large Numbers and Central Limit Theorem.
  • C++ Programming Fundamentals — You must know how to compile, use standard libraries, and manage types and I/O to implement samplers.
  • Random Number Generation — Using PRNGs (mt19937), seeding, and uniform distributions is the basis for sampling more complex distributions.
  • Numerical Stability and Floating-Point — Simulations rely on floating-point arithmetic; avoiding underflow/overflow and precision loss is important.
  • Data Structures (Arrays, Binary Search) — Efficient discrete sampling uses prefix arrays and binary search.

Detailed Explanation


01 Overview

A random variable is a numerical view of uncertainty. Instead of saying ā€œthe die shows a face,ā€ we say ā€œX = 1, 2, ..., 6ā€ with certain probabilities. The distribution of a random variable specifies how likely each outcome is. For discrete random variables, the distribution is given by a probability mass function (PMF) that assigns a probability to each possible value. For continuous random variables, we use a probability density function (PDF), which describes how probability is distributed over the real line; probabilities are obtained by integrating the PDF over intervals. A unifying tool is the cumulative distribution function (CDF), which gives the probability that the variable is less than or equal to a given value.

Random variables are the building blocks for statistical modeling, machine learning, and randomized algorithms. They let us quantify expectations, risks, and variability. For example, the number of emails you receive in an hour could be modeled by a Poisson random variable (discrete), while the time you wait for the next bus might be modeled by an exponential random variable (continuous). In practice, we simulate random variables to test algorithms, approximate integrals (Monte Carlo), and fit models to data. C++ provides powerful pseudo-random number generators and distributions, and understanding the math behind random variables helps you use these tools correctly and efficiently.

02 Intuition & Analogies

Think of a random variable as a measuring tape you apply to uncertainty. When you roll a die, you could measure ā€œthe face value,ā€ giving numbers 1 through 6—this is a discrete random variable. When you measure the waiting time until the next bus, you get any real number, like 3.7 minutes—this is continuous.

The distribution is a recipe telling you how likely each measurement is. For a discrete case, the recipe lists ingredients with exact amounts (probabilities for 1, 2, ..., 6). For a continuous case, it’s more like a density of sand spread along a line; any single point has no sand by itself, but intervals collect sand (probability) proportional to the area under the curve. The CDF is like a running total: slide from left to right and watch how much probability has accumulated so far.

The expectation is the balance point (center of mass) of the distribution—if you placed weights at each possible value proportional to their probabilities, the expectation is where the seesaw balances. Variance captures how far values typically sit from the center: tightly clustered distributions have small variance; widely spread ones have large variance.

Sampling is like drawing marbles from a jar without seeing inside: each draw follows the distribution’s recipe. For discrete distributions, you can label segments on a 0–1 ruler with lengths equal to probabilities; picking a uniform point then lands in one segment. For continuous distributions, the inverse transform method bends the 0–1 ruler using the CDF so that uniform draws map to the correct values. For normals, Box–Muller cleverly transforms two uniforms into bell-shaped numbers. These intuitive pictures guide both theory and practical algorithms.

03 Formal Definition

A random variable X is a measurable function from a probability space (Ī©, F, P) to the real numbers R. Its distribution is the pushforward measure P_X(A) = P(X ∈ A) for measurable sets A. The cumulative distribution function (CDF) of X is F_X(x) = P(X ≤ x), which is nondecreasing, right-continuous, and satisfies lim_{x → āˆ’āˆž} F_X(x) = 0 and lim_{x → ∞} F_X(x) = 1. X is discrete if it has countable support S āŠ‚ R with probability mass function (PMF) p_X(x) = P(X = x) for x ∈ S, satisfying āˆ‘_{x ∈ S} p_X(x) = 1. X is continuous if there exists a nonnegative integrable function f_X (the probability density function, PDF) such that for all a < b, P(a < X ≤ b) = āˆ«_a^b f_X(x) dx and āˆ«_{āˆ’āˆž}^{∞} f_X(x) dx = 1; then F_X(x) = āˆ«_{āˆ’āˆž}^x f_X(t) dt. For measurable g, the expectation is E[g(X)] = āˆ‘_{x ∈ S} g(x) p_X(x) for discrete X and E[g(X)] = āˆ«_{āˆ’āˆž}^{∞} g(x) f_X(x) dx for continuous X, provided the sums or integrals converge. The variance is Var(X) = E[(X āˆ’ E[X])²] = E[X²] āˆ’ (E[X])². Independence of X and Y is characterized by F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y, or equivalently by factorization of the joint PMF/PDF when it exists.

04 When to Use

Use discrete random variables when outcomes are countable: dice, number of website clicks, defect counts, or successes in n trials (binomial). They’re ideal for modeling counts, categories, and yes/no processes. Use continuous random variables when measurements can vary smoothly: heights, times, temperatures, or distances. Exponential models are natural for waiting times; normal distributions model aggregated noise and measurement errors.

In algorithm design, random variables and their distributions arise in randomized algorithms (e.g., quicksort pivoting), hashing analysis, and probabilistic data structures (Bloom filters). In simulation, you’ll sample from distributions to estimate expectations (Monte Carlo integration), evaluate risk (VaR in finance), or test system performance (queues, A/B testing). The CDF and quantile (inverse CDF) are used for computing probabilities, thresholds (percentiles), and generating samples via inverse transform sampling.

Choose hand-rolled samplers when you need control, portability, or custom distributions; use library distributions (like std::normal_distribution) for speed and reliability. For large discrete supports with many repeated samples, consider O(1) alias sampling after O(n) preprocessing. For heavy-tailed or complex continuous distributions, use transformations, rejection sampling, or Markov chain Monte Carlo (MCMC).

āš ļøCommon Mistakes

• Confusing PMF and PDF: a PDF value f(x) is not a probability. Only integrals over intervals yield probabilities. For discrete variables, p(x) itself is a probability. • Forgetting normalization: discrete probabilities must sum to 1 and continuous densities must integrate to 1. In code, floating-point rounding can cause slight drift; renormalize and clamp the final prefix to 1. • Bad RNG usage: seeding mt19937 with time repeatedly inside loops or using rand() % n creates bias. Seed once with std::random_device and use uniform_real_distribution for [0,1). • Off-by-one in CDF prefix arrays: upper_bound vs lower_bound choices matter. With a uniform u in [0,1), using upper_bound on prefix CDF selects the correct index; ensure the last prefix entry is exactly 1 within epsilon. • Misinterpreting variance: standard deviation is the square root of variance; comparing spreads should use the same unit as the data (std. dev.), not squared units. • Ignoring dependence: applying formulas (e.g., Var(X+Y) = Var(X)+Var(Y)) without independence assumptions leads to wrong results. • Overfitting histograms: too many bins create noise; too few hide structure. Choose bin counts proportional to N^{1/3} or use rules (Freedman–Diaconis). • Numerical instability in tails: computing 1 - F(x) for large x can lose precision. Prefer survival functions or log-space computations when available.

Key Formulas

CDF

F_X(x) = P(X ≤ x)

Explanation: The CDF gives the probability that X is at most x. It fully characterizes the distribution for both discrete and continuous cases.

PMF Normalization

āˆ‘_{x ∈ S} p_X(x) = 1,   p_X(x) = P(X = x)

Explanation: For a discrete random variable with support S, probabilities across all possible values must sum to 1. This ensures a valid distribution.

PDF Normalization

āˆ«_{āˆ’āˆž}^{∞} f_X(x) dx = 1,   f_X(x) ≄ 0

Explanation: A continuous random variable’s density must integrate to 1 over the real line. Nonnegativity ensures probabilities are not negative.

CDF from PMF/PDF

F_X(x) = āˆ‘_{t ≤ x} p_X(t)  (discrete),   F_X(x) = āˆ«_{āˆ’āˆž}^x f_X(t) dt  (continuous)

Explanation: The CDF accumulates probability up to x. For discrete variables it is a sum; for continuous variables it is an integral.

Expectation

E[X] = āˆ‘_x x p_X(x)   or   E[X] = āˆ«_{āˆ’āˆž}^{∞} x f_X(x) dx

Explanation: The expectation is the long-run average value of X. It is a weighted sum or integral of values using their probabilities or densities.

Variance

Var(X) = E[X²] āˆ’ (E[X])²

Explanation: Variance measures spread around the mean. Computing it via E[X²] avoids directly handling squared deviations in data streams.

Linearity of Expectation

E[aX + bY] = aE[X] + bE[Y]

Explanation: Expectation distributes over linear combinations regardless of independence. This is a key tool in algorithm analysis.

Independence (Continuous)

f_{X,Y}(x, y) = f_X(x) f_Y(y)   (if independent)

Explanation: For independent continuous random variables, their joint density factorizes. The same holds for PMFs in the discrete case.

Marginalization

p_X(x) = āˆ‘_y p_{X,Y}(x, y),   f_X(x) = āˆ«_{āˆ’āˆž}^{∞} f_{X,Y}(x, y) dy

Explanation: You obtain the distribution of a subset of variables by summing or integrating out the others. This is used in Bayes nets and probabilistic models.

Quantile Function

Q(u) = Fā»Ā¹(u) = inf{ x : F_X(x) ≄ u }

Explanation: The quantile maps a probability u in [0,1] to the corresponding threshold x. Inverse transform sampling draws X by taking x=Q(U) for U ∼ Uniform(0,1).

Exponential Inverse CDF

Fā»Ā¹(u) = āˆ’(1/Ī») ln(1 āˆ’ u)   (Exponential)

Explanation: Sampling from an exponential distribution with rate Ī» can be done by transforming a uniform random number using this closed form.

Normal PDF

f_N(x; μ, σ) = (1 / (σ√(2Ļ€))) exp(āˆ’(x āˆ’ μ)² / (2σ²))

Explanation: This bell-shaped density is ubiquitous due to the Central Limit Theorem. It is parameterized by mean μ and standard deviation σ.

Binomial PMF

P(X = k) = (n choose k) p^k (1 āˆ’ p)^(n āˆ’ k)

Explanation: Models the number of successes in n independent Bernoulli trials with success probability p. Mean is np and variance is np(1-p).

Poisson PMF

P(X = k) = e^(āˆ’Ī») Ī»^k / k!

Explanation: Models counts of rare events over fixed intervals with rate Ī». Mean and variance are both Ī».

Chebyshev's Inequality (Sample Mean)

P(|XĢ„_n āˆ’ Ī¼| ≄ t) ≤ Var(X) / (n t²)

Explanation: Provides a distribution-free bound on deviations of the sample mean from the true mean. Useful for conservative guarantees.

Central Limit Theorem

√n (XĢ„_n āˆ’ Ī¼) / σ → N(0, 1)   (convergence in distribution)

Explanation: As n grows, the standardized sample mean approaches a standard normal distribution. This underlies many statistical approximations.

Complexity Analysis

Core computations for random variables center on evaluating PMF/PDF/CDF, sampling, and empirical estimation. For a discrete distribution over n outcomes, building a prefix CDF is O(n) time and O(n) space; sampling via binary search on the prefix is O(log n) per sample. If you will draw many samples, alias sampling can reduce per-sample time to O(1) after O(n) preprocessing at the cost of O(n) space. Computing the discrete CDF at a given index is O(1) with the prefix array. For continuous distributions with closed-form inverse CDFs (e.g., exponential), inverse transform sampling is O(1) per sample and O(1) space. For normals via Box–Muller, each pair of uniform variates yields two normal variates with O(1) time per sample and O(1) space; using a cached second variate can halve the number of logarithm, square root, and trigonometric calls. Evaluating PDFs or CDFs is O(1) if they have closed forms; numerical integration or special functions (e.g., the normal CDF) may require iterative methods with higher costs.

Monte Carlo estimation of expectations, probabilities, and quantiles is O(N) time for N samples and typically O(1) auxiliary space (or O(B) when building histograms with B bins). The estimation error decreases like O(1/√N) regardless of dimension, but variance reduction techniques (antithetic variates, control variates) can improve constants. Numerically, floating-point rounding can accumulate in prefix sums and histograms; using double precision and occasional renormalization mitigates drift. Random number generation with mt19937 is fast (period 2^19937 āˆ’ 1) and has effectively O(1) amortized cost per variate; avoid reseeding inside loops to maintain performance and statistical quality.

Code Examples

Sampling a Discrete Random Variable (Loaded Die) via Prefix CDF + Binary Search
#include <bits/stdc++.h>
using namespace std;

struct DiscreteRV {
    vector<double> prefix; // prefix[i] = P(X <= i)
    explicit DiscreteRV(vector<double> probs) {
        if (probs.empty()) throw runtime_error("Empty probability vector");
        // Ensure non-negative and finite
        for (double p : probs) {
            if (!(p >= 0.0) || !isfinite(p)) throw runtime_error("Invalid probability");
        }
        // Normalize to sum to 1 (robust against rounding)
        long double sum = 0.0L;
        for (double p : probs) sum += p;
        if (sum <= 0.0L) throw runtime_error("Sum of probabilities must be positive");
        vector<double> pn;
        pn.reserve(probs.size());
        for (double p : probs) pn.push_back(static_cast<double>(p / sum));
        // Build prefix CDF; clamp the last entry to 1 exactly
        prefix.resize(pn.size());
        partial_sum(pn.begin(), pn.end(), prefix.begin());
        prefix.back() = 1.0; // guard against tiny drift like 0.9999999997
    }
    // Return P(X = k)
    double pmf(size_t k) const {
        if (k >= prefix.size()) return 0.0;
        double prev = (k == 0 ? 0.0 : prefix[k-1]);
        return max(0.0, prefix[k] - prev);
    }
    // Return P(X <= k)
    double cdf(size_t k) const {
        if (k >= prefix.size()) return 1.0;
        return prefix[k];
    }
    // Sample using u ~ Uniform[0,1)
    size_t sample(mt19937 &gen) const {
        uniform_real_distribution<double> U(0.0, 1.0);
        double u = U(gen);
        // upper_bound finds the first prefix[i] > u, which is the correct index
        auto it = upper_bound(prefix.begin(), prefix.end(), u);
        size_t idx = static_cast<size_t>(it - prefix.begin());
        if (idx >= prefix.size()) idx = prefix.size() - 1; // safety
        return idx;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Loaded die probabilities for faces 1..6
    vector<double> probs = {0.05, 0.10, 0.20, 0.25, 0.25, 0.15};
    DiscreteRV die(probs);

    // RNG: seed once
    random_device rd;
    mt19937 gen(rd());

    // Empirical test
    const int N = 200000; // number of samples
    vector<int> counts(6, 0);
    for (int i = 0; i < N; ++i) {
        size_t face_idx = die.sample(gen); // 0..5
        ++counts[face_idx];
    }

    cout << fixed << setprecision(4);
    cout << "Face\tPMF(th)\tPMF(emp)\tCDF(th)\n";
    for (size_t k = 0; k < 6; ++k) {
        double pmf_th = die.pmf(k);
        double pmf_emp = static_cast<double>(counts[k]) / N;
        double cdf_th = die.cdf(k);
        cout << (k+1) << '\t' << pmf_th << '\t' << pmf_emp << '\t' << cdf_th << "\n";
    }

    return 0;
}

We model a loaded die with a discrete distribution. The constructor normalizes probabilities and builds a prefix CDF. Sampling draws a uniform u in [0,1) and finds the smallest index with prefix[i] > u via upper_bound, which selects an outcome with the correct probability. We compare theoretical PMF/CDF to empirical frequencies from many samples.

Time: O(n) to build; O(log n) per sample; O(N log n) for N samples. Space: O(n)
Sampling an Exponential Distribution via Inverse Transform
#include <bits/stdc++.h>
using namespace std;

struct ExponentialRV {
    double lambda; // rate > 0
    explicit ExponentialRV(double lam) : lambda(lam) {
        if (!(lambda > 0.0) || !isfinite(lambda)) throw runtime_error("lambda must be positive and finite");
    }
    // PDF and CDF
    double pdf(double x) const { return (x < 0 ? 0.0 : lambda * exp(-lambda * x)); }
    double cdf(double x) const { return (x < 0 ? 0.0 : 1.0 - exp(-lambda * x)); }
    // Quantile (inverse CDF). Use 1-u to avoid log(0) when u=0.
    double inv_cdf(double u) const {
        if (!(u >= 0.0 && u <= 1.0)) throw runtime_error("u must be in [0,1]");
        if (u == 1.0) return numeric_limits<double>::infinity();
        return -log(1.0 - u) / lambda;
    }
    // Sample using inverse transform with U ~ Uniform(0,1)
    double sample(mt19937 &gen) const {
        uniform_real_distribution<double> U(0.0, 1.0);
        double u = U(gen);
        // ensure u in (0,1) to avoid returning -log(1) = -0
        if (u == 0.0) u = nextafter(0.0, 1.0);
        return inv_cdf(u);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double lambda = 2.0; // mean = 1/lambda = 0.5
    ExponentialRV expo(lambda);

    random_device rd;
    mt19937 gen(rd());

    const int N = 200000; // samples
    double sum = 0.0, sumsq = 0.0;
    int count_gt_1 = 0;
    for (int i = 0; i < N; ++i) {
        double x = expo.sample(gen);
        sum += x;
        sumsq += x * x;
        if (x > 1.0) ++count_gt_1;
    }

    double mean_emp = sum / N;
    double var_emp = sumsq / N - mean_emp * mean_emp;

    cout << fixed << setprecision(6);
    cout << "Theoretical mean:   " << 1.0 / lambda << "\n";
    cout << "Empirical mean:     " << mean_emp << "\n";
    cout << "Theoretical P(X>1): " << exp(-lambda * 1.0) << "\n";
    cout << "Empirical P(X>1):   " << static_cast<double>(count_gt_1) / N << "\n";
    cout << "Empirical variance (for reference): " << var_emp << " (theoretical = " << 1.0/(lambda*lambda) << ")\n";

    return 0;
}

This program samples from an exponential distribution using inverse transform sampling: if U ~ Uniform(0,1), then X = F^{-1}(U) = -ln(1-U)/Ī» has the desired distribution. We verify the mean and a tail probability against their theoretical values and report empirical variance.

Time: O(1) per sample; O(N) total for N samples. Space: O(1)
Normal Sampling via Box–Muller and Histogram Approximation of the PDF
#include <bits/stdc++.h>
using namespace std;

struct NormalBM {
    double mu, sigma;
    bool has_spare = false; // cache the second variate
    double spare = 0.0;
    explicit NormalBM(double m, double s) : mu(m), sigma(s) {
        if (!(sigma > 0.0) || !isfinite(mu) || !isfinite(sigma)) throw runtime_error("Invalid parameters");
    }
    double sample(mt19937 &gen) {
        if (has_spare) { has_spare = false; return mu + sigma * spare; }
        uniform_real_distribution<double> U(0.0, 1.0);
        double u1 = U(gen);
        double u2 = U(gen);
        // Avoid log(0)
        if (u1 == 0.0) u1 = nextafter(0.0, 1.0);
        double r = sqrt(-2.0 * log(u1));
        double theta = 2.0 * M_PI * u2;
        double z0 = r * cos(theta);
        double z1 = r * sin(theta);
        spare = z1; has_spare = true;
        return mu + sigma * z0;
    }
    static double pdf(double x, double mu, double sigma) {
        const double inv = 1.0 / (sigma * sqrt(2.0 * M_PI));
        double z = (x - mu) / sigma;
        return inv * exp(-0.5 * z * z);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double mu = 0.0, sigma = 1.0; // standard normal
    NormalBM norm(mu, sigma);
    random_device rd;
    mt19937 gen(rd());

    const int N = 200000; // samples
    const int B = 21;     // histogram bins
    double lo = mu - 4*sigma, hi = mu + 4*sigma;
    double width = (hi - lo) / B;

    vector<int> hist(B, 0);
    double sum = 0.0, sumsq = 0.0;

    for (int i = 0; i < N; ++i) {
        double x = norm.sample(gen);
        sum += x; sumsq += x * x;
        if (x >= lo && x < hi) {
            int b = static_cast<int>((x - lo) / width);
            if (b == B) b = B - 1; // guard against rounding at the upper edge
            ++hist[b];
        }
    }

    double mean_emp = sum / N;
    double var_emp = sumsq / N - mean_emp * mean_emp;

    cout << fixed << setprecision(6);
    cout << "Empirical mean: " << mean_emp << ", variance: " << var_emp << " (theoretical: 0, 1)\n\n";

    cout << "bin_center\temp_pdf\tth_pdf\n";
    for (int b = 0; b < B; ++b) {
        double center = lo + (b + 0.5) * width;
        double emp_pdf = static_cast<double>(hist[b]) / (N * width); // normalize counts to density
        double th_pdf = NormalBM::pdf(center, mu, sigma);
        cout << center << '\t' << emp_pdf << '\t' << th_pdf << '\n';
    }

    return 0;
}

Box–Muller transforms two independent uniforms into two independent standard normal variables. We cache one variate for efficiency. A histogram normalized by bin width approximates the PDF; we print empirical vs. theoretical densities at bin centers and report empirical mean/variance.

Time: O(1) per sample; O(N + B) total for N samples and B bins. Space: O(B)
Tags: random variable, PMF, PDF, CDF, expectation, variance, discrete distribution, continuous distribution, inverse transform sampling, Box–Muller, Monte Carlo, histogram, binomial, Poisson, normal distribution