Math Ā· Intermediate

Random Variables & Distributions

Key Points

  • A random variable maps uncertain outcomes to numbers and is described by a distribution that assigns likelihoods to values or ranges.
  • Discrete random variables use a probability mass function (PMF), while continuous random variables use a probability density function (PDF).
  • The cumulative distribution function (CDF) works for both types and gives P(X ≤ x): a running sum of the PMF, or the area under the PDF, up to x.
  • Expectation is the long-run average value (a weighted sum or integral), and variance measures spread around the mean.
  • You can sample discrete distributions with prefix sums and binary search, and continuous ones via inverse transform or Box–Muller for normals.
  • Monte Carlo simulation estimates expectations and probabilities by averaging many random samples.
  • Always ensure probabilities sum (or integrate) to 1, and remember that a PDF's value is not a probability by itself.
  • Careful seeding and numeric stability (floating-point issues) are essential for correct simulation results in C++.

Prerequisites

  • Basic Probability — Understanding events, sample spaces, and conditional probability is essential to define random variables and distributions.
  • Single-Variable Calculus — Integrals are needed to compute probabilities and expectations for continuous distributions.
  • Sequences and Limits — Convergence concepts underpin the Law of Large Numbers and Central Limit Theorem.
  • C++ Programming Fundamentals — You must know how to compile, use standard libraries, and manage types and I/O to implement samplers.
  • Random Number Generation — Using PRNGs (mt19937), seeding, and uniform distributions is the basis for sampling more complex distributions.
  • Numerical Stability and Floating-Point — Simulations rely on floating-point arithmetic; avoiding underflow/overflow and precision loss is important.
  • Data Structures (Arrays, Binary Search) — Efficient discrete sampling uses prefix arrays and binary search.

Detailed Explanation


01 Overview

A random variable is a numerical view of uncertainty. Instead of saying ā€œthe die shows a face,ā€ we say ā€œX = 1, 2, ..., 6ā€ with certain probabilities. The distribution of a random variable specifies how likely each outcome is. For discrete random variables, the distribution is given by a probability mass function (PMF) that assigns a probability to each possible value. For continuous random variables, we use a probability density function (PDF), which describes how probability is distributed over the real line; probabilities are obtained by integrating the PDF over intervals. A unifying tool is the cumulative distribution function (CDF), which gives the probability that the variable is less than or equal to a given value.

Random variables are the building blocks for statistical modeling, machine learning, and randomized algorithms. They let us quantify expectations, risks, and variability. For example, the number of emails you receive in an hour could be modeled by a Poisson random variable (discrete), while the time you wait for the next bus might be modeled by an exponential random variable (continuous). In practice, we simulate random variables to test algorithms, approximate integrals (Monte Carlo), and fit models to data. C++ provides powerful pseudo-random number generators and distributions, and understanding the math behind random variables helps you use these tools correctly and efficiently.

02 Intuition & Analogies

Think of a random variable as a measuring tape you apply to uncertainty. When you roll a die, you could measure ā€œthe face value,ā€ giving numbers 1 through 6—this is a discrete random variable. When you measure the waiting time until the next bus, you get any real number, like 3.7 minutes—this is continuous.

The distribution is a recipe telling you how likely each measurement is. For a discrete case, the recipe lists ingredients with exact amounts (probabilities for 1, 2, ..., 6). For a continuous case, it’s more like a density of sand spread along a line; any single point has no sand by itself, but intervals collect sand (probability) proportional to the area under the curve. The CDF is like a running total: slide from left to right and watch how much probability has accumulated so far.

The expectation is the balance point (center of mass) of the distribution—if you placed weights at each possible value proportional to their probabilities, the expectation is where the seesaw balances. Variance captures how far values typically sit from the center: tightly clustered distributions have small variance; widely spread ones have large variance.

Sampling is like drawing marbles from a jar without seeing inside: each draw follows the distribution’s recipe. For discrete distributions, you can label segments on a 0–1 ruler with lengths equal to probabilities; picking a uniform point then lands in one segment. For continuous distributions, the inverse transform method bends the 0–1 ruler using the CDF so that uniform draws map to the correct values. For normals, Box–Muller cleverly transforms two uniforms into bell-shaped numbers. These intuitive pictures guide both theory and practical algorithms.

03 Formal Definition

A random variable X is a measurable function from a probability space (Ī©, F, P) to the real numbers R. Its distribution is the pushforward measure P_X(A) = P(X ∈ A) for measurable sets A. The cumulative distribution function (CDF) of X is F_X(x) = P(X ≤ x), which is nondecreasing, right-continuous, and satisfies lim_{x → āˆ’āˆž} F_X(x) = 0 and lim_{x → ∞} F_X(x) = 1. X is discrete if it has countable support S āŠ‚ R with probability mass function (PMF) p_X(x) = P(X = x) for x ∈ S, satisfying āˆ‘_{x ∈ S} p_X(x) = 1. X is continuous if there exists a nonnegative integrable function f_X (the probability density function, PDF) such that for all a < b, P(a < X ≤ b) = āˆ«_a^b f_X(x) dx and āˆ«_{āˆ’āˆž}^{∞} f_X(x) dx = 1; then F_X(x) = āˆ«_{āˆ’āˆž}^x f_X(t) dt. For measurable g, the expectation is E[g(X)] = āˆ‘_{x ∈ S} g(x) p_X(x) for discrete X and E[g(X)] = āˆ«_{āˆ’āˆž}^{∞} g(x) f_X(x) dx for continuous X, provided the sums or integrals converge. The variance is Var(X) = E[(X āˆ’ E[X])²] = E[X²] āˆ’ (E[X])². Independence of X and Y is characterized by F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y, or equivalently by factorization of the joint PMF/PDF when it exists.

04 When to Use

Use discrete random variables when outcomes are countable: dice, number of website clicks, defect counts, or successes in n trials (binomial). They’re ideal for modeling counts, categories, and yes/no processes. Use continuous random variables when measurements can vary smoothly: heights, times, temperatures, or distances. Exponential models are natural for waiting times; normal distributions model aggregated noise and measurement errors.

In algorithm design, random variables and their distributions arise in randomized algorithms (e.g., quicksort pivoting), hashing analysis, and probabilistic data structures (Bloom filters). In simulation, you’ll sample from distributions to estimate expectations (Monte Carlo integration), evaluate risk (VaR in finance), or test system performance (queues, A/B testing). The CDF and quantile (inverse CDF) are used for computing probabilities, thresholds (percentiles), and generating samples via inverse transform sampling.

Choose hand-rolled samplers when you need control, portability, or custom distributions; use library distributions (like std::normal_distribution) for speed and reliability. For large discrete supports with many repeated samples, consider O(1) alias sampling after O(n) preprocessing. For heavy-tailed or complex continuous distributions, use transformations, rejection sampling, or Markov chain Monte Carlo (MCMC).

āš ļøCommon Mistakes

• Confusing PMF and PDF: a PDF value f(x) is not a probability. Only integrals over intervals yield probabilities. For discrete variables, p(x) itself is a probability. • Forgetting normalization: discrete probabilities must sum to 1 and continuous densities must integrate to 1. In code, floating-point rounding can cause slight drift; renormalize and clamp the final prefix to 1. • Bad RNG usage: seeding mt19937 with time repeatedly inside loops or using rand() % n creates bias. Seed once with std::random_device and use uniform_real_distribution for [0,1). • Off-by-one in CDF prefix arrays: upper_bound vs lower_bound choices matter. With a uniform u in [0,1), using upper_bound on prefix CDF selects the correct index; ensure the last prefix entry is exactly 1 within epsilon. • Misinterpreting variance: standard deviation is the square root of variance; comparing spreads should use the same unit as the data (std. dev.), not squared units. • Ignoring dependence: applying formulas (e.g., Var(X+Y) = Var(X)+Var(Y)) without independence assumptions leads to wrong results. • Overfitting histograms: too many bins create noise; too few hide structure. Choose bin counts proportional to N^{1/3} or use rules (Freedman–Diaconis). • Numerical instability in tails: computing 1 - F(x) for large x can lose precision. Prefer survival functions or log-space computations when available.

Key Formulas

CDF

F_X(x) = P(X ≤ x)

Explanation: The CDF gives the probability that X is at most x. It fully characterizes the distribution for both discrete and continuous cases.

PMF Normalization

āˆ‘_{x ∈ S} p_X(x) = 1,   p_X(x) = P(X = x)

Explanation: For a discrete random variable with support S, probabilities across all possible values must sum to 1. This ensures a valid distribution.

PDF Normalization

āˆ«_{āˆ’āˆž}^{∞} f_X(x) dx = 1,   f_X(x) ≄ 0

Explanation: A continuous random variable’s density must integrate to 1 over the real line. Nonnegativity ensures probabilities are not negative.

CDF from PMF/PDF

F_X(x) = āˆ‘_{t ≤ x} p_X(t)  (discrete),   F_X(x) = āˆ«_{āˆ’āˆž}^x f_X(t) dt  (continuous)

Explanation: The CDF accumulates probability up to x. For discrete variables it is a sum; for continuous variables it is an integral.

Expectation

E[X] = āˆ‘_x x p_X(x)   or   E[X] = āˆ«_{āˆ’āˆž}^{∞} x f_X(x) dx

Explanation: The expectation is the long-run average value of X. It is a weighted sum or integral of values using their probabilities or densities.

Variance

Var(X) = E[X²] āˆ’ (E[X])²

Explanation: Variance measures spread around the mean. Computing it via E[X²] avoids directly handling squared deviations in data streams.

Linearity of Expectation

E[aX + bY] = aE[X] + bE[Y]

Explanation: Expectation distributes over linear combinations regardless of independence. This is a key tool in algorithm analysis.

Independence (Continuous)

f_{X,Y}(x, y) = f_X(x) f_Y(y)   (if independent)

Explanation: For independent continuous random variables, their joint density factorizes. The same holds for PMFs in the discrete case.

Marginalization

p_X(x) = āˆ‘_y p_{X,Y}(x, y),   f_X(x) = āˆ«_{āˆ’āˆž}^{∞} f_{X,Y}(x, y) dy

Explanation: You obtain the distribution of a subset of variables by summing or integrating out the others. This is used in Bayes nets and probabilistic models.

Quantile Function

Q(u) = Fā»Ā¹(u) = inf{ x : F_X(x) ≄ u }

Explanation: The quantile maps a probability u in [0,1] to the corresponding threshold x. Inverse transform sampling draws X by taking x=Q(U) for U ∼ Uniform(0,1).

Exponential Inverse CDF

Fā»Ā¹(u) = āˆ’(1/Ī») ln(1 āˆ’ u)   (Exponential)

Explanation: Sampling from an exponential distribution with rate Ī» can be done by transforming a uniform random number using this closed form.

Normal PDF

f_N(x; μ, σ) = (1 / (σ√(2Ļ€))) exp(āˆ’(x āˆ’ μ)² / (2σ²))

Explanation: This bell-shaped density is ubiquitous due to the Central Limit Theorem. It is parameterized by mean μ and standard deviation σ.

Binomial PMF

P(X = k) = (n choose k) p^k (1 āˆ’ p)^(n āˆ’ k)

Explanation: Models the number of successes in n independent Bernoulli trials with success probability p. Mean is np and variance is np(1-p).

Poisson PMF

P(X = k) = e^(āˆ’Ī») Ī»^k / k!

Explanation: Models counts of rare events over fixed intervals with rate Ī». Mean and variance are both Ī».

Chebyshev's Inequality (Sample Mean)

P(|XĢ„_n āˆ’ Ī¼| ≄ t) ≤ Var(X) / (n t²)

Explanation: Provides a distribution-free bound on deviations of the sample mean from the true mean. Useful for conservative guarantees.

Central Limit Theorem

√n (XĢ„_n āˆ’ Ī¼) / σ → N(0, 1)   (convergence in distribution)

Explanation: As n grows, the standardized sample mean approaches a standard normal distribution. This underlies many statistical approximations.

Complexity Analysis

Core computations for random variables center on evaluating PMF/PDF/CDF, sampling, and empirical estimation. For a discrete distribution over n outcomes, building a prefix CDF is O(n) time and O(n) space; sampling via binary search on the prefix is O(log n) per sample. If you will draw many samples, alias sampling can reduce per-sample time to O(1) after O(n) preprocessing at the cost of O(n) space. Computing the discrete CDF at a given index is O(1) with the prefix array. For continuous distributions with closed-form inverse CDFs (e.g., exponential), inverse transform sampling is O(1) per sample and O(1) space. For normals via Box–Muller, each pair of uniform variates yields two normal variates with O(1) time per sample and O(1) space; using a cached second variate can halve the number of logarithm, square root, and trigonometric calls. Evaluating PDFs or CDFs is O(1) if they have closed forms; numerical integration or special functions (e.g., the normal CDF) may require iterative methods with higher costs.

Monte Carlo estimation of expectations, probabilities, and quantiles is O(N) time for N samples and typically O(1) auxiliary space (or O(B) when building histograms with B bins). The estimation error decreases like O(1/√N) regardless of dimension, but variance reduction techniques (antithetic variates, control variates) can improve constants. Numerically, floating-point rounding can accumulate in prefix sums and histograms; using double precision and occasional renormalization mitigates drift. Random number generation with mt19937 is fast (period 2^19937 āˆ’ 1) and has effectively O(1) amortized cost per variate; avoid reseeding inside loops to maintain performance and statistical quality.

Code Examples

Sampling a Discrete Random Variable (Loaded Die) via Prefix CDF + Binary Search
#include <bits/stdc++.h>
using namespace std;

struct DiscreteRV {
    vector<double> prefix; // prefix[i] = P(X <= i)
    explicit DiscreteRV(vector<double> probs) {
        if (probs.empty()) throw runtime_error("Empty probability vector");
        // Ensure non-negative and finite
        for (double p : probs) {
            if (!(p >= 0.0) || !isfinite(p)) throw runtime_error("Invalid probability");
        }
        // Normalize to sum to 1 (robust against rounding)
        long double sum = 0.0L;
        for (double p : probs) sum += p;
        if (sum <= 0.0L) throw runtime_error("Sum of probabilities must be positive");
        vector<double> pn;
        pn.reserve(probs.size());
        for (double p : probs) pn.push_back(static_cast<double>(p / sum));
        // Build prefix CDF; clamp the last entry to 1 exactly
        prefix.resize(pn.size());
        partial_sum(pn.begin(), pn.end(), prefix.begin());
        prefix.back() = 1.0; // guard against tiny drift like 0.9999999997
    }
    // Return P(X = k)
    double pmf(size_t k) const {
        if (k >= prefix.size()) return 0.0;
        double prev = (k == 0 ? 0.0 : prefix[k-1]);
        return max(0.0, prefix[k] - prev);
    }
    // Return P(X <= k)
    double cdf(size_t k) const {
        if (k >= prefix.size()) return 1.0;
        return prefix[k];
    }
    // Sample using u ~ Uniform[0,1)
    size_t sample(mt19937 &gen) const {
        uniform_real_distribution<double> U(0.0, 1.0);
        double u = U(gen);
        // upper_bound finds the first prefix[i] > u, which is the correct index
        auto it = upper_bound(prefix.begin(), prefix.end(), u);
        size_t idx = static_cast<size_t>(it - prefix.begin());
        if (idx >= prefix.size()) idx = prefix.size() - 1; // safety
        return idx;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Loaded die probabilities for faces 1..6
    vector<double> probs = {0.05, 0.10, 0.20, 0.25, 0.25, 0.15};
    DiscreteRV die(probs);

    // RNG: seed once
    random_device rd;
    mt19937 gen(rd());

    // Empirical test
    const int N = 200000; // number of samples
    vector<int> counts(6, 0);
    for (int i = 0; i < N; ++i) {
        size_t face_idx = die.sample(gen); // 0..5
        ++counts[face_idx];
    }

    cout << fixed << setprecision(4);
    cout << "Face\tPMF(th)\tPMF(emp)\tCDF(th)\n";
    for (size_t k = 0; k < 6; ++k) {
        double pmf_th = die.pmf(k);
        double pmf_emp = static_cast<double>(counts[k]) / N;
        double cdf_th = die.cdf(k);
        cout << (k+1) << '\t' << pmf_th << '\t' << pmf_emp << '\t' << cdf_th << "\n";
    }

    return 0;
}

We model a loaded die with a discrete distribution. The constructor normalizes probabilities and builds a prefix CDF. Sampling draws a uniform u in [0,1) and finds the smallest index with prefix[i] > u via upper_bound, which selects an outcome with the correct probability. We compare theoretical PMF/CDF to empirical frequencies from many samples.

Time: O(n) to build; O(log n) per sample; O(N log n) for N samples. Space: O(n)
Sampling an Exponential Distribution via Inverse Transform
#include <bits/stdc++.h>
using namespace std;

struct ExponentialRV {
    double lambda; // rate > 0
    explicit ExponentialRV(double lam) : lambda(lam) {
        if (!(lambda > 0.0) || !isfinite(lambda)) throw runtime_error("lambda must be positive and finite");
    }
    // PDF and CDF
    double pdf(double x) const { return (x < 0 ? 0.0 : lambda * exp(-lambda * x)); }
    double cdf(double x) const { return (x < 0 ? 0.0 : 1.0 - exp(-lambda * x)); }
    // Quantile (inverse CDF). Use 1-u to avoid log(0) when u=0.
    double inv_cdf(double u) const {
        if (!(u >= 0.0 && u <= 1.0)) throw runtime_error("u must be in [0,1]");
        if (u == 1.0) return numeric_limits<double>::infinity();
        return -log(1.0 - u) / lambda;
    }
    // Sample using inverse transform with U ~ Uniform(0,1)
    double sample(mt19937 &gen) const {
        uniform_real_distribution<double> U(0.0, 1.0);
        double u = U(gen);
        // ensure u in (0,1) to avoid returning -log(1) = -0
        if (u == 0.0) u = nextafter(0.0, 1.0);
        return inv_cdf(u);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double lambda = 2.0; // mean = 1/lambda = 0.5
    ExponentialRV expo(lambda);

    random_device rd;
    mt19937 gen(rd());

    const int N = 200000; // samples
    double sum = 0.0, sumsq = 0.0;
    int count_gt_1 = 0;
    for (int i = 0; i < N; ++i) {
        double x = expo.sample(gen);
        sum += x;
        sumsq += x * x;
        if (x > 1.0) ++count_gt_1;
    }

    double mean_emp = sum / N;
    double var_emp = sumsq / N - mean_emp * mean_emp;

    cout << fixed << setprecision(6);
    cout << "Theoretical mean:   " << 1.0 / lambda << "\n";
    cout << "Empirical mean:     " << mean_emp << "\n";
    cout << "Theoretical P(X>1): " << exp(-lambda * 1.0) << "\n";
    cout << "Empirical P(X>1):   " << static_cast<double>(count_gt_1) / N << "\n";
    cout << "Empirical variance (for reference): " << var_emp << " (theoretical = " << 1.0/(lambda*lambda) << ")\n";

    return 0;
}

This program samples from an exponential distribution using inverse transform sampling: if U ~ Uniform(0,1), then X = F^{-1}(U) = -ln(1-U)/Ī» has the desired distribution. We verify the mean and a tail probability against their theoretical values and report empirical variance.

Time: O(1) per sample; O(N) total for N samples. Space: O(1)
Normal Sampling via Box–Muller and Histogram Approximation of the PDF
#include <bits/stdc++.h>
using namespace std;

struct NormalBM {
    double mu, sigma;
    bool has_spare = false; // cache the second variate
    double spare = 0.0;
    explicit NormalBM(double m, double s) : mu(m), sigma(s) {
        if (!(sigma > 0.0) || !isfinite(mu) || !isfinite(sigma)) throw runtime_error("Invalid parameters");
    }
    double sample(mt19937 &gen) {
        if (has_spare) { has_spare = false; return mu + sigma * spare; }
        uniform_real_distribution<double> U(0.0, 1.0);
        double u1 = U(gen);
        double u2 = U(gen);
        // Avoid log(0)
        if (u1 == 0.0) u1 = nextafter(0.0, 1.0);
        double r = sqrt(-2.0 * log(u1));
        double theta = 2.0 * M_PI * u2;
        double z0 = r * cos(theta);
        double z1 = r * sin(theta);
        spare = z1; has_spare = true;
        return mu + sigma * z0;
    }
    static double pdf(double x, double mu, double sigma) {
        const double inv = 1.0 / (sigma * sqrt(2.0 * M_PI));
        double z = (x - mu) / sigma;
        return inv * exp(-0.5 * z * z);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double mu = 0.0, sigma = 1.0; // standard normal
    NormalBM norm(mu, sigma);
    random_device rd;
    mt19937 gen(rd());

    const int N = 200000; // samples
    const int B = 21;     // histogram bins
    double lo = mu - 4*sigma, hi = mu + 4*sigma;
    double width = (hi - lo) / B;

    vector<int> hist(B, 0);
    double sum = 0.0, sumsq = 0.0;

    for (int i = 0; i < N; ++i) {
        double x = norm.sample(gen);
        sum += x; sumsq += x * x;
        if (x >= lo && x < hi) {
            int b = static_cast<int>((x - lo) / width);
            if (b == B) b = B - 1; // guard against rounding at the upper edge
            ++hist[b];
        }
    }

    double mean_emp = sum / N;
    double var_emp = sumsq / N - mean_emp * mean_emp;

    cout << fixed << setprecision(6);
    cout << "Empirical mean: " << mean_emp << ", variance: " << var_emp << " (theoretical: 0, 1)\n\n";

    cout << "bin_center\temp_pdf\tth_pdf\n";
    for (int b = 0; b < B; ++b) {
        double center = lo + (b + 0.5) * width;
        double emp_pdf = static_cast<double>(hist[b]) / (N * width); // normalize counts to density
        double th_pdf = NormalBM::pdf(center, mu, sigma);
        cout << center << '\t' << emp_pdf << '\t' << th_pdf << '\n';
    }

    return 0;
}

Box–Muller transforms two independent uniforms into two independent standard normal variables. We cache one variate for efficiency. A histogram normalized by bin width approximates the PDF; we print empirical vs. theoretical densities at bin centers and report empirical mean/variance.

Time: O(1) per sample; O(N + B) total for N samples and B bins. Space: O(B)
Tags: random variable, PMF, PDF, CDF, expectation, variance, discrete distribution, continuous distribution, inverse transform sampling, Box–Muller, Monte Carlo, histogram, binomial, Poisson, normal distribution