Rejection Sampling
Key Points
- •Rejection sampling draws from a hard target distribution by using an easier proposal and accepting with probability p(x)/(M q(x)).
- •You must choose a proposal q(x) and a constant M such that p(x) \le M q(x) for every x in the domain.
- •If p is normalized, the expected acceptance rate is 1/M; with an unnormalized target \tilde{p}, it is Z/M where Z = \int \tilde{p}(x) \, dx.
- •Each accepted sample is exactly distributed as the target; there is no asymptotic bias.
- •Efficiency depends critically on matching q to p, especially in the tails; a poor proposal makes M large and wastes samples.
- •The expected number of proposal draws per accepted sample is M (or M/Z for unnormalized targets).
- •Rejection sampling works in any dimension but acceptance rates usually drop with dimension unless the proposal closely wraps the target.
- •In C++, compute acceptance probabilities safely (often in log-space), ensure M is a proven upper bound, and use high-quality RNGs.
Prerequisites
- →Probability density functions (PDFs) and supports — Understanding p(x), q(x), and where they are nonzero is essential to set correct bounds and acceptance rules.
- →Random number generation and distributions in C++ — You need to draw Uniform and proposal samples reliably using std::mt19937 and standard distributions.
- →Calculus and integrals — Integral reasoning justifies acceptance rates and normalization constants.
- →Logarithms and numerical stability — Log-space computations prevent underflow when densities are tiny.
- →Geometric distribution — It models the number of proposals per accepted sample and informs expected runtime.
- →Big-O notation — To reason about expected time and space complexity as a function of M and sample size.
- →Tail behavior of distributions — Ensuring q has tails at least as heavy as p avoids infinite bounds.
- →Monte Carlo methods — Places rejection sampling within the broader context of simulation and estimation.
Detailed Explanation
01 Overview
Hook → Imagine throwing darts at a board where the shape you care about (your target) is a complicated region, but you can throw uniformly at a simple rectangle that fully covers it. You keep only the darts that land inside the shape. That is the essence of rejection sampling.
Concept → Rejection sampling is a Monte Carlo technique to generate exact samples from a target probability density p(x) by using an easy-to-sample proposal q(x) and a constant M such that p(x) \le M q(x) for all x. The algorithm samples X from q and an independent U from Uniform(0,1); it accepts X if U \le p(X)/(M q(X)), otherwise it rejects and repeats. When p is known only up to a constant (unnormalized), you can still apply the method as long as you can bound the ratio by M.
Example → To sample uniformly inside the unit circle, propose uniformly from the square [−1,1]^2 and accept if the point lies inside the circle (x^2 + y^2 \le 1). This is a special case where the acceptance test can be expressed purely in geometric terms, and the acceptance rate equals the area ratio \pi/4.
02 Intuition & Analogies
Hook → Think of tracing a complex coastline on a map by placing a transparent grid over it. You randomly pick grid squares and keep only those that overlap land. Over many picks, the kept squares represent the land’s shape. Rejection sampling works similarly but with probabilities instead of land.
Concept → We first cover the “shape” of the target distribution with an “envelope” made from the proposal distribution scaled by M. Because M q(x) lies everywhere above p(x), any point drawn from q has a fair chance to be accepted using the ratio p(x)/(M q(x)). The accept/reject step acts like checking whether a randomly thrown dart, drawn under the envelope curve, falls below the target curve at the same x. If yes, we keep it; otherwise, toss it and try again.
Example → Picture p(x) as a mountain range and M q(x) as a taller, smoother ridge that completely covers it. You throw a vertical dart at a horizontal position X drawn from q. You then draw a uniform height U between 0 and M q(X). If the dart height is below the mountain p(X), you keep X. Over time, the kept X’s trace the mountain exactly, even though you never directly sampled from p. The better your envelope hugs the mountain (small M and similar shape/tails), the fewer darts you waste. If the envelope is too tall or misshapen, you spend lots of time throwing away darts, especially in high dimensions where most volume concentrates in counterintuitive ways.
03 Formal Definition
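In symbols, with target density p, proposal density q satisfying q(x) > 0 wherever p(x) > 0, and a constant M \ge \sup_x p(x)/q(x), the algorithm and the standard correctness argument are:

```latex
\textbf{Algorithm (rejection sampling).} Repeat until acceptance:
\begin{enumerate}
  \item Draw $X \sim q$ and, independently, $U \sim \mathrm{Uniform}(0,1)$.
  \item If $U \le \dfrac{p(X)}{M\,q(X)}$, return $X$; otherwise discard $X$ and repeat.
\end{enumerate}

\textbf{Correctness.} For any measurable set $A$,
\[
\Pr(X \in A \mid \text{accept})
  = \frac{\int_A q(x)\,\frac{p(x)}{M q(x)}\,dx}
         {\int q(x)\,\frac{p(x)}{M q(x)}\,dx}
  = \frac{\tfrac{1}{M}\int_A p(x)\,dx}{\tfrac{1}{M}}
  = \int_A p(x)\,dx ,
\]
so accepted draws have exactly the density $p$. For an unnormalized target
$\tilde{p} = Z\,p$ with $\tilde{p}(x) \le M q(x)$, the same computation gives
$\Pr(\text{accept}) = Z/M$, and the conditional law of accepted draws is still $p$.
```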
04 When to Use
Hook → If you can compute a density (maybe up to a constant) but cannot invert its CDF or run an efficient Markov chain, rejection sampling is often the fastest path to exact IID draws.
Use cases →
- Unnormalized targets: Bayesian posteriors p(\theta \mid y) known up to a constant; you can evaluate \tilde{p}(\theta) and find a proposal q with a provable bound M.
- Geometric sampling: Uniformly sample complex shapes by enclosing them in simple shapes (e.g., circles within squares) and accepting points that fall inside.
- Low to moderate dimensions: When a tightly fitting proposal is available so acceptance rates are reasonable (e.g., mixture proposals for multimodal targets).
- As a subroutine: Use rejection sampling to build exact samplers for components inside larger algorithms (e.g., slice sampling updates, auxiliary-variable methods, or to sample from truncated distributions).
- Prototyping and validation: It produces IID draws with no burn-in, making it a gold standard for validating approximate samplers (e.g., compare histograms against MCMC outputs on small problems).
Avoid when → In high dimensions with poor envelopes (M huge), when the tails of q are lighter than p (ratio unbounded), or when finding a rigorous M is difficult.
⚠️ Common Mistakes
Hook → Most failures trace back to an unsafe bound or a mismatched proposal.
Common pitfalls →
- Missing or invalid bound M: Using an M that does not satisfy p(x) \le M q(x) everywhere biases the output. Fix by proving the bound analytically or adding a conservative safety margin; never rely solely on a coarse numeric check.
- Tail mismatch: If q has lighter tails than p, the ratio p(x)/q(x) can blow up in the tails, making M infinite. Always choose proposals with equal or heavier tails.
- Ignoring support: Proposing outside the support of p without handling it can waste time or cause division by zero. Ensure q(x) > 0 wherever p(x) > 0.
- Numerical underflow/overflow: Computing p(X)/(M q(X)) directly can underflow for small probabilities. Compare logs: accept if \log U \le \log \tilde{p}(X) - \log M - \log q(X).
- Poor envelope fit: A very large M leads to low acceptance and slow code. Improve q (e.g., mixtures, scaling) or split the domain into regions with separate bounds.
- RNG misuse: Re-seeding per draw or using low-quality RNGs ruins reproducibility and statistical quality. Use std::mt19937 or better, seeded once, and keep distributions outside hot loops.
Key Formulas
Acceptance rule
Accept X if U \le p(X) / (M q(X)), with X \sim q and U \sim Uniform(0,1) independent.
Explanation: This is the core test in rejection sampling. The proposal X is kept with probability equal to the ratio of target to envelope at X.
Envelope bound
p(x) \le M q(x) for all x.
Explanation: The scaled proposal M q(x) must dominate the target everywhere. Proving or safely overestimating M is essential for correctness.
Acceptance rate (normalized)
P(accept) = \int q(x) \, \frac{p(x)}{M q(x)} \, dx = 1/M.
Explanation: When p is a proper density (integrates to 1), the expected acceptance probability is 1/M. Smaller M means more accepted samples.
Acceptance rate (unnormalized)
P(accept) = Z/M, where Z = \int \tilde{p}(x) \, dx.
Explanation: If you only know p up to a constant, the average acceptance is Z/M. You do not need to know Z to run the algorithm, only to predict efficiency.
Joint density of accepted proposals
q(x) \cdot \frac{p(x)}{M q(x)} = \frac{p(x)}{M}.
Explanation: The joint density of proposing x and accepting it is proportional to p(x). Conditioning on acceptance therefore yields exactly the target density p(x).
Expected trials per acceptance
E[\text{trials}] = M/Z (and simply M when p is normalized, so Z = 1).
Explanation: The number of proposals until the first acceptance is geometric with success probability Z/M. This quantifies the expected computational cost per sample.
Log-space acceptance test
Accept if \log U \le \log \tilde{p}(X) - \log M - \log q(X).
Explanation: Using logs avoids numerical underflow when p and q are very small. It is algebraically equivalent to the standard acceptance rule.
Variance scaling with acceptance
To obtain n accepted samples, you need about n M/Z proposals in expectation.
Explanation: If each accepted sample has variance \sigma^2, then reaching a given Monte Carlo variance with n accepted samples costs about n M/Z proposals; higher acceptance (smaller M/Z) reduces the computational cost per unit of accuracy.
Complexity Analysis
Time: each accepted sample costs a Geometric(Z/M) number of proposals, so generating n samples takes an expected O(n M/Z) proposal draws and density evaluations (O(nM) for a normalized target, where Z = 1). Space: O(n) to store the accepted samples; the sampler itself needs only O(1) extra state beyond the RNG.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Target: p(x) = 2x on [0,1], normalized
// Proposal: q(x) = 1 on [0,1] (Uniform), easy to sample
// Envelope: M = sup_x p(x)/q(x) = 2

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(12345); // Fixed seed for reproducibility
    std::uniform_real_distribution<double> U01(0.0, 1.0);

    const double M = 2.0;    // Envelope constant
    const size_t N = 200000; // Number of accepted samples desired

    vector<double> samples; samples.reserve(N);
    size_t proposals = 0;
    while (samples.size() < N) {
        double x = U01(rng); // X ~ q (Uniform[0,1])
        double u = U01(rng); // U ~ Uniform[0,1]
        double px = 2.0 * x; // p(x)
        double qx = 1.0;     // q(x)
        double accept_prob = px / (M * qx); // = x, since M=2
        proposals++;
        if (u <= accept_prob) {
            samples.push_back(x);
        }
    }

    // Diagnostic: empirical mean should be E[X] = 2/3 for p(x)=2x
    double mean = accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    cout << fixed << setprecision(6);
    cout << "Accepted: " << samples.size() << "/" << proposals
         << " (rate = " << (double)samples.size()/proposals << ")\n";
    cout << "Sample mean ~ " << mean << " (theory 2/3 = 0.666667)\n";

    return 0;
}
We sample from the triangular density p(x)=2x by proposing from Uniform(0,1) and accepting with probability x. The bound M=2 ensures p(x) ≤ M q(x) for all x. The acceptance rate should approach 1/M = 0.5, and the empirical mean should be near 2/3, verifying correctness.
#include <bits/stdc++.h>
using namespace std;

// Target: Uniform on the unit disk {(x,y): x^2 + y^2 <= 1}
// Proposal: Uniform on square [-1,1] x [-1,1]
// Envelope: M = area(square)/area(circle) = 4 / pi; acceptance test reduces to x^2 + y^2 <= 1

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(2024);
    std::uniform_real_distribution<double> U(-1.0, 1.0);

    const size_t N = 200000; // desired accepted points
    size_t accepted = 0, proposals = 0;
    vector<pair<double,double>> pts; pts.reserve(N);

    while (accepted < N) {
        double x = U(rng);
        double y = U(rng);
        proposals++;
        if (x*x + y*y <= 1.0) { // accept if inside circle
            pts.emplace_back(x,y);
            accepted++;
        }
    }

    cout << fixed << setprecision(6);
    double rate = (double)accepted / proposals; // should be close to pi/4
    cout << "Accepted: " << accepted << "/" << proposals
         << " (rate ~ " << rate << ") vs theory pi/4 = " << (M_PI/4.0) << "\n";

    // Optional: estimate pi via 4 * acceptance_rate
    cout << "Estimated pi ~ " << 4.0 * rate << "\n";

    return 0;
}
This classic geometric rejection sampler proposes uniformly over a square and accepts points inside the unit circle. The acceptance rate approximates π/4, and all accepted points are IID uniform on the disk. It illustrates a clean case where the acceptance test reduces to a simple geometric predicate.
#include <bits/stdc++.h>
using namespace std;

// Goal: Sample from target with unnormalized density \tilde{p}(x) = exp(-x^4) on R.
// Proposal: q(x) = Normal(0, 1) with density phi(x) = (1/sqrt(2*pi)) * exp(-x^2/2).
// We need M >= sup_x \tilde{p}(x) / q(x). An analytic bound is possible, but here we
// numerically overestimate on a wide grid and add a generous safety margin.
// WARNING: In production, prefer a proven bound; numerical search can underestimate
// the supremum if the grid is too coarse.

struct NormalPDF {
    // log density of N(0,1)
    static double logpdf(double x) {
        static const double LOG_SQRT_2PI = 0.5*log(2.0*M_PI);
        return -LOG_SQRT_2PI - 0.5*x*x;
    }
};

// Unnormalized target log-density: log tilde{p}(x) = -x^4
static inline double log_tilde_p(double x) { return -(x*x)*(x*x); }

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(7);
    std::normal_distribution<double> N01(0.0, 1.0);
    std::uniform_real_distribution<double> U01(0.0, 1.0);

    // 1) Find a conservative M by scanning r(x) = tilde{p}(x)/q(x) over [-R, R].
    //    The true sup occurs near x=0 for this pair; the tails are safe since
    //    exp(-x^4) decays faster than a Gaussian.
    const double R = 6.0;   // search radius
    const int G = 200000;   // grid points
    double max_log_r = -numeric_limits<double>::infinity();
    for (int i = 0; i <= G; ++i) {
        double x = -R + (2.0*R)*i / G;
        double lr = log_tilde_p(x) - NormalPDF::logpdf(x); // log ratio
        if (lr > max_log_r) max_log_r = lr;
    }
    double M = exp(max_log_r) * 1.10; // add 10% safety margin

    cerr << fixed << setprecision(6);
    cerr << "Estimated M (with margin): " << M << "\n";

    // 2) Draw samples via rejection using a log-safe acceptance test
    const size_t N = 100000; // number of accepted samples desired
    vector<double> xs; xs.reserve(N);
    size_t proposals = 0;

    while (xs.size() < N) {
        double x = N01(rng); // X ~ q
        double u = U01(rng); // U ~ Uniform(0,1)
        // log acceptance = log tilde{p}(x) - log M - log q(x)
        double log_alpha = log_tilde_p(x) - log(M) - NormalPDF::logpdf(x);
        proposals++;
        if (log(u) <= log_alpha) {
            xs.push_back(x);
        }
    }

    // Diagnostics: symmetry implies mean ~ 0; variance is finite
    // (E[X^2] = Gamma(3/4)/Gamma(1/4) ~ 0.34 for this target)
    double mean = accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
    double var = 0.0; for (double v : xs) var += (v-mean)*(v-mean); var /= xs.size();

    cout << fixed << setprecision(6);
    cout << "Accepted: " << xs.size() << "/" << proposals
         << " (rate = " << (double)xs.size()/proposals << ")\n";
    cout << "Mean ~ " << mean << ", Variance ~ " << var << "\n";

    return 0;
}
We target the unnormalized density proportional to exp(−x^4), propose from N(0,1), and compute a conservative bound M by scanning the log-ratio on a wide grid with a safety margin. Acceptance uses a log-space test to avoid underflow. This illustrates a practical, reusable pattern: provide target log-density, proposal sampler and log-density, choose M, then loop until enough samples are accepted. Note the caution that numeric bounds should be validated or analytically justified in rigorous settings.