⚙️ Algorithm · Intermediate

Rejection Sampling

Key Points

  • Rejection sampling draws from a hard target distribution by using an easier proposal and accepting with probability p(x)/(M q(x)).
  • You must choose a proposal q(x) and a constant M such that p(x) ≤ M q(x) for every x in the domain.
  • If p is normalized, the expected acceptance rate is 1/M; with unnormalized p̃, it is Z/M where Z = ∫ p̃(x) dx.
  • Each accepted sample is exactly distributed as the target; there is no asymptotic bias.
  • Efficiency depends critically on matching q to p, especially in the tails; a poor proposal makes M large and wastes samples.
  • The expected number of proposal draws per accepted sample is M (or M/Z for unnormalized targets).
  • Rejection sampling works in any dimension, but acceptance rates usually drop with dimension unless the proposal closely wraps the target.
  • In C++, compute acceptance probabilities safely (often in log-space), ensure M is a proven upper bound, and use high-quality RNGs.

Prerequisites

  • Probability density functions (PDFs) and supports — Understanding p(x), q(x), and where they are nonzero is essential to set correct bounds and acceptance rules.
  • Random number generation and distributions in C++ — You need to draw Uniform and proposal samples reliably using std::mt19937 and standard distributions.
  • Calculus and integrals — Integral reasoning justifies acceptance rates and normalization constants.
  • Logarithms and numerical stability — Log-space computations prevent underflow when densities are tiny.
  • Geometric distribution — It models the number of proposals per accepted sample and informs expected runtime.
  • Big-O notation — To reason about expected time and space complexity as a function of M and sample size.
  • Tail behavior of distributions — Ensuring q has tails at least as heavy as p avoids infinite bounds.
  • Monte Carlo methods — Places rejection sampling within the broader context of simulation and estimation.

Detailed Explanation


01 Overview

Hook → Imagine throwing darts at a board where the shape you care about (your target) is a complicated region, but you can throw uniformly at a simple rectangle that fully covers it. You keep only the darts that land inside the shape. That is the essence of rejection sampling.

Concept → Rejection sampling is a Monte Carlo technique to generate exact samples from a target probability density p(x) by using an easy-to-sample proposal q(x) and a constant M such that p(x) ≤ M q(x) for all x. The algorithm samples X from q and an independent U from Uniform(0,1); it accepts X if U ≤ p(X)/(M q(X)), otherwise it rejects and repeats. When p is known only up to a constant (unnormalized), you can still apply the method as long as you can bound the ratio by M.

Example → To sample uniformly inside the unit circle, propose uniformly from the square [−1,1]² and accept if the point lies inside the circle (x² + y² ≤ 1). This is a special case where the acceptance test can be expressed purely in geometric terms, and the acceptance rate equals the area ratio π/4.
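The sample/test/repeat loop just described can be sketched as a small generic helper. This is a minimal sketch, not library code; the name `sample_rejection` and its callback-based interface are illustrative, and the caller must guarantee the envelope bound p(x) ≤ M q(x) everywhere.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Draws one exact sample from density p using proposal density q,
// a sampler draw_q for q, and an envelope constant M with p <= M*q.
double sample_rejection(const std::function<double(double)>& p,
                        const std::function<double(double)>& q,
                        const std::function<double(std::mt19937&)>& draw_q,
                        double M, std::mt19937& rng) {
    std::uniform_real_distribution<double> U01(0.0, 1.0);
    while (true) {
        double x = draw_q(rng);                // X ~ q
        double u = U01(rng);                   // U ~ Uniform(0,1)
        if (u <= p(x) / (M * q(x))) return x;  // keep with probability p/(M q)
    }
}
```

For example, with p(x) = 2x on [0,1], a Uniform(0,1) proposal, and M = 2, repeated calls produce samples whose mean approaches 2/3.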

02 Intuition & Analogies

Hook → Think of tracing a complex coastline on a map by placing a transparent grid over it. You randomly pick grid squares and keep only those that overlap land. Over many picks, the kept squares represent the land’s shape. Rejection sampling works similarly but with probabilities instead of land.

Concept → We first cover the “shape” of the target distribution with an “envelope” made from the proposal distribution scaled by M. Because M q(x) lies everywhere above p(x), any point drawn from q has a fair chance to be accepted using the ratio p(x)/(M q(x)). The accept/reject step acts like checking whether a randomly thrown dart, drawn under the envelope curve, falls below the target curve at the same x. If yes, we keep it; otherwise, toss it and try again.

Example → Picture p(x) as a mountain range and M q(x) as a taller, smoother ridge that completely covers it. You throw a vertical dart at a horizontal position X drawn from q. You then draw a uniform height U between 0 and M q(X). If the dart height is below the mountain p(X), you keep X. Over time, the kept X’s trace the mountain exactly, even though you never directly sampled from p. The better your envelope hugs the mountain (small M and similar shape/tails), the fewer darts you waste. If the envelope is too tall or misshapen, you spend lots of time throwing away darts, especially in high dimensions where most volume concentrates in counterintuitive ways.
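The mountain-and-envelope picture translates line-for-line into code: draw a horizontal position from q, draw a dart height uniformly under the envelope M q(X), and keep X when the dart lands below p(X). The target here is an illustrative choice, not one from the text: the parabola p(x) = (3/4)(1 − x²) on [−1, 1], with flat proposal q(x) = 1/2 and envelope M = sup p/q = 3/2.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Returns the empirical fraction of darts that land under the target curve.
double dart_acceptance_rate(std::size_t trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> X(-1.0, 1.0), U01(0.0, 1.0);
    const double M = 1.5, q = 0.5;                   // envelope and flat proposal
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < trials; ++i) {
        double x = X(rng);
        double h = U01(rng) * M * q;                 // dart height under the envelope
        if (h <= 0.75 * (1.0 - x * x)) ++accepted;   // below the mountain p(x)?
    }
    return double(accepted) / trials;                // approaches 1/M = 2/3
}
```

The surviving fraction approaches 1/M = 2/3, and the kept x values trace the parabola exactly.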

03 Formal Definition

Hook → We now shift from pictures to math so the guarantees are crystal clear.

Concept → Let p(x) be a target density on a measurable space with support S. Let q(x) be a proposal density such that S ⊆ {x : q(x) > 0}, and choose M ≥ sup_{x∈S} p(x)/q(x). The rejection sampler draws X ∼ q and U ∼ Uniform(0,1), independently, and accepts X if U ≤ p(X)/(M q(X)). If p is known only up to a constant, say p(x) = p̃(x)/Z with unknown Z, it suffices to ensure p̃(x) ≤ M q(x).

Key properties → (1) Correctness: the conditional density of X given acceptance equals p(x). Indeed, Pr{X ∈ dx, accept} = q(x) · p(x)/(M q(x)) dx = p(x)/M dx, and dividing by the total acceptance probability yields p(x). (2) Average acceptance rate: if p is normalized, E[accept] = 1/M; for unnormalized p̃, it equals Z/M with Z = ∫ p̃(x) dx. (3) Runtime: the number of proposals until acceptance is geometric with success probability Z/M, so the expected number of proposals per accepted sample is M/Z. These facts hold in any dimension provided the bound M exists and q has support covering p.
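Properties (1) and (2) are easy to check empirically. The sketch below uses an illustrative target not discussed above: the Beta(2,2) density p(x) = 6x(1−x) on [0,1], with a Uniform(0,1) proposal, so M = sup p/q = 6 · (1/2) · (1/2) = 1.5.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>

// Runs rejection sampling against Beta(2,2) and returns
// {empirical acceptance rate, sample mean of the accepted draws}.
std::pair<double, double> beta22_rejection(std::size_t n_accept, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> U01(0.0, 1.0);
    const double M = 1.5;
    std::size_t accepted = 0, proposals = 0;
    double sum = 0.0;
    while (accepted < n_accept) {
        double x = U01(rng), u = U01(rng);
        ++proposals;
        if (u <= 6.0 * x * (1.0 - x) / M) {  // p(x) / (M q(x)), with q(x) = 1
            sum += x;
            ++accepted;
        }
    }
    return {double(accepted) / proposals,    // theory: 1/M = 2/3
            sum / accepted};                 // theory: E[X] = 1/2
}
```

The acceptance rate converges to 1/M = 2/3 and the accepted-sample mean to the Beta(2,2) mean of 1/2, matching the correctness and acceptance-rate claims.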

04 When to Use

Hook → If you can compute a density (maybe up to a constant) but cannot invert its CDF or run an efficient Markov chain, rejection sampling is often the fastest path to exact IID draws. Use cases →

  • Unnormalized targets: Bayesian posteriors p(θ ∣ y) known up to a constant; you can evaluate p̃(θ) and find a proposal q with a provable bound M.
  • Geometric sampling: uniformly sample complex shapes by enclosing them in simple shapes (e.g., circles within squares) and accepting points that fall inside.
  • Low to moderate dimensions: when a tightly fitting proposal is available so acceptance rates are reasonable (e.g., mixture proposals for multimodal targets).
  • As a subroutine: use rejection sampling to build exact samplers for components inside larger algorithms (e.g., slice sampling updates, auxiliary-variable methods, or to sample from truncated distributions).
  • Prototyping and validation: it produces IID draws with no burn-in, making it a gold standard to validate approximate samplers (e.g., compare histograms against MCMC outputs on small problems).

Avoid when → In high dimensions with poor envelopes (M huge), when the tails of q are lighter than p (ratio unbounded), or when finding a rigorous M is difficult.
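As a concrete instance of the truncated-distribution subroutine mentioned in the use cases, here is a sketch (the helper name `truncated_normal` is illustrative). With proposal q = N(0,1) and unnormalized target p̃(x) = φ(x)·1[a ≤ x ≤ b], the bound p̃ ≤ M q holds with M = 1, and the acceptance test collapses to the interval check itself. This is efficient only when [a, b] carries non-negligible Normal mass.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Exact draw from a standard Normal truncated to [a, b] by rejection:
// propose from N(0,1) and accept iff the draw lands inside the interval.
double truncated_normal(double a, double b, std::mt19937& rng) {
    std::normal_distribution<double> N01(0.0, 1.0);
    while (true) {
        double x = N01(rng);
        if (a <= x && x <= b) return x;  // accept: inside the truncation support
    }
}
```

For a = 0 and a large b, this reproduces the half-normal distribution, whose mean is √(2/π) ≈ 0.798.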

⚠️ Common Mistakes

Hook → Most failures trace back to an unsafe bound or a mismatched proposal. Common pitfalls →

  • Missing or invalid bound M: Using an M that does not satisfy p(x) ≤ M q(x) everywhere biases the output. Fix by proving the bound analytically or adding a conservative safety margin; never rely solely on a coarse numeric check.
  • Tail mismatch: If q has lighter tails than p, the ratio p(x)/q(x) can blow up in the tails, making M infinite. Always choose proposals with equal or heavier tails.
  • Ignoring support: Proposing outside the support of p without handling it can waste time or cause division by zero. Ensure q(x) > 0 wherever p(x) > 0.
  • Numerical underflow/overflow: Computing p(X)/(M q(X)) directly can underflow for small probabilities. Compare logs: accept if log U ≤ log p̃(X) − log M − log q(X).
  • Poor envelope fit: A very large M leads to low acceptance and slow code. Improve q (e.g., mixtures, scaling) or split the domain into regions with separate bounds.
  • RNG misuse: Re-seeding per draw or using low-quality RNGs ruins reproducibility and statistical quality. Use std::mt19937 or better, seeded once, and keep distributions outside hot loops.
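The tail-mismatch pitfall can be made concrete with an illustrative pairing (not one of the document's examples): target p̃(x) = exp(−|x|) (Laplace, heavier tails) versus proposal q = N(0,1). The log ratio log p̃(x) − log q(x) = −|x| + x²/2 + log √(2π) grows without bound as |x| → ∞, so no finite envelope M exists, and a grid scan for M would silently return a bogus value.

```cpp
#include <cassert>
#include <cmath>

// Log density ratio of an unnormalized Laplace target against a N(0,1)
// proposal. Diverges as |x| grows, demonstrating that the Laplace/Normal
// pairing admits no finite envelope constant M.
double log_ratio_laplace_vs_normal(double x) {
    const double PI = std::acos(-1.0);
    double log_p = -std::fabs(x);                            // log p~(x), Laplace
    double log_q = -0.5 * std::log(2.0 * PI) - 0.5 * x * x;  // log N(0,1) density
    return log_p - log_q;                                    // unbounded in the tails
}
```

Scanning this ratio outward is a cheap sanity check before trusting any numerically estimated M: if it is still climbing at the edge of your grid, the proposal's tails are too light.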

Key Formulas

Acceptance rule

Accept X ∼ q if U ≤ p(X)/(M q(X)), where U ∼ Uniform(0,1)

Explanation: This is the core test in rejection sampling. The proposal X is kept with probability equal to the ratio of target to envelope at X.

Envelope bound

M ≥ sup_{x∈S} p(x)/q(x)

Explanation: The scaled proposal M q(x) must dominate the target everywhere. Proving or safely overestimating M is essential for correctness.

Acceptance rate (normalized)

Pr(accept) = ∫ q(x) · p(x)/(M q(x)) dx = 1/M

Explanation: When p is a proper density (integrates to 1), the expected acceptance probability is 1/M. Smaller M means more accepted samples.

Acceptance rate (unnormalized)

Pr(accept) = Z/M, where Z = ∫ p̃(x) dx

Explanation: If you only know p up to a constant, the average acceptance is Z/M. You do not need to know Z to run the algorithm, only to predict efficiency.

Joint density of accepted proposals

Pr{X ∈ dx, accept} = q(x) · p(x)/(M q(x)) dx = p(x)/M dx

Explanation: The joint distribution of a proposed x and acceptance is proportional to p(x). Conditioning on acceptance yields the exact target density p(x).

Expected trials per acceptance

E[N] = 1/Pr(accept) = M/Z

Explanation: The number of proposals until the first acceptance is geometric with success probability Z/M. This quantifies the expected computational cost per sample.
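This geometric-trials claim can be checked by simulation, here with the p(x) = 2x on [0,1] example from the code section below (Uniform proposal, M = 2, Z = 1, so E[N] = 2):

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Average number of proposal draws per accepted sample for the target
// p(x) = 2x on [0,1] with Uniform proposal and M = 2.
double mean_trials_per_accept(std::size_t n_accept, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> U01(0.0, 1.0);
    std::size_t proposals = 0, accepted = 0;
    while (accepted < n_accept) {
        double x = U01(rng), u = U01(rng);
        ++proposals;
        if (u <= x) ++accepted;          // acceptance probability p/(M q) = x
    }
    return double(proposals) / accepted; // should approach M/Z = 2
}
```

The empirical average converges to 2 proposals per accepted sample, matching E[N] = M/Z.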

Log-space acceptance test

log U ≤ log p̃(X) − log M − log q(X)

Explanation: Using logs avoids numerical underflow when p and q are very small. It is algebraically equivalent to the standard acceptance rule.

Variance scaling with acceptance

Var(μ̂) ≈ σ² / (n · (Z/M))

Explanation: Here n counts proposal draws: a budget of n proposals yields about n·Z/M accepted samples, each with variance σ², so the variance of the Monte Carlo mean scales as σ²/(n·Z/M). Higher acceptance buys more effective samples per unit of computation.

Complexity Analysis

Let p be the target density and q the proposal with envelope constant M such that p(x) ≤ M q(x) for all x. Each rejection trial requires O(1) time to: (a) draw one proposal X from q, (b) draw one uniform U, and (c) evaluate p(X) and q(X) (or their unnormalized/log forms). The acceptance probability is Z/M where Z = ∫ p̃(x) dx (Z = 1 if p is normalized). Therefore, the number of proposals N per accepted sample is geometric with mean M/Z. The expected time to obtain a single accepted sample is O(M/Z), and for n accepted samples the expected time is O((M/Z) · n). If evaluating p and q has additional cost T_p and T_q, the expected time becomes O((T_p + T_q) · M/Z) per accepted sample.

Space usage is minimal: O(1) per thread for RNG state and temporary variables. If you store all n accepted samples in memory, space becomes O(n). The algorithm parallelizes trivially across cores because each sample is independent and uses independent RNG streams.

The main efficiency lever is the choice of q and M. A tight envelope (small M) and a proposal that mirrors the target’s shape and tails improve acceptance. In high dimensions, naive envelopes often lead to exponential growth of M (e.g., covering a hypersphere with a hypercube), making the method impractical; adaptive or mixture proposals can mitigate this but at the cost of more complex bound derivations. Numerical stability should be handled via log-space computations when densities are very small to avoid underflow.
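The trivial parallelism noted above can be sketched with std::thread and one RNG stream per thread. This is a sketch under simple assumptions: the per-thread seeds are just distinct constants here, and production code should prefer proper stream splitting (e.g., std::seed_seq or a counter-based generator). The target is the document's p(x) = 2x on [0,1] with M = 2.

```cpp
#include <random>
#include <thread>
#include <vector>

// Collects n_per_thread accepted samples on each of n_threads threads,
// each thread running an independent rejection loop with its own RNG.
std::vector<double> parallel_rejection(std::size_t n_per_thread,
                                       unsigned n_threads) {
    std::vector<std::vector<double>> parts(n_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&parts, t, n_per_thread] {
            std::mt19937 rng(1000u + t);  // distinct seed per thread (illustrative)
            std::uniform_real_distribution<double> U01(0.0, 1.0);
            auto& out = parts[t];         // each thread writes only its own slot
            out.reserve(n_per_thread);
            while (out.size() < n_per_thread) {
                double x = U01(rng), u = U01(rng);
                if (u <= x) out.push_back(x);  // p/(M q) = 2x/2 = x
            }
        });
    }
    for (auto& w : workers) w.join();
    std::vector<double> all;
    for (auto& part : parts) all.insert(all.end(), part.begin(), part.end());
    return all;
}
```

Because trials are independent, throughput scales with the number of cores as long as the RNG streams do not overlap.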

Code Examples

Sampling a triangular density p(x)=2x on [0,1] using a uniform proposal
#include <bits/stdc++.h>
using namespace std;

// Target: p(x) = 2x on [0,1], normalized
// Proposal: q(x) = 1 on [0,1] (Uniform), easy to sample
// Envelope: M = sup_x p(x)/q(x) = 2

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(12345); // Fixed seed for reproducibility
    std::uniform_real_distribution<double> U01(0.0, 1.0);

    const double M = 2.0;    // Envelope constant
    const size_t N = 200000; // Number of accepted samples desired

    vector<double> samples; samples.reserve(N);
    size_t proposals = 0;
    while (samples.size() < N) {
        double x = U01(rng);                // X ~ q (Uniform[0,1])
        double u = U01(rng);                // U ~ Uniform[0,1]
        double px = 2.0 * x;                // p(x)
        double qx = 1.0;                    // q(x)
        double accept_prob = px / (M * qx); // = x, since M = 2
        proposals++;
        if (u <= accept_prob) {
            samples.push_back(x);
        }
    }

    // Diagnostic: empirical mean should be E[X] = 2/3 for p(x) = 2x
    double mean = accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    cout << fixed << setprecision(6);
    cout << "Accepted: " << samples.size() << "/" << proposals
         << " (rate = " << (double)samples.size()/proposals << ")\n";
    cout << "Sample mean ~ " << mean << " (theory 2/3 = 0.666667)\n";

    return 0;
}

We sample from the triangular density p(x)=2x by proposing from Uniform(0,1) and accepting with probability x. The bound M=2 ensures p(x) ≤ M q(x) for all x. The acceptance rate should approach 1/M = 0.5, and the empirical mean should be near 2/3, verifying correctness.

Time: O(n) expected for n accepted samples (expected 2 proposals per acceptance)
Space: O(n) to store samples, O(1) otherwise
Uniform points in the unit disk via square proposals (2D geometric rejection)
#include <bits/stdc++.h>
using namespace std;

// Target: Uniform on the unit disk {(x,y): x^2 + y^2 <= 1}
// Proposal: Uniform on square [-1,1] x [-1,1]
// Envelope: M = area(square)/area(circle) = 4/pi; acceptance test reduces to x^2 + y^2 <= 1

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(2024);
    std::uniform_real_distribution<double> U(-1.0, 1.0);

    const size_t N = 200000; // desired accepted points
    size_t accepted = 0, proposals = 0;
    vector<pair<double,double>> pts; pts.reserve(N);

    while (accepted < N) {
        double x = U(rng);
        double y = U(rng);
        proposals++;
        if (x*x + y*y <= 1.0) { // accept if inside circle
            pts.emplace_back(x, y);
            accepted++;
        }
    }

    cout << fixed << setprecision(6);
    double rate = (double)accepted / proposals; // should be close to pi/4
    cout << "Accepted: " << accepted << "/" << proposals
         << " (rate ~ " << rate << ") vs theory pi/4 = " << (M_PI/4.0) << "\n";

    // Optional: estimate pi via 4 * acceptance_rate
    cout << "Estimated pi ~ " << 4.0 * rate << "\n";

    return 0;
}

This classic geometric rejection sampler proposes uniformly over a square and accepts points inside the unit circle. The acceptance rate approximates π/4, and all accepted points are IID uniform on the disk. It illustrates a clean case where the acceptance test reduces to a simple geometric predicate.

Time: O(n) expected for n accepted points (expected 4/π ≈ 1.273 proposals per acceptance)
Space: O(n) to store points, O(1) otherwise
Generic 1D rejection sampler (log-safe) sampling from p̃(x) = exp(−x⁴) using a Normal proposal
#include <bits/stdc++.h>
using namespace std;

// Goal: Sample from target with unnormalized density tilde{p}(x) = exp(-x^4) on R.
// Proposal: q(x) = Normal(0, 1) with density phi(x) = (1/sqrt(2*pi)) * exp(-x^2/2).
// We need M >= sup_x tilde{p}(x) / q(x). An analytic bound is possible, but here we numerically
// overestimate on a wide grid and add a generous safety margin. WARNING: In production,
// prefer a proven bound; numerical search can under-estimate if the grid is too coarse.

struct NormalPDF {
    // log density of N(0,1)
    static double logpdf(double x) {
        static const double LOG_SQRT_2PI = 0.5 * log(2.0 * M_PI);
        return -LOG_SQRT_2PI - 0.5 * x * x;
    }
};

// Unnormalized target log-density: log tilde{p}(x) = -x^4
static inline double log_tilde_p(double x) { return -(x*x)*(x*x); }

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::mt19937 rng(7);
    std::normal_distribution<double> N01(0.0, 1.0);
    std::uniform_real_distribution<double> U01(0.0, 1.0);

    // 1) Find a conservative M by scanning r(x) = tilde{p}(x)/q(x) over [-R, R].
    //    The true sup occurs near x=0 for this pair; tails are safe since exp(-x^4)
    //    decays faster than the Gaussian.
    const double R = 6.0;   // search radius
    const int G = 200000;   // grid points
    double max_log_r = -numeric_limits<double>::infinity();
    for (int i = 0; i <= G; ++i) {
        double x = -R + (2.0*R)*i / G;
        double lr = log_tilde_p(x) - NormalPDF::logpdf(x); // log ratio
        if (lr > max_log_r) max_log_r = lr;
    }
    double M = exp(max_log_r) * 1.10; // add 10% safety margin

    cerr << fixed << setprecision(6);
    cerr << "Estimated M (with margin): " << M << "\n";

    // 2) Draw samples via rejection using a log-safe acceptance test
    const size_t N = 100000; // number of accepted samples desired
    vector<double> xs; xs.reserve(N);
    size_t proposals = 0;

    while (xs.size() < N) {
        double x = N01(rng); // X ~ q
        double u = U01(rng); // U ~ Uniform(0,1)
        // log acceptance = log tilde{p}(x) - log M - log q(x)
        double log_alpha = log_tilde_p(x) - log(M) - NormalPDF::logpdf(x);
        proposals++;
        if (log(u) <= log_alpha) {
            xs.push_back(x);
        }
    }

    // Diagnostics: symmetry implies mean ~ 0; the variance is finite,
    // Gamma(3/4)/Gamma(1/4) ~ 0.34 for this target
    double mean = accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
    double var = 0.0;
    for (double v : xs) var += (v - mean) * (v - mean);
    var /= xs.size();

    cout << fixed << setprecision(6);
    cout << "Accepted: " << xs.size() << "/" << proposals
         << " (rate = " << (double)xs.size()/proposals << ")\n";
    cout << "Mean ~ " << mean << ", Variance ~ " << var << "\n";

    return 0;
}

We target the unnormalized density proportional to exp(−x^4), propose from N(0,1), and compute a conservative bound M by scanning the log-ratio on a wide grid with a safety margin. Acceptance uses a log-space test to avoid underflow. This illustrates a practical, reusable pattern: provide target log-density, proposal sampler and log-density, choose M, then loop until enough samples are accepted. Note the caution that numeric bounds should be validated or analytically justified in rigorous settings.

Time: O(n) expected for n accepted samples, with a constant factor ≈ M/Z proposals per sample, plus a one-time O(G) grid scan
Space: O(n) to store samples, O(1) otherwise
#rejection-sampling #accept-reject #proposal-distribution #envelope-bound #acceptance-rate #unnormalized-density #monte-carlo #importance-sampling #metropolis-hastings #tail-behavior #geometric-distribution #random-number-generation #log-space #iid-samples