Metropolis-Hastings Algorithm
Key Points
- Metropolis–Hastings is a clever accept/reject method that lets you sample from complex probability distributions using only an unnormalized density.
- You propose a move using an easier distribution q and accept it with probability α = min(1, [π(x')q(x|x')] / [π(x)q(x'|x)]).
- If the proposal is symmetric (q(x'|x) = q(x|x')), the acceptance ratio simplifies to α = min(1, π(x')/π(x)).
- Using log-densities prevents numerical underflow and makes the algorithm more stable in practice.
- The method constructs a Markov chain whose stationary distribution is the target, thanks to detailed balance.
- Tuning the proposal scale is crucial: too small gives slow exploration; too large gives low acceptance.
- Burn-in, thinning, and monitoring autocorrelation help turn the raw chain into reliable estimates.
- Time per step is dominated by evaluating the target (and proposal), giving overall time roughly O(T) for T iterations, but effective sample size depends on mixing.
Prerequisites
- Probability densities and mass functions — You must understand what a distribution is, how to evaluate it (possibly up to a constant), and the meaning of support.
- Markov chains — MH constructs a Markov chain; concepts like transition kernels, stationary distribution, and ergodicity are essential.
- Logarithms and exponential functions — Acceptance probabilities are computed in the log domain to avoid numerical issues.
- Random number generation — You need to sample from proposals and uniforms for accept/reject; understanding RNG seeding ensures reproducibility.
- Basic calculus and linear algebra (for multivariate targets) — Evaluating multivariate densities and tuning covariance structures often requires this background.
- Bayes’ theorem (for Bayesian applications) — Posterior targets are defined by likelihood × prior up to a normalizing constant.
Detailed Explanation
01 Overview
The Metropolis–Hastings (MH) algorithm is a foundational technique in Markov Chain Monte Carlo (MCMC) used to draw samples from probability distributions that are hard to sample from directly. Instead of generating independent samples, MH builds a Markov chain whose long-run behavior (its stationary distribution) matches the target distribution π you care about—often known only up to a constant of proportionality. At each step, the algorithm proposes a new point using a simpler distribution q, then decides whether to accept or reject the move using an acceptance probability chosen to make the chain spend time in regions proportional to π. This accept/reject rule corrects for any bias introduced by the proposal mechanism, allowing even asymmetric proposals to work correctly. Because you only need to evaluate the target density up to a constant, MH is widely used in Bayesian statistics, physics, and optimization-like settings where normalizing constants are unknown or expensive to compute. While conceptually simple, practical performance depends on careful tuning of the proposal distribution and diagnostics to assess convergence and mixing. MH’s flexibility allows it to tackle high-dimensional and multi-modal targets, though alternative MCMC methods may outperform it in some challenging scenarios.
02 Intuition & Analogies
Imagine hiking a landscape at night with a headlamp. The landscape’s height at position x represents how plausible that x is under the target distribution π(x)—higher means more likely. You can’t see the entire map, and you don’t know how it’s globally normalized, but you can measure the relative height at any single spot you shine your light on (evaluate π(x) up to a constant). At each step, you suggest a nearby place to move (the proposal q), like taking a stride of a certain length in a random direction. If the new place is higher (more plausible), you gladly move there. If it’s lower, you sometimes still move there, with a chance that depends on how much lower it is. This occasional willingness to go downhill is critical: it keeps you from getting stuck on local peaks and ensures fair exploration. The accept/reject rule is like a fairness auditor—it adjusts your willingness to move based on how you chose the step. If your stepping rule is biased (asymmetric), the auditor compensates so your long-term walk still spends the right fraction of time in each region of the landscape. Over time, your footsteps trace out a path that visits places in proportion to their plausibility under π. Averaging your visited points (or functions of them) then gives accurate estimates of expectations with respect to the target distribution, without ever needing its normalizing constant.
03 Formal Definition
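In symbols, one sweep of the algorithm can be stated as follows (π is the target density, known up to a constant, and q is the proposal):

```latex
\textbf{Input:} unnormalized target $\pi$, proposal $q(\cdot \mid x)$,
initial state $x_0$, number of iterations $T$.

For $t = 0, 1, \dots, T - 1$:
\begin{enumerate}
  \item Draw a proposal $x' \sim q(\cdot \mid x_t)$.
  \item Compute the acceptance probability
        $\alpha(x_t, x') = \min\!\left(1,\
        \frac{\pi(x')\, q(x_t \mid x')}{\pi(x_t)\, q(x' \mid x_t)}\right)$.
  \item Draw $u \sim \mathrm{Uniform}(0, 1)$; set $x_{t+1} = x'$ if
        $u < \alpha(x_t, x')$, otherwise $x_{t+1} = x_t$.
\end{enumerate}

The resulting chain $(x_t)$ has $\pi$ as a stationary distribution.
```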
04 When to Use
Use Metropolis–Hastings when you need samples from a distribution that is hard to sample from directly but is easy to evaluate up to a constant. Common cases include Bayesian posterior distributions where the evidence (normalizing constant) is unknown, models with complex likelihoods, and high-dimensional integrals where quadrature is impossible. Choose MH when you can design a reasonable proposal q: for local exploration, a random-walk proposal is simple and robust; for heavy-tailed or constrained domains (e.g., positive parameters), asymmetric proposals like log-normal or Gamma proposals can be better. MH is also useful as a building block inside larger samplers (e.g., within Gibbs steps) or for discrete targets where neighborhood proposals are natural. Avoid plain random-walk MH when the target is very high-dimensional and strongly correlated—gradient-based samplers (e.g., Hamiltonian Monte Carlo) or adaptive/blocked proposals may mix faster. For multi-modal targets with widely separated modes, consider tempered transitions or multiple chains to improve exploration. Always plan for burn-in, tuning, and convergence diagnostics when using MH in practice.
⚠️ Common Mistakes
- Ignoring proposal asymmetry: Forgetting the q(x|x')/q(x'|x) factor leads to biased sampling. Always include the Hastings correction unless the proposal is provably symmetric.
- Working in probability instead of log-probability: Multiplying tiny numbers underflows to zero. Compute in the log domain and compare log u with log α to avoid numerical issues.
- Poor proposal scale: Steps that are too small yield high acceptance but slow exploration; steps that are too large are rejected frequently and the chain stalls. Tune the scale to achieve a reasonable acceptance rate (often 0.2–0.5 for random-walk proposals, though problem-dependent).
- Violating support constraints: Proposing negative values for parameters that must be positive (e.g., variances) leads to invalid evaluations. Use constrained proposals (e.g., log-space random walk) or reflect/reject invalid proposals carefully.
- Insufficient burn-in and diagnostics: Using early samples before the chain forgets its start can bias estimates. Discard burn-in, check trace plots and autocorrelation, and run multiple chains if possible.
- Storing every sample: Keeping all T states can be memory-heavy. Either stream estimates online or thin judiciously; better yet, store summaries and periodic checkpoints.
- Bad seeding or reused RNG streams: For reproducibility and independence across runs, manage seeds and do not reuse overlapping random streams.
- Not checking for NaN/Inf: Propagating invalid log-probabilities silently breaks the chain. Guard against domain errors and return -inf for impossible states.
Key Formulas
Metropolis–Hastings Acceptance

$$\alpha(x, x') = \min\left(1,\ \frac{\pi(x')\,q(x \mid x')}{\pi(x)\,q(x' \mid x)}\right)$$

Explanation: The probability of accepting a proposed move from x to x' depends on how well x' fits the target and on how the proposal treats forward and reverse moves. This corrects for asymmetric proposals and ensures the Markov chain has the desired stationary distribution.
Symmetric Proposal Simplification

$$\alpha(x, x') = \min\left(1,\ \frac{\pi(x')}{\pi(x)}\right)$$

Explanation: When the proposal is symmetric (e.g., a Gaussian random walk, so q(x'|x) = q(x|x')), the q terms cancel. Acceptance depends only on the target density ratio.
MH Transition Kernel

$$P(x, dy) = q(y \mid x)\,\alpha(x, y)\,dy + r(x)\,\delta_x(dy), \qquad r(x) = 1 - \int q(y \mid x)\,\alpha(x, y)\,dy$$

Explanation: The next state is either the proposed y, accepted with probability α(x, y), or the chain stays at x with the remaining probability r(x). This kernel defines the Markov chain.
Detailed Balance for MH

$$\pi(x)\,q(x' \mid x)\,\alpha(x, x') = \pi(x')\,q(x \mid x')\,\alpha(x', x)$$

Explanation: This equality ensures reversibility and guarantees that π is stationary for the Metropolis–Hastings chain.
Stationarity

$$\int \pi(x)\,P(x, A)\,dx = \pi(A) \quad \text{for every measurable set } A$$

Explanation: Applying the transition kernel to a π-distributed state leaves the distribution unchanged. This is the key correctness property of MH.
Log-Domain Acceptance

$$\log \alpha = \min\bigl(0,\ \log \pi(x') - \log \pi(x) + \log q(x \mid x') - \log q(x' \mid x)\bigr)$$

Explanation: Computing in the log domain avoids underflow/overflow when densities are very small. You can compare log u with log α instead of forming α directly.
MCMC Law of Large Numbers

$$\frac{1}{T} \sum_{t=1}^{T} f(X_t) \xrightarrow{\ \text{a.s.}\ } \mathbb{E}_{\pi}[f(X)] \quad \text{as } T \to \infty$$

Explanation: Assuming ergodicity, sample averages along the chain converge almost surely to the true expectation. This justifies using MH samples for estimation.
Effective Sample Size

$$\mathrm{ESS} \approx \frac{T}{1 + 2\sum_{k=1}^{\infty} \rho_k}$$

Explanation: Autocorrelation reduces effective information. This approximation shows how positive lag-k autocorrelations ρ_k inflate estimator variance and shrink the equivalent number of independent samples.
Complexity Analysis

Each iteration performs one proposal draw plus one evaluation of the target log-density (the current state's value can be cached), so T iterations cost roughly O(T · C), where C is the cost of a single target evaluation. Memory is O(1) if estimates are streamed, or O((T − burn-in)/thin) if samples are stored. The statistical payoff, however, is governed by the effective sample size rather than the raw iteration count: a poorly mixing chain can require many more iterations for the same accuracy.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Compute log-sum-exp of two log-values safely
static inline double logsumexp(double a, double b) {
    if (a > b) return a + log1p(exp(b - a));
    else       return b + log1p(exp(a - b));
}

// Log-density of Normal(mu, sigma)
static inline double log_normal(double x, double mu, double sigma) {
    static const double LOG_SQRT_2PI = 0.5 * log(2.0 * M_PI);
    double z = (x - mu) / sigma;
    return -LOG_SQRT_2PI - log(sigma) - 0.5 * z * z; // exact log-pdf
}

// Unnormalized log target: mixture 0.5*N(0,1) + 0.5*N(5, 0.5)
// We include proper component normalizing constants so weights are meaningful.
static inline double log_target(double x) {
    double a = log(0.5) + log_normal(x, 0.0, 1.0);
    double b = log(0.5) + log_normal(x, 5.0, 0.5);
    return logsumexp(a, b); // log of sum of weighted components
}

struct MHResult { vector<double> samples; double accept_rate; double mean; double var; };

MHResult metropolis_random_walk(
    double x0,            // initial value
    int iterations,       // total iterations
    int burn_in,          // how many to discard
    int thin,             // keep every 'thin'-th sample
    double proposal_std,  // std of N(0, proposal_std^2)
    uint64_t seed         // RNG seed
) {
    mt19937_64 rng(seed);
    normal_distribution<double> norm01(0.0, 1.0);
    uniform_real_distribution<double> unif(0.0, 1.0);

    vector<double> out;
    out.reserve(max(0, (iterations - burn_in) / max(1, thin)) + 1);

    double x = x0;
    double logp_x = log_target(x); // cache current log π(x)
    int accepted = 0;

    // Random-walk proposal: x' = x + epsilon, epsilon ~ N(0, proposal_std^2)
    for (int t = 1; t <= iterations; ++t) {
        double x_prop = x + proposal_std * norm01(rng);
        double logp_prop = log_target(x_prop);

        // Symmetric proposal => log q terms cancel
        double log_alpha = min(0.0, logp_prop - logp_x);
        // Accept/reject in log-domain
        if (log(unif(rng)) < log_alpha) {
            x = x_prop;
            logp_x = logp_prop;
            ++accepted;
        }

        // Record after burn-in and per thinning schedule
        if (t > burn_in && ((t - burn_in) % max(1, thin) == 0)) {
            out.push_back(x);
        }
    }

    // Compute simple summaries
    double mean = 0.0;
    for (double v : out) mean += v;
    mean /= max<size_t>(1, out.size());
    double var = 0.0;
    for (double v : out) var += (v - mean) * (v - mean);
    var /= max<size_t>(1, out.size());

    MHResult res;
    res.samples = move(out);
    res.accept_rate = (double)accepted / (double)iterations;
    res.mean = mean;
    res.var = var;
    return res;
}

int main() {
    // Settings
    double x0 = -3.0;          // start near one mode
    int iterations = 20000;    // total MH steps
    int burn_in = 2000;        // discard initial steps
    int thin = 5;              // keep every 5th sample
    double proposal_std = 1.0; // tune for acceptance ~ 0.2-0.5
    uint64_t seed = 123456789ULL;

    MHResult res = metropolis_random_walk(x0, iterations, burn_in, thin, proposal_std, seed);

    cout.setf(std::ios::fixed); cout << setprecision(4);
    cout << "Collected samples: " << res.samples.size() << "\n";
    cout << "Acceptance rate: " << res.accept_rate << "\n";
    cout << "Sample mean: " << res.mean << "\n";
    cout << "Sample variance: " << res.var << "\n";

    // Print first few samples
    cout << "First 10 samples:" << '\n';
    for (size_t i = 0; i < min<size_t>(10, res.samples.size()); ++i) {
        cout << res.samples[i] << (i + 1 == 10 ? '\n' : ' ');
    }
    return 0;
}
```
This program samples from a 1D mixture of two Gaussians using a Gaussian random-walk proposal. Because the proposal is symmetric, the acceptance probability reduces to α = min(1, π(x')/π(x)). All calculations are done in the log domain for stability. Burn-in and thinning are supported, and acceptance rate plus simple summaries are reported. You can adjust the proposal_std to trade off acceptance rate and exploration.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Log of unnormalized Gamma(k, theta) density: x^{k-1} e^{-x/theta}, support x>0
static inline double log_unnorm_gamma(double x, double k, double theta) {
    if (x <= 0.0) return -INFINITY;          // outside support
    return (k - 1.0) * log(x) - (x / theta); // ignore constants independent of x
}

struct MHResult { vector<double> samples; double accept_rate; };

MHResult mh_log_normal_proposal(
    double x0, int iterations, int burn_in, int thin,
    double k, double theta,  // target Gamma shape k and scale theta
    double prop_sigma,       // std of log-space random walk
    uint64_t seed
) {
    mt19937_64 rng(seed);
    normal_distribution<double> norm01(0.0, 1.0);
    uniform_real_distribution<double> unif(0.0, 1.0);

    vector<double> out;
    out.reserve(max(0, (iterations - burn_in) / max(1, thin)) + 1);

    double x = max(1e-6, x0); // ensure positive start
    double logp_x = log_unnorm_gamma(x, k, theta);
    int accepted = 0;

    for (int t = 1; t <= iterations; ++t) {
        // Propose multiplicative step: y = x * exp(sigma * Z), Z ~ N(0,1)
        double z = norm01(rng);
        double y = x * exp(prop_sigma * z);

        // Compute log acceptance with Hastings correction
        double logp_y = log_unnorm_gamma(y, k, theta);
        // For log-normal proposals: log q(x|y) - log q(y|x) = log(y/x)
        double log_hastings = log(y) - log(x);
        double log_alpha = min(0.0, (logp_y - logp_x) + log_hastings);

        if (log(unif(rng)) < log_alpha) {
            x = y;
            logp_x = logp_y;
            ++accepted;
        }

        if (t > burn_in && ((t - burn_in) % max(1, thin) == 0)) {
            out.push_back(x);
        }
    }

    MHResult res;
    res.samples = move(out);
    res.accept_rate = (double)accepted / (double)iterations;
    return res;
}

int main() {
    double x0 = 1.0;             // positive start
    int iterations = 30000;
    int burn_in = 3000;
    int thin = 10;
    double k = 2.0, theta = 1.5; // target Gamma parameters
    double prop_sigma = 0.5;     // std of log-space move; tune for acceptance ~ 0.2-0.5
    uint64_t seed = 42ULL;

    MHResult res = mh_log_normal_proposal(x0, iterations, burn_in, thin, k, theta, prop_sigma, seed);

    cout.setf(std::ios::fixed); cout << setprecision(4);
    cout << "Kept samples: " << res.samples.size() << "\n";
    cout << "Acceptance rate: " << res.accept_rate << "\n";

    // Report simple summaries
    double mean = 0.0;
    for (double v : res.samples) mean += v;
    mean /= max<size_t>(1, res.samples.size());
    double var = 0.0;
    for (double v : res.samples) var += (v - mean) * (v - mean);
    var /= max<size_t>(1, res.samples.size());
    cout << "Sample mean: " << mean << "\n";
    cout << "Sample variance: " << var << "\n";

    // Print first few samples
    cout << "First 10 samples:\n";
    for (size_t i = 0; i < min<size_t>(10, res.samples.size()); ++i)
        cout << res.samples[i] << (i + 1 == 10 ? '\n' : ' ');
    return 0;
}
```
This example samples a Gamma target on (0, ∞) using a multiplicative log-normal random-walk proposal. Because the proposal is asymmetric, the Hastings correction appears as log q(x|y) − log q(y|x) = log(y/x). Working in log-space preserves numerical stability and naturally enforces the positivity constraint of the target. The program reports acceptance rate and simple summaries.