Hamiltonian Monte Carlo (HMC)
Key Points
- Hamiltonian Monte Carlo (HMC) uses gradients of the log-density to propose long-distance moves that still land in high-probability regions.
- It treats sampling as simulating a frictionless physical system with position (parameters) and momentum (auxiliary variables).
- The leapfrog integrator makes proposals that are reversible and volume-preserving, enabling a Metropolis correction with high acceptance rates.
- Per-iteration cost is proportional to the number of leapfrog steps times the cost of computing the gradient of the log-density.
- HMC excels in moderate-to-high dimensions and for strongly correlated posteriors where random-walk MCMC struggles.
- Proper tuning of step size, number of steps, and mass matrix dramatically affects performance and acceptance rate.
- HMC is not suitable for discrete or non-differentiable parameters because it requires gradients.
- NUTS (No-U-Turn Sampler) builds on HMC by automatically choosing the trajectory length, reducing manual tuning.
Prerequisites
- Multivariable calculus (gradients) — HMC requires computing gradients of the log-density and understanding how they guide motion.
- Linear algebra (vectors, matrices) — Leapfrog updates and mass matrices require basic vector/matrix operations.
- Probability and Bayesian inference — Understanding target distributions, posteriors, priors, and likelihoods is essential.
- Markov Chain Monte Carlo (MCMC) basics — HMC is an MCMC method; knowledge of Metropolis-Hastings and detailed balance helps.
- Numerical integration of ODEs — HMC uses symplectic integrators (leapfrog) to approximate Hamiltonian dynamics.
- C++ programming and random number generation — Implementing samplers requires writing efficient, reproducible code.
- Data preprocessing and scaling — Standardization improves the geometry for HMC and allows larger stable step sizes.
Detailed Explanation
Overview
Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo (MCMC) algorithm that uses gradient information to explore probability distributions efficiently. Instead of taking short, random steps (like a random walk), HMC simulates the motion of a particle moving through a landscape defined by the negative log probability (potential energy). By introducing auxiliary momentum variables and integrating Hamilton’s equations with a symplectic integrator (typically leapfrog), HMC proposes distant states that still have high probability, thereby reducing autocorrelation and improving mixing.
The core idea is to alternate between sampling a fresh momentum from a Gaussian distribution and then following a nearly energy-conserving path through the parameter space for several steps before applying a Metropolis acceptance test. Because the leapfrog method is time-reversible and volume-preserving, the proposals are accepted with relatively high probability if the step size is tuned properly. This makes HMC particularly powerful in higher dimensions and for posteriors with strong correlations where random-walk proposals are inefficient.
HMC requires the gradient of the log target density, which restricts its use to differentiable models but yields large gains when available. Practical performance depends on selecting the step size, number of leapfrog steps, and an appropriate mass matrix that matches the geometry of the target. Variants like the No-U-Turn Sampler (NUTS) automate trajectory length selection and often include adaptation of both step size and mass matrix during warm-up.
Intuition & Analogies
Imagine hiking across a mountainous landscape where altitude equals negative log probability: valleys are high-probability regions, and peaks are low-probability regions. A basic random-walk sampler takes small, jittery steps in random directions; it gets stuck meandering in valleys, repeatedly proposing moves uphill that are frequently rejected. Progress is slow and inefficient.
HMC gives you a map with slope directions (the gradient) and a way to build momentum so you can glide along the valley floor. You start by giving yourself a push (sample momentum), then follow the terrain’s shape using the gradient to steer. Because you move with momentum, you travel longer distances along contours of nearly constant altitude, visiting many plausible points before losing speed. This produces proposals that are far apart but still land in promising areas, leading to faster exploration and lower autocorrelation.
The leapfrog integrator is like a careful sequence of mini-steps that keeps total energy nearly constant as you move. Reversibility (you can exactly retrace your path backward) and volume preservation (you don’t squish or stretch space) ensure that, after your flight, a simple Metropolis check can correct for small numerical errors. If the energy barely changes, your proposal is accepted most of the time. Intuitively, HMC replaces clumsy, uphill-prone stumbling with smooth, physics-informed gliding that respects the landscape’s geometry.
Formal Definition
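The formal statement is missing from this section; it can be reconstructed from the standard setup used throughout this article:

```latex
% Target \pi(x) on \mathbb{R}^d, augmented with momentum p \sim \mathcal{N}(0, M):
U(x) = -\log \pi(x), \qquad
K(p) = \tfrac{1}{2}\, p^\top M^{-1} p, \qquad
H(x, p) = U(x) + K(p).
% The joint density \propto \exp(-H(x,p)) = \pi(x)\,\exp(-K(p)) has \pi as its
% x-marginal, so discarding p from joint samples yields samples of \pi.
% Hamiltonian dynamics:
\frac{dx}{dt} = \frac{\partial H}{\partial p} = M^{-1} p, \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial x} = -\nabla U(x).
% One HMC transition: sample p \sim \mathcal{N}(0, M), integrate the dynamics
% for L leapfrog steps of size \varepsilon to obtain (x', p'), and accept x'
% with probability
\alpha = \min\!\bigl(1,\; \exp\bigl(H(x, p) - H(x', p')\bigr)\bigr).
```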
When to Use
- Continuous, differentiable targets: Use HMC when your log-density is differentiable and gradients are available or cheaply computable (e.g., models with smooth priors/likelihoods).
- Moderate-to-high dimensions: HMC shines as dimension grows, where random-walk proposals degrade severely. It manages correlated, elongated posteriors well by moving along level sets.
- Complex posteriors with strong correlations: Hierarchical Bayesian models or logistic/Poisson regression with informative priors benefit significantly.
- Expensive likelihoods where fewer, higher-quality proposals are better than many cheap, low-quality ones.
- When you can precondition: If you can estimate scaling/rotation (mass matrix) from warm-up, HMC performance improves dramatically.
Avoid or modify HMC when parameters are discrete or piecewise constant (no gradients exist), when the log-density is very rough or non-differentiable, or when you cannot compute or approximate gradients. If hand-tuning the number of leapfrog steps L is difficult, prefer NUTS, which adapts trajectory length automatically. If curvature varies widely across the posterior, consider Riemannian HMC or adapt a dense mass matrix. For extremely multimodal targets with distant, isolated modes, HMC can still struggle to jump between modes; tempered methods may help.
Common Mistakes
- Wrong sign for gradients: the potential is U(x) = −log π(x), so −∇U(x) = ∇ log π(x). The leapfrog momentum update subtracts ε ∇U; mixing up these signs leads to poor acceptance or divergence. Define one convention and stick to it.
- Step size too large: a large ε causes energy drift and rejections; one too small wastes computation. Aim for an average acceptance rate around 60–90%, often near 65–80% in higher dimensions.
- Too few or too many leapfrog steps: too few makes proposals local (like a random walk); too many can loop back (wasting computation) or accumulate numerical error. NUTS mitigates this automatically.
- Ignoring the mass matrix: using M = I for anisotropic targets forces tiny stable step sizes. Estimate a diagonal or dense M from warm-up to match posterior scales and correlations.
- Not centering/scaling parameters: poorly scaled parameters amplify stiffness, forcing a tiny ε. Standardize predictors and reparameterize when possible.
- Using non-symplectic integrators: forward Euler breaks reversibility and volume preservation, biasing the sampler even with a Metropolis correction. Use leapfrog or a higher-order symplectic scheme.
- Handling momentum inconsistently at accept/reject: a reversible proposal formally requires negating the momentum at the end of the trajectory; because the kinetic energy is symmetric in p, the negation can be omitted in code, but the accept/reject step must still compare the correct Hamiltonians H(x, p) and H(x′, p′).
- Insufficient warm-up: without step-size and mass adaptation, chains can diverge or mix poorly. Use dual averaging for ε and estimate M during warm-up.
- Miscomputing the logistic regression gradient: remember ∇U = −∇ log π; the sigmoid data-fit term and the prior term must enter with consistent signs.
Key Formulas
Potential Energy
U(x) = −log π(x)
Explanation: Defines the energy landscape from the target density. Lower U corresponds to higher-probability regions; HMC moves along this landscape.
Kinetic Energy
K(p) = (1/2) pᵀ M⁻¹ p
Explanation: Relates momentum to energy via the mass matrix M. With M = I, it reduces to half the squared Euclidean norm of p.
Hamiltonian
H(x, p) = U(x) + K(p)
Explanation: Total energy used in the Metropolis correction. Leapfrog approximately conserves H, so proposals are often accepted.
Hamilton’s Equations
dx/dt = ∂H/∂p = M⁻¹ p,  dp/dt = −∂H/∂x = −∇U(x)
Explanation: Continuous-time dynamics that HMC discretizes. They define how position and momentum evolve along the trajectory.
Leapfrog Half-Step (Momentum)
p(t + ε/2) = p(t) − (ε/2) ∇U(x(t))
Explanation: First half-step update of momentum using the gradient of the potential. It keeps the integrator time-reversible and stable.
Leapfrog Full-Step (Position)
x(t + ε) = x(t) + ε M⁻¹ p(t + ε/2)
Explanation: Update position using the intermediate momentum. This is the central position update of leapfrog.
Leapfrog Half-Step (Momentum 2)
p(t + ε) = p(t + ε/2) − (ε/2) ∇U(x(t + ε))
Explanation: Second half-step update of momentum using the new position. Together, the three updates form a reversible, volume-preserving step.
Metropolis Acceptance
α = min(1, exp(−ΔH)),  ΔH = H(x′, p′) − H(x, p)
Explanation: Accept or reject the proposal based on the change in total energy. Small integration error (small ΔH) leads to high acceptance.
Logistic Sigmoid
σ(z) = 1 / (1 + e^(−z))
Explanation: Maps real numbers to (0, 1); used in logistic regression likelihoods. Its derivative is σ(z)(1 − σ(z)).
Logistic Regression Potential Gradient
∇U(w) = Σᵢ (σ(xᵢᵀ w) − yᵢ) xᵢ + w / σ₀²
Explanation: For a Gaussian prior w ~ N(0, σ₀² I), the gradient of the potential includes the data-fit term and the prior penalty term.
Per-Iteration Complexity
O(L · C∇), where C∇ is the cost of one gradient evaluation
Explanation: Each leapfrog step requires a gradient. For simple targets C∇ is O(d); for generalized linear models it is O(nd).
Complexity Analysis
One HMC iteration performs L leapfrog steps, each requiring one gradient evaluation, so the per-iteration cost is O(L · C∇). For the correlated Gaussian example below, C∇ = O(d²) (a matrix–vector product with Σ⁻¹); for logistic regression, C∇ = O(nd). Over N iterations the total cost is O(N · L · C∇), with O(d) extra memory for a diagonal mass matrix or O(d²) for a dense one.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

struct HMC {
    int d;       // dimension
    double eps;  // step size
    int L;       // number of leapfrog steps
    mt19937 rng;
    normal_distribution<double> stdnorm{0.0, 1.0};
    uniform_real_distribution<double> unif{0.0, 1.0};

    vector<double> mu;                // mean
    vector<vector<double>> invSigma;  // inverse covariance (precomputed)

    HMC(int dim, double step, int steps, const vector<double>& mu_,
        const vector<vector<double>>& invS)
        : d(dim), eps(step), L(steps), mu(mu_), invSigma(invS) {
        random_device rd; rng.seed(rd());
    }

    // Vector utilities
    static vector<double> sub(const vector<double>& a, const vector<double>& b) {
        vector<double> c(a.size());
        for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] - b[i];
        return c;
    }

    static double dot(const vector<double>& a, const vector<double>& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x) const {
        vector<double> y(A.size(), 0.0);
        for (size_t i = 0; i < A.size(); ++i)
            for (size_t j = 0; j < A[i].size(); ++j) y[i] += A[i][j] * x[j];
        return y;
    }

    // Potential U(x) = 0.5 * (x - mu)^T invSigma (x - mu) (constant dropped)
    double U(const vector<double>& x) const {
        vector<double> xm = sub(x, mu);
        return 0.5 * dot(xm, matvec(invSigma, xm));
    }

    // Gradient of U: grad U = invSigma * (x - mu)
    vector<double> gradU(const vector<double>& x) const {
        return matvec(invSigma, sub(x, mu));
    }

    // Single HMC transition from state x0
    vector<double> step(const vector<double>& x0, bool& accepted, double& dH) {
        // Sample momentum p ~ N(0, I)
        vector<double> p0(d);
        for (int i = 0; i < d; ++i) p0[i] = stdnorm(rng);

        // Initial energy
        double H0 = U(x0) + 0.5 * dot(p0, p0);

        vector<double> x = x0;
        vector<double> p = p0;

        // Leapfrog integration: half momentum step, L position steps with
        // interleaved full momentum steps, final half momentum step
        vector<double> g = gradU(x);
        for (int i = 0; i < d; ++i) p[i] -= 0.5 * eps * g[i];
        for (int t = 0; t < L; ++t) {
            for (int i = 0; i < d; ++i) x[i] += eps * p[i];  // full position step (M = I)
            g = gradU(x);
            if (t != L - 1)
                for (int i = 0; i < d; ++i) p[i] -= eps * g[i];
        }
        for (int i = 0; i < d; ++i) p[i] -= 0.5 * eps * g[i];
        // (Momentum negation is omitted: the kinetic energy is symmetric in p.)

        // Proposed energy and accept/reject on the energy difference
        double H1 = U(x) + 0.5 * dot(p, p);
        dH = H1 - H0;
        if (log(unif(rng)) < -dH) { accepted = true; return x; }
        accepted = false;
        return x0;
    }
};

int main() {
    // 2D Gaussian with mean [0, 0], sds 1.0 and 0.5, correlation 0.8
    int d = 2;
    vector<double> mu = {0.0, 0.0};
    double rho = 0.8, s1 = 1.0, s2 = 0.5;
    // Sigma = [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]; invert analytically
    double s11 = s1 * s1, s22 = s2 * s2, s12 = rho * s1 * s2;
    double det = s11 * s22 - s12 * s12;
    vector<vector<double>> invSigma = {{ s22 / det, -s12 / det },
                                       { -s12 / det, s11 / det }};

    // HMC parameters
    double eps = 0.05;  // step size
    int L = 25;         // leapfrog steps

    HMC hmc(d, eps, L, mu, invSigma);

    vector<double> x = {2.0, -2.0};  // initial state
    int n_iter = 3000, burn = 500, accepted_count = 0;

    cout << fixed << setprecision(4);
    for (int it = 0; it < n_iter; ++it) {
        bool acc; double dH;
        x = hmc.step(x, acc, dH);
        accepted_count += acc ? 1 : 0;
        if (it % 500 == 0) {
            cout << "iter=" << it << ", x=[" << x[0] << ", " << x[1] << "]"
                 << ", accepted=" << acc << ", dH=" << dH << "\n";
        }
        if (it >= burn && it % 200 == 0) {
            // Print a posterior draw occasionally after burn-in
            cout << "post-sample: " << x[0] << ", " << x[1] << "\n";
        }
    }
    cout << "Acceptance rate: " << (double)accepted_count / n_iter << "\n";
    return 0;
}
```
This program samples from a 2D correlated Gaussian using HMC. The potential U and its gradient are analytic and cheap: U(x) = 0.5 (x-μ)^T Σ^{-1} (x-μ), ∇U = Σ^{-1}(x-μ). We perform L leapfrog steps with step size ε, compute the Hamiltonian difference, and accept or reject the proposal. Because M = I and the target is Gaussian, well-tuned ε and L yield high acceptance and long, efficient moves along the correlated directions.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct LogisticHMC {
    int n, d;
    double eps;     // step size
    int L;          // leapfrog steps
    double sigma0;  // prior std for weights

    vector<vector<double>> X;  // n x d design matrix
    vector<int> y;             // n labels in {0,1}

    mt19937 rng;
    normal_distribution<double> stdnorm{0.0, 1.0};
    uniform_real_distribution<double> unif{0.0, 1.0};

    LogisticHMC(const vector<vector<double>>& X_, const vector<int>& y_,
                double eps_, int L_, double sigma0_)
        : n((int)X_.size()), d((int)X_[0].size()), eps(eps_), L(L_),
          sigma0(sigma0_), X(X_), y(y_) {
        random_device rd; rng.seed(rd());
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

    // U(w) = -log posterior
    //      = -sum_i (y_i z_i - log(1 + exp(z_i))) + ||w||^2 / (2 sigma0^2),  z_i = x_i^T w
    double U(const vector<double>& w) const {
        double loss = 0.0;
        for (int i = 0; i < n; ++i) {
            double z = 0.0;
            for (int j = 0; j < d; ++j) z += X[i][j] * w[j];
            // log(1 + exp(z)) computed stably via log1p
            double log1pexp = (z > 0) ? z + log1p(exp(-z)) : log1p(exp(z));
            loss += -(y[i] * z - log1pexp);
        }
        double prior = 0.0;
        for (int j = 0; j < d; ++j) prior += 0.5 * (w[j] * w[j]) / (sigma0 * sigma0);
        return loss + prior;
    }

    // grad U(w) = sum_i (sigmoid(x_i^T w) - y_i) x_i + w / sigma0^2
    vector<double> gradU(const vector<double>& w) const {
        vector<double> g(d, 0.0);
        for (int i = 0; i < n; ++i) {
            double z = 0.0;
            for (int j = 0; j < d; ++j) z += X[i][j] * w[j];
            double coeff = sigmoid(z) - (double)y[i];  // derivative of negative log-likelihood
            for (int j = 0; j < d; ++j) g[j] += coeff * X[i][j];
        }
        double invs2 = 1.0 / (sigma0 * sigma0);
        for (int j = 0; j < d; ++j) g[j] += invs2 * w[j];
        return g;
    }

    // One HMC transition
    vector<double> step(const vector<double>& w0, bool& accepted, double& dH) {
        vector<double> p0(d);
        for (int j = 0; j < d; ++j) p0[j] = stdnorm(rng);  // M = I

        double H0 = U(w0) + 0.5 * inner_product(p0.begin(), p0.end(), p0.begin(), 0.0);

        vector<double> w = w0;
        vector<double> p = p0;

        // Leapfrog: half momentum, L position steps, final half momentum
        vector<double> g = gradU(w);
        for (int j = 0; j < d; ++j) p[j] -= 0.5 * eps * g[j];
        for (int t = 0; t < L; ++t) {
            for (int j = 0; j < d; ++j) w[j] += eps * p[j];
            g = gradU(w);
            if (t != L - 1)
                for (int j = 0; j < d; ++j) p[j] -= eps * g[j];
        }
        for (int j = 0; j < d; ++j) p[j] -= 0.5 * eps * g[j];

        double H1 = U(w) + 0.5 * inner_product(p.begin(), p.end(), p.begin(), 0.0);
        dH = H1 - H0;
        if (log(unif(rng)) < -dH) { accepted = true; return w; }
        accepted = false;
        return w0;
    }
};

// Generate a simple synthetic dataset for logistic regression
void make_dataset(int n, int d, vector<vector<double>>& X, vector<int>& y, mt19937& rng) {
    normal_distribution<double> N01(0.0, 1.0);
    vector<double> w_true(d, 0.0);
    for (int j = 0; j < d; ++j) w_true[j] = (j % 2 == 0 ? 1.5 : -0.8);  // true weights

    X.assign(n, vector<double>(d));
    y.assign(n, 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) X[i][j] = N01(rng);
        double z = 0.0;
        for (int j = 0; j < d; ++j) z += X[i][j] * w_true[j];
        bernoulli_distribution Bern(1.0 / (1.0 + exp(-z)));
        y[i] = Bern(rng);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Create synthetic data
    mt19937 rng(42);
    int n = 300, d = 3;
    vector<vector<double>> X;
    vector<int> y;
    make_dataset(n, d, X, y, rng);

    // Standardize columns (improves HMC stability)
    for (int j = 0; j < d; ++j) {
        double mean = 0.0;
        for (int i = 0; i < n; ++i) mean += X[i][j];
        mean /= n;
        double var = 0.0;
        for (int i = 0; i < n; ++i) { double v = X[i][j] - mean; var += v * v; }
        var /= n;
        double sd = sqrt(var) + 1e-9;
        for (int i = 0; i < n; ++i) X[i][j] = (X[i][j] - mean) / sd;
    }

    double eps = 0.01;    // step size (tune for your data)
    int L = 50;           // leapfrog steps
    double sigma0 = 5.0;  // prior std

    LogisticHMC hmc(X, y, eps, L, sigma0);

    vector<double> w(d, 0.0);  // initial weights
    int n_iter = 2000, burn = 500;
    int accepted = 0;

    cout << fixed << setprecision(4);
    for (int it = 0; it < n_iter; ++it) {
        bool acc; double dH;
        w = hmc.step(w, acc, dH);
        accepted += acc ? 1 : 0;
        if (it % 200 == 0) {
            cout << "iter=" << it << ", dH=" << dH << ", acc=" << acc << ", w=[";
            for (int j = 0; j < d; ++j) cout << w[j] << (j + 1 < d ? ", " : "");
            cout << "]\n";
        }
        if (it >= burn && it % 200 == 0) {
            cout << "posterior draw: ";
            for (int j = 0; j < d; ++j) cout << w[j] << (j + 1 < d ? ", " : "\n");
        }
    }
    cout << "Acceptance rate: " << (double)accepted / n_iter << "\n";
    return 0;
}
```
This example performs HMC for Bayesian logistic regression with a Gaussian prior on weights. The potential U is the negative log posterior, and gradU is its gradient. We standardize features to improve geometry, use M = I, and run leapfrog steps to propose new weights. The acceptance probability corrects for integration error. This shows how HMC handles data-dependent models where each gradient evaluation costs O(n d).