How I Study AI - Learn AI Papers & Lectures the Easy Way
šŸ“š Theory Ā· Advanced

Rate-Distortion Theory

Key Points

  • Rate–distortion theory tells you the minimum number of bits per symbol needed to represent data while keeping average distortion below a target D.
  • The core object is the rate–distortion function R(D) = min_{P(XĢ‚|X): E[d(X,XĢ‚)] ≤ D} I(X; XĢ‚), which balances compression rate and fidelity.
  • For discrete memoryless sources and separable distortions, R(D) is convex, nonincreasing in D, and can be computed numerically by the Blahut–Arimoto algorithm.
  • Classic closed forms include the binary source with Hamming distortion and the Gaussian source with mean-squared error (MSE).
  • The Lagrange multiplier β controls the trade-off slope: larger β forces lower distortion and higher rate; β = āˆ’dR/dD.
  • Optimal encoders behave like a "test channel" P(XĢ‚|X) that stochastically maps X to reconstructions to meet the distortion budget at minimum mutual information.
  • In practice, scalar/vector quantization and entropy coding approach R(D); algorithms like Lloyd–Max and k-means are practical approximations.
  • Units matter: logā‚‚ gives bits; many algorithms work in nats (natural logs) and must be converted by dividing by ln 2.

Prerequisites

  • Probability distributions and expectations — R(D) is defined via expected distortion and mutual information over probability distributions.
  • Entropy and mutual information — Rate–distortion is a constrained minimization of mutual information; understanding H(Ā·) and I(Ā·;Ā·) is essential.
  • Convex optimization (Lagrangian/KKT) — The Lagrangian form with multiplier β and properties like convexity underlie algorithms and interpretations.
  • KL divergence and log-sum-exp — Numerical stability and objective expressions use KL divergence and require stable log computations.
  • Quantization (scalar/vector) — Practical lossy compression uses quantizers, which approximate the theoretical R(D).

Detailed Explanation


01 Overview

Hook → What's the smallest file size you can get for a photo while keeping it "good enough" to your eyes? That question is the heartbeat of rate–distortion theory.

Concept → Rate–distortion theory formalizes the trade-off between compression rate (bits per symbol) and allowed average distortion (how far reconstructions can deviate from the originals). It defines the rate–distortion function R(D), the theoretical lower bound on the number of bits needed to achieve a target distortion D under an optimal (possibly probabilistic) encoder/decoder. For memoryless sources and separable distortion measures, R(D) depends only on the source distribution and the distortion function, not on a specific code.

Example → JPEG, MP3, and video codecs all sit near this curve in practice: by tuning a quality knob, you move along the R(D) trade-off. The closer a codec operates to R(D), the more efficient it is. Rate–distortion theory gives the target every practical compressor aims for, and provides tools (like the Blahut–Arimoto algorithm) to compute R(D) for discrete sources when closed forms don't exist.

02 Intuition & Analogies

Hook → Imagine packing a suitcase with a strict weight limit: you want to take as much as possible (content) without exceeding the weight (distortion budget).

Concept → Each time you leave something out or choose a lighter version, you "save weight" (save bits) but sacrifice fidelity (add distortion). Rate–distortion theory puts a ruler on this compromise. The curve R(D) is like the best packing guide: for any allowed weight D, it tells you the lightest possible suitcase (fewest bits) that still captures the essentials. Analogy 1: Summarizing a book. A one-sentence summary (low rate) omits many details (high distortion). A paragraph keeps more plot (higher rate, lower distortion). Rate–distortion theory quantifies the shortest summary length for a target comprehension level. Analogy 2: Streaming video. Low bandwidth (low rate) forces blur or blockiness (higher distortion). Boosting bandwidth reduces artifacts. The slope of the R(D) curve at a point measures how many extra bits you need to shave off a small amount of distortion—like the exchange rate between quality and bandwidth.

Example → For a fair coin (binary source) judged with Hamming distortion (0 for correct bit, 1 for error), the curve has a clean form: when you allow D errors per bit on average, you need about 1 āˆ’ H_b(D) bits/bit—zero bits if you tolerate errors half the time, and one full bit if you insist on zero errors.

03 Formal Definition

Hook → To reason precisely, we need a clean mathematical object.

Concept → Let X be a source with distribution P_X over alphabet 𝒳, and let XĢ‚ be a reconstruction over alphabet 𝒳̂. A distortion measure d: 𝒳 Ɨ 𝒳̂ → [0, āˆž) assigns a penalty to each reconstruction. The rate–distortion function is R(D) = min_{P(XĢ‚|X): E[d(X,XĢ‚)] ≤ D} I(X; XĢ‚), where I(X; XĢ‚) is mutual information. This minimum is over all conditional distributions (test channels) from X to XĢ‚. For a length-n memoryless source and separable distortion d⁽āæā¾(xāæ, xĢ‚āæ) = (1/n) Ī£_{i=1}^{n} d(x_i, xĢ‚_i), the operational meaning is: for any ε > 0, there exist codes of rate R > R(D) achieving distortion ≤ D + ε, and no code with rate < R(D) can achieve distortion ≤ D for large n. Lagrangian form: for β ≄ 0, minimize I(X; XĢ‚) + β E[d(X, XĢ‚)]; the optimal β equals āˆ’dR/dD. Properties: R(D) is convex and nonincreasing in D; R(D) = H(X) at the smallest feasible D (often 0), and R(D) = 0 when D exceeds a maximum allowable distortion D_max.

04 When to Use

Hook → Whenever you need to decide "how lossy is acceptable?"

Concept → Use rate–distortion theory to set principled targets and evaluate designs. Use cases: (1) Codec design and tuning: pick a distortion target (e.g., PSNR/MSE for images, perceptual scores for audio), compute/estimate R(D), and benchmark codecs against it. (2) Quantizer design: Lloyd–Max (scalar) or k-means (vector) training aims to approach R(D) with MSE. (3) System budgeting: split a total bit budget across components (e.g., color vs. luma in images) using slopes β to equalize marginal gains. (4) Machine learning: the Information Bottleneck applies a rate–distortion objective with relevance replacing distortion. (5) Communications and storage: when bandwidth is scarce, operate at a point on R(D) that matches application tolerance for errors.

Example → For a binary sensor stream with allowed 5% symbol errors (Hamming distortion), the theoretical minimum rate is H_b(p) āˆ’ H_b(0.05) bits/symbol, guiding how much compression the telemetry system can safely apply.

āš ļøCommon Mistakes

  • Mixing units: computing R in nats (natural logs) but interpreting as bits. Always divide by ln 2 to convert nats to bits.
  • Ignoring the constraint set: the minimization is over all conditional PMFs P(XĢ‚|X) on a chosen reproduction alphabet; picking too small an alphabet can artificially inflate R(D).
  • Confusing empirical distortion with expected distortion: R(D) is defined via expectation under the source model; finite-sample estimates fluctuate and need enough data or smoothing.
  • Using a mismatched distortion measure: computing R(D) for MSE but evaluating a codec with perceptual metrics leads to misleading comparisons.
  • Assuming quantizer levels directly give R: K levels imply at most logā‚‚ K bits per symbol, but entropy coding and symbol probabilities matter; the actual rate can be lower.
  • Poor numerical stability in Blahut–Arimoto: exponentials can underflow/overflow; use log-sum-exp tricks and careful normalization.
  • Expecting R(D) to be linear: R(D) is convex and typically curved; slopes β vary across D, so uniform bit allocation across components is rarely optimal.

Key Formulas

Rate–Distortion Definition

R(D) = min_{P(XĢ‚|X): E[d(X,XĢ‚)] ≤ D} I(X; XĢ‚)

Explanation: This defines the smallest mutual information between the source and its reconstruction under an average distortion constraint. It is the fundamental lower bound on bits per symbol needed to achieve distortion D.

Mutual Information (discrete)

I(X; XĢ‚) = Ī£_{x,xĢ‚} P_{X,XĢ‚}(x, xĢ‚) log[ P_{X,XĢ‚}(x, xĢ‚) / (P_X(x) P_XĢ‚(xĢ‚)) ]

Explanation: Mutual information measures how many nats (or bits if using log base 2) X tells us about XĢ‚. In rate–distortion, it equals the minimal required rate at a given distortion.

Average Distortion

E[d(X, XĢ‚)] = Ī£_{x,xĢ‚} P_X(x) P(XĢ‚ = xĢ‚ | X = x) d(x, xĢ‚)

Explanation: This computes the expected distortion under the source distribution and the chosen test channel. It must be no greater than D to be feasible.

Lagrangian Form

min_{P(XĢ‚|X)} [ I(X; XĢ‚) + β E[d(X, XĢ‚)] ]

Explanation: Instead of a hard constraint, introduce a multiplier β ≄ 0 to trade off rate and distortion. The optimal β equals the negative slope of R(D).

Slope–Multiplier Relation

β = āˆ’dR(D)/dD

Explanation: At differentiable points, the Lagrange multiplier equals the negative derivative of R with respect to D. It quantifies marginal bits needed to reduce distortion slightly.

Blahut–Arimoto Update

P(xĢ‚ | x) = P(xĢ‚) e^{āˆ’Ī² d(x, xĢ‚)} / Ī£_{x̃} P(x̃) e^{āˆ’Ī² d(x, x̃)}

Explanation: Given β and the current P(xĢ‚), update the conditional distribution to minimize the Lagrangian. Alternating with P(xĢ‚) = Ī£_x P_X(x) P(xĢ‚|x) converges to the optimum.

Binary Source with Hamming Distortion

R(D) = H_b(p) āˆ’ H_b(D),  0 ≤ D ≤ D_max = min{p, 1āˆ’p}

Explanation: For a Bernoulli(p) source and Hamming distortion, this closed form gives the minimal rate. Beyond D_max, the rate is zero because always guessing the more likely symbol meets the distortion.

Binary Entropy

H_b(q) = āˆ’q logā‚‚ q āˆ’ (1āˆ’q) logā‚‚(1āˆ’q)

Explanation: This is the entropy of a Bernoulli(q) random variable. It appears in the binary rate–distortion formula.

Gaussian Source with MSE

R(D) = (1/2) logā‚‚(σ²/D),  0 < D ≤ σ²

Explanation: For a zero-mean Gaussian source of variance σ² and squared-error distortion, this is the exact R(D). It is tight and equals the Shannon lower bound.

Distortion Under Test Channel

D = Ī£_{x,xĢ‚} P_X(x) P(xĢ‚|x) d(x, xĢ‚)

Explanation: Once a test channel is fixed, this computes the achieved distortion; matching a desired D requires tuning β or the reproduction alphabet.

Complexity Analysis

Blahut–Arimoto (BA) for rate–distortion with |𝒳| = n_x and |𝒳̂| = n_xĢ‚ performs, per iteration, two main sweeps: updating P(xĢ‚|x) for all (x, xĢ‚) pairs and recomputing P(xĢ‚). Each sweep touches O(n_x n_xĢ‚) entries, so one iteration costs O(n_x n_xĢ‚). If I iterations are needed to converge for a given Lagrange multiplier β, the total time is O(I n_x n_xĢ‚). Memory holds P(xĢ‚), P(xĢ‚|x), and the distortion matrix, requiring O(n_x n_xĢ‚) space. To trace the R–D curve you typically evaluate multiple β values; if you use B values, multiply time by B. Numerically, convergence accelerates when P(xĢ‚) is warm-started from the previous β and when log-sum-exp stabilization is used to avoid underflow at large β.

Lloyd–Max scalar quantizer training on N samples with K levels alternates assignment and centroid update steps. Each iteration evaluates distances for all N Ɨ K pairs and then recomputes K means, costing O(NK) per iteration; with J iterations the time is O(JNK). Memory is O(N + K). Entropy estimation of the quantized outputs costs O(N + K). Closed-form evaluations like the binary source with Hamming distortion are O(1) per D, trivial in both time and space.

In practice, BA dominates computational cost for general discrete sources; careful implementation (vectorization, caching e^{āˆ’Ī²d(x,xĢ‚)}, and convergence checks) can materially reduce runtime. Lloyd–Max is fast and scalable but only approximates R(D) because it ignores optimal stochastic encoders and entropy coding effects.

Code Examples

Blahut–Arimoto for Rate–Distortion (discrete source, generic distortion)
#include <bits/stdc++.h>
using namespace std;

struct RDResult {
    vector<double> P_hat;                  // P(xĢ‚)
    vector<vector<double>> P_hat_given_x;  // P(xĢ‚|x)
    double D;       // achieved distortion
    double R_nats;  // mutual information in nats
    double R_bits;  // mutual information in bits
    int iters;      // iterations used
};

static inline double logsumexp(const vector<double>& v) {
    double m = *max_element(v.begin(), v.end());
    double s = 0.0;
    for (double x : v) s += exp(x - m);
    return m + log(s);
}

RDResult blahut_arimoto_RD(const vector<double>& P_x,
                           const vector<vector<double>>& distortion,  // d(x, xĢ‚)
                           double beta,                               // Lagrange multiplier (>= 0)
                           int max_iters = 10000,
                           double tol = 1e-10) {
    int nx = (int)P_x.size();
    int nh = (int)distortion[0].size();

    // Initialize P(xĢ‚) uniformly
    vector<double> P_hat(nh, 1.0 / nh);
    vector<vector<double>> P_hat_given_x(nx, vector<double>(nh, 0.0));

    auto KL = [&](const vector<double>& p, const vector<double>& q) {
        double v = 0.0;
        for (int i = 0; i < (int)p.size(); ++i)
            if (p[i] > 0 && q[i] > 0) v += p[i] * log(p[i] / q[i]);
        return v;
    };

    double prev_obj = numeric_limits<double>::infinity();
    int it = 0;
    for (; it < max_iters; ++it) {
        // Update P(xĢ‚|x) via stabilized softmax: P(h|x) ∝ P(h) * exp(-beta * d(x,h))
        for (int x = 0; x < nx; ++x) {
            // Unnormalized log-weights: log P_hat[h] - beta * d(x,h)
            vector<double> lw(nh);
            for (int h = 0; h < nh; ++h) lw[h] = log(max(P_hat[h], 1e-300)) - beta * distortion[x][h];
            double lZ = logsumexp(lw);
            for (int h = 0; h < nh; ++h) P_hat_given_x[x][h] = exp(lw[h] - lZ);
        }
        // Update P(xĢ‚) = sum_x P(x) P(xĢ‚|x)
        vector<double> new_P_hat(nh, 0.0);
        for (int h = 0; h < nh; ++h) {
            double s = 0.0;
            for (int x = 0; x < nx; ++x) s += P_x[x] * P_hat_given_x[x][h];
            new_P_hat[h] = s;
        }

        // Objective: I + beta * E[d], with I = sum_x P(x) KL(P(h|x) || P(h))
        double I = 0.0, Ed = 0.0;
        for (int x = 0; x < nx; ++x) {
            I += P_x[x] * KL(P_hat_given_x[x], new_P_hat);
            for (int h = 0; h < nh; ++h) Ed += P_x[x] * P_hat_given_x[x][h] * distortion[x][h];
        }
        double obj = I + beta * Ed;

        // Convergence check on the objective and on P_hat
        double diffP = 0.0;
        for (int h = 0; h < nh; ++h) diffP = max(diffP, fabs(new_P_hat[h] - P_hat[h]));

        P_hat.swap(new_P_hat);

        if (fabs(prev_obj - obj) < tol && diffP < sqrt(tol)) break;
        prev_obj = obj;
    }

    // Final metrics: recompute P(h|x) with the final P_hat to be consistent
    double I = 0.0, Ed = 0.0;
    for (int x = 0; x < nx; ++x) {
        vector<double> lw(nh);
        for (int h = 0; h < nh; ++h) lw[h] = log(max(P_hat[h], 1e-300)) - beta * distortion[x][h];
        double lZ = logsumexp(lw);
        for (int h = 0; h < nh; ++h) {
            P_hat_given_x[x][h] = exp(lw[h] - lZ);
            Ed += P_x[x] * P_hat_given_x[x][h] * distortion[x][h];
        }
        // KL per x
        double kl = 0.0;
        for (int h = 0; h < nh; ++h)
            if (P_hat_given_x[x][h] > 0 && P_hat[h] > 0)
                kl += P_hat_given_x[x][h] * log(P_hat_given_x[x][h] / P_hat[h]);
        I += P_x[x] * kl;
    }

    RDResult res;
    res.P_hat = P_hat;
    res.P_hat_given_x = P_hat_given_x;
    res.D = Ed;
    res.R_nats = I;
    res.R_bits = I / log(2.0);
    res.iters = it + 1;
    return res;
}

int main() {
    // Example: 3-symbol source X with values {0,1,3} and reproduction levels {0,1,2,3}
    vector<double> P_x = {0.7, 0.2, 0.1};

    // Distortion: squared error between source values and reconstruction levels
    vector<int> xvals = {0, 1, 3};
    vector<int> hvals = {0, 1, 2, 3};
    vector<vector<double>> d(xvals.size(), vector<double>(hvals.size()));
    for (size_t i = 0; i < xvals.size(); ++i)
        for (size_t j = 0; j < hvals.size(); ++j)
            d[i][j] = pow(double(xvals[i] - hvals[j]), 2.0);

    cout << fixed << setprecision(6);

    // Sweep several beta values to trace (D, R)
    vector<double> betas = {0.1, 0.5, 1.0, 2.0, 5.0, 10.0};
    cout << "beta\tD\tR_bits\titers" << '\n';
    for (double beta : betas) {
        RDResult r = blahut_arimoto_RD(P_x, d, beta, 10000, 1e-12);
        cout << beta << '\t' << r.D << '\t' << r.R_bits << '\t' << r.iters << '\n';
    }

    return 0;
}

This program implements the Blahut–Arimoto algorithm for computing a point on the rate–distortion curve at a chosen Lagrange multiplier β. It stabilizes exponentials with log-sum-exp, alternates updates for P(xĢ‚|x) and P(xĢ‚), and reports the achieved average distortion D and mutual information R in bits. The example uses a 3-symbol discrete source with squared-error distortion to a 4-level reproduction alphabet and sweeps β to sketch the R–D trade-off.

Time: O(I Ā· n_x Ā· n_xĢ‚) per β (I = iterations). Space: O(n_x Ā· n_xĢ‚)
Closed-form R(D) for Binary Source with Hamming Distortion
#include <bits/stdc++.h>
using namespace std;

static inline double Hb(double p) {
    if (p <= 0.0) return 0.0;
    if (p >= 1.0) return 0.0;
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p);
}

// R(D) = H_b(p) - H_b(D) for 0 <= D <= min(p, 1-p); else 0
static double R_of_D_binary(double p, double D) {
    double Dmax = min(p, 1.0 - p);
    if (D < 0) return numeric_limits<double>::quiet_NaN();
    if (D > Dmax) return 0.0;
    return max(0.0, Hb(p) - Hb(D));
}

int main() {
    cout << fixed << setprecision(6);
    double p = 0.3;  // Bernoulli source parameter P(X=1)
    double Dmax = min(p, 1.0 - p);
    cout << "p=" << p << ", D_max=" << Dmax << "\n";
    cout << "D\tR(D) [bits]" << "\n";
    for (int i = 0; i <= 10; ++i) {
        double D = Dmax * i / 10.0;
        cout << D << "\t" << R_of_D_binary(p, D) << "\n";
    }
    return 0;
}

For a Bernoulli(p) source with Hamming distortion, the rate–distortion function has the closed form R(D)=H_b(p)āˆ’H_b(D) on 0≤D≤min(p,1āˆ’p) and 0 beyond. This code computes and prints the curve for a chosen p, serving as a sanity check and a baseline to compare against numerical Blahut–Arimoto results on the same source.

Time: O(1) per D. Space: O(1)
Lloyd–Max Scalar Quantizer for Gaussian Source (Estimate D and Output Entropy)
#include <bits/stdc++.h>
using namespace std;

// Generate N samples from N(0,1)
vector<double> gaussian_samples(size_t N, unsigned seed = 42) {
    mt19937 rng(seed);
    normal_distribution<double> nd(0.0, 1.0);
    vector<double> x(N);
    for (size_t i = 0; i < N; ++i) x[i] = nd(rng);
    return x;
}

struct QResult {
    vector<double> levels;  // reconstruction levels (codebook)
    vector<double> thresh;  // decision thresholds (size K+1)
    double mse;             // empirical distortion
    double entropy_bits;    // entropy of output symbols (upper bound on rate)
};

QResult lloyd_max_gaussian(const vector<double>& x, int K, int max_iters = 100, double tol = 1e-6) {
    // Initialize levels by quantiles
    vector<double> sorted = x;
    sort(sorted.begin(), sorted.end());
    vector<double> levels(K);
    for (int k = 0; k < K; ++k) {
        size_t idx = (size_t)round(((k + 0.5) / K) * (sorted.size() - 1));
        levels[k] = sorted[idx];
    }
    vector<double> thresh(K + 1);
    thresh.front() = -numeric_limits<double>::infinity();
    thresh.back() = numeric_limits<double>::infinity();

    auto update_thresholds = [&]() {
        for (int k = 0; k < K - 1; ++k)
            thresh[k + 1] = 0.5 * (levels[k] + levels[k + 1]);
    };

    auto assign_and_update = [&]() {
        vector<double> sum(K, 0.0);
        vector<int> cnt(K, 0);
        for (double v : x) {
            // Find bin via binary search: k such that thresh[k] <= v < thresh[k+1]
            int lo = 0, hi = K;
            while (hi - lo > 1) { int mid = (lo + hi) / 2; if (v >= thresh[mid]) lo = mid; else hi = mid; }
            int k = lo;
            sum[k] += v; cnt[k]++;
        }
        double max_change = 0.0;
        for (int k = 0; k < K; ++k) {
            if (cnt[k] > 0) {
                double new_level = sum[k] / cnt[k];
                max_change = max(max_change, fabs(new_level - levels[k]));
                levels[k] = new_level;
            }
        }
        return max_change;
    };

    update_thresholds();
    double change = 0.0; int it = 0;
    for (; it < max_iters; ++it) {
        change = assign_and_update();
        update_thresholds();
        if (change < tol) break;
    }

    // Evaluate MSE and output entropy
    vector<int> hist(K, 0);
    double mse = 0.0;
    for (double v : x) {
        int lo = 0, hi = K;
        while (hi - lo > 1) { int mid = (lo + hi) / 2; if (v >= thresh[mid]) lo = mid; else hi = mid; }
        int k = lo; double q = levels[k];
        hist[k]++;
        mse += (v - q) * (v - q);
    }
    mse /= (double)x.size();
    double H = 0.0;
    for (int k = 0; k < K; ++k) {
        if (hist[k] == 0) continue;
        double p = (double)hist[k] / x.size();
        H += -p * log2(p);
    }

    return {levels, thresh, mse, H};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int K = 8;          // number of quantization levels (~3 bits/symbol max)
    size_t N = 200000;  // samples for training/evaluation
    auto x = gaussian_samples(N, 123);

    auto res = lloyd_max_gaussian(x, K, 200, 1e-7);

    cout << fixed << setprecision(6);
    cout << "K=" << K << " levels, Empirical MSE D=" << res.mse << "\n";
    cout << "Output entropy (upper bound on rate) H(X^) = " << res.entropy_bits << " bits/symbol\n";
    cout << "Gaussian Shannon limit R(D) = 0.5*log2(sigma^2/D) with sigma^2=1: "
         << 0.5 * log2(1.0 / res.mse) << " bits/symbol\n";

    return 0;
}

This code trains a 1D Lloyd–Max quantizer with K levels on samples from a N(0,1) source. It iteratively refines decision thresholds and reconstruction levels to minimize empirical MSE. After convergence, it reports the achieved distortion and the entropy of the quantized outputs, which upper-bounds the actual coding rate with ideal entropy coding. It also prints the theoretical Gaussian R(D) for comparison.

Time: O(J Ā· N Ā· K) where J is the number of Lloyd iterations. Space: O(N + K)
#rate-distortion #mutual information #blahut-arimoto #test channel #quantization #binary entropy #hamming distortion #gaussian source #mean squared error #information bottleneck #lagrange multiplier #entropy coding #lloyd-max #vector quantization #rd curve