
Lebesgue Integration

Key Points

  • Lebesgue integration measures how much time a function spends near each value and adds up value × size of the set where it occurs.
  • It integrates by slicing the range (y-axis) of a function and measuring preimages, unlike Riemann integration, which slices the domain (x-axis).
  • A bounded function on [a,b] is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero; otherwise the Lebesgue integral often still exists.
  • Lebesgue integration excels for wildly discontinuous functions and connects directly to probability via expectations.
  • Simple functions approximate nonnegative measurable functions from below; their integrals are easy sums of value × measure.
  • Powerful theorems (Monotone and Dominated Convergence) justify swapping limits and integrals under mild conditions.
  • Monte Carlo estimation naturally implements the Lebesgue viewpoint by sampling the domain according to measure and averaging values.
  • In practice, histogram-based “simple function” approximations and Monte Carlo provide robust numeric approximations in O(N) time.

Prerequisites

  • →Sets and functions — Understanding preimages f^{-1}(A) and basic set operations is essential to defining measurability and level sets.
  • →Sigma-algebras and measures — Lebesgue integration is built on measurable sets and measures; these define which sets have size and how to add sizes.
  • →Limits, supremum/infimum — The integral is defined via suprema of simple-function integrals and limits of approximations.
  • →Riemann integration — Provides a baseline method to compare with and motivates Lebesgue’s improvements.
  • →Probability and expectation — Lebesgue integrals are expectations; Monte Carlo methods rely on this equivalence.
  • →Basic real analysis — Concepts like convergence almost everywhere, dominated convergence, and L^1 are foundational.
  • →Numerical sampling and random number generation — Practical Lebesgue approximations use Monte Carlo and histograms, which require sampling uniformly.

Detailed Explanation


01Overview

Lebesgue integration is a way to define and compute the “area under a curve” that is more flexible and powerful than the classical Riemann integral. Instead of partitioning the horizontal axis into many tiny intervals and summing heights × widths, Lebesgue’s idea is to ask: for each output value y, how large is the set of x where f(x) is about y? We then add up value × size-of-that-set across all values. This shift—from chopping the domain to grouping by function values—makes it possible to integrate functions with many discontinuities, or those that are unbounded on small sets, in a clean and robust way.

A central building block is the simple function: a function that takes only finitely many values. Any nonnegative measurable function can be approximated from below by an increasing sequence of simple functions. We first define the Lebesgue integral for simple functions (a finite weighted sum of measures of level sets) and then take limits. For signed functions, we integrate positive and negative parts separately.

Lebesgue integration beautifully unifies analysis and probability: the Lebesgue integral is the expected value of a random variable. On [a,b], sampling uniformly and averaging f(x) estimates the integral. This connection enables practical Monte Carlo approximations and underlies modern data science and machine learning.

02Intuition & Analogies

Imagine a city with neighborhoods (subsets of the domain) and a temperature map f(x). The Riemann approach surveys the city by slicing it into a fixed grid of blocks and averaging temperatures per block—good if temperature doesn’t spike within blocks. Lebesgue instead groups all locations sharing similar temperatures—“everyone between 20–21°C,” “between 21–22°C,” and so on—regardless of where they live in the city. For each temperature band, you measure how big that group is (its area/measure), then compute (average temperature in the band) × (size of the band), and finally add over all bands. If extreme temperatures occur in tiny pockets scattered everywhere, the Lebesgue method still handles it: those pockets might be tiny in total area, so their contribution is small, even if the spikes are huge.

Another analogy: think of a playlist sorted by song loudness (range-based) rather than by time (domain-based). To estimate total energy used, you multiply each loudness level by how long the playlist stays at that loudness. Lebesgue integrates exactly like that—group by output, weigh by how long we stay there.

This viewpoint explains why Lebesgue integration is stable under messy behavior on negligible sets. If a function misbehaves on a set the size of a dust speck (measure zero), its contribution to the total is zero. In probability terms, if you pick a random point uniformly from [0,1], the chance of landing on that dust speck is 0; expected value ignores it. That’s why Monte Carlo averages are natural Lebesgue integrals: expectation of f(X) with X uniform on [a,b] equals the integral divided by (b−a).

03Formal Definition

Let (X, F, μ) be a measure space. A simple function is s = Σ_{k=1}^{m} a_k 1_{E_k}, where each E_k ∈ F is measurable and a_k ≥ 0. Define its Lebesgue integral by ∫ s dμ = Σ_{k=1}^{m} a_k μ(E_k). For a nonnegative measurable function f: X → [0, ∞], define its Lebesgue integral as the supremum over all simple functions bounded above by f: ∫ f dμ = sup{ ∫ s dμ : 0 ≤ s ≤ f, s simple }. For a general (possibly signed) measurable function f, decompose f = f⁺ − f⁻, where f⁺ = max(f, 0) and f⁻ = max(−f, 0). If ∫ f⁺ dμ and ∫ f⁻ dμ are both finite, define ∫ f dμ = ∫ f⁺ dμ − ∫ f⁻ dμ. The space L¹(μ) consists of (equivalence classes of) measurable functions f with ∫ |f| dμ < ∞. On an interval [a,b] with Lebesgue measure m, for f integrable, ∫_a^b f(x) dx denotes ∫ f dm. If X ∼ Uniform(a,b), then ∫_a^b f(x) dx = (b − a) E[f(X)].

04When to Use

Use Lebesgue integration when functions are highly irregular, have many (even uncountably many) discontinuities, or are unbounded on sets of small measure. Classic examples include indicator functions of fractal-like sets, functions equal to different values on rationals vs irrationals, or signals with tall, very narrow spikes. It is also the right tool when swapping limits and integrals is necessary; the Monotone and Dominated Convergence Theorems provide clean conditions for doing this safely.

In probability and statistics, expected values, variances, and likelihoods are Lebesgue integrals with respect to probability measures. Monte Carlo methods (random sampling) are straightforward implementations of Lebesgue’s viewpoint: the integral equals measure × expectation.

In numerical work, if Riemann sums are unstable because spikes are missed by fixed grid samples, range-based (histogram) approximations or Monte Carlo averages tend to be more robust. When you need invariance to changes on negligible sets (e.g., replacing f by an equal-almost-everywhere version), Lebesgue integration behaves perfectly; Riemann integration may fail even for simple, bounded functions if the set of discontinuities is too large (e.g., the characteristic function of rationals).

⚠️Common Mistakes

  • Confusing “measure zero” with “empty” or “small but nonzero.” A set can be infinite yet have Lebesgue measure zero (e.g., the rationals). Changing f on a measure-zero set does not change the Lebesgue integral, but it can destroy Riemann integrability.
  • Believing improper Riemann integrals and Lebesgue integrals are the same concept. Improper Riemann integration uses limits of Riemann integrals; Lebesgue integration is defined via measurable sets and simple functions and handles many more cases cleanly.
  • Ignoring absolute integrability. For Lebesgue, ∫ f exists (as a finite number) only if ∫ |f| < ∞ (for signed functions). Conditional convergence in the Riemann sense does not transfer; Lebesgue demands integrability of the absolute value.
  • Misusing convergence theorems. Monotone Convergence requires a nondecreasing sequence of nonnegative functions; Dominated Convergence requires an integrable dominating function. Without these, interchanging limit and integral may fail.
  • Numerical pitfall: using uniform Riemann grids for “spiky” functions. Midpoint/trapezoid rules can miss narrow spikes entirely; Monte Carlo or Lebesgue-style histograms, which estimate how often values occur, are typically more reliable.
  • Treating floating-point rationals/irrationals literally. In code, every sample is a floating number; you cannot distinguish rationals vs irrationals numerically. Use theoretical properties (measure zero) or model sets explicitly rather than testing exact rationality.

Key Formulas

Simple Function Integral

∫ s dμ = Σ_{k=1}^{m} a_k μ(E_k)

Explanation: For a simple function taking finitely many values a_k on measurable sets E_k, the integral is just value × measure summed over levels. This is the starting point for the Lebesgue integral.

Lebesgue Integral (Nonnegative)

∫ f dμ = sup{ ∫ s dμ : 0 ≤ s ≤ f, s simple }

Explanation: A nonnegative measurable function is integrated as the supremum of integrals of all simple functions lying below it. This formalizes approximating f from below by step-like functions.

Signed Function Decomposition

∫ f dμ = ∫ f⁺ dμ − ∫ f⁻ dμ,   f± = max(±f, 0)

Explanation: Integrate the positive and negative parts separately and subtract. The integral exists (is finite) if and only if both parts have finite integrals.

Indicator Integrates to Measure

∫ 1_E dμ = μ(E)

Explanation: The integral of an indicator function equals the measure of the set. This ties integrals directly to “sizes” of sets.

Integral as Expectation

∫_a^b f(x) dx = (b − a) E[f(U)],   U ∼ Uniform(a,b)

Explanation: Sampling U uniformly on [a,b], the average value of f(U) times the interval length equals the integral. This connects Lebesgue integration to Monte Carlo methods.

Monotone Convergence Theorem

∫ lim_{n→∞} f_n dμ = lim_{n→∞} ∫ f_n dμ   if 0 ≤ f_n ↑ f

Explanation: For nonnegative functions increasing pointwise, the integral of the limit equals the limit of the integrals. This justifies building integrals via increasing simple approximations.

Dominated Convergence Theorem

lim_{n→∞} ∫ f_n dμ = ∫ f dμ   if f_n → f a.e. and |f_n| ≤ g ∈ L¹

Explanation: If a dominating integrable function exists, limits pass through the integral. This is crucial for analysis and numerical limit manipulations.

Riemann Integrability Criterion

f bounded on [a,b] is Riemann integrable ⟺ m(D_f) = 0

Explanation: D_f is the set of discontinuities of f. If it has measure zero, the Riemann and Lebesgue integrals exist and agree; if not, Riemann integration may fail while Lebesgue still works.

Monte Carlo Convergence Rate

| (1/N) Σ_{i=1}^{N} f(X_i) − E[f(X)] | = O_P(σ/√N)

Explanation: With i.i.d. samples and finite variance σ², the sample mean converges to the expectation at rate 1/√N. This characterizes the error decay of Monte Carlo integration.

Riemann Sum Definition

∫_a^b f(x) dx = lim_{‖P‖→0} Σ_i f(ξ_i)(x_i − x_{i−1})   (when f is Riemann integrable)

Explanation: A partition P = {x_0, …, x_n} carries tags ξ_i ∈ [x_{i−1}, x_i]; if the sums converge as the mesh ‖P‖ goes to zero, the Riemann integral exists. When it exists for a bounded function, it equals the Lebesgue integral.

Complexity Analysis

The algorithms accompanying this lesson approximate Lebesgue integrals numerically by sampling the domain. For Monte Carlo integration over [a,b], we draw N independent uniform samples and compute the sample mean of f(x), scaling by (b−a). Each evaluation is O(1) (assuming f is O(1)), so the total time is O(N). Memory usage is O(1) beyond random number generation and running totals. The statistical error decreases on the order of 1/√N provided Var[f(U)] is finite; this rate is dimension-independent but can be slow compared to high-order quadrature on smooth functions.

Histogram-based “simple function” approximation partitions the range of f into B bins and estimates the measure of each preimage via N random samples. Time is O(N + B) (one pass to bin, one pass to accumulate), and memory is O(B) to store bin counts. This mirrors the theoretical Lebesgue construction with simple functions and is robust to narrow spikes dispersed across the domain.

For comparison, a midpoint Riemann sum with M intervals costs O(M) time and O(1) memory. Its error depends on missing localized features: if f has tall, narrow spikes occupying small total measure, a coarse Riemann grid can entirely miss them, leading to large bias even as M grows moderately. In contrast, Monte Carlo detects spikes proportionally to their measure: the expected fraction of hits equals the total measure of the spiky regions, so estimates remain unbiased (again assuming finite variance). In practice, choose N based on desired confidence bounds and B to balance bias (bin width) against variance (count noise).

Code Examples

Lebesgue-style integration via range (y-axis) histogram of simple functions
#include <bits/stdc++.h>
using namespace std;

// Approximate \int_a^b f(x) dx by a Lebesgue-style simple function:
// 1) Sample N points X ~ Uniform[a,b]
// 2) Bin f(X) into B range-bins [y_j, y_{j+1})
// 3) Use the mean of the sampled values in each bin as the simple-function
//    level, and estimate the measure of its preimage by count/N * (b-a).
//    (Using the bin midpoint as the level would also work, but it adds a
//    bias of up to half a bin width per level.)
// Returns the estimate plus diagnostics.
struct LebesgueHistogramResult {
    double estimate;
    double y_min, y_max;
    vector<size_t> counts;
};

LebesgueHistogramResult lebesgue_hist_integral(
    const function<double(double)>& f,
    double a, double b,
    size_t samples, size_t bins,
    uint64_t seed = 42ULL
){
    mt19937_64 rng(seed);
    uniform_real_distribution<double> U(a, b);

    vector<double> vals;
    vals.reserve(samples);
    for (size_t i = 0; i < samples; ++i) {
        vals.push_back(f(U(rng)));
    }

    // Determine range from samples (robustified by a small padding)
    auto [mn_it, mx_it] = minmax_element(vals.begin(), vals.end());
    double y_min = *mn_it, y_max = *mx_it;
    if (y_min == y_max) {
        // Constant function case
        return {(b - a) * y_min, y_min, y_max, vector<size_t>(bins, 0)};
    }
    double pad = 1e-12 * max(1.0, max(fabs(y_min), fabs(y_max)));
    y_min -= pad; y_max += pad;

    vector<size_t> counts(bins, 0);
    vector<double> sums(bins, 0.0);
    auto bin_of = [&](double y){
        double t = (y - y_min) / (y_max - y_min);
        size_t k = (size_t) floor(t * bins);
        if (k >= bins) k = bins - 1; // clamp
        return k;
    };

    for (double y : vals) {
        size_t k = bin_of(y);
        counts[k]++;
        sums[k] += y;
    }

    // Simple-function level per bin; measure of preimage: count/N * (b-a)
    double estimate = 0.0;
    for (size_t k = 0; k < bins; ++k) {
        if (counts[k] == 0) continue;
        double level = sums[k] / (double)counts[k];
        double measure = (double)counts[k] / (double)samples * (b - a);
        estimate += level * measure;
    }

    return {estimate, y_min, y_max, counts};
}

// Example function with narrow spikes of large height (bounded):
// On each interval [i*0.01, i*0.01 + 0.0001), value = 1000; otherwise 0.
// Total measure of spikes = 100 * 0.0001 = 0.01, integral = 1000 * 0.01 = 10.
double spiky(double x) {
    int i = (int) floor(x * 100.0); // i in {0,...,99} when x in [0,1)
    if (i < 0) i = 0;
    if (i > 99) i = 99;
    double left = i * 0.01;
    double right_spike = left + 0.0001; // very narrow
    return (x >= left && x < right_spike) ? 1000.0 : 0.0;
}

int main(){
    double a = 0.0, b = 1.0;
    size_t N = 200000; // number of uniform samples
    size_t B = 50;     // number of y-bins

    auto res = lebesgue_hist_integral(spiky, a, b, N, B, 12345ULL);

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Lebesgue histogram estimate of \\int_0^1 spiky(x) dx: " << res.estimate << "\n";
    cout << "(Ground truth is 10)\n";
    cout << "y-range estimated from samples: [" << res.y_min << ", " << res.y_max << "]\n";
    // Optional: show a few nonempty bins
    size_t shown = 0;
    for (size_t k = 0; k < res.counts.size() && shown < 5; ++k) {
        if (res.counts[k] > 0) {
            double y_left  = res.y_min + (res.y_max - res.y_min) * (double)k / (double)B;
            double y_right = res.y_min + (res.y_max - res.y_min) * (double)(k+1) / (double)B;
            cout << "bin [" << y_left << ", " << y_right << ") count = " << res.counts[k] << "\n";
            shown++;
        }
    }
    return 0;
}

This program approximates a Lebesgue integral by constructing a simple function via a histogram of f-values. It samples the domain uniformly (measuring set sizes), bins the observed f(x) values (slicing the range), and sums (bin level) × (estimated measure) per bin. On the spiky() function, a Riemann midpoint rule with a coarse grid often misses the spikes entirely, but the histogram sees them in proportion to their total measure, so the estimate lands near the true value of 10.

Time: O(N + B), where N is the number of samples and B the number of bins. Space: O(B) for bin counts, O(1) otherwise.
Riemann midpoint vs Monte Carlo (Lebesgue/expectation) on a spiky function
#include <bits/stdc++.h>
using namespace std;

double spiky(double x) {
    int i = (int) floor(x * 100.0);
    if (i < 0) i = 0;
    if (i > 99) i = 99;
    double left = i * 0.01;
    double right_spike = left + 0.0001;
    return (x >= left && x < right_spike) ? 1000.0 : 0.0;
}

// Classical midpoint Riemann sum
double riemann_midpoint(const function<double(double)>& f, double a, double b, size_t M){
    double h = (b - a) / (double)M;
    double sum = 0.0;
    for (size_t i = 0; i < M; ++i){
        double x_mid = a + (i + 0.5) * h;
        sum += f(x_mid);
    }
    return sum * h;
}

// Monte Carlo estimate using expectation: \int_a^b f = (b-a) * E[f(U)].
// Note: the field is named std_error, not stderr, because stderr is a
// macro from <cstdio> and would break compilation as a member name.
struct MCResult { double estimate; double std_error; double ci95_halfwidth; };

MCResult monte_carlo_integral(const function<double(double)>& f, double a, double b, size_t N, uint64_t seed = 2024ULL){
    mt19937_64 rng(seed);
    uniform_real_distribution<double> U(a, b);
    long double mean = 0.0L, M2 = 0.0L; // Welford for variance of f(U)
    for (size_t i = 0; i < N; ++i){
        double x = U(rng);
        double y = f(x);
        long double delta = y - mean;
        mean += delta / (long double)(i + 1);
        M2 += delta * (y - mean);
    }
    long double var = (N > 1) ? (M2 / (long double)(N - 1)) : 0.0L;
    long double est = (b - a) * mean;
    long double se = (b - a) * sqrt((double)var / (double)N);
    long double ci95 = 1.96L * se;
    return {(double)est, (double)se, (double)ci95};
}

int main(){
    double a = 0.0, b = 1.0;
    cout.setf(ios::fixed); cout << setprecision(6);

    // Ground truth integral = 10
    for (size_t M : {50UL, 100UL, 200UL, 1000UL}){
        double rmid = riemann_midpoint(spiky, a, b, M);
        cout << "Riemann midpoint (M=" << M << ") = " << rmid << "\n";
    }

    for (size_t N : {1000UL, 10000UL, 100000UL}){
        auto mc = monte_carlo_integral(spiky, a, b, N);
        cout << "Monte Carlo (N=" << N << ") = " << mc.estimate
             << ", stderr = " << mc.std_error
             << ", 95% CI half-width = " << mc.ci95_halfwidth << "\n";
    }
    return 0;
}

We compare a domain-slicing Riemann midpoint rule with a Lebesgue/expectation-based Monte Carlo method on a function with very narrow, tall spikes whose total measure is small. Midpoints can entirely skip spikes unless the grid is extremely fine, producing biased estimates near 0. Monte Carlo detects spikes with frequency equal to their measure and yields an unbiased estimate with error shrinking like 1/sqrt(N).

Time: Riemann midpoint O(M); Monte Carlo O(N). Space: O(1) for both.
Darboux (upper/lower) sums for the Dirichlet function: Riemann fails, Lebesgue succeeds
#include <bits/stdc++.h>
using namespace std;

// The Dirichlet function on [0,1]: f(x)=1 if x is rational, 0 if irrational.
// Mathematically, every interval contains rationals and irrationals, so
// - the lower Darboux sum over ANY partition is 0 (infimum on each subinterval is 0),
// - the upper Darboux sum is 1 (supremum on each subinterval is 1).
// Thus the Riemann integral does not exist, but the Lebesgue integral is 0
// (since the rationals have measure 0). This code states the theoretical
// Darboux sums for any partition size M.

pair<double,double> dirichlet_darboux_sums(size_t M){
    (void)M; // the sums are independent of the partition
    double lower_sum = 0.0;
    double upper_sum = 1.0;
    return {lower_sum, upper_sum};
}

int main(){
    cout.setf(ios::fixed); cout << setprecision(6);
    for (size_t M : {4UL, 10UL, 100UL, 1000UL}){
        auto [L, U] = dirichlet_darboux_sums(M);
        cout << "Partition M=" << M << ": lower sum = " << L << ", upper sum = " << U << "\n";
    }
    cout << "Conclusion: the Riemann integral fails (lower != upper), but the Lebesgue integral equals 0 because the set where f=1 has measure 0.\n";
    return 0;
}

This illustrates the classic difference between Riemann and Lebesgue. No matter how fine the partition, lower sums stay 0 and upper sums stay 1, so the Riemann integral does not exist. In Lebesgue terms, the set of rationals has measure zero, so the integral of its indicator is 0.

Time: O(1). Space: O(1).
#lebesgue integral#riemann integral#measure theory#simple function#monotone convergence#dominated convergence#expectation#monte carlo integration#measure zero#darboux sums#indicator function#l1 space#probability measure#spiky functions#histogram approximation