Floating Point Arithmetic
Key Points
- Floating-point numbers approximate real numbers using a fixed number of bits following the IEEE 754 standard.
- Every float stores a sign, a biased exponent, and a fraction (mantissa) that together represent numbers of the form (-1)^s · 1.f · 2^{e-bias}.
- Rounding is unavoidable; the rounding error is typically bounded by about half an ulp (unit in the last place) under round-to-nearest-even.
- Naive operations like summation can accumulate significant error, but techniques like Kahan summation greatly reduce it with the same asymptotic time.
- Comparing floats with == is unsafe; use tolerances (absolute and relative) or ULP-based comparisons instead.
- Special values exist: +0/-0, ±∞, and NaN; NaNs are unordered and propagate through most operations.
- Subnormal numbers (very close to zero) preserve gradual underflow but can be slower and less precise.
- In C++, use std::numeric_limits, std::nextafter, and careful bit reinterpretation to inspect and control floating-point behavior.
Prerequisites
- Binary numbers and powers of two — Understanding how exponents and fraction bits represent powers of two is essential for IEEE 754.
- Scientific notation — Floating point generalizes scientific notation to binary; this helps with the sign-exponent-significand idea.
- Rounding and significant figures — Floating-point arithmetic rounds to a fixed precision at each step.
- C++ basics (types, functions, templates) — Code examples rely on C++ types like float/double and templated functions.
- Bitwise operations and memory layout — Inspecting IEEE fields requires safe bit reinterpretation and bit masks.
- Error metrics (absolute vs relative) — Choosing appropriate comparison strategies depends on these concepts.
- Numerical stability — Motivates why some algorithms (e.g., Kahan) reduce error without extra asymptotic cost.
- Limits and special values — Infinities and NaNs behave differently in comparisons and arithmetic.
Detailed Explanation
01 Overview
Hook → We often assume computers can store any real number exactly, but they actually use a clever coding trick that fits only a finite set of values. Concept → Floating-point arithmetic (IEEE 754) represents real numbers in binary with a fixed number of bits: one bit for the sign, several bits for the exponent (range), and the rest for the fraction/mantissa (precision). This creates a dense grid of representable numbers near zero and a sparser grid for large magnitudes. Rounding happens on nearly every operation because the exact real result may not be representable. Example → Storing one billion in a 32-bit float and then repeatedly adding 0.1 can do nothing at all: 0.1 is not exactly representable, and the spacing (ULP) around 1e9 in single precision is 64, far larger than 0.1. In double precision the additions register, but each one still rounds.
Hook → Why should we care? Because tiny rounding errors can snowball in iterative algorithms. Concept → IEEE 754 also defines special values (infinities, NaNs), standardized rounding modes, and exceptions/flags for events like overflow or division by zero. Example → Dividing by zero yields ±∞, while 0/0 yields NaN, and NaNs compare unequal to everything (even themselves), which affects conditionals and sorting if not handled.
02 Intuition & Analogies
Hook → Imagine a ruler that changes tick-mark spacing as you slide along it. Near the left end (small numbers) the ticks are very close; farther right (huge numbers), they spread out. Concept → Floating-point is like this variable-spacing ruler. The exponent decides which “zoom level” you’re on, and the fraction fine-tunes the position within that zoom. You can only land exactly on tick marks. When the true answer falls between ticks, the system rounds to the nearest tick (often “ties to even”). Example → 0.1 in decimal does not align with binary tick marks, so the computer stores the nearest tick, something like 0.10000000000000000555...
Hook → Think of money in whole cents. You can’t represent half a cent, so you round every transaction. Concept → Each arithmetic step introduces a tiny rounding error, usually bounded. Over many steps, those pennies of error can add up—unless you keep track of what is lost (compensated summation). Example → Kahan’s algorithm keeps a small side-account (the compensation) so the pennies don’t vanish; it improves the final total without changing big-O time.
Hook → What about special cases? Concept → IEEE 754 also encodes ±0, infinities, and NaN to keep calculations predictable even when things go off the rails. Example → 1.0/0.0 becomes +∞, and any arithmetic with NaN stays NaN so errors don’t get silently masked.
03 Formal Definition
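In standard IEEE 754 terms, the encoding described above can be stated compactly:

```latex
% Format parameters: precision p, exponent width k, bias = 2^{k-1} - 1.
% Stored fields: sign bit s, biased exponent e, fraction bits f = f_1 f_2 \ldots f_{p-1}.
x = (-1)^s \cdot 1.f_1 f_2 \ldots f_{p-1} \cdot 2^{\,e - \mathrm{bias}},
    \qquad 1 \le e \le 2^k - 2 \quad \text{(normalized)}
x = (-1)^s \cdot 0.f_1 f_2 \ldots f_{p-1} \cdot 2^{\,1 - \mathrm{bias}},
    \qquad e = 0,\ f \neq 0 \quad \text{(subnormal)}
% e = 0, f = 0 encodes \pm 0; e = 2^k - 1 encodes \pm\infty (f = 0) or NaN (f \neq 0).
```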
04 When to Use
Hook → Should you ever avoid floating point? Concept → Use IEEE 754 floats when you need a wide dynamic range and fast approximate real arithmetic. They are ideal for science, graphics, signal processing, machine learning, and simulation where small errors are acceptable and performance matters. Example → Physics engines, 3D rendering, and training neural networks depend on floating point for speed and scale.
Hook → But what if you need exact cents or combinatorial counts? Concept → Prefer integers or decimal fixed-point for exactness in financial apps or when rounding must follow decimal rules. Example → Currency calculations are safer with scaled integers (e.g., store cents as int64) to avoid 0.1 issues.
Hook → Can you improve reliability without changing types? Concept → Use numerically stable algorithms (sorting before summation, Kahan/Neumaier summation, Horner’s method, compensated dot products) and FMA when available. Example → Summing small terms first or using Kahan preserves low-order bits and reduces error without changing big-O time.
⚠️ Common Mistakes
- Using == to compare floats. NaNs compare false to everything and tiny rounding differences break equality. Prefer tolerance-based or ULP-based comparisons.
- Ignoring scale. Adding a tiny number to a huge one can be a no-op at that magnitude. Rescale or reorder operations (e.g., sum from smallest to largest) to reduce loss of significance.
- Subtracting nearly equal numbers (catastrophic cancellation). Reformulate expressions (e.g., use rationalized forms or series expansions) to avoid massive relative error.
- Forgetting special values. NaNs propagate; ±0 exist; infinities can silently appear. Always check inputs and guard critical branches.
- Assuming decimal literals are exact. 0.1 is not exactly representable in binary; parsing already introduces error.
- Overlooking rounding modes and denormal performance. Changing the rounding mode or handling subnormals can affect speed and results; do not rely on FTZ/DAZ defaults without understanding consequences.
- Misinterpreting machine epsilon. ε is not the smallest positive float; it’s the gap around 1.0. The smallest positive normalized and subnormal numbers are much smaller.
- Reinterpreting bits with unsafe casts. Use memcpy or std::bit_cast (C++20) to avoid undefined behavior when inspecting IEEE fields.
Key Formulas
Normalized IEEE Representation
x = (-1)^s · 1.f · 2^{e - bias}
Explanation: A finite, normalized nonzero float is encoded by sign s, exponent e (with a bias), and fraction bits f. The leading 1 is implicit, giving p bits of precision for the significand.
Subnormal Representation
x = (-1)^s · 0.f · 2^{1 - bias}
Explanation: Subnormals use an exponent of 1−bias and have no implicit leading 1. This allows gradual underflow with evenly spaced values near zero.
Machine Epsilon and Unit Roundoff
ε = 2^{1-p},  u = ε/2 = 2^{-p}
Explanation: Machine epsilon ε is the gap between 1 and the next larger representable number; unit roundoff u is roughly the maximum relative error for one rounded operation. For double, p = 53, so ε = 2^{-52} ≈ 2.22×10^{-16} and u = 2^{-53} ≈ 1.11×10^{-16}.
Standard Rounding Model
fl(x ∘ y) = (x ∘ y)(1 + δ),  |δ| ≤ u
Explanation: A floating-point operation produces the exact result multiplied by 1+δ, with |δ| bounded by the unit roundoff u. This model underlies most error analyses.
ULP Spacing (Binary)
ulp(x) = 2^{e-(p-1)}
Explanation: For a normalized x with unbiased exponent e, adjacent representable numbers are spaced by 2^{e-(p-1)}. Spacing grows with magnitude, explaining why addition of tiny terms to huge numbers may have no effect.
Naive Summation Error Bound
|fl(Σ aᵢ) − Σ aᵢ| ≤ γ_{n−1} Σ |aᵢ|,  where γ_n = nu / (1 − nu)
Explanation: Summing n numbers in floating point can accumulate error proportional to n times the unit roundoff. The bound uses γ_n = nu/(1 − nu), which grows roughly linearly with n when nu is small.
Kahan Summation Error (Informal)
|error| ≤ (2u + O(nu²)) Σ |aᵢ|
Explanation: Compensated summation dramatically reduces error compared to naive summation. The leading term behaves like one rounding, with tiny higher-order terms in nu.
Round-to-Nearest-Even Error
|fl(x) − x| ≤ ½ ulp(x)
Explanation: Under round-to-nearest-even, the maximum absolute rounding error is half an ulp at x. This is the core guarantee that keeps per-operation errors small.
Format Parameters
float (binary32): p = 24, k = 8, bias = 127;  double (binary64): p = 53, k = 11, bias = 1023
Explanation: These constants define the precision (p), exponent bits (k), and exponent bias for the two most common IEEE 754 formats, float and double.
Complexity Analysis
Code Examples
#include <bits/stdc++.h>
using namespace std;

struct DoubleParts {
    uint64_t sign;     // 0 or 1
    uint64_t exponent; // 11 bits
    uint64_t fraction; // 52 bits
};

DoubleParts dissectDouble(double x) {
    uint64_t bits = 0;
    static_assert(sizeof(double) == sizeof(uint64_t), "Unexpected double size");
    memcpy(&bits, &x, sizeof(bits)); // Safe bit reinterpretation
    DoubleParts p{};
    p.sign = (bits >> 63) & 0x1ull;
    p.exponent = (bits >> 52) & 0x7ffull;      // 11 bits
    p.fraction = bits & 0x000fffffffffffffull; // 52 bits
    return p;
}

int main() {
    cout.setf(std::ios::boolalpha);

    vector<double> samples = {1.0, -0.0, 0.1,
                              numeric_limits<double>::infinity(),
                              -numeric_limits<double>::infinity(),
                              numeric_limits<double>::quiet_NaN(),
                              numeric_limits<double>::denorm_min()};

    for (double x : samples) {
        DoubleParts p = dissectDouble(x);
        cout << "x = " << x << "\n";
        cout << "  sign:     " << p.sign << "\n";
        cout << "  exponent: 0x" << hex << setw(3) << setfill('0') << p.exponent << dec
             << " (" << p.exponent << ")\n";
        cout << "  fraction: 0x" << hex << setw(13) << setfill('0') << p.fraction << dec << "\n";

        // Classify special cases
        bool isZero = (p.exponent == 0 && p.fraction == 0);
        bool isSubnormal = (p.exponent == 0 && p.fraction != 0);
        bool isInf = (p.exponent == 0x7ff && p.fraction == 0);
        bool isNaN = (p.exponent == 0x7ff && p.fraction != 0);

        cout << "  zero? " << isZero << "\n";
        cout << "  subnormal? " << isSubnormal << "\n";
        cout << "  infinity? " << isInf << "\n";
        cout << "  NaN? " << isNaN << "\n";

        if (!isZero && !isSubnormal && !isInf && !isNaN) {
            int64_t unbiased = static_cast<int64_t>(p.exponent) - 1023; // bias for double
            cout << "  unbiased exponent (e - bias): " << unbiased << "\n";
        }
        cout << "\n";
    }
}
This program inspects the underlying IEEE 754 binary64 (double) layout using memcpy to avoid aliasing issues. It extracts the sign, exponent, and fraction fields, prints them in hex, and classifies each sample as zero, subnormal, infinity, or NaN. For normal numbers, it also prints the unbiased exponent. This helps visualize how decimal-looking values like 0.1 are actually represented and why special values have distinct bit patterns.
#include <bits/stdc++.h>
using namespace std;

template <class T>
T naive_sum(const vector<T>& a) {
    T s = 0;
    for (T v : a) s += v; // Each addition rounds; errors may accumulate
    return s;
}

template <class T>
T kahan_sum(const vector<T>& a) {
    T s = 0; // Running sum
    T c = 0; // Compensation for lost low-order bits
    for (T v : a) {
        T y = v - c;     // First, subtract the compensation
        T t = s + y;     // Add the corrected value
        c = (t - s) - y; // New compensation captures low-order roundoff
        s = t;
    }
    return s;
}

int main() {
    // Construct a numerically challenging sum: one big number plus many tiny ones
    const int n = 1000000;
    vector<double> a; a.reserve(n + 1);
    a.push_back(1e9);
    for (int i = 0; i < n; ++i) a.push_back(1e-6);

    double s1 = naive_sum(a);
    double s2 = kahan_sum(a);

    const double expected = 1000000001.0; // 1e9 plus one million terms of 1e-6 in exact arithmetic

    cout.setf(std::ios::fixed); cout << setprecision(10);
    cout << "Naive sum: " << s1 << "\n";
    cout << "Kahan sum: " << s2 << "\n";
    cout << "Expected (real): " << expected << "\n";
    cout << "Absolute error naive: " << fabs(s1 - expected) << "\n";
    cout << "Absolute error Kahan: " << fabs(s2 - expected) << "\n";
}
We compare naive accumulation to Kahan’s compensated summation on a difficult dataset (one large term and one million small terms). The naive method loses low-order bits when the running total is large: near 1e9, doubles are spaced about 1.2e-7 apart, so each addition of 1e-6 is rounded to that grid and part of every term is discarded. Kahan’s method tracks the lost bits in a compensation variable and adds them back later, significantly reducing total error without changing big-O time.
#include <bits/stdc++.h>
using namespace std;

// Absolute/relative tolerance comparison
bool approx_equal(double a, double b, double rel_tol = 1e-12, double abs_tol = 1e-15) {
    // Handle infinities and NaNs explicitly
    if (std::isnan(a) || std::isnan(b)) return false; // NaN is never equal
    if (std::isinf(a) || std::isinf(b)) return a == b; // only equal if same infinity

    double diff = fabs(a - b);
    if (diff <= abs_tol) return true; // near zero

    double max_ab = max(fabs(a), fabs(b));
    return diff <= rel_tol * max_ab;
}

// ULP distance for IEEE 754 double (orders all bit patterns monotonically)
static inline uint64_t to_ordered_uint(double x) {
    uint64_t u; memcpy(&u, &x, sizeof(u));
    // Fold the two sign halves together so that increasing numeric values
    // map to increasing integers (negatives land below, positives above)
    return (u & (1ull << 63)) ? (~u + 1ull) : (u | (1ull << 63));
}

uint64_t ulp_distance(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return UINT64_MAX; // undefined distance
    uint64_t ua = to_ordered_uint(a);
    uint64_t ub = to_ordered_uint(b);
    return (ua > ub) ? (ua - ub) : (ub - ua);
}

int main() {
    double x = 1.0;
    double y = std::nextafter(1.0, 2.0); // adjacent representable number above 1.0

    cout.setf(std::ios::boolalpha);
    cout << "x = " << setprecision(17) << x << ", y = " << y << "\n";
    cout << "approx_equal(x, y) with tight tolerances: " << approx_equal(x, y, 1e-16, 0.0) << "\n";
    cout << "ULP distance(x, y) = " << ulp_distance(x, y) << " (should be 1)\n";

    double a = 1e9;
    double b = a + 0.1; // doubles near 1e9 are ~1.2e-7 apart, so b differs from a by many ticks
    cout << "ULP distance(1e9, 1e9+0.1) = " << ulp_distance(a, b) << "\n";
}
This example shows two practical strategies for comparing doubles. The first uses a combination of absolute and relative tolerances, suitable for most numeric code and stable across scales. The second uses ULP distance by mapping the bit patterns to a monotonic integer order; it counts how many representable numbers lie between the two values. We also demonstrate std::nextafter to move by exactly one ULP.