Floor Sum Formula
Key Points
- β’The floor sum computes S(n,m,a,b) = sum_{i=0}^{n-1} floor((a i + b)/m) efficiently in O(log(min(a,m))) time.
- β’Key trick: extract whole parts when or , then swap roles of a and m recursively, mirroring the Euclidean algorithm.
- β’Geometric view: it counts lattice points under the line y = (a x + b)/m with x = 0..n-1, which explains the recurrence.
- β’A stable, iterative implementation avoids recursion depth and uses 128-bit intermediates to prevent overflow.
- β’The decomposition step adds * n(nβ1)/2 and * n, where = floor, = floor.
- β’The core recurrence uses y_ n + b)/m) and i0 = ceil(( m β b)/a) to reduce the problem size.
- β’This routine (known as floo in AtCoder Library) underpins many number-theory and combinatorics tasks.
- β’Applications include summing arithmetic progressions modulo m, counting pairs with linear constraints, and various CF problems rated 1900β2300.
Prerequisites
- βProperties of floor and ceiling β Understanding how floor and ceiling interact with addition and integer division is critical for the identities used.
- βEuclidean algorithm and modular arithmetic β The reduction mirrors gcd steps and relies on remainders and swaps.
- βArithmetic series and triangular numbers β Whole-part extraction contributes terms proportional to n(nβ1)/2 and n.
- βSafe integer arithmetic in C++ β 128-bit temporaries prevent overflow in intermediate products with 64-bit inputs.
- βCounting lattice points under a line β Provides the geometric intuition behind the recurrence and avoids off-by-one mistakes.
- βCeil via integer formula β Computing i0 = ceil(x/a) correctly is essential to avoid boundary errors.
Detailed Explanation
Tap terms for definitions01Overview
The floor sum problem asks us to compute S(n, m, a, b) = sum_{i=0}^{n-1} floor((a i + b)/m) quickly. A direct O(n) loop is too slow when n is large. The well-known trick (popularized by AtCoder Library as floor_sum) transforms the sum using properties of integer division and the floor function, reducing it in a way similar to the Euclidean algorithm for gcd. The strategy is to first extract the integer parts contributed by a and b when they are at least m (since those parts contribute predictable arithmetic series). After this normalization step (0 β€ a, b < m), we apply a clever swap of parameters that shrinks the problem size dramatically. This swap relies on counting how many terms reach the next quotient value and then re-indexing the remaining count in terms of a smaller problem with swapped numerator/denominator roles. Repeating these steps yields a logarithmic-time algorithm.
This computation has a clean geometric interpretation: for each integer x from 0 to nβ1, floor((a x + b)/m) is the number of integer y values such that 0 β€ y β€ floor((a x + b)/m) β 1. Therefore, S(n, m, a, b) equals the total number of lattice points (x, y) under the line y = (a x + b)/m within the vertical strip 0 β€ x β€ nβ1. Understanding it this way makes the recurrence intuitive and helps avoid off-by-one errors. The result is a compact, fast, and robust primitive for many competitive programming problems.
02Intuition & Analogies
Imagine drawing a slanted line on graph paper: y = (a x + b)/m. For each integer x from 0 to nβ1, the value floor((a x + b)/m) tells you how many whole squares lie strictly below the line in that vertical column. Summing over all x counts all the little squares under the line in the strip from x = 0 to x = nβ1. That picture is precisely what S(n, m, a, b) measures: the number of lattice points under a line.
Now, what happens if the line is steep because a is large compared to m? Each step in x adds roughly a/m to y, and if a β₯ m then βwhole stepsβ of y happen regularly. Those whole steps form an arithmetic progression that we can count directly β thatβs why we extract q_a = floor(a/m), contributing q_a times the triangular number n(nβ1)/2. Similarly, a large intercept b contributes q_b = floor(b/m) to every column, adding q_b * n. After removing these whole parts, the lineβs slope and intercept become the remainders a % m and b % m, making the line βflatterβ within a single block of size m.
The remaining cleverness is to turn the picture sideways. Instead of counting columns up to height floor((a x + b)/m), we count horizontal layers up to y_max = floor((a n + b)/m). By swapping the roles of numerator and denominator (a and m) and translating the intercept appropriately, we get a smaller instance of the same problem. This is the same spirit as the Euclidean algorithm: we keep reducing and swapping until the problem becomes trivial. Geometrically, itβs like tiling the area under the line with rectangles and smaller slanted slices, peeling them off layer by layer until nothing remains.
03Formal Definition
04When to Use
Use the floor sum when you need to compute sums of the form \sum_{i=0}^{n-1} \left\lfloor \frac{a i + b}{m} \right\rfloor or, equivalently, when counting lattice points under a rational-slope line in a vertical strip. Typical competitive programming cases include:
- Summing an arithmetic progression modulo m: \sum ((a i + b) \bmod m). This reduces to known terms minus m times a floor sum.
- Counting pairs (i, j) satisfying linear inequalities like m j β€ a i + b with bounded i (0 β€ i < n). The count equals the floor sum directly.
- Working with Beatty-like sequences or partitioning ranges by quotient values when dividing by a linear function.
- Number-theory problems where parameters are large (up to 10^{18}) and direct iteration is impossible, but gcd-like logarithmic reductions are fine.
- Geometry on grids: computing the number of grid points under or on a line segment with integer endpoints projected onto a strip.
When a and m are both large but their ratio is moderate, the algorithm remains fast because each reduction mirrors the Euclidean algorithmβs complexity. If your expression deviates from linear-over-constant structure (e.g., nonlinear numerators, varying denominators), this technique does not apply directly.
β οΈCommon Mistakes
- Forgetting whole-part extraction: not reducing a and b by m first misses the large, easy contributions and can slow or break the logic. Always add q_a * n(nβ1)/2 and q_b * n, then set a %= m, b %= m.
- Off-by-one in i0: i0 = ceil(x_max / a) must be computed as (x_max + a β 1) / a for integers. Using floor or omitting +aβ1 produces incorrect counts for boundary columns.
- Overflow in 64-bit: terms like n(nβ1)/2 or (nβ1) n (a/m) can overflow 64-bit even when the final answer fits. Use 128-bit temporaries (__int128) for products, then cast back to 64-bit.
- Mishandling negatives: the standard routine assumes a, b, n, m with n β₯ 0, m β₯ 1, and typically a, b β₯ 0. If negative values are required, you must normalize carefully or use a signed-aware variant; otherwise results are wrong.
- Infinite loops by skipping termination: after normalization, if y_max = floor((a n + b)/m) is zero, the remaining sum is zero and you must return; forgetting this causes division-by-zero or loops.
- Recursion depth: a recursive implementation can be deep in worst cases; prefer an iterative loop that swaps (a, m) and updates (n, b) to avoid stack issues.
- Using floating-point: never compute with doubles; all steps are exact in integers, and floating-point can introduce subtle rounding errors.
Key Formulas
Floor sum definition
Explanation: This is the quantity we want to compute efficiently. It sums the integer quotients of a linear function over a range of i.
Division with remainder
Explanation: We split a and b into whole multiples of m plus remainders. This enables extracting large, easy-to-sum parts before recurring on smaller remainders.
Whole-part extraction
Explanation: The term adds an arithmetic progression over i and the term adds a constant to each summand. After removing them, only remainders are left to handle.
Key parameters
Explanation: These values describe the top horizontal layer under the line and where it begins across the columns. They drive the main reduction step.
Core reduction (0 \le a,b < m)
Explanation: Counts the top layers for the last n β i0 columns and re-expresses the remaining smaller area with swapped roles of a and m. This shrinks parameters logarithmically.
Sum of progression modulo m
Explanation: The modulo is the original value minus m times the floor. This identity converts a sum of mods into a floor sum plus easy arithmetic terms.
Time complexity
Explanation: Each reduction step swaps and reduces (a, m) similarly to the Euclidean algorithm for gcd, yielding a logarithmic number of iterations.
Triangular number
Explanation: This is the sum of the first n β 1 integers. It appears when summing the whole-part contribution from across i.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Computes S(n, m, a, b) = sum_{i=0}^{n-1} floor((a*i + b)/m) 5 // Assumes: n >= 0, m >= 1, and typically a, b >= 0. 6 // Uses iterative reduction to avoid recursion depth, with __int128 to avoid overflow. 7 long long floor_sum(long long n, long long m, long long a, long long b) { 8 long long ans = 0; 9 while (true) { 10 // Extract whole parts when a >= m: contributes (a/m) * sum_{i=0}^{n-1} i 11 if (a >= m) { 12 long long q = a / m; 13 __int128 add = (__int128)(n - 1) * n / 2 * q; // safe in 128-bit 14 ans += (long long)add; 15 a %= m; 16 } 17 // Extract whole parts when b >= m: contributes n * (b/m) 18 if (b >= m) { 19 long long q = b / m; 20 __int128 add = (__int128)n * q; 21 ans += (long long)add; 22 b %= m; 23 } 24 // Now 0 <= a, b < m (or a == 0). If a == 0, the remaining sum is zero. 25 if (a == 0) return ans; 26 // y_max = floor((a*n + b)/m). If zero, nothing more to add. 27 long long y_max = (long long)(((__int128)a * n + b) / m); 28 if (y_max == 0) return ans; 29 // x_max = y_max * m - b, i0 = ceil(x_max / a) 30 long long x_max = (long long)((__int128)y_max * m - b); 31 long long i0 = (x_max + a - 1) / a; // ceil division, a > 0 here 32 // Add the top y_max layers for the last (n - i0) columns 33 ans += (n - i0) * y_max; 34 // Shrink the problem and swap roles of a and m 35 n = y_max; 36 b = (a - (x_max % a)) % a; // new intercept, stays in [0, a) 37 swap(a, m); 38 } 39 } 40 41 int main() { 42 ios::sync_with_stdio(false); 43 cin.tie(nullptr); 44 45 // Demo: compute S(n,m,a,b) for a few cases 46 vector<tuple<long long,long long,long long,long long>> tests = { 47 {5, 7, 4, 3}, // sum_{i=0}^{4} floor((4i+3)/7) 48 {10, 6, 8, 5}, // larger a than m 49 {3, 3, 2, 0}, // expected 1 50 {1000000000000LL, 1000000000LL, 123456789LL, 987654321LL} 51 }; 52 53 for (auto [n, m, a, b] : tests) { 54 cout << "S(" << n << "," << m << "," << a << "," << b << ") = " 55 << floor_sum(n, m, a, b) << "\n"; 56 } 57 return 0; 58 } 59
Implements the O(log(min(a,m))) floor sum using whole-part extraction and a Euclidean-like reduction. The variables y_max, x_max, and i0 identify the top horizontal layers to count immediately; then the instance shrinks by swapping (a, m). 128-bit temporaries ensure no overflow in intermediate products.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 long long floor_sum(long long n, long long m, long long a, long long b) { 5 long long ans = 0; 6 while (true) { 7 if (a >= m) { 8 long long q = a / m; 9 __int128 add = (__int128)(n - 1) * n / 2 * q; 10 ans += (long long)add; 11 a %= m; 12 } 13 if (b >= m) { 14 long long q = b / m; 15 __int128 add = (__int128)n * q; 16 ans += (long long)add; 17 b %= m; 18 } 19 if (a == 0) return ans; 20 long long y_max = (long long)(((__int128)a * n + b) / m); 21 if (y_max == 0) return ans; 22 long long x_max = (long long)((__int128)y_max * m - b); 23 long long i0 = (x_max + a - 1) / a; 24 ans += (n - i0) * y_max; 25 n = y_max; 26 b = (a - (x_max % a)) % a; 27 swap(a, m); 28 } 29 } 30 31 // Computes sum_{i=0}^{n-1} ((a*i + b) % m) in O(log(min(a,m))) 32 long long sum_progression_mod(long long n, long long m, long long a, long long b) { 33 // Using identity: sum mod = a * n(n-1)/2 + b*n - m * floor_sum(n, m, a, b) 34 __int128 total = 0; 35 total += (__int128)a * (n - 1) * n / 2; // a * n(n-1)/2 36 total += (__int128)b * n; // + b*n 37 total -= (__int128)m * floor_sum(n, m, a, b); 38 return (long long)total; 39 } 40 41 int main() { 42 ios::sync_with_stdio(false); 43 cin.tie(nullptr); 44 45 // Example: sum of (3i + 2) mod 7 for i = 0..9 46 long long n = 10, m = 7, a = 3, b = 2; 47 long long ans = sum_progression_mod(n, m, a, b); 48 49 // Verify by brute force for small n 50 long long brute = 0; 51 for (long long i = 0; i < n; ++i) brute += (a * i + b) % m; 52 53 cout << "sum_{i=0}^{" << (n-1) << "} ((" << a << "*i+" << b << ") % " << m << ") = " << ans << "\n"; 54 cout << "brute = " << brute << "\n"; 55 return 0; 56 } 57
Uses the identity sum ((a i + b) mod m) = a*n(nβ1)/2 + b*n β m*S(n,m,a,b). This is common when you need the total of a linear progression reduced modulo m without iterating over i.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 long long floor_sum(long long n, long long m, long long a, long long b) { 5 long long ans = 0; 6 while (true) { 7 if (a >= m) { 8 long long q = a / m; 9 __int128 add = (__int128)(n - 1) * n / 2 * q; 10 ans += (long long)add; 11 a %= m; 12 } 13 if (b >= m) { 14 long long q = b / m; 15 __int128 add = (__int128)n * q; 16 ans += (long long)add; 17 b %= m; 18 } 19 if (a == 0) return ans; 20 long long y_max = (long long)(((__int128)a * n + b) / m); 21 if (y_max == 0) return ans; 22 long long x_max = (long long)((__int128)y_max * m - b); 23 long long i0 = (x_max + a - 1) / a; 24 ans += (n - i0) * y_max; 25 n = y_max; 26 b = (a - (x_max % a)) % a; 27 swap(a, m); 28 } 29 } 30 31 // Counts the number of integer points (x, y) with 0 <= x <= n-1 and 0 <= y < (a*x + b)/m 32 // This equals S(n, m, a, b). 33 int main() { 34 ios::sync_with_stdio(false); 35 cin.tie(nullptr); 36 37 long long n = 12; // number of columns: x = 0..11 38 long long a = 9, b = 7; // line: y = (a*x + b)/m 39 long long m = 10; 40 41 long long count_points = floor_sum(n, m, a, b); 42 43 cout << "Lattice points under y = (" << a << "*x + " << b << ")/" << m 44 << " for x in [0," << (n-1) << "] = " << count_points << "\n"; 45 46 // Optional small verification by brute force 47 long long brute = 0; 48 for (long long x = 0; x < n; ++x) { 49 long long ymax = (a * x + b) / m; // floor division 50 brute += ymax; 51 } 52 cout << "brute = " << brute << "\n"; 53 return 0; 54 } 55
Interprets the floor sum as counting lattice points beneath the rational line y = (a x + b)/m within the strip x β {0, β¦, nβ1}. The floor_sum function directly yields the count, useful in combinatorics and geometry problems.