∑MathAdvanced

Floor Sum Formula

Key Points

•
The floor sum computes S(n,m,a,b) = sum_{ $i=0}^{n-1}$ floor((a i + b)/m) efficiently in O(log(min(a,m))) time.
•
Key trick: extract whole parts when $a \geq m$ or $b \geq m$ , then swap roles of a and m recursively, mirroring the Euclidean algorithm.
•
Geometric view: it counts lattice points under the line y = (a x + b)/m with x = 0..n-1, which explains the recurrence.
•
A stable, iterative implementation avoids recursion depth and uses 128-bit intermediates to prevent overflow.
•
The decomposition step adds $q_{a}$ * n(n−1)/2 and $q_{b}$ * n, where $q_{a}$ = floor $\frac{a}{m}$ , $q_{b}$ = floor $\frac{b}{m}$ .
•
The core recurrence uses y_ $ma x = f l oor ((a$ n + b)/m) and i0 = ceil(( $y_{m} a x$ m − b)/a) to reduce the problem size.
•
This routine (known as floo $r_{s} u m$ in AtCoder Library) underpins many number-theory and combinatorics tasks.
•
Applications include summing arithmetic progressions modulo m, counting pairs with linear constraints, and various CF problems rated 1900–2300.

Prerequisites

→Properties of floor and ceiling — Understanding how floor and ceiling interact with addition and integer division is critical for the identities used.
→Euclidean algorithm and modular arithmetic — The reduction mirrors gcd steps and relies on remainders and swaps.
→Arithmetic series and triangular numbers — Whole-part extraction contributes terms proportional to n(n−1)/2 and n.
→Safe integer arithmetic in C++ — 128-bit temporaries prevent overflow in intermediate products with 64-bit inputs.
→Counting lattice points under a line — Provides the geometric intuition behind the recurrence and avoids off-by-one mistakes.
→Ceil via integer formula — Computing i0 = ceil(x/a) correctly is essential to avoid boundary errors.

Detailed Explanation

Tap terms for definitions

01Overview

The floor sum problem asks us to compute S(n, m, a, b) = sum_{i=0}^{n-1} floor((a i + b)/m) quickly. A direct O(n) loop is too slow when n is large. The well-known trick (popularized by AtCoder Library as floor_sum) transforms the sum using properties of integer division and the floor function, reducing it in a way similar to the Euclidean algorithm for gcd. The strategy is to first extract the integer parts contributed by a and b when they are at least m (since those parts contribute predictable arithmetic series). After this normalization step (0 ≤ a, b < m), we apply a clever swap of parameters that shrinks the problem size dramatically. This swap relies on counting how many terms reach the next quotient value and then re-indexing the remaining count in terms of a smaller problem with swapped numerator/denominator roles. Repeating these steps yields a logarithmic-time algorithm.

This computation has a clean geometric interpretation: for each integer x from 0 to n−1, floor((a x + b)/m) is the number of integer y values such that 0 ≤ y ≤ floor((a x + b)/m) − 1. Therefore, S(n, m, a, b) equals the total number of lattice points (x, y) under the line y = (a x + b)/m within the vertical strip 0 ≤ x ≤ n−1. Understanding it this way makes the recurrence intuitive and helps avoid off-by-one errors. The result is a compact, fast, and robust primitive for many competitive programming problems.

02Intuition & Analogies

Imagine drawing a slanted line on graph paper: y = (a x + b)/m. For each integer x from 0 to n−1, the value floor((a x + b)/m) tells you how many whole squares lie strictly below the line in that vertical column. Summing over all x counts all the little squares under the line in the strip from x = 0 to x = n−1. That picture is precisely what S(n, m, a, b) measures: the number of lattice points under a line.

Now, what happens if the line is steep because a is large compared to m? Each step in x adds roughly a/m to y, and if a ≥ m then “whole steps” of y happen regularly. Those whole steps form an arithmetic progression that we can count directly — that’s why we extract q_a = floor(a/m), contributing q_a times the triangular number n(n−1)/2. Similarly, a large intercept b contributes q_b = floor(b/m) to every column, adding q_b * n. After removing these whole parts, the line’s slope and intercept become the remainders a % m and b % m, making the line “flatter” within a single block of size m.

The remaining cleverness is to turn the picture sideways. Instead of counting columns up to height floor((a x + b)/m), we count horizontal layers up to y_max = floor((a n + b)/m). By swapping the roles of numerator and denominator (a and m) and translating the intercept appropriately, we get a smaller instance of the same problem. This is the same spirit as the Euclidean algorithm: we keep reducing and swapping until the problem becomes trivial. Geometrically, it’s like tiling the area under the line with rectangles and smaller slanted slices, peeling them off layer by layer until nothing remains.

03Formal Definition

Define S(n, m, a, b) =

\sum_{i = 0}^{n - 1}

\left\lfloor

\frac{ai + b}{m}

\right\rfloor

for integers

n \geq 0

and

m \geq 1

, where typically a,

b \geq 0

. The algorithm relies on two identities. 1) Whole-part extraction: write

a = q_{a}

m +

r_{a}

and

b = q_{b}

m +

r_{b}

with 0 ≤

r_{a}

r_{b} < m

. Then S(n, m, a, b) =

q_{a}

\cdot

\frac{n ( n - 1 )}{2}

q_{b}

\cdot

n + S(n, m,

r_{a}

r_{b}

). This follows from distributing

q_{a}

i +

q_{b}

inside the floor and summing the arithmetic progression. 2) Core reduction (with 0 ≤ a,

b < m

and

a > 0

): let

y_{m a x}

\left\lfloor

\frac{an + b}{m}

\right\rfloor

and

x_{m a x} = y_{m a x}

m - b. Define

i_{0}

\left\lceil

\frac{x _{m a x}}{a}

\right\rceil

. Then S(n, m, a, b) = (n -

i_{0}

)

\cdot

y_{m a x}

+ S

\bigl

(

y_{m a x}

,\ a,\ m,\

\bigl

(a - (

x_{m a x}

mod

\bigr

)

mod

\bigr

). This identity counts the top

y_{m a x}

layers explicitly for the last n -

i_{0}

columns and re-expresses the remaining lower part as a floor sum with swapped parameters. Each application decreases the parameters analogously to the Euclidean algorithm, giving logarithmic complexity. Edge cases: if

a = 0

, then S(n, m, 0, b) = n

\cdot

\left\lfloor

\frac{b}{m}

\right\rfloor

. If

y_{m a x} = 0

after normalization, the sum is zero. These identities underpin an iterative implementation that avoids recursion.

04When to Use

Use the floor sum when you need to compute sums of the form \sum_{i=0}^{n-1} \left\lfloor \frac{a i + b}{m} \right\rfloor or, equivalently, when counting lattice points under a rational-slope line in a vertical strip. Typical competitive programming cases include:

Summing an arithmetic progression modulo m: \sum ((a i + b) \bmod m). This reduces to known terms minus m times a floor sum.
Counting pairs (i, j) satisfying linear inequalities like m j ≤ a i + b with bounded i (0 ≤ i < n). The count equals the floor sum directly.
Working with Beatty-like sequences or partitioning ranges by quotient values when dividing by a linear function.
Number-theory problems where parameters are large (up to 10^{18}) and direct iteration is impossible, but gcd-like logarithmic reductions are fine.
Geometry on grids: computing the number of grid points under or on a line segment with integer endpoints projected onto a strip.

When a and m are both large but their ratio is moderate, the algorithm remains fast because each reduction mirrors the Euclidean algorithm’s complexity. If your expression deviates from linear-over-constant structure (e.g., nonlinear numerators, varying denominators), this technique does not apply directly.

⚠️Common Mistakes

Forgetting whole-part extraction: not reducing a and b by m first misses the large, easy contributions and can slow or break the logic. Always add q_a * n(n−1)/2 and q_b * n, then set a %= m, b %= m.
Off-by-one in i0: i0 = ceil(x_max / a) must be computed as (x_max + a − 1) / a for integers. Using floor or omitting +a−1 produces incorrect counts for boundary columns.
Overflow in 64-bit: terms like n(n−1)/2 or (n−1) n (a/m) can overflow 64-bit even when the final answer fits. Use 128-bit temporaries (__int128) for products, then cast back to 64-bit.
Mishandling negatives: the standard routine assumes a, b, n, m with n ≥ 0, m ≥ 1, and typically a, b ≥ 0. If negative values are required, you must normalize carefully or use a signed-aware variant; otherwise results are wrong.
Infinite loops by skipping termination: after normalization, if y_max = floor((a n + b)/m) is zero, the remaining sum is zero and you must return; forgetting this causes division-by-zero or loops.
Recursion depth: a recursive implementation can be deep in worst cases; prefer an iterative loop that swaps (a, m) and updates (n, b) to avoid stack issues.
Using floating-point: never compute with doubles; all steps are exact in integers, and floating-point can introduce subtle rounding errors.

Key Formulas

Floor sum definition

S (n, m, a, b) = i = 0 \sum n - 1 ⌊ \frac{ai + b}{m} ⌋

Explanation: This is the quantity we want to compute efficiently. It sums the integer quotients of a linear function over a range of i.

Division with remainder

a = q_{a} m + r_{a}, b = q_{b} m + r_{b}, 0 \leq r_{a}, r_{b} < m

Explanation: We split a and b into whole multiples of m plus remainders. This enables extracting large, easy-to-sum parts before recurring on smaller remainders.

Whole-part extraction

S (n, m, a, b) = q_{a} \cdot \frac{n ( n - 1 )}{2} + q_{b} \cdot n + S (n, m, a mod m, b mod m)

Explanation: The $q_{a}$ term adds an arithmetic progression over i and the $q_{b}$ term adds a constant to each summand. After removing them, only remainders are left to handle.

Key parameters

y_{m a x} = ⌊ \frac{an + b}{m} ⌋, x_{m a x} = y_{m a x} m - b, i_{0} = ⌈ \frac{x _{m a x}}{a} ⌉

Explanation: These values describe the top horizontal layer under the line and where it begins across the columns. They drive the main reduction step.

Core reduction (0 \le a,b < m)

S (n, m, a, b) = (n - i_{0}) \cdot y_{m a x} + S (y_{m a x}, a, m, (a - (x_{m a x} mod a)) mod a)

Explanation: Counts the top $y_{m} a x$ layers for the last n − i0 columns and re-expresses the remaining smaller area with swapped roles of a and m. This shrinks parameters logarithmically.

Sum of progression modulo m

i = 0 \sum n - 1 ((ai + b) mod m) = a \cdot \frac{n ( n - 1 )}{2} + bn - m \cdot S (n, m, a, b)

Explanation: The modulo is the original value minus m times the floor. This identity converts a sum of mods into a floor sum plus easy arithmetic terms.

Time complexity

O (lo g (min {a, m}))

Explanation: Each reduction step swaps and reduces (a, m) similarly to the Euclidean algorithm for gcd, yielding a logarithmic number of iterations.

Triangular number

\frac{n ( n - 1 )}{2} = i = 0 \sum n - 1 i

Explanation: This is the sum of the first n − 1 integers. It appears when summing the whole-part contribution from $q_{a}$ across i.

Complexity Analysis

Let T(a, m) denote the number of iterations after the whole-part extraction reduces (a, b) modulo m. When 0 ≤ a,

b < m

and

a > 0

, the algorithm computes y_

ma x = f l oor ((a

n + b)/m). It then updates (n, m, a, b) to (

y_{m} a x

, a, m, ne

w_{b}

) and swaps a with m. This is analogous to the Euclidean algorithm: the pair (a, m) becomes (m, a) with a reduced by modulo in the previous step. Consequently, the number of loop iterations is bounded by O(log(min(a, m))) (more conservatively O(log(a + m))). Each iteration performs O(1) arithmetic operations: a few multiplications/divisions and constant-time updates of n, a, b, m. Therefore, total time is O(log(min(a, m))). Space usage is O(1) auxiliary memory because we maintain a constant number of integer variables and use no recursion (iterative implementation). Care must be taken with integer overflow: terms such as (n − 1) * n *

\frac{a}{m}

/ 2, a * n, and

y_{m} a x

* m can overflow 64-bit if computed naively. Using 128-bit temporaries (e.g., C++ __int128) ensures correctness without affecting asymptotic complexity. The number of iterations is small (typically under ~120 even for 64-bit inputs), so constant factors are low and performance is excellent for competitive programming constraints with inputs up to 10^{18}.

Code Examples

Iterative floor_sum implementation (ACL-style) with 128-bit safety

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Computes S(n, m, a, b) = sum_{i=0}^{n-1} floor((a*i + b)/m)
5 // Assumes: n >= 0, m >= 1, and typically a, b >= 0.
6 // Uses iterative reduction to avoid recursion depth, with __int128 to avoid overflow.
7 long long floor_sum(long long n, long long m, long long a, long long b) {
8     long long ans = 0;
9     while (true) {
10         // Extract whole parts when a >= m: contributes (a/m) * sum_{i=0}^{n-1} i
11         if (a >= m) {
12             long long q = a / m;
13             __int128 add = (__int128)(n - 1) * n / 2 * q; // safe in 128-bit
14             ans += (long long)add;
15             a %= m;
16         }
17         // Extract whole parts when b >= m: contributes n * (b/m)
18         if (b >= m) {
19             long long q = b / m;
20             __int128 add = (__int128)n * q;
21             ans += (long long)add;
22             b %= m;
23         }
24         // Now 0 <= a, b < m (or a == 0). If a == 0, the remaining sum is zero.
25         if (a == 0) return ans;
26         // y_max = floor((a*n + b)/m). If zero, nothing more to add.
27         long long y_max = (long long)(((__int128)a * n + b) / m);
28         if (y_max == 0) return ans;
29         // x_max = y_max * m - b, i0 = ceil(x_max / a)
30         long long x_max = (long long)((__int128)y_max * m - b);
31         long long i0 = (x_max + a - 1) / a; // ceil division, a > 0 here
32         // Add the top y_max layers for the last (n - i0) columns
33         ans += (n - i0) * y_max;
34         // Shrink the problem and swap roles of a and m
35         n = y_max;
36         b = (a - (x_max % a)) % a; // new intercept, stays in [0, a)
37         swap(a, m);
38     }
39 }
40 
41 int main() {
42     ios::sync_with_stdio(false);
43     cin.tie(nullptr);
44 
45     // Demo: compute S(n,m,a,b) for a few cases
46     vector<tuple<long long,long long,long long,long long>> tests = {
47         {5, 7, 4, 3},     // sum_{i=0}^{4} floor((4i+3)/7)
48         {10, 6, 8, 5},    // larger a than m
49         {3, 3, 2, 0},     // expected 1
50         {1000000000000LL, 1000000000LL, 123456789LL, 987654321LL}
51     };
52 
53     for (auto [n, m, a, b] : tests) {
54         cout << "S(" << n << "," << m << "," << a << "," << b << ") = "
55              << floor_sum(n, m, a, b) << "\n";
56     }
57     return 0;
58 }
59

Implements the O(log(min(a,m))) floor sum using whole-part extraction and a Euclidean-like reduction. The variables y_max, x_max, and i0 identify the top horizontal layers to count immediately; then the instance shrinks by swapping (a, m). 128-bit temporaries ensure no overflow in intermediate products.

Time: O(log(min(a, m)))Space: O(1)

Sum of arithmetic progression modulo m via floor_sum

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 long long floor_sum(long long n, long long m, long long a, long long b) {
5     long long ans = 0;
6     while (true) {
7         if (a >= m) {
8             long long q = a / m;
9             __int128 add = (__int128)(n - 1) * n / 2 * q;
10             ans += (long long)add;
11             a %= m;
12         }
13         if (b >= m) {
14             long long q = b / m;
15             __int128 add = (__int128)n * q;
16             ans += (long long)add;
17             b %= m;
18         }
19         if (a == 0) return ans;
20         long long y_max = (long long)(((__int128)a * n + b) / m);
21         if (y_max == 0) return ans;
22         long long x_max = (long long)((__int128)y_max * m - b);
23         long long i0 = (x_max + a - 1) / a;
24         ans += (n - i0) * y_max;
25         n = y_max;
26         b = (a - (x_max % a)) % a;
27         swap(a, m);
28     }
29 }
30 
31 // Computes sum_{i=0}^{n-1} ((a*i + b) % m) in O(log(min(a,m)))
32 long long sum_progression_mod(long long n, long long m, long long a, long long b) {
33     // Using identity: sum mod = a * n(n-1)/2 + b*n - m * floor_sum(n, m, a, b)
34     __int128 total = 0;
35     total += (__int128)a * (n - 1) * n / 2; // a * n(n-1)/2
36     total += (__int128)b * n;               // + b*n
37     total -= (__int128)m * floor_sum(n, m, a, b);
38     return (long long)total;
39 }
40 
41 int main() {
42     ios::sync_with_stdio(false);
43     cin.tie(nullptr);
44 
45     // Example: sum of (3i + 2) mod 7 for i = 0..9
46     long long n = 10, m = 7, a = 3, b = 2;
47     long long ans = sum_progression_mod(n, m, a, b);
48 
49     // Verify by brute force for small n
50     long long brute = 0;
51     for (long long i = 0; i < n; ++i) brute += (a * i + b) % m;
52 
53     cout << "sum_{i=0}^{" << (n-1) << "} ((" << a << "*i+" << b << ") % " << m << ") = " << ans << "\n";
54     cout << "brute = " << brute << "\n";
55     return 0;
56 }
57

Uses the identity sum ((a i + b) mod m) = a*n(n−1)/2 + b*n − m*S(n,m,a,b). This is common when you need the total of a linear progression reduced modulo m without iterating over i.

Time: O(log(min(a, m)))Space: O(1)

Counting lattice points under a rational line in a strip

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 long long floor_sum(long long n, long long m, long long a, long long b) {
5     long long ans = 0;
6     while (true) {
7         if (a >= m) {
8             long long q = a / m;
9             __int128 add = (__int128)(n - 1) * n / 2 * q;
10             ans += (long long)add;
11             a %= m;
12         }
13         if (b >= m) {
14             long long q = b / m;
15             __int128 add = (__int128)n * q;
16             ans += (long long)add;
17             b %= m;
18         }
19         if (a == 0) return ans;
20         long long y_max = (long long)(((__int128)a * n + b) / m);
21         if (y_max == 0) return ans;
22         long long x_max = (long long)((__int128)y_max * m - b);
23         long long i0 = (x_max + a - 1) / a;
24         ans += (n - i0) * y_max;
25         n = y_max;
26         b = (a - (x_max % a)) % a;
27         swap(a, m);
28     }
29 }
30 
31 // Counts the number of integer points (x, y) with 0 <= x <= n-1 and 0 <= y < (a*x + b)/m
32 // This equals S(n, m, a, b).
33 int main() {
34     ios::sync_with_stdio(false);
35     cin.tie(nullptr);
36 
37     long long n = 12;         // number of columns: x = 0..11
38     long long a = 9, b = 7;   // line: y = (a*x + b)/m
39     long long m = 10;
40 
41     long long count_points = floor_sum(n, m, a, b);
42 
43     cout << "Lattice points under y = (" << a << "*x + " << b << ")/" << m
44          << " for x in [0," << (n-1) << "] = " << count_points << "\n";
45 
46     // Optional small verification by brute force
47     long long brute = 0;
48     for (long long x = 0; x < n; ++x) {
49         long long ymax = (a * x + b) / m; // floor division
50         brute += ymax;
51     }
52     cout << "brute = " << brute << "\n";
53     return 0;
54 }
55

Interprets the floor sum as counting lattice points beneath the rational line y = (a x + b)/m within the strip x ∈ {0, …, n−1}. The floor_sum function directly yields the count, useful in combinatorics and geometry problems.

Time: O(log(min(a, m)))Space: O(1)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Computes S(n, m, a, b) = sum_{i=0}^{n-1} floor((a*i + b)/m)
5	// Assumes: n >= 0, m >= 1, and typically a, b >= 0.
6	// Uses iterative reduction to avoid recursion depth, with __int128 to avoid overflow.
7	long long floor_sum(long long n, long long m, long long a, long long b) {
8	long long ans = 0;
9	while (true) {
10	// Extract whole parts when a >= m: contributes (a/m) * sum_{i=0}^{n-1} i
11	if (a >= m) {
12	long long q = a / m;
13	__int128 add = (__int128)(n - 1) * n / 2 * q; // safe in 128-bit
14	ans += (long long)add;
15	a %= m;
16	}
17	// Extract whole parts when b >= m: contributes n * (b/m)
18	if (b >= m) {
19	long long q = b / m;
20	__int128 add = (__int128)n * q;
21	ans += (long long)add;
22	b %= m;
23	}
24	// Now 0 <= a, b < m (or a == 0). If a == 0, the remaining sum is zero.
25	if (a == 0) return ans;
26	// y_max = floor((a*n + b)/m). If zero, nothing more to add.
27	long long y_max = (long long)(((__int128)a * n + b) / m);
28	if (y_max == 0) return ans;
29	// x_max = y_max * m - b, i0 = ceil(x_max / a)
30	long long x_max = (long long)((__int128)y_max * m - b);
31	long long i0 = (x_max + a - 1) / a; // ceil division, a > 0 here
32	// Add the top y_max layers for the last (n - i0) columns
33	ans += (n - i0) * y_max;
34	// Shrink the problem and swap roles of a and m
35	n = y_max;
36	b = (a - (x_max % a)) % a; // new intercept, stays in [0, a)
37	swap(a, m);
38	}
39	}
40
41	int main() {
42	ios::sync_with_stdio(false);
43	cin.tie(nullptr);
44
45	// Demo: compute S(n,m,a,b) for a few cases
46	vector<tuple<long long,long long,long long,long long>> tests = {
47	{5, 7, 4, 3}, // sum_{i=0}^{4} floor((4i+3)/7)
48	{10, 6, 8, 5}, // larger a than m
49	{3, 3, 2, 0}, // expected 1
50	{1000000000000LL, 1000000000LL, 123456789LL, 987654321LL}
51	};
52
53	for (auto [n, m, a, b] : tests) {
54	cout << "S(" << n << "," << m << "," << a << "," << b << ") = "
55	<< floor_sum(n, m, a, b) << "\n";
56	}
57	return 0;
58	}
59