NTT (Number Theoretic Transform)
Key Points
- •The Number Theoretic Transform (NTT) is an FFT-like algorithm that performs discrete convolutions exactly using modular arithmetic instead of floating-point numbers.
- •NTT requires a prime modulus p with a large power-of-two factor in p−1 so that high-order roots of unity exist (e.g., 998244353 = 119×2^{23}+1 with primitive root g=3).
- •Convolution via NTT runs in O(n n) time and avoids rounding errors, making it ideal for competitive programming and exact polynomial operations.
- •NTT computes transforms using butterfly operations, bit-reversal permutations, and modular powers of a primitive root of unity.
- •To multiply under a non-NTT-friendly modulus (like 10^9+7) or with large integers, combine results from multiple NTT-friendly primes using the Chinese Remainder Theorem (CRT).
- •Always pad vectors to a power-of-two size at least the sum of lengths minus one to match the needed convolution length.
- •Remember to multiply by the modular inverse of n when doing the inverse transform to correctly scale results.
- •Careful implementation details—like precomputing roots, handling negatives modulo p, and avoiding 32-bit overflow—are crucial for correctness and speed.
Prerequisites
- →Discrete Fourier Transform and FFT — NTT mirrors FFT structure; understanding DFT/FFT illuminates why convolution becomes pointwise multiplication.
- →Modular Arithmetic — NTT operates entirely modulo a prime; you must be comfortable with modular addition, multiplication, and reduction.
- →Fast Modular Exponentiation — Computing powers of primitive roots and modular inverses via Fermat’s little theorem is core to NTT.
- →Primitive Roots and Orders — Existence of n-th roots of unity in Z_p requires that n divides p−1 and knowledge of primitive roots.
- →Chinese Remainder Theorem (CRT) — Combining results from multiple NTT primes to reach arbitrary moduli or exact integers depends on CRT/Garner.
- →Asymptotic Complexity — Understanding O(n \log n) informs when NTT beats naive O(n^2) multiplication.
- →64-bit Arithmetic and Overflow — NTT and CRT multiplications must use 64-bit or 128-bit intermediates to avoid overflow.
Detailed Explanation
Tap terms for definitions01Overview
The Number Theoretic Transform (NTT) is the integer-arithmetic twin of the Fast Fourier Transform (FFT). Where the FFT evaluates a polynomial at complex roots of unity and risks floating-point rounding errors, the NTT evaluates at modular roots of unity inside a finite field (integers modulo a prime p). This allows exact computations with integers, which is essential in competitive programming and cryptographic protocols. The most common practical use is fast polynomial (or integer) multiplication via convolution: given sequences a and b, NTT computes their convolution c where c[k] = sum_i a[i] * b[k-i]. Like FFT, it achieves O(n log n) performance by recursively (or iteratively) structuring the calculations into butterfly steps and exploiting roots of unity.
However, NTT requires special primes p such that p−1 is divisible by a large power of two: this ensures that high-order roots of unity exist. A canonical example is p = 998244353 with primitive root g = 3; it supports length up to 2^{23} transforms. When the target modulus (e.g., 10^9+7) is not NTT-friendly, one can perform NTT under multiple friendly moduli and then reconstruct the exact integer result via the Chinese Remainder Theorem (CRT), finally reducing modulo the target. This approach retains exactness while keeping the asymptotic complexity.
In practice, NTT implementations hinge on efficient modular exponentiation, precomputation of twiddle factors (powers of primitive roots), in-place bit-reversal permutation, and careful scaling in the inverse transform. With these, NTT provides fast, deterministic, and exact convolution for large inputs.
02Intuition & Analogies
Imagine you want to know how two sound waves combine. The FFT lets you switch to a frequency view, multiply the corresponding frequencies, and switch back—far faster than directly summing overlaps in time. NTT does the same trick but replaces continuous waves with exact, discrete arithmetic in a finite world where numbers wrap around modulo a prime p.
Think of evaluating a polynomial at special points that "cycle" perfectly after a fixed number of steps—these are roots of unity. In the real/complex world, you would use angles on the unit circle (cosines and sines). In the NTT world, you pick a special number g (a primitive root modulo p) whose powers loop back to 1 after p−1 steps. By choosing the right step size, you get points that loop every n steps—these are the n-th roots of unity modulo p.
Why is this powerful? Because evaluating a polynomial at these structured points lets you turn convolution (which is like sliding one sequence across another and summing overlaps) into pointwise multiplication. After multiplying the values at those points, you transform back to get the exact convolution. No rounding, no floating errors, just pure modular arithmetic. The only caveat: your finite world must contain those special points. So you pick a prime p whose p−1 has a big power of 2 factor, ensuring many roots of unity exist. This is why 998244353 is beloved: it’s big enough and has 2^{23} dividing p−1, so you can handle sequences up to 8 million length with power-of-two padding. When your end goal is a different modulus (like 1e9+7) or even raw integer results, you can combine multiple NTT computations using CRT to stitch together the exact answer.
03Formal Definition
04When to Use
Use NTT when you need exact integer polynomial or sequence convolution at large scale: multiplying polynomials, computing power series, combinatorics (e.g., counting paths, subset convolutions when adapted), and big integer multiplication. It shines when rounding errors from FFT would be problematic, such as in competitive programming problems that demand exact answers or modular arithmetic.
- Convolution under an NTT-friendly modulus: If your target modulus is 998244353, 1004535809, 167772161, etc., NTT is a direct drop-in with minimal overhead.
- Convolution under arbitrary prime/composite modulus (e.g., 10^9+7): Perform multiple NTTs under different NTT-friendly primes and reconstruct via CRT (or Garner) modulo your target.
- Big integer multiplication: Split the numbers into base-B digits, convolve using NTT under one or more primes, and carry-compact to obtain exact integers.
- Power series operations: Inversion, logarithm, exponentiation, and division of polynomials over (\mathbb{Z}_p) rely heavily on fast convolution.
Avoid NTT when sequences are very small (naive O(n^2) might be faster) or when you cannot ensure that required transform lengths divide p−1. For non power-of-two lengths, you may either pad to the next power of two or use Bluestein’s algorithm variant, though padding is typically simpler and faster under common NTT primes.
⚠️Common Mistakes
- Using a modulus without sufficient 2-adic length: If n does not divide p−1, the n-th roots of unity do not exist, and the transform will be incorrect. Always ensure the padded length n is a power of two that divides p−1.
- Forgetting the inverse scaling: The inverse NTT must multiply by n^{-1} modulo p. Omitting this leaves results off by a factor of n.
- Wrong primitive root or power: Picking (\omega = g^{(p-1)/n}) where g is not a primitive root—or where n does not exactly divide p−1—breaks correctness. Verify g is a primitive root and n | (p−1).
- Integer overflow before reduction: Multiplying two 32-bit ints can overflow 32-bit. Use 64-bit (int64_t or __int128) temporaries before reducing modulo p.
- Mishandling negatives: After subtraction in butterflies, values may become negative. Normalize by adding p before a modulo reduction to keep values in [0, p−1].
- Incorrect padding length: For linear convolution of sizes n_a and n_b, pad to n ≥ n_a + n_b − 1, usually rounding up to the next power of two supported by p.
- Mixing moduli in CRT improperly: When reconstructing under a target modulus, ensure moduli are pairwise coprime, precompute modular inverses correctly, and use 128-bit intermediates to avoid overflow.
- Reusing precomputed roots across different n incorrectly: Precomputed twiddle tables depend on the stage length. Either precompute per power-of-two or compute wlen freshly at each stage.
Key Formulas
NTT (forward)
Explanation: This is the discrete transform over a finite field: evaluate the polynomial at powers of a primitive n-th root of unity modulo p. It maps time-domain coefficients to frequency-domain values.
Inverse NTT
Explanation: To invert, use the inverse root of unity and multiply by the modular inverse of n. This perfectly recovers the original coefficients.
Linear convolution
Explanation: The k-th coefficient of the product polynomial equals the sum over all index pairs adding to k. Computing this naively is O(), which NTT reduces to O(n log n).
Primitive n-th root of unity
Explanation: Given a primitive root g modulo p and a transform length n dividing p−1, this constructs an element of exact order n. Its powers are the evaluation points of the NTT.
Time Complexity
Explanation: The NTT, like FFT, uses divide-and-conquer butterfly stages. There are log n stages, each doing O(n) work, leading to O(n log n) total operations.
CRT for two moduli
Explanation: Combines residues r0 and r1 modulo coprime m0 and m1 into a unique solution modulo m0·m1. This is used to merge NTT results from two primes.
Garner lifting step
Explanation: Garner's algorithm incrementally reconstructs a residue modulo a target without forming huge products. is the current combined modulus product.
Padding length
Explanation: For linear convolution of sizes and , pad to the next power of two at least as large as +−1. This ensures the transform length is valid.
Fermat inverse
Explanation: For prime p and nonzero a, the modular inverse can be computed via fast exponentiation using Fermat’s little theorem.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // NTT under modulus 998244353 with primitive root g = 3 5 // Supports power-of-two lengths up to 2^23. 6 7 static const uint32_t MOD = 998244353; // 119 * 2^23 + 1 8 static const uint32_t G = 3; // primitive root modulo MOD 9 10 uint32_t addmod(uint32_t a, uint32_t b) { 11 uint32_t s = a + b; 12 if (s >= MOD) s -= MOD; 13 return s; 14 } 15 uint32_t submod(uint32_t a, uint32_t b) { 16 return a >= b ? a - b : a + MOD - b; 17 } 18 uint32_t powmod(uint32_t a, uint32_t e) { 19 uint64_t r = 1, x = a; 20 while (e) { 21 if (e & 1) r = (r * x) % MOD; 22 x = (x * x) % MOD; 23 e >>= 1; 24 } 25 return (uint32_t)r; 26 } 27 28 void ntt(vector<uint32_t> &a, bool invert) { 29 int n = (int)a.size(); 30 // Bit-reversal permutation 31 for (int i = 1, j = 0; i < n; i++) { 32 int bit = n >> 1; 33 for (; j & bit; bit >>= 1) j ^= bit; 34 j ^= bit; 35 if (i < j) swap(a[i], a[j]); 36 } 37 38 for (int len = 2; len <= n; len <<= 1) { 39 // wlen = g^{(MOD-1)/len} 40 uint32_t wlen = powmod(G, (MOD - 1) / len); 41 if (invert) { 42 // use inverse root for inverse NTT 43 wlen = powmod(wlen, MOD - 2); 44 } 45 for (int i = 0; i < n; i += len) { 46 uint64_t w = 1; 47 for (int j = 0; j < len / 2; j++) { 48 uint32_t u = a[i + j]; 49 uint32_t v = (uint32_t)((w * a[i + j + len / 2]) % MOD); 50 a[i + j] = addmod(u, v); 51 a[i + j + len / 2] = submod(u, v); 52 w = (w * wlen) % MOD; 53 } 54 } 55 } 56 57 if (invert) { 58 uint32_t inv_n = powmod(n, MOD - 2); 59 for (int i = 0; i < n; i++) { 60 a[i] = (uint32_t)((uint64_t)a[i] * inv_n % MOD); 61 } 62 } 63 } 64 65 vector<uint32_t> convolution_mod(const vector<uint32_t>& A, const vector<uint32_t>& B) { 66 if (A.empty() || B.empty()) return {}; 67 int nA = (int)A.size(), nB = (int)B.size(); 68 int need = nA + nB - 1; 69 int n = 1; while (n < need) n <<= 1; 70 vector<uint32_t> fa(A.begin(), A.end()), fb(B.begin(), B.end()); 71 fa.resize(n); fb.resize(n); 72 73 ntt(fa, false); 74 ntt(fb, false); 75 for (int i = 0; i < n; i++) fa[i] = (uint32_t)((uint64_t)fa[i] * fb[i] % MOD); 76 ntt(fa, true); 77 78 fa.resize(need); 79 return fa; 80 } 81 82 int main() { 83 ios::sync_with_stdio(false); 84 cin.tie(nullptr); 85 86 // Example: multiply polynomials (2 + 3x + 4x^2) * (1 + 2x) 87 vector<uint32_t> A = {2, 3, 4}; 88 vector<uint32_t> B = {1, 2}; 89 90 vector<uint32_t> C = convolution_mod(A, B); 91 92 // Expected: [2*1, 2*2 + 3*1, 3*2 + 4*1, 4*2] = [2, 7, 10, 8] 93 for (size_t i = 0; i < C.size(); i++) { 94 cout << C[i] << (i + 1 == C.size() ? '\n' : ' '); 95 } 96 return 0; 97 } 98
Implements an iterative Cooley–Tukey NTT under 998244353 with primitive root 3. The convolution pads inputs to power-of-two length, performs two forward transforms, multiplies pointwise, and applies the inverse transform with scaling by n^{-1}.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // We will convolve two sequences and return results modulo 1e9+7 (not NTT-friendly) 5 // by combining three NTT-friendly primes via CRT. 6 7 struct ModNTT { 8 uint32_t MOD, G; 9 ModNTT(uint32_t mod, uint32_t g): MOD(mod), G(g) {} 10 inline uint32_t add(uint32_t a, uint32_t b) const { uint32_t s=a+b; return s>=MOD? s-MOD : s; } 11 inline uint32_t sub(uint32_t a, uint32_t b) const { return a>=b? a-b : a+MOD-b; } 12 uint32_t powmod(uint32_t a, uint32_t e) const { 13 uint64_t r=1, x=a; 14 while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1; } 15 return (uint32_t)r; 16 } 17 void ntt(vector<uint32_t>& a, bool inv) const { 18 int n=(int)a.size(); 19 for(int i=1,j=0;i<n;i++){ 20 int bit=n>>1; for(; j & bit; bit>>=1) j^=bit; j^=bit; if(i<j) swap(a[i],a[j]); 21 } 22 for(int len=2; len<=n; len<<=1){ 23 uint32_t wlen = powmod(G, (MOD-1)/len); 24 if(inv) wlen = powmod(wlen, MOD-2); 25 for(int i=0;i<n;i+=len){ 26 uint64_t w=1; 27 for(int j=0;j<len/2;j++){ 28 uint32_t u=a[i+j]; 29 uint32_t v=(uint32_t)((w*a[i+j+len/2])%MOD); 30 a[i+j] = add(u,v); 31 a[i+j+len/2] = sub(u,v); 32 w = (w*wlen)%MOD; 33 } 34 } 35 } 36 if(inv){ 37 uint32_t inv_n = powmod(n, MOD-2); 38 for(int i=0;i<n;i++) a[i] = (uint32_t)((uint64_t)a[i]*inv_n % MOD); 39 } 40 } 41 42 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B) const { 43 if(A.empty() || B.empty()) return {}; 44 int need = (int)A.size() + (int)B.size() - 1; 45 int n=1; while(n<need) n<<=1; 46 vector<uint32_t> fa(A.begin(),A.end()), fb(B.begin(),B.end()); 47 fa.resize(n); fb.resize(n); 48 ntt(fa,false); ntt(fb,false); 49 for(int i=0;i<n;i++) fa[i] = (uint32_t)((uint64_t)fa[i]*fb[i]%MOD); 50 ntt(fa,true); 51 fa.resize(need); 52 return fa; 53 } 54 }; 55 56 static const uint32_t MOD_TARGET = 1000000007u; // 1e9+7 57 58 // Three NTT-friendly primes (all have primitive root 3) 59 static const uint32_t P[3] = {998244353u, 1004535809u, 167772161u}; 60 static const uint32_t G[3] = {3u, 3u, 3u}; 61 62 // Modular exponentiation for arbitrary modulus (used to compute inverses mod primes) 63 uint64_t mod_pow64(uint64_t a, uint64_t e, uint64_t m){ 64 __int128 r=1, x=a % m; 65 while(e){ if(e&1) r=(r*x)%m; x=(x*x)%m; e>>=1; } 66 return (uint64_t)r; 67 } 68 69 // Reconstruct a single coefficient modulo MOD_TARGET using Garner-like steps 70 uint32_t crt_to_target(uint32_t r0, uint32_t r1, uint32_t r2){ 71 // x ≡ r0 (mod P0) 72 uint64_t x0 = r0 % MOD_TARGET; 73 74 // Lift to include P1 75 uint64_t m0 = P[0]; 76 uint64_t t1 = r1 >= r0 ? (r1 - r0) : (uint64_t)r1 + P[1] - r0; // in [0, P1) 77 uint64_t inv_m0_mod_p1 = mod_pow64(m0 % P[1], P[1]-2, P[1]); 78 t1 = (t1 * inv_m0_mod_p1) % P[1]; 79 __int128 M0_mod_T = m0 % MOD_TARGET; 80 x0 = (uint64_t)(((__int128)x0 + (__int128)t1 * M0_mod_T) % MOD_TARGET); 81 82 // Lift to include P2 83 uint64_t m01_mod_p2 = (uint64_t)(((__int128)P[0] % P[2]) * (P[1] % P[2]) % P[2]); 84 uint64_t t2_rhs; 85 // Compute current x modulo P2 without forming huge product: x = r0 + m0*t1 (mod P2) 86 uint64_t r0_plus = (r0 + (uint64_t)(((__int128)(P[0]%P[2]) * (t1 % P[2])) % P[2])) % P[2]; 87 t2_rhs = r2 >= r0_plus ? (r2 - r0_plus) : (uint64_t)r2 + P[2] - r0_plus; 88 uint64_t inv_m01_mod_p2 = mod_pow64(m01_mod_p2, P[2]-2, P[2]); 89 uint64_t t2 = (t2_rhs * inv_m01_mod_p2) % P[2]; 90 91 // Update x0 modulo target: multiply by (P0*P1) mod MOD_TARGET 92 uint64_t M01_mod_T = (uint64_t)(((__int128)(P[0]%MOD_TARGET) * (P[1]%MOD_TARGET)) % MOD_TARGET); 93 x0 = (uint64_t)(((__int128)x0 + (__int128)t2 * M01_mod_T) % MOD_TARGET); 94 return (uint32_t)x0; 95 } 96 97 vector<uint32_t> convolution_mod_1e9p7(const vector<uint32_t>& A, const vector<uint32_t>& B){ 98 ModNTT N0(P[0], G[0]); 99 ModNTT N1(P[1], G[1]); 100 ModNTT N2(P[2], G[2]); 101 vector<uint32_t> c0 = N0.convolution(A,B); 102 vector<uint32_t> c1 = N1.convolution(A,B); 103 vector<uint32_t> c2 = N2.convolution(A,B); 104 size_t n = c0.size(); 105 vector<uint32_t> res(n); 106 for(size_t i=0;i<n;i++) res[i] = crt_to_target(c0[i], c1[i], c2[i]); 107 return res; 108 } 109 110 int main(){ 111 ios::sync_with_stdio(false); 112 cin.tie(nullptr); 113 114 // Example: multiply under 1e9+7 target modulus 115 vector<uint32_t> A = {1000000000u, 2u, 3u}; // large values mod 1e9+7 are okay as residues 116 vector<uint32_t> B = {7u, 8u, 9u, 10u}; 117 // Reduce inputs modulo target? Not necessary; we want exact integer conv then mod 1e9+7. 118 // Values are treated as integers less than primes; safe as we use 64-bit temporaries. 119 120 vector<uint32_t> C = convolution_mod_1e9p7(A, B); 121 for (size_t i = 0; i < C.size(); i++) cout << C[i] << (i+1==C.size()?'\n':' '); 122 return 0; 123 } 124
Computes convolution modulo 1e9+7 by performing three NTT convolutions under different primes, then reconstructing each coefficient modulo 1e9+7 using a Garner-style CRT combination. It avoids large intermediate products by reducing at each step and uses 128-bit temporaries for safety.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Multiply big nonnegative integers given as decimal strings using NTT + CRT. 5 // We split digits in base B=10^4, convolve, then carry-compact to get the result. 6 // We use two NTT-friendly primes whose product fits in 64 bits for exact reconstruction. 7 8 struct NTT { 9 uint32_t MOD, G; 10 NTT(uint32_t mod, uint32_t g): MOD(mod), G(g) {} 11 inline uint32_t add(uint32_t a, uint32_t b) const { uint32_t s=a+b; return s>=MOD? s-MOD : s; } 12 inline uint32_t sub(uint32_t a, uint32_t b) const { return a>=b? a-b : a+MOD-b; } 13 uint32_t powmod(uint32_t a, uint32_t e) const { 14 uint64_t r=1, x=a; 15 while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1; } 16 return (uint32_t)r; 17 } 18 void ntt(vector<uint32_t>& a, bool inv) const { 19 int n=(int)a.size(); 20 for(int i=1,j=0;i<n;i++){ 21 int bit=n>>1; for(; j & bit; bit>>=1) j^=bit; j^=bit; if(i<j) swap(a[i],a[j]); 22 } 23 for(int len=2; len<=n; len<<=1){ 24 uint32_t wlen = powmod(G, (MOD-1)/len); 25 if(inv) wlen = powmod(wlen, MOD-2); 26 for(int i=0;i<n;i+=len){ 27 uint64_t w=1; 28 for(int j=0;j<len/2;j++){ 29 uint32_t u=a[i+j]; 30 uint32_t v=(uint32_t)((w*a[i+j+len/2])%MOD); 31 a[i+j] = add(u,v); 32 a[i+j+len/2] = sub(u,v); 33 w = (w*wlen)%MOD; 34 } 35 } 36 } 37 if(inv){ 38 uint32_t inv_n = powmod(n, MOD-2); 39 for(int i=0;i<n;i++) a[i] = (uint32_t)((uint64_t)a[i]*inv_n % MOD); 40 } 41 } 42 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B) const { 43 if(A.empty() || B.empty()) return {}; 44 int need = (int)A.size() + (int)B.size() - 1; 45 int n=1; while(n<need) n<<=1; 46 vector<uint32_t> fa(A.begin(),A.end()), fb(B.begin(),B.end()); 47 fa.resize(n); fb.resize(n); 48 ntt(fa,false); ntt(fb,false); 49 for(int i=0;i<n;i++) fa[i] = (uint32_t)((uint64_t)fa[i]*fb[i]%MOD); 50 ntt(fa,true); 51 fa.resize(need); 52 return fa; 53 } 54 }; 55 56 // Two primes (primitive root 3 for both) 57 static const uint32_t P0 = 998244353u; // ~1e9 58 static const uint32_t P1 = 1004535809u; // ~1e9 59 static const uint32_t G0 = 3u, G1 = 3u; 60 61 // Combine residues r0 (mod P0) and r1 (mod P1) into x in [0, P0*P1) 62 static inline unsigned long long crt2(uint32_t r0, uint32_t r1){ 63 unsigned long long m0 = P0, m1 = P1; 64 // t ≡ (r1 - r0) * inv(m0, m1) (mod m1) 65 unsigned long long diff = (r1 >= r0) ? (r1 - r0) : (unsigned long long)r1 + m1 - r0; 66 // inv_m0_mod_m1 via Fermat since m1 is prime 67 auto modpow = [](unsigned long long a, unsigned long long e, unsigned long long m){ __int128 r=1,x=a%m; while(e){ if(e&1) r=(r*x)%m; x=(x*x)%m; e>>=1;} return (unsigned long long)r; }; 68 unsigned long long inv_m0 = modpow(m0 % m1, m1 - 2, m1); 69 unsigned long long t = ( (__int128)diff * inv_m0 ) % m1; 70 unsigned long long x = (unsigned long long)r0 + (unsigned long long)((__int128)t * m0); 71 return x; // fits in < 1e18 72 } 73 74 vector<int> to_base(const string& s, int B){ 75 // Parse decimal string into base-B little-endian digits 76 vector<int> a; a.reserve((int)s.size()/4+2); 77 int n=(int)s.size(); 78 int i=n; 79 while(i>0){ 80 int l=max(0,i-4); // chunk 4 decimal digits 81 int val=0; 82 for(int k=l;k<i;k++) val = val*10 + (s[k]-'0'); 83 a.push_back(val); 84 i=l; 85 } 86 // Now a holds base-10^4 digits little-endian 87 return a; 88 } 89 90 string from_base(vector<unsigned long long>& a, int B){ 91 // Remove leading zeros 92 while(!a.empty() && a.back()==0) a.pop_back(); 93 if(a.empty()) return string("0"); 94 // Convert back to decimal string 95 const int WIDTH = 4; // since B=10^4 96 stringstream ss; 97 ss << a.back(); 98 char buf[8]; 99 for(int i=(int)a.size()-2;i>=0;i--){ 100 unsigned long long x = a[i]; 101 // print with leading zeros to WIDTH digits 102 snprintf(buf, sizeof(buf), "%04llu", x); 103 ss << buf; 104 } 105 return ss.str(); 106 } 107 108 string multiply_big(const string& A, const string& B){ 109 if (A=="0" || B=="0") return "0"; 110 const int BASE = 10000; // 1e4 111 112 vector<int> a = to_base(A, BASE); 113 vector<int> b = to_base(B, BASE); 114 115 NTT N0(P0, G0), N1(P1, G1); 116 vector<uint32_t> a0(a.size()), b0(b.size()), a1(a.size()), b1(b.size()); 117 for(size_t i=0;i<a.size();i++){ a0[i]=a[i]%P0; a1[i]=a[i]%P1; } 118 for(size_t i=0;i<b.size();i++){ b0[i]=b[i]%P0; b1[i]=b[i]%P1; } 119 120 vector<uint32_t> c0 = N0.convolution(a0,b0); 121 vector<uint32_t> c1 = N1.convolution(a1,b1); 122 123 size_t n = c0.size(); 124 vector<unsigned long long> coeff(n); 125 for(size_t i=0;i<n;i++) coeff[i] = crt2(c0[i], c1[i]); // exact coefficient value (< P0*P1) 126 127 // Carry handling in base BASE 128 unsigned long long carry=0; 129 for(size_t i=0;i<n;i++){ 130 unsigned __int128 v = (unsigned __int128)coeff[i] + carry; 131 coeff[i] = (unsigned long long)(v % BASE); 132 carry = (unsigned long long)(v / BASE); 133 } 134 while(carry){ coeff.push_back(carry % BASE); carry /= BASE; } 135 136 return from_base(coeff, BASE); 137 } 138 139 int main(){ 140 ios::sync_with_stdio(false); 141 cin.tie(nullptr); 142 143 string A, B; 144 // Demo: multiply two large numbers 145 A = "123456789012345678901234567890"; 146 B = "98765432109876543210"; 147 string C = multiply_big(A, B); 148 cout << C << "\n"; 149 return 0; 150 } 151
Parses decimal strings into base 10^4 digits, performs two NTT convolutions under 998244353 and 1004535809, combines coefficients exactly via CRT (product < 2^64), and then performs carry propagation to reconstruct the decimal result. This achieves exact big-integer multiplication in O(n log n).