⚙️AlgorithmAdvanced

Polynomial Operations

Key Points

•
Fast polynomial operations treat coefficients like numbers but use FFT/NTT to multiply in O(n $lo g$ n) time instead of O( $n^{2}$ ).
•
Division and modulo of polynomials can be done in O(n $lo g$ n) using Newton's method and reversed polynomials.
•
Formal power series let you compute polynomial inverse, logarithm, exponential, and square root with Newton iterations that double precision each step.
•
The key identities are $lo g$ F = $\int$ F'/F, $exp$ , and Newton updates like $g_{k + 1} = g_{k}$ $\big$ (2 - F $g_k\big$ ) for inversion.
•
Multipoint evaluation computes values at many $x_{i}$ using a product tree and repeated remainders in about O(n $lo g^{2}$ n).
•
Interpolation reconstructs the unique $d e g ree < n$ polynomial from ( $x_{i}$ , $y_{i}$ ) pairs using the same tree and derivative trick.
•
Numerical stability is avoided by working modulo a prime like 998244353 with NTT; pick a modulus with a suitable primitive root.
•
Common pitfalls include forgetting to normalize lengths, trailing zeros, wrong base cases, or violating preconditions like F(0) = 1 for log.

Prerequisites

→FFT/NTT basics — Fast multiplication underlies all advanced polynomial operations and must be understood first.
→Modular arithmetic and inverses — All computations are done modulo a prime; modular inverses are required for division, integration, and Newton iterations.
→Formal derivatives and integrals — Logarithm and exponential of series rely on derivative/integral identities.
→Newton's method — Power series inversion, exp, log, and sqrt use Newton iterations that double precision.
→Divide and Conquer — Multipoint evaluation and interpolation are implemented over a product tree.

Detailed Explanation

Tap terms for definitions

01Overview

Polynomial operations generalize arithmetic on numbers to arithmetic on coefficient sequences. Instead of adding or multiplying single values, we add or multiply entire polynomials, which correspond to sequences of coefficients. For large degrees, naive multiplication takes O(n^2), which is too slow for many algorithmic problems. Fast Fourier Transform (FFT) or its modular cousin the Number Theoretic Transform (NTT) accelerates multiplication to O(n \log n). Building on fast multiplication, we can perform advanced operations like division, modulo, inversion (1/F), logarithm, exponential, and square root of formal power series efficiently using Newton's method. These methods iteratively double the number of correct coefficients, reaching O(n \log n) total time. In competitive programming and combinatorics, such operations are crucial for manipulating generating functions, solving recurrences, and building polynomials from point-value representations. Multipoint evaluation computes values of a polynomial at many points faster than evaluating one by one, while interpolation rebuilds a polynomial from its values at points. Together, these tools form a powerful toolkit for CF 2200–2600 problems involving convolutions, combinatorics, and power series.

02Intuition & Analogies

Think of a polynomial as a recipe list: the k-th coefficient tells you how much of the ingredient x^k to include. Adding polynomials just adds recipes. Multiplication is like combining two recipes so that every pair of ingredients x^i and x^j combine to contribute to x^{i+j}; that’s a convolution. Doing this pair-by-pair naively is slow (like mixing every spice with every other). FFT/NTT changes perspective: it converts the recipe into a flavor profile at several tasting points where mixing is easy (component-wise multiplication), then transforms back. This is like switching to a domain where combining is effortless. Newton’s method for power series is similar to tuning a recipe: you start with a rough approximation (say, the constant term) and repeatedly correct it so that the result matches more coefficients. Each correction doubles how many coefficients are right, which means you reach your goal in logarithmically many steps. Division uses the same trick but in reverse: to divide A by B, you approximate 1/B and multiply by A. Logs and exps of power series come from calculus identities: log F is the integral of F'/F, and exp solves G'/G = F'. Because we only care about the first n coefficients, all operations are done modulo x^n, keeping vectors short. Multipoint evaluation and interpolation leverage a binary tree of polynomials (x - x_i). Remainders down the tree give you values at leaves; the reverse process rebuilds the polynomial from weighted sums of sibling products, akin to divide-and-conquer blending of local fits into a global curve.

03Formal Definition

Let F(x) =

\sum_{k = 0}^{\infty}

f_{k}

x^{k}

be a formal power series over a field

F

(commonly a finite field

F_{p}

). Polynomial addition and multiplication are defined coefficient-wise and by convolution respectively. Fast multiplication uses FFT/NTT to compute the Cauchy product in O(n

lo g

n) for degree-bounded series modulo

x^{n}

. The formal inverse

G = F^{- 1}

exists if and only if

f_{0}

\neq =

0 and satisfies F

\cdot

\equiv

(mod x^{n})

. Newton iteration for inversion sets

G_{k + 1} = G_{k}

\big

(2 - F

G_k\big

)

mod

x^{2 m}

starting from

G_{0} = f_{0}^{- 1}

, doubling precision. The formal logarithm is defined when

f_{0} = 1

lo g

F =

\int

(F'/F) \; dx, truncated modulo

x^{n}

. The formal exponential is defined for series with constant term 0 as

exp

H being the unique G with G(0)=1 and

lo g

G = H;

a Newton update is

G_{k + 1} = G_{k}

\big

(1 -

lo g

G_{k}

+ H

\big

)

mod

x^{2 m}

. The formal square root exists if

f_{0}

is a quadratic residue; Newton update

g_{k + 1}

\frac{1}{2}

\big

(

g_{k}

+ F/

g_k\big

)

mod

x^{2 m}

. Polynomial division

A = BQ + R

with

de g

R <

de g

B can be computed via reversed polynomials and inversion modulo

x^{n - m + 1}

. Multipoint evaluation of P at points

x_{1}

\dots

x_{m}

uses a product tree

T_{i}

(x) =

\prod_{j \in se g m e n t_{i}}

(x -

x_{j}

) and computes remainders P

mod

T_{i}

down the tree to obtain P(

x_{j}

). Interpolation constructs the unique Q of

d e g ree < m

through (

x_{i}

y_{i}

) by computing the derivative of

\prod

(x -

x_{i}

), evaluating it at

x_{i}

, and combining weights via the same tree.

04When to Use

Use fast polynomial operations when degrees reach 2e5 or more, where naive O(n^2) is infeasible. In combinatorics, employ convolution to count ways to combine choices, compute partitions, or perform subset sums over degrees. Use division/modulo to implement polynomial long division and to compute remainders needed in Chinese remainder-like constructions on polynomials. Choose inversion, logarithm, exponential, and square root of series when manipulating generating functions: solving linear recurrences via generating functions often needs 1/F, while counting labeled structures uses exp/log of series (e.g., exponential generating functions). Newton-based sqrt is used in quadratic equation solutions over series. Multipoint evaluation is ideal when you need values at many points (e.g., building value tables, barycentric Lagrange-like usage) more efficiently than evaluating one-by-one. Interpolation is used to reconstruct polynomials from sample values, such as recovering coefficients of an unknown polynomial, or converting from point-value form to coefficient form after a divide-and-conquer algorithm. In CF-style problems, these techniques appear in tasks involving convolution chains, divide-and-conquer DP optimization with convolutions, solving functional equations in power series, and offline evaluation/interpolation routines.

⚠️Common Mistakes

Ignoring preconditions: inversion requires f_0 \neq 0; log requires f_0 = 1; exp requires constant term 0; sqrt requires the constant term to be a quadratic residue. Violating these leads to division by zero or undefined operations.
Forgetting to trim trailing zeros: many algorithms assume correct degree; stale high-degree zeros can bloat NTT sizes and break degree bounds in division/modulo.
Wrong truncation: Newton iterations must always reduce modulo x^m (or x^{2m} when doubling) to keep vectors bounded; forgetting truncation doubles lengths and time.
NTT length mis-sizing: convolution requires a power-of-two length at least deg(A)+deg(B)+1; too short leads to circular convolution errors.
Mixing moduli: NTT code is modulus-specific; using a different prime without updating primitive roots, powers, or inverses causes wrong answers.
Precision mismatch in multipoint evaluation/interpolation: product-tree levels must use exact polynomials; using floating FFT with rounding can mis-evaluate; prefer NTT or CRT if you need integers.
Off-by-one in derivative/integral: derivative reduces length by 1; integral increases by 1 with inv(i) factors; forgetting inverse factorial precomputation for many integrals is common.
Performance pitfalls: naive polynomial modulo (long division) is O(n^2); always switch to reversal + inverse for O(n \log n). Likewise, avoid rebuilding NTT twiddle factors inside inner loops.

Key Formulas

Cauchy Product

C_{k} = i = 0 \sum k a_{i} b_{k - i}

Explanation: The k-th coefficient of the product A(x)B(x) equals the sum over all pairs of indices that add to k. This is the definition of polynomial convolution.

FFT/NTT Multiplication Time

T (n) = O (n lo g n)

Explanation: Fast multiplication reduces from quadratic to near-linear time in the number of coefficients by transforming, multiplying point-wise, and inverse transforming.

Newton Inversion Update

G_{k + 1} = G_{k} (2 - F G_{k}) mod x^{2 m}

Explanation: Starting from the inverse at low precision, this iteration doubles the number of correct coefficients each time. It solves F G = 1 modulo higher powers of x.

Formal Logarithm

lo g F = \int \frac{F ^{'}}{F} d x

Explanation: The logarithm of a series with constant term 1 is obtained by dividing its derivative by itself and integrating. All operations are truncated to the needed degree.

Formal Exponential via Newton

G_{k + 1} = G_{k} (1 - lo g G_{k} + H) mod x^{2 m}

Explanation: To compute G = $exp$ (H), update G so that $lo g$ G matches H. Each step doubles precision and relies on fast multiply and log.

Series Square Root Newton Step

g_{k + 1} = \frac{1}{2} (g_{k} + \frac{F}{g _{k}}) mod x^{2 m}

Explanation: Analogous to scalar Newton iteration for square roots, this update converges quadratically to a series g with $g^{2} = F$ , assuming the constant term has a square root.

Quotient via Reversal

Q = rev_{n - m + 1} (A) \cdot (rev_{n - m + 1} (B))^{- 1} mod x^{n - m + 1}

Explanation: For A of degree n-1 and B of degree m-1, compute the high-to-low quotient by reversing, inverting, multiplying, then reversing back. The remainder is A - BQ.

Interpolation Denominator

P^{'} (x_{i}) = j \neq = i \prod (x_{i} - x_{j})

Explanation: If P(x)= $\prod$ (x - $x_{j}$ ), then evaluating its derivative at $x_{i}$ gives the denominator used to weight $y_{i}$ in fast interpolation.

Multipoint Evaluation Complexity

O ((n + m) lo g^{2} n)

Explanation: Using a product tree of polynomials and remainders, evaluation at m points costs quasi-linear time with an extra log factor for the tree depth.

Complexity Analysis

Fast polynomial multiplication with NTT runs in O(n

lo g

n) time and O(n) extra space once twiddle tables are prepared. Newton-based operations inherit this cost: inversion, logarithm, exponential, and square root all perform O(

lo g

n) iterations where the i-th iteration multiplies/cons at size roughly 2^i, summing to O(n

lo g

n). Division and modulo reduce to one inversion at size about n-m+1 plus a handful of multiplications, hence O(n

lo g

n) for polynomials with comparable size. The constants differ: exp and sqrt typically involve multiple multiplies per iteration and calls to log or inverse, often 3–6 convolutions per doubling step. Multipoint evaluation and interpolation use a product tree with O(

lo g

m) levels. Building the tree costs O(m

lo g

m) via repeated multiplications. Evaluation proceeds by computing remainders modulo each node’s polynomial, which at each level touches all coefficients and costs O(n

lo g

n) per level, yielding O((n + m)

lo g^{2}

m) in typical NTT-based implementations. Interpolation adds one derivative, one multipoint evaluation of that derivative, and a reconstruction pass up the tree combining child results with sibling polynomials, again O(m

lo g^{2}

m). Space usage is O(n) to O(n

lo g

n) depending on whether you store the entire tree and intermediate remainders; iterative implementations can reduce peak memory by discarding levels when not needed. In practice, care with trimming and reuse of buffers prevents superlinear memory blow-ups.

Code Examples

NTT-based convolution modulo 998244353

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // NTT for modulus 998244353 (primitive root 3)
5 struct NTT {
6     static const uint32_t MOD = 998244353;
7     static const uint32_t G = 3;
8     uint32_t addmod(uint32_t a, uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
9     uint32_t submod(uint32_t a, uint32_t b){ return a>=b? a-b: a+MOD-b; }
10     uint32_t mulmod(uint64_t a, uint64_t b){ return uint32_t((a*b)%MOD); }
11     uint32_t modpow(uint32_t a, uint64_t e){ uint64_t r=1, x=a; while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1;} return (uint32_t)r; }
12 
13     void ntt(vector<uint32_t>& a, bool invert){
14         int n = (int)a.size();
15         static vector<int> rev; static vector<uint32_t> roots{0,1};
16         if((int)rev.size()!=n){
17             int k = __builtin_ctz(n);
18             rev.assign(n,0);
19             for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1));
20         }
21         if((int)roots.size()<n){
22             int k = __builtin_ctz(roots.size());
23             roots.resize(n);
24             while((1<<k)<n){
25                 uint32_t z = modpow(G, (MOD-1)>>(k+1));
26                 for(int i=1<<(k-1); i<(1<<k); ++i){
27                     roots[2*i]=roots[i];
28                     roots[2*i+1]=mulmod(roots[i], z);
29                 }
30                 ++k;
31             }
32         }
33         for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
34         for(int len=1; len<n; len<<=1){
35             for(int i=0;i<n;i+=2*len){
36                 for(int j=0;j<len;j++){
37                     uint32_t u=a[i+j];
38                     uint32_t v=mulmod(a[i+j+len], roots[len+j]);
39                     a[i+j]=addmod(u,v);
40                     a[i+j+len]=submod(u,v);
41                 }
42             }
43         }
44         if(invert){
45             reverse(a.begin()+1,a.end());
46             uint32_t inv_n = modpow(n, MOD-2);
47             for(auto &x:a) x = mulmod(x, inv_n);
48         }
49     }
50 
51     vector<uint32_t> convolution(const vector<uint32_t>& a, const vector<uint32_t>& b){
52         if(a.empty()||b.empty()) return {};
53         size_t need = a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1;
54         vector<uint32_t> fa(a.begin(), a.end()), fb(b.begin(), b.end());
55         fa.resize(n); fb.resize(n);
56         ntt(fa,false); ntt(fb,false);
57         for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]);
58         ntt(fa,true); fa.resize(need);
59         return fa;
60     }
61 } ntt;
62 
63 int main(){
64     ios::sync_with_stdio(false); cin.tie(nullptr);
65     // Example: (1 + 2x + 3x^2) * (3 + 4x)
66     vector<uint32_t> A={1,2,3};
67     vector<uint32_t> B={3,4};
68     auto C = ntt.convolution(A,B);
69     // Print coefficients
70     for(size_t i=0;i<C.size();++i){
71         if(i) cout << ' ';
72         cout << C[i];
73     }
74     cout << "\n"; // Expected: [3, 10, 17, 12] modulo 998244353 same as integers
75     return 0;
76 }
77

This code implements the Number Theoretic Transform for modulus 998244353 and uses it to multiply two polynomials. It precomputes bit-reversal and roots of unity tables, performs forward transforms, multiplies point-wise, and applies the inverse transform. The output coefficients represent the product A(x)B(x) modulo the prime.

Time: O(n log n)Space: O(n)

Polynomial division and modulo via Newton inversion

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct NTT { // same as in Example 1 (shortened here by declaration)
5     static const uint32_t MOD = 998244353; static const uint32_t G = 3;
6     uint32_t addmod(uint32_t a,uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
7     uint32_t submod(uint32_t a,uint32_t b){ return a>=b? a-b: a+MOD-b; }
8     uint32_t mulmod(uint64_t a,uint64_t b){ return uint32_t((a*b)%MOD); }
9     uint32_t modpow(uint32_t a,uint64_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1;} return (uint32_t)r; }
10     void ntt(vector<uint32_t>& a,bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1};
11         if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1)); }
12         if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2*i]=roots[i]; roots[2*i+1]=mulmod(roots[i],z);} ++k; } }
13         for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
14         for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=mulmod(a[i+j+len], roots[len+j]); a[i+j]=addmod(u,v); a[i+j+len]=submod(u,v); } } }
15         if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(auto &x:a) x=mulmod(x,inv_n); }
16     }
17     vector<uint32_t> convolution(const vector<uint32_t>& a,const vector<uint32_t>& b){ if(a.empty()||b.empty()) return {}; size_t need=a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1; vector<uint32_t> fa(a.begin(),a.end()), fb(b.begin(),b.end()); fa.resize(n); fb.resize(n); ntt(fa,false); ntt(fb,false); for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]); ntt(fa,true); fa.resize(need); return fa; }
18 } ntt;
19 
20 using Poly = vector<uint32_t>;
21 
22 void trim(Poly &a){ while(!a.empty() && a.back()==0) a.pop_back(); }
23 
24 Poly add(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av= i<a.size()?a[i]:0; uint32_t bv= i<b.size()?b[i]:0; c[i]=ntt.addmod(av,bv);} trim(c); return c; }
25 Poly sub(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av= i<a.size()?a[i]:0; uint32_t bv= i<b.size()?b[i]:0; c[i]=ntt.submod(av,bv);} trim(c); return c; }
26 Poly mul(const Poly &a,const Poly &b){ return ntt.convolution(a,b); }
27 
28 Poly rev(Poly a, size_t n){ a.resize(n); reverse(a.begin(), a.end()); return a; }
29 
30 Poly inv(const Poly &f, size_t n){ // Newton inversion modulo x^n, require f[0] != 0
31     Poly g(1); g[0] = ntt.modpow(f[0], NTT::MOD-2);
32     size_t m=1;
33     while(m<n){ m<<=1; Poly f_cut(min(f.size(), m)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i];
34         Poly t = mul(mul(g,g), f_cut); t.resize(m);
35         // g <- g * (2 - f*g) mod x^m
36         Poly two_g = g; for(auto &x:two_g) x = ntt.addmod(x, x);
37         g.resize(m);
38         for(size_t i=0;i<m;i++){
39             uint32_t tg = i<t.size()? t[i] : 0;
40             g[i] = ntt.submod(two_g[i], tg);
41         }
42         g = mul(g, Poly(g.begin(), g.begin()+m)); // Mistake if used; correct below
43         // Correct update: g = g_old * (2 - f_cut * g_old)
44         // To avoid extra allocation, recompute properly
45         Poly fg = mul(f_cut, Poly(g.begin(), g.begin()+m)); fg.resize(m);
46         Poly one_minus_fg(m);
47         for(size_t i=0;i<m;i++) one_minus_fg[i] = ntt.submod(ntt.addmod(0, (i==0?2:0)), fg[i]);
48         g = mul(Poly(g.begin(), g.begin()+(m>>1)), one_minus_fg); // old g length m/2
49         g.resize(m);
50     }
51     g.resize(n); trim(g); return g;
52 }
53 
54 // A safer, clearer inversion (replacing the above block if desired)
55 Poly inv_clear(const Poly &f, size_t n){
56     Poly g(1); g[0] = ntt.modpow(f[0], NTT::MOD-2);
57     size_t m=1;
58     while(m<n){
59         size_t m2 = min(2*m, n);
60         Poly f_cut(min(f.size(), m2)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i];
61         Poly fg = mul(f_cut, g); fg.resize(m2);
62         // t = 2 - f*g
63         Poly t(m2);
64         t[0] = 2; for(size_t i=0;i<m2;i++) t[i] = ntt.submod(t[i], (i<fg.size()?fg[i]:0));
65         g = mul(g, t); g.resize(m2);
66         m = m2;
67     }
68     g.resize(n); trim(g); return g;
69 }
70 
71 pair<Poly,Poly> divmod(const Poly &A, const Poly &B){ // return {Q,R}
72     Poly a=A, b=B; trim(a); trim(b);
73     if(b.empty()) throw runtime_error("Division by zero polynomial");
74     int n = (int)a.size()-1, m=(int)b.size()-1;
75     if(n<m){ return {Poly{0}, a}; }
76     int k = n-m+1;
77     Poly ar = rev(a, n+1), br = rev(b, m+1);
78     Poly inv_br = inv_clear(br, k);
79     Poly qr = mul(ar, inv_br); qr.resize(k);
80     Poly q = rev(qr, k);
81     Poly r = sub(a, mul(b, q)); r.resize(m); trim(r);
82     trim(q);
83     return {q, r};
84 }
85 
86 int main(){
87     ios::sync_with_stdio(false); cin.tie(nullptr);
88     // Divide A by B: A = x^5 + 2x^3 + 3x + 4, B = x^2 + 1
89     Poly A={4,3,0,2,0,1};
90     Poly B={1,0,1};
91     auto [Q,R] = divmod(A,B);
92     auto print=[&](const Poly &P){ if(P.empty()){ cout<<0; } else { for(size_t i=0;i<P.size();++i){ if(i) cout<<' '; cout<<P[i]; } } cout<<"\n"; };
93     print(Q); // expected quotient roughly degree 3
94     print(R); // remainder degree < 2
95     return 0;
96 }
97

This code shows how to divide polynomials in near-linear time using reversal and inversion. The key step is computing the inverse of the reversed divisor modulo x^{n-m+1}, then reversing back to obtain the quotient; the remainder is A - BQ. The inv_clear function implements Newton inversion with doubling and correct truncation.

Time: O(n log n)Space: O(n)

Formal power series: log, exp, sqrt (Newton's method)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct NTT {
5     static const uint32_t MOD = 998244353; static const uint32_t G = 3;
6     uint32_t addmod(uint32_t a,uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
7     uint32_t submod(uint32_t a,uint32_t b){ return a>=b? a-b: a+MOD-b; }
8     uint32_t mulmod(uint64_t a,uint64_t b){ return uint32_t((a*b)%MOD); }
9     uint32_t modpow(uint32_t a,uint64_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1;} return (uint32_t)r; }
10     void ntt(vector<uint32_t>& a,bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1};
11         if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1)); }
12         if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2*i]=roots[i]; roots[2*i+1]=mulmod(roots[i],z);} ++k; } }
13         for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
14         for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=mulmod(a[i+j+len], roots[len+j]); a[i+j]=addmod(u,v); a[i+j+len]=submod(u,v); } } }
15         if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(auto &x:a) x=mulmod(x,inv_n); }
16     }
17     vector<uint32_t> convolution(const vector<uint32_t>& a,const vector<uint32_t>& b){ if(a.empty()||b.empty()) return {}; size_t need=a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1; vector<uint32_t> fa(a.begin(),a.end()), fb(b.begin(),b.end()); fa.resize(n); fb.resize(n); ntt(fa,false); ntt(fb,false); for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]); ntt(fa,true); fa.resize(need); return fa; }
18 } ntt;
19 using Poly = vector<uint32_t>;
20 void trim(Poly &a){ while(!a.empty() && a.back()==0) a.pop_back(); }
21 Poly mul(const Poly &a,const Poly &b){ return ntt.convolution(a,b); }
22 
23 Poly deriv(const Poly &a){ if(a.empty()) return {}; Poly d(max<int>(0,(int)a.size()-1)); for(size_t i=1;i<a.size();++i) d[i-1]= (uint64_t)a[i]*i % NTT::MOD; trim(d); return d; }
24 Poly integr(const Poly &a){ Poly I(a.size()+1); static vector<uint32_t> invs(1,0); // invs[k] = 1/k mod MOD
25     while(invs.size()<=a.size()){
26         size_t k=invs.size();
27         invs.push_back(NTT().modpow(k, NTT::MOD-2));
28     }
29     I[0]=0; for(size_t i=0;i<a.size();++i) I[i+1]= (uint64_t)a[i]*invs[i+1] % NTT::MOD; trim(I); return I; }
30 
31 Poly add(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av=i<a.size()?a[i]:0, bv=i<b.size()?b[i]:0; c[i]=ntt.addmod(av,bv);} trim(c); return c; }
32 Poly subP(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av=i<a.size()?a[i]:0, bv=i<b.size()?b[i]:0; c[i]=ntt.submod(av,bv);} trim(c); return c; }
33 
34 Poly inv_clear(const Poly &f, size_t n){ Poly g(1); g[0]=ntt.modpow(f[0], NTT::MOD-2); size_t m=1; while(m<n){ size_t m2=min(2*m,n); Poly f_cut(min(f.size(),m2)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i]; Poly fg=mul(f_cut,g); fg.resize(m2); Poly t(m2); t[0]=2; for(size_t i=0;i<m2;i++) t[i]=ntt.submod(t[i], (i<fg.size()?fg[i]:0)); g=mul(g,t); g.resize(m2); m=m2; } g.resize(n); trim(g); return g; }
35 
36 Poly log_series(const Poly &f, size_t n){ // require f[0]=1
37     Poly df = deriv(f);
38     Poly invf = inv_clear(f, n);
39     Poly q = mul(df, invf); q.resize(n-1);
40     Poly res = integr(q); res.resize(n); trim(res); return res;
41 }
42 
43 Poly exp_series(const Poly &h, size_t n){ // require h[0]=0
44     Poly g(1,1); size_t m=1;
45     while(m<n){ size_t m2=min(2*m,n);
46         Poly lg = log_series(g, m2);
47         Poly diff = subP(h, lg); diff[0] = ntt.addmod(diff[0], 1); // 1 - log(g) + h
48         g = mul(g, diff); g.resize(m2);
49         m = m2;
50     }
51     g.resize(n); trim(g); return g;
52 }
53 
54 Poly sqrt_series(const Poly &f, size_t n){ // require f[0] is a quadratic residue; take c0=sqrt(f0)
55     // For contest use, ensure f0=1 to avoid Tonelli-Shanks; then c0=1
56     uint32_t c0 = 1; // assume f[0]==1
57     Poly g(1, c0); uint32_t inv2 = ntt.modpow(2, NTT::MOD-2);
58     size_t m=1;
59     while(m<n){ size_t m2=min(2*m,n);
60         Poly invg = inv_clear(g, m2);
61         Poly t = mul(f, invg); t.resize(m2);
62         // g <- (g + t) / 2
63         if(g.size()<m2) g.resize(m2,0);
64         for(size_t i=0;i<m2;i++) g[i] = (uint64_t)( (i<g.size()?g[i]:0) + (i<t.size()?t[i]:0) ) * inv2 % NTT::MOD;
65         m = m2;
66     }
67     g.resize(n); trim(g); return g;
68 }
69 
70 int main(){
71     ios::sync_with_stdio(false); cin.tie(nullptr);
72     // f = 1 + 2x + 3x^2 + ... up to n
73     size_t n=8;
74     Poly f(n,0); f[0]=1; f[1]=2; f[2]=3; f[3]=4; // sample
75     Poly lg = log_series(f,n);
76     Poly ex = exp_series(Poly{0,1}, n); // exp(x)
77     Poly sq = sqrt_series(f,n); // assuming f[0]=1
78     auto print=[&](const Poly &P){ for(size_t i=0;i<P.size();++i){ if(i) cout<<' '; cout<<P[i]; } cout<<"\n"; };
79     print(lg);
80     print(ex);
81     print(sq);
82     return 0;
83 }
84

This example implements derivative, integral, and Newton-driven formal log, exp, and sqrt. It assumes modulus 998244353 and uses NTT-based multiplication. Preconditions: log requires f[0]=1; exp requires input h with h[0]=0; sqrt assumes f[0]=1 (so the modular square root is 1).

Time: O(n log n)Space: O(n)

Multipoint evaluation and interpolation using a product tree

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct NTT {
5     static const uint32_t MOD = 998244353; static const uint32_t G = 3;
6     uint32_t addmod(uint32_t a,uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
7     uint32_t submod(uint32_t a,uint32_t b){ return a>=b? a-b: a+MOD-b; }
8     uint32_t mulmod(uint64_t a,uint64_t b){ return uint32_t((a*b)%MOD); }
9     uint32_t modpow(uint32_t a,uint64_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=(r*x)%MOD; x=(x*x)%MOD; e>>=1;} return (uint32_t)r; }
10     void ntt(vector<uint32_t>& a,bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1};
11         if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1)); }
12         if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2*i]=roots[i]; roots[2*i+1]=mulmod(roots[i],z);} ++k; } }
13         for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
14         for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=mulmod(a[i+j+len], roots[len+j]); a[i+j]=addmod(u,v); a[i+j+len]=submod(u,v); } } }
15         if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(auto &x:a) x=mulmod(x,inv_n); }
16     }
17     vector<uint32_t> convolution(const vector<uint32_t>& a,const vector<uint32_t>& b){ if(a.empty()||b.empty()) return {}; size_t need=a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1; vector<uint32_t> fa(a.begin(),a.end()), fb(b.begin(),b.end()); fa.resize(n); fb.resize(n); ntt(fa,false); ntt(fb,false); for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]); ntt(fa,true); fa.resize(need); return fa; }
18 } ntt;
19 using Poly = vector<uint32_t>;
20 void trim(Poly &a){ while(!a.empty() && a.back()==0) a.pop_back(); }
21 Poly add(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av=i<a.size()?a[i]:0, bv=i<b.size()?b[i]:0; c[i]=ntt.addmod(av,bv);} trim(c); return c; }
22 Poly subP(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av=i<a.size()?a[i]:0, bv=i<b.size()?b[i]:0; c[i]=ntt.submod(av,bv);} trim(c); return c; }
23 Poly mul(const Poly &a,const Poly &b){ return ntt.convolution(a,b); }
24 
25 pair<Poly,Poly> divmod(const Poly &A, const Poly &B){ // O(n log n) division
26     auto rev=[&](Poly a,size_t n){ a.resize(n); reverse(a.begin(),a.end()); return a; };
27     auto inv=[&](const Poly &f, size_t n){ Poly g(1); g[0]=ntt.modpow(f[0], NTT::MOD-2); size_t m=1; while(m<n){ size_t m2=min(2*m,n); Poly f_cut(min(f.size(),m2)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i]; Poly fg=mul(f_cut,g); fg.resize(m2); Poly t(m2); t[0]=2; for(size_t i=0;i<m2;i++) t[i]=ntt.submod(t[i], (i<fg.size()?fg[i]:0)); g=mul(g,t); g.resize(m2); m=m2; } g.resize(n); trim(g); return g; };
28     Poly a=A,b=B; trim(a); trim(b); if(b.empty()) throw runtime_error("div by zero"); int n=(int)a.size()-1,m=(int)b.size()-1; if(n<m) return {Poly{0},a}; int k=n-m+1; Poly ar=rev(a,n+1), br=rev(b,m+1); Poly invbr=inv(br,k); Poly qr=mul(ar,invbr); qr.resize(k); Poly q=rev(qr,k); Poly r=subP(a,mul(b,q)); r.resize(m); trim(q); trim(r); return {q,r}; }
29 
30 // Build product tree of (x - x_i)
31 struct ProductTree {
32     vector<Poly> tree; int m;
33     ProductTree(const vector<uint32_t>& xs){ m=xs.size(); tree.resize(4*m); build(1,0,m,xs); }
34     void build(int p,int l,int r,const vector<uint32_t>& xs){ if(r-l==1){ tree[p] = Poly{ntt.submod(0,xs[l]), 1}; return; } int mid=(l+r)/2; build(p<<1,l,mid,xs); build(p<<1|1,mid,r,xs); tree[p]=mul(tree[p<<1], tree[p<<1|1]); }
35 };
36 
37 // Multipoint evaluation: compute P(x_i)
38 void eval_down(int p,int l,int r,const ProductTree &PT,const Poly &rem, vector<uint32_t>& ans){ if(r-l==1){ // rem is constant remainder modulo (x-x_i)
39         ans[l] = rem.empty()?0:rem[0]; return; }
40     int mid=(l+r)/2; auto [ql, rl] = divmod(rem, PT.tree[p<<1]); auto [qr, rr] = divmod(rem, PT.tree[p<<1|1]); // We only need remainders
41     eval_down(p<<1, l, mid, PT, rl, ans);
42     eval_down(p<<1|1, mid, r, PT, rr, ans);
43 }
44 
45 vector<uint32_t> multipoint_evaluate(const Poly &P, const vector<uint32_t>& xs){ ProductTree PT(xs); vector<uint32_t> ans(xs.size()); Poly R = P; // start with whole polynomial
46     // compute remainder modulo root polynomial to keep degree manageable
47     auto [q, r0] = divmod(P, PT.tree[1]); (void)q;
48     eval_down(1,0,xs.size(),PT,r0,ans);
49     return ans;
50 }
51 
52 // Interpolation: given (x_i, y_i), return degree < m polynomial Q
53 Poly interpolate(const vector<uint32_t>& xs, const vector<uint32_t>& ys){ int m=xs.size(); ProductTree PT(xs); // P = prod (x - x_i)
54     // Compute P'(x_i)
55     Poly P = PT.tree[1]; // product polynomial
56     // derivative
57     Poly Pd(max<int>(0,(int)P.size()-1)); for(size_t i=1;i<P.size();++i) Pd[i-1]= (uint64_t)P[i]*i % NTT::MOD;
58     // Evaluate Pd at xs
59     vector<uint32_t> dvals = multipoint_evaluate(Pd, xs);
60     // weights w_i = y_i / P'(x_i)
61     vector<uint32_t> w(m);
62     for(int i=0;i<m;i++){ uint32_t invd = ntt.modpow(dvals[i], NTT::MOD-2); w[i] = (uint64_t)ys[i]*invd % NTT::MOD; }
63     // Reconstruct via tree: bottom-up combine
64     function<Poly(int,int,int)> build = [&](int p,int l,int r)->Poly{
65         if(r-l==1){ return Poly{ w[l] }; }
66         int mid=(l+r)/2; Poly L = build(p<<1,l,mid); Poly R = build(p<<1|1,mid,r);
67         Poly resL = mul(L, PT.tree[p<<1|1]); // L * poly(right)
68         Poly resR = mul(R, PT.tree[p<<1]);   // R * poly(left)
69         return add(resL, resR);
70     };
71     Poly Q = build(1,0,m); Q.resize(m); trim(Q); return Q;
72 }
73 
74 int main(){
75     ios::sync_with_stdio(false); cin.tie(nullptr);
76     // Let's build a polynomial and evaluate/interpolate it
77     Poly P = {5, 2, 7, 1}; // 5 + 2x + 7x^2 + x^3
78     vector<uint32_t> xs = {1,2,3,4};
79     vector<uint32_t> ys = multipoint_evaluate(P, xs);
80     Poly Q = interpolate(xs, ys);
81     auto print=[&](const Poly &A){ for(size_t i=0;i<A.size();++i){ if(i) cout<<' '; cout<<A[i]; } cout<<"\n"; };
82     for(auto v: ys){ cout << v << ' '; } cout << "\n"; // values
83     print(Q); // should match P (up to degree < m)
84     return 0;
85 }
86

We build a product tree of (x − x_i), then compute remainders top-down to evaluate P at all x_i. For interpolation, we compute the product polynomial P(x), its derivative, the weights w_i = y_i / P'(x_i), and combine them via the tree: at each node, the left result is multiplied by the right polynomial and vice versa, then added. This yields the unique polynomial of degree < m passing through the given points.

Time: O(n log^2 n)Space: O(n log n)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// NTT for modulus 998244353 (primitive root 3)
5	struct NTT {
6	static const uint32_t MOD = 998244353;
7	static const uint32_t G = 3;
8	uint32_t addmod(uint32_t a, uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
9	uint32_t submod(uint32_t a, uint32_t b){ return a>=b? a-b: a+MOD-b; }
10	uint32_t mulmod(uint64_t a, uint64_t b){ return uint32_t((a*b)%MOD); }
11	uint32_t modpow(uint32_t a, uint64_t e){ uint64_t r=1, x=a; while(e){ if(e&1) r=(rx)%MOD; x=(xx)%MOD; e>>=1;} return (uint32_t)r; }
12
13	void ntt(vector<uint32_t>& a, bool invert){
14	int n = (int)a.size();
15	static vector<int> rev; static vector<uint32_t> roots{0,1};
16	if((int)rev.size()!=n){
17	int k = __builtin_ctz(n);
18	rev.assign(n,0);
19	for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)\|((i&1)<<(k-1));
20	}
21	if((int)roots.size()<n){
22	int k = __builtin_ctz(roots.size());
23	roots.resize(n);
24	while((1<<k)<n){
25	uint32_t z = modpow(G, (MOD-1)>>(k+1));
26	for(int i=1<<(k-1); i<(1<<k); ++i){
27	roots[2*i]=roots[i];
28	roots[2*i+1]=mulmod(roots[i], z);
29	}
30	++k;
31	}
32	}
33	for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
34	for(int len=1; len<n; len<<=1){
35	for(int i=0;i<n;i+=2*len){
36	for(int j=0;j<len;j++){
37	uint32_t u=a[i+j];
38	uint32_t v=mulmod(a[i+j+len], roots[len+j]);
39	a[i+j]=addmod(u,v);
40	a[i+j+len]=submod(u,v);
41	}
42	}
43	}
44	if(invert){
45	reverse(a.begin()+1,a.end());
46	uint32_t inv_n = modpow(n, MOD-2);
47	for(auto &x:a) x = mulmod(x, inv_n);
48	}
49	}
50
51	vector<uint32_t> convolution(const vector<uint32_t>& a, const vector<uint32_t>& b){
52	if(a.empty()\|\|b.empty()) return {};
53	size_t need = a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1;
54	vector<uint32_t> fa(a.begin(), a.end()), fb(b.begin(), b.end());
55	fa.resize(n); fb.resize(n);
56	ntt(fa,false); ntt(fb,false);
57	for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]);
58	ntt(fa,true); fa.resize(need);
59	return fa;
60	}
61	} ntt;
62
63	int main(){
64	ios::sync_with_stdio(false); cin.tie(nullptr);
65	// Example: (1 + 2x + 3x^2) * (3 + 4x)
66	vector<uint32_t> A={1,2,3};
67	vector<uint32_t> B={3,4};
68	auto C = ntt.convolution(A,B);
69	// Print coefficients
70	for(size_t i=0;i<C.size();++i){
71	if(i) cout << ' ';
72	cout << C[i];
73	}
74	cout << "\n"; // Expected: [3, 10, 17, 12] modulo 998244353 same as integers
75	return 0;
76	}
77

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	struct NTT { // same as in Example 1 (shortened here by declaration)
5	static const uint32_t MOD = 998244353; static const uint32_t G = 3;
6	uint32_t addmod(uint32_t a,uint32_t b){ uint32_t s=a+b; return s>=MOD? s-MOD:s; }
7	uint32_t submod(uint32_t a,uint32_t b){ return a>=b? a-b: a+MOD-b; }
8	uint32_t mulmod(uint64_t a,uint64_t b){ return uint32_t((a*b)%MOD); }
9	uint32_t modpow(uint32_t a,uint64_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=(rx)%MOD; x=(xx)%MOD; e>>=1;} return (uint32_t)r; }
10	void ntt(vector<uint32_t>& a,bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1};
11	if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)\|((i&1)<<(k-1)); }
12	if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2i]=roots[i]; roots[2i+1]=mulmod(roots[i],z);} ++k; } }
13	for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
14	for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=mulmod(a[i+j+len], roots[len+j]); a[i+j]=addmod(u,v); a[i+j+len]=submod(u,v); } } }
15	if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(auto &x:a) x=mulmod(x,inv_n); }
16	}
17	vector<uint32_t> convolution(const vector<uint32_t>& a,const vector<uint32_t>& b){ if(a.empty()\|\|b.empty()) return {}; size_t need=a.size()+b.size()-1; int n=1; while(n<(int)need) n<<=1; vector<uint32_t> fa(a.begin(),a.end()), fb(b.begin(),b.end()); fa.resize(n); fb.resize(n); ntt(fa,false); ntt(fb,false); for(int i=0;i<n;i++) fa[i]=mulmod(fa[i],fb[i]); ntt(fa,true); fa.resize(need); return fa; }
18	} ntt;
19
20	using Poly = vector<uint32_t>;
21
22	void trim(Poly &a){ while(!a.empty() && a.back()==0) a.pop_back(); }
23
24	Poly add(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av= i<a.size()?a[i]:0; uint32_t bv= i<b.size()?b[i]:0; c[i]=ntt.addmod(av,bv);} trim(c); return c; }
25	Poly sub(const Poly &a,const Poly &b){ Poly c(max(a.size(),b.size())); for(size_t i=0;i<c.size();++i){ uint32_t av= i<a.size()?a[i]:0; uint32_t bv= i<b.size()?b[i]:0; c[i]=ntt.submod(av,bv);} trim(c); return c; }
26	Poly mul(const Poly &a,const Poly &b){ return ntt.convolution(a,b); }
27
28	Poly rev(Poly a, size_t n){ a.resize(n); reverse(a.begin(), a.end()); return a; }
29
30	Poly inv(const Poly &f, size_t n){ // Newton inversion modulo x^n, require f[0] != 0
31	Poly g(1); g[0] = ntt.modpow(f[0], NTT::MOD-2);
32	size_t m=1;
33	while(m<n){ m<<=1; Poly f_cut(min(f.size(), m)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i];
34	Poly t = mul(mul(g,g), f_cut); t.resize(m);
35	// g <- g * (2 - f*g) mod x^m
36	Poly two_g = g; for(auto &x:two_g) x = ntt.addmod(x, x);
37	g.resize(m);
38	for(size_t i=0;i<m;i++){
39	uint32_t tg = i<t.size()? t[i] : 0;
40	g[i] = ntt.submod(two_g[i], tg);
41	}
42	g = mul(g, Poly(g.begin(), g.begin()+m)); // Mistake if used; correct below
43	// Correct update: g = g_old * (2 - f_cut * g_old)
44	// To avoid extra allocation, recompute properly
45	Poly fg = mul(f_cut, Poly(g.begin(), g.begin()+m)); fg.resize(m);
46	Poly one_minus_fg(m);
47	for(size_t i=0;i<m;i++) one_minus_fg[i] = ntt.submod(ntt.addmod(0, (i==0?2:0)), fg[i]);
48	g = mul(Poly(g.begin(), g.begin()+(m>>1)), one_minus_fg); // old g length m/2
49	g.resize(m);
50	}
51	g.resize(n); trim(g); return g;
52	}
53
54	// A safer, clearer inversion (replacing the above block if desired)
55	Poly inv_clear(const Poly &f, size_t n){
56	Poly g(1); g[0] = ntt.modpow(f[0], NTT::MOD-2);
57	size_t m=1;
58	while(m<n){
59	size_t m2 = min(2*m, n);
60	Poly f_cut(min(f.size(), m2)); for(size_t i=0;i<f_cut.size();++i) f_cut[i]=f[i];
61	Poly fg = mul(f_cut, g); fg.resize(m2);
62	// t = 2 - f*g
63	Poly t(m2);
64	t[0] = 2; for(size_t i=0;i<m2;i++) t[i] = ntt.submod(t[i], (i<fg.size()?fg[i]:0));
65	g = mul(g, t); g.resize(m2);
66	m = m2;
67	}
68	g.resize(n); trim(g); return g;
69	}
70
71	pair<Poly,Poly> divmod(const Poly &A, const Poly &B){ // return {Q,R}
72	Poly a=A, b=B; trim(a); trim(b);
73	if(b.empty()) throw runtime_error("Division by zero polynomial");
74	int n = (int)a.size()-1, m=(int)b.size()-1;
75	if(n<m){ return {Poly{0}, a}; }
76	int k = n-m+1;
77	Poly ar = rev(a, n+1), br = rev(b, m+1);
78	Poly inv_br = inv_clear(br, k);
79	Poly qr = mul(ar, inv_br); qr.resize(k);
80	Poly q = rev(qr, k);
81	Poly r = sub(a, mul(b, q)); r.resize(m); trim(r);
82	trim(q);
83	return {q, r};
84	}
85
86	int main(){
87	ios::sync_with_stdio(false); cin.tie(nullptr);
88	// Divide A by B: A = x^5 + 2x^3 + 3x + 4, B = x^2 + 1
89	Poly A={4,3,0,2,0,1};
90	Poly B={1,0,1};
91	auto [Q,R] = divmod(A,B);
92	auto print=[&](const Poly &P){ if(P.empty()){ cout<<0; } else { for(size_t i=0;i<P.size();++i){ if(i) cout<<' '; cout<<P[i]; } } cout<<"\n"; };
93	print(Q); // expected quotient roughly degree 3
94	print(R); // remainder degree < 2
95	return 0;
96	}
97