Berlekamp-Massey Algorithm
Key Points
- •Berlekamp–Massey (BM) finds the shortest linear recurrence that exactly fits a given sequence over a field (e.g., modulo a prime).
- •If the true recurrence has order k, then the first 2k terms are enough to recover it uniquely in O() time.
- •Once you know the recurrence, you can compute the n-th term in O( n) using Kitamasa’s method (polynomial reduction) or O( n) with matrix exponentiation.
- •BM maintains a “connection polynomial” and fixes it whenever it detects a discrepancy between prediction and actual sequence value.
- •The generating function of a linearly recurrent sequence is rational: S(x) = P(x)/Q(x) where Q encodes the recurrence.
- •In competitive programming, BM is ideal for “find the n-th term” when you can generate the first 2k terms but n is huge.
- •BM requires a field: use a prime modulus (like 998244353 or 10^9+7); do not use a composite modulus without extra care.
- •Careful indexing and signs are crucial: the returned coefficients typically satisfy s[n] = s[n-i] modulo MOD.
Prerequisites
- →Modular arithmetic over prime fields — BM requires inverses; working modulo a prime ensures every nonzero element has an inverse.
- →Polynomial representation and operations — BM and Kitamasa both manipulate polynomials (connection polynomial, reduction modulo Q).
- →Binary exponentiation (fast power) — Kitamasa uses exponentiation by squaring on polynomials.
- →Matrix exponentiation (optional) — Alternative way to get the n-th term using the companion matrix.
- →Sequence recurrences and generating functions — Understanding why linearly recurrent sequences have rational generating functions motivates the methods.
- →Chinese Remainder Theorem (CRT) — If you need integer coefficients, run BM under several primes and reconstruct integers via CRT.
Detailed Explanation
Tap terms for definitions01Overview
The Berlekamp–Massey algorithm is a classic method to reconstruct the shortest linear recurrence that generates a given sequence. Think of a linear recurrence as a rule that predicts each new term from a fixed number k of previous terms using constant coefficients. Famous examples include the Fibonacci numbers. BM takes the first n terms of a sequence (over a field, like integers modulo a prime) and outputs the minimal recurrence order k and its k coefficients. If the true order is k, then 2k terms suffice to recover it uniquely with high certainty. The algorithm runs in O(n^2) time, much faster than naive linear algebra approaches that might solve large linear systems. BM works by maintaining a polynomial that captures the relationship among terms. As it scans the sequence left-to-right, it computes a discrepancy—how far the current polynomial’s prediction is from the actual term—and corrects the polynomial if needed. This self-correcting process ensures the polynomial always remains the shortest one consistent with what we have seen. Once we know the recurrence, we can compute very large-index terms without iterating all previous values. Using Kitamasa’s algorithm (polynomial exponentiation modulo the characteristic polynomial), we can get the n-th term in O(k^2 log n). That makes BM a powerful tool in programming contests for “find the n-th term” problems where n can be astronomically large.
02Intuition & Analogies
Imagine trying to guess a recipe from the taste of a dish. Each spoonful (sequence term) gives you information. You suspect the chef uses a fixed set of k spices (the last k terms) with constant proportions (coefficients) to produce the next taste (the next term). Berlekamp–Massey is like a disciplined taster: it tries a current guess of the spice mix, takes a new spoonful, and checks if the guess predicts the taste. If not, it corrects the recipe using the size and direction of the error (the discrepancy). More concretely, keep a “rule card” (a polynomial) that says how to combine the last few terms to predict the next. As you move along the sequence, you test the rule card. If the prediction matches, great—your recipe is still good. If it fails, you update the rule card by blending it with a saved backup card from the last failure, shifting it to align with the current position, and scaling it to cancel the current error. Over time, you need to increase the number of spices (the order k) only when the mismatches are so serious that a shift-and-scale of the backup card is unavoidable. This is why BM ends up with the shortest possible rule: it only expands when absolutely necessary. Once the rule is known, predicting any far-future term is like repeatedly applying the chef’s exact proportions without tasting every intermediate spoonful. Kitamasa’s method lets you apply the rule in logarithmically many steps—like fast-forwarding a song by jumping through sections rather than listening to every second.
03Formal Definition
04When to Use
Use Berlekamp–Massey when you suspect a sequence is generated by a fixed linear recurrence and you can access its first several terms. Typical competitive programming scenarios include:
- Given a DP with linear transitions, precompute a few dozen terms and use BM to recover the recurrence; then answer “n-th term” queries for huge n.
- Sequences from linear recurrences masked by modular arithmetic, such as sums of exponentials or periodic convolutions; BM recovers the minimal polynomial.
- Cryptographic toy problems: recovering LFSR taps over GF(2) from output bits (BM over GF(2)).
- Problems where constructing the companion matrix is messy or large; BM + Kitamasa offers O(k^2 \log n) n-th term without explicit matrices. Choose BM when: (1) you can ensure a field (e.g., prime modulus), (2) you can compute at least 2k terms where k is the expected order, and (3) you want an algorithm simpler and faster than solving large linear systems. If recurrences are non-linear, time-varying, or the modulus is composite (no field), BM needs modifications or may fail.
⚠️Common Mistakes
- Using a composite modulus: BM relies on inverses; apply it over a field (e.g., a prime modulus) or use rationals/multiple primes + CRT.
- Wrong sign/indexing: If your connection polynomial is (C(x) = 1 + C_1 x + \cdots + C_L x^L), then the recurrence is typically (s_n = -\sum_{i=1}^{L} C_i s_{n-i}) (mod MOD). Many implementations return coefficients already sign-flipped. Be consistent across BM and n-th term code.
- Insufficient terms: If you feed fewer than 2k terms for a k-order recurrence, BM can return a smaller-order polynomial that fits the prefix but fails later.
- Forgetting modulo normalization: After additions/subtractions, bring results into ([0,\text{MOD})). Negative residues cause WA.
- Off-by-one in Kitamasa reduction: When reducing a 2k-degree product by the k-degree recurrence, double-check loop bounds and which indices receive additions.
- Overflow: Use 128-bit temporaries when MOD is near 1e9+7 to avoid overflow in products before modulo.
- Expecting BM to work on noisy/inexact data: It requires exact equality in a field. For integer sequences with large coefficients, consider working modulo several primes and reconstruct via CRT to avoid rational arithmetic.
Key Formulas
Linear Recurrence
Explanation: Each term is a linear combination of the previous k terms. The constants do not depend on n.
Characteristic Polynomial
Explanation: This polynomial encodes the recurrence; reducing polynomials modulo Q corresponds to applying the recurrence.
BM Discrepancy
Explanation: The discrepancy measures how well the current connection polynomial C(x) predicts . If it is zero, the current model fits up to n; otherwise, update C(x).
BM Update Rule
Explanation: Upon a nonzero discrepancy, shift and scale the backup polynomial B(x) to cancel the error at position n. If 2L n, also increase L and reset backup.
Rational Generating Function
Explanation: Any linearly recurrent sequence has a rational generating function where Q(x) is the characteristic polynomial. This fact underpins fast n-th term computation by polynomial reduction.
BM Complexity
Explanation: The time to process n terms is quadratic and the space is linear in n because we keep polynomials up to length O(n).
Kitamasa Complexity
Explanation: Computing from the first k terms and the recurrence reduces to O(log n) polynomial multiplications and reductions, each costing O().
Binary Exponentiation
Explanation: Raise a matrix or polynomial to the n-th power using repeated squaring with O(log n) multiplications.
Triangular Number (for intuition)
Explanation: Useful when reasoning about total inner-loop operations summing over increasing polynomial degrees; it grows quadratically.
Hankel Rank
Explanation: The rank of the Hankel matrix built from the sequence equals the order of the minimal recurrence, which is why 2k terms suffice to recover it uniquely.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 using int64 = long long; 5 const int64 MOD = 998244353; // a prime modulus (field) 6 7 int64 norm(int64 x){ x %= MOD; if(x < 0) x += MOD; return x; } 8 int64 addm(int64 a, int64 b){ a += b; if(a >= MOD) a -= MOD; return a; } 9 int64 subm(int64 a, int64 b){ a -= b; if(a < 0) a += MOD; return a; } 10 int64 mulm(int64 a, int64 b){ return (int64)((__int128)a * b % MOD); } 11 int64 modpow(int64 a, int64 e){ int64 r=1; while(e){ if(e&1) r=mulm(r,a); a=mulm(a,a); e>>=1; } return r; } 12 int64 invm(int64 a){ return modpow(norm(a), MOD-2); } 13 14 // Returns the minimal recurrence coefficients c[0..k-1] such that 15 // s[n] = sum_{i=1..k} c[i-1] * s[n-i] (mod MOD) 16 vector<int64> berlekamp_massey(const vector<int64>& s){ 17 vector<int64> C = {1}, B = {1}; // connection and backup polynomials 18 int64 L = 0, m = 1; // current order and shift counter 19 int64 b = 1; // last nonzero discrepancy for B 20 for(size_t n = 0; n < s.size(); ++n){ 21 // Compute discrepancy d = s[n] + sum_{i=1..L} C[i]*s[n-i] 22 int64 d = 0; 23 for(int i = 0; i <= (int)L; ++i){ 24 int64 Ci = (i < (int)C.size() ? C[i] : 0); 25 d = addm(d, mulm(Ci, s[n - i])); 26 } 27 if(d == 0){ 28 // No change needed; just increase shift m 29 ++m; 30 continue; 31 } 32 // We need to adjust C: C <- C - (d/b) * x^m * B 33 vector<int64> T = C; // keep old C 34 int64 coef = mulm(d, invm(b)); 35 // Ensure C has enough size for shift by m 36 if(C.size() < B.size() + m) C.resize(B.size() + m, 0); 37 for(size_t i = 0; i < B.size(); ++i){ 38 C[i + m] = subm(C[i + m], mulm(coef, B[i])); 39 } 40 if(2*L <= (int)n){ 41 // Significant update: increase order 42 L = (int)n + 1 - L; 43 B = T; 44 b = d; 45 m = 1; 46 }else{ 47 // Minor correction; keep L 48 ++m; 49 } 50 } 51 // Convert connection polynomial C(x) = 1 + C1 x + ... + CL x^L 52 // into recurrence s[n] = sum c[i]*s[n-1-i], where c[i] = -C[i+1] 53 vector<int64> rec(L, 0); 54 for(int i = 0; i < (int)L; ++i){ 55 int64 val = norm(- (i+1 < (int)C.size() ? C[i+1] : 0)); 56 rec[i] = val; 57 } 58 return rec; 59 } 60 61 int main(){ 62 ios::sync_with_stdio(false); 63 cin.tie(nullptr); 64 65 int n; cin >> n; // number of known terms 66 vector<int64> s(n); 67 for(int i = 0; i < n; ++i){ cin >> s[i]; s[i] = norm(s[i]); } 68 69 vector<int64> rec = berlekamp_massey(s); 70 71 cout << rec.size() << "\n"; // order k 72 for(size_t i = 0; i < rec.size(); ++i){ 73 if(i) cout << ' '; 74 cout << rec[i]; 75 } 76 cout << "\n"; 77 return 0; 78 } 79
Reads the first n terms of a sequence modulo a prime and returns the minimal linear recurrence coefficients. The BM implementation maintains and updates the connection polynomial using discrepancy corrections. The output coefficients satisfy s[n] = sum_{i=1..k} rec[i-1] * s[n-i] (mod MOD).
1 #include <bits/stdc++.h> 2 using namespace std; 3 using int64 = long long; 4 const int64 MOD = 1000000007; // prime modulus 5 6 int64 norm(int64 x){ x%=MOD; if(x<0) x+=MOD; return x; } 7 int64 addm(int64 a,int64 b){ a+=b; if(a>=MOD) a-=MOD; return a; } 8 int64 mulm(int64 a,int64 b){ return (int64)((__int128)a*b % MOD); } 9 10 // Given recurrence s[n] = sum_{i=1..k} c[i-1]*s[n-i] 11 // and initial values s[0..k-1], compute s[n]. 12 // Uses Kitamasa: exponentiate polynomial x modulo Q(x)=1 - c1 x - ... - ck x^k 13 14 // Multiply two coefficient vectors A and B (length k), representing polynomials 15 // f(x) = sum A[i] x^i (i<k), g(x)=sum B[i] x^i, then reduce modulo Q to degree<k. 16 vector<int64> combine(const vector<int64>& A, const vector<int64>& B, const vector<int64>& rec){ 17 int k = (int)rec.size(); 18 vector<int64> res(2*k, 0); 19 // convolution 20 for(int i=0;i<k;++i) if(A[i]){ 21 for(int j=0;j<k;++j) if(B[j]){ 22 res[i+j] = addm(res[i+j], mulm(A[i], B[j])); 23 } 24 } 25 // reduce high degrees using x^k = c1 x^{k-1} + ... + ck (from Q(x)=1 - c1 x - ... - ck x^k) 26 for(int d=2*k-1; d>=k; --d){ 27 if(res[d]==0) continue; 28 int64 coef = res[d]; 29 // res[d - k + j] += coef * rec[j] for j=0..k-1 30 for(int j=1; j<=k; ++j){ 31 // x^d -> x^{d-j} with multiplier rec[j-1] 32 res[d - j] = addm(res[d - j], mulm(coef, rec[j-1])); 33 } 34 } 35 res.resize(k); 36 return res; 37 } 38 39 // Compute coefficients vector P such that s[n] = sum_{i=0..k-1} P[i]*s[i] 40 vector<int64> kitamasa_coeffs(long long n, const vector<int64>& rec){ 41 int k = (int)rec.size(); 42 // Base: x^0 mod Q -> polynomial 1, represents s[t] = s[t] 43 vector<int64> pol(k,0); pol[0]=1; // current answer 44 // Base power: x^1 mod Q is represented by shift vector E: s[t+1] expressed via s[t..t-k+1] 45 vector<int64> base(k,0); // represents x mod Q as linear form on first k terms 46 // x * s[t] corresponds to s[t+1]. Under recurrence, s[k] = sum rec[j]*s[k-1-j] 47 // The vector for x is [0,1,0,0,...] before reduction; reduced, it becomes: 48 // shift right: base[1]=1, base[2]=0,..., base[k-1]=0, base[0]=rec[0], base[1]+=rec[1], ... folded appropriately. 49 // Easier way: define base as the companion relation: s[n] = sum rec[i]*s[n-1-i] 50 // So x acting on [s[0..k-1]] produces [s[1..k-1], s[k]] -> coefficients for s[k] are rec. 51 // Therefore, base = [0,0,...,0] with base[0..k-2]=unit shift, handled by combine. 52 // We'll build base by noting that x corresponds to polynomial with coefficient 0.. except position 1 is 1. 53 // Then combine() reduction will inject rec automatically via the reduction step above. 54 base[1%k] = 1; // start with x 55 56 long long e = n; 57 while(e > 0){ 58 if(e & 1LL) pol = combine(pol, base, rec); 59 e >>= 1LL; 60 if(e) base = combine(base, base, rec); 61 } 62 return pol; 63 } 64 65 int64 nth_term(long long n, const vector<int64>& init, const vector<int64>& rec){ 66 int k = (int)rec.size(); 67 if(n < (long long)init.size()) return norm(init[(int)n]); 68 vector<int64> P = kitamasa_coeffs(n, rec); 69 int64 ans = 0; 70 for(int i=0;i<k;++i) ans = addm(ans, mulm(P[i], init[i])); 71 return ans; 72 } 73 74 int main(){ 75 ios::sync_with_stdio(false); 76 cin.tie(nullptr); 77 78 // Example: Fibonacci with init [F0=0, F1=1], rec: F_n = F_{n-1} + F_{n-2} 79 vector<int64> init = {0,1}; 80 vector<int64> rec = {1,1}; // c1=1, c2=1 81 82 long long qn; cin >> qn; // query index n 83 cout << nth_term(qn, init, rec) << "\n"; 84 return 0; 85 } 86
Implements Kitamasa’s method to compute the n-th term from the first k initial terms and the k recurrence coefficients. It avoids matrices by exponentiating the polynomial x modulo the characteristic polynomial Q, yielding coefficients P such that s[n] = Σ P[i] s[i].
1 #include <bits/stdc++.h> 2 using namespace std; 3 using int64 = long long; 4 const int64 MOD = 998244353; // prime modulus 5 6 int64 norm(int64 x){ x%=MOD; if(x<0) x+=MOD; return x; } 7 int64 addm(int64 a,int64 b){ a+=b; if(a>=MOD) a-=MOD; return a; } 8 int64 subm(int64 a,int64 b){ a-=b; if(a<0) a+=MOD; return a; } 9 int64 mulm(int64 a,int64 b){ return (int64)((__int128)a*b % MOD); } 10 int64 modpow(int64 a, int64 e){ int64 r=1; while(e){ if(e&1) r=mulm(r,a); a=mulm(a,a); e>>=1; } return r; } 11 int64 invm(int64 a){ return modpow(norm(a), MOD-2); } 12 13 vector<int64> berlekamp_massey(const vector<int64>& s){ 14 vector<int64> C = {1}, B = {1}; 15 int64 L = 0, m = 1, b = 1; 16 for(size_t n=0;n<s.size();++n){ 17 int64 d = 0; 18 for(int i=0;i<= (int)L; ++i){ 19 d = addm(d, mulm((i<(int)C.size()?C[i]:0), s[n - i])); 20 } 21 if(d==0){ ++m; continue; } 22 vector<int64> T = C; 23 int64 coef = mulm(d, invm(b)); 24 if(C.size() < B.size() + m) C.resize(B.size()+m, 0); 25 for(size_t i=0;i<B.size();++i) C[i+m] = subm(C[i+m], mulm(coef, B[i])); 26 if(2*L <= (int)n){ L = (int)n + 1 - L; B = T; b = d; m = 1; } 27 else { ++m; } 28 } 29 vector<int64> rec(L,0); 30 for(int i=0;i<(int)L;++i) rec[i] = norm(- (i+1 < (int)C.size() ? C[i+1] : 0)); 31 return rec; 32 } 33 34 vector<int64> combine(const vector<int64>& A, const vector<int64>& B, const vector<int64>& rec){ 35 int k = (int)rec.size(); 36 vector<int64> res(2*k, 0); 37 for(int i=0;i<k;++i) if(A[i]){ 38 for(int j=0;j<k;++j) if(B[j]){ 39 res[i+j] = addm(res[i+j], mulm(A[i], B[j])); 40 } 41 } 42 for(int d=2*k-1; d>=k; --d){ 43 if(res[d]==0) continue; 44 int64 coef = res[d]; 45 for(int j=1;j<=k;++j){ 46 res[d-j] = addm(res[d-j], mulm(coef, rec[j-1])); 47 } 48 } 49 res.resize(k); 50 return res; 51 } 52 53 vector<int64> kitamasa_coeffs(long long n, const vector<int64>& rec){ 54 int k = (int)rec.size(); 55 vector<int64> pol(k,0); pol[0]=1; // answer accumulator (x^0) 56 vector<int64> base(k,0); base[1%k]=1; // represents x 57 long long e = n; 58 while(e>0){ 59 if(e&1LL) pol = combine(pol, base, rec); 60 e >>= 1LL; 61 if(e) base = combine(base, base, rec); 62 } 63 return pol; 64 } 65 66 int64 nth_term(long long n, const vector<int64>& init, const vector<int64>& rec){ 67 int k = (int)rec.size(); 68 if(n < (long long)init.size()) return norm(init[(int)n]); 69 vector<int64> P = kitamasa_coeffs(n, rec); 70 int64 ans = 0; 71 for(int i=0;i<k;++i) ans = addm(ans, mulm(P[i], init[i])); 72 return ans; 73 } 74 75 int main(){ 76 ios::sync_with_stdio(false); 77 cin.tie(nullptr); 78 79 int m; cin >> m; // number of known terms 80 vector<int64> s(m); 81 for(int i=0;i<m;++i){ cin >> s[i]; s[i] = norm(s[i]); } 82 83 // Recover minimal recurrence 84 vector<int64> rec = berlekamp_massey(s); 85 int k = (int)rec.size(); 86 87 // Initial vector: first k terms are needed for Kitamasa 88 vector<int64> init = s; 89 if((int)init.size() < k){ 90 // pad if pathological (e.g., zero sequence); safe because recurrence fits prefix 91 init.resize(k, 0); 92 } 93 94 int q; cin >> q; // number of queries 95 while(q--){ 96 long long n; cin >> n; // 0-indexed term index 97 cout << nth_term(n, init, rec) << '\n'; 98 } 99 return 0; 100 } 101
This program reads the first m terms, reconstructs the minimal recurrence with BM, and then answers multiple large-index queries using Kitamasa’s method. It assumes the sequence truly follows a linear recurrence modulo MOD.