Aho-Corasick - DP Applications
Key Points
- •Aho–Corasick (AC) turns a set of forbidden patterns into a finite automaton that lets you process or generate strings while tracking whether any pattern appears.
- •Dynamic programming (DP) over AC states counts how many length-n strings avoid (or contain) those patterns by following automaton transitions.
- •Use a state that includes the current AC node and, optionally, a flag for whether a pattern has been matched so far.
- •For very large n, compress safe (non-terminal) states into a transition matrix and use fast matrix exponentiation to jump n steps in O( log n).
- •You can count strings that contain at least one forbidden substring either by a 2-layer DP or by complementing minus the count of avoiders.
- •Treat any terminal (output) node as a bad or absorbing state so the DP does not continue from it when counting avoiders.
- •The AC automaton is built from a trie with failure links; filling missing transitions via failure links makes DP transitions O(1) per character.
- •Applications include counting forbidden substrings, constrained string construction per position, and computing answers modulo a large prime.
Prerequisites
- →Trie data structure — AC builds on a trie of patterns; understanding insertion and prefix paths is essential.
- →KMP failure function — Failure links in AC generalize the KMP idea to multiple patterns; intuition carries over.
- →Dynamic programming — The counting is a DP over automaton states across positions.
- →Matrix exponentiation — For large n, we exponentiate the transition matrix to jump n steps efficiently.
- →Modular arithmetic — Counts are large; results are typically computed modulo a prime.
- →Graph walks and adjacency matrices — DP over automata is equivalent to counting fixed-length walks in a directed multigraph.
- →Time and space complexity analysis — To choose between iterative DP and matrix methods depending on n and S.
Detailed Explanation
Tap terms for definitions01Overview
The Aho–Corasick automaton is a data structure that efficiently recognizes multiple patterns in a text. In competitive programming, a powerful application is dynamic programming (DP) over automaton states to count how many strings of a given length avoid (or contain) any pattern from a forbidden set. We construct a trie of all forbidden patterns, compute failure links (like KMP for multiple patterns), and mark every node that represents reaching some pattern (directly or via suffix links). Once we have the automaton, every character extension from a state corresponds to a deterministic next state. This turns counting problems about strings into graph-walk counting problems over automaton states. When n is moderate, we can do standard DP in O(n · S · σ), where S is the number of states and σ is the alphabet size. For very large n (e.g., up to 10^18), we compress the DP into a matrix M of dimension equal to the number of non-terminal ("safe") states (plus one aggregator state if we also track the "already matched" case) and compute M^n with fast exponentiation. This technique answers questions like “How many strings of length n over an alphabet of size k avoid these patterns?” and “How many contain at least one forbidden pattern?” The automaton encapsulates all overlap interactions among patterns, so the DP just follows edges, avoiding re-checking substrings.
02Intuition & Analogies
Imagine you are walking through a city where each intersection represents “how much of a forbidden pattern suffix you currently hold” as you build a string, one character at a time. Each road is labeled by a character; taking that road means appending that character. If you ever reach a “danger zone” intersection (a terminal node), that means a forbidden pattern has just appeared in your string. The Aho–Corasick automaton is the city map: it combines a trie (roads that follow exact prefixes) with failure links (back alleys) that tell you where to go if your next character doesn’t continue the current prefix, jumping to the longest valid suffix that matches a prefix of some pattern. This ensures that no matter what character you add, you always know your next intersection in O(1) time. DP over this city map simply counts how many ways to walk exactly n steps without ever stepping into danger zones. If you need to count strings that do hit a danger zone at least once, you can carry an extra light that flips on the first time you enter a danger zone; from then on, it stays on. The state is then (intersection, light on/off). For very long walks (huge n), rather than taking one step at a time, we precompute how many ways you can go from any intersection to any other in 1 step (a transition matrix). Then we fast-forward n steps by squaring this matrix repeatedly (matrix exponentiation), like leaping through time instead of walking. The sum of ways that end in safe intersections is the count of avoiding strings; the complement to k^n gives hitting strings.
03Formal Definition
04When to Use
Use AC + DP when you must count or construct strings under substring constraints that involve multiple patterns simultaneously, especially when patterns overlap in complex ways. Typical scenarios: (1) Count strings of length n over an alphabet Σ that avoid a set of forbidden substrings P (modulo a prime). (2) Count strings that contain at least one forbidden pattern; you can either do a two-layer DP with a matched flag or take the complement of k^n minus avoiders. (3) Position-constrained construction: at each position i, only a subset Σ_i ⊆ Σ is allowed; DP naturally handles this by only taking allowed transitions per position. (4) Very large n (e.g., 10^12): compress Safe states into a transition matrix and use fast matrix exponentiation to compute M^n. (5) Weighted counts: when letters or transitions carry weights, you can sum weights instead of counts. (6) Enumerating or sampling: while DP counts, parent pointers can reconstruct one valid string if the total count is non-zero. Avoid using naive inclusion–exclusion on patterns unless the pattern set is tiny and non-overlapping; AC encodes overlaps automatically and scales with the total pattern length. When the alphabet is small to medium and the sum of pattern lengths is manageable (few thousands), AC DP is usually the intended solution in CF 2100–2500 problems.
⚠️Common Mistakes
• Not propagating terminal status along failure links: after building the trie and failure links, every node should inherit the output status from its failure ancestor. Missing this leads to undercounting forbidden hits. • Forgetting to complete transitions: δ must be total. During build, fill next[u][c] = next[link[u]][c] when the trie edge is absent; otherwise DP needs expensive while-loops. • Counting through terminal states: when counting avoiders, never continue DP from nodes in F. Either skip transitions that land in F or compress Safe states only. • Off-by-one in matrix orientation: decide whether you use row or column vectors; ensure M_{i,j} correctly represents transitions from j to i (or vice versa). Test with tiny n to validate. • Modulo errors: apply modulo after every addition and multiplication; use 64-bit temporaries to avoid overflow. • Oversized matrices: for matrix exponentiation, compress to Safe states and consider an aggregated “matched” state instead of full 2S to keep dimensions small. • Wrong alphabet mapping: ensure characters map to 0..k-1 consistently; mixing ‘a’..‘z’ with custom alphabets causes bad transitions. • Assuming DP and matrix answers differ: they should match. Verify with small n and brute force (k small) to catch implementation bugs.
Key Formulas
AC State Bound
Explanation: The number of AC states is at most the total number of trie nodes, which is one plus the sum of pattern lengths. This bounds DP and matrix dimensions.
DP Recurrence for Avoiders
Explanation: To count strings of length t+1 ending at safe state v, sum over all safe states u and letters a that transition to v while staying safe. It encodes one-step extensions.
Total Avoiders
Explanation: The total number of length-n strings that avoid the patterns equals the sum of DP counts over all safe ending states.
Matrix Form
Explanation: If is the column vector of counts over safe states, and M is the safe-state transition matrix, then applying M n times gives counts after n steps.
Matrix Entries
Explanation: Entry counts how many letters take you from safe state j to safe state i. This ensures integer matrix powers count walks.
Complement Count
Explanation: The number of strings that contain at least one forbidden pattern equals all possible strings minus the number that avoid every pattern.
Aggregator Matrix
Explanation: An (S+1)×(S+1) matrix where the last row/column represents the absorbing 'matched' aggregator with a k-fold self-loop. Transitions from safe to forbidden route into the aggregator.
Time Complexities
Explanation: Iterative DP scales linearly with n, states S, and alphabet size k. Matrix exponentiation trades n for a cubic dependence on S but only logarithmic in n.
Geometric Series (for checks)
Explanation: Useful to sanity-check growth or aggregator transitions. It sums the number of strings across lengths when needed.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Count strings of length n over alphabet 'a'..('a'+K-1) 5 // that avoid a set of forbidden patterns (and also how many contain at least one). 6 // DP state includes: (AC node, matchedFlag). 7 // Complexity: O(n * S * K) 8 9 struct AhoCorasick { 10 int K; // alphabet size 11 vector<vector<int>> next; // transitions (filled to be total) 12 vector<int> link; // failure links 13 vector<int> out; // 1 if this node (or suffix) matches any pattern 14 15 AhoCorasick(int K): K(K) { 16 add_node(); // root = 0 17 } 18 19 int add_node() { 20 next.push_back(vector<int>(K, -1)); 21 link.push_back(0); 22 out.push_back(0); 23 return (int)next.size()-1; 24 } 25 26 void insert(const string &s) { 27 int v = 0; 28 for (char ch : s) { 29 int c = ch - 'a'; 30 if (next[v][c] == -1) next[v][c] = add_node(); 31 v = next[v][c]; 32 } 33 out[v] = 1; // mark terminal 34 } 35 36 void build() { 37 queue<int> q; 38 // Initialize root transitions 39 for (int c = 0; c < K; ++c) { 40 int u = next[0][c]; 41 if (u != -1) { 42 link[u] = 0; 43 q.push(u); 44 } else { 45 next[0][c] = 0; // complete missing transitions to root 46 } 47 } 48 // BFS 49 while (!q.empty()) { 50 int v = q.front(); q.pop(); 51 out[v] = out[v] || out[link[v]]; // inherit terminality 52 for (int c = 0; c < K; ++c) { 53 int u = next[v][c]; 54 if (u != -1) { 55 link[u] = next[link[v]][c]; 56 q.push(u); 57 } else { 58 next[v][c] = next[link[v]][c]; // fill to total function 59 } 60 } 61 } 62 } 63 }; 64 65 int main() { 66 ios::sync_with_stdio(false); 67 cin.tie(nullptr); 68 69 const long long MOD = 1000000007LL; 70 71 // Example setup 72 int K = 2; // alphabet = {'a','b'} 73 vector<string> patterns = {"ab", "baa"}; 74 long long n = 10; // length 75 76 AhoCorasick ac(K); 77 for (auto &p : patterns) ac.insert(p); 78 ac.build(); 79 80 int S = (int)ac.next.size(); 81 82 // dp0[u] = count of length-t strings ending at node u with no match so far 83 // dp1[u] = count of length-t strings ending at node u with at least one match so far 84 vector<long long> dp0(S, 0), dp1(S, 0), ndp0(S, 0), ndp1(S, 0); 85 dp0[0] = 1; // start at root, length 0 86 87 for (long long t = 0; t < n; ++t) { 88 fill(ndp0.begin(), ndp0.end(), 0); 89 fill(ndp1.begin(), ndp1.end(), 0); 90 for (int u = 0; u < S; ++u) { 91 long long a0 = dp0[u]; 92 long long a1 = dp1[u]; 93 if ((a0 | a1) == 0) continue; 94 for (int c = 0; c < K; ++c) { 95 int v = ac.next[u][c]; 96 if (!ac.out[v]) { 97 // stay unmatched if we were unmatched 98 ndp0[v] = (ndp0[v] + a0) % MOD; 99 } else { 100 // crossing into terminal by this char turns matched on 101 ndp1[v] = (ndp1[v] + a0) % MOD; 102 } 103 // once matched, always matched 104 ndp1[v] = (ndp1[v] + a1) % MOD; 105 } 106 } 107 dp0.swap(ndp0); 108 dp1.swap(ndp1); 109 } 110 111 long long avoid = 0, matched = 0, total0 = 1; 112 for (int u = 0; u < S; ++u) if (!ac.out[u]) avoid = (avoid + dp0[u]) % MOD; 113 for (int u = 0; u < S; ++u) matched = (matched + dp1[u]) % MOD; 114 for (long long i = 0; i < n; ++i) total0 = (total0 * K) % MOD; // total strings k^n 115 116 cout << "Avoiders: " << avoid << "\n"; 117 cout << "Contain >=1 forbidden: " << matched << "\n"; 118 cout << "Check k^n: " << total0 << " and avoid+match mod M = " << (avoid + matched) % MOD << "\n"; 119 120 return 0; 121 } 122
We build the AC automaton over alphabet 'a'..('a'+K-1), fill all missing transitions via failure links, and propagate terminal flags. The DP uses two layers: dp0 (have not matched any forbidden pattern yet) and dp1 (have matched at least once). On each extension by a character, if the next node is terminal, unmatched transitions move into the matched layer. The matched layer is closed under transitions. Summing dp0 over safe states gives the count of avoiding strings; summing dp1 over all states gives the count of strings that contain at least one forbidden substring. We also verify avoiders + matched = k^n modulo MOD.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Build AC, compress to safe states, add one aggregator state for "already matched", 5 // and compute counts for length n via matrix exponentiation. 6 7 struct AhoCorasick { 8 int K; 9 vector<vector<int>> next; 10 vector<int> link; 11 vector<int> out; 12 AhoCorasick(int K): K(K) { add_node(); } 13 int add_node(){ next.push_back(vector<int>(K,-1)); link.push_back(0); out.push_back(0); return (int)next.size()-1; } 14 void insert(const string &s){ int v=0; for(char ch: s){ int c=ch-'a'; if(next[v][c]==-1) next[v][c]=add_node(); v=next[v][c]; } out[v]=1; } 15 void build(){ queue<int> q; for(int c=0;c<K;c++){ int u=next[0][c]; if(u!=-1){ link[u]=0; q.push(u);} else next[0][c]=0; } while(!q.empty()){ int v=q.front(); q.pop(); out[v]|=out[link[v]]; for(int c=0;c<K;c++){ int u=next[v][c]; if(u!=-1){ link[u]=next[link[v]][c]; q.push(u);} else next[v][c]=next[link[v]][c]; } } } 16 }; 17 18 struct Matrix { 19 int n; long long MOD; vector<vector<long long>> a; // a[i][j] 20 Matrix(int n=0, long long MOD=1000000007LL): n(n), MOD(MOD), a(n, vector<long long>(n,0)) {} 21 static Matrix identity(int n, long long MOD){ Matrix I(n, MOD); for(int i=0;i<n;i++) I.a[i][i]=1; return I; } 22 Matrix operator*(const Matrix& o) const { 23 Matrix r(n, MOD); 24 // naive O(n^3); consider optimizing if sparse 25 for(int i=0;i<n;i++){ 26 for(int k=0;k<n;k++) if(a[i][k]){ 27 long long aik = a[i][k]; 28 for(int j=0;j<n;j++) if(o.a[k][j]){ 29 r.a[i][j] = (r.a[i][j] + aik * o.a[k][j]) % MOD; 30 } 31 } 32 } 33 return r; 34 } 35 }; 36 37 Matrix mpow(Matrix base, unsigned long long e){ 38 Matrix res = Matrix::identity(base.n, base.MOD); 39 while(e){ 40 if(e & 1ULL) res = res * base; 41 base = base * base; 42 e >>= 1ULL; 43 } 44 return res; 45 } 46 47 int main(){ 48 ios::sync_with_stdio(false); 49 cin.tie(nullptr); 50 51 const long long MOD = 1000000007LL; 52 int K = 3; // alphabet {'a','b','c'} 53 vector<string> patterns = {"ab", "ba", "caa"}; 54 unsigned long long n = 1000000000000ULL; // 1e12 55 56 AhoCorasick ac(K); 57 for (auto &p: patterns) ac.insert(p); 58 ac.build(); 59 60 int N = (int)ac.next.size(); 61 vector<int> safe_idx(N, -1); 62 vector<int> id2node; id2node.reserve(N); 63 for(int u=0; u<N; ++u){ if(!ac.out[u]){ safe_idx[u] = (int)id2node.size(); id2node.push_back(u); } } 64 int S = (int)id2node.size(); 65 66 // Dimension: S safe states + 1 aggregator (last index) 67 int D = S + 1, AGG = S; 68 Matrix M(D, MOD); 69 70 // Column-vector convention: v_next = M * v_curr 71 // Fill transitions from safe states 72 for(int j=0; j<S; ++j){ 73 int u = id2node[j]; 74 for(int c=0;c<K;c++){ 75 int v = ac.next[u][c]; 76 if(!ac.out[v]){ 77 int i = safe_idx[v]; 78 M.a[i][j] = (M.a[i][j] + 1) % MOD; // safe->safe 79 } else { 80 M.a[AGG][j] = (M.a[AGG][j] + 1) % MOD; // safe->aggregator 81 } 82 } 83 } 84 // Aggregator self-loop: once matched, every next char keeps you matched; k choices each step 85 M.a[AGG][AGG] = (M.a[AGG][AGG] + K) % MOD; 86 87 // Initial vector v0: start at root (if safe) 88 vector<long long> v0(D, 0); 89 if (safe_idx[0] == -1) { 90 // Root is terminal (rare unless empty pattern). Then all length>=1 are matched immediately. 91 v0[AGG] = 1; 92 } else { 93 v0[safe_idx[0]] = 1; 94 } 95 96 Matrix P = mpow(M, n); 97 98 // v_n = P * v0 99 vector<long long> vn(D, 0); 100 for(int i=0;i<D;i++){ 101 long long sum = 0; 102 for(int j=0;j<D;j++) if(P.a[i][j] && v0[j]){ 103 sum = (sum + P.a[i][j] * v0[j]) % MOD; 104 } 105 vn[i] = sum; 106 } 107 108 long long avoid = 0; 109 for(int i=0;i<S;i++) avoid = (avoid + vn[i]) % MOD; 110 long long matched = vn[AGG]; 111 112 cout << "Avoiders (mod 1e9+7): " << avoid << "\n"; 113 cout << "Contain >=1 forbidden (mod 1e9+7): " << matched << "\n"; 114 115 return 0; 116 } 117
We compress to safe states and add a single absorbing aggregator to collect transitions that would hit any terminal state. The transition matrix M is built so that column j (from-state) distributes to rows i (to-states) by the number of letters causing that move. The aggregator has a k-fold self-loop. We raise M to the n-th power and multiply by the initial vector to get counts after n steps. The safe components sum to avoiders; the aggregator component equals the number of strings with at least one forbidden occurrence.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Given: at position i, only a subset of letters is allowed. 5 // Count strings of length n that avoid forbidden patterns. 6 7 struct AhoCorasick { 8 int K; 9 vector<vector<int>> next; 10 vector<int> link; 11 vector<int> out; 12 AhoCorasick(int K): K(K){ add_node(); } 13 int add_node(){ next.push_back(vector<int>(K,-1)); link.push_back(0); out.push_back(0); return (int)next.size()-1; } 14 void insert(const string &s){ int v=0; for(char ch: s){ int c=ch-'a'; if(next[v][c]==-1) next[v][c]=add_node(); v=next[v][c]; } out[v]=1; } 15 void build(){ queue<int> q; for(int c=0;c<K;c++){ int u=next[0][c]; if(u!=-1){ link[u]=0; q.push(u);} else next[0][c]=0; } while(!q.empty()){ int v=q.front(); q.pop(); out[v]|=out[link[v]]; for(int c=0;c<K;c++){ int u=next[v][c]; if(u!=-1){ link[u]=next[link[v]][c]; q.push(u);} else next[v][c]=next[link[v]][c]; } } } 16 }; 17 18 int main(){ 19 ios::sync_with_stdio(false); 20 cin.tie(nullptr); 21 22 const long long MOD = 1000000007LL; 23 24 int K = 4; // alphabet: {'a','b','c','d'} 25 vector<string> patterns = {"ab", "bcd"}; 26 int n = 8; 27 28 // allowed[i] is a bitmask over K letters indicating which are allowed at position i 29 vector<int> allowed(n, 0); 30 // Example: alternate constraints: positions even allow {a,c}, odd allow {b,d} 31 for(int i=0;i<n;i++){ 32 if(i%2==0) allowed[i] = (1<<0) | (1<<2); // a,c 33 else allowed[i] = (1<<1) | (1<<3); // b, d 34 } 35 36 AhoCorasick ac(K); 37 for(auto &p: patterns) ac.insert(p); 38 ac.build(); 39 40 int S = (int)ac.next.size(); 41 vector<long long> dp(S, 0), ndp(S, 0); 42 dp[0] = 1; 43 44 for(int i=0;i<n;i++){ 45 fill(ndp.begin(), ndp.end(), 0); 46 for(int u=0; u<S; ++u){ 47 if (dp[u] == 0) continue; 48 if (ac.out[u]) continue; // never continue from terminal states 49 for(int c=0;c<K;c++) if(allowed[i] & (1<<c)){ 50 int v = ac.next[u][c]; 51 if (!ac.out[v]){ 52 ndp[v] += dp[u]; 53 if (ndp[v] >= MOD) ndp[v] -= MOD; 54 } 55 // transitions into terminal states are skipped (avoidance) 56 } 57 } 58 dp.swap(ndp); 59 } 60 61 long long ans = 0; 62 for(int u=0; u<S; ++u) if(!ac.out[u]){ ans += dp[u]; if(ans >= MOD) ans -= MOD; } 63 64 cout << "Avoiders with position constraints: " << ans << "\n"; 65 return 0; 66 } 67
This demonstrates per-position constraints on allowed letters. For each position, we only follow transitions labeled by allowed characters and skip any transition that would enter a terminal node. The final sum over safe states gives the count of pattern-avoiding strings subject to the per-position constraints.