🗂️Data StructureAdvanced

Suffix Automaton - Advanced Usage

Key Points

  • A suffix automaton (SAM) is a compact DFA that captures all distinct substrings of a string and supports many advanced queries in linear time.
  • Occurrences of a pattern in the text equal the size of its end-position set, which you can compute as a frequency propagated along suffix links.
  • The number of distinct substrings equals the sum over states of len[v] - len[link[v]], which counts how many new lengths each state contributes.
  • The K-th lexicographically smallest substring can be found by DP on the DAG of SAM transitions and guided traversal from the initial state.
  • The longest common substring between strings is found by walking the second (or many) strings over the SAM of the first and tracking matched lengths.
  • DP on the SAM (often in decreasing order of len) is the key to attach counts, maximize/minimize scores, or filter by frequency constraints.
  • SAM builds in O(n) time and space, has at most 2n-1 states, and answers pattern queries in O(m) where m is pattern length.
  • Implementation pitfalls include forgetting to propagate counts in topological order, off-by-one errors around len and suffix links, and unsorted transitions for lexicographic tasks.

Prerequisites

  • Deterministic finite automata (DFA)SAM is a minimal DFA; understanding states, transitions, and determinism clarifies why each substring maps to a unique path.
  • Suffix link/suffix tree intuitionSuffix links encode the relation between contexts; they are crucial for construction and DP propagation.
  • Big-O notation and amortized analysisSAM’s linear bounds rely on amortized arguments about cloning and transitions.
  • Topological order and DP on DAGsCounting paths and propagating counts along suffix links require DAG DP in length order.
  • Lexicographic orderK-th substring selection depends on sorting transitions and reasoning about lexicographic skips.
  • String processing basicsIndexing, substrings, prefixes/suffixes, and character encodings are used throughout.
  • Hash maps vs fixed arraysChoosing transition storage affects performance; arrays are best for small alphabets.
  • Proof by invariantsLCS walking logic maintains invariants on matched length when climbing suffix links.

Detailed Explanation

Tap terms for definitions

01Overview

A suffix automaton (SAM) is a deterministic finite automaton built from a string S that recognizes exactly the set of substrings of S. It compresses all substrings into a minimal structure where each state represents an equivalence class of end positions (endpos) of substrings. The SAM has three main components per state: len[v] (the length of the longest string in the class), link[v] (a suffix link pointing to a state representing the longest proper suffix class), and next[v] (transitions labeled by characters). Remarkably, a SAM for a string of length n has at most 2n-1 states, so it can be built and stored in linear time and space. This compactness makes it a powerful backbone for string problems that need to count, order, or compare substrings. Advanced usages include counting occurrences of any pattern P in O(|P|), computing the number of distinct substrings, finding the K-th lexicographically smallest substring, computing the longest common substring (LCS) of multiple strings via a generalized SAM, and dynamic programming over the SAM’s DAG to aggregate counts and optimize over constrained substrings. Because transitions are deterministic, each distinct substring corresponds to exactly one path from the initial state; this fact underlies many DP solutions. The combination of suffix links and length-bounded contributions provides natural formulas to count or filter substrings by frequency.

02Intuition & Analogies

Imagine writing every substring of a text on sticky notes and grouping notes that end at the same positions in the text. Many notes are duplicates (the same substring appears in different places), and many are suffixes of one another. A suffix automaton is like a smart filing system that merges these notes into folders (states) where each folder represents all substrings that end at the same set of positions. Each folder knows the length of the longest note it contains (len), and it has a pointer (suffix link) to the folder that would contain the note if you chop off one character from the front as far as possible while staying in the same group structure. Now picture a subway map. The starting station is the empty string. Each labeled rail (transition) takes you to the next folder by appending a character. Any substring is a unique route starting at the empty station, moving along rails labeled by its characters. Because the map is deterministic, there is exactly one way to traverse a given substring, which is why counting distinct substrings becomes counting distinct paths. The suffix links are like emergency staircases that let you jump back to a simpler folder representing shorter contexts when a track you want isn’t available—this is the trick used when matching patterns or other strings quickly against the SAM. For lexicographic tasks (like the K-th substring), if you sort the rails leaving each station alphabetically and know how many paths lie beyond each rail, you can skip entire blocks of substrings without enumerating them. For counting occurrences, you can imagine each time a folder is reached during the text’s construction you drop a marble there; after building, you tilt the map along suffix links so marbles roll backward, adding up counts to represent how many end positions each class owns.

03Formal Definition

Given a string S over alphabet , its suffix automaton is the minimal DFA that recognizes the language of all substrings of S. Each state v corresponds to an endpos-equivalence class: two strings x,y are equivalent if endpos(x) = endpos(y), where endpos(x) is the set of indices where x ends in S. For each state v, we store: len[v], the length of the longest string in its class; link[v], a suffix link pointing to the state representing the longest proper suffix of any string in v’s class; and next[v][c], the transition under character c, if it exists. The initial state 0 represents the empty string class, with link[0] = -1 and len[0] = 0. The automaton is built online by extending S from left to right; the construction maintains minimality by cloning states when necessary to preserve DFA determinism while updating reachability. The SAM has at most 2-1 states and at most O() transitions, and it can be constructed in () time. Several key properties follow: each distinct substring of S corresponds to a unique path starting at the initial state; the number of distinct substrings equals ([v] - [[v]]); and the number of occurrences of a substring represented by state v equals , which is computable by accumulating visit counts from longer to shorter via suffix links. The graph formed by transitions is a DAG, enabling dynamic programming over states ordered by nondecreasing len.

04When to Use

Use a suffix automaton when you need fast, repeated substring queries on a fixed text S: pattern occurrence counts in O(|P|), checking membership of a substring, or answering how many distinct substrings exist. It shines in competitive programming for problems like: computing the K-th lexicographically smallest substring without enumerating all substrings; longest common substring (2 or many strings) by scanning other strings against the SAM of the first and tracking matched lengths; counting substrings that occur at least K times by combining endpos sizes with length gaps; and accumulating statistics (e.g., number of substrings starting with a given prefix) via DP on the DAG. If you have multiple queries on one text, SAM typically beats suffix arrays or trees in coding complexity while retaining linear performance. When memory must be strictly linear in |S| and the alphabet is modest (e.g., lowercase letters), SAM is a robust choice. For very large alphabets, maps may slow transitions but correctness remains. For tasks that need lexicographic ranks or suffix order explicitly, a suffix array might be more direct; for substring automata with all substrings and on-the-fly minimality, SAM is ideal.

⚠️Common Mistakes

  • Forgetting to propagate occurrence counts in decreasing order of len. Raw cnt[v] after building only counts times state v was the active end; you must bucket-sort states by len and do cnt[link[v]] += cnt[v] to get |endpos(v)|.
  • Confusing len[v] with the number of substrings at state v. The contribution of state v to distinct substrings is len[v] - len[link[v]], not len[v] itself.
  • Off-by-one in K-th substring DP: decide whether to count the empty substring. The standard approach excludes the empty substring, so the path contribution for an edge is 1 + dp[next]. Sort outgoing edges lexicographically.
  • Mishandling transitions when matching another string for LCS: after failing a transition, while-climb via suffix links and shrink the matched length to len[cur]; do not let the length exceed len[cur] after jumps.
  • Using unordered_map transitions for very tight time limits without reserving buckets or switching to fixed arrays when the alphabet is small; this can cause TLE due to hashing overhead.
  • Forgetting to clone during construction or copying cnt/len incorrectly when cloning; clones must copy transitions, inherit link, set len to len[p] + 1, and never receive initial cnt=1 (clones get cnt=0 initially).
  • Ignoring multiple test cases: always reinitialize the SAM and auxiliary arrays per test case.
  • Recursion depth in DFS DP for large n: prefer iterative/topological DP or ensure the environment can handle deep recursion.

Key Formulas

State Bound

Explanation: A suffix automaton for a string of length n has at most 2n-1 states. This guarantees linear space and underpins O(n) construction time.

Transition Bound

Explanation: The total number of transitions is linear in n. This ensures that DP or propagation steps over all edges remain linear-time.

Distinct Substrings

Explanation: Each state v contributes all lengths between len[link[v]]+1 and len[v], which are new compared to its suffix link. Summing these gaps counts all distinct substrings exactly once.

Endpos Propagation

Explanation: After construction, cnt[v] initially counts how many times v was last during extension. Propagating counts from longer to shorter via suffix links yields |endpos(v)|, the number of occurrences of substrings in v’s class.

Pattern Occurrences

Explanation: Following P’s characters from the initial state yields a state that represents P. Its propagated count equals the number of times P appears in the text.

Path Counting for Substrings

Explanation: Counting paths starting at v (excluding the empty path) yields the number of substrings represented by paths from v. This supports K-th lexicographic substring selection.

LCS Matching Invariant

Explanation: When walking another string over the SAM, we extend matches on existing transitions and, on failure, climb suffix links while shrinking the current matched length to remain valid. The maximum L over the scan is the LCS length.

Frequent Substrings

Explanation: All substrings represented by state v (the length range it contributes) share the same endpos count. If cnt[v] K, all those substrings occur at least K times, contributing the full gap.

Construction Complexity

Explanation: SAM builds in linear time and space with respect to the string length n. Each extension creates at most one new state and possibly one clone.

Pattern Query Cost

Explanation: Checking a pattern or counting its occurrences is linear in the pattern length, since it follows at most one deterministic transition per character.

Complexity Analysis

Let n be the text length. SAM construction is O(n) time and O(n) space. Each character extension creates at most one new state and at most one clone, so the total states are at most 2n-1. The total number of transitions is linear, thus all per-edge work sums to O(n). Counting occurrences of a pattern P takes O() time by following transitions; if a transition is missing, the pattern does not occur, and we stop early. To compute for all states, we first bucket states by len and then propagate counts from larger len to smaller via suffix links in O(n). Distinct substring count via (len[v]-len[link[v]]) is O(n) after construction because it just iterates states. For K-th lexicographic substring, we need DP on the DAG to count paths: dp[v] = (1+dp[to]) in O(), which is O(n). Selecting the K-th substring then traverses transitions in sorted order, subtracting block sizes; this takes O(K * ) in the worst case if you materialize the path, but the decision at each step is O() and the number of steps equals the answer length. With =26 this is fast. The longest common substring (2 strings) scan runs in O(+); for multiple strings, each scan is O() plus one O(n) propagation, giving total O(n + ). Memory usage is dominated by states and transitions: O(n) integers and O(n) if using fixed arrays for small alphabets, or O(n) pointers/hashes if using maps. All these routines maintain linear (or near-linear) performance suited for large inputs in competitive programming.

Code Examples

Pattern occurrence counting via |endpos| on a Suffix Automaton
1#include <bits/stdc++.h>
2using namespace std;
3
4// Suffix Automaton for lowercase 'a'..'z'.
5// Supports occurrence counting via endpos sizes.
6struct SAM {
7 struct State {
8 int next[26]; // transitions
9 int link; // suffix link
10 int len; // length of the longest string in this state
11 long long cnt; // number of end positions (after propagation)
12 State() : link(-1), len(0), cnt(0) { fill(begin(next), end(next), -1); }
13 };
14
15 vector<State> st;
16 int last; // index of the state representing the whole current string
17
18 SAM(int maxlen = 0) { st.reserve(2 * maxlen); st.push_back(State()); last = 0; }
19
20 void extend(char ch) {
21 int c = ch - 'a';
22 int cur = (int)st.size();
23 st.push_back(State());
24 st[cur].len = st[last].len + 1;
25 st[cur].cnt = 1; // Each new end contributes 1 occurrence initially
26
27 int p = last;
28 while (p != -1 && st[p].next[c] == -1) {
29 st[p].next[c] = cur;
30 p = st[p].link;
31 }
32 if (p == -1) {
33 st[cur].link = 0;
34 } else {
35 int q = st[p].next[c];
36 if (st[p].len + 1 == st[q].len) {
37 st[cur].link = q;
38 } else {
39 int clone = (int)st.size();
40 st.push_back(st[q]); // copy transitions, link, len, cnt (cnt will be 0 for clone)
41 st[clone].len = st[p].len + 1;
42 st[clone].cnt = 0; // clone does not represent new endpos directly
43 while (p != -1 && st[p].next[c] == q) {
44 st[p].next[c] = clone;
45 p = st[p].link;
46 }
47 st[q].link = st[cur].link = clone;
48 }
49 }
50 last = cur;
51 }
52
53 // Propagate cnt along suffix links in decreasing order of len to get |endpos|
54 void propagate_counts() {
55 int maxLen = 0;
56 for (auto &s : st) maxLen = max(maxLen, s.len);
57 vector<int> cntLen(maxLen + 1, 0);
58 for (auto &s : st) cntLen[s.len]++;
59 for (int i = 1; i <= maxLen; ++i) cntLen[i] += cntLen[i-1];
60 vector<int> order(st.size());
61 for (int i = (int)st.size() - 1; i >= 0; --i) {
62 order[--cntLen[st[i].len]] = i;
63 }
64 for (int i = (int)order.size() - 1; i >= 0; --i) {
65 int v = order[i];
66 int p = st[v].link;
67 if (p != -1) st[p].cnt += st[v].cnt;
68 }
69 }
70
71 // Count occurrences of pattern P in the text used to build the SAM.
72 long long count_occurrences(const string &P) const {
73 int v = 0;
74 for (char ch : P) {
75 int c = ch - 'a';
76 if (c < 0 || c >= 26) return 0; // out of alphabet
77 v = st[v].next[c];
78 if (v == -1) return 0;
79 }
80 return st[v].cnt; // |endpos(P)|
81 }
82};
83
84int main() {
85 ios::sync_with_stdio(false);
86 cin.tie(nullptr);
87
88 string S; int Q;
89 if (!(cin >> S >> Q)) return 0;
90 SAM sam((int)S.size());
91 for (char ch : S) sam.extend(ch);
92 sam.propagate_counts();
93
94 while (Q--) {
95 string P; cin >> P;
96 cout << sam.count_occurrences(P) << '\n';
97 }
98 return 0;
99}
100

We build a SAM for S (lowercase alphabet). Each extension sets cnt=1 for the newly created terminal state, then we propagate counts along suffix links in decreasing order of len to compute |endpos(v)| for all states. For a query pattern P, we follow transitions from the initial state; if traversal succeeds, the state’s cnt gives the total number of occurrences of P in S.

Time: O(|S| + \sum |P_i|) for Q queries; each operation is linear in input size.Space: O(|S|) states and transitions; specifically at most 2|S|-1 states with 26 integers per state.
Distinct substrings and K-th lexicographically smallest substring
1#include <bits/stdc++.h>
2using namespace std;
3
4struct SAM {
5 struct State {
6 int next[26];
7 int link;
8 int len;
9 State() : link(-1), len(0) { fill(begin(next), end(next), -1); }
10 };
11 vector<State> st;
12 int last;
13
14 SAM(int maxlen = 0) { st.reserve(2 * maxlen); st.push_back(State()); last = 0; }
15
16 void extend(char ch) {
17 int c = ch - 'a';
18 int cur = (int)st.size(); st.push_back(State());
19 st[cur].len = st[last].len + 1;
20 int p = last;
21 while (p != -1 && st[p].next[c] == -1) { st[p].next[c] = cur; p = st[p].link; }
22 if (p == -1) st[cur].link = 0;
23 else {
24 int q = st[p].next[c];
25 if (st[p].len + 1 == st[q].len) st[cur].link = q;
26 else {
27 int clone = (int)st.size(); st.push_back(st[q]);
28 st[clone].len = st[p].len + 1;
29 // keep link and transitions; no cnt needed here
30 while (p != -1 && st[p].next[c] == q) { st[p].next[c] = clone; p = st[p].link; }
31 st[q].link = st[cur].link = clone;
32 }
33 }
34 last = cur;
35 }
36
37 long long count_distinct_substrings() const {
38 long long ans = 0;
39 for (size_t v = 1; v < st.size(); ++v) ans += (st[v].len - st[st[v].link].len);
40 return ans;
41 }
42
43 // DP to count the number of distinct substrings starting from each state.
44 // dp[v] = sum over edges (1 + dp[to]). 1 counts the single-edge path.
45 vector<long long> dp;
46 long long dfs_count_paths(int v) {
47 long long &res = dp[v];
48 if (res != -1) return res;
49 res = 0;
50 for (int c = 0; c < 26; ++c) if (st[v].next[c] != -1) {
51 int u = st[v].next[c];
52 res += 1; // the substring formed by taking this edge only
53 res += dfs_count_paths(u); // plus longer substrings continuing from u
54 if (res < 0) { /* overflow guard not needed for 64-bit up to ~1e10 */ }
55 }
56 return res;
57 }
58
59 // Return the K-th lexicographically smallest substring (1-indexed). If K is invalid, return "".
60 string kth_substring(long long K) {
61 dp.assign(st.size(), -1);
62 dfs_count_paths(0); // compute dp[0] = total number of distinct substrings
63 if (K <= 0 || K > dp[0]) return string();
64 string ans;
65 int v = 0;
66 while (K > 0) {
67 for (int c = 0; c < 26; ++c) if (st[v].next[c] != -1) {
68 int u = st[v].next[c];
69 long long block = 1 + (dp[u] == -1 ? 0 : dp[u]);
70 if (K > block) {
71 K -= block; // skip all substrings starting with this edge
72 } else {
73 ans.push_back(char('a' + c));
74 K -= 1; // we used the single-edge path
75 v = u;
76 break; // continue to refine longer substrings if K > 0
77 }
78 }
79 if ((int)ans.size() == 0 && v == 0 && K > 0) {
80 // No more edges (should not happen if K was valid)
81 return string();
82 }
83 if (K == 0) break;
84 }
85 return ans;
86 }
87};
88
89int main() {
90 ios::sync_with_stdio(false);
91 cin.tie(nullptr);
92
93 string S; long long K; if (!(cin >> S >> K)) return 0;
94 SAM sam((int)S.size());
95 for (char ch : S) sam.extend(ch);
96
97 cout << sam.count_distinct_substrings() << '\n';
98 string kth = sam.kth_substring(K);
99 if (kth.empty()) cout << "-1\n"; else cout << kth << '\n';
100 return 0;
101}
102

We build a SAM and compute the number of distinct substrings via the standard sum of length gaps. To get the K-th lexicographically smallest substring, we do a DP that counts how many substrings start from each state (number of paths) and then traverse from the initial state, exploring outgoing edges in 'a'..'z' order while skipping entire blocks using the DP counts. We exclude the empty substring (paths must take at least one edge).

Time: O(|S|) build, O(|S|) for DP counting, and O(L * \sigma) for selecting where L is the answer length and \sigma=26.Space: O(|S|) for SAM; O(|S|) for the DP array.
Longest Common Substring (2 strings) and generalized multi-string LCS using SAM
1#include <bits/stdc++.h>
2using namespace std;
3
4struct SAM {
5 struct State {
6 int next[26];
7 int link;
8 int len;
9 State() : link(-1), len(0) { fill(begin(next), end(next), -1); }
10 };
11 vector<State> st;
12 int last;
13
14 SAM(int maxlen = 0) { st.reserve(2 * maxlen); st.push_back(State()); last = 0; }
15
16 void extend(char ch) {
17 int c = ch - 'a';
18 int cur = (int)st.size(); st.push_back(State());
19 st[cur].len = st[last].len + 1;
20 int p = last;
21 while (p != -1 && st[p].next[c] == -1) { st[p].next[c] = cur; p = st[p].link; }
22 if (p == -1) st[cur].link = 0;
23 else {
24 int q = st[p].next[c];
25 if (st[p].len + 1 == st[q].len) st[cur].link = q;
26 else {
27 int clone = (int)st.size(); st.push_back(st[q]);
28 st[clone].len = st[p].len + 1;
29 while (p != -1 && st[p].next[c] == q) { st[p].next[c] = clone; p = st[p].link; }
30 st[q].link = st[cur].link = clone;
31 }
32 }
33 last = cur;
34 }
35
36 // LCS length with another string T by walking over the SAM of S.
37 int lcs_two(const string &T) const {
38 int v = 0, L = 0, best = 0;
39 for (char ch : T) {
40 int c = ch - 'a';
41 if (c < 0 || c >= 26) { v = 0; L = 0; continue; }
42 if (st[v].next[c] != -1) {
43 v = st[v].next[c];
44 L++;
45 } else {
46 while (v != -1 && st[v].next[c] == -1) v = st[v].link;
47 if (v == -1) { v = 0; L = 0; continue; }
48 L = st[v].len + 1;
49 v = st[v].next[c];
50 }
51 best = max(best, L);
52 }
53 return best;
54 }
55
56 // Generalized LCS length with multiple strings: intersect per-state maximum matched lengths.
57 int lcs_multi(const vector<string> &others) const {
58 int nStates = (int)st.size();
59 vector<int> g(nStates); // global minimum over strings of their per-state max match length
60 for (int i = 0; i < nStates; ++i) g[i] = st[i].len; // initialize to len[v]
61
62 // Build order by len for suffix-link propagation
63 int maxLen = 0; for (auto &s: st) maxLen = max(maxLen, s.len);
64 vector<int> cntLen(maxLen+1), order(nStates);
65 for (auto &s: st) cntLen[s.len]++;
66 for (int i = 1; i <= maxLen; ++i) cntLen[i] += cntLen[i-1];
67 for (int i = nStates - 1; i >= 0; --i) order[--cntLen[st[i].len]] = i;
68
69 for (const string &T : others) {
70 vector<int> mx(nStates, 0);
71 int v = 0, L = 0;
72 for (char ch : T) {
73 int c = ch - 'a';
74 if (c < 0 || c >= 26) { v = 0; L = 0; continue; }
75 if (st[v].next[c] != -1) {
76 v = st[v].next[c];
77 L++;
78 } else {
79 while (v != -1 && st[v].next[c] == -1) v = st[v].link;
80 if (v == -1) { v = 0; L = 0; continue; }
81 L = st[v].len + 1;
82 v = st[v].next[c];
83 }
84 mx[v] = max(mx[v], L);
85 }
86 // Propagate mx to suffix links (decreasing len)
87 for (int i = nStates - 1; i >= 1; --i) {
88 int u = order[i];
89 int p = st[u].link;
90 if (p != -1) mx[p] = max(mx[p], min(mx[u], st[p].len));
91 }
92 // Intersect: global min with per-string mx
93 for (int vtx = 0; vtx < nStates; ++vtx) g[vtx] = min(g[vtx], mx[vtx]);
94 }
95 int ans = 0;
96 for (int v = 0; v < nStates; ++v) ans = max(ans, g[v]);
97 return ans;
98 }
99};
100
101int main() {
102 ios::sync_with_stdio(false);
103 cin.tie(nullptr);
104
105 string S; int m;
106 if (!(cin >> S >> m)) return 0;
107 SAM sam((int)S.size());
108 for (char ch : S) sam.extend(ch);
109
110 if (m == 1) {
111 string T; cin >> T;
112 cout << sam.lcs_two(T) << '\n';
113 } else {
114 vector<string> others(m);
115 for (int i = 0; i < m; ++i) cin >> others[i];
116 cout << sam.lcs_multi(others) << '\n';
117 }
118 return 0;
119}
120

We build a SAM for S, then compute LCS. For two strings, we walk T over the SAM: on success, extend the matched length; on failure, climb suffix links and shrink L to len[cur] before resuming. For multiple strings, for each T we record per-state maximum matched length during the scan (mx), propagate mx to suffix links in decreasing len order (capping by len[link]), and intersect across all strings by taking the minimum in g. The final answer is the maximum g[v].

Time: O(|S|) build. Two-string LCS: O(|T|). Multi-string: O(\sum |T_i| + |S|) with an extra O(|S|) propagation per string.Space: O(|S|) for SAM plus O(|S|) auxiliary arrays for mx/g/order.
DP on SAM: Count substrings that occur at least K times
1#include <bits/stdc++.h>
2using namespace std;
3
4struct SAM {
5 struct State {
6 int next[26];
7 int link;
8 int len;
9 long long cnt; // |endpos| after propagation
10 State() : link(-1), len(0), cnt(0) { fill(begin(next), end(next), -1); }
11 };
12 vector<State> st;
13 int last;
14
15 SAM(int maxlen = 0) { st.reserve(2 * maxlen); st.push_back(State()); last = 0; }
16
17 void extend(char ch) {
18 int c = ch - 'a';
19 int cur = (int)st.size(); st.push_back(State());
20 st[cur].len = st[last].len + 1;
21 st[cur].cnt = 1; // new end position
22 int p = last;
23 while (p != -1 && st[p].next[c] == -1) { st[p].next[c] = cur; p = st[p].link; }
24 if (p == -1) st[cur].link = 0;
25 else {
26 int q = st[p].next[c];
27 if (st[p].len + 1 == st[q].len) st[cur].link = q;
28 else {
29 int clone = (int)st.size(); st.push_back(st[q]);
30 st[clone].len = st[p].len + 1;
31 st[clone].cnt = 0; // clone doesn't add new endpos
32 while (p != -1 && st[p].next[c] == q) { st[p].next[c] = clone; p = st[p].link; }
33 st[q].link = st[cur].link = clone;
34 }
35 }
36 last = cur;
37 }
38
39 long long count_at_least_K(long long K) {
40 int maxLen = 0; for (auto &s : st) maxLen = max(maxLen, s.len);
41 vector<int> cntLen(maxLen+1), order(st.size());
42 for (auto &s : st) cntLen[s.len]++;
43 for (int i = 1; i <= maxLen; ++i) cntLen[i] += cntLen[i-1];
44 for (int i = (int)st.size() - 1; i >= 0; --i) order[--cntLen[st[i].len]] = i;
45 for (int i = (int)order.size() - 1; i >= 0; --i) {
46 int v = order[i];
47 int p = st[v].link;
48 if (p != -1) st[p].cnt += st[v].cnt; // propagate endpos sizes
49 }
50 long long ans = 0;
51 for (size_t v = 1; v < st.size(); ++v) if (st[v].cnt >= K) {
52 ans += (st[v].len - st[st[v].link].len);
53 }
54 return ans;
55 }
56};
57
58int main() {
59 ios::sync_with_stdio(false);
60 cin.tie(nullptr);
61
62 string S; long long K;
63 if (!(cin >> S >> K)) return 0;
64 SAM sam((int)S.size());
65 for (char ch : S) sam.extend(ch);
66 cout << sam.count_at_least_K(K) << '\n';
67 return 0;
68}
69

We build a SAM with initial cnt=1 for each new last state. After bucket-sorting by len, we propagate cnt from longer to shorter states along suffix links to get |endpos|. The number of substrings that occur at least K times is the sum over states with cnt[v] ≥ K of (len[v] − len[link[v]]), since all substrings represented by the length gap share the same frequency.

Time: O(|S|) for build and propagation.Space: O(|S|) for states and transitions.