Suffix Array Construction
Key Points
- â˘A suffix array stores the starting indices of all suffixes of a string in lexicographic order, enabling fast substring queries and many string operations.
- â˘The classic O(n n) doubling algorithm sorts suffixes by pairs of ranks (rank[i], rank[i+2^k]) using radix/counting sort and doubles the gap each round.
- â˘Using comparison-based sorts inside doubling raises the time to O(n n), while radix/counting sort keeps it at O(n n).
- â˘Kasaiâs algorithm builds the LCP (Longest Common Prefix) array in O(n), which speeds up many computations like distinct substrings and suffix tree emulation.
- â˘SA-IS constructs suffix arrays in O(n) using induced sorting and S/L-type classification of characters and LMS positions.
- â˘DC3 (a.k.a. skew) is another O(n) algorithm, but SA-IS is often simpler in practice and has excellent performance on competitive programming inputs.
- â˘For pattern matching, a suffix array supports O(m n) search for a pattern of length m via binary search, and can be improved with LCP/RMQ.
- â˘Always append a unique sentinel (smallest character) at the end to make suffixes comparable and ensure correctness of algorithms.
Prerequisites
- âStable counting sort / radix sort â Radix sorting of rank pairs is the core technique that keeps the doubling algorithm at O(n log n).
- âBinary search â Pattern search over the suffix array uses lower_bound/upper_bound style binary searches.
- âLexicographic string comparison â Understanding how suffixes are ordered is essential for correctness and comparisons.
- âBig-O notation and logarithms â To analyze the O(n log n) vs O(n) trade-offs and iteration doubling behavior.
- âRange Minimum Query (RMQ) / Sparse Table (optional) â RMQ over the LCP array enables O(1) LCP queries between arbitrary suffixes.
- âAlphabet compression / coordinate compression â Ensures small integer alphabets to make counting sorts linear and cache friendly.
Detailed Explanation
Tap terms for definitions01Overview
A suffix array is a compact, array-based index for a string that lists all suffixes in lexicographic (dictionary) order. Instead of storing a tree of all suffixes (like a suffix tree), it stores only the starting positions of suffixes in sorted order, which is memory efficient and cache friendly. Once built, we can perform binary search over this array to answer substring existence queries and to find occurrences of a pattern. It also serves as a foundation for computing the LCP (Longest Common Prefix) array, which summarizes the shared prefixes between neighboring suffixes in the suffix array. These two arrays together can simulate many suffix-tree operations and answer queries like the number of distinct substrings, repeated substrings, and lexicographic ranks. Construction can be done in multiple ways: the widely taught O(n \log n) doubling method; a slower O(n \log^{2} n) variant if you use comparison sorting; and linear-time approaches such as SA-IS and DC3. In contests and practical applications, O(n \log n) doubling with radix/counting sort is usually fast and straightforward to implement, while SA-IS offers top theoretical bounds and strong practical performance with careful coding.
02Intuition & Analogies
Imagine writing down every suffix of a word on separate cards: for banana, the cards are "banana", "anana", "nana", "ana", "na", and "a". If you sort these cards alphabetically, you get the order in which these suffixes would appear in a dictionary. The suffix array is simply the list of the original starting positions of these cards in that sorted dictionaryâlike noting where each dictionary entry came from in the original word. Why is this useful? Because any substring is a prefix of some suffix. So, if you can quickly find where a particular prefix would fit among the sorted suffixes, you can tell whether a pattern exists in the text and where. Binary searching among the sorted suffixes is like looking through an index at the back of a bookâjump into the middle, compare the pattern with the suffix at that point, and narrow your range in log steps. The doubling algorithmâs intuition is to avoid comparing full strings by assigning each suffix a short ID (its rank) based on its first few characters, then repeatedly refining that ID by doubling the number of characters considered: 1, 2, 4, 8, and so on. Each round combines two smaller IDs to form a bigger one, much like building bigger LEGO pieces from smaller labeled blocks. Radix (counting) sort can sort these pairs of IDs in linear time per round, making the whole build efficient. SA-IS goes further: it classifies characters into S-type and L-type depending on whether the remainder of the string to the right is lexicographically larger or smaller, identifies special LMS positions, and cleverly âinducesâ the full sorted order from a subsetâlike organizing a library by first shelving a key subset, then carefully filling in the gaps using local rules.
03Formal Definition
04When to Use
Use suffix arrays when you need fast substring queries on a static string with low memory overhead: finding if a pattern appears, counting occurrences, or listing positions (O(m \log n) per query). They are excellent for problems involving lexicographic ranks of suffixes, finding repeated substrings, computing the number of distinct substrings, and constructing the BurrowsâWheeler Transform. In competitive programming, the O(n \log n) doubling algorithm is typically the go-to solution due to its simplicity and speed; it comfortably handles strings of length up to ~2e5â1e6 depending on time limits. If n is extremely large or you are implementing a library with theoretical guarantees, consider SA-IS or DC3 to achieve O(n) worst-case time. Build the LCP array if you need: frequency of repeats, longest repeated substring, or to simulate suffix-tree-like queries (often accelerated via RMQ on LCP). If the alphabet is large (e.g., full Unicode), compress it to 1..\sigma first to improve performance and ensure radix sorts are linear in n + \sigma. Finally, if your data updates frequently, suffix arrays are less suitable; consider online structures (suffix automaton for distinct substrings/repeats, or dynamic text indexes) depending on the task.
â ď¸Common Mistakes
Common pitfalls include forgetting the sentinel or not making it strictly smaller than all characters, which breaks comparisons at string ends. Another frequent issue is mixing 0-based and 1-based indexing for ranks or SA, causing off-by-one bugs, especially when accessing rank[i+2^k] beyond the end (you must treat out-of-range as 0 or the sentinelâs class). Using comparison-based sort for pairs in the doubling algorithm leads to an unnecessary O(n \log^{2} n) time; prefer counting/radix sort with bounded ranks. On large alphabets, skipping alphabet compression can inflate memory and slow down counting sort; compress characters to a dense 1..\sigma range. In SA-IS, misclassifying S/L types (tie cases) or mishandling LMS detection leads to subtle errors; remember that S-type is defined with a right-to-left rule and equals tie inherits the next positionâs type. During induced sorting, ensure stability and correct bucket boundaries (begin vs end) when placing L vs S neighbors. For pattern search, incorrectly handling binary search bounds or not comparing safely up to string end can cause false negatives/positives; always compare up to pattern length or suffix end. Finally, when building LCP with Kasai, forgetting to decrement the running height h each iteration or not using the rank array leads to O(n^2) behavior instead of O(n).
Key Formulas
Suffix Array Definition
Explanation: The suffix array is the ordering of indices 0..n that sorts all suffixes of the string with sentinel appended. argsort means the permutation that sorts the given list.
LCP Between Neighbors
Explanation: LCP[p] stores the length of the longest common prefix of two adjacent suffixes in the suffix array. LCP[0] is defined as 0 because there is no previous suffix.
Doubling Rank Update
Explanation: At iteration k, each suffix i is represented by a pair: the class of its first 2^k characters and the class of the next 2^k characters. Sorting these pairs yields new classes for 2^{k+1} prefixes.
Time Complexity of Doubling
Explanation: Each doubling iteration sorts n items; with radix/counting sort it is linear per round across O( n) rounds, while comparison sorts add an extra n factor.
Linear-Time Construction
Explanation: Both SA-IS and DC3 achieve linear worst-case time by carefully structuring recursive or induced sorting steps that touch each position a constant number of times.
Distinct Substrings via SA/LCP
Explanation: The number of new substrings contributed by suffix SA[i] equals its length minus the overlap with its predecessor measured by LCP[i]. Summing over all suffixes gives the total.
LCP as RMQ on LCP Array
Explanation: The LCP between any two suffixes with SA positions i and j is the minimum LCP value on the interval between them. This can be answered in O(1) with an RMQ structure.
Pattern Search Cost
Explanation: Binary searching the suffix array for a pattern of length m performs O( n) comparisons, each taking up to O(m) time by character-wise comparison.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Build suffix array using the classic doubling algorithm with counting sort. 5 // Appends a sentinel '$' that is strictly smaller than all other characters. 6 // Returns SA: positions 0..n-1 of the string (including sentinel) in lexicographic order of suffixes. 7 8 vector<int> suffix_array(const string &s_in) { 9 string s = s_in; 10 if (s.empty() || s.back() != '$') s.push_back('$'); // ensure sentinel 11 int n = (int)s.size(); 12 13 // Initial sorting by single characters using counting sort 14 const int ALPHA = 256; // byte alphabet upper bound 15 vector<int> p(n), c(n), cnt(max(ALPHA, n), 0); 16 for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++; 17 for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1]; 18 for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i; 19 20 // Assign classes (ranks) 21 c[p[0]] = 0; 22 int classes = 1; 23 for (int i = 1; i < n; ++i) { 24 if (s[p[i]] != s[p[i - 1]]) ++classes; 25 c[p[i]] = classes - 1; 26 } 27 28 // Doubling: sort by 2^k length prefixes 29 vector<int> pn(n), cn(n); 30 for (int h = 0; (1 << h) < n; ++h) { 31 int len = 1 << h; 32 // Shift positions: first key is c[i], second key is c[(i+len)%n] 33 for (int i = 0; i < n; ++i) { 34 pn[i] = p[i] - len; 35 if (pn[i] < 0) pn[i] += n; 36 } 37 // Counting sort by class c 38 fill(cnt.begin(), cnt.begin() + classes, 0); 39 for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++; 40 for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1]; 41 for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i]; 42 43 // Recompute classes 44 cn[p[0]] = 0; 45 classes = 1; 46 for (int i = 1; i < n; ++i) { 47 pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]}; 48 pair<int,int> prev = {c[p[i - 1]], c[(p[i - 1] + len) % n]}; 49 if (cur != prev) ++classes; 50 cn[p[i]] = classes - 1; 51 } 52 c.swap(cn); 53 if (classes == n) break; // all ranks unique 54 } 55 56 return p; // p is the suffix array 57 } 58 59 int main() { 60 ios::sync_with_stdio(false); 61 cin.tie(nullptr); 62 63 string s = "banana"; // demo 64 auto sa = suffix_array(s); 65 66 cout << "String with sentinel: " << (s.back()=='$'?s:s+"$") << "\n"; 67 cout << "Suffix Array indices:" << '\n'; 68 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' '); 69 70 cout << "Sorted suffixes:" << '\n'; 71 string t = s; if (t.empty() || t.back()!='$') t.push_back('$'); 72 for (int i = 0; i < (int)sa.size(); ++i) { 73 cout << setw(2) << i << ": " << t.substr(sa[i]) << '\n'; 74 } 75 return 0; 76 } 77
This program builds the suffix array using the doubling algorithm implemented as sorting cyclic shifts. It starts by sorting single characters with counting sort, assigns ranks (classes), and iteratively doubles the considered prefix length, using counting sort on the rank classes. The sentinel '$' guarantees that all suffixes are proper and comparable. The result is the array of starting indices in lexicographic order.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 vector<int> suffix_array(const string &s_in) { 5 string s = s_in; 6 if (s.empty() || s.back() != '$') s.push_back('$'); 7 int n = (int)s.size(); 8 const int ALPHA = 256; 9 vector<int> p(n), c(n), cnt(max(ALPHA, n), 0); 10 for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++; 11 for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1]; 12 for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i; 13 c[p[0]] = 0; int classes = 1; 14 for (int i = 1; i < n; ++i) { if (s[p[i]] != s[p[i-1]]) ++classes; c[p[i]] = classes - 1; } 15 vector<int> pn(n), cn(n); 16 for (int h = 0; (1 << h) < n; ++h) { 17 int len = 1 << h; 18 for (int i = 0; i < n; ++i) { pn[i] = p[i] - len; if (pn[i] < 0) pn[i] += n; } 19 fill(cnt.begin(), cnt.begin() + classes, 0); 20 for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++; 21 for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1]; 22 for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i]; 23 cn[p[0]] = 0; classes = 1; 24 for (int i = 1; i < n; ++i) { 25 pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]}; 26 pair<int,int> prev = {c[p[i-1]], c[(p[i-1] + len) % n]}; 27 if (cur != prev) ++classes; cn[p[i]] = classes - 1; 28 } 29 c.swap(cn); 30 if (classes == n) break; 31 } 32 return p; 33 } 34 35 // Kasai's algorithm: build LCP between consecutive suffixes in SA order. 36 vector<int> build_lcp(const string &s_in, const vector<int> &sa) { 37 string s = s_in; 38 if (s.empty() || s.back() != '$') s.push_back('$'); 39 int n = (int)s.size(); 40 vector<int> rank(n, 0), lcp(n, 0); 41 for (int i = 0; i < n; ++i) rank[sa[i]] = i; 42 int h = 0; 43 for (int i = 0; i < n; ++i) { 44 int r = rank[i]; 45 if (r == 0) { h = 0; continue; } 46 int j = sa[r - 1]; 47 // Extend match from previous h 48 while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h; 49 lcp[r] = h; 50 if (h > 0) --h; // Decrease by at most 1 51 } 52 return lcp; 53 } 54 55 // Compare pattern p with suffix s.substr(pos) 56 int cmp_suffix_pattern(const string &s, int pos, const string &pat) { 57 int n = (int)s.size(); 58 int m = (int)pat.size(); 59 for (int i = 0; i < m && pos + i < n; ++i) { 60 if (s[pos + i] < pat[i]) return -1; 61 if (s[pos + i] > pat[i]) return 1; 62 } 63 if (pos + m <= n) return 0; // pattern fully matched within suffix 64 return -1; // suffix ended before pattern 65 } 66 67 // Binary search over SA to find the range of matches [L, R) for pattern 68 pair<int,int> find_occurrences(const string &s_in, const vector<int> &sa, const string &pat) { 69 string s = s_in; 70 if (s.empty() || s.back() != '$') s.push_back('$'); 71 int n = (int)sa.size(); 72 int L = 0, R = n; 73 // lower_bound: first suffix >= pat 74 while (L < R) { 75 int mid = (L + R) >> 1; 76 int c = cmp_suffix_pattern(s, sa[mid], pat); 77 if (c >= 0) R = mid; else L = mid + 1; 78 } 79 int start = L; 80 L = 0; R = n; 81 // upper_bound: first suffix > pat with prefix pat 82 while (L < R) { 83 int mid = (L + R) >> 1; 84 int c = cmp_suffix_pattern(s, sa[mid], pat); 85 if (c > 0) R = mid; else L = mid + 1; 86 } 87 int end = L; 88 return {start, end}; 89 } 90 91 int main() { 92 ios::sync_with_stdio(false); 93 cin.tie(nullptr); 94 95 string s = "mississippi"; 96 auto sa = suffix_array(s); 97 auto lcp = build_lcp(s, sa); 98 99 cout << "SA indices:" << '\n'; 100 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()? '\n' : ' '); 101 cout << "LCP array:" << '\n'; 102 for (int i = 0; i < (int)lcp.size(); ++i) cout << lcp[i] << (i+1==(int)lcp.size()? '\n' : ' '); 103 104 vector<string> queries = {"issi", "miss", "ppi", "a"}; 105 for (auto &q : queries) { 106 auto range = find_occurrences(s, sa, q); 107 cout << "Query '" << q << "' occurs at SA range [" << range.first << ", " << range.second << ")\n"; 108 for (int i = range.first; i < range.second; ++i) { 109 cout << " position " << sa[i] << '\n'; 110 } 111 } 112 return 0; 113 } 114
This code extends the suffix array with Kasaiâs O(n) LCP construction and supports substring search by binary searching over the suffix array. The comparator compares the pattern with a suffix character-by-character. The function find_occurrences returns the SA interval where the pattern would fit, which corresponds to all occurrences. The LCP array is computed using the rank array and a rolling match length to keep the overall cost linear.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // SA-IS algorithm: builds suffix array in O(n) for integer alphabet in [0, sigma]. 5 // Requirements: 6 // - Input is a vector<int> s with values in [0, sigma], ending with a unique 0 sentinel. 7 // - 0 must be the smallest symbol and occurs exactly once at the end. 8 // Reference idea: classify S/L-types, identify LMS, induced sort LMS, reduce, recurse if needed, then induce full SA. 9 10 static void induce(const vector<int>& s, int sigma, const vector<bool>& isS, const vector<int>& lms_pos, 11 vector<int>& sa) { 12 int n = (int)s.size(); 13 vector<int> bucket(sigma + 1, 0); 14 for (int x : s) bucket[x]++; 15 vector<int> head(sigma + 1, 0), tail(sigma + 1, 0); 16 // head: start index of bucket, tail: last index of bucket 17 for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1]; 18 for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1; 19 20 fill(sa.begin(), sa.end(), -1); 21 // Place LMS at bucket tails (stable, process in reverse order) 22 for (int i = (int)lms_pos.size() - 1; i >= 0; --i) { 23 int p = lms_pos[i]; 24 int b = s[p]; 25 sa[tail[b]--] = p; 26 } 27 // Induce L-type to bucket heads (scan left to right) 28 vector<int> head2 = head; // copy since we'll advance heads 29 for (int i = 0; i < n; ++i) { 30 int p = sa[i]; 31 if (p <= 0) continue; 32 int j = p - 1; 33 if (!isS[j]) { 34 int b = s[j]; 35 sa[head2[b]++] = j; 36 } 37 } 38 // Induce S-type to bucket tails (scan right to left) 39 vector<int> tail2 = tail; // copy since we'll decrement tails 40 for (int i = n - 1; i >= 0; --i) { 41 int p = sa[i]; 42 if (p <= 0) continue; 43 int j = p - 1; 44 if (isS[j]) { 45 int b = s[j]; 46 sa[tail2[b]--] = j; 47 } 48 } 49 } 50 51 static bool is_lms(int i, const vector<bool>& isS) { 52 return i > 0 && isS[i] && !isS[i - 1]; 53 } 54 55 static vector<int> SA_IS(const vector<int>& s, int sigma) { 56 int n = (int)s.size(); 57 vector<bool> isS(n, false); 58 // Classify S/L types. By convention, the sentinel at n-1 is S-type. 59 isS[n - 1] = true; 60 for (int i = n - 2; i >= 0; --i) { 61 if (s[i] < s[i + 1]) isS[i] = true; 62 else if (s[i] > s[i + 1]) isS[i] = false; 63 else isS[i] = isS[i + 1]; 64 } 65 66 // Collect LMS positions in order 67 vector<int> lms_pos; 68 lms_pos.reserve(n); 69 for (int i = 1; i < n; ++i) if (is_lms(i, isS)) lms_pos.push_back(i); 70 71 vector<int> sa(n, -1); 72 induce(s, sigma, isS, lms_pos, sa); 73 74 // Extract LMS in SA order 75 vector<int> lms_in_sa; 76 lms_in_sa.reserve(lms_pos.size()); 77 for (int p : sa) if (p != -1 && is_lms(p, isS)) lms_in_sa.push_back(p); 78 79 // Name LMS substrings 80 vector<int> name(n, -1); 81 int cur_name = 0; 82 name[n - 1] = cur_name; // sentinel LMS-equivalent base 83 84 auto equal_lms = [&](int a, int b) { 85 if (a == n - 1 || b == n - 1) return a == b; // both sentinel 86 int i = 0; 87 while (true) { 88 bool a_lms = is_lms(a + i, isS); 89 bool b_lms = is_lms(b + i, isS); 90 if (s[a + i] != s[b + i] || a_lms != b_lms) return false; 91 if (a_lms && b_lms && i > 0) return true; // both hit next LMS at same time 92 ++i; 93 } 94 }; 95 96 // Assign names to LMS-substrings 97 name[lms_in_sa[0]] = cur_name; 98 for (int i = 1; i < (int)lms_in_sa.size(); ++i) { 99 if (!equal_lms(lms_in_sa[i - 1], lms_in_sa[i])) ++cur_name; 100 name[lms_in_sa[i]] = cur_name; 101 } 102 103 int m = (int)lms_pos.size(); 104 vector<int> s1(m); 105 // Map LMS positions to compact string s1 in original order of lms_pos 106 for (int i = 0; i < m; ++i) s1[i] = name[lms_pos[i]]; 107 108 vector<int> sa1; 109 if (cur_name + 1 == m) { 110 // Already unique: sa1 is inverse permutation of s1 111 sa1.assign(m, 0); 112 for (int i = 0; i < m; ++i) sa1[s1[i]] = i; 113 } else { 114 // Recurse on s1 with alphabet size cur_name+1 115 sa1 = SA_IS(s1, cur_name); 116 } 117 118 // Reconstruct LMS order using sa1 (which gives order of lms_pos) 119 vector<int> ordered_lms; 120 ordered_lms.reserve(m); 121 for (int i = 0; i < m; ++i) ordered_lms.push_back(lms_pos[sa1[i]]); 122 123 // Induce again from ordered LMS to get full SA 124 // Prepare empty SA and place ordered LMS at bucket tails 125 vector<int> bucket(sigma + 1, 0); 126 for (int x : s) bucket[x]++; 127 vector<int> head(sigma + 1, 0), tail(sigma + 1, 0); 128 for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1]; 129 for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1; 130 131 fill(sa.begin(), sa.end(), -1); 132 for (int i = (int)ordered_lms.size() - 1; i >= 0; --i) { 133 int p = ordered_lms[i]; 134 int b = s[p]; 135 sa[tail[b]--] = p; 136 } 137 138 // Final induction L then S 139 vector<bool> isS2 = isS; 140 // Induce L-type 141 vector<int> head2 = head; 142 for (int i = 0; i < n; ++i) { 143 int p = sa[i]; 144 if (p <= 0) continue; 145 int j = p - 1; 146 if (!isS2[j]) { 147 int b = s[j]; 148 sa[head2[b]++] = j; 149 } 150 } 151 // Induce S-type 152 vector<int> tail2 = tail; 153 for (int i = n - 1; i >= 0; --i) { 154 int p = sa[i]; 155 if (p <= 0) continue; 156 int j = p - 1; 157 if (isS2[j]) { 158 int b = s[j]; 159 sa[tail2[b]--] = j; 160 } 161 } 162 163 return sa; 164 } 165 166 // Helper: compress characters of a string to 1..sigma and append 0 sentinel. 167 static vector<int> to_int_alphabet_with_sentinel(const string& s) { 168 vector<unsigned char> v(s.begin(), s.end()); 169 vector<unsigned char> uniq = v; 170 sort(uniq.begin(), uniq.end()); uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end()); 171 vector<int> res; res.reserve(s.size() + 1); 172 // Map smallest char to 1 (since 0 is reserved for sentinel) 173 for (auto ch : v) { 174 int id = (int)(lower_bound(uniq.begin(), uniq.end(), ch) - uniq.begin()) + 1; 175 res.push_back(id); 176 } 177 res.push_back(0); // sentinel 178 return res; 179 } 180 181 int main() { 182 ios::sync_with_stdio(false); 183 cin.tie(nullptr); 184 185 string s = "banana"; // demo 186 auto a = to_int_alphabet_with_sentinel(s); 187 int sigma = 0; for (int x : a) sigma = max(sigma, x); 188 auto sa = SA_IS(a, sigma); 189 190 cout << "SA-IS Suffix Array (with sentinel at end):\n"; 191 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' '); 192 193 // Print sorted suffixes for verification 194 string t = s + string("\0", 1); // show sentinel as NUL; for display, we skip it 195 cout << "Sorted suffixes (displaying without NUL):\n"; 196 for (int i = 0; i < (int)sa.size(); ++i) { 197 int pos = sa[i]; 198 if (pos == (int)s.size()) cout << "$" << '\n'; 199 else cout << s.substr(pos) << '\n'; 200 } 201 return 0; 202 } 203
This is a complete SA-IS implementation for integer alphabets: it classifies S/L types, collects LMS positions, induces a partial order, names LMS substrings to form a reduced problem, recursively solves it if necessary, and then induces the full suffix array. A helper compresses a byte string to integers 1..Ď and appends sentinel 0 to satisfy SA-ISâs requirements. Each stage processes elements with linear work, delivering O(n) time in theory and practice.