⚙️AlgorithmAdvanced

Suffix Array Construction

Key Points

  • •
    A suffix array stores the starting indices of all suffixes of a string in lexicographic order, enabling fast substring queries and many string operations.
  • •
    The classic O(n n) doubling algorithm sorts suffixes by pairs of ranks (rank[i], rank[i+2^k]) using radix/counting sort and doubles the gap each round.
  • •
    Using comparison-based sorts inside doubling raises the time to O(n n), while radix/counting sort keeps it at O(n n).
  • •
    Kasai’s algorithm builds the LCP (Longest Common Prefix) array in O(n), which speeds up many computations like distinct substrings and suffix tree emulation.
  • •
    SA-IS constructs suffix arrays in O(n) using induced sorting and S/L-type classification of characters and LMS positions.
  • •
    DC3 (a.k.a. skew) is another O(n) algorithm, but SA-IS is often simpler in practice and has excellent performance on competitive programming inputs.
  • •
    For pattern matching, a suffix array supports O(m n) search for a pattern of length m via binary search, and can be improved with LCP/RMQ.
  • •
    Always append a unique sentinel (smallest character) at the end to make suffixes comparable and ensure correctness of algorithms.

Prerequisites

  • →Stable counting sort / radix sort — Radix sorting of rank pairs is the core technique that keeps the doubling algorithm at O(n log n).
  • →Binary search — Pattern search over the suffix array uses lower_bound/upper_bound style binary searches.
  • →Lexicographic string comparison — Understanding how suffixes are ordered is essential for correctness and comparisons.
  • →Big-O notation and logarithms — To analyze the O(n log n) vs O(n) trade-offs and iteration doubling behavior.
  • →Range Minimum Query (RMQ) / Sparse Table (optional) — RMQ over the LCP array enables O(1) LCP queries between arbitrary suffixes.
  • →Alphabet compression / coordinate compression — Ensures small integer alphabets to make counting sorts linear and cache friendly.

Detailed Explanation

Tap terms for definitions

01Overview

A suffix array is a compact, array-based index for a string that lists all suffixes in lexicographic (dictionary) order. Instead of storing a tree of all suffixes (like a suffix tree), it stores only the starting positions of suffixes in sorted order, which is memory efficient and cache friendly. Once built, we can perform binary search over this array to answer substring existence queries and to find occurrences of a pattern. It also serves as a foundation for computing the LCP (Longest Common Prefix) array, which summarizes the shared prefixes between neighboring suffixes in the suffix array. These two arrays together can simulate many suffix-tree operations and answer queries like the number of distinct substrings, repeated substrings, and lexicographic ranks. Construction can be done in multiple ways: the widely taught O(n \log n) doubling method; a slower O(n \log^{2} n) variant if you use comparison sorting; and linear-time approaches such as SA-IS and DC3. In contests and practical applications, O(n \log n) doubling with radix/counting sort is usually fast and straightforward to implement, while SA-IS offers top theoretical bounds and strong practical performance with careful coding.

02Intuition & Analogies

Imagine writing down every suffix of a word on separate cards: for banana, the cards are "banana", "anana", "nana", "ana", "na", and "a". If you sort these cards alphabetically, you get the order in which these suffixes would appear in a dictionary. The suffix array is simply the list of the original starting positions of these cards in that sorted dictionary—like noting where each dictionary entry came from in the original word. Why is this useful? Because any substring is a prefix of some suffix. So, if you can quickly find where a particular prefix would fit among the sorted suffixes, you can tell whether a pattern exists in the text and where. Binary searching among the sorted suffixes is like looking through an index at the back of a book—jump into the middle, compare the pattern with the suffix at that point, and narrow your range in log steps. The doubling algorithm’s intuition is to avoid comparing full strings by assigning each suffix a short ID (its rank) based on its first few characters, then repeatedly refining that ID by doubling the number of characters considered: 1, 2, 4, 8, and so on. Each round combines two smaller IDs to form a bigger one, much like building bigger LEGO pieces from smaller labeled blocks. Radix (counting) sort can sort these pairs of IDs in linear time per round, making the whole build efficient. SA-IS goes further: it classifies characters into S-type and L-type depending on whether the remainder of the string to the right is lexicographically larger or smaller, identifies special LMS positions, and cleverly “induces” the full sorted order from a subset—like organizing a library by first shelving a key subset, then carefully filling in the gaps using local rules.

03Formal Definition

Given a string S of length n over an ordered alphabet, append a unique sentinel character $ (with value smaller than any other character) to obtain S' of length n+1. The suffix array SA of S' is a permutation of indices 0..n such that S'[SA[0]..] < S'[SA[1]..] < ... < S'[SA[n]..] in lexicographic order. Define the rank array rk where rk[i] is the position of suffix i in SA, i.e., rk[SA[p]] = p. The LCP array LCP is defined by LCP[0] = 0 and, for p > 0, LCP[p] = lcp(S'[SA[p-1]..], S'[SA[p]..]), the length of the longest common prefix of consecutive suffixes in SA order. In the doubling method, one maintains an integer class (rank) c^{(k)}[i] representing the equivalence class of the length-2^k prefix of suffix i. Then c^{(k+1)} is obtained by stable sorting indices i by the pair (c^{(k)}[i], c^{(k)}[i+2^k]) and reassigning new classes. When classes become unique (i.e., max class = n), the order is final. SA-IS defines S-type and L-type positions: i is S-type if S'[i..] < S'[i+1..] or S'[i] = S'[i+1] and i+1 is S-type; otherwise i is L-type. LMS (Leftmost S-type) positions are S-type positions i such that i-1 is L-type. By induced sorting, placing LMS-substrings in correct order and then inducing L- and S-type neighbors yields the full SA in linear time.

04When to Use

Use suffix arrays when you need fast substring queries on a static string with low memory overhead: finding if a pattern appears, counting occurrences, or listing positions (O(m \log n) per query). They are excellent for problems involving lexicographic ranks of suffixes, finding repeated substrings, computing the number of distinct substrings, and constructing the Burrows–Wheeler Transform. In competitive programming, the O(n \log n) doubling algorithm is typically the go-to solution due to its simplicity and speed; it comfortably handles strings of length up to ~2e5–1e6 depending on time limits. If n is extremely large or you are implementing a library with theoretical guarantees, consider SA-IS or DC3 to achieve O(n) worst-case time. Build the LCP array if you need: frequency of repeats, longest repeated substring, or to simulate suffix-tree-like queries (often accelerated via RMQ on LCP). If the alphabet is large (e.g., full Unicode), compress it to 1..\sigma first to improve performance and ensure radix sorts are linear in n + \sigma. Finally, if your data updates frequently, suffix arrays are less suitable; consider online structures (suffix automaton for distinct substrings/repeats, or dynamic text indexes) depending on the task.

⚠️Common Mistakes

Common pitfalls include forgetting the sentinel or not making it strictly smaller than all characters, which breaks comparisons at string ends. Another frequent issue is mixing 0-based and 1-based indexing for ranks or SA, causing off-by-one bugs, especially when accessing rank[i+2^k] beyond the end (you must treat out-of-range as 0 or the sentinel’s class). Using comparison-based sort for pairs in the doubling algorithm leads to an unnecessary O(n \log^{2} n) time; prefer counting/radix sort with bounded ranks. On large alphabets, skipping alphabet compression can inflate memory and slow down counting sort; compress characters to a dense 1..\sigma range. In SA-IS, misclassifying S/L types (tie cases) or mishandling LMS detection leads to subtle errors; remember that S-type is defined with a right-to-left rule and equals tie inherits the next position’s type. During induced sorting, ensure stability and correct bucket boundaries (begin vs end) when placing L vs S neighbors. For pattern search, incorrectly handling binary search bounds or not comparing safely up to string end can cause false negatives/positives; always compare up to pattern length or suffix end. Finally, when building LCP with Kasai, forgetting to decrement the running height h each iteration or not using the rank array leads to O(n^2) behavior instead of O(n).

Key Formulas

Suffix Array Definition

Explanation: The suffix array is the ordering of indices 0..n that sorts all suffixes of the string with sentinel appended. argsort means the permutation that sorts the given list.

LCP Between Neighbors

Explanation: LCP[p] stores the length of the longest common prefix of two adjacent suffixes in the suffix array. LCP[0] is defined as 0 because there is no previous suffix.

Doubling Rank Update

Explanation: At iteration k, each suffix i is represented by a pair: the class of its first 2^k characters and the class of the next 2^k characters. Sorting these pairs yields new classes for 2^{k+1} prefixes.

Time Complexity of Doubling

Explanation: Each doubling iteration sorts n items; with radix/counting sort it is linear per round across O( n) rounds, while comparison sorts add an extra n factor.

Linear-Time Construction

Explanation: Both SA-IS and DC3 achieve linear worst-case time by carefully structuring recursive or induced sorting steps that touch each position a constant number of times.

Distinct Substrings via SA/LCP

Explanation: The number of new substrings contributed by suffix SA[i] equals its length minus the overlap with its predecessor measured by LCP[i]. Summing over all suffixes gives the total.

LCP as RMQ on LCP Array

Explanation: The LCP between any two suffixes with SA positions i and j is the minimum LCP value on the interval between them. This can be answered in O(1) with an RMQ structure.

Pattern Search Cost

Explanation: Binary searching the suffix array for a pattern of length m performs O( n) comparisons, each taking up to O(m) time by character-wise comparison.

Complexity Analysis

Let n be the length of the string after appending a sentinel (so original length is n-1). In the doubling algorithm, we maintain integer ranks in [0, n). At iteration k, we sort n pairs ([i], [i+2^k]) using stable radix/counting sort, which runs in O(n + ) time with = O(n) because ranks are bounded by n. The number of iterations is O( n) since prefix length doubles each round (1, 2, 4, ...). Therefore time is O(n n) and space is O(n) for SA, rank arrays, and temporary buffers. If a comparison sort like std::sort is used directly on string indices with a comparator that inspects O(1) pair ranks, each round costs O(n n), yielding O(n n) total. Kasai’s algorithm to compute LCP is O(n) time and O(n) space, using the rank array and a running height that decreases by at most 1 per step. For pattern search, binary search over SA does O( n) comparisons, each comparing up to m characters, giving O(m n) time and O(1) extra space per query; reporting k occurrences adds O(k). SA-IS achieves O(n) time by classifying S/L types, sorting only LMS substrings first, compressing them to a smaller problem, and inducing the full order; each stage processes each character a constant number of times. Its space is O(n + ) for buckets, types, and auxiliary arrays; implementations often use O(n) extra space. Practically, the O(n n) doubling with counting sort is highly optimized and fast, while SA-IS provides theoretical linear bounds and excellent performance with a slightly more complex implementation.

Code Examples

Suffix Array in O(n log n) using doubling with counting sort (cyclic shifts method)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Build suffix array using the classic doubling algorithm with counting sort.
5// Appends a sentinel '$' that is strictly smaller than all other characters.
6// Returns SA: positions 0..n-1 of the string (including sentinel) in lexicographic order of suffixes.
7
8vector<int> suffix_array(const string &s_in) {
9 string s = s_in;
10 if (s.empty() || s.back() != '$') s.push_back('$'); // ensure sentinel
11 int n = (int)s.size();
12
13 // Initial sorting by single characters using counting sort
14 const int ALPHA = 256; // byte alphabet upper bound
15 vector<int> p(n), c(n), cnt(max(ALPHA, n), 0);
16 for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++;
17 for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1];
18 for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i;
19
20 // Assign classes (ranks)
21 c[p[0]] = 0;
22 int classes = 1;
23 for (int i = 1; i < n; ++i) {
24 if (s[p[i]] != s[p[i - 1]]) ++classes;
25 c[p[i]] = classes - 1;
26 }
27
28 // Doubling: sort by 2^k length prefixes
29 vector<int> pn(n), cn(n);
30 for (int h = 0; (1 << h) < n; ++h) {
31 int len = 1 << h;
32 // Shift positions: first key is c[i], second key is c[(i+len)%n]
33 for (int i = 0; i < n; ++i) {
34 pn[i] = p[i] - len;
35 if (pn[i] < 0) pn[i] += n;
36 }
37 // Counting sort by class c
38 fill(cnt.begin(), cnt.begin() + classes, 0);
39 for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++;
40 for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1];
41 for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i];
42
43 // Recompute classes
44 cn[p[0]] = 0;
45 classes = 1;
46 for (int i = 1; i < n; ++i) {
47 pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]};
48 pair<int,int> prev = {c[p[i - 1]], c[(p[i - 1] + len) % n]};
49 if (cur != prev) ++classes;
50 cn[p[i]] = classes - 1;
51 }
52 c.swap(cn);
53 if (classes == n) break; // all ranks unique
54 }
55
56 return p; // p is the suffix array
57}
58
59int main() {
60 ios::sync_with_stdio(false);
61 cin.tie(nullptr);
62
63 string s = "banana"; // demo
64 auto sa = suffix_array(s);
65
66 cout << "String with sentinel: " << (s.back()=='$'?s:s+"$") << "\n";
67 cout << "Suffix Array indices:" << '\n';
68 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
69
70 cout << "Sorted suffixes:" << '\n';
71 string t = s; if (t.empty() || t.back()!='$') t.push_back('$');
72 for (int i = 0; i < (int)sa.size(); ++i) {
73 cout << setw(2) << i << ": " << t.substr(sa[i]) << '\n';
74 }
75 return 0;
76}
77

This program builds the suffix array using the doubling algorithm implemented as sorting cyclic shifts. It starts by sorting single characters with counting sort, assigns ranks (classes), and iteratively doubles the considered prefix length, using counting sort on the rank classes. The sentinel '$' guarantees that all suffixes are proper and comparable. The result is the array of starting indices in lexicographic order.

Time: O(n log n)Space: O(n)
Kasai’s LCP array and pattern search via binary search on the suffix array
1#include <bits/stdc++.h>
2using namespace std;
3
4vector<int> suffix_array(const string &s_in) {
5 string s = s_in;
6 if (s.empty() || s.back() != '$') s.push_back('$');
7 int n = (int)s.size();
8 const int ALPHA = 256;
9 vector<int> p(n), c(n), cnt(max(ALPHA, n), 0);
10 for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++;
11 for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1];
12 for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i;
13 c[p[0]] = 0; int classes = 1;
14 for (int i = 1; i < n; ++i) { if (s[p[i]] != s[p[i-1]]) ++classes; c[p[i]] = classes - 1; }
15 vector<int> pn(n), cn(n);
16 for (int h = 0; (1 << h) < n; ++h) {
17 int len = 1 << h;
18 for (int i = 0; i < n; ++i) { pn[i] = p[i] - len; if (pn[i] < 0) pn[i] += n; }
19 fill(cnt.begin(), cnt.begin() + classes, 0);
20 for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++;
21 for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1];
22 for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i];
23 cn[p[0]] = 0; classes = 1;
24 for (int i = 1; i < n; ++i) {
25 pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]};
26 pair<int,int> prev = {c[p[i-1]], c[(p[i-1] + len) % n]};
27 if (cur != prev) ++classes; cn[p[i]] = classes - 1;
28 }
29 c.swap(cn);
30 if (classes == n) break;
31 }
32 return p;
33}
34
35// Kasai's algorithm: build LCP between consecutive suffixes in SA order.
36vector<int> build_lcp(const string &s_in, const vector<int> &sa) {
37 string s = s_in;
38 if (s.empty() || s.back() != '$') s.push_back('$');
39 int n = (int)s.size();
40 vector<int> rank(n, 0), lcp(n, 0);
41 for (int i = 0; i < n; ++i) rank[sa[i]] = i;
42 int h = 0;
43 for (int i = 0; i < n; ++i) {
44 int r = rank[i];
45 if (r == 0) { h = 0; continue; }
46 int j = sa[r - 1];
47 // Extend match from previous h
48 while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
49 lcp[r] = h;
50 if (h > 0) --h; // Decrease by at most 1
51 }
52 return lcp;
53}
54
55// Compare pattern p with suffix s.substr(pos)
56int cmp_suffix_pattern(const string &s, int pos, const string &pat) {
57 int n = (int)s.size();
58 int m = (int)pat.size();
59 for (int i = 0; i < m && pos + i < n; ++i) {
60 if (s[pos + i] < pat[i]) return -1;
61 if (s[pos + i] > pat[i]) return 1;
62 }
63 if (pos + m <= n) return 0; // pattern fully matched within suffix
64 return -1; // suffix ended before pattern
65}
66
67// Binary search over SA to find the range of matches [L, R) for pattern
68pair<int,int> find_occurrences(const string &s_in, const vector<int> &sa, const string &pat) {
69 string s = s_in;
70 if (s.empty() || s.back() != '$') s.push_back('$');
71 int n = (int)sa.size();
72 int L = 0, R = n;
73 // lower_bound: first suffix >= pat
74 while (L < R) {
75 int mid = (L + R) >> 1;
76 int c = cmp_suffix_pattern(s, sa[mid], pat);
77 if (c >= 0) R = mid; else L = mid + 1;
78 }
79 int start = L;
80 L = 0; R = n;
81 // upper_bound: first suffix > pat with prefix pat
82 while (L < R) {
83 int mid = (L + R) >> 1;
84 int c = cmp_suffix_pattern(s, sa[mid], pat);
85 if (c > 0) R = mid; else L = mid + 1;
86 }
87 int end = L;
88 return {start, end};
89}
90
91int main() {
92 ios::sync_with_stdio(false);
93 cin.tie(nullptr);
94
95 string s = "mississippi";
96 auto sa = suffix_array(s);
97 auto lcp = build_lcp(s, sa);
98
99 cout << "SA indices:" << '\n';
100 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()? '\n' : ' ');
101 cout << "LCP array:" << '\n';
102 for (int i = 0; i < (int)lcp.size(); ++i) cout << lcp[i] << (i+1==(int)lcp.size()? '\n' : ' ');
103
104 vector<string> queries = {"issi", "miss", "ppi", "a"};
105 for (auto &q : queries) {
106 auto range = find_occurrences(s, sa, q);
107 cout << "Query '" << q << "' occurs at SA range [" << range.first << ", " << range.second << ")\n";
108 for (int i = range.first; i < range.second; ++i) {
109 cout << " position " << sa[i] << '\n';
110 }
111 }
112 return 0;
113}
114

This code extends the suffix array with Kasai’s O(n) LCP construction and supports substring search by binary searching over the suffix array. The comparator compares the pattern with a suffix character-by-character. The function find_occurrences returns the SA interval where the pattern would fit, which corresponds to all occurrences. The LCP array is computed using the rank array and a rolling match length to keep the overall cost linear.

Time: SA: O(n log n), LCP: O(n), query: O(m log n + occ)Space: O(n)
SA-IS (Induced Sorting) linear-time suffix array construction
1#include <bits/stdc++.h>
2using namespace std;
3
4// SA-IS algorithm: builds suffix array in O(n) for integer alphabet in [0, sigma].
5// Requirements:
6// - Input is a vector<int> s with values in [0, sigma], ending with a unique 0 sentinel.
7// - 0 must be the smallest symbol and occurs exactly once at the end.
8// Reference idea: classify S/L-types, identify LMS, induced sort LMS, reduce, recurse if needed, then induce full SA.
9
10static void induce(const vector<int>& s, int sigma, const vector<bool>& isS, const vector<int>& lms_pos,
11 vector<int>& sa) {
12 int n = (int)s.size();
13 vector<int> bucket(sigma + 1, 0);
14 for (int x : s) bucket[x]++;
15 vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
16 // head: start index of bucket, tail: last index of bucket
17 for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
18 for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
19
20 fill(sa.begin(), sa.end(), -1);
21 // Place LMS at bucket tails (stable, process in reverse order)
22 for (int i = (int)lms_pos.size() - 1; i >= 0; --i) {
23 int p = lms_pos[i];
24 int b = s[p];
25 sa[tail[b]--] = p;
26 }
27 // Induce L-type to bucket heads (scan left to right)
28 vector<int> head2 = head; // copy since we'll advance heads
29 for (int i = 0; i < n; ++i) {
30 int p = sa[i];
31 if (p <= 0) continue;
32 int j = p - 1;
33 if (!isS[j]) {
34 int b = s[j];
35 sa[head2[b]++] = j;
36 }
37 }
38 // Induce S-type to bucket tails (scan right to left)
39 vector<int> tail2 = tail; // copy since we'll decrement tails
40 for (int i = n - 1; i >= 0; --i) {
41 int p = sa[i];
42 if (p <= 0) continue;
43 int j = p - 1;
44 if (isS[j]) {
45 int b = s[j];
46 sa[tail2[b]--] = j;
47 }
48 }
49}
50
51static bool is_lms(int i, const vector<bool>& isS) {
52 return i > 0 && isS[i] && !isS[i - 1];
53}
54
55static vector<int> SA_IS(const vector<int>& s, int sigma) {
56 int n = (int)s.size();
57 vector<bool> isS(n, false);
58 // Classify S/L types. By convention, the sentinel at n-1 is S-type.
59 isS[n - 1] = true;
60 for (int i = n - 2; i >= 0; --i) {
61 if (s[i] < s[i + 1]) isS[i] = true;
62 else if (s[i] > s[i + 1]) isS[i] = false;
63 else isS[i] = isS[i + 1];
64 }
65
66 // Collect LMS positions in order
67 vector<int> lms_pos;
68 lms_pos.reserve(n);
69 for (int i = 1; i < n; ++i) if (is_lms(i, isS)) lms_pos.push_back(i);
70
71 vector<int> sa(n, -1);
72 induce(s, sigma, isS, lms_pos, sa);
73
74 // Extract LMS in SA order
75 vector<int> lms_in_sa;
76 lms_in_sa.reserve(lms_pos.size());
77 for (int p : sa) if (p != -1 && is_lms(p, isS)) lms_in_sa.push_back(p);
78
79 // Name LMS substrings
80 vector<int> name(n, -1);
81 int cur_name = 0;
82 name[n - 1] = cur_name; // sentinel LMS-equivalent base
83
84 auto equal_lms = [&](int a, int b) {
85 if (a == n - 1 || b == n - 1) return a == b; // both sentinel
86 int i = 0;
87 while (true) {
88 bool a_lms = is_lms(a + i, isS);
89 bool b_lms = is_lms(b + i, isS);
90 if (s[a + i] != s[b + i] || a_lms != b_lms) return false;
91 if (a_lms && b_lms && i > 0) return true; // both hit next LMS at same time
92 ++i;
93 }
94 };
95
96 // Assign names to LMS-substrings
97 name[lms_in_sa[0]] = cur_name;
98 for (int i = 1; i < (int)lms_in_sa.size(); ++i) {
99 if (!equal_lms(lms_in_sa[i - 1], lms_in_sa[i])) ++cur_name;
100 name[lms_in_sa[i]] = cur_name;
101 }
102
103 int m = (int)lms_pos.size();
104 vector<int> s1(m);
105 // Map LMS positions to compact string s1 in original order of lms_pos
106 for (int i = 0; i < m; ++i) s1[i] = name[lms_pos[i]];
107
108 vector<int> sa1;
109 if (cur_name + 1 == m) {
110 // Already unique: sa1 is inverse permutation of s1
111 sa1.assign(m, 0);
112 for (int i = 0; i < m; ++i) sa1[s1[i]] = i;
113 } else {
114 // Recurse on s1 with alphabet size cur_name+1
115 sa1 = SA_IS(s1, cur_name);
116 }
117
118 // Reconstruct LMS order using sa1 (which gives order of lms_pos)
119 vector<int> ordered_lms;
120 ordered_lms.reserve(m);
121 for (int i = 0; i < m; ++i) ordered_lms.push_back(lms_pos[sa1[i]]);
122
123 // Induce again from ordered LMS to get full SA
124 // Prepare empty SA and place ordered LMS at bucket tails
125 vector<int> bucket(sigma + 1, 0);
126 for (int x : s) bucket[x]++;
127 vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
128 for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
129 for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
130
131 fill(sa.begin(), sa.end(), -1);
132 for (int i = (int)ordered_lms.size() - 1; i >= 0; --i) {
133 int p = ordered_lms[i];
134 int b = s[p];
135 sa[tail[b]--] = p;
136 }
137
138 // Final induction L then S
139 vector<bool> isS2 = isS;
140 // Induce L-type
141 vector<int> head2 = head;
142 for (int i = 0; i < n; ++i) {
143 int p = sa[i];
144 if (p <= 0) continue;
145 int j = p - 1;
146 if (!isS2[j]) {
147 int b = s[j];
148 sa[head2[b]++] = j;
149 }
150 }
151 // Induce S-type
152 vector<int> tail2 = tail;
153 for (int i = n - 1; i >= 0; --i) {
154 int p = sa[i];
155 if (p <= 0) continue;
156 int j = p - 1;
157 if (isS2[j]) {
158 int b = s[j];
159 sa[tail2[b]--] = j;
160 }
161 }
162
163 return sa;
164}
165
166// Helper: compress characters of a string to 1..sigma and append 0 sentinel.
167static vector<int> to_int_alphabet_with_sentinel(const string& s) {
168 vector<unsigned char> v(s.begin(), s.end());
169 vector<unsigned char> uniq = v;
170 sort(uniq.begin(), uniq.end()); uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end());
171 vector<int> res; res.reserve(s.size() + 1);
172 // Map smallest char to 1 (since 0 is reserved for sentinel)
173 for (auto ch : v) {
174 int id = (int)(lower_bound(uniq.begin(), uniq.end(), ch) - uniq.begin()) + 1;
175 res.push_back(id);
176 }
177 res.push_back(0); // sentinel
178 return res;
179}
180
181int main() {
182 ios::sync_with_stdio(false);
183 cin.tie(nullptr);
184
185 string s = "banana"; // demo
186 auto a = to_int_alphabet_with_sentinel(s);
187 int sigma = 0; for (int x : a) sigma = max(sigma, x);
188 auto sa = SA_IS(a, sigma);
189
190 cout << "SA-IS Suffix Array (with sentinel at end):\n";
191 for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
192
193 // Print sorted suffixes for verification
194 string t = s + string("\0", 1); // show sentinel as NUL; for display, we skip it
195 cout << "Sorted suffixes (displaying without NUL):\n";
196 for (int i = 0; i < (int)sa.size(); ++i) {
197 int pos = sa[i];
198 if (pos == (int)s.size()) cout << "$" << '\n';
199 else cout << s.substr(pos) << '\n';
200 }
201 return 0;
202}
203

This is a complete SA-IS implementation for integer alphabets: it classifies S/L types, collects LMS positions, induces a partial order, names LMS substrings to form a reduced problem, recursively solves it if necessary, and then induces the full suffix array. A helper compresses a byte string to integers 1..σ and appends sentinel 0 to satisfy SA-IS’s requirements. Each stage processes elements with linear work, delivering O(n) time in theory and practice.

Time: O(n)Space: O(n + \sigma)