⚙️AlgorithmAdvanced

Suffix Array Construction

Key Points

•
A suffix array stores the starting indices of all suffixes of a string in lexicographic order, enabling fast substring queries and many string operations.
•
The classic O(n $lo g$ n) doubling algorithm sorts suffixes by pairs of ranks (rank[i], rank[i+2^k]) using radix/counting sort and doubles the gap each round.
•
Using comparison-based sorts inside doubling raises the time to O(n $lo g^{2}$ n), while radix/counting sort keeps it at O(n $lo g$ n).
•
Kasai’s algorithm builds the LCP (Longest Common Prefix) array in O(n), which speeds up many computations like distinct substrings and suffix tree emulation.
•
SA-IS constructs suffix arrays in O(n) using induced sorting and S/L-type classification of characters and LMS positions.
•
DC3 (a.k.a. skew) is another O(n) algorithm, but SA-IS is often simpler in practice and has excellent performance on competitive programming inputs.
•
For pattern matching, a suffix array supports O(m $lo g$ n) search for a pattern of length m via binary search, and can be improved with LCP/RMQ.
•
Always append a unique sentinel (smallest character) at the end to make suffixes comparable and ensure correctness of algorithms.

Prerequisites

→Stable counting sort / radix sort — Radix sorting of rank pairs is the core technique that keeps the doubling algorithm at O(n log n).
→Binary search — Pattern search over the suffix array uses lower_bound/upper_bound style binary searches.
→Lexicographic string comparison — Understanding how suffixes are ordered is essential for correctness and comparisons.
→Big-O notation and logarithms — To analyze the O(n log n) vs O(n) trade-offs and iteration doubling behavior.
→Range Minimum Query (RMQ) / Sparse Table (optional) — RMQ over the LCP array enables O(1) LCP queries between arbitrary suffixes.
→Alphabet compression / coordinate compression — Ensures small integer alphabets to make counting sorts linear and cache friendly.

Detailed Explanation

Tap terms for definitions

01Overview

A suffix array is a compact, array-based index for a string that lists all suffixes in lexicographic (dictionary) order. Instead of storing a tree of all suffixes (like a suffix tree), it stores only the starting positions of suffixes in sorted order, which is memory efficient and cache friendly. Once built, we can perform binary search over this array to answer substring existence queries and to find occurrences of a pattern. It also serves as a foundation for computing the LCP (Longest Common Prefix) array, which summarizes the shared prefixes between neighboring suffixes in the suffix array. These two arrays together can simulate many suffix-tree operations and answer queries like the number of distinct substrings, repeated substrings, and lexicographic ranks. Construction can be done in multiple ways: the widely taught O(n \log n) doubling method; a slower O(n \log^{2} n) variant if you use comparison sorting; and linear-time approaches such as SA-IS and DC3. In contests and practical applications, O(n \log n) doubling with radix/counting sort is usually fast and straightforward to implement, while SA-IS offers top theoretical bounds and strong practical performance with careful coding.

02Intuition & Analogies

Imagine writing down every suffix of a word on separate cards: for banana, the cards are "banana", "anana", "nana", "ana", "na", and "a". If you sort these cards alphabetically, you get the order in which these suffixes would appear in a dictionary. The suffix array is simply the list of the original starting positions of these cards in that sorted dictionary—like noting where each dictionary entry came from in the original word. Why is this useful? Because any substring is a prefix of some suffix. So, if you can quickly find where a particular prefix would fit among the sorted suffixes, you can tell whether a pattern exists in the text and where. Binary searching among the sorted suffixes is like looking through an index at the back of a book—jump into the middle, compare the pattern with the suffix at that point, and narrow your range in log steps. The doubling algorithm’s intuition is to avoid comparing full strings by assigning each suffix a short ID (its rank) based on its first few characters, then repeatedly refining that ID by doubling the number of characters considered: 1, 2, 4, 8, and so on. Each round combines two smaller IDs to form a bigger one, much like building bigger LEGO pieces from smaller labeled blocks. Radix (counting) sort can sort these pairs of IDs in linear time per round, making the whole build efficient. SA-IS goes further: it classifies characters into S-type and L-type depending on whether the remainder of the string to the right is lexicographically larger or smaller, identifies special LMS positions, and cleverly “induces” the full sorted order from a subset—like organizing a library by first shelving a key subset, then carefully filling in the gaps using local rules.

03Formal Definition

Given a string S of length n over an ordered alphabet, append a unique sentinel character $ (with value smaller than any other character) to obtain S' of length n+1. The suffix array SA of S' is a permutation of indices 0..n such that S'[SA[0]..] < S'[SA[1]..] < ... < S'[SA[n]..] in lexicographic order. Define the rank array rk where rk[i] is the position of suffix i in SA, i.e., rk[SA[p]] = p. The LCP array LCP is defined by LCP[0] = 0 and, for p > 0, LCP[p] = lcp(S'[SA[p-1]..], S'[SA[p]..]), the length of the longest common prefix of consecutive suffixes in SA order. In the doubling method, one maintains an integer class (rank) c^{(k)}[i] representing the equivalence class of the length-2^k prefix of suffix i. Then c^{(k+1)} is obtained by stable sorting indices i by the pair (c^{(k)}[i], c^{(k)}[i+2^k]) and reassigning new classes. When classes become unique (i.e., max class = n), the order is final. SA-IS defines S-type and L-type positions: i is S-type if S'[i..] < S'[i+1..] or S'[i] = S'[i+1] and i+1 is S-type; otherwise i is L-type. LMS (Leftmost S-type) positions are S-type positions i such that i-1 is L-type. By induced sorting, placing LMS-substrings in correct order and then inducing L- and S-type neighbors yields the full SA in linear time.

04When to Use

Use suffix arrays when you need fast substring queries on a static string with low memory overhead: finding if a pattern appears, counting occurrences, or listing positions (O(m \log n) per query). They are excellent for problems involving lexicographic ranks of suffixes, finding repeated substrings, computing the number of distinct substrings, and constructing the Burrows–Wheeler Transform. In competitive programming, the O(n \log n) doubling algorithm is typically the go-to solution due to its simplicity and speed; it comfortably handles strings of length up to ~2e5–1e6 depending on time limits. If n is extremely large or you are implementing a library with theoretical guarantees, consider SA-IS or DC3 to achieve O(n) worst-case time. Build the LCP array if you need: frequency of repeats, longest repeated substring, or to simulate suffix-tree-like queries (often accelerated via RMQ on LCP). If the alphabet is large (e.g., full Unicode), compress it to 1..\sigma first to improve performance and ensure radix sorts are linear in n + \sigma. Finally, if your data updates frequently, suffix arrays are less suitable; consider online structures (suffix automaton for distinct substrings/repeats, or dynamic text indexes) depending on the task.

⚠️Common Mistakes

Common pitfalls include forgetting the sentinel or not making it strictly smaller than all characters, which breaks comparisons at string ends. Another frequent issue is mixing 0-based and 1-based indexing for ranks or SA, causing off-by-one bugs, especially when accessing rank[i+2^k] beyond the end (you must treat out-of-range as 0 or the sentinel’s class). Using comparison-based sort for pairs in the doubling algorithm leads to an unnecessary O(n \log^{2} n) time; prefer counting/radix sort with bounded ranks. On large alphabets, skipping alphabet compression can inflate memory and slow down counting sort; compress characters to a dense 1..\sigma range. In SA-IS, misclassifying S/L types (tie cases) or mishandling LMS detection leads to subtle errors; remember that S-type is defined with a right-to-left rule and equals tie inherits the next position’s type. During induced sorting, ensure stability and correct bucket boundaries (begin vs end) when placing L vs S neighbors. For pattern search, incorrectly handling binary search bounds or not comparing safely up to string end can cause false negatives/positives; always compare up to pattern length or suffix end. Finally, when building LCP with Kasai, forgetting to decrement the running height h each iteration or not using the rank array leads to O(n^2) behavior instead of O(n).

Key Formulas

Suffix Array Definition

S A = i \in [0, n] argsort S^{'} [i ..]

Explanation: The suffix array is the ordering of indices 0..n that sorts all suffixes of the string with sentinel appended. argsort means the permutation that sorts the given list.

LCP Between Neighbors

LCP [p] = lcp (S^{'} [S A [p - 1] ..], S^{'} [S A [p] ..]), LCP [0] = 0

Explanation: LCP[p] stores the length of the longest common prefix of two adjacent suffixes in the suffix array. LCP[0] is defined as 0 because there is no previous suffix.

Doubling Rank Update

c^{(k + 1)} [i] = rank of (c^{(k)} [i], c^{(k)} [i + 2^{k}])

Explanation: At iteration k, each suffix i is represented by a pair: the class of its first 2^k characters and the class of the next 2^k characters. Sorting these pairs yields new classes for 2^{k+1} prefixes.

Time Complexity of Doubling

T (n) = O (n lo g n) (doubling with radix), T (n) = O (n lo g^{2} n) (doubling with comparison sort)

Explanation: Each doubling iteration sorts n items; with radix/counting sort it is linear per round across O( $lo g$ n) rounds, while comparison sorts add an extra $lo g$ n factor.

Linear-Time Construction

T (n) = O (n) (SA-IS, DC3)

Explanation: Both SA-IS and DC3 achieve linear worst-case time by carefully structuring recursive or induced sorting steps that touch each position a constant number of times.

Distinct Substrings via SA/LCP

# distinct substrings = i = 0 \sum n ((n - S A [i]) - LCP [i])

Explanation: The number of new substrings contributed by suffix SA[i] equals its length minus the overlap with its predecessor measured by LCP[i]. Summing over all suffixes gives the total.

LCP as RMQ on LCP Array

LCP (i, j) = p = m i n (i, j) + 1 min m a x (i, j) LCP [p]

Explanation: The LCP between any two suffixes with SA positions i and j is the minimum LCP value on the interval between them. This can be answered in O(1) with an RMQ structure.

Pattern Search Cost

O (m lo g n) comparisons for pattern search

Explanation: Binary searching the suffix array for a pattern of length m performs O( $lo g$ n) comparisons, each taking up to O(m) time by character-wise comparison.

Complexity Analysis

Let n be the length of the string after appending a sentinel (so original length is n-1). In the doubling algorithm, we maintain integer ranks in [0, n). At iteration k, we sort n pairs (

c^{(k)}

[i],

c^{(k)}

[i+2^k]) using stable radix/counting sort, which runs in O(n +

σ

) time with

σ

= O(n) because ranks are bounded by n. The number of iterations is O(

lo g

n) since prefix length doubles each round (1, 2, 4, ...). Therefore time is O(n

lo g

n) and space is O(n) for SA, rank arrays, and temporary buffers. If a comparison sort like std::sort is used directly on string indices with a comparator that inspects O(1) pair ranks, each round costs O(n

lo g

n), yielding O(n

lo g^{2}

n) total. Kasai’s algorithm to compute LCP is O(n) time and O(n) space, using the rank array and a running height that decreases by at most 1 per step. For pattern search, binary search over SA does O(

lo g

n) comparisons, each comparing up to m characters, giving O(m

lo g

n) time and O(1) extra space per query; reporting k occurrences adds O(k). SA-IS achieves O(n) time by classifying S/L types, sorting only LMS substrings first, compressing them to a smaller problem, and inducing the full order; each stage processes each character a constant number of times. Its space is O(n +

σ

) for buckets, types, and auxiliary arrays; implementations often use O(n) extra space. Practically, the O(n

lo g

n) doubling with counting sort is highly optimized and fast, while SA-IS provides theoretical linear bounds and excellent performance with a slightly more complex implementation.

Code Examples

Suffix Array in O(n log n) using doubling with counting sort (cyclic shifts method)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Build suffix array using the classic doubling algorithm with counting sort.
5 // Appends a sentinel '$' that is strictly smaller than all other characters.
6 // Returns SA: positions 0..n-1 of the string (including sentinel) in lexicographic order of suffixes.
7 
8 vector<int> suffix_array(const string &s_in) {
9     string s = s_in;
10     if (s.empty() || s.back() != '$') s.push_back('$'); // ensure sentinel
11     int n = (int)s.size();
12 
13     // Initial sorting by single characters using counting sort
14     const int ALPHA = 256; // byte alphabet upper bound
15     vector<int> p(n), c(n), cnt(max(ALPHA, n), 0);
16     for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++;
17     for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1];
18     for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i;
19 
20     // Assign classes (ranks)
21     c[p[0]] = 0;
22     int classes = 1;
23     for (int i = 1; i < n; ++i) {
24         if (s[p[i]] != s[p[i - 1]]) ++classes;
25         c[p[i]] = classes - 1;
26     }
27 
28     // Doubling: sort by 2^k length prefixes
29     vector<int> pn(n), cn(n);
30     for (int h = 0; (1 << h) < n; ++h) {
31         int len = 1 << h;
32         // Shift positions: first key is c[i], second key is c[(i+len)%n]
33         for (int i = 0; i < n; ++i) {
34             pn[i] = p[i] - len;
35             if (pn[i] < 0) pn[i] += n;
36         }
37         // Counting sort by class c
38         fill(cnt.begin(), cnt.begin() + classes, 0);
39         for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++;
40         for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1];
41         for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i];
42 
43         // Recompute classes
44         cn[p[0]] = 0;
45         classes = 1;
46         for (int i = 1; i < n; ++i) {
47             pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]};
48             pair<int,int> prev = {c[p[i - 1]], c[(p[i - 1] + len) % n]};
49             if (cur != prev) ++classes;
50             cn[p[i]] = classes - 1;
51         }
52         c.swap(cn);
53         if (classes == n) break; // all ranks unique
54     }
55 
56     return p; // p is the suffix array
57 }
58 
59 int main() {
60     ios::sync_with_stdio(false);
61     cin.tie(nullptr);
62 
63     string s = "banana"; // demo
64     auto sa = suffix_array(s);
65 
66     cout << "String with sentinel: " << (s.back()=='$'?s:s+"$") << "\n";
67     cout << "Suffix Array indices:" << '\n';
68     for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
69 
70     cout << "Sorted suffixes:" << '\n';
71     string t = s; if (t.empty() || t.back()!='$') t.push_back('$');
72     for (int i = 0; i < (int)sa.size(); ++i) {
73         cout << setw(2) << i << ": " << t.substr(sa[i]) << '\n';
74     }
75     return 0;
76 }
77

This program builds the suffix array using the doubling algorithm implemented as sorting cyclic shifts. It starts by sorting single characters with counting sort, assigns ranks (classes), and iteratively doubles the considered prefix length, using counting sort on the rank classes. The sentinel '$' guarantees that all suffixes are proper and comparable. The result is the array of starting indices in lexicographic order.

Time: O(n log n)Space: O(n)

Kasai’s LCP array and pattern search via binary search on the suffix array

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 vector<int> suffix_array(const string &s_in) {
5     string s = s_in;
6     if (s.empty() || s.back() != '$') s.push_back('$');
7     int n = (int)s.size();
8     const int ALPHA = 256;
9     vector<int> p(n), c(n), cnt(max(ALPHA, n), 0);
10     for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++;
11     for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1];
12     for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i;
13     c[p[0]] = 0; int classes = 1;
14     for (int i = 1; i < n; ++i) { if (s[p[i]] != s[p[i-1]]) ++classes; c[p[i]] = classes - 1; }
15     vector<int> pn(n), cn(n);
16     for (int h = 0; (1 << h) < n; ++h) {
17         int len = 1 << h;
18         for (int i = 0; i < n; ++i) { pn[i] = p[i] - len; if (pn[i] < 0) pn[i] += n; }
19         fill(cnt.begin(), cnt.begin() + classes, 0);
20         for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++;
21         for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1];
22         for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i];
23         cn[p[0]] = 0; classes = 1;
24         for (int i = 1; i < n; ++i) {
25             pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]};
26             pair<int,int> prev = {c[p[i-1]], c[(p[i-1] + len) % n]};
27             if (cur != prev) ++classes; cn[p[i]] = classes - 1;
28         }
29         c.swap(cn);
30         if (classes == n) break;
31     }
32     return p;
33 }
34 
35 // Kasai's algorithm: build LCP between consecutive suffixes in SA order.
36 vector<int> build_lcp(const string &s_in, const vector<int> &sa) {
37     string s = s_in;
38     if (s.empty() || s.back() != '$') s.push_back('$');
39     int n = (int)s.size();
40     vector<int> rank(n, 0), lcp(n, 0);
41     for (int i = 0; i < n; ++i) rank[sa[i]] = i;
42     int h = 0;
43     for (int i = 0; i < n; ++i) {
44         int r = rank[i];
45         if (r == 0) { h = 0; continue; }
46         int j = sa[r - 1];
47         // Extend match from previous h
48         while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
49         lcp[r] = h;
50         if (h > 0) --h; // Decrease by at most 1
51     }
52     return lcp;
53 }
54 
55 // Compare pattern p with suffix s.substr(pos)
56 int cmp_suffix_pattern(const string &s, int pos, const string &pat) {
57     int n = (int)s.size();
58     int m = (int)pat.size();
59     for (int i = 0; i < m && pos + i < n; ++i) {
60         if (s[pos + i] < pat[i]) return -1;
61         if (s[pos + i] > pat[i]) return 1;
62     }
63     if (pos + m <= n) return 0; // pattern fully matched within suffix
64     return -1; // suffix ended before pattern
65 }
66 
67 // Binary search over SA to find the range of matches [L, R) for pattern
68 pair<int,int> find_occurrences(const string &s_in, const vector<int> &sa, const string &pat) {
69     string s = s_in;
70     if (s.empty() || s.back() != '$') s.push_back('$');
71     int n = (int)sa.size();
72     int L = 0, R = n;
73     // lower_bound: first suffix >= pat
74     while (L < R) {
75         int mid = (L + R) >> 1;
76         int c = cmp_suffix_pattern(s, sa[mid], pat);
77         if (c >= 0) R = mid; else L = mid + 1;
78     }
79     int start = L;
80     L = 0; R = n;
81     // upper_bound: first suffix > pat with prefix pat
82     while (L < R) {
83         int mid = (L + R) >> 1;
84         int c = cmp_suffix_pattern(s, sa[mid], pat);
85         if (c > 0) R = mid; else L = mid + 1;
86     }
87     int end = L;
88     return {start, end};
89 }
90 
91 int main() {
92     ios::sync_with_stdio(false);
93     cin.tie(nullptr);
94 
95     string s = "mississippi";
96     auto sa = suffix_array(s);
97     auto lcp = build_lcp(s, sa);
98 
99     cout << "SA indices:" << '\n';
100     for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()? '\n' : ' ');
101     cout << "LCP array:" << '\n';
102     for (int i = 0; i < (int)lcp.size(); ++i) cout << lcp[i] << (i+1==(int)lcp.size()? '\n' : ' ');
103 
104     vector<string> queries = {"issi", "miss", "ppi", "a"};
105     for (auto &q : queries) {
106         auto range = find_occurrences(s, sa, q);
107         cout << "Query '" << q << "' occurs at SA range [" << range.first << ", " << range.second << ")\n";
108         for (int i = range.first; i < range.second; ++i) {
109             cout << " position " << sa[i] << '\n';
110         }
111     }
112     return 0;
113 }
114

This code extends the suffix array with Kasai’s O(n) LCP construction and supports substring search by binary searching over the suffix array. The comparator compares the pattern with a suffix character-by-character. The function find_occurrences returns the SA interval where the pattern would fit, which corresponds to all occurrences. The LCP array is computed using the rank array and a rolling match length to keep the overall cost linear.

Time: SA: O(n log n), LCP: O(n), query: O(m log n + occ)Space: O(n)

SA-IS (Induced Sorting) linear-time suffix array construction

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // SA-IS algorithm: builds suffix array in O(n) for integer alphabet in [0, sigma].
5 // Requirements:
6 // - Input is a vector<int> s with values in [0, sigma], ending with a unique 0 sentinel.
7 // - 0 must be the smallest symbol and occurs exactly once at the end.
8 // Reference idea: classify S/L-types, identify LMS, induced sort LMS, reduce, recurse if needed, then induce full SA.
9 
10 static void induce(const vector<int>& s, int sigma, const vector<bool>& isS, const vector<int>& lms_pos,
11                    vector<int>& sa) {
12     int n = (int)s.size();
13     vector<int> bucket(sigma + 1, 0);
14     for (int x : s) bucket[x]++;
15     vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
16     // head: start index of bucket, tail: last index of bucket
17     for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
18     for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
19 
20     fill(sa.begin(), sa.end(), -1);
21     // Place LMS at bucket tails (stable, process in reverse order)
22     for (int i = (int)lms_pos.size() - 1; i >= 0; --i) {
23         int p = lms_pos[i];
24         int b = s[p];
25         sa[tail[b]--] = p;
26     }
27     // Induce L-type to bucket heads (scan left to right)
28     vector<int> head2 = head; // copy since we'll advance heads
29     for (int i = 0; i < n; ++i) {
30         int p = sa[i];
31         if (p <= 0) continue;
32         int j = p - 1;
33         if (!isS[j]) {
34             int b = s[j];
35             sa[head2[b]++] = j;
36         }
37     }
38     // Induce S-type to bucket tails (scan right to left)
39     vector<int> tail2 = tail; // copy since we'll decrement tails
40     for (int i = n - 1; i >= 0; --i) {
41         int p = sa[i];
42         if (p <= 0) continue;
43         int j = p - 1;
44         if (isS[j]) {
45             int b = s[j];
46             sa[tail2[b]--] = j;
47         }
48     }
49 }
50 
51 static bool is_lms(int i, const vector<bool>& isS) {
52     return i > 0 && isS[i] && !isS[i - 1];
53 }
54 
55 static vector<int> SA_IS(const vector<int>& s, int sigma) {
56     int n = (int)s.size();
57     vector<bool> isS(n, false);
58     // Classify S/L types. By convention, the sentinel at n-1 is S-type.
59     isS[n - 1] = true;
60     for (int i = n - 2; i >= 0; --i) {
61         if (s[i] < s[i + 1]) isS[i] = true;
62         else if (s[i] > s[i + 1]) isS[i] = false;
63         else isS[i] = isS[i + 1];
64     }
65 
66     // Collect LMS positions in order
67     vector<int> lms_pos;
68     lms_pos.reserve(n);
69     for (int i = 1; i < n; ++i) if (is_lms(i, isS)) lms_pos.push_back(i);
70 
71     vector<int> sa(n, -1);
72     induce(s, sigma, isS, lms_pos, sa);
73 
74     // Extract LMS in SA order
75     vector<int> lms_in_sa;
76     lms_in_sa.reserve(lms_pos.size());
77     for (int p : sa) if (p != -1 && is_lms(p, isS)) lms_in_sa.push_back(p);
78 
79     // Name LMS substrings
80     vector<int> name(n, -1);
81     int cur_name = 0;
82     name[n - 1] = cur_name; // sentinel LMS-equivalent base
83 
84     auto equal_lms = [&](int a, int b) {
85         if (a == n - 1 || b == n - 1) return a == b; // both sentinel
86         int i = 0;
87         while (true) {
88             bool a_lms = is_lms(a + i, isS);
89             bool b_lms = is_lms(b + i, isS);
90             if (s[a + i] != s[b + i] || a_lms != b_lms) return false;
91             if (a_lms && b_lms && i > 0) return true; // both hit next LMS at same time
92             ++i;
93         }
94     };
95 
96     // Assign names to LMS-substrings
97     name[lms_in_sa[0]] = cur_name;
98     for (int i = 1; i < (int)lms_in_sa.size(); ++i) {
99         if (!equal_lms(lms_in_sa[i - 1], lms_in_sa[i])) ++cur_name;
100         name[lms_in_sa[i]] = cur_name;
101     }
102 
103     int m = (int)lms_pos.size();
104     vector<int> s1(m);
105     // Map LMS positions to compact string s1 in original order of lms_pos
106     for (int i = 0; i < m; ++i) s1[i] = name[lms_pos[i]];
107 
108     vector<int> sa1;
109     if (cur_name + 1 == m) {
110         // Already unique: sa1 is inverse permutation of s1
111         sa1.assign(m, 0);
112         for (int i = 0; i < m; ++i) sa1[s1[i]] = i;
113     } else {
114         // Recurse on s1 with alphabet size cur_name+1
115         sa1 = SA_IS(s1, cur_name);
116     }
117 
118     // Reconstruct LMS order using sa1 (which gives order of lms_pos)
119     vector<int> ordered_lms;
120     ordered_lms.reserve(m);
121     for (int i = 0; i < m; ++i) ordered_lms.push_back(lms_pos[sa1[i]]);
122 
123     // Induce again from ordered LMS to get full SA
124     // Prepare empty SA and place ordered LMS at bucket tails
125     vector<int> bucket(sigma + 1, 0);
126     for (int x : s) bucket[x]++;
127     vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
128     for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
129     for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
130 
131     fill(sa.begin(), sa.end(), -1);
132     for (int i = (int)ordered_lms.size() - 1; i >= 0; --i) {
133         int p = ordered_lms[i];
134         int b = s[p];
135         sa[tail[b]--] = p;
136     }
137 
138     // Final induction L then S
139     vector<bool> isS2 = isS;
140     // Induce L-type
141     vector<int> head2 = head;
142     for (int i = 0; i < n; ++i) {
143         int p = sa[i];
144         if (p <= 0) continue;
145         int j = p - 1;
146         if (!isS2[j]) {
147             int b = s[j];
148             sa[head2[b]++] = j;
149         }
150     }
151     // Induce S-type
152     vector<int> tail2 = tail;
153     for (int i = n - 1; i >= 0; --i) {
154         int p = sa[i];
155         if (p <= 0) continue;
156         int j = p - 1;
157         if (isS2[j]) {
158             int b = s[j];
159             sa[tail2[b]--] = j;
160         }
161     }
162 
163     return sa;
164 }
165 
166 // Helper: compress characters of a string to 1..sigma and append 0 sentinel.
167 static vector<int> to_int_alphabet_with_sentinel(const string& s) {
168     vector<unsigned char> v(s.begin(), s.end());
169     vector<unsigned char> uniq = v;
170     sort(uniq.begin(), uniq.end()); uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end());
171     vector<int> res; res.reserve(s.size() + 1);
172     // Map smallest char to 1 (since 0 is reserved for sentinel)
173     for (auto ch : v) {
174         int id = (int)(lower_bound(uniq.begin(), uniq.end(), ch) - uniq.begin()) + 1;
175         res.push_back(id);
176     }
177     res.push_back(0); // sentinel
178     return res;
179 }
180 
181 int main() {
182     ios::sync_with_stdio(false);
183     cin.tie(nullptr);
184 
185     string s = "banana"; // demo
186     auto a = to_int_alphabet_with_sentinel(s);
187     int sigma = 0; for (int x : a) sigma = max(sigma, x);
188     auto sa = SA_IS(a, sigma);
189 
190     cout << "SA-IS Suffix Array (with sentinel at end):\n";
191     for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
192 
193     // Print sorted suffixes for verification
194     string t = s + string("\0", 1); // show sentinel as NUL; for display, we skip it
195     cout << "Sorted suffixes (displaying without NUL):\n";
196     for (int i = 0; i < (int)sa.size(); ++i) {
197         int pos = sa[i];
198         if (pos == (int)s.size()) cout << "$" << '\n';
199         else cout << s.substr(pos) << '\n';
200     }
201     return 0;
202 }
203

This is a complete SA-IS implementation for integer alphabets: it classifies S/L types, collects LMS positions, induces a partial order, names LMS substrings to form a reduced problem, recursively solves it if necessary, and then induces the full suffix array. A helper compresses a byte string to integers 1..σ and appends sentinel 0 to satisfy SA-IS’s requirements. Each stage processes elements with linear work, delivering O(n) time in theory and practice.

Time: O(n)Space: O(n + \sigma)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Build suffix array using the classic doubling algorithm with counting sort.
5	// Appends a sentinel '$' that is strictly smaller than all other characters.
6	// Returns SA: positions 0..n-1 of the string (including sentinel) in lexicographic order of suffixes.
7
8	vector<int> suffix_array(const string &s_in) {
9	string s = s_in;
10	if (s.empty() \|\| s.back() != '$') s.push_back('$'); // ensure sentinel
11	int n = (int)s.size();
12
13	// Initial sorting by single characters using counting sort
14	const int ALPHA = 256; // byte alphabet upper bound
15	vector<int> p(n), c(n), cnt(max(ALPHA, n), 0);
16	for (int i = 0; i < n; ++i) cnt[(unsigned char)s[i]]++;
17	for (int i = 1; i < (int)cnt.size(); ++i) cnt[i] += cnt[i - 1];
18	for (int i = 0; i < n; ++i) p[--cnt[(unsigned char)s[i]]] = i;
19
20	// Assign classes (ranks)
21	c[p[0]] = 0;
22	int classes = 1;
23	for (int i = 1; i < n; ++i) {
24	if (s[p[i]] != s[p[i - 1]]) ++classes;
25	c[p[i]] = classes - 1;
26	}
27
28	// Doubling: sort by 2^k length prefixes
29	vector<int> pn(n), cn(n);
30	for (int h = 0; (1 << h) < n; ++h) {
31	int len = 1 << h;
32	// Shift positions: first key is c[i], second key is c[(i+len)%n]
33	for (int i = 0; i < n; ++i) {
34	pn[i] = p[i] - len;
35	if (pn[i] < 0) pn[i] += n;
36	}
37	// Counting sort by class c
38	fill(cnt.begin(), cnt.begin() + classes, 0);
39	for (int i = 0; i < n; ++i) cnt[c[pn[i]]]++;
40	for (int i = 1; i < classes; ++i) cnt[i] += cnt[i - 1];
41	for (int i = n - 1; i >= 0; --i) p[--cnt[c[pn[i]]]] = pn[i];
42
43	// Recompute classes
44	cn[p[0]] = 0;
45	classes = 1;
46	for (int i = 1; i < n; ++i) {
47	pair<int,int> cur = {c[p[i]], c[(p[i] + len) % n]};
48	pair<int,int> prev = {c[p[i - 1]], c[(p[i - 1] + len) % n]};
49	if (cur != prev) ++classes;
50	cn[p[i]] = classes - 1;
51	}
52	c.swap(cn);
53	if (classes == n) break; // all ranks unique
54	}
55
56	return p; // p is the suffix array
57	}
58
59	int main() {
60	ios::sync_with_stdio(false);
61	cin.tie(nullptr);
62
63	string s = "banana"; // demo
64	auto sa = suffix_array(s);
65
66	cout << "String with sentinel: " << (s.back()=='$'?s:s+"$") << "\n";
67	cout << "Suffix Array indices:" << '\n';
68	for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
69
70	cout << "Sorted suffixes:" << '\n';
71	string t = s; if (t.empty() \|\| t.back()!='$') t.push_back('$');
72	for (int i = 0; i < (int)sa.size(); ++i) {
73	cout << setw(2) << i << ": " << t.substr(sa[i]) << '\n';
74	}
75	return 0;
76	}
77

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// SA-IS algorithm: builds suffix array in O(n) for integer alphabet in [0, sigma].
5	// Requirements:
6	// - Input is a vector<int> s with values in [0, sigma], ending with a unique 0 sentinel.
7	// - 0 must be the smallest symbol and occurs exactly once at the end.
8	// Reference idea: classify S/L-types, identify LMS, induced sort LMS, reduce, recurse if needed, then induce full SA.
9
10	static void induce(const vector<int>& s, int sigma, const vector<bool>& isS, const vector<int>& lms_pos,
11	vector<int>& sa) {
12	int n = (int)s.size();
13	vector<int> bucket(sigma + 1, 0);
14	for (int x : s) bucket[x]++;
15	vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
16	// head: start index of bucket, tail: last index of bucket
17	for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
18	for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
19
20	fill(sa.begin(), sa.end(), -1);
21	// Place LMS at bucket tails (stable, process in reverse order)
22	for (int i = (int)lms_pos.size() - 1; i >= 0; --i) {
23	int p = lms_pos[i];
24	int b = s[p];
25	sa[tail[b]--] = p;
26	}
27	// Induce L-type to bucket heads (scan left to right)
28	vector<int> head2 = head; // copy since we'll advance heads
29	for (int i = 0; i < n; ++i) {
30	int p = sa[i];
31	if (p <= 0) continue;
32	int j = p - 1;
33	if (!isS[j]) {
34	int b = s[j];
35	sa[head2[b]++] = j;
36	}
37	}
38	// Induce S-type to bucket tails (scan right to left)
39	vector<int> tail2 = tail; // copy since we'll decrement tails
40	for (int i = n - 1; i >= 0; --i) {
41	int p = sa[i];
42	if (p <= 0) continue;
43	int j = p - 1;
44	if (isS[j]) {
45	int b = s[j];
46	sa[tail2[b]--] = j;
47	}
48	}
49	}
50
51	static bool is_lms(int i, const vector<bool>& isS) {
52	return i > 0 && isS[i] && !isS[i - 1];
53	}
54
55	static vector<int> SA_IS(const vector<int>& s, int sigma) {
56	int n = (int)s.size();
57	vector<bool> isS(n, false);
58	// Classify S/L types. By convention, the sentinel at n-1 is S-type.
59	isS[n - 1] = true;
60	for (int i = n - 2; i >= 0; --i) {
61	if (s[i] < s[i + 1]) isS[i] = true;
62	else if (s[i] > s[i + 1]) isS[i] = false;
63	else isS[i] = isS[i + 1];
64	}
65
66	// Collect LMS positions in order
67	vector<int> lms_pos;
68	lms_pos.reserve(n);
69	for (int i = 1; i < n; ++i) if (is_lms(i, isS)) lms_pos.push_back(i);
70
71	vector<int> sa(n, -1);
72	induce(s, sigma, isS, lms_pos, sa);
73
74	// Extract LMS in SA order
75	vector<int> lms_in_sa;
76	lms_in_sa.reserve(lms_pos.size());
77	for (int p : sa) if (p != -1 && is_lms(p, isS)) lms_in_sa.push_back(p);
78
79	// Name LMS substrings
80	vector<int> name(n, -1);
81	int cur_name = 0;
82	name[n - 1] = cur_name; // sentinel LMS-equivalent base
83
84	auto equal_lms = [&](int a, int b) {
85	if (a == n - 1 \|\| b == n - 1) return a == b; // both sentinel
86	int i = 0;
87	while (true) {
88	bool a_lms = is_lms(a + i, isS);
89	bool b_lms = is_lms(b + i, isS);
90	if (s[a + i] != s[b + i] \|\| a_lms != b_lms) return false;
91	if (a_lms && b_lms && i > 0) return true; // both hit next LMS at same time
92	++i;
93	}
94	};
95
96	// Assign names to LMS-substrings
97	name[lms_in_sa[0]] = cur_name;
98	for (int i = 1; i < (int)lms_in_sa.size(); ++i) {
99	if (!equal_lms(lms_in_sa[i - 1], lms_in_sa[i])) ++cur_name;
100	name[lms_in_sa[i]] = cur_name;
101	}
102
103	int m = (int)lms_pos.size();
104	vector<int> s1(m);
105	// Map LMS positions to compact string s1 in original order of lms_pos
106	for (int i = 0; i < m; ++i) s1[i] = name[lms_pos[i]];
107
108	vector<int> sa1;
109	if (cur_name + 1 == m) {
110	// Already unique: sa1 is inverse permutation of s1
111	sa1.assign(m, 0);
112	for (int i = 0; i < m; ++i) sa1[s1[i]] = i;
113	} else {
114	// Recurse on s1 with alphabet size cur_name+1
115	sa1 = SA_IS(s1, cur_name);
116	}
117
118	// Reconstruct LMS order using sa1 (which gives order of lms_pos)
119	vector<int> ordered_lms;
120	ordered_lms.reserve(m);
121	for (int i = 0; i < m; ++i) ordered_lms.push_back(lms_pos[sa1[i]]);
122
123	// Induce again from ordered LMS to get full SA
124	// Prepare empty SA and place ordered LMS at bucket tails
125	vector<int> bucket(sigma + 1, 0);
126	for (int x : s) bucket[x]++;
127	vector<int> head(sigma + 1, 0), tail(sigma + 1, 0);
128	for (int i = 1; i <= sigma; ++i) head[i] = head[i - 1] + bucket[i - 1];
129	for (int i = 0; i <= sigma; ++i) tail[i] = head[i] + bucket[i] - 1;
130
131	fill(sa.begin(), sa.end(), -1);
132	for (int i = (int)ordered_lms.size() - 1; i >= 0; --i) {
133	int p = ordered_lms[i];
134	int b = s[p];
135	sa[tail[b]--] = p;
136	}
137
138	// Final induction L then S
139	vector<bool> isS2 = isS;
140	// Induce L-type
141	vector<int> head2 = head;
142	for (int i = 0; i < n; ++i) {
143	int p = sa[i];
144	if (p <= 0) continue;
145	int j = p - 1;
146	if (!isS2[j]) {
147	int b = s[j];
148	sa[head2[b]++] = j;
149	}
150	}
151	// Induce S-type
152	vector<int> tail2 = tail;
153	for (int i = n - 1; i >= 0; --i) {
154	int p = sa[i];
155	if (p <= 0) continue;
156	int j = p - 1;
157	if (isS2[j]) {
158	int b = s[j];
159	sa[tail2[b]--] = j;
160	}
161	}
162
163	return sa;
164	}
165
166	// Helper: compress characters of a string to 1..sigma and append 0 sentinel.
167	static vector<int> to_int_alphabet_with_sentinel(const string& s) {
168	vector<unsigned char> v(s.begin(), s.end());
169	vector<unsigned char> uniq = v;
170	sort(uniq.begin(), uniq.end()); uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end());
171	vector<int> res; res.reserve(s.size() + 1);
172	// Map smallest char to 1 (since 0 is reserved for sentinel)
173	for (auto ch : v) {
174	int id = (int)(lower_bound(uniq.begin(), uniq.end(), ch) - uniq.begin()) + 1;
175	res.push_back(id);
176	}
177	res.push_back(0); // sentinel
178	return res;
179	}
180
181	int main() {
182	ios::sync_with_stdio(false);
183	cin.tie(nullptr);
184
185	string s = "banana"; // demo
186	auto a = to_int_alphabet_with_sentinel(s);
187	int sigma = 0; for (int x : a) sigma = max(sigma, x);
188	auto sa = SA_IS(a, sigma);
189
190	cout << "SA-IS Suffix Array (with sentinel at end):\n";
191	for (int i = 0; i < (int)sa.size(); ++i) cout << sa[i] << (i+1==(int)sa.size()?'\n':' ');
192
193	// Print sorted suffixes for verification
194	string t = s + string("\0", 1); // show sentinel as NUL; for display, we skip it
195	cout << "Sorted suffixes (displaying without NUL):\n";
196	for (int i = 0; i < (int)sa.size(); ++i) {
197	int pos = sa[i];
198	if (pos == (int)s.size()) cout << "$" << '\n';
199	else cout << s.substr(pos) << '\n';
200	}
201	return 0;
202	}
203