🗂️Data StructureIntermediate

Policy-Based Data Structures

Key Points

  • Policy-Based Data Structures (PBDS) are GNU C++ extensions that add advanced containers like an order-statistics tree, a fast hash table, and ropes for efficient string edits.
  • The order-statistics tree supports fin_order(k) to get the k-th smallest element and orde_key(x) to get the rank of x in O(log n) time.
  • You can emulate an ordered multiset by storing pairs (value, uniqu) to handle duplicates while keeping order-statistics queries valid.
  • g_table is a hash table with expected O(1) operations and can outperform std::unordere in many competitive programming workloads.
  • rope is a balanced-tree string structure that makes insertions, deletions, and substrings around O(log n) time, helpful for text-editing style problems.
  • These extensions are GNU-specific; they compile on Codeforces, AtCoder (GCC), and most CP judges, but are not standard C++.
  • Use PBDS when STL containers are missing rank/k-th operations or when you need faster hashing or heavy string splicing.
  • Be careful with duplicates, 0-indexing in fin_order, and strict comparators; improper use leads to wrong answers or compile errors.

Prerequisites

  • Binary search trees and balancing (red-black trees)Understanding how balanced trees maintain O(log n) height explains the complexity guarantees of the PBDS tree.
  • Big-O notation and logarithmsAnalyzing PBDS operations and comparing alternatives requires comfort with O(log n), O(n log n), and amortized costs.
  • C++ templates and STL containersPBDS uses templates heavily and integrates with set/map-like interfaces.
  • Comparators and strict weak orderingCorrect comparator design is crucial for ordered containers and for handling duplicates via composite keys.
  • Hashing fundamentalsTo use gp_hash_table effectively and safely, you need to reason about hash functions, collisions, and load factor.
  • Basic string data structuresKnowing std::string limitations helps you see when rope is advantageous for splicing-heavy workloads.

Detailed Explanation

Tap terms for definitions

01Overview

Policy-Based Data Structures (PBDS) are a set of useful container templates in the GNU C++ library that extend standard containers with operations that are common in algorithms and competitive programming. The star is the order-statistics tree (tree<...> with tree_order_statistics_node_update), a balanced binary search tree that supports both rank queries (how many elements are strictly less than x) and selection queries (give me the k-th smallest element). Alongside it, gp_hash_table offers a high-performance hash map with expected O(1) operations and rope provides fast string concatenation, insertion, deletion, and substring extraction by storing text in a balanced tree rather than a contiguous array. Together, these tools fill gaps where the C++ STL lacks certain operations or speed characteristics, enabling simpler solutions with better asymptotic complexity. In Codeforces-range problems (1200–1600), PBDS can dramatically reduce code complexity for tasks involving dynamic order statistics (k-th, rank), frequency maps on large inputs, or many string splices. While PBDS is not part of the C++ standard, it is widely supported on GCC-based judges; you should still avoid it on compilers that do not ship the GNU extensions.

02Intuition & Analogies

Imagine a neatly sorted bookshelf where you can quickly answer two questions: (1) What is the position of a given book? and (2) Which book is in position k? A normal set only keeps the books sorted, but gives you neither the exact rank of a book nor lets you ask for the k-th book directly. The order-statistics tree is like adding an index card to each shelf section that tracks how many books are to the left, so you can immediately jump to the k-th or count how many come before a particular title. For duplicates, if two copies of the same book exist, we attach a small ticket number to each copy so they become distinct while keeping their order by title and then by ticket. For a hash table, think of labeled boxes where each label is computed from the item’s name (a hash). If boxes are well distributed, you almost instantly reach the right one. The gp_hash_table uses a very practical system of boxes and probing to find a place fast under average conditions. For strings, a rope is like storing a long text as many shorter strings connected by a tree of binders. Splicing in or cutting out a paragraph doesn’t require shifting all text to the right like a standard string would; you just rearrange a few binders at the splice point. This is why rope excels at repeated insert/erase/substring operations compared to contiguous storage.

03Formal Definition

The PBDS order-statistics tree is typically declared as , Mapped, Compare, r_tag, tre_statistic_update>. When Mapped is nul, it behaves like a set. It maintains a balanced binary search tree (a red-black tree) augmented with subtree sizes via tre_statistic_update, enabling two operations: fin_order(k) returns an iterator to the element with 0-based order k; orde_key(x) returns the number of elements strictly less than x. Both operations run in O(log n) time because they traverse O(tree height) while using stored subtree sizes for decisions. The g_, Mapped, Hash, Eq, ...> is a hash table with open addressing policies (implementation details can vary) and achieves expected O(1) time for insert, find, and erase assuming a well-behaved hash function and a bounded load factor. The (commonly ) is a balanced binary tree whose leaves store small contiguous character chunks. Core operations—concatenation, insertion at position i, erasing a substring, and getting a substring—are implemented by splitting and joining nodes and thus typically cost O(log n + k) where k is the affected substring length (e.g., returned substring size or inserted text size).

04When to Use

Use the order-statistics tree when you need dynamic rank queries or k-th element selection alongside standard set/map behavior. Typical tasks include maintaining a running median, counting inversions, coordinate compression on-the-fly, selecting the k-th smallest value after insertions/deletions, and computing ranks in leaderboards. When duplicates are required, store pairs (value, unique_id) to emulate a multiset with order statistics. Choose gp_hash_table when you need fast hash maps for integer keys or lightweight composites like pairs and when std::unordered_map is too slow. It shines in frequency counting, sliding-window maps, and adjacency maps in dense input scenarios—common in competitive programming. Provide a robust custom hash for composite keys to reduce collisions and avoid adversarial slowdowns. Pick rope when a problem requires many string splices: insertions, deletions, and substrings at arbitrary positions—like simulating a simple text editor, processing many cut-and-paste operations, or building strings incrementally with frequent middle insertions. If you only append to the end or need random indexing constantly, std::string or std::vector<char> may be simpler and faster.

⚠️Common Mistakes

• Using less_equal or a non-strict comparator in tree<...>. The Compare must model a strict weak ordering (like std::less). To support duplicates, do not use less_equal; instead, store pairs (value, unique_id) so elements remain unique under a strict comparator. • Forgetting 0-based indexing. find_by_order(k) uses 0-based k, so the smallest element is k = 0. Off-by-one errors are common when converting to 1-based ranks. • Assuming order_of_key(x) returns the index of x even when x is absent. It actually returns how many elements are strictly less than x. If x is not present, this is still a valid count, not an error. • Erasing from an ordered-multiset emulation incorrectly. If you store pairs (value, id), calling erase(value) does nothing because the key type is pair. You must locate and erase a specific iterator, usually with lower_bound({value, -INF}). • Treating PBDS as standard C++. PBDS is GNU-specific; it compiles on GCC-based judges but may fail elsewhere (e.g., MSVC). Include the right headers: <ext/pb_ds/assoc_container.hpp>, <ext/pb_ds/tree_policy.hpp>, and <ext/rope>. • Weak hashes in gp_hash_table. With composite keys, supply a strong custom hash; otherwise, collisions may degrade performance. Avoid pathologically chosen inputs when possible. • Expecting rope to behave like std::string in all ways. Random access r[i] is O(log n), not O(1). Printing a rope typically involves iterating or indexing; converting directly to std::string is not a built-in convenience. • Forgetting to use tree_order_statistics_node_update in the tree type alias. Without it, find_by_order and order_of_key do not exist.

Key Formulas

Rank definition

Explanation: The rank of x in a sorted set S is the number of elements strictly less than x. In PBDS this is computed by orde_key(x).

Select (k-th element)

Explanation: Selecting the k-th element means returning the element that would appear at index k in sorted order. In PBDS this is performed by fin_order(k).

Order-statistics tree operation time

Explanation: Each insert, erase, fin_order, and orde_key takes logarithmic time in the number of elements because the underlying tree is balanced.

Building cost from incremental inserts

Explanation: Inserting n elements one by one into a balanced tree yields overall (n log n) time. This follows from the sum of logarithms.

Inversion count

Explanation: The number of inversions in an array counts out-of-order pairs. PBDS can compute this in O(n log n) by counting greater elements to the left.

Expected hash table time

Explanation: Hash operations are expected constant time when the load factor is controlled and the hash function spreads keys uniformly.

Rope edit time

Explanation: Inserting or erasing a substring in a rope takes logarithmic time to navigate the tree plus linear time in the size of the edited block.

Order of key equals rank

Explanation: The value returned by orde_key(x) is exactly the rank of x, i.e., the count of elements strictly less than x.

Select–rank relationship

Explanation: For distinct elements, the k-th element has rank k. This connects fin_order and orde_key directly.

Complexity Analysis

The PBDS order-statistics tree is a red-black tree augmented with subtree sizes. Balancing guarantees height O(log n), so insert, erase, lowe/uppe, fin_order(k), and orde_key(x) each take O(log n) time and O(1) extra space per operation. Building the structure via n inserts therefore costs Θ(n log n) time and O(n) space. Using a (value, uniqu) key to emulate a multiset preserves these asymptotics; comparisons add a tiny constant factor. The g_table provides expected O(1) amortized time for insert, find, and erase when the hash function distributes keys uniformly and the load factor remains bounded. Space usage is O(n) with some overhead for buckets and metadata. In worst-case adversarial inputs (hash collisions), operations can degrade, but with robust hashing and typical competitive data, performance is excellent. The rope stores text in a balanced tree of character chunks. Operations that modify structure near a position run in O(log n + k), where n is the current text length and k is the size of the inserted/erased substring. Taking a substring of length k similarly costs O(log n + k). Random access by index is O(log n) because it descends the tree. Rope space is O(n) plus node overhead; many small edits avoid the O(n) shifts that std::string would require. Practically, for CF 1200–1600 problems: using an order-statistics tree converts many O() counting tasks (like inversion counting) to O(n log n). g_table often reduces constant factors enough to pass tight time limits. Rope becomes beneficial when splicing dominates costs; otherwise, std::string may be simpler and faster for contiguous operations.

Code Examples

Ordered set basics: rank and k-th element with PBDS
1#include <bits/stdc++.h> // For convenience in CP
2#include <ext/pb_ds/assoc_container.hpp>
3#include <ext/pb_ds/tree_policy.hpp>
4
5using namespace std;
6using namespace __gnu_pbds;
7
8// Alias for an order-statistics set of ints
9// Key type = int, mapped type = null_type (set behavior)
10// Comparator = less<int> (strict weak ordering)
11// Tag = rb_tree_tag (red-black tree)
12// Node update = tree_order_statistics_node_update (enables order stats)
13template <class T>
14using ordered_set = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
15
16int main() {
17 ios::sync_with_stdio(false);
18 cin.tie(nullptr);
19
20 ordered_set<int> os; // empty
21
22 // Insert elements (duplicates are ignored since this is a set)
23 os.insert(10);
24 os.insert(5);
25 os.insert(20);
26 os.insert(10); // ignored
27
28 // 1) order_of_key(x): how many elements are < x
29 cout << "order_of_key(15) = " << os.order_of_key(15) << "\n"; // {5,10,20} -> 2
30 cout << "order_of_key(5) = " << os.order_of_key(5) << "\n"; // -> 0
31 cout << "order_of_key(100) = " << os.order_of_key(100) << "\n"; // -> 3
32
33 // 2) find_by_order(k): iterator to k-th smallest (0-based)
34 for (int k = 0; k < (int)os.size(); ++k) {
35 cout << "k=" << k << ", element=" << *os.find_by_order(k) << "\n";
36 }
37
38 // Erase an element and query again
39 os.erase(10); // remove value 10
40 cout << "After erasing 10, size=" << os.size() << "\n";
41 if (!os.empty()) {
42 cout << "Smallest element is now: " << *os.find_by_order(0) << "\n";
43 }
44
45 // Check presence (normal set interface works)
46 cout << (os.find(20) != os.end() ? "20 present" : "20 missing") << "\n";
47
48 return 0;
49}
50

This program demonstrates the two special operations: order_of_key(x) returns the rank of x (count of elements less than x), and find_by_order(k) retrieves the k-th smallest element (0-based). The container behaves like a std::set with extra order-statistics power.

Time: Each insert/erase/query is O(log n)Space: O(n)
Ordered multiset with duplicates and running median
1#include <bits/stdc++.h>
2#include <ext/pb_ds/assoc_container.hpp>
3#include <ext/pb_ds/tree_policy.hpp>
4
5using namespace std;
6using namespace __gnu_pbds;
7
8template <class T>
9using ordered_tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
10
11// We emulate a multiset by storing (value, unique_id) so that duplicates are distinct.
12// Comparator is lexicographic: first by value, then by id.
13
14int main() {
15 ios::sync_with_stdio(false);
16 cin.tie(nullptr);
17
18 ordered_tree<pair<int,int>> ms; // ordered multiset emulation
19 int next_id = 0; // unique id for duplicates
20
21 // Simple interactive demo: I x (insert), D x (delete one x), M (print median), Q (quit)
22 // Example input:
23 // I 5
24 // I 1
25 // I 5
26 // M
27 // D 5
28 // M
29
30 char op;
31 while (cin >> op) {
32 if (op == 'I') {
33 int x; cin >> x;
34 ms.insert({x, next_id++});
35 } else if (op == 'D') {
36 int x; cin >> x;
37 // Erase a single occurrence of x if present
38 auto it = ms.lower_bound({x, numeric_limits<int>::min()});
39 if (it != ms.end() && it->first == x) ms.erase(it);
40 } else if (op == 'M') {
41 if (ms.empty()) {
42 cout << "EMPTY\n";
43 } else {
44 int n = (int)ms.size();
45 // Lower median for even n
46 int k = (n - 1) / 2;
47 int median = ms.find_by_order(k)->first;
48 cout << median << "\n";
49 }
50 } else if (op == 'Q') {
51 break;
52 }
53 }
54
55 return 0;
56}
57

By storing pairs (value, unique_id), we support duplicates while preserving a strict weak ordering. We can find the running median by reading the element at order (n−1)/2 using find_by_order. Deleting a single occurrence of x uses lower_bound on (x, −INF) to grab one copy.

Time: Each insert/erase/median query is O(log n)Space: O(n)
Counting inversions in O(n log n) using PBDS
1#include <bits/stdc++.h>
2#include <ext/pb_ds/assoc_container.hpp>
3#include <ext/pb_ds/tree_policy.hpp>
4
5using namespace std;
6using namespace __gnu_pbds;
7
8template <class T>
9using ordered_tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;
10
11int main() {
12 ios::sync_with_stdio(false);
13 cin.tie(nullptr);
14
15 int n;
16 if (!(cin >> n)) return 0;
17 vector<long long> a(n);
18 for (int i = 0; i < n; ++i) cin >> a[i];
19
20 // To handle duplicates correctly, store (value, index)
21 ordered_tree<pair<long long,int>> os;
22 long long inv = 0;
23
24 for (int i = 0; i < n; ++i) {
25 // Number of elements <= a[i] is order_of_key({a[i], +INF})
26 int le = os.order_of_key({a[i], INT_MAX});
27 int inserted = i;
28 int greater = inserted - le; // strictly greater to the left
29 inv += greater;
30 os.insert({a[i], i});
31 }
32
33 cout << inv << "\n";
34 return 0;
35}
36

We iterate left to right and keep a PBDS of previously seen elements as (value, index). For the current a[i], elements greater than a[i] among inserted ones equal (inserted_count − count_of_elements_≤a[i]). Using order_of_key({a[i], INT_MAX}) counts all elements with value ≤ a[i]. Summing this over i yields the inversion count.

Time: O(n log n)Space: O(n)
gp_hash_table with custom hash for pair keys, and rope for fast string splicing
1#include <bits/stdc++.h>
2#include <ext/pb_ds/assoc_container.hpp>
3#include <ext/rope>
4
5using namespace std;
6using namespace __gnu_pbds;
7using namespace __gnu_cxx; // for rope
8
9// Strong 64-bit mix for pairs to reduce collisions
10struct PairHash {
11 size_t operator()(const pair<int,int>& p) const noexcept {
12 uint64_t x = (static_cast<uint64_t>(static_cast<uint32_t>(p.first)) << 32) ^ static_cast<uint32_t>(p.second);
13 x ^= 0x9e3779b97f4a7c15ULL;
14 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
15 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
16 x = x ^ (x >> 31);
17 return static_cast<size_t>(x);
18 }
19};
20
21int main() {
22 ios::sync_with_stdio(false);
23 cin.tie(nullptr);
24
25 // Part 1: gp_hash_table for counting frequency of 2D points
26 gp_hash_table<pair<int,int>, int, PairHash> freq;
27 vector<pair<int,int>> pts = {{1,2},{2,3},{1,2},{4,5},{2,3},{2,3}};
28 for (auto &p : pts) freq[p]++;
29
30 cout << "Frequencies (gp_hash_table):\n";
31 // Use an auxiliary set to avoid duplicate prints
32 set<pair<int,int>> printed(pts.begin(), pts.end());
33 for (auto &p : printed) {
34 cout << '(' << p.first << ',' << p.second << ") -> " << freq[p] << '\n';
35 }
36
37 // Part 2: rope for efficient text edits
38 rope<char> r;
39 r.append("hello");
40 r.push_back(' ');
41 r.append("world"); // r = "hello world"
42 r.insert(5, ", brave new"); // insert after "hello"
43 r.erase(0, 1); // erase first character
44 rope<char> sub = r.substr(0, 12); // first 12 characters
45
46 cout << "\nRope full text: ";
47 for (int i = 0; i < (int)r.size(); ++i) cout << r[i];
48 cout << "\nRope substring: ";
49 for (int i = 0; i < (int)sub.size(); ++i) cout << sub[i];
50 cout << '\n';
51
52 return 0;
53}
54

The first half shows gp_hash_table counting occurrences of integer pairs with a custom hash to reduce collisions, achieving expected O(1) performance. The second half demonstrates rope operations: building text by appending and inserting, erasing a prefix, and extracting a substring. Iterating to print is straightforward; random access r[i] is O(log n).

Time: gp_hash_table operations are expected O(1); rope edits are O(log n + k)Space: O(n) for both structures (with overhead for buckets/nodes)