Policy-Based Data Structures
Key Points
- •Policy-Based Data Structures (PBDS) are GNU C++ extensions that add advanced containers like an order-statistics tree, a fast hash table, and ropes for efficient string edits.
- •The order-statistics tree supports fin_order(k) to get the k-th smallest element and orde_key(x) to get the rank of x in O(log n) time.
- •You can emulate an ordered multiset by storing pairs (value, uniqu) to handle duplicates while keeping order-statistics queries valid.
- •g_table is a hash table with expected O(1) operations and can outperform std::unordere in many competitive programming workloads.
- •rope is a balanced-tree string structure that makes insertions, deletions, and substrings around O(log n) time, helpful for text-editing style problems.
- •These extensions are GNU-specific; they compile on Codeforces, AtCoder (GCC), and most CP judges, but are not standard C++.
- •Use PBDS when STL containers are missing rank/k-th operations or when you need faster hashing or heavy string splicing.
- •Be careful with duplicates, 0-indexing in fin_order, and strict comparators; improper use leads to wrong answers or compile errors.
Prerequisites
- →Binary search trees and balancing (red-black trees) — Understanding how balanced trees maintain O(log n) height explains the complexity guarantees of the PBDS tree.
- →Big-O notation and logarithms — Analyzing PBDS operations and comparing alternatives requires comfort with O(log n), O(n log n), and amortized costs.
- →C++ templates and STL containers — PBDS uses templates heavily and integrates with set/map-like interfaces.
- →Comparators and strict weak ordering — Correct comparator design is crucial for ordered containers and for handling duplicates via composite keys.
- →Hashing fundamentals — To use gp_hash_table effectively and safely, you need to reason about hash functions, collisions, and load factor.
- →Basic string data structures — Knowing std::string limitations helps you see when rope is advantageous for splicing-heavy workloads.
Detailed Explanation
Tap terms for definitions01Overview
Policy-Based Data Structures (PBDS) are a set of useful container templates in the GNU C++ library that extend standard containers with operations that are common in algorithms and competitive programming. The star is the order-statistics tree (tree<...> with tree_order_statistics_node_update), a balanced binary search tree that supports both rank queries (how many elements are strictly less than x) and selection queries (give me the k-th smallest element). Alongside it, gp_hash_table offers a high-performance hash map with expected O(1) operations and rope provides fast string concatenation, insertion, deletion, and substring extraction by storing text in a balanced tree rather than a contiguous array. Together, these tools fill gaps where the C++ STL lacks certain operations or speed characteristics, enabling simpler solutions with better asymptotic complexity. In Codeforces-range problems (1200–1600), PBDS can dramatically reduce code complexity for tasks involving dynamic order statistics (k-th, rank), frequency maps on large inputs, or many string splices. While PBDS is not part of the C++ standard, it is widely supported on GCC-based judges; you should still avoid it on compilers that do not ship the GNU extensions.
02Intuition & Analogies
Imagine a neatly sorted bookshelf where you can quickly answer two questions: (1) What is the position of a given book? and (2) Which book is in position k? A normal set only keeps the books sorted, but gives you neither the exact rank of a book nor lets you ask for the k-th book directly. The order-statistics tree is like adding an index card to each shelf section that tracks how many books are to the left, so you can immediately jump to the k-th or count how many come before a particular title. For duplicates, if two copies of the same book exist, we attach a small ticket number to each copy so they become distinct while keeping their order by title and then by ticket. For a hash table, think of labeled boxes where each label is computed from the item’s name (a hash). If boxes are well distributed, you almost instantly reach the right one. The gp_hash_table uses a very practical system of boxes and probing to find a place fast under average conditions. For strings, a rope is like storing a long text as many shorter strings connected by a tree of binders. Splicing in or cutting out a paragraph doesn’t require shifting all text to the right like a standard string would; you just rearrange a few binders at the splice point. This is why rope excels at repeated insert/erase/substring operations compared to contiguous storage.
03Formal Definition
04When to Use
Use the order-statistics tree when you need dynamic rank queries or k-th element selection alongside standard set/map behavior. Typical tasks include maintaining a running median, counting inversions, coordinate compression on-the-fly, selecting the k-th smallest value after insertions/deletions, and computing ranks in leaderboards. When duplicates are required, store pairs (value, unique_id) to emulate a multiset with order statistics. Choose gp_hash_table when you need fast hash maps for integer keys or lightweight composites like pairs and when std::unordered_map is too slow. It shines in frequency counting, sliding-window maps, and adjacency maps in dense input scenarios—common in competitive programming. Provide a robust custom hash for composite keys to reduce collisions and avoid adversarial slowdowns. Pick rope when a problem requires many string splices: insertions, deletions, and substrings at arbitrary positions—like simulating a simple text editor, processing many cut-and-paste operations, or building strings incrementally with frequent middle insertions. If you only append to the end or need random indexing constantly, std::string or std::vector<char> may be simpler and faster.
⚠️Common Mistakes
• Using less_equal or a non-strict comparator in tree<...>. The Compare must model a strict weak ordering (like std::less). To support duplicates, do not use less_equal; instead, store pairs (value, unique_id) so elements remain unique under a strict comparator. • Forgetting 0-based indexing. find_by_order(k) uses 0-based k, so the smallest element is k = 0. Off-by-one errors are common when converting to 1-based ranks. • Assuming order_of_key(x) returns the index of x even when x is absent. It actually returns how many elements are strictly less than x. If x is not present, this is still a valid count, not an error. • Erasing from an ordered-multiset emulation incorrectly. If you store pairs (value, id), calling erase(value) does nothing because the key type is pair. You must locate and erase a specific iterator, usually with lower_bound({value, -INF}). • Treating PBDS as standard C++. PBDS is GNU-specific; it compiles on GCC-based judges but may fail elsewhere (e.g., MSVC). Include the right headers: <ext/pb_ds/assoc_container.hpp>, <ext/pb_ds/tree_policy.hpp>, and <ext/rope>. • Weak hashes in gp_hash_table. With composite keys, supply a strong custom hash; otherwise, collisions may degrade performance. Avoid pathologically chosen inputs when possible. • Expecting rope to behave like std::string in all ways. Random access r[i] is O(log n), not O(1). Printing a rope typically involves iterating or indexing; converting directly to std::string is not a built-in convenience. • Forgetting to use tree_order_statistics_node_update in the tree type alias. Without it, find_by_order and order_of_key do not exist.
Key Formulas
Rank definition
Explanation: The rank of x in a sorted set S is the number of elements strictly less than x. In PBDS this is computed by orde_key(x).
Select (k-th element)
Explanation: Selecting the k-th element means returning the element that would appear at index k in sorted order. In PBDS this is performed by fin_order(k).
Order-statistics tree operation time
Explanation: Each insert, erase, fin_order, and orde_key takes logarithmic time in the number of elements because the underlying tree is balanced.
Building cost from incremental inserts
Explanation: Inserting n elements one by one into a balanced tree yields overall (n log n) time. This follows from the sum of logarithms.
Inversion count
Explanation: The number of inversions in an array counts out-of-order pairs. PBDS can compute this in O(n log n) by counting greater elements to the left.
Expected hash table time
Explanation: Hash operations are expected constant time when the load factor is controlled and the hash function spreads keys uniformly.
Rope edit time
Explanation: Inserting or erasing a substring in a rope takes logarithmic time to navigate the tree plus linear time in the size of the edited block.
Order of key equals rank
Explanation: The value returned by orde_key(x) is exactly the rank of x, i.e., the count of elements strictly less than x.
Select–rank relationship
Explanation: For distinct elements, the k-th element has rank k. This connects fin_order and orde_key directly.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> // For convenience in CP 2 #include <ext/pb_ds/assoc_container.hpp> 3 #include <ext/pb_ds/tree_policy.hpp> 4 5 using namespace std; 6 using namespace __gnu_pbds; 7 8 // Alias for an order-statistics set of ints 9 // Key type = int, mapped type = null_type (set behavior) 10 // Comparator = less<int> (strict weak ordering) 11 // Tag = rb_tree_tag (red-black tree) 12 // Node update = tree_order_statistics_node_update (enables order stats) 13 template <class T> 14 using ordered_set = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>; 15 16 int main() { 17 ios::sync_with_stdio(false); 18 cin.tie(nullptr); 19 20 ordered_set<int> os; // empty 21 22 // Insert elements (duplicates are ignored since this is a set) 23 os.insert(10); 24 os.insert(5); 25 os.insert(20); 26 os.insert(10); // ignored 27 28 // 1) order_of_key(x): how many elements are < x 29 cout << "order_of_key(15) = " << os.order_of_key(15) << "\n"; // {5,10,20} -> 2 30 cout << "order_of_key(5) = " << os.order_of_key(5) << "\n"; // -> 0 31 cout << "order_of_key(100) = " << os.order_of_key(100) << "\n"; // -> 3 32 33 // 2) find_by_order(k): iterator to k-th smallest (0-based) 34 for (int k = 0; k < (int)os.size(); ++k) { 35 cout << "k=" << k << ", element=" << *os.find_by_order(k) << "\n"; 36 } 37 38 // Erase an element and query again 39 os.erase(10); // remove value 10 40 cout << "After erasing 10, size=" << os.size() << "\n"; 41 if (!os.empty()) { 42 cout << "Smallest element is now: " << *os.find_by_order(0) << "\n"; 43 } 44 45 // Check presence (normal set interface works) 46 cout << (os.find(20) != os.end() ? "20 present" : "20 missing") << "\n"; 47 48 return 0; 49 } 50
This program demonstrates the two special operations: order_of_key(x) returns the rank of x (count of elements less than x), and find_by_order(k) retrieves the k-th smallest element (0-based). The container behaves like a std::set with extra order-statistics power.
1 #include <bits/stdc++.h> 2 #include <ext/pb_ds/assoc_container.hpp> 3 #include <ext/pb_ds/tree_policy.hpp> 4 5 using namespace std; 6 using namespace __gnu_pbds; 7 8 template <class T> 9 using ordered_tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>; 10 11 // We emulate a multiset by storing (value, unique_id) so that duplicates are distinct. 12 // Comparator is lexicographic: first by value, then by id. 13 14 int main() { 15 ios::sync_with_stdio(false); 16 cin.tie(nullptr); 17 18 ordered_tree<pair<int,int>> ms; // ordered multiset emulation 19 int next_id = 0; // unique id for duplicates 20 21 // Simple interactive demo: I x (insert), D x (delete one x), M (print median), Q (quit) 22 // Example input: 23 // I 5 24 // I 1 25 // I 5 26 // M 27 // D 5 28 // M 29 30 char op; 31 while (cin >> op) { 32 if (op == 'I') { 33 int x; cin >> x; 34 ms.insert({x, next_id++}); 35 } else if (op == 'D') { 36 int x; cin >> x; 37 // Erase a single occurrence of x if present 38 auto it = ms.lower_bound({x, numeric_limits<int>::min()}); 39 if (it != ms.end() && it->first == x) ms.erase(it); 40 } else if (op == 'M') { 41 if (ms.empty()) { 42 cout << "EMPTY\n"; 43 } else { 44 int n = (int)ms.size(); 45 // Lower median for even n 46 int k = (n - 1) / 2; 47 int median = ms.find_by_order(k)->first; 48 cout << median << "\n"; 49 } 50 } else if (op == 'Q') { 51 break; 52 } 53 } 54 55 return 0; 56 } 57
By storing pairs (value, unique_id), we support duplicates while preserving a strict weak ordering. We can find the running median by reading the element at order (n−1)/2 using find_by_order. Deleting a single occurrence of x uses lower_bound on (x, −INF) to grab one copy.
1 #include <bits/stdc++.h> 2 #include <ext/pb_ds/assoc_container.hpp> 3 #include <ext/pb_ds/tree_policy.hpp> 4 5 using namespace std; 6 using namespace __gnu_pbds; 7 8 template <class T> 9 using ordered_tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>; 10 11 int main() { 12 ios::sync_with_stdio(false); 13 cin.tie(nullptr); 14 15 int n; 16 if (!(cin >> n)) return 0; 17 vector<long long> a(n); 18 for (int i = 0; i < n; ++i) cin >> a[i]; 19 20 // To handle duplicates correctly, store (value, index) 21 ordered_tree<pair<long long,int>> os; 22 long long inv = 0; 23 24 for (int i = 0; i < n; ++i) { 25 // Number of elements <= a[i] is order_of_key({a[i], +INF}) 26 int le = os.order_of_key({a[i], INT_MAX}); 27 int inserted = i; 28 int greater = inserted - le; // strictly greater to the left 29 inv += greater; 30 os.insert({a[i], i}); 31 } 32 33 cout << inv << "\n"; 34 return 0; 35 } 36
We iterate left to right and keep a PBDS of previously seen elements as (value, index). For the current a[i], elements greater than a[i] among inserted ones equal (inserted_count − count_of_elements_≤a[i]). Using order_of_key({a[i], INT_MAX}) counts all elements with value ≤ a[i]. Summing this over i yields the inversion count.
1 #include <bits/stdc++.h> 2 #include <ext/pb_ds/assoc_container.hpp> 3 #include <ext/rope> 4 5 using namespace std; 6 using namespace __gnu_pbds; 7 using namespace __gnu_cxx; // for rope 8 9 // Strong 64-bit mix for pairs to reduce collisions 10 struct PairHash { 11 size_t operator()(const pair<int,int>& p) const noexcept { 12 uint64_t x = (static_cast<uint64_t>(static_cast<uint32_t>(p.first)) << 32) ^ static_cast<uint32_t>(p.second); 13 x ^= 0x9e3779b97f4a7c15ULL; 14 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL; 15 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL; 16 x = x ^ (x >> 31); 17 return static_cast<size_t>(x); 18 } 19 }; 20 21 int main() { 22 ios::sync_with_stdio(false); 23 cin.tie(nullptr); 24 25 // Part 1: gp_hash_table for counting frequency of 2D points 26 gp_hash_table<pair<int,int>, int, PairHash> freq; 27 vector<pair<int,int>> pts = {{1,2},{2,3},{1,2},{4,5},{2,3},{2,3}}; 28 for (auto &p : pts) freq[p]++; 29 30 cout << "Frequencies (gp_hash_table):\n"; 31 // Use an auxiliary set to avoid duplicate prints 32 set<pair<int,int>> printed(pts.begin(), pts.end()); 33 for (auto &p : printed) { 34 cout << '(' << p.first << ',' << p.second << ") -> " << freq[p] << '\n'; 35 } 36 37 // Part 2: rope for efficient text edits 38 rope<char> r; 39 r.append("hello"); 40 r.push_back(' '); 41 r.append("world"); // r = "hello world" 42 r.insert(5, ", brave new"); // insert after "hello" 43 r.erase(0, 1); // erase first character 44 rope<char> sub = r.substr(0, 12); // first 12 characters 45 46 cout << "\nRope full text: "; 47 for (int i = 0; i < (int)r.size(); ++i) cout << r[i]; 48 cout << "\nRope substring: "; 49 for (int i = 0; i < (int)sub.size(); ++i) cout << sub[i]; 50 cout << '\n'; 51 52 return 0; 53 } 54
The first half shows gp_hash_table counting occurrences of integer pairs with a custom hash to reduce collisions, achieving expected O(1) performance. The second half demonstrates rope operations: building text by appending and inserting, erasing a prefix, and extracting a substring. Iterating to print is straightforward; random access r[i] is O(log n).