🗂️ Data Structure · Intermediate

Hash Table

Key Points

  • A hash table stores key–value pairs and finds items in expected O(1) time using a hash function to map keys to buckets.
  • Collisions are inevitable; the two main ways to handle them are chaining (linked buckets) and open addressing (probing empty slots).
  • In C++, unordered_map and unordered_set are standard hash tables that use chaining and can be tuned with reserve and max_load_factor.
  • Designing a good hash function matters; poor hashing or adversarial inputs can degrade performance to O(n) per operation.
  • Use randomized hashing (e.g., splitmix64 with a random seed) to mitigate anti-hash attacks on competitive programming platforms.
  • For pairs, vectors, and custom structs, you must provide custom hash functors to use them as unordered_map/unordered_set keys.
  • Open addressing has different performance behavior; probe counts grow as the load factor approaches 1, so rehashing is crucial.
  • The gp_hash_table from pb_ds often performs faster than unordered_map in practice but still needs careful hashing and seeding.

Prerequisites

  • Arrays and indexing: Hash tables map hash values to array indices; understanding array operations is fundamental.
  • Modular arithmetic: Hash functions commonly use modulo operations to fit values into a fixed bucket range.
  • Basic probability: Expected-time analysis assumes uniform hashing and uses probability for collision reasoning.
  • Big-O notation and amortized analysis: Understanding expected O(1) operations and amortized costs is essential for performance guarantees.
  • C++ templates and functors: Custom hashers are implemented as templated functors passed to unordered_map/set or gp_hash_table.
  • Balanced BSTs vs hash tables: Knowing when to choose ordered structures (map/set) versus hash-based ones prevents misuse.

Detailed Explanation


01 Overview

A hash table is a data structure that lets you store and retrieve items by key in expected constant time. The key idea is to apply a hash function to a key to produce an integer, then use that integer to choose a bucket (index) in an array. Because many different keys can map to the same bucket (a collision), a hash table must have a collision resolution strategy. The two common strategies are chaining (each bucket stores a small list of items that hashed there) and open addressing (probe other slots in the array until you find an empty one). In C++, unordered_map and unordered_set implement hash tables with chaining and typically give O(1) expected time for insert, find, and erase. However, in the worst case—like when an adversary crafts inputs that collide—operations can degrade to O(n). Competitive programmers often protect against this by using randomized hash functions (e.g., splitmix64 with a random seed) or third-party structures like gp_hash_table from the GNU pb_ds library, which are often faster in practice. Choosing a good load factor (ratio of items to buckets) and rehashing when necessary are important for keeping performance predictable.
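
A quick way to see the key-to-bucket mapping in practice is the bucket introspection API of unordered_map (a minimal sketch; the exact bucket indices and counts are implementation-defined):

#include <bits/stdc++.h>
using namespace std;

int main() {
    unordered_map<string, int> table;
    table.reserve(16);
    table["apple"] = 1;
    table["banana"] = 2;
    table["pear"] = 3;

    // Each key is hashed and then mapped to one of bucket_count() buckets.
    cout << "bucket_count: " << table.bucket_count() << "\n";
    for (const auto &kv : table) {
        cout << kv.first << " -> bucket " << table.bucket(kv.first) << "\n";
    }

    // bucket_size(i) reveals chains: more than one key in a bucket means a collision.
    for (size_t i = 0; i < table.bucket_count(); ++i) {
        if (table.bucket_size(i) > 1)
            cout << "bucket " << i << " holds " << table.bucket_size(i) << " keys\n";
    }
    return 0;
}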

02 Intuition & Analogies

Imagine a row of lockers (the array of buckets). Every student (key) is assigned a locker number by a simple rule: take their student ID, do some arithmetic, and get a locker index. Most of the time, each locker holds exactly the right student’s stuff. But sometimes two students are assigned the same locker—that’s a collision. With chaining, we allow multiple students to share a locker by keeping a small list inside it; when a student comes to retrieve items, they check the list quickly. With open addressing, if a student finds their locker busy, they try the next locker, and the next, following a deterministic pattern (linear probing, quadratic probing, or double hashing) until they find an empty one. The key to speed is making sure the lockers aren’t too crowded (low load factor), and that our rule (hash function) spreads students out uniformly. If a prankster gives many students IDs that always map to the same locker, the line at that locker grows long, and everyone slows down. Randomizing our rule (adding a secret seed) makes the prankster’s job hard because, without knowing the secret, they can’t easily force all students into one locker. In practice, you also pick a bigger hallway of lockers when the school gets more students: that’s rehashing—build a bigger array and reassign everyone quickly using the same rule.

03 Formal Definition

Let U be the universe of keys and H: U → {0, 1, ..., m−1} be a hash function that maps each key to one of m buckets. For a set S ⊆ U with |S| = n, the load factor is α = n/m. A chained hash table stores, for each bucket i, a small list (typically a linked list or vector) of elements whose keys hash to i. Under the Simple Uniform Hashing Assumption (SUHA), where each key is equally likely to hash to any bucket independently, the expected bucket size is α, and search/insert/delete run in expected O(1 + α) time. Open addressing stores at most one item per bucket and uses a probe sequence p(k,0), p(k,1), ... to find a slot for key k; examples include linear probing, quadratic probing, and double hashing. For linear probing under SUHA with load factor α < 1, an unsuccessful search takes about \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right) probes in expectation, while a successful search takes about \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right) probes. In the worst case (adversarial keys or degenerate hashing), any standard hash table operation can take Θ(n). Randomized hashing and universal hash families reduce adversarial collision probability, offering expected guarantees even against adaptive inputs.
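
Plugging a few load factors into the linear-probing formulas above shows how quickly probe counts blow up near α = 1 (a small sketch that just evaluates the formulas, not a benchmark):

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Expected probes under SUHA for linear probing:
    //   successful:   0.5 * (1 + 1/(1 - a))
    //   unsuccessful: 0.5 * (1 + 1/(1 - a)^2)
    for (double a : {0.25, 0.5, 0.75, 0.9, 0.99}) {
        double succ = 0.5 * (1.0 + 1.0 / (1.0 - a));
        double fail = 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a)));
        cout << "alpha=" << a << "  successful~" << succ
             << "  unsuccessful~" << fail << "\n";
    }
    return 0;
}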

04 When to Use

Use a hash table whenever you need fast expected O(1) insert, find, and erase for keys without ordering requirements. Typical tasks include frequency counting (word counts, value frequencies), membership queries (set lookups), building adjacency maps in graphs, memoization/cache maps for dynamic programming, and mapping composite keys (like pairs or tuples) to values. In competitive programming, unordered_map and unordered_set handle most needs quickly, especially with reserve to reduce rehashing. When input size is large and performance is tight, gp_hash_table can be faster due to lower constant factors. If you require ordered iteration or operations like predecessor/successor, prefer balanced BSTs (map/set). If you need to handle extremely memory-constrained environments or require very predictable probe counts, consider open addressing variants (like linear probing with careful load-factor control) or specialized hash tables (robin hood hashing, hopscotch). For cryptographic collision resistance or untrusted inputs, do not rely on standard hash functions; instead, use randomized hashers (splitmix64 with a secret seed) or a universal hashing scheme, and consider salting keys to mitigate collision-based denial-of-service.
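
As one concrete example of a memoization/cache map mentioned above, here is a minimal sketch with unordered_map (the Fibonacci recurrence just stands in for a real DP state):

#include <bits/stdc++.h>
using namespace std;

unordered_map<long long, long long> memo;  // state -> cached result

long long fib(long long n) {
    if (n <= 1) return n;
    auto it = memo.find(n);          // lookup without inserting
    if (it != memo.end()) return it->second;
    long long res = fib(n - 1) + fib(n - 2);
    memo[n] = res;                   // cache the computed value
    return res;
}

int main() {
    memo.reserve(1 << 10);           // avoid rehashing during recursion
    cout << fib(80) << "\n";         // fast thanks to expected O(1) cache hits
    return 0;
}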

⚠️Common Mistakes

  • Forgetting to call reserve on unordered_map/unordered_set before bulk inserts, causing many rehashes and performance drops. Fix: call umap.reserve(expected_size) and optionally set max_load_factor.
  • Using composite keys (pair, vector, struct) without providing a proper hash function, leading to compilation errors or poor hashing. Fix: write a custom hash functor that also defines equality consistently.
  • Assuming O(1) worst-case time and ignoring adversarial inputs. Fix: know that worst-case is O(n) per operation; mitigate with randomized hashing and reasonable load factors.
  • Writing weak hashes (e.g., returning key % m directly or combining fields with naive XOR) that cause patterns and clustering. Fix: use well-mixed 64-bit hashers like splitmix64, and combine fields with robust mixing and a random seed.
  • Misusing operator[] on unordered_map when you only want to test membership; it inserts a default value and changes size. Fix: use find or contains (C++20) for lookup without insertion (see the sketch after this list).
  • Iteration invalidation surprises after rehash. Fix: know that rehash invalidates iterators and references; minimize holding long-lived iterators across inserts/rehashes.
  • High load factor in open addressing leading to skyrocketing probe counts. Fix: trigger a rehash once the load factor reaches about 0.5–0.75 and ensure good probing strategies.
  • Comparing pointers as keys when objects move or addresses are unstable. Fix: hash the content or use stable IDs instead of raw addresses.
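
The operator[] pitfall from the list above is easy to reproduce (a minimal sketch):

#include <bits/stdc++.h>
using namespace std;

int main() {
    unordered_map<string, int> freq;
    freq["apple"] = 3;

    // Pitfall: operator[] inserts a default-constructed value for missing keys.
    int looked_up = freq["orange"];   // looks like a read, but inserts {"orange", 0}
    cout << "looked_up = " << looked_up
         << ", size after operator[]: " << freq.size() << "\n";  // size is now 2

    // Fix: use find (or contains in C++20) for a pure membership test.
    bool has_pear = (freq.find("pear") != freq.end());
    cout << "has pear? " << has_pear << ", size: " << freq.size() << "\n";  // size unchanged
    return 0;
}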

Key Formulas

Load Factor

\alpha = \frac{n}{m}

Explanation: Alpha (α) is the average number of elements per bucket when n items are stored in m buckets. Keeping α small (e.g., around 0.5–1.0) helps ensure short chains and fast operations.

Chaining Expected Chain Length

E[\text{chain length}] = \alpha

Explanation: Under uniform hashing, the expected number of elements in any bucket is the load factor. This implies average search and insert time is proportional to 1 + α.

Expected Time with Chaining

T_{\text{search/insert/delete}} = O(1 + \alpha)

Explanation: Search/insert/delete in a chained hash table take constant time on average plus a term linear in the load factor. When α is bounded by a constant, this becomes expected O(1).

Linear Probing Successful Search

E[\text{probes}] \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)

Explanation: For open addressing with linear probing under uniform hashing, the expected number of probes for a successful search grows as the table fills. It diverges as α → 1.

Linear Probing Unsuccessful Search

E[\text{probes}] \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)

Explanation: The expected probes for an unsuccessful search grow faster than for successful searches as the load factor increases, highlighting the need to rehash before the table is too full.

Universal Hashing Collision Bound

\Pr[h(x) = h(y)] \le \frac{1}{m} \quad \text{for } x \ne y

Explanation: For a universal hash family, the probability that two distinct keys collide is at most 1/m. This provides expected guarantees even against adversarial inputs.

Birthday Collision Approximation

\Pr[\text{any collision}] \approx 1 - e^{-n(n-1)/(2m)}

Explanation: When hashing n random keys into m buckets, the chance of any collision follows a birthday-paradox-like curve. Collisions become likely once n is about \sqrt{m}.

Affine Universal Hash

h_{a,b}(k) = ((a k + b) \bmod p) \bmod m

Explanation: Choosing random a, b with 1 ≤ a ≤ p−1 and 0 ≤ b ≤ p−1 for a prime p larger than any key creates a universal family. It is simple and provides low collision probability for arbitrary inputs.
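
A minimal sketch of drawing one member of this affine family at random; the prime p, the bucket count m, and the 128-bit arithmetic are illustrative choices (assuming all keys are smaller than p):

#include <bits/stdc++.h>
using namespace std;

int main() {
    const uint64_t p = 2305843009213693951ULL;  // the Mersenne prime 2^61 - 1; must exceed every key
    const uint64_t m = 1 << 16;                 // number of buckets

    mt19937_64 rng(random_device{}());
    uint64_t a = rng() % (p - 1) + 1;           // 1 <= a <= p-1
    uint64_t b = rng() % p;                     // 0 <= b <= p-1

    auto h = [&](uint64_t k) {
        // h_{a,b}(k) = ((a*k + b) mod p) mod m, computed in 128 bits to avoid overflow
        __uint128_t t = ((__uint128_t)a * k + b) % p;
        return (uint64_t)(t % m);
    };

    cout << h(42) << " " << h(43) << "\n";      // distinct keys collide with probability <= 1/m
    return 0;
}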

Amortized Insert

T_{\text{insert, amortized}} = O(1)

Explanation: Although occasional rehashing costs O(n), spreading that cost over many inserts yields constant amortized time per insertion.

Space Complexity

S = O(n + m)

Explanation: A chained table stores m buckets plus n elements; an open-address table stores about m slots. Space is linear in the number of elements.

Complexity Analysis

Under the uniform hashing assumption, a chained hash table with m buckets and n elements has load factor α = n/m. Each operation examines O(1 + α) elements on average because the expected chain length in any bucket is α. If α is kept bounded by rehashing (increasing m as n grows), search, insert, and erase run in expected O(1) time. Worst-case behavior occurs when many keys collide into the same bucket; then operations degrade to Θ(n) for that bucket. Rehashing occasionally requires O(n) time to move all elements into a larger table, but since it happens infrequently (once per growth step), the amortized insertion time remains O(1). For open addressing, there is only one item per slot, so probe sequences are used to find positions. As the load factor approaches 1, expected probes grow rapidly. For linear probing, expected successful search probes are about ½(1 + 1/(1−α)), and unsuccessful about ½(1 + 1/(1−α)²). Hence, keeping α around 0.5–0.75 is crucial. Open addressing uses contiguous arrays, which improves cache locality and constants, but deletions require tombstones or rehashing. Space usage is O(m + n) for chaining (m buckets plus nodes storing keys/values and pointers). For open addressing, space is O(m), where m ≈ n/α. In practice, constant factors differ: chaining stores pointers and nodes (overhead), while open addressing stores keys/values inline (denser). Finally, adversarial inputs can force Θ(n) per operation; randomized hashing (e.g., splitmix64 with a random seed) reduces the chance of concentrated collisions and recovers expected O(1) bounds against adaptive inputs.
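
The amortized argument can be observed directly: the following sketch watches bucket_count() jump during a bulk insert (exact growth points are implementation-defined):

#include <bits/stdc++.h>
using namespace std;

int main() {
    unordered_map<int, int> mp;
    size_t buckets = mp.bucket_count();
    cout << "start buckets: " << buckets << "\n";

    for (int i = 0; i < 100000; ++i) {
        mp[i] = i;
        if (mp.bucket_count() != buckets) {
            // A rehash just happened: every element was moved, costing O(n) once.
            buckets = mp.bucket_count();
            cout << "after " << (i + 1) << " inserts, buckets grew to " << buckets << "\n";
        }
    }
    // Growth events are exponentially spaced, so total rehash work is O(n) overall
    // and each individual insert is O(1) amortized.
    return 0;
}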

Code Examples

Basics of unordered_map and unordered_set with tuning (reserve, max_load_factor)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: frequency counting with unordered_map
    unordered_map<string, int> freq;

    // Tuning: set max_load_factor first so that reserve accounts for it,
    // then reserve enough buckets to avoid many rehashes.
    // Suppose we expect up to 100000 distinct words.
    freq.max_load_factor(0.7f); // keep chains short
    freq.reserve(100000);

    vector<string> words = {"apple","banana","apple","pear","banana","apple"};
    for (const auto &w : words) {
        // operator[] inserts if missing, then increments
        ++freq[w];
    }

    // Query: check membership without inserting using find/contains
#if __cplusplus >= 202002L
    bool has_orange = freq.contains("orange");
#else
    bool has_orange = (freq.find("orange") != freq.end());
#endif

    cout << "has_orange? " << (has_orange ? "yes" : "no") << "\n";

    // Iterate and print frequencies
    for (auto &kv : freq) {
        cout << kv.first << ": " << kv.second << "\n";
    }

    // unordered_set example: membership only
    unordered_set<long long> seen;
    seen.reserve(1000);
    seen.insert(10);
    seen.insert(20);
    cout << (seen.count(10) ? "present" : "absent") << "\n";

    // Introspection: bucket_count and load_factor
    cout << "bucket_count: " << freq.bucket_count() << " load_factor: " << freq.load_factor() << "\n";

    return 0;
}

This program demonstrates typical operations: insert via operator[] for unordered_map, membership queries via find/contains, and iteration. It also shows performance tuning with reserve and max_load_factor to reduce rehashing and keep expected O(1) operations. The unordered_set example shows pure membership use cases.

Time: Expected O(1) per insert/lookup/erase; rehash is O(n) when it occurs.
Space: O(n) for elements plus O(m) buckets; overall linear.
Custom randomized hash for pairs and vectors to resist anti-hash attacks
#include <bits/stdc++.h>
using namespace std;

// A strong 64-bit mixer: splitmix64
static uint64_t splitmix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

struct custom_hash {
    // Random seed to defeat adversarial inputs
    const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();

    // Hash for 64-bit integers
    size_t operator()(uint64_t x) const noexcept {
        return (size_t)splitmix64(x + FIXED_RANDOM);
    }

    // Overload for signed long long
    size_t operator()(long long x) const noexcept {
        return (*this)((uint64_t)x);
    }

    // Hash combine helper
    size_t combine(size_t a, size_t b) const noexcept {
        // Similar to boost::hash_combine but with extra mixing
        uint64_t res = a;
        res ^= b + 0x9e3779b97f4a7c15ULL + (res << 6) + (res >> 2);
        return (size_t)splitmix64(res + FIXED_RANDOM);
    }

    // Hash for pair<T,U>
    template<class T, class U>
    size_t operator()(const pair<T,U>& p) const noexcept {
        size_t h1 = (*this)((uint64_t)std::hash<T>{}(p.first));
        size_t h2 = (*this)((uint64_t)std::hash<U>{}(p.second));
        return combine(h1, h2);
    }

    // Hash for vector<int> (illustrative; for other types, generalize)
    size_t operator()(const vector<int>& v) const noexcept {
        size_t h = 0;
        for (int x : v) {
            size_t hx = (*this)((uint64_t)(uint32_t)x);
            h = combine(h, hx);
        }
        return h;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Using pair<int,int> as a key
    unordered_map<pair<int,int>, int, custom_hash> grid_cost;
    grid_cost.reserve(1 << 16);

    grid_cost[{1,2}] = 5;
    grid_cost[{2,3}] = 7;
    cout << grid_cost[{1,2}] << "\n"; // 5

    // Using vector<int> as a key
    unordered_map<vector<int>, string, custom_hash> pattern_name;
    pattern_name.reserve(1024);
    pattern_name[{1,2,3}] = "seq123";
    pattern_name[{2,3,5,7}] = "primes";
    cout << pattern_name[{2,3,5,7}] << "\n";

    // Anti-hash: show that many equal hashes won't easily occur with random seed
    unordered_set<long long, custom_hash> us;
    us.reserve(100000);
    for (long long i = 0; i < 100000; ++i) us.insert(i * 1000003LL);
    cout << "size: " << us.size() << "\n"; // Should be 100000

    return 0;
}

We define a custom_hash using splitmix64 with a random per-process seed to complicate adversarial collision attempts. The functor supports 64-bit integers, pairs, and vectors by robustly combining component hashes. This enables unordered containers to accept composite keys while maintaining good performance.

Time: Expected O(1) per operation; hashing a key is O(k) where k is the key’s length (e.g., vector size).
Space: O(n) for stored elements; hash functor has negligible overhead.
Implementing a simple open-addressing hash table (linear probing) for integers
#include <bits/stdc++.h>
using namespace std;

class LinearProbingHash {
    // States for each slot
    enum State : uint8_t { EMPTY = 0, FILLED = 1, DELETED = 2 };

    struct Slot {
        long long key = 0;
        State state = EMPTY;
    };

    vector<Slot> table;
    size_t n = 0;            // number of FILLED keys
    double max_alpha = 0.7;  // rehash threshold

    static inline uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    size_t H(long long x) const {
        // Simple 64-bit mixing, then map into table size
        return (size_t)(splitmix64((uint64_t)x)) & (table.size() - 1);
    }

    void rehash(size_t new_m) {
        // new_m must be a power of two for &-masking to work
        vector<Slot> old = move(table);
        table.assign(new_m, {});
        n = 0;
        for (const auto &s : old) if (s.state == FILLED) insert(s.key);
    }

    void maybe_grow() {
        if ((double)n / (double)table.size() > max_alpha) {
            rehash(table.size() * 2);
        }
    }

public:
    explicit LinearProbingHash(size_t m = 8) {
        // Round up m to power of two
        size_t pw = 1;
        while (pw < m) pw <<= 1;
        table.assign(pw, {});
    }

    bool insert(long long key) {
        maybe_grow();
        size_t m = table.size();
        size_t i = H(key);
        size_t first_deleted = SIZE_MAX;
        while (true) {
            if (table[i].state == EMPTY) {
                size_t pos = (first_deleted != SIZE_MAX ? first_deleted : i);
                table[pos].key = key;
                table[pos].state = FILLED;
                ++n;
                return true; // inserted new
            } else if (table[i].state == DELETED) {
                if (first_deleted == SIZE_MAX) first_deleted = i; // remember first tombstone
            } else if (table[i].state == FILLED && table[i].key == key) {
                return false; // already present
            }
            i = (i + 1) & (m - 1); // linear probing with wrap-around
        }
    }

    bool contains(long long key) const {
        size_t m = table.size();
        size_t i = H(key);
        while (true) {
            if (table[i].state == EMPTY) return false; // stop at empty slot
            if (table[i].state == FILLED && table[i].key == key) return true;
            i = (i + 1) & (m - 1);
        }
    }

    bool erase(long long key) {
        size_t m = table.size();
        size_t i = H(key);
        while (true) {
            if (table[i].state == EMPTY) return false;
            if (table[i].state == FILLED && table[i].key == key) {
                table[i].state = DELETED; // tombstone to preserve probe chains
                --n;
                return true;
            }
            i = (i + 1) & (m - 1);
        }
    }

    size_t size() const { return n; }
    size_t capacity() const { return table.size(); }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    LinearProbingHash hs;
    for (int i = 0; i < 10; ++i) hs.insert(i * 3);

    cout << "contains 9? " << (hs.contains(9) ? "yes" : "no") << "\n";
    cout << "contains 12? " << (hs.contains(12) ? "yes" : "no") << "\n";
    cout << "size: " << hs.size() << " capacity: " << hs.capacity() << "\n";

    hs.erase(9);
    cout << "contains 9 after erase? " << (hs.contains(9) ? "yes" : "no") << "\n";

    return 0;
}

This is a minimal open-addressing hash table for integers using linear probing with tombstones and power-of-two capacity for fast modulo via bit-masking. It demonstrates hashing, probing, insertion, lookup, deletion, and dynamic growth via rehashing at a target load factor. Note that tombstones are not counted toward the load factor here, so long sequences of interleaved inserts and erases can clutter the table with DELETED slots and lengthen probes; a production version would also rehash when filled-plus-deleted slots exceed the threshold.

Time: Expected O(1) per operation when load factor is bounded; worst-case O(n).
Space: O(m) where m is the current table capacity.
High-performance gp_hash_table (pb_ds) with randomized hasher
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

static uint64_t splitmix64_u(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

struct fast_hash {
    const uint64_t SEED = chrono::steady_clock::now().time_since_epoch().count();
    size_t operator()(uint64_t x) const { return (size_t)splitmix64_u(x + SEED); }
    size_t operator()(long long x) const { return operator()((uint64_t)x); }
};

// A resize policy with external size access is required so that resize() can pre-allocate buckets.
template<class K, class V>
using fast_table = gp_hash_table<K, V, fast_hash, equal_to<K>,
    direct_mask_range_hashing<>, linear_probe_fn<>,
    hash_standard_resize_policy<hash_exponential_size_policy<>,
        hash_load_check_resize_trigger<>, true>>;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // gp_hash_table<Key, Mapped, Hash, ...>
    fast_table<long long, int> mp;
    mp.resize(1 << 16); // pre-allocate buckets

    // Example: count occurrences
    vector<long long> a = {1,2,3,2,1,4,5,2,1};
    for (auto x : a) mp[x]++;

    cout << "count(2) = " << mp[2] << "\n";

    // Membership-style container: value-less map (mapped type = null_type)
    fast_table<long long, null_type> st;
    st.resize(1 << 16);
    st.insert(10);
    st.insert(20);
    cout << (st.find(10) != st.end() ? "present" : "absent") << "\n";

    return 0;
}

The GNU pb_ds gp_hash_table is often faster than unordered_map due to lower constant factors and cache-friendly design. We use a randomized fast_hash to reduce collision risks, and pass a hash_standard_resize_policy with external size access so that resize can pre-allocate buckets. The example shows both key→value and key-only (set-like) usage by using null_type for the mapped value.

Time: Expected O(1) per operation; rehash/resizes are O(n).
Space: O(n) with implementation-dependent overhead.