⚙️AlgorithmIntermediate

XOR Hashing (Zobrist Hashing)

Key Points

  • XOR (Zobrist) hashing assigns a random 64-bit number to each possible element and hashes a set by XORing the numbers of present elements.
  • It is order-independent and supports O(1) incremental updates: toggle an element by XORing its random number.
  • For multisets, parity alone is not enough; you must incorporate counts (e.g., use a random value that depends on both element and count).
  • Collision probability for two independent 64-bit hashes is about 1 in 2^64, which is negligible for most competitive programming tasks.
  • The method shines in game state hashing, duplicate-state detection in BFS/DFS, and quick equality checks of sets or bag-like structures.
  • Zobrist hashing updates game positions very efficiently by XORing out the piece-from-square and XORing in the piece-to-square.
  • For large collections, consider the birthday paradox: with m hashes, collision probability grows roughly like / 2^(b+1).
  • Use 64-bit or 128-bit hashes, secure seeding, and optional double hashing to reduce adversarial collisions.

Prerequisites

  • Bitwise operations and XORUnderstanding XOR properties (associative, commutative, self-inverse) explains why toggling and order-independence work.
  • Hashing and hash tablesKnowing how hashes are used in unordered_map/set motivates why fast, low-collision hashes matter.
  • Randomization and probabilityCollision analysis relies on uniformity assumptions and the birthday paradox.
  • Graphs and trees (DFS, centroids)Tree isomorphism hashing uses DFS to aggregate child hashes and centroids for unrooted trees.
  • Game state representationMapping board features (piece–square, side-to-move) to tokens is central in Zobrist hashing.

Detailed Explanation

Tap terms for definitions

01Overview

XOR hashing, popularly known as Zobrist hashing in game AI, is a randomized hashing technique for sets, multisets, and composite states. The core idea is to preassign a random 64-bit number to every possible atomic feature (for example, a specific piece type on a specific square, or a particular element value). The hash of a state is then the bitwise XOR of the random numbers corresponding to the features that are present in the state. Because XOR is commutative and associative, the final hash does not depend on the order in which we combine features, which makes it perfect for hashing sets. Even better, XOR has a convenient invertibility property: x ⊕ x = 0. This enables O(1) incremental updates—adding or removing a feature is simply XORing with its number. Random 64-bit assignments make different states map to seemingly unrelated values, providing a low probability of collision. In practice, this technique is used extensively in chess engines and other game AIs to maintain transposition tables, as well as in competitive programming for deduplication, multiset comparison, and hashing complex combinatorial objects like trees. While not cryptographically secure, XOR hashing is fast, memory-efficient, and remarkably effective when implemented with care.

02Intuition & Analogies

Imagine you keep a keyring where each unique item in your collection is represented by a special key with a secret pattern. To describe your whole collection, you shine a blacklight over your keyring, and the patterns of the keys magically combine to show a single glowing color. If you add a key, the glow changes in a predictable way; if you remove that exact key, the glow reverts—like flipping a light switch on and off. That glow is your hash. The secret is that each key’s pattern was assigned randomly up front, so two different sets of keys almost never glow the exact same way. Now think about a board game like chess. Each square-piece pair gets its own secret key. The position’s glow is the XOR of the keys for all pieces currently on the board. Make a move? You don’t need to rescan the whole board. You just turn off the key for the piece leaving its square and turn on the key for the piece entering the destination square. Captures? Toggle the captured piece’s key off too. This makes updating the glow nearly free, even though the board has many squares. For bags (multisets), imagine not just whether a key is on the ring, but how many copies are there. If you only used one key per type, two copies would cancel if you toggled twice—bad! So you need a key that depends on both the item and its count, like a custom-cut key per (item, count) pair. When the count changes from c to c+1, you swap the old key for the new one. The glow still updates in constant time, and now it encodes multiplicities correctly.

03Formal Definition

Let U be a finite universe of atomic features (e.g., elements of a set, piece–square pairs). Choose a function R: U → {0,1}^{b} that maps each feature u ∈ U to an independent b-bit random value. For a set S ⊆ U, define the Zobrist/XOR hash as H(S) = ⊕_{u ∈ S} R(u), where ⊕ denotes bitwise XOR and the empty set hashes to 0. By commutativity and associativity of XOR, H(S) is order-independent. Furthermore, if S' = S Δ {v} (symmetric difference), then H(S') = H(S) ⊕ R(v), enabling O(1) toggles. For multisets M: U → ℕ, a parity-only construction (M) = ⊕_{u} R(u) repeated M(u) times loses multiplicity information modulo 2. To encode counts, define a random mapping : U × ℕ → {0,1}^{b} and set H(M) = ⊕_{u} (u, M(u)). Incrementally, when M(u) changes from c to c', update (u, c) ⊕ (u, c'). Under the assumption that R (or ) acts like a family of independent uniform random variables, the collision probability for two distinct states is approximately 2^{-b}. For many practical applications, is sufficient; for added safety, use two independent 64-bit hashes to emulate a 128-bit hash.

04When to Use

Use XOR/Zobrist hashing when you need very fast, incremental, order-independent hashing with extremely low collision probability but do not require cryptographic guarantees. Classic uses include: (1) Game state hashing (e.g., chess), where each move updates the hash by toggling 2–4 random values for piece movement, captures, castling rights, and side-to-move. (2) Duplicate detection in BFS/DFS or memoization maps—store visited states by their hashes to prune search trees quickly. (3) Checking equality of sets or multisets on the fly—keep a running XOR hash and compare; for critical correctness, optionally verify equality on collision. (4) Cycle detection in permutations or sequence of operations—maintain a hash of the current configuration and recognize repeats cheaply. (5) Tree isomorphism sketches—hash each node from the multiset of child hashes using a multiset-aware XOR mapping; useful for comparing unordered rooted trees. (6) Order-invariant substring or window features—when the bag of characters/items matters rather than their order. Avoid XOR hashing if adversaries can adapt inputs to force collisions, or when a cryptographic hash is required. Also avoid the naive parity-only approach for multisets if counts matter.

⚠️Common Mistakes

• Using parity-only XOR for multisets: XORing an element’s random value M(u) times only preserves M(u) modulo 2, so pairs cancel. Fix: Hash counts by XORing R_c(u, count) and swap out the old-count token when the count changes. • Too few bits: 32-bit XOR hashes have collision risks in large datasets due to the birthday paradox. Prefer 64-bit (or two independent 64-bit values) for safety. • Recomputing from scratch: The power of Zobrist hashing is O(1) updates. Don’t iterate the whole state; instead, XOR out what left and XOR in what arrived. • Weak or predictable seeding: If inputs can be adversarial (e.g., online judges with crafted tests), seed your random mapping unpredictably (e.g., from a high-resolution clock) and consider double hashing. • Forgetting all state features: In game positions, include side-to-move, castling rights, en passant squares, etc. Missing features cause distinct states to share a hash. • Assuming zero collision probability: Collisions are extremely unlikely but not impossible. For mission-critical equality checks, confirm with a secondary exact comparison when hashes match. • Mixing arithmetic incorrectly: For counts, do not just multiply a fixed R(u) by count and XOR—it can create linear relations. Prefer an R_c(u, c) derived via a strong mixer (e.g., splitmix64) on the pair (u, c).

Key Formulas

Set Hash

Explanation: The hash of a set S is the XOR of random values assigned to each present element. Because XOR is commutative and associative, order does not matter.

Incremental Toggle

Explanation: Adding or removing a single element v updates the hash by XORing its random value. Toggling the same element twice restores the original hash.

Multiset Hash

Explanation: For a multiset M with counts M(u), use a random value that depends on both the element and its count. When a count changes, XOR out the old and XOR in the new one.

Pairwise Collision

Explanation: Two independently hashed distinct states have about 1 in 2^b chance to collide. With b = 64, this is negligible for typical programming tasks.

Birthday Bound

Explanation: With m random b-bit hashes, the probability of at least one collision follows the birthday paradox. It shows how collision risk grows quadratically with m.

XOR Laws

Explanation: Self-inverse, commutative, and associative properties justify order independence and constant-time toggling.

Double Hashing

Explanation: Using two independent 64-bit XOR hashes emulates a 128-bit hash. Both must match, pushing collision probability down to about 2^{-128}.

Complexity Analysis

Let U be the universe of features. Preprocessing typically assigns a random 64-bit number to each feature u ∈ U. If the universe is explicit and small (e.g., chess piece–square pairs), we precompute a table of size , using O() time and space. If U is large or implicit (e.g., arbitrary integers), we lazily derive R(u) by passing u and a seed through a 64-bit mixer like splitmix64, achieving O(1) per new feature and O(k) total space for k features encountered. Computing the hash of a state from scratch requires XORing one 64-bit number per present feature. If a state has s features, this is O(s) time. The main advantage is incremental updates: when the state changes by a constant number of local edits (like a piece moving squares or a set element toggled), the hash update is O(1) time—just a handful of XOR operations. This is critical in search algorithms, where millions of small updates occur. Space usage is dominated by the random table (if precomputed) or by the code and constants used by the mixer. A 64-bit hash uses 8 bytes; storing it in a hash table adds overhead but remains compact. Collision probability for one comparison is approximately 2^{-64}. For m stored states, the expected number of collisions scales like Θ( / 2^{65}) by the birthday paradox. In competitive programming at 1700–2100 level, even millions of states remain safe with 64-bit hashes; for additional safety or adversarial settings, use two independent 64-bit hashes (about 16 bytes total) to drive down collision risk to ≈ 2^{-128}.

Code Examples

Set and Multiset XOR Hashing with O(1) Incremental Updates
1#include <bits/stdc++.h>
2using namespace std;
3
4// A fast 64-bit mixer (public domain). Deterministically maps integers to pseudo-random 64-bit values.
5static inline uint64_t splitmix64(uint64_t x) {
6 x += 0x9e3779b97f4a7c15ULL;
7 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
8 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
9 return x ^ (x >> 31);
10}
11
12struct XorSetHash {
13 // Hash of a set of integers (order-independent). Add/remove by toggling R(x).
14 uint64_t H = 0;
15 uint64_t seed;
16 explicit XorSetHash(uint64_t seed_ = chrono::high_resolution_clock::now().time_since_epoch().count()) : seed(seed_) {}
17 uint64_t R(int x) const { return splitmix64(seed ^ (uint64_t)x * 0x9e3779b97f4a7c15ULL); }
18 // Toggle presence of x
19 void toggle(int x) { H ^= R(x); }
20 // Explicit add/remove via a maintained membership set (optional)
21};
22
23struct XorMultisetHash {
24 // Correctly handles counts by mapping (x, count) to a random 64-bit value.
25 uint64_t H = 0;
26 uint64_t seed;
27 unordered_map<int,int> cnt; // current counts
28 explicit XorMultisetHash(uint64_t seed_ = chrono::high_resolution_clock::now().time_since_epoch().count() ^ 0x12345678ULL)
29 : seed(seed_) {}
30 // Random token for the pair (x, c). c==0 maps to 0 implicitly (we never XOR R_c(x,0)).
31 uint64_t R_c(int x, int c) const {
32 // Derive from both x and c using independent multipliers to avoid linear relations.
33 uint64_t key = (uint64_t)x * 0x9e3779b97f4a7c15ULL ^ (uint64_t)c * 0xbf58476d1ce4e5b9ULL ^ seed;
34 return splitmix64(key);
35 }
36 void add(int x, int delta = 1) {
37 int oldc = cnt[x];
38 int newc = oldc + delta;
39 // XOR out old token, XOR in new token
40 if (oldc > 0) H ^= R_c(x, oldc);
41 if (newc > 0) H ^= R_c(x, newc);
42 if (newc <= 0) cnt.erase(x); else cnt[x] = newc;
43 }
44 void remove(int x, int delta = 1) { add(x, -delta); }
45};
46
47int main() {
48 ios::sync_with_stdio(false);
49 cin.tie(nullptr);
50
51 // Demonstrate set hashing (order-independent & incremental)
52 XorSetHash sh(123456789ULL);
53 vector<int> a = {5, 10, 42, 10, 5};
54 // We'll maintain real set membership for demo clarity
55 unordered_set<int> S;
56 for (int x : a) {
57 if (S.count(x)) { // toggle out
58 S.erase(x);
59 sh.toggle(x);
60 } else { // toggle in
61 S.insert(x);
62 sh.toggle(x);
63 }
64 }
65 cout << "Set hash after toggles: " << sh.H << "\n";
66
67 // Demonstrate multiset hashing (counts matter)
68 XorMultisetHash mh(987654321ULL);
69 mh.add(7); // count(7)=1
70 mh.add(7); // count(7)=2
71 mh.add(3, 5); // count(3)=5
72 uint64_t h1 = mh.H;
73 mh.remove(7); // count(7)=1
74 mh.add(7); // count(7)=2, returns to previous state
75 uint64_t h2 = mh.H;
76 cout << "Multiset hash stable after undo: " << boolalpha << (h1 == h2) << "\n";
77
78 // Equality check via hashes (with optional verification)
79 XorMultisetHash A(13579ULL), B(13579ULL); // same seed so same R_c mapping
80 vector<int> X = {1,1,2,4,4,4};
81 vector<int> Y = {4,2,4,1,4,1};
82 for (int v : X) A.add(v);
83 for (int v : Y) B.add(v);
84 cout << "Hashes equal (expected true): " << (A.H == B.H) << "\n";
85 // Optional: verify counts on collision just to be 100% sure in critical code.
86
87 return 0;
88}
89

This program implements two hashers. XorSetHash supports toggling membership of elements, suitable for plain sets when elements appear at most once. XorMultisetHash handles counts by assigning a random token to the pair (element, count) and swapping the token whenever the count changes, ensuring multiplicities are captured. The example demonstrates order independence, O(1) updates, and equality checking of multisets constructed in different orders.

Time: O(1) per add/remove/toggle; O(s) to build from scratch for s elementsSpace: O(k) for k distinct elements encountered (for counts and any cached tokens)
Zobrist Hashing for a Board Game with Incremental Move Updates
1#include <bits/stdc++.h>
2using namespace std;
3
4static inline uint64_t splitmix64(uint64_t x) {
5 x += 0x9e3779b97f4a7c15ULL;
6 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
7 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
8 return x ^ (x >> 31);
9}
10
11struct Zobrist {
12 // Board: 8x8, piece types: 12 (e.g., chess pieces: 6 per color)
13 static constexpr int N = 8, SQUARES = N * N, PIECES = 12;
14 uint64_t Z[SQUARES][PIECES];
15 uint64_t sideToMove; // optional feature
16 uint64_t seed;
17 Zobrist(uint64_t seed_ = chrono::high_resolution_clock::now().time_since_epoch().count()) : seed(seed_) {
18 // Initialize table deterministically from seed
19 for (int s = 0; s < SQUARES; ++s) {
20 for (int p = 0; p < PIECES; ++p) {
21 uint64_t key = seed ^ (uint64_t)s * 0x9e3779b97f4a7c15ULL ^ (uint64_t)p * 0xbf58476d1ce4e5b9ULL;
22 Z[s][p] = splitmix64(key);
23 }
24 }
25 sideToMove = splitmix64(seed ^ 0xabcdef123456789ULL);
26 }
27};
28
29struct Board {
30 // -1 means empty, otherwise 0..11 piece index
31 array<int, Zobrist::SQUARES> sq{};
32 bool whiteToMove = true;
33 const Zobrist &Z;
34 uint64_t H = 0;
35 Board(const Zobrist &Z_) : Z(Z_) { sq.fill(-1); }
36
37 static int idx(int r, int c) { return r * Zobrist::N + c; }
38
39 void place(int r, int c, int piece) {
40 int s = idx(r,c);
41 if (sq[s] != -1) throw runtime_error("square not empty");
42 sq[s] = piece;
43 H ^= Z.Z[s][piece];
44 }
45 void remove(int r, int c) {
46 int s = idx(r,c);
47 if (sq[s] == -1) throw runtime_error("square empty");
48 H ^= Z.Z[s][sq[s]];
49 sq[s] = -1;
50 }
51 void movePiece(int r1, int c1, int r2, int c2) {
52 int s1 = idx(r1,c1), s2 = idx(r2,c2);
53 int p = sq[s1];
54 if (p == -1) throw runtime_error("no piece at from-square");
55 // XOR out piece from s1
56 H ^= Z.Z[s1][p];
57 // If capture on s2, XOR out captured piece
58 if (sq[s2] != -1) H ^= Z.Z[s2][sq[s2]];
59 // Move piece
60 sq[s2] = p; sq[s1] = -1;
61 // XOR in piece at s2
62 H ^= Z.Z[s2][p];
63 // Toggle side to move
64 whiteToMove = !whiteToMove;
65 H ^= Z.sideToMove;
66 }
67
68 // Recompute from scratch to test correctness
69 uint64_t recompute() const {
70 uint64_t h = 0;
71 for (int s = 0; s < Zobrist::SQUARES; ++s) if (sq[s] != -1) h ^= Z.Z[s][sq[s]];
72 if (whiteToMove) h ^= Z.sideToMove; else h ^= 0ULL ^ 0ULL; // toggled parity
73 return h;
74 }
75};
76
77int main(){
78 ios::sync_with_stdio(false);
79 cin.tie(nullptr);
80
81 Zobrist Z(2024ULL);
82 Board B(Z);
83
84 // Place a few pieces: White pawn (0), White knight (1), Black pawn (6), ... (indices are arbitrary here)
85 B.place(1, 0, 0); // white pawn at a2
86 B.place(0, 1, 1); // white knight at b1
87 B.place(6, 0, 6); // black pawn at a7
88
89 uint64_t h0 = B.H;
90 // A simple move: knight from b1 to c3
91 B.movePiece(0,1,2,2);
92
93 uint64_t h_inc = B.H;
94 uint64_t h_rec = B.recompute();
95 cout << "Incremental equals recomputed: " << boolalpha << (h_inc == h_rec) << "\n";
96
97 // Another move with capture: white pawn from a2 to a7 capturing black pawn
98 // For demo only; not legal chess, but shows XOR updates.
99 B.movePiece(1,0,6,0);
100 cout << "Hash after capture: " << B.H << "\n";
101
102 return 0;
103}
104

This example builds a Zobrist table for an 8×8 board with 12 piece types. The board hash is the XOR of Z[square][piece] for all occupied squares plus a side-to-move token. A move updates the hash by XORing out the from-square piece, optionally XORing out a captured piece, XORing in the to-square piece, and toggling the side token. A recomputation cross-check confirms that incremental updates match the full recomputation.

Time: O(1) per move update; O(64) to recompute from scratch (or O(#pieces))Space: O(64*12) for the Zobrist table; O(1) for the current hash
Tree Isomorphism Sketch via XOR Multiset Hashing of Child Subtrees
1#include <bits/stdc++.h>
2using namespace std;
3
4static inline uint64_t splitmix64(uint64_t x) {
5 x += 0x9e3779b97f4a7c15ULL;
6 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
7 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
8 return x ^ (x >> 31);
9}
10
11struct CountHasher {
12 uint64_t seed;
13 explicit CountHasher(uint64_t seed_ = chrono::high_resolution_clock::now().time_since_epoch().count()) : seed(seed_) {}
14 // Random token for (val, count)
15 uint64_t Rc(uint64_t val, int cnt) const {
16 uint64_t key = seed ^ val * 0x9e3779b97f4a7c15ULL ^ (uint64_t)cnt * 0xbf58476d1ce4e5b9ULL;
17 return splitmix64(key);
18 }
19 // Hash a multiset of child-hashes given as a vector (order-independent, count-aware)
20 uint64_t multiset_hash(const vector<uint64_t>& v) const {
21 // Count equal child hashes
22 vector<uint64_t> w = v;
23 sort(w.begin(), w.end());
24 uint64_t H = 0;
25 for (size_t i = 0; i < w.size();) {
26 size_t j = i;
27 while (j < w.size() && w[j] == w[i]) ++j;
28 int cnt = (int)(j - i);
29 H ^= Rc(w[i], cnt);
30 i = j;
31 }
32 // Also incorporate degree to reduce trivial collisions
33 H ^= Rc(0xDEADBEEFCAFEBABEULL, (int)v.size());
34 return H;
35 }
36};
37
38// Compute an isomorphism-invariant hash of an unordered rooted tree.
39uint64_t tree_hash_rooted(const vector<vector<int>>& g, int u, int p, const CountHasher& CH) {
40 vector<uint64_t> child;
41 for (int v : g[u]) if (v != p) child.push_back(tree_hash_rooted(g, v, u, CH));
42 return CH.multiset_hash(child);
43}
44
45// Unrooted tree: hash each centroid and combine (two centroids possible)
46vector<int> centroids(const vector<vector<int>>& g) {
47 int n = (int)g.size();
48 vector<int> sz(n,0);
49 function<void(int,int)> dfs = [&](int u,int p){
50 sz[u] = 1;
51 for (int v: g[u]) if (v!=p){ dfs(v,u); sz[u]+=sz[v]; }
52 }; dfs(0,-1);
53 int n2 = n;
54 vector<int> parent(n,-1);
55 queue<int> q;
56 vector<int> deg(n); for(int i=0;i<n;++i){ deg[i]=g[i].size(); if(deg[i]<=1) q.push(i); }
57 int removed = 0;
58 while(removed < n2-2){
59 int qs = q.size();
60 for(int i=0;i<qs;++i){
61 int u=q.front(); q.pop(); removed++;
62 for(int v: g[u]) if(--deg[v]==1) q.push(v);
63 }
64 }
65 vector<int> C;
66 while(!q.empty()){ C.push_back(q.front()); q.pop(); }
67 sort(C.begin(), C.end()); C.erase(unique(C.begin(), C.end()), C.end());
68 return C;
69}
70
71int main(){
72 ios::sync_with_stdio(false);
73 cin.tie(nullptr);
74
75 int n1 = 5, n2 = 5;
76 vector<vector<int>> g1(n1), g2(n2);
77 // Tree 1 edges: 0-1, 1-2, 1-3, 3-4 (an unordered shape)
78 auto add_edge=[&](vector<vector<int>>& g,int a,int b){ g[a].push_back(b); g[b].push_back(a); };
79 add_edge(g1,0,1); add_edge(g1,1,2); add_edge(g1,1,3); add_edge(g1,3,4);
80 // Tree 2 is an isomorphic copy with relabeled nodes
81 add_edge(g2,2,0); add_edge(g2,0,4); add_edge(g2,0,1); add_edge(g2,1,3);
82
83 CountHasher CH(424242ULL);
84
85 auto C1 = centroids(g1); auto C2 = centroids(g2);
86 vector<uint64_t> H1, H2;
87 for (int c : C1) H1.push_back(tree_hash_rooted(g1, c, -1, CH));
88 for (int c : C2) H2.push_back(tree_hash_rooted(g2, c, -1, CH));
89 sort(H1.begin(), H1.end()); sort(H2.begin(), H2.end());
90
91 cout << "Trees isomorphic (by hash): " << boolalpha << (H1 == H2) << "\n";
92 return 0;
93}
94

This program computes an isomorphism-invariant hash of an unordered tree. Each node’s hash is the XOR of random tokens associated with the multiset of its children’s hashes, where each distinct child-hash is paired with its count. This avoids the parity problem of raw XOR. For unrooted trees, we hash from centroids and compare the multiset of centroid hashes between two trees. While still probabilistic, this is highly reliable with 64-bit (or 128-bit) hashing.

Time: O(n log n) per tree due to sorting children’s hashes at each node (amortized), typically O(n) with counting/hash maps; centroid finding is O(n)Space: O(n) recursion/auxiliary storage