Linear Basis for XOR
Key Points
- •A linear basis for XOR is a compact set of at most W numbers (W = number of bits) that can generate every XOR value obtainable from a multiset of numbers.
- •Insertion reduces a new number by existing basis vectors from highest bit to lowest; if it cannot be fully reduced to zero, it becomes a new basis vector.
- •Queries like maximum subset XOR or maximum XOR with a given mask are answered greedily in O(W) time by toggling basis vectors when they increase the answer.
- •The basis formalizes Gaussian elimination over the field \(\), treating each integer as a W-dimensional bit-vector.
- •If the basis has rank r, then there are exactly \(2^r\) distinct subset XOR values and a target value has \(2^{n-r}\) representing subsets if it is reachable.
- •You can rebuild the basis into a canonical (reduced) form to get the k-th smallest distinct subset XOR in O(W) using bit-index mapping.
- •Merging two bases is done by inserting each vector of one into the other, keeping complexity O() in the worst case (but practically tiny since or 64).
- •Linear XOR bases power many CF problems around 1900–2300 rating: ranges with timestamps, offline queries, k-th subset XOR, and counting representations.
Prerequisites
- →Bitwise operations and binary representation — Linear XOR basis manipulates bits directly; understanding shifts, masks, and XOR is essential.
- →Gaussian elimination (high level) — The basis construction is the XOR analogue of Gaussian elimination; the idea of pivots and row operations carries over.
- →Vector spaces over fields — The theory uses \(\mathbb{F}_2\) as the field; linear independence, span, and rank come from linear algebra.
- →Logarithms and bit length — Complexities are expressed in terms of W = \(\lceil \log_2(\text{MAX}) \rceil\); understanding bit length helps reason about bounds.
- →Modular arithmetic and fast exponentiation — Counting representations requires computing \(2^{n-r}\) modulo a prime efficiently.
Detailed Explanation
Tap terms for definitions01Overview
Hook: Imagine you have many switches (bits) on many devices (numbers). Flipping some devices together (XOR) creates new light patterns. You want a smallest toolset that can reproduce any light pattern you could make by flipping some subset. That toolset is the XOR linear basis. Concept: A linear basis for XOR is a compact representation of all subset XORs of a set of integers. It consists of at most W numbers (where W is the bit-width, like 60 for 64-bit signed values up to 1e18). Each basis vector has a unique highest set bit (pivot). To insert a number, you keep canceling its highest set bit using existing basis vectors; if you end with a nonzero number, it adds a new independent pivot. Queries like “maximum subset XOR” or “maximum XOR against a mask x” work by greedily attempting to improve the current value using basis vectors in descending pivot order. Example: Suppose you have numbers [3, 5, 6] (011, 101, 110). Building a basis might yield [6 (110), 5 (101)]. With these two, you can generate 0, 5, 6, 3 (= 5 xor 6). So any subset XOR is one of four values, and the maximum subset XOR is 6. If you query “max XOR with x = 1 (001)”, greedily toggling basis vectors gives 7 (= 1 xor 6).
02Intuition & Analogies
Think of each number as a recipe of switches (bits) it flips: 1 means “flip this switch,” 0 means “leave it.” XOR is like combining recipes where flips cancel if done twice. A linear basis is your minimal cookbook—no recipe in it can be recreated by combining the others, and together they can produce every recipe your original collection could. When you insert a new recipe, you try to cancel its biggest distinctive flip using existing recipes. If after all cancellations something remains, that remainder has a brand-new biggest flip you didn’t have; you keep it. This guarantees you never store redundant recipes. To maximize XOR, picture a scoreboard showing your current value. You look at your recipes from the one that changes the top-most scoreboard digit (most significant bit). If toggling it increases the score, you use it; otherwise, skip. Because each recipe affects a different top digit, this greedy choice is always safe. For counting: If your cookbook has r independent recipes, there are exactly (2^r) ways to combine them (each included or not), so there are (2^r) distinct dishes (subset XORs). If you started with n numbers but only r are independent, then there are extra “free choices” (n − r redundancies). Whenever a target dish is reachable, there are (2^{n-r}) different original subsets that cook it—choose freely among redundant items without changing the taste (XOR value). Example: With basis [8, 3], you can make 0, 3, 8, 11. To get the maximum against x = 4, try toggling 8 (makes 12, good), then try 3 (makes 15, better).
03Formal Definition
04When to Use
- Maximum subset XOR: Replace tries or DP with a linear basis to get O(nW) preprocessing and O(W) query for the maximum.
- Maximize XOR with a given mask x: After building the basis, answer each query in O(W).
- Counting representations: Need the number of subsets producing a target y. If y is reachable, it is exactly (2^{n-r}); otherwise 0.
- Distinct subset XOR set size: Immediately (2^r). Useful for combinatorics of subsets and counting problems.
- k-th smallest subset XOR: After reducing to a canonical (RREF-like) list, you can map index bits to basis vectors to retrieve the k-th smallest value in O(W).
- Merging information: In divide-and-conquer, DSU on tree, or segment tree nodes, you can merge subproblems by merging bases.
- Range queries (advanced): With timestamps in each pivot, you can support queries like “maximum subset XOR using only elements with index ≥ L” via segment trees or offline sweeps.
- Memory- and speed-sensitive cases: Since W ≤ 60–64 in most problems, the basis is tiny and operations are extremely fast compared to tries or full DP.
⚠️Common Mistakes
- Using signed integers: Right shifts and comparisons can behave unexpectedly. Use unsigned types (e.g., unsigned long long) and explicit bit tests.
- Wrong insertion order: You must eliminate from the most significant bit downwards; otherwise you may fail to create unique pivots.
- Forgetting to clear lower bits when building a canonical list for k-th queries: Without a full backward elimination step, the simple bit-to-vector mapping won’t produce values in sorted order.
- Assuming duplicates change the span: Inserting a number that reduces to zero doesn’t change the basis. Track rank separately from count of inserted elements.
- Overflowing counts: When computing (2^{n-r}) mod M, use fast exponentiation; don’t attempt to build all subsets.
- Confusing “reachable as XOR” with “max XOR against x”: The former asks if y is in the span; the latter is a greedy maximize query starting from x.
- Not bounding W: Loop over a fixed bit width (e.g., 0..63). Accidentally using 0..31 when inputs need 60 bits will silently break correctness.
- Believing merges are expensive: Merging two bases is just O(W^2) worst-case, which is negligible for W ≤ 64.
Key Formulas
Span size
Explanation: If the basis of S has rank r (r independent vectors), then each vector can be either included or not included, giving 2^r distinct XOR results. This counts distinct subset XOR values, not the number of subsets.
Counting representations
Explanation: If target y is reachable, there are 2^(n−r) original subsets that produce it because n−r choices are free (dependence). If it is not in the span, there are no solutions.
Insertion rule
Explanation: Reduce x by existing pivot rows from the most significant bit downward. If a nonzero remainder remains, it opens a new pivot at its most significant bit.
Greedy maximum
Explanation: Starting from ans = 0 (or a mask x), toggle each basis vector in descending pivot order if it increases the value. This works because pivots are independent at distinct bits.
Time/space bounds
Explanation: Each insertion or query touches at most W pivot bits; in typical problems, giving fast per-operation costs and tiny memory.
Rank from pivots
Explanation: The number of nonzero pivot rows equals the rank r. This dimension dictates how many distinct subset XORs exist.
Most significant bit index
Explanation: The pivot of a vector is at the index of its highest set bit. We always reduce by highest pivots first to maintain echelon form.
Canonicalization
Explanation: After forward elimination, a backward pass clears higher pivot bits from lower rows, making vectors minimal and enabling k-th mapping to sorted order.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct XORBasis64 { 5 static constexpr int W = 64; // bit width 6 array<unsigned long long, W> b{}; // pivots by bit index 7 int r = 0; // rank (number of pivots) 8 9 // Insert value x into the basis (Gaussian elimination over F2) 10 bool insert(unsigned long long x) { 11 for (int i = W - 1; i >= 0; --i) { 12 if (((x >> i) & 1ULL) == 0ULL) continue; // bit i not set 13 if (b[i]) { // pivot exists at i: cancel it 14 x ^= b[i]; 15 } else { 16 b[i] = x; // new pivot at bit i 17 ++r; 18 return true; // independent, added 19 } 20 } 21 return false; // dependent (reduced to 0) 22 } 23 24 // Check if y is in the span of the basis 25 bool contains(unsigned long long y) const { 26 for (int i = W - 1; i >= 0; --i) { 27 if (((y >> i) & 1ULL) == 0ULL) continue; 28 if (!b[i]) return false; // cannot cancel pivot i 29 y ^= b[i]; 30 } 31 return y == 0ULL; 32 } 33 34 // Maximum subset XOR (start from 0 and greedily improve) 35 unsigned long long maxSubsetXor() const { 36 unsigned long long ans = 0ULL; 37 for (int i = W - 1; i >= 0; --i) { 38 if (!b[i]) continue; 39 unsigned long long cand = ans ^ b[i]; 40 if (cand > ans) ans = cand; 41 } 42 return ans; 43 } 44 45 // Maximum of (x xor s) over all s in span 46 unsigned long long maxXorWith(unsigned long long x) const { 47 unsigned long long ans = x; 48 for (int i = W - 1; i >= 0; --i) { 49 if (!b[i]) continue; 50 unsigned long long cand = ans ^ b[i]; 51 if (cand > ans) ans = cand; 52 } 53 return ans; 54 } 55 }; 56 57 int main() { 58 ios::sync_with_stdio(false); 59 cin.tie(nullptr); 60 61 int n; cin >> n; 62 XORBasis64 basis; 63 for (int i = 0; i < n; ++i) { 64 unsigned long long x; cin >> x; 65 basis.insert(x); 66 } 67 68 cout << "rank = " << basis.r << "\n"; 69 cout << "max subset XOR = " << basis.maxSubsetXor() << "\n"; 70 71 unsigned long long q; cin >> q; // a mask to test max xor with 72 cout << "max xor with " << q << " = " << basis.maxXorWith(q) << "\n"; 73 74 unsigned long long y; cin >> y; // test reachability 75 cout << y << (basis.contains(y) ? " is" : " is not") << " reachable\n"; 76 return 0; 77 } 78
This program builds a 64-bit XOR basis from input numbers. It supports: inserting numbers (adding independent pivots), checking if a target y is in the span (reachability), computing the maximum subset XOR, and maximizing XOR against a provided mask. All operations are O(W) with W=64.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct XORBasis64 { 5 static constexpr int W = 64; 6 array<unsigned long long, W> b{}; // pivot by msb 7 int r = 0; 8 9 bool insert(unsigned long long x) { 10 for (int i = W - 1; i >= 0; --i) { 11 if (((x >> i) & 1ULL) == 0ULL) continue; 12 if (b[i]) x ^= b[i]; 13 else { b[i] = x; ++r; return true; } 14 } 15 return false; 16 } 17 18 // Reduce to canonical (RREF-like) form: clear higher pivots from lower rows. 19 // After this, values extracted in increasing pivot order allow k-th mapping. 20 void canonicalize() { 21 // Forward clean: ensure each b[i] has its pivot at i (already maintained). 22 // Backward elimination: clear pivot i from all lower pivots j < i. 23 for (int i = W - 1; i >= 0; --i) if (b[i]) { 24 for (int j = i - 1; j >= 0; --j) if (b[j] && ((b[j] >> i) & 1ULL)) { 25 b[j] ^= b[i]; 26 } 27 } 28 } 29 30 // Extract compact list of basis vectors in ascending pivot order 31 vector<unsigned long long> values() const { 32 vector<unsigned long long> v; v.reserve(r); 33 for (int i = 0; i < W; ++i) if (b[i]) v.push_back(b[i]); 34 return v; // v.size() == r 35 } 36 37 // Return the k-th smallest distinct subset XOR (0-indexed). If k >= 2^r, return ULLONG_MAX as a sentinel. 38 unsigned long long kth_smallest(unsigned __int128 k) const { 39 vector<unsigned long long> v; v.reserve(r); 40 for (int i = 0; i < W; ++i) if (b[i]) v.push_back(b[i]); 41 const int R = (int)v.size(); 42 if (R == 0) return (k == 0 ? 0ULL : ULLONG_MAX); 43 // total distinct = 2^R 44 unsigned __int128 tot = (R >= 128 ? (unsigned __int128)0 : ((unsigned __int128)1 << R)); 45 if (k >= tot) return ULLONG_MAX; 46 // Map bits of k to vectors: ans = XOR of v[i] where bit i of k is 1. 47 unsigned long long ans = 0ULL; 48 for (int i = 0; i < R; ++i) if ((k >> i) & 1) ans ^= v[i]; 49 return ans; // requires canonicalized basis for sorted order 50 } 51 }; 52 53 int main() { 54 ios::sync_with_stdio(false); 55 cin.tie(nullptr); 56 57 int n; cin >> n; 58 XORBasis64 lb; 59 for (int i = 0; i < n; ++i) { 60 unsigned long long x; cin >> x; 61 lb.insert(x); 62 } 63 64 lb.canonicalize(); 65 // optional: print all distinct subset XORs in order 66 vector<unsigned long long> all; 67 unsigned __int128 total = (lb.r >= 64 ? (unsigned __int128)0 : ((unsigned __int128)1 << lb.r)); 68 // Example: query some k values 69 int q; cin >> q; 70 while (q--) { 71 unsigned long long k; cin >> k; // 0-indexed 72 unsigned long long ans = lb.kth_smallest((unsigned __int128)k); 73 if (ans == ULLONG_MAX) cout << "k out of range\n"; 74 else cout << ans << "\n"; 75 } 76 return 0; 77 } 78
After building the basis, we run a backward elimination to get a canonical form where each vector is as small as possible and has a unique pivot bit. In this form, the k-th smallest distinct subset XOR (0-indexed) can be obtained by XORing the i-th vector when the i-th bit of k is 1. This produces sorted order over distinct subset XORs. We use ULLONG_MAX as a sentinel for out-of-range k.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 static const long long MOD = 1000000007LL; 5 6 struct XORBasis64 { 7 static constexpr int W = 64; 8 array<unsigned long long, W> b{}; 9 int r = 0; // rank 10 long long n_ins = 0; // total inserted elements (for counting ways) 11 12 bool insert(unsigned long long x) { 13 ++n_ins; 14 for (int i = W - 1; i >= 0; --i) { 15 if (((x >> i) & 1ULL) == 0ULL) continue; 16 if (b[i]) x ^= b[i]; 17 else { b[i] = x; ++r; return true; } 18 } 19 return false; 20 } 21 22 // Merge another basis into this one 23 void mergeFrom(const XORBasis64 &other) { 24 for (int i = 0; i < W; ++i) if (other.b[i]) { 25 // insert pivot vector without increasing n_ins: it’s structural merge 26 unsigned long long x = other.b[i]; 27 for (int j = W - 1; j >= 0; --j) { 28 if (((x >> j) & 1ULL) == 0ULL) continue; 29 if (b[j]) x ^= b[j]; 30 else { b[j] = x; ++r; break; } 31 } 32 } 33 // Note: n_ins is application-specific across merges; not modified here. 34 } 35 36 bool contains(unsigned long long y) const { 37 for (int i = W - 1; i >= 0; --i) { 38 if (((y >> i) & 1ULL) == 0ULL) continue; 39 if (!b[i]) return false; 40 y ^= b[i]; 41 } 42 return y == 0ULL; 43 } 44 }; 45 46 long long modpow2(long long e) { 47 long long base = 2, res = 1; 48 while (e > 0) { 49 if (e & 1) res = (res * base) % MOD; 50 base = (base * base) % MOD; 51 e >>= 1; 52 } 53 return res; 54 } 55 56 int main() { 57 ios::sync_with_stdio(false); 58 cin.tie(nullptr); 59 60 int n1, n2; cin >> n1 >> n2; 61 XORBasis64 A, B; 62 for (int i = 0; i < n1; ++i) { unsigned long long x; cin >> x; A.insert(x); } 63 for (int i = 0; i < n2; ++i) { unsigned long long x; cin >> x; B.insert(x); } 64 65 // Merge bases: C spans all subset XORs of A ∪ B 66 XORBasis64 C = A; 67 C.mergeFrom(B); 68 69 unsigned long long y; cin >> y; 70 if (!C.contains(y)) { 71 cout << 0 << "\n"; // unreachable 72 } else { 73 // If you know the total original inserted count n = n1 + n2, and C has rank r, 74 // then the number of subsets giving y equals 2^(n - r) modulo MOD. 75 long long n = n1 + n2; 76 cout << modpow2(n - C.r) << "\n"; 77 } 78 return 0; 79 } 80
Two inputs are turned into bases and merged by inserting each pivot from B into A. The merged basis C represents the union’s span. If a target y is reachable, the number of original subsets that XOR to y equals 2^(n − r) where r is the rank of the merged basis and n is the total count of inserted numbers. We output this count modulo 1e9+7.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct XORBasis64 { 5 static constexpr int W = 64; 6 array<unsigned long long, W> b{}; 7 8 bool insert(unsigned long long x) { 9 for (int i = W - 1; i >= 0; --i) { 10 if (((x >> i) & 1ULL) == 0ULL) continue; 11 if (b[i]) x ^= b[i]; 12 else { b[i] = x; return true; } 13 } 14 return false; 15 } 16 17 unsigned long long maxSubsetXor() const { 18 unsigned long long ans = 0ULL; 19 for (int i = W - 1; i >= 0; --i) if (b[i]) { 20 unsigned long long cand = ans ^ b[i]; 21 if (cand > ans) ans = cand; 22 } 23 return ans; 24 } 25 }; 26 27 int main() { 28 ios::sync_with_stdio(false); 29 cin.tie(nullptr); 30 31 int q; cin >> q; // number of operations 32 XORBasis64 lb; 33 while (q--) { 34 int type; cin >> type; 35 if (type == 1) { 36 unsigned long long x; cin >> x; // add x 37 lb.insert(x); 38 } else if (type == 2) { 39 cout << lb.maxSubsetXor() << "\n"; // query current maximum subset XOR 40 } 41 } 42 return 0; 43 } 44
Maintains a running XOR basis for a sequence of updates. Each insertion is O(W), and each maximum subset XOR query is O(W). This pattern is common in streaming or prefix-query problems.