āš™ļøAlgorithmIntermediate

Randomized Algorithms

Key Points

  • Randomized algorithms use coin flips (random bits) to guide choices, often making code simpler and fast on average.
  • Las Vegas algorithms always return a correct answer, but their running time is a random variable (e.g., randomized quicksort).
  • Monte Carlo algorithms run in predictable time but can return a wrong answer with small probability (e.g., Miller–Rabin primality testing).
  • Randomized pivoting in quicksort/quickselect yields O(n log n) expected sort time and O(n) expected selection time by avoiding bad pivots.
  • Treaps assign random priorities to nodes, giving expected O(log n) insert/erase/search with simple code.
  • Randomized hashing (salting) thwarts adversarial inputs by making collisions extremely unlikely in practice.
  • Error probability can be driven down exponentially by repeating independent trials (probability amplification).
  • High-probability guarantees and expectation bounds rely on basic probability tools like linearity of expectation, the union bound, and Chernoff/Markov inequalities.

Prerequisites

  • →Basic Probability (random variables, expectation, independence) — Expected-time and error analyses rely on linearity of expectation, independence assumptions, and simple tail bounds.
  • →Asymptotic Analysis (Big-O, recurrences) — Understanding O(n), O(n log n), and how to solve recurrences is central to randomized runtime guarantees.
  • →Comparison-Based Sorting — Randomized quicksort builds on partitioning and comparison logic from deterministic sorting.
  • →Binary Search Trees — Treaps are BSTs with an added heap property on random priorities.
  • →Hash Tables and Hash Functions — Randomized hashing requires understanding buckets, collisions, and expected O(1) operations.
  • →Modular Arithmetic — Miller–Rabin relies on modular multiplication and exponentiation, implemented carefully to avoid overflow.
  • →Recursion and Stack Behavior — Randomized quicksort/treap operations are recursive; understanding stack depth and tail-call patterns helps avoid stack overflows.

Detailed Explanation

01 Overview

Randomized algorithms are procedures that use random bits to make decisions during execution. Instead of always following a fixed path determined solely by the input, they occasionally flip a coin—choose random pivots, random priorities, random samples—to steer the computation. This often makes algorithms simpler and faster on average, even if worst-case inputs exist. Two major types exist: Las Vegas algorithms always return correct answers, but their runtime is random; Monte Carlo algorithms finish in predictable time but can return wrong answers with small probability. Classic examples include randomized quicksort and quickselect (Las Vegas), treaps (balanced BSTs with random priorities), and Miller–Rabin primality tests (Monte Carlo). Randomness also protects data structures against crafted worst-case inputs (e.g., salted hashing to avoid many collisions). The power of randomization comes from probabilistic analysis: expected values, tail bounds, and independence. With these tools, we can prove expected O(n log n) for randomized quicksort, expected O(log n) per operation for treaps, and exponentially small error probabilities for Monte Carlo tests, all with clean, practical implementations in C++.

02 Intuition & Analogies

Imagine you're searching for a good restaurant in a city you don't know. A deterministic plan might walk every street in a fixed order—guaranteed to work but possibly very slow if the worst streets come first. A randomized plan might pick a random starting street or ask a few random people for tips. You might not always take the same path, but on average you find a great place quickly, and your approach is robust to someone trying to trick you (an adversary) because they don't know your random choices. This is the spirit of randomized algorithms: inject unpredictability to dodge bad luck and simplify logic.

  • Randomized quicksort: Picking a random pivot is like choosing a random person in line to help split the crowd into two halves. Most of the time, you split the line fairly, so repeated splitting finishes in about n log n steps.
  • Quickselect: To find the median, you randomly pick a pivot and throw away the half that cannot contain the median—like halving your search space again and again. Even if some splits are poor, most are good, so you finish in linear time on average.
  • Treap: Assign each node a random priority, as if shuffling cards before building a BST. Shuffled cards rarely produce long runs, so the tree height is logarithmic with high probability.
  • Randomized hashing: Add a secret random salt so that even if someone crafts many keys, they can't predict collisions reliably, keeping hash table operations fast.
  • Monte Carlo testing: Asking several random witnesses to confirm primality is like polling multiple independent voters; each extra vote reduces the chance of being wrong exponentially.

03 Formal Definition

A randomized algorithm A is a function of the input x and a random bit string R. Its output and runtime are random variables determined by R. The expected running time on input x is \( E[T(x, R)] \), and the success probability (for decision/search problems) is \( \Pr[A(x, R) \text{ is correct}] \).

  • Las Vegas: For all x and all R, A(x, R) is correct; the runtime T(x, R) is random. We analyze performance via \( E[T(x, R)] \) or high-probability bounds. Randomized quicksort/quickselect and treaps fall into this category.
  • Monte Carlo: For all x and all R, the runtime is bounded deterministically (often by a function of input size), but \( \Pr[A(x, R) \text{ is wrong}] \le \varepsilon \). We can reduce \( \varepsilon \) by independent repetition (probability amplification). Miller–Rabin primality testing is a canonical example.

Analytically, we often use indicator variables and linearity of expectation to compute expected work, and tail bounds to convert expectations into high-probability guarantees. For hashing, pairwise or k-wise independence gives collision probabilities like \( \Pr[h(x)=h(y)] \le 1/m \) for a table of size m under suitable hash families.
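To make the two regimes concrete, here is a minimal toy sketch in C++ (the secret-guessing game and the coin-flip test are hypothetical stand-ins, not from the examples below): the Las Vegas routine retries until it can verify success, so it is always correct but its runtime varies; the Monte Carlo routine performs a fixed k trials and errs with probability 2^-k.

#include <bits/stdc++.h>
using namespace std;

static mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

// Las Vegas (toy): keep guessing until the verifiable answer is found.
// Always correct; the number of iterations is a random variable with mean 8.
int las_vegas_tries(int secret) {
    uniform_int_distribution<int> dist(0, 7);
    int tries = 0;
    while (true) {
        ++tries;
        if (dist(rng) == secret) return tries; // stop only once verified correct
    }
}

// Monte Carlo (toy): fixed k coin flips; reports success if any flip is heads.
// Deterministic runtime, but wrong (returns false) with probability 2^-k.
bool monte_carlo_success(int k) {
    uniform_int_distribution<int> coin(0, 1);
    for (int i = 0; i < k; ++i)
        if (coin(rng) == 1) return true;
    return false;
}

int main() {
    cout << "Las Vegas tries: " << las_vegas_tries(5) << '\n';
    cout << "Monte Carlo (k=20): " << (monte_carlo_success(20) ? "success" : "failure") << '\n';
    return 0;
}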

04 When to Use

Use randomized algorithms when deterministic solutions are complex, slow in the worst case, or vulnerable to adversarial inputs.

  • Sorting/selection at scale: Randomized quicksort (expected O(n log n)) and quickselect (expected O(n)) are simple, cache-friendly, and fast in practice.
  • Balanced BSTs without rotation logic: Treaps provide expected O(log n) operations with lean, elegant code compared to deterministic red–black/AVL trees.
  • Hash tables in adversarial settings: Randomized hashing (salting, strong hash families) ensures expected O(1) operations even when inputs might be crafted to collide.
  • Primality testing and number theory: Miller–Rabin yields very fast, practical primality checks with controllable error.
  • Sampling/estimation: Reservoir sampling, randomized rounding, and Monte Carlo estimators are ideal when full data processing is expensive (see the sketch after this list).
  • High-dimensional/complex optimization: Random restarts or randomized heuristics often break symmetry and escape worst-case patterns.

Prefer randomized approaches when you can tolerate tiny error probabilities (Monte Carlo) or when correctness is mandatory but runtime can fluctuate (Las Vegas), and when average-case speed and simplicity matter more than worst-case guarantees.
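Reservoir sampling, mentioned above, is compact enough to sketch here (a minimal version of Algorithm R; it is not among the worked examples below):

#include <bits/stdc++.h>
using namespace std;

static mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

// Algorithm R: maintain a k-element reservoir; the i-th item (0-indexed)
// replaces a random slot with probability k/(i+1), so each of the n items
// ends up in the sample with probability exactly k/n.
vector<long long> reservoir_sample(const vector<long long>& stream, int k) {
    vector<long long> res;
    for (size_t i = 0; i < stream.size(); ++i) {
        if ((int)res.size() < k) {
            res.push_back(stream[i]); // fill the reservoir first
        } else {
            uniform_int_distribution<size_t> dist(0, i);
            size_t j = dist(rng);
            if (j < (size_t)k) res[j] = stream[i]; // replace with prob k/(i+1)
        }
    }
    return res;
}

int main() {
    vector<long long> data(100);
    iota(data.begin(), data.end(), 1LL); // stand-in for a data stream
    for (auto x : reservoir_sample(data, 5)) cout << x << ' ';
    cout << '\n';
    return 0;
}

The appeal is the single pass and O(k) memory: the stream's length need not be known in advance, which is exactly the setting where full data processing is too expensive.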

āš ļøCommon Mistakes

  • Using poor RNGs: std::rand() with modulo bias or predictable seeds harms both performance and security. Prefer std::mt19937_64 seeded from std::random_device or a time-based seed, and avoid modulo bias with std::uniform_int_distribution (see the sketch after this list).
  • Forgetting worst cases: Randomized quicksort still has O(n^2) worst-case time; keep safeguards such as switching to insertion sort on small partitions and tail-recursion elimination, which prevents stack overflows.
  • Insufficient amplification: Monte Carlo tests like Miller–Rabin need enough independent rounds to make error probabilities negligible; choosing k too small undermines reliability.
  • Misclassifying algorithms: Reservoir sampling is randomized but always correct with fixed runtime; it is not a Monte Carlo algorithm. Randomized quicksort is Las Vegas, not Monte Carlo.
  • Hashing without salting: Custom hash for unordered_map without a secret random salt can be exploited; always add a per-process random seed (splitmix64-based salt).
  • Ignoring independence assumptions: Analyses often assume independent trials; reusing correlated randomness (e.g., same seed for multiple structures) can break guarantees.
  • Overflow in number theory: Modular multiplication/power in Miller–Rabin must use 128-bit intermediates for 64-bit inputs; otherwise tests can silently fail.
  • Biased pivot selection: Using a fixed pivot (first/last) defeats the purpose; always sample uniformly (or use randomized shuffles) to avoid adversarial inputs.
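A minimal sketch of the RNG hygiene recommended in the first bullet (seeding and unbiased range generation; the seed-mixing expression is one reasonable choice, not the only one):

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Seed a 64-bit Mersenne Twister; mixing random_device with a clock
    // avoids the predictable fixed default seed.
    mt19937_64 rng(random_device{}() ^
                   (uint64_t)chrono::steady_clock::now().time_since_epoch().count());

    // rand() % 10 would have modulo bias (and a weak generator);
    // uniform_int_distribution produces unbiased values in [0, 9].
    uniform_int_distribution<int> dist(0, 9);
    for (int i = 0; i < 5; ++i) cout << dist(rng) << ' ';
    cout << '\n';
    return 0;
}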

Key Formulas

Randomized Quicksort Recurrence

\( E[T(n)] = (n - 1) + \frac{1}{n} \sum_{k=0}^{n-1} \big( E[T(k)] + E[T(n-1-k)] \big) \)

Explanation: With a uniformly random pivot, the expected time equals the linear partition cost plus the average over all split sizes. Solving this recurrence gives O(n log n) expected time.

Expected Comparisons in Quicksort

\( E[C] = \sum_{i<j} \frac{2}{j-i+1} = O(n \log n) \)

Explanation: The i-th and j-th smallest elements are compared only if one of them is the first pivot chosen from the subarray spanning them, which happens with probability 2/(j-i+1). Summing these probabilities yields an O(n log n) expected number of comparisons.

Quickselect Expected Time

\( E[T(n)] \le E[T(3n/4)] + cn \;\Rightarrow\; E[T(n)] = O(n) \)

Explanation: Because each random pivot discards, in expectation, a constant fraction of the remaining elements, the expected total work forms a geometric series that is linear in n.

Probability Amplification

\( \Pr[\text{at least one success in } k \text{ trials}] = 1 - (1-p)^k \)

Explanation: If a single independent trial succeeds with probability p, then repeating k times and accepting if any trial succeeds raises the success probability to 1 - (1-p)^k. For example, with p = 1/2 and k = 20 the failure probability is (1/2)^20, below one in a million.

Markov's Inequality

\( \Pr[X \ge a] \le \frac{E[X]}{a} \quad (X \ge 0,\ a > 0) \)

Explanation: For any nonnegative random variable X, the chance it is at least a is at most the ratio of its mean to a. Useful for turning expectations into simple tail bounds.

Chernoff Bound (Upper Tail)

\( \Pr[X \ge (1+\delta)\mu] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} \)

Explanation: For a sum X of independent Bernoulli variables with mean \( \mu \), the probability of exceeding the mean by a factor \( 1+\delta \) decays exponentially in \( \mu \) for fixed \( \delta \).

Union Bound

\( \Pr\left[ \bigcup_i A_i \right] \le \sum_i \Pr[A_i] \)

Explanation: The probability that any of several bad events occurs is at most the sum of their probabilities. Used, for example, to bound the total collision probability across many key pairs.

Pairwise-Independent Hash Collision

\( \Pr[h(x) = h(y)] \le \frac{1}{m} \quad (x \ne y) \)

Explanation: With a pairwise-independent hash family into m buckets, the probability that two distinct keys collide is at most 1/m. This underpins expected O(1) hashing.

Miller–Rabin Error Bound

\( \Pr[\text{error}] \le (1/4)^k \)

Explanation: Each independent random base fails to detect compositeness with probability at most 1/4 for odd composite n. Repeating with k independent bases drives the error down exponentially.

Expected Treap Height

\( E[\text{height}] = O(\log n) \)

Explanation: Random priorities make the treap structurally identical to a BST built by inserting keys in uniformly random order, whose expected height is logarithmic.

Complexity Analysis

Randomized quicksort runs in expected O(n log n) time and O(log n) expected stack space (due to average recursion depth) when choosing pivots uniformly at random. The worst case is still O(n^2) time and O(n) recursion depth, but the probability of encountering long runs of bad pivots is exponentially small, and practical mitigations (random shuffling, tail recursion elimination, cutoff to insertion sort) reduce the risk further. Randomized quickselect achieves expected O(n) time because each random pivot discards, on average, a constant fraction of elements; the worst case is O(n^2), but again unlikely with uniform pivoting.

Treaps provide expected O(log n) time per insert, erase, and search. Their space is O(n) for n nodes, plus O(h) = O(log n) auxiliary stack space per operation due to recursion. The expected height is O(log n), yielding the logarithmic performance.

Randomized hashing with a salted custom hash keeps unordered_map operations at expected O(1) time and O(n) space. The salt prevents an adversary from forcing many collisions; under standard independence assumptions, the expected chain length is constant.

Monte Carlo algorithms like Miller–Rabin run in time proportional to the number of rounds times the cost of modular exponentiation. For 64-bit integers, each round costs O(log n) word operations using fast modular multiplication; k rounds give O(k log n) time and constant extra space. The error probability decreases exponentially as (1/4)^k, allowing a tunable trade-off between speed and reliability.

Code Examples

Randomized Quicksort and Quickselect (Las Vegas)
#include <bits/stdc++.h>
using namespace std;

// 64-bit Mersenne Twister RNG seeded with high-resolution clock
static mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

// Partition using Lomuto scheme with a uniformly random pivot in [l, r]
int partition_random(vector<long long>& a, int l, int r) {
    uniform_int_distribution<int> dist(l, r);
    int pidx = dist(rng);
    swap(a[pidx], a[r]); // move pivot to end
    long long pivot = a[r];
    int i = l; // place for next smaller-than-pivot element
    for (int j = l; j < r; ++j) {
        if (a[j] < pivot) {
            swap(a[i], a[j]);
            ++i;
        }
    }
    swap(a[i], a[r]);
    return i; // final pivot position
}

void quicksort(vector<long long>& a, int l, int r) {
    while (l < r) {
        int p = partition_random(a, l, r);
        // Recurse on smaller side first to limit recursion depth to O(log n) on average
        if (p - l < r - p) {
            if (l < p - 1) quicksort(a, l, p - 1);
            l = p + 1;
        } else {
            if (p + 1 < r) quicksort(a, p + 1, r);
            r = p - 1;
        }
    }
}

// Quickselect: find k-th smallest (0-indexed). Returns value.
long long quickselect(vector<long long>& a, int l, int r, int k) {
    while (true) {
        int p = partition_random(a, l, r);
        if (p == k) return a[p];
        else if (k < p) r = p - 1; // search left
        else l = p + 1;            // search right
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    vector<long long> v = {9, 1, 7, 3, 2, 8, 6, 5, 4};
    vector<long long> w = v; // copy for quickselect demo

    // Quicksort demo
    quicksort(v, 0, (int)v.size() - 1);
    cout << "Sorted: ";
    for (auto x : v) cout << x << ' ';
    cout << '\n';

    // Quickselect demo: find median (for odd n)
    int k = (int)w.size() / 2;
    long long kth = quickselect(w, 0, (int)w.size() - 1, k);
    cout << k << "-th smallest (median index) = " << kth << '\n';

    return 0;
}

We uniformly randomize the pivot for both quicksort and quickselect. Random pivoting avoids structured worst-case inputs (e.g., already sorted arrays) and yields Las Vegas behavior: the result is always correct, while running time is a random variable with excellent expectation. Tail recursion elimination reduces recursion depth in practice.

Time: Quicksort: expected O(n log n), worst-case O(n^2). Quickselect: expected O(n), worst-case O(n^2).
Space: Expected O(log n) stack for quicksort due to the recursion pattern; O(1) extra space for partitioning.

Treap: Randomized Balanced BST (insert, erase, find, k-th)
#include <bits/stdc++.h>
using namespace std;

struct Node {
    int key;        // BST key
    uint64_t prio;  // random priority (heap property)
    int sz;         // subtree size (for order statistics)
    Node* l; Node* r;
    Node(int k, uint64_t p): key(k), prio(p), sz(1), l(nullptr), r(nullptr) {}
};

static mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

int getsz(Node* t) { return t ? t->sz : 0; }
void pull(Node* t) { if (t) t->sz = 1 + getsz(t->l) + getsz(t->r); }

// Split by key: left contains keys <= key, right contains keys > key
pair<Node*, Node*> split(Node* t, int key) {
    if (!t) return {nullptr, nullptr};
    if (t->key <= key) {
        auto pr = split(t->r, key);
        t->r = pr.first; pull(t);
        return {t, pr.second};
    } else {
        auto pr = split(t->l, key);
        t->l = pr.second; pull(t);
        return {pr.first, t};
    }
}

Node* merge(Node* a, Node* b) {
    if (!a || !b) return a ? a : b;
    if (a->prio > b->prio) { // higher priority on top (max-heap)
        a->r = merge(a->r, b); pull(a); return a;
    } else {
        b->l = merge(a, b->l); pull(b); return b;
    }
}

bool find(Node* t, int key) {
    while (t) {
        if (key == t->key) return true;
        t = (key < t->key) ? t->l : t->r;
    }
    return false;
}

// Insert key (ignores duplicates; change split boundary to allow them)
Node* insert(Node* t, int key) {
    if (find(t, key)) return t; // skip duplicates for simplicity
    Node* node = new Node(key, rng());
    auto ab = split(t, key);
    return merge(merge(ab.first, node), ab.second);
}

Node* erase(Node* t, int key) {
    if (!t) return nullptr;
    if (key == t->key) {
        Node* u = merge(t->l, t->r);
        delete t; return u;
    } else if (key < t->key) {
        t->l = erase(t->l, key); pull(t); return t;
    } else {
        t->r = erase(t->r, key); pull(t); return t;
    }
}

// k-th smallest (1-indexed). Assumes 1 <= k <= size
int kth(Node* t, int k) {
    if (!t) throw runtime_error("k out of bounds");
    int ls = getsz(t->l);
    if (k == ls + 1) return t->key;
    if (k <= ls) return kth(t->l, k);
    return kth(t->r, k - ls - 1);
}

void inorder(Node* t) {
    if (!t) return;
    inorder(t->l); cout << t->key << ' '; inorder(t->r);
}

void destroy(Node* t) { if (!t) return; destroy(t->l); destroy(t->r); delete t; }

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Node* root = nullptr;
    vector<int> vals = {5, 1, 8, 3, 6, 2, 9, 7, 4};
    for (int x : vals) root = insert(root, x);

    cout << "Inorder (sorted): "; inorder(root); cout << '\n';
    cout << "k-th queries: ";
    for (int k = 1; k <= 5; ++k) cout << kth(root, k) << ' ';
    cout << '\n';

    root = erase(root, 5);
    cout << "After erase 5: "; inorder(root); cout << '\n';

    cout << "Contains 6? " << (find(root, 6) ? "yes" : "no") << '\n';

    destroy(root);
    return 0;
}

A treap maintains BST order by key and heap order by a random priority, which is drawn uniformly from a large range. The random heap order makes the expected height O(log n). We implement split/merge for elegant insert/erase and support k-th queries via subtree sizes.

Time: Expected O(log n) per insert/erase/find/k-th; worst-case O(n) with vanishing probability.
Space: O(n) for nodes plus O(log n) expected recursion stack during operations.

Randomized Hashing (Salted custom hash for unordered_map and pairs)
#include <bits/stdc++.h>
using namespace std;

// SplitMix64: fast mixer suitable for hash seeds (not cryptographic)
// Source: Sebastiano Vigna (public domain)
static uint64_t splitmix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

struct RandomSalt {
    static uint64_t salt() {
        static const uint64_t s = chrono::steady_clock::now().time_since_epoch().count();
        return s;
    }
};

struct custom_hash {
    size_t operator()(uint64_t x) const noexcept {
        // Add secret salt then mix
        return splitmix64(x + RandomSalt::salt());
    }
    size_t operator()(const pair<uint64_t,uint64_t>& p) const noexcept {
        // Combine with a shifted salt to decorrelate coordinates
        uint64_t h1 = splitmix64(p.first + RandomSalt::salt());
        uint64_t h2 = splitmix64(p.second + (RandomSalt::salt() << 1));
        return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2)); // standard combine
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Salted map from 64-bit keys to int
    unordered_map<uint64_t, int, custom_hash> freq;
    vector<uint64_t> keys = {1, 2, 3, 0xDEADBEEFDEADBEEFULL, 42, 42, 1};
    for (auto k : keys) freq[k]++;
    cout << "Counts:\n";
    for (auto &kv : freq) cout << kv.first << " -> " << kv.second << '\n';

    // Salted map with pair keys
    unordered_map<pair<uint64_t,uint64_t>, string, custom_hash> grid;
    grid[{1,2}] = "A";
    grid[{2,1}] = "B";
    grid[{1000000007ULL, 1000000009ULL}] = "primes";
    cout << "grid[(1,2)] = " << grid[{1,2}] << '\n';

    return 0;
}

We add a secret, per-process random salt to the hash inputs and then mix with SplitMix64. This prevents adversaries from predicting collisions and forcing worst-case O(n) behavior in unordered_map. The same idea extends to composite keys like pairs.

Time: Each operation is expected O(1). In adversarial scenarios without salting, the worst case can degrade to O(n).
Space: O(n) to store entries; O(1) extra per operation.

Miller–Rabin Primality Test (Monte Carlo with k rounds)
#include <bits/stdc++.h>
using namespace std;

static mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());

// Multiply a*b % mod using 128-bit intermediates to avoid overflow on 64-bit inputs
static inline uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t mod) {
    __uint128_t z = (__uint128_t)a * b;
    return (uint64_t)(z % mod);
}

static uint64_t pow_mod(uint64_t a, uint64_t e, uint64_t mod) {
    uint64_t r = 1;
    while (e) {
        if (e & 1) r = mul_mod(r, a, mod);
        a = mul_mod(a, a, mod);
        e >>= 1;
    }
    return r;
}

bool miller_rabin_once(uint64_t n, uint64_t a, uint64_t d, int s) {
    // n - 1 = d * 2^s with d odd
    uint64_t x = pow_mod(a % n, d, n);
    if (x == 1 || x == n - 1) return true;
    for (int r = 1; r < s; ++r) {
        x = mul_mod(x, x, n);
        if (x == n - 1) return true;
    }
    return false;
}

// Returns true if n is probably prime; false if composite
bool is_probable_prime(uint64_t n, int k = 8) {
    if (n < 2) return false;
    static const uint64_t small_primes[] = {2,3,5,7,11,13,17,19,23,0};
    for (int i = 0; small_primes[i]; ++i) {
        if (n % small_primes[i] == 0) return n == small_primes[i];
    }
    // write n - 1 = d * 2^s with d odd
    uint64_t d = n - 1; int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }

    uniform_int_distribution<uint64_t> dist(2, n - 2);
    for (int i = 0; i < k; ++i) {
        uint64_t a = dist(rng);
        if (!miller_rabin_once(n, a, d, s)) return false; // composite
    }
    return true; // probably prime; error <= (1/4)^k
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    vector<uint64_t> tests = {2, 3, 4, 5, 17, 561, 1105, 1'000'000'007ULL, 1'000'000'009ULL};
    for (auto n : tests) {
        bool pp = is_probable_prime(n, 10); // 10 rounds -> error <= (1/4)^10
        cout << n << ": " << (pp ? "probably prime" : "composite") << '\n';
    }
    return 0;
}

Miller–Rabin is a Monte Carlo algorithm: each random witness (base a) independently detects compositeness with probability at least 3/4 for odd composite n. Repeating k times reduces the error probability to (1/4)^k while runtime scales linearly in k. We use 128-bit intermediates to avoid overflow.

Time: O(k log n) word operations for 64-bit n (k rounds of modular exponentiation).
Space: O(1) extra space.