Sieve of Eratosthenes
Key Points
- •The Sieve of Eratosthenes marks multiples of each prime to find all primes up to n in O(n log log n) time.
- •You can start crossing out at because smaller multiples were already removed by smaller primes.
- •Using a bitset or vector<bool> packs memory so the sieve needs O(n) bits, which is very memory efficient.
- •A segmented sieve finds primes in a large range [L, R] by sieving only that window using primes up to .
- •The linear (Euler) sieve can compute primes and the smallest prime factor (SPF) array in O(n) time.
- •With an SPF array, you can factorize any number up to n in O(log n) time by repeatedly dividing by SPF.
- •Prime counts up to n are about n / log n (by the Prime Number Theorem), so the sieve outputs roughly that many primes.
- •Watch for pitfalls: use 64-bit for p*p, handle L = 1 in segmented sieve, and avoid re-marking evens for speed.
Prerequisites
- →Prime and composite numbers — You must know what primes and composites are to understand what the sieve marks.
- →Integer division and modulo — Rounding up to the first multiple in a range and factorization logic rely on division and modulus.
- →Square root bound — Recognizing that checking primes up to √n suffices underpins sieve loops and correctness.
- →Big-O notation — Complexity results like O(n log log n) and O(n) need asymptotic understanding.
- →Arrays and memory layout — The sieve is an array-marking algorithm where contiguous access matters for performance.
- →64-bit integers — Prevent overflow when computing p*p and handling large ranges [L, R].
- →Loops and conditionals in C++ — Implementing the sieve involves nested loops with careful bounds and conditions.
Detailed Explanation
Tap terms for definitions01Overview
Hook → Imagine crossing off days on a giant calendar: you strike out every 2nd day, then every 3rd, then every 5th, and so on. Very quickly, only special days remain. Those survivors are like prime numbers among the integers. Concept → The Sieve of Eratosthenes is an ancient yet still optimal method to list all primes up to a limit n. Instead of checking each number individually, it systematically eliminates composites by marking multiples of each discovered prime. The clever trick is to start marking from p^2 because any smaller multiple of p has already been marked by a smaller prime factor. Example → If n = 30: start with all numbers marked as potentially prime. Mark multiples of 2 (4, 6, 8, ...), then multiples of 3 starting at 9, then 5 starting at 25. The remaining unmarked numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. This approach runs in O(n log log n) time and O(n) bits of memory, making it ideal for competitive programming tasks where n can be as large as 10^7 with tight time limits.
02Intuition & Analogies
Hook → Think of finding prime numbers like sorting marbles on a conveyor belt with labels. If you try to inspect each marble deeply to decide if it’s special (prime), you’ll be too slow. But if you instead use simple stamping machines that mark every k-th marble for various k’s, the unmarked ones left at the end are your special marbles. Concept → The sieve uses the idea that every composite number has a smallest prime factor. When you run through candidates in increasing order, the first time you see a number p that’s still unmarked, you know it’s prime. Then you let a machine start stamping every p-th number as composite. Crucially, you can start stamping from p^2 because p·2, p·3, ..., p·(p−1) were already stamped when processing 2, 3, ..., p−1. This saves lots of work. Example → For 2, cross out even numbers. For 3, cross out 9, 12, 15, ... (start at 9). For 5, cross out 25, 30, ... (start at 25). Notice how you never revisit work unnecessarily. When numbers get large, you don’t need the whole conveyor belt at once: you can slide a window [L, R] (segmented sieve) and stamp only inside that window using the same small set of machines (primes up to \sqrt{R}). And if you want more than just the list of primes, you can attach a label (SPF—the smallest prime factor) to each marble as you go, which later lets you crack a number into its prime pieces in almost no time.
03Formal Definition
04When to Use
Hook → Different problems need different flavors of the sieve depending on input size and what you need to compute beyond just listing primes. Concept → Use the classic sieve when you need all primes up to a single bound n (e.g., n \le 10^7 to 10^8 depending on memory/time). If you need to factor many numbers quickly, build an SPF array with the linear sieve and then factor each query in O(log n). When handling very large upper bounds but relatively small ranges (e.g., count primes in [L, R] where R up to 10^{12} and R−L up to 10^{6}), use a segmented sieve to limit memory and achieve good cache behavior. Example → Competitive programming patterns: (1) Precompute primes up to 10^7 once at the start, then answer many is_prime(x) queries in O(1). (2) Precompute SPF up to 2·10^6, then factor up to 10^5 queries fast. (3) For test cases giving L and R up to 10^{12} with windows of width up to 10^6, generate base primes up to \sqrt{R} (\le 10^6) and segmented-sieve the window. (4) If memory is tight, use an odd-only sieve (halve memory) or a manually packed bitset.
⚠️Common Mistakes
Hook → Most wrong answers come from subtle off-by-one and overflow issues—not the algorithm itself. Concept → Frequent pitfalls include starting to mark at 2p instead of p^2 (slower but still correct), integer overflow from computing p*p in 32-bit integers when n is large, forgetting to handle L = 1 in a segmented sieve, and mixing inclusive/exclusive bounds incorrectly. Using std::bitset with a runtime size is impossible (it requires a compile-time constant), and allocating too large a bitset can exceed memory. Example → Mistake 1: for (int j = p * p; j <= n; j += p) with p as int and n up to 10^9 can overflow j at p ≈ 46341; fix by promoting to long long. Mistake 2: In segmented sieve, leaving 1 as prime when L = 1 leads to wrong outputs; explicitly mark index 0 as composite in that case. Mistake 3: Computing the first multiple in [L, R] as (L / p) * p without rounding up properly; correct is j = max(1LL * p * p, ((L + p - 1) / p) * 1LL * p). Mistake 4: Re-marking even multiples in the inner loop wastes time; you can skip evens or sieve only odds. Mistake 5: Using std::vector<bool> without realizing it’s a packed bit proxy type—this is fine for memory but can be slower in some contexts; consider std::vector<uint8_t> or a custom bitset when performance is critical.
Key Formulas
Prime Number Theorem
Explanation: The number of grows like n divided by the natural logarithm of n. This estimates how many primes the sieve will output.
Mertens-like sum over primes
Explanation: The harmonic sum over primes diverges very slowly, roughly like log log n. Plugging this into the sieve’s work explains the O(n log log n) time.
Work of classic sieve
Explanation: Marking multiples of each prime p costs about n/p assignments. Summing over primes yields the sieve’s total time complexity.
Time complexity (classic sieve)
Explanation: The classic Sieve of Eratosthenes runs in near-linear time with a very slow-growing log log factor.
Time complexity (linear sieve)
Explanation: Euler’s sieve ensures each composite is marked once, giving a truly linear run time while also building SPF.
Space complexity (packed sieve)
Explanation: Using a bit-per-number representation, the sieve’s memory grows proportionally to n. This is about n/8 bytes.
Upper bound for sieving primes
Explanation: You only need to mark multiples for base primes up to √n, since larger primes’ first multiple would exceed n.
First multiple in [L, R]
Explanation: In segmented sieve, the first multiple of p to mark inside [L, R] is the larger of and the least multiple of .
Segmented sieve cost
Explanation: Sieving a window needs linear time in the window size plus the cost to generate base primes up to √R (about R/log R in total once).
Factorization via SPF
Explanation: Using the SPF array, factorizing x takes time proportional to the number of prime factors with multiplicity, which is at most logarithmic.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Classic Sieve of Eratosthenes: generate primes up to n 5 // Uses vector<bool> which packs bits (bitset-like memory usage) 6 7 vector<int> sieve_primes(int n, vector<bool>& is_prime) { 8 is_prime.assign(n + 1, true); 9 if (n >= 0) is_prime[0] = false; 10 if (n >= 1) is_prime[1] = false; 11 12 // Only need to loop p up to sqrt(n) 13 for (int p = 2; 1LL * p * p <= n; ++p) { 14 if (is_prime[p]) { 15 // Start at p*p to avoid redundant marking; cast to long long to prevent overflow 16 long long start = 1LL * p * p; 17 for (long long j = start; j <= n; j += p) { 18 is_prime[(size_t)j] = false; 19 } 20 } 21 } 22 23 vector<int> primes; 24 for (int i = 2; i <= n; ++i) if (is_prime[i]) primes.push_back(i); 25 return primes; 26 } 27 28 int main() { 29 ios::sync_with_stdio(false); 30 cin.tie(nullptr); 31 32 int n; // Example: input 30 -> prints primes up to 30 33 if (!(cin >> n)) return 0; 34 35 vector<bool> is_prime; 36 vector<int> primes = sieve_primes(n, is_prime); 37 38 cout << "Count = " << primes.size() << "\n"; 39 for (int p : primes) cout << p << ' '; 40 cout << '\n'; 41 return 0; 42 } 43
We store a boolean flag for each number from 0 to n. For each prime p ≤ √n that remains true, we mark multiples starting from p^2 as composite. After the process, indices with true flags are primes. vector<bool> packs bits to reduce memory and is enough for typical constraints like n up to 10^7.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Linear sieve producing prime list and SPF (Smallest Prime Factor) 5 // Each composite is marked once as i * p where p is its smallest prime factor 6 7 struct LinearSieve { 8 int n; 9 vector<int> primes; // list of primes up to n 10 vector<int> spf; // spf[x] = smallest prime factor of x (spf[p] = p) 11 12 explicit LinearSieve(int n): n(n), spf(n + 1, 0) { 13 primes.reserve(n / 10); 14 for (int i = 2; i <= n; ++i) { 15 if (spf[i] == 0) { // i is prime 16 spf[i] = i; 17 primes.push_back(i); 18 } 19 for (int p : primes) { 20 long long x = 1LL * i * p; 21 if (x > n) break; 22 spf[(int)x] = p; // p is the smallest prime factor for x 23 if (p == spf[i]) break; // ensure each composite is marked once 24 } 25 } 26 } 27 28 // Factorize number x (2 <= x <= n) in O(number of prime factors) 29 vector<pair<int,int>> factorize(int x) const { 30 vector<pair<int,int>> fac; 31 while (x > 1) { 32 int p = spf[x]; 33 int cnt = 0; 34 while (x % p == 0) { 35 x /= p; 36 ++cnt; 37 } 38 fac.push_back({p, cnt}); 39 } 40 return fac; 41 } 42 }; 43 44 int main() { 45 ios::sync_with_stdio(false); 46 cin.tie(nullptr); 47 48 int N, Q; // Example: N=100, Q=3 with queries like 60 84 97 49 if (!(cin >> N >> Q)) return 0; 50 51 LinearSieve sieve(N); 52 53 // Print primes up to N (optional) 54 // for (int p : sieve.primes) cout << p << ' '; cout << '\n'; 55 56 while (Q--) { 57 int x; cin >> x; 58 auto fac = sieve.factorize(x); 59 cout << x << " = "; 60 for (size_t i = 0; i < fac.size(); ++i) { 61 auto [p, c] = fac[i]; 62 cout << p; 63 if (c > 1) cout << '^' << c; 64 if (i + 1 != fac.size()) cout << " * "; 65 } 66 cout << '\n'; 67 } 68 return 0; 69 } 70
Euler’s sieve produces an SPF array while ensuring each composite is marked only once, yielding O(n) time. Using SPF, factorization proceeds by repeatedly dividing by the smallest prime factor, giving very fast query-time decomposition.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Segmented sieve: compute primes in [L, R] (inclusive) 5 // 1) Pre-sieve primes up to sqrt(R) 6 // 2) Mark multiples in the window [L, R] 7 8 static vector<int> base_primes_up_to(long long n) { 9 int m = (int)n; 10 vector<bool> is_prime(m + 1, true); 11 if (m >= 0) is_prime[0] = false; 12 if (m >= 1) is_prime[1] = false; 13 for (long long p = 2; p * p <= m; ++p) if (is_prime[(size_t)p]) 14 for (long long j = p * p; j <= m; j += p) 15 is_prime[(size_t)j] = false; 16 vector<int> primes; 17 for (int i = 2; i <= m; ++i) if (is_prime[i]) primes.push_back(i); 18 return primes; 19 } 20 21 vector<long long> segmented_sieve(long long L, long long R) { 22 if (L > R) return {}; 23 // Handle small ranges 24 L = max(2LL, L); 25 int sz = (int)(R - L + 1); 26 vector<bool> is_prime_range(sz, true); 27 28 long long root = floor(sqrt((long double)R)); 29 vector<int> primes = base_primes_up_to(root); 30 31 for (long long p : primes) { 32 long long start = max(p * p, ((L + p - 1) / p) * p); // first multiple in [L, R] 33 for (long long j = start; j <= R; j += p) { 34 is_prime_range[(size_t)(j - L)] = false; 35 } 36 } 37 38 vector<long long> res; 39 for (int i = 0; i < sz; ++i) if (is_prime_range[i]) res.push_back(L + i); 40 return res; 41 } 42 43 int main() { 44 ios::sync_with_stdio(false); 45 cin.tie(nullptr); 46 47 long long L, R; // Example: input 10 50 -> primes in [10,50] 48 if (!(cin >> L >> R)) return 0; 49 50 auto primes = segmented_sieve(L, R); 51 cout << "Count = " << primes.size() << "\n"; 52 for (auto p : primes) cout << p << ' '; 53 cout << '\n'; 54 return 0; 55 } 56
We first compute all primes up to √R with a standard sieve. Then we maintain a boolean window for [L, R] and mark multiples of each base prime within the window. This uses O(R−L+1) memory, which is ideal for large R when the range width is moderate.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Odd-only sieve: represent only odd numbers ≥ 3 5 // index i corresponds to number val = 2*i + 3 6 // is_comp[i] == false means the odd number is prime candidate 7 8 struct OddSieve { 9 int n; // upper bound 10 vector<uint8_t> comp; // byte-per-flag (simple). Could pack further if needed. 11 12 explicit OddSieve(int n): n(n) { 13 int m = max(0, (n - 3) / 2 + 1); // count of odds in [3..n] 14 comp.assign(m, 0); 15 int limit = (int)floor(sqrt((long double)n)); 16 auto idx = [](int x){ return (x - 3) >> 1; }; // map odd x>=3 -> index 17 18 for (int p = 3; p <= limit; p += 2) { 19 if (p >= 3 && ((p & 1) == 1) && !comp[idx(p)]) { 20 long long start = 1LL * p * p; // first multiple to mark 21 if (start > n) break; 22 // convert start (odd) to index 23 for (long long j = start; j <= n; j += 2LL * p) { 24 comp[idx((int)j)] = 1; 25 } 26 } 27 } 28 } 29 30 bool isPrime(int x) const { 31 if (x < 2) return false; 32 if (x == 2) return true; 33 if ((x & 1) == 0) return false; // even > 2 34 int i = (x - 3) >> 1; 35 if (i < 0 || i >= (int)comp.size()) return false; // out of range 36 return comp[i] == 0; 37 } 38 39 vector<int> primes() const { 40 vector<int> res; if (n >= 2) res.push_back(2); 41 for (int i = 0; i < (int)comp.size(); ++i) if (!comp[i]) res.push_back(2*i + 3); 42 return res; 43 } 44 }; 45 46 int main() { 47 ios::sync_with_stdio(false); 48 cin.tie(nullptr); 49 50 int n; // Example: input 100 -> primes up to 100 51 if (!(cin >> n)) return 0; 52 53 OddSieve S(n); 54 55 auto ps = S.primes(); 56 cout << "Count = " << ps.size() << "\n"; 57 for (int p : ps) cout << p << ' '; 58 cout << '\n'; 59 60 // Example prime queries 61 int q; if (cin >> q) { 62 while (q--) { 63 int x; cin >> x; 64 cout << x << (S.isPrime(x) ? " is prime" : " is not prime") << '\n'; 65 } 66 } 67 return 0; 68 } 69
We skip all evens and store flags only for odds ≥ 3, halving memory and work. The inner loop steps by 2p to remain on odd multiples. Prime queries become O(1) array lookups.