MathIntermediate

Sieve of Eratosthenes

Key Points

  • The Sieve of Eratosthenes marks multiples of each prime to find all primes up to n in O(n log log n) time.
  • You can start crossing out at because smaller multiples were already removed by smaller primes.
  • Using a bitset or vector<bool> packs memory so the sieve needs O(n) bits, which is very memory efficient.
  • A segmented sieve finds primes in a large range [L, R] by sieving only that window using primes up to .
  • The linear (Euler) sieve can compute primes and the smallest prime factor (SPF) array in O(n) time.
  • With an SPF array, you can factorize any number up to n in O(log n) time by repeatedly dividing by SPF.
  • Prime counts up to n are about n / log n (by the Prime Number Theorem), so the sieve outputs roughly that many primes.
  • Watch for pitfalls: use 64-bit for p*p, handle L = 1 in segmented sieve, and avoid re-marking evens for speed.

Prerequisites

  • Prime and composite numbersYou must know what primes and composites are to understand what the sieve marks.
  • Integer division and moduloRounding up to the first multiple in a range and factorization logic rely on division and modulus.
  • Square root boundRecognizing that checking primes up to √n suffices underpins sieve loops and correctness.
  • Big-O notationComplexity results like O(n log log n) and O(n) need asymptotic understanding.
  • Arrays and memory layoutThe sieve is an array-marking algorithm where contiguous access matters for performance.
  • 64-bit integersPrevent overflow when computing p*p and handling large ranges [L, R].
  • Loops and conditionals in C++Implementing the sieve involves nested loops with careful bounds and conditions.

Detailed Explanation

Tap terms for definitions

01Overview

Hook → Imagine crossing off days on a giant calendar: you strike out every 2nd day, then every 3rd, then every 5th, and so on. Very quickly, only special days remain. Those survivors are like prime numbers among the integers. Concept → The Sieve of Eratosthenes is an ancient yet still optimal method to list all primes up to a limit n. Instead of checking each number individually, it systematically eliminates composites by marking multiples of each discovered prime. The clever trick is to start marking from p^2 because any smaller multiple of p has already been marked by a smaller prime factor. Example → If n = 30: start with all numbers marked as potentially prime. Mark multiples of 2 (4, 6, 8, ...), then multiples of 3 starting at 9, then 5 starting at 25. The remaining unmarked numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. This approach runs in O(n log log n) time and O(n) bits of memory, making it ideal for competitive programming tasks where n can be as large as 10^7 with tight time limits.

02Intuition & Analogies

Hook → Think of finding prime numbers like sorting marbles on a conveyor belt with labels. If you try to inspect each marble deeply to decide if it’s special (prime), you’ll be too slow. But if you instead use simple stamping machines that mark every k-th marble for various k’s, the unmarked ones left at the end are your special marbles. Concept → The sieve uses the idea that every composite number has a smallest prime factor. When you run through candidates in increasing order, the first time you see a number p that’s still unmarked, you know it’s prime. Then you let a machine start stamping every p-th number as composite. Crucially, you can start stamping from p^2 because p·2, p·3, ..., p·(p−1) were already stamped when processing 2, 3, ..., p−1. This saves lots of work. Example → For 2, cross out even numbers. For 3, cross out 9, 12, 15, ... (start at 9). For 5, cross out 25, 30, ... (start at 25). Notice how you never revisit work unnecessarily. When numbers get large, you don’t need the whole conveyor belt at once: you can slide a window [L, R] (segmented sieve) and stamp only inside that window using the same small set of machines (primes up to \sqrt{R}). And if you want more than just the list of primes, you can attach a label (SPF—the smallest prime factor) to each marble as you go, which later lets you crack a number into its prime pieces in almost no time.

03Formal Definition

the intuition, we can describe the sieve precisely and reason about its cost and correctness. a given integer n 2, initialize an array i[0..n] as true except indices 0 and 1. For each integer p from 2 to , if i[p] is true, mark all multiples , + p, + 2p, ... n as not prime. After finishing, the set P = { i | 2 i n and i[i] } is exactly the set of primes up to n. Correctness follows because every composite m has a prime factor p , so m is marked when processing p; and primes are never marked because they have no smaller prime factor. time is O(n n), stemming from the sum over primes of n/p operations, and the space is O(n) bits (or O(n) booleans). For ranges [L, R], let S be the primes up to . Mark for each p S all multiples in [L, R] starting at max(, L/p p). The unmarked numbers (except 1) are prime in [L, R]. The linear (Euler) sieve variant maintains a prime list and smallest prime factor (SPF) array and ensures each composite is marked exactly once, achieving O(n) time while simultaneously producing SPF.

04When to Use

Hook → Different problems need different flavors of the sieve depending on input size and what you need to compute beyond just listing primes. Concept → Use the classic sieve when you need all primes up to a single bound n (e.g., n \le 10^7 to 10^8 depending on memory/time). If you need to factor many numbers quickly, build an SPF array with the linear sieve and then factor each query in O(log n). When handling very large upper bounds but relatively small ranges (e.g., count primes in [L, R] where R up to 10^{12} and R−L up to 10^{6}), use a segmented sieve to limit memory and achieve good cache behavior. Example → Competitive programming patterns: (1) Precompute primes up to 10^7 once at the start, then answer many is_prime(x) queries in O(1). (2) Precompute SPF up to 2·10^6, then factor up to 10^5 queries fast. (3) For test cases giving L and R up to 10^{12} with windows of width up to 10^6, generate base primes up to \sqrt{R} (\le 10^6) and segmented-sieve the window. (4) If memory is tight, use an odd-only sieve (halve memory) or a manually packed bitset.

⚠️Common Mistakes

Hook → Most wrong answers come from subtle off-by-one and overflow issues—not the algorithm itself. Concept → Frequent pitfalls include starting to mark at 2p instead of p^2 (slower but still correct), integer overflow from computing p*p in 32-bit integers when n is large, forgetting to handle L = 1 in a segmented sieve, and mixing inclusive/exclusive bounds incorrectly. Using std::bitset with a runtime size is impossible (it requires a compile-time constant), and allocating too large a bitset can exceed memory. Example → Mistake 1: for (int j = p * p; j <= n; j += p) with p as int and n up to 10^9 can overflow j at p ≈ 46341; fix by promoting to long long. Mistake 2: In segmented sieve, leaving 1 as prime when L = 1 leads to wrong outputs; explicitly mark index 0 as composite in that case. Mistake 3: Computing the first multiple in [L, R] as (L / p) * p without rounding up properly; correct is j = max(1LL * p * p, ((L + p - 1) / p) * 1LL * p). Mistake 4: Re-marking even multiples in the inner loop wastes time; you can skip evens or sieve only odds. Mistake 5: Using std::vector<bool> without realizing it’s a packed bit proxy type—this is fine for memory but can be slower in some contexts; consider std::vector<uint8_t> or a custom bitset when performance is critical.

Key Formulas

Prime Number Theorem

Explanation: The number of grows like n divided by the natural logarithm of n. This estimates how many primes the sieve will output.

Mertens-like sum over primes

Explanation: The harmonic sum over primes diverges very slowly, roughly like log log n. Plugging this into the sieve’s work explains the O(n log log n) time.

Work of classic sieve

Explanation: Marking multiples of each prime p costs about n/p assignments. Summing over primes yields the sieve’s total time complexity.

Time complexity (classic sieve)

Explanation: The classic Sieve of Eratosthenes runs in near-linear time with a very slow-growing log log factor.

Time complexity (linear sieve)

Explanation: Euler’s sieve ensures each composite is marked once, giving a truly linear run time while also building SPF.

Space complexity (packed sieve)

Explanation: Using a bit-per-number representation, the sieve’s memory grows proportionally to n. This is about n/8 bytes.

Upper bound for sieving primes

Explanation: You only need to mark multiples for base primes up to √n, since larger primes’ first multiple would exceed n.

First multiple in [L, R]

Explanation: In segmented sieve, the first multiple of p to mark inside [L, R] is the larger of and the least multiple of .

Segmented sieve cost

Explanation: Sieving a window needs linear time in the window size plus the cost to generate base primes up to √R (about R/log R in total once).

Factorization via SPF

Explanation: Using the SPF array, factorizing x takes time proportional to the number of prime factors with multiplicity, which is at most logarithmic.

Complexity Analysis

The classic Sieve of Eratosthenes iterates over primes p ≤ √n and marks multiples starting from . For each prime p, the number of markings is about n/p. Summing over primes yields ∑_{p ≤ n} n/ ∑_{p ≤ n} 1/ (log log n + M + o(1)), which is O(n log log n). In practice, the hidden constant is small thanks to tight memory access patterns if you store flags contiguously. The space complexity is O(n) booleans; with a packed representation ( or a custom bitset), this is O(n) bits, i.e., n/8 bytes. For example, needs roughly 10^7 bits ≈ 1.25 MB. The linear (Euler) sieve ensures each composite m is marked exactly once as i × p where p is the smallest prime dividing m. This eliminates the log log factor, giving O(n) time, and it naturally produces an SPF (or LPF) array. Constants can be somewhat larger than the classic sieve, but for building SPF it is the method of choice. The segmented sieve splits [2, n] (or any [L, R]) into blocks of size B. You precompute base primes up to √R in O(√R log log √R), then, for each block, mark multiples of those base primes within the block in O(B ∑ 1/p) ≈ O(B log log R). Overall, sieving the full range [2, n] with good blocking costs about O(n log log n), while memory stays O(B) for the block plus O(√R) for the base primes. Careful block sizing (e.g., B ≈ 10^6) improves cache locality and reduces constant factors.

Code Examples

Classic sieve up to n (mark from p^2) with packed booleans
1#include <bits/stdc++.h>
2using namespace std;
3
4// Classic Sieve of Eratosthenes: generate primes up to n
5// Uses vector<bool> which packs bits (bitset-like memory usage)
6
7vector<int> sieve_primes(int n, vector<bool>& is_prime) {
8 is_prime.assign(n + 1, true);
9 if (n >= 0) is_prime[0] = false;
10 if (n >= 1) is_prime[1] = false;
11
12 // Only need to loop p up to sqrt(n)
13 for (int p = 2; 1LL * p * p <= n; ++p) {
14 if (is_prime[p]) {
15 // Start at p*p to avoid redundant marking; cast to long long to prevent overflow
16 long long start = 1LL * p * p;
17 for (long long j = start; j <= n; j += p) {
18 is_prime[(size_t)j] = false;
19 }
20 }
21 }
22
23 vector<int> primes;
24 for (int i = 2; i <= n; ++i) if (is_prime[i]) primes.push_back(i);
25 return primes;
26}
27
28int main() {
29 ios::sync_with_stdio(false);
30 cin.tie(nullptr);
31
32 int n; // Example: input 30 -> prints primes up to 30
33 if (!(cin >> n)) return 0;
34
35 vector<bool> is_prime;
36 vector<int> primes = sieve_primes(n, is_prime);
37
38 cout << "Count = " << primes.size() << "\n";
39 for (int p : primes) cout << p << ' ';
40 cout << '\n';
41 return 0;
42}
43

We store a boolean flag for each number from 0 to n. For each prime p ≤ √n that remains true, we mark multiples starting from p^2 as composite. After the process, indices with true flags are primes. vector<bool> packs bits to reduce memory and is enough for typical constraints like n up to 10^7.

Time: O(n log log n)Space: O(n) bits
Linear (Euler) sieve with SPF for fast factorization
1#include <bits/stdc++.h>
2using namespace std;
3
4// Linear sieve producing prime list and SPF (Smallest Prime Factor)
5// Each composite is marked once as i * p where p is its smallest prime factor
6
7struct LinearSieve {
8 int n;
9 vector<int> primes; // list of primes up to n
10 vector<int> spf; // spf[x] = smallest prime factor of x (spf[p] = p)
11
12 explicit LinearSieve(int n): n(n), spf(n + 1, 0) {
13 primes.reserve(n / 10);
14 for (int i = 2; i <= n; ++i) {
15 if (spf[i] == 0) { // i is prime
16 spf[i] = i;
17 primes.push_back(i);
18 }
19 for (int p : primes) {
20 long long x = 1LL * i * p;
21 if (x > n) break;
22 spf[(int)x] = p; // p is the smallest prime factor for x
23 if (p == spf[i]) break; // ensure each composite is marked once
24 }
25 }
26 }
27
28 // Factorize number x (2 <= x <= n) in O(number of prime factors)
29 vector<pair<int,int>> factorize(int x) const {
30 vector<pair<int,int>> fac;
31 while (x > 1) {
32 int p = spf[x];
33 int cnt = 0;
34 while (x % p == 0) {
35 x /= p;
36 ++cnt;
37 }
38 fac.push_back({p, cnt});
39 }
40 return fac;
41 }
42};
43
44int main() {
45 ios::sync_with_stdio(false);
46 cin.tie(nullptr);
47
48 int N, Q; // Example: N=100, Q=3 with queries like 60 84 97
49 if (!(cin >> N >> Q)) return 0;
50
51 LinearSieve sieve(N);
52
53 // Print primes up to N (optional)
54 // for (int p : sieve.primes) cout << p << ' '; cout << '\n';
55
56 while (Q--) {
57 int x; cin >> x;
58 auto fac = sieve.factorize(x);
59 cout << x << " = ";
60 for (size_t i = 0; i < fac.size(); ++i) {
61 auto [p, c] = fac[i];
62 cout << p;
63 if (c > 1) cout << '^' << c;
64 if (i + 1 != fac.size()) cout << " * ";
65 }
66 cout << '\n';
67 }
68 return 0;
69}
70

Euler’s sieve produces an SPF array while ensuring each composite is marked only once, yielding O(n) time. Using SPF, factorization proceeds by repeatedly dividing by the smallest prime factor, giving very fast query-time decomposition.

Time: O(n) for sieve, O(log n) per factorizationSpace: O(n)
Segmented sieve for primes in [L, R]
1#include <bits/stdc++.h>
2using namespace std;
3
4// Segmented sieve: compute primes in [L, R] (inclusive)
5// 1) Pre-sieve primes up to sqrt(R)
6// 2) Mark multiples in the window [L, R]
7
8static vector<int> base_primes_up_to(long long n) {
9 int m = (int)n;
10 vector<bool> is_prime(m + 1, true);
11 if (m >= 0) is_prime[0] = false;
12 if (m >= 1) is_prime[1] = false;
13 for (long long p = 2; p * p <= m; ++p) if (is_prime[(size_t)p])
14 for (long long j = p * p; j <= m; j += p)
15 is_prime[(size_t)j] = false;
16 vector<int> primes;
17 for (int i = 2; i <= m; ++i) if (is_prime[i]) primes.push_back(i);
18 return primes;
19}
20
21vector<long long> segmented_sieve(long long L, long long R) {
22 if (L > R) return {};
23 // Handle small ranges
24 L = max(2LL, L);
25 int sz = (int)(R - L + 1);
26 vector<bool> is_prime_range(sz, true);
27
28 long long root = floor(sqrt((long double)R));
29 vector<int> primes = base_primes_up_to(root);
30
31 for (long long p : primes) {
32 long long start = max(p * p, ((L + p - 1) / p) * p); // first multiple in [L, R]
33 for (long long j = start; j <= R; j += p) {
34 is_prime_range[(size_t)(j - L)] = false;
35 }
36 }
37
38 vector<long long> res;
39 for (int i = 0; i < sz; ++i) if (is_prime_range[i]) res.push_back(L + i);
40 return res;
41}
42
43int main() {
44 ios::sync_with_stdio(false);
45 cin.tie(nullptr);
46
47 long long L, R; // Example: input 10 50 -> primes in [10,50]
48 if (!(cin >> L >> R)) return 0;
49
50 auto primes = segmented_sieve(L, R);
51 cout << "Count = " << primes.size() << "\n";
52 for (auto p : primes) cout << p << ' ';
53 cout << '\n';
54 return 0;
55}
56

We first compute all primes up to √R with a standard sieve. Then we maintain a boolean window for [L, R] and mark multiples of each base prime within the window. This uses O(R−L+1) memory, which is ideal for large R when the range width is moderate.

Time: O((R - L + 1) log log R + sqrt(R) log log sqrt(R))Space: O(R - L + 1)
Odd-only sieve (half memory) and prime queries
1#include <bits/stdc++.h>
2using namespace std;
3
4// Odd-only sieve: represent only odd numbers ≥ 3
5// index i corresponds to number val = 2*i + 3
6// is_comp[i] == false means the odd number is prime candidate
7
8struct OddSieve {
9 int n; // upper bound
10 vector<uint8_t> comp; // byte-per-flag (simple). Could pack further if needed.
11
12 explicit OddSieve(int n): n(n) {
13 int m = max(0, (n - 3) / 2 + 1); // count of odds in [3..n]
14 comp.assign(m, 0);
15 int limit = (int)floor(sqrt((long double)n));
16 auto idx = [](int x){ return (x - 3) >> 1; }; // map odd x>=3 -> index
17
18 for (int p = 3; p <= limit; p += 2) {
19 if (p >= 3 && ((p & 1) == 1) && !comp[idx(p)]) {
20 long long start = 1LL * p * p; // first multiple to mark
21 if (start > n) break;
22 // convert start (odd) to index
23 for (long long j = start; j <= n; j += 2LL * p) {
24 comp[idx((int)j)] = 1;
25 }
26 }
27 }
28 }
29
30 bool isPrime(int x) const {
31 if (x < 2) return false;
32 if (x == 2) return true;
33 if ((x & 1) == 0) return false; // even > 2
34 int i = (x - 3) >> 1;
35 if (i < 0 || i >= (int)comp.size()) return false; // out of range
36 return comp[i] == 0;
37 }
38
39 vector<int> primes() const {
40 vector<int> res; if (n >= 2) res.push_back(2);
41 for (int i = 0; i < (int)comp.size(); ++i) if (!comp[i]) res.push_back(2*i + 3);
42 return res;
43 }
44};
45
46int main() {
47 ios::sync_with_stdio(false);
48 cin.tie(nullptr);
49
50 int n; // Example: input 100 -> primes up to 100
51 if (!(cin >> n)) return 0;
52
53 OddSieve S(n);
54
55 auto ps = S.primes();
56 cout << "Count = " << ps.size() << "\n";
57 for (int p : ps) cout << p << ' ';
58 cout << '\n';
59
60 // Example prime queries
61 int q; if (cin >> q) {
62 while (q--) {
63 int x; cin >> x;
64 cout << x << (S.isPrime(x) ? " is prime" : " is not prime") << '\n';
65 }
66 }
67 return 0;
68}
69

We skip all evens and store flags only for odds ≥ 3, halving memory and work. The inner loop steps by 2p to remain on odd multiples. Prime queries become O(1) array lookups.

Time: O(n log log n) but with a smaller constant (odds only)Space: O(n) bits (about half the classic sieve)