⚙️ Algorithm · Intermediate

Greedy Algorithms

Key Points

  • Greedy algorithms build a solution step by step by always taking the best local choice available.
  • Correctness typically hinges on the greedy-choice property and optimal substructure, proven via exchange argument or greedy-stays-ahead.
  • Classic problems solvable by greedy include activity selection, fractional knapsack, and Huffman coding.
  • Sorting and priority queues are the backbone data structures for most greedy solutions.
  • Not every optimization problem admits a correct greedy solution; some require dynamic programming or backtracking.
  • Tie-breaking rules can matter; choose them carefully and justify them in the proof.
  • Greedy solutions are often simple and fast, frequently running in O(n log n) time with modest memory.
  • Always accompany a greedy algorithm with a correctness proof sketch and clear invariants.

Prerequisites

  • Asymptotic analysis (Big-O): to evaluate the efficiency of greedy algorithms in terms of time and space.
  • Sorting algorithms: many greedy algorithms begin with sorting by a key such as finish time or ratio.
  • Priority queues and heaps: greedy selection often uses a heap to get the current best element efficiently.
  • Graph fundamentals: understanding vertices, edges, and weights is necessary for greedy graph algorithms like Dijkstra.
  • Proof techniques (induction, invariants): to prove correctness via exchange arguments or stays-ahead invariants.
  • Basic linear algebra and probability (optional): helpful for understanding Huffman coding and expected code length minimization.
  • Data structures (arrays, vectors, maps): to store and manipulate items, intervals, and frequencies in implementations.
  • Stable vs. unstable sorting and custom comparators: tie-breaking can affect correctness; comparators must implement the intended order.

Detailed Explanation


01 Overview

A greedy algorithm is a strategy for solving optimization problems by making a series of choices, each of which looks best at the moment. Instead of exploring all possibilities, a greedy method commits to the local optimum at each step and never reconsiders it. This can lead to remarkably simple and efficient algorithms when the problem structure ensures that locally optimal decisions combine into a globally optimal solution. The two key structural properties that enable this are the greedy-choice property (there exists an optimal solution that begins with the greedy choice) and optimal substructure (the remaining problem after a choice is made is of the same form). Many well-known algorithms are greedy at their core: scheduling by earliest finishing time, Huffman coding for optimal prefix-free compression, Dijkstra’s algorithm for shortest paths with nonnegative edges, and the fractional knapsack solution by value-to-weight ratio. In practice, greedy approaches dominate competitive programming for certain problem patterns because they require only sorting or a priority queue and yield clear correctness arguments. However, greediness is not magic—without the right structure, it may produce suboptimal answers. Consequently, experienced problem solvers combine design with proof: propose a greedy rule, test it on corner cases, and justify it with an exchange argument or a "stays-ahead" invariant that shows the greedy solution is never worse than any optimal solution prefix by prefix.

02 Intuition & Analogies

Imagine you have a day filled with shows you want to watch, each with a start and end time. If your goal is to watch as many complete shows as possible, a natural idea is to always pick the show that ends first among those you can still attend. This local choice (earliest finish) leaves you with the most remaining time for the rest—intuitively maximizing future opportunities. That’s a greedy idea: do what looks best now to stay flexible later. Another everyday analogy is packing snacks for a hike with a weight limit. If you can break snacks into parts (like pouring trail mix), you’ll fill your bag with items that give the most calories per gram first. This ratio rule ensures each additional gram yields the biggest payoff—a classic greedy for the fractional knapsack. For compression, think of pairing the quietest instruments first when mixing sounds to reduce overall loudness increments; Huffman coding does something similar by repeatedly merging the least frequent symbols, thereby keeping their code lengths short and minimizing total weighted length. Finally, consider finding a route in a city where all roads have nonnegative travel time: picking the next unvisited location with the smallest known arrival time is safe because any detour later cannot beat the already minimal time discovered—this is the heart of Dijkstra’s greedy step. These analogies highlight a common theme: greedy works when an early, seemingly myopic decision cannot hurt you later because the problem’s structure guarantees it either matches or can be transformed into an optimal plan.

03 Formal Definition

Greedy algorithms solve an optimization problem by constructing a solution iteratively. At each step t, they select an element x_t from a candidate set that maximizes (or minimizes) a local objective f(x_t, state_t), update the state, and continue until feasibility is achieved. A problem is amenable to a greedy approach if: (1) Greedy-choice property: There exists an optimal solution whose first choice equals the greedy choice g; equivalently, picking g does not preclude optimality. (2) Optimal substructure: After taking g, the remainder of the problem reduces to the same problem class on a smaller instance, and an optimal completion combined with g yields a global optimum. Correctness is commonly proven via: (a) Exchange argument: Starting from an optimal solution O, iteratively swap elements to match the greedy picks without worsening the objective, transforming O into the greedy solution; or (b) Greedy-stays-ahead: For all k, a suitable prefix measure of the greedy solution is at least (for maximization) or at most (for minimization) that of any optimal solution, ensuring final optimality. Many greedy problems can be formalized as combinatorial optimization over matroids or polymatroids, where a canonical exchange property holds. Complexity-wise, most greedy algorithms reduce to sorting (O(n log n)) and/or repeated extraction/insertion in a heap (O(log n) per operation), with linear or near-linear space.
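
To make the select/update/repeat template concrete, here is a minimal sketch. The coin-change instance (denominations 25, 10, 5, 1; amount 63) is an assumption chosen purely for illustration: for this canonical coin system the rule "take the largest coin that fits" happens to be optimal, which is exactly the kind of fact a real solution would need to prove.

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch of the generic greedy loop: at each step pick the best
// local candidate, commit to it, shrink the problem, and repeat until the
// solution is feasible. Coin change with the (assumed) canonical system
// {25, 10, 5, 1} is used only as a toy instance where this rule is optimal.
int main() {
    vector<int> coins = {25, 10, 5, 1}; // candidates, ordered by the local objective (largest first)
    int amount = 63;                    // state: amount still to be paid
    vector<int> picked;

    while (amount > 0) {                // continue until feasibility (amount fully paid)
        for (int c : coins) {
            if (c <= amount) {
                picked.push_back(c);    // greedy choice: commit, never reconsider
                amount -= c;            // update state; the remainder is the same problem, smaller
                break;
            }
        }
    }

    cout << "Coins used (" << picked.size() << "): ";
    for (int c : picked) cout << c << " ";
    cout << "\n"; // expected: 25 25 10 1 1 1 (6 coins for 63)
    return 0;
}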

04 When to Use

Use greedy algorithms when the problem exhibits a structure where making the best local move leads to a global optimum. Indicators include: (1) You can sort items by a single key (finish time, ratio, weight, frequency) and then scan once to build a solution. (2) A natural monotone invariant suggests that the greedy prefix is never worse than any other feasible prefix (stays-ahead). (3) You can argue that any optimal solution can be transformed to begin with the greedy choice without hurting optimality (exchange). Classic scenarios: interval scheduling by earliest finish to maximize the number of non-overlapping intervals; selecting items for fractional knapsack by value density; building an optimal prefix-free code by repeatedly merging the two least-frequent symbols (Huffman); determining shortest paths in nonnegative-weight graphs via Dijkstra’s algorithm. Competitive programming patterns: picking minimal sets that cover constraints with earliest-deadline-first; maintaining current best candidates with a priority queue as you sweep sorted events; or scheduling/partitioning where an obvious tie-breaker plus a heap yields provably optimal behavior. Avoid greedy if constraints or costs interact globally in complex ways (e.g., 0/1 knapsack, edit distance) where dynamic programming is necessary.
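
As a sketch of the "sweep sorted events with a priority queue" pattern mentioned above, the snippet below solves the minimum-meeting-rooms problem; the problem choice and its sample data are assumptions for illustration and are not among the worked examples later on this page. Meetings are swept in start-time order while a min-heap tracks the end times of the rooms currently in use.

#include <bits/stdc++.h>
using namespace std;

// Sweep-plus-heap pattern: sort events by start time, keep the current
// "best" candidates (here, the earliest-ending occupied rooms) in a heap.
int main() {
    vector<pair<int,int>> meetings = {{0, 30}, {5, 10}, {15, 20}, {25, 40}}; // (start, finish)
    sort(meetings.begin(), meetings.end()); // sweep order: by start time

    priority_queue<int, vector<int>, greater<int>> endTimes; // min-heap of end times of busy rooms
    size_t rooms = 0;

    for (auto [start, finish] : meetings) {
        // Free every room whose meeting has already ended before this one starts.
        while (!endTimes.empty() && endTimes.top() <= start) endTimes.pop();
        endTimes.push(finish);                 // place this meeting in a room
        rooms = max(rooms, endTimes.size());   // peak concurrency = rooms needed
    }

    cout << "Rooms needed: " << rooms << "\n"; // expected: 2
    return 0;
}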

⚠️ Common Mistakes

Common pitfalls include:

  1. Applying greedy without proof. A locally optimal rule can be seductive but wrong (e.g., 0/1 knapsack by value/weight ratio fails; a small counterexample follows this list). Always provide an exchange or stays-ahead argument.
  2. Ignoring tie-breaking. For activity selection, sorting by earliest finish is correct; breaking ties by shortest duration is not necessary but safe, whereas sorting by earliest start can be incorrect. State tie-break rules explicitly and justify them.
  3. Violating assumptions. Dijkstra’s algorithm requires nonnegative edge weights; using it with negative edges breaks correctness.
  4. Mixing fractional and integral constraints. Fractional knapsack allows splitting items; using the same idea for 0/1 knapsack leads to suboptimal picks.
  5. Overlooking feasibility during selection. When greedily adding intervals or jobs, ensure constraints (non-overlap, deadlines, capacity) remain satisfied.
  6. Floating-point precision with ratio sorts. When comparing value/weight, avoid direct division when possible; compare cross-products to reduce precision errors.
  7. Using unstable or incorrect sorting keys. If a proof depends on stable ordering under ties, use stable_sort or include the tie-breaker in the comparator.
  8. Incorrect heap choice. Some problems require a min-heap, others a max-heap; mixing them up leads to wrong selections.
  9. Not maintaining invariants. Clearly state and check the invariant after each greedy step to guide implementation and debugging.
  10. Premature optimization. Keep the core greedy logic clear; only optimize after correctness is assured.
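
The following small program illustrates pitfalls 1 and 4: the value/weight-ratio rule, which is optimal for the fractional knapsack, is applied to the 0/1 knapsack and compared against an exact dynamic-programming answer. The item set mirrors the fractional knapsack example later on this page; the comparison code itself is a sketch for illustration.

#include <bits/stdc++.h>
using namespace std;

// Counterexample: ratio-greedy is suboptimal for 0/1 knapsack.
// Items (value, weight) = (60,10), (100,20), (120,30), capacity 50.
int main() {
    vector<pair<long long,long long>> items = {{60, 10}, {100, 20}, {120, 30}};
    long long W = 50;

    // Greedy: sort by decreasing value/weight ratio, take whole items while they fit.
    auto byRatio = items;
    sort(byRatio.begin(), byRatio.end(), [](auto a, auto b){
        return a.first * b.second > b.first * a.second; // cross-multiplied ratio compare
    });
    long long greedyValue = 0, remaining = W;
    for (auto [v, w] : byRatio)
        if (w <= remaining) { greedyValue += v; remaining -= w; }

    // Exact 0/1 knapsack DP over capacities for comparison.
    vector<long long> dp(W + 1, 0);
    for (auto [v, w] : items)
        for (long long c = W; c >= w; --c)
            dp[c] = max(dp[c], dp[c - w] + v);

    cout << "Greedy by ratio (0/1): " << greedyValue << "\n"; // 160
    cout << "DP optimum (0/1):      " << dp[W] << "\n";       // 220
    return 0;
}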

Key Formulas

Greedy-choice property

There exists an optimal solution OPT whose first choice is the greedy choice g (g ∈ OPT).

Explanation: States that there is at least one optimal solution that starts with the greedy pick g. After picking g, the remainder forms a subproblem whose optimal solution completes global optimality.

Greedy stays ahead

For every prefix length k: m(G, k) ≥ m(O, k) for maximization (≤ for minimization), where m is a suitable progress measure, G the greedy solution, and O any optimal solution.

Explanation: At every prefix k, the greedy solution’s progress (measured appropriately) is no worse than that of any optimal solution. This ensures final optimality when both complete after the same number of steps.

Fractional Knapsack LP

maximize Σ v_i·x_i  subject to  Σ w_i·x_i ≤ W and 0 ≤ x_i ≤ 1

Explanation: The linear program for fractional knapsack. Sorting by value density v_i / w_i and filling greedily solves it optimally because the LP’s extreme points correspond to taking items fully or one item fractionally.

Interval Scheduling Objective

maximize |S|  subject to  S ⊆ given intervals and no two intervals in S overlap

Explanation: Choose as many intervals as possible so that none overlap. Greedy by earliest finishing time satisfies this constraint while maximizing the count.

Huffman Objective with Kraft Inequality

minimize L = Σ p_i·ℓ_i  subject to  Σ 2^(−ℓ_i) ≤ 1 (Kraft inequality)

Explanation: Minimize expected code length L given symbol probabilities p_i and codeword lengths ℓ_i. Huffman merges the two least probable symbols iteratively to achieve the minimum under the prefix-free constraint.

Runtime building blocks

Sorting: O(n log n); heap insert/extract: O(log n) each, so O((n + m) log n) for m operations over n elements

Explanation: Greedy algorithms often rely on sorting (O(n log n)) or priority queues for m operations over n elements (O((n + m) log n)). Many solutions are dominated by these costs.

Dijkstra Invariant

When u is extracted from the priority queue: dist[u] = δ(s, u)

Explanation: Once a vertex u is extracted from the priority queue, its tentative distance equals the true shortest path distance from the source s, provided all edge weights are nonnegative.

Arithmetic series

1 + 2 + ⋯ + n = n(n + 1) / 2 = Θ(n²)

Explanation: Useful for summing linear sequences during analysis of incremental greedy steps. It shows that repeatedly doing linear work per step can add up to quadratic time.

Complexity Analysis

Most greedy algorithms are lightweight because they avoid exhaustive search. Their time complexity is typically driven by two primitives: sorting and heap operations. When a solution requires sorting n elements by a key (finish time, ratio, deadline), the dominant cost is O(n log n) time and O(1) or O(n) extra space depending on the sort implementation. When the algorithm repeatedly selects the current best element, a priority queue offers O(log n) per insert/extract, leading to O(n log n) if each element is handled a constant number of times, or O((n + m) log n) in graph problems with m edges (e.g., Dijkstra). Linear scans that greedily accept or reject items post-sort often run in O(n) after sorting. Space complexity is generally modest: O(n) to store items, heaps, or adjacency lists, plus recursion/auxiliary structures for proofs or code generation (e.g., Huffman tree).

For specific examples: interval scheduling is O(n log n) time for sorting and O(n) space; fractional knapsack is O(n log n) time and O(1) or O(n) space; Huffman coding uses a min-heap with O(n log n) time and O(n) space for the tree; Dijkstra with a binary heap is O((n + m) log n) time and O(n + m) space.

Be mindful of constants: sorting with custom comparators and heavy object copies can add overhead; prefer indices or lightweight structs. When precision is required (e.g., sorting by ratios), compare via cross-products to avoid floating-point errors without changing asymptotics.

Code Examples

Interval Scheduling (Activity Selection) by Earliest Finish Time
#include <bits/stdc++.h>
using namespace std;

struct Interval {
    int start, finish;
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example intervals: [start, finish)
    vector<Interval> a = {
        {1, 4}, {3, 5}, {0, 6}, {5, 7}, {3, 9}, {5, 9}, {6, 10}, {8, 11}, {8, 12}, {2, 14}, {12, 16}
    };

    // Greedy rule: sort by earliest finish time, then pick non-overlapping intervals
    sort(a.begin(), a.end(), [](const Interval& x, const Interval& y){
        if (x.finish != y.finish) return x.finish < y.finish; // primary key: finish
        return x.start < y.start;                             // tie-breaker: earlier start (safe)
    });

    vector<Interval> chosen;
    int currentEnd = INT_MIN; // the finish time of the last chosen interval

    for (const auto& it : a) {
        if (it.start >= currentEnd) {
            // Non-overlapping; accept it and update the boundary
            chosen.push_back(it);
            currentEnd = it.finish;
        }
    }

    cout << "Maximum number of non-overlapping intervals: " << chosen.size() << "\n";
    cout << "Chosen intervals (start, finish):\n";
    for (auto &it : chosen) cout << it.start << " " << it.finish << "\n";

    return 0;
}

Sort intervals by earliest finishing time, then scan once, greedily selecting any interval that starts after (or at) the finish of the last chosen interval. The invariant is that at every step, the chosen set is feasible and ends as early as possible, leaving maximum room for future intervals.

Time: O(n log n); Space: O(n)
Fractional Knapsack by Value Density
#include <bits/stdc++.h>
using namespace std;

struct Item {
    long long value;
    long long weight;
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Items: (value, weight) and knapsack capacity W. We can take fractions.
    vector<Item> items = {
        {60, 10}, {100, 20}, {120, 30}
    };
    long long W = 50;

    // Sort by decreasing value/weight. Avoid floating division when comparing.
    sort(items.begin(), items.end(), [](const Item& a, const Item& b){
        // Compare a.value/a.weight > b.value/b.weight via cross-multiplication
        // to avoid precision issues.
        return (__int128)a.value * b.weight > (__int128)b.value * a.weight;
    });

    long double totalValue = 0.0L;
    long long remaining = W;

    for (const auto& it : items) {
        if (remaining == 0) break;
        if (it.weight <= remaining) {
            // Take the whole item
            totalValue += it.value;
            remaining -= it.weight;
        } else {
            // Take a fraction to fill the bag
            long double fraction = (long double)remaining / (long double)it.weight;
            totalValue += (long double)it.value * fraction;
            remaining = 0;
            break;
        }
    }

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Maximum fractional knapsack value: " << (double)totalValue << "\n";
    return 0;
}

We sort items by value density (value per unit weight) and take as much as possible from the highest-density items first. When capacity runs out in the middle of an item, we take just a fraction. Cross-multiplication avoids floating-point comparison errors when sorting ratios.

Time: O(n log n); Space: O(1) additional (O(n) to store items)
Huffman Coding (Optimal Prefix-Free Codes)
#include <bits/stdc++.h>
using namespace std;

struct Node {
    long long freq;
    char sym;      // meaningful only for leaves
    Node* left;
    Node* right;
    Node(long long f, char s = '\0', Node* l = nullptr, Node* r = nullptr)
        : freq(f), sym(s), left(l), right(r) {}
};

struct Cmp {
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

void buildCodes(Node* root, string path, unordered_map<char, string>& codes) {
    if (!root) return;
    if (!root->left && !root->right) {
        // Leaf node: assign code
        codes[root->sym] = path.empty() ? "0" : path; // handle single-symbol edge case
        return;
    }
    buildCodes(root->left, path + "0", codes);
    buildCodes(root->right, path + "1", codes);
}

void freeTree(Node* root){ if (!root) return; freeTree(root->left); freeTree(root->right); delete root; }

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example frequencies for symbols A..F
    vector<pair<char,long long>> symbols = {{'A',45},{'B',13},{'C',12},{'D',16},{'E',9},{'F',5}};

    priority_queue<Node*, vector<Node*>, Cmp> pq;
    for (auto &p : symbols) pq.push(new Node(p.second, p.first));

    // Greedy: repeatedly merge the two least frequent nodes
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        Node* merged = new Node(a->freq + b->freq, '\0', a, b);
        pq.push(merged);
    }

    Node* root = pq.top();

    // Build one possible code assignment (left = 0, right = 1)
    unordered_map<char, string> codes;
    buildCodes(root, "", codes);

    // Total weighted code length = sum over symbols of frequency * code length
    long long totalCost = 0;
    cout << "Symbol : Code\n";
    for (auto &p : symbols) {
        string code = codes[p.first];
        cout << p.first << " : " << code << "\n";
        totalCost += (long long)code.size() * p.second;
    }
    cout << "Total weighted code length: " << totalCost << "\n";

    freeTree(root);
    return 0;
}

Huffman coding uses a min-heap to repeatedly merge the two least-frequent nodes, ensuring rarely used symbols get deeper (longer) codes while frequent symbols stay near the root (shorter codes). The exchange argument shows any optimal tree can be rearranged to place the two least frequent symbols as siblings at maximum depth, matching the greedy merge.

Time: O(n log n); Space: O(n)
Dijkstra’s Shortest Path (Nonnegative Weights)
#include <bits/stdc++.h>
using namespace std;

struct Edge { int to; long long w; };

vector<long long> dijkstra(int n, int s, const vector<vector<Edge>>& g) {
    const long long INF = (1LL << 62);
    vector<long long> dist(n, INF);
    vector<char> used(n, 0);
    priority_queue<pair<long long,int>, vector<pair<long long,int>>, greater<pair<long long,int>>> pq;

    dist[s] = 0;
    pq.push({0, s});

    while (!pq.empty()) {
        auto [du, u] = pq.top(); pq.pop();
        if (used[u]) continue; // skip outdated entries
        used[u] = 1;
        // Invariant: dist[u] is finalized and minimal
        for (auto e : g[u]) {
            if (dist[e.to] > du + e.w) {
                dist[e.to] = du + e.w;
                pq.push({dist[e.to], e.to});
            }
        }
    }
    return dist;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 5; // vertices 0..4
    vector<vector<Edge>> g(n);
    auto add = [&](int u, int v, long long w){ g[u].push_back({v,w}); g[v].push_back({u,w}); };

    // Undirected graph with nonnegative weights
    add(0,1,2); add(0,2,4); add(1,2,1); add(1,3,7); add(2,4,3); add(3,4,1);

    int s = 0;
    auto dist = dijkstra(n, s, g);

    cout << "Distances from source " << s << ":\n";
    for (int i = 0; i < n; ++i) cout << i << ": " << dist[i] << "\n";

    return 0;
}

Dijkstra’s algorithm repeatedly extracts the vertex with the smallest tentative distance and relaxes its edges. With nonnegative weights, once a vertex is extracted, no shorter path can appear later, which is the greedy invariant ensuring correctness.

Time: O((n + m) log n) with a binary heap; Space: O(n + m)