Online Algorithm Theory
Key Points
- Online algorithms make decisions step by step without seeing the future and are judged against an all-knowing offline optimum.
- Competitive analysis measures the worst-case ratio between an online algorithm’s cost and the optimal offline cost.
- Classic results: Ski Rental is 2-competitive deterministically and e/(e-1) ≈ 1.58-competitive with randomization.
- Paging (caching) has k-competitive deterministic algorithms like LRU and an Ω(log k) lower bound for randomized algorithms (with O(log k) achievable).
- Online learning uses regret, the extra loss compared to the best fixed decision in hindsight, aiming for sublinear regret.
- The Multiplicative Weights (experts) method achieves O(√(T log n)) regret in the full-information model.
- Bandit algorithms balance exploration and exploitation when only the chosen arm’s feedback is observed.
- These ideas power real-time ML, ad allocation, recommendation, and streaming systems where future data is unknown.
Prerequisites
- Asymptotic notation (Big-O) — To interpret competitive ratios, regret bounds, and per-round complexities.
- Probability and expectation — Randomized competitive analysis and regret bounds take expectations over algorithm randomness.
- Basic calculus and logarithms — Choosing learning rates and interpreting bounds involving log n and square roots.
- Data structures: hash tables and linked lists — Efficient LRU implementations require O(1) operations via hash + list.
- Algorithm analysis (worst-case vs average-case) — Competitive analysis is a worst-case framework; understanding the distinction avoids misinterpretation.
- Convexity and exponential functions — Multiplicative Weights uses exponential updates and inequalities in its analysis.
- Randomized algorithms and adversary models — Understanding oblivious vs adaptive adversaries clarifies which guarantees hold.
Detailed Explanation
01 Overview
Online algorithm theory studies decision-making when inputs arrive sequentially and must be acted upon immediately without knowledge of future requests. Unlike offline algorithms that can analyze the entire input before deciding, online algorithms commit to choices that cannot be changed later. Because worst-case absolute performance can be misleading in this asymmetric setting, we compare online algorithms against an optimal offline benchmark that knows the full sequence. This is captured by competitive analysis, which asks how many times worse (in the worst case) an online algorithm can be than the offline optimum. A parallel thread—online learning—compares cumulative loss to the best fixed action in hindsight via regret. These frameworks guide the design of robust strategies that perform well even under adversarial or unpredictable inputs.
Canonical examples include the Ski Rental problem (rent vs. buy under unknown duration), paging/caching (evicting pages when memory is limited), experts and bandits (repeated decisions with feedback), and streaming algorithms (summaries with limited memory). Deterministic strategies often have tight worst-case bounds; randomization can significantly improve guarantees against oblivious adversaries. In practice, these theories underpin real-time systems in networking, operating systems, cloud autoscaling, and online machine learning pipelines.
02 Intuition & Analogies
Imagine you are deciding each morning whether to rent or buy skis but do not know how many days you will ski this season. If you buy too early, you may waste money; if you delay too long, accumulated rent may exceed the buy price. That tug-of-war is the heart of online decision-making: act now with partial information and pay a penalty if the future surprises you.
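A quick worked case: with buy price B = 10 and rent 1 per day, the rule “rent until you have paid 10 in rent, then buy” costs exactly what the offline optimum costs if the season ends within 10 days, and costs 10 (rent) + 10 (purchase) = 20 if the season lasts longer, while the optimum would have bought immediately for 10. The worst-case ratio is therefore 2, and no deterministic strategy can guarantee a factor much better than 2 for large B, which is the sense in which ski rental is 2-competitive deterministically.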
Another analogy is your kitchen fridge (a tiny cache). You fetch ingredients (pages) over time. When it’s full, you must evict something before adding a new item. If you knew the entire week’s meals, you would keep exactly what you need. Without that foresight, a good heuristic like “Least Recently Used” (evict what you haven’t touched in the longest time) tends to work well and, remarkably, has provable worst-case guarantees.
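To see why the factor k in paging guarantees is unavoidable for deterministic policies, picture a cache of size k = 3 and four pages requested adversarially: the adversary always asks for whichever page the deterministic policy most recently evicted, so the online policy faults on every request, whereas an offline optimum that evicts the page needed farthest in the future faults only about once every k requests. The gap is a factor of roughly k, matching LRU’s worst-case guarantee.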
For learning, think of consulting a panel of experts every day. Each expert gives advice, and afterwards you learn how costly each expert’s advice would have been. You would like to weight experts so that your cumulative loss is nearly as small as that of the single best expert you could have followed in hindsight. The Multiplicative Weights method does this by gently down-weighting poor performers and up-weighting good ones, ensuring your “extra loss” (regret) grows slower than time.
In bandits, you taste only the flavor you choose (partial feedback). To discover which flavor is best, you must occasionally try others (exploration), but not too much (exploitation). Algorithms like UCB carefully balance this tradeoff to minimize total disappointment over time.
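To make the balance concrete, below is a minimal C++ sketch of UCB1 for a Bernoulli bandit. The arm success probabilities, horizon, and seed are made-up illustration parameters, not values from the text.

```cpp
#include <bits/stdc++.h>
using namespace std;

// UCB1 sketch: pull each arm once, then always pull the arm maximizing
// empirical mean + sqrt(2 ln t / pulls). Only the chosen arm's reward is seen.
int main(){
    const int K = 3, T = 10000;                      // arms and rounds (illustrative)
    vector<double> true_mean = {0.3, 0.5, 0.6};      // assumed Bernoulli reward probabilities
    mt19937 rng(7);
    uniform_real_distribution<double> U(0.0, 1.0);

    vector<int> pulls(K, 0);
    vector<double> sum_reward(K, 0.0);
    double total_reward = 0.0;

    for(int t = 1; t <= T; ++t){
        int arm = 0;
        if(t <= K){
            arm = t - 1;                             // initialization: try each arm once
        } else {
            double best = -1e18;
            for(int i = 0; i < K; ++i){
                double ucb = sum_reward[i] / pulls[i]
                           + sqrt(2.0 * log((double)t) / pulls[i]);
                if(ucb > best){ best = ucb; arm = i; }
            }
        }
        double reward = (U(rng) < true_mean[arm]) ? 1.0 : 0.0;  // bandit feedback: chosen arm only
        pulls[arm]++; sum_reward[arm] += reward; total_reward += reward;
    }

    cout << "Pulls per arm:"; for(int c : pulls) cout << ' ' << c; cout << "\n";
    double best_mean = *max_element(true_mean.begin(), true_mean.end());
    cout << "Estimated regret vs. always playing the best arm ≈ "
         << best_mean * T - total_reward << "\n";
    return 0;
}
```

Under these assumed means, the suboptimal arms end up being pulled only O(log T) times each, mirroring the logarithmic regret bound quoted in the Key Formulas section.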
03 Formal Definition
An online algorithm ALG is c-competitive if there exists a constant α such that, for every request sequence σ, ALG(σ) ≤ c · OPT(σ) + α, where OPT(σ) is the cost of an optimal offline algorithm that sees all of σ in advance; the competitive ratio of ALG is the smallest such c. For randomized algorithms against an oblivious adversary, the requirement becomes E[ALG(σ)] ≤ c · OPT(σ) + α for every fixed σ, with the expectation over the algorithm’s own randomness. In online learning over T rounds with n actions, an algorithm that plays distribution p_t and then observes the loss vector ℓ_t in round t has regret R_T = Σ_{t=1}^{T} ⟨p_t, ℓ_t⟩ − min_i Σ_{t=1}^{T} ℓ_{t,i}; the goal is sublinear regret, meaning R_T / T → 0 as T grows.
04 When to Use
Use online algorithms when inputs arrive over time and delaying decisions is impossible or costly: operating systems (paging/caching), network routing and admission control, cloud autoscaling, and real-time analytics. They are essential when the full input cannot fit into memory (streaming) or future requests are inherently unpredictable.
Competitive analysis is the right lens when the environment may be adversarial or worst-case scenarios must be tolerated—e.g., eviction policies in caches where pathological access patterns can occur. It is also helpful when you want algorithmic guarantees without assuming probabilistic input models.
Online learning with regret is appropriate when you repeatedly make predictions or choices and receive evaluative feedback—ad placement, portfolio allocation, recommendation systems, and adaptive control. Use experts/Multiplicative Weights when you can observe the losses of all actions after each round (full information). Use bandit algorithms (UCB, Thompson Sampling, ε-greedy) when you only see the feedback for the chosen action and must manage exploration vs. exploitation.
If you can batch data and obtain the entire input cheaply, offline algorithms may be simpler and more accurate. However, when latency, memory, or uncertainty dominate, the online frameworks provide principled performance guarantees.
⚠️ Common Mistakes
- Confusing competitive ratio with approximation ratio: an approximation ratio bounds how far a (typically polynomial-time) offline algorithm is from the optimum on each input, while a competitive ratio bounds how far an online algorithm is from the offline optimum over all request sequences; the former captures limited computation, the latter limited information.
- Ignoring the additive constant in competitive bounds, which can matter for short sequences and lead to misleading empirical conclusions.
- Evaluating online algorithms on average-case i.i.d. inputs while claiming worst-case competitive guarantees; these are distinct regimes with different theorems.
- Misusing randomization assumptions: results often assume an oblivious adversary (input fixed before randomness). Against an adaptive adversary, guarantees can weaken or fail.
- For paging, implementing LRU without O(1) data structures (hash + linked list) leads to hidden O(n) updates and poor real-world performance.
- In online learning, forgetting to normalize probabilities or to clip losses to [0,1] can break theoretical regret bounds (a short guard-rail sketch follows this list).
- In bandits, over-exploring or under-exploring due to poorly tuned parameters (e.g., ε too big/small) leads to linear regret in practice.
- Comparing to a moving oracle in regret (best per-round or switching expert) without using the corresponding theory (e.g., tracking regret) yields unfair baselines.
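A minimal guard-rail sketch for the Multiplicative Weights pitfalls above: clip raw losses into [0,1] before the exponential update and renormalize the weights so they cannot underflow over long runs. The raw loss values here are made up for illustration.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Clip losses to [0,1] before the MW update, then renormalize weights so long
// runs do not drive them to numerical underflow.
int main(){
    const int n = 4;
    const double eta = 0.1;
    vector<double> w(n, 1.0);
    vector<double> raw_loss = {-0.3, 0.4, 2.5, 0.9};        // raw feedback, possibly outside [0,1]

    for(int i = 0; i < n; ++i){
        double clipped = min(1.0, max(0.0, raw_loss[i]));   // enforce losses in [0,1]
        w[i] *= exp(-eta * clipped);
    }
    double S = accumulate(w.begin(), w.end(), 0.0);
    for(double &x : w) x /= S;                              // renormalized weights double as probabilities

    for(int i = 0; i < n; ++i) cout << "p[" << i << "] = " << w[i] << "\n";
    return 0;
}
```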
Key Formulas
Competitive Ratio (pure multiplicative)
ALG(σ) ≤ c · OPT(σ) for all σ, i.e., c = sup_σ ALG(σ) / OPT(σ)
Explanation: The worst-case ratio of the online algorithm’s cost versus the offline optimal cost over all input sequences. A smaller value indicates a better online algorithm.
Competitive Ratio with Additive Constant
ALG(σ) ≤ c · OPT(σ) + α for all σ
Explanation: Allows a fixed additive slack α independent of the input. Often needed to handle initialization or rounding effects for small inputs.
Expected Competitive Ratio (randomized)
E[ALG(σ)] ≤ c · OPT(σ) for every fixed σ
Explanation: For randomized online algorithms against oblivious adversaries, the guarantee holds in expectation over the algorithm’s randomness for every fixed input sequence.
Regret (Experts)
R_T = Σ_{t=1}^{T} ⟨p_t, ℓ_t⟩ − min_i Σ_{t=1}^{T} ℓ_{t,i}
Explanation: Cumulative loss of the algorithm minus the cumulative loss of the single best expert in hindsight. Sublinear regret means average regret per round goes to zero.
Multiplicative Weights Update
w_{t+1,i} = w_{t,i} · exp(−η ℓ_{t,i}),   p_{t,i} = w_{t,i} / Σ_j w_{t,j}
Explanation: Weights are exponentially decreased for experts with higher loss; probabilities are normalized weights. The learning rate η controls the aggressiveness of updates.
MW Regret Template
R_T ≤ (ln n)/η + (η T)/2
Explanation: A standard bound in MW analysis. Choosing η optimally (≈ √(log n / T)) yields a regret of order √(T log n).
Optimized MW Regret (losses in [0,1])
R_T ≤ √(2 T ln n)
Explanation: With η = √(2 log n / T), MW achieves this clean bound. It shows regret grows sublinearly with T and logarithmically with the number of experts n.
Randomized Ski Rental Strategy
F(x) = (e^{x/B} − 1)/(e − 1),  x ∈ [0, B]
Explanation: Buy at a random time with this cumulative distribution, where B is the buy price (rent = 1 per day). This yields competitive ratio e/(e−1).
Ski Rental Randomized Ratio
e/(e − 1) ≈ 1.582
Explanation: The optimal randomized ski rental algorithm achieves a competitive ratio of e/(e−1) ≈ 1.58 against an oblivious adversary.
LRU k-Competitiveness
LRU(σ) ≤ k · OPT(σ) + k for all σ
Explanation: For cache size k, LRU incurs at most k times as many misses as the offline optimal strategy (up to a small additive constant), in the worst case over all sequences.
Harmonic Number
H_k = 1 + 1/2 + 1/3 + ⋯ + 1/k ≈ ln k
Explanation: Appears in the analysis of randomized paging algorithms; competitive ratios often scale like H_k, which grows roughly as log k.
UCB1 Bandit Regret
E[R_T] = O( Σ_{i: Δ_i > 0} (log T) / Δ_i )
Explanation: For K-armed bandits with gaps Δ_i between the best arm and arm i, UCB1’s expected regret scales logarithmically with time and inversely with the gaps.
Complexity Analysis
The ski rental simulation does O(1) work per season plus O(trials) work for the Monte Carlo average. The LRU implementation uses a hash map plus doubly linked list for O(1) expected time per access and O(k) space. The naive Bélády comparison scans ahead for every cached page on each miss, roughly O(k · n) per miss and up to O(n² · k) overall, which is fine for small demos but not for production use. Multiplicative Weights spends O(n) time per round to update and normalize n expert weights, O(nT) over T rounds, with O(n) space.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Simulate deterministic ski rental: rent until total rent equals B, then buy.
// rent cost = 1 per day, buy cost = B.
struct SkiResult { double cost_alg; double cost_opt; };

SkiResult deterministic_ski(int T, int B){
    int rent_days = min(T, B);          // rent for at most B days
    int cost = rent_days;               // pay 1 per rented day
    if(T > B) cost += B;                // then buy once after B days
    int cost_opt = min(T, B);           // offline: rent all T or buy immediately, whichever cheaper
    return {double(cost), double(cost_opt)};
}

// Randomized ski rental: choose threshold tau ~ CDF F(x) = (e^{x/B}-1)/(e-1), x in [0,B]
// Buy when cumulative rent exceeds tau (equivalently, by end of ceil(tau) days).
SkiResult randomized_ski_once(int T, int B, mt19937 &rng){
    uniform_real_distribution<double> U(0.0, 1.0);
    double u = U(rng);
    double tau = B * log(1.0 + u * (M_E - 1.0));   // inverse CDF
    int threshold_days = (int)ceil(tau);           // buy at start of day threshold_days+1
    int rent_days = min(T, threshold_days);
    int cost = rent_days;
    if(T > threshold_days) cost += B;              // if season outlives threshold, eventually buy
    int cost_opt = min(T, B);
    return {double(cost), double(cost_opt)};
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int B = 20;                                    // buy price
    vector<int> horizons = {5, 10, 20, 21, 50, 100};

    cout << "Deterministic 2-competitive simulation\n";
    for(int T : horizons){
        auto r = deterministic_ski(T, B);
        cout << "T=" << T << " cost_alg=" << r.cost_alg
             << " cost_opt=" << r.cost_opt
             << " ratio=" << (r.cost_alg / max(1.0, r.cost_opt)) << "\n";
    }

    cout << "\nRandomized e/(e-1)-competitive (Monte Carlo averaging)\n";
    mt19937 rng(12345);
    int trials = 20000;
    for(int T : horizons){
        double sum_alg = 0.0, sum_opt = 0.0;
        for(int it=0; it<trials; ++it){
            auto r = randomized_ski_once(T, B, rng);
            sum_alg += r.cost_alg;
            sum_opt += r.cost_opt;
        }
        double ratio = (sum_alg / trials) / max(1.0, (sum_opt / trials));
        cout << "T=" << T << " E[cost_alg]=" << (sum_alg / trials)
             << " cost_opt=" << (sum_opt / trials)
             << " ratio≈" << ratio << "\n";
    }

    return 0;
}
```
We compare the classic deterministic rent-then-buy rule (2-competitive) with the optimal randomized strategy using the known inverse CDF. For a range of season lengths T, we print each algorithm’s cost and the ratio against the offline optimum. The randomized strategy’s expected ratio approaches e/(e−1) ≈ 1.58 for long horizons where OPT buys.
```cpp
#include <bits/stdc++.h>
using namespace std;

// LRU Cache with O(1) expected operations using list + unordered_map
struct LRU {
    int cap;
    list<int> dq;                                   // front = most recent
    unordered_map<int, list<int>::iterator> pos;
    int faults = 0;
    LRU(int k): cap(k) {}
    void access(int x){
        auto it = pos.find(x);
        if(it != pos.end()){
            // hit: move to front
            dq.erase(it->second);
            dq.push_front(x);
            pos[x] = dq.begin();
        } else {
            // miss
            faults++;
            if((int)dq.size() == cap){
                int victim = dq.back();
                dq.pop_back();
                pos.erase(victim);
            }
            dq.push_front(x);
            pos[x] = dq.begin();
        }
    }
};

// Bélády's optimal offline paging (for comparison). Naive: O(k·n) scan per miss.
int belady_faults(const vector<int>& req, int k){
    unordered_set<int> cache;
    int faults = 0;
    const int n = (int)req.size();
    for(int i=0;i<n;i++){
        int x = req[i];
        if(cache.count(x)) continue;                // hit
        faults++;
        if((int)cache.size() < k){ cache.insert(x); continue; }
        // evict the page with farthest next use (or never used again)
        int victim = -1, far = -1;
        for(int y : cache){
            int next = n+1;                         // infinity sentinel
            for(int j=i+1;j<n;j++) if(req[j]==y){ next=j; break; }
            if(next > far){ far = next; victim = y; }
        }
        cache.erase(victim);
        cache.insert(x);
    }
    return faults;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    vector<int> seq = {1,2,3,4,1,2,5,1,2,3,4,5};
    int k = 3;                                      // cache size

    LRU lru(k);
    for(int x : seq) lru.access(x);
    int opt = belady_faults(seq, k);

    cout << "Sequence: ";
    for(int x: seq) cout << x << ' ';
    cout << "\n";
    cout << "Cache size k=" << k << "\n";
    cout << "LRU faults=" << lru.faults << "\n";
    cout << "OPT (Belady) faults=" << opt << "\n";
    if(opt>0) cout << "Empirical ratio LRU/OPT=" << (double)lru.faults/opt << "\n";

    return 0;
}
```
Implements an online LRU cache with O(1) expected-time operations and compares its page faults on a classic reference string to the offline optimal Bélády algorithm. The LRU competitive ratio is k in the worst case, but on typical sequences it often performs close to optimal.
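For contrast with deterministic LRU, here is a minimal sketch of the randomized Marking algorithm, which is O(log k)-competitive against an oblivious adversary; the request sequence reuses the one above and the identifiers are illustrative.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Randomized Marking algorithm (sketch). On a hit, mark the page. On a fault,
// if every cached page is marked, unmark all (a new phase begins); then evict a
// uniformly random unmarked page and bring the new page in marked.
struct Marking {
    int cap;
    unordered_map<int,bool> in_cache;   // page -> marked flag
    mt19937 rng;
    int faults = 0;
    Marking(int k, unsigned seed = 42): cap(k), rng(seed) {}
    void access(int x){
        auto it = in_cache.find(x);
        if(it != in_cache.end()){ it->second = true; return; }   // hit: mark
        faults++;
        if((int)in_cache.size() == cap){
            vector<int> unmarked;
            for(auto &kv : in_cache) if(!kv.second) unmarked.push_back(kv.first);
            if(unmarked.empty()){                                // all marked: start a new phase
                for(auto &kv : in_cache){ kv.second = false; unmarked.push_back(kv.first); }
            }
            uniform_int_distribution<int> pick(0, (int)unmarked.size() - 1);
            in_cache.erase(unmarked[pick(rng)]);                 // evict a random unmarked page
        }
        in_cache[x] = true;                                      // insert and mark
    }
};

int main(){
    vector<int> seq = {1,2,3,4,1,2,5,1,2,3,4,5};
    Marking alg(3);                                              // cache size k = 3
    for(int x : seq) alg.access(x);
    cout << "Marking faults=" << alg.faults << "\n";
    return 0;
}
```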
```cpp
#include <bits/stdc++.h>
using namespace std;

// Multiplicative Weights for full-information experts with losses in [0,1]
struct MW {
    int n; double eta;
    vector<double> w;                               // expert weights
    MW(int n_, double eta_) : n(n_), eta(eta_), w(n_, 1.0) {}
    vector<double> probs() const {
        double S = accumulate(w.begin(), w.end(), 0.0);
        vector<double> p(n);
        for(int i=0;i<n;i++) p[i] = w[i]/S;
        return p;
    }
    void update(const vector<double>& loss){
        for(int i=0;i<n;i++) w[i] *= exp(-eta * loss[i]);
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 5;                                      // experts
    int T = 2000;                                   // rounds
    double eta = sqrt(2.0 * log(n) / T);            // tuned for losses in [0,1]
    MW alg(n, eta);

    // Create a stochastic environment where expert 0 is slightly better
    // Expert 0: Bernoulli(0.35) loss; others: Bernoulli(0.5)
    mt19937 rng(123);
    bernoulli_distribution good(0.35), bad(0.5);

    double alg_loss = 0.0;
    vector<double> expert_loss(n, 0.0);

    for(int t=1; t<=T; ++t){
        vector<double> loss(n);
        loss[0] = good(rng);
        for(int i=1;i<n;i++) loss[i] = bad(rng);

        auto p = alg.probs();
        double round_alg_loss = 0.0;
        for(int i=0;i<n;i++) round_alg_loss += p[i]*loss[i];
        alg_loss += round_alg_loss;
        for(int i=0;i<n;i++) expert_loss[i] += loss[i];

        alg.update(loss);
    }

    double best = *min_element(expert_loss.begin(), expert_loss.end());
    double regret = alg_loss - best;

    cout.setf(std::ios::fixed); cout << setprecision(4);
    cout << "Experts n=" << n << ", rounds T=" << T << "\n";
    cout << "Algorithm cumulative loss=" << alg_loss << "\n";
    cout << "Best expert loss=" << best << "\n";
    cout << "Regret=" << regret << " (should be O(sqrt(T log n))≈ "
         << sqrt(2.0*T*log(n)) << ")\n";

    return 0;
}
```
Implements the full-information Multiplicative Weights algorithm. At each round, it forms a probability distribution over experts from exponential weights, incurs the expected loss, observes all experts’ losses, and updates weights. The reported regret matches the theoretical O(√(T log n)) scaling up to constants and randomness.