📚 Theory · Intermediate

Online Algorithm Theory

Key Points

  • Online algorithms make decisions step by step without seeing the future and are judged against an all-knowing offline optimum.
  • Competitive analysis measures the worst-case ratio between an online algorithm’s cost and the optimal offline cost.
  • Classic results: Ski Rental is 2-competitive deterministically and e/(e-1) ≈ 1.58-competitive with randomization.
  • Paging (caching) has k-competitive deterministic algorithms like LRU and an Ω(log k) lower bound for randomized algorithms (with O(log k) achievable).
  • Online learning uses regret, the extra loss compared to the best fixed decision in hindsight, aiming for sublinear regret.
  • The Multiplicative Weights (experts) method achieves O(√(T log n)) regret in the full-information model.
  • Bandit algorithms balance exploration and exploitation when only the chosen arm’s feedback is observed.
  • These ideas power real-time ML, ad allocation, recommendation, and streaming systems where future data is unknown.

Prerequisites

  • Asymptotic notation (Big-O): to interpret competitive ratios, regret bounds, and per-round complexities.
  • Probability and expectation: randomized competitive analysis and regret bounds take expectations over algorithm randomness.
  • Basic calculus and logarithms: choosing learning rates and interpreting bounds involving log n and square roots.
  • Data structures (hash tables and linked lists): efficient LRU implementations require O(1) operations via a hash map plus linked list.
  • Algorithm analysis (worst-case vs average-case): competitive analysis is a worst-case framework; understanding the distinction avoids misinterpretation.
  • Convexity and exponential functions: Multiplicative Weights uses exponential updates and inequalities in its analysis.
  • Randomized algorithms and adversary models: understanding oblivious vs adaptive adversaries clarifies which guarantees hold.

Detailed Explanation


01 Overview

Online algorithm theory studies decision-making when inputs arrive sequentially and must be acted upon immediately without knowledge of future requests. Unlike offline algorithms that can analyze the entire input before deciding, online algorithms commit to choices that cannot be changed later. Because worst-case absolute performance can be misleading in this asymmetric setting, we compare online algorithms against an optimal offline benchmark that knows the full sequence. This is captured by competitive analysis, which asks how many times worse (in the worst case) an online algorithm can be than the offline optimum. A parallel thread—online learning—compares cumulative loss to the best fixed action in hindsight via regret. These frameworks guide the design of robust strategies that perform well even under adversarial or unpredictable inputs.

Canonical examples include the Ski Rental problem (rent vs. buy under unknown duration), paging/caching (evicting pages when memory is limited), experts and bandits (repeated decisions with feedback), and streaming algorithms (summaries with limited memory). Deterministic strategies often have tight worst-case bounds; randomization can significantly improve guarantees against oblivious adversaries. In practice, these theories underpin real-time systems in networking, operating systems, cloud autoscaling, and online machine learning pipelines.

02 Intuition & Analogies

Imagine you are deciding each morning whether to rent or buy skis but do not know how many days you will ski this season. If you buy too early, you may waste money; if you delay too long, accumulated rent may exceed the buy price. That tug-of-war is the heart of online decision-making: act now with partial information and pay a penalty if the future surprises you.

Another analogy is your kitchen fridge (a tiny cache). You fetch ingredients (pages) over time. When it’s full, you must evict something before adding a new item. If you knew the entire week’s meals, you would keep exactly what you need. Without that foresight, a good heuristic like “Least Recently Used” (evict what you haven’t touched in the longest time) tends to work well and, remarkably, has provable worst-case guarantees.

For learning, think of consulting a panel of experts every day. Each expert gives advice, and afterwards you learn how costly each advice would have been. You would like to weight experts so that your cumulative loss is nearly as small as the single best expert you could have followed in hindsight. The Multiplicative Weights method does this by gently down-weighting poor performers and up-weighting good ones, ensuring your “extra loss” (regret) grows slower than time.

In bandits, you taste only the flavor you choose (partial feedback). To discover which flavor is best, you must occasionally try others (exploration), but not too much (exploitation). Algorithms like UCB carefully balance this tradeoff to minimize total disappointment over time.

03 Formal Definition

An online algorithm A receives a request sequence \(\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_T)\). At time \(t\), it must output an action \(a_t\) based only on \(\sigma_1, \ldots, \sigma_t\) and past actions, incurring cost \(c_t(a_t, \sigma_t)\). Its total cost is \(\mathrm{ALG}(\sigma) = \sum_t c_t(a_t, \sigma_t)\). Let OPT denote the optimal offline algorithm that knows \(\sigma\) in advance. Competitive analysis: A is c-competitive if for all \(\sigma\), \(\mathrm{ALG}(\sigma) \le c\,\mathrm{OPT}(\sigma) + \alpha\), where \(\alpha \ge 0\) is an additive constant independent of \(\sigma\). For randomized A against an oblivious adversary, the guarantee holds in expectation over A's randomness. Classic result: Ski Rental has a tight deterministic bound of 2 and a randomized bound of \(e/(e-1)\).

Online learning: at each round \(t\), an algorithm chooses a distribution \(p_t\) over n experts and then observes losses \(\ell_t(i) \in [0,1]\). The algorithm's expected loss is \(\sum_i p_t(i)\,\ell_t(i)\). Regret after T rounds is \(R_T = \sum_{t=1}^{T} \sum_i p_t(i)\,\ell_t(i) - \min_i \sum_{t=1}^{T} \ell_t(i)\). An algorithm is no-regret if \(R_T = o(T)\). Multiplicative Weights with a well-chosen learning rate \(\eta\) guarantees \(R_T = O(\sqrt{T \log n})\) in the full-information setting. In bandits (partial feedback), only the chosen arm's loss is observed; the optimal regret is larger and typically depends on the gaps between arms.
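
To make the definition concrete, here is a short worked check of the deterministic Ski Rental bound (a minimal example using the rent-for-B-days-then-buy rule that also appears in the code example below; rent = 1 per day, buy price \(B\), season length \(T\)):

\[ \mathrm{ALG}(T) = \begin{cases} T & \text{if } T \le B, \\ 2B & \text{if } T > B, \end{cases} \qquad \mathrm{OPT}(T) = \min(T, B), \qquad \sup_{T} \frac{\mathrm{ALG}(T)}{\mathrm{OPT}(T)} = \frac{2B}{B} = 2. \]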

04 When to Use

Use online algorithms when inputs arrive over time and delaying decisions is impossible or costly: operating systems (paging/caching), network routing and admission control, cloud autoscaling, and real-time analytics. They are essential when the full input cannot fit into memory (streaming) or future requests are inherently unpredictable.

Competitive analysis is the right lens when the environment may be adversarial or worst-case scenarios must be tolerated—e.g., eviction policies in caches where pathological access patterns can occur. It is also helpful when you want algorithmic guarantees without assuming probabilistic input models.

Online learning with regret is appropriate when you repeatedly make predictions or choices and receive evaluative feedback—ad placement, portfolio allocation, recommendation systems, and adaptive control. Use experts/Multiplicative Weights when you can observe the losses of all actions after each round (full information). Use bandit algorithms (UCB, Thompson Sampling, ε-greedy) when you only see the feedback for the chosen action and must manage exploration vs. exploitation.

If you can batch data and obtain the entire input cheaply, offline algorithms may be simpler and more accurate. However, when latency, memory, or uncertainty dominate, the online frameworks provide principled performance guarantees.

⚠️ Common Mistakes

  • Confusing competitive ratio with approximation ratio: approximation compares offline algorithms on a fixed static input, while competitive ratio compares an online algorithm to the offline optimum across all sequences.
  • Ignoring the additive constant in competitive bounds, which can matter for short sequences and lead to misleading empirical conclusions.
  • Evaluating online algorithms on average-case i.i.d. inputs while claiming worst-case competitive guarantees; these are distinct regimes with different theorems.
  • Misusing randomization assumptions: results often assume an oblivious adversary (input fixed before randomness). Against an adaptive adversary, guarantees can weaken or fail.
  • For paging, implementing LRU without O(1) data structures (hash + linked list) leads to hidden O(n) updates and poor real-world performance.
  • In online learning, forgetting to normalize probabilities or clipping losses to [0,1] can break theoretical regret bounds.
  • In bandits, over-exploring or under-exploring due to poorly tuned parameters (e.g., ε too big/small) leads to linear regret in practice.
  • Comparing to a moving oracle in regret (best per-round or switching expert) without using the corresponding theory (e.g., tracking regret) yields unfair baselines.

Key Formulas

Competitive Ratio (pure multiplicative)

\( c_A = \sup_{\sigma} \dfrac{\mathrm{ALG}(\sigma)}{\mathrm{OPT}(\sigma)} \)

Explanation: The worst-case ratio of the online algorithm’s cost versus the offline optimal cost over all input sequences. A smaller value indicates a better online algorithm.

Competitive Ratio with Additive Constant

\( \mathrm{ALG}(\sigma) \le c \cdot \mathrm{OPT}(\sigma) + \alpha \quad \text{for all } \sigma, \ \alpha \ge 0 \)

Explanation: Allows a fixed additive slack α independent of the input. Often needed to handle initialization or rounding effects for small inputs.

Expected Competitive Ratio (randomized)

\( \mathbb{E}[\mathrm{ALG}(\sigma)] \le c \cdot \mathrm{OPT}(\sigma) + \alpha \quad \text{for every fixed } \sigma \)

Explanation: For randomized online algorithms against oblivious adversaries, the guarantee holds in expectation over the algorithm’s randomness for every fixed input sequence.

Regret (Experts)

\( R_T = \sum_{t=1}^{T} \sum_{i=1}^{n} p_t(i)\,\ell_t(i) \;-\; \min_{i} \sum_{t=1}^{T} \ell_t(i) \)

Explanation: Cumulative loss of the algorithm minus the cumulative loss of the single best expert in hindsight. Sublinear regret means average regret per round goes to zero.

Multiplicative Weights Update

\( w_{t+1}(i) = w_t(i)\, e^{-\eta\, \ell_t(i)}, \qquad p_{t+1}(i) = \dfrac{w_{t+1}(i)}{\sum_j w_{t+1}(j)} \)

Explanation: Weights are exponentially decreased for experts with higher loss; probabilities are normalized weights. The learning rate η controls the aggressiveness of updates.

MW Regret Template

\( R_T \le \dfrac{\ln n}{\eta} + \dfrac{\eta T}{2} \)

Explanation: A standard bound in MW analysis for losses in [0,1]. Choosing η optimally (≈ √(2 log n / T) for this template) yields a regret of order √(T log n).
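
Balancing the two terms of this template (a one-line calculus step, assuming losses in [0,1]) recovers the tuned rate used in the next entry and in the experts code example:

\[ \frac{d}{d\eta}\!\left(\frac{\ln n}{\eta} + \frac{\eta T}{2}\right) = 0 \;\Longrightarrow\; \eta^{*} = \sqrt{\frac{2\ln n}{T}}, \qquad R_T \le \sqrt{2\,T \ln n}. \]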

Optimized MW Regret (losses in [0,1])

\( R_T \le \sqrt{2\,T \ln n} \quad \text{with } \eta = \sqrt{2 \ln n / T} \)

Explanation: With η = √(2 log n / T), MW achieves this clean bound. It shows regret grows sublinearly with T and logarithmically with the number of experts n.

Randomized Ski Rental Strategy

\( F(x) = \dfrac{e^{x/B} - 1}{e - 1}, \qquad x \in [0, B] \)

Explanation: Buy at a random time with this cumulative distribution, where B is the buy price (rent = 1 per day). This yields competitive ratio e/(e−1).
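
Inverting this CDF gives the sampling rule used in the Ski Rental code example below, with \(u\) drawn uniformly from \([0,1]\):

\[ u = \frac{e^{x/B} - 1}{e - 1} \;\Longrightarrow\; x = B \ln\!\bigl(1 + u\,(e - 1)\bigr). \]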

Ski Rental Randomized Ratio

\( \dfrac{e}{e - 1} \approx 1.58 \)

Explanation: The optimal randomized ski rental algorithm achieves a competitive ratio of e/(e−1) ≈ 1.58 against an oblivious adversary.

LRU k-Competitiveness

\( \mathrm{LRU}(\sigma) \le k \cdot \mathrm{OPT}(\sigma) + k \quad \text{for all } \sigma \)

Explanation: For cache size k, LRU incurs at most k times as many misses as the offline optimal strategy (up to an additive constant), in the worst case over all sequences.
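
The factor k is essentially tight: with cache size k and a cyclic request sequence over k+1 distinct pages, LRU always evicts exactly the page that is requested next, so after the warm-up it faults on every request, while Bélády faults only about once every k requests:

\[ \sigma = (1, 2, \ldots, k+1,\; 1, 2, \ldots, k+1,\; \ldots) \;\Longrightarrow\; \mathrm{LRU}(\sigma) \approx |\sigma|, \qquad \mathrm{OPT}(\sigma) \approx \frac{|\sigma|}{k}. \]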

Harmonic Number

\( H_k = \sum_{i=1}^{k} \dfrac{1}{i} \approx \ln k \)

Explanation: Appears in the analysis of randomized paging algorithms; competitive ratios often scale like \(H_k\), which grows roughly as log k.

UCB1 Bandit Regret

\( \mathbb{E}[R_T] = O\!\left( \sum_{i:\,\Delta_i > 0} \dfrac{\log T}{\Delta_i} \right) \)

Explanation: For K-armed bandits with gaps Δ_i between the best arm and arm i, UCB1’s expected regret scales logarithmically with time and inversely with the gaps.

Complexity Analysis

Online algorithms are often designed for low per-request latency and small memory footprints. For paging, LRU can be implemented with a hash map and doubly linked list, giving O(1) expected time per access (insert, lookup, move-to-front) and O(k) space for cache capacity k. The optimal offline comparator (Bélády) requires looking ahead to find the farthest future use; a naive implementation is O(nk) time for a sequence of length n. This contrast highlights why we compare online LRU to OPT via competitive analysis rather than raw running time.

For Ski Rental, the algorithm’s decision is trivial per day (O(1) time, O(1) space). Deterministic rent-then-buy achieves a tight 2-competitive bound. The randomized strategy uses a single random threshold (or exponential clock) and still runs in O(1) per day. Monte Carlo evaluation to estimate expected ratios takes O(T · trials) time.

In online learning with n experts over T rounds, Multiplicative Weights maintains weights and normalizes probabilities each round: O(n) time and O(n) space per round, totaling O(nT) time and O(n) space. Its regret bound O(√(T log n)) implies average regret O(√(log n / T)), approaching zero as T grows. Bandit algorithms typically pay O(K) time per round to update statistics for K arms, though data structures can reduce argmax or sampling costs. Their regret bounds depend on problem-dependent gaps (Δ) or problem-independent rates O(√(KT)).

Overall, online algorithms trade global optimality for responsiveness, using principled worst-case or regret guarantees to ensure robust performance.

Code Examples

Ski Rental: Deterministic 2-competitive and Randomized e/(e−1)-competitive (Monte Carlo)
#include <bits/stdc++.h>
using namespace std;

// Simulate deterministic ski rental: rent until total rent equals B, then buy.
// Rent cost = 1 per day, buy cost = B.
struct SkiResult { double cost_alg; double cost_opt; };

SkiResult deterministic_ski(int T, int B){
    int rent_days = min(T, B);        // rent for at most B days
    int cost = rent_days;             // pay 1 per rented day
    if(T > B) cost += B;              // then buy once after B days
    int cost_opt = min(T, B);         // offline: rent all T days or buy immediately, whichever is cheaper
    return {double(cost), double(cost_opt)};
}

// Randomized ski rental: choose threshold tau ~ CDF F(x) = (e^{x/B}-1)/(e-1), x in [0,B].
// Buy when cumulative rent exceeds tau (equivalently, by the end of ceil(tau) days).
SkiResult randomized_ski_once(int T, int B, mt19937 &rng){
    uniform_real_distribution<double> U(0.0, 1.0);
    double u = U(rng);
    double tau = B * log(1.0 + u * (M_E - 1.0));   // inverse CDF
    int threshold_days = (int)ceil(tau);           // buy at start of day threshold_days+1
    int rent_days = min(T, threshold_days);
    int cost = rent_days;
    if(T > threshold_days) cost += B;              // if the season outlives the threshold, buy
    int cost_opt = min(T, B);
    return {double(cost), double(cost_opt)};
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int B = 20;                                    // buy price
    vector<int> horizons = {5, 10, 20, 21, 50, 100};

    cout << "Deterministic 2-competitive simulation\n";
    for(int T : horizons){
        auto r = deterministic_ski(T, B);
        cout << "T=" << T << " cost_alg=" << r.cost_alg
             << " cost_opt=" << r.cost_opt
             << " ratio=" << (r.cost_alg / max(1.0, r.cost_opt)) << "\n";
    }

    cout << "\nRandomized e/(e-1)-competitive (Monte Carlo averaging)\n";
    mt19937 rng(12345);
    int trials = 20000;
    for(int T : horizons){
        double sum_alg = 0.0, sum_opt = 0.0;
        for(int it=0; it<trials; ++it){
            auto r = randomized_ski_once(T, B, rng);
            sum_alg += r.cost_alg;
            sum_opt += r.cost_opt;
        }
        double ratio = (sum_alg / trials) / max(1.0, (sum_opt / trials));
        cout << "T=" << T << " E[cost_alg]=" << (sum_alg / trials)
             << " cost_opt=" << (sum_opt / trials)
             << " ratio≈" << ratio << "\n";
    }

    return 0;
}

We compare the classic deterministic rent-then-buy rule (2-competitive) with the optimal randomized strategy using the known inverse CDF. For a range of season lengths T, we print each algorithm’s cost and the ratio against the offline optimum. The randomized strategy’s expected ratio approaches e/(e−1) ≈ 1.58 for long horizons where OPT buys.

Time: O(|horizons| · trials) for the Monte Carlo loop, with O(1) work per simulated season. Space: O(1)
Paging: LRU (online) vs. Bélády (offline OPT) on a reference string
#include <bits/stdc++.h>
using namespace std;

// LRU cache with O(1) expected-time operations using list + unordered_map
struct LRU {
    int cap;
    list<int> dq;                                  // front = most recently used
    unordered_map<int, list<int>::iterator> pos;
    int faults = 0;
    LRU(int k): cap(k) {}
    void access(int x){
        auto it = pos.find(x);
        if(it != pos.end()){
            // hit: move to front
            dq.erase(it->second);
            dq.push_front(x);
            pos[x] = dq.begin();
        } else {
            // miss: evict the least recently used page if full, then insert
            faults++;
            if((int)dq.size() == cap){
                int victim = dq.back();
                dq.pop_back();
                pos.erase(victim);
            }
            dq.push_front(x);
            pos[x] = dq.begin();
        }
    }
};

// Bélády's optimal offline paging (for comparison). O(n*k) naive.
int belady_faults(const vector<int>& req, int k){
    unordered_set<int> cache;
    int faults = 0;
    const int n = (int)req.size();
    for(int i=0;i<n;i++){
        int x = req[i];
        if(cache.count(x)) continue;               // hit
        faults++;
        if((int)cache.size() < k){ cache.insert(x); continue; }
        // evict the page with the farthest next use (or never used again)
        int victim = -1, far = -1;
        for(int y : cache){
            int next = n+1;                        // "infinity" sentinel
            for(int j=i+1;j<n;j++) if(req[j]==y){ next=j; break; }
            if(next > far){ far = next; victim = y; }
        }
        cache.erase(victim);
        cache.insert(x);
    }
    return faults;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    vector<int> seq = {1,2,3,4,1,2,5,1,2,3,4,5};
    int k = 3;                                     // cache size

    LRU lru(k);
    for(int x : seq) lru.access(x);
    int opt = belady_faults(seq, k);

    cout << "Sequence: ";
    for(int x : seq) cout << x << ' ';
    cout << "\n";
    cout << "Cache size k=" << k << "\n";
    cout << "LRU faults=" << lru.faults << "\n";
    cout << "OPT (Belady) faults=" << opt << "\n";
    if(opt > 0) cout << "Empirical ratio LRU/OPT=" << (double)lru.faults/opt << "\n";

    return 0;
}

Implements an online LRU cache with O(1) expected-time operations and compares its page faults on a classic reference string to the offline optimal Bélády algorithm. The LRU competitive ratio is k in the worst case, but on typical sequences it often performs close to optimal.

Time: LRU accesses O(n) total; Bélády O(nk) with naive lookahead. Space: LRU O(k); Bélády O(k)
Multiplicative Weights (Experts): O(√(T log n)) regret (simulation)
#include <bits/stdc++.h>
using namespace std;

// Multiplicative Weights for full-information experts with losses in [0,1]
struct MW {
    int n; double eta;
    vector<double> w;                              // expert weights
    MW(int n_, double eta_) : n(n_), eta(eta_), w(n_, 1.0) {}
    vector<double> probs() const {
        double S = accumulate(w.begin(), w.end(), 0.0);
        vector<double> p(n);
        for(int i=0;i<n;i++) p[i] = w[i]/S;
        return p;
    }
    void update(const vector<double>& loss){
        for(int i=0;i<n;i++) w[i] *= exp(-eta * loss[i]);
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 5;                                     // experts
    int T = 2000;                                  // rounds
    double eta = sqrt(2.0 * log(n) / T);           // tuned for losses in [0,1]
    MW alg(n, eta);

    // Stochastic environment where expert 0 is slightly better:
    // expert 0 has Bernoulli(0.35) loss; the others Bernoulli(0.5).
    mt19937 rng(123);
    bernoulli_distribution good(0.35), bad(0.5);

    double alg_loss = 0.0;
    vector<double> expert_loss(n, 0.0);

    for(int t=1; t<=T; ++t){
        vector<double> loss(n);
        loss[0] = good(rng);
        for(int i=1;i<n;i++) loss[i] = bad(rng);

        auto p = alg.probs();
        double round_alg_loss = 0.0;
        for(int i=0;i<n;i++) round_alg_loss += p[i]*loss[i];
        alg_loss += round_alg_loss;
        for(int i=0;i<n;i++) expert_loss[i] += loss[i];

        alg.update(loss);
    }

    double best = *min_element(expert_loss.begin(), expert_loss.end());
    double regret = alg_loss - best;

    cout.setf(std::ios::fixed); cout << setprecision(4);
    cout << "Experts n=" << n << ", rounds T=" << T << "\n";
    cout << "Algorithm cumulative loss=" << alg_loss << "\n";
    cout << "Best expert loss=" << best << "\n";
    cout << "Regret=" << regret << " (should be O(sqrt(T log n)) ≈ "
         << sqrt(2.0*T*log(n)) << ")\n";

    return 0;
}

Implements the full-information Multiplicative Weights algorithm. At each round, it forms a probability distribution over experts from exponential weights, incurs the expected loss, observes all experts’ losses, and updates weights. The reported regret matches the theoretical O(√(T log n)) scaling up to constants and randomness.

Time: O(nT) total (O(n) per round). Space: O(n)
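UCB1 Bandit: logarithmic regret under partial feedback (simulation)

The experts example above assumes full information; for the bandit setting discussed earlier there is no example, so the following is a minimal UCB1 sketch (the Bernoulli arm means, horizon, and seed are illustrative assumptions, not taken from the text). Each arm is pulled once, then the algorithm plays the arm maximizing its empirical mean plus the confidence bonus √(2 ln t / n_i), and the realized regret against the best arm is reported.

#include <bits/stdc++.h>
using namespace std;

// UCB1 for K-armed Bernoulli bandits with rewards in [0,1].
// After pulling each arm once, always pull argmax of (empirical mean + sqrt(2 ln t / pulls)).
int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int K = 4;                                     // arms
    int T = 20000;                                 // rounds
    vector<double> mu = {0.55, 0.50, 0.45, 0.40};  // true (unknown) mean rewards
    mt19937 rng(2024);
    uniform_real_distribution<double> U(0.0, 1.0);

    vector<int> pulls(K, 0);
    vector<double> sum_reward(K, 0.0);
    double total_reward = 0.0;

    for(int t = 1; t <= T; ++t){
        int arm;
        if(t <= K){
            arm = t - 1;                           // initialization: pull each arm once
        } else {
            arm = 0;
            double best_score = -1e18;
            for(int i = 0; i < K; ++i){
                double mean  = sum_reward[i] / pulls[i];
                double bonus = sqrt(2.0 * log((double)t) / pulls[i]);
                double score = mean + bonus;       // optimism in the face of uncertainty
                if(score > best_score){ best_score = score; arm = i; }
            }
        }
        double reward = (U(rng) < mu[arm]) ? 1.0 : 0.0;   // Bernoulli reward
        pulls[arm]++;
        sum_reward[arm] += reward;
        total_reward += reward;
    }

    double best_mu = *max_element(mu.begin(), mu.end());
    double regret = best_mu * T - total_reward;    // realized regret vs. always playing the best arm

    cout.setf(std::ios::fixed); cout << setprecision(2);
    cout << "Rounds T=" << T << ", arms K=" << K << "\n";
    for(int i = 0; i < K; ++i)
        cout << "arm " << i << ": pulls=" << pulls[i]
             << " empirical mean=" << (sum_reward[i] / max(1, pulls[i])) << "\n";
    cout << "Total reward=" << total_reward << "\n";
    cout << "Regret vs best arm≈" << regret
         << " (theory: O(sum_i log T / gap_i))\n";

    return 0;
}

Because the optimistic index samples suboptimal arms only logarithmically often, the printed regret should grow roughly like log T rather than linearly, consistent with the UCB1 bound listed in Key Formulas.

Time: O(KT) total (O(K) per round). Space: O(K)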