Game Theory
Key Points
- Game theory studies strategic decision-making among multiple players, where each player’s payoff depends on everyone’s actions.
- Normal-form games represent simultaneous choices with a payoff matrix; extensive-form games use trees for sequential moves.
- A Nash equilibrium is a strategy profile where no player can gain by unilaterally deviating; it may require mixed (randomized) strategies.
- In zero-sum games, one player’s gain is the other’s loss, and the minimax theorem guarantees equality of the max–min and min–max values.
- Dominated strategies can be removed via iterated elimination to simplify a game without affecting rational outcomes.
- Mixed strategies assign probabilities to pure actions and, by Nash’s theorem, guarantee equilibrium existence in every finite game.
- Minimax optimization, regret matching, and linear programming are practical tools to compute or approximate equilibria in different settings.
- Game theory underpins AI/ML areas such as GAN training (a minimax game), multi-agent RL, mechanism design, and safety/alignment methods such as RLHF.
Prerequisites
- Probability distributions — Mixed strategies are probability distributions over actions; expected utility requires computing weighted averages.
- Linear algebra (vectors and matrices) — Payoff matrices, mixed strategies, and expected payoffs use matrix–vector operations like p^T A q.
- Optimization and linear programming — Zero-sum equilibria can be solved via LP; understanding constraints and duality clarifies minimax.
- Algorithms and complexity (Big-O) — Analyzing the runtime of equilibrium-finding and learning dynamics guides practical implementations.
- Basic C++ programming — Implementing payoff processing, loops, and numerical checks is necessary for the provided code.
- Graph/tree basics — Extensive-form games are trees; backward induction and subgames rely on tree reasoning.
- Convexity and fixed-point ideas (high level) — Equilibrium existence and properties rely on convex sets and fixed-point theorems, even if not proven here.
Detailed Explanation
01 Overview
Game theory is the study of strategic interaction among rational decision-makers (players). Each player chooses an action (strategy) to maximize their own payoff, but their outcome depends on the actions of others. Games can be represented in normal form (as payoff matrices) when decisions are simultaneous, or in extensive form (as trees) when decisions are sequential or information is revealed over time.

A central solution concept is Nash equilibrium: a strategy profile in which no player benefits from changing only their own strategy. While some games have pure-strategy equilibria (deterministic choices), many require mixed strategies—probability distributions over actions. In zero-sum games (one player’s gain equals the other’s loss), the minimax theorem ensures a well-defined game value and equality of the max–min and min–max outcomes.

Tools like eliminating dominated strategies, computing best responses, and solving linear programs help analyze and solve games. In AI/ML, game theory models adversarial training (e.g., GANs), strategic multi-agent interactions, and incentive design. Understanding how strategies interact, converge, and respond to incentives allows us to design robust algorithms and systems that perform well in competitive or cooperative environments.
02 Intuition & Analogies
Imagine choosing whether to bring an umbrella without knowing if your friend brings one, but your comfort depends on both choices. Or picture two drivers approaching an intersection: each prefers to go, but if both go, they crash; if both yield, they wait—so each player’s best move depends on the other’s move. That interdependence is the heart of game theory.

A normal-form game is like a grid of outcomes: rows are Player 1’s actions, columns are Player 2’s, and each cell shows both players’ payoffs. Think of it like a price-setting competition where two stores pick prices at the same time; profits depend on both prices. A Nash equilibrium is a stable stand-off: neither store wants to change price alone, given the other’s price. Sometimes, to avoid being predictable (and exploitable), players must randomize—like a soccer goalie deciding which way to dive or a penalty taker choosing a side. Mixed strategies keep opponents indifferent and unable to gain by guessing your move.

In zero-sum settings (e.g., rock–paper–scissors or two AIs battling), one agent’s win is another’s loss. Minimax is like preparing for the worst-case opponent: choose a strategy that maximizes your minimum guaranteed payoff; remarkably, the adversary’s best counter yields the same value when roles are reversed. When games unfold over time (extensive form), think of a conversation or chess: later moves depend on earlier ones; solving by backward reasoning (backward induction) gives subgame-perfect plans. These analogies map neatly onto ML: GANs pit a generator against a discriminator in a minimax duel, and multi-agent RL agents learn best responses against each other.
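To make backward induction concrete, here is a minimal sketch (an illustrative toy tree, not an example from the lesson): each internal node records which player moves, leaves carry payoff pairs, and solving bottom-up yields a subgame-perfect outcome. It assumes C++17 for the aggregate initialization.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Backward induction on a two-player, perfect-information game tree.
// At each internal node, the moving player picks the child whose solved
// payoff is best for them; leaves carry terminal payoffs (u1, u2).
struct Node {
    int player = -1;              // 0 or 1 at internal nodes, -1 at leaves
    array<double,2> payoff{0,0};  // used only at leaves
    vector<Node> children;
};

// Returns the payoff pair reached under subgame-perfect play.
array<double,2> backwardInduction(const Node& node) {
    if (node.children.empty()) return node.payoff;
    array<double,2> best{};
    bool first = true;
    for (const Node& child : node.children) {
        array<double,2> v = backwardInduction(child);
        if (first || v[node.player] > best[node.player]) { best = v; first = false; }
    }
    return best;
}

int main() {
    // Toy entry-deterrence style tree (payoffs are illustrative):
    // Player 0 chooses Out (leaf) or In; after In, Player 1 chooses
    // Fight or Accommodate.
    Node fight{-1, {-1, -1}, {}};
    Node accommodate{-1, {2, 1}, {}};
    Node in{1, {}, {fight, accommodate}};
    Node out{-1, {0, 3}, {}};
    Node root{0, {}, {out, in}};

    auto v = backwardInduction(root);
    cout << "Subgame-perfect payoffs: (" << v[0] << ", " << v[1] << ")\n";
    // Player 1 would accommodate after In (1 > -1), so Player 0 enters: (2, 1).
    return 0;
}
```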
03 Formal Definition
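In brief (standard definitions, matching the Key Formulas section below): a finite normal-form game is a tuple

$$G = \big(N,\ \{S_i\}_{i \in N},\ \{u_i\}_{i \in N}\big), \qquad u_i : S_1 \times \cdots \times S_{|N|} \to \mathbb{R},$$

where $N$ is the set of players, $S_i$ is player $i$'s finite strategy set, and $u_i$ is player $i$'s payoff function over action profiles. A mixed strategy $\sigma_i \in \Delta(S_i)$ is a probability distribution over $S_i$, and $\sigma_{-i}$ denotes the strategies of all other players. A profile $\sigma^*$ is a Nash equilibrium if and only if

$$u_i(\sigma_i^*, \sigma_{-i}^*) \ \ge\ u_i(s_i, \sigma_{-i}^*) \qquad \forall\, i \in N,\ \forall\, s_i \in S_i.$$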
04 When to Use
- Modeling simultaneous strategic choices: pricing wars, traffic routing, computational auctions—use normal-form games and search for Nash equilibria (pure or mixed).
- Adversarial training and robustness: in GANs, minimize the discriminator’s ability to distinguish real from generated samples via a minimax objective; solve or approximate zero-sum equilibria.
- Multi-agent reinforcement learning: agents learning best responses to each other; use regret minimization, fictitious play, or policy gradient dynamics to approach equilibrium behavior.
- Mechanism and market design: design auctions and incentives so equilibria implement desired outcomes (efficiency, truthfulness).
- Sequential decision problems with strategic interaction: model as extensive-form games; apply backward induction or subgame-perfect equilibrium concepts.
- Security and scheduling: defender–attacker models, patrolling, inspection games; often zero-sum or Stackelberg (leader–follower) variants.
- Simplification by dominance: when strategy sets are large, iteratively eliminate dominated strategies to shrink the game before computing equilibria (see the sketch after this list).
- Algorithmic analysis: when exact equilibria are hard, use approximation via no-regret learning (regret matching, Hedge) that converges to equilibrium (zero-sum) or correlated equilibria (general-sum).
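The dominance bullet above is easy to operationalize. Below is a minimal sketch (illustrative, not part of the lesson’s main examples) of iterated elimination of strictly dominated pure strategies in a bimatrix game; note it only checks dominance by pure strategies, while dominance by a mixed strategy can eliminate more.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Iterated elimination of strictly dominated pure strategies.
// Returns the surviving row and column indices.
pair<vector<int>, vector<int>> iteratedElimination(
        const vector<vector<double>>& A,   // Player 1 payoffs
        const vector<vector<double>>& B) { // Player 2 payoffs
    vector<int> rows(A.size()), cols(A[0].size());
    iota(rows.begin(), rows.end(), 0);
    iota(cols.begin(), cols.end(), 0);

    bool changed = true;
    while (changed) {
        changed = false;
        // Row a is eliminated if some row b is strictly better against every surviving column.
        for (size_t a = 0; a < rows.size() && !changed; ++a)
            for (size_t b = 0; b < rows.size() && !changed; ++b) {
                if (a == b) continue;
                bool dominated = true;
                for (int j : cols)
                    if (A[rows[b]][j] <= A[rows[a]][j]) { dominated = false; break; }
                if (dominated) { rows.erase(rows.begin() + a); changed = true; }
            }
        // Column a is eliminated if some column b is strictly better against every surviving row.
        for (size_t a = 0; a < cols.size() && !changed; ++a)
            for (size_t b = 0; b < cols.size() && !changed; ++b) {
                if (a == b) continue;
                bool dominated = true;
                for (int i : rows)
                    if (B[i][cols[b]] <= B[i][cols[a]]) { dominated = false; break; }
                if (dominated) { cols.erase(cols.begin() + a); changed = true; }
            }
    }
    return {rows, cols};
}

int main() {
    // Prisoner's Dilemma: Defect strictly dominates Cooperate for both players.
    vector<vector<double>> A = {{-1, -3}, {0, -2}};
    vector<vector<double>> B = {{-1, 0}, {-3, -2}};
    auto [rows, cols] = iteratedElimination(A, B);
    cout << "Surviving rows: "; for (int r : rows) cout << r << ' ';
    cout << "\nSurviving cols: "; for (int c : cols) cout << c << ' ';
    cout << "\n";  // expected: row 1 (Defect), col 1 (Defect)
    return 0;
}
```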
⚠️ Common Mistakes
- Confusing zero-sum with general-sum: minimax and value guarantees hold only for zero-sum/constant-sum games; in general-sum games, equilibria need not maximize joint payoffs.
- Ignoring mixed strategies: many finite games lack pure equilibria; failing to consider randomization misses existing Nash equilibria (e.g., matching pennies).
- Incorrect dominance elimination: eliminating weakly dominated strategies in the wrong order can remove equilibria; stick to strictly dominated eliminations or be careful about order dependence.
- Miscomputing best responses: forgetting to compute column-wise and row-wise maxima (or not handling ties) leads to missing pure equilibria.
- Numerical instability in 2x2 mixed solutions: denominators can be zero (no interior mixed equilibrium); check for boundary cases and validate probabilities in [0,1].
- Overfitting dynamics to transient behavior: in learning-in-games, early oscillations do not imply there is no equilibrium; average strategies often converge.
- Confusing minimax with max–max: robust optimization maximizes the minimum guaranteed payoff, not the payoff against one fixed opponent.
- Assuming Nash equilibria are unique or efficient: games can have multiple equilibria, and equilibria can be Pareto-inefficient (e.g., the Prisoner’s Dilemma).
Key Formulas
Normal-form game
$$G = \big(N,\ \{S_i\}_{i \in N},\ \{u_i\}_{i \in N}\big), \qquad u_i : S_1 \times \cdots \times S_{|N|} \to \mathbb{R}$$
Explanation: A game is defined by its players, each player’s set of strategies, and the payoff functions mapping action profiles to real numbers.
Expected utility under mixed strategies
$$u_i(\sigma) = \sum_{s \in S_1 \times \cdots \times S_{|N|}} \Big(\prod_{j \in N} \sigma_j(s_j)\Big)\, u_i(s)$$
Explanation: The expected payoff to player i is the average payoff over all pure profiles, weighted by the probability each profile occurs under the product of mixed strategies.
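In the two-player case this sum is just a bilinear form, $p^\top A q$. A minimal sketch (illustrative helper, not part of the lesson’s main examples):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Expected utility of mixed strategies in a two-player game: p^T A q,
// where p and q are the players' action distributions and A holds
// Player 1's payoffs.
double expectedUtility(const vector<vector<double>>& A,
                       const vector<double>& p, const vector<double>& q) {
    double v = 0.0;
    for (size_t i = 0; i < p.size(); ++i)
        for (size_t j = 0; j < q.size(); ++j)
            v += p[i] * A[i][j] * q[j];  // probability of profile (i,j) times payoff
    return v;
}

int main() {
    // Matching pennies: uniform mixing on both sides gives expected payoff 0.
    vector<vector<double>> A = {{1, -1}, {-1, 1}};
    vector<double> p = {0.5, 0.5}, q = {0.5, 0.5};
    cout << expectedUtility(A, p, q) << "\n";  // prints 0
    return 0;
}
```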
Best response
$$BR_i(\sigma_{-i}) = \arg\max_{s_i \in S_i}\, u_i(s_i, \sigma_{-i})$$
Explanation: A best response is any pure strategy that maximizes player i’s expected utility given the opponents’ strategies.
Nash equilibrium condition
$$u_i(\sigma_i^*, \sigma_{-i}^*) \ \ge\ u_i(s_i, \sigma_{-i}^*) \qquad \forall\, i \in N,\ \forall\, s_i \in S_i$$
Explanation: At equilibrium, no player can get a higher payoff by deviating to any other strategy, holding others fixed.
Minimax theorem (zero-sum)
$$\max_{p \in \Delta_m}\ \min_{q \in \Delta_n}\ p^\top A q \;=\; \min_{q \in \Delta_n}\ \max_{p \in \Delta_m}\ p^\top A q \;=\; v$$
Explanation: In finite zero-sum matrix games, the player maximizing their guaranteed payoff and the player minimizing the opponent’s payoff agree on the value; order of max and min can be swapped.
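For a concrete check (a worked example added for illustration), take matching pennies with Player 1 payoff matrix

$$A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad \max_i \min_j A_{ij} = -1 \;<\; 1 = \min_j \max_i A_{ij}.$$

Restricted to pure strategies the guarantees do not meet, but with mixing both sides achieve the value at the uniform strategies:

$$\max_{p}\ \min_{q}\ p^\top A q \;=\; \min_{q}\ \max_{p}\ p^\top A q \;=\; 0 \quad \text{at } p = q = \left(\tfrac{1}{2}, \tfrac{1}{2}\right).$$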
2x2 mixed-strategy equilibrium
$$p = \frac{B_{22} - B_{21}}{B_{11} - B_{12} - B_{21} + B_{22}}, \qquad q = \frac{A_{22} - A_{12}}{A_{11} - A_{12} - A_{21} + A_{22}}$$
Explanation: For a 2x2 bimatrix game, p is Player 1’s probability of playing the first row that makes Player 2 indifferent; q is Player 2’s probability of the first column that makes Player 1 indifferent. Valid only when denominators are nonzero and probabilities lie in [0,1].
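Worked check on matching pennies (an added arithmetic example): with $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ and $B = -A$,

$$p = \frac{B_{22}-B_{21}}{B_{11}-B_{12}-B_{21}+B_{22}} = \frac{-1-1}{-1-1-1+(-1)} = \frac{1}{2}, \qquad q = \frac{A_{22}-A_{12}}{A_{11}-A_{12}-A_{21}+A_{22}} = \frac{1-(-1)}{1-(-1)-(-1)+1} = \frac{1}{2}.$$

Both denominators are nonzero and both probabilities lie in [0,1], so the interior equilibrium exists, matching the second code example below.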
Primal LP for zero-sum value
$$\max_{p,\,v}\ v \quad \text{s.t.}\quad \sum_{i} p_i A_{ij} \ \ge\ v \ \ \forall j, \qquad \sum_i p_i = 1, \qquad p_i \ge 0$$
Explanation: Player 1 maximizes their guaranteed value v by choosing a mixed strategy p that makes every column’s expected payoff at least v. This is a linear program over the simplex.
Dual LP for zero-sum value
$$\min_{q,\,v}\ v \quad \text{s.t.}\quad \sum_{j} A_{ij} q_j \ \le\ v \ \ \forall i, \qquad \sum_j q_j = 1, \qquad q_j \ge 0$$
Explanation: Player 2 minimizes the game value by choosing q so that every row’s expected payoff is at most v; dual to the primal LP and yields the same value at optimum.
Average external regret
$$\bar{R}_T = \frac{1}{T}\,\max_{a \in S_i}\ \sum_{t=1}^{T} \Big( u_i(a, s_{-i}^t) - u_i(s_i^t, s_{-i}^t) \Big)$$
Explanation: Average regret measures how much more a fixed action would have earned compared to the actions actually played. No-regret algorithms drive this quantity toward zero as T grows.
No-regret convergence rate
$$\bar{R}_T = O\!\left(\frac{1}{\sqrt{T}}\right)$$
Explanation: For many algorithms (e.g., Hedge/EXP3 in full-information), average regret decreases on the order of 1/sqrt(T), implying convergence to equilibrium concepts in repeated play.
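To make the rate concrete, a standard bound for Hedge over $n$ actions with payoffs in $[0,1]$ and a tuned learning rate is (stated here as a known result, not derived in this section):

$$R_T \ \le\ \sqrt{\tfrac{T}{2}\,\ln n} \qquad\Longrightarrow\qquad \bar{R}_T = \frac{R_T}{T} \ \le\ \sqrt{\frac{\ln n}{2T}} = O\!\left(\frac{1}{\sqrt{T}}\right).$$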
Complexity Analysis
- Pure-equilibrium scan (first example): computing best responses column-wise and row-wise costs O(mn), and checking all cells for mutual best responses is another O(mn), so the total is O(mn) time and O(mn) space for an m x n game.
- 2x2 mixed solver (second example): constant time; it evaluates two closed-form ratios and checks at most four cells for pure equilibria.
- Regret matching (third example): each iteration performs two matrix–vector products, O(mn) per iteration, so T iterations cost O(Tmn); since average regret decays like O(1/sqrt(T)), reaching an epsilon-approximate equilibrium needs on the order of 1/epsilon^2 iterations.
- More broadly, zero-sum games are solvable in polynomial time via linear programming, while computing an exact Nash equilibrium in general-sum two-player games is PPAD-complete.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Find all pure-strategy Nash equilibria in a 2-player normal-form game (bimatrix).
// A: m x n payoff matrix for Player 1
// B: m x n payoff matrix for Player 2
// A cell (i,j) is a pure NE if:
//  - i is a best response to column j for Player 1 (max over A[*][j])
//  - j is a best response to row i for Player 2 (max over B[i][*])

struct Equilibrium {
    int row, col;   // 0-indexed
    double u1, u2;  // payoffs
};

vector<Equilibrium> pureNash(const vector<vector<double>>& A,
                             const vector<vector<double>>& B,
                             double eps = 1e-12) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    vector<vector<bool>> best1(m, vector<bool>(n, false));
    vector<vector<bool>> best2(m, vector<bool>(n, false));

    // Player 1: best responses per column
    for (int j = 0; j < n; ++j) {
        double mx = -numeric_limits<double>::infinity();
        for (int i = 0; i < m; ++i) mx = max(mx, A[i][j]);
        for (int i = 0; i < m; ++i)
            if (A[i][j] >= mx - eps) best1[i][j] = true;  // handle ties
    }

    // Player 2: best responses per row
    for (int i = 0; i < m; ++i) {
        double mx = -numeric_limits<double>::infinity();
        for (int j = 0; j < n; ++j) mx = max(mx, B[i][j]);
        for (int j = 0; j < n; ++j)
            if (B[i][j] >= mx - eps) best2[i][j] = true;  // handle ties
    }

    vector<Equilibrium> res;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            if (best1[i][j] && best2[i][j])
                res.push_back({i, j, A[i][j], B[i][j]});
    return res;
}

int main() {
    // Example: coordination game with two pure equilibria.
    // A and B identical: players prefer to match actions.
    vector<vector<double>> A = {{2, 0},
                                {0, 1}};
    vector<vector<double>> B = {{2, 0},
                                {0, 1}};

    auto eqs = pureNash(A, B);
    cout << "Pure Nash equilibria found: " << eqs.size() << "\n";
    for (auto& e : eqs) {
        cout << "At (row=" << e.row << ", col=" << e.col << ") with payoffs ("
             << e.u1 << ", " << e.u2 << ")\n";
    }
    return 0;
}
```
This program scans the payoff matrices to locate cells that are mutual best responses—these are exactly the pure-strategy Nash equilibria in a two-player normal-form game. We allow ties (within epsilon) to capture multiple best responses. The example coordination game has two equilibria: (0,0) and (1,1).
```cpp
#include <bits/stdc++.h>
using namespace std;

// Solve a 2x2 bimatrix game for a mixed-strategy Nash equilibrium (if an interior one exists).
// A = [[A11, A12], [A21, A22]], B = [[B11, B12], [B21, B22]]
// Player 1 rows: R1, R2; Player 2 cols: C1, C2.
// p = P(P1 plays R1), q = P(P2 plays C1).
// Indifference conditions yield:
//   p = (B22 - B21) / (B11 - B21 - B12 + B22)
//   q = (A22 - A12) / (A11 - A12 - A21 + A22)
// If a denominator is zero or p,q lie outside [0,1], there is no interior
// mixed equilibrium; check pure equilibria instead.

struct MixedEquilibrium2x2 {
    bool has_mixed;                  // interior mixed equilibrium exists
    double p, q;                     // probabilities if has_mixed
    vector<pair<int,int>> pure_eqs;  // list of pure equilibria (i,j)
};

vector<pair<int,int>> pureEquilibria2x2(const array<array<double,2>,2>& A,
                                        const array<array<double,2>,2>& B,
                                        double eps = 1e-12) {
    vector<pair<int,int>> res;
    for (int j = 0; j < 2; ++j) {
        double mx = max(A[0][j], A[1][j]);       // P1's best payoff in column j
        for (int i = 0; i < 2; ++i) {
            bool br1 = A[i][j] >= mx - eps;      // row i is a best response to column j
            double my = max(B[i][0], B[i][1]);   // P2's best payoff in row i
            bool br2 = B[i][j] >= my - eps;      // column j is a best response to row i
            if (br1 && br2) res.push_back({i, j});
        }
    }
    sort(res.begin(), res.end());
    res.erase(unique(res.begin(), res.end()), res.end());
    return res;
}

MixedEquilibrium2x2 solve2x2(const array<array<double,2>,2>& A,
                             const array<array<double,2>,2>& B,
                             double eps = 1e-12) {
    double A11 = A[0][0], A12 = A[0][1], A21 = A[1][0], A22 = A[1][1];
    double B11 = B[0][0], B12 = B[0][1], B21 = B[1][0], B22 = B[1][1];
    double denom_p = B11 - B21 - B12 + B22;
    double denom_q = A11 - A12 - A21 + A22;

    MixedEquilibrium2x2 ans;
    ans.has_mixed = false;
    ans.p = ans.q = numeric_limits<double>::quiet_NaN();

    if (fabs(denom_p) > eps && fabs(denom_q) > eps) {
        double p = (B22 - B21) / denom_p;
        double q = (A22 - A12) / denom_q;
        if (p >= -eps && p <= 1 + eps && q >= -eps && q <= 1 + eps) {
            ans.has_mixed = true;
            ans.p = min(1.0, max(0.0, p));  // clamp to [0,1]
            ans.q = min(1.0, max(0.0, q));
        }
    }

    // Always compute pure equilibria as a fallback or in addition
    ans.pure_eqs = pureEquilibria2x2(A, B, eps);
    return ans;
}

int main() {
    // Example 1: matching pennies (zero-sum) has the interior mixed equilibrium p = q = 0.5
    array<array<double,2>,2> A = {{{  1, -1 },
                                   { -1,  1 }}};  // P1's payoffs
    array<array<double,2>,2> B = {{{ -1,  1 },
                                   {  1, -1 }}};  // P2's payoffs

    auto ans = solve2x2(A, B);
    cout << fixed << setprecision(4);
    if (ans.has_mixed) {
        cout << "Mixed equilibrium: p(R1)=" << ans.p << ", q(C1)=" << ans.q << "\n";
    } else {
        cout << "No interior mixed equilibrium.\n";
    }
    cout << "Pure equilibria count: " << ans.pure_eqs.size() << "\n";
    for (auto [i, j] : ans.pure_eqs) cout << "Pure NE at (" << i << "," << j << ")\n";

    // Example 2: Prisoner's Dilemma (dominant strategies -> pure NE only)
    array<array<double,2>,2> A2 = {{{ -1, -3 },
                                    {  0, -2 }}};  // row player (C, D)
    array<array<double,2>,2> B2 = {{{ -1,  0 },
                                    { -3, -2 }}};  // column player (C, D)
    auto ans2 = solve2x2(A2, B2);
    if (ans2.has_mixed) cout << "(PD) Mixed equilibrium exists unexpectedly.\n";
    cout << "(PD) Pure equilibria count: " << ans2.pure_eqs.size() << "\n";
    for (auto [i, j] : ans2.pure_eqs) cout << "(PD) Pure NE at (" << i << "," << j << ")\n";

    return 0;
}
```
This program computes the interior mixed-strategy Nash equilibrium for a 2x2 bimatrix game by making the opponent indifferent between their two actions. It also finds pure equilibria as a fallback. Matching pennies produces p=q=0.5; Prisoner’s Dilemma yields only a pure equilibrium because each player has a dominant strategy.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Regret matching (full information) to approximate minimax strategies in a
// zero-sum matrix game. Player 1's payoff matrix is A (n x n); Player 2's
// payoff is -A. Both players update regrets against each other, and the
// *average* strategies converge toward a Nash equilibrium.

struct RMState {
    vector<double> regrets;    // cumulative regrets for each action
    vector<double> strategy;   // current mixed strategy
    vector<double> strat_sum;  // cumulative strategy (for averaging)
};

vector<double> normalizePositive(const vector<double>& x) {
    double sum_pos = 0.0;
    for (double v : x) if (v > 1e-15) sum_pos += v;
    int n = (int)x.size();
    vector<double> p(n, 0.0);
    if (sum_pos <= 1e-15) {
        for (int i = 0; i < n; ++i) p[i] = 1.0 / n;  // uniform if no positive regret
    } else {
        for (int i = 0; i < n; ++i) p[i] = (x[i] > 0 ? x[i] / sum_pos : 0.0);
    }
    return p;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: rock-paper-scissors payoff matrix for Player 1
    // Rows: R P S; Cols: R P S. Win: +1, Lose: -1, Tie: 0
    vector<vector<double>> A = {
        {  0, -1,  1 },
        {  1,  0, -1 },
        { -1,  1,  0 }
    };

    int n = (int)A.size();
    int T = 20000;  // iterations

    RMState p1{vector<double>(n, 0.0), vector<double>(n, 1.0 / n), vector<double>(n, 0.0)};
    RMState p2{vector<double>(n, 0.0), vector<double>(n, 1.0 / n), vector<double>(n, 0.0)};

    // v[i] = sum_j M[i][j] * q[j]
    auto matvec_row = [&](const vector<vector<double>>& M, const vector<double>& q) {
        int r = (int)M.size(), c = (int)M[0].size();
        vector<double> v(r, 0.0);
        for (int i = 0; i < r; ++i)
            for (int j = 0; j < c; ++j) v[i] += M[i][j] * q[j];
        return v;
    };

    // w[j] = sum_i M[i][j] * p[i]
    auto matvec_col = [&](const vector<vector<double>>& M, const vector<double>& p) {
        int r = (int)M.size(), c = (int)M[0].size();
        vector<double> w(c, 0.0);
        for (int j = 0; j < c; ++j)
            for (int i = 0; i < r; ++i) w[j] += M[i][j] * p[i];
        return w;
    };

    for (int t = 1; t <= T; ++t) {
        // Expected payoff of each action against the opponent's current mixed strategy
        vector<double> u1 = matvec_row(A, p2.strategy);  // u1[i] = E[A[i, J]] vs p2
        double u1_bar = 0.0;
        for (int i = 0; i < n; ++i) u1_bar += p1.strategy[i] * u1[i];

        // Player 2 wants to minimize A, so its utility is -A
        vector<double> u2_neg = matvec_col(A, p1.strategy);  // u2_neg[j] = E[A[I, j]]
        vector<double> u2(n, 0.0);
        for (int j = 0; j < n; ++j) u2[j] = -u2_neg[j];
        double u2_bar = 0.0;
        for (int j = 0; j < n; ++j) u2_bar += p2.strategy[j] * u2[j];

        // Update cumulative regrets
        for (int i = 0; i < n; ++i) p1.regrets[i] += u1[i] - u1_bar;
        for (int j = 0; j < n; ++j) p2.regrets[j] += u2[j] - u2_bar;

        // Update strategies proportional to positive regrets
        p1.strategy = normalizePositive(p1.regrets);
        p2.strategy = normalizePositive(p2.regrets);

        // Accumulate for the average strategy
        for (int i = 0; i < n; ++i) p1.strat_sum[i] += p1.strategy[i];
        for (int j = 0; j < n; ++j) p2.strat_sum[j] += p2.strategy[j];
    }

    // Compute average strategies
    vector<double> avg1(n), avg2(n);
    for (int i = 0; i < n; ++i) avg1[i] = p1.strat_sum[i] / T;
    for (int j = 0; j < n; ++j) avg2[j] = p2.strat_sum[j] / T;

    // Estimate the game value with average strategies: v ~ avg1^T A avg2
    double value = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) value += avg1[i] * A[i][j] * avg2[j];

    cout << fixed << setprecision(4);
    cout << "Approximate equilibrium strategies (averaged):\n";
    cout << "Player 1: "; for (double x : avg1) cout << x << ' '; cout << "\n";
    cout << "Player 2: "; for (double x : avg2) cout << x << ' '; cout << "\n";
    cout << "Estimated game value: " << value << " (should be ~0)\n";

    return 0;
}
```
This implementation of full-information regret matching updates each player’s strategy in proportion to positive cumulative regrets. In zero-sum games, the average strategies converge to a minimax equilibrium. Using Rock–Paper–Scissors, the algorithm approaches the uniform distribution and an approximate value of 0.
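One way to quantify how close the averaged strategies are to equilibrium is exploitability: the gap between each player’s best pure response against the other’s average strategy. This is a supplementary sketch, not part of the original program, and the helper name exploitability is ours; the gap is nonnegative and equals zero exactly at a minimax equilibrium.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Exploitability of a strategy profile (avg1, avg2) in a zero-sum matrix game A:
// best pure-response value for Player 1 minus best pure-response value for Player 2.
double exploitability(const vector<vector<double>>& A,
                      const vector<double>& avg1, const vector<double>& avg2) {
    int m = (int)A.size(), n = (int)A[0].size();
    // Player 1's best response value against avg2: max_i (A avg2)_i
    double br1 = -numeric_limits<double>::infinity();
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i][j] * avg2[j];
        br1 = max(br1, s);
    }
    // Player 2's best response value against avg1: min_j (avg1^T A)_j
    double br2 = numeric_limits<double>::infinity();
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < m; ++i) s += avg1[i] * A[i][j];
        br2 = min(br2, s);
    }
    return br1 - br2;  // >= 0; near 0 means near-equilibrium
}

int main() {
    // Rock-paper-scissors: the uniform profile has exploitability 0.
    vector<vector<double>> A = {{0, -1, 1}, {1, 0, -1}, {-1, 1, 0}};
    vector<double> u = {1.0/3, 1.0/3, 1.0/3};
    cout << exploitability(A, u, u) << "\n";  // prints 0
    return 0;
}
```

Feeding the avg1 and avg2 vectors from the regret-matching run into this check should yield a small positive number that shrinks as T grows.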