📚 Theory · Intermediate

Game Theory

Key Points

  • Game theory studies strategic decision-making among multiple players where each player’s payoff depends on everyone’s actions.
  • Normal-form games represent simultaneous choices with a payoff matrix; extensive-form games use trees for sequential moves.
  • A Nash equilibrium is a strategy profile where no player can gain by unilaterally deviating; it may require mixed (randomized) strategies.
  • In zero-sum games, one player’s gain is the other’s loss, and the minimax theorem guarantees value equality between max–min and min–max.
  • Dominated strategies can be removed via iterated elimination to simplify a game without affecting rational outcomes.
  • Mixed strategies assign probabilities to pure actions and often ensure equilibrium existence in finite games.
  • Minimax, regret matching, and linear programming are practical tools to compute or approximate equilibria in different settings.
  • Game theory underpins AI/ML areas like GAN training (minimax), multi-agent RL, mechanism design, and safety/alignment methods like RLHF.

Prerequisites

  • Probability distributions: Mixed strategies are probability distributions over actions; expected utility requires computing weighted averages.
  • Linear algebra (vectors and matrices): Payoff matrices, mixed strategies, and expected payoffs use matrix–vector operations like p^T A q.
  • Optimization and linear programming: Zero-sum equilibria can be solved via LP; understanding constraints and duality clarifies minimax.
  • Algorithms and complexity (Big-O): Analyzing the runtime of equilibrium-finding and learning dynamics guides practical implementations.
  • Basic C++ programming: Implementing payoff processing, loops, and numerical checks is necessary for the provided code.
  • Graph/tree basics: Extensive-form games are trees; backward induction and subgames rely on tree reasoning.
  • Convexity and fixed-point ideas (high level): Equilibrium existence and properties rely on convex sets and fixed-point theorems, even if not proven here.

Detailed Explanation

01 Overview

Game theory is the study of strategic interaction among rational decision-makers (players). Each player chooses an action (strategy) to maximize their own payoff, but their outcome depends on the actions of others. Games can be represented in normal form (as payoff matrices) when decisions are simultaneous, or in extensive form (as trees) when decisions are sequential or information is revealed over time. A central solution concept is Nash equilibrium: a strategy profile in which no player benefits from changing only their own strategy. While some games have pure-strategy equilibria (deterministic choices), many require mixed strategies—probability distributions over actions. In zero-sum games (one player’s gain equals the other’s loss), the minimax theorem ensures a well-defined game value and equality of the max–min and min–max outcomes. Tools like eliminating dominated strategies, computing best responses, and solving linear programs help analyze and solve games. In AI/ML, game theory models adversarial training (e.g., GANs), strategic multi-agent interactions, and incentive design. Understanding how strategies interact, converge, and respond to incentives allows us to design robust algorithms and systems that perform well in competitive or cooperative environments.
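
To ground the normal-form picture, the short sketch below (illustrative only; the expectedPayoff helper is an assumption of this article, not a library function) evaluates the expected payoff p^T A q of two mixed strategies against a payoff matrix:

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch: expected payoff p^T A q in a normal-form game.
// A[i][j] is Player 1's payoff when P1 plays row i and P2 plays column j;
// p and q are mixed strategies (probability vectors) over rows and columns.
double expectedPayoff(const vector<vector<double>>& A,
                      const vector<double>& p, const vector<double>& q) {
    double v = 0.0;
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < A[i].size(); ++j)
            v += p[i] * A[i][j] * q[j];
    return v;
}

int main() {
    // Matching pennies: uniform mixing gives expected payoff 0.
    vector<vector<double>> A = {{1, -1}, {-1, 1}};
    vector<double> p = {0.5, 0.5}, q = {0.5, 0.5};
    cout << "Expected payoff: " << expectedPayoff(A, p, q) << "\n"; // prints 0
    return 0;
}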

02 Intuition & Analogies

Imagine choosing whether to bring an umbrella without knowing if your friend brings one, but your comfort depends on both choices. Or picture two drivers approaching an intersection: each prefers to go, but if both go, they crash; if both yield, they wait—so each player’s best move depends on the other’s move. That interdependence is the heart of game theory. A normal-form game is like a grid of outcomes: rows are Player 1’s actions, columns are Player 2’s, and each cell shows both players’ payoffs. Think of it like a price-setting competition where two stores pick prices at the same time; profits depend on both prices. A Nash equilibrium is a stable stand-off: neither store wants to change price alone, given the other’s price. Sometimes, to avoid being predictable (and exploitable), players must randomize—like a soccer goalie deciding which way to dive or a penalty taker choosing a side. Mixed strategies keep opponents indifferent and unable to gain by guessing your move. In zero-sum settings (e.g., rock–paper–scissors or two AIs battling), one agent’s win is another’s loss. Minimax is like preparing for the worst-case opponent: choose a strategy that maximizes your minimum guaranteed payoff; amazingly, the adversary’s best counterbalance yields the same value when roles are reversed. When games unfold over time (extensive form), think of a conversation or chess: later moves depend on earlier ones; solving by backward reasoning (backward induction) gives subgame-perfect plans. These analogies map neatly onto ML: GANs pit a generator against a discriminator in a minimax duel, and multi-agent RL agents learn best responses against each other.
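
The backward-induction idea can be made concrete in code. The sketch below is a minimal illustration under assumed payoffs (the tree shape, the payoffs, and the Node/solve names are invented for this example): each decision node adopts the child outcome that maximizes the mover's own payoff, working from the leaves up.

#include <bits/stdc++.h>
using namespace std;

// Minimal backward-induction sketch for a tiny two-player game tree.
// A node is either a leaf (fixed payoffs) or a decision node where the
// player to move picks the child that maximizes their own payoff.
struct Node {
    int player = -1;           // -1 for leaf, else 0 or 1
    array<double, 2> payoff{}; // set at leaves; filled in by induction
    vector<Node> children;
};

array<double, 2> solve(Node& node) {
    if (node.children.empty()) return node.payoff; // leaf
    array<double, 2> best{-1e18, -1e18};
    for (Node& child : node.children) {
        array<double, 2> val = solve(child); // solve each subgame first
        if (val[node.player] > best[node.player]) best = val;
    }
    node.payoff = best;
    return best;
}

int main() {
    // Player 0 moves first (left or right); Player 1 responds in each subtree.
    Node left;  left.player = 1;
    left.children = { Node{-1, {3, 1}, {}}, Node{-1, {0, 0}, {}} };
    Node right; right.player = 1;
    right.children = { Node{-1, {1, 2}, {}}, Node{-1, {2, 4}, {}} };
    Node root; root.player = 0;
    root.children = { left, right };

    auto v = solve(root);
    cout << "Subgame-perfect payoffs: (" << v[0] << ", " << v[1] << ")\n"; // (3, 1)
    return 0;
}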

03 Formal Definition

A finite normal-form game is a tuple G = (N, (S_i)_{i∈N}, (u_i)_{i∈N}), where N is the set of players, S_i is the finite set of pure strategies for player i, and u_i : S_1 × ⋯ × S_n → ℝ is player i's payoff function. A mixed strategy for player i is a probability distribution σ_i ∈ Δ(S_i). A strategy profile σ = (σ_i)_{i∈N} induces expected utility u_i(σ) by averaging over the product distribution. A best response for player i to the opponents' profile σ_{-i} is any a_i ∈ argmax_{a ∈ S_i} u_i(a, σ_{-i}). A Nash equilibrium is a profile σ* such that for all i, u_i(σ*) ≥ u_i(σ_i, σ*_{-i}) for every σ_i ∈ Δ(S_i). In two-player zero-sum games with payoff matrix A for player 1 and -A for player 2, the value v satisfies v = max_p min_q p^T A q = min_q max_p p^T A q. Extensive-form games are trees with nodes (information sets), actions on edges, and payoffs at leaves; subgame-perfect equilibrium (SPE) refines Nash equilibrium by requiring equilibrium behavior in every subgame. Existence of Nash equilibria in finite games is guaranteed in mixed strategies by fixed-point theorems (e.g., Brouwer/Kakutani).
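
As a worked check of the Nash condition, consider matching pennies (the same payoffs used in the code examples below), where A = [[1, -1], [-1, 1]] for player 1 and player 2 receives -A. Against uniform mixing q* = (1/2, 1/2), both of player 1's pure strategies are indifferent:

u_1(H, q*) = (1/2)(+1) + (1/2)(-1) = 0,   u_1(T, q*) = (1/2)(-1) + (1/2)(+1) = 0,

so u_1(σ*) = 0 ≥ u_1(σ_1, q*) for every σ_1 ∈ Δ({H, T}); by symmetry the same holds for player 2. Hence σ* = ((1/2, 1/2), (1/2, 1/2)) is a mixed Nash equilibrium, even though no pure profile is an equilibrium (some player always gains by switching).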

04 When to Use

  • Modeling simultaneous strategic choices: pricing wars, traffic routing, computational auctions; use normal-form games and search for Nash equilibria (pure or mixed).
  • Adversarial training and robustness: in GANs, minimize the discriminator's ability to distinguish real from generated samples via a minimax objective; solve or approximate zero-sum equilibria.
  • Multi-agent reinforcement learning: agents learn best responses to each other; use regret minimization, fictitious play, or policy-gradient dynamics to approach equilibrium behavior.
  • Mechanism and market design: design auctions and incentives so that equilibria implement desired outcomes (efficiency, truthfulness).
  • Sequential decision problems with strategic interaction: model as extensive-form games; apply backward induction or subgame-perfect equilibrium concepts.
  • Security and scheduling: defender–attacker models, patrolling, inspection games; often zero-sum or Stackelberg (leader–follower) variants.
  • Simplification by dominance: when strategy sets are large, iteratively eliminate dominated strategies to shrink the game before computing equilibria (see the sketch after this list).
  • Algorithmic analysis: when exact equilibria are hard to compute, use approximation via no-regret learning (regret matching, Hedge), which converges to minimax equilibria (zero-sum) or correlated equilibria (general-sum).
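
As an illustration of dominance-based simplification, the sketch below implements iterated elimination of strictly dominated strategies. It is a simplified, assumed variant that eliminates only Player 1's rows; a full implementation would alternate between both players over the surviving strategies.

#include <bits/stdc++.h>
using namespace std;

// Sketch: iterated elimination of strictly dominated rows for Player 1.
// Row r is strictly dominated by row s if A[s][j] > A[r][j] for every column j.
// (A symmetric pass over columns with B would handle Player 2; omitted here.)
vector<int> survivingRows(const vector<vector<double>>& A) {
    int m = (int)A.size(), n = (int)A[0].size();
    vector<bool> alive(m, true);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int r = 0; r < m; ++r) {
            if (!alive[r]) continue;
            for (int s = 0; s < m; ++s) {
                if (s == r || !alive[s]) continue;
                bool dominates = true;
                for (int j = 0; j < n; ++j)
                    if (A[s][j] <= A[r][j]) { dominates = false; break; }
                if (dominates) { alive[r] = false; changed = true; break; }
            }
        }
    }
    vector<int> result;
    for (int r = 0; r < m; ++r) if (alive[r]) result.push_back(r);
    return result;
}

int main() {
    // Prisoner's Dilemma payoffs for the row player: Defect dominates Cooperate.
    vector<vector<double>> A = {{-1, -3},   // Cooperate
                                { 0, -2}};  // Defect
    for (int r : survivingRows(A)) cout << "Surviving row: " << r << "\n"; // row 1
    return 0;
}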

⚠️ Common Mistakes

  • Confusing zero-sum with general-sum: minimax and value guarantees hold only for zero-sum/constant-sum games; in general-sum games, equilibria need not maximize joint payoffs.
  • Ignoring mixed strategies: many finite games lack pure equilibria; failing to consider randomization misses existing Nash equilibria (e.g., matching pennies).
  • Incorrect dominance elimination: eliminating weakly dominated strategies in the wrong order can remove equilibria; stick to strictly dominated eliminations or be careful about order dependence.
  • Miscomputing best responses: forgetting to compute column-wise and row-wise maxima (or not handling ties) leads to missing pure equilibria.
  • Numerical instability in 2x2 mixed solutions: denominators can be zero (no interior mixed equilibrium); check for boundary cases and validate that probabilities lie in [0,1].
  • Overfitting dynamics to transient behavior: in learning-in-games, early oscillations do not imply the absence of equilibrium; average strategies often converge.
  • Confusing minimax with max–max: robust optimization maximizes the minimum guaranteed payoff, not the maximum against a fixed opponent.
  • Assuming Nash equilibria are unique or efficient: games can have multiple equilibria, and equilibria can be Pareto-inefficient (e.g., the Prisoner's Dilemma).

Key Formulas

Normal-form game

G = (N, (S_i)_{i∈N}, (u_i)_{i∈N}),  u_i : S_1 × ⋯ × S_n → ℝ

Explanation: A game is defined by its players, each player's set of strategies, and the payoff functions mapping action profiles to real numbers.

Expected utility under mixed strategies

u_i(σ) = Σ_{a∈S} ( Π_{j∈N} σ_j(a_j) ) · u_i(a)

Explanation: The expected payoff to player i is the average payoff over all pure profiles, weighted by the probability each profile occurs under the product of mixed strategies.

Best response

BR_i(σ_{-i}) = argmax_{a_i ∈ S_i} u_i(a_i, σ_{-i})

Explanation: A best response is any pure strategy that maximizes player i's expected utility given the opponents' strategies.

Nash equilibrium condition

u_i(σ*) ≥ u_i(σ_i, σ*_{-i})  for all i ∈ N and all σ_i ∈ Δ(S_i)

Explanation: At equilibrium, no player can get a higher payoff by deviating to any other strategy, holding others fixed.

Minimax theorem (zero-sum)

v = max_p min_q p^T A q = min_q max_p p^T A q

Explanation: In finite zero-sum matrix games, the player maximizing their guaranteed payoff and the player minimizing the opponent's payoff agree on the value; the order of max and min can be swapped.

2x2 mixed-strategy equilibrium

p = (B22 - B21) / (B11 - B12 - B21 + B22),  q = (A22 - A12) / (A11 - A12 - A21 + A22)

Explanation: For a 2x2 bimatrix game, p is Player 1's probability of playing the first row that makes Player 2 indifferent; q is Player 2's probability of the first column that makes Player 1 indifferent. Valid only when denominators are nonzero and probabilities lie in [0,1].

Primal LP for zero-sum value

max v  subject to  Σ_i p_i A_{ij} ≥ v for all j,  Σ_i p_i = 1,  p_i ≥ 0

Explanation: Player 1 maximizes their guaranteed value v by choosing a mixed strategy p that makes every column's expected payoff at least v. This is a linear program over the simplex.
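
A full LP solver is out of scope here, but the primal constraints are easy to check. The sketch below (the certifiesValue helper and its test values are illustrative assumptions) verifies that a candidate mixed strategy p guarantees at least value v against every column:

#include <bits/stdc++.h>
using namespace std;

// Check the primal LP constraints of a zero-sum game: does mixed strategy p
// guarantee Player 1 at least v against every pure column of A?
bool certifiesValue(const vector<vector<double>>& A,
                    const vector<double>& p, double v, double eps = 1e-9) {
    int m = (int)A.size(), n = (int)A[0].size();
    for (int j = 0; j < n; ++j) {
        double col = 0.0;
        for (int i = 0; i < m; ++i) col += p[i] * A[i][j];
        if (col < v - eps) return false; // constraint sum_i p_i A[i][j] >= v violated
    }
    return true;
}

int main() {
    // Rock-Paper-Scissors: the uniform strategy certifies the game value v = 0.
    vector<vector<double>> A = {{0, -1, 1}, {1, 0, -1}, {-1, 1, 0}};
    vector<double> p = {1.0/3, 1.0/3, 1.0/3};
    cout << boolalpha << certifiesValue(A, p, 0.0) << "\n"; // true
    return 0;
}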

Dual LP for zero-sum value

min v  subject to  Σ_j A_{ij} q_j ≤ v for all i,  Σ_j q_j = 1,  q_j ≥ 0

Explanation: Player 2 minimizes the game value by choosing q so that every row's expected payoff is at most v; this is dual to the primal LP and yields the same value at optimum.

Average external regret

R̄_T = max_{a ∈ S_i} (1/T) Σ_{t=1}^{T} [ u_i(a, a^t_{-i}) - u_i(a^t) ]

Explanation: Average regret measures how much more a fixed action would have earned compared to the actions actually played. No-regret algorithms drive this quantity toward zero as T grows.

No-regret convergence rate

R̄_T = O( √(log n / T) ) → 0 as T → ∞

Explanation: For many algorithms (e.g., Hedge with full information, EXP3 with bandit feedback), average regret decreases on the order of 1/√T, implying convergence to equilibrium concepts in repeated play.

Complexity Analysis

We present three core computational tasks.

1) Pure-strategy Nash in a bimatrix game (m rows, n columns): For Player 1, compute best responses per column by scanning m entries, and for Player 2, per row by scanning n entries. Marking mutual best responses costs O(mn) time and O(mn) space if we store boolean markers, or O(m + n) extra space if we stream maxima. This is optimal up to constants since we must view all payoffs.

2) Closed-form 2x2 mixed equilibrium: Computing the two probabilities uses constant-time arithmetic on eight payoff entries. Time is O(1), space is O(1). Care is needed for degenerate denominators and bounds checking. Optionally, if no interior mixed equilibrium exists, we can fall back to checking the four pure profiles in O(1).

3) Regret matching for zero-sum n×n matrix games over T iterations: Each iteration computes expected utilities for each action against the opponent's mixed strategy. A naive matrix–vector multiplication costs O(n^2) per player, thus O(n^2) overall per round. Updating regrets and normalizing strategies adds O(n). Over T rounds, total time is O(T n^2) and space is O(n) per player for regrets and cumulative strategies, plus O(n^2) to store the payoff matrix. If we exploit sparsity or structure (e.g., circulant matrices like Rock–Paper–Scissors), we can reduce the per-iteration cost. The regret-matching dynamics converge in average strategies (to a minimax equilibrium in zero-sum games) with sublinear regret; thus increasing T improves accuracy roughly as 1/√T while increasing runtime linearly in T. For very large games, consider sampling-based or online methods that approximate matrix–vector products to trade accuracy for speed.

Code Examples

Find all pure-strategy Nash equilibria in a bimatrix game
#include <bits/stdc++.h>
using namespace std;

// Find all pure-strategy Nash equilibria in a 2-player normal-form game (bimatrix)
// A: m x n payoff matrix for Player 1
// B: m x n payoff matrix for Player 2
// A cell (i,j) is a pure NE if:
// - i is a best response to column j for Player 1 (max A[*][j])
// - j is a best response to row i for Player 2 (max B[i][*])

struct Equilibrium {
    int row, col;  // 0-indexed
    double u1, u2; // payoffs
};

vector<Equilibrium> pureNash(const vector<vector<double>>& A,
                             const vector<vector<double>>& B,
                             double eps = 1e-12) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    vector<vector<bool>> best1(m, vector<bool>(n, false));
    vector<vector<bool>> best2(m, vector<bool>(n, false));

    // Player 1: best responses per column
    for (int j = 0; j < n; ++j) {
        double mx = -numeric_limits<double>::infinity();
        for (int i = 0; i < m; ++i) mx = max(mx, A[i][j]);
        for (int i = 0; i < m; ++i) if (A[i][j] >= mx - eps) best1[i][j] = true; // handle ties
    }

    // Player 2: best responses per row
    for (int i = 0; i < m; ++i) {
        double mx = -numeric_limits<double>::infinity();
        for (int j = 0; j < n; ++j) mx = max(mx, B[i][j]);
        for (int j = 0; j < n; ++j) if (B[i][j] >= mx - eps) best2[i][j] = true; // handle ties
    }

    vector<Equilibrium> res;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            if (best1[i][j] && best2[i][j]) {
                res.push_back({i, j, A[i][j], B[i][j]});
            }
        }
    }
    return res;
}

int main() {
    // Example: Coordination game with two equilibria
    // A and B identical: players prefer to match actions.
    vector<vector<double>> A = {{2, 0},
                                {0, 1}};
    vector<vector<double>> B = {{2, 0},
                                {0, 1}};

    auto eqs = pureNash(A, B);
    cout << "Pure Nash equilibria found: " << eqs.size() << "\n";
    for (auto &e : eqs) {
        cout << "At (row=" << e.row << ", col=" << e.col << ") with payoffs ("
             << e.u1 << ", " << e.u2 << ")\n";
    }

    return 0;
}

This program scans the payoff matrices to locate cells that are mutual best responses—these are exactly the pure-strategy Nash equilibria in a two-player normal-form game. We allow ties (within epsilon) to capture multiple best responses. The example coordination game has two equilibria: (0,0) and (1,1).

Time: O(mn) · Space: O(mn)
Closed-form mixed-strategy equilibrium for a 2x2 bimatrix game
#include <bits/stdc++.h>
using namespace std;

// Solve a 2x2 bimatrix game for a mixed-strategy Nash equilibrium (if an interior one exists).
// A = [[A11, A12], [A21, A22]], B = [[B11, B12], [B21, B22]]
// Player 1 rows: R1, R2; Player 2 cols: C1, C2.
// p = P(P1 plays R1), q = P(P2 plays C1).
// Indifference conditions yield:
//   p = (B22 - B21) / (B11 - B21 - B12 + B22)
//   q = (A22 - A12) / (A11 - A12 - A21 + A22)
// If a denominator is zero or p,q lie outside [0,1], there is no interior
// mixed equilibrium; check pure equilibria instead.

struct MixedEquilibrium2x2 {
    bool has_mixed;                 // interior mixed equilibrium exists
    double p, q;                    // probabilities if has_mixed
    vector<pair<int,int>> pure_eqs; // list of pure equilibria (i,j)
};

vector<pair<int,int>> pureEquilibria2x2(const array<array<double,2>,2>& A,
                                        const array<array<double,2>,2>& B,
                                        double eps = 1e-12) {
    vector<pair<int,int>> res;
    for (int j = 0; j < 2; ++j) {
        // Best responses for P1 per column
        double mx = max(A[0][j], A[1][j]);
        for (int i = 0; i < 2; ++i) {
            bool br1 = A[i][j] >= mx - eps;
            // Best responses for P2 per row
            double my = max(B[i][0], B[i][1]);
            bool br2 = B[i][j] >= my - eps;
            if (br1 && br2) res.push_back({i, j});
        }
    }
    // Remove duplicates
    sort(res.begin(), res.end());
    res.erase(unique(res.begin(), res.end()), res.end());
    return res;
}

MixedEquilibrium2x2 solve2x2(const array<array<double,2>,2>& A,
                             const array<array<double,2>,2>& B,
                             double eps = 1e-12) {
    double A11=A[0][0], A12=A[0][1], A21=A[1][0], A22=A[1][1];
    double B11=B[0][0], B12=B[0][1], B21=B[1][0], B22=B[1][1];
    double denom_p = (B11 - B21 - B12 + B22);
    double denom_q = (A11 - A12 - A21 + A22);

    MixedEquilibrium2x2 ans;
    ans.has_mixed = false;
    ans.p = ans.q = numeric_limits<double>::quiet_NaN();

    if (fabs(denom_p) > eps && fabs(denom_q) > eps) {
        double p = (B22 - B21) / denom_p;
        double q = (A22 - A12) / denom_q;
        if (p >= -eps && p <= 1 + eps && q >= -eps && q <= 1 + eps) {
            // Clamp to [0,1]
            ans.has_mixed = true;
            ans.p = min(1.0, max(0.0, p));
            ans.q = min(1.0, max(0.0, q));
        }
    }

    // Always compute pure equilibria as fallback or in addition
    ans.pure_eqs = pureEquilibria2x2(A, B, eps);
    return ans;
}

int main() {
    // Example 1: Matching pennies (zero-sum) has interior mixed equilibrium p=q=0.5
    array<array<double,2>,2> A = {{{  1, -1 },
                                   { -1,  1 }}}; // P1's payoffs
    array<array<double,2>,2> B = {{{ -1,  1 },
                                   {  1, -1 }}}; // P2's payoffs

    auto ans = solve2x2(A, B);
    cout << fixed << setprecision(4);
    if (ans.has_mixed) {
        cout << "Mixed equilibrium: p(R1)=" << ans.p << ", q(C1)=" << ans.q << "\n";
    } else {
        cout << "No interior mixed equilibrium.\n";
    }
    cout << "Pure equilibria count: " << ans.pure_eqs.size() << "\n";
    for (auto [i,j] : ans.pure_eqs) cout << "Pure NE at (" << i << "," << j << ")\n";

    // Example 2: Prisoner's Dilemma (dominant strategy -> pure NE only)
    array<array<double,2>,2> A2 = {{{ -1, -3 },
                                    {  0, -2 }}}; // Row player (C,D)
    array<array<double,2>,2> B2 = {{{ -1,  0 },
                                    { -3, -2 }}}; // Column player (C,D)
    auto ans2 = solve2x2(A2, B2);
    if (ans2.has_mixed) cout << "(PD) Mixed equilibrium exists unexpectedly.\n";
    cout << "(PD) Pure equilibria count: " << ans2.pure_eqs.size() << "\n";
    for (auto [i,j] : ans2.pure_eqs) cout << "(PD) Pure NE at (" << i << "," << j << ")\n";

    return 0;
}

This program computes the interior mixed-strategy Nash equilibrium for a 2x2 bimatrix game by making the opponent indifferent between their two actions. It also finds pure equilibria as a fallback. Matching pennies produces p=q=0.5; Prisoner’s Dilemma yields only a pure equilibrium because each player has a dominant strategy.

Time: O(1) · Space: O(1)
No-regret learning (regret matching) for zero-sum matrix games
#include <bits/stdc++.h>
using namespace std;

// Regret matching (full-information) to approximate minimax strategies in a zero-sum matrix game.
// Player 1's payoff matrix is A (n x n); Player 2's payoff is -A. Both update regrets against each other.
// Average strategies converge toward a Nash equilibrium in zero-sum games.

struct RMState {
    vector<double> regrets;   // cumulative regrets for each action
    vector<double> strategy;  // current mixed strategy
    vector<double> strat_sum; // cumulative strategy (for averaging)
};

vector<double> normalizePositive(const vector<double>& x) {
    double sum_pos = 0.0;
    for (double v : x) if (v > 1e-15) sum_pos += v;
    int n = (int)x.size();
    vector<double> p(n, 0.0);
    if (sum_pos <= 1e-15) {
        // uniform if no positive regret
        for (int i = 0; i < n; ++i) p[i] = 1.0 / n;
    } else {
        for (int i = 0; i < n; ++i) p[i] = (x[i] > 0 ? x[i] / sum_pos : 0.0);
    }
    return p;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: Rock-Paper-Scissors payoff matrix for Player 1
    // Rows: R P S; Cols: R P S
    // Win: +1, Lose: -1, Tie: 0
    vector<vector<double>> A = {
        {  0, -1,  1 },
        {  1,  0, -1 },
        { -1,  1,  0 }
    };

    int n = (int)A.size();
    int T = 20000; // iterations

    RMState p1{vector<double>(n, 0.0), vector<double>(n, 1.0/n), vector<double>(n, 0.0)};
    RMState p2{vector<double>(n, 0.0), vector<double>(n, 1.0/n), vector<double>(n, 0.0)};

    auto matvec_row = [&](const vector<vector<double>>& M, const vector<double>& q) {
        // returns vector v with v[i] = sum_j M[i][j] * q[j]
        int r = (int)M.size(); int c = (int)M[0].size();
        vector<double> v(r, 0.0);
        for (int i = 0; i < r; ++i) {
            double s = 0.0;
            for (int j = 0; j < c; ++j) s += M[i][j] * q[j];
            v[i] = s;
        }
        return v;
    };

    auto matvec_col = [&](const vector<vector<double>>& M, const vector<double>& p) {
        // returns vector w with w[j] = sum_i M[i][j] * p[i]
        int r = (int)M.size(); int c = (int)M[0].size();
        vector<double> w(c, 0.0);
        for (int j = 0; j < c; ++j) {
            double s = 0.0;
            for (int i = 0; i < r; ++i) s += M[i][j] * p[i];
            w[j] = s;
        }
        return w;
    };

    for (int t = 1; t <= T; ++t) {
        // Compute expected payoff of each action against opponent's current mixed strategy
        vector<double> u1 = matvec_row(A, p2.strategy); // u1[i] = E[A[i, J]] vs p2
        double u1_bar = 0.0;
        for (int i = 0; i < n; ++i) u1_bar += p1.strategy[i] * u1[i];

        // Player 2 wants to minimize A, so its utility is -A
        vector<double> u2_neg = matvec_col(A, p1.strategy); // u2_neg[j] = E[A[I, j]]
        vector<double> u2(n, 0.0);
        for (int j = 0; j < n; ++j) u2[j] = -u2_neg[j];
        double u2_bar = 0.0;
        for (int j = 0; j < n; ++j) u2_bar += p2.strategy[j] * u2[j];

        // Update cumulative regrets
        for (int i = 0; i < n; ++i) p1.regrets[i] += (u1[i] - u1_bar);
        for (int j = 0; j < n; ++j) p2.regrets[j] += (u2[j] - u2_bar);

        // Update strategies proportional to positive regrets
        p1.strategy = normalizePositive(p1.regrets);
        p2.strategy = normalizePositive(p2.regrets);

        // Accumulate for average strategy
        for (int i = 0; i < n; ++i) p1.strat_sum[i] += p1.strategy[i];
        for (int j = 0; j < n; ++j) p2.strat_sum[j] += p2.strategy[j];
    }

    // Compute average strategies
    vector<double> avg1(n), avg2(n);
    for (int i = 0; i < n; ++i) avg1[i] = p1.strat_sum[i] / T;
    for (int j = 0; j < n; ++j) avg2[j] = p2.strat_sum[j] / T;

    // Estimate game value with average strategies: v ≈ avg1^T A avg2
    double value = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) value += avg1[i] * A[i][j] * avg2[j];

    cout << fixed << setprecision(4);
    cout << "Approximate equilibrium strategies (averaged):\n";
    cout << "Player 1: "; for (double x : avg1) cout << x << ' '; cout << "\n";
    cout << "Player 2: "; for (double x : avg2) cout << x << ' '; cout << "\n";
    cout << "Estimated game value: " << value << " (should be ~0)\n";

    return 0;
}

This implementation of full-information regret matching updates each player’s strategy in proportion to positive cumulative regrets. In zero-sum games, the average strategies converge to a minimax equilibrium. Using Rock–Paper–Scissors, the algorithm approaches the uniform distribution and an approximate value of 0.

Time: O(T n^2) · Space: O(n^2) for the matrix, plus O(n) per player for regrets and strategy storage