📚 Theory · Intermediate

Game Theory

Key Points

  • Game theory studies strategic decision-making among multiple players where each player’s payoff depends on everyone’s actions.
  • Normal-form games represent simultaneous choices with a payoff matrix; extensive-form games use trees for sequential moves.
  • A Nash equilibrium is a strategy profile where no player can gain by unilaterally deviating; it may require mixed (randomized) strategies.
  • In zero-sum games, one player’s gain is the other’s loss, and the minimax theorem guarantees value equality between max–min and min–max.
  • Dominated strategies can be removed via iterated elimination to simplify a game without affecting rational outcomes.
  • Mixed strategies assign probabilities to pure actions and often ensure equilibrium existence in finite games.
  • Minimax, regret matching, and linear programming are practical tools to compute or approximate equilibria in different settings.
  • Game theory underpins AI/ML areas like GAN training (minimax), multi-agent RL, mechanism design, and safety/alignment methods like RLHF.

Prerequisites

  • Probability distributions: Mixed strategies are probability distributions over actions; expected utility requires computing weighted averages.
  • Linear algebra (vectors and matrices): Payoff matrices, mixed strategies, and expected payoffs use matrix–vector operations like p^T A q.
  • Optimization and linear programming: Zero-sum equilibria can be solved via LP; understanding constraints and duality clarifies minimax.
  • Algorithms and complexity (Big-O): Analyzing the runtime of equilibrium-finding and learning dynamics guides practical implementations.
  • Basic C++ programming: Implementing payoff processing, loops, and numerical checks is necessary for the provided code.
  • Graph/tree basics: Extensive-form games are trees; backward induction and subgames rely on tree reasoning.
  • Convexity and fixed-point ideas (high level): Equilibrium existence and properties rely on convex sets and fixed-point theorems, even if not proven here.

Detailed Explanation

01 Overview

Game theory is the study of strategic interaction among rational decision-makers (players). Each player chooses an action (strategy) to maximize their own payoff, but their outcome depends on the actions of others. Games can be represented in normal form (as payoff matrices) when decisions are simultaneous, or in extensive form (as trees) when decisions are sequential or information is revealed over time. A central solution concept is Nash equilibrium: a strategy profile in which no player benefits from changing only their own strategy. While some games have pure-strategy equilibria (deterministic choices), many require mixed strategies—probability distributions over actions. In zero-sum games (one player’s gain equals the other’s loss), the minimax theorem ensures a well-defined game value and equality of the max–min and min–max outcomes. Tools like eliminating dominated strategies, computing best responses, and solving linear programs help analyze and solve games. In AI/ML, game theory models adversarial training (e.g., GANs), strategic multi-agent interactions, and incentive design. Understanding how strategies interact, converge, and respond to incentives allows us to design robust algorithms and systems that perform well in competitive or cooperative environments.
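
To ground the normal-form picture, the short sketch below (illustrative only; the expectedPayoff helper is an assumption of this article, not a library function) evaluates the expected payoff p^T A q of two mixed strategies against a payoff matrix:

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch: expected payoff p^T A q in a normal-form game.
// A[i][j] is Player 1's payoff when P1 plays row i and P2 plays column j;
// p and q are mixed strategies (probability vectors) over rows and columns.
double expectedPayoff(const vector<vector<double>>& A,
                      const vector<double>& p, const vector<double>& q) {
    double v = 0.0;
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < A[i].size(); ++j)
            v += p[i] * A[i][j] * q[j];
    return v;
}

int main() {
    // Matching pennies: uniform mixing gives expected payoff 0.
    vector<vector<double>> A = {{1, -1}, {-1, 1}};
    vector<double> p = {0.5, 0.5}, q = {0.5, 0.5};
    cout << "Expected payoff: " << expectedPayoff(A, p, q) << "\n"; // prints 0
    return 0;
}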

02 Intuition & Analogies

Imagine choosing whether to bring an umbrella without knowing if your friend brings one, but your comfort depends on both choices. Or picture two drivers approaching an intersection: each prefers to go, but if both go, they crash; if both yield, they wait—so each player’s best move depends on the other’s move. That interdependence is the heart of game theory. A normal-form game is like a grid of outcomes: rows are Player 1’s actions, columns are Player 2’s, and each cell shows both players’ payoffs. Think of it like a price-setting competition where two stores pick prices at the same time; profits depend on both prices. A Nash equilibrium is a stable stand-off: neither store wants to change price alone, given the other’s price. Sometimes, to avoid being predictable (and exploitable), players must randomize—like a soccer goalie deciding which way to dive or a penalty taker choosing a side. Mixed strategies keep opponents indifferent and unable to gain by guessing your move. In zero-sum settings (e.g., rock–paper–scissors or two AIs battling), one agent’s win is another’s loss. Minimax is like preparing for the worst-case opponent: choose a strategy that maximizes your minimum guaranteed payoff; amazingly, the adversary’s best counterbalance yields the same value when roles are reversed. When games unfold over time (extensive form), think of a conversation or chess: later moves depend on earlier ones; solving by backward reasoning (backward induction) gives subgame-perfect plans. These analogies map neatly onto ML: GANs pit a generator against a discriminator in a minimax duel, and multi-agent RL agents learn best responses against each other.
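
The backward-induction idea can be made concrete in code. The sketch below is a minimal illustration under assumed payoffs (the tree shape, the payoffs, and the Node/solve names are invented for this example): each decision node adopts the child outcome that maximizes the mover's own payoff, working from the leaves up.

#include <bits/stdc++.h>
using namespace std;

// Minimal backward-induction sketch for a tiny two-player game tree.
// A node is either a leaf (fixed payoffs) or a decision node where the
// player to move picks the child that maximizes their own payoff.
struct Node {
    int player = -1;           // -1 for leaf, else 0 or 1
    array<double, 2> payoff{}; // set at leaves; filled in by induction
    vector<Node> children;
};

array<double, 2> solve(Node& node) {
    if (node.children.empty()) return node.payoff; // leaf
    array<double, 2> best{-1e18, -1e18};
    for (Node& child : node.children) {
        array<double, 2> val = solve(child); // solve each subgame first
        if (val[node.player] > best[node.player]) best = val;
    }
    node.payoff = best;
    return best;
}

int main() {
    // Player 0 moves first (left or right); Player 1 responds in each subtree.
    Node left;  left.player = 1;
    left.children = { Node{-1, {3, 1}, {}}, Node{-1, {0, 0}, {}} };
    Node right; right.player = 1;
    right.children = { Node{-1, {1, 2}, {}}, Node{-1, {2, 4}, {}} };
    Node root; root.player = 0;
    root.children = { left, right };

    auto v = solve(root);
    cout << "Subgame-perfect payoffs: (" << v[0] << ", " << v[1] << ")\n"; // (3, 1)
    return 0;
}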

03 Formal Definition

A finite normal-form game is a tuple G = (N, (S_i)_{i∈N}, (u_i)_{i∈N}), where N is the set of players, S_i is the finite set of pure strategies for player i, and u_i : S_1 × ⋯ × S_n → ℝ is player i's payoff function. A mixed strategy for player i is a probability distribution σ_i ∈ Δ(S_i). A strategy profile σ = (σ_i)_{i∈N} induces expected utility u_i(σ) by averaging over the product distribution. A best response for player i to the opponents' profile σ_{-i} is any a_i ∈ argmax_{a ∈ S_i} u_i(a, σ_{-i}). A Nash equilibrium is a profile σ* such that for all i, u_i(σ*) ≥ u_i(σ_i, σ*_{-i}) for every σ_i ∈ Δ(S_i). In two-player zero-sum games with payoff matrix A for player 1 and -A for player 2, the value v satisfies v = max_p min_q p^T A q = min_q max_p p^T A q. Extensive-form games are trees with nodes (information sets), actions on edges, and payoffs at leaves; subgame-perfect equilibrium (SPE) refines Nash equilibrium by requiring equilibrium behavior in every subgame. Existence of Nash equilibria in finite games is guaranteed in mixed strategies by fixed-point theorems (e.g., Brouwer/Kakutani).
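
As a worked check of the Nash condition, consider matching pennies (the same payoffs used in the code examples below), where A = [[1, -1], [-1, 1]] for player 1 and player 2 receives -A. Against uniform mixing q* = (1/2, 1/2), both of player 1's pure strategies are indifferent:

u_1(H, q*) = (1/2)(+1) + (1/2)(-1) = 0,   u_1(T, q*) = (1/2)(-1) + (1/2)(+1) = 0,

so u_1(σ*) = 0 ≥ u_1(σ_1, q*) for every σ_1 ∈ Δ({H, T}); by symmetry the same holds for player 2. Hence σ* = ((1/2, 1/2), (1/2, 1/2)) is a mixed Nash equilibrium, even though no pure profile is an equilibrium (some player always gains by switching).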

04 When to Use

  • Modeling simultaneous strategic choices: pricing wars, traffic routing, computational auctions; use normal-form games and search for Nash equilibria (pure or mixed).
  • Adversarial training and robustness: in GANs, minimize the discriminator's ability to distinguish real from generated samples via a minimax objective; solve or approximate zero-sum equilibria.
  • Multi-agent reinforcement learning: agents learn best responses to each other; use regret minimization, fictitious play, or policy-gradient dynamics to approach equilibrium behavior.
  • Mechanism and market design: design auctions and incentives so that equilibria implement desired outcomes (efficiency, truthfulness).
  • Sequential decision problems with strategic interaction: model as extensive-form games; apply backward induction or subgame-perfect equilibrium concepts.
  • Security and scheduling: defender–attacker models, patrolling, inspection games; often zero-sum or Stackelberg (leader–follower) variants.
  • Simplification by dominance: when strategy sets are large, iteratively eliminate dominated strategies to shrink the game before computing equilibria (see the sketch after this list).
  • Algorithmic analysis: when exact equilibria are hard to compute, use approximation via no-regret learning (regret matching, Hedge), which converges to minimax equilibria (zero-sum) or correlated equilibria (general-sum).
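
As an illustration of dominance-based simplification, the sketch below implements iterated elimination of strictly dominated strategies. It is a simplified, assumed variant that eliminates only Player 1's rows; a full implementation would alternate between both players over the surviving strategies.

#include <bits/stdc++.h>
using namespace std;

// Sketch: iterated elimination of strictly dominated rows for Player 1.
// Row r is strictly dominated by row s if A[s][j] > A[r][j] for every column j.
// (A symmetric pass over columns with B would handle Player 2; omitted here.)
vector<int> survivingRows(const vector<vector<double>>& A) {
    int m = (int)A.size(), n = (int)A[0].size();
    vector<bool> alive(m, true);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int r = 0; r < m; ++r) {
            if (!alive[r]) continue;
            for (int s = 0; s < m; ++s) {
                if (s == r || !alive[s]) continue;
                bool dominates = true;
                for (int j = 0; j < n; ++j)
                    if (A[s][j] <= A[r][j]) { dominates = false; break; }
                if (dominates) { alive[r] = false; changed = true; break; }
            }
        }
    }
    vector<int> result;
    for (int r = 0; r < m; ++r) if (alive[r]) result.push_back(r);
    return result;
}

int main() {
    // Prisoner's Dilemma payoffs for the row player: Defect dominates Cooperate.
    vector<vector<double>> A = {{-1, -3},   // Cooperate
                                { 0, -2}};  // Defect
    for (int r : survivingRows(A)) cout << "Surviving row: " << r << "\n"; // row 1
    return 0;
}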

⚠️ Common Mistakes

  • Confusing zero-sum with general-sum: minimax and value guarantees hold only for zero-sum/constant-sum games; in general-sum games, equilibria need not maximize joint payoffs.
  • Ignoring mixed strategies: many finite games lack pure equilibria; failing to consider randomization misses existing Nash equilibria (e.g., matching pennies).
  • Incorrect dominance elimination: eliminating weakly dominated strategies in the wrong order can remove equilibria; stick to strictly dominated eliminations or be careful about order dependence.
  • Miscomputing best responses: forgetting to compute column-wise and row-wise maxima (or not handling ties) leads to missing pure equilibria.
  • Numerical instability in 2x2 mixed solutions: denominators can be zero (no interior mixed equilibrium); check for boundary cases and validate that probabilities lie in [0,1].
  • Overfitting dynamics to transient behavior: in learning-in-games, early oscillations do not imply the absence of equilibrium; average strategies often converge.
  • Confusing minimax with max–max: robust optimization maximizes the minimum guaranteed payoff, not the maximum against a fixed opponent.
  • Assuming Nash equilibria are unique or efficient: games can have multiple equilibria, and equilibria can be Pareto-inefficient (e.g., the Prisoner's Dilemma).

Key Formulas

Normal-form game

G = (N, (S_i)_{i∈N}, (u_i)_{i∈N}),  u_i : S_1 × ⋯ × S_n → ℝ

Explanation: A game is defined by its players, each player's set of strategies, and the payoff functions mapping action profiles to real numbers.

Expected utility under mixed strategies

u_i(σ) = Σ_{a∈S} ( Π_{j∈N} σ_j(a_j) ) · u_i(a)

Explanation: The expected payoff to player i is the average payoff over all pure profiles, weighted by the probability each profile occurs under the product of mixed strategies.

Best response

BR_i(σ_{-i}) = argmax_{a_i ∈ S_i} u_i(a_i, σ_{-i})

Explanation: A best response is any pure strategy that maximizes player i's expected utility given the opponents' strategies.

Nash equilibrium condition

u_i(σ*) ≥ u_i(σ_i, σ*_{-i})  for all i ∈ N and all σ_i ∈ Δ(S_i)

Explanation: At equilibrium, no player can get a higher payoff by deviating to any other strategy, holding others fixed.

Minimax theorem (zero-sum)

v = max_p min_q p^T A q = min_q max_p p^T A q

Explanation: In finite zero-sum matrix games, the player maximizing their guaranteed payoff and the player minimizing the opponent's payoff agree on the value; the order of max and min can be swapped.

2x2 mixed-strategy equilibrium

p = (B22 - B21) / (B11 - B12 - B21 + B22),  q = (A22 - A12) / (A11 - A12 - A21 + A22)

Explanation: For a 2x2 bimatrix game, p is Player 1's probability of playing the first row that makes Player 2 indifferent; q is Player 2's probability of the first column that makes Player 1 indifferent. Valid only when denominators are nonzero and probabilities lie in [0,1].

Primal LP for zero-sum value

max v  subject to  Σ_i p_i A_{ij} ≥ v for all j,  Σ_i p_i = 1,  p_i ≥ 0

Explanation: Player 1 maximizes their guaranteed value v by choosing a mixed strategy p that makes every column's expected payoff at least v. This is a linear program over the simplex.
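
A full LP solver is out of scope here, but the primal constraints are easy to check. The sketch below (the certifiesValue helper and its test values are illustrative assumptions) verifies that a candidate mixed strategy p guarantees at least value v against every column:

#include <bits/stdc++.h>
using namespace std;

// Check the primal LP constraints of a zero-sum game: does mixed strategy p
// guarantee Player 1 at least v against every pure column of A?
bool certifiesValue(const vector<vector<double>>& A,
                    const vector<double>& p, double v, double eps = 1e-9) {
    int m = (int)A.size(), n = (int)A[0].size();
    for (int j = 0; j < n; ++j) {
        double col = 0.0;
        for (int i = 0; i < m; ++i) col += p[i] * A[i][j];
        if (col < v - eps) return false; // constraint sum_i p_i A[i][j] >= v violated
    }
    return true;
}

int main() {
    // Rock-Paper-Scissors: the uniform strategy certifies the game value v = 0.
    vector<vector<double>> A = {{0, -1, 1}, {1, 0, -1}, {-1, 1, 0}};
    vector<double> p = {1.0/3, 1.0/3, 1.0/3};
    cout << boolalpha << certifiesValue(A, p, 0.0) << "\n"; // true
    return 0;
}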

Dual LP for zero-sum value

min v  subject to  Σ_j A_{ij} q_j ≤ v for all i,  Σ_j q_j = 1,  q_j ≥ 0

Explanation: Player 2 minimizes the game value by choosing q so that every row's expected payoff is at most v; this is dual to the primal LP and yields the same value at optimum.

Average external regret

R̄_T = max_{a ∈ S_i} (1/T) Σ_{t=1}^{T} [ u_i(a, a^t_{-i}) - u_i(a^t) ]

Explanation: Average regret measures how much more a fixed action would have earned compared to the actions actually played. No-regret algorithms drive this quantity toward zero as T grows.

No-regret convergence rate

R̄_T = O( √(log n / T) ) → 0 as T → ∞

Explanation: For many algorithms (e.g., Hedge with full information, EXP3 with bandit feedback), average regret decreases on the order of 1/√T, implying convergence to equilibrium concepts in repeated play.

Complexity Analysis

We present three core computational tasks.

1) Pure-strategy Nash in a bimatrix game (m rows, n columns): For Player 1, compute best responses per column by scanning m entries, and for Player 2, per row by scanning n entries. Marking mutual best responses costs O(mn) time and O(mn) space if we store boolean markers, or O(m + n) extra space if we stream maxima. This is optimal up to constants since we must view all payoffs.

2) Closed-form 2x2 mixed equilibrium: Computing the two probabilities uses constant-time arithmetic on eight payoff entries. Time is O(1), space is O(1). Care is needed for degenerate denominators and bounds checking. Optionally, if no interior mixed equilibrium exists, we can fall back to checking the four pure profiles in O(1).

3) Regret matching for zero-sum n×n matrix games over T iterations: Each iteration computes expected utilities for each action against the opponent's mixed strategy. A naive matrix–vector multiplication costs O(n^2) per player, thus O(n^2) overall per round. Updating regrets and normalizing strategies adds O(n). Over T rounds, total time is O(T n^2) and space is O(n) per player for regrets and cumulative strategies, plus O(n^2) to store the payoff matrix. If we exploit sparsity or structure (e.g., circulant matrices like Rock–Paper–Scissors), we can reduce the per-iteration cost. The regret-matching dynamics converge in average strategies (to a minimax equilibrium in zero-sum games) with sublinear regret; thus increasing T improves accuracy roughly as 1/√T while increasing runtime linearly in T. For very large games, consider sampling-based or online methods that approximate matrix–vector products to trade accuracy for speed.

Code Examples

Find all pure-strategy Nash equilibria in a bimatrix game
#include <bits/stdc++.h>
using namespace std;

// Find all pure-strategy Nash equilibria in a 2-player normal-form game (bimatrix)
// A: m x n payoff matrix for Player 1
// B: m x n payoff matrix for Player 2
// A cell (i,j) is a pure NE if:
// - i is a best response to column j for Player 1 (max A[*][j])
// - j is a best response to row i for Player 2 (max B[i][*])

struct Equilibrium {
    int row, col;  // 0-indexed
    double u1, u2; // payoffs
};

vector<Equilibrium> pureNash(const vector<vector<double>>& A,
                             const vector<vector<double>>& B,
                             double eps = 1e-12) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    vector<vector<bool>> best1(m, vector<bool>(n, false));
    vector<vector<bool>> best2(m, vector<bool>(n, false));

    // Player 1: best responses per column
    for (int j = 0; j < n; ++j) {
        double mx = -numeric_limits<double>::infinity();
        for (int i = 0; i < m; ++i) mx = max(mx, A[i][j]);
        for (int i = 0; i < m; ++i) if (A[i][j] >= mx - eps) best1[i][j] = true; // handle ties
    }

    // Player 2: best responses per row
    for (int i = 0; i < m; ++i) {
        double mx = -numeric_limits<double>::infinity();
        for (int j = 0; j < n; ++j) mx = max(mx, B[i][j]);
        for (int j = 0; j < n; ++j) if (B[i][j] >= mx - eps) best2[i][j] = true; // handle ties
    }

    vector<Equilibrium> res;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            if (best1[i][j] && best2[i][j]) {
                res.push_back({i, j, A[i][j], B[i][j]});
            }
        }
    }
    return res;
}

int main() {
    // Example: Coordination game with two equilibria
    // A and B identical: players prefer to match actions.
    vector<vector<double>> A = {{2, 0},
                                {0, 1}};
    vector<vector<double>> B = {{2, 0},
                                {0, 1}};

    auto eqs = pureNash(A, B);
    cout << "Pure Nash equilibria found: " << eqs.size() << "\n";
    for (auto &e : eqs) {
        cout << "At (row=" << e.row << ", col=" << e.col << ") with payoffs ("
             << e.u1 << ", " << e.u2 << ")\n";
    }

    return 0;
}

This program scans the payoff matrices to locate cells that are mutual best responses—these are exactly the pure-strategy Nash equilibria in a two-player normal-form game. We allow ties (within epsilon) to capture multiple best responses. The example coordination game has two equilibria: (0,0) and (1,1).

Time: O(mn) · Space: O(mn)
Closed-form mixed-strategy equilibrium for a 2x2 bimatrix game
#include <bits/stdc++.h>
using namespace std;

// Solve a 2x2 bimatrix game for a mixed-strategy Nash equilibrium (if an interior one exists).
// A = [[A11, A12], [A21, A22]], B = [[B11, B12], [B21, B22]]
// Player 1 rows: R1, R2; Player 2 cols: C1, C2.
// p = P(P1 plays R1), q = P(P2 plays C1).
// Indifference conditions yield:
//   p = (B22 - B21) / (B11 - B21 - B12 + B22)
//   q = (A22 - A12) / (A11 - A12 - A21 + A22)
// If a denominator is zero or p,q lie outside [0,1], there is no interior
// mixed equilibrium; check pure equilibria instead.

struct MixedEquilibrium2x2 {
    bool has_mixed;                 // interior mixed equilibrium exists
    double p, q;                    // probabilities if has_mixed
    vector<pair<int,int>> pure_eqs; // list of pure equilibria (i,j)
};

vector<pair<int,int>> pureEquilibria2x2(const array<array<double,2>,2>& A,
                                        const array<array<double,2>,2>& B,
                                        double eps = 1e-12) {
    vector<pair<int,int>> res;
    for (int j = 0; j < 2; ++j) {
        // Best responses for P1 per column
        double mx = max(A[0][j], A[1][j]);
        for (int i = 0; i < 2; ++i) {
            bool br1 = A[i][j] >= mx - eps;
            // Best responses for P2 per row
            double my = max(B[i][0], B[i][1]);
            bool br2 = B[i][j] >= my - eps;
            if (br1 && br2) res.push_back({i, j});
        }
    }
    // Remove duplicates
    sort(res.begin(), res.end());
    res.erase(unique(res.begin(), res.end()), res.end());
    return res;
}

MixedEquilibrium2x2 solve2x2(const array<array<double,2>,2>& A,
                             const array<array<double,2>,2>& B,
                             double eps = 1e-12) {
    double A11=A[0][0], A12=A[0][1], A21=A[1][0], A22=A[1][1];
    double B11=B[0][0], B12=B[0][1], B21=B[1][0], B22=B[1][1];
    double denom_p = (B11 - B21 - B12 + B22);
    double denom_q = (A11 - A12 - A21 + A22);

    MixedEquilibrium2x2 ans;
    ans.has_mixed = false;
    ans.p = ans.q = numeric_limits<double>::quiet_NaN();

    if (fabs(denom_p) > eps && fabs(denom_q) > eps) {
        double p = (B22 - B21) / denom_p;
        double q = (A22 - A12) / denom_q;
        if (p >= -eps && p <= 1 + eps && q >= -eps && q <= 1 + eps) {
            // Clamp to [0,1]
            ans.has_mixed = true;
            ans.p = min(1.0, max(0.0, p));
            ans.q = min(1.0, max(0.0, q));
        }
    }

    // Always compute pure equilibria as fallback or in addition
    ans.pure_eqs = pureEquilibria2x2(A, B, eps);
    return ans;
}

int main() {
    // Example 1: Matching pennies (zero-sum) has interior mixed equilibrium p=q=0.5
    array<array<double,2>,2> A = {{{  1, -1 },
                                   { -1,  1 }}}; // P1's payoffs
    array<array<double,2>,2> B = {{{ -1,  1 },
                                   {  1, -1 }}}; // P2's payoffs

    auto ans = solve2x2(A, B);
    cout << fixed << setprecision(4);
    if (ans.has_mixed) {
        cout << "Mixed equilibrium: p(R1)=" << ans.p << ", q(C1)=" << ans.q << "\n";
    } else {
        cout << "No interior mixed equilibrium.\n";
    }
    cout << "Pure equilibria count: " << ans.pure_eqs.size() << "\n";
    for (auto [i,j] : ans.pure_eqs) cout << "Pure NE at (" << i << "," << j << ")\n";

    // Example 2: Prisoner's Dilemma (dominant strategy -> pure NE only)
    array<array<double,2>,2> A2 = {{{ -1, -3 },
                                    {  0, -2 }}}; // Row player (C,D)
    array<array<double,2>,2> B2 = {{{ -1,  0 },
                                    { -3, -2 }}}; // Column player (C,D)
    auto ans2 = solve2x2(A2, B2);
    if (ans2.has_mixed) cout << "(PD) Mixed equilibrium exists unexpectedly.\n";
    cout << "(PD) Pure equilibria count: " << ans2.pure_eqs.size() << "\n";
    for (auto [i,j] : ans2.pure_eqs) cout << "(PD) Pure NE at (" << i << "," << j << ")\n";

    return 0;
}

This program computes the interior mixed-strategy Nash equilibrium for a 2x2 bimatrix game by making the opponent indifferent between their two actions. It also finds pure equilibria as a fallback. Matching pennies produces p=q=0.5; Prisoner’s Dilemma yields only a pure equilibrium because each player has a dominant strategy.

Time: O(1) · Space: O(1)
No-regret learning (regret matching) for zero-sum matrix games
#include <bits/stdc++.h>
using namespace std;

// Regret matching (full-information) to approximate minimax strategies in a zero-sum matrix game.
// Player 1's payoff matrix is A (n x n); Player 2's payoff is -A. Both update regrets against each other.
// Average strategies converge toward a Nash equilibrium in zero-sum games.

struct RMState {
    vector<double> regrets;   // cumulative regrets for each action
    vector<double> strategy;  // current mixed strategy
    vector<double> strat_sum; // cumulative strategy (for averaging)
};

vector<double> normalizePositive(const vector<double>& x) {
    double sum_pos = 0.0;
    for (double v : x) if (v > 1e-15) sum_pos += v;
    int n = (int)x.size();
    vector<double> p(n, 0.0);
    if (sum_pos <= 1e-15) {
        // uniform if no positive regret
        for (int i = 0; i < n; ++i) p[i] = 1.0 / n;
    } else {
        for (int i = 0; i < n; ++i) p[i] = (x[i] > 0 ? x[i] / sum_pos : 0.0);
    }
    return p;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: Rock-Paper-Scissors payoff matrix for Player 1
    // Rows: R P S; Cols: R P S
    // Win: +1, Lose: -1, Tie: 0
    vector<vector<double>> A = {
        {  0, -1,  1 },
        {  1,  0, -1 },
        { -1,  1,  0 }
    };

    int n = (int)A.size();
    int T = 20000; // iterations

    RMState p1{vector<double>(n, 0.0), vector<double>(n, 1.0/n), vector<double>(n, 0.0)};
    RMState p2{vector<double>(n, 0.0), vector<double>(n, 1.0/n), vector<double>(n, 0.0)};

    auto matvec_row = [&](const vector<vector<double>>& M, const vector<double>& q) {
        // returns vector v with v[i] = sum_j M[i][j] * q[j]
        int r = (int)M.size(); int c = (int)M[0].size();
        vector<double> v(r, 0.0);
        for (int i = 0; i < r; ++i) {
            double s = 0.0;
            for (int j = 0; j < c; ++j) s += M[i][j] * q[j];
            v[i] = s;
        }
        return v;
    };

    auto matvec_col = [&](const vector<vector<double>>& M, const vector<double>& p) {
        // returns vector w with w[j] = sum_i M[i][j] * p[i]
        int r = (int)M.size(); int c = (int)M[0].size();
        vector<double> w(c, 0.0);
        for (int j = 0; j < c; ++j) {
            double s = 0.0;
            for (int i = 0; i < r; ++i) s += M[i][j] * p[i];
            w[j] = s;
        }
        return w;
    };

    for (int t = 1; t <= T; ++t) {
        // Compute expected payoff of each action against opponent's current mixed strategy
        vector<double> u1 = matvec_row(A, p2.strategy); // u1[i] = E[A[i, J]] vs p2
        double u1_bar = 0.0;
        for (int i = 0; i < n; ++i) u1_bar += p1.strategy[i] * u1[i];

        // Player 2 wants to minimize A, so its utility is -A
        vector<double> u2_neg = matvec_col(A, p1.strategy); // u2_neg[j] = E[A[I, j]]
        vector<double> u2(n, 0.0);
        for (int j = 0; j < n; ++j) u2[j] = -u2_neg[j];
        double u2_bar = 0.0;
        for (int j = 0; j < n; ++j) u2_bar += p2.strategy[j] * u2[j];

        // Update cumulative regrets
        for (int i = 0; i < n; ++i) p1.regrets[i] += (u1[i] - u1_bar);
        for (int j = 0; j < n; ++j) p2.regrets[j] += (u2[j] - u2_bar);

        // Update strategies proportional to positive regrets
        p1.strategy = normalizePositive(p1.regrets);
        p2.strategy = normalizePositive(p2.regrets);

        // Accumulate for average strategy
        for (int i = 0; i < n; ++i) p1.strat_sum[i] += p1.strategy[i];
        for (int j = 0; j < n; ++j) p2.strat_sum[j] += p2.strategy[j];
    }

    // Compute average strategies
    vector<double> avg1(n), avg2(n);
    for (int i = 0; i < n; ++i) avg1[i] = p1.strat_sum[i] / T;
    for (int j = 0; j < n; ++j) avg2[j] = p2.strat_sum[j] / T;

    // Estimate game value with average strategies: v ≈ avg1^T A avg2
    double value = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) value += avg1[i] * A[i][j] * avg2[j];

    cout << fixed << setprecision(4);
    cout << "Approximate equilibrium strategies (averaged):\n";
    cout << "Player 1: "; for (double x : avg1) cout << x << ' '; cout << "\n";
    cout << "Player 2: "; for (double x : avg2) cout << x << ' '; cout << "\n";
    cout << "Estimated game value: " << value << " (should be ~0)\n";

    return 0;
}

This implementation of full-information regret matching updates each player’s strategy in proportion to positive cumulative regrets. In zero-sum games, the average strategies converge to a minimax equilibrium. Using Rock–Paper–Scissors, the algorithm approaches the uniform distribution and an approximate value of 0.

Time: O(T n^2) · Space: O(n^2) for the matrix, plus O(n) per player for regrets and strategy storage