📚 Theory · Intermediate

Minimax Theorem

Key Points

  • The Minimax Theorem states that in zero-sum two-player games with suitable convexity and compactness, the best guaranteed payoff for the maximizer equals the worst-case loss for the minimizer.
  • In finite matrix games, this implies there always exists a mixed-strategy Nash equilibrium and a well-defined game value.
  • A saddle point (x*, y*) satisfies f(x, y*) ≤ f(x*, y*) ≤ f(x*, y) for all strategies (with x the maximizer and y the minimizer), meaning neither player can unilaterally improve.
  • Alpha–beta pruning is a practical algorithmic embodiment of minimax for game trees, reducing the number of nodes evaluated.
  • Mixed-strategy equilibria of matrix games can be approximated via multiplicative weights (Hedge), which converges with O(1/√T) error after T iterations.
  • GAN training uses a minimax objective, but because the game is typically non-convex–non-concave, naive gradient descent–ascent may cycle instead of converging.
  • Extra-gradient (optimistic) methods can stabilize saddle-point optimization compared to simultaneous gradient steps.
  • Minimax provides theoretical foundations for adversarial training and robust optimization, guiding algorithm design and analysis.

Prerequisites

  • →Basic probability and expectation — Mixed strategies are probability distributions; payoffs and GAN objectives use expectations.
  • →Linear algebra — Matrix games use bilinear forms x^T A y and vector operations.
  • →Convexity and concavity — Minimax equality requires convex–concave structure; algorithms rely on convex analysis.
  • →Calculus and gradients — Gradient descent–ascent and extra-gradient methods are defined via derivatives.
  • →Algorithms and complexity — Understanding alpha–beta pruning and runtime/space trade-offs is essential for practical search.
  • →Online learning / regret minimization (intro) — Multiplicative weights provides a simple path to approximate minimax equilibria.
  • →Optimization / linear programming (intro) — Matrix game equilibria can be formulated and solved via LP.

Detailed Explanation


01Overview

The Minimax Theorem is a cornerstone of game theory and optimization that equates two a priori different quantities: the maximum a player can secure assuming the opponent tries to minimize their gain, and the minimum the opponent can enforce assuming the player tries to maximize. This result, due to von Neumann, applies to zero-sum games, where one player’s gain is the other’s loss. In finite games, it ensures the existence of mixed-strategy equilibria: probability distributions over actions for each player such that neither can improve by deviating unilaterally. In continuous settings, under convexity, compactness, and continuity, the theorem generalizes via convex–concave saddle-point theory.

Practically, minimax underlies decision-making in adversarial environments: from classic board games (chess, checkers) to modern AI/ML objectives such as Generative Adversarial Networks (GANs) and robust optimization. Algorithmically, it motivates search methods (minimax with alpha–beta pruning for game trees), convex optimization (linear programming for matrix games), and online learning methods (multiplicative weights) that find or approximate equilibria. While the theorem guarantees the existence and equality of values, actually computing equilibria efficiently depends on the problem structure and can require tailored algorithms or approximations.

02Intuition & Analogies

Imagine negotiating the price of a used car. You (buyer) want to pay as little as possible; the seller wants as much as possible. If each of you thinks, “What’s the best I can guarantee regardless of what the other does?”, you’re both performing minimax reasoning. Your safe plan is the price that, even if the seller is as tough as possible, you won’t exceed; the seller’s safe plan is the price that, even if you negotiate hard, they won’t go below. The Minimax Theorem says that—under the right conditions—these two safety guarantees meet at the same point. That point is fair in the adversarial sense: neither of you can force a better outcome without the other allowing it.

In games like rock–paper–scissors, no single action is unbeatable. Instead, you randomize: if you play each option with equal probability, your opponent can’t exploit you. This mixed strategy is your shield; the opponent’s best response doesn’t help them. The equality of maximin and minimax means your shield is as strong as their spear, balancing at a saddle point.

In machine learning, think of GANs: a generator tries to fool a discriminator, while the discriminator tries to expose fakes. The generator asks, “What strategy minimizes my maximum detection?” and the discriminator asks, “What maximizes my minimum detection?” If the problem were perfectly convex–concave, they’d meet at a saddle point; but real neural networks break these assumptions, which is why simple training can oscillate. Still, minimax thinking guides better algorithms (e.g., extra-gradient) and theory (robust objectives) to stabilize this tug-of-war.
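The rock–paper–scissors claim is easy to verify numerically. The short program below (a minimal illustrative sketch, not one of the worked examples later on this page) plays the uniform mixture (1/3, 1/3, 1/3) against each pure response and shows that every response earns the opponent exactly 0, so no counter-strategy can exploit it.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Row player's payoff matrix for rock-paper-scissors (win = +1, tie = 0, lose = -1)
    vector<vector<double>> A = {
        { 0, -1,  1},   // Rock     vs (R, P, S)
        { 1,  0, -1},   // Paper
        {-1,  1,  0}    // Scissors
    };
    vector<double> p = {1.0/3, 1.0/3, 1.0/3};   // uniform mixed strategy for the row player

    // Expected payoff of the mixture against each pure column response:
    // all three numbers are 0, so no pure counter-strategy gains anything.
    for (int j = 0; j < 3; ++j) {
        double expPay = 0.0;
        for (int i = 0; i < 3; ++i) expPay += p[i] * A[i][j];
        cout << "Opponent plays column " << j << ": expected row payoff = " << expPay << "\n";
    }
    return 0;
}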

03Formal Definition

Consider a two-player zero-sum game with strategy sets X (maximizer) and Y (minimizer), and payoff function f: X × Y → ℝ where f(x, y) is the payoff to the maximizer (and −f(x, y) to the minimizer). Define the maximin (lower) value v⁻ = max_{x ∈ X} min_{y ∈ Y} f(x, y) and the minimax (upper) value v⁺ = min_{y ∈ Y} max_{x ∈ X} f(x, y). In general v⁻ ≤ v⁺ (weak duality). The Minimax Theorem (von Neumann; extended by Sion) asserts equality v⁻ = v⁺ under appropriate conditions (e.g., X and Y are nonempty, convex, and compact, and f is continuous, concave in x, and convex in y). The common value v* is the value of the game. A pair (x*, y*) is a saddle point if for all x ∈ X, y ∈ Y, f(x, y*) ≤ f(x*, y*) ≤ f(x*, y). At a saddle point, the maximin and minimax are attained: f(x*, y*) = v*. In finite games, allowing mixed strategies—probability distributions over pure actions—makes X and Y simplices, which are convex and compact; therefore, a mixed-strategy Nash equilibrium exists and realizes the saddle point in expected payoff. For matrix games with m row actions and n column actions, the payoff is f(x, y) = xᵀA y, where A ∈ ℝ^{m×n} and x, y are probability vectors. The equilibrium can be obtained by linear programming or by no-regret learning dynamics.
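As a quick numerical companion to these definitions (a small illustrative sketch; matching pennies is a standard example not taken from the listings below), the program evaluates the bilinear payoff f(x, y) = xᵀA y at the uniform strategies and checks the saddle-point inequalities f(x, y*) ≤ f(x*, y*) ≤ f(x*, y) against every pure deviation.

#include <bits/stdc++.h>
using namespace std;

// Bilinear payoff f(x, y) = x^T A y for mixed strategies x, y.
double payoff(const vector<vector<double>>& A,
              const vector<double>& x, const vector<double>& y) {
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i)
        for (size_t j = 0; j < y.size(); ++j)
            s += x[i] * A[i][j] * y[j];
    return s;
}

int main() {
    // Matching pennies: row player wins +1 on a match, loses 1 on a mismatch.
    vector<vector<double>> A = {{ 1, -1},
                                {-1,  1}};
    vector<double> xs = {0.5, 0.5}, ys = {0.5, 0.5};   // candidate saddle point
    cout << "f(x*, y*) = " << payoff(A, xs, ys) << "\n";   // value of the game (0)

    // Saddle check: f(x, y*) <= f(x*, y*) <= f(x*, y) for every pure deviation.
    for (int i = 0; i < 2; ++i) {
        vector<double> e = {0.0, 0.0}; e[i] = 1.0;
        cout << "row deviates to pure action " << i << ": f = " << payoff(A, e, ys) << "\n";
        cout << "col deviates to pure action " << i << ": f = " << payoff(A, xs, e) << "\n";
    }
    return 0;
}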

04When to Use

Use minimax reasoning when planning under worst-case adversaries. In AI game-playing, minimax (with alpha–beta pruning) evaluates game trees to choose moves that maximize your guaranteed outcome, assuming optimal opponent play. In operations research and security, robust optimization models uncertainty as an adversary, seeking decisions that minimize worst-case loss. In machine learning, adversarial training and GANs use minimax formulations to train models that perform well against malicious perturbations or discriminators. When payoffs are bilinear and strategy sets are probability simplices (finite zero-sum games), linear programming or multiplicative weights provides principled ways to compute (or approximate) equilibria. When the payoff is convex–concave and smooth, first-order saddle-point methods (mirror descent, extra-gradient) are appropriate. If the game is non-convex–non-concave (e.g., neural networks in GANs), pure minimax theory does not ensure convergence of naive algorithms; use stabilized methods (optimistic/extra-gradient, regularization, or proximal terms) and diagnostics (e.g., monitoring duality gaps). In large or unknown games, online learning with no-regret updates converges to equilibria in a time-average sense and scales well.

⚠️Common Mistakes

• Assuming minimax equality without checking conditions: For non-convex–non-concave or non-compact sets, max_x min_y f(x, y) may be strictly less than min_y max_x f(x, y). Always verify convexity/concavity, compactness, and continuity before invoking equality.
• Confusing pure with mixed strategies: Finite zero-sum games may lack pure equilibria (e.g., rock–paper–scissors). You must allow probability distributions over actions to guarantee equilibrium existence.
• Believing gradient descent–ascent always converges: In bilinear games like f(x, y) = x·y, simultaneous gradient steps orbit the saddle. Use extra-gradient, optimistic updates, or appropriate regularization/step sizes.
• Ignoring payoff scaling in multiplicative weights: Theoretical guarantees assume bounded losses in [0, 1]. Normalize payoff matrices to avoid numerical instability and to match regret bounds; then rescale the value back.
• Overestimating alpha–beta pruning gains without move ordering: Alpha–beta pruning saves work when good bounds are discovered early. Poor move ordering can leave you with near brute-force complexity.
• Forgetting dual viewpoints: Solving only one player’s LP or ignoring primal/dual feasibility can produce incorrect strategies. Check that both players’ strategies and the reported value satisfy complementary slackness or a small duality gap.

Key Formulas

Minimax Theorem

max_{x ∈ X} min_{y ∈ Y} f(x, y) = min_{y ∈ Y} max_{x ∈ X} f(x, y) = v*

Explanation: Under convexity, compactness, and continuity, the best guaranteed payoff for the maximizer equals the worst-case loss enforced by the minimizer. The common value v* is the value of the game.

Saddle Point Inequalities

f(x, y*) ≤ f(x*, y*) ≤ f(x*, y)   for all x ∈ X, y ∈ Y

Explanation: A saddle point (x*, y*) is stable against unilateral deviations: x* is a best response to y* and vice versa. It certifies optimality for both players in a zero-sum game.

Matrix Game Payoff

f(x, y) = xᵀA y,   A ∈ ℝ^{m×n}, x ∈ Δ_m, y ∈ Δ_n

Explanation: In finite zero-sum games, payoffs are bilinear in mixed strategies. The simplices Δ_m and Δ_n are the sets of probability distributions over the m row actions and the n column actions.

Row Player LP (informal)

maximize v   subject to   (xᵀA)_j ≥ v for all columns j,   x ∈ Δ_m

Explanation: The row player chooses a distribution x to maximize the guaranteed value v so that every column’s expected payoff is at least v. This is one side of the primal–dual pair.

Column Player LP (informal)

minimize u   subject to   (A y)_i ≤ u for all rows i,   y ∈ Δ_n

Explanation: The column player minimizes the upper bound u so that every row’s expected payoff is at most u. At equilibrium, u = v = v* when strong duality holds.
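To see the two LPs and strong duality on a concrete game (an illustrative sketch using matching pennies; a coarse grid search stands in for an actual LP solver), the snippet below computes the row player’s best guaranteed value v and the column player’s best upper bound u and confirms they coincide at the game value 0.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Matching pennies payoff for the row player (value of the game is 0).
    vector<vector<double>> A = {{ 1, -1},
                                {-1,  1}};

    // Row side: maximize over x = (p, 1-p) the worst column payoff min_j (x^T A)_j.
    double bestV = -1e18;
    for (int k = 0; k <= 100; ++k) {
        double p = k / 100.0;
        double worst = 1e18;
        for (int j = 0; j < 2; ++j)
            worst = min(worst, p * A[0][j] + (1.0 - p) * A[1][j]);
        bestV = max(bestV, worst);
    }

    // Column side: minimize over y = (q, 1-q) the best row payoff max_i (A y)_i.
    double bestU = 1e18;
    for (int k = 0; k <= 100; ++k) {
        double q = k / 100.0;
        double best = -1e18;
        for (int i = 0; i < 2; ++i)
            best = max(best, q * A[i][0] + (1.0 - q) * A[i][1]);
        bestU = min(bestU, best);
    }

    cout << "Row LP optimum  v = " << bestV << "\n";   // ~0
    cout << "Col LP optimum  u = " << bestU << "\n";   // ~0, so u = v = v* (strong duality)
    return 0;
}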

GAN Objective

min_G max_D  E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

Explanation: GANs set up a minimax game between a generator and discriminator. The generator tries to minimize the discriminator’s ability to tell real from fake, while the discriminator maximizes it.

Gradient Descent–Ascent (GDA)

x_{t+1} = x_t − η ∇_x f(x_t, y_t),   y_{t+1} = y_t + η ∇_y f(x_t, y_t)

Explanation: Simultaneous gradient steps for saddle-point problems: descent in the minimizing variable x and ascent in the maximizing variable y. For bilinear f(x, y) = x·y (or xᵀA y), these dynamics can rotate around the saddle and fail to converge.

Multiplicative Weights Regret

Regret_T ≤ O(√(T log N))   ⇒   Regret_T / T ≤ O(√(log N / T))

Explanation: For N actions with losses in [0, 1], the average regret of Hedge vanishes like 1/√T. When both players use no-regret algorithms in a zero-sum game, time-averaged strategies approach a minimax equilibrium.

Extra-Gradient Step

x_{t+1/2} = x_t − η ∇_x f(x_t, y_t),   y_{t+1/2} = y_t + η ∇_y f(x_t, y_t)
x_{t+1} = x_t − η ∇_x f(x_{t+1/2}, y_{t+1/2}),   y_{t+1} = y_t + η ∇_y f(x_{t+1/2}, y_{t+1/2})

Explanation: A lookahead gradient at a mid-point corrects rotational dynamics and stabilizes convergence in convex–concave saddle-point problems.

Strategy Simplex Constraints

Δ_m = { x ∈ ℝ^m : x_i ≥ 0 for all i, Σ_i x_i = 1 }   (and similarly Δ_n)

Explanation: Mixed strategies lie on simplices: nonnegative components summing to one. These constraints make the feasible sets convex and compact.

Complexity Analysis

There are two complementary views of complexity around minimax: (1) theorems of existence/equality, and (2) algorithms that compute or approximate equilibria. For finite game trees, a naive minimax search explores O(b^d) nodes, where b is the branching factor and d is the depth. Alpha–beta pruning reduces the effective branching when good bounds are found, achieving O(b^{d/2}) in the best case with perfect move ordering. Memory for depth-first alpha–beta is O(d) (call stack and a small set of variables), making it space-efficient despite potentially exponential time.

For matrix games with m row actions and n column actions, linear programming can solve equilibria in polynomial time using interior-point methods; however, implementing LP solvers from scratch is nontrivial. Online learning offers a simpler, scalable alternative: multiplicative weights (Hedge) updates all actions each round, requiring O(m n) time per iteration to compute expected payoffs and O(m + n) space for weights and averages. With an appropriate learning rate, after T iterations the average strategies achieve an O(√(log(mn)/T)) duality gap. To guarantee an ε-approximate equilibrium it suffices to take T = O(log(mn)/ε²), yielding total time O((m n log(mn))/ε²) and space O(m + n).

For smooth convex–concave saddle-point problems of dimension p and q, gradient descent–ascent and extra-gradient methods run in O(p + q) time per iteration (for dense gradients). Convergence rates depend on smoothness/strong convexity–concavity: extra-gradient attains O(1/T) rates for monotone operators under Lipschitz continuity, while plain GDA may diverge or cycle in bilinear cases. In large-scale ML (e.g., GANs), per-iteration cost is dominated by neural network forward/backward passes; stability enhancements (optimism, regularization) improve effective convergence without changing asymptotic per-step complexity.
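As a rough back-of-the-envelope companion to the iteration bound above (an illustrative sketch that takes the constant in T = O(log(mn)/ε²) to be 1), this snippet prints the implied number of Hedge iterations and total work for a few matrix sizes and accuracy targets.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Estimate T ~ log(m*n) / eps^2 iterations (constant factor assumed to be 1)
    // and total work ~ T * m * n, since each Hedge iteration costs O(m n).
    vector<pair<int,int>> sizes = {{3, 3}, {100, 100}, {1000, 500}};
    vector<double> epsilons = {0.1, 0.01};

    for (auto [m, n] : sizes) {
        for (double eps : epsilons) {
            double T = log((double)m * n) / (eps * eps);
            double work = T * m * n;
            cout << "m=" << m << " n=" << n << " eps=" << eps
                 << "  T ~ " << (long long)ceil(T)
                 << "  total ops ~ " << work << "\n";
        }
    }
    return 0;
}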

Code Examples

Alpha–Beta Pruning on a Game Tree (Minimax Search)
#include <bits/stdc++.h>
using namespace std;

struct Node {
    vector<int> children;   // indices of child nodes
    bool isLeaf = false;
    int value = 0;          // payoff at leaf (from maximizer's perspective)
};

// Returns pair: (bestValue, bestPath)
pair<int, vector<int>> alphabeta(const vector<Node>& tree, int u, bool maximizing,
                                 int alpha, int beta) {
    if (tree[u].isLeaf) {
        return {tree[u].value, {u}};
    }
    if (maximizing) {
        int bestVal = numeric_limits<int>::min();
        vector<int> bestPath;
        for (int v : tree[u].children) {
            auto [childVal, childPath] = alphabeta(tree, v, false, alpha, beta);
            if (childVal > bestVal) {
                bestVal = childVal;
                bestPath = {u};
                bestPath.insert(bestPath.end(), childPath.begin(), childPath.end());
            }
            alpha = max(alpha, bestVal);
            if (beta <= alpha) break;   // beta cut-off
        }
        return {bestVal, bestPath};
    } else {
        int bestVal = numeric_limits<int>::max();
        vector<int> bestPath;
        for (int v : tree[u].children) {
            auto [childVal, childPath] = alphabeta(tree, v, true, alpha, beta);
            if (childVal < bestVal) {
                bestVal = childVal;
                bestPath = {u};
                bestPath.insert(bestPath.end(), childPath.begin(), childPath.end());
            }
            beta = min(beta, bestVal);
            if (beta <= alpha) break;   // alpha cut-off
        }
        return {bestVal, bestPath};
    }
}

int main() {
    // Build a small example tree
    // Structure:
    //          (0)
    //        /     \
    //       1       2
    //     / | \    / \
    //    3  4  5  6   7
    // leaf values at nodes 3..7
    vector<Node> tree(8);
    tree[0].children = {1, 2};
    tree[1].children = {3, 4, 5};
    tree[2].children = {6, 7};
    // leaves
    for (int u : {3, 4, 5, 6, 7}) tree[u].isLeaf = true;
    tree[3].value = 3;   // example payoffs
    tree[4].value = 5;
    tree[5].value = 2;
    tree[6].value = 9;
    tree[7].value = 0;

    auto [bestVal, bestPath] = alphabeta(tree, 0, true,
                                         numeric_limits<int>::min(),
                                         numeric_limits<int>::max());

    cout << "Best minimax value: " << bestVal << "\n";
    cout << "Chosen path (node ids): ";
    for (int id : bestPath) cout << id << ' ';
    cout << "\n";

    return 0;
}

This program runs minimax with alpha–beta pruning on a small game tree. Max nodes choose the child with the highest value, and Min nodes choose the lowest. Alpha (best lower bound for Max) and Beta (best upper bound for Min) prune subtrees that cannot affect the final decision, reflecting minimax reasoning: maximize your guaranteed outcome while assuming the opponent minimizes it.

Time: O(b^d) in the worst case; O(b^{d/2}) with ideal move ordering. Space: O(d) recursion stack plus O(n) to store the tree.
Multiplicative Weights to Approximate a Minimax Equilibrium of a Matrix Game
#include <bits/stdc++.h>
using namespace std;

struct MWUResult {
    vector<double> p_avg;   // row player's mixed strategy
    vector<double> q_avg;   // column player's mixed strategy
    double value;           // estimated game value (original scale)
};

static vector<double> softNormalize(const vector<double>& w) {
    double s = accumulate(w.begin(), w.end(), 0.0);
    vector<double> p(w.size());
    for (size_t i = 0; i < w.size(); ++i) p[i] = w[i] / s;
    return p;
}

MWUResult solve_minimax_mwu(vector<vector<double>> A, int T = 5000, double eta = -1.0) {
    int m = (int)A.size();
    int n = (int)A[0].size();
    // Normalize payoffs to [0,1] for stability and theory
    double amin = A[0][0], amax = A[0][0];
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            amin = min(amin, A[i][j]);
            amax = max(amax, A[i][j]);
        }
    double scale = (amax > amin) ? (1.0 / (amax - amin)) : 1.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] = (A[i][j] - amin) * scale;   // now in [0,1]

    // Default learning rate if not provided: ~ sqrt((log(mn))/T)
    if (eta <= 0.0) {
        double L = log((double)(m * n) + 1.0);
        eta = sqrt(L / max(1, T));
        eta = min(0.5, eta);   // safety cap
    }

    vector<double> w_row(m, 1.0), w_col(n, 1.0);
    vector<double> p_sum(m, 0.0), q_sum(n, 0.0);

    auto matvec_colpay = [&](const vector<double>& p, int j) {
        double s = 0.0; for (int i = 0; i < m; ++i) s += p[i] * A[i][j]; return s; };
    auto matvec_rowpay = [&](int i, const vector<double>& q) {
        double s = 0.0; for (int j = 0; j < n; ++j) s += A[i][j] * q[j]; return s; };

    for (int t = 1; t <= T; ++t) {
        vector<double> p = softNormalize(w_row);
        vector<double> q = softNormalize(w_col);

        // Accumulate averages
        for (int i = 0; i < m; ++i) p_sum[i] += p[i];
        for (int j = 0; j < n; ++j) q_sum[j] += q[j];

        // Compute expected payoffs per action
        vector<double> rowPay(m), colPay(n);
        for (int i = 0; i < m; ++i) rowPay[i] = matvec_rowpay(i, q);   // E[A[i, j]] against q
        for (int j = 0; j < n; ++j) colPay[j] = matvec_colpay(p, j);   // E[A[i, j]] against p

        // Update weights: row maximizes, column minimizes
        for (int i = 0; i < m; ++i) w_row[i] *= exp(eta * rowPay[i]);
        for (int j = 0; j < n; ++j) w_col[j] *= exp(-eta * colPay[j]);

        // Prevent numerical blow-up via renormalization
        if (t % 200 == 0) {
            double sr = accumulate(w_row.begin(), w_row.end(), 0.0);
            double sc = accumulate(w_col.begin(), w_col.end(), 0.0);
            for (int i = 0; i < m; ++i) w_row[i] /= sr;
            for (int j = 0; j < n; ++j) w_col[j] /= sc;
        }
    }

    // Averages over time
    vector<double> p_avg(m), q_avg(n);
    for (int i = 0; i < m; ++i) p_avg[i] = p_sum[i] / T;
    for (int j = 0; j < n; ++j) q_avg[j] = q_sum[j] / T;

    // Estimated value on normalized scale
    double v_norm = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            v_norm += p_avg[i] * A[i][j] * q_avg[j];

    // Map back to original scale: A_orig = amin + (amax - amin) * A_norm
    double value = (amax > amin) ? (amin + (v_norm / scale)) : amin;

    return {p_avg, q_avg, value};
}

int main() {
    // Example: Rock–Paper–Scissors payoff for row player (win=+1, tie=0, lose=-1)
    vector<vector<double>> A = {
        { 0, -1,  1},   // Rock vs (R,P,S)
        { 1,  0, -1},   // Paper
        {-1,  1,  0}    // Scissors
    };

    MWUResult res = solve_minimax_mwu(A, /*T=*/5000);

    cout.setf(ios::fixed); cout << setprecision(4);
    cout << "Row mixed strategy (p): ";
    for (double x : res.p_avg) cout << x << ' ';
    cout << "\n";
    cout << "Col mixed strategy (q): ";
    for (double y : res.q_avg) cout << y << ' ';
    cout << "\n";
    cout << "Estimated game value: " << res.value << "\n";
    return 0;
}

This code approximates the mixed-strategy equilibrium of a zero-sum matrix game using multiplicative weights. Both players update weights exponentially toward actions with better expected performance given the opponent’s current mixture. We normalize the payoff matrix to [0, 1] to satisfy Hedge assumptions and then rescale the estimated value to the original units. Time-averaged strategies converge toward a minimax equilibrium with vanishing duality gap.

Time: O(T · m · n). Space: O(m + n)
Gradient Descent–Ascent vs Extra-Gradient on f(x, y) = x·y
#include <bits/stdc++.h>
using namespace std;

struct Trajectory { vector<pair<double,double>> pts; };

Trajectory gda_xy(double x0, double y0, double lr, int T) {
    double x = x0, y = y0; Trajectory tr;
    for (int t = 0; t < T; ++t) {
        tr.pts.push_back({x, y});
        double gx = y;             // df/dx = y
        double gy = x;             // df/dy = x
        double xn = x - lr * gx;   // descent in x
        double yn = y + lr * gy;   // ascent in y
        x = xn; y = yn;
    }
    tr.pts.push_back({x, y});
    return tr;
}

Trajectory extragradient_xy(double x0, double y0, double lr, int T) {
    double x = x0, y = y0; Trajectory tr;
    for (int t = 0; t < T; ++t) {
        tr.pts.push_back({x, y});
        // Midpoint (lookahead)
        double gx = y, gy = x;
        double x_mid = x - lr * gx;
        double y_mid = y + lr * gy;
        // Gradient at midpoint
        double gx_mid = y_mid;
        double gy_mid = x_mid;
        // Corrected update
        x = x - lr * gx_mid;
        y = y + lr * gy_mid;
    }
    tr.pts.push_back({x, y});
    return tr;
}

int main() {
    double x0 = 1.0, y0 = 1.0;   // start away from origin (saddle)
    double lr = 0.2;             // try 0.05..0.3 to see behaviors
    int T = 50;

    auto tr_gda = gda_xy(x0, y0, lr, T);
    auto tr_eg = extragradient_xy(x0, y0, lr, T);

    cout.setf(ios::fixed); cout << setprecision(4);
    cout << "GDA last point: (" << tr_gda.pts.back().first << ", " << tr_gda.pts.back().second << ")\n";
    cout << "Extra-Grad last point: (" << tr_eg.pts.back().first << ", " << tr_eg.pts.back().second << ")\n";

    // Print a few trajectory points
    cout << "First 10 steps (GDA vs Extra-Grad):\n";
    for (int i = 0; i < min(10, (int)tr_gda.pts.size()); ++i) {
        auto a = tr_gda.pts[i];
        auto b = tr_eg.pts[i];
        cout << i << ": GDA(" << a.first << "," << a.second << ") EG(" << b.first << "," << b.second << ")\n";
    }
    return 0;
}

For the bilinear saddle f(x, y) = x·y, simultaneous gradient descent–ascent (GDA) exhibits cyclic behavior (rotations) and may not converge. The extra-gradient method computes a lookahead gradient at a midpoint and corrects the update, damping rotation and moving toward the saddle at (0, 0). This simple demo mirrors stability issues in GAN training and why optimistic or extra-gradient variants can help.

Time: O(T). Space: O(T) to store the trajectory (O(1) if you stream results)