Lagrangian Duality
Key Points
- Lagrangian duality turns a constrained minimization problem into a related maximization problem that provides lower bounds on the original objective.
- The Lagrangian adds weighted penalties (Lagrange multipliers) for constraint violations to the original objective; minimizing it over the primal variables yields the dual function.
- The dual function is concave, and its maximization over nonnegative multipliers (and free multipliers for equalities) defines the dual problem.
- Weak duality always holds: the best dual value is never greater than the best primal value, so any dual value is a certified lower bound.
- Under conditions like Slater's condition for convex problems, strong duality holds and the optimal primal and dual values are equal.
- The Karush–Kuhn–Tucker (KKT) conditions characterize optimality; in convex problems they are necessary and sufficient.
- Complementary slackness links constraints and multipliers: each inequality is either tight or has zero multiplier at the optimum.
- Duality is fundamental in ML: the SVM training problem is often solved in dual form, enabling kernels and efficient algorithms.
Prerequisites
- Multivariable calculus — Gradients and stationary points are essential to form and interpret KKT stationarity.
- Linear algebra — Vectors, matrices, and solving linear systems are needed to derive and compute x*(lambda) and dual functions.
- Convex analysis — Convexity, Slater's condition, and properties of convex functions underlie strong duality and KKT sufficiency.
- Optimization basics — Understanding objectives/constraints, feasibility, and gradient-based methods helps in applying dual ascent.
- C++ programming — Implementing optimization routines, numerical linear algebra, and careful floating-point computations.
Detailed Explanation
01 Overview
Imagine trying to minimize a cost while obeying rules. If you break a rule, you pay a penalty. Lagrangian duality formalizes this idea: we attach a nonnegative penalty (Lagrange multiplier) to each inequality and a signed multiplier to each equality, add these penalties to the objective, and then ask, "What is the best cost we could get if we were allowed to adjust the decision and pay penalties?" The minimum over the decision variables yields the dual function, which is always a lower bound on the true best cost. Then we flip the perspective and choose penalties to make that bound as tight as possible—this is the dual problem. The power of duality is twofold: it provides certified lower bounds (useful to prove optimality or stop early) and it often simplifies or decomposes the original problem. In convex optimization, and under mild regularity conditions (like Slater’s condition), there is no gap between the best primal and dual values, so solving the dual recovers the primal solution. This perspective underpins many algorithms and ML models, notably the support vector machine (SVM), where the dual exposes the kernel trick and sparsity in support vectors.
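To make the lower-bound mechanism concrete, here is a one-variable example worked by hand (a minimal illustration, not taken from the code examples below):

```latex
\[
\min_x \; x^2 \quad \text{s.t.} \quad 1 - x \le 0.
\]
The Lagrangian is $L(x,\lambda) = x^2 + \lambda(1 - x)$ with $\lambda \ge 0$.
Minimizing over $x$ gives $x^*(\lambda) = \lambda/2$, so the dual function is
\[
g(\lambda) = \frac{\lambda^2}{4} + \lambda\Bigl(1 - \frac{\lambda}{2}\Bigr)
           = \lambda - \frac{\lambda^2}{4},
\]
a concave parabola. Every $\lambda \ge 0$ gives a lower bound on the primal
optimum $p^* = 1$ (for instance $g(1) = 3/4 \le 1$), and the maximizer
$\lambda^* = 2$ attains $d^* = g(2) = 1 = p^*$: strong duality holds, the
constraint is active at $x^* = 1$, and complementary slackness
$\lambda^*(1 - x^*) = 0$ is satisfied.
```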
02 Intuition & Analogies
Hook: Think of planning a road trip with a budget and time limits. You pick routes (decisions) to minimize fuel cost while obeying constraints like total time and toll limits. Now suppose a friend says, "If you exceed the time limit, I'll fine you $2 per extra minute." You now minimize fuel plus fines. If the fines are large, you will naturally obey the rules; if they're small, you might break them and pay.

Concept: Those fine rates are the Lagrange multipliers. The Lagrangian is the fuel cost plus the fines for each rule violation. For any fixed set of fine rates, you can pick a best route (minimize over routes). That minimum is a lower bound on the true best cost with the rules enforced, because when penalties are high enough, violating becomes too expensive.

Example: If the time fine is huge, any plan that violates the time limit will be worse than staying within it—so the best penalized plan equals the best feasible plan. Duality says, "Choose fine rates to make this bound as tight as possible." When chosen just right in convex problems (with some strictly feasible slack), the penalty scheme pins down the best route exactly—you recover the true constrained optimum. Complementary slackness matches intuition: a rule that is not tight has zero fine at the optimum (no reason to price it), and a tight rule has a positive fine (its price reflects its scarcity).
03 Formal Definition
Consider the primal problem: minimize f(x) subject to g_i(x) \le 0 for i = 1, ..., m and h_j(x) = 0 for j = 1, ..., p, with optimal value p^*. The Lagrangian attaches a multiplier \lambda_i \ge 0 to each inequality and a free multiplier \nu_j to each equality: L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x). The dual function g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) is concave regardless of the convexity of the primal, and satisfies g(\lambda, \nu) \le p^* whenever \lambda \ge 0 (weak duality). The dual problem maximizes g(\lambda, \nu) over \lambda \ge 0 and free \nu, with optimal value d^* \le p^*; strong duality (d^* = p^*) holds for convex problems satisfying a constraint qualification such as Slater's condition.
04 When to Use
Use Lagrangian duality when you need certified lower bounds, decomposition, or a simpler equivalent problem. Typical contexts include:
- Convex optimization with inequality/equality constraints, where strong duality often holds and KKT conditions enable efficient algorithms.
- Quadratic programs (QPs), where the dual may be lower dimensional or more structured (e.g., box constraints), enabling faster solvers.
- Machine learning models like SVMs: the dual transforms the constraints into a kernelized quadratic form with simple box constraints and one equality, making kernels and sparsity natural.
- Resource allocation and networks: dual variables act as prices coordinating decentralized decisions (e.g., in networking or distributed control).
- Algorithm design: dual ascent, projected gradient on the dual, and ADMM use dual variables to enforce constraints iteratively.
- Certification: if you have a candidate feasible primal solution x, any dual-feasible (\lambda, \nu) provides a gap certificate f(x) - g(\lambda, \nu) to measure suboptimality and stop early.
Especially apply it when Slater's condition is easy to verify, the dual is easier to optimize, or decomposition across constraints/blocks yields parallelism.
⚠️Common Mistakes
- Assuming strong duality without checking constraint qualifications: for nonconvex problems, or convex problems without Slater's condition, there may be a duality gap d^{*} < p^{*}. Always verify convexity and feasibility conditions.
- Sign and direction errors: inequalities must be of the form g_{i}(x) \le 0 with \lambda_{i} \ge 0. If a constraint reads a^{T}x \ge b, rewrite it as b - a^{T}x \le 0.
- Ignoring dual feasibility: optimizing g(\lambda, \nu) with negative \lambda can yield invalid lower bounds. Keep \lambda \ge 0 via projection.
- Misusing KKT: in nonconvex problems, KKT conditions are necessary (under constraint qualifications) but not sufficient; they may identify saddle points. In convex problems, they are necessary and sufficient.
- Forgetting complementary slackness: at the optimum, \lambda_{i}^{*} g_{i}(x^{*}) = 0 for every inequality—an inactive constraint (g_{i}(x^{*}) < 0) must have \lambda_{i}^{*} = 0, and a positive multiplier forces the constraint to bind (g_{i}(x^{*}) = 0). Violations indicate numerical issues or non-optimality.
- Numerical instability: computing x^{*}(\lambda) may require solving linear systems; poor conditioning can stall dual ascent. Use robust solvers (e.g., Cholesky for SPD Q) and careful step-size schedules.
- Confusing primal and dual variables: dual gradients are constraint residuals at x^{*}(\lambda); mixing them with gradients of f leads to incorrect updates. For inequality constraints, use \nabla_{\lambda_{i}} g(\lambda) = g_{i}(x^{*}(\lambda)).
Key Formulas
Lagrangian
L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_{i} g_{i}(x) + \sum_{j=1}^{p} \nu_{j} h_{j}(x)
Explanation: The Lagrangian augments the objective with penalties for constraint violations. Inequality multipliers must be nonnegative; equality multipliers are free.
Dual Function
g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu)
Explanation: For fixed multipliers, the best (lowest) value of the Lagrangian over x defines a concave function. It is a lower bound on the optimal primal value.
Dual Problem
\max_{\lambda \ge 0, \; \nu} \; g(\lambda, \nu)
Explanation: The dual problem chooses multipliers to maximize the lower bound provided by the dual function, subject to dual feasibility.
Weak Duality
d^{*} \le p^{*}
Explanation: The optimal dual value never exceeds the optimal primal value. This guarantees any dual solution provides a certified lower bound.
Strong Duality
d^{*} = p^{*}
Explanation: Under conditions such as convexity and Slater's condition, the duality gap closes and the optimal values match, allowing recovery of primal solutions from dual ones.
KKT Stationarity
\nabla f(x^{*}) + \sum_{i} \lambda_{i}^{*} \nabla g_{i}(x^{*}) + \sum_{j} \nu_{j}^{*} \nabla h_{j}(x^{*}) = 0
Explanation: At the optimum, the gradient of the Lagrangian with respect to the primal variables must vanish, balancing objective and constraint gradients.
KKT Feasibility
g_{i}(x^{*}) \le 0, \quad h_{j}(x^{*}) = 0, \quad \lambda_{i}^{*} \ge 0
Explanation: Primal constraints are satisfied and inequality multipliers are nonnegative at the optimal solution.
Complementary Slackness
\lambda_{i}^{*} g_{i}(x^{*}) = 0 \quad \text{for all } i
Explanation: Each inequality is either active (equals zero, possibly with positive multiplier) or inactive (strictly negative with zero multiplier) at the optimum.
Saddle-Point Condition
L(x^{*}, \lambda, \nu) \le L(x^{*}, \lambda^{*}, \nu^{*}) \le L(x, \lambda^{*}, \nu^{*}) \quad \text{for all } x, \; \lambda \ge 0, \; \nu
Explanation: A saddle point of the Lagrangian certifies optimality when strong duality holds, simultaneously minimizing over x and maximizing over the multipliers.
QP Dual Function
g(\lambda) = -\tfrac{1}{2} (c + A^{T}\lambda)^{T} Q^{-1} (c + A^{T}\lambda) - b^{T}\lambda
Explanation: For a convex QP minimizing \tfrac{1}{2} x^{T} Q x + c^{T} x with inequalities Ax \le b and positive definite Q, the dual function has this closed form. It is concave in \lambda.
Dual Gradient (Inequalities)
\nabla_{\lambda_{i}} g(\lambda) = g_{i}(x^{*}(\lambda))
Explanation: The gradient of the dual with respect to an inequality multiplier equals the corresponding constraint residual at the minimizer x^{*}(\lambda).
SVM Dual
\max_{\alpha} \; \sum_{i} \alpha_{i} - \tfrac{1}{2} \sum_{i,j} \alpha_{i} \alpha_{j} y_{i} y_{j} K(x_{i}, x_{j}) \quad \text{s.t. } 0 \le \alpha_{i} \le C, \; \sum_{i} \alpha_{i} y_{i} = 0
Explanation: The dual of the soft-margin SVM is a QP with simple box constraints and one equality, enabling kernel methods and sparse solutions.
Duality Gap
f(x) - g(\lambda, \nu) \ge 0 \quad \text{for primal-feasible } x \text{ and dual-feasible } (\lambda, \nu)
Explanation: The difference between a feasible primal value and a dual value is always nonnegative. A zero gap indicates optimality under strong duality.
Slater's Condition
\exists \, \tilde{x} : \; g_{i}(\tilde{x}) < 0 \; \text{for all } i, \quad A\tilde{x} = b
Explanation: If a strictly feasible point exists for a convex problem with affine equalities, then strong duality holds and the KKT conditions are sufficient.
Complexity Analysis
In the examples below, each dual ascent iteration solves an n x n linear system for x*(lambda)—O(n^3) with dense Gaussian elimination, or O(n^2) per solve if a factorization of Q is cached—plus O(mn) work to form the dual gradient A x - b and O(m) for the projection onto lambda >= 0. With a diminishing step size, projected gradient ascent on the concave dual converges at the sublinear rates typical of first-order methods, so the per-iteration linear solve usually dominates for small, dense problems.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Primal: minimize f(x) = (x - 3)^2 subject to x >= 5.
// Inequality in standard form: g(x) = 5 - x <= 0.
// Lagrangian: L(x, lambda) = (x-3)^2 + lambda * (5 - x), lambda >= 0.
// For fixed lambda, the minimizer over x satisfies
//   d/dx L = 2(x-3) - lambda = 0  =>  x*(lambda) = 3 + lambda/2.
// Dual function: g(lambda) = L(x*(lambda), lambda) = -lambda^2/4 + 2*lambda (concave parabola).
// Optimal dual: lambda* = 4, x* = 5, p* = d* = 4.

struct Result {
    double lambda_opt;
    double x_opt;
    double primal_value;
    double dual_value;
};

// Compute x*(lambda) and dual quantities
static inline double x_star(double lambda) {
    return 3.0 + 0.5 * lambda;  // x*(lambda) = 3 + lambda/2
}

static inline double Lagrangian(double x, double lambda) {
    double f = (x - 3.0) * (x - 3.0);
    double g = 5.0 - x;  // inequality value
    return f + lambda * g;
}

static inline double dual_value(double lambda) {
    double x = x_star(lambda);
    return Lagrangian(x, lambda);
}

static inline double dual_gradient(double lambda) {
    // For inequality constraints, dg(lambda)/dlambda = g(x*(lambda))
    double x = x_star(lambda);
    return 5.0 - x;  // = 5 - (3 + lambda/2) = 2 - lambda/2
}

Result solve_dual_projected_gradient(int iters = 200, double alpha0 = 0.5) {
    double lambda = 0.0;  // start dual-feasible (lambda >= 0)
    for (int t = 0; t < iters; ++t) {
        double grad = dual_gradient(lambda);
        double alpha = alpha0 / (1.0 + t);  // diminishing step size
        lambda = lambda + alpha * grad;     // gradient ascent step on concave dual
        if (lambda < 0.0) lambda = 0.0;     // projection onto lambda >= 0
    }
    double xopt = x_star(lambda);
    double p = (xopt - 3.0) * (xopt - 3.0);  // primal value (x* is feasible at optimum)
    double d = dual_value(lambda);
    return {lambda, xopt, p, d};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Result r = solve_dual_projected_gradient();

    cout << fixed << setprecision(6);
    cout << "Approx dual lambda*: " << r.lambda_opt << "\n";
    cout << "Primal x*(lambda):   " << r.x_opt << "\n";
    cout << "Primal value p(x):   " << r.primal_value << "\n";
    cout << "Dual value g(lambda):" << r.dual_value << "\n";
    cout << "Duality gap p - d:   " << (r.primal_value - r.dual_value) << "\n\n";

    // Verify KKT conditions numerically
    double x = r.x_opt;
    double lambda = r.lambda_opt;
    double stationarity = fabs(2.0 * (x - 3.0) - lambda);  // |grad f + lambda * grad g|, grad g = -1
    double primal_feas = 5.0 - x;  // should be <= 0
    double dual_feas = lambda;     // should be >= 0
    double comp_slack = lambda * (5.0 - x);

    cout << "KKT checks:\n";
    cout << "  Stationarity |2(x-3) - lambda| = " << stationarity << "\n";
    cout << "  Primal feas g(x)=5-x = " << primal_feas << " (<= 0)\n";
    cout << "  Dual feas lambda = " << dual_feas << " (>= 0)\n";
    cout << "  Complementary slackness = " << comp_slack << " (~ 0)\n";

    // Closed-form optimum for reference
    cout << "\nClosed-form: lambda*=4, x*=5, p*=4, d*=4\n";
    return 0;
}
```
This program maximizes the dual function for a 1D convex problem using projected gradient ascent on the nonnegative multiplier. It computes x*(lambda) in closed form, uses the property that the dual gradient equals the constraint residual, and verifies KKT conditions and the duality gap. The optimal solution has an active constraint (x* = 5) and positive multiplier (lambda* = 4).
```cpp
#include <bits/stdc++.h>
using namespace std;

// Solve a small convex QP via dual ascent.
// Primal: minimize 0.5 x^T Q x + c^T x subject to A x <= b (inequalities only).
// Assumptions: Q is symmetric positive definite (SPD) to ensure a unique x*(lambda).
// Dual ascent uses: x*(lambda) = argmin_x L = solve(Q x = -(c + A^T lambda)).
// Dual gradient: grad(lambda) = A x*(lambda) - b. Project onto lambda >= 0.

// Utility: solve the linear system M x = rhs with Gaussian elimination and partial pivoting.
// Works for small dense matrices; for SPD Q, a Cholesky factorization would be better in practice.
vector<double> solveLinearSystem(vector<vector<double>> M, vector<double> rhs) {
    int n = (int)M.size();
    for (int i = 0; i < n; ++i) M[i].push_back(rhs[i]);  // augment
    // Forward elimination with partial pivoting
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (fabs(M[r][col]) > fabs(M[piv][col])) piv = r;
        if (fabs(M[piv][col]) < 1e-12) throw runtime_error("Singular matrix detected");
        if (piv != col) swap(M[piv], M[col]);
        double diag = M[col][col];
        for (int j = col; j <= n; ++j) M[col][j] /= diag;
        for (int r = col + 1; r < n; ++r) {
            double f = M[r][col];
            if (fabs(f) < 1e-18) continue;
            for (int j = col; j <= n; ++j) M[r][j] -= f * M[col][j];
        }
    }
    // Back substitution
    vector<double> x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        double s = M[i][n];
        for (int j = i + 1; j < n; ++j) s -= M[i][j] * x[j];
        x[i] = s;  // diagonal is 1 after normalization
    }
    return x;
}

struct QP {
    vector<vector<double>> Q;  // n x n SPD
    vector<double> c;          // n
    vector<vector<double>> A;  // m x n
    vector<double> b;          // m
};

vector<double> matVec(const vector<vector<double>>& M, const vector<double>& x) {
    int r = (int)M.size();
    int c = (int)M[0].size();
    vector<double> y(r, 0.0);
    for (int i = 0; i < r; ++i) {
        double s = 0.0;
        for (int j = 0; j < c; ++j) s += M[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

vector<double> vecAdd(const vector<double>& a, const vector<double>& b, double beta = 1.0) {
    int n = (int)a.size();
    vector<double> r(n);
    for (int i = 0; i < n; ++i) r[i] = a[i] + beta * b[i];
    return r;
}

vector<vector<double>> transpose(const vector<vector<double>>& M) {
    int r = (int)M.size(), c = (int)M[0].size();
    vector<vector<double>> T(c, vector<double>(r));
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < c; ++j)
            T[j][i] = M[i][j];
    return T;
}

// Compute x*(lambda): solve Q x = -(c + A^T lambda)
vector<double> x_star(const QP& qp, const vector<double>& lambda) {
    auto AT = transpose(qp.A);
    vector<double> ATlam = matVec(AT, lambda);
    vector<double> rhs = vecAdd(qp.c, ATlam, 1.0);  // c + A^T lambda
    for (double& v : rhs) v = -v;                   // -(c + A^T lambda)
    return solveLinearSystem(qp.Q, rhs);
}

// Lagrangian at (x, lambda): 0.5 x^T Q x + c^T x + lambda^T (A x - b)
double Lagrangian(const QP& qp, const vector<double>& x, const vector<double>& lambda) {
    int n = (int)x.size();
    double quad = 0.0;  // 0.5 x^T Q x
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            quad += 0.5 * x[i] * qp.Q[i][j] * x[j];
    double lin = 0.0;   // c^T x
    for (int i = 0; i < n; ++i) lin += qp.c[i] * x[i];
    vector<double> Ax = matVec(qp.A, x);
    double pen = 0.0;   // lambda^T (A x - b)
    for (int i = 0; i < (int)lambda.size(); ++i) pen += lambda[i] * (Ax[i] - qp.b[i]);
    return quad + lin + pen;
}

struct DualResult {
    vector<double> lambda;
    vector<double> x;
    double primal_value;
    double dual_value;
};

DualResult dual_projected_gradient(const QP& qp, int iters = 500, double alpha0 = 0.2) {
    int m = (int)qp.A.size();
    vector<double> lambda(m, 0.0);
    vector<double> x(qp.c.size(), 0.0);
    for (int t = 0; t < iters; ++t) {
        x = x_star(qp, lambda);
        vector<double> Ax = matVec(qp.A, x);
        vector<double> grad(m);  // gradient = A x - b
        for (int i = 0; i < m; ++i) grad[i] = Ax[i] - qp.b[i];
        double alpha = alpha0 / sqrt(1.0 + t);  // diminishing step size
        for (int i = 0; i < m; ++i) {
            lambda[i] = lambda[i] + alpha * grad[i];  // ascent on concave dual
            if (lambda[i] < 0.0) lambda[i] = 0.0;     // project onto lambda >= 0
        }
    }
    x = x_star(qp, lambda);
    // primal value: 0.5 x^T Q x + c^T x
    double quad = 0.0;
    int n = (int)x.size();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            quad += 0.5 * x[i] * qp.Q[i][j] * x[j];
    double lin = 0.0;
    for (int i = 0; i < n; ++i) lin += qp.c[i] * x[i];
    double p = quad + lin;
    double d = Lagrangian(qp, x, lambda);  // = g(lambda) since x = argmin L(., lambda)
    return {lambda, x, p, d};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example QP (n=2, m=2) with Slater's condition satisfied.
    QP qp;
    qp.Q = {{2.0, 0.0}, {0.0, 1.0}};   // SPD
    qp.c = {-2.0, -5.0};               // linear term
    qp.A = {{1.0, 2.0}, {-1.0, 2.0}};  // constraints A x <= b
    qp.b = {4.0, 2.0};

    auto res = dual_projected_gradient(qp, 600, 0.25);

    cout << fixed << setprecision(6);
    cout << "lambda*: [" << res.lambda[0] << ", " << res.lambda[1] << "]\n";
    cout << "x*:      [" << res.x[0] << ", " << res.x[1] << "]\n";
    cout << "Primal value p*: " << res.primal_value << "\n";
    cout << "Dual value d*:   " << res.dual_value << "\n";
    cout << "Duality gap    : " << (res.primal_value - res.dual_value) << "\n\n";

    // Check feasibility and complementary slackness
    vector<double> Ax = matVec(qp.A, res.x);
    cout << "A x - b = [" << (Ax[0] - qp.b[0]) << ", " << (Ax[1] - qp.b[1]) << "] (should be <= 0)\n";
    double cs = 0.0;
    for (int i = 0; i < (int)res.lambda.size(); ++i) cs += res.lambda[i] * (Ax[i] - qp.b[i]);
    cout << "Sum lambda_i * (A x - b)_i = " << cs << " (~ 0 at optimum)\n";

    return 0;
}
```
This program solves the dual of a small convex quadratic program using projected gradient ascent on the dual variables. Each iteration solves a linear system for x*(lambda), computes the dual gradient as the constraint violations, takes an ascent step, and projects onto lambda >= 0. At convergence (under Slater’s condition), the primal solution x*(lambda*) is optimal and the duality gap is near zero, validating strong duality and KKT.