📚 Theory · Intermediate

Lagrangian Duality

Key Points

  • Lagrangian duality turns a constrained minimization problem into a related maximization problem that provides lower bounds on the original objective.
  • The Lagrangian adds weighted penalties (Lagrange multipliers) for constraint violations to the original objective and then minimizes over the primal variables.
  • The dual function is concave and its maximization over nonnegative multipliers (and free multipliers for equalities) defines the dual problem.
  • Weak duality always holds: the best dual value is never greater than the best primal value, so the dual is a certified lower bound.
  • Under conditions like Slater's condition for convex problems, strong duality holds and the optimal primal and dual values are equal.
  • The Karush–Kuhn–Tucker (KKT) conditions characterize optimality; in convex problems they are necessary and sufficient.
  • Complementary slackness links constraints and multipliers: each inequality is either tight or has zero multiplier at optimum.
  • Duality is fundamental in ML: the SVM training problem is often solved in dual form, enabling kernels and efficient algorithms.

Prerequisites

  • Multivariable calculus: Gradients and stationary points are essential to form and interpret KKT stationarity.
  • Linear algebra: Vectors, matrices, and solving linear systems are needed to derive and compute x*(lambda) and dual functions.
  • Convex analysis: Convexity, Slater’s condition, and properties of convex functions underlie strong duality and KKT sufficiency.
  • Optimization basics: Understanding objectives/constraints, feasibility, and gradient-based methods helps in applying dual ascent.
  • C++ programming: Implementing optimization routines, numerical linear algebra, and careful floating-point computations.

Detailed Explanation

01 Overview

Imagine trying to minimize a cost while obeying rules. If you break a rule, you pay a penalty. Lagrangian duality formalizes this idea: we attach a nonnegative penalty (Lagrange multiplier) to each inequality and a signed multiplier to each equality, add these penalties to the objective, and then ask, "What is the best cost we could get if we were allowed to adjust the decision and pay penalties?" The minimum over the decision variables yields the dual function, which is always a lower bound on the true best cost. Then we flip the perspective and choose penalties to make that bound as tight as possible—this is the dual problem. The power of duality is twofold: it provides certified lower bounds (useful to prove optimality or stop early) and it often simplifies or decomposes the original problem. In convex optimization, and under mild regularity conditions (like Slater’s condition), there is no gap between the best primal and dual values, so solving the dual recovers the primal solution. This perspective underpins many algorithms and ML models, notably the support vector machine (SVM), where the dual exposes the kernel trick and sparsity in support vectors.

02 Intuition & Analogies

Hook: Think of planning a road trip with a budget and time limits. You try to pick routes (decisions) to minimize fuel cost while obeying constraints like total time and toll limits. Now suppose a friend says, "If you exceed the time limit, I’ll fine you $10 per extra hour; if you exceed tolls, I’ll fine you $2 per extra dollar." You now minimize fuel plus fines. If the fines are large, you will naturally obey the rules; if they’re small, you might break them but pay. Concept: Those fine rates are the Lagrange multipliers. The Lagrangian is fuel cost plus the fines for each rule violation. For any fixed set of fine rates, you can pick a best route (minimize over routes). That minimum is a lower bound on the true best possible cost with rules enforced, because when penalties are high enough, violating becomes too expensive. Example: If the time fine is huge, any plan that violates time will be worse than staying within time—so the best penalized plan equals the best feasible plan. Duality says, "Choose fine rates to make this bound as tight as possible." When chosen just right in convex problems (with some slack feasibility), the penalty scheme pins the best route exactly—you recover the true constrained optimum. Complementary slackness matches intuition: a rule that’s not tight has zero fine at optimum (no reason to price it), and a tight rule has a positive fine (its price reflects its scarcity).

03 Formal Definition

Primal problem: minimize f(x) subject to inequality constraints g_i(x) ≤ 0 and equality constraints h_j(x) = 0. The Lagrangian is L(x, λ, ν) = f(x) + Σ_i λ_i g_i(x) + Σ_j ν_j h_j(x), with multipliers λ ≥ 0 and ν free. The dual function is g(λ, ν) = inf_x L(x, λ, ν). Because it is a pointwise infimum of functions affine in (λ, ν), g is concave, even if f, g_i, h_j are not. The dual problem maximizes this concave function over the feasible multipliers: maximize g(λ, ν) subject to λ ≥ 0. Let p* be the optimal primal value and d* the optimal dual value. Weak duality states d* ≤ p* always. In convex problems (f and g_i convex, h_j affine) and under a constraint qualification such as Slater’s condition (existence of a strictly feasible point), strong duality holds: d* = p*, and optimality is characterized by the KKT conditions: stationarity, primal feasibility, dual feasibility, and complementary slackness. When strong duality holds, any saddle point (x*, λ*, ν*) satisfying L(x*, λ, ν) ≤ L(x*, λ*, ν*) ≤ L(x, λ*, ν*) for all x, λ ≥ 0, and ν certifies optimality.
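
To make these definitions concrete, here is a minimal worked instance (a one-variable example added for illustration, not from the original text):

% Worked instance (illustrative): minimize x^2 subject to 1 - x <= 0
\begin{aligned}
L(x, \lambda) &= x^2 + \lambda (1 - x), \qquad \lambda \ge 0 \\
\partial L / \partial x = 2x - \lambda = 0 \quad &\Longrightarrow \quad x^*(\lambda) = \lambda / 2 \\
g(\lambda) = L(x^*(\lambda), \lambda) &= \lambda - \lambda^2 / 4 \quad \text{(concave)} \\
\max_{\lambda \ge 0} g(\lambda) \quad &\Longrightarrow \quad \lambda^* = 2, \ x^* = 1, \ d^* = p^* = 1
\end{aligned}

The constraint is active at x* = 1 with λ* = 2 > 0, so complementary slackness holds with a tight constraint and a positive multiplier.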

04 When to Use

Use Lagrangian duality when you need certified lower bounds, decomposition, or a simpler equivalent problem. Typical contexts include: (1) Convex optimization with inequality/equality constraints, where strong duality often holds and KKT conditions enable efficient algorithms. (2) Quadratic programs (QPs), where the dual may be lower dimensional or more structured (e.g., box constraints), enabling faster solvers. (3) Machine learning models like SVMs: the dual transforms constraints into a kernelized quadratic form with simple box constraints and one equality, making kernels and sparsity natural. (4) Resource allocation and networks: dual variables act as prices coordinating decentralized decisions (e.g., in networking or distributed control). (5) Algorithm design: dual ascent, projected gradient on the dual, and ADMM use dual variables to enforce constraints iteratively. (6) Certification: if you have a candidate feasible primal solution, a dual solution provides a gap certificate f(x) - g(\lambda, \nu) to measure suboptimality and stop early. Especially apply it when Slater’s condition is easy to verify, the dual is easier to optimize, or decomposition across constraints/blocks yields parallelism.
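
As a concrete instance of point (6), here is a minimal sketch of a gap-certificate stopping test (the helper name is invented for illustration; it assumes p_val comes from a feasible x and d_val from dual-feasible multipliers):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Hypothetical helper: stop an iterative solver once the certified duality
// gap is small. Assumes p_val = f(x) for a FEASIBLE x and d_val = g(lambda, nu)
// for dual-feasible multipliers, so p_val - d_val >= 0 by weak duality.
bool gap_certificate_says_stop(double p_val, double d_val, double tol = 1e-6) {
    double gap = p_val - d_val;                          // certified suboptimality bound
    double rel = gap / std::max(1.0, std::fabs(p_val));  // relative gap
    return rel <= tol;
}

int main() {
    // Example numbers only: a feasible primal value and a dual lower bound.
    double p_val = 4.000002, d_val = 3.999999;
    std::printf("stop = %s\n", gap_certificate_says_stop(p_val, d_val) ? "yes" : "no");
    return 0;
}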

⚠️ Common Mistakes

• Assuming strong duality without checking qualifications: for nonconvex problems, or convex problems without Slater’s condition, there may be a duality gap d^{*} < p^{*}. Always verify convexity and feasibility conditions.
• Sign and direction errors: inequalities must be of the form g_{i}(x) \le 0 with \lambda_{i} \ge 0. If your constraints are of the form a^{T}x \ge b, rewrite as b - a^{T}x \le 0 (see the sketch after this list).
• Ignoring dual feasibility: optimizing g(\lambda, \nu) with negative \lambda can yield invalid lower bounds. Keep \lambda \ge 0 via projection.
• Misusing KKT: in nonconvex problems, KKT conditions are necessary (under constraint qualifications) but not sufficient; they may identify saddle points. In convex problems, they are necessary and sufficient.
• Forgetting complementary slackness: at optimum, each inequality constraint either binds (g_{i}(x^{*}) = 0, \lambda_{i}^{*} > 0) or is slack (g_{i}(x^{*}) < 0, \lambda_{i}^{*} = 0). Violations indicate numerical issues or non-optimality.
• Numerical instability: computing x^{*}(\lambda) may require solving linear systems; poor conditioning can stall dual ascent. Use robust solvers (e.g., Cholesky for SPD Q) and step-size schedules.
• Confusing primal and dual variables: dual gradients are constraint residuals at x^{*}(\lambda); mixing them with gradients of f leads to incorrect updates. For inequality constraints, always use \nabla_{\lambda_{i}} g(\lambda) = g_{i}(x^{*}(\lambda)).
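
The sign-convention and projection pitfalls above are mechanical to guard against in code. A minimal C++ sketch (helper names are illustrative, not from any library):

#include <algorithm>
#include <cstdio>
#include <vector>

// Convert a row constraint a^T x >= b into standard form b - a^T x <= 0,
// i.e., negate both sides so it reads (-a)^T x <= -b.
void toStandardForm(std::vector<double>& a_row, double& b_rhs) {
    for (double& v : a_row) v = -v;
    b_rhs = -b_rhs;
}

// Enforce dual feasibility after each ascent step: project onto lambda >= 0.
void projectNonnegative(std::vector<double>& lambda) {
    for (double& l : lambda) l = std::max(0.0, l);
}

int main() {
    std::vector<double> a = {1.0, -2.0};
    double b = 3.0;
    toStandardForm(a, b); // a = {-1, 2}, b = -3 now encodes -x1 + 2*x2 <= -3
    std::vector<double> lambda = {0.5, -0.1};
    projectNonnegative(lambda); // lambda = {0.5, 0.0}
    std::printf("b' = %.1f, lambda2 = %.1f\n", b, lambda[1]);
    return 0;
}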

Key Formulas

Lagrangian

L(x, λ, ν) = f(x) + Σ_i λ_i g_i(x) + Σ_j ν_j h_j(x), with λ ≥ 0 and ν free

Explanation: The Lagrangian augments the objective with penalties for constraint violations. Inequality multipliers must be nonnegative; equality multipliers are free.

Dual Function

g(λ, ν) = inf_x L(x, λ, ν)

Explanation: For fixed multipliers, the best (lowest) value of the Lagrangian over x defines a concave function. It is a lower bound on the optimal primal value.

Dual Problem

maximize g(λ, ν) subject to λ ≥ 0 (ν free)

Explanation: The dual problem chooses multipliers to maximize the lower bound provided by the dual function, subject to dual feasibility.

Weak Duality

d* ≤ p*

Explanation: The optimal dual value never exceeds the optimal primal value. This guarantees any dual solution provides a certified lower bound.

Strong Duality

d* = p*

Explanation: Under conditions such as convexity and Slater's condition, the duality gap closes and optimal values match, allowing recovery of primal solutions from dual ones.

KKT Stationarity

∇f(x*) + Σ_i λ_i* ∇g_i(x*) + Σ_j ν_j* ∇h_j(x*) = 0

Explanation: At optimum, the gradient of the Lagrangian with respect to primal variables must vanish, balancing objective and constraint gradients.

KKT Feasibility

g_i(x*) ≤ 0, h_j(x*) = 0, and λ_i* ≥ 0 for all i, j

Explanation: Primal constraints are satisfied and inequality multipliers are nonnegative at the optimal solution.

Complementary Slackness

λ_i* g_i(x*) = 0 for all i

Explanation: Each inequality is either active (equals zero with positive multiplier) or inactive (strictly negative with zero multiplier) at optimum.

Saddle-Point Condition

L(x*, λ, ν) ≤ L(x*, λ*, ν*) ≤ L(x, λ*, ν*) for all x, all λ ≥ 0, and all ν

Explanation: A saddle point of the Lagrangian certifies optimality when strong duality holds, simultaneously minimizing over x and maximizing over multipliers.

QP Dual Function

g(λ) = -(1/2) (c + Aᵀλ)ᵀ Q⁻¹ (c + Aᵀλ) - bᵀλ

Explanation: For a convex QP (minimize (1/2) xᵀQx + cᵀx subject to Ax ≤ b) with positive definite Q, the dual function has this closed form. It is concave in λ.
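
The closed form follows from a short derivation, consistent with the definitions above:

\begin{aligned}
L(x, \lambda) &= \tfrac{1}{2} x^T Q x + c^T x + \lambda^T (A x - b) \\
\nabla_x L = Q x + c + A^T \lambda = 0 \quad &\Longrightarrow \quad x^*(\lambda) = -Q^{-1} (c + A^T \lambda) \\
g(\lambda) = L(x^*(\lambda), \lambda) &= -\tfrac{1}{2} (c + A^T \lambda)^T Q^{-1} (c + A^T \lambda) - b^T \lambda
\end{aligned}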

Dual Gradient (Inequalities)

∂g(λ)/∂λ_i = g_i(x*(λ)); for constraints Ax ≤ b this is (A x*(λ) - b)_i

Explanation: The gradient of the dual with respect to an inequality multiplier equals the corresponding constraint residual evaluated at the minimizer x*(λ).

SVM Dual

maximize Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0

Explanation: The dual of soft-margin SVM is a QP with simple box constraints and one equality, enabling kernel methods and sparse solutions.

Duality Gap

f(x) - g(λ, ν) ≥ 0 for any primal-feasible x and any λ ≥ 0, ν

Explanation: The difference between a feasible primal value and a dual value is always nonnegative. A zero gap indicates optimality under strong duality.

Slater's Condition

∃ x̄ with g_i(x̄) < 0 for all i and h_j(x̄) = 0 for all j ⇒ d* = p*

Explanation: If a strictly feasible point exists for convex problems with affine equalities, then strong duality holds and KKT conditions are sufficient.

Complexity Analysis

Optimizing the dual typically involves repeatedly evaluating the dual function and its gradient (or subgradient) and enforcing dual feasibility. For problems where the Lagrangian minimizer x*(λ, ν) has a closed form or can be computed by solving a linear system, each dual iteration's cost is dominated by that inner solve. For a convex quadratic program with n variables and m linear inequality constraints, evaluating x*(λ) requires solving Q x = -(c + Aᵀλ). Using a dense direct solver is O(n^3) time and O(n^2) space; computing the gradient ∇g(λ) = A x*(λ) - b is O(mn) time and O(m) space. Projected gradient ascent on the dual typically needs O(1/ε) iterations to reach ε-suboptimality without acceleration (subgradient variants need O(1/ε^2)), yielding O((n^3 + mn)/ε) time. If Q is fixed, a factorization (e.g., Cholesky) can be precomputed once in O(n^3) time and reused, reducing per-iteration time to O(n^2 + mn). Memory usage is dominated by storing Q (O(n^2)), A (O(mn)), and vectors (O(n + m)). In SVM duals with a dense kernel matrix K of size n × n, a naive gradient step costs O(n^2) time and O(n^2) space; specialized methods like SMO exploit structure to achieve practical efficiency and low memory via working sets. When constraints decompose or the problem is sparse, dual ascent can be implemented with near-linear per-iteration cost using iterative solvers (e.g., conjugate gradients) and sparse algebra, significantly improving scalability while maintaining the duality-gap certificate for stopping.
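
To make the precomputed-factorization point concrete, here is a minimal Cholesky sketch (function names are illustrative, not from the original text; it assumes SPD Q as stated above). Factor once in O(n^3), then reuse each O(n^2) solve inside the dual loop:

#include <cmath>
#include <cstdio>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky factorization Q = L L^T for symmetric positive definite Q.
// Computed once in O(n^3); each later solve is O(n^2).
Matrix cholesky(const Matrix& Q) {
    int n = (int)Q.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double s = Q[i][j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            if (i == j) {
                if (s <= 0.0) throw std::runtime_error("Q is not positive definite");
                L[i][i] = std::sqrt(s);
            } else {
                L[i][j] = s / L[j][j];
            }
        }
    }
    return L;
}

// Solve Q x = rhs using the precomputed factor: forward then backward substitution.
std::vector<double> choleskySolve(const Matrix& L, std::vector<double> rhs) {
    int n = (int)L.size();
    for (int i = 0; i < n; ++i) {            // forward: L y = rhs
        for (int k = 0; k < i; ++k) rhs[i] -= L[i][k] * rhs[k];
        rhs[i] /= L[i][i];
    }
    for (int i = n - 1; i >= 0; --i) {       // backward: L^T x = y
        for (int k = i + 1; k < n; ++k) rhs[i] -= L[k][i] * rhs[k];
        rhs[i] /= L[i][i];
    }
    return rhs;
}

int main() {
    Matrix Q = {{4.0, 1.0}, {1.0, 3.0}};
    Matrix L = cholesky(Q);                              // factor once
    std::vector<double> x = choleskySolve(L, {1.0, 2.0}); // reuse per iteration
    std::printf("x = (%.6f, %.6f)\n", x[0], x[1]);
    return 0;
}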

Code Examples

KKT verification via dual ascent on a 1D convex problem (active constraint)
#include <bits/stdc++.h>
using namespace std;

// Primal: minimize f(x) = (x - 3)^2 subject to x >= 5.
// Inequality in standard form: g(x) = 5 - x <= 0.
// Lagrangian: L(x, lambda) = (x-3)^2 + lambda * (5 - x), lambda >= 0.
// For fixed lambda, minimizer over x: d/dx L = 2(x-3) - lambda = 0 => x*(lambda) = 3 + lambda/2.
// Dual function: g(lambda) = L(x*(lambda), lambda) = -lambda^2/4 + 2*lambda (concave parabola).
// Optimal dual: lambda* = 4, x* = 5, p* = d* = 4.

struct Result {
    double lambda_opt;
    double x_opt;
    double primal_value;
    double dual_value;
};

// Compute x*(lambda) and dual quantities
static inline double x_star(double lambda) {
    return 3.0 + 0.5 * lambda; // x*(lambda) = 3 + lambda/2
}

static inline double Lagrangian(double x, double lambda) {
    double f = (x - 3.0) * (x - 3.0);
    double g = 5.0 - x; // inequality value
    return f + lambda * g;
}

static inline double dual_value(double lambda) {
    double x = x_star(lambda);
    return Lagrangian(x, lambda);
}

static inline double dual_gradient(double lambda) {
    // For inequality constraints, the dual gradient is g(x*(lambda))
    double x = x_star(lambda);
    return 5.0 - x; // = 5 - (3 + lambda/2) = 2 - lambda/2
}

Result solve_dual_projected_gradient(int iters = 200, double alpha = 0.5) {
    double lambda = 0.0; // start dual-feasible (lambda >= 0)
    for (int t = 0; t < iters; ++t) {
        double grad = dual_gradient(lambda);
        // The dual is a smooth concave quadratic with curvature 1/2, so any
        // fixed step below 2 / (1/2) = 4 converges geometrically; a diminishing
        // schedule also works but needs far more iterations.
        lambda += alpha * grad;         // gradient ascent step on concave dual
        if (lambda < 0.0) lambda = 0.0; // projection onto lambda >= 0
    }
    double xopt = x_star(lambda);
    // Primal and dual values
    double p = (xopt - 3.0) * (xopt - 3.0); // since x* is feasible at optimum
    double d = dual_value(lambda);
    return {lambda, xopt, p, d};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Result r = solve_dual_projected_gradient();

    cout << fixed << setprecision(6);
    cout << "Approx dual lambda*: " << r.lambda_opt << "\n";
    cout << "Primal x*(lambda):   " << r.x_opt << "\n";
    cout << "Primal value p(x):   " << r.primal_value << "\n";
    cout << "Dual value g(lambda): " << r.dual_value << "\n";
    cout << "Duality gap p - d:   " << (r.primal_value - r.dual_value) << "\n\n";

    // Verify KKT conditions numerically
    double x = r.x_opt;
    double lambda = r.lambda_opt;
    double stationarity = fabs(2.0 * (x - 3.0) - lambda); // |f'(x) + lambda * g'(x)| with g'(x) = -1
    double primal_feas = 5.0 - x; // should be <= 0
    double dual_feas = lambda;    // should be >= 0
    double comp_slack = lambda * (5.0 - x);

    cout << "KKT checks:" << "\n";
    cout << "  Stationarity |2(x-3) - lambda| = " << stationarity << "\n";
    cout << "  Primal feas g(x)=5-x = " << primal_feas << " (<= 0)\n";
    cout << "  Dual feas lambda = " << dual_feas << " (>= 0)\n";
    cout << "  Complementary slackness = " << comp_slack << " (≈ 0)\n";

    // Closed-form optimum for reference
    cout << "\nClosed-form: lambda*=4, x*=5, p*=4, d*=4\n";
    return 0;
}

This program maximizes the dual function for a 1D convex problem using projected gradient ascent on the nonnegative multiplier. It computes x*(lambda) in closed form, uses the property that the dual gradient equals the constraint residual, and verifies KKT conditions and the duality gap. The optimal solution has an active constraint (x* = 5) and positive multiplier (lambda* = 4).

Time: O(T), where T is the number of gradient steps (all operations are O(1) per step). Space: O(1).

Dual projected gradient for a small convex QP: min 0.5 x^T Q x + c^T x s.t. A x <= b
#include <bits/stdc++.h>
using namespace std;

// Solve a small convex QP via dual ascent.
// Primal: minimize 0.5 x^T Q x + c^T x subject to A x <= b (inequalities only).
// Assumptions: Q is symmetric positive definite (SPD) to ensure unique x*(lambda).
// Dual ascent uses: x*(lambda) = argmin_x L = solve(Q x = -(c + A^T lambda)).
// Dual gradient: grad(lambda) = A x*(lambda) - b. Project onto lambda >= 0.

// Utility: Solve linear system M x = rhs with Gaussian elimination and partial pivoting.
// Works for small dense matrices; for SPD Q, a Cholesky would be better in practice.
vector<double> solveLinearSystem(vector<vector<double>> M, vector<double> rhs) {
    int n = (int)M.size();
    for (int i = 0; i < n; ++i) M[i].push_back(rhs[i]); // augment
    // Forward elimination with partial pivoting
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (fabs(M[r][col]) > fabs(M[piv][col])) piv = r;
        if (fabs(M[piv][col]) < 1e-12) throw runtime_error("Singular matrix detected");
        if (piv != col) swap(M[piv], M[col]);
        double diag = M[col][col];
        for (int j = col; j <= n; ++j) M[col][j] /= diag;
        for (int r = col + 1; r < n; ++r) {
            double f = M[r][col];
            if (fabs(f) < 1e-18) continue;
            for (int j = col; j <= n; ++j) M[r][j] -= f * M[col][j];
        }
    }
    // Back substitution
    vector<double> x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        double s = M[i][n];
        for (int j = i + 1; j < n; ++j) s -= M[i][j] * x[j];
        x[i] = s; // since diagonal is 1 after normalization
    }
    return x;
}

struct QP {
    vector<vector<double>> Q; // n x n SPD
    vector<double> c;         // n
    vector<vector<double>> A; // m x n
    vector<double> b;         // m
};

vector<double> matVec(const vector<vector<double>>& M, const vector<double>& x) {
    int r = (int)M.size();
    int c = (int)M[0].size();
    vector<double> y(r, 0.0);
    for (int i = 0; i < r; ++i) {
        double s = 0.0;
        for (int j = 0; j < c; ++j) s += M[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

vector<double> vecAdd(const vector<double>& a, const vector<double>& b, double beta = 1.0) {
    int n = (int)a.size();
    vector<double> r(n);
    for (int i = 0; i < n; ++i) r[i] = a[i] + beta * b[i];
    return r;
}

vector<vector<double>> transpose(const vector<vector<double>>& M) {
    int r = (int)M.size(), c = (int)M[0].size();
    vector<vector<double>> T(c, vector<double>(r));
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < c; ++j)
            T[j][i] = M[i][j];
    return T;
}

// Compute x*(lambda): solve Q x = -(c + A^T lambda)
vector<double> x_star(const QP& qp, const vector<double>& lambda) {
    auto AT = transpose(qp.A);
    vector<double> ATlam = matVec(AT, lambda);
    vector<double> rhs = vecAdd(qp.c, ATlam, 1.0); // c + A^T lambda
    for (double& v : rhs) v = -v; // -(c + A^T lambda)
    return solveLinearSystem(qp.Q, rhs);
}

// Lagrangian at (x, lambda): 0.5 x^T Q x + c^T x + lambda^T (A x - b)
double Lagrangian(const QP& qp, const vector<double>& x, const vector<double>& lambda) {
    int n = (int)x.size();
    // 0.5 x^T Q x
    double quad = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            quad += 0.5 * x[i] * qp.Q[i][j] * x[j];
    // c^T x
    double lin = 0.0;
    for (int i = 0; i < n; ++i) lin += qp.c[i] * x[i];
    // lambda^T (A x - b)
    vector<double> Ax = matVec(qp.A, x);
    double pen = 0.0;
    for (int i = 0; i < (int)lambda.size(); ++i) pen += lambda[i] * (Ax[i] - qp.b[i]);
    return quad + lin + pen;
}

struct DualResult {
    vector<double> lambda;
    vector<double> x;
    double primal_value;
    double dual_value;
};

DualResult dual_projected_gradient(const QP& qp, int iters = 500, double alpha0 = 0.2) {
    int m = (int)qp.A.size();
    vector<double> lambda(m, 0.0);
    vector<double> x(qp.c.size(), 0.0);
    for (int t = 0; t < iters; ++t) {
        x = x_star(qp, lambda);
        vector<double> Ax = matVec(qp.A, x);
        // gradient = A x - b
        vector<double> grad(m);
        for (int i = 0; i < m; ++i) grad[i] = Ax[i] - qp.b[i];
        double alpha = alpha0 / sqrt(1.0 + t); // diminishing step size
        for (int i = 0; i < m; ++i) {
            lambda[i] = lambda[i] + alpha * grad[i]; // ascent on concave dual
            if (lambda[i] < 0.0) lambda[i] = 0.0;    // project onto lambda >= 0
        }
    }
    x = x_star(qp, lambda);
    // primal value: 0.5 x^T Q x + c^T x
    double quad = 0.0;
    int n = (int)x.size();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            quad += 0.5 * x[i] * qp.Q[i][j] * x[j];
    double lin = 0.0;
    for (int i = 0; i < n; ++i) lin += qp.c[i] * x[i];
    double p = quad + lin;
    double d = Lagrangian(qp, x, lambda); // = g(lambda) since x = argmin L(., lambda)
    return {lambda, x, p, d};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example QP (n=2, m=2) with Slater's condition satisfied.
    QP qp;
    qp.Q = {{2.0, 0.0}, {0.0, 1.0}};  // SPD
    qp.c = {-2.0, -5.0};              // linear term
    qp.A = {{1.0, 2.0}, {-1.0, 2.0}}; // constraints A x <= b
    qp.b = {4.0, 2.0};

    auto res = dual_projected_gradient(qp, 600, 0.25);

    cout << fixed << setprecision(6);
    cout << "lambda*: [" << res.lambda[0] << ", " << res.lambda[1] << "]\n";
    cout << "x*: [" << res.x[0] << ", " << res.x[1] << "]\n";
    cout << "Primal value p*: " << res.primal_value << "\n";
    cout << "Dual value d*: " << res.dual_value << "\n";
    cout << "Duality gap : " << (res.primal_value - res.dual_value) << "\n\n";

    // Check feasibility and complementary slackness
    vector<double> Ax = matVec(qp.A, res.x);
    cout << "A x - b = [" << (Ax[0] - qp.b[0]) << ", " << (Ax[1] - qp.b[1]) << "] (should be <= 0)\n";
    double cs = 0.0;
    for (int i = 0; i < (int)res.lambda.size(); ++i) cs += res.lambda[i] * (Ax[i] - qp.b[i]);
    cout << "Sum lambda_i * (A x - b)_i = " << cs << " (≈ 0 at optimum)\n";

    return 0;
}

This program solves the dual of a small convex quadratic program using projected gradient ascent on the dual variables. Each iteration solves a linear system for x*(lambda), computes the dual gradient as the constraint violations, takes an ascent step, and projects onto lambda >= 0. At convergence (under Slater’s condition), the primal solution x*(lambda*) is optimal and the duality gap is near zero, validating strong duality and KKT.

Time: O(T (n^3 + m n)) for T iterations, using a dense direct solver (Gaussian elimination) per iteration. Space: O(n^2 + m n) to store Q, A, and auxiliary vectors.
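
Building on the SVM Dual formula above, the sketch below is an added illustration (the toy data and all names are invented here, not from the original text). It uses the common no-bias simplification: dropping the bias term removes the equality constraint sum_i alpha_i y_i = 0, leaving only the box constraints 0 <= alpha_i <= C, so projection is a coordinate-wise clip. Production solvers such as SMO keep the equality constraint and update pairs of multipliers instead.

SVM dual (no-bias simplification): projected gradient ascent with box constraints
#include <bits/stdc++.h>
using namespace std;

// Soft-margin SVM dual WITHOUT a bias term (a common simplification):
//   maximize sum_i alpha_i - 0.5 * sum_{i,j} alpha_i alpha_j y_i y_j K(x_i, x_j)
//   subject to 0 <= alpha_i <= C.
// Dropping the bias removes the equality constraint sum_i alpha_i y_i = 0,
// so the feasible set is a box and projection is a simple clip.

int main() {
    // Tiny illustrative 1D dataset with a linear kernel K(u, v) = u * v.
    vector<double> X = {-2.0, -1.0, 1.0, 2.0};
    vector<double> y = {-1.0, -1.0, 1.0, 1.0};
    int n = (int)X.size();
    double C = 1.0;

    // Precompute Q_ij = y_i y_j K(x_i, x_j).
    vector<vector<double>> Q(n, vector<double>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            Q[i][j] = y[i] * y[j] * X[i] * X[j];

    vector<double> alpha(n, 0.0);
    double step = 0.05; // below 2/||Q|| for this data, so ascent converges
    for (int t = 0; t < 2000; ++t) {
        for (int i = 0; i < n; ++i) {
            // Gradient of the dual objective: 1 - (Q alpha)_i
            double g = 1.0;
            for (int j = 0; j < n; ++j) g -= Q[i][j] * alpha[j];
            alpha[i] = clamp(alpha[i] + step * g, 0.0, C); // ascend, then clip to the box
        }
    }

    // Recover the (bias-free) primal weight w = sum_i alpha_i y_i x_i.
    double w = 0.0;
    for (int i = 0; i < n; ++i) w += alpha[i] * y[i] * X[i];

    cout << fixed << setprecision(4) << "alpha: ";
    for (double a : alpha) cout << a << " ";
    cout << "\nw = " << w << " (decision rule: sign(w * x))\n";
    return 0;
}

This sketch maximizes the simplified SVM dual by coordinate-wise projected gradient ascent: each update ascends along the dual gradient 1 - (Q alpha)_i and clips back to [0, C]. On this toy data the outer points receive zero multipliers while the two points nearest the boundary become support vectors, illustrating the sparsity the dual exposes.

Time: O(T n^2) for T sweeps over a dense Q. Space: O(n^2) to store Q.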