Calculus of Variations
Key Points
- Calculus of variations optimizes functionals (numbers produced by whole functions) rather than ordinary functions of numbers.
- The Euler–Lagrange equation \(\frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) = 0\) gives a necessary condition that optimal functions must satisfy.
- Boundary conditions matter as much as the differential equation: fixing endpoints vs. leaving them free changes the solution through natural boundary terms.
- Constraints on integrals (isoperimetric problems) are handled with Lagrange multipliers, adding terms to the Euler–Lagrange equation.
- In practice, we solve these problems numerically by discretizing the function and doing gradient descent, or by solving the resulting boundary-value ODE with shooting or finite differences.
- If the Lagrangian does not depend on x explicitly, the Beltrami identity gives a first integral, simplifying calculations.
- Optimal control generalizes calculus of variations; the Hamilton–Jacobi–Bellman equation plays a role for controls similar to that of Euler–Lagrange for functions.
- For ML and RL with continuous dynamics, these tools inform trajectory optimization, policy gradients in function spaces, and understanding of continuous-time limits.
Prerequisites
- Single-variable differential calculus – Needed to understand derivatives, Taylor expansions, and slopes y'.
- Integral calculus – Functionals are integrals over functions; quadrature rules approximate them numerically.
- Ordinary differential equations (ODEs) – Euler–Lagrange yields ODE boundary-value problems; numerical solvers rely on ODE integration.
- Linear algebra – Discretizations lead to linear systems and eigenproblems, especially with constraints.
- Finite differences – To discretize derivatives and build numerical gradients for functionals.
- Convexity and optimization basics – To interpret necessary vs. sufficient conditions and convergence of gradient descent.
- C++ programming fundamentals – Required to understand and modify the provided implementations.
- Runge–Kutta methods – Used in the shooting method to integrate ODEs accurately.
- Optimal control fundamentals – To connect calculus of variations with HJB and policy optimization in continuous systems.
- Variational formulations of PDEs – Provides context for using energies to solve boundary value problems and for finite element methods.
Detailed Explanation
01 Overview
Hook: Imagine planning a hiking trail from A to B that is as short as possible but also avoids cliffs and rivers. You are not choosing a single number; you are choosing an entire path y(x) that balances multiple factors everywhere along the trail. Calculus of variations is the mathematics of making the best choice among all such functions.

Concept: Instead of minimizing a scalar function f(x), we minimize a functional J[y] that maps a whole function y(x) to a number. A standard form is \(J[y] = \int_a^b L(x, y(x), y'(x))\,dx\), where L is called the Lagrangian. The necessary condition for an extremum is the Euler–Lagrange equation: \(\partial L/\partial y - d/dx(\partial L/\partial y') = 0\). Boundary conditions (values and/or slopes at the endpoints) complete the problem.

Example: The shortest path in the plane between two fixed points corresponds to \(L(x, y, y') = \sqrt{1 + (y')^2}\). The Euler–Lagrange equation simplifies to y'' = 0, giving a straight line. More complex examples include the brachistochrone (fastest descent under gravity) and geodesics on curved surfaces. When constraints on integrals are present (e.g., fixed area under the curve), Lagrange multipliers extend the method and lead to isoperimetric problems.
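As a quick check of the example above, the Euler–Lagrange computation for the arc-length Lagrangian written out:
\[
L = \sqrt{1+(y')^2}, \qquad \frac{\partial L}{\partial y} = 0, \qquad \frac{\partial L}{\partial y'} = \frac{y'}{\sqrt{1+(y')^2}},
\]
\[
\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right) = 0 \;\Rightarrow\; \frac{y'}{\sqrt{1+(y')^2}} = \text{const} \;\Rightarrow\; y' = \text{const} \;\Rightarrow\; y'' = 0.
\]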
02 Intuition & Analogies
Think of a functional as a "scorekeeper" for a whole curve. If you draw a path on paper, a functional like length will trace a ruler along every small piece and add them up. Another functional might measure how wiggly the path is (using its slope), like penalizing sharp turns when designing highways. We want the path with the best overall score.

The Euler–Lagrange equation is like a local fairness rule: take any tiny wiggle you make to the current path; if the path is optimal, that wiggle should not improve the score to first order. Translating that principle of "no beneficial infinitesimal tweaks" into calculus gives the differential equation \(\partial L/\partial y - d/dx(\partial L/\partial y') = 0\). So, optimal functions are those for which every microscopic piece already behaves optimally with its neighbors.

Constraints add realism: perhaps your garden hose must encircle a fixed area of grass (integral constraint), or endpoints may slide along rails (free boundaries). Lagrange multipliers act like adjustable prices: the system tunes a multiplier so the constraint is exactly satisfied at optimum.

In computation, we cannot directly manipulate infinite-dimensional objects, so we chop the interval into many small segments (a grid) and turn the functional into a sum. Now we have an ordinary optimization over many variables (the y-values at grid points). Gradient descent on this discretized energy mimics the Euler–Lagrange condition. Alternatively, when the Euler–Lagrange equation forms a boundary-value ODE, we can "shoot" from one end and adjust the initial slope until we hit the other boundary, like aiming a cannon whose shell must land at a precise target.
03 Formal Definition
Let \(y : [a,b] \to \mathbb{R}\) be continuously differentiable, possibly with prescribed boundary values \(y(a) = y_a\) and \(y(b) = y_b\). A functional in the standard form assigns to each admissible y the number \(J[y] = \int_a^b L(x, y(x), y'(x))\,dx\), where the Lagrangian L is smooth in its three arguments. The function y is an extremal of J if the first variation \(\delta J[y;\eta]\) vanishes for every smooth perturbation \(\eta\) with \(\eta(a) = \eta(b) = 0\); for such y, the Euler–Lagrange equation \(\partial L/\partial y - \frac{d}{dx}(\partial L/\partial y') = 0\) holds on (a, b).
04 When to Use
- Path planning and geometry: shortest curves (geodesics), minimal surfaces, spline smoothing; wherever an integral cost over a continuous curve or surface must be minimized.
- Physics and engineering: deriving equations of motion from least action, optimal shapes of beams (elastic energy), optics via Fermat's principle (light takes paths of stationary optical length).
- Optimal control and RL (continuous time): designing control signals u(t) that minimize \(\int L(x(t), u(t))\,dt\) subject to \(\dot{x} = f(x,u)\); the variational viewpoint leads to costates, Hamiltonians, and HJB PDEs. In ML, trajectory optimization and continuous policy gradients in function spaces echo these ideas.
- Statistics and ML regularization: estimating functions by minimizing \(\int (\text{loss}) + \lambda\,(\text{roughness})\,dx\) (e.g., smoothing splines, where the roughness term is \(\int (y'')^2\,dx\)); a minimal discretized version is sketched after this list.
- Boundary value problems: whenever the solution of a PDE/ODE can be characterized as an energy minimizer, variational methods give both theoretical guarantees (existence/uniqueness) and practical algorithms (finite elements, finite differences).
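To make the regularization bullet concrete, here is a minimal sketch (illustrative only; the grid, data, and lambda are placeholder assumptions, not from the source) of evaluating a discretized objective \(\int (y - \text{data})^2\,dx + \lambda \int (y'')^2\,dx\) on a uniform grid:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical example: evaluate the regularized objective
//   J[y] = \int (y - data)^2 dx + lambda * \int (y'')^2 dx
// on a uniform grid, using a central second difference for y''.
double regularizedObjective(const vector<double>& y, const vector<double>& data,
                            double h, double lambda){
    int n = (int)y.size();
    double fit = 0.0, rough = 0.0;
    for(int i = 0; i < n; ++i){
        double r = y[i] - data[i];
        fit += r * r * h;                                   // simple Riemann-style quadrature
    }
    for(int i = 1; i + 1 < n; ++i){
        double ypp = (y[i-1] - 2.0*y[i] + y[i+1]) / (h*h);  // central second difference
        rough += ypp * ypp * h;                             // quadrature of (y'')^2 at interior nodes
    }
    return fit + lambda * rough;
}

int main(){
    int n = 101; double h = 1.0 / (n - 1), lambda = 1e-3;   // assumed grid and weight
    vector<double> y(n), data(n);
    for(int i = 0; i < n; ++i){ double x = i*h; data[i] = sin(2.0*M_PI*x); y[i] = data[i]; }
    // With y == data the fit term vanishes, leaving lambda * \int (y'')^2 dx.
    cout << "J = " << regularizedObjective(y, data, h, lambda) << "\n";
    return 0;
}
```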
⚠️ Common Mistakes
- Treating the Euler–Lagrange equation as sufficient. It is only necessary: solutions can be maxima or saddle points. Check the second variation or convexity (e.g., L convex in y' often implies a minimum).
- Ignoring boundary terms. If endpoints are not fixed, the natural boundary condition \(\partial L/\partial y' = 0\) (or a more general transversality condition) must be enforced. Forgetting this yields incorrect solutions.
- Mixing partial derivatives and total derivatives. In EL, \(\partial L/\partial y\) and \(\partial L/\partial y'\) are partials treating x, y, y' as independent; then you take a total derivative in x of \(\partial L/\partial y'\). Confusing these leads to algebraic errors.
- Poor discretization. Using forward differences inconsistently or evaluating L at mismatched points (nodes vs. midpoints) can bias gradients or destabilize descent. Central differences and midpoint quadrature are usually better for 1D problems; see the sketch after this list.
- Step size issues in gradient descent. Too-large steps cause divergence; too-small steps stall. Use line search or scale steps with the grid size (e.g., O(h)).
- Mishandling constraints. For isoperimetric problems, either introduce a Lagrange multiplier (and solve for it) or project iterates onto the constraint set. Half-implementations drift away from feasibility.
- Forgetting scaling/units. The Beltrami identity and conservation-like laws assume conditions (e.g., no explicit x-dependence). Applying them blindly when assumptions fail gives wrong constants.
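To illustrate the mismatched-evaluation pitfall, here is a small sketch (illustrative, with an assumed test functional \(J[y] = \int_0^1 x\sqrt{1+(y')^2}\,dx\) and y(x) = x, exact value \(\sqrt{2}/2\)): evaluating L at the left node of each cell incurs O(h) quadrature error, while the midpoint evaluation used in the first code example below is O(h²) in general (and exact here, since the integrand is linear in x).

```cpp
#include <bits/stdc++.h>
using namespace std;

// Compare left-endpoint vs. midpoint evaluation of the Lagrangian on each cell.
// Assumed test case: J[y] = \int_0^1 x * sqrt(1 + (y')^2) dx with y(x) = x.
int main(){
    int N = 10;                       // deliberately coarse grid
    double h = 1.0 / N, exact = sqrt(2.0) / 2.0;
    double leftEndpoint = 0.0, midpoint = 0.0;
    for(int i = 0; i < N; ++i){
        double xl = i * h, xm = xl + 0.5 * h;
        double p = 1.0;               // slope of y = x on every cell
        leftEndpoint += h * xl * sqrt(1.0 + p*p); // L at the left node: O(h) error
        midpoint     += h * xm * sqrt(1.0 + p*p); // L at the midpoint:  O(h^2) error
    }
    cout << "exact         = " << exact << "\n";
    cout << "left-endpoint = " << leftEndpoint << " (err " << leftEndpoint - exact << ")\n";
    cout << "midpoint      = " << midpoint     << " (err " << midpoint - exact << ")\n";
    return 0;
}
```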
Key Formulas
Functional
\[ J[y] = \int_a^b L\big(x, y(x), y'(x)\big)\,dx \]
Explanation: This defines the cost of a whole function y by integrating a local cost L over the interval. Many variational problems take exactly this form.
Euler–Lagrange Equation
\[ \frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) = 0 \]
Explanation: A necessary condition for y to be an extremum of J. It balances the sensitivity to the function value and to its slope at every point.
Natural Boundary Conditions
\[ \left.\frac{\partial L}{\partial y'}\right|_{x=a} = 0 \quad\text{and/or}\quad \left.\frac{\partial L}{\partial y'}\right|_{x=b} = 0 \quad \text{(at each free endpoint)} \]
Explanation: If an endpoint value is free (not fixed), boundary terms force the derivative of L with respect to y' to vanish at that endpoint.
Beltrami Identity
\[ L - y'\,\frac{\partial L}{\partial y'} = C \quad \text{(constant)} \]
Explanation: When L has no explicit x-dependence, this quantity is conserved along extremals. It reduces a second-order problem to a first-order one.
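A short worked instance, using the arc-length Lagrangian from the Overview:
\[
L = \sqrt{1+(y')^2}:\qquad \sqrt{1+(y')^2} - \frac{(y')^2}{\sqrt{1+(y')^2}} = \frac{1}{\sqrt{1+(y')^2}} = C,
\]
so y' is constant and the extremal is a straight line, consistent with y'' = 0 from the full Euler–Lagrange equation.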
Isoperimetric EL with Multiplier
\[ \frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) + \lambda\,\frac{\partial g}{\partial y} = 0 \]
Explanation: For minimizing J[y] with an integral constraint \(\int g(x, y)\,dx = C\), introduce a scalar multiplier \(\lambda\). The augmented EL equation includes \(\lambda\) times the constraint's y-derivative.
Hamiltonian in Variational Problems
\[ p = \frac{\partial L}{\partial y'}, \qquad H = p\,y' - L \]
Explanation: The conjugate momentum p and Hamiltonian H encode the same variational information as EL. If L has no explicit x-dependence, H is constant along extremals.
First Variation
\[ \delta J[y;\eta] = \int_a^b \left( \frac{\partial L}{\partial y}\,\eta + \frac{\partial L}{\partial y'}\,\eta' \right) dx \]
Explanation: The first-order change in J due to a small perturbation \(\eta(x)\). Integration by parts of the \(\eta'\) term yields the EL equation and boundary terms.
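The integration-by-parts step mentioned above, written out:
\[
\delta J = \int_a^b \left( \frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} \right) \eta \,dx + \left[ \frac{\partial L}{\partial y'}\,\eta \right]_a^b .
\]
Setting \(\delta J = 0\) for all admissible \(\eta\) forces the integrand to vanish (the EL equation) and, at any free endpoint, the bracketed boundary term to vanish (the natural boundary condition).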
EL for Quadratic Kinetic + Potential
\[ L = \tfrac{1}{2}(y')^2 - V(y) \;\Rightarrow\; y'' = -V'(y) \]
Explanation: For L equal to kinetic minus potential (or plus, for energies), EL reduces to Newton-like ODEs. This is the bridge to mechanics and to BVP solvers.
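For the energy-type sign used in the shooting-method example below:
\[
L = \tfrac{1}{2}(y')^2 + \tfrac{k}{2} y^2 \;\Rightarrow\; \frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = k\,y - y'' = 0 \;\Rightarrow\; y'' = k\,y .
\]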
Discrete Euler–Lagrange (Midpoint FD)
\[ \frac{h}{2}\left( L_y^{\,k-\frac12} + L_y^{\,k+\frac12} \right) + L_{y'}^{\,k-\frac12} - L_{y'}^{\,k+\frac12} = 0, \qquad k = 1, \dots, N-1, \]
where superscripts \(k \pm \frac12\) denote evaluation at cell midpoints with slopes \((y_{k+1} - y_k)/h\).
Explanation: Finite-difference approximation of EL on a uniform grid using midpoint quadrature. It is the gradient of the discretized functional set equal to zero.
Stationary HJB (1D)
\[ 0 = \min_u \left[ L(x, u) + V'(x)\, f(x, u) \right] \]
Explanation: The Hamilton–Jacobi–Bellman equation characterizes the value function V in optimal control. It mirrors variational optimality for control inputs rather than functions y(x).
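A minimal worked instance, assuming a standard textbook test case (not from the source) with running cost \(L(x,u) = \tfrac12(x^2 + u^2)\) and dynamics \(f(x,u) = u\):
\[
0 = \min_u \left[ \tfrac{1}{2}(x^2 + u^2) + V'(x)\,u \right] \;\Rightarrow\; u^* = -V'(x) \;\Rightarrow\; 0 = \tfrac{1}{2}x^2 - \tfrac{1}{2}V'(x)^2,
\]
so \(V(x) = \tfrac12 x^2\) (taking the stabilizing branch \(V'(x) = x\)) and the optimal feedback is \(u^*(x) = -x\).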
Complexity Analysis
For the implementations below:
- Gradient descent on the discretized functional: each evaluation of J or its gradient is O(N) in the number of grid cells, so the total cost is O(N · iterations); memory is O(N).
- Shooting with RK4 plus the secant method: each shot integrates the ODE in O(N) steps, and the secant iteration typically converges in a small, roughly constant number of shots, giving O(N) overall.
- Projected gradient descent for the constrained problem: the Laplacian-like gradient step and the rescaling projection are each O(N) per iteration.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

struct Functional {
    function<double(double,double,double)> L;  // L(x, y, p)
    function<double(double,double,double)> Ly; // dL/dy(x, y, p)
    function<double(double,double,double)> Lp; // dL/dp(x, y, p) with p = y'
};

// Compute J[y] ≈ sum h * L(x_{i+1/2}, y_{i+1/2}, p_{i+1/2})
double computeJ(const Functional &F, const vector<double> &x, const vector<double> &y){
    int N = (int)x.size() - 1; // N cells, N+1 nodes
    double J = 0.0;
    for(int i = 0; i < N; ++i){
        double h = x[i+1] - x[i];
        double xmid = 0.5*(x[i] + x[i+1]);
        double ymid = 0.5*(y[i] + y[i+1]);
        double p = (y[i+1] - y[i]) / h; // slope on cell
        J += h * F.L(xmid, ymid, p);
    }
    return J;
}

// Compute gradient dJ/dy_k for interior nodes k = 1..N-1 using midpoint discretization
vector<double> computeGrad(const Functional &F, const vector<double> &x, const vector<double> &y){
    int N = (int)x.size() - 1;
    vector<double> grad(y.size(), 0.0);
    vector<double> Ly(N), Lp(N);
    for(int i = 0; i < N; ++i){
        double h = x[i+1] - x[i];
        double xmid = 0.5*(x[i] + x[i+1]);
        double ymid = 0.5*(y[i] + y[i+1]);
        double p = (y[i+1] - y[i]) / h;
        Ly[i] = F.Ly(xmid, ymid, p);
        Lp[i] = F.Lp(xmid, ymid, p);
    }
    for(int k = 1; k < N; ++k){
        double hk_left = x[k] - x[k-1];
        double hk_right = x[k+1] - x[k];
        // Contribution from cell (k-1, k)
        grad[k] += hk_left * 0.5 * Ly[k-1] + Lp[k-1];
        // Contribution from cell (k, k+1)
        grad[k] += hk_right * 0.5 * Ly[k] - Lp[k];
    }
    // grad[0] and grad[N] are zero because endpoints are fixed (Dirichlet)
    return grad;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: shortest path (arc length) L = sqrt(1 + p^2)
    Functional F;
    F.L  = [](double /*x*/, double /*y*/, double p){ return sqrt(1.0 + p*p); };
    F.Ly = [](double /*x*/, double /*y*/, double /*p*/){ return 0.0; }; // no y-dependence
    F.Lp = [](double /*x*/, double /*y*/, double p){ return p / sqrt(1.0 + p*p); };

    int N = 200; // number of segments
    double a = 0.0, b = 1.0;
    vector<double> x(N+1), y(N+1);
    for(int i = 0; i <= N; ++i){ x[i] = a + (b-a)*i/N; }

    // Boundary conditions y(a)=0, y(b)=1
    y[0] = 0.0; y[N] = 1.0;
    // Initialize with a slightly wiggly curve to show descent
    for(int i = 1; i < N; ++i){
        double t = x[i];
        y[i] = t + 0.1 * sin(2.0 * M_PI * t);
    }

    double J = computeJ(F, x, y);
    cerr << fixed << setprecision(6);
    cerr << "Initial J = " << J << "\n";

    int maxIter = 5000;
    double tol = 1e-8;
    for(int it = 0; it < maxIter; ++it){
        vector<double> g = computeGrad(F, x, y);
        double gnorm2 = 0.0;
        for(int i = 1; i < N; ++i) gnorm2 += g[i]*g[i];
        if(sqrt(gnorm2) < tol){
            cerr << "Converged: gradient norm < tol at iter " << it << "\n";
            break;
        }
        double oldJ = computeJ(F, x, y);
        // Backtracking line search
        double h = (b-a)/N;
        double alpha = 0.2 * h; // scale step with grid size
        vector<double> ynew = y;
        int bt = 0; bool accepted = false;
        while(bt < 20){
            for(int i = 1; i < N; ++i) ynew[i] = y[i] - alpha * g[i];
            double newJ = computeJ(F, x, ynew);
            if(newJ <= oldJ - 1e-6 * alpha * gnorm2){
                y = ynew; J = newJ; accepted = true; break;
            }
            alpha *= 0.5; ++bt;
        }
        if(!accepted){
            cerr << "Line search failed at iter " << it << ", stopping.\n";
            break;
        }
        if(it % 200 == 0) cerr << "iter=" << it << ", J=" << J << ", alpha=" << alpha << "\n";
    }

    // The solution should be close to the straight line y = x
    cout << fixed << setprecision(6);
    cout << "x y\n";
    for(int i = 0; i <= N; ++i){
        cout << x[i] << ' ' << y[i] << '\n';
    }
    cerr << "Final J = " << J << " (straight line has J = sqrt(2) ≈ " << sqrt(2.0) << ")\n";
    return 0;
}
```
We discretize J[y] = ∫ sqrt(1 + (y')²) dx using midpoint quadrature and central differences. The gradient with respect to interior y-values implements a discrete Euler–Lagrange formula. Gradient descent with backtracking line search monotonically reduces the arc length, converging to the straight line that connects the endpoints.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Solve y'' = k * y on [0,1], with y(0)=0, y(1)=1
// (minimizes ∫ (1/2)(y')^2 + (k/2) y^2 dx)

struct State { double y, v; }; // y' = v, v' = k*y

State rk4_step(const State &s, double h, double k){
    auto f = [k](const State &u){ return State{u.v, k*u.y}; };
    State k1 = f(s);
    State k2 = f({s.y + 0.5*h*k1.y, s.v + 0.5*h*k1.v});
    State k3 = f({s.y + 0.5*h*k2.y, s.v + 0.5*h*k2.v});
    State k4 = f({s.y + h*k3.y, s.v + h*k3.v});
    State out;
    out.y = s.y + (h/6.0)*(k1.y + 2*k2.y + 2*k3.y + k4.y);
    out.v = s.v + (h/6.0)*(k1.v + 2*k2.v + 2*k3.v + k4.v);
    return out;
}

// For a given initial slope s = y'(0), integrate to x=1 and return y(1) - target
double shoot(double s, double k, int N, double target=1.0){
    double h = 1.0 / N;
    State st{0.0, s};
    for(int i = 0; i < N; ++i){ st = rk4_step(st, h, k); }
    return st.y - target;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double k = 1.0; // spring-like potential
    int N = 2000;   // steps for RK4 integration

    // Secant method to find s such that y(1)=1
    double s0 = 0.0, s1 = 2.0;
    double f0 = shoot(s0, k, N), f1 = shoot(s1, k, N);

    for(int it = 0; it < 50; ++it){
        double denom = (f1 - f0);
        if (abs(denom) < 1e-14) break; // avoid division by zero
        double s2 = s1 - f1 * (s1 - s0) / denom;
        double f2 = shoot(s2, k, N);
        // Shift window
        s0 = s1; f0 = f1;
        s1 = s2; f1 = f2;
        if (abs(f1) < 1e-10) break;
    }

    // Report results and compare to analytic solution for k=1: y(x) = sinh(x)/sinh(1)
    double s_star = s1;
    cout.setf(std::ios::fixed); cout << setprecision(8);
    cout << "Estimated initial slope s* = " << s_star << "\n";
    if (abs(k - 1.0) < 1e-12){
        double s_true = 1.0 / sinh(1.0);
        cout << "True slope (analytic) = " << s_true << "\n";
    }
    // Output sampled solution using found slope
    double h = 1.0 / N;
    State st{0.0, s_star};
    cout << "x y\n";
    cout << 0.0 << ' ' << st.y << '\n';
    for(int i = 0; i < N; ++i){
        st = rk4_step(st, h, k);
        cout << (i+1)*h << ' ' << st.y << '\n';
    }
    return 0;
}
```
For L = (1/2)(y')² + (k/2)y², the Euler–Lagrange equation is y'' = k y. We impose boundary values y(0)=0, y(1)=1. The shooting method guesses y'(0)=s, integrates the ODE forward with RK4, and uses the secant method to adjust s until the terminal value matches the boundary.
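To confirm the analytic comparison used in the code (for k = 1):
\[
y(x) = \frac{\sinh x}{\sinh 1} \;\Rightarrow\; y'' = \frac{\sinh x}{\sinh 1} = y, \qquad y(0) = 0,\quad y(1) = 1,\quad y'(0) = \frac{1}{\sinh 1} \approx 0.8509 .
\]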
```cpp
#include <bits/stdc++.h>
using namespace std;

// Discretize [0,1] with Dirichlet boundaries y(0)=y(1)=0.
// Minimize E[y] = ∫ 0.5 (y')^2 dx subject to ∫ y^2 dx = C
// by gradient descent + normalization.

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 400; // nodes = N+1, cells = N
    double a = 0.0, b = 1.0;
    double h = (b-a)/N;
    double C = 0.1; // desired \int y^2 dx

    vector<double> y(N+1, 0.0), ynew(N+1, 0.0);
    // Initialize with a small random function (interior only), then project to constraint
    mt19937 rng(123);
    normal_distribution<double> nd(0.0, 1.0);
    for(int i = 1; i < N; ++i){ y[i] = 0.1 * nd(rng); }
    // enforce boundaries
    y[0] = 0.0; y[N] = 0.0;
    auto project = [&](vector<double>& z){
        // scale to satisfy \int y^2 dx = C
        double s = 0.0; for(int i = 1; i < N; ++i) s += z[i]*z[i] * h;
        if (s > 1e-16){ double alpha = sqrt(C / s); for(int i = 1; i < N; ++i) z[i] *= alpha; }
    };
    project(y);

    auto energy = [&](const vector<double>& z){
        double E = 0.0;
        for(int i = 0; i < N; ++i){ double dy = z[i+1]-z[i]; E += 0.5 * (dy*dy) / h; }
        return E;
    };

    double tau = 0.2 * h; // stable step size scaling with grid size
    int maxIter = 20000;
    double prevE = energy(y);

    for(int it = 0; it < maxIter; ++it){
        // Gradient of E at interior nodes: dE/dy_k = (1/h) * (2 y_k - y_{k-1} - y_{k+1})
        for(int i = 1; i < N; ++i){
            double lap = (2.0*y[i] - y[i-1] - y[i+1]);
            ynew[i] = y[i] - tau * (lap / h); // explicit gradient descent step
        }
        ynew[0] = 0.0; ynew[N] = 0.0; // Dirichlet boundaries
        y.swap(ynew);
        project(y); // enforce isoperimetric constraint

        if (it % 500 == 0){
            double E = energy(y);
            cerr << "iter=" << it << ", E=" << E << "\n";
            if (abs(E - prevE) < 1e-12) break;
            prevE = E;
        }
    }

    // Output the resulting function; it should resemble y(x) = A sin(pi x)
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "x y\n";
    for(int i = 0; i <= N; ++i){
        double x = a + i*h;
        cout << x << ' ' << y[i] << '\n';
    }
    return 0;
}
```
We minimize E[y] = ∫ 0.5(y')² dx subject to ∫ y² dx = C by alternating an explicit gradient descent step (which decreases E) and a projection that rescales y to satisfy the integral constraint exactly. The stationary point solves the isoperimetric EL: −y'' + λ y = 0 with y(0)=y(1)=0, whose first nontrivial solution is a scaled sine. The method converges to that mode while meeting the constraint.
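The limiting mode can be written explicitly; with the constraint \(\int_0^1 y^2\,dx = C\):
\[
y(x) = \sqrt{2C}\,\sin(\pi x), \qquad \int_0^1 y^2\,dx = 2C \int_0^1 \sin^2(\pi x)\,dx = C, \qquad E[y] = \frac{\pi^2 C}{2},
\]
which is the smallest energy compatible with the constraint, attained at the first Dirichlet eigenfunction of the second-derivative operator.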