
Calculus of Variations

Key Points

  • Calculus of variations optimizes functionals—numbers produced by whole functions—rather than ordinary functions of numbers.
  • The Euler–Lagrange equation \(\frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) = 0\) gives a necessary condition that optimal functions must satisfy.
  • Boundary conditions matter as much as the differential equation: fixing endpoints vs. leaving them free changes the solution through natural boundary terms.
  • Constraints on integrals (isoperimetric problems) are handled with Lagrange multipliers, adding terms to the Euler–Lagrange equation.
  • In practice, we solve these problems numerically by discretizing the function and doing gradient descent, or by solving the resulting boundary-value ODE with shooting or finite differences.
  • If the Lagrangian does not depend on x explicitly, the Beltrami identity gives a first integral, simplifying calculations.
  • Optimal control generalizes calculus of variations; the Hamilton–Jacobi–Bellman equation plays a role for controls similar to that of Euler–Lagrange for functions.
  • For ML and RL with continuous dynamics, these tools inform trajectory optimization, policy gradients in function spaces, and understanding continuous-time limits.

Prerequisites

  • Single-variable differential calculus — Needed to understand derivatives, Taylor expansions, and slopes y'.
  • Integral calculus — Functionals are integrals over functions; quadrature rules approximate them numerically.
  • Ordinary differential equations (ODEs) — Euler–Lagrange yields ODE boundary-value problems; numerical solvers rely on ODE integration.
  • Linear algebra — Discretizations lead to linear systems and eigenproblems, especially with constraints.
  • Finite differences — To discretize derivatives and build numerical gradients for functionals.
  • Convexity and optimization basics — To interpret necessary vs. sufficient conditions and convergence of gradient descent.
  • C++ programming fundamentals — Required to understand and modify the provided implementations.
  • Runge–Kutta methods — Used in the shooting method to integrate ODEs accurately.
  • Optimal control fundamentals — To connect calculus of variations with HJB and policy optimization in continuous systems.
  • Variational formulations of PDEs — Provides context for using energies to solve boundary value problems and for finite element methods.

Detailed Explanation


01 Overview

Hook: Imagine planning a hiking trail from A to B that is as short as possible but also avoids cliffs and rivers. You are not choosing a single number; you are choosing an entire path y(x) that balances multiple factors everywhere along the trail. Calculus of variations is the mathematics of making the best choice among all such functions.

Concept: Instead of minimizing a scalar function f(x), we minimize a functional J[y] that maps a whole function y(x) to a number. A standard form is \(J[y] = \int_a^b L(x, y(x), y'(x))\,dx\), where L is called the Lagrangian. The necessary condition for an extremum is the Euler–Lagrange equation: \(\partial L/\partial y - d/dx(\partial L/\partial y') = 0\). Boundary conditions (values and/or slopes at the endpoints) complete the problem.

Example: The shortest path in the plane between two fixed points corresponds to \(L(x, y, y') = \sqrt{1 + (y')^2}\). The Euler–Lagrange equation simplifies to y'' = 0, giving a straight line. More complex examples include the brachistochrone (fastest descent under gravity) and geodesics on curved surfaces. When constraints on integrals are present (e.g., fixed area under the curve), Lagrange multipliers extend the method and lead to isoperimetric problems.
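A quick worked sketch for the arc-length example: since \(\partial L/\partial y = 0\) and \(\partial L/\partial y' = y'/\sqrt{1+(y')^2}\), the Euler–Lagrange equation reduces to
\[
\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right) = 0
\;\Rightarrow\;
\frac{y'}{\sqrt{1+(y')^2}} = c
\;\Rightarrow\;
y' = \text{const}
\;\Rightarrow\;
y'' = 0,
\]
so extremals are straight lines.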

02 Intuition & Analogies

Think of a functional as a “scorekeeper” for a whole curve. If you draw a path on paper, a functional like length will trace a ruler along every small piece and add them up. Another functional might measure how wiggly the path is (using its slope), like penalizing sharp turns when designing highways. We want the path with the best overall score.

The Euler–Lagrange equation is like a local fairness rule: take any tiny wiggle you make to the current path; if the path is optimal, that wiggle should not improve the score to first order. Translating that principle of “no beneficial infinitesimal tweaks” into calculus gives the differential equation \(\partial L/\partial y - d/dx(\partial L/\partial y') = 0\). So, optimal functions are those for which every microscopic piece already behaves optimally with its neighbors. Constraints add realism: perhaps your garden hose must encircle a fixed area of grass (integral constraint), or endpoints may slide along rails (free boundaries). Lagrange multipliers act like adjustable prices: the system tunes a multiplier so the constraint is exactly satisfied at optimum.

In computation, we cannot directly manipulate infinite-dimensional objects, so we chop the interval into many small segments (a grid) and turn the functional into a sum. Now we have an ordinary optimization over many variables (the y-values at grid points). Gradient descent on this discretized energy mimics the Euler–Lagrange condition. Alternatively, when the Euler–Lagrange equation forms a boundary-value ODE, we can “shoot” from one end and adjust the initial slope until we hit the other boundary—like aiming a cannon whose shell must land at a precise target.
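To make the “chop the interval into segments” idea concrete before the full solver below, here is a minimal sketch (assuming nothing beyond the standard library) that approximates the arc-length functional of a fixed curve y(x) = x² by a per-cell sum; the exact value is \(\int_0^1 \sqrt{1+4x^2}\,dx \approx 1.478943\).

#include <cstdio>
#include <cmath>

// Sketch: approximate J[y] = ∫_0^1 sqrt(1 + (y')^2) dx for y(x) = x^2
// by sampling y on a uniform grid and summing per-cell costs.
int main(){
    const int N = 1000;        // number of cells
    const double h = 1.0 / N;  // grid spacing
    double J = 0.0;
    for(int i = 0; i < N; ++i){
        double x0 = i * h, x1 = (i + 1) * h;
        double p = (x1*x1 - x0*x0) / h;   // slope of y = x^2 across the cell
        J += h * std::sqrt(1.0 + p * p);  // one quadrature cell
    }
    std::printf("J = %.6f\n", J);         // should print about 1.478943
    return 0;
}

Once y itself becomes the unknown, this same per-cell sum is exactly what the gradient-descent solver later in this section differentiates.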

03 Formal Definition

A functional maps a function to a real number. The prototypical variational problem is: find y in an admissible class of functions that minimizes \(J[y] = \int_a^b L(x, y(x), y'(x))\,dx\), subject to boundary conditions (Dirichlet: y(a), y(b) fixed; Neumann/natural: conditions on y'). A function y is an extremal if the first variation \(\delta J[y; \eta] = 0\) for all smooth perturbations \(\eta\) vanishing at fixed endpoints. Using integration by parts, the vanishing of the first variation yields the Euler–Lagrange (EL) equation \[\frac{\partial L}{\partial y}(x, y, y') - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}(x, y, y')\right) = 0.\] If an endpoint is free (not fixed), a natural boundary condition appears: \(\partial L/\partial y' = 0\) at that endpoint. With an integral constraint \(\int_a^b g(x, y)\,dx = C\), introduce a Lagrange multiplier \(\lambda\) and extremize \(\int_a^b (L + \lambda g)\,dx\), producing \(\partial L/\partial y - d/dx(\partial L/\partial y') + \lambda\,\partial g/\partial y = 0\). Special structure simplifies EL: if L does not depend on x explicitly, the Beltrami identity holds: \(L - y'\,\partial L/\partial y' = \text{const}\). In classical mechanics, with Lagrangian \(L = T - V\), EL becomes Newton’s second law; in optimal control, its counterparts are the Pontryagin Maximum Principle and the Hamilton–Jacobi–Bellman equation.
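Writing out the integration-by-parts step once makes the structure clear:
\[
\delta J[y;\eta] = \int_a^b \left( \frac{\partial L}{\partial y}\,\eta + \frac{\partial L}{\partial y'}\,\eta' \right) dx
= \int_a^b \left( \frac{\partial L}{\partial y} - \frac{d}{dx}\,\frac{\partial L}{\partial y'} \right) \eta\, dx
+ \left[ \frac{\partial L}{\partial y'}\,\eta \right]_a^b .
\]
With fixed endpoints, \(\eta(a)=\eta(b)=0\) removes the boundary term, and the fundamental lemma of the calculus of variations forces the bracketed integrand to vanish, giving EL; at a free endpoint \(\eta\) is unrestricted there, which is exactly where the natural boundary condition \(\partial L/\partial y' = 0\) comes from.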

04 When to Use

  • Path planning and geometry: shortest curves (geodesics), minimal surfaces, spline smoothing; wherever an integral cost over a continuous curve or surface must be minimized.
  • Physics and engineering: deriving equations of motion from least action, optimal shapes of beams (elastic energy), optics via Fermat’s principle (light takes paths of stationary optical length).
  • Optimal control and RL (continuous time): designing control signals u(t) that minimize \(\int L(x(t), u(t))\,dt\) subject to \(\dot{x} = f(x,u)\); the variational viewpoint leads to costates, Hamiltonians, and HJB PDEs. In ML, trajectory optimization and continuous policy gradients in function spaces echo these ideas.
  • Statistics and ML regularization: estimating functions by minimizing \(\int \big(\text{loss} + \lambda\,\text{roughness}\big)\,dx\) (e.g., smoothing splines where the roughness term is \(\int (y'')^2\,dx\)); a short sketch follows this list.
  • Boundary value problems: whenever the solution of a PDE/ODE can be characterized as an energy minimizer, variational methods give both theoretical guarantees (existence/uniqueness) and practical algorithms (finite elements, finite differences).
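As a concrete instance of the regularization bullet above, the following minimal sketch evaluates a discretized roughness-penalized objective on synthetic data (the data, grid, and λ are illustrative assumptions, not a fixed recipe):

#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: discrete smoothing-spline-style objective
//   J[y] = ∫ (y - data)^2 dx + λ ∫ (y'')^2 dx,
// with y'' approximated by second differences. Data here are synthetic.
int main(){
    const int N = 100;
    const double h = 1.0 / N;
    const double lambda = 1e-4;           // assumed roughness weight
    const double PI = std::acos(-1.0);
    std::vector<double> data(N + 1), y(N + 1);
    for(int i = 0; i <= N; ++i){
        double x = i * h;
        data[i] = std::sin(2.0 * PI * x) + 0.05 * std::sin(40.0 * x); // wiggly "observations"
        y[i] = data[i];                                               // candidate function
    }
    double fit = 0.0, rough = 0.0;
    for(int i = 0; i <= N; ++i) fit += (y[i] - data[i]) * (y[i] - data[i]) * h;
    for(int i = 1; i < N; ++i){
        double ypp = (y[i-1] - 2.0*y[i] + y[i+1]) / (h*h); // second difference ≈ y''
        rough += ypp * ypp * h;
    }
    std::printf("fit = %.6f  roughness = %.6f  J = %.6f\n", fit, rough, fit + lambda * rough);
    return 0;
}

Minimizing this J over the y-values (by gradient descent, as in the geodesic example below) trades data fidelity against curvature; the EL equation of the continuous problem is a fourth-order ODE because the Lagrangian involves y''.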

⚠️ Common Mistakes

  • Treating the Euler–Lagrange equation as sufficient. It is only necessary: solutions can be maxima or saddle points. Check second-variation or convexity (e.g., L convex in y' often implies a minimum).
  • Ignoring boundary terms. If endpoints are not fixed, the natural boundary condition \(\partial L/\partial y' = 0\) (or more general transversality) must be enforced. Forgetting this yields incorrect solutions.
  • Mixing partial derivatives and total derivatives. In EL, \(\partial L/\partial y\) and \(\partial L/\partial y'\) are partials treating x, y, y' as independent; then you take a total derivative in x of \(\partial L/\partial y'\). Confusing these leads to algebraic errors.
  • Poor discretization. Using forward differences inconsistently or evaluating L at mismatched points (nodes vs midpoints) can bias gradients or destabilize descent. Central differences and midpoint quadrature are usually better for 1D problems.
  • Step size issues in gradient descent. Too-large steps cause divergence; too-small steps stall. Use line search or scale steps with the grid size (e.g., O(h)).
  • Mishandling constraints. For isoperimetric problems, either introduce a Lagrange multiplier (and solve for it) or project iterates onto the constraint set. Half-implementations drift away from feasibility.
  • Forgetting scaling/units. The Beltrami identity and conservation-like laws assume conditions (e.g., no explicit x-dependence). Applying them blindly when assumptions fail gives wrong constants.

Key Formulas

Functional

\[ J[y] = \int_a^b L\big(x, y(x), y'(x)\big)\,dx \]

Explanation: This defines the cost of a whole function y by integrating a local cost L over the interval. Many variational problems take exactly this form.

Euler–Lagrange Equation

\[ \frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) = 0 \]

Explanation: A necessary condition for y to be an extremum of J. It balances the sensitivity to the function value and to its slope at every point.

Natural Boundary Conditions

\[ \left.\frac{\partial L}{\partial y'}\right|_{x = a \text{ or } x = b} = 0 \quad \text{(at a free endpoint)} \]

Explanation: If an endpoint value is free (not fixed), boundary terms force the derivative of L with respect to y' to vanish at that endpoint.

Beltrami Identity

\[ L - y'\,\frac{\partial L}{\partial y'} = \text{const} \]

Explanation: When L has no explicit x-dependence, this quantity is conserved along extremals. It reduces a second-order problem to a first-order one.
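For instance, applied to the x-independent arc-length Lagrangian, the identity gives
\[
L - y'\,\frac{\partial L}{\partial y'} = \sqrt{1+(y')^2} - \frac{(y')^2}{\sqrt{1+(y')^2}} = \frac{1}{\sqrt{1+(y')^2}} = \text{const},
\]
so y' is constant and the extremal is a straight line, obtained here from a first-order equation rather than from y'' = 0.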

Isoperimetric EL with Multiplier

\[ \frac{\partial L}{\partial y} - \frac{d}{dx}\left(\frac{\partial L}{\partial y'}\right) + \lambda\,\frac{\partial g}{\partial y} = 0 \]

Explanation: For minimizing J[y] with an integral constraint \(\int g(x,y)\,dx = C\), introduce a scalar multiplier \(\lambda\). The augmented EL equation includes λ times the constraint’s y-derivative.

Hamiltonian in Variational Problems

\[ p = \frac{\partial L}{\partial y'}, \qquad H = p\,y' - L \]

Explanation: The conjugate momentum p and Hamiltonian H encode the same variational information as EL. If L has no explicit x, H is constant along extremals.

First Variation

\[ \delta J[y;\eta] = \int_a^b \left( \frac{\partial L}{\partial y}\,\eta + \frac{\partial L}{\partial y'}\,\eta' \right) dx \]

Explanation: The first-order change in J due to a small perturbation \(\eta\). Integration by parts of the \(\eta'\) term yields the EL equation and boundary terms.

EL for Quadratic Kinetic + Potential

\[ L = \tfrac12 (y')^2 - V(y) \;\Rightarrow\; y'' = -V'(y), \qquad L = \tfrac12 (y')^2 + \tfrac{k}{2} y^2 \;\Rightarrow\; y'' = k\,y \]

Explanation: For L equal to kinetic minus potential (or plus for energies), EL reduces to Newton-like ODEs. This is the bridge to mechanics and to BVP solvers.

Discrete Euler–Lagrange (Midpoint FD)

\[ \frac{1}{2}\Big( L_y^{\,k-1/2} + L_y^{\,k+1/2} \Big) - \frac{ L_p^{\,k+1/2} - L_p^{\,k-1/2} }{h} = 0, \qquad k = 1, \dots, N-1 \]

Explanation: Finite-difference approximation of EL on a uniform grid using midpoint quadrature. It is the gradient of the discretized functional equal to zero.

Stationary HJB (1D)

\[ 0 = \min_u \left[ L(x, u) + V'(x)\, f(x, u) \right] \]

Explanation: The Hamilton–Jacobi–Bellman equation characterizes the value function in optimal control. It mirrors variational optimality for control inputs rather than functions y(x).
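A worked one-dimensional instance (an assumed linear-quadratic setup, chosen because it is solvable in closed form): with dynamics \(\dot{x} = u\) and running cost \(L = \tfrac12 u^2 + \tfrac12 q x^2\), the stationary HJB is
\[
0 = \min_u \left[ \tfrac12 u^2 + \tfrac12 q x^2 + V'(x)\,u \right]
\;\Rightarrow\; u^* = -V'(x)
\;\Rightarrow\; \tfrac12 q x^2 - \tfrac12 \big(V'(x)\big)^2 = 0,
\]
whose convex solution is \(V(x) = \tfrac{\sqrt{q}}{2} x^2\) with optimal feedback \(u^*(x) = -\sqrt{q}\,x\).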

Complexity Analysis

Numerical calculus of variations reduces infinite-dimensional problems to large but finite optimization tasks. With N grid points over [a, b] and step h ≈ (b−a)/N, midpoint finite-difference discretizations evaluate L, Ly, and Lp on O(N) cells. A single gradient computation for the discretized functional J[y] (using central differences and midpoint quadrature) therefore costs O(N) time and O(N) space for arrays of y, Ly, and Lp. Gradient descent with T iterations yields O(TN) time and O(N) space. Line search typically adds a constant factor (bounded backtracking trials), preserving linear complexity.

When the Euler–Lagrange equation reduces to a two-point boundary value ODE, shooting transforms it into repeated initial-value integrations. An explicit Runge–Kutta step is O(1), so integrating across the domain with N steps is O(N). Root-finding usually converges in O(log(1/ε)) iterations for tolerance ε, yielding O(N log(1/ε)) total time and O(1) extra space beyond the solution arrays.

For isoperimetric constraints enforced by projection, each iteration performs a gradient step plus a normalization to satisfy ∫ y² dx = C. The gradient step is O(N) using the discrete Laplacian, and the projection (a scalar rescaling using the computed integral) is also O(N). Convergence rates depend on step size (stability requires τ = O(h) for explicit gradient flows) and on spectral gaps of the discrete operator; the first mode typically dominates for convex energies.

In all methods, accuracy improves as the grid is refined. Discretization error for midpoint/central-difference schemes is typically O(h²) in smooth regions, so halving h quarters the error at roughly double the per-pass work. Memory scales linearly with N, making 1D problems very manageable in C++; higher dimensions require sparse solvers and often multigrid or conjugate gradients to keep costs near-linear per iteration.

Code Examples

Generic 1D variational solver via discrete Euler–Lagrange gradient descent (arc-length geodesic)
#include <bits/stdc++.h>
using namespace std;

struct Functional {
    function<double(double,double,double)> L;  // L(x, y, p)
    function<double(double,double,double)> Ly; // dL/dy(x, y, p)
    function<double(double,double,double)> Lp; // dL/dp(x, y, p) with p = y'
};

// Compute J[y] ≈ sum h * L(x_{i+1/2}, y_{i+1/2}, p_{i+1/2})
double computeJ(const Functional &F, const vector<double> &x, const vector<double> &y){
    int N = (int)x.size() - 1; // N cells, N+1 nodes
    double J = 0.0;
    for(int i = 0; i < N; ++i){
        double h = x[i+1] - x[i];
        double xmid = 0.5*(x[i] + x[i+1]);
        double ymid = 0.5*(y[i] + y[i+1]);
        double p = (y[i+1] - y[i]) / h; // slope on cell
        J += h * F.L(xmid, ymid, p);
    }
    return J;
}

// Compute gradient dJ/dy_k for interior nodes k = 1..N-1 using midpoint discretization
vector<double> computeGrad(const Functional &F, const vector<double> &x, const vector<double> &y){
    int N = (int)x.size() - 1;
    vector<double> grad(y.size(), 0.0);
    vector<double> Ly(N), Lp(N);
    for(int i = 0; i < N; ++i){
        double h = x[i+1] - x[i];
        double xmid = 0.5*(x[i] + x[i+1]);
        double ymid = 0.5*(y[i] + y[i+1]);
        double p = (y[i+1] - y[i]) / h;
        Ly[i] = F.Ly(xmid, ymid, p);
        Lp[i] = F.Lp(xmid, ymid, p);
    }
    for(int k = 1; k < N; ++k){
        double hk_left = x[k] - x[k-1];
        double hk_right = x[k+1] - x[k];
        // Contribution from cell (k-1, k)
        grad[k] += hk_left * 0.5 * Ly[k-1] + Lp[k-1];
        // Contribution from cell (k, k+1)
        grad[k] += hk_right * 0.5 * Ly[k] - Lp[k];
    }
    // grad[0] and grad[N] are zero because endpoints are fixed (Dirichlet)
    return grad;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: shortest path (arc length) L = sqrt(1 + p^2)
    Functional F;
    F.L = [](double /*x*/, double /*y*/, double p){ return sqrt(1.0 + p*p); };
    F.Ly = [](double /*x*/, double /*y*/, double /*p*/){ return 0.0; }; // no y-dependence
    F.Lp = [](double /*x*/, double /*y*/, double p){ return p / sqrt(1.0 + p*p); };

    int N = 200; // number of segments
    double a = 0.0, b = 1.0;
    vector<double> x(N+1), y(N+1);
    for(int i = 0; i <= N; ++i){ x[i] = a + (b-a)*i/N; }

    // Boundary conditions y(a)=0, y(b)=1
    y[0] = 0.0; y[N] = 1.0;
    // Initialize with a slightly wiggly curve to show descent
    for(int i = 1; i < N; ++i){
        double t = x[i];
        y[i] = t + 0.1 * sin(2.0 * M_PI * t);
    }

    double J = computeJ(F, x, y);
    cerr << fixed << setprecision(6);
    cerr << "Initial J = " << J << "\n";

    int maxIter = 5000;
    double tol = 1e-8;
    for(int it = 0; it < maxIter; ++it){
        vector<double> g = computeGrad(F, x, y);
        double gnorm2 = 0.0;
        for(int i = 1; i < N; ++i) gnorm2 += g[i]*g[i];
        if(sqrt(gnorm2) < tol){
            cerr << "Converged: gradient norm < tol at iter " << it << "\n";
            break;
        }
        double oldJ = computeJ(F, x, y);
        // Backtracking line search
        double h = (b-a)/N;
        double alpha = 0.2 * h; // scale step with grid size
        vector<double> ynew = y;
        int bt = 0; bool accepted = false;
        while(bt < 20){
            for(int i = 1; i < N; ++i) ynew[i] = y[i] - alpha * g[i];
            double newJ = computeJ(F, x, ynew);
            if(newJ <= oldJ - 1e-6 * alpha * gnorm2){
                y = ynew; J = newJ; accepted = true; break;
            }
            alpha *= 0.5; ++bt;
        }
        if(!accepted){
            cerr << "Line search failed at iter " << it << ", stopping.\n";
            break;
        }
        if(it % 200 == 0) cerr << "iter=" << it << ", J=" << J << ", alpha=" << alpha << "\n";
    }

    // The solution should be close to the straight line y = x
    cout << fixed << setprecision(6);
    cout << "x y\n";
    for(int i = 0; i <= N; ++i){
        cout << x[i] << ' ' << y[i] << '\n';
    }
    cerr << "Final J = " << J << " (straight line has J = sqrt(2) ≈ " << sqrt(2.0) << ")\n";
    return 0;
}

We discretize J[y] = ∫ sqrt(1 + (y')^2) dx using midpoint quadrature and central differences. The gradient with respect to interior y-values implements a discrete Euler–Lagrange formula. Gradient descent with backtracking line search monotonically reduces the arc length, converging to the straight line that connects the endpoints.

Time: O(N · T) for T gradient descent iterations (each gradient and J evaluation is O(N)). Space: O(N) for storing x, y, and per-cell Ly/Lp arrays.
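To try it (assuming a recent g++ and that the program is saved as geodesic.cpp, a hypothetical file name): compile with g++ -O2 -std=c++17 geodesic.cpp -o geodesic and run ./geodesic > path.txt; the x–y table on stdout can be plotted directly, while convergence diagnostics go to stderr.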
Shooting method with RK4 for an Euler–Lagrange boundary-value problem
#include <bits/stdc++.h>
using namespace std;

// Solve y'' = k * y on [0,1], with y(0)=0, y(1)=1 (minimizes ∫ (1/2)(y')^2 + (k/2) y^2 dx)

struct State { double y, v; }; // y' = v, v' = k*y

State rk4_step(const State &s, double h, double k){
    auto f = [k](const State &u){ return State{u.v, k*u.y}; };
    State k1 = f(s);
    State k2 = f({s.y + 0.5*h*k1.y, s.v + 0.5*h*k1.v});
    State k3 = f({s.y + 0.5*h*k2.y, s.v + 0.5*h*k2.v});
    State k4 = f({s.y + h*k3.y, s.v + h*k3.v});
    State out;
    out.y = s.y + (h/6.0)*(k1.y + 2*k2.y + 2*k3.y + k4.y);
    out.v = s.v + (h/6.0)*(k1.v + 2*k2.v + 2*k3.v + k4.v);
    return out;
}

// For a given initial slope s = y'(0), integrate to x=1 and return y(1) - target
double shoot(double s, double k, int N, double target=1.0){
    double h = 1.0 / N;
    State st{0.0, s};
    for(int i = 0; i < N; ++i){ st = rk4_step(st, h, k); }
    return st.y - target;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double k = 1.0; // spring-like potential
    int N = 2000;   // steps for RK4 integration

    // Secant method to find s such that y(1)=1
    double s0 = 0.0, s1 = 2.0;
    double f0 = shoot(s0, k, N), f1 = shoot(s1, k, N);

    for(int it = 0; it < 50; ++it){
        double denom = (f1 - f0);
        if (abs(denom) < 1e-14) break; // avoid division by zero
        double s2 = s1 - f1 * (s1 - s0) / denom;
        double f2 = shoot(s2, k, N);
        // Shift window
        s0 = s1; f0 = f1;
        s1 = s2; f1 = f2;
        if (abs(f1) < 1e-10) break;
    }

    // Report results and compare to analytic solution for k=1: y(x) = sinh(x)/sinh(1)
    double s_star = s1;
    cout.setf(std::ios::fixed); cout << setprecision(8);
    cout << "Estimated initial slope s* = " << s_star << "\n";
    if (abs(k - 1.0) < 1e-12){
        double s_true = 1.0 / sinh(1.0);
        cout << "True slope (analytic) = " << s_true << "\n";
    }
    // Output sampled solution using found slope
    double h = 1.0 / N;
    State st{0.0, s_star};
    cout << "x y\n";
    cout << 0.0 << ' ' << st.y << '\n';
    for(int i = 0; i < N; ++i){
        st = rk4_step(st, h, k);
        cout << (i+1)*h << ' ' << st.y << '\n';
    }
    return 0;
}

For L = (1/2)(y')^2 + (k/2)y^2, the Euler–Lagrange equation is y'' = k y. We impose boundary values y(0)=0, y(1)=1. The shooting method guesses y'(0)=s, integrates the ODE forward with RK4, and uses the secant method to adjust s until the terminal value matches the boundary.
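For k = 1 the analytic slope is 1/sinh(1) ≈ 0.850918, so the printed estimate should agree to many decimal places; increasing N tightens the RK4 integration error, which shrinks like O(h⁴).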

Time: O(N · I), where N is the number of RK4 steps and I is the number of secant iterations (typically small). Space: O(1) beyond storing the current state; output storage is O(N) if you keep the trajectory.
Isoperimetric constraint via projected gradient flow (minimize ∫ 1/2 (y')^2 dx with ∫ y^2 dx = C)
#include <bits/stdc++.h>
using namespace std;

// Discretize [0,1] with Dirichlet boundaries y(0)=y(1)=0.
// Minimize E[y] = ∫ 0.5 (y')^2 dx subject to ∫ y^2 dx = C by gradient descent + normalization.

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 400; // nodes = N+1, cells = N
    double a = 0.0, b = 1.0;
    double h = (b-a)/N;
    double C = 0.1; // desired \int y^2 dx

    vector<double> y(N+1, 0.0), ynew(N+1, 0.0);
    // Initialize with a small random function (interior only), then project to constraint
    mt19937 rng(123);
    normal_distribution<double> nd(0.0, 1.0);
    for(int i = 1; i < N; ++i){ y[i] = 0.1 * nd(rng); }
    // enforce boundaries
    y[0] = 0.0; y[N] = 0.0;
    auto project = [&](vector<double>& z){
        // scale to satisfy ∫ y^2 dx = C
        double s = 0.0; for(int i = 1; i < N; ++i) s += z[i]*z[i] * h;
        if (s > 1e-16){ double alpha = sqrt(C / s); for(int i = 1; i < N; ++i) z[i] *= alpha; }
    };
    project(y);

    auto energy = [&](const vector<double>& z){
        double E = 0.0;
        for(int i = 0; i < N; ++i){ double dy = z[i+1]-z[i]; E += 0.5 * (dy*dy) / h; }
        return E;
    };

    double tau = 0.2 * h; // stable step size scaling with grid size
    int maxIter = 20000;
    double prevE = energy(y);

    for(int it = 0; it < maxIter; ++it){
        // Gradient of E at interior nodes: dE/dy_k = (1/h) * (2 y_k - y_{k-1} - y_{k+1})
        for(int i = 1; i < N; ++i){
            double lap = (2.0*y[i] - y[i-1] - y[i+1]);
            ynew[i] = y[i] - tau * (lap / h); // explicit gradient descent step
        }
        ynew[0] = 0.0; ynew[N] = 0.0; // Dirichlet boundaries
        y.swap(ynew);
        project(y); // enforce isoperimetric constraint

        if (it % 500 == 0){
            double E = energy(y);
            cerr << "iter=" << it << ", E=" << E << "\n";
            if (abs(E - prevE) < 1e-12) break;
            prevE = E;
        }
    }

    // Output the resulting function; it should resemble y(x) = A sin(pi x)
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "x y\n";
    for(int i = 0; i <= N; ++i){
        double x = a + i*h;
        cout << x << ' ' << y[i] << '\n';
    }
    return 0;
}

We minimize E[y] = ∫ 0.5(y')^2 dx subject to ∫ y^2 dx = C by alternating an explicit gradient descent step (which decreases E) and a projection that rescales y to satisfy the integral constraint exactly. The stationary point solves the isoperimetric EL: −y'' + λ y = 0 with y(0)=y(1)=0, whose first nontrivial solution is a scaled sine. The method converges to that mode while meeting the constraint.

Time: O(N · T) for T iterations (each step computes a discrete Laplacian and a scalar projection). Space: O(N) to store y and ynew.