
Hessian Matrix

Key Points

  • The Hessian matrix collects all second-order partial derivatives of a scalar function and measures local curvature.
  • For twice continuously differentiable functions, the Hessian is symmetric by Clairaut’s theorem.
  • The quadratic term pᵀHp in the second-order Taylor expansion predicts how fast the function bends along direction p.
  • Positive definite Hessians indicate local minima, negative definite indicate local maxima, and indefinite indicate saddle points.
  • In optimization, Newton’s method uses the Hessian to choose curvature-aware steps that often converge very fast near the optimum.
  • Numerically approximating a full n×n Hessian with central differences typically requires O(n²) function evaluations.
  • Eigenvalues of the Hessian describe principal curvatures; signs tell you the type of critical point and magnitudes tell you how sharp it is.
  • Hessian–vector products can be computed without forming H explicitly, enabling scalable second-order methods.

Prerequisites

  • Multivariable calculus (limits, partial derivatives, chain rule) — The Hessian is built from second-order partial derivatives; understanding derivatives and continuity is essential.
  • Linear algebra (vectors, matrices, eigenvalues) — The Hessian is a symmetric matrix; interpreting curvature requires eigenvalues, quadratic forms, and solving linear systems.
  • Numerical analysis and floating-point arithmetic — Finite-difference approximation accuracy and stability depend on step sizes and round-off behavior.
  • Unconstrained optimization basics — Using Hessians in Newton/trust-region methods requires understanding descent directions and line search/trust-region globalization.

Detailed Explanation


01 Overview

The Hessian matrix is a square matrix of all second-order partial derivatives of a real-valued function. If you have a function f that maps points from n-dimensional space to real numbers, the Hessian summarizes how the slope (gradient) changes as you move in different directions. While the gradient tells you the direction of steepest ascent, the Hessian tells you how that direction itself curves—how the function bends. This makes the Hessian central to understanding local shape: whether you are in a valley (minimum), on a hilltop (maximum), or on a saddle. In optimization and machine learning, the Hessian guides powerful algorithms like Newton’s method and quasi-Newton methods that exploit curvature to make faster progress than methods that only use gradients. In numerical analysis, one often approximates Hessians with finite differences when analytic derivatives are not available. The Hessian also appears in physics (stability analysis), computer vision (blob/ridge detection), and statistics (Fisher information approximations). Importantly, when f is smooth enough (C^2), the Hessian is symmetric, enabling efficient linear algebra and meaningful eigenvalue interpretations that connect directly to convexity, stability, and step-size selection.

02 Intuition & Analogies

Imagine hiking on a landscape described by a height function f(x, y). The gradient tells you which way is uphill right now. But the gradient alone cannot tell you whether the terrain is gently sloping or sharply curved. The Hessian fills that gap: it’s like a curvature compass that says how the slope changes as you step in any direction. If you step along some direction p, the term p^T H p tells you how quickly the slope increases or decreases—positive values mean the ground curves upward like a bowl, negative means it curves downward like a dome, and values of different signs in different directions indicate a saddle. Another analogy: think of driving on a road where the steering wheel angle is the gradient. The Hessian measures how much you need to turn the steering wheel more or less as you move—how rapidly the steering demand changes. In finance, consider the price of an option as a function of multiple risks; the Hessian entries quantify how sensitive the sensitivity is—the curvature of price with respect to pairs of inputs. In machine learning, loss functions over parameters form a high-dimensional landscape; the Hessian’s eigenvalues show which directions are flat (small eigenvalues) or stiff (large ones). This helps decide step sizes and whether you might oscillate (negative curvature) if you move too aggressively. In short, while the gradient answers “which way?”, the Hessian answers “how does that way change as you go?”, letting you predict and control motion more precisely.

03 Formal Definition

Let f: Rⁿ → R be twice differentiable. The Hessian matrix H(x) at a point x ∈ Rⁿ is the n × n matrix defined by H_ij(x) = ∂²f(x) / ∂x_i∂x_j. If f ∈ C², then by the Clairaut–Schwarz theorem the mixed partials commute and H(x) is symmetric: H_ij(x) = H_ji(x). The second-order Taylor expansion around x for a small step p is f(x + p) ≈ f(x) + ∇f(x)ᵀp + (1/2) pᵀH(x)p, where ∇f is the gradient. The quadratic form q(p) = pᵀH(x)p quantifies curvature along direction p. Spectral properties of H characterize local shape: if all eigenvalues are positive (H ≻ 0), f is locally strictly convex and x is a strict local minimizer if ∇f(x) = 0; if all are negative (H ≺ 0), it is a strict local maximizer; if signs are mixed, x is a saddle point. In optimization, Newton’s step solves H(x) p = −∇f(x), yielding p = −H(x)⁻¹∇f(x) when H is nonsingular. When f is convex, H(x) is positive semidefinite for all x; strong convexity corresponds to H(x) ⪰ mI for some m > 0.
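A quick worked example (my own, not from the article) makes the definition concrete. Take f(x, y) = x³ + xy + y²; differentiating twice gives

```latex
\nabla f(x,y) = \begin{pmatrix} 3x^2 + y \\ x + 2y \end{pmatrix},
\qquad
H(x,y) = \begin{pmatrix} 6x & 1 \\ 1 & 2 \end{pmatrix}.
```

At the point (1, 0): tr(H) = 8 and det(H) = 11, so both eigenvalues are positive and H is positive definite — f curves upward like a bowl around that point. Note that ∇f(1, 0) = (3, 1) ≠ 0, so (1, 0) is not a critical point; positive definiteness there describes local curvature only, not a minimum.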

04 When to Use

Use the Hessian when you need to understand or exploit curvature. In unconstrained optimization, Newton’s method and its variants leverage H to achieve quadratic convergence near a solution, drastically reducing iterations compared to gradient descent. In machine learning, analyzing Hessian eigenvalues helps diagnose sharp vs. flat minima, choose trust-region sizes, or precondition gradient methods. In statistics and econometrics, the observed Hessian (or its expectation, Fisher information) provides standard errors via curvature of the log-likelihood. In numerical PDEs and physics, Hessians arise in stability analysis: positive definiteness of an energy functional’s Hessian signals stable equilibria. In computer vision, the Hessian of an image intensity function is used to detect ridges and blobs (e.g., via determinant or eigenvalues). Practically, compute or approximate Hessians when: (1) you need fast local convergence near a minimizer, (2) you must classify a critical point, (3) you require uncertainty quantification from curvature, or (4) you want to approximate function behavior by a quadratic model. When n is large, prefer Hessian–vector products (avoiding explicit H) or quasi-Newton updates (e.g., BFGS) that approximate curvature with lower cost.

⚠️Common Mistakes

• Ignoring symmetry: For C² functions, H should be symmetric; numerical approximations can break symmetry due to round-off. Symmetrize by averaging with the transpose.
• Bad finite-difference step sizes: Steps that are too small amplify floating-point noise; too large induce truncation error. A common heuristic for gradients is h_i = √ε · max(1, |x_i|); second-derivative stencils divide by h², which suffers more cancellation, so a larger step such as h_i = ε^(1/4) · max(1, |x_i|) is usually safer.
• Misclassifying critical points: Checking only the determinant in 2D is not enough when det = 0; you must consider eigenvalues or higher-order terms.
• Forgetting domain issues: Hessians assume interior points where the partials exist; at kinks or constraints, second derivatives may not exist or be irrelevant (use subgradients or KKT conditions).
• Solving Newton steps without safeguards: If H is not positive definite, pure Newton can diverge. Use line search, damping, or modify H (e.g., add λI) to ensure descent.
• Mixing units/scales: Highly scaled variables can cause ill-conditioned Hessians and unstable linear solves; rescale features or precondition.
• Assuming convexity from a single Hessian: Positive definiteness at one point does not imply global convexity unless H(x) ⪰ 0 for all x.
• Overlooking Hessian–vector tricks: Forming H explicitly is O(n²) memory; use Hv products with iterative solvers when n is large.

Key Formulas

Hessian Definition

H_ij(x) = ∂²f(x) / ∂x_i∂x_j

Explanation: Each entry is a second-order partial derivative. Arranging these entries yields the Hessian matrix H(x) that captures curvature.

Symmetry (Clairaut–Schwarz)

H_ij(x) = H_ji(x)  if f ∈ C²

Explanation: If mixed partials are continuous, their order does not matter. This makes the Hessian symmetric and enables eigen-decomposition.

Second-Order Taylor Expansion

f(x + p) ≈ f(x) + ∇f(x)ᵀp + (1/2) pᵀH(x)p

Explanation: This quadratic approximation predicts how f changes for small steps p. The Hessian’s quadratic form is the curvature term.

Quadratic Form

pᵀH(x)p = Σ_{i=1}^{n} Σ_{j=1}^{n} H_ij(x) p_i p_j

Explanation: This expands curvature along direction p in coordinates. The sign and magnitude reveal bowl/dome/flatness along p.

Convexity via Hessian

H(x) ⪰ 0 for all x ⟺ f is convex (for C² functions)

Explanation: A twice-differentiable function is convex exactly when its Hessian is positive semidefinite everywhere. Nonnegativity of p⊤Hp for all p encodes this.

Newton Step

p_Newton = −H(x)⁻¹∇f(x)

Explanation: The Newton direction solves H p = -g. When H is positive definite near a minimizer, this step often gives rapid (quadratic) convergence.

Second Derivative (Diagonal) Finite Difference

f_{x_i x_i}(x) ≈ [f(x + h e_i) − 2f(x) + f(x − h e_i)] / h²

Explanation: A central difference approximation for the second derivative along axis i. Accuracy is second order in h.

Mixed Partial Finite Difference

f_{x_i x_j}(x) ≈ [f(x + h e_i + h e_j) − f(x + h e_i − h e_j) − f(x − h e_i + h e_j) + f(x − h e_i − h e_j)] / (4h²)

Explanation: A symmetric stencil to approximate cross derivatives. Requires four function evaluations per (i,j) pair, yielding second-order accuracy.

2×2 Hessian Eigenvalues

λ_{1,2} = [tr(H) ± √(tr(H)² − 4 det(H))] / 2   (n = 2)

Explanation: For 2D, eigenvalues depend on the trace and determinant. Their signs classify the critical point type.

Hessian–Vector Product

H(x)v = lim_{ε→0} [∇f(x + εv) − ∇f(x)] / ε

Explanation: The directional derivative of the gradient equals H times v. This identity enables computing Hv without forming H explicitly.

Complexity Analysis

Computing and storing a full Hessian for a function f: Rⁿ → R costs O(n²) memory. If you approximate H with central finite differences, the diagonal entries need 2 evaluations per coordinate (plus the shared baseline f(x)), and each off-diagonal entry needs 4 evaluations. Counting unique entries of the symmetric matrix, that is 2n + 4·n(n−1)/2 = 2n + 2n(n−1) = 2n² function evaluations asymptotically, hence O(n²) evaluations. Each evaluation’s cost depends on f; if f costs C_f time, the total is O(n² C_f). Building gradients numerically with central differences costs O(n C_f).

In Newton-type optimization, per-iteration work includes: (1) computing g = ∇f(x) (O(n C_f) if numerical), (2) computing H (O(n² C_f) if numerical), and (3) solving the linear system H p = −g. A dense direct solve via Gaussian elimination with partial pivoting costs O(n³) time and O(n²) space. Therefore, for moderate n, the linear solve often dominates; for very expensive f, derivative evaluations may dominate.

When n is large, forming H explicitly is prohibitive; instead, one uses Hessian–vector products Hv (computable in O(C_grad) time, where C_grad is the cost of a gradient evaluation) with iterative solvers like Conjugate Gradient, reducing memory from O(n²) to O(n). Numerical stability considerations also impact practical performance: ill-conditioned Hessians slow linear solves and degrade finite-difference accuracy; regularization (adding λI), scaling, and line search/trust regions help ensure robust convergence.

Code Examples

Numerical Gradient and Hessian (Central Differences) with a Rosenbrock Demo
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <limits>
#include <iomanip>
#include <algorithm>

// Compute central-difference gradient
std::vector<double> numerical_gradient(
    const std::function<double(const std::vector<double>&)>& f,
    const std::vector<double>& x) {
  const double eps = std::sqrt(std::numeric_limits<double>::epsilon());
  int n = (int)x.size();
  std::vector<double> g(n, 0.0);
  std::vector<double> xp = x, xm = x;
  for (int i = 0; i < n; ++i) {
    double hi = eps * std::max(1.0, std::abs(x[i]));
    xp[i] = x[i] + hi;
    xm[i] = x[i] - hi;
    double fp = f(xp);
    double fm = f(xm);
    g[i] = (fp - fm) / (2.0 * hi);
    xp[i] = xm[i] = x[i];
  }
  return g;
}

// Compute central-difference Hessian (symmetric) with O(n^2) evaluations.
// Second-derivative stencils divide by h^2, so use a larger step
// (~eps^(1/4)) than for gradients to avoid catastrophic cancellation.
std::vector<std::vector<double>> numerical_hessian(
    const std::function<double(const std::vector<double>&)>& f,
    const std::vector<double>& x) {
  const double eps = std::pow(std::numeric_limits<double>::epsilon(), 0.25);
  int n = (int)x.size();
  std::vector<std::vector<double>> H(n, std::vector<double>(n, 0.0));
  std::vector<double> xp = x, xm = x, xpp = x, xpm = x, xmp = x, xmm = x;

  double fx = f(x); // baseline value

  // Diagonal second derivatives: f(x+h e_i) - 2 f(x) + f(x-h e_i)
  for (int i = 0; i < n; ++i) {
    double hi = eps * std::max(1.0, std::abs(x[i]));
    xp[i] = x[i] + hi;
    xm[i] = x[i] - hi;
    double fp = f(xp);
    double fm = f(xm);
    H[i][i] = (fp - 2.0 * fx + fm) / (hi * hi);
    xp[i] = xm[i] = x[i];
  }

  // Off-diagonal mixed partials using 4-point stencil (i < j)
  for (int i = 0; i < n; ++i) {
    double hi = eps * std::max(1.0, std::abs(x[i]));
    for (int j = i + 1; j < n; ++j) {
      double hj = eps * std::max(1.0, std::abs(x[j]));
      xpp[i] = x[i] + hi; xpp[j] = x[j] + hj; // x + hi e_i + hj e_j
      xpm[i] = x[i] + hi; xpm[j] = x[j] - hj; // x + hi e_i - hj e_j
      xmp[i] = x[i] - hi; xmp[j] = x[j] + hj; // x - hi e_i + hj e_j
      xmm[i] = x[i] - hi; xmm[j] = x[j] - hj; // x - hi e_i - hj e_j

      double fpp = f(xpp);
      double fpm = f(xpm);
      double fmp = f(xmp);
      double fmm = f(xmm);

      double Hij = (fpp - fpm - fmp + fmm) / (4.0 * hi * hj);
      H[i][j] = H[j][i] = Hij; // enforce symmetry

      // reset perturbed components
      xpp[i] = xpm[i] = xmp[i] = xmm[i] = x[i];
      xpp[j] = xpm[j] = xmp[j] = xmm[j] = x[j];
    }
  }

  return H;
}

// Example: 2D Rosenbrock function (a classic nonconvex test problem)
double rosenbrock(const std::vector<double>& x) {
  double a = 1.0, b = 100.0;
  double x0 = x[0], x1 = x[1];
  return (a - x0) * (a - x0) + b * (x1 - x0 * x0) * (x1 - x0 * x0);
}

int main() {
  std::vector<double> x = {-1.2, 1.0};

  auto g = numerical_gradient(rosenbrock, x);
  auto H = numerical_hessian(rosenbrock, x);

  std::cout << std::fixed << std::setprecision(6);
  std::cout << "x = [" << x[0] << ", " << x[1] << "]\n";
  std::cout << "Gradient: [" << g[0] << ", " << g[1] << "]\n";
  std::cout << "Hessian:\n";
  for (auto& row : H) {
    for (double v : row) std::cout << std::setw(12) << v << ' ';
    std::cout << '\n';
  }
  return 0;
}

This program implements central-difference formulas to approximate the gradient and Hessian of a general f: R^n -> R. It then computes these at a point for the 2D Rosenbrock function, which has a curved valley and a unique minimizer at (1,1). The Hessian reveals strong anisotropic curvature: one eigenvalue is large (stiff direction), the other much smaller (flat direction).

Time: O(n²) function evaluations for the Hessian; O(n) for the gradient
Space: O(n²) to store the Hessian and O(n) for vectors
Classify Critical Points in 2D via Hessian Eigenvalues (Trace–Determinant Test)
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <limits>
#include <string>

// Reuse numerical Hessian for 2D (simplified inline for clarity).
// Second-derivative stencils use a larger step (~eps^(1/4)) than gradients
// to avoid catastrophic cancellation when dividing by h^2.
std::vector<std::vector<double>> hessian2D(
    const std::function<double(const std::vector<double>&)>& f,
    const std::vector<double>& x) {
  const double eps = std::pow(std::numeric_limits<double>::epsilon(), 0.25);
  double h0 = eps * std::max(1.0, std::abs(x[0]));
  double h1 = eps * std::max(1.0, std::abs(x[1]));

  std::vector<std::vector<double>> H(2, std::vector<double>(2, 0.0));
  double fx = f(x);
  // Diagonals
  std::vector<double> p = x, m = x;
  p[0] = x[0] + h0; m[0] = x[0] - h0;
  H[0][0] = (f(p) - 2*fx + f(m)) / (h0*h0);
  p = x; m = x;
  p[1] = x[1] + h1; m[1] = x[1] - h1;
  H[1][1] = (f(p) - 2*fx + f(m)) / (h1*h1);
  // Off-diagonal via 4-point stencil
  std::vector<double> xpp = x, xpm = x, xmp = x, xmm = x;
  xpp[0] = x[0] + h0; xpp[1] = x[1] + h1;
  xpm[0] = x[0] + h0; xpm[1] = x[1] - h1;
  xmp[0] = x[0] - h0; xmp[1] = x[1] + h1;
  xmm[0] = x[0] - h0; xmm[1] = x[1] - h1;
  double Hij = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4*h0*h1);
  H[0][1] = H[1][0] = Hij;
  return H;
}

std::string classify2D(const std::vector<std::vector<double>>& H) {
  // For a symmetric 2x2 matrix, eigenvalue signs follow from trace and determinant
  double a = H[0][0], b = H[0][1], c = H[1][0], d = H[1][1];
  double tr = a + d;
  double det = a*d - b*c;
  if (det > 0 && tr > 0) return "Local minimum (H positive definite)";
  if (det > 0 && tr < 0) return "Local maximum (H negative definite)";
  if (det < 0) return "Saddle point (H indefinite)";
  return "Inconclusive (determinant ~ 0); check higher-order terms.";
}

// Example function: f(x,y) = x^4 + y^4 - 3x^2 - 3y^2
// Has a local maximum at (0,0) and minima away from the origin.
double quartic(const std::vector<double>& x) {
  double X = x[0], Y = x[1];
  return std::pow(X,4) + std::pow(Y,4) - 3*X*X - 3*Y*Y;
}

int main() {
  std::vector<std::vector<double>> points = {
    {0.0, 0.0},  // expect local maximum
    {1.3, 0.0},  // off-origin point (likely moving toward a minimum)
    {1.0, 1.0}   // a symmetric point
  };
  for (auto& x : points) {
    auto H = hessian2D(quartic, x);
    std::cout << "Point (" << x[0] << ", " << x[1] << ") -> "
              << classify2D(H) << "\n";
  }
  return 0;
}

This program computes a 2×2 numerical Hessian and classifies the local behavior using determinant and trace. In 2D, det(H) > 0 with positive trace indicates a local minimum, det(H) > 0 with negative trace indicates a local maximum, and det(H) < 0 indicates a saddle. When det ≈ 0, second-order information may be insufficient and higher-order terms or other diagnostics are needed.

Time: O(1) for the 2D classification once the Hessian is obtained; building the 2D Hessian is O(1) with a constant number of function calls
Space: O(1)
Newton's Method with Backtracking Line Search (Using Numerical Hessian)
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <limits>
#include <algorithm>
#include <stdexcept>
#include <iomanip>

// Dot product and norms
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (size_t i = 0; i < a.size(); ++i) s += a[i]*b[i];
  return s;
}

double norm2(const std::vector<double>& a) { return std::sqrt(dot(a,a)); }

// Central-difference gradient
std::vector<double> numerical_gradient(const std::function<double(const std::vector<double>&)>& f,
                                       const std::vector<double>& x) {
  const double eps = std::sqrt(std::numeric_limits<double>::epsilon());
  int n = (int)x.size();
  std::vector<double> g(n), xp = x, xm = x;
  for (int i = 0; i < n; ++i) {
    double h = eps * std::max(1.0, std::abs(x[i]));
    xp[i] = x[i] + h; xm[i] = x[i] - h;
    g[i] = (f(xp) - f(xm)) / (2*h);
    xp[i] = xm[i] = x[i];
  }
  return g;
}

// Central-difference Hessian (symmetric); a larger step (~eps^(1/4)) avoids
// catastrophic cancellation in the h^2 denominators
std::vector<std::vector<double>> numerical_hessian(const std::function<double(const std::vector<double>&)>& f,
                                                   const std::vector<double>& x) {
  const double eps = std::pow(std::numeric_limits<double>::epsilon(), 0.25);
  int n = (int)x.size();
  std::vector<std::vector<double>> H(n, std::vector<double>(n, 0.0));
  std::vector<double> xp = x, xm = x, xpp = x, xpm = x, xmp = x, xmm = x;
  double fx = f(x);
  // Diagonals
  for (int i = 0; i < n; ++i) {
    double h = eps * std::max(1.0, std::abs(x[i]));
    xp[i] = x[i] + h; xm[i] = x[i] - h;
    H[i][i] = (f(xp) - 2*fx + f(xm)) / (h*h);
    xp[i] = xm[i] = x[i];
  }
  // Off-diagonals
  for (int i = 0; i < n; ++i) {
    double hi = eps * std::max(1.0, std::abs(x[i]));
    for (int j = i+1; j < n; ++j) {
      double hj = eps * std::max(1.0, std::abs(x[j]));
      xpp[i]=x[i]+hi; xpp[j]=x[j]+hj;
      xpm[i]=x[i]+hi; xpm[j]=x[j]-hj;
      xmp[i]=x[i]-hi; xmp[j]=x[j]+hj;
      xmm[i]=x[i]-hi; xmm[j]=x[j]-hj;
      double Hij = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4*hi*hj);
      H[i][j] = H[j][i] = Hij;
      xpp[i]=xpm[i]=xmp[i]=xmm[i]=x[i];
      xpp[j]=xpm[j]=xmp[j]=xmm[j]=x[j];
    }
  }
  return H;
}

// Solve A x = b with Gaussian elimination + partial pivoting (dense, O(n^3))
std::vector<double> solve_linear(std::vector<std::vector<double>> A, std::vector<double> b) {
  int n = (int)A.size();
  for (int i = 0; i < n; ++i) {
    // Pivot
    int piv = i;
    double best = std::abs(A[i][i]);
    for (int r = i+1; r < n; ++r) {
      if (std::abs(A[r][i]) > best) { best = std::abs(A[r][i]); piv = r; }
    }
    if (best == 0.0) throw std::runtime_error("Singular matrix in solve_linear");
    if (piv != i) { std::swap(A[piv], A[i]); std::swap(b[piv], b[i]); }
    // Eliminate
    for (int r = i+1; r < n; ++r) {
      double m = A[r][i] / A[i][i];
      for (int c = i; c < n; ++c) A[r][c] -= m * A[i][c];
      b[r] -= m * b[i];
    }
  }
  // Back substitution
  std::vector<double> x(n);
  for (int i = n-1; i >= 0; --i) {
    double s = b[i];
    for (int c = i+1; c < n; ++c) s -= A[i][c] * x[c];
    x[i] = s / A[i][i];
  }
  return x;
}

// Backtracking line search with Armijo condition (expects p to be a descent direction)
double line_search(const std::function<double(const std::vector<double>&)>& f,
                   const std::vector<double>& x,
                   const std::vector<double>& p,
                   const std::vector<double>& g,
                   double alpha0 = 1.0, double c = 1e-4, double rho = 0.5) {
  double fx = f(x);
  double dg = dot(g, p);
  double alpha = alpha0;
  std::vector<double> xn = x;
  while (true) {
    for (size_t i = 0; i < x.size(); ++i) xn[i] = x[i] + alpha * p[i];
    double fnew = f(xn);
    if (fnew <= fx + c * alpha * dg) break; // Armijo condition
    alpha *= rho;
    if (alpha < 1e-12) break; // safeguard
  }
  return alpha;
}

// Rosenbrock function in 2D
double rosenbrock(const std::vector<double>& x) {
  double a = 1.0, b = 100.0;
  double x0 = x[0], x1 = x[1];
  return (a - x0) * (a - x0) + b * (x1 - x0 * x0) * (x1 - x0 * x0);
}

int main() {
  std::vector<double> x = {-1.2, 1.0}; // initial guess
  const int max_it = 50;
  const double tol = 1e-6;

  std::cout << std::fixed << std::setprecision(6);
  for (int it = 0; it < max_it; ++it) {
    auto g = numerical_gradient(rosenbrock, x);
    double gn = norm2(g);
    std::cout << "it=" << it << ": f=" << rosenbrock(x) << ", ||g||=" << gn << "\n";
    if (gn < tol) break;
    auto H = numerical_hessian(rosenbrock, x);
    // Regularize to avoid a singular H; this small shift does not guarantee
    // positive definiteness in general, so harder problems need larger damping
    for (size_t i = 0; i < H.size(); ++i) H[i][i] += 1e-6;
    // Solve H p = -g
    std::vector<double> rhs = g;
    for (double &v : rhs) v = -v;
    std::vector<double> p = solve_linear(H, rhs);
    // Line search for globalization
    double a = line_search(rosenbrock, x, p, g);
    for (size_t i = 0; i < x.size(); ++i) x[i] += a * p[i];
  }
  std::cout << "Solution approx: (" << x[0] << ", " << x[1] << ")\n";
  return 0;
}

This code implements a basic Newton optimizer with backtracking line search using numerical gradient and Hessian. A tiny diagonal damping stabilizes the linear solve when the Hessian is nearly singular or indefinite. Starting from a poor guess on Rosenbrock, it converges near (1,1). For higher dimensions or expensive functions, consider Hessian–vector products and iterative solvers instead of forming H.

Time: Per iteration: O(n²) function evaluations to build H (if numerical), plus O(n³) for the dense linear solve
Space: O(n²) for the Hessian and O(n) for vectors
Tags: hessian matrix, second derivatives, curvature, newton method, finite differences, eigenvalues, convexity, taylor expansion, line search, saddle point, rosenbrock, positive definite, trace determinant, hessian vector product