Math · Intermediate

Jacobian Matrix

Key Points

  • The Jacobian matrix collects all first-order partial derivatives of a vector-valued function, describing how small input changes linearly affect each output component.
  • It generalizes the gradient (scalar output) and is the multivariable version of the derivative for functions from \(\mathbb{R}^n\) to \(\mathbb{R}^m\).
  • Near a point, the function is approximately linear: \(f(x + \Delta x) \approx f(x) + J(x)\,\Delta x\), where \(J(x)\) is the Jacobian at \(x\).
  • The determinant of a square Jacobian measures local volume scaling; in 2D it scales area, and in 3D it scales volume.
  • The chain rule for Jacobians is matrix multiplication: the Jacobian of a composition equals the product of Jacobians evaluated at the right places.
  • In practice, Jacobians are computed analytically, by automatic differentiation, or numerically using finite differences, each with different accuracy and cost trade-offs.
  • Newton’s method for solving nonlinear systems uses the Jacobian to update guesses by solving linear systems at each iteration.
  • For change of variables in integrals, you must multiply by the absolute value of the Jacobian determinant to account for area/volume distortion.

Prerequisites

  • Single-variable differentiation — Understanding rates of change and tangent lines generalizes to partial derivatives and linearization.
  • Multivariable functions and partial derivatives — Jacobian entries are partial derivatives; the concept relies on differentiability in several variables.
  • Basic linear algebra (matrices and vectors) — The Jacobian is a matrix; interpreting it as a linear map requires matrix operations.
  • Matrix norms and determinants — Sensitivity bounds use norms; change of variables uses the determinant.
  • Numerical methods and floating-point arithmetic — Finite-difference Jacobians and Newton’s method depend on stable numerical approximations.
  • Gaussian elimination / linear solvers — Newton’s step requires solving linear systems involving the Jacobian.
  • Chain rule (single-variable) — Extends to the matrix-form chain rule for composed multivariable functions.
  • Optimization basics — Gradients/Jacobians drive first-order methods and constraint handling.

Detailed Explanation


01 Overview

Hook → Imagine nudging the steering wheel of a car slightly and watching how both your direction and speed change. In many systems, a tiny input tweak causes several outputs to shift at once. The Jacobian matrix is the master dial that predicts these simultaneous changes.

Concept → The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function. If a function maps n inputs to m outputs, the Jacobian has m rows and n columns, with entry (i, j) telling you how output i reacts to input j, holding the other inputs fixed. It is the bridge between nonlinear behavior and its best local linear approximation.

Example → For f(x, y) = (x^2 + sin y, x y), the Jacobian is J = [[2x, cos y], [y, x]]. At (x, y) = (1, 0.5), J = [[2, cos 0.5], [0.5, 1]], which predicts how tiny changes in x or y affect each output component.

02 Intuition & Analogies

Hook → Think of a sound mixing board with many sliders (inputs) controlling several speakers (outputs). Move one slider a little; you’ll hear certain speakers get louder or softer. Move a different slider; a different pattern emerges. The Jacobian is the table that records how each slider affects each speaker near the current setting.

Concept → Zoom in very close to a nonlinear function. At this microscopic scale, curves look straight and surfaces look flat. The Jacobian captures this flat, linear picture: it tells you the best-fitting linear transformation that maps small input displacements to output changes. This is why engineers treat complicated systems as linear around an operating point—the Jacobian is the blueprint.

Example → Consider GPS position (latitude, longitude, altitude) to screen coordinates (x, y). Near your current location, the relationship is nearly linear. The Jacobian at that point tells the UI exactly how a meter north or east shifts the pixel position. If the Jacobian determinant is large, a small movement on Earth causes a large motion on screen; if it’s near zero, the mapping squashes movements and becomes insensitive in some directions.

03 Formal Definition

Hook → To formalize “how outputs change with inputs,” we need partial derivatives organized systematically.

Concept → Let \(f: \mathbb{R}^n \to \mathbb{R}^m\) with components \(f_1, \dots, f_m\). The Jacobian of f at \(x \in \mathbb{R}^n\) is the m×n matrix \(J(x)\) whose entries are \(J_{ij}(x) = \frac{\partial f_i}{\partial x_j}(x)\). If f is differentiable at x, then there exists a linear map L such that \(\lim_{\|h\| \to 0} \frac{\|f(x+h) - f(x) - Lh\|}{\|h\|} = 0\), and in coordinates L is exactly multiplication by \(J(x)\).

Example → For \(f(x, y, z) = (xy,\; e^{y+z},\; x + z^2)\), the Jacobian is \(J = \begin{bmatrix} y & x & 0 \\ 0 & e^{y+z} & e^{y+z} \\ 1 & 0 & 2z \end{bmatrix}\). This linear map approximates f near any chosen point (x, y, z).

04 When to Use

Hook → Any time you linearize a nonlinear system, optimize a multivariable function, or transform coordinates, you’re implicitly using the Jacobian.

Concept → Use the Jacobian to: (1) linearize dynamics around an equilibrium (control and robotics), (2) run Newton’s method for solving systems of nonlinear equations, (3) propagate uncertainties and sensitivities in engineering models, (4) perform change of variables in integrals, where |det J| accounts for area/volume scaling, and (5) analyze invertibility via the Inverse Function Theorem (nonzero det J implies a local inverse).

Example → In image warping, a pixel mapping (u, v) → (x(u, v), y(u, v)) changes area by |det J|. Integrating brightness over the transformed image requires multiplying by this factor to preserve total energy.

⚠️Common Mistakes

Hook → The most frequent Jacobian bugs come from shape confusion and forgetting absolute values.

Concept → Common pitfalls include: (1) mixing up rows/columns—remember J is m×n with row i for output i; (2) forgetting the absolute value in |det J| for integrals, which flips sign under orientation changes; (3) using too large or too small finite-difference steps when approximating, causing truncation or roundoff errors; (4) assuming det J ≠ 0 globally from a single point—nonlinear maps can become singular elsewhere; (5) conflating gradient (n×1) with Jacobian (m×n)—for scalar outputs, the gradient is the Jacobian’s transpose; (6) misapplying the chain rule—order matters: \(J_{g \circ f}(x) = J_g(f(x))\,J_f(x)\).

Example → In Newton’s method on a 2×2 system, if J is nearly singular, the step can explode. Regularization or line search can stabilize updates, and checking the condition number helps detect trouble.

Key Formulas

Jacobian Entry

\(J_{ij}(x) = \dfrac{\partial f_i}{\partial x_j}(x)\)

Explanation: Each entry tells how output i changes with respect to input j near x. Collecting all entries forms the m×n Jacobian matrix.

First-Order Linearization

\(f(x + \Delta x) \approx f(x) + J(x)\,\Delta x\)

Explanation: Near x, the function behaves almost like the linear map given by the Jacobian. This approximation is accurate when \(\|\Delta x\|\) is small.

Chain Rule (Jacobian Form)

\(J_{g \circ f}(x) = J_g(f(x))\,J_f(x)\)

Explanation: The derivative of a composed function equals the product of Jacobians, evaluated at the appropriate points. The order matters because matrix multiplication is not commutative.

2D Determinant

\(\det(J) = J_{11}J_{22} - J_{12}J_{21}\) (for a 2×2 Jacobian)

Explanation: In two dimensions, the determinant measures signed area scaling. The absolute value gives the factor by which small areas are stretched or squashed.

Change of Variables

\(\int_{\Omega} g(x)\,dx = \int_{U} g(f(u))\,\lvert \det(Df(u)) \rvert\,du\)

Explanation: When substituting x=f(u) in an integral, you must multiply by the absolute value of the Jacobian determinant to account for area/volume distortion.

Newton’s Method (Multivariate)

\(x_{k+1} = x_k - J(x_k)^{-1} f(x_k)\)

Explanation: Newton’s update solves a linear system using the Jacobian to find a step that drives the function toward zero. In practice, we solve \(J(x_k)\,\Delta x = -f(x_k)\) for \(\Delta x\) rather than forming the inverse.

Forward Difference Approximation

\(\dfrac{\partial f_i}{\partial x_j}(x) \approx \dfrac{f_i(x + h e_j) - f_i(x)}{h}\)

Explanation: A simple way to approximate Jacobian entries numerically. The truncation error is O(h) while roundoff error grows as h becomes very small.

Central Difference Approximation

\(\dfrac{\partial f_i}{\partial x_j}(x) \approx \dfrac{f_i(x + h e_j) - f_i(x - h e_j)}{2h}\)

Explanation: A more accurate symmetric approximation with truncation error \(O(h^2)\). It requires twice as many function evaluations per column.

Operator Norm Bound

\(\|f(x + \Delta x) - f(x)\| \le \|J(x)\|\,\|\Delta x\| + o(\|\Delta x\|)\)

Explanation: The Jacobian’s operator norm bounds how much the function can change for small input perturbations. It formalizes sensitivity near a point.

Cost Estimate

\(C_{\text{Jac}} \approx (n+1)\,C_f\) (forward differences)

Explanation: Computing a dense Jacobian via forward differences requires one baseline evaluation plus one per input dimension. Here, \(C_f\) is the cost of one function evaluation.

Complexity Analysis

Let \(f: \mathbb{R}^n \to \mathbb{R}^m\). If you compute a dense Jacobian numerically by forward differences, you evaluate f once at x and once at \(x + h e_j\) for each j = 1..n. Thus the dominant time cost is approximately (n+1) times the cost of evaluating f, plus O(mn) arithmetic for differencing and storage. In big-O form, time is \(O(n \cdot T_f + mn)\), where \(T_f\) is the time for one evaluation of f. Memory is O(mn) to store the Jacobian, plus O(max(m, n)) for temporary vectors. Using central differences doubles the evaluations to about 2n, but improves accuracy from O(h) to \(O(h^2)\).

Automatic differentiation (AD) changes the scaling: forward-mode AD produces one Jacobian column per sweep (≈ n sweeps total), while reverse-mode AD produces one Jacobian row — one gradient — per sweep (≈ m sweeps total). Therefore, for wide Jacobians (m ≪ n), reverse mode may be preferable since only a few rows are needed; for tall Jacobians (n ≪ m), forward mode is efficient because only a few columns are needed.

When using the Jacobian inside Newton’s method for solving f(x) = 0 with n = m, each iteration requires assembling \(J(x_k)\) and solving a dense linear system \(J \Delta x = -f(x_k)\). A naive dense solve via Gaussian elimination costs \(O(n^3)\) time and \(O(n^2)\) memory, so the per-iteration cost is \(O(n \cdot T_f + n^3)\). For sparse problems, exploiting sparsity can reduce both assembly and solve time dramatically (e.g., to near-linear time in the number of nonzeros with iterative solvers).

Numerical stability depends on the conditioning of J. If J is ill-conditioned, small errors in Jacobian entries or in f propagate and can lead to large step errors; pivoting and regularization (e.g., damping) help mitigate instability.

Code Examples

Compute a dense Jacobian with forward finite differences for f: R^n → R^m
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <iomanip>
#include <algorithm>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// Pretty-print a matrix
void printMatrix(const Mat& A) {
    std::cout << std::fixed << std::setprecision(6);
    for (const auto& row : A) {
        for (double v : row) {
            std::cout << std::setw(12) << v << " ";
        }
        std::cout << "\n";
    }
}

// Compute Jacobian via forward differences.
// f: R^n -> R^m provided as a function mapping vector -> vector
Mat jacobian_forward_diff(const std::function<Vec(const Vec&)>& f,
                          const Vec& x,
                          double rel_step = 1e-6) {
    const size_t n = x.size();
    Vec f0 = f(x); // baseline evaluation
    const size_t m = f0.size();

    Mat J(m, std::vector<double>(n, 0.0));

    // For each input dimension, perturb and compute one column
    for (size_t j = 0; j < n; ++j) {
        double h = rel_step * std::max(1.0, std::abs(x[j]));
        Vec xh = x;
        xh[j] += h;
        Vec fh = f(xh);
        for (size_t i = 0; i < m; ++i) {
            J[i][j] = (fh[i] - f0[i]) / h; // forward difference quotient
        }
    }
    return J;
}

// Example function: f(x, y) = ( x^2 + sin y, x*y )
Vec F_example(const Vec& x) {
    double X = x[0];
    double Y = x[1];
    return Vec{ X*X + std::sin(Y), X*Y };
}

int main() {
    // Point where we compute the Jacobian
    Vec x = {1.0, 0.5};

    Mat J = jacobian_forward_diff(F_example, x, 1e-6);

    std::cout << "Jacobian J at x = (" << x[0] << ", " << x[1] << ")\n";
    printMatrix(J);

    // Compare with analytic Jacobian: [[2x, cos y],[y, x]] at (1, 0.5)
    std::cout << "\nAnalytic J should be approximately:\n";
    Mat J_analytic = {
        {2.0 * x[0], std::cos(x[1])},
        {x[1],       x[0]}
    };
    printMatrix(J_analytic);

    return 0;
}

This program approximates the Jacobian of a user-defined vector function using forward finite differences. It reuses a single baseline evaluation and perturbs one coordinate at a time to build each column of the Jacobian. The example function f(x,y)=(x^2+sin y, x y) has a simple analytic Jacobian to verify accuracy.

Time: O(n * cost(f) + m n)
Space: O(m n)
Newton’s method in R^n using a numerically computed Jacobian and Gaussian elimination
#include <iostream>
#include <vector>
#include <functional>
#include <cmath>
#include <limits>
#include <iomanip>
#include <algorithm>
#include <stdexcept> // needed for std::runtime_error

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

Vec F_system(const Vec& x) {
    // Example system in R^2:
    //   f1(x,y) = x^2 + y^2 - 1   (unit circle)
    //   f2(x,y) = x - y           (line x = y)
    // Solutions: (±sqrt(1/2), ±sqrt(1/2))
    double X = x[0];
    double Y = x[1];
    return Vec{ X*X + Y*Y - 1.0, X - Y };
}

Mat jacobian_forward_diff(const std::function<Vec(const Vec&)>& f,
                          const Vec& x,
                          double rel_step = 1e-6) {
    const size_t n = x.size();
    Vec f0 = f(x);
    const size_t m = f0.size();
    Mat J(m, std::vector<double>(n, 0.0));
    for (size_t j = 0; j < n; ++j) {
        double h = rel_step * std::max(1.0, std::abs(x[j]));
        Vec xh = x;
        xh[j] += h;
        Vec fh = f(xh);
        for (size_t i = 0; i < m; ++i) {
            J[i][j] = (fh[i] - f0[i]) / h;
        }
    }
    return J;
}

// Solve A x = b via Gaussian elimination with partial pivoting (dense, square A)
Vec solve_linear_system(Mat A, Vec b) {
    const size_t n = A.size();
    for (size_t k = 0; k < n; ++k) {
        // Pivot
        size_t piv = k;
        double best = std::abs(A[k][k]);
        for (size_t i = k+1; i < n; ++i) {
            if (std::abs(A[i][k]) > best) { best = std::abs(A[i][k]); piv = i; }
        }
        if (best < 1e-15) {
            throw std::runtime_error("Singular or ill-conditioned Jacobian encountered.");
        }
        if (piv != k) {
            std::swap(A[piv], A[k]);
            std::swap(b[piv], b[k]);
        }
        // Eliminate
        for (size_t i = k+1; i < n; ++i) {
            double factor = A[i][k] / A[k][k];
            for (size_t j = k; j < n; ++j) {
                A[i][j] -= factor * A[k][j];
            }
            b[i] -= factor * b[k];
        }
    }
    // Back substitution
    Vec x(n, 0.0);
    for (int i = int(n) - 1; i >= 0; --i) {
        double sum = b[i];
        for (size_t j = i+1; j < n; ++j) {
            sum -= A[i][j] * x[j];
        }
        x[i] = sum / A[i][i];
    }
    return x;
}

// Newton solver for f(x)=0 in R^n
Vec newton_solve(const std::function<Vec(const Vec&)>& f,
                 Vec x0,
                 int max_iters = 50,
                 double tol = 1e-10,
                 double rel_step = 1e-6) {
    for (int k = 0; k < max_iters; ++k) {
        Vec fx = f(x0);
        // Check residual norm (infinity norm)
        double nf = 0.0; for (double v : fx) nf = std::max(nf, std::abs(v));
        if (nf < tol) break;

        Mat J = jacobian_forward_diff(f, x0, rel_step);
        // Solve J * dx = -f(x)
        for (double &v : fx) v = -v;
        Vec dx = solve_linear_system(J, fx);

        // Update and check step size
        double ndx = 0.0; for (double v : dx) ndx = std::max(ndx, std::abs(v));
        for (size_t i = 0; i < x0.size(); ++i) x0[i] += dx[i];
        if (ndx < tol) break;
    }
    return x0;
}

int main() {
    Vec x0 = {0.7, 0.5}; // initial guess near (sqrt(1/2), sqrt(1/2))
    try {
        Vec root = newton_solve(F_system, x0);
        std::cout << std::setprecision(12) << "Root ≈ (" << root[0] << ", " << root[1] << ")\n";
        Vec fval = F_system(root);
        std::cout << "Residuals f(root) ≈ (" << fval[0] << ", " << fval[1] << ")\n";
    } catch (const std::exception& e) {
        std::cerr << "Newton failed: " << e.what() << "\n";
    }
    return 0;
}

This program solves a 2D nonlinear system with Newton’s method. Each iteration builds the Jacobian via forward differences, solves a linear system with partial pivoting, and updates the iterate. The chosen system intersects a circle and a line; the solver converges to a point where both equations vanish.

Time: O(k * (n * cost(f) + n^3)) for k iterations
Space: O(n^2)
Change of variables: estimating the area of the unit disk via |det J| in polar coordinates
#include <iostream>
#include <cmath>
#include <iomanip>

// Polar to Cartesian transform: (r, theta) -> (x, y) = (r cos theta, r sin theta)
// Jacobian J = [ [cosθ, -r sinθ], [sinθ, r cosθ] ], so det J = r

int main() {
    const double PI = std::acos(-1.0); // portable; M_PI is not standard C++

    const int Nr = 400; // radial subdivisions
    const int Nt = 800; // angular subdivisions
    const double r_min = 0.0, r_max = 1.0;
    const double t_min = 0.0, t_max = 2.0 * PI;

    const double dr = (r_max - r_min) / Nr;
    const double dt = (t_max - t_min) / Nt;

    // Midpoint Riemann sum of |det J| over [0,1] x [0, 2π]
    double area_est = 0.0;
    for (int i = 0; i < Nr; ++i) {
        double r_mid = r_min + (i + 0.5) * dr;
        for (int j = 0; j < Nt; ++j) {
            // theta midpoint (not needed for det J = r, but shown for completeness)
            double t_mid = t_min + (j + 0.5) * dt;
            (void)t_mid; // suppress unused variable warning
            double detJ = r_mid; // |det J| = r for the polar transform
            area_est += detJ * dr * dt;
        }
    }

    std::cout << std::fixed << std::setprecision(8);
    std::cout << "Estimated area of unit disk: " << area_est << "\n";
    std::cout << "Pi ≈ " << PI << ", absolute error = " << std::abs(area_est - PI) << "\n";
    return 0;
}

This program numerically integrates |det J| for the polar-to-Cartesian mapping over r∈[0,1], θ∈[0,2π]. Since det J = r, the integral equals π, the area of the unit disk. The midpoint rule converges quickly as subdivisions increase.

Time: O(Nr * Nt)
Space: O(1)
#jacobian-matrix #partial-derivatives #multivariable-calculus #linearization #newton-method #change-of-variables #determinant #finite-differences #automatic-differentiation #sensitivity-analysis #chain-rule #gradient #hessian #condition-number