Derivatives & Differentiation Rules
Key Points
- Derivatives measure how fast a function changes, and the product, quotient, and chain rules let us differentiate complex expressions efficiently.
- The product rule handles products of functions: (fg)' = f'g + fg'.
- The quotient rule handles division: (f/g)' = (f'g − fg')/g^2, combining the derivatives of the numerator and denominator and dividing by the squared denominator.
- The chain rule handles compositions: it multiplies the outer derivative (evaluated at the inner function) by the inner derivative.
- In C++ you can compute derivatives symbolically (exact formulas), numerically (finite-difference approximations), or automatically (dual numbers).
- Common mistakes include forgetting the inner derivative in the chain rule and mixing up the product and power rules.
- Numerical differentiation needs a well-chosen step size to balance truncation and rounding errors.
- Symbolic and automatic differentiation have different trade-offs: symbolic gives exact formulas, while automatic gives fast, accurate values at specific points.
Prerequisites
- Limits and Continuity: the derivative is defined as a limit, and the rules rely on limit laws.
- Algebraic Manipulation: applying the product, quotient, and chain rules requires careful algebra and simplification.
- Trigonometry Basics: knowing the derivatives of sin, cos, and tan and their domains is essential for many examples.
- Exponential and Logarithmic Functions: common derivative rules involve e^x and ln x, and compositions with them.
- C++ Basics (functions, structs, operator overloading): implementing dual numbers, expression trees, and numerical methods uses these language features.
- Floating-Point Arithmetic: numerical differentiation accuracy depends on machine precision and rounding behavior.
- Trees and Recursion: symbolic differentiation traverses expression trees recursively to build derivative expressions.
Detailed Explanation
01 Overview
Derivatives capture the idea of instantaneous rate of change. If you think of a car’s speedometer, it reports how quickly your position is changing with respect to time—this is a derivative in action. In single-variable calculus, the derivative of a function f at a point x tells us the slope of the tangent line to the graph of f at that point. While the foundational definition uses limits, working with complex functions requires practical rules. The product rule tells us how to differentiate a product of two functions; the quotient rule handles division; and the chain rule deals with compositions—functions plugged into other functions. These three rules, together with simpler ones like the sum rule and power rule, form a toolkit that lets you differentiate most expressions encountered in practice. In computing, we can approximate derivatives numerically, compute them exactly using symbolic manipulation, or evaluate them efficiently at points using automatic differentiation (AD). This lesson explains the intuition behind the key rules, their formal statements, and shows C++ implementations that bring the rules to life: a tiny symbolic differentiator, a dual-number AD engine, and a central-difference numerical derivative with Richardson extrapolation.
02 Intuition & Analogies
Imagine you’re hiking up a hill. The steepness under your feet at a specific spot is the slope, or derivative, of the elevation function at that location. If two effects combine by multiplication, say a gentle slope multiplied by a wavy pattern, the product rule tells you how the combined steepness behaves: you feel the change in the gentle slope times the current waviness, plus the gentle slope times how the waviness itself is changing. If instead you compare two effects by dividing, like signal over noise, the quotient rule accounts for how both the numerator and the denominator are changing, since a growing denominator shrinks the ratio. For the chain rule, think of a conveyor belt feeding a machine: if the belt speeds up (the inner function changes), the machine’s output (the outer function) changes at a rate that depends on both the machine’s sensitivity (the outer derivative) and the belt’s speed (the inner derivative).
In code, these intuitions turn into three computational paradigms. Numerical differentiation samples the function slightly to the left and right of a point and estimates the slope, much like measuring steepness with tiny steps. Symbolic differentiation rewrites expressions algebraically using these rules to produce a new formula for the derivative, like doing the calculus homework by hand but automated. Automatic differentiation, especially forward mode with dual numbers, threads derivative information through each operation: every +, *, sin, or / computes not only the value but also how sensitive that value is to the input. It is like attaching a small ‘speedometer’ to every calculation so the final result arrives with its slope already attached.
03 Formal Definition
04 When to Use
Use the product rule whenever you differentiate a product of two differentiable functions, such as x^2 \sin x or (x+1)e^x. Use the quotient rule when differentiating a ratio, like \frac{\sin x}{x} or \frac{x^2+1}{x-3}, assuming the denominator is nonzero in the region of interest. Apply the chain rule when a function is nested inside another, e.g., \sin(x^2), e^{\cos x}, or \ln(3x+1).
In software, choose symbolic differentiation when you need exact formulas (for code generation, algebraic simplification, or proofs). Choose automatic differentiation (forward mode with dual numbers) when you need derivatives accurate to machine precision at specific points of a function that is available as code, typically a sequence of elementary operations, as is common in optimization and curve fitting; forward mode is most efficient when there are few input variables, as in the single-variable setting here. Choose numerical differentiation when you only have black-box access to function values, symbolic structure is unavailable, or implementing AD is impractical; use central differences with a carefully chosen step and, when possible, Richardson extrapolation to reduce error. On performance: AD typically costs a small constant factor more than evaluating the function itself. Symbolic differentiation gives exact results but can blow up expression size (expression swell). Numerical methods are quick to implement but sensitive to step size and round-off.
⚠️Common Mistakes
- Forgetting the inner derivative in the chain rule: for h(x) = \sin(x^2), h'(x) = \cos(x^2)·2x, not just \cos(x^2). Always multiply by g'(x) when differentiating f(g(x)).
- Confusing the product rule with the power rule: (fg)' ≠ f'g', and (x^n)' = n x^{n-1} applies only to powers of x; for (u(x))^n you also need the chain rule. For f(x) = x·x the product rule and the power rule agree (both give 2x), but that is a special case.
- Misapplying the quotient rule sign: (f/g)' = (f'g − fg')/g^2. Note the minus sign and the order of the numerator terms, and the squared denominator; common errors are swapping the two numerator terms or forgetting g^2.
- Domain issues: (\ln x)' = 1/x requires x > 0, and (1/x)' = −1/x^2 exists only for x ≠ 0. Check domain constraints before differentiating and evaluating.
- Numerical differentiation pitfalls: choosing h too small increases round-off error; choosing it too large increases truncation error. Central differences reduce the truncation error to O(h^2), and Richardson extrapolation can improve accuracy further.
- Symbolic differentiation expression swell: naive application of the rules can create huge expressions. Basic simplification (folding constants, eliminating multiplication by zero or one, dropping additions of zero) keeps them readable and cheaper to evaluate.
- Floating-point assumptions: when comparing numerical and exact derivatives, expect small discrepancies due to rounding. Compare with a tolerance (e.g., 1e-9, or a relative tolerance scaled to the magnitude of the values) rather than exact equality.
Key Formulas
Limit Definition of Derivative
f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}
Explanation: This is the foundational definition of the derivative. It measures the slope of f at a by taking the slope of secant lines and letting the points collapse together.
Sum Rule
(f + g)'(x) = f'(x) + g'(x)
Explanation: The derivative distributes over addition. Differentiate each term separately and add the results.
Constant Multiple Rule
(c\,f)'(x) = c\,f'(x)
Explanation: A constant factor passes through differentiation unchanged. Multiply the derivative by the constant.
Product Rule
(fg)'(x) = f'(x)\,g(x) + f(x)\,g'(x)
Explanation: When differentiating a product, take the derivative of the first times the second, plus the first times the derivative of the second. This follows from the limit definition.
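The remark that the product rule follows from the limit definition can be spelled out by adding and subtracting f(x)g(x+h) in the numerator. The last step uses that g is continuous (it is differentiable), so g(x+h) → g(x) as h → 0.

```latex
\begin{aligned}
f(x+h)\,g(x+h) - f(x)\,g(x)
  &= \bigl[f(x+h) - f(x)\bigr]\,g(x+h) + f(x)\,\bigl[g(x+h) - g(x)\bigr] \\
(fg)'(x)
  &= \lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h}\,g(x+h) + f(x)\,\frac{g(x+h) - g(x)}{h} \right) \\
  &= f'(x)\,g(x) + f(x)\,g'(x)
\end{aligned}
```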
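Quotient Rule
\left(\frac{f}{g}\right)'(x) = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}, \quad g(x) \neq 0
Explanation: Differentiate numerator and denominator, form f'g − fg' in that order, and divide by the square of the denominator. It can also be derived from the product rule by writing f/g = f \cdot g^{-1} and differentiating g^{-1} with the chain rule.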
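The product-rule route to the quotient rule can be written out by treating f/g as f · g^{-1} and differentiating g^{-1} with the chain rule (outer function u ↦ u^{-1}), valid wherever g(x) ≠ 0:

```latex
\begin{aligned}
\bigl(g^{-1}\bigr)'(x) &= -g(x)^{-2}\,g'(x) \\
\left(\frac{f}{g}\right)'(x)
  &= \bigl(f\,g^{-1}\bigr)'(x)
   = f'(x)\,g(x)^{-1} - f(x)\,g(x)^{-2}\,g'(x) \\
  &= \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}
\end{aligned}
```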
Chain Rule
h'(x) = f'(g(x))\,g'(x), \quad h(x) = f(g(x))
Explanation: For a composition h(x)=f(g(x)), multiply the derivative of the outer (evaluated at inner) by the derivative of the inner. This propagates sensitivity through nested functions.
Power Rule
\frac{d}{dx} x^n = n\,x^{n-1}
Explanation: For integer n (and, for x > 0, for any real n), the derivative of x^n is n times x to the power n−1. For (u(x))^n, multiply by u'(x) via the chain rule.
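Exponential Derivatives
\frac{d}{dx} e^x = e^x, \quad \frac{d}{dx} a^x = a^x \ln a \ (a > 0), \quad \frac{d}{dx} e^{u(x)} = e^{u(x)}\,u'(x)
Explanation: The derivative of the natural exponential is itself. For a general base a > 0, multiply by \ln a. For e^{u(x)}, also multiply by u'(x) via the chain rule.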
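Logarithm Derivative
\frac{d}{dx} \ln x = \frac{1}{x} \ (x > 0), \quad \frac{d}{dx} \ln u(x) = \frac{u'(x)}{u(x)} \ (u(x) > 0)
Explanation: The derivative of the natural logarithm is the reciprocal of x (for x > 0). For \ln(u(x)), the chain rule gives u'(x)/u(x).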
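Trigonometric Derivatives
\frac{d}{dx}\sin x = \cos x, \quad \frac{d}{dx}\cos x = -\sin x, \quad \frac{d}{dx}\tan x = \sec^2 x
Explanation: Basic trig derivatives; \tan x is differentiable wherever \cos x \neq 0. For compositions like \sin(u(x)), apply the chain rule: \cos(u(x))\,u'(x).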
Generalized Power Rule
\frac{d}{dx}\left[u(x)\right]^n = n\left[u(x)\right]^{n-1} u'(x)
Explanation: This is the power rule combined with the chain rule. It applies when the base is a differentiable function u(x) and n is a constant.
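A short worked example, assuming u(x) = x^2 + 1 and n = 3 for illustration, with expansion of the cube as an independent check:

```latex
\begin{aligned}
\frac{d}{dx}\bigl(x^2 + 1\bigr)^3 &= 3\bigl(x^2 + 1\bigr)^2 \cdot 2x = 6x\bigl(x^2 + 1\bigr)^2 \\
\text{check: } \frac{d}{dx}\bigl(x^6 + 3x^4 + 3x^2 + 1\bigr) &= 6x^5 + 12x^3 + 6x = 6x\bigl(x^4 + 2x^2 + 1\bigr) = 6x\bigl(x^2 + 1\bigr)^2
\end{aligned}
```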
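Central Difference and Richardson
D_c(h) = \frac{f(x+h) - f(x-h)}{2h} = f'(x) + O(h^2), \quad D_R(h) = \frac{4\,D_c(h/2) - D_c(h)}{3} = f'(x) + O(h^4)
Explanation: The central difference D_c(h) has truncation error O(h^2). Richardson extrapolation combines the estimates at h and h/2 to cancel the leading h^2 term, leaving O(h^4) truncation error for sufficiently smooth f.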
Complexity Analysis
Code Examples
#include <iostream>
#include <cmath>

// Simple dual number type: value + derivative*epsilon, with epsilon^2 = 0
struct Dual {
    double val;  // function value
    double der;  // derivative value

    Dual(double v = 0.0, double d = 0.0) : val(v), der(d) {}
};

// Basic operators for Dual
Dual operator+(const Dual& a, const Dual& b) {
    return Dual(a.val + b.val, a.der + b.der);
}
Dual operator-(const Dual& a, const Dual& b) {
    return Dual(a.val - b.val, a.der - b.der);
}
Dual operator*(const Dual& a, const Dual& b) {
    // Product rule: (a*b)' = a'*b + a*b'
    return Dual(a.val * b.val, a.der * b.val + a.val * b.der);
}
Dual operator/(const Dual& a, const Dual& b) {
    // Quotient rule: (a/b)' = (a'*b - a*b') / b^2
    return Dual(a.val / b.val,
                (a.der * b.val - a.val * b.der) / (b.val * b.val));
}

// Operations with scalars (enable expressions like Dual + double)
Dual operator+(const Dual& a, double b) { return Dual(a.val + b, a.der); }
Dual operator+(double a, const Dual& b) { return Dual(a + b.val, b.der); }
Dual operator-(const Dual& a, double b) { return Dual(a.val - b, a.der); }
Dual operator-(double a, const Dual& b) { return Dual(a - b.val, -b.der); }
Dual operator*(const Dual& a, double b) { return Dual(a.val * b, a.der * b); }
Dual operator*(double a, const Dual& b) { return Dual(a * b.val, a * b.der); }
Dual operator/(const Dual& a, double b) { return Dual(a.val / b, a.der / b); }
Dual operator/(double a, const Dual& b) {
    // a / b = a * (1/b), so (a/b)' = -a * b' / b^2
    return Dual(a / b.val, (-a * b.der) / (b.val * b.val));
}

// Elementary functions for Dual (chain rule inside)
Dual sin(const Dual& x) {
    return Dual(std::sin(x.val), std::cos(x.val) * x.der);
}
Dual cos(const Dual& x) {
    return Dual(std::cos(x.val), -std::sin(x.val) * x.der);
}
Dual exp(const Dual& x) {
    double e = std::exp(x.val);
    return Dual(e, e * x.der);
}
Dual log(const Dual& x) {
    return Dual(std::log(x.val), (1.0 / x.val) * x.der);
}
Dual pow(const Dual& x, double n) {
    // (x^n)' = n * x^(n-1) * x'
    double p = std::pow(x.val, n);
    double dp = (n == 0.0 ? 0.0 : n * std::pow(x.val, n - 1.0)) * x.der;
    return Dual(p, dp);
}

// Example function: f(x) = ((x^2 + 3x) * sin(x)) / (x + 1)
Dual f(const Dual& x) {
    Dual num = pow(x, 2.0) + 3.0 * x;  // x^2 + 3x
    Dual s = sin(x);                   // sin(x)
    Dual den = x + 1.0;                // x + 1
    return (num * s) / den;            // product, quotient, and chain rules all apply
}

int main() {
    double x0 = 1.0;
    Dual X(x0, 1.0);  // seed derivative: d/dx x = 1
    Dual Y = f(X);
    std::cout.setf(std::ios::fixed);
    std::cout.precision(10);
    std::cout << "f(" << x0 << ") = " << Y.val << "\n";
    std::cout << "f'(" << x0 << ") = " << Y.der << "\n";
    return 0;
}
This program implements forward-mode automatic differentiation with dual numbers. Each arithmetic and transcendental operation carries both a value and a derivative, so the product, quotient, and chain rules are enforced automatically by operator overloads. Evaluating f at x0 with the seed derivative 1 returns both f(x0) and f'(x0) exactly up to floating-point rounding.
#include <iostream>
#include <memory>
#include <cmath>
#include <string>
#include <sstream>
using namespace std;

// Expression tree for single-variable functions f(x)
struct Expr {
    virtual ~Expr() = default;
    virtual double eval(double x) const = 0;     // evaluate at x
    virtual unique_ptr<Expr> d() const = 0;      // symbolic derivative
    virtual unique_ptr<Expr> clone() const = 0;  // deep copy (children are owned via unique_ptr)
    virtual string str() const = 0;              // pretty print
};

struct Const : Expr {
    double c;
    explicit Const(double c) : c(c) {}
    double eval(double) const override { return c; }
    unique_ptr<Expr> d() const override { return make_unique<Const>(0.0); }
    unique_ptr<Expr> clone() const override { return make_unique<Const>(c); }
    string str() const override {
        ostringstream os; os << c; return os.str();
    }
};

struct Var : Expr {
    double eval(double x) const override { return x; }
    unique_ptr<Expr> d() const override { return make_unique<Const>(1.0); }
    unique_ptr<Expr> clone() const override { return make_unique<Var>(); }
    string str() const override { return "x"; }
};

struct Add : Expr {
    unique_ptr<Expr> a, b;
    Add(unique_ptr<Expr> a, unique_ptr<Expr> b) : a(std::move(a)), b(std::move(b)) {}
    double eval(double x) const override { return a->eval(x) + b->eval(x); }
    unique_ptr<Expr> d() const override { return make_unique<Add>(a->d(), b->d()); }
    unique_ptr<Expr> clone() const override { return make_unique<Add>(a->clone(), b->clone()); }
    string str() const override { return "(" + a->str() + "+" + b->str() + ")"; }
};

struct Mul : Expr {
    unique_ptr<Expr> a, b;
    Mul(unique_ptr<Expr> a, unique_ptr<Expr> b) : a(std::move(a)), b(std::move(b)) {}
    double eval(double x) const override { return a->eval(x) * b->eval(x); }
    unique_ptr<Expr> d() const override {
        // Product rule: (ab)' = a'b + ab'
        return make_unique<Add>(
            make_unique<Mul>(a->d(), b->clone()),
            make_unique<Mul>(a->clone(), b->d()));
    }
    unique_ptr<Expr> clone() const override { return make_unique<Mul>(a->clone(), b->clone()); }
    string str() const override { return "(" + a->str() + "*" + b->str() + ")"; }
};

struct Div : Expr {
    unique_ptr<Expr> a, b;
    Div(unique_ptr<Expr> a, unique_ptr<Expr> b) : a(std::move(a)), b(std::move(b)) {}
    double eval(double x) const override { return a->eval(x) / b->eval(x); }
    unique_ptr<Expr> d() const override {
        // Quotient rule: (a/b)' = (a'b - ab')/b^2, with the minus written as (-1)*(ab')
        return make_unique<Div>(
            make_unique<Add>(
                make_unique<Mul>(a->d(), b->clone()),
                make_unique<Mul>(make_unique<Const>(-1.0), make_unique<Mul>(a->clone(), b->d()))),
            make_unique<Mul>(b->clone(), b->clone()));
    }
    unique_ptr<Expr> clone() const override { return make_unique<Div>(a->clone(), b->clone()); }
    string str() const override { return "(" + a->str() + "/" + b->str() + ")"; }
};

struct Sin : Expr {
    unique_ptr<Expr> a;
    explicit Sin(unique_ptr<Expr> a) : a(std::move(a)) {}
    double eval(double x) const override { return std::sin(a->eval(x)); }
    unique_ptr<Expr> d() const override;  // defined after Cos, which it needs
    unique_ptr<Expr> clone() const override { return make_unique<Sin>(a->clone()); }
    string str() const override { return "sin(" + a->str() + ")"; }
};

struct Cos : Expr {
    unique_ptr<Expr> a;
    explicit Cos(unique_ptr<Expr> a) : a(std::move(a)) {}
    double eval(double x) const override { return std::cos(a->eval(x)); }
    unique_ptr<Expr> d() const override {
        // Chain rule: (cos a)' = -sin(a) * a'
        return make_unique<Mul>(make_unique<Const>(-1.0),
                                make_unique<Mul>(make_unique<Sin>(a->clone()), a->d()));
    }
    unique_ptr<Expr> clone() const override { return make_unique<Cos>(a->clone()); }
    string str() const override { return "cos(" + a->str() + ")"; }
};

// Chain rule: (sin a)' = cos(a) * a'
unique_ptr<Expr> Sin::d() const {
    return make_unique<Mul>(make_unique<Cos>(a->clone()), a->d());
}

struct PowConst : Expr {
    unique_ptr<Expr> a; double n;  // a^n with constant exponent n
    PowConst(unique_ptr<Expr> a, double n) : a(std::move(a)), n(n) {}
    double eval(double x) const override { return std::pow(a->eval(x), n); }
    unique_ptr<Expr> d() const override {
        // Generalized power rule: (a^n)' = n * a^(n-1) * a'
        return make_unique<Mul>(
            make_unique<Const>(n),
            make_unique<Mul>(make_unique<PowConst>(a->clone(), n - 1.0), a->d()));
    }
    unique_ptr<Expr> clone() const override { return make_unique<PowConst>(a->clone(), n); }
    string str() const override {
        ostringstream os; os << "(" << a->str() << ")^" << n; return os.str();
    }
};

// Helper constructors
unique_ptr<Expr> c(double v) { return make_unique<Const>(v); }
unique_ptr<Expr> x() { return make_unique<Var>(); }
unique_ptr<Expr> add(unique_ptr<Expr> a, unique_ptr<Expr> b) { return make_unique<Add>(std::move(a), std::move(b)); }
unique_ptr<Expr> mul(unique_ptr<Expr> a, unique_ptr<Expr> b) { return make_unique<Mul>(std::move(a), std::move(b)); }
unique_ptr<Expr> div(unique_ptr<Expr> a, unique_ptr<Expr> b) { return make_unique<Div>(std::move(a), std::move(b)); }
unique_ptr<Expr> sin_(unique_ptr<Expr> a) { return make_unique<Sin>(std::move(a)); }
unique_ptr<Expr> cos_(unique_ptr<Expr> a) { return make_unique<Cos>(std::move(a)); }
unique_ptr<Expr> pow_(unique_ptr<Expr> a, double n) { return make_unique<PowConst>(std::move(a), n); }

int main() {
    // Build f(x) = ((x^2 + 3x) * sin(x)) / (x + 1)
    auto f = div(
        mul(add(pow_(x(), 2.0), mul(c(3.0), x())), sin_(x())),
        add(x(), c(1.0)));

    // Compute the symbolic derivative f'(x)
    auto df = f->d();

    double x0 = 1.0;
    cout.setf(ios::fixed); cout.precision(10);
    cout << "f(x)   = " << f->str() << "\n";
    cout << "f'(x)  = " << df->str() << "\n";
    cout << "f(1.0) = " << f->eval(x0) << "\n";
    cout << "f'(1.0)= " << df->eval(x0) << "\n";
    return 0;
}
This program builds an expression tree for a function and applies the differentiation rules recursively to produce a new expression representing the derivative. The product and quotient rules are encoded in Mul::d and Div::d, and the chain rule in Sin::d, Cos::d, and PowConst::d. Because children are owned through unique_ptr, the nodes use clone() to copy subtrees into the derivative. The program prints f(x) and f'(x) and evaluates both at x = 1. No simplification is performed, so the printed derivative contains terms such as (0*x) and multiplications by 1, and it grows quickly for larger expressions.
#include <algorithm>  // std::max
#include <iostream>
#include <cmath>
#include <functional>
#include <limits>

// Central difference derivative approximation: O(h^2) truncation error
static double central_diff(const std::function<double(double)>& f, double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Choose a step size that balances rounding and truncation errors (heuristic)
static double choose_h(double x) {
    double eps = std::numeric_limits<double>::epsilon();
    double scale = std::max(1.0, std::abs(x));
    return std::cbrt(eps) * scale;  // ~ eps^{1/3}
}

int main() {
    // Target function: f(x) = ((x^2 + 3x) * sin(x)) / (x + 1)
    auto f = [](double x) {
        return ((x * x + 3.0 * x) * std::sin(x)) / (x + 1.0);
    };

    double x0 = 1.0;
    double h = choose_h(x0);

    // Two central-difference estimates with h and h/2
    double D1 = central_diff(f, x0, h);
    double D2 = central_diff(f, x0, h / 2.0);

    // Richardson extrapolation: cancel the leading O(h^2) error term
    double D_rich = (4.0 * D2 - D1) / 3.0;

    std::cout.setf(std::ios::fixed);
    std::cout.precision(12);
    std::cout << "h          = " << h << "\n";
    std::cout << "D_c(h)     = " << D1 << "\n";
    std::cout << "D_c(h/2)   = " << D2 << "\n";
    std::cout << "Richardson = " << D_rich << "\n";
    return 0;
}
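This program approximates f'(x) with a symmetric (central) finite difference and then applies one step of Richardson extrapolation. The central difference has O(h^2) truncation error; the combination (4·D_c(h/2) − D_c(h))/3 cancels that leading term, leaving O(h^4) truncation error, without any symbolic knowledge of f. The step size is chosen heuristically as cbrt(epsilon)·max(1, |x|), which roughly balances truncation and round-off for the plain central difference in double precision. At a step that small, round-off is already comparable to the truncation error, so Richardson's gain here is modest; it pays off most with a larger step (on the order of epsilon^{1/5}·max(1, |x|)), where its O(h^4) truncation term dominates the error.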