Groups
Category
RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.
Automatic differentiation (AD) computes exact derivatives by systematically applying the chain rule to your program, not by symbolic algebra or numerical differences.