Concepts (4)

Groups

RLHF Mathematics

Reinforcement learning from human feedback (RLHF) turns human preferences between two model outputs into a training signal using a probabilistic model of choice, such as the Bradley-Terry model.

#rlhf #bradley-terry #pairwise comparisons
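
As a rough illustration of the "probabilistic model of choice" mentioned above, the sketch below assumes the Bradley-Terry form, in which the probability that one output is preferred depends only on the difference between two scalar scores. The function names and the idea of passing raw reward scores as floats are assumptions made for this example, not part of any particular library.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen output beats the rejected one.

    Assumes each output has already been assigned a scalar reward score.
    """
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed human preference.

    Minimizing this over many labeled pairs is one way to turn
    pairwise comparisons into a training signal.
    """
    return -math.log(preference_probability(reward_chosen, reward_rejected))
```

When the two scores are equal the model is indifferent (probability 0.5), and the loss shrinks as the chosen output's score rises above the rejected one's.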

Automatic Differentiation

Automatic differentiation (AD) computes derivatives that are exact up to floating-point rounding by systematically applying the chain rule to the elementary operations of your program, rather than by symbolic algebra or finite-difference approximation.

#automatic differentiation
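
One way to see the chain rule applied "to your program" is forward-mode AD with dual numbers, sketched below under simplifying assumptions: only addition, multiplication, and sine are supported, and the `Dual` class and `derivative` helper are hypothetical names invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x: Dual) -> Dual:
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x: float) -> float:
    """Evaluate f'(x) by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv
```

For example, `derivative(lambda x: x * x + 3 * x, 2.0)` propagates derivatives through each operation and yields `7.0`, with no symbolic expression built and no step size chosen.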


Matrix Calculus

Convex Optimization