RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.
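As a minimal sketch of how a pairwise preference might become a training signal, the example below assumes the Bradley-Terry model, a common choice in reward modeling that the text above does not name explicitly. The function names and the scalar reward inputs are hypothetical. In practice the rewards would come from a learned reward model scoring each output.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that output A is preferred over output B.

    Assumes a Bradley-Terry model: P(A > B) = sigmoid(r_A - r_B).
    Written in a numerically stable form to avoid overflow in exp().
    """
    margin = reward_a - reward_b
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice.

    Equals -log(sigmoid(r_chosen - r_rejected)), computed as a stable
    softplus of the negated margin.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two outputs to the same prompt,
# where the human annotator preferred the first one.
loss_agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
loss_disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(f"loss when rewards agree with the human:    {loss_agree:.4f}")
print(f"loss when rewards disagree with the human: {loss_disagree:.4f}")
```

The loss is small when the reward scores rank the outputs the way the human did and large when they do not. Minimizing it over many labeled pairs is one way to turn the preferences into a gradient signal.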