RLHF turns human preferences between two model outputs into training signals using a probabilistic choice model.
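As a minimal sketch of how a preference can become a training signal, the snippet below assumes the Bradley-Terry model, a common choice in RLHF that the text above does not name. The function names and the scalar rewards are hypothetical, chosen only for illustration.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(output A is preferred over output B) under an assumed Bradley-Terry model.

    This equals sigmoid(reward_a - reward_b).
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the choice the human actually made."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Equal rewards: the model is indifferent between the two outputs.
p_equal = preference_probability(1.0, 1.0)

# Scoring the chosen output higher gives a smaller loss than scoring it lower.
loss_agrees = preference_loss(2.0, 0.0)
loss_disagrees = preference_loss(0.0, 2.0)
```

Minimizing this loss over many labeled pairs pushes the reward of the preferred output above the reward of the rejected one, which is one way the pairwise comparison becomes a signal a model can be trained on.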
Maximum Likelihood Estimation (MLE) chooses parameters that make the observed data most probable under a chosen model.
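A small illustration of the idea, assuming a Bernoulli (coin-flip) model and made-up observations: the sketch scans candidate values of the parameter and keeps the one under which the data is most probable, then compares it with the known closed-form answer (the sample mean). All names here are hypothetical.

```python
import math
from typing import Sequence


def bernoulli_log_likelihood(p: float, observations: Sequence[int]) -> float:
    """Log-probability of the 0/1 observations if each is 1 with probability p."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in observations)


# Made-up data for illustration: six ones out of eight trials.
observations = [1, 0, 1, 1, 0, 1, 1, 1]

# Grid search over p in (0, 1); the endpoints are excluded to avoid log(0).
candidates = [i / 1000 for i in range(1, 1000)]
p_grid = max(candidates, key=lambda p: bernoulli_log_likelihood(p, observations))

# For a Bernoulli model the maximizer is known in closed form: the sample mean.
p_closed_form = sum(observations) / len(observations)
```

The grid search is used only to make the "most probable parameter" idea concrete. In practice the maximizer is usually found analytically, as in the closed form here, or by gradient-based optimization of the log-likelihood.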