RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.
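As a minimal sketch of how a pairwise preference can become a training signal, the example below assumes the Bradley-Terry model, a common choice in RLHF reward modeling; the text above does not name a specific model. The function names and the scalar rewards are hypothetical, standing in for the scores a reward model would assign to the two outputs.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Probability that the chosen output is preferred, assuming Bradley-Terry.

    This is the logistic sigmoid of the reward difference, computed in a
    form that avoids overflow for large differences.
    """
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice: -log(sigmoid(diff)).

    Written as softplus(-diff) so it stays finite for extreme rewards.
    """
    x = -(reward_chosen - reward_rejected)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Hypothetical reward scores for two outputs to the same prompt.
    print(preference_probability(2.0, 0.5))
    print(preference_loss(2.0, 0.5))
```

Under this assumption, minimizing the loss over many labeled pairs pushes the reward of the preferred output above the reward of the rejected one. Equal rewards give a probability of 0.5, meaning the model predicts no preference.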
Maximum A Posteriori (MAP) estimation chooses the parameter value with the highest posterior probability after seeing data.
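A small worked case may make the definition concrete. The sketch below assumes a coin-flip (Bernoulli) likelihood with a Beta(alpha, beta) prior, chosen only because its posterior mode has a closed form; the function name and default prior values are illustrative, not part of the definition above.

```python
def map_bernoulli(successes: int, trials: int,
                  alpha: float = 2.0, beta: float = 2.0) -> float:
    """MAP estimate of a Bernoulli success probability under a Beta prior.

    The posterior is Beta(alpha + successes, beta + failures), and its mode
    is (a - 1) / (a + b - 2) when both posterior parameters exceed 1.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    post_a = alpha + successes
    post_b = beta + (trials - successes)
    if post_a <= 1 or post_b <= 1:
        raise ValueError("posterior mode is not interior for these parameters")
    return (post_a - 1) / (post_a + post_b - 2)


if __name__ == "__main__":
    # 7 heads in 10 flips with a Beta(2, 2) prior that mildly favors 0.5.
    print(map_bernoulli(7, 10))
    # A uniform Beta(1, 1) prior recovers the maximum likelihood estimate.
    print(map_bernoulli(7, 10, alpha=1.0, beta=1.0))
```

With 7 successes in 10 trials, the Beta(2, 2) prior pulls the estimate to 8/12, about 0.667, slightly toward 0.5. With a uniform prior the MAP estimate equals the maximum likelihood estimate of 0.7, which shows that the prior is what distinguishes the two.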