RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.
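As a minimal sketch of how a pairwise preference can become a training signal, the example below assumes the Bradley-Terry model, a common choice in reward modeling; the text above does not name a specific model. The function names and the scalar reward inputs are hypothetical, chosen only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice under Bradley-Terry.

    Assumed model: P(chosen beats rejected) = sigmoid(reward_chosen - reward_rejected).
    The loss is -log(sigmoid(margin)), written in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)) == max(-m, 0) + log1p(exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that output A is preferred over output B."""
    return math.exp(-preference_loss(reward_a, reward_b))


# Equal rewards mean the model is indifferent between the two outputs.
print(preference_probability(0.0, 0.0))
# Scoring the human-preferred output higher gives a smaller loss than the reverse.
print(preference_loss(2.0, 0.0), preference_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the reward model to score the preferred output above the rejected one. In practice the rewards would come from a learned model rather than being passed in as plain numbers.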
Label smoothing replaces a hard one-hot target with a slightly softened distribution to reduce model overconfidence.
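The softened target can be sketched as follows. This assumes one common formulation, in which a fraction `epsilon` of the probability mass is spread uniformly over all classes, including the true one; the function name and the default `epsilon=0.1` are illustrative assumptions, not taken from the text above.

```python
from typing import List


def smooth_labels(true_class: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Return a smoothed target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class must be a valid class index")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    # Every class receives an equal share of the smoothing mass.
    target = [epsilon / num_classes] * num_classes
    # The true class keeps the remaining (1 - epsilon) on top of its share.
    target[true_class] += 1.0 - epsilon
    return target


# With 4 classes and epsilon = 0.1, the true class gets roughly 0.925
# and each other class roughly 0.025; the values still sum to 1.
print(smooth_labels(0, 4, epsilon=0.1))
```

Because no class receives a target of exactly 0 or 1, a model trained with cross-entropy against these targets is not pushed toward arbitrarily large logits, which is the mechanism behind the reduced overconfidence.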