SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization
Intermediate · Maksim Afanasyev, Illarion Iov · Feb 2 · arXiv
SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.
#SLIME #preference optimization #DPO