SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.
This paper introduces GANPO, a new way to train language models from human preferences that guides the model through its hidden thoughts (latent space) rather than only its visible words (token space).