On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Intermediate · Shumin Wang, Yuexiang Xie et al. · Feb 3 · arXiv
The paper builds a simple, math-light rule for predicting whether a reinforcement fine-tuning update makes a language model more open-minded (higher entropy over its next-token choices) or more sure of itself (lower entropy).
#reinforcement fine-tuning #entropy dynamics #GRPO
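To make "higher vs. lower entropy" concrete, here is a toy sketch, assuming a single categorical next-token distribution and a plain REINFORCE-style update on its logits. It is not the paper's rule: GRPO's group-normalized advantages, clipping, and KL term are left out, and every function name and the example logits are hypothetical. It compares a first-order estimate of the entropy change, using dH/dz_j = -pi_j * (log pi_j + H), against the exact change after one step.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def reinforce_step(logits, k, advantage, lr):
    # One ascent step on advantage * log pi(k); d log pi_k / dz_j = [j == k] - pi_j.
    probs = softmax(logits)
    return [z + lr * advantage * ((1.0 if j == k else 0.0) - p)
            for j, (z, p) in enumerate(zip(logits, probs))]

def predicted_entropy_change(logits, k, advantage, lr):
    # First-order Taylor estimate: dH ~= sum_j (dH/dz_j) * delta_z_j,
    # where dH/dz_j = -pi_j * (log pi_j + H) for a softmax distribution.
    probs = softmax(logits)
    h = entropy(probs)
    grad_h = [-p * (math.log(p) + h) for p in probs]
    delta = [lr * advantage * ((1.0 if j == k else 0.0) - p)
             for j, p in enumerate(probs)]
    return sum(g * d for g, d in zip(grad_h, delta))

def actual_entropy_change(logits, k, advantage, lr):
    # Exact entropy difference after one update step.
    before = entropy(softmax(logits))
    after = entropy(softmax(reinforce_step(logits, k, advantage, lr)))
    return after - before

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical toy next-token logits
    for k in (0, 3):  # most likely token, least likely token
        for adv in (1.0, -1.0):
            pred = predicted_entropy_change(logits, k, adv, lr=0.1)
            act = actual_entropy_change(logits, k, adv, lr=0.1)
            print(f"token={k} adv={adv:+.0f} predicted dH={pred:+.4f} actual dH={act:+.4f}")
```

With these toy logits, rewarding the most likely token lowers entropy and rewarding the least likely token raises it, while a negative advantage reverses both. This illustrates the kind of sign question the summary refers to, for one softmax only.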