Papers2

#exploration vs exploitation

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Shumin Wang, Yuexiang Xie et al.Feb 3arXiv

The paper builds a simple, math-light rule to predict whether training makes a language model more open-minded (higher entropy) or more sure of itself (lower entropy).

#reinforcement fine-tuning#entropy dynamics#GRPO

Not triaged yet

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner

Zanlin Ni, Shenzhi Wang et al.Jan 21arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model#arbitrary order generation#autoregressive training

Not triaged yet