Papers2

#training dynamics

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

#Kalman filter#importance sampling ratio#policy optimization

Not triaged yet

Rethinking Training Dynamics in Scale-wise Autoregressive Generation

Intermediate

Gengze Zhou, Chongjian Ge et al.Dec 6arXiv

This paper fixes two big problems in image-making AI that builds pictures step by step: it often practices with perfect answers (teacher forcing) but must perform using its own imperfect guesses later, and the earliest coarse steps are much harder than the later fine steps.

#visual autoregressive modeling#next-scale prediction#exposure bias

Not triaged yet