Papers3

#variance reduction

SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.

Not triaged yet

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

Not triaged yet

Millions of public AI models exist, but downloads are concentrated on a tiny set of “official” checkpoints, which are not always the best performers.

Not triaged yet