SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.
Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.
Millions of public AI models exist, but downloads are concentrated on a tiny set of "official" checkpoints, which are not always the best performers.