When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.
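A minimal sketch of why this can happen, assuming the common GRPO setup where several reward components are summed into one scalar before group-relative normalization (the function and numbers below are illustrative, not the paper's code):

```python
import numpy as np

# Hedged sketch of GRPO-style group advantages (illustration only).
# Each rollout earns several reward components, e.g. correctness and format.
# If they are summed into one scalar before group normalization, different
# reward mixes with the same sum become indistinguishable learning signals.

def grpo_advantages(scalar_rewards):
    r = np.asarray(scalar_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

rollout_a = {"correctness": 1.0, "format": 0.0}   # right answer, messy format
rollout_b = {"correctness": 0.0, "format": 1.0}   # wrong answer, tidy format

group = [sum(rollout_a.values()), sum(rollout_b.values()), 0.5, 1.5]
print(grpo_advantages(group))
# rollout_a and rollout_b both sum to 1.0, so they receive the exact same
# advantage even though they were rewarded for completely different things.
```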
RelayLLM lets a small model do the talking and asks a big model for help only on a few truly hard tokens.
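As a rough illustration of token-level relaying, here is a hedged sketch that defers a single decoding step to the large model whenever the small model's next-token distribution looks too uncertain; the entropy trigger, the threshold value, and the model interfaces (`next_token_probs`, `next_token`) are assumptions for illustration, not RelayLLM's actual algorithm or API:

```python
import math

ENTROPY_THRESHOLD = 2.5  # hypothetical value; in practice tuned per task

def entropy(probs):
    """Shannon entropy of a {token: probability} distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def relay_decode(prompt_tokens, small_model, large_model, max_tokens=128):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs = small_model.next_token_probs(tokens)  # assumed interface
        if entropy(probs) > ENTROPY_THRESHOLD:
            # The small model is unsure: hand this one step to the big model.
            token = large_model.next_token(tokens)    # assumed interface
        else:
            # Otherwise the small model keeps talking on its own.
            token = max(probs, key=probs.get)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens
```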
SmartSearch teaches search agents to fix their own bad search queries while they are thinking, not just their final answers.
AgentOCR turns an agent’s long text history into pictures so it can remember more using fewer tokens.
This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.
Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.
Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.
MDAgent2 is a special helper built from large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.
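For a sense of the target output, here is a small Python helper that assembles the kind of minimal LAMMPS input script (a standard Lennard-Jones melt in reduced units) such an agent is expected to produce; the helper and its default parameters are illustrative and not taken from the paper:

```python
def lj_melt_script(density=0.8442, temperature=1.44, steps=1000):
    """Return a minimal LAMMPS input for a Lennard-Jones melt (illustrative).

    The commands are standard LAMMPS syntax; the parameter choices mirror the
    classic LJ melt example and are not output from MDAgent2 itself.
    """
    return f"""
units           lj
atom_style      atomic
lattice         fcc {density}
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            1 1.0
velocity        all create {temperature} 87287
pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5
fix             1 all nvt temp {temperature} {temperature} 0.5
timestep        0.005
thermo          100
run             {steps}
""".strip()

if __name__ == "__main__":
    # Write the script to disk so it can be run with a LAMMPS binary,
    # e.g. `lmp -in melt.in`.
    with open("melt.in", "w") as f:
        f.write(lj_melt_script())
```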
Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.
CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.
The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.
FIGR is a new way for AI to ‘think by drawing,’ using code to build clean, editable diagrams while it reasons.
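As a hedged illustration of the "diagram as code" idea (not FIGR's actual drawing interface), a model can emit a few lines of plotting code instead of a raster image, so every box and arrow in its intermediate sketch stays editable:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch, Rectangle

# Illustrative only: two labeled boxes connected by an arrow, built entirely
# from code. The labels and layout are invented for this sketch; FIGR's real
# drawing commands and diagram content are not specified here.
fig, ax = plt.subplots(figsize=(6, 2))
ax.add_patch(Rectangle((0.05, 0.3), 0.3, 0.4, fill=False))
ax.text(0.20, 0.5, "premise", ha="center", va="center")
ax.add_patch(Rectangle((0.65, 0.3), 0.3, 0.4, fill=False))
ax.text(0.80, 0.5, "conclusion", ha="center", va="center")
ax.add_patch(FancyArrowPatch((0.35, 0.5), (0.65, 0.5),
                             arrowstyle="->", mutation_scale=15))
ax.axis("off")
fig.savefig("reasoning_step.png")
```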