Real instructions often carry logic such as 'and', 'first-then', and 'if-else', and this paper teaches models to notice and obey that logic.
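As a toy illustration (the checker below and its example instruction are invented for this digest, not taken from the paper), here is what composed logic in an instruction can look like when spelled out as code:

```python
def is_english(text: str) -> bool:
    # Crude proxy for "written in English": all characters are ASCII.
    return text.isascii()

def check_composed_instruction(response: str) -> bool:
    """Hypothetical instruction: 'Write in English AND start with a title line;
    IF the response exceeds 50 words THEN end with a line starting Summary:'."""
    lines = response.strip().splitlines()
    has_title = bool(lines) and lines[0].startswith("# ")   # first-then: title comes first
    english = is_english(response)                          # and: both conditions must hold
    if len(response.split()) > 50:                          # if-else: branch on length
        return english and has_title and lines[-1].lower().startswith("summary:")
    return english and has_title

print(check_composed_instruction("# Title\nA short English answer."))  # True
```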
RelayLLM lets a small model do the talking and asks a big model for help only on a few truly hard tokens.
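A minimal sketch of the relay idea, assuming a confidence threshold decides which tokens count as "hard"; `small_next`, `big_next`, the toy vocabulary, and the threshold are all hypothetical stand-ins, not RelayLLM's actual interface:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def small_next(context: str) -> dict[str, float]:
    # Stand-in for the small model: a random next-token distribution.
    weights = [random.random() for _ in VOCAB]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}

def big_next(context: str) -> str:
    # Stand-in for the expensive large model: called only when needed.
    return random.choice(VOCAB)

def relay_decode(prompt: str, max_tokens: int = 10, threshold: float = 0.4):
    tokens, big_calls = [], 0
    for _ in range(max_tokens):
        context = prompt + " " + " ".join(tokens)
        dist = small_next(context)
        tok, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob < threshold:  # the small model is unsure: relay this token
            tok, big_calls = big_next(context), big_calls + 1
        tokens.append(tok)
    return " ".join(tokens), big_calls

text, calls = relay_decode("Once upon a time")
print(text, f"(big-model calls: {calls}/10)")
```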
The paper asks which small, add-on training methods (PEFT, parameter-efficient fine-tuning) work best when we teach language models with yes/no rewards we can check (RLVR, reinforcement learning with verifiable rewards).
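For context, a verifiable reward in the RLVR sense can be as simple as an exact-match check; the extraction rule below is illustrative, not the paper's:

```python
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Return 1.0 if the model's last number matches the reference, else 0.0.
    No learned judge is involved: the reward is mechanically checkable."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    predicted = numbers[-1] if numbers else None
    return 1.0 if predicted == reference_answer else 0.0

print(verifiable_reward("... so the total is 42", "42"))  # 1.0
print(verifiable_reward("I am not sure", "42"))           # 0.0
```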
The paper studies why two opposite-sounding tricks in RL for reasoning—adding random (spurious) rewards and reducing randomness (entropy)—can both seem to help large language models think better.
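To make the two tricks concrete (a sketch under simplifying assumptions, not the paper's experimental setup): a spurious reward carries no signal at all, while an entropy penalty actively removes randomness from the policy:

```python
import random
import torch

def spurious_reward(_response: str) -> float:
    # Random 0/1 reward: pure noise, carrying no signal about correctness.
    return float(random.random() < 0.5)

def entropy(probs: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of a next-token distribution (higher = more random).
    return -(probs * probs.clamp_min(1e-12).log()).sum()

probs = torch.softmax(torch.randn(10), dim=0)
beta = 0.01
# A policy loss usually *adds* beta * entropy to encourage exploration;
# flipping the sign (subtracting it) penalizes randomness instead.
print("spurious reward:", spurious_reward("some chain of thought"))
print("entropy term:", (beta * entropy(probs)).item())
```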
This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.
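A tiny sketch of why puzzles can stand in for human labels: the correct answer is known by construction, so the training signal is free. The counting puzzle below is an invented example, not one of the paper's tasks:

```python
import random

def make_counting_puzzle(grid_size: int = 4):
    """Generate a grid of colored cells plus a question whose answer we know."""
    colors = ["red", "blue", "green"]
    grid = [[random.choice(colors) for _ in range(grid_size)] for _ in range(grid_size)]
    answer = sum(cell == "red" for row in grid for cell in row)
    question = "How many red cells are in the grid?"
    return grid, question, answer  # (image-like input, prompt, free label)

grid, q, a = make_counting_puzzle()
print(q, "->", a)
```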
GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.
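A minimal sketch of the checkpoint-merging idea, assuming uniform weight averaging (GTR-Turbo's actual merge rule may differ):

```python
import torch

def merge_checkpoints(state_dicts):
    """Uniformly average parameter tensors across past checkpoints."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Toy usage: three "past checkpoints" of a tiny linear layer become the teacher.
ckpts = [torch.nn.Linear(4, 2).state_dict() for _ in range(3)]
teacher = torch.nn.Linear(4, 2)
teacher.load_state_dict(merge_checkpoints(ckpts))
```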
This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.
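A structural sketch of the multi-round lemma loop as described in the summary; `propose_lemma` and `verify` are hypothetical stand-ins for the model call and the checking step, not Intern-S1-MO's code:

```python
def propose_lemma(problem: str, lemmas: list[str]) -> str:
    # Stand-in for an LLM call conditioned on the problem and proven lemmas.
    return f"lemma_{len(lemmas) + 1} for {problem!r}"

def verify(lemma: str) -> bool:
    # Stand-in for a checker (e.g., a proof verifier or self-check).
    return True

def solve_multi_round(problem: str, rounds: int = 3) -> list[str]:
    lemmas: list[str] = []
    for _ in range(rounds):
        candidate = propose_lemma(problem, lemmas)
        if verify(candidate):
            lemmas.append(candidate)  # remembered for all later rounds
    return lemmas

print(solve_multi_round("Olympiad problem 1"))
```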
SPARK teaches AI to grade its own steps without needing the right answers written down anywhere.