Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.
OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.
This paper shows that giving an AI a safe, tiny virtual computer (a sandbox) lets it solve many kinds of problems better, not just coding ones.
This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.
Big language models can learn new facts through simple tutoring (supervised fine-tuning, or SFT), but that doesn’t automatically teach them how to use those facts well.
MatchTIR teaches AI agents to judge each tool call step-by-step instead of giving the same reward to every step.
Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.
SmartSearch teaches search agents to spot and fix their own bad search queries while they are still reasoning, instead of only grading their final answers.
This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.
MiMo-V2-Flash is a giant but efficient language model that uses a team-of-experts (mixture-of-experts) design to think well while staying fast.
Falcon-H1R is a small (7B) AI model that thinks really well without needing giant computers.
Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.