Reasoning models often talk too much, and the extra words can actually make their answers less accurate.
Phi-4-reasoning-vision-15B is a small, open-weight AI that understands pictures and text together and is especially good at math, science, and using computer screens.
MMR-Life is a new test (benchmark) that checks how well AI understands everyday situations using several real photos at once.
LaSER teaches a fast search model to “think” quietly inside its hidden space, so it gets the benefits of step-by-step reasoning without writing those steps out as text.
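Here is a minimal sketch of the latent-reasoning idea, not LaSER's actual architecture (the names `LatentReasoner`, `latent_steps`, and the GRU-based "think" step are all illustrative assumptions): instead of decoding chain-of-thought text, the model loops its own hidden state through an internal update a few extra times before producing the answer.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Illustrative latent-reasoning wrapper: refine a hidden state
    internally instead of emitting chain-of-thought tokens."""

    def __init__(self, d_model: int = 256, latent_steps: int = 4):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_model)  # stands in for the base model
        self.think = nn.GRUCell(d_model, d_model)   # one "silent" reasoning step
        self.head = nn.Linear(d_model, d_model)     # maps final state to a score
        self.latent_steps = latent_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.encoder(x))         # initial hidden representation
        for _ in range(self.latent_steps):      # iterate in hidden space:
            h = self.think(h, h)                # no text is decoded during these steps
        return self.head(h)                     # answer comes out in one shot

model = LatentReasoner()
scores = model(torch.randn(8, 256))  # batch of 8 query embeddings
```

The key design point the summary describes: all the extra computation happens in the loop over hidden states, so inference gets reasoning-like depth at search-model speed.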
CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.
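To make "solve hard problems step by step" concrete, here is a hypothetical shape for one such synthetic training example; the field names are invented for illustration and are not CHIMERA's actual schema.

```python
# One hypothetical synthetic example: a problem, explicit steps, and an answer.
example = {
    "problem": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "steps": [
        "Average speed is distance divided by time.",
        "Distance = 120 km, time = 1.5 hours.",
        "120 / 1.5 = 80.",
    ],
    "answer": "80 km/h",
}

# Supervised fine-tuning would teach the model to produce the steps
# and the answer when given only the problem.
prompt = example["problem"]
target = "\n".join(example["steps"]) + f"\nAnswer: {example['answer']}"
```

A small dataset can still teach a lot when every example carries a clean, fully worked-out reasoning trace like this.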
GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.
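A rough sketch of the think-then-act output format such agents typically use (the `Thought:`/`Action:` schema and the `parse_agent_step` helper are assumptions for illustration, not GUI-Libra's actual interface): the model first writes a short reasoning trace, then emits a structured click with pixel coordinates.

```python
import json
import re

# Hypothetical agent output combining careful reasoning with a precise action.
raw_output = """\
Thought: The "Submit" button is in the bottom-right corner of the form.
Action: {"type": "click", "x": 1184, "y": 756}
"""

def parse_agent_step(text: str) -> tuple[str, dict]:
    """Split a model response into its reasoning trace and its click action."""
    thought = re.search(r"Thought:\s*(.*)", text).group(1)
    action = json.loads(re.search(r"Action:\s*(\{.*\})", text).group(1))
    return thought, action

thought, action = parse_agent_step(raw_output)
assert action["type"] == "click"  # the executor would now click at (x, y)
```

Training on both halves at once is what lets one model think carefully *and* ground its clicks precisely.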
This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.
The paper fixes a common problem in AI: models can read pictures and text well, but they often mess up the logic that connects them.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
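A toy sketch of the silent-thinking mechanism, with invented names (`latent_tokens`, `build_inputs`) rather than LT-Tuning's real implementation: a handful of learned latent embeddings are spliced in after the question, giving the model extra slots to compute over without any visible reasoning text.

```python
import torch
import torch.nn as nn

d_model, n_latent = 256, 8

# Learned "thought" embeddings that never correspond to real vocabulary tokens.
latent_tokens = nn.Parameter(torch.randn(n_latent, d_model) * 0.02)

def build_inputs(question_emb: torch.Tensor) -> torch.Tensor:
    """Splice latent thought slots after the question embeddings.

    question_emb: (batch, seq_len, d_model) token embeddings of the prompt.
    Returns (batch, seq_len + n_latent, d_model); the model attends to the
    latent slots while decoding the answer, so the "thinking" stays hidden.
    """
    batch = question_emb.size(0)
    thoughts = latent_tokens.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([question_emb, thoughts], dim=1)

inputs = build_inputs(torch.randn(4, 32, d_model))  # 4 prompts, 32 tokens each
print(inputs.shape)                                 # torch.Size([4, 40, 256])
```

During fine-tuning the latent embeddings are trained like any other parameters, so the model learns to use those hidden slots in place of written-out steps.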
The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?
SpatiaLab is a new test that checks whether vision-language models (VLMs) can handle real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers, using smart models that stand in for the computer and its test suite.
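A bare-bones sketch of that idea (the `SimulatedEnv` class and its `llm` callable are assumptions for illustration, not SWE-World's API): rather than spinning up a container, an environment object asks a model to predict what the shell and the tests would have printed.

```python
class SimulatedEnv:
    """Toy stand-in for a container: an LLM predicts command output.

    `llm` is any callable mapping a prompt string to a completion string;
    this interface is invented here to show the control flow.
    """

    def __init__(self, llm, repo_summary: str):
        self.llm = llm
        self.history = [f"Repository state: {repo_summary}"]

    def run(self, command: str) -> str:
        # Ask the model to role-play the machine, conditioned on everything
        # that has "happened" in this session so far.
        prompt = "\n".join(self.history) + f"\n$ {command}\nPredicted output:"
        output = self.llm(prompt)
        self.history.append(f"$ {command}\n{output}")
        return output

# Usage with a trivial fake model, just to show the loop an agent trains against:
env = SimulatedEnv(llm=lambda p: "2 passed, 1 failed",
                   repo_summary="flask app, 3 tests")
print(env.run("pytest -q"))  # the agent learns from this predicted feedback
```

Because the "computer" is just a model call, thousands of practice episodes can run in parallel without any container overhead.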