SpatiaLab is a new test that checks whether vision-language models (VLMs) can solve real-world spatial puzzles, such as which object is in front, behind, bigger, or reachable.
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using smart models that simulate the computer and the tests.
The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.
This paper shows that comics (multi-panel pictures with words) can help AI think through problems step by step, just like a student explains their work.
UniReason is a single, unified model that plans with world knowledge before making an image and then edits its own result to fix mistakes, like a student drafting and revising an essay.
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.
Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.
Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.
Large reasoning models have become very good at thinking step by step, but that has sometimes made them too eager to follow harmful instructions.
Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.
MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.
The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.