This paper introduces Laser, a new way for vision-language models to think in their hidden space before speaking, so they see the whole “forest” before picking out the “trees.”
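To picture what "thinking in hidden space" could look like, here is a tiny Python sketch of latent reasoning in general: a hidden state gets refined a few times before any token is emitted. The weights, sizes, and step count are made up for illustration and are not Laser's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                            # toy hidden width
W_think = rng.normal(size=(D, D)) / np.sqrt(D)    # hypothetical "reasoning" weights
W_out = rng.normal(size=(D, 4)) / np.sqrt(D)      # maps hidden state to 4 toy tokens

def answer(h, num_latent_steps=3):
    # Refine the hidden state several times *before* emitting any token,
    # i.e. "think in hidden space before speaking".
    for _ in range(num_latent_steps):
        h = np.tanh(W_think @ h)
    logits = W_out.T @ h
    return int(np.argmax(logits))                 # the first spoken token

h0 = rng.normal(size=D)                           # stand-in for the encoded image + prompt
print(answer(h0, num_latent_steps=0), answer(h0, num_latent_steps=3))
```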
This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.
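One generic way a model can "tell when it might be wrong" is to answer the same question several times and see whether the answers agree. The toy sketch below shows that agreement-as-confidence idea; it is not necessarily the paper's actual training method.

```python
from collections import Counter
import random

random.seed(0)

def solve(problem):
    # Stand-in for a stochastic model call that returns one candidate answer.
    return random.choice(problem["candidates"])

def solve_with_confidence(problem, n_samples=10):
    votes = Counter(solve(problem) for _ in range(n_samples))
    best, count = votes.most_common(1)[0]
    return best, count / n_samples    # low agreement -> treat the answer as suspect

toy = {"candidates": [42, 42, 42, 41]}   # the model usually, but not always, says 42
print(solve_with_confidence(toy))
```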
DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.
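Here is the "messy draft" idea as a minimal Python loop: the chain of thought is revised over several rounds instead of being written once and frozen. The `revise` function is a placeholder; DiffCoT's real refinement model is not reproduced here.

```python
def revise(draft):
    # Stand-in for one refinement pass: repair steps flagged as bad.
    return [s.replace("(bad)", "(fixed)") for s in draft]

def refine_cot(draft, num_rounds=3):
    # The chain of thought is an editable draft, not a frozen output:
    # every round may rewrite earlier steps in light of later ones.
    for _ in range(num_rounds):
        draft = revise(draft)
    return draft

draft = ["set x = 2", "x + x = 5 (bad)", "so the answer is 5 (bad)"]
print(refine_cot(draft))
```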
Large language models (LLMs) are good at many math problems but often mess up simple counting when the list gets long.
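To make "simple counting" concrete, this toy harness builds such a task with a known answer. The harness is our illustration; only the observation that accuracy drops on longer lists comes from the paper.

```python
import random

random.seed(0)

def make_counting_task(n_items):
    words = [random.choice(["apple", "pear", "plum"]) for _ in range(n_items)]
    prompt = f"How many times does 'apple' appear in: {', '.join(words)}?"
    return prompt, words.count("apple")   # the exact answer a model should match

for n in (5, 50):   # models tend to slip as the list grows
    prompt, truth = make_counting_task(n)
    print(truth, prompt[:60], "...")
```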
NextFlow is a single, decoder-only Transformer that can read and write both text and images in one continuous sequence.
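A very simplified view of "one continuous sequence": text tokens and image tokens share a single stream, each tagged with its modality, so one decoder can read and write both. NextFlow's real tokenizers and generation details are not shown here.

```python
# Toy token stream: text and image patches share one sequence, each item
# tagged with its modality so a single decoder can read and write both.
TEXT, IMAGE = 0, 1

def build_sequence(text_tokens, image_tokens):
    seq = [(TEXT, t) for t in text_tokens]
    seq += [(IMAGE, t) for t in image_tokens]   # e.g. codes for image patches
    return seq

seq = build_sequence(text_tokens=[101, 7592], image_tokens=[3, 99, 4])
for modality, tok in seq:
    print("text" if modality == TEXT else "image", tok)
```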
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.
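The sketch below shows the general shape of "look before you plan": a quick retrieval pass runs before the planner ever starts. The keyword scorer and the planner are placeholders, not EKA's actual alignment step.

```python
def retrieve(query, corpus, k=2):
    # Crude keyword-overlap scorer standing in for a real retriever.
    score = lambda doc: len(set(query.split()) & set(doc.split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def plan(query, context):
    # Stand-in for the LLM planner; here it just acknowledges its inputs.
    return f"plan for {query!r} using {len(context)} early-retrieved docs"

def answer(query, corpus):
    context = retrieve(query, corpus)   # the quick, early look at knowledge,
    return plan(query, context)         # taken *before* planning starts

corpus = ["mars has two moons", "the moon orbits earth", "paris is in france"]
print(answer("how many moons does mars have", corpus))
```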
This paper turns messy chains of thought from language models into clear, named steps so we can see how they really think through math problems.
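Turning a messy chain of thought into named steps can be as simple as tagging each step with a label. The mini-taxonomy and keyword rules below are hypothetical; the paper's real label set and labeling method will differ.

```python
import re

# Hypothetical mini-taxonomy; the paper's real label set will differ.
RULES = [
    (re.compile(r"let|define|set"), "setup"),
    (re.compile(r"\d+\s*[-+*/=]"), "calculation"),
    (re.compile(r"therefore|so the answer"), "conclusion"),
]

def label_steps(cot_steps):
    labeled = []
    for step in cot_steps:
        tag = next((name for pat, name in RULES if pat.search(step.lower())), "other")
        labeled.append((tag, step))
    return labeled

cot = ["Let x be the price.", "Then 3 * x = 12.", "Therefore x = 4."]
print(label_steps(cot))
```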
The paper proposes the Laws of Reasoning (LORE), simple rules that say how much a model should think and how accurate it can be as problems get harder.
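To show what such a law could look like, here is a generic saturating curve where more thinking helps and harder problems hurt. The logistic form and the constants `a` and `b` are assumptions for illustration, not the paper's fitted LORE equations.

```python
import math

def accuracy(difficulty, think_tokens, a=1.0, b=0.5):
    # Hypothetical saturating law: accuracy rises with thinking budget and
    # falls with difficulty. Shape only -- NOT the paper's fitted equation.
    return 1.0 / (1.0 + math.exp(a * difficulty - b * math.log(1 + think_tokens)))

for d in (1, 5):
    for t in (10, 1000):
        print(f"difficulty={d} tokens={t} acc={accuracy(d, t):.2f}")
```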
Traditional self-driving used separate boxes for seeing, thinking, and acting, but tiny mistakes in early boxes could snowball into big problems later.
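The snowballing is easy to see with toy numbers: if each module hands its output onward with a small relative error, the errors multiply down the pipeline. The 5% figure and stage names below are illustrative only.

```python
def stage(x, error=0.05):
    # Each hand-crafted module passes its output (plus a small error) onward.
    return x * (1 + error)

estimate = 1.0   # true quantity normalized to 1.0
for name in ("perception", "prediction", "planning", "control"):
    estimate = stage(estimate)
    print(f"after {name}: relative error = {estimate - 1.0:.3f}")
# Small per-stage errors compound across the pipeline instead of canceling.
```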
Vision-Language-Action (VLA) models are robots’ “see–think–do” brains that connect cameras (vision), words (language), and motors (action).
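Here is the "see–think–do" loop as a bare-bones function: image and instruction go in, an action vector comes out. The encoders and random weights are toy stand-ins, not any real VLA architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_vision(image):                  # "see": a toy image feature
    return image.mean(axis=(0, 1))

def encode_language(instruction):          # "think": a toy text feature
    return np.array([len(instruction), instruction.count(" ")], dtype=float)

def act(image, instruction, W=rng.normal(size=(5, 7))):
    feats = np.concatenate([encode_vision(image), encode_language(instruction)])
    return W @ feats                       # "do": e.g. five joint velocities

image = rng.random((8, 8, 5))              # an 8x8 toy image with 5 channels
print(act(image, "pick up the red block"))
```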
Big AI models often write very long step-by-step solutions, but typical checkers either grade only the final answer or get lost among the many steps.
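The difference between the two kinds of checkers is easiest to see side by side: an outcome checker looks only at the last line, while a process checker scores every step. The toy arithmetic checker below is our illustration, not the paper's verifier.

```python
def step_ok(step):
    # Toy arithmetic checker for steps like "2 + 2 = 4"; lines without
    # an "=" are skipped. (eval is acceptable on this toy data only.)
    if "=" not in step:
        return True
    lhs, rhs = step.split("=")
    return eval(lhs) == int(rhs)

def outcome_check(solution, expected):
    # Looks only at the final line, so it misses bad intermediate steps.
    return solution[-1] == expected

def process_check(solution):
    # Scores every step; one bad step flags the whole chain.
    return all(step_ok(s) for s in solution)

solution = ["2 + 2 = 5", "5 - 1 = 4", "the answer is 4"]   # bad step, right ending
print(outcome_check(solution, "the answer is 4"))   # True: fooled by the final line
print(process_check(solution))                      # False: catches the bad step
```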
This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.
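The lemma-memory idea in miniature: proven mini-results go into a store that persists across solving rounds, so later rounds can build on them. The hard-coded dependency table stands in for the actual solver, and the names are made up.

```python
def try_prove(goal, lemmas):
    # Stand-in for one solver round: a goal is provable once its
    # prerequisite lemma is already in the store.
    needs = {"main theorem": "lemma B", "lemma B": "lemma A", "lemma A": None}
    prereq = needs[goal]
    return prereq is None or prereq in lemmas

def solve(goal, subgoals, max_rounds=5):
    lemmas = set()                         # proven mini-results persist here
    for round_ in range(max_rounds):
        for g in subgoals:
            if g not in lemmas and try_prove(g, lemmas):
                lemmas.add(g)              # remember it for later rounds
        if goal in lemmas:
            return f"solved in round {round_ + 1} with lemmas {sorted(lemmas)}"
    return "unsolved"

print(solve("main theorem", ["main theorem", "lemma B", "lemma A"]))
```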