This paper introduces Laser, a new way for vision-language models to think in their hidden space before speaking, so they see the whole “forest” before picking out the “trees.”
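Laser's actual architecture isn't spelled out in this one-line summary; purely as a toy illustration of "thinking in hidden space before speaking," the loop below refines a latent vector for several silent steps and only decodes at the end (the update rule, decoder, and dimensions are all made-up stand-ins):

```python
import numpy as np

# Toy illustration of latent "thinking": refine a hidden state for a few internal
# steps without emitting any tokens, then decode once at the end.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1   # stand-in for learned update weights

def latent_step(h: np.ndarray) -> np.ndarray:
    return np.tanh(W @ h)               # one silent "thought" update in hidden space

def decode(h: np.ndarray) -> str:
    return "answer_A" if h.sum() > 0 else "answer_B"   # speak only once, at the end

h = rng.standard_normal(8)
for _ in range(5):                       # think silently, seeing the whole "forest"...
    h = latent_step(h)
print(decode(h))                         # ...then pick out the "trees" in one answer
```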
This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.
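The paper's exact training recipe isn't reproduced here; a minimal sketch of the general idea, having the model report a confidence alongside its answer so downstream code can catch likely mistakes, could look like the following (the prompt format, the `query_model` stub, and the 0.5 threshold are illustrative assumptions):

```python
import re

def query_model(question: str) -> str:
    """Stand-in for a real LLM call; returns an answer plus a self-reported confidence."""
    return "Answer: 42 (confidence: 0.35)"

def answer_with_confidence(question: str):
    """Ask for an answer and a 0-1 confidence, then parse both out of the reply."""
    reply = query_model(f"{question}\nReply as: Answer: <answer> (confidence: <0-1>)")
    match = re.search(r"Answer:\s*(.+?)\s*\(confidence:\s*([0-9.]+)\)", reply)
    if not match:
        return None, 0.0
    return match.group(1), float(match.group(2))

answer, confidence = answer_with_confidence("What is 6 * 7?")
# A well-calibrated model reports low confidence exactly when it is likely wrong,
# so the caller can abstain or double-check instead of trusting a bad answer.
if confidence < 0.5:
    print(f"Model is unsure about '{answer}'; flag for verification.")
else:
    print(f"Model is confident in '{answer}'.")
```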
Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.
DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.
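As a rough illustration only (the hypothetical `revise` helper stands in for whatever editing pass DiffCoT actually uses), the "reasoning as a revisable draft" idea amounts to looping a refinement step over the chain of thought instead of committing to the first attempt:

```python
def revise(draft: str, round_idx: int) -> str:
    """Stand-in for one refinement pass; a real system would re-prompt the model
    to critique and rewrite its own chain of thought."""
    return draft + f" [revised in pass {round_idx + 1}]"

def refine_chain_of_thought(question: str, num_rounds: int = 3) -> str:
    draft = f"First rough reasoning about: {question}"   # messy first draft
    for i in range(num_rounds):                          # clean it up pass by pass
        draft = revise(draft, i)
    return draft                                          # final, polished chain of thought

print(refine_chain_of_thought("How many primes are below 20?"))
```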
Large language models (LLMs) are good at many math problems but often mess up simple counting when the list gets long.
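A concrete (illustrative) example of the kind of task meant here: counting items in a long list is a one-liner in code, yet a model answering the same question in natural language often loses track once the list grows to a hundred entries or more:

```python
# Exact counting is trivial for a program but surprisingly error-prone for an LLM
# asked the same question in plain text: models tend to lose count partway through.
items = ["apple", "pear", "apple", "plum", "apple", "pear"] * 17  # 102 items total
apple_count = sum(1 for item in items if item == "apple")
print(len(items), apple_count)  # 102 items, 51 of them "apple"
```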
Falcon-H1R is a small (7B-parameter) AI model that reasons remarkably well without needing giant computers.

NextFlow is a single, decoder-only Transformer that can read and write both text and images in one continuous sequence.
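This one-line summary doesn't give NextFlow's actual tokenization, but the "one continuous sequence" idea can be sketched as interleaving text tokens and image-patch tokens in a single stream (the special tokens and patch tokenizer below are assumptions for illustration, not NextFlow's real vocabulary):

```python
def tokenize_text(text: str) -> list[str]:
    return text.split()

def tokenize_image(image_id: str, num_patches: int = 4) -> list[str]:
    # A real system would map image patches to discrete or continuous tokens.
    return [f"<img:{image_id}:patch{i}>" for i in range(num_patches)]

# Text and image tokens live in one flat stream, so a single decoder-only model
# can both condition on image tokens (reading) and generate them (writing).
sequence = (
    tokenize_text("Describe the photo:")
    + ["<image_start>"] + tokenize_image("cat_01") + ["<image_end>"]
    + tokenize_text("A cat sleeping on a windowsill.")
)
print(sequence)
```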
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.
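A minimal sketch of this retrieve-before-you-plan pattern, with a toy in-memory `retrieve` stub standing in for a real index (EKA's actual scoring and prompt wiring are not reproduced here):

```python
def retrieve(query: str, k: int = 3) -> list[str]:
    """Stand-in retriever; a real system would query a vector index or search API."""
    corpus = {
        "EKA adds an early retrieval pass before planning.": 0.9,
        "Multi-step retrieval decomposes a question into sub-queries.": 0.7,
        "Unrelated document about cooking.": 0.1,
    }
    return [doc for doc, _ in sorted(corpus.items(), key=lambda x: -x[1])[:k]]

def plan_with_eka(question: str) -> str:
    # Early Knowledge Alignment (as described above): glance at relevant evidence first...
    evidence = retrieve(question)
    # ...so the multi-step plan is grounded in what was actually found, instead of
    # being written blind and patched up later.
    return f"Plan for '{question}' given evidence: {evidence[:2]}"

print(plan_with_eka("How does EKA change multi-step retrieval?"))
```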
This paper turns messy chains of thought from language models into clear, named steps so we can see how they really think through math problems.
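The paper's own step taxonomy isn't reproduced here; as a toy illustration, even splitting a chain of thought into sentences and attaching hypothetical labels such as CALCULATE or CONCLUDE already makes the reasoning much easier to inspect:

```python
import re

def name_steps(chain_of_thought: str) -> list[tuple[str, str]]:
    """Split a chain of thought into steps and tag each with a (hypothetical) label."""
    steps = [s.strip() for s in re.split(r"\.\s+", chain_of_thought) if s.strip()]
    labeled = []
    for step in steps:
        if any(op in step for op in "+-*/="):
            labeled.append(("CALCULATE", step))       # step does arithmetic
        elif step.lower().startswith(("so", "therefore")):
            labeled.append(("CONCLUDE", step))         # step states the result
        else:
            labeled.append(("RESTATE", step))          # step rephrases the problem
    return labeled

cot = "The train covers 60 km in 1 hour. 60 * 2 = 120 km in 2 hours. So the answer is 120."
for label, text in name_steps(cot):
    print(f"[{label}] {text}")
```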
The paper proposes the Laws of Reasoning (LORE), simple rules that say how much a model should think and how accurate it can be as problems get harder.
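The summary doesn't state LORE's actual equations; purely as an illustrative placeholder, laws of this kind usually relate problem difficulty $d$, a reasoning budget $T(d)$ (e.g., thinking tokens), and attainable accuracy $A(d)$, for example:

$$
T(d) \;\propto\; d^{\alpha},
\qquad
A(d) \;\approx\; \frac{1}{1 + e^{-\beta\,(c - d)}}
$$

where $c$ would play the role of a model-capacity term and $\alpha, \beta$ fitted constants; these are stand-ins for illustration, not the paper's fitted law.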
Traditional self-driving systems used separate modules ("boxes") for seeing, thinking, and acting, but tiny mistakes in the early boxes could snowball into big problems later.
Vision-Language-Action (VLA) models are robots’ “see–think–do” brains that connect cameras (vision), words (language), and motors (action).
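In code, the see–think–do loop can be sketched like this (the observation format, decision rule, and action names below are hypothetical stand-ins, not any particular VLA model):

```python
import random

def see() -> list[float]:
    """Camera observation; here just a fake image feature vector."""
    return [random.random() for _ in range(4)]

def think(observation: list[float], instruction: str) -> str:
    """A VLA model fuses the image and the language instruction into an action choice."""
    return "move_forward" if sum(observation) > 2.0 else "turn_left"

def act(action: str) -> None:
    """Send the chosen action to the motors (here, just print it)."""
    print(f"executing: {action}")

for _ in range(3):  # control loop: perceive, decide, actuate
    act(think(see(), "go to the red cup"))
```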