The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.
The paper tackles a big problem in model merging: when you combine several models fine-tuned with reinforcement learning by simply averaging their weights, their specialized skills get watered down.
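For context, the "simple averaging" baseline that dilutes skills is plain parameter-wise averaging of checkpoints. The sketch below is a minimal illustration of that baseline only (not the paper's proposed merge), and it assumes all models share an identical architecture.

```python
# Minimal sketch of naive parameter averaging across fine-tuned models.
# This illustrates the baseline being criticized, not the paper's method.
import torch

def average_state_dicts(state_dicts):
    """Uniformly average parameters from models with identical architectures."""
    merged = {}
    for name in state_dicts[0]:
        # Stack the same tensor from every model and take the element-wise mean.
        merged[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Usage: load several RL-tuned checkpoints of the same base model and merge them.
# merged = average_state_dicts([model_a.state_dict(), model_b.state_dict()])
# base_model.load_state_dict(merged)
```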
This paper shows how to add a tiny helper (a probe) to a big language model so it can classify things like safety or sentiment during the same forward pass it already runs to answer you.
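As a rough illustration of the general idea (a small classifier reading the model's hidden states from the forward pass it already runs), here is a minimal sketch using Hugging Face Transformers; the model name, pooling choice, and label set are assumptions for illustration, not the paper's exact design.

```python
# Sketch: attach a tiny linear probe to a decoder-only LM so classification
# reuses the hidden states from the same forward pass that produces text.
# The model, pooling, and labels below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
probe = torch.nn.Linear(model.config.hidden_size, 2)  # e.g. safe vs. unsafe

inputs = tokenizer("How do I reset my password?", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Pool the last layer's hidden state at the final token and classify it.
last_hidden = out.hidden_states[-1][:, -1, :]
label_logits = probe(last_hidden)

# The same out.logits can still drive next-token generation, so the extra
# classification costs only the tiny probe on top of the existing pass.
```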
WorldMind teaches AI agents to learn the rules of the real world while they act, instead of cramming everything into fixed model weights.
Big models like Whisper are great for accuracy but too slow for live captions; this paper builds a smaller, faster Thai speech recognizer for real-time use.
Being-H0.5 is a robot brain that learns from huge amounts of human video and robot demonstrations so it can work across many different robots, not just one.
This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.
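To make "plan, act, and learn" concrete, here is a generic plan-act-observe loop of the kind such agent frameworks build on; the planner prompt, tool set, and memory here are placeholders for illustration, not the paper's specific architecture.

```python
# Generic plan-act-observe agent loop (illustrative only; the planner,
# tools, and memory are placeholders, not the paper's design).

def run_agent(llm, tools, task, max_steps=5):
    memory = []  # the agent "learns over time" by accumulating observations
    for _ in range(max_steps):
        # Plan: ask the LLM for the next action given the task and history.
        action = llm(f"Task: {task}\nHistory: {memory}\nNext action?")
        if action.startswith("FINISH"):
            return action, memory
        # Act: run the chosen tool, e.g. an action like "search: thai asr latency".
        name, _, arg = action.partition(":")
        observation = tools.get(name.strip(), lambda a: "unknown tool")(arg.strip())
        # Learn: store the observation so later plans can build on it.
        memory.append((action, observation))
    return "FINISH: step budget exhausted", memory

# Usage with a stubbed LLM and one tool:
# result, trace = run_agent(lambda p: "FINISH: done", {"search": lambda q: "..."}, "demo")
```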
This paper introduces MMDeepResearch-Bench (MMDR-Bench), a new test that checks how well AI “deep research agents” write long, citation-rich reports using both text and images.
ToolPRMBench is a new benchmark that checks, step by step, whether an AI agent using tools picks the right next action.
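A step-level check of this kind can be as simple as comparing the agent's proposed next tool call against a labeled reference for that step. The sketch below uses an assumed, simplified record format and scoring rule, not ToolPRMBench's actual protocol.

```python
# Sketch of a step-level tool-call check: did the agent pick the right tool
# and arguments at this step? Field names and scoring are assumptions,
# not ToolPRMBench's actual data format.

def score_step(predicted: dict, reference: dict) -> float:
    """Return 1.0 for a fully correct step, 0.5 for the right tool with
    wrong arguments, and 0.0 for the wrong tool."""
    if predicted.get("tool") != reference["tool"]:
        return 0.0
    return 1.0 if predicted.get("arguments") == reference["arguments"] else 0.5

# Example: the agent chose the right tool but passed the wrong city.
pred = {"tool": "get_weather", "arguments": {"city": "Oslo"}}
gold = {"tool": "get_weather", "arguments": {"city": "Bergen"}}
print(score_step(pred, gold))  # 0.5
```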
Agentic-R is a new way to teach a search retriever to find not just similar text, but the text that truly helps an AI get the final answer right.
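The core idea, judging a passage by whether it helps produce the right final answer rather than by surface similarity, can be sketched as below; the reader function, exact-match reward, and helper names are illustrative assumptions, not Agentic-R's actual training recipe.

```python
# Sketch: score candidate passages by downstream answer correctness rather
# than by text similarity. The answer_with_context reader and exact-match
# reward are illustrative stand-ins, not Agentic-R's pipeline.

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())

def utility_scores(question, gold_answer, passages, answer_with_context):
    """Reward each passage by whether the reader answers correctly with it."""
    scores = []
    for passage in passages:
        prediction = answer_with_context(question, passage)
        scores.append(exact_match(prediction, gold_answer))
    return scores

# Usage with a stubbed reader that just echoes its passage:
# utility_scores("Capital of Norway?", "Oslo", ["Oslo", "Bergen"], lambda q, p: p)
# -> [1.0, 0.0]
```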
Terminal-Bench 2.0 is a tough test that checks how well AI agents can solve real, professional tasks by typing commands in a computer terminal.
UniX is a new medical AI that both understands chest X-rays (writes accurate reports) and generates chest X-ray images (high visual quality) without making the two jobs fight each other.