DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.
Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.
ToolPRMBench is a new benchmark that checks, step by step, whether an AI agent using tools picks the right next action.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
MatchTIR teaches AI agents to judge each tool call step by step instead of giving the same reward to every step.
Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.
ToolSafe is a new way to keep AI agents safe when they use external tools, by checking each action before it runs.
The paper introduces M^4olGen, a two-stage system that designs new molecules to match exact numbers for several properties (like QED, LogP, MW, HOMO, LUMO) at the same time.
Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs, making it much faster while staying smart.
The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.
The paper fixes a common problem in training AI reasoners: models get stuck using the same favorite solution style and stop exploring new ways to solve problems.
Group-based reinforcement learning for reasoning (like GRPO) uses the group's average reward as a baseline, but that makes its "advantage" estimates biased.
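
To make the group-average baseline in the last summary concrete, here is a minimal sketch of how a group-relative advantage is commonly computed: each sampled answer's reward minus the mean reward of its group. The function name, the `normalize` flag, and the example rewards are made up for illustration, and the sketch shows only the standard baseline, not the correction that paper proposes.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize=False, eps=1e-8):
    """Hypothetical helper: score each response against its group's mean reward."""
    baseline = mean(rewards)  # the group's average reward acts as the baseline
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    # Optional GRPO-style scaling by the group's standard deviation.
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]


# Illustrative rewards for four sampled answers to one prompt (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
normalized = group_relative_advantages(rewards, normalize=True)
```

In this sketch the baseline is computed from the same rewards it is subtracted from, so the advantages within a group always sum to zero.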