ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.
The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or reliably check the results.
DeepSearchQA is a new test with 900 real-world-style questions that checks whether AI agents can find complete lists of answers, not just one fact.
DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.
Robots need videos that not only look pretty but also follow real-world physics and actually show the requested task being completed.
Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs of written-out reasoning, making it much faster while staying smart.
The paper builds a new way to create realistic, long conversations between people and AI agents that use tools such as databases.
ArenaRL teaches AI agents by comparing their answers against each other, like a sports tournament, instead of giving each answer a single noisy score.
Dream-VL and Dream-VLA use a diffusion language model backbone to understand images, talk about them, and plan actions better than many regular (autoregressive) models.
The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.
SWE-EVO is a new test (benchmark) that checks whether AI coding agents can upgrade real software projects over many steps, not just fix one small bug.
Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive.