The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
The paper tackles a real-life problem: people often give their phones short, vague instructions, so agents must infer the missing details from what they know about the user.
OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.
Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.
V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.
EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.
MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.
Similarity tells you if two models seem to think about things the same way, but it doesn’t tell you if that thinking is sturdy when the world wiggles.
World Craft lets anyone turn a short text description into a playable, visual game world without coding.
EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.
SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by sending the most useful visual information to the language brain, rather than by simply scaling up the model.