The paper tackles a real-life problem: people often give phones short, vague instructions, so agents must guess the missing details using what they know about the user.
OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.
Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.
V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.
EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.
MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.
Traditional supervised fine-tuning (SFT) makes a model imitate a single reference answer too closely, which can cause it to overfit to the wording instead of the real idea.
Similarity tells you whether two models seem to think about things the same way, but it doesn’t tell you whether that thinking holds up when conditions shift.
World Craft lets anyone turn a short text description into a playable, visual game world without coding.
EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings-call Q&A.
SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by sending the most useful visual information to its language component, rather than simply scaling up model size.
This survey asks how close AI memory systems are to human memory and organizes the answer into three parts: implicit memory (stored inside the model), explicit memory (external storage the model can look up), and agentic memory (what an AI agent keeps over time to plan and act).