CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll correctly and collisions look believable.
ABC-Bench is a new test that checks if AI coding agents can really do backend work from start to finish, not just write a few lines of code.
The paper shows that when an LLM is trained with spurious (misleading) rewards in RLVR, it can score higher by memorizing answers instead of reasoning.
AgencyBench is a giant test that checks how well AI agents can handle real, long, multi-step jobs, not just short puzzles.
RL-trained search agents often sound confident even when they don’t know the answer, which can mislead people.
The paper studies why large language models (LLMs) sound too sure of themselves when using retrieval-augmented generation (RAG) and how to fix it.
Personalized AI helpers can accidentally copy a user’s past opinions instead of telling objective facts, which the authors call personalization-induced hallucinations.
Medical SAM3 is a text-prompted medical image segmentation model that was fully fine-tuned on 33 diverse datasets to work across many imaging types like ultrasound, X-ray, endoscopy, and pathology.
The paper shows that top reasoning AIs don’t just think longer; they act like a tiny team inside their heads, with different voices that ask, disagree, and then agree.
Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.
MatchTIR teaches AI agents to judge each tool call step-by-step instead of giving the same reward to every step.
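The difference between giving every step the same reward and judging each tool call separately can be illustrated with a small sketch. This is not MatchTIR's actual method, which the summary above does not describe; the function names, the three-call trajectory, and the per-call scores are all hypothetical and only show the general idea of step-level credit.

```python
from typing import List


def uniform_rewards(num_steps: int, outcome: float) -> List[float]:
    """Give every tool call the same trajectory-level reward."""
    return [outcome] * num_steps


def step_rewards(step_scores: List[float]) -> List[float]:
    """Give each tool call its own reward (hypothetical per-call judgments)."""
    return list(step_scores)


# Hypothetical trajectory with three tool calls; the second call was wrong,
# so the final answer was wrong as well.
scores = [1.0, 0.0, 1.0]  # assumed per-call judgments, not real data
outcome = 0.0             # assumed trajectory-level outcome

print(uniform_rewards(len(scores), outcome))  # all calls share the outcome
print(step_rewards(scores))                   # each call is judged separately
```

With the uniform scheme, the two correct calls are penalized along with the wrong one; with step-level scores, only the wrong call receives a low reward.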