This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.
HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.
HySparse is a new way for AI models to pay attention: it mixes a few full-attention layers with many fast, memory-saving sparse layers.
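The summary doesn't give HySparse's actual layer schedule, so the sketch below only illustrates the general hybrid idea, assuming a made-up rule where every fourth layer gets full attention and the rest use a sliding window. The `attention_mask` function, the 1-in-4 ratio, and the window size are all illustrative, not from the paper.

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int,
                   full_every: int = 4, window: int = 128) -> np.ndarray:
    """Build a causal attention mask for one layer.

    Hypothetical schedule: every `full_every`-th layer attends to the
    entire prefix (full attention); all other layers attend only to a
    local window of recent tokens (sparse, memory-saving attention).
    """
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    if layer_idx % full_every == 0:
        return causal                   # full-attention layer
    return causal & (i - j < window)    # sliding-window sparse layer

# A full layer keeps the whole KV cache; a windowed layer only needs the
# last `window` keys/values, which is where the memory saving comes from.
mask = attention_mask(seq_len=512, layer_idx=1)
print(mask.sum(), "allowed query-key pairs at layer 1")
```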
The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better.
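How the layers get combined isn't stated in this summary; one simple, common way to use features from many layers is a learned softmax-weighted sum, sketched below. The function name and the weighting scheme are assumptions, not necessarily the paper's mechanism.

```python
import numpy as np

def mix_layer_features(hidden_states: list[np.ndarray],
                       layer_logits: np.ndarray) -> np.ndarray:
    """Combine text-encoder features from several layers into a single
    conditioning tensor for the diffusion transformer.

    hidden_states: one (seq_len, dim) array per selected LLM layer.
    layer_logits:  learnable per-layer scores (assumed mechanism: a
                   softmax-weighted sum, not necessarily the paper's).
    """
    weights = np.exp(layer_logits) / np.exp(layer_logits).sum()
    stacked = np.stack(hidden_states)            # (n_layers, seq, dim)
    return np.einsum("l,lsd->sd", weights, stacked)

# Toy usage: 3 layers of a text encoder, 8 prompt tokens, dim 16.
feats = [np.random.randn(8, 16) for _ in range(3)]
cond = mix_layer_features(feats, layer_logits=np.zeros(3))
print(cond.shape)  # (8, 16) -> fed to the diffusion model's cross-attention
```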
A-RAG lets the AI choose how to search, what to read, and when to stop, instead of following a fixed recipe.
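Here is a minimal sketch of what "choosing how to search, what to read, and when to stop" can look like as a control loop. The action names and helper functions are illustrative stand-ins, not A-RAG's real interface.

```python
# Minimal sketch of an agentic retrieval loop: the model picks the next
# action itself instead of following a fixed search-then-read pipeline.

def agentic_rag(question: str, policy, search, read, max_steps: int = 8) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # The policy sees the question plus everything gathered so far.
        action, arg = policy(question, notes)
        if action == "search":
            notes.append(f"results: {search(arg)}")
        elif action == "read":
            notes.append(f"document: {read(arg)}")
        elif action == "answer":
            return arg          # the model decided it has enough evidence
    return "ran out of steps"

# Toy demo: a scripted "policy" that searches once, reads once, then answers.
script = iter([("search", "query terms"), ("read", "doc-1"), ("answer", "42")])
print(agentic_rag("toy question",
                  policy=lambda q, n: next(script),
                  search=lambda q: ["doc-1"],
                  read=lambda d: "evidence text"))
```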
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using smart models that stand in for the computer and its tests.
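A toy sketch of that simulation idea: the "computer" the agent talks to is a model call rather than a container. The `fake_model` stub and the prompt format are hypothetical, not SWE-World's actual simulator.

```python
# Instead of running commands in a Docker container, a model predicts
# what the shell and test suite would print.

def fake_model(prompt: str) -> str:
    # A real system would call a trained LM here; this stub only
    # illustrates the interface.
    return "2 passed, 1 failed: test_parser"

def simulated_step(repo_state: str, command: str) -> str:
    """One agent step: the 'computer' is a model, not a container."""
    prompt = f"Repo state:\n{repo_state}\nCommand: {command}\nOutput:"
    return fake_model(prompt)

print(simulated_step("<diff applied to utils.py>", "pytest -x"))
```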
SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.
The paper builds a simple, math-light rule to predict whether training makes a language model more open-minded (higher entropy) or more sure of itself (lower entropy).
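The rule itself is the paper's contribution and isn't reproduced here; the snippet below only shows the quantity it predicts, the Shannon entropy of the model's next-token distribution, with a toy before/after example.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

before = [0.25, 0.25, 0.25, 0.25]   # unsure: mass spread over 4 tokens
after  = [0.85, 0.05, 0.05, 0.05]   # confident: mass piled on one token
print(entropy(before))  # ~1.39 nats (high entropy, "open-minded")
print(entropy(after))   # ~0.59 nats (low entropy, "sure of itself")
```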
MeKi is a new way to grow a language model’s knowledge by spending storage (ROM) instead of extra computation (FLOPs).
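A toy sketch of that storage-for-compute trade, assuming a nearest-neighbor key-value memory whose retrieved value is added to the hidden state; growing knowledge then means appending rows, not adding parameters. The layout is an assumption, not MeKi's actual design.

```python
import numpy as np

class KnowledgeMemory:
    """Toy key-value store: facts live in storage, not in the weights.

    Growing knowledge means appending rows (more ROM), not widening the
    network (more FLOPs per token). Assumed mechanism, not MeKi's.
    """
    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def add(self, key: np.ndarray, value: np.ndarray) -> None:
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def lookup(self, hidden: np.ndarray) -> np.ndarray:
        # Retrieve the best-matching fact and fold it into the hidden state.
        scores = self.keys @ hidden
        return hidden + self.values[int(scores.argmax())]

dim = 16
mem = KnowledgeMemory(dim)
mem.add(np.random.randn(dim), np.random.randn(dim))
out = mem.lookup(np.random.randn(dim))
print(out.shape)  # (16,) -- same compute path, knowledge came from storage
```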
The paper shows that even if a model is great at predicting when an AI agent will fail, jumping in to “fix” the agent mid-task can still make things worse.
This paper speeds up how AI models read very long texts by carefully choosing which words (tokens) to focus on at each step.
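A minimal sketch of the selection idea: score every cached token cheaply, then run attention over only the top-k. Scoring by raw query-key dot product is an assumed criterion here, not necessarily the paper's.

```python
import numpy as np

def top_k_attention(query: np.ndarray, keys: np.ndarray,
                    values: np.ndarray, k: int = 64) -> np.ndarray:
    """Attend over only the k most relevant cached tokens.

    Scoring every token is cheap (one dot product each); the expensive
    softmax-weighted mix then runs over k tokens instead of the whole text.
    """
    scores = keys @ query                       # (seq_len,)
    idx = np.argpartition(scores, -k)[-k:]      # indices of the top-k tokens
    top = scores[idx]
    weights = np.exp(top - top.max())
    weights /= weights.sum()
    return weights @ values[idx]                # (dim,)

seq_len, dim = 100_000, 32
keys = np.random.randn(seq_len, dim).astype(np.float32)
values = np.random.randn(seq_len, dim).astype(np.float32)
out = top_k_attention(np.random.randn(dim).astype(np.float32), keys, values)
print(out.shape)  # (32,) -- one output, computed from 64 tokens, not 100,000
```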