Papers (1262)

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Shih-Yang Liu, Xin Dong et al. · Jan 8 · arXiv

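When a model learns from many rewards at once, the popular GRPO method normalizes the summed reward within each group of rollouts, so different reward combinations can collapse into the same advantage value. This blurs the learning signal; GDPO instead normalizes each reward separately (decoupled normalization).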

#GDPO · #GRPO · #multi-reward reinforcement learning
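
To make the collapse concrete, here is a minimal sketch with two binary rewards and one group of four rollouts. The GRPO-style function sums each rollout's rewards and then z-scores the sums within the group; the decoupled function z-scores each reward separately before summing, which is an assumption based on the "reward-decoupled normalization" in the title. The exact GDPO recipe (any extra batch-level normalization or reward weights) is not reproduced here, and the example rewards are made up.

```python
from statistics import mean, pstdev

def zscore(values, eps=1e-6):
    """Normalize values within one group: (x - mean) / (std + eps)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]

def grpo_advantages(rewards):
    """GRPO-style: sum each rollout's rewards first, then normalize the sums."""
    totals = [sum(r) for r in rewards]
    return zscore(totals)

def decoupled_advantages(rewards):
    """Decoupled sketch (assumed): normalize each reward separately, then sum."""
    num_rewards = len(rewards[0])
    per_reward = [zscore([r[k] for r in rewards]) for k in range(num_rewards)]
    return [sum(per_reward[k][i] for k in range(num_rewards))
            for i in range(len(rewards))]

# One group of four rollouts A, B, C, D, each scored by two binary rewards (r1, r2).
group = [(1, 0), (0, 1), (1, 0), (0, 0)]

print([round(a, 3) for a in grpo_advantages(group)])       # [0.577, 0.577, 0.577, -1.732]
print([round(a, 3) for a in decoupled_advantages(group)])  # [0.423, 0.732, 0.423, -1.577]
```

With summed normalization, rollouts A and B (each satisfying exactly one reward) receive identical advantages. With per-reward normalization, B earns more credit because r2 is rarer than r1 in this group.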

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Intermediate

Boyang Wang, Haoran Zhang et al. · Jan 8 · arXiv

RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.

#robotic manipulation · #video diffusion · #multi-view generation

Plenoptic Video Generation

Intermediate

Xiao Fu, Shitao Tang et al. · Jan 8 · arXiv

PlenopticDreamer is a new way to re-render a video along different camera paths while keeping the content consistent across views and over time.

#plenoptic function · #camera-controlled video generation · #video re-rendering

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate

Shuming Liu, Mingchen Zhuge et al. · Jan 8 · arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning · #adaptive reasoning · #early exit

CoV: Chain-of-View Prompting for Spatial Reasoning

Intermediate

Haoyu Zhao, Akide Liu et al. · Jan 8 · arXiv

This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.

#Chain-of-View Prompting · #Embodied Question Answering · #Active Viewpoint Reasoning

RelayLLM: Efficient Reasoning via Collaborative Decoding

Intermediate

Chengsong Huang, Tong Zheng et al. · Jan 8 · arXiv

RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.

#token-level collaboration · #<call>n</call> command · #collaborative decoding
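
The relay idea can be sketched as a decoding loop in which the small model writes by default and hands control to the large model only when it emits a call command. This toy assumes the command looks like the `<call>n</call>` tag above, meaning "let the large model write the next n tokens"; the function name, the `<eos>` marker, and the scripted stand-in "models" are hypothetical and are not RelayLLM's actual implementation.

```python
import re

# Assumed command format, based on the "<call>n</call> command" tag.
CALL_PATTERN = re.compile(r"<call>(\d+)</call>")

def relay_decode(small_step, large_step, max_tokens=50):
    """Decode with a small model, handing n tokens to a large model on request.

    small_step(prefix) and large_step(prefix) are hypothetical callables that
    return the next token (a string) given the tokens generated so far.
    Returns the generated tokens and how many of them the large model wrote.
    """
    output = []
    large_tokens = 0
    while len(output) < max_tokens:
        token = small_step(output)
        if token == "<eos>":
            break
        match = CALL_PATTERN.fullmatch(token)
        if match is None:
            output.append(token)  # ordinary token written by the small model
            continue
        # The small model asked for help: the large model writes the next n tokens.
        n = int(match.group(1))
        for _ in range(min(n, max_tokens - len(output))):
            output.append(large_step(output))
            large_tokens += 1
    return output, large_tokens

# Scripted stand-ins for real models (hypothetical, for illustration only).
small_script = iter(["The", "answer", "is", "<call>2</call>", ".", "<eos>"])
large_script = iter(["forty", "two"])

tokens, large_count = relay_decode(lambda prefix: next(small_script),
                                   lambda prefix: next(large_script))
print(" ".join(tokens))  # The answer is forty two .
print(large_count)       # 2
```

Only two of the six output tokens come from the large model, which is the kind of saving the summary describes: the expensive model is consulted only where the small one asks for it.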

DocDancer: Towards Agentic Document-Grounded Information Seeking

Intermediate

Qintong Zhang, Xinjie Lv et al. · Jan 8 · arXiv

DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.

#Document Question Answering · #Agentic Information Seeking · #ReAct

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Intermediate

Sixiao Zheng, Minghao Yin et al. · Jan 8 · arXiv

VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.

#Video world model · #4D Geometric Control · #3D Gaussian trajectories

Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

Beginner

Runze He, Yiji Cheng et al. · Jan 8 · arXiv

Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.

#In-Context Image Generation · #Reference-based Image Editing · #Structured Reasoning

Agent-as-a-Judge

Beginner

Runyang You, Hongru Cai et al. · Jan 8 · arXiv

This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).

#Agent-as-a-Judge · #LLM-as-a-Judge · #multi-agent collaboration

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner

Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

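Big reasoning AIs think in many steps, which is slow and costly. GlimpRouter lets a lightweight model start each reasoning step and "glimpses" only that step's first token: if the token's entropy is high, the step is handed to a larger model.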

#collaborative inference · #initial token entropy · #step-level routing
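
Step-level routing on initial-token entropy can be sketched in a few lines. This assumes the router sees the lightweight model's probability distribution over the first token of a step; the 1.0-nat threshold, the function names, and the example distributions are hypothetical and not taken from the paper.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_step(first_token_probs, threshold=1.0):
    """Decide who writes a reasoning step from its first token's uncertainty.

    Low entropy: the lightweight model seems confident, so it keeps the step.
    High entropy: the step is routed to the larger model.
    The threshold is a hypothetical tuning knob.
    """
    if token_entropy(first_token_probs) > threshold:
        return "large"
    return "small"

confident = [0.9, 0.05, 0.05]          # entropy ~ 0.39 nats
uncertain = [0.25, 0.25, 0.25, 0.25]   # entropy = ln 4 ~ 1.39 nats

print(route_step(confident))  # small
print(route_step(uncertain))  # large
```

Because only one token per step is inspected, the routing decision costs almost nothing compared with generating the whole step.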

Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

Beginner

Muzhao Tian, Zisu Huang et al. · Jan 8 · arXiv

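Long-term AI helpers remember past chats, but leaning on every memory can trap them in old ideas (Memory Anchoring). The paper makes memory usage controllable, so an agent can be steered between staying anchored to past context and producing fresh ideas.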

#steerable memory · #memory anchoring · #long-term agents
