Papers (1262)

Steering LLMs via Scalable Interactive Oversight

The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or check the results well.

#scalable oversight #interactive alignment #requirement elicitation

Training Data Efficiency in Multimodal Process Reward Models

Intermediate

Jinyuan Li, Chengsong Huang et al., Feb 4, arXiv

Multimodal Process Reward Models (MPRMs) teach AI to judge each step of a picture-and-text reasoning process, not just the final answer.

#Multimodal Process Reward Model #Process Supervision #Monte Carlo Annotation

Likelihood-Based Reward Designs for General LLM Reasoning

Beginner

Ariel Kwiatkowski, Natasha Butt et al., Feb 3, arXiv

Binary right/wrong rewards for training reasoning in large language models are hard to design and often too sparse to learn from.

#log-likelihood reward #chain-of-thought (CoT) #reinforcement learning for LLMs

VLS: Steering Pretrained Robot Policies via Vision-Language Models

Intermediate

Shuo Liu, Ishneet Sukhvinder Singh et al., Feb 3, arXiv

Robots often learn good hand motions during training but get confused when the scene or the instructions change even slightly at test time.

#Vision–Language Steering #Inference-time control #Diffusion policy

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Intermediate

Yinyi Luo, Yiqiao Jin et al., Feb 3, arXiv

AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.

#multi-agent distillation #process reward model #GRPO

Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

Intermediate

Tong Zheng, Chengsong Huang et al., Feb 3, arXiv

Parallel-Probe is a simple add-on that lets many AI “thought paths” run at once and stop early once they already agree.

#parallel thinking #2D probing #consensus-based early stopping

AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

Intermediate

Minjun Zhu, Zhen Lin et al., Feb 3, arXiv

AutoFigure is an AI system that reads long scientific texts and then thinks, plans, and draws clear, good-looking figures—like a careful student who makes a neat, accurate poster from a long chapter.

#AutoFigure #FigureBench #Reasoned Rendering

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Intermediate

Zimu Lu, Houxing Ren et al., Feb 3, arXiv

This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.

#agentic coding #multi-agent systems #full-stack development

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Intermediate

Zhixue Fang, Xu He et al., Feb 3, arXiv

This paper introduces 3DiMo, a new way to control how people move in generated videos while leaving camera movement free to be steered by text.

#3D-aware motion #implicit motion encoder #motion tokens

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Intermediate

Azmine Toushik Wasi, Wahid Faisal et al., Feb 3, arXiv

SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.

#SpatiaLab #spatial reasoning #vision-language models

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Beginner

Jianhao Ruan, Zhihao Xu et al., Feb 3, arXiv

AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.

#agent orchestration #sub-agent-as-tools #four-tuple abstraction

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Intermediate

Ian Wu, Yuxiao Qu et al., Feb 3, arXiv

Reasoning Cache (RC) is a new way for AI to think in steps: it writes some thoughts, makes a short summary, throws away the long thoughts, and then keeps going using only the summary.

#Reasoning Cache #iterative decoding #summary-conditioned reasoning
