Papers (1262)

Exploring Reasoning Reward Model for Agents

Kaixuan Fan, Kaituo Feng et al. · Jan 29 · arXiv

The paper improves how AI agents are trained by grading not just their final answers but also how they reason and use tools along the way.

#Agentic Reinforcement Learning#Reasoning Reward Model#Process Supervision

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Intermediate

Haozhe Xie, Beichen Wen et al. · Jan 29 · arXiv

DynamicVLA is a small and fast robot brain that sees, reads, and acts while things are moving.

#Dynamic object manipulation#Vision-Language-Action#Continuous inference

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

Intermediate

Ajay Patel, Colin Raffel et al. · Jan 29 · arXiv

Large language models usually learn by guessing the next word, then get a tiny bit of instruction-following practice; this paper flips that by turning massive web documents into instruction-and-answer pairs at huge scale.

#FineInstructions#synthetic instruction–answer pairs#instruction-tuning pre-training

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Intermediate

Anthony Chen, Naomi Ken Korem et al. · Jan 29 · arXiv

This paper shows a simple, single-model way to dub videos so that the new voice and the lip movements stay naturally in sync.

#video dubbing#audio-visual diffusion#joint generation

ECO: Quantized Training without Full-Precision Master Weights

Intermediate

Mahdi Nikdan, Amir Zandieh et al. · Jan 29 · arXiv

Training big AI models uses a lot of memory because most methods still keep a hidden full-precision copy of the weights, called master weights.

#quantized training#master weights#error feedback

Latent Adversarial Regularization for Offline Preference Optimization

Intermediate

Enyi Jiang, Yibo Jacky Zhang et al. · Jan 29 · arXiv

This paper introduces GANPO, a new way to train language models from human preferences by guiding the model using its hidden thoughts (latent space) instead of just its visible words (token space).

#GANPO#latent space regularization#offline preference optimization

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate

Yibo Wang, Yongcheng Jing et al. · Jan 29 · arXiv

This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.

#vision-text compression#optical memory#iterative reasoning

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Intermediate

Wenxuan Huang, Yu Zeng et al. · Jan 29 · arXiv

The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.

#multimodal deep research#visual question answering#ReAct reasoning

MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources

Intermediate

Baorui Ma, Jiahui Yang et al. · Jan 29 · arXiv

Metric Anything is a new way to teach AI real, ruler-like distances (metric depth) from very mixed and noisy 3D data.

#metric depth estimation#sparse metric prompt#monocular depth

PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction

Intermediate

Changjian Jiang, Kerui Ren et al. · Jan 29 · arXiv

PLANING is a new way to build 3D worlds from a single moving camera by combining two kinds of pieces: sharp triangles for shape and soft Gaussians for appearance.

#Streaming 3D Reconstruction#Triangle Primitives#Neural Gaussians

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Intermediate

Johannes Kirmayr, Lukas Stappen et al. · Jan 29 · arXiv

CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.

#LLM agents#benchmarking#consistency

Causal World Modeling for Robot Control

Intermediate

Lin Li, Qihang Zhang et al. · Jan 29 · arXiv

Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.

#robot world model#autoregressive diffusion#causal masking
