How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1,252)


Heterogeneous Agent Collaborative Reinforcement Learning

Intermediate
Zhixia Zhang, Zixuan Huang et al. · Mar 3 · arXiv

This paper introduces HACRL, a way for different kinds of AI agents to learn together during training but still work alone during use.

#HACRL #HACPO #heterogeneous agents

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Beginner
Ziwen Xu, Kewei Xu et al. · Mar 3 · arXiv

Large language models can act unpredictably in sensitive places like schools, hospitals, and customer support, so we need reliable ways to guide how they talk and behave.

#LLM controllability #behavioral granularity #hierarchical evaluation

Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

Intermediate
Jiahao Lu, Jiayi Xu et al. · Mar 3 · arXiv

Track4World is a fast, feedforward AI that can follow the 3D path of every pixel in a video using just one camera.

#dense 3D tracking #scene flow #2D-to-3D correlation

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

Intermediate
Jiejun Tan, Zhicheng Dou et al. · Mar 3 · arXiv

MemSifter is a smart helper that picks the right memories for a big AI model so the model doesn't have to read everything itself.

#long-term memory #LLM retrieval #proxy model

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Intermediate
Liu Yang, Zeyu Nie et al. · Mar 3 · arXiv

ParEVO teaches AI to write fast, safe parallel code for messy, irregular data like big graphs and uneven trees.

#ParEVO #ParlayLib #irregular parallelism

PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

Intermediate
Rituraj Sharma, Weiyuan Chen et al. · Mar 3 · arXiv

PRISM is a new way to help AI think through hard problems by checking each step, not just the final answer.

#DEEPTHINK #Process Reward Model #step-level verification
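The step-level idea can be sketched in a few lines: score every intermediate reasoning step with a process reward model and prefer the chain whose weakest step is strongest. This is a minimal illustration, not PRISM's actual method; the reward values and the min-aggregation rule below are assumptions made for the example.

```python
# Hypothetical sketch of process-reward selection: judge each
# reasoning step, not just the final answer, and pick the chain
# with the strongest weakest step. Rewards here are made up.

def chain_score(step_rewards):
    """Aggregate per-step rewards; min is a conservative choice."""
    return min(step_rewards)

def pick_best_chain(chains):
    """chains: list of (answer, [per-step reward]) candidates."""
    return max(chains, key=lambda c: chain_score(c[1]))

candidates = [
    ("42", [0.9, 0.2, 0.8]),   # one shaky middle step
    ("41", [0.7, 0.7, 0.7]),   # uniformly solid reasoning
]
best_answer, _ = pick_best_chain(candidates)
print(best_answer)  # prints "41": no weak step beats a high average
```

An outcome-only judge would happily accept the first chain; step-level scoring catches the shaky middle step.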

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Intermediate
Yichen Liu, Donghao Zhou et al. · Mar 2 · arXiv

HiFi-Inpaint is a new AI method that fills a missing area in a photo of a person by inserting a specific product, while keeping tiny details like logos, textures, and small text crisp.

#reference-based inpainting #high-frequency map #Shared Enhancement Attention

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Beginner
Valentin Lacombe, Valentin Quesnel et al. · Mar 2 · arXiv

Reasoning Core is a tool that automatically creates a huge variety of logic and math puzzles, checks every answer with real solvers, and lets you smoothly dial the difficulty up or down.

#procedural data generation #symbolic reasoning #PDDL planning

Tool Verification for Test-Time Reinforcement Learning

Intermediate
Ruotong Liao, Nikolai Röhrich et al. · Mar 2 · arXiv

The paper fixes a big flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.

#test-time reinforcement learning #verification-weighted voting #tool verification
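The failure mode above (a confident wrong majority rewarding itself) is exactly what verification-weighted voting targets. The sketch below weights each sampled answer's vote by a verifier score instead of counting raw agreement; the scores and the sum-of-weights rule are illustrative assumptions, not the paper's exact scheme.

```python
from collections import defaultdict

# Hypothetical sketch of verification-weighted voting: each sampled
# answer contributes its verifier score rather than one raw vote,
# so a large cluster of unverifiable answers can no longer win.
def weighted_vote(samples):
    """samples: list of (answer, verifier_score in [0, 1])."""
    totals = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score
    return max(totals, key=totals.get)

samples = [
    ("wrong", 0.1), ("wrong", 0.1), ("wrong", 0.1),  # popular, fails the tool check
    ("right", 0.9), ("right", 0.8),                   # fewer votes, but verified
]
print(weighted_vote(samples))  # prints "right"
```

With plain majority voting the three agreeing wrong answers would win 3-2; weighting by verification flips the outcome.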

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Intermediate
Yiqi Lin, Guoqiang Liang et al. · Mar 2 · arXiv

Kiwi-Edit is a new video editor that follows your words and also copies looks from a picture you give it.

#reference-guided video editing #instruction-based editing #multimodal large language model

SageBwd: A Trainable Low-bit Attention

Beginner
Jintao Zhang, Marco Chen et al. · Mar 2 · arXiv

SageBwd is a way to make the Transformer's attention both fast and trainable by doing most big multiplications in 8-bit instead of full precision.

#SageBwd #low-bit attention #INT8 training
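The 8-bit idea can be illustrated with a tiny symmetric quantization of a dot product: scale floats into the int8 range, multiply as integers, then scale back. This is a pure-Python toy under assumed per-tensor scaling; SageBwd's actual per-block scales and attention kernels are far more involved.

```python
# Toy symmetric INT8 quantization: one scale per tensor maps
# floats into [-127, 127], the multiply-accumulate runs on
# integers, and the result is dequantized at the end.

def quantize_int8(xs):
    """Return (int8 values, scale) for a list of floats."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def int8_dot(a, b):
    """Dot product computed in int8, dequantized back to float."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation
    return acc * sa * sb

a = [0.5, -1.2, 0.3]
b = [1.0, 0.4, -0.7]
exact = sum(x * y for x, y in zip(a, b))
print(abs(exact - int8_dot(a, b)) < 0.01)  # small quantization error
```

The integer accumulation is where the speedup comes from on real hardware; the cost is the small rounding error visible in the comparison above.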

Recursive Think-Answer Process for LLMs and VLMs

Intermediate
Byung-Kwan Lee, Youngchae Chee et al. · Mar 2 · arXiv

This paper teaches AI models to judge how sure they are about an answer and to think again if they are not sure.

#Recursive Think–Answer #Confidence-guided reasoning #Reinforcement learning for LLMs