Papers (160)


#reinforcement learning

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Intermediate

Leheng Sheng, Yongtao Zhang et al. | Feb 11 | arXiv

Long texts overwhelm many language models, which forget important bits and slow down as the context grows.

#gated recurrent memory #update gate #exit gate

WorldCompass: Reinforcement Learning for Long-Horizon World Models

Beginner

Zehan Wang, Tengfei Wang et al. | Feb 9 | arXiv

WorldCompass uses reinforcement learning after pretraining to teach video world models to follow actions more accurately while keeping visual quality high.

#world models #reinforcement learning #clip-level rollout

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Beginner

Tiwei Bie, Maosong Cao et al. | Feb 9 | arXiv

LLaDA2.1 teaches a diffusion-style language model to write fast rough drafts and then fix its own mistakes by editing tokens it already wrote.

#discrete diffusion language model #editable decoding #token-to-token editing

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate

Tianyi Wu, Mingzhe Du et al. | Feb 7 | arXiv

This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.

#secure code generation #reinforcement learning #vulnerability reward model

POINTS-GUI-G: GUI-Grounding Journey

Intermediate

Zhongyin Zhao, Yuan Liu et al. | Feb 6 | arXiv

This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.

#GUI grounding #reinforcement learning #verifiable rewards

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Intermediate

Dongyang Chen, Chaoyang Wang et al. | Feb 5 | arXiv

V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.

#V-Retrver #multimodal retrieval #agentic reasoning

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Intermediate

Haozhen Zhang, Haodong Yue et al. | Feb 5 | arXiv

BudgetMem is a way for AI helpers to build and use memory on the fly, picking how much thinking to spend so answers are both good and affordable.

#runtime memory extraction #budget-tier routing #reinforcement learning

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Intermediate

Junxiao Liu, Zhijun Wang et al. | Feb 5 | arXiv

TRIT is a new training method that teaches AI to translate and think at the same time so it can solve hard problems in many languages without extra helper models.

#multilingual reasoning #translation-reasoning integration #self-translation

Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging

Intermediate

Jia-peng Zhang, Cheng-Feng Pu et al. | Feb 4 | arXiv

Rigging 3D characters is a bottleneck: making bones and skin weights by hand is slow and tricky, and past automatic tools often guess the skin weights poorly.

#auto-rigging #skinning weights #SkinTokens

ERNIE 5.0 Technical Report

Intermediate

Haifeng Wang, Hua Wu et al. | Feb 4 | arXiv

ERNIE 5.0 is a single giant model that can read and create text, images, video, and audio by predicting the next pieces step by step, like writing a story one line at a time.

#ERNIE 5.0 #unified autoregressive model #mixture-of-experts

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Intermediate

Yansong Ning, Jun Fang et al. | Feb 4 | arXiv

Agent-Omit teaches AI agents to skip unneeded thinking and old observations, cutting tokens while keeping accuracy high.

#LLM agents #reinforcement learning #agentic RL

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Intermediate

Ian Wu, Yuxiao Qu et al. | Feb 3 | arXiv

Reasoning Cache (RC) is a new way for AI to think in steps: it writes some thoughts, makes a short summary, throws away the long thoughts, and then keeps going using only the summary.

#Reasoning Cache #iterative decoding #summary-conditioned reasoning
