Papers (1055)

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Fanfan Liu, Youyang Yin et al. · Feb 5 · arXiv

The paper finds that popular RLVR (reinforcement learning with verifiable rewards) methods for training language and vision-language models carry a hidden bias toward certain response lengths, which can hurt learning.

#LUSPO · #RLVR · #GRPO
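One commonly discussed source of length bias in GRPO-style objectives is how the token-level loss is normalized. The sketch below is a simplified illustration under that assumption, not the paper's own analysis or its LUSPO fix. The helper `per_token_weights` and its two mode names are hypothetical. It compares the weight each token's loss receives when the loss is averaged within each response first (`seq_mean`) versus over all tokens in the group at once (`token_mean`).

```python
def per_token_weights(lengths, mode):
    """Weight each token's loss receives, for one group of sampled responses.

    lengths: number of tokens in each response of the group.
    mode:
      "seq_mean"   - average over tokens inside each response, then over
                     responses (the per-response normalization used by GRPO).
      "token_mean" - average over every token in the group at once.
    """
    num_responses = len(lengths)
    if mode == "seq_mean":
        # Each response contributes 1/num_responses in total, spread over its
        # own tokens, so each token in a short response weighs more.
        return [1.0 / (num_responses * n) for n in lengths]
    if mode == "token_mean":
        # Every token gets the same weight regardless of response length.
        total_tokens = sum(lengths)
        return [1.0 / total_tokens for _ in lengths]
    raise ValueError(f"unknown mode: {mode!r}")


lengths = [10, 40]  # toy group: one short and one long response
print(per_token_weights(lengths, "seq_mean"))    # [0.05, 0.0125]
print(per_token_weights(lengths, "token_mean"))  # [0.02, 0.02]
```

Under `seq_mean`, each token of the 10-token response carries four times the weight of a token in the 40-token response. The same per-token advantage therefore pushes short and long responses unequally, which is one way a normalization choice can end up favoring particular lengths.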

Semantic Search over 9 Million Mathematical Theorems

Intermediate

Luke Alexander, Eric Leonen et al. · Feb 5 · arXiv

This paper builds a "Google for theorems": a semantic search engine that retrieves individual theorems, lemmas, and propositions rather than just whole papers.

#semantic theorem search · #mathematical information retrieval · #dense retrieval
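Dense retrieval, one of the listed techniques, typically embeds each query and each indexed statement as a vector and ranks statements by similarity. The toy sketch below assumes precomputed embeddings and ranks them by cosine similarity. In a real system the vectors would come from a learned text encoder. The theorem labels, vectors, and the `search` helper are all hypothetical and are not taken from the paper's system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, index, k=2):
    """Return the labels of the k indexed statements most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), label) for label, vec in index.items()),
        reverse=True,
    )
    return [label for _, label in scored[:k]]


# Hypothetical index: one embedding per individual statement, not per paper.
index = {
    "Lemma 2.1": [1.0, 0.0, 0.0],
    "Theorem 3.4": [0.8, 0.6, 0.0],
    "Proposition 5.2": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]  # hypothetical embedding of a user's query
print(search(query, index))  # ['Lemma 2.1', 'Theorem 3.4']
```

Indexing one vector per statement, rather than per paper, is what lets a query land on a specific lemma instead of the document that contains it.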

SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers

Intermediate

Keyang Xuan, Pengda Wang et al. · Feb 4 · arXiv

This paper builds SocialVeil, a test environment in which AI language agents must communicate with one another under imperfect conditions, such as vague or ambiguous messages, rather than ideal ones.

#social intelligence · #communication barriers · #semantic vagueness

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Intermediate

Sidi Lu, Zhenwen Liang et al. · Feb 4 · arXiv

Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.

#Locas · #parametric memory · #test-time training

Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Intermediate

Changdae Oh, Seongheon Park et al. · Feb 4 · arXiv

This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.

#uncertainty quantification · #LLM agents · #interactive AI

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Intermediate

Pingyue Zhang, Zihan Huang et al. · Feb 4 · arXiv

This paper asks a simple question with big consequences: can today’s AI models actively explore a new space and build a trustworthy internal map of it?

#active exploration · #cognitive map · #spatial belief

Reinforced Attention Learning

Intermediate

Bangzheng Li, Jianmo Ni et al. · Feb 4 · arXiv

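This paper trains multimodal LLMs to attend better by using reinforcement learning to optimize where the model places its attention (an attention policy), not just which words it generates.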

#Reinforced Attention Learning · #attention policy · #multimodal LLM

Rethinking the Trust Region in LLM Reinforcement Learning

Intermediate

Penghui Qi, Xiangxin Zhou et al. · Feb 4 · arXiv

The paper shows that the trust region used by the popular PPO method for training language models is too strict with rare (low-probability) tokens and too lenient with very common (high-probability) tokens, which makes learning slow and unstable.

#Reinforcement Learning · #Proximal Policy Optimization · #Trust Region
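The rare-versus-common asymmetry can be seen in PPO's standard ratio clipping, which keeps the probability ratio π_new / π_old within [1 − ε, 1 + ε]. The sketch below takes a simplified view: it ignores that PPO's clipped objective only stops the gradient on the side favored by the advantage. It computes the range of new probabilities a token can reach. `clip_interval` is a hypothetical helper, and ε = 0.2 is a commonly used value, not one taken from the paper.

```python
def clip_interval(p_old, eps=0.2):
    """Range of new probabilities a token can take while the PPO ratio
    p_new / p_old stays inside [1 - eps, 1 + eps] (capped at 1.0)."""
    low = p_old * (1 - eps)
    high = min(1.0, p_old * (1 + eps))
    return low, high


for p_old in (0.01, 0.5, 0.9):
    low, high = clip_interval(p_old)
    # The allowed absolute change scales with p_old, so rare tokens barely move.
    print(f"p_old={p_old:.2f}: p_new in [{low:.4f}, {high:.4f}], width {high - low:.4f}")
```

With ε = 0.2, a token at 1% probability can move by only 0.4 percentage points in total. A token at 90% can fall all the way to 72%, a far larger shift in probability mass. In simplified form, this is the pattern the summary describes: clipping is strict with rare tokens and lenient with common ones.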

Privileged Information Distillation for Language Models

Intermediate

Emiliano Penaloza, Dheeraj Vattikonda et al. · Feb 4 · arXiv

The paper shows how to train a language model with extra hints (privileged information) that are available only during training, so that it still performs well later without any hints.

#Privileged Information · #Knowledge Distillation · #π-Distill

Horizon-LM: A RAM-Centric Architecture for LLM Training

Intermediate

Zhengqing Yuan, Lichao Sun et al. · Feb 4 · arXiv

Horizon-LM flips the usual training setup. It keeps all persistent model state, such as parameters and optimizer states, in the host's CPU RAM and uses the GPU only as a fast, temporary compute engine.

#Horizon-LM · #memory-centric training · #CPU-master GPU-template

Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging

Intermediate

Jia-peng Zhang, Cheng-Feng Pu et al. · Feb 4 · arXiv

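Rigging 3D characters is a bottleneck: creating bones and skin weights by hand is slow and tricky, and earlier automatic tools often predict skin weights poorly. The paper proposes SkinTokens, a learned compact representation of skin weights that lets rigging be handled by a single, unified autoregressive model.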

#auto-rigging · #skinning weights · #SkinTokens

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Intermediate

Yue Ding, Yiyan Ji et al. · Feb 4 · arXiv

OmniSIFT is a new method for compressing audio and video tokens that treats the two modalities differently (modality-asymmetric). This lets omni-modal language models run faster without losing important details.

#Omni-LLM · #token compression · #modality-asymmetric
