Papers 1262

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Huatong Song, Lisheng Huang et al. · Feb 3 · arXiv

SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.

#SWE-Master #software engineering agent #long-horizon SFT


On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Intermediate

Shumin Wang, Yuexiang Xie et al. · Feb 3 · arXiv

The paper builds a simple, math-light rule to predict whether training makes a language model more open-minded (higher entropy) or more sure of itself (lower entropy).

#reinforcement fine-tuning #entropy dynamics #GRPO


MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

Intermediate

Ning Ding, Fangcheng Liu et al. · Feb 3 · arXiv

MeKi is a new way to grow a language model’s knowledge by using storage (ROM) instead of extra computation (FLOPs).

#MeKi #memory-based scaling #token-level experts


Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Intermediate

Rakshith Vasudev, Melisa Russak et al. · Feb 3 · arXiv

The paper shows that even if a model is great at predicting when an AI agent will fail, jumping in to “fix” the agent mid-task can still make things worse.

#LLM critic #execution-time intervention #disruption–recovery tradeoff


Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Intermediate

Dongwon Jo, Beomseok Kang et al. · Feb 3 · arXiv

This paper speeds up how AI models read very long texts by carefully choosing which words (tokens) to focus on at each step.

#Token Sparse Attention #Dynamic Token Coverage #Representation Drift


Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch

Intermediate

Hyunwoo Kim, Niloofar Mireshghallah et al. · Feb 3 · arXiv

The paper introduces PRIVASIS, a huge, fully synthetic dataset (1.4 million records) filled with realistic-looking private details, but created from scratch so it does not belong to any real person.

#synthetic dataset #privacy preservation #data sanitization


FASA: Frequency-aware Sparse Attention

Intermediate

Yifei Wang, Yueqi Wang et al. · Feb 3 · arXiv

FASA is a training-free method that makes large language models faster and lighter on memory by keeping only the most useful past tokens during decoding.

#FASA #Frequency-aware sparse attention #KV cache compression


Self-Hinting Language Models Enhance Reinforcement Learning

Intermediate

Baohao Liao, Hanze Dong et al. · Feb 3 · arXiv

When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.

#reinforcement learning #GRPO #self-hinting
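
The stall described in this summary can be illustrated with a small sketch. It assumes the commonly used group-relative advantage, (reward - group mean) / (group std + eps); the function name `group_advantages` and the `eps` value are illustrative choices, and the paper's self-hinting method itself is not shown here.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    eps is an assumed small constant that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group with mixed outcomes gives positive and negative advantages.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every attempt fails gives all-zero advantages,
# so the policy gradient carries no learning signal.
all_fail = group_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(all_fail)
```

When every reward in the group is identical, each advantage is exactly zero, which is the "nothing to compare" situation the summary refers to.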


Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Intermediate

Tianhe Wu, Ruibin Li et al. · Feb 3 · arXiv

The paper addresses a major weakness of fast image generators: they became quick but lost variety, producing similar pictures over and over.

#diffusion distillation #distribution matching distillation #mode collapse


Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning

Intermediate

Jiayao Mai, Bangyan Liao et al. · Feb 3 · arXiv

This paper shows that many hard math and AI problems can be solved with one shared idea called homotopy, where we move from an easy version of a problem to the real one step by step.

#homotopy continuation #predictor-corrector #reinforcement learning
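
The "easy version to real problem" idea in this summary can be sketched with classical homotopy continuation on a single equation. This is a minimal illustration, not the paper's learned method: the test polynomial, the function names, the trivial predictor (reuse the previous solution), and the Newton corrector are all assumptions chosen for clarity.

```python
def f(x):
    """Target problem: find a root of x^3 - 2x - 5."""
    return x**3 - 2.0 * x - 5.0

def df(x):
    return 3.0 * x**2 - 2.0

def g(x):
    """Easy problem with the known root x = 1."""
    return x - 1.0

def homotopy_solve(steps=20, newton_iters=8):
    """Track the root of H(x, t) = (1 - t) * g(x) + t * f(x) from t = 0 to t = 1."""
    x = 1.0  # exact root of the easy problem at t = 0
    for k in range(1, steps + 1):
        t = k / steps
        # Predictor: start from the solution found at the previous t.
        # Corrector: a few Newton iterations on H(., t).
        for _ in range(newton_iters):
            h = (1.0 - t) * g(x) + t * f(x)
            dh = (1.0 - t) * 1.0 + t * df(x)
            x -= h / dh
    return x

root = homotopy_solve()
print(root, f(root))
```

Each small increase in t changes the problem only slightly, so the previous solution is a good starting point for the corrector. In the paper's framing, the step-by-step schedule is the part handled with reinforcement learning.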


CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

Intermediate

Zhiyuan Yao, Yi-Kai Zhang et al. · Feb 3 · arXiv

Large language models learn better when we spend more practice time on the right questions at the right moments.

#Reinforcement Learning #RLVR #GRPO


LatentMem: Customizing Latent Memory for Multi-Agent Systems

Intermediate

Muxin Fu, Guibin Zhang et al. · Feb 3 · arXiv

LatentMem is a new memory system that helps teams of AI agents remember the right things for their specific jobs without overloading them with text.

#multi-agent systems #latent memory #role-aware memory

