Papers (1055)

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Yusong Lin, Haiyang Wang et al. · Feb 11 · arXiv

CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.

#agentic coding #command line interface #Dockerfile

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Intermediate

Qixing Zhou, Jiacheng Zhang et al. · Feb 11 · arXiv

FeatureBench is a new benchmark that tests AI coding agents on building real software features, not just fixing small bugs.

#FeatureBench #agentic coding #execution-based evaluation

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Intermediate

Yitian Gong, Kuangwei Chen et al. · Feb 11 · arXiv

This paper builds a new audio tokenizer, called MOSS-Audio-Tokenizer, that turns sound into tiny tokens the way text tokenizers turn sentences into words.

#audio tokenizer #causal transformer #residual vector quantization

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Intermediate

Chenlong Deng, Mengjie Deng et al. · Feb 11 · arXiv

Most image search systems judge each photo by itself, which fails when clues are split across many photos taken over time.

#context-aware image retrieval #multimodal agents #visual history exploration

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Intermediate

Guobin Shen, Chenxiao Zhao et al. · Feb 11 · arXiv

VESPO is a new, stable way to train language models with reinforcement learning even when training data comes from older or mismatched policies.

#VESPO #off-policy reinforcement learning #importance sampling

How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning

Intermediate

Jiahao Yuan, Yike Xu et al. · Feb 11 · arXiv

Decoder-only language models can be great at making user profiles (embeddings), but the attention mask, which controls how much of the sequence each token is allowed to see, changes how good those profiles are.

#decoder-only LLM #attention masking #causal attention

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Intermediate

Shuo He, Lang Feng et al. · Feb 11 · arXiv

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

#Kalman filter #importance sampling ratio #policy optimization

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Intermediate

Ailin Huang, Ang Li et al. · Feb 11 · arXiv

Step 3.5 Flash is a huge but efficient AI model that has 196 billion total parameters but activates only about 11 billion per token, so it stays capable while running fast.

#Sparse Mixture-of-Experts #Sliding-Window Attention #Head-wise Gated Attention

MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning

Intermediate

Chenhao Zhang, Yazhe Niu et al. · Feb 11 · arXiv

Pictures can hide deeper meanings, like a wilted plant meaning someone feels burned out; most AI models miss these hints.

#image metaphor understanding #image implication #visual reinforcement learning

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Intermediate

Leheng Sheng, Yongtao Zhang et al. · Feb 11 · arXiv

Long texts overwhelm many language models, which forget important bits and slow down as the context grows.

#gated recurrent memory #update gate #exit gate

Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

Intermediate

Kirill Pavlenko, Alexander Golubev et al. · Feb 10 · arXiv

The paper fixes a common mistake in training language models for multi-part tasks: giving the same reward signal to every token, even when different text parts aim at different goals.

#Blockwise Advantage Estimation #Outcome-Conditioned Baseline #Group Relative Policy Optimization

Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens

Intermediate

Weihao Liu, Dehai Min et al. · Feb 10 · arXiv

The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.

#latent tokens #chain-of-thought #context-prediction fusion
