Papers (5)

#Mixture-of-Experts (MoE)

Qwen3-Coder-Next Technical Report

Ruisheng Cao, Mouxiang Chen et al., Feb 28, arXiv

Qwen3-Coder-Next is an open-weight coding model that activates only 3B of its 80B total parameters at a time, which keeps inference fast while retaining the capacity of a much larger model.

#Qwen3-Coder-Next #agentic training #verifiable coding tasks

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Intermediate

Guobin Shen, Chenxiao Zhao et al., Feb 11, arXiv

VESPO is a new, stable way to train language models with reinforcement learning even when training data comes from older or mismatched policies.

#VESPO #off-policy reinforcement learning #importance sampling

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Intermediate

Fanfan Liu, Youyang Yin et al., Feb 5, arXiv

The paper shows that popular RLVR methods for training language and vision-language models carry an implicit bias toward certain response lengths, which can hurt learning.

#LUSPO #RLVR #GRPO

SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Intermediate

Qifan Yu, Xinyu Ma et al., Feb 2, arXiv

This paper shows how to safely make a neural network wider in the middle of training without destabilizing it.

#Progressive Learning #Width Expansion #RMS scale

Scaling Embeddings Outperforms Scaling Experts in Language Models

Intermediate

Hong Liu, Jiaqi Zhang et al., Jan 29, arXiv

The paper shows that growing the embedding part of a language model (especially with n-gram embeddings) can beat adding more MoE experts once the model passes a certain sparsity "sweet spot."

#N-gram Embedding #Mixture-of-Experts (MoE) #Embedding Scaling
