How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)


LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Intermediate
Alexander Samarin, Sergei Krutikov et al. · Feb 27 · arXiv

Speculative decoding speeds up big language models by letting a small helper model guess several next words and having the big model check them all at once.

#speculative decoding · #acceptance rate · #LK losses
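The guess-and-check loop this summary describes can be sketched in a few lines: the helper drafts a token from its distribution q, and the big model accepts it with probability min(1, p/q), the standard speculative-sampling verification rule. The vocabulary and probabilities below are illustrative toys, not from the paper (which proposes losses that directly raise this acceptance rate):

```python
import random

random.seed(0)

def speculative_accept(p_target, q_draft, token):
    """Accept a drafted token with probability min(1, p/q),
    the standard speculative-sampling verification rule."""
    ratio = p_target[token] / q_draft[token]
    return random.random() < min(1.0, ratio)

# Toy next-token distributions over a 3-word vocabulary (illustrative numbers).
q_draft  = {"cat": 0.6, "dog": 0.3, "fox": 0.1}   # small helper model
p_target = {"cat": 0.5, "dog": 0.4, "fox": 0.1}   # big model being sped up

drafted = max(q_draft, key=q_draft.get)           # helper's greedy guess: "cat"
accepted = speculative_accept(p_target, q_draft, drafted)
```

The closer the helper's guesses track the big model (ratio near 1), the more tokens get accepted per verification pass, which is exactly the quantity the paper's LK losses target.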

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Intermediate
Xiaotong Ji, Rasul Tutunov et al. · Feb 20 · arXiv

Decoding (how a language model picks the next word) isn’t a bag of tricks; it’s a clean optimisation problem over probabilities.

#decoding as optimisation · #probability simplex · #softmax sampling
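As one concrete instance of the decoders the paper unifies, top-p (nucleus) sampling keeps the smallest set of high-probability tokens whose mass exceeds p and renormalises over it. A minimal sketch with a toy distribution (not the paper's notation):

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalise so the kept set sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for tok, pr in ranked:
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept.items()}

# Toy next-token distribution: the long tail ("xyzzy") gets cut off.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyzzy": 0.05}
filtered = top_p_filter(probs, p=0.9)
```

The paper's point is that this truncate-and-renormalise step is not an ad-hoc trick: it can be derived as the solution of an optimisation problem over the probability simplex.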

Unified Latents (UL): How to train your latents

Intermediate
Jonathan Heek, Emiel Hoogeboom et al. · Feb 19 · arXiv

Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.

#Unified Latents · #diffusion prior · #diffusion decoder

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Intermediate
Guobin Shen, Chenxiao Zhao et al. · Feb 11 · arXiv

VESPO is a new, stable way to train language models with reinforcement learning even when training data comes from older or mismatched policies.

#VESPO · #off-policy reinforcement learning · #importance sampling

The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

Intermediate
Chenxu Wang, Chaozhuo Li et al. · Feb 10 · arXiv

The paper shows a three-way no-win situation: an AI society cannot be closed off, keep learning forever, and stay perfectly safe for humans all at the same time.

#self-evolving AI · #multi-agent systems · #AI safety

Effective Reasoning Chains Reduce Intrinsic Dimensionality

Beginner
Archiki Prasad, Mandar Joshi et al. · Feb 9 · arXiv

The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?

#intrinsic dimensionality · #chain-of-thought · #LoRA

Rethinking Selective Knowledge Distillation

Intermediate
Almog Tavor, Itay Ebenspanger et al. · Feb 1 · arXiv

The paper studies how to teach a smaller language model using a bigger one by only focusing on the most useful bits instead of everything.

#knowledge distillation · #selective distillation · #student entropy
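The "only the most useful bits" idea can be sketched as a token-selective KL loss: distil the teacher's distribution into the student only at positions passing a selection test. The test used below (student entropy above a threshold, i.e. distil where the student is uncertain) is an assumption keyed off the #student entropy tag, not necessarily the paper's exact rule, and all numbers are illustrative:

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def kl(p, q):
    """Forward KL divergence KL(p || q) between two probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Per-token distributions over a tiny 3-word vocabulary (illustrative numbers).
teacher = [[0.7, 0.20, 0.10], [0.40, 0.35, 0.25]]
student = [[0.5, 0.30, 0.20], [0.34, 0.33, 0.33]]

# Selective distillation: sum the teacher->student KL only over tokens
# where the student is uncertain (high entropy). The threshold is an
# assumed illustration, not a value from the paper.
THRESHOLD = 1.05
loss = sum(kl(t, s) for t, s in zip(teacher, student)
           if entropy(s) > THRESHOLD)
```

Here only the second token (near-uniform student, entropy ≈ log 3) contributes to the loss; the first token, where the student is already confident, is skipped.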

See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Intermediate
Shuoshuo Zhang, Yizhen Zhang et al. · Dec 26 · arXiv

The paper teaches vision-language models (AIs that look and read) to pay attention to the right picture parts without needing extra tools during answering time.

#BiPS · #perceptual shaping · #vision-language models

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Intermediate
Yang Fei, George Stoica et al. · Dec 12 · arXiv

The paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.

#video diffusion · #structure-preserving motion · #SAM2