Papers4

#stability

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Arnas Uselis, Andrea Dittadi et al.Feb 27arXiv

The paper asks a simple question: what must a vision model’s internal pictures (embeddings) look like if it can recognize new mixes of things it already knows?

#compositional generalization#linear representation hypothesis#orthogonal representations

Not triaged yet

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Intermediate

Shuo He, Lang Feng et al.Feb 11arXiv

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

#Kalman filter#importance sampling ratio#policy optimization

Not triaged yet

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Intermediate

Zhenghao Xu, Qin Lu et al.Feb 5arXiv

The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward.

#Policy Mirror Descent#KL regularization#chi-squared regularization

Not triaged yet

SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization

Intermediate

Maksim Afanasyev, Illarion IovFeb 2arXiv

SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.

#SLIME#preference optimization#DPO

Not triaged yet