Papers (4)

#LLM alignment

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

This paper introduces SecCoderX, a method for training code-generating models to write secure code without sacrificing the code's intended functionality.

#secure code generation | #reinforcement learning | #vulnerability reward model


Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Intermediate

Ziwen Xu, Chenyan Wu et al. | Feb 2 | arXiv

The paper shows that three popular ways to control language models—fine-tuning a few weights, LoRA, and activation steering—are actually the same kind of action: a dynamic weight update driven by a control knob.

#language model steering | #dynamic weight updates | #activation steering
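To make the "same kind of action" claim concrete, here is a toy sketch on a single linear layer y = Wx + b. It is not the paper's formalism: the matrices, the scale `alpha`, and all helper names are hypothetical and chosen only for illustration. Each intervention is written as "base parameters plus alpha times some delta", and the last two lines show that adding a steering vector to the layer's output gives the same result as updating its bias.

```python
# Toy illustration (assumed setup, not taken from the paper):
# three ways to control a linear layer y = W x + b, each expressed as
# "base parameters + alpha * delta", where alpha is the control knob.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add_scaled(M, D, alpha):
    """Return M + alpha * D, elementwise."""
    return [[m + alpha * d for m, d in zip(mr, dr)] for mr, dr in zip(M, D)]

def linear(W, b, x):
    """Apply the layer y = W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

W = [[1.0, 2.0], [3.0, 4.0]]   # base weights (hypothetical values)
b = [0.0, 0.0]                 # base bias
x = [1.0, 2.0]                 # input activation
alpha = 0.5                    # control knob shared by all three methods

# (1) Fine-tuning a few weights: a sparse delta touching one entry of W.
delta_sparse = [[0.0, 0.0], [0.0, 1.0]]
y_sparse = linear(add_scaled(W, delta_sparse, alpha), b, x)

# (2) LoRA: a low-rank delta B @ A (rank 1 here).
B = [[1.0], [2.0]]
A = [[1.0, 1.0]]
y_lora = linear(add_scaled(W, matmul(B, A), alpha), b, x)

# (3) Activation steering: add alpha * v to the layer's output ...
v = [1.0, -1.0]
y_steer_act = [y + alpha * vi for y, vi in zip(linear(W, b, x), v)]
# ... which, for this layer, equals a bias update b + alpha * v.
y_steer_bias = linear(W, [bi + alpha * vi for bi, vi in zip(b, v)], x)
```

In this toy setting, setting `alpha` to zero recovers the base layer in all three cases, which is the sense in which one scalar acts as a shared control knob. Whether the paper's unified view takes exactly this form is not stated in the summary above.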


ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Intermediate

Jie Xiao, Meng Chen et al. | Feb 2 | arXiv

ECHO-2 is a reinforcement learning training framework that keeps a small, central trainer busy while offloading the cheap, easily parallelized work (rollouts) to many low-cost machines spread around the world.

#ECHO-2 | #distributed rollouts | #bounded staleness


One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate

Hongru Cai, Yongqi Li et al. | Jan 26 | arXiv

Large language models are often aligned to one-size-fits-all preferences, but individual users differ, which motivates personalized alignment.

#personalized alignment | #reward modeling | #meta-learning
