How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1)

#Total Variation Divergence

Rethinking the Trust Region in LLM Reinforcement Learning

Intermediate
Penghui Qi, Xiangxin Zhou et al. | Feb 4 | arXiv

The paper shows that the popular PPO method for training language models restricts updates to rare tokens too harshly and updates to very common tokens too loosely, which makes learning slow and unstable.
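
One way to see the imbalance described above is to look at how much a token's probability can actually move under a ratio-based limit. The sketch below is only an illustration, not the paper's method: it assumes the usual PPO-style band on the ratio p_new / p_old with eps = 0.2 (a value not given on this page), and the helper names `ratio_clip_band` and `max_abs_shift` are hypothetical. It compares a rare token with a very common one by the largest absolute probability change the band permits, which is the kind of quantity total variation divergence measures.

```python
def ratio_clip_band(p_old, eps=0.2):
    """Range of new probabilities whose ratio p_new / p_old stays in [1 - eps, 1 + eps].

    eps = 0.2 is an assumed, commonly used value; the upper end is capped at 1.0
    because a probability cannot exceed 1.
    """
    low = p_old * (1.0 - eps)
    high = min(1.0, p_old * (1.0 + eps))
    return low, high


def max_abs_shift(p_old, eps=0.2):
    """Largest absolute change in probability that stays inside the ratio band."""
    low, high = ratio_clip_band(p_old, eps)
    return max(p_old - low, high - p_old)


# Compare a rare token (p = 0.001) with a very common token (p = 0.9).
for p_old in (0.001, 0.9):
    low, high = ratio_clip_band(p_old)
    shift = max_abs_shift(p_old)
    print(f"p_old={p_old}: band=[{low:.4f}, {high:.4f}], max shift={shift:.4f}")
```

Under these assumptions, the rare token may move by at most about 0.0002 in absolute probability, while the common token may move by about 0.18, even though both obey the same ratio limit.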

#Reinforcement Learning #Proximal Policy Optimization #Trust Region
