🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers5

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#trust region

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Intermediate
Yuan Li, Bo Wang et al.Mar 5arXiv

BandPO is a new training method for large language models that keeps updates safe while letting the model freely explore smart, low-probability ideas.

#BandPO#PPO clipping#trust region

Not triaged yet

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Intermediate
Peter Chen, Xiaopeng Li et al.Dec 18arXiv

The paper studies why two opposite-sounding tricks in RL for reasoning—adding random (spurious) rewards and reducing randomness (entropy)—can both seem to help large language models think better.

#RLVR#Group Relative Policy Optimization#ratio clipping

Not triaged yet

Understanding and Improving Hyperbolic Deep Reinforcement Learning

Intermediate
Timo Klein, Thomas Lang et al.Dec 16arXiv

Reinforcement learning agents often see the world in straight, flat space (Euclidean), but many decision problems look more like branching trees that fit curved, hyperbolic space better.

#hyperbolic reinforcement learning#Hyperboloid#Poincaré Ball

Not triaged yet

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al.Dec 5arXiv

Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.

#reinforcement learning#PPO-clip#KL penalty

Not triaged yet

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al.Dec 4arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing#EtCon#TPSFT

Not triaged yet