How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#probability simplex

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Intermediate
Yuan Li, Bo Wang et al. · Mar 5 · arXiv

BandPO is a new reinforcement-learning training method for large language models that keeps policy updates bounded, bridging trust-region methods and PPO-style ratio clipping, while still letting the model explore promising low-probability options.

#BandPO #PPO clipping #trust region
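As background for the "ratio clipping" in the title, here is a minimal sketch of the standard PPO clipped surrogate objective. It is not BandPO's probability-aware bound, whose exact form is not given in this listing. The helper `max_rewarded_prob` (a hypothetical name) illustrates one arithmetic point behind the exploration concern: with a fixed upper clip of 1 + eps, the objective stops rewarding a positive-advantage token once its probability exceeds p_old * (1 + eps), so a rare token gains far less absolute probability than a common one.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """Batch mean of the standard PPO clipped surrogate (higher is better).

    logp_new / logp_old: log-probabilities of the sampled tokens under the
    current and the old (behaviour) policy; advantages: per-token estimates.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def max_rewarded_prob(p_old, eps=0.2):
    """Hypothetical helper: the largest new probability that a
    positive-advantage token is still pushed towards under a fixed
    upper clip of 1 + eps (capped at 1)."""
    return min(1.0, p_old * (1.0 + eps))


# A rare token and a common token get very different absolute headroom.
for p in (0.01, 0.5):
    print(p, "->", max_rewarded_prob(p))
```

With eps = 0.2, a token at probability 0.01 is capped near 0.012, while one at 0.5 can reach 0.6. Methods that adapt the bound to the token's probability target exactly this asymmetry.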


Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Intermediate
Xiaotong Ji, Rasul Tutunov et al. · Feb 20 · arXiv

Decoding (how a language model picks the next token) is not a bag of ad hoc tricks but a clean optimisation problem over the probability simplex, covering Top-K, Top-P (nucleus) and Best-of-K samplers.

#decoding as optimisation #probability simplex #softmax sampling
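To make "decoding over the probability simplex" concrete, here is a minimal sketch of the standard Top-K and Top-P (nucleus) truncation rules. Each maps one next-token distribution to another point on the simplex by zeroing some tokens and renormalising. It does not reproduce the paper's optimisation formulation or its Best-of-K sampler, and the function names `top_k`, `top_p` and `sample` are hypothetical.

```python
import random


def _renormalise(probs, keep):
    """Zero out tokens not in `keep` and rescale so the result sums to 1."""
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]


def top_k(probs, k):
    """Keep the k most probable tokens (ties broken by index order)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, set(order[:k]))


def top_p(probs, p):
    """Keep the smallest most-probable prefix whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return _renormalise(probs, keep)


def sample(probs, rng=random):
    """Draw one token index from a distribution on the simplex."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print(top_k(probs, 2))
print(top_p(probs, 0.75))
print(sample(top_k(probs, 2), random.Random(0)))
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05], `top_k(probs, 2)` keeps the first two tokens and rescales them to 0.625 and 0.375. Sampling from the truncated vector can then only return index 0 or 1.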
