๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

Filtered by tag: #importance sampling

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Intermediate
Zhenghao Xu, Qin Lu et al. · Feb 5 · arXiv

The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward (see the sketch below).

#Policy Mirror Descent · #KL regularization · #chi-squared regularization
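
A minimal numerical sketch of the substitution described above, assuming a KL-regularized mirror-descent update of the form pi_new ∝ pi_old · exp(r/β); the symbol β, the sampling setup, and all variable names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0                        # KL-regularization strength (assumed symbol)
rewards = rng.normal(size=1024)   # rewards of completions sampled from pi_old

# Exact log-partition over the Monte Carlo sample (log-sum-exp for stability):
#   log Z = log E_{a ~ pi_old}[exp(r(a) / beta)]
log_z_exact = np.logaddexp.reduce(rewards / beta) - np.log(len(rewards))

# The cheap surrogate from the summary: the mean reward in place of log Z.
log_z_approx = rewards.mean() / beta

# Per-sample update weights under each normalizer; with the surrogate the
# weight reduces to the baseline-subtracted reward (r - mean(r)) / beta.
w_exact = rewards / beta - log_z_exact
w_approx = rewards / beta - log_z_approx

print(f"log Z exact : {log_z_exact:.4f}")
print(f"log Z approx: {log_z_approx:.4f}")  # <= exact, by Jensen's inequality
print(f"max weight gap: {np.abs(w_exact - w_approx).max():.4f}")
```

By Jensen's inequality the mean-reward surrogate lower-bounds the true log-partition, and subtracting it turns each sample's weight into a baseline-subtracted reward, which is where an implicit regularization effect can enter.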

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Intermediate
Zhaopeng Qiu, Shuang Yu et al. · Jan 26 · arXiv

The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by storing numbers in a compact 8-bit floating-point format (FP8) without breaking training (see the sketch below).

#FP8 quantization · #LLM reinforcement learning · #KV-cache
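
A minimal sketch of the core FP8 ingredient, per-tensor quantization with dynamic scaling, assuming a recent PyTorch (2.1+) that ships the torch.float8_e4m3fn dtype; the function names and scaling recipe here are illustrative assumptions, not the paper's actual stack.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def fp8_quantize(x: torch.Tensor):
    """Scale x so its max magnitude maps to E4M3_MAX, then cast to FP8."""
    amax = x.abs().max().clamp(min=1e-12)
    scale = E4M3_MAX / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def fp8_dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Undo the scaling; values return with FP8 rounding error."""
    return x_fp8.to(torch.float32) / scale

x = torch.randn(4, 8)
x_fp8, scale = fp8_quantize(x)
x_back = fp8_dequantize(x_fp8, scale)
print("max abs error:", (x - x_back).abs().max().item())
```

Scaling so the largest magnitude lands at the format's maximum uses FP8's few exponent bits as efficiently as possible before the cast, which is what keeps small values from flushing to zero.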

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Intermediate
Eddie Landesberg, Manjari Narayan · Dec 11 · arXiv

LLM judges are cheap but biased; without calibration they can completely flip which model looks best (see the sketch below).

#LLM-as-judge · #calibration · #isotonic regression
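
A toy end-to-end sketch of the calibration idea, assuming the common recipe of fitting isotonic regression on a small human-labeled slice and then applying the calibrated judge at scale; the data is synthetic and every name here is an illustrative assumption, not the paper's actual protocol.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Small "oracle slice": items with both a raw judge score and a trusted
# human label (1 = good response, 0 = bad).
judge_scores = rng.uniform(0, 1, size=200)
p_good = np.clip(judge_scores ** 2, 0, 1)   # a judge that is optimistic at the low end
human_labels = rng.binomial(1, p_good)

# Fit a monotone map from raw judge score -> calibrated P(good).
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(judge_scores, human_labels)

# Apply to a large unlabeled pool scored only by the cheap judge.
pool_scores = rng.uniform(0, 1, size=10_000)
calibrated = iso.predict(pool_scores)

print("raw mean score       :", pool_scores.mean().round(3))
print("calibrated mean score:", calibrated.mean().round(3))  # what gets reported
```

Isotonic regression only assumes the judge's ordering is roughly right, not its scale, so it can correct a systematic optimism or pessimism bias without imposing a parametric form.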