How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)


Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Intermediate
Zhenghao Xu, Qin Lu et al. · Feb 5 · arXiv

The paper studies a simple way to train giant language models with reinforcement learning: replace a hard-to-compute term (the log-partition function) with something easy, the mean reward, and show that this swap acts as built-in (implicit) regularization (a toy sketch follows below).

#Policy Mirror Descent #KL regularization #chi-squared regularization
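
A minimal sketch of the approximation in a toy discrete setting, assuming a softmax-style PMD update; the variable names (`pi_t`, `eta`, `log_Z_hat`) are illustrative, not from the paper:

```python
import numpy as np

# Toy setting: one Policy Mirror Descent step over n discrete actions.
# The target policy is pi_t(a) * exp(eta * r(a)) / Z, and the intractable
# log-partition term log Z is replaced by the mean reward eta * E_pi[r].

rng = np.random.default_rng(0)
n = 8
pi_t = np.full(n, 1.0 / n)            # current policy over n actions
r = rng.normal(size=n)                # per-action rewards
eta = 0.5                             # step size / inverse temperature

log_Z = np.log(np.sum(pi_t * np.exp(eta * r)))  # exact log-partition
log_Z_hat = eta * np.dot(pi_t, r)               # mean-reward surrogate

# By Jensen's inequality log_Z >= log_Z_hat; the gap between the two is
# the quantity whose effect the paper analyzes as implicit regularization.
print(f"log Z exact  : {log_Z:.4f}")
print(f"log Z approx : {log_Z_hat:.4f}")
print(f"gap          : {log_Z - log_Z_hat:.4f}")
```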

Scaling Multiagent Systems with Process Rewards

Intermediate
Ed Li, Junyu Ren et al. · Jan 30 · arXiv

This paper teaches teams of AI agents to get better by scoring every move they make, not just the final answer (a minimal sketch follows below).

#multiagent reinforcement learning #process rewards #AI feedback
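
A minimal sketch of per-step scoring on a toy trajectory; the `Step` class, the agents, the heuristic scorer, and the mixing weight `beta` are all illustrative stand-ins, not the paper's actual components:

```python
from dataclasses import dataclass

# Toy trajectory of a multiagent system: each agent contributes one move.

@dataclass
class Step:
    agent: str
    action: str

def process_reward(step: Step) -> float:
    """Stand-in for a learned process reward model: score one move."""
    return 1.0 if "verify" in step.action else 0.5

def outcome_reward(final_answer: str, gold: str) -> float:
    """Classic outcome-only signal: did the team get the right answer?"""
    return 1.0 if final_answer == gold else 0.0

trajectory = [
    Step("planner", "decompose task"),
    Step("solver", "compute partial result"),
    Step("critic", "verify partial result"),
]

# Every move earns its own credit, plus the final outcome mixed in with
# an assumed weight; outcome-only training would use the last term alone.
beta = 0.5
total = sum(process_reward(s) for s in trajectory) \
        + beta * outcome_reward("42", "42")
print(f"total reward: {total}")
```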

FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation

Intermediate
Siyang He, Qiqi Wang et al. · Jan 30 · arXiv

Diffusion language models (dLLMs) can write text in any order, but common decoding methods still prefer left-to-right, which wastes their superpower; FourierSampler instead picks the decoding order using frequency-domain analysis (toy sketch below).

#diffusion language models #non-autoregressive generation #frequency-domain analysis
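
A toy illustration, not the paper's algorithm: one plausible reading of frequency-guided order selection, where per-position confidence is low-pass filtered and the highest-scoring positions are unmasked first. The cutoff and step size are invented for the demo:

```python
import numpy as np

# Instead of unmasking tokens left to right, look at the per-position
# confidence signal in the frequency domain and unmask positions where
# its low-frequency (globally informative) component is strongest.

rng = np.random.default_rng(1)
seq_len = 16
confidence = rng.random(seq_len)   # stand-in for model confidence scores

# Keep only the low-frequency components of the confidence signal.
spectrum = np.fft.rfft(confidence)
spectrum[4:] = 0                   # cutoff = 4 is an assumed choice
smoothed = np.fft.irfft(spectrum, n=seq_len)

k = 4                              # tokens to unmask this step (assumed)
order = np.argsort(-smoothed)[:k]
print("positions to decode this step:", sorted(order.tolist()))
```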

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate
Yibo Wang, Yongcheng Jing et al. · Jan 29 · arXiv

This paper shows a new way to help AI think through long problems faster: turn earlier text steps into small pictures the AI can reread (sketch below).

#vision-text compression #optical memory #iterative reasoning
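
A minimal sketch of the core trick, assuming Pillow for rendering; the helper `render_steps_as_image` and its layout constants are hypothetical, not the paper's pipeline:

```python
from PIL import Image, ImageDraw, ImageFont

# Compress earlier text reasoning steps into one small image that a
# vision-language model can "reread" as optical memory instead of
# keeping the full text in its context window.

def render_steps_as_image(steps: list[str], width: int = 512) -> Image.Image:
    """Render prior reasoning steps as one compact image."""
    line_height = 14
    img = Image.new("RGB", (width, line_height * (len(steps) + 1)), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    for i, step in enumerate(steps):
        draw.text((4, 4 + i * line_height), f"{i + 1}. {step}",
                  fill="black", font=font)
    return img

memory = render_steps_as_image([
    "Let x be the unknown quantity.",
    "From the constraint, 3x + 2 = 11.",
    "Therefore x = 3.",
])
memory.save("reasoning_memory.png")  # fed back to the model as an image
```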

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. · Jan 13 · arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early (minimal sketch below).

#RLVR #judge-then-generate #discriminative supervision
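
A minimal sketch of the judge-then-generate flow with hypothetical stand-ins; `judge`, `generate_candidates`, and the threshold are illustrative, not JudgeRLVR's actual interfaces:

```python
# Stage 1 trains the model as a verifier; stage 2 uses that judge to
# prune weak candidates before the generation stage learns from them.

def judge(question: str, answer: str) -> float:
    """Stand-in for the learned judge: returns P(answer is correct)."""
    return 0.9 if answer.strip() == "42" else 0.2

def generate_candidates(question: str) -> list[str]:
    """Stand-in for the policy's sampled candidate answers."""
    return ["41", "42", "forty-two"]

question = "What is 6 * 7?"
threshold = 0.5  # assumed pruning threshold

# Judge first: discard candidates the verifier scores poorly, so the
# generation-stage update only sees promising answers.
survivors = [a for a in generate_candidates(question)
             if judge(question, a) > threshold]
print("kept for the generation-stage update:", survivors)
```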