๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#mathematical reasoning

Beyond Imitation: Reinforcement Learning for Active Latent Planning

Intermediate
Zhi Zheng, Wee Sun LeeJan 29arXiv

The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.

#latent reasoning#chain-of-thought#variational autoencoder

Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

Intermediate
Wei Du, Shubham Toshniwal et al.Dec 17arXiv

Nemotron-Math is a giant math dataset with 7.5 million step-by-step solutions created in three thinking styles and with or without Python help.

#mathematical reasoning#long-context fine-tuning#multi-mode supervision

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al.Dec 5arXiv

Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the โ€œsafe zone,โ€ causing unstable learning.

#reinforcement learning#PPO-clip#KL penalty

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Intermediate
Monishwaran Maheswaran, Rishabh Tiwari et al.Dec 4arXiv

ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.

#speculative decoding#step-level speculative decoding#advantage-aware routing