How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)


Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Intermediate
Shiting Huang, Zecheng Li et al. Β· Feb 10 Β· arXiv

The paper teaches large language models to do what good students do: find where they went wrong, turn that lesson into a rule, and remember it for next time.

#Reinforcement Learning with Verifiable Rewards Β· #RLVR Β· #Meta-Experience Learning
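The "find the mistake, turn it into a rule, remember it" loop can be sketched in a few lines. This is an illustrative toy only, not the paper's method: the `ExperienceMemory` class and its methods are hypothetical names, and real systems would distill lessons with a model rather than a string template.

```python
# Toy sketch of a "meta-experience" memory: mistakes become reusable
# rules that are replayed as guidance on future problems.
class ExperienceMemory:
    def __init__(self):
        self.rules = []  # distilled lessons, newest last

    def add_lesson(self, problem, wrong_answer, correct_answer):
        # Turn a concrete mistake into a reusable rule (here, a plain string).
        rule = (f"On problems like {problem!r}, avoid answering "
                f"{wrong_answer!r}; the correct answer was {correct_answer!r}.")
        if rule not in self.rules:
            self.rules.append(rule)

    def guided_prompt(self, problem):
        # Prepend remembered rules so past lessons guide the next attempt.
        header = "\n".join(f"- {r}" for r in self.rules)
        return f"Lessons learned:\n{header}\n\nProblem: {problem}"

mem = ExperienceMemory()
mem.add_lesson("2+2*2", "8", "6")  # precedence mistake becomes a rule
print(mem.guided_prompt("3+3*3"))
```

In the paper's setting the memory guides reinforcement learning rather than a raw prompt, but the shape of the loop is the same: fail, distill, store, reuse.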

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Intermediate
Yanqi Dai, Yuxiang Ji et al. Β· Jan 28 Β· arXiv

This paper says that to make math-solving AIs smarter, we should train them more on the hardest questions they can almost solve.

#Mathematical reasoning Β· #RLVR Β· #GRPO
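"Hardest questions they can almost solve" can be made concrete with a difficulty weight peaked near a 50% pass rate. A minimal sketch under that assumption; the function names and the Gaussian-shaped weight are illustrative choices, not the paper's actual difficulty measure:

```python
import math

def difficulty_weight(pass_rate, target=0.5, sharpness=10.0):
    # Highest weight when the empirical pass rate is near the target;
    # decays as a question becomes too easy (~1.0) or hopeless (~0.0).
    return math.exp(-sharpness * (pass_rate - target) ** 2)

def select_hardest_solvable(questions_with_rates, k=2):
    # questions_with_rates: list of (question, empirical_pass_rate) pairs.
    ranked = sorted(questions_with_rates,
                    key=lambda qr: difficulty_weight(qr[1]),
                    reverse=True)
    return [q for q, _ in ranked[:k]]

pool = [("easy", 0.95), ("borderline", 0.45),
        ("medium", 0.7), ("impossible", 0.0)]
print(select_hardest_solvable(pool))  # β†’ ['borderline', 'medium']
```

Questions the model always solves or never solves contribute little gradient signal under GRPO-style training, which is why the weight concentrates the batch on the borderline cases.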

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Intermediate
Jingcheng Hu, Yinmin Zhang et al. Β· Jan 9 Β· arXiv

PaCoRe is a way for AI to think along many parallel paths and then coordinate them, so it can spend far more compute at test time without running out of context window space.

#Parallel Coordinated Reasoning Β· #Test-time compute scaling Β· #Message passing
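The parallel-then-coordinate idea can be sketched as many independent reasoning paths that exchange compact summaries instead of full transcripts. Everything here is a stand-in: `solve_path` and `summarize` fake the model calls, and the coordination step is reduced to a majority vote, which is far simpler than the paper's message passing.

```python
import collections

def solve_path(question, seed):
    # Stand-in for one sampled reasoning path; returns (answer, notes).
    answer = (seed % 3) + 1          # fake diversity across paths
    return answer, f"path {seed} reached answer {answer}"

def summarize(notes):
    # Compact message shared between paths; full traces stay local,
    # which is what keeps the context window from filling up.
    return notes[:40]

def parallel_coordinated_answer(question, n_paths=8):
    answers, messages = [], []
    for seed in range(n_paths):
        ans, notes = solve_path(question, seed)
        answers.append(ans)
        messages.append(summarize(notes))
    # Coordination step, reduced here to a majority vote over paths.
    final = collections.Counter(answers).most_common(1)[0][0]
    return final, messages

ans, msgs = parallel_coordinated_answer("toy question")
```

Because each path only exports a short summary, total test-time compute scales with the number of paths while the per-path context stays bounded.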

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Intermediate
Shidong Cao, Hongzhan Lin et al. Β· Jan 7 Β· arXiv

DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.

#Chain-of-Thought Β· #Diffusion models Β· #Autoregressive decoding