๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Policy Gradient

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Intermediate
Shyam Sundhar Ramesh, Xiaotong Ji et al.Feb 5arXiv

Large language models are usually trained to get good at one kind of reasoning, but real life needs them to be good at many things at once.

#Multi-Task Learning#GRPO#Reinforcement Learning Post-Training

AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search

Intermediate
Zefang Zong, Dingwei Chen et al.Jan 8arXiv

AT2PO is a new way to train AI agents that work in several turns, like asking the web a question, reading the result, and trying again.

#Agentic Reinforcement Learning#Turn-level Optimization#Tree Search

Meta-RL Induces Exploration in Language Agents

Intermediate
Yulun Jiang, Liangze Jiang et al.Dec 18arXiv

This paper introduces LAMER, a Meta-RL training framework that teaches language agents to explore first and then use what they learned to solve tasks faster.

#Meta-Reinforcement Learning#Language Agents#Exploration vs Exploitation