🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Sokoban

ProAct: Agentic Lookahead in Interactive Environments

Intermediate
Yangbin Yu, Mingyu Yang et al.Feb 5arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct#GLAD#MC-Critic

Language-based Trial and Error Falls Behind in the Era of Experience

Intermediate
Haoyu Wang, Guozheng Ma et al.Jan 29arXiv

Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.

#SCOUT#Reinforcement Learning#Supervised Fine-Tuning

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Intermediate
Junbo Li, Peng Zhou et al.Dec 18arXiv

Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.

#Turn-PPO#multi-turn reinforcement learning#agentic LLMs

Meta-RL Induces Exploration in Language Agents

Intermediate
Yulun Jiang, Liangze Jiang et al.Dec 18arXiv

This paper introduces LAMER, a Meta-RL training framework that teaches language agents to explore first and then use what they learned to solve tasks faster.

#Meta-Reinforcement Learning#Language Agents#Exploration vs Exploitation