How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (10)

#PPO

ProAct: Agentic Lookahead in Interactive Environments

Intermediate
Yangbin Yu, Mingyu Yang et al. · Feb 5 · arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct #GLAD #MC-Critic

Language-based Trial and Error Falls Behind in the Era of Experience

Intermediate
Haoyu Wang, Guozheng Ma et al. · Jan 29 · arXiv

Large language models are great at words but waste a lot of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.

#SCOUT #Reinforcement Learning #Supervised Fine-Tuning

Endless Terminals: Scaling RL Environments for Terminal Agents

Intermediate
Kanishk Gandhi, Shivam Garg et al. · Jan 23 · arXiv

Endless Terminals is an automatic factory that builds thousands of realistic, checkable computer-terminal tasks so AI agents can practice and improve with reinforcement learning.

#reinforcement learning #PPO #terminal agents

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Intermediate
Constantinos Karouzos, Xingwei Tan et al. · Jan 9 · arXiv

Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).

#preference tuning #domain shift #supervised fine-tuning

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Beginner
Jinyang Wu, Guocheng Zhai et al. · Jan 7 · arXiv

ATLAS is a system that picks the best mix of AI models and helper tools for each question, instead of using just one model or a fixed tool plan.

#ATLAS #LLM routing #tool augmentation

Multi-hop Reasoning via Early Knowledge Alignment

Intermediate
Yuxin Wang, Shicheng Fang et al. · Dec 23 · arXiv

This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.

#Retrieval-Augmented Generation #Iterative RAG #Multi-hop Reasoning

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Intermediate
Zhenwen Liang, Sidi Lu et al. · Dec 17 · arXiv

This paper teaches large language models (LLMs) to explore smarter by listening to their own gradients (the directions in which they would update) rather than chasing random variety.

#gradient-guided reinforcement learning #GRL #GRPO

Understanding and Improving Hyperbolic Deep Reinforcement Learning

Intermediate
Timo Klein, Thomas Lang et al. · Dec 16 · arXiv

Reinforcement learning agents often see the world in straight, flat space (Euclidean), but many decision problems look more like branching trees that fit curved, hyperbolic space better.

#hyperbolic reinforcement learning #Hyperboloid #Poincaré Ball

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Intermediate
Zheng Ding, Weirui Ye · Dec 9 · arXiv

TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.

#TreeGRPO #reinforcement learning #diffusion models

From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

Intermediate
Changpeng Yang, Jinyang Wu et al. · Dec 2 · arXiv

This paper teaches AI models to reason better by first copying only good examples and later learning from mistakes too.

#Curriculum Advantage Policy Optimization #advantage-based RL #imitation learning