How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)

#ALFWorld

Reinforcement World Model Learning for LLM-based Agents

Intermediate
Xiao Yu, Baolin Peng et al. (Feb 5, arXiv)

Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.

#Reinforcement World Model Learning #world modeling #LLM agents

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate
Jinyang Wu, Shuo Yang et al. (Jan 28, arXiv)

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK #dynamic branching #strategic exploration

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Intermediate
Youwei Liu, Jian Wang et al. (Jan 13, arXiv)

Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.

#Imagine-then-Plan #world models #adaptive lookahead

From Word to World: Can Large Language Models be Implicit Text-based World Models?

Intermediate
Yixia Li, Hongru Wang et al. (Dec 21, arXiv)

This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.

#world models #next-state prediction #text-based environments

Meta-RL Induces Exploration in Language Agents

Intermediate
Yulun Jiang, Liangze Jiang et al. (Dec 18, arXiv)

This paper introduces LAMER, a Meta-RL training framework that teaches language agents to explore first and then use what they learned to solve tasks faster.

#Meta-Reinforcement Learning #Language Agents #Exploration vs Exploitation

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Intermediate
Tong Wei, Yijun Yang et al. (Dec 15, arXiv)

GTR-Turbo teaches a vision-language agent using a "free teacher" made by merging its own past checkpoints, so no costly external model is needed.

#GTR-Turbo #checkpoint merging #TIES-merging