How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (30)

#LLM agents

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Intermediate
Zeyuan Liu, Jeonghye Kim et al. · Feb 26 · arXiv

This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.

#EMPO #memory-augmented agents #on-policy learning

Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

Intermediate
Wenxuan Ding, Nicholas Tomlin et al. · Feb 18 · arXiv

This paper teaches AI agents to make smart choices about when to explore for more information and when to act right away.

#Calibrate-Then-Act #cost-aware exploration #LLM agents

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Intermediate
Xiangyi Li, Wenbo Chen et al. · Feb 13 · arXiv

SkillsBench is a big test playground that measures whether giving AI agents step-by-step 'Skills' actually helps them finish real tasks.

#Agent Skills #LLM agents #Benchmarking

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

Intermediate
Xavier Hu, Jinxiang Xia et al. · Feb 10 · arXiv

EcoGym is a new open test playground where AI agents run small businesses over many days to see if they can plan well for the long term.

#EcoGym #long-horizon planning #LLM agents

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

Intermediate
Zhi Chen, Zhensu Sun et al. · Feb 8 · arXiv

This paper asks a simple question: do tests written by AI coding agents actually help them fix real software bugs, or do they just look helpful?

#LLM agents #agent-written tests #software engineering agents

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Intermediate
Alisia Lupidi, Bhavul Gauri et al. · Feb 6 · arXiv

AIRS-Bench is a new test suite that checks whether AI research agents can do real machine learning research from start to finish, not just answer questions.

#AIRS-Bench #AI research agents #LLM agents

Reinforcement World Model Learning for LLM-based Agents

Intermediate
Xiao Yu, Baolin Peng et al. · Feb 5 · arXiv

Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.

#Reinforcement World Model Learning #world modeling #LLM agents

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Intermediate
Zhenxiong Yu, Zhi Yang et al. · Feb 5 · arXiv

Before this work, AI agents often stopped to run safety checks at every single step, which made them slow and still easy to trick in sneaky ways.

#Intrinsic Risk Sensing #Event-driven defense #Hierarchical Adaptive Screening

Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Intermediate
Changdae Oh, Seongheon Park et al. · Feb 4 · arXiv

This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.

#uncertainty quantification #LLM agents #interactive AI

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Intermediate
Yansong Ning, Jun Fang et al. · Feb 4 · arXiv

Agent-Omit teaches AI agents to skip unneeded thinking and old observations, cutting tokens while keeping accuracy high.

#LLM agents #reinforcement learning #agentic RL

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Intermediate
Haozhen Zhang, Quanyu Long et al. · Feb 2 · arXiv

MemSkill turns memory operations for AI agents into learnable skills instead of fixed, hand-made rules.

#memory skills #LLM agents #skill bank

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Intermediate
Hang Yan, Xinyu Che et al. · Feb 2 · arXiv

This paper studies how AI agents get better while they are working, not just whether they finish the job.

#Test-Time Improvement #LLM agents #trajectory analysis