How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

#agent benchmarks

On Data Engineering for Scaling LLM Terminal Capabilities

Intermediate
Renjie Pi, Grace Lam et al. · Feb 24 · arXiv

This paper shows that a model's command-line (terminal) skills can be improved substantially by carefully engineering its training data, rather than only by using a bigger model.

#Terminal-Bench 2.0 #terminal agents #synthetic task generation

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Intermediate
Weihao Zeng, Yuzhen Huang et al. · Feb 8 · arXiv

LOCA-bench is a benchmark that tests whether AI agents keep working correctly as their to-do list and background information grow very long, with that growth controlled by the benchmark.

#LOCA-bench #long-context agents #context rot

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Intermediate
Bowen Xu, Shaoyu Wu et al. · Feb 2 · arXiv

This paper tackles a common problem in reasoning models called Lazy Reasoning, where the model rambles instead of making a good plan, by encouraging it to break complex tool-use tasks into smaller subtasks.

#task decomposition #tool use #large reasoning models

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Intermediate
Xiaoyu Tian, Haotian Wang et al. · Jan 29 · arXiv

ASTRA is a fully automated way to train tool-using AI agents: it generates both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.

#tool-augmented agents #multi-turn decision making #verifiable environments