The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.

#Trainee-Bench#dynamic task scheduling#active exploration

Evolving Programmatic Skill Networks

Intermediate

Haochen Shi, Xingdi Yuan et al.Jan 7arXiv

This paper teaches a computer agent to grow a toolbox of skills that are real, runnable programs, not just text ideas.

#Programmatic Skill Network#continual learning#symbolic programs

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Intermediate

Muxi Diao, Lele Yang et al.Jan 5arXiv

Supervised fine-tuning (SFT) often makes a model great at a new task but worse at its old skills; this paper explains a key reason why and how to fix it.

#Entropy-Adaptive Fine-Tuning#confident conflicts#token-level entropy