Papers (6)

#continual learning

Learning Personalized Agents from Human Feedback

Beginner

Kaiqu Liang, Julia Kruk et al. · Feb 18 · arXiv

AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.

#personalization, #human feedback, #pre-action clarification

ResearchGym: Evaluating Language Model Agents on Real-World AI Research

Intermediate

Aniketh Garikaparthi, Manasi Patwardhan et al. · Feb 16 · arXiv

ResearchGym is a new "gym" where AI agents are tested on real research projects end to end, not just on toy problems.

#ResearchGym, #closed-loop research, #objective evaluation

Position: Agentic Evolution is the Path to Evolving LLMs

Intermediate

Minhua Lin, Hanqing Lu et al. · Jan 30 · arXiv

Big AI models do great in the lab but stumble in the real world because the world keeps changing.

#agentic evolution, #A-Evolve, #deployment-time adaptation

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

Beginner

Daocheng Fu, Jianbiao Mei et al. · Jan 13 · arXiv

The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.

#Trainee-Bench, #dynamic task scheduling, #active exploration

Evolving Programmatic Skill Networks

Intermediate

Haochen Shi, Xingdi Yuan et al. · Jan 7 · arXiv

This paper teaches a computer agent to grow a toolbox of skills that are real, runnable programs, not just textual descriptions.

#Programmatic Skill Network, #continual learning, #symbolic programs

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Intermediate

Muxi Diao, Lele Yang et al. · Jan 5 · arXiv

Supervised fine-tuning (SFT) often makes a model great at a new task but worse at its old skills; this paper explains a key reason why and how to fix it.

#Entropy-Adaptive Fine-Tuning, #confident conflicts, #token-level entropy
