This paper introduces TAM-Eval, a benchmark for evaluating how well AI models can generate, repair, and update unit tests for real software projects.
It targets a practical problem: real codebases evolve, so useful test automation must not only write new tests but also fix failing ones and keep existing tests up to date as the code changes.