Papers (2)

#self-correction

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Yukang Feng, Jianwen Sun et al. · Feb 15 · arXiv

LongCLI-Bench is a new benchmark that evaluates how well AI coding agents handle long, realistic software projects in the command line, rather than only short coding puzzles.

#LongCLI-Bench #agentic programming #command-line interface agents

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner

Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

Large reasoning models think in many steps, which makes inference slow and costly.

#collaborative inference #initial token entropy #step-level routing