Papers (4)

#self-correction

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.

#multimodal models #image generation #reasoning


LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Beginner

Yukang Feng, Jianwen Sun et al. · Feb 15 · arXiv

LongCLI-Bench is a new test that checks how well AI coding agents can handle long, realistic software projects in the command line, not just tiny coding puzzles.

#LongCLI-Bench #agentic programming #command-line interface agents


AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Intermediate

Yinyi Luo, Yiqiao Jin et al. · Feb 3 · arXiv

AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.

#multi-agent distillation #process reward model #GRPO


GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner

Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

Big reasoning AIs think in many steps, which is slow and costly.

#collaborative inference #initial token entropy #step-level routing
