Papers (924)

Agent-as-a-Judge

Runyang You, Hongru Cai et al. · Jan 8 · arXiv

This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).

#Agent-as-a-Judge#LLM-as-a-Judge#multi-agent collaboration

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner

Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

Big reasoning AIs think in many steps, which is slow and costly.

#collaborative inference#initial token entropy#step-level routing

Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

Beginner

Muzhao Tian, Zisu Huang et al. · Jan 8 · arXiv

Long-term AI helpers remember past chats, but using all memories can trap them in old ideas (Memory Anchoring).

#steerable memory#memory anchoring#long-term agents

Token-Level LLM Collaboration via FusionRoute

Intermediate

Nuoya Xiong, Yuhang Zhou et al. · Jan 8 · arXiv

Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.

#FusionRoute#token-level collaboration#expert routing

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Intermediate

Maksim Velikanov, Ilyas Chahed et al. · Jan 8 · arXiv

The paper shows that big language models often get stuck with weight sizes set by training hyperparameters instead of by the data, which quietly hurts performance.

#learnable multipliers#weight decay#noise–WD equilibrium

SmartSearch: Process Reward-Guided Query Refinement for Search Agents

Intermediate

Tongyu Wen, Guanting Dong et al. · Jan 8 · arXiv

SmartSearch teaches search agents to fix their own bad search queries while they are thinking, not just their final answers.

#Search agents#Process rewards#Query refinement

DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation

Intermediate

Guanzhi Deng, Bo Li et al. · Jan 8 · arXiv

Mixture-of-Experts (MoE) models use many small specialist networks and only activate a few per token, but classic LoRA fine-tuning gives every expert the same rank, wasting parameters on the wrong experts.

#DR-LoRA#Mixture-of-Experts#Low-Rank Adaptation

AgentOCR: Reimagining Agent History via Optical Self-Compression

Intermediate

Lang Feng, Fuchao Yang et al. · Jan 8 · arXiv

AgentOCR turns an agent’s long text history into pictures so it can remember more using fewer tokens.

#AgentOCR#optical self-compression#visual tokens

AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search

Intermediate

Zefang Zong, Dingwei Chen et al. · Jan 8 · arXiv

AT$^2$PO is a new way to train AI agents that work over several turns, such as asking the web a question, reading the result, and trying again.

#Agentic Reinforcement Learning#Turn-level Optimization#Tree Search

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Intermediate

Tingyu Wu, Zhisheng Chen et al. · Jan 8 · arXiv

KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.

#person understanding#autobiographical narratives#cognitive stream

Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning

Intermediate

Yuyang Hu, Jiongnan Liu et al. · Jan 8 · arXiv

This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.

#event-centric memory#Event Graph#logic-aware retrieval

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Intermediate

Mingxin Li, Yanzhao Zhang et al. · Jan 8 · arXiv

This paper builds two models that work together, Qwen3-VL-Embedding and Qwen3-VL-Reranker, that understand text, images, visual documents, and videos in one shared space so search works across all of them.

#multimodal retrieval#unified embedding space#cross-encoder reranker
