DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
Yongtong Wu, Shaoyuan Chen et al. · Feb 25 · arXiv
Agent-style LLMs interact with tools over many short turns, so most tokens are repeated context and the system spends more time fetching old state (the KV-cache) from storage than computing new answers.
#KV-Cache #prefill-decode-disaggregation #dual-path-loading