Parallel-Probe is a simple add-on that lets many AI “thought paths” run in parallel and stops them early once they already agree.
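The summary gives no algorithmic detail, so the following is only a toy sketch of the general "stop once the paths agree" idea, not the paper's method. The function name `probe_until_agreement`, the `threshold` parameter, and the representation of each path as a list of tentative answers are all assumptions made for illustration.

```python
from collections import Counter


def probe_until_agreement(paths, threshold=0.8):
    """Toy early-stopping check over parallel reasoning paths (hypothetical helper).

    paths[i][t] is path i's tentative answer after step t.
    Returns (answer, steps_used). Assumes `paths` is non-empty and
    every path has at least one step.
    """
    max_steps = max(len(p) for p in paths)
    answer = None
    for t in range(max_steps):
        # Take each path's current tentative answer; a path that has
        # already finished keeps reporting its last answer.
        current = [p[min(t, len(p) - 1)] for p in paths]
        answer, votes = Counter(current).most_common(1)[0]
        # Stop as soon as a large enough share of paths agree.
        if votes / len(paths) >= threshold:
            return answer, t + 1
    # No early consensus: fall back to the final majority answer.
    return answer, max_steps


example_paths = [
    ["7", "42", "42"],
    ["42", "42", "42"],
    ["13", "42", "42"],
    ["42", "42", "41"],
]
result = probe_until_agreement(example_paths, threshold=0.8)
```

In this example the paths disagree after the first step but all agree after the second, so the remaining steps would be skipped. A lower threshold stops sooner at the cost of a higher chance of stopping on a wrong answer.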
Fast KVzip is a new way to shrink an LLM’s memory (the KV cache) while keeping answers just as accurate.
This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
This paper shows how to attach a tiny helper (a probe) to a big language model so it can classify things like safety or sentiment in the same forward pass it already runs to answer you.
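To make the "same pass" idea concrete, here is a minimal pure-Python sketch under heavy assumptions: the "model" is a stand-in that averages token embeddings, and the probe is a single logistic unit reading that hidden vector. The names `toy_forward`, `linear_probe`, and `answer_and_classify` are hypothetical and do not come from the paper.

```python
import math


def toy_forward(token_ids, embed):
    """Stand-in for a model's forward pass: mean of the token embeddings."""
    dim = len(embed[0])
    return [sum(embed[t][d] for t in token_ids) / len(token_ids) for d in range(dim)]


def linear_probe(hidden, weights, bias):
    """Tiny classifier head: logistic regression on the hidden vector."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def answer_and_classify(token_ids, embed, probe_w, probe_b):
    # One forward pass produces the hidden vector...
    hidden = toy_forward(token_ids, embed)
    # ...which is reused for the normal output (here: the best-matching token)...
    next_token = max(
        range(len(embed)),
        key=lambda v: sum(h * e for h, e in zip(hidden, embed[v])),
    )
    # ...and for the probe's classification score, with no second pass.
    score = linear_probe(hidden, probe_w, probe_b)
    return next_token, score


toy_embed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
next_token, score = answer_and_classify([0, 2], toy_embed, [2.0, -4.0], 0.0)
```

The point of the design is that the probe adds only a dot product on top of work the model has already done, rather than a separate classifier call.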
ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.
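As a rough illustration of this kind of routing, the sketch below sends each step to the large model only when a predictor expects a big enough gain. The summary does not say how that prediction is made, so `predict_gain`, `min_gain`, and `route_steps` are placeholder names, and the two "models" are plain functions.

```python
def route_steps(steps, small_model, large_model, predict_gain, min_gain=0.1):
    """Toy per-step router (hypothetical): returns (outputs, large_calls)."""
    outputs = []
    large_calls = 0
    for step in steps:
        # Escalate only when the predicted benefit of the large model
        # exceeds the chosen threshold.
        if predict_gain(step) > min_gain:
            outputs.append(large_model(step))
            large_calls += 1
        else:
            outputs.append(small_model(step))
    return outputs, large_calls


def demo_small(step):
    return "small:" + step


def demo_large(step):
    return "large:" + step


def demo_gain(step):
    # Placeholder predictor: pretend only proof steps benefit from the large model.
    return 0.9 if "prove" in step else 0.0


demo_outputs, demo_large_calls = route_steps(
    ["add 2+2", "prove lemma", "copy result"], demo_small, demo_large, demo_gain
)
```

Here only one of the three steps reaches the large model. Raising `min_gain` trades some answer quality for fewer expensive calls.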