Parallel-Probe is a simple add-on that lets many AI “thought paths” run at once and stops them early once they already agree.
Autoregressive video models generate videos one chunk at a time, but they run out of GPU memory because the KV-cache grows with the history.
This paper introduces XDLM, a single model that blends two popular diffusion styles (masked and uniform) so it both understands and generates text and images well.
Training big AI models uses lots of memory because most methods still keep a hidden full-precision copy of the weights, called master weights.
This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
Terminal-Bench 2.0 is a tough test that checks how well AI agents can solve real, professional tasks by typing commands in a computer terminal.
ThreadWeaver teaches a language model to split big problems into smaller parts it can solve at the same time, like teammates working in parallel.