AgencyBench is a large benchmark that checks how well AI agents can handle real, long, multi-step jobs, not just short puzzles.
The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.
AgentOCR turns an agent’s long text history into pictures so it can remember more using fewer tokens.
The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.
EMMA is a single AI model that can understand images, write about them, create new images from text, and edit images, all within one system.
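
To make the Multiplex Thinking idea above more concrete, here is a minimal sketch of "sample several likely next tokens and blend them into one". It is an illustration under stated assumptions, not the paper's implementation: the function name `multiplex_token`, the toy logits, the toy two-dimensional embeddings, and the choice to blend by a plain average of the sampled tokens' embeddings are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiplex_token(logits, embeddings, k, rng):
    """Sample k token ids from softmax(logits) and average their embeddings.

    Assumption: blending is a simple mean over the sampled embeddings.
    The summary above does not specify the actual blending rule.
    """
    probs = softmax(logits)
    # Sample k candidate next tokens (with replacement) from the distribution.
    ids = rng.choices(range(len(probs)), weights=probs, k=k)
    dim = len(embeddings[0])
    # Blend the sampled tokens into one vector: the hypothetical "super-token".
    blended = [sum(embeddings[i][d] for i in ids) / k for d in range(dim)]
    return ids, blended


# Toy vocabulary of 4 tokens with 2-dimensional embeddings (made-up values).
rng = random.Random(0)
logits = [2.0, 1.0, 0.1, -1.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
ids, blended = multiplex_token(logits, embeddings, k=3, rng=rng)
print(ids, blended)
```

In this sketch, when the model is confident (one token has nearly all the probability), every sample is the same token and the blended vector is just that token's embedding. When the model is unsure, the blend mixes several candidates into a single vector.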