MemoBrain is like a helpful co-pilot for AI that keeps important thoughts neat and ready so the main thinker (the agent) doesn’t get overwhelmed.
Transformers are powerful but slow because regular self-attention compares every token with every other token, so its cost grows quadratically with sequence length and becomes impractical for long sequences.
Large Vision-Language Models (LVLMs) perform well on single images but often stumble when they must reason across multiple images.
Computer-using agents often forget important visual details over long tasks and cannot reliably find up-to-date, step-by-step help for unfamiliar apps.
This paper teaches AI to build and improve its own small computer helpers (tools) while solving science problems, instead of relying only on a fixed toolbox made beforehand.
TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image job.
MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.
OpenTinker is an open-source system that makes training AI agents with reinforcement learning simple, modular, and reusable.
Diffusion Language Models (DLMs) write by polishing whole sentences in several passes instead of one token at a time.
The paper introduces Controlled Self-Evolution (CSE), a smarter way for AI to write and improve code quickly under a tight budget of tries.
VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.
Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.