The authors built a simple six-agent system to see if today’s AI models could plan, run, and write a research paper mostly on their own.
Large language models (LLMs) are good at many math problems but often mess up simple counting when the list gets long.
DreamStyle is a single video-stylizing model that can follow text, copy a style image, or continue from a stylized first frame, all without switching tools.
MiMo-V2-Flash is a giant but efficient language model that uses a team-of-experts design to think well while staying fast.
AnyDepth is a new, simple way for a computer to estimate how far away things are in a picture using just one image (monocular depth).
SimpleMem is a new memory system that helps AI remember long conversations without wasting space or tokens.
Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.
InfiniteVGGT is a streaming 3D vision system that can keep working forever on live video without running out of memory.
DiffProxy turns tricky multi-camera photos of a person into a clean 3D body and hands by first painting a precise 'map' on each pixel and then fitting a standard body model to that map.
Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.
VIBE is a tiny but mighty image editor that listens to your words and changes pictures while keeping the original photo intact unless you ask otherwise.
NextFlow is a single, decoder-only Transformer that can read and write both text and images in one continuous sequence.