Mixture-of-Experts (MoE) models often send far more tokens to a few “favorite” experts, which overloads some GPUs while others sit idle.
This paper fixes a big problem in AIs that make long videos, where the video keeps snapping back to the beginning, like a movie stuck on rewind.
Coding agents waste most of their tokens just reading giant files, which makes them slow and expensive.
Videos are made of very long lists of tokens, and regular attention compares every pair of tokens, so its cost grows with the square of the list length, which is slow and expensive.
Endless Terminals is an automatic factory that builds thousands of realistic, checkable computer-terminal tasks so AI agents can practice and improve with reinforcement learning.
Memory-V2V teaches video editing AIs to remember what they already changed so new edits stay consistent with old ones.
Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation.
This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
Before this work, most text-to-image models used VAEs (which squish images into small codes) and struggled with slow training and overfitting on high-quality fine-tuning sets.
This paper shows how to turn any normal photo or video into a seamless 360° panorama without needing the camera’s settings like field of view or tilt.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.