Streamo is a real-time video assistant that knows when to stay quiet, when to wait, and when to speak—while a video is still playing.
DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when you give it scattered images or short clips in the middle.
Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.
UltraShape 1.0 is a two-step 3D generator that first makes a simple overall shape and then zooms in to add tiny details.
This paper speeds up how 3D scenes handle big, 512‑dimensional features without throwing away important information.
Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.
Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.
SemanticGen is a new way to make videos that starts by planning in a small, high-level 'idea space' (semantic space) and then adds the tiny visual details later.
LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.
SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.
The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.
The paper shows that big sequence models (like transformers) quietly learn longer goals inside their hidden activations, even though they are trained one step at a time.