C2LLM is a new family of code embedding models that helps computers find the right code faster and more accurately.
DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when the images or short clips you give it as guides are scattered through the middle of the video.
Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.
UltraShape 1.0 is a two-step 3D generator that first makes a simple overall shape and then zooms in to add tiny details.
T2AV-Compass is a new, unified test to fairly grade AI systems that turn text into matching video and audio.
This paper introduces NExT-Vid, a way to teach a video model by asking it to guess the next frame of a video while parts of the past are hidden.
This paper speeds up the handling of big, 512‑dimensional features in 3D scenes without throwing away important information.
Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.
Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.
TokSuite is a science lab for tokenizers: it trains 14 language models that are identical in every way except for how they split text into tokens.
SemanticGen is a new way to make videos that starts by planning in a small, high-level 'idea space' (semantic space) and then adds the tiny visual details later.
LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.