Sparse-LaViDa makes diffusion-style AI models much faster by skipping unhelpful masked tokens during generation while keeping quality the same.
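To make the idea concrete, here is a minimal toy sketch of selectively updating masked tokens during masked-diffusion decoding. The paper's actual selection rule and compute-skipping mechanism are not reproduced here; this sketch assumes a simple confidence-based schedule (commit only the most promising masked positions each step, leave the rest for later), and every name in it is hypothetical.

```python
import torch

def sparse_fill_step(logits_fn, tokens, masked, keep_ratio=0.25):
    """One decoding step that commits predictions only at the most
    confident still-masked positions, deferring the unhelpful rest.
    (Illustrative stand-in for the selection rule, not the paper's.)"""
    logits = logits_fn(tokens)                    # (seq_len, vocab)
    conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
    conf = conf.masked_fill(~masked, -1.0)        # ignore decoded positions
    k = max(1, int(keep_ratio * masked.sum().item()))
    top = conf.topk(k).indices                    # best masked positions
    tokens, masked = tokens.clone(), masked.clone()
    tokens[top] = pred[top]                       # commit those predictions
    masked[top] = False                           # the rest wait for later steps
    return tokens, masked

vocab, seq = 50, 16
toy_model = torch.nn.Embedding(vocab, vocab)      # stand-in "model"
tokens = torch.zeros(seq, dtype=torch.long)       # everything starts masked
masked = torch.ones(seq, dtype=torch.bool)
while masked.any():                               # iterate until fully decoded
    tokens, masked = sparse_fill_step(toy_model, tokens, masked)
```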
Olmo 3 is a family of fully open AI language models (7B and 32B) where every step, from raw data to training code and checkpoints, is released.
FINERWEB is a new, carefully built dataset pipeline that teaches computers to spot named entities (people, places, and more) across 91 languages and 25 writing systems.
SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, much as people skim or rewind long videos.
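A runnable toy of that fast/slow control loop is sketched below. SAGE's real stopping rule and frame-selection policy are not specified here; this version assumes a confidence threshold decides between answering immediately and gathering more frames, and `fake_vlm` is a stub, not the paper's model.

```python
from dataclasses import dataclass
import random

@dataclass
class Guess:
    answer: str
    confidence: float

def fake_vlm(question, frames):
    """Stub model: confidence grows with evidence (illustration only)."""
    return Guess("a red car", min(1.0, 0.3 + 0.05 * len(frames)))

def answer_adaptively(question, total_frames=128, max_steps=4, threshold=0.8):
    frames = list(range(0, total_frames, total_frames // 8))  # cheap skim: 8 frames
    guess = fake_vlm(question, frames)
    for _ in range(max_steps):
        if guess.confidence >= threshold:       # confident enough: answer fast
            break
        start = random.choice(frames)           # pick a span to inspect densely
        frames += list(range(start, min(start + 8, total_frames)))  # "rewind"
        guess = fake_vlm(question, frames)
    return guess.answer

print(answer_adaptively("what color is the car?"))
```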
LitePT is a new AI backbone for 3D point clouds that uses convolutions in early layers and attention in later layers to be both fast and accurate.
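The stage layout is easy to see in code. Below is a 1-D toy of the conv-early, attention-late pattern; real LitePT operates on 3D point clouds with specialized neighborhood operators, which are replaced here by `Conv1d` and `MultiheadAttention` purely for illustration.

```python
import torch
import torch.nn as nn

class ConvThenAttention(nn.Module):
    """Toy version of the layout: cheap local convolutions at the early,
    high-resolution stages; global attention only after downsampling."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.early = nn.Sequential(               # local, cheap, high-res
            nn.Conv1d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv1d(dim, dim, 3, stride=2, padding=1), nn.GELU(),  # downsample
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (batch, points, dim)
        x = self.early(x.transpose(1, 2)).transpose(1, 2)  # conv stages first
        y, _ = self.attn(self.norm(x), self.norm(x), self.norm(x))
        return x + y                              # attention stage on fewer points

feats = torch.randn(2, 1024, 64)
print(ConvThenAttention()(feats).shape)           # torch.Size([2, 512, 64])
```

The payoff of this ordering is that attention's quadratic cost is only paid after the point count has been reduced.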
The paper tackles a paradox: visual tokenizers that achieve excellent pixel reconstructions often produce worse images when used for generation.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and applies the edit in one quick pass.
Digital humans used to just copy motions; this paper makes them think, speak, and move in sync like real people.
RoboTracer is a vision-language model that turns tricky, word-only instructions into safe, step-by-step 3D paths (spatial traces) robots can follow.
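For a sense of what a spatial trace could look like as data, here is a small sketch: an ordered list of 3D waypoints a controller can densify and follow. The `Waypoint` representation and `interpolate` helper are my assumptions for illustration, not RoboTracer's output format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Waypoint:
    x: float
    y: float
    z: float

def interpolate(a: Waypoint, b: Waypoint, n: int) -> List[Waypoint]:
    """Densify a trace segment so a robot controller can follow it step by step."""
    return [Waypoint(a.x + (b.x - a.x) * t,
                     a.y + (b.y - a.y) * t,
                     a.z + (b.z - a.z) * t)
            for t in (i / (n - 1) for i in range(n))]

# A hypothetical trace for "lift the cup straight up, then move it left":
trace = [Waypoint(0.4, 0.0, 0.1), Waypoint(0.4, 0.0, 0.3), Waypoint(0.2, 0.0, 0.3)]
dense = interpolate(trace[0], trace[1], n=5) + interpolate(trace[1], trace[2], n=5)
print(len(dense), dense[0], dense[-1])
```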
The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI across domains such as alignment, instruction following, math, coding, and software engineering, one domain at a time.
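The cascade structure itself fits in a few lines. This is a minimal sketch assuming only what the summary states (one RL stage per domain, each initialized from the previous stage's result); `train_one_domain` is a stub standing in for a full RL run.

```python
def train_one_domain(policy, domain):
    """Stub for a full RL stage; here it just records the stage order."""
    return policy + [domain]

def cascaded_rl(domains):
    """Run domains sequentially, each stage starting from the last policy."""
    policy = []                                   # stand-in for model weights
    for domain in domains:
        policy = train_one_domain(policy, domain)
    return policy

stages = ["alignment", "instruction following", "math",
          "coding", "software engineering"]
print(cascaded_rl(stages))
```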
LongVie 2 is a video world model that can generate controllable videos 3–5 minutes long while keeping appearance and motion steady over time.
Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.
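The workflow itself is simple to express. Below is a sketch under one assumption: the draft and the final render share a seed so the refined image keeps the draft's layout. The `sample` stub stands in for a real diffusion model; none of these names come from the paper.

```python
def sample(prompt, steps, seed):
    """Stub sampler standing in for a diffusion model; more steps = more compute."""
    return f"<'{prompt}' rendered in {steps} steps, seed={seed}>"

def preview_then_refine(prompt, approve, seed=42):
    draft = sample(prompt, steps=6, seed=seed)    # fast, low-step preview
    if not approve(draft):                        # rejected: skip the full run
        return None
    return sample(prompt, steps=50, seed=seed)    # full compute only after approval

print(preview_then_refine("a lighthouse at dusk", approve=lambda d: True))
```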