Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.
NextFlow is a single, decoder-only Transformer that can read and write both text and images in one continuous sequence.
MDAgent2 is a special helper built from large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.
K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.
CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.
MindWatcher is a smart AI agent that can think step by step and decide when to use tools like web search, image zooming, and a code calculator to solve tough, multi-step problems.
Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.
Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.
SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.
Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.
This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.