This paper teaches an AI model to pay attention better by directly training where it focuses, not just the words it outputs.
The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable.
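A minimal sketch of why this happens, assuming a simplified token-level PPO clipped objective (the function name, probabilities, and clip range here are illustrative, not from the paper): a rare token's probability ratio explodes after even a tiny absolute shift and gets capped by the clip, while a common token's ratio barely moves, so its update is always gentle.

```python
def ppo_clipped_objective(p_new, p_old, advantage, eps=0.2):
    """Token-level PPO objective with the standard ratio clip."""
    ratio = p_new / p_old
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# Rare token: a tiny absolute probability shift (0.001 -> 0.002)
# doubles its ratio, so the clip caps its learning signal at 1+eps.
rare = ppo_clipped_objective(p_new=0.002, p_old=0.001, advantage=1.0)

# Common token: the same absolute shift (0.900 -> 0.901) leaves the
# ratio near 1, so the update stays small ("too gentle").
common = ppo_clipped_objective(p_new=0.901, p_old=0.900, advantage=1.0)

print(rare, common)  # rare is capped at 1.2; common stays near 1.0
```

The asymmetry is the point: the same-sized change in probability produces a clipped, truncated gradient for rare tokens and an almost invisible one for frequent tokens.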
The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers.
Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).
When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.
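A toy sketch of the collapse, assuming the common setup where GRPO first sums multiple reward signals into one scalar per response and then normalizes within the sampled group (the helper function and numbers are illustrative, not the paper's implementation): two responses with opposite strengths end up with the identical advantage.

```python
def grpo_advantages(scalar_rewards):
    """Group-relative advantages: z-normalize rewards within one group."""
    mean = sum(scalar_rewards) / len(scalar_rewards)
    var = sum((r - mean) ** 2 for r in scalar_rewards) / len(scalar_rewards)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in scalar_rewards]

# Response A: strong on helpfulness (0.9), weak on safety (0.1).
# Response B: weak on helpfulness (0.1), strong on safety (0.9).
reward_mix_a = 0.9 + 0.1  # summed into one scalar -> 1.0
reward_mix_b = 0.1 + 0.9  # summed into one scalar -> 1.0

group = [reward_mix_a, reward_mix_b, 0.4]  # third response for contrast
adv = grpo_advantages(group)
# adv[0] == adv[1]: two very different reward mixes have been squashed
# into the same learning signal, so training cannot tell them apart.
```

Because the summation happens before normalization, any trade-off information between the individual rewards is lost, which is the confusion the summary describes.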
Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.
Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.
The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI across domains like alignment, instructions, math, coding, and software engineering—one at a time.
Seedance 1.5 pro is a single model that generates video and sound together in one pass, so lips, music, and actions stay naturally in sync.
LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.