The paper asks which small, add-on training tricks (parameter-efficient fine-tuning, or PEFT) work best when we teach language models with yes/no rewards a program can check automatically (reinforcement learning with verifiable rewards, or RLVR).
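To make that pairing concrete, here is a tiny NumPy sketch, not the paper's setup: a frozen weight matrix stands in for the pretrained model, a LoRA-style low-rank adapter is the only thing that trains (one common PEFT choice, not necessarily the ones the paper compares), and the reward is a simple exact-match check standing in for RLVR's verifiable yes/no signal. Every name, size, and number below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy policy: logits = x @ (W_frozen + A @ B), softmax over a small answer vocabulary.
DIM, RANK, VOCAB = 8, 4, 5
W_frozen = rng.normal(size=(DIM, VOCAB))                     # "pretrained" weights, never updated
A = rng.normal(scale=1.0 / np.sqrt(DIM), size=(DIM, RANK))   # low-rank adapter (trainable)
B = np.zeros((RANK, VOCAB))                                  # low-rank adapter (trainable), zero-init

def policy(x):
    logits = x @ (W_frozen + A @ B)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def verifiable_reward(answer, gold):
    # RLVR-style signal: a binary check a program can run, no learned judge needed.
    return 1.0 if answer == gold else 0.0

# Tiny dataset: each "prompt" is a one-hot vector with exactly one correct answer token.
prompts = np.eye(DIM)[:4]
golds = [0, 1, 2, 3]

lr = 0.5
for _ in range(2000):
    i = rng.integers(len(golds))
    x, gold = prompts[i], golds[i]
    p = policy(x)
    answer = int(rng.choice(VOCAB, p=p))
    r = verifiable_reward(answer, gold)

    # REINFORCE: gradient of log pi(answer | x) w.r.t. the combined weights, scaled by reward.
    g = -p
    g[answer] += 1.0
    grad_W = np.outer(x, g) * r
    # Only the adapter is updated -- the PEFT part. W_frozen never changes.
    grad_A, grad_B = grad_W @ B.T, A.T @ grad_W
    A += lr * grad_A
    B += lr * grad_B

acc = np.mean([np.argmax(policy(x)) == gold for x, gold in zip(prompts, golds)])
print(f"accuracy with adapter-only RL: {acc:.2f}")
```

The detail worth noticing is that `W_frozen` is never touched; only the small `A` and `B` matrices receive gradients, and the binary reward is something a script can verify on its own.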
The paper teaches vision-language models (AIs that look and read) to pay attention to the right parts of a picture without needing any extra tools while they answer.
MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.
SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.
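As a rough illustration of the "prove it yourself" idea (the `Snapshot` record, the relevance scores, and the selection heuristic below are invented for this sketch, not SmartSnap's pipeline), an agent could log a screenshot after each action and then hand a verifier only the few shots it rates as the strongest evidence:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    step: int          # which action produced this screenshot
    description: str   # what the agent believes the screen shows
    relevance: float   # agent-estimated usefulness as proof (0..1)

def select_evidence(snaps: list[Snapshot], budget: int = 3) -> list[Snapshot]:
    """Pick a small set of snapshots to submit as proof of task completion.

    Toy heuristic: keep the highest-relevance shots, then return them in
    chronological order so a verifier sees a coherent before/after story.
    """
    best = sorted(snaps, key=lambda s: s.relevance, reverse=True)[:budget]
    return sorted(best, key=lambda s: s.step)

trace = [
    Snapshot(1, "home screen", 0.10),
    Snapshot(4, "alarm app open, new 7:00 AM alarm filled in", 0.80),
    Snapshot(5, "alarm list showing the 7:00 AM alarm enabled", 0.95),
    Snapshot(6, "back on the home screen", 0.20),
]
for s in select_evidence(trace):
    print(f"step {s.step}: {s.description}")
```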
This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.
Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.
LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.
This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.
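A minimal sketch of that retrieve-first, plan-second pattern (the word-overlap retriever and the function names are assumptions made for illustration, not EKA's actual implementation):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by raw word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]

def plan_with_early_knowledge(question: str, corpus: list[str]) -> list[str]:
    # The "early" part: one quick retrieval pass BEFORE any plan is written,
    # so the plan is grounded in what the corpus actually contains.
    context = retrieve(question, corpus)
    plan = [f"Read: {passage}" for passage in context]
    plan.append(f"Answer '{question}' using the passages above; retrieve again only if something is missing.")
    return plan

corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Gustave Eiffel's company designed and built the tower.",
    "Paris is the capital of France.",
]
for step in plan_with_early_knowledge("Who built the Eiffel Tower and when?", corpus):
    print(step)
```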
Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.
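In miniature, "keeping track of when" can be pictured as storing a timestamp next to every remembered fact and retrieving by time; the toy store below is a hypothetical illustration, not Memory-T1's design:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Memory:
    when: date   # timestamp of the conversation in which the fact appeared
    fact: str

class TimedMemoryStore:
    """Toy cross-session memory that keeps a timestamp with every stored fact."""

    def __init__(self):
        self.items: list[Memory] = []

    def remember(self, when: date, fact: str) -> None:
        self.items.append(Memory(when, fact))

    def recall_between(self, start: date, end: date) -> list[Memory]:
        # Time-aware retrieval: answer "what happened between X and Y?" in order.
        hits = [m for m in self.items if start <= m.when <= end]
        return sorted(hits, key=lambda m: m.when)

store = TimedMemoryStore()
store.remember(date(2024, 3, 1), "User adopted a puppy named Milo.")
store.remember(date(2024, 5, 10), "Milo finished obedience school.")
store.remember(date(2024, 9, 2), "User moved to Berlin.")

for m in store.recall_between(date(2024, 4, 1), date(2024, 12, 31)):
    print(m.when, "-", m.fact)
```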
GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.
Autoregressive (AR) image models make pictures by choosing tokens one by one, but they have usually been judged only on how likely their token choices are, not on how good the final picture looks in pixel space.
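The gap between those two yardsticks is easy to see in a toy example (the codebook, probabilities, and target image below are made up): the most likely token sequence can score best on negative log-likelihood while decoding to the wrong pixels, and a less likely sequence can reproduce the target image exactly.

```python
import numpy as np

# Toy codebook: each image token decodes to one grayscale pixel value.
codebook = np.array([0.0, 0.1, 0.9, 1.0])

# Toy AR "model": token probabilities at every position (a real model would
# condition on the previously chosen tokens; kept independent for brevity).
token_probs = np.array([0.40, 0.35, 0.15, 0.10])

target_pixels = np.array([1.0, 1.0])   # the 2-pixel "image" we actually want

def token_nll(tokens):
    """Token-level yardstick: negative log-likelihood of the chosen tokens."""
    return float(-np.log(token_probs[tokens]).sum())

def pixel_mse(tokens):
    """Pixel-level yardstick: how far the decoded image is from the target."""
    return float(np.mean((codebook[tokens] - target_pixels) ** 2))

seq_likely = np.array([0, 0])     # greedily picks the most probable tokens
seq_faithful = np.array([3, 3])   # less probable tokens that decode to the right pixels

for name, seq in [("most-likely tokens", seq_likely), ("pixel-faithful tokens", seq_faithful)]:
    print(f"{name}: NLL={token_nll(seq):.2f}, pixel MSE={pixel_mse(seq):.2f}")
```

Running it prints a lower NLL for the most-likely sequence but a lower pixel error for the pixel-faithful one, which is exactly the mismatch the summary points at.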