WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.
This paper teaches AI to solve diagram-based math problems by copying how people think: first see the diagram (perception), then make sense of what was seen (internalization), and finally reason through to the answer.
COMPASS is a new framework that turns a company’s rules into thousands of targeted test questions to check whether chatbots follow those rules.
K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.
This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.
NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.
OpenNovelty is a four-phase, AI-powered helper that checks how new a research paper’s ideas are by comparing them to real papers retrieved from the literature.
DrivingGen is a new, all-in-one test that fairly checks how well AI can generate future driving videos and motions.
SWE-Lego shows that a simple training method called supervised fine-tuning (SFT), when done carefully, can teach AI to fix real software bugs very well.
DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.
A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with it through sensor data and helps us predict, fix, and improve the real thing.
This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.