DSGym is a unified "gym" where AI data science agents are tested and trained by actually running code on real datasets, not just chatting about them.
Memory-V2V teaches video-editing AIs to remember what they have already changed, so new edits stay consistent with old ones.
Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation.
This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
Before this work, most text-to-image models used VAEs (which squish images into small latent codes) and struggled with slow training and with overfitting on small, high-quality fine-tuning sets.
IVRA is a simple, training-free add-on that helps robot brains keep the 2D shape of pictures while following language instructions.
This paper shows that giving an AI a safe, tiny virtual computer (a sandbox) lets it solve many kinds of problems better, not just coding ones.
This paper shows how to turn any normal photo or video into a seamless 360° panorama without needing the camera’s settings like field of view or tilt.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.
ActionMesh is a fast, feed-forward AI that turns videos, images + text, text alone, or a given 3D model into an animated 3D mesh.
This paper introduces EDIR, a new and much more detailed test for Composed Image Retrieval (CIR), where you search for a target image using a starting image plus a short text change.