SWE-rebench V2 is a giant, language-agnostic automated pipeline that turns real GitHub pull requests into safe, runnable software tasks for training AI coding agents.
SeeThrough3D teaches image generators to understand what should be visible and what should be hidden when objects overlap, just like in real life.
This paper teaches an AI to segment any object you name (open-vocabulary segmentation) much better by combining a few pixel-labeled example pictures with smart retrieval.
Multi-agent systems are like teams of smart helpers, but one bad message can mislead the whole team.
People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good, or even better, when you give them a few examples in the prompt (in-context learning).
EmbodMocap is a low-cost, portable way to capture people moving inside real places using just two iPhones, so computers and robots can learn from real life instead of studios.
AgentVista is a new test (benchmark) that checks whether AI agents can solve tough, real-life picture-based problems by using multiple tools over many steps.
The paper argues that to build an AI that truly understands and simulates the real world, it must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
GeoWorld is a new way for AI to plan several steps into the future by thinking in shapes (geometry) instead of only numbers.
Before SkillNet, AI agents kept solving the same kinds of problems over and over without saving what they learned in a clean, reusable way.
This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.
This paper shows how to fairly test "general-purpose" AI agents that should work in many different settings without special tweaks.