Papers (1055)

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Ibragim Badertdinov, Maksim Nekrashevich et al. · Feb 27 · arXiv

SWE-rebench V2 is a large-scale, language-agnostic automated pipeline that turns real GitHub pull requests into safe, runnable software tasks for training AI coding agents.

#SWE-rebench V2 #software engineering agents #reinforcement learning

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

Intermediate

Vaibhav Agrawal, Rishubh Parihar et al. · Feb 26 · arXiv

SeeThrough3D teaches image generators to understand what should be visible and what should be hidden when objects overlap, just like in real life.

#occlusion-aware generation #3D layout control #text-to-image

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Intermediate

Tilemachos Aravanis, Vladan Stojnić et al. · Feb 26 · arXiv

This paper teaches an AI to segment any object you name (open-vocabulary) much better by adding a few example pictures with pixel labels and smart retrieval.

#open-vocabulary segmentation #vision-language models #retrieval-augmented

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Intermediate

Yutong Wang, Siyuan Xiong et al. · Feb 26 · arXiv

Multi-agent systems are like teams of smart helpers, but one bad message can mislead the whole team.

#multi-agent systems #error propagation #test-time rectification

Large Multimodal Models as General In-Context Classifiers

Intermediate

Marco Garosi, Matteo Farina et al. · Feb 26 · arXiv

People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good, or even better, when given a few examples in the prompt (in-context learning).

#in-context learning #multimodal models #open-world classification

EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

Intermediate

Wenjia Wang, Liang Pan et al. · Feb 26 · arXiv

EmbodMocap is a low-cost, portable way to capture people moving inside real places using just two iPhones, so computers and robots can learn from real life instead of studios.

#Embodied AI #4D human-scene reconstruction #dual-view RGB-D

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Intermediate

Zhaochen Su, Jincheng Gao et al. · Feb 26 · arXiv

AgentVista is a new test (benchmark) that checks whether AI agents can solve tough, real-life picture-based problems by using multiple tools over many steps.

#AgentVista #multimodal agents #visual grounding

The Trinity of Consistency as a Defining Principle for General World Models

Intermediate

Jingxuan Wei, Siyuan Li et al. · Feb 26 · arXiv

The paper argues that to build an AI that truly understands and simulates the real world, it must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).

#world model #trinity of consistency #modal consistency

GeoWorld: Geometric World Models

Intermediate

Zeyu Zhang, Danning Li et al. · Feb 26 · arXiv

GeoWorld is a new way for AI to plan several steps into the future by thinking in shapes (geometry) instead of only numbers.

#geometric world model #hyperbolic JEPA #Poincaré ball

SkillNet: Create, Evaluate, and Connect AI Skills

Intermediate

Yuan Liang, Ruobin Zhong et al. · Feb 26 · arXiv

Before SkillNet, AI agents kept solving the same kinds of problems over and over without saving what they learned in a clean, reusable way.

#AI skills #Skill ontology #Skill taxonomy

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Intermediate

Zeyuan Liu, Jeonghye Kim et al. · Feb 26 · arXiv

This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.

#EMPO #memory-augmented agents #on-policy learning

General Agent Evaluation

Intermediate

Elron Bandel, Asaf Yehudai et al. · Feb 26 · arXiv

This paper shows how to fairly test "general-purpose" AI agents that should work in many places without special tweaks.

#general-purpose agents #agent evaluation #unified protocol
