Papers (1262)

MMA: Multimodal Memory Agent

Long-horizon AI assistants can grab old, low-quality, or conflicting memories and then answer with too much confidence, which is dangerous.

#memory-augmented LLMs #multimodal agents #reliability scoring

CADEvolve: Creating Realistic CAD via Program Evolution

Intermediate

Maksim Elistratov, Marina Barannikov et al. · Feb 18 · arXiv

AI models that make CAD designs used to learn mostly from simple “draw-then-extrude” examples, so they struggled with real, complex parts.

#CAD #CadQuery #Image2CAD

Multi-agent cooperation through in-context co-player inference

Intermediate

Marissa A. Weis, Maciej Wołczyk et al. · Feb 18 · arXiv

The paper shows that AI agents can learn to cooperate simply by playing against many different kinds of co-players and figuring them out on the fly, without hardcoding how those co-players learn.

#multi-agent reinforcement learning #in-context learning #co-player inference

Learning Personalized Agents from Human Feedback

Beginner

Kaiqu Liang, Julia Kruk et al. · Feb 18 · arXiv

AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.

#personalization #human feedback #pre-action clarification

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Intermediate

Haoxiang Sun, Lizhen Xu et al. · Feb 18 · arXiv

DeepVision-103K is a new 103,000-example picture-and-text math dataset designed to help AI think better using rewards that can be checked automatically.

#DeepVision-103K #multimodal reasoning #RLVR

MAEB: Massive Audio Embedding Benchmark

Intermediate

Adnan El Assadi, Isaac Chung et al. · Feb 17 · arXiv

MAEB is a giant, fair report card for audio AI that tests 50+ models on 30 tasks across speech, music, environmental sounds, and audio–text tasks in 100+ languages.

#audio embeddings #MAEB #MTEB

SAM 3D Body: Robust Full-Body Human Mesh Recovery

Intermediate

Xitong Yang, Devansh Kukreja et al. · Feb 17 · arXiv

SAM 3D Body (3DB) is a model that turns a single photo of a person into a full 3D mesh of the body, feet, and hands with state-of-the-art accuracy.

#human mesh recovery #3D human pose #Momentum Human Rig

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Intermediate

Sen Ye, Mengde Xu et al. · Feb 17 · arXiv

Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.

#multimodal models #image generation #reasoning

GLM-5: from Vibe Coding to Agentic Engineering

Intermediate

GLM-5 Team, Aohan Zeng et al. · Feb 17 · arXiv

GLM-5 is a new open-weight AI model that moves from 'vibe coding' (prompting the model to write code) to 'agentic engineering' (letting the model plan, build, test, and fix software on its own).

#GLM-5 #Agentic Engineering #DeepSeek Sparse Attention

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Intermediate

Hila Manor, Rinon Gal et al. · Feb 17 · arXiv

This paper teaches image models to copy a change shown in one image pair and apply it to a new image, like saying 'hat added here, add a similar hat there.'

#visual analogy learning #LoRA #LoRA basis

World Action Models are Zero-shot Policies

Intermediate

Seonghyeon Ye, Yunhao Ge et al. · Feb 17 · arXiv

DreamZero is a robot brain that learns actions by predicting short videos of the future and the matching moves at the same time.

#World Action Models #DreamZero #video diffusion

"What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing

Beginner

Johannes Kirmayr, Raphael Wennmacher et al. · Feb 17 · arXiv

The study tested how an in-car AI helper should talk while it works on long, multi-step tasks.

#agentic AI #LLM assistants #intermediate feedback
