This paper teaches a humanoid robot to find and pick up many different objects in new places using plain-language requests like “grab the orange mug.”
Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.
This paper teaches AI agents to make smart choices about when to explore for more information and when to act right away.
SAW-Bench is a new test that checks if AI can understand the world from a first-person view, like wearing smart glasses.
Accuracy alone can make AI agents look good on paper while still failing in real life; this paper shows how to measure reliability properly.
Long-horizon AI assistants can grab old, low-quality, or conflicting memories and then answer with too much confidence, which is dangerous.
AI models that make CAD designs used to learn mostly from simple “draw-then-extrude” examples, so they struggled with real, complex parts.
The paper shows that AI agents can learn to cooperate simply by playing against many different kinds of opponents and figuring them out on the fly, without hardcoding how those opponents learn.
DeepVision-103K is a new 103,000-example picture-and-text math dataset designed to help AI think better using rewards that can be checked automatically.
MAEB is a giant, fair report card for audio AI that tests 50+ models on 30 tasks across speech, music, environmental sounds, and audio–text tasks in 100+ languages.
SAM 3D Body (3DB) is a model that turns a single photo of a person into a full 3D mesh of the body, hands, and feet with state-of-the-art accuracy.
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.