Papers (11)

#vision-language-action

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Intermediate

Siting Wang, Xiaofeng Wang et al. · Mar 2 · arXiv

Robots that read images and instructions (VLAs) get stuck following a narrow, fragile path after normal training.

#vision-language-action #flow matching #stochastic differential equations

RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning

Beginner

Seungku Kim, Suhyeok Jang et al. · Feb 21 · arXiv

RoboCurate is a way to make better robot training videos by checking if the actions in a generated video actually match what a robot would do in a simulator.

#RoboCurate #neural trajectory #action verification

RynnBrain: Open Embodied Foundation Models

Beginner

Ronghao Dang, Jiayan Guo et al. · Feb 13 · arXiv

RynnBrain is an open-source 'robot brain' that helps machines see, think, and plan in the real world across space and time.

#embodied intelligence #egocentric vision #spatiotemporal localization

Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

Beginner

Zichen Jeff Cui, Omar Rayyan et al. · Feb 9 · arXiv

Robots often get confused by wordy instructions, so this paper tells them exactly where to make contact instead of describing the task in sentences.

#contact-anchored policies #robot utility models #contact anchor

RoboBrain 2.5: Depth in Sight, Time in Mind

Intermediate

Huajie Tan, Enshen Zhou et al. · Jan 20 · arXiv

RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.

#Embodied AI #3D spatial reasoning #metric grounding

ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands

Intermediate

Siyuan Hu, Kevin Qinghong Lin et al. · Dec 31 · arXiv

Computers usually click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper fixes that.

#GUI automation #continuous control #flow matching

GR-Dexter Technical Report

Intermediate

Ruoshi Wen, Guangzeng Chen et al. · Dec 30 · arXiv

GR-Dexter is a full package—new robot hands, a smart AI brain, and lots of carefully mixed data—that lets a two-handed robot follow language instructions to do long, tricky tasks.

#vision-language-action #dexterous manipulation #bimanual robotics

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Intermediate

Jiacheng Ye, Shansan Gong et al. · Dec 27 · arXiv

Dream-VL and Dream-VLA use a diffusion language model backbone to understand images, talk about them, and plan actions better than many regular (autoregressive) models.

#diffusion language model #vision-language model #vision-language-action

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Intermediate

Xiaopeng Lin, Shijie Lian et al. · Dec 18 · arXiv

Robots learn best from what they would actually see, which is a first-person (egocentric) view, but most AI models are trained on third-person videos and get confused.

#egocentric vision #first-person video #vision-language model

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Intermediate

Zechen Bai, Chen Gao et al. · Dec 16 · arXiv

Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when things change.

#EVOLVE-VLA #test-time training #vision-language-action

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate

Hao Lu, Ziyang Liu et al. · Dec 10 · arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP #vision-language-action #world model
