Papers (4)

#zero-shot generalization

Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

Beginner

Zichen Jeff Cui, Omar Rayyan et al. | Feb 9 | arXiv

Robots often struggle with wordy language instructions, so this paper conditions them on exactly where to make contact instead of describing the task in sentences.

#contact-anchored policies #robot utility models #contact anchor

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Beginner

Nate Gillman, Yinghua Zhou et al. | Jan 9 | arXiv

Video models can now be told what physical result you want (like “make this ball move left with a strong push”) using Goal Force, instead of just vague text or a final picture.

#goal force #force vector control #visual planning

Sharp Monocular View Synthesis in Less Than a Second

Beginner

Lars Mescheder, Wei Dong et al. | Dec 11 | arXiv

SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.

#monocular view synthesis #3D Gaussians #real-time neural rendering

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner

Jiehui Huang, Yuechen Zhang et al. | Dec 8 | arXiv

UnityVideo is a single, unified model that learns jointly from many kinds of video information, such as color (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation, to generate more realistic, world-aware videos.

#multimodal video generation #multi-task learning #dynamic noise scheduling