Papers (9)

#optical flow

RealWonder: Real-Time Physical Action-Conditioned Video Generation

RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.

#action-conditioned video generation #physics simulation #optical flow

Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

Intermediate

Jiahao Lu, Jiayi Xu et al. · Mar 3 · arXiv

Track4World is a fast, feedforward model that tracks the 3D path of every pixel in a video from a single camera.

#dense 3D tracking #scene flow #2D-to-3D correlation

DreamWorld: Unified World Modeling in Video Generation

Intermediate

Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer #world model #optical flow

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate

Kanchana Ranasinghe, Honglu Zhou et al. · Jan 15 · arXiv

FOFPred is a new model that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the moments that follow.

#optical flow #future optical flow prediction #vision-language model

Motion Attribution for Video Generation

Intermediate

Xindi Wu, Despoina Paschalidou et al. · Jan 13 · arXiv

Motive is a new way to figure out which training videos teach a video model how things move realistically, not just how they look.

#motion attribution #video diffusion #optical flow

RadarGen: Automotive Radar Point Cloud Generation from Cameras

Intermediate

Tomer Borreda, Fangqiang Ding et al. · Dec 19 · arXiv

RadarGen is a tool that learns to generate realistic automotive radar point clouds from multiple camera views alone.

#automotive radar #radar point cloud generation #latent diffusion

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate

Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion #4D scene geometry #diffusion video generation

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Intermediate

Chiao-An Yang, Ryo Hachiuma et al. · Dec 18 · arXiv

This paper teaches a video-understanding model to reason in 3D plus time (4D) so it can answer questions about specific objects moving in videos.

#4D perception #multimodal large language models #perceptual distillation

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

Intermediate

Zihan Wang, Jiashun Wang et al. · Dec 16 · arXiv

CRISP turns an ordinary phone video of a person into a clean 3D scene and a virtual human that can move in it without violating physics.

#real-to-sim #human-scene interaction #planar primitives
