How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)


Rethinking Video Generation Model for the Embodied World

Beginner
Yufan Deng, Zilin Pan et al. · Jan 21 · arXiv

Robots need generated videos that not only look good but also follow real-world physics and complete the task they were asked to do.

#robot video generation · #embodied AI · #benchmark

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate
Kanchana Ranasinghe, Honglu Zhou et al. · Jan 15 · arXiv

FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move over the next moments (a toy sketch of this interface follows below).

#optical flow · #future optical flow prediction · #vision-language model
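
To make the interface concrete: the model takes one or two context frames plus a language instruction and outputs a dense (dx, dy) motion field over pixels. Below is a minimal PyTorch sketch of that input/output contract only; the toy CNN, the stand-in text encoder, and every name are assumptions for illustration, not FOFPred's actual architecture.

```python
# Toy sketch of a future-optical-flow interface (all names are assumptions).
import torch
import torch.nn as nn

class ToyFutureFlowPredictor(nn.Module):
    def __init__(self, text_dim=64):
        super().__init__()
        self.text_embed = nn.Embedding(1000, text_dim)      # stand-in text encoder
        self.encoder = nn.Conv2d(2 * 3 + text_dim, 32, 3, padding=1)
        self.head = nn.Conv2d(32, 2, 3, padding=1)          # 2 channels: (dx, dy) per pixel

    def forward(self, frames, instruction_ids):
        # frames: (B, 2, 3, H, W) -- one or two context images (repeat one if needed)
        b, n, c, h, w = frames.shape
        x = frames.reshape(b, n * c, h, w)
        t = self.text_embed(instruction_ids).mean(dim=1)    # (B, text_dim)
        t = t[:, :, None, None].expand(-1, -1, h, w)        # broadcast text over pixels
        return self.head(torch.relu(self.encoder(torch.cat([x, t], dim=1))))

model = ToyFutureFlowPredictor()
frames = torch.randn(1, 2, 3, 64, 64)
tokens = torch.randint(0, 1000, (1, 8))    # e.g. "move the bottle left to right"
print(model(frames, tokens).shape)         # torch.Size([1, 2, 64, 64]): future motion per pixel
```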

Motion Attribution for Video Generation

Intermediate
Xindi Wu, Despoina Paschalidou et al. · Jan 13 · arXiv

Motive is a new way to figure out which training videos teach an AI how to move things realistically, not just how they look (a sketch of the general attribution recipe follows below).

#motion attribution · #video diffusion · #optical flow
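
Motion attribution sits in the broader family of training-data attribution. One common recipe, sketched below, scores each training video by how well the gradient of a motion-only loss on it aligns with the gradient for a query generation; this gradient-similarity estimator is an assumption for illustration, not necessarily Motive's exact method, and all module names are stand-ins.

```python
# Gradient-similarity sketch of training-data attribution for motion.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)   # stand-in for a video generation model

def motion_loss(feats):
    # stand-in for a loss computed only on motion (e.g. optical-flow) features
    return model(feats).pow(2).mean()

def grad_vector(loss):
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.flatten() for g in grads])

query = torch.randn(16)                          # motion features of one generated clip
g_query = grad_vector(motion_loss(query))

train_videos = [torch.randn(16) for _ in range(5)]
scores = [
    torch.cosine_similarity(g_query, grad_vector(motion_loss(v)), dim=0).item()
    for v in train_videos
]
print(scores)   # higher score -> that training video likely taught this motion
```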

RadarGen: Automotive Radar Point Cloud Generation from Cameras

Intermediate
Tomer Borreda, Fangqiang Ding et al. · Dec 19 · arXiv

RadarGen is a tool that learns to generate realistic automotive radar point clouds purely from multiple camera views (a toy sketch of the diffusion pattern follows below).

#automotive radar · #radar point cloud generation · #latent diffusion
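
The tags point at latent diffusion: pool multi-view camera features into a conditioning vector, iteratively denoise a latent, then decode it into radar points. The toy loop below illustrates only that pattern; the module shapes, the fixed-step update, and the (x, y, z, RCS) point layout are illustrative assumptions, not RadarGen's actual pipeline.

```python
# Toy camera-conditioned latent diffusion producing a radar point cloud.
import torch
import torch.nn as nn

cam_encoder = nn.Linear(3 * 32 * 32, 64)    # stand-in multi-view image encoder
denoiser = nn.Linear(64 + 64, 64)           # predicts noise from (latent, camera context)
point_decoder = nn.Linear(64, 128 * 4)      # latent -> 128 points x (x, y, z, RCS)

views = torch.randn(1, 6, 3, 32, 32)                  # six surround-view cameras
cond = cam_encoder(views.flatten(2).mean(dim=1))      # pool views into one context vector

z = torch.randn(1, 64)                                # start from pure noise
for _ in range(10):                                   # crude fixed-step denoising loop
    eps = denoiser(torch.cat([z, cond], dim=-1))
    z = z - 0.1 * eps                                 # toy update; real models use a scheduler

points = point_decoder(z).reshape(1, 128, 4)          # generated radar point cloud
print(points.shape)                                   # torch.Size([1, 128, 4])
```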

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate
Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion · #4D scene geometry · #diffusion video generation

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Intermediate
Chiao-An Yang, Ryo Hachiuma et al. · Dec 18 · arXiv

This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos (a minimal distillation sketch follows below).

#4D perception · #multimodal large language models · #perceptual distillation
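
"Perceptual distillation" here means training the student so its region features match those of a frozen 4D-aware perception teacher. A minimal sketch of one such distillation loss, assuming stand-in linear modules and an MSE objective rather than the paper's exact recipe:

```python
# Feature-distillation sketch: student mimics a frozen 4D perception teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(32, 16)   # frozen 4D-aware perception model (stand-in)
student = nn.Linear(32, 16)   # region head of the video LLM (stand-in)
for p in teacher.parameters():
    p.requires_grad_(False)

region_tokens = torch.randn(8, 32)        # features for 8 object regions over time
with torch.no_grad():
    target = teacher(region_tokens)       # teacher's 4D-aware region embedding

distill_loss = F.mse_loss(student(region_tokens), target)
distill_loss.backward()                   # pulls the student toward the teacher's features
print(distill_loss.item())
```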

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

Intermediate
Zihan Wang, Jiashun Wang et al. · Dec 16 · arXiv

CRISP turns a normal phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.

#real-to-sim · #human-scene interaction · #planar primitives

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Beginner
Yixin Wan, Lei Ke et al. · Dec 11 · arXiv

This paper introduces MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without altering how they look or breaking the scene.

#motion-centric image editing · #optical flow · #MotionEdit dataset

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner
Jiehui Huang, Yuechen Zhang et al. · Dec 8 · arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once, like colors (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation, to make smarter, more realistic videos (a toy fusion sketch follows below).

#multimodal video generation · #multi-task learning · #dynamic noise scheduling
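
The unifying trick is to embed every modality into one shared token space so a single backbone can attend across all of them at once. A toy sketch, assuming per-modality linear embeddings and a small transformer; the dimensions, token counts, and fusion-by-concatenation scheme are illustrative, not UnityVideo's actual design.

```python
# Toy multi-modal fusion: one token space, one backbone for all modalities.
import torch
import torch.nn as nn

d = 64
embed = nn.ModuleDict({
    "rgb":   nn.Linear(3, d),
    "depth": nn.Linear(1, d),
    "flow":  nn.Linear(2, d),
    "pose":  nn.Linear(17 * 2, d),   # 17 body keypoints, (x, y) each
    "seg":   nn.Linear(1, d),
})
layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)

inputs = {                             # (batch, tokens, channels), toy token counts
    "rgb":   torch.randn(1, 16, 3),
    "depth": torch.randn(1, 16, 1),
    "flow":  torch.randn(1, 16, 2),
    "pose":  torch.randn(1, 16, 34),
    "seg":   torch.randn(1, 16, 1),
}
tokens = torch.cat([embed[k](v) for k, v in inputs.items()], dim=1)
out = backbone(tokens)                 # all modalities reasoned over jointly
print(out.shape)                       # torch.Size([1, 80, 64])
```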