How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)

#optical flow

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate
Kanchana Ranasinghe, Honglu Zhou et al. · Jan 15 · arXiv

FOFPred is a model that takes one or two images plus a short instruction such as “move the bottle left to right” and predicts how every pixel will move over the next few moments.

#optical flow, #future optical flow prediction, #vision-language model

Motion Attribution for Video Generation

Intermediate
Xindi Wu, Despoina Paschalidou et al. · Jan 13 · arXiv

Motive is a method for identifying which training videos teach a video generation model to move things realistically, rather than just how things look.

#motion attribution, #video diffusion, #optical flow

RadarGen: Automotive Radar Point Cloud Generation from Cameras

Intermediate
Tomer Borreda, Fangqiang Ding et al. · Dec 19 · arXiv

RadarGen is a tool that learns to generate realistic automotive radar point clouds from multiple camera views alone.

#automotive radar, #radar point cloud generation, #latent diffusion

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate
Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion, #4D scene geometry, #diffusion video generation

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Intermediate
Chiao-An Yang, Ryo Hachiuma et al. · Dec 18 · arXiv

This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos.

#4D perception, #multimodal large language models, #perceptual distillation

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

Intermediate
Zihan Wang, Jiashun Wang et al. · Dec 16 · arXiv

CRISP turns an ordinary phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.

#real-to-sim, #human-scene interaction, #planar primitives