How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#point cloud

Utonia: Toward One Encoder for All Point Clouds

Intermediate
Yujia Zhang, Xiaoyang Wu et al. | Mar 3 | arXiv

Utonia is a single brain (encoder) that learns from many kinds of 3D point clouds, like indoor rooms, outdoor streets, tiny toys, and even city maps.

#Utonia #point cloud #self-supervised learning

WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

Intermediate
Yisu Zhang, Chenjie Cao et al. | Mar 2 | arXiv

WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.

#video diffusion models #camera control #3D reconstruction

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Intermediate
Shengchao Zhou, Yuxin Chen et al. | Dec 23 | arXiv

The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).

#dynamic spatial reasoning #vision-language models #4D understanding

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

Intermediate
Zihan Wang, Jiashun Wang et al. | Dec 16 | arXiv

CRISP turns a normal phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.

#real-to-sim #human-scene interaction #planar primitives

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Intermediate
Chuhan Zhang, Guillaume Le Moing et al. | Dec 9 | arXiv

D4RT is a new AI model that turns regular videos into moving 3D scenes (4D) quickly and accurately.

#D4RT #dynamic 4D reconstruction #query-based decoding