Papers (6)

#VGGT

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

The paper introduces VGGT-Det, which detects 3D objects in indoor scenes from multi-view images without sensor-provided camera poses or depth maps.

#Sensor-Geometry-Free 3D detection #Indoor multi-view detection #VGGT

DreamWorld: Unified World Modeling in Video Generation

Intermediate

Boming Tan, Xiangdong Zhang et al. | Feb 28 | arXiv

DreamWorld is a video generation approach whose outputs not only look realistic but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer #world model #optical flow

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Intermediate

Edgar Sucar, Eldar Insafutdinov et al. | Jan 14 | arXiv

V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.

#Dynamic Point Maps #4D reconstruction #scene flow

Orient Anything V2: Unifying Orientation and Rotation Understanding

Intermediate

Zehan Wang, Ziang Zhang et al. | Jan 9 | arXiv

This paper teaches an AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation), all in one system.

#object orientation #rotational symmetry #relative rotation

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Intermediate

Shuai Yuan, Yantai Yang et al. | Jan 5 | arXiv

InfiniteVGGT is a streaming 3D vision system that can process live video of unbounded length without running out of memory.

#InfiniteVGGT #rolling memory #causal attention

How Much 3D Do Video Foundation Models Encode?

Intermediate

Zixuan Huang, Xiang Li et al. | Dec 23 | arXiv

This paper asks a simple question: do video AI models trained only on 2D videos implicitly learn 3D structure?

#video foundation models #3D awareness #temporal reasoning
