Robots like cars and drones see the world with many different sensors (cameras, LiDAR, radar, and even event cameras), and this paper shows a clear roadmap for teaching them to understand space by learning from all of these together.
SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.