This paper introduces 3DiMo, a method for controlling human motion in generated videos while leaving camera movement flexibly controllable through text.
Robots often perceive the world as flat 2D images but must act in 3D space, which makes precise actions difficult.