Large Vision-Language Models (LVLMs) handle a single picture well but get confused when given several, often mixing up details from different images.
VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.