This paper teaches a computer to find the same object when it is seen from two very different cameras, such as a body camera (first-person view) and a room camera (third-person view).
3AM is a new way to track and segment the same object across a whole video, even when the camera view changes a lot.