Large language models often sound confident even when they are wrong, and existing ways to catch mistakes are slow or not very accurate.
TimeLens studies how to teach AI to identify not just what happens in a video but exactly when it happens, a task known as video temporal grounding (VTG).