Big AI reasoning models often keep thinking long after they have already found the right answer, wasting time and tokens.
Large language models often sound confident even when they are wrong, and existing ways to catch mistakes are slow or not very accurate.
TimeLens studies how to teach AI to identify not just what happens in a video but exactly when it happens, a task called video temporal grounding (VTG).