PhyCritic is a judge model that checks other AI models' answers about the physical world, like cooking steps, robot actions, or driving choices.
This paper builds TAD, a brand-new test that checks if AI can understand what happens over time in real driving videos.