This paper teaches talking avatars not only to speak, but also to look around their scene and handle nearby objects as a text instruction directs.
SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty pictures.