This paper shows a simple, single-model way to dub videos so that the new voice and the speaker's lip movements stay naturally in sync.
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.
The paper shows that big sequence models (like transformers) quietly learn longer-horizon goals inside their hidden activations, even though they are trained to predict only one step at a time.
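A claim like "longer-horizon goals inside their hidden activations" is usually checked with a linear probe: train a simple classifier on the model's activations and see whether it can predict something several steps in the future better than chance. The sketch below illustrates only that general idea, not the paper's actual method; `hidden_states` and `future_labels` are placeholder stand-ins for a real model's activations at step t and a label describing what happens k steps later.

```python
# Minimal linear-probe sketch (illustrative only).
# Assumption: `hidden_states` would come from a trained sequence model and
# `future_labels` would describe something k steps ahead (e.g., a later action).
# Here both are random placeholders, so the probe should score near chance;
# with real activations, above-chance accuracy would suggest the hidden state
# encodes information beyond the immediate next step.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N, d = 2000, 256
hidden_states = rng.normal(size=(N, d))        # placeholder activations at step t
future_labels = rng.integers(0, 8, size=N)     # placeholder "goal" k steps ahead

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, future_labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = probe.score(X_te, y_te)
chance = 1.0 / len(np.unique(future_labels))
print(f"probe accuracy {accuracy:.3f} vs chance {chance:.3f}")
```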
RadarGen is a tool that learns to generate realistic car radar point clouds just from multiple camera views.
Fast-FoundationStereo is a stereo vision system that sees depth from two cameras in real time while still working well on brand‑new scenes it was never trained on.