This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
Most text-to-image models act like word-to-pixel copy machines, translating prompts literally and missing the meaning implied between the lines.
WorldCanvas lets you make videos where things happen exactly how you ask by combining three inputs: text (what happens), drawn paths called trajectories (when and where it happens), and reference images (who it is).
This paper shows how to make image and video generators follow the prompts you type more faithfully and reliably.
OneStory is a new way to make long, multi-shot videos whose story, characters, and places stay consistent across time.
ProPhy is a new two-step method that helps video AIs follow real-world physics, not just make pretty pictures.