Papers (5)

#semantic alignment

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Minh-Quan Le, Gaurav Mittal et al. · Feb 2 · arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video, #optimal transport, #annotation-free

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Intermediate

Siqi Kou, Jiachun Jin et al. · Jan 15 · arXiv

Most text-to-image models act like word-to-pixel copy machines and miss the hidden meaning in our prompts; this paper has an LLM encoder reason about the prompt first and only then generate the image.

#think-then-generate, #reasoning-aware text-to-image, #LLM encoder

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

Intermediate

Hanlin Wang, Hao Ouyang et al. · Dec 18 · arXiv

WorldCanvas lets you make videos where things happen exactly how you ask by combining three inputs: text (what happens), drawn paths called trajectories (when and where it happens), and reference images (who it is).

#WorldCanvas, #promptable world events, #trajectory-controlled video generation

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Intermediate

Bozhou Li, Sihan Yang et al. · Dec 17 · arXiv

This paper improves the text embeddings that diffusion models rely on, so the words you type into a generator turn into the right pictures and videos more reliably.

#diffusion models, #text encoder, #multimodal large language model

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Intermediate

Zijun Wang, Panwen Hu et al. · Dec 5 · arXiv

ProPhy is a two-step method that helps video generation models follow real-world physics instead of just making pretty pictures.

#physics-aware video generation, #mixture-of-experts, #token-level routing
