The paper introduces PRIVASIS, a huge, fully synthetic dataset (1.4 million records) filled with realistic-looking private details, but created from scratch so it does not belong to any real person.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful bits instead of everything.
Real instructions often contain logic such as "and", "first-then", and "if-else", and this paper teaches models to notice and obey that logic.
Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
Vector Prism helps computers animate SVG images by first discovering which tiny shapes belong together as meaningful parts.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.
The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.
VideoCoF is a new way to edit videos that first figures out WHERE to edit and then does the edit, like thinking before acting.