ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.
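The plan-then-render split described above can be illustrated with a minimal sketch. Everything here is hypothetical: `plan` and `draw` are stand-in names (the paper's actual components are not specified in this summary), and the "language model" reasoning is stubbed with a simple string split for illustration.

```python
def plan(instruction):
    """'Thinking' stage (hypothetical): decompose a complex instruction
    into ordered, atomic edit steps before any pixels are generated.
    A real system would use a reasoning model; we stub with a split."""
    return [step.strip() for step in instruction.split(",") if step.strip()]

def draw(canvas, step):
    """'Drawing' stage (hypothetical): apply one atomic step to the canvas.
    Here the canvas is just a list recording the applied steps."""
    return canvas + [step]

instruction = "remove the lamp, brighten the sky, add a red kite"
canvas = []
for step in plan(instruction):
    canvas = draw(canvas, step)
```

The design point is the separation itself: because planning finishes before drawing starts, each drawing call receives a clear, doable step rather than the full tangled instruction.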
VINO is a single AI model that can create and edit both images and videos, taking in text, reference pictures, and reference clips at the same time.
T2AV-Compass is a new, unified test to fairly grade AI systems that turn text into matching video and audio.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
Vector Prism helps computers animate SVG images by first discovering which tiny shapes belong together as meaningful parts.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
Role-playing agents must juggle several goals at once: staying in character, following instructions, and striking the right tone.
The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.
VideoCoF is a new way to edit videos that first works out where to edit and only then performs the edit, like thinking before acting.
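The locate-then-edit pattern behind this idea can be sketched in a few lines. This is a generic illustration, not VideoCoF's actual method: `locate`, `apply_edit`, and the fixed region are all hypothetical stand-ins, and frames are plain dicts rather than real video tensors.

```python
from dataclasses import dataclass

@dataclass
class EditRegion:
    """A region selected for editing: a frame range plus a bounding box."""
    frame_start: int
    frame_end: int
    box: tuple  # (x0, y0, x1, y1)

def locate(frames, instruction):
    """Stage 1 (hypothetical): reason about where the edit applies.
    A real model would ground the instruction; we return a fixed region."""
    return EditRegion(frame_start=0, frame_end=len(frames) - 1,
                      box=(10, 10, 50, 50))

def apply_edit(frames, region, instruction):
    """Stage 2 (hypothetical): edit only inside the located region,
    leaving every other frame and pixel untouched."""
    edited = [dict(f) for f in frames]  # copy so originals stay unchanged
    for i in range(region.frame_start, region.frame_end + 1):
        edited[i]["edited_box"] = region.box  # placeholder for the real edit
    return edited

frames = [{"id": i} for i in range(4)]
instruction = "turn the car red"
region = locate(frames, instruction)
result = apply_edit(frames, region, instruction)
```

Splitting localization from editing is what keeps the "everything else unchanged" guarantee cheap to enforce: the second stage simply never touches pixels outside the region the first stage committed to.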