The paper introduces SPOT, a training recipe that fixes an AI model’s mistakes with tiny edits while keeping what it already knows well.
The paper argues that an AI that truly understands and simulates the real world must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
SkyReels-V4 is a single, unified model that makes videos and matching sounds together, while also letting you fix or change parts of a video.
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.
LOCA-bench is a test that challenges AI agents to work correctly as their to-do list and background information grow very, very long.
The paper introduces PRIVASIS, a huge, fully synthetic dataset (1.4 million records) filled with realistic-looking private details, but created from scratch so it does not belong to any real person.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
The paper studies how to teach a smaller language model using a bigger one by only focusing on the most useful bits instead of everything.
Real instructions often contain logic such as "and", "first-then", and "if-else"; this paper teaches models to notice and obey that logic.
Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
Vector Prism helps computers animate SVG images by first discovering which tiny shapes belong together as meaningful parts.