Reasoning models often talk too much, and those extra words can actually make them more wrong.
The paper introduces SPOT, a training recipe that fixes an AI model’s mistakes with tiny edits while keeping what it already knows well.
The paper argues that an AI that truly understands and simulates the real world must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
SkyReels-V4 is a single, unified model that makes videos and matching sounds together, while also letting you fix or change parts of a video.
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.
LOCA-bench is a test that challenges AI agents to work correctly as their to-do list and background information grow very, very long.
CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.
The paper introduces PRIVASIS, a huge, fully synthetic dataset (1.4 million records) filled with realistic-looking private details, but created from scratch so it does not belong to any real person.
The paper fixes a hidden mistake many fast video generators make when turning a "see-everything" model into a "see-past-only" one.
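The summary does not say what the hidden mistake is, but the "see-everything" vs. "see-past-only" distinction is the standard bidirectional-vs-causal attention split. A minimal sketch of the causal version, assuming plain softmax attention (the function and mask construction here are illustrative, not the paper's code):

```python
import numpy as np

def causal_attention_weights(scores):
    """Turn raw (T, T) attention logits into "see-past-only" weights.

    A bidirectional ("see-everything") model lets position i attend to
    every position j; masking out j > i before the softmax forces each
    step to look only at the past and present.
    """
    T = scores.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where j > i
    masked = np.where(future, -np.inf, scores)          # future gets zero weight
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)
```

The first row can only attend to itself, so its weight vector is all mass on position 0; every row still sums to 1 over the allowed positions.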
The paper studies how to teach a smaller language model using a bigger one by only focusing on the most useful bits instead of everything.
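One common way to "only focus on the most useful bits" in teacher-student distillation is to match the student to the teacher only on the teacher's top-k tokens. The top-k rule and the value of k below are illustrative assumptions, not the paper's actual selection criterion:

```python
import numpy as np

def topk_distill_loss(teacher_probs, student_probs, k=3):
    """KL divergence from student to teacher, restricted to the k tokens
    the teacher rates most likely -- a toy version of distilling only
    the most useful bits instead of the whole vocabulary.

    teacher_probs, student_probs: 1-D probability vectors over the vocab.
    """
    top = np.argsort(teacher_probs)[-k:]                # teacher's top-k token ids
    t = teacher_probs[top] / teacher_probs[top].sum()   # renormalize the kept slice
    s = student_probs[top] / student_probs[top].sum()
    return float(np.sum(t * (np.log(t) - np.log(s))))   # KL(t || s) on kept tokens
```

If the student already matches the teacher on those tokens the loss is zero, and it grows as the student's top-k distribution drifts away.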
This paper builds a new test called AgentIF-OneDay that checks if AI helpers can follow everyday instructions the way people actually give them.
Real instructions often carry logic such as "and", "first-then", and "if-else", and this paper teaches models to notice and obey that logic.