Kiwi-Edit is a new video editor that follows your written instructions and can also copy the look of a reference picture you give it.
The paper argues that an AI that truly understands and simulates the real world must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).
The paper tackles a big problem in long video generation: models either forget what happened earlier or slowly drift off course as the video gets longer.
RISE-Video is a new test that checks whether video-making AIs follow hidden world rules, not just make pretty pictures.
FastVMT is a faster way to copy motion from one video to another without training a new model for each video.
VideoMaMa is a model that turns simple black-and-white object masks into soft, precise cutouts (alpha mattes) for every frame of a video.
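To make that input/output contract concrete, here is a minimal sketch assuming a per-frame model that takes a hard 0/1 mask and returns a soft alpha in [0, 1]; every name below is an illustrative placeholder, not VideoMaMa's actual code, and the blur is only a stand-in for the real prediction step.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def refine_to_matte(frame, hard_mask):
        # Placeholder for the real model: it would predict fractional opacity so
        # hair, motion blur, and soft edges keep partial transparency. Here we
        # simply soften the hard mask to show the expected output format.
        return np.clip(gaussian_filter(hard_mask.astype(np.float32), sigma=2.0), 0.0, 1.0)

    # A video is handled as a sequence of (frame, mask) pairs, one matte per frame.
    frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]   # dummy frames
    masks  = [np.ones((64, 64), dtype=np.uint8) for _ in range(4)]       # dummy 0/1 masks
    mattes = [refine_to_matte(f, m) for f, m in zip(frames, masks)]      # soft cutouts in [0, 1]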
Motion 3-to-4 turns a single regular video into a moving 3D object over time (a 4D asset) by first getting the object’s shape and then figuring out how every part moves.
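A rough sketch of that two-stage idea, under heavy assumptions: recover one static shape first, then a per-frame displacement for every point of it. The class and function names are hypothetical, not the paper's interface, and both stages are stubs.

    from dataclasses import dataclass
    from typing import List
    import numpy as np

    @dataclass
    class Asset4D:
        rest_shape: np.ndarray               # (N, 3) points of the object's static shape
        per_frame_offsets: List[np.ndarray]  # one (N, 3) displacement field per video frame

    def reconstruct_shape(frames: List[np.ndarray]) -> np.ndarray:
        # Stage 1 (stub): estimate the object's 3D shape from the video.
        return np.zeros((1024, 3), dtype=np.float32)

    def estimate_motion(frames: List[np.ndarray], shape: np.ndarray) -> List[np.ndarray]:
        # Stage 2 (stub): work out how every part of that shape moves in each frame.
        return [np.zeros_like(shape) for _ in frames]

    def video_to_4d(frames: List[np.ndarray]) -> Asset4D:
        shape = reconstruct_shape(frames)
        return Asset4D(rest_shape=shape, per_frame_offsets=estimate_motion(frames, shape))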
CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.
FlowAct-R1 is a new system that makes lifelike human videos in real time, so the on-screen person can react quickly as you talk to them.
Motive is a new way to figure out which training videos teach an AI realistic motion, not just realistic appearance.
MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.
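To make the input requirement concrete, here is a hypothetical call shape; the function, its arguments, and the file names are assumptions for illustration, not MoCha's real API.

    from typing import List

    def swap_character(video_path: str, mask_frame_index: int,
                       mask_path: str, reference_paths: List[str]) -> str:
        # Stub standing in for the model: in reality it would propagate the
        # single-frame mask through the video and render the new character.
        return "output.mp4"

    result = swap_character(
        video_path="input.mp4",
        mask_frame_index=0,                   # the one frame where a mask is provided
        mask_path="first_frame_mask.png",
        reference_paths=["char_front.jpg", "char_side.jpg"],  # a few reference photos
    )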