This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking at the scene and ignore the instructions you give them.
TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.
ShapeR builds clean, correctly sized 3D objects from messy, casual phone or smart-glasses videos by combining images, camera poses, sparse SLAM points, and short text captions.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.
This paper turns a video model into a step-by-step visual thinker that produces a single, high-quality final image from a text prompt.
APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.
DreamStyle is a single video-stylizing model that can follow text, copy a style image, or continue from a stylized first frame—without switching tools.
NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.
DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.
Computers usually click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper fixes that.