This paper teaches video-making AIs to follow real-world physics, so rolling balls roll correctly and collisions look believable.
HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.
This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.
APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
DreamStyle is a single video-stylizing model that can follow text, copy a style image, or continue from a stylized first frame, all without switching tools.
NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.
DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.
Computers usually click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper fixes that.
FlowBlending is a simple way to speed up video diffusion models by smartly choosing when to use a big model and when a small one is enough.
This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.
GR-Dexter is a full package (new robot hands, a smart AI brain, and lots of carefully mixed data) that lets a two-handed robot follow language instructions to do long, tricky tasks.