VABench is a new, all-in-one benchmark that checks how well AI makes videos with matching sound and pictures.
Visionary is a platform that lets you view and interact with advanced 3D scenes right in your browser, with just a click.
Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.
UnityVideo is a single, unified model that learns from many kinds of video information at once—like colors (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation—to make smarter, more realistic videos.
OneStory is a new way to make long, multi-shot videos that keep the story, characters, and places consistent across time.
The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.
This paper shows a new way to teach an autoencoder to shape its hidden space (the 'latent space') to look like any distribution we want, not just a simple bell curve.
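As a rough illustration of the general idea (not this paper's actual method), one common way to pull an autoencoder's latent codes toward an arbitrary target distribution is to add a distribution-matching penalty, such as maximum mean discrepancy (MMD), between a batch of codes and samples drawn from the target. Everything below, including the function names and the uniform target, is a hypothetical sketch:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel values between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd(codes, target_samples, sigma=1.0):
    # Maximum mean discrepancy: near zero when the two batches come
    # from the same distribution, larger as they diverge.
    k_xx = gaussian_kernel(codes, codes, sigma).mean()
    k_yy = gaussian_kernel(target_samples, target_samples, sigma).mean()
    k_xy = gaussian_kernel(codes, target_samples, sigma).mean()
    return k_xx + k_yy - 2 * k_xy

rng = np.random.default_rng(0)
target = rng.uniform(-1, 1, size=(256, 2))      # any target shape, not just a bell curve
codes_bad = rng.normal(0, 3, size=(256, 2))     # latent codes far from the target
codes_good = rng.uniform(-1, 1, size=(256, 2))  # latent codes matching the target

# During training, mmd(encoder(batch), target_batch) would be added
# to the reconstruction loss to shape the latent space.
print(mmd(codes_bad, target), mmd(codes_good, target))
```

The penalty is differentiable, so it can be minimized jointly with the reconstruction loss by ordinary gradient descent; the actual paper may use a different distance or training scheme.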
DeepCode is an AI coding system that turns long, complicated papers into full, working code repositories.
This paper teaches a language model to think along several paths at the same time instead of one step after another.
LLM multi-agent systems often fail quietly (no crash) and leave long, twisty logs that are hard to debug by hand.
Long Video Understanding (LVU) is hard because the important clues are tiny, far apart in time, and buried in hours of mostly unimportant footage.
This paper argues that the fastest and safest path to super-smart AI is for humans and AIs to improve together, not for AI to improve alone.