FlowBlending is a simple way to speed up video diffusion models: it decides, step by step, when a big model is needed and when a small one is enough.
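The general idea can be sketched as a sampling loop that routes each denoising step to one of two models. Everything below is a toy stand-in: the step names, the fixed set of "hard" steps, and the placeholder update are assumptions for illustration, not the paper's actual routing rule.

```python
# Toy sketch of big-model/small-model routing across diffusion steps.
# Which steps count as "hard" is the method's real contribution; here
# it is just a hand-picked set.
calls = {"big": 0, "small": 0}

def big_model(x, t):
    calls["big"] += 1
    return x * 0.9  # placeholder denoising update

def small_model(x, t):
    calls["small"] += 1
    return x * 0.9  # cheaper placeholder update

def blended_sample(x, num_steps, hard_steps):
    # Run the sampling loop, spending the big model only on flagged steps.
    for t in range(num_steps, 0, -1):
        model = big_model if t in hard_steps else small_model
        x = model(x, t)
    return x

blended_sample(1.0, num_steps=10, hard_steps={10, 9, 8})
print(calls)  # → {'big': 3, 'small': 7}
```

Even this toy version shows the payoff: most steps run on the cheap model, so total cost drops while the big model still handles the critical steps.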
The paper introduces Nested Learning, a new way to build AI that learns in layers (like Russian dolls), so each part can update at its own speed and remember different things.
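The "layers updating at their own speed" idea can be shown with a toy multi-timescale update loop. The level names, periods, and learning rate below are made up for this sketch; the paper's Nested Learning framework is far more general.

```python
# Illustrative multi-timescale updates: level i only changes every
# periods[i] steps, so slow levels keep longer-lived memory.
def nested_step(params, grads, step, periods, lr=0.1):
    for i in range(len(params)):
        if step % periods[i] == 0:  # slower levels skip most steps
            params[i] -= lr * grads[i]
    return params

params = [1.0, 1.0, 1.0]   # fast, medium, slow levels (hypothetical)
periods = [1, 2, 4]        # level i updates every periods[i] steps
for step in range(8):
    params = nested_step(params, [1.0, 1.0, 1.0], step, periods)
print([round(p, 6) for p in params])  # → [0.2, 0.6, 0.8]
```

After eight steps the fast level has moved the most and the slow level the least, which is the nesting intuition in miniature.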
Youtu-LLM is a small (1.96B) language model that was trained from scratch to think, plan, and act like an agent instead of just copying bigger models.
Language is lumpy: easy stretches and tricky jumps are mixed together, yet standard models spend the same effort on every word.
Youtu-Agent is a build-and-grow factory for AI agents that cuts manual setup and keeps agents improving over time.
This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.
SenseNova-MARS is a vision-language model that can think step-by-step and use three tools—text search, image search, and image cropping—during its reasoning.
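A tool-use loop like this can be sketched as a trace replayed against a tool registry. The tool functions, the `(kind, name, arg)` tuple format, and the example query are all illustrative assumptions; the real model emits tool calls inside its chain of thought.

```python
# Hypothetical registry mirroring the three tools the summary names.
def text_search(q):  return f"text hits for {q!r}"
def image_search(q): return f"image hits for {q!r}"
def crop_image(box): return f"crop {box}"

TOOLS = {"text_search": text_search,
         "image_search": image_search,
         "crop": crop_image}

def run_trace(trace):
    # Replay a reasoning trace, executing tool calls and collecting
    # the observations that would be fed back into the reasoning.
    observations = []
    for step in trace:
        if step[0] == "tool":
            _, name, arg = step
            observations.append(TOOLS[name](arg))
    return observations

obs = run_trace([
    ("think", "need to identify the landmark"),
    ("tool", "crop", "(10, 10, 120, 120)"),
    ("tool", "text_search", "tower with clock face"),
])
print(obs[0])  # → crop (10, 10, 120, 120)
```

The point of interleaving: each tool result lands back in the trace, so later reasoning steps can condition on what the crop or search returned.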
FIGR is a new way for AI to ‘think by drawing,’ using code to build clean, editable diagrams while it reasons.
Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.
GR-Dexter is a full package—new robot hands, a smart AI brain, and lots of carefully mixed data—that lets a two-handed robot follow language instructions to do long, tricky tasks.
GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.
This paper teaches video-language models to first find when the evidence appears in a video and then answer using that evidence, instead of mixing both steps together.
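The two-stage split can be sketched as grounding first, answering second. The frame "captions" and keyword matching below are toy stand-ins for what the trained model actually does.

```python
# Stage 1: locate when the evidence occurs (toy keyword grounding).
def locate_evidence(captions, query):
    return [i for i, c in enumerate(captions) if query in c]

# Stage 2: answer only from the located frames, never the whole video.
def grounded_answer(captions, query):
    span = locate_evidence(captions, query)
    if not span:
        return "not enough evidence", []
    return f"answer based on frames {span}", span

captions = ["empty street", "dog enters", "dog catches ball", "dog exits"]
answer, span = grounded_answer(captions, "dog catches")
print(span)  # → [2]
```

Keeping the stages separate means a wrong answer can be traced back: either the model looked at the wrong moment (stage 1) or misread the right one (stage 2).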