DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.
Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.
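The contrast between one final reward and per-step rewards can be sketched as follows. This is a generic illustration, not the paper's actual method; every function and variable name here is hypothetical.

```python
# Generic illustration: a single outcome reward vs. per-step (process) rewards.
# All names are hypothetical, not from the paper.

def outcome_reward(steps, final_answer, gold):
    # One scalar at the very end: it cannot say which step went wrong.
    return 1.0 if final_answer == gold else 0.0

def process_rewards(steps, step_is_valid):
    # One signal per intermediate step, so errors are caught where they occur.
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]

steps = ["2 + 2 = 4", "4 * 3 = 13", "13 - 1 = 12"]  # the middle step is wrong
valid = lambda s: eval(s.split("=")[0]) == float(s.split("=")[1])  # toy checker

print(outcome_reward(steps, "12", "11"))  # 0.0: failure flagged only at the end
print(process_rewards(steps, valid))      # [1.0, 0.0, 1.0]: pinpoints step 2
```

The per-step list localizes the arithmetic slip to the second step, which a single end-of-answer score cannot do.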
ToolSafe is a new way to keep AI agents safe when they use external tools by checking each action before it runs.
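The general pattern of checking an action before it runs looks roughly like the sketch below. This is a minimal generic guard, not ToolSafe's actual checks; the pattern list and function names are made up for illustration.

```python
# Generic pre-execution guard pattern (hypothetical rules, not ToolSafe's).

UNSAFE_PATTERNS = ["rm -rf", "DROP TABLE"]  # toy deny-list for illustration

def check_action(tool_name, arguments):
    """Return (allowed, reason) BEFORE the tool call is executed."""
    text = f"{tool_name} {arguments}"
    for pat in UNSAFE_PATTERNS:
        if pat in text:
            return False, f"blocked: matched unsafe pattern {pat!r}"
    return True, "ok"

def safe_execute(tool_name, arguments, run):
    # Gate every call through the checker; refuse instead of running.
    allowed, reason = check_action(tool_name, arguments)
    if not allowed:
        return {"status": "refused", "reason": reason}
    return {"status": "done", "result": run(arguments)}

print(safe_execute("shell", "rm -rf /tmp/x", lambda a: None))  # refused
print(safe_execute("calc", "1 + 1", lambda a: 2))              # done, result 2
```

The key design point is that the check sits between the agent's decision and the tool's side effects, so an unsafe action is stopped before anything irreversible happens.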
The paper introduces M^4olGen, a two-stage system that designs new molecules to match exact numbers for several properties (like QED, LogP, MW, HOMO, LUMO) at the same time.
LaViT is a new way to teach smaller vision-language models to look at the right parts of an image before they speak.
The paper introduces SIN-Bench, a new way to test AI models that read long scientific papers by forcing them to show exactly where their answers come from.
FlowAct-R1 is a new system that makes lifelike human videos in real time, so the on-screen person can react quickly as you talk to them.
This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.
Big video generators (diffusion models) create great videos but are too slow because they use hundreds of tiny clean-up steps.
This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.
Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs, making it much faster while staying smart.
This paper shows how to make long, camera-controlled videos much faster by generating only a few smart keyframes with diffusion, then filling in the rest using a 3D scene and rendering.
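The keyframe-then-fill idea can be sketched in a toy form. This is a generic illustration only: real systems like the one described render the in-between frames from a reconstructed 3D scene, whereas this sketch uses cheap linear blending as a stand-in, and all names are hypothetical.

```python
import numpy as np

# Toy sketch of "generate a few keyframes, fill in the rest cheaply".
# Linear blending stands in for the paper's 3D-scene rendering step.

def fill_between_keyframes(keyframes, stride):
    """keyframes: list of HxWx3 arrays (the expensive diffusion outputs).
    Returns a dense frame list with `stride` frames per keyframe gap."""
    frames = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        for t in range(stride):
            w = t / stride
            frames.append((1 - w) * a + w * b)  # cheap fill-in, not diffusion
    frames.append(keyframes[-1])
    return frames

# Two tiny "keyframes": a dark frame and a bright frame.
keys = [np.full((2, 2, 3), v, dtype=float) for v in (0.0, 1.0)]
video = fill_between_keyframes(keys, stride=4)
print(len(video))  # 5 frames from only 2 expensive keyframes
```

The speedup comes from the ratio: only the keyframes pay the full diffusion cost, while every in-between frame is produced by the much cheaper fill-in step.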