This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.
This paper builds two models that work together, Qwen3-VL-Embedding and Qwen3-VL-Reranker, both of which understand text, images, visual documents, and videos in one shared space so search works across all of them.
TourPlanner is a travel-planning system that first gathers the right places, then lets multiple expert ‘voices’ debate plans, and finally polishes the winner with a learning method that puts following the rules ahead of style.
This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.
The paper teaches a game-playing AI to copy good human players (behavior cloning) and shows that simply scaling up the model and the data makes the AI reason more causally: it pays closer attention to what truly causes outcomes on screen.
Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.
CHORD is a new way to animate 3D scenes over time (4D) where many objects move and interact, guided only by a text prompt.
The paper introduces Agentic Rubrics, a new way to check code fixes without running the code by creating a smart checklist from the project itself.
APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.
Everyone uses tests (benchmarks) to judge how smart AI models are, but not all tests are good tests.
FOCUSUI makes computer-using AI faster and still accurate by looking only at the important parts of a screen.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.