V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
This paper builds two models that work together, Qwen3-VL-Embedding and Qwen3-VL-Reranker, which represent text, images, visual documents, and videos in one shared space so that search works across all of them.
The paper introduces M3DR, a way for computers to find the right document image no matter which of 22 languages the query or the document uses.