The paper introduces CHAIN, a hands-on 3D playground that tests whether AI models can not only recognize objects but also plan and act under realistic physics.
OCR means transcribing a page exactly as written, leaving no ambiguity about what the output should be, and that strictness makes it a natural fit for fast, parallel generation.
NarraScore turns a video's changing story into a matching soundtrack by using emotion as the bridge.
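The "emotion as the bridge" idea can be illustrated with a toy matcher: map both the video and candidate soundtracks into a shared emotion space (here, valence/arousal per segment), then pick the track whose emotional trajectory stays closest to the video's. All numbers, track names, and the distance metric below are invented for illustration and are not drawn from NarraScore itself:

```python
import numpy as np

# Toy emotion trajectories in (valence, arousal) space, one row per segment.
# These values are illustrative, not from the paper.
video_emotion = np.array([[0.2, 0.8],   # tense opening
                          [0.6, 0.5],   # hopeful middle
                          [0.9, 0.3]])  # calm, happy ending

# Candidate soundtracks, each described by its own emotion trajectory.
tracks = {
    "ominous_drone":  np.array([[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]),
    "rising_strings": np.array([[0.2, 0.7], [0.5, 0.5], [0.9, 0.3]]),
    "party_pop":      np.array([[0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]),
}

def emotion_distance(video, track):
    """Mean Euclidean distance between time-aligned emotion segments."""
    return float(np.linalg.norm(video - track, axis=1).mean())

best = min(tracks, key=lambda name: emotion_distance(video_emotion, tracks[name]))
print(best)  # → rising_strings, which follows the video's emotional arc
```

The key design choice in any such scheme is the shared emotion representation: once video and music live in the same space, matching reduces to a nearest-trajectory search.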
Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.
XR is a new training-free team of AI agents that finds images using both a reference picture and a short text edit (like “same jacket but red”).
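The generic composed-retrieval idea behind a query like “same jacket but red” can be sketched without any training: fuse an embedding of the reference image with an embedding of the text edit, then rank a gallery by similarity to the fused query. Everything below, the hand-crafted 3-d embeddings, the additive fusion, and the gallery items, is a toy stand-in for a real vision-language encoder, not the XR system itself:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Toy embeddings; in practice these would come from a vision-language
# encoder such as CLIP. Illustrative axes: [jacket-ness, red-ness, blue-ness].
ref_image = normalize(np.array([1.0, 0.0, 1.0]))   # a blue jacket
text_edit = normalize(np.array([0.0, 1.0, -1.0]))  # "make it red, not blue"

# A common training-free baseline: fuse the two embeddings by addition.
query = normalize(ref_image + text_edit)

gallery = {
    "blue_jacket": normalize(np.array([1.0, 0.0, 1.0])),
    "red_jacket":  normalize(np.array([1.0, 1.0, 0.0])),
    "red_scarf":   normalize(np.array([0.0, 1.0, 0.0])),
}

# Cosine similarity (vectors are unit-normalized, so a dot product suffices).
scores = {name: float(query @ emb) for name, emb in gallery.items()}
best = max(scores, key=scores.get)
print(best)  # → red_jacket
```

Additive fusion is only the simplest baseline; agentic systems like XR instead reason over the image and edit text directly, but the retrieval objective (rank the gallery against a composed query) is the same.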
Real users often pair vague questions with images, and today’s vision-language models (VLMs) struggle to handle that ambiguity.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
The paper builds YearGuessr, a giant, worldwide photo-and-text dataset of 55,546 buildings with their construction years (1001–2024), GPS, and popularity (page views).