CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.
AgentIF-OneDay is a new test that checks whether AI helpers can follow everyday instructions the way people actually give them.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
VINO is a single AI model that can make and edit both images and videos by reading text and looking at reference pictures and clips at the same time.
T2AV-Compass is a new, unified test to fairly grade AI systems that turn text into matching video and audio.