This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.
The paper teaches vision-language models (AIs that look and read) to pay attention to the right picture parts without needing extra tools during answering time.