Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
This paper teaches multimodal AI models not just to read pictures but also to imagine and think with pictures inside their heads.
MMGR is a new benchmark that checks whether AI image and video generators follow real-world rules, not just whether their outputs look pretty.