InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
Intermediate · Kaican Li, Lewei Yao et al. · Dec 21 · arXiv
This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.
#multimodal reasoning #generalized visual search #reinforcement learning