VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Intermediate · Zirui Wang, Junyi Zhang et al. · Jan 23 · arXiv
VisGym is a playground of 17 highly diverse visual tasks for testing and training Vision–Language Models (VLMs), AI models that process both images and text, to act over many steps.