VisGym is a playground of 17 very different visual tasks that test and train AI models that see and talk (Vision–Language Models) to act over many steps.
FantasyVLN teaches a robot to follow language instructions while looking around, using a smart, step-by-step thinking style during training but not at test time.