PyVision-RL: Forging Open Agentic Vision Models via RL
Intermediate | Shitian Zhao, Shaoheng Lin et al. | Feb 24 | arXiv
PyVision-RL teaches vision-language models to act like curious agents that think in multiple steps and use Python tools to inspect images and videos.
#agentic multimodal models #reinforcement learning #dynamic tooling