SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
Intermediate · Yong Xien Chng, Tao Hu et al. · Dec 30 · arXiv
SenseNova-MARS is a vision-language model, trained with reinforcement learning, that reasons step by step and calls three tools during that reasoning: text search, image search, and image cropping.
#multimodal agent #vision-language model #reinforcement learning
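To make the idea of "reasoning interleaved with tool calls" concrete, here is a minimal sketch of such an agent loop. It is not the SenseNova-MARS implementation. The trained model is replaced by a hypothetical `scripted_policy` that follows a fixed crop, identify, look-up plan. The tools `text_search`, `image_search` and `crop_image` are toy stand-ins over in-memory data. `Step` and `run_agent` are also illustrative names. The one design point it tries to show is that a crop changes the image that later tools see, so the model can zoom in before searching.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Image = list[list[int]]  # toy grayscale image: a list of pixel rows


@dataclass
class Step:
    tool: Optional[str]  # hypothetical: tool to call, or None when the model answers
    arg: object = None   # tool argument, or the final answer text


def size(image: Image) -> str:
    return f"{len(image)}x{len(image[0]) if image else 0}"


# Hypothetical stand-ins for the three tools named in the summary.
def text_search(query: str) -> str:
    toy_corpus = {"landmark a height": "100 m"}  # toy data, not a real search engine
    return toy_corpus.get(query.lower(), "no results")


def image_search(image: Image) -> str:
    # Toy reverse-image lookup: any non-empty image "matches" the same entity.
    return "Landmark A" if image else "no match"


def crop_image(image: Image, box: tuple[int, int, int, int]) -> Image:
    top, left, bottom, right = box  # half-open row/column ranges
    return [row[left:right] for row in image[top:bottom]]


def run_agent(policy: Callable[[list[str]], Step], image: Image,
              max_turns: int = 5) -> tuple[str, list[str]]:
    """Alternate policy steps and tool calls until an answer or the turn limit."""
    history: list[str] = []
    for _ in range(max_turns):
        step = policy(history)
        if step.tool is None:
            return str(step.arg), history
        if step.tool == "crop_image":
            image = crop_image(image, step.arg)  # later tools see the zoomed crop
            observation = f"cropped to {size(image)}"
        elif step.tool == "image_search":
            observation = image_search(image)
        elif step.tool == "text_search":
            observation = text_search(str(step.arg))
        else:
            observation = f"unknown tool {step.tool!r}"
        history.append(f"{step.tool} -> {observation}")
    return "no answer", history


def scripted_policy(history: list[str]) -> Step:
    # Hypothetical stand-in for the RL-trained model: a fixed three-step plan.
    plan = [Step("crop_image", (0, 0, 2, 2)),
            Step("image_search"),
            Step("text_search", "Landmark A height")]
    if len(history) < len(plan):
        return plan[len(history)]
    return Step(None, history[-1].split(" -> ")[1])  # answer from last observation


photo = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
answer, trace = run_agent(scripted_policy, photo)
print(answer)  # 100 m
```

In the real system, the policy would be the vision-language model itself, deciding at each turn whether to call a tool or answer. Reinforcement learning is what shapes those decisions. The loop structure stays the same.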