AgentVista is a new benchmark that tests whether AI agents can solve challenging, real-world image-based problems by using multiple tools over many steps.
MAXS is a new method that lets AI agents look a few steps ahead while using tools such as search and code execution, so they can choose their next actions more wisely.