OmniGAIA is a new benchmark that tests whether AI agents can watch videos, look at images, listen to audio, and use web and code tools over several steps to reach an answer that can be verified.
MatchTIR trains AI agents by scoring each tool call on its own instead of giving every step the same reward.
ET-Agent is a training framework that teaches AI agents to use tools (like search and code) more wisely, not just to get the right answer.
MindWatcher is an AI agent that thinks step by step and decides when to use tools like web search, image zooming, and a code calculator to solve hard, multi-step problems.
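
The contrast in the MatchTIR summary, one shared reward for every step versus a separate score per tool call, can be illustrated with a small sketch. This is not MatchTIR's actual method: the function names, the 0-to-1 step scores, and the blending weight are all assumptions made up for illustration.

```python
from typing import List


def uniform_rewards(num_steps: int, final_reward: float) -> List[float]:
    """Give every step the same reward, based only on the final outcome."""
    return [final_reward] * num_steps


def stepwise_rewards(
    step_scores: List[float], final_reward: float, weight: float = 0.5
) -> List[float]:
    """Blend a per-step score with the shared outcome reward.

    `step_scores` and `weight` are hypothetical; a real system would need
    some way to judge how useful each tool call was.
    """
    return [weight * s + (1 - weight) * final_reward for s in step_scores]


# Hypothetical trajectory: three tool calls, the second one was unhelpful,
# but the final answer was still correct (final_reward = 1.0).
step_scores = [1.0, 0.0, 1.0]

print(uniform_rewards(len(step_scores), 1.0))
print(stepwise_rewards(step_scores, 1.0))
```

With uniform rewards the unhelpful second call is credited as much as the useful ones; with per-step scores it receives a lower reward, which is the kind of signal a step-level scheme is meant to provide.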