Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.
This paper builds a giant, automatically generated video dataset called SVG2 that describes who is in each video, what they look like, and how they interact over time.
The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or check the results well.
WildGraphBench is a new test that checks how well GraphRAG systems find and combine facts from messy, real-world web pages.
The paper fixes a common problem in training AI reasoners: models get stuck using the same favorite solution style and stop exploring new ways to solve problems.
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check whether chatbots follow those rules.