This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.
This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).
This paper fixes a common problem in multimodal AI: models can understand pictures and words well but stumble when asked to create matching images.
This paper teaches AI helpers to browse the web more like people do, not just by grabbing static snippets.