The paper argues that the fairest way to check how generally smart an AI is, is to see how quickly and well it learns lots of different human-made games, just like a person with the same time and practice.
This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).