GameDevBench is a new benchmark that checks whether AI agents can actually build parts of video games, rather than just write code in a single file.
MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens by following plain-language instructions.