MobilityBench is a large, carefully built benchmark that checks how well AI helpers can plan real-world routes using natural language and map tools.
AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.
This paper shows that giving an AI a safe, tiny virtual computer (a sandbox) lets it solve many kinds of problems better, not just coding ones.