This paper tackles a real-life problem: people often give phone agents short, vague instructions, so the agent must infer the missing details from what it knows about the user.
This paper teaches AI to solve diagram-based math problems by copying how people think: first see the diagram (perception), then make sense of what was seen (internalization), and finally solve the problem (reasoning).