This paper tackles a simple but serious question: can AI agents use paid tools to finish multi-step tasks without blowing the budget?
This paper asks whether large language models (LLMs) can act as "world models" that predict what happens next in text-based environments, rather than just the next word in a sentence.