World models are learned models that predict ("imagine") future states so a robot can plan what to do next, but running them many times in a row is computationally expensive.
Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.
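
To see why the rollout share matters, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes that rollout takes exactly 70% of training time (the text only says "more than 70%") and that the remaining 30% is unaffected by any rollout speedup. The function name `overall_speedup` is hypothetical and chosen only for illustration.

```python
def overall_speedup(rollout_fraction: float, rollout_speedup: float) -> float:
    """Amdahl's-law estimate of end-to-end training speedup.

    rollout_fraction: share of training time spent in rollout (assumed 0.70 here).
    rollout_speedup:  how many times faster the rollout stage becomes.
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if rollout_speedup <= 0.0:
        raise ValueError("rollout_speedup must be positive")
    # Time left after the speedup, as a fraction of the original time:
    # the untouched part plus the shrunken rollout part.
    remaining = (1.0 - rollout_fraction) + rollout_fraction / rollout_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    for s in (1.5, 2.0, 4.0):
        total = overall_speedup(0.70, s)
        print(f"{s:.1f}x faster rollout -> {total:.2f}x faster training")
```

Under these assumptions, doubling rollout speed makes training about 1.5x faster overall, and no rollout speedup alone can exceed roughly 3.3x (1 / 0.30), because the other 30% of the time remains.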