WorldCompass applies reinforcement learning after pretraining so that video world models follow input actions more accurately while preserving visual quality.
The paper asks under what conditions reinforcement learning (RL) genuinely improves the reasoning of language models beyond what they acquired during pretraining.