The paper shows that when training reasoning AIs with reinforcement learning, treating every wrong answer the same makes the AI overconfident in some bad paths and less diverse overall.
This paper introduces Self-E, a text-to-image model that learns from scratch and can generate good pictures in any number of steps, from just a few to many.