Papers (2)

#Reverse KL

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Yuanda Xu, Hejian Sang et al. · Feb 24 · arXiv

The paper shows that when reasoning models are trained with reinforcement learning, treating every wrong answer the same makes the model overconfident in some incorrect reasoning paths and reduces the diversity of its outputs.

#ACE #Reinforcement Learning with Verifiable Rewards #GRPO


Self-Evaluation Unlocks Any-Step Text-to-Image Generation

Intermediate

Xin Yu, Xiaojuan Qi et al. · Dec 26 · arXiv

This paper introduces Self-E, a text-to-image model trained from scratch that can generate good images with any number of inference steps, from just a few to many.

#Self-Evaluating Model #Any-step inference #Text-to-image generation
