Large language models donβt map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.
GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.