Golden Goose turns messy internet text into clean multiple-choice puzzles that computers can learn from and get automatic rewards for.
Large Vision-Language Models (LVLMs) look great on single images but often stumble when they must reason across multiple images.