People often default to CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can match or beat them when given a few labeled examples in the prompt (in-context learning).
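The few-shot setup can be pictured as an interleaved prompt of labeled example images followed by the query image. The message schema below is a hypothetical stand-in, not the paper's actual interface; real multimodal APIs use a similar interleaved structure.

```python
# Minimal sketch of few-shot in-context labeling with an LMM.
# The message schema is a hypothetical placeholder, not a real API.

LABELS = ["cat", "dog", "bird"]

def build_fewshot_prompt(examples, query_image):
    """Interleave labeled example images with the final query image."""
    messages = [{"role": "system",
                 "content": f"Label each image as one of: {', '.join(LABELS)}."}]
    for image, label in examples:
        messages.append({"role": "user", "content": [("image", image)]})
        messages.append({"role": "assistant", "content": label})
    # The unlabeled query image goes last; the model completes the pattern.
    messages.append({"role": "user", "content": [("image", query_image)]})
    return messages

demo = build_fewshot_prompt([("cat1.jpg", "cat"), ("dog1.jpg", "dog")], "query.jpg")
print(len(demo))  # system + two labeled pairs + query = 6 messages
```

The point is that the "training data" lives entirely in the prompt: no fine-tuning, just examples the model imitates at inference time.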
PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
This paper shows how teams of AI agents can improve by scoring every intermediate move they make, not just the final answer.
Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.
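The contrast between pass/fail and per-step feedback can be seen in a toy sketch; the task, trajectories, and scoring rule here are invented purely for illustration.

```python
# Toy contrast between outcome-only and per-step reward signals.
# The task and scoring are invented for illustration only.

def outcome_reward(steps, target):
    """Pass/fail: reward 1 only if the final answer matches."""
    return 1.0 if steps[-1] == target else 0.0

def per_step_rewards(steps, reference):
    """Score each intermediate move against a reference trajectory."""
    return [1.0 if s == r else 0.0 for s, r in zip(steps, reference)]

attempt   = ["parse", "plan", "wrong-tool", "answer=41"]
reference = ["parse", "plan", "lookup",     "answer=42"]

print(outcome_reward(attempt, "answer=42"))  # 0.0: the whole attempt looks worthless
print(per_step_rewards(attempt, reference))  # [1.0, 1.0, 0.0, 0.0]: two good steps are visible
```

With only the outcome reward, the agent's two correct early steps earn nothing; the per-step signal preserves that credit, which is exactly the information a pass/fail reward hides.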
SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.
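One way to picture "exploring more only at important moments" is to branch wider only where the policy is uncertain. The entropy threshold and rollout counts below are invented for illustration and are not SPARK's actual rule.

```python
# Toy sketch of uncertainty-gated exploration: spend extra rollouts only
# where the policy is most uncertain. Threshold and counts are invented.
import math

def entropy(probs):
    """Shannon entropy (nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rollouts_for_step(action_probs, base=1, extra=4, threshold=0.5):
    """Branch wider at high-entropy (pivotal) decision points."""
    return base + (extra if entropy(action_probs) > threshold else 0)

confident = [0.97, 0.01, 0.01, 0.01]  # policy nearly sure: don't waste compute
uncertain = [0.4, 0.3, 0.2, 0.1]      # pivotal moment: explore more branches

print(rollouts_for_step(confident))  # 1
print(rollouts_for_step(uncertain))  # 5
```

Compute then concentrates on the handful of steps where extra samples can actually change the outcome, instead of being spread evenly over every step.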
GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without being misled by flawed reward signals (reward hacking).
TreeGRPO teaches image generators using a branching tree of samples, so each training run produces many useful learning signals instead of just one.
GEPA is a new way to improve AI prompts by letting the AI read its own work, reflect in plain language on what went wrong, and then rewrite its instructions.
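The evaluate-reflect-rewrite loop can be sketched with stubs standing in for the LLM; the evaluator and the rewrite rule below are invented stand-ins for natural-language reflection, not GEPA's real interface.

```python
# Sketch of a reflect-and-rewrite prompt-optimization loop in the spirit
# of GEPA. The evaluator and rewrite rule are stubs for an LLM's
# plain-language reflection; the real system's interface differs.

def evaluate(prompt, cases):
    """Return per-case feedback: (passed, plain-language failure note)."""
    results = []
    for question in cases:
        passed = "show your steps" in prompt  # stub success criterion
        note = "ok" if passed else "model skipped intermediate reasoning"
        results.append((passed, note))
    return results

def reflect_and_rewrite(prompt, feedback):
    """Fold the failure notes back into the instructions (LLM stub)."""
    failures = [note for passed, note in feedback if not passed]
    if any("skipped intermediate reasoning" in f for f in failures):
        prompt += " Always show your steps."
    return prompt

prompt = "Answer the question."
cases = ["2+2", "3*3"]
for _ in range(3):  # iterate: evaluate -> reflect -> rewrite
    feedback = evaluate(prompt, cases)
    if all(passed for passed, _ in feedback):
        break
    prompt = reflect_and_rewrite(prompt, feedback)
print(prompt)  # the instructions now include the fix from reflection
```

The key idea the loop illustrates is that the feedback is readable text about *why* a case failed, and the rewrite step folds that explanation back into the instructions.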