Papers7

#sample efficiency

Large Multimodal Models as General In-Context Classifiers

Marco Garosi, Matteo Farina et al.Feb 26arXiv

People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good—or even better—when you give them a few examples in the prompt (in-context learning).

#in-context learning#multimodal models#open-world classification

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate

Fu-Yun Wang, Han Zhang et al.Feb 1arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL#flow matching#reinforcement learning

Scaling Multiagent Systems with Process Rewards

Intermediate

Ed Li, Junyu Ren et al.Jan 30arXiv

This paper teaches AI teams to get better by scoring every move they make, not just the final answer.

#multiagent reinforcement learning#process rewards#AI feedback

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate

Jinyang Wu, Shuo Yang et al.Jan 28arXiv

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK#dynamic branching#strategic exploration

GARDO: Reinforcing Diffusion Models without Reward Hacking

Intermediate

Haoran He, Yuxiao Ye et al.Dec 30arXiv

GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.

#GARDO#reward hacking#gated KL regularization

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Intermediate

Zheng Ding, Weirui YeDec 9arXiv

TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.

#TreeGRPO#reinforcement learning#diffusion models

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Intermediate

Lakshya A Agrawal, Shangyin Tan et al.Jul 25arXiv

GEPA is a new way to improve AI prompts by letting the AI read its own work, reflect in plain language on what went wrong, and then rewrite its instructions.

#GEPA#reflective prompt evolution#Pareto frontier