How I Study AI - Learn AI Papers & Lectures the Easy Way

"GRPO" — 20 results (keyword search)

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Intermediate
Lakshya A Agrawal, Shangyin Tan et al. · Jul 25 · arXiv

GEPA is a new way to improve AI prompts by letting the AI read its own work, reflect in plain language on what went wrong, and then rewrite its instructions.

#GEPA · #reflective prompt evolution · #Pareto frontier

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Intermediate
Xiaoqian Shen, Min-Hung Chen et al. · Dec 16 · arXiv

Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.

#Grounded Video Question Answering · #Temporal Grounding · #Coarse-to-Fine

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Intermediate
Siqi Kou, Jiachun Jin et al. · Jan 15 · arXiv

Most text-to-image models act like word-to-pixel copy machines and miss the hidden meaning in our prompts.

#think-then-generate · #reasoning-aware text-to-image · #LLM encoder

Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

Intermediate
Mizanur Rahman, Mohammed Saidul Islam et al. · Jan 8 · arXiv

This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.

#Text-to-Visualization · #Reinforcement Learning · #GRPO

Puzzle Curriculum GRPO for Vision-Centric Reasoning

Intermediate
Ahmadreza Jeddi, Hakki Can Karaimer et al. · Dec 16 · arXiv

This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.

#vision-language models · #reinforcement learning · #group-relative policy optimization

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al. · Dec 23 · arXiv

This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.

#Diffusion Language Model · #Blockwise dLLM · #Post-Training

Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Intermediate
Xiaotong Ji, Rasul Tutunov et al. · Jan 29 · arXiv

The paper shows a fast, training-free way to boost an LLM's step-by-step reasoning by smartly reusing the model's own probabilities.

#power distribution sampling · #distribution sharpening · #low-temperature sampling
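The kind of "distribution sharpening" this summary describes can be illustrated with a generic power transform over the model's own token probabilities: raising each probability to a power α > 1 and renormalizing concentrates mass on the tokens the model is already most confident about (equivalent to sampling at temperature 1/α). This is a minimal sketch of the general idea, not the paper's specific algorithm; `sharpen` is an illustrative name.

```python
def sharpen(probs, alpha=2.0):
    """Power-sharpen a probability distribution.

    Raises each probability to the power alpha and renormalizes,
    which shifts mass toward the highest-probability entries.
    """
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

# The most likely token gains probability mass after sharpening:
probs = [0.5, 0.3, 0.2]
sharpened = sharpen(probs)  # first entry grows above 0.5
```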

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Intermediate
Zeyuan Liu, Jeonghye Kim et al. · Feb 26 · arXiv

This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.

#EMPO · #memory-augmented agents · #on-policy learning

Self-Hinting Language Models Enhance Reinforcement Learning

Intermediate
Baohao Liao, Hanze Dong et al. · Feb 3 · arXiv

When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.

#reinforcement learning · #GRPO · #self-hinting
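The failure mode this summary describes is easy to see in code: GRPO-style advantages are each rollout's reward relative to its group, so when every rollout in the group earns the same (e.g. zero) reward, all advantages vanish and the policy gradient carries no signal. A minimal sketch, with an illustrative function name (not from the paper):

```python
def group_relative_advantages(rewards):
    """GRPO-style advantages: (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        # Every rollout got the same reward: all advantages are zero,
        # so this group contributes no learning signal at all.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# Sparse-reward group where every rollout failed: no signal.
stalled = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
# Mixed group: nonzero advantages, learning proceeds.
useful = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
```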

ProAct: Agentic Lookahead in Interactive Environments

Intermediate
Yangbin Yu, Mingyu Yang et al. · Feb 5 · arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct · #GLAD · #MC-Critic

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Intermediate
Fanfan Liu, Youyang Yin et al. · Feb 5 · arXiv

The paper discovers that popular RLVR methods for training language and vision-language models secretly prefer certain answer lengths, which can hurt learning.

#LUSPO · #RLVR · #GRPO

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner
Zanlin Ni, Shenzhi Wang et al. · Jan 21 · arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model · #arbitrary order generation · #autoregressive training

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Intermediate
Jing Tan, Zhaoyang Zhang et al. · Jan 5 · arXiv

Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.

#text-guided image editing · #object-level transformation · #reinforcement learning

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Intermediate
Chaoyang Wang, Kaituo Feng et al. · Dec 18 · arXiv

AdaTooler-V teaches an image-and-video AI to first ask, "Do I really need a tool?" before using one, which saves time and boosts accuracy.

#adaptive tool-use · #multimodal chain-of-thought · #visual tool interactions

GameTalk: Training LLMs for Strategic Conversation

Intermediate
Victor Conchello Vendrell, Max Ruiz Luyten et al. · Jan 22 · arXiv

Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation.

#strategic conversation · #reinforcement learning for LLMs · #multi-turn dialogue

Reinforced Attention Learning

Intermediate
Bangzheng Li, Jianmo Ni et al. · Feb 4 · arXiv

This paper teaches AI to pay attention better by training its focus, not just its words.

#Reinforced Attention Learning · #attention policy · #multimodal LLM

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Intermediate
Mingyang Song, Haoyu Sun et al. · Jan 26 · arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren't helping.

#AdaReasoner · #dynamic tool orchestration · #multimodal large language models

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

Intermediate
Zhiyuan Yao, Yi-Kai Zhang et al. · Feb 3 · arXiv

Large language models learn better when we spend more practice time on the right questions at the right moments.

#Reinforcement Learning · #RLVR · #GRPO

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Intermediate
Kishan Panaganti, Zhenwen Liang et al. · Jan 27 · arXiv

LLMs are usually trained by treating every question the same and giving each one the same number of tries, which wastes compute on easy problems and neglects hard ones.

#LLM reasoning · #Reinforcement Learning (RL) · #GRPO

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate
Max Ruiz Luyten, Mihaela van der Schaar · Jan 2 · arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.

#Distributional Creative Reasoning · #diversity energy · #creativity kernel