Papers181

#GRPO

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

This paper shows that training a language model with reinforcement learning on just one super well-designed example can boost reasoning across many school subjects, not just math.

#polymath learning#one-shot reinforcement learning#GRPO

Not triaged yet

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Intermediate

Jing Tan, Zhaoyang Zhang et al.Jan 5arXiv

Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.

#text-guided image editing#object-level transformation#reinforcement learning

Not triaged yet

Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Beginner

Falcon LLM Team, Iheb Chaabane et al.Jan 5arXiv

Falcon-H1R is a small (7B) AI model that thinks really well without needing giant computers.

#Falcon-H1R#Hybrid Transformer-Mamba#Chain-of-Thought

Not triaged yet

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Intermediate

Shikun Sun, Liao Qu et al.Jan 5arXiv

Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.

#Visual Autoregressive (VAR)#Reinforcement Learning#GRPO

Not triaged yet

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Intermediate

Zhuofan Shi, Hubao A et al.Jan 5arXiv

MDAgent2 is a special helper built from large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.

#Molecular Dynamics#LAMMPS#Code Generation

Not triaged yet

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate

Max Ruiz Luyten, Mihaela van der SchaarJan 2arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.

#Distributional Creative Reasoning#diversity energy#creativity kernel

Not triaged yet

CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate

Ahmad Rezaei, Mohsen Gholami et al.Jan 1arXiv

CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.

#CPPO#Contrastive Perception Loss#Vision-Language Models

Not triaged yet

Scaling Open-Ended Reasoning to Predict the Future

Intermediate

Nikhil Chandak, Shashwat Goel et al.Dec 31arXiv

The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.

#open-ended forecasting#calibrated prediction#Brier score

Not triaged yet

Figure It Out: Improve the Frontier of Reasoning with Executable Visual States

Intermediate

Meiqi Chen, Fandong Meng et al.Dec 30arXiv

FIGR is a new way for AI to ‘think by drawing,’ using code to build clean, editable diagrams while it reasons.

#executable visual states#diagrammatic reasoning#reinforcement learning for reasoning

Not triaged yet

GARDO: Reinforcing Diffusion Models without Reward Hacking

Intermediate

Haoran He, Yuxiao Ye et al.Dec 30arXiv

GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.

#GARDO#reward hacking#gated KL regularization

Not triaged yet

Training AI Co-Scientists Using Rubric Rewards

Intermediate

Shashwat Goel, Rishi Hazra et al.Dec 29arXiv

The paper teaches AI to write strong research plans by letting it grade its own work using checklists (rubrics) pulled from real scientific papers.

#AI co-scientist#research plan generation#rubric rewards

Not triaged yet

ProGuard: Towards Proactive Multimodal Safeguard

Intermediate

Shaohan Yu, Lijun Li et al.Dec 29arXiv

ProGuard is a safety guard for text and images that doesn’t just spot known problems—it can also recognize and name new, never-seen-before risks.

#proactive safety#multimodal moderation#out-of-distribution detection

Not triaged yet

9 10 11 12 13