How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (134)


Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

Intermediate
Mizanur Rahman, Mohammed Saidul Islam et al. · Jan 8 · arXiv

This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.

#Text-to-Visualization · #Reinforcement Learning · #GRPO
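Several papers on this page tag GRPO (Group Relative Policy Optimization). As a refresher, here is a minimal sketch of its core step: each sampled response is scored against its own group's mean reward instead of a learned value critic. The function name and numbers are illustrative, not from this paper.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO-style RL: normalize each
    sampled response's reward against the mean/std of its own group,
    so no learned value critic is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 responses sampled for one prompt, scored by a reward model.
# Above-mean responses get positive advantage, below-mean negative.
print(grpo_advantages([0.1, 0.7, 0.4, 0.9]))
```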

ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

Beginner
Hengjia Li, Liming Jiang et al. · Jan 6 · arXiv

ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.

#reasoning-centric image editing · #reinforcement learning · #chain-of-thought

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

Beginner
Yiyuan Li, Zhen Huang et al. · Jan 6 · arXiv

This paper shows that training a language model with reinforcement learning on just one carefully designed example can boost reasoning across many school subjects, not just math.

#polymath learning · #one-shot reinforcement learning · #GRPO

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Intermediate
Jing Tan, Zhaoyang Zhang et al. · Jan 5 · arXiv

Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.

#text-guided image editing · #object-level transformation · #reinforcement learning

Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Beginner
Falcon LLM Team, Iheb Chaabane et al. · Jan 5 · arXiv

Falcon-H1R is a small (7B) AI model that reasons well without needing giant computers.

#Falcon-H1R · #Hybrid Transformer-Mamba · #Chain-of-Thought

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Intermediate
Shikun Sun, Liao Qu et al. · Jan 5 · arXiv

Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.

#Visual Autoregressive (VAR) · #Reinforcement Learning · #GRPO

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Intermediate
Zhuofan Shi, Hubao A et al. · Jan 5 · arXiv

MDAgent2 is a special helper built from large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.

#Molecular Dynamics · #LAMMPS · #Code Generation
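For a feel of what "runnable LAMMPS simulation code" means here, below is a minimal Lennard-Jones melt input deck, written out from Python. The deck is adapted from the stock LAMMPS examples, not taken from MDAgent2's actual output.

```python
# A standard Lennard-Jones melt input deck (adapted from the LAMMPS
# example scripts); the kind of runnable input MDAgent2 aims to generate.
lammps_input = """\
units        lj
atom_style   atomic
lattice      fcc 0.8442
region       box block 0 10 0 10 0 10
create_box   1 box
create_atoms 1 box
mass         1 1.0
velocity     all create 1.44 87287
pair_style   lj/cut 2.5
pair_coeff   1 1 1.0 1.0 2.5
fix          1 all nve
thermo       100
run          1000
"""

with open("in.melt", "w") as f:  # run with: lmp -in in.melt
    f.write(lammps_input)
```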

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate
Max Ruiz Luyten, Mihaela van der Schaar · Jan 2 · arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.

#Distributional Creative Reasoning · #diversity energy · #creativity kernel

CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate
Ahmad Rezaei, Mohsen Gholami et al. · Jan 1 · arXiv

CPPO is a new way to fine-tune vision-language models so they see pictures more accurately before they start to reason.

#CPPO · #Contrastive Perception Loss · #Vision-Language Models
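The "Contrastive Perception Loss" tag suggests an InfoNCE-style objective that pulls matched image and text embeddings together and pushes mismatched pairs apart. The sketch below shows that generic objective under this assumption; it is not CPPO's actual loss.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Generic contrastive (InfoNCE) loss over a batch of image/text
    embedding pairs: matched pairs sit on the diagonal of the
    similarity matrix and are treated as the correct class."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature                  # (B, B) similarities
    labels = torch.arange(img.size(0), device=img.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2     # symmetric in both directions
```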

Scaling Open-Ended Reasoning to Predict the Future

Intermediate
Nikhil Chandak, Shashwat Goel et al. · Dec 31 · arXiv

The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.

#open-ended forecasting · #calibrated prediction · #Brier score
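The Brier score tagged here is a standard calibration metric: the mean squared error between forecast probabilities and the 0/1 outcomes that actually happened. A direct implementation:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1
    outcomes; 0 is perfect, 0.25 is an uninformative 0.5 forecast."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A forecaster says 0.9, 0.2, 0.6; the events resolve yes, no, yes.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))  # = 0.07
```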

Figure It Out: Improve the Frontier of Reasoning with Executable Visual States

Intermediate
Meiqi Chen, Fandong Meng et al. · Dec 30 · arXiv

FIGR is a new way for AI to ‘think by drawing,’ using code to build clean, editable diagrams while it reasons.

#executable visual states · #diagrammatic reasoning · #reinforcement learning for reasoning
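To make "thinking by drawing with code" concrete, here is a toy example of an executable visual state: a diagram produced by a small program, so it can be inspected, edited, and regenerated mid-reasoning. The scenario and names are illustrative, not from the paper.

```python
import matplotlib.pyplot as plt

def draw_state(points, path="state.png"):
    """Toy 'executable visual state': render the reasoning-relevant
    objects programmatically, so an edit to the code regenerates a
    clean diagram instead of redrawing pixels."""
    fig, ax = plt.subplots()
    for name, (x, y) in points.items():
        ax.scatter(x, y)
        ax.annotate(name, (x, y))
    ax.set_title("intermediate diagram the model can inspect and revise")
    fig.savefig(path)
    plt.close(fig)

# e.g., to reason about the distance AB = 5 from the plotted coordinates.
draw_state({"A": (0, 0), "B": (3, 4)})
```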

GARDO: Reinforcing Diffusion Models without Reward Hacking

Intermediate
Haoran He, Yuxiao Ye et al. · Dec 30 · arXiv

GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.

#GARDO · #reward hacking · #gated KL regularization
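The "gated KL regularization" tag suggests a KL penalty toward a reference model that switches on only when a sample looks like it may be gaming the reward. A speculative sketch under that reading; GARDO's actual gating rule may differ.

```python
import torch

def gated_kl_shaped_reward(logp_policy, logp_ref, reward,
                           gate_threshold=0.8, beta=0.1):
    """One plausible reading of 'gated KL': keep the policy close to a
    reference model ONLY on samples whose reward is suspiciously high
    (a common reward-hacking symptom). Not necessarily GARDO's rule."""
    kl = logp_policy - logp_ref                  # per-sample KL estimate
    gate = (reward > gate_threshold).float()     # 1 where hacking is suspected
    return reward - beta * gate * kl             # penalized reward for the RL update
```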