How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (130)

Active filter: #reinforcement learning

Rethinking Chain-of-Thought Reasoning for Videos

Intermediate
Yiwu Zhong, Zi-Yuan Hu et al. · Dec 10 · arXiv

The paper shows that video AIs do not need long, human-like chains of thought to reason well.

#video reasoning · #chain-of-thought · #concise reasoning

Learning Unmasking Policies for Diffusion Language Models

Intermediate
Metod Jazbec, Theo X. Olausson et al. · Dec 9 · arXiv

Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.

#diffusion language models · #masked diffusion · #unmasking policy
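
A minimal sketch of that decode loop, assuming a hypothetical score_masked_positions() in place of a real model call; the policy here is simply "reveal the slots the model is most confident about," which stands in for the learned policy the paper studies.

```python
# Toy masked-diffusion decode loop: everything starts masked, and each step an
# unmasking policy picks which blanks to reveal next.
import random

MASK = "<mask>"

def score_masked_positions(tokens):
    # Placeholder for a real model call; returns (position, confidence, best_token).
    return [(i, random.random(), f"tok{i}") for i, t in enumerate(tokens) if t == MASK]

def decode(seq_len=8, reveal_per_step=2):
    tokens = [MASK] * seq_len
    while MASK in tokens:
        scores = score_masked_positions(tokens)
        # Policy (assumption): reveal the highest-confidence positions first.
        scores.sort(key=lambda s: s[1], reverse=True)
        for pos, _, best_token in scores[:reveal_per_step]:
            tokens[pos] = best_token
    return tokens

print(decode())
```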

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Intermediate
Zheng Ding, Weirui Ye · Dec 9 · arXiv

TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.

#TreeGRPO · #reinforcement learning · #diffusion models
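
A hedged sketch of the branching idea: score the leaves of a small sampling tree once, then give every branch a GRPO-style advantage relative to its siblings, so one run yields many learning signals. The tree layout and advantage formula below are illustrative assumptions, not the paper's exact algorithm.

```python
# Each internal node's value is the mean of its children; each child's advantage
# is its value minus the sibling average (a group-relative baseline).
from statistics import mean

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_reward = {"a1": 0.9, "a2": 0.4, "b1": 0.2, "b2": 0.3}   # made-up rewards

def node_value(node):
    if node in leaf_reward:
        return leaf_reward[node]
    return mean(node_value(c) for c in tree[node])

def sibling_advantages(parent):
    values = {c: node_value(c) for c in tree[parent]}
    baseline = mean(values.values())
    return {c: v - baseline for c, v in values.items()}

for parent in tree:
    print(parent, sibling_advantages(parent))   # one signal per branch, not per rollout
```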

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Intermediate
Ming Chen, Sheng Tang et al. · Dec 6 · arXiv

The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.

#decoding-based regression · #sequence-level reward · #reinforcement learning
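
A toy illustration of the difference, with made-up numbers: per-digit grading versus decoding the whole number first and rewarding how close it lands to the target.

```python
# Token-level supervision grades each digit independently; a sequence-level
# reward decodes the full prediction and grades the resulting number once.
target = 123.0
predicted_digits = ["1", "2", "9"]   # the model wrote "129"

# Token-level view: per-digit correctness, blind to how large the error really is.
per_digit_score = [p == t for p, t in zip(predicted_digits, "123")]

# Sequence-level view: decode, then reward closeness of the whole number.
predicted_value = float("".join(predicted_digits))
sequence_reward = -abs(predicted_value - target)

print(per_digit_score)    # [True, True, False]
print(sequence_reward)    # -6.0
```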

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate
Hongyu Li, Manyuan Zhang et al. · Dec 5 · arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing · #iterative reasoning · #multimodal large language model
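
A hedged sketch of that think / edit / check loop; the editor, critic, and instruction rewriter below are placeholder stubs, not the paper's models.

```python
# Iterative editing loop: edit, critique the result, rewrite the instruction,
# and try again until the critic is satisfied or the round budget runs out.
def edit_image(image, instruction):
    return f"{image} edited with '{instruction}'"

def critique(result, goal):
    # Placeholder critic: pretend the result is good once the instruction got specific.
    return "specific" in result, "make the instruction more specific"

def rewrite_instruction(instruction, feedback):
    return instruction + " (specific)"

def iterative_edit(original, instruction, goal, max_rounds=3):
    result = edit_image(original, instruction)
    for _ in range(max_rounds):
        ok, feedback = critique(result, goal)
        if ok:
            break
        instruction = rewrite_instruction(instruction, feedback)
        result = edit_image(original, instruction)
    return result

print(iterative_edit("photo.png", "add a red hat", goal="person wearing a red hat"))
```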

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al. · Dec 5 · arXiv

Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.

#reinforcement learning · #PPO-clip · #KL penalty
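
For context, the standard PPO-clip surrogate that keeps each token's update near the old policy, plus a hedged illustration of the kind of global entropy-ratio check the title suggests; the band-check form here is an assumption for illustration, not the paper's exact constraint.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)              # per-token importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()        # standard clipped surrogate

def entropy_ratio_in_band(entropy_new, entropy_old, low=0.8, high=1.2):
    # Soft global signal (assumed form): is overall policy entropy drifting too fast?
    return low <= (entropy_new / entropy_old) <= high

logp_new = torch.tensor([-1.0, -0.5, -2.0])
logp_old = torch.tensor([-1.2, -0.7, -1.5])
adv = torch.tensor([1.0, -0.5, 0.3])
print(ppo_clip_loss(logp_new, logp_old, adv))
print(entropy_ratio_in_band(entropy_new=2.1, entropy_old=2.0))
```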

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al. · Dec 4 · arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan: edit the new fact in, then consolidate it so it sticks.

#knowledge editing · #EtCon · #TPSFT

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Beginner
Zefeng Zhang, Xiangzhao Hao et al. · Dec 4 · arXiv

COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.

#COOPER · #multimodal large language model · #unified model

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Intermediate
Bowen Ping, Chengyou Jia et al. · Dec 2 · arXiv

This paper teaches image models to keep things consistent across multiple pictures—like the same character, art style, and story logic—using reinforcement learning (RL).

#consistent image generation · #pairwise reward modeling · #reinforcement learning
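
A minimal sketch of the pairwise idea: score a pair of generated frames for consistency and use that score as the RL reward. consistency_score() is a placeholder, not the paper's trained reward model.

```python
# Pairwise reward: judge two images from the same story together, so the reward
# reflects consistency (same character, same style) rather than single-image quality.
def consistency_score(image_a, image_b):
    shared = set(image_a["traits"]) & set(image_b["traits"])
    return len(shared) / max(len(image_a["traits"]), 1)

frame1 = {"traits": ["red hat", "freckles", "watercolor"]}
frame2 = {"traits": ["red hat", "freckles", "oil paint"]}

reward = consistency_score(frame1, frame2)   # higher = more consistent pair
print(reward)                                # ~0.67
```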

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Intermediate
Junyan Ye, Leiqi Zhu et al. · Nov 29 · arXiv

RealGen is a new way to make computer-made pictures look so real that they can fool expert detectors and even careful judges.

#photorealistic text-to-image · #detector-guided rewards · #reinforcement learning
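
A minimal sketch of a detector-guided reward, assuming a placeholder detector_fake_prob(): the generator is rewarded when the detector believes its image is real.

```python
# Detector-guided reward: a fake-image detector outputs the probability that an
# image is synthetic, and the generator earns more reward the lower that gets.
def detector_fake_prob(image):
    return image.get("fake_score", 0.5)       # stand-in for a real detector call

def realism_reward(image):
    return 1.0 - detector_fake_prob(image)    # high reward = detector fooled

samples = [{"fake_score": 0.9}, {"fake_score": 0.1}]
print([realism_reward(s) for s in samples])   # ~0.1 (caught), ~0.9 (fooled)
```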