How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (138)

#GRPO

DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

Intermediate
Zhenyang Cai, Jiaming Zhang et al. · Dec 12 · arXiv

DentalGPT is a multimodal AI model that looks at dental images and text together and explains what it sees the way a junior dentist would.

#DentalGPT #multimodal large language model #dentistry AI

Rethinking Expert Trajectory Utilization in LLM Post-training

Intermediate
Bowen Ding, Yuhan Chen et al. · Dec 12 · arXiv

The paper asks how to best use expert step-by-step solutions (expert trajectories) when teaching big AI models to reason after pretraining.

#Supervised Fine-Tuning #Reinforcement Learning #Expert Trajectories

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Intermediate
Yiwen Tang, Zoey Guo et al. · Dec 11 · arXiv

This paper asks whether reinforcement learning (RL) can improve text-to-3D generation and shows that the answer is yes, provided the training and rewards are designed carefully.

#Reinforcement Learning #Text-to-3D Generation #Hi-GRPO

MOA: Multi-Objective Alignment for Role-Playing Agents

Intermediate
Chonghua Liao, Ke Wang et al. · Dec 10 · arXiv

Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.

#multi-objective alignment #role-playing agents #reinforcement learning

Rethinking Chain-of-Thought Reasoning for Videos

Intermediate
Yiwu Zhong, Zi-Yuan Hu et al. · Dec 10 · arXiv

The paper shows that video AIs do not need long, human-like chains of thought to reason well.

#video reasoning #chain-of-thought #concise reasoning

Learning Unmasking Policies for Diffusion Language Models

Intermediate
Metod Jazbec, Theo X. Olausson et al. · Dec 9 · arXiv

Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.

#diffusion language models #masked diffusion #unmasking policy

Thinking with Images via Self-Calling Agent

Intermediate
Wenxi Yang, Yuzhong Zhao et al. · Dec 9 · arXiv

This paper teaches a vision-language model to think about images by talking to copies of itself, using only words to plan and decide.

#Self-Calling Chain-of-Thought #sCoT #interleaved multimodal chain-of-thought

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Intermediate
Zheng Ding, Weirui Ye · Dec 9 · arXiv

TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.

#TreeGRPO #reinforcement learning #diffusion models

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Beginner
Charlie Zhang, Graham Neubig et al. · Dec 8 · arXiv

The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.

#edge of competence #process-verified evaluation #process-level rewards

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Intermediate
Ming Chen, Sheng Tang et al. · Dec 6 · arXiv

The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.

#decoding-based regression #sequence-level reward #reinforcement learning

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al. · Dec 4 · arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing #EtCon #TPSFT

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Beginner
Zefeng Zhang, Xiangzhao Hao et al. · Dec 4 · arXiv

COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.

#COOPER #multimodal large language model #unified model