How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (159)

#GRPO

Reinforcement Learning via Self-Distillation

Intermediate
Jonas Hübotter, Frederike Lübeck et al. · Jan 28 · arXiv

The paper teaches large language models to learn from detailed feedback (like error messages) instead of only a simple pass/fail score.

#Self-Distillation · #Reinforcement Learning with Rich Feedback · #SDPO
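A toy illustration of why rich feedback gives a denser training signal than pass/fail (this is not the paper's actual SDPO objective; the partial-credit scheme below is a hypothetical stand-in, using the fraction of unit tests passed as the "detailed feedback"):

```python
def sparse_reward(passed: bool) -> float:
    # pass/fail signal: no gradient toward "almost correct" solutions
    return 1.0 if passed else 0.0

def rich_reward(passed: bool, tests_passed: int, tests_total: int) -> float:
    # hypothetical shaped reward: partial credit derived from detailed
    # test feedback, so near-misses are distinguishable from total failures
    if passed:
        return 1.0
    return 0.5 * tests_passed / tests_total
```

Two failing solutions that pass 1/10 vs 9/10 tests look identical under `sparse_reward` but very different under `rich_reward`.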

Not triaged yet

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Intermediate
Yanqi Dai, Yuxiang Ji et al. · Jan 28 · arXiv

This paper says that to make math-solving AIs smarter, we should train them more on the hardest questions they can almost solve.

#Mathematical reasoning · #RLVR · #GRPO
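One way to read "the hardest questions they can almost solve": weight each question by its empirical solve rate so that sampling peaks near 50%. A minimal sketch (the quadratic weighting function is an assumption for illustration, not the paper's exact scheme):

```python
def difficulty_weight(solve_rate: float) -> float:
    # peaks at solve_rate = 0.5: problems the model can "almost" solve;
    # trivially easy (1.0) and hopeless (0.0) problems get zero weight
    return 4.0 * solve_rate * (1.0 - solve_rate)

def rank_by_training_value(solve_rates: dict) -> list:
    # order questions by estimated training value, most useful first
    return sorted(solve_rates, key=lambda q: -difficulty_weight(solve_rates[q]))
```

Under this weighting, a question solved 50% of the time outranks both one solved 95% of the time and one solved 10% of the time.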

Not triaged yet

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate
Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.

#GUI agent · #UI grounding · #navigation policy
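The screenshot-in, action-out loop can be sketched as below; the action schema (click/type/scroll plus a "done" signal) is a hypothetical stand-in for OmegaUse's real interface:

```python
def run_gui_agent(policy, get_screenshot, execute, max_steps=20):
    # perceive -> decide -> act loop: the policy maps raw pixels to one
    # structured GUI action per step, until it signals completion
    for _ in range(max_steps):
        action = policy(get_screenshot())
        if action["type"] == "done":
            return action
        assert action["type"] in {"click", "type", "scroll"}
        execute(action)
    return {"type": "timeout"}
```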

Not triaged yet

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate
Haoyou Deng, Keyu Yan et al. · Jan 28 · arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO · #flow matching · #GRPO
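GRPO's core trick is a group-relative advantage: score each rollout against the mean of its group. The sketch below (simplified; DenseGRPO's actual per-timestep reward design is more involved) shows the same normalization applied at every denoising step instead of only once at the end:

```python
import statistics

def group_advantages(rewards: list) -> list:
    # GRPO-style: normalize rewards within a group of rollouts
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sd for r in rewards]

def dense_advantages(step_rewards: list) -> list:
    # dense variant: one group-relative advantage per timestep;
    # step_rewards[t][i] is the reward of rollout i at step t
    return [group_advantages(step) for step in step_rewards]
```

With a single terminal score, every step of a rollout inherits one advantage; the dense variant gives each step its own, timely credit.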

Not triaged yet

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate
Jinyang Wu, Shuo Yang et al. · Jan 28 · arXiv

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK · #dynamic branching · #strategic exploration
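"Exploring more only at the most important moments" can be sketched as an uncertainty-dependent branching factor; the mapping below is a hypothetical illustration, not SPARK's actual branching policy:

```python
def branch_count(uncertainty: float, base: int = 1, max_extra: int = 7) -> int:
    # spawn extra rollout branches only where the policy is uncertain,
    # keeping total compute far below branching uniformly everywhere
    u = min(max(uncertainty, 0.0), 1.0)  # clamp to [0, 1]
    return base + round(max_extra * u)
```

A confident step gets a single continuation; a pivotal, uncertain step gets up to `base + max_extra` branches.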

Not triaged yet

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Intermediate
Kishan Panaganti, Zhenwen Liang et al. · Jan 27 · arXiv

LLMs are usually trained by treating every question the same and giving each one the same number of tries, which wastes compute on easy problems and neglects hard ones; this paper applies group distributionally robust optimization so training focuses on the question groups where the model performs worst.

#LLM reasoning · #Reinforcement Learning (RL) · #GRPO
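In the spirit of worst-group focus, a rollout budget can be split by each group's success deficit instead of uniformly per question. The allocation rule below is a hypothetical illustration, not the paper's optimization procedure:

```python
def allocate_rollouts(group_success: dict, budget: int) -> dict:
    # give more attempts to question groups with lower success rates,
    # instead of a uniform number of tries per question (hypothetical rule)
    deficits = {g: 1.0 - s for g, s in group_success.items()}
    total = sum(deficits.values()) or 1.0
    return {g: max(1, round(budget * d / total)) for g, d in deficits.items()}
```

With a 90%-solved easy group and a 30%-solved hard group, most of the budget flows to the hard group while the easy group keeps at least one probe rollout.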

Not triaged yet

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Intermediate
Mingyang Song, Haoyu Sun et al. · Jan 26 · arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.

#AdaReasoner · #dynamic tool orchestration · #multimodal large language models
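"Pick the right tool, use it in the right order, stop when it isn't helping" maps onto a greedy loop like the one below. This is a simplification: AdaReasoner learns the orchestration rather than hard-coding it, and the tools here are assumed to be pure functions on the state:

```python
def orchestrate(tools, score, state, max_steps=5):
    # at each step, try every tool and keep the one that most improves the
    # score; stop as soon as no tool helps (or the step budget runs out)
    for _ in range(max_steps):
        best = max(tools, key=lambda t: score(t(state)))
        if score(best(state)) <= score(state):
            break
        state = best(state)
    return state
```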

Not triaged yet

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Intermediate
James Burgess, Jan N. Hansen et al. · Jan 26 · arXiv

This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer clear, single-answer questions.

#RLVR · #search agents · #PaperSearchQA

Not triaged yet

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

Intermediate
Chenyu Mu, Xin He et al. · Jan 25 · arXiv

This paper teaches AI to turn simple dialogue into full movie scenes by first writing a detailed script and then filming it step by step.

#dialogue-to-video · #cinematic script generation · #ScripterAgent

Not triaged yet

SAMTok: Representing Any Mask with Two Words

Intermediate
Yikang Zhou, Tao Zhang et al. · Jan 22 · arXiv

SAMTok turns any object’s mask in an image into just two special “words” so language models can handle pixels like they handle text.

#SAMTok · #mask tokenizer · #residual vector quantization
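The "two words" come from residual vector quantization: a first codebook captures the mask embedding coarsely, and a second codebook encodes what is left over, yielding two token ids. A minimal sketch with toy codebooks (SAMTok's codebooks are learned, and its embeddings are far higher-dimensional):

```python
def nearest(vec, codebook):
    # index of the code vector closest to vec (squared Euclidean distance)
    dist = lambda c: sum((x - y) ** 2 for x, y in zip(vec, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def two_word_encode(vec, coarse_cb, residual_cb):
    # stage 1: coarse code; stage 2: quantize the leftover residual.
    # the pair (i1, i2) is the mask's "two words"
    i1 = nearest(vec, coarse_cb)
    residual = [x - c for x, c in zip(vec, coarse_cb[i1])]
    i2 = nearest(residual, residual_cb)
    return i1, i2
```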

Not triaged yet

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Intermediate
Matthew Y. R. Yang, Hao Bai et al. · Jan 20 · arXiv

The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.

#Intervention Training · #credit assignment · #LLM reasoning
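Credit assignment here means locating the first wrong step in a reasoning trace. Given a step-level verifier, that is a linear scan; note the paper's interventions are model-proposed corrections, so the external `verify` callback below is a simplified stand-in:

```python
def first_wrong_step(steps, verify):
    # return the index of the earliest step the verifier rejects,
    # or -1 if the whole trace checks out
    for i, step in enumerate(steps):
        if not verify(step):
            return i
    return -1
```

Once the first wrong step is located, a short, targeted correction can be applied there instead of penalizing the whole trace.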

Not triaged yet

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Intermediate
Shengda Fan, Xuyan Ye et al. · Jan 20 · arXiv

DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.

#DARC · #self-play · #curriculum learning

Not triaged yet
