🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers134

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#GRPO

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate
Jinyang Wu, Shuo Yang et al.Jan 28arXiv

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK#dynamic branching#strategic exploration

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Intermediate
Kishan Panaganti, Zhenwen Liang et al.Jan 27arXiv

LLMs are usually trained by treating every question the same and giving each one the same number of tries, which wastes compute on easy problems and neglects hard ones.

#LLM reasoning#Reinforcement Learning (RL)#GRPO

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Intermediate
Mingyang Song, Haoyu Sun et al.Jan 26arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.

#AdaReasoner#dynamic tool orchestration#multimodal large language models

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner
Zhihan Liu, Lin Guan et al.Jan 26arXiv

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization#state information richness#planning complexity

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Intermediate
James Burgess, Jan N. Hansen et al.Jan 26arXiv

This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer clear, single-answer questions.

#RLVR#search agents#PaperSearchQA

Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

Beginner
Kunat Pipatanakul, Pittawat TaveekitworachaiJan 26arXiv

Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.

#Typhoon-S#on-policy distillation#full-logits distillation

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

Intermediate
Chenyu Mu, Xin He et al.Jan 25arXiv

This paper teaches AI to turn simple dialogue into full movie scenes by first writing a detailed script and then filming it step by step.

#dialogue-to-video#cinematic script generation#ScripterAgent

SAMTok: Representing Any Mask with Two Words

Intermediate
Yikang Zhou, Tao Zhang et al.Jan 22arXiv

SAMTok turns any object’s mask in an image into just two special “words” so language models can handle pixels like they handle text.

#SAMTok#mask tokenizer#residual vector quantization

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner
Zhitao He, Zongwei Lyu et al.Jan 22arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal#theory of mind#strategic persuasion

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Beginner
Zhiwei Zhang, Fei Zhao et al.Jan 22arXiv

Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.

#FISSION-GRPO#error recovery#tool use

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner
Zanlin Ni, Shenzhi Wang et al.Jan 21arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model#arbitrary order generation#autoregressive training

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Intermediate
Matthew Y. R. Yang, Hao Bai et al.Jan 20arXiv

The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.

#Intervention Training#credit assignment#LLM reasoning
23456