🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers159

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#GRPO

Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition

Intermediate
Jinlong Ma, Yu Zhang et al.Feb 4arXiv

The paper teaches multimodal large language models (MLLMs) to stop guessing from just text or just images and instead check both together before answering.

#GMNER#Multimodal Large Language Models#Modality Bias

Not triaged yet

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Intermediate
Yinyi Luo, Yiqiao Jin et al.Feb 3arXiv

AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.

#multi-agent distillation#process reward model#GRPO

Not triaged yet

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Intermediate
Ian Wu, Yuxiao Qu et al.Feb 3arXiv

Reasoning Cache (RC) is a new way for AI to think in steps: it writes some thoughts, makes a short summary, throws away the long thoughts, and then keeps going using only the summary.

#Reasoning Cache#iterative decoding#summary-conditioned reasoning

Not triaged yet

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration

Intermediate
Bowei He, Minda Hu et al.Feb 3arXiv

This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.

#search-integrated reasoning#reinforcement learning#credit assignment

Not triaged yet

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Intermediate
Changze Lv, Jie Zhou et al.Feb 3arXiv

DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.

#DeepResearch#query-specific rubrics#human preference learning

Not triaged yet

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Intermediate
Huatong Song, Lisheng Huang et al.Feb 3arXiv

SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.

#SWE-Master#software engineering agent#long-horizon SFT

Not triaged yet

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Intermediate
Shumin Wang, Yuexiang Xie et al.Feb 3arXiv

The paper builds a simple, math-light rule to predict whether training makes a language model more open-minded (higher entropy) or more sure of itself (lower entropy).

#reinforcement fine-tuning#entropy dynamics#GRPO

Not triaged yet

Self-Hinting Language Models Enhance Reinforcement Learning

Intermediate
Baohao Liao, Hanze Dong et al.Feb 3arXiv

When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.

#reinforcement learning#GRPO#self-hinting

Not triaged yet

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

Intermediate
Zhiyuan Yao, Yi-Kai Zhang et al.Feb 3arXiv

Large language models learn better when we spend more practice time on the right questions at the right moments.

#Reinforcement Learning#RLVR#GRPO

Not triaged yet

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Intermediate
Xiao Liang, Zhong-Zhi Li et al.Feb 2arXiv

The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.

#divide-and-conquer reasoning#chain-of-thought#reinforcement learning

Not triaged yet

Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Intermediate
Harold Haodong Chen, Xinxiang Yin et al.Feb 2arXiv

LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.

#latent reasoning#text-to-image generation#autoregressive models

Not triaged yet

ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Intermediate
Jie Xiao, Meng Chen et al.Feb 2arXiv

ECHO-2 is a new way to train AI with reinforcement learning that keeps a small, central trainer busy while sending the easy, cheap work (rollouts) to many low-cost computers spread around the world.

#ECHO-2#distributed rollouts#bounded staleness

Not triaged yet

34567