How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)


SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Intermediate
Huatong Song, Lisheng Huang et al. · Feb 3 · arXiv

SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests (a sketch of the long-horizon SFT idea follows the tags).

#SWE-Master · #software engineering agent · #long-horizon SFT
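Long-horizon SFT here presumably means supervised fine-tuning on whole multi-step episodes rather than single question-answer pairs. Below is a minimal sketch under an assumed trajectory schema (the `issue`, `steps`, `action`, `observation`, and `patch` fields are illustrative, not the paper's actual data format):

```python
def trajectory_to_example(trajectory: dict) -> str:
    """Flatten one multi-step repo-fixing episode into a single
    supervised training sequence."""
    parts = [f"ISSUE:\n{trajectory['issue']}"]
    for step in trajectory["steps"]:
        parts.append(f"ACTION:\n{step['action']}")            # e.g. edit a file, run tests
        parts.append(f"OBSERVATION:\n{step['observation']}")  # e.g. test output
    parts.append(f"FINAL PATCH:\n{trajectory['patch']}")
    return "\n\n".join(parts)
```

Typically only the agent's own tokens (the actions and the final patch) would be scored by the loss, with observations kept as context.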

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Beginner
Jinyang Wu, Changpeng Yang et al. · Jan 30 · arXiv

Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were; Sweet Spot Learning instead grades attempts with tiered rewards (see the sketch after the tags).

#Sweet Spot Learning · #tiered rewards · #reinforcement learning with verifiable rewards
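A minimal sketch of the contrast the blurb describes: a flat pass/fail reward versus tiered partial credit. The tier boundaries and scores below are illustrative assumptions, not the paper's actual scheme.

```python
def binary_reward(tests_passed: int, tests_total: int) -> float:
    # All-or-nothing: a near-miss scores the same as a wild guess.
    return 1.0 if tests_passed == tests_total else 0.0

def tiered_reward(tests_passed: int, tests_total: int) -> float:
    # Differentiated guidance: partial progress earns partial credit.
    ratio = tests_passed / tests_total
    if ratio == 1.0:
        return 1.0   # fully solved
    if ratio >= 0.5:
        return 0.5   # close attempt
    if ratio > 0.0:
        return 0.2   # some progress
    return 0.0       # no progress
```

The tiered signal lets the optimizer distinguish a near-solution from a non-starter, which a binary reward cannot.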

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate
Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI agent that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll, much like a careful human user (the action loop is sketched after the tags).

#GUI agent · #UI grounding · #navigation policy
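A minimal sketch of the screenshot-in, action-out loop the blurb describes. The `Action` type and the `model`/`device` interfaces are illustrative assumptions, not the OmegaUse API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "click" | "type" | "scroll" | "done"
    x: int = 0       # screen coordinates for clicks (UI grounding)
    y: int = 0
    text: str = ""   # text to type

def run_episode(model, device, task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = device.screenshot()          # raw pixels only, no DOM or accessibility tree
        action = model.predict(task, screenshot)  # grounding: pixels -> concrete Action
        if action.kind == "done":
            break
        device.execute(action)                    # click/type/scroll on the real device
```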

GameTalk: Training LLMs for Strategic Conversation

Intermediate
Victor Conchello Vendrell, Max Ruiz Luyten et al. · Jan 22 · arXiv

Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation (a conversation-level reward sketch follows the tags).

#strategic conversation · #reinforcement learning for LLMs · #multi-turn dialogue
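A minimal sketch of the difference the blurb points at: scoring each message in isolation versus assigning one reward to the whole dialogue and spreading it across the model's turns. The discounting scheme is an illustrative assumption, not the paper's method.

```python
def per_message_rewards(turns, judge):
    # Each model turn is scored on its own, ignoring the bigger plan.
    return [judge(t) for t in turns]

def conversation_level_rewards(turns, outcome_reward: float, gamma: float = 0.95):
    # One reward for the final outcome, discounted backwards so earlier
    # turns get credit for setting up a good ending.
    n = len(turns)
    return [outcome_reward * gamma ** (n - 1 - i) for i in range(n)]
```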

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Intermediate
Jiuzhou Zhao, Chunrong Chen et al. · Jan 8 · arXiv

Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question (a toy router sketch follows the tags).

#multi-agent systems · #routing · #reasoning chain
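A toy sketch of the routing decision: score each expert agent against the incoming question and consult only the best match. The keyword heuristic is an illustrative assumption, not the TCAndon-Router algorithm.

```python
AGENTS = {
    "coder":  {"bug", "code", "compile", "stacktrace"},
    "math":   {"prove", "integral", "probability"},
    "search": {"latest", "news", "who", "when"},
}

def route(query: str, top_k: int = 1) -> list[str]:
    words = set(query.lower().split())
    # Score each agent by keyword overlap with the query.
    scores = {name: len(words & keywords) for name, keywords in AGENTS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(route("why does this code fail to compile"))  # ['coder']
```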

Diversity or Precision? A Deep Dive into Next Token Prediction

Intermediate
Haoyuan Wu, Hai Wang et al. · Dec 28 · arXiv

The paper shows that teaching a language model with a special “reward-shaped” next-token objective can make later reinforcement learning (RL) work much better (a minimal version of the objective is sketched after the tags).

#next-token prediction · #cross-entropy as policy gradient · #reward shaping
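The “cross-entropy as policy gradient” tag hints at the mechanics: plain next-token cross-entropy is what you get when every dataset token carries reward 1, so shaping amounts to per-token weights on the loss. A minimal PyTorch sketch, with an assumed weighting interface (the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def shaped_nll(logits, targets, token_rewards):
    # logits: (batch, seq, vocab); targets, token_rewards: (batch, seq)
    nll = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none"
    )                                    # per-token negative log-likelihood
    return (token_rewards * nll).mean()  # reward-shaped objective

# token_rewards = torch.ones_like(targets, dtype=torch.float)  # recovers plain CE
```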