How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (131)


Computer-Using World Model

Intermediate
Yiming Guan, Rui Yu et al. · Feb 19 · arXiv

The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (like Word/Excel/PowerPoint) will look like after a click or keystroke—before doing it for real.

#world model · #GUI agent · #desktop automation

Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

Intermediate
Wenxuan Ding, Nicholas Tomlin et al. · Feb 18 · arXiv

This paper teaches AI agents to make smart choices about when to explore for more information and when to act right away.

#Calibrate-Then-Act · #cost-aware exploration · #LLM agents

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Intermediate
Sen Ye, Mengde Xu et al. · Feb 17 · arXiv

Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.

#multimodal models · #image generation · #reasoning

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Intermediate
Haiyang Xu, Xi Zhang et al. · Feb 15 · arXiv

This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.

#GUI agent · #visual grounding · #reinforcement learning

Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

Intermediate
Xiaohan He, Shiyang Feng et al. · Feb 12 · arXiv

Sci-CoE is a two-stage training method that helps one language model learn to both solve science problems and check those solutions with very little labeled data.

#scientific reasoning · #co-evolution · #solver-verifier

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Intermediate
Pinyi Zhang, Ting-En Lin et al. · Feb 12 · arXiv

This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.

#personalized reward modeling · #generative reward model · #evaluation chain

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Intermediate
Yicheng Chen, Zerun Ma et al. · Feb 11 · arXiv

DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.

#data recipe · #data processing pipeline · #reinforcement learning

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Intermediate
Leheng Sheng, Yongtao Zhang et al. · Feb 11 · arXiv

Long texts overwhelm many language models, which forget important bits and slow down as the context grows. This paper adds a gated recurrent memory whose update gate decides what to remember and whose exit gate decides when to stop reading.

#gated recurrent memory · #update gate · #exit gate
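The update-gate/exit-gate idea can be pictured with a toy NumPy sketch. Everything here (function names, weight shapes, the 0.9 stop threshold) is illustrative, not the paper's actual architecture: an update gate blends each new chunk into a running memory, and an exit gate emits a stop signal so the model can quit reading once the memory is "full enough."

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_memory_step(memory, chunk, W_update, W_exit):
    """One step of a hypothetical gated recurrent memory.

    The update gate decides, per dimension, how much of the new
    chunk to write into the running memory; the exit gate produces
    a scalar probability of stopping. Names and shapes are
    illustrative, not taken from the paper.
    """
    combined = np.concatenate([memory, chunk])
    u = sigmoid(W_update @ combined)            # update gate in (0, 1)
    new_memory = (1 - u) * memory + u * chunk   # blend old memory with chunk
    exit_prob = sigmoid(W_exit @ combined).mean()  # scalar stop signal
    return new_memory, exit_prob

# Toy usage: process context chunks until the exit gate fires.
rng = np.random.default_rng(0)
d = 4
W_u = rng.normal(size=(d, 2 * d))
W_e = rng.normal(size=(d, 2 * d))
memory = np.zeros(d)
for chunk in rng.normal(size=(10, d)):
    memory, p_exit = gated_memory_step(memory, chunk, W_u, W_e)
    if p_exit > 0.9:  # hypothetical stop threshold
        break
```

The appeal of the exit gate in this sketch is that compute scales with how much the memory still needs, not with raw context length.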

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate
Tianyi Wu, Mingzhe Du et al. · Feb 7 · arXiv

This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.

#secure code generation · #reinforcement learning · #vulnerability reward model

POINTS-GUI-G: GUI-Grounding Journey

Intermediate
Zhongyin Zhao, Yuan Liu et al. · Feb 6 · arXiv

This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.

#GUI grounding · #reinforcement learning · #verifiable rewards

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Intermediate
Dongyang Chen, Chaoyang Wang et al. · Feb 5 · arXiv

V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.

#V-Retrver · #multimodal retrieval · #agentic reasoning

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Intermediate
Haozhen Zhang, Haodong Yue et al. · Feb 5 · arXiv

BudgetMem is a way for AI helpers to build and use memory on the fly, picking how much thinking to spend so answers are both good and affordable.

#runtime memory extraction · #budget-tier routing · #reinforcement learning
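Budget-tier routing can be sketched with a toy heuristic: given a query and a cost budget, pick among memory tiers of increasing cost, spending more only on harder queries. The tier names, costs, and the word-count difficulty proxy below are all hypothetical stand-ins; the paper's router is a learned policy trained with reinforcement learning, not this rule.

```python
def route_query(query: str, budget: float) -> str:
    """Pick a memory-processing tier for a query under a cost budget.

    Hypothetical heuristic router: tiers are ordered cheap to
    expensive, difficulty is crudely proxied by query length, and
    we pick a tier that is both affordable and proportionate.
    """
    tiers = [                      # (name, cost), cheap -> expensive
        ("cache_lookup", 0.1),
        ("summary_memory", 0.5),
        ("full_retrieval", 2.0),
    ]
    difficulty = min(len(query.split()) / 20.0, 1.0)  # crude proxy in [0, 1]
    affordable = [name for name, cost in tiers if cost <= budget]
    if not affordable:
        return tiers[0][0]  # fall back to the cheapest tier
    # Spend more on harder queries: index into affordable tiers by difficulty.
    idx = min(int(difficulty * len(affordable)), len(affordable) - 1)
    return affordable[idx]
```

So a short query gets the cheap tier even with a generous budget, while a long query under a tight budget is forced down to whatever is affordable, which is the "good and affordable" trade-off the summary describes.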