How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (181)

#GRPO

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Intermediate
Yuan Li, Bo Wang et al. · Mar 5 · arXiv

BandPO is a new training method for large language models that keeps updates safe while letting the model freely explore smart, low-probability ideas.

#BandPO · #PPO clipping · #trust region

Not triaged yet
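The probability-aware bounds themselves are not spelled out in this blurb, but the baseline BandPO builds on — PPO's ratio clipping, named in the tags — can be sketched in a few lines (function and parameter names here are illustrative):

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single token/action.

    The probability ratio r = pi_new / pi_old is clipped to
    [1 - eps, 1 + eps], and the objective takes the pessimistic
    minimum of the clipped and unclipped terms, which bounds how
    far a single update can move the policy.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Per the summary above, BandPO replaces this fixed [1 − ε, 1 + ε] band with bounds that account for a token's probability, so that useful low-probability tokens are not clipped away as aggressively.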

HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

Intermediate
The Viet Bui, Wenjun Li et al. · Mar 5 · arXiv

HiMAP-Travel is a team-based AI planner that splits a long trip into daily chunks so it can follow tough rules like budgets without drifting off course.

#hierarchical planning · #multi-agent systems · #constraint drift

Not triaged yet
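The core idea — carving a long-horizon plan into per-day sub-problems that each respect a hard constraint — can be illustrated with a toy greedy chunker (a sketch of the general pattern, not the paper's planner):

```python
def split_with_budget(costs, daily_budget):
    """Toy hierarchical split: greedily chunk a sequence of activity
    costs into days so that no day exceeds the budget. Solving each
    small chunk separately is what keeps the overall plan from
    drifting away from the constraint."""
    days, current, total = [], [], 0.0
    for cost in costs:
        # Start a new day when adding this activity would bust the budget.
        if current and total + cost > daily_budget:
            days.append(current)
            current, total = [], 0.0
        current.append(cost)
        total += cost
    if current:
        days.append(current)
    return days
```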

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Intermediate
Aradhye Agarwal, Gurdit Siyan et al. · Mar 3 · arXiv

Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.

#MOSAIC · #agentic safety · #plan-check-act

Not triaged yet
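The plan-check-act pattern named in the tags can be sketched as a guard that vets each proposed tool call before execution; `checker` and `executor` are illustrative stand-ins, not the paper's API:

```python
def guarded_step(tool_call, checker, executor):
    """Plan-check-act sketch: run a safety check on each proposed
    tool call and refuse, rather than execute, when the check fails.
    Returning an explicit "refused" outcome lets the agent replan
    instead of silently taking a harmful step."""
    if not checker(tool_call):
        return ("refused", tool_call)
    return ("executed", executor(tool_call))
```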

Specificity-aware reinforcement learning for fine-grained open-world classification

Intermediate
Samuele Angheben, Davide Berasi et al. · Mar 3 · arXiv

This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.

#open-world classification · #fine-grained recognition · #large multimodal models

Not triaged yet

Heterogeneous Agent Collaborative Reinforcement Learning

Intermediate
Zhixia Zhang, Zixuan Huang et al. · Mar 3 · arXiv

This paper introduces HACRL, a way for different kinds of AI agents to learn together during training but still work alone during use.

#HACRL · #HACPO · #heterogeneous agents

Not triaged yet

Recursive Think-Answer Process for LLMs and VLMs

Intermediate
Byung-Kwan Lee, Youngchae Chee et al. · Mar 2 · arXiv

This paper teaches AI models to judge how sure they are about an answer and to think again if they are not sure.

#Recursive Think-Answer · #Confidence-guided reasoning · #Reinforcement learning for LLMs

Not triaged yet
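A confidence-gated loop of this kind can be sketched as follows; `model` is an assumed callable returning an answer plus a self-reported confidence, and all names are illustrative rather than the paper's interface:

```python
def recursive_answer(model, question, threshold=0.8, max_rounds=3):
    """Confidence-gated re-reasoning sketch: answer, judge how sure
    the model is, and think again while confidence stays below the
    threshold (up to a bounded number of rounds)."""
    answer, confidence = model(question, attempt=0)
    for attempt in range(1, max_rounds):
        if confidence >= threshold:
            break  # sure enough: stop re-thinking
        answer, confidence = model(question, attempt=attempt)
    return answer, confidence
```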

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Intermediate
Yixin Nie, Lin Guan et al. · Mar 2 · arXiv

CharacterFlywheel is a step‑by‑step loop that steadily improves chatty AI characters by learning from real conversations on Instagram, WhatsApp, and Messenger.

#CharacterFlywheel · #large language models · #conversational AI

Not triaged yet

Efficient RLVR Training via Weighted Mutual Information Data Selection

Intermediate
Xinyu Zhou, Boyu Zhu et al. · Mar 2 · arXiv

Reinforcement learning (RL) trains language models by letting them try answers and learn from rewards, but training is slow if we pick the wrong practice questions.

#Reinforcement Learning · #RLVR · #Data Selection

Not triaged yet
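The paper's weighted mutual-information criterion isn't given in this blurb. A common proxy for "informative practice questions" in RLVR is the entropy of a prompt's observed pass rate — prompts the model always solves or always fails carry little training signal. A minimal sketch under that assumption (not the paper's actual scoring):

```python
import math

def bernoulli_entropy(p):
    """Entropy (in nats) of a Bernoulli success rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_prompts(pass_rates, k):
    """Rank (name, pass_rate) pairs by pass-rate entropy and keep
    the top-k: near-0 or near-1 pass rates score lowest, so training
    focuses on prompts the model sometimes gets right."""
    ranked = sorted(pass_rates, key=lambda item: bernoulli_entropy(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```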

FireRed-OCR Technical Report

Intermediate
Hao Wu, Haoran Lou et al. · Mar 2 · arXiv

FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.

#FireRed-OCR · #structural hallucination · #document parsing

Not triaged yet

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate
Qiyuan Zhang, Yufei Wang et al. · Mar 2 · arXiv

Longer explanations are not always better; the shape of thinking matters.

#Generative Reward Models · #Chain-of-Thought · #Breadth-CoT

Not triaged yet

When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate
Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it merely help them choose better among answers they already know?

#medical vision-language models · #reinforcement learning · #supervised fine-tuning

Not triaged yet

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Intermediate
Zhenyu Tang, Chaoran Feng et al. · Feb 27 · arXiv

This paper teaches image generators to place objects in the right spots by building a special teacher called a reward model focused on spatial relationships.

#spatial reasoning · #reward modeling · #preference learning

Not triaged yet
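For intuition, a spatial-relationship reward can be approximated by a rule-based check over bounding boxes; the paper's learned reward model would replace this hand-written rule, and all names here are illustrative:

```python
def spatial_reward(boxes, relation):
    """Toy spatial reward: 1.0 if the stated relation holds between
    two axis-aligned boxes (x0, y0, x1, y1), else 0.0. A generated
    image's detected boxes would be scored against the prompt's
    stated relation."""
    subject, rel, obj = relation
    ax0, _, ax1, _ = boxes[subject]
    bx0, _, bx1, _ = boxes[obj]
    if rel == "left of":
        return 1.0 if ax1 <= bx0 else 0.0
    if rel == "right of":
        return 1.0 if ax0 >= bx1 else 0.0
    raise ValueError(f"unsupported relation: {rel}")
```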
