How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (33)

#Reinforcement Learning

Reinforcement Learning for Self-Improving Agent with Skill Library

Intermediate
Jiongxiao Wang, Qiaojing Yan et al. · Dec 18 · arXiv

This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.

#Reinforcement Learning #Skill Library #Sequential Rollout

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Intermediate
Bingxiang He, Zekai Qu et al. · Dec 18 · arXiv

JustRL shows that a tiny, steady recipe for reinforcement learning (RL) can make a 1.5B-parameter language model much better at math without fancy tricks.

#Reinforcement Learning #GRPO #Policy Entropy

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Intermediate
Xiaoqian Shen, Min-Hung Chen et al. · Dec 16 · arXiv

Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.

#Grounded Video Question Answering #Temporal Grounding #Coarse-to-Fine

Differentiable Evolutionary Reinforcement Learning

Intermediate
Sitao Cheng, Tianle Li et al. · Dec 15 · arXiv

This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.

#Differentiable Evolutionary Reinforcement Learning #Meta-Optimizer #Meta-Reward

Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

Intermediate
Juil Koo, Daehyeon Choi et al. · Dec 15 · arXiv

This paper teaches robots to move their camera to a better spot before answering a question about what they see.

#Active Perception #Embodied AI #Vision-Language Models

Rethinking Expert Trajectory Utilization in LLM Post-training

Intermediate
Bowen Ding, Yuhan Chen et al. · Dec 12 · arXiv

The paper asks how best to use expert step-by-step solutions (expert trajectories) when post-training large AI models to reason.

#Supervised Fine-Tuning #Reinforcement Learning #Expert Trajectories

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Intermediate
Yiwen Tang, Zoey Guo et al. · Dec 11 · arXiv

This paper asks whether reinforcement learning (RL) can improve the generation of 3D models from text, and shows that it can if the training and rewards are designed carefully.

#Reinforcement Learning #Text-to-3D Generation #Hi-GRPO

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

Intermediate
Salman Rahman, Sruthi Gorantla et al. · Dec 2 · arXiv

SPARK teaches AI to grade its own steps without needing the right answers written down anywhere.

#SPARK #Process Reward Model #PRM-CoT

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Intermediate
Yifan Li, Yingda Yin et al. · Dec 2 · arXiv

ReVSeg teaches an AI to segment objects in videos by thinking step-by-step instead of guessing everything at once.

#Reasoning Video Object Segmentation #Vision-Language Models #Temporal Grounding