🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers159

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#GRPO

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

Intermediate
Futing Wang, Jianhao Yan et al.Feb 12arXiv

The paper teaches language models to explore more ideas while thinking, so they can solve harder problems.

#In-Context Exploration#Test-Time Scaling#Chain-of-Thought

Not triaged yet

PhyCritic: Multimodal Critic Models for Physical AI

Intermediate
Tianyi Xiong, Shihao Wang et al.Feb 11arXiv

PhyCritic is a judge model that checks other AI models’ answers about the physical world, like cooking steps, robot actions, or driving choices.

#Physical AI#Multimodal critic#Self-referential training

Not triaged yet

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Intermediate
Yicheng Chen, Zerun Ma et al.Feb 11arXiv

DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.

#data recipe#data processing pipeline#reinforcement learning

Not triaged yet

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Intermediate
Shuo He, Lang Feng et al.Feb 11arXiv

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

#Kalman filter#importance sampling ratio#policy optimization

Not triaged yet

MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning

Intermediate
Chenhao Zhang, Yazhe Niu et al.Feb 11arXiv

Pictures can hide deeper meanings, like a wilted plant meaning someone feels burned out; most AI models miss these hints.

#image metaphor understanding#image implication#visual reinforcement learning

Not triaged yet

Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Intermediate
Shiting Huang, Zecheng Li et al.Feb 10arXiv

The paper teaches large language models to do what good students do: find where they went wrong, turn that lesson into a rule, and remember it for next time.

#Reinforcement Learning with Verifiable Rewards#RLVR#Meta-Experience Learning

Not triaged yet

QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search

Intermediate
Jianzhao Huang, Xiaorui Huang et al.Feb 10arXiv

Search engines on social apps used to rely on many separate mini-models that often misunderstood slang and were hard to keep updated.

#Query Processing#Unified Generative Model#Named Entity Recognition

Not triaged yet

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Intermediate
Linli Yao, Yuancheng Wei et al.Feb 9arXiv

This paper teaches AI to write movie-like scripts for videos by adding exact timestamps and rich details about what you see and hear.

#Omni Dense Captioning#time-aware video captioning#audio-visual understanding

Not triaged yet

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Intermediate
Zixuan Huang, Xin Xia et al.Feb 9arXiv

Big AI reasoning models often keep thinking long after they already found the right answer, wasting time and tokens.

#SAGE#efficient reasoning#chain of thought

Not triaged yet

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate
Tianyi Wu, Mingzhe Du et al.Feb 7arXiv

This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.

#secure code generation#reinforcement learning#vulnerability reward model

Not triaged yet

Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

Intermediate
Yunze Tong, Mushui Liu et al.Feb 6arXiv

Text-to-image models using GRPO used to give the same final reward to every step, which is like giving the whole team the same grade no matter who did what.

#TurningPoint-GRPO#GRPO#Flow Matching

Not triaged yet

POINTS-GUI-G: GUI-Grounding Journey

Intermediate
Zhongyin Zhao, Yuan Liu et al.Feb 6arXiv

This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.

#GUI grounding#reinforcement learning#verifiable rewards

Not triaged yet

12345