How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (13)

#verifiable rewards

Specificity-aware reinforcement learning for fine-grained open-world classification

Intermediate
Samuele Angheben, Davide Berasi et al. · Mar 3 · arXiv

This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.

#open-world classification #fine-grained recognition #large multimodal models

Heterogeneous Agent Collaborative Reinforcement Learning

Intermediate
Zhixia Zhang, Zixuan Huang et al. · Mar 3 · arXiv

This paper introduces HACRL, a way for different kinds of AI agents to learn together during training but still work alone during use.

#HACRL #HACPO #heterogeneous agents

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Intermediate
Haoxiang Sun, Lizhen Xu et al. · Feb 18 · arXiv

DeepVision-103K is a new 103,000-example picture-and-text math dataset designed to help AI think better using rewards that can be checked automatically.

#DeepVision-103K #multimodal reasoning #RLVR

POINTS-GUI-G: GUI-Grounding Journey

Intermediate
Zhongyin Zhao, Yuan Liu et al. · Feb 6 · arXiv

This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.

#GUI grounding #reinforcement learning #verifiable rewards

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Intermediate
Junxiao Liu, Zhijun Wang et al. · Feb 5 · arXiv

TRIT is a new training method that teaches AI to translate and think at the same time so it can solve hard problems in many languages without extra helper models.

#multilingual reasoning #translation-reasoning integration #self-translation

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Intermediate
Fanfan Liu, Youyang Yin et al. · Feb 5 · arXiv

The paper finds that popular RLVR methods for training language and vision-language models are quietly biased toward certain response lengths, which can hurt learning.

#LUSPO #RLVR #GRPO

Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

Intermediate
Minwu Kim, Safal Shrestha et al. · Jan 28 · arXiv

When reasoning language models are trained with RL that uses right-or-wrong rewards, learning can stall on 'saturated' problems that the model almost always solves.

#failure-prefix conditioning #RLVR #GRPO

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. · Jan 13 · arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.

#RLVR #judge-then-generate #discriminative supervision

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Intermediate
Qingyu Ren, Qianyu He et al. · Jan 10 · arXiv

Real instructions often contain logic such as "and", "first-then", and "if-else", and this paper teaches models to notice and obey that logic.

#instruction following #logical structures #parallel constraints

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate
Shuming Liu, Mingchen Zhuge et al. · Jan 8 · arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning #adaptive reasoning #early exit

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Intermediate
Zhe Huang, Hao Wen et al. · Dec 30 · arXiv

Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.

#multimodal large language model #video understanding #visual hallucination

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Intermediate
Chaoyang Wang, Kaituo Feng et al. · Dec 18 · arXiv

AdaTooler-V teaches an image-and-video AI to first ask, “Do I really need a tool?” before using one, which saves time and boosts accuracy.

#adaptive tool-use #multimodal chain-of-thought #visual tool interactions