How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (131)

#reinforcement learning

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate
Zhenrui Yue, Kartikeya Upasani et al. · Jan 11 · arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero #self-evolution #proposer-solver

Solar Open Technical Report

Intermediate
Sungrae Park, Sanghoon Kim et al. · Jan 11 · arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open #Mixture-of-Experts #bilingual LLM

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Intermediate
Jie Wu, Haoling Li et al. · Jan 11 · arXiv

X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.

#competitive programming #synthetic data generation #feature-based synthesis

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Intermediate
Qingyu Ren, Qianyu He et al. · Jan 10 · arXiv

Real instructions often contain logical structure, such as "and", "first-then", and "if-else"; this paper teaches models to notice and obey that logic.

#instruction following #logical structures #parallel constraints

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Intermediate
Jiajie Zhang, Xin Lv et al. · Jan 9 · arXiv

The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.

#deep search agents #reinforcement learning #rubric rewards

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Intermediate
Qiguang Chen, Yantao Du et al. · Jan 9 · arXiv

This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.

#Long Chain-of-Thought #reasoning bonds #Deep Reasoning

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Intermediate
Xiaoshuai Song, Haofei Chang et al. · Jan 9 · arXiv

EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.

#EnvScaler #tool-interactive environments #programmatic synthesis

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate
Shuming Liu, Mingchen Zhuge et al. · Jan 8 · arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning #adaptive reasoning #early exit

RelayLLM: Efficient Reasoning via Collaborative Decoding

Intermediate
Chengsong Huang, Tong Zheng et al. · Jan 8 · arXiv

RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.

#token-level collaboration #<call>n</call> command #collaborative decoding

TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

Intermediate
Yinuo Wang, Mining Tan et al. · Jan 8 · arXiv

TourPlanner is a travel-planning system that first gathers the right places, then lets multiple expert ‘voices’ debate plans, and finally polishes the winner with a learning method that follows rules before style.

#travel planning #multi-agent reasoning #chain-of-thought

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Intermediate
Jiuzhou Zhao, Chunrong Chen et al. · Jan 8 · arXiv

Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.

#multi-agent systems #routing #reasoning chain

ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition

Intermediate
Muyang Zhao, Qi Qi et al. · Jan 7 · arXiv

The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.

#meta-cognition #budgeted reasoning #token budget