How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (11)


$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Intermediate
Harman Singh, Xiuyu Li et al. · Mar 4 · arXiv

The paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.

#pairwise self-verification · #test-time scaling · #parallel reasoning
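The head-to-head idea can be sketched as a tiny tournament: instead of scoring each candidate in isolation, the model only ever answers "which of these two is better?". Everything below (the `prefer` judge, the toy answers) is a hypothetical stand-in, not the paper's implementation:

```python
def pairwise_pick(answers, prefer):
    # Single-elimination bracket: each round the model compares two
    # candidates head-to-head and the preferred one advances.
    pool = list(answers)
    while len(pool) > 1:
        a, b = pool.pop(), pool.pop()
        pool.insert(0, a if prefer(a, b) else b)
    return pool[0]

# Toy stand-ins for parallel samples and the model's pairwise judgment.
answers = ["x=3", "x=4", "x=5"]
prefer = lambda a, b: a == "x=4"        # hypothetical: judge favors the correct answer
best = pairwise_pick(answers, prefer)   # "x=4"
```

With N parallel samples this takes N − 1 comparisons, which is why it pairs naturally with test-time scaling.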

Tool Verification for Test-Time Reinforcement Learning

Intermediate
Ruotong Liao, Nikolai Röhrich et al. · Mar 2 · arXiv

The paper fixes a big flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.

#test-time reinforcement learning · #verification-weighted voting · #tool verification
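The failure mode and the fix can be sketched in a few lines; the `verify` function below is a hypothetical stand-in for the paper's tool-based checker:

```python
from collections import defaultdict

def majority_vote(answers):
    # Plain TTRL-style self-reward: the most common answer wins,
    # even when many samples agree on the same mistake.
    counts = defaultdict(int)
    for a in answers:
        counts[a] += 1
    return max(counts, key=counts.get)

def verified_vote(answers, verify):
    # Weight each vote by a verification score, so answers that merely
    # agree (but fail the check) no longer dominate the reward.
    scores = defaultdict(float)
    for a in answers:
        scores[a] += verify(a)
    return max(scores, key=scores.get)

samples = ["42", "42", "42", "41"]             # three samples share one wrong answer
verify = lambda a: 1.0 if a == "41" else 0.1   # hypothetical tool check
majority = majority_vote(samples)              # "42": the repeated mistake wins
verified = verified_vote(samples, verify)      # "41"
```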

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Intermediate
Yutong Wang, Siyuan Xiong et al. · Feb 26 · arXiv

Multi-agent systems are like teams of smart helpers, but one bad message can mislead the whole team.

#multi-agent systems · #error propagation · #test-time rectification

SkillOrchestra: Learning to Route Agents via Skill Transfer

Beginner
Jiayu Wang, Yifei Ming et al. · Feb 23 · arXiv

SkillOrchestra is a new way to make teams of AI models and tools work together by thinking in terms of skills, not just picking one big model for everything.

#agent orchestration · #model routing · #skill discovery

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Intermediate
Shuo He, Lang Feng et al. · Feb 11 · arXiv

Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.

#Kalman filter · #importance sampling ratio · #policy optimization
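A scalar Kalman filter makes the idea concrete: treat each noisy per-token ratio as an observation of a slowly drifting latent value and update toward it with a data-dependent gain. The noise variances below are made-up illustration values, not the paper's:

```python
def kalman_smooth(observations, q=1e-4, r=0.05):
    # Scalar Kalman filter: q is the process-noise variance (how fast the
    # latent ratio may drift), r is the observation-noise variance.
    x, p = observations[0], 1.0       # initial state estimate and its variance
    smoothed = []
    for z in observations:
        p = p + q                     # predict: uncertainty grows by process noise
        k = p / (p + r)               # Kalman gain in [0, 1]
        x = x + k * (z - x)           # update the estimate toward the observation
        p = (1 - k) * p
        smoothed.append(x)
    return smoothed

ratios = [1.0, 3.2, 0.4, 1.1, 2.8, 0.9]   # wildly swinging raw per-token IS ratios
smooth = kalman_smooth(ratios)
# the filtered sequence spans a much narrower range than the raw one
```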

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Intermediate
Zhenghao Xu, Qin Lu et al. · Feb 5 · arXiv

The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward.

#Policy Mirror Descent · #KL regularization · #chi-squared regularization
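The substitution is easy to check numerically on a toy reward distribution (the rewards, probabilities, and temperature below are illustrative, not from the paper). By Jensen's inequality the true log-partition term is at least the mean reward, and the gap is roughly Var[r] / (2τ), which is where the implicit regularization comes from:

```python
import math

def log_partition(rewards, probs, tau):
    # Exact term, hard to compute at scale: tau * log E_pi[exp(r / tau)]
    return tau * math.log(sum(p * math.exp(r / tau) for p, r in zip(probs, rewards)))

def mean_reward(rewards, probs):
    # The easy surrogate the paper studies: E_pi[r]
    return sum(p * r for p, r in zip(probs, rewards))

rewards = [0.0, 1.0, 0.5]   # toy reward values
probs = [0.2, 0.5, 0.3]     # toy policy probabilities
tau = 2.0                   # KL-regularization temperature

exact = log_partition(rewards, probs, tau)
approx = mean_reward(rewards, probs)
# exact >= approx (Jensen); the gap shrinks as tau grows and is
# approximately Var[r] / (2 * tau), the implicit-regularization term.
```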

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

Beginner
Yu-Ang Lee, Ching-Yun Ko et al. · Feb 4 · arXiv

When you tune the learning rate carefully, plain old LoRA fine-tuning works about as well as fancy new versions.

#LoRA · #parameter-efficient fine-tuning · #learning rate tuning
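As a reminder of why the learning rate matters here, a minimal vanilla-LoRA forward pass (dimensions and initializations are illustrative): the adapter's contribution is scaled by alpha / r, so the effective step size is (alpha / r) × learning rate, and retuning the learning rate can absorb much of what LoRA variants change.

```python
import numpy as np

def lora_delta(x, A, B, alpha, r):
    # LoRA replaces a full-rank weight update dW with a low-rank
    # product B @ A, scaled by alpha / r; only A and B are trained.
    return (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

x = rng.normal(size=(d,))
y = x @ W.T + lora_delta(x, A, B, alpha, r)
# With B zero-initialized, the adapter starts as an exact no-op,
# so y equals the frozen model's output before any training.
```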

Scaling Multiagent Systems with Process Rewards

Intermediate
Ed Li, Junyu Ren et al. · Jan 30 · arXiv

This paper teaches AI teams to get better by scoring every move they make, not just the final answer.

#multiagent reinforcement learning · #process rewards · #AI feedback
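The contrast between outcome-only and per-step credit can be sketched with toy returns (the step scores below are hypothetical judge outputs, not the paper's reward model):

```python
def returns(step_rewards, final_reward, gamma=1.0):
    # Outcome-only credit: every step inherits the final answer's score.
    outcome_only = [final_reward] * len(step_rewards)
    # Process-reward credit: each step also earns its own score, so good
    # intermediate moves in a failed episode still receive positive signal.
    process = []
    g = final_reward
    for r in reversed(step_rewards):
        g = r + gamma * g
        process.append(g)
    return outcome_only, list(reversed(process))

steps = [0.5, -0.2, 0.8]   # hypothetical per-step scores from an AI judge
outcome, process = returns(steps, final_reward=0.0)
# outcome == [0.0, 0.0, 0.0]  — failed episode, no learning signal at all
# process ≈ [1.1, 0.6, 0.8]  — good moves still get rewarded
```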

FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation

Intermediate
Siyang He, Qiqi Wang et al. · Jan 30 · arXiv

Diffusion language models (dLLMs) can write text in any order, but common decoding methods still prefer left-to-right, which wastes their superpower.

#diffusion language models · #non-autoregressive generation · #frequency-domain analysis

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate
Yibo Wang, Yongcheng Jing et al. · Jan 29 · arXiv

This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.

#vision-text compression · #optical memory · #iterative reasoning

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. · Jan 13 · arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.

#RLVR · #judge-then-generate · #discriminative supervision