How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (915)

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Intermediate
Dylan Zhang, Yufeng Xu et al. · Feb 1 · arXiv

The paper shows that a model that looks great after supervised fine-tuning (SFT) can actually do worse after the same reinforcement learning (RL) than a model that looked weaker at SFT time.

#Supervised Fine-Tuning · #Reinforcement Learning · #Distribution Mismatch

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Intermediate
Hyesung Jeon, Hyeongju Ha et al. · Feb 1 · arXiv

Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time; a toy sketch of the shared-prefix idea is given below.

#LoRA · #Multi-LoRA · #KV cache
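
LRAgent's actual cache-management policy isn't reproduced here. As a rough, assumed illustration of why sharing helps, the sketch below computes key/value tensors for a common prompt prefix once and reuses them across agents, so only each agent's role-specific suffix is recomputed; `compute_kv`, the cache layout, and the tensor shapes are all invented for the example.

```python
import hashlib
import numpy as np

def compute_kv(tokens):
    """Toy stand-in for a transformer's per-layer K/V computation."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % (2**32))
    return rng.standard_normal((len(tokens), 64))  # (seq_len, head_dim)

_prefix_cache = {}  # hypothetical cache shared across all agents

def kv_for_agent(shared_prefix, agent_suffix):
    """Reuse the shared-prefix KV across agents; only the agent-specific
    suffix is computed fresh. Without the cache, every LoRA agent would
    recompute the (nearly identical) prefix portion."""
    key = hashlib.sha1(" ".join(map(str, shared_prefix)).encode()).hexdigest()
    if key not in _prefix_cache:
        _prefix_cache[key] = compute_kv(shared_prefix)   # computed once
    suffix_kv = compute_kv(agent_suffix)                  # per-agent part
    return np.concatenate([_prefix_cache[key], suffix_kv], axis=0)

# Two "agents" share the same system/context prefix but play different roles.
shared = [1, 2, 3, 4, 5]
planner_kv = kv_for_agent(shared, [10, 11])
coder_kv = kv_for_agent(shared, [20, 21, 22])
print(planner_kv.shape, coder_kv.shape, len(_prefix_cache))  # one cached prefix
```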

Sparse Reward Subsystem in Large Language Models

Intermediate
Guowei Xu, Mert Yuksekgonul et al. · Feb 1 · arXiv

The paper discovers a tiny, special group of neurons inside large language models (LLMs) that act like a reward system in the human brain; the reward-prediction-error formula behind that analogy is sketched below.

#value neurons · #dopamine neurons · #reward prediction error
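
The "dopamine neuron" analogy in the tags refers to the classic temporal-difference reward prediction error from neuroscience and reinforcement learning. The paper's probing of LLM neurons isn't shown here; this is only the standard textbook formula that analogy rests on.

```python
# Classic temporal-difference reward prediction error (RPE):
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
# Dopamine neurons are widely modeled as signaling this quantity; the paper's
# claim is that some LLM neurons behave analogously (not reproduced here).
def rpe(reward, value_now, value_next, gamma=0.99):
    return reward + gamma * value_next - value_now

# Better-than-expected outcomes give a positive error; worse give a negative one.
print(rpe(reward=1.0, value_now=0.2, value_next=0.0))   # +0.8
print(rpe(reward=0.0, value_now=0.7, value_next=0.0))   # -0.7
```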

Green-VLA: Staged Vision-Language-Action Model for Generalist Robots

Intermediate
I. Apanasevich, M. Artemyev et al. · Jan 31 · arXiv

Green-VLA is a step-by-step training recipe that teaches one model to see, understand language, and move many kinds of robots safely and efficiently.

#Vision-Language-Action · #Unified Action Space · #Multi-embodiment Pretraining

Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning

Intermediate
Zhipeng Chen, Xiaobo Qin et al. · Jan 31 · arXiv

This paper teaches a model to make its own helpful hints (sub-questions) and then use those hints to learn better with reinforcement learning that checks answers automatically; a minimal answer-checking sketch is given below.

#RLVR · #Large Reasoning Models · #Sub-question Guidance
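
"Reinforcement learning that checks answers automatically" is the RLVR setup: a programmatic verifier scores the final answer, so no learned reward model is needed. The sub-question guidance itself isn't reproduced; the sketch below is only a minimal, assumed verifier of the kind such pipelines typically use, with the "Answer:" convention invented for the example.

```python
import re

def extract_final_answer(completion: str) -> str:
    """Assumed convention: the model ends its output with 'Answer: <value>'."""
    match = re.search(r"Answer:\s*(.+)\s*$", completion.strip())
    return match.group(1).strip() if match else ""

def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward from an automatic check of the final answer,
    the signal an RLVR-style trainer (e.g. PPO/GRPO) would optimize."""
    return 1.0 if extract_final_answer(completion) == gold.strip() else 0.0

print(verifiable_reward("First split the problem... Answer: 42", "42"))  # 1.0
print(verifiable_reward("Answer: 41", "42"))                              # 0.0
```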

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Intermediate
Shengrui Li, Fei Zhao et al. · Jan 31 · arXiv

Training big language models works best when you mix the right kinds of data (general, math, code), but finding the best mix used to be slow and very expensive; a toy weighted-merging sketch is given below.

#data mixture optimization · #model merging · #weighted model merging
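
A common way to make mixture search cheap after the fact is to train one model per data domain and then combine their parameters with mixture weights. The paper's exact merging rule is not reproduced; the sketch below is a generic weighted parameter average over state dicts, with the three toy "experts" invented for illustration.

```python
import numpy as np

def merge_models(state_dicts, weights):
    """Weighted parameter average: theta = sum_i w_i * theta_i.
    Assumes all models share the same architecture and parameter names."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize the mixture weights
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Toy example: "general", "math", and "code" experts with two parameters each.
experts = [
    {"w": np.array([1.0, 0.0]), "b": np.array([0.1])},  # general
    {"w": np.array([0.0, 1.0]), "b": np.array([0.2])},  # math
    {"w": np.array([1.0, 1.0]), "b": np.array([0.3])},  # code
]
print(merge_models(experts, weights=[0.5, 0.3, 0.2]))
```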

Position: Agentic Evolution is the Path to Evolving LLMs

Intermediate
Minhua Lin, Hanqing Lu et al. · Jan 30 · arXiv

Big AI models do great in the lab but stumble in the real world because the world keeps changing.

#agentic evolution · #A-Evolve · #deployment-time adaptation

VoxServe: Streaming-Centric Serving System for Speech Language Models

Intermediate
Keisuke Kamahori, Wei-Tzu Lee et al. · Jan 30 · arXiv

VoxServe is a new serving system that makes voice AIs respond fast and smoothly when streaming audio to users; a small sketch of measuring Time-To-First-Audio is given below.

#Speech Language Models · #streaming · #Time-To-First-Audio
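
Time-To-First-Audio (TTFA) is the latency from a request until the first playable audio chunk arrives. VoxServe's scheduler isn't modeled here; this only shows, under assumed numbers and with a faked generator, how that metric is measured for any streaming source.

```python
import time

def fake_tts_stream(n_chunks=5, chunk_delay_s=0.05):
    """Stand-in for a streaming speech LM: yields audio chunks over time."""
    for _ in range(n_chunks):
        time.sleep(chunk_delay_s)
        yield b"\x00" * 3200  # e.g. 100 ms of 16 kHz, 16-bit mono audio

def measure_ttfa(stream):
    """Time-To-First-Audio: delay until the first chunk is available."""
    start = time.perf_counter()
    first_chunk = next(stream)
    return time.perf_counter() - start, first_chunk

ttfa, _ = measure_ttfa(fake_tts_stream())
print(f"TTFA: {ttfa * 1000:.1f} ms")  # ~50 ms for the fake stream
```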

PaperBanana: Automating Academic Illustration for AI Scientists

Beginner
Dawei Zhu, Rui Meng et al. · Jan 30 · arXiv

PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.

#academic illustration · #methodology diagrams · #visual language models

Scaling Multiagent Systems with Process Rewards

Intermediate
Ed Li, Junyu Ren et al. · Jan 30 · arXiv

This paper teaches AI teams to get better by scoring every move they make, not just the final answer; a toy reward-combination sketch is given below.

#multiagent reinforcement learning · #process rewards · #AI feedback
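
"Scoring every move" contrasts per-step process rewards with a single outcome reward. The multi-agent training loop isn't reproduced; the sketch below only shows one assumed way to fold step-level scores and the final-answer score into a single trajectory return, with the weighting chosen purely for illustration.

```python
def shaped_return(step_scores, outcome_score, step_weight=0.5, gamma=1.0):
    """Combine per-step process rewards (one score per agent action) with the
    final outcome reward into a single trajectory return. `step_weight` and
    `gamma` are assumed values, not the paper's."""
    process_term = sum((gamma ** t) * s for t, s in enumerate(step_scores))
    return step_weight * process_term + (1 - step_weight) * outcome_score

# A 3-step trajectory: strong step, weak step, strong step, correct final answer.
print(shaped_return(step_scores=[0.9, 0.2, 0.8], outcome_score=1.0))  # 1.45
```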

Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience

Intermediate
Zhongxiang Sun, Qipeng Wang et al. · Jan 30 · arXiv

Deep search agents can plan and browse the web in many steps, but they often fail because they don’t notice when their own thinking drifts off-track.

#deep search agents · #metacognition · #consistency monitoring

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Intermediate
Fanmeng Wang, Haotian Liu et al. · Jan 30 · arXiv

Chain-of-Thought (CoT) makes AI think step by step, but it is slow because it writes many tokens one by one; a back-of-the-envelope latency estimate is given below.

#Chain-of-Thought · #Latent Reasoning · #Variational Auto-Encoder
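
The "writes many tokens one by one" point is just autoregressive decoding cost, which latent reasoning compresses into fewer steps. The numbers below are assumed for illustration, not measurements from the paper.

```python
# Rough decode-time estimate for an explicit chain of thought versus a
# compressed latent one. All quantities are assumed illustrative values.
per_token_ms = 20      # assumed per-token decoding latency
cot_tokens = 500       # explicit reasoning tokens, written one by one
latent_steps = 16      # assumed number of latent reasoning steps
answer_tokens = 30     # tokens for the final answer either way

cot_ms = (cot_tokens + answer_tokens) * per_token_ms
latent_ms = (latent_steps + answer_tokens) * per_token_ms
print(f"explicit CoT ~ {cot_ms / 1000:.1f} s, latent ~ {latent_ms / 1000:.2f} s")
```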