How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)

ContextBench: A Benchmark for Context Retrieval in Coding Agents

Intermediate
Han Li, Letian Zhu et al. · Feb 5 · arXiv

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval, #coding agents, #software engineering benchmarks

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Intermediate
Wei Liu, Jiawei Xu et al. · Feb 5 · arXiv

This paper teaches a language model to write fast GPU kernels (small, performance-critical programs) in Triton, using reinforcement learning that rewards meaningful speedups rather than mere correctness.

#Triton kernels, #Reinforcement learning, #Policy gradient

Pathwise Test-Time Correction for Autoregressive Long Video Generation

Intermediate
Xunzhi Xiang, Zixuan Duan et al. · Feb 5 · arXiv

This paper fixes a big problem in long video generation: tiny mistakes that snowball over time and make the video drift and flicker.

#test-time correction, #autoregressive video diffusion, #distilled diffusion

BABE: Biology Arena BEnchmark

Intermediate
Junting Zhou, Jin Chen et al. · Feb 5 · arXiv

BABE is a new benchmark that tests whether AI can read real biology papers and reason from experiments like a scientist, not just recall facts.

#BABE Benchmark, #Experimental Reasoning, #Causal Reasoning

Reinforcement World Model Learning for LLM-based Agents

Intermediate
Xiao Yu, Baolin Peng et al. · Feb 5 · arXiv

Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.

#Reinforcement World Model Learning, #world modeling, #LLM agents

Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation

Intermediate
Hai Zhang, Siqi Liang et al. · Feb 5 · arXiv

Robots usually need very detailed, step-by-step directions, but real life often gives only short, simple goals like ‘find the red bench.’

#Beyond-the-View Navigation, #Sparse Video Generation, #Vision-Language Navigation

FastVMT: Eliminating Redundancy in Video Motion Transfer

Intermediate
Yue Ma, Zhikai Wang et al. · Feb 5 · arXiv

FastVMT is a faster way to copy motion from one video to another without training a new model for each video.

#FastVMT, #video motion transfer, #Diffusion Transformer

Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

Intermediate
Zhiqi Yu, Zhangquan Chen et al. · Feb 5 · arXiv

The paper identifies a hidden symmetry in GRPO’s advantage calculation that unintentionally discourages models from exploring new correct answers and from shifting their attention between easy and hard problems at the right points in training.

#GRPO, #GRAE, #A-GRAE
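For readers new to GRPO, the sketch below shows the standard group-relative advantage: each sampled answer's reward minus the group mean, divided by the group standard deviation. It is only an illustration of that textbook normalization, not the paper's analysis or its proposed GRAE/A-GRAE methods. The function name `grpo_advantages`, the small `eps` term, and the use of the population standard deviation are assumptions (implementations differ on sample vs. population std). Note how, within a group, positive and negative advantages balance out.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Hypothetical helper: group-normalized advantages for one prompt's sampled answers."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled answers scored 1 (correct) or 0 (wrong).
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])
# The advantages in a group sum to zero: gains for correct answers mirror penalties for wrong ones.
print(round(sum(adv), 6))
```

If every answer in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal in this formulation.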

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Intermediate
Shyam Sundhar Ramesh, Xiaotong Ji et al. · Feb 5 · arXiv

Large language models are usually trained to get good at one kind of reasoning, but real life needs them to be good at many things at once.

#Multi-Task Learning, #GRPO, #Reinforcement Learning Post-Training

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Intermediate
Ji Zhao, Yufei Gu et al. · Feb 5 · arXiv

Big idea: use a small, already-trained model to help a bigger model learn good habits early, so the big one trains faster and ends up smarter.

#Late-to-Early Training, #LLM pretraining acceleration, #representation alignment

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Intermediate
Zhenxiong Yu, Zhi Yang et al. · Feb 5 · arXiv

Before this work, AI agents often stopped to run safety checks at every single step, which made them slow and still easy to trick in sneaky ways.

#Intrinsic Risk Sensing, #Event-driven defense, #Hierarchical Adaptive Screening

ProAct: Agentic Lookahead in Interactive Environments

Intermediate
Yangbin Yu, Mingyu Yang et al. · Feb 5 · arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct, #GLAD, #MC-Critic