Papers (1262)

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

This paper trains a language model to write fast GPU kernels (small, performance-critical programs) in Triton, using reinforcement learning that rewards real speedups rather than correctness alone.

#Triton kernels #Reinforcement learning #Policy gradient

Pathwise Test-Time Correction for Autoregressive Long Video Generation

Intermediate

Xunzhi Xiang, Zixuan Duan et al. · Feb 5 · arXiv

This paper fixes a big problem in long video generation: tiny mistakes that snowball over time and make the video drift and flicker.

#test-time correction #autoregressive video diffusion #distilled diffusion

BABE: Biology Arena BEnchmark

Intermediate

Junting Zhou, Jin Chen et al. · Feb 5 · arXiv

BABE is a new benchmark that tests if AI can read real biology papers and reason from experiments like a scientist, not just recall facts.

#BABE Benchmark #Experimental Reasoning #Causal Reasoning

Reinforcement World Model Learning for LLM-based Agents

Intermediate

Xiao Yu, Baolin Peng et al. · Feb 5 · arXiv

Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.

#Reinforcement World Model Learning #world modeling #LLM agents

Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation

Intermediate

Hai Zhang, Siqi Liang et al. · Feb 5 · arXiv

Robots usually need very detailed, step-by-step directions, but real life often gives only short, simple goals like ‘find the red bench.’

#Beyond-the-View Navigation #Sparse Video Generation #Vision-Language Navigation

FastVMT: Eliminating Redundancy in Video Motion Transfer

Intermediate

Yue Ma, Zhikai Wang et al. · Feb 5 · arXiv

FastVMT is a faster way to copy motion from one video to another without training a new model for each video.

#FastVMT #video motion transfer #Diffusion Transformer

Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

Intermediate

Zhiqi Yu, Zhangquan Chen et al. · Feb 5 · arXiv

The paper finds a hidden symmetry in GRPO’s advantage calculation that inadvertently discourages models from exploring new good answers and keeps them from weighting easy versus hard problems appropriately at the right times.

#GRPO #GRAE #A-GRAE

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Intermediate

Shyam Sundhar Ramesh, Xiaotong Ji et al. · Feb 5 · arXiv

Large language models are usually trained to get good at one kind of reasoning, but real life needs them to be good at many things at once.

#Multi-Task Learning #GRPO #Reinforcement Learning Post-Training

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Intermediate

Ji Zhao, Yufei Gu et al. · Feb 5 · arXiv

Big idea: use a small, already-trained model to help a bigger model learn good habits early, so the big one trains faster and ends up smarter.

#Late-to-Early Training #LLM pretraining acceleration #representation alignment

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Intermediate

Zhenxiong Yu, Zhi Yang et al. · Feb 5 · arXiv

Before this work, AI agents often stopped to run safety checks at every single step, which made them slow and still easy to trick in sneaky ways.

#Intrinsic Risk Sensing #Event-driven defense #Hierarchical Adaptive Screening

ProAct: Agentic Lookahead in Interactive Environments

Intermediate

Yangbin Yu, Mingyu Yang et al. · Feb 5 · arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct #GLAD #MC-Critic

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Intermediate

Fanfan Liu, Youyang Yin et al. · Feb 5 · arXiv

The paper discovers that popular RLVR methods for training language and vision-language models secretly prefer certain answer lengths, which can hurt learning.

#LUSPO #RLVR #GRPO
