LMEB is a new test that checks whether text-embedding models can remember and find information across long stretches of text, not just in short, neat passages.
RoboPocket turns an ordinary smartphone into a pocket robot coach that helps you fix robot mistakes instantly—without touching a robot.
RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.
Robots need many different ways to grab things, just like people use pinch, tripod, whole-hand, or two hands together.
Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).
Multimodal AI models handle text, images, and audio, but their signals are very different in size, which breaks standard low‑bit compression methods.
Timer-S1 is a huge time-series model (8.3B parameters, only 0.75B used per step) that predicts the future by thinking step-by-step inside one forward pass.
DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.
Helios is a 14-billion-parameter video model that can make minute-long videos in real time at about 19.5 frames per second on a single NVIDIA H100 GPU.
The paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.
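The head-to-head setup can be sketched in a few lines. This is a generic illustration of pairwise versus pointwise judging, not the paper's actual prompts or evaluation code; the prompt wording, the `judge` callable, and the tournament-style selection loop are all assumptions introduced here.

```python
# Generic sketch of pointwise vs. pairwise judging (hypothetical prompt
# wording; the paper's exact setup is not reproduced here).

def pointwise_prompt(question, answer):
    # Judge one answer in isolation on an absolute scale.
    return (f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Rate this answer's correctness from 1 to 10.")

def pairwise_prompt(question, answer_a, answer_b):
    # Judge two answers head-to-head; the model only has to say which is better.
    return (f"Question: {question}\n"
            f"Answer A: {answer_a}\n"
            f"Answer B: {answer_b}\n"
            "Which answer is more correct, A or B?")

def pick_best_pairwise(judge, question, answers):
    """Tournament-style selection: keep the winner of each head-to-head
    comparison. `judge` is any callable mapping a prompt to 'A' or 'B'
    (in practice, a call to the model itself)."""
    best = answers[0]
    for challenger in answers[1:]:
        if judge(pairwise_prompt(question, best, challenger)) == "B":
            best = challenger
    return best
```

With a real model behind `judge`, the pairwise loop lets the model compare its own candidate answers directly, which is the setting where the paper reports higher accuracy than scoring each answer alone.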
CubeComposer is a new AI method that turns a normal forward-facing video into a full 360° VR video at true 4K quality without using super-resolution upscaling.
RIVER Bench is a new test that checks how well AI can watch a video stream and talk with you in real time.