Papers (1262)

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Jintao Tong, Shilin Yan et al. · Feb 5 · arXiv

SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.

#SwimBird · #switchable reasoning · #hybrid autoregressive

DFlash: Block Diffusion for Flash Speculative Decoding

Intermediate

Jian Chen, Yesheng Liang et al. · Feb 5 · arXiv

DFlash is a new way to make big language models answer much faster without changing the final answers.

#DFlash · #speculative decoding · #diffusion language model
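
DFlash's block-diffusion drafter is not reproduced here. The sketch below only illustrates the generic draft-then-verify idea behind speculative decoding and why it does not change the final answer under greedy decoding: every kept token is one the target model would have produced anyway. The "models" are toy functions from a token prefix to a next token, and all names (greedy_decode, speculative_decode, k) are hypothetical.

```python
from typing import Callable, List

# A toy "model": any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain one-token-at-a-time greedy decoding with the target model."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify loop; output matches greedy_decode(target, ...)."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1) The cheap draft model proposes up to k tokens in a row.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target checks each proposed position. A real system scores all
        #    positions in one batched forward pass; here it is a plain loop.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's own token and stop this round.
                accepted.append(expected)
                break
            accepted.append(tok)
        seq.extend(accepted)
    return seq[:goal]
```

The speed-up in practice comes from verifying many drafted tokens per target pass; how good the drafter is only changes how many tokens are accepted per round, never which tokens end up in the output.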

InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions

Intermediate

Sirui Xu, Samuel Schulter et al. · Feb 5 · arXiv

InterPrior is a new brain for simulated humans and humanoid robots that can move, balance, and use objects by following simple goals instead of step-by-step instructions.

#human-object interaction · #physics-based control · #goal-conditioned policy

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Intermediate

Dongyang Chen, Chaoyang Wang et al. · Feb 5 · arXiv

V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.

#V-Retrver · #multimodal retrieval · #agentic reasoning

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Intermediate

Shuo Chen, Cong Wei et al. · Feb 5 · arXiv

The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.

#autoregressive video generation · #long-context modeling · #distribution matching distillation

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Intermediate

Haozhen Zhang, Haodong Yue et al. · Feb 5 · arXiv

BudgetMem is a way for AI helpers to build and use memory on the fly, picking how much thinking to spend so answers are both good and affordable.

#runtime memory extraction · #budget-tier routing · #reinforcement learning
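
BudgetMem learns its router with reinforcement learning; that training is not shown. As a toy illustration only, the sketch below replaces the learned router with a fixed, hand-written scorer to show the trade-off a query-aware budget-tier router resolves: pick the tier whose predicted answer quality minus a cost penalty is highest. The tiers, their costs and miss rates, the length-based difficulty heuristic, and the penalty weight lam are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tier:
    name: str
    cost: float       # hypothetical relative cost of building memory at this tier
    miss_rate: float  # hypothetical share of needed detail this tier fails to capture


# Hypothetical tiers; not the paper's actual configuration.
TIERS = [Tier("low", 1.0, 0.9), Tier("mid", 3.0, 0.5), Tier("high", 12.0, 0.1)]


def toy_difficulty(query: str) -> float:
    """Stand-in for a learned, query-aware signal: longer queries count as harder."""
    return min(len(query.split()) / 20.0, 1.0)


def predicted_quality(query: str, tier: Tier) -> float:
    """Quality drops with difficulty, more steeply for cheaper tiers."""
    return 1.0 - toy_difficulty(query) * tier.miss_rate


def route(query: str, tiers: List[Tier], lam: float,
          quality: Callable[[str, Tier], float] = predicted_quality) -> Tier:
    """Pick the tier maximizing predicted quality minus lam * cost."""
    return max(tiers, key=lambda t: quality(query, t) - lam * t.cost)


for q in ["hello there", " ".join(["word"] * 10), " ".join(["word"] * 30)]:
    print(len(q.split()), "words ->", route(q, TIERS, lam=0.03).name)
```

With these made-up numbers, easy queries go to the cheapest tier and only hard ones justify the expensive tier; setting lam to 0 ignores cost and always favors the highest-quality tier.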

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

Beginner

Xianyang Liu, Shangding Gu et al. · Feb 5 · arXiv

AgenticPay is a safe playground where AI agents practice buying and selling by talking, not just by typing numbers.

#multi-agent negotiation · #language-mediated bargaining · #LLM agents

RISE-Video: Can Video Generators Decode Implicit World Rules?

Intermediate

Mingxin Liu, Shuran Ma et al. · Feb 5 · arXiv

RISE-Video is a new test that checks whether video-making AIs follow hidden world rules, not just make pretty pictures.

#Text-Image-to-Video · #video generation benchmark · #reasoning alignment

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

Intermediate

Tiansheng Hu, Yilun Zhao et al. · Feb 5 · arXiv

SAGE is a new test for how well AI research agents find scientific papers when questions require multi-step reasoning.

#SAGE benchmark · #scientific literature retrieval · #deep research agents

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Intermediate

Junxiao Liu, Zhijun Wang et al. · Feb 5 · arXiv

TRIT is a new training method that teaches AI to translate and think at the same time so it can solve hard problems in many languages without extra helper models.

#multilingual reasoning · #translation-reasoning integration · #self-translation

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Intermediate

Zhenghao Xu, Qin Lu et al. · Feb 5 · arXiv

The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward.

#Policy Mirror Descent · #KL regularization · #chi-squared regularization
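
To make the replaced term concrete, the sketch below assumes the standard KL-regularized mirror-descent step over a small discrete set of responses, pi_{t+1}(y) ∝ pi_t(y) · exp(r(y) / beta), whose normalizer Z = Σ_y pi_t(y) · exp(r(y) / beta) gives the log-partition term. It compares the exact beta · log Z with the cheap stand-in, the mean reward under pi_t. The probabilities, rewards, and beta are made-up, and the paper's own objective and notation may differ; how the approximation yields implicit regularization is the paper's analysis and is not reproduced here.

```python
import math
from typing import List


def exact_scaled_log_partition(probs: List[float], rewards: List[float], beta: float) -> float:
    """beta * log Z with Z = sum_y pi_t(y) * exp(r(y) / beta), via log-sum-exp."""
    terms = [math.log(p) + r / beta for p, r in zip(probs, rewards)]
    m = max(terms)
    return beta * (m + math.log(sum(math.exp(t - m) for t in terms)))


def mean_reward(probs: List[float], rewards: List[float]) -> float:
    """The cheap stand-in: expected reward under the current policy pi_t."""
    return sum(p * r for p, r in zip(probs, rewards))


def target_log_ratios(rewards: List[float], beta: float, baseline: float) -> List[float]:
    """Targets for log pi_{t+1}(y) - log pi_t(y), i.e. (r(y) - baseline) / beta."""
    return [(r - baseline) / beta for r in rewards]


def total_mass(probs: List[float], log_ratios: List[float]) -> float:
    """Sum of pi_t(y) * exp(log_ratio); equals 1 only if the update is normalized."""
    return sum(p * math.exp(lr) for p, lr in zip(probs, log_ratios))


probs, rewards, beta = [0.5, 0.3, 0.2], [1.0, 0.0, 0.5], 1.0
exact = exact_scaled_log_partition(probs, rewards, beta)
approx = mean_reward(probs, rewards)
print(f"beta*logZ = {exact:.4f}, mean reward = {approx:.4f}")
print(total_mass(probs, target_log_ratios(rewards, beta, exact)))   # 1.0 up to rounding
print(total_mass(probs, target_log_ratios(rewards, beta, approx)))  # above 1.0
```

By Jensen's inequality the mean reward never exceeds beta · log Z, so the cheap baseline leaves the updated distribution slightly unnormalized; the two agree more closely as beta grows or as rewards vary less.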

ContextBench: A Benchmark for Context Retrieval in Coding Agents

Intermediate

Han Li, Letian Zhu et al. · Feb 5 · arXiv

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval · #coding agents · #software engineering benchmarks
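
The summary does not spell out ContextBench's metrics. As an assumption, the sketch below scores the code an agent actually looked at against a gold set of relevant code at line granularity, using precision (how much of what it read was relevant), recall (how much of the relevant code it found), and F1. The span format, file names, and line ranges are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[str, int, int]  # (file path, first line, last line), both inclusive


def to_lines(spans: Iterable[Span]) -> Set[Tuple[str, int]]:
    """Expand line ranges into a set of (file, line) pairs."""
    return {(path, ln) for path, start, end in spans for ln in range(start, end + 1)}


def context_scores(retrieved: Iterable[Span], gold: Iterable[Span]) -> Dict[str, float]:
    """Line-level precision, recall, and F1 of the context an agent used vs. a gold set."""
    got, want = to_lines(retrieved), to_lines(gold)
    hit = len(got & want)
    precision = hit / len(got) if got else 0.0
    recall = hit / len(want) if want else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical case: the fix needs parser.py lines 10-19; the agent read 15-24 plus a README.
gold = [("src/parser.py", 10, 19)]
retrieved = [("src/parser.py", 15, 24), ("README.md", 1, 10)]
print(context_scores(retrieved, gold))
```

Here the agent found half of the relevant lines (recall 0.5), but only a quarter of what it read was relevant (precision 0.25). A patch could still pass the tests in such a case, which is the gap a context-retrieval benchmark is meant to expose.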
