Papers (1262)

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Qianben Chen, Tianrui Qin et al. · Feb 26 · arXiv

This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.

#agentic search #parallel evidence acquisition #plan refinement

dLLM: Simple Diffusion Language Modeling

Intermediate

Zhanhui Zhou, Lingjie Chen et al. · Feb 26 · arXiv

dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.

#diffusion language models #masked diffusion #block diffusion

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Beginner

Zhiheng Song, Jingshuai Zhang et al. · Feb 26 · arXiv

MobilityBench is a big, carefully built test that checks how well AI helpers can plan real-world routes using natural language and map tools.

#MobilityBench #route-planning agents #large language models

Transformers converge to invariant algorithmic cores

Intermediate

Joshua S. Schiffman · Feb 26 · arXiv

Different transformers may have very different weights, but they often hide the same tiny "engine" inside that actually does the task.

#algorithmic cores #mechanistic interpretability #transformers

Causal Motion Diffusion Models for Autoregressive Motion Generation

Intermediate

Qing Yu, Akihisa Watanabe et al. · Feb 26 · arXiv

The paper introduces CMDM, a new way to make computer-generated human motions that feel smooth over time and match the meaning of a text prompt.

#causal diffusion #autoregressive motion generation #text-to-motion

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Intermediate

Zezhou Wang, Youjie Li et al. · Feb 25 · arXiv

This paper makes training giant AI models faster and lighter on memory by inventing a new way to split tensors called RaggedShard.

#FSDP #ZeRO #RaggedShard

Solaris: Building a Multiplayer Video World Model in Minecraft

Intermediate

Georgy Savva, Oscar Michel et al. · Feb 25 · arXiv

Solaris is a new AI that can imagine the future videos of two Minecraft players at the same time, keeping both cameras consistent with each other.

#multiplayer world model #video diffusion transformer #Minecraft dataset

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Intermediate

Hanna Yukhymenko, Anton Alexandrov et al. · Feb 25 · arXiv

The paper builds an automated pipeline that translates AI benchmarks and datasets into many languages while keeping questions and answers correctly connected.

#machine translation #multilingual benchmarks #test-time compute scaling

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Intermediate

Rui Yang, Qianhui Wu et al. · Feb 25 · arXiv

GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.

#GUI agent #visual grounding #long-horizon navigation

World Guidance: World Modeling in Condition Space for Action Generation

Intermediate

Yue Su, Sijin Chen et al. · Feb 25 · arXiv

WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.

#Vision-Language-Action #world modeling #condition space

SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

Intermediate

Guibin Chen, Dixuan Lin et al. · Feb 25 · arXiv

SkyReels-V4 is a single, unified model that makes videos and matching sounds together, while also letting you fix or change parts of a video.

#multimodal diffusion transformer #video-audio generation #inpainting

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

Intermediate

Liangbing Zhao, Le Zhuo et al. · Feb 25 · arXiv

The paper turns image editing from a one-step “before → after” trick into a mini physics simulation that follows real-world rules.

#physics-aware image editing #physical state transition #latent transition priors
