Papers791

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Bowen Yang, Kaiming Jin et al.Jan 12arXiv

Computer-using agents kept forgetting important visual details over long tasks and could not reliably find up-to-date, step-by-step help for unfamiliar apps.

#computer-using agents#vision-language models#milestone memory

Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning

Intermediate

Jiaxuan Lu, Ziyu Kong et al.Jan 12arXiv

This paper teaches AI to build and improve its own small computer helpers (tools) while solving science problems, instead of relying only on a fixed toolbox made beforehand.

#Test-Time Tool Evolution#Dynamic tool synthesis#Scientific reasoning

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate

Yu Xu, Hongbin Yan et al.Jan 12arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image job.

#Task-Aware Gating#Mixture-of-Experts#Unified Image Generation

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

Intermediate

Lei Zhang, Mouxiang Chen et al.Jan 12arXiv

MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.

#agent orchestration#distributed systems#event-driven architecture

OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Intermediate

Siqi Zhu, Jiaxuan YouJan 12arXiv

OpenTinker is an open-source system that makes training AI agents with reinforcement learning simple, modular, and reusable.

#Reinforcement learning#LLM agents#Agent–environment interaction

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Intermediate

Linhao Zhong, Linyu Wu et al.Jan 12arXiv

Diffusion Language Models (DLMs) write by polishing whole sentences in several passes instead of one token at a time.

#Diffusion Language Models#Masked Diffusion#Soft Token Distributions

Controlled Self-Evolution for Algorithmic Code Optimization

Intermediate

Tu Hu, Ronghao Chen et al.Jan 12arXiv

The paper introduces Controlled Self-Evolution (CSE), a smarter way for AI to write and improve code quickly under a tight budget of tries.

#Controlled Self-Evolution#Code optimization#Self-evolving agents

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate

Jiapeng Shi, Junke Wang et al.Jan 12arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model#Temporal Grounding#Referring Video Object Segmentation

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate

Yuanyang Yin, Yufan Deng et al.Jan 12arXiv

Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.

#Image-to-Video generation#Diffusion Transformer#Controllability

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Intermediate

Zizhen Li, Chuanhao Li et al.Jan 12arXiv

MeepleLM is a special AI that reads a board game’s rulebook and pretends to be different kinds of players to give helpful, honest feedback.

#virtual playtesting#persona-aligned critique#MDA reasoning

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Intermediate

Seongyun Lee, Yongrae Jo et al.Jan 12arXiv

The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused—sometimes losing up to 80% of its accuracy.

#NoisyBench#Rationale-Aware Reward#RARE

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate

Zhenrui Yue, Kartikeya Upasani et al.Jan 11arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero#self-evolution#proposer-solver

31 32 33 34 35