Papers924

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

Lei Zhang, Mouxiang Chen et al.Jan 12arXiv

MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.

#agent orchestration#distributed systems#event-driven architecture

OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Intermediate

Siqi Zhu, Jiaxuan YouJan 12arXiv

OpenTinker is an open-source system that makes training AI agents with reinforcement learning simple, modular, and reusable.

#Reinforcement learning#LLM agents#Agent–environment interaction

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Intermediate

Linhao Zhong, Linyu Wu et al.Jan 12arXiv

Diffusion Language Models (DLMs) write by polishing whole sentences in several passes instead of one token at a time.

#Diffusion Language Models#Masked Diffusion#Soft Token Distributions

Controlled Self-Evolution for Algorithmic Code Optimization

Intermediate

Tu Hu, Ronghao Chen et al.Jan 12arXiv

The paper introduces Controlled Self-Evolution (CSE), a smarter way for AI to write and improve code quickly under a tight budget of tries.

#Controlled Self-Evolution#Code optimization#Self-evolving agents

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate

Jiapeng Shi, Junke Wang et al.Jan 12arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model#Temporal Grounding#Referring Video Object Segmentation

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate

Yuanyang Yin, Yufan Deng et al.Jan 12arXiv

Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.

#Image-to-Video generation#Diffusion Transformer#Controllability

The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Beginner

Weihao Xuan, Qingcheng Zeng et al.Jan 12arXiv

This paper studies how AI agents that use tools talk about how sure they are and finds a split: some tools make them too sure, others help them be honest.

#LLM agents#calibration#overconfidence

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Intermediate

Zizhen Li, Chuanhao Li et al.Jan 12arXiv

MeepleLM is a special AI that reads a board game’s rulebook and pretends to be different kinds of players to give helpful, honest feedback.

#virtual playtesting#persona-aligned critique#MDA reasoning

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Intermediate

Seongyun Lee, Yongrae Jo et al.Jan 12arXiv

The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused—sometimes losing up to 80% of its accuracy.

#NoisyBench#Rationale-Aware Reward#RARE

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate

Zhenrui Yue, Kartikeya Upasani et al.Jan 11arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero#self-evolution#proposer-solver

Solar Open Technical Report

Intermediate

Sungrae Park, Sanghoon Kim et al.Jan 11arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open#Mixture-of-Experts#bilingual LLM

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Beginner

Haonan Bian, Zhiyuan Yao et al.Jan 11arXiv

RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.

#RealMem#long-term memory#project-oriented interactions

36 37 38 39 40