Papers (1262)

LIVE: Long-horizon Interactive Video World Modeling

Intermediate

Junchao Huang, Ziyang Ye et al. · Feb 3 · arXiv

LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.

#cycle consistency #autoregressive video diffusion #exposure bias

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

Intermediate

Guangyi Liu, Pengxiang Zhao et al. · Feb 3 · arXiv

MemGUI-Bench is a new benchmark that checks how well phone-controlling AI agents remember important information, both within a single task and across repeated attempts.

#mobile GUI agents #memory benchmarking #short-term memory

No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Intermediate

Vynska Amalia Permadi, Xingwei Tan et al. · Feb 3 · arXiv

This paper builds ID-MoCQA, a new two-step (multi-hop) question set about Indonesian culture that requires an AI to connect clues before answering.

#multi-hop question answering #cultural reasoning #Indonesian culture

Instruction Anchors: Dissecting the Causal Dynamics of Modality Arbitration

Intermediate

Yu Zhang, Mufan Xu et al. · Feb 3 · arXiv

The paper asks a simple question: when an AI sees a picture and some text but the instructions say 'only trust the picture,' how does it decide which one to follow?

#multimodal instruction following #modality arbitration #instruction tokens

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration

Intermediate

Bowei He, Minda Hu et al. · Feb 3 · arXiv

This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.

#search-integrated reasoning #reinforcement learning #credit assignment

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Intermediate

Changze Lv, Jie Zhou et al. · Feb 3 · arXiv

DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.

#DeepResearch #query-specific rubrics #human preference learning

CL-bench: A Benchmark for Context Learning

Beginner

Shihan Dou, Ming Zhang et al. · Feb 3 · arXiv

CL-bench is a new benchmark that checks whether an AI can truly learn new things from the information supplied in its current context, not just from what it memorized during training.

#context learning #benchmark #rubric-based evaluation

HY3D-Bench: Generation of 3D Assets

Intermediate

Team Hunyuan3D et al. · Feb 3 · arXiv

HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.

#HY3D-Bench #watertight meshes #part-level decomposition

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

Intermediate

Yizhao Gao, Jianyu Wei et al. · Feb 3 · arXiv

HySparse is a new attention architecture for AI models that mixes a few full attention layers with many fast, memory-saving sparse layers.

#Hybrid Sparse Attention #Oracle Token Selection #KV Cache Sharing

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Intermediate

Bozhou Li, Yushuo Guan et al. · Feb 3 · arXiv

The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better.

#Diffusion Transformer #Text Conditioning #Multi-layer LLM Features

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Intermediate

Mingxuan Du, Benfeng Xu et al. · Feb 3 · arXiv

A-RAG lets the AI choose how to search, what to read, and when to stop, instead of following a fixed recipe.

#Agentic RAG #Hierarchical Retrieval Interfaces #Keyword Search

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Intermediate

Shuang Sun, Huatong Song et al. · Feb 3 · arXiv

SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using learned models that simulate the execution environment and the tests.

#SWE-World #software engineering agents #Docker-free training
