Papers (1262)

Yume-1.5: A Text-Controlled Interactive World Generation Model

Yume-1.5 is a model that turns text or a single image into a living, explorable video world you can move through with keyboard keys.

#interactive world generation#video diffusion#temporal-spatial-channel modeling

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Intermediate

Yiheng Wang, Yixin Chen et al. · Dec 26 · arXiv

SciEvalKit is a new open-source toolkit that tests AI on real scientific skills, not just trivia or simple Q&A.

#scientific intelligence evaluation#multimodal scientific reasoning#symbolic reasoning in AI

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate

Zhibin Qin, Zhenxiong Tan et al. · Dec 26 · arXiv

SpotEdit is a training-free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.

#Diffusion Transformer#Selective image editing#Region-aware editing

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Intermediate

Hanzhang Zhou, Xu Zhang et al. · Dec 26 · arXiv

MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.

#GUI agent#GUI grounding#mobile navigation

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Intermediate

Shaofei Cai, Yulei Qin et al. · Dec 26 · arXiv

SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.

#Self-verifying agents#Evidence curation#3C principles

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate

KaShun Shum, Binyuan Hui et al. · Dec 26 · arXiv

Coding agents used to fix software rely on feedback; unit tests give only pass/fail signals that are often noisy or missing.

#execution-free feedback#reward model#software engineering agents

TimeBill: Time-Budgeted Inference for Large Language Models

Intermediate

Qi Fan, An Zou et al. · Dec 26 · arXiv

TimeBill is a way to help big AI models finish their answers on time without ruining answer quality.

#time-budgeted inference#response length prediction#execution time estimation

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Intermediate

Mengqi He, Xinyu Tian et al. · Dec 26 · arXiv

The paper shows that when vision-language models write captions, only a small set of uncertain words (about 20%) act like forks that steer the whole sentence.

#vision-language models#autoregressive generation#entropy

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Intermediate

Steven Xiao, Xindi Zhang et al. · Dec 25 · arXiv

This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.

#Knot Forcing#autoregressive video diffusion#temporal knot

An Information Theoretic Perspective on Agentic System Design

Intermediate

Shizhe He, Avanika Narayan et al. · Dec 25 · arXiv

The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.

#agentic systems#compressor-predictor#mutual information

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Intermediate

Shuo Cao, Jiayang Li et al. · Dec 25 · arXiv

This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.

#perceptual image understanding#image aesthetics assessment (IAA)#image quality assessment (IQA)

SVBench: Evaluation of Video Generation Models on Social Reasoning

Beginner

Wenshuo Peng, Gongxuan Wang et al. · Dec 25 · arXiv

SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty pictures.

#social reasoning#video generation#benchmark
