Papers (1055)

DreamOmni3: Scribble-based Editing and Generation

Intermediate

Bin Xia, Bohao Peng et al. · Dec 27 · arXiv

DreamOmni3 lets people edit and create images by combining text, example images, and quick hand-drawn scribbles.

#scribble-based editing #scribble-based generation #joint input scheme

Monadic Context Engineering

Intermediate

Yifan Zhang, Yang Yuan et al. · Dec 27 · arXiv

Monadic Context Engineering (MCE) is a way to build AI agents using math-inspired Lego blocks called Functors, Applicatives, and Monads so state, errors, and side effects are handled automatically.

#Monadic Context Engineering #AgentMonad #Functor
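
The monadic idea in the summary above can be illustrated with a minimal sketch. This is not the paper's AgentMonad: the `Step` class and the helper functions `parse_number` and `halve` are hypothetical names invented for this example, and the Applicative layer is left out for brevity. It only shows how `map` (Functor) and `bind` (Monad) can thread state through a pipeline and short-circuit on the first error.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical container: a value, the agent state, and an optional error."""
    value: Any
    state: dict
    error: Optional[str] = None

    def map(self, f: Callable[[Any], Any]) -> "Step":
        # Functor: transform the value, keep state; skip if already failed.
        if self.error is not None:
            return self
        return Step(f(self.value), self.state)

    def bind(self, f: Callable[[Any, dict], "Step"]) -> "Step":
        # Monad: run the next step with (value, state); errors short-circuit.
        if self.error is not None:
            return self
        try:
            return f(self.value, self.state)
        except Exception as exc:  # turn exceptions into an error-carrying Step
            return Step(None, self.state, error=str(exc))


def parse_number(text: str, state: dict) -> Step:
    # Illustrative step: may raise ValueError, which bind() captures.
    return Step(int(text), {**state, "parsed": True})


def halve(n: int, state: dict) -> Step:
    # Illustrative step: reports failure explicitly instead of raising.
    if n % 2:
        return Step(None, state, error=f"{n} is odd")
    return Step(n // 2, {**state, "halved": True})


ok = Step("42", {}).bind(parse_number).bind(halve).map(str)
bad = Step("abc", {}).bind(parse_number).bind(halve)
```

In this sketch, `ok` ends with the value `"21"` and a state recording both steps, while `bad` carries an error from the failed parse and `halve` is never run. The caller writes the pipeline once, with no explicit error checks between steps.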

Self-Evaluation Unlocks Any-Step Text-to-Image Generation

Intermediate

Xin Yu, Xiaojuan Qi et al. · Dec 26 · arXiv

This paper introduces Self-E, a text-to-image model that learns from scratch and can generate good pictures in any number of steps, from just a few to many.

#Self-Evaluating Model #Any-step inference #Text-to-image generation

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

Intermediate

Wensi Huang, Shaohao Zhu et al. · Dec 26 · arXiv

Real-life directions are often vague, so the paper creates a task where a robot can ask questions while it searches for a very specific object in a big house.

#embodied AI #interactive navigation #instance goal navigation

See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Intermediate

Shuoshuo Zhang, Yizhen Zhang et al. · Dec 26 · arXiv

The paper teaches vision-language models (AIs that look and read) to attend to the relevant parts of an image without needing extra tools at answer time.

#BiPS #perceptual shaping #vision-language models

ProEdit: Inversion-based Editing From Prompts Done Right

Intermediate

Zhi Ouyang, Dian Zheng et al. · Dec 26 · arXiv

ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too hard to the original picture and refuses to change what you asked for.

#ProEdit #inversion-based editing #KV-mix

Yume-1.5: A Text-Controlled Interactive World Generation Model

Intermediate

Xiaofeng Mao, Zhen Li et al. · Dec 26 · arXiv

Yume-1.5 is a model that turns text or a single image into a living, explorable video world you can move through using the keyboard.

#interactive world generation #video diffusion #temporal-spatial-channel modeling

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Intermediate

Yiheng Wang, Yixin Chen et al. · Dec 26 · arXiv

SciEvalKit is a new open-source toolkit that tests AI on real scientific skills, not just trivia or simple Q&A.

#scientific intelligence evaluation #multimodal scientific reasoning #symbolic reasoning in AI

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate

Zhibin Qin, Zhenxiong Tan et al. · Dec 26 · arXiv

SpotEdit is a training-free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.

#Diffusion Transformer #Selective image editing #Region-aware editing

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Intermediate

Hanzhang Zhou, Xu Zhang et al. · Dec 26 · arXiv

MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.

#GUI agent #GUI grounding #mobile navigation

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Intermediate

Shaofei Cai, Yulei Qin et al. · Dec 26 · arXiv

SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.

#Self-verifying agents #Evidence curation #3C principles

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate

KaShun Shum, Binyuan Hui et al. · Dec 26 · arXiv

Coding agents that fix software rely on feedback, but unit tests give only pass/fail signals, and those are often noisy or missing.

#execution-free feedback #reward model #software engineering agents
