Papers (3)

#unified multimodal models

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation

Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.

#agentic image generation #multimodal reasoning #retrieval-augmented generation


Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

Beginner

Zengbin Wang, Xuecai Hu et al. · Jan 28 · arXiv

Text-to-image models draw pretty pictures, but often put things in the wrong places or miss how objects interact.

#text-to-image #spatial intelligence #occlusion


Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Intermediate

Jialong Wu, Xiaoying Zhang et al. · Jan 27 · arXiv

The paper argues that making and using pictures inside an AI’s thinking can help it reason more like humans, especially for real-world, physical and spatial problems.

#visual world modeling #multimodal chain-of-thought #unified multimodal models
