Reasoning models often talk too much, and the extra words can actually make their answers less accurate.
Phi-4-reasoning-vision-15B is a small, open-weight AI that understands pictures and text together and is especially good at math, science, and using computer screens.
MMR-Life is a new test (benchmark) that checks how well AI understands everyday situations using several real photos at once.
LaSER teaches a fast search model to “think” quietly inside its hidden space, so it gets the benefits of step-by-step reasoning without writing those steps out as text.
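Here is a minimal sketch of the latent-reasoning idea, not LaSER's actual architecture (the names `LatentReasoner`, `latent_steps`, and the GRU-based "think" step are all illustrative assumptions): instead of decoding chain-of-thought text, the model loops its own hidden state through an internal update a few extra times before producing the answer.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Illustrative latent-reasoning wrapper: refine a hidden state
    internally instead of emitting chain-of-thought tokens."""

    def __init__(self, d_model: int = 256, latent_steps: int = 4):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_model)  # stands in for the base model
        self.think = nn.GRUCell(d_model, d_model)   # one "silent" reasoning step
        self.head = nn.Linear(d_model, d_model)     # maps final state to a score
        self.latent_steps = latent_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.encoder(x))         # initial hidden representation
        for _ in range(self.latent_steps):      # iterate in hidden space:
            h = self.think(h, h)                # no text is decoded during these steps
        return self.head(h)                     # answer comes out in one shot

model = LatentReasoner()
scores = model(torch.randn(8, 256))  # batch of 8 query embeddings
```

The key design point the summary describes: all the extra computation happens in the loop over hidden states, so inference gets reasoning-like depth at search-model speed.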
CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.
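To make "solve hard problems step by step" concrete, here is a hypothetical shape for one such synthetic training example; the field names are invented for illustration and are not CHIMERA's actual schema.

```python
# One hypothetical synthetic example: a problem, explicit steps, and an answer.
example = {
    "problem": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "steps": [
        "Average speed is distance divided by time.",
        "Distance = 120 km, time = 1.5 hours.",
        "120 / 1.5 = 80.",
    ],
    "answer": "80 km/h",
}

# Supervised fine-tuning would teach the model to produce the steps
# and the answer when given only the problem.
prompt = example["problem"]
target = "\n".join(example["steps"]) + f"\nAnswer: {example['answer']}"
```

A small dataset can still teach a lot when every example carries a clean, fully worked-out reasoning trace like this.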
GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.
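A rough sketch of the think-then-act output format such agents typically use (the `Thought:`/`Action:` schema and the `parse_agent_step` helper are assumptions for illustration, not GUI-Libra's actual interface): the model first writes a short reasoning trace, then emits a structured click with pixel coordinates.

```python
import json
import re

# Hypothetical agent output combining careful reasoning with a precise action.
raw_output = """\
Thought: The "Submit" button is in the bottom-right corner of the form.
Action: {"type": "click", "x": 1184, "y": 756}
"""

def parse_agent_step(text: str) -> tuple[str, dict]:
    """Split a model response into its reasoning trace and its click action."""
    thought = re.search(r"Thought:\s*(.*)", text).group(1)
    action = json.loads(re.search(r"Action:\s*(\{.*\})", text).group(1))
    return thought, action

thought, action = parse_agent_step(raw_output)
assert action["type"] == "click"  # the executor would now click at (x, y)
```

Training on both halves at once is what lets one model think carefully *and* ground its clicks precisely.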
This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.
The paper fixes a common problem in AI: models can read pictures and text well, but they often mess up the logic that connects them.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
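A toy sketch of the silent-thinking mechanism, with invented names (`latent_tokens`, `build_inputs`) rather than LT-Tuning's real implementation: a handful of learned latent embeddings are spliced in after the question, giving the model extra slots to compute over without any visible reasoning text.

```python
import torch
import torch.nn as nn

d_model, n_latent = 256, 8

# Learned "thought" embeddings that never correspond to real vocabulary tokens.
latent_tokens = nn.Parameter(torch.randn(n_latent, d_model) * 0.02)

def build_inputs(question_emb: torch.Tensor) -> torch.Tensor:
    """Splice latent thought slots after the question embeddings.

    question_emb: (batch, seq_len, d_model) token embeddings of the prompt.
    Returns (batch, seq_len + n_latent, d_model); the model attends to the
    latent slots while decoding the answer, so the "thinking" stays hidden.
    """
    batch = question_emb.size(0)
    thoughts = latent_tokens.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([question_emb, thoughts], dim=1)

inputs = build_inputs(torch.randn(4, 32, d_model))  # 4 prompts, 32 tokens each
print(inputs.shape)                                 # torch.Size([4, 40, 256])
```

During fine-tuning the latent embeddings are trained like any other parameters, so the model learns to use those hidden slots in place of written-out steps.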
The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?
SpatiaLab is a new test that checks whether vision-language models (VLMs) can handle real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers, using smart models that stand in for the computer and its test suite.
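A bare-bones sketch of that idea (the `SimulatedEnv` class and its `llm` callable are assumptions for illustration, not SWE-World's API): rather than spinning up a container, an environment object asks a model to predict what the shell and the tests would have printed.

```python
class SimulatedEnv:
    """Toy stand-in for a container: an LLM predicts command output.

    `llm` is any callable mapping a prompt string to a completion string;
    this interface is invented here to show the control flow.
    """

    def __init__(self, llm, repo_summary: str):
        self.llm = llm
        self.history = [f"Repository state: {repo_summary}"]

    def run(self, command: str) -> str:
        # Ask the model to role-play the machine, conditioned on everything
        # that has "happened" in this session so far.
        prompt = "\n".join(self.history) + f"\n$ {command}\nPredicted output:"
        output = self.llm(prompt)
        self.history.append(f"$ {command}\n{output}")
        return output

# Usage with a trivial fake model, just to show the loop an agent trains against:
env = SimulatedEnv(llm=lambda p: "2 passed, 1 failed",
                   repo_summary="flask app, 3 tests")
print(env.run("pytest -q"))  # the agent learns from this predicted feedback
```

Because the "computer" is just a model call, thousands of practice episodes can run in parallel without any container overhead.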