🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers16

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#spatial reasoning

World Craft: Agentic Framework to Create Visualizable Worlds via Text

Intermediate
Jianwen Sun, Yukang Feng et al.Jan 14arXiv

World Craft lets anyone turn a short text description into a playable, visual game world without coding.

#AI Town#multi-agent framework#layout generation

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Intermediate
Rang Li, Lei Li et al.Dec 19arXiv

Visual grounding is when an AI finds the exact thing in a picture that a sentence is talking about, and this paper shows today’s big vision-language AIs are not as good at it as we thought.

#visual grounding#multimodal large language models#benchmark

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Intermediate
Yuxin Wang, Lei Ke et al.Dec 18arXiv

This paper teaches a vision-language model to first find objects in real 3D space (not just 2D pictures) and then reason about where things are.

#3D grounding#vision-language models#spatial reasoning

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Beginner
Zefeng Zhang, Xiangzhao Hao et al.Dec 4arXiv

COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.

#COOPER#multimodal large language model#unified model
12