How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)


World Guidance: World Modeling in Condition Space for Action Generation

Intermediate
Yue Su, Sijin Chen et al. · Feb 25 · arXiv

WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.

#Vision-Language-Action #world modeling #condition space

Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

Intermediate
Yuxuan Yang, Zhonghao Yan et al. · Feb 23 · arXiv

Hepato-LLaVA is a specialized multimodal AI that reads giant microscope images of liver tissue and answers medical questions about cancer.

#Hepato-LLaVA #Hepatocellular Carcinoma #Whole Slide Images

Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation

Intermediate
Hai Zhang, Siqi Liang et al. · Feb 5 · arXiv

Robots usually need detailed, step-by-step directions, but real life often gives only short, simple goals like "find the red bench"; this paper uses sparse video generation to help robots navigate beyond what they can currently see.

#Beyond-the-View Navigation #Sparse Video Generation #Vision-Language Navigation

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Intermediate
Shengchao Zhou, Yuxin Chen et al. · Dec 23 · arXiv

The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).

#dynamic spatial reasoning #vision-language models #4D understanding