How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

#high-resolution perception

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Beginner
You Li, Chi Chen et al. | Feb 26 | arXiv

The paper asks a simple question: do the model’s invisible “imagination tokens” actually help it reason about images?

#multimodal large language model #visual reasoning #latent visual reasoning

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Intermediate
Jintao Tong, Shilin Yan et al. | Feb 5 | arXiv

SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.

#SwimBird #switchable reasoning #hybrid autoregressive

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Intermediate
Yong Xien Chng, Tao Hu et al. | Dec 30 | arXiv

SenseNova-MARS is a vision-language model that can think step-by-step and use three tools—text search, image search, and image cropping—during its reasoning.

#multimodal agent #vision-language model #reinforcement learning

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

Intermediate
Junjie Bai, Yu-Wei Chao et al. | Dec 10 | arXiv

This paper shows how to make home-helper robots better at long, multi-step chores by training on diverse tasks and then fine-tuning the model on its own best attempts.

#Vision-Language-Action #long-horizon manipulation #rejection sampling fine-tuning