How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)


Quantifying the Gap between Understanding and Generation within Unified Multimodal Models

Intermediate
Chenlong Wang, Yuhang Chen et al. · Feb 2 · arXiv

This paper shows that many AI models that can both interpret and generate images are not truly unified inside: they often understand a concept well yet fail to generate it, or the other way around. A toy version of the gap score appears after the tags below.

#Unified Multimodal Models · #GAPEVAL · #Gap Score
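A minimal sketch of the idea behind a gap score, assuming it is simply the difference between understanding accuracy and generation accuracy on the same concepts; `gap_score` and the probe data below are illustrative assumptions, not GAPEVAL's actual protocol.

```python
def gap_score(understanding_correct: list[bool],
              generation_correct: list[bool]) -> float:
    """Difference between understanding and generation accuracy on the
    same concepts (positive = understands more than it can generate)."""
    u_acc = sum(understanding_correct) / len(understanding_correct)
    g_acc = sum(generation_correct) / len(generation_correct)
    return u_acc - g_acc

# Toy example: the model answers 9/10 questions about a concept
# correctly but renders it correctly in only 6/10 generated images.
understanding = [True] * 9 + [False]
generation = [True] * 6 + [False] * 4
print(f"gap score: {gap_score(understanding, generation):+.2f}")  # +0.30
```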

LLM-in-Sandbox Elicits General Agentic Intelligence

Beginner
Daixuan Cheng, Shaohan Huang et al. · Jan 22 · arXiv

This paper shows that giving an AI a safe, tiny virtual computer (a sandbox) lets it solve many kinds of problems better, not just coding ones; a minimal sketch of the idea follows the tags.

#LLM-in-Sandbox · #Agentic Intelligence · #Reinforcement Learning
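A minimal sketch of the sandbox idea, assuming a subprocess that runs model-written Python with a timeout; `run_in_sandbox` is an illustrative stand-in, not the paper's executor, and a real deployment would need far stronger isolation.

```python
import os
import subprocess
import tempfile

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """Execute model-written Python in a separate process and return
    what it printed (or the error it raised)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], capture_output=True, text=True,
            timeout=timeout_s,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    except subprocess.TimeoutExpired:
        return "error: timed out"
    finally:
        os.unlink(path)

# An agent loop would feed this observation back to the model, letting
# it use code execution as a general problem-solving tool.
print(run_in_sandbox("print(sum(range(101)))"))  # 5050
```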

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate
Yu Xu, Hongbin Yan et al. · Jan 12 · arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models with explicit task hints, so the right “mini-experts” handle the right parts of an image job (see the toy gating layer below).

#Task-Aware Gating · #Mixture-of-Experts · #Unified Image Generation
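A toy gating layer showing what task-aware routing could look like, assuming the task hint arrives as a learned embedding added to the router input; `TaskAwareGate` and its shapes are assumptions for illustration, not TAG-MoE's published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAwareGate(nn.Module):
    def __init__(self, dim: int, num_tasks: int, num_experts: int,
                 top_k: int = 2):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, dim)  # the "task hint"
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # The task embedding conditions the router, so expert choice
        # depends on the task, not only on the token content.
        logits = self.router(x + self.task_embed(task_id)[:, None, :])
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).float()
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

layer = TaskAwareGate(dim=8, num_tasks=3, num_experts=4)
tokens = torch.randn(2, 4, 8)              # 2 sequences of 4 tokens
y = layer(tokens, torch.tensor([0, 1]))    # one task id per sequence
print(y.shape)  # torch.Size([2, 4, 8])
```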

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate
Yuanyang Yin, Yufan Deng et al. · Jan 12 · arXiv

Image-to-Video diffusion models often keep the source picture looking right but ignore parts of the text instructions; Focal Guidance strengthens text control in the semantic-weak layers where it is lost (a toy illustration follows the tags).

#Image-to-Video generation · #Diffusion Transformer · #Controllability
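A toy illustration of layer-selective guidance, assuming the guidance is a CFG-style extrapolation applied more strongly in chosen layers; which layers count as semantic-weak and how the paper actually applies its guidance are not shown here, so treat `focal_blend` as a hypothetical.

```python
import torch

def focal_blend(hidden_cond: torch.Tensor, hidden_uncond: torch.Tensor,
                layer_idx: int, focal_layers: set[int],
                boost: float = 2.0) -> torch.Tensor:
    """Push hidden states toward the text-conditioned branch more
    aggressively in the selected ("semantic-weak") layers, and leave
    the other layers at the ordinary scale."""
    scale = boost if layer_idx in focal_layers else 1.0
    return hidden_uncond + scale * (hidden_cond - hidden_uncond)

h_cond, h_uncond = torch.randn(1, 4, 8), torch.randn(1, 4, 8)
out = focal_blend(h_cond, h_uncond, layer_idx=3, focal_layers={3, 5})
print(out.shape)  # torch.Size([1, 4, 8])
```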

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Beginner
Ruiyan Han, Zhen Fang et al. · Jan 6 · arXiv

This paper fixes a common problem in multimodal AI: models can understand pictures and words well but stumble when asked to create matching images. UniCorn closes the loop by turning the model's own judgments into training signal (a toy version of that loop follows the tags).

#Unified Multimodal Models · #Self-Generated Supervision · #Conduction Aphasia
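A toy version of a self-generated supervision loop, assuming the model's understanding branch can score its own generations; `ToyUnifiedModel` and `similarity` are stand-ins (the "images" here are just strings), not UniCorn's actual pipeline.

```python
import difflib

class ToyUnifiedModel:
    """Stand-in for a unified multimodal model with a generation branch
    and an understanding branch."""
    def generate(self, prompt: str) -> str:
        return f"image-of:{prompt}"              # toy generation
    def caption(self, image: str) -> str:
        return image.removeprefix("image-of:")   # toy understanding

def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()

def self_supervision_round(model, prompts, threshold: float = 0.8):
    """Keep only (prompt, image) pairs whose self-caption matches the
    prompt; these become training data for the generation branch."""
    kept = []
    for prompt in prompts:
        image = model.generate(prompt)
        if similarity(prompt, model.caption(image)) >= threshold:
            kept.append((prompt, image))
    return kept

data = self_supervision_round(ToyUnifiedModel(), ["a red cube", "two cats"])
print(len(data), "self-supervised pairs")  # 2 self-supervised pairs
```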

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Intermediate
Tianshuai Hu, Xiaolu Liu et al. · Dec 18 · arXiv

Traditional self-driving pipelines used separate boxes for seeing, thinking, and acting, but tiny mistakes in early boxes could snowball into big problems later (a back-of-the-envelope example follows the tags).

#Vision-Language-Action · #End-to-End Autonomous Driving · #Dual-System VLA
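A back-of-the-envelope illustration of the snowballing problem: even if independent modules each succeed most of the time, the product erodes quickly. The 95% figures are made-up numbers for illustration, not from the survey.

```python
# If perception, prediction, and planning each succeed 95% of the time
# independently, the full modular stack succeeds noticeably less often.
stage_accuracy = {"perception": 0.95, "prediction": 0.95, "planning": 0.95}

pipeline_success = 1.0
for stage, acc in stage_accuracy.items():
    pipeline_success *= acc

print(f"end-to-end success: {pipeline_success:.3f}")  # 0.857
```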