🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers6

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#zero-shot generalization

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Intermediate
Chengzu Li, Zanyi Wang et al.Jan 28arXiv

This paper shows that making short videos can help AI plan and reason in pictures better than writing out steps in text.

#video reasoning#visual planning#test-time scaling

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Intermediate
Sangbeom Lim, Seoung Wug Oh et al.Jan 20arXiv

VideoMaMa is a model that turns simple black-and-white object masks into soft, precise cutouts (alpha mattes) for every frame of a video.

#video matting#alpha matte#binary segmentation mask

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Intermediate
Yu Wang, Yi Wang et al.Jan 15arXiv

Cities are full of places defined by people, like schools and parks, which are hard to see clearly from space without extra clues.

#socio-semantic segmentation#vision-language model#reinforcement learning

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Intermediate
Shaocong Xu, Songlin Wei et al.Dec 29arXiv

Transparent and shiny objects confuse normal depth cameras, but video diffusion models already learned how light bends and reflects through them.

#video diffusion model#transparent object depth#normal estimation

Act2Goal: From World Model To General Goal-conditioned Policy

Intermediate
Pengfei Zhou, Liliang Chen et al.Dec 29arXiv

Robots often get confused on long, multi-step tasks when they only see the final goal image and try to guess the next move directly.

#goal-conditioned policy#visual world model#multi-scale temporal hashing

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Intermediate
Xin Lin, Meixi Song et al.Dec 18arXiv

This paper builds a foundation model called DAP that estimates real-world (metric) depth from any 360° panorama, indoors or outdoors.

#panoramic depth estimation#metric depth#360-degree vision