How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (127)

#reinforcement learning

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Intermediate
Yu Wang, Yi Wang et al. · Jan 15 · arXiv

Cities are full of places defined by people, like schools and parks, which are hard to see clearly from space without extra clues.

#socio-semantic segmentation #vision-language model #reinforcement learning

STEP3-VL-10B Technical Report

Beginner
Ailin Huang, Chengyuan Yao et al. · Jan 14 · arXiv

STEP3-VL-10B is a small (10 billion parameters) open multimodal model that sees images and reads text, yet scores like much larger models.

#multimodal foundation model #unified pre-training #perception encoder

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

Intermediate
Lijun Liu, Linwei Chen et al. · Jan 14 · arXiv

SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by sending the most useful visual information to the language brain, instead of just getting bigger.

#dermatology AI #vision-language model #Dynamic Visual Encoding

TranslateGemma Technical Report

Intermediate
Mara Finkelstein, Isaac Caswell et al. · Jan 13 · arXiv

TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.

#machine translation #TranslateGemma #Gemma 3

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Intermediate
Youwei Liu, Jian Wang et al. · Jan 13 · arXiv

Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.

#Imagine-then-Plan #world models #adaptive lookahead

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Intermediate
Yao Tang, Li Dong et al. · Jan 13 · arXiv

The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.

#Multiplex Thinking #chain-of-thought #continuous token

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory

Intermediate
Shaoan Wang, Yuanfei Luo et al. · Jan 13 · arXiv

VLingNav is a robot navigation system that sees, reads instructions, and acts, while deciding when to think hard and when to just move.

#Vision-Language-Action #embodied navigation #adaptive chain-of-thought

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

Intermediate
Lei Zhang, Mouxiang Chen et al. · Jan 12 · arXiv

MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.

#agent orchestration #distributed systems #event-driven architecture

The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Beginner
Weihao Xuan, Qingcheng Zeng et al. · Jan 12 · arXiv

This paper studies how AI agents that use tools talk about how sure they are and finds a split: some tools make them too sure, others help them be honest.

#LLM agents #calibration #overconfidence

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Intermediate
Seongyun Lee, Yongrae Jo et al. · Jan 12 · arXiv

The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused—sometimes losing up to 80% of its accuracy.

#NoisyBench #Rationale-Aware Reward #RARE

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate
Zhenrui Yue, Kartikeya Upasani et al. · Jan 11 · arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero #self-evolution #proposer-solver

Solar Open Technical Report

Intermediate
Sungrae Park, Sanghoon Kim et al. · Jan 11 · arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open #Mixture-of-Experts #bilingual LLM