How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)


When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate
Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it merely help them choose better among answers they already know?

#medical vision-language models #reinforcement learning #supervised fine-tuning

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Intermediate
Changwoo Baek, Jouwon Song et al. · Mar 1 · arXiv

Big picture: Vision-language models look at hundreds of image pieces (tokens), which makes them slow and prone to made-up mistakes called hallucinations.

#visual token pruning #attention-based pruning #diversity-based pruning
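The attention-based half of this idea can be sketched as a top-k filter over visual tokens. This is a generic illustration, not AgilePruner's actual algorithm: the function name, the `keep_ratio` parameter, and the scoring are my assumptions, and the paper additionally studies a diversity-based criterion alongside attention.

```python
import numpy as np

def prune_visual_tokens(tokens, attn_scores, keep_ratio=0.25):
    """Keep only the visual tokens that receive the most attention.

    tokens: (N, D) array of visual token embeddings.
    attn_scores: (N,) attention each token receives from the text query.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(attn_scores)[-n_keep:]   # indices of top-scoring tokens
    return tokens[np.sort(keep_idx)]               # preserve original spatial order

tokens = np.random.randn(576, 1024)                # e.g. a 24x24 ViT patch grid
scores = np.random.rand(576)
pruned = prune_visual_tokens(tokens, scores)
print(pruned.shape)                                # (144, 1024)
```

Dropping 75% of tokens this way shrinks the sequence the language model must process, which is where the speedup comes from.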

Learn Hard Problems During RL with Reference Guided Fine-tuning

Intermediate
Yangzhen Wu, Shanda Li et al. · Mar 1 · arXiv

ReGFT is a simple pre-RL step that shows the model partial human hints, then makes it solve problems in its own words, creating correct, model-style solutions for hard questions.

#Reference-Guided Fine-Tuning #ReGFT #ReFT

ArtLLM: Generating Articulated Assets via 3D LLM

Intermediate
Penghao Wang, Siyuan Xie et al. · Mar 1 · arXiv

ArtLLM is a 3D large language model that turns a rough 3D shape (from an image, text, or mesh) into a complete, movable 3D object with parts and joints.

#Articulated 3D objects #3D large language model #Point cloud understanding

Unified Vision-Language Modeling via Concept Space Alignment

Intermediate
Yifu Qiu, Paul-Ambroise Duquenne et al. · Mar 1 · arXiv

The paper builds v-Sonar, a bridge that maps images and videos into Sonar, the same meaning-space already used for text, so all modalities “speak” the same language.

#v-Sonar #OmniSONAR #concept space alignment

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Intermediate
Zebin You, Xiaolu Zhang et al. · Mar 1 · arXiv

LLaDA-o is a new AI that understands pictures and text and can also make images, all in one model.

#LLaDA-o #Mixture of Diffusion #masked diffusion models

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

Intermediate
Yang Cao, Feize Wu et al. · Mar 1 · arXiv

The paper introduces VGGT-Det, a new way to detect 3D objects indoors from many photos without needing sensor-provided camera poses or depth maps.

#Sensor-Geometry-Free 3D detection #Indoor multi-view detection #VGGT

Qwen3-Coder-Next Technical Report

Intermediate
Ruisheng Cao, Mouxiang Chen et al. · Feb 28 · arXiv

Qwen3-Coder-Next is an open-weight coding model that uses only 3B of its 80B total parameters at a time, so it runs fast while still being smart.

#Qwen3-Coder-Next #agentic training #verifiable coding tasks
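Activating only a small fraction of total parameters per token is the mixture-of-experts pattern. A minimal sketch of top-k expert routing, with all names, shapes, and the softmax gating being generic illustration rather than Qwen3-Coder-Next's actual architecture:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route the input through only its top-k experts.

    x: (D,) input vector; experts: list of (D, D) expert weight matrices;
    router_w: (n_experts, D) router weights.
    """
    logits = router_w @ x
    top = np.argsort(logits)[-top_k:]                      # chosen expert indices
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only the selected experts' weights are ever touched:
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
experts = [rng.standard_normal((16, 16)) for _ in range(8)]
router = rng.standard_normal((8, 16))
y = moe_forward(rng.standard_normal(16), experts, router)
print(y.shape)  # (16,)
```

With 8 experts and top_k=2, only a quarter of the expert parameters participate in any one forward pass, which is how a model can hold 80B parameters yet pay the compute cost of roughly 3B per token.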

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

Intermediate
Xingguo Xu, Zhanyu Liu et al. · Feb 28 · arXiv

STMI is a new way to recognize the same object across different kinds of cameras (color, night-vision, and thermal) without throwing away useful details.

#multi-modal re-identification #RGB-NIR-TIR fusion #segmentation-guided attention

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Intermediate
Fanqi Kong, Jiayi Zhang et al. · Feb 28 · arXiv

Many real-life requests to AI helpers are vague, so agents must ask good questions before acting.

#Information-driven RL #Turn-level credit assignment #Counterfactual masking

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Intermediate
Yinghao Ma, Haiwen Xia et al. · Feb 28 · arXiv

Modern music AIs can follow text, lyrics, and even example audio, but judges that score these songs have not kept up.

#music reward model #compositional multimodal instruction #text-to-music evaluation

Spectral Condition for $μ$P under Width-Depth Scaling

Intermediate
Chenyu Zheng, Rongzhen Wang et al. · Feb 28 · arXiv

Big AI models keep getting wider (more neurons per layer) and deeper (more layers), which often makes training unstable and hyperparameters hard to reuse.

#maximal update parametrization #μP #spectral condition
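For background, the spectral condition referenced in the tags prescribes, under width-only scaling, that each layer's weights and updates keep a spectral norm on the order of the square root of fan-out over fan-in. A sketch assuming the standard width-only formulation (this paper's contribution is extending the analysis to joint width-depth scaling):

```latex
\|W_\ell\|_{2} = \Theta\!\left(\sqrt{\frac{n_{\mathrm{out}}}{n_{\mathrm{in}}}}\right),
\qquad
\|\Delta W_\ell\|_{2} = \Theta\!\left(\sqrt{\frac{n_{\mathrm{out}}}{n_{\mathrm{in}}}}\right),
```

where $n_{\mathrm{in}}$ and $n_{\mathrm{out}}$ are the fan-in and fan-out of layer $\ell$. When these scalings hold, features and their updates stay the same size as width grows, which is what lets hyperparameters tuned on a small model transfer to a wider one (the μP property).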