How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)

When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate
Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them choose better among answers they already know?

#medical vision-language models · #reinforcement learning · #supervised fine-tuning

Spectral Attention Steering for Prompt Highlighting

Beginner
Weixian Waylon Li, Yuchen Niu et al. · Mar 1 · arXiv

This paper introduces a new way to make a language model pay extra attention to the exact words you highlight in a prompt.

#attention steering · #prompt highlighting · #key embeddings

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Intermediate
Changwoo Baek, Jouwon Song et al. · Mar 1 · arXiv

Big picture: Vision-language models look at hundreds of image pieces (tokens), which makes them slow and sometimes prone to mistakes called hallucinations.

#visual token pruning · #attention-based pruning · #diversity-based pruning

Learn Hard Problems During RL with Reference Guided Fine-tuning

Intermediate
Yangzhen Wu, Shanda Li et al. · Mar 1 · arXiv

ReGFT is a simple pre-RL step that shows the model partial human hints, then makes it solve problems in its own words, creating correct, model-style solutions for hard questions.

#Reference-Guided Fine-Tuning · #ReGFT · #ReFT

ArtLLM: Generating Articulated Assets via 3D LLM

Intermediate
Penghao Wang, Siyuan Xie et al. · Mar 1 · arXiv

ArtLLM is a 3D large language model that turns a rough 3D shape (from an image, text, or mesh) into a complete, movable 3D object with parts and joints.

#Articulated 3D objects · #3D large language model · #Point cloud understanding

Unified Vision-Language Modeling via Concept Space Alignment

Intermediate
Yifu Qiu, Paul-Ambroise Duquenne et al. · Mar 1 · arXiv

The paper builds v-Sonar, a bridge that maps images and videos into Sonar, the same meaning space used for text, so that all modalities “speak” the same language.

#v-Sonar · #OmniSONAR · #concept space alignment

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Intermediate
Zebin You, Xiaolu Zhang et al. · Mar 1 · arXiv

LLaDA-o is a new AI system that understands pictures and text and can also generate images, all in one model.

#LLaDA-o · #Mixture of Diffusion · #masked diffusion models

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

Intermediate
Yang Cao, Feize Wu et al. · Mar 1 · arXiv

The paper introduces VGGT-Det, a new way to detect 3D objects indoors from many photos without needing sensor-provided camera poses or depth maps.

#Sensor-Geometry-Free 3D detection · #Indoor multi-view detection · #VGGT

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Beginner
Xinyu Zhu, Yihao Feng et al. · Mar 1 · arXiv

CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.

#CHIMERA dataset · #synthetic data generation · #chain-of-thought

Qwen3-Coder-Next Technical Report

Intermediate
Ruisheng Cao, Mouxiang Chen et al. · Feb 28 · arXiv

Qwen3-Coder-Next is an open-weight coding model that uses only 3B of its 80B total parameters at a time, so it runs fast while still being smart.

#Qwen3-Coder-Next · #agentic training · #verifiable coding tasks

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

Intermediate
Xingguo Xu, Zhanyu Liu et al. · Feb 28 · arXiv

STMI is a new way to recognize the same object across different kinds of cameras (color, night-vision, and thermal) without throwing away useful details.

#multi-modal re-identification · #RGB-NIR-TIR fusion · #segmentation-guided attention

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Intermediate
Fanqi Kong, Jiayi Zhang et al. · Feb 28 · arXiv

Many real-life requests to AI helpers are vague, so agents must ask good questions before acting.

#Information-driven RL · #Turn-level credit assignment · #Counterfactual masking
