How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1061)

Agentic Code Reasoning

Intermediate
Shubham Ugare, Satish Chandra · Mar 2 · arXiv

The paper teaches AI agents to understand big codebases without running the code by following a strict, step-by-step thinking template called semi-formal reasoning.

#agentic code reasoning #semi-formal reasoning #patch equivalence

FireRed-OCR Technical Report

Intermediate
Hao Wu, Haoran Lou et al. · Mar 2 · arXiv

FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.

#FireRed-OCR #structural hallucination #document parsing

Surgical Post-Training: Cutting Errors, Keeping Knowledge

Intermediate
Wenye Lin, Kai Han · Mar 2 · arXiv

The paper introduces SPOT, a training recipe that fixes an AI model’s mistakes with tiny edits while keeping what it already knows well.

#Surgical Post-Training #SPOT #DPO

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate
Qiyuan Zhang, Yufei Wang et al. · Mar 2 · arXiv

Longer explanations are not always better; the shape of thinking matters.

#Generative Reward Models #Chain-of-Thought #Breadth-CoT

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Intermediate
Qiyuan Zhang, Junyi Zhou et al. · Mar 2 · arXiv

RubricBench is a new benchmark that checks whether AI judges can use clear, checklist-style rules (rubrics) the way humans do.

#RubricBench #rubric-guided evaluation #reward models

LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

Intermediate
Jiajie Jin, Yanzhao Zhang et al. · Mar 2 · arXiv

LaSER teaches a fast search model to “think” quietly inside its hidden space, so it gets the benefits of step-by-step reasoning without writing those steps out as text.

#dense retrieval #chain-of-thought #latent reasoning

When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate
Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or just help them pick better from answers they already know?

#medical vision-language models #reinforcement learning #supervised fine-tuning

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Intermediate
Changwoo Baek, Jouwon Song et al. · Mar 1 · arXiv

Big picture: Vision-language models process hundreds of image pieces (tokens), which makes them slow and sometimes prone to mistakes called hallucinations.

#visual token pruning #attention-based pruning #diversity-based pruning

Learn Hard Problems During RL with Reference Guided Fine-tuning

Intermediate
Yangzhen Wu, Shanda Li et al. · Mar 1 · arXiv

ReGFT is a simple pre-RL step that shows the model partial human hints, then makes it solve problems in its own words, creating correct, model-style solutions for hard questions.

#Reference-Guided Fine-Tuning #ReGFT #ReFT

ArtLLM: Generating Articulated Assets via 3D LLM

Intermediate
Penghao Wang, Siyuan Xie et al. · Mar 1 · arXiv

ArtLLM is a 3D large language model that turns a rough 3D shape (from an image, text, or mesh) into a complete, movable 3D object with parts and joints.

#Articulated 3D objects #3D large language model #Point cloud understanding

Unified Vision-Language Modeling via Concept Space Alignment

Intermediate
Yifu Qiu, Paul-Ambroise Duquenne et al. · Mar 1 · arXiv

The paper builds v-Sonar, a bridge that maps images and videos into Sonar, the same meaning-space used for text, so all modalities “speak” the same language.

#v-Sonar #OmniSONAR #concept space alignment

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Intermediate
Zebin You, Xiaolu Zhang et al. · Mar 1 · arXiv

LLaDA-o is a new AI that understands pictures and text and can also make images, all in one model.

#LLaDA-o #Mixture of Diffusion #masked diffusion models
