How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (915)


D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Intermediate
Bowen Xu, Shaoyu Wu et al. · Feb 2 · arXiv

This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.

#task decomposition #tool use #large reasoning models

LoopViT: Scaling Visual ARC with Looped Transformers

Intermediate
Wen-Jie Shu, Xuerui Qiu et al. · Feb 2 · arXiv

LoopViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.

#ARC-AGI #visual reasoning #Looped Transformer

Quantifying the Gap between Understanding and Generation within Unified Multimodal Models

Intermediate
Chenlong Wang, Yuhang Chen et al. · Feb 2 · arXiv

This paper shows that many AI models that both understand and generate images are not truly unified inside: they often understand well but fail to generate (or the other way around).

#Unified Multimodal Models #GAPEVAL #Gap Score

An Empirical Study of World Model Quantization

Intermediate
Zhongqian Fu, Tianyi Zhao et al. · Feb 2 · arXiv

World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.

#world models #post-training quantization #DINO-WM

No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs

Intermediate
Liyan Xu, Mo Yu et al. · Feb 2 · arXiv

Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.

#chain-of-thought #latent planning horizon #Tele-Lens

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Intermediate
FSVideo Team, Qingyu Chen et al. · Feb 2 · arXiv

FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.

#FSVideo #image-to-video #video diffusion transformer

Closing the Loop: Universal Repository Representation with RPG-Encoder

Intermediate
Jane Luo, Chengyu Yin et al. · Feb 2 · arXiv

The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).

#Repository Planning Graph #RPG-Encoder #semantic lifting

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Intermediate
Mohan Jiang, Dayuan Fu et al. · Feb 2 · arXiv

Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.

#long-horizon agency #pull request chains #software evolution

WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

Beginner
Pengyu Wang, Benfeng Xu et al. · Feb 2 · arXiv

WildGraphBench is a new test that checks how well GraphRAG systems find and combine facts from messy, real-world web pages.

#GraphRAG #Retrieval-Augmented Generation #Wikipedia references

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

Intermediate
Wei Liu, Peijie Yu et al. · Feb 2 · arXiv

The paper asks AI to hunt for insights in big databases without being told exact questions, like a curious scientist instead of a test-taker.

#Deep Data Research #Agentic LLMs #Investigatory Intelligence

DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers

Intermediate
Ionut-Vlad Modoranu, Philip Zmushko et al. · Feb 2 · arXiv

Shampoo is a smart optimizer that can train models better than AdamW, but it has been slow because it must compute tricky inverse matrix roots.

#Shampoo optimizer #second-order optimization #inverse matrix roots

Enhancing Multi-Image Understanding through Delimiter Token Scaling

Intermediate
Minyoung Lee, Yeji Park et al. · Feb 2 · arXiv

Large Vision-Language Models (LVLMs) are great with one picture but get confused when you give them several, often mixing details from different images.

#Large Vision-Language Models #Multi-image understanding #Delimiter tokens