How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (906)

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Intermediate
Mohan Jiang, Dayuan Fu et al. · Feb 2 · arXiv

Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.

#long-horizon agency #pull request chains #software evolution

WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

Beginner
Pengyu Wang, Benfeng Xu et al. · Feb 2 · arXiv

WildGraphBench is a new test that checks how well GraphRAG systems find and combine facts from messy, real-world web pages.

#GraphRAG #Retrieval-Augmented Generation #Wikipedia references

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

Intermediate
Wei Liu, Peijie Yu et al. · Feb 2 · arXiv

The paper asks AI to hunt for insights in big databases without being told exact questions, like a curious scientist instead of a test-taker.

#Deep Data Research #Agentic LLMs #Investigatory Intelligence

DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers

Intermediate
Ionut-Vlad Modoranu, Philip Zmushko et al. · Feb 2 · arXiv

Shampoo is a smart optimizer that can train models better than AdamW, but it has been slow in practice because it must compute costly inverse matrix roots.

#Shampoo optimizer #second-order optimization #inverse matrix roots
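The inverse matrix roots this summary mentions can be illustrated with a minimal NumPy sketch. This is not the paper's batched solver — just the textbook eigendecomposition route for computing an inverse fourth root of a symmetric positive-definite preconditioner (the `inverse_matrix_root` function and the example matrix are illustrative, not from the paper):

```python
import numpy as np

def inverse_matrix_root(mat, root=4, eps=1e-6):
    """Compute mat^(-1/root) for a symmetric PSD matrix via eigendecomposition.

    Shampoo-style optimizers apply inverse roots like this to their
    preconditioner matrices; this step is the bottleneck the paper targets.
    """
    sym = (mat + mat.T) / 2                 # guard against float asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.maximum(eigvals, eps)      # clamp tiny/negative eigenvalues
    return eigvecs @ np.diag(eigvals ** (-1.0 / root)) @ eigvecs.T

# Example: build a symmetric positive-definite "preconditioner" and invert it.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
precond = a @ a.T + np.eye(4)
inv_root = inverse_matrix_root(precond)

# Sanity check: (G^(-1/4))^4 @ G should be close to the identity.
approx_identity = np.linalg.matrix_power(inv_root, 4) @ precond
print(np.allclose(approx_identity, np.eye(4), atol=1e-5))
```

The eigendecomposition costs O(n³) per preconditioner block, which is why a faster batched inverse-root solver matters at scale.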

Enhancing Multi-Image Understanding through Delimiter Token Scaling

Intermediate
Minyoung Lee, Yeji Park et al. · Feb 2 · arXiv

Large Vision-Language Models (LVLMs) are great with one picture but get confused when you give them several, often mixing details from different images.

#Large Vision-Language Models #Multi-image understanding #Delimiter tokens

How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

Intermediate
Huanyu Zhang, Xuehai Bai et al. · Feb 2 · arXiv

VIBE is a new test that checks how well image-editing AI models follow visual instructions like arrows, boxes, and sketches—not just text.

#visual instruction following #image editing benchmark #deictic grounding

Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention

Intermediate
Dvir Samuel, Issar Tzachor et al. · Feb 2 · arXiv

The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.

#autoregressive video diffusion #KV cache compression #sparse attention

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Intermediate
Yuling Shi, Chaoxiang Xie et al. · Feb 2 · arXiv

The paper tests a simple but bold idea: show code to AI as pictures instead of plain text, then shrink those pictures to save tokens and time.

#multimodal language models #code as images #visual code understanding

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation

Intermediate
Jun He, Junyan Ye et al. · Feb 2 · arXiv

Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.

#agentic image generation #multimodal reasoning #retrieval-augmented generation

ObjEmbed: Towards Universal Multimodal Object Embeddings

Intermediate
Shenghao Fu, Yukun Su et al. · Feb 2 · arXiv

ObjEmbed teaches an AI to understand not just whole pictures, but each object inside them, and to link those objects to the right words.

#object embeddings #IoU embedding #visual grounding

TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios

Intermediate
Yuanzhe Shen, Zisu Huang et al. · Feb 2 · arXiv

TRIP-Bench is a new test that checks if AI travel agents can plan real trips over many chat turns while following strict rules and changing user requests.

#TRIP-Bench #long-horizon agents #multi-turn interaction

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Intermediate
Zhongyuan Peng, Caijun Xu et al. · Feb 2 · arXiv

CoDiQ is a recipe for deliberately generating hard-but-solvable math and coding questions, with control over how difficult they get during generation.

#controllable difficulty #test-time scaling #question generation