How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)


MemoBrain: Executive Memory as an Agentic Brain for Reasoning

Intermediate
Hongjin Qian, Zhao Cao et al. · Jan 12 · arXiv

MemoBrain is like a helpful co-pilot for AI that keeps important thoughts neat and ready so the main thinker (the agent) doesn’t get overwhelmed.

#Executive memory · #Tool-augmented agents · #Context budget

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Intermediate
Kewei Zhang, Ye Huang et al. · Jan 12 · arXiv

Transformers are powerful but slow on long sequences because regular self-attention compares every token with every other token, so its cost grows quadratically with sequence length.

#Multi-Head Linear Attention · #Linear Attention · #Self-Attention
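To make the cost difference concrete, here is a minimal pure-Python sketch. It contrasts standard softmax attention, which scores every query against every key, with generic kernelized linear attention, which sums keys and values once and reuses that summary for every query. The feature map `phi` (elu(x) + 1), the function names, and the random toy inputs are illustrative assumptions; this is not MHLA's token-level multi-head method, which the summary does not describe.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_attention(Q, K, V):
    """Standard attention: every query is scored against every key (n * n scores)."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total for c in range(len(V[0]))])
    return out

def phi(x):
    """Assumed feature map elu(x) + 1; it keeps every feature strictly positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention_pairwise(Q, K, V):
    """Linear attention written pairwise: still builds n * n similarity terms."""
    fK = [phi(k) for k in K]
    out = []
    for q in Q:
        fq = phi(q)
        w = [dot(fq, fk) for fk in fK]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total for c in range(len(V[0]))])
    return out

def linear_attention_factored(Q, K, V):
    """Same result, but keys and values are summarised once: O(n * d * d_v)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d                        # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)  # positive because phi(.) > 0
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom for c in range(dv)])
    return out

# Hypothetical toy inputs: 6 tokens, 4 dimensions each.
random.seed(0)
n, d = 6, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

pairwise = linear_attention_pairwise(Q, K, V)
factored = linear_attention_factored(Q, K, V)
```

Both linear-attention functions return the same matrix; only the factored version avoids building the n-by-n similarity table, which is what lets linear attention scale to long sequences. The trade-off is that the positive feature map only approximates softmax's sharp weighting, which is the kind of lost expressivity the paper's title refers to.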

More Images, More Problems? A Controlled Analysis of VLM Failure Modes

Intermediate
Anurag Das, Adrian Bulat et al. · Jan 12 · arXiv

Large Vision-Language Models (LVLMs) look great on single images but often stumble when they must reason across multiple images.

#Large Vision-Language Models · #Multi-image reasoning · #Cross-image aggregation

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Intermediate
Bowen Yang, Kaiming Jin et al. · Jan 12 · arXiv

Computer-using agents tend to forget important visual details over long tasks and cannot reliably find up-to-date, step-by-step help for unfamiliar apps.

#computer-using agents · #vision-language models · #milestone memory

Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning

Intermediate
Jiaxuan Lu, Ziyu Kong et al. · Jan 12 · arXiv

This paper teaches AI to build and improve its own small computer helpers (tools) while solving science problems, instead of relying only on a fixed toolbox made beforehand.

#Test-Time Tool Evolution · #Dynamic tool synthesis · #Scientific reasoning

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate
Yu Xu, Hongbin Yan et al. · Jan 12 · arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models with explicit task hints, so the right “mini-experts” handle the right parts of an image-generation task.

#Task-Aware Gating · #Mixture-of-Experts · #Unified Image Generation

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

Intermediate
Lei Zhang, Mouxiang Chen et al. · Jan 12 · arXiv

MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.

#agent orchestration · #distributed systems · #event-driven architecture

OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Intermediate
Siqi Zhu, Jiaxuan You · Jan 12 · arXiv

OpenTinker is an open-source system that makes training AI agents with reinforcement learning simple, modular, and reusable.

#Reinforcement learning · #LLM agents · #Agent–environment interaction

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Intermediate
Linhao Zhong, Linyu Wu et al. · Jan 12 · arXiv

Diffusion Language Models (DLMs) write by polishing whole sentences in several passes instead of one token at a time.

#Diffusion Language Models · #Masked Diffusion · #Soft Token Distributions

Controlled Self-Evolution for Algorithmic Code Optimization

Intermediate
Tu Hu, Ronghao Chen et al. · Jan 12 · arXiv

The paper introduces Controlled Self-Evolution (CSE), a smarter way for AI to write and improve code quickly under a tight budget of tries.

#Controlled Self-Evolution · #Code optimization · #Self-evolving agents

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate
Jiapeng Shi, Junke Wang et al. · Jan 12 · arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model · #Temporal Grounding · #Referring Video Object Segmentation

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate
Yuanyang Yin, Yufan Deng et al. · Jan 12 · arXiv

Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.

#Image-to-Video generation · #Diffusion Transformer · #Controllability