How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (924)


DeepSeek-OCR 2: Visual Causal Flow

Intermediate
Haoran Wei, Yaofeng Sun et al. · Jan 28 · arXiv

DeepSeek-OCR 2 teaches a computer to “read” pictures of documents in a smarter order, more like how people read.

#DeepSeek-OCR 2 #DeepEncoder V2 #visual tokens

Advancing Open-source World Models

Intermediate
Robbyant Team, Zelin Gao et al. · Jan 28 · arXiv

LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.

#world model #video diffusion #causal attention

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

Intermediate
Runjie Zhou, Youbo Shao et al. · Jan 28 · arXiv

WorldVQA is a new benchmark that checks whether multimodal AI models can correctly name what they see in pictures without doing extra reasoning.

#WorldVQA #atomic visual knowledge #multimodal large language models

Efficient Autoregressive Video Diffusion with Dummy Head

Intermediate
Hang Guo, Zhaoyang Jia et al. · Jan 28 · arXiv

This paper finds that about one in four attention heads in autoregressive video diffusion models attends almost entirely to the current frame and nearly ignores past frames, wasting memory and time.

#autoregressive video diffusion #multi-head self-attention #KV cache compression

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate
Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI agent that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll, much like a careful human user.

#GUI agent #UI grounding #navigation policy

Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

Beginner
Zengbin Wang, Xuecai Hu et al. · Jan 28 · arXiv

Text-to-image models draw pretty pictures, but often put things in the wrong places or miss how objects interact.

#text-to-image #spatial intelligence #occlusion

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate
Haoyou Deng, Keyu Yan et al. · Jan 28 · arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO #flow matching #GRPO

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate
Jinyang Wu, Shuo Yang et al. · Jan 28 · arXiv

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK #dynamic branching #strategic exploration

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Intermediate
Vikash Singh, Darion Cassel et al. · Jan 27 · arXiv

VERGE is a collaborative system in which an AI writer (an LLM) works with a strict logic checker (an SMT solver) to make answers both smart and logically sound.

#VERGE #neurosymbolic reasoning #SMT solver

Self-Distillation Enables Continual Learning

Intermediate
Idan Shenfeld, Mehul Damani et al. · Jan 27 · arXiv

This paper shows a simple way for AI models to keep learning new things without forgetting what they already know.

#Self-Distillation Fine-Tuning #On-Policy Distillation #Continual Learning

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Intermediate
Chen Chen, Lai Wei · Jan 27 · arXiv

Big AI models used to get better by getting wider or reading longer texts, but the gains from those tricks are slowing down.

#Keel #Post-LayerNorm #Pre-LayerNorm

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Intermediate
Jialong Wu, Xiaoying Zhang et al. · Jan 27 · arXiv

The paper argues that generating and using images inside an AI's reasoning process can help it reason more like humans, especially on real-world physical and spatial problems.

#visual world modeling #multimodal chain-of-thought #unified multimodal models