How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)


SpatialTree: How Spatial Abilities Branch Out in MLLMs

Intermediate
Yuxi Xiao, Longfei Li et al. · Dec 23 · arXiv

SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.

#Spatial Intelligence #Multimodal Large Language Models #Hierarchical Benchmark

Active Intelligence in Video Avatars via Closed-loop World Modeling

Intermediate
Xuanhua He, Tianyu Yang et al. · Dec 23 · arXiv

The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.

#ORCA #L-IVA #Internal World Model

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Intermediate
Seijin Kobayashi, Yanick Schimpf et al. · Dec 23 · arXiv

The paper shows that big sequence models (like transformers) quietly learn longer goals inside their hidden activations, even though they are trained one step at a time.

#hierarchical reinforcement learning #temporal abstractions #autoregressive models

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Intermediate
Amirhosein Ghasemabadi, Di Niu · Dec 23 · arXiv

Large language models often sound confident even when they are wrong, and existing ways to catch mistakes are slow or not very accurate.

#self-awareness #large language models #hidden states

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Intermediate
Shengchao Zhou, Yuxin Chen et al. · Dec 23 · arXiv

The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).

#dynamic spatial reasoning #vision-language models #4D understanding

Step-DeepResearch Technical Report

Intermediate
Chen Hu, Haikuo Du et al. · Dec 23 · arXiv

Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.

#Deep Research #Atomic Capabilities #ReAct Agent

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Intermediate
Byung-Kwan Lee, Yu-Chiang Frank Wang et al. · Dec 23 · arXiv

Big vision-language models are super smart but too large to fit on phones and small devices.

#vision-language models #knowledge distillation #masking teacher

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

Intermediate
Wenzheng Zeng, Mingyu Ouyang et al. · Dec 23 · arXiv

SlideTailor is an AI system that turns a scientific paper into personalized presentation slides that match what a specific user likes.

#personalized slide generation #preference-guided summarization #implicit preference distillation

FaithLens: Detecting and Explaining Faithfulness Hallucination

Intermediate
Shuzheng Si, Qingyi Wang et al. · Dec 23 · arXiv

Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.

#faithfulness hallucination #hallucination detection #explainable AI

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al. · Dec 23 · arXiv

This paper builds DiRL, an efficient post-training framework for diffusion language models that helps them reason better.

#Diffusion Language Model #Blockwise dLLM #Post-Training

Multi-hop Reasoning via Early Knowledge Alignment

Intermediate
Yuxin Wang, Shicheng Fang et al. · Dec 23 · arXiv

This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.

#Retrieval-Augmented Generation #Iterative RAG #Multi-hop Reasoning

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Intermediate
Yiming Du, Baojun Wang et al. · Dec 23 · arXiv

Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.

#temporal reasoning #multi-session dialogue #reinforcement learning