How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (792)


Step-DeepResearch Technical Report

Intermediate
Chen Hu, Haikuo Du et al. · Dec 23 · arXiv

Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.

#Deep Research #Atomic Capabilities #ReAct Agent

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Intermediate
Byung-Kwan Lee, Yu-Chiang Frank Wang et al. · Dec 23 · arXiv

Big vision-language models are super smart but too large to fit on phones and small devices.

#vision-language models #knowledge distillation #masking teacher

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

Intermediate
Wenzheng Zeng, Mingyu Ouyang et al. · Dec 23 · arXiv

SlideTailor is an AI system that turns a scientific paper into personalized presentation slides that match what a specific user likes.

#personalized slide generation #preference-guided summarization #implicit preference distillation

FaithLens: Detecting and Explaining Faithfulness Hallucination

Intermediate
Shuzheng Si, Qingyi Wang et al. · Dec 23 · arXiv

Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.

#faithfulness hallucination #hallucination detection #explainable AI

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al. · Dec 23 · arXiv

This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.

#Diffusion Language Model #Blockwise dLLM #Post-Training

Multi-hop Reasoning via Early Knowledge Alignment

Intermediate
Yuxin Wang, Shicheng Fang et al. · Dec 23 · arXiv

This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.

#Retrieval-Augmented Generation #Iterative RAG #Multi-hop Reasoning

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Intermediate
Yiming Du, Baojun Wang et al. · Dec 23 · arXiv

Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.

#temporal reasoning #multi-session dialogue #reinforcement learning

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Intermediate
Ming Li, Chenrui Fan et al. · Dec 23 · arXiv

This paper turns messy chains of thought from language models into clear, named steps so we can see how they really think through math problems.

#Schoenfeld’s Episode Theory #Cognitive Episodes #ThinkARM

How Much 3D Do Video Foundation Models Encode?

Intermediate
Zixuan Huang, Xiang Li et al. · Dec 23 · arXiv

This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?

#video foundation models #3D awareness #temporal reasoning

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Intermediate
Weichen Fan, Haiwen Diao et al. · Dec 22 · arXiv

The paper proposes the Prism Hypothesis: meanings (semantics) mainly live in low frequencies, while fine picture details live in high frequencies.

#Prism Hypothesis #Unified Autoencoding #Frequency-Band Modulator

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Intermediate
Jiacheng Guo, Ling Yang et al. · Dec 22 · arXiv

GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.

#GenEnv #co-evolutionary learning #difficulty-aligned curriculum

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

Intermediate
Xinyao Liao, Qiyuan He et al. · Dec 22 · arXiv

Autoregressive (AR) image models make pictures by choosing tokens one-by-one, but they were judged only on picking likely tokens, not on how good the final picture looks in pixels.

#autoregressive image generation #tokenizer–generator alignment #pixel-space reconstruction