🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers130

AllBeginnerIntermediateAdvanced
All SourcesarXiv

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Beginner
Zhenyu Zhang, Shujian Zhang et al.Dec 30arXiv

This paper shows a new way (called RISE) to find and control how AI models think without needing any human-made labels.

#RISE#sparse auto-encoder#reasoning vectors

Nested Browser-Use Learning for Agentic Information Seeking

Beginner
Baixuan Li, Jialong Wu et al.Dec 29arXiv

This paper teaches AI helpers to browse the web more like people do, not just by grabbing static snippets.

#information-seeking agents#browser-use#ReAct function-calling

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Beginner
Ethan Chern, Zhulin Hu et al.Dec 29arXiv

LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.

#real-time video diffusion#on-policy distillation#multimodal conditioning

Video-Browser: Towards Agentic Open-web Video Browsing

Beginner
Zhengyang Liang, Yan Shu et al.Dec 28arXiv

The paper tackles how AI agents can truly research the open web when the answers are hidden inside long, messy videos, not just text.

#agentic video browsing#pyramidal perception#video understanding

SVBench: Evaluation of Video Generation Models on Social Reasoning

Beginner
Wenshuo Peng, Gongxuan Wang et al.Dec 25arXiv

SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty pictures.

#social reasoning#video generation#benchmark

Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Beginner
Li-Zhong Szu-Tu, Ting-Lin Wu et al.Dec 24arXiv

The paper builds YearGuessr, a giant, worldwide photo-and-text dataset of 55,546 buildings with their construction years (1001–2024), GPS, and popularity (page views).

#YearGuessr#building age estimation#ordinal regression

C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling

Beginner
Jin Qin, Zihan Liao et al.Dec 24arXiv

C2LLM is a new family of code embedding models that helps computers find the right code faster and more accurately.

#code retrieval#embedding model#cross-attention pooling

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Beginner
Zhe Cao, Tao Wang et al.Dec 24arXiv

T2AV-Compass is a new, unified test to fairly grade AI systems that turn text into matching video and audio.

#Text-to-Audio-Video generation#multimodal evaluation#cross-modal alignment

Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

Beginner
Jinghan Li, Yang Jin et al.Dec 24arXiv

This paper introduces NExT-Vid, a way to teach a video model by asking it to guess the next frame of a video while parts of the past are hidden.

#autoregressive video pretraining#masked next-frame prediction#context isolation

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Beginner
Gül Sena Altıntaş, Malikeh Ehghaghi et al.Dec 23arXiv

TokSuite is a science lab for tokenizers: it trains 14 language models that are identical in every way except for how they split text into tokens.

#tokenization#tokenizer robustness#Byte Pair Encoding (BPE)

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Beginner
Hanyang Kong, Xingyi Yang et al.Dec 22arXiv

WorldWarp is a new method that turns a single photo plus a planned camera path into a long, steady, 3D-consistent video.

#Novel View Synthesis#3D Gaussian Splatting#Spatio-Temporal Diffusion

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Beginner
Yuqiao Tan, Minzheng Wang et al.Dec 22arXiv

Large language models (LLMs) don’t act as a single brain; inside, each layer and module quietly makes its own mini-decisions called internal policies.

#Bottom-up Policy Optimization#internal layer policy#internal modular policy
678910