How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (792)


Plenoptic Video Generation

Intermediate
Xiao Fu, Shitao Tang et al. | Jan 8 | arXiv

PlenopticDreamer re-renders a video along new camera paths while keeping the content consistent across views and over time.

#plenoptic function #camera-controlled video generation #video re-rendering

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate
Shuming Liu, Mingchen Zhuge et al. | Jan 8 | arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning #adaptive reasoning #early exit

CoV: Chain-of-View Prompting for Spatial Reasoning

Intermediate
Haoyu Zhao, Akide Liu et al. | Jan 8 | arXiv

This paper teaches AI to look around a 3D scene step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.

#Chain-of-View Prompting #Embodied Question Answering #Active Viewpoint Reasoning

RelayLLM: Efficient Reasoning via Collaborative Decoding

Intermediate
Chengsong Huang, Tong Zheng et al. | Jan 8 | arXiv

RelayLLM lets a small model do most of the generating and asks a big model for help only on the few tokens that are truly hard.

#token-level collaboration #<call>n</call> command #collaborative decoding

DocDancer: Towards Agentic Document-Grounded Information Seeking

Intermediate
Qintong Zhang, Xinjie Lv et al. | Jan 8 | arXiv

DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.

#Document Question Answering #Agentic Information Seeking #ReAct

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Intermediate
Sixiao Zheng, Minghao Yin et al. | Jan 8 | arXiv

VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.

#Video world model #4D Geometric Control #3D Gaussian trajectories

Token-Level LLM Collaboration via FusionRoute

Intermediate
Nuoya Xiong, Yuhang Zhou et al. | Jan 8 | arXiv

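Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow. FusionRoute has several models collaborate at the token level, routing generation to the appropriate expert.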

#FusionRoute #token-level collaboration #expert routing

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Intermediate
Maksim Velikanov, Ilyas Chahed et al. | Jan 8 | arXiv

The paper shows that big language models often get stuck with weight sizes set by training hyperparameters instead of by the data, which quietly hurts performance.

#learnable multipliers #weight decay #noise–WD equilibrium

SmartSearch: Process Reward-Guided Query Refinement for Search Agents

Intermediate
Tongyu Wen, Guanting Dong et al. | Jan 8 | arXiv

SmartSearch teaches search agents to fix their own bad search queries while they are thinking, not just their final answers.

#Search agents #Process rewards #Query refinement

DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation

Intermediate
Guanzhi Deng, Bo Li et al. | Jan 8 | arXiv

Mixture-of-Experts (MoE) models use many small specialist networks and only activate a few per token, but classic LoRA fine-tuning gives every expert the same rank, wasting parameters on the wrong experts.

#DR-LoRA #Mixture-of-Experts #Low-Rank Adaptation

AgentOCR: Reimagining Agent History via Optical Self-Compression

Intermediate
Lang Feng, Fuchao Yang et al. | Jan 8 | arXiv

AgentOCR renders an agent’s long text history as images so it can remember more while using fewer tokens.

#AgentOCR #optical self-compression #visual tokens

AT²PO: Agentic Turn-based Policy Optimization via Tree Search

Intermediate
Zefang Zong, Dingwei Chen et al. | Jan 8 | arXiv

AT²PO is a new way to train AI agents that act over several turns, such as querying the web, reading the result, and trying again.

#Agentic Reinforcement Learning #Turn-level Optimization #Tree Search