How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (807)


Improving Recursive Transformers with Mixture of LoRAs

Intermediate
Mohammadmahdi Nouriborji, Morteza Rohanian et al. · Dec 14 · arXiv

Recursive transformers save memory by reusing the same layer over and over, but that makes them less expressive and hurts accuracy.

#Mixture of LoRAs #recursive transformers #parameter sharing
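To make the parameter-sharing trade-off in the summary above concrete, here is a minimal sketch. It is a toy illustration only, not the paper's architecture: the "layer" is a single residual weight matrix, and `make_layer`, `apply_layer`, `run_stack`, and `count_params` are hypothetical helper names. It shows that reusing one layer keeps the depth while shrinking the number of unique weights.

```python
import random

def make_layer(dim, seed):
    # Toy stand-in for a transformer block: one dim x dim weight matrix.
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

def apply_layer(weights, x):
    # Residual update: y = x + W x
    return [xi + sum(w * xj for w, xj in zip(row, x))
            for xi, row in zip(x, weights)]

def run_stack(layers, x):
    # Apply the layers in order; a shared layer simply appears several times.
    for layer in layers:
        x = apply_layer(layer, x)
    return x

def count_params(layers):
    # Count each distinct weight matrix once, so a reused layer is not double counted.
    unique = {id(layer): layer for layer in layers}
    return sum(len(row) for layer in unique.values() for row in layer)

dim, depth = 8, 6
standard = [make_layer(dim, seed=i) for i in range(depth)]  # six distinct layers
shared = make_layer(dim, seed=0)
recursive = [shared] * depth                                # one layer applied six times

standard_params = count_params(standard)    # depth * dim * dim
recursive_params = count_params(recursive)  # dim * dim

x = [1.0] * dim
standard_out = run_stack(standard, x)
recursive_out = run_stack(recursive, x)
```

Both stacks run six steps of computation, but the recursive one stores a sixth of the weights, which is the memory saving (and the reduced flexibility) the summary refers to.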

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Intermediate
Zhe Liu, Runhui Huang et al. · Dec 14 · arXiv

DrivePI is a single, small (0.5B) multimodal language model that sees with cameras and LiDAR, talks in natural language, and plans driving actions all at once.

#DrivePI #Vision-Language-Action #3D occupancy

State over Tokens: Characterizing the Role of Reasoning Tokens

Intermediate
Mosh Levy, Zohar Elyoseph et al. · Dec 14 · arXiv

Reasoning tokens (the words a model writes before its final answer) help the model think better, but they are not a trustworthy diary of how it really thought.

#State over Tokens #reasoning tokens #chain-of-thought

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Intermediate
Jingzhe Ding, Shengda Long et al. · Dec 14 · arXiv

NL2Repo-Bench is a new benchmark that tests whether coding agents can build a whole Python library from just one long natural-language document and an empty folder.

#NL2Repo-Bench #autonomous coding agents #long-horizon reasoning

WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment

Intermediate
Mahir Labib Dihan, Tanzima Hashem et al. · Dec 14 · arXiv

WebOperator is a smart way for AI to use a map of choices (a search tree) to navigate websites safely and reach goals.

#web agent #tree search #best-first search

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

Intermediate
Jingdi Lei, Di Zhang et al. · Dec 14 · arXiv

Standard attention is slow for long texts because it compares every word with every other word, which takes quadratic time.

#error-free linear attention #rank-1 matrix exponential #continuous-time dynamics
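The quadratic cost of standard attention mentioned in the summary above can be seen in a small sketch. This is plain softmax attention written for illustration, not the paper's method; `attention` and `toy_tokens` are hypothetical names, and the `comparisons` counter is added only to expose how the work grows with sequence length.

```python
import math

def attention(queries, keys, values):
    # Plain softmax attention; also counts query-key comparisons.
    d = len(queries[0])
    comparisons = 0
    outputs = []
    for q in queries:
        scores = []
        for k in keys:
            # One dot product per (query, key) pair: n * n of them in total.
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
            comparisons += 1
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, comparisons

def toy_tokens(n, d=4):
    # Deterministic placeholder embeddings, used only for the demonstration.
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]

counts = {}
for n in (8, 16, 32):
    x = toy_tokens(n)
    _, counts[n] = attention(x, x, x)
    print(n, counts[n])
```

Doubling the number of tokens quadruples the number of comparisons, which is why long inputs become expensive.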

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Intermediate
Xiaoxuan Tang, Xinping Lei et al. · Dec 13 · arXiv

AutoMV is a team of AI helpers that turns a whole song into a full music video that matches the music, the beat, and the lyrics.

#music-to-video generation #multi-agent system #music information retrieval

VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Intermediate
Avinash Amballa, Yashas Malur Saidutta et al. · Dec 12 · arXiv

VOYAGER is a training-free way to make large language models (LLMs) create data that is truly different, not just slightly reworded.

#VOYAGER #determinantal point process #dataset diversity

V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

Intermediate
Chenrui Fan, Yijun Liang et al. · Dec 12 · arXiv

This paper introduces V-REX, a new benchmark that tests how AI systems reason about images through step-by-step exploration, not just final answers.

#V-REX #Chain-of-Questions #Exploratory visual reasoning

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Intermediate
Ye Fang, Tong Wu et al. · Dec 12 · arXiv

V-RGBX is a new video editing system that lets you change the true building blocks of a scene—like base color, surface bumps, material, and lighting—rather than just painting over pixels.

#intrinsic video editing #inverse rendering #forward rendering

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Intermediate
Yang Fei, George Stoica et al. · Dec 12 · arXiv

The paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.

#video diffusion #structure-preserving motion #SAM2

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Intermediate
Minglei Shi, Haolin Wang et al. · Dec 12 · arXiv

This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.

#text-to-image #diffusion transformer #flow matching