How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (807)


Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Intermediate
Yuqing Li, Jiangnan Li et al. · Dec 19 · arXiv

Humans keep a big-picture memory (a “mindscape”) when reading long texts; this paper teaches AI to do the same.

#Retrieval-Augmented Generation #Mindscape #Hierarchical Summarization

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Intermediate
Rujiao Long, Yang Li et al. · Dec 19 · arXiv

Reasoning Palette gives a language or vision-language model a tiny hidden “mood” (a latent code) before it starts answering, so it chooses a smarter plan rather than just rolling dice on each next word.

#Reasoning Palette #latent contextualization #VAE

Reinforcement Learning for Self-Improving Agent with Skill Library

Intermediate
Jiongxiao Wang, Qiaojing Yan et al. · Dec 18 · arXiv

This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.

#Reinforcement Learning #Skill Library #Sequential Rollout

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Intermediate
Chiao-An Yang, Ryo Hachiuma et al. · Dec 18 · arXiv

This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos.

#4D perception #multimodal large language models #perceptual distillation

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Intermediate
Junbo Li, Peng Zhou et al. · Dec 18 · arXiv

Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.

#Turn-PPO #multi-turn reinforcement learning #agentic LLMs

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

Intermediate
Hanlin Wang, Hao Ouyang et al. · Dec 18 · arXiv

WorldCanvas lets you make videos where things happen exactly how you ask by combining three inputs: text (what happens), drawn paths called trajectories (when and where it happens), and reference images (who it is).

#WorldCanvas #promptable world events #trajectory-controlled video generation

EasyV2V: A High-quality Instruction-based Video Editing Framework

Intermediate
Jinjie Mai, Chaoyang Wang et al. · Dec 18 · arXiv

EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”

#instruction-based video editing #spatiotemporal mask #text-to-video fine-tuning

Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

Intermediate
Qihao Liu, Chengzhi Mao et al. · Dec 18 · arXiv

AuditDM is a friendly “auditor” model that hunts for where vision-language models get things wrong and then generates the right practice examples to fix them.

#AuditDM #model auditing #cross-model divergence

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Intermediate
Chaoyang Wang, Kaituo Feng et al. · Dec 18 · arXiv

AdaTooler-V teaches an image-and-video AI to first ask, “Do I really need a tool?” before using one, which saves time and boosts accuracy.

#adaptive tool-use #multimodal chain-of-thought #visual tool interactions

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Intermediate
Guibao Shen, Yihua Du et al. · Dec 18 · arXiv

StereoPilot is a new AI that turns regular 2D videos into 3D (stereo) videos quickly and with high quality.

#stereo video conversion #monocular-to-stereo #depth ambiguity

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Intermediate
Xin Lin, Meixi Song et al. · Dec 18 · arXiv

This paper builds a foundation model called DAP that estimates real-world (metric) depth from any 360° panorama, indoors or outdoors.

#panoramic depth estimation #metric depth #360-degree vision

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Intermediate
Peter Chen, Xiaopeng Li et al. · Dec 18 · arXiv

The paper studies why two opposite-sounding tricks in RL for reasoning, adding random (spurious) rewards and reducing randomness (entropy), can both seem to help large language models think better.

#RLVR #Group Relative Policy Optimization #ratio clipping