🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers159

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#GRPO

NVIDIA Nemotron 3: Efficient and Open Intelligence

Intermediate
NVIDIA, : et al.Dec 24arXiv

Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.

#Nemotron 3#Mixture-of-Experts#LatentMoE

Not triaged yet

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Intermediate
Runtao Liu, Ziyi Liu et al.Dec 23arXiv

LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.

#long video question answering#multi-agent reasoning#temporal grounding

Not triaged yet

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al.Dec 23arXiv

This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.

#Diffusion Language Model#Blockwise dLLM#Post-Training

Not triaged yet

Multi-hop Reasoning via Early Knowledge Alignment

Intermediate
Yuxin Wang, Shicheng Fang et al.Dec 23arXiv

This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.

#Retrieval-Augmented Generation#Iterative RAG#Multi-hop Reasoning

Not triaged yet

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Intermediate
Yiming Du, Baojun Wang et al.Dec 23arXiv

Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.

#temporal reasoning#multi-session dialogue#reinforcement learning

Not triaged yet

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Intermediate
Jiacheng Guo, Ling Yang et al.Dec 22arXiv

GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.

#GenEnv#co-evolutionary learning#difficulty-aligned curriculum

Not triaged yet

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

Intermediate
Xinyao Liao, Qiyuan He et al.Dec 22arXiv

Autoregressive (AR) image models make pictures by choosing tokens one-by-one, but they were judged only on picking likely tokens, not on how good the final picture looks in pixels.

#autoregressive image generation#tokenizer–generator alignment#pixel-space reconstruction

Not triaged yet

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Intermediate
Rujiao Long, Yang Li et al.Dec 19arXiv

Reasoning Palette gives a language or vision-language model a tiny hidden “mood” (a latent code) before it starts answering, so it chooses a smarter plan rather than just rolling dice on each next word.

#Reasoning Palette#latent contextualization#VAE

Not triaged yet

Reinforcement Learning for Self-Improving Agent with Skill Library

Intermediate
Jiongxiao Wang, Qiaojing Yan et al.Dec 18arXiv

This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.

#Reinforcement Learning#Skill Library#Sequential Rollout

Not triaged yet

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Intermediate
Junbo Li, Peng Zhou et al.Dec 18arXiv

Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.

#Turn-PPO#multi-turn reinforcement learning#agentic LLMs

Not triaged yet

Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

Intermediate
Qihao Liu, Chengzhi Mao et al.Dec 18arXiv

AuditDM is a friendly 'auditor' model that hunts for where vision-language models get things wrong and then creates the right practice to fix them.

#AuditDM#model auditing#cross-model divergence

Not triaged yet

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Intermediate
Tianyuan Qu, Lei Ke et al.Dec 18arXiv

RePlan is a plan-then-execute system that first figures out exactly where to edit in a picture and then makes clean changes there.

#instruction-based image editing#vision–language model (VLM)#diffusion model

Not triaged yet

910111213