How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (33)

#Reinforcement Learning

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Intermediate
Shikun Sun, Liao Qu et al. · Jan 5 · arXiv

Visual Autoregressive (VAR) models draw whole grids of image tokens at once across multiple scales, which makes standard reinforcement learning (RL) unstable.

#Visual Autoregressive (VAR) #Reinforcement Learning #GRPO

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Intermediate
Huichao Zhang, Liao Qu et al. · Jan 5 · arXiv

NextFlow is a single, decoder-only Transformer that can read and write both text and images in one continuous sequence.

#Next-Scale Prediction #Autoregressive Transformer #Dual-Codebook Tokenization

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Intermediate
Zhuofan Shi, Hubao A et al. · Jan 5 · arXiv

MDAgent2 is a special helper built from large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.

#Molecular Dynamics #LAMMPS #Code Generation

K-EXAONE Technical Report

Intermediate
Eunbi Choi, Kibong Choi et al. · Jan 5 · arXiv

K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.

#Mixture-of-Experts #Hybrid Attention #Sliding Window Attention

CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate
Ahmad Rezaei, Mohsen Gholami et al. · Jan 1 · arXiv

CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.

#CPPO #Contrastive Perception Loss #Vision-Language Models

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Intermediate
Jiawei Chen, Xintian Shen et al. · Dec 29 · arXiv

MindWatcher is a smart AI agent that can think step by step and decide when to use tools like web search, image zooming, and a code calculator to solve tough, multi-step problems.

#Tool-Integrated Reasoning #Interleaved Thinking #Multimodal Chain-of-Thought

NVIDIA Nemotron 3: Efficient and Open Intelligence

Intermediate
NVIDIA et al. · Dec 24 · arXiv

Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.

#Nemotron 3 #Mixture-of-Experts #LatentMoE

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Intermediate
NVIDIA et al. · Dec 23 · arXiv

Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.

#Mixture-of-Experts #Mamba-2 #Transformer

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Intermediate
Yuxi Xiao, Longfei Li et al. · Dec 23 · arXiv

SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.

#Spatial Intelligence #Multimodal Large Language Models #Hierarchical Benchmark

Step-DeepResearch Technical Report

Intermediate
Chen Hu, Haikuo Du et al. · Dec 23 · arXiv

Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.

#Deep Research #Atomic Capabilities #ReAct Agent

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al. · Dec 23 · arXiv

This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.

#Diffusion Language Model #Blockwise dLLM #Post-Training

Multi-hop Reasoning via Early Knowledge Alignment

Intermediate
Yuxin Wang, Shicheng Fang et al. · Dec 23 · arXiv

This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.

#Retrieval-Augmented Generation #Iterative RAG #Multi-hop Reasoning