🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers9

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#generalization

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Intermediate
Yinyi Luo, Yiqiao Jin et al.Feb 3arXiv

AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.

#multi-agent distillation#process reward model#GRPO

LatentMem: Customizing Latent Memory for Multi-Agent Systems

Intermediate
Muxin Fu, Guibin Zhang et al.Feb 3arXiv

LatentMem is a new memory system that helps teams of AI agents remember the right things for their specific jobs without overloading them with text.

#multi-agent systems#latent memory#role-aware memory

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner
Yinjie Wang, Tianbao Xie et al.Feb 2arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning#closed-loop optimization#reward modeling

A Pragmatic VLA Foundation Model

Intermediate
Wei Wu, Fan Lu et al.Jan 26arXiv

LingBot-VLA is a robot brain that listens to language, looks at the world, and decides smooth actions to get tasks done.

#Vision‑Language‑Action#foundation model#Flow Matching

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Intermediate
Zhihao Xu, Rumei Li et al.Jan 15arXiv

The paper shows a new way to teach AI assistants how to use tools in many-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.

#GEM pipeline#text-based trajectory generation#tool-use data synthesis

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al.Jan 13arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.

#RLVR#judge-then-generate#discriminative supervision

GR-Dexter Technical Report

Intermediate
Ruoshi Wen, Guangzeng Chen et al.Dec 30arXiv

GR-Dexter is a full package—new robot hands, a smart AI brain, and lots of carefully mixed data—that lets a two-handed robot follow language instructions to do long, tricky tasks.

#vision-language-action#dexterous manipulation#bimanual robotics

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate
Hao Lu, Ziyang Liu et al.Dec 10arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP#vision-language-action#world model

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al.Dec 4arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing#EtCon#TPSFT