How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (14)

#generalization

KARL: Knowledge Agents via Reinforcement Learning

Beginner
Jonathan D. Chang, Andrew Drozdov et al. · Mar 5 · arXiv

KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.

#grounded reasoning #enterprise search #reinforcement learning

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Intermediate
Yutong Wang, Siyuan Xiong et al. · Feb 26 · arXiv

Multi-agent systems are like teams of smart helpers, but one bad message can mislead the whole team.

#multi-agent systems #error propagation #test-time rectification

SAM 3D Body: Robust Full-Body Human Mesh Recovery

Intermediate
Xitong Yang, Devansh Kukreja et al. · Feb 17 · arXiv

SAM 3D Body (3DB) is a model that turns a single photo of a person into a full 3D mesh of the body, feet, and hands with state-of-the-art accuracy.

#human mesh recovery #3D human pose #Momentum Human Rig

Effective Reasoning Chains Reduce Intrinsic Dimensionality

Beginner
Archiki Prasad, Mandar Joshi et al. · Feb 9 · arXiv

The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?

#intrinsic dimensionality #chain-of-thought #LoRA

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Intermediate
Yinyi Luo, Yiqiao Jin et al. · Feb 3 · arXiv

AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.

#multi-agent distillation #process reward model #GRPO

LatentMem: Customizing Latent Memory for Multi-Agent Systems

Intermediate
Muxin Fu, Guibin Zhang et al. · Feb 3 · arXiv

LatentMem is a new memory system that helps teams of AI agents remember the right things for their specific jobs without overloading them with text.

#multi-agent systems #latent memory #role-aware memory

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner
Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that jointly trains three things: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

A Pragmatic VLA Foundation Model

Intermediate
Wei Wu, Fan Lu et al. · Jan 26 · arXiv

LingBot-VLA is a robot brain that listens to language, looks at the world, and chooses smooth actions to get tasks done.

#Vision‑Language‑Action #foundation model #Flow Matching

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Intermediate
Zhihao Xu, Rumei Li et al. · Jan 15 · arXiv

The paper shows a new way to teach AI assistants how to use tools in multi-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.

#GEM pipeline #text-based trajectory generation #tool-use data synthesis

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. · Jan 13 · arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which helps it discard bad ideas early.

#RLVR #judge-then-generate #discriminative supervision

GR-Dexter Technical Report

Intermediate
Ruoshi Wen, Guangzeng Chen et al. · Dec 30 · arXiv

GR-Dexter is a full package (new robot hands, a smart AI brain, and lots of carefully mixed data) that lets a two-handed robot follow language instructions to do long, tricky tasks.

#vision-language-action #dexterous manipulation #bimanual robotics

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate
Hao Lu, Ziyang Liu et al. · Dec 10 · arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP #vision-language-action #world model