How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (29)

#reinforcement learning

KARL: Knowledge Agents via Reinforcement Learning

Beginner
Jonathan D. Chang, Andrew Drozdov et al. · Mar 5 · arXiv

KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.

#grounded reasoning #enterprise search #reinforcement learning

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Beginner
Zhenting Wang, Huancheng Chen et al. · Mar 4 · arXiv

This paper teaches long-horizon AI agents to remember everything exactly without loading their whole memory at once.

#indexed memory #LLM agents #long-horizon tasks

MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Beginner
Jiachun Li, Shaoping Huang et al. · Mar 2 · arXiv

MMR-Life is a new test (benchmark) that checks how well AI understands everyday situations using several real photos at once.

#multimodal reasoning #multi-image understanding #real-life benchmark

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Beginner
Xinyu Zhu, Yihao Feng et al. · Mar 1 · arXiv

CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.

#CHIMERA dataset #synthetic data generation #chain-of-thought

Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning

Beginner
Chris Samarinas, Haw-Shiuan Chang et al. · Feb 26 · arXiv

SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.

#retrieval-augmented reasoning #reinforcement learning #GRPO

WorldCompass: Reinforcement Learning for Long-Horizon World Models

Beginner
Zehan Wang, Tengfei Wang et al. · Feb 9 · arXiv

WorldCompass teaches video world models to follow actions better and keep visual quality high by using reinforcement learning after pretraining.

#world models #reinforcement learning #clip-level rollout

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Beginner
Tiwei Bie, Maosong Cao et al. · Feb 9 · arXiv

LLaDA2.1 teaches a diffusion-style language model to write fast rough drafts and then fix its own mistakes by editing tokens it already wrote.

#discrete diffusion language model #editable decoding #token-to-token editing

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner
Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

Kimi K2.5: Visual Agentic Intelligence

Beginner
Kimi Team, Tongtong Bai et al. · Feb 2 · arXiv

Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.

#multimodal learning #vision-language models #joint optimization

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

Beginner
Zhuoran Yang, Ed Li et al. · Jan 28 · arXiv

This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.

#native reasoning #cybersecurity LLM #chain-of-thought

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner
Zhihan Liu, Lin Guan et al. · Jan 26 · arXiv

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization #state information richness #planning complexity

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner
Zhitao He, Zongwei Lyu et al. · Jan 22 · arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal #theory of mind #strategic persuasion