How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (34)

Filtered by tag: #supervised fine-tuning

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Intermediate
Jinpeng Chen, Cheng Gong et al. · Mar 2 · arXiv

CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.

#constraint-guided verification · #multi-turn tool use · #user simulator
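In spirit, constraint-guided verification filters generated dialogues: a conversation is kept for training only if every programmatic check passes. A minimal sketch, where the conversation format and the `tool_called_first` constraint are illustrative assumptions, not the paper's actual constraint language:

```python
def verify(conversation, constraints):
    # Keep a generated dialogue only if every constraint verifies.
    return all(check(conversation) for check in constraints)

def tool_called_first(conv):
    # Hypothetical constraint: the agent must call the tool before
    # giving its final answer.
    roles = [turn["role"] for turn in conv]
    return "tool" in roles and roles.index("tool") < roles.index("assistant_final")

conv = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "tool", "content": "get_weather(Paris) -> 18C"},
    {"role": "assistant_final", "content": "It's 18C in Paris."},
]
ok = verify(conv, [tool_called_first])
```

Only verified conversations would enter the training set, which is how the recipe guarantees correctness alongside difficulty.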

When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate
Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them choose better among answers they already know?

#medical vision-language models · #reinforcement learning · #supervised fine-tuning

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Intermediate
Nils Schwager, Simon Münker et al. · Feb 26 · arXiv

This paper tests whether AI can realistically guess what a specific social media user would comment when they see a new post.

#Conditioned Comment Prediction · #LLM user simulation · #implicit conditioning

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Intermediate
Qianben Chen, Tianrui Qin et al. · Feb 26 · arXiv

This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.

#agentic search · #parallel evidence acquisition · #plan refinement
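The parallel idea can be sketched as firing several queries at once instead of one long sequential chain. The `search` function below is a dummy stand-in for a real retrieval backend; the names are assumptions, not the paper's API:

```python
from concurrent.futures import ThreadPoolExecutor

def search(query):
    # Stand-in for a real search backend (web, RAG index, etc.).
    return f"evidence for '{query}'"

def parallel_search(queries):
    # Issue all queries concurrently; results come back in input order.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(search, queries))

results = parallel_search(["topic A", "topic B", "topic C"])
```

The agent then reasons once over the pooled evidence, rather than interleaving a slow think-search-think chain for each query.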

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Intermediate
Rui Yang, Qianhui Wu et al. · Feb 25 · arXiv

GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.

#GUI agent · #visual grounding · #long-horizon navigation

On Data Engineering for Scaling LLM Terminal Capabilities

Intermediate
Renjie Pi, Grace Lam et al. · Feb 24 · arXiv

This paper shows that you can vastly improve a model’s command-line (terminal) skills by carefully engineering the training data, not just by using a bigger model.

#Terminal-Bench 2.0 · #terminal agents · #synthetic task generation

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

Intermediate
Jihao Qiu, Lingxi Xie et al. · Feb 24 · arXiv

LongVideo-R1 is a smart video-watching agent that jumps to the right moments in long videos instead of scanning everything.

#long video understanding · #video navigation · #multimodal large language model

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Intermediate
Zehao Chen, Gongxun Li et al. · Feb 9 · arXiv

Big language models can get stuck after fine-tuning because they become too sure of themselves, so normal training stops helping.

#weak-driven learning · #logit mixing · #weak agents
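In spirit, logit mixing blends an uncertain weak model's logits into the overconfident strong model's, flattening the output distribution enough that training signals return. A toy sketch, where the plain linear interpolation and the weight `alpha` are assumptions rather than the paper's exact rule:

```python
import math

def mix_logits(strong, weak, alpha=0.2):
    # Blend weak-agent logits into the strong agent's to soften
    # an overconfident distribution.
    return [(1 - alpha) * s + alpha * w for s, w in zip(strong, weak)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

strong = [10.0, 0.0, 0.0]  # near-certain strong model
weak = [1.0, 1.0, 1.0]     # uncertain weak model
mixed = mix_logits(strong, weak)
# softmax(mixed) keeps the same argmax but is less peaked than
# softmax(strong), so gradients are no longer vanishingly small.
```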

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Intermediate
Zimu Lu, Houxing Ren et al. · Feb 3 · arXiv

This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.

#agentic coding · #multi-agent systems · #full-stack development

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Intermediate
Azmine Toushik Wasi, Wahid Faisal et al. · Feb 3 · arXiv

SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.

#SpatiaLab · #spatial reasoning · #vision-language models

Learning to Repair Lean Proofs from Compiler Feedback

Intermediate
Evan Wang, Simon Chess et al. · Feb 3 · arXiv

This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.

#Lean proof repair · #compiler feedback · #APRIL dataset
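The feedback loop amounts to: compile, read the error, propose a patch, repeat. Both `compile_lean` and `propose_fix` below are hypothetical stand-ins; the paper's actual repair model and Lean toolchain calls are not shown on this page:

```python
def compile_lean(proof):
    # Stand-in for invoking the Lean compiler: returns an error
    # message, or None if the proof checks.
    if "sorry" in proof:
        return "error: declaration uses 'sorry'"
    return None

def propose_fix(proof, error):
    # Stand-in for the repair model, which conditions on both the
    # broken proof and the compiler's error message.
    return proof.replace("sorry", "rfl")

def repair(proof, max_rounds=3):
    for _ in range(max_rounds):
        error = compile_lean(proof)
        if error is None:
            return proof  # proof now compiles
        proof = propose_fix(proof, error)
    return proof

fixed = repair("example : 1 + 1 = 2 := by sorry")
```

The key point is that the training signal comes from compiler feedback on broken proofs, not only from finished, correct ones.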

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Intermediate
Jialiang Zhu, Gongrui Zhang et al. · Feb 2 · arXiv

Re-TRAC is a new way for AI search agents to learn from each try, write a clean summary of what happened, and then use that summary to do better on the next try.

#Re-TRAC · #trajectory compression · #deep research agents
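The recursive-compression loop is: roll out an attempt, fold its trajectory into one short summary, and condition the next attempt on that summary instead of the raw logs. A toy sketch with hypothetical `act` and `compress` functions:

```python
def act(task, memory):
    # Hypothetical agent rollout: one attempt's trajectory,
    # conditioned on the compressed summary of earlier attempts.
    return [f"search({task})", f"read results ({memory or 'no prior notes'})"]

def compress(trajectory, memory):
    # Recursively fold the new attempt into a short running summary,
    # so the next try starts from lessons learned, not raw logs.
    return f"{memory} | tried {len(trajectory)} steps".strip(" |")

memory = ""
for attempt in range(3):
    trajectory = act("quantum error correction", memory)
    memory = compress(trajectory, memory)
```

The summary stays short no matter how many attempts accumulate, which is what keeps long multi-try searches cheap.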