Papers (43)

#supervised fine-tuning

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Jinpeng Chen, Cheng Gong et al. · Mar 2 · arXiv

CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.

#constraint-guided verification #multi-turn tool use #user simulator

When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Intermediate

Ahmadreza Jeddi, Kimia Shaban et al. · Mar 1 · arXiv

This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or just help them pick better from answers they already know?

#medical vision-language models #reinforcement learning #supervised fine-tuning

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Beginner

Xinyu Zhu, Yihao Feng et al. · Mar 1 · arXiv

CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.

#CHIMERA dataset #synthetic data generation #chain-of-thought

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Intermediate

Nils Schwager, Simon Münker et al. · Feb 26 · arXiv

This paper tests whether AI can realistically guess what a specific social media user would comment when they see a new post.

#Conditioned Comment Prediction #LLM user simulation #implicit conditioning

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Intermediate

Qianben Chen, Tianrui Qin et al. · Feb 26 · arXiv

This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.

#agentic search #parallel evidence acquisition #plan refinement

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Intermediate

Rui Yang, Qianhui Wu et al. · Feb 25 · arXiv

GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.

#GUI agent #visual grounding #long-horizon navigation

On Data Engineering for Scaling LLM Terminal Capabilities

Intermediate

Renjie Pi, Grace Lam et al. · Feb 24 · arXiv

This paper shows that you can vastly improve a model’s command-line (terminal) skills by carefully engineering the training data, not just by using a bigger model.

#Terminal-Bench 2.0 #terminal agents #synthetic task generation

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

Intermediate

Jihao Qiu, Lingxi Xie et al. · Feb 24 · arXiv

LongVideo-R1 is a smart video-watching agent that jumps to the right moments in long videos instead of scanning everything.

#long video understanding #video navigation #multimodal large language model

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Intermediate

Zehao Chen, Gongxun Li et al. · Feb 9 · arXiv

Big language models can get stuck after fine-tuning because they become too sure of themselves, so normal training stops helping.

#weak-driven learning #logit mixing #weak agents

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Intermediate

Zimu Lu, Houxing Ren et al. · Feb 3 · arXiv

This paper builds a team of AI agents that can create real full-stack websites (frontend, backend, and database) from plain-English instructions.

#agentic coding #multi-agent systems #full-stack development

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Intermediate

Azmine Toushik Wasi, Wahid Faisal et al. · Feb 3 · arXiv

SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.

#SpatiaLab #spatial reasoning #vision-language models

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Beginner

Jianhao Ruan, Zhihao Xu et al. · Feb 3 · arXiv

AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.

#agent orchestration #sub-agent-as-tools #four-tuple abstraction
