Papers (34)

#supervised fine-tuning

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.

#agentic coding #multi-agent systems #full-stack development

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Intermediate

Azmine Toushik Wasi, Wahid Faisal et al. · Feb 3 · arXiv

SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.

#SpatiaLab #spatial reasoning #vision-language models

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Beginner

Jianhao Ruan, Zhihao Xu et al. · Feb 3 · arXiv

AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.

#agent orchestration #sub-agent-as-tools #four-tuple abstraction

Learning to Repair Lean Proofs from Compiler Feedback

Intermediate

Evan Wang, Simon Chess et al. · Feb 3 · arXiv

This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.

#Lean proof repair #compiler feedback #APRIL dataset

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Intermediate

Jialiang Zhu, Gongrui Zhang et al. · Feb 2 · arXiv

Re-TRAC is a new way for AI search agents to learn from each try, write a clean summary of what happened, and then use that summary to do better on the next try.

#Re-TRAC #trajectory compression #deep research agents

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Intermediate

Honglin Lin, Zheng Liu et al. · Jan 29 · arXiv

MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.

#multimodal reasoning #vision-language models #chain-of-thought

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Intermediate

Xiaoyu Tian, Haotian Wang et al. · Jan 29 · arXiv

ASTRA is a fully automated way to train tool-using AI agents by making both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.

#tool-augmented agents #multi-turn decision making #verifiable environments

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

Beginner

Zhuoran Yang, Ed Li et al. · Jan 28 · arXiv

This paper introduces Foundation-Sec-8B-Reasoning, a small (8-billion-parameter) AI model trained to “think out loud” before answering cybersecurity questions.

#native reasoning #cybersecurity LLM #chain-of-thought

SERA: Soft-Verified Efficient Repository Agents

Intermediate

Ethan Shen, Danny Tormoen et al. · Jan 28 · arXiv

SERA is a new, low-cost way to train coding helpers (agents) that learn the style and secrets of your own codebase.

#SERA #Soft-Verified Generation #soft verification

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate

Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.

#GUI agent #UI grounding #navigation policy

Towards Pixel-Level VLM Perception via Simple Points Prediction

Intermediate

Tianhui Song, Haoyu Lu et al. · Jan 27 · arXiv

SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.

#SimpleSeg #multimodal large language model #decoder-free segmentation

daVinci-Dev: Agent-native Mid-training for Software Engineering

Intermediate

Ji Zeng, Dayuan Fu et al. · Jan 26 · arXiv

This paper teaches code AIs to work more like real software engineers by adding a training stage partway through their learning (mid-training) that uses real development workflows.

#agentic mid-training #agent-native data #contextually-native trajectories
