This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.
SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.
This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.
Re-TRAC is a new way for AI search agents to learn from each try, write a clean summary of what happened, and then use that summary to do better on the next try.
MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.
ASTRA is a fully automated way to train tool-using AI agents by making both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.
SERA is a new, low-cost way to train coding helpers (agents) that learn the style and secrets of your own codebase.
OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.
SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.
This paper teaches code AIs to work more like real software engineers by training them in the middle of their learning using real development workflows.