ASTRA is a fully automated way to train tool-using AI agents by making both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.
MemOCR is a new way for AI to remember long histories by turning important notes into a picture with big, bold parts for key facts and tiny parts for details.
This paper teaches language models to be safer, more factual, and higher quality during pretraining, not just after, by using reinforcement learning with a stronger model as a helper.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.
Innovator-VL is a new multimodal AI model that understands both pictures and text to help solve science problems without needing mountains of special data.
SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.
AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.
This paper teaches code AIs to work more like real software engineers by training them in the middle of their learning using real development workflows.
LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.
Endless Terminals is an automatic factory that builds thousands of realistic, checkable computer-terminal tasks so AI agents can practice and improve with reinforcement learning.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.