Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
The paper finds a strange gap: the model’s internal representations almost perfectly signal when it should use a tool, but the text it actually generates often fails to trigger the tool under strict rules.
The paper tackles a big problem in model merging: when you merge several reinforcement-learned models, simple averaging waters down each model’s special skills.
The paper shows a new way to teach AI assistants to use tools in multi-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.
GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.
Olmo 3 is a family of fully open AI language models (7B and 32B) where every step, from raw data to training code and checkpoints, is released.
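
The watering-down from simple averaging mentioned above can be seen in a toy calculation. This is only an illustrative sketch, not the method of any paper listed here: the three-coordinate "models", the names `base`, `task_vectors`, and `average_merge`, and the assumption that each model's update touches a different coordinate are all hypothetical.

```python
# Hypothetical toy setup: each "model" equals the base weights plus an update
# that touches only its own coordinate (a stand-in for one specialized skill).
base = [0.0, 0.0, 0.0]
task_vectors = [
    [1.0, 0.0, 0.0],  # update learned by model A
    [0.0, 1.0, 0.0],  # update learned by model B
    [0.0, 0.0, 1.0],  # update learned by model C
]


def average_merge(base, task_vectors):
    """Uniformly average the fine-tuned weights (base + update) of all models."""
    n = len(task_vectors)
    return [
        b + sum(tv[i] for tv in task_vectors) / n
        for i, b in enumerate(base)
    ]


merged = average_merge(base, task_vectors)

# How much of each model's original update survives in the merged weights.
retained = [merged[i] - base[i] for i in range(len(base))]
print(retained)
```

Under this assumption of non-overlapping updates, each skill's update is scaled by 1/N after merging N models, so with three models only a third of each update remains.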