The paper solves a big problem: when you merge several reinforcement-learned models, their special skills get watered down by simple averaging.
The paper shows a new way to teach AI assistants how to use tools in many-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.
GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.
Olmo 3 is a family of fully-open AI language models (7B and 32B) where every step—from raw data to training code and checkpoints—is released.