The paper finds a strange gap: the model’s hidden thoughts almost perfectly show when it should use a tool, but its actual words often don’t trigger the tool under strict rules.
The paper solves a big problem: when you merge several reinforcement-learned models, their special skills get watered down by simple averaging.
The paper shows a new way to teach AI assistants how to use tools in many-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.
GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.