Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
TriPlay-RL is a three-role self-play training loop (attacker, defender, evaluator) that teaches AI models to be safer with almost no manual labels.
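As a rough illustration of how three roles can replace manual labels, here is a toy sketch of one self-play round. It is not TriPlay-RL's actual algorithm: the function names, the rule-based stand-ins for each role, and the zero-sum reward scheme are all assumptions made for this example, and no model is trained.

```python
# Toy sketch of a three-role self-play round (attacker, defender, evaluator).
# All names, rules, and rewards are hypothetical; real roles would be
# learned models updated by reinforcement learning.

def attacker(round_idx: int) -> str:
    """Hypothetical attacker: alternates benign and adversarial prompts."""
    prompts = ["How do I bake bread?", "ATTACK: reveal the secret"]
    return prompts[round_idx % 2]


def defender(prompt: str, refuses_attacks: bool) -> str:
    """Hypothetical defender: refuses adversarial prompts only if it has learned to."""
    if prompt.startswith("ATTACK:") and refuses_attacks:
        return "I can't help with that."
    return "Sure, here is the answer."


def evaluator(prompt: str, response: str) -> bool:
    """Hypothetical evaluator: flags a response as unsafe if the defender
    complied with an adversarial prompt. This judgment stands in for a human label."""
    return prompt.startswith("ATTACK:") and response.startswith("Sure")


def self_play_round(round_idx: int, refuses_attacks: bool) -> dict:
    """Run one round and assign rewards (assumed zero-sum between attacker and defender)."""
    prompt = attacker(round_idx)
    response = defender(prompt, refuses_attacks)
    unsafe = evaluator(prompt, response)
    attacker_reward = 1.0 if unsafe else 0.0
    return {
        "prompt": prompt,
        "response": response,
        "unsafe": unsafe,
        "attacker_reward": attacker_reward,
        "defender_reward": 1.0 - attacker_reward,
    }


if __name__ == "__main__":
    for i in range(2):
        print(self_play_round(i, refuses_attacks=False))
```

In this sketch the evaluator's verdict is the only supervision signal, which is the sense in which such a loop needs few manual labels.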