Large language models handle language well, but they struggle to predict the consequences of their own actions in a changing environment.
TriPlay-RL is a three-role self-play training loop (attacker, defender, evaluator) that trains AI models to behave more safely with almost no manually labeled data.
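
To make the three-role idea concrete, here is a toy sketch of one way such a loop could be wired together. It is not TriPlay-RL's actual algorithm: the `Role` class, the `run_self_play` function, the scalar `skill` values, and the zero-sum reward rule are all hypothetical stand-ins chosen for illustration, and the evaluator is a fixed rule here rather than a trained model.

```python
import random
from dataclasses import dataclass


@dataclass
class Role:
    name: str
    skill: float = 0.0  # toy scalar standing in for a model's parameters (assumption)


def run_self_play(steps, seed=0, lr=0.1):
    """Run a toy attacker/defender/evaluator loop and return both roles plus rewards."""
    rng = random.Random(seed)
    attacker = Role("attacker")
    defender = Role("defender")
    history = []
    for _ in range(steps):
        # Attacker proposes a prompt; its difficulty grows with attacker skill.
        difficulty = attacker.skill + rng.random()
        # Defender answers; the answer counts as safe if its strength meets the difficulty.
        strength = defender.skill + rng.random()
        # Evaluator (a fixed rule in this toy) scores the exchange in place of a manual label.
        defender_reward = 1.0 if strength >= difficulty else -1.0
        attacker_reward = -defender_reward  # zero-sum reward is an assumption of this sketch
        # Each role is nudged by its own reward, standing in for an RL update.
        attacker.skill += lr * attacker_reward
        defender.skill += lr * defender_reward
        history.append((attacker_reward, defender_reward))
    return attacker, defender, history
```

Calling `run_self_play(100)` returns the two roles and the per-step reward pairs. The point of the sketch is only that the evaluator's judgment, not a human annotation, supplies the training signal for both other roles.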