The paper introduces Rubric-ARM, a system that jointly trains two models, a rubric generator and a judge, with reinforcement learning so that together they can better predict which answers people would prefer.
Large language models are usually judged one message at a time, but many real tasks require planning across a whole conversation.