The paper tackles a new kind of search called Wide Research, where an AI must gather lots of related facts under complex rules and put them into a clean table.
The paper introduces Rubric-ARM, a system that teaches two AI helpers—a rubric maker and a judge—to learn together using reinforcement learning so they can better decide which answers people would prefer.
Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation.
Ministral 3 is a new family of small-but-mighty AI language models (3B, 8B, 14B) that learn from a larger model using a step-by-step tutoring method called Cascade Distillation.