The paper builds an automated pipeline that translates AI benchmarks and datasets into many languages while keeping each question correctly aligned with its answer.
RoboCurate improves robot training videos by checking whether the actions in a generated video match what a robot would actually do in a simulator.
Real attackers can try many prompts in parallel until a model slips, so testing safety with a single attempt badly underestimates risk.
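A back-of-the-envelope sketch of why single-attempt testing understates risk. It assumes, as a simplification not taken from the paper, that each attempt succeeds independently with the same probability `p`. Under that assumption, the chance that at least one of `k` attempts succeeds is `1 - (1 - p)^k`. The function name `attack_success_rate` is hypothetical.

```python
def attack_success_rate(p_single: float, k: int) -> float:
    """Probability that at least one of k attempts succeeds.

    Assumes (hypothetically) that attempts are independent and each
    succeeds with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be >= 1")
    # Complement of "every one of the k attempts fails".
    return 1.0 - (1.0 - p_single) ** k


if __name__ == "__main__":
    # A 1% per-attempt failure rate looks small when measured once...
    for k in (1, 10, 100):
        print(k, round(attack_success_rate(0.01, k), 3))
```

With a 1% per-attempt rate, 10 attempts give roughly a 10% chance of at least one success, and 100 attempts give roughly 63%. Real attempts are usually correlated, so this is an illustration of the trend rather than a risk estimate.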