DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.
Accuracy alone can make AI agents look good on paper even when they still fail in real life; this paper shows how to measure reliability properly.
CAR-bench is a new 'driving test' for AI assistants that checks whether they can stay careful, honest, and consistent during real back-and-forth conversations in a car.