RISE-Video is a new test that checks whether video-making AIs follow the world's hidden rules, rather than just producing good-looking video.
FIN-bench-v2 is a big, tidy set of Finnish-language tests that checks how well large language models handle many kinds of tasks, such as reading, logic, and world knowledge.