Most people on Earth speak more than one language and often switch languages in the same chat, but AI tools aren't tested well on this real behavior.
EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.
The paper builds YearGuessr, a giant, worldwide photo-and-text dataset of 55,546 buildings with their construction years (1001–2024), GPS, and popularity (page views).