Most people on Earth speak more than one language and often switch languages in the same chat, but AI tools aren't tested well on this real behavior.
EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.