How I Study AI - Learn AI Papers & Lectures the Easy Way

PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues

Intermediate

Mohammad Rifqi Farhansyah, Hanif Muhammad Zhafran et al. · Jan 24 · arXiv

Most people on Earth speak more than one language and often switch languages in the same chat, but AI tools aren’t tested well on this real behavior.

#code-switching #multilingual NLP #trilingual dialogue

EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A

Intermediate

Shijian Ma, Yan Lin et al. · Jan 14 · arXiv

EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.

#evasion detection #earnings call Q&A #financial NLP

Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Beginner

Li-Zhong Szu-Tu, Ting-Lin Wu et al. · Dec 24 · arXiv

The paper builds YearGuessr, a large, worldwide photo-and-text dataset of 55,546 buildings, each labeled with its construction year (1001–2024), GPS coordinates, and popularity (measured by page views).

#YearGuessr #building age estimation #ordinal regression
