How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#Kendall’s tau

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Intermediate
Qi Qian, Chengsong Huang et al. · Jan 7 · arXiv

Everyone uses tests (benchmarks) to judge how smart AI models are, but not all tests are good tests.

#LLM evaluation #benchmark quality #ranking consistency

Confidence Estimation for LLMs in Multi-turn Interactions

Intermediate
Caiqi Zhang, Ruihan Yang et al. · Jan 5 · arXiv

This paper studies how sure (confident) large language models are during multi-turn chats where clues arrive step by step.

#multi-turn confidence estimation #LLM calibration #InfoECE