How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#Kendall’s tau

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Intermediate
Yan Wang, Yi Han et al. · Feb 19 · arXiv

This paper builds Conv-FinRe, a new test that checks if AI financial advisors give advice that fits a person’s true goals, not just what they clicked before.

#financial recommendation #utility-based evaluation #conversational benchmark

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Intermediate
Qi Qian, Chengsong Huang et al. · Jan 7 · arXiv

Everyone uses tests (benchmarks) to judge how smart AI models are, but not all tests are good tests.

#LLM evaluation #benchmark quality #ranking consistency

Confidence Estimation for LLMs in Multi-turn Interactions

Intermediate
Caiqi Zhang, Ruihan Yang et al. · Jan 5 · arXiv

This paper studies how sure (confident) large language models are during multi-turn chats where clues arrive step by step.

#multi-turn confidence estimation #LLM calibration #InfoECE