How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#multilingual evaluation

LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation

Beginner
Zhiling Yan, Dingjie Song et al., Feb 10, arXiv

LiveMedBench is a new, always-updating test for medical AIs that keeps test questions safely separated from training data to avoid cheating by memorization.

#LiveMedBench, #medical benchmark, #data contamination

AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

Intermediate
Xilin Jiang, Qiaolin Wang et al., Jan 25, arXiv

AVMeme Exam is a new test made by humans that checks if AI can understand famous internet audio and video clips the way people do.

#AVMeme Exam, #multimodal large language models, #audio-visual memes

FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models

Intermediate
Joona Kytöniemi, Jousia Piha et al., Dec 15, arXiv

FIN-bench-v2 is a big, tidy set of Finnish tests that checks how good large language models are at many things like reading, logic, and world knowledge.

#Finnish language models, #benchmark suite, #HuggingFace Datasets