How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#F1 score

ContextBench: A Benchmark for Context Retrieval in Coding Agents

Intermediate
Han Li, Letian Zhu et al., Feb 5, arXiv

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval #coding agents #software engineering benchmarks

ASA: Training-Free Representation Engineering for Tool-Calling Agents

Intermediate
Youjin Wang, Run Zhou et al., Feb 4, arXiv

The paper finds a strange gap: the model’s hidden thoughts almost perfectly show when it should use a tool, but its actual words often don’t trigger the tool under strict rules.

#activation steering #representation engineering #tool calling

Ebisu: Benchmarking Large Language Models in Japanese Finance

Intermediate
Xueqing Peng, Ruoyu Xiang et al., Feb 1, arXiv

EBISU is a new test that checks how well AI models understand Japanese finance, a language and domain where hints and special terms are common.

#EBISU #Japanese finance NLP #implicit commitment recognition