How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#semantic drift

BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

Intermediate
Guoxin Chen, Fanzhe Meng et al. | Mar 3 | arXiv

BeyondSWE is a new benchmark that tests code agents on harder, more realistic tasks than single-repo bug fixing.

#BeyondSWE, #code agents, #software engineering benchmark

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Intermediate
Hanna Yukhymenko, Anton Alexandrov et al. | Feb 25 | arXiv

The paper builds an automated pipeline that translates AI benchmarks and datasets into many languages while keeping questions and answers correctly connected.

#machine translation, #multilingual benchmarks, #test-time compute scaling

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

Intermediate
Tiansheng Hu, Yilun Zhao et al. | Feb 5 | arXiv

SAGE is a new benchmark that measures how well AI research agents retrieve scientific papers when questions require multi-step reasoning.

#SAGE benchmark, #scientific literature retrieval, #deep research agents