How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#evaluation benchmark

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Intermediate
Lance Ying, Ryan Truong et al. | Feb 19 | arXiv

The paper argues that the fairest way to measure an AI's general intelligence is to see how quickly and how well it learns many different human-made games, given the same time and practice a person would have.

#general intelligence #evaluation benchmark #game-based testing

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

Intermediate
Shicheng Fang, Yuxin Wang et al. | Jan 28 | arXiv

AgentLongBench is a new benchmark that checks how well AI agents reason over very long histories made of their own actions and the environment's replies, rather than just reading static documents.

#AgentLongBench #long-context agents #environment rollouts